Empirical CPU–Memory Benchmarking for Long-Read Genome Assembly Resource Optimization in High-Performance Computing
DOI: https://doi.org/10.63158/journalisi.v8i3.1602
Keywords: High Performance Computing, Slurm workload manager, long-read assembly benchmarking, BUSCO-based completeness assessment, resource optimization
Abstract
Efficient resource utilization is a critical challenge in High-Performance Computing (HPC) environments, particularly for long-read genome assembly workflows that require substantial computational resources. This study presents an empirical benchmarking framework for optimizing resource allocation in de novo long-read genome assembly of Acacia crassicarpa. Nine experimental scenarios, combining three CPU core counts (32, 48, and 64) with three memory allocations (32 GB, 64 GB, and 128 GB), were run under the Slurm workload manager. Performance was assessed by execution time, assembly contiguity (N50), and biological completeness (BUSCO). The results show that the number of CPU cores strongly affects performance: scaling from 32 to 64 cores reduced execution time by up to 49%. In contrast, increasing memory allocation beyond 64 GB yielded no significant improvement in assembly quality, highlighting the cost of resource over-provisioning. Scenario 2 (64 CPU cores and 64 GB RAM) was selected as the optimal configuration because it balanced runtime, N50 contiguity, memory efficiency, and BUSCO completeness, not because it produced the absolute shortest runtime. Under Scenario 2, the workflow achieved an average runtime of 59 hours 39 minutes 40 seconds, an N50 of 7.8 Mb, and a genome completeness score of 99.8%. These findings provide practical guidance for resource planning and workload scheduling in shared HPC-based genomic workflows.
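The experimental design and metrics summarized above can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline: it enumerates the 3 x 3 CPU/memory grid, emits hypothetical `#SBATCH` headers (the choice of `--cpus-per-task` and `--mem`, single-node jobs, and the job-name pattern are assumptions, since the study's job scripts are not shown), converts Slurm-style elapsed times of the form `[D-]HH:MM:SS` to seconds (assuming runtimes are read from Slurm accounting), and computes percent runtime reduction and N50 from a list of contig lengths.

```python
from itertools import product

# Resource grid described in the abstract: 3 CPU counts x 3 memory sizes = 9 scenarios.
CPU_CORES = (32, 48, 64)
MEMORY_GB = (32, 64, 128)


def sbatch_header(cpus: int, mem_gb: int) -> str:
    """Return hypothetical #SBATCH lines for one scenario (job name is illustrative)."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name=asm_c{cpus}_m{mem_gb}",  # hypothetical naming scheme
        "#SBATCH --nodes=1",                          # assumption: single-node jobs
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
    ])


def elapsed_to_seconds(elapsed: str) -> int:
    """Parse a Slurm-style elapsed string such as '2-11:39:40', '11:39:40', or '39:40'."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:  # pad 'MM:SS' to 'HH:MM:SS'
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def percent_reduction(baseline: float, scaled: float) -> float:
    """Runtime reduction of `scaled` relative to `baseline`, in percent."""
    return (baseline - scaled) * 100.0 / baseline


def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the longest-first cumulative sum first reaches half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    scenarios = list(product(CPU_CORES, MEMORY_GB))
    print(len(scenarios))                    # 9
    print(sbatch_header(64, 64))
    print(elapsed_to_seconds("2-11:39:40"))  # 214780 s = 59 h 39 min 40 s
    print(n50([5, 4, 3, 2, 1]))              # 4
```

For example, the Scenario 2 average runtime of 59 hours 39 minutes 40 seconds, written in Slurm's day-prefixed form as `2-11:39:40`, converts to 214,780 seconds. The N50 helper uses the standard longest-first cumulative definition on plain contig lengths; it does not parse FASTA files or reproduce BUSCO scoring.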
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.