Performance of DNA Sequences Compression on Multicores and GPUs using Adaptive-GenCodexMethod
DOI:
https://doi.org/10.26713/jims.v7i3.309Keywords:
DNA Sequence, Bandwidth, Throughput, Compression Ratio, Speedup, Code byteAbstract
The DNA sequences are huge in size and the databases are growing at an exponential rate. For example, the human genome in raw format ranges from 2 to 30 Tera-bytes. The main reason for this is the invention of new species and increasing number of DNA profiles. The growth of the DNA affects the storage as well as bandwidth when these sequences need to be transferred. Applications such as DNA profiling, Real time DNA crime investigation require access to the DNA sequences in real time. The inherent property of DNA is that it contains many repeats which makes it highly compressible. However, the applications mentioned not only require good compression ratio but also needs faster compression. Multicores and GPUs can be used to perform the compression quickly. In this paper, we propose a new algorithm with a focus on the throughput along with the compression ratio. The algorithm scales well on GPUs and achieves a speedup of 11 on multicores and upto 23 on GPUs when run on M2070 Tesla card and upto 57 on K20 Kepler GPUs. We also extended this algorithm such that it adapts to the input sequence depending on the number of consecutive repeats and accordingly chooses theright algorithm which leads to a better compression.
Downloads
References
"DNA Sequence Compression using the Burrows-Wheeler Transform”, IEEE, 2002.
TC Bell et al., Prentice Hall, 1990.
A Grumbach & F Tahi.,"Compression of DNA sequences.”, Proceedings of the "IEEE Data Compression Conference, Snowbird, UT, USA.”,March 30April 2. 1993.
Xin Chen, Sam Kwong, Ming Li,"A Compression Algorithm for DNA Sequences.”,Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, Tokyo, Japan,April 8-11, 2000.
Xin Chen 1, Ming Li, Bin Ma, and John Tromp. "DNACompress: fast and effective DNA sequence compression”,volume 18, pages 1696-1698, 2002.
P. Raja Rajeswari & Dr Allam AppaRao, "DNABIT Compress Genome compression algorithm”, Book on Bioinformation, 2011, volume 5, pages 350-360.
P Raja Rajeswari & Dr Allam AppaRao,GENBIT Compress-Algorithm for repetitive and non repetitive DNA sequences, International Journal of Computer Science and Information Technology,2010 ,volume 2,pages 25-29.
Matsumoto,T., Sadakane,K. and Imai,H, "Biological sequence compression algorithms”, Genome Informatics Workshop, Universal Academy Press, 2002,43-52.
J. Ziv and A. Lempel,"A universal algorithm for sequential data compression ”,IEEE Trans. Inform. Theory,1977,volume -23, pages 337-343.
Ziv and Lempel,"Compression of individual sequences via variable-rate coding.”, IEEE Transactions on Information Theory, September 1978, volume 24, pages 530-536.
Ma,B., Tromp,J. and Li,M. "PatternHunter : faster and more sensitive homology search.”, Bioinformatics, 2002, pages 440-445.
Jackson, Peter, Book Chapter "Introduction To Expert Systems”, Addison Wesley, 1998 , 3rd edition,ISBN 978-0-201-87686-4.
Regina Barzilay and Daryl Mccullough and Owen Rambow and Jonathan Decristofaro and Tanya Korelsky and Benoit Lavoie and Cogentex Inc ,"A New Approach to Expert System Explanations”, In 9th International Workshop on Natural
Language Generation, 1998, pages 78-87.
Russell, Stuart J.; Norvig, Peter, "Artificial Intelligence: A modern approach”, Prentice Ha, 2003, 2nd Edition.
Nils J. Nilsson, "Artificial Intelligence: A New Synthesis”, Morgan Kaufmann Publishers, San Francisco, 1998. [16] McCorduck, Pamela, "Machines Who Think”, A. K. Peters, Ltd., 2004.
Koch, C.G., Isle, B.A. ; Butler, A.W.,"Intelligent user interface for expert systems applied to power plant maintenance and troubleshooting”, IEEE Transactions, March 1988, 3rd volume.
Downloads
Published
How to Cite
Issue
Section
License
Authors who publish with this journal agree to the following terms:- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a CCAL that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.