Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Computer Engineering and Computer Science

Degree Program

Computer Science and Engineering, PhD

Committee Chair

Yampolskiy, Roman Vladimirovich

Committee Co-Chair (if applicable)

Desoky, Ahmed

Committee Member

Rouchka, Eric

Committee Member

Imam, Ibrahim

Committee Member

Naber, John


Data encryption (Computer science); Computer networks--Security measures; Nucleotide sequence--Data processing


Recent advances in genetic engineering have allowed the insertion of artificial DNA strands into the living cells of organisms. Several methods have been developed to insert information into a DNA sequence for the purpose of data storage, watermarking, or communication of secret messages. The ability to detect, extract, and decode messages from DNA is important for forensic data collection and for data security. We have developed a software toolkit that is able to detect the presence of a hidden message within a DNA sequence, extract that message, and then decode it. The toolkit is able to detect, extract, and decode messages that have been encoded with a variety of different coding schemes. The goal of this project is to enable our software toolkit to determine with which coding scheme a message has been encoded in DNA and then to decode it. The software package is able to decode messages that have been encoded with every variation of most of the coding schemes described in this document. The software toolkit has two different options for decoding that can be selected by the user. The first is a frequency analysis approach that is very commonly used in cryptanalysis. This approach is very fast, but is unable to decode messages shorter than 200 words accurately. The second option is using a Genetic Algorithm (GA) in combination with a Wisdom of Artificial Crowds (WoAC) technique. This approach is very time consuming, but can decode shorter messages with much higher accuracy.