DNA sequence of 40 bases
GGCCGTGGTGCCCATTGTTCGTCGATCGGGTGATTGCGCT
Minimum free energy secondary structure for the sequence above, as predicted by the algorithm at a temperature of 37.0 °C

Validation accuracy of 99.6% for predicting secondary structure was reached after training the Deep Neural Network for 1600 epochs

DNA STRUCTURE PREDICTION
​
Context
Uniquely programmable DNA strands that can be identified in parallel without crosstalk are at the core of technologies that rely on amplification, such as high-throughput drug screening, diagnostics, DNA data storage, and nanostructure fabrication. However, as sequence length increases, designing a strand with the desired properties becomes dramatically more complex because the number of possible interactions grows rapidly. Current state-of-the-art dynamic programming algorithms such as mfold, Vfold, and NUPACK are impractical at scale: their computing time and design cost grow as O(n^3) with sequence length n, and designing many strands concurrently against crosstalk has proved challenging with them. There is an opportunity to apply advances in Machine Learning to create a new tool that enables faster, accurate secondary structure prediction and, in turn, easier design of DNA complexes.
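To make the O(n^3) scaling concrete, here is a minimal sketch of a Nussinov-style dynamic program that counts the maximum number of nested Watson-Crick pairs in a strand. It is a deliberately simplified stand-in: it is not the energy model used by the tools named above, and the function name and the `min_loop` parameter are assumptions made for illustration. The three nested loops over span, start position, and pairing partner are where the cubic cost comes from.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov-style DP: maximum number of nested A-T / G-C pairs.

    Illustrative only. Real predictors minimize a thermodynamic free
    energy rather than maximizing a pair count.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # span = j - i
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: base j is unpaired
            for k in range(i, j - min_loop):     # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

Doubling the sequence length makes this loop nest do roughly eight times the work, which is the scaling behaviour that motivates looking for a faster predictor.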
​
Aim
Efficient and accurate prediction of DNA secondary structures using Machine Learning
​
Results
I developed an algorithm that predicts the secondary structure and energy of DNA sequences with an accuracy of >99.6% and runs three orders of magnitude faster than NUPACK, for sequences of 20, 30, and 40 bases with GC contents of 50%, 60%, and 70%.
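As a rough illustration of how a test grid over those lengths and GC contents could be assembled, the sketch below draws one random strand per (length, GC content) combination. This is an assumption about how such a set might be built, not a description of the dataset actually used in this work; `random_dna`, `gc_content`, and the fixed seed are hypothetical names and choices.

```python
import random


def random_dna(length: int, gc_fraction: float, rng: random.Random) -> str:
    """Return a random strand with the requested length and GC content."""
    n_gc = round(length * gc_fraction)           # number of G or C bases
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)                           # spread G/C bases along the strand
    return "".join(bases)


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


rng = random.Random(0)                           # fixed seed for reproducibility
grid = {
    (length, gc): random_dna(length, gc, rng)
    for length in (20, 30, 40)
    for gc in (0.5, 0.6, 0.7)
}
```

Each entry in `grid` can then be passed to both the reference tool and the learned predictor, so that accuracy and run time are compared on identical inputs.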
​
CONTRIBUTIONS
​
Machine Learning
Programming
Design
Data analysis
​
​
Approach
This research study is still in progress, and more results are coming soon. If you are interested in learning more, please contact me.
​
​
This research has been conducted under the supervision of my advisor Dr. Ashwin Gopinath.