DNA STRUCTURE PREDICTION

DNA sequence of 40 bases

GGCCGTGGTGCCCATTGTTCGTCGATCGGGTGATTGCGCT

Minimum free energy secondary structure of the sequence above, as predicted by the algorithm at a temperature of 37.0 °C

A validation accuracy of 99.6% for secondary structure prediction was reached after training the deep neural network for 1600 epochs

Context

Uniquely programmable DNA strands that can be identified in parallel without crosstalk are at the core of technologies that rely on amplification, such as high-throughput drug screening, diagnostics, DNA data storage, and nanostructure fabrication. However, as sequence length increases, designing a strand with desired properties becomes dramatically more complex because the number of possible interactions grows rapidly. Current state-of-the-art dynamic programming algorithms such as M-fold, V-fold, and NUPACK scale poorly: their computing time, and with it the design cost, grows as O(n^3) with sequence length n, and it has proved challenging to use them to design many strands against crosstalk concurrently. Advances in machine learning offer an opportunity to build a new tool for faster, accurate secondary structure prediction and, in turn, easier design of DNA complexes.
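To make the O(n^3) scaling concrete, here is a minimal sketch of a Nussinov-style dynamic program that maximizes the number of nested Watson-Crick pairs in a single strand. It is a textbook simplification used only for illustration: it is not the thermodynamic model implemented by NUPACK or the other tools named above, and it is not the method developed in this project. The function name `max_base_pairs` and the minimum hairpin loop size `MIN_LOOP = 3` are assumptions made for this example.

```python
# Illustrative only: Nussinov-style base-pair maximization.
# This is NOT the free-energy model used by NUPACK, and NOT the
# approach developed in this project.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def max_base_pairs(seq: str) -> int:
    """Return the maximum number of nested Watson-Crick pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):          # O(n) subsequence lengths
        for i in range(n - span):                # O(n) start positions
            j = i + span
            score = best[i][j - 1]               # case 1: base j is unpaired
            for k in range(i, j - MIN_LOOP):     # O(n) partners for base j
                if (seq[k], seq[j]) in WATSON_CRICK:
                    left = best[i][k - 1] if k > i else 0
                    inside = best[k + 1][j - 1]
                    score = max(score, left + 1 + inside)
            best[i][j] = score
    return best[0][n - 1]
```

The three nested loops over `span`, `i`, and `k` are what give the cubic running time: doubling the sequence length multiplies the work by roughly eight, before any cross-strand interactions are considered.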

Aim

Efficient and accurate prediction of DNA secondary structures using Machine Learning

Results

I developed an algorithm that predicts the secondary structure and energy of DNA sequences with an accuracy of >99.6% and three orders of magnitude less computing time than NUPACK, for sequences of 20, 30, and 40 bases with G-C contents of 50%, 60%, and 70%.
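For readers who want to picture the benchmark grid described above, the following is a minimal sketch of how random test sequences with a fixed length and G-C content could be generated. It is an assumption about one reasonable way to do this, not the data pipeline actually used in this project; the helper names `random_dna` and `gc_content` are hypothetical.

```python
import random


def random_dna(length: int, gc_fraction: float, rng: random.Random) -> str:
    """Random DNA sequence with the requested length and G-C fraction.

    Hypothetical helper for illustration; not the project's generator.
    """
    n_gc = round(length * gc_fraction)           # number of G or C bases
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)                           # randomize base positions
    return "".join(bases)


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# One example sequence per (length, G-C content) combination from the text.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
grid = {
    (length, gc): random_dna(length, gc, rng)
    for length in (20, 30, 40)
    for gc in (0.5, 0.6, 0.7)
}
```

Each entry of `grid` can then be passed to a reference tool and to the model under test so that accuracy and timing are compared on identical inputs.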

CONTRIBUTIONS

Machine Learning

Programming

Design

Data analysis

Approach

This research is still in progress, and more results are coming soon. If you would like to know more, please contact me.

This research is being conducted under the supervision of my advisor, Dr. Ashwin Gopinath.

Presented as a talk at DNA26, the International Conference on DNA Computing and Molecular Programming, September 2020.
