Advanced studying the formation of stable secondary DNA structures as applied to DNA technologies


IRN AP08855353

Research in the field of natural sciences. Fundamental research in biology of animals, plants and microorganisms.

Start and end date of the project: 01.10.2020-31.12.2022. Duration: 27 months

Keywords characterizing the field and direction of the application, used for selecting the expert panel: DNA structure, DNA amplification or detection technology, DNA and RNA hybridization, machine learning algorithm, bioinformatics software

 

Abstract

DNA oligonucleotides are essential components of many molecular biology technologies that rely on DNA and RNA hybridization. Hybridization-based experimental methods such as multiplex polymerase chain reaction, microarray analysis, NanoString multiplex analysis, and next-generation targeted sequencing require complex mixtures of oligonucleotides (primers and probes) in one tube. Single-stranded DNA molecules also tend to bind to themselves, and the probability of such nonspecific binding increases with the complexity of the analysis. Moreover, existing approaches to designing hybridization probes and primers for current DNA detection technologies need to be revised. This applies first of all to standard and quantitative PCR, where various DNA amplification methods are combined with hybridization probes to detect a specific amplicon, and to isothermal DNA amplification methods that combine many nested primers and fluorescent probes. The revision is needed so that machine learning methods can be used to determine melting temperatures accurately, both for fully complementary DNA duplexes and for duplexes containing non-complementary bases.
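A classical point of comparison for melting temperature prediction is the nearest-neighbor model. The sketch below is not the project's machine learning method. It is a minimal Python illustration that assumes the unified nearest-neighbor parameters of SantaLucia (1998) at 1 M NaCl, a perfectly complementary duplex, and a total strand concentration of 0.5 µM. The names `nn_tm` and `revcomp` are hypothetical, and duplexes with mismatches, which the project targets, are not handled.

```python
import math

# Unified nearest-neighbor parameters (SantaLucia 1998), 1 M NaCl.
# Values are (delta H in kcal/mol, delta S in cal/(mol*K)) per dinucleotide step.
# Assumption: values quoted from memory of the published table; verify before real use.
NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
R = 1.987  # gas constant, cal/(mol*K)


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def nn_tm(seq, strand_conc=0.5e-6):
    """Estimate Tm (deg C) of seq paired with its perfect complement."""
    seq = seq.upper()
    dh = ds = 0.0
    # Initiation terms: one per duplex end, depending on the terminal base pair.
    for base in (seq[0], seq[-1]):
        if base in "GC":
            dh += 0.1
            ds += -2.8
        else:
            dh += 2.3
            ds += 4.1
    # Sum the stacking contribution of every dinucleotide step.
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        h, s = NN.get(step) or NN[revcomp(step)]
        dh += h
        ds += s
    self_complementary = seq == revcomp(seq)
    if self_complementary:
        ds += -1.4  # symmetry correction
    x = 1 if self_complementary else 4
    return dh * 1000.0 / (ds + R * math.log(strand_conc / x)) - 273.15
```

A learned model would replace or correct the fixed table `NN`, for example by adding terms for mismatched base pairs; that part is outside this sketch.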

The main objective of the project is to study stable secondary structures of nucleic acids, using a machine learning approach and experimental data on DNA/DNA hybridization in complex single-stranded DNA mixtures. A further objective is to develop bioinformatics tools that implement machine learning approaches for calculating the basic thermodynamics of secondary DNA structures, as applied to DNA detection and amplification technologies.

The algorithms we develop will make it possible to explain the artifacts observed in mega-multiplex DNA amplification and to predict the melting thermodynamics of DNA duplexes. The software to be developed will detect nucleic acid interactions, supporting the design of individual oligonucleotides and oligonucleotide mixtures with the weakest possible cross-interactions. Model experiments on DNA hybridization and the determination of melting temperatures will be carried out using synthetic DNA structures. Stable secondary nucleic acid structures will be studied on the basis of experimental data on DNA/DNA duplexes in complex single-stranded DNA mixtures. Bioinformatics tools implementing a machine learning approach for calculating the basic thermodynamics of secondary DNA structures, as applied to DNA detection and amplification, will be developed. Algorithms implementing machine learning approaches will be developed for the design of PCR primers, probes, and microchips. Online applications will be installed on the server. At least 3 articles will be published in peer-reviewed scientific journals from academic publishers (Springer Nature, Cell Press, PLOS, PeerJ, MDPI, Oxford Press, Frontiers and Elsevier) that are ranked in quartiles Q1-Q3 of the Web of Science database and have a CiteScore percentile of at least 50 in the Scopus database.
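The idea of screening an oligonucleotide mixture for cross-interactions can be illustrated with a very simple heuristic. The sketch below is an assumption-laden stand-in, not the project's software: it scores each pair of oligonucleotides (including each oligonucleotide with itself) by the longest uninterrupted stretch of perfectly complementary bases and flags pairs above a threshold. The function names and the threshold `max_run` are hypothetical, and no thermodynamics are involved.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_run(a, b):
    """Longest stretch of a that can pair antiparallel with a stretch of b.

    Implemented as the longest common substring of a and revcomp(b),
    using a rolling dynamic-programming row.
    """
    target = revcomp(b)
    best = 0
    prev = [0] * (len(target) + 1)
    for base_a in a:
        cur = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, start=1):
            if base_a == base_t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def screen_pool(oligos, max_run=4):
    """Return (name1, name2, run) for every pair whose run exceeds max_run.

    oligos maps a name to an upper-case sequence; self-pairs are included
    so that self-dimers are reported as well.
    """
    flagged = []
    for (n1, s1), (n2, s2) in combinations_with_replacement(oligos.items(), 2):
        run = longest_complementary_run(s1, s2)
        if run > max_run:
            flagged.append((n1, n2, run))
    return flagged
```

A practical tool would replace the run length with a free-energy estimate and weight interactions at the 3'-end more heavily; the pairwise screening loop would stay the same.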

Composition of the project research team

Principal Investigator: Ruslan Kalendar, Professor (Biology), Associate Professor of Genetics (University of Helsinki), Leading Researcher, Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Nazarbayev University.

ORCID:  http://orcid.org/0000-0003-3986-2460

Scopus:  http://www.scopus.com/authid/detail.url?authorId=6602789279

Publons: https://publons.com/researcher/254291/ruslan-kalendar/

 

Asset Daniyarov, Junior Researcher, Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Nazarbayev University; young specialist in bioinformatics and Python programming.

ORCID:  https://orcid.org/0000-0003-3886-718X

Publons: https://publons.com/researcher/1629505/asset-daniyarov/

Scopus:  https://www.scopus.com/authid/detail.uri?authorId=57204420341

Achieved results

We determined thermodynamic parameters for secondary DNA structures and calculated the basic thermodynamics and melting temperatures of these structures, including those containing mismatches, using machine learning approaches. We designed sequences and synthesized oligonucleotides to study hybridization kinetics and to calculate basic thermodynamics and melting temperatures. A one-dimensional recurrent neural network was used to analyze the sequences of short oligonucleotides. As a result of these studies, we wrote computer code for designing sets of oligonucleotides that detect a specific sequence, allelic variants of single nucleotide polymorphisms (SNPs), or insertions and deletions (InDels) at a specific locus by allele-specific PCR (AS-PCR), for example with KASP technology (Kompetitive Allele Specific PCR, LGC Biosearch), which is based on genotyping a biallelic SNP variant. Additionally, we wrote computer code for designing primer extension sets for the SNaPshot multiplex system for multiplex SNP genotyping (Thermo Fisher Scientific). This code is implemented in the FastPCR program environment and is used to design PCR-based genotyping assays for detecting SNPs and InDels. To increase the specificity of the reaction and improve discrimination between SNP alleles, the SNP site is placed at the penultimate base of each allele-specific primer. As an alternative to the universal KASP reporter system based on FRET cassettes, we developed a more advanced solution that differs from that of the manufacturer, LGC Biosearch. This solution uses a single universal anti-primer with a 3'-end Dark Quencher that is complementary to the 5'-ends of the tails of the allele-specific primers. The tail at the 5'-end of each allele-specific primer consists of a sequence complementary to the universal anti-primer, and a short unique sequence of 6 nucleotides is located between the tail and the allele-specific sequence.
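The primer layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FastPCR implementation: "penultimate base" is taken to mean the second base from the 3'-end of a forward primer, the primer is read as tail + 6-nucleotide spacer + allele-specific sequence, and the anti-primer is taken to be the reverse complement of the tail. All function names, the tail, and the spacer sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_primers(template, snp_index, alleles, core_len=20):
    """Build one forward primer core per allele.

    snp_index is the 0-based position of the SNP in template. Each core ends
    one base past the SNP, so the SNP is the penultimate base at the 3'-end.
    """
    end = snp_index + 2          # exclusive end: includes one base after the SNP
    start = end - core_len
    if start < 0 or end > len(template):
        raise ValueError("template too short for the requested primer length")
    primers = {}
    for allele in alleles:
        primers[allele] = (
            template[start:snp_index] + allele + template[snp_index + 1:end]
        )
    return primers


def tailed_primer(core, tail, spacer):
    """Full primer, 5' to 3': universal tail, 6-nt unique spacer, allele core."""
    if len(spacer) != 6:
        raise ValueError("spacer is expected to be 6 nucleotides long")
    return tail + spacer + core


def anti_primer(tail):
    """Sequence of the universal anti-primer, complementary to the tail.

    The Dark Quencher would be attached at the 3'-end of this sequence;
    the modification itself is not represented here.
    """
    return revcomp(tail)
```

Because both allele-specific primers share the same tail in this reading, a single anti-primer sequence serves both, and the 6-nucleotide spacer is the only tail-side element that differs between alleles.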

As a result of this work, new computer code was also developed for designing oligonucleotide sets for the specific detection of a target sequence and of allelic SNP variants by quantitative PCR with TaqMan and MGB probes and by microarray hybridization.

The computer code is implemented in the FastPCR software, written in Java, which is freely available at http://primerdigital.com/tools/.

During the current period of project implementation, 6 peer-reviewed articles acknowledging the project number (AP08855353) were published in international journals with an impact factor, including articles in which the project manager is the corresponding author. These journals are in the first and second quartiles (Q1 and Q2) of the Web of Science database and have a high CiteScore percentile in the Scopus database.

List of publications

  1. Belyayev A., Jandova M., Josefiova J., Kalendar R., Mahelka V., Mandak B., Krak K. The major satellite DNA families of the diploid Chenopodium album aggregate species: Arguments for and against the "library hypothesis" // PLoS One. 2020. Vol. 15, No. 10. P. e0241206. doi:10.1371/journal.pone.0241206. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7591062/ (WoS IF2020=3.24 Q2; Scopus CiteScore2020=5.3, 92nd percentile; SJR Q1)
  2. Erper I., Ozer G., Kalendar R., Avci S., Yildirim E., Alkan M., Turkkan M. Genetic diversity and pathogenicity of Rhizoctonia isolates associated with red cabbage in Samsun (Turkey) // J Fungi (Basel). 2021. Vol. 7, No. 3. doi:10.3390/jof7030234. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8004240/ (WoS IF2020=5.5 Q1; Scopus CiteScore2020=5.5, SJR2020=1.702, 88th percentile; SJR Q1)
  3. Kalendar R., Shustov A., Schulman A. Palindromic sequence-targeted (PST) PCR, version 2: an advanced method for high-throughput targeted gene characterization and transposon display // Frontiers in Plant Science. 2021. Vol. 12. P. 691940. doi:10.3389/fpls.2021.691940. https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC8258406/ (WoS IF2020=5.753 Q1; Scopus CiteScore2020=8.2, 95th percentile; SJR Q1)
  4. Kalendar R., Sabot F., Rodriguez F., Alix K., Natali L., Karlov G.I. Editorial: Mobile Elements and Plant Genome Evolution, Comparative Analyzes and Computational Tools // Frontiers in Plant Science. 2021. Vol. 12. P. 735134. doi:10.3389/fpls.2021.735134. https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC8500305/ (WoS IF2020=5.753 Q1; Scopus CiteScore2020=8.2, 95th percentile; SJR Q1)
  5. Kalendar R., Baidyussen A., Serikbay D., Zotova L., Khassanova G., Kuzbakova M., Kurishbayev A., Jatayev S., Hu Y-G., Schramm C., Anderson P.A., Jenkins C.L.D., Soole K.L., Shavrukov Y. Modified 'Allele-specific qPCR' method for SNP genotyping based on FRET // Frontiers in Plant Science. 2021. doi:10.3389/fpls.2021.747886. https://www.frontiersin.org/articles/10.3389/fpls.2021.747886/abstract (WoS IF2020=5.753 Q1; Scopus CiteScore2020=8.2, 95th percentile; SJR Q1)
  6. Khapilina O., Raiser O., Danilova A., Shevtsov V., Turzhanova A., Kalendar R. DNA profiling and assessment of genetic diversity of relict species Allium altaicum on the territory of Altai // PeerJ. 2021. Vol. 9. P. e10674. doi:10.7717/peerj.10674. https://peerj.com/articles/10674/ (WoS IF2020=2.379 Q2; Scopus CiteScore2020=3.8, SJR2020=0.927, 83rd percentile; SJR Q1)
  7. Kalendar R., Kospanova D., Schulman A. Transposon-based tagging in silico using FastPCR software // Methods in Molecular Biology. 2021. Vol. 2250. P. 245-256. doi:10.1007/978-1-0716-1134-0_23. https://pubmed.ncbi.nlm.nih.gov/33900610/ (Scopus CiteScore2020=2.2, SJR2020=0.711, 24th percentile; SJR Q3)
  8. Kalendar R. A guide to using FASTPCR software for PCR, in silico PCR, and oligonucleotide analysis // Methods in Molecular Biology. 2022. Vol. 2392. P. 223-243. doi:10.1007/978-1-0716-1799-1_16. https://pubmed.ncbi.nlm.nih.gov/34773626/ (Scopus CiteScore2020=2.2, SJR2020=0.711, 24th percentile; SJR Q3)