Computational methods for collateral mutations analysis
Project goal: This project aims to develop and implement computational pipelines to identify and characterize collateral mutations and consists of three complementary objectives: (i) collection and analysis of NGS/ONT whole-genome sequencing datasets; and new software packages for analysis of (ii) collateral mutations and (iii) evolution of the integral characteristics of genomes.
Project description: Whole-genome sequencing is one of the main drivers of genetics research; it produces huge amounts of digital data and poses challenging tasks for efficient computer-aided data processing and analysis. This project aims to develop and implement computational pipelines to identify and characterize collateral mutations and consists of three complementary objectives: (i) collection and analysis of NGS/ONT whole-genome sequencing datasets; and new software packages for analysis of (ii) collateral mutations and (iii) evolution of the integral characteristics of genomes. The project will focus on studies of collateral mutations in cancer, especially skin cancers associated with high UV radiation exposure, and on mutational signatures associated with patients' DNA repair deficiency, especially Xeroderma Pigmentosum group. Our preliminary results show that clustered mutations provide reliable and intriguing information on the underlying biochemical mechanisms of mutational processes and may help to develop new approaches to protection from sunlight and recommendations for the prediction, prevention, and treatment of skin cancer. This proposal sets out new bioinformatics tasks in the analysis of cancer collateral mutations and in efficient data processing, from high-throughput processing of whole-genome sequencing data to computational analysis of the evolution of the integral characteristics of genomes. This project continues a long-term collaboration between National Laboratory Astana, Nazarbayev University, Kazakhstan, and Gustave Roussy Cancer Campus, France, and starts a research collaboration with the School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland.
At the international level, we emphasize that our group is the first to (i) explore genome-wide mutational patterns in Xeroderma Pigmentosum group C patients beyond cutaneous malignancies (in non-skin cancers) and (ii) develop original methodology and software packages to analyze and simulate clustered mutations in cancer. The first results were published in 2020 in the high-impact journal Nature Communications. Our bioinformatics software package is available to the research community in a GitHub code repository, https://github.com/genkvg/clusmut. The most important advantage of the proposed project is close access to Gustave Roussy, a leading cancer center in Europe, with a unique opportunity to collect and analyze sequencing data from patients with rare genetic disorders of DNA repair. Cutting-edge technologies, namely whole-genome sequencing with modified base detection, allow us to perform quantitative and qualitative assessments of mutagenesis and other types of genomic instability at whole-genome scale. The study will yield an improved genome-wide understanding of UV mutagenesis, based on efficient computational data processing and analysis, which would inform recommendations for protection from sunlight and for the prevention of skin cancer and sunburn. This will be particularly relevant for patients with Xeroderma Pigmentosum group V, who have a 1,000-fold increased risk of skin cancer compared to the general population. This study may provide insights for new recommendations for the prediction, prevention, and treatment of skin cancer and may open new perspectives on protection from sunlight. This project is interdisciplinary fundamental research; all results will be published in research journals, preferably with an open access option. Commercialization of project results is not expected.
Project facilitators: PI: B. Matkarimov,
Y. Baiken,
Z. Ramazanova,
A. Imashev,
A. Adilkhanov,
B. Nygmetzhanov,
C. Shakenov,
D. Kontsevoi,
M. Kolyvanova
Project partners: Gustave Roussy Cancer Campus, France
Realisation period: 2024-2026
Expected results:
- 30+ complete datasets for human samples, sequenced with both Illumina and Oxford Nanopore Technologies, with raw data;
- Oxford Nanopore sequencing dataset with base modification annotations on raw data;
- Transformer architecture model for Oxford Nanopore base calling, fine-tuned for modified base calling;
- Processing and analysis of 200+ samples from external datasets and 30+ samples sequenced in Kazakhstan and at partner sites;
- Signatures of collateral mutations for skin cancers;
- Estimates of mutation enrichment near putative and observed UV lesions;
- Statistics of regular n-mer patterns in the genomes of 20+ model organisms;
- Statistics and classification of observed mutations for processed whole-genome datasets;
- Mathematical model of the evolution of the genome's integral characteristics.
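As an illustration of the n-mer statistics listed among the expected results, the sketch below counts overlapping n-mers in a DNA sequence and converts the counts to frequencies. It is a minimal example under stated assumptions, not the project's software: the function names `count_nmers` and `nmer_frequencies` and the toy sequence are hypothetical, and skipping any window that contains a non-ACGT symbol is only one possible convention.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_nmers(sequence: str, n: int) -> Counter:
    """Count overlapping n-mers, skipping windows with non-ACGT symbols.

    Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        # Ambiguous bases such as N make the n-mer undefined; skip them.
        if set(window) <= VALID_BASES:
            counts[window] += 1
    return counts


def nmer_frequencies(counts: Counter) -> dict:
    """Convert raw n-mer counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {nmer: c / total for nmer, c in counts.items()}


# Toy sequence (not real data): two ambiguous bases interrupt the sequence.
toy = "ACGTACGTNNACG"
counts = count_nmers(toy, 3)
freqs = nmer_frequencies(counts)
for nmer, c in sorted(counts.items()):
    print(nmer, c, round(freqs[nmer], 3))
```

For whole genomes the same counting would normally be done per chromosome on streamed input rather than on a single in-memory string.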
Methodology: Data collection and analysis, programming, and the development and implementation of algorithms and computational methods for the analysis and simulation of collateral mutations.
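One simple way to flag closely spaced mutations in this kind of analysis is to group mutations on the same chromosome whose neighbouring positions lie within a fixed distance of each other. The sketch below shows that idea only; it is not the method implemented in the clusmut package. The function name `find_clusters`, the `max_gap` and `min_size` parameters and their default values, and the toy coordinates are all assumptions made for illustration.

```python
def find_clusters(mutations, max_gap=10, min_size=2):
    """Group mutations into clusters by inter-mutation distance.

    mutations: iterable of (chromosome, position) tuples.
    max_gap:   largest allowed distance between neighbouring mutations
               in one cluster (hypothetical default, not a recommended value).
    min_size:  smallest number of mutations reported as a cluster.
    """
    clusters = []
    current = []
    # Sorting orders mutations by chromosome name, then by position.
    for chrom, pos in sorted(mutations):
        same_chrom = bool(current) and chrom == current[-1][0]
        if same_chrom and pos - current[-1][1] <= max_gap:
            current.append((chrom, pos))
        else:
            # Close the previous group and start a new one.
            if len(current) >= min_size:
                clusters.append(current)
            current = [(chrom, pos)]
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Toy coordinates (not real data).
toy_mutations = [
    ("chr1", 100), ("chr1", 104), ("chr1", 5000), ("chr1", 5020),
    ("chr2", 101), ("chr2", 109), ("chr2", 115),
]
clusters = find_clusters(toy_mutations, max_gap=10)
for cluster in clusters:
    print(cluster[0][0], [pos for _, pos in cluster])
```

A fixed distance threshold is the crudest possible choice; a real analysis would typically compare observed inter-mutation distances with those expected under a background model before calling a cluster.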
Contacts: Bakhyt Matkarimov, bmatkarimov@nu.edu.kz