Master in Bioinformatics, International University of Valencia (VIU)
Barcelona Biomedical Genomics Lab (BBGLab) https://bbglab.irbbarcelona.org/
Institut de Recerca Biomèdica de Barcelona (IRB Barcelona)
This repository contains the code to reproduce the results of the final master's project (TFM) entitled "Estudio de la evolución tumoral en un paciente pediátrico" (Study of tumor evolution in a pediatric patient).
Student: Elisabet Figuerola Bou
Supervisor: Mònica Sánchez Guixé
Academic tutor: Ángela Riffo
Course: 2023-2024
The following figure summarizes the workflow used to reproduce all the results.
A blood sample and three tumor samples (tumor1 or melanoma; tumor2 or sarcoma-primary; tumor3 or sarcoma-metastasis) from a pediatric patient were sequenced by whole genome sequencing (WGS) at depths of 30X (blood) and 120X (tumors) to obtain FASTQ files. These data were pre-processed with the nf-core sarek pipeline following GATK best practices to obtain BAM files. Somatic mutations were called against the matched normal sample with the sarek pipeline and with the Hartwig Medical Foundation DNA analysis workflow implemented in nf-core (nf-core oncoanalyser), to obtain VCF files. Germline variants were called with GATK HaplotypeCaller within the sarek pipeline. Variants were annotated with the Ensembl Variant Effect Predictor (VEP). Creating all input files, reading and processing intermediate tables, and generating the figures were done with Jupyter notebooks, which are grouped into three main processes represented by file icons (details below).
Created with Biorender.com
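Much of the notebook work consists of reading the VCF files produced by the callers into tables. As a hedged illustration only (not the project's own code), the sketch below parses one VCF data line with the standard library and computes a sample's variant allele frequency (VAF) from its AD (allelic depth) FORMAT field. It assumes a biallelic record whose FORMAT column contains AD, which not every caller guarantees; the sample names in the example are hypothetical.

```python
from typing import Dict, List


def parse_vcf_line(line: str, sample_names: List[str]) -> Dict[str, object]:
    """Parse one tab-separated VCF data line into a dictionary.

    Assumes the 9 fixed columns (CHROM..FORMAT) followed by one column
    per sample, in the order given by `sample_names`.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, _info, fmt = fields[:9]
    format_keys = fmt.split(":")
    samples = {}
    for name, raw in zip(sample_names, fields[9:]):
        # Pair each FORMAT key (e.g. GT, AD) with this sample's value
        samples[name] = dict(zip(format_keys, raw.split(":")))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "samples": samples}


def vaf_from_ad(sample_fields: Dict[str, str]) -> float:
    """Compute VAF = alt reads / (ref + alt reads) from the AD field.

    Assumes a biallelic site, so AD holds exactly two comma-separated counts.
    """
    ref_count, alt_count = (int(x) for x in sample_fields["AD"].split(","))
    total = ref_count + alt_count
    return alt_count / total if total else 0.0


if __name__ == "__main__":
    record = parse_vcf_line(
        "chr1\t12345\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/0:30,0\t0/1:90,30",
        ["normal", "tumor1"],
    )
    print(vaf_from_ad(record["samples"]["tumor1"]))
```

In the example, tumor1 has 30 alternate reads out of 120, giving a VAF of 0.25.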
1. Whole Genome Sequencing (WGS) Analysis (wgs_analysis)
- Mapping of Sequencing Reads (reads_mapping): contains the data to run the nf-core sarek pipeline and obtain the mapped reads in BAM format.
- Variant Calling (variant_calling): contains the data to run the nf-core sarek (sarek) and nf-core oncoanalyser (oncoanalyser) pipelines and obtain the corresponding variant files in VCF format.
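nf-core sarek takes a CSV samplesheet as input, and sarek pairs tumor and normal samples of the same patient for somatic calling. A minimal sketch of how such a file could be generated for this design (one normal, three tumors) is shown below. The columns patient, status, sample, lane, fastq_1 and fastq_2 follow the nf-core sarek usage documentation (status 0 = normal, 1 = tumor; the optional sex column is omitted). The patient ID, the "blood" sample name, the lane label and the FASTQ paths are hypothetical placeholders, not the project's files.

```python
import csv
import io

# Hypothetical layout: one blood (normal) sample and three tumors.
# tumor1..tumor3 follow this README; "blood", lane and paths are placeholders.
SAMPLES = [
    ("blood", 0),
    ("tumor1", 1),
    ("tumor2", 1),
    ("tumor3", 1),
]


def build_sarek_samplesheet(patient: str, fastq_dir: str) -> str:
    """Return a sarek-style CSV samplesheet as a string (one lane per sample)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["patient", "status", "sample", "lane", "fastq_1", "fastq_2"])
    for sample, status in SAMPLES:
        writer.writerow([
            patient,
            status,    # 0 = normal, 1 = tumor
            sample,
            "lane_1",  # hypothetical lane label
            f"{fastq_dir}/{sample}_R1.fastq.gz",
            f"{fastq_dir}/{sample}_R2.fastq.gz",
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_sarek_samplesheet("patient1", "/path/to/fastq"), end="")
```

Building the sheet as a string keeps the function easy to check in a notebook before writing it to disk.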
2. Variant Analysis (variant_analysis)
This section covers the analysis of somatic and germline variants, including variant filtering, clonality analysis and phylogenetic analysis.
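The somatic analysis combines calls from more than one pipeline (consensus_mut). The exact consensus rule used in the project is not stated here, so the following is only a hypothetical sketch of one common approach: identify each variant by (chromosome, position, reference, alternate) and keep it when at least `min_callers` distinct callers report it. The caller names in the example are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


def consensus_variants(calls: Dict[str, Iterable[VariantKey]],
                       min_callers: int = 2) -> Set[VariantKey]:
    """Return variants reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for variants in calls.values():
        # set() so that a caller reporting the same variant twice counts once
        support.update(set(variants))
    return {key for key, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    calls = {
        "caller_a": [("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")],
        "caller_b": [("chr1", 100, "C", "T")],
        "caller_c": [("chr3", 300, "A", "G")],
    }
    print(consensus_variants(calls))
```

Only chr1:100 C>T is supported by two callers, so it is the only variant kept with the default threshold.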
2.1 Germline variants (germline): Figure 9B, Figure 9C
2.2 Somatic variants (somatic):
- consensus_mut: Figure 10, Figure 11, Figure 12A, Figure 13
- clonality: Figure 14, Figure 15
- phylogeny: Figure 16
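Clonality analyses usually convert VAFs into cancer cell fractions (CCF) after correcting for tumor purity and local copy number. The sketch below uses a widely used formula, CCF = VAF × (p·CN_t + 2·(1 − p)) / (p·m), where p is the tumor purity, CN_t the tumor copy number at the locus and m the number of mutated copies (multiplicity). It assumes a diploid normal genome and is not necessarily the method implemented in the clonality notebooks; the input values in the example are hypothetical.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: float,
                         multiplicity: float = 1.0) -> float:
    """Estimate the cancer cell fraction (CCF) of a mutation.

    Assumes a diploid normal genome (copy number 2) and that the mutation
    is carried on `multiplicity` copies in the tumor cells harbouring it.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity <= 0:
        raise ValueError("multiplicity must be positive")
    # Average number of copies of the locus per cell in the tumor/normal mixture
    total_copies = purity * tumor_cn + 2 * (1 - purity)
    return vaf * total_copies / (purity * multiplicity)


if __name__ == "__main__":
    # Hypothetical values: VAF 0.25 at a diploid locus in a 50% pure sample
    print(cancer_cell_fraction(0.25, purity=0.5, tumor_cn=2))
```

With these values the CCF is 1.0, i.e. the mutation would be consistent with being clonal; values well below 1 suggest a subclonal mutation.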
3. Variant Annotation (variant_annotation)
This section contains the code to run Ensembl VEP: the code that splits the variant tables into per-chromosome files, the code that writes a QMap jobs file to run the annotation jobs in parallel, and the resulting QMap files.
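A hedged sketch of these two steps, not the project's code: group a tab-separated variant table by chromosome, write one file per chromosome, and build a jobs file with one VEP command per file. The VEP flags used (-i, -o, --offline, --cache, --assembly, --tab) are standard VEP options, but the GRCh38 assembly and the command template are assumptions that must match the local installation. The [params]/[jobs] section layout is likewise an assumption about the QMap jobs file format and should be checked against the QMap documentation.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def split_rows_by_chromosome(lines: Iterable[str], chrom_col: int = 0) -> Dict[str, List[str]]:
    """Group tab-separated variant lines by chromosome, skipping '#' header lines."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t")[chrom_col]
        groups[chrom].append(line.rstrip("\n"))
    return dict(groups)


def write_chromosome_files(groups: Dict[str, List[str]], out_dir: str) -> Dict[str, str]:
    """Write one <chrom>.tsv file per chromosome and return chrom -> file path."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for chrom, rows in groups.items():
        path = os.path.join(out_dir, f"{chrom}.tsv")
        with open(path, "w") as handle:
            handle.write("\n".join(rows) + "\n")
        paths[chrom] = path
    return paths


# Hypothetical VEP command: flags, cache and assembly must match the local installation.
VEP_TEMPLATE = "vep -i {inp} -o {out} --offline --cache --assembly GRCh38 --tab"


def qmap_jobs_text(chrom_files: Dict[str, str], cores: int = 1, memory: str = "8G") -> str:
    """Build a jobs file with one VEP command per chromosome file.

    The [params]/[jobs] layout is an assumption about the QMap format.
    """
    lines = ["[params]", f"cores = {cores}", f"memory = {memory}", "", "[jobs]"]
    for chrom in sorted(chrom_files):
        inp = chrom_files[chrom]
        out = os.path.splitext(inp)[0] + ".vep.tsv"
        lines.append(VEP_TEMPLATE.format(inp=inp, out=out))
    return "\n".join(lines) + "\n"
```

Keeping the splitting and the jobs-file text as pure functions makes both easy to inspect in a notebook before any job is submitted.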
4. Variant Filtering (variant_filtering)
This section contains the filtering of the variants annotated in the previous section and the selection of protein-damaging variants in cancer driver genes from intOGen.
Figure 17
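A hypothetical sketch of the selection step: keep annotated variants that have at least one protein-affecting consequence and fall in a gene from an intOGen driver list. The consequence terms are Sequence Ontology terms reported by VEP, and 'SYMBOL' and 'Consequence' are VEP output columns, but which terms count as "protein damaging" here, and the gene list in the example, are assumptions rather than the project's criteria.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumed set of protein-affecting Sequence Ontology terms (as reported by VEP).
DAMAGING_CONSEQUENCES = {
    "stop_gained", "stop_lost", "start_lost", "frameshift_variant",
    "splice_acceptor_variant", "splice_donor_variant", "missense_variant",
    "inframe_insertion", "inframe_deletion",
}


def select_driver_variants(rows: Iterable[Dict[str, str]],
                           driver_genes: Set[str]) -> List[Dict[str, str]]:
    """Keep variants with at least one damaging consequence in a driver gene.

    Assumes each row has 'SYMBOL' (gene) and 'Consequence' columns; multiple
    consequences may be separated by ',' (tabular output) or '&' (VCF CSQ).
    """
    selected = []
    for row in rows:
        terms = set(re.split(r"[,&]", row.get("Consequence", "")))
        if row.get("SYMBOL") in driver_genes and terms & DAMAGING_CONSEQUENCES:
            selected.append(row)
    return selected
```

For example, with the driver set {"TP53", "NF1"}, a TP53 missense variant is kept, while a TP53 synonymous variant and a stop_gained variant in a gene outside the set are dropped.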
tfm_environment.yml: the main conda environment of this project, used as the kernel for all Jupyter notebooks.
Some analyses use additional conda environments, which can be found in the corresponding directories.