Repository of the code used in the Final Master Project

efigb/master

FINAL MASTER PROJECT

Master in Bioinformatics, International University of Valencia (VIU)
Barcelona Biomedical Genomics Lab (BBGLab) https://bbglab.irbbarcelona.org/
Institut de Recerca Biomèdica de Barcelona (IRB Barcelona)

This repository contains the code to reproduce the results of the final master project entitled "Estudio de la evolución tumoral en un paciente pediátrico" ("Study of tumor evolution in a pediatric patient").
Student: Elisabet Figuerola Bou
Supervisor: Mònica Sánchez Guixé
Academic tutor: Ángela Riffo
Course: 2023-2024

The following figure shows the workflow used to reproduce all the results. A blood sample and three tumor samples (tumor1 or melanoma; tumor2 or sarcoma-primary; tumor3 or sarcoma-metastasis) from a pediatric patient were sequenced by Whole Genome Sequencing at depths of 30X and 120X, respectively, yielding FASTQ files. These data were pre-processed with the nf-core sarek pipeline, following GATK practices, to obtain BAM files. Somatic mutations were called against the matched normal sample with two pipelines, the sarek pipeline and the DNA analysis workflow from Hartwig Medical Foundation as implemented in nf-core (nf-core oncoanalyser), to obtain the VCF files. Germline variants were called with GATK HaplotypeCaller, run through the sarek pipeline. Variants were annotated with the Variant Effect Predictor (VEP) tool from Ensembl. Input file creation, reading and processing of intermediate tables, and figure generation were performed with Jupyter notebooks, which are grouped into three main processes, each represented by a file icon in the figure (further details below).

[Figure: workflow diagram]
Created with Biorender.com


Organization of the repository and location of Figures:

1. Whole Genome Sequencing (WGS) Analysis wgs_analysis

  • Mapping of Sequencing Reads reads_mapping, contains data to run the Sarek pipeline and obtain the mapped reads in BAM file format.

  • Variant Calling variant_calling, contains data to run the nf-core sarek (sarek) and nf-core oncoanalyser (oncoanalyser) pipelines and obtain the corresponding variant files in VCF format.
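
As an illustration of the kind of input file these pipelines need, the sketch below writes a samplesheet in the CSV layout that nf-core sarek accepts for FASTQ input (columns patient, status, sample, lane, fastq_1, fastq_2, with status 0 for normal and 1 for tumor). The sample labels come from this README; the patient ID, the single-lane layout and the FASTQ paths are hypothetical placeholders, not the files actually used in the project.

```python
import csv
import io

# Sample labels come from the README; status 0 = normal, 1 = tumor.
SAMPLES = [
    ("blood", 0),
    ("tumor1", 1),
    ("tumor2", 1),
    ("tumor3", 1),
]


def build_samplesheet(patient, samples, fastq_dir):
    """Return samplesheet CSV text, one row per sample (single lane assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["patient", "status", "sample", "lane", "fastq_1", "fastq_2"])
    for sample, status in samples:
        writer.writerow([
            patient,
            status,
            sample,
            "lane_1",
            # Hypothetical file naming scheme.
            f"{fastq_dir}/{sample}_R1.fastq.gz",
            f"{fastq_dir}/{sample}_R2.fastq.gz",
        ])
    return buffer.getvalue()


# "patient1" and "/path/to/fastq" are placeholders.
samplesheet_text = build_samplesheet("patient1", SAMPLES, "/path/to/fastq")
print(samplesheet_text)
```

The resulting text can be saved to a file and passed to the pipeline as its input samplesheet.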

2. Variant Analysis variant_analysis

This section contains data related to the analysis of somatic and germline variants including filtering of variants, clonality and phylogenetic analyses.

2.1 Germline variants germline:
Figure 9B
Figure 9C

2.2 Somatic variants somatic:

  • consensus_mut:

    Figure 10
    Figure 11
    Figure 12A
    Figure 13

  • clonality:

    Figure 14
    Figure 15

  • phylogeny:

    Figure 16

3. Variant Annotation variant_annotation

This section contains the code to run Ensembl VEP: the code to split the variant tables into per-chromosome files, the code to write a QMap jobs file that parallelizes the jobs, and the final QMap files.
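
The splitting and job-listing steps described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the column name `#CHROM`, the output file names and the exact VEP options are assumptions, and the job lines are plain shell commands, one per chromosome, rather than a complete QMap jobs file.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_chromosome(table_path, out_dir, chrom_column="#CHROM"):
    """Write one tab-separated file per chromosome and return {chrom: path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows_by_chrom = defaultdict(list)
    with open(table_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames
        for row in reader:
            rows_by_chrom[row[chrom_column]].append(row)
    paths = {}
    for chrom, rows in rows_by_chrom.items():
        path = out_dir / f"variants_{chrom}.tsv"  # hypothetical naming scheme
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=header, delimiter="\t", lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(rows)
        paths[chrom] = path
    return paths


def vep_job_lines(chrom_paths, out_dir):
    """Build one VEP command line per chromosome file (options are assumptions)."""
    out_dir = Path(out_dir)
    return [
        f"vep --input_file {path} "
        f"--output_file {out_dir / (path.stem + '.vep.tsv')} "
        "--cache --offline --tab"
        for path in chrom_paths.values()
    ]


# Tiny demonstration on a throwaway table with made-up variants.
demo_dir = Path(tempfile.mkdtemp())
demo_table = demo_dir / "variants.tsv"
demo_table.write_text(
    "#CHROM\tPOS\tREF\tALT\n"
    "chr1\t100\tA\tT\n"
    "chr2\t200\tG\tC\n"
    "chr1\t300\tC\tG\n"
)
chrom_paths = split_by_chromosome(demo_table, demo_dir / "by_chrom")
job_lines = vep_job_lines(chrom_paths, demo_dir / "vep_out")
print("\n".join(job_lines))
```

Splitting by chromosome keeps each job small and independent, so a scheduler can run the per-chromosome commands in parallel.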

4. Variant Filtering variant_filtering

This section contains the filtering of the variants annotated in the previous section and the selection of variants that are protein damaging and reported as cancer drivers in intOGen.

Figure 17
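
A filter of this kind might look like the sketch below. It assumes VEP tab-style columns named `SYMBOL` and `Consequence`; the set of consequence terms treated as "protein damaging" is an assumption (the project's exact criteria are not stated here), and the gene names and variants are made-up placeholders, not project data or the real intOGen list.

```python
# Sequence Ontology terms used as a stand-in for "protein damaging".
# This selection is an assumption, not the project's documented criteria.
DAMAGING_CONSEQUENCES = {
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def is_damaging(consequence_field):
    """Return True if any of the comma-separated terms is in the damaging set."""
    terms = consequence_field.split(",")
    return any(term in DAMAGING_CONSEQUENCES for term in terms)


def select_driver_variants(rows, driver_genes):
    """Keep variants that are damaging and fall in a listed driver gene."""
    return [
        row
        for row in rows
        if is_damaging(row["Consequence"]) and row["SYMBOL"] in driver_genes
    ]


# Made-up annotated variants and a placeholder driver gene list.
annotated = [
    {"SYMBOL": "GENE_A", "Consequence": "missense_variant"},
    {"SYMBOL": "GENE_A", "Consequence": "synonymous_variant"},
    {"SYMBOL": "GENE_B", "Consequence": "stop_gained"},
    {"SYMBOL": "GENE_C", "Consequence": "splice_donor_variant,intron_variant"},
]
driver_genes = {"GENE_A", "GENE_C"}

selected = select_driver_variants(annotated, driver_genes)
for row in selected:
    print(row["SYMBOL"], row["Consequence"])
```

In practice the driver gene set would be read from a table downloaded from intOGen rather than typed in by hand.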

Conda environment:

tfm_environment.yml: This is the main conda environment of the project, used as the kernel in all Jupyter notebooks. Some analyses use other conda environments, which can be found in the corresponding directory.
