Merge pull request #564 from genomic-medicine-sweden/develop
chore: dev to master
monikaBrandt authored Jan 20, 2025
2 parents 14656e2 + 8dc9563 commit e1b04e9
Showing 5 changed files with 50 additions and 16 deletions.
6 changes: 3 additions & 3 deletions docs/dna_cnvs.md
@@ -170,7 +170,7 @@ CNV regions that overlap with clinically relevant genes for amplifications ([`cn
</table>

## CNV filtering
-Filtering of the CNV amplifications and deletions is performed by the [filtering hydra-genetics module](https://filtering.readthedocs.io/en/latest/).
+Filtering of the CNV amplifications and deletions is performed by the [filtering hydra-genetics module](https://hydra-genetics-filtering.readthedocs.io/en/latest/).

### Amplification filtering
Genes and filtering criteria specified in `config_hard_filter_cnv_amp.yaml` are listed below:
@@ -284,7 +284,7 @@ For more information, see the [hydra-genetics/reports documentation](https://hyd


## Germline vcf
-The germline vcf used by CNVkit, Jumble, and the CNV html report is based on the [VEP annotated vcf](dna_snv_indels.md#vep) file from the SNV and INDEL calling. Annotated vcfs are first hard filtered by removing blacklisted regions with noisy germline VAFs in normal samples, and then filtered by a number of criteria described below. See the [filtering hydra-genetics module](https://filtering.readthedocs.io/en/latest/) for additional information.
+The germline vcf used by CNVkit, Jumble, and the CNV html report is based on the [VEP annotated vcf](dna_snv_indels.md#vep) file from the SNV and INDEL calling. Annotated vcfs are first hard filtered by removing blacklisted regions with noisy germline VAFs in normal samples, and then filtered by a number of criteria described below. See the [filtering hydra-genetics module](https://hydra-genetics-filtering.readthedocs.io/en/latest/) for additional information.

### Exclude exonic regions
Use **[bcftools filter -T](https://samtools.github.io/bcftools/bcftools.html)** v1.15 to exclude variants overlapping blacklisted regions defined in a bed file.
@@ -295,7 +295,7 @@ Use **[bcftools filter -T](https://samtools.github.io/bcftools/bcftools.html)**
* [Bed file](references.md#bcftools_filter_exclude_region) with blacklisted regions
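
For orientation, a minimal command-line sketch of this exclusion step is shown below. It is not the pipeline's actual rule and the file names are placeholders; only the `-T ^regions.bed` exclusion syntax is the point being illustrated.

```bash
# Sketch only: exclude variants falling in blacklisted regions.
# The ^ prefix to -T/--targets-file means "exclude these regions" rather than "keep only these".
bcftools filter \
    -T ^blacklisted_regions.bed \
    -O z \
    -o sample_T.germline.exclude_blacklist.vcf.gz \
    sample_T.vep_annotated.vcf.gz

# Index the filtered vcf so downstream tools can use it.
tabix -p vcf sample_T.germline.exclude_blacklist.vcf.gz
```
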

### Filter vcf
-The germline vcf file is filtered using the **[hydra-genetics filtering](https://filtering.readthedocs.io/en/latest/)** functionality included in v0.15.0.
+The germline vcf file is filtered using the **[hydra-genetics filtering](https://hydra-genetics-filtering.readthedocs.io/en/latest/)** functionality included in v0.15.0.

### Configuration
The filters are specified in the config file `config_hard_filter_germline.yaml` and consist of the following filters:
6 changes: 3 additions & 3 deletions docs/dna_snv_indels.md
@@ -1,5 +1,5 @@
# SNV and INDEL calling, annotation and filtering
-See the [snv_indels hydra-genetics module](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) documentation for more details on the software used for variant calling, the [annotation hydra-genetics module](https://annotation.readthedocs.io/en/latest/) for annotation, and the [filtering hydra-genetics module](https://filtering.readthedocs.io/en/latest/) for filtering. Default hydra-genetics settings/resources are used if no configuration is specified.
+See the [snv_indels hydra-genetics module](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) documentation for more details on the software used for variant calling, the [annotation hydra-genetics module](https://hydra-genetics-annotation.readthedocs.io/en/latest/) for annotation, and the [filtering hydra-genetics module](https://hydra-genetics-filtering.readthedocs.io/en/latest/) for filtering. Default hydra-genetics settings/resources are used if no configuration is specified.

<br />
![dag plot](images/snv.png)
@@ -87,7 +87,7 @@ Variant vcf files from the two callers are ensembled into one vcf file using **[
| sort_order | --names vardict, gatk_mutect2 | priority order for retaining variant information |

## Annotation
-The ensembled vcf file is first annotated using VEP, followed by artifact annotation and background annotation. See the [annotation hydra-genetics module](https://annotation.readthedocs.io/en/latest/) for additional information.
+The ensembled vcf file is first annotated using VEP, followed by artifact annotation and background annotation. See the [annotation hydra-genetics module](https://hydra-genetics-annotation.readthedocs.io/en/latest/) for additional information.

### VEP
The ensembled vcf file is annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v105. VEP adds a plethora of information for each variant, as specified by the configuration flags listed below. Of note are --pick, which picks only one representative transcript for each variant, --af_gnomad, which adds gnomAD germline allele frequencies, and --cache, which uses a local copy of the databases for better performance. See [VEP options](https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html) for more information.
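
As a rough illustration of the flags discussed above, a hand-written VEP call could look like the sketch below. This is not the exact command generated by the pipeline; the input, output, cache, and reference paths are placeholders, and flags other than --pick, --af_gnomad, and --cache are included only to make the example self-contained.

```bash
# Sketch of a VEP v105 invocation using the flags highlighted above.
vep \
    --input_file sample_T.ensembled.vcf.gz \
    --output_file sample_T.vep_annotated.vcf.gz \
    --vcf \
    --compress_output bgzip \
    --cache \
    --offline \
    --dir_cache /path/to/vep_cache \
    --fasta /path/to/reference.fasta \
    --pick \
    --af_gnomad
```

Running with --cache and --offline keeps all lookups local, which is what gives the better performance mentioned above.
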
@@ -161,7 +161,7 @@ Example annotation for one variant added to a vcf file in the INFO field:
* [Panel of Normal](references.md#background_db) with position specific background information

## Filtering
-Annotated vcfs are first hard filtered by removing regions outside exons, and then filtered by a number of criteria described below. See the [filtering hydra-genetics module](https://filtering.readthedocs.io/en/latest/) for additional information. A soft-filtered version of the exonic regions is also provided for development and other investigations.
+Annotated vcfs are first hard filtered by removing regions outside exons, and then filtered by a number of criteria described below. See the [filtering hydra-genetics module](https://hydra-genetics-filtering.readthedocs.io/en/latest/) for additional information. A soft-filtered version of the exonic regions is also provided for development and other investigations.

### Extract exonic regions
Use **[bcftools filter -R](https://samtools.github.io/bcftools/bcftools.html)** v1.15 to extract variants overlapping exonic regions (including 20 bp padding) defined in a bed file, which is a subset of the general design bed file.
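
A minimal sketch of this extraction is shown below; the bed file and vcf names are placeholders, as the real paths come from the pipeline configuration.

```bash
# Keep only variants overlapping the padded exonic regions listed in the bed file.
# -R/--regions-file requires the input vcf to be bgzipped and indexed (.tbi).
bcftools filter \
    -R design_exons_pad20bp.bed \
    -O z \
    -o sample_T.annotated.exonic.vcf.gz \
    sample_T.annotated.vcf.gz

tabix -p vcf sample_T.annotated.exonic.vcf.gz
```
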
4 changes: 2 additions & 2 deletions docs/running.md
@@ -84,7 +84,7 @@ PROJECT_REF_DATA: "PATH_TO/design_and_ref_files" # parent folder for ref_data, e
```
## Input sample files
-The pipeline uses sample input files (`samples.tsv` and `units.tsv`) containing sample information, sequencing meta information, and the location of the fastq files. Specifications for the input files can be found at [Twist Solid schemas](https://github.com/genomic-medicine-sweden/Twist_Solid/blob/develop/workflow/schemas/). Using the python virtual environment created above, these files can be generated automatically with [hydra-genetics create-input-files](https://hydra-genetics.readthedocs.io/en/latest/create_sample_files/):
+The pipeline uses sample input files (`samples.tsv` and `units.tsv`) containing sample information, sequencing meta information, and the location of the fastq files. Specifications for the input files can be found at [Twist Solid schemas](https://github.com/genomic-medicine-sweden/Twist_Solid/blob/develop/workflow/schemas/). Using the python virtual environment created above, these files can be generated automatically with [hydra-genetics create-input-files](https://hydra-genetics.readthedocs.io/en/latest/run_pipeline/create_sample_files/):
```bash
hydra-genetics create-input-files -d path/to/fastq-files/
```
@@ -95,7 +95,7 @@ Using the activated python virtual environment created above, this is a basic co
snakemake --profile profiles/NAME_OF_PROFILE -s workflow/Snakefile
```
<br />
-There are many additional [snakemake running options](https://snakemake.readthedocs.io/en/stable/executing/cli.html#), some of which are listed below and illustrated in the example after the list. However, options that are always used should be put in the [profile](https://hydra-genetics.readthedocs.io/en/latest/profile/).
+There are many additional [snakemake running options](https://snakemake.readthedocs.io/en/stable/executing/cli.html#), some of which are listed below and illustrated in the example after the list. However, options that are always used should be put in the [profile](https://hydra-genetics.readthedocs.io/en/latest/run_pipeline/profile/).

* --notemp - Saves all intermediate files. Good for development and testing different options.
* --until <rule> - Runs the workflow only up to and including the specified rule.
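
As an example of combining these flags with the profile, the command below keeps intermediate files and stops after the `annotation_vep_wo_pick` rule (one of the rules defined in this pipeline's Snakefile); any other rule name can be substituted.

```bash
# Example invocation: keep intermediates and stop after a chosen rule.
snakemake --profile profiles/NAME_OF_PROFILE -s workflow/Snakefile \
    --notemp \
    --until annotation_vep_wo_pick
```
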
10 changes: 5 additions & 5 deletions docs/setup.md
@@ -8,17 +8,17 @@ There are a number of main files that governs how the pipeline is executed liste
* profile/uppsala/config.yaml
* samples.tsv and units.tsv

-There is more general information about the content of these files in the hydra-genetics documentation under [code standards](https://hydra-genetics.readthedocs.io/en/latest/standards/), [config](https://hydra-genetics.readthedocs.io/en/latest/config/) and [Snakefile](https://hydra-genetics.readthedocs.io/en/latest/import/).
+There is more general information about the content of these files in the hydra-genetics documentation under [code standards](https://hydra-genetics.readthedocs.io/en/latest/development/standards/), [config](https://hydra-genetics.readthedocs.io/en/latest/make_pipeline/config/) and [Snakefile](https://hydra-genetics.readthedocs.io/en/latest/make_pipeline/import/).

## Snakefile
The `Snakefile` is located in workflow/ and imports hydra-genetics modules and rules, modifying these rules when needed. It also imports pipeline-specific rules and defines rule order. Finally, this is where the `all` rule is defined.

## common.smk
-The `common.smk` file is located under workflow/rules/. It is a general rule file taking care of any actions that are not directly connected to running a specific program: version checks, import of config, resources and tsv-files, and validation using schemas. Functions used by pipeline-specific rules are also defined here, as well as the output files, via the function **compile_output_list**, which programmatically generates a list of all necessary output files to be targeted by the `all` rule defined in the `Snakefile`. See further [Result files](https://hydra-genetics.readthedocs.io/en/latest/results/).
+The `common.smk` file is located under workflow/rules/. It is a general rule file taking care of any actions that are not directly connected to running a specific program: version checks, import of config, resources and tsv-files, and validation using schemas. Functions used by pipeline-specific rules are also defined here, as well as the output files, via the function **compile_output_list**, which programmatically generates a list of all necessary output files to be targeted by the `all` rule defined in the `Snakefile`. See further [Result files](https://hydra-genetics.readthedocs.io/en/latest/make_pipeline/results/).

## config.yaml
The `config.yaml` is located under config/. The file ties all file and other dependencies as well as parameters for different rules together.
-See further [pipeline configuration](https://hydra-genetics.readthedocs.io/en/latest/config/).
+See further [pipeline configuration](https://hydra-genetics.readthedocs.io/en/latest/make_pipeline/config/).

<br />

@@ -30,7 +30,7 @@ See further [pipeline configuration](https://hydra-genetics.readthedocs.io/en/la


## resources.yaml
-The `resources.yaml` is located under config/. The file declares default resources used by rules as well as resources for specific rules that need more resources than allocated by default. See further [pipeline configuration](https://hydra-genetics.readthedocs.io/en/latest/config/).
+The `resources.yaml` is located under config/. The file declares default resources used by rules as well as resources for specific rules that need more resources than allocated by default. See further [pipeline configuration](https://hydra-genetics.readthedocs.io/en/latest/make_pipeline/config/).

```yaml
# ex, default resources
@@ -78,7 +78,7 @@ default-resources: [threads=1, time="04:00:00", partition="low", mem_mb="3074",
```
## samples.tsv and units.tsv
-The `samples.tsv` and `units.tsv` are input files that must be generated before running the pipeline and should in general be located in the base of the analysis folder; this location can be changed in the config.yaml. See further [running the pipeline](running.md) and [create input files](https://hydra-genetics.readthedocs.io/en/latest/create_sample_files/).
+The `samples.tsv` and `units.tsv` are input files that must be generated before running the pipeline and should in general be located in the base of the analysis folder; this location can be changed in the config.yaml. See further [running the pipeline](running.md) and [create input files](https://hydra-genetics.readthedocs.io/en/latest/run_pipeline/create_sample_files/).

### Example samples.tsv

40 changes: 37 additions & 3 deletions workflow/Snakefile
@@ -53,7 +53,12 @@ use rule bcftools_id_snps as bcftools_id_snps_dna with:

module prealignment:
snakefile:
get_module_snakefile(config, "hydra-genetics/prealignment", path="workflow/Snakefile", tag="v1.0.0")
get_module_snakefile(
config,
"hydra-genetics/prealignment",
path="workflow/Snakefile",
tag="v1.0.0",
)
config:
config

@@ -266,7 +271,10 @@ use rule vep from annotation as annotation_vep_wo_pick with:
log:
"{file}.vep_annotated_wo_pick.vcf.log",
benchmark:
repeat("{file}.vep_annotated_wo_pick.vcf.benchmark.tsv", config.get("vep_wo_pick", {}).get("benchmark_repeats", 1))
repeat(
"{file}.vep_annotated_wo_pick.vcf.benchmark.tsv",
config.get("vep_wo_pick", {}).get("benchmark_repeats", 1),
)


use rule bcftools_annotate from annotation as annotation_bcftools_annotate_purecn with:
@@ -321,7 +329,7 @@ use rule * from qc exclude all as qc_*
use rule multiqc from qc as qc_multiqc with:
output:
html=temp("qc/multiqc/multiqc_{report}.html"),
data=temp(directory("qc/multiqc/multiqc_{report}_data")),
data=directory("qc/multiqc/multiqc_{report}_data"),
data_json="qc/multiqc/multiqc_{report}_data/multiqc_data.json",


@@ -676,6 +684,32 @@ use rule purecn from cnv_sv as cnv_sv_purecn with:
unpack(cnv_sv.get_purecn_inputs),
vcf="cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.gz",
tbi="cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.gz.tbi",
output:
csv="cnv_sv/purecn/temp/{sample}_{type}/{sample}_{type}.csv",
outdir=directory("cnv_sv/purecn/temp/{sample}_{type}/"),


use rule purecn_coverage from cnv_sv as cnv_sv_purecn_coverage with:
output:
purecn=expand(
"cnv_sv/purecn_coverage/{{sample}}_{{type}}{ext}",
ext=[
"_coverage.txt.gz",
"_coverage_loess.txt.gz",
"_coverage_loess.png",
"_coverage_loess_qc.txt",
],
),


use rule purecn_copy_output from cnv_sv as cnv_sv_purecn_copy_output with:
output:
files="cnv_sv/purecn/{sample}_{type}{suffix}",


use rule purecn_purity_file from cnv_sv as cnv_sv_purecn_purity_file with:
output:
purity="cnv_sv/purecn_purity_file/{sample}_{type}.purity.txt",


module reports:
