Snakemake Workflows


The Snakemake workflow management system is a tool for creating reproducible and scalable data analyses.

The Snakemake language extends Python with syntactic structures for rule definition and additional controls. Each added structure begins with a keyword followed by a code block, which is either on the same line or indented over multiple lines. The resulting syntax resembles that of native Python constructs.

A Snakemake workflow is defined by specifying rules in a Snakefile. The rules decompose the workflow into small steps by specifying how to create sets of output files from sets of input files. Snakemake automatically determines the dependencies between rules by matching file names.
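For instance, here is a minimal, self-contained Snakefile (a sketch, not part of this tutorial's pipeline) in which two steps are linked purely by file names; the {sample} wildcard makes the rule general:

rule all:
    input:
        'results/example.sorted.bam'

# Snakemake matches 'results/example.sorted.bam' against the output pattern of
# sort_bam, so it infers that 'results/example.bam' is needed first.
rule sort_bam:
    input:
        'results/{sample}.bam'
    output:
        'results/{sample}.sorted.bam'
    shell:
        'samtools sort -o {output} {input}'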

This tutorial presents a bioinformatics pipeline using Snakemake and :ref:`the Reproducibility Framework (RF) <Reproducibility-Framework>`.

It uses the two classes of L1-associated somatic variants in the human brain from a Salk Institute for Biological Studies dataset.

Note

BioProject: PRJEB10849, SRA Study: ERP012147.

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=ERR1016570

Pipeline

Let's create the pipeline directory structure for this tutorial.

[userid@login03 ~]$ mkdir -p pipeline/_h
[userid@login03 ~]$ mkdir -p pipeline/cutadapt/_h
[userid@login03 ~]$ mkdir -p pipeline/cutadapt/bwamem/_h
[userid@login03 ~]$ mkdir -p pipeline/genome/hg19/_h
[userid@login03 ~]$ mkdir -p pipeline/genome/hg19/bwa/_h
[userid@login03 ~]$ mkdir -p pipeline/genome/hg19/chromsizes/_h
[userid@login03 ~]$ mkdir -p pipeline/cutadapt/bwamem/rmdup/_h
[userid@login03 ~]$ mkdir -p pipeline/cutadapt/bwamem/rmdup/tags/_h
[userid@login03 ~]$ mkdir -p pipeline/cutadapt/bwamem/rmdup/tags/tabix/_h

The following directory structure (as reported by rf status) is expected in order to run the pipeline:

[userid@login03 pipeline]$ rf status
.                                   done  (step done)
├── cutadapt                        done
│   └── bwamem              ready to run  (step with a run file, but not yet performed)
│       └── rmdup          no run script
│           └── tags       no run script  (step without a run file)
│               └── tabix  no run script
└── genome                         no _h
    └── hg19                ready to run
        ├── bwa             ready to run
        └── chromsizes      ready to run

Once the scripts are ready to run, as is the case with the genome step, just run the rf command recursively using the -r flag.

[userid@login03 pipeline]$ cd genome/
[userid@login03 genome]$ rf sbatch -r .

Start /home/userid/pipeline/genome/hg19: 2022-05-04 17:30:53-04:00
End /home/userid/pipeline/genome/hg19: 2022-05-04 17:30:53-04:00
Start /home/userid/pipeline/genome/hg19/bwa: 2022-05-04 17:30:53-04:00
End /home/userid/pipeline/genome/hg19/bwa: 2022-05-04 17:30:53-04:00
Start /home/userid/pipeline/genome/hg19/chromsizes: 2022-05-04 17:30:53-04:00
End /home/userid/pipeline/genome/hg19/chromsizes: 2022-05-04 17:30:53-04:00

[userid@login03 genome]$ rf status
.                           no _h
└── hg19                     done
    ├── bwa                  done
    └── chromsizes           done

SRA Toolkit

To download the sequence data files using the SRA Toolkit, you need to create a run file in the pipeline/_h folder.

#!/bin/bash

#SBATCH -J sra_tools
#SBATCH -p defq
#SBATCH -N 1
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=Array_test.%A_%a.out
#SBATCH --array=1-101

ml sra-tools/3.0.0

# Bioproject PRJEB10849 samples

sra_numbers=($(echo {1016570..1016671}))

sra_id='ERR'${sra_numbers[ $SLURM_ARRAY_TASK_ID - 1 ]}

prefetch --max-size 100G $sra_id --force yes --verify no
fastq-dump --outdir . --gzip --skip-technical  --readids --read-filter pass --dumpbase --split-3 --clip ${sra_id}/${sra_id}.sra

rm $sra_id -Rf
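Each array task downloads one accession: SLURM_ARRAY_TASK_ID 1 maps to ERR1016570, task 2 to ERR1016571, and so on. The mapping can be checked with a standalone snippet (safe to run on a login node):

sra_numbers=($(echo {1016570..1016671}))
for i in 1 2 101; do
    echo "task $i -> ERR${sra_numbers[$i-1]}"
done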

The rf command will call the run script to retrieve SRA Normalized Format files with full base quality scores and store the resulting fastq files in the _m folder.

[userid@login03 ~]$ cd pipeline/
[userid@login03 pipeline]$ chmod +x _h/run
[userid@login03 pipeline]$ rf sbatch -v .
all: /home/userid/pipeline/_m/SUCCESS

.ONESHELL:
/home/userid/pipeline/_m/SUCCESS:
      echo -n "Start /home/userid/pipeline: "; date --rfc-3339=seconds
      mkdir /home/userid/pipeline/_m
      cd /home/userid/pipeline/_m
      sbatch ../_h/run > nohup.out 2>&1
      touch SUCCESS
      echo -n "End /home/userid/pipeline: "; date --rfc-3339=seconds

Start /home/userid/pipeline: 2022-04-27 16:14:52-04:00
End /home/userid/pipeline: 2022-04-27 16:14:52-04:00

Note

  • Writing Workflows : "In Snakemake, workflows are specified as Snakefiles. Inspired by GNU Make, a Snakefile contains rules that denote how to create output files from input files. Dependencies between rules are handled implicitly, by matching filenames of input files against output files. Thereby wildcards can be used to write general rules."
  • Snakefiles and Rules : "A Snakemake workflow defines a data analysis in terms of rules that are specified in the Snakefile."

We will create a hypothetical scenario of chained steps, where each level depends on the one before it; for example, Level 5 (tabix) depends on Level 4 (tags), and so on.

Note

Level 1 (cutadapt) -> Level 2 (bwamem) -> Level 3 (rmdup) -> Level 4 (tags) -> Level 5 (tabix)

Cutadapt


Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. It helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way.

[userid@login03 pipeline]$ cd cutadapt/
[userid@login03 cutadapt]$ vi _h/run

#!/bin/bash

#SBATCH -J cutadapt
#SBATCH -p defq
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=cutadapt.job.%j.out

module load snakemake/7.6.0

# Syntax to run it on Rockfish cluster
snakemake --jobs 101 --latency-wait 240 --cluster 'sbatch --parsable --distribution=arbitrary' --snakefile ../_h/snakemake.slurm.script

So, we need to create a small helper script, rc, to perform the reverse complement: given a DNA sequence as a string, it returns its reverse complement.

[userid@login03 cutadapt]$ vi ~/.local/bin/rc

#!/bin/bash
# Print the reverse complement of the DNA sequence given as $1.
if [ -n "$1" ]; then
    echo "$1" | tr 'ATGCatgc' 'TACGtacg' | rev
else
    echo ""
    echo "usage: rc DNASEQUENCE"
    echo ""
fi

[userid@login03 cutadapt]$ chmod +x ~/.local/bin/rc
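For example, feeding rc the nested primer used below returns its reverse complement:

[userid@login03 cutadapt]$ rc TAACTAACCTGCACAATGTGCAC
GTGCACATTGTGCAGGTTAGTTA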
[userid@login03 cutadapt]$ vi _h/snakemake.slurm.script

cutadapt snakemake.slurm.script

import glob
import os.path
import itertools

SOURCE_DIR = '../../_m'
EXT = '_pass_1.fastq.gz'

def sample_dict_iter(path, ext):
    for filename in glob.iglob(path+'/*'+ext):
        sample = os.path.basename(filename)[:-len(ext)]
        yield sample, {'r1_in': SOURCE_DIR + '/' + sample + '_pass_1.fastq.gz',
                       'r2_in': SOURCE_DIR + '/' + sample + '_pass_2.fastq.gz'}

SAMPLE_DICT = {k: v for k, v in sample_dict_iter(SOURCE_DIR, EXT)}

# ensure errors propagate along piped shell commands
shell.prefix("set -o pipefail; ")

rule all:
    input:
        expand('../_m/{sample}_{suffix}.fastq.gz',
               sample=SAMPLE_DICT.keys(),
               suffix=['R1', 'R2'])

rule cutadapt:
    input:
        r1 = lambda x: SAMPLE_DICT[x.sample]['r1_in'],
        r2 = lambda x: SAMPLE_DICT[x.sample]['r2_in']

    output:
        r1 = '../_m/{sample}_R1.fastq.gz',
        r2 = '../_m/{sample}_R2.fastq.gz'

    params:
        sample = '{sample}'

    shell:
        '''
  module load cutadapt/3.2

  export PATH=$HOME'/.local/bin:'$PATH

  R1_ADAPTER='AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
  R2_ADAPTER='CAAGCAGAAGACGGCATACGAGANNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'

  NESTED_PRIMER='TAACTAACCTGCACAATGTGCAC'

  R1_FRONT=${{R1_ADAPTER}}
  R2_FRONT=${{R2_ADAPTER}}${{NESTED_PRIMER}}
  R1_END=`rc ${{R2_FRONT}}`
  R2_END=`rc ${{R1_FRONT}}`

  QUALITY_BASE=33
  QUALITY_CUTOFF=28
  MINIMUM_LENGTH=36
  ADAPTOR_OVERLAP=5
  ADAPTOR_TIMES=4

  cutadapt -j 0 --quality-base=${{QUALITY_BASE}} --quality-cutoff=${{QUALITY_CUTOFF}} --minimum-length=${{MINIMUM_LENGTH}} --overlap=${{ADAPTOR_OVERLAP}} --times=${{ADAPTOR_TIMES}} --front=${{R1_FRONT}} --adapter=${{R1_END}} --paired-output tmp.2.{params.sample}.fastq -o tmp.1.{params.sample}.fastq {input.r1} {input.r2} > {params.sample}_R1.cutadapt.out

  cutadapt -j 0 --quality-base=${{QUALITY_BASE}} --quality-cutoff=${{QUALITY_CUTOFF}} --minimum-length=${{MINIMUM_LENGTH}} --overlap=${{ADAPTOR_OVERLAP}} --times=${{ADAPTOR_TIMES}} --front=${{R2_FRONT}} --adapter=${{R2_END}} --paired-output {output.r1} -o {output.r2} tmp.2.{params.sample}.fastq tmp.1.{params.sample}.fastq > {params.sample}_R2.cutadapt.out

  rm -f tmp.2.{params.sample}.fastq tmp.1.{params.sample}.fastq

'''
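Before submitting with rf, it can be useful to preview the jobs Snakemake would schedule. A dry run prints the planned jobs without executing anything (the Snakefile's relative paths assume the step's _m directory, which rf creates on the first submission, as the working directory):

[userid@login03 cutadapt]$ module load snakemake/7.6.0
[userid@login03 cutadapt]$ cd _m
[userid@login03 _m]$ snakemake --dry-run --snakefile ../_h/snakemake.slurm.script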
[userid@login03 cutadapt]$ chmod +x _h/run
[userid@login03 cutadapt]$ rf sbatch .

Start /home/userid/pipeline/cutadapt: 2022-05-04 14:35:06-04:00
End /home/userid/pipeline/cutadapt: 2022-05-04 14:35:06-04:00

Monitoring submitted jobs

[userid@login02 _m]$ sqme
  USER   ACCOUNT        JOBID PARTITION       NAME NODES  CPUS TIME_LIMIT     TIME NODELIST ST REASON
userid   rfadmin      4157118 defq      snakejob.c     1     1    1:00:00    21:15     c221 R None
userid   rfadmin      4157146 defq      snakejob.c     1     1    1:00:00    21:15     c301 R None
userid   rfadmin      4157061 defq      snakejob.c     1     1    1:00:00    21:26     c157 R None
userid   rfadmin      4157072 defq      snakejob.c     1     1    1:00:00    21:26     c132 R None
userid   rfadmin      4157102 defq      snakejob.c     1     1    1:00:00    21:26     c303 R None
userid   rfadmin      4157046 defq        cutadapt     1     1    2:00:00    21:28     c124 R None

To monitor all submitted jobs, tail -f (or cat) the file cutadapt.job.<JOBID>.out.

[userid@login03 cutadapt]$ cat _m/cutadapt.job.4157046.out

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 200
Job stats:
job         count    min threads    max threads
--------  -------  -------------  -------------
all             1              1              1
cutadapt      101              1              1
total         102              1              1

Select jobs to execute...

[Wed May  4 14:48:20 2022]
rule cutadapt:
    input: ../../_m/ERR1016599_pass_1.fastq.gz, ../../_m/ERR1016599_pass_2.fastq.gz
    output: ../_m/ERR1016599_R1.fastq.gz, ../_m/ERR1016599_R2.fastq.gz
    jobid: 26
    wildcards: sample=ERR1016599
    resources: mem_mb=1709, disk_mb=1709, tmpdir=/tmp

Submitted job 26 with external jobid '4157048'.

[Wed May  4 14:48:20 2022]
rule cutadapt:
    input: ../../_m/ERR1016661_pass_1.fastq.gz, ../../_m/ERR1016661_pass_2.fastq.gz
    output: ../_m/ERR1016661_R1.fastq.gz, ../_m/ERR1016661_R2.fastq.gz
    jobid: 86
    wildcards: sample=ERR1016661
    resources: mem_mb=3245, disk_mb=3245, tmpdir=/tmp

........
........
........
........

[Wed May  4 14:48:30 2022]
rule cutadapt:
    input: ../../_m/ERR1016581_pass_1.fastq.gz, ../../_m/ERR1016581_pass_2.fastq.gz
    output: ../_m/ERR1016581_R1.fastq.gz, ../_m/ERR1016581_R2.fastq.gz
    jobid: 85
    wildcards: sample=ERR1016581
    resources: mem_mb=1891, disk_mb=1891, tmpdir=/tmp

Submitted job 85 with external jobid '4157148'.
[Wed May  4 14:49:33 2022]
Finished job 37.
1 of 102 steps (1%) done
[Wed May  4 14:50:31 2022]
Finished job 30.
2 of 102 steps (2%) done
[Wed May  4 14:51:35 2022]
Finished job 16.
3 of 102 steps (3%) done
[Wed May  4 14:51:48 2022]
Finished job 25.
4 of 102 steps (4%) done
[Wed May  4 14:51:49 2022]
Finished job 87.
5 of 102 steps (5%) done

It is also possible to see the output for each processed sample by monitoring the corresponding slurm-<snakejobid>.out file.

[userid@login02 _m]$ cat slurm-4157147.out

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...

[Wed May  4 14:48:37 2022]
rule cutadapt:
    input: ../../_m/ERR1016667_pass_1.fastq.gz, ../../_m/ERR1016667_pass_2.fastq.gz
    output: ../_m/ERR1016667_R1.fastq.gz, ../_m/ERR1016667_R2.fastq.gz
    jobid: 0
    wildcards: sample=ERR1016667
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp

[Wed May  4 14:51:25 2022]
Finished job 0.
1 of 1 steps (100%) done

Burrows-Wheeler Alignment Tool


BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.

[userid@login03 cutadapt]$ cd bwamem
[userid@login03 bwamem]$ vi _h/run
#!/bin/bash

#SBATCH -J bwamem
#SBATCH -p defq
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=bwamem.job.%j.out

module load snakemake/7.6.0

# Syntax to run it on Rockfish cluster
snakemake --jobs 101 --latency-wait 240 --cluster 'sbatch --parsable --distribution=arbitrary' --snakefile ../_h/snakemake.slurm.script

bwamem snakemake.slurm.script

[userid@login03 bwamem]$ vi _h/snakemake.slurm.script
import glob
import os.path
import itertools

SOURCE_DIR = '../../_m'
EXT = '_R1.fastq.gz'

def sample_dict_iter(path, ext):
    for filename in glob.iglob(path+'/*'+ext):
        sample = os.path.basename(filename)[:-len(ext)]
        yield sample, {'r1_in': SOURCE_DIR + '/' + sample + '_R1.fastq.gz',
                       'r2_in': SOURCE_DIR + '/' + sample + '_R2.fastq.gz'}

SAMPLE_DICT = {k:v for k,v in sample_dict_iter(SOURCE_DIR, EXT)}

# ensure errors propagate along piped shell commands
shell.prefix("set -o pipefail; ")

rule all:
    input:
        expand('../_m/{sample}.bam',
             sample=SAMPLE_DICT.keys())

rule bwamem:
    input:
        r1 = lambda x: SAMPLE_DICT[x.sample]['r1_in'],
        r2 = lambda x: SAMPLE_DICT[x.sample]['r2_in']

    output:
        '../_m/{sample}.bam'

    params:
        sample = '{sample}'

    shell:
        '''
    module load bwa-mem/0.7.17 samtools/1.15.1

    export PATH=$HOME'/.local/bin:'$PATH

    GENOME='../../../../genome/bwa/_m/hg19.fa'

    bwa mem -T 19 -t 4 ${{GENOME}} {input.r1} {input.r2} 2> {params.sample}.stderr | samtools view -S -b - > {output}
'''
[userid@login03 bwamem]$ chmod +x _h/run
[userid@login03 bwamem]$ rf sbatch .
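Once the alignment jobs finish, one of the resulting BAM files can be sanity-checked with samtools (the sample name here is just an example):

[userid@login03 bwamem]$ module load samtools/1.15.1
[userid@login03 bwamem]$ samtools flagstat _m/ERR1016570.bam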

Remove duplicates

rmdup is a Perl script from the SLAV-Seq protocol, written by Apuã Paquola; it reads .bam input files and uses samtools to process paired-end and single-end reads.

[userid@login03 bwamem]$ cd rmdup
[userid@login03 rmdup]$ vi _h/run
#!/bin/bash

#SBATCH -J rmdup
#SBATCH -p defq
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=rmdup.job.%j.out

module load snakemake/7.6.0

# Syntax to run it on Rockfish cluster
snakemake --jobs 101 --latency-wait 240 --cluster 'sbatch --parsable --distribution=arbitrary' --snakefile ../_h/snakemake.slurm.script

rmdup snakemake.slurm.script

[userid@login03 rmdup]$ vi _h/snakemake.slurm.script
import glob
import os.path
import itertools

SOURCE_DIR = '../../_m'
EXT = '.bam'

def sample_dict_iter(path, ext):
    for filename in glob.iglob(path+'/*'+ext):
        sample = os.path.basename(filename)[:-len(ext)]
        yield sample, {'filename': filename}


SAMPLE_DICT = {k:v for k,v in sample_dict_iter(SOURCE_DIR, EXT)}

# ensure errors propagate along piped shell commands
shell.prefix("set -o pipefail; ")

rule all:
    input:
        expand('../_m/{sample}.bam', sample=SAMPLE_DICT.keys())

rule process_one_sample:
    input:
        lambda x: SAMPLE_DICT[x.sample]['filename']

    output:
        '../_m/{sample}.bam'
    log:
        stderr = '{sample}.stderr',
        stdout = '{sample}.stdout'
    shell:
        '../_h/slavseq_rmdup.pl {input} {output} > {log.stdout} 2> {log.stderr}'
[userid@login03 rmdup]$ chmod +x _h/run
[userid@login03 rmdup]$ rf sbatch .
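A simple before/after record count (again with an example sample name) shows how many duplicates were removed; the input BAMs live in the parent step's _m folder:

[userid@login03 rmdup]$ module load samtools/1.15.1
[userid@login03 rmdup]$ samtools view -c ../_m/ERR1016570.bam   # before rmdup
[userid@login03 rmdup]$ samtools view -c _m/ERR1016570.bam      # after rmdup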

Add tags

tags is a Perl script from the SLAV-Seq protocol, written by Apuã Paquola; it adds custom flags (tags) to the BAM files.

[userid@login03 rmdup]$ cd tags
[userid@login03 tags]$ vi _h/run
#!/bin/bash

#SBATCH -J tags
#SBATCH -p defq
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=tags.job.%j.out

module load snakemake/7.6.0

# Syntax to run it on Rockfish cluster
snakemake --jobs 101 --latency-wait 240 --cluster 'sbatch --parsable --distribution=arbitrary' --snakefile ../_h/snakemake.slurm.script

tags snakemake.slurm.script

[userid@login03 tags]$ vi _h/snakemake.slurm.script
import glob
import os.path
import itertools

SOURCE_DIR = '../../_m'
EXT = '.bam'

def sample_dict_iter(path, ext):
    for filename in glob.iglob(path+'/*'+ext):
        sample = os.path.basename(filename)[:-len(ext)]
        yield sample, {'filename': SOURCE_DIR + '/' + sample + '.bam'}


SAMPLE_DICT = {k:v for k,v in sample_dict_iter(SOURCE_DIR, EXT)}

# ensure errors propagate along piped shell commands
shell.prefix("set -o pipefail; ")

rule all:
    input:
        expand('../_m/{sample}.bam',
               sample=SAMPLE_DICT.keys())

rule tags:
    input:
        '../../_m/{sample}.bam'

    output:
        '../_m/{sample}.bam'

    params:
        sample = '{sample}'

    shell:
        '''
    module load samtools/1.15.1

    export PERL5LIB=$HOME'/perl5/lib/perl5/'
    export CONSENSUS='ATGTACCCTAAAACTTAGAGTATAATAAA'
    export PATH=$HOME'/.local/bin:'$PATH

    GENOME='../../../../../../genome/bwa/_m/hg19.fa'

    PREFIX_LENGTH=`perl -e 'print length($ENV{{CONSENSUS}})+2'`
    R1_FLANK_LENGTH=750
    R2_FLANK_LENGTH=${{PREFIX_LENGTH}}
    SOFT_CLIP_LENGTH_THRESHOLD=5

    (samtools view -h {input} | ../_h/add_tags_hts.pl --genome_fasta_file ${{GENOME}} --prefix_length ${{PREFIX_LENGTH}} --consensus ${{CONSENSUS}} --r1_flank_length ${{R1_FLANK_LENGTH}} --r2_flank_length ${{R2_FLANK_LENGTH}} --soft_clip_length_threshold ${{SOFT_CLIP_LENGTH_THRESHOLD}} | samtools view -S -b - > {output}) 2> {params.sample}.stderr
'''
[userid@login03 tags]$ chmod +x _h/run
[userid@login03 tags]$ rf sbatch .
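To confirm the custom tags were added, inspect one alignment record from an output BAM (example sample name); the added tag fields appear at the end of the line:

[userid@login03 tags]$ module load samtools/1.15.1
[userid@login03 tags]$ samtools view _m/ERR1016570.bam | head -n 1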

Tabix

Tabix indexes a TAB-delimited genome position file (in.tab.bgz) and creates an index file (in.tab.bgz.tbi or in.tab.bgz.csi) when no region is given on the command line.

[userid@login03 tags]$ cd tabix
[userid@login03 tabix]$ vi _h/run
#!/bin/bash

#SBATCH -J tabix
#SBATCH -p defq
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --output=tabix.job.%j.out

module load snakemake/7.6.0

# Syntax to run it on Rockfish cluster
snakemake --jobs 101 --latency-wait 240 --cluster 'sbatch --parsable --distribution=arbitrary' --snakefile ../_h/snakemake.slurm.script

tabix snakemake.slurm.script

[userid@login03 tabix]$ vi _h/snakemake.slurm.script
import glob
import os.path
import itertools

SOURCE_DIR = '../../_m'
EXT = '.bam'

def sample_dict_iter(path, ext):
    for filename in glob.iglob(path+'/*'+ext):
        sample = os.path.basename(filename)[:-len(ext)]
        yield sample, {'filename': SOURCE_DIR + '/' + sample + '.bam'}

SAMPLE_DICT = {k:v for k,v in sample_dict_iter(SOURCE_DIR, EXT)}

# ensure errors propagate along piped shell commands
shell.prefix("set -o pipefail; ")

rule all:
    input:
        expand('../_m/{sample}.{ext}',
               sample=SAMPLE_DICT.keys(),
               ext=['bgz', 'bgz.tbi'])

rule tabix:
    input:
        '../../_m/{sample}.bam'

    output:
        bgz = '../_m/{sample}.bgz',
        tbi = '../_m/{sample}.bgz.tbi'

    params:
        sample = '{sample}'

    shell:
        '''
    module load tabix/1.13 samtools/1.15.1 bzip2/1.0.8

    export PATH=$HOME'/.local/bin:'$PATH

    TMP_DIR='tmp.{params.sample}'
    mkdir ${{TMP_DIR}}

    export LC_ALL=C

    ( samtools view {input} | ../_h/sam_to_tabix.py 2>{params.sample}.stderr | sort --temporary-directory=${{TMP_DIR}} --buffer-size=10G -k1,1 -k2,2n -k3,3n | bgzip -c > {output.bgz} )

    rmdir ${{TMP_DIR}}

    tabix -s 1 -b 2 -e 3 -0 {output.bgz}
'''
[userid@login03 tabix]$ chmod +x _h/run
[userid@login03 tabix]$ rf sbatch .
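After indexing, a region query confirms the index works (example sample and region; hg19 uses 'chr'-prefixed sequence names):

[userid@login03 tabix]$ module load tabix/1.13
[userid@login03 tabix]$ tabix _m/ERR1016570.bgz chr1:1000000-2000000 | head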

Once the pipeline is coded, just run :ref:`the Reproducibility Framework (RF) <Reproducibility-Framework>`.

pipeline                          no _h
├── cutadapt                ready to run
│   └── bwamem             no run script
│       └── rmdup          no run script
│           └── tags       no run script
│               └── tabix  no run script
└── genome                         no _h
    └── hg19                ready to run
        ├── bwa             ready to run
        └── chromsizes      ready to run

You can run one level at a time, or use the -r option to run recursively: rf will perform each level in order, so once Level 1 finishes it runs the next level, and so on.

[userid@login03 ~]$ interact -c 2 -t 120
[userid@c010 ~]$ cd pipeline
[userid@c010 pipeline]$ rf run -r .

Warning

So far, the rf command has only been validated to run in interactive mode.