Input

The --input value is a tab-separated file (sample-sheet) with each row describing the data and metadata of a sample.

A minimal sample-sheet for the vcf workflow could look like this:

individual_id  vcf
sample0        sample0.vcf.gz
sample1        sample1.vcf.gz
sample2        sample2.vcf.gz
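As a sketch, the minimal sample-sheet above can be produced with Python's standard csv module. The point it illustrates is that cells are separated by a tab character, not by spaces. The helper name to_sample_sheet and the file name samplesheet.tsv are hypothetical, chosen only for this example.

```python
import csv
import io

# Rows of the minimal vcf sample-sheet shown above.
rows = [
    {"individual_id": "sample0", "vcf": "sample0.vcf.gz"},
    {"individual_id": "sample1", "vcf": "sample1.vcf.gz"},
    {"individual_id": "sample2", "vcf": "sample2.vcf.gz"},
]


def to_sample_sheet(rows, columns):
    """Serialize rows as tab-separated text with a header line (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = to_sample_sheet(rows, ["individual_id", "vcf"])
print(text, end="")

# To write it to disk instead (hypothetical file name):
# with open("samplesheet.tsv", "w", newline="") as f:
#     f.write(text)
```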

Sample-sheet values are case-sensitive. Each column holds values of one of the following types:

type         description
boolean      allowed values: [true, false]
enum         categorical value
file         absolute file path or file path relative to the sample sheet
file list    comma-separated list of file paths
string       text
string list  comma-separated list of strings
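One way to read the type table is as a mapping from a raw cell to a typed value. The parse_value and resolve_path functions below are a hypothetical Python sketch, not VIP's own parser. It assumes that list cells are split on commas without trimming whitespace, and that enum values are checked against their allowed values per column elsewhere.

```python
from pathlib import Path


def parse_value(col_type: str, raw: str, sheet_dir: Path):
    """Hypothetical parser: convert one raw sample-sheet cell according to its type."""
    if col_type == "boolean":
        # Values are case-sensitive, so only lowercase true/false are accepted.
        if raw not in ("true", "false"):
            raise ValueError(f"invalid boolean: {raw!r}")
        return raw == "true"
    if col_type in ("string", "enum"):
        # Enum values would additionally be checked against the column's allowed values.
        return raw
    if col_type == "string list":
        return raw.split(",")
    if col_type == "file":
        return resolve_path(raw, sheet_dir)
    if col_type == "file list":
        return [resolve_path(p, sheet_dir) for p in raw.split(",")]
    raise ValueError(f"unknown column type: {col_type!r}")


def resolve_path(raw: str, sheet_dir: Path) -> Path:
    """Keep absolute paths; resolve relative paths against the sample-sheet directory."""
    path = Path(raw)
    return path if path.is_absolute() else sheet_dir / path
```

For example, with the sample-sheet stored in /data, the cell sample0.vcf.gz in a file column resolves to /data/sample0.vcf.gz, while /abs/sample0.vcf.gz is kept as-is.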

The following sections first describe the columns that can be used in every sample-sheet, followed by the workflow-specific columns.

Columns

column             type         required  default         description
project_id         string       -         vip             project identifier
family_id          string       -         vip_fam<index>  family identifier
individual_id      string       yes       -               sample identifier of the individual
paternal_id        string       -         -               sample identifier of the father
maternal_id        string       -         -               sample identifier of the mother
sex                enum         -         unknown         allowed values: [male,female]
affected           boolean      -         unknown         affected status: whether the individual is affected
proband            boolean      -         depends¹        whether the individual is being reported on
hpo_ids            string list  -         -               HPO term identifiers matching regex: /HP:\d{7}/
assembly           enum         -         GRCh38          allowed values: [GRCh37,GRCh38]
sequencing_method  enum         -         WGS             allowed values: [WES,WGS]

¹ Exception: if no probands are defined in the sample-sheet, all samples are considered to be probands.
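Two of the rules above lend themselves to a short check: the hpo_ids format and the proband exception. The sketch below is hypothetical Python, not VIP's implementation. It assumes that each HPO identifier must match the regex as a whole value, that "no probands defined" means no sample has proband set to true, and that unset values count as false once at least one proband is defined.

```python
import re

# Pattern from the hpo_ids row; applied to the whole value (assumption).
HPO_ID = re.compile(r"HP:\d{7}")


def validate_hpo_ids(hpo_ids):
    """Return the identifiers unchanged, raising if one does not match HP:nnnnnnn."""
    for hpo_id in hpo_ids:
        if not HPO_ID.fullmatch(hpo_id):
            raise ValueError(f"invalid HPO identifier: {hpo_id!r}")
    return hpo_ids


def resolve_probands(samples):
    """Return one proband flag per sample; samples are dicts with an optional 'proband' bool."""
    if not any(s.get("proband") is True for s in samples):
        # Footnote rule: no probands defined, so every sample is a proband.
        return [True for _ in samples]
    # Assumption: when some probands are defined, unset values count as False.
    return [s.get("proband") is True for s in samples]
```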

Columns: FASTQ

column               type       required  default   description
fastq                file list  yes²      -         single-reads file(s); allowed file extensions: [fastq, fastq.gz, fq, fq.gz]
fastq_r1             file list  yes²      -         paired-end reads file(s) #1; allowed file extensions: [fastq, fastq.gz, fq, fq.gz]
fastq_r2             file list  yes²      -         paired-end reads file(s) #2; allowed file extensions: [fastq, fastq.gz, fq, fq.gz]
sequencing_platform  enum       -         illumina  allowed values: [illumina,nanopore,pacbio_hifi]

² Either fastq or both fastq_r1 and fastq_r2 must be provided.
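The footnote's either/or rule and the extension lists can be checked per row. This is a hypothetical Python sketch, not VIP's validation code. It assumes that fastq_r1 and fastq_r2 must be given together, that extension matching is case-sensitive, and that it does not reject a row providing both fastq and the paired columns.

```python
FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def split_list(value):
    """Split a comma-separated file list cell; empty or missing cells yield []."""
    return [p for p in (value or "").split(",") if p]


def check_fastq_row(row):
    """Validate the FASTQ columns of one sample-sheet row (a dict of raw strings)."""
    single = split_list(row.get("fastq"))
    r1 = split_list(row.get("fastq_r1"))
    r2 = split_list(row.get("fastq_r2"))
    if bool(r1) != bool(r2):
        # Assumption: paired-end input needs both read files.
        raise ValueError("fastq_r1 and fastq_r2 must be given together")
    if not single and not r1:
        raise ValueError("either fastq or both fastq_r1 and fastq_r2 are required")
    for column, paths in (("fastq", single), ("fastq_r1", r1), ("fastq_r2", r2)):
        for path in paths:
            # str.endswith accepts a tuple of allowed suffixes.
            if not path.endswith(FASTQ_EXTENSIONS):
                raise ValueError(f"{column}: unsupported file extension in {path!r}")
```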

Columns: CRAM

column               type  required  default   description
cram                 file  yes       -         allowed file extensions: [bam, cram]
sequencing_platform  enum  -         illumina  allowed values: [illumina,nanopore,pacbio_hifi]

Columns: VCF

column  type  required  default  description
vcf     file  yes       -        allowed file extensions: [vcf, vcf.gz, vcf.bgz, bcf, bcf.gz, bcf.bgz]
cram    file  -         -        allowed file extensions: [bam, cram]
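The extension rules for the cram and vcf columns can be expressed as a lookup table. The sketch below is hypothetical Python, not VIP's code. It assumes case-sensitive matching and shows the VCF workflow, where vcf is required and cram is optional. The same has_allowed_extension helper would also cover the cram column of the CRAM workflow.

```python
# Allowed extensions per column, copied from the CRAM and VCF tables above.
ALLOWED_EXTENSIONS = {
    "cram": (".bam", ".cram"),
    "vcf": (".vcf", ".vcf.gz", ".vcf.bgz", ".bcf", ".bcf.gz", ".bcf.bgz"),
}


def has_allowed_extension(column, path):
    """Return True if path ends with one of the extensions listed for column."""
    return path.endswith(ALLOWED_EXTENSIONS[column])


def check_vcf_row(row):
    """Validate the vcf (required) and cram (optional) columns of one row."""
    if not row.get("vcf"):
        raise ValueError("vcf is required")
    for column in ("vcf", "cram"):
        path = row.get(column)
        if path and not has_allowed_extension(column, path):
            raise ValueError(f"{column}: unsupported file extension in {path!r}")
```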