Skip to content

User Input - The Params File

User input is primarily passed as a .yaml or .json file containing the required parameters. This file is referred to as the params-file. Normally, the parameters in the file are listed in a flat structure without branches.

As this pipeline supports the analysis of common 10x Genomics libraries for single-cell (immune profiling) by using CellRanger Multi, please refer to the original website for more information: Cell Ranger multi 3' and Cell Ranger multi 5' immune profiling.

References

The references are provided through a series of different parameters. If you are interested in how the reference is built, read this

Gene Expression:

  • gene_expression_reference: <path> - A prebuilt 10xGenomics compatible gene expression reference.

  • gene_expression_reference_version: <'2020'/'2024'> - Version of the reference building script to use - default: '2024'

  • gene_expression_source_fa: <path> - Genome to build the custom gene expression reference with at runtime.

  • gene_expression_source_gtf: <path> - Annotation to build the custom gene expression reference with at runtime.

  • gene_expression_car_fa: <path> - CAR genome to add to gene expression reference at runtime.

  • gene_expression_car_gtf: <path> - CAR annotation to add to gene expression reference at runtime.

VDJ:

  • vdj_reference: <path> - A prebuilt 10xGenomics compatible VDJ reference.

Feature Barcoding:

  • feature_reference: <path>: A prebuilt 10xGenomics compatible feature reference.

These parameters can be put into the config like this:

gene_expression_reference: '/path/to/gex/reference'
vdj_reference: '/path/to/vdj/reference'
feature_reference: '/path/to/feature/reference'

Note that only the references that will actually be used are necessary. For instance, if no VDJ-T library is used, there is no need to provide a VDJ reference.

Regarding the gene_expression_* parameters, it is important to understand that the behavior of the pipeline changes depending on the parameters provided. This is because the pipeline can build a gene expression reference at runtime if the appropriate files are provided (at least gene_expression_source_fa and gene_expression_source_gtf). The gene_expression_car_fa and gene_expression_car_gtf parameters also come into play when building a custom reference (e.g. for detection of CAR mapping reads), as they are concatenated with their source counterparts if set. However, they are not simply ignored when gene_expression_reference is set. This is useful if you have a prebuilt custom reference with a concatenated CAR construct because in order for the pipeline to build metrics around the CAR construct it needs the unconcatenated construct. To do this just provide both gene_expression_reference as well as gene_expression_car_fa and gene_expression_car_gtf. For more details on how the reference-building process works, see the reference building explanation.

Samples

Unlike other parameters, the samples parameter is a list of maps. Each sample consists of the attributes name and libraries. The name attribute is an identifier for the sample and is used when naming the output. libraries, on the other hand, is, again, a list of maps. Each entry in libraries represents a 10x Genomics library and must include the fields fastq_path, fastq_id, and feature_types. These fields correspond to the definitions used by Cell Ranger Multi. When compiled, the value of samples might look like this:

samples:
  - name: 'sample_1'
    libraries:
      - fastq_id: 'sample_1_R'
        fastqs: '/path/to/sample1'
        feature_types: 'Gene Expression'
      - fastq_id: 'sample_1_B'
        fastqs: '/path/to/sample1'
        feature_types: 'VDJ-B'
      - ...
  - name: ... 

Currently, only the feature types Gene Expression, VDJ-T, VDJ-B and Antibody Capture are supported.

Miscellaneous settings

For this pipeline, the params-file is primarily used to supply input to the pipeline, but it is not limited to just that. Settings like the output directory can also be overridden here instead of using command line arguments. Here is a short list of common settings and their default values:

outdir: "${projectDir}/output"
skip_qc: false
skip_multiqc: false
trace_report_suffix: "{Current date (yyyy-MM-dd_HH-mm-ss)}"
validate_params: true

Read this if you want to learn more about the supported arguments.

Params-File Examples

gene_expression_reference: '/path/to/gex/reference'
vdj_reference: '/path/to/vdj/reference'
feature_reference: '/path/to/feature/reference'
samples:
  - name: 'sample_1'
    libraries:
      - fastq_id: 'sample_1_R'
        fastqs: '/path/to/sample1'
        feature_types: 'Gene Expression'
      - fastq_id: 'sample_1_T'
        fastqs: '/path/to/sample1'
        feature_types: 'VDJ-T'
      - fastq_id: 'sample_1_B'
        fastqs: '/path/to/sample1'
        feature_types: 'VDJ-B'
      - fastq_id: 'sample_1_A'
        fastqs: '/path/to/sample1'
        feature_types: 'Antibody Capture'
  - name: ...
gene_expression_source_fa: '/path/to/gex/source.fa'
gene_expression_source_gtf: '/path/to/gex/source.gtf'
gene_expression_car_fa: '/path/to/gex/car.fa'
gene_expression_car_gtf: '/path/to/gex/car.gtf'
feature_reference: '/path/to/feature/reference'
samples:
  - name: 's1'
    libraries:
      - fastq_id: 'GEX_s1'
        fastqs: '/path/to/GEX_s1'
        feature_types: 'Gene Expression'
      - fastq_id: 'ADT_s1'
        fastqs: '/path/to/ADT_s1'
        feature_types: 'Antibody Capture'