Output¶
Where to Find Output¶
The output directory can be set using the outdir
parameter. If this parameter is not overridden using the params-file or command-line arguments, the output is written to ${projectDir}/output
.
Output Structure¶
After the pipeline completes successfully, the output directory should contain the following subdirectories:
Gene Expression Reference
The gene_expression_reference
directory contains the output of the BUILD_GEX_REFERENCE
process, which is run when building a reference at runtime. Using this output instead of regenerating the reference for each pipeline run helps reduce runtime and computing power.
Analysis Results¶
CellRanger Multi
The cellranger_multi
directory contains per-sample output directories from the CELLRANGER_MULTI
process.
Seurat Object
The seurat_object
directory contains the output of the SEURAT_OBJECT
process, specifically the Seurat object itself. A merged Seurat object (seurat_merged.Rds
) is generated, which includes all samples and their respective data types from 10x Genomics libraries.
Gene expression data is stored in an assay called "RNA," while Antibody Capture data is stored in an additional assay called "ADT." As count matrix was corrected with SoupX, original counts are stored in "original.counts" assay.
Additional metadata includes quality scores, mitochondrial and ribosomal gene abundance, cell cycle scores, and doublet removal information.
Metadata Fields¶
General Information
- orig.ident
: Original identity or batch identifier for each sample.
- nCount_RNA
: Total RNA molecule counts (UMI counts) for each cell.
- nFeature_RNA
: Number of unique genes detected in the RNA assay for each cell.
- nCount_ADT
: Total counts for the Antibody-Derived Tag (ADT) assay.
- nFeature_ADT
: Number of unique features detected in the ADT assay.
Doublet Removal Scores (calculated with scDblFinder)
- scDblFinder_score
: Score estimating the likelihood that a cell is a doublet.
- scDblFinder_class
: Classification of cells as "singlet" or "doublet".
Cell Cycle Information
- S.Score
: Score representing the activity of the S phase calculated with Seurat CellCycleScoring().
- G2M.Score
: Score representing the activity of the G2/M phase calculated with Seurat CellCycleScoring().
- Phase
: Predicted cell cycle phase (G2M, S or G1) based on S.Score and G2M.Score
- CellCycle
: True if cycling else False. Calculated using a cluster-based enrichement method. Enrichment based on genes from cell.cycle.obj of ProjectTILs package.
- CellCycle_Phase
: Combined Cell Cycle information based on CellCycleScoring() and Cluster-based apprach using gene set enrichment.
Clonality and VDJ Information (if available)
- CTgene
: Clonotype gene information for each cell.
- CTnt
: Nucleotide sequence information of the clonotype.
- CTaa
: Amino acid sequence information of the clonotype.
- clonalProportion
: Proportion of cells belonging to a given clonotype.
- clonalFrequency
: Frequency of a specific clonotype in the sample.
- cloneSize
: Clonotype size category (e.g., "Single," "Small," "Medium," "Large," or "Hyperexpanded").
Quarto Webpage Summary
The quarto
directory contains an interactive webpage that summarizes cross-sample quality metrics, allowing for direct comparisons between different samples. This summary includes:
- CAR-specific quality control metrics
- GEX-specific metrics
- VDJ-specific metrics
CAR-Specific Quality Control Metrics¶
Read-Level Metrics - Coverage (unique and multimapping reads) across the CAR construct. - Absolute read counts against CAR construct per sample.
Count-Level Metrics - Percentage (and absolute numbers) of CAR-positive cells (count>0) compared to all T cells (CD4, CD8, gd) (based on scGate annotation) per sample. - Percentage (and absolute numbers) of CAR-positive cells (count>0) compared to all CD4+ (A) and CD8+ (B) T cells (scGate annotation PBMC model) per sample.
GEX-specific Metrics¶
Cell Proportions - Insight into the distribution of different cell types, predicted with scGate using the PBMC model and default parameters.
VDJ-specific Metrics¶
Clonotype composition for T and B cell receptor data: Absolute and relative numbers of unique clonotypes per sample.
QC¶
FastQC
The fastqc
directory contains the per-sample output directories from the FASTQC
processes, containing a FastQC report for each sample.
FastQ Screen
The fastq_screen
directory contains the per-sample output directories from the FASTQ_SCREEN
processes, including a report verifying whether sequencing runs contain expected sequences by testing against genomes in assets/fastq_databases
.
MultiQC
The multiqc
directory contains the output of the MULTIQC
process, which provides a summary of FastQC and FastQ Screen results.
Pipeline Info
The pipeline_info
directory contains log data from the Nextflow pipeline itself.