Pipeline to detect HLA disruption from WES and RNAseq data

McGranahanLab/mhc-hammer

Introduction

Disruption of the class I human leukocyte antigen (HLA) molecules has important implications for immune evasion and tumor evolution. To evaluate the extent of genomic and transcriptomic HLA disruption, we developed MHC Hammer, which has four major components: (1) identifying allele-specific HLA somatic mutations, (2) calculating HLA loss of heterozygosity (LOH), (3) evaluating HLA allele-specific repression and (4) identifying allele-specific HLA alternative splicing.

You can find our MHC Hammer publication here: https://www.nature.com/articles/s41588-024-01883-8

MHC Hammer requires every patient to have a whole exome sequencing (WES) germline blood sample. In addition, depending on the analyses you want to run, MHC Hammer requires the following inputs:

To estimate DNA HLA allelic imbalance and somatic mutations:

  • A tumour WES BAM file.

To estimate DNA HLA copy number and LOH:

  • A tumour WES BAM file with purity and ploidy estimates.

To estimate RNA HLA allelic expression, allelic imbalance and alternative splicing:

  • A tumour or normal RNAseq BAM file.

To estimate RNA HLA allelic repression:

  • A tumour and a normal RNAseq BAM file.

Pipeline overview

[Figure: pipeline overview diagram]

Steps before running the pipeline

1. Installing Nextflow and Singularity

  1. Install Nextflow (>=21.10.3)

  2. Install Singularity
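Nextflow 21.10.3 is the minimum version listed above. One way to compare version strings in a shell is GNU sort's -V (version sort) option. The bash function below is a hypothetical helper, not part of MHC Hammer, and assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): succeed if version $1 >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  local have="$1" need="$2"
  # After a version sort the smaller version comes first; if that is the
  # required version, the installed one is at least as new.
  [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

For example, with the version number read from the output of nextflow -version: version_ge 23.04.1 21.10.3 && echo "Nextflow is new enough".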

2. Make an inventory file

You need to create an inventory file with the following columns:

  • patient - the patient name. MHC Hammer will replace spaces in the patient name with underscores. Required.
  • sample_name - the sample name. MHC Hammer will replace spaces in the sample name with underscores. Required.
  • sample_type - either tumour or normal. Required.
  • bam_path - full path to the WXS or RNAseq BAM file. Required.
  • sequencing_type - either wxs or rnaseq. Required.
  • purity - the purity of the tumour region. Can be left empty.
  • ploidy - the ploidy of the tumour region. Can be left empty.
  • normal_sample_name - the sample_name of the matched normal sample. When sequencing_type is wxs, this is the matched germline WXS sample; when sequencing_type is rnaseq, this is the matched normal RNAseq sample. Can be left empty.

The inventory should be a csv file and is input to the pipeline with the --input parameter.

The following is an example inventory for a single patient with:

  • two tumour regions with WXS (sample_name1 and sample_name2), one of which has RNAseq (sample_name1)
  • one germline WXS sample (sample_name3)
  • one normal RNAseq sample (sample_name4)
patient,sample_name,sample_type,bam_path,sequencing_type,purity,ploidy,normal_sample_name
patient1,sample_name1,tumour,path/to/sample_name1.bam,wxs,0.5,3,sample_name3
patient1,sample_name2,tumour,path/to/sample_name2.bam,wxs,0.3,2.5,sample_name3
patient1,sample_name3,normal,path/to/sample_name3.bam,wxs,,,
patient1,sample_name1,tumour,path/to/sample_name4.bam,rnaseq,,,sample_name4
patient1,sample_name4,normal,path/to/sample_name5.bam,rnaseq,,,
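Before passing an inventory to --input, a few of the rules above can be checked locally. The bash function below is a hypothetical pre-flight check, not part of MHC Hammer: it confirms the required columns exist and are non-empty, that sample_type and sequencing_type use the allowed values, and that every normal_sample_name matches a sample_name. It assumes a header row, plain commas and no quoted fields, and it compares names before MHC Hammer replaces spaces with underscores.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MHC Hammer) for an inventory CSV.
# Assumes a header row, plain commas and no quoted fields.
check_inventory() {
  local inventory="$1"
  awk -F',' '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # header name -> column index
      n = split("patient sample_name sample_type bam_path sequencing_type", req, " ")
      for (j = 1; j <= n; j++)
        if (!(req[j] in col)) { print "missing column: " req[j]; bad = 1 }
      if (bad) exit
      next
    }
    {
      # Required columns must be non-empty on every data row
      for (j = 1; j <= n; j++)
        if ($(col[req[j]]) == "") { printf "line %d: empty %s\n", NR, req[j]; bad = 1 }
      t = $(col["sample_type"]); s = $(col["sequencing_type"])
      if (t != "tumour" && t != "normal") { printf "line %d: sample_type is \"%s\"\n", NR, t; bad = 1 }
      if (s != "wxs" && s != "rnaseq") { printf "line %d: sequencing_type is \"%s\"\n", NR, s; bad = 1 }
      # Remember sample names and any normal_sample_name references
      seen[$(col["sample_name"])] = 1
      if (("normal_sample_name" in col) && $(col["normal_sample_name"]) != "")
        ref[$(col["normal_sample_name"])] = NR
    }
    END {
      for (r in ref)
        if (!(r in seen)) { printf "line %d: normal_sample_name %s not in sample_name\n", ref[r], r; bad = 1 }
      exit bad
    }
  ' "$inventory"
}
```

Usage: check_inventory /path/to/inventory.csv && echo "inventory looks OK". Any problems are printed one per line and the function returns a non-zero status.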

3. Clone this repo

git clone git@github.com:McGranahanLab/mhc-hammer.git
mkdir mhc-hammer/singularity_images
cd mhc-hammer
project_dir=${PWD}

4. Download the MHC Hammer reference files

The MHC Hammer reference files are created from sequences stored in the IMGT database. We have created MHC Hammer references from two IMGT versions:

This should download two folders, kmer_files and mhc_references. Save these folders in the assets folder:

  • assets/kmer_files/imgt_30mers.fa - This file contains all 30mers created from the sequences in the IMGT database. For an overview of how this file was created see docs/mhc_reference_files.md
  • assets/mhc_references - This folder contains the MHC reference files used in the MHC Hammer pipeline. For an overview of how these files were created see docs/mhc_reference_files.md
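Before continuing, it can help to confirm that the two downloads are in the expected places. This is a hypothetical bash check (not part of MHC Hammer) that only tests whether assets/kmer_files/imgt_30mers.fa and assets/mhc_references exist under the project directory; it does not validate their contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): confirm the downloaded reference
# files sit where the README expects them, relative to the project directory.
check_reference_files() {
  local project_dir="$1" missing=0
  [ -f "${project_dir}/assets/kmer_files/imgt_30mers.fa" ] \
    || { echo "missing file: assets/kmer_files/imgt_30mers.fa"; missing=1; }
  [ -d "${project_dir}/assets/mhc_references" ] \
    || { echo "missing folder: assets/mhc_references"; missing=1; }
  return "$missing"
}
```

Usage: check_reference_files "${project_dir}"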

5. HLA allele typing

Every sample run through MHC Hammer requires HLA allele types. MHC Hammer provides three options for typing HLA alleles:

  1. Install HLA-HD locally. MHC Hammer will run the locally installed HLA-HD.
  2. Create a container containing HLA-HD. MHC Hammer will run HLA-HD using this container.
  3. Provide HLA allele types as an input to MHC Hammer, in this case MHC Hammer will not run HLA-HD.

The HLA allele types predicted by HLA-HD (option 1 or 2) or input to MHC Hammer (option 3) must match the alleles in the MHC Hammer reference files.

This means that if using HLA-HD within MHC Hammer (option 1 or 2), the reference version used by HLA-HD must be the same as the IMGT reference version used to create the MHC Hammer reference files. If HLA allele types are input to MHC Hammer, these allele types must be present in the MHC Hammer reference files. More information on this is provided below.
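The hla.dat.zip download URLs in the HLA-HD options below address IMGT/HLA releases by a tag: 3550 for version 3.55 and 3380 for version 3.38, that is, the version number without its dot plus a trailing 0. The bash sketch below builds that URL for a given version, which can help keep the HLA-HD dictionary and the MHC Hammer references on the same release. The function name imgt_hla_dat_url is hypothetical, and the tag rule is inferred only from those two examples, so confirm the tag exists in the ANHIG/IMGTHLA GitHub repository before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): build the hla.dat.zip URL for an
# IMGT/HLA version such as 3.55. The tag rule (drop the dot, append 0) is inferred
# from the README's examples (3.55 -> 3550, 3.38 -> 3380); "Latest" is passed through.
imgt_hla_dat_url() {
  local version="$1" tag
  if [ "$version" = "Latest" ]; then
    tag="Latest"
  elif [[ "$version" =~ ^[0-9]+\.[0-9][0-9]$ ]]; then
    tag="${version//./}0"    # e.g. 3.55 -> 3550 (inferred rule)
  else
    echo "unexpected IMGT version format: $version" >&2
    return 1
  fi
  printf 'https://github.com/ANHIG/IMGTHLA/raw/%s/hla.dat.zip\n' "$tag"
}
```

For example, imgt_hla_dat_url 3.55 prints the URL used for version 3.55 on line 14 of bin/update.dictionary.alt.sh.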

Option 1: Install HLA-HD and its dependencies locally (recommended)

The steps are as follows:

  1. On the HLA-HD website fill in the download request form to get a download link for HLA-HD

  2. Move the downloaded hlahd.version.tar.gz file into the project bin directory.

    mv /path/to/hlahd_download.tar.gz ${project_dir}/bin/
  3. Run the install_hlahd.sh script. This script will:

    • install HLA-HD and bowtie2 (2.5.1) and store them in the ${project_dir}/bin/ directory.
    • update the HLA-HD allele dictionary to the IMGT database version 3.55. This is the same IMGT version that was used to make the reference files, which can be downloaded from https://zenodo.org/records/12606532

    This install_hlahd.sh script requires:

    • g++, wget and python3 to be installed
    • The mhc_hammer_preprocessing_latest.sif to be in the $project_dir/singularity_images/ folder (see below)
    • The hlahd_download variable to be set as the path to /path/to/hlahd_download.tar.gz.

    To download the mhc_hammer_preprocessing_latest.sif container:

    cd ${project_dir}/singularity_images
    singularity pull --arch amd64 library://tpjones15/default/mhc_hammer_preprocessing:latest
    mhc_hammer_preprocessing_sif="${project_dir}/singularity_images/mhc_hammer_preprocessing_latest.sif"

    Then, run install_hlahd.sh:

    bash ${project_dir}/scripts/install_hlahd.sh -p ${project_dir} -h ${hlahd_download}

    If you want to use a different version of the IMGT database with HLA-HD, you can change line 14 in bin/update.dictionary.alt.sh to your chosen version of the IMGT database:

    wget https://github.com/ANHIG/IMGTHLA/raw/3550/hla.dat.zip ## this downloads version 3.55
    
    ## For example, for version 3.38, replace line above with:
    wget https://github.com/ANHIG/IMGTHLA/raw/3380/hla.dat.zip
    
    ## Or, for the latest version:
    wget https://github.com/ANHIG/IMGTHLA/raw/Latest/hla.dat.zip
    

    Remember that the HLA-HD database version should match the version used to create the files in the assets/mhc_references folder.

  4. When running the pipeline, use --hlahd_local_install true (the default).
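The requirements listed for install_hlahd.sh in step 3 can be checked before running it. The bash function below is a hypothetical pre-flight check, not part of MHC Hammer; it only checks that g++, wget and python3 are on PATH and that the container and HLA-HD tarball exist, not their versions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MHC Hammer) for the requirements
# listed for install_hlahd.sh: g++, wget and python3 on PATH, the preprocessing
# container in singularity_images/, and hlahd_download pointing at the tarball.
check_hlahd_prereqs() {
  local project_dir="$1" hlahd_download="$2" ok=0 tool
  for tool in g++ wget python3; do
    command -v "$tool" >/dev/null 2>&1 || { echo "not on PATH: $tool"; ok=1; }
  done
  local sif="${project_dir}/singularity_images/mhc_hammer_preprocessing_latest.sif"
  [ -f "$sif" ] || { echo "missing container: $sif"; ok=1; }
  if [ -z "$hlahd_download" ] || [ ! -f "$hlahd_download" ]; then
    echo "hlahd_download is unset or not a file: '${hlahd_download}'"
    ok=1
  fi
  return "$ok"
}
```

Usage: check_hlahd_prereqs "${project_dir}" "${hlahd_download}" && bash ${project_dir}/scripts/install_hlahd.sh -p ${project_dir} -h ${hlahd_download}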

Option 2: Create your own HLA-HD singularity container

We are unable to provide a singularity container for the HLA-HD tool. Instead, we have provided steps to create your own container:

  1. On the HLA-HD website fill in the download request form to get a download link for HLA-HD

  2. Edit the assets/hlahd_container_template.def file:

    • Update the /path/to/downloaded/hlahd.version.tar.gz in the %files section
    • Update the /path/to/project_dir/bin/update.dictionary.alt.sh in the %files section
    • Update the HLAHD_VERSION variable in the %post section
  3. Build the singularity image:

    singularity build hlahd.sif assets/hlahd_container_template.def
  4. Move the image into the singularity_images directory

    mv hlahd.sif singularity_images
  5. When running the MHC Hammer pipeline, use --hlahd_local_install false.

    If you want to use a different version of the IMGT database with HLA-HD, you can change line 14 in bin/update.dictionary.alt.sh to your chosen version of the IMGT database before building the image:

    wget https://github.com/ANHIG/IMGTHLA/raw/3550/hla.dat.zip ## this downloads version 3.55
    
    ## For example, for version 3.38, replace line above with:
    wget https://github.com/ANHIG/IMGTHLA/raw/3380/hla.dat.zip
    
    ## Or, for the latest version:
    wget https://github.com/ANHIG/IMGTHLA/raw/Latest/hla.dat.zip
    

    Remember that the HLA-HD database version should match the version used to create the files in the assets/mhc_references folder.
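The two %files edits in step 2 of Option 2 can also be scripted. This bash sketch is hypothetical (not part of MHC Hammer): it assumes the template contains the placeholder paths exactly as written in step 2, writes the result to a new file rather than editing the template in place, and leaves HLAHD_VERSION in the %post section to be edited by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): fill in the two %files paths of
# the HLA-HD container definition template. Assumes the template contains the
# placeholder strings exactly as quoted in the README; HLAHD_VERSION in %post
# still needs to be edited by hand.
fill_hlahd_def() {
  local template="$1" tarball="$2" project_dir="$3" out="$4"
  local esc_tarball esc_project
  # Escape characters that are special in a sed replacement: '\', '&' and the '|' delimiter
  esc_tarball=$(printf '%s' "$tarball" | sed 's/[\\&|]/\\&/g')
  esc_project=$(printf '%s' "$project_dir" | sed 's/[\\&|]/\\&/g')
  sed -e "s|/path/to/downloaded/hlahd.version.tar.gz|${esc_tarball}|g" \
      -e "s|/path/to/project_dir/bin/update.dictionary.alt.sh|${esc_project}/bin/update.dictionary.alt.sh|g" \
      "$template" > "$out"
}
```

Usage: fill_hlahd_def assets/hlahd_container_template.def /path/to/downloaded/hlahd.version.tar.gz "${project_dir}" hlahd_container.def, then edit HLAHD_VERSION and build with singularity build hlahd.sif hlahd_container.def.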

Option 3: Input HLA alleles to MHC Hammer

If you already have HLA allele types for your samples you can skip the HLA-HD step in the pipeline. To do this:

  • add a new column to the inventory called hla_alleles_path that contains the path to a csv file listing the HLA alleles. This table should have three columns with no column names. The columns are:
    • Gene
    • Allele 1 type
    • Allele 2 type

An example of the file format can be found here: https://github.com/McGranahanLab/mhc-hammer/blob/main/test/data/SIM001_hla_alleles.csv

  • run the pipeline with the --run_hlahd false flag.

Remember that the alleles input to MHC Hammer must be present in the MHC Hammer reference files in the assets/mhc_references folder. You can get a list of alleles from the fasta file, e.g. grep '^>' assets/mhc_references/mhc_genome.fasta
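For Option 3, the grep above can be extended to check every allele in an hla_alleles_path file at once. The bash function below is a hypothetical helper, not part of MHC Hammer. It assumes the allele names in the CSV are written exactly as the FASTA header text after '>' (up to the first space); if your naming convention differs from the reference, alleles will be reported as missing even when the allele itself is present.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): report alleles from a headerless
# hla_alleles CSV (gene,allele1,allele2) that are not sequence names in the
# reference FASTA. Assumes allele names in the CSV are written exactly as the
# FASTA header text after '>' (up to the first whitespace).
check_alleles_in_reference() {
  local alleles_csv="$1" fasta="$2" names gene allele1 allele2 allele missing=0
  # Sequence names from the FASTA headers, one per line
  names=$(grep '^>' "$fasta" | sed -e 's/^>//' -e 's/[[:space:]].*$//')
  while IFS=, read -r gene allele1 allele2 || [ -n "$gene" ]; do
    for allele in "$allele1" "$allele2"; do
      allele=${allele%$'\r'}          # tolerate Windows line endings
      [ -n "$allele" ] || continue
      if ! grep -Fxq -- "$allele" <<< "$names"; then
        echo "not in reference: ${gene} ${allele}"
        missing=1
      fi
    done
  done < "$alleles_csv"
  return "$missing"
}
```

Usage: check_alleles_in_reference test/data/SIM001_hla_alleles.csv assets/mhc_references/mhc_genome.fasta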

6. Update the HPC config files

The conf/hpc.config file controls how the pipeline is run on your HPC system. Before running the pipeline you may want to update the variables in conf/hpc.config to suit your HPC system. In particular, it might be useful to specify the singularity bind directory by adding

singularity {
    runOptions = "-B /bind_directory"
}

to conf/hpc.config and changing bind_directory to your chosen path. You may also need to add the name of your HPC queue by adding

process {
    queue = 'cpu'
}

to conf/hpc.config, and changing cpu to the name of the HPC queue that you are using.

Alternatively, if one exists, you can use a config file specific to your institute. See the Nextflow documentation on configuration files for more information.

7. Update the MHC Hammer pipeline parameters

You can change the MHC Hammer pipeline parameters from their defaults in the nextflow.config file. Alternatively, you can pass parameters on the command line when you run the pipeline. For a full overview of the pipeline parameters, run:

nextflow run main.nf --help --show_hidden_params

Running the MHC Hammer pipeline

To run the MHC Hammer pipeline:

nextflow run main.nf \
--input /path/to/inventory \
-c conf/hpc.config -resume

This command needs to be run from the project directory.

The -resume flag tells the pipeline not to rerun tasks that have successfully completed. See the Nextflow documentation on caching and resuming for more information.

To change a pipeline parameter, either edit it in the nextflow.config file or pass it on the command line. Parameters passed on the command line take precedence over those in nextflow.config. For example, to change the min_depth parameter:

nextflow run main.nf \
--input /path/to/inventory \
-c conf/hpc.config \
--min_depth 5 -resume

Running the MHC Hammer pipeline with subsetted BAM files and flagstat output

If you already have subsetted BAM files and flagstat output, you can input these to the MHC Hammer pipeline instead of rerunning these steps. To do this:

  • the bam_path column in the inventory file should contain the path to the subsetted BAM files
  • add a new column to the inventory called library_size_path that contains the path to a text file with the library size for the sample. This can be calculated from the flagstat output.
  • run the pipeline with the --run_bam_subsetting false flag.
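A library size file can be written from existing samtools flagstat output. This bash sketch is hypothetical (not part of MHC Hammer), and which flagstat count MHC Hammer treats as the library size is not stated here: it assumes the QC-passed total read count from the first flagstat line, written as a file containing only that number. Check both assumptions against the pipeline before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): write a library size file from
# samtools flagstat text output. ASSUMPTION: library size = the QC-passed total
# on the first flagstat line ("N + M in total ..."); the file holds only N.
library_size_from_flagstat() {
  local flagstat_txt="$1" out="$2"
  # Fail (non-zero exit) if the first line does not look like a flagstat total
  awk 'NR == 1 && /in total/ { print $1; found = 1 } END { exit !found }' \
    "$flagstat_txt" > "$out"
}
```

Usage: samtools flagstat sample.bam > sample.flagstat.txt, then library_size_from_flagstat sample.flagstat.txt sample_library_size.txt, and put the path to sample_library_size.txt in the library_size_path column.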

MHC Hammer pipeline outputs

By default, the output is saved in the working directory in a folder called mhc_hammer_results. See docs/mhc_hammer_outputs.md for an overview of all outputs from MHC Hammer.

Test dataset

A test dataset is provided. The input BAMs and inventory are in the test/data folder. Note that you will need to update the inventory columns bam_path and hla_alleles_path so that they contain the full paths to the files.
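Updating the bam_path and hla_alleles_path columns can be scripted. This bash sketch is hypothetical (not part of MHC Hammer): it prefixes every relative value in those two columns with a base directory and writes a new inventory, leaving empty and already-absolute values alone. It assumes a header row, plain commas and no quoted fields; pass the directory that the existing relative paths are relative to.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MHC Hammer): prefix relative bam_path and
# hla_alleles_path values in an inventory CSV with a base directory, so they
# become full paths. Assumes a header row, plain commas and no quoted fields.
absolutise_inventory_paths() {
  local inventory="$1" base_dir="$2" out="$3"
  awk -F',' -v OFS=',' -v base="${base_dir%/}" '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {
      # Remember which column indices hold paths
      for (i = 1; i <= NF; i++) if ($i == "bam_path" || $i == "hla_alleles_path") fix[i] = 1
      print; next
    }
    {
      # Leave empty and already-absolute values untouched
      for (i in fix) if ($i != "" && substr($i, 1, 1) != "/") $i = base "/" $i
      print
    }
  ' "$inventory" > "$out"
}
```

Usage: absolutise_inventory_paths test/data/mhc_hammer_test_inventory.csv "${project_dir}" test/data/mhc_hammer_test_inventory_full_paths.csv, then pass the new file to --input.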

To run the pipeline with the test dataset, including the HLA-HD step:

nextflow run main.nf -profile test,singularity --input test/data/mhc_hammer_test_inventory.csv

To run the pipeline with the test dataset, without the HLA-HD step:

nextflow run main.nf -profile test,singularity --input test/data/mhc_hammer_test_inventory.csv --run_hlahd false

The output will be saved in the test/results folder.

Files downloaded in the assets directory

Files downloaded with the git repository

  • codon_table.csv - contains a mapping between codons and amino acids; this is used to determine the consequence of alternative splicing events in the HLA alleles.
  • contigs_placeholder.txt - This is a placeholder for the subset BAM module. It is ignored if the user provides a path to a contigs file.
  • hlahd_container_template.def - A template for building an HLA-HD singularity container
  • mhc_coords_chr6.txt - these genomic coordinates can be used when subsetting the bams. Any reads falling within these coordinates are included in the subsetted bams.
  • strand_info.txt - contains a mapping between the HLA gene and the strand (forward="+" or reverse="-")
  • transcriptome_placeholder.txt - A placeholder so the pipeline will run with only WXS data.

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
