Sixth researcher

Another Bioinformatics website

Tag: genotyping

Amplicon sequencing and high-throughput genotyping – HLA typing

In the previous post I explained the fundamentals about the Amplicon Sequencing  (AS) technique, today I will show some current and future applications in HLA-typing.

Other field of use of AS is the genotyping of complex gene families. For example, the major histocompatibility complex (MHC). This gene family is known to be highly polymorphic (high allele variation) and to have multiple copies of an ancestor gene (paralogues). MHC genes of class I and II codify the cellular receptors that present antigens to immune cells. MHC in humans is also called human leukocyte antigen (HLA). HLA-typing has a key role in the compatibility upon any tissue transplantation and has been associated with more than 100 different diseases (primarily autoimmune diseases) and recently is associated to various drug positive and negative responses. HLA loci are so polymorphic that there are not 2 individuals in a non-endogamic population with the same set of alleles (except twins).

Number of HLA alleles known up to date. Source: IMGT-HLA database

As in personalized medicine and metagenomics/metabarcoding, there are 2 approaches for NGS HLA-typing: the first is to use the whole genomic, exomic or transcriptome data and the second is to amplify specific HLA loci regions by amplicon sequencing. Second approach is suitable for typing hundreds/thousands of individuals but requires tested primers for multiplex PCR of HLA regions.

Basically the HLA-typing analysis workflow after sequencing the PCR products, consists in:

  1. Map/align the reads against HLA allele reference sequences from the IMGT-HLA public database.
  2. Retrieve the genotypes from the references with longer and better mapping scores.

Inoue et al. wrote a complete review about the topic in ‘The impact of next-generation sequencing technologies on HLA research‘.

HLA-typing workflow. Modified from Inoue et al.

Nowadays there are commercial kits that allow reliable, fast and economic HLA-typing: Illumina TruSight HLA v2, Omixon Holotype HLA, GenDx NGSgo or One Lambda NXType NGS.

Amplicon sequencing and high-throughput genotyping – Basics

Amplicon sequencing  (AS) technique consists in sequencing the products from multiple PCRs (amplicons). Where a single amplicon is the set of DNA sequences obtained in each individual PCR.

Before the arrival of high-throughput sequencing technologies, PCR products were Sanger sequenced individually. But Sanger sequencing is only able to resolve one DNA sequence (allele) per sample, or as maximum a mix of 2 sequences differing only in one nucleotide (usually heterozygous alleles):

Sanger sequencing limitations

The solution to the previous limitation was bacterial cloning and further isolation, amplification and sequencing of individual clones. Nevertheless, bacterial cloning is a time-consuming approach that is only feasible with few dozens of sequences.

Bacterial cloning to Sanger sequence clones individually

 

Fortunately, high-throughput sequencing techniques (HTS) also called next-generation sequencing (NGS) are able to sequence millions of sequences with individual resolution. And the combination of amplicon sequencing with NGS allows us to genotype hundreds/thousands of samples in a single experiment.

The only requirement to carry out such kind of experiment is to include different DNA tags to identify the individuals/samples in the experiment. A DNA tag is a short and unique sequence of nucleotides (ex. ACGGTA) that is attached to the 5′ end of any PCR primer or ligated after individual PCR. Each tag should be different for each sample/individual to analyze to be able to classify the sequences or reads from each individual PCR (amplicon) (Binladen et al. 2007; Meyer et al. 2007).

High-throughput amplicon sequencing schema

Again, NGS has other intrinsic problems: sequenced fragments are shorter than in Sanger and sequencing errors are more frequent. Random sequencing errors can be corrected by increasing the depth/coverage: reading more times the same sequence. And longer sequences can be split in shorter fragments and assembled together later by computer.

In the following link you can watch a video explaining the amplicon sequencing process using NGS:
http://www.jove.com/video/51709/next-generation-sequencing-of-16s-ribosomal-rna-gene-amplicons

 

Basically there are 4 basic steps in an NGS Amplicon Sequencing analysis:

  1. Experimental design of the primer sequences to amplify the desired gene regions (markers) and of the DNA tags to use to identify samples or individuals.
  2. PCR amplification of the markers, usually an individual PCR per sample.
  3. NGS sequencing of the amplification products. The most commonly used techniques are: Illumina, Ion Torrent and 454, also Pac Bio is increasing in importance as its error rate decreases.
  4. Bioinformatic analysis of the sequencing data. The analysis should consists in: classification of reads into amplicons, sequencing error correction, filtering of spurious and contaminant reads, and final displaying of results in a human readable way, ex. an Excel sheet.

Amplicon Sequencing 4 steps workflow

© 2018 Sixth researcher

Theme by Anders NorenUp ↑