Whole genome multi locus sequence typing (wgMLST)

Multi-locus sequence typing (MLST) has proven its usefulness for molecular typing of bacteria over the last 15 years. Classical MLST schemes typically define seven loci (housekeeping genes), which are sequenced using Sanger technology. Each unique sequence found at a locus is assigned an allele number, and bacterial strains are identified by their allelic profile, i.e. the combination of the seven allele numbers.
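The allele numbering principle can be illustrated with a minimal Python sketch. It is not BioNumerics code: the AlleleRegistry class, the locus names and the short sequences are hypothetical, and numbers are simply handed out in order of first appearance.

```python
from typing import Dict, List, Tuple


class AlleleRegistry:
    """Toy allele database (hypothetical helper): one table of known sequences per locus."""

    def __init__(self) -> None:
        # locus name -> {sequence -> allele number}
        self._alleles: Dict[str, Dict[str, int]] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        """Return the allele number of a sequence, registering it if it is new."""
        known = self._alleles.setdefault(locus, {})
        key = sequence.upper()
        if key not in known:
            known[key] = len(known) + 1  # next free number for this locus
        return known[key]

    def profile(self, strain: Dict[str, str], loci: List[str]) -> Tuple[int, ...]:
        """Allelic profile: the allele numbers of a strain, in scheme order."""
        return tuple(self.allele_number(locus, strain[locus]) for locus in loci)


# Illustrative three-locus scheme; a classical MLST scheme would list seven loci.
loci = ["locusA", "locusB", "locusC"]
registry = AlleleRegistry()

strain_1 = {"locusA": "ATGC", "locusB": "GGCA", "locusC": "TTAG"}
strain_2 = {"locusA": "ATGC", "locusB": "GGTA", "locusC": "TTAG"}  # new allele at locusB
strain_3 = {"locusA": "ATGC", "locusB": "GGCA", "locusC": "TTAG"}  # identical to strain_1

profile_1 = registry.profile(strain_1, loci)  # (1, 1, 1)
profile_2 = registry.profile(strain_2, loci)  # (1, 2, 1)
profile_3 = registry.profile(strain_3, loci)  # (1, 1, 1)
print(profile_1, profile_2, profile_3)
```

Strains with identical sequences at every locus end up with the same profile, which is what makes allelic profiles comparable between laboratories that share one allele database.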

As next-generation sequencing, which offers a fast and cost-effective way to sequence bacterial genomes, is increasingly replacing Sanger sequencing, conventional MLST can be extended to whole genome MLST (wgMLST). Since many more loci (typically 1500 – 2500) are considered in wgMLST, a much higher typing resolution can be obtained.

In contrast to whole genome SNP analysis, wgMLST is based on the concept of allelic variation, meaning that recombinations and deletions or insertions of multiple positions are counted as single evolutionary events. This approach might be biologically more relevant than approaches that consider only point mutations.
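The difference between the two ways of counting can be shown with a small Python sketch. The function names and example data are hypothetical, and skipping loci with a missing allele call (None) is an assumption of this example rather than a documented BioNumerics setting.

```python
from typing import Optional, Sequence


def allelic_distance(profile_a: Sequence[Optional[int]],
                     profile_b: Sequence[Optional[int]]) -> int:
    """Count loci whose allele numbers differ.

    Assumption: loci with a missing call (None) in either profile are skipped.
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# One locus in which a single event (e.g. a recombination) changed three positions.
locus_in_strain_1 = "ATGCATGCAT"
locus_in_strain_2 = "ATGGTTACAT"

# The same two strains expressed as allelic profiles over three loci.
profile_1 = (1, 1, 1)
profile_2 = (1, 2, 1)

snp_count = snp_distance(locus_in_strain_1, locus_in_strain_2)  # 3 point differences
allele_count = allelic_distance(profile_1, profile_2)           # 1 allelic difference
print(snp_count, allele_count)
```

A position-based comparison reports three differences here, whereas the allelic comparison reports a single changed locus.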

The major drawback of the technique is that it requires allele curation. In the absence of suitable automated tools, it would be a daunting task to maintain a consistent allele assignment for thousands of loci. To address this, automated curation tools are provided.

BioNumerics for whole genome multi locus sequence typing

The wgMLST curator and WGS tools plugins

The wgMLST curator plugin offers a full range of automated curation tools needed to set up and maintain a wgMLST schema for any organism of choice.

  • Automatic naming of alleles
  • Automatic sequence type assignment
  • Creation of sub-schemes such as MLST, eMLST, rMLST starting from wgMLST
  • Access to quality control tools

Only a limited number of people will have access to the curated organism-specific reference database. Most users will need only the WGS tools plugin, which provides a fully automated pipeline for identifying alleles based on whole genome sequence data.

BioNumerics uses two methods to identify alleles:

  • based on a de novo assembly, followed by a BLAST search
  • using an assembly-free approach, i.e. directly from the sequence reads
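As a rough illustration of the first, assembly-based route, the Python sketch below looks up known alleles in assembled contigs. It is deliberately simplified and is not the BioNumerics algorithm: exact substring matching on both strands stands in for the BLAST search, and the function names, locus names and sequences are all hypothetical.

```python
from typing import Dict, List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs: List[str],
                 known_alleles: Dict[str, Dict[int, str]]) -> Dict[str, Optional[int]]:
    """Report, per locus, the first known allele found in any contig.

    Simplification: exact matching replaces the similarity search, so novel
    alleles are reported as None rather than detected and named.
    """
    calls: Dict[str, Optional[int]] = {}
    for locus, alleles in known_alleles.items():
        calls[locus] = None
        for number, seq in alleles.items():
            rc = reverse_complement(seq)
            if any(seq in contig or rc in contig for contig in contigs):
                calls[locus] = number
                break
    return calls


# Hypothetical assembly (two short contigs) and allele database.
contigs = ["TTATGCGTAA", "CCGGTTAGCC"]
known_alleles = {
    "locusA": {1: "ATGCGT", 2: "ATGAGT"},
    "locusB": {1: "GCTAACC"},   # present on the reverse strand of the second contig
    "locusC": {1: "AAAAAA"},    # absent from the assembly
}

calls = call_alleles(contigs, known_alleles)
print(calls)
```

A real pipeline would additionally tolerate mismatches in order to detect new alleles, and would apply quality thresholds before accepting a call.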

Calculation engine

In BioNumerics version 7.5 and higher, demanding calculations such as de novo assemblies can be performed on an external calculation engine. You can choose between a virtually setup-free, pay-per-use cloud solution (e.g. via Amazon) and a local deployment, e.g. on a computer cluster (requires custom services).

From within BioNumerics, jobs can be posted on the calculation engine and the results from such calculations retrieved (including parameters for quality control). Only the wgMLST allelic profiles are stored in the BioNumerics database as character sets, resulting in a lightweight and responsive strain database. This also means that hardware requirements for your desktop or laptop computer running BioNumerics are kept low and that you can let the calculation engine do all the heavy lifting!

Typing schemes on different levels

Based on the loci included in the wgMLST scheme, additional typing schemes can be defined on different levels, e.g. core genome MLST (cMLST), ribosomal MLST (rMLST), etc.
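Conceptually, such a sub-scheme is a named subset of the wgMLST loci, and its profile is the wgMLST profile restricted to that subset. The following Python sketch shows the idea under stated assumptions: the locus names, allele numbers and sub-scheme memberships are invented for illustration and do not correspond to any published scheme.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical wgMLST profile of one strain: locus name -> allele number.
wgmlst_profile: Dict[str, int] = {
    "locus_0001": 4,
    "locus_0002": 1,
    "locus_0003": 7,
    "locus_0004": 2,
    "locus_0005": 3,
}

# Hypothetical sub-schemes, each defined as an ordered subset of the wgMLST loci.
subschemes: Dict[str, List[str]] = {
    "MLST": ["locus_0001", "locus_0003"],
    "rMLST": ["locus_0002", "locus_0004", "locus_0005"],
}


def subscheme_profile(profile: Dict[str, int],
                      loci: List[str]) -> Tuple[Optional[int], ...]:
    """Restrict a wgMLST profile to the loci of one sub-scheme.

    Loci without an allele call in the profile are reported as None.
    """
    return tuple(profile.get(locus) for locus in loci)


mlst_profile = subscheme_profile(wgmlst_profile, subschemes["MLST"])    # (4, 7)
rmlst_profile = subscheme_profile(wgmlst_profile, subschemes["rMLST"])  # (1, 2, 3)
print(mlst_profile, rmlst_profile)
```

Because every sub-scheme is derived from the same allele calls, no additional sequencing or allele identification is needed to report results at a lower resolution.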

The character views in BioNumerics version 7.5 and higher offer a flexible tool to select the set of loci used for typing, cluster analysis (e.g. minimum spanning trees) or statistical tests present in the software.

Why use BioNumerics for your wgMLST applications?

BioNumerics offers you:

  • A fully automated pipeline for wgMLST
  • Integrated calculation engine: cloud-based or set up on-premises
  • Lightweight sample database
  • Flexible selection of loci

Currently, fully functional wgMLST schemas are available for the following organisms:

The following wgMLST schemas are currently under development and will be available soon:

  • Francisella tularensis
  • Cronobacter
  • Enterococcus
  • Neisseria gonorrhoeae
  • Clostridium difficile
  • Vibrio
  • Yersinia
  • Acinetobacter baumannii
  • Pseudomonas aeruginosa
  • Brucella sp.

If you are working on one of these organisms or any other organism not present in this list, please do not hesitate to contact us for a possible collaboration on wgMLST analysis.

wgMLST: the future of molecular typing!

You want to try it out for yourself?

You can! Just request a BioNumerics evaluation license and download the Staphylococcus aureus or Listeria monocytogenes wgMLST demonstration database from our sample data download page. These databases allow you to get a feeling for wgMLST analysis without the need for a wgMLST project or calculation engine credits. A wgMLST tutorial is available to guide you through the process.
