Whole genome multi locus sequence typing (wgMLST)

Multi-locus sequence typing (MLST) has proven its usefulness for molecular typing of bacteria over the last 15 years. Classical MLST schemes typically define seven loci (housekeeping genes), which are sequenced using Sanger technology. Each unique sequence at a locus is assigned an allele number, and a bacterial strain is identified by its allelic profile, i.e. the combination of its seven allele numbers.
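
The allele numbering described above can be illustrated with a minimal sketch. The locus names, the toy sequences and the AlleleRegistry class are hypothetical and only serve the illustration; they are not part of BIONUMERICS or of any public MLST database.

```python
from typing import Dict, Tuple

# Hypothetical locus names; a classical MLST scheme would list seven loci.
LOCI = ["locusA", "locusB", "locusC"]


class AlleleRegistry:
    """Gives each distinct sequence seen at a locus its own allele number."""

    def __init__(self) -> None:
        # locus name -> {sequence -> allele number}
        self._alleles: Dict[str, Dict[str, int]] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        known = self._alleles.setdefault(locus, {})
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1  # next free number for this locus
        return known[seq]

    def profile(self, strain: Dict[str, str]) -> Tuple[int, ...]:
        """Allelic profile: one allele number per locus, in scheme order."""
        return tuple(self.allele_number(locus, strain[locus]) for locus in LOCI)


registry = AlleleRegistry()
# Toy sequences, far shorter than real housekeeping genes.
strain_1 = {"locusA": "ATGGCA", "locusB": "ATGTTT", "locusC": "ATGCCC"}
strain_2 = {"locusA": "ATGGCA", "locusB": "ATGTTA", "locusC": "ATGCCC"}
profile_1 = registry.profile(strain_1)
profile_2 = registry.profile(strain_2)
print(profile_1, profile_2)
```

In this toy example the two strains share their alleles at locusA and locusC and differ only at locusB, so their profiles differ in a single position.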

As next-generation sequencing, which offers a fast and cost-effective way to sequence bacterial genomes, is increasingly replacing Sanger sequencing, conventional MLST can be extended to whole genome MLST (wgMLST). Since many more loci (typically 1500 – 4000) are considered in wgMLST, a much higher typing resolution can be obtained.

In contrast to whole genome SNP analysis, wgMLST is based on the concept of allelic variation, meaning that recombinations and deletions or insertions of multiple positions are counted as single evolutionary events. This approach might be biologically more relevant than approaches that consider only point mutations.
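
To make the difference concrete, the sketch below compares the two ways of counting on toy data. The function names, the sequences and the convention of skipping loci with a missing call (None) are assumptions made for this illustration, not a description of how BIONUMERICS computes distances.

```python
from typing import Optional, Sequence


def allelic_distance(profile_a: Sequence[Optional[int]],
                     profile_b: Sequence[Optional[int]]) -> int:
    """Number of loci with different alleles; loci missing in either profile are skipped."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def point_differences(seq_a: str, seq_b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# Toy locus in which a short stretch was replaced, e.g. by recombination.
locus_in_strain_1 = "ATGGCATTACGA"
locus_in_strain_2 = "ATGGCTAAGCGA"

# The same event seen at the profile level: only the second locus differs.
profile_1 = (1, 4, 2)
profile_2 = (1, 5, 2)

snp_like = point_differences(locus_in_strain_1, locus_in_strain_2)
allele_based = allelic_distance(profile_1, profile_2)
print(snp_like, allele_based)
```

The replaced stretch contributes four point differences but only one allelic difference, which is the sense in which a multi-position change counts as a single event.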

The major drawback of the technique is that it requires allele curation. In the absence of suitable automated tools, maintaining a consistent allele assignment for thousands of loci would be a daunting task. To address this, automated curation tools are provided.

BIONUMERICS for whole genome multi locus sequence typing

The wgMLST curator and WGS tools plugins

The wgMLST curator plugin offers a full range of automated curation tools needed to set up and maintain a wgMLST schema for any organism of choice.

  • Automatic naming of alleles
  • Automatic sequence type assignment
  • Creation of sub-schemes such as MLST, eMLST, rMLST starting from wgMLST
  • Access to quality control tools
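
As a rough illustration of two of the items above, the sketch below derives a sub-scheme profile from a wgMLST profile and assigns sequence type numbers to distinct profiles. The locus names, the choice of sub-scheme loci and the SequenceTypeTable class are hypothetical; the plugin's actual curation logic is not described here and may work differently.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical locus names; a real wgMLST scheme contains thousands of loci.
WGMLST_LOCI = ["locus_0001", "locus_0002", "locus_0003", "locus_0004", "locus_0005"]
# Assumed sub-scheme: a subset of the wgMLST loci.
SUBSCHEME_LOCI = ["locus_0001", "locus_0002", "locus_0003"]


def sub_profile(profile: Dict[str, int], loci: Sequence[str]) -> Tuple[int, ...]:
    """Restrict a wgMLST profile to the loci of a sub-scheme."""
    return tuple(profile[locus] for locus in loci)


class SequenceTypeTable:
    """Assigns the next free sequence type number to each new profile."""

    def __init__(self) -> None:
        self._st_by_profile: Dict[Tuple[int, ...], int] = {}

    def assign(self, profile: Sequence[int]) -> int:
        key = tuple(profile)
        if key not in self._st_by_profile:
            self._st_by_profile[key] = len(self._st_by_profile) + 1
        return self._st_by_profile[key]


strain_a = dict(zip(WGMLST_LOCI, (1, 3, 2, 7, 1)))
strain_b = dict(zip(WGMLST_LOCI, (1, 3, 2, 8, 1)))  # differs outside the sub-scheme
strain_c = dict(zip(WGMLST_LOCI, (2, 3, 2, 7, 1)))  # differs inside the sub-scheme

sub_table = SequenceTypeTable()
sub_types = [
    sub_table.assign(sub_profile(s, SUBSCHEME_LOCI))
    for s in (strain_a, strain_b, strain_c)
]

wg_table = SequenceTypeTable()
wg_types = [
    wg_table.assign(sub_profile(s, WGMLST_LOCI))
    for s in (strain_a, strain_b, strain_c)
]
print(sub_types, wg_types)
```

Strains a and b fall into the same type under the small sub-scheme but are separated by the full scheme, which reflects the higher resolution obtained when more loci are considered.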

Whereas only a limited number of people will have access to the curated organism-specific reference database, most users will only need the WGS tools plugin. This plugin provides a fully automated pipeline for identifying alleles based on whole genome sequence data.

BIONUMERICS uses two methods to identify alleles:

  • based on a de novo assembly, followed by a BLAST search
  • using an assembly-free approach, i.e. directly from the sequence reads
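
The general idea of calling alleles from assembled contigs can be sketched as follows. This is a deliberately simplified stand-in that uses exact substring matching on both strands; it is not BLAST and not the algorithm used by the WGS tools plugin. The function names, the toy allele database and the contigs are hypothetical.

```python
from typing import Dict, List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs: List[str],
                 allele_db: Dict[str, Dict[int, str]]) -> Dict[str, Optional[int]]:
    """For each locus, report the first known allele found verbatim in any contig.

    A locus without an exact hit is reported as None. A real pipeline would
    also have to handle near matches, novel alleles and multiple hits.
    """
    calls: Dict[str, Optional[int]] = {}
    for locus, alleles in allele_db.items():
        calls[locus] = None
        for number, allele_seq in alleles.items():
            rc = reverse_complement(allele_seq)
            if any(allele_seq in contig or rc in contig for contig in contigs):
                calls[locus] = number
                break
    return calls


# Toy allele database: locus -> {allele number -> sequence}.
allele_db = {
    "locusA": {1: "ATGGCATAA", 2: "ATGGCTTAA"},
    "locusB": {1: "ATGAAATAG"},
    "locusC": {1: "ATGCCCTGA"},
}
# Toy contigs: the first carries locusA allele 1 on the forward strand,
# the second carries locusB allele 1 on the reverse strand.
contigs = ["TTTATGGCATAACCC", "GGCTATTTCATGG"]

calls = call_alleles(contigs, allele_db)
print(calls)
```

Exact matching only recognizes alleles that are already in the database, which is one reason a similarity search is needed in practice to detect new alleles.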

Calculation engine

In BIONUMERICS, demanding calculations such as de novo assemblies can be performed on an external calculation engine. Users can choose between a virtually setup-free, pay-per-use cloud solution (e.g. via Amazon) and a local deployment, e.g. on a computer cluster (requires custom services).

From within BIONUMERICS, jobs can be posted on the calculation engine and the results from such calculations retrieved (including parameters for quality control). Only the wgMLST allelic profiles are stored in the BIONUMERICS database as character sets, resulting in a lightweight and responsive strain database.

Typing schemes on different levels

Based on the loci included in the wgMLST scheme, additional typing schemes can be defined on different levels, e.g. core genome MLST (cgMLST), ribosomal MLST (rMLST), etc.

The character views in BIONUMERICS offer a flexible tool to select the set of loci used for typing, cluster analysis (e.g. minimum spanning trees) or statistical tests present in the software.
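
The effect of choosing a set of loci before clustering can be sketched with a small minimum spanning tree example. The code uses Prim's algorithm on pairwise allelic distances; the list of column indices is a hypothetical stand-in for a locus selection and does not represent the character view interface of BIONUMERICS, and the strain names and profiles are toy data.

```python
from typing import Dict, List, Sequence, Tuple


def select_loci(profile: Sequence[int], loci: Sequence[int]) -> Tuple[int, ...]:
    """Keep only the alleles at the chosen locus positions."""
    return tuple(profile[i] for i in loci)


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two profiles carry different alleles."""
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(profiles: Dict[str, Sequence[int]]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm; returns edges as (strain in tree, strain added, distance)."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = [names[0]]
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to a strain not yet in it
        # (ties are broken alphabetically by strain name).
        dist, u, v = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.append(v)
    return edges


profiles = {
    "strain_A": (1, 1, 1, 1, 1),
    "strain_B": (1, 1, 1, 2, 1),
    "strain_C": (1, 2, 1, 2, 2),
    "strain_D": (3, 2, 1, 2, 2),
}

full_tree = minimum_spanning_tree(profiles)

# Hypothetical locus selection: only the first three loci.
chosen_loci = [0, 1, 2]
reduced = {name: select_loci(p, chosen_loci) for name, p in profiles.items()}
reduced_tree = minimum_spanning_tree(reduced)
print(full_tree)
print(reduced_tree)
```

With all five loci every strain is distinguishable, whereas with the reduced selection strain_A and strain_B collapse to a distance of zero, so the tree topology depends on which loci are included.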

Why use BIONUMERICS for your wgMLST applications?

BIONUMERICS offers you:

  • A fully automated pipeline for wgMLST
  • Integrated calculation engine: cloud-based or set up on-premises
  • Lightweight sample database
  • Flexible selection of loci

Currently, fully functional wgMLST schemas are available for the following organisms:

Organism | Total # loci | Subschemas (# loci) | Reference genomes
Acinetobacter baumannii | 5,633 | wgMLST (5,619); MLST PubMLST Oxford (7); MLST PubMLST Pasteur (7) | 1,734
Bacillus cereus | 30,363 | wgMLST (30,356); MLST PubMLST (7) | 372
Bacillus subtilis | 7,753 | wgMLST (7,746); MLST PubMLST (7) | -
Burkholderia cepacia complex | 45,472 | wgMLST (45,465); MLST PubMLST (7) | 336
Brucella spp. | 5,325 | TRUNK (3,246); inopinata (4,122); microti (4,102); pinnipedialis (4,133); vulpis (3,891); ceti (4,119); suis (4,191); abortus (4,096); neotomae (4,068); canis (4,183); melitensis (4,217); ovis (4,110); MLST PubMLST 9 loci (9); MLST PubMLST 21 loci (21) | 372
Campylobacter coli - C. jejuni | 3,529 | wgMLST loci (3,522); Core Oxford (1,343); rMLST (52); MLST jejuni PubMLST (7) | 96
Citrobacter spp. | 29,176 | wgMLST (29,169); MLST freundii PubMLST (7) | 134
Clostridium difficile | 8,745 | wgMLST loci (8,712); core (1,999); MLST PubMLST (7); MLST Pasteur (7); PubMLST other loci (13); CWP cluster genes PubMLST (6) | 259
Cronobacter spp. | 15,739 | wgMLST loci (15,727); cogMLST loci (1,865); Gcog loci (222); Ext-MLST loci (10); Tax-MLST loci (9); O-serotype loci (2); MLST PubMLST (7) | 78
Enterobacter cloacae | 15,612 | wgMLST (15,605); MLST PubMLST (7) | 846
Enterococcus faecalis | 5,292 | wgMLST (5,285); MLST PubMLST (7) | 493
Enterococcus faecium | 5,496 | wgMLST (5,489); MLST PubMLST (7) | 510
Enterococcus raffinosus | 4,470 | wgMLST (4,470) | 4
Escherichia coli / Shigella | 17,380 | wgMLST loci (17,350); core Enterobase (2,513); MLST Pasteur (8); MLST Whittam (15); MLST PubMLST Achtman (7) | 289
Francisella tularensis | 2,808 | wgMLST (2,808); core (1,147) | 226
Klebsiella aerogenes | 14,229 | wgMLST (14,222); MLST PubMLST (7) | 116
Klebsiella oxytoca | 16,277 | wgMLST (16,270); MLST PubMLST (7) | 84
Klebsiella pneumoniae | 19,729 | wgMLST loci (19,720); Core (634); MLST Pasteur (7); Wzc (1); wzi (1) | 67
Lactobacillus sanfranciscensis | 1,797 | wgMLST (1,797) | -
Legionella pneumophila | 5,777 | wgMLST loci (5,770); core (1,521); SBT (7) | 32
Leuconostoc spp. | 16,274 | wgMLST (16,274) | -
Listeria monocytogenes | 4,804 | wgMLST loci (4,797); Core Pasteur (1,748); MLST PubMLST (7) | 150
Micrococcus spp. | 8,207 | wgMLST (8,207) | -
Mycobacterium bovis | 4,701 | wgMLST (4,701) | -
Mycobacterium kansasii | 8,629 | wgMLST (8,629) | -
Mycobacterium leprae | 2,237 | wgMLST (2,237) | -
Mycobacterium tuberculosis | 4,032 | Core (v2) (2,891) | 46
Neisseria gonorrhoeae | 2,431 | wgMLST (2,424); MLST PubMLST (7) | 11
Neisseria meningitidis | 2,909 | wgMLST (2,887); core (1,587); eMLST partial genes pubMLST (20); MLST PubMLST (7) | -
Pasteurella multocida | 3,803 | wgMLST (3,789); MLST multihost PubMLST (7); MLST RIRDC PubMLST (7) | -
Proteus vulgaris | 5,201 | wgMLST (5,201) | -
Pseudomonas aeruginosa | 15,143 | wgMLST (15,136); MLST PubMLST (7) | 400
Salmonella enterica | 15,874 | wgMLST (15,867); Core Enterobase (3,002); MLST PubMLST (7) | 260
Serratia marcescens | 9,377 | wgMLST (9,377) | 299
Staphylococcus aureus | 3,904 | wgMLST loci (3,897); Core loci (1,861); MLST PubMLST (7) | 31
Staphylococcus epidermidis | 4,877 | wgMLST (4,870); MLST PubMLST (7) | -
Staphylococcus pseudintermedius | 4,431 | wgMLST (4,424); MLST PubMLST (7) | -
Stenotrophomonas maltophilia | 17,603 | wgMLST (17,596); MLST PubMLST (7) | -
Streptococcus agalactiae | 4,052 | wgMLST (4,045); MLST PubMLST (7) | -
Streptococcus mitis / oralis | 10,389 | wgMLST (10,382); MLST oralis PubMLST (7) | -
Streptococcus pyogenes | 2,735 | wgMLST (2,728); MLST PubMLST (7) | -
Weissella spp. | 24,004 | wgMLST (24,004) | -

All wgMLST schemas mentioned above are automatically curated.

The following wgMLST schemas are currently under development and will be available soon:

  • Bordetella pertussis
  • Clostridium botulinum
  • Mycobacterium abscessus
  • Vibrio spp.
  • Yersinia spp.

If you are working on one of these organisms or any other organism not present in this list, please do not hesitate to contact us for a possible collaboration on wgMLST analysis.

wgMLST: the future of molecular typing!

You want to try it out for yourself?

You can! Just request a BIONUMERICS evaluation license and download the Brucella, Escherichia coli, Listeria monocytogenes, or Staphylococcus aureus wgMLST demonstration database from our sample data download page. These databases allow you to get a feeling for wgMLST analysis without the need for a wgMLST project or calculation engine credits. Several wgMLST tutorials are available to guide you through the process.

Supported in BIONUMERICS configurations: 
BIONUMERICS-SEQ
BIONUMERICS-SEQ with COMPLIANT option
BIONUMERICS-SUITE
BIONUMERICS-SUITE with COMPLIANT option