Whole genome multi locus sequence typing (wgMLST)

Multi-locus sequence typing (MLST) has proven its usefulness for molecular typing of bacteria over the last 15 years. Classical MLST schemes typically define seven loci (housekeeping genes), which are sequenced using Sanger technology. Each unique sequence of a locus is assigned an allele number, and bacterial strains are identified by their allelic profile, i.e. the combination of the seven allele numbers.
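The allele numbering idea can be illustrated with a minimal Python sketch. It is not the BIONUMERICS implementation: the class name, the locus names and the rule that numbers are handed out in order of first sighting are all assumptions made for illustration.

```python
from collections import defaultdict


class AlleleRegistry:
    """Assigns an integer allele number to each unique sequence per locus."""

    def __init__(self):
        # locus name -> {sequence: allele number}
        self._alleles = defaultdict(dict)

    def allele_number(self, locus, sequence):
        known = self._alleles[locus]
        seq = sequence.upper()
        if seq not in known:
            # Assumption: numbers start at 1 and follow order of first sighting.
            known[seq] = len(known) + 1
        return known[seq]

    def profile(self, loci, sequences_by_locus):
        """Return the allelic profile: one allele number per locus, in scheme order."""
        return tuple(
            self.allele_number(locus, sequences_by_locus[locus]) for locus in loci
        )


# Hypothetical three-locus scheme; a classical MLST scheme has seven loci.
LOCI = ("locusA", "locusB", "locusC")
registry = AlleleRegistry()
strain1 = registry.profile(LOCI, {"locusA": "ATGC", "locusB": "GGTA", "locusC": "TTAA"})
strain2 = registry.profile(LOCI, {"locusA": "ATGC", "locusB": "GGTT", "locusC": "TTAA"})
print(strain1)  # (1, 1, 1)
print(strain2)  # (1, 2, 1)
```

The two strains share alleles at locusA and locusC and differ only at locusB, which is what their profiles show.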

As next-generation sequencing, which offers a fast and cost-effective way to sequence bacterial genomes, is increasingly replacing Sanger sequencing, conventional MLST can be extended to whole genome MLST (wgMLST). Since many more loci (typically 1500 – 4000) are considered in wgMLST, a much higher typing resolution can be obtained.

In contrast to whole genome SNP analysis, wgMLST is based on the concept of allelic variation, meaning that recombinations and deletions or insertions of multiple positions are counted as single evolutionary events. This approach might be biologically more relevant than approaches that consider only point mutations.
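The difference between the two ways of counting can be sketched in a few lines of Python. The sequences and profiles below are made up for illustration, and both functions are simplified stand-ins rather than the algorithms used by any particular tool.

```python
def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def allelic_distance(profile_a, profile_b):
    """Count loci at which two allelic profiles carry different allele numbers."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Hypothetical locus in which one recombination event replaced four adjacent bases.
locus_strain1 = "ATGCCGTAAGCT"
locus_strain2 = "ATGCTACGAGCT"

snps = snp_distance(locus_strain1, locus_strain2)
# The same two strains as three-locus profiles: only the second locus differs.
alleles = allelic_distance((1, 1, 1), (1, 2, 1))

print(snps)     # 4
print(alleles)  # 1
```

A position-by-position comparison reports four differences for what was a single event, whereas the allelic comparison reports one.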

The major drawback of the technique is that it requires allele curation. In the absence of suitable automated tools, maintaining a consistent allele assignment for thousands of loci would be a daunting task. To address this, automated curation tools are provided.

BIONUMERICS for whole genome multi locus sequence typing

The wgMLST curator and WGS tools plugins

The wgMLST curator plugin offers a full range of automated curation tools needed to set up and maintain a wgMLST schema for any organism of choice.

Automatic naming of alleles
Automatic sequence type assignment
Creation of sub-schemes such as MLST, eMLST, rMLST starting from wgMLST
Access to quality control tools
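As a rough illustration of what automatic sequence type assignment involves, the sketch below gives every previously unseen allelic profile the next free sequence type (ST) number. The class and its numbering rule are assumptions for illustration only and do not describe how the plugin works internally.

```python
class SequenceTypeTable:
    """Maps each unique allelic profile to a sequence type (ST) number."""

    def __init__(self):
        self._st_by_profile = {}

    def assign(self, profile):
        key = tuple(profile)
        if key not in self._st_by_profile:
            # Assumption: a new profile receives the next unused ST number.
            self._st_by_profile[key] = len(self._st_by_profile) + 1
        return self._st_by_profile[key]


table = SequenceTypeTable()
profiles = [(1, 1, 1), (1, 2, 1), (1, 1, 1)]
sequence_types = [table.assign(p) for p in profiles]
print(sequence_types)  # [1, 2, 1]
```

The first and third strains have identical profiles and therefore receive the same sequence type.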

Access to the curated organism-specific reference database is limited to a small number of people; most users will only need the WGS tools plugin. This plugin provides a fully automated pipeline for identifying alleles from whole genome sequence data.

BIONUMERICS uses two methods to identify alleles:

based on a de novo assembly, followed by a BLAST search
using an assembly-free approach, i.e. directly from the sequence reads
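To give a feel for the assembly-based route, the sketch below looks up known allele sequences in assembled contigs. Note that it substitutes a plain exact-match search (on both strands) for the BLAST search mentioned above, so it cannot detect new or partially matching alleles. All sequences, locus names and the function name are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs, known_alleles):
    """Exact-match allele calling.

    known_alleles: {locus: {allele number: sequence}}
    Returns {locus: allele number, or None when no known allele is found}.
    """
    calls = {}
    for locus, alleles in known_alleles.items():
        calls[locus] = None
        for number, seq in alleles.items():
            rc = reverse_complement(seq)
            if any(seq in contig or rc in contig for contig in contigs):
                calls[locus] = number
                break  # Simplification: the first matching allele wins.
    return calls


contigs = ["TTTATGCCGTAAGGG", "CCCTTAACATGGG"]
known = {
    "locusA": {1: "ATGCCGTAA", 2: "ATGCTGTAA"},
    "locusB": {1: "CATGTTAA"},   # present on the reverse strand of contig 2
    "locusC": {1: "GGGGGGGG"},   # absent from the assembly
}
calls = call_alleles(contigs, known)
print(calls)  # {'locusA': 1, 'locusB': 1, 'locusC': None}
```

A similarity search is needed in practice because a genome may carry an allele that is not yet in the database; the exact-match shortcut here only keeps the example short.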

Calculation engine

In BIONUMERICS, demanding calculations such as de novo assemblies can be performed on an external calculation engine. Two options are available: a virtually setup-free, pay-per-use cloud solution (e.g. via Amazon) or a local deployment, e.g. on a computer cluster (requires custom services).

From within BIONUMERICS, jobs can be posted on the calculation engine and the results from such calculations retrieved (including parameters for quality control). Only the wgMLST allelic profiles are stored in the BIONUMERICS database as character sets, resulting in a lightweight and responsive strain database.

Typing schemes on different levels

Based on the loci included in the wgMLST scheme, additional typing schemes can be defined on different levels, e.g. core genome MLST (cgMLST), ribosomal MLST (rMLST), etc.
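Conceptually, such a sub-scheme is a named subset of the wgMLST loci. The short sketch below derives a sub-scheme profile from a full profile; the locus names and the "core" subset are invented for illustration.

```python
def subscheme_profile(wg_profile, subscheme_loci):
    """Project a wgMLST profile ({locus: allele number}) onto a subset of loci.

    Loci missing from the full profile are reported as None.
    """
    return tuple(wg_profile.get(locus) for locus in subscheme_loci)


# Hypothetical wgMLST profile for one strain.
wg_profile = {"gene1": 4, "gene2": 1, "gene3": 7, "gene4": 2}

# Hypothetical sub-scheme containing two of the four loci.
core_loci = ("gene1", "gene3")

core_profile = subscheme_profile(wg_profile, core_loci)
print(core_profile)  # (4, 7)
```

Because the sub-scheme reuses allele calls already made for the full scheme, no additional sequence analysis is needed to obtain it.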

The character views in BIONUMERICS offer a flexible tool to select the set of loci used for typing, cluster analysis (e.g. minimum spanning trees) or statistical tests present in the software.
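For readers unfamiliar with minimum spanning trees on allelic profiles, the following sketch builds one with Prim's algorithm, using the number of differing loci as the distance. It is a textbook version under simplifying assumptions (no missing data, ties broken by strain name) and not the clustering code in the software; strain names and profiles are made up.

```python
def allelic_distance(profile_a, profile_b):
    """Number of loci at which two profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on allelic distances.

    profiles: {strain name: tuple of allele numbers}
    Returns a list of (from_strain, to_strain, distance) edges.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge that links the tree to a strain outside it.
        dist, u, v = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


profiles = {
    "s1": (1, 1, 1, 1),
    "s2": (1, 1, 1, 2),
    "s3": (1, 2, 3, 2),
    "s4": (2, 2, 3, 2),
}
tree = minimum_spanning_tree(profiles)
print(tree)  # [('s1', 's2', 1), ('s2', 's3', 2), ('s3', 's4', 1)]
```

Restricting the tuples to a subset of loci before calling the function gives the tree for that subset, which mirrors the idea of choosing loci before running an analysis.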

Why use BIONUMERICS for your wgMLST applications?

BIONUMERICS offers you:

A fully automated pipeline for wgMLST
Integrated calculation engine, either cloud-based or set up on-premises
Lightweight sample database
Flexible selection of loci

Currently, fully functional wgMLST schemas are available for the following organisms:

Organism	Total # loci	Subschemas (#loci)	Reference genomes
Acinetobacter baumannii	5,633	wgMLST (5,619) MLST PubMLST Oxford* (7) MLST PubMLST Pasteur* (7)	1,734
Bacillus cereus	30,363	wgMLST (30,356) MLST PubMLST* (7)	372
Bacillus licheniformis	6,610	wgMLST (6,604) MLST PubMLST* (6)	89
Bacillus pumilus	6,068	wgMLST (6,068)	31
Bacillus subtilis	7,753	wgMLST (7,746) MLST PubMLST* (7)	101
Burkholderia cepacia complex	45,472	wgMLST (45,465) MLST PubMLST* (7)	336
Brucella spp.	5,325	TRUNK (3,246) inopinata (4,122) microti (4,102) pinnipedialis (4,133) vulpis (3,891) ceti (4,119) suis (4,191) abortus (4,096) neotomae (4,068) canis (4,183) melitensis (4,217) ovis (4,110) MLST PubMLST 9 loci* (9) MLST PubMLST 21 loci* (21)	372
Campylobacter coli - C. jejuni	3,529	wgMLST loci (3,522) Core Oxford* (1,343) rMLST (52) MLST jejuni PubMLST* (7)	96
Citrobacter spp.	29,176	wgMLST (29,169) MLST freundii PubMLST* (7)	134
Clostridium difficile	8,745	wgMLST loci (8,712) core (1,999) MLST PubMLST* (7) MLST Pasteur (7) PubMLST other loci (13) CWP cluster genes PubMLST* (6)	259
Clostridium perfringens	9,744	wgMLST (9,736) MLST PubMLST* (8)	145
Cronobacter spp.	15,739	wgMLST loci (15,727) cogMLST loci (1,865) Gcog loci (222) Ext-MLST loci (10) Tax-MLST loci (9) O-serotype loci (2) MLST PubMLST* (7)	78
Enterobacter cloacae	15,612	wgMLST (15,605) MLST PubMLST* (7)	846
Enterococcus faecalis	5,292	wgMLST (5,285) MLST PubMLST* (7)	493
Enterococcus faecium	5,496	wgMLST (5,489) MLST PubMLST* (7)	510
Enterococcus raffinosus	4,470	wgMLST (4,470)	4
Escherichia coli / Shigella	17,380	wgMLST loci (17,350) core Enterobase (2,513) MLST Pasteur (8) MLST Whittam (15) MLST PubMLST Achtman (7)	289
Francisella tularensis	2,808	wgMLST (2,808) core (1,147)	226
Fusobacterium spp.	20,024	wgMLST (20,024)	169
Klebsiella aerogenes	14,229	wgMLST (14,222) MLST PubMLST* (7)	116
Klebsiella oxytoca	16,277	wgMLST (16,270) MLST PubMLST* (7)	84
Klebsiella pneumoniae	19,729	wgMLST loci (19,720) Core (634) MLST Pasteur (7) Wzc (1) wzi (1)	67
Lactobacillus sanfranciscensis	1,797	wgMLST (1,797)	20
Lactococcus lactis	8,800	wgMLST (8,800)	171
Legionella pneumophila	5,777	wgMLST loci (5,770) core (1,521) SBT (7)	32
Leuconostoc spp.	16,274	wgMLST (16,274)	113
Listeria monocytogenes	4,804	wgMLST loci (4,797) Core Pasteur (1,748) MLST PubMLST (7)	150
Micrococcus spp.	8,207	wgMLST (8,207)	35
Mycobacterium bovis	4,701	wgMLST (4,701)	93
Mycobacterium kansasii	8,629	wgMLST (8,629)	28
Mycobacterium leprae	2,237	wgMLST (2,237)	5
Mycobacterium tuberculosis	4,032	Core (v2) (2,891)	46
Neisseria gonorrhoeae	2,431	wgMLST (2,424) MLST PubMLST* (7)	11
Neisseria meningitidis	2,909	wgMLST (2,887) core (1,587) eMLST partial genes pubMLST* (20) MLST PubMLST* (7)	357
Paenibacillus larvae	5,745	wgMLST (5,738) MLST PubMLST* (7)	126
Pasteurella multocida	3,803	wgMLST (3,789) MLST multihost PubMLST* (7) MLST RIRDC PubMLST* (7)	157
Proteus vulgaris	5,201	wgMLST (5,201)	7
Pseudomonas aeruginosa	15,143	wgMLST (15,136) MLST PubMLST* (7)	400
Salmonella enterica	15,874	wgMLST (15,867) Core Enterobase (3,002) MLST PubMLST* (7) MLST PubMLST Achtman (7)	260
Serratia marcescens	9,377	wgMLST (9,377)	299
Staphylococcus aureus	3,904	wgMLST loci (3,897) Core loci (1,861) MLST PubMLST* (7)	31
Staphylococcus epidermidis	4,877	wgMLST (4,870) MLST PubMLST* (7)	312
Staphylococcus pseudintermedius	4,431	wgMLST (4,424) MLST PubMLST* (7)	118
Stenotrophomonas maltophilia	17,603	wgMLST (17,596) MLST PubMLST* (7)	179
Streptococcus agalactiae	4,052	wgMLST (4,045) MLST PubMLST* (7)	153
Streptococcus mitis / oralis	10,389	wgMLST (10,382) MLST oralis PubMLST* (7)	171
Streptococcus pyogenes	2,735	wgMLST (2,728) MLST PubMLST* (7)	304
Weissella spp.	24,004	wgMLST (24,004)	80
Yersinia spp.	20,658	wgMLST (20,638) MLST Yersinia spp. PubMLST* (7) MLST Y. ruckeri PubMLST* (6) cgMLST Yersinia spp. Pasteur (500) cgMLST Y. enterocolitica Pasteur (1,727) cgMLST Y. pseudotuberculosis Pasteur** (1,921)	88

*: Allelic profiles are synchronized with public nomenclature hosted on pubmlst.org

**: Allelic profiles are synchronized with public nomenclature hosted on bigsdb.pasteur.fr

All wgMLST schemas listed above are automatically curated. The last curation was performed on 2020-12-20.

You want to try it out for yourself?

You can! Several wgMLST tutorials are available to guide you through the process.

Supported in BIONUMERICS configurations:

BIONUMERICS-SEQ

BIONUMERICS-SUITE
