SeqSero, launched in 2015, is a software tool for Salmonella serotype determination from whole genome sequencing (WGS) data. Despite its routine use in public health and food safety laboratories in the United States and other countries, the original SeqSero pipeline is relatively slow (minutes per genome using sequencing reads), is not optimized for draft genome assemblies, and may assign multiple serotypes for a strain. Here we present SeqSero2 (github.com/denglab/SeqSero2; denglab.info/SeqSero2), an algorithmic transformation and functional update of the original SeqSero. Major improvements include: 1) additional sequence markers for identification of Salmonella species and subspecies and certain serotypes; 2) a k-mer based algorithm for rapid serotype prediction from raw reads (seconds per genome) and improved serotype prediction from assemblies; and 3) a targeted assembly approach for specific retrieval of serotype determinants from WGS for serotype prediction, new allele discovery, and prediction troubleshooting. Evaluated using 5,794 genomes representing 364 common US serotypes, including 2,280 human isolates of 117 serotypes from the National Antimicrobial Resistance Monitoring System, SeqSero2 is up to 50 times faster than the original SeqSero while maintaining equivalent accuracy for raw reads and substantially improving accuracy for assemblies. SeqSero2 further suggested that 3% of the tested genomes contained reads from multiple serotypes, indicating a use for contamination detection. In addition to short reads, SeqSero2 demonstrated potential for accurate and rapid serotype prediction directly from long nanopore reads despite base call errors. Testing of 40 nanopore-sequenced genomes of 17 serotypes yielded a single H antigen misidentification.
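The k-mer idea mentioned above can be illustrated with a toy sketch. This is not SeqSero2's implementation: the function names, the allele names, the sequences, and the scoring rule (the fraction of an allele's k-mers found in the reads) are all assumptions chosen for illustration. A real tool would also need to handle reverse complements, sequencing errors, and a curated database of serotype determinants.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_alleles(reads, alleles, k=5):
    """Score each candidate allele by the fraction of its k-mers observed in the reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, seq in alleles.items():
        allele_kmers = kmers(seq, k)
        if allele_kmers:
            scores[name] = len(allele_kmers & read_kmers) / len(allele_kmers)
        else:
            scores[name] = 0.0
    return scores


def best_allele(reads, alleles, k=5):
    """Return the highest-scoring allele name together with all scores."""
    scores = score_alleles(reads, alleles, k)
    return max(scores, key=scores.get), scores


# Hypothetical marker sequences; these are not real Salmonella antigen alleles.
toy_alleles = {
    "allele_A": "ACGTTGCAAGGCTTAACCGT",
    "allele_B": "TTGACCGGATATCCGGTACA",
}

# Two overlapping toy "reads" drawn from allele_A.
toy_reads = ["ACGTTGCAAGGC", "GCAAGGCTTAACCGT"]

call, scores = best_allele(toy_reads, toy_alleles)
print(call, scores)
```

Because this approach only looks up short exact matches, it avoids read mapping or assembly, which is broadly why k-mer methods tend to be fast. The short `k=5` here suits the toy sequences only; practical tools use much longer k-mers.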
The full article is published by the American Society for Microbiology.