Noblis’ in-house tools such as BioVelocity and PSET enable advanced bioinformatic analysis in a variety of domains. Backed by a team of subject matter experts, Noblis can leverage these tools along with community-developed open source applications to solve a variety of complex problems quickly.
BioVelocity is a bioinformatics tool built on an innovative algorithm and approach to genomic reference indices. Using a fast and accurate hashing algorithm, BioVelocity can quickly align reads to a set of references. BioVelocity takes advantage of a CRAY-XMT2 supercomputer with four terabytes of RAM, resulting in faster speeds, increased functionality, increased throughput, and improved accuracy over current technologies. The CRAY-XMT2 enables us to use a brute-force index built from all possible base pair sequences of various k-mer lengths. This index is mapped against thousands of references, allowing k-mers to be aligned quickly across all of them simultaneously.
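To make the idea of a brute-force k-mer index concrete, here is a minimal Python sketch. It is not BioVelocity's implementation: the function names, the dictionary layout, and the toy references are all assumptions chosen for illustration, and a real index at useful k-mer lengths would need far more memory than this approach suggests.

```python
from itertools import product

def all_kmers(k):
    """Enumerate every possible DNA k-mer (4**k of them)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def build_index(references, k):
    """Map each possible k-mer to the (reference name, 0-based position) pairs where it occurs."""
    # Brute force: every possible k-mer gets a slot, even if it never occurs.
    index = {kmer: [] for kmer in all_kmers(k)}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index:  # skips k-mers containing N or other ambiguity codes
                index[kmer].append((name, i))
    return index

# Hypothetical toy references, for illustration only.
references = {"refA": "ACGTACGT", "refB": "TTACGG"}
index = build_index(references, 3)
print(len(index))    # 64 possible 3-mers
print(index["ACG"])  # [('refA', 0), ('refA', 4), ('refB', 2)]
```

Because every k-mer has a slot up front, a lookup across all references is a single dictionary access, which is the property the paragraph above describes.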
BioVelocity has a variety of functions that the user can choose from, including SNP detection, metagenomics analysis, conserved/signature sequence detection, and compression.
K-mer index sequences are first matched against the reference genome sequence; the pipeline then looks for a read that starts with similar bases. Once such a read is found, it is aligned against the reference genome at the k-mer index position. The k-mer index is searched for matching sequences from the start to the end of the reference genome.
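The seed-then-align flow described above can be sketched roughly as follows. This is an illustrative simplification, not the BioVelocity pipeline: `map_read` is a hypothetical name, the seed is assumed to be the first k bases of the read, and the placement is ungapped with a simple mismatch count.

```python
def map_read(read, reference, k):
    """Return (position, mismatches) for each ungapped placement seeded by the read's first k-mer."""
    # Index every k-mer position in the reference, scanning from start to end.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    hits = []
    seed = read[:k]
    for pos in index.get(seed, []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        hits.append((pos, mismatches))
    return hits

# Hypothetical toy data, for illustration only.
reference = "GATTACAGATTTCA"
print(map_read("GATTTC", reference, 4))  # [(0, 1), (7, 0)]
```

Here the seed `GATT` occurs twice in the reference; the placement at position 7 matches the read exactly, while the one at position 0 carries a single mismatch.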
BioVelocity performs simple alignment, in which each read aligns to the reference genome from the start to the end of the read. A Needleman-Wunsch algorithm was also added to the BioVelocity pipeline to facilitate the selection of indels.
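For readers unfamiliar with Needleman-Wunsch, the sketch below shows the textbook form of the algorithm in Python and how a gap in the alignment exposes an indel. The scoring values (match +1, mismatch -1, gap -2) are assumptions for illustration; the parameters used in BioVelocity are not stated here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1  # gap in b
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1  # gap in a
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical example: the read is missing one T relative to the reference.
nw_score, ref_aln, read_aln = needleman_wunsch("GATTACA", "GATACA")
print(nw_score)
print(ref_aln)
print(read_aln)
```

The single `-` in the aligned read marks the deleted base, which is the kind of signal an indel caller works from.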
BioVelocity read mapping conducts global pairwise alignment: without skipping any bases, the k-mer index searches the entire reference genome to find a matching sequence that is complementary to a read. Local pairwise alignment, by contrast, aligns the reads to the reference genomes at random positions where the k-mer index matches the reference genomes.
As sequencing costs drop and thousands of new complete genomes are added to public repositories every year, the landscape of ground truth for current bacterial and viral organisms is in constant flux. Many PCR assays currently used to detect agents and foodborne pathogens were designed years ago, which means they are limited by the availability of genomic data at the time of their inception. Could there be a way to use newer genomic information to ensure that your current assays are still performing as expected?
Noblis developed a tool called the PCR Signature Erosion Tool (PSET). PSET tests PCR assays in silico against the latest versions of NCBI's sequence databases to determine if they still match only to their intended targets.
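The core idea of an in silico assay check can be illustrated with a short Python sketch. This is not PSET's implementation, and how PSET performs its matching is not described here; the sketch assumes exact primer matches, a hypothetical `max_amplicon` limit, and toy sequences standing in for database records.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def amplifies(forward, reverse, template, max_amplicon=1000):
    """True if both primers match one strand exactly and bound an amplicon of at most max_amplicon bases."""
    reverse_site = reverse_complement(reverse)
    start = template.find(forward)
    while start != -1:
        end = template.find(reverse_site, start + len(forward))
        if end != -1 and end + len(reverse) - start <= max_amplicon:
            return True
        start = template.find(forward, start + 1)
    return False

def check_assay(forward, reverse, sequences, targets):
    """Return (missed targets, unintended hits) for one primer pair."""
    hits = {
        name for name, seq in sequences.items()
        if amplifies(forward, reverse, seq)
        or amplifies(forward, reverse, reverse_complement(seq))  # check the other strand too
    }
    return sorted(targets - hits), sorted(hits - targets)

# Hypothetical toy records, for illustration only.
sequences = {
    "target_old":    "TTACGTACGGGGCTGCAATT",
    "target_new":    "TTACGTTCGGGGCTGCAATT",  # mutation inside the forward primer site
    "near_neighbor": "CCACGTACAAAACTGCAAGG",
    "unrelated":     "GGGGGGGGCCCCCCCCAAAA",
}
missed, unintended = check_assay("ACGTAC", "TTGCAG", sequences,
                                 targets={"target_old", "target_new"})
print(missed)      # ['target_new']
print(unintended)  # ['near_neighbor']
```

A missed target suggests the signature has eroded as new genomes appeared, while an unintended hit suggests the assay is no longer specific to its intended targets.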
Primer3 is a tool for picking primers for PCR reactions. It considers a range of criteria such as oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities. We use Primer3 along with our signature detection process to identify potential new primer sets.
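Two of the criteria mentioned above, GC content and melting temperature, can be approximated in a few lines of Python. This sketch does not call Primer3 and does not reproduce its calculations, which rely on more detailed thermodynamic models; it uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C) as a rough stand-in, and the filter thresholds are illustrative assumptions.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule (short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def passes_basic_filters(primer, min_len=18, max_len=25, min_gc=0.40, max_gc=0.60):
    """Hypothetical pre-filter on length and GC content; thresholds are illustrative."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_content(primer) <= max_gc

candidate = "ACGTACGTACGTACGTACGT"  # hypothetical 20-base candidate
print(gc_content(candidate))            # 0.5
print(wallace_tm(candidate))            # 60
print(passes_basic_filters(candidate))  # True
```

A quick filter like this can discard obviously unsuitable candidates before a full primer design tool evaluates dimer formation and other properties.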
TensorFlow is an open source software library for numerical computation using data flow graphs. It was developed for machine learning and deep neural network research. We use TensorFlow to evaluate our algorithms for the classification of multi-contributor human DNA samples. www.tensorflow.org
kSNP v3 performs SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes.
kChooser determines the optimum k-mer size for a dataset and calculates FCK, a measure of diversity of sequences in the dataset.
Cytoscape is an open-source software platform for visualizing molecular interaction networks and biological pathways; it integrates these networks with annotations, gene expression profiles, and other state data. We use Cytoscape for many different applications, including generating temporal graphs, associating genetic drift with antimicrobial resistance, and even predicting stock market movements. It's great for visualization of discovered relationships and further analysis of relationship networks.
SeqSero is a novel web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC, and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies.