Tools

Bio Tools at Noblis


Noblis’ in-house tools, such as BioVelocity and PSET, enable advanced bioinformatic analysis in a variety of domains. Backed by a team of subject matter experts, Noblis combines these tools with community-developed open source applications to solve complex problems quickly.

  • We have experts in bioinformatics, data science, microbiology, and software development who work in concert to provide holistic solutions to even the most difficult problems.
  • Noblis-developed tools run on high-performance computing architecture and excel at analyzing very large datasets such as next-generation sequencing reads.

Diagram showing the relationship between SMEs, open source tools, and Noblis tools, and how they combine to make the Noblis performance space
BioVelocity

BioVelocity is a bioinformatics tool based on an innovative algorithm and approach to genomic reference indices. Using a fast and accurate hashing algorithm, BioVelocity can quickly align reads to a set of references. BioVelocity takes advantage of a CRAY-XMT2 supercomputer with four terabytes of RAM, resulting in faster speeds, increased functionality, increased throughput, and improved accuracy over current technologies. The CRAY-XMT2 enables us to use a brute-force index, built out of all possible base pair sequences of various k-mer lengths. This index is mapped against thousands of references, which allows the k-mers to be aligned against all of them simultaneously.
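As a rough illustration of the idea of a brute-force index, the sketch below allocates one slot for every possible k-mer and records where each k-mer occurs across a set of references. It is a toy under stated assumptions, not BioVelocity's implementation: the function names, the two-bit base encoding, and the example sequences are all hypothetical.

```python
# Toy brute-force k-mer index: one slot per possible k-mer (4**k slots).
# Names, encoding, and sequences are illustrative assumptions only.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base.

    Returns None if the k-mer contains a character other than A, C, G, T.
    """
    value = 0
    for base in kmer:
        code = BASE_CODE.get(base)
        if code is None:
            return None
        value = value * 4 + code
    return value


def build_index(references, k):
    """Return a table where table[slot] lists (reference name, offset) hits."""
    table = [[] for _ in range(4 ** k)]
    for name, seq in references.items():
        for pos in range(len(seq) - k + 1):
            slot = encode_kmer(seq[pos:pos + k])
            if slot is not None:
                table[slot].append((name, pos))
    return table


references = {"refA": "ACGTACGTGA", "refB": "TTACGTGACC"}
table = build_index(references, k=4)
hits = table[encode_kmer("CGTG")]  # every place "CGTG" occurs, in all references
```

Because the table has 4**k slots regardless of the input, memory grows exponentially with k, which is one reason a design like this would benefit from a machine with a very large shared memory.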

BioVelocity has a variety of functions that the user can choose from, including SNP detection, metagenomics analysis, conserved/signature sequence detection, and compression.

K-mer index sequences are first matched against the reference genome sequence; the pipeline then looks for a read that starts with similar bases. Once a read with a similar sequence is found, it is aligned against the reference genome at the k-mer index position. In this case, the k-mer index looks for matching sequences from the start to the end of the reference genome.
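The seeding step described above can be sketched as follows: index the reference by k-mer, look up the first k bases of a read, and then compare the whole read against the reference at each seed position without gaps. This is a minimal sketch; the function names, the mismatch threshold, and the sequences are assumptions made for illustration, not details of the BioVelocity pipeline.

```python
# Seed-and-place sketch: the read's leading k-mer picks candidate positions,
# then the full read is compared to the reference there (no gaps allowed).
# All names, thresholds, and sequences are hypothetical.

def seed_positions(reference, k):
    """Map each k-mer in the reference to the offsets where it starts."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def place_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for each acceptable ungapped placement."""
    placements = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            placements.append((pos, mismatches))
    return placements


reference = "GATTACAGATTTCAGG"
index = seed_positions(reference, k=4)
placements = place_read("GATTTCTG", reference, index, k=4)
```

Here the seed "GATT" occurs twice in the reference, but only one of the two candidate positions keeps the full read within the mismatch limit.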

BioVelocity performs simple alignment, in which each read aligns to the reference genome along its full length, from start to finish. A Needleman-Wunsch algorithm was also added to the BioVelocity pipeline to facilitate the selection of indels.
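For readers unfamiliar with Needleman-Wunsch, here is a textbook version of the algorithm showing how a gapped global alignment exposes an indel. The scoring values (match, mismatch, gap) and the sequences are assumptions chosen for the example; the source does not state which parameters BioVelocity uses or how the algorithm is integrated.

```python
# Textbook Needleman-Wunsch global alignment with a linear gap penalty.
# Scoring values and sequences are illustrative assumptions.

def needleman_wunsch(ref, read, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_ref, aligned_read)."""
    def sub(i, j):
        return match if ref[i - 1] == read[j - 1] else mismatch

    rows, cols = len(ref) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two bases
                score[i - 1][j] + gap,            # gap in the read
                score[i][j - 1] + gap,            # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(ref), len(read)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(ref[i - 1])
            bottom.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(ref[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(read[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


# The read is missing one base relative to the reference (a deletion).
best, aligned_ref, aligned_read = needleman_wunsch("ACGTTGCA", "ACGTGCA")
```

The "-" character in the aligned read marks the position of the deletion, which an ungapped start-to-finish comparison could not represent.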

BioVelocity read mapping conducts global pairwise alignment: without skipping any bases, the k-mer index searches the entire reference genome to find a sequence that matches a read. Local pairwise alignment, by contrast, aligns the reads to the reference genomes at whichever positions the k-mer indexes match the reference genomes.

PSET

As sequencing costs drop and thousands of new complete genomes are added to public repositories every year, the landscape of ground truth for current bacterial and viral organisms is in constant flux. Many PCR assays currently used to detect agents and foodborne pathogens were designed years ago, which means they are limited by the availability of genomic data at the time of their inception. Could there be a way to use newer genomic information to ensure that your current assays are still performing as expected?

Noblis developed a tool called the PCR Signature Erosion Tool (PSET). PSET tests PCR assays in silico against the latest versions of NCBI's sequence databases to determine if they still match only to their intended targets.

  • As NCBI's databases and other public databases are updated over time, newly added strain genomes can highlight where primers and probes may no longer be functional or where PCR assays may detect previously unsequenced near neighbors.
  • Using this information, an assay provider can be better aware of potential false hits and be better prepared to design new primers when false hits become intractable.
  • The Army’s Defense Biological Products Assurance Office currently uses PSET to periodically test their assay collection.
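To make the idea of testing an assay in silico concrete, the sketch below checks whether a primer pair would still be predicted to produce an amplicon from a given sequence, tolerating a small number of mismatches. The source does not describe how PSET performs its matching, so this is not PSET's method; the function names, mismatch limit, amplicon length limit, and all sequences are invented for illustration.

```python
# Toy in silico PCR check. Not PSET's algorithm: names, thresholds, and
# sequences are hypothetical and only illustrate signature erosion.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer, template, max_mismatches):
    """Return offsets where the primer matches with at most max_mismatches."""
    sites = []
    for pos in range(len(template) - len(primer) + 1):
        window = template[pos:pos + len(primer)]
        if sum(a != b for a, b in zip(primer, window)) <= max_mismatches:
            sites.append(pos)
    return sites


def predicts_amplicon(forward, reverse, template, max_mismatches=1, max_length=1000):
    """True if both primers bind in a compatible orientation and distance."""
    reverse_site = reverse_complement(reverse)
    for start in find_sites(forward, template, max_mismatches):
        for rev_start in find_sites(reverse_site, template, max_mismatches):
            end = rev_start + len(reverse_site)
            if rev_start >= start + len(forward) and end - start <= max_length:
                return True
    return False


forward, reverse = "ACGTAC", "TTGCAG"
original_strain = "ACGTACGGGTTTAAACCTGCAA"  # sequence the assay was designed on
new_strain = "ACCTTCGGGTTTAAACCTGCAA"       # two changes in the forward site

still_detects_original = predicts_amplicon(forward, reverse, original_strain)
still_detects_new = predicts_amplicon(forward, reverse, new_strain)
```

Running the same check against non-target sequences would flag the opposite failure, where an assay starts to match a near neighbor it was never meant to detect.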

Open Source Tools


PRIMER3

Primer3 is a tool for picking primers for PCR reactions. It considers a range of criteria such as oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities. We use Primer3 along with our signature detection process to identify potential new primer sets.

http://primer3.sourceforge.net/
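Two of the criteria mentioned above, GC content and melting temperature, can be approximated in a few lines. The sketch below uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C) for melting temperature; this is a rough rule of thumb and is not how Primer3 computes it. The function names and the acceptance thresholds are hypothetical, not Primer3 defaults.

```python
# Rough primer screening with GC content and the Wallace rule.
# This is not Primer3's calculation; thresholds below are assumptions.

def gc_content(primer):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def passes_basic_checks(primer, min_len=18, max_len=25, min_gc=0.40, max_gc=0.60):
    """Apply hypothetical length and GC-content limits to a candidate primer."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_content(primer) <= max_gc)


candidate = "ATGCGTACCTGAAGCTTGAC"
candidate_gc = gc_content(candidate)
candidate_tm = wallace_tm(candidate)
candidate_ok = passes_basic_checks(candidate)
```

A production workflow would delegate these checks, along with primer-dimer screening, to Primer3 itself rather than reimplement them.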

TENSORFLOW

TensorFlow is an open source software library for numerical computation using data flow graphs. It was developed for conducting machine learning and deep neural network research. We use TensorFlow to evaluate our algorithms for the classification of multi-contributor human DNA samples.

http://www.tensorflow.org

kSNP and kChooser

kSNP v3 performs SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes.

kChooser determines the optimum k-mer size for a dataset and calculates FCK, a measure of diversity of sequences in the dataset.

https://sourceforge.net/projects/ksnp/
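The text above does not define FCK. Assuming it stands for the fraction of core k-mers, meaning k-mers found in every genome of the dataset, a toy calculation might look like the sketch below. The function names, the choice of denominator (all distinct k-mers across the dataset), and the sequences are assumptions, and kChooser's actual procedure may differ.

```python
# Toy "fraction of core k-mers" calculation. The definition used here is an
# assumption about what FCK means; it is not taken from kChooser's code.

def kmer_set(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_core_kmers(genomes, k):
    """Share of distinct k-mers that occur in every genome."""
    sets = [kmer_set(genome, k) for genome in genomes]
    core = set.intersection(*sets)
    everything = set.union(*sets)
    return len(core) / len(everything) if everything else 0.0


genomes = ["ACGTAC", "ACGTTC"]
fck = fraction_core_kmers(genomes, k=3)
```

Under this definition a value near 1 means the genomes share almost all of their k-mers, while a value near 0 indicates a highly diverse dataset.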

Cytoscape

Cytoscape is an open-source software platform for visualizing molecular interaction networks and biological pathways; it integrates these networks with annotations, gene expression profiles, and other state data. We use Cytoscape for many different applications, including generating temporal graphs, associating genetic drift with antimicrobial resistance, and even predicting stock market movements. It's great for visualization of discovered relationships and further analysis of relationship networks.

www.cytoscape.org

Serovar Identification Tool

SeqSero is a novel web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC, and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies.

http://www.denglab.info/SeqSero