Nucleati NLP Utilities

More than 30 million peer-reviewed, human-written articles constitute a massive dataset that can drive and support scientific discovery. So far, however, this knowledge base has been mined mostly by humans. We are on a mission to develop tools that extract more from it, and we are proud to make some of these utilities available to the scientific community.

Utilities to power AI-driven scientific discoveries
Convert biomedical annotation data from one format to another

NER Data Format Converter

Named entity recognition (NER) is a core information extraction step that finds relevant entity mentions in unstructured text. Different groups have built corpora for various entity extraction tasks and released them in multiple formats. The converter translates annotations from one format to another.
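
As an illustration of what such a conversion involves, here is a minimal Python sketch that turns token-level BIO tags into character-offset annotations (offset, length, type, text), the kind of standoff record used by formats such as BioC. It is not the converter's own code. The function names are hypothetical, and the sketch assumes the text can be rebuilt by joining tokens with single spaces, which real corpora often violate.

```python
from typing import Dict, List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group BIO tags into token-level spans (hypothetical helper)."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # Open a span on B-, or leniently on an I- tag with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def to_char_annotations(tokens: List[str], tags: List[str]) -> List[Dict]:
    """Convert BIO-tagged tokens to character-offset annotations.

    Assumption: the text is the tokens joined by single spaces.
    """
    offsets, pos = [], 0
    for tok in tokens:
        offsets.append(pos)
        pos += len(tok) + 1  # +1 for the assumed single space
    text = " ".join(tokens)

    annotations = []
    for start, end, label in bio_to_spans(tags):
        begin = offsets[start]
        stop = offsets[end - 1] + len(tokens[end - 1])
        annotations.append(
            {"type": label, "offset": begin, "length": stop - begin,
             "text": text[begin:stop]}
        )
    return annotations


# Toy sentence made up for illustration; it is not taken from any corpus.
tokens = ["BRCA1", "mutations", "cause", "breast", "cancer"]
tags = ["B-Gene", "O", "O", "B-Disease", "I-Disease"]
annotations = to_char_annotations(tokens, tags)
for ann in annotations:
    print(ann)
```

Going through an intermediate span representation like this keeps each format's reader and writer independent, so adding a format means writing one reader and one writer rather than a converter for every pair.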

BioC Diff

Numerical measures such as precision and recall summarize the overall performance of NER predictions, but they do not show where the actual and predicted annotations disagree. BioC Diff takes two BioC-XML files, computes the similarities and differences between them, and displays the result in an intuitive interface.

Visually compare biomedical annotations in two BioC files
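
The comparison described above can be pictured, very roughly, as a set difference over annotations. The Python sketch below is not BioC Diff's implementation: it assumes exact matching on document id, offset, length, and type, and it assumes the entity type is stored in an infon with key "type", a common convention that not every corpus follows. The two XML snippets are made-up toy inputs.

```python
import xml.etree.ElementTree as ET


def read_annotations(bioc_xml: str) -> set:
    """Collect (document id, offset, length, type) tuples from BioC XML."""
    root = ET.fromstring(bioc_xml)
    found = set()
    for doc in root.iter("document"):
        doc_id = doc.findtext("id", default="")
        for ann in doc.iter("annotation"):
            # Assumption: the entity type lives in <infon key="type">.
            infon = ann.find("infon[@key='type']")
            label = infon.text if infon is not None else ""
            for loc in ann.iter("location"):
                found.add(
                    (doc_id, int(loc.get("offset")), int(loc.get("length")), label)
                )
    return found


def diff_annotations(gold_xml: str, pred_xml: str) -> dict:
    """Split annotations into shared, gold-only, and predicted-only sets."""
    gold = read_annotations(gold_xml)
    pred = read_annotations(pred_xml)
    shared = gold & pred
    return {
        "shared": sorted(shared),
        "only_gold": sorted(gold - pred),
        "only_pred": sorted(pred - gold),
        "precision": len(shared) / len(pred) if pred else 0.0,
        "recall": len(shared) / len(gold) if gold else 0.0,
    }


# Hypothetical template for one document holding two annotations.
TEMPLATE = """<collection><document><id>d1</id><passage><offset>0</offset>
<text>BRCA1 mutations cause breast cancer</text>
<annotation id="T1"><infon key="type">Gene</infon>
<location offset="0" length="5"/><text>BRCA1</text></annotation>
<annotation id="T2"><infon key="type">Disease</infon>
<location offset="{off}" length="{length}"/><text>{text}</text></annotation>
</passage></document></collection>"""

gold_xml = TEMPLATE.format(off=22, length=13, text="breast cancer")
pred_xml = TEMPLATE.format(off=29, length=6, text="cancer")

result = diff_annotations(gold_xml, pred_xml)
for key, value in result.items():
    print(key, value)
```

In this toy case precision and recall are both 0.5, but only the gold-only and predicted-only lists show that the error is a boundary mismatch on the disease mention, which is the kind of detail a visual diff is meant to surface.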