Data Release Notes


Versioning Format

Version numbers follow a three-part structure: X.Y.Z.

  • The first number (X) increments annually, typically at the start of a new project year.
  • The second number (Y) increments with each new release.
  • The third number (Z) increments for bug fixes or minor changes.
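
As an illustration of the scheme above, the following minimal Python sketch parses X.Y.Z strings and compares them numerically. The Version class and its field names are hypothetical and are not part of the BiomarkerKB codebase; the sketch assumes each part is a non-negative integer.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A release version in X.Y.Z form (hypothetical helper)."""

    year: int     # X: increments with the project year
    release: int  # Y: increments with each new release
    patch: int    # Z: increments for bug fixes or minor changes

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = text.strip().split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected X.Y.Z, got {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"{self.year}.{self.release}.{self.patch}"


# Tuples compare element by element, so ordering is numeric, not alphabetical.
versions = ["1.0.0", "1.0.2", "1.0.1"]
latest = max(Version.parse(v) for v in versions)
print(latest)
```

Comparing parsed tuples rather than raw strings keeps a version such as 1.10.0 ordered after 1.9.0.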

Version 1.0.2

Data Updates

Backend and Infrastructure Updates

  • evidence_source database names now retain their original casing for accuracy and consistency.
  • EDRN identifiers were added to the namespace map.
  • HUGO Gene Nomenclature Committee (HGNC) was added to the cross-reference JSON file.
  • Fixed an issue where evidence_source values without tags were dropped; these values are now preserved.
  • Added a user-guided spelling correction function to improve data entry quality.
  • The TSV-to-JSON converter now automatically checks for header spelling errors.
  • Introduced _suggest_header_corrections to flag and propose fixes for misspelled headers.
  • Enhanced _stream_tsv with a call to _check_header_spelling to prevent invalid headers from being processed.
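
The header spelling check described above could look roughly like the sketch below. It is not the converter's actual code: the function bodies, the EXPECTED_HEADERS list, and the similarity cutoff are all assumptions, and the sketch uses the standard library's difflib for fuzzy matching, which the real implementation may or may not use.

```python
import difflib

# Hypothetical list of valid headers; the converter's real list is not shown here.
EXPECTED_HEADERS = ["biomarker_id", "biomarker", "evidence_source"]


def suggest_header_corrections(headers, expected=EXPECTED_HEADERS, cutoff=0.8):
    """Map each unrecognized header to its closest expected header, or None."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # header is already valid
        matches = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions


def check_header_spelling(headers, expected=EXPECTED_HEADERS):
    """Raise ValueError if any header is unrecognized, listing proposed fixes."""
    suggestions = suggest_header_corrections(headers, expected)
    if suggestions:
        details = ", ".join(
            f"{bad!r} (did you mean {fix!r}?)" if fix else f"{bad!r} (no suggestion)"
            for bad, fix in suggestions.items()
        )
        raise ValueError(f"invalid TSV headers: {details}")


print(suggest_header_corrections(["biomarker_id", "evidense_source"]))
```

Running the check before any rows are streamed means a misspelled column is reported once, up front, rather than producing records with missing fields.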

Version 1.0.1

Data Updates

  • Added xrefs.tsv to the list of datasets.

Backend and Infrastructure Updates

  • Fixed ID formatting issues in NCBI and UniProt references within oncomx.tsv by removing erroneous spaces (e.g., NCBI: 3288 → NCBI:3288) and extraneous text (e.g., "(composition)"). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
  • Merged assessed entity type synonyms.
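
The kind of ID cleanup described in the first item above can be sketched in Python as follows. The normalize_xref function is hypothetical and only illustrates the two fixes named in the note (spaces around the colon and trailing parenthetical text); the UniProt accession in the usage line is a made-up example, not a value from oncomx.tsv.

```python
import re


def normalize_xref(raw: str) -> str:
    """Drop a trailing parenthetical and remove spaces around the colon."""
    # Remove trailing text such as " (composition)".
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", raw.strip())
    namespace, sep, accession = cleaned.partition(":")
    if not sep:
        return cleaned  # no namespace prefix; leave the value as it is
    return f"{namespace.strip()}:{accession.strip()}"


print(normalize_xref("NCBI: 3288"))
print(normalize_xref("UniProt: P12345 (composition)"))  # illustrative accession
```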

Version 1.0.0

  • The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.