Data Release Notes

From BiomarkerKB Wiki

Latest revision as of 17:41, 15 December 2025

Versioning Format

Version numbers follow a three-part structure: X.Y.Z.

  • The first number (X) changes when a major update is introduced, such as changes in the data model.
  • The second number (Y) increments when new data is added.
  • The third number (Z) is updated for bug fixes or minor changes.
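
As an illustration of the X.Y.Z scheme described above, the following minimal sketch parses release strings and reports which component changed between two releases. The names `ReleaseVersion`, `parse_version`, and `change_kind` are hypothetical and are not part of the BiomarkerKB codebase.

```python
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int  # X: major update, e.g. a data model change
    minor: int  # Y: new data added
    patch: int  # Z: bug fixes or minor changes


def parse_version(text: str) -> ReleaseVersion:
    """Split an 'X.Y.Z' string into three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    return ReleaseVersion(*(int(p) for p in parts))


def change_kind(old: str, new: str) -> str:
    """Name the most significant component that differs between two releases."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"
    if a.minor != b.minor:
        return "minor"
    if a.patch != b.patch:
        return "patch"
    return "none"
```

Because the parsed value is a tuple of integers, releases can be ordered with `sorted(versions, key=parse_version)` and compare numerically rather than as strings.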

Version 2.1.0

Date: December 11, 2025

Data Updates

  • Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.

Backend and Infrastructure Updates

  • Fixed the incorrect download links on the Data Portal (https://data.biomarkerkb.org).
  • LOINC codes are no longer tied to specimen IDs.

Version 2.0.2

Date: December 4, 2025

Bug Fixes

  • LOINC codes are no longer tied to specimen (UBERON) IDs.
  • For biomarkers that could not be mapped to the Controlled Vocabulary, the original biomarker name is displayed, followed by "in review".
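
The fallback display rule above might look roughly like the sketch below. The function name `display_biomarker` and the exact separator between the name and "in review" are assumptions; the release notes do not specify how the label is formatted.

```python
from typing import Optional


def display_biomarker(original_name: str, controlled_term: Optional[str]) -> str:
    """Return the label to show for a biomarker.

    Uses the controlled vocabulary term when a mapping exists; otherwise
    falls back to the original name followed by "in review".
    The parenthesized format is an assumption made for this sketch.
    """
    if controlled_term:
        return controlled_term
    return f"{original_name} (in review)"
```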

Version 2.0.1

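Data Updates

  • Added cross-references to the Common Fund Data Ecosystem (CFDE, https://commonfund.nih.gov/dataecosystem) Data Coordinating Centers and other resources:
      • GTEx (https://www.gtexportal.org/home/)
      • Pharos (https://pharos.nih.gov/)
      • Reactome (https://reactome.org/)
      • Undiagnosed Diseases Network (https://undiagnosed.hms.harvard.edu/)
      • Illuminating the Druggable Genome (IDG) Reactome Portal (https://idg.reactome.org/)
      • Metabolomics Workbench (https://www.metabolomicsworkbench.org/)
      • SigCom LINCS (https://maayanlab.cloud/sigcom-lincs)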

Version 2.0.0

Data Updates

  • The biomarker field is now standardized using controlled vocabulary terms.
  • Added metabolite as an assessed_entity_type to mw_loinc_biomarkers.tsv.
  • Added RNAcentral (https://rnacentral.org/) cross-reference support.
  • Added example normal-range data for Troponin I from Oracle Health Electronic Health Records.
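
To show how the new metabolite entries could be selected from a file such as mw_loinc_biomarkers.tsv, here is a small sketch using Python's standard csv module. Only the `assessed_entity_type` column name comes from these notes; the `biomarker` column and the sample rows are made-up placeholders, and the real file layout may differ.

```python
import csv
import io

# Placeholder data standing in for mw_loinc_biomarkers.tsv (not real records).
SAMPLE_TSV = (
    "biomarker\tassessed_entity_type\n"
    "example biomarker A\tmetabolite\n"
    "example biomarker B\tprotein\n"
)


def rows_with_entity_type(tsv_file, entity_type):
    """Return the rows whose assessed_entity_type equals entity_type."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [row for row in reader if row.get("assessed_entity_type") == entity_type]


metabolite_rows = rows_with_entity_type(io.StringIO(SAMPLE_TSV), "metabolite")
```

For a file on disk, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` object.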

Version 1.0.6

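Data Updates

  • Added a new dataset: MW LOINC biomarkers (mw_loinc_biomarkers.tsv).
  • Added National Cancer Institute Thesaurus (https://ncithesaurus.nci.nih.gov/) and Protein Data Bank (https://www.rcsb.org/) cross-references.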

Backend and Infrastructure Updates

  • Added the display_name field to the format-converter so data source names appear with correct casing.

Version 1.0.5

Data Updates

  • Updated the assessed_biomarker_entity value for Troponin biomarkers for consistency.
  • Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.
  • Added Cell Ontology and Protein Ontology cross-references.

Backend and Infrastructure Updates

  • Updated all script paths to use data_source.conf and validated data source names.

Version 1.0.4

This release introduces new datasets, cross-references, and bug fixes.

Data Updates

  • Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.
  • Added Metabolomics Workbench LOINC data on metabolite biomarkers.
  • Added Cell Ontology and Protein Ontology cross-references.

Bug Fixes

  • Fixed an issue where cookie preferences were not saved when selecting "Allow".

Version 1.0.3

This release introduces new cross-references and updates to ensure compatibility with external resources.

Data Updates

  • NCBI cross-references added across gene biomarker entries.
  • ChEBI cross-references integrated for small molecules and metabolites.

Backend and Infrastructure Updates

  • ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.

Version 1.0.2

Data Updates

Backend and Infrastructure Updates

  • evidence_source database names now retain their original casing for accuracy and consistency.
  • EDRN identifiers were added to the namespace map.
  • HUGO Gene Nomenclature Committee (HGNC) was added to the cross-reference JSON file.
  • Fixed an issue where evidence_source values without tags were previously dropped; these are now preserved.
  • Added a user-guided spelling correction function to improve data entry quality.
  • The TSV-to-JSON converter now automatically checks for header spelling errors.
  • Introduced _suggest_header_corrections to flag and propose fixes for misspelled headers.
  • Enhanced _stream_tsv with a call to _check_header_spelling to prevent invalid headers from being processed.
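
The header spelling check described above could be approximated with fuzzy string matching from Python's standard library. This is only a sketch of the general idea and not the project's actual `_suggest_header_corrections` implementation; the `KNOWN_HEADERS` list and the similarity cutoff are assumptions.

```python
import difflib

# Assumed set of valid headers; the real converter's list is not given in these notes.
KNOWN_HEADERS = (
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_entity_type",
    "evidence_source",
)


def suggest_header_corrections(headers, known=KNOWN_HEADERS, cutoff=0.8):
    """Map each unrecognized header to its closest known header, or None."""
    suggestions = {}
    for header in headers:
        if header in known:
            continue  # header is already valid
        matches = difflib.get_close_matches(header, known, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

A converter could call a function like this before streaming rows and stop, or prompt the user, whenever the returned mapping is non-empty.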

Version 1.0.1

Data Updates

  • Added xrefs.tsv to the list of datasets.

Backend and Infrastructure Updates

  • Fixed ID formatting issues in NCBI and UniProt references within oncomx.tsv, removing erroneous spaces (e.g., NCBI: 3288 → NCBI:3288) and extraneous text (e.g., "(composition)"). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
  • Merged assessed entity type synonyms.
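
The ID cleanup described above can be sketched with two regular expressions. The function name `normalize_xref_id` is hypothetical, and the rules below cover only the two problems named in these notes (spaces around the namespace separator and parenthesized annotations); the actual fix applied to oncomx.tsv may have handled more cases.

```python
import re


def normalize_xref_id(raw: str) -> str:
    """Clean a cross-reference ID such as 'NCBI: 3288 (composition)'."""
    # Drop parenthesized annotations such as "(composition)".
    cleaned = re.sub(r"\s*\([^)]*\)", "", raw)
    # Remove whitespace around the namespace separator.
    cleaned = re.sub(r"\s*:\s*", ":", cleaned)
    return cleaned.strip()
```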

Version 1.0.0

  • The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.