Data Release Notes

From BiomarkerKB Wiki

Latest revision as of 16:31, 26 January 2026

Versioning Format

The versioning format follows a three-digit structure: X.Y.Z.

  • The first digit (X) changes when a major update is introduced, such as changes in the data model.
  • The second digit (Y) increments when new data is added.
  • The third digit (Z) is updated for bug fixes or minor changes.
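
As a rough illustration of the scheme above, the following Python sketch parses an X.Y.Z string and reports which component changed between two releases. The function names parse_version and change_kind are hypothetical and are not part of BiomarkerKB.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split an 'X.Y.Z' string into three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def change_kind(old: str, new: str) -> str:
    """Name the leftmost component that differs between two versions."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major update (e.g. data model change)"
    if n[1] != o[1]:
        return "new data"
    if n[2] != o[2]:
        return "bug fix or minor change"
    return "no change"


print(change_kind("2.2.0", "2.3.0"))  # new data
print(change_kind("2.3.0", "2.3.1"))  # bug fix or minor change
```

Because parse_version returns a tuple of integers, releases can also be ordered with the built-in sorted(), which avoids the pitfalls of comparing version strings directly.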

Version 2.3.1

Date: January 22, 2026

Backend Updates

  • A master list of all biomarkers present in BiomarkerKB will be available for download at data.biomarkerkb.org.

Version 2.3.0

Date: January 12, 2026

Data Updates

  • New dataset: Top 50 Clinically Relevant Disease Biomarkers, created and manually curated by Sparsh Gupta.

Backend Updates

  • New Advanced Search type: users can now search biomarkers by Data Source. The following data sources are currently available:
    • cgi (Cancer Genome Interpreter)
    • civic (CIViC)
    • clinvar (ClinVar)
    • edrn (Early Detection Research Network)
    • gwas (Genome-Wide Association Studies)
    • llm_glycan (LLM-extracted glycan biomarkers)
    • markerdb (MarkerDB)
    • mw (Metabolomics Workbench)
    • oncomx (OncoMX)
    • opentargets (OpenTargets)
    • PMC_biomarker_sets (PubMed Central)
    • sennet (SenNet Consortium)
    • top_50 (Top-50 clinically relevant biomarkers)
    • upkb_reviewed_v2 (UniProtKB)
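
The keys above mix lower and upper case (note PMC_biomarker_sets). The sketch below shows one way a client could normalize a user-supplied key before using it in a search. It assumes the list above is complete for this release. The DATA_SOURCES dictionary and the resolve_data_source helper are hypothetical and are not part of the BiomarkerKB API.

```python
# Data source keys and names copied from the list above.
DATA_SOURCES = {
    "cgi": "Cancer Genome Interpreter",
    "civic": "CIViC",
    "clinvar": "ClinVar",
    "edrn": "Early Detection Research Network",
    "gwas": "Genome-Wide Association Studies",
    "llm_glycan": "LLM-extracted glycan biomarkers",
    "markerdb": "MarkerDB",
    "mw": "Metabolomics Workbench",
    "oncomx": "OncoMX",
    "opentargets": "OpenTargets",
    "PMC_biomarker_sets": "PubMed Central",
    "sennet": "SenNet Consortium",
    "top_50": "Top-50 clinically relevant biomarkers",
    "upkb_reviewed_v2": "UniProtKB",
}


def resolve_data_source(key: str) -> str:
    """Return the key exactly as listed, accepting input in any letter case."""
    if key in DATA_SOURCES:
        return key
    lowered = {k.lower(): k for k in DATA_SOURCES}
    match = lowered.get(key.strip().lower())
    if match is None:
        raise KeyError(f"unknown data source: {key!r}")
    return match


print(resolve_data_source("pmc_biomarker_sets"))  # PMC_biomarker_sets
print(DATA_SOURCES[resolve_data_source("CIVIC")])  # CIViC
```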

Bug Fixes

  • The biomarker_controlled_vocab field in TSV files is now keyed on the (biomarker_id, biomarker_orig) tuple. Previously it was keyed on biomarker_id alone, which introduced inconsistencies in biomarkers that had multiple biomarker_component objects.
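
To illustrate why the choice of key matters, the sketch below builds a lookup from two made-up rows that share one biomarker_id, first keyed on biomarker_id alone and then on the (biomarker_id, biomarker_orig) pair. The row values and variable names are invented for illustration. This is not the actual export code.

```python
rows = [
    # (biomarker_id, biomarker_orig, biomarker_controlled_vocab) -- made-up values
    ("X0001", "increased IL6 level", "IL6 level"),
    ("X0001", "increased CRP level", "CRP level"),
]

# Keyed on biomarker_id only: the second component overwrites the first.
by_id = {}
for biomarker_id, orig, vocab in rows:
    by_id[biomarker_id] = vocab

# Keyed on the (biomarker_id, biomarker_orig) tuple: both components survive.
by_pair = {}
for biomarker_id, orig, vocab in rows:
    by_pair[(biomarker_id, orig)] = vocab

print(len(by_id))    # 1
print(len(by_pair))  # 2
```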

Version 2.2.0

Date: December 22, 2025

Data Updates

  • Electronic Health Records data has been added to creatinine biomarkers.
  • New dataset: senescence biomarkers from the SenNet Consortium.

Backend Updates

  • At the API level, each biomarker now contains a new field, biomarker_controlled_vocab, which shows the standardized biomarker name. Original biomarker names are now shown in the biomarker_orig field.

Version 2.1.0

Date: December 11, 2025

Data Updates

  • Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.

Backend Updates

  • The incorrect download links on the Data Portal have been fixed.
  • LOINC codes are no longer tied to specimen IDs.

Version 2.0.2

Date: December 4, 2025

Bug Fixes

  • LOINC codes are no longer tied to specimen (UBERON) IDs.
  • For biomarkers that could not be mapped to the Controlled Vocabulary, the original biomarker name is displayed, followed by "in review".
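
The fallback described in the last item can be sketched as follows. The field names are borrowed from the Version 2.2.0 notes. The display_biomarker function and the exact label format, with "in review" in parentheses, are assumptions made for illustration; the portal may render the label differently.

```python
from typing import Optional


def display_biomarker(biomarker_orig: str,
                      biomarker_controlled_vocab: Optional[str]) -> str:
    """Prefer the standardized name; otherwise flag the original name."""
    if biomarker_controlled_vocab:
        return biomarker_controlled_vocab
    # Assumed label format: original name followed by "in review".
    return f"{biomarker_orig} (in review)"


print(display_biomarker("elevated marker A", "marker A level"))  # marker A level
print(display_biomarker("elevated marker B", None))  # elevated marker B (in review)
```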

Version 2.0.1

Data Updates

  • Added cross-references to the Common Fund Data Ecosystem (CFDE, https://commonfund.nih.gov/dataecosystem) Data Coordinating Centers and other resources:
    • GTEx (https://www.gtexportal.org/home/)
    • Pharos (https://pharos.nih.gov/)
    • Reactome (https://reactome.org/)
    • Undiagnosed Diseases Network (https://undiagnosed.hms.harvard.edu/)
    • Illuminating the Druggable Genome (IDG) Reactome Portal (https://idg.reactome.org/)
    • Metabolomics Workbench (https://www.metabolomicsworkbench.org/)
    • SigCom LINCS (https://maayanlab.cloud/sigcom-lincs)

Version 2.0.0

Data Updates

  • The biomarker field is now standardized using controlled vocabulary terms.
  • Added metabolite as an assessed_entity_type to mw_loinc_biomarkers.tsv.
  • Added RNAcentral cross-reference support.
  • Added Electronic Health Records normal-range data from Oracle Health for Troponin I as an example.

Version 1.0.6

Data Updates

  • Added a new dataset: MW LOINC biomarkers (mw_loinc_biomarkers.tsv).
  • Added National Cancer Institute Thesaurus (https://ncithesaurus.nci.nih.gov/) and Protein Data Bank (https://www.rcsb.org/) cross-references.

Backend Updates

  • Added the display_name field to the format-converter so data source names appear with correct casing.

Version 1.0.5

Data Updates

  • Updated the Troponin biomarker value assessed_biomarker_entity for consistency.
  • Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.
  • Added Cell Ontology and Protein Ontology cross-references.

Backend Updates

  • Updated all script paths to use data_source.conf and validated data source names.

Version 1.0.4

This release introduces new datasets, cross-references, and bug fixes.

Data Updates

  • Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.
  • Added Metabolomics Workbench LOINC data on metabolite biomarkers.
  • Added Cell Ontology and Protein Ontology cross-references.

Bug Fixes

  • Fixed an issue where cookie preferences were not saved when selecting "Allow".

Version 1.0.3

This release introduces new cross-references and updates to ensure compatibility with external resources.

Data Updates

  • NCBI cross-references added across gene biomarker entries.
  • ChEBI cross-references integrated for small molecules and metabolites.

Backend Updates

  • ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.
    • Old services retired 1 September 2025.

Version 1.0.2

Data Updates

  • Published updated Metabolomics Workbench (https://www.metabolomicsworkbench.org/) data.
  • Published sample data from the Early Detection Research Network (https://edrn.nci.nih.gov/).

Backend Updates

  • evidence_source database names now retain their original casing for accuracy and consistency.
  • EDRN identifiers were added to the namespace map.
  • HUGO Gene Nomenclature Committee (HGNC) was added to the cross-reference JSON file.
  • Fixed an issue where evidence_source values without tags were previously dropped; these are now preserved.
  • Added a user-guided spelling correction function to improve data entry quality.
  • The TSV-to-JSON converter now automatically checks for header spelling errors.
  • Introduced _suggest_header_corrections to flag and propose fixes for misspelled headers.
  • Enhanced _stream_tsv with a call to _check_header_spelling to prevent invalid headers from being processed.
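
The header checks described in the last three items could work along the following lines. This is a minimal sketch using Python's standard difflib module, not the format-converter's actual _suggest_header_corrections implementation. The EXPECTED_HEADERS list is an illustrative subset of field names mentioned in these notes, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib
from typing import Dict, List

# Illustrative subset only; the real TSV schema has more columns.
EXPECTED_HEADERS = [
    "biomarker_id",
    "biomarker",
    "assessed_biomarker_entity",
    "evidence_source",
]


def suggest_header_corrections(headers: List[str],
                               expected: List[str],
                               cutoff: float = 0.8) -> Dict[str, str]:
    """Map each unrecognized header to its closest expected header, if any."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # exact match, nothing to fix
        close = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        if close:
            suggestions[header] = close[0]
    return suggestions


print(suggest_header_corrections(["biomarker_id", "evidense_source"],
                                 EXPECTED_HEADERS))
# {'evidense_source': 'evidence_source'}
```

Headers with no sufficiently close match are left out of the result, so a caller would still need to decide whether to reject them or pass them through.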

Version 1.0.1

Data Updates

  • Added xrefs.tsv to the list of datasets.

Backend Updates

  • Fixed ID formatting issues in NCBI and UniProt references within oncomx.tsv, removing erroneous spaces (e.g., NCBI: 3288 → NCBI:3288) and extraneous text (e.g., "(composition)"). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
  • Merged assessed entity type synonyms.
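
A cleanup of the kind described in the first item might look like the sketch below. It assumes the stray space follows the namespace colon and that the extraneous text appears as a trailing parenthetical. The normalize_xref function is hypothetical, and the UniProt accession is only a placeholder. This is not the script used for the actual fix.

```python
import re


def normalize_xref(raw: str) -> str:
    """Drop a trailing parenthetical note and any space after the namespace colon."""
    # Remove trailing text such as " (composition)".
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", raw.strip())
    # Collapse "NCBI: 3288" to "NCBI:3288".
    cleaned = re.sub(r"^([A-Za-z]+):\s+", r"\1:", cleaned)
    return cleaned


print(normalize_xref("NCBI: 3288"))                    # NCBI:3288
print(normalize_xref("UniProt:P12345 (composition)"))  # UniProt:P12345
```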

Version 1.0.0

  • BiomarkerKB data portal made available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, Metabolomics Workbench (MW), UniProtKB, GWAS, and CIViC.