Data Release Notes



Latest revision as of 16:31, 26 January 2026

Versioning Format

Version numbers follow a three-part X.Y.Z format.

The first number (X) changes when a major update is introduced, such as a change to the data model.
The second number (Y) increments when new data is added.
The third number (Z) increments for bug fixes or minor changes.
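The following minimal Python sketch makes these rules concrete with a hypothetical bump_version helper. The function name and the change labels "data_model", "new_data", and "fix" are illustrative and are not part of BiomarkerKB. The sketch assumes that lower-order numbers reset to 0 whenever a higher-order number increments, which is consistent with the releases listed below (for example 2.2.0 → 2.3.0 and 1.0.6 → 2.0.0).

```python
def bump_version(version: str, change: str) -> str:
    """Return the next X.Y.Z version for a given change type (hypothetical helper).

    change: "data_model" bumps X, "new_data" bumps Y, "fix" bumps Z.
    Assumption: lower-order numbers reset to 0 when a higher-order number increments.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "data_model":
        return f"{major + 1}.0.0"
    if change == "new_data":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("2.2.0", "new_data"))    # 2.3.0
print(bump_version("2.3.0", "fix"))         # 2.3.1
print(bump_version("1.0.6", "data_model"))  # 2.0.0
```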

Version 2.3.1

Date: January 22, 2026

Backend Updates

A master list of all biomarkers present in BiomarkerKB will be available for download at data.biomarkerkb.org.

Version 2.3.0

Date: January 12, 2026

Data Updates

New dataset: Top 50 Clinically Relevant Disease Biomarkers, created and manually curated by Sparsh Gupta.

Backend Updates

New Advanced Search type: users can now search biomarkers by Data Source. The following data sources are currently available:
- cgi (Cancer Genome Interpreter)
- civic (CIViC)
- clinvar (ClinVar)
- edrn (Early Detection Research Network)
- gwas (Genome-Wide Association Studies)
- llm_glycan (LLM-extracted glycan biomarkers)
- markerdb (MarkerDB)
- mw (Metabolomics Workbench)
- oncomx (OncoMX)
- opentargets (OpenTargets)
- PMC_biomarker_sets (PubMed Central)
- sennet (SenNet Consortium)
- top_50 (Top-50 clinically relevant biomarkers)
- upkb_reviewed_v2 (UniProtKB)

Bug Fixes

The biomarker_controlled_vocab field in TSV files is now built using the (biomarker_id, biomarker_orig) tuple as its key. Previously only biomarker_id was used as the key, which introduced inconsistencies for biomarkers with multiple biomarker_component objects.
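The difference between the two keying strategies can be shown with plain Python dictionaries. The field names come from this note. The biomarker ID "AN0000-1", the original names, and the controlled-vocabulary terms are invented for illustration, and the dictionaries are a simplified stand-in for however the converter actually stores these mappings.

```python
# Hypothetical mapping results for one biomarker (invented ID "AN0000-1")
# with two biomarker_component entries that have different original names.
mappings = [
    ("AN0000-1", "increased IL6 level", "interleukin-6"),
    ("AN0000-1", "increased CRP level", "C-reactive protein"),
]

# Old behaviour: keyed on biomarker_id only, so the second component
# overwrites the first and both components share one term.
by_id = {biomarker_id: vocab for biomarker_id, _orig, vocab in mappings}

# New behaviour: keyed on the (biomarker_id, biomarker_orig) tuple,
# so each component keeps its own controlled-vocabulary term.
by_id_and_orig = {(biomarker_id, orig): vocab for biomarker_id, orig, vocab in mappings}

for biomarker_id, orig, _vocab in mappings:
    print(orig, "| old:", by_id[biomarker_id], "| new:", by_id_and_orig[(biomarker_id, orig)])
```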

Version 2.2.0

Date: December 22, 2025

Data Updates

Electronic Health Records data has been added to creatinine biomarkers.
New dataset: senescence biomarkers from SenNet Consortium.

Backend Updates

In the API, each biomarker now includes a new field, biomarker_controlled_vocab, which holds the standardized biomarker name. The original biomarker name is now provided in the biomarker_orig field.

Version 2.1.0

Date: December 11, 2025

Data Updates

Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.

Backend Updates

The incorrect download links on the Data Portal have been fixed.
LOINC codes are no longer tied to specimen IDs.

Version 2.0.2

Date: December 4, 2025

Bug Fixes

LOINC codes are no longer tied to specimen (UBERON) IDs.
For biomarkers that could not be mapped to the Controlled Vocabulary, the original biomarker name is displayed, followed by "in review".
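The following minimal sketch illustrates this display rule. It assumes each record carries the biomarker_orig and biomarker_controlled_vocab fields described under version 2.2.0, and that an unmapped biomarker has an empty or missing biomarker_controlled_vocab value. The helper name and the exact spacing of the "in review" suffix are also assumptions.

```python
from typing import Optional


def display_biomarker_name(record: dict) -> str:
    """Choose the name to display for a biomarker record (hypothetical helper)."""
    vocab: Optional[str] = record.get("biomarker_controlled_vocab")
    if vocab:
        # Mapped: show the standardized controlled-vocabulary name.
        return vocab
    # Not mapped: fall back to the original name, flagged as in review.
    return f'{record["biomarker_orig"]} in review'


print(display_biomarker_name({"biomarker_orig": "troponin I level",
                              "biomarker_controlled_vocab": "troponin I"}))
print(display_biomarker_name({"biomarker_orig": "novel glycan marker"}))
```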

Version 2.0.1

Data Updates

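Added cross-references to the Common Fund Data Ecosystem (CFDE) Data Coordinating Centers and other resources:
- GTEx
- Pharos
- Reactome
- Undiagnosed Diseases Network
- Illuminating the Druggable Genome (IDG) Reactome Portal
- Metabolomics Workbench
- SigCom LINCS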

Version 2.0.0

Data Updates

The biomarker field is now standardized using controlled vocabulary terms.
Added metabolite as an assessed_entity_type to mw_loinc_biomarkers.tsv.
Added RNAcentral cross-reference support.
Added Electronic Health Records normal-range data from Oracle Health for Troponin I, as an example.

Version 1.0.6

Data Updates

Added a new dataset: MW LOINC biomarkers (mw_loinc_biomarkers.tsv).
Added National Cancer Institute Thesaurus and Protein Data Bank cross-references.

Backend Updates

Added the display_name field to the format-converter so data source names appear with correct casing.
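A display-name field can be illustrated as a lookup from a lower-case internal data source identifier to a correctly cased label, falling back to the identifier itself. The identifiers and labels below are taken from the data source list in version 2.3.0. The dictionary and function are hypothetical and do not reflect the actual structure of format-converter.

```python
# Hypothetical mapping from internal data source IDs to display names.
DISPLAY_NAMES = {
    "civic": "CIViC",
    "clinvar": "ClinVar",
    "markerdb": "MarkerDB",
    "oncomx": "OncoMX",
    "opentargets": "OpenTargets",
}


def display_name(source_id: str) -> str:
    # Fall back to the raw identifier when no display name is configured.
    return DISPLAY_NAMES.get(source_id, source_id)


print(display_name("civic"))   # CIViC
print(display_name("sennet"))  # sennet
```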

Version 1.0.5

Data Updates

Updated the assessed_biomarker_entity value of Troponin biomarkers for consistency.
Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.
Added Cell Ontology and Protein Ontology cross-references.

Backend Updates

Updated all script paths to use data_source.conf and validated data source names.

Version 1.0.4

This release introduces new datasets, cross-references, and bug fixes.

Data Updates

Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.
Added Metabolomics Workbench LOINC data on metabolite biomarkers.
Added Cell Ontology and Protein Ontology cross-references.

Bug Fixes

Fixed an issue where cookie preferences were not saved after selecting "Allow".

Version 1.0.3

This release introduces new cross-references and updates to ensure compatibility with external resources.

Data Updates

NCBI cross-references added across gene biomarker entries.
ChEBI cross-references integrated for small molecules and metabolites.

Backend Updates

ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.
- Old services retired 1 September 2025.
- New stable API: ChEBI REST API docs
- New data products and beta interface available at ChEBI 2.0.

Version 1.0.2

Data Updates

Published updated Metabolomics Workbench data.
Published sample data from the Early Detection Research Network.

Backend Updates

evidence_source database names now retain their original casing for accuracy and consistency.
EDRN identifiers were added to the namespace map.
HUGO Gene Nomenclature Committee (HGNC) was added to the cross-reference JSON file.
Fixed an issue where evidence_source values without tags were dropped; these values are now preserved.
Added a user-guided spelling correction function to improve data entry quality.
The TSV-to-JSON converter now automatically checks for header spelling errors.
Introduced _suggest_header_corrections to flag and propose fixes for misspelled headers.
Enhanced _stream_tsv with a call to _check_header_spelling to prevent invalid headers from being processed.
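A header spelling check can be illustrated with Python's standard difflib module. This is only a sketch: the function below is not the converter's _suggest_header_corrections. The expected header list is a small hypothetical subset built from field names mentioned in these notes, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib

# Hypothetical subset of the headers a TSV file is expected to contain.
EXPECTED_HEADERS = ("biomarker_id", "biomarker", "assessed_biomarker_entity", "evidence_source")


def suggest_header_corrections(headers, expected=EXPECTED_HEADERS, cutoff=0.8):
    """Map each unrecognised header to its closest expected header, or None."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # Known header, nothing to suggest.
        matches = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions


print(suggest_header_corrections(["biomarker_id", "biomarkr", "evidence_sorce", "notes"]))
```

In this sketch, a header that maps to None has no close match. A caller could reject such headers outright and offer the non-None suggestions to the user for confirmation.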

Version 1.0.1

Data Updates

Added xrefs.tsv to the list of datasets.

Backend Updates

Fixed ID formatting issues in NCBI and UniProt references within oncomx.tsv, removing erroneous spaces (e.g., NCBI: 3288 → NCBI:3288) and extraneous text (e.g., "(composition)"). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
Merged assessed entity type synonyms.
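A cleanup of this kind can be sketched with regular expressions. These patterns are an assumption about how such a fix could be done, not the code that was actually used. They cover only the two problems described above: whitespace around the colon, and trailing parenthetical text such as "(composition)".

```python
import re


def normalize_xref(raw: str) -> str:
    """Clean up a cross-reference such as 'NCBI: 3288' (illustrative sketch)."""
    # Drop trailing parenthetical text, e.g. ' (composition)'.
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", raw.strip())
    # Remove whitespace on either side of the first namespace separator.
    return re.sub(r"\s*:\s*", ":", cleaned, count=1)


print(normalize_xref("NCBI: 3288"))               # NCBI:3288
print(normalize_xref("NCBI:3288 (composition)"))  # NCBI:3288
```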

Version 1.0.0

The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, Metabolomics Workbench (MW), UniProtKB, GWAS, and CIViC.
