Data Release Notes
Latest revision as of 16:31, 26 January 2026
Versioning Format
The versioning format follows a three-digit structure: X.Y.Z.
- The first digit (X) changes when a major update is introduced, such as changes in the data model.
- The second digit (Y) increments when new data is added.
- The third digit (Z) is updated for bug fixes or minor changes.
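The scheme above can be sketched in code. This is a minimal illustration only, not part of the BiomarkerKB tooling; the names `Version`, `parse_version`, and `change_kind` are hypothetical, and the sketch assumes each component is a non-negative integer.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """An X.Y.Z release number (hypothetical helper type)."""
    major: int  # X: major update, such as a data model change
    minor: int  # Y: new data added
    patch: int  # Z: bug fixes or minor changes


def parse_version(text: str) -> Version:
    """Parse a string such as '2.3.1' into a Version."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    return Version(*(int(p) for p in parts))


def change_kind(old: str, new: str) -> str:
    """Name the highest-order component that differs between two releases."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"
    if a.minor != b.minor:
        return "data"
    if a.patch != b.patch:
        return "fix"
    return "none"
```

For example, `change_kind("2.3.0", "2.3.1")` reports a fix-level change, while `change_kind("1.0.6", "2.0.0")` reports a major one. Because `Version` is a tuple, releases also compare in the expected order.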
Version 2.3.1
Date: January 22, 2026
Backend Updates
- A master list of all biomarkers present in BiomarkerKB will be available for download at data.biomarkerkb.org.
Version 2.3.0
Date: January 12, 2026
Data Updates
- New dataset: Top 50 Clinically Relevant Disease Biomarkers, created and manually curated by Sparsh Gupta.
Backend Updates
- New Advanced Search type: users can now search biomarkers by Data Source. The following data sources are currently available:
  - cgi (Cancer Genome Interpreter)
  - civic (CIViC)
  - clinvar (ClinVar)
  - edrn (Early Detection Research Network)
  - gwas (Genome-Wide Association Studies)
  - llm_glycan (LLM-extracted glycan biomarkers)
  - markerdb (MarkerDB)
  - mw (Metabolomics Workbench)
  - oncomx (OncoMX)
  - opentargets (OpenTargets)
  - PMC_biomarker_sets (PubMed Central)
  - sennet (SenNet Consortium)
  - top_50 (Top-50 clinically relevant biomarkers)
  - upkb_reviewed_v2 (UniProtKB)
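For scripts that need to check a data source code before using it, the list above can be held in a small lookup table. This is a sketch only: `DATA_SOURCES` and `data_source_label` are hypothetical names, and exact, case-sensitive matching is an assumption (the notes do not say how the search treats case; note that `PMC_biomarker_sets` contains uppercase letters).

```python
# Codes and labels copied from the release notes; the table itself is illustrative.
DATA_SOURCES = {
    "cgi": "Cancer Genome Interpreter",
    "civic": "CIViC",
    "clinvar": "ClinVar",
    "edrn": "Early Detection Research Network",
    "gwas": "Genome-Wide Association Studies",
    "llm_glycan": "LLM-extracted glycan biomarkers",
    "markerdb": "MarkerDB",
    "mw": "Metabolomics Workbench",
    "oncomx": "OncoMX",
    "opentargets": "OpenTargets",
    "PMC_biomarker_sets": "PubMed Central",
    "sennet": "SenNet Consortium",
    "top_50": "Top-50 clinically relevant biomarkers",
    "upkb_reviewed_v2": "UniProtKB",
}


def data_source_label(code: str) -> str:
    """Return the label for a data source code, or raise on an unknown code."""
    try:
        return DATA_SOURCES[code]
    except KeyError:
        # Assumption: codes are matched exactly, including letter case.
        raise ValueError(f"unknown data source code: {code!r}") from None
```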
Bug Fixes
- The biomarker_controlled_vocab field in TSV files is now constructed from the (biomarker_id, biomarker_orig) tuple. Previously, only biomarker_id was used as the key, which introduced inconsistencies in biomarkers that had multiple biomarker_component objects.
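The effect of the key change can be illustrated with a small sketch. The rows, the ID `EX0001-1`, and the helper names below are hypothetical and do not come from the BiomarkerKB code; they only show why a lookup keyed on the ID alone loses information when one biomarker has several components.

```python
# Hypothetical rows: one biomarker ID with two components, each with its own name.
ROWS = [
    {"biomarker_id": "EX0001-1", "biomarker_orig": "original name A",
     "biomarker_controlled_vocab": "standardized term A"},
    {"biomarker_id": "EX0001-1", "biomarker_orig": "original name B",
     "biomarker_controlled_vocab": "standardized term B"},
]


def index_by_id(rows):
    """Old behaviour (as described): key on biomarker_id only; later rows overwrite earlier ones."""
    return {r["biomarker_id"]: r["biomarker_controlled_vocab"] for r in rows}


def index_by_pair(rows):
    """New behaviour (as described): key on the (biomarker_id, biomarker_orig) tuple."""
    return {
        (r["biomarker_id"], r["biomarker_orig"]): r["biomarker_controlled_vocab"]
        for r in rows
    }
```

With these rows, `index_by_id(ROWS)` holds a single entry, so one component's term is silently replaced by the other's, whereas `index_by_pair(ROWS)` keeps both.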
Version 2.2.0
Date: December 22, 2025
Data Updates
- Electronic Health Records data has been added to creatinine biomarkers.
- New dataset: senescence biomarkers from SenNet Consortium.
Backend Updates
- At the API level, each biomarker now contains a new field, biomarker_controlled_vocab, which shows the standardized biomarker name. Original biomarker names are now shown in the biomarker_orig field.
Version 2.1.0
Date: December 11, 2025
Data Updates
- Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.
Backend Updates
- The incorrect download links on the Data Portal have been fixed.
- LOINC codes are no longer tied to specimen IDs.
Version 2.0.2
Date: December 4, 2025
Bug Fixes
- LOINC codes are no longer tied to specimen (UBERON) IDs.
- For biomarkers that could not be mapped to the Controlled Vocabulary, the original biomarker name is displayed, followed by "in review".
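The fallback in the last item might look like the sketch below. It borrows the field names biomarker_controlled_vocab and biomarker_orig from the version 2.2.0 notes; the function name `display_name`, the treatment of a missing or empty value as "unmapped", and the parenthesised suffix are all assumptions, since the notes only say that the original name is followed by "in review".

```python
def display_name(record: dict) -> str:
    """Pick the name to show for a biomarker record (illustrative only)."""
    # Assumption: an unmapped biomarker has a missing or empty controlled-vocabulary value.
    vocab = (record.get("biomarker_controlled_vocab") or "").strip()
    if vocab:
        return vocab
    # Assumption: the exact suffix formatting is not specified in the notes.
    return f'{record["biomarker_orig"]} (in review)'
```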
Version 2.0.1
Data Updates
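- Added cross-references to the Common Fund Data Ecosystem (CFDE) Data Coordinating Centers and other resources:
  - GTEx
  - Pharos
  - Reactome
  - Undiagnosed Diseases Network
  - Illuminating the Druggable Genome (IDG) Reactome Portal
  - Metabolomics Workbench
  - SigCom LINCS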
Version 2.0.0
Data Updates
- The biomarker field is now standardized using controlled vocabulary terms.
- Added metabolite as an assessed_entity_type to mw_loinc_biomarkers.tsv.
- Added RNAcentral cross-reference support.
- Added Electronic Health Records normal-range data from Oracle Health for Troponin I as an example.
Version 1.0.6
Data Updates
- Added a new dataset: MW LOINC biomarkers (mw_loinc_biomarkers.tsv).
- Added National Cancer Institute Thesaurus and Protein Data Bank cross-references.
Backend Updates
- Added the display_name field to the format-converter so data source names appear with correct casing.
Version 1.0.5
Data Updates
- Updated the Troponin biomarker value assessed_biomarker_entity for consistency.
- Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.
- Added Cell Ontology and Protein Ontology cross-references.
Backend Updates
- Updated all script paths to use data_source.conf and validated data source names.
Version 1.0.4
This release introduces new datasets, cross-references, and bug fixes.
Data Updates
- Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.
- Added Metabolomics Workbench LOINC data on metabolite biomarkers.
- Added Cell Ontology and Protein Ontology cross-references.
Bug Fixes
- Fixed issue where cookie preferences weren't being saved when selecting "Allow".
Version 1.0.3
This release introduces new cross-references and updates to ensure compatibility with external resources.
Data Updates
- NCBI cross-references added across gene biomarker entries.
- ChEBI cross-references integrated for small molecules and metabolites.
Backend Updates
- ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.
  - Old services retired 1 September 2025.
  - New stable API: ChEBI REST API docs
  - New data products and beta interface available at ChEBI 2.0.
Version 1.0.2
Data Updates
- Published updated Metabolomics Workbench data.
- Published sample data from the Early Detection Research Network.
Backend Updates
- evidence_source database names now retain their original casing for accuracy and consistency.
- EDRN identifiers were added to the namespace map.
- HUGO Gene Nomenclature Committee (HGNC) was added to the cross-reference JSON file.
- Fixed an issue where evidence_source values without tags were previously dropped; these are now preserved.
- Added a user-guided spelling correction function to improve data entry quality.
- The TSV-to-JSON converter now automatically checks for header spelling errors.
- Introduced _suggest_header_corrections to flag and propose fixes for misspelled headers.
- Enhanced _stream_tsv with a call to _check_header_spelling to prevent invalid headers from being processed.
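A header spelling check of this kind could be sketched with fuzzy matching from the Python standard library. This is not the converter's actual code: the function below only shares its purpose with _suggest_header_corrections, the list `EXPECTED_HEADERS` is an illustrative subset built from field names mentioned in these notes, and the similarity cutoff of 0.8 is an arbitrary assumption.

```python
import difflib

# Illustrative subset of expected TSV headers (assumption, not the real schema).
EXPECTED_HEADERS = [
    "biomarker_id",
    "assessed_biomarker_entity",
    "assessed_entity_type",
    "evidence_source",
]


def suggest_header_corrections(headers, expected=EXPECTED_HEADERS, cutoff=0.8):
    """Map each unrecognised header to its closest expected header, or None."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # spelled correctly, nothing to suggest
        matches = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

A caller could then reject the file, or prompt the user to accept the proposed fix, whenever the returned mapping is non-empty.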
Version 1.0.1
Data Updates
- Added xrefs.tsv to the list of datasets.
Backend Updates
- Fixed ID formatting issues in NCBI and UniProt references within oncomx.tsv, removing erroneous spaces (e.g., NCBI: 3288 → NCBI:3288) and extraneous text (e.g., "(composition)"). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
- Merged assessed entity type synonyms.
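The ID cleanup described in the first item can be approximated as follows. This is a sketch under assumptions, not the script that was actually run: `clean_xref_id` is a hypothetical name, and it handles only the two problems named above (whitespace around the namespace colon and the literal text "(composition)").

```python
def clean_xref_id(raw: str) -> str:
    """Normalise a 'NAMESPACE: accession' reference to 'NAMESPACE:accession'."""
    # Drop the extraneous annotation mentioned in the notes, then trim whitespace.
    text = raw.replace("(composition)", "").strip()
    namespace, sep, accession = text.partition(":")
    if not sep:
        return text  # no namespace separator; leave the value as it is
    return f"{namespace.strip()}:{accession.strip()}"
```

For example, `clean_xref_id("NCBI: 3288")` returns `"NCBI:3288"`.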
Version 1.0.0
- BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.