Data Release Notes

From BiomarkerKB Wiki
Edited by MariaKim. Latest edit summary: "Expanded 1.0.2; introduced Data Updates and Backend & Infrastructure Updates headers."
== Versioning Format ==
Release versions follow a three-part format, X.Y.Z, where each part is a number.
* The first number (X) changes annually, typically at the start of a new project year.
* The second number (Y) increments with each new release.
* The third number (Z) is updated for bug fixes or minor changes.
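A minimal sketch of parsing and ordering X.Y.Z strings, assuming every version string has exactly three dot-separated non-negative integer parts. The <code>Version</code> and <code>parse_version</code> names are hypothetical and are not part of any BiomarkerKB tool.

```python
import re
from typing import NamedTuple

# Accept only ASCII digits, e.g. "1.0.2"; this strictness is an assumption of the sketch.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


class Version(NamedTuple):
    """Hypothetical X.Y.Z version; tuple comparison orders by X, then Y, then Z."""
    major: int  # X: changes annually, typically at the start of a project year
    minor: int  # Y: increments with each new release
    patch: int  # Z: updated for bug fixes or minor changes


def parse_version(text: str) -> Version:
    """Parse a string such as "1.0.2" into a Version, rejecting any other shape."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected X.Y.Z with integer parts, got {text!r}")
    return Version(*(int(part) for part in match.groups()))


releases = ["1.0.1", "1.0.0", "1.0.2"]
print(sorted(releases, key=parse_version))  # ['1.0.0', '1.0.1', '1.0.2']
print(parse_version("1.0.10") > parse_version("1.0.9"))  # True
```

Comparing parsed integers rather than raw strings matters once any part reaches two digits: compared as plain strings, "1.0.10" sorts before "1.0.9".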


== Version 1.0.2 ==
=== Data Updates ===
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].
=== Backend and Infrastructure Updates ===
* <code>evidence_source</code> database names now retain their original casing for accuracy and consistency.
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.
* Fixed an issue where <code>evidence_source</code> values without tags were dropped; these values are now preserved.
* Added a user-guided spelling correction function to improve data entry quality.
* The TSV-to-JSON converter now automatically checks for header spelling errors.
* Introduced <code>_suggest_header_corrections</code> to flag and propose fixes for misspelled headers.
* Enhanced <code>_stream_tsv</code> with a call to <code>_check_header_spelling</code> to prevent invalid headers from being processed.
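The header check described above can be illustrated with a minimal sketch built on Python's standard <code>difflib.get_close_matches</code>. It is not the converter's actual <code>_suggest_header_corrections</code>, <code>_check_header_spelling</code>, or <code>_stream_tsv</code>: the function names, their signatures, and the <code>EXPECTED_HEADERS</code> list are hypothetical, and the sketch reports suggestions in an error message rather than asking the user to choose a correction.

```python
import csv
import difflib
import io
from typing import Dict, Iterator, List, TextIO

# Assumed column names, for illustration only; the real converter defines its own schema.
EXPECTED_HEADERS = [
    "biomarker_id",
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_entity_type",
    "evidence_source",
    "evidence",
    "tag",
]


def suggest_header_corrections(headers: List[str]) -> Dict[str, List[str]]:
    """Map each unrecognized header to up to three close expected names (hypothetical helper)."""
    suggestions: Dict[str, List[str]] = {}
    for header in headers:
        if header not in EXPECTED_HEADERS:
            # get_close_matches returns the best matches first; 0.6 is difflib's default cutoff.
            suggestions[header] = difflib.get_close_matches(
                header, EXPECTED_HEADERS, n=3, cutoff=0.6
            )
    return suggestions


def stream_tsv(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield data rows as dicts, checking the header row before any row is produced."""
    reader = csv.reader(handle, delimiter="\t")
    headers = next(reader, None)
    if headers is None:
        return  # empty file: nothing to check or yield
    problems = suggest_header_corrections(headers)
    if problems:
        details = "; ".join(
            f"{bad!r} (did you mean: {', '.join(fixes) or 'no close match'}?)"
            for bad, fixes in problems.items()
        )
        # Stop before any data row is yielded, so a misspelled column is never converted.
        raise ValueError(f"Unrecognized TSV headers: {details}")
    for row in reader:
        yield dict(zip(headers, row))


# Usage: the misspelled "evidnce_source" column is rejected with suggestions.
sample = io.StringIO("biomarker_id\tevidnce_source\nAN6295-1\tOncoMX\n")
try:
    list(stream_tsv(sample))
except ValueError as err:
    print(err)
```

Because <code>stream_tsv</code> is a generator, the check runs when iteration starts but before the first row is yielded. A user-guided variant could display the same suggestions and let the user pick one instead of raising.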


== Version 1.0.1 ==
=== Data Updates ===
* Added <code>xrefs.tsv</code> to the list of datasets.
=== Backend and Infrastructure Updates ===
* Fixed ID formatting issues in NCBI and UniProt references within <code>oncomx.tsv</code>, removing erroneous spaces (e.g., <code>NCBI: 3288</code> → <code>NCBI:3288</code>) and extraneous text (e.g., <code>"(composition)"</code>). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.
* Merged assessed entity type synonyms.
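The ID cleanup described for 1.0.1 can be approximated with two regular expressions. This is a hypothetical sketch, not the project's code: <code>normalize_xref_id</code> is an illustrative name, and the sketch assumes that extraneous text such as <code>"(composition)"</code> appears as a parenthetical that can be dropped, and that each ID uses a single <code>PREFIX:ACCESSION</code> form.

```python
import re

# Hypothetical cleanup rules modeled on the 1.0.1 release note; not the project's code.
_PARENTHETICAL = re.compile(r"\s*\([^)]*\)")            # e.g. " (composition)"
_PREFIX = re.compile(r"^\s*([A-Za-z][\w.-]*)\s*:\s*")   # e.g. " NCBI: " before the accession


def normalize_xref_id(raw: str) -> str:
    """Return a PREFIX:ACCESSION string with stray spaces and parenthetical notes removed."""
    without_notes = _PARENTHETICAL.sub("", raw)
    # Collapse whitespace around the colon, e.g. "NCBI: 3288" -> "NCBI:3288".
    return _PREFIX.sub(r"\1:", without_notes).strip()


for raw in ["NCBI: 3288", " NCBI:3288", "UniProt:P04637 (composition)"]:
    print(f"{raw!r} -> {normalize_xref_id(raw)!r}")
```

Already clean IDs such as <code>NCBI:3288</code> pass through unchanged, so the function can be applied to every row without first detecting which ones are malformed.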


== Version 1.0.0 ==
* The BiomarkerKB data portal became available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.

Latest revision as of 15:26, 29 August 2025
