<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.biomarkerkb.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=MariaKim</id>
	<title>BiomarkerKB Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.biomarkerkb.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=MariaKim"/>
	<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/Special:Contributions/MariaKim"/>
	<updated>2026-05-08T13:24:43Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.3</generator>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=209</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=209"/>
		<updated>2026-05-07T22:18:27Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Standardized and Controlled Vocabulary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the data must follow the biomarker data model. The instructions below explain how to format the data for submission, where to send it, and how to create a BCO for the submitted data.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence&amp;lt;/code&amp;gt; is one or more exact citations from the evidence source (in most cases, it will be the PubMed publication).&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;DOID:0080600&amp;lt;/code&amp;gt;. Refer to https://disease-ontology.org/do/.&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;UBERON:0000178&amp;lt;/code&amp;gt;. Refer to https://www.ebi.ac.uk/ols4/ontologies/uberon.&lt;br /&gt;
## &amp;lt;code&amp;gt;loinc_code&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;LOINC:100153-6&amp;lt;/code&amp;gt;. Refer to https://loinc.org/ (you may need to create an account to access the search functionality).&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SOURCE:ID&amp;lt;/code&amp;gt;, for example &amp;lt;code&amp;gt;PubMed:32677844&amp;lt;/code&amp;gt;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to the [https://github.com/clinical-biomarkers/biomarker-controlled-vocabulary GitHub documentation] for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is a json file, because it can be ingested into the existing data more efficiently. Submissions as tsv files are also accepted. In the GitHub repository, the &amp;lt;code&amp;gt;data_conversion.py&amp;lt;/code&amp;gt; script in the Data Conversion Folder handles conversion from tsv to json and from json to tsv.&lt;br /&gt;
## The [BiomarkerKB data page] has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
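The identifier and grouping rules above can be checked before submission. The following is a minimal sketch, not part of the BiomarkerKB tooling: the function names, the regular expression, and the assumption that &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; is a positive integer are illustrative choices.&lt;br /&gt;

```python
import re

# Assumed pattern for SOURCE:ID values: a prefix starting with a letter,
# a colon, then a non-empty identifier without whitespace.
CURIE_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:\S+$")

# Fields that the instructions above show in SOURCE:ID form.
ID_FIELDS = ("condition_id", "specimen_id", "evidence_source")


def is_curie(value):
    """Return True if value looks like SOURCE:ID, e.g. PubMed:32677844."""
    return bool(CURIE_PATTERN.match(value))


def check_rows(rows):
    """Return a list of human-readable problems found in row dicts."""
    problems = []
    for index, row in enumerate(rows, start=1):
        for field in ID_FIELDS:
            value = row.get(field, "")
            # Empty values are skipped because these fields may be mapped later.
            if value and not is_curie(value):
                problems.append(f"row {index}: {field} is not SOURCE:ID: {value!r}")
        group = str(row.get("component_group", ""))
        # Assumption: component_group must be a positive integer (1, 2, 3...).
        if not group.isdecimal() or int(group) == 0:
            problems.append(f"row {index}: component_group must be a positive integer")
    return problems
```

Calling &amp;lt;code&amp;gt;check_rows&amp;lt;/code&amp;gt; on rows read from a tsv file (for example with &amp;lt;code&amp;gt;csv.DictReader&amp;lt;/code&amp;gt;) returns an empty list when no problems are found.&lt;br /&gt;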
&lt;br /&gt;
=== Submission === &lt;br /&gt;
Once the data is formatted and cleaned, please send it to mazumder_lab@gwu.edu.&lt;br /&gt;
# When submitting data, please also fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## The form captures metadata and a description of how the biomarker data was collected, which is needed to add the submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# For further questions, please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== condition ===&lt;br /&gt;
&amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; should be reported in all lowercase, and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; (from Disease Ontology, MONDO, or SNOMED) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The value should start with a capital letter, but a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, the whole name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] to infer the correct biomarker role. Accepted role terms are:&lt;br /&gt;
* diagnostic&lt;br /&gt;
* monitoring&lt;br /&gt;
* predictive&lt;br /&gt;
* prognostic&lt;br /&gt;
* response&lt;br /&gt;
* risk&lt;br /&gt;
* safety&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase and &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported, as described below. The text should be in lowercase, except that gene names should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
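The naming patterns in this section can be produced programmatically. The following is a minimal sketch under stated assumptions: the function names and the entity type labels are hypothetical and are not part of the BiomarkerKB data model or tooling.&lt;br /&gt;

```python
def change_biomarker(direction, entity, entity_type):
    """Build 'increased X count' for cells and 'increased X level' otherwise.

    entity_type is an assumed label such as 'cell', 'protein' or 'metabolite'.
    """
    if direction not in ("increased", "decreased"):
        raise ValueError(f"unexpected direction: {direction!r}")
    noun = "count" if entity_type == "cell" else "level"
    return f"{direction} {entity} {noun}"


def gene_expression(symbol, over=True):
    """Build '*gene symbol* overexpression' or '*gene symbol* underexpression'."""
    suffix = "overexpression" if over else "underexpression"
    return f"{symbol} {suffix}"


def gene_amplification(symbol):
    """Build '*gene symbol* amplification'."""
    return f"{symbol} amplification"


def site_mutation(symbol, mutation):
    """Build '*gene symbol* *site mutation* mutation'."""
    return f"{symbol} {mutation} mutation"


def snp_biomarker(dbsnp_id, symbol):
    """Build 'presence of *dbSNP ID* mutation in *gene symbol*'."""
    return f"presence of {dbsnp_id} mutation in {symbol}"
```

The helpers leave the casing of the entity untouched, so gene symbols and abbreviations should be passed in the form in which they are meant to appear.&lt;br /&gt;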
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=208</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=208"/>
		<updated>2026-05-07T22:14:57Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the data must follow the biomarker data model. The instructions below explain how to format the data for submission, where to send it, and how to create a BCO for the submitted data.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence&amp;lt;/code&amp;gt; is one or more exact citations from the evidence source (in most cases, it will be the PubMed publication).&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;DOID:0080600&amp;lt;/code&amp;gt;. Refer to https://disease-ontology.org/do/.&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;UBERON:0000178&amp;lt;/code&amp;gt;. Refer to https://www.ebi.ac.uk/ols4/ontologies/uberon.&lt;br /&gt;
## &amp;lt;code&amp;gt;loinc_code&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;LOINC:100153-6&amp;lt;/code&amp;gt;. Refer to https://loinc.org/ (you may need to create an account to access the search functionality).&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SOURCE:ID&amp;lt;/code&amp;gt;, for example &amp;lt;code&amp;gt;PubMed:32677844&amp;lt;/code&amp;gt;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to the [https://github.com/clinical-biomarkers/biomarker-controlled-vocabulary GitHub documentation] for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is a json file, because it can be ingested into the existing data more efficiently. Submissions as tsv files are also accepted. In the GitHub repository, the &amp;lt;code&amp;gt;data_conversion.py&amp;lt;/code&amp;gt; script in the Data Conversion Folder handles conversion from tsv to json and from json to tsv.&lt;br /&gt;
## The [BiomarkerKB data page] has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
&lt;br /&gt;
=== Submission === &lt;br /&gt;
Once the data is formatted and cleaned, please send it to mazumder_lab@gwu.edu.&lt;br /&gt;
# When submitting data, please also fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## The form captures metadata and a description of how the biomarker data was collected, which is needed to add the submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# For further questions, please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== condition ===&lt;br /&gt;
&amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; should be reported in all lowercase, and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; (from Disease Ontology, MONDO, or SNOMED) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The value should start with a capital letter, but a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, the whole name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase and &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported, as described below. The text should be in lowercase, except that gene names should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=207</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=207"/>
		<updated>2026-04-30T22:52:12Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the data must follow the biomarker data model. The instructions below explain how to format the data for submission, where to send it, and how to create a BCO for the submitted data.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;DOID:0080600&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;UBERON:0000178&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SOURCE:ID&amp;lt;/code&amp;gt;, for example &amp;lt;code&amp;gt;PubMed:32677844&amp;lt;/code&amp;gt;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is a json file, because it can be ingested into the existing data more efficiently. Submissions as tsv files are also accepted. In the GitHub repository, the &amp;lt;code&amp;gt;data_conversion.py&amp;lt;/code&amp;gt; script in the Data Conversion Folder handles conversion from tsv to json and from json to tsv.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
&lt;br /&gt;
=== Submission === &lt;br /&gt;
Once the data is formatted and cleaned, please send it to mazumder_lab@gwu.edu.&lt;br /&gt;
# When submitting data, please also fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## The form captures metadata and a description of how the biomarker data was collected, which is needed to add the submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# For further questions, please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Condition should be reported in all lowercase, and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; (from Disease Ontology, MONDO, or SNOMED) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The value should start with a capital letter, but a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, the whole name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase and &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported, as described below. The text should be in lowercase, except that gene names should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=206</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=206"/>
		<updated>2026-04-29T15:27:03Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
Version numbers follow a three-part structure: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as a change to the data model.&lt;br /&gt;
* The second number (Y) increments when new data is added.&lt;br /&gt;
* The third number (Z) increments for bug fixes or minor changes.&lt;br /&gt;
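The versioning rules above can be sketched in Python. This is a minimal illustration, not part of the BiomarkerKB codebase: the helper names parse_version and change_kind are hypothetical, and version 2.10.0 is a made-up value used only to show that the parts compare as numbers rather than as text.&lt;br /&gt;
&lt;pre&gt;
```python
def parse_version(text):
    """Split an 'X.Y.Z' string into a tuple of three integers."""
    major, data, patch = (int(part) for part in text.split("."))
    return (major, data, patch)


def change_kind(old, new):
    """Name the leftmost part that differs between two versions."""
    labels = ("major update", "new data", "bug fix or minor change")
    for label, before, after in zip(labels, parse_version(old), parse_version(new)):
        if before != after:
            return label
    return "no change"


print(change_kind("2.4.2", "2.4.3"))
# Sorting on the parsed tuple orders 2.10.0 after 2.4.0 (2.10.0 is hypothetical).
print(sorted(["2.4.0", "2.10.0", "2.3.0"], key=parse_version))
```
&lt;/pre&gt;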
&lt;br /&gt;
== Version 3.2.1 ==&lt;br /&gt;
Date: April 23, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset containing user-submitted biomarkers.&lt;br /&gt;
* Added a new manually curated dataset of biomarkers with exposure agents.&lt;br /&gt;
&lt;br /&gt;
== Version 3.1.1 ==&lt;br /&gt;
Date: April 9, 2026&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
Complete overhaul of the backend data pipeline architecture:&lt;br /&gt;
* Improved ETL processes for greater reliability and scalability&lt;br /&gt;
* Enhanced data validation and error handling across pipeline stages&lt;br /&gt;
* Optimized performance for faster data processing and reduced runtime&lt;br /&gt;
* Refactored codebase for maintainability and extensibility&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.3 ==&lt;br /&gt;
Date: February 26, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new image-based biomarker from the OncoMX dataset.&lt;br /&gt;
* Fixed UniProtKB biomarkers that incorrectly included exposure agents.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated ChEBI API integration to properly parse JSON responses.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.2 ==&lt;br /&gt;
Date: February 19, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An archive file of all biomarker NTriples is now available for download at [https://data.biomarkerkb.org/BMK_000019 data.biomarkerkb.org/BMK_000019].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.1 ==&lt;br /&gt;
Date: February 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/BMK_000007 data.biomarkerkb.org/BMK_000007].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced search by certain data sources (e.g., ClinVar) now returns only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; field in TSV files is now constructed based on the &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; tuple. Previously it only used &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; as key, introducing inconsistencies in biomarkers that had multiple &amp;lt;code&amp;gt;biomarker_component&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* On the API level, each biomarker now contains a new field: &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal range data from Oracle Health, using Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; value of the Troponin biomarker for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal made available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Data_Processing_and_Modeling_Specification&amp;diff=205</id>
		<title>BiomarkerKB Data Processing and Modeling Specification</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Data_Processing_and_Modeling_Specification&amp;diff=205"/>
		<updated>2026-04-28T20:07:16Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: Created page with &amp;quot;= Biomarker identification = A biomarker’s canonical identity is defined by removing the disease or exposure agent dimension from its component combination. For each row in the source TSV, a combination list is first constructed out of three components: (1) assessed entity identifier, (2) condition or exposure agent identifier, and (3) controlled vocabulary term. The condition or exposure agent component is then removed to generate a canonical combination consisting on...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Biomarker identification =&lt;br /&gt;
A biomarker’s canonical identity is defined by removing the disease or exposure agent dimension from its component combination. For each row in the source TSV, a combination list is first constructed out of three components: (1) assessed entity identifier, (2) condition or exposure agent identifier, and (3) controlled vocabulary term. The condition or exposure agent component is then removed to generate a canonical combination consisting only of assessed entity identifier and controlled vocabulary pairs. This canonical representation is hashed using MD5 to produce a stable identifier, which serves as a condition-independent anchor across datasets.&lt;br /&gt;
&lt;br /&gt;
Second-level biomarker identifiers are assigned by appending an index to the canonical ID, where the index increments for each distinct condition or exposure agent linked to the same canonical combination, resulting in identifiers such as AN6278-1 and AN6278-2.&lt;br /&gt;
&lt;br /&gt;
To ensure persistence across releases, previously assigned canonical IDs are reused by scanning historical ID-tracking directories. New canonical entries receive sequential identifiers (e.g., BMKB000270). Changes over time are recorded through history tracking, capturing relationships such as inheritance, replacement, and discontinuation.&lt;br /&gt;
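The identification scheme described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline code: the function names are hypothetical, sorting and JSON serialization before hashing are assumed (the text only states that MD5 is used), and CONDITION:A and CONDITION:B are placeholder identifiers.&lt;br /&gt;
&lt;pre&gt;
```python
import hashlib
import json


def canonical_combination(components):
    """Drop the condition/exposure ID, keeping [entity ID, vocabulary term] pairs."""
    # Sorting is an assumption so that component order does not change the hash.
    return sorted([entity_id, term] for entity_id, _context_id, term in components)


def canonical_hash(components):
    """MD5 hex digest of the JSON-serialized canonical combination."""
    payload = json.dumps(canonical_combination(components))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def second_level_ids(canonical_id, context_ids):
    """Give each distinct condition/exposure ID the next index, in order seen."""
    assigned = {}
    for context_id in context_ids:
        if context_id not in assigned:
            assigned[context_id] = "{}-{}".format(canonical_id, len(assigned) + 1)
    return assigned


# Same entity and vocabulary term, different (placeholder) conditions.
row_a = [("UPKB:P05231", "CONDITION:A", "increased IL6 level")]
row_b = [("UPKB:P05231", "CONDITION:B", "increased IL6 level")]
print(canonical_hash(row_a) == canonical_hash(row_b))
print(second_level_ids("AN6278", ["CONDITION:A", "CONDITION:B", "CONDITION:A"]))
```
&lt;/pre&gt;
Because the condition is removed before hashing, both rows share one canonical hash, while each distinct condition still receives its own second-level identifier.&lt;br /&gt;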
&lt;br /&gt;
= Deduplication =&lt;br /&gt;
Deduplication is performed at both the biomarker component and row levels to ensure that identical biomarkers are consistently identified across sources. At the component level, the fundamental unit of deduplication is a normalized combination key, defined as a sorted and JSON-serialized list of assessed entity ID, condition or exposure agent ID, and controlled vocabulary tuples. Records from different sources that resolve to the same lowercased combination key are treated as the same biomarker and assigned a shared identifier. Provenance is preserved through a mapping from each unique combination key to the set of source files that contributed it, allowing aggregation without duplication.&lt;br /&gt;
&lt;br /&gt;
At the row level, exact duplicate entries within individual TSV files are removed during ingestion using a dictionary keyed by the serialized row content.&lt;br /&gt;
&lt;br /&gt;
A critical prerequisite for effective deduplication is controlled vocabulary normalization. Semantically equivalent biomarker descriptions, such as “increased IL6 level” and “elevated IL-6 levels,” must first be standardized to a common controlled vocabulary term; otherwise, their component combinations will not match and will be treated as distinct entities.&lt;br /&gt;
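A minimal Python sketch of the two deduplication levels described in this section is shown here. The function names and the sample values are hypothetical; only the general approach (a sorted, JSON-serialized, lowercased key with a provenance map, and a dictionary keyed by serialized row content) comes from the text.&lt;br /&gt;
&lt;pre&gt;
```python
import json
from collections import defaultdict


def combination_key(components):
    """Sorted, JSON-serialized, lowercased list of component tuples."""
    return json.dumps(sorted(list(component) for component in components)).lower()


def collect_provenance(records):
    """Map each combination key to the set of source files that contributed it."""
    provenance = defaultdict(set)
    for source_file, components in records:
        provenance[combination_key(components)].add(source_file)
    return dict(provenance)


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, keyed by its serialized content."""
    unique = {}
    for row in rows:
        unique.setdefault(json.dumps(row, sort_keys=True), row)
    return list(unique.values())


# Two sources report the same combination with different casing (placeholder data).
records = [
    ("source_a.tsv", [("UPKB:P05231", "CONDITION:A", "increased IL6 level")]),
    ("source_b.tsv", [("upkb:p05231", "condition:a", "Increased IL6 Level")]),
]
for key, sources in collect_provenance(records).items():
    print(key, sorted(sources))
```
&lt;/pre&gt;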
&lt;br /&gt;
= Data Modeling =&lt;br /&gt;
Data modeling begins with controlled vocabulary normalization. Raw biomarker strings are tokenized and matched against pattern definitions, where each rule maps to a structured label of the form (change_type, aspect_type, mod_type), for example (increased, level, not_specified). The matched rule is then used to generate a standardized representation such as “Increased level of protein IL6/UPKB:P05231.” Special cases, including single-nucleotide polymorphisms, mutations, and glycan modifications, are handled through explicit logic. Terms that do not match any pattern are flagged as “[biomarker_term_in_review],” while a supplementary rules file captures hard-coded overrides for edge cases.&lt;br /&gt;
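The rule-matching step might look roughly like the Python sketch below. The two regular expressions and the synonym lists are invented for illustration and are not the actual pattern definitions; the special-case logic and the supplementary rules file mentioned above are omitted.&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Hypothetical rules: (pattern, (change_type, aspect_type, mod_type)).
RULES = [
    (re.compile(r"^(increased|elevated|high)\b.*\blevels?$"),
     ("increased", "level", "not_specified")),
    (re.compile(r"^(decreased|reduced|low)\b.*\blevels?$"),
     ("decreased", "level", "not_specified")),
]
IN_REVIEW = "[biomarker_term_in_review]"


def classify(raw):
    """Return the label of the first matching rule, or the in-review flag."""
    text = raw.strip().lower()
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return IN_REVIEW


for term in ("increased IL6 level", "elevated IL-6 levels", "unrecognized text"):
    print(term, classify(term))
```
&lt;/pre&gt;
Both IL6 phrasings resolve to the same label, which is what allows the deduplication step to treat them as one biomarker.&lt;br /&gt;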
&lt;br /&gt;
The normalized data are then assembled into a document-oriented model, where each record in the c_biomarker MongoDB collection represents a single biomarker entry. Each document includes core biomarker identifiers (canonical IDs and second-level IDs) and a biomarker component array describing each assessed entity, including its identifier, type, controlled vocabulary term, associated specimens, and supporting evidence sources. Disease context is captured in a condition object containing standardized names and synonyms derived from a disease database. Additional fields capture biomarker roles (such as diagnostic or prognostic), aggregated evidence sources, and citation metadata sourced from PubMed. Where available, documents also include normal range statistics, a list of contributing upstream sources, and cross-references to external databases.&lt;br /&gt;
&lt;br /&gt;
= ETL (Extract, Transform, Load) Processes =&lt;br /&gt;
The ETL pipeline is organized as a staged workflow that transforms raw source data into structured, queryable databases. In the first step, literature ingestion is performed by a set of scripts, which collect all PubMed identifiers referenced in the source TSV files, retrieve the corresponding MEDLINE XML records, and extract structured citation data into JSON files.&lt;br /&gt;
&lt;br /&gt;
The second step focuses on building reference databases. Disease objects are constructed from GlyGen and Disease Ontology sources, and normal range statistics are computed (minimum, maximum, mean, median, interquartile range, and whiskers), stratified by age group and sex using clinical datasets from Oracle Health and GWDC.&lt;br /&gt;
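As a rough illustration of the stratified summaries, the Python sketch below groups measurements by age group and sex and computes the listed statistics with the standard library. The function names, the input layout, and the age group labels are assumptions; whiskers are left out because the text does not say how they are defined, and the quartile method is simply the default of statistics.quantiles.&lt;br /&gt;
&lt;pre&gt;
```python
import statistics
from collections import defaultdict


def summarize(values):
    """Summary statistics for one stratum (needs at least two values)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


def summarize_by_stratum(measurements):
    """measurements: iterable of (age_group, sex, value) tuples."""
    strata = defaultdict(list)
    for age_group, sex, value in measurements:
        strata[(age_group, sex)].append(value)
    return {key: summarize(values) for key, values in strata.items()}


# Placeholder measurements; the labels and values are illustrative only.
sample = [("18-40", "F", v) for v in (1, 2, 3, 4)]
sample += [("18-40", "M", v) for v in (2, 4, 6, 8)]
for stratum, stats in summarize_by_stratum(sample).items():
    print(stratum, stats)
```
&lt;/pre&gt;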
&lt;br /&gt;
In the third step, the core databases are assembled. The main biomarker collection is generated, integrating normalized biomarker records with disease annotations, citation data, normal ranges, and cross-references. Supporting structures are then created for efficient querying by generating flattened list representations optimized for search results, including computed relevance scores and filter bitmaps; all searchable fields are tokenized into phrase-level indices to support the search layer.&lt;br /&gt;
&lt;br /&gt;
The final step produces auxiliary databases. Aggregate statistics are generated, initialization metadata for the search system is written, and precomputed sort orders for list fields are created.&lt;br /&gt;
&lt;br /&gt;
All stages of this pipeline depend on upstream dataset preparation. The preprocessing step ingests raw TSV files, applies controlled vocabulary normalization, assigns component group identifiers to link multi-component biomarkers, and outputs intermediate TSV files that serve as the input to the downstream object construction pipeline.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=204</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=204"/>
		<updated>2026-04-21T18:58:49Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Instructions to submit Biomarker Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the biomarker data model must be followed. Instructions on how to format the data, where to send it, and how to create a BCO for the submitted data are provided below.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;DOID:0080600&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;UBERON:0000178&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SOURCE:ID&amp;lt;/code&amp;gt;, for example &amp;lt;code&amp;gt;PubMed:32677844&amp;lt;/code&amp;gt;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is JSON, as it allows the data to be ingested into the existing data efficiently; TSV submissions are also accepted. The &amp;lt;code&amp;gt;data_conversion.py&amp;lt;/code&amp;gt; script in the Data Conversion folder on GitHub handles both TSV-to-JSON and JSON-to-TSV conversion.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
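Before submitting, the core field requirements listed above can be checked with a short script. The Python sketch below is only an illustration: the function name row_problems, the exact error messages, and the regular expression for the SOURCE:ID format are assumptions, and the condition name in the sample row is a placeholder.&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Assumed shape of SOURCE:ID, for example PubMed:32677844.
EVIDENCE_RE = re.compile(r"^[^:\s]+:[^:\s]+$")


def row_problems(row):
    """Return a list of problems found in one row (a dict of field values)."""
    problems = []
    for field in ("biomarker", "assessed_biomarker_entity", "assessed_biomarker_entity_id"):
        if not row.get(field, "").strip():
            problems.append("missing " + field)
    has_condition = bool(row.get("condition") and row.get("condition_id"))
    has_exposure = bool(row.get("exposure_agent") and row.get("exposure_agent_id"))
    if not (has_condition or has_exposure):
        problems.append("missing condition or exposure agent pair")
    if not str(row.get("component_group", "")).isdigit():
        problems.append("component_group is not an integer")
    evidence = row.get("evidence_source", "")
    if evidence and not EVIDENCE_RE.match(evidence):
        problems.append("evidence_source is not SOURCE:ID")
    return problems


row = {
    "biomarker": "increased IL6 level",
    "assessed_biomarker_entity": "IL6",
    "assessed_biomarker_entity_id": "UPKB:P05231",
    "condition": "placeholder condition",
    "condition_id": "DOID:0080600",
    "component_group": "1",
    "evidence_source": "PubMed:32677844",
}
print(row_problems(row))
```
&lt;/pre&gt;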
&lt;br /&gt;
=== Submission === &lt;br /&gt;
Once the data is formatted and cleaned, please send it to mazumder_lab@gwu.edu.&lt;br /&gt;
# Concurrently with submitting data please fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## This will give metadata and description on how biomarker data was collected and is important for adding submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and available on the biomarker data page as well. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# If there are any further questions please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Condition should be reported in all lowercase, and the condition ID (a Disease Ontology ID) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The name should start with a capital letter; if it is a gene symbol, it should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything but a gene, the whole name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase; the specimen_id in the following column should be an UBERON ID.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported. The text should be in lowercase, except that gene names should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene then there are different ways to report the biomarker based on how the mutation is reported:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples, please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page].&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=203</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=203"/>
		<updated>2026-04-21T18:52:30Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Once data is formatted and cleaned please send any data to daniallmasood@gwu.edu */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the biomarker data model must be followed. Instructions on how to format the data, where to send it, and how to create a BCO for the submitted data are provided below.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = DOID&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = UBERON&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;quot;SOURCE&amp;quot;:&amp;quot;ID&amp;quot;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is json, because it can be ingested into the existing data more efficiently; tsv submissions are also accepted. The &lt;code&gt;data_conversion.py&lt;/code&gt; script in the Data Conversion folder on GitHub converts tsv files to json and json files to tsv.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers: all rows that belong to the same panel should share one &lt;code&gt;biomarker_id&lt;/code&gt; value, which can be any string that uniquely identifies the panel. Documentation&lt;br /&gt;
# If curating data in tsv format: if rows belong to the same biomarker entry but differ in specimen, evidence, or role, they should share one &lt;code&gt;biomarker_id&lt;/code&gt; value, which can be any string that uniquely identifies the biomarker.&lt;br /&gt;
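&lt;br /&gt;
The core-field rules in step 2 can be checked before submission. The following is a minimal Python sketch, not part of the BiomarkerKB tooling: the function names are hypothetical, each tsv row is assumed to be loaded as a dictionary keyed by column name, &lt;code&gt;component_group&lt;/code&gt; is assumed to be a positive integer, and the IDs in the example row are placeholders.&lt;br /&gt;

```python
# Illustrative pre-submission check; names and placeholder IDs are assumptions.
ENTITY_FIELDS = ("assessed_biomarker_entity", "assessed_biomarker_entity_id")
ASSOCIATION_PAIRS = (
    ("condition", "condition_id"),
    ("exposure_agent", "exposure_agent_id"),
)


def is_filled(row, field):
    """True when the field is present and not blank."""
    return str(row.get(field) or "").strip() != ""


def core_field_problems(row):
    """Return a list of readable problems for one row (a dict)."""
    problems = []
    if not is_filled(row, "biomarker"):
        problems.append("biomarker is empty")
    for field in ENTITY_FIELDS:
        if not is_filled(row, field):
            problems.append(field + " is empty")
    # Either the condition pair or the exposure agent pair must be complete.
    if not any(is_filled(row, a) and is_filled(row, b) for a, b in ASSOCIATION_PAIRS):
        problems.append("need condition and condition_id, or exposure_agent and exposure_agent_id")
    # component_group holds integers (1, 2, 3...); zero is rejected by assumption.
    group = str(row.get("component_group") or "").strip()
    if not group.isdecimal() or int(group) == 0:
        problems.append("component_group must be a positive integer")
    return problems


# Example row; the ID values are placeholders, not real identifiers.
example_row = {
    "biomarker": "increased IL6 level",
    "assessed_biomarker_entity": "Interleukin-6",
    "assessed_biomarker_entity_id": "PLACEHOLDER:0001",
    "condition": "example condition",
    "condition_id": "DOID:PLACEHOLDER",
    "component_group": "1",
}
print(core_field_problems(example_row))
```

Returning a list of problems rather than raising on the first one lets a curator fix every issue in a row in a single pass.&lt;br /&gt;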
&lt;br /&gt;
=== Submission === &lt;br /&gt;
Once the data is formatted and cleaned, please send it to mazumder_lab@gwu.edu.&lt;br /&gt;
# When submitting data, please also fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## The form captures metadata and a description of how the biomarker data was collected, which is needed to add the submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# For further questions, please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Report the condition in all lowercase and provide the condition ID (a Disease Ontology ID, DOID) in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The name should start with a capital letter; a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, write out the full name.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase; the &lt;code&gt;specimen_id&lt;/code&gt; in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
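The lowercase rules above for condition, assessed_entity_type, best_biomarker_role, and specimen can be applied mechanically. The following is a minimal Python sketch under the assumption that each row is a dictionary keyed by column name; the function name is hypothetical. The assessed_biomarker_entity field is deliberately left untouched because its casing depends on the entity type.&lt;br /&gt;

```python
# Illustrative helper; the function name and row layout are assumptions.
LOWERCASE_FIELDS = (
    "condition",
    "assessed_entity_type",
    "best_biomarker_role",
    "specimen",
)


def apply_lowercase_rules(row):
    """Return a copy of the row with the all-lowercase fields normalized."""
    cleaned = dict(row)
    for field in LOWERCASE_FIELDS:
        if cleaned.get(field) is not None:
            # Trim stray whitespace and lowercase the value.
            cleaned[field] = str(cleaned[field]).strip().lower()
    return cleaned
```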
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported. The text should be in lowercase, except that gene symbols remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Site-specific mutation in the protein encoded by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
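The templates above can be generated programmatically to keep the wording consistent. The following is a minimal Python sketch, not part of the BiomarkerKB tooling; all function names are hypothetical. Note that the glycan section above lists only the increased form.&lt;br /&gt;

```python
# Illustrative builders for the biomarker field; all names are assumptions.
DIRECTIONS = ("increased", "decreased")


def check_direction(direction):
    """Reject anything other than the two directions used in the templates."""
    if direction not in DIRECTIONS:
        raise ValueError("direction must be increased or decreased")


def level_biomarker(direction, entity):
    """Chemical element, DNA/RNA, glycan, metabolite, and protein templates."""
    check_direction(direction)
    return direction + " " + entity + " level"


def cell_biomarker(direction, cell_name):
    """Cell template: increased or decreased cell count."""
    check_direction(direction)
    return direction + " " + cell_name + " count"


def gene_expression_biomarker(gene_symbol, direction):
    """Gene expression template; direction is over or under."""
    if direction not in ("over", "under"):
        raise ValueError("direction must be over or under")
    return gene_symbol + " " + direction + "expression"


def gene_amplification_biomarker(gene_symbol):
    """Gene amplification template."""
    return gene_symbol + " amplification"


def site_mutation_biomarker(gene_symbol, site_mutation):
    """Site mutation template, for example BRAF V600E mutation."""
    return gene_symbol + " " + site_mutation + " mutation"


def snp_biomarker(dbsnp_id, gene_symbol):
    """SNP template using a dbSNP ID and a gene symbol."""
    return "presence of " + dbsnp_id + " mutation in " + gene_symbol


print(level_biomarker("increased", "IL6"))
print(snp_biomarker("rs180177132", "PALB2"))
```

The builders leave the casing of the entity name as supplied, so gene symbols should be passed in uppercase.&lt;br /&gt;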
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=202</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=202"/>
		<updated>2026-04-21T18:51:44Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
Data submitted to the BiomarkerKB Portal must follow the biomarker data model. The instructions below cover how to format the data, where to send it, and how to create a BCO for the submitted data.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## &amp;lt;code&amp;gt;biomarker&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;condition&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; OR &amp;lt;code&amp;gt;exposure_agent&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;exposure_agent_id&amp;lt;/code&amp;gt;&lt;br /&gt;
## &amp;lt;code&amp;gt;component_group&amp;lt;/code&amp;gt; containing integers (1, 2, 3...). A multicomponent biomarker must have the same integer in all rows related to that biomarker.&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## &amp;lt;code&amp;gt;condition_id&amp;lt;/code&amp;gt; = DOID&lt;br /&gt;
## &amp;lt;code&amp;gt;specimen_id&amp;lt;/code&amp;gt; = UBERON&lt;br /&gt;
## &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; = &amp;quot;SOURCE&amp;quot;:&amp;quot;ID&amp;quot;&lt;br /&gt;
## For &amp;lt;code&amp;gt;assessed_biomarker_entity_id&amp;lt;/code&amp;gt; please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is json, because it can be ingested into the existing data more efficiently; tsv submissions are also accepted. The &lt;code&gt;data_conversion.py&lt;/code&gt; script in the Data Conversion folder on GitHub converts tsv files to json and json files to tsv.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers: all rows that belong to the same panel should share one &lt;code&gt;biomarker_id&lt;/code&gt; value, which can be any string that uniquely identifies the panel. Documentation&lt;br /&gt;
# If curating data in tsv format: if rows belong to the same biomarker entry but differ in specimen, evidence, or role, they should share one &lt;code&gt;biomarker_id&lt;/code&gt; value, which can be any string that uniquely identifies the biomarker.&lt;br /&gt;
&lt;br /&gt;
=== Once data is formatted and cleaned please send any data to daniallmasood@gwu.edu ===&lt;br /&gt;
# When submitting data, please also fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## The form captures metadata and a description of how the biomarker data was collected, which is needed to add the submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# If there are any further questions please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Report the condition in all lowercase and provide the condition ID (a Disease Ontology ID, DOID) in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The name should start with a capital letter; a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, write out the full name.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase; the &lt;code&gt;specimen_id&lt;/code&gt; in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. Its format depends on the type of entity being reported. The text should be in lowercase, except that gene symbols remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Site-specific mutation in the protein encoded by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Biomaker_Ontology_Working_Group&amp;diff=201</id>
		<title>Biomaker Ontology Working Group</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Biomaker_Ontology_Working_Group&amp;diff=201"/>
		<updated>2026-04-21T15:20:39Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: MariaKim moved page Biomaker Ontology Working Group to Biomarker Ontology Working Group: Misspelled title&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Biomarker Ontology Working Group]]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Biomarker_Ontology_Working_Group&amp;diff=200</id>
		<title>Biomarker Ontology Working Group</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Biomarker_Ontology_Working_Group&amp;diff=200"/>
		<updated>2026-04-21T15:20:39Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: MariaKim moved page Biomaker Ontology Working Group to Biomarker Ontology Working Group: Misspelled title&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The Biomarker Ontology Working Group is a collaborative effort dedicated to developing, refining, and maintaining standardized ontological frameworks that support the representation, integration, and interoperability of biomarker-related data. The group brings together ontology experts, data curators, and biomarker experts to establish consistent terminologies, formal relationships, and semantic structures that enhance the discoverability and reuse of biomarker information across diverse biomedical resources. By aligning with community standards and promoting transparent, open development practices, the working group plays a central role in ensuring that biomarker data can be reliably linked, compared, and analyzed within translational research, clinical informatics, and computational biology applications.&lt;br /&gt;
&lt;br /&gt;
The Biomarker Ontology Working Group has jointly developed the &#039;&#039;&#039;Ontology for Biomarkers of Clinical Interest (OBCI)&#039;&#039;&#039;. A fully functioning ontology in OWL format can be found at https://proteininformationresource.org/staff/nataled/OBCI/core/obci_core_full.owl&lt;br /&gt;
&lt;br /&gt;
The Ontology Working Group has also created a &#039;&#039;&#039;Controlled Vocabulary (CV)&#039;&#039;&#039; for standardizing the representation of biomarker values and biomarker entity types. The CV can be found at [[Controlled Vocabulary and Keywords]].&lt;br /&gt;
&lt;br /&gt;
The Biomarker Ontology Working Group currently has the following members:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Name&#039;&#039;&#039;&lt;br /&gt;
|&#039;&#039;&#039;ORCID&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Darren A. Natale&#039;&#039;&#039;&lt;br /&gt;
|0000-0001-5809-9523&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Raja Mazumder&#039;&#039;&#039;&lt;br /&gt;
|0000-0001-8823-9945&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Daniall Masood&#039;&#039;&#039;&lt;br /&gt;
|0000-0001-7441-1628&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Hande Küçük McGinty&#039;&#039;&#039;&lt;br /&gt;
|0000-0002-9025-5538&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Marc E. Gillespie&#039;&#039;&#039;&lt;br /&gt;
|0000-0002-5766-1702&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Astghik Sargsyan&#039;&#039;&#039;&lt;br /&gt;
|0000-0002-5860-6369&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Soheil Abadifard&#039;&#039;&#039;&lt;br /&gt;
|0000-0002-2980-4251&lt;br /&gt;
|-&lt;br /&gt;
|&#039;&#039;&#039;Jeet K. Vora&#039;&#039;&#039;&lt;br /&gt;
|0000-0002-5317-1458&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=199</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=199"/>
		<updated>2026-04-18T02:37:29Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation focused only on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* A LangChain LLM method was used to collect biomarkers from PubMed Central abstracts.&lt;br /&gt;
* The method identifies glycan entities and the disease-associated changes reported for them.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* These top 50 biomarker entities were then searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* A sample of EDRN biomarkers was provided by the EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from the free text of EDRN&#039;s publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* It does not provide a change within the entity, so biomarker data cannot be collected from it.&lt;br /&gt;
* However, it can be used as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* The data is in a text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profile.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC0 (Public Domain).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be used to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of biomarkers for all diseases.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data were collected and incorporated; however, the biomarker field was incomplete, so the integrated data were given a score of -2&lt;br /&gt;
* The data remain valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;br /&gt;
&lt;br /&gt;
= Pending Resources =&lt;br /&gt;
== biomarker.org ==&lt;br /&gt;
Reached out on March 17th, 2026 regarding data access and sent follow-up communications; however, no response was received.&lt;br /&gt;
&lt;br /&gt;
== [https://cadsr.cancer.gov/onedata/Home.jsp caDSR] ==&lt;br /&gt;
The Cancer Data Standards Registry and Repository (caDSR) is a metadata registry, not a biomarker knowledge source. It defines Common Data Elements (CDEs), including field names, definitions, and controlled value sets, but does not contain biomarker-condition relationships or evidence.&lt;br /&gt;
&lt;br /&gt;
For BiomarkerKB, it could potentially be valuable for schema standardization rather than data ingestion. For example, align fields like condition, specimen, and entity type to controlled vocabularies (via NCI Thesaurus).&lt;br /&gt;
&lt;br /&gt;
Recommendation: not ingestible as a data source.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=198</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=198"/>
		<updated>2026-04-14T19:56:06Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Resources for Exploration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* LangChain LLM method used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* The method identifies glycan entities and the changes in them that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* Using this information the top 50 biomarker entities were searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN Biomarkers provided from EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from free text in EDRN publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* These data are in a text file that must be reviewed in full to confirm that they can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in form of DNA mutations (dbSNPs).&lt;br /&gt;
* Platform provides clinicians treatment options for patients based on unique tumor profile.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be used later to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of biomarkers for all diseases.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data were collected and incorporated; however, the biomarker field was incomplete, so the integrated data were given a score of -2&lt;br /&gt;
* The data remain valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;br /&gt;
&lt;br /&gt;
= Pending Resources =&lt;br /&gt;
== biomarker.org ==&lt;br /&gt;
Reached out on March 17th, 2026 regarding data access and sent follow-up communications; however, no response was received.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=197</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=197"/>
		<updated>2026-04-14T19:55:48Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* LangChain LLM method used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* The method identifies glycan entities and the changes in them that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* Using this information the top 50 biomarker entities were searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN Biomarkers provided from EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from free text in EDRN publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* These data are in a text file that must be reviewed in full to confirm that they can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in form of DNA mutations (dbSNPs).&lt;br /&gt;
* Platform provides clinicians treatment options for patients based on unique tumor profile.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be used later to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of biomarkers for all diseases.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data were collected and incorporated; however, the biomarker field was incomplete, so the integrated data were given a score of -2&lt;br /&gt;
* The data remain valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;br /&gt;
&lt;br /&gt;
= Pending Resources =&lt;br /&gt;
== biomarker.org ==&lt;br /&gt;
Reached out on March 17th, 2026 regarding data access and sent follow-up communications; however, no response was received.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=194</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=194"/>
		<updated>2026-03-24T14:32:31Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* LangChain LLM method used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* Method identifies glycan entities and the changes in them that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* Using this information the top 50 biomarker entities were searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN Biomarkers provided from EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from free text in EDRN publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity, so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* In UniProt there are found_in and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons CC0 1.0 Universal (Public Domain Dedication).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be reused to map all biomarkers from it into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of biomarkers for all diseases.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity, before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets on PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, so the integrated data was given a score of -2&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;br /&gt;
&lt;br /&gt;
= Pending Resources =&lt;br /&gt;
== biomarker.org ==&lt;br /&gt;
Sent a message on March 17th, 2026 via biomarker.org contact form, followed up March 24th, 2026. Awaiting response.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=193</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=193"/>
		<updated>2026-03-20T16:08:23Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* LangChain LLM method used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* Method identifies glycan entities and the changes in them that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* Using this information the top 50 biomarker entities were searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN Biomarkers provided from EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from free text in EDRN publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity, so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* In UniProt there are found_in and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons CC0 1.0 Universal (Public Domain Dedication).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be reused to map all biomarkers from it into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of biomarkers for all diseases.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity, before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets on PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, so the integrated data was given a score of -2&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=192</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=192"/>
		<updated>2026-03-20T16:08:03Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* LangChain LLM method used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* Method identifies glycan entities and the changes in them that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* Using this information the top 50 biomarker entities were searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN Biomarkers provided from EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from free text in EDRN publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity, so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* In UniProt there are found_in and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons CC0 1.0 Universal (Public Domain Dedication).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be reused to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of all biomarkers.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* These biomarkers are used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from the SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, and the integrated data was given a score of -2&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=191</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=191"/>
		<updated>2026-03-19T16:42:28Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* ClinVar */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* A LangChain LLM method was used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* The method identifies glycan entities and changes mentioned in the abstracts that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* The top 50 biomarker entities were then searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* A sample of EDRN biomarkers was provided by the EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from the free text of EDRN&#039;s publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* HPO does not describe a change within the entity, so biomarker data cannot be collected from it.&lt;br /&gt;
* However, it can be used as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* The data is in a flat text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons CC0 1.0 Universal (Public Domain Dedication).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be reused to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers from &amp;quot;cancer&amp;quot; and &amp;quot;carcinoma&amp;quot; tags were pulled. Pending integration of all biomarkers.&lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* These biomarkers are used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from the SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, and the integrated data was given a score of -2&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=190</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=190"/>
		<updated>2026-03-17T20:18:26Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from a wide range of resources. Not all collected data are directly integrated into the core data model; some are included as contextual annotations or cross-references to enrich existing entries.&lt;br /&gt;
&lt;br /&gt;
= Resources for Exploration =&lt;br /&gt;
*[https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer]&lt;br /&gt;
*[https://themarker.idrblab.cn/ Marker Database]&lt;br /&gt;
*biomarker.org&lt;br /&gt;
*ResMarkerDB&lt;br /&gt;
*SalivaDB&lt;br /&gt;
*[https://glycanage.com/publications GlycanAge Publications]&lt;br /&gt;
*[https://www.cancergenomeinterpreter.org/biomarkers Cancer Genome Interpreter (Biomarkers)]&lt;br /&gt;
*[https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code])&lt;br /&gt;
*[https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
For suggestions of additional biomarker data resources, please contact: mazumder_lab@gwu.edu&lt;br /&gt;
&lt;br /&gt;
= Data Sources =&lt;br /&gt;
== GWAS ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Genome-wide association studies (GWAS) provide biomarkers in the form of SNPs.&lt;br /&gt;
* The GWAS Catalog includes SNPs associated with a wide range of diseases.&lt;br /&gt;
** Preliminary curation has only focused on cancer.&lt;br /&gt;
** As of 12/11/2026, biomarkers for all available conditions in the GWAS Catalog have been integrated.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: CC BY-NC 4.0&lt;br /&gt;
&lt;br /&gt;
== MetaKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** Clinical Interpretation of Variants in Cancer (CIViC) &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
** OncoKB &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** The Jackson Laboratory Clinical Knowledgebase (JAX-CKB) &#039;&#039;(restricted from commercial use and has share-alike requirements for non-commercial use)&#039;&#039;&lt;br /&gt;
** MolecularMatch &#039;&#039;(restricted from commercial use)&#039;&#039;&lt;br /&gt;
** Precision Medicine Knowledgebase (PMKB) &#039;&#039;(pending integration)&#039;&#039;&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component &#039;&#039;(integrated)&#039;&#039;&lt;br /&gt;
* Enables mapping of: &lt;br /&gt;
** Variant → Disease → Drug relationships&lt;br /&gt;
** Evidence levels and citations&lt;br /&gt;
** Ontology-aligned entities (genes, variants, diseases, drugs)&lt;br /&gt;
* Notes:&lt;br /&gt;
** Requires validation of entity mappings against BiomarkerKB schema&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
== Glycan LLM Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* A LangChain LLM method was used to collect biomarkers from PubMed Central abstracts&lt;br /&gt;
* The method identifies glycan entities and changes mentioned in the abstracts that are associated with disease&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only biomarkers with &amp;lt;code&amp;gt;assessed_entity_type: protein&amp;lt;/code&amp;gt; were integrated, with the goal of expanding to glycan entity types once the Glycan Structure Dictionary is finalized.&lt;br /&gt;
&lt;br /&gt;
== Top 50 Biomarkers ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified top 50 biomarker entities from BiomarkerKB&lt;br /&gt;
* The top 50 biomarker entities were then searched in PubMed&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
== EDRN ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Sample integration into data model&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* A sample of EDRN biomarkers was provided by the EDRN LLM method&lt;br /&gt;
* Biomarkers are extracted from the free text of EDRN&#039;s publicly available biomarkers&lt;br /&gt;
&lt;br /&gt;
== LOINC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Metabolite data only&lt;br /&gt;
* We are currently working with the Metabolomics Workbench group to get the complete data&lt;br /&gt;
&lt;br /&gt;
== OncoKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
== HPO ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* HPO does not describe a change within the entity, so biomarker data cannot be collected from it.&lt;br /&gt;
* However, it can be used as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
== UniProtKB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* The data is in a flat text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
== CIViC ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons CC0 1.0 Universal (Public Domain Dedication).&lt;br /&gt;
&lt;br /&gt;
== ClinVar ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be reused to map all of its biomarkers into the data model.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated. Pending integration of all biomarkers. &lt;br /&gt;
&lt;br /&gt;
== MarkerDB ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Cross-Reference&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== Metabolomics Workbench ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* These biomarkers are used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
== OncoMX ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
== OpenTargets ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* &#039;&#039;&#039;License&#039;&#039;&#039;: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Only cancer data was integrated.&lt;br /&gt;
&lt;br /&gt;
== PubMed Central Biomarker Gene Set ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
== SenNet ==&lt;br /&gt;
&#039;&#039;&#039;Status&#039;&#039;&#039;: Direct integration into data model&lt;br /&gt;
* Cell senescence biomarkers from the SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, and the integrated data was given a score of -2&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=189</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=189"/>
		<updated>2026-03-17T19:04:57Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* GWAS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, [https://www.cancergenomeinterpreter.org/biomarkers https://www.c], [https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248 Glycan Biomarkers] ([https://github.com/glygener/CarboCurator code]), [https://www.alliancegenome.org/ Alliance Genome]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
= GWAS =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a large number of diseases.&lt;br /&gt;
** Preliminary curation focused only on cancer.&lt;br /&gt;
** All available biomarkers for conditions in GWAS Catalog are integrated 12/11/2026.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
= MetaKB =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.&lt;br /&gt;
* Aggregates and standardizes variant interpretation data from six major knowledgebases:&lt;br /&gt;
** CIViC (Clinical Interpretation of Variants in Cancer)  [Already Integrated Directly]&lt;br /&gt;
** OncoKB  [Yet to be integrated]&lt;br /&gt;
** JAX-CKB (The Jackson Laboratory Clinical Knowledgebase) [Yet to be integrated]&lt;br /&gt;
** MolecularMatch [Yet to be integrated]&lt;br /&gt;
** PMKB (Precision Medicine Knowledgebase) [Yet to be integrated]&lt;br /&gt;
** Cancer Genome Interpreter (CGI) – through its &#039;&#039;Cancer Biomarkers Database&#039;&#039; component [Integrated]&lt;br /&gt;
* Enables mapping of variant–disease–drug relationships with supporting evidence levels, citations, and ontology alignment (e.g., genes, variants, diseases, and drugs).&lt;br /&gt;
* Data integration requires review to ensure harmonized entity mappings consistent with the BiomarkerKB data model.&lt;br /&gt;
* Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.&lt;br /&gt;
* Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.&lt;br /&gt;
* Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.&lt;br /&gt;
* License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:&lt;br /&gt;
** CIViC – CC0 (Public Domain)&lt;br /&gt;
** PMKB – CC-BY 4.0&lt;br /&gt;
** CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool&lt;br /&gt;
** JAX-CKB – CC-BY-NC-SA 4.0&lt;br /&gt;
** OncoKB – custom non-commercial license&lt;br /&gt;
** MolecularMatch – restricted commercial use&lt;br /&gt;
** MetaKB codebase – MIT license&lt;br /&gt;
* Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.&lt;br /&gt;
&lt;br /&gt;
= Glycan LLM Biomarkers =&lt;br /&gt;
* A LangChain-based LLM method was used to collect biomarkers from PubMed Central abstracts.&lt;br /&gt;
* The method identifies glycan entities mentioned in the abstracts, along with changes in those entities that are associated with disease.&lt;br /&gt;
&lt;br /&gt;
= Top 50 Biomarkers =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
* Biomarkers collected during Summer Volunteership&lt;br /&gt;
* Volunteers identified the top 50 biomarker entities in BiomarkerKB.&lt;br /&gt;
* These top 50 biomarker entities were then searched in PubMed.&lt;br /&gt;
* 100 biomarkers were manually curated&lt;br /&gt;
&lt;br /&gt;
= EDRN =&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
* Sample of EDRN biomarkers provided by the EDRN LLM method.&lt;br /&gt;
* Biomarkers are extracted from free text in publicly available EDRN biomarker data.&lt;br /&gt;
&lt;br /&gt;
= LOINC =&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= OncoKB =&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* A paid license is required.&lt;br /&gt;
* Cross-referencing from biomarkers in BiomarkerKB to the appropriate drug and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* It does not provide a change within the entity, so biomarker data cannot be collected from it.&lt;br /&gt;
* However, it can be used as a cross-reference in our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
= UniProtKB =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* The data is in a text file that must be reviewed in full to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation covered 56 reviewed entries that mention &amp;quot;biomarker&amp;quot; in the flat text file.&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
&lt;br /&gt;
= CIViC =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be used later to map all of its biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
= MarkerDB =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers used in the uniform newborn screening program.&lt;br /&gt;
* Detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associate conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels and gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
= SenNet Biomarker Data =&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cell senescence biomarkers from SenNet group&lt;br /&gt;
* Biomarker data was collected and incorporated; however, the biomarker field was incomplete, so the integrated data was given a score of -2.&lt;br /&gt;
* The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future.&lt;br /&gt;
For information about cross-references and annotations in BiomarkerKB, please visit [[Xrefs and annotations]].&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=181</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=181"/>
		<updated>2026-02-25T15:57:29Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
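As a minimal sketch of the scheme above, the following Python snippet parses an X.Y.Z string and bumps one component. The function names &lt;code&gt;parse_version&lt;/code&gt; and &lt;code&gt;bump&lt;/code&gt; are hypothetical and not part of the BiomarkerKB codebase, and resetting the lower components to zero on a bump is an assumption inferred from the version history on this page.&lt;br /&gt;

```python
def parse_version(text):
    """Split an 'X.Y.Z' string into a tuple of three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected X.Y.Z, got " + repr(text))
    return tuple(int(p) for p in parts)


def bump(text, change):
    """Return the next version string for a 'major', 'data', or 'fix' change."""
    major, data, fix = parse_version(text)
    if change == "major":   # data model change (X)
        return f"{major + 1}.0.0"
    if change == "data":    # new data added (Y)
        return f"{major}.{data + 1}.0"
    if change == "fix":     # bug fix or minor change (Z)
        return f"{major}.{data}.{fix + 1}"
    raise ValueError("unknown change type: " + repr(change))


print(bump("2.4.2", "fix"))
print(bump("2.3.0", "data"))
print(bump("1.0.6", "major"))
```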
&lt;br /&gt;
== Version 2.4.3 (draft) ==&lt;br /&gt;
Date: February 26, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An image biomarker was added to OncoMX data.&lt;br /&gt;
* Fixed UPKB biomarkers containing exposure agents.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The backend now reads the ChEBI API response, which is in JSON format.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.2 ==&lt;br /&gt;
Date: February 19, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An archive file of all biomarker NTriples is now available for download at [https://data.biomarkerkb.org/BMK_000019 data.biomarkerkb.org/BMK_000019].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.1 ==&lt;br /&gt;
Date: February 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/BMK_000007 data.biomarkerkb.org/BMK_000007].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* An advanced search by some data sources, e.g., ClinVar, now yields biomarkers from the data source in question instead of showing all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; field in TSV files is now constructed based on the &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; tuple. Previously it only used &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; as key, introducing inconsistencies in biomarkers that had multiple &amp;lt;code&amp;gt;biomarker_component&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* On the API level, each biomarker now contains a new field: &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]] the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records Normal ranges data from Oracle Health for Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&amp;quot;(composition)&amp;quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
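The ID cleanup described above could look roughly like the following. This is a sketch, not the script that was actually run; the &lt;code&gt;clean_xref_id&lt;/code&gt; name and the regular expressions are assumptions chosen to reproduce the two cases mentioned (a space after the namespace colon and a trailing parenthetical).&lt;br /&gt;

```python
import re


def clean_xref_id(raw):
    """Normalize an ID such as 'NCBI: 3288 (composition)' to 'NCBI:3288'."""
    value = raw.strip()
    value = re.sub(r"\s*\([^)]*\)\s*$", "", value)  # drop a trailing '(...)' note
    value = re.sub(r":\s+", ":", value, count=1)    # remove spaces after the colon
    return value


print(clean_xref_id("NCBI: 3288"))
print(clean_xref_id("NCBI:3288 (composition)"))
```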
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=179</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=179"/>
		<updated>2026-02-24T22:22:17Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.3 (draft) ==&lt;br /&gt;
Date: February 26, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An image biomarker was added to OncoMX data.&lt;br /&gt;
* Fixed UPKB biomarkers containing exposure agents.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The backend now reads the ChEBI API response, which is in JSON format.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.2 ==&lt;br /&gt;
Date: February 19, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An archive file of all biomarker NTriples is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.1 ==&lt;br /&gt;
Date: February 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* An advanced search by some data sources, e.g., ClinVar, now yields biomarkers from the data source in question instead of showing all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; field in TSV files is now constructed based on the &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; tuple. Previously it only used &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; as key, introducing inconsistencies in biomarkers that had multiple &amp;lt;code&amp;gt;biomarker_component&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* On the API level, each biomarker now contains a new field: &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]] the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records Normal ranges data from Oracle Health for Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&amp;quot;(composition)&amp;quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
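The ID formatting fix described in this release can be illustrated with a minimal Python sketch. It assumes identifiers take the form of a prefix, a colon, and a local ID; the pattern and the function name &lt;code&gt;normalize_id&lt;/code&gt; are hypothetical and do not reproduce the actual cleanup script.&lt;br /&gt;

```python
import re

# Hypothetical pattern: a namespace prefix, a colon, optional stray
# whitespace, then the local ID (everything up to the next whitespace).
_PREFIXED_ID = re.compile(r"^\s*([A-Za-z][\w.]*)\s*:\s*(\S+)")


def normalize_id(raw):
    """Return 'PREFIX:ID' with stray spaces and trailing text removed."""
    match = _PREFIXED_ID.match(raw)
    if match is None:
        raise ValueError("not a prefixed identifier: " + repr(raw))
    prefix, local_id = match.groups()
    return prefix + ":" + local_id
```

Under these assumptions, &lt;code&gt;normalize_id(&amp;quot;NCBI: 3288&amp;quot;)&lt;/code&gt; returns &lt;code&gt;NCBI:3288&lt;/code&gt;, and any text after the local ID is discarded.&lt;br /&gt;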
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=178</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=178"/>
		<updated>2026-02-24T19:12:24Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Version 2.4.1 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
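As an illustration of the X.Y.Z scheme above, the following minimal Python sketch parses a version string and names the kind of change between two releases. The function names &lt;code&gt;parse_version&lt;/code&gt; and &lt;code&gt;classify_bump&lt;/code&gt; are hypothetical and are not part of any BiomarkerKB tooling.&lt;br /&gt;

```python
def parse_version(text):
    """Split an X.Y.Z string into a (major, minor, patch) tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected X.Y.Z, got " + repr(text))
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def classify_bump(old, new):
    """Name the kind of change between two versions, per the rules above."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major update (e.g., data model change)"
    if new_v[1] != old_v[1]:
        return "new data"
    if new_v[2] != old_v[2]:
        return "bug fix or minor change"
    return "no change"
```

Comparing the integer tuples rather than the raw strings keeps the ordering numeric, so a hypothetical 2.10.0 would sort after 2.4.0.&lt;br /&gt;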
&lt;br /&gt;
== Version 2.4.2 ==&lt;br /&gt;
Date: February 19, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An archive file of all biomarker NTriples is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.1 ==&lt;br /&gt;
Date: February 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced searches by certain data sources (e.g., ClinVar) now return only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed from the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
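The effect of the key change in this bug fix can be sketched in Python as follows. This is an illustration only: the helper &lt;code&gt;build_vocab_lookup&lt;/code&gt;, the row layout, and the sample values are all made up and do not reflect the actual TSV generation code.&lt;br /&gt;

```python
def build_vocab_lookup(rows, key_fields=("biomarker_id", "biomarker_orig")):
    """Map a key built from key_fields to the controlled vocabulary term."""
    lookup = {}
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        lookup[key] = row["biomarker_controlled_vocab"]  # later rows overwrite
    return lookup


# Made-up rows: one biomarker ID that has two components.
rows = [
    {"biomarker_id": "EX0001-1", "biomarker_orig": "increased A level",
     "biomarker_controlled_vocab": "A"},
    {"biomarker_id": "EX0001-1", "biomarker_orig": "increased B level",
     "biomarker_controlled_vocab": "B"},
]

by_id_only = build_vocab_lookup(rows, key_fields=("biomarker_id",))
by_tuple = build_vocab_lookup(rows)
```

With the ID as the only key, the second component overwrites the first and one term is lost; with the tuple key, both entries are kept.&lt;br /&gt;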
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the &lt;code&gt;assessed_biomarker_entity&lt;/code&gt; value of the Troponin biomarker for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved after selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
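The header spelling check described above could work along the following lines. This is a minimal sketch using the standard library &lt;code&gt;difflib&lt;/code&gt; module, not the actual &lt;code&gt;_suggest_header_corrections&lt;/code&gt; implementation; the header list, the cutoff value, and the function name are assumptions made for illustration.&lt;br /&gt;

```python
import difflib

# Assumed subset of valid headers, taken from field names that appear
# in these notes; the real converter may use a different list.
KNOWN_HEADERS = [
    "biomarker",
    "biomarker_id",
    "assessed_biomarker_entity",
    "evidence_source",
    "condition",
]


def suggest_header_corrections(headers, known=KNOWN_HEADERS, cutoff=0.8):
    """Return {unknown_header: closest known header, or None if no match}."""
    suggestions = {}
    for header in headers:
        if header in known:
            continue  # valid header, nothing to suggest
        matches = difflib.get_close_matches(header, known, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

A caller could stop processing whenever the returned dictionary is not empty, which would keep invalid headers out of the converted output.&lt;br /&gt;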
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&amp;quot;(composition)&amp;quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=177</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=177"/>
		<updated>2026-02-24T19:10:18Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.2 ==&lt;br /&gt;
Date: February 19, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* An archive file of all biomarker NTriples is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.1 ==&lt;br /&gt;
Date: February 12, 2026&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced searches by certain data sources (e.g., ClinVar) now return only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed from the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the &lt;code&gt;assessed_biomarker_entity&lt;/code&gt; value of the Troponin biomarker for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved after selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&amp;quot;(composition)&amp;quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=176</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=176"/>
		<updated>2026-02-06T19:31:26Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB is now available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced searches by certain data sources (e.g., ClinVar) now return only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed from the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=175</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=175"/>
		<updated>2026-02-06T19:30:55Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
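The X.Y.Z scheme above can be illustrated with a minimal Python sketch. The names &amp;lt;code&amp;gt;ReleaseVersion&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;parse_version&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;change_kind&amp;lt;/code&amp;gt; are hypothetical and are not part of any BiomarkerKB tooling; the sketch only assumes that each component is a non-negative integer.&lt;br /&gt;

```python
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    """Hypothetical container for an X.Y.Z release version."""
    major: int  # X: major update, such as a data model change
    minor: int  # Y: new data added
    patch: int  # Z: bug fixes or minor changes


def parse_version(text):
    """Parse a string such as 'Version 2.4.0' or '2.4.0'."""
    number = text.strip().split()[-1]
    parts = number.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    return ReleaseVersion(*(int(p) for p in parts))


def change_kind(old, new):
    """Name the leftmost component that differs between two versions."""
    for field, a, b in zip(ReleaseVersion._fields, old, new):
        if a != b:
            return field
    return "none"
```

Under these assumptions, comparing 2.3.0 with 2.4.0 reports a minor change, which corresponds to the rule that Y increments when new data is added.&lt;br /&gt;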
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB will be available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced searches by certain data sources (e.g., ClinVar) now return only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
* GWAS and SenNet biomarkers have their controlled vocabulary terms displayed consistently.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; field in TSV files is now constructed from the (&amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt;) tuple. Previously, only &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; was used as the key, which introduced inconsistencies in biomarkers with multiple &amp;lt;code&amp;gt;biomarker_component&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
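The following Python sketch illustrates why a lookup keyed on &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; alone can lose information when one biomarker has several components. The rows, the ID X0001, the names, and the helper &amp;lt;code&amp;gt;build_lookup&amp;lt;/code&amp;gt; are invented for illustration and do not reflect the actual BiomarkerKB pipeline.&lt;br /&gt;

```python
# Hypothetical rows: one biomarker_id with two components whose original
# names differ. The ID and names are invented for illustration only.
rows = [
    {"biomarker_id": "X0001", "biomarker_orig": "increased A level",
     "biomarker_controlled_vocab": "increased A level (std)"},
    {"biomarker_id": "X0001", "biomarker_orig": "decreased B level",
     "biomarker_controlled_vocab": "decreased B level (std)"},
]


def build_lookup(rows, key_fields):
    """Map the chosen key fields to the standardized name."""
    lookup = {}
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        # A later row with the same key silently overwrites an earlier one.
        lookup[key] = row["biomarker_controlled_vocab"]
    return lookup


# Keyed on the ID alone: the two components collapse into one entry.
by_id = build_lookup(rows, ["biomarker_id"])

# Keyed on the (ID, original name) tuple: both components are kept.
by_pair = build_lookup(rows, ["biomarker_id", "biomarker_orig"])
```

In this sketch the ID-only lookup holds a single entry while the tuple-keyed lookup holds two, one per component.&lt;br /&gt;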
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
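A header spelling check of the kind described above could be sketched with Python's standard &amp;lt;code&amp;gt;difflib&amp;lt;/code&amp;gt; module. This is not the format-converter implementation: the function &amp;lt;code&amp;gt;suggest_corrections&amp;lt;/code&amp;gt;, the list &amp;lt;code&amp;gt;KNOWN_HEADERS&amp;lt;/code&amp;gt;, and the 0.8 similarity cutoff are all assumptions made for illustration.&lt;br /&gt;

```python
import difflib

# Assumed list of valid headers, using field names mentioned in these notes.
KNOWN_HEADERS = [
    "biomarker",
    "assessed_biomarker_entity",
    "condition",
    "evidence_source",
]


def suggest_corrections(headers, known=KNOWN_HEADERS, cutoff=0.8):
    """Map each unrecognized header to its closest known header, or None."""
    suggestions = {}
    for header in headers:
        if header in known:
            continue  # valid header, nothing to suggest
        matches = difflib.get_close_matches(header, known, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

A caller could stop processing a TSV file whenever the returned mapping is non-empty and show the suggested replacements to the user.&lt;br /&gt;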
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
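The ID cleanup described for this release could look roughly like the Python sketch below. The helper &amp;lt;code&amp;gt;normalize_xref&amp;lt;/code&amp;gt; is hypothetical, and the assumption that extraneous text appears as a trailing parenthetical is made only for illustration.&lt;br /&gt;

```python
import re


def normalize_xref(raw):
    """Remove a trailing parenthetical and spaces around the first colon."""
    # Assumption: extraneous text appears as a trailing "(...)" group.
    value = re.sub(r"\s*\([^)]*\)\s*$", "", raw.strip())
    # Collapse whitespace around the namespace separator, e.g. "NCBI: 3288".
    return re.sub(r"\s*:\s*", ":", value, count=1)
```

With this sketch, an input of NCBI: 3288 is normalized to NCBI:3288, and values that are already clean pass through unchanged.&lt;br /&gt;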
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=174</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=174"/>
		<updated>2026-02-05T20:09:58Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.4.0 ==&lt;br /&gt;
Date: February 5, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* MarkerDB data has been removed because its license permits free use for academic purposes only.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB will be available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed the issue where glycan biomarkers were being assigned incorrect GlyTouCan IDs in the controlled vocabulary field.&lt;br /&gt;
* Advanced searches by certain data sources (e.g., ClinVar) now return only biomarkers from the selected data source instead of all biomarkers.&lt;br /&gt;
* Duplicate entity normal range rows have been removed where applicable.&lt;br /&gt;
* Entity type casing in searches and search filters has been corrected.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt; field in TSV files is now constructed from the (&amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt;) tuple. Previously, only &amp;lt;code&amp;gt;biomarker_id&amp;lt;/code&amp;gt; was used as the key, which introduced inconsistencies in biomarkers with multiple &amp;lt;code&amp;gt;biomarker_component&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=172</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=172"/>
		<updated>2026-01-26T16:31:59Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.1 ==&lt;br /&gt;
Date: January 22, 2026&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB will be available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed using the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple as the key. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
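The keying problem behind this fix can be sketched in Python as follows. This is an assumption-laden illustration, not the actual pipeline: the rows and their values are made up, and only the three field names come from the release note.&lt;br /&gt;

```python
# Made-up rows for illustration; only the field names come from the release note.
rows = [
    {"biomarker_id": "ID-1", "biomarker_orig": "original name A",
     "biomarker_controlled_vocab": "standard name A"},
    {"biomarker_id": "ID-1", "biomarker_orig": "original name B",
     "biomarker_controlled_vocab": "standard name B"},
]

by_id = {}    # old behaviour: keyed on biomarker_id alone
by_pair = {}  # new behaviour: keyed on (biomarker_id, biomarker_orig)
for row in rows:
    vocab = row["biomarker_controlled_vocab"]
    by_id[row["biomarker_id"]] = vocab  # the second row overwrites the first
    by_pair[(row["biomarker_id"], row["biomarker_orig"])] = vocab

print(len(by_id))    # 1
print(len(by_pair))  # 2
```

With the single-field key, two components that share an ID collapse into one entry; the tuple key keeps both.&lt;br /&gt;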
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &quot;in review&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health for Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences weren&#039;t being saved when selecting &quot;Allow&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
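A header spelling check of this kind could look like the Python sketch below. It is not the format-converter implementation: the function name suggest_header, the short header list, and the 0.8 similarity cutoff are all assumptions, and the matching relies on the standard-library difflib module.&lt;br /&gt;

```python
import difflib

# Assumed header list for illustration; the real converter may use a different one.
KNOWN_HEADERS = ["biomarker", "assessed_biomarker_entity", "condition"]


def suggest_header(header, known=KNOWN_HEADERS):
    """Return the closest known header, or None when nothing is similar enough."""
    if header in known:
        return header
    # cutoff=0.8 is an arbitrary choice for this sketch
    matches = difflib.get_close_matches(header, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


print(suggest_header("biomaker"))  # biomarker
print(suggest_header("xyz"))       # None
```

Returning None for an unrecognizable header lets the caller decide whether to stop processing or to ask the user for a correction.&lt;br /&gt;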
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&quot;(composition)&quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
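The ID clean-up described in this release can be approximated with the Python sketch below. The function name clean_xref and both regular expressions are assumptions made for illustration; the script actually used on the OncoMX data is not shown in these notes.&lt;br /&gt;

```python
import re


def clean_xref(raw):
    """Normalize a cross-reference such as ' NCBI: 3288' to 'NCBI:3288'."""
    # Assumption: extraneous text appears as a parenthesized note, e.g. (composition).
    no_note = re.sub(r"\s*\([^)]*\)", "", raw)
    # Remove whitespace on either side of the colon, then trim the ends.
    return re.sub(r"\s*:\s*", ":", no_note).strip()


print(clean_xref(" NCBI: 3288"))  # NCBI:3288
```

Already clean identifiers pass through unchanged, so the function can be applied to a whole column without checking each value first.&lt;br /&gt;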
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, Metabolomics Workbench (MW), UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=162</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=162"/>
		<updated>2026-01-19T19:41:37Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The version number has three dot-separated parts: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as a change to the data model.&lt;br /&gt;
* The second number (Y) increments when new data is added.&lt;br /&gt;
* The third number (Z) increments for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.1 ==&lt;br /&gt;
Planned date: January 22, 2026&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* A master list of all biomarkers present in BiomarkerKB will be available for download at [https://data.biomarkerkb.org/ data.biomarkerkb.org].&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed using the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple as the key. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &quot;in review&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health for Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences weren&#039;t being saved when selecting &quot;Allow&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&quot;(composition)&quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, Metabolomics Workbench (MW), UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=161</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=161"/>
		<updated>2026-01-15T14:14:50Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The version number has three dot-separated parts: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as a change to the data model.&lt;br /&gt;
* The second number (Y) increments when new data is added.&lt;br /&gt;
* The third number (Z) increments for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* The &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt; field in TSV files is now constructed using the (&lt;code&gt;biomarker_id&lt;/code&gt;, &lt;code&gt;biomarker_orig&lt;/code&gt;) tuple as the key. Previously, only &lt;code&gt;biomarker_id&lt;/code&gt; was used as the key, which introduced inconsistencies in biomarkers that had multiple &lt;code&gt;biomarker_component&lt;/code&gt; objects.&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &lt;code&gt;biomarker_controlled_vocab&lt;/code&gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &lt;code&gt;biomarker_orig&lt;/code&gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &quot;in review&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health for Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences weren&#039;t being saved when selecting &quot;Allow&quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &lt;code&gt;oncomx.tsv&lt;/code&gt;, removing erroneous spaces (e.g., &lt;code&gt;NCBI: 3288&lt;/code&gt; → &lt;code&gt;NCBI:3288&lt;/code&gt;) and extraneous text (e.g., &lt;code&gt;&quot;(composition)&quot;&lt;/code&gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, Metabolomics Workbench (MW), UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=158</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=158"/>
		<updated>2026-01-12T20:35:10Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Version 2.2.0 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The version number has three dot-separated parts: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as a change to the data model.&lt;br /&gt;
* The second number (Y) increments when new data is added.&lt;br /&gt;
* The third number (Z) increments for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health, using Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added NCBI cross-references across gene biomarker entries.&lt;br /&gt;
* Integrated ChEBI cross-references for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
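The header spelling check described above can be approximated with fuzzy string matching. The sketch below is an illustration only, not the converter's actual implementation: &amp;lt;code&amp;gt;suggest_header_corrections&amp;lt;/code&amp;gt; is a hypothetical stand-in whose signature is assumed, and &amp;lt;code&amp;gt;EXPECTED_HEADERS&amp;lt;/code&amp;gt; is a small subset of data model fields chosen for the example.&lt;br /&gt;

```python
import difflib

# Illustrative subset of data model headers (an assumption for this
# example, not the converter's real header list).
EXPECTED_HEADERS = [
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_biomarker_entity_id",
    "condition",
    "condition_id",
]


def suggest_header_corrections(headers, expected=EXPECTED_HEADERS, cutoff=0.8):
    """Map each unrecognized header to its closest expected header, or None."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # header is already valid
        matches = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions


result = suggest_header_corrections(
    ["biomarker", "conditon", "assessed_biomaker_entity"]
)
for bad_header, suggestion in result.items():
    print(bad_header, "maps to", suggestion)
```

A header with no close match maps to &amp;lt;code&amp;gt;None&amp;lt;/code&amp;gt;, so a caller can reject the file instead of guessing.&lt;br /&gt;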
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
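The kind of ID cleanup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script that was actually run: &amp;lt;code&amp;gt;clean_identifier&amp;lt;/code&amp;gt; is a hypothetical helper, the UniProt accession in the second call is made up for the example, and the position of the &amp;quot;(composition)&amp;quot; text within an ID is assumed.&lt;br /&gt;

```python
import re


def clean_identifier(raw):
    """Drop a "(composition)" note and any spaces after the namespace colon."""
    cleaned = raw.replace("(composition)", "")
    # "NCBI: 3288" becomes "NCBI:3288"; already-clean IDs are left unchanged.
    cleaned = re.sub(r"^\s*([A-Za-z]+):\s+", r"\1:", cleaned)
    return cleaned.strip()


print(clean_identifier("NCBI: 3288"))                    # NCBI:3288
print(clean_identifier("UniProt: P12345 (composition)"))  # UniProt:P12345
```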
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=157</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=157"/>
		<updated>2026-01-12T20:34:38Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: 2.3.0&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments when new data is added.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.3.0 ==&lt;br /&gt;
Date: January 12, 2026&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* New dataset: Top 50 Clinically Relevant Disease Biomarkers, created and manually curated by Sparsh Gupta.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* New [https://biomarkerkb.org/biomarker-search/ Advanced Search] type: users can now search biomarkers by Data Source. The following data sources are currently available:&lt;br /&gt;
** &amp;lt;code&amp;gt;cgi&amp;lt;/code&amp;gt; (Cancer Genome Interpreter)&lt;br /&gt;
** &amp;lt;code&amp;gt;civic&amp;lt;/code&amp;gt; (CIViC)&lt;br /&gt;
** &amp;lt;code&amp;gt;clinvar&amp;lt;/code&amp;gt; (ClinVar)&lt;br /&gt;
** &amp;lt;code&amp;gt;edrn&amp;lt;/code&amp;gt; (Early Detection Research Network)&lt;br /&gt;
** &amp;lt;code&amp;gt;gwas&amp;lt;/code&amp;gt; (Genome-Wide Association Studies)&lt;br /&gt;
** &amp;lt;code&amp;gt;llm_glycan&amp;lt;/code&amp;gt; (LLM-extracted glycan biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;markerdb&amp;lt;/code&amp;gt; (MarkerDB)&lt;br /&gt;
** &amp;lt;code&amp;gt;mw&amp;lt;/code&amp;gt; (Metabolomics Workbench)&lt;br /&gt;
** &amp;lt;code&amp;gt;oncomx&amp;lt;/code&amp;gt; (OncoMX)&lt;br /&gt;
** &amp;lt;code&amp;gt;opentargets&amp;lt;/code&amp;gt; (OpenTargets)&lt;br /&gt;
** &amp;lt;code&amp;gt;PMC_biomarker_sets&amp;lt;/code&amp;gt; (PubMed Central)&lt;br /&gt;
** &amp;lt;code&amp;gt;sennet&amp;lt;/code&amp;gt; (SenNet Consortium)&lt;br /&gt;
** &amp;lt;code&amp;gt;top_50&amp;lt;/code&amp;gt; (Top-50 clinically relevant biomarkers)&lt;br /&gt;
** &amp;lt;code&amp;gt;upkb_reviewed_v2&amp;lt;/code&amp;gt; (UniProtKB)&lt;br /&gt;
&lt;br /&gt;
== Version 2.2.0 ==&lt;br /&gt;
Date: December 22, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Electronic Health Records data has been added to creatinine biomarkers.&lt;br /&gt;
* New dataset: senescence biomarkers from [https://docs.sennetconsortium.org/biomarkers/ SenNet Consortium].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* At the API level, each biomarker now contains a new field, &amp;lt;code&amp;gt;biomarker_controlled_vocab&amp;lt;/code&amp;gt;, which shows the standardized biomarker name. Original biomarker names are now shown in the &amp;lt;code&amp;gt;biomarker_orig&amp;lt;/code&amp;gt; field.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health, using Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added NCBI cross-references across gene biomarker entries.&lt;br /&gt;
* Integrated ChEBI cross-references for small molecules and metabolites.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=147</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=147"/>
		<updated>2025-12-15T15:27:25Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.1.0 ==&lt;br /&gt;
Date: December 11, 2025&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added the LLM-extracted glycan biomarker dataset provided by Cyrus Chun Hong Au Yeung.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* The incorrect download links on the [https://data.biomarkerkb.org Data Portal] have been fixed.&lt;br /&gt;
* LOINC codes are no longer tied to specimen IDs.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 4, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health, using Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added NCBI cross-references across gene biomarker entries.&lt;br /&gt;
* Integrated ChEBI cross-references for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* The BiomarkerKB data portal is available with biomarker data from OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, and CIViC.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=136</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=136"/>
		<updated>2025-12-08T20:18:47Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Version 2.0.2 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 8, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer tied to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to the [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added Electronic Health Records normal-range data from Oracle Health, using Troponin I as an example.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed an issue where cookie preferences were not saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added NCBI cross-references across gene biomarker entries.&lt;br /&gt;
* Integrated ChEBI cross-references for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
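A header check of the kind described above might look like the following sketch. It assumes fuzzy matching with the Python standard library module &amp;lt;code&amp;gt;difflib&amp;lt;/code&amp;gt;; the function name, the header list, and the 0.8 cutoff are illustrative assumptions and not the converter implementation itself.&lt;br /&gt;

```python
import difflib

# Hypothetical list of expected column names; the real converter may use a different set.
EXPECTED_HEADERS = [
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_biomarker_entity_id",
    "condition",
    "condition_id",
    "specimen_id",
    "evidence_source",
]


def suggest_header_corrections(headers, expected=EXPECTED_HEADERS, cutoff=0.8):
    """Map each unrecognized header to its closest expected name, or None."""
    suggestions = {}
    for header in headers:
        if header in expected:
            continue  # Header is already valid, nothing to suggest.
        # get_close_matches returns candidates ordered from best to worst match.
        matches = difflib.get_close_matches(header, expected, n=1, cutoff=cutoff)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

Returning suggestions instead of silently renaming columns keeps the correction user-guided: the caller can show each proposal and let the submitter accept or reject it.&lt;br /&gt;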
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
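The ID cleanup described above can be sketched with two regular expressions. This is a minimal illustration under assumptions: &amp;lt;code&amp;gt;normalize_id&amp;lt;/code&amp;gt; is a hypothetical helper, and it assumes that any parenthesized text is extraneous and that only the first colon separates the namespace from the accession.&lt;br /&gt;

```python
import re


def normalize_id(raw):
    """Return a compact PREFIX:ACCESSION string from a loosely formatted ID."""
    # Assumption: parenthesized annotations such as "(composition)" are never part of the ID.
    text = re.sub(r"\([^)]*\)", "", raw)
    # Remove whitespace on either side of the first colon only.
    text = re.sub(r"\s*:\s*", ":", text, count=1)
    # Trim any leading or trailing whitespace that remains.
    return text.strip()
```

For the example given in the note, &amp;lt;code&amp;gt;normalize_id(&amp;quot;NCBI: 3288&amp;quot;)&amp;lt;/code&amp;gt; yields &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;.&lt;br /&gt;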
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=135</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=135"/>
		<updated>2025-12-08T20:18:31Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
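The scheme above can be sketched in Python. This is only an illustration: &amp;lt;code&amp;gt;parse_version&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;change_kind&amp;lt;/code&amp;gt; are hypothetical helper names, not part of any BiomarkerKB tooling, and the labels returned are chosen here for readability.&lt;br /&gt;

```python
def parse_version(text):
    """Split an X.Y.Z string into a tuple of three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    major, release, fix = (int(part) for part in parts)
    return (major, release, fix)


def change_kind(old, new):
    """Name the leftmost component that differs between two versions."""
    # Labels are illustrative: X = major update, Y = new release, Z = bug fix or minor change.
    labels = ("major", "release", "fix")
    for label, before, after in zip(labels, parse_version(old), parse_version(new)):
        if before != after:
            return label
    return "none"
```

Comparing the parsed tuples rather than the raw strings keeps the ordering numeric, so a hypothetical 1.0.10 would sort after 1.0.6.&lt;br /&gt;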
&lt;br /&gt;
== Version 2.0.2 ==&lt;br /&gt;
Date: December 8, 2025&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* LOINC codes are no longer mapped to specimen (UBERON) IDs.&lt;br /&gt;
* For biomarkers that could not be mapped to [[Controlled Vocabulary and Keywords|Controlled Vocabulary]], the original biomarker name is displayed, followed by &amp;quot;in review&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal-range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=134</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=134"/>
		<updated>2025-12-08T14:09:41Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Data Updates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers and other resources:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://genevatool.org/ GENEVA]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://proteincapture.org/ Protein Capture Reagents Program]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://exrna-atlas.org/ exRNA Atlas]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal-range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=133</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=133"/>
		<updated>2025-11-21T00:32:51Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added cross-references to the Common Fund Data Ecosystem ([https://commonfund.nih.gov/dataecosystem CFDE]) Data Coordinating Centers:&lt;br /&gt;
** [https://www.gtexportal.org/home/ GTEx]&lt;br /&gt;
** [https://pharos.nih.gov/ Pharos]&lt;br /&gt;
** [https://genevatool.org/ GENEVA]&lt;br /&gt;
** [https://reactome.org/ Reactome]&lt;br /&gt;
** [https://undiagnosed.hms.harvard.edu/ Undiagnosed Diseases Network]&lt;br /&gt;
** [https://idg.reactome.org/ Illuminating the Druggable Genome (IDG) Reactome Portal]&lt;br /&gt;
** [https://proteincapture.org/ Protein Capture Reagents Program]&lt;br /&gt;
** [https://www.metabolomicsworkbench.org/ Metabolomics Workbench]&lt;br /&gt;
** [https://exrna-atlas.org/ exRNA Atlas]&lt;br /&gt;
** [https://maayanlab.cloud/sigcom-lincs SigCom LINCS]&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
* Added example Electronic Health Records normal-range data from Oracle Health for Troponin I.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=131</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=131"/>
		<updated>2025-11-14T16:02:29Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-digit structure: X.Y.Z.&lt;br /&gt;
* The first digit (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second digit (Y) increments with each new release.&lt;br /&gt;
* The third digit (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 2.0.0 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* The biomarker field is now standardized using controlled vocabulary terms.&lt;br /&gt;
* Added metabolite as an &amp;lt;code&amp;gt;assessed_entity_type&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Added [https://rnacentral.org/ RNAcentral] cross-reference support.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.6 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added a new dataset: MW LOINC biomarkers (&amp;lt;code&amp;gt;mw_loinc_biomarkers.tsv&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Added [https://ncithesaurus.nci.nih.gov/ National Cancer Institute Thesaurus] and [https://www.rcsb.org/ Protein Data Bank] cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Added the &amp;lt;code&amp;gt;display_name&amp;lt;/code&amp;gt; field to the &amp;lt;code&amp;gt;format-converter&amp;lt;/code&amp;gt; so data source names appear with correct casing.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.5 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Updated the Troponin biomarker value &amp;lt;code&amp;gt;assessed_biomarker_entity&amp;lt;/code&amp;gt; for consistency.&lt;br /&gt;
* Added normal ranges from Electronic Health Records provided by the University of New Mexico for Troponin biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Updated all script paths to use &amp;lt;code&amp;gt;data_source.conf&amp;lt;/code&amp;gt; and validated data source names.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=117</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=117"/>
		<updated>2025-10-22T16:24:10Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Protein Biomarker */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data for the BiomarkerKB Portal, the biomarker data model must be followed. Instructions on how to format the data for submission, where to send it, and how to create a BCO for the submitted data are provided below.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## biomarker&lt;br /&gt;
## assessed_biomarker_entity and assessed_biomarker_entity_id&lt;br /&gt;
## condition and condition_id OR exposure_agent and exposure_agent_id&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## condition_id = DOID&lt;br /&gt;
## specimen_id = UBERON&lt;br /&gt;
## evidence_source = &amp;quot;SOURCE&amp;quot;:&amp;quot;ID&amp;quot;&lt;br /&gt;
## For assessed_biomarker_entity_id please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is a json file, as it allows the data to be ingested into the existing data efficiently. However, tsv file submissions are also accepted. In the GitHub repository, the data_conversion.py script in the Data Conversion folder handles both tsv-to-json and json-to-tsv conversion.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
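The core-field requirement in step 2 can be checked before a file is sent. The following is a minimal Python sketch, not part of the BiomarkerKB tooling: the helper name missing_core_fields and the representation of a row as a plain dictionary are assumptions made for illustration.&lt;br /&gt;

```python
# Hypothetical pre-submission check; not part of the official tooling.
ALWAYS_REQUIRED = (
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_biomarker_entity_id",
)
CONDITION_PAIR = ("condition", "condition_id")
EXPOSURE_PAIR = ("exposure_agent", "exposure_agent_id")


def _filled(row, field):
    """Return True when the field is present and not blank."""
    return bool(str(row.get(field, "") or "").strip())


def missing_core_fields(row):
    """Return a list of core-field problems for one row (empty list means ok)."""
    problems = [f for f in ALWAYS_REQUIRED if not _filled(row, f)]
    has_condition = all(_filled(row, f) for f in CONDITION_PAIR)
    has_exposure = all(_filled(row, f) for f in EXPOSURE_PAIR)
    # Either the condition pair or the exposure agent pair must be complete.
    if not (has_condition or has_exposure):
        problems.append("condition/condition_id or exposure_agent/exposure_agent_id")
    return problems
```

A row that returns an empty list has every core field filled in; any other result names the fields that still need values.&lt;br /&gt;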
&lt;br /&gt;
=== Once data is formatted and cleaned please send any data to daniallmasood@gwu.edu ===&lt;br /&gt;
# Concurrently with submitting data please fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## This provides metadata and a description of how the biomarker data was collected, and is important for adding submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# If there are any further questions please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Condition should be reported in all lowercase, and the condition ID (a Disease Ontology ID, DOID) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The name should start with a capital letter; a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, the full name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase; the specimen_id in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. The reporting format depends on the type of entity being reported. The text should be in lowercase, except for gene symbols, which should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change being described:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
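The gene reporting patterns above can be expressed as small string templates. This Python sketch is illustrative only: the function names are hypothetical, and gene symbols, site labels, and dbSNP IDs are passed through unchanged rather than validated.&lt;br /&gt;

```python
# Hypothetical helpers that mirror the gene biomarker patterns listed above.
def gene_expression(symbol, over=True):
    """Return '*gene symbol* overexpression' or '*gene symbol* underexpression'."""
    suffix = "overexpression" if over else "underexpression"
    return f"{symbol} {suffix}"


def gene_amplification(symbol):
    """Return '*gene symbol* amplification'."""
    return f"{symbol} amplification"


def gene_site_mutation(symbol, site):
    """Return '*gene symbol* *site mutation* mutation'."""
    return f"{symbol} {site} mutation"


def gene_snp(symbol, dbsnp_id):
    """Return 'presence of *dbSNP ID* mutation in *gene symbol*'."""
    return f"presence of {dbsnp_id} mutation in {symbol}"
```

For instance, gene_site_mutation("BRAF", "V600E") yields the same string as the BRAF example above.&lt;br /&gt;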
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *HGNC gene symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
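The count and level patterns used for non-gene entities follow one shape, which the Python sketch below captures. It is an illustration rather than an official tool: the function name format_biomarker and the entity type strings are assumptions, and because the glycan section lists only the increased form, accepting "decreased" for glycans is also an assumption.&lt;br /&gt;

```python
# Illustrative only; the entity type strings are assumed, not taken from the data model.
LEVEL_TYPES = {"chemical element", "dna/rna", "glycan", "metabolite", "protein"}


def format_biomarker(entity_type, name, direction):
    """Build a biomarker string such as 'increased IL6 level'."""
    if direction not in ("increased", "decreased"):
        raise ValueError("direction must be 'increased' or 'decreased'")
    kind = entity_type.strip().lower()
    if kind == "cell":
        # Cell biomarkers are reported as counts.
        return f"{direction} {name} count"
    if kind in LEVEL_TYPES:
        # All other listed non-gene entities are reported as levels.
        return f"{direction} {name} level"
    raise ValueError(f"no count/level template for entity type: {entity_type}")
```

Gene biomarkers are deliberately excluded, since they use their own patterns.&lt;br /&gt;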
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=116</id>
		<title>Data Submission/Data Upload</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Submission/Data_Upload&amp;diff=116"/>
		<updated>2025-10-21T15:19:38Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* Metabolite Biomarker */ fix casing&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Instructions to submit Biomarker Data==&lt;br /&gt;
To submit data to the BiomarkerKB Portal, the biomarker data model must be followed. Instructions on how to format the data for submission, where to send it, and how to create a BCO for the submitted data are provided below.&lt;br /&gt;
&lt;br /&gt;
# Biomarker data collected should follow the biomarker data model.&lt;br /&gt;
# &amp;quot;Core&amp;quot; fields should be filled in from the data source where biomarker data is collected. Core fields:&lt;br /&gt;
## biomarker&lt;br /&gt;
## assessed_biomarker_entity and assessed_biomarker_entity_id&lt;br /&gt;
## condition and condition_id OR exposure_agent and exposure_agent_id&lt;br /&gt;
# Other fields and annotations may also be collected from the data source; if data is missing, it can be inferred or mapped from other sources.&lt;br /&gt;
# Apply the following standards to the data when possible:&lt;br /&gt;
## condition_id = DOID&lt;br /&gt;
## specimen_id = UBERON&lt;br /&gt;
## evidence_source = &amp;quot;SOURCE&amp;quot;:&amp;quot;ID&amp;quot;&lt;br /&gt;
## For assessed_biomarker_entity_id please refer to this GitHub documentation for which standards to follow&lt;br /&gt;
# Provide extra annotations from your DCC/data with the agreed upon standards from the Biomarker Annotation RFC. This data does not have to follow the data model and can be submitted in a separate file.&lt;br /&gt;
## For example: Relevant EHR data/LOINC data for biomarkers/biomarker entities can be included in a separate sheet.&lt;br /&gt;
# Create a tsv/json file with the agreed upon fields which correspond to the biomarker data model. The data dictionary provides details on what the different fields represent.&lt;br /&gt;
## The preferred submission format is a json file, as it allows the data to be ingested into the existing data efficiently. However, tsv file submissions are also accepted. In the GitHub repository, the data_conversion.py script in the Data Conversion folder handles both tsv-to-json and json-to-tsv conversion.&lt;br /&gt;
## The biomarker data page has examples of tsv data submissions and how the data should be formatted with the appropriate biomarker fields. Example&lt;br /&gt;
# For panel biomarkers, if the biomarkers are part of the same panel, the biomarker_id value for each biomarker should be any string value that can uniquely identify which rows are part of the same biomarker panel. Documentation&lt;br /&gt;
# If curating data in tsv format: If biomarker rows are part of the same biomarker entry but differ on specimen, evidence, or role, then the biomarker_id for each row should be any string value that can uniquely identify which rows are part of the same biomarker.&lt;br /&gt;
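Because rows that belong together share a biomarker_id, a submission can be sanity-checked by grouping rows on that value. The Python sketch below is a minimal illustration under the assumption that each row is a dictionary with a biomarker_id key; the function name is hypothetical.&lt;br /&gt;

```python
from collections import defaultdict


def group_rows_by_biomarker_id(rows):
    """Map each biomarker_id to the list of rows that share it, keeping row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["biomarker_id"]].append(row)
    return dict(groups)
```

Inspecting the groups shows at a glance which rows will be treated as one panel or one biomarker entry.&lt;br /&gt;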
&lt;br /&gt;
=== Once data is formatted and cleaned please send any data to daniallmasood@gwu.edu ===&lt;br /&gt;
# Concurrently with submitting data please fill out the BCO Information: Biomarker Data Google Form.&lt;br /&gt;
## This provides metadata and a description of how the biomarker data was collected, and is important for adding submitted data to the Biomarker Data page. An example of a previous BCO is provided in the sheet and is also available on the biomarker data page. [https://hivelab.biochemistry.gwu.edu/biomarker-partnership/data/BCO_000435 Example]&lt;br /&gt;
# If there are any further questions please consult the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for contributing data or reach out to Daniall using the email above.&lt;br /&gt;
&lt;br /&gt;
==Standardized and Controlled Vocabulary==&lt;br /&gt;
There is a standard way to report some biomarker data. This section covers how the actual biomarker should be reported and how other fields should be filled out.&lt;br /&gt;
&lt;br /&gt;
=== Condition ===&lt;br /&gt;
Condition should be reported in all lowercase, and the condition ID (a Disease Ontology ID, DOID) should be provided in the following column.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity ===&lt;br /&gt;
assessed_biomarker_entity is the entity in which the change is assessed.&lt;br /&gt;
&lt;br /&gt;
The name should start with a capital letter; a gene symbol should remain in all capitals (e.g., Myosin-binding protein H-like or IL6).&lt;br /&gt;
&lt;br /&gt;
If the entity type is anything other than a gene, the full name should be written out.&lt;br /&gt;
&lt;br /&gt;
=== assessed_entity_type ===&lt;br /&gt;
Report in all lowercase.&lt;br /&gt;
&lt;br /&gt;
=== assessed_biomarker_entity_id ===&lt;br /&gt;
Refer to the [https://github.com/clinical-biomarkers/biomarker-partnership/blob/main/supplementary_files/documentation/contributing_data.md GitHub Documentation] for the correct resource.&lt;br /&gt;
&lt;br /&gt;
=== best_biomarker_role ===&lt;br /&gt;
Report in all lowercase. Refer to the [https://www.ncbi.nlm.nih.gov/books/NBK326791/ BEST Resource] for the correct biomarker role.&lt;br /&gt;
&lt;br /&gt;
=== specimen ===&lt;br /&gt;
Report in all lowercase; the specimen_id in the following column should be from UBERON.&lt;br /&gt;
&lt;br /&gt;
=== biomarker ===&lt;br /&gt;
The biomarker field is the most important. The reporting format depends on the type of entity being reported. The text should be in lowercase, except for gene symbols, which should remain in all uppercase.&lt;br /&gt;
&lt;br /&gt;
==== Cell Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *cell name* count&#039;&#039;&#039;&lt;br /&gt;
* Example: increased WBC count&lt;br /&gt;
&lt;br /&gt;
==== Chemical Element Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *chemical element* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Na+ level&lt;br /&gt;
&lt;br /&gt;
==== DNA/RNA Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *DNA/RNA* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased cfDNA level&lt;br /&gt;
&lt;br /&gt;
==== Gene Biomarker ====&lt;br /&gt;
If the entity is a gene, the biomarker is reported differently depending on the type of change being described:&lt;br /&gt;
&lt;br /&gt;
* Expression of gene:&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* overexpression&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;*gene symbol* underexpression&#039;&#039;&#039;&lt;br /&gt;
** Example: EGFR overexpression&lt;br /&gt;
* Amplification of gene: &#039;&#039;&#039;*gene symbol* amplification&#039;&#039;&#039;&lt;br /&gt;
* Specific site mutation in the expressed protein that is caused by the gene: &#039;&#039;&#039;*gene symbol* *site mutation* mutation&#039;&#039;&#039;&lt;br /&gt;
** Example: BRAF V600E mutation&lt;br /&gt;
* SNPs: &#039;&#039;&#039;presence of *dbSNP ID* mutation in *gene symbol*&#039;&#039;&#039;&lt;br /&gt;
** Example: presence of rs180177132 mutation in PALB2&lt;br /&gt;
&lt;br /&gt;
==== Glycan Biomarker ====&lt;br /&gt;
Should be reported as: &#039;&#039;&#039;increased *glycan* level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Example: increased N-glycan level&lt;br /&gt;
&lt;br /&gt;
==== Metabolite Biomarker ====&lt;br /&gt;
Should be reported as:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *metabolite* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased Urea level&lt;br /&gt;
&lt;br /&gt;
==== Protein Biomarker ====&lt;br /&gt;
Should be reported as either:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;increased *protein symbol* level&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;decreased *protein symbol* level&#039;&#039;&#039;&lt;br /&gt;
* Example: increased IL6 level&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For more examples please refer to the [https://data.biomarkerkb.org/ BiomarkerKB Data Page]&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=115</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=115"/>
		<updated>2025-10-17T16:38:35Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: 1.0.4 release&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-part structure: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second number (Y) increments with each new release.&lt;br /&gt;
* The third number (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
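The X.Y.Z scheme above can be read programmatically. The Python sketch below is a hedged illustration, not code from the BiomarkerKB repositories; the function names parse_version and change_kind are hypothetical.&lt;br /&gt;

```python
def parse_version(text):
    """Split an 'X.Y.Z' string into a tuple of three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    return tuple(int(p) for p in parts)


def change_kind(old, new):
    """Name the highest-order component that differs between two versions."""
    labels = ("major update", "new release", "bug fix or minor change")
    for label, a, b in zip(labels, parse_version(old), parse_version(new)):
        if a != b:
            return label
    return "no change"
```

Comparing two parsed tuples with the usual ordering operators also gives the release order, since tuples compare component by component.&lt;br /&gt;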
&lt;br /&gt;
== Version 1.0.4 ==&lt;br /&gt;
This release introduces new datasets, cross-references, and bug fixes.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added Cancer Genome Interpreter data on cancer biomarkers from MetaKB.&lt;br /&gt;
* Added Metabolomics Workbench LOINC data on metabolite biomarkers.&lt;br /&gt;
* Added Cell Ontology and Protein Ontology cross-references.&lt;br /&gt;
=== Bug Fixes ===&lt;br /&gt;
* Fixed issue where cookie preferences weren&#039;t being saved when selecting &amp;quot;Allow&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
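One common way to propose fixes for misspelled headers is fuzzy matching against a list of known names. The Python sketch below uses the standard library &amp;lt;code&amp;gt;difflib&amp;lt;/code&amp;gt; module to show the idea; it is not the converter's actual implementation, and the function name, header list, and cutoff value are all assumptions.&lt;br /&gt;

```python
import difflib

# Assumed header list for illustration; the real converter may use a different set.
KNOWN_HEADERS = [
    "biomarker",
    "assessed_biomarker_entity",
    "assessed_biomarker_entity_id",
    "condition",
    "condition_id",
    "specimen",
    "specimen_id",
    "best_biomarker_role",
    "evidence_source",
]


def suggest_header_corrections(headers, known=KNOWN_HEADERS):
    """Map each unrecognized header to its closest known header, or None."""
    suggestions = {}
    for header in headers:
        if header in known:
            continue
        # Keep only the single best candidate above the similarity cutoff.
        matches = difflib.get_close_matches(header, known, n=1, cutoff=0.8)
        suggestions[header] = matches[0] if matches else None
    return suggestions
```

Headers with no close match map to None, so a caller can decide whether to stop processing or ask the user for a correction.&lt;br /&gt;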
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend &amp;amp; Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
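The whitespace part of the ID fix above can be illustrated with a short regular expression. This Python sketch is an assumption about how such a cleanup might look, not the script that was actually run; the name tidy_id is hypothetical, and removal of extraneous text is not covered.&lt;br /&gt;

```python
import re

# A namespace prefix, a colon, then stray whitespace before the accession.
_SPACE_AFTER_COLON = re.compile(r"^([A-Za-z][\w.]*):\s+(\S)")


def tidy_id(raw):
    """Strip surrounding whitespace and any whitespace right after the colon."""
    cleaned = raw.strip()
    return _SPACE_AFTER_COLON.sub(r"\1:\2", cleaned)
```

Values that are already well formed pass through unchanged, so the function is safe to apply to a whole column.&lt;br /&gt;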
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=112</id>
		<title>Data Release Notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=Data_Release_Notes&amp;diff=112"/>
		<updated>2025-09-25T16:10:20Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: Data release 1.0.3&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Versioning Format ==&lt;br /&gt;
The versioning format follows a three-part structure: X.Y.Z.&lt;br /&gt;
* The first number (X) changes when a major update is introduced, such as changes in the data model.&lt;br /&gt;
* The second number (Y) increments with each new release.&lt;br /&gt;
* The third number (Z) is updated for bug fixes or minor changes.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.3 ==&lt;br /&gt;
This release introduces new cross-references and updates to ensure compatibility with external resources.&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* NCBI cross-references added across gene biomarker entries.&lt;br /&gt;
* ChEBI cross-references integrated for small molecules and metabolites.&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* ChEBI API migration: Updated all programmatic links from the legacy SOAP services to the new REST API endpoints, following ChEBI’s platform migration.&lt;br /&gt;
** Old services retired 1 September 2025.&lt;br /&gt;
** New stable API: [https://www.ebi.ac.uk/chebi/backend/api/docs ChEBI REST API docs]&lt;br /&gt;
** New data products and beta interface available at [https://www.ebi.ac.uk/chebi/beta/ ChEBI 2.0].&lt;br /&gt;
== Version 1.0.2 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Published updated [https://www.metabolomicsworkbench.org/ Metabolomics Workbench] data.&lt;br /&gt;
* Published sample data from the [https://edrn.nci.nih.gov/ Early Detection Research Network].&lt;br /&gt;
=== Backend and Infrastructure Updates ===&lt;br /&gt;
* &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; database names now retain their original casing for accuracy and consistency.&lt;br /&gt;
* EDRN identifiers were added to the [https://github.com/clinical-biomarkers/format-converter/blob/main/mapping_data/namespace_map.json namespace map].&lt;br /&gt;
* [https://www.genenames.org/ HUGO Gene Nomenclature Committee] (HGNC) was added to the cross-reference JSON file.&lt;br /&gt;
* Fixed an issue where &amp;lt;code&amp;gt;evidence_source&amp;lt;/code&amp;gt; values without tags were previously dropped; these are now preserved.&lt;br /&gt;
* Added a user-guided spelling correction function to improve data entry quality.&lt;br /&gt;
* The TSV-to-JSON converter now automatically checks for header spelling errors.&lt;br /&gt;
* Introduced &amp;lt;code&amp;gt;_suggest_header_corrections&amp;lt;/code&amp;gt; to flag and propose fixes for misspelled headers.&lt;br /&gt;
* Enhanced &amp;lt;code&amp;gt;_stream_tsv&amp;lt;/code&amp;gt; with a call to &amp;lt;code&amp;gt;_check_header_spelling&amp;lt;/code&amp;gt; to prevent invalid headers from being processed.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.1 ==&lt;br /&gt;
=== Data Updates ===&lt;br /&gt;
* Added &amp;lt;code&amp;gt;xrefs.tsv&amp;lt;/code&amp;gt; to the list of datasets.&lt;br /&gt;
=== Backend &amp;amp; Infrastructure Updates ===&lt;br /&gt;
* Fixed ID formatting issues in NCBI and UniProt references within &amp;lt;code&amp;gt;oncomx.tsv&amp;lt;/code&amp;gt;, removing erroneous spaces (e.g., &amp;lt;code&amp;gt;NCBI: 3288&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;NCBI:3288&amp;lt;/code&amp;gt;) and extraneous text (e.g., &amp;lt;code&amp;gt;&amp;quot;(composition)&amp;quot;&amp;lt;/code&amp;gt;). Affected biomarkers included AN6295-1, AN6756-1, AN6728-1, and others.&lt;br /&gt;
* Merged assessed entity type synonyms.&lt;br /&gt;
&lt;br /&gt;
== Version 1.0.0 ==&lt;br /&gt;
* BiomarkerKB data portal available with OncoMX, OpenTargets, MarkerDB, ClinVar, PubMed Central Biomarker Gene Set Curation, MW, UniProtKB, GWAS, CIViC biomarker data.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=111</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=111"/>
		<updated>2025-09-15T19:08:47Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* UniProtKB */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options based on a patient&#039;s unique tumor profile.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; the existing script will be used to map all of its biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a vast number of diseases.&lt;br /&gt;
** Preliminary curation only focused on cancer.&lt;br /&gt;
** Will use existing script to map all biomarkers into data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL.&lt;br /&gt;
* Over 600 single and panel biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets on PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associate different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be fully reviewed to confirm that it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file.&lt;br /&gt;
* License: Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=110</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=110"/>
		<updated>2025-09-15T19:07:50Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* OpenTargets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from a resource is sometimes added only as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large, but we will go back and use the existing script to map all biomarkers from here into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a large number of diseases.&lt;br /&gt;
** Preliminary curation only focused on cancer.&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* A paid license is required.&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL.&lt;br /&gt;
* Over 600 single and panel biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be reviewed in full to make sure it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
* UniProt contains found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries that mention &amp;quot;biomarker&amp;quot; in the flat text file.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=109</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=109"/>
		<updated>2025-09-15T19:07:20Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* OncoKB */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from a resource is sometimes added only as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large, but we will go back and use the existing script to map all biomarkers from here into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a large number of diseases.&lt;br /&gt;
** Preliminary curation only focused on cancer.&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* A paid license is required.&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL.&lt;br /&gt;
* Over 600 single and panel biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be reviewed in full to make sure it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
* UniProt contains found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries that mention &amp;quot;biomarker&amp;quot; in the flat text file.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=108</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=108"/>
		<updated>2025-09-15T19:06:48Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* MarkerDB */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from a resource is sometimes added only as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large, but we will go back and use the existing script to map all biomarkers from here into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a large number of diseases.&lt;br /&gt;
** Preliminary curation only focused on cancer.&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* A paid license is required.&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL.&lt;br /&gt;
* Over 600 single and panel biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be reviewed in full to make sure it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
* UniProt contains found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries that mention &amp;quot;biomarker&amp;quot; in the flat text file.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=107</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=107"/>
		<updated>2025-09-15T19:06:21Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* HPO */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from a resource is sometimes added only as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large, but we will go back and use the existing script to map all biomarkers from here into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a large number of diseases.&lt;br /&gt;
** Preliminary curation only focused on cancer.&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations.&lt;br /&gt;
* Does not provide a change within the entity so we cannot collect biomarker data from here.&lt;br /&gt;
* However, we can use it as a cross-reference within our cross-referencing section.&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO.&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.&lt;br /&gt;
* Annotations that can be cross-referenced include the above.&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities.&lt;br /&gt;
* Also provides information based on what condition the entity is related to.&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* A paid license is required.&lt;br /&gt;
* Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers.&lt;br /&gt;
* Manual curation effort by GWU and JPL.&lt;br /&gt;
* Over 600 single and panel biomarkers.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets.&lt;br /&gt;
* Some effort was required to find the correct biomarker data.&lt;br /&gt;
* 1200 biomarkers collected.&lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data.&lt;br /&gt;
* This data is in a text file that must be reviewed in full to make sure it can be extracted automatically.&lt;br /&gt;
* Contextual information can be imputed if necessary.&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0).&lt;br /&gt;
* UniProt contains found_in entries and entries that are actual biomarkers:&lt;br /&gt;
** found_in will get a cross-reference;&lt;br /&gt;
** actual biomarkers will be directly integrated.&lt;br /&gt;
* Manual curation of 56 reviewed entries that mention &amp;quot;biomarker&amp;quot; in the flat text file.&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=106</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=106"/>
		<updated>2025-09-15T19:05:34Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* GWAS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from a resource is sometimes added only as contextual annotations or cross-references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; we will go back and use the existing script to map all of its biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS).&lt;br /&gt;
* Provides biomarkers in the form of SNPs.&lt;br /&gt;
* GWAS Catalog contains SNPs for a vast number of diseases.&lt;br /&gt;
** Preliminary curation focused only on cancer.&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations&lt;br /&gt;
* Does not provide a change within the entity&lt;br /&gt;
* Therefore, we cannot collect biomarker data from it&lt;br /&gt;
* However, we can use it as a cross reference in our cross-referencing section&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License. &lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc&lt;br /&gt;
* Annotations that can be cross-referenced include the above&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities&lt;br /&gt;
* Also provides information based on what condition the entity is related to&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets&lt;br /&gt;
* Some effort was required to find the correct biomarker data &lt;br /&gt;
* 1200 biomarkers collected &lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data&lt;br /&gt;
* This data is in a text file that has to be reviewed fully to make sure it can be extracted automatically&lt;br /&gt;
* Contextual information can be imputed if necessary&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0)&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers&lt;br /&gt;
** found_in entries will get a cross reference&lt;br /&gt;
** actual biomarkers will be directly integrated&lt;br /&gt;
*Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=105</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=105"/>
		<updated>2025-09-15T19:05:11Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* EDRN */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; we will go back and use the existing script to map all of its biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers.&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS)&lt;br /&gt;
* Provides biomarkers in the form of SNPs&lt;br /&gt;
* GWAS Catalog contains SNPs for a vast number of diseases&lt;br /&gt;
** Preliminary curation focused only on cancer&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations&lt;br /&gt;
* Does not provide a change within the entity&lt;br /&gt;
* Therefore, we cannot collect biomarker data from it&lt;br /&gt;
* However, we can use it as a cross reference in our cross-referencing section&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License. &lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc&lt;br /&gt;
* Annotations that can be cross-referenced include the above&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities&lt;br /&gt;
* Also provides information based on what condition the entity is related to&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets&lt;br /&gt;
* Some effort was required to find the correct biomarker data &lt;br /&gt;
* 1200 biomarkers collected &lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data&lt;br /&gt;
* This data is in a text file that has to be reviewed fully to make sure it can be extracted automatically&lt;br /&gt;
* Contextual information can be imputed if necessary&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0)&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers&lt;br /&gt;
** found_in entries will get a cross reference&lt;br /&gt;
** actual biomarkers will be directly integrated&lt;br /&gt;
*Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=104</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=104"/>
		<updated>2025-09-15T19:04:58Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* ClinVar */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses.&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; we will go back and use the existing script to map all of its biomarkers into the data model.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS)&lt;br /&gt;
* Provides biomarkers in the form of SNPs&lt;br /&gt;
* GWAS Catalog contains SNPs for a vast number of diseases&lt;br /&gt;
** Preliminary curation focused only on cancer&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations&lt;br /&gt;
* Does not provide a change within the entity&lt;br /&gt;
* Therefore, we cannot collect biomarker data from it&lt;br /&gt;
* However, we can use it as a cross reference in our cross-referencing section&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License. &lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc&lt;br /&gt;
* Annotations that can be cross-referenced include the above&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities&lt;br /&gt;
* Also provides information based on what condition the entity is related to&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets&lt;br /&gt;
* Some effort was required to find the correct biomarker data &lt;br /&gt;
* 1200 biomarkers collected &lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data&lt;br /&gt;
* This data is in a text file that has to be reviewed fully to make sure it can be extracted automatically&lt;br /&gt;
* Contextual information can be imputed if necessary&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0)&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers&lt;br /&gt;
** found_in entries will get a cross reference&lt;br /&gt;
** actual biomarkers will be directly integrated&lt;br /&gt;
*Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
	<entry>
		<id>https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=103</id>
		<title>BiomarkerKB Resource Integration</title>
		<link rel="alternate" type="text/html" href="https://wiki.biomarkerkb.org/index.php?title=BiomarkerKB_Resource_Integration&amp;diff=103"/>
		<updated>2025-09-15T19:04:37Z</updated>

		<summary type="html">&lt;p&gt;MariaKim: /* CIViC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross references.&lt;br /&gt;
&lt;br /&gt;
Other resources to be explored: [https://search.cancervariants.org/ MetaKB], [https://cadsr.cancer.gov/onedata/Home.jsp CADSR Cancer], https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.&lt;br /&gt;
&lt;br /&gt;
=CIViC=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Clinical Interpretation of Variants in Cancer (CIViC).&lt;br /&gt;
* Provides cancer biomarkers in the form of DNA mutations (dbSNPs).&lt;br /&gt;
* The platform provides clinicians with treatment options for patients based on their unique tumor profiles.&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=ClinVar=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Public archive of reports of human variations classified for diseases and drug responses&lt;br /&gt;
* Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now&lt;br /&gt;
** dbSNPs&lt;br /&gt;
** The file is very large; we will go back and use the existing script to map all of its biomarkers into the data model&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=EDRN=&lt;br /&gt;
Status: Sample Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Cancer biomarkers&lt;br /&gt;
&lt;br /&gt;
=GWAS=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Published genome-wide association studies (GWAS)&lt;br /&gt;
* Provides biomarkers in the form of SNPs&lt;br /&gt;
* GWAS Catalog contains SNPs for a vast number of diseases&lt;br /&gt;
** Preliminary curation focused only on cancer&lt;br /&gt;
** We will use the existing script to map all biomarkers into the data model&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=HPO=&lt;br /&gt;
&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* HPO provides disease and entity associations&lt;br /&gt;
* Does not provide a change within the entity&lt;br /&gt;
* Therefore, we cannot collect biomarker data from it&lt;br /&gt;
* However, we can use it as a cross reference in our cross-referencing section&lt;br /&gt;
* Provides cross-reference to OMIM, SNOMED, and MONDO&lt;br /&gt;
&lt;br /&gt;
=LOINC=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=MarkerDB=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Provides a lot of useful biomarker data and cross-references other resources as well&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License. &lt;br /&gt;
* Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc&lt;br /&gt;
* Annotations that can be cross-referenced include the above&lt;br /&gt;
* By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers&lt;br /&gt;
&lt;br /&gt;
=Metabolomics Workbench=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Metabolomics Workbench&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Metabolite biomarkers utilized in the uniform newborn screening program.&lt;br /&gt;
* Detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.&lt;br /&gt;
&lt;br /&gt;
=OncoKB=&lt;br /&gt;
Status: Cross-Reference&lt;br /&gt;
&lt;br /&gt;
* Provides useful information on drugs and therapy options for different biomarker entities&lt;br /&gt;
* Also provides information based on what condition the entity is related to&lt;br /&gt;
* License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.&lt;br /&gt;
* Paid license is required&lt;br /&gt;
* Cross reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution&lt;br /&gt;
&lt;br /&gt;
=OncoMX=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Integrated cancer mutation and expression resource for exploring cancer biomarkers&lt;br /&gt;
* Manual curation effort by GWU and JPL&lt;br /&gt;
* Over 600 single and panel biomarkers&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=OpenTargets=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Collects potential drug targets and therapeutic targets&lt;br /&gt;
* Some effort was required to find the correct biomarker data &lt;br /&gt;
* 1200 biomarkers collected &lt;br /&gt;
** dbSNPs related to cancer and other diseases&lt;br /&gt;
* License: Creative Commons Attribution-NonCommercial 4.0 International License.&lt;br /&gt;
&lt;br /&gt;
=PubMed Central Biomarker Gene Set Curation=&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Data provided by Avi Ma&#039;ayan&#039;s LINCS group&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.&lt;br /&gt;
* Using the search results from the Rummagene web server, we manually identified publications that associated different conditions and environmental exposures with biomarker gene sets.&lt;br /&gt;
* The biomarker gene sets were retrieved by validating the genes mentioned in each publication.&lt;br /&gt;
* The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.&lt;br /&gt;
&lt;br /&gt;
=UniProtKB=&lt;br /&gt;
&lt;br /&gt;
Status: Direct Integration into Data Model&lt;br /&gt;
&lt;br /&gt;
* Can provide biomarker (change in entity), entity, condition, and sampling data&lt;br /&gt;
* This data is in a text file that has to be reviewed fully to make sure it can be extracted automatically&lt;br /&gt;
* Contextual information can be imputed if necessary&lt;br /&gt;
* License is Creative Commons Attribution 4.0 International (CC BY 4.0)&lt;br /&gt;
* UniProt contains both found_in entries and entries that are actual biomarkers&lt;br /&gt;
** found_in entries will get a cross reference&lt;br /&gt;
** actual biomarkers will be directly integrated&lt;br /&gt;
*Manual curation of 56 reviewed entries with mention of &amp;quot;biomarker&amp;quot; in flat text file&lt;/div&gt;</summary>
		<author><name>MariaKim</name></author>
	</entry>
</feed>