BiomarkerKB Resource Integration

Revision as of 16:20, 16 December 2025

BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross-references.

Other resources to be explored: CADSR Cancer, https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.c, Glycan Biomarkers (code), Alliance Genome

If you know of other resources that may contain biomarker data, please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu.

GWAS

Status: Direct Integration into Data Model

Published genome-wide association studies (GWAS).
Provides biomarkers in the form of SNPs.
The GWAS Catalog contains SNPs for a vast number of diseases.
- Preliminary curation focused only on cancer.
- All available biomarkers for conditions in the GWAS Catalog were integrated (as of 12/11).
License: Creative Commons Attribution-NonCommercial 4.0 International License.
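The Python sketch below roughly illustrates how SNP biomarkers can be pulled out of the GWAS Catalog. It groups rsIDs by reported trait from the catalog's tab-separated associations download. It is not the script used for BiomarkerKB. The column names (`DISEASE/TRAIT`, `SNPS`, `P-VALUE`), the SNP separators, and the optional p-value cutoff are all assumptions to check against the release actually used.

```python
import csv
import io
from collections import defaultdict

# Assumed column names from the GWAS Catalog "all associations" TSV download;
# verify them against the header of the release actually used.
TRAIT_COL = "DISEASE/TRAIT"
SNP_COL = "SNPS"
PVALUE_COL = "P-VALUE"


def snps_by_trait(tsv_text, keyword=None, max_p=None):
    """Group SNP rsIDs by reported trait.

    keyword: optional case-insensitive substring filter on the trait
             (e.g. "cancer", as in the preliminary cancer-only curation).
    max_p:   optional p-value cutoff; None keeps every association.
    """
    grouped = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        trait = row[TRAIT_COL].strip()
        if keyword and keyword.lower() not in trait.lower():
            continue
        if max_p is not None:
            try:
                if float(row[PVALUE_COL]) > max_p:
                    continue
            except ValueError:
                continue  # missing or non-numeric p-value
        # Assumed separators: ";" between SNPs, " x " for SNP-SNP interactions.
        for snp in row[SNP_COL].replace(" x ", ";").split(";"):
            snp = snp.strip()
            if snp.startswith("rs"):
                grouped[trait].add(snp)
    return {trait: sorted(snps) for trait, snps in grouped.items()}
```

For the full download, handing an open file object straight to `csv.DictReader` avoids reading the whole file into one string.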

MetaKB

Status: Direct Integration into Data Model

Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.
Aggregates and standardizes variant interpretation data from six major knowledgebases:
- CIViC (Clinical Interpretation of Variants in Cancer) [Already Integrated Directly]
- OncoKB [Yet to be integrated]
- JAX-CKB (The Jackson Laboratory Clinical Knowledgebase) [Yet to be integrated]
- MolecularMatch [Yet to be integrated]
- PMKB (Precision Medicine Knowledgebase) [Yet to be integrated]
- Cancer Genome Interpreter (CGI), through its Cancer Biomarkers Database component [Integrated]
Enables mapping of variant–disease–drug relationships with supporting evidence levels, citations, and ontology alignment (e.g., genes, variants, diseases, and drugs).
Data integration requires review to ensure harmonized entity mappings consistent with the BiomarkerKB data model.
Focused on somatic variant-based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.
Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.
Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.
License: Aggregated data are available for non-commercial research use only, subject to the constituent licenses:
- CIViC – CC0 (Public Domain)
- PMKB – CC-BY 4.0
- CGI – CC0 for the biomarkers database, CC-BY-NC 4.0 for the tool
- JAX-CKB – CC-BY-NC-SA 4.0
- OncoKB – custom non-commercial license
- MolecularMatch – restricted commercial use
- MetaKB codebase – MIT license
Overall, use requires adherence to non-commercial research terms; commercial use requires separate permission from the individual data providers.
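The sketch below is a minimal illustration of the integration approach described above: variant, condition, and evidence entities are mapped directly, and a cross-reference back to the originating source is kept. Several pieces are hypothetical simplifications: the record classes, their field names, and the `ALLOWED_SOURCES` set. They are not MetaKB's actual output format or the BiomarkerKB data model.

```python
from dataclasses import dataclass, field

# Hypothetical: sources marked as integrated on this page (CIViC, CGI).
# Extend the set as other knowledgebases are cleared for integration.
ALLOWED_SOURCES = {"civic", "cgi"}


@dataclass
class HarmonizedEvidence:
    """Hypothetical, simplified view of one variant-disease-drug association."""
    gene: str
    variant: str
    disease: str
    disease_id: str  # ontology CURIE; empty string when unmapped
    drug: str
    evidence_level: str
    source: str      # originating knowledgebase, e.g. "civic"
    source_id: str   # identifier in the originating knowledgebase
    pmids: list = field(default_factory=list)


@dataclass
class BiomarkerRecord:
    """Hypothetical target record; not the actual BiomarkerKB schema."""
    biomarker: str
    assessed_entity: str
    condition: str
    condition_id: str
    therapy: str
    evidence_level: str
    citations: list
    cross_references: list
    needs_curation: bool


def to_biomarker_record(ev):
    """Map variant, condition and evidence directly; retain a source cross-reference."""
    return BiomarkerRecord(
        biomarker=f"{ev.gene} {ev.variant}",
        assessed_entity=ev.gene,
        condition=ev.disease,
        condition_id=ev.disease_id,
        therapy=ev.drug,
        evidence_level=ev.evidence_level,
        citations=[f"PMID:{pmid}" for pmid in ev.pmids],
        cross_references=[f"{ev.source}:{ev.source_id}"],
        # Missing ontology IDs or citations are routed to manual curation.
        needs_curation=not ev.disease_id or not ev.pmids,
    )


def map_allowed(evidence):
    """Keep only allowed sources (case-insensitive), then map each record."""
    return [to_biomarker_record(ev) for ev in evidence
            if ev.source.lower() in ALLOWED_SOURCES]
```

The `needs_curation` flag mirrors the note above that entries with incomplete evidence or missing ontology references may need manual curation.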

Glycan LLM Biomarkers

A LangChain-based LLM method was used to collect biomarkers from PubMed Central abstracts.
The method identifies glycan entities, and changes in them, that are mentioned in association with a disease.

Top 50 Biomarkers

Status: Direct Integration into Data Model

Biomarkers were collected during the Summer Volunteership.
Volunteers identified the top 50 biomarker entities in BiomarkerKB.
These 50 entities were then searched in PubMed.
100 biomarkers were manually curated.
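The page does not say how the PubMed searches were run. If they were scripted, one option is NCBI's E-utilities ESearch service, and the sketch below only builds the request URLs. The query pattern (entity name plus "biomarker" in the title or abstract) and the helper names are illustrative assumptions.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(entity, extra_terms=("biomarker",)):
    """Require the entity name and each extra term in the title or abstract."""
    clauses = [f'"{entity}"[Title/Abstract]']
    clauses += [f"{term}[Title/Abstract]" for term in extra_terms]
    return " AND ".join(clauses)


def esearch_url(query, retmax=20):
    """Build an ESearch request URL that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def urls_for_entities(entities, retmax=20):
    """One search URL per biomarker entity, in input order."""
    return [esearch_url(build_pubmed_query(e), retmax) for e in entities]
```

Fetching one of these URLs (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` field lists the matching PMIDs. Without an API key, NCBI asks clients to stay under three requests per second.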

EDRN

Status: Sample Integration into Data Model

Cancer biomarkers.
A sample of EDRN biomarkers was produced by the EDRN LLM method.
Biomarkers are extracted from the free text of EDRN's publicly available biomarker entries.

LOINC

Status: Cross-Reference

Data provided by Metabolomics Workbench

OncoKB

Status: Cross-Reference

Provides useful information on drugs and therapy options for different biomarker entities.
Also provides information on the conditions each entity is associated with.
License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.
A paid license is required.
The best solution is to cross-reference biomarkers in BiomarkerKB to the corresponding OncoKB drug and therapy information.

HPO

Status: Cross-Reference

HPO provides disease and entity associations.
It does not describe a change in the entity, so biomarker data cannot be collected from it.
However, it can be used as a cross-reference in the BiomarkerKB cross-reference section.
Provides cross-reference to OMIM, SNOMED, and MONDO.

UniProtKB

Status: Direct Integration into Data Model

Can provide biomarker (change in entity), entity, condition, and sampling data.
This data is in a flat text file that must be reviewed fully to confirm that it can be extracted automatically.
Contextual information can be imputed if necessary.
UniProt contains both found_in entries and entries that describe actual biomarkers:
- found_in entries will get a cross-reference;
- actual biomarkers will be integrated directly.
56 reviewed entries that mention "biomarker" in the flat text file were manually curated.
License is Creative Commons Attribution 4.0 International (CC BY 4.0).
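The sketch below shows how reviewed entries that mention "biomarker" could be pre-selected from the UniProtKB flat file before manual curation. It relies on the standard flat-file line codes (`ID`, `AC`, `CC`, and `//` as the entry terminator). It searches only `CC` comment blocks, which is narrower than a full-text scan. It also does not separate found_in entries from actual biomarkers, so that split still needs review.

```python
def iter_entries(flat_text):
    """Split UniProtKB flat-file text into entries; a '//' line ends each one."""
    entry = []
    for line in flat_text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:
        yield entry


def is_reviewed(entry):
    """Reviewed (Swiss-Prot) entries carry 'Reviewed;' on the ID line."""
    return any(line.startswith("ID ") and "Reviewed;" in line.split() for line in entry)


def primary_accession(entry):
    """First accession on the first AC line."""
    for line in entry:
        if line.startswith("AC "):
            return line[5:].split(";")[0].strip()
    return None


def comment_blocks(entry):
    """Join CC lines into '-!- TOPIC: text' blocks, stopping at the copyright footer."""
    blocks, current = [], []
    for line in entry:
        if not line.startswith("CC "):
            continue
        text = line[5:]
        if text.startswith("-!-"):
            if current:
                blocks.append(" ".join(current))
            current = [text[3:].strip()]
        elif text.startswith("---"):
            # Dashed line opens the copyright footer; ignore what follows.
            if current:
                blocks.append(" ".join(current))
            current = []
        elif current:
            current.append(text.strip())
    if current:
        blocks.append(" ".join(current))
    return blocks


def biomarker_candidates(flat_text, term="biomarker"):
    """(accession, matching comment blocks) for reviewed entries mentioning `term`."""
    hits = []
    for entry in iter_entries(flat_text):
        if not is_reviewed(entry):
            continue
        matches = [b for b in comment_blocks(entry) if term in b.lower()]
        if matches:
            hits.append((primary_accession(entry), matches))
    return hits
```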

CIViC

Status: Direct Integration into Data Model

Clinical Interpretation of Variants in Cancer (CIViC).
Provides cancer biomarkers in the form of DNA mutations (dbSNP variants).
The platform helps clinicians identify treatment options for patients based on their unique tumor profiles.
License: CC0 (Public Domain).

ClinVar

Status: Direct Integration into Data Model

Public archive of reports of human variations classified for diseases and drug responses.
Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far:
- dbSNP variants
- The file is very large; an existing script will be used to map all of its biomarkers into the data model.
License: Creative Commons Attribution-NonCommercial 4.0 International License.
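Because the ClinVar file is large, a streaming pass keeps memory use flat. The sketch below is not the existing BiomarkerKB script. It assumes the tab-separated `variant_summary.txt.gz` download and its column names (`GeneSymbol`, `Name`, `ClinicalSignificance`, `RS# (dbSNP)`, `PhenotypeList`, `Assembly`), with `-1` marking a missing dbSNP ID. It approximates "cancer" phenotypes with a naive, hypothetical keyword list.

```python
import csv
import gzip

# Hypothetical, naive keyword filter; real curation may rely on ontology terms.
CANCER_TERMS = ("cancer", "carcinoma", "neoplasm", "tumor", "leukemia", "lymphoma")


def open_variant_summary(path):
    """Open the gzipped TSV in text mode so it is read line by line, not loaded whole."""
    return gzip.open(path, mode="rt", encoding="utf-8", newline="")


def iter_cancer_variants(handle, terms=CANCER_TERMS, assembly="GRCh38"):
    """Yield simplified records for variants whose phenotypes match any term."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumed: each variant appears once per genome assembly; keep one.
        if assembly and row.get("Assembly", assembly) != assembly:
            continue
        phenotypes = row.get("PhenotypeList", "")
        if not any(term in phenotypes.lower() for term in terms):
            continue
        rs = row.get("RS# (dbSNP)", "-1").strip()
        yield {
            "gene": row.get("GeneSymbol", ""),
            "variant": row.get("Name", ""),
            "significance": row.get("ClinicalSignificance", ""),
            "dbsnp": f"rs{rs}" if rs not in ("", "-1") else None,
            "phenotypes": phenotypes.split("|"),  # assumed "|" separator
        }
```

Usage: `with open_variant_summary(path) as handle: records = list(iter_cancer_variants(handle))`. The generator can also feed a writer directly, so only one row is held in memory at a time.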

MarkerDB

Status: Direct Integration into Data Model

Provides extensive biomarker data and cross-references to other resources.
Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.
All of the annotations above can be cross-referenced.
Cross-referencing them lets BiomarkerKB users find more information about specific biomarkers and moves BiomarkerKB toward its goal of being a comprehensive biomarker resource.
License: Creative Commons Attribution-NonCommercial 4.0 International License.

Metabolomics Workbench

Status: Direct Integration into Data Model

Data provided by Metabolomics Workbench

Metabolite biomarkers utilized in the uniform newborn screening program.
These biomarkers detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.

OncoMX

Status: Direct Integration into Data Model

Integrated cancer mutation and expression resource for exploring cancer biomarkers.
Manual curation effort by GWU and JPL.
Over 600 single and panel biomarkers.
License: Creative Commons Attribution-NonCommercial 4.0 International License.

OpenTargets

Status: Direct Integration into Data Model

Collects potential drug targets and therapeutic targets.
Some effort was required to find the correct biomarker data.
1,200 biomarkers were collected:
- dbSNP variants related to cancer and other diseases
License: Creative Commons Attribution-NonCommercial 4.0 International License.

PubMed Central Biomarker Gene Set Curation

Status: Direct Integration into Data Model

Data provided by Avi Ma'ayan's LINCS group

This data set was created by manually curating biomarker gene sets in PubMed Central, starting from the gene sets returned by Rummagene.
Using the search results from the Rummagene web server, we manually identified publications that associated conditions and environmental exposures with biomarker gene sets.
The biomarker gene sets were retrieved by validating the genes mentioned in each publication.
The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.

SenNet Biomarker Data

Status: Direct Integration into Data Model

Cellular senescence biomarkers from the SenNet group.
Biomarker data was collected and incorporated; however, the biomarker field was incomplete, so the integrated data was given a score of -2.
The data is still valuable as contextual data and can be revisited to complete the biomarker field in the future.
