BiomarkerKB Resource Integration

From BiomarkerKB Wiki
Revision as of 19:04, 15 September 2025 by MariaKim

BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross-references.
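The integration levels can be pictured as a small registry. The following is a minimal sketch and not BiomarkerKB's actual code: the `Status` enum, the `RESOURCES` mapping and both helper functions are hypothetical names introduced for illustration, and the status values are simply those listed for each resource on this page.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """Integration levels used on this page (the enum itself is hypothetical)."""
    DIRECT = "Direct Integration into Data Model"
    SAMPLE = "Sample Integration into Data Model"
    CROSS_REFERENCE = "Cross-Reference"


# Resource name -> status, copied from the per-resource sections of this page.
RESOURCES = {
    "CIViC": Status.DIRECT,
    "ClinVar": Status.DIRECT,
    "EDRN": Status.SAMPLE,
    "GWAS": Status.DIRECT,
    "HPO": Status.CROSS_REFERENCE,
    "LOINC": Status.CROSS_REFERENCE,
    "MarkerDB": Status.DIRECT,
    "Metabolomics Workbench": Status.DIRECT,
    "OncoKB": Status.CROSS_REFERENCE,
    "OncoMX": Status.DIRECT,
    "OpenTargets": Status.DIRECT,
    "PubMed Central Biomarker Gene Set Curation": Status.DIRECT,
    "UniProtKB": Status.DIRECT,
}


def resources_with(status):
    """Return the sorted names of resources that have the given status."""
    return sorted(name for name, s in RESOURCES.items() if s is status)


def status_counts():
    """Count how many resources fall under each status."""
    return Counter(RESOURCES.values())
```

For example, `resources_with(Status.CROSS_REFERENCE)` lists the resources that are only linked to rather than loaded into the data model.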

Other resources to be explored: MetaKB, CADSR Cancer, https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers


Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.

CIViC

Status: Direct Integration into Data Model

  • Clinical Interpretation of Variants in Cancer (CIViC).
  • Provides cancer biomarkers in the form of DNA mutations (dbSNPs).
  • The platform provides clinicians with treatment options for patients based on each patient's unique tumor profile.
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.

ClinVar

Status: Direct Integration into Data Model

  • Public archive of reports of human variations classified for diseases and drug responses
  • Provides biomarkers for all diseases, but only cancer biomarkers have been curated so far
    • dbSNPs
    • The source file is very large; the existing script will be used later to map all biomarkers from it into the data model
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.

EDRN

Status: Sample Integration into Data Model

  • Cancer biomarkers from the Early Detection Research Network (EDRN)

GWAS

Status: Direct Integration into Data Model

  • Catalog of published genome-wide association studies (GWAS)
  • Provides biomarkers in the form of SNPs
  • GWAS Catalog contains SNPs for a large number of diseases
    • Preliminary curation focused only on cancer
    • The existing script will be used to map all biomarkers into the data model
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.
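The cancer-only preliminary curation described above can be sketched as a simple filter over a tab-separated associations export. This is an illustration under stated assumptions, not the existing BiomarkerKB script: the column names `DISEASE/TRAIT` and `SNPS` are assumed to match the export being used, the keyword list is hypothetical (the real curation criteria are not given on this page), and the rsIDs in `SAMPLE` are placeholders.

```python
import csv
import io

# Hypothetical keyword list; the actual curation criteria are not stated here.
CANCER_KEYWORDS = ("cancer", "carcinoma", "leukemia", "lymphoma", "melanoma")


def cancer_snps(tsv_text, trait_col="DISEASE/TRAIT", snp_col="SNPS"):
    """Yield (rsID, trait) pairs for rows whose trait looks cancer-related."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        trait = row[trait_col]
        if not any(k in trait.lower() for k in CANCER_KEYWORDS):
            continue
        # A row may list several SNPs; split on the common separators.
        for snp in row[snp_col].replace(";", ",").split(","):
            snp = snp.strip()
            if snp.startswith("rs"):
                yield snp, trait


# Placeholder rows for illustration only; these are not real associations.
SAMPLE = (
    "DISEASE/TRAIT\tSNPS\n"
    "Breast cancer\trs0000001\n"
    "Height\trs0000002\n"
    "Prostate cancer\trs0000003; rs0000004\n"
)

for snp, trait in cancer_snps(SAMPLE):
    print(snp, trait)
```

Keyword matching on trait names is a crude stand-in for curation; widening the filter to all diseases would amount to dropping the keyword check.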

HPO

Status: Cross-Reference

  • HPO provides disease and entity associations
  • It does not provide a change within the entity
  • Biomarker data therefore cannot be collected from it
  • However, it can be used as a cross-reference within our cross-referencing section
  • Provides cross-references to OMIM, SNOMED, and MONDO

LOINC

Status: Cross-Reference

Data provided by Metabolomics Workbench

MarkerDB

Status: Direct Integration into Data Model

  • Provides a large amount of useful biomarker data and cross-references other resources as well
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.
  • Information includes panel information, abnormal levels of biomarkers by disease, structural information, etc.
  • The annotations listed above can be cross-referenced
  • Cross-referencing lets BiomarkerKB users find more information on specific biomarkers and moves BiomarkerKB toward its goal of being a comprehensive resource for biomarkers

Metabolomics Workbench

Status: Direct Integration into Data Model

Data provided by Metabolomics Workbench

  • Metabolite biomarkers used in the uniform newborn screening program.
  • The program detects treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.

OncoKB

Status: Cross-Reference

  • Provides useful information on drugs and therapy options for different biomarker entities
  • Also provides information on the condition each entity is related to
  • License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.
  • A paid license is required
  • The best solution is to cross-reference biomarkers in BiomarkerKB to the appropriate drug and therapy information

OncoMX

Status: Direct Integration into Data Model

  • Integrated cancer mutation and expression resource for exploring cancer biomarkers
  • Manual curation effort by GWU and JPL
  • Over 600 single and panel biomarkers
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.

OpenTargets

Status: Direct Integration into Data Model

  • Collects potential drug targets and therapeutic targets
  • Some effort was required to find the correct biomarker data
  • 1200 biomarkers collected
    • dbSNPs related to cancer and other diseases
  • License: Creative Commons Attribution-NonCommercial 4.0 International License.

PubMed Central Biomarker Gene Set Curation

Status: Direct Integration into Data Model

Data provided by Avi Ma'ayan's LINCS group

  • This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.
  • Using the search results returned by the Rummagene web server, we manually identified publications that associate different conditions and environmental exposures with biomarker gene sets.
  • The biomarker gene sets were retrieved by validating the genes mentioned in each publication.
  • The primary use case for this data is to identify biomarker panels/gene sets associated with conditions.

UniProtKB

Status: Direct Integration into Data Model

  • Can provide biomarker (change in entity), entity, condition, and sampling data
  • This data is in a text file that must be reviewed in full to make sure it can be extracted automatically
  • Contextual information can be imputed if necessary
  • License: Creative Commons Attribution 4.0 International (CC BY 4.0)
  • UniProt contains both found_in entries and entries that are actual biomarkers
    • found_in entries will get a cross-reference
    • Actual biomarkers will be directly integrated
  • Manual curation of 56 reviewed entries with mention of "biomarker" in flat text file
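Finding the candidate entries for such a manual review can be sketched as a keyword scan over the flat text file. This is a minimal illustration, not the procedure actually used: it assumes the standard UniProtKB flat-file layout (two-letter line codes, an `AC` line holding accessions, and `//` terminating each entry), the function names are hypothetical, and the accessions and text in `SAMPLE` are placeholders. It only flags mentions; deciding whether a hit is a found_in entry or an actual biomarker is left to the curator.

```python
import re


def entries(flat_text):
    """Split UniProtKB flat-file text into entries (each ends with '//')."""
    entry = []
    for line in flat_text.splitlines():
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)


def primary_accession(entry):
    """Return the first accession on the first AC line, or None."""
    for line in entry:
        if line.startswith("AC   "):
            return line[5:].split(";")[0].strip()
    return None


def biomarker_mentions(flat_text):
    """Return accessions of entries whose text mentions 'biomarker'."""
    pattern = re.compile(r"biomarker", re.IGNORECASE)
    return [
        primary_accession(e)
        for e in entries(flat_text)
        if any(pattern.search(line) for line in e)
    ]


# Placeholder entries for illustration only; not real UniProtKB records.
SAMPLE = """\
ID   EXAMPLE1_HUMAN          Reviewed;         100 AA.
AC   X00001;
CC   -!- MISCELLANEOUS: Proposed as a biomarker (placeholder text).
//
ID   EXAMPLE2_HUMAN          Reviewed;         120 AA.
AC   X00002; X00003;
CC   -!- FUNCTION: Placeholder text without the keyword.
//
"""

print(biomarker_mentions(SAMPLE))
```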