BiomarkerKB Resource Integration: Difference between revisions
Latest revision as of 20:26, 23 October 2025
BiomarkerKB collects data from many different resources. Collected data is not always integrated directly into the data model; data from some resources is instead added as contextual annotations or cross-references.
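Other resources to be explored: CADSR Cancer (https://cadsr.cancer.gov/onedata/Home.jsp), https://themarker.idrblab.cn/, biomarker.org, ResMarkerDB, SalivaDB, https://glycanage.com/publications, https://www.cancergenomeinterpreter.org/biomarkers, Glycan Biomarkers (https://github.com/issues/assigned?issue=clinical-biomarkers%7Cbiomarker-issue-repo%7C248; code: https://github.com/glygener/CarboCurator)
Please contact us at mazumder_lab@gwu.edu and daniallmasood@gwu.edu if you know of any other resources that may contain biomarker data.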
CIViC
Status: Direct Integration into Data Model
- Clinical Interpretation of Variants in Cancer (CIViC).
- Provides cancer biomarkers in the form of DNA mutations (dbSNPs).
- The platform provides clinicians with treatment options based on a patient's unique tumor profile.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
ClinVar
Status: Direct Integration into Data Model
- Public archive of reports of human variations classified for diseases and drug responses.
- Provides biomarkers for all diseases, but we have only curated cancer biomarkers for now.
- Biomarkers are provided as dbSNPs.
- The source file is very large; we will return to it and use the existing script to map all of its biomarkers into the data model.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
EDRN
Status: Sample Integration into Data Model
- Cancer biomarkers.
GWAS
Status: Direct Integration into Data Model
- Published genome-wide association studies (GWAS).
- Provides biomarkers in the form of SNPs.
- The GWAS Catalog contains SNPs for a vast number of diseases.
- Preliminary curation focused only on cancer.
- We will use the existing script to map all biomarkers into the data model.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
HPO
Status: Cross-Reference
- HPO provides disease and entity associations.
- Does not provide a change within the entity so we cannot collect biomarker data from here.
- However, we can use it as a cross-reference in our cross-referencing section.
- Provides cross-reference to OMIM, SNOMED, and MONDO.
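The rule that HPO illustrates (an entity-condition association with no change in the entity cannot become a biomarker record, but can still serve as a cross-reference) can be sketched as below. This is only an illustration under stated assumptions: the `Association` record, its field names, and the `classify` function are hypothetical and are not taken from the BiomarkerKB data model or code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Labels mirror the status lines used on this page.
    DIRECT_INTEGRATION = "Direct Integration into Data Model"
    CROSS_REFERENCE = "Cross-Reference"


@dataclass
class Association:
    """Hypothetical record for one entity-condition association from a resource."""
    entity: str
    condition: str
    change: Optional[str] = None  # e.g. "increased level"; None if the source gives no change


def classify(assoc: Association) -> Status:
    """A record with a stated change can be integrated; otherwise it is only a cross-reference."""
    if assoc.change and assoc.change.strip():
        return Status.DIRECT_INTEGRATION
    return Status.CROSS_REFERENCE


# Placeholder values, not real HPO data.
with_change = Association("example entity", "example condition", "increased level")
without_change = Association("example entity", "example condition")
print(classify(with_change).value)
print(classify(without_change).value)
```

In practice the decision is made per resource after review, so a function like this would at most flag candidates for a curator.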
LOINC
Status: Cross-Reference
Data provided by Metabolomics Workbench
MarkerDB
Status: Direct Integration into Data Model
- Provides a lot of useful biomarker data and cross-references other resources as well.
- Information includes: panel information, abnormal levels of biomarkers by disease, structural information, etc.
- Annotations that can be cross-referenced include the above.
- By cross-referencing, BiomarkerKB will allow users to find more information for specific biomarkers and move towards the goal of being a comprehensive resource for biomarkers.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
Metabolomics Workbench
Status: Direct Integration into Data Model
Data provided by Metabolomics Workbench
- Metabolite biomarkers utilized in the uniform newborn screening program.
- Used to detect treatable disorders that are life-threatening or cause long-term morbidity before they become symptomatic.
OncoKB
Status: Cross-Reference
- Provides useful information on drugs and therapy options for different biomarker entities.
- Also provides information based on what condition the entity is related to.
- License: A license is required to use OncoKB for commercial and/or clinical purposes, and to access OncoKB data programmatically for academic purposes.
- A paid license is required.
- Cross-reference from biomarkers in BiomarkerKB to the appropriate drug information and therapy information is the best solution.
OncoMX
Status: Direct Integration into Data Model
- Integrated cancer mutation and expression resource for exploring cancer biomarkers.
- Manual curation effort by GWU and JPL.
- Over 600 single and panel biomarkers.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
OpenTargets
Status: Direct Integration into Data Model
- Collects potential drug targets and therapeutic targets.
- Some effort was required to find the correct biomarker data.
- 1200 biomarkers collected.
  - dbSNPs related to cancer and other diseases.
- License: Creative Commons Attribution-NonCommercial 4.0 International License.
PubMed Central Biomarker Gene Set Curation
Status: Direct Integration into Data Model
Data provided by Avi Ma'ayan's LINCS group
- This data set was created through manual curation of biomarker gene sets in PubMed Central, using the gene sets returned by Rummagene.
- Using the search results from the Rummagene web server, we manually identified publications that associate different conditions and environmental exposures with biomarker gene sets.
- The biomarker gene sets were retrieved by validating the genes mentioned in each publication.
- The primary use case for this data is to identify biomarker panels or gene sets associated with conditions.
UniProtKB
Status: Direct Integration into Data Model
- Can provide biomarker (change in entity), entity, condition, and sampling data.
- This data is in a text file that must be fully reviewed to make sure it can be extracted automatically.
- Contextual information can be imputed if necessary.
- UniProt contains both found_in entries and entries that are actual biomarkers:
  - found_in entries will get a cross-reference;
  - actual biomarkers will be directly integrated.
- Manual curation of 56 reviewed entries that mention "biomarker" in the flat text file.
- License: Creative Commons Attribution 4.0 International (CC BY 4.0).
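A first pass over the flat text file, selecting reviewed entries that mention "biomarker" for manual curation, could look like the sketch below. It is not the BiomarkerKB pipeline: it assumes the standard UniProtKB flat-file layout (each entry ends with a `//` line, and the `ID` line carries `Reviewed;` or `Unreviewed;`), and the function names and sample text are hypothetical.

```python
import re
from typing import Iterator, List


def split_entries(flat_text: str) -> Iterator[str]:
    """Yield one flat-file entry at a time; entries are terminated by a '//' line."""
    current: List[str] = []
    for line in flat_text.splitlines():
        if line.strip() == "//":
            if current:
                yield "\n".join(current)
            current = []
        else:
            current.append(line)
    if current:  # tolerate a final entry without a terminator
        yield "\n".join(current)


def find_biomarker_entries(flat_text: str) -> List[str]:
    """Return entry names of reviewed entries whose text mentions 'biomarker'."""
    hits = []
    for entry in split_entries(flat_text):
        id_match = re.search(
            r"^ID\s+(\S+)\s+(Reviewed|Unreviewed);", entry, re.MULTILINE
        )
        if not id_match or id_match.group(2) != "Reviewed":
            continue
        if re.search(r"biomarker", entry, re.IGNORECASE):
            hits.append(id_match.group(1))
    return hits


# Made-up entries for illustration only.
SAMPLE = """\
ID   EXAMPLE1_HUMAN          Reviewed;         100 AA.
CC   -!- MISCELLANEOUS: Proposed as a biomarker for an example condition.
//
ID   EXAMPLE2_HUMAN          Reviewed;         200 AA.
CC   -!- FUNCTION: No relevant mention here.
//
ID   EXAMPLE3_HUMAN          Unreviewed;       300 AA.
CC   -!- MISCELLANEOUS: Candidate biomarker.
//
"""

print(find_biomarker_entries(SAMPLE))
```

A keyword match only produces candidates. Each hit still needs manual review to decide whether it is a found_in entry (cross-reference) or an actual biomarker (direct integration).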
MetaKB
Status: Direct Integration into Data Model
- Provides harmonized associations between cancer genomic variants, diseases, and therapeutic evidence.
- Aggregates and standardizes variant interpretation data from six major knowledgebases:
  - CIViC (Clinical Interpretation of Variants in Cancer) [Already Integrated Directly]
  - OncoKB [Yet to be integrated]
  - JAX-CKB (The Jackson Laboratory Clinical Knowledgebase) [Yet to be integrated]
  - MolecularMatch [Yet to be integrated]
  - PMKB (Precision Medicine Knowledgebase) [Yet to be integrated]
  - Cancer Genome Interpreter (CGI), through its Cancer Biomarkers Database component [Integrated]
- Enables mapping of variant–disease–drug relationships with supporting evidence levels, citations, and ontology alignment (e.g., genes, variants, diseases, and drugs).
- Data integration requires review to ensure harmonized entity mappings consistent with the BiomarkerKB data model.
- Focused on somatic variant–based biomarkers; contextual attributes such as tissue type, therapy response, or evidence type can be inferred or imputed where not directly specified.
- Manual curation may be required for entries with incomplete evidence annotation or lacking standard ontology references.
- Integration approach: direct mapping of variant, condition, and evidence entities; cross-references retained to original data sources.
- License: Aggregated data are available for non-commercial, research use only, respecting constituent licenses:
  - CIViC – CC0 (Public Domain)
  - PMKB – CC-BY 4.0
  - CGI – CC0 for biomarkers database, CC-BY-NC 4.0 for tool
  - JAX-CKB – CC-BY-NC-SA 4.0
  - OncoKB – custom non-commercial license
  - MolecularMatch – restricted commercial use
  - MetaKB codebase – MIT license
- Overall usage requires adherence to non-commercial research terms; commercial use needs separate permissions from individual data providers.
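The license constraint above can be made concrete with a small lookup that reports which constituent sources would need separate permission for commercial use. This is a sketch under assumptions: the license labels are copied from the list above, while the `commercial_ok` flags are this example's reading of those labels and the names `SourceLicense` and `needs_commercial_permission` are hypothetical. The actual terms of each provider remain authoritative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SourceLicense:
    license: str
    commercial_ok: bool  # assumption inferred from the license label, not verified terms


# Labels taken from the MetaKB license list on this page.
CONSTITUENT_LICENSES: Dict[str, SourceLicense] = {
    "CIViC": SourceLicense("CC0 (Public Domain)", True),
    "PMKB": SourceLicense("CC-BY 4.0", True),
    "CGI": SourceLicense("CC0 for biomarkers database", True),
    "JAX-CKB": SourceLicense("CC-BY-NC-SA 4.0", False),
    "OncoKB": SourceLicense("custom non-commercial license", False),
    "MolecularMatch": SourceLicense("restricted commercial use", False),
}


def needs_commercial_permission(sources: Iterable[str]) -> List[str]:
    """Return the sources (in input order) that would need separate commercial permission."""
    return [s for s in sources if not CONSTITUENT_LICENSES[s].commercial_ok]


def aggregate_commercial_ok(sources: Iterable[str]) -> bool:
    """An aggregate is only as permissive as its most restrictive constituent."""
    return not needs_commercial_permission(sources)


print(needs_commercial_permission(["CIViC", "JAX-CKB", "OncoKB"]))
print(aggregate_commercial_ok(CONSTITUENT_LICENSES))
```

Treating the aggregate as only as permissive as its most restrictive source matches the statement that overall usage follows non-commercial research terms.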