Data Review Process and QC
As data is submitted and prepared, it goes through an internal review process by the BiomarkerKB team. Data is submitted by collaborators and data contributing centers, and is also collected internally by the BiomarkerKB team through manual or automated curation. The steps below describe how the data is reviewed and QCed.
- If data is submitted by a collaborator or DCC, the BiomarkerKB team performs spot checks on it. This is a quick way to confirm that the submitted data is correct and follows the biomarker data model format.
- Data from OncoMX was collected before the BiomarkerKB project started. The data was reformatted into the biomarker data model and then went through two more rounds of QC to make sure it was fully correct.
- The data is then run through the TSVtoJSON.py converter. This allows the biomarker data to be checked against different APIs to make sure the correct condition and entity IDs are being used. If an ID is not recognized, a flag is created so the data can be corrected if needed.
- Once the flagged issues above are corrected, the biomarker data is run through the data_qc script. This script catches irregularities in the data such as casing mistakes, typos, formatting errors, and mismatched biomarker IDs.
- A final spot check is done on the data to ensure that all corrections have been made and that the data has no other scientific issues.
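
The kinds of checks described in the QC step can be illustrated with a minimal sketch. This is not the actual data_qc script. The column names (`biomarker_id`, `biomarker`, `role`), the allowed role values, and the sample rows are all hypothetical and chosen only for illustration. The reading of "mismatched biomarker IDs" as one biomarker description appearing under more than one ID is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical controlled vocabulary; the real data model's values may differ.
ALLOWED_ROLES = {"diagnostic", "prognostic", "predictive"}


def qc_rows(rows):
    """Return a list of (row_number, column, message) flags for manual review."""
    flags = []
    ids_by_biomarker = defaultdict(set)
    for n, row in enumerate(rows, start=1):
        # Formatting check: stray whitespace around any value.
        for col, value in row.items():
            if value != value.strip():
                flags.append((n, col, "leading or trailing whitespace"))
        # Casing check: value is valid only after lowercasing.
        role = row["role"].strip()
        if role not in ALLOWED_ROLES:
            if role.lower() in ALLOWED_ROLES:
                flags.append((n, "role", "casing mistake"))
            else:
                flags.append((n, "role", "unrecognized value"))
        # Collect IDs per normalized biomarker description.
        key = row["biomarker"].strip().lower()
        ids_by_biomarker[key].add(row["biomarker_id"].strip())
    # ID mismatch (assumed meaning): one description listed under several IDs.
    for biomarker, ids in ids_by_biomarker.items():
        if len(ids) > 1:
            message = f"{biomarker!r} has multiple IDs: {sorted(ids)}"
            flags.append((None, "biomarker_id", message))
    return flags


# Made-up sample data with one casing error, one whitespace error,
# and one biomarker description that appears under two IDs.
SAMPLE_TSV = (
    "biomarker_id\tbiomarker\trole\n"
    "X1\tincreased IL6 level\tdiagnostic\n"
    "X2\tincreased IL6 level\tPrognostic\n"
    "X3\t mutation in BRCA1\tpredictive\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
flags = qc_rows(rows)
for flag in flags:
    print(flag)
```

The sketch only reports flags and never modifies the data, so corrections can still be made by a curator before the final spot check.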