This repository guides you through the EHR Ops Dashboard, a Tableau dashboard provided by the All of Us analytics team to reflect data quality across HPOs.
⚠ Do NOT commit any PHI to this repo.
- Definition of metrics and related queries
- How to read the dashboard
- What to expect in upcoming features
1. Data Transfer Rate A
- Definition: The number of participants with EHR data per HPO divided by the number of all participants eligible for EHR data transfer
- Metrics: The numerator is generated by `eligible_participants_ehr.sql`; the denominator is generated by `all_eligible_participants.sql`.
2. Data Transfer Rate B
- Definition: The number of participants with EHR data per HPO divided by the number of all participants eligible for EHR data transfer who have completed an in-person visit (physical measurements OR biospecimen)
- Metrics: The numerator is generated by `eligible_participants_ehr.sql`; the denominator is generated by `in_person_participants.sql`.
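As a hypothetical illustration (the real metrics are computed in SQL by the files named above; the function and sample counts below are invented for the example), the two rates combine the three participant counts like this:

```python
def transfer_rates(ehr_count, eligible_count, in_person_count):
    """Return (rate_a, rate_b) for one HPO.

    rate_a = participants with EHR data / all EHR-eligible participants
    rate_b = participants with EHR data / eligible participants who have
             completed an in-person visit (physical measurements OR biospecimen)
    """
    rate_a = ehr_count / eligible_count if eligible_count else 0.0
    rate_b = ehr_count / in_person_count if in_person_count else 0.0
    return rate_a, rate_b

# Example: 80 participants with EHR data, 200 eligible, 100 with an in-person visit
print(transfer_rates(80, 200, 100))  # (0.4, 0.8)
```

Because the denominator for Rate B is a subset of the denominator for Rate A, Rate B is always greater than or equal to Rate A for the same HPO.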
3. General Conformance
- GC-1
  - Every row in the condition_occurrence, visit_occurrence, procedure_occurrence, drug_exposure, measurement, and observation tables should have a well-defined concept_id.
  - GC-1 Standard
    - Every row should have a standard concept_id with the correct domain that is not null or 0.
    - The metric is generated by `gc_1_standard.sql`.
  - GC-1 Source
    - Every row should have a source concept_id with the correct domain that is not null or 0.
    - The metric is generated by `gc_1_source.sql`.
- GC-2
  - Every foreign key should be valid. For example, the foreign keys checked in the condition_occurrence table include condition_concept_id, visit_occurrence_id, person_id, condition_type_concept_id, condition_source_concept_id, and condition_status_concept_id.
- GC-3
  - GC-3 Standard
    - Every row should have a concept_id that is not null or 0.
    - The metric is generated by `gc_1_standard.sql`.
  - GC-3 Source
    - Every row should have a source concept_id that is not null or 0.
    - The metric is generated by `gc_1_source.sql`.
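A rough Python sketch of the GC-1 Standard rule applied to a single row. The concept lookup, field names, and example concept ID are assumptions for illustration; the real metric runs as `gc_1_standard.sql` against the OMOP vocabulary tables.

```python
def gc1_standard_ok(concept_id, concept_lookup, expected_domain):
    """A row passes if its concept_id is non-null/non-zero, maps to a
    standard concept, and that concept has the expected domain."""
    if not concept_id:          # rejects both None and 0
        return False
    concept = concept_lookup.get(concept_id)
    return (concept is not None
            and concept["standard_concept"] == "S"
            and concept["domain_id"] == expected_domain)

# Illustrative vocabulary entry (values assumed for the example)
lookup = {201826: {"standard_concept": "S", "domain_id": "Condition"}}
print(gc1_standard_ok(201826, lookup, "Condition"))  # True
print(gc1_standard_ok(0, lookup, "Condition"))       # False
```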
4. Date Conformance
- DC-1
  - There should not be any end dates before start dates in any of the clinical data tables.
  - The metric is generated by `dc_1.sql`.
- DC-2
  - No data points should exist beyond 30 days after the death date, if applicable.
  - The metric is generated by `dc_2.sql`.
- DC-3
  - No dates should be prior to 1900 (for observation) or 1980 (for other clinical data tables).
  - The metric is generated by `dc_3.sql`.
- DC-4
  - Date and datetime fields should match.
  - The metric is generated by `dc_4.sql`.
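The four date-conformance rules above can be sketched as row-level checks. The field names and helper function are illustrative only; the actual metrics are computed by `dc_1.sql` through `dc_4.sql`.

```python
from datetime import date, timedelta

def dc_violations(start_date, end_date=None, death_date=None,
                  start_datetime=None, table="condition_occurrence"):
    """Return the list of date-conformance rules a single row violates."""
    violations = []
    # DC-1: end dates must not precede start dates
    if end_date is not None and end_date < start_date:
        violations.append("DC-1")
    # DC-2: no data points beyond 30 days after the death date
    if death_date is not None and start_date > death_date + timedelta(days=30):
        violations.append("DC-2")
    # DC-3: no dates before 1900 (observation) or 1980 (other tables)
    cutoff_year = 1900 if table == "observation" else 1980
    if start_date.year < cutoff_year:
        violations.append("DC-3")
    # DC-4: date and datetime fields must agree
    if start_datetime is not None and start_datetime.date() != start_date:
        violations.append("DC-4")
    return violations

print(dc_violations(date(2020, 1, 10), end_date=date(2020, 1, 1)))  # ['DC-1']
print(dc_violations(date(1975, 1, 1)))                              # ['DC-3']
print(dc_violations(date(1975, 1, 1), table="observation"))         # []
```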
5. General DQ Metrics
- Unit Concept Failure
  - In the measurement table, rows with a numerical value_source_value or value_as_number should have a standard concept_id with the domain "Unit".
  - The metric is generated by `unit_route_failure.sql`.
- Route Concept Failure
  - In the drug_exposure table, concept_ids should be standard and have the domain "Route".
  - The metric is generated by `unit_route_failure.sql`.
- Visit ID Failure
  - In the condition and procedure tables, rows should not have a null or 0 visit_occurrence_id, and the visit_occurrence_id should also exist in the visit_occurrence table.
  - The metric is generated by `visit_id_failure.sql`.
- Duplicates
  - In each table, there should not be duplicated rows.
  - The metric is generated by `duplicates.sql`.
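The duplicate-row rule can be illustrated in a few lines, modeling each row as a tuple of column values (the real metric is computed by `duplicates.sql`):

```python
from collections import Counter

def duplicate_rows(rows):
    """Return the distinct rows that appear more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

rows = [(1, "A"), (2, "B"), (1, "A")]
print(duplicate_rows(rows))  # [(1, 'A')]
```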
6. COVID Mapping
- In the measurement table, rows related to COVID-19 results (i.e., rows whose concept_id is a descendant of 756055) should have a value_as_concept_id that is neither null nor 0, and the value_source_value field should be mapped to the standard concepts we provide on our website:
- https://sites.google.com/view/ehrupload/data-quality-metrics/covid-19-data-mapping?authuser=0
- The metric is generated by `covid_mapping.sql`.
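A row-level sketch of this rule. The descendant set and the concept IDs below are assumptions made for the example; the real metric resolves the descendants of concept 756055 through the vocabulary tables.

```python
def covid_mapping_ok(measurement_concept_id, value_as_concept_id, covid_concepts):
    """The rule only applies to COVID-19 result rows; for those rows,
    value_as_concept_id must be neither null (None) nor 0."""
    if measurement_concept_id not in covid_concepts:
        return True  # not a COVID-19 result row, nothing to check
    return bool(value_as_concept_id)

# covid_concepts stands in for 756055 and its descendants (IDs assumed)
covid_concepts = {756055, 706163}
print(covid_mapping_ok(706163, 45877985, covid_concepts))  # True
print(covid_mapping_ok(706163, 0, covid_concepts))         # False
```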
7. Physical Measurements
- All sites should be submitting data on participants' physical measurements. Specifically, we check against body height, body weight, and BMI.
- The metric is generated by `physical_meas.sql`.
- Filter Section
  - You can filter at the awardee or organization level to focus on the sites you are interested in.
  - Click the filter and then apply the selections.
- Aggregate Data Transfer Rate Section
  - This section shows the aggregate number of awardees/organizations whose data transfer rates fall within different thresholds.
  - The calculations of Data Transfer Rates A and B are in the tooltips when you hover over the aggregate numbers.
  - If you click on a number, the dashboard will be filtered to show only the sites in that category.
- Data Transfer Rate Section
  - This section is a detailed dashboard with each organization's Data Transfer Rate metrics and most recent submission time.
  - You can hover over the green bar to see the breakdown of participants by status (eligible participants, eligible participants with EHR data, and eligible participants with an in-person visit).
  - You can also hover over the blue dot to see details of the metrics.
  - The calculations and definitions are in the tooltips.
- Data Conformance Section
  - This section shows the general conformance and date conformance metrics.
  - If you hover over a cell, the success rate broken down by table will appear, so you can check your tables in further detail.
- Definition Section
  - This section gives the definition of each term and metric.
Here are the metrics still under design and implementation:
- Trending chart of metrics
- Other measurements and drug integration
Use `compile.py` to render, or compile, query templates into runnable SQL.
- Python 3.7+
Note: The examples below assume a modern shell environment.
Assuming this is the contents of `query_1.sql`:

```sql
SELECT
 field_1
,field_2
FROM `{{project_id}}.{{dataset_id}}.table_1`
```

executing

```shell
python compile.py query_1.sql --project_id 'drc-prod' --dataset_id 'ehr-ops'
```

stores the compiled file at `.compiled/query_1.sql`:

```sql
SELECT
 field_1
,field_2
FROM `drc-prod.ehr-ops.table_1`
```
Assuming these template files exist:

- `query_1.sql`
- `metrics/query_2.sql`

executing the command

```shell
python compile.py **/*.sql --project_id 'drc-prod' --dataset_id 'ehr-ops'
```

compiles them and saves the results at:

- `.compiled/query_1.sql`
- `.compiled/metrics/query_2.sql`
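A minimal sketch of what the template rendering might look like, assuming simple `{{ name }}` placeholder substitution. The actual implementation in `compile.py` may differ (e.g., it may use a full templating engine).

```python
import re

def render(template, **params):
    """Replace each {{ name }} placeholder with the matching parameter value."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: params[m.group(1)], template)

template = "SELECT field_1 FROM `{{project_id}}.{{dataset_id}}.table_1`"
print(render(template, project_id="drc-prod", dataset_id="ehr-ops"))
# SELECT field_1 FROM `drc-prod.ehr-ops.table_1`
```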
The XML-formatted workbook is stored in the file `EHR Ops Dashboard.twb`. By default, the XML may contain sensitive session-specific information (usernames, project names, etc.) when saved via Tableau Desktop. This information must be stripped out before committing to the repo. Follow these instructions to set up a pre-commit workflow that accomplishes this.
- Install project packages: `pip install -r requirements.txt`
- Create a file named `.cipher` and add key-value pairs for variables that need to be stripped out.
  - Follow the example in `.cipher.example`.
  - Currently these 3 variables are required to be set:
    - pdr_project
    - curation_dataset
    - rdr_dataset
- Initialize a pre-commit hook that flags the workbook for sensitive information: `sh ./init_pre_commit.sh`
  - Utilizes the pre-commit library.
  - Follows the configuration in `.pre-commit-config.yaml`.
  - Executes the `strip_workbook.py` script before each commit.
*Starting from the point after the workbook has been modified in Tableau Desktop:
- (optional) Execute the `check_workbook.py` script to check whether the workbook contains sensitive information: `python check_workbook.py`
- (optional) If the output message indicates sensitive information, execute the `strip_workbook.py` script to clean it: `python strip_workbook.py`
- Stage changes in the working directory.
- Commit the changes. An automated pre-commit process will run and do the following:
  - Runs the `strip_workbook.py` script before committing to remove sensitive information.
  - Runs the `restore_workbook.py` script after committing to restore variable values in the workbook.
    - Note: a minor side effect of this process is that the restored workbook is automatically re-staged. To unstage, manually execute `git reset HEAD "EHR Ops Dashboard.twb"`.
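The strip/restore round trip can be sketched as plain string substitution, assuming `.cipher` maps variable names to the sensitive values they replace. The actual `strip_workbook.py` / `restore_workbook.py` scripts may work differently; all names and values below are invented for the example.

```python
# .cipher modeled as a dict of variable name -> sensitive value (assumed)
cipher = {"pdr_project": "my-real-project", "rdr_dataset": "my_real_dataset"}

def strip_workbook(xml_text, cipher):
    """Replace each sensitive value with a {{variable}} placeholder."""
    for var, value in cipher.items():
        xml_text = xml_text.replace(value, "{{%s}}" % var)
    return xml_text

def restore_workbook(xml_text, cipher):
    """Put the sensitive values back in place of the placeholders."""
    for var, value in cipher.items():
        xml_text = xml_text.replace("{{%s}}" % var, value)
    return xml_text

wb = "<datasource project='my-real-project' dataset='my_real_dataset'/>"
stripped = strip_workbook(wb, cipher)
print("my-real-project" in stripped)              # False
print(restore_workbook(stripped, cipher) == wb)   # True
```

The round trip is lossless as long as no sensitive value is a substring of another, which is why the stripped workbook can be safely committed and then restored for local development.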
- Take an already stripped workbook.
- Execute the `restore_workbook.py` script to reset the variables stored in `.cipher`.
- Open the workbook in Tableau Desktop and navigate the interface to reset the data source connection parameters.
- Proceed with development as normal.
If a pull request indicates a merge conflict:
- In local repo, merge the destination branch into the source branch.
- If the conflicts are within the `EHR Ops Dashboard.twb` file, accept all current changes via a conflict resolution tool (essentially overwriting the workbook).
- Commit and push to conclude the conflict resolution.
Please reference this guide for development setup.