An EDC (Electronic Data Capture) system and data standard agnostic solution that enables the pharmaceutical programming community to develop CDISC (Clinical Data Interchange Standards Consortium) SDTM (Study Data Tabulation Model) datasets in R. The reusable algorithms concept in ‘sdtm.oak’ provides a framework for modular programming. We plan to develop a code generation feature based on a standardized SDTM specification format, which has the potential to automate the creation of SDTM datasets.
The package is available from CRAN and can be installed with:
install.packages("sdtm.oak")
You can install the development version of {sdtm.oak} from GitHub with:
# install.packages("remotes")
remotes::install_github("pharmaverse/sdtm.oak")
- Raw Data Structure: Data from different EDC systems come in varying structures, with different variable names, dataset names, etc.
- Varying Data Collection Standards: Despite the availability of CDASH (Clinical Data Acquisition Standards Harmonization), pharmaceutical companies still create different eCRFs using CDASH standards.
Due to the differences in raw data structures and data collection standards, it may seem impossible to develop a common approach for programming SDTM datasets.
‘sdtm.oak’ aims to address this issue by providing an EDC-agnostic, standards-agnostic solution. It is an open-source R package that offers a framework for the modular programming of SDTM in R. In future releases, we plan to develop a code generation feature based on a standardized SDTM specification format, which has the potential to automate the creation of SDTM datasets.
Our goal is to use ‘sdtm.oak’ to program most of the domains specified
in SDTMIG (Study Data Tabulation Model Implementation Guide: Human
Clinical Trials) and SDTMIG-AP (Study Data Tabulation Model
Implementation Guide: Associated Persons). This R package is based on
the core concept of algorithms, implemented as functions capable of
carrying out the SDTM mappings for any domain listed in the CDISC
SDTMIG and across different versions of the SDTMIG. These functions
take a raw dataset and one or more variable names as parameters, which
is what makes the package EDC (Electronic Data Capture) agnostic. As
long as the raw dataset and the named variables exist, ‘sdtm.oak’ will
execute the SDTM mapping using the selected function. It’s important to
note that ‘sdtm.oak’ may not handle sponsor-specific details related to
managing metadata for LAB tests, unit conversions, and coding
information, as many companies have unique business processes.
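To make the idea of a dataset-and-variable-driven mapping concrete, here is a minimal base R sketch. It does not use the ‘sdtm.oak’ API: the helper `map_raw_variable()`, the raw datasets `edc_a` and `edc_b`, and their column names are all hypothetical, invented only to show why passing the raw dataset and variable name as parameters keeps a mapping independent of any particular EDC system.

```r
# Hypothetical helper, NOT part of the sdtm.oak API: copies one raw variable
# into a target SDTM variable, whatever the raw dataset or variable is called.
map_raw_variable <- function(raw_dat, raw_var, tgt_var) {
  # Fail early if the input is not a data frame or the raw variable is missing.
  stopifnot(is.data.frame(raw_dat), raw_var %in% names(raw_dat))
  out <- data.frame(raw_dat[[raw_var]], stringsAsFactors = FALSE)
  names(out) <- tgt_var
  out
}

# Two made-up raw extracts from different EDC systems, with different names.
edc_a <- data.frame(AE_TERM_TXT = c("Headache", "Nausea"),
                    stringsAsFactors = FALSE)
edc_b <- data.frame(aeverbatim = c("Fatigue"),
                    stringsAsFactors = FALSE)

# The same function maps both extracts to the SDTM variable AETERM.
ae_a <- map_raw_variable(edc_a, raw_var = "AE_TERM_TXT", tgt_var = "AETERM")
ae_b <- map_raw_variable(edc_b, raw_var = "aeverbatim", tgt_var = "AETERM")
```

Only the arguments change between the two calls, not the mapping logic. The package's own algorithm functions are documented in the Algorithms article and should be preferred for real work.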
With the V0.2.0 release of ‘sdtm.oak’, users can now efficiently create the DM domain and various SDTM domains, encompassing the Findings, Events, Findings About, and Interventions classes. However, the V0.2.0 release does NOT cover Trial Design Domains, SV (Subject Visits), SE (Subject Elements), RELREC (Related Records), Associated Person domains, or the EPOCH variable across all domains.
Subsequent Releases: We plan to develop the following features in subsequent releases.
- Metadata-driven code generation based on the standardized SDTM specification.
- Functions required to program the domains SV (Subject Visits) and SE (Subject Elements), and the EPOCH variable.
- Functions to derive standard units and results based on metadata.
- Additional features to be developed based on user feedback.
- Please go to the Algorithms article to learn about algorithms.
- Please go to Create Events Domain to learn about the step-by-step process to create an Events domain.
- Please go to Create Findings Domain to learn about the step-by-step process to create a Findings domain.
- Please go to Path to Automation to learn how the foundational release sets the stage for automation.
- Please watch this YouTube video to learn about using the package.
- RinPharma Virtual workshop slides
We ask users to follow the approach described above and try ‘sdtm.oak’ to map any SDTM domains supported in this release. Users can also use the test data in the package to become familiar with the concepts before applying them to their own data. Please get in touch with us using one of the recommended approaches listed below:
We thank the contributors and authors of the package. We also thank CDISC COSA for sponsoring ‘sdtm.oak’. Additionally, we would like to sincerely thank the volunteers from Roche, Pfizer, GSK, Vertex, and Merck for their valuable input as integral members of the CDISC COSA - OAK leadership team.