At the recent CDISC Europe Interchange in Geneva, I had the honor of presenting the details of a project mapping clinical data into CDISC SDTM, with a twist. It was a small healthy volunteer study run at a single clinic where the site investigator built the data capture system in REDCap.
Research Electronic Data Capture, or REDCap, which was developed at Vanderbilt University, is a free, user-friendly web-based interface that requires no background knowledge or technical experience to use, designed specifically for use by academic and public health non-profit institutions. The REDCap Consortium consists of 7760 institutions in 160 countries, with 2.3 million projects, 3.7 million users, and 45,300 citations. It represents a huge pool of data that could potentially be tapped by pharma and biotech.
Academic and public health non-profit institutions that are aware of CDISC standards would often like to use them, but mapping difficult data requires training and experience. Even if there is budget available to support training, personnel at many of these institutions have short-term contracts of less than five years; thus, most would be leaving at around the time that they become experts. CDISC has tried to bridge the gap by creating a basic set of Clinical Data Acquisition Standards Harmonization (CDASH) electronic case report forms available on the REDCap shared Data Instrument Library; however, searching for “CDASH”, “CDISC”, etc. in the library's search engine yields nothing. Searching on the CDISC website for “REDCap” will eventually lead to the library, after following several different links. It seems additional outreach from CDISC and documentation within the instrument library would be beneficial.
A current client asked for help with the SDTM mapping for the example study. I am Director, Statistical Programming at PROMETRIKA, a full-service CRO based in Cambridge, MA, and in my 30-ish years of clinical trials programming, I had not seen data quite like this study’s. Always up for working on something different, I personally accepted the challenge.
Searching for tools or software to assist with the mapping yielded a few options. The R package REDCap2SDTM looked promising, but it requires embedding the SDTM domain name, variable name, and test code in the REDCap field annotation in the form. Unfortunately, while the case report form booklet was less than 35 pages, most of the forms were incredibly dense, so even if access to the REDCap database had been available, it wasn’t really feasible to insert all of those annotations. There are a few software systems that allow the forms to be imported and can then export the data into SDTM domains. Unfortunately, there was no budget to re-enter the data. Some of the data in REDCap also seemed to have been imported from the EHR, so there were no forms for those data; additionally, post-processing still would have been required for complicated mappings. The remaining option was to dive in and start mapping manually.
The first step was to get familiar with the study. The data was provided in Excel spreadsheets as exported from REDCap. Each spreadsheet contained one tab of the raw values and one tab of decoded values. All but one of the files were subsets that somewhat correlated to individual forms. The remaining file contained all the data for all of the subjects in one big spreadsheet with 640 columns per tab. Importing the spreadsheets into SAS presented some challenges, since many of the question names were too long to import completely and defaulted to values such as VAR347, VAR348, etc. Heights entered as feet and inches in the format of feet-inches (e.g., 5-9) were converted to a date in Excel (e.g., 9-May) during the export. Concomitant medication names, doses, indications, etc., were entered into giant fields that, for the mapping, needed to be parsed into separate rows for each medication and separate variables for each aspect.
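The two clean-up problems described above, heights mangled into dates and medication details packed into one field, can be illustrated with a minimal Python sketch. This is not the code used on the study (which was programmed in SAS). The function names are hypothetical, and the free-text layout assumed for medications (entries separated by semicolons, with name, dose, and indication separated by commas) is purely an assumption for illustration; the real fields were far less regular.

```python
import re
from typing import Dict, List, Optional, Tuple

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def repair_height(value: str) -> Optional[Tuple[int, int]]:
    """Return (feet, inches) from '5-9' or from an Excel-mangled '9-May'.

    Excel read '5-9' as month 5, day 9 and displayed it as '9-May',
    so the month number is the feet and the day is the inches.
    """
    text = value.strip()
    mangled = re.fullmatch(r"(\d{1,2})-([A-Za-z]{3})", text)
    if mangled:
        feet = MONTHS.get(mangled.group(2).lower())
        return (feet, int(mangled.group(1))) if feet else None
    intact = re.fullmatch(r"(\d)-(\d{1,2})", text)
    if intact:
        return int(intact.group(1)), int(intact.group(2))
    return None  # leave anything unrecognized for manual review


def height_cm(feet: int, inches: int) -> float:
    """Convert feet and inches to centimeters, rounded to one decimal."""
    return round((feet * 12 + inches) * 2.54, 1)


def split_conmeds(usubjid: str, blob: str) -> List[Dict[str, object]]:
    """Split one free-text field into one row per medication.

    ASSUMED layout: 'name, dose, indication; name, dose, indication'.
    Any pieces beyond the third are ignored in this sketch.
    """
    rows = []
    entries = [e for e in blob.split(";") if e.strip()]
    for seq, entry in enumerate(entries, start=1):
        parts = [p.strip() for p in entry.split(",")]
        parts += [""] * (3 - len(parts))  # pad missing pieces
        rows.append({"USUBJID": usubjid, "CMSEQ": seq, "CMTRT": parts[0],
                     "CMDOSTXT": parts[1], "CMINDC": parts[2]})
    return rows
```

For example, `repair_height("9-May")` gives back five feet nine inches, and `split_conmeds("S-001", "Ibuprofen, 200 mg, headache; Loratadine, 10 mg")` yields two rows, the second with an empty indication. In practice, values the patterns do not recognize should be listed and reviewed by hand rather than guessed at.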
The next step was annotating the forms. Since the questions were organized based on the flow of the study, rather than by data type, and the layout was very dense, at least one form contained data that mapped to seven different domains, and many forms mapped to six. Creativity and patience were required to complete the annotations.
Completing the programming specifications was equally challenging. Keeping all the variable names and labels straight required exporting a PROC CONTENTS listing into Excel and annotating the mapping domain and variable destinations directly in it. Thankfully the study was complete, so the specifications and programming only had to account for the data that was actually present, not every value that could possibly have been entered. The very clear and detailed specifications and the use of look-up tabs that could be read into the programs for complicated mapping meant that programming and creation of the define.xml file proceeded relatively quickly.
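The look-up tab idea can be sketched in a few lines of Python. This is an illustration only, not the study's actual specification or SAS code: the column layout of the look-up table, the raw field name `ecg_hr`, and the pairing of raw variables with test codes are all hypothetical. The point is that the mapping rules live in a table that the program reads, so a wide record can be fanned out into rows for several domains without hard-coding each variable.

```python
import csv
import io
from typing import Dict, List

# Hypothetical look-up tab, shown inline; in practice it would be
# read from the specification workbook.
LOOKUP_CSV = """raw_var,domain,testcd,test,unit
VAR347,VS,SYSBP,Systolic Blood Pressure,mmHg
VAR348,VS,DIABP,Diastolic Blood Pressure,mmHg
ecg_hr,EG,EGHRMN,ECG Mean Heart Rate,beats/min
"""


def load_lookup(text: str) -> List[Dict[str, str]]:
    """Read the look-up tab into a list of mapping rules."""
    return list(csv.DictReader(io.StringIO(text)))


def map_record(record: Dict[str, str],
               lookup: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Turn one wide raw record into long rows grouped by domain."""
    out: Dict[str, List[Dict[str, str]]] = {}
    for rule in lookup:
        value = record.get(rule["raw_var"])
        if value in (None, ""):
            continue  # nothing collected for this question
        dom = rule["domain"]
        out.setdefault(dom, []).append({
            "USUBJID": record["USUBJID"],
            f"{dom}TESTCD": rule["testcd"],
            f"{dom}TEST": rule["test"],
            f"{dom}ORRES": str(value),
            f"{dom}ORRESU": rule["unit"],
        })
    return out
```

Calling `map_record({"USUBJID": "S-001", "VAR347": "120", "VAR348": "", "ecg_hr": "62"}, load_lookup(LOOKUP_CSV))` produces one VS row and one EG row and skips the empty diastolic value. Because the rules are data rather than code, the same table can also feed the specification and the define.xml metadata.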
Some data that is usually collected in clinical trials was missing, such as failed inclusion and exclusion criteria, actual first dose and last dose of study drug, and most units and methods. Date of informed consent was also missing for most subjects. Lab ranges were contained in a lengthy Microsoft Word document. Dates for when some of the events happened were not collected. Medical coding information for Adverse Events and Concomitant Medications consisted solely of a preferred term, and the version of the dictionary that was used was not retained.
The Study Data Reviewer’s Guide was used to document the lack of an IE domain and the lab ranges were added as an appendix. First dose and last dose of study drug were both derived starting with a corresponding visit date. Dates of when particular forms were filled out in REDCap were leveraged for some of the missing event dates. The AE and CM preferred terms were imported into a coding tool and coded to obtain all the required coding levels.
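The fallback from a missing event date to the date the form was filled out can be sketched as follows. This is a hedged Python illustration, not the study's code: the function name, the input date formats, and the source labels are all assumptions. It returns an ISO 8601 date suitable for an SDTM --DTC variable together with a note of where the date came from, so the substitution can be documented.

```python
from datetime import datetime
from typing import Optional, Tuple


def event_dtc(collected: Optional[str],
              form_timestamp: Optional[str]) -> Tuple[str, str]:
    """Return (ISO 8601 date, source) for an event.

    ASSUMED formats: collected dates as MM/DD/YYYY and form
    timestamps as 'YYYY-MM-DD HH:MM:SS'.
    """
    if collected and collected.strip():
        parsed = datetime.strptime(collected.strip(), "%m/%d/%Y")
        return parsed.date().isoformat(), "COLLECTED"
    if form_timestamp and form_timestamp.strip():
        parsed = datetime.strptime(form_timestamp.strip(), "%Y-%m-%d %H:%M:%S")
        return parsed.date().isoformat(), "FORM TIMESTAMP"
    return "", "MISSING"
```

Keeping the source alongside the date makes it straightforward to list every derived date in the reviewer's guide.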
After all the steps and QC were completed, the final mapping resulted in a relatively low number of Pinnacle 21 Conformance Checker issues, and most of those were data value related. The mapping strategy worked! The happy client had neat and orderly SDTM domains to support their analysis.
In conclusion, mapping data from REDCap or other non-standard systems into CDISC SDTM cannot always be handed off to automated tools; it requires a deep understanding of both the data and the standards, expertise that not every organization has. If you have unconventional data and need SDTM, the PROMETRIKA CDISC Mapping team can help you make it happen.