The annual CDISC US Interchange brought enthusiasts of standards in clinical trials together in Bethesda, MD, in October 2018.  Attendees included programmers, biostatisticians, data managers, managers, regulators, clinicians, and researchers from government agencies, pharmaceutical companies, CROs, universities, and medical facilities.  The standards governed by CDISC (Clinical Data Interchange Standards Consortium) that were discussed ranged from the protocol through data collection, tabulation, analysis, and beyond.  I was fortunate to be able to attend, hear the latest news, and catch up with old friends and colleagues.

CDISC started out as a purely volunteer organization and, like the standards it created, continues to evolve and grow.  This year the new CDISC corporate branding was introduced with much fanfare, and swag was distributed to the attendees.  Along with the bells and whistles comes a renewed focus on the customers (users), the value provided, the sustainability of the standards, and the optimization of the available tools.  CDISC is heading towards standards and tools that allow end-to-end automation of the clinical trial process and data exchange in order to optimize patient treatment and health.

Interchange normally lasts for five days.  The first two days and the fifth day are devoted to training courses, workshops, and meetings of the CDISC Advisory Council (CAC) and numerous standards teams.  The other two days offer an exhibition hall of booths to visit, plenary and keynote addresses, paper presentations, roundtable discussions, and a regulatory session.  The middle half of the sessions runs in three tracks, divided into various categories, including several for newcomers and those not familiar with CDISC or the standards, while the remaining sessions run in a single track.  I find this arrangement very convenient: I can hear the bulk of the presentations without missing important information or finding every topic I am interested in presented at exactly the same time in six different rooms.  There are many opportunities for networking and continuing discussions during meals, refreshment breaks, and the first-night evening reception.  More than a few representatives from the FDA attend and speak, giving many opportunities to ask questions and gain clarity on the submission process.

On the regulatory side of things, I learned that, beginning in March 2020, the FDA will require the inclusion of LOINC (Logical Observation Identifiers Names and Codes) codes for medical laboratory data.  LOINC codes help identify specific tests by method and unit to enable accurate comparison of test results from multiple sources.  The FDA also just published a new set of Business Rules (see fda.gov or the Study Data Technical Conformance Guide v. 4.2 (October 2018)) to aid in validating the tabulated and analysis data for submission.  These rules will be incorporated into checking tools in the near future and will carry updated identifying numbers, so it will be clear which version of the rules is being used.

Last year, CDISC made a promise to release updated documentation more frequently, and it has been following through.  A new version of the SDTMIG (Study Data Tabulation Model Implementation Guide) was released in December 2018, the first update in five years.  Training on the new release was provided at Interchange.  The next IG version is slated to be released in 2019, with additional updates in 2020.  New versions of all the foundational standards for clinical (CDASH) and non-clinical (SEND) data collection, analysis datasets (ADaM), data exchange, questionnaires, and controlled terminology are being released at more frequent and regular intervals.  Even the format of the documentation is changing from PDF to XML to allow for portability and easier updates.

Naturally, with all this influx of change, many of the presentations focused on how to do things automatically or better and faster.  There were examples of automating annotations of case report forms (CRFs) and a fascinating presentation of using machine learning programming approaches for mapping data.  Several organizations gave presentations on how they govern and automate standards.  Many of the speakers were very passionate and their energy was infectious, causing me to come away with renewed energy and focus for my day-to-day work.

For me, the most interesting aspect of trying to do things better and faster has been the more recent focus on real world data (RWD) and on incorporating electronic health records (EHR) directly into SDTM, drastically reducing the data manually entered in the study-specific database.  The related presentation titled “Precisely Practicing Medicine from 700 Trillion Points of Clinical and Clinical Trials Data” caught my attention.  The author, Dr. Atul Butte from UCSF, spoke of the role of big data in pooling information from multiple sources to find solutions for diagnosing and treating illness and disease, and to provide guidance and direction for research.  He described it as a kind of retroactive crowdsourcing of patient data.  His first example was of a teenage girl who successfully used data that are freely available from the Gene Expression Omnibus (GEO) data warehouse to create one algorithm to diagnose breast cancer with 99% sensitivity and a second algorithm for leukemia.  Yes, a high school student did this.  Dr. Butte’s team has gathered data from all of the medical centers associated with the University of California into one data warehouse.  He displayed a graph of all the individual patients at once, showing different conditions and status by time point during their lifespans, creating a visual map that looked like star fields across a night sky.  From these data, his researchers were able to determine groupings of conditions across age groups.  For example, they determined that, for elderly populations, though individuals had many different initial conditions, the number one cause of death was septicemia.  Using similar techniques, researchers will try to predict disease before it presents, explain rare diseases, and find treatments for diseases that have rarely been studied.  As Dr. Butte concluded, “Big data in biomedicine is hope.”

For more information, see the CDISC website: cdisc.org