This year, PROMETRIKA team members have attended many industry conferences where the hot topics always seem to be AI and process automation. Of course, the innovation cool crowd is excited to jump on board. What has really piqued my attention are the discussions among technology vendors about automating SDTM mapping. PROMETRIKA, as a full-service CRO, has a strong focus on biostatistics and on statistical and SDTM programming. Since SDTM mapping is a major part of the programming work we do, I asked myself, “Do we need a new business model for automating SDTM programming?” After hearing multiple viewpoints on the subject, I’ve come away with a list of pros and cons of automating SDTM mapping.
With regard to SDTM programming, I’ve learned that AI can be harnessed to probe raw data and map it by generating the programs that create SDTM domains. Of course, a human is still needed to check the results (output), modify the algorithms as necessary, and/or complete any mapping that AI was unable to address. Thereafter, automated processes can be used to create the SDTM domains from the data as needed.
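To make the idea concrete, here is a minimal, purely illustrative sketch of the kind of variable-level mapping such a tool might generate and a programmer would then review. The study identifier, raw column names, and mapping rules are all invented for illustration; a real DM mapping also involves controlled terminology, ISO 8601 date derivations, and many more variables.

```python
# Illustrative only: a minimal rule-based mapping from a hypothetical raw
# demographics extract to a few SDTM DM variables.
import pandas as pd

# Hypothetical raw EDC export (column names are assumptions for illustration)
raw_dm = pd.DataFrame({
    "SUBJID": ["001", "002"],
    "BIRTHDT": ["1980-05-14", "1975-11-02"],
    "SEX_RAW": ["M", "F"],
    "SITE": ["101", "102"],
})

# Mapping rules an automated tool might "learn" or a programmer might write
mapping_rules = {
    "USUBJID": lambda df: "STUDY01-" + df["SITE"] + "-" + df["SUBJID"],
    "SUBJID":  lambda df: df["SUBJID"],
    "BRTHDTC": lambda df: df["BIRTHDT"],          # already ISO 8601 in this toy example
    "SEX":     lambda df: df["SEX_RAW"].str.upper(),
    "SITEID":  lambda df: df["SITE"],
}

# Apply each rule to build the DM domain
dm = pd.DataFrame({sdtm_var: rule(raw_dm) for sdtm_var, rule in mapping_rules.items()})
dm.insert(0, "STUDYID", "STUDY01")
dm.insert(1, "DOMAIN", "DM")
print(dm)
```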
For big pharma, who have the potential advantage of being able to standardize their data collection instruments across all studies, the pros of automating SDTM mapping are obvious. It allows you to have a daily, up-to-date set of SDTM domains at any given point in the study (assuming the data is available, of course). This decreases the amount of time spent on SDTM mapping prior to submission, where a delay of even a few days could mean hundreds of millions of dollars in lost sales. Obviously, there is an up-front investment to either implement an off-the-shelf product that automates the mapping or develop a system in house. The payoff is well worth it if the company can shave days off submission timelines when a high volume of programs needs to be run simultaneously.
Nonetheless, the up-front investment to set up this automation is nothing to sneeze at; developing and implementing such a tool can take months or even years, depending on the level of automation you need. The automated algorithms need to be “trained” on a number of studies to capture a representative sample of mapping rules. Once the algorithms are established, you test and modify them until a high percentage of the raw data fed into them auto-maps accurately to the SDTM domains.
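As a rough illustration of that test-and-modify loop, one way to quantify progress is to compare the variable-level mappings a tool proposes against a human-reviewed mapping specification and report the share that agree. The variable names and mappings below are hypothetical.

```python
# A rough sketch (not a real tool) of tracking auto-mapping accuracy:
# compare tool-proposed source -> SDTM target mappings against a
# reviewer-approved specification.
proposed = {
    "SUBJID":   "DM.SUBJID",
    "BIRTHDT":  "DM.BRTHDTC",
    "SEX_RAW":  "DM.SEX",
    "RACE_RAW": "DM.ETHNIC",   # a mis-mapping the reviewer would catch
}
reviewed = {
    "SUBJID":   "DM.SUBJID",
    "BIRTHDT":  "DM.BRTHDTC",
    "SEX_RAW":  "DM.SEX",
    "RACE_RAW": "DM.RACE",
}

matches = sum(1 for var, target in proposed.items() if reviewed.get(var) == target)
accuracy = matches / len(reviewed)
print(f"Auto-mapped correctly: {matches}/{len(reviewed)} ({accuracy:.0%})")
```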
This led me to wonder whether post-production changes to the EDC database, or data structure updates for external datasets, would render the attempt at automation futile. As we know, EDC databases go through countless rounds of post-production changes during the lifetime of a trial. Similarly, the lab dataset format doesn’t always match the original data transfer specification, a concern that is especially significant given the high volume of external data sources in clinical trials today. All the “learning” the algorithm has done to automate the mapping goes out the window if the rules keep changing. Lastly, the SDTM standards themselves continue to evolve, so the algorithms are chasing a moving target. Still, big pharma fans of automating this process will argue that retraining the algorithms every time the underlying data structure changes still saves time overall, even if the exercise is costly and not entirely efficient.
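One lightweight safeguard, sketched below with invented column names, is to check each incoming external transfer against the agreed data transfer specification before re-running any automated mapping, so that structural drift is caught up front rather than silently breaking the learned rules.

```python
# Hypothetical check of an incoming lab transfer against the agreed data
# transfer specification: flag added or missing columns before re-running
# any automated mapping. Column names are invented for illustration.
expected_columns = {"SUBJID", "LBTESTCD", "LBORRES", "LBORRESU", "LBDTC"}
incoming_columns = {"SUBJID", "LBTESTCD", "LBORRES", "LBORRESU", "COLLECTION_DATE"}

missing = expected_columns - incoming_columns
unexpected = incoming_columns - expected_columns

if missing or unexpected:
    print("Transfer structure has drifted from the specification:")
    print("  missing columns:   ", sorted(missing))
    print("  unexpected columns:", sorted(unexpected))
    # Existing mapping rules may no longer apply and would need review or
    # retraining before the automated run proceeds.
```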
But what if you are NOT big pharma? What if you are a small biotech with a limited pipeline? If that’s the case, then the argument for spending the resources to automate SDTM mapping becomes much weaker. Firstly, small biotechs often don’t have the luxury of enough studies to train the mapping algorithms. Secondly, small biotechs differ from big pharma in that they often don’t have preferred partnerships with external vendors for lab testing, for example; instead, they rely on their CROs to bring in the vendors for their trials. As a consequence, efficiencies gained through automation may be lost because you cannot count on your external datasets always coming from the same vendor and sticking to the same format. Lastly, the business strategy of small biotechs differs from that of big pharma in that conserving funds and maintaining peak efficiency are critical for a small biotech’s survival, while big pharma have greater freedom to spend whatever they need to ensure that they get to the finish line faster. Given these points, it seems prudent for a small biotech to do the SDTM mapping when they need it, rather than making the up-front investment in standardization and automation. Additionally, there is little value in updating the SDTM mapping every time the underlying raw data structure or values change; you save time and effort by updating the mapping to match the latest set of data only if and when you need the domains.
It seems the industry agrees that automation of SDTM mapping is not a one-size-fits-all strategy. Sponsors need to give careful consideration to the benefits and risks of automation for their particular business situation and trial needs. PROMETRIKA’s Statistical and SDTM Programming and Technology experts have many years of experience helping sponsors of every size determine the best approach to their data processing needs for each trial.