The AI in Business Podcast

Reducing R&D Cycle Time in Pharma Without Increasing Regulatory Risk - with Vaithi Bharath of Bayer

38 min
Jan 7, 2026
Summary

Vaithi Bharath from Bayer explains how pharma R&D delays stem from fragmented systems and validation gates rather than analytics limitations. The episode explores how guided, explainable AI workflows layered on existing systems can reduce cycle time while maintaining regulatory compliance and audit readiness without requiring full system replacement.

Insights
  • Pharma R&D delays are primarily systemic and process-driven (fragmented handoffs, validation gates, manual file transfers) rather than analytics or AI capability problems
  • In regulated environments, small technical changes trigger disproportionately large validation cycles (CSV release processes), forcing teams to tolerate inefficient workarounds for months
  • AI should function as a co-pilot that complements human expertise, not replaces it—AI-suggested findings still require clinical review and human approval before implementation
  • Reusable validation macros and standardized guided workflows can increase macro reusability from 40-50% to 70-80%, directly reducing database lock timelines and study amendments
  • Layering AI workflows via APIs on top of existing validated systems avoids expensive big-bang replacements and allows parallel implementation without pausing validation cycles
Trends
  • Shift from point solutions to consolidated data platforms (data lakes/lakehouses) to reduce integration burden across EDC, lab, safety, and statistical systems
  • Adoption of loosely coupled, API-first architectures over hardwired integrations to enable faster, more flexible system connections in regulated environments
  • Movement toward standardized market platforms (Medidata, Veeva, Databricks, SAS) replacing custom-built systems in pharma R&D stacks
  • Growing vendor offerings for AI-guided workflows in clinical data transformation and quality control (e.g., SAS introducing AI-assisted capabilities)
  • Emphasis on audit-ready traceability and immutable audit trails as foundational requirements for AI tool adoption in GxP environments
  • Industry recognition that data preparation and management should shift upstream to centralized platforms, freeing statistical teams for higher-value analysis work
  • Increased focus on explainability and lineage tracking as compliance requirements, not just technical nice-to-haves, in regulated AI implementations
Topics
  • Clinical R&D cycle time reduction
  • Pharma data integration and ETL pipelines
  • 21 CFR Part 11 compliance and e-signatures
  • GxP validation and CSV release processes
  • Electronic data capture (EDC) systems
  • SDTM and ADaM data standards
  • AI explainability and audit trails
  • Database lock and study analysis plans
  • Protocol deviation monitoring
  • Statistical programming and SAS macros
  • Data lineage and traceability
  • Regulated system validation frameworks
  • API-driven system integration
  • Human-in-the-loop AI workflows
  • Reusable validation templates and macros
Companies
Bayer
Pharmaceutical company implementing AI workflows in clinical R&D; Vaithi Bharath is its Associate Director of Data Science and AI Solutions
Medidata
Mentioned as commonly-used EDC (electronic data capture) system in pharma clinical trials
Veeva
Referenced as EDC system with API capabilities that can support AI-guided workflow layers
SAS
Statistical programming tool mentioned; introducing AI-guided workflow capabilities for SDTM transformations
Databricks
Identified as market-standard platform gaining adoption in pharma for data lake/lakehouse consolidation
Emerj AI Research
Podcast producer; Matthew DeMello is Editorial Director
People
Vaithi Bharath
Associate Director of Data Science and AI Solutions at Bayer; guest discussing AI in pharma R&D workflows
Matthew DeMello
Editorial Director at Emerj AI Research; host of the AI in Business Podcast episode
Daniel Faggella
CEO and Head of Research at Emerj AI Research; mentioned in closing credits
Quotes
"Most pharma delays aren't model problems. They're system delays, fractured handoffs, gates and rework across EDC labs, in safety, in statistics."
Matthew DeMello (paraphrasing Vaithi Bharath's position), early in the episode
"Testing is running a function to see that it works. Validation is asserting what must and must not happen."
Matthew DeMello, mid-episode distinction
"We are not placing humans in the background because our people are our most important assets in terms of the domain knowledge and the expertise they have brought in with so many years."
Vaithi Bharath, on human-in-the-loop AI
"If you don't touch the underlying framework itself, but you're just bringing on a layer of this AI workflow to that framework, then you're not revalidating the framework."
Vaithi Bharath, on avoiding big-bang replacements
"Speed gains can come from layering guided workflows via APIs, keeping human approval and audit-ready traceability."
Matthew DeMello (episode summary), closing takeaway
Full Transcript
Welcome, everyone, to the AI in Business Podcast. I'm Matthew DeMello, Editorial Director here at Emerj AI Research. Today's guest is Vaithi Bharath, Associate Director of Data Science and AI Solutions at Bayer. Vaithi joins us on today's show to explain why many clinical R&D delays aren't AI or analytics problems. They're systems and process problems: fragmented tools, slow handoffs, and validation gates that turn straightforward work into weeks of cycle time. He shares what actually causes stalls from data capture through database lock, and why small changes in regulated environments can trigger outsized delays. Our conversation also focuses on how leaders can reclaim time without ripping out validated stacks: layering guided, explainable workflows on top of existing systems, keeping humans in the loop, and building audit-ready traceability so teams move faster with fewer rework loops and more predictable timelines.

Today's episode is sponsored by AnswerRocket. But first, are you driving AI transformation at your organization? Or maybe you're guiding critical decisions on AI investments, strategy, or deployment. If so, the AI in Business Podcast wants to hear from you. Each year, Emerj AI Research features hundreds of executive thought leaders, everyone from the CIO of Goldman Sachs to the head of AI at Raytheon and AI pioneers like Yoshua Bengio. With nearly a million annual listeners, AI in Business is the go-to destination for enterprise leaders navigating real-world AI adoption. You don't need to be an engineer or a technical expert to be on the show. If you're involved in AI implementation, decision-making, or strategy within your company, this is your opportunity to share your insights with a global audience of your peers. If you believe you can help other leaders move the needle on AI ROI, visit Emerj.com and fill out our Thought Leader submission form. That's Emerj.com; click on Be an Expert. You can also click the link in the description of today's show on your preferred podcast platform. That's Emerj.com slash ExpertOne. Again, that's Emerj.com slash ExpertOne. We look forward to featuring your story. Without further ado, here's our conversation with Vaithi.

Vaithi, thank you so much for being with us today.

Thank you so much for having me. This is my first ever podcast and I'm thrilled to do this. So thank you.

Yes, absolutely. As we had discussed in our preliminary conversations leading up to today's interview, most pharma delays aren't model problems. They're system delays: fractured handoffs, gates, and rework across EDC, labs, safety, and statistics. We're getting into today how guided, explainable workflows can cut days from cycles without touching your validated stack. But before we dive into solutions, I want to ask: you've said, at least in our preliminary conversations, that across pharma R&D, insights that should take hours often drag into days or weeks because systems aren't talking to each other. With that context in mind, where would you say decision cycles stall most often in pharma R&D? And what is the real impact on timelines? Maybe if you could walk us through a typical clinical data workflow, from electronic data capture to the final tables, listings, and figures, where the real time loss shows up.

A scattered variety of systems.
The clinical data lifecycle goes through a long process, from data captured at the site where the study happens, all the way through to where medical writers produce the submission deliverables for a drug's efficacy to be submitted to the authorities for review. From the site to getting to a submission state, there are at least four or five different areas of handover between systems for data. And just by the nature of this wide array of geographies and time and systems, the systems are not always well integrated. So, for example, pharma companies often use Medidata and Veeva as EDC systems, electronic data capture systems. From there, the data flows to a clinical trial management system, typically. In parallel, you would also have labs which are working off the EDC data to also feed data into the CTMS systems. And then you have something called safety databases, where information about the required safety of a drug or a treatment is cross-referenced and should be considered in context of the data that comes through. All of this then flows down to a statistical or scientific computing environment where insights are generated using statistical programming tools such as Python, R, and of course SAS. This information is used to produce submission data, which is used by the medical writers, and that is what is produced as deliverables for the authorities. The deliverables themselves may consist of tables, listings, and figures, which are the commonly prescribed formats for authorities to review data.

So where does it stall, or where are the delays? When we have data coming from EDC systems, there is at the moment a transformation exercise required: it's called CRF data, and it needs to be converted to analysis-ready formats. We call them SDTM and ADaM data sets. In addition, for generating these SDTM and ADaM data sets, we also need to map some variables, what we call CRF variables, to CDISC domains, which are industry-standard data domains. This happens in one cycle, let's say. In parallel, you would have biostatisticians who reconcile safety data across systems. And then there will be monitors, like medical monitors, which are groups of people who validate or observe any deviations in the protocol across these systems: how data was captured or preserved in context of the protocol, and if there are any deviations, they monitor that. All of this needs to come together before the drug lifecycle is completed, analysis is completed, and deliverables are produced. This often requires manual file transfers, format conversion, and reconciliation, manually or through different tools. But at the end of it, as you can see, these things happen in disparate places among disparate groups.

So what does this mean? That's always the question. It means that database lock, which in pharma R&D or the clinical study lifecycle is the point where the data is meant to be final and not touched, in one cycle at least, and used by statistical programmers to produce their submission deliverables, that lock cycle can stretch from days to weeks depending on how often things change upstream.
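As an illustration of the CRF-to-SDTM mapping step described above, here is a minimal Python sketch of routing captured CRF fields into SDTM domains. The field names, domain codes, and mapping table are illustrative assumptions, not details from the episode; in practice this work is done with validated SAS macros or dedicated transformation tools.

```python
# Minimal sketch of a CRF-variable-to-SDTM mapping step.
# All field names and domain codes here are illustrative, not from the episode.

CRF_TO_SDTM = {
    # hypothetical raw CRF field -> (SDTM domain, SDTM variable)
    "SUBJID": ("DM", "USUBJID"),
    "BRTHDTC": ("DM", "BRTHDTC"),
    "AETERM": ("AE", "AETERM"),
    "AESTDT": ("AE", "AESTDTC"),
}

def map_crf_record(record: dict) -> dict:
    """Route each captured CRF field into its target SDTM domain."""
    domains: dict[str, dict] = {}
    unmapped = []
    for field, value in record.items():
        if field in CRF_TO_SDTM:
            domain, variable = CRF_TO_SDTM[field]
            domains.setdefault(domain, {})[variable] = value
        else:
            unmapped.append(field)  # flagged for human review, never dropped silently
    return {"domains": domains, "unmapped": unmapped}

print(map_crf_record({"SUBJID": "001-0042", "AETERM": "HEADACHE", "VSORRES": "120"}))
```

The point of keeping an explicit `unmapped` list is the same one Vaithi makes later: anything the automated step cannot account for goes to a human, not into the output.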
And even after the database lock is complete, you will sometimes or often notice in the statistical analysis lifecycle that there are deviations, or the data points you have received may not always reflect the study analysis plan that was created at the very beginning. Or the study analysis plan might have changed, but by the time the data came to us, the plan had changed. And so the team who does the statistical analysis may need to go back to the analysis plan and redo their mappings and format conversions and things like that. This sometimes happens not just across internal teams, but also across pharma vendors and CROs and teams like that. All of this can potentially result in delays which, if handled in a more streamlined manner, could free up more time for insight generation and analysis, as opposed to fixing these gaps.

And could you give us an example of a multi-team workaround that stole time because the change triggered a full computer system validation release process?

On the ground, it looks like analysts and statisticians are exporting SAS tables or analysis data tables, and data engineers are sometimes rebuilding the same ETL pipelines. The validation and QA teams are often waiting to re-execute their validation runs. And obviously this comes with each team using its own formats, so the whole transformation issue I described earlier still applies. In a regulated environment, what this means is that if, for example, a format becomes incompatible or the ETL pipeline needs to change, this triggers a CSV release process. Under GxP, the CSV release process typically involves extensive documentation and extensive testing. And in the context of validation, testing is not just testing to see that something works. Validation means asserting that something that should do its job is doing its job the way it is supposed to be done, and that something that should not happen is also not happening. That's where validation differentiates itself from testing. So it's a very rigid and controlled process. What this means is that even for a small change, teams often have to tolerate inefficient workarounds.

Even in my own experience, we had a situation where we had a data handoff with a vendor, and the integration was being replaced with another, more modern pattern, so to speak. But the problem was the integration was not living up to its performance, to be honest. It was taking us an extremely long time to download the required data. So for a short while, about six to eight months, we were forced to live with a manual workaround where we would have data literally handed off in zip files onto cloud locations, and then there would be an email sent to a support team. The support team acknowledges the email, downloads the password-protected file, then has to re-establish the file in context of a study area and inform the study teams. So as you can see, what would have been a single click or a single job run now involved multiple days and multiple teams to bring that data in. This is because we cannot accept the data as it is. We can't simply say, well, we'll just send you a file by email, or we'll put it on a Google Drive and then you take it. That doesn't work. We need to be able to explain the lineage of this file. Who created the file? What was the timestamp of it? Did somebody touch it? There needs to be a report that goes with it saying how many records were originally in the file, and when we download it, do we have it at the same level? That there's no corruption of the data, and things like that.
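The lineage questions behind that workaround (who created the file, when, whether it was altered, whether the record count matches) reduce to a small set of mechanical checks. A minimal sketch, assuming a hypothetical JSON manifest travels with each delivered file:

```python
# Minimal sketch of the integrity checks behind a manual data handoff.
# The manifest format and field names are illustrative assumptions.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum the delivered file so corruption in transit is detectable."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_handoff(data_file: Path, manifest_file: Path) -> list[str]:
    """Compare a delivered file against the sender's manifest (who, when, what)."""
    manifest = json.loads(manifest_file.read_text())
    problems = []
    if sha256_of(data_file) != manifest["sha256"]:
        problems.append("checksum mismatch: file may have been altered or corrupted")
    record_count = sum(1 for _ in data_file.open()) - 1  # minus header row
    if record_count != manifest["record_count"]:
        problems.append(f"expected {manifest['record_count']} records, found {record_count}")
    for field in ("created_by", "created_at"):  # lineage: who created it and when
        if not manifest.get(field):
            problems.append(f"manifest missing lineage field: {field}")
    return problems
```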
So it's a time-consuming and expensive process to validate it. But until that's done, we have to tolerate inefficient workarounds. This is where, for example, this wasted engineering time or this slower time to set up studies can potentially be streamlined, or should be streamlined, to avoid these stalls in the decision cycles. And this comes with better coordination or better planning across teams. And this is probably something we'll get to later, I think, to talk about how AI can potentially yield some improvements in that area.

So Vaithi, you've made a key distinction. Testing is running a function to see that it works. Validation is asserting what must and must not happen. And that changes everything. So how do regulations like 21 CFR Part 11, audit trails, and e-signatures change the day-to-day work? And what evidence are auditors actually looking for when they review your compliance processes?

So as we know, 21 CFR Part 11, the set of requirements put forth by the FDA, sets rules for trustworthy electronic records and signatures. This requires robust audit trails, secure access controls, and validated systems to be in place where GxP practices are followed. This also ties into how an audit is performed. When auditors come in to audit a system, they have expectations that focus primarily on demonstrating data integrity, traceability, and security. What auditors want to check is that we have logs that are tamper-proof, immutable so to speak, or at the very least data with an audit trail on the logs, so that nobody can go in and change a log. Nobody should be allowed to go in and change a log as and when they wish to do so. Every single action needs to be recorded and audited.

And then even in our day-to-day work, when we give access to users, when new people onboard or people come in from other parts of the org, they have to follow some trainings. They need to show evidence that they have completed a training. There is always a justification for why they would use, in my context, an SCE, a statistical computing environment. It needs to be approved by their line manager and the business administrator, so to speak. All of this needs to be in place because at the end of every year we do what we call a periodic review, which is an input to any auditor. The periodic review needs to establish that at the end of this year we had so many users, and for every one of these users the necessary prerequisites were met in terms of training and learning, with the evidence to support that. Similarly, every document flow in the validation lifecycle on the statistical computing environment needs to have e-signatures. That means we have a framework of tools which will not just generate documentation, but will also make sure that the credentials of the person executing the validation cycle are tagged as signatures to that document. In the past we used to do wet signatures, so we would print documents and actually sign them, but now we have internal tools that can generate credentials that are treated as signatures.
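The tamper-proof log property Vaithi describes has a standard construction: hash-chain the entries so that any retroactive edit breaks the chain. A minimal Python sketch of the idea follows; it is illustrative only, since actual Part 11 systems are validated commercial products rather than a few lines of script.

```python
# Minimal sketch of a tamper-evident, append-only audit trail.
# Each entry hashes the previous one, so editing any past record breaks the chain.
import hashlib
import json
import time

class AuditTrail:
    def __init__(self):
        self.entries = []

    def record(self, user: str, action: str, detail: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {"ts": time.time(), "user": user, "action": action,
                 "detail": detail, "prev": prev_hash}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any retroactive edit changes a hash and fails here."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or e["hash"] != hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

trail = AuditTrail()
trail.record("jdoe", "promote", "study_001 dataset dev -> val")
trail.record("asmith", "e-sign", "validation summary report v1.2")
print(trail.verify())  # True; flipping any past field would make this False
```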
As you can see, if this was a typical non-GxP system or a non-regulated space, none of this would be required. All you need is: somebody makes a change in a GitHub repository, you'll know who made the change; somebody tests it, you'll have a test case to show somebody tested it; and then it's released into production. But when it comes to a regulated space, it's amplified, for obvious reasons. With drug development, you are dealing with real human lives at stake. So when the FDA takes our results, they often expect to be able to reproduce the same results with the same data. They may even ask us for code, and the reports we've generated, as part of the documentation to be submitted to them. And they would often try to reproduce that using the same code and data that we used, and compare to see whether they are getting the same results. That's when they know that the information we have given them can be considered to have integrity, so to speak. So that's where the requirements get significantly amplified. I talked about 21 CFR Part 11, but there are other requirements across countries, because clinical trials are often global, and in some places there are more stringent requirements; China, for example, has much more stringent requirements. The EMA, the European Medicines Agency, often follows or works in parallel with the FDA and has similar requirements. But as you can see, with global regulatory needs, the requirements to have a validated system, or to perform changes on a validated system, are far more complex compared to a regular system.

Okay, so now let's look ahead. You paint a future where routine tasks become guided, explainable workflows with humans approving key steps. Which everyday questions should become guided workflows first? For example, checks on data set readiness, protocol deviations, or QC. And critically, how do we introduce these AI tools while keeping humans in the loop and still gaining speed?

The whole set of compliance requirements around validation now also comes with the explainability part. We need to be able to explain all our actions, our results, our reports, and our logs to the auditors and to the authorities. So how is that meant to be addressed? Typically, it's important to show the data lineage from the source, meaning the EDC point where the data was captured, to the derived endpoints, which are the tables, listings, and figures. We need to be able to explain how the data came into the clinical cycle, how the transformation logic was implemented in a clinical information environment, and how the transformed data was finally mapped to the SDTM and ADaM data sets I mentioned earlier. And how does all of this happen with programming using SAS or R or Python, in a version-controlled, compliant manner? Typically this would include decision logs, and now I'm speaking of when an AI system, or an AI workflow, could be in place to help us get through this process.
So what would be a manual effort to scan through different documents and outputs and flag deviations can potentially be replaced with AI-guided workflows, which more and more vendors are bringing into the market right now. Companies like SAS, for example, are introducing capabilities where there are AI-guided workflows to do these transformations and call out these mappings. Typically, what that means is you have AI-suggested findings, but then you have a human in the loop who clinically reviews these findings. You have immutable audit trails which are created while these flags are generated. And then you pass this along to the monitoring teams, who review these outputs. Based on their reviewer approval, you deliver these artifacts along with things like define.xml, the reviewer's guides, and the validation reports that were used to track the lifecycle of these programs in this analysis. And here is where AI can also help. When you do a validation lifecycle using an AI tool, there could be AI-assisted quality control checks that can be traced back to an underlying macro that was used, or to an underlying workflow that was triggered.

Now, keep in mind, not all of these are fully mature or yet in place, as many in the industry would attest. A lot of this is work in progress. There are various companies offering pilots and small proofs of concept on this idea. And from my experience, we are also experimenting with some of these. I would be dishonest if I were to say that we are implementing all of this, but we are evaluating it. We are working with different vendors to improve the process here. That's where I want to make it clear: essentially, what we are trying to do is not place humans in the background, because our people are our most important assets in terms of the domain knowledge and the expertise they have brought in over so many years. So we bring in people to treat the AI suggestions the same way a human suggestion would have been treated. The people still validate what the suggestion is. Take SDTM mapping or SDTM transformation: an AI tool could do that transformation in a matter of minutes or hours, where previously we would have to do multiple rounds of SAS programming to do the same. But that doesn't mean we take it at face value. This is where the stat analyst or the stat programmer will still review these suggestions, review each one of the independent mappings, double-check that they are consistent with the study analysis plan, and make sure that they are generated in a compliant manner. What I'm trying to say is that we treat these suggestions the same way we would document a human-generated transformation: how the transformation was designed, with the derivation logic, how the code was written for it, who signed it off, and we ship all of this to the FDA or the EMA for review. The point here is that introducing these tools can reduce some of the mundane or repeated effort that humans had to put in before, by processing that information faster. Because we're talking about lots of documents and lots of data, and a faster way to get the first cut of these suggestions would greatly reduce the lifecycle.
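The review loop described here, where AI proposes and a named human approves before anything is applied, has a simple shape. A minimal sketch with illustrative field names and statuses, not any vendor's actual API:

```python
# Minimal sketch of human-in-the-loop handling of AI-suggested mappings:
# nothing is applied until a named reviewer approves it, and every decision
# is logged. The suggestion structure and statuses are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Suggestion:
    source_variable: str      # e.g. a raw CRF field
    proposed_target: str      # e.g. an SDTM domain.variable
    rationale: str            # why the model proposed it (explainability)
    status: str = "pending"   # pending -> approved / rejected
    decisions: list = field(default_factory=list)

def review(s: Suggestion, reviewer: str, approve: bool, note: str = "") -> None:
    """A human decision, recorded with who/when/why, same as a manual mapping."""
    s.status = "approved" if approve else "rejected"
    s.decisions.append({
        "reviewer": reviewer,
        "decision": s.status,
        "note": note,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def apply_approved(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Only human-approved suggestions ever reach the pipeline."""
    return [s for s in suggestions if s.status == "approved"]

s = Suggestion("AESTDT", "AE.AESTDTC", "date format matches ISO 8601 start date")
review(s, reviewer="stat_programmer_01", approve=True, note="consistent with SAP")
print(apply_approved([s]))
```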
So you've described AI as a co-pilot. How do reusable macros and guided processes translate into faster database locks and more consistent quality across studies?

So what does working with an AI tool look like in the R&D stack? When AI is working in the clinical development stack, typical or routine programming tasks and review tasks become guided workflows. That's the main objective here: where tasks are mundane or repeatable, you want to make them faster to reduce the overall decision lifecycle. For example, locating the right SDTM or ADaM data sets, running protocol deviation checks, flagging data quality errors, or surfacing unexpected trends or safety signals. Up until now, some of these things have been served by the expertise of the people behind these processes. But having an AI tool in there doesn't necessarily take away all of that. You build this domain knowledge into a workflow task, and thereby you make sure that the people reviewing these workflows are still applying their domain knowledge and expertise to confirm that the output coming out of these workflows is correct. And more importantly, they provide the correct next steps: whether you go to a database lock, or whether you have to amend the study analysis plan, the SAP document, upstream.

What this means is that when this happens in a coordinated manner between the guided workflow and the people with the knowledge behind it, it becomes a good starting point, as a package, for a statistical programmer or a data manager to review and start their work once the data comes to them. Like I said earlier, a statistical programmer may find discrepancies in data downstream, and that may require a change all the way upstream, where the data came in from the site, from the EDC tool. This could be a week's effort. But if there are more rigid checks using these workflows, then as you can imagine, it makes the reviewers' lives easier: they are now looking for the really important things in the study analysis plan, or even in these deviation checks, as opposed to more mundane things. This will reduce the lifecycle of the data, the reviews, and the amendments going back to the SAP and coming back to statistical programming. And we have to keep in mind that sometimes these amendments also have financial implications, which can run from hundreds of thousands of dollars to sometimes millions of dollars, depending on the nature of the study.

What is important here is that everything that happens in this guided workflow using an AI tool needs to be recorded. Meaning, again, we go back to data lineage: how the data came in from the raw CRF to these endpoints, how the transformations were done in the clinical information environment, what standards were applied, what the underlying SOPs were, the standard operating procedures in the pharma lifecycle, and who approved it. Once all of that is in place, even small things like who approved it help. If you have a clear lineage, the complete trail of who touched these documents in different places, it saves time for a statistical programmer who would otherwise have to go back and find out who did what at what point. Everything can take time in a global organization. These are some of the things that could be facilitated by AI tools to reduce time there.
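One way to read the guided-workflow idea is as a standardized, parameterized battery of checks that runs identically for every study before human review begins. A minimal sketch, with made-up check names and thresholds:

```python
# Minimal sketch of guided, reusable readiness checks that run the same way
# for every study, so review quality doesn't depend on reviewer experience.
# Check names and thresholds are illustrative assumptions.

def check_required_datasets(available: set[str], required: set[str]) -> list[str]:
    """Flag missing analysis data sets before downstream work starts."""
    return [f"missing dataset: {d}" for d in sorted(required - available)]

def check_visit_window(visit_day: int, planned_day: int, window: int = 3) -> list[str]:
    """Flag a potential protocol deviation if a visit falls outside its window."""
    if abs(visit_day - planned_day) > window:
        return [f"visit on day {visit_day} outside +/-{window} days of day {planned_day}"]
    return []

def run_guided_checks(study: dict) -> list[str]:
    """One standardized entry point, reusable across studies."""
    findings = []
    findings += check_required_datasets(set(study["datasets"]), {"DM", "AE", "LB"})
    for v in study["visits"]:
        findings += check_visit_window(v["actual_day"], v["planned_day"])
    return findings  # handed to a human reviewer, never auto-acted on

print(run_guided_checks({
    "datasets": ["DM", "AE"],
    "visits": [{"actual_day": 33, "planned_day": 28}],
}))
```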
On the next level, what that means is that teams can build reusable validation macros, because now these processes are standardized. You are not at the, I don't want to say the word mercy, but you're not at the discretion of each reviewer and their thought process. Now we have a more standardized way of reviewing everything using these guided workflows. Somebody could be more experienced, somebody could be less experienced, but that doesn't significantly change the workflow or the review downstream, because you now have an underlying workflow that broadly incorporates what is needed. And that's a starting point for the reviews. So when it goes further downstream, when the stat teams are building their macros or tools, they can make them more reusable. Now you don't need to change these macros for every study, or build additional macros for a particular study. I'm not saying it's 100% reusable; it never will be. But you can increase the reusability from typically maybe 40-50% to maybe 70-80%, which already produces significant time reduction.

So when teams build such reusable macros and templates, in the statistical programming lifecycle that translates directly into fewer errors and shorter review times, because you're now validating everything consistently, using the same macros. If a macro produced certain outputs for a previous study, it can be guaranteed it will produce the same output. And then maybe you add something on top of it, and that's the part you need to validate, as opposed to validating everything. This also means that a database lock can be achieved faster, with fewer amendments to the study analysis plan, and, like I said earlier, consistent quality across studies. Now we are not saying that a more experienced team produces a better study output compared to a less experienced team. Similarly, the audit trail and the quality of the logs that are generated are also more consistent, because our macros are more reusable and more consistent. All of this basically results, I would think, in more consistent statistical judgment of the data, which is the ultimate aim. The statistical judgment is what is submitted as efficacy, or potential adverse events and things like that, to the authorities. They need to know how safe this drug is, how effective it is, whether it does what it should do, and whether it also does not do what it's not supposed to do in terms of side effects. So this is where we think that AI-guided workflows could be sort of a co-pilot, as the term is used in the programming world: a co-pilot working alongside these different teams across the lifecycle of a clinical study.

Absolutely. And earlier, you mentioned that delays are also caused by disparate systems, and you're starting to consolidate. As a leader, you're arguing for wrappers and adapters over expensive big-bang replacements. Where can we leverage APIs to put layers around existing systems to really deliver quick wins and not have to revalidate the whole stack?
We spoke a lot about the requirements from a validation standpoint to satisfy compliance needs. So what does this mean? Does everything I stated mean that we need to rip out existing systems and bring in new systems to do this consolidation, or to implement these AI-guided workflows? Not necessarily. You can always expand on top of your system. Technical teams can build wrappers. Every tool, every AI tool so to speak, these days has APIs that can be leveraged to build adapters or wrappers that can pull data from existing systems and then add a layer on top that provides these workflows to the process. For example, with an EDC system like Veeva, you could build an AI-guided workflow, because Veeva also offers APIs to read data from within Veeva. You can have an AI-guided workflow tool, or a layer, that leverages Veeva's APIs to read the data and then provide lineage tracking, so to speak, of where the data came from. So you have the lineage tracking established through this guided workflow tool. You can also think about adding quality control checks using similar guided workflows to the existing SDTM or ADaM transformation pipelines. Typically, data engineering or ETL pipelines are relatively modern, so they would be good candidates to support such assisted QC checks.

On the other hand, there are validation frameworks which also come into play when you do the statistical analysis, for example. And these validation frameworks depend on the company: you could have homegrown frameworks that your teams have built with their knowledge, which validate in their own way, and there are also offerings in the market with similar frameworks. Either way, the question is about the data that the validation frameworks use. If you have a data lifecycle where you are validating data from development to validation to production, then you could have an AI-assisted check, or a tool, that plugs into the data as it moves. When you say validation, it typically includes data, metadata, documentation, everything that's impacted when data is moved from one stage to another. So having a workflow tool that can tap into the underlying data itself can be an addition to your validation framework. Without having to rework the framework, you could just say, okay, I plug in this tool here when I promote a file from dev to val, so to speak. What does that mean? When it does that, it needs to perform a series of checks on your repository, which is typically accessible by APIs or endpoints. And then it needs to make sure that the right versions and the right artifacts are captured in this promotion of the data to another stage. This is essentially the bread and butter of a validation lifecycle, because this is what goes to the authorities when they ask: when you moved your data across the lifecycle, how did you ensure that you kept track of everything that's required? And where presently somebody might have extensive SAS macros or other tools to do this, you could bake that knowledge into this kind of workflow. Because what that gives you is the ability to reuse this workflow across studies, whereas in the past it could have been study-specific validation frameworks, or frameworks extended at a study level.
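As a sketch of that plug-in-at-the-dev-to-val-boundary idea: a gate function that calls a repository API, verifies versions and required artifacts, and only then promotes. The repository client and its methods here are hypothetical placeholders, since the real endpoints depend on the system in question.

```python
# Minimal sketch of a promotion gate layered on top of an existing repository:
# a plug-in check that runs when an artifact moves dev -> val, without touching
# the validated framework underneath. The `repo` client and its methods are
# hypothetical stand-ins for whatever API a real system exposes.

REQUIRED_ARTIFACTS = {"dataset", "metadata", "spec_document", "validation_report"}

def promotion_gate(repo, package_id: str, from_stage: str, to_stage: str) -> list[str]:
    """Run a series of checks before promoting a package to the next stage."""
    problems = []
    package = repo.get_package(package_id)  # hypothetical API call
    present = {a["type"] for a in package["artifacts"]}
    for missing in REQUIRED_ARTIFACTS - present:
        problems.append(f"missing required artifact: {missing}")
    for artifact in package["artifacts"]:
        if artifact.get("version") != package["expected_version"]:
            problems.append(f"version mismatch on {artifact['type']}")
    if not package.get("approved_by"):
        problems.append("no recorded approver for this promotion")
    if not problems:
        repo.promote(package_id, from_stage, to_stage)  # hypothetical API call
        repo.log(f"{package_id}: {from_stage} -> {to_stage} with full artifact set")
    return problems
```

Because the gate only reads from and writes to the repository through its existing API, the underlying validated framework is untouched, which is exactly the layering argument made in the next answer.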
Ideally, what we are looking at, because replacing or ripping out systems is an expensive and time-consuming process in the regulated space, is the time to market, so to speak, for these initiatives. It typically runs from months to years, unlike a SaaS (software as a service) or retail space where you can have releases every month, so to speak. That's just not possible in a regulated space. So we try to make the best of it. If, for example, you don't touch the underlying framework itself, but you're just bringing on a layer of this AI workflow on top of that framework, then you're not revalidating the framework. You're just testing this workflow tool without having to revalidate everything. That saves significant time and effort, and it's something that can be done in parallel, so you don't have to pause your validation. Otherwise, if you have to change the whole framework, you have to pause your validation, and that can result in other unexpected delays that you don't want to have. The idea is that your teams continue to use the tools and the technologies that they have, but complement their work, especially for these routine checks, mundane reviews, and data quality checks, with guided workflows which are supported by proper audit trails. Thereby, this adds more, let's say, intelligence and traceability to those flows without having to replace everything entirely. And as with most people in the industry, budgets are always tight and we have to be very mindful of that. So we try to complement things where we can, not replace things.

Absolutely. And given everything we've discussed, validation, auditability, and system fragmentation, where do you see the biggest near-term opportunity to reduce decision cycle delays without triggering large-scale revalidation?

One of the ways to reduce decision cycles or delays in the data management or data preparation phase is to consolidate some of the systems. In my own experience, in our company, we had a lot of systems feeding in data from different sources: as I said earlier, from EDC to lab systems to medical safety systems and things like that. So we are embarking on an exercise to have a consistent stack of systems broadly fulfilling capabilities. Data management and data preparation is one stack. We have a data lakehouse, which is meant to provide the central transformation capabilities. And then we have an analytics environment downstream, which is meant to do statistical computing and produce these TLFs and analytics, even for other stakeholders, not just authorities. In the past, in my context, from a statistical computing environment we would have built integrations to seven or eight systems to gather the data and do the transformation. Right now, we are looking at maybe integrating with one system, because the lakehouse, which sits upstream of us, is meant to shift some of the burden of data preparation and data management to itself, thereby reducing the effort required from our statistical teams or analysis teams. Because their job really doesn't have to do with data management and data wrangling. They would prefer to have data that's ready for analysis, and thereby they are able to spend more time on deriving more analysis. Otherwise, all teams are running on tight deadlines. There's never a time when anybody has the luxury of time.
So teams would prefer to do more, to generate the most value out of what they can do with the data, instead of having to prepare for what they can do with it. And this is the key, and I think this is what we also see in the industry: there is some level of consolidation to reduce disparate systems and bring in more market-standard technology offerings that also offer plug-and-play capabilities, in terms of being able to integrate with APIs as opposed to having hardwired endpoints, thereby making them loosely coupled while at the same time ensuring easier data integration. This is one of the trends or movements that we see in the pharma industry compared to past years, where there were more individual systems, and now we're going toward systems like Veeva and Medidata, and platforms like Databricks, SAS, and related stacks.

Right, right. And we heard that delays in R&D are systemic. They happen at the handoffs, the CSV gates, and the rework across disparate systems. But just as you explained, explainable workflows paired with human approvals can significantly cut cycle time while actually improving your audit posture. It sounds like the key is to complement, not replace, your validated stack. So for leaders who want to start making that shift today: standardize the 8 to 10 weekly questions that cause the most rework, map your data owners and approvers, then pilot two of those routine questions as guided workflows inside a governed sandbox. Track the days saved and the rework avoided, and then, as you reduce manual handoffs, scale what clears its audits. This is really, really fascinating stuff. Vaithi, thank you so much for being with us this week.

Once again, thank you for listening to my thoughts. And I hope I've been of some help in explaining some of the complex topics around validation and how some decision cycles could be faster in our tightly regulated landscape.

Wrapping up today's show, I think we heard at least three critical takeaways from our conversation today with Vaithi Bharath, Associate Director of Data Science and AI Solutions at Bayer. First, most delays come from disconnected systems and rework, not the analytics itself. Second, in regulated environments, small changes trigger heavy validation cycles, so standardization matters. Finally, speed gains can come from layering guided workflows via APIs, keeping human approval and audit-ready traceability.

Interested in putting your AI product in front of household names in the Fortune 500? Connect directly with enterprise leaders at market-leading companies. Emerj can position your brand where enterprise decision makers turn for insight, research, and guidance. Visit Emerj.com slash sponsor for more information. Again, that's E-M-E-R-J.com slash S-P-O-N-S-O-R. I'm your host, at least for today, Matthew DeMello, Editorial Director here at Emerj AI Research. On behalf of Daniel Faggella, our CEO and Head of Research, as well as the rest of the team here at Emerj: thanks so much for joining us today, and we'll catch you next time on the AI in Business Podcast.