Part I: Enrollment Projections - The Pitfalls of Spreadsheet-Based Models

Jin Kim

April 16, 2024

Recent years have been brutal for many biotech companies trying to fundraise. At the 2024 JP Morgan Healthcare Conference earlier this year, many investors predicted that the funding environment may begin to ease later this year. Only time will tell whether the biotech fundraising climate actually improves.

In the meantime, it has become more important than ever to maintain accurate enrollment projections. They not only inform senior management and the company’s board, but also help you prepare for the period after data readout with as much runway as possible, accounting for both best-case and worst-case scenarios.

In this two-part series, we’ll go over the traditional Excel-based method of projecting enrollment and several best practices for clinical trial management that the Miracle team has observed across the biotech industry. For instance, we’ll cover how you can leverage real-time data from your Electronic Data Capture (EDC) system to achieve dynamic and accurate projections.

Today, we’ll start by going over the traditional spreadsheet-based forecast method and where it tends to fall short.

Traditional Forecast Method Using Spreadsheets (e.g., Excel, Smartsheet)

Historically, biotech companies have relied heavily on manual, spreadsheet-based models to forecast crucial milestones like completing enrollment and completing the study. It’s very common to see projection spreadsheets built with Microsoft Excel or Smartsheet. Even at sponsors that outsource their clinical trials to Contract Research Organizations (CROs), in-house Clinical Operations teams are heavily involved in preparing accurate enrollment projections and timelines for company board meetings.

This approach often involves manually entering numbers into a spreadsheet. At a high level, the inputs are the number of screenings and randomizations at each research site by month. Depending on which EDC system you use, obtaining those metrics could be as simple as a CSV export, or it could mean opening a tab for each site and counting the screenings and randomizations for each month. Larger Phase 3 studies can have hundreds of research sites, which makes it nearly impossible to do this manually in an efficient manner.
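To make the per-site, per-month tally concrete, here is a minimal Python sketch. It assumes a hypothetical export with one row per subject event and columns named site_id, event, and event_date. Real EDC exports will have their own column names and date formats, so treat the layout as an illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical export layout: one row per subject event. Real EDC exports
# will use different column names and date formats.
SAMPLE_EXPORT = """site_id,event,event_date
S01,screened,2024-01-05
S01,randomized,2024-01-19
S01,screened,2024-02-02
S02,screened,2024-01-11
S02,screened,2024-02-20
S02,randomized,2024-02-27
"""


def monthly_counts(csv_text):
    """Count events per (site, month, event type) from CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["event_date"][:7]  # "YYYY-MM" prefix of an ISO date
        counts[(row["site_id"], month, row["event"])] += 1
    return counts


counts = monthly_counts(SAMPLE_EXPORT)
for (site, month, event), n in sorted(counts.items()):
    print(site, month, event, n)
```

A tally like this replaces the site-by-site counting, but it still depends on someone pulling a fresh export each time.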

Many spreadsheet-based forecast models assume an intuition-based randomization rate for future months (e.g., 0.33 randomizations/month for lower performing sites and 0.75 randomizations/month for higher performing sites). Others apply average randomization rates across trial sites, relying on a ton of XLOOKUP and VLOOKUP functions over CSV exports from your EDC system that you copy-paste into the Excel model.
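The arithmetic behind such a model is simple, which is part of its appeal. The sketch below shows it in Python under stated assumptions: the site mix (6 lower performing, 4 higher performing), the 20 patients randomized so far, and the target of 60 are all made-up numbers, and the function name months_to_target is ours. Only the 0.33 and 0.75 rates come from the example above.

```python
import math


def months_to_target(site_rates, randomized_so_far, target):
    """Months needed to reach the target at a constant total monthly rate."""
    remaining = max(target - randomized_so_far, 0)
    if remaining == 0:
        return 0
    total_rate = sum(site_rates)  # randomizations per month across all sites
    if total_rate <= 0:
        raise ValueError("total randomization rate must be positive")
    return math.ceil(remaining / total_rate)


# Hypothetical study: 6 lower performing sites and 4 higher performing sites.
rates = [0.33] * 6 + [0.75] * 4
months = months_to_target(rates, randomized_so_far=20, target=60)
print(f"Projected months to complete enrollment: {months}")
```

Note that the whole forecast hinges on the assumed rates: nudging a few sites from one bucket to the other shifts the projected completion date.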

While this approach can work reasonably well in smaller Phase 1 studies, it can be very time-consuming to update in larger studies. More importantly, it is prone to yielding inaccurate forecasts, which could have significant ramifications for a biotech company’s runway and future planning. Accurate enrollment forecasts are critical to efficient clinical trial management.

Common Pitfalls with Spreadsheet-Based Enrollment Models

Here are several pitfalls to keep in mind when using spreadsheet-based enrollment models:

1. Heavily Skewed Performances of Sites

Many Excel-based models apply average screening and randomization rates across all trial sites, ignoring the individual performance variations that can significantly impact the accuracy of projections. In practice, only a handful of sites are typically responsible for the vast majority of randomizations in a clinical trial. As an example, the mean number of randomizations per site could sit at the 95th percentile of your study sites, meaning nearly every site performs below the mean. It would be unrealistic to apply the mean randomization rate in projections without considering how randomization rates are distributed across sites. You wouldn’t expect your trial sites with 0 randomizations to suddenly perform at the 95th percentile, so blindly applying the mean randomization rate would severely skew your projections toward an overly optimistic scenario.
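A small, made-up example shows how this can happen. In the Python sketch below, the randomization counts for 20 sites are hypothetical and chosen so that a single site carries most of the enrollment. They are not data from any real study.

```python
from statistics import mean, median

# Hypothetical randomizations to date at 20 sites: most sites have few or
# none, and a single site accounts for most of the enrollment.
randomizations = [0] * 12 + [1] * 5 + [2] * 2 + [40]

site_mean = mean(randomizations)
site_median = median(randomizations)
# Fraction of sites that have randomized fewer patients than the mean.
share_below_mean = sum(r < site_mean for r in randomizations) / len(randomizations)
# Fraction of all randomizations contributed by the single best site.
top_site_share = max(randomizations) / sum(randomizations)

print(f"mean per site:         {site_mean:.2f}")
print(f"median per site:       {site_median:.2f}")
print(f"sites below the mean:  {share_below_mean:.0%}")
print(f"top site's share:      {top_site_share:.0%}")
```

With these numbers, 19 of the 20 sites fall below the mean, so a projection that assigns the mean rate to every site assumes a level of performance that only one site has actually shown.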

2. Lack of Flexibility

If you rely heavily on Excel functions like XLOOKUP and VLOOKUP, you need to ensure the right values are placed in the right cells. Whenever you make a change, you need to propagate it to all affected cells across all your sheets. Even a simple change can easily turn into a troubleshooting nightmare, which severely limits how easily you can modify the model.

3. Time-Consuming Manual Processes

As your clinical trial accumulates additional screenings and randomizations, you need to update your projections based on the recent performance of your trial sites. To do so, you have to bring the relevant metrics from your EDC system into your spreadsheet tracker. Sometimes it’s as easy as jotting down the number of new screenings and randomizations at each site, but Phase 3 studies can have hundreds of sites. In other cases, you may be downloading CSV exports from your EDC system and copy-pasting values into your Excel projection model. Either way, someone on your team has to manually update the metrics before the projections can be updated.

4. File Size Limits

When downloading CSV exports from your EDC system, you may find that the download takes a long time, especially if your study is large. Copy-pasting those exports into your Excel-based enrollment projection model or uploading them into Smartsheet can take even longer. We’ve also observed that some online spreadsheet tools have file size limits for uploads, in which case you’d need to split the CSV export from your EDC system into multiple, smaller CSV files before uploading.
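If you do end up splitting an export, the main thing to get right is repeating the header row in every piece. Here is a minimal Python sketch of that idea. It splits by row count as a stand-in for file size, assumes the export has a header row, and uses a made-up two-column export. The function name split_csv and the row limit are illustrative choices, not limits from any particular tool.

```python
import csv
import io


def _to_csv(header, rows):
    """Serialize a header plus data rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(csv_text, max_rows):
    """Split CSV text into chunks of at most max_rows data rows,
    repeating the header row at the top of every chunk."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumes the export starts with a header row
    chunks, rows = [], []
    for row in reader:
        rows.append(row)
        if len(rows) == max_rows:
            chunks.append(_to_csv(header, rows))
            rows = []
    if rows:  # leftover rows form the final, shorter chunk
        chunks.append(_to_csv(header, rows))
    return chunks


# Hypothetical export with 5 data rows, split into chunks of 2 rows each.
export = "site_id,event\n" + "".join(f"S{i:02d},screened\n" for i in range(5))
chunks = split_csv(export, max_rows=2)
print(len(chunks), "chunks")
```

Each chunk can then be written to its own file and uploaded separately, which is exactly the kind of extra manual step that adds up over the life of a study.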

5. Expensive to Maintain and Update

Smaller biotech companies typically don’t have a data scientist dedicated to maintaining projection models. The work may fall to the Head of Clinical Operations, whose time could be better spent working with sites to address potential challenges. Or it may go to outside consultants, in which case every update or change to your Excel models comes with a cost. Even if the cost isn’t a big deal, it could take days to coordinate with the consultant, go over the proposed changes, and implement them.

Importance of Accurate Forecasts

Your team’s enrollment projections could have huge financial implications for your company’s future, so you need to be able to give your board an accurate, data-driven forecast that illustrates the best-case and worst-case scenarios for enrollment. This allows your leadership team to plan for the period after data readout, which could realistically yield positive or negative results, and to preserve as much runway as possible to support the company’s long-term success.

Your enrollment projections could also yield actionable insights that help optimize burn. For instance, if certain sites have higher screen failure rates than the mean across sites and have not randomized any patients, you could try removing those sites from your enrollment projection model. If removing those sites doesn’t impact your forecast (and assuming those sites aren’t where your KOLs are), you may want to have a discussion with your team about closing those sites and using the freed-up budget to activate new sites instead.
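As a rough illustration of that check, the Python sketch below flags sites with an above-average screen failure rate and no randomizations, then compares the forecast with and without them. Everything in it is an assumption for the sake of the example: the four sites, their counts, the 30 remaining randomizations, and the choice to project each site at its own historical monthly rate.

```python
import math

# Hypothetical per-site totals to date. Every site is assumed to have
# screened at least one patient and been active for at least one month.
sites = {
    "S01": {"screened": 20, "screen_failed": 8, "randomized": 12, "months_active": 6},
    "S02": {"screened": 10, "screen_failed": 4, "randomized": 6, "months_active": 6},
    "S03": {"screened": 6, "screen_failed": 6, "randomized": 0, "months_active": 6},
    "S04": {"screened": 5, "screen_failed": 5, "randomized": 0, "months_active": 6},
}


def screen_failure_rate(site):
    return site["screen_failed"] / site["screened"]


def months_to_go(site_data, remaining):
    """Months to randomize the remaining patients if every site keeps
    randomizing at its own historical monthly rate."""
    rate = sum(s["randomized"] / s["months_active"] for s in site_data.values())
    return math.ceil(remaining / rate)


mean_sfr = sum(screen_failure_rate(s) for s in sites.values()) / len(sites)

# Candidates for removal: above-average screen failures, zero randomizations.
candidates = [
    name for name, s in sites.items()
    if s["randomized"] == 0 and screen_failure_rate(s) > mean_sfr
]
kept = {name: s for name, s in sites.items() if name not in candidates}

print("candidates:", candidates)
print("forecast with all sites:  ", months_to_go(sites, remaining=30), "months")
print("forecast without them:    ", months_to_go(kept, remaining=30), "months")
```

Under this per-site model the two forecasts match, because the flagged sites contribute nothing to the projected rate. A model that applies the mean rate to every site would instead show the forecast getting worse, which is another way the averaging assumption can mislead.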


In Part 2 of this series, we’ll go over some of the best practices that our team has observed in the industry, including how Miracle has been supporting our customers with data-driven enrollment projections that integrate seamlessly with your EDC system.

If any of the pain points we covered above resonate with you (or if we missed any that you encounter), please don’t hesitate to reach out.