TrialDura | Md Zabirul Islam

Overview

Why Trial Duration Prediction Matters

Interpretable, multimodal timeline forecasting for clinical trial planning and budgeting

Clinical trials are often long and expensive, so reliable duration prediction supports budgeting, staffing, recruitment planning, and operational risk management. TrialDura formulates duration prediction as a supervised regression problem and estimates trial duration in years from multimodal trial records.

Inputs: (1) trial phase (I–IV, one-hot), (2) disease set (ICD-10 codes), (3) drug molecules (drug names), and (4) eligibility criteria (inclusion + exclusion text). TrialDura embeds drug/disease/criteria text with Bio-BERT and models criteria using hierarchical attention to obtain interpretable importance signals.

Dataset scale: 114,604 trials in total, split temporally: training and validation trials start before Jan 1, 2019, and test trials start after that date. The paper reports 77,818 training records and 36,786 test records.
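The temporal split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's preprocessing code: the record layout, the field name start_date, the function name temporal_split, and the example trials are all hypothetical, and sending a trial that starts exactly on Jan 1, 2019 to the test side is an assumption the text does not settle.

```python
from datetime import date

# Cutoff from the text: trials starting before Jan 1, 2019 are train/validation.
CUTOFF = date(2019, 1, 1)

def temporal_split(trials, cutoff=CUTOFF):
    """Split records by start date so no later trial leaks into training."""
    train_val = [t for t in trials if t["start_date"] < cutoff]
    # Assumption: a trial starting exactly on the cutoff goes to the test set.
    test = [t for t in trials if t["start_date"] >= cutoff]
    return train_val, test

# Hypothetical records; IDs and dates are made up for illustration.
trials = [
    {"trial_id": "TRIAL-A", "start_date": date(2015, 6, 1)},
    {"trial_id": "TRIAL-B", "start_date": date(2018, 12, 31)},
    {"trial_id": "TRIAL-C", "start_date": date(2019, 1, 1)},
    {"trial_id": "TRIAL-D", "start_date": date(2021, 3, 15)},
]
train_val, test = temporal_split(trials)
```

Splitting by date rather than at random mirrors the planning setting: the model is evaluated only on trials that began after everything it was trained on.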

Architecture

Hierarchical Attention Transformer (TrialDura)

Bio-BERT embeddings + hierarchical attention over eligibility criteria + regression head

1 Bio-BERT Embeddings

Drug names and disease codes are embedded with Bio-BERT into 768-D vectors (token embeddings averaged). Eligibility criteria sentences use the Bio-BERT CLS token embedding.
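The two pooling choices described here differ only in how token vectors are reduced to one vector. The sketch below uses random arrays as a stand-in for encoder output, so it runs without downloading a model; the function names mean_pool and cls_pool, the padding mask, and the sequence length are assumptions made for illustration.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()

def cls_pool(token_embeddings):
    """Take the vector at position 0, where BERT-style models place [CLS]."""
    return token_embeddings[0]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 768))   # stand-in for encoder output: 6 tokens x 768-D
mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

drug_vec = mean_pool(tokens, mask)   # averaging, as described for drugs and diseases
sentence_vec = cls_pool(tokens)      # CLS vector, as described for criteria sentences
```

Both results are 768-D vectors, so they can be fed to the same downstream layers.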

2 Hierarchical Attention for Eligibility Criteria

Word-/token-level signals build sentence embeddings; a transformer attention block captures sentence-to-sentence relationships across inclusion and exclusion criteria to form a paragraph-level representation.
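A minimal sketch of the sentence-level step, assuming single-head scaled dot-product self-attention followed by mean pooling into a paragraph vector. The actual TrialDura block may differ (number of heads, layers, residual connections, pooling); the projection matrices here are random rather than learned, and all names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(sentences, w_q, w_k, w_v):
    """Each sentence attends to every sentence, including itself."""
    q, k, v = sentences @ w_q, sentences @ w_k, sentences @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # row i: how sentence i weighs the others
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 768
sentences = rng.normal(size=(5, d))      # stand-in for 5 criteria sentence embeddings
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

contextual, weights = self_attention(sentences, w_q, w_k, w_v)
paragraph_vec = contextual.mean(axis=0)  # assumed pooling to a paragraph-level vector
```

The rows of weights are what make this step inspectable: each row is a probability distribution over the criteria sentences.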

3 Multimodal Concatenation

Phase (one-hot) + drug embedding + disease embedding + eligibility embedding are concatenated into a unified representation for prediction.
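The concatenation can be illustrated as follows. The helper names one_hot_phase and build_features are hypothetical, the embeddings are random stand-ins, and the 768-D size of the eligibility embedding is an assumption (the text gives 768-D only for the Bio-BERT vectors).

```python
import numpy as np

PHASES = ["I", "II", "III", "IV"]

def one_hot_phase(phase):
    """Encode the trial phase as a 4-D indicator vector."""
    vec = np.zeros(len(PHASES))
    vec[PHASES.index(phase)] = 1.0
    return vec

def build_features(phase, drug_vec, disease_vec, criteria_vec):
    """Join all modalities into one flat feature vector."""
    return np.concatenate([one_hot_phase(phase), drug_vec, disease_vec, criteria_vec])

rng = np.random.default_rng(0)
drug_vec, disease_vec, criteria_vec = (rng.normal(size=768) for _ in range(3))
features = build_features("II", drug_vec, disease_vec, criteria_vec)
# Under these assumed sizes the result has 4 + 3 * 768 = 2308 entries.
```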

4 MLP Regression Head

A multi-layer perceptron predicts continuous trial duration (years) and is trained with a mean squared error (MSE) loss.
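A forward pass and loss for such a head might look like the sketch below. The depth (one hidden layer), the hidden width of 64, the ReLU activation, and the input size of 2308 (four phase slots plus three 768-D embeddings) are all assumptions; the weights are random, so the predictions are meaningless until trained.

```python
import numpy as np

def mlp_forward(x, params):
    """Linear -> ReLU -> linear, producing one duration estimate per trial."""
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])
    return (hidden @ params["w2"] + params["b2"]).squeeze(-1)

def mse_loss(pred, target):
    """Mean squared error between predicted and actual durations."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
in_dim, hidden_dim = 2308, 64  # both sizes are assumptions
params = {
    "w1": rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim, 1)),
    "b2": np.zeros(1),
}

x = rng.normal(size=(3, in_dim))     # stand-in features for 3 trials
y_true = np.array([1.5, 2.0, 0.8])   # made-up durations in years
y_pred = mlp_forward(x, params)
loss = mse_loss(y_pred, y_true)
```

Because MSE squares each error, trials whose duration is badly mispredicted dominate the loss.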

MAE (TrialDura): 1.044 yrs

RMSE (TrialDura): 1.390 yrs

Pearson Corr.: 0.463

Clinical Trials: 114,604

Table 1: Example Clinical Trial Record

A concrete trial instance illustrating the multimodal inputs used by TrialDura: trial phase, disease (ICD-coded), drug molecules, and structured eligibility criteria (inclusion/exclusion). The start date, completion date, and derived duration provide the supervision signal for regression.

Experiments

Large-Scale Evaluation

Benchmarking against classical ML and neural baselines on 114,604 ClinicalTrials.gov records

Table 4: Main Performance Comparison

TrialDura achieves the best overall results (lowest MAE and RMSE; highest R² and Pearson correlation) compared to MEAN, Linear Regression, GBDT, RF, XGBoost, AdaBoost, and MLP baselines.
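For reference, the four reported metrics can be computed as in the sketch below. The function name regression_metrics and the toy values are illustrative only; they are not data from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2, and Pearson correlation for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(ss_tot * var_p)
    return {"mae": mae, "rmse": rmse, "r2": r2, "pearson": pearson}

# Toy durations in years, made up for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
metrics = regression_metrics(y_true, y_pred)
```

MAE and RMSE are in years, while R² and Pearson correlation are unitless, which is why the page reports the first two with units.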

Figure 2: Model Performance Comparison

Visualization of MAE and RMSE across baselines. TrialDura has the lowest error on both metrics, matching the quantitative ranking in Table 4.

Table 5: Performance Across Trial Phases

TrialDura performance is reported for Phases 1–4 and overall, showing that prediction difficulty varies by phase.

Interpretability

Explaining Predictions with Shapley Values

Identifying which eligibility criteria sentences/terms drive duration estimates

Figure 3: Shapley-Based Text Attribution

Example interpretability visualization for Clinical Trial NCT03553810. Darker segments indicate higher Shapley value / stronger contribution to the predicted duration, providing transparency into which eligibility criteria constraints most influence the model.
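To make the attribution idea concrete, the sketch below computes exact Shapley values for three criteria sentences against a toy scoring function. It is not the paper's attribution pipeline: the sentences, their contributions, and the function predicted_duration are invented stand-ins for the trained model, and exact enumeration is feasible only for a handful of sentences.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values: weighted average marginal contribution of each player."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                kept = set(subset)
                total += weight * (value_fn(kept | {p}) - value_fn(kept))
        phi[p] = total
    return phi

# Toy stand-in for the model: per-sentence effects in years (made up).
CONTRIB = {"age >= 18": 0.1, "prior chemotherapy": 0.6, "ECOG status 0-1": 0.3}

def predicted_duration(kept):
    """Hypothetical prediction when only the sentences in `kept` are present."""
    extra = sum(CONTRIB[s] for s in kept)
    # Made-up interaction: these two criteria together add 0.2 years.
    if {"prior chemotherapy", "ECOG status 0-1"} <= kept:
        extra += 0.2
    return 1.0 + extra

phi = shapley_values(list(CONTRIB), predicted_duration)
```

The interaction term is split evenly between the two sentences involved, and the values sum to the difference between the prediction with all sentences and the prediction with none.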

Analysis

Ablation Study

Understanding the contribution of unified training and sentence aggregation choices

Table 6: Phase-Specific vs Unified Model

Phase-specific models show negative R² and near-zero Pearson correlation, whereas the unified TrialDura model achieves positive R² and correlation, which supports learning across phases.
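A negative R² is possible because of how the coefficient of determination is defined. Using notation introduced here for illustration (y_i for the actual duration, \hat{y}_i for the prediction, \bar{y} for the mean actual duration over the n evaluated trials):

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The ratio exceeds 1 whenever the model's squared error is larger than that of always predicting \bar{y}, so a negative R² means the model does worse than that constant prediction on the evaluated trials.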

Table 7: Aggregation Methods (Max / Mean / CLS)

Max pooling, mean pooling, and CLS token aggregation yield very similar results; mean pooling has a marginally higher R² in the reported experiments.

Case Study

Predicted vs Actual Trial Duration

Examples across phases, diseases, and drugs

Table 8: Predicted and Actual Duration Examples

TrialDura predictions track real trial durations across diverse conditions and drugs, illustrating practical forecasting utility for planning and resource allocation.

Reference

Citation

@article{yue2024trialdura,
  title={TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction},
  author={Yue, Ling and Li, Jonathan and Islam, Md Zabirul and Xia, Bolun and Fu, Tianfan and Chen, Jintai},
  journal={arXiv preprint arXiv:2404.13235},
  year={2024}
}