Supercharging Statistical Analysis with ARDs and the {cards} R Package

Enhancing Automation, Traceability, and Reliability in Your Statistical Workflow.

Becca Krouse and Davide Garolini, GSK/Roche

Today’s Plan

  • Discuss our experience using an ARD-first approach for tables, listings, and graphs (TLGs).

  • But first, a little background

    • CDISC’s Analysis Results Standard (ARS)

    • ARDs with the {cards}+{cardx} packages

    • Tables with the {gtsummary} package

CDISC’s Analysis Results Standard (ARS)

How can we have reproducible results when there are so many layouts and formats?

Analysis Results Standard (ARS)

  • Goal: To improve the quality and efficiency of clinical reporting.
  • Enables: Automation, Reproducibility, Reusability, and Traceability.

Separates the result from the presentation: maximum flexibility!

  • Analysis Results Data (ARD) contain the statistical outcome (e.g., a mean of 25.3).
  • They do not contain display instructions (e.g., font size or cell color).
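Here is a minimal base R sketch of that separation. It is a simplification, not the full ARS model: a hand-built, ARD-style data frame stores unrounded statistics, and two hypothetical formatting functions, `format_compact()` and `format_detailed()`, render the same results in different ways. The values are the `AGE` summary statistics from the {cards} output shown in this deck.

```r
# A toy, hand-built ARD-style data frame: one row per statistic,
# unrounded values, and no display instructions.
ard <- data.frame(
  variable  = "AGE",
  stat_name = c("N", "mean", "sd"),
  stat      = c(254, 75.087, 8.246)
)

# Hypothetical presentation layer 1: whole numbers
format_compact <- function(ard) {
  s <- setNames(ard$stat, ard$stat_name)
  sprintf("%.0f (%.0f)", s[["mean"]], s[["sd"]])
}

# Hypothetical presentation layer 2: one decimal place
format_detailed <- function(ard) {
  s <- setNames(ard$stat, ard$stat_name)
  sprintf("%.1f (%.1f)", s[["mean"]], s[["sd"]])
}

format_compact(ard)   # "75 (8)"
format_detailed(ard)  # "75.1 (8.2)"
```

The stored numbers never change. Switching display conventions only touches the presentation layer.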

CDISC’s Analysis Results Standard (ARS)

  • The ARS provides a metadata-driven infrastructure for analysis

  • {cards} serves as the engine for the analysis

  • {gtsummary} is the engine for summary table display

Analysis Results Data (ARD)

  • Once an ARD has been created, its results can be reused again and again for subsequent reporting needs.
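Here is a sketch of that reuse. It assumes {cards} is installed and uses its bundled `ADSL` data; the file name `ard_age.rds` and the object names are only illustrative. The ARD is created once, saved, and reloaded. A single value is then pulled from it, as it might be for in-text reporting.

```r
library(cards)

# Create the ARD once
ard <- ard_continuous(ADSL, by = ARM, variables = AGE)

# Persist it so any later reporting step can start from the file
path <- file.path(tempdir(), "ard_age.rds")
saveRDS(ard, path)
ard_reused <- readRDS(path)

# Reuse a stored result, e.g. for a sentence in a report narrative.
# group1_level is a list column, so compare each element as character.
is_placebo <- vapply(
  ard_reused$group1_level,
  \(lvl) identical(as.character(lvl), "Placebo"),
  logical(1)
)
placebo_mean <- ard_reused$stat[ard_reused$stat_name == "mean" & is_placebo][[1]]
sprintf("The mean age in the placebo arm was %.1f years.", placebo_mean)
```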

ARDs using {cards}


{cards}: Introduction

  • Part of the Pharmaverse

  • Contains a variety of utilities for creating ARDs

  • Can be used both within the ARS workflow and on its own

  • 52K downloads per month 🤯

What does an ARD look like?

library(cards)

# create ARD with default summary statistics
ADSL |> 
  ard_continuous(
    variables = AGE
  )
{cards} data frame: 8 x 8
  variable   context stat_name stat_label   stat fmt_fn
1      AGE continuo…         N          N    254      0
2      AGE continuo…      mean       Mean 75.087      1
3      AGE continuo…        sd         SD  8.246      1
4      AGE continuo…    median     Median     77      1
5      AGE continuo…       p25         Q1     70      1
6      AGE continuo…       p75         Q3     81      1
7      AGE continuo…       min        Min     51      1
8      AGE continuo…       max        Max     89      1
ℹ 2 more variables: warning, error

What does an ARD look like?

It’s simple to pass any function to ard_continuous() (base R functions, functions from other packages, user-defined functions, etc.)

ADSL |> 
  ard_continuous(
    by = ARM,
    variables = AGE,
    statistic = ~list(cv = \(x) sd(x) / mean(x))
  )
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label  stat
1    ARM      Placebo      AGE        cv         cv 0.114
2    ARM    Xanomeli…      AGE        cv         cv 0.106
3    ARM    Xanomeli…      AGE        cv         cv  0.11
ℹ 4 more variables: context, fmt_fn, warning, error

{cards}: ard_categorical()

ADSL |> 
  ard_categorical(
    by = ARM,
    variables = AGEGR1
  ) |> head(n = 5)
{cards} data frame: 5 x 11
  group1 group1_level variable variable_level stat_name stat_label  stat
1    ARM      Placebo   AGEGR1            <65         n          n    14
2    ARM      Placebo   AGEGR1            <65         N          N    86
3    ARM      Placebo   AGEGR1            <65         p          % 0.163
4    ARM      Placebo   AGEGR1            >80         n          n    30
5    ARM      Placebo   AGEGR1            >80         N          N    86
ℹ 4 more variables: context, fmt_fn, warning, error

Other summary functions: ard_dichotomous(), ard_hierarchical(), ard_complex(), and ard_missing(). Their results can be stacked into a single data frame. 🥞
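Here is a short sketch of stacking. It assumes {cards} and its bundled `ADSL` data. `bind_ard()` combines ARDs produced by different summary functions into one long data frame. `ard_stack()`, which appears in the ARD-first demographics example, does a similar job while sharing a single `.by` specification.

```r
library(cards)

# Three ARDs from three different summary functions, all grouped by ARM
ard_all <- bind_ard(
  ard_continuous(ADSL, by = ARM, variables = AGE),
  ard_categorical(ADSL, by = ARM, variables = AGEGR1),
  ard_missing(ADSL, by = ARM, variables = AGE)
)

# One long data frame; the `context` column records which function
# produced each row
table(ard_all$context)
```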

{cardx} (read: extra cards)

{cardx}

  • Extension of the {cards} package, providing additional functions to create Analysis Results Datasets (ARDs).

  • The {cardx} package exports many ard_*() functions for statistical methods.


{cardx}

  • Exports ARD frameworks for statistical analyses from many packages

    • {stats}
    • {car}
    • {effectsize}
    • {emmeans}
    • {geepack}
    • {lme4}
    • {parameters}
    • {smd}
    • {survey}
    • {survival}
  • This list is growing (rather quickly) 🌱

From ARDs to tables with {gtsummary}

{gtsummary} in a (stat) nutshell

{gtsummary} is one of the most popular packages for creating summary tables in the R ecosystem:

  • 1,500,000+ installations from CRAN

  • 1100+ GitHub stars

  • 300+ contributors

  • 50+ code contributors

  • Won the 2021 ASA Innovation in Programming Award and the 2024 Posit Pharma Table Contest

{gtsummary} runs on ARDs!

Demographics Example

library(gtsummary)

tbl <- dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  tbl_summary(
    by = TRT01A,
    include = c(AGE, AGEGR1),
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})")
  ) |> 
  add_overall() |> 
  add_stat_label()
tbl
Characteristic             Overall       Placebo       Xanomeline High Dose   Xanomeline Low Dose
                           N = 254       N = 86        N = 72                 N = 96
Age
    Mean (SD)              75 (8)        75 (9)        74 (8)                 76 (8)
    Median (Q1, Q3)        77 (70, 81)   76 (69, 82)   76 (70, 79)            78 (71, 82)
Pooled Age Group 1, n (%)
    >64                    221 (87%)     72 (84%)      61 (85%)               88 (92%)
    18-64                  33 (13%)      14 (16%)      11 (15%)               8 (8.3%)

Demographics Example

  • Extract the ARD from the table object
gather_ard(tbl) |> purrr::pluck("tbl_summary")
{cards} data frame: 79 x 12
   group1 group1_level variable variable_level stat_name stat_label  stat
1  TRT01A      Placebo   AGEGR1            >64         n          n    72
2  TRT01A      Placebo   AGEGR1            >64         N          N    86
3  TRT01A      Placebo   AGEGR1            >64         p          % 0.837
4  TRT01A      Placebo   AGEGR1          18-64         n          n    14
5  TRT01A      Placebo   AGEGR1          18-64         N          N    86
6  TRT01A      Placebo   AGEGR1          18-64         p          % 0.163
7  TRT01A    Xanomeli…   AGEGR1            >64         n          n    61
8  TRT01A    Xanomeli…   AGEGR1            >64         N          N    72
9  TRT01A    Xanomeli…   AGEGR1            >64         p          % 0.847
10 TRT01A    Xanomeli…   AGEGR1          18-64         n          n    11
ℹ 69 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 5 more variables: context, fmt_fn, warning, error, gts_column

Demographics Example: ARD-first

dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  cards::ard_stack(
    .by = TRT01A, .overall = TRUE, .attributes = TRUE,
    ard_continuous(variables = AGE),
    ard_categorical(variables = AGEGR1)
  ) |> 
  tbl_ard_summary(
    by = TRT01A,
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
    overall = TRUE
  ) |> 
  add_stat_label()
Characteristic             Overall             Placebo             Xanomeline High Dose   Xanomeline Low Dose
Age
    Mean (SD)              75.1 (8.2)          75.2 (8.6)          73.8 (7.9)             76.0 (8.1)
    Median (Q1, Q3)        77.0 (70.0, 81.0)   76.0 (69.0, 82.0)   75.5 (70.0, 79.0)      78.0 (71.0, 82.0)
Pooled Age Group 1, n (%)
    >64                    221 (87.0%)         72 (83.7%)          61 (84.7%)             88 (91.7%)
    18-64                  33 (13.0%)          14 (16.3%)          11 (15.3%)             8 (8.3%)

Our Pilot

Our ARD-based Pilot

  • We wanted to dip our toes into CDISC’s Analysis Results Standard (ARS)

    • We did not try to implement the full model; we just wanted a taste.

    • We used some metadata to drive the creation of TLGs.

    • We used an ARD-first approach to create our TLGs: the {cards} R package to create ARDs and the {gtsummary} package for tables.
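Metadata driving TLG creation can be sketched as follows. This is a hypothetical illustration, not the pilot's code or the CDISC ARS metadata model. A tiny metadata table (`analysis_meta`, with made-up columns `ard_fun` and `variable`) names which {cards} function to run for each analysis, and the loop simply follows it. It assumes {cards} and its bundled `ADSL` data.

```r
library(cards)

# Hypothetical analysis metadata: one row per analysis
analysis_meta <- data.frame(
  ard_fun  = c("ard_continuous", "ard_categorical"),
  variable = c("AGE", "AGEGR1")
)

# Drive ARD creation from the metadata, one analysis per row
ards <- lapply(seq_len(nrow(analysis_meta)), function(i) {
  fun <- getExportedValue("cards", analysis_meta$ard_fun[i])
  fun(ADSL, by = "ARM", variables = dplyr::all_of(analysis_meta$variable[i]))
})

# Combine the per-analysis ARDs into one
ard_from_meta <- do.call(bind_ard, ards)
```

Adding an analysis then means adding a metadata row rather than editing the loop.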

What we liked!

  • Using {cards}+{cardx}+{gtsummary}, we created every summary for a trial read-out. 🕺🕺🕺

  • Their intuitive design was a key factor in adoption, even when no prior training was provided.

What we liked!

  • We loved the ARD-based results, which made:

    • QC easy and straightforward.
    • repurposing easy for different reporting needs.
    • Automation?

What would we do differently?

  • The full ARS model is metadata-driven: the metadata dictate the layout of the tables.

  • Non-“standard” tables can be problematic when we use metadata for layouts.

  • R scripts are easier to work with than metadata files.

🕺🕺 ARD Team 🕺🕺

  • Won the 2021 American Statistical Association (ASA) Innovation in Programming Award

  • Agustin Calatroni and I won the 2024 Posit Pharma Table Contest by re-creating an entire CSR with the {gtsummary} package

ARD uses outside of the ARS

  • Rethinking QC

    • A highly structured data frame of results is much simpler to QC than statistics in a summary table or figure.

  • Flexible data file types

    • An ARD can be saved as a dataset (rds, xpt, parquet, etc.), YAML, or JSON file.

  • ARDs integrate with the {gtsummary} package to create summary tables
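Here is a sketch of QC against an ARD. It assumes {cards} and its bundled `ADSL` data. An independent base R calculation is compared arm by arm with the stored means. There is no need to parse numbers out of a rendered table.

```r
library(cards)

# "Production" result, stored as an ARD
prod_ard <- ard_continuous(ADSL, by = ARM, variables = AGE)
prod_means <- prod_ard[prod_ard$stat_name == "mean", ]
prod <- setNames(
  unlist(prod_means$stat),
  as.character(unlist(prod_means$group1_level))
)

# Independent QC calculation in base R (names are the ARM values)
qc <- tapply(ADSL$AGE, ADSL$ARM, mean)

# Compare number against number, matched by arm name
stopifnot(isTRUE(all.equal(unname(prod[names(qc)]), as.vector(qc))))
```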

{cardx} t-test Example

  • As expected, we see results such as the mean difference, the confidence interval, and the p-value.

  • We also see the function’s inputs, which is incredibly useful for reuse; e.g., we know that we did not assume equal variances.

pharmaverseadam::adsl |> 
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)
{cards} data frame: 14 x 9
   group1 variable   context   stat_name stat_label      stat
1     ARM      AGE stats_t_…    estimate  Mean Dif…    -1.286
2     ARM      AGE stats_t_…   estimate1  Group 1 …    74.381
3     ARM      AGE stats_t_…   estimate2  Group 2 …    75.667
4     ARM      AGE stats_t_…   statistic  t Statis…     -1.03
5     ARM      AGE stats_t_…     p.value    p-value     0.304
6     ARM      AGE stats_t_…   parameter  Degrees …   165.595
7     ARM      AGE stats_t_…    conf.low  CI Lower…     -3.75
8     ARM      AGE stats_t_…   conf.high  CI Upper…     1.179
9     ARM      AGE stats_t_…      method     method Welch Tw…
10    ARM      AGE stats_t_… alternative  alternat… two.sided
11    ARM      AGE stats_t_…          mu    H0 Mean         0
12    ARM      AGE stats_t_…      paired  Paired t…     FALSE
13    ARM      AGE stats_t_…   var.equal  Equal Va…     FALSE
14    ARM      AGE stats_t_…  conf.level  CI Confi…      0.95
ℹ 3 more variables: fmt_fn, warning, error
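Because the inputs are stored next to the results, the test can be re-run from the ARD alone. This sketch assumes {cardx} and the packages it uses for this method are installed. It uses `cards::ADSL` in place of `pharmaverseadam::adsl`. `get_stat()` is a hypothetical helper, not part of {cards} or {cardx}.

```r
library(cardx)

adsl <- subset(cards::ADSL, ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose"))
ard_tt <- ard_stats_t_test(adsl, by = ARM, variables = AGE)

# Hypothetical helper: pull one statistic out of the ARD by stat_name
get_stat <- function(ard, name) ard$stat[[which(ard$stat_name == name)]]

# Re-run the test with the settings recorded in the ARD
rerun <- t.test(
  AGE ~ ARM, data = adsl,
  var.equal  = get_stat(ard_tt, "var.equal"),
  conf.level = get_stat(ard_tt, "conf.level")
)
all.equal(rerun$p.value, get_stat(ard_tt, "p.value"))
```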

{cardx} Regression

  • Includes functionality to summarize nearly every type of regression model in the R ecosystem:

betareg::betareg(), biglm::bigglm(), brms::brm(), cmprsk::crr(), fixest::feglm(), fixest::femlm(), fixest::feNmlm(), fixest::feols(), gam::gam(), geepack::geeglm(), glmmTMB::glmmTMB(), lavaan::lavaan(), lfe::felm(), lme4::glmer.nb(), lme4::glmer(), lme4::lmer(), logitr::logitr(), MASS::glm.nb(), MASS::polr(), mgcv::gam(), mice::mira, mmrm::mmrm(), multgee::nomLORgee(), multgee::ordLORgee(), nnet::multinom(), ordinal::clm(), ordinal::clmm(), parsnip::model_fit, plm::plm(), pscl::hurdle(), pscl::zeroinfl(), rstanarm::stan_glm(), stats::aov(), stats::glm(), stats::lm(), stats::nls(), survey::svycoxph(), survey::svyglm(), survey::svyolr(), survival::cch(), survival::clogit(), survival::coxph(), survival::survreg(), tidycmprsk::crr(), VGAM::vglm() (and more)

{cardx} Regression Example

library(survival); library(ggsurvfit)

# build model
mod <- pharmaverseadam::adtte_onco |> 
  dplyr::filter(PARAM %in% "Progression Free Survival") |>
  coxph(Surv_CNSR() ~ ARM, data = _)

# put model in a summary table
tbl <- gtsummary::tbl_regression(mod, exponentiate = TRUE) |> 
  gtsummary::add_n(location = c('label', 'level')) |> 
  gtsummary::add_nevent(location = c('label', 'level'))
tbl


Characteristic                  N     Event N   HR     95% CI       p-value
Description of Planned Arm      254   6
    Placebo                     86    3
    Xanomeline High Dose        84    2         3.00   0.39, 22.9   0.3
    Xanomeline Low Dose         84    1         1.27   0.11, 14.3   0.8
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

When things go wrong 😱

What happens when statistics are un-calculable?

ard_gone_wrong <- 
  cards::ADSL |> 
  cards::ard_continuous(
    by = ARM,
    variables = AGEGR1,
    statistic = ~list(kurtosis = \(x) e1071::kurtosis(x))
  )
ard_gone_wrong
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label stat   warning     error
1    ARM      Placebo   AGEGR1  kurtosis   kurtosis      argument… non-nume…
2    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
3    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
ℹ 2 more variables: context, fmt_fn
cards::print_ard_conditions(ard_gone_wrong)
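The idea behind the `warning` and `error` columns can be sketched in base R. This is a simplified illustration, not the {cards} implementation. The hypothetical `capture_stat()` evaluates a statistic and records any warning or error message. It returns them alongside the (possibly missing) result instead of halting the analysis.

```r
# Run a statistic function and record conditions instead of letting them
# stop the whole analysis.
capture_stat <- function(fn, x) {
  warn_msgs <- character(0)
  err_msg <- character(0)
  stat <- withCallingHandlers(
    tryCatch(
      fn(x),
      error = function(e) {
        err_msg <<- conditionMessage(e)
        NULL  # no statistic when the calculation fails
      }
    ),
    warning = function(w) {
      warn_msgs <<- c(warn_msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # record the warning, then keep going
    }
  )
  list(stat = stat, warning = warn_msgs, error = err_msg)
}

# Hypothetical kurtosis-like statistic: on character input, mean() warns
# and the subtraction then errors, much like the AGEGR1 example above.
kurtosis_like <- function(x) {
  m <- mean(x)
  sum((x - m)^4) / length(x)
}

bad  <- capture_stat(kurtosis_like, c("<65", ">80", "65-80"))
good <- capture_stat(kurtosis_like, c(1, 2, 3))
```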

{gtsummary} extras

  • {gtsummary} tables are composable, meaning complex tables can be cobbled together one piece at a time and combined.

    • Many other functions create common structures, such as tbl_continuous(), tbl_hierarchical(), tbl_cross(), tbl_wide_summary(), and more.

    • add_*() functions add columns or summary statistics to an existing table.

    • tbl_merge() and tbl_stack() combine two or more tables.

    • And many more functions are available for creating beautiful tables! 🤩

  • Check out the PHUSE US Connect Workshop (later today) for more information!
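Here is a sketch of composing tables. It assumes {gtsummary} and the `ADSL` data bundled with {cards}. `tbl_stack()` combines tables vertically, and `tbl_merge()` places them side by side.

```r
library(gtsummary)

adsl <- cards::ADSL

# Two small pieces built separately
tbl_age   <- tbl_summary(adsl, by = ARM, include = AGE)
tbl_agegr <- tbl_summary(adsl, by = ARM, include = AGEGR1)

# Stack them vertically into one table
tbl_stacked <- tbl_stack(list(tbl_age, tbl_agegr))

# Place an overall summary and a by-arm summary side by side
tbl_overall <- tbl_summary(adsl, include = c(AGE, AGEGR1))
tbl_by_arm  <- tbl_summary(adsl, by = ARM, include = c(AGE, AGEGR1))
tbl_side_by_side <- tbl_merge(
  list(tbl_overall, tbl_by_arm),
  tab_spanner = c("**Overall**", "**By Arm**")
)
```

In practice, `add_overall()` is the direct way to add an overall column. `tbl_merge()` is used here only to show side-by-side merging.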

{gtsummary} extras

  • If the structured tbl_*() and tbl_ard_*() functions don’t exactly meet your needs, use as_gtsummary()!

  • The as_gtsummary() function ingests a data frame and adds the {gtsummary} framework around it: great for listings and highly bespoke tables.

cards::ADAE[1:7, c("USUBJID", "AESOC", "AETERM", "AESEV")] |> 
  as_gtsummary() |> 
  modify_column_alignment(everything(), "left") |> 
  as_gt(groupname_col = "USUBJID")
Primary System Organ Class                            Reported Term for the Adverse Event   Severity/Intensity
01-701-1015
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS  APPLICATION SITE ERYTHEMA             MILD
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS  APPLICATION SITE PRURITUS             MILD
GASTROINTESTINAL DISORDERS                            DIARRHOEA                             MILD
01-701-1023
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                ERYTHEMA                              MILD
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                ERYTHEMA                              MODERATE
CARDIAC DISORDERS                                     ATRIOVENTRICULAR BLOCK SECOND DEGREE  MILD
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                ERYTHEMA                              MILD

{gtsummary} extras