ESOC 2023: Predicting physical activity level after stroke

Conference abstract
Physical activity
Elastic net
R
A little background to our abstract presented at the ESOC 2023 conference in Munich.
Published

May 3, 2023

Modified

October 9, 2024

Intro

I am presenting a poster at the European Stroke Organisation Conference (ESOC) 2023 on predicting changes in physical activity after stroke.

The poster will be part of the poster viewing programme on Wednesday, May 24, 2023.

About the poster

See the poster here.

I wanted to diverge from the traditional text-heavy poster format. For the session at ESOC, I will be at the poster stand most of the time to talk about our work, and the abstract will be available for download for the participants.

The poster was created in PowerPoint, as that was where I had a template available, though the template was later abandoned. The font is Jost*, a free and open-source typeface inspired by the German design tradition. The icons are from Material Design Icons, which are also open source.

Background

Physical activity (PA) reduces the risk of stroke and improves functional outcome. We aimed to investigate predictors of decrease and increase in PA after stroke, with a particular interest in identifying patients at increased risk of PA decline.

Methods

All analyses were performed using R and RStudio. We used the elastic net regression model as implemented in the glmnet package for R.1
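For reference, for a binary outcome glmnet minimises the negative log-likelihood plus the elastic net penalty, where the mixing parameter $\alpha$ blends the ridge ($\alpha = 0$) and lasso ($\alpha = 1$) penalties and $\lambda$ controls the overall amount of shrinkage:

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right) \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1\right]
$$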

I have used the great book “An Introduction to Statistical Learning with Applications in R” (James et al. 2021). The book is freely available, and the authors have even recorded short talks on each chapter (though only for the first edition). I believe this book is the main curriculum for getting started with statistical learning (or machine learning, for that matter).

The script used for creating a regularised prediction model is below.

Optimisation and regularisation steps
## ====================================================================
## Step 0: data import and wrangling
## ====================================================================

# source("data_format.R")
y1 <- factor(as.integer(y) - 1) # Outcome is required to be a factor of 0 or 1


## ====================================================================
## Step 1: settings
## ====================================================================

## Folds (named folds rather than c to avoid masking base::c())
K <- 10
set.seed(3)
folds <- caret::createFolds(y = y,
                            k = K,
                            list = FALSE,
                            returnTrain = TRUE) # Fold ids, reused for alpha tuning

## Defining the tuning parameter grids
lambdas <- 2^seq(-10, 5, 1)
alphas <- seq(0, 1, .1)

## Weights for the models: inverse-prevalence weights balance the classes
weighted <- TRUE
if (weighted) {
  wght <- as.vector(1 - (table(y)[y] / length(y)))
} else {
  wght <- rep(1, length(y)) # length(), not nrow(): y is a vector/factor
}
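## Illustration with hypothetical counts: for 100 observations with 70
## non-events and 30 events, each non-event gets weight 1 - 70/100 = 0.30
## and each event 1 - 30/100 = 0.70, up-weighting the minority class.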


## Standardisation of numeric predictors (centring and scaling) is
## handled inside cv.glmnet via standardize = TRUE below.



## ====================================================================
## Step 2: all cross validations for each alpha
## ====================================================================

library(glmnet)  # cv.glmnet(), roc.glmnet()
library(ggplot2) # ROC plot below
library(furrr)
library(purrr)
library(future)
library(doMC)
registerDoMC(cores = 6) # foreach backend (used by cv.glmnet if parallel = TRUE)
plan(multisession, workers = 6) # future_map() needs a future plan to run in parallel

# Cross-validation over all lambdas for each alpha
set.seed(3)
cvs <- future_map(alphas, function(a) {
  cv.glmnet(model.matrix(~ . - 1, X),
            y1,
            weights = wght,
            lambda = lambdas,
            type.measure = "deviance", # The default measure and recommended for tuning
            foldid = folds, # Per recommendation, the same folds are reused for alpha tuning
            alpha = a,
            standardize = TRUE,
            family = quasibinomial(), # Same as binomial, but not as picky
            keep = TRUE) # Keep the prevalidated fits for the ROC curve below
}, .options = furrr_options(seed = TRUE)) # Reproducible RNG across parallel workers
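## Optional sanity check: each element of cvs is a cv.glmnet object;
## cvs[[i]]$cvm holds the mean cross-validated deviance per lambda.
# plot(cvs[[1]]) # CV deviance curve for the first alpha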

## ====================================================================
## Step 3: optimum lambda for each alpha
## ====================================================================


# For each alpha, lambda is chosen for the lowest measure (deviance)
each_alpha <- sapply(seq_along(alphas), function(id) {
  each_cv <- cvs[[id]]
  alpha_val <- alphas[id]
  index_lmin <- match(each_cv$lambda.min, 
                      each_cv$lambda)
  c(lamb = each_cv$lambda.min, 
    alph = alpha_val,
    cvm = each_cv$cvm[index_lmin])
})
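# each_alpha is a 3 x length(alphas) matrix with rows "lamb" (lambda.min per
# alpha), "alph" (the alpha used) and "cvm" (the CV deviance at lambda.min).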

# Best lambda: the smallest of the per-alpha optimal lambdas
best_lamb <- min(each_alpha["lamb", ])

# Alpha is chosen as the one with the lowest CV deviance, each_alpha["cvm", ],
# among the alphas sharing the best lambda
best_alph <- each_alpha["alph", ][each_alpha["cvm", ] ==
                                    min(each_alpha["cvm", ][each_alpha["lamb", ] %in% best_lamb])]

## https://stackoverflow.com/questions/42007313/plot-an-roc-curve-in-r-with-ggplot2
best_cv <- cvs[[match(best_alph, alphas)]] # The CV fit for the tuned alpha
p_roc <- roc.glmnet(best_cv$fit.preval, newy = y)[[match(best_lamb, best_cv$lambda)]] |> # ROC from the prevalidated fits at the tuned lambda
  ggplot(aes(FPR, TPR)) +
  geom_step() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  geom_abline() +
  theme_bw()
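## A matching AUC can be read off the same prevalidated predictions
## (a sketch using glmnet's assess.glmnet; binomial measures are assumed
## to apply to the quasibinomial fit):
# assess.glmnet(best_cv$fit.preval, newy = y,
#               family = "binomial")$auc[match(best_lamb, best_cv$lambda)]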

## ====================================================================
## Step 4: creating the final model
## ====================================================================

source("regular_fun.R") # Custom function
optimised_model <- regular_fun(X, y1, K, lambdas = best_lamb, alpha = best_alph)
# With lambda and alpha fixed, the function is just a k-fold cross-validation
# wrapper, but it keeps the model performance figures from each fold.

list2env(optimised_model, .GlobalEnv)
# The function outputs a list, which is unwrapped into the global environment.
# See the source script for reference.

## ====================================================================
## Step 5: creating a table of coefficients for inference
## ====================================================================

Bmatrix <- matrix(unlist(B), ncol = K) # One column of coefficients per CV fold
Bmedian <- apply(Bmatrix, 1, median)
Bmean <- apply(Bmatrix, 1, mean)

library(tibble)
library(gt)

reg_coef_tbl <- tibble(
  name = c("Intercept", Hmisc::label(X)),
  medianX = round(Bmedian, 5),
  ORmed = round(exp(Bmedian), 5),
  meanX = round(Bmean, 5),
  ORmea = round(exp(Bmean), 5)) |>
  # arrange(desc(abs(medianX))) |>
  gt()
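## Note: exp(beta) is reported as an odds ratio. glmnet standardises
## internally when standardize = TRUE but returns coefficients on the
## original predictor scale (assuming regular_fun() returns raw glmnet
## coefficients in B).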

## ====================================================================
## Step 6: plotting predictive performance
## ====================================================================

reg_cfm <- caret::confusionMatrix(cMatTest) # cMatTest is returned by regular_fun()
reg_auc_sum <- summary(auc_test[, 1]) # Per-fold test AUCs, also from regular_fun()

## ====================================================================
## Step 7: packing a list to save in the loop
## ====================================================================

ls[[i]] <- list("RegularisedCoefs" = reg_coef_tbl, # i indexes the enclosing loop
                "bestA" = best_alph,
                "bestL" = best_lamb,
                "ConfusionMatrix" = reg_cfm,
                "AUROC" = reg_auc_sum)

Publication status

We have recently applied for additional registry-based data on socioeconomic status and educational level to include in the analysis, and we are awaiting these data before publishing the main article on this project.

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York: Springer.

Footnotes

  1. Versions of glmnet also exist for MATLAB and Python.