The SSA NSW Branch is pleased to announce the following event:
Fast algorithms and modern visualisations for feature selection
When: 16 May 2019 9:00 AM, UTC+10:00
Where: UNSW Sydney, Kensington
About the course:
This short course focuses on model selection techniques for linear and generalised linear regression in two scenarios: when an exhaustive search of the model space is possible, and when the dimension is large and either stepwise algorithms or regularisation techniques have to be employed to identify good models. We incorporate recent research on graphical tools for model choice and on how to tune regularisation procedures, such as the Lasso, through resampling or model selection criteria. Importantly, the limitations of the various model selection procedures will be discussed. A key component of the course is assessing the stability of selected components, which is paramount for reliable final predictive models. We show how this can be achieved by visualising measures of stability.
The practical implementation of the discussed methods is an essential component of this course. Interactive labs will give participants the opportunity to apply what they have learnt. We will use the cross-platform, open-source software R, in particular the leaps, bestglm, glmnet and mplot packages.
Part 1: Exhaustive model searching with the leaps and bestglm packages.
Part 2: Penalised regression methods as fast alternatives when exhaustive search is not possible. An introduction to cross-validation for model selection and the glmnet package.
Part 3: Assessing stability in model selection by bootstrapping regression models and visualising the results using advanced graphics with the mplot package.
Learning Objectives
The aim of all analyses is to use the data and, if available, information about its generating process, to construct statistical models which parsimoniously describe relevant and important features in the data. Too often in applied statistics, model selection is based on outdated methods, for example stepwise techniques. This workshop will highlight the limitations of established model selection methods and showcase more recent approaches, with a focus on selecting a stable model.
Target Audience
Statistical model building is a fundamental part of many statistical analyses, and this course will be of interest to anyone who wants to learn how to better select such models with increasingly high-dimensional and complex data.
Presenter Biographies
Samuel Mueller is a Professor of Statistics at the University of Sydney and has 16 years’ experience as a mathematical statistician renowned for his contributions in model selection, classification and prediction for statistically challenging data. He held academic positions at the University of Bern (Switzerland), ANU and UWA before joining USyd in 2008 as a Lecturer, with fast promotion to Professor by 2018. He currently leads two research groups: one on Theoretical Statistical Model Selection at the ANU (with Prof Welsh) and one on Fast and Interactive Methods for Complex High-Dimensional Data at USyd. He was appointed by the Australian Research Council to its College of Experts for 2019-2021 as one of only two members representing Statistics as a discipline, is the Associate Dean Research Education (since 2016) in the Faculty of Science and serves as the Deputy Head of School (since 2019). He is also an Editor (Theory & Methods) of the Australian and New Zealand Journal of Statistics and Past-President of the International Biometric Society – Australasian Region.
Further details about Professor Mueller’s research can be found here.
Garth Tarr is a lecturer in statistics and data science at the University of Sydney. He has received more than A$3.3M in competitive grant funding and a number of citations for his teaching, including a Vice-Chancellor’s Award for Teaching Excellence in 2016. He received his PhD in Mathematical Statistics from the University of Sydney and has held positions at the University of Newcastle and the Australian National University. His diverse interests include robust statistics, data visualisation, model selection, econometric modelling, educational research, meat science and biostatistics. Garth is an expert R user who has created several R packages, including the mplot package, and has been a regular contributor to the Biometric Bulletin’s Software Corner.
Further details about Dr Tarr’s research can be found here.
For more information and to register, please click here.