Fast algorithms and modern visualisations for feature selection - CPD96

  • 16 May 2019
  • 9:00 AM - 4:30 PM
  • UNSW Sydney, Kensington

Registration

  • Payment before 17 April 2019
    Proof of full-time student status required for student registrations
  • Payment from 17 April 2019
    Proof of full-time student status required for student registrations

Registration is closed

This short course focuses on model selection techniques for linear and generalised linear regression in two scenarios: when an extensive search of the model space is possible, and when the dimension is large and either stepwise algorithms or regularisation techniques have to be employed to identify good models. We incorporate recent research on graphical tools for model choice and on how to tune regularisation procedures, such as the Lasso, through resampling or model selection criteria. Importantly, the limitations of the various model selection procedures will be discussed. A key component of the course is assessing the stability of selected components, which is paramount for reliable final predictive models. We show how this can be achieved by visualising measures of stability.

The practical implementation of the discussed methods is an essential component of this course. Interactive labs will give participants the opportunity to apply what they have learnt. We will use the cross-platform, open-source software R, in particular the leaps, bestglm, glmnet and mplot packages.

Part 1: Exhaustive model searching with the leaps and bestglm packages.

Part 2: Penalised regression methods as fast alternatives when exhaustive search is not possible. An introduction to cross-validation for model selection and the glmnet package.

Part 3: Assessing stability in model selection by bootstrapping regression models and visualising the results using advanced graphics with the mplot package.
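
As a rough illustration of the ideas behind Parts 1 and 3, the sketch below runs an exhaustive best-subset search on simulated data and then repeats it on bootstrap resamples to see how often each predictor is selected. It is a language-agnostic stand-in written in plain Python, not the leaps, bestglm or mplot code used in the course. The function names (ols_rss, bic, best_subset), the simulated data and the choice of BIC as the selection criterion are all assumptions made for this example.

```python
import itertools
import math
import random


def ols_rss(X, y, cols):
    """Residual sum of squares for least squares on an intercept plus columns `cols`."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(r[i] * r[k] for r in rows) for k in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum(
        (yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y)
    )


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + k * math.log(n)


def best_subset(X, y):
    """Exhaustive search: return the column subset with the smallest BIC."""
    n, p = len(y), len(X[0])
    candidates = (
        cols for size in range(p + 1) for cols in itertools.combinations(range(p), size)
    )
    return min(candidates, key=lambda cols: bic(ols_rss(X, y, cols), n, len(cols) + 1))


# Simulated data (an assumption for illustration): only predictors 0 and 1 matter.
rng = random.Random(2019)
n, p = 80, 4
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]

selected = best_subset(X, y)

# Bootstrap the whole selection procedure and record inclusion frequencies.
B = 50
counts = [0] * p
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]
    chosen = best_subset([X[i] for i in idx], [y[i] for i in idx])
    for j in chosen:
        counts[j] += 1
freq = [c / B for c in counts]

print("selected on full data:", selected)
print("bootstrap inclusion frequencies:", freq)
```

Predictors that are selected in nearly every resample can be regarded as stable, while those that appear only occasionally are candidates for closer scrutiny. The exhaustive search visits all 2^p subsets, which is why Part 2 turns to penalised regression when p is large.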

Learning Objectives
The aim of all analyses is to use the data and, if available, information about its generating process, to construct statistical models which parsimoniously describe relevant and important features in the data. Too often in applied statistics, model selection is based on outdated methods, for example stepwise techniques. This workshop will highlight the limitations of established model selection methods and showcase more recent approaches, with a focus on selecting a stable model.

Target Audience
Statistical model building is a fundamental part of many statistical analyses. This workshop will be of interest to anyone who wants to learn how to better select such models for increasingly high-dimensional and complex data.

Presenter Biographies

Samuel Mueller is a Professor of Statistics at the University of Sydney and has 16 years’ experience as a mathematical statistician renowned for his contributions in model selection, classification and prediction for statistically challenging data. He held academic positions at the University of Bern (Switzerland), ANU and UWA before joining USyd in 2008 as a Lecturer, with fast promotion to Professor by 2018. He currently leads two research groups, one on Theoretical Statistical Model Selection at the ANU (with Prof Welsh) and one on Fast and Interactive Methods for Complex High-Dimensional Data at USyd. He was appointed by the Australian Research Council to its College of Experts for 2019-2021 as one of only two members representing Statistics as a discipline, is the Associate Dean Research Education (since 2016) in the Faculty of Science and serves as the Deputy Head of School (since 2019). He is also an Editor (Theory & Methods) of the Australian and New Zealand Journal of Statistics and Past-President of the International Biometric Society – Australasian Region.
Further details about Professor Mueller’s research can be found here.

Garth Tarr is a lecturer in statistics and data science at the University of Sydney. He has received more than A$3.3M in competitive grant funding and a number of citations for his teaching, including a Vice-Chancellor’s Award for Teaching Excellence in 2016. He received his PhD in Mathematical Statistics from the University of Sydney and has held positions at the University of Newcastle and the Australian National University. His diverse interests include robust statistics, data visualisation, model selection, econometric modelling, educational research, meat science and biostatistics. Garth is an expert R user and has created several R packages, including the mplot package, and has been a regular contributor to the Biometric Bulletin’s Software Corner.
Further details about Dr Tarr’s research can be found here.

Workshop Venue
RC-4082, Red Centre building (Central Wing), UNSW, Kensington NSW

Fees
Full-time students can sign up for student membership with the SSA for only $20 for 12 months to take advantage of the student member rate! Regular membership is $245.

Deadlines
Registrations close strictly on 5 May 2019. Early Bird registration closes on 16 April.

Travel Expenses
Occasionally workshops have to be cancelled because too few people register. Early registration helps to prevent this. Please contact the SSA Office before making any travel arrangements to confirm that the workshop will go ahead, because the Society will not be held responsible for any travel or accommodation expenses incurred due to a workshop cancellation.

Cancellation Policy
Cancellations received prior to Monday, 6 May 2019 will be refunded, minus a $20 administration fee. From 6 May onwards no part of the registration fee will be refunded. However, registrations are transferable within the same organisation. Please advise any changes to eo@statsoc.org.au. 
