The Statistical Society of Australia warmly invites you to a workshop on machine learning with Python, presented by Patrick Robotham from Linktree. The workshop consists of two sessions, on Saturday 13 and Sunday 14 November.
Patrick is a Staff Machine Learning Engineer at Linktree. He builds production-ready machine learning and statistical models and has 7 years of industry experience.
WORKSHOP ABSTRACT
This two-day workshop aims to enable data scientists to incrementally incorporate Python into their workflow. After an introduction to Python basics, the workshop focuses on developing Python models in the workflow framework most commonly seen in production environments. Participants will benefit from a gentle introduction to Python on the first day before learning some powerful modelling concepts and tools on the second day.
WORKSHOP CONTENT
Day 1 Getting Started with Python and Pandas
This is a hands-on course for learning the basics of Python and data manipulation with the Pandas library.
We will begin this course with a gentle introduction to the basics of Python, such as variable assignment and data type conversion. We will then dive into Pandas, the most popular package for manipulating tabular data in Python. We will end the session by making some basic plots of our data. Throughout the workshop you will work through a sequence of Jupyter notebooks and gain experience in working with data in Python.
At the end of this module you will be able to:
- Understand the basic data types in Python and how to convert between them.
- Use the Python library pandas to import and manipulate data.
- Use matplotlib to make basic visualisations of data.
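As a rough illustration of the first two outcomes, the sketch below converts between basic Python types and then loads a small table with pandas. The CSV content, column names and values are made up for this example and are not workshop material.

```python
import io

import pandas as pd

# Basic type conversion between strings and numbers.
count = int("42")      # str -> int
ratio = float("3.5")   # str -> float
label = str(count)     # int -> str

# Hypothetical data, inlined so the example is self-contained.
# In practice pd.read_csv would usually be given a file path.
csv_text = """city,year,rainfall_mm
Melbourne,2020,600
Sydney,2020,1200
Melbourne,2021,700
Sydney,2021,1000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Treat the year as a label rather than a number.
df["year"] = df["year"].astype(str)

# Mean rainfall per city: a typical split-apply-combine summary.
mean_rain = df.groupby("city")["rainfall_mm"].mean()
print(mean_rain)
```

A summary like `mean_rain` can then be passed to matplotlib, for example through `mean_rain.plot(kind="bar")`, to produce a basic chart.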
Day 2 Introduction to Machine Learning
This workshop will teach you how to use the scikit-learn library to construct regression and classification models, tune model parameters and evaluate model performance.
The scikit-learn library supports most of the standard classification, regression and clustering models that statisticians and data scientists use every day. In addition, scikit-learn offers a “workflow” framework that can wrap most data manipulation, scaling, imputation, tuning and evaluation steps together, providing a consistent standard for machine learning model deployment.
At the end of this module you will be able to:
- Use the Python libraries pandas and numpy to import and manipulate data.
- Use scikit-learn to construct linear and tree-based models.
- Know the difference between classification and regression.
- Evaluate a predictive model with appropriate metrics and plots.
- Improve a machine learning model using hyperparameter tuning.
- Perform the necessary scaling and imputation on the data.
- Standardise model deployment using pipelines.
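To give a feel for how these pieces fit together, here is a minimal sketch of a scikit-learn pipeline that chains imputation, scaling and a linear classifier, then tunes one hyperparameter with cross-validation. The synthetic dataset, the step names and the parameter grid are assumptions chosen for illustration, not the workshop's actual exercises.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Knock out roughly 5% of the values so that imputation has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model wrapped in a single estimator.
pipeline = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# Tune the regularisation strength with 5-fold cross-validation.
# The "model__C" key addresses parameter C of the step named "model".
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Because the imputer and scaler sit inside the pipeline, they are refitted on the training folds only during cross-validation, which avoids leaking information from the validation folds into preprocessing.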
Timetable
Day 1
Time | Task | Outcome
09:00 | 1. Running and Quitting | How can I run Python programs?
09:15 | 2. Variables and Assignment | How can I store data in programs?
09:35 | 3. Data Types and Type Conversion | What kinds of data do programs store? How can I convert one type to another?
09:55 | 4. Built-in Functions and Help | How can I use built-in functions? How can I find out what they do? What kind of errors can occur in programs?
10:20 | 5. Morning Coffee | Break
10:35 | 6. Libraries | How can I use software that other people have written? How can I find out what that software does?
10:55 | 7. Reading Tabular Data into DataFrames | How can I read tabular data?
11:15 | 8. Pandas DataFrames | How can I do statistical analysis of tabular data?
11:45 | 9. Plotting | How can I plot my data? How can I save my plot for publishing?
Day 2
Time | Task | Outcome
09:00 | 1. Quick revision and set up | A quick recap of Day 1
09:10 | 2. Regression Models | What is a regression model and how can we fit one using scikit-learn?
09:35 | 3. Classification Models | What is a classification model and how can we fit one using scikit-learn?
09:55 | 4. Dummy encoding, scaling and imputation | What kind of manipulations should we apply to our data before we can fit a model?
10:20 | 5. Morning Coffee | Break
10:35 | 6. Cross Validation | How is cross validation used to evaluate model performance?
10:55 | 7. Hyperparameter Tuning | How can we make our model more accurate and flexible?
11:15 | 8. Pipelines | How can we wrap all preprocessing steps, model tuning and evaluation in a consistent framework?
11:45 | 9. Revision | Q&A and reserved time for participants
Expenses:
Occasionally workshops have to be cancelled due to insufficient registrations. Registering early helps ensure that this will not happen. Please note that the Society will not be held responsible for any financial loss incurred due to a workshop cancellation.
Financial Support:
Financial support for SSA Vic members can be sought. For further information, please see https://statsoc.org.au/News-and-media-releases/10424132.
Contact:
Please contact the organisers: Patrick Robotham (patrick.robotham2@gmail.com) and Kevin Wang (kevinwangstats@gmail.com) for further details.