The WA Branch of the Statistical Society of Australia warmly invites you to a workshop on Time Series Data Mining with Python, presented by Dr Manuel Herrera of the University of Cambridge (UK). The workshop runs over two sessions, on Thursday 18 and Friday 19 November 2021.
Manuel is a Research Associate in distributed intelligent systems at the Engineering Department of the University of Cambridge and a Fellow of the Royal Statistical Society. He works on engineering statistics and predictive analytics for smart and resilient critical infrastructure, and has many years of experience in the management and maintenance of UK national infrastructure.
Workshop Abstract
This two-day workshop aims to help data science students and practitioners add time-series data mining methods to their skill set, for use in both academic and industry projects. After an introduction to Python for time series analysis, the workshop explores data mining techniques for extracting patterns from time series, ranging from dimensionality reduction to anomaly detection. The first day covers data wrangling and time series analysis with Python; the second day gives a practical overview of time-series data mining tools.
Prerequisites
An internet connection; basic coding knowledge (e.g. in R) and familiarity with time series concepts; and a Google Drive account (recommended but not compulsory).
Workshop content
Day 1 - Fundamentals of time series analysis with Python
This is a hands-on course for learning the basics of data wrangling and time series analysis with Python.
We will begin the course with a quick introduction to Python and Google Colab, a hosted Jupyter notebook service that runs Python code in a web browser with no setup required. We will then explore the use of libraries such as pandas, numpy and matplotlib for data acquisition, timestamping, preprocessing and visualisation. The session continues with the fundamentals of time series analysis. Throughout the workshop you will gain experience implementing these analyses in Python on real-life case studies.
At the end of this module you will be able to:
- Find your way around Python and the Google Colab environment.
- Use the Python libraries pandas and matplotlib to import, preprocess and visualise data.
- Analyse time series data with the Python libraries pandas and statsmodels.
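As a flavour of the Day 1 data wrangling, here is a minimal pandas sketch on a small synthetic daily series (the data and variable names are hypothetical, not taken from the workshop materials). It timestamps the values with a DatetimeIndex, fills missing observations by time-based interpolation, resamples to weekly means, and takes first differences, a common first step towards stationarity.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series (a simple linear trend) with two missing observations.
index = pd.date_range("2021-11-01", periods=14, freq="D")
values = np.arange(14, dtype=float)
values[[3, 8]] = np.nan
series = pd.Series(values, index=index, name="demand")

# Fill gaps by linear interpolation that respects the spacing of the time index.
filled = series.interpolate(method="time")

# Downsample to weekly means; "W" bins weeks ending on Sunday by default.
weekly = filled.resample("W").mean()

# First differencing removes a linear trend, a common step towards stationarity.
diffed = filled.diff().dropna()

print(weekly)
print(diffed.describe())
```

Linear interpolation is a reasonable choice for short gaps in a smooth series; for noisy or strongly seasonal data it may distort the signal, which is one reason missing data gets its own slot in the timetable.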
Day 2 - Introduction to time-series data mining
This session introduces time-series data mining with Symbolic Aggregate approXimation (SAX), using the dedicated Python library saxpy as well as tslearn, which provides more general machine learning tools for time series data. We will see the benefits of reducing the dimension of the data with SAX, and how SAX representations can then be used for clustering and classification.
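To make SAX dimension reduction concrete, here is a minimal from-scratch sketch using numpy and the Python standard library. It is not the saxpy or tslearn API, and the function names `znorm`, `paa` and `sax` are hypothetical. The sketch z-normalises the series, averages it over segments (Piecewise Aggregate Approximation, PAA), and maps each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions.

```python
from statistics import NormalDist

import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalise a series; near-constant series are only centred."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return x - x.mean() if std < eps else (x - x.mean()) / std


def paa(x, n_segments):
    """Piecewise Aggregate Approximation: the mean of each segment.

    np.array_split tolerates lengths not divisible by n_segments (a simplification;
    implementations differ in how they handle the remainder).
    """
    return np.array([seg.mean() for seg in np.array_split(x, n_segments)])


def sax(x, n_segments, alphabet_size):
    """Convert a series into a SAX word of length n_segments."""
    # Breakpoints that cut N(0, 1) into alphabet_size equally probable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    # Index of the region each PAA mean falls into, mapped to letters a, b, c, ...
    symbols = np.searchsorted(breakpoints, paa(znorm(x), n_segments))
    return "".join(chr(ord("a") + int(s)) for s in symbols)


# Demo: 16 points reduced to a 4-letter word over the alphabet {a, b, c, d}.
series = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
print(sax(series, n_segments=4, alphabet_size=4))
```

The resulting short strings are what make SAX convenient for the clustering and classification steps: comparing words of a few letters is far cheaper than comparing the raw series.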
The matrix profile is a more advanced technique than SAX for time-series data mining. The workshop will introduce its theoretical basics while using the Python library matrixprofile for motif and novelty/discord discovery. Motif discovery helps extract the most common patterns in a time series, while discord discovery detects points and subsequences that may be anomalies. Other data mining problems, such as clustering and shapelet discovery for time series classification, will also be explored.
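The matrix profile stores, for every subsequence of length m, the distance to its nearest non-trivial match elsewhere in the series: low values point to motifs and high values to discords. The sketch below is a deliberately naive, brute-force version in numpy, written purely for illustration; the function name `naive_matrix_profile` is hypothetical, and dedicated libraries such as matrixprofile use much faster algorithms. It assumes z-normalised Euclidean distance and an exclusion zone of m // 4 around each subsequence (one common heuristic) to ignore trivial self-matches.

```python
import numpy as np


def _znorm(x, eps=1e-8):
    std = x.std()
    return (x - x.mean()) / std if std > eps else x - x.mean()


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile of ts for subsequence length m.

    profile[i] is the z-normalised Euclidean distance from subsequence i to its
    nearest non-trivial neighbour; index[i] is the start of that neighbour.
    """
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    subs = np.array([_znorm(ts[i:i + m]) for i in range(n_sub)])
    excl = max(1, m // 4)  # exclusion zone against trivial self-matches
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):min(n_sub, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index


# Demo: a periodic signal with a short injected spike (synthetic data).
t = np.arange(200)
ts = np.sin(2 * np.pi * t / 20)
ts[100:105] += 3.0
profile, index = naive_matrix_profile(ts, m=20)
motif = int(np.argmin(profile))    # start of a best-matching (motif) pair
discord = int(np.argmax(profile))  # start of the most anomalous subsequence
print("motif:", motif, "neighbour:", index[motif], "discord:", discord)
```

In this synthetic example the discord is a subsequence overlapping the injected spike; whether a discord in real data is a genuine anomaly is exactly the question the Day 2 timetable raises.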
The workshop will cover:
- Using the Python library saxpy to apply SAX to time-series dimension reduction, clustering and classification.
- Exploring the Python library tslearn for basic SAX-based analysis, as well as other machine learning techniques for time series.
- Time-series data mining using the matrix profile and the Python library matrixprofile.
- Discovering time series discords with the matrix profile, which opens new possibilities for anomaly detection.
Timetable
All times are in Australian Western Standard Time (AWST, UTC+8).
Day 1
| Time | Task | Outcome |
|---|---|---|
| 13:30 | 1. Working environment | What is Google Colab about? |
| 13:45 | 2. Basics of Python | How can I import/export time series in Python? |
| 14:00 | 3. Basics of Python | How can I preprocess time series data? |
| 14:45 | 4. Basics of Python | How can I plot time series data? |
| 15:30 | 5. Afternoon Coffee | Break |
| 16:00 | 6. Basic patterns in time series | How can a time series be split into its main components? |
| 16:15 | 7. Stationarity | How can I tell whether a series is stationary? How can I make a time series stationary? |
| 16:45 | 8. Missing data | How should missing values in a time series be treated? |
| 17:15 | 9. Basic analysis and forecasting | How can I compute the partial autocorrelation function? How can I build a forecasting model using ARIMA? |
| 17:45 | 10. Revision | Q&A and reserved time for participants |
Day 2
| Time | Task | Outcome |
|---|---|---|
| 13:30 | 1. Intro to SAX | What is SAX about? |
| 13:45 | 2. SAX representation | How can I reduce the dimension of a time series? |
| 14:30 | 3. SAX for time series clustering | How can I use SAX for time series clustering? |
| 15:00 | 4. SAX for time series classification | How can I use SAX for time series classification? |
| 15:30 | 5. Afternoon Coffee | Break |
| 16:00 | 6. Intro to matrix profile | What is the matrix profile about? |
| 16:30 | 7. Matrix profile for pattern discovery | How can I discover motifs and discords in a time series? Are those discords anomalies? |
| 17:00 | 8. Other data mining tools | What are shapelets and how can I discover them in a time series? How can I cluster multiple time series? |
| 17:30 | 9. Revision | Q&A and reserved time for participants |
Registration Information
Members of the WA Branch of SSA will have priority access to registration for one week before it opens to participants outside WA. Please contact ssa.wa.secretary@gmail.com for your registration code.
This workshop will be conducted via Zoom and Slack. An invitation to the Slack workspace will be sent to participants a few days before the workshop.