
CPD145 - Time Series Data Mining with Python

Start
18 Nov 2021
End
19 Nov 2021
Schedule
2 sessions
#1.
18 Nov 2021, 1:30 PM to 6:00 PM (AWST)
#2.
19 Nov 2021, 1:30 PM to 6:00 PM (AWST)
Location
Online
Spaces left
15

Registration

Member – $150.00
Discounted registration for SSA Members.
Member (WA Branch) – $150.00
Discounted, prioritised registration for members of WA Branch of SSA.
Non-Members – $300.00
Registrations for those who are not members of SSA.
Retired Members – $100.00
Discounted registration for Retired Members of SSA.
Retired Members (WA Branch) – $100.00
Discounted registration for Retired Members of the WA Branch of SSA.
Student Members – $50.00
Discounted registration for Student Members of SSA.

Registration is closed

The WA Branch of the Statistical Society of Australia warmly invites you to a workshop on Time Series Data Mining with Python, presented by Dr Manuel Herrera of the University of Cambridge (UK). The workshop consists of two sessions, on Thursday 18 and Friday 19 November 2021.

Manuel is a Research Associate in distributed intelligent systems at the Engineering Department of the University of Cambridge and a Royal Statistical Society Fellow. He works on engineering statistics and predictive analytics for smart and resilient critical infrastructure, and has many years of experience in the management and maintenance of the UK national infrastructure.

Workshop Abstract

This two-day workshop aims to enable students and practitioners in data science to add time-series data mining methods to their skill set for future use in both academic and industry projects. After an introduction to Python for time series analysis, the workshop explores data mining techniques for pattern extraction in time series, ranging from dimensionality reduction to anomaly detection. The first day covers data wrangling for time series analysis with Python, and the second day gives a practical overview of time-series data mining tools.

Prerequisites

An internet connection, basic knowledge of coding (e.g. in R) and of time series concepts, and ideally a Google Drive account (recommended but not compulsory).

Workshop content

Day 1 - Fundamentals of time series analysis with Python

This is a hands-on course for learning the basics of data wrangling and time series analysis with Python.

We will begin the course with a quick introduction to Python and the Google Colab environment, which provides a Jupyter notebook service for running Python code in a web browser with no setup required. We will then explore the use of libraries such as pandas, numpy and matplotlib for data acquisition, timestamping, preprocessing and visualisation. We will continue the session by introducing the fundamentals of time series analysis. Throughout the workshop you will gain experience implementing these analyses in Python on real-life case studies.

At the end of this module you will be able to:

Get familiar with Python and the Google Colab environment.
Use the Python libraries pandas and matplotlib to import, preprocess, and visualise data.
Work on time series data analysis with the Python libraries pandas and statsmodels.

Day 2 - Introduction to time-series data mining

This session will introduce time-series data mining techniques using Symbolic Aggregate approXimation (SAX) with the dedicated Python library saxpy, as well as with tslearn, which provides more general machine learning tools for the analysis of time series data. We will see the benefits of reducing the dimension of the data with SAX, as well as the possibilities it opens up for the subsequent application of clustering and classification techniques.
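As a rough illustration of the dimension reduction that SAX performs, the sketch below z-normalises a series, averages it over equal-width segments (Piecewise Aggregate Approximation) and maps each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. It uses only the Python standard library and is not the saxpy or tslearn API; the function names `znormalise`, `paa` and `sax_word`, the four-letter alphabet and the requirement that the series length be a multiple of the segment count are all assumptions made for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalise(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # a constant series carries no shape information
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of n_segments")
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax_word(series, n_segments, alphabet="abcd"):
    """Return the SAX word of a series as a string over `alphabet`."""
    k = len(alphabet)
    # Breakpoints split the standard normal distribution into k equiprobable bins.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    reduced = paa(znormalise(series), n_segments)
    # bisect_right gives the index of the bin each segment mean falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in reduced)


# Eight observations are reduced to a four-letter word.
word = sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)  # abcd
```

Once series are reduced to short words like this, they can be compared, clustered or classified far more cheaply than the raw observations.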

Matrix profile is a more advanced technique than SAX for time-series data mining. The workshop will introduce its theoretical basics while using the Python library matrixprofile for motif and novelty/discord discovery. Motif discovery helps to extract the most common patterns in a time series, while discord discovery detects points and subsequences that are potential anomalies. Other data mining problems, such as clustering and shapelet discovery for time series classification, will also be explored.
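To make the idea of motifs and discords concrete, here is a minimal brute-force sketch of a matrix profile: for every window of length `m`, it records the z-normalised Euclidean distance to the nearest other window. It uses only the Python standard library, is quadratic in the series length, and is not the matrixprofile library's API; the function names, the exclusion zone of `m // 2` around each window and the toy series are assumptions chosen for illustration.

```python
import math


def znormalise(window):
    """Rescale a window to zero mean and unit standard deviation."""
    mu = sum(window) / len(window)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]


def matrix_profile(series, m):
    """Return (profile, index): nearest-neighbour distance and position per window."""
    n_windows = len(series) - m + 1
    windows = [znormalise(series[i:i + m]) for i in range(n_windows)]
    exclusion = m // 2  # assumed zone that rules out trivial matches with overlapping windows
    profile, index = [], []
    for i in range(n_windows):
        best, best_j = math.inf, -1
        for j in range(n_windows):
            if abs(i - j) <= exclusion:
                continue
            d = math.dist(windows[i], windows[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


# Toy series: a repeating pattern with a single spike at position 14.
series = [0, 1, 2, 1] * 8
series[14] = 6

profile, index = matrix_profile(series, m=4)
motif = min(range(len(profile)), key=profile.__getitem__)    # most repeated pattern
discord = max(range(len(profile)), key=profile.__getitem__)  # most unusual window
print("motif window:", motif, "nearest neighbour:", index[motif])
print("discord window:", discord)  # a window overlapping the spike
```

Low values of the profile mark windows that recur elsewhere in the series (motifs), while the highest values mark windows with no close match (discords), which are candidates for anomalies.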

The workshop will cover:

Using the Python library saxpy to work with SAX for time-series dimension reduction, clustering and classification.
Exploring the Python library tslearn for basic analysis based on SAX as well as for other machine learning techniques for time series.
Time-series data mining using matrix profile and the Python library matrixprofile.
Discovering time series discords with matrix profile analysis, which opens up new possibilities for anomaly detection.

Timetable

All times are in Australian Western Standard Time (AWST, UTC+8).

Day 1

Time	Task	Outcome
13:30	1. Working environment	What is Google Colab about?
13:45	2. Basics of Python	How can I import/export time series in Python?
14:00	3. Basics of Python	How can I make preprocessing of time series data?
14:45	4. Basics of Python	How can I plot time series data?
15:30	5. Afternoon Coffee	Break
16:00	6. Basic patterns in time series	How can a time series be split into its main components?
16:15	7. Stationarity	How to identify whether a series is stationary? How to make a time series stationary?
16:45	8. Missing data	How to treat missing values in a time series?
17:15	9. Basic analysis and forecasting	How to compute the partial autocorrelation function? How to build a forecasting model using ARIMA?
17:45	10. Revision	Q&A and reserved time for participants

Day 2

Time	Task	Outcome
13:30	1. Intro to SAX	What is SAX about?
13:45	2. SAX representation	How can I reduce the dimension of a time series?
14:30	3. SAX for time series clustering	How can I use SAX for time series clustering?
15:00	4. SAX for time series classification	How can I use SAX for time series classification?
15:30	5. Afternoon Coffee	Break
16:00	6. Intro to matrix profile	What is matrix profile about?
16:30	7. Matrix profile for pattern discovery	How can I discover motifs and discords in a time series? Are those discords anomalies?
17:00	8. Other data mining tools	What are shapelets and how can I discover them in a time series? How can I make clustering of multiple time series?
17:30	9. Revision	Q&A and reserved time for participants

Registration Information

Members of the WA Branch of SSA will have priority access to registration for one week before it opens to participants outside WA. Please contact ssa.wa.secretary@gmail.com for your registration code.

This workshop will be conducted via Zoom and Slack. An invitation to the Slack workspace will be sent to participants a few days prior.

Statistical Society of Australia (SSA)

PO Box 213

Belconnen ACT 2616 Australia

02 6251 3647

www.statsoc.org.au

ABN 82 853 491 081

Please direct enquiries to:

the SSA Team via email at

contact@statsoc.org.au

@StatSocAus
