Sharpening the BLADE - Missing Data Imputation using Supervised Machine Learning

  • 28 Jul 2020
  • 6:00 PM - 7:00 PM (UTC+10:00)
  • Virtual via Zoom

Times: 

From 6pm and concluding by 7pm: Presentation on Zoom

Virtual pre-drinks and nibbles are provided, but they don't taste as good as the real thing!


Register in advance for this meeting: https://anu.zoom.us/meeting/register/tJEtfu-vrjsiE9Q4yLGZo-a6pGJWmD3gYXEH. After registering, you will receive a confirmation email containing information about joining the meeting.

If you have any questions, please contact ssacanberra@gmail.com.


Speaker: Marcus Suresh (Department of Industry, Science, Energy and Resources)


Topic: Sharpening the BLADE - Missing Data Imputation using Supervised Machine Learning


Abstract: Incomplete data are common and can degrade statistical inference, often affecting evidence-based policymaking. A typical example is the Business Longitudinal Analysis Data Environment (BLADE), an Australian Government national data asset. In this paper, we aim to help BLADE practitioners select and implement advanced imputation methods with a solid understanding of the impact different methods have on data accuracy and reliability. To that end, we implement and examine the performance of data imputation techniques based on 12 machine learning algorithms, ranging from linear regression to neural networks. We compare the performance of these algorithms and assess the impact of various settings, including the number of input features and the length of time spans. To examine generalisability, we also impute two features with distinct characteristics. Experimental results show that three ensemble algorithms (extra trees regressor, bagging regressor and random forest) consistently maintain high imputation performance relative to the benchmark linear regression across a range of performance metrics. Among them, we recommend the extra trees regressor for its accuracy and computational efficiency.

Link to paper: https://link.springer.com/chapter/10.1007/978-3-030-35288-2_18
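
For readers new to the idea, the general approach in the abstract (treating the incomplete feature as a prediction target, training on the complete records, then predicting the gaps) can be sketched as below. This is a minimal illustration only, not the paper's implementation. It uses a single-feature least-squares fit, standing in for the linear regression benchmark. The field names `employees` and `turnover` and all values are hypothetical and are not taken from BLADE.

```python
# Minimal sketch of supervised imputation: learn target ~ feature on complete
# rows, then predict the target wherever it is missing (None).
# All field names and numbers below are made up for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(rows, feature, target):
    """Return copies of rows with missing `target` values filled by the model."""
    complete = [r for r in rows if r[target] is not None]
    slope, intercept = fit_line(
        [r[feature] for r in complete],
        [r[target] for r in complete],
    )
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None:
            r[target] = slope * r[feature] + intercept
        filled.append(r)
    return filled


rows = [
    {"employees": 10, "turnover": 25.0},
    {"employees": 20, "turnover": 45.0},
    {"employees": 30, "turnover": 65.0},
    {"employees": 40, "turnover": None},  # value to be imputed
]
imputed = impute(rows, "employees", "turnover")
print(imputed[3])
```

In practice the hand-written `fit_line` would be swapped for a library regressor with many input features; for example, the extra trees approach recommended in the abstract is available in scikit-learn as `ExtraTreesRegressor`.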


Biography: Marcus is a Data Scientist and Economist in the Analysis and Insights Division (AID) at the Department of Industry, Science, Energy and Resources (DISER) and a former Visiting Scientist at CSIRO - Data61. He specialises in applying data science techniques to structured and unstructured data to support the advancement of public policy at DISER.


Marcus has a wealth of experience across several Commonwealth Government agencies. He started his career as an Economist at the Commonwealth Treasury, where he worked on Financial Market and Taxation policy, before joining the Department of Education and Training, where he provided economic advice to support the then Government’s Higher Education Reform Bill. Marcus was later seconded to the Behavioural Economics Team of the Australian Government (BETA) in the Department of the Prime Minister and Cabinet, where he co-authored a randomised controlled trial with the ATO investigating how effective behavioural treatments were at improving compliance with the Deferred GST Scheme.


Marcus is a Master of Data Science candidate at the University of Sydney and holds a Master of Public Policy (Economic Policy) from the ANU and a Bachelor of Economics (Hons) and Commerce from Murdoch University. His research interests are in computer vision and natural language processing.


Website links:

https://statsoc.org.au/Canberra-Branch-meetings

https://www.meetup.com/CanberraDataSci/events/266992810/

