Dr Emi Tanaka* from Monash University gave a talk to members of the South Australian Branch on Programming paradigms x Statistical software design.
Emi started off by illustrating five computer programming paradigms through the task of drawing faces, which was a very useful analogy.
Imperative programming is when the code to draw a face is a sequence of lines, each instructing the computer to do something. Functional programming is where you gather that code together and generalize it into a ‘function’, so you can repeatedly apply the same code with different attributes. Syntactic sugar is where functions are designed to be easier for humans to express and read; this may be as simple as giving a function a sensible name that represents what it actually does, for example “face_angry” instead of “face1”. Rethinking function arguments means separating the inputs of a function into sensible parts: instead of one function that creates the whole face, the face is broken down into parts (eyes, mouth, face shape), each of which can be altered individually and added to, e.g. by adding a mole or eyebrows. Object-oriented programming is where the previous function arguments become objects, so that anyone can add a new object to the available options.
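The face analogy can be sketched in a few lines of Python. This is only an illustration written for this report, not code from Emi's slides: the text faces and every name (`face`, `face_angry`, `Part`, `Eyes`, `Mouth`, `Mole`, `draw`) are hypothetical.

```python
# Imperative: a fixed sequence of instructions that draws one particular face.
lines = []
lines.append("( o o )")
lines.append("(  >  )")
lines.append("( --- )")
imperative_face = "\n".join(lines)


# Functional, with rethought arguments: the same steps gathered into a
# function whose parts (eyes, mouth) can be changed independently.
def face(eyes="o o", mouth="---"):
    return "\n".join([f"( {eyes} )", "(  >  )", f"( {mouth} )"])


# Syntactic sugar: a name that says what the function actually draws.
def face_angry():
    return face(eyes="> <", mouth="^^^")


# Object-oriented: the former arguments become objects that render themselves.
class Part:
    def __init__(self, symbol):
        self.symbol = symbol

    def render(self):
        return f"( {self.symbol} )"


class Eyes(Part):
    pass


class Mouth(Part):
    pass


class Mole(Part):
    """A new part, added without changing any existing code."""

    def __init__(self):
        super().__init__("  .")


def draw(*parts):
    # Stack the rendered parts from top to bottom.
    return "\n".join(part.render() for part in parts)


print(face_angry())
print(draw(Eyes("o o"), Mole(), Mouth("---")))
```

In this sketch, adding the mole needs only a new class; `draw` and the existing parts are left untouched, which is the extensibility the object-oriented step is meant to convey.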
Emi recognized that software implements a mix of paradigms, and illustrated how they relate to statistical programming. She looked at the ‘Grammar of graphics’, illustrated with ggplot (a function from the R package ggplot2), which uses an object-oriented programming style. ggplot follows the equifinality principle, where there is more than one way to reach the same result. This allows users who may have different mental models to approach the same graph in different ways. Emi gave examples using a data set from the agridat R package, showing how the plot can display either infection rate or treatment, and how easy it is to add captions, titles and labels or change colours. ggplot allows the user to draw publication-ready graphics.
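As a rough illustration of equifinality, the toy Python sketch below builds the same plot specification in two ways, loosely mirroring how ggplot2 lets aesthetics be set either for the whole plot or for a single layer. It is not ggplot2's API: the `Plot` and `Layer` classes, the `resolved` method and the two-row data set are all made up for this example.

```python
class Layer:
    """A hypothetical plot layer: a geometry plus optional aesthetic mappings."""

    def __init__(self, geom, mapping=None):
        self.geom = geom
        self.mapping = mapping or {}


class Plot:
    """A hypothetical plot object that is extended by adding layers with `+`."""

    def __init__(self, data, mapping=None, layers=()):
        self.data = data
        self.mapping = mapping or {}
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Adding a layer returns a new plot and leaves the original unchanged.
        return Plot(self.data, self.mapping, self.layers + (layer,))

    def resolved(self):
        # Layer-level mappings override plot-level ones.
        return [(layer.geom, {**self.mapping, **layer.mapping})
                for layer in self.layers]


# Made-up data, for illustration only.
data = [{"trt": "A", "infection": 0.2}, {"trt": "B", "infection": 0.5}]

# Route 1: declare the mapping once for the whole plot.
p1 = Plot(data, mapping={"x": "trt", "y": "infection"}) + Layer("point")

# Route 2: declare the same mapping on the layer itself.
p2 = Plot(data) + Layer("point", mapping={"x": "trt", "y": "infection"})

print(p1.resolved() == p2.resolved())
```

Both routes resolve to the same specification, so users with different mental models arrive at the same graph.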
Next was the ‘Grammar of data manipulations’, illustrated with the R package dplyr, which combines elements of syntactic sugar but has the disadvantage that the user may not understand the nuances of what is happening. dplyr is essentially a pipeline: it is consistent in terms of input and output, where both are ‘data.frames’.
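The pipeline idea, where every step takes a table and returns a table, can be sketched in plain Python. A list of dictionaries stands in for a data.frame here, and the helpers `filter_rows`, `mutate`, `select` and `pipe` are hypothetical names that only echo dplyr's verbs; they are not dplyr itself, and the rows are made up.

```python
from functools import reduce


def filter_rows(predicate):
    # Keep only the rows for which the predicate is true.
    return lambda rows: [row for row in rows if predicate(row)]


def mutate(**new_columns):
    # Add columns computed from each existing row.
    return lambda rows: [
        {**row, **{name: fn(row) for name, fn in new_columns.items()}}
        for row in rows
    ]


def select(*names):
    # Keep only the named columns.
    return lambda rows: [{name: row[name] for name in names} for row in rows]


def pipe(rows, *steps):
    # Feed the output of each step into the next one.
    return reduce(lambda acc, step: step(acc), steps, rows)


# Made-up rows, for illustration only.
rows = [
    {"plot": 1, "trt": "A", "yield_kg": 10.0},
    {"plot": 2, "trt": "B", "yield_kg": 12.0},
    {"plot": 3, "trt": "A", "yield_kg": 11.0},
]

result = pipe(
    rows,
    filter_rows(lambda row: row["trt"] == "A"),
    mutate(yield_g=lambda row: row["yield_kg"] * 1000),
    select("plot", "yield_g"),
)
print(result)
```

Because each step has the same input and output type, steps can be reordered, removed or added without changing the others. The trade-off noted in the talk also shows up: the short verb names hide what each step does to the rows.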
Emi finished with the ‘Grammar of experimental design’ and touched briefly on some standard experimental designs, from completely randomized designs to split-plot designs. For R, there is a CRAN Task View on design of experiments covering 112 R packages, with the top downloaded packages in 2020 being AlgDesign and agricolae. Emi noted that Python, another popular programming language, doesn’t have a lot of experimental design tools; R is really the best for experimental design and has the latest tools available in this space. In the grammar of experimental design space, Emi has been developing her own R package called edibble. Emi has taken the three components of experimental designs (experimental units, treatments, and the allocation of treatments to units), along with potential constraints such as blocks, and created an interactive approach to generating an experimental design. ‘edibble’ maps the three components using a sequential pipeline with ‘syntactic sugar’, complementing other experimental design tools.
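To make the three components concrete, here is a minimal Python sketch of a randomized complete block design: plots within blocks are the experimental units, the treatment labels are the treatments, and the allocation is a shuffle restricted to each block. It does not use or imitate edibble's interface; the function name `randomized_block_design`, its parameters and the treatment labels are assumptions made for this example.

```python
import random


def randomized_block_design(n_blocks, treatments, seed=None):
    """Allocate every treatment once per block, in random order within the block."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    design = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # the block constraint: randomize within the block only
        for plot, treatment in enumerate(order, start=1):
            design.append({"block": block, "plot": plot, "treatment": treatment})
    return design


design = randomized_block_design(n_blocks=3, treatments=["A", "B", "C", "D"], seed=1)
for unit in design:
    print(unit)
```

Each block receives every treatment exactly once, while the order within a block depends on the seed.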
Emi has made her slides available, and they can be found at: emitanaka.org/slides/SSA-SA-2021
Emi is hoping to expand to designs for clinical trials next year in collaboration with Andrew Forbes, so watch this space!
On behalf of those attending the talk, thank you Emi for an insightful look into programming and your new R package.
Helena Oakey
*Dr. Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and is an avid programmer in R, HTML/CSS and other computational languages.