The April 2021 meeting of the New South Wales branch was co-hosted by the Sydney section of the R Ladies group. Jenny Sloane gave a short presentation about R Ladies at both the international and local levels, explaining how it promotes gender diversity within the R computing community and how it helps her as a researcher and data analyst.
We then had a seminar by Dr Gordana Popovic, who is a research fellow and statistical consultant at the University of New South Wales. Gordana's talk was titled "Carrots are good for vision. Models are good for visualising discrete data". We didn't hear much about carrots after that, but penguins and spiders came up quite a bit. The animal theme was in keeping with Gordana's membership of the Ecological Statistics Research Group at her university. The commonly used pairwise scatterplot visualisation of multivariate data was likened to the top-view and side-view drawings of an object that are common in high school technical drawing courses.
Improved visualisation of the object is, of course, achieved by viewing it from several angles. Gordana likened this to flying in a drone around a multi-dimensional point cloud. The visualisation term for this is "tour", and the R package named "tourr" was mentioned because it supports drone-type views of multivariate data controlled by a computer mouse. A data set concerning features of penguins collected at Palmer Station, a research station in Antarctica, was used to illustrate problems with pairwise scatterplots and tours. These problems arise from factors such as discreteness of the observations and counts having very many zeroes. A common quick and simple remedy is to jitter the data, but this has shortcomings such as not preserving ordering.
Speaker Popovic then moved on to her central theme: using models to help with visualisation. A key principle for this is Dunn-Smyth residuals, which were cooked up at the University of Queensland in the 1990s by Peter Dunn and Gordon Smyth and published in a very well-cited 1996 Journal of Computational and Graphical Statistics paper. Gordana explained how the Dunn-Smyth principle is like jittering but preserves ordering. The upshot is much more useful pairwise scatterplots and tours. Similar illustrations were made for spiders, and the new methodology made it possible to interpret the fondness of some spider species for leaves.
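For readers who would like to see the idea in code, here is a minimal sketch of Dunn-Smyth (randomised quantile) residuals for count data. It is not taken from the talk. It assumes a Poisson model with a single hypothetical fitted mean `mu = 1.5`, and the function names `poisson_cdf` and `dunn_smyth_residual` are made up for illustration. Each count is replaced by a uniform draw between the fitted cumulative probabilities just below and at that count, which is then mapped to the standard normal scale. Tied counts are spread apart, as with jittering, but under a common fitted distribution a larger count can never receive a smaller residual.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def dunn_smyth_residual(y, mu, rng):
    """Randomised quantile residual for a count y under Poisson(mu)."""
    lower = poisson_cdf(y - 1, mu)  # P(Y < y)
    upper = poisson_cdf(y, mu)      # P(Y <= y)
    u = rng.uniform(lower, upper)   # the order-preserving "jitter"
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


rng = random.Random(1)       # seeded only for reproducibility
counts = [0, 0, 0, 1, 2, 5]  # made-up counts with many zeroes
mu = 1.5                     # hypothetical fitted mean (an assumption)
residuals = [dunn_smyth_residual(y, mu, rng) for y in counts]

for y, r in zip(counts, residuals):
    print(y, round(r, 3))
```

In a real analysis the fitted mean would come from a model and would usually differ between observations, so the residuals are ordered relative to the model rather than to the raw counts.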
An R package by Gordana and colleagues, named "ecoCopula", was advertised as a way for R Ladies and gentlemen to do it themselves. There was a brief technical part concerning copulas as a latent variable model and biplot concepts. Then there was a final theme concerning ordinal data and how the ideas translate well to its visualisation.
Matt Wand
University of Technology Sydney