This is not well explained anywhere. If you go back to Statistical Models in S (Chambers and Hastie, 1992), the plot appears without the square root (page 104). The plot appears there, but its purpose is not described, probably because it is "self-evident" that it is meant to gauge constancy of variance.
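However, the square root makes the plot work better, since it largely avoids the need to guess where the trend is for a skewed distribution. Absolute residuals are strongly right-skewed (half-normal, if the residuals themselves are normal), whereas their square roots have a distribution that is much closer to normal, as can be seen empirically by looking at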
plot(density(sqrt(abs(rnorm(100000)))))
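
To put a rough number on that visual impression, here is a minimal sketch that compares the sample skewness of the absolute values with that of their square roots. The `skew` helper and the seed are my own additions for illustration (base R has no skewness function), and simulated normal draws stand in for standardized residuals.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
z <- rnorm(100000)               # stand-in for standardized residuals

# Sample skewness (third standardized moment).
# Hypothetical helper defined here; it does not come from any package.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_abs  <- skew(abs(z))        # half-normal: strongly right-skewed
skew_sqrt <- skew(sqrt(abs(z)))  # square root: nearly symmetric

print(c(abs = skew_abs, sqrt_abs = skew_sqrt))
```

The skewness should come out near 1 for the absolute values and close to 0 for the square roots, which is why a smoother through the square-root version is easier to read as a trend in spread.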
I have no idea when the square root was introduced or by whom, but it seems a good step.