by Guangming Lang


  • r

When doing descriptive analysis, we often want to see how each variable is distributed. For a continuous variable, this means plotting its histogram and checking its normality (a useful check on the response variable when you plan to use linear regression for inference or prediction, although strictly speaking the normality assumption applies to the model’s residuals). Here are two small functions that let you do both on the fly. They don’t give you dazzling plots for a final presentation, but they do give you informative plots that show whether a variable needs a transformation.

plot_hist = function(x, xlabel) {
        ## @x: a numeric vector
        ## @xlabel: a string
        # draw histogram on the density scale, with median and mean in the title
        hist(x, main = mtext(bquote(
                paste(bolditalic("median") == .(round(median(x),1)), " (",
                      bolditalic("mean") == .(round(mean(x),1)), ")")
                )), xlab = xlabel, xlim=range(x), prob=TRUE)
        # add a rug to histogram
        rug(x)
        # draw red vertical line at median
        abline(v = median(x), col = "red", lwd=2)
        # draw density curve over the histogram
        lines(density(x), lwd=2)
}

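check_normality = function(x) {
        ## @x: a numeric vector
        # draw normal Q-Q plot
        qqnorm(x, pch=19)
        # add reference line through the first and third quartiles
        qqline(x, col=2)
}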

Here’s a demo plot by plot_hist():

plot_hist(mtcars$hp, "horse power")


In this case, horse power is right skewed. As we can see, plot_hist() overlays a density curve on the histogram, adds a rug along the x-axis, and draws a red vertical line at the median, with the median and mean reported in the title. I find these additions much more helpful than the default histogram generated by hist().
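
The skew can also be checked numerically. As a quick sketch: for a right-skewed variable the mean tends to sit above the median, which is why it helps to see both in the title. The variable names below (hp_median, hp_mean, right_skewed) are illustrative and not part of the functions above.

```r
# mtcars ships with base R (datasets package)
hp <- mtcars$hp

hp_median <- median(hp)
hp_mean <- mean(hp)

# A mean noticeably above the median points to a long right tail
right_skewed <- hp_mean > hp_median

cat("median:", round(hp_median, 1), "mean:", round(hp_mean, 1), "\n")
cat("mean above median:", right_skewed, "\n")
```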

Here’s a demo plot by check_normality():

check_normality(mtcars$hp)


It confirms that horse power is not normally distributed. Because it’s right skewed, a log transformation may make it approximately normal.
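
To see whether the log transformation helps, one option is to compare the raw and transformed variable side by side. The sketch below is only one way to do it: the skewness() helper is a hypothetical hand-rolled moment estimator (not from this post or from base R), and the Q-Q plots use base graphics directly so the block runs on its own.

```r
hp <- mtcars$hp
log_hp <- log(hp)  # natural log; hp is strictly positive, so this is safe

# Hypothetical helper: moment-based sample skewness (0 for a symmetric sample)
skewness <- function(x) {
        m <- mean(x)
        mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew_raw <- skewness(hp)
skew_log <- skewness(log_hp)
cat("skewness raw:", round(skew_raw, 2), "log:", round(skew_log, 2), "\n")

# Q-Q plots side by side: raw vs log-transformed
op <- par(mfrow = c(1, 2))
qqnorm(hp, pch = 19, main = "hp")
qqline(hp, col = 2)
qqnorm(log_hp, pch = 19, main = "log(hp)")
qqline(log_hp, col = 2)
par(op)  # restore the previous plotting layout
```

If the points in the second panel hug the reference line more closely and the skewness moves toward zero, the transformation is doing its job.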