How to visualize the distribution of a continuous variable, ezplot - Part 5

Master R

By Guangming Lang

In one of my earlier posts, I showed you how easy it is to make publication-ready histograms and boxplots using the ezplot package. When we’re analyzing a continuous variable, we often also want to know if it’s normally distributed. For example, if it’s roughly normal, linear regression is often a reasonable first choice for modeling it (strictly speaking, linear regression assumes normally distributed errors, not a normally distributed response). A good way to check if a continuous variable is normal is to look at its Q-Q plot. Now, there’s an ezplot function that displays the histogram, boxplot, density plot and Q-Q plot all in one figure. Once again, it’s super easy to use and really handy for exploratory analysis. Let me show you how.

Prerequisites

  1. Install a set of development tools
    • On Windows, download and install Rtools.
    • On Mac, install the Xcode command line tools.
    • On Linux, install the R development package, usually called r-devel or r-base-dev.
  2. Install devtools by running install.packages("devtools") in R.

Install and Load ezplot

devtools::install_github("gmlang/ezplot")
library(ezplot)

We’ll use the cars dataset, which comes with the base R distribution. It has two variables, speed and dist. Both are continuous.

str(cars)
## 'data.frame':	50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

We’ll plot dist first. The Q-Q plot shows that it isn’t normally distributed. If it were, most of the blue dots would fall along the 45-degree diagonal line running from the bottom left corner to the upper right corner.

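# plt_dist() takes a data frame and returns a plotting function
f = plt_dist(cars)
# pass the name of the variable to plot
f("dist")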

[Figure: histogram, boxplot, density plot and Q-Q plot of dist]
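If you’d like to see what a normal Q-Q plot is doing, here’s a minimal base R sketch for dist. It is not how ezplot builds its figure; it only uses qqnorm() and qqline() from the stats package, and the names x, qq and qq_cor are my own. Note that qqline() draws its reference line through the first and third quartiles, so it won’t necessarily sit at 45 degrees.

```r
# Base R sketch of a normal Q-Q check (an illustration, not ezplot's code).
x <- cars$dist

# qqnorm() draws the plot and returns the theoretical quantiles (x)
# and the sample quantiles (y) it plotted.
qq <- qqnorm(x, main = "Normal Q-Q plot of dist")

# Reference line through the first and third quartiles.
qqline(x, col = "red")

# Correlation between theoretical and sample quantiles: the closer to 1,
# the closer the points are to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

The correlation is only a rough numeric companion to the picture. The plot itself is still the better guide, because it shows where the points leave the line.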

We’ll plot speed next. This time the Q-Q plot shows that speed is more or less normally distributed.

f("speed")

[Figure: histogram, boxplot, density plot and Q-Q plot of speed]
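If you can’t install ezplot, you can get a rough equivalent of the four-panel figure with base R graphics. The sketch below is my own illustration, not ezplot’s implementation. make_dist_plotter() is a hypothetical helper that mimics the calling pattern used above: give it a data frame and it returns a function that takes a variable name.

```r
# Hypothetical helper (not part of ezplot): returns a function that draws
# four base R views of one numeric column of `df`.
make_dist_plotter <- function(df) {
  function(varname) {
    x <- df[[varname]]
    stopifnot(is.numeric(x))

    old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
    on.exit(par(old_par))            # restore the previous layout on exit

    hist(x, main = paste("Histogram of", varname), xlab = varname)
    boxplot(x, horizontal = TRUE, main = paste("Boxplot of", varname))
    plot(density(x), main = paste("Density of", varname))
    qqnorm(x, main = paste("Normal Q-Q plot of", varname))
    qqline(x)

    invisible(x)  # return the plotted values without printing them
  }
}

g <- make_dist_plotter(cars)
g("dist")
g("speed")
```

The output is plainer than what ezplot gives you, but it’s enough for a quick look at both variables.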

I created ezplot out of frustration with how many detailed commands there are to remember when customizing a ggplot. If ezplot has improved your productivity, please tell your friends about it. I’m also writing a book called ezplot: How to Easily Make ggplot2 Graphics for Data Analysis, and it is 20% complete. Read the sample chapters for FREE and get notified when the book is published.
