Previously, I introduced the ezplot package and demoed how to use it to easily make nice-looking ggplot2 bar charts and boxplots. In this post, I’ll discuss two common plot types for displaying the distribution of a numeric variable, namely, the histogram and the density plot. By the end of this tutorial, you’ll know how to make sophisticated ggplot2 histograms and density plots using ezplot. And once again, you’ll be amazed at how simple and intuitive it is. Let’s get started.
- Install a set of development tools
- Install devtools by running
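
The command is the usual CRAN install. Run it once in an R session:

```r
# devtools is published on CRAN, so the standard installer works
install.packages("devtools")
```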
Install and Load ezplot
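
A minimal sketch of this step, assuming ezplot is installed from GitHub with devtools and that the repository path is `gmlang/ezplot` (an assumption on my part; adjust it if the package lives elsewhere):

```r
# Assumption: the package is hosted at github.com/gmlang/ezplot
devtools::install_github("gmlang/ezplot")

# Load the package so that the mk_ functions are available
library(ezplot)
```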
We’ll use the famous iris dataset, which comes with the base R distribution.
First, we pass iris (note: iris is a data frame) into the function mk_distplot(), which returns a function that we can use to draw a histogram or density plot for any numeric variable in iris.
If you haven’t noticed, all the “mk_” functions in ezplot have one and only one input parameter, namely, a data frame. And they all return functions that can be called to make plots by passing in the names of variables in the data frame (in quotes). This design came from the simple idea that functions can return functions instead of values.
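
To make the "functions returning functions" idea concrete, here is a small sketch in base R. The name `mk_hist_fn` is hypothetical and this is not how ezplot is implemented; it only shows the pattern of binding a data frame once and then plotting by quoted variable name.

```r
# Hypothetical factory, NOT part of ezplot: takes a data frame and
# returns a function that remembers that data frame (a closure).
mk_hist_fn = function(df) {
    function(varname, ...) {
        # varname is a quoted column name, e.g. "Sepal.Length";
        # extra arguments in ... are passed on to base R's hist()
        hist(df[[varname]], main = varname, xlab = varname, ...)
    }
}

# Bind the data once, then reuse the returned function for any column
draw = mk_hist_fn(iris)
draw("Sepal.Length")
draw("Petal.Width", breaks = 20)
```

The returned function keeps a reference to `df`, so the data frame never has to be passed again.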
Next, we use the function plt() to draw a histogram of Sepal.Length.
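
Put together, the two steps described so far look roughly like this. The names mk_distplot() and plt() and the quoted variable name come straight from the text. I am assuming plt() can be called with the variable name alone, so check the package help for any further arguments (bin width, for example).

```r
library(ezplot)

# Step 1: bind the data frame; the result is a plotting function
plt = mk_distplot(iris)

# Step 2: name the variable to plot, in quotes
# (assumption: no other arguments are required)
plt("Sepal.Length")
```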
We can also draw density plots instead of histograms.
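
For comparison, here is a density plot of the same variable written in plain ggplot2. This is not ezplot's code or its API, only the kind of raw ggplot2 call you would write without it. The fill color and transparency are arbitrary choices for illustration.

```r
library(ggplot2)

# Kernel density estimate of Sepal.Length across all 150 flowers
p_density = ggplot(iris, aes(x = Sepal.Length)) +
    geom_density(fill = "grey70", alpha = 0.5) +
    labs(x = "Sepal.Length", y = "density")

print(p_density)
```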
Now, the iris data has a variable called “Species”. Wouldn’t it be nice if we could see how Sepal.Length is distributed within each species? It turns out that this is really easy to do with ezplot. We just need to pass one more parameter to plt().
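
I won't guess at the name of that extra parameter here, so the sketch below shows the same idea in plain ggplot2 instead: mapping Species to the fill color gives one histogram per species. The bin width and transparency are arbitrary illustrative values, not ezplot defaults.

```r
library(ggplot2)

# One semi-transparent histogram per species, overlaid on shared axes
p_by_species = ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
    geom_histogram(binwidth = 0.2, alpha = 0.5, position = "identity") +
    labs(x = "Sepal.Length", y = "count")

print(p_by_species)
```

Using `position = "identity"` overlays the groups rather than stacking them, which makes it easier to compare the shape of each species' distribution.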
I created ezplot out of the frustration that there are too many detailed commands to remember when customizing a ggplot. I’d love to hear how ezplot has improved your productivity. In addition, I’m writing a book called ezplot: How to Easily Make ggplot2 Graphics for Data Analysis, and it is 20% complete. Take a sneak peek and get notified when the book is published.