An Easy Way to Make ggplot2 Histograms and Density Plots, ezplot - Part 3

By Guangming Lang

Previously, I introduced the ezplot package and demoed how to use it to easily make nice-looking ggplot2 bar charts and boxplots. In this post, I’ll discuss two common plot types for displaying the distribution of a numeric variable: the histogram and the density plot. By the end of this tutorial, you’ll know how to make sophisticated ggplot2 histograms and density plots using ezplot. And once again, you’ll be amazed at how simple and intuitive it is. Let’s get started.

Prerequisites

  1. Install a set of development tools
    • On Windows, download and install Rtools.
    • On Mac, install the Xcode command line tools.
    • On Linux, install the R development package, usually called r-devel or r-base-dev.
  2. Install devtools by running install.packages("devtools") in R.

Install and Load ezplot

devtools::install_github("gmlang/ezplot")
library(ezplot)

We’ll use the famous iris dataset, which comes with the base R distribution.

First, we pass iris (note: iris is a data frame) into the function mk_distplot(). It returns a function that we can use to draw a histogram or density plot for any numeric variable in iris.

plt = mk_distplot(iris)

As you may have noticed, every “mk_” function in ezplot takes one and only one input parameter, namely a data frame. Each returns a function that makes plots when you pass in the names of variables in that data frame as quoted strings. This design comes from the simple idea that functions can return other functions, not just data values.
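
To make the idea of a function returning a function concrete, here is a minimal sketch written with base R only. It is not ezplot's source code: the names mk_hist_sketch and plt_sketch are made up for illustration, and the way it turns a bin width into bin edges is just one reasonable choice.

```r
# Hypothetical helper, NOT part of ezplot: a function factory built on base R.
mk_hist_sketch = function(df) {
    # The returned function "remembers" df through its enclosing environment.
    function(varname, binw = NULL, main = "") {
        x = df[[varname]]
        stopifnot(is.numeric(x))
        if (is.null(binw)) {
            # fall back to base R's default rule for choosing bins
            brks = "Sturges"
        } else {
            # bin edges that cover the full range of x in steps of binw
            lo = floor(min(x) / binw) * binw
            hi = ceiling(max(x) / binw) * binw
            brks = seq(lo, hi, by = binw)
        }
        # hist() draws the plot and invisibly returns a "histogram" object
        hist(x, breaks = brks, main = main, xlab = varname)
    }
}

# Pass the data frame once, then reuse the returned function.
plt_sketch = mk_hist_sketch(iris)
h = plt_sketch("Sepal.Length", binw = 0.5, main = "Histogram of Sepal Length")
```

The data frame is supplied once, and every later call only needs a quoted variable name plus options, which is the same calling pattern as plt() in this post.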

Next, we use the function plt() to draw a histogram of Sepal.Length.

# plot histogram for Sepal.Length
title1 = "Histogram of Sepal Length"
p = plt("Sepal.Length", main=title1)
print(p)

# adjust bin width
p = plt("Sepal.Length", main=title1, binw=0.3)
print(p)

# add a vertical line at the mean
p = plt("Sepal.Length", main=title1, binw=0.3, add_vline_mean=TRUE)
print(p)

# add a vertical line at the median
p = plt("Sepal.Length", main=title1, binw=0.3, add_vline_median=TRUE)
print(p)

# add vertical lines at both the mean and the median
p = plt("Sepal.Length", binw=0.3, add_vline_mean=TRUE, add_vline_median=TRUE)
print(p)
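
For comparison, here is roughly what the last histogram looks like when written directly in ggplot2. This is my own sketch and not necessarily what ezplot runs internally; the fill color, border color, and line types are assumptions chosen only to keep the two lines distinguishable.

```r
library(ggplot2)

# Where the two vertical lines will fall.
x_mean = mean(iris$Sepal.Length)
x_median = median(iris$Sepal.Length)

p_raw = ggplot(iris, aes(x = Sepal.Length)) +
    # binwidth plays the role of binw in the plt() calls above
    geom_histogram(binwidth = 0.3, fill = "grey70", color = "white") +
    # dashed line at the mean, dotted line at the median (styling is assumed)
    geom_vline(xintercept = x_mean, linetype = "dashed") +
    geom_vline(xintercept = x_median, linetype = "dotted") +
    ggtitle("Histogram of Sepal Length")
print(p_raw)
```

Each option that plt() takes as a single argument corresponds to its own layer here, which is the kind of detail ezplot is meant to save you from remembering.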

We can also draw density plots instead of histograms.

# draw a density plot for Sepal.Length
p = plt("Sepal.Length", type="density", main="Density Plot of Sepal Length")
print(p)

# add a vertical line at the mean
p = plt("Sepal.Length", type="density", main="Density Plot of Sepal Length", add_vline_mean=TRUE)
print(p)

# add a vertical line at the median
p = plt("Sepal.Length", type="density", main="Density Plot of Sepal Length", add_vline_median=TRUE)
print(p)

# add vertical lines at both the mean and the median
p = plt("Sepal.Length", type="density", add_vline_median=TRUE, add_vline_mean=TRUE)
print(p)
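
If you are wondering what a density plot actually shows: it is a smoothed version of a histogram, a kernel density estimate scaled so that the area under the curve is 1. The sketch below computes one with base R's density() and checks the area numerically. I'm using density()'s defaults here; whether ezplot uses the same bandwidth and kernel is an assumption I haven't verified.

```r
# Kernel density estimate with base R defaults (Gaussian kernel).
d = density(iris$Sepal.Length)

# Approximate the area under the curve with a Riemann sum over the
# equally spaced grid of x values that density() returns.
area = sum(d$y) * (d$x[2] - d$x[1])
print(round(area, 2))

# Draw the curve and mark the mean (dashed) and the median (dotted).
plot(d, main = "Density Plot of Sepal Length", xlab = "Sepal.Length")
abline(v = mean(iris$Sepal.Length), lty = "dashed")
abline(v = median(iris$Sepal.Length), lty = "dotted")
```

The printed area should be very close to 1, which is why the y-axis of a density plot shows densities rather than counts.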

Now, the iris data has a variable called “Species”. Wouldn’t it be nice if we could see how Sepal.Length is distributed within each species? It turns out that this is really easy to do with ezplot. We just need to pass one more parameter, fillby, to plt().

# draw histogram of Sepal.Length by Species
p = plt("Sepal.Length", fillby="Species", main=title1, binw=0.3)
print(p)

# add a vertical line at the mean
p = plt("Sepal.Length", fillby="Species", main=title1, binw=0.3, add_vline_mean=TRUE)
print(p)

# add a vertical line at the median
p = plt("Sepal.Length", fillby="Species", main=title1, binw=0.3, add_vline_median=TRUE)
print(p)

# draw density of Sepal.Length by Species
p = plt("Sepal.Length", fillby="Species", main="Density Plot of Sepal Length", type="density")
print(p)

# add a vertical line at the mean
p = plt("Sepal.Length", fillby="Species", type="density", add_vline_mean=TRUE)
print(p)

# add a vertical line at the median
p = plt("Sepal.Length", fillby="Species", type="density", add_vline_median=TRUE)
print(p)
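
To see how much a grouped plot like this involves in plain ggplot2, here is a sketch of a density plot by Species with a dashed line at each species' mean. It is my own approximation rather than ezplot's internals. In particular, drawing one line per species is an assumption (this post doesn't spell out how the lines are computed when fillby is set), and the transparency value is arbitrary.

```r
library(ggplot2)

# Mean of Sepal.Length within each species, computed with base R.
grp_means = aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(grp_means)

p_grp = ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
    # semi-transparent fills so overlapping curves stay visible
    geom_density(alpha = 0.5) +
    # one dashed line per species, taken from the summary table above
    geom_vline(data = grp_means,
               aes(xintercept = Sepal.Length, color = Species),
               linetype = "dashed") +
    ggtitle("Density Plot of Sepal Length by Species")
print(p_grp)
```

Swapping FUN = mean for FUN = median gives the per-species medians instead.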

I created ezplot out of frustration with how many detailed commands there are to remember when customizing a ggplot. I’d love to hear how ezplot has improved your productivity. In addition, I’m writing a book called ezplot: How to Easily Make ggplot2 Graphics for Data Analysis, and it is 20% complete. Take a sneak peek and get notified when the book is published.
