How to easily make ggplot2 type of scatterplot, ezplot - Part 7


By Guangming Lang

When you have two continuous variables and want to see how they relate, you draw a scatterplot, putting one variable on the x-axis and the other on the y-axis. In this post, we’ll look at how to draw scatterplots using the mk_scatterplot() function from the ezplot package. We’ll also use the scale_axis() function, which makes it easy to change the scale of the x-axis or y-axis. Let’s get started.

Prerequisites

  1. Install a set of development tools
    • On Windows, download and install Rtools.
    • On Mac, install the Xcode command line tools.
    • On Linux, install the R development package, usually called r-devel or r-base-dev.
  2. Install devtools by running install.packages("devtools") in R.

Install and Load ezplot

devtools::install_github("gmlang/ezplot")
library(ezplot)
# mk_scatterplot() returns a function that we can use to draw scatterplots
plt = mk_scatterplot(films)

# draw a scatterplot
p = plt("budget", "boxoffice", xlab="budget", ylab="boxoffice",
        main="Boxoffice vs. Budget")

# use comma scale so that a number like "10000" is displayed as "10,000"
p = scale_axis(p, "y", scale = "comma")
p = scale_axis(p, "x", scale = "comma")
print(p)

[Figure: scatterplot of boxoffice vs. budget, both axes on a comma scale]
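The comment in the code above says that mk_scatterplot() returns a function. That pattern is often called a function factory: the data frame is bound once, and the returned function can then be reused for any pair of columns. Here is a minimal base R sketch of the idea. The name make_scatter and the toy data are hypothetical, made up for illustration; this is not how ezplot is actually implemented.

```r
# Hypothetical stand-in for mk_scatterplot(): a function factory that
# remembers a data frame and returns a plotting function.
make_scatter <- function(df) {
  function(xvar, yvar, ...) {
    # fail early if a column name is misspelled
    stopifnot(xvar %in% names(df), yvar %in% names(df))
    # base graphics scatterplot; extra arguments such as main= pass through
    plot(df[[xvar]], df[[yvar]], xlab = xvar, ylab = yvar, ...)
    # return the plotted columns invisibly so they can be inspected
    invisible(df[c(xvar, yvar)])
  }
}

# Made-up data, only for illustration (not the films dataset)
toy <- data.frame(budget = c(1, 5, 20, 80), boxoffice = c(2, 4, 35, 150))

plt_toy <- make_scatter(toy)  # bind the data once
plotted <- plt_toy("budget", "boxoffice", main = "Toy example")
```

Because the data frame lives in the enclosing environment of plt_toy, you don't have to pass it again for each new plot.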

It might be more informative if we use log10 scale instead.

p = plt("budget", "boxoffice", xlab="budget", ylab="boxoffice", 
        pt_alpha=0.2, pt_size=1.5, add_line=T, linew=0.8)
p = scale_axis(p, "x", scale = "log10")
p = scale_axis(p, "y", scale = "log10")
print(p)

[Figure: scatterplot of boxoffice vs. budget on log10 scales, with a regression line]

Note that we also changed the transparency and size of the points by passing values to the pt_alpha and pt_size arguments. We added a regression line by setting add_line=T and gave it a narrower width by setting linew=0.8.
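It helps to know what a regression line on log10 axes represents. ggplot2 applies scale transformations before it computes statistics, so if add_line draws a linear-model fit (an assumption about ezplot, not something confirmed here), the line is a straight-line fit of log10(boxoffice) on log10(budget). The base R sketch below reproduces that fit on simulated data. The numbers are made up for illustration and are not the films dataset.

```r
# Simulated data, for illustration only (not the films dataset)
set.seed(42)
n <- 500
budget <- 10^runif(n, min = 4, max = 8)  # spans four orders of magnitude
boxoffice <- 10^(0.5 + 0.9 * log10(budget) + rnorm(n, sd = 0.3))

# Fit the line on the log10 values, i.e. in the coordinates the plot displays
fit <- lm(log10(boxoffice) ~ log10(budget))
intercept <- unname(coef(fit)[1])
slope <- unname(coef(fit)[2])

# The raw values are heavily right-skewed, the log10 values are not:
# compare how far the maximum sits from the median in each case
raw_ratio <- max(budget) / median(budget)
log_ratio <- max(log10(budget)) / median(log10(budget))
```

A slope of b on the log-log scale means that multiplying the budget by 10 multiplies the fitted boxoffice by 10^b. The two ratios show why the log10 plot is easier to read: on the raw scale a few very large films stretch the axis and squeeze most points into a corner.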

Finally, the dataset contains a variable called “made_money”, indicating if a film made money or not. We can use it to separate the points into two groups, where each group has its own regression line.

p = plt("budget", "boxoffice", fillby="made_money", ylab="boxoffice", 
        xlab="budget", add_line=T, linew=0.5)
p = scale_axis(p, "x", scale = "log10")
p = scale_axis(p, "y", scale = "log10")

# use color-blind friendly colors
red = cb_color("reddish_purple")
green = cb_color("bluish_green")
p + ggplot2::scale_color_manual(values=c(red, green))

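[Figure: scatterplot of boxoffice vs. budget on log10 scales, colored by made_money, with one regression line per group]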

See how easy it is? All we need is to pass the name of the grouping variable, “made_money”, to the fillby argument.
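Under the hood, one line per group amounts to splitting the data by the grouping variable and fitting a separate regression to each piece. Here is a rough base R sketch on simulated data. The definition of made_money used here (boxoffice greater than budget) is a hypothetical stand-in, since the post does not say how the variable is defined in the films dataset.

```r
# Simulated data, for illustration only (not the films dataset)
set.seed(7)
n <- 300
toy <- data.frame(budget = 10^runif(n, min = 4, max = 8))
toy$boxoffice <- 10^(log10(toy$budget) + rnorm(n, sd = 0.5))

# Hypothetical definition of the grouping variable for this sketch
toy$made_money <- toy$boxoffice > toy$budget

# Split by group, then fit one log-log regression per group
by_group <- split(toy, toy$made_money)
fits <- lapply(by_group, function(d) {
  coef(lm(log10(boxoffice) ~ log10(budget), data = d))
})

# One row per group: intercept and slope of that group's line
coefs <- do.call(rbind, fits)
```

Each row of coefs corresponds to one of the colored lines you would see in a grouped plot of the same data.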

I created ezplot because there are too many detailed commands to remember when making and customizing a ggplot. If ezplot has improved your productivity, please tell your friends about it. In addition, I’m writing a book called ezplot: How to Easily Make ggplot2 Graphics for Data Analysis, and it is 20% complete. Read the sample chapters for FREE and get notified when the book is published.
