The simplest data structure in R is the vector. A vector is one dimensional and can be imagined as a sequence of blocks containing values:
| v1 | v2 | ... |
An R vector can have any length, but its elements must all have the same data type. (We’ll see later that each element is itself a length-1 vector.) There are four common types: logical, integer, double (also called numeric), and character. In this series, a numeric vector is one whose elements are doubles (R’s own is.numeric() also returns TRUE for integer vectors). Similarly, a logical vector has only TRUE or FALSE as its elements, an integer vector contains only integers, and a character vector has only strings. Given a vector x, we can call typeof(x) to find its type.
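As a quick illustration of the four types, here is a small sketch that builds one vector of each and asks typeof() about it. The variable names are made up for this example.

```r
# One vector of each common type (variable names are illustrative only)
lgl <- c(TRUE, FALSE)
int <- c(1L, 2L, 3L)     # the L suffix marks an integer literal
dbl <- c(1.5, 2, 3.25)   # plain number literals are doubles
chr <- c("a", "b")

typeof(lgl)  # "logical"
typeof(int)  # "integer"
typeof(dbl)  # "double"
typeof(chr)  # "character"

# is.numeric() is TRUE for both integer and double vectors
is.numeric(int)  # TRUE
is.numeric(dbl)  # TRUE
```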
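There’s an empty vector for each type, which we can write by calling the type’s constructor with length 0, for example logical(0), integer(0), double(0), and character(0).
An empty vector has 0 elements and length 0.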
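One way to check this for ourselves is to build an empty vector of each type and inspect it. This is only a sketch; the variable names are made up.

```r
# An empty vector of each common type
empty_lgl <- logical(0)
empty_int <- integer(0)
empty_dbl <- double(0)     # numeric(0) creates the same thing
empty_chr <- character(0)

length(empty_int)   # 0
typeof(empty_int)   # "integer"
typeof(empty_chr)   # "character"

identical(double(0), numeric(0))  # TRUE
```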
Unlike most other programming languages, R doesn’t have scalar types or values (or, as I like to call them, singletons). What appear to be singletons are really just vectors of length one. For example, literals like TRUE, 1L, 1.5, and "R is awesome" are vectors of length 1, and each has a different type. Constants like NA, NA_integer_, NA_real_, and NA_character_ are also vectors of length 1, where NA has type logical even though its type isn’t written explicitly like the others.
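We can confirm that these "singletons" are ordinary vectors with length() and typeof(). The literals below are just sample values chosen for this sketch.

```r
# Literals are vectors of length 1
length(TRUE)            # 1
length(1L)              # 1
length(1.5)             # 1
length("R is awesome")  # 1

# Wrapping a single value in c() changes nothing
identical(1.5, c(1.5))  # TRUE

# Each NA constant has its own type
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```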
Length-n vectors, n > 1
The syntax for a vector with at least 2 values is c(v1, v2, ..., vn). (Now that we know each value v is itself a length-1 vector, we’ll stop repeating this and simply treat such values as if they were single atomic values.) More generally, we can make a vector with c(e1, ..., en), where each expression e is evaluated to a value. In practice, it’s more common to make a vector with c(e1, e2), called “e1 combined with e2.” Here e1 evaluates to a “vector of type t” and e2 evaluates to another “vector of type t.” The result is a new vector that starts with the elements in e1 followed by the elements in e2. When e1 evaluates to a single value, borrowing the word “cons” from FP (Functional Programming), c(e1, e2) can also be called “e1 consed onto e2.” The result is then a new vector that starts with the value of e1 followed by the elements in e2.
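Here is a small sketch of both uses of c(): combining two vectors of the same type, and consing a single value onto a vector. The names e1 and e2 mirror the text, and the values are arbitrary.

```r
# Combining: the elements of e1 followed by the elements of e2
e1 <- c(1L, 2L)
e2 <- c(3L, 4L, 5L)
combined <- c(e1, e2)   # 1 2 3 4 5

# Consing: a single value placed in front of a vector
consed <- c(0L, e2)     # 0 3 4 5

# Both results keep the type of their inputs
typeof(combined)  # "integer"

# Combining with an empty vector leaves the other vector unchanged
identical(c(integer(0), e2), e2)  # TRUE
```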
How to use vectors
One goal of this RFP (R Functional Programming) series is to learn the fundamental ideas of functional programming using R. These ideas are very powerful, and the first one we’ll look at is the emphasis on recursion. As we’ll see, thanks to recursion, all we need when working with vectors are three basic operations:
- Check if a vector is empty.
- Get the first element of a vector, raising an exception if the vector is empty.
- Get the tail of a vector without its first element, raising an exception if the vector is empty.
With these three operations, we can solve almost all problems that involve one or more vectors. But R doesn’t provide them in exactly this form out of the box. Instead, we have to write our own functions for them.
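The three operations could be sketched as follows. Only the name tl() appears in the text; is_empty() and hd() are names assumed for this sketch, and the error messages are made up.

```r
# Check if a vector is empty (hypothetical name)
is_empty <- function(x) {
  length(x) == 0
}

# First element of a vector; error if the vector is empty (hypothetical name)
hd <- function(x) {
  if (is_empty(x)) stop("hd() called on an empty vector")
  x[[1]]
}

# Everything except the first element; error if the vector is empty
tl <- function(x) {
  if (is_empty(x)) stop("tl() called on an empty vector")
  x[-1]
}

hd(c(3L, 4L, 5L))  # 3
tl(c(3L, 4L, 5L))  # 4 5
tl(7L)             # integer(0): the tail of a length-1 vector is empty
```

Note that tl() returns a vector of the same type as its input, so the tail of a length-1 integer vector is the empty integer vector.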
Once we have these functions, including tl() for the tail, we can use them inside recursive functions to perform complex operations on vectors. For example, we can sum up all values in an integer or numeric vector.
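A minimal sketch of such a sum_vec() might look like this. The helper names is_empty() and hd() are assumptions (only tl() is named in the text), and the helpers are repeated in compact form so the block runs on its own.

```r
# Helpers (is_empty and hd are hypothetical names)
is_empty <- function(x) length(x) == 0
hd <- function(x) if (is_empty(x)) stop("empty vector") else x[[1]]
tl <- function(x) if (is_empty(x)) stop("empty vector") else x[-1]

# Sum of all elements: 0 for the empty vector, otherwise
# the first element plus the sum of the tail
sum_vec <- function(x) {
  if (is_empty(x)) {
    0
  } else {
    hd(x) + sum_vec(tl(x))
  }
}

sum_vec(c(1L, 2L, 3L))   # 6
sum_vec(c(1.5, 2.5))     # 4
sum_vec(integer(0))      # 0
```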
We can also count down from n and return an integer vector with elements n, n-1, …, 1, which is the empty integer vector when n is 0.
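A countdown() along these lines could be sketched as below. It follows the base case the text describes (n equal to 0 gives an empty integer vector), so the result runs from n down to 1. The stopifnot() guard and the as.integer() coercion are additions assumed for this sketch.

```r
# Count down from n: an empty integer vector for 0, otherwise
# n consed onto the countdown from n - 1
countdown <- function(n) {
  stopifnot(n >= 0)   # assumed guard: n must be a non-negative whole number
  if (n == 0) {
    integer(0)
  } else {
    c(as.integer(n), countdown(n - 1))
  }
}

countdown(3)   # 3 2 1
countdown(0)   # integer(0)
```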
Both sum_vec() and countdown() are recursive functions. A recursive function has a base case and a recursive case. For example, in sum_vec(), the base case is when the input vector is empty, and the result is just 0. When the vector is not empty, we enter the recursive case and get the result by adding its first element to the result of calling sum_vec() on its tail, which is also a vector. In countdown(), the base case is when n is 0, and the result is an empty integer vector. We enter the recursive case as long as n > 0, and get the result by consing n onto the result of calling countdown() on n-1. In general, when thinking about recursion, we want to reason as follows:
- What’s the base case? What should the result be under the base case?
- What’s the recursive case? How can the result be expressed in terms of the result for the sub-problem (for example, the rest of the vector or n-1)?
It is not a coincidence that we’ve written both sum_vec() and countdown() recursively. From the FP perspective, recursion is almost always THE approach when processing or building vectors, because a vector can grow or shrink and its length isn’t needed for recursion to work. The alternative approach of using loops and assignment statements is considered inferior and is discouraged. To learn why, google “loops discouraged in functional programming.” There are many good and thorough discussions of this topic on the internet. Here I’ll just give a superficial but important reason: it takes more lines to write the same sum_vec() using a while-loop. We’ll also need extra things that recursion doesn’t, namely local variables and assignment statements.
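For comparison, here is one way the while-loop version might look. The name sum_vec_loop() is made up for this sketch. Notice the two local variables, total and i, and the assignments that update them on every pass.

```r
# Loop-based sum: needs a running total and an index variable
sum_vec_loop <- function(x) {
  total <- 0
  i <- 1
  while (i <= length(x)) {
    total <- total + x[[i]]   # update the running total
    i <- i + 1                # move to the next element
  }
  total
}

sum_vec_loop(c(1L, 2L, 3L))  # 6
sum_vec_loop(integer(0))     # 0
```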