The simplest data structure in R is the vector. A vector is one dimensional and can be imagined as a sequence of blocks containing values:
| v1 | v2 | ... |
A R vector can have any length. Its elements must have the same data type. (We’ll see later that each element is itself a length-1 vector.) There are four common types: logical, integer, double (also called numeric), and character. A vector is numeric if and only if its elements are doubles. Similarly, a logical vector has TRUE
or FALSE
as its elements. An integer vector contains only integers, and a character vector has only strings. Given a vector x
, we can call typeof(x)
to find its type.
Empty vectors
There’s an empty vector for each type, with the following syntax:
logical()
integer()
numeric()
character()
An empty vector has 0 elements and 0 length.
Length-1 vectors
Unlike most other programming languages, R doesn’t have scalar types or values (or as I like to call them, singletons). What appear as singletons are really just vectors of length one. For example, literals like TRUE
, 1L
, 3.124
, "R is awesome"
are vectors of length 1, and each has a different type. Constants like NA
, NA_integer_
, NA_real_
and NA_character_
are also vectors of length 1, where NA
has type logical even though it’s not written explicitly like the others.
Length-n vectors, n > 1
The syntax for a vector with at least 2 values is c(v1, v2, ..., vn)
. (Now we know each value v
is itself a length-1 vector, we’ll stop repeating this and simply treat them as if they are single atomic values.) We can make a vector with c(e1, ..., en)
where each expression1 e
is evaluated to a value. In practice, it’s more common to make a vector with c(e1, e2)
, called “e1
combined with e2
,” where e1
evaluates to a “vector of type t
” and e2
evaluates to another “vector of type t
.” The result is a new vector that starts with the elements in e1
followed by the elements in e2
. When e1
evaluates to a single value, borrowing the word “cons” from FP (Functional Programming), c(e1, e2)
can also be called “e1
consed onto e2
.” The result is then a new vector that starts with the value of e1
followed by the elements in e2
.
How to use vectors
One goal of this RFP (R Functional Programming) series is to learn the fundamental ideas of functional programming using R2. These ideas are very powerful, and the first one we’ll look at is the emphasis on recursion. As we’ll see, because of recursion, all we need, when working with vectors, are three basic operations:
- Check if a vector is empty.
- Get the first element of a vector, raising an exception if the vector is empty.
- Get the tail of a vector without its first element, raising an exception if the vector is empty.
And we can solve almost all problems that involve one or more vectors. But R doesn’t provide these basic operations perfectly out of the box. Instead, we have to write our own functions for them.
Having defined is_empty()
, hd()
and tl()
, we can use them inside of recursive functions to perform complex operations on vectors. For example, we can sum up all values in an integer or numeric vector.
We can also count from n down to 0 and return a vector with integer elements of n, n-1, …, 0.
Both sum_vec()
and countdown()
are recursive functions. A recursive function (or recursion) has a base case and a recursive case. For example, in sum_vec()
, the base case is when the input vector is empty, and the result is just 0. When the vector is not empty, we enter the recursive case and get the result by adding its first element and the result of calling sum_vec()
on its tail, which is also a vector. In countdown()
, the base case is when n is 0, and the result is an empty integer vector. We enter the recursive case as long as n > 0, and get the result by consing n onto the result of calling countdown()
on n-1. In general, when thinking about recursion, we want to reason as follows:
- What’s the base case? What should the result be under the base case?
- What’s the recursive case? How can the result be expressed in terms of the result for the sub-problem (for example, the rest of the vector or n-1).
It is not a coincidence that we’ve written both sum_vec()
and countdown()
recursively. From the FP perspective, recursion is almost always THE approach when processing or building vectors because a vector can grow or shrink and its length isn’t needed for recursion to work. The alternative approach of using loops and assignment statements is inferior and discouraged. To learn why, google “loops discouraged in functional programming.” There are many good and thorough discussions about this topic on the internet. Here I’ll just give a superficial but important reason: it takes more lines to write the same sum_vec()
using a while-loop. We’ll also need extra things that recursion doesn’t, namely, local variables and assignment statements.