To work with categorical variables in R, you use a data structure called a factor. A factor is an integer vector with a special attribute called levels (plus the class “factor”). You can think of the levels as labels for the integer values. Given a character vector, you can turn it into a factor using the factor() function.
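As a minimal sketch of that conversion, the snippet below builds a factor from a small character vector and inspects its pieces. The input (the first six lowercase letters) and the names x and f are assumptions made for illustration.

```r
# Hypothetical input: a character vector of the first six lowercase letters
x <- letters[1:6]

# Convert the character vector to a factor
f <- factor(x)

levels(f)      # the labels, sorted alphabetically by default
as.integer(f)  # the underlying integer codes
typeof(f)      # "integer"
class(f)       # "factor"
```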
We can subset a factor and, depending on what we want, get a sub-factor that either preserves all of the original levels or keeps only the levels that actually appear in the sub-factor. Continuing with the example, say we want to take the first 4 letters.
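The two subsetting behaviours can be sketched as follows, assuming the starting factor holds the first six lowercase letters (the names f and y are hypothetical). Plain indexing keeps every original level, while drop = TRUE discards the unused ones.

```r
# Hypothetical starting factor with six levels
f <- factor(letters[1:6])

# Plain subsetting keeps all original levels, including unused ones
y <- f[1:4]
nlevels(y)  # 6

# drop = TRUE keeps only the levels that occur in the subset
z <- f[1:4, drop = TRUE]
nlevels(z)  # 4
levels(z)   # "a" "b" "c" "d"
```

Calling droplevels(y) afterwards would give the same result as using drop = TRUE up front.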
We can re-order the values of a factor. For example, we can reverse the order of the values in z.
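One way to do this, assuming z holds the four letters a to d, is rev(); the name z_rev is hypothetical. Only the elements move; the levels attribute is left alone.

```r
# Assumed example factor: four elements, four levels
z <- factor(c("a", "b", "c", "d"))

# Reverse the elements of the factor
z_rev <- rev(z)

as.integer(z_rev)  # 4 3 2 1
levels(z_rev)      # "a" "b" "c" "d"
```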
Note that the values are reversed to “4 3 2 1” from the original “1 2 3 4”, while the labels remain in the same order.
We can also re-order the levels of a factor. For example, we can reverse the order of the levels in z.
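There are two common ways to reverse the levels, and they are not equivalent, so it is worth seeing both side by side. This is a sketch under the assumption that z holds the letters a to d; the names z_relabel and z_reorder are hypothetical. Assigning to levels() leaves the integer codes untouched and swaps the labels attached to them, so the data themselves change. Rebuilding with factor(z, levels = ...) keeps every element's label and changes the codes instead.

```r
# Assumed example factor: four elements, four levels
z <- factor(c("a", "b", "c", "d"))

# Option 1: assign reversed labels; codes stay 1 2 3 4, elements are relabelled
z_relabel <- z
levels(z_relabel) <- rev(levels(z_relabel))
as.integer(z_relabel)    # 1 2 3 4
as.character(z_relabel)  # "d" "c" "b" "a"

# Option 2: rebuild with reversed level order; elements keep their labels
z_reorder <- factor(z, levels = rev(levels(z)))
as.integer(z_reorder)    # 4 3 2 1
as.character(z_reorder)  # "a" "b" "c" "d"

levels(z_reorder)        # "d" "c" "b" "a" (same for z_relabel)
```

When the goal is only to change the order in which categories are displayed, Option 2 is usually the one you want, because it does not alter which category each observation belongs to.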
Note that the labels are reversed to “d, c, b, a” from the original “a, b, c, d”, while the values remain in the same order.
Now let’s make some fake numeric data for the levels of z and make a ggplot2 bar chart.
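A minimal sketch of such a chart follows. The numbers in val are made up for illustration, the data frame name df and plot name p are hypothetical, and the code assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Made-up values, one per category
df <- data.frame(
  cat = factor(c("a", "b", "c", "d")),
  val = c(3, 7, 2, 5)
)

# geom_col() draws bars whose heights are taken directly from val
p <- ggplot(df, aes(x = cat, y = val)) + geom_col()
print(p)
```

The bars appear in the order of the levels of cat, which is alphabetical here.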
It’d be nice to order the bars from tallest to shortest. To do that, we can make cat a factor whose levels are the categories arranged in descending order of val.
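This can be sketched with order(), using the same made-up numbers as an assumption (df is a hypothetical data frame name). The categories are sorted by val and that sorted vector is passed as the levels argument.

```r
# Made-up data; cat starts as a plain character column
df <- data.frame(
  cat = c("a", "b", "c", "d"),
  val = c(3, 7, 2, 5),
  stringsAsFactors = FALSE
)

# Categories arranged from largest val to smallest
lev <- df$cat[order(df$val, decreasing = TRUE)]

# Rebuild cat as a factor with those levels; the rows themselves do not move
df$cat <- factor(df$cat, levels = lev)
levels(df$cat)  # "b" "d" "a" "c"
```

Plotting df with cat on the x axis would then draw the bars in this level order.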
Alternatively, we can use reorder() and rank() to reorder just the levels of cat by descending order of val.
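One possible way to combine the two functions is shown below, again with made-up numbers and a hypothetical data frame df. reorder() sorts the levels by ascending score, so ranking the negated values puts the largest val first. This is a sketch, not necessarily the exact call the original example used; reorder(df$cat, -df$val) would give the same level order here.

```r
# Made-up data with cat already stored as a factor
df <- data.frame(
  cat = factor(c("a", "b", "c", "d")),
  val = c(3, 7, 2, 5)
)

# rank(-val) gives rank 1 to the largest val; reorder() sorts levels by that score
df$cat <- reorder(df$cat, rank(-df$val))
levels(df$cat)  # "b" "d" "a" "c"
```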
Notice that rank() returns the rank of each value in its input vector, whereas order() returns the indices that would sort its input vector.
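The difference is easiest to see on a small vector. The numbers below are made up for illustration.

```r
val <- c(3, 7, 2, 5)

# rank(): position each element would take after sorting
rank(val)        # 2 4 1 3

# order(): indices to pick, in sequence, to obtain the sorted vector
order(val)       # 3 1 4 2

# Indexing by order() sorts the vector
val[order(val)]  # 2 3 5 7
```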