How to Find Consecutive Repeats in R

Q: Given a sequence of random numbers, find the start and end positions of runs of five or more consecutive numbers that are greater than zero.

A: Use the rle() function.

For example, let’s apply rle() to the following sequence of numbers.

seq3 = c(2,2,2,2,2,5,3,7,7,7,2,2,5,5,5,3,3,3)
( rle.seq3 = rle(seq3) )

## Run Length Encoding
##   lengths: int [1:7] 5 1 1 3 2 3 3
##   values : num [1:7] 2 5 3 7 2 5 3

We see that rle() returns a list with two elements, lengths and values. The values element holds the value repeated in each run, and the lengths element holds the run length, i.e. the number of consecutive repeats in that run. For example, the first run is the number 2 repeated 5 times, and the second run is the number 5, which appears only once.
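As a quick sanity check on this reading of the output, repeating each value by its run length should rebuild the original vector. The sketch below does that with rep() and with base R's inverse.rle(), reusing the seq3 example.

```r
# The toy sequence from above
seq3 = c(2,2,2,2,2,5,3,7,7,7,2,2,5,5,5,3,3,3)
r = rle(seq3)

# Expanding each value by its run length recovers the original vector
expanded = rep(r$values, times = r$lengths)
identical(expanded, seq3)          # TRUE

# inverse.rle() in base R performs the same expansion
identical(inverse.rle(r), seq3)    # TRUE

# The run lengths always add up to the length of the input
sum(r$lengths) == length(seq3)     # TRUE
```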

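Now let's solve the original question. First, we set the seed and generate 100 standard normal random numbers. We then apply rle() to the logical vector rnums > 0, which is TRUE wherever a number is greater than zero, so each run is a stretch of consecutive positive (TRUE) or non-positive (FALSE) numbers.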

set.seed(201)
rnums = rnorm(100)
runs = rle(rnums > 0)
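Because rle() is applied to the logical vector rnums > 0 rather than to the numbers themselves, its values are TRUE or FALSE. The short vector below is made-up illustrative data, not part of the rnorm() draw, and shows what such an encoding looks like.

```r
# Hypothetical hand-made data, for illustration only
x = c(-1.2, 0.4, 2.1, 0.3, -0.5, 0.8)
pos = x > 0            # FALSE TRUE TRUE TRUE FALSE TRUE
r = rle(pos)
r$lengths              # 1 3 1 1
r$values               # FALSE TRUE FALSE TRUE
```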

Next, we find the indices of the runs of TRUE values (i.e. runs of positive numbers) with a length of at least 5.

# runs$values is already logical, so it can be combined with & directly
myruns = which(runs$values & runs$lengths >= 5)
# check that at least one such run exists
length(myruns) > 0

## [1] TRUE
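To see the selection step on data where the answer is easy to check by eye, here is a small hand-made logical vector (illustrative only). Since the values element is already logical, it can be combined with the length condition directly using &, and length() > 0 tests whether any run qualified.

```r
# Hypothetical logical vector: a run of 6 TRUEs, later a run of 2 TRUEs
pos = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
r = rle(pos)
# r$lengths: 1 6 1 2 ; r$values: FALSE TRUE FALSE TRUE

# Keep runs that are TRUE and at least 5 long
keep = which(r$values & r$lengths >= 5)
keep                   # 2 (only the second run qualifies)

# With a threshold no run meets, which() returns integer(0)
none = which(r$values & r$lengths >= 10)
length(none) > 0       # FALSE
```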

Next, we compute the cumulative sum of the run lengths. Each element of this cumulative sum is the end position of the corresponding run, so indexing it with the indices found above gives the end positions of the runs of length at least 5.

runs.lengths.cumsum = cumsum(runs$lengths)
ends = runs.lengths.cumsum[myruns]
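Continuing with a small hand-made vector (illustrative data only), the cumulative sum of the run lengths lists the last position of every run, and indexing it with the selected run indices picks out the end positions we want.

```r
# Hypothetical logical vector: run lengths 1 6 1 2
pos = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
r = rle(pos)

# Last position of each run
ends_all = cumsum(r$lengths)     # 1 7 8 10

# End positions of the TRUE runs of length >= 5
keep = which(r$values & r$lengths >= 5)
ends_all[keep]                   # 7 (positions 2 to 7 are TRUE)
```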

Next, we find the start positions of these runs.

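# index of the run just before each selected run (0 if the selected run is the first run)
newindex = ifelse(myruns>1, myruns-1, 0)
# a run starts one position after the previous run ends;
# note that indexing with 0 silently drops that element
starts = runs.lengths.cumsum[newindex] + 1
# so if the very first run qualifies, prepend its start position, 1
if (0 %in% newindex) starts = c(1,starts)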
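As an alternative to the approach above (not the one used in this post), each start position can also be computed as the run's end position minus its length plus one, which needs no special case for a run that begins at position 1. The sketch below checks the two approaches against each other on made-up data whose first run qualifies.

```r
# Hypothetical data: runs of 5 TRUE, 1 FALSE, 6 TRUE
pos = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
r = rle(pos)
keep = which(r$values & r$lengths >= 5)   # 1 3
cs = cumsum(r$lengths)                    # 5 6 12
ends = cs[keep]                           # 5 12

# Alternative: start = end - length + 1
starts = ends - r$lengths[keep] + 1       # 1 7

# The ifelse()/prepend approach from the text, for comparison
newindex = ifelse(keep > 1, keep - 1, 0)
starts_orig = cs[newindex] + 1
if (0 %in% newindex) starts_orig = c(1, starts_orig)

all(starts == starts_orig)                # TRUE
```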

Lastly, we print out the start and end positions of these runs and use them to extract the runs themselves.

print(starts)

## [1] 10 68 75

print(ends)

## [1] 14 73 79

print(rnums[starts[1]:ends[1]])

## [1] 0.1890041 0.6932962 0.2238094 0.3984569 1.0134744

print(rnums[starts[2]:ends[2]])

## [1] 0.5311486 0.1588756 1.1229208 0.7904306 2.0994378 0.8786987

print(rnums[starts[3]:ends[3]])

## [1] 0.5789541 0.6795760 1.1309282 1.0107847 1.9778476
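The steps above can be bundled into a reusable helper. find_runs() below is a hypothetical name used only for illustration, not a base R function; it uses the end-minus-length-plus-one formula for the start positions and returns one row per qualifying run, giving a zero-row data frame when no run qualifies.

```r
# find_runs() is a hypothetical helper, not part of base R.
# Returns one row per run of TRUE in `cond` that is at least `min_len` long.
find_runs = function(cond, min_len = 5) {
  r = rle(cond)
  keep = which(r$values & r$lengths >= min_len)
  ends = cumsum(r$lengths)[keep]
  starts = ends - r$lengths[keep] + 1L
  data.frame(start = starts, end = ends, length = r$lengths[keep])
}

# Usage on a small illustrative vector (made-up data)
x = c(0.5, 1.2, 0.3, 0.9, 2.0, -0.1, 0.4, 0.7, -2.0,
      0.2, 0.6, 0.8, 1.1, 0.05, 0.3)
runs_df = find_runs(x > 0, min_len = 5)
runs_df   # two rows: positions 1 to 5 and positions 10 to 15

# Extract each qualifying run as its own vector
run_values = Map(function(s, e) x[s:e], runs_df$start, runs_df$end)
```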
