Handy R Functions for Summary Statistics

When analyzing a variable, one of the first things you want to do is to count how many non-missing (or non-NA) values the variable has. Unfortunately, there’s no default functions in R that perform this simply task. The length() function counts every element including the NAs. But it’s not hard to use it together with a if clause to handle NAs.

length2 <- function (x, na.rm=TRUE) {
        # A version of length that can handle NA: if na.rm==T, don't count them
        # 
        # Args:
        #       x    : a vector
        #       na.rm: TRUE or FALSE
        # Returns:
        #       the length of x
        if (na.rm) sum(!is.na(x))
        else       length(x)
}

The default summary() function only returns the min, 1st quantile, median, mean, 3rd quantile and max of the input vector. However, you often also want to know its non-NA value counts, standard deviation, skewness and excess kurtosis. It’d be nice if there’s one function that returns all these summary statistics. So I wrote summary2(), which does exactly that. It leverages length2(), no_na_summary(), and the skewness() and kurtosis() functions in the e1071 package.

no_na_summary = function(x, na.rm=TRUE) {
        # Removes NA in a vector and apply summary to it
        # 
        # Args:
        #       x    : a numeric vector
        #       na.rm: TRUE or FALSE
        # Returns:
        #       the summary statistics of x
        summary(x[!is.na(x)])
}

summary2 = function(x) {
        # Removes NA in a numeric vecotr and Computes some summary statistics 
        #
        # Args:
        #       x: a numeric vector
        # Returns:
        #       min, 1st quantile, median, mean, 3rd quantile, max, sd,
        #       non-NA count of x, skewness, and excess kurtosis
        funs = c(no_na_summary, sd, length2, e1071::skewness, e1071::kurtosis)
        summ.stats = unlist(lapply(funs, function(f) f(x, na.rm=TRUE)))
        names(summ.stats) = c("min", "q1", "median", "mean", "q3", "max", 
                              "sd", "n", "skewness", "excess.kurtosis")
        summ.stats
}

Use these functions and tell others how they’ve made your daily data analysis job easier.

Categories