Creates a summary table (a tibble) of counts and percentages for a categorical variable.

Usage

dataSumm(var, na.rm = TRUE, sort_n = FALSE)

Arguments

var

A column selected from a tibble/data frame. It must be a categorical/factor variable, which is summarized into a table.

na.rm

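Logical, defaults to TRUE. If TRUE, NA values are dropped and do not count toward the percentages. If FALSE, NA is reported as its own response.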

sort_n

Logical, defaults to FALSE. If TRUE, sorts the data by the count of each response (n_answers). If FALSE, sorts by response.

Value

A tibble with 5 columns: question, response, n_answers, percent_answers and percent_answers_label. question is the name of the original item; response holds each of the possible categorical responses for the item; n_answers is the count of each response; percent_answers is the share of each response, expressed as a proportion between 0 and 1 (for example, 0.15 for 15%); and percent_answers_label is a character variable giving that percentage with a percent sign, for use as a label.

Examples

data <- dplyr::tibble(
  role = factor(c(
    "Faculty", "Postdoc", "Undergraduate student", "Graduate student",
    "Graduate student", "Postdoc", "Postdoc", "Faculty",
    "Faculty", "Graduate student", "Graduate student", "Postdoc",
    "Faculty", "Faculty", "Faculty", "Faculty", "Faculty", "Graduate student",
    "Undergraduate student", "Undergraduate student", "NA", "NA"
  ), levels = c("Undergraduate student", "Graduate student", "Postdoc","Faculty"))
)

data %>%
  dplyr::select(role) %>%
  dataSumm()
#> # A tibble: 4 × 5
#>   question response              n_answers percent_answers percent_answers_label
#>   <chr>    <fct>                     <int>           <dbl> <chr>                
#> 1 role     Undergraduate student         3            0.15 15%                  
#> 2 role     Graduate student              5            0.25 25%                  
#> 3 role     Postdoc                       4            0.2  20%                  
#> 4 role     Faculty                       8            0.4  40%                  

# Include NA values and sort by the count of each response:
data %>%
  dplyr::select(role) %>%
  dataSumm(na.rm = FALSE, sort_n = TRUE)
#> # A tibble: 5 × 5
#>   question response              n_answers percent_answers percent_answers_label
#>   <chr>    <fct>                     <int>           <dbl> <chr>                
#> 1 role     Faculty                       8          0.364  36%                  
#> 2 role     Graduate student              5          0.227  23%                  
#> 3 role     Postdoc                       4          0.182  18%                  
#> 4 role     Undergraduate student         3          0.136  14%                  
#> 5 role     NA                            2          0.0909 9%
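
# A rough sketch of how a comparable table could be built by hand with dplyr.
# This is an assumption inferred from the output above, NOT dataSumm()'s actual
# source: percent_answers is taken to be n_answers divided by the total count,
# and the label is that share rounded to a whole percent.
# `toy` and `manual_summ` are hypothetical names used only for illustration.
library(dplyr)

toy <- dplyr::tibble(role = factor(c("Faculty", "Postdoc", "Faculty", NA)))

manual_summ <- toy %>%
  dplyr::filter(!is.na(role)) %>%             # comparable to na.rm = TRUE
  dplyr::count(role, name = "n_answers") %>%  # one row per observed response
  dplyr::mutate(
    question = "role",                        # name of the original item
    percent_answers = n_answers / sum(n_answers),
    percent_answers_label = paste0(round(percent_answers * 100), "%")
  ) %>%
  dplyr::select(question, response = role, n_answers,
                percent_answers, percent_answers_label)

manual_summ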