Skip to contents

Options for Importing Categorical Variables

When importing data from REDCap using read_redcap(), you have several options for handling coded categorical variables. These options determine how the coded values are represented in your R environment.

For this vignette, we will be using a sample classic project with a form that comprises most common REDCap data types.

library(REDCapTidieR)
token <- "123456789ABCDEF123456789ABCDEF04"
redcap_uri <- "https://my.institution.edu/redcap/api/"

Using raw_or_label = "raw" retrieves the raw coded values for categorical variables. This approach preserves the original coding, but you’ll need to separately reference the data dictionary from REDCap to interpret the meaning of each code.

redcap_form <-
  read_redcap(
    redcap_uri,
    token,
    raw_or_label = "raw"
  ) |>
  extract_tibble("labelled_vignette")

redcap_form
#> # A tibble: 4 × 7
#>   record_id text_box_1 radio_buttons_1 checkbox___1 checkbox___2 checkbox___3
#>       <dbl> <chr>                <dbl>        <dbl>        <dbl>        <dbl>
#> 1         1 Record 1                 1            1            0            0
#> 2         2 Record 2                 2            1            1            0
#> 3         3 Record 3                 3            0            1            1
#> 4         4 Record 4                 1            0            0            0
#> # ℹ 1 more variable: form_status_complete <dbl>

The default option, raw_or_label = "label", replaces each code with its corresponding label and converts categorical variables into factors. This is convenient for analysis but discards the original numeric codes, which may be necessary for tasks like data cleaning or re-exporting to other formats (e.g., Stata or SPSS).

redcap_form <-
  read_redcap(
    redcap_uri,
    token,
    raw_or_label = "label"
  ) |>
  extract_tibble("labelled_vignette")

redcap_form
#> # A tibble: 4 × 7
#>   record_id text_box_1 radio_buttons_1 checkbox___1 checkbox___2 checkbox___3
#>       <dbl> <chr>      <fct>           <lgl>        <lgl>        <lgl>       
#> 1         1 Record 1   A               TRUE         FALSE        FALSE       
#> 2         2 Record 2   B               TRUE         TRUE         FALSE       
#> 3         3 Record 3   C               FALSE        TRUE         TRUE        
#> 4         4 Record 4   A               FALSE        FALSE        FALSE       
#> # ℹ 1 more variable: form_status_complete <fct>

A third option, raw_or_label = "haven_labelled", imports categorical variables as labelled vectors using the “haven_labelled” class from the haven package (cf. vignette("semantics", package = "haven")). This method imports your categorical variables using their original coding and attaches the corresponding value labels to them as metadata.

redcap_form <-
  read_redcap(
    redcap_uri,
    token,
    raw_or_label = "haven"
  ) |>
  extract_tibble("labelled_vignette")

redcap_form
#> # A tibble: 4 × 7
#>   record_id text_box_1 radio_buttons_1 checkbox___1 checkbox___2 checkbox___3
#>       <dbl> <chr>      <dbl+lbl>       <lgl>        <lgl>        <lgl>       
#> 1         1 Record 1   1 [A]           TRUE         FALSE        FALSE       
#> 2         2 Record 2   2 [B]           TRUE         TRUE         FALSE       
#> 3         3 Record 3   3 [C]           FALSE        TRUE         TRUE        
#> 4         4 Record 4   1 [A]           FALSE        FALSE        FALSE       
#> # ℹ 1 more variable: form_status_complete <dbl+lbl>

Pros & Cons of Labelled Vectors

The "haven_labelled" class was originally developed to import data from statistical software like SPSS, Stata, or SAS, which use value labels for categorical variables. This format allows you store both the original coding and the labels attached to each value.

Advantages

  • Preservation of Original Coding: Both numeric codes and labels are retained, which is useful for data cleaning and re-exporting.
  • Metadata Management: The labelled package offers functions to manage value labels effectively.

You can manipulate value labels using functions such as:

Additionally, you can search through variables or generate a variable dictionary with labelled::look_for() (cf. vignette("look_for", package = "labelled")):

library(labelled)
redcap_form |>
  look_for()
#>  pos variable             label col_type missing values        
#>  1   record_id            —     dbl      0                     
#>  2   text_box_1           —     chr      0                     
#>  3   radio_buttons_1      —     dbl+lbl  0       [1] A         
#>                                                  [2] B         
#>                                                  [3] C         
#>  4   checkbox___1         —     lgl      0                     
#>  5   checkbox___2         —     lgl      0                     
#>  6   checkbox___3         —     lgl      0                     
#>  7   form_status_complete —     dbl+lbl  0       [0] Incomplete
#>                                                  [1] Unverified
#>                                                  [2] Complete

Disadvantages

Labelled vectors are not optimized for data analysis tasks like descriptive statistics, plotting, or modeling. For these purposes, categorical variables should be converted to factors or numeric vectors.

labelled Approaches
labelled Approaches

Approach A: Convert haven_labelled vectors to factors or numeric/character vectors just after import using functions like labelled::unlabelled(), labelled::to_factor(), or unclass(). Proceed with data cleaning, recoding, and analysis using standard R vector types.

Approach B: Retain haven_labelled vectors for data cleaning and coding to preserve original labels, especially if you plan to re-export the data. Use labelled functions to manage value labels, but convert the vectors to factors or numeric types before performing analysis or modeling.

Managing Variable Labels

It’s important to distinguish between value labels and variable labels:

  • Value Labels: Describe the meaning of specific values within a vector and change the vector’s class to "haven_labelled".
  • Variable Labels: Provide a textual description of the entire variable without altering its class.

The labelled package offers functions to handle variable labels, such as:

Using REDCapTidieR::make_labelled() allows you to add variable labels to data frames exported from REDCap:

redcap_form <-
  read_redcap(
    redcap_uri,
    token,
    raw_or_label = "haven"
  ) |>
  make_labelled() |>
  extract_tibble("labelled_vignette")

redcap_form |>
  look_for()
#>  pos variable             label                col_type missing values        
#>  1   record_id            Record ID            dbl      0                     
#>  2   text_box_1           Text Box             chr      0                     
#>  3   radio_buttons_1      Radio Buttons        dbl+lbl  0       [1] A         
#>                                                                 [2] B         
#>                                                                 [3] C         
#>  4   checkbox___1         Checkbox: A          lgl      0                     
#>  5   checkbox___2         Checkbox: B          lgl      0                     
#>  6   checkbox___3         Checkbox: C          lgl      0                     
#>  7   form_status_complete REDCap Instrument C~ dbl+lbl  0       [0] Incomplete
#>                                                                 [1] Unverified
#>                                                                 [2] Complete

This ensures that your data not only retains value labels but also includes descriptive labels for each variable, enhancing the readability and usability of your dataset.