REDCapTidieR 0.2.0 ☃️

REDCapTidieR v0.2.0 release announcement and change descriptions.

Author

Stephan Kadauke, Ezra Porter, Richard Hanna

Published

December 6, 2022

We’re thrilled to announce the release of REDCapTidieR v0.2.0 on CRAN! REDCapTidieR provides a user-friendly way to import data from a REDCap project into the R environment. You can install the current version from CRAN using install.packages("REDCapTidieR").

New names for functions

We have decided to rename some of the exported functions of our package. Here is what changed:

read_redcap_tidy() is now read_redcap()
bind_tables() is now bind_tibbles()
extract_table() is now extract_tibble()
extract_tables() is now extract_tibbles()

We have also cleaned up the API to make the function arguments more consistent.

We are handling each of these name changes with deprecation functions of the lifecycle package. This means that you can still use the original function names, but you will receive an annoying alert to switch to the new ones each time you try to use them. At some point in the future we may remove the deprecated functions from the package.
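For readers curious how this kind of soft deprecation works, here is a minimal sketch using lifecycle::deprecate_warn(). It is not REDCapTidieR's actual source: old_name(), new_name(), and the package name "mypkg" are hypothetical placeholders, and always = TRUE is an assumption chosen so that the warning fires on every call.

```r
# Hypothetical names throughout: `new_name()` and `old_name()` stand in for
# a renamed function pair, and "mypkg" is a placeholder package name.
new_name <- function(x) {
  x * 2
}

old_name <- function(x) {
  # Signal the deprecation first, then delegate to the replacement.
  # `always = TRUE` asks lifecycle to warn on every call.
  lifecycle::deprecate_warn(
    when = "0.2.0",
    what = "mypkg::old_name()",
    with = "new_name()",
    always = TRUE
  )
  new_name(x)
}

# The old name still works; it just warns before returning the same result.
result <- suppressWarnings(old_name(21))
print(result)
```

Because the old function simply forwards to the new one, existing code keeps producing the same results while it is being migrated.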

Why on earth would you do that?!

We want to design our API so it’s easy to learn and use.

The main function of the package is read_redcap(), and its name explains exactly what it does: it reads or imports data from REDCap. The fact that the object that this function returns (i.e. the supertibble) is tidy is a technical detail.

We replaced “table” with “tibble” in the data tibble extraction functions because “tibble” is a more precise term than “table.” Also, this way we are being more consistent in our use of REDCapTidieR vocabulary such as “supertibble” and “data tibble”.

We also thought about the names in terms of teachability. When you explain how to use the package to someone for the first time, how would you describe the action of each of the functions? For example, how do you read data from REDCap? With read_redcap(). How do you extract a data tibble from the supertibble? With extract_tibble(). Or if you’d rather bind the tibbles to your environment, use bind_tibbles().

Why now?

REDCapTidieR is still very new, and not much code depends on it yet (we hope this will change!). Fixing things now costs far less than it will later on.

Improved documentation

We’ve been hard at work writing up extensive documentation to support new users in adopting REDCapTidieR. The Getting Started vignette walks new users through importing data from REDCap into a supertibble, exploring the contents of the supertibble in the RStudio Data Viewer, extracting data tibbles, and adding variable labels.

The Diving Deeper vignette explains in detail how REDCapTidieR constructs data tibbles.

We also created a comprehensive Glossary of REDCap and REDCapTidieR terms to which we will link frequently.

Hello metadata!

A key change to REDCapTidieR in v0.2.0 is that the supertibble now includes a lot of additional instrument-level metadata.

library(REDCapTidieR)

superheroes <- read_redcap(redcap_uri, token)

superheroes |>
  rmarkdown::paged_table()

REDCapTidieR 0.2.0 provides the following new columns:

redcap_form_label: A human-readable label for the instrument
redcap_metadata: The metadata tibble with instrument-level metadata derived from REDCapR::redcap_metadata_read()
redcap_events: Events and arms associated with this instrument (for longitudinal projects only)
data_rows, data_cols: Row and column counts of the data tibble (redcap_data)
data_size: Size of the redcap_data tibble in memory
data_na_pct: The percentage of missing data in the redcap_data tibble
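To make the summary columns concrete, the sketch below computes comparable quantities for a toy data frame in base R. This is an illustration only and not necessarily how REDCapTidieR derives them; the toy data and the use of object.size() are assumptions.

```r
# Toy stand-in for a data tibble; column names and values are illustrative.
redcap_data <- data.frame(
  record_id = 1:4,
  hero_name = c("A-Bomb", "Abe Sapien", NA, "Abin Sur"),
  weight_kg = c(441, 65, NA, NA)
)

data_rows <- nrow(redcap_data)                  # row count
data_cols <- ncol(redcap_data)                  # column count
data_size <- object.size(redcap_data)           # approximate size in memory
data_na_pct <- 100 * mean(is.na(redcap_data))   # share of missing cells, in percent

print(c(rows = data_rows, cols = data_cols, na_pct = data_na_pct))
print(format(data_size, units = "auto"))
```

Here 3 of the 12 cells are missing, so the missing-data percentage is 25.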

REDCapTidieR ❤️ labelled

The labelled R package provides functions to attach a human-readable description, called a variable label, to a variable. Variable labels are awesome and we think you will find them useful! Take a look at Shannon Pileggi’s blog post The case for variable labels in R to find out more.

REDCapTidieR now provides make_labelled(), a function that attaches variable labels to the supertibble and the tibbles it contains:

superheroes_labelled <- superheroes |>
  make_labelled()

In the RStudio Data Viewer, variable labels appear below each column name. This makes it easy to inspect the contents of the supertibble:

Labelled supertibble

While the variable labels in the supertibble are static and pre-defined, variable labels in the data tibbles (redcap_data) are derived from the REDCap field labels. A field label is a piece of text that prompts the REDCap user during data entry. We repurpose field labels to provide a description for the variable:

Labelled tibble
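Under the hood, a variable label is simply a "label" attribute on a column, which is where the labelled package stores it. The base R sketch below shows the idea of repurposing field labels as variable labels. It is not make_labelled()'s implementation: the toy metadata and the label text are made up, while field_name and field_label follow REDCap's data dictionary column names.

```r
# Toy field metadata; the label text is invented for illustration.
metadata <- data.frame(
  field_name = c("record_id", "survey_yesno"),
  field_label = c("Record ID", "Did you complete the survey?")
)

# Toy data tibble whose columns match the field names above.
data_tibble <- data.frame(
  record_id = c(1, 2),
  survey_yesno = c(TRUE, FALSE)
)

# Attach each field label to the matching column as a "label" attribute.
for (i in seq_len(nrow(metadata))) {
  field <- metadata$field_name[i]
  if (field %in% names(data_tibble)) {
    attr(data_tibble[[field]], "label") <- metadata$field_label[i]
  }
}

print(attr(data_tibble$survey_yesno, "label"))
```

Tools that understand this attribute, such as the RStudio Data Viewer, can then display the label alongside the column name.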

New arguments for read_redcap()

We have introduced two new arguments that can be passed to the read_redcap() function:

forms
export_survey_fields

Retrieve data from a subset of instruments

You can now import data from specific instruments from your project instead of importing the entire dataset. This can be useful for very large projects.

# Only import the super_hero_powers instrument
superheroes_powers <- read_redcap(
  redcap_uri,
  token,
  forms = "super_hero_powers"
)

superheroes_powers |>
  rmarkdown::paged_table()

Support for REDCap surveys

REDCap supports surveys, a special type of instrument that can be filled out by someone who isn’t a registered user of the REDCap project. Instruments that are used as surveys generate additional data columns:

redcap_survey_timestamp: the time at which the survey was completed
redcap_survey_identifier: the participant identifier (this will be NA if the Participant Identifier feature in REDCap is disabled such as for an anonymous survey)

By default, read_redcap() will now return these columns if the instrument is set up as a survey. Note the redcap_survey_identifier and redcap_survey_timestamp columns below:

survey_database <- read_redcap(redcap_uri, survey_token)

survey_database |>
  extract_tibble("survey") |> 
  dplyr::glimpse()

Rows: 4
Columns: 9
$ record_id                <dbl> 1, 2, 3, 4
$ survey_yesno             <lgl> TRUE, FALSE, NA, NA
$ survey_radio             <fct> Choice 1, Choice 2, NA, NA
$ survey_checkbox___one    <lgl> FALSE, FALSE, FALSE, FALSE
$ survey_checkbox___two    <lgl> TRUE, TRUE, FALSE, FALSE
$ survey_checkbox___three  <lgl> TRUE, TRUE, FALSE, FALSE
$ redcap_survey_identifier <lgl> NA, NA, NA, NA
$ redcap_survey_timestamp  <dttm> 2022-11-09 10:33:35, NA, NA, NA
$ form_status_complete     <fct> Complete, Incomplete, Incomplete, Incomplete
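If you would rather not import the survey columns, the export_survey_fields argument should let you opt out. The sketch below assumes that passing FALSE omits them. The wrapper function name and the REDCAP_URI and REDCAP_SURVEY_TOKEN environment variables are hypothetical, and the call only runs when both variables are set.

```r
# Hypothetical helper: import a project without the survey-specific columns.
# Assumes `export_survey_fields = FALSE` omits redcap_survey_timestamp and
# redcap_survey_identifier from the data tibbles.
read_without_survey_fields <- function(redcap_uri, token) {
  REDCapTidieR::read_redcap(
    redcap_uri,
    token,
    export_survey_fields = FALSE
  )
}

# Hypothetical environment variables; nothing is requested unless both are set.
uri <- Sys.getenv("REDCAP_URI")
survey_token <- Sys.getenv("REDCAP_SURVEY_TOKEN")

if (nzchar(uri) && nzchar(survey_token)) {
  survey_database <- read_without_survey_fields(uri, survey_token)
  print(survey_database)
}
```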

Performance improvements and enhancements

Sped up execution by more than 2.5x by optimizing internal functions
Added many helpful warnings and error messages, using cli for pretty printing
Tests and vignettes now use httptest to mock and cache REDCap API calls
Implemented GitHub Actions lint check
Replaced deprecated .data pronoun in tidyselect expressions

Bug fixes

Fixed a bug in which similarly named variables could be duplicated under some circumstances
Order of instruments in the supertibble is now the same as the order of instruments in REDCap
Fixed an issue in which extract_* functions under some circumstances returned NULL instead of the expected tibbles