Glossary of REDCap and REDCapTidieR Terms
The REDCapTidieR package uses vocabulary that is standard for REDCap database architects but not necessarily well known to all R users. It also introduces several idiosyncratic terms.
Below we provide a rough mapping of REDCap concepts to their corresponding artifacts in REDCapTidieR. This is followed by a listing of term definitions.
Data is in the data tibble
| REDCap concept | REDCapTidieR artifact |
|---|---|
| Field | Data column (a column of the data tibble) |
| Field name | Variable name of a data column |
| Field type | Data type of a data column |
| Data Access Group | |
A rectangular data structure (matrix) that is constructed from multiple smaller rectangular data structures (blocks). In the context of REDCap, the block matrix is the rectangular data set that contains data from multiple instruments returned by the REDCap API. ↩︎
A primary key is a column in a table that is distinct in each row and serves to identify each row. A composite primary key is a primary key that consists of multiple columns that in combination are distinct in each row and serve to identify each row. Taken together, the identifier columns of the data tibble form a composite primary key. This makes it easy to join data tibbles together. ↩︎
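As a minimal sketch of why a composite primary key makes joins straightforward, the example below builds two toy data tibbles and joins them on their shared identifier columns. The tibbles, their values, and the redcap_event column name are illustrative assumptions, not output of the package.

```r
library(tibble)
library(dplyr)

# Hypothetical data tibbles from two nonrepeating instruments in a
# longitudinal project; all names and values are illustrative.
demographics <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  age          = c(34, 35, 51)
)
vitals <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  heart_rate   = c(72, 68, 80)
)

# The identifier columns that jointly form the composite primary key.
key <- c("record_id", "redcap_event")

# The key columns are jointly distinct in each row ...
stopifnot(anyDuplicated(demographics[key]) == 0)
stopifnot(anyDuplicated(vitals[key]) == 0)

# ... so joining on them matches each row to at most one partner row.
joined <- left_join(demographics, vitals, by = key)
```

Because neither table repeats a key combination, the join cannot multiply rows: the result has the same number of rows as the left-hand table.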
An option or category defined in the context of a single-answer or multi-answer categorical field type in REDCap. You can define choices using the REDCap Field Editor. Choices have a raw value (a unique identifier - usually a serial number but this can be changed) and a choice label (a human readable description of the choice, which is displayed during data entry).
In the context of REDCapTidieR, choices come into play in two scenarios during the construction of the data tibble. Choice labels of single-answer type fields (dropdown and radio) are used to define the values of data columns that are derived from those fields. Raw values of the multi-answer checkbox field are used to construct the names of data columns derived from them. ↩︎
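The two scenarios can be sketched in base R as follows. The choice definitions, the field names sex and race, and the field___value naming pattern are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical choices: names are raw values, elements are choice labels.
sex_choices  <- c("1" = "Female", "2" = "Male")
race_choices <- c("1" = "Asian", "2" = "Black", "3" = "White")

# Raw values of a radio field for three records (illustrative).
sex_raw <- c("2", "1", "2")

# Single-answer field: choice labels become the values of the data column.
sex <- factor(unname(sex_choices[sex_raw]), levels = unname(sex_choices))

# Multi-answer checkbox field: one data column per choice, with the raw
# value embedded in the column name (naming pattern assumed here).
race_cols <- paste0("race___", names(race_choices))
```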
Also known as a traditional project, this is the simplest type of REDCap project. You can define one or multiple instruments (also called forms) for data entry. Both repeating and nonrepeating instruments are allowed. Nonrepeating instruments are completed only once for each record. For nonrepeating instruments, one row of data in the data tibble represents one record. Repeating instruments can be completed an arbitrary number of times for each record. For repeating instruments, one row of data in the data tibble represents one repeat instance of one record. See also: Longitudinal project. ↩︎
The Data Access Group (DAG) feature of REDCap streamlines multi-group collaboration by partitioning groups of records of a single project. This feature is particularly useful when you want certain users or groups of users to only have access to a specific subset of the data in a project.
In a multi-site study, for instance, you might want each site to only have access to their own data. By setting up a DAG for each site, you can ensure that site users can only view and edit records that belong to their DAG. Super users (i.e., those with full privileges) can view and edit all records in the project, regardless of the DAG to which they belong.
A tibble that contains data that was entered into the fields of a specific REDCap instrument. The redcap_data column of the supertibble contains the data tibbles of a project. The columns of the data tibble include identifier columns that jointly identify each row and data columns that contain data that was entered into REDCap. REDCapTidieR provides several functions to extract data tibbles from the supertibble. See also: Metadata tibble.
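To make the relationship concrete, here is a hedged sketch that builds a toy stand-in for a supertibble and pulls out one data tibble with plain indexing. The instrument names, the redcap_form_name and redcap_form_instance columns, and all values are assumptions for illustration. The package's own extraction functions are the intended route; indexing is used here only so the sketch runs without a REDCap connection.

```r
library(tibble)

# Toy stand-in for a supertibble; a real one comes from read_redcap().
supertbl <- tibble(
  redcap_form_name = c("demographics", "vitals"),
  redcap_data = list(
    # Nonrepeating instrument: one row per record.
    tibble(record_id = c(1, 2), age = c(34, 51)),
    # Repeating instrument: one row per record per repeat instance.
    tibble(
      record_id            = c(1, 1, 2),
      redcap_form_instance = c(1, 2, 1),
      heart_rate           = c(72, 68, 80)
    )
  )
)

# Look up the row of the instrument by name, then take its data tibble.
row_index <- match("vitals", supertbl$redcap_form_name)
vitals    <- supertbl$redcap_data[[row_index]]
```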
A part of the RStudio IDE functionality that allows you to inspect data frames, tibbles, and some other data structures. It includes features to perform basic exploratory data analysis such as sorting, filtering, and searching. The supertibble is designed to work well with the data viewer. ↩︎
A fundamental data structure in R that allows
binding a set of names to a set of objects. The
global environment is the namespace in which you bind
objects such as values and tibbles during
interactive work. The
bind_tibbles() function takes a supertibble and binds its data tibbles to the global environment.
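The idea of binding names to objects in an environment can be illustrated with base R alone. This sketch uses list2env() on a fresh environment rather than the global one, so it leaves the workspace untouched; it is an analogy to what bind_tibbles() does, not its implementation, and the table names are hypothetical.

```r
# Toy named list standing in for the data tibbles of a supertibble.
tbls <- list(
  demographics = data.frame(record_id = 1:2, age = c(34, 51)),
  vitals       = data.frame(record_id = 1:2, heart_rate = c(72, 80))
)

# Bind each element to its name in a new environment.
env <- new.env()
list2env(tbls, envir = env)

# The names are now bound in env and can be looked up there.
bound_names <- ls(env)
vitals_tbl  <- get("vitals", envir = env)
```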
An attribute about an entity (e.g., age or height) that can be captured in REDCap. Instruments are made up of fields. You can configure the fields of an instrument using the REDCap Field Editor. Fields have a field type and can have a descriptive field label. The data tibble contains the data entered into the fields of a REDCap project. ↩︎
The data type of the data that can be entered into a specific field. Important field types include:
- text, which is used for free-text and numeric data
- yesno and truefalse, which are used for logical data
- dropdown and radio, which are used for single-answer categorical data
- checkbox, which is used for multi-answer categorical data ↩︎
The level of detail that a specific row in a data tibble represents. This depends on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (repeating vs. nonrepeating), and, for longitudinal projects, the structure of the event (repeating vs. nonrepeating). For example, a data tibble containing data from a nonrepeating instrument in a longitudinal project with two arms has a granularity of one row per record per event per arm. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
A column in the data tibble that serves to partially identify the entity described in a row. The record ID column is always present in the data tibble. Depending on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (nonrepeating vs. repeating), and the structure of the event (repeating vs. nonrepeating), there may be additional identifier columns, including redcap_event_instance. Taken together, the identifier columns form a composite primary key. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
In the context of REDCapTidieR, this is the process of using the REDCap API to query data from a REDCap project to make it available inside the R environment. We use the term “import” in the sense described in R for Data Science, which is to “take data stored in a file, database, or web application programming interface (API), and load it into a data frame in R.” Of note, the term “import” is ambiguous: from the perspective of REDCap, “import” may mean writing external data into the database. To clarify the direction of the import, we have named the main function of REDCapTidieR read_redcap(), which is analogous to other import functions in the tidyverse such as read_csv(). You can use the read_redcap() function to import data from a REDCap project.
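A call might look like the sketch below. The environment variable names REDCAP_URI and REDCAP_API_TOKEN are hypothetical, and the call is guarded so that nothing runs unless both are set and the package is installed. It assumes that read_redcap() takes the API endpoint and the token as its first two arguments.

```r
# Credentials are read from environment variables (names are hypothetical)
# rather than being written into the script.
redcap_uri <- Sys.getenv("REDCAP_URI")
token      <- Sys.getenv("REDCAP_API_TOKEN")

supertbl <- NULL
if (nzchar(redcap_uri) && nzchar(token) &&
    requireNamespace("REDCapTidieR", quietly = TRUE)) {
  # Data flows from REDCap into R; nothing is written back to the project.
  supertbl <- REDCapTidieR::read_redcap(redcap_uri, token)
}
```

Keeping the token out of the source file avoids accidentally sharing it along with the analysis code.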
Also called form. An electronic data entry form in REDCap. An instrument contains fields into which data can be entered. In the supertibble, each row corresponds to one instrument. The instrument’s name and human-readable label are shown in the redcap_form_name and redcap_form_label columns of the supertibble, respectively. A data tibble contains all the data that was entered into a specific instrument.
The labelled R package provides functions to attach a human-readable description (a variable label) to a variable. Labelled data can streamline data exploration and assist with the generation of a data dictionary. There are multiple packages that support labelled data. The make_labelled() function attaches variable labels to the variables of a supertibble and the variables of the data tibbles and metadata tibbles contained in that supertibble.
A list is a fundamental data type in R. A tibble can contain columns that are lists, and these columns are called list columns. REDCapTidieR leverages list columns to store tibbles inside the supertibble. For example, the redcap_data column of the supertibble is a list column that contains data tibbles, and redcap_metadata is a list column that contains metadata tibbles.
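A list column can be demonstrated with a small generic tibble, independent of REDCap. The column names name and nested are arbitrary choices for this sketch.

```r
library(tibble)

# Each element of the list column "nested" is itself a tibble.
tbl <- tibble(
  name   = c("a", "b"),
  nested = list(tibble(x = 1:3), tibble(x = 4:5))
)

# The column is an ordinary R list, so list tools apply to it.
is_list_col <- is.list(tbl$nested)
nested_rows <- vapply(tbl$nested, nrow, integer(1))

# Double brackets return the stored tibble itself.
first_nested <- tbl$nested[[1]]
```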
A type of REDCap project that contains events and optionally arms. One instrument can be associated with multiple events. This makes it possible to collect the same kind of data for the same record multiple times, which is useful for longitudinal research studies with multiple study visits. See also: Classic project. ↩︎
A tibble that contains metadata about a specific REDCap instrument. The redcap_metadata column of the supertibble contains the metadata tibbles of a project. The rows of the metadata tibble represent fields of the instrument. The columns represent attributes of those fields. For example, the field_name, field_label, and field_type columns show the field’s name, a human-readable description (the field label), and its field type. ↩︎
An instrument that can be filled out exactly once per record in a classic project and once per record per event instance (and per arm, if applicable) in a longitudinal project. See also: Repeating Instrument. ↩︎
Also called a database, a REDCap project is a self-contained collection of all of the data and metadata related to some data collection activity (for example, a specific research study). A project may be classic or longitudinal. A classic project consists of instruments that contain fields. A longitudinal project may additionally include events and arms. You can use read_redcap() to import the data from a project. ↩︎
The set of information about a single entity (e.g., a study participant) for which data is being captured in a specific REDCap project. Each record consists of discrete data values organized into fields that can be spread across multiple instruments, events, and/or arms. Each record has a unique record ID. In the data tibble, the record ID is always the first column. The record ID column is one of the identifier columns. ↩︎
The application programming interface (API) of a REDCap instance allows external programs to connect, upload, and download data. To access the REDCap API, a user must have appropriate access privileges, an API token, and the uniform resource identifier (URI) of the API endpoint (something like “my.institution.edu/redcap/api”). The REDCapTidieR package uses REDCapR to query the REDCap API. ↩︎
An event whose associated instruments can be filled out zero, one, or multiple times per record per event (and per arm, if applicable). Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Event. ↩︎
An instrument that can be filled out zero, one, or multiple times per record in a classic project and zero, one, or multiple times per record per event (and per arm, if applicable) in a longitudinal project. Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Instrument. ↩︎
A horizontal series of cells in a data frame or tibble. One row of a supertibble represents an instrument. One row of a data tibble can represent different things, depending on the granularity of the data. See also: Column. ↩︎
The skimr R package provides summary statistics to help users quickly skim and understand their data. REDCapTidieR’s add_skimr_metadata() function uses skimr to add various summary statistics of a specific field to the metadata tibbles. See also: the section Adding summary statistics to the metadata with the skimr package in the Getting Started vignette.
The structure of an instrument can be repeating or nonrepeating. The supertibble shows the instrument’s structure in the structure column. The structure of a project can be classic, longitudinal, or longitudinal with arms. The structure of an event can be repeating or nonrepeating. The granularity of a data tibble depends on the structure of all three: the instrument, the project, and the events associated with the instrument. Note: REDCap does not allow repeating instruments inside a repeating event. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
A special tibble that contains data and metadata of a REDCap project returned by the read_redcap() function. Each row of the supertibble corresponds to one instrument. The redcap_form_name and redcap_form_label columns identify the instrument. The redcap_data and redcap_metadata columns contain the instrument’s data tibble and metadata tibble. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data.
A variant of the R data frame that makes data analysis in the tidyverse a little easier. The data structures generated by REDCapTidieR are based on tibbles. See also: chapter on Tibbles in R for Data Science. ↩︎
The term “tidy” is part of REDCapTidieR’s name because it underlies two key ideas of the package.
The first is the concept of Tidy Data. A rectangular data structure is tidy if:
- Each variable forms a column
- Each observation forms a row
- Each type of observational unit forms a table (i.e. the granularity of rows in a table is consistent)
Data returned by the REDCap API (the “block matrix”) often satisfies the first two requirements of tidy data. However, if the project contains both repeating and nonrepeating instruments and/or events then the granularity is inconsistent from row to row. A key function of the REDCapTidieR package is to break down the block matrix by instrument. The resulting set of data tibbles tends to be tidier than the block matrix, because the granularity within each individual data tibble is consistent. This makes it easy to work with them.
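The following sketch shows the idea of breaking a block matrix down by instrument, using a toy table rather than a real API export. The instrument and field names are hypothetical, the redcap_form_instance name for the renamed column is an assumption, and the manual filtering only illustrates the concept; it is not how the package is implemented.

```r
library(tibble)
library(dplyr)

# Toy block matrix: a nonrepeating "demographics" instrument and a
# repeating "vitals" instrument share one table, so the granularity
# differs from row to row.
block <- tibble(
  record_id                = c(1, 1, 1, 2, 2),
  redcap_repeat_instrument = c(NA, "vitals", "vitals", NA, "vitals"),
  redcap_repeat_instance   = c(NA, 1, 2, NA, 1),
  age                      = c(34, NA, NA, 51, NA),
  heart_rate               = c(NA, 72, 68, NA, 80)
)

# Nonrepeating block: one row per record.
demographics <- block %>%
  filter(is.na(redcap_repeat_instrument)) %>%
  select(record_id, age)

# Repeating block: one row per record per repeat instance.
vitals <- block %>%
  filter(redcap_repeat_instrument == "vitals") %>%
  select(record_id, redcap_form_instance = redcap_repeat_instance, heart_rate)
```

After the split, neither table contains the structural missing values that padded the block matrix, and each has a single, consistent granularity.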
The second is the set of design principles laid out in the tidyverse's Tidy Tools Manifesto:
- Reuse existing data structures.
- Compose simple functions with the pipe.
- Embrace functional programming.
- Design for humans.
We strive to follow these principles in the design of the REDCapTidieR package. ↩︎