The REDCapTidieR package uses vocabulary that is standard for REDCap database architects but not necessarily well known to all R users. It also introduces several idiosyncratic terms.
Below we provide a rough mapping of REDCap concepts to their corresponding artifacts in REDCapTidieR. This is followed by a listing of term definitions.
REDCap | REDCapTidieR |
---|---|
Project, Database | Supertibble |
Instrument, Form | One row of the supertibble; data is in the data tibble |
Field | Data column (a column of the data tibble) |
Field name | Variable name of a data column |
Field type | Data type of a data column |
Field label | Variable label of a data column (only present if the supertibble is labelled) |
Record | One or several rows of a data tibble; the record ID column is the first column of a data tibble |
Event | redcap_event column (only present if the project is longitudinal) |
Arm | redcap_arm column (only present if the project is longitudinal with multiple arms) |
Repeat Instrument | redcap_form_instance column (only present if the instrument is repeating) |
Repeat Event | redcap_event_instance column (only present if the instrument is associated with a repeating event) |
Data Access Group | redcap_data_access_group column (only present if the project has data access groups enabled) |
Arm
An ordered group of events. Arms provide a mechanism that allows one longitudinal project to have multiple different sequences of events defined.
Block matrix
A rectangular data structure (matrix) that is constructed from multiple smaller rectangular data structures (blocks). In the context of REDCap, the block matrix is the rectangular data set that contains data from multiple instruments returned by the REDCap API.
Composite primary key
A primary key is a column in a table whose values are distinct in each row and serve to identify each row. A composite primary key is a primary key that consists of multiple columns whose values, in combination, are distinct in each row and serve to identify each row. Taken together, the identifier columns of the data tibble form a composite primary key. This makes it easy to join data tibbles together.
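To illustrate, here is a minimal base R sketch with made-up data. It assumes two data tibbles from nonrepeating instruments in a longitudinal project whose identifier columns are record_id and redcap_event (the actual record ID column name varies by project). Joining on all identifier columns pairs each row with at most one row of the other tibble.

```r
# Mock data tibbles from two nonrepeating instruments in a longitudinal
# project; record_id + redcap_event form the composite primary key
vitals <- data.frame(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  weight_kg    = c(70.2, 69.8, 81.5)
)
labs <- data.frame(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  hba1c        = c(5.6, 5.4, 6.1)
)

# Join on every identifier column so rows are matched one-to-one
key_cols <- c("record_id", "redcap_event")
joined <- merge(vitals, labs, by = key_cols)
print(joined)
```

merge() performs an inner join by default; with dplyr, left_join(vitals, labs, by = key_cols) would instead keep every row of vitals even when labs has no matching row.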
Choice
An option or category defined in the context of a single-answer or multi-answer categorical field type in REDCap. You can define choices using the REDCap Field Editor. Choices have a raw value (a unique identifier, usually a serial number, though this can be changed) and a choice label (a human-readable description of the choice, which is displayed during data entry).
In the context of REDCapTidieR, choices come into play in two scenarios during the construction of the data tibble. Choice labels of single-answer fields (dropdown and radio) define the values of the data columns derived from those fields. Raw values of multi-answer checkbox fields are used to construct the names of the data columns derived from them.
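Both scenarios can be sketched in base R. The sketch assumes choices are written as "raw value, label" pairs separated by "|", as in a REDCap data dictionary, and that exported checkbox columns follow REDCap's field___raw naming pattern; the field name symptoms and the answers are hypothetical.

```r
# Choices of a hypothetical field, in data dictionary notation
choices_raw <- "1, Fever | 2, Cough | 3, Fatigue"

# Split into "raw, label" items, then separate each item at its first comma
items  <- trimws(strsplit(choices_raw, "|", fixed = TRUE)[[1]])
raw    <- trimws(sub(",.*$", "", items))
labels <- trimws(sub("^[^,]*,", "", items))

# Checkbox field "symptoms": raw values become part of the column names
checkbox_cols <- paste0("symptoms", "___", raw)

# Radio or dropdown field with the same choices: raw answers become labels
answers <- c("2", "3", "2")
answer_labels <- labels[match(answers, raw)]
```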
Classic project
Also known as a traditional project, this is the simplest type of REDCap project. You can define one or multiple instruments (also called forms) for data entry. Both repeating and nonrepeating instruments are allowed. Nonrepeating instruments are completed only once for each record. For nonrepeating instruments, one row of data in the data tibble represents one record. Repeating instruments can be completed an arbitrary number of times for each record. For repeating instruments, one row of data in the data tibble represents one repeat instance of one record. See also: Longitudinal project.
Data Access Group
The Data Access Group (DAG) feature of REDCap streamlines multi-group collaboration by partitioning groups of records of a single project. This feature is particularly useful when you want certain users or groups of users to only have access to a specific subset of the data in a project.
In a multi-site study, for instance, you might want each site to only have access to their own data. By setting up a DAG for each site, you can ensure that site users can only view and edit records that belong to their DAG. Super users (i.e., those with full privileges) can view and edit all records in the project, regardless of the DAG to which they belong.
When a project has DAGs enabled, a corresponding redcap_data_access_group column identifies which DAG a given record belongs to.
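For example, restricting a data tibble to a single DAG is an ordinary row filter on that column. The base R sketch below uses mock data and hypothetical DAG names.

```r
# Mock data tibble from a project with DAGs enabled
screening <- data.frame(
  record_id = 1:5,
  redcap_data_access_group = c("site_a", "site_b", "site_a", "site_c", "site_b"),
  eligible = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Records belonging to one DAG
site_a <- subset(screening, redcap_data_access_group == "site_a")

# Number of records per DAG
records_per_dag <- table(screening$redcap_data_access_group)
```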
Database
In the context of REDCap, this is the same as project. We prefer the term “project” because it has a more specific meaning.
Data column
A column of the data tibble that is derived from data that were entered into the fields of a REDCap instrument.
Data tibble
A tibble that contains data that were entered into the fields of a specific REDCap instrument. The redcap_data column of the supertibble contains the data tibbles of a project. The columns of the data tibble include identifier columns that jointly identify each row and data columns that contain data that were entered into REDCap. REDCapTidieR provides several functions to extract data tibbles from the supertibble. See also: Metadata tibble.
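The sketch below is a base R stand-in for a supertibble built from mock data (it is not output of read_redcap()). It shows how a data tibble can be pulled out of the redcap_data list column by instrument name; with a real supertibble, REDCapTidieR's own extraction functions are the intended tool.

```r
# Mock stand-in for a supertibble: one row per instrument, with data
# tibbles stored in the redcap_data list column
supertbl <- data.frame(
  redcap_form_name  = c("demographics", "vitals"),
  redcap_form_label = c("Demographics", "Vital Signs")
)
supertbl$redcap_data <- list(
  data.frame(record_id = 1:2, age = c(34, 51)),
  data.frame(record_id = c(1, 1, 2), weight_kg = c(70.2, 69.8, 81.5))
)

# Extract one data tibble by instrument name
vitals <- supertbl$redcap_data[[match("vitals", supertbl$redcap_form_name)]]
```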
Data viewer
A part of the RStudio IDE functionality that allows you to inspect data frames, tibbles, and some other data structures. It includes features to perform basic exploratory data analysis such as sorting, filtering, and searching. The supertibble is designed to work well with the data viewer.
Environment
A fundamental data structure in R that binds a set of names to a set of objects. The global environment is the namespace in which you bind objects such as values and tibbles during interactive work. The bind_tibbles() function takes a supertibble and binds its data tibbles to the global environment.
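The effect of binding data tibbles to an environment can be mimicked in base R with list2env(). This is a sketch with mock data, not how bind_tibbles() is implemented, and it binds to a fresh environment rather than the global one so that it has no side effects.

```r
# Mock supertibble contents: instrument names and their data tibbles
form_names <- c("demographics", "vitals")
data_tbls <- list(
  data.frame(record_id = 1:2, age = c(34, 51)),
  data.frame(record_id = c(1, 1, 2), weight_kg = c(70.2, 69.8, 81.5))
)

# Bind each data tibble to its instrument name in a new environment;
# passing globalenv() instead would bind into the global environment
env <- new.env()
list2env(setNames(data_tbls, form_names), envir = env)

ls(env)
```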
Event
A part of a longitudinal project. Each event can be associated with one or multiple instruments and may be either repeating or nonrepeating.
Factor variable
A data type in R for categorical data. By default, single-answer categorical REDCap field types (dropdown, radio) are represented as factor variables in the data tibble.
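As an illustration with mock data, a factor can keep the categories of a radio field in the order in which the choices were defined, and it still reports categories that no one selected. The field and its choice labels are hypothetical.

```r
# Choice labels of a hypothetical radio field, in the order defined in REDCap
smoking_labels <- c("Never", "Former", "Current")

# Values of the corresponding data column, stored as a factor
smoking <- factor(c("Former", "Never", "Former"), levels = smoking_labels)

levels(smoking)  # choice order is kept instead of alphabetical order
table(smoking)   # "Current" is reported with a count of 0
```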
Field
An attribute of an entity (e.g., age or height) that can be captured in REDCap. Instruments are made up of fields. You can configure the fields of an instrument using the REDCap Field Editor. Fields have a field type and can have a descriptive field label. The data tibble contains the data entered into the fields of a REDCap project.
Field label
A piece of text that acts as the prompt for data entry in REDCap. The make_labelled() function creates variable labels based on the field label.
Field type
The data type of the data that can be entered into a specific field. Important field types include:
- text, which is used for free-text and numeric data
- yesno and truefalse, which are used for logical data
- dropdown and radio, which are used for single-answer categorical data
- checkbox, which is used for multi-answer categorical data
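For instance, a yesno field maps naturally onto an R logical vector. The sketch assumes REDCap's raw coding of "1" for Yes and "0" for No, with blank strings for unanswered items; the field name is hypothetical.

```r
# Raw values of a hypothetical yesno field
# (assumed coding: "1" = Yes, "0" = No, "" = left blank)
consented_raw <- c("1", "0", "", "1")

# Blank entries become NA; the rest map to TRUE/FALSE
consented <- ifelse(consented_raw == "", NA, consented_raw == "1")
```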
Form
In the context of REDCap, this is the same as an instrument. We prefer the term “instrument” because it has a more specific meaning than “form.”
Format helper
A function provided by REDCapTidieR that helps turn the field labels of data columns into pretty variable labels. See format-helpers.
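The hypothetical clean_label() function below sketches the kind of transformation a format helper performs. It is not one of the helpers REDCapTidieR exports (those are documented under format-helpers), just a base R illustration.

```r
# Hypothetical helper: strip HTML tags, surrounding whitespace,
# and a single trailing colon from field labels (vectorized)
clean_label <- function(x) {
  x <- gsub("<[^>]+>", "", x)  # remove HTML tags such as <b>...</b>
  x <- trimws(x)               # drop leading/trailing whitespace
  sub(":$", "", x)             # drop one trailing colon
}

clean_label(c("  <b>Age at enrollment:</b> ", "<i>Height</i>:", "Weight"))
```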
Granularity
The level of detail that a specific row in a data tibble represents. This depends on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (repeating vs. nonrepeating), and, for longitudinal projects, the structure of the event (repeating vs. nonrepeating). For example, a data tibble containing data from a nonrepeating instrument in a longitudinal project with two arms has a granularity of one row per record per event per arm. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.
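One way to check the granularity of a data tibble is to count rows per candidate unit. In this base R sketch with mock data (column names assumed to match the identifier columns listed under Identifier column), some records span several rows, but each record/arm/event combination occurs exactly once.

```r
# Mock data tibble: nonrepeating instrument, longitudinal project, two arms
visits <- data.frame(
  record_id    = c(1, 1, 2, 2, 3),
  redcap_arm   = c(1, 1, 2, 2, 1),
  redcap_event = c("baseline", "month_6", "baseline", "month_6", "baseline"),
  sbp          = c(120, 118, 135, 130, 142)
)

# Rows per record: some records have several rows
rows_per_record <- table(visits$record_id)

# Rows per record per arm per event: exactly one each
unit <- paste(visits$record_id, visits$redcap_arm, visits$redcap_event, sep = "/")
rows_per_unit <- table(unit)
all(rows_per_unit == 1)
```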
Identifier column
A column in the data tibble that serves to partially identify the entity described in a row. The record ID column is always present in the data tibble. Depending on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (nonrepeating vs. repeating), and the structure of the event (repeating vs. nonrepeating), there may be additional identifier columns, including redcap_event, redcap_arm, redcap_form_instance, and redcap_event_instance. Taken together, the identifier columns form a composite primary key. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.
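A quick way to confirm that a set of identifier columns forms a composite primary key is to test for duplicated combinations. The helper is_composite_key() below is hypothetical, and the data are mock values for a repeating instrument in a classic project.

```r
# Hypothetical helper: do the given columns jointly identify each row?
is_composite_key <- function(df, cols) {
  anyDuplicated(df[cols]) == 0
}

# Mock data tibble for a repeating instrument in a classic project
meds <- data.frame(
  record_id            = c(1, 1, 2),
  redcap_form_instance = c(1, 2, 1),
  medication           = c("aspirin", "metformin", "aspirin")
)

is_composite_key(meds, "record_id")                             # FALSE
is_composite_key(meds, c("record_id", "redcap_form_instance"))  # TRUE
```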
Import
In the context of REDCapTidieR, this is the process of using the REDCap API to query data from a REDCap project and make it available inside the R environment. We use the term “import” in the sense described in R for Data Science, which is to “take data stored in a file, database, or web application programming interface (API), and load it into a data frame in R.” Of note, the term “import” is ambiguous: from the perspective of REDCap, “import” may mean writing external data into the database. To clarify the direction of the import, we have named the main function of REDCapTidieR read_redcap(), which is analogous to other import functions in the tidyverse such as read_csv(). You can use the read_redcap() function to import data from a REDCap project.
Instrument
Also called form. An electronic data entry form in REDCap. An instrument contains fields into which data can be entered. In the supertibble, each row corresponds to one instrument. The instrument’s name and human-readable label are shown in the redcap_form_name and redcap_form_label columns of the supertibble, respectively. A data tibble contains all the data that were entered into a specific instrument.
labelled
The labelled R package provides functions to attach a human-readable description (a label) to a variable (a variable label). Labelled data can streamline data exploration and assist with the generation of a data dictionary. There are multiple packages that support labelled data. The make_labelled() function attaches variable labels to the variables of a supertibble and to the variables of the data tibbles and metadata tibbles contained in that supertibble.
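The labelled package stores a variable label as a "label" attribute on the vector, which base R can set and read directly. The sketch below (mock data) uses that convention to assemble a minimal data dictionary; labelled::var_label() offers a friendlier interface to the same attribute.

```r
demographics <- data.frame(record_id = 1:3, age = c(34, 51, 47))

# Attach variable labels as "label" attributes (the labelled convention)
attr(demographics$record_id, "label") <- "Record ID"
attr(demographics$age, "label") <- "Age at enrollment"

# Labels do not change the underlying values
mean(demographics$age)

# Assemble a minimal data dictionary from the labels
data_dictionary <- data.frame(
  variable = names(demographics),
  label    = unname(vapply(demographics, function(col) attr(col, "label"), character(1)))
)
```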
List column
A list is a fundamental data type in R. A tibble can contain columns that are lists, and these columns are called list columns. REDCapTidieR leverages list columns to store tibbles inside of the supertibble. For example, the redcap_data column of the supertibble is a list column that contains data tibbles, and redcap_metadata is a list column that contains metadata tibbles.
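A list column can be built and used in base R as well as with tibbles. This sketch (mock data, borrowing only the column names of the supertibble) stores two data frames in a redcap_data column and applies a function to each of them.

```r
# Mock supertibble-like data frame with a list column of data frames
supertbl <- data.frame(redcap_form_name = c("demographics", "vitals"))
supertbl$redcap_data <- list(
  data.frame(record_id = 1:2, age = c(34, 51)),
  data.frame(record_id = c(1, 1, 2), weight_kg = c(70.2, 69.8, 81.5))
)

# The column is a list whose elements are whole data frames
is.list(supertbl$redcap_data)

# Apply a function to every stored data frame, e.g. count its rows
row_counts <- vapply(supertbl$redcap_data, nrow, integer(1))
```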
Longitudinal project
A type of REDCap project that contains events and optionally arms. One instrument can be associated with multiple events. This makes it possible to collect the same kind of data for the same record multiple times, which is useful for longitudinal research studies with multiple study visits. See also: Classic project.
Metadata tibble
A tibble that contains metadata about a specific REDCap instrument. The redcap_metadata column of the supertibble contains the metadata tibbles of a project. The rows of the metadata tibble represent fields of the instrument. The columns represent attributes of those fields. For example, the field_name, field_label, and field_type columns show the field’s name, a human-readable description (the field label), and its field type.
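Because a metadata tibble is an ordinary table, questions about an instrument's fields become simple filters. The sketch below uses a mock metadata tibble with the field_name, field_label, and field_type columns described above; the fields themselves are hypothetical.

```r
# Mock metadata tibble: one row per field, one column per field attribute
vitals_metadata <- data.frame(
  field_name  = c("record_id", "weight_kg", "smoking", "symptoms"),
  field_label = c("Record ID", "Weight (kg)", "Smoking status", "Symptoms"),
  field_type  = c("text", "text", "radio", "checkbox")
)

# Which fields hold categorical data?
categorical <- vitals_metadata$field_name[
  vitals_metadata$field_type %in% c("dropdown", "radio", "checkbox")
]
```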
Nonrepeating Event
An event whose associated instruments can be filled out exactly once per record per event (and per arm, if applicable). See also: Repeating Event.
Nonrepeating Instrument
An instrument that can be filled out exactly once per record in a classic project and once per record per event instance (and per arm, if applicable) in a longitudinal project. See also: Repeating Instrument.
Project
Also called a database, a REDCap project is a self-contained collection of all of the data and metadata related to some data collection activity (for example, a specific research study). A project may be classic or longitudinal. A classic project consists of instruments that contain fields. A longitudinal project may additionally include events and arms. You can use read_redcap() to import the data from a project.
Record
The set of information about a single entity (e.g., a study participant) for which data is being captured in a specific REDCap project. Each record consists of discrete data values organized into fields that can be spread across multiple instruments, events, and/or arms. Each record has a unique record ID. In the data tibble, the record ID is always the first column. The record ID column is one of the identifier columns.
REDCap API
The application programming interface (API) of a REDCap instance allows external programs to connect, upload, and download data. To access the REDCap API, a user must have appropriate access privileges, an API token, and the uniform resource identifier (URI) of the API endpoint (something like “my.institution.edu/redcap/api”). The REDCapTidieR package uses REDCapR to query the REDCap API.
REDCapR
The REDCapR R package provides functions to interact with the REDCap API. REDCapTidieR builds on REDCapR to import data into R.
Repeating Event
An event whose associated instruments can be filled out zero, one, or multiple times per record per event (and per arm, if applicable). Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Event.
Repeating Instrument
An instrument that can be filled out zero, one, or multiple times per record in a classic project and zero, one, or multiple times per record per event (and per arm, if applicable) in a longitudinal project. Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Instrument.
Row
A horizontal series of cells in a data frame or tibble. One row of a supertibble represents an instrument. One row of a data tibble can represent different things, depending on the granularity of the data. See also: Data column.
skimr
The skimr R package provides summary statistics to help users quickly skim and understand their data. REDCapTidieR’s add_skimr_metadata() function uses skimr to add summary statistics for each field to the metadata tibbles. See also: the section Adding summary statistics to the metadata with the skimr package in the Getting Started vignette.
Structure
The structure of an instrument can be repeating, nonrepeating, or mixed. The supertibble shows the instrument’s structure in the structure column. The structure of a project can be classic, longitudinal, or longitudinal with arms. The structure of an event can be repeating or nonrepeating. As of REDCapTidieR v1.1.0, mixed structure instruments are supported. The granularity of a data tibble depends on the structure of all three: the instrument, the project, and the events associated with the instrument. Note: REDCap does not allow repeating instruments inside a repeating event. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.
Supertibble
A special tibble that contains the data and metadata of a REDCap project returned by the read_redcap() function. Each row of the supertibble corresponds to one instrument. The redcap_form_name and redcap_form_label columns identify the instrument. The redcap_data and redcap_metadata columns contain the instrument’s data tibble and metadata tibble. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data.
Survey
A special kind of instrument that can be completed by someone who is not a user on a REDCap project.
Tibble
A variant of the R data frame that makes data analysis in the tidyverse a little easier. The data structures generated by REDCapTidieR are based on tibbles. See also: chapter on Tibbles in R for Data Science.
Tidy
The term “tidy” is part of REDCapTidieR’s name because it underlies two key ideas of the package.
The first is the concept of Tidy Data. A rectangular data structure is tidy if:
- Each variable forms a column
- Each observation forms a row
- Each type of observational unit forms a table (i.e., the granularity of rows in a table is consistent)
Data returned by the REDCap API (the “block matrix”) often satisfies the first two requirements of tidy data. However, if the project contains both repeating and nonrepeating instruments and/or events then the granularity is inconsistent from row to row. A key function of the REDCapTidieR package is to break down the block matrix by instrument. The resulting set of data tibbles tends to be tidier than the block matrix, because the granularity within each individual data tibble is consistent. This makes it easy to work with them.
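The block-matrix argument can be made concrete with a small base R sketch. The mock block matrix below assumes REDCap's redcap_repeat_instrument and redcap_repeat_instance export columns for a classic project with one nonrepeating and one repeating instrument; splitting it by instrument yields two tables whose rows each have a consistent granularity. This illustrates the idea only and is not REDCapTidieR's implementation.

```r
# Mock block matrix: nonrepeating "demographics" (age) and repeating
# "labs" (hba1c) in a classic project; cells that do not apply are NA
block <- data.frame(
  record_id                = c(1, 1, 1, 2, 2),
  redcap_repeat_instrument = c(NA, "labs", "labs", NA, "labs"),
  redcap_repeat_instance   = c(NA, 1, 2, NA, 1),
  age                      = c(34, NA, NA, 51, NA),
  hba1c                    = c(NA, 5.6, 5.4, NA, 6.1)
)

# Nonrepeating instrument: rows without a repeat instrument (one per record)
demographics <- block[is.na(block$redcap_repeat_instrument),
                      c("record_id", "age")]

# Repeating instrument: rows tagged "labs" (one per record per instance)
labs <- block[block$redcap_repeat_instrument %in% "labs",
              c("record_id", "redcap_repeat_instance", "hba1c")]

# Rename to the identifier column name used in data tibbles
names(labs)[names(labs) == "redcap_repeat_instance"] <- "redcap_form_instance"
```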
The second is the idea of Tidy Tools, a set of design guidelines for the packages of the tidyverse. Tidy tools should follow these principles:
- Reuse existing data structures.
- Compose simple functions with the pipe.
- Embrace functional programming.
- Design for humans.
We strive to follow these principles in the design of the REDCapTidieR package.