growthcleanr has been developed and tested using R versions 3.6 and 4+. It should work using R on Windows, macOS, or Unix/Linux, although there are some additional platform-specific notes you may wish to review.

To get started with growthcleanr, install it from CRAN:

install.packages("growthcleanr")

To install the latest development version from GitHub using devtools:

devtools::install_github("carriedaymont/growthcleanr", ref="main")

Installing growthcleanr will install several additional packages in turn.

See GitHub and source-level install for developers for additional details.

Optional packages

These packages are not required for running cleangrowth() and the other main functions of growthcleanr, but may be necessary for certain use cases.

  • argparser is used by the gcdriver.R script for command line operation as described in Working with large data sets

  • bit64 is necessary if you have long (64-bit) integer subjid values

These can be installed the usual way:

install.packages(c("argparser", "bit64"))

Platform-specific notes

Windows

We have observed an issue with using growthcleanr with Windows in a large agency where some R packages are installed by an administrator for a user who may not have write access permissions on the package folder. Similar issues may occur with some networked drives. This caused problems with growthcleanr’s parallel processing option. If possible, install R and its packages in locations hosted on the same local machine and folder(s) for which the primary user has write permissions. These steps should help to avoid this problem.

Some users have reported that growthcleanr runs more slowly on Windows compared with Linux or macOS.

macOS

growthcleanr uses the data.table package for R extensively. data.table provides a faster version of R’s data frames, and is used to improve growthcleanr performance. Typically data.table installs in a manner that will be able to take advantage of multiple threads. You will know it worked successfully if, when you load the data.table library in R, you see something like the following:

library(data.table)
data.table 1.12.2 using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com

That data.table reports “using 2 threads” indicates that installation has succeeded. If the message reports using only one thread, see the advice under the “OpenMP enabled compiler for Mac” instructions to re-install data.table.

Users have reported errors running multiple batches with parallel = T from within RStudio. If this happens, the problem may be resolved by running from RGui or from the command line using Rscript. An example standalone script that may be used for this purpose is documented in Working with large data sets.

Docker

This package includes a Dockerfile that enables easy installation of R and growthcleanr on a machine with Docker installed. It requires an up-to-date Docker install, and a few command-line steps, but can save time over installing R and growthcleanr’s dependencies manually.

To install and run growthcleanr using Docker, open the PowerShell on Windows, or open the Terminal on macOS, and enter this docker command:

docker run -it ghcr.io/mitre/growthcleanr/gcr-image:latest R

The image tag latest in the example above will refer to the latest version of the package available on the main branch of the mitre/growthcleanr repository, which is typically in close sync with the upstream carriedaymont/growthcleanr repository. This will usually be the latest released version of growthcleanr. To explicitly choose a release by name, replace latest with the release tag, e.g. for the released package v2.1.0:

docker run -it ghcr.io/mitre/growthcleanr/gcr-image:v2.1.0 R

Whichever package you use, the first time this command is run, it might take a few minutes to download and extract several necessary components, but this should be fully automated. If successful, you should see an R prompt, from which you can use the growthcleanr package immediately.

This R environment is virtualized inside Docker, however, and isolated from your local machine. Because of this, you will need to map a local folder on your computer into the Docker environment to work with your own data. For example, if you are on Windows, and your data is in C:\Users\exampleuser\analysis, specify a mapping using the added -v step below:

docker run -it -v C:\Users\exampleusers\analysis:/usr/src/app \
    ghcr.io/mitre/growthcleanr/gcr-image:latest R

Note that the slashes in file paths reverse direction from the reference to the folder location on your Windows machine (before the colon) to the folder location on the Docker container (after the colon); this is intentional, and accounts for how the two different environments reference disk locations.

Note also that when mapping a folder on Windows, you may be prompted to confirm that you indeed want to “Share” the folder. This is a standard Windows security practice, and it is okay to confirm and proceed.

If you are on macOS, and your data is in /Users/exampleuser/analysis, specify a folder mapping like this:

docker run -it -v /Users/exampleuser/analysis:/usr/src/app \
    ghcr.io/mitre/growthcleanr/gcr-image:latest R

If you mapped a folder, then inside the Docker environment’s R prompt, when you then issue a command like list.files(), you should see a list of the same files in the R session that you see in that folder on your desktop. You can now open and read your data files, run cleangrowth() and other analyses, and write result files to that same directory.

Exit the Docker R environment with quit() as you normally would. Any new files you saved will appear in the desktop folder you mapped.

GitHub and source-level install for developers

You can install the growthcleanr package directly from GitHub using devtools in the R console with:

install.packages("devtools")
devtools::install_github("carriedaymont/growthcleanr", ref="main")

growthcleanr itself has several dependencies, so it may take a little while to download and install everything on your machine.

Note that the ref="main" part is required; the default value of ref refers to a branch name that is not used in the growthcleanr repository, which instead uses a default branch called “main”.

To install a different branch, for example if you want to test a branch associated with a merge request, specify the branch name as the value of ref.

If you are unable to install devtools, a similar function is available in the remotes package:

install.packages("remotes")
remotes::install_github("carriedaymont/growthcleanr", ref="main")

If you are developing the growthcleanr code itself, you can download or clone the growthcleanr source code and then install it from source. To clone the source using git:

git clone https://github.com/carriedaymont/growthcleanr.git

Once you have the growthcleanr package source, open an R session from the growthcleanr base directory. Then install growthcleanr using the R devtools package:

devtools::install(".")