Compartmentalized | Documented | Extendible | Reproducible | Robust |
This week I will discuss how to make an R package. R packages are not just for work that you share with others. Most of my code projects are organized into an R package and definitely any project that I have that involves data and code is organized into an R package.
Organizing your code into an R package is very easy. If you are at the stage where you write functions and multiple R scripts for your projects, you need to be aware of how to package your code because it is such a powerful (and common) code organization method in R. By the end of this session, you will be able to build your own mini R package. I’ll show you how to host it on GitHub with a nice little landing page.
This session does not cover R CMD check, C++ code in your package, or coding GitHub Actions workflows. If/when you want to go into R packaging in more depth, see Hadley Wickham’s book R Packages.
An R package is an easy and the standard way to organize your R code, document your code, and share your code with other people. Why use an R package rather than just make a bunch of scripts with your data in a folder?
Mac users: You don’t need to do anything.
RStudio Cloud users: You don’t need to do anything.
Windows users: Try running this code and see what happens. You need to install devtools package if you don’t have it.
# install.packages("devtools")
devtools::install_github("RWorkflow-Workshop-2021/week6-testpackage")
If that code complains, then you need to install RTools. Note there is a different RTools for R 4.0.0 (released in April 2020) versus earlier R releases. Look for the little link for earlier versions of RTools if you don’t have 4.0.0 installed. Technically, it says you only need RTools to install packages with C/C++ so you might be fine. Personally, I always install RTools on my Windows machines since I install packages with C/C++ sometimes. But to keep things simple, try building a package without RTools and see if it works.
Name the package MyNewPackage and select the directory where to put it.
Open this link, MyNewPackage
2 files and a directory.
DESCRIPTION This file has the meta-data about your package: its name and what packages it depends on. Most of it is self-explanatory. The Depends: and Imports: lines specify the other packages whose functions you use in your functions.
NAMESPACE This file indicates what needs to be exposed to users for your R package. For our course, you won’t need to edit it, as Roxygen2 takes care of it.
R directory This is where all your R code goes for your package.
man A directory for documentation. You won’t need to write this. It will be added automatically by Roxygen2.
data A directory for data files saved in RData format (with the ending .RData). Nothing else!
inst A folder for misc stuff.
inst\extdata A folder for external data.
data-raw A directory for raw data files that produced the data files in the data folder.
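To make the layout above concrete, here is a minimal sketch that builds the same set of files and folders in a temporary directory using only base R. It is an illustration of the structure, not how RStudio creates a package; the temporary location is an assumption so that nothing in your own project is touched.

```r
# Build the layout in a temporary directory (assumed location, for illustration).
pkg <- file.path(tempdir(), "MyNewPackage")
dir.create(pkg, showWarnings = FALSE)

# The two files and the R directory that a new package starts with.
file.create(file.path(pkg, c("DESCRIPTION", "NAMESPACE")))
dir.create(file.path(pkg, "R"), showWarnings = FALSE)

# The optional directories described above.
for (d in c("man", "data", "data-raw", file.path("inst", "extdata"))) {
  dir.create(file.path(pkg, d), recursive = TRUE, showWarnings = FALSE)
}

# Show everything that was created.
layout <- sort(list.files(pkg, recursive = TRUE, include.dirs = TRUE))
print(layout)
```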
You have built MyNewPackage and loaded it. You can use the package functions. Type
hello()
Create a new R script file. File > New File > R Script.
Paste this code into the script and save as littleforecast.R in the R directory.
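littleforecast <- function(data, nyears=10){
  # Fit an ARIMA model chosen automatically by the forecast package
  fit <- forecast::auto.arima(data)
  # Forecast nyears steps ahead
  fc <- forecast::forecast(fit, h = nyears)
  # Plot the forecast
  ggplot2::autoplot(fc)
}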
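Add this line to the NAMESPACE file.
export(littleforecast)
Add this line to the DESCRIPTION file.
Imports: forecast, ggplot2
Install the forecast and ggplot2 packages if you don’t have them.
install.packages("forecast")
install.packages("ggplot2")
Rebuild the package (Install and Restart from the Build tab), then type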
dat <- WWWusage
littleforecast(dat, nyears=100)
and a 100 year forecast of internet usage should appear.
Add a folder called data
Run these lines from the command line.
WWW2 <- WWWusage^2
save(WWW2, file="data/WWW2.RData")
Click Install and Restart from the Build tab
Now your data is available from your package. Type
WWW2
littleforecast(WWW2)
at the command line.
Open the file named DESCRIPTION. Most of it is self-explanatory. Depends: means the user will have all the commands of that package at the command line. Imports: lists any other R packages that your package needs in order to work but whose functions won’t be available at the command line (unless you choose). Developers: On July 23, I am going to show you exactly how to set up your NAMESPACE and Depends/Imports.
Package: MyNewPackage
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
Depends: R (>= 3.4.1), ggplot2
Imports: forecast
License: What license is it under? (GPL-3 or CC0 for US Government)
Encoding: UTF-8
LazyData: true
The packages on the Depends: and Imports: lines are required to be installed in order to install your package. If the user doesn’t have these packages, then they will be installed when installing the package.
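If you want to see which packages a DESCRIPTION file asks for, base R can read the file with read.dcf(). The sketch below writes a cut-down copy of the example above to a temporary file and pulls out the two fields; the temporary file and the helper split_field are assumptions for illustration, not part of the package.

```r
# Write a cut-down DESCRIPTION to a temporary file (illustration only).
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: MyNewPackage",
  "Version: 0.1.0",
  "Depends: R (>= 3.4.1), ggplot2",
  "Imports: forecast"
), desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# Hypothetical helper: split a comma-separated field into package entries.
split_field <- function(x) trimws(strsplit(x, ",")[[1]])

depends <- split_field(desc[1, "Depends"])
imports <- split_field(desc[1, "Imports"])
print(depends)
print(imports)
```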
The NAMESPACE file has the commands to export the functions (in the R folder) to the command line for use. If you don’t have a function here, the user will need to use ::: to access the function.
exportPattern("^[[:alpha:]]+")
export(littleforecast)
The first line means “export all functions whose names start with a letter”. I don’t normally have that line but it is handy when you are starting out; just export all your functions. The next line is exporting the littleforecast function.
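To see what the exportPattern() regular expression matches, you can try the same pattern with grepl(). This is a small sketch; .internal_helper is a made-up name used only to show a name that would not be exported.

```r
# The pattern from the exportPattern() line in NAMESPACE.
pattern <- "^[[:alpha:]]+"

# Two function names from this session plus one hypothetical dot-prefixed name.
object_names <- c("littleforecast", "hello", ".internal_helper")

# TRUE means the name starts with a letter and so would be exported.
exported <- grepl(pattern, object_names)
names(exported) <- object_names
print(exported)
```

Names that start with a dot do not match, which is one common way to keep helper functions out of the exported set.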
The R folder is where functions are put. Each file is a separate function. You can put multiple functions in one file, but that can get confusing unless they are small functions.
A function has this structure: a name and the names of the information passed into the function.
functionname <- function(infofunctionneeds1, infofunctionneeds2, ...){
  # The work
  result <- NULL  # placeholder: replace with what you want to return to the user
  return(result)
}
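As a concrete instance of that structure, here is a small made-up function. The name percent_change and its arguments are hypothetical and are not part of MyNewPackage.

```r
# Hypothetical example function following the structure above.
percent_change <- function(x, digits = 1){
  # The work: change between consecutive values, as a percentage
  change <- 100 * diff(x) / head(x, -1)
  # What is returned to the user
  return(round(change, digits))
}

percent_change(c(100, 110, 99))
```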
The code you will use to install from GitHub is:
library(devtools)
install_github("youraccount/MyNewPackage")
For example, to install the package on ‘RVerse-Tutorials’, you would use
install_github("RVerse-Tutorials/TestPackage")
If you are on a Windows machine and get an error saying ‘loading failed for i386’ or similar, then try
options(devtools.install.args = "--no-multiarch")
If R asks you to update packages, and then proceeds to fail at installation because of a warning that a package was built under a later R version than you have on your computer, use
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS=TRUE)
If R asks you to update packages, you don’t need to update (normally). If you do update, and it asks if you want to install from source, you can probably say ‘No’. It is very unlikely that the package you are trying to install needs the most updated version of a package. If that were the case, the package writer could have explicitly stated a version dependency, like forecast (>=2.0).
If R simply won’t install a package from GitHub/Lab (or CRAN even) because of a package dependency problem, with an error something like can't install because couldn't remove old installation, then click on the Packages tab (lower right panel) and click Install. Look at where R is installing packages. There might be more than one place. Close all your RStudio windows (exit altogether) and go to those locations and delete the library folder(s) for the offending package. Then open RStudio and re-install that package.
To limit the number of headaches that users face when trying to install your package from GitHub/Lab, list as few packages as possible on the Depends: and Imports: lines in the DESCRIPTION file. If your package does not need a package in order to work, then put that package on the Suggests: line.
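A function can check for a Suggests: package at run time with requireNamespace() and fall back to something else if it is missing. The sketch below assumes ggplot2 has been moved to Suggests:; the function plot_series is hypothetical and only shows the pattern.

```r
# Hypothetical function: uses ggplot2 only if it is installed.
plot_series <- function(x){
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    # Fall back to base graphics when the suggested package is missing
    message("ggplot2 is not installed; using base graphics instead.")
    plot(as.numeric(x), type = "l")
    return(invisible(FALSE))
  }
  df <- data.frame(t = seq_along(x), value = as.numeric(x))
  p <- ggplot2::ggplot(df, ggplot2::aes(x = t, y = value)) +
    ggplot2::geom_line()
  print(p)
  invisible(TRUE)
}

# Returns (invisibly) whether ggplot2 was used.
used_ggplot2 <- plot_series(WWWusage)
```

With this pattern, a user without ggplot2 can still install and use the package; they just get the simpler plot.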
Make a release on GitHub?: Click Release to the right on your GitHub repo.
To install the latest release
install_github("youraccount/MyNewPackage@*release")
Why? It looks nicer and conveys the needed info to users. This is for GitHub.
In MyNewPackage, create a new text file called README.md and type in some info about your package.
We’ll cover pkgdown next week.
A data package can be exactly the same as a code package except that you don’t have much in the R folder and you have a lot in the data folder. A “data” package is just dedicated to data. There is nothing else very special about it.
Let’s alter MyNewPackage to add some data and document that data.
I am going to use the Roxygen2 workflow for the documentation. You should do that too. To set up for Roxygen2, go to Tools > Project Options > Build Tools. Check the ‘Generate documentation with Roxygen’ box and then click Configure. Make sure the ‘Install and Restart’ box at the bottom is checked.
Create a data-raw folder.
Put a data file, mydata.csv, in it.
Read the file into an object named mydata and save that to a mydata.rda file in the data folder. Save your code.
Create a file named mydata.R in the R folder. This is how you document your data. Add this to that file.
#' @title My Data
#'
#' @description My dataset on diamonds and here is more info.
#'
#' \itemize{
#' \item price. price in US dollars
#' \item carat. weight of the diamond
#' }
#'
#' @docType data
#' @name mydata
#' @usage data(mydata)
#' @format A data frame with 10 rows and 2 variables
NULL
Note, in the latest Roxygen2, you don’t need the @name but that only works if you use LazyData: true in your DESCRIPTION file. For a pure data package, you might not want to do that.
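The step of reading mydata.csv and saving mydata.rda could be sketched as below. This is a minimal sketch: a temporary folder stands in for the package directory, and the values in the csv are placeholders written only so the example runs. In your package you would read your own mydata.csv from data-raw and save to the real data folder.

```r
# A temporary folder stands in for the package directory (assumption).
pkg <- file.path(tempdir(), "MyNewPackage")
dir.create(file.path(pkg, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "data"), showWarnings = FALSE)

# Placeholder values only, so the example runs; use your own mydata.csv instead.
csv_file <- file.path(pkg, "data-raw", "mydata.csv")
write.csv(data.frame(price = seq(300, 390, by = 10),
                     carat = seq(20, 29) / 100),
          csv_file, row.names = FALSE)

# Read the raw file and save the object as an rda file in the data folder.
mydata <- read.csv(csv_file)
save(mydata, file = file.path(pkg, "data", "mydata.rda"))
```

Keeping this script in data-raw means you can re-create the rda file whenever the raw data changes.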
Let’s use our new data package in an R Markdown document.
---
title: "Untitled"
output: html_document
---
```{r, eval=FALSE}
devtools::install_github("RWorkflow-Workshop-2021/MyNewPackage@*release")
```
```{r, echo=FALSE}
library(MyNewPackage)
data(mydata)
knitr::kable(mydata,
             caption=paste("This is version", packageVersion("MyNewPackage")))
```
Keep your raw data files in data-raw so that you have the raw data and the rda files in the data directory. You can put whatever you want into data-raw.
In the .Rbuildignore file, add the line ^data-raw$ to not include that in a build.
Do you have to use data-raw? No. Another common place is inst\extdata. Which one you use is up to you. I use extdata more as a sandbox and it’ll have all sorts of info used to make the data files.
Create a data folder.
Add rda files to the data folder.
Add .R files to the R folder with your Roxygen2 data documentation.
If you use LazyData: true and your data all have unique names, you can use:
#' dataname
#'
#' data description
#'
"dataname"
or if you use LazyData: false and your data do not have unique names, you use:
#' dataname
#'
#' data description
#'
#' @docType data
#' @name foo
NULL
Note, you’ll want to keep your raw data and the code that converts it into the rda files with the package. Put them in data-raw or inst\extdata.
In a team application, you’ll be dividing up the work.
Raw data goes in data-raw in whatever format your team uses.
Documentation goes in the R directory. Note, in VRData the documentation headers are in data-raw and I have code that processes that into the files in R.
Comment on creating the Rd file
The Rd file in the man directory is what makes the documentation. The R Packages section on documenting data shows you how to write those files.
But keep the following in mind. The Roxygen2 code that is shown in that section is where the dataset is defined when library(mypackage) is called. That would happen if LazyData: true in the DESCRIPTION file. Here’s how the new Roxygen2 code looks. Notice no @name mydata and NULL at the bottom is replaced with "mydata".
If you changed to LazyData: false, all that Roxygen2 code is going to fail. So I personally would not use the new Roxygen2 “shortcut”.
Why would you ever set LazyData: false? Because some of your data have the same name. I make R data packages with 100s of datasets with the exact same structure and same name. I use them like so where dat is a character string name of my data:
All my data are stored with the name salmon not with the data file name.
So like so:
I don’t ever want to refer to the Columbia River data as columbia-river-chinook-esu. In my workflow, that wouldn’t make sense.
But in other applications, it often makes sense to give your data a specific name, like sst or nooksack-river or thedata. In that case, the style in the R Packages section on documenting data is fine.
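The idea of many data files that all store an object with the same name can be sketched with base R save() and load(). This is not my package code; it is a stand-in under stated assumptions: a temporary folder plays the role of the data folder, the two data sets and their values are made up, and load_salmon is a hypothetical helper. In an installed package with LazyData: false, data(list = dat, package = "yourpackage") plays a similar role.

```r
# A temporary folder stands in for the data folder of a package (assumption).
data_dir <- file.path(tempdir(), "salmon-data")
dir.create(data_dir, showWarnings = FALSE)

# Two made-up data sets; both are saved under the object name salmon.
salmon <- data.frame(year = 2001:2003, count = c(10, 12, 9))
save(salmon, file = file.path(data_dir, "river-a.rda"))
salmon <- data.frame(year = 2001:2003, count = c(4, 6, 5))
save(salmon, file = file.path(data_dir, "river-b.rda"))
rm(salmon)

# Hypothetical helper: dat is a character string naming the data file.
load_salmon <- function(dat){
  env <- new.env()
  load(file.path(data_dir, paste0(dat, ".rda")), envir = env)
  env$salmon  # always the same object name, whichever file was loaded
}

dat <- "river-b"
salmon <- load_salmon(dat)
print(salmon)
```

The rest of the workflow can then always refer to salmon, whichever file dat points to.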