| Compartmentalized | Documented | Extendible | Reproducible | Robust |
An R package is the standard, and an easy, way to organize, document, and share your R code. Why use an R package rather than just make a bunch of scripts with your data in a folder?
Name the package TestPackage and select the directory where to put it. Also check the little box saying ‘Create git repository’. That’s it!!
You will see a ‘Build’ tab.
If you want to use RStudio Cloud
Open this link, TestPackage. You will need to log in. You can use your Google account.
2 files and a directory.
DESCRIPTION This file has the meta-data about your package: its name and what packages it depends on. Most of it is self-explanatory. The Depends: and Imports: lines list the other packages whose functions you use in your functions.
NAMESPACE This file indicates what needs to be exposed to users of your R package. For our course, you won’t need to edit it, as {roxygen2} takes care of it.
R directory This is where all your R code goes for your package.
man A directory for documentation. You won’t need to write this. It will be added automatically by {roxygen2}.
data A directory for data files saved in RData
format with the ending .rda or .RData. Nothing
else!
inst A folder for miscellaneous files.
inst/extdata A folder for external data.
data-raw A directory for the raw data files that produced the data files in the data folder.
.Rbuildignore optional, but in practice you will
always need this.
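To make the layout above concrete, here is a minimal sketch that builds the same folder structure by hand in a temporary directory using only base R. It is just an illustration: RStudio (or {usethis}) normally creates these for you, and the file contents written here are placeholders, not a complete package.

```r
# Build the folder layout described above inside a temporary directory.
# "TestPackage" is just the example name used in this tutorial.
pkg <- file.path(tempdir(), "TestPackage")
unlink(pkg, recursive = TRUE)  # start clean if the sketch is re-run

# Directories: R code, data files, raw data, and external data under inst
dirs <- c("R", "data", "data-raw", file.path("inst", "extdata"))
for (d in dirs) dir.create(file.path(pkg, d), recursive = TRUE)

# Placeholder files; {roxygen2} would normally generate NAMESPACE and man
writeLines(c("Package: TestPackage", "Version: 0.1.0"),
           file.path(pkg, "DESCRIPTION"))
writeLines("^data-raw$", file.path(pkg, ".Rbuildignore"))

# List everything that was created (all.files = TRUE shows .Rbuildignore too)
created <- list.files(pkg, recursive = TRUE, all.files = TRUE,
                      include.dirs = TRUE)
print(created)
```

Because everything is written under tempdir(), the sketch leaves your real project untouched.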
By default, RStudio will create the following files.
DESCRIPTION
Package: DeleteMe
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true
NAMESPACE
exportPattern("^[[:alpha:]]+")
This says to export every object whose name starts with a letter, which in practice means all the functions in the R folder.
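To see what that pattern does, you can try the same regular expression on a few names with grepl(). This is only a sketch: the object names below are made up for illustration.

```r
# Hypothetical object names, as they might appear in a package's R folder
object_names <- c("hello", "littleforecast", ".internal_helper", "WWW2")

# The same regular expression that appears in exportPattern()
exported <- grepl("^[[:alpha:]]+", object_names)
names(exported) <- object_names
print(exported)
```

Names that start with a dot do not match, so a leading dot is a simple way to keep a helper out of a pattern-based export.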
We are going to use {roxygen2} which will create our documentation. You should always use this. Don’t get into the bad habit of writing functions without documentation headers!
We could use
usethis::create_package("../TestPackage")
to create our package with {roxygen2} set up, but I’ll walk you through doing it manually.
install.packages("roxygen2")
Delete the NAMESPACE file
{roxygen2} is going to create its own NAMESPACE file, so we need to get rid of the non-roxygen2 one. If you forget, you’ll see a warning and {roxygen2} won’t overwrite the old one.
Click on Tools > Project Options > Build Tools
Make sure Generate documentation with Roxygen is checked. Don’t see that? Then you need to install the {roxygen2} package.
Click Configure next to the Roxygen line. Make sure all the checkboxes are checked. The last 2 won’t be by default.
Change hello.R in the R folder.
Paste this code into the script and save. The #' is
the {roxygen2} header.
#' @title Hello!
#'
#' @description This function just says hello.
#'
#' @export
hello <- function(){ cat("HELLO") }
Learn about your function with
?hello
Use your function with
hello()
Add a folder called data
Run these lines from the command line.
WWW2 <- WWWusage^2
save(WWW2, file="data/WWW2.rda")
Click Install and Restart from the Build tab
Now your data are available from your package. Type
WWW2
at the command line.
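If you are curious what is inside an .rda file, this sketch saves the same object to a temporary folder and loads it back into a fresh environment. The temporary path is an assumption made so that the example does not write into your package.

```r
# Write to a temporary folder so this sketch does not touch your package
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_file <- file.path(data_dir, "WWW2.rda")

WWW2 <- WWWusage^2   # WWWusage ships with R in the datasets package
save(WWW2, file = rda_file)

# Load into a fresh environment to see what the file contains
e <- new.env()
loaded <- load(rda_file, envir = e)
print(loaded)                    # names of the objects stored in the file
print(identical(e$WWW2, WWW2))   # the loaded copy equals the original
```

load() returns the names of the objects it restored, which is a quick way to check what an .rda file holds.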
Now we will add a function that uses another R package.
Create a new R script file. File > New File > R Script.
Paste this code into the script and save as
littleforecast.R in the R directory.
#' Forecast with Arima Model
#'
#' This fits an Arima model to data with forecast's auto.arima() function and plots
#' a forecast with the forecast() function.
#'
#' @param data A vector (time series) of data
#' @param nyears Number of time steps to forecast forward
#' @return A plot of a forecast.
#' @examples
#' dat <- WWWusage
#' littleforecast(dat, nyears=100)
#' @export
littleforecast <- function(data, nyears=10){
  fit <- forecast::auto.arima(data)
  fc <- forecast::forecast(fit, h = nyears)
  ggplot2::autoplot(fc)
}
This function depends on some packages: {forecast} and {ggplot2}. We need to tell our package about these dependencies.
Add this line to the DESCRIPTION file after the
Description: line:
Imports: forecast, ggplot2
Click Build > Install and Restart. Now we can use our function.
littleforecast(WWW2)
Let’s edit our DESCRIPTION file to look like so:
Package: TestPackage
Title: This Is A Toy Package
Version: 1.3
Author: Eli Holmes
Maintainer: Eli Holmes <eli.holmes@noaa.gov>
Description: This is a super simple toy package for students to copy and experiment with for the short course.
Depends: R (>= 3.4.1)
Imports: forecast, ggplot2
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.2
The packages on the Depends and Imports lines are required to be installed in order to install your package. If the user doesn’t have these packages, then they will be installed when installing the package. When you try to Build and Install, R will complain and throw an error if you are missing packages.
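A DESCRIPTION file is plain text, so you can inspect its dependency fields from R. The sketch below writes a cut-down copy of the file above to a temporary location, reads it with base R's read.dcf(), and checks which imported packages are installed. The temporary file and the shortened field list are assumptions for illustration.

```r
# Write a cut-down DESCRIPTION to a temporary file
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: TestPackage",
  "Version: 1.3",
  "Depends: R (>= 3.4.1)",
  "Imports: forecast, ggplot2"
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file, fields = c("Package", "Depends", "Imports"))

# Split the comma-separated Imports field into package names
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
print(imports)

# system.file() returns "" for a package that is not installed
installed <- vapply(imports,
                    function(p) nzchar(system.file(package = p)),
                    logical(1))
print(installed)
```

Any FALSE in the last result points at a package that would need to be installed before yours can be.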
Depends: means the user will have all the commands of that package at the command line.
Imports: lists any other R packages that your package needs in order to work, but whose functions won’t be available at the command line (unless you choose to make them available).
{roxygen2} made this NAMESPACE file.
export(littleforecast)
export(hello)
How does {roxygen2} know to export a function? Add this to the documentation code at the top of your functions.
#' @export
The R folder is where our functions and our data documentation files are put. Typically each file holds a single function. You can put multiple functions in one file, but that can get confusing unless they are small functions. The top of the function has documentation in {roxygen2} format.
#' @title A little foo function
#'
#' @description This little function does this.
#'
#' @param arg1 what this argument is
#' @export foo
foo <- function(arg1){
  # The work goes here; as a placeholder, just pass the argument through
  result <- arg1
  # Return what you want to give back to the user
  return(result)
}
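As a filled-in instance of that template, here is a small sketch. The function std_error() and its arguments are hypothetical, invented only to show what a complete {roxygen2} header looks like.

```r
#' @title Standard error of the mean
#'
#' @description Computes the standard error of the mean of a numeric vector.
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped first?
#' @return A single number.
#' @export
std_error <- function(x, na.rm = FALSE) {
  # Optionally drop missing values before computing
  if (na.rm) x <- x[!is.na(x)]
  # Standard deviation divided by the square root of the sample size
  sd(x) / sqrt(length(x))
}

# Try it at the command line
print(std_error(c(2, 4, 4, 4, 5, 5, 7, 9)))
```

The #' lines are ordinary comments to R, so you can source and test the function before building any documentation.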
.Rbuildignore
Though not required, in practice you will need to tell R what not to include in your package. RStudio will make this for you but you need to check it and add more stuff.
^.*\.Rproj$
^\.Rproj\.user$
^TestPackage\.Rcheck$
^TestPackage.*\.tar\.gz$
^TestPackage.*\.tgz$
^\.github$
^\.git$
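Each line of .Rbuildignore is a regular expression that is matched against file paths in your package, so it helps to test patterns before relying on them. Here is a rough sketch with grepl(): the paths are hypothetical, and the perl = TRUE and ignore.case = TRUE settings are my assumption about how R applies these patterns.

```r
# Hypothetical paths relative to the package root
paths <- c("TestPackage.Rproj", ".Rproj.user", "TestPackage_1.3.tar.gz",
           ".git", "R/hello.R", "R/digits.R")

# Return the paths that a given pattern would exclude from the build
ignored_by <- function(pattern) {
  paths[grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)]
}

print(ignored_by("^.*\\.Rproj$"))
print(ignored_by("^TestPackage.*\\.tar\\.gz$"))
print(ignored_by("^\\.git$"))  # anchored: only the .git folder itself
print(ignored_by(".git"))      # unanchored: "." matches any character
```

The last call also picks up R/digits.R, which is why anchoring with ^ and $ and escaping the dot is the safer habit.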
Create a new R script file. File > New File > R Script.
Paste this code into the script and save as
irisaverages.R in the R directory.
#' dplyr example
#'
#' This adds a new function that needs {dplyr}
#' @param col which column to average
#' @export
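irisaverages <- function(col = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")){
  # Pick one of the allowed column names (defaults to the first)
  col <- match.arg(col)
  # Copy the chosen column into a column named "col" so summarize() can use it
  iris$col <- iris[[col]]
  iris %>% dplyr::group_by(Species) %>%
    dplyr::summarize(mean = mean(col))
}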
We now use {dplyr} and %>% (the pipe).
We could add {dplyr} to Depends in our DESCRIPTION file, but that would attach the whole {dplyr} package for the user and maybe we don’t want to do that.
Instead we can add {dplyr} to Imports, but then how do we get %>%? Add a file import_packages.R to the R folder (the name of the file is unimportant) with the lines below, and list {magrittr} in Imports as well.
#' @importFrom magrittr %>%
NULL
or add
#' @importFrom magrittr %>%
to the header of irisaverages.R.
How would I ever remember this?? Sadly, if you use the
%>% pipe, you’ll get lots of practice with this.
Starting with R version 4.1, there is now a
native R pipe, |>, which works like
%>% in most cases so you might want to switch to
that.
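For comparison, here is a small sketch of the native pipe in base R. It needs R 4.1 or later and no imports at all. The per-species mean below uses base R's tapply() rather than {dplyr}, so it is an alternative way to get a similar result, not a rewrite of irisaverages().

```r
# Count the setosa rows using the native pipe
n_setosa <- iris |> subset(Species == "setosa") |> nrow()
print(n_setosa)

# Per-species mean of one column, using only base R
species_means <- iris |> with(tapply(Sepal.Length, Species, mean))
print(species_means)
```

Since |> is part of the language itself, nothing needs to be added to DESCRIPTION or NAMESPACE to use it, apart from requiring a recent enough R.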
Add the data to the data folder as an .rda or
.RData file.
setosa <- subset(iris, Species=="setosa")
save(setosa, file="data/setosa.rda")
Add data-setosa.R to the R folder. Tip: it is good to
give your data documentation scripts a clear name tag to distinguish
them from functions.
#' @title The setosa dataset
#'
#' @description
#'
#' \itemize{
#' \item Sepal.Length. length of sepals
#' \item Sepal.Width. width of sepals
#' \item Petal.Length. length of petals
#' \item Petal.Width. width of petals
#' }
#'
#' @docType data
#' @name setosa
#' @usage data(setosa)
#' @references R base package.
#' @format A data frame.
#' @keywords datasets
NULL
Note: in the latest {roxygen2}, you don’t need the
@name, but that only works if you use
LazyData: true in your DESCRIPTION file. You
might not want to load data every time the user loads the package.
The rda filename in the data folder is what
is used to load data. For example, let’s say you have
save(cars1, cars2, file="data/carsdata.rda")
So 2 data objects are saved to one rda file. To load both
data objects, you use
data(carsdata)
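You can check how one file holds several objects with a quick sketch outside the package. Here cars1 and cars2 are hypothetical stand-ins built from the built-in mtcars data, and the file goes to a temporary folder.

```r
# Hypothetical stand-ins for cars1 and cars2
cars1 <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]
cars2 <- mtcars[mtcars$cyl == 8, c("mpg", "hp")]

# Save both objects into a single rda file
rda_file <- file.path(tempdir(), "carsdata.rda")
save(cars1, cars2, file = rda_file)

# Load into a fresh environment and see which objects come back
e <- new.env()
loaded <- load(rda_file, envir = e)
print(loaded)

# There is no object called carsdata; that is only the file name
print(exists("carsdata", envir = e, inherits = FALSE))
```

This is the distinction that matters for documentation: the objects have names you can look up, while the file name is only a container.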
What do I document: cars1, cars2 or
carsdata? You can actually do whatever you want.
Do this to show this documentation with ?cars2.
#' @title a dataset of horsepower for different cars
#'
#' @docType data
#' @name cars2
NULL
Do this to show this documentation with ?cars1,
?cars2, and ?carsdata
#' @title some datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
#' @aliases cars1 cars2
NULL
Do this to show this documentation with ?carsdata.
#' @title two datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
NULL
This next style, with the object name in quotes instead of @name, will only work for data objects that are exported. That means
LazyData: true is set and the name is one of the objects loaded by
data(carsdata).
#' @title two datasets of horsepower for different cars
#'
#' @docType data
"cars2"
But this fails, since carsdata is not an exported object.
It is just the name of the data file.
#' @title two datasets of horsepower for different cars
#'
#' @docType data
"carsdata"
If/when you want to go into R packaging in more depth, see Hadley Wickham’s book R Packages. However, for simple packages you don’t need the book.