An R package is the easy, standard way to organize, document, and share your R code. Why use an R package rather than just a bunch of scripts with your data in a folder? Because a package makes your code:
Compartmentalized | Documented | Extendible | Reproducible | Robust
Name the package TestPackage and select the directory where to put it. Also check the little box saying ‘Create git repository’. That’s it! You will see a ‘Build’ tab.
If you want to use RStudio Cloud, open this link, TestPackage. You will need to log in; you can use your Google account.
A minimal package is 2 files and a directory.
DESCRIPTION This file has the meta-data about your package: its name and what packages it depends on. Most of it is self-explanatory. The Depends: and Imports: lines specify the other packages whose functions you use in your functions.
NAMESPACE This file indicates what needs to be exposed to users of your R package. For our course, you won’t need to edit it, as {roxygen2} takes care of it.
R directory This is where all your R code goes for your package.
man A directory for documentation. You won’t need to write this. It will be added automatically by {roxygen2}.
data A directory for data files saved in RData format with the ending .rda or .RData. Nothing else!
inst A folder for miscellaneous files.
inst/extdata A folder for external data.
data-raw A directory for the raw data files that produced the data files in the data folder.
.Rbuildignore Optional, but in practice you will always need this.
By default, RStudio will create the following files.
DESCRIPTION
Package: DeleteMe
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true
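DESCRIPTION is a plain "Field: value" file, and base R can read it with read.dcf(). The following is a minimal sketch, assuming a throwaway copy written to a temporary directory; the GPL-2 license value is only a placeholder for illustration. It shows that an indented continuation line is folded into the Description field.

```r
# Write a small DESCRIPTION file to a temporary location (illustrative copy only).
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: DeleteMe",
  "Type: Package",
  "Title: What the Package Does (Title Case)",
  "Version: 0.1.0",
  "Description: More about what it does (maybe more than one line)",
  "    Use four spaces when indenting paragraphs within the Description.",
  "License: GPL-2"  # placeholder license for this sketch
), desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)
print(colnames(desc))

# The indented line belongs to the Description field, not to a new field.
print(desc[1, "Description"])
```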
NAMESPACE
exportPattern("^[[:alpha:]]+")
This says to export every object whose name starts with a letter, which in practice means all functions in the R folder.
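To see what that pattern selects, you can try the same regular expression on a few names with grepl(). This is only a sketch: .helper is a made-up name standing in for a function you would want to keep internal.

```r
# Names of objects that might live in a package; ".helper" is hypothetical.
object_names <- c("hello", "littleforecast", ".helper", "WWW2")

# The same regular expression that exportPattern() uses above.
exported <- grepl("^[[:alpha:]]+", object_names)

# Names starting with a letter match; names starting with a dot do not.
print(setNames(exported, object_names))
```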
We are going to use {roxygen2} which will create our documentation. You should always use this. Don’t get into the bad habit of writing functions without documentation headers!
We could use
usethis::create_package("../TestPackage")
to create our package with {roxygen2} set up, but I’ll walk you through doing it manually.
install.packages("roxygen2")
Delete the NAMESPACE file. {roxygen2} is going to create that, so we need to get rid of the non-roxygen2 one. If you forget, you’ll see a warning and {roxygen2} won’t replace the old one.
Click on Tools > Project Options > Build Tools
Make sure Generate documentation with Roxygen is checked. Don’t see that? Then you need to install the {roxygen2} package.
Click Configure next to the Roxygen line. Make sure all the checkboxes are checked. The last 2 won’t be by default.
Change hello.R in the R folder. Paste this code into the script and save. The #' lines are the {roxygen2} header.
#' @title Hello!
#'
#' @description This function just says hello.
#'
#' @export
hello <- function(){ cat("HELLO") }
Learn about your function with
?hello
Use your function with
hello()
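The documentation step can also be run from the console instead of the Build tab. The sketch below is one way to try that, assuming {roxygen2} is installed. It writes a throwaway copy of the package to a temporary directory (the path, version number, and Description text are illustrative, not the course package) and calls roxygen2::roxygenise() on it. With the #' @export tag in place, you would expect a NAMESPACE file and a man folder to appear in the listing.

```r
# Build a throwaway package skeleton in a temporary directory.
pkg <- file.path(tempdir(), "TestPackage")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; the field values are placeholders for this sketch.
writeLines(c(
  "Package: TestPackage",
  "Title: This Is A Toy Package",
  "Version: 0.1.0",
  "Description: A throwaway package for trying out roxygen2.",
  "License: GPL-2",
  "Encoding: UTF-8"
), file.path(pkg, "DESCRIPTION"))

# The hello() function with its roxygen2 header, as in the text.
writeLines(c(
  "#' @title Hello!",
  "#'",
  "#' @description This function just says hello.",
  "#'",
  "#' @export",
  "hello <- function(){ cat(\"HELLO\") }"
), file.path(pkg, "R", "hello.R"))

# Only run the documentation step if roxygen2 is available.
if (requireNamespace("roxygen2", quietly = TRUE)) {
  roxygen2::roxygenise(pkg)
  print(list.files(pkg, recursive = TRUE))
}
```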
Add a folder called data
Run these lines from the command line.
WWW2 <- WWWusage^2
save(WWW2, file="data/WWW2.rda")
Click Install and Restart from the Build tab
Now your data are available from your package. Type WWW2 at the command line.
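An .rda file stores the object together with its name, which is why WWW2 comes back under the same name. A small sketch of that round trip, assuming a temporary file rather than the package's data folder:

```r
# Create the object and save it to a temporary .rda file.
WWW2 <- WWWusage^2
rda_file <- file.path(tempdir(), "WWW2.rda")
save(WWW2, file = rda_file)

# Load it into a fresh environment so we can see exactly what the file holds.
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)

# load() returns the names of the objects it restored.
print(loaded_names)
print(identical(restored$WWW2, WWW2))
```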
Now we will add a function that uses another R package.
Create a new R script file. File > New File > R Script.
Paste this code into the script and save as littleforecast.R in the R directory.
#' Forecast with Arima Model
#'
#' This fits an Arima model to data with forecast's auto.arima() function and plots
#' a forecast with the forecast() function.
#'
#' @param data A vector (time series) of data
#' @param nyears Number of time steps to forecast forward
#' @return A plot of a forecast.
#' @examples
#' dat <- WWWusage
#' littleforecast(dat, nyears=100)
#' @export
littleforecast <- function(data, nyears=10){
  fit <- forecast::auto.arima(data)
  fc <- forecast::forecast(fit, h = nyears)
  ggplot2::autoplot(fc)
}
This function depends on some packages: {forecast} and {ggplot2}. We need to tell our package about these dependencies.
Add this line to the DESCRIPTION file after the Description: line:
Imports: forecast, ggplot2
Click Build > Install and Restart. Now we can use our function.
littleforecast(WWW2)
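Inside littleforecast(), every outside function is called with the :: operator, which uses a package without attaching it to the search path. The sketch below illustrates that behaviour with {tools}, a package that ships with R and is normally not attached; it is a stand-in chosen so the example runs without installing anything.

```r
# Is the package attached (on the search path) before we use it?
attached_before <- "package:tools" %in% search()

# Call a single function with :: without attaching the package.
ext <- tools::file_ext("data/WWW2.rda")
print(ext)

# Using :: did not change what is attached.
attached_after <- "package:tools" %in% search()
print(c(before = attached_before, after = attached_after))
```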
Let’s edit our DESCRIPTION file to look like so:
Package: TestPackage
Title: This Is A Toy Package
Version: 1.3
Author: Eli Holmes
Maintainer: Eli Holmes <eli.holmes@noaa.gov>
Description: This is a super simple toy package for students to copy and experiment with for the short course.
Depends: R (>= 3.4.1)
Imports: forecast, ggplot2
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.2
The packages on the Depends and Imports lines are required to be installed in order to install your package. If the user doesn’t have these packages, then they will be installed when installing the package. When you try to Build and Install, R will complain and throw an error if you are missing packages.
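Before building, you can check from the console whether the packages on the Imports line are installed. This is a rough sketch using requireNamespace(); the install.packages() call is left commented out so nothing is installed unless you choose to.

```r
# Packages listed on the Imports line of DESCRIPTION.
needed <- c("forecast", "ggplot2")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything missing.
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```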
Depends: means the user will have all the commands of that package at the command line.
Imports: lists any other R packages that your package needs in order to work, but their functions won’t be available at the command line (unless you choose).
{roxygen2} made this NAMESPACE file.
export(littleforecast)
export(hello)
How does {roxygen2} know to export a function? Add this to the documentation code at the top of your functions.
#' @export
The R folder is where functions and our data documentation files are put. Each file holds a separate function. You can put multiple functions in one file, but that can get confusing unless they are small functions. The top of the function has documentation in {roxygen2} format.
#' @title A little foo function
#'
#' @description This little function does this.
#'
#' @param arg1 what this argument is
#' @export foo
foo <- function(arg1){
  # The work goes here; this placeholder just passes the argument through
  result <- arg1
  # Return what you want to give back to the user
  return(result)
}
.Rbuildignore
Though not required, in practice you will need to tell R what not to include in your package. RStudio will make this for you but you need to check it and add more stuff.
^.*\.Rproj$
^\.Rproj\.user$
^TestPackage\.Rcheck$
^TestPackage.*\.tar\.gz$
^TestPackage.*\.tgz$
^\.github$
^\.git$
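Each line of .Rbuildignore is a regular expression matched against file paths relative to the package folder. A sketch for checking what a set of patterns would exclude, assuming Perl-style, case-insensitive matching as R CMD build documents; the file paths are made up for illustration.

```r
# Patterns as they appear in .Rbuildignore (backslashes doubled for R strings).
ignore_patterns <- c(
  "^.*\\.Rproj$",
  "^\\.Rproj\\.user$",
  "^TestPackage\\.Rcheck$",
  "^TestPackage.*\\.tar\\.gz$",
  "^TestPackage.*\\.tgz$"
)

# Hypothetical paths relative to the package's top-level folder.
paths <- c("TestPackage.Rproj", ".Rproj.user", "R/hello.R",
           "TestPackage_1.3.tar.gz", "DESCRIPTION")

# A path is ignored if any pattern matches it.
is_ignored <- vapply(paths, function(p) {
  any(vapply(ignore_patterns, grepl, logical(1),
             x = p, perl = TRUE, ignore.case = TRUE))
}, logical(1))

print(is_ignored)
```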
Create a new R script file. File > New File > R Script.
Paste this code into the script and save as irisaverages.R in the R directory.
#' dplyr example
#'
#' This adds a new function that needs {dplyr}
#' @param col which column to average
#' @export
irisaverages <- function(col = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")){
  col <- match.arg(col)
  iris$col <- iris[[col]]
  iris %>% dplyr::group_by(Species) %>%
    dplyr::summarize(mean = mean(col))
}
We now use {dplyr} and the %>% pipe.
We could add {dplyr} to Depends in our DESCRIPTION file, but that would attach the whole {dplyr} package and maybe we don’t want to do that.
Instead we can add {dplyr} to Imports, but how do we get %>%? Add {magrittr} to Imports as well, then add a file import_packages.R to the R folder (the name of the file is unimportant).
#' @importFrom magrittr %>%
NULL
or add
#' @importFrom magrittr %>%
to the header of irisaverages.R.
How would I ever remember this? Sadly, if you use the %>% pipe, you’ll get lots of practice with this.
Starting with R version 4.1, there is a native R pipe, |>, which works like %>% in most cases, so you might want to switch to that.
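As a rough illustration of the native pipe, here is a variant of the averaging function that needs neither {dplyr} nor {magrittr}. It is a sketch, not a drop-in replacement: irisaverages_base is a made-up name, it uses base R's tapply() instead of group_by() and summarize(), it returns a named array rather than a data frame, and it requires R 4.1 or later.

```r
# Hypothetical base-R variant; no imports needed beyond base R.
irisaverages_base <- function(col = c("Sepal.Length", "Sepal.Width",
                                      "Petal.Length", "Petal.Width")) {
  col <- match.arg(col)
  # The native pipe passes the column as the first argument of tapply(),
  # which then averages it within each species.
  iris[[col]] |> tapply(iris$Species, mean)
}

print(irisaverages_base("Sepal.Length"))
```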
Add the data to the data folder as an .rda or .RData file.
setosa <- subset(iris, Species=="setosa")
save(setosa, file="data/setosa.rda")
Add data-setosa.R to the R folder. Tip: it is good to give your data documentation scripts a clear name tag to distinguish them from functions.
#' @title The setosa dataset
#'
#' @description
#'
#' \itemize{
#' \item Sepal.Length. length of sepals
#' \item Sepal.Width. width of sepals
#' \item Petal.Length. length of petals
#' \item Petal.Width. width of petals
#' }
#'
#' @docType data
#' @name setosa
#' @usage data(setosa)
#' @references R base package.
#' @format A data frame.
#' @keywords datasets
NULL
Note, in the latest Roxygen2, you don’t need the @name, but that only works if you use LazyData: true in your DESCRIPTION file. You might not want the data made available every time the user loads the package.
The rda filename in the data folder is what is used to load data. For example, let’s say you have
save(cars1, cars2, file="data/carsdata.rda")
So 2 data objects are saved to one rda file. To load both data objects, you use
data(carsdata)
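You can confirm that one file holds both objects by saving and loading outside of a package. A minimal sketch, assuming a temporary file; cars1 and cars2 are made-up stand-ins built from mtcars, since their real contents are not given here.

```r
# Made-up stand-ins for the two data objects.
cars1 <- mtcars[1:5, "hp", drop = FALSE]
cars2 <- mtcars[6:10, "hp", drop = FALSE]

# Save both objects into a single .rda file.
rda_file <- file.path(tempdir(), "carsdata.rda")
save(cars1, cars2, file = rda_file)

# Load into a fresh environment to see what the file contains.
restored <- new.env()
load(rda_file, envir = restored)

# The file name (carsdata) is not an object; only cars1 and cars2 are.
print(sort(ls(restored)))
```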
What do I document: cars1, cars2, or carsdata? You can actually do whatever you want.
Do this to show this documentation with ?cars2.
#' @title a dataset of horsepower for different cars
#'
#' @docType data
#' @name cars2
NULL
Do this to show this documentation with ?cars1, ?cars2, and ?carsdata.
#' @title some datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
#' @aliases cars1 cars2
NULL
Do this to show this documentation with ?carsdata.
#' @title three datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
NULL
Putting the object name in quotes, instead of NULL with @name, only works for data that are exported. That means LazyData: true and an object that is loaded by data(carsdata).
#' @title three datasets of horsepower for different cars
#'
#' @docType data
"cars2"
The next one fails since it is not carsdata that is exported; that is just the name of the data file.
#' @title three datasets of horsepower for different cars
#'
#' @docType data
"carsdata"
If/when you want to go into R packaging in more depth, see Hadley Wickham’s book R Packages. However, for simple packages you don’t need the book.