Compartmentalized | Documented | Extendible | Reproducible | Robust |
This week I will discuss topics that are specific for R package developers. These are topics that can be tough to figure out when you are starting:
R CMD check
A good resource is Hadley Wickham’s book R Packages.
Read about all the details of NAMESPACE in the R package book section on NAMESPACEs. But that has a lot of complex cases. I suggest trying to keep things as simple as you can.
Package: MyNewPackage
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
Depends: R (>= 4.0.0), ggplot2
Imports: forecast
Suggests: knitr
License: What license is it under? (GPL-3 or CC0 for US Government)
Encoding: UTF-8
LazyData: true
Note: see the pkgdown-template for how to set up your license for a NOAA package.
@Depends
These packages will be loaded when your package is loaded. So if you have ggplot2 in @Depends, like above, then the user can automatically use ggplot2 functions without issuing the command library(ggplot2). Note that for public packages, it is polite to alter the user’s workspace as little as possible, so only put a package in @Depends if you really need to or if it wouldn’t make sense to load your package and not have another package available.
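If you have a package on @Depends then you must import that package in your NAMESPACE. Let’s say ggplot2 (and only that) is on your @Depends line.
Create a file import_packages.R in the R folder and add this code to it. The NULL is important.
#' @import ggplot2
NULL
After you rebuild the documentation with roxygen2, your NAMESPACE will contain this line (add it by hand if you do not use roxygen2):
import(ggplot2)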
In your package functions, you should use :: to access the functions from the packages on the @Depends line. Strictly speaking, you don’t have to, but I suggest you do. Here is why: a) If you ever move that package from @Depends to @Imports, you have much suffering in front of you as you search for all the functions and change them to use ::. b) Other people (or yourself) will know where every function comes from. c) Should you ever, perhaps inadvertently, create a function with the same name as one of the @Depends functions, you won’t run into a conflict.
@Imports
Packages on @Imports are required for the package functions, but the user will not have access to their functions without calling library(...). In your package, you must use :: to access the functions from the packages on the @Imports line. Most of your package dependencies will be here.
To limit the number of headaches that users face when trying to install your package, use as few packages on the @Depends and @Imports lines of your DESCRIPTION file as possible. If your package does not need the package to work, then put the package on @Suggests.
Many packages are loaded by base R but you still need to declare those in @Depends or @Imports. For example, lm() is in the stats package.
Tip: Routinely use ?function to figure out what package a function comes from and then add xyz:: to that function if you are writing a function for a package. Or add the package to @Depends and to your NAMESPACE.
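As a minimal sketch of the tip above, here is a hypothetical package function (fit_line is an invented name) that calls lm() and coef() with an explicit stats:: prefix. It assumes stats is listed on the @Imports line of DESCRIPTION.

```r
# Hypothetical package function: every non-base function is called with pkg::
fit_line <- function(df) {
  fit <- stats::lm(y ~ x, data = df)  # lm() lives in the stats package
  stats::coef(fit)                    # so does coef()
}

# Made-up data on the line y = 2x + 1
df <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
fit_line(df)
```

Because the prefix is spelled out, a search for stats:: finds every place the package is used.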
@Suggests
These packages are used in vignettes or examples. You (the developer) will need these installed when you check your package.
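If one of your functions can optionally use a package on @Suggests, a common pattern is to check for it with requireNamespace() and fall back gracefully. The sketch below is only an illustration; pretty_table is a hypothetical function name, and knitr stands in for whatever package you have on @Suggests.

```r
# Hypothetical helper: use a Suggests package only if it is installed
pretty_table <- function(df) {
  if (requireNamespace("knitr", quietly = TRUE)) {
    knitr::kable(df)   # nicer output when knitr is available
  } else {
    message("Install knitr for nicer tables; printing a plain data frame.")
    print(df)          # fallback that needs no extra package
  }
}

pretty_table(head(mtcars, 3))
```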
Do you need R on the @Depends line?
No, not unless you have a version dependency. There were some big changes in R 4.0 so you might need that dependency. If you are doing a public R package, search about for options for testing your package under different R versions.
First off, when you are starting, don’t worry too much about this. Just add packages that are needed as you work on your functions.
ALWAYS use :: to use functions from other packages in your package functions. Seriously. You will save yourself so many headaches down the road by being able to search for xyzpackage:: to find all that package’s functions. Why? Trust me, one day you will want to swap out packages or remove dependencies. Note, this can be a hassle with functions like ggplot() which use functions within their calls, so you have to use :: everywhere. Like so:
ggplot2::ggplot(df) +
ggplot2::geom_point(ggplot2::aes(gp, y))
Arg. Another example is say a GAM call:
mgcv::gam(a ~ mgcv::s(b), data=df)
But this is just for your package functions. In your scripts, you’d probably use a library() call.
Never ever use library() (or require()) in a function! Use xyzpackage::function. Sure, use library() in your scripts, but never in a package function. When you add a function from a new package to your function, add that package to @Depends or @Imports in your DESCRIPTION file as you go along.
Every so often check that you don’t have packages on @Depends and @Imports that you don’t use. Just do an Edit > Find in Files… search for xyzpackage:: to find out if you are still using xyzpackage.
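The same search can be scripted. This is a rough sketch, not a replacement for Find in Files; find_pkg_usage is a hypothetical helper that does a plain text search for xyzpackage:: in the .R files of a folder.

```r
# Hypothetical helper: list which R files still contain "pkg::"
find_pkg_usage <- function(pkg, dir = "R") {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  uses <- vapply(files, function(f) {
    # plain text match, so comments containing "pkg::" count too
    any(grepl(paste0(pkg, "::"), readLines(f, warn = FALSE), fixed = TRUE))
  }, logical(1))
  basename(files[uses])
}

# Try it on a throwaway folder with two small files
d <- tempfile()
dir.create(d)
writeLines("f <- function(x) stats::sd(x)", file.path(d, "a.R"))
writeLines("g <- function(x) x + 1", file.path(d, "b.R"))
find_pkg_usage("stats", d)
```

An empty result suggests the package can come off your @Imports line.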
How do you know if you forgot a dependency or forgot a :: somewhere? A few ways:
Run library(yourpackagename) in a fresh R session and try your functions. Things will fail if you have a package in @Imports but forgot a :: somewhere.
I have R packages that are mainly for my personal use. I use the package to make sure I have access to the various packages that I’ll be using. So for example, if I am doing work on my sardine papers, I have a set of packages that I use. When I issue the command library(SardineForecast) a bunch of packages are loaded. This makes it handy for me, but all those dependencies make it a huge hassle to install the package from GitHub for my collaborators (and even a hassle for me to install from GitHub). Huge Hassle. Invariably one of the 15 packages that I need will itself have a dependency that won’t load and then I have to debug that. If I need collaborators, who are on different operating systems and various versions of R, to install it, it’s a suffer-fest.
For my MARSS package, I have only 3 non-base dependencies on the @Imports line and nothing on the @Depends line besides R. Then on the @Suggests line, I have a bunch of packages that are used in the vignettes. MARSS is easy to install from GitHub (though it is also hosted on CRAN).
Checking your package is also called R CMD check, but if you are using RStudio, you can use Build tab > More… > Check.
R CMD check has many errors that can be hard to decipher. I am going to go through the common hard-to-decipher ones.
You will get weird messages about undeclared global variables (“no visible binding for global variable”) if you use ggplot2 and dplyr functions.
This will be flagged by check:
ggplot(df, aes(x=x, y=y))
You need to explicitly say that x and y come from the data argument (df). Technically, the ggplot() arguments are ggplot(data=df, mapping=aes()).
ggplot2::ggplot(df, ggplot2::aes(x=.data$x, y=.data$y))
A lot of the dplyr functions must also be specified like this. This will also be flagged by check:
dplyr::select(df, x)
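You need to do this. (Recent versions of tidyselect deprecate .data inside select(); there, quoting the column name as in dplyr::select(df, "x") also avoids the message.)
dplyr::select(df, .data$x)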
All the dplyr verbs will trigger this check message: arrange(), filter(), mutate(), etc.
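The .data pronoun itself also has to be imported, otherwise check will complain about .data instead. Here is a minimal sketch, assuming you use roxygen2 and have rlang on your @Imports line; plot_xy is a hypothetical function name.

```r
#' Scatter plot of two columns (hypothetical example function)
#'
#' @param df A data frame with columns x and y.
#' @importFrom rlang .data
#' @export
plot_xy <- function(df) {
  # .data$x tells check that x is a column of df, not a global variable
  ggplot2::ggplot(df, ggplot2::aes(x = .data$x, y = .data$y)) +
    ggplot2::geom_point()
}
```

The @importFrom line only needs to appear once in the package; many people put it in the same file as their other package-level imports.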
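%>% pipes
%>% is actually a function and you need to import it from the magrittr package.
Create a file import_packages.R in the R folder and add this code to it. The NULL is important.
#' @importFrom magrittr %>%
NULL
After you rebuild the documentation with roxygen2, your NAMESPACE will contain this line (add it by hand if you do not use roxygen2):
importFrom(magrittr,"%>%")
Note, if you need a minimum version of magrittr, declare it like this in your DESCRIPTION file. R is picky about the space in front of the version number.
Depends: magrittr (>= 2.0)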
R 4.1 added a native pipe, |>. It’s a little different from the magrittr pipe. If you use that in your package, you’ll need to add a dependency on R 4.1:
Depends: R (>= 4.1)
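A small sketch of the native pipe for comparison. It needs no import from magrittr, but it will not even parse on R versions before 4.1, which is why the version dependency matters. The numbers are made up for illustration.

```r
# Native pipe (requires R >= 4.1); no magrittr import needed
x <- c(1, 4, 9)
total <- x |> sqrt() |> sum()   # same as sum(sqrt(x))
total
```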
:: somewhere
Use \dontrun{} to make code that won’t run. Horribly, it can be really hard to actually not run this code, so make sure the code is correct. If you are showing bad code, then you’ll need to comment it out.
Use \donttest{} for example code that you do not want run during check. It is hard to get this respected when you run check. Setting the system env flag should force R CMD check to respect \donttest{}:
Sys.setenv("_R_CHECK_DONTTEST_EXAMPLES_" = "false")
But RStudio’s ‘check’ via the Build tab uses devtools::check() and that doesn’t respect that flag. It hard codes in --run-donttest. So go to Tools > Project Options > Build Tools and uncheck the little box that says ‘Use devtools functions if available’. Then try clicking Check from the Build tab. R package developers have been complaining about this a lot. It is new to R 4.0.
For getting R CMD check to pass in a GitHub Action when you have \donttest{} in examples, see the comments below on GitHub Actions.
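For reference, this is roughly where those two macros go in a roxygen2 examples section. It is a sketch only: slow_fit and read_from_my_server are hypothetical names invented for the illustration.

```r
#' Average of simulated values (hypothetical example function)
#'
#' @param n Number of observations to simulate.
#' @examples
#' slow_fit(10)
#' \donttest{
#' # Correct code that takes too long to run on every check
#' slow_fit(1e6)
#' }
#' \dontrun{
#' # Code that should never be run automatically
#' slow_fit(n = read_from_my_server())
#' }
#' @export
slow_fit <- function(n) {
  x <- stats::rnorm(n)
  mean(x)
}
```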
S3 classes and methods are easy to make.
So let’s say your package fits a model via lm()
and returns that fit. Then just assign an additional class to the fit:
fit <- lm(a ~ b, data=df)
class(fit) <- c("foo", class(fit))
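print.foo <- function(x, ...){
  cat("Hello, this is a foo object!\n")
  cat(paste("Your r2 is", summary(x)$adj.r.squared), "\n")
  invisible(x)
}
Then register the method in your NAMESPACE (roxygen2 does this for you when you add @export above the method):
S3method(print, foo)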
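One nice consequence of putting the new class in front of the existing ones is that every lm method you have not overridden still works. A minimal sketch, where fit_foo is a hypothetical constructor name and mtcars is just a convenient built-in data set:

```r
# Hypothetical constructor: fit with lm() and tag the result with class "foo"
fit_foo <- function(formula, data) {
  fit <- stats::lm(formula, data = data)
  class(fit) <- c("foo", class(fit))  # "foo" first so its methods win
  fit
}

fit <- fit_foo(mpg ~ wt, data = mtcars)
class(fit)            # "foo" "lm"
inherits(fit, "lm")   # TRUE
coef(fit)             # no coef.foo exists, so the lm behavior is used
```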
This is a personal list of some simple debugging tools. RStudio has debugging tools too but I don’t know those.
debug()
debug(function)
undebug(function)
Allows you to go line by line through the function and interact at the command line. Use the little icons above the console window to step out of for loops.
browser()
Put in your code where you want to enter the function.
options(error=recover)
Type this on the command line. Puts you into browser() at the point of the error (instead of a specific spot).
traceback()
Tells you where your code stopped. Note RStudio will show this also. Check your Project Options under Tools if you don’t see Traceback on errors.
system.time( functionname() )
How long does your function take?
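A small sketch of timing a call; slow_sum is a made-up function for the illustration. Note that you time the call, not the bare function name, and that the timings will differ from machine to machine.

```r
# Hypothetical slow function to time
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# The assignment inside system.time() keeps the result as well as the timing
timing <- system.time(result <- slow_sum(1e5))
timing["elapsed"]   # wall-clock seconds
```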
Rprof() and summaryRprof()
Profile your code to find out what are the time hogs.
a <- matrix(0, 10, 100)
Rprof(tmp <- tempfile()) # start profiling, writing to a temp file
for(i in 1:10000){ b <- t(a) %*% a }
Rprof(NULL) # stop profiling
summaryRprof(tmp)$by.self
## self.time self.pct total.time total.pct
## "%*%" 0.62 96.88 0.62 96.88
## "t.default" 0.02 3.12 0.02 3.12
Check out the profvis package for profiling your code. I haven’t used it but others have said it’s a great tool.
microbenchmark is a handy package for comparing speeds of code.
library(microbenchmark)
a <- 2
res <- microbenchmark(2 + 2, 2 + a, sum(2, a), sum(2, 2))
ggplot2::autoplot(res)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 11 rows containing non-finite values (stat_ydensity).
This shows an example of code using piping (%>%) versus without. This is why I do not use piping in my simulations. It is slow, though it has gotten much faster in magrittr 2.0.
library(magrittr)
x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)
fun1 <- function(x){
  x %>% log() %>%
    diff() %>%
    exp() %>%
    round(1)
}
fun2 <- function(x){ round(exp(diff(log(x))), 1) }
res <- microbenchmark::microbenchmark(fun1(x), fun2(x))
ggplot2::autoplot(res)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
You should stick with a uniform style guide to make your code easier to follow. I use the tidyverse style guide with the styler R package. styler has an RStudio Addin which does all the work of styling my code for me. Install the package, restart RStudio, and then go to Tools > Addins > Browse Addins. Scroll down to styler and select the file(s) you want to style.
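Adding this line to your DESCRIPTION file can really speed up your code. This is one of the advantages of putting your functions in a package. It can actually make your functions faster. Note that since R 3.5.0 packages are byte-compiled on installation by default, so the line mainly matters for older R versions.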
ByteCompile: TRUE
The code you will use to install from GitHub is:
library(devtools)
install_github("youraccount/MyNewPackage")
For example to install the package on ‘RVerse-Tutorials’, you would use
install_github("RVerse-Tutorials/TestPackage")
Also look into remotes. I see that used now instead of devtools for this.
To install the latest release rather than the main branch use @*release at the end.
install_github("RVerse-Tutorials/TestPackage@*release")
If you are on a Windows machine and get an error saying ‘loading failed for i386’ or similar, then try
options(devtools.install.args = "--no-multiarch")
If R asks you to update packages, and then proceeds to fail at installation because of a warning that a package was built under a later R version than you have on your computer, use
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS=TRUE)
If R asks you to update packages, you don’t need to update (normally). If you do update, and it asks if you want to install from source, you can probably say ‘No’. It is very unlikely that the package you are trying to install needs the most updated version of a package. If that were the case, the package writer could have explicitly stated a version dependency, like forecast (>= 2.0).
If R simply won’t install a package from GitHub (or even CRAN) because of a package dependency problem, with an error like ‘can't install because couldn't remove old installation’, then click on the Packages tab (lower right panel) and click Install. Look at where R is installing packages. There might be more than one place. Close all your RStudio windows (exit altogether) and go to those locations and delete the library folder(s) for the offending package. Then open RStudio and re-install that package.
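You can also find those locations from the console. A quick sketch; "stats" below is only a placeholder for the name of the offending package.

```r
# Where does R look for (and install) packages? There may be several folders.
.libPaths()

# Which folder holds a given installed package? "stats" is just an example.
find.package("stats")
```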
To limit the number of headaches that users face when trying to install your package from GitHub/Lab, use as few packages on the @Depends and @Imports lines of your DESCRIPTION file as possible. If your package does not need the package to work, then put the package on @Suggests.
GitHub Actions helps you automate tasks when you push (say) changes to GitHub. The super common one is checking your package and getting that nifty Passing badge.
usethis::use_github_actions()
Note what it does because you might need to change things.
usethis::use_github_actions_badge()
Set up the R CMD check badge.
Look at examples:
An R CMD check workflow that doesn’t run \donttest{} in examples.
Browse the .github/workflows folder of a package repository and look at examples of workflows.
You can also use GitHub Actions in many more ways to help you automate workflows.
Read the compiled code section in the R Packages book.