R Package Quality: Documentation

This is part three of a five-part series of related posts on validating R packages. Other posts in the series are:
- Validation Guidelines
- Package Popularity
- Package Documentation (this post)
- Code Quality
- Maintenance
In this post, we’ll take a closer look at package documentation and how it helps assess the “risky-ness” of a package. The documentation score evaluates how complete and helpful a package’s documentation is. Package documentation comes in many guises: it could be function examples, vignettes or even a website. We don’t believe every package must have a website, vignettes and examples, but the absence of all three usually points to weak documentation.
When validating R packages, documentation contributes around 15% to the total score.
Score 1: Exported Objects Documentation
A score based on the proportion of exported objects that have documentation. For example, if we have ten functions, but only eight are documented, then the score would be 0.8. For all packages on CRAN, this is almost certainly 1, but for packages that are only available on GitHub, this may be less.
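As a rough illustration of the arithmetic, here is a minimal sketch in base R. The function name and the two input vectors are hypothetical; for an installed package, the exported names could come from something like getNamespaceExports(), but how litmus collects them isn't covered here.

```r
# Proportion of exported objects that have documentation.
# `exported` and `documented` are character vectors of object names;
# both are hypothetical inputs for this sketch.
documented_proportion <- function(exported, documented) {
  if (length(exported) == 0) {
    return(NA_real_)  # nothing exported, so the score is undefined
  }
  mean(exported %in% documented)
}

# Ten exported functions, of which eight are documented
exported <- paste0("fn", 1:10)
documented <- paste0("fn", 1:8)
documented_proportion(exported, documented)  # 0.8
```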
Score 2: Proportion of Help Pages with Examples
A score based on the proportion of help pages that have examples.
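One way this could be approximated is sketched below. It assumes the help pages are available as parsed Rd objects (for an installed package, tools::Rd_db() returns such a list); the helper names are our own, and this isn't necessarily how the score is actually computed.

```r
# TRUE if a parsed Rd object contains an \examples section.
# Each top-level element of an Rd object carries an "Rd_tag" attribute.
has_examples <- function(rd) {
  tags <- unlist(lapply(rd, attr, which = "Rd_tag"))
  "\\examples" %in% tags
}

# Proportion of help pages in `rd_list` that have examples.
examples_proportion <- function(rd_list) {
  if (length(rd_list) == 0) {
    return(NA_real_)  # no help pages, so the score is undefined
  }
  mean(vapply(rd_list, has_examples, logical(1)))
}
```

For an installed package, calling examples_proportion(tools::Rd_db("stats")) would give the proportion for {stats}.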
Score 3: NEWS file
A NEWS file is an indication of a development and release cycle.
It helps users understand what has changed between versions.
This score detects the presence of a NEWS file.
Of course, R packages make this interesting with NEWS, NEWS.md, inst/NEWS.md and/or Changelogs!
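A simple presence check over a package source directory might look like the sketch below. The list of candidate file names is an assumption based on the locations mentioned above, and the function name is hypothetical.

```r
# Candidate locations for a NEWS file or changelog, relative to the package
# root. This list is an assumption for illustration, not an exhaustive one.
news_candidates <- c("NEWS", "NEWS.md", "inst/NEWS.md", "ChangeLog")

# TRUE if any of the candidate files exists under `pkg_dir`.
has_news <- function(pkg_dir, candidates = news_candidates) {
  any(file.exists(file.path(pkg_dir, candidates)))
}
```

Note that file name matching here follows the file system, so the result for something like news.md can differ between case-sensitive and case-insensitive systems.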
Score 4: Vignettes
Around 40% of CRAN packages have a single vignette, with only 10% having more than one vignette - we checked! For simplicity, this score is a simple binary metric, based on whether a package has any vignettes.
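For a package source directory, the binary metric could be sketched as follows. We assume vignettes live in a vignettes/ folder with one of a few common extensions; both the function name and the extension list are illustrative choices.

```r
# Returns 1 if the package source has at least one vignette file, else 0.
# Assumes vignettes sit in `vignettes/` with an Rmd, Rnw or qmd extension.
has_vignettes <- function(pkg_dir) {
  files <- list.files(
    file.path(pkg_dir, "vignettes"),
    pattern = "\\.(Rmd|Rnw|qmd)$",
    ignore.case = TRUE
  )
  as.numeric(length(files) > 0)
}
```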
Score 5: Package Website
Does a package have an associated website? Ten or fifteen years ago, package websites were rare. Today, GitHub and GitLab make it easy for a package to host a website.
Score 6: NEWS updated to the Current version
This score checks whether the NEWS file mentions the package’s current version. If the NEWS file is outdated or missing, it is challenging to track recent changes, bug fixes, or updates. This lack of transparency may pose risks, as users are unable to verify whether critical updates have been implemented.
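A rough way to test whether a NEWS file mentions the current version is sketched below. It assumes a source directory containing DESCRIPTION and NEWS.md, and uses plain substring matching; the function name and the default file name are our own choices.

```r
# TRUE if the NEWS file mentions the version listed in DESCRIPTION.
# Plain substring matching is a simplification: version "1.2.1" would
# also match a NEWS entry for "1.2.10".
news_is_current <- function(pkg_dir, news_file = "NEWS.md") {
  news_path <- file.path(pkg_dir, news_file)
  desc_path <- file.path(pkg_dir, "DESCRIPTION")
  if (!file.exists(news_path) || !file.exists(desc_path)) {
    return(FALSE)  # a missing NEWS file cannot be current
  }
  version <- read.dcf(desc_path, fields = "Version")[1, 1]
  if (is.na(version)) {
    return(FALSE)  # no Version field to compare against
  }
  any(grepl(version, readLines(news_path, warn = FALSE), fixed = TRUE))
}
```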
Summary
We can all agree that a package doesn’t need all of the components described above. It’s perfectly reasonable to have few examples, but very detailed vignettes. The important point is to investigate packages that have little documentation.
Examples
Using the packages from the previous blog post, and omitting scores where all packages scored 1, we have the following results:
Package | News Current | Vignettes | Examples |
---|---|---|---|
drat | 1.00 | 1.00 | 0.43 |
microbenchmark | 0.00 | 0.00 | 0.20 |
shinyjs | 1.00 | 1.00 | 0.90 |
tibble | 1.00 | 1.00 | 0.61 |
tsibble | 0.00 | 1.00 | 0.82 |
All packages use source control, have a package website and provide documentation.
The {microbenchmark} package doesn’t have a NEWS file or changelog. Similarly, it’s missing vignettes.
But recall that it still has a high overall package score.
The idea behind litmus isn’t that a package must be perfect, but to take a pragmatic approach to scoring.
Oddly, the {tsibble} package does have a NEWS file, but it doesn’t mention the latest version. I think this was an oversight.
