R Package Quality: Package Popularity

Author: Colin Gillespie

Published: June 26, 2025

tags: r, litmus, validation, popularity, scoring

This is part two of a five-part series of related posts on validating R packages. Other posts in the series are:

  • Validation Guidelines
  • Package Popularity (this post)
  • Package Documentation
  • Code Quality
  • Maintenance

In our previous post, we introduced the four components that make up a litmus package score: documentation, popularity, code quality, and maintenance. In this post, we’ll look at package popularity. Package popularity is an interesting, and sometimes controversial, measure. In our experience it often sparks strong (and usually negative) reactions. The idea is simple: if a package is widely used, bugs are more likely to be found and fixed, and if the maintainer steps away, there’s a higher chance someone else will take over. Of course, high usage doesn’t mean a package is risk-free. But popularity can provide helpful context. Consider this example:

  • {pkgA}: Extremely popular and a dependency for many other packages.
  • {pkgB}: Very few downloads and minimal usage.

In a situation like this, {pkgA} may offer more stability over time, simply because more people rely on it. It does not mean that {pkgA} is risk-free, only that the risk is lower than for {pkgB}.

All other things being equal, if you had sixty minutes to assess both packages, would you spend thirty minutes on each, or weight your time towards the less popular package?

It’s important to keep in mind that statistical packages tend to be less popular than “foundational” ones. Packages for tasks like data wrangling, date-times, and plotting are used by nearly everyone, regardless of the use case. In contrast, more specialised packages, for example, those designed to handle experimental designs with drop-outs, naturally have a smaller audience.

So a lower popularity doesn’t necessarily reflect lower quality or usefulness. It may just reflect a more niche purpose.

Score 1: Yearly Downloads

For packages on CRAN, we can obtain download statistics. Of course, the obvious question is, “what is a large number of downloads?” To answer this question, we obtained the download statistics of every package on CRAN, and used that data as the basis of our score.
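For context, the raw download counts can be pulled programmatically. The snippet below is a minimal sketch using the {cranlogs} package; it illustrates the kind of data involved, and isn't necessarily how litmus gathers its statistics.

```r
# Minimal sketch using {cranlogs} (illustrative; not necessarily how
# litmus obtains its data): total downloads over the last year.
library(cranlogs)

yearly_downloads <- function(pkg) {
  dl <- cran_downloads(
    pkg,
    from = as.character(Sys.Date() - 365),
    to = as.character(Sys.Date())
  )
  sum(dl$count)
}

yearly_downloads("drat")
```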

More precisely, if a package is in the upper quartile for yearly downloads (approximately 7,000 downloads per year), the package scores 1. Otherwise, the score is taken from the empirical cumulative distribution function (CDF) of downloads across all CRAN packages.

Figure: Score for yearly downloads.
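The scoring rule can be sketched in a few lines of base R. The numbers below are made up, and the function is an illustration of the idea rather than the actual litmus implementation:

```r
# Sketch of the rule described above: packages in the top quartile of
# yearly downloads score 1; everything else scores its empirical CDF value.
score_downloads <- function(pkg_downloads, all_downloads) {
  upper_quartile <- quantile(all_downloads, 0.75)
  if (pkg_downloads >= upper_quartile) {
    1
  } else {
    ecdf(all_downloads)(pkg_downloads)
  }
}

# Made-up yearly download counts for illustration
all_downloads <- c(120, 540, 2300, 6800, 15000, 250000)
score_downloads(2300, all_downloads)   # below the upper quartile, so < 1
score_downloads(20000, all_downloads)  # in the top quartile, so 1
```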

Of course, you could choose a different time period, say a month, or look at a trend over time. However, our investigations suggest that adding further download-based scores yields very little new information, while increasing complexity.

Need help with R package validation to unleash the power of open source? Check out the Litmusverse suite of risk assessment tools.

Score 2: Reverse Dependencies

We also examine the number of reverse dependencies, that is, how many other packages rely on it. The reasoning is simple: if many packages depend on it, there’s a greater chance that bugs will be spotted and fixed. It also suggests that other developers have reviewed and trusted the package enough to build on top of it.

Similar to package downloads, we used all packages on CRAN as a basis for scoring. Packages in the top quartile for reverse dependencies receive a score of 1. All others are scored using the empirical cumulative distribution function (CDF). In practice, this ends up behaving like a near-binary score, since only a small number of packages have significant reverse dependencies.
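As an illustration, reverse dependencies can be counted with base R tooling via tools::package_dependencies(); the resulting count can then be fed through the same quartile/ECDF rule as downloads. The dependency types counted below (Depends, Imports, LinkingTo) are a plausible choice for the sketch, not a statement about what litmus counts internally.

```r
# Sketch: count reverse dependencies for a package using CRAN metadata.
db <- available.packages(repos = "https://cloud.r-project.org")

rev_deps <- tools::package_dependencies(
  "tibble",
  db = db,
  which = c("Depends", "Imports", "LinkingTo"),
  reverse = TRUE
)
# Number of packages that depend on {tibble}; the same top-quartile / ECDF
# rule as for downloads can then be applied to this count.
length(rev_deps[["tibble"]])
```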

Examples

We’ve selected five packages to illustrate these scores (the total litmus score is given in brackets):

  • {drat} (0.94): A fantastic little package that simplifies creating local R repositories.
  • {microbenchmark} (0.87): A useful utility package for (precisely) measuring function calls in R.
  • {shinyjs} (0.90): Dean Attali’s package for performing common, useful JavaScript operations in Shiny apps.
  • {tibble} (0.81): The cornerstone(?) of the tidyverse.
  • {tsibble} (0.80): Tibbles for time series.

All five packages, as we would expect, have a high overall litmus score; we didn’t want to pick on more risky packages!

For package popularity, which makes up 15% of the total litmus score, all five selected packages score the maximum of 1 for both downloads and reverse dependencies. Potentially, we could change the score to make it a more “continuous” measure. For example, the number of downloads for {tibble} is always greater than for {tsibble}, as the latter depends on the former. However, the purpose of assessing packages isn’t to provide a ranked list; it’s to identify packages that are potentially risky. So a more continuous measure isn’t that helpful.

Summary

We tend to think of package popularity as a way of crowdsourcing information about the package of interest. As we’ve mentioned, it’s only a signal, and as such it contributes only 15% of the overall litmus score.

