Timing hash functions with the bench package

Author: Colin Gillespie

Published: May 21, 2019

tags: r, tidyverse, timing, digest

This blog post has two goals:

  • Investigate the {bench} package for timing R functions
  • Use {bench} to compare the different algorithms in the {digest} package

What is {digest}?

The {digest} package provides hash functions for summarising R objects. Standard hashes are available, such as md5, crc32, sha-1, and sha-256.

The key function in the package is digest(), which applies a cryptographic hash function to arbitrary R objects. By default, the object is internally serialised and then hashed with md5. For example,

library("digest")
digest(1234)
## [1] "37c3db57937cc950b924e1dccb76f051"
digest(1234, algo = "sha256")
## [1] "01b3680722a3f3a3094c9845956b6b8eba07f0b938e6a0238ed62b8b4065b538"

The {digest} package is fairly popular and has a large number of reverse dependencies

length(tools::package_dependencies("digest", reverse = TRUE)$digest)
## [1] 186

The number of available hashing algorithms has grown over the years, and as a little side project, we decided to test the speed of the various algorithms. To be clear, I’m not considering any security aspects or the potential for hash collisions, just pure speed.


Timing in R

There are numerous ways of timing R functions. A recent addition to this list is the {bench} package. The main function bench::mark() has a number of useful features over other timing functions.

To time and compare two functions, we load the relevant packages

library("bench")
library("digest")
library("tidyverse")

then we call the mark() function and compare the md5 and sha1 hashes

value = 1234
mark(check = FALSE,
     md5 = digest(value, algo = "md5"),
     sha1 = digest(value, algo = "sha1")) %>%
  select(expression,  median)
## # A tibble: 2 x 2
##   expression   median
##   <bch:expr> <bch:tm>
## 1 md5          50.2µs
## 2 sha1         50.9µs

The resulting tibble object contains all of the timing information. For simplicity, we’ve just selected the expression and median time.
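
As well as the timings, mark() records extra information such as memory allocations and garbage collections. A minimal sketch, assuming the mem_alloc and n_gc columns available in recent versions of {bench}:

# A sketch (not from the original post): pull out memory allocation and
# garbage-collection counts alongside the median time.
mark(check = FALSE,
     md5 = digest(value, algo = "md5"),
     sha1 = digest(value, algo = "sha1")) %>%
  select(expression, median, mem_alloc, n_gc)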

More advanced {bench}

Of course, it’s likely that you’ll want to compare more than two things. You can compare as many function calls as you want with mark(), as we’ll demonstrate in the following example. It’s also likely that you’ll want to compare these function calls against more than one value. For example, the {digest} package offers eight different algorithms, ranging from the standard md5 to the newer xxhash64. To compare times, we’ll generate n = 20 random character strings of length N = 10,000. This can all be wrapped up in a single press() call from the {bench} package:

N = 1e4
results = suppressMessages(
  bench::press(
    value = replicate(n = 20, paste0(sample(LETTERS, N, replace = TRUE), collapse = "")),
    {
      bench::mark(
        iterations = 1, check = FALSE,
        md5      = digest(value, algo = "md5"),
        sha1     = digest(value, algo = "sha1"),
        crc32    = digest(value, algo = "crc32"),
        sha256   = digest(value, algo = "sha256"),
        sha512   = digest(value, algo = "sha512"),
        xxhash32 = digest(value, algo = "xxhash32"),
        xxhash64 = digest(value, algo = "xxhash64"),
        murmur32 = digest(value, algo = "murmur32")
      )
    }
  )
)

The results tibble contains the raw timings, but it’s easier to work with relative timings, so we’ll rescale:

rel_timings = results %>%
  unnest() %>%
  select(expression, median) %>%
  mutate(expression = names(expression)) %>%
  distinct() %>%
  mutate(median_rel = unclass(median/min(median)))

Then plot the results, ordered by their median relative timing:

ggplot(rel_timings) +
  geom_boxplot(aes(x = fct_reorder(expression, median_rel), y = median_rel)) +
  theme_minimal() +
  ylab("Relative timings") + xlab(NULL) +
  ggtitle("N = 10,000") + coord_flip()

The sha256 algorithm is around two to three times slower than the xxhash32 method. However, it’s worth bearing in mind that although it’s relatively slower, the absolute times are very small:

rel_timings %>%
  group_by(expression) %>%
  summarise(median = median(median)) %>%
  arrange(desc(median))
## # A tibble: 8 x 2
##   expression   median
##   <chr>      <bch:tm>
## 1 sha256      171.2µs
## 2 sha1        112.6µs
## 3 md5         109.5µs
## 4 sha512      108.4µs
## 5 crc32          91µs
## 6 xxhash64     85.1µs
## 7 murmur32     82.1µs
## 8 xxhash32     77.6µs

It’s also worth seeing how the results vary according to the size of the character string N.
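
A minimal sketch of how this can be done with press(), treating N as a parameter (the lengths below are illustrative, not necessarily those used for the original figure):

# A sketch (not the original code): make the string length N a press()
# parameter, hashing one random string of each length with each algorithm.
results_n = bench::press(
  N = c(1e3, 1e4, 1e5),
  {
    value = paste0(sample(LETTERS, N, replace = TRUE), collapse = "")
    bench::mark(
      iterations = 1, check = FALSE,
      md5      = digest(value, algo = "md5"),
      sha256   = digest(value, algo = "sha256"),
      xxhash32 = digest(value, algo = "xxhash32")
    )
  }
)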

Regardless of the value of N, the sha256 algorithm is consistently among the slowest.

Conclusion

R is going the way of “tidy” data. Though it wasn’t the focus of this blog post, I think that the {bench} package is as good as other timing packages out there. Not only that, but it fits in with the whole “tidy” data thing. Two birds, one stone.

