Selecting the correct image file type

Author: Colin Gillespie

Published: February 19, 2021

tags: r, knitr, graphics

This is part two of our four-part series:

  • Part 1: Specifying the correct figure dimension in {knitr}.
  • Part 2: What image format should you use for graphics (this post).
  • Part 3: Including external graphics in your document.
  • Part 4: Setting default {knitr} options.

There are (at least) three file formats to choose from: JPEG, PNG and SVG.

Attribute      JPEG           PNG      SVG
Type           Raster         Raster   Vector
Transparency   No             Yes      Yes
Animation      No             No       Yes
Lossy          Yes            No       No
Recommended    Occasionally   Yes      Often

If you are reading this via a syndication site, be sure to go to the original post for updated links.

JPEG Images

Because the JPEG compression algorithm significantly reduces file size, JPEG files are ubiquitous across the web. If you take a photo on your camera, it’s almost certainly stored as a JPEG. Historically the file extension was .jpg, as Microsoft Windows only handled three-character file extensions (compare .htm and .html). Today both extensions are used (personally I prefer .jpeg, but I’m not very consistent if I’m totally honest).

If you did a little Googling on which file format to use for images, the answer you would come across is that JPEGs are the default choice. But remember, figures are different from standard images! R figures have text, straight lines, lots of white space, and perhaps transparency.

However,

  1. JPEGs don’t support transparency.
  2. JPEG compression is based on the discrete cosine transform (DCT). Essentially, fine detail is approximated, so similar neighbouring pixels are merged. This averaging process means that the method is lossy, i.e. by storing the image as a JPEG, we lose information.

So in general don’t use JPEGs. The only exception might be when you have a “photograph-type” plot, such as a detailed contour or heatmap. You would only consider this, however, if the file size of the PNG was large.
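To check whether a “photograph-type” plot is one of those exceptions, you can save it in both formats and compare the file sizes. The following is a minimal sketch, not a benchmark: `save_heatmap()` is a hypothetical helper, the built-in `volcano` dataset stands in for your own plot, and the dimensions are arbitrary.

```r
# Hypothetical helper: draw the same heatmap with a given bitmap device.
save_heatmap = function(device, file) {
  device(file, width = 800, height = 600)
  on.exit(dev.off())
  image(volcano, col = hcl.colors(100))
  invisible(file)
}

# Only run if this build of R provides both devices
if (all(capabilities(c("png", "jpeg")))) {
  png_file = save_heatmap(png, tempfile(fileext = ".png"))
  jpeg_file = save_heatmap(jpeg, tempfile(fileext = ".jpeg"))
  # File sizes in bytes; the result depends on the plot and the JPEG quality
  sizes = c(png = file.size(png_file), jpeg = file.size(jpeg_file))
  print(sizes)
}
```

If the PNG is not much larger than the JPEG, stick with the PNG.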

Portable Network Graphics (PNGs)

A Portable Network Graphics file (PNG) is a raster file format that uses lossless compression. It was originally created as a replacement for the GIF, but unlike GIFs, PNG files don’t support animations.

While a PNG’s file size is usually a little larger than a JPEG’s, PNG is usually the better default option. For a nicer, smoother image, you should use type = "cairo-png" when creating a PNG.
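As a rough illustration, the type argument is passed straight to the png() device. The sketch below assumes an arbitrary plot, size, and resolution, and falls back to the default device type when this build of R lacks Cairo support.

```r
out_file = tempfile(fileext = ".png")

# Use the Cairo-based PNG device when it is available
if (capabilities("cairo")) {
  png(out_file, width = 800, height = 600, res = 120, type = "cairo-png")
} else {
  png(out_file, width = 800, height = 600, res = 120)
}
plot(runif(100), runif(100), xlab = "x", ylab = "y")
dev.off()
```

In a {knitr} document, the same argument can be supplied through the dev.args chunk option, e.g. dev = "png", dev.args = list(type = "cairo-png").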

The size of a PNG can often be reduced using the optipng utility. For graphs, a reduction of around 50% isn’t unusual. Running optipng takes around one to two seconds per image, so it isn’t really suitable for dynamic (e.g. Shiny-based) applications. However, for images that will never change, e.g. those in blog posts, an extra second or two is not an issue.

If you’ve already created a bunch of images, then a simple R script can easily optimise all files:

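# Find every PNG below the current directory
png_files = list.files(pattern = "\\.png$",
                       full.names = TRUE,
                       recursive = TRUE)
# Optimise each file in place; shQuote() protects paths containing spaces
for (png in png_files) system2("optipng", shQuote(png))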

The optipng utility has an -o option that controls how hard it tries to compress the PNG, e.g.

system2("optipng", args = c("-o1", png))

The argument -o selects the level of compression. The higher the number, the harder optipng tries to compress. To be honest, I’ve found leaving the compression level at the default to be more than sufficient. Life is too short to worry about the odd byte.
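If you want to see how much space is actually saved, the loop can be wrapped in a small function that records the file sizes before and after. This is only a sketch: `optimise_pngs()` is a hypothetical helper, not part of any package, and it assumes optipng is on your PATH (it returns an empty data frame otherwise).

```r
# Hypothetical helper: optimise every PNG under `dir` and report sizes in bytes.
optimise_pngs = function(dir = ".", level = NULL) {
  png_files = list.files(dir, pattern = "\\.png$",
                         full.names = TRUE, recursive = TRUE)
  if (length(png_files) == 0 || !nzchar(Sys.which("optipng"))) {
    # Nothing to do, or optipng is not installed
    return(data.frame(file = character(), before = numeric(), after = numeric()))
  }
  before = file.size(png_files)
  # -o<level> sets the optimisation level; omit it to use optipng's default
  level_arg = if (is.null(level)) character() else paste0("-o", level)
  for (png in png_files) {
    system2("optipng", args = c("-quiet", level_arg, shQuote(png)))
  }
  data.frame(file = png_files, before = before, after = file.size(png_files))
}
```

Calling optimise_pngs("static/") would then return one row per file (the directory name here is just an example), which makes it easy to check whether a higher level is worth the extra time.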

Scalable Vector Graphics (SVGs)

A Scalable Vector Graphics file (SVG) uses an XML-based format to precisely describe how the image should appear. Since the graph is described using text, an SVG can be scaled to different sizes without losing quality, i.e. we no longer worry about resolution.

This format is particularly appealing for figures, which are simply a combination of lines, text and shapes. The downside is that file sizes can get prohibitively large. For example, if you have a scatter plot with lots of points, each individual point will have its own entry in the SVG file.

To understand the trade-offs a bit more, let’s create multiple {ggplot2} scatter plots where we gradually increase the number of points:

library("ggplot2")
no_of_pts = c(1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
for (i in no_of_pts) {
  x = runif(i)
  y = runif(i)
  g = ggplot(data = data.frame(x = x, y = y)) +
    geom_point(aes(x, y))
  # Save graphic using SVG, PNG, etc
}

The first scatter plot (when i = 1) only contains a single point, whereas the final scatter plot (when i = 100000) is almost entirely black with points. For each scatter plot, we generated:

  • a PNG version
  • a PNG version, optimised using optipng
  • an SVG version.

Figure: Comparison of file sizes of different file formats.

The figure shows that the file size initially increases with the number of points. However, at around 10,000 pts, the PNG file size starts to decrease. This is because the points overlap so heavily that the plot becomes an almost solid black region, which compresses well. In contrast, the file sizes for SVGs increase in a predictably linear fashion. When plotting 200 pts, the file size is starting to get prohibitive. At around 5,000 pts, the SVG file is over 1MB.

As an aside, the JPEG file size is about the same as the PNG file size in this test.
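The “Save graphic” step in the loop above could be filled in along the following lines. This is a sketch under stated assumptions rather than the code used to produce the figure: `plot_size()` is a hypothetical helper, the base png() and svg() devices are assumed (svg() needs Cairo support), and the sequence of points is shortened so that it runs quickly.

```r
library("ggplot2")

# Hypothetical helper: draw `g` with a graphics device and
# return the resulting file size in bytes.
plot_size = function(g, device, ext, ...) {
  file = tempfile(fileext = ext)
  device(file, ...)
  print(g)
  dev.off()
  file.size(file)
}

no_of_pts = c(10, 100, 1000)  # shortened version of the sequence above
sizes = data.frame(n = no_of_pts, png = NA_real_, svg = NA_real_)
for (k in seq_along(no_of_pts)) {
  n = no_of_pts[k]
  g = ggplot(data.frame(x = runif(n), y = runif(n))) +
    geom_point(aes(x, y))
  sizes$png[k] = plot_size(g, png, ".png", width = 480, height = 480)  # pixels
  sizes$svg[k] = plot_size(g, svg, ".svg", width = 5, height = 5)      # inches
}
sizes
```

The exact numbers will depend on the device, the dimensions, and the random points, so treat the output as indicative only.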

Next Generation Formats

If you’ve ever used Google’s PageSpeed Insights, one of the recommendations is to serve images in next-gen formats, such as JPEG 2000, JPEG XR, and WebP. However, a little investigation suggests this is overkill for the vast majority of sites. As a significant number of browsers don’t yet support these formats, shaving off a few bytes doesn’t seem worth the effort. Obviously, if you have a top 100 site like Amazon or are serving lots of images on a page, then it may well be worth the hassle. For most sites using Hugo, shaving milliseconds off load time isn’t required.

Conclusion

Spending a little time optimising file size brings lots of benefits:

  • your web-pages load faster
  • your documents are smaller
  • your GitLab repo clones quicker and takes up less storage

Typically, I don’t care about optimising to the nearest byte, but a combination of choosing the correct dimensions and the correct image type gives you something close to optimal with little thought.

