Skip to main content
AI in Production 2026 is now open for talk proposals.
Share insights that help teams build, scale, and maintain stronger AI systems.
items
Menu
  • About
    • Overview 
    • Join Us  
    • Community 
    • Contact 
  • Training
    • Overview 
    • Course Catalogue 
    • Public Courses 
  • Posit
    • Overview 
    • License Resale 
    • Managed Services 
    • Health Check 
  • Data Science
    • Overview 
    • Visualisation & Dashboards 
    • Open-source Data Science 
    • Data Science as a Service 
    • Gallery 
  • Engineering
    • Overview 
    • Cloud Solutions 
    • Enterprise Applications 
  • Our Work
    • Blog 
    • Case Studies 
    • R Package Validation 
    • diffify  

Mapping the Spread of COVID-19 with Python

Author: Aoife Curran

Published: March 26, 2020

tags: python, visualisation, coronavirus, covid-19, mapping, folium, pandas, geopandas, visualization, corona, virus, pystats

The purpose of this post isn’t to add new insight into the spread of the coronavirus - there are plenty of experts out there more qualified. Instead, our goal is to highlight how to construct simple, interactive visualisations using live data such as:

Do you use Professional Posit Products? If so, check out our managed Posit services

Getting the tools

Folium is a Python library that allows you to create different types of interactive Leaflet maps. Here, we are using the Novel Corona Virus 2019 Dataset to demonstrate how to make a choropleth (map) with a timeslider.

We will be using the following libraries:

  • branca.colormap, a utility module for dealing with colourmaps
  • folium, a tool to visualize Python data on an interactive Leaflet map
  • GeoPandas, an open source project to make working with geospatial data - in Python easier
  • NumPy, the fundamental package for scientific computing with Python
  • pandas, a fast, powerful, flexible and easy to use data analysis and manipulation tool
import branca.colormap as cm
import folium
import geopandas as gpd
import numpy as np
import pandas as pd

Preparing the data

I’ve stored all my data in a directory called “python”. Using the pandas library, we can read in the data via

corona_df = pd.read_csv("python/covid_19_data.csv")
SNoObservationDateProvince.StateCountry.RegionLast.UpdateConfirmed
101/22/2020AnhuiMainland China1/22/2020 17:001
201/22/2020BeijingMainland China1/22/2020 17:0014
301/22/2020ChongqingMainland China1/22/2020 17:006
401/22/2020FujianMainland China1/22/2020 17:001
501/22/2020GansuMainland China1/22/2020 17:000
601/22/2020GuangdongMainland China1/22/2020 17:0026

In order to map our data, we need a shapefile. A shapefile is a geospatial vector data format. The shapefile format can spatially describe vector features such as points, lines, and polygons. In our case, countries are represented as polygons. GeoPandas provides an inbuilt dataset of country shapes ('naturalearth_lowres'), however this is missing some of the smaller countries that we require, such as Andorra. You can download the shapefiles that we used from here. We can read this file using GeoPandas:

countries = gpd.read_file('python/Countries_WGS84.shp')

The countries object looks very similar to a standard pandas dataFrame. The only addition is a geometry column containing (you guessed it) spatial information for that row. We can see here for Aruba, we have a polygon made up of several pairs of longitude and latitude.

XOBJECTIDCNTRY_NAMEgeometry
01ArubaPOLYGON ((-69.8822326660156 12.4111099243165, -69.9469451904296 12.4366655349731, -70.0590362548828 12.5402078628541, -70.0596618652343 12.6277761459352, -70.0331954956055 12.6183319091797, -69.93223571777339 12.5280551910401, -69.89695739746089 12.4808330535889, -69.8914031982421 12.4722213745117, -69.88555908203131 12.4577770233155, -69.87486267089839 12.4152765274048, -69.8822326660156 12.4111099243165))

To be able to join the shapefile and the coronavirus data, we need to do edit some of the information in both files. Unfortunately, this is a rather manual process. First, we need to make sure that the country names match between the data and the shapefiles:

corona_df = corona_df.replace({'Country/Region' :
                      dict.fromkeys(['Taiwan',
                                     'Mainland China',
                                     'Hong Kong',
                                     'Macau'],
                                     'China')})
corona_df = corona_df.replace({'Country/Region' : 'US'},
                                'United States')
corona_df = corona_df.replace({'Country/Region' : 'UK'},
                                'United Kingdom')
corona_df = corona_df.replace({'Country/Region' : 'North Ireland'},
                                'United Kingdom')
corona_df = corona_df.replace({'Country/Region' : 'Republic of Ireland'},
                                'Ireland')
corona_df = corona_df.replace({'Country/Region' : 'Vatican City'},
                                'Italy')
countries = countries.replace({'CNTRY_NAME' : 'Byelarus'},
                               'Belarus')
countries = countries.replace({'CNTRY_NAME' : 'Macedonia'},
                               'North Macedonia')

We also need to make sure that the country column has the same name in both files:

countries = countries.rename(columns={'CNTRY_NAME': 'Country/Region'})

Some countries are included in the data despite having zero confirmed cases. So we remove these:

corona_df = corona_df[corona_df.Confirmed != 0]

We then sort our data by country and reset the index:

sorted_df = corona_df.sort_values(['Country/Region',
                     'ObservationDate']).reset_index(drop=True)

Some countries, such as China, are split into different provinces/states. Since we just want the total number of cases per country, we get the sum for each country at each date:

sum_df = sorted_df.groupby(['Country/Region', 'ObservationDate'], as_index=False).sum()

Now we can join the data and the shapefile together:

joined_df = sum_df.merge(countries, on='Country/Region')

We are going to plot the log of the number of confirmed cases for each country, as there are a couple of countries, such as China and Italy, with a lot more cases compared to other countries.

joined_df['log_Confirmed'] = np.log10(joined_df['Confirmed'])

We also need to convert the ObservationDate to unix time in nanoseconds:

joined_df['date_sec'] = pd.to_datetime(joined_df['ObservationDate']).astype(int) / 10**9
joined_df['date_sec'] = joined_df['date_sec'].astype(int).astype(str)

We can now select the columns needed for the map and discard the others:

joined_df = joined_df[['Country/Region', 'date_sec', 'log_Confirmed', 'geometry']]

Time to map

A choropleth is a type of map where regions are shaded or patterned proportionally to a data variable. We are going to make a choropleth with a timeslider, to show the spread of COVID-19 over time. The TimeSliderChoropleth class needs at least two arguments: a GeoJSON file containing the features (in this case, the countries) and a style dictionary. The style dictionary should have the following form:

styledict = {
    : {
        : {'color': , 'opacity': }
        : {'color': , 'opacity': }
        ...
        },
    ...,
    : {
        : {'color': , 'opacity': }
        : {'color': , 'opacity': }
        ...
        }
}

In this case, the keys are feature (country) ids. So for each country, we have a dictionary where the timestamps are the keys and the values are the colour and opacity of the country at that time.

We have to first initialise the map. Folium allows the use of different map tiles. If we do not specify a map, it defaults to OpenStreetMap. Here, we will use 'cartodbpositron':

mymap = folium.Map(tiles='cartodbpositron')
mymap.save(outfile='infinite_scroll.html')

Now we have a map of the world. However, there are a couple of problems: the continents are continually repeated and the map can be panned endlessly from either side. In order to prevent this from happening, we set a minimum zoom and set max_bounds=True:

mymap_fix_boundary = folium.Map(min_zoom=2, max_bounds=True, tiles='cartodbpositron')
mymap_fix_boundary.save(outfile='fix_boundary.html')

Much better. You might need to change the value of min_zoom depending on your platform. Now we define a colour map in terms of the log of the number of confirmed cases:

max_colour = max(joined_df['log_Confirmed'])
min_colour = min(joined_df['log_Confirmed'])
cmap = cm.linear.YlOrRd_09.scale(min_colour, max_colour)
joined_df['colour'] = joined_df['log_Confirmed'].map(cmap)

Next, we construct our style dictionary:

country_list = joined_df['Country/Region'].unique().tolist()
country_idx = range(len(country_list))

style_dict = {}
for i in country_idx:
    country = country_list[i]
    result = joined_df[joined_df['Country/Region'] == country]
    inner_dict = {}
    for _, r in result.iterrows():
        inner_dict[r['date_sec']] = {'color': r['colour'], 'opacity': 0.7}
    style_dict[str(i)] = inner_dict

Then we need to make a dataframe containing the features for each country:

countries_df = joined_df[['geometry']]
countries_gdf = gpd.GeoDataFrame(countries_df)
countries_gdf = countries_gdf.drop_duplicates().reset_index()

Finally, we create our map and add a colourbar:

from folium.plugins import TimeSliderChoropleth

slider_map = folium.Map(min_zoom=2, max_bounds=True,tiles='cartodbpositron')

_ = TimeSliderChoropleth(
    data=countries_gdf.to_json(),
    styledict=style_dict,

).add_to(slider_map)

_ = cmap.add_to(slider_map)
cmap.caption = "Log of number of confirmed cases"
slider_map.save(outfile='TimeSliderChoropleth.html')

By the time this blog post is published, this data will more than likely be out of date. For up to date information on COVID-19 in your area, the World Heath Orginisation and hub.arcgis.com are a great place to start. The Washington Post also produced a great article, using data simulation to show how social distancing is important in tackling COVID-19.


Jumping Rivers Logo

Recent Posts

  • Start 2026 Ahead of the Curve: Boost Your Career with Jumping Rivers Training 
  • Should I Use Figma Design for Dashboard Prototyping? 
  • Announcing AI in Production 2026: A New Conference for AI and ML Practitioners 
  • Elevate Your Skills and Boost Your Career – Free Jumping Rivers Webinar on 20th November! 
  • Get Involved in the Data Science Community at our Free Meetups 
  • Polars and Pandas - Working with the Data-Frame 
  • Highlights from Shiny in Production (2025) 
  • Elevate Your Data Skills with Jumping Rivers Training 
  • Creating a Python Package with Poetry for Beginners Part2 
  • What's new for Python in 2025? 

Top Tags

  • R (236) 
  • Rbloggers (182) 
  • Pybloggers (89) 
  • Python (89) 
  • Shiny (63) 
  • Events (26) 
  • Training (23) 
  • Machine Learning (22) 
  • Conferences (20) 
  • Tidyverse (17) 
  • Statistics (14) 
  • Packages (13) 

Authors

  • Amieroh Abrahams 
  • Tim Brock 
  • Theo Roe 
  • Colin Gillespie 
  • Aida Gjoka 
  • Shane Halloran 
  • Russ Hyde 
  • Gigi Kenneth 
  • Osheen MacOscar 
  • Sebastian Mellor 
  • Keith Newman 
  • Pedro Silva 
  • Myles Mitchell 

Keep Updated

Like data science? R? Python? Stan? Then you’ll love the Jumping Rivers newsletter. The perks of being part of the Jumping Rivers family are:

  • Be the first to know about our latest courses and conferences.
  • Get discounts on the latest courses.
  • Read news on the latest techniques with the Jumping Rivers blog.

We keep your data secure and will never share your details. By subscribing, you agree to our privacy policy.

Follow Us

  • GitHub
  • Bluesky
  • LinkedIn
  • YouTube
  • Eventbrite

Find Us

The Catalyst Newcastle Helix Newcastle, NE4 5TG
Get directions

Contact Us

  • hello@jumpingrivers.com
  • + 44(0) 191 432 4340

Newsletter

Sign up

Events

  • North East Data Scientists Meetup
  • Leeds Data Science Meetup
  • Shiny in Production
British Assessment Bureau, UKAS Certified logo for ISO 9001 - Quality management British Assessment Bureau, UKAS Certified logo for ISO 27001 - Information security management Cyber Essentials Certified Plus badge
  • Privacy Notice
  • |
  • Booking Terms

©2016 - present. Jumping Rivers Ltd