Mapping the Spread of COVID-19 with Python

Published: March 26, 2020

tags: python, visualisation, coronavirus, covid-19, mapping, folium, pandas, geopandas, visualization, corona, virus, pystats

The purpose of this post isn’t to add new insight into the spread of the coronavirus - there are plenty of experts out there more qualified. Instead, our goal is to highlight how to construct simple, interactive visualisations using live data such as:

Getting the tools

Folium is a Python library that allows you to create different types of interactive Leaflet maps. Here, we are using the Novel Corona Virus 2019 Dataset to demonstrate how to make a choropleth (map) with a timeslider.

We will be using the following libraries:

branca.colormap, a utility module for dealing with colourmaps
folium, a tool to visualize Python data on an interactive Leaflet map
GeoPandas, an open source project to make working with geospatial data - in Python easier
NumPy, the fundamental package for scientific computing with Python
pandas, a fast, powerful, flexible and easy to use data analysis and manipulation tool

import branca.colormap as cm
import folium
import geopandas as gpd
import numpy as np
import pandas as pd

Preparing the data

I’ve stored all my data in a directory called “python”. Using the pandas library, we can read in the data via

corona_df = pd.read_csv("python/covid_19_data.csv")

SNo	ObservationDate	Province.State	Country.Region	Last.Update	Confirmed
1	01/22/2020	Anhui	Mainland China	1/22/2020 17:00	1
2	01/22/2020	Beijing	Mainland China	1/22/2020 17:00	14
3	01/22/2020	Chongqing	Mainland China	1/22/2020 17:00	6
4	01/22/2020	Fujian	Mainland China	1/22/2020 17:00	1
5	01/22/2020	Gansu	Mainland China	1/22/2020 17:00	0
6	01/22/2020	Guangdong	Mainland China	1/22/2020 17:00	26

In order to map our data, we need a shapefile. A shapefile is a geospatial vector data format. The shapefile format can spatially describe vector features such as points, lines, and polygons. In our case, countries are represented as polygons. GeoPandas provides an inbuilt dataset of country shapes ('naturalearth_lowres'), however this is missing some of the smaller countries that we require, such as Andorra. You can download the shapefiles that we used from here. We can read this file using GeoPandas:

countries = gpd.read_file('python/Countries_WGS84.shp')

The countries object looks very similar to a standard pandas dataFrame. The only addition is a geometry column containing (you guessed it) spatial information for that row. We can see here for Aruba, we have a polygon made up of several pairs of longitude and latitude.

X	OBJECTID	CNTRY_NAME	geometry
0	1	Aruba	POLYGON ((-69.8822326660156 12.4111099243165, -69.9469451904296 12.4366655349731, -70.0590362548828 12.5402078628541, -70.0596618652343 12.6277761459352, -70.0331954956055 12.6183319091797, -69.93223571777339 12.5280551910401, -69.89695739746089 12.4808330535889, -69.8914031982421 12.4722213745117, -69.88555908203131 12.4577770233155, -69.87486267089839 12.4152765274048, -69.8822326660156 12.4111099243165))

To be able to join the shapefile and the coronavirus data, we need to do edit some of the information in both files. Unfortunately, this is a rather manual process. First, we need to make sure that the country names match between the data and the shapefiles:

corona_df = corona_df.replace({'Country/Region' :
                      dict.fromkeys(['Taiwan',
                                     'Mainland China',
                                     'Hong Kong',
                                     'Macau'],
                                     'China')})
corona_df = corona_df.replace({'Country/Region' : 'US'},
                                'United States')
corona_df = corona_df.replace({'Country/Region' : 'UK'},
                                'United Kingdom')
corona_df = corona_df.replace({'Country/Region' : 'North Ireland'},
                                'United Kingdom')
corona_df = corona_df.replace({'Country/Region' : 'Republic of Ireland'},
                                'Ireland')
corona_df = corona_df.replace({'Country/Region' : 'Vatican City'},
                                'Italy')
countries = countries.replace({'CNTRY_NAME' : 'Byelarus'},
                               'Belarus')
countries = countries.replace({'CNTRY_NAME' : 'Macedonia'},
                               'North Macedonia')

We also need to make sure that the country column has the same name in both files:

countries = countries.rename(columns={'CNTRY_NAME': 'Country/Region'})

Some countries are included in the data despite having zero confirmed cases. So we remove these:

corona_df = corona_df[corona_df.Confirmed != 0]

We then sort our data by country and reset the index:

sorted_df = corona_df.sort_values(['Country/Region',
                     'ObservationDate']).reset_index(drop=True)

Some countries, such as China, are split into different provinces/states. Since we just want the total number of cases per country, we get the sum for each country at each date:

sum_df = sorted_df.groupby(['Country/Region', 'ObservationDate'], as_index=False).sum()

Now we can join the data and the shapefile together:

joined_df = sum_df.merge(countries, on='Country/Region')

We are going to plot the log of the number of confirmed cases for each country, as there are a couple of countries, such as China and Italy, with a lot more cases compared to other countries.

joined_df['log_Confirmed'] = np.log10(joined_df['Confirmed'])

We also need to convert the ObservationDate to unix time in nanoseconds:

joined_df['date_sec'] = pd.to_datetime(joined_df['ObservationDate']).astype(int) / 10**9
joined_df['date_sec'] = joined_df['date_sec'].astype(int).astype(str)

We can now select the columns needed for the map and discard the others:

joined_df = joined_df[['Country/Region', 'date_sec', 'log_Confirmed', 'geometry']]

Time to map

A choropleth is a type of map where regions are shaded or patterned proportionally to a data variable. We are going to make a choropleth with a timeslider, to show the spread of COVID-19 over time. The TimeSliderChoropleth class needs at least two arguments: a GeoJSON file containing the features (in this case, the countries) and a style dictionary. The style dictionary should have the following form:

styledict = {
    : {
        : {'color': , 'opacity': }
        : {'color': , 'opacity': }
        ...
        },
    ...,
    : {
        : {'color': , 'opacity': }
        : {'color': , 'opacity': }
        ...
        }
}

In this case, the keys are feature (country) ids. So for each country, we have a dictionary where the timestamps are the keys and the values are the colour and opacity of the country at that time.

We have to first initialise the map. Folium allows the use of different map tiles. If we do not specify a map, it defaults to OpenStreetMap. Here, we will use 'cartodbpositron':

mymap = folium.Map(tiles='cartodbpositron')
mymap.save(outfile='infinite_scroll.html')

Now we have a map of the world. However, there are a couple of problems: the continents are continually repeated and the map can be panned endlessly from either side. In order to prevent this from happening, we set a minimum zoom and set max_bounds=True:

mymap_fix_boundary = folium.Map(min_zoom=2, max_bounds=True, tiles='cartodbpositron')
mymap_fix_boundary.save(outfile='fix_boundary.html')

Much better. You might need to change the value of min_zoom depending on your platform. Now we define a colour map in terms of the log of the number of confirmed cases:

max_colour = max(joined_df['log_Confirmed'])
min_colour = min(joined_df['log_Confirmed'])
cmap = cm.linear.YlOrRd_09.scale(min_colour, max_colour)
joined_df['colour'] = joined_df['log_Confirmed'].map(cmap)

Next, we construct our style dictionary:

country_list = joined_df['Country/Region'].unique().tolist()
country_idx = range(len(country_list))

style_dict = {}
for i in country_idx:
    country = country_list[i]
    result = joined_df[joined_df['Country/Region'] == country]
    inner_dict = {}
    for _, r in result.iterrows():
        inner_dict[r['date_sec']] = {'color': r['colour'], 'opacity': 0.7}
    style_dict[str(i)] = inner_dict

Then we need to make a dataframe containing the features for each country:

countries_df = joined_df[['geometry']]
countries_gdf = gpd.GeoDataFrame(countries_df)
countries_gdf = countries_gdf.drop_duplicates().reset_index()

Finally, we create our map and add a colourbar:

from folium.plugins import TimeSliderChoropleth

slider_map = folium.Map(min_zoom=2, max_bounds=True,tiles='cartodbpositron')

_ = TimeSliderChoropleth(
    data=countries_gdf.to_json(),
    styledict=style_dict,

).add_to(slider_map)

_ = cmap.add_to(slider_map)
cmap.caption = "Log of number of confirmed cases"
slider_map.save(outfile='TimeSliderChoropleth.html')

By the time this blog post is published, this data will more than likely be out of date. For up to date information on COVID-19 in your area, the World Heath Orginisation and hub.arcgis.com are a great place to start. The Washington Post also produced a great article, using data simulation to show how social distancing is important in tackling COVID-19.

Mapping the Spread of COVID-19 with Python

Getting the tools

Preparing the data

Time to map

Recent Posts

Top Tags

Authors