Inserting images into articles is something I have been battling with since my university days, when getting it right with LaTeX was the ultimate challenge.
These days things have gotten much better, thanks to formats like Markdown or reStructuredText. These make it quite straightforward to include a JPG or PNG image in a document for the web (and can be used to produce PDF documents as well).
But these are still just static images. Instead, interactive images — where one can zoom, pan, and select different parts of the graph — allow the user to explore and better understand the visualization and the data. This goes up a notch from static plots and it is quite useful when the amount of information displayed is considerable.
Until a few years ago, interactive visualizations were the preserve of a few JavaScript gurus and existed only on the web. Luckily, several Python libraries (e.g. Bokeh, Plotly) have since come out that produce interactive graphs for, e.g., Jupyter notebooks or dashboards.
When it comes to interactive visualizations, people often go the dashboard route, i.e. a bespoke web page or application created for the purpose of visualizing a specific set of data. But what if we just wanted to include a more user-friendly graph in a regular blog post? That is a much simpler task than programming a full web application or dashboard, and it comes down to the ability to export graphs in HTML format.
Plotly is a JavaScript-based declarative data-visualization library. The library has APIs in many popular languages, most notably R and Python. It can be used to build powerful data visualizations and dashboards. Graphs created with Plotly are fully interactive and can be embedded in a standalone static web page. Let’s see how to do it using the Python API.
The library comes with some handy demo data that we can use. In this case, we’ll use the Gapminder dataset.
Plotly 4.0 offers a renewed interface through the plotly.express submodule. A function call creates a Pandas DataFrame, gapminder, which can be fed directly into the plotting function px.scatter(), which in turn creates a figure object. Using a subset of the original dataframe (year==2007), we specify which columns should be used on which axes/dimensions of the graph (x, y, size, color). Lastly, we specify the size of the plot in pixels.
from plotly import express as px

gapminder = px.data.gapminder()
fig = px.scatter(
    gapminder.query("year==2007"),
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
    height=400,
    width=650,
)
The last step is displaying the figure object. This can be done directly on screen, or the figure can be exported to a variety of other formats for further consumption using a set of bespoke methods. In this case, the .to_html() method will do the job.
The HTML output can be written to a file:
# Write the full HTML page, loading plotly.js from the CDN
with open('plotly_graph.html', 'w') as f:
    f.write(fig.to_html(include_plotlyjs='cdn'))
The include_plotlyjs='cdn' keyword argument includes the plotly.js library as a link to the official CDN. The figure can be made viewable offline by setting include_plotlyjs=True, but that adds about 3 MB of extra code to the HTML document.
The document we just produced is a standalone HTML page, viewable in most modern browsers. However, it contains only the graph. A small extra step is needed to include the graph in another page, e.g. this blog post.
The HTML document we just produced should look familiar: it is composed of a head and a body, enclosed within their respective <head> and <body> tags. We are interested in the content of the body. The body should contain a single <div>, which holds everything that is needed to visualize the graph. We can just take that div and copy it to the point in the page where we want to display the graph. Then we should see the figure, displayed exactly as it was created within plotly.py.
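The copy-and-paste step can also be scripted. The following is a minimal sketch, using only the Python standard library, that pulls out whatever sits between the <body> tags of an exported page. The helper name extract_body and the sample string are made up for illustration, and the regular expression assumes a simple, well-formed page like the one written above.

```python
import re


def extract_body(html: str) -> str:
    """Return the markup between <body ...> and the last </body> (hypothetical helper)."""
    match = re.search(r"<body[^>]*>(.*)</body>", html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        raise ValueError("no <body> element found")
    return match.group(1).strip()


# Stand-in for the exported page; the real one holds a much longer <div>.
sample_page = (
    "<html>\n<head><meta charset=\"utf-8\" /></head>\n"
    "<body>\n    <div><script>/* plot code */</script></div>\n</body>\n</html>"
)

snippet = extract_body(sample_page)
print(snippet)
```

With the real file, one would read plotly_graph.html into a string, pass it to the helper, and paste the result into the post. Note that .to_html() also accepts full_html=False, which returns only the <div> and makes this extraction unnecessary.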
Voilà: