Lately, I’ve been using Python’s matplotlib plotting library to generate a lot of figures, such as the bar charts I showed in this talk.
To improve readability, I like to put a number label at the top of each bar that gives the quantity the bar represents. When I realized I wanted to add these labels to my charts, the first thing I did was look at this example from the matplotlib documentation, which seemed to be doing something a lot like what I wanted:
In the code that generates this figure, this little autolabel function is responsible for putting labels on the bars.
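As a rough sketch of what such a function looks like (not the verbatim listing from the matplotlib example), the version below labels each bar at 1.05 times its height. Taking ax as an explicit parameter is an assumption made here to keep the sketch self-contained; the matplotlib example appears to rely on an ax defined elsewhere in the script.

```python
def autolabel(rects, ax):
    """Write each bar's height as a text label just above the bar.

    Sketch only: `ax` is passed in explicitly here (an assumption).
    """
    for rect in rects:
        height = rect.get_height()
        ax.text(
            rect.get_x() + rect.get_width() / 2.0,  # horizontal centre of the bar
            1.05 * height,                          # label sits 5% above the bar top
            '%d' % int(height),
            ha='center', va='bottom',
        )
```

With matplotlib, this would be called as something like autolabel(ax.bar(x, values), ax), where x and values stand in for whatever data is being plotted.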

The autolabel function expects its rects argument to be a container that can be iterated over to get each of the bars of a bar plot. (Conveniently, the bar method returns such a container.)
autolabel was a good start for what I wanted to do, but unfortunately, it isn’t very robust. In the above figure, each column represents a number between 20 and 35.

But what if we try to use the same code with some different data?

Here’s what the plot looks like now:
Oh, dear. Now we’ve got ‘1300’, and to a lesser extent ‘1250’ and ‘1145’, just hanging out up there in space. Meanwhile, ‘15’ and ‘10’ are crowding the columns that they’re supposed to be above. How did that happen?
Looking again at autolabel, we see that it uses the expression 1.05*height to determine where to put the text label that goes with a given rectangle of height height. So, autolabel is placing each label at 1.05 times the rectangle’s height, which makes the gap from the top of the column to where the text appears 5% of that height.
If height varies more than a little from bar to bar, then taking 5% of height will produce gaps of awkwardly varying size. It’s only the fact that the bar heights in the original example vary only from 20 to 35 that stops it from looking terrible. In fact, now having realized that the gap sizes depend on the data, we can see it there, too: the gap between ‘20’ and the top of its column is noticeably smaller than the gap just below ‘35’. That’s no good.
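To make the arithmetic concrete, here is a small worked example using the bar heights mentioned above. The helper name multiplicative_gap is made up for illustration.

```python
def multiplicative_gap(height, factor=1.05):
    """Gap between the top of a bar and a label drawn at factor * height."""
    return factor * height - height


# Bar heights quoted in the text: two from the first chart, five from the second.
for height in (20, 35, 10, 15, 1145, 1250, 1300):
    print('bar height %5d -> gap %6.2f' % (height, multiplicative_gap(height)))
```

The gap grows from about 1 data unit for the bar of height 20 to about 65 for the bar of height 1300, even though both charts are drawn at the same size.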
First attempt at a fix: add, don’t multiply
One way to fix this would be to add a suitable number to the column height, instead of multiplying, and use the result to determine where to put the label text. That is, instead of writing 1.05*height, we can write height + 10, or something like that. Indeed, people who answer questions about such things on Stack Overflow have already arrived at this solution. Using height + 10 in our own code, we get:
Alas, this approach isn’t robust, either. This is what it looks like when we try to go back and plot our original data:
Oh, no! Now our gaps, although all the same size, are way too big. Most of the labels are actually off the chart. In order to get this right, we’d have to change height + 10 to something smaller that would look nice with this data, like height + 1:
That’s better. But having to pick a different number for every figure we plan to generate sounds about as fun as cleaning up my cat’s puke. Is there an approach to label placement that will work regardless of what the data looks like?
A more robust fix: scale according to the height of the axis
Why does adding a constant like 10 to height not work the way we want it to? The problem is that height is not in units of centimeters, or furlongs, or any unit of distance that would be consistent from one figure to the next; it’s in “axis points”, the same units as the actual data being plotted! For instance, the bar furthest to the left in the plot of our first set of data is 20 axis points tall, while the leftmost bar in the plot of our second set of data is 860 axis points tall. A gap of height 10 next to a column of height 20 is different from a gap of height 10 next to a column of height 860.
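A quick worked comparison, using the two leftmost bar heights just mentioned, shows how differently the same constant behaves. The function name relative_gap is hypothetical.

```python
def relative_gap(gap, bar_height):
    """Size of a fixed gap as a fraction of the bar it sits above."""
    return gap / bar_height


gap = 10  # the constant added to height, in data units
for bar_height in (20, 860):
    print('gap of %d above a bar of %d: %.1f%% of the bar height'
          % (gap, bar_height, 100 * relative_gap(gap, bar_height)))
```

The same 10 units amount to half the height of the first bar but only about 1% of the second.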
What we really want is to scale the height of the label gaps to whatever is reasonable for our figure. The trick to doing this is to look at the height (given in axis points) of the y-axis of the plot. For instance, with our first set of data, the range of the y-axis is [0, 40], so it has a height of 40 axis points, while in the second set, the y-axis range is [0, 1400], so, a height of 1400 axis points. If we can find out what the height of the y-axis is in axis points, we can have the label gaps be a fixed fraction of that height. We still have to decide what that fraction will be, but we only have to do that once, and then we’ll have proportionally-sized gaps in every figure we generate.
To do this, we can call matplotlib’s get_ylim method on an Axes object to get the y-axis range. In the case of our example code, we even already have an Axes object, called ax, which we can pass to autolabel. Then, we can find out the height of the y-axis by subtracting the bottom of its range from the top of its range, and finally, we can position the label above each bar at a height in proportion with the y-axis height.
Here’s what the revised code looks like, where I’ve chosen 0.01 as the number to multiply the axis height by.
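A minimal sketch of this approach follows, under the assumption that autolabel receives the Axes as a second argument and that the multiplier defaults to the 0.01 chosen above. The helper label_position and the parameter name fraction are illustrative names, not taken from the matplotlib example.

```python
def label_position(height, y_bottom, y_top, fraction=0.01):
    """Y coordinate for a label placed a fixed fraction of the
    y-axis height above a bar of the given height."""
    y_height = y_top - y_bottom  # height of the y-axis in data units
    return height + fraction * y_height


def autolabel(rects, ax, fraction=0.01):
    """Label each bar with its height, using a gap scaled to the y-axis."""
    y_bottom, y_top = ax.get_ylim()
    for rect in rects:
        height = rect.get_height()
        ax.text(
            rect.get_x() + rect.get_width() / 2.0,  # horizontal centre of the bar
            label_position(height, y_bottom, y_top, fraction),
            '%d' % int(height),
            ha='center', va='bottom',
        )
```

Since get_ylim reports the limits at the time of the call, autolabel should run after the bars are drawn and after any axis limits have been set.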

Choosing 0.01 will give us gaps of 0.4 axis points for our first set of data, and 14 axis points for our second set. Here’s what it looks like when we plot the first data set:
And the second:
Much better!
One more thing
There’s also one last refinement that I made for my own plotting. As we saw above, sometimes bar labels run over the top edge of the figure, and it can happen even if we’re using our axis-height-based approach. For example, if we change ‘1300’ to ‘1350’ in the data, the above plot turns into this:
Not so nice. But we can have autolabel handle this situation as well. For each bar, we can determine how much of the axis height it takes up. If the bar takes up almost all the height, say, 95% or more of it, we can choose to put the label inside the bar instead of above it. We just position the label at a certain distance below the top of the bar (again, proportional to the y-axis height), instead of above it. The exact percentage of the height we pick is a matter of what looks good, as is the y-axis height multiplier we use; in the code below, I picked 95% and 0.05 for these after some fiddling. But, again, you only have to set these once, and then they’ll work for every plot you do.
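A sketch of how this refinement might be written is shown here, assuming the 95% threshold and the 0.05 and 0.01 multipliers mentioned in the text. The names label_position, threshold, inside_fraction and above_fraction are hypothetical, and ax is again passed in explicitly.

```python
def label_position(height, y_bottom, y_top,
                   threshold=0.95, inside_fraction=0.05, above_fraction=0.01):
    """Return the y coordinate for a bar's label.

    Bars reaching at least `threshold` of the way up the y-axis get their
    label inside the bar; all other bars get it just above.
    """
    y_height = y_top - y_bottom
    # Fraction of the axis height the bar takes up (the bar is assumed to
    # start at or below the bottom of the axis).
    p_height = (height - y_bottom) / y_height
    if p_height >= threshold:
        return height - inside_fraction * y_height
    return height + above_fraction * y_height


def autolabel(rects, ax):
    """Label each bar, moving the label inside bars that nearly fill the axis."""
    y_bottom, y_top = ax.get_ylim()
    for rect in rects:
        height = rect.get_height()
        ax.text(
            rect.get_x() + rect.get_width() / 2.0,
            label_position(height, y_bottom, y_top),
            '%d' % int(height),
            ha='center', va='bottom',
        )
```

Assuming the y-axis still spans 0 to 1400, a bar of height 1350 takes up about 96% of the axis, so its label lands near 1280, inside the bar, while a bar of height 1250 keeps its label above, near 1264.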

Now our plot looks like this:
And that’s it! It’s also possible to have the labels go inside the bars by default, except in cases where the bars are too short to accommodate them, and that’s an easy change to the above code, left as an exercise to the reader.
Update (August 19, 2016): A few days ago, I wanted to use this labeling approach for a matplotlib bar chart that used a log scale for the y-axis. It was a lot like the above example, but with the addition of an ax.set_yscale('log') call.
Does the above approach still work? Almost! We just need to make a one-character change to the above code, changing
height + (y_height * ...)

to

height * (y_height * ...)

(and possibly also tweaking the constant we’re multiplying y_height by). And it’s still possible to make the labels appear inside the bars: just write height * (y_height * ...) instead of height - (y_height * ...). A version of autolabel that takes the type (that is, whether it’s logarithmic or linear) of the scale into account might be nice, but I think I’ll leave that as an exercise to the reader, too.
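One possible shape for such a scale-aware version is sketched below. Note that it swaps in a different technique from the one-character change described above: on a log axis it multiplies the bar height by (y_top / y_bottom) ** fraction, which makes the gap the same fraction of the drawn axis height for every bar. The helper name label_position and the fraction parameter are assumptions; get_yscale and get_ylim are existing Axes methods.

```python
def label_position(height, y_bottom, y_top, yscale='linear', fraction=0.01):
    """Y coordinate for a label placed `fraction` of the drawn axis height
    above a bar, for either a linear or a log y-axis."""
    if yscale == 'log':
        # On a log axis, equal visual distances correspond to equal ratios,
        # so the gap is applied as a multiplicative factor.
        return height * (y_top / y_bottom) ** fraction
    # Linear axis (other scale types are not handled in this sketch).
    return height + fraction * (y_top - y_bottom)


def autolabel(rects, ax, fraction=0.01):
    """Label each bar, choosing the gap rule from the axis scale."""
    y_bottom, y_top = ax.get_ylim()
    yscale = ax.get_yscale()  # e.g. 'linear' or 'log'
    for rect in rects:
        height = rect.get_height()
        ax.text(
            rect.get_x() + rect.get_width() / 2.0,
            label_position(height, y_bottom, y_top, yscale, fraction),
            '%d' % int(height),
            ha='center', va='bottom',
        )
```

With this rule, a log axis running from 1 to 10000 and a fraction of 0.01 gives every label a position about 1.1 times its bar height, regardless of how tall the bar is.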