Tuesday, September 3, 2013

Nice Histogram Using Matplotlib

The purpose of this writeup is to outline a way to make a good-looking, effective histogram using matplotlib. I don't want to limit my choices to what well-established scientists or publishers expect.

In my mind, the most important job of a plot is to provide a perspective on a dataset that helps uncover relevant trends. As such, visual efficiency supersedes making the plot "pretty".

I will start by importing all the relevant libraries, loading my data, generating the histogram and showing the plot.

#import libraries
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.pyplot as plt

#load data - single column
x = np.loadtxt("datafile.bla")

#initialize plot - add a figure object and create a subplot covering the entire figure
fig = plt.figure()
ax = fig.add_subplot(111)

#generate histogram of the data
#n holds the counts in each bin
#bins holds the bin edges (one more entry than the number of bars)
#patches holds the drawn bar objects
n, bins, patches = ax.hist(x)

#add labels
ax.set_xlabel('Size (nm)')
ax.set_ylabel('Probability')
plt.show()

The result is the underwhelming plot below. In terms of efficacy, the small number of bins is the only major shortcoming. Aesthetically, though, the histogram is rather unpleasant to look at.
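One detail of the script above is worth a note: the y-axis is labelled 'Probability', but hist() plots raw counts by default. Here is a minimal sketch of one way to make the bar heights actual fractions of the dataset, by weighting each point by 1/N. The synthetic data (600 normally distributed values) is an assumption standing in for the data file, which is not available here.

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic stand-in for the single-column data file (assumption, not the real data)
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=600)

fig = plt.figure()
ax = fig.add_subplot(111)

# weight every point by 1/N so that the bar heights are fractions summing to 1
weights = np.ones_like(x) / len(x)
n, bins, patches = ax.hist(x, weights=weights)

ax.set_xlabel('Size (nm)')
ax.set_ylabel('Probability')
# plt.show()  # uncomment in an interactive session
```

Another option is to keep the raw counts and simply label the axis 'Count'; which one reads better depends on what the plot is meant to show.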


I will first fix the number of bins, since such a low resolution could lead to misleading conclusions. Given my fairly large dataset (~600 points), I will increase the number of bins to somewhere between 18 and 20. I will also change the unsightly blue color to a light gray. I want to keep some level of shading to give the plot body and make it a point of focus. Both changes can be made by adding the following options to the hist command.

n, bins, patches = ax.hist(x, bins=20, facecolor="#DDDDDD")
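The choice of 18 to 20 bins was made by eye. As a rough cross-check, NumPy can suggest bin counts from a few standard rules of thumb. The sketch below is not part of the original workflow; it assumes synthetic stand-in data of the same size (~600 points), and the rule names are the ones accepted by np.histogram_bin_edges.

```python
import numpy as np

# synthetic stand-in for the real dataset (assumption)
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=600)

# number of bins suggested by each rule of thumb
suggested = {}
for rule in ("sturges", "scott", "fd"):
    edges = np.histogram_bin_edges(x, bins=rule)
    suggested[rule] = len(edges) - 1  # one fewer bar than edges
    print(rule, suggested[rule])
```

The same rule names can be passed straight to the plot, e.g. ax.hist(x, bins="fd"). The suggestions are only a starting point; the final number is still a judgment call about how much detail the data can support.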



To me, this is already looking much more presentable. Now I will add some personal touches that I believe can be very important. Hatching helps the shading come through on monitors or prints that are not very faithful to light grays. It also gives good contrast when several plots are superposed. I also personally feel that it improves the look without losing visual efficacy. I will also thicken the borders of the bars, for much the same reasons as the hatching.

n, bins, patches = ax.hist(x, bins=20, facecolor="#DDDDDD",
                           hatch="//", lw=1.5)
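For reference, here is a self-contained sketch that puts the steps so far together. It assumes synthetic data in place of the data file, and it sets edgecolor explicitly, which the snippets above do not: as far as I can tell, recent matplotlib releases do not draw bar edges unless an edge color is given, in which case the lw setting would have no visible effect.

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic stand-in for the single-column data file (assumption)
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=600)

fig = plt.figure()
ax = fig.add_subplot(111)

# light gray bars with diagonal hatching and thicker borders;
# edgecolor is an addition here so that the borders are actually drawn
n, bins, patches = ax.hist(x, bins=20, facecolor="#DDDDDD",
                           edgecolor="black", hatch="//", lw=1.5)

ax.set_xlabel('Size (nm)')
ax.set_ylabel('Count')  # hist() plots raw counts by default
# plt.show()  # uncomment in an interactive session
```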



At this point, the histogram bars are (in my eyes) perfect. However, the axes could definitely use some work.

When I get a little more time, I will go through changing the font and possibly improving the ticks and plot borders. I need to talk about the never-mentioned issue of appropriate font usage, in addition to the well-established issue of font sizes, among other things.





