I’m very happy to release my first real open source software project: pygram11. I’ve been writing software for a while now, but it’s mostly been within the confines of a physics-experiment-specific use case. In that time I’ve used a lot of other developers’ software, so it feels quite nice to potentially help contribute to the scientific computing community in the same way.
This python library aims to make generating a lot of histograms a
quick task (targeting samples of size \(O(10^6)\) and larger), while
supporting weighted statistical uncertainties on the bin counts. To
do this I’ve implemented the ability to calculate histograms (both
fixed and variable bin width in one and two dimensions) which are
(optionally) accelerated with OpenMP. To do it in Python, I’ve used
pybind11. Pygram11 can essentially be a drop-in replacement for
numpy.histogram2d, while reaching speeds
20x faster (for a 1D histogram of an array of length 10,000) to
almost 100x faster than NumPy (for a 2D histogram of 100 million
\((x_i, y_i)\) pairs). The APIs are quite similar (with slightly
different return styles). On top of that, the
sum-of-weights-squared calculation is a “first class citizen” in
pygram11 (see my NumPy Histogram tricks for HEP post).
So, please go checkout the documentation and GitHub repository! Open issues, PRs, email me, tweet me, or write something even better.
To try it out (with OpenMP support), all you need is
conda install pygram11 -c conda-forge
Some fixed bin histogramming:
import numpy as np from pygram11 import histogram, histogram2d x = np.random.randn(100000) y = np.random.randn(100000) w = np.random.uniform(0.8, 1.2, 100000) h_1d = histogram(x, bins=20, range=(-4, 4), omp=True) h_2d = histogram2d(x, y, bins=[20, 40], range=[[-4, 4], [-3, 3]], omp=True) h_1d, sumw2_1d = histogram(x, bins=20, range=(-4, 4), weights=w, omp=True) h_2d, sumw2_2d = histogram2d(x, y, bins=[20, 40], range=[[-4, 4], [-3, 3]], weights=w, omp=True)
Notice the sum-of-weights squared is returned if the
argument is provided with an array of sample weights. Checkout
some benchmarks to see how the
omp argument can speed up the
And some variable bin histogramming, uniform logarithmic:
import numpy as np from pygram11 import histogram x = np.exp(np.random.uniform(0.1, 10.0, 100000)) bins = np.logspace(0.1, 1.0, 10, endpoint=True) h = histogram(x, bins=bins, omp=True)
I hope to eventually spend some time potentially optimizing the OpenMP usage. I think there is some room for improvement there. I would also like to add some optional visualization utilities, we’ll see.