Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-10-14 18:01
Iterating between theory and code
Yesterday I said on Twitter “Time to see whether practice agrees with theory, moving from LaTeX to Python. Wish me luck.” I got a lot of responses to that because it describes the experience of a lot of people. Someone asked if I’d blog about this. The content is confidential, but I’ll talk about the […]
Eloquent mathematical writing
Sir Michael Atiyah recommends Hermann Weyl’s book The Classical Groups for its clarity and beautiful prose. From my interview with Atiyah: Hermann Weyl is my great model. He used to write beautiful literature. Reading it was a joy because he put a lot of thought into it. Hermann Weyl wrote a book called The Classical Groups, […]
Focus on the most important terms
Consider the following Taylor series for sin(θ/7) and the following two functions based on the series. One takes only the first non-zero term, def short_series(x): return 0.14285714*x and a second takes three non-zero terms, def long_series(x): return 0.1425714*x - 4.85908649e-04*x**3 + 4.9582515e-07*x**5 Which is more accurate? Let’s make a couple of plots to see. First […]
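A quick numerical check can stand in for the plots. The sketch below copies both functions exactly as they are printed in the excerpt and compares each with math.sin(x/7). The helper abs_errors and the sample points are illustrative choices, not taken from the post.

```python
import math

def short_series(x):
    # First non-zero term only, coefficient as printed in the excerpt.
    return 0.14285714*x

def long_series(x):
    # Three non-zero terms, coefficients as printed in the excerpt.
    return 0.1425714*x - 4.85908649e-04*x**3 + 4.9582515e-07*x**5

def abs_errors(x):
    """Return (short-series error, long-series error) against sin(x/7)."""
    exact = math.sin(x / 7)
    return abs(short_series(x) - exact), abs(long_series(x) - exact)

# Sample points are arbitrary; they are not from the original post.
for x in (0.1, 0.5, 1.0, 2.0):
    short_err, long_err = abs_errors(x)
    print(f"x={x}: short error={short_err:.3e}, long error={long_err:.3e}")
```

Printing the errors side by side shows which function wins at each point and how that changes as x moves away from zero.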
Scaling and Monte Carlo integration methods
Here’s an apparent paradox. You’ll hear that Monte Carlo methods are independent of dimension, and that they scale poorly with dimension. How can both statements be true? The most obvious way to compute multiple integrals is to use product methods, analogous to the way you learn to compute multiple integrals by hand. Unfortunately the amount […]
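For reference, here is a minimal sketch of the sample-mean estimator that Monte Carlo integration rests on. The integrand, the dimensions, and the sample count are assumptions chosen for illustration; the integrand has a known exact integral over the unit cube so the error can be inspected.

```python
import random

def mc_integrate(f, dim, n_samples, rng):
    """Estimate the integral of f over the unit cube [0, 1]^dim."""
    total = 0.0
    for _ in range(n_samples):
        point = [rng.random() for _ in range(dim)]
        total += f(point)
    return total / n_samples

def sum_of_squares(point):
    # Illustrative integrand: its exact integral over [0, 1]^dim is dim / 3.
    return sum(x * x for x in point)

rng = random.Random(42)
for dim in (1, 5, 20):
    estimate = mc_integrate(sum_of_squares, dim, 20_000, rng)
    print(f"dim={dim}: estimate={estimate:.4f}, exact={dim / 3:.4f}")
```

The same function handles any dimension with no change in code, whereas a product rule with m points per axis would need m**dim function evaluations.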
Integrating polynomials over a sphere or ball
Spheres and balls are examples of common words that take on a technical meaning in math, as I wrote about here. Recall that the unit sphere in n dimensions is the set of points with distance 1 from the origin. The unit ball is the set of points of distance less than or equal to 1 from the […]
Bounding the 3rd moment by the 4th moment
For a random variable X, the kth moment of X is the expected value of X^k. For any random variable X with 0 mean, or negative mean, there’s an inequality that bounds the 3rd moment, m3, in terms of the 4th moment, m4: The following example shows that this bound is the best possible. Define u […]
Putting a brace under something in LaTeX
Here’s a useful LaTeX command that I learned about recently: \underbrace. It does what it sounds like it does. It puts a brace under its argument. I used this a few days ago in the post on the new prime record when I wanted to show that the record prime is written in hexadecimal as […]
Emacs features that use regular expressions
The syntax of regular expressions in Emacs is a little disappointing, but the ways you can use regular expressions in Emacs are impressive. I’ve written before about the syntax of Emacs regular expressions. It’s a pretty conservative subset of the features you may be used to from other environments as summarized in the diagram below. But […]
Easter eggs and yellow pigs
An Easter egg is a hidden feature, a kind of joke. The term was first used in video games but the idea is broader and older than that. For example, Alfred Hitchcock made a brief appearance in all his movies. And I recently heard that there’s a pineapple or reference to a pineapple in every […]
Big derivatives
Suppose you have a function of n variables f. The kth derivative of f is a kth order tensor [1] with n^k components. Not all those tensor components are unique. Assuming our function f is smooth, the order in which partial derivatives are taken doesn’t matter. It only matters which variables you differentiate with respect […]
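The counting argument can be made concrete. If only the choice of variables matters and not their order, each distinct component corresponds to a multiset of k variables drawn from n, and there are C(n + k − 1, k) of those. The function names below are illustrative.

```python
from math import comb

def total_components(n, k):
    """Number of entries in a kth order tensor over n variables."""
    return n ** k

def distinct_components(n, k):
    """Number of distinct kth order partial derivatives of a smooth function.

    Counts multisets of size k chosen from n variables.
    """
    return comb(n + k - 1, k)

for n, k in [(3, 2), (10, 4), (100, 3)]:
    print(n, k, total_components(n, k), distinct_components(n, k))
```

For n = 3 and k = 2 this gives 9 entries but 6 distinct ones, matching the familiar count for a symmetric 3 × 3 Hessian.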
Are coffee and wine good for you or bad for you?
One study will say that coffee is good for you and then another will say it’s bad for you. Ditto with wine and many other things. So which is it: are these things good for you or bad for you? Probably neither. That is, these things that are endlessly studied with contradictory conclusions must not […]
Distribution of matches between two shuffled decks
Take two decks of cards and shuffle them. They can be standard 52-card decks, though the number of cards in the decks doesn’t matter as long as they’re the same and the decks are fairly large. Now count the number of times the two decks match, i.e. how many times the same card is in […]
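The experiment is easy to simulate. The sketch below shuffles two decks, counts positions where the same card appears in both, and tallies the counts over many trials. The trial count and seed are arbitrary assumptions.

```python
import random
from collections import Counter

def count_matches(n_cards, rng):
    """Shuffle two decks of n_cards and count positions holding the same card."""
    deck_a = list(range(n_cards))
    deck_b = list(range(n_cards))
    rng.shuffle(deck_a)
    rng.shuffle(deck_b)
    return sum(a == b for a, b in zip(deck_a, deck_b))

def match_distribution(n_cards, n_trials, rng):
    """Tally how often each match count occurs over n_trials experiments."""
    return Counter(count_matches(n_cards, rng) for _ in range(n_trials))

rng = random.Random(2018)
trials = 5000
tally = match_distribution(52, trials, rng)
for matches in sorted(tally):
    print(matches, tally[matches] / trials)
```

Changing 52 to another reasonably large deck size is a simple way to see how little the tally depends on the number of cards.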
Magic squares as matrices
If you view a 3 × 3 magic square as a matrix and raise it to the third power, the result is also a magic square. More generally, if you multiply an odd number of 3 × 3 magic squares together, the result is a magic square. For example, here are three magic squares that […]
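The claim is easy to check for one example. The sketch below uses the Lo Shu square, chosen here for illustration, and treats "magic" in the loose sense that all rows, columns, and both diagonals share one sum; entries need not be distinct.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_magic(m):
    """True if every row, column, and both diagonals have the same sum."""
    n = len(m)
    target = sum(m[0])
    rows_ok = all(sum(row) == target for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == target for j in range(n))
    diag_ok = sum(m[i][i] for i in range(n)) == target
    anti_ok = sum(m[i][n - 1 - i] for i in range(n)) == target
    return rows_ok and cols_ok and diag_ok and anti_ok

lo_shu = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]

square = matmul(lo_shu, lo_shu)
cube = matmul(square, lo_shu)

print("square is magic:", is_magic(square))
print("cube is magic:", is_magic(cube))
```

The square of the matrix is included for contrast, since the statement is about odd numbers of factors.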
GDPR and the right to be forgotten
General Data Protection Regulation The European GDPR (General Data Protection Regulation) was adopted in 2016 and becomes enforceable in May of this year. Article 17 mandates a right to erasure, more commonly called the right to be forgotten. A right to be forgotten is tricky. It’s not immediately clear what this means or to what […]
Most useful math class
A few years ago someone asked me what was my most useful undergraduate math class. My first thought was topology. I have never directly applied topology for a client. Nobody has ever approached me wanting to know, for example, whether two objects were in the same homotopy class. But I believe topology was one of […]
Product review policies
I’ve often reviewed books on this site and may review other products some day. I wanted to let readers and potential vendors know what my policies are regarding product reviews. I don’t get paid for reviews. I review things that I find interesting and think that readers would find interesting. I don’t do reviews with […]
Average fraction round up
Pick a large number n. Divide n by each of the positive integers up to n and round the results up to the nearest integer. On average, how far do you round up? Or in terms of probability, what is the expected distance between a fraction n/r, where n is large and fixed and r is chosen randomly […]
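The average can be estimated by brute force before working it out analytically. This sketch computes the ceiling with integer arithmetic to avoid floating point surprises at exact divisors; the value of n is an arbitrary choice.

```python
def average_round_up(n):
    """Average of ceil(n/r) - n/r over r = 1, 2, ..., n."""
    total = 0.0
    for r in range(1, n + 1):
        ceiling = -(-n // r)  # integer ceiling of n / r
        total += ceiling - n / r
    return total / n

for n in (1_000, 10_000, 100_000):
    print(n, average_round_up(n))
```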
Six blog posts on the mathematics of privacy
Six blog posts on mathematics and privacy: Randomized response and Bayes’ theorem; Big aggregate queries can still violate privacy; Quantifying privacy loss; Database anonymization for testing; Toxic pairs and re-identification; Adding Laplacian or Gaussian noise to a database.
Ten years of blogging
Ten years ago I started writing this blog. Since then I’ve written about 2700 posts. Thank you all for reading, commenting, and sharing. Update: For highlights of my posts over the years, see Tim Hopper’s post John Cook’s Ten Year Blogging Endeavour.
New prime number record: 50th Mersenne prime
A new record for the largest known prime was announced yesterday: This number has 23,249,425 digits when written in base 10. In base 2, 2^p − 1 is a sequence of p ones. For example, 31 = 2^5 − 1 which is 11111 in binary. So in binary, the new record prime is a string of 77,232,917 […]
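Both facts in the excerpt can be checked without ever constructing the record prime. The digit count uses the observation that 2^p is never a power of 10, so 2^p − 1 has the same number of decimal digits as 2^p. The function name is illustrative.

```python
import math

def mersenne_decimal_digits(p):
    """Number of base-10 digits of 2**p - 1, computed from logarithms."""
    return math.floor(p * math.log10(2)) + 1

# Small case from the excerpt: 31 = 2**5 - 1 is five ones in binary.
print(bin(2**5 - 1))

# Exponent of the record prime, read off from the length of its binary form.
print(mersenne_decimal_digits(77_232_917))
```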
The Engineer’s Nyquist frequency and the sampling theorem
The Nyquist sampling theorem says that a band-limited signal can be recovered from evenly-spaced samples. If the highest frequency component of the signal is f_c then the function needs to be sampled at a frequency of at least the Nyquist frequency 2f_c. Or to put it another way, the spacing between samples needs to be […]
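To see why the sampling rate matters, it helps to look at what happens below it. In this sketch a 3 Hz sine sampled at 4 Hz, under the 6 Hz the theorem calls for, produces exactly the same samples as a −1 Hz sine. The frequencies are illustrative assumptions.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Samples of sin(2*pi*freq*t) taken at t = k / rate for k = 0..n_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz)
            for k in range(n_samples)]

rate = 4.0                               # samples per second, too slow for 3 Hz
signal = sample_sine(3.0, rate, 12)
alias = sample_sine(3.0 - rate, rate, 12)  # the -1 Hz alias

for s, a in zip(signal, alias):
    print(f"{s:+.6f}  {a:+.6f}")
```

Since the two columns agree, no reconstruction method could tell the two signals apart from these samples alone.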
Making sense of a probability problem in the WSJ
Someone wrote to me the other day asking if I could explain a probability example from the Wall Street Journal. (“Proving Investment Success Takes Time,” Spencer Jakab, November 25, 2017.) Victor Haghani … and two colleagues told several hundred acquaintances who worked in finance that they would flip two coins, one that was normal and […]
Exponential sum for the new year
Exponential sums can make intricate patterns. Last year I made a page that displays a different page each day, using the month, day, and year as parameters in the expression below. The images plot the partial sums of this sum. This was yesterday’s image. Today’s image is surprisingly plain if we use y = 18. This […]
Free technical books, mostly chemical engineering
Retiring professor Leonard Fabiano contacted me looking to give away a set of technical books, mostly chemical engineering books. If you’re interested please email him at lenfab@live.com. Here are the books: Click on the image to see a larger version. Two titles are not possible to read in the photo. These are Conduction of heat […]
Equation for the Eiffel Tower
Robert Banks’s book Towing Icebergs, Falling Dominoes, and Other Adventures in Applied Mathematics describes the Eiffel Tower’s shape as approximately the logarithmic curve where y* and x0 are chosen to match the tower’s dimensions. Here’s a plot of the curve: And here’s the code that produced the plot: from numpy import log, exp, linspace, vectorize import matplotlib.pyplot […]
Top five math posts of 2017
These have been the most popular math-related posts here this year: Golden powers are nearly integers; How efficient is Morse code?; Finding numbers in pi; Common words used as technical terms; Sierpinski triangle strikes again. See also a list of the top five computing-related posts.
Hermite polynomials, expected values, and integration
In the previous post, I alluded to using Hermite polynomials in conjunction with higher-order Laplace approximation. In this post I’ll expand on what that means. Hermite polynomials are orthogonal polynomials over the real line with respect to the weight given by the standard normal distribution. (There are two conventions for defining Hermite polynomials, what Wikipedia […]
Higher-order Laplace approximation
Yesterday’s post presented the most common form of Laplace approximation, the second order version, and mentioned in passing that there are higher order versions. The extension to higher order is not trivial, so this post gives a high level overview of how you’d do it. The previous post looked at integrating exp( log( g(x) ) ) by […]
Laplace approximation of an integral from Bayesian logistic regression
Define and This integral comes up in Bayesian logistic regression with a uniform (improper) prior. We will use this integral to illustrate a simple case of Laplace approximation. The idea of Laplace approximation is to approximate the integral of a Gaussian-like function by the integral of a (scaled) Gaussian with the same mode and same […]
Top five computing blog posts of 2017
These have been the most popular computing-related posts here this year: Programming language life expectancy; SHA1 no longer recommended, but hardly a failure; The most disliked programming language; Improving on the Unix shell; One practical application of functional programming. I plan to post a list of the top five math-related posts soon.
Scholarship versus research
One of the things about academia that most surprised and disappointed me was the low regard for scholarship. Exploration is tolerated as long as it results in a profusion of journal articles, and of course grant money, but is otherwise frowned upon. For example, I know someone who ruined his academic career by writing a massive […]
Intellectual onramps
Tyler Cowen’s latest blog post gives advice for learning about modern China. He says that “books about sequences of dynasties are mind-numbing and not readily absorbed” and recommends finding other entry points before reading about dynasties. Find an “entry point” into China of independent intrinsic interest to you, be it basketball, artificial intelligence, Chinese opera, […]
Efficiency is not associative for matrix multiplication
Here’s a fact that has been rediscovered many times in many different contexts: The way you parenthesize matrix products can greatly change the time it takes to compute the product. This is important, for example, for the back propagation algorithm in deep learning. Let A, B, and C be matrices that are compatible for multiplication. Then (AB)C = A(BC). […]
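A rough operation count shows how large the difference can be. The sketch assumes the schoolbook algorithm, where multiplying an m × n matrix by an n × p matrix takes m·n·p scalar multiplications. The shapes are made up for illustration.

```python
def matmul_cost(rows, inner, cols):
    """Scalar multiplications for a (rows x inner) times (inner x cols) product."""
    return rows * inner * cols

def cost_left_first(a_shape, b_shape, c_shape):
    """Cost of computing (AB)C."""
    m, n = a_shape
    _, p = b_shape
    _, q = c_shape
    return matmul_cost(m, n, p) + matmul_cost(m, p, q)

def cost_right_first(a_shape, b_shape, c_shape):
    """Cost of computing A(BC)."""
    m, n = a_shape
    _, p = b_shape
    _, q = c_shape
    return matmul_cost(n, p, q) + matmul_cost(m, n, q)

# Hypothetical shapes: A is 1000 x 10, B is 10 x 1000, C is 1000 x 10.
shapes = ((1000, 10), (10, 1000), (1000, 10))
print("(AB)C:", cost_left_first(*shapes))
print("A(BC):", cost_right_first(*shapes))
```

With these shapes the two orderings give the same matrix but differ in cost by a factor of 100.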
Higher order Taylor series in several variables
Most sources that present Taylor’s theorem for functions of several variables stop at second order terms. One reason is that one or two terms are good enough for many applications. But the bigger reason is that things get more complicated when you include higher order terms. The kth order term in Taylor’s theorem is a rank k […]
How can a statistician help a lawyer?
I’ll be presenting at a webinar on Wednesday, December 13 at 1:00 PM Eastern. The title of the presentation is “Seven questions a statistician can answer for an attorney.” I will discuss, among other things, when common sense applies and when correct analysis can be counter-intuitive. There will be ample time at the end of […]
Moment generating functions and connections to other things
This post relates moment generating functions to the Laplace transform and to exponential generating functions. It also brings in connections to the z-transform and the Fourier transform. Thanks to Brian Borchers who suggested the subject of this post in a comment on a previous post on transforms and convolutions. Moment generating functions The moment generating function (MGF) of […]
Shannon wavelet
The Shannon wavelet has an interesting plot: Given the complexity of the plot, the function definition is surprisingly simple: The Fourier transform is even simpler: it’s the indicator function of [-2π, -π] ∪ [π, 2π], i.e. the function that is 1 on the intervals [-2π, -π] and [π, 2π] but zero everywhere else. The Shannon […]
Transforms and Convolutions
There are many theorems of the form where f and g are functions, T is an integral transform, and * is a kind of convolution. In words, the transform of a convolution is the product of transforms. When the transformation changes, the notion of convolution changes. Here are three examples. Fourier transform and convolution With the Fourier transform […]
Gamma function partial sums
Last week I wrote about Jentzsch’s theorem. It says that if the power series of a function has a finite radius of convergence, the set of zeros of the partial sums of the series will cluster around and fill in the boundary of convergence. This post will look at the power series for the gamma function […]
Hypergeometric functions are key
From Orthogonal Polynomials and Special Functions by Richard Askey: At first the results we needed were in the literature but after a while we ran out of known results and had to learn something about special functions. This was a very unsettling experience for there were very few places to go to really learn about […]
Distribution of Fibonacci numbers mod m
The last digits of Fibonacci numbers repeat with period 60. This is something I’ve written about before. The 61st Fibonacci number is 2504730781961. The 62nd is 4052739537881. Since these end in 1 and 1, the 63rd Fibonacci number must end in 2, etc. and so the pattern starts over. It’s not obvious that the cycle should […]
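The period can be found by iterating pairs of consecutive residues until the starting pair (0, 1) comes back. This is a minimal sketch with illustrative function names, using the convention F(1) = F(2) = 1.

```python
def fib(n):
    """Return the nth Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fibonacci_period(m):
    """Length of the cycle of Fibonacci numbers mod m, for m >= 2."""
    a, b = 0, 1
    period = 0
    while True:
        a, b = b, (a + b) % m
        period += 1
        if (a, b) == (0, 1):
            return period

print(fibonacci_period(10))        # period of the last digits
print(fib(61), fib(62), fib(63))   # the values quoted above and the next one
```

The same function works for other moduli, for example fibonacci_period(100) for the last two digits.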
A circle of zeros: Jentzsch’s theorem
Take a function that has a power series with a finite radius of convergence. Then the zeros of the partial sums will be dense around the boundary of convergence. That is Jentzsch’s theorem. Here are a couple plots to visualize Jentzsch’s theorem using the plotting scheme described in this post. First, we take the function f(z) […]
Orthogonal polynomials and the beta distribution
This post shows a connection between three families of orthogonal polynomials—Legendre, Chebyshev, and Jacobi—and the beta distribution. Legendre, Chebyshev, and Jacobi polynomials A family of polynomials Pk is orthogonal over the interval [-1, 1] with respect to a weight w(x) if whenever m ≠ n. If w(x) = 1, we get the Legendre polynomials. If w(x) = (1 […]
Runge phenomena
I’ve mentioned the Runge phenomenon in a couple posts before. Here I’m going to go into a little more detail. First of all, the “Runge” here is Carl David Tolmé Runge, better known for the Runge-Kutta algorithm for numerically solving differential equations. His name rhymes with cowabunga, not with sponge. Runge showed that polynomial interpolation […]
Twenty questions and conditional probability
The previous post compared bits of information to answers in a game of Twenty Questions. The optimal strategy for playing Twenty Questions is for each question to split the remaining possibilities in half. There are a couple ways to justify this strategy: minmax and average. The minmax approach is to minimize the worst thing that […]
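The halving strategy can be sketched as a loop that assumes the worst case at every step, namely that the answer always leaves the larger half. The function name is illustrative.

```python
def questions_needed(n_possibilities):
    """Worst-case number of yes/no questions when each one halves the candidates."""
    remaining = n_possibilities
    questions = 0
    while remaining > 1:
        remaining = (remaining + 1) // 2  # the larger half survives
        questions += 1
    return questions

for n in (2, 1000, 2**20, 2**20 + 1):
    print(n, questions_needed(n))
```

Twenty questions are enough for 2**20 possibilities, and one more candidate pushes the worst case to a twenty-first question.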
Handedness, introversion, height, blood type, and PII
I’ve had data privacy on my mind a lot lately because I’ve been doing some consulting projects in that arena. When I saw a tweet from Tim Hopper a little while ago, my first thought was “How many bits of PII is that?”. [1] π Things Only Left Handed Introverts Over 6′ 5″ with O+ […]
Pareto distribution and Benford’s law
The Pareto probability distribution has density for x ≥ 1 where a > 0 is a shape parameter. The Pareto distribution and the Pareto principle (i.e. “80-20” rule) are named after the same person, the Italian economist Vilfredo Pareto. Samples from a Pareto distribution obey Benford’s law in the limit as the parameter a goes to […]
Random number generation posts
Random number generation is typically a two step process: first generate a uniformly distributed value, then transform that value to have the desired distribution. The former is the hard part, but also the part more likely to have been done for you in a library. The latter is relatively easy in principle, though some distributions […]
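As one instance of the second step, inverting the CDF turns a uniform value into an exponential one. The exponential distribution is chosen here as an example because its CDF inverts in closed form; the rate, seed, and sample count are arbitrary assumptions.

```python
import math
import random

def exponential_sample(rate, rng):
    """Draw from an exponential distribution by inverting its CDF.

    If U is uniform on [0, 1), then -log(1 - U) / rate is exponential
    with the given rate.
    """
    u = rng.random()  # step 1: uniform value in [0, 1)
    return -math.log(1.0 - u) / rate  # step 2: transform

rng = random.Random(7)
samples = [exponential_sample(2.0, rng) for _ in range(20_000)]
print("sample mean:", sum(samples) / len(samples), "expected:", 1 / 2.0)
```

Using 1 − u rather than u keeps the argument of the logarithm strictly positive, since rng.random() can return 0.0 but never 1.0.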
Quantifying information gain in beta-binomial Bayesian model
The beta-binomial model is the “hello world” example of Bayesian statistics. I would call it a toy model, except it is actually useful. It’s not nearly as complicated as most models used in application, but it illustrates the basics of Bayesian inference. Because it’s a conjugate model, the calculations work out trivially. For more on […]
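The conjugate update is short enough to write out. With a Beta(a, b) prior on the success probability and s successes in n trials, the posterior is Beta(a + s, b + n − s). The prior and the data below are made-up values for illustration.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical example: uniform Beta(1, 1) prior, 7 successes in 10 trials.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, trials=10)
print("posterior parameters:", posterior)
print("prior mean:", beta_mean(*prior), "posterior mean:", beta_mean(*posterior))
```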
Big aggregate queries can still violate privacy
Suppose you want to prevent your data science team from being able to find out information on individual customers, but you do want them to be able to get overall statistics. So you implement two policies. Data scientists can only query aggregate statistics, such as counts and averages. These aggregate statistics must be based on […]