John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-04-02 13:32
Bell numbers
The nth Bell number is the number of ways to partition a set of n labeled items. It’s also equal to the following sum. You may have to look at that sum twice to see it correctly. It looks a lot like the sum for e^n except the roles of k and n are reversed in […]
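The sum alluded to is Dobinski’s formula, B(n) = (1/e) Σ_{k≥0} kⁿ/k!. Here is a minimal Python sketch comparing that sum, truncated after enough terms, against exact Bell numbers computed with the Bell triangle; both are standard methods, not necessarily the post’s code.

```python
import math

def bell_dobinski(n, terms=60):
    """Approximate the nth Bell number via Dobinski's formula:
    B(n) = (1/e) * sum_{k>=0} k^n / k!  (the sum for e^n with k and n swapped)."""
    return sum(k**n / math.factorial(k) for k in range(terms)) / math.e

def bell_exact(n):
    """Exact Bell numbers via the Bell triangle, for comparison."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]

for n in range(6):
    print(n, bell_exact(n), round(bell_dobinski(n)))  # 1, 1, 2, 5, 15, 52
```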
Relative error in the central limit theorem
If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply. See notes here.) But […]
Central limit theorem and Runge phenomena
I was playing around with something this afternoon and stumbled on something like Gibbs phenomena or Runge phenomena for the Central Limit Theorem. The first place most people encounter Gibbs phenomena is in Fourier series for a step function. The Fourier series develops “bat ears” near the discontinuity. Here’s an example I blogged about before […]
Computing extreme normal tail probabilities
Let me say up front that relying on the normal distribution as an accurate model of extreme events is foolish under most circumstances. The main reason to calculate the probability of, say, a 40 sigma event is to show how absurd it is to talk about 40 sigma events. See my previous post on six-sigma […]
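At 40 sigma the tail probability underflows double precision entirely, so naive direct evaluation returns zero. One workaround is to compute the logarithm of the standard asymptotic bound P(Z > x) ≈ φ(x)/x. The asymptotic formula is classical, but this sketch is not necessarily the post’s exact method.

```python
import math

def log10_normal_tail(x):
    """Asymptotic log10 of P(Z > x) for large x, using P(Z > x) ~ phi(x)/x."""
    log_p = -0.5 * x * x - math.log(x * math.sqrt(2 * math.pi))
    return log_p / math.log(10)

# Direct evaluation underflows at 40 sigma:
print(math.erfc(40 / math.sqrt(2)) / 2)   # 0.0
print(log10_normal_tail(40))              # roughly -349, i.e. P ~ 10^-349
```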
Six sigma events
I saw on Twitter this afternoon a paraphrase of a quote from Nassim Taleb to the effect that if you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event. What does that mean? Six sigma means six standard deviations away from the mean of a probability distribution, sigma (σ) being the […]
Calendars and continued fractions
Calendars are based on three frequencies: the rotation of the Earth on its axis, the rotation of the moon around the Earth, and the rotation of the Earth around the sun. Calendars are complicated because none of these periods is a simple multiple of any other. The ratios are certainly not integers, but they’re not […]
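The convergents of the continued fraction for the fractional part of the tropical year (about 0.2422 days) recover familiar leap-year rules: 1/4 is the Julian rule of one leap day every four years, and later convergents give more accurate schemes. A rough Python sketch using float arithmetic; the general idea is standard, not the post’s code.

```python
from fractions import Fraction

def convergents(x, n):
    """First n continued fraction convergents of x (a float-arithmetic sketch)."""
    result = []
    h0, k0, h1, k1 = 0, 1, 1, 0
    for _ in range(n):
        a = int(x)
        h0, k0, h1, k1 = h1, k1, a * h1 + h0, a * k1 + k0
        result.append(Fraction(h1, k1))
        frac = x - a
        if frac == 0:
            break
        x = 1 / frac
    return result

# Fractional part of the tropical year, about 0.2422 days:
print(convergents(0.2422, 4))  # 0, 1/4 (Julian rule), 7/29, 8/33
```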
Computing smooth max without overflow
Erik Erlandson sent me a note saying he found my post on computing the soft maximum helpful. (If you’re unfamiliar with the soft maximum, here’s a brief description of what it is and how you might use it.) Erik writes I used your post on practical techniques for computing smooth max, which will probably be […]
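The overflow problem with the soft maximum log(exp(x) + exp(y)) is that exp overflows long before the answer does. The usual fix, sketched below, is to shift by the hard max before exponentiating; this is the standard log-sum-exp trick, which is what the technique amounts to.

```python
import math

def soft_max_naive(x, y):
    """log(exp(x) + exp(y)) -- overflows for large arguments."""
    return math.log(math.exp(x) + math.exp(y))

def soft_max(x, y):
    """Overflow-safe soft maximum: shift by the hard max before exponentiating.
    The exponents are then <= 0, so exp cannot overflow."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

print(soft_max(1000, 1001))  # about 1001.3133
# soft_max_naive(1000, 1001) would raise OverflowError
```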
Proving life exists on Earth
NASA’s Galileo mission was primarily designed to explore Jupiter and its moons. In 1989, the Galileo probe started out traveling away from Jupiter in order to do a gravity assist swing around Venus. About a year later it also did a gravity assist maneuver around Earth. Carl Sagan suggested that when passing Earth, the Galileo […]
Combinatorics, just beyond the basics
Most basic combinatorial problems can be solved in terms of multiplication, permutations, and combinations. The next step beyond the basics, in my experience, is counting selections with replacement. Often when I run into a problem that is not quite transparent, it boils down to this. Examples of selection with replacement Here are three problems that […]
10 best rational approximations for pi
It’s easy to create rational approximations for π. Every time you write down π to a few decimal places, that’s a rational approximation. For example, 3.14 = 314/100. But that’s not the best approximation. Think of the denominator of your fraction as something you have to buy. If you have enough budget to buy a three-digit […]
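The “best approximation under a denominator budget” idea can be played with directly in Python: `Fraction.limit_denominator` returns the closest fraction with denominator no larger than the budget. This is a sketch of the idea, not necessarily how the post generates its list.

```python
from fractions import Fraction
import math

# Best rational approximations of pi under increasing denominator budgets.
for budget in (10, 100, 1000):
    f = Fraction(math.pi).limit_denominator(budget)
    print(budget, f, float(f))

# 22/7 is the classic three-character answer; 355/113 is remarkably
# accurate for its size and stays the best choice for a long while.
```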
Fixed points of logistic function
Here’s an interesting problem that came out of a logistic regression application. The input variable was between 0 and 1, and someone asked when and where the logistic transformation f(x) = 1/(1 + exp(a + bx)) has a fixed point, i.e. f(x) = x. So given logistic regression parameters a and b, when does the logistic curve given by y […]
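Since f maps [0, 1] into (0, 1) and is continuous, g(x) = f(x) − x is positive at 0 and negative at 1, so a fixed point always exists and bisection will find it for any a and b. A minimal sketch (bisection is one reliable choice here, not necessarily the post’s approach):

```python
import math

def logistic(a, b, x):
    return 1 / (1 + math.exp(a + b * x))

def fixed_point(a, b, tol=1e-12):
    """Find x in [0, 1] with f(x) = x by bisection.
    f(0) > 0 and f(1) < 1, so f(x) - x changes sign on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if logistic(a, b, mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# With a = b = 0 the curve is constant 1/2, so the fixed point is 1/2:
print(fixed_point(0.0, 0.0))
```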
Line art
A new video from 3Blue1Brown is about visualizing derivatives as stretching and shrinking factors. Along the way they consider the function f(x) = 1 + 1/x. Iterations of f converge on the golden ratio, no matter where you start (with one exception). The video creates a graph where they connect values of x on one […]
Causal inference and cryptic syntax
I just made one of those O’Reilly parody book covers. It’s a joke on Judea Pearl, expert in causal inference, and the Perl programming language, known for its unusual, terse syntax. Related:
Making a career out of the chain rule
When I was a teenager, my uncle gave me a calculus book and told me that mastering calculus was the most important thing I could do for starting out in math. So I learned the basics of calculus from that book. Later I read Michael Spivak’s two calculus books. I took courses that built on […]
Robustness and tests for equal variance
The two-sample t-test is a way to test whether two data sets come from distributions with the same mean. I wrote a few days ago about how the test performs under ideal circumstances, as well as less than ideal circumstances. This is an analogous post for testing whether two data sets come from distributions with the same […]
Ellipsoid geometry and Haumea
To first approximation, Earth is a sphere. A more accurate description is that the earth is an oblate spheroid, the polar axis being a little shorter than the equatorial diameter. See details here. Other planets are oblate spheroids as well. Jupiter is further from spherical than the earth is. The general equation […]
Two-sample t-test and robustness
A two-sample t-test is intended to determine whether there’s evidence that two samples have come from distributions with different means. The test assumes that both samples come from normal distributions. Robust to non-normality, not to asymmetry It is fairly well known that the t-test is robust to departures from a normal distribution, as long as the actual […]
Spectral sparsification
The latest episode of My Favorite Theorem features John Urschel, former offensive lineman for the Baltimore Ravens and current math graduate student. His favorite theorem is a result on graph approximation: for every weighted graph, no matter how densely connected, it is possible to find a sparse graph whose Laplacian approximates that of the original […]
Reciprocals of primes
Here’s an interesting little tidbit: For any prime p except 2 and 5, the decimal expansion of 1/p repeats with a period that divides p-1. The period could be as large as p-1, but no larger. If it’s less than p-1, then it’s a divisor of p-1. Here are a few examples. 1/3 = 0.33… […]
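The period of the decimal expansion of 1/p is the multiplicative order of 10 mod p, i.e. the smallest k with 10^k ≡ 1 (mod p), and by Fermat’s little theorem that order divides p − 1. A short sketch verifying this for a few primes:

```python
def period(p):
    """Decimal period of 1/p for a prime p other than 2 and 5:
    the multiplicative order of 10 mod p."""
    k, r = 1, 10 % p
    while r != 1:
        r = (r * 10) % p
        k += 1
    return k

for p in (3, 7, 11, 13, 37):
    k = period(p)
    print(p, k, (p - 1) % k == 0)   # the period always divides p - 1
```

For example, 1/7 = 0.142857142857… has period 6 = 7 − 1, while 1/37 = 0.027027… has period 3, a proper divisor of 36.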
Rise and fall of the Windows Empire
This morning I ran across the following graph via Horace Dediu. I developed Windows software during the fattest part of the Windows curve. That was a great time to be in the Windows ecosystem. Before that I was in an academic bubble. My world consisted primarily of Macs and various flavors of Unix. I had […]
Robust statistics
P. J. Huber gives three desiderata for a statistical method in his book Robust Statistics: It should have a reasonably good (optimal or nearly optimal) efficiency at the assumed model. It should be robust in the sense that small deviations from the model assumptions should impair the performance only slightly. Somewhat larger deviations from the […]
Optimal low-rank matrix approximation
Matrix compression Suppose you have an m by n matrix A, where m and n are very large, that you’d like to compress. That is, you’d like to come up with an approximation of A that takes less data to describe. For example, consider a high resolution photo, represented as a matrix of gray scale values. An approximation to the matrix […]
Least squares solutions to over- or underdetermined systems
It often happens in applications that a linear system of equations Ax = b either does not have a solution or has infinitely many solutions. Applications often use least squares to create a problem that has a unique solution. Overdetermined systems Suppose the matrix A has dimensions m by n and the right hand side vector b has dimension m. Then the […]
Computing SVD and pseudoinverse
In a nutshell, given the singular value decomposition of a matrix A, the Moore-Penrose pseudoinverse is given by This post will explain what the terms above mean, and how to compute them in Python and in Mathematica. Singular Value Decomposition (SVD) The singular value decomposition of a matrix is a sort of change of coordinates that makes […]
Probit regression
The previous post looked at how probability predictions from a logistic regression model vary as a function of the fitted parameters. This post goes through the same exercise for probit regression and compares the two kinds of nonlinear regression. Generalized linear models and link functions Logistic and probit regression are minor variations on a theme. […]
Sensitivity of logistic regression prediction on coefficients
The output of a logistic regression model is a function that predicts the probability of an event as a function of the input parameter. This post will only look at a simple logistic regression model with one predictor, but similar analysis applies to multiple regression with several predictors. Here’s a plot of such a curve […]
Tridiagonal systems, determinants, and natural cubic splines
Tridiagonal matrices A tridiagonal matrix is a matrix that has nonzero entries only on the main diagonal and on the adjacent off-diagonals. This special structure comes up frequently in applications. For example, the finite difference numerical solution to the heat equation leads to a tridiagonal system. Another application, the one we’ll look at in detail […]
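Tridiagonal systems can be solved in O(n) time with the Thomas algorithm, a specialization of Gaussian elimination that only touches the three diagonals. A minimal sketch, assuming no pivoting is needed (e.g. the matrix is diagonally dominant, as in heat-equation and spline systems):

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm. a: sub-diagonal (length n-1), b: main diagonal
    (length n), c: super-diagonal (length n-1), d: right-hand side (length n).
    Forward sweep eliminates the sub-diagonal, back substitution finishes."""
    n = len(b)
    cp = [0.0] * (n - 1)
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 1D Laplacian-style system [[2,-1,0],[-1,2,-1],[0,-1,2]] x = [1,0,1];
# the solution is x = [1, 1, 1].
print(solve_tridiagonal([-1, -1], [2, 2, 2], [-1, -1], [1, 0, 1]))
```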
Probability of coprime sets
The latest blog post from Gödel’s Lost Letter and P=NP looks at the problem of finding relatively prime pairs of large numbers. In particular, they want a deterministic algorithm. They mention in passing that the probability of a set of k large integers being relatively prime (coprime) is 1/ζ(k) where ζ is the Riemann zeta function. This […]
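The k = 2 case is easy to check empirically: the probability that two random large integers are coprime is 1/ζ(2) = 6/π² ≈ 0.608. A Monte Carlo sketch (the sampling setup here is an illustration, not the linked post’s deterministic algorithm):

```python
import math
import random

# Estimate the probability that a random pair of large integers is coprime
# and compare with 1/zeta(2) = 6/pi^2.
random.seed(42)
N = 100_000
hits = sum(
    math.gcd(random.randrange(1, 10**9), random.randrange(1, 10**9)) == 1
    for _ in range(N)
)
print(hits / N, 6 / math.pi**2)  # both about 0.608
```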
The quadratic formula and low-precision arithmetic
What could be interesting about the lowly quadratic formula? It’s a formula after all. You just stick numbers into it. Well, there’s an interesting wrinkle. When the linear coefficient b is large relative to the other coefficients, the quadratic formula can give wrong results when implemented in floating point arithmetic. Quadratic formula and loss of precision The […]
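The wrinkle: when b is large, −b + √(b² − 4ac) subtracts two nearly equal numbers and loses most of its significant digits. The standard remedy, sketched below, is to compute the root that does not cancel and get the other from the product of the roots, c/a:

```python
import math

def quadratic_naive(a, b, c):
    """Textbook quadratic formula -- the small root cancels badly."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def quadratic_stable(a, b, c):
    """Avoid cancellation: fold the discriminant in with the sign of b,
    then recover the other root from root1 * root2 = c / a."""
    disc = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(disc, b))
    return q / a, c / q

# x^2 + 1e8 x + 1 = 0: the true small root is about -1e-8.
print(quadratic_naive(1, 1e8, 1))   # small root suffers catastrophic cancellation
print(quadratic_stable(1, 1e8, 1))  # small root is accurate
```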
Off by one character
There was a discussion on Twitter today about a mistake calculus students make: I pointed out that it’s only off by one character: The first equation is simply wrong. The second is correct, but a gross violation of convention, using x as a constant and e as a variable.
New Twitter account: BasicStatistics
I’ve started a new Twitter account: @BasicStatistics. The new account is for people who are curious about statistics. It’s meant to be accessible to a wider audience than @DataSciFact. More Twitter accounts here.
Review of Matrix Mathematics
Bernstein’s Matrix Mathematics is impressive. It’s over 1500 pages and weighs 5.3 pounds (2.4 kg). It’s a reference book, not the kind of book you just sit down to read. (Actually, I have sat down to read parts of it.) I’d used a library copy of the first edition, and so when Princeton University Press […]
Moore-Penrose pseudoinverse is not an adjoint
The Moore-Penrose pseudoinverse of a matrix is a way of coming up with something like an inverse for a matrix that doesn’t have an inverse. If a matrix does have an inverse, then the pseudoinverse is in fact the inverse. The Moore-Penrose pseudoinverse is also called a generalized inverse for this reason: it’s not just […]
It’s like this other thing except …
One of my complaints about math writing is that definitions are hardly ever subtractive, even if that’s how people think of them. For example, a monoid is a group except without inverses. But that’s not how you’ll see it defined. Instead you’ll read that it’s a set with an associative binary operation and an identity […]
Obesity index: Measuring the fatness of probability distribution tails
A probability distribution is called “fat tailed” if its probability density goes to zero slowly. Slowly relative to what? That is often implicit and left up to context, but generally speaking the exponential distribution is the dividing line. Probability densities that decay faster than the exponential distribution are called “thin” or “light,” and densities that […]
Duffing equation for nonlinear oscillator
The Duffing equation is an ordinary differential equation describing a nonlinear damped driven oscillator. If the parameter μ were zero, this would be a damped driven linear oscillator. It’s the nonlinear x³ term that makes things nonlinear and interesting. Using an analog computer in 1961, Yoshisuke Ueda discovered that this system was chaotic. It was […]
Surface area of an egg
The first post in this series looked at a possible formula for the shape of an egg, how to fit the parameters of the formula, and the curvature of the shape at each end of the egg. The second post looked at the volume. This post looks at the surface area. If you rotate the […]
Volume of an egg
The previous post looked at an equation to fit the shape of an egg. In two dimensions we had In this post, we’ll rotate that curve around the x-axis to find the volume. Then we’ll see how it compares to that of an ellipsoid. If we rotate the graph of a function f(x) around the x-axis with x ranging […]
Equation to fit an egg
How would you fit an equation to the shape of an egg? This site suggests an equation of the form Note that if k = 0 we get an ellipse. The larger the parameter k is, the more asymmetric the shape is about the y-axis. Let’s try that out in Mathematica: ContourPlot[ x^2/16 + y^2 (1 + 0.1 […]
Viability of unpopular programming languages
I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was I bet more people use Perl 6 than Haskell, and it’s well known that people use Haskell. I looked at the TIOBE Index to see whether that’s true. I won’t argue how […]
Eight-bit floating point
Researchers have discovered that for some problems, deep neural networks (DNNs) can get by with low precision weights. Using fewer bits to represent weights means that more weights can fit in memory at once. This, as well as embedded systems, has renewed interest in low-precision floating point. Microsoft mentioned its proprietary floating point formats ms-fp8 and […]
Comparing range and precision of IEEE and posit
The IEEE standard 754-2008 defines several sizes of floating point numbers—half precision (binary16), single precision (binary32), double precision (binary64), quadruple precision (binary128), etc.—each with its own specification. Posit numbers, on the other hand, can be defined for any number of bits. However, the IEEE specifications share common patterns so that you could consistently define theoretical […]
Categorical Data Analysis
Categorical data analysis could mean a couple different things. One is analyzing data that falls into unordered categories (e.g. red, green, and blue) rather than numerical values (e.g. height in centimeters). Another is using category theory to assist with the analysis of data. Here “category” means something more sophisticated than a list of items you […]
Anatomy of a posit number
This post will introduce posit numbers, explain the interpretation of their bits, and discuss their dynamic range and precision. Posit numbers are a new way to represent real numbers for computers, an alternative to the standard IEEE floating point formats. The primary advantage of posits is the ability to get more precision or dynamic range out […]
Up arrow and down arrow notation
I recently ran into a tweet saying that if ** denotes exponentiation then // should denote logarithm. With this notation, for example, if we say 3**4 == 81 we would also say 81 // 3 == 4. This runs counter to convention since // has come to be a comment marker or a notation for integer […]
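The clash with Python’s actual conventions is easy to demonstrate: `//` is floor division, not a logarithm, so the inverse of `**` is `math.log` with an explicit base.

```python
import math

# In the tweet's notation, 3 ** 4 == 81 would pair with 81 "//" 3 == 4.
# In real Python, // is floor division:
print(81 // 3)            # 27, not 4
# The actual inverse of exponentiation is a logarithm with a given base:
print(math.log(81, 3))    # 4.0, up to floating point error
```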
Asymmetric surprise
Motivating example: planet spacing My previous post showed that planets are roughly evenly distributed on a log scale, not just in our solar system but also in extrasolar planetary systems. I hadn’t seen this before I stumbled on it by making some plots. I didn’t think it was an original discovery—I assume someone did this […]
Planets evenly spaced on log scale
The previous post was about Kepler’s observation that the planets were spaced out around the sun the same way that nested regular solids would be. Kepler only knew of six planets, which was very convenient because there are only five regular solids. In fact, Kepler thought there could only be six planets because there are only […]
Planets and Platonic solids
Johannes Kepler discovered in 1596 that the ratios of the orbits of the six planets known in his day were the same as the ratios between nested Platonic solids. Kepler was understandably quite impressed with this discovery and called it the Mysterium Cosmographicum. I heard of this in a course in the history of astronomy […]
Hypothesis testing vs estimation
I was looking at my daughter’s statistics homework recently, and there were a pair of questions about testing the level of lead in drinking water. One question concerned testing whether the water was safe, and the other concerned testing whether the water was unsafe. There’s something bizarre, even embarrassing, about this. You want to do […]
Curvature and automatic differentiation
Curvature is tedious to calculate by hand because it involves calculating first and second order derivatives. Of course other applications require derivatives too, but curvature is the example we’ll look at in this post. Computing derivatives It would be nice to write programs that only explicitly implement the original function and let software take care […]
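One way to let software take care of the derivatives is forward-mode automatic differentiation carried to second order: propagate (f, f′, f″) through each operation, then plug into the curvature formula κ = |f″| / (1 + f′²)^{3/2}. A minimal sketch supporting just +, *, and sin (the `Jet` class and its names are illustrative, not from the post):

```python
import math

class Jet:
    """Second-order forward-mode AD: carry (value, first, second derivative)."""
    def __init__(self, f, d1=0.0, d2=0.0):
        self.f, self.d1, self.d2 = f, d1, d2
    def __add__(self, other):
        other = other if isinstance(other, Jet) else Jet(other)
        return Jet(self.f + other.f, self.d1 + other.d1, self.d2 + other.d2)
    def __mul__(self, other):
        # Product rule and its derivative: (uv)'' = u''v + 2u'v' + uv''.
        other = other if isinstance(other, Jet) else Jet(other)
        return Jet(self.f * other.f,
                   self.d1 * other.f + self.f * other.d1,
                   self.d2 * other.f + 2 * self.d1 * other.d1 + self.f * other.d2)

def sin(j):
    return Jet(math.sin(j.f), math.cos(j.f) * j.d1,
               -math.sin(j.f) * j.d1 ** 2 + math.cos(j.f) * j.d2)

def curvature(func, x):
    """Curvature of y = func(x) at x, with derivatives computed automatically."""
    j = func(Jet(x, 1.0, 0.0))       # seed: dx/dx = 1, d2x/dx2 = 0
    return abs(j.d2) / (1 + j.d1 ** 2) ** 1.5

# y = x^2 has curvature 2 at the origin; y = sin(x) has curvature 1 at pi/2.
print(curvature(lambda x: x * x, 0.0))   # 2.0
print(curvature(sin, math.pi / 2))       # 1.0 (up to floating point)
```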