Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-04-26 03:16
Weakening the requirements of a group
A group is a set with a binary operation that is closed, is associative, has an identity, and has inverses. This post will look at each of these properties and what happens if you modify or remove it. Magmas Closed means that applying the binary operation to any two elements of the set yields another element of […]
How far is xy from yx on average for quaternions?
Given two quaternions x and y, the product xy might equal the product yx, but in general the two results are different. How different are xy and yx on average? That is, if you selected quaternions x and y at random, how big would you expect the difference xy – yx to be? Since this difference would increase proportionately if you increased the length of x […]
Sunny exponential sum
Today’s exponential sum is appropriate for a hot July day. Each day the exponential sum page shows a different sum, with the month, day, and year in the denominators. The graphs are formed by plotting the partial sums and connecting the dots. For example, today’s sum is based on the […]
Gauss’ golden theorem: quadratic reciprocity
Suppose you have an odd prime p and an integer a, with a not a multiple of p. Does a have a square root mod p? That is, does there exist an integer x such that x² is congruent to a mod p? Half the time the answer is yes, and half the time it’s no. (We will remove the restriction that p is prime near […]
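The "half the time" claim can be checked numerically. The following is a minimal sketch, not taken from the post: it uses Euler's criterion (a is a square mod an odd prime p exactly when a^((p-1)/2) ≡ 1 mod p), and the function names are hypothetical.

```python
def is_quadratic_residue(a: int, p: int) -> bool:
    """Euler's criterion, assuming p is an odd prime and p does not divide a."""
    return pow(a, (p - 1) // 2, p) == 1


def count_residues(p: int) -> int:
    """Count the nonzero residues mod p that have a square root mod p."""
    return sum(is_quadratic_residue(a, p) for a in range(1, p))


p = 23
print(count_residues(p), "of", p - 1, "nonzero residues are squares mod", p)
```

For any odd prime the count should come out to exactly (p - 1)/2.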
Does computer science help you program?
The relationship between programming and computer science is hard to describe. Purists will say that computer science has nothing to do with programming, but that goes too far. Computer science is about more than programming, but it’s all motivated by getting computers to do things. With few exceptions, students major in computer science in […]
Linear regression and planet spacing
A while back I wrote about how planets are evenly spaced on a log scale. I made a bunch of plots, based on our solar system and the extrasolar systems with the most planets, and noted that they’re all roughly straight lines. Here’s the plot for our solar system, including dwarf planets, with distance […]
Objectives and constraints
Objectives and constraints are symmetrical in a mathematical sense but are asymmetrical in a psychological sense. By taking dual formulations, you can reverse the mathematical role of objectives and constraints, but in application objectives are more obvious than constraints. In the question “What is the minimum value of x² over the interval [1, 5]?” the […]
Surprising curves with sine and sn
In the previous post I said that the Jacobi functions are like trig functions. That’s true if you look along the real axis. If you look at the rest of the complex plane you’ll see how they can be very different. The sine function is periodic along the real axis, but it grows exponentially along […]
System of Jacobi elliptic functions
Jacobi’s elliptic functions are sorta like trig functions. His functions sn and cn have names that are reminiscent of sine and cosine for good reason. These functions come up in applications such as the nonlinear pendulum (i.e. when θ is too large to assume θ is a good enough approximation to sin θ) and in conformal […]
Commutative multiplication of triples
The complex numbers make a field out of pairs of real numbers. The quaternions almost make a field out of four-tuples of numbers, though multiplication is not commutative. Technically, quaternions form a division algebra. Frobenius’s theorem says the only real vector spaces that can be made into division algebras are the real numbers, complex numbers, and […]
Three things about dominoes
Here are three things about dominoes, two easy and one more advanced. Counting First, how many pieces are there in a set of dominoes? A domino corresponds to an unordered pair of numbers from 0 to n. The most popular form has n = 6, but there are variations with other values of n. You can show that […]
Magical learning
I asked two questions on Twitter yesterday. The previous post summarized the results for a question about books that I asked from my personal Twitter account. This post will summarize the results of a question I asked from @AnalysisFact. If a genie offered to give you a thorough understanding of one theorem, what theorem would […]
Books you’d like to have read
I asked on Twitter today for books that people would like to have read, but don’t want to put in the time and effort to read. What’s a book you would like to have read but don’t want to read? — John D. Cook (@JohnDCook) June 15, 2018 Here are the responses I got, organized […]
Low-rank matrix perturbations
Here are a couple of linear algebra identities that can be very useful, but aren’t that widely known, somewhere between common knowledge and arcane. Neither result assumes any matrix has low rank, but their most common application, at least in my experience, is in the context of something of low rank added to something of […]
Almost prime generators and almost integers
Here are two apparently unrelated things you may have seen before. The first is an observation going back to Euler that the polynomial n² − n + 41 produces a long sequence of primes. Namely, the values are prime for n = 1, 2, 3, …, 40. The second is that the number e^(π√163) is extraordinarily close to an integer. This number […]
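The prime-generating claim is easy to verify by brute force. This sketch assumes the polynomial in question is Euler's n² − n + 41, which fits the stated range n = 1, …, 40; the helper names are hypothetical.

```python
def is_prime(m: int) -> bool:
    """Trial division; fine for the small values used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def euler_poly(n: int) -> int:
    # Assumed form of the polynomial: n^2 - n + 41
    return n * n - n + 41


print(all(is_prime(euler_poly(n)) for n in range(1, 41)))  # values for n = 1..40
print(is_prime(euler_poly(41)))  # n = 41 gives 41^2, which is composite
```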
US flag if California splits into three states
There’s a proposal for California to split into three states. If that happens, what would happen to the US flag? The US flag has had 13 stripes from the beginning, representing the first 13 states. The number of stars has increased over time as the number of states has increased. Currently there are 50 stars, […]
Partition numbers and Ramanujan’s approximation
The partition function p(n) counts the number of ways n unlabeled things can be partitioned into non-empty sets. (Contrast with Bell numbers that count partitions of labeled things.) There’s no simple expression for p(n), but Ramanujan discovered a fairly simple asymptotic approximation: p(n) ~ exp(π√(2n/3)) / (4n√3). How accurate is this approximation? Here’s a little Mathematica code to see. p[n_] := PartitionsP[n] approx[n_] […]
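A rough Python counterpart to the comparison, for readers without Mathematica. It is a sketch under the assumption that the approximation meant is the Hardy-Ramanujan formula exp(π√(2n/3)) / (4n√3); the exact partition numbers come from a standard dynamic-programming count, and the function names are hypothetical.

```python
import math


def partition_numbers(n_max: int) -> list:
    """Return [p(0), ..., p(n_max)] by adding one allowed part size at a time."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p


def ramanujan_approx(n: int) -> float:
    # Assumed form of the asymptotic approximation
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))


p = partition_numbers(1000)
for n in (10, 100, 1000):
    print(n, p[n], ramanujan_approx(n) / p[n])
```

The printed ratio should drift toward 1 as n grows, which is what "asymptotic" promises.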
Perl as a better grep
I like Perl’s pattern matching features more than Perl as a programming language. I’d like to take advantage of the former without having to go any deeper than necessary into the latter. The book Minimal Perl is useful in this regard. It has chapters on Perl as a better grep, a better awk, a better […]
Mathematics of Deep Note
I just finished listening to the latest episode of Twenty Thousand Hertz, the story behind “Deep Note,” the THX logo sound. There are a couple mathematical details of the sound that I’d like to explore here: random number generation, and especially Pythagorean tuning. Random number generation First is that part of the construction of the […]
Stirling numbers, including negative arguments
Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there. Stirling numbers of the second kind The Stirling number of the second […]
Tetrahedral numbers
Start with the sequence of positive integers: 1, 2, 3, 4, … Now take partial sums, the nth term of the new series being the sum of the first n terms of the previous series. This gives us the triangular numbers, so called because they count the number of coins at each level of a […]
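The partial-sum construction maps directly onto `itertools.accumulate`. A small sketch: one pass gives the triangular numbers, and a second pass over those gives the tetrahedral numbers of the title.

```python
from itertools import accumulate

n = 6
integers = list(range(1, n + 1))           # 1, 2, 3, ...
triangular = list(accumulate(integers))    # partial sums of the integers
tetrahedral = list(accumulate(triangular)) # partial sums of the triangular numbers

print(triangular)
print(tetrahedral)
```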
Bell numbers
The nth Bell number is the number of ways to partition a set of n labeled items. It’s also equal to the following sum: B_n = (1/e) Σ k^n / k!, summed over k = 0, 1, 2, …. You may have to look at that sum twice to see it correctly. It looks a lot like the sum for e^n except the roles of k and n are reversed in […]
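Here is a sketch comparing two ways of getting Bell numbers. It assumes the sum in question is Dobinski's formula, B_n = (1/e) Σ k^n / k!, and checks a truncated version of it against exact values from the Bell triangle. Function names and the truncation point are illustrative choices.

```python
import math


def bell_numbers(n_max: int) -> list:
    """Exact B_0..B_n_max via the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n_max):
        new_row = [row[-1]]            # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


def dobinski(n: int, terms: int = 60) -> float:
    """Truncated Dobinski sum; 60 terms is an arbitrary but ample cutoff for small n."""
    return sum(k**n / math.factorial(k) for k in range(terms)) / math.e


exact = bell_numbers(10)
for n in (3, 5, 10):
    print(n, exact[n], dobinski(n))
```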
Relative error in the central limit theorem
If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply. See notes here.) But […]
Central limit theorem and Runge phenomena
I was playing around with something this afternoon and stumbled on something like Gibbs phenomena or Runge phenomena for the Central Limit Theorem. The first place most people encounter Gibbs phenomena is in Fourier series for a step function. The Fourier series develops “bat ears” near the discontinuity. Here’s an example I blogged about before […]
Computing extreme normal tail probabilities
Let me say up front that relying on the normal distribution as an accurate model of extreme events is foolish under most circumstances. The main reason to calculate the probability of, say, a 40 sigma event is to show how absurd it is to talk about 40 sigma events. See my previous post on six-sigma […]
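The probability of a 40 sigma event is far smaller than the smallest positive double, so evaluating the tail directly underflows to zero. One workaround, sketched here and not necessarily the method used in the post, is to work with the logarithm of the tail using the first few terms of the standard asymptotic series, P(Z > x) ≈ φ(x)/x · (1 − 1/x² + 3/x⁴). The function name is hypothetical.

```python
import math


def log10_normal_tail(x: float) -> float:
    """Approximate log10 of P(Z > x) for large positive x via an asymptotic series."""
    series = 1 - 1 / x**2 + 3 / x**4
    ln_tail = (-0.5 * x * x
               - math.log(x)
               - 0.5 * math.log(2 * math.pi)
               + math.log(series))
    return ln_tail / math.log(10)


# Sanity check against a direct computation where the direct one still works
direct = math.log10(0.5 * math.erfc(6 / math.sqrt(2)))
print(direct, log10_normal_tail(6))

# The 40 sigma case, reported as a power of 10
print(log10_normal_tail(40))
```

The series is only trustworthy for large x; at x = 6 it already agrees with the direct value to about three decimal places in the exponent.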
Six sigma events
I saw on Twitter this afternoon a paraphrase of a quote from Nassim Taleb to the effect that if you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event. What does that mean? Six sigma means six standard deviations away from the mean of a probability distribution, sigma (σ) being the […]
Calendars and continued fractions
Calendars are based on three frequencies: the rotation of the Earth on its axis, the rotation of the moon around the Earth, and the rotation of the Earth around the sun. Calendars are complicated because none of these periods is a simple multiple of any other. The ratios are certainly not integers, but they’re not […]
Computing smooth max without overflow
Erik Erlandson sent me a note saying he found my post on computing the soft maximum helpful. (If you’re unfamiliar with the soft maximum, here’s a brief description of what it is and how you might use it.) Erik writes I used your post on practical techniques for computing smooth max, which will probably be […]
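For reference, the soft maximum of x and y is log(exp(x) + exp(y)), and the naive formula overflows for large arguments. A minimal sketch of one standard fix, factoring out the larger argument, follows. This is a common technique and may differ in detail from what the post describes.

```python
import math


def soft_max(x: float, y: float) -> float:
    """log(exp(x) + exp(y)), computed so that exp never sees a large positive argument."""
    m = max(x, y)
    # exp(-|x - y|) is always in (0, 1], so it cannot overflow
    return m + math.log1p(math.exp(-abs(x - y)))


print(soft_max(1.0, 2.0))
print(soft_max(1000.0, 1000.0))  # the naive formula would overflow here
```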
Proving life exists on Earth
NASA’s Galileo mission was primarily designed to explore Jupiter and its moons. In 1989, the Galileo probe started out traveling away from Jupiter in order to do a gravity assist swing around Venus. About a year later it also did a gravity assist maneuver around Earth. Carl Sagan suggested that when passing Earth, the Galileo […]
Combinatorics, just beyond the basics
Most basic combinatorial problems can be solved in terms of multiplication, permutations, and combinations. The next step beyond the basics, in my experience, is counting selections with replacement. Often when I run into a problem that is not quite transparent, it boils down to this. Examples of selection with replacement Here are three problems that […]
10 best rational approximations for pi
It’s easy to create rational approximations for π. Every time you write down π to a few decimal places, that’s a rational approximation. For example, 3.14 = 314/100. But that’s not the best approximation. Think of the denominator of your fraction as something you have to buy. If you have enough budget to buy a three-digit […]
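The "denominator budget" idea can be tried out with Python's standard library. `Fraction.limit_denominator` returns the closest fraction whose denominator does not exceed a given bound; the wrapper name below is hypothetical, and `math.pi` is of course only a double-precision stand-in for π.

```python
from fractions import Fraction
import math


def best_rational(x: float, max_den: int) -> Fraction:
    """Closest fraction to x with denominator at most max_den."""
    return Fraction(x).limit_denominator(max_den)


# Budgets of one, two, and three digits for the denominator
for max_den in (9, 99, 999):
    approx = best_rational(math.pi, max_den)
    print(max_den, approx, abs(float(approx) - math.pi))
```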
Fixed points of logistic function
Here’s an interesting problem that came out of a logistic regression application. The input variable was between 0 and 1, and someone asked when and where the logistic transformation f(x) = 1/(1 + exp(a + bx)) has a fixed point, i.e. f(x) = x. So given logistic regression parameters a and b, when does the logistic curve given by y […]
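Since 0 < f(x) < 1 for every x, the difference f(x) − x is positive at x = 0 and negative at x = 1, so there is always at least one fixed point in (0, 1). The sketch below finds one by bisection. It is an illustration under that observation, not the post's analysis, and for some parameter values there may be more than one fixed point.

```python
import math


def logistic(x: float, a: float, b: float) -> float:
    return 1 / (1 + math.exp(a + b * x))


def fixed_point(a: float, b: float, tol: float = 1e-12) -> float:
    """Bisection on g(x) = f(x) - x over [0, 1]; g(0) > 0 and g(1) < 0 always hold."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if logistic(mid, a, b) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = fixed_point(1.0, 2.0)
print(x, logistic(x, 1.0, 2.0))
```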
Line art
A new video from 3Blue1Brown is about visualizing derivatives as stretching and shrinking factors. Along the way they consider the function f(x) = 1 + 1/x. Iterations of f converge on the golden ratio, no matter where you start (with one exception). The video creates a graph where they connect values of x on one […]
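The convergence is quick to see numerically. A minimal sketch, with arbitrary starting points chosen for illustration:

```python
def iterate(f, x0: float, n: int) -> float:
    """Apply f to x0 n times."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x


golden = (1 + 5**0.5) / 2

for start in (3.0, 0.1, -5.0):
    print(start, iterate(lambda x: 1 + 1 / x, start, 50), golden)
```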
Causal inference and cryptic syntax
I just made one of those O’Reilly parody book covers. It’s a joke on Judea Pearl, expert in causal inference, and the Perl programming language, known for its unusual, terse syntax.
Making a career out of the chain rule
When I was a teenager, my uncle gave me a calculus book and told me that mastering calculus was the most important thing I could do for starting out in math. So I learned the basics of calculus from that book. Later I read Michael Spivak’s two calculus books. I took courses that built on […]
Robustness and tests for equal variance
The two-sample t-test is a way to test whether two data sets come from distributions with the same mean. I wrote a few days ago about how the test performs under ideal circumstances, as well as less than ideal circumstances. This is an analogous post for testing whether two data sets come from distributions with the same […]
Ellipsoid geometry and Haumea
To first approximation, Earth is a sphere. A more accurate description is that the earth is an oblate spheroid, the polar axis being a little shorter than the equatorial diameter. See details here. Other planets are oblate spheroids as well. Jupiter is further from spherical than the earth is; that is, it is more oblate. The general equation […]
Two-sample t-test and robustness
A two-sample t-test is intended to determine whether there’s evidence that two samples have come from distributions with different means. The test assumes that both samples come from normal distributions. Robust to non-normality, not to asymmetry It is fairly well known that the t-test is robust to departures from a normal distribution, as long as the actual […]
Spectral sparsification
The latest episode of My Favorite Theorem features John Urschel, former offensive lineman for the Baltimore Ravens and current math graduate student. His favorite theorem is a result on graph approximation: for every weighted graph, no matter how densely connected, it is possible to find a sparse graph whose Laplacian approximates that of the original […]
Reciprocals of primes
Here’s an interesting little tidbit: For any prime p except 2 and 5, the decimal expansion of 1/p repeats with a period that divides p-1. The period could be as large as p-1, but no larger. If it’s less than p-1, then it’s a divisor of p-1. Here are a few examples. 1/3 = 0.33… […]
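The period of 1/p is the multiplicative order of 10 mod p, that is, the smallest k with 10^k ≡ 1 (mod p), which makes the claim easy to check. A small sketch with a hypothetical function name:

```python
def decimal_period(p: int) -> int:
    """Smallest k with 10^k ≡ 1 (mod p); assumes p is a prime other than 2 or 5."""
    k, r = 1, 10 % p
    while r != 1:
        r = (r * 10) % p
        k += 1
    return k


for p in (3, 7, 11, 13, 17, 19, 23):
    period = decimal_period(p)
    print(p, period, (p - 1) % period == 0)
```

Primes such as 7 attain the maximum period p − 1, while 3, 11, and 13 fall short of it.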
Rise and fall of the Windows Empire
This morning I ran across the following graph via Horace Dediu. I developed Windows software during the fattest part of the Windows curve. That was a great time to be in the Windows ecosystem. Before that I was in an academic bubble. My world consisted primarily of Macs and various flavors of Unix. I had […]
Robust statistics
P. J. Huber gives three desiderata for a statistical method in his book Robust Statistics: It should have a reasonably good (optimal or nearly optimal) efficiency at the assumed model. It should be robust in the sense that small deviations from the model assumptions should impair the performance only slightly. Somewhat larger deviations from the […]
Optimal low-rank matrix approximation
Matrix compression Suppose you have an m by n matrix A, where m and n are very large, that you’d like to compress. That is, you’d like to come up with an approximation of A that takes less data to describe. For example, consider a high resolution photo as a matrix of gray scale values. An approximation to the matrix […]
Least squares solutions to over- or underdetermined systems
It often happens in applications that a linear system of equations Ax = b either does not have a solution or has infinitely many solutions. Applications often use least squares to create a problem that has a unique solution. Overdetermined systems Suppose the matrix A has dimensions m by n and the right hand side vector b has dimension m. Then the […]
Computing SVD and pseudoinverse
In a nutshell, given the singular value decomposition of a matrix A, A = UΣV*, the Moore-Penrose pseudoinverse is given by A⁺ = VΣ⁺U*. This post will explain what the terms above mean, and how to compute them in Python and in Mathematica. Singular Value Decomposition (SVD) The singular value decomposition of a matrix is a sort of change of coordinates that makes […]
Probit regression
The previous post looked at how probability predictions from a logistic regression model vary as a function of the fitted parameters. This post goes through the same exercise for probit regression and compares the two kinds of nonlinear regression. Generalized linear models and link functions Logistic and probit regression are minor variations on a theme. […]
Sensitivity of logistic regression prediction on coefficients
The output of a logistic regression model is a function that predicts the probability of an event as a function of the input parameter. This post will only look at a simple logistic regression model with one predictor, but similar analysis applies to multiple regression with several predictors. Here’s a plot of such a curve […]
Tridiagonal systems, determinants, and natural cubic splines
Tridiagonal matrices A tridiagonal matrix is a matrix that has nonzero entries only on the main diagonal and on the adjacent off-diagonals. This special structure comes up frequently in applications. For example, the finite difference numerical solution to the heat equation leads to a tridiagonal system. Another application, the one we’ll look at in detail […]
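The special structure pays off because a tridiagonal system can be solved in O(n) time. The sketch below uses the Thomas algorithm, a specialized Gaussian elimination that is a standard choice but is not confirmed as the post's method. It does no pivoting, so it assumes a well-behaved (for example, diagonally dominant) matrix. The argument layout is an illustrative convention.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system.

    lower[i] is the subdiagonal entry in row i + 1,
    diag[i] is the diagonal entry in row i,
    upper[i] is the superdiagonal entry in row i.
    """
    n = len(diag)
    c = [0.0] * n  # modified superdiagonal
    d = [0.0] * n  # modified right-hand side
    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the subdiagonal
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# The matrix with 2 on the diagonal and -1 beside it, as in finite differences
print(solve_tridiagonal([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [0.0, 0.0, 4.0]))
```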
Probability of coprime sets
The latest blog post from Gödel’s Lost Letter and P=NP looks at the problem of finding relatively prime pairs of large numbers. In particular, they want a deterministic algorithm. They mention in passing that the probability of a set of k large integers being relatively prime (coprime) is 1/ζ(k) where ζ is the Riemann zeta function. This […]
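For k = 2 the formula gives 1/ζ(2) = 6/π² ≈ 0.6079. The sketch below is an illustrative check only: it counts coprime pairs exhaustively over a small range rather than sampling large integers, so the agreement is approximate.

```python
import math


def coprime_fraction(n: int) -> float:
    """Fraction of pairs (i, j) with 1 <= i, j <= n that are relatively prime."""
    hits = sum(
        1
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if math.gcd(i, j) == 1
    )
    return hits / n**2


print(coprime_fraction(200), 6 / math.pi**2)
```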
The quadratic formula and low-precision arithmetic
What could be interesting about the lowly quadratic formula? It’s a formula after all. You just stick numbers into it. Well, there’s an interesting wrinkle. When the linear coefficient b is large relative to the other coefficients, the quadratic formula can give wrong results when implemented in floating point arithmetic. Quadratic formula and loss of precision The […]
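To make the wrinkle concrete, here is a sketch contrasting the textbook formula with a commonly used rearrangement that avoids subtracting nearly equal numbers. The rearrangement is a standard numerical technique and may not match the post's exact presentation. It assumes real roots, a ≠ 0, and that b and the discriminant are not both zero.

```python
import math


def quadratic_roots_naive(a: float, b: float, c: float):
    """Textbook formula; loses precision when b*b is much larger than 4*a*c."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)


def quadratic_roots_stable(a: float, b: float, c: float):
    """Add b and sqrt(disc) with matching signs, then get the other root from c / q."""
    disc = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(disc, b))
    return q / a, c / q


# Large linear coefficient: the true small root is very close to -1e-8
print(quadratic_roots_naive(1.0, 1e8, 1.0))
print(quadratic_roots_stable(1.0, 1e8, 1.0))
```

The second root relies on the fact that the product of the roots is c/a, so no cancellation is needed to compute it.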