Feed john-d-cook John D. Cook

Favorite IconJohn D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-11-23 08:46
No critical point between two peaks
If a function of one variable has two local maxima, it must have a local minimum in between. What about a function of two variables? If it has two local maxima, does it need to have a local minimum? No, it could have a saddle point in between, a point that is a local minimum […]
Clean obfuscated code
One way to obfuscate code is clever use of arcane programming language syntax. Hackers are able to write completely unrecognizable code by exploiting dark corners of programming techniques and languages. Some of these attempts are quite impressive. But it’s also possible to write clean source code that is nevertheless obfuscated. For example, it’s not at […]
How many musical scales are there?
How many musical scales are there? That’s not a simple question. It depends on how you define “scale.” For this post, I’ll only consider scales starting on C. That is, I’ll only consider changing the intervals between notes, not changing the starting note. Also, I’ll only consider subsets of the common chromatic scale; this post […]
Toxic pairs, re-identification, and information theory
Database fields can combine in subtle ways. For example, nationality is not usually enough to identify anyone. Neither is religion. But the combination of nationality and religion can be surprisingly informative. Information content of nationality How much information is contained in nationality? That depends on exactly how you define nations versus territories etc., but for […]
Chaos and the beta distribution
Iteration of the quadratic function f(x) = 4x(1-x) is a famous example in chaos theory. Here’s what the first few iterations look like, starting with 1/√3. (There’s nothing special about that starting point; any point that doesn’t iterate to exactly zero will do.) The values appear to bounce all over the place. Let’s look at a […]
Cellular automata with random initial conditions
The previous post looked at a particular cellular automaton, the so-called Rule 90. When started with a single pixel turned on, it draws a Sierpinski triangle. With random starting pixels, it draws a semi-random pattern that retains features like the Sierpinski triangle. There are only 256 possible elementary cellular automata, so it’s practical to plot […]
Sierpinski triangle strikes again
A couple months ago I wrote about how a simple random process gives rise to the Sierpinski triangle. Draw an equilateral triangle and pick a random point in the plane. Repeatedly pick a triangle vertex at random and move half way from the current position to that vertex. The result converges to a Sierpinksi triangle. […]
A cryptographically secure random number generator
A random number generator can have excellent statistical properties and yet not be suited for use in cryptography. I’ve written a few posts to demonstrate this. For example, this post shows how to discover the seed of an LCG random number generator. This is not possible with a secure random number generator. Or more precisely, […]
Aerial video of Hurricane Harvey aftermath and cleanup
Video by my friend Aaron Benzel showing the debris and cleanup typical of neighborhoods that flooded in Harvey.
Adding Laplace or Gaussian noise to database for privacy
In the previous two posts we looked at a randomization scheme for protecting the privacy of a binary response. This post will look briefly at adding noise to continuous or unbounded data. I like to keep the posts here fairly short, but this topic is fairly technical. To keep it short I’ll omit some of […]
Quantifying privacy loss in a statistical database
In the previous post we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post we’ll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off. More general randomized response Suppose we have a binary response […]
Randomized response, privacy, and Bayes theorem
Suppose you want to gather data on an incriminating question. For example, maybe a statistics professor would like to know how many students cheated on a test. Being a statistician, the professor has a clever way to find out what he wants to know while giving each student deniability. Randomized response Each student is asked […]
Why don’t you simply use XeTeX?
From an FAQ post I wrote a few years ago: This may seem like an odd question, but it’s actually one I get very often. On my TeXtip twitter account, I include tips on how to create non-English characters such as using \AA to produce Å. Every time someone will ask “Why not use XeTeX and just […]
Pascal’s triangle and Fermat’s little theorem
I was listening to My Favorite Theorem when Jordan Ellenberg said something in passing about proving Fermat’s little theorem from Pascal’s triangle. I wasn’t familiar with that, and fortunately Evelyn Lamb wasn’t either and so she asked him to explain. Fermat’s little theorem says that for any prime p, then for any integer a, ap = a […]
Making a problem easier by making it harder
In the oral exam for my PhD, my advisor asked me a question about a differential equation. I don’t recall the question, but I remember the interaction that followed. I was stuck, and my advisor countered by saying “Let me ask you a harder question.” I was still stuck, and so he said “Let me […]
Quantifying the information content of personal data
It can be surprisingly easy to identify someone from data that’s not directly identifiable. One commonly cited result is that the combination of birth date, zip code, and sex is enough to identify most people. This post will look at how to quantify the amount of information contained in such data. If the answer to […]
Negative correlation introduced by success
Suppose you measure people on two independent attributes, X and Y, and take those for whom X+Y is above some threshold. Then even though X and Y are uncorrelated in the full population, they will be negatively correlated in your sample. This article gives the following example. Suppose beauty and acting ability were uncorrelated. Knowing how […]
Highly cited theorems
Some theorems are cited far more often than others. These are not the most striking theorems, not the most advanced or most elegant, but ones that are extraordinarily useful. I first noticed this when taking complex analysis where the Cauchy integral formula comes up over and over. When I first saw the formula I thought […]
Width of mixture PDFs
Suppose you draw samples from two populations, one of which has a wider probability distribution than the other. How does the width of the distribution of the combined sample vary as you change the proportions of the two populations? The extremes are easy. If you pick only from one population, then the resulting distribution will […]
Team dynamics and encouragement
When you add people to a project, the total productivity of the team as a whole may go up, but the productivity per person usually goes down. Someone suggested that as a rule of thumb, a company needs to triple its number of employees to double its productivity. Fred Brooks summarized this saying “Many hands […]
Relearning from a new perspective
I had a conversation with someone today who said he’s relearning logic from a categorical perspective. What struck me about this was not the specifics but the pattern: Relearning _______ from a _______ perspective. Not relearning something forgotten, but going back over something you already know well, but from a different starting point, a different […]
Hurricane Harvey update
As you may know, I live in the darkest region of the rainfall map below. My family and I are doing fine. Our house has not flooded, and at this point it looks like it will not flood. We’ve only lost electricity for a second or two. Of course not everyone in Houston is doing […]
Defining the Fourier transform on LCA groups
My previous post said that all the familiar variations on Fourier transforms—Fourier series analysis and synthesis, Fourier transforms on the real line, discrete Fourier transforms, etc.—can be unified into a single theory. They’re all instances of a Fourier transform on a locally compact Abelian (LCA) group. The difference between them is the underlying group. Given […]
Unified theory of Fourier transforms
You can take a periodic function and analyze it into its Fourier coefficients, or use the Fourier coefficients in a sum to synthesize a periodic function. You can take the Fourier transform of a function defined on the whole real line and get another such function. And you can compute the discrete Fourier transform via […]
Solving problems we wish we had
There’s a great line from Heather McGaw toward the end of the latest episode of 99 Percent Invisible: Sometimes … we can start to solve problems that we wish were problems because they’re easy to solve. Reminds me of an excerpt from Richard Weaver’s book Ideas Have Consequences: Obsession, according to the canons of psychology, […]
Predicting when an RNG will output a given value
A few days ago I wrote about how to pick the seed of a simple random number generator so that a desired output came n values later. The number n was fixed and we varied the seed. In this post, the seed will be fixed and we’ll solve for n. In other words, we ask when a […]
Programming language life expectancy
The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in press for a century is likely to be in press for another century. […]
Reverse engineering the seed of a linear congruential generator
The previous post gave an example of manipulating the seed of a random number generator to produce a desired result. This post will do something similar for a different generator. A couple times I’ve used the following LCG (linear congruential random number generator) in examples. An LCG starts with an initial value of z and updates z […]
Manipulating a random number generator
With some random number generators, it’s possible to select the seed carefully to manipulate the output. Sometimes this is easy to do. Sometimes it’s hard but doable. Sometimes it’s theoretically possible but practically impossible. In my recent correspondence with Melissa O’Neill, she gave me an example that seeds a random number generator so that the […]
Testing RNGs with PractRand
PractRand is a random number generator test suite, somewhat like the DIEHARDER and NIST tests I’ve written about before, but more demanding. Rather than running to completion, it runs until it a test fails with an infinitesimally small p-value. It runs all tests at a given sample size, then doubles the sample and runs the tests again. […]
Random minimum spanning trees
I just ran across a post by John Baez pointing to an article by Alan Frieze on random minimum spanning trees. Here’s the problem. Create a complete graph with n nodes, i.e. connect every node to every other node. Assign each edge a uniform random weight between 0 and 1. Find the minimum spanning tree. Add up […]
Selecting things in Emacs
You can select blocks of text in Emacs just as you would in most other environments. You could, for example, drag your mouse over a region. You could also hold down the Shift key and use arrow keys. But Emacs also has a number of commands that let you work in larger semantic units. That […]
Random walk on quaternions
The previous post was a riff on a tweet asking what you’d get if you extracted all the i‘s, j‘s, and k‘s from Finnegans Wake and multiplied them as quaternions. This post is a probabilistic variation on the previous one. If you randomly select a piece of English prose, extract the i‘s, j‘s, and k‘s, and multiply them together as quaternions, what […]
Wolfram Alpha, Finnegans Wake, and Quaternions
I stumbled on a Twitter account yesterday called Wolfram|Alpha Can’t. It posts bizarre queries that Wolfram Alpha can’t answer. Here’s one that caught my eye. result of extracting the i’s, j’s, and k’s in order from Finnegans Wake and interpreting as a quaternion product — Wolfram|Alpha Can’t (@wacnt) May 17, 2017 Suppose you did extract […]
The cross polytope
There are five regular solids in three dimensions: tetrahedron octahedron (pictured above) hexahedron (cube) dodecahedron icosahedron. I give a proof here that these are the only five. The first three of these regular solids generalize to all dimensions, and these generalizations are the only regular solids in dimensions 5 and higher. (There are six regular […]
Bayesian methods at Bletchley Park
From Nick Patterson’s interview on Talking Machines: GCHQ in the ’70s, we thought of ourselves as completely Bayesian statisticians. All our data analysis was completely Bayesian, and that was a direct inheritance from Alan Turing. I’m not sure this has ever really been published, but Turing, almost as a sideline during his cryptoanalytic work, reinvented […]
Sphere packing
The previous couple blog posts touched on a special case of sphere packing. We looked at the proportion of volume contained near the corners of a hypercube. If you take the set of points within a distance 1/2 of a corner of a hypercube, you could rearrange these points to form a full ball centered […]
Is most volume in the corners or not?
I’ve written a couple blog posts that may seem to contradict each other. Given a high-dimensional cube, is most of the volume in the corners or not? I recently wrote that the corners of a cube stick out more in high dimensions. You can quantify this by centering a ball at a corner and looking […]
Corners stick out more in high dimensions
High-dimensional geometry is full of surprises. For example, nearly all the area of a high-dimensional sphere is near the equator, and by symmetry it doesn’t matter which equator you take. Here’s another surprise: corners stick out more in high dimensions. Hypercubes, for example, become pointier as dimension increases. How might we quantify this? Think of […]
Math diagrams updated
I updated several of the math diagrams on this site today. They’re SVG now, so they resize nicely if you want to zoom in our out. Special functions Topological vector spaces Category theory concepts General topology Gamma function identities
Discrete example of concentration of measure
The previous post looked at a continuous example of concentration of measure. As you move away from a thin band around the equator, the remaining area in the rest of the sphere decreases as an exponential function of the dimension and the distance from the equator. This post will show a very similar result for […]
Nearly all the area in a high-dimensional sphere is near the equator
Nearly all the area of a high-dimensional sphere is near the equator. And by symmetry, it doesn’t matter which equator you take. Draw any great circle and nearly all of the area will be near that circle. This is the canonical example of “concentration of measure.” What exactly do we mean by “nearly all the […]
DIEHARDER random number generator test results for PCG and MWC
A few days ago I wrote about testing the PCG random number generator using the DIEHARDER test suite. In this post I’ll go into a little more background on this random number generator test suite. I’ll also show that like M. E. O’Neill’s PCG (“permuted congruential generator”), George Marsaglia’s MWC (“multiply with carry”) generator does quite […]
The chaos game and the Sierpinski triangle
The chaos game is played as follows. Pick a starting point at random. Then at each subsequent step, pick a triangle vertex at random and move half way from the current position to that vertex. The result looks like a fractal called the Sierpinski triangle or Sierpinski gasket. Here’s an example: If the random number […]
Testing the PCG random number generator
M. E. O’Neill’s PCG family of random number generators looks very promising. It appears to have excellent statistical and cryptographic properties. And it takes remarkably little code to implement. (PCG stands for Permuted Congruential Generator.) The journal article announcing PCG gives the results of testing it with the TestU01 test suite. I wanted to try it out […]
Simple random number generator does surprisingly well
I was running the NIST statistical test suite recently. I wanted an example of a random number generator where the tests failed, and so I used a simple generator, a linear congruence generator. But to my surprise, the generator passed nearly all the tests, even though some more sophisticated generators failed some of the same […]
Least common multiple of the first n positive integers
Here’s a surprising result: The least common multiple of the first n positive integers is approximately exp(n). More precisely, let φ(n) equal the log of the least common multiple of the numbers 1, 2, …, n. There are theorems that give upper and lower bounds on how far φ(n) can be from n. We won’t prove or […]
Subscribing by email
You can subscribe to my blog by email or RSS. I also have a brief newsletter you could sign up for. There are links to these in the sidebar of the blog: If you subscribe by email, you’ll get an email each morning containing the post(s) from the previous day. I just noticed a problem […]
Effective sample size for MCMC
In applications we’d like to draw independent random samples from complicated probability distributions, often the posterior distribution on parameters in a Bayesian analysis. Most of the time this is impractical. MCMC (Markov Chain Monte Carlo) gives us a way around this impasse. It lets us draw samples from practically any probability distribution. But there’s a […]
Quicksort and prime numbers
The average number of operations needed for quicksort to sort a list of n items is approximately 10 times the nth prime number. Here’s some data to illustrate this. |------+-----------------+---------| | n | avg. operations | 10*p(n) | |------+-----------------+---------| | 100 | 5200.2 | 5410 | | 200 | 12018.3 | 12230 | | 300 | 19446.9 […]
...41424344454647484950...