Feed john-d-cook John D. Cook


Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-12-01 09:47
Chebyshev interpolation
Fitting a polynomial to a function at more points might not produce a better approximation. This is Faber’s theorem, something I wrote about the other day. If the function you’re interpolating is smooth, then interpolating at more points may or may not improve the fit of the interpolation, depending on where you put the points. […]
Fourier-Bessel series and Gibbs phenomena
Fourier-Bessel series are analogous to Fourier series. And like Fourier series, they converge pointwise near a discontinuity with the same kind of overshoot and undershoot known as the Gibbs phenomenon. Fourier-Bessel series Bessel functions come up naturally when working in polar coordinates, just as sines and cosines come up naturally when working in rectangular coordinates. […]
Animated exponential sum
I’m experimenting with making animated versions of the kinds of images I wrote about in my previous post. Here’s an animated version of the exponential sum of the day for 12/4/17. Why that date? I wanted to start with something with a fairly small period, and that one looked interesting. I’ll have to do something […]
Database anonymization for testing
How do you create a database for testing that is like your production database? It depends on in what way you want the test database to be “like” the production one. Replacing sensitive data Companies often use an old version of their production database for testing. But what if the production database has sensitive information […]
Recent exponential sums
The exponential sum of the day draws a line between consecutive partial sums of where m, d, and y are the current month, day, and two-digit year. The four most recent images show how different these plots can be. These images are from 10/30/17, 10/31/17, 11/1/17, and 11/2/17. Consecutive dates often produce very different images for a couple […]
Yogi Berra meets Pafnuty Chebyshev
I just got an evaluation copy of The Best Writing on Mathematics 2017. My favorite chapter was Inverse Yogiisms by Lloyd N. Trefethen. Trefethen gives several famous Yogi Berra quotes and concludes that Yogiisms are statements that, if taken literally, are meaningless or contradictory or nonsensical or tautological—yet nevertheless convey something true. An inverse yogiism is […]
The most disliked programming language
According to this post from Stack Overflow, Perl is the most disliked programming language. I have fond memories of writing Perl, though it’s been a long time since I used it. I mostly wrote scripts for file munging, the task it does best, and never had to maintain someone else’s Perl code. Under different circumstances […]
Poisson distribution and prime numbers
Let ω(n) be the number of distinct prime factors of n. A theorem of Landau says that for large N and randomly selected positive integers less than N, ω-1 has a Poisson(log log N) distribution. This statement holds in the limit as N goes to infinity. Apparently N has to be extremely large before […]
US Counties and power laws
Yesterday I heard that the county I live in, Harris County, is the 3rd largest in the United States. (In population. It’s nowhere near the largest in area.) Somehow I’ve lived here a couple decades without knowing that. Houston is the 4th largest city in the US, so it’s no shock that Harris County the […]
Paul Klee meets Perry the Platypus
I was playing around with something in Mathematica and one of the images that came out of it surprised me. It’s a contour plot for the system function of a low pass filter. H[z_] := 0.05634*(1 + 1/z)*(1 - 1.0166/z + 1/z^2) / ((1 - 0.683/z)*(1 - 1.4461/z + 0.7957/z^2)) ContourPlot[ Arg[H[Exp[I (x + I […]
Differential equations and recurrence relations
Series solutions to differential equations can be grubby or elegant, depending on your perspective. Power series solutions At one level, there’s nothing profound going on. To find a series solution to a differential equation, assume the solution has a power series, stick the series into the equation, and solve for the coefficients. The mechanics are […]
Finding numbers in pi
You can find any integer you want as a substring of the digits in π. (Probably. See footnote for details.) So you could encode a number by reporting where it appears. If you want to encode a single digit, the best you can do is break even: it takes at least one digit to specify […]
Empirically testing the Chowla conjecture
Terry Tao’s most recent blog post looks at the Chowla conjecture theoretically. This post looks at the same conjecture empirically using Python. (Which is much easier!) The Liouville function λ(n) is (-1)^Ω(n) where Ω(n) is the number of prime factors of n counted with multiplicity. So, for example, Ω(9) = 2 because even though 9 […]
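The definition of the Liouville function in the excerpt above can be sketched in a few lines of Python. This is only an illustration using plain trial division, not the code from the post itself; the names `big_omega` and `liouville` are made up for this sketch.

```python
def big_omega(n: int) -> int:
    """Count the prime factors of n with multiplicity (trial division)."""
    count = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:
        # Whatever remains is a single prime factor.
        count += 1
    return count


def liouville(n: int) -> int:
    """Liouville function: (-1) raised to the power big_omega(n)."""
    return -1 if big_omega(n) % 2 else 1


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, big_omega(n), liouville(n))
```

Trial division is fine for small n; an empirical study over a large range would more likely use a sieve.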
Time series analysis vs DSP terminology
Time series analysis and digital signal processing are closely related. Unfortunately, the two fields use different terms to refer to the same things. Suppose you have a sequence of inputs x[n] and a sequence of outputs y[n] for integers n. Moving average / FIR If each output depends on a linear combination of a finite number of previous […]
How to eliminate the first order term from a second order ODE
Authors will often say that “without loss of generality” they will assume that a differential equation has no first order derivative term. They’ll explain that there’s no need to consider because a change of variables can turn the above equation into one of the form While this is true, the change of variables is seldom […]
Common words that have a technical meaning in math
Mathematical writing is the opposite of business writing in at least one respect. Math uses common words as technical terms, whereas business coins technical terms to refer to common ideas. There are a few math terms I use fairly often and implicitly assume readers understand. Perhaps the most surprising is almost as in “almost everywhere.” […]
A uniformly distributed sequence
If you take the fractional parts of the set of numbers {n cos nx : integer n > 0} the result is uniformly distributed for almost all x. That is, in the limit, the number of times the sequence visits a subinterval of [0, 1] is proportional to the length of the interval. (Clearly it’s not true […]
Applying probability to non-random things
Probability has surprising uses, including applications to things that absolutely are not random. I’ve touched on this a few times. For example, I’ve commented on how arguments about whether something is really random are often moot: Random is as random does. This post will take non-random uses for probability in a different direction. We’ll start […]
Misplacing a continent
There are many conventions for describing points on a sphere. For example, does latitude zero refer to the North Pole or the equator? Mathematicians tend to prefer the former and geoscientists the latter. There are also varying conventions for longitude. Volker Michel describes this clash of conventions colorfully in his book on constructive approximation. Many […]
Quaint supercomputers
The latest episode of Star Trek Discovery (S1E4) uses the word “supercomputer” a few times. This sounds jarring. The word has become less common in contemporary usage, and seems even more out of place in a work of fiction set more than two centuries in the future. According to Google’s Ngram Viewer, the term “supercomputer” […]
Exponential sum of the day
I’ve written a page that will show a different exponential sum each day, images along the line of the post Exponential sums make pretty pictures. Here’s the page: https://www.johndcook.com/expsum/ Here are a few sample images. Small changes in the coefficients can make a big change in the appearance of the graphs.
Something that bothers me about deep neural nets
Overfitting happens when a model does too good a job of matching a particular data set and so does a poor job on new data. The way traditional statistical models address the danger of overfitting is to limit the number of parameters. For example, you might fit a straight line (two parameters) to 100 data […]
Exponential sums make pretty pictures
Exponential sums are a specialized area of math that studies series with terms that are complex exponentials. Estimating such sums is delicate work. General estimation techniques are ham-fisted compared to what is possible with techniques specialized for these particular sums. Exponential sums are closely related to Fourier analysis and number theory. Exponential sums also make […]
No critical point between two peaks
If a function of one variable has two local maxima, it must have a local minimum in between. What about a function of two variables? If it has two local maxima, does it need to have a local minimum? No, it could have a saddle point in between, a point that is a local minimum […]
Clean obfuscated code
One way to obfuscate code is clever use of arcane programming language syntax. Hackers are able to write completely unrecognizable code by exploiting dark corners of programming techniques and languages. Some of these attempts are quite impressive. But it’s also possible to write clean source code that is nevertheless obfuscated. For example, it’s not at […]
How many musical scales are there?
How many musical scales are there? That’s not a simple question. It depends on how you define “scale.” For this post, I’ll only consider scales starting on C. That is, I’ll only consider changing the intervals between notes, not changing the starting note. Also, I’ll only consider subsets of the common chromatic scale; this post […]
Toxic pairs, re-identification, and information theory
Database fields can combine in subtle ways. For example, nationality is not usually enough to identify anyone. Neither is religion. But the combination of nationality and religion can be surprisingly informative. Information content of nationality How much information is contained in nationality? That depends on exactly how you define nations versus territories etc., but for […]
Chaos and the beta distribution
Iteration of the quadratic function f(x) = 4x(1-x) is a famous example in chaos theory. Here’s what the first few iterations look like, starting with 1/√3. (There’s nothing special about that starting point; any point that doesn’t iterate to exactly zero will do.) The values appear to bounce all over the place. Let’s look at a […]
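As a rough sketch of the iteration described in the excerpt above, the following Python computes the first few iterates of f(x) = 4x(1-x) starting from 1/√3. The function name `logistic_orbit` and the choice of ten iterations are assumptions for illustration only.

```python
import math


def logistic_orbit(x0: float, n: int) -> list:
    """Return [x0, f(x0), f(f(x0)), ...] with n applications of f(x) = 4x(1-x)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(4 * x * (1 - x))
    return xs


orbit = logistic_orbit(1 / math.sqrt(3), 10)

if __name__ == "__main__":
    for i, x in enumerate(orbit):
        print(i, round(x, 6))
```

Collecting many more iterates and plotting a histogram would be one way to look at how the values are distributed over [0, 1].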
Cellular automata with random initial conditions
The previous post looked at a particular cellular automaton, the so-called Rule 90. When started with a single pixel turned on, it draws a Sierpinski triangle. With random starting pixels, it draws a semi-random pattern that retains features like the Sierpinski triangle. There are only 256 possible elementary cellular automata, so it’s practical to plot […]
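The excerpt above does not spell out the rule, so this sketch assumes the standard definition of Rule 90: each cell's next state is the XOR of its two neighbors. Cells beyond the edges are treated as off, which is also an assumption, as is the name `rule90_rows`.

```python
def rule90_rows(width: int, steps: int) -> list:
    """Run Rule 90 from a single 'on' cell in the middle of a row of `width` cells."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        prev = rows[-1]
        nxt = []
        for i in range(width):
            left = prev[i - 1] if i > 0 else 0           # off beyond the left edge
            right = prev[i + 1] if i < width - 1 else 0  # off beyond the right edge
            nxt.append(left ^ right)
        rows.append(nxt)
    return rows


if __name__ == "__main__":
    for r in rule90_rows(31, 15):
        print("".join("#" if c else " " for c in r))
```

Replacing the single starting cell with random bits gives the random-initial-condition variant the excerpt describes.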
Sierpinski triangle strikes again
A couple months ago I wrote about how a simple random process gives rise to the Sierpinski triangle. Draw an equilateral triangle and pick a random point in the plane. Repeatedly pick a triangle vertex at random and move half way from the current position to that vertex. The result converges to a Sierpinski triangle. […]
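The random process described above can be sketched as follows. The vertex coordinates, the starting point drawn from the unit square, the fixed seed, and the name `chaos_game` are all assumptions made for this illustration.

```python
import math
import random

# Vertices of an equilateral triangle with side length 1 (an arbitrary choice).
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def chaos_game(n_points: int, seed: int = 0) -> list:
    """Generate n_points by repeatedly moving halfway toward a random vertex."""
    rng = random.Random(seed)
    # Start from a random point; here it is drawn from the unit square.
    x, y = rng.random(), rng.random()
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(VERTICES)
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    pts = chaos_game(10000)
    print(pts[-5:])
```

Scatter-plotting the points, perhaps after discarding the first few, should show the triangle pattern emerging.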
A cryptographically secure random number generator
A random number generator can have excellent statistical properties and yet not be suited for use in cryptography. I’ve written a few posts to demonstrate this. For example, this post shows how to discover the seed of an LCG random number generator. This is not possible with a secure random number generator. Or more precisely, […]
Aerial video of Hurricane Harvey aftermath and cleanup
Video by my friend Aaron Benzel showing the debris and cleanup typical of neighborhoods that flooded in Harvey.
Adding Laplace or Gaussian noise to database for privacy
In the previous two posts we looked at a randomization scheme for protecting the privacy of a binary response. This post will look briefly at adding noise to continuous or unbounded data. I like to keep the posts here fairly short, but this topic is fairly technical. To keep it short I’ll omit some of […]
Quantifying privacy loss in a statistical database
In the previous post we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post we’ll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off. More general randomized response Suppose we have a binary response […]
Randomized response, privacy, and Bayes theorem
Suppose you want to gather data on an incriminating question. For example, maybe a statistics professor would like to know how many students cheated on a test. Being a statistician, the professor has a clever way to find out what he wants to know while giving each student deniability. Randomized response Each student is asked […]
Why don’t you simply use XeTeX?
From an FAQ post I wrote a few years ago: This may seem like an odd question, but it’s actually one I get very often. On my TeXtip twitter account, I include tips on how to create non-English characters such as using \AA to produce Å. Every time someone will ask “Why not use XeTeX and just […]
Pascal’s triangle and Fermat’s little theorem
I was listening to My Favorite Theorem when Jordan Ellenberg said something in passing about proving Fermat’s little theorem from Pascal’s triangle. I wasn’t familiar with that, and fortunately Evelyn Lamb wasn’t either and so she asked him to explain. Fermat’s little theorem says that for any prime p and any integer a, a^p = a […]
Making a problem easier by making it harder
In the oral exam for my PhD, my advisor asked me a question about a differential equation. I don’t recall the question, but I remember the interaction that followed. I was stuck, and my advisor countered by saying “Let me ask you a harder question.” I was still stuck, and so he said “Let me […]
Quantifying the information content of personal data
It can be surprisingly easy to identify someone from data that’s not directly identifiable. One commonly cited result is that the combination of birth date, zip code, and sex is enough to identify most people. This post will look at how to quantify the amount of information contained in such data. If the answer to […]
Negative correlation introduced by success
Suppose you measure people on two independent attributes, X and Y, and take those for whom X+Y is above some threshold. Then even though X and Y are uncorrelated in the full population, they will be negatively correlated in your sample. This article gives the following example. Suppose beauty and acting ability were uncorrelated. Knowing how […]
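A small simulation can illustrate the effect described above. This sketch assumes X and Y are independent standard normal variables and uses an arbitrary threshold of 1; the helper names `pearson` and `selection_demo` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def selection_demo(n=20000, threshold=1.0, seed=0):
    """Return (correlation in full population, correlation among those with X+Y > threshold)."""
    rng = random.Random(seed)
    pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    selected = [(x, y) for x, y in pairs if x + y > threshold]
    full = pearson(*zip(*pairs))
    sel = pearson(*zip(*selected))
    return full, sel


if __name__ == "__main__":
    full, sel = selection_demo()
    print("full population:", round(full, 3))
    print("selected sample:", round(sel, 3))
```

Under these assumptions the correlation in the full population should come out near zero, while the selected sample shows a clearly negative correlation.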
Highly cited theorems
Some theorems are cited far more often than others. These are not the most striking theorems, not the most advanced or most elegant, but ones that are extraordinarily useful. I first noticed this when taking complex analysis where the Cauchy integral formula comes up over and over. When I first saw the formula I thought […]
Width of mixture PDFs
Suppose you draw samples from two populations, one of which has a wider probability distribution than the other. How does the width of the distribution of the combined sample vary as you change the proportions of the two populations? The extremes are easy. If you pick only from one population, then the resulting distribution will […]
Team dynamics and encouragement
When you add people to a project, the total productivity of the team as a whole may go up, but the productivity per person usually goes down. Someone suggested that as a rule of thumb, a company needs to triple its number of employees to double its productivity. Fred Brooks summarized this saying “Many hands […]
Relearning from a new perspective
I had a conversation with someone today who said he’s relearning logic from a categorical perspective. What struck me about this was not the specifics but the pattern: Relearning _______ from a _______ perspective. Not relearning something forgotten, but going back over something you already know well, but from a different starting point, a different […]
Hurricane Harvey update
As you may know, I live in the darkest region of the rainfall map below. My family and I are doing fine. Our house has not flooded, and at this point it looks like it will not flood. We’ve only lost electricity for a second or two. Of course not everyone in Houston is doing […]
Defining the Fourier transform on LCA groups
My previous post said that all the familiar variations on Fourier transforms—Fourier series analysis and synthesis, Fourier transforms on the real line, discrete Fourier transforms, etc.—can be unified into a single theory. They’re all instances of a Fourier transform on a locally compact Abelian (LCA) group. The difference between them is the underlying group. Given […]
Unified theory of Fourier transforms
You can take a periodic function and analyze it into its Fourier coefficients, or use the Fourier coefficients in a sum to synthesize a periodic function. You can take the Fourier transform of a function defined on the whole real line and get another such function. And you can compute the discrete Fourier transform via […]
Solving problems we wish we had
There’s a great line from Heather McGaw toward the end of the latest episode of 99 Percent Invisible: Sometimes … we can start to solve problems that we wish were problems because they’re easy to solve. Reminds me of an excerpt from Richard Weaver’s book Ideas Have Consequences: Obsession, according to the canons of psychology, […]
Predicting when an RNG will output a given value
A few days ago I wrote about how to pick the seed of a simple random number generator so that a desired output came n values later. The number n was fixed and we varied the seed. In this post, the seed will be fixed and we’ll solve for n. In other words, we ask when a […]
Programming language life expectancy
The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in press for a century is likely to be in press for another century. […]