Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-09-11 20:31
Harmonographs
In the previous post, I said that Lissajous curves are the result of plotting a curve whose x and y coordinates come from (undamped) harmonic oscillators. If we add a little bit of damping, multiplying our cosine terms by negative exponentials, the resulting curve is called a harmonograph. Here’s a bit of Mathematica code to […]
Lissajous curves and knots
Suppose that over time the x and y coordinates of a point are both given by a harmonic oscillator, i.e. x(t) = cos(n_x t + φ_x), y(t) = cos(n_y t + φ_y). Then the resulting path is called a Lissajous curve. If you add a z coordinate also given by a harmonic oscillator z(t) = cos(n_z […]
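As a rough illustration of the definition above, here is a minimal Python sketch that samples points on a Lissajous curve. The function name `lissajous_points`, the sample count, and the choice of one period of length 2π are assumptions made for the example, not something taken from the post.

```python
import math

def lissajous_points(n_x, n_y, phi_x, phi_y, num_points=1000):
    """Sample (x, y) points with x(t) = cos(n_x t + phi_x), y(t) = cos(n_y t + phi_y)."""
    points = []
    for k in range(num_points):
        # t runs over [0, 2*pi]; integer frequencies make the curve close up there.
        t = 2 * math.pi * k / (num_points - 1)
        points.append((math.cos(n_x * t + phi_x), math.cos(n_y * t + phi_y)))
    return points

# Hypothetical parameters: frequencies 3 and 2, no phase shift.
curve = lissajous_points(3, 2, 0.0, 0.0)
```

The resulting list can be handed to any plotting library; every coordinate stays in [-1, 1] because each one is a cosine.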
Curvature of an ellipsoid
For an ellipsoid with equation x²/a² + y²/b² + z²/c² = 1, the Gaussian curvature at each point is given by K = 1 / (a² b² c² (x²/a⁴ + y²/b⁴ + z²/c⁴)²). Now suppose a ≥ b ≥ c > 0. Otherwise relabel the coordinate axes so that this is the case. Then the largest curvature occurs at (±a, 0, 0), and the smallest curvature occurs at (0, 0, ±c). You could prove […]
Fixed points
Take a calculator and enter any number. Then press the cosine key over and over. Eventually the numbers will stop changing. You will either see 0.99984774 or 0.73908513, depending on whether your calculator was in degree mode or radian mode. This is an example of a fixed point, a point that doesn’t change when you […]
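The calculator experiment can be sketched in a few lines of Python. This is only an illustration: the helper name `iterate_to_fixed_point`, the tolerance, and the starting value 1.0 are assumptions chosen for the example.

```python
import math

def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Apply f repeatedly until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# Radian mode: press "cos" over and over.
radian_fixed_point = iterate_to_fixed_point(math.cos, 1.0)

# Degree mode: the key computes the cosine of x degrees.
degree_fixed_point = iterate_to_fixed_point(lambda x: math.cos(math.radians(x)), 1.0)

print(f"{radian_fixed_point:.8f}")
print(f"{degree_fixed_point:.8f}")
```

The two printed values should agree with the radian-mode and degree-mode numbers quoted above, up to rounding in the last digit.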
Number of real roots in an interval
Suppose you have a polynomial p(x) and an interval [a, b] and you want to know how many distinct real roots the polynomial has in the interval. You can answer this question using Sturm’s algorithm. Let p0(x) = p(x) and let p1(x) be its derivative p′(x). Then define a sequence of polynomials for i ≥ 1 […]
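The excerpt breaks off before the recurrence, so the sketch below assumes the standard Sturm sequence: each later polynomial is the negated remainder from dividing the two polynomials before it. The function names, the coefficient layout (highest degree first), and the use of exact `Fraction` arithmetic are all choices made for this illustration. It also assumes p has degree at least 1 and that neither endpoint is a root, in which case the count of sign changes at a minus the count at b gives the number of distinct real roots in the interval.

```python
from fractions import Fraction

def poly_eval(p, x):
    """Evaluate a polynomial (coefficients highest degree first) by Horner's rule."""
    result = Fraction(0)
    for c in p:
        result = result * x + c
    return result

def poly_derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_remainder(num, den):
    """Remainder of polynomial long division num / den (empty list means zero)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient has been cancelled
    while num and num[0] == 0:
        num.pop(0)  # strip leading zeros
    return num

def sturm_sequence(p):
    p = [Fraction(c) for c in p]
    seq = [p, poly_derivative(p)]
    while True:
        rem = poly_remainder(seq[-2], seq[-1])
        if not rem:
            return seq
        seq.append([-c for c in rem])  # negated remainder

def sign_changes(seq, x):
    signs = [v > 0 for v in (poly_eval(q, x) for q in seq) if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))

def count_real_roots(p, a, b):
    """Distinct real roots of p between a and b, assuming a and b are not roots."""
    seq = sturm_sequence(p)
    return sign_changes(seq, Fraction(a)) - sign_changes(seq, Fraction(b))

# x^3 - x has roots -1, 0, 1.
print(count_real_roots([1, 0, -1, 0], -2, 2))
```

Exact rational arithmetic avoids the sign errors that floating point can introduce when a remainder is close to zero, at the cost of speed for high-degree polynomials.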
Total curvature of a knot
Tie a knot in a rope and join the ends together. At each point in the rope, compute the curvature, i.e. how much the rope bends, and integrate this over the length of the rope. The Fary-Milnor theorem says the result must be greater than 4π. This post will illustrate this theorem by computing numerically […]
A sort of mathematical quine
Julian Havil writes what I think of as serious recreational mathematics. His books are recreational in the sense that they tell a story rather than cover a subject. They are lighter reading than a textbook, but require more advanced mathematics than books by Martin Gardner. Havil’s latest book is Curves for the Mathematically Curious. […]
Control characters
I didn’t realize until recently that there’s a connection between the control key on a computer keyboard and controlling a mechanical device. Both uses of the word control are related via ASCII control characters as I discovered by reading the blog post Four Column ASCII. Computers work with bits in groups of eight, and there […]
Fat tails and the t test
Suppose you want to test whether something you’re doing is having any effect. You take a few measurements and you compute the average. The average is different than what it would be if what you’re doing had no effect, but is the difference significant? That is, how likely is it that you might see the […]
Amendment to CCPA regarding personal information
California’s new privacy law takes effect January 1, 2020, less than 100 days from now. The bill was written in a hurry in order to prevent a similar measure from appearing on a ballot initiative. The thought was that the state legislature would pass something quickly then clean it up later with amendments. Six amendments […]
Right to be forgotten in the news
The GDPR’s right-to-be-forgotten has been in the news this week. This post will look at a couple news stories and how they relate. Forgetting about a stabbing On Monday the New York Times ran a story about an Italian news site that folded as a result of resisting requests to hide a story about a […]
Exception Driven Development
Using program exceptions as a learning tool: When I’m learning something new, I sometimes find myself practicing EDD (exception driven development). I try to evaluate some code, get an exception or error message, and then Google the error message to figure out what the heck happened. From Mastering Clojure Macros
One of these days I’m going to figure this out
If something is outside your grasp, it’s hard to know just how far outside it is. Many times I’ve intended to sit down and understand something thoroughly, and I’ve put it off for years. Maybe it’s a programming language that I just use a few features of, or a book I keep seeing references to. […]
Typesetting zodiac symbols in LaTeX
Typesetting zodiac symbols in LaTeX is admittedly an unusual thing to do. LaTeX is mostly used for scientific publication, and zodiac symbols are commonly associated with astrology. But occasionally zodiac symbols are used in more respectable contexts. The wasysym package for LaTeX includes miscellaneous symbols, including zodiac symbols. Here are the symbols, their LaTeX commands, […]
Airline flight number parity
I read in Wikipedia this morning that there’s a pattern to the parity of flight numbers. Among airline flight numbers, even numbers typically identify eastbound or northbound flights, and odd numbers typically identify westbound or southbound flights. I never noticed this. I could see how it might be a useful convention. It would mean that […]
Testing Rupert Miller’s suspicion
I was reading Rupert Miller’s book Beyond ANOVA when I ran across this line: I never use the Kolmogorov-Smirnov test (or one of its cousins) or the χ² test as a preliminary test of normality. … I have a feeling they are more likely to detect irregularities in the middle of the distribution than in […]
Why would anyone do that?
There are tools that I’ve used occasionally for many years that I’ve just started to appreciate lately. “Oh, that’s why they did that.” When you see something that looks poorly designed, don’t just exclaim “Why would anyone do that?!” but ask sincerely “Why would someone do that?” There’s probably a good reason, or at least […]
Predicted distribution of Mersenne primes
Mersenne primes are prime numbers of the form 2^p – 1. It turns out that if 2^p – 1 is a prime, so is p; the requirement that p is prime is a theorem, not part of the definition. So far 51 Mersenne primes have been discovered [1]. Maybe that’s all there are, but it is […]
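For small exponents, the Mersenne primes can be found directly. The sketch below uses the Lucas-Lehmer test, a standard primality test for numbers of this form; the post excerpt does not mention it, so treat it as a swapped-in technique, and the function names and the cutoff of 130 as assumptions for the example.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. Only valid when p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Only prime exponents need checking, since 2**p - 1 prime forces p prime.
mersenne_exponents = [p for p in range(2, 130) if is_prime(p) and lucas_lehmer(p)]
print(mersenne_exponents)
# → [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

Note that the converse fails: 11 is prime, but 2^11 – 1 = 2047 = 23 × 89 is not, which is why 11 is missing from the list.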
Short video introducing differential privacy
Here is a 12-minute video from Minute Physics, in collaboration with the US Census Bureau, giving an overview of differential privacy and how the 2020 census will use it to protect privacy. Related posts Scaling up differential privacy: lessons from the US Census Protecting privacy while keeping detailed date information Comparing differential privacy to Safe […]
Collatz conjecture skepticism
The Collatz conjecture asks whether the following procedure always terminates at 1. Take any positive integer n. If it’s odd, multiply it by 3 and add 1. Otherwise, divide it by 2. For obvious reasons the Collatz conjecture is also known as the 3n + 1 conjecture. It has been computationally verified that the Collatz […]
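The procedure described above is easy to try out. Here is a minimal sketch; the function name `collatz_steps` is made up for the example, and the loop would simply never return for a starting value that does not reach 1, if such a value exists.

```python
def collatz_steps(n):
    """Count how many steps the Collatz procedure takes to reach 1 from n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Odd: multiply by 3 and add 1. Even: divide by 2.
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(6))   # 6, 3, 10, 5, 16, 8, 4, 2, 1 → 8 steps
print(collatz_steps(27))  # → 111
```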
String interpolation in Python and R
One of the things I liked about Perl was string interpolation. If you use a variable name in a string, the variable will expand to its value. For example, if you have a variable $x which equals 42, then the string "The answer is $x." will expand to “The answer is 42.” Perl requires variables to […]
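For comparison, the closest Python analogue to the Perl example is a formatted string literal (f-string), available since Python 3.6. This is a small sketch of that one feature, not necessarily the approach the post goes on to describe.

```python
x = 42

# An f-string evaluates the expression inside braces and substitutes its value.
message = f"The answer is {x}."
print(message)  # → The answer is 42.

# Without the f prefix nothing is interpolated; the braces stay literal.
plain = "The answer is {x}."
print(plain)    # → The answer is {x}.
```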
Detecting typos with the four color theorem
In my previous post on VIN numbers, I commented that if a check sum has to be one of 11 characters, it cannot detect all possible changes to a string from an alphabet of 33 characters. The number of possible check sum characters must be at least as large as the number of possible characters […]
Vehicle Identification Number (VIN) check sum
A VIN (vehicle identification number) is a string of 17 characters that uniquely identifies a car or motorcycle. These numbers are used around the world and have three standardized formats: one for North America, one for the EU, and one for the rest of the world. Letters that resemble digits The characters used in a […]
Progress on the Collatz conjecture
The Collatz conjecture is for computer science what until recently Fermat’s last theorem was for mathematics: a famous unsolved problem that is very simple to state. The Collatz conjecture, also known as the 3n+1 problem, asks whether the following function terminates for all positive integer arguments n. def collatz(n): if n == 1: return 1 […]
How UTF-8 works
UTF-8 is a clever way of encoding Unicode text. I’ve mentioned it a couple times lately, but I haven’t blogged about UTF-8 per se. Here goes. The problem UTF-8 solves US keyboards can often produce 101 symbols, which suggests 101 symbols would be enough for most English text. Seven bits would be enough to encode […]
Excel, R, and Unicode
I received some data as an Excel file recently. I cleaned things up a bit, exported the data to a CSV file, and read it into R. Then something strange happened. Say the CSV file looked like this: foo,bar 1,2 3,4 I read the file into R with df <- read.csv("foobar.csv", header=TRUE) and could access […]
How fast were dead languages spoken?
A new paper in Science suggests that all human languages carry about the same amount of information per unit time. In languages with fewer possible syllables, people speak faster. In languages with more syllables, people speak slower. Researchers quantified the information content per syllable in 17 different languages by calculating Shannon entropy. When you multiply […]
Quiet mode
When you start a programming language like Python or R from the command line, you get a lot of initial text that you probably don’t read. For example, you might see something like this when you start Python. Python 2.7.6 (default, Nov 23 2017, 15:49:48) [GCC 4.8.4] on linux2 Type "help", "copyright", "credits" or "license" […]
More bc weirdness
As I mentioned in a footnote to my previous post, I just discovered that variable names in the bc programming language cannot contain capital letters. I think I understand why: Capital letters are reserved for hexadecimal constants, though in a weird sort of way. At first variable names in bc could only be one letter […]
Asimov’s question about π
In 1977, Isaac Asimov [1] asked how many terms of the slowly converging series π = 4 – 4/3 + 4/5 – 4/7 + 4/9 – … would you have to sum before doing better than the approximation π ≈ 355/113. A couple years later Richard Johnsonbaugh [2] answered Asimov’s question in the course of […]
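Asimov’s question can be explored by brute force. The sketch below adds terms of the series until the partial sum is closer to π than 355/113 is. The function name is an assumption for the example, and because consecutive partial sums near the crossover differ by less than accumulated floating-point rounding error, the count it reports may be off by a few terms from the exact answer.

```python
import math

def terms_needed_to_beat(approximation):
    """Number of terms of 4 - 4/3 + 4/5 - ... needed to get closer to pi
    than the given approximation (double precision, so the count is approximate)."""
    target_error = abs(math.pi - approximation)
    partial_sum = 0.0
    sign = 1.0
    terms = 0
    while True:
        partial_sum += sign * 4.0 / (2 * terms + 1)
        terms += 1
        sign = -sign
        if abs(math.pi - partial_sum) < target_error:
            return terms

terms = terms_needed_to_beat(355 / 113)
print(terms)  # a little under 3.75 million
```

The size of the answer follows from the fact that the error after N terms is roughly 1/N, while 355/113 is off by about 2.67 × 10⁻⁷.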
National Drug Code (NDC)
The US Food and Drug Administration tracks drugs using an identifier called the NDC or National Drug Code. It is described as a 10-digit code, but it may be more helpful to think of it as a 12-character code. An NDC contains 10 digits, separated into three segments by two dashes. The three segments are […]
Prefix code examples
In many offices, you can dial a three digit number to reach someone else in the office. In such offices, you usually have to dial 9 to reach an outside number. There’s no ambiguity because no one can have an extension that begins with 9. After you’ve entered three digits, the phone system knows […]
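The phone example works because no valid dialing code is the beginning of another. A minimal sketch of checking that property follows; the function name and the sample extensions are hypothetical.

```python
def is_prefix_free(codes):
    """Return True if no code is a prefix of another (duplicates count as clashes)."""
    ordered = sorted(codes)
    # After sorting, any prefix relationship shows up between adjacent entries.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))

# Made-up office: three extensions plus 9 for an outside line.
office_codes = ["123", "456", "780", "9"]
print(is_prefix_free(office_codes))            # → True

# An extension beginning with 9 would clash with the outside-line code.
print(is_prefix_free(office_codes + ["912"]))  # → False
```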
How many possible Unicode characters there are and why
How many? The previous post showed how the number of Unicode characters has grown over time. You’ll notice there was a big jump between versions 3.0 and 3.1. That will be important later on. Unicode started out relatively small then became much more ambitious. Are they going to run out of room? How many possible […]
Growth of Unicode over time
My previous post quoted Randall Munroe saying Unicode “started out just trying to unify a couple different character sets” and grew much more ambitious. The first version of Unicode, published in 1991, had 7,191 characters. Now the latest version has 137,994 characters and so is about 19 times bigger. Here’s a plot of the number […]
The hopeless task of the Unicode Consortium
Randall Munroe, author of xkcd, discussing Unicode on the Triangulation podcast: I am endlessly delighted by the hopeless task that the Unicode Consortium has created for themselves. … They started out just trying to unify a couple different character sets. And before they quite realized what was happening, they were grappling with decisions at the […]
Regular expressions and special characters
Special characters make text processing more complicated because you have to pay close attention to context. If you’re looking at Python code containing a regular expression, you have to think about what you see, what Python sees, and what the regular expression engine sees. A character may be special to Python but not to regular […]
Munging CSV files with standard Unix tools
This post briefly discusses working with CSV (comma separated value) files using command line tools that are usually available on any Unix-like system. This will raise two objections: why CSV and why dusty old tools? Why CSV? In theory, and occasionally in practice, CSV can be a mess. But CSV is the de facto standard […]
Three-digit zip codes and data privacy
Birth date, sex, and five-digit zip code are enough information to uniquely identify a large majority of Americans. See more on this here. So if you want to deidentify a data set, the HIPAA Safe Harbor provision says you should chop off the last two digits of a zip code. And even though three-digit zip […]
Working with wide text files at the command line
Suppose you have a data file with obnoxiously long lines and you’d like to preview it from the command line. For example, the other day I downloaded some data from the American Community Survey and wanted to see what the files contained. I ran something like head data.csv to look at the first few lines […]
Estimating vocabulary size with Heaps’ law
Heaps’ law says that the number of unique words in a text of n words is approximated by V(n) = K n^β where K is a positive constant and β is between 0 and 1. According to the Wikipedia article on Heaps’ law, K is often between 10 and 100 and β is often between 0.4 […]
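To get a feel for the formula, here is a minimal sketch that evaluates it. The function name and the parameter values K = 50 and β = 0.5 are illustrative assumptions, not estimates fitted to any real text.

```python
def heaps_vocabulary(n, K, beta):
    """Estimated number of unique words in a text of n words: V(n) = K * n**beta."""
    if K <= 0 or not 0 < beta < 1:
        raise ValueError("expected K > 0 and 0 < beta < 1")
    return K * n ** beta

# Illustrative parameters only.
estimate = heaps_vocabulary(10_000, K=50, beta=0.5)
print(estimate)  # → 5000.0
```

Because β is less than 1, vocabulary grows more slowly than text length: with these parameters, making the text 100 times longer multiplies the estimate by only 10.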
Mickey Mouse, Batman, and conformal mapping
A conformal map between two regions in the plane preserves angles [1]. If two curves meet at a given angle in the domain, their images will meet at the same angle in the range. Two subsets of the plane are conformally equivalent if there is a conformal map between them. The Riemann mapping theorem says […]
Star-crossed lovers
A story in The New Yorker quotes the following explanation from Arthur Eddington regarding relativity and the speed of light. Suppose that you are in love with a lady on Neptune and that she returns the sentiment. It will be some consolation for the melancholy separation if you can say to yourself at some—possibly prearranged—moment, […]
Contributing to open source projects
David Heinemeier Hansson presents a very gracious view of open source software in his keynote address at RailsConf 2019. And by gracious, I mean gracious in the theological sense. He says at one point “If I were a Christian …” implying that he is not, but his philosophy of software echoes the Christian idea of […]
Stone-Weierstrass on a disk
A couple weeks ago I wrote about a sort of paradox, that Weierstrass’ approximation theorem could seem to contradict Morera’s theorem. Weierstrass says that the uniform limit of polynomials can be an arbitrary continuous function, and so may have sharp creases. But Morera’s theorem says that the uniform limit of polynomials is analytic and thus […]
Distribution of zip code population
There are three schools of thought regarding power laws: the naive, the enthusiasts, and the skeptics. Of course there are more than three schools of thought, but there are three I want to talk about. The naive haven’t heard of power laws or don’t know much about them. They probably tend to expect things to […]
Landau kernel
The previous post was about the trick Lebesgue used to construct a sequence of polynomials converging to |x| on the interval [-1, 1]. This was the main step in his proof of the Weierstrass approximation theorem. Before that, I wrote a post on Bernstein’s proof that used his eponymous polynomials to prove Weierstrass’ theorem. This […]
Lebesgue’s proof of Weierstrass’ theorem
A couple weeks ago I wrote about the Weierstrass approximation theorem, the theorem that says every continuous function on a closed finite interval can be approximated as closely as you like by a polynomial. The post mentioned above uses a proof by Bernstein. And in that post I used the absolute value function as an […]
Proving that a choice was made in good faith
How can you prove that a choice was made in good faith? For example, if your company selects a cohort of people for random drug testing, how can you convince those who were chosen that they weren’t chosen deliberately? Would a judge find your explanation persuasive? This is something I’ve helped companies with. It may […]
Detecting a short period in an RNG
The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look at its fixed points. These turn out to be less of a problem in practice than in theory. Yesterday I posted about testing the generator with the DIEHARDER test suite, the successor to George Marsaglia’s […]
Testing Cliff RNG with DIEHARDER
My previous post introduced the Cliff random number generator. The post showed how to find starting seeds where the generator will start out by producing approximately equal numbers. Despite this flaw, the generator works well by some criteria. I produced a file of s billion 32-bit integers by multiplying the output values, which were floating […]