Feed john-d-cook John D. Cook

Favorite IconJohn D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-11-23 00:01
Estimating vocabulary size with Heaps’ law
Heaps’ law says that the number of unique words in a text of n words is approximated by V(n) = K nβ where K is a positive constant and β is between 0 and 1. According to the Wikipedia article on Heaps’ law, K is often between 10 and 100 and β is often between 0.4 […]
Mickey Mouse, Batman, and conformal mapping
A conformal map between two regions in the plane preserves angles [1]. If two curves meet at a given angle in the domain, their images will meet at the same angle in the range. Two subsets of the plane are conformally equivalent if there is a conformal map between them. The Riemann mapping theorem says […]
Star-crossed lovers
A story in The New Yorker quotes the following explanation from Arthur Eddington regarding relativity and the speed of light. Suppose that you are in love with a lady on Neptune and that she returns the sentiment. It will be some consolation for the melancholy separation if you can say to yourself at some—possibly prearranged—moment, […]
Contributing to open source projects
David Heinemeier Hansson presents a very gracious view of open source software in his keynote address at RailsConf 2019. And by gracious, I mean gracious in the theological sense. He says at one point “If I were a Christian …” implying that he is not, but his philosophy of software echos the Christian idea of […]
Stone-Weierstrass on a disk
A couple weeks ago I wrote about a sort of paradox, that Weierstrass’ approximation theorem could seem to contradict Morera’s theorem. Weierstrass says that the uniform limit of polynomials can be an arbitrary continuous function, and so may have sharp creases. But Morera’s theorem says that the uniform limit of polynomials is analytic and thus […]
Distribution of zip code population
There are three schools of thought regarding power laws: the naive, the enthusiasts, and the skeptics. Of course there are more than three schools of thought, but there are three I want to talk about. The naive haven’t heard of power laws or don’t know much about them. They probably tend to expect things to […]
Landau kernel
The previous post was about the trick Lebesgue used to construct a sequence of polynomials converging to |x| on the interval [-1, 1]. This was the main step in his proof of the Weierstrass approximation theorem. Before that, I wrote a post on Bernstein’s proof that used his eponymous polynomials to prove Weierstrass’ theorem. This […]
Lebesgue’s proof of Weierstrass’ theorem
A couple weeks ago I wrote about the Weierstrass approximation theorem, the theorem that says every continuous function on a closed finite interval can be approximated as closely as you like by a polynomial. The post mentioned above uses a proof by Bernstein. And in that post I used the absolute value function as an […]
Proving that a choice was made in good faith
How can you prove that a choice was made in good faith? For example, if your company selects a cohort of people for random drug testing, how can you convince those who were chosen that they weren’t chosen deliberately? Would a judge find your explanation persuasive? This is something I’ve helped companies with. It may […]
Detecting a short period in an RNG
The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look at its fixed points. These turn out to be less of a problem in practice than in theory. Yesterday I posted about testing the generator with the DIEHARDER test suite, the successor to George Marsaglia’s […]
Testing Cliff RNG with DIEHARDER
My previous post introduced the Cliff random number generator. The post showed how to find starting seeds where the generator will start out by producing approximately equal numbers. Despite this flaw, the generator works well by some criteria. I produced a file of s billion 32-bit integers by multiplying the output values, which were floating […]
Fixed points of the Cliff random number generator
I ran across the Cliff random number generator yesterday. Given a starting value x0 in the open interval (0, 1), the generator proceeds by xn+1 = | 100 log(xn) mod 1 | for n > 0. The article linked to above says that this generator passes a test of randomness based on generating points on […]
Ease of learning vs relearning
Much more is written about how easy or hard some technology is to learn than about how hard it is to relearn. Maybe this is because people are more eager to write about something while the excitement or frustration of their first encounter is fresh. Advocates of difficult-to-learn technologies say that tools should be optimized […]
Uniform approximation paradox
What I’m going to present here is not exactly a paradox, but I couldn’t think of a better way to describe it in the space of a title. I’ll discuss two theorems about uniform convergence that seem to contradict each other, then show by an example why there’s no contradiction. Weierstrass approximation theorem One of […]
Nearly parallel is nearly transitive
We begin with a bit of geometry, then show its relevance to statistics. Geometry Let X, Y, and Z be three unit vectors. If X is nearly parallel to Y, and Y is nearly parallel to Z, then X is nearly parallel to Z. Here’s a proof. Think of X, Y, and Z as points […]
Angles in the spiral of Theodorus
The previous post looked at how to plot the spiral of Theodorus shown below. We stopped the construction where we did because the next triangle to be added would overlap the first triangle, which would clutter the image. But we could certainly have kept going. If we do keep going, then the set of hypotenuse […]
How to plot the spiral of Theodorus
You may have seen the spiral of Theodorus. It sticks a sequence of right triangles together to make a sort of spiral. Each triangle has a short side of length 1, and the hypotenuse of each triangle becomes the long leg of the next triangle as shown below. How would you plot this spiral? At […]
Encryption as secure as factoring
RSA encryption is based on the assumption that factoring large integers is hard. However, it’s possible that breaking RSA is easier than factoring. That is, the ability to factor large integers is sufficient for breaking RSA, but it might not be necessary. Two years after the publication of RSA, Michael Rabin created an alternative that […]
Accelerating convergence with Aitken’s method
The previous post looked at Euler’s method for accelerating the convergence of a slowly converging alternating series. Both hypotheses are necessary. The signs must alternate between terms, and applying the method to a series that is already converging quickly can slow down convergence. Aitken’s method This post looks at Aitken’s method for speeding up the […]
Accelerating an alternating series
The most direct way of computing the sum of an alternating series, simply computing the partial sums in the terms get small enough, may not be the most efficient. Euler figured this out in the 18th century. For our demo we’ll evaluate the Struve function defined by the series Note that the the terms in […]
Data breach trends
Are data breaches becoming more or less common? This post gives a crude, back-of-the-envelope calculation to address the question. We won’t look at number of breaches per se but number of records breached. There’s a terrific visualization of data breach statistics at Information is Beautiful, and they share their data here. Note that the data […]
Beating the odds on the Diffie-Hellman decision problem
There are a couple variations on the Diffie-Hellman problem in cryptography: the computation problem (CDH) and the decision problem (DDH). This post will explain both and give an example of where the former is hard and the latter easy. The Diffie-Hellman problems The Diffie-Hellman problems are formulated for an Abelian group. The main group we […]
Magic square links and errata
Someone pointed out that what I called a knight’s tour magic square is technically a semi-magic square: the rows and columns add up to the same constant, but the diagonals do not. It turns out there are no strict magic squares formed by knight’s tours. This was proved in 2003. See a news article here. […]
Quaternion reference in the Vulgate
To contemporary ears “quaternion” refers to a number system discovered in the 19h century, but there were a couple precedents. Both refer to something related to a group of four things, but there is no relation to mathematical quaternions other than that they have four dimensions. I’ve written before about Milton’s use of the word […]
Fame, difficulty, and usefulness
Pierre Fermat is best known for two theorems, dubbed his “last” theorem and his “little” theorem. His last theorem is famous, difficult to prove, and useless. His little theorem is relatively arcane, easy to prove, and extremely useful. There’s little relation between technical difficulty and usefulness. Fermat’s last theorem Fermat’s last theorem says there are […]
Twisted elliptic curves
This morning I was sitting at a little bakery thinking about what to do before I check out of my hotel. I saw that the name of the bakery was Twist Bakery & Cafe, and that made me think of writing about twisted elliptic curves when I got back to my room. Twist of an […]
Hashing names does not protect privacy
Secure hash functions are practically impossible to reverse, but only if the input is unrestricted. If you generate 256 random bits and apply a secure 256-bit hash algorithm, an attacker wanting to recover your input can’t do much better than brute force, hashing 256-bit strings hoping to find one that matches your hash value. Even […]
May the best technology win
I’ve become skeptical of arguments of the form “X is a better technology, but people won’t quit using Y.” Comparisons of technologies are multi-faceted. When someone says “X is better than Y” I want to ask “By all criteria? There’s nothing better about Y?” When people say X is better but Y won, it’s often […]
Integral approximation trick
Here’s a simple integration approximation that works remarkably well in some contexts. Suppose you have an integrand that looks roughly like a normal density, something with a single peak that drops off fairly quickly on either side of the peak. The majority of integrals that arise in basic applications of probability and statistics fit this […]
Number of feet in a mile
Here are a couple amusing things I’ve run across recently regarding the number of feet in a mile. Both are frivolous, but also have a more serious side. Mnemonic First, you can use “five tomatoes” as a mnemonic for remembering that there are 5280 feet in a mile. “Five tomatoes” is a mnemonic for the […]
Discriminant of a cubic
The discriminant of a quadratic equation ax² + bx + c = 0 is Δ = b² – 4ac. If the discriminant Δ is zero, the equation has a double root, i.e. there is a unique x that makes the equation zero, and it counts twice as a root. If the discriminant is not zero, […]
Distribution of quadratic residues
Let p be an odd prime number. If the equation x² = n mod p has a solution then n is a square mod p, or in classical terminology, n is a quadratic residue mod p. Half of the numbers between 0 and p are quadratic residues and half are not. The residues are distributed […]
What does CCPA say about de-identified data?
The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my […]
Serious applications of a party trick
In a group of 30 people, it’s likely that two people have the same birthday. For a group of 23 the probability is about 1/2, and it goes up as the group gets larger. In a group of a few dozen people, it’s unlikely that anyone will have a particular birthday, but it’s likely that […]
Channel quantity and quality
Years ago, when there were a couple dozen television stations, someone [1] speculated that when we got more channels we’d also get better content. The argument was that people are more similar in their base interests than in their more refined interests. Therefore if there are only a few channels, they will all appeal to […]
Symmetry in exponential sums
Today’s exponential sum is highly symmetric: These sums are often symmetric, but not always. For example, here’s the sum from a couple days ago: It’s not obvious from looking at the parameters whether a sum will be symmetric or not. Maybe someone could find a prove criteria for a sum to have certain symmetries. For […]
Proto-calculus
David Bressoud has written a new book entitled Calculus Reordered: A History of the Big Ideas. He presents the major themes of calculus in historical order, which is very different from the order in which it is now taught. We now begin with limits, then differentiation, integration, and infinite series. Historically, integration came first and […]
Homomorphic encryption
A function that satisfies f(x*y) = f(x)*f(y) is called a homomorphism. The symbol “*” can stand for any operation, and it need not stand for the same thing on both sides of the equation. Technically * is the group operation, and if the function f maps elements of one group to another, the group operation […]
Journalistic stunt with Emacs
Emacs has been called a text editor with ambitions of being an operating system, and some people semi-seriously refer to it as their operating system. Emacs does not want to be an operating system per se, but it is certainly ambitious. It can be a shell, a web browser, an email client, a calculator, a […]
Notes on computing hash functions
A secure hash function maps a file to a string of bits in a way that is hard to reverse. Ideally such a function has three properties: pre-image resistance collision resistance second pre-image resistance Pre-image resistance means that starting from the hash value, it is very difficult to infer what led to that output; it […]
Translating poetry
You can’t preserve every aspect of a text when translating. A strict word-for-word translation attempts to be faithful to the words but may be ungrammatical in the target language. An idea-for-idea translation is more readable, but still may not convey the style of the original. Translation reminds me of making maps. There have been countless […]
Category theory for data science: cautious optimism
I’m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but I’m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty: [Jean-Pierre] Serre […]
Software to factor integers
In my previous post, I showed how changing one bit of a semiprime (i.e. the product of two primes) creates an integer that can be factored much faster. I started writing that post using Python with SymPy, but moved to Mathematica because factoring took too long. SymPy vs Mathematica When I’m working in Python, SymPy […]
Making public keys factorable with Rowhammer
The security of RSA encryption depends on the fact that the product of two large primes is difficult to factor. So if p and q are large primes, say 2048 bits each, then you can publish n = pq with little fear that someone can factor n to recover p and q. But if you […]
Bounds on the nth prime
The nth prime is approximately n log n. For more precise estimates, there are numerous upper and lower bounds for the nth prime, each tighter over some intervals than others. Here I want to point out upper and lower bounds from a dissertation by Christian Axler on page viii. First, define Then for sufficiently large […]
Converting between nines and sigmas
Nines and sigmas are two ways to measure quality. You’ll hear something has four or five nines of reliability or that some failure is a five sigma event. What do these mean, and how do you convert between them? Definitions If a system has fives nines of availability, that means the probability of the system […]
Maybe it’s just hard
If someone tells you repeatedly that something isn’t hard, maybe it’s just hard. Monads A post by Gilad Bracha got me thinking about this. He says Last time I looked, the Haskell wiki listed 29 tutorials on [monads]. … Could it just be that people just have a hard time understanding monads? If so, what […]
Why are regular expressions difficult?
Regular expressions are challenging, but not for the reasons commonly given. Non-reasons Here are some reasons given for the difficulty of regular expressions that I don’t agree with. Cryptic syntax I think complaints about cryptic syntax miss the mark. Some people say that Greek is hard to learn because it uses a different alphabet. If […]
Feller-Tornier constant
Here’s kind of an unusual question: What is the density of integers that have an even number of prime factors with an exponent greater than 1? To define the density, you take the proportion up to an integer N then take the limit as N goes to infinity. It’s not obvious that the limit should […]
Translating Robert Burns
Last year Adam Roberts had some fun with Finnegans Wake [1], seeing how little he could edit it and turn it into something that sounded like Return of the Jedi. I wrote a blog post where I quantified the difference between the original and the parody using Levenshtein distance, basically how many edits it takes […]
...31323334353637383940...