Feed john-d-cook John D. Cook

John D. Cook

Link	https://www.johndcook.com/blog
Feed	http://feeds.feedburner.com/TheEndeavour?format=xml
Updated	2025-07-04 05:31

Estimating vocabulary size with Heaps’ law

by

John

on 2019-08-27 12:00 (#4P1JZ)

Heapsâ€™ law says that the number of unique words in a text of n words is approximated by V(n) = K nÎ² where K is a positive constant and Î² is between 0 and 1. According to the Wikipedia article on Heapsâ€™ law, K is often between 10 and 100 and Î² is often between 0.4 [â€¦]

Mickey Mouse, Batman, and conformal mapping

by

John

on 2019-08-23 12:00 (#4NT3K)

A conformal map between two regions in the plane preserves angles [1]. If two curves meet at a given angle in the domain, their images will meet at the same angle in the range. Two subsets of the plane are conformally equivalent if there is a conformal map between them. The Riemann mapping theorem says [â€¦]

Star-crossed lovers

by

John

on 2019-08-22 16:34 (#4NRCR)

A story in The New Yorker quotes the following explanation from Arthur Eddington regarding relativity and the speed of light. Suppose that you are in love with a lady on Neptune and that she returns the sentiment. It will be some consolation for the melancholy separation if you can say to yourself at someâ€”possibly prearrangedâ€”moment, [â€¦]

Contributing to open source projects

by

John

on 2019-08-22 13:26 (#4NQXC)

David Heinemeier Hansson presents a very gracious view of open source software in his keynote address at RailsConf 2019. And by gracious, I mean gracious in the theological sense. He says at one point â€œIf I were a Christian â€¦â€ implying that he is not, but his philosophy of software echos the Christian idea of [â€¦]

Stone-Weierstrass on a disk

by

John

on 2019-08-22 12:08 (#4NQMA)

A couple weeks ago I wrote about a sort of paradox, that Weierstrassâ€™ approximation theorem could seem to contradict Moreraâ€™s theorem. Weierstrass says that the uniform limit of polynomials can be an arbitrary continuous function, and so may have sharp creases. But Moreraâ€™s theorem says that the uniform limit of polynomials is analytic and thus [â€¦]

Distribution of zip code population

by

John

on 2019-08-21 13:25 (#4NNCC)

There are three schools of thought regarding power laws: the naive, the enthusiasts, and the skeptics. Of course there are more than three schools of thought, but there are three I want to talk about. The naive havenâ€™t heard of power laws or donâ€™t know much about them. They probably tend to expect things to [â€¦]

by

John

on 2019-08-20 13:52 (#4NJS6)

The previous post was about the trick Lebesgue used to construct a sequence of polynomials converging to |x| on the interval [-1, 1]. This was the main step in his proof of the Weierstrass approximation theorem. Before that, I wrote a post on Bernsteinâ€™s proof that used his eponymous polynomials to prove Weierstrassâ€™ theorem. This [â€¦]

Lebesgue’s proof of Weierstrass’ theorem

by

John

on 2019-08-20 02:51 (#4NHRR)

A couple weeks ago I wrote about the Weierstrass approximation theorem, the theorem that says every continuous function on a closed finite interval can be approximated as closely as you like by a polynomial. The post mentioned above uses a proof by Bernstein. And in that post I used the absolute value function as an [â€¦]

Proving that a choice was made in good faith

by

John

on 2019-08-14 13:02 (#4N8ZP)

How can you prove that a choice was made in good faith? For example, if your company selects a cohort of people for random drug testing, how can you convince those who were chosen that they werenâ€™t chosen deliberately? Would a judge find your explanation persuasive? This is something Iâ€™ve helped companies with. It may [â€¦]

Detecting a short period in an RNG

by

John

on 2019-08-12 15:28 (#4N4JS)

The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look at its fixed points. These turn out to be less of a problem in practice than in theory. Yesterday I posted about testing the generator with the DIEHARDER test suite, the successor to George Marsagliaâ€™s [â€¦]

Testing Cliff RNG with DIEHARDER

by

John

on 2019-08-11 21:26 (#4N34Z)

My previous post introduced the Cliff random number generator. The post showed how to find starting seeds where the generator will start out by producing approximately equal numbers. Despite this flaw, the generator works well by some criteria. I produced a file of s billion 32-bit integers by multiplying the output values, which were floating [â€¦]

Fixed points of the Cliff random number generator

by

John

on 2019-08-11 12:59 (#4N2BV)

I ran across the Cliff random number generator yesterday. Given a starting value x0 in the open interval (0, 1), the generator proceeds by xn+1 = | 100 log(xn) mod 1 | for n > 0. The article linked to above says that this generator passes a test of randomness based on generating points on [â€¦]

Ease of learning vs relearning

by

John

on 2019-08-10 16:15 (#4N169)

Much more is written about how easy or hard some technology is to learn than about how hard it is to relearn. Maybe this is because people are more eager to write about something while the excitement or frustration of their first encounter is fresh. Advocates of difficult-to-learn technologies say that tools should be optimized [â€¦]

Uniform approximation paradox

by

John

on 2019-08-09 14:40 (#4MYZQ)

What Iâ€™m going to present here is not exactly a paradox, but I couldnâ€™t think of a better way to describe it in the space of a title. Iâ€™ll discuss two theorems about uniform convergence that seem to contradict each other, then show by an example why thereâ€™s no contradiction. Weierstrass approximation theorem One of [â€¦]

Nearly parallel is nearly transitive

by

John

on 2019-08-06 11:20 (#4MPW9)

We begin with a bit of geometry, then show its relevance to statistics. Geometry Let X, Y, and Z be three unit vectors. If X is nearly parallel to Y, and Y is nearly parallel to Z, then X is nearly parallel to Z. Hereâ€™s a proof. Think of X, Y, and Z as points [â€¦]

Angles in the spiral of Theodorus

by

John

on 2019-08-05 01:40 (#4MKWF)

The previous post looked at how to plot the spiral of Theodorus shown below. We stopped the construction where we did because the next triangle to be added would overlap the first triangle, which would clutter the image. But we could certainly have kept going. If we do keep going, then the set of hypotenuse [â€¦]

How to plot the spiral of Theodorus

by

John

on 2019-08-04 22:48 (#4MKP3)

You may have seen the spiral of Theodorus. It sticks a sequence of right triangles together to make a sort of spiral. Each triangle has a short side of length 1, and the hypotenuse of each triangle becomes the long leg of the next triangle as shown below. How would you plot this spiral? At [â€¦]

Encryption as secure as factoring

by

John

on 2019-08-03 13:14 (#4MHG6)

RSA encryption is based on the assumption that factoring large integers is hard. However, itâ€™s possible that breaking RSA is easier than factoring. That is, the ability to factor large integers is sufficient for breaking RSA, but it might not be necessary. Two years after the publication of RSA, Michael Rabin created an alternative that [â€¦]

Accelerating convergence with Aitken’s method

by

John

on 2019-08-02 12:51 (#4MFAY)

The previous post looked at Eulerâ€™s method for accelerating the convergence of a slowly converging alternating series. Both hypotheses are necessary. The signs must alternate between terms, and applying the method to a series that is already converging quickly can slow down convergence. Aitkenâ€™s method This post looks at Aitkenâ€™s method for speeding up the [â€¦]

Accelerating an alternating series

by

John

on 2019-08-02 01:49 (#4MEER)

The most direct way of computing the sum of an alternating series, simply computing the partial sums in the terms get small enough, may not be the most efficient. Euler figured this out in the 18th century. For our demo weâ€™ll evaluate the Struve function defined by the series Note that the the terms in [â€¦]

Data breach trends

by

John

on 2019-07-31 13:13 (#4MACW)

Are data breaches becoming more or less common? This post gives a crude, back-of-the-envelope calculation to address the question. We wonâ€™t look at number of breaches per se but number of records breached. Thereâ€™s a terrific visualization of data breach statistics at Information is Beautiful, and they share their data here. Note that the data [â€¦]

Beating the odds on the Diffie-Hellman decision problem

by

John

on 2019-07-30 13:10 (#4M7PN)

There are a couple variations on the Diffie-Hellman problem in cryptography: the computation problem (CDH) and the decision problem (DDH). This post will explain both and give an example of where the former is hard and the latter easy. The Diffie-Hellman problems The Diffie-Hellman problems are formulated for an Abelian group. The main group we [â€¦]

Magic square links and errata

by

John

on 2019-07-27 00:59 (#4M17C)

Someone pointed out that what I called a knightâ€™s tour magic square is technically a semi-magic square: the rows and columns add up to the same constant, but the diagonals do not. It turns out there are no strict magic squares formed by knightâ€™s tours. This was proved in 2003. See a news article here. [â€¦]

Quaternion reference in the Vulgate

by

John

on 2019-07-24 16:52 (#4KV68)

To contemporary ears â€œquaternionâ€ refers to a number system discovered in the 19h century, but there were a couple precedents. Both refer to something related to a group of four things, but there is no relation to mathematical quaternions other than that they have four dimensions. Iâ€™ve written before about Miltonâ€™s use of the word [â€¦]

Fame, difficulty, and usefulness

by

John

on 2019-07-24 16:07 (#4KV69)

Pierre Fermat is best known for two theorems, dubbed his â€œlastâ€ theorem and his â€œlittleâ€ theorem. His last theorem is famous, difficult to prove, and useless. His little theorem is relatively arcane, easy to prove, and extremely useful. Thereâ€™s little relation between technical difficulty and usefulness. Fermatâ€™s last theorem Fermatâ€™s last theorem says there are [â€¦]

Twisted elliptic curves

by

John

on 2019-07-23 13:18 (#4KR52)

This morning I was sitting at a little bakery thinking about what to do before I check out of my hotel. I saw that the name of the bakery was Twist Bakery & Cafe, and that made me think of writing about twisted elliptic curves when I got back to my room. Twist of an [â€¦]

Hashing names does not protect privacy

by

John

on 2019-07-20 16:43 (#4KJMT)

Secure hash functions are practically impossible to reverse, but only if the input is unrestricted. If you generate 256 random bits and apply a secure 256-bit hash algorithm, an attacker wanting to recover your input canâ€™t do much better than brute force, hashing 256-bit strings hoping to find one that matches your hash value. Even [â€¦]

May the best technology win

by

John

on 2019-07-18 19:50 (#4KEV1)

Iâ€™ve become skeptical of arguments of the form â€œX is a better technology, but people wonâ€™t quit using Y.â€ Comparisons of technologies are multi-faceted. When someone says â€œX is better than Yâ€ I want to ask â€œBy all criteria? Thereâ€™s nothing better about Y?â€ When people say X is better but Y won, itâ€™s often [â€¦]

Integral approximation trick

by

John

on 2019-07-17 12:49 (#4KBCJ)

Hereâ€™s a simple integration approximation that works remarkably well in some contexts. Suppose you have an integrand that looks roughly like a normal density, something with a single peak that drops off fairly quickly on either side of the peak. The majority of integrals that arise in basic applications of probability and statistics fit this [â€¦]

Number of feet in a mile

by

John

on 2019-07-16 18:33 (#4K9RZ)

Here are a couple amusing things Iâ€™ve run across recently regarding the number of feet in a mile. Both are frivolous, but also have a more serious side. Mnemonic First, you can use â€œfive tomatoesâ€ as a mnemonic for remembering that there are 5280 feet in a mile. â€œFive tomatoesâ€ is a mnemonic for the [â€¦]

Discriminant of a cubic

by

John

on 2019-07-14 20:16 (#4K571)

The discriminant of a quadratic equation axÂ² + bx + c = 0 is Î” = bÂ² â€“ 4ac. If the discriminant Î” is zero, the equation has a double root, i.e. there is a unique x that makes the equation zero, and it counts twice as a root. If the discriminant is not zero, [â€¦]

Distribution of quadratic residues

by

John

on 2019-07-12 12:15 (#4K12Z)

Let p be an odd prime number. If the equation xÂ² = n mod p has a solution then n is a square mod p, or in classical terminology, n is a quadratic residue mod p. Half of the numbers between 0 and p are quadratic residues and half are not. The residues are distributed [â€¦]

What does CCPA say about de-identified data?

by

John

on 2019-07-12 01:13 (#4K04B)

The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my [â€¦]

Serious applications of a party trick

by

John

on 2019-07-11 13:59 (#4JYQ2)

In a group of 30 people, itâ€™s likely that two people have the same birthday. For a group of 23 the probability is about 1/2, and it goes up as the group gets larger. In a group of a few dozen people, itâ€™s unlikely that anyone will have a particular birthday, but itâ€™s likely that [â€¦]

Channel quantity and quality

by

John

on 2019-07-10 18:53 (#4JX3P)

Years ago, when there were a couple dozen television stations, someone [1] speculated that when we got more channels weâ€™d also get better content. The argument was that people are more similar in their base interests than in their more refined interests. Therefore if there are only a few channels, they will all appeal to [â€¦]

Symmetry in exponential sums

by

John

on 2019-07-09 14:14 (#4JT1M)

Todayâ€™s exponential sum is highly symmetric: These sums are often symmetric, but not always. For example, hereâ€™s the sum from a couple days ago: Itâ€™s not obvious from looking at the parameters whether a sum will be symmetric or not. Maybe someone could find a prove criteria for a sum to have certain symmetries. For [â€¦]

by

John

on 2019-07-06 11:41 (#4JM40)

David Bressoud has written a new book entitled Calculus Reordered: A History of the Big Ideas. He presents the major themes of calculus in historical order, which is very different from the order in which it is now taught. We now begin with limits, then differentiation, integration, and infinite series. Historically, integration came first and [â€¦]

Homomorphic encryption

by

John

on 2019-07-04 16:30 (#4JGX8)

A function that satisfies f(x*y) = f(x)*f(y) is called a homomorphism. The symbol â€œ*â€ can stand for any operation, and it need not stand for the same thing on both sides of the equation. Technically * is the group operation, and if the function f maps elements of one group to another, the group operation [â€¦]

Journalistic stunt with Emacs

by

John

on 2019-07-03 13:47 (#4JEA1)

Emacs has been called a text editor with ambitions of being an operating system, and some people semi-seriously refer to it as their operating system. Emacs does not want to be an operating system per se, but it is certainly ambitious. It can be a shell, a web browser, an email client, a calculator, a [â€¦]

Notes on computing hash functions

by

John

on 2019-07-01 15:03 (#4J9PS)

A secure hash function maps a file to a string of bits in a way that is hard to reverse. Ideally such a function has three properties: pre-image resistance collision resistance second pre-image resistance Pre-image resistance means that starting from the hash value, it is very difficult to infer what led to that output; it [â€¦]

Translating poetry

by

John

on 2019-06-28 12:49 (#4J4AS)

You canâ€™t preserve every aspect of a text when translating. A strict word-for-word translation attempts to be faithful to the words but may be ungrammatical in the target language. An idea-for-idea translation is more readable, but still may not convey the style of the original. Translation reminds me of making maps. There have been countless [â€¦]

Category theory for data science: cautious optimism

by

John

on 2019-06-26 13:37 (#4HZH7)

Iâ€™m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but Iâ€™m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty: [Jean-Pierre] Serre [â€¦]

Software to factor integers

by

John

on 2019-06-25 13:51 (#4HX2K)

In my previous post, I showed how changing one bit of a semiprime (i.e. the product of two primes) creates an integer that can be factored much faster. I started writing that post using Python with SymPy, but moved to Mathematica because factoring took too long. SymPy vs Mathematica When Iâ€™m working in Python, SymPy [â€¦]

Making public keys factorable with Rowhammer

by

John

on 2019-06-21 22:24 (#4HPFP)

The security of RSA encryption depends on the fact that the product of two large primes is difficult to factor. So if p and q are large primes, say 2048 bits each, then you can publish n = pq with little fear that someone can factor n to recover p and q. But if you [â€¦]

Bounds on the nth prime

by

John

on 2019-06-21 03:48 (#4HMF7)

The nth prime is approximately n log n. For more precise estimates, there are numerous upper and lower bounds for the nth prime, each tighter over some intervals than others. Here I want to point out upper and lower bounds from a dissertation by Christian Axler on page viii. First, define Then for sufficiently large [â€¦]

Converting between nines and sigmas

by

John

on 2019-06-20 11:20 (#4HJNC)

Nines and sigmas are two ways to measure quality. Youâ€™ll hear something has four or five nines of reliability or that some failure is a five sigma event. What do these mean, and how do you convert between them? Definitions If a system has fives nines of availability, that means the probability of the system [â€¦]

Maybe it’s just hard

by

John

on 2019-06-19 19:56 (#4HHBA)

If someone tells you repeatedly that something isnâ€™t hard, maybe itâ€™s just hard. Monads A post by Gilad Bracha got me thinking about this. He says Last time I looked, the Haskell wiki listed 29 tutorials on [monads]. â€¦ Could it just be that people just have a hard time understanding monads? If so, what [â€¦]

Why are regular expressions difficult?

by

John

on 2019-06-19 19:13 (#4HHBB)

Regular expressions are challenging, but not for the reasons commonly given. Non-reasons Here are some reasons given for the difficulty of regular expressions that I donâ€™t agree with. Cryptic syntax I think complaints about cryptic syntax miss the mark. Some people say that Greek is hard to learn because it uses a different alphabet. If [â€¦]

Feller-Tornier constant

by

John

on 2019-06-19 02:28 (#4HFFR)

Hereâ€™s kind of an unusual question: What is the density of integers that have an even number of prime factors with an exponent greater than 1? To define the density, you take the proportion up to an integer N then take the limit as N goes to infinity. Itâ€™s not obvious that the limit should [â€¦]

Translating Robert Burns

by

John

on 2019-06-18 14:28 (#4HE56)

Last year Adam Roberts had some fun with Finnegans Wake [1], seeing how little he could edit it and turn it into something that sounded like Return of the Jedi. I wrote a blog post where I quantified the difference between the original and the parody using Levenshtein distance, basically how many edits it takes [â€¦]

...35 36 37 383940 41 42 43 44...