Feed john-d-cook John D. Cook

Favorite IconJohn D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-04-26 03:16
Google Adiantum and the ChaCha RNG
The ChaCha cryptographic random number generator is in the news thanks to Google’s Adiantum project. I’ll discuss what’s going on, but first a little background. The name of the project comes from a genus of fern. More on that below as well. One-time pads The one-time pad is a provably unbreakable way to encrypt things. […]
Congress and the Equifax data breach
Dialog from a congressional hearing February 26, 2019. Representative Katie Porter: My question for you is whether you would be willing to share today your social security, your birth date, and your address at this public hearing. Equifax CEO Mark Begor: I would be a bit uncomfortable doing that, Congresswoman. If you’d so oblige me, […]
Sharing secrets with polynomials
This post will present a couple ways to share secrets using polynomials. We have a group of n people who want to share a secret between them so that k of them will have to cooperate in order to unlock the secret. For example, maybe a committee of n = 5 wants to require the cooperation of […]
Miscellaneous
Image editor Image editing software is complicated, and I don’t use it often enough to remember how to do much. I like Paint.NET on Windows because it is in a sort of sweet spot for me, more powerful than Paint and much less complicated than Photoshop. I found out there’s a program Pinta for Linux […]
What sticks in your head
This morning I read an article by Dennis Felsing about his impressive/intimidating Linux desktop setup. He uses a lot of tools that are not the easiest way to get things done immediately but are long-term productivity investments. Remembrance of syntax past Felsing apparently is able to remember the syntax of scores of tools and programming […]
Testing for primes less than a quintillion
The most common way to test whether a large number is prime is the Miller-Rabin test. If the test says a number is composite, it’s definitely composite. Otherwise the number is very likely, but not certain, to be prime. A pseudoprime is a composite number that slips past the Miller-Rabin test. (Actually, a strong pseudoprime. […]
The point at infinity
As I explained in an earlier post, a first pass at the definition of an elliptic curve is the set of points satisfying y² = x³ + ax + b. There are a few things missing from this definition, as indicated before, one being the mysterious “point at infinity.” I gave a hand-waving explanation that […]
More of everything
If you want your music to have more bass, more mid-range, and more treble, then you just want the music louder. You can increase all three components in absolute terms, but not in relative terms. You can’t increase the proportions of everything. Would you like more students to major in STEM subjects? OK, what subjects […]
Regression, modular arithmetic, and PQC
Linear regression Suppose you have a linear regression with a couple predictors and no intercept term: β1x1 + β2x2 = y + ε where the x‘s are inputs, the β are fixed but unknown, y is the output, and ε is random error. Given n observations (x1, x2, y + ε), linear regression estimates the parameters β1 […]
What is an elliptic curve?
Elliptic curves are pure and applied, concrete and abstract, simple and complex. Elliptic curves have been studied for many years by pure mathematicians with no intention to apply the results to anything outside math itself. And yet elliptic curves have become a critical part of applied cryptography. Elliptic curves are very concrete. There are some […]
Microsoft replacing SHA-1
According to this article, Microsoft is patching Windows 7 and Windows Server 2008 to look for SHA-2 hash functions of updates. These older versions of Windows have been using SHA-1, while newer version are already using SHA-2. This is a good move, but unnecessary. Here’s what I mean by that. The update was likely unnecessary […]
Hash function menagerie
Here’s an oversimplified survey of cryptographic hash functions: Everyone used to use MD5, now they use some variation on SHA. There’s some truth to that. MD5 was very popular, and remains popular years after it was proven insecure. And now variations on SHA like SHA1 and SHA256 are commonly used. But there are a lot […]
Addition on Curve1174
I’ve written about elliptic curve and alluded to the fact that there’s a special kind of addition for points on the curve. But I haven’t gone into details because it’s more complicated than I wanted to get into. However, there’s a special case where the details are not complicated, the so called Edwards curves. I’ll look […]
The hard part in becoming a command line wizard
I’ve long been impressed by shell one-liners. They seem like magical incantations. Pipe a few terse commands together, et voilà! Out pops the solution to a problem that would seem to require pages of code. Are these one-liners real or mythology? To some extent, they’re both. Below I’ll give a famous real example. Then I’ll argue […]
Naming elliptic curves used in cryptography
There are an infinite number of elliptic curves, but a small number that are used in elliptic curve cryptography (ECC), and these special curves have names. Apparently there are no hard and fast rules for how the names are chosen, but there are patterns. The named elliptic curves are over a prime field, i.e. a […]
Entropy extractor used in μRNG
Yesterday I mentioned μRNG, a true random number generator (TRNG) that takes physical sources of randomness as input. These sources are independent but non-uniform. This post will present the entropy extractor μRNG uses to take non-uniform bits as input and produce uniform bits as output. We will present Python code for playing with the entropy extractor. (μRNG […]
Solving for probability given entropy
If a coin comes up heads with probability p and tails with probability 1-p, the entropy in the coin flip is S = –p log2 p – (1-p) log2 (1-p). It’s common to start with p and compute entropy, but recently I had to go the other way around: given entropy, solve for p. It’s easy to come up […]
Missing information anxiety
A recurring theme in math is that you may not need to do what it looks like you need to do. There may be a shortcut to where you want to go. A special case of this is that you may not need all the information that you think you need. For example, if you […]
Sum-product theorem for finite fields
A week ago I wrote about using some Python code to play with the sum-product theorem of Erdős and Szemerédi and its conjectured refinement. This morning I learned that the Erdős-Szemerédi theorem has been extended to finite fields. David Johnston left a comment saying that he and his colleagues used this extension to finite fields as […]
Computing Legendre and Jacobi symbols
In a earlier post I introduce the Legendre symbol where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and -1 otherwise. The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It […]
Twitter account for data privacy
I’ve started a new Twitter account for data privacy and related topics. Twitter gave me the handle @data_tip even though that’s not what I typed in, and what I typed in is not being used. Apparently they don’t let you pick your handle any more. Data Privacy (@data_tip)
Dose finding != dose escalation
You’ll often hear Phase I dose-finding trials referred to as dose escalation studies. This is because simple dose-finding methods can only explore in one direction: they can only escalate. Three-plus-three rule The most common dose finding method is the 3+3 rule. There are countless variations on this theme, but the basic idea is that you give […]
RSA implementation flaws
Implementation flaws in RSA encryption make it less secure in practice than in theory. RSA encryption depends on 5 numbers: Large primes p and q The modulus n = pq Encryption key e Decryption key d The numbers p, q, and d are kept secret, and the numbers e and n are made public. The encryption method relies on the assumption that in practice one cannot […]
Supercookies
Supercookies, also known as evercookies or zombie cookies, are like browser cookies in that they can be used to track you, but are much harder to remove. What is a supercookie? The way I first heard supercookies describe was as a cookie that you can appear to delete, but as soon as you do, software […]
Exploring the sum-product conjecture
Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we […]
Normal approximation to Laplace distribution?
I heard the phrase “normal approximation to the Laplace distribution” recently and did a double take. The normal distribution does not approximate the Laplace! Normal and Laplace distributions A normal distribution has the familiar bell curve shape. A Laplace distribution, also known as a double exponential distribution, it pointed in the middle, like a pole […]
Probabilisitic Identifiers in CCPA
The CCPA, the California Privacy Protection Act, was passed last year and goes into effect at the beginning of next year. And just as the GDPR impacts businesses outside Europe, the CCPA will impact businesses outside California. The law specifically mentions probabilistic identifiers. “Probabilistic identifier” means the identification of a consumer or a device to a […]
Font Fingerprinting
Web sites may not be able to identify you, but they can probably identify your web browser. Your browser sends a lot of information back to web servers, and the combination of settings for a particular browser are usually unique. To get an idea what information we’re talking about, you could take a look at […]
The Soviet license plate game and Kolmogorov complexity
Physicist Lev Landau used to play a mental game with Soviet license plates [1]. The plates had the form of two digits, a dash, two more digits, and some letters. Rules of the game His game was to apply high school math operators to the numbers on both side of the dash so that the […]
3,000th blog post
I just saw that I’d written 2,999 blog posts, so that makes this one the 3,000th. About a year ago was the 10th anniversary, and Tim Hopper wrote his retrospective about my blog. In addition to chronological blog posts, there are about 200 “pages” on the site, mostly technical notes. These include the most popular […]
Economics, power laws, and hacking
Increasing costs impact some players more than others. Those who know about power laws and know how to prioritize are impacted less than those who naively believe everything is equally important. This post will look at economics and power laws in the context of password cracking. Increasing the cost of verifying a password does not […]
Varsity versus junior varsity sports
Yesterday my wife and I watched our daughter’s junior varsity soccer game. Several statistical questions came to mind. Larger schools tend to have better sports teams. If the talent distributions of a large school and a small school are the same, the larger school will have a better team because its players are the best […]
The most low-key newsletter
My monthly newsletter is one of the most low-key ones around. It’s almost a secret. You can find it via the navigation menu if you look for it. I won’t put a popup on my site cajoling you to subscribe, nor will I ask you to sign up before letting you read something I’ve written. […]
Salting and stretching a password
This post will look at a progression of ways to store passwords, from naive to sophisticated. Most naive: clear text Storing passwords in plain text is least secure thing a server could do. If this list is leaked, someone knows all the passwords with no effort. Better: hash values A better approach would be to […]
Reversing an MD5 hash
The MD5 hashing algorithm was once considered secure cryptographic hash, but those days are long gone [1]. For a given hash value, it doesn’t take much computing power to create a document with the same hash. Hash functions are not reversible in general. MD5 is a 128-bit hash, and so it maps any string, no […]
The science of waiting in line
There’s a branch of math that studies how people wait in line: queueing theory. It’s not just about people standing in line, but about any system with clients and servers. An introduction to queueing theory, about what you’d learn in one or two lectures, is very valuable for understanding how the world around you works. […]
Saxophone with short bell
Paul A. sent me a photo of his alto sax in response to my previous post on a saxophone with two octave keys. His saxophone also has two octave keys, and it has a short bell. Contemporary saxophones have a longer bell, go down to B flat, and have two large pads on the bell. […]
A convergence problem going around Twitter
Ten days ago, Fermat’s library posted a tweet saying that it is unknown whether the sum converges or diverges, stirring up a lot of discussion. Sam Walters has been part of this discussion and pointed to a paper that says this is known as the Flint Hills series. My first thought was to replace the […]
Please update your RSS subscription
When I started this blog I routed my RSS feed through Feedburner, and now Feedburner is no longer working for my site. If you subscribed by RSS, please check the feed URL. It should be https://www.johndcook.com/blog/feed which was previously forwarded to a Feedburner URL. If you subscribe directly to the feed with my domain, it […]
Big O tilde notation
There’s a variation on Landau’s big-O notation [1] that’s starting to become more common, one that puts a tilde on top of the O. At first it looks like a typo, a stray diacritic mark. What does that mean? In short, That is, big O tilde notation ignores logarithmic factors. For example, the FFT algorithm computes […]
Unstructured data is an oxymoron
Strictly speaking, “unstructured data” is a contradiction in terms. Data must have structure to be comprehensible. By “unstructured data” people usually mean data with a non-tabular structure. Tabular data is data that comes in tables. Each row corresponds to a subject, and each column corresponds to a kind of measurement. This is the easiest data to […]
How fast can you multiply really big numbers?
How long does it take to multiply very large integers? Using the algorithm you learned in elementary school, it takes O(n²) operations to multiply two n digit numbers. But for large enough numbers it pays to carry out multiplication very differently, using FFTs. If you’re multiplying integers with tens of thousands of decimal digits, the […]
Stigler’s law and human nature
Stigler’s law of eponymy states that no scientific discovery is named after the first person to discover it. Stephen Stigler acknowledged that he was not the first to realize this. Of course this is just an aphorism. Sometimes discoveries are indeed named after their discoverers. But the times when this isn’t the case are more […]
Projecting Unicode to ASCII
Sometimes you need to downgrade Unicode text to more restricted ASCII text. For example, while working on my previous post, I was surprised that there didn’t appear to be an asteroid named after Poincaré. There is one, but it was listed as Poincare in my list of asteroid names. Python module I used the Python module unidecode […]
Asteroids named after mathematicians
This evening I stumbled on the fact that John von Neumann and Fibonacci both have asteroids named after them. Then I wondered how many more famous mathematicians have asteroids named after them. As it turns out, most of them: Euclid, Gauss, Cauchy, Noether, Gödel, Ramanujan, … It’s easier to look at the names that are […]
Why are dates of service prohibited under HIPAA’s Safe Harbor provision?
The HIPAA Privacy Rule offers two ways to say that data has been de-identified: Safe Harbor and expert determination. This post is about the former. I help companies with the latter. Safe Harbor provision The Safe Harbor provision lists 18 categories of data that would cause a data set to not be considered de-identified unless […]
Exponential sums in 2019
I’ve made a small change in my exponential sum page. I’ll need to give a little background before explaining the change. First of all, you can read exactly what these exponential sums are here. These plots can be periodic in two senses. The first is simply repeating the same sequence of points. The second is […]
Goldbach’s conjecture, Lagrange’s theorem, and 2019
The previous post showed how to find all groups whose order is a product of two primes using 2019 as an example. Here are a couple more observations along the same line, illustrating the odd Goldbach conjecture and Lagrange’s four-square theorem with 2019. Odd Goldbach Conjecture Goldbach’s conjecture says that every even number greater than […]
Groups of order 2019
How many groups have 2019 elements? What are these groups? 2019 is a semiprime, i.e. the product of two primes, 3 and 673. For every semiprime s, there are either one or two distinct groups of order s. As explained here, if s = pq with p > q, all groups of order s are isomorphic if q is not a factor of p-1. […]
Flattest US states
I read somewhere that, contrary to popular belief, Kansas is not the flattest state in the US. Instead, Florida is the flattest, and Kansas was several notches further down the list. (Update: Nevertheless, Kansas is literally flatter than a pancake. Thanks to Andreas Krause for the link.) How would you measure the flatness of a […]
...36373839404142434445...