Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-11-23 01:47
Dose finding != dose escalation
You’ll often hear Phase I dose-finding trials referred to as dose escalation studies. This is because simple dose-finding methods can only explore in one direction: they can only escalate. Three-plus-three rule The most common dose finding method is the 3+3 rule. There are countless variations on this theme, but the basic idea is that you give […]
RSA implementation flaws
Implementation flaws in RSA encryption make it less secure in practice than in theory. RSA encryption depends on 5 numbers: large primes p and q, the modulus n = pq, the encryption key e, and the decryption key d. The numbers p, q, and d are kept secret, and the numbers e and n are made public. The encryption method relies on the assumption that in practice one cannot […]
Supercookies
Supercookies, also known as evercookies or zombie cookies, are like browser cookies in that they can be used to track you, but are much harder to remove. What is a supercookie? The way I first heard supercookies described was as a cookie that you can appear to delete, but as soon as you do, software […]
Exploring the sum-product conjecture
Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we […]
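As a minimal sketch of the two sets described in this excerpt, the following Python counts the distinct sums and products for a small set A. The function name and the two example sets are illustrative and not taken from the post. The sketch assumes the common convention that an element may be paired with itself.

```python
def sum_and_product_sizes(A):
    """Return (|A+A|, |A*A|) for a finite collection of numbers A."""
    A = set(A)
    # Every pairwise sum and product, including an element paired with itself.
    sums = {a + b for a in A for b in A}
    products = {a * b for a in A for b in A}
    return len(sums), len(products)


# An arithmetic progression tends to have few sums but many products.
arithmetic = range(1, 11)
# A geometric progression tends to have few products but many sums.
geometric = [2**k for k in range(10)]

print(sum_and_product_sizes(arithmetic))
print(sum_and_product_sizes(geometric))
```

Comparing the two printed pairs gives a feel for the trade-off: a set chosen to keep one of the two counts small pushes the other one up.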
Normal approximation to Laplace distribution?
I heard the phrase “normal approximation to the Laplace distribution” recently and did a double take. The normal distribution does not approximate the Laplace! Normal and Laplace distributions A normal distribution has the familiar bell curve shape. A Laplace distribution, also known as a double exponential distribution, is pointed in the middle, like a pole […]
Probabilistic Identifiers in CCPA
The CCPA, the California Consumer Privacy Act, was passed last year and goes into effect at the beginning of next year. And just as the GDPR impacts businesses outside Europe, the CCPA will impact businesses outside California. The law specifically mentions probabilistic identifiers. “Probabilistic identifier” means the identification of a consumer or a device to a […]
Font Fingerprinting
Web sites may not be able to identify you, but they can probably identify your web browser. Your browser sends a lot of information back to web servers, and the combination of settings for a particular browser is usually unique. To get an idea what information we’re talking about, you could take a look at […]
The Soviet license plate game and Kolmogorov complexity
Physicist Lev Landau used to play a mental game with Soviet license plates [1]. The plates had the form of two digits, a dash, two more digits, and some letters. Rules of the game His game was to apply high school math operators to the numbers on both sides of the dash so that the […]
3,000th blog post
I just saw that I’d written 2,999 blog posts, so that makes this one the 3,000th. About a year ago was the 10th anniversary, and Tim Hopper wrote his retrospective about my blog. In addition to chronological blog posts, there are about 200 “pages” on the site, mostly technical notes. These include the most popular […]
Economics, power laws, and hacking
Increasing costs impact some players more than others. Those who know about power laws and know how to prioritize are impacted less than those who naively believe everything is equally important. This post will look at economics and power laws in the context of password cracking. Increasing the cost of verifying a password does not […]
Varsity versus junior varsity sports
Yesterday my wife and I watched our daughter’s junior varsity soccer game. Several statistical questions came to mind. Larger schools tend to have better sports teams. If the talent distributions of a large school and a small school are the same, the larger school will have a better team because its players are the best […]
The most low-key newsletter
My monthly newsletter is one of the most low-key ones around. It’s almost a secret. You can find it via the navigation menu if you look for it. I won’t put a popup on my site cajoling you to subscribe, nor will I ask you to sign up before letting you read something I’ve written. […]
Salting and stretching a password
This post will look at a progression of ways to store passwords, from naive to sophisticated. Most naive: clear text Storing passwords in plain text is the least secure thing a server could do. If this list is leaked, someone knows all the passwords with no effort. Better: hash values A better approach would be to […]
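The excerpt stops before it reaches salting and stretching, so the following is only a sketch of what those two ideas usually look like, not the post's own method. It assumes PBKDF2 with SHA-256 from Python's standard library as the stretching function. The iteration count and the function names are illustrative, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor (assumption), not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, using a random per-user salt."""
    if salt is None:
        salt = os.urandom(16)  # salting: identical passwords get different hashes
    # Stretching: iterate the hash many times to make each guess expensive.
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

A server would store the salt and the digest for each user, never the password itself.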
Reversing an MD5 hash
The MD5 hashing algorithm was once considered a secure cryptographic hash, but those days are long gone [1]. For a given hash value, it doesn’t take much computing power to create a document with the same hash. Hash functions are not reversible in general. MD5 is a 128-bit hash, and so it maps any string, no […]
The science of waiting in line
There’s a branch of math that studies how people wait in line: queueing theory. It’s not just about people standing in line, but about any system with clients and servers. An introduction to queueing theory, about what you’d learn in one or two lectures, is very valuable for understanding how the world around you works. […]
Saxophone with short bell
Paul A. sent me a photo of his alto sax in response to my previous post on a saxophone with two octave keys. His saxophone also has two octave keys, and it has a short bell. Contemporary saxophones have a longer bell, go down to B flat, and have two large pads on the bell. […]
A convergence problem going around Twitter
Ten days ago, Fermat’s library posted a tweet saying that it is unknown whether the sum converges or diverges, stirring up a lot of discussion. Sam Walters has been part of this discussion and pointed to a paper that says this is known as the Flint Hills series. My first thought was to replace the […]
Please update your RSS subscription
When I started this blog I routed my RSS feed through Feedburner, and now Feedburner is no longer working for my site. If you subscribed by RSS, please check the feed URL. It should be https://www.johndcook.com/blog/feed which was previously forwarded to a Feedburner URL. If you subscribe directly to the feed with my domain, it […]
Big O tilde notation
There’s a variation on Landau’s big-O notation [1] that’s starting to become more common, one that puts a tilde on top of the O. At first it looks like a typo, a stray diacritic mark. What does that mean? In short, That is, big O tilde notation ignores logarithmic factors. For example, the FFT algorithm computes […]
Unstructured data is an oxymoron
Strictly speaking, “unstructured data” is a contradiction in terms. Data must have structure to be comprehensible. By “unstructured data” people usually mean data with a non-tabular structure. Tabular data is data that comes in tables. Each row corresponds to a subject, and each column corresponds to a kind of measurement. This is the easiest data to […]
How fast can you multiply really big numbers?
How long does it take to multiply very large integers? Using the algorithm you learned in elementary school, it takes O(n²) operations to multiply two n digit numbers. But for large enough numbers it pays to carry out multiplication very differently, using FFTs. If you’re multiplying integers with tens of thousands of decimal digits, the […]
Stigler’s law and human nature
Stigler’s law of eponymy states that no scientific discovery is named after the first person to discover it. Stephen Stigler acknowledged that he was not the first to realize this. Of course this is just an aphorism. Sometimes discoveries are indeed named after their discoverers. But the times when this isn’t the case are more […]
Projecting Unicode to ASCII
Sometimes you need to downgrade Unicode text to more restricted ASCII text. For example, while working on my previous post, I was surprised that there didn’t appear to be an asteroid named after Poincaré. There is one, but it was listed as Poincare in my list of asteroid names. Python module I used the Python module unidecode […]
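The post uses the third-party unidecode module. As a rough standard-library alternative, and not the post's method, one can decompose accented characters and drop the combining marks. The function name below is hypothetical. Unlike a transliteration library, this sketch silently drops characters that have no ASCII decomposition.

```python
import unicodedata


def to_ascii(text):
    """Strip diacritics by decomposing characters and discarding non-ASCII parts."""
    # NFKD splits a character such as e-acute into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the combining marks and anything else
    # that is not ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Poincaré"))
```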
Asteroids named after mathematicians
This evening I stumbled on the fact that John von Neumann and Fibonacci both have asteroids named after them. Then I wondered how many more famous mathematicians have asteroids named after them. As it turns out, most of them: Euclid, Gauss, Cauchy, Noether, Gödel, Ramanujan, … It’s easier to look at the names that are […]
Why are dates of service prohibited under HIPAA’s Safe Harbor provision?
The HIPAA Privacy Rule offers two ways to say that data has been de-identified: Safe Harbor and expert determination. This post is about the former. I help companies with the latter. Safe Harbor provision The Safe Harbor provision lists 18 categories of data that would cause a data set to not be considered de-identified unless […]
Exponential sums in 2019
I’ve made a small change in my exponential sum page. I’ll need to give a little background before explaining the change. First of all, you can read exactly what these exponential sums are here. These plots can be periodic in two senses. The first is simply repeating the same sequence of points. The second is […]
Goldbach’s conjecture, Lagrange’s theorem, and 2019
The previous post showed how to find all groups whose order is a product of two primes using 2019 as an example. Here are a couple more observations along the same line, illustrating the odd Goldbach conjecture and Lagrange’s four-square theorem with 2019. Odd Goldbach Conjecture Goldbach’s conjecture says that every even number greater than […]
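Lagrange's four-square theorem says every nonnegative integer is a sum of four integer squares. The following is a small brute-force sketch that finds one such representation. The function name is illustrative, and the search is meant for small numbers like 2019 rather than as an efficient algorithm.

```python
from math import isqrt  # requires Python 3.8+


def four_squares(n):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a² + b² + c² + d² == n."""
    for a in range(isqrt(n), -1, -1):
        rest_a = n - a * a
        for b in range(min(a, isqrt(rest_a)), -1, -1):
            rest_b = rest_a - b * b
            for c in range(min(b, isqrt(rest_b)), -1, -1):
                rest_c = rest_b - c * c
                d = isqrt(rest_c)
                # Accept only if the remainder is a perfect square no larger than c².
                if d <= c and d * d == rest_c:
                    return a, b, c, d
    return None  # unreachable for n >= 0, by Lagrange's theorem


print(four_squares(2019))
```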
Groups of order 2019
How many groups have 2019 elements? What are these groups? 2019 is a semiprime, i.e. the product of two primes, 3 and 673. For every semiprime s, there are either one or two distinct groups of order s. As explained here, if s = pq with p > q, all groups of order s are isomorphic if q is not a factor of p-1. […]
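The criterion in this excerpt can be checked in a line of arithmetic. The sketch below uses a hypothetical function name and assumes two distinct primes p > q. It returns 1 when q does not divide p − 1, and 2 otherwise, since the excerpt says there are either one or two groups.

```python
def groups_of_semiprime_order(p, q):
    """Number of groups of order p*q for distinct primes p > q."""
    if p <= q:
        raise ValueError("expected distinct primes with p > q")
    # If q divides p - 1 there is a non-abelian group in addition to the cyclic one.
    return 2 if (p - 1) % q == 0 else 1


# 2019 = 3 * 673, and 672 = 3 * 224, so 3 divides 673 - 1.
print(groups_of_semiprime_order(673, 3))
```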
Flattest US states
I read somewhere that, contrary to popular belief, Kansas is not the flattest state in the US. Instead, Florida is the flattest, and Kansas was several notches further down the list. (Update: Nevertheless, Kansas is literally flatter than a pancake. Thanks to Andreas Krause for the link.) How would you measure the flatness of a […]
Naked eye view vs photos of the northern lights
My daughter Elizabeth recently photographed the northern lights (aurora borealis) in Tromsø, Norway, about 3° above the Arctic Circle. I haven’t seen the northern lights in person, and I didn’t know until she told me that the lights appear gray to the naked eye, like smoke. Sometimes the lights have a hint of color, but […]
Check sums and error detection
The previous post looked at Crockford’s base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0. Crockford recommends the following check sum procedure, a simple error detection code: The check […]
Base 32 and base 64 encoding
Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical. Math convention By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 […]
Most popular posts of 2018
Here are 10 of my most popular posts this year, arranged as pairs of posts in different areas. Astronomy Planets evenly spaced on a log scale Gravity, stars, and cows Programming Viability of unpopular programming languages Currying in various contexts Computer arithmetic The quadratic formula in low-precision arithmetic Eight-bit floating point Math Computing SVD and […]
New prime record: 51st Mersenne prime discovered
A new prime record was announced yesterday. The largest known prime is now Written in hexadecimal the newly discovered prime is For decades the largest known prime has been a Mersenne prime because there’s an efficient test for checking whether a Mersenne number is prime. I explain the test here. There are now 51 known […]
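The excerpt does not name the efficient test it refers to. The sketch below assumes it is the Lucas–Lehmer test, the standard primality test for Mersenne numbers. The function name is illustrative, and the argument p is assumed to be prime.

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test: is 2**p - 1 prime? Assumes p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs an odd prime p
    m = (1 << p) - 1
    s = 4
    # Iterate s -> s² - 2 (mod m) exactly p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print([p for p in small_primes if is_mersenne_prime(p)])
```

Each iteration costs one squaring modulo 2^p − 1, which is why numbers of this form can be tested at sizes far beyond what general primality tests handle.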
Multi-arm adaptively randomized clinical trials
This post will look at adaptively randomized trial designs. In particular, we want to focus on multi-arm trials, i.e. trials of more than two treatments. The aim is to drop the less effective treatments quickly so the trial can focus on determining which of the better treatments is best. We’ll briefly review our approach to […]
Kepler and the contraction mapping theorem
The contraction mapping theorem says that if a function moves points closer together, then there must be some point the function doesn’t move. We’ll make this statement more precise and give a historically important application. Definitions and theorem A function f on a metric space X is a contraction if there exists a constant q with […]
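The excerpt is cut off before the application, so the following is a guess based on the post's title: solving Kepler's equation M = E − e sin E by fixed-point iteration. The map E ↦ M + e sin E is a contraction when 0 ≤ e < 1, because its derivative is bounded by e. The function name, tolerance, and iteration cap are all illustrative assumptions.

```python
import math


def solve_kepler(M, e, tol=1e-12, max_iter=1000):
    """Solve M = E - e*sin(E) for E by iterating E -> M + e*sin(E)."""
    if not 0 <= e < 1:
        raise ValueError("the iteration is a contraction only for 0 <= e < 1")
    E = M  # any starting point works for a contraction; M is a natural choice
    for _ in range(max_iter):
        E_next = M + e * math.sin(E)
        if abs(E_next - E) < tol:
            return E_next
        E = E_next
    raise RuntimeError("fixed-point iteration did not converge")


E = solve_kepler(1.0, 0.5)
print(E, E - 0.5 * math.sin(E))  # the second value should be close to M = 1.0
```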
Trademark symbol, LaTeX, and Unicode
Earlier this year I was a coauthor on a paper about the Cap Score™ test for male fertility from Androvia Life Sciences [1]. I just noticed today that when I added the publication to my CV, it caused some garbled text to appear in the PDF. Here is the corresponding LaTeX source code. Fixing the […]
RSA with one shared prime
The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product. Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p […]
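The excerpt ends before saying what goes wrong, but the standard observation is that two moduli with a common prime factor give that factor up through a greatest common divisor computation, which is fast even for very large numbers. A toy sketch follows, with tiny primes and a hypothetical function name.

```python
from math import gcd


def recover_shared_prime(n1, n2):
    """Return a nontrivial common factor of two RSA moduli, or None if there is none."""
    g = gcd(n1, n2)
    return g if 1 < g < min(n1, n2) else None


# Toy moduli: both use the prime 101 (real RSA primes are hundreds of digits long).
n1 = 101 * 103
n2 = 101 * 107
p = recover_shared_prime(n1, n2)
print(p, n1 // p, n2 // p)  # the shared prime and the two remaining factors
```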
Following an idea to its logical conclusion
Following an idea to its logical conclusion might be extrapolating a model beyond its valid range. Suppose you have a football field with area A. If you make two parallel sides twice as long, then the area will be 2A. If you double the length of the sides again, the area will be 4A. Following this […]
Technological optimism
Kevin Kelly is one of the most optimistic people writing about technology, but there’s a nuance to his optimism that isn’t widely appreciated. Kelly sees technological progress as steady and inevitable, but not monotone. He has often said that new technologies create almost as many problems as they solve. Maybe it’s 10 steps forward and […]
RSA with Pseudoprimes
RSA setup Recall the setup for RSA encryption given in the previous post. Select two very large prime numbers p and q. Compute n = pq and φ(n) = (p – 1)(q – 1). Choose an encryption key e relatively prime to φ(n). Calculate the decryption key d such that ed = 1 (mod φ(n)). Publish e and n, and keep d, p, and q secret. φ is Euler’s totient function, defined here. There’s a complication in the first […]
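The setup steps in this excerpt can be traced with toy numbers. The sketch below uses tiny primes for illustration only. The function name and the sample values are assumptions, and the code is nowhere near a secure implementation.

```python
from math import gcd


def make_rsa_keys(p, q, e):
    """Return (n, e, d) following the setup steps, for primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse: e*d = 1 (mod phi); needs Python 3.8+
    return n, e, d


n, e, d = make_rsa_keys(61, 53, 17)
message = 65
cipher = pow(message, e, n)    # encrypt with the public key (e, n)
recovered = pow(cipher, d, n)  # decrypt with the private key d
print(n, d, recovered == message)
```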
Can I have the last four digits of your social?
Imagine this conversation. “Could you tell me your social security number?” “Absolutely not! That’s private.” “OK, how about just the last four digits?” “Oh, OK. That’s fine.” When I was in college, professors would post grades by the last four digits of student social security numbers. Now that seems incredibly naive, but no one objected […]
RSA encryption exponents are mostly all the same
The big idea of public key cryptography is that it lets you publish an encryption key e without compromising your decryption key d. A somewhat surprising detail of RSA public key cryptography is that in practice e is nearly always the same number, specifically e = 65537. We will review RSA, explain how this default e was chosen, and discuss why […]
Revealing information by trying to suppress it
FAS posted an article yesterday explaining how blurring military installations out of satellite photos draws attention to them, showing exactly where they are and how big they are. The Russian mapping service Yandex Maps blurred out sensitive locations in Israel and Turkey. As the article says, this is an example of the Streisand effect, […]
Numerical methods blog posts
I recently got a review copy of Scientific Computing: A Historical Perspective by Bertil Gustafsson. I thought that thumbing through the book might give me ideas for new topics to blog about. It still may, but mostly it made me think of numerical methods I’ve already blogged about. In historical order, or at least in the […]
Simulating identification by zip code, sex, and birthdate
As mentioned in the previous post, Latanya Sweeney estimated that 87% of Americans can be identified by the combination of zip code, sex, and birth date. We’ll do a quick-and-dirty estimate and a simulation to show that this result is plausible. There’s no point being too realistic with a simulation because the actual data that […]
No funding for uncomfortable results
In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous. The state of Massachusetts had released data on 135,000 state employees and their families with obvious identifiers removed. However, the data contained zip code, birth date, and sex for each individual. Sweeney was able to cross reference this data with publicly available […]
Sine of a googol
How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean? Sine of a trillion Let’s start by finding the sine of a trillion (10¹²) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only […]
Six degrees of Kevin Bacon, Paul Erdos, and Wikipedia
I just discovered the web site Six Degrees of Wikipedia. It lets you enter two topics and it will show you how few hops it can take to get from one to the other. Since the mathematical equivalent of Six Degrees of Kevin Bacon is Six degrees of Paul Erdős, I tried looking for the […]
Mersenne prime trend
Mersenne primes have the form 2^p - 1 where p is a prime. The graph below plots the trend in the size of these numbers based on the 50 Mersenne primes currently known. The vertical axis plots the exponents p, which are essentially the logs base 2 of the Mersenne primes. The scale is logarithmic, so […]