Feed john-d-cook John D. Cook


Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-03-07 01:01
Why are dates of service prohibited under HIPAA’s Safe Harbor provision?
The HIPAA Privacy Rule offers two ways to say that data has been de-identified: Safe Harbor and expert determination. This post is about the former. I help companies with the latter. Safe Harbor provision The Safe Harbor provision lists 18 categories of data that would cause a data set to not be considered de-identified unless […]
Exponential sums in 2019
I’ve made a small change in my exponential sum page. I’ll need to give a little background before explaining the change. First of all, you can read exactly what these exponential sums are here. These plots can be periodic in two senses. The first is simply repeating the same sequence of points. The second is […]
Goldbach’s conjecture, Lagrange’s theorem, and 2019
The previous post showed how to find all groups whose order is a product of two primes using 2019 as an example. Here are a couple more observations along the same line, illustrating the odd Goldbach conjecture and Lagrange’s four-square theorem with 2019. Odd Goldbach Conjecture Goldbach’s conjecture says that every even number greater than […]
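Both observations are easy to check by brute force. The Python sketch below finds one way to write 2019 as a sum of three primes and one way to write it as a sum of four squares. The search order is an arbitrary choice made for this sketch, so the decomposition it returns may not be the one used in the post.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; fine for numbers the size of 2019."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def three_primes(n: int) -> tuple:
    """Return the first primes p <= q <= r (in search order) with p + q + r == n."""
    primes = [p for p in range(2, n) if is_prime(p)]
    prime_set = set(primes)
    for i, p in enumerate(primes):
        for q in primes[i:]:
            r = n - p - q
            if r < q:  # keep p <= q <= r; a larger q only makes r smaller
                break
            if r in prime_set:
                return p, q, r
    raise ValueError(f"no three-prime representation found for {n}")


def four_squares(n: int) -> tuple:
    """Return a <= b <= c <= d with a^2 + b^2 + c^2 + d^2 == n (Lagrange says one exists)."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            for c in range(b, isqrt(n - a * a - b * b) + 1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d >= c and d * d == rest:
                    return a, b, c, d
    raise ValueError(f"no four-square representation found for {n}")


print(three_primes(2019))
print(four_squares(2019))
```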
Groups of order 2019
How many groups have 2019 elements? What are these groups? 2019 is a semiprime, i.e. the product of two primes, 3 and 673. For every semiprime s, there are either one or two distinct groups of order s. As explained here, if s = pq with p > q, all groups of order s are isomorphic if q is not a factor of p-1. […]
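The criterion above translates directly into code. This sketch assumes n is a product of two distinct primes and applies the rule as stated. When q divides p − 1 there are two groups: the cyclic group and one non-abelian semidirect product. Otherwise only the cyclic group exists.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def count_groups_pq(n: int) -> int:
    """Number of groups of order n = p*q (p > q distinct primes), up to isomorphism."""
    # The smallest divisor greater than 1 is automatically prime.
    q = next((d for d in range(2, isqrt(n) + 1) if n % d == 0), None)
    if q is None:
        raise ValueError(f"{n} is prime, not a product of two primes")
    p = n // q
    if p == q or not (is_prime(p) and is_prime(q)):
        raise ValueError(f"{n} is not a product of two distinct primes")
    # q | p - 1: cyclic group plus a non-abelian semidirect product.
    # Otherwise: only the cyclic group Z_n.
    return 2 if (p - 1) % q == 0 else 1


print(count_groups_pq(2019))  # 3 divides 672 = 673 - 1, so 2
```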
Flattest US states
I read somewhere that, contrary to popular belief, Kansas is not the flattest state in the US. Instead, Florida is the flattest, and Kansas was several notches further down the list. (Update: Nevertheless, Kansas is literally flatter than a pancake. Thanks to Andreas Krause for the link.) How would you measure the flatness of a […]
Naked eye view vs photos of the northern lights
My daughter Elizabeth recently photographed the northern lights (aurora borealis) in Tromsø, Norway, about 3° above the Arctic Circle. I haven’t seen the northern lights in person, and I didn’t know until she told me that the lights appear gray to the naked eye, like smoke. Sometimes the lights have a hint of color, but […]
Check sums and error detection
The previous post looked at Crockford’s base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0. Crockford recommends the following check sum procedure, a simple error detection code: The check […]
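Here is a Python sketch of Crockford's encoding with its mod 37 check symbol. The check symbol encodes the value mod 37. Values 0 through 31 use the ordinary digits, and 32 through 36 use the extra symbols *, ~, $, =, and U. When decoding, O is read as 0 and I and L are read as 1. The spec also covers hyphens and other details, which this sketch omits.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's 32 symbols: no I, L, O, U
CHECK_SYMBOLS = ALPHABET + "*~$=U"            # 37 symbols for the mod-37 check


def encode(n: int) -> str:
    """Crockford base 32 digits of a non-negative integer, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, r = divmod(n, 32)
        digits.append(ALPHABET[r])
        if n == 0:
            return "".join(reversed(digits))


def encode_with_check(n: int) -> str:
    """Append the check symbol for n mod 37."""
    return encode(n) + CHECK_SYMBOLS[n % 37]


def verify(s: str) -> bool:
    """Decode everything but the last symbol and compare it with the check symbol."""
    body = s[:-1].upper().replace("O", "0").replace("I", "1").replace("L", "1")
    value = 0
    for ch in body:
        value = 32 * value + ALPHABET.index(ch)
    return CHECK_SYMBOLS[value % 37] == s[-1].upper()


print(encode_with_check(2019))  # 1Z3N
```

A single wrong symbol or a swap of adjacent symbols changes the value mod 37, so the check fails. For example, "1Z4N" and "Z13N" are both rejected.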
Base 32 and base 64 encoding
Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical. Math convention By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 […]
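A short Python sketch of the math convention follows. It uses the digits 0 through 9, then letters starting with A for 10. Python's built-in int(s, base) parses the same convention for bases up to 36, which makes it a convenient round-trip check.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, b: int) -> str:
    """Write n in base b (2 <= b <= 36), using A, B, C, ... for digit values 10, 11, 12, ..."""
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(DIGITS[r])
    return sign + "".join(reversed(digits))


print(to_base(255, 16))   # FF
print(to_base(2019, 32))  # 1V3
```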
Most popular posts of 2018
Here are 10 of my most popular posts this year, arranged as pairs of posts in different areas. Astronomy: Planets evenly spaced on a log scale; Gravity, stars, and cows. Programming: Viability of unpopular programming languages; Currying in various contexts. Computer arithmetic: The quadratic formula in low-precision arithmetic; Eight-bit floating point. Math: Computing SVD and […]
New prime record: 51st Mersenne prime discovered
A new prime record was announced yesterday. The largest known prime is now 2^82,589,933 − 1. Written in hexadecimal, the newly discovered prime is 1FFF...FFF, a 1 followed by 20,647,483 F's. For decades the largest known prime has been a Mersenne prime because there's an efficient test for checking whether a Mersenne number is prime. I explain the test here. There are now 51 known […]
Multi-arm adaptively randomized clinical trials
This post will look at adaptively randomized trial designs. In particular, we want to focus on multi-arm trials, i.e. trials of more than two treatments. The aim is to drop the less effective treatments quickly so the trial can focus on determining which of the better treatments is best. We’ll briefly review our approach to […]
Kepler and the contraction mapping theorem
The contraction mapping theorem says that if a function moves points closer together, then there must be some point the function doesn’t move. We’ll make this statement more precise and give a historically important application. Definitions and theorem A function f on a metric space X is a contraction if there exists a constant q with […]
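The summary breaks off before the application. Kepler's equation M = E − e sin E is the classic example, so this sketch assumes that is the one meant. Rewriting the equation as E = M + e sin E gives a map whose derivative e cos E is at most e < 1 in absolute value. The map is therefore a contraction, and simple iteration converges to the unique fixed point.

```python
import math


def solve_kepler(M: float, e: float, tol: float = 1e-12, max_iter: int = 1000) -> float:
    """Solve M = E - e sin E for the eccentric anomaly E by fixed-point iteration."""
    if not 0 <= e < 1:
        raise ValueError("need 0 <= e < 1 for the map to be a contraction")
    E = M  # any starting point works for a contraction
    for _ in range(max_iter):
        E_next = M + e * math.sin(E)
        if abs(E_next - E) < tol:
            return E_next
        E = E_next
    raise RuntimeError("iteration did not converge")


E = solve_kepler(1.0, 0.5)
print(E, E - 0.5 * math.sin(E))  # second value should reproduce M = 1.0
```

Convergence slows as e approaches 1, because e bounds the contraction factor.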
Trademark symbol, LaTeX, and Unicode
Earlier this year I was a coauthor on a paper about the Cap Score™ test for male fertility from Androvia Life Sciences [1]. I just noticed today that when I added the publication to my CV, it caused some garbled text to appear in the PDF. Here is the corresponding LaTeX source code. Fixing the […]
RSA with one shared prime
The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product. Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p […]
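The danger is easy to demonstrate. If two public moduli share a prime, Euclid's algorithm recovers that prime from the public numbers alone, with no factoring needed. The primes below are toy-sized values chosen only for illustration.

```python
from math import gcd, isqrt


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def next_prime(n: int) -> int:
    """Smallest prime greater than n (trial division; toy sizes only)."""
    n += 1
    while not is_prime(n):
        n += 1
    return n


# Toy primes just above a million; real RSA primes are hundreds of digits long.
p = next_prime(10**6)
q1 = next_prime(p)
q2 = next_prime(q1)

n1 = p * q1  # first person's public modulus
n2 = p * q2  # second person's public modulus, sharing the prime p

shared = gcd(n1, n2)  # Euclid's algorithm: fast even for 2048-bit moduli
print(shared == p, n1 // shared == q1, n2 // shared == q2)
```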
Following an idea to its logical conclusion
Following an idea to its logical conclusion might be extrapolating a model beyond its valid range. Suppose you have a football field with area A. If you make two parallel sides twice as long, then the area will be 2A. If you double the length of the sides again, the area will be 4A. Following this […]
Technological optimism
Kevin Kelly is one of the most optimistic people writing about technology, but there’s a nuance to his optimism that isn’t widely appreciated. Kelly sees technological progress as steady and inevitable, but not monotone. He has often said that new technologies create almost as many problems as they solve. Maybe it’s 10 steps forward and […]
RSA with Pseudoprimes
RSA setup Recall the setup for RSA encryption given in the previous post. Select two very large prime numbers p and q. Compute n = pq and φ(n) = (p – 1)(q – 1). Choose an encryption key e relatively prime to φ(n). Calculate the decryption key d such that ed = 1 (mod φ(n)). Publish e and n, and keep d, p, and q secret. φ is Euler’s totient function, defined here. There’s a complication in the first […]
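Here is a toy version of these steps in Python. It uses small textbook values p = 61, q = 53, and e = 17, which are assumptions chosen for illustration. Real RSA uses enormous primes and padding, and this sketch ignores the pseudoprime complication the post goes on to discuss.

```python
from math import gcd


def make_keys(p: int, q: int, e: int):
    """Return (n, e, d) following the setup above; p and q are assumed prime."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+): e*d = 1 mod phi(n)
    return n, e, d


n, e, d = make_keys(61, 53, 17)  # tiny textbook values, illustration only
message = 65
ciphertext = pow(message, e, n)          # encrypt with the public key (e, n)
assert pow(ciphertext, d, n) == message  # decrypt with the private key d
print(n, d, ciphertext)
```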
Can I have the last four digits of your social?
Imagine this conversation. “Could you tell me your social security number?” “Absolutely not! That’s private.” “OK, how about just the last four digits?” “Oh, OK. That’s fine.” When I was in college, professors would post grades by the last four digits of student social security numbers. Now that seems incredibly naive, but no one objected […]
RSA encryption exponents are mostly all the same
The big idea of public key cryptography is that it lets you publish an encryption key e without compromising your decryption key d. A somewhat surprising detail of RSA public key cryptography is that in practice e is nearly always the same number, specifically e = 65537. We will review RSA, explain how this default e was chosen, and discuss why […]
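One commonly cited reason for choosing 65537 is cost. Since 65537 = 2^16 + 1, its binary expansion has only two 1 bits, so square-and-multiply exponentiation needs just 16 squarings and one extra multiplication. It is also prime, which means it is relatively prime to φ(n) unless 65537 divides p − 1 or q − 1. The sketch below counts the operations. It illustrates that argument and is not a summary of the post.

```python
def modexp_count(m: int, e: int, n: int):
    """Left-to-right square-and-multiply for e >= 1.

    Returns (m^e mod n, number of squarings, number of extra multiplications).
    """
    bits = bin(e)[2:]
    result = m % n  # accounts for the leading 1 bit
    squarings = multiplications = 0
    for bit in bits[1:]:
        result = result * result % n
        squarings += 1
        if bit == "1":
            result = result * m % n
            multiplications += 1
    return result, squarings, multiplications


e = 65537
print(bin(e))  # 0b10000000000000001
n = 1000003 * 1000033  # arbitrary modulus for the demonstration
result, sq, mul = modexp_count(12345, e, n)
print(result == pow(12345, e, n), sq, mul)  # True 16 1
```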
Revealing information by trying to suppress it
FAS posted an article yesterday explaining how blurring military installations out of satellite photos draws attention to them, showing exactly where they are and how big they are. The Russian mapping service Yandex Maps blurred out sensitive locations in Israel and Turkey. As the article says, this is an example of the Streisand effect, […]
Numerical methods blog posts
I recently got a review copy of Scientific Computing: A Historical Perspective by Bertil Gustafsson. I thought that thumbing through the book might give me ideas for new topics to blog about. It still may, but mostly it made me think of numerical methods I’ve already blogged about. In historical order, or at least in the […]
Simulating identification by zip code, sex, and birthdate
As mentioned in the previous post, Latanya Sweeney estimated that 87% of Americans can be identified by the combination of zip code, sex, and birth date. We’ll do a quick-and-dirty estimate and a simulation to show that this result is plausible. There’s no point being too realistic with a simulation because the actual data that […]
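Here is a toy version of the quick-and-dirty reasoning. Suppose a zip code has N residents and there are k equally likely (sex, birth date) combinations. Then a given person is unique with probability (1 − 1/k)^(N − 1). The parameters below are made-up round numbers, and real birth dates are neither uniform nor independent. The sketch only shows the calculation and does not reproduce Sweeney's 87% figure.

```python
import random
from collections import Counter


def unique_fraction(population: int, combos: int, rng: random.Random) -> float:
    """Fraction of people whose randomly assigned combination is shared with nobody else."""
    draws = [rng.randrange(combos) for _ in range(population)]
    counts = Counter(draws)
    return sum(1 for x in draws if counts[x] == 1) / population


# Hypothetical parameters: one zip code with 10,000 residents;
# 2 sexes x 365 birthdays x 80 birth years, all equally likely and independent.
population = 10_000
combos = 2 * 365 * 80

estimate = (1 - 1 / combos) ** (population - 1)  # P(nobody else has my combination)
simulated = unique_fraction(population, combos, random.Random(20181215))
print(f"estimate {estimate:.3f}, simulation {simulated:.3f}")
```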
No funding for uncomfortable results
In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous. The state of Massachusetts had released data on 135,000 state employees and their families with obvious identifiers removed. However, the data contained zip code, birth date, and sex for each individual. Sweeney was able to cross reference this data with publicly available […]
Sine of a googol
How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean? Sine of a trillion Let's start by finding the sine of a trillion (10^12) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only […]
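A quick Python check shows why a trillion and a googol behave differently. 10^12 is below 2^53, so it is stored exactly as a double. 10^100 is not a double, so 1.0e100 is the nearest representable integer, which is a different number. Any sine computed from it is the sine of that nearby integer. Most C math libraries reduce large arguments accurately, but no library can recover digits that were lost before the call.

```python
import math

trillion = 1.0e12
googol = 1.0e100

# 10^12 < 2^53, so the double 1.0e12 is exactly the integer 10^12.
trillion_exact = int(trillion) == 10**12
# 10^100 is not representable; 1.0e100 is the nearest double, a different integer.
googol_exact = int(googol) == 10**100

print(trillion_exact, googol_exact)          # True False
print(f"{int(googol) - 10**100:.3e}")        # representation error of 1.0e100
print(math.sin(trillion), math.sin(googol))  # sines of the doubles actually stored
```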
Six degrees of Kevin Bacon, Paul Erdos, and Wikipedia
I just discovered the web site Six Degrees of Wikipedia. It lets you enter two topics and it will show you how few hops it can take to get from one to the other. Since the mathematical equivalent of Six Degrees of Kevin Bacon is Six degrees of Paul Erdős, I tried looking for the […]
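Finding the fewest hops between two topics is a shortest-path problem in an unweighted graph, and breadth-first search solves it. The sketch below uses a tiny hypothetical link graph with placeholder node names. It says nothing about how the Six Degrees of Wikipedia site is actually implemented.

```python
from collections import deque


def shortest_path(graph: dict, start, goal):
    """Breadth-first search: fewest-hop path from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical directed link graph: each page lists the pages it links to.
links = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["E"]}
print(shortest_path(links, "A", "E"))  # ['A', 'C', 'E']
```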
Mersenne prime trend
Mersenne primes have the form 2^p − 1 where p is a prime. The graph below plots the trend in the size of these numbers based on the 50 Mersenne primes currently known. The vertical axis plots the exponents p, which are essentially the logs base 2 of the Mersenne primes. The scale is logarithmic, so […]
Spherical trig, Research Triangle, and Mathematica
This post will look at the triangle behind North Carolina’s Research Triangle using Mathematica’s geographic functions. Spherical triangles A spherical triangle is a triangle drawn on the surface of a sphere. It has three vertices, given by points on the sphere, and three sides. The sides of the triangle are portions of great circles running […]
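Mathematica's geographic functions are not available outside Mathematica, so here is a plain Python sketch of the underlying geometry on a unit sphere. Each side is the central angle between two vertex vectors. Each vertex angle is the angle between the planes of the two great circles that meet there. For Earth, multiply arc lengths by a radius, bearing in mind that the spherical model is itself an approximation.

```python
import math


def unit_vector(lat_deg: float, lon_deg: float):
    """Point on the unit sphere for a latitude/longitude in degrees."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_between(u, v):
    """Angle between two vectors, via atan2 for accuracy at small and large angles."""
    w = cross(u, v)
    return math.atan2(math.sqrt(dot(w, w)), dot(u, v))


def spherical_triangle(A, B, C):
    """Sides (arcs opposite A, B, C) and vertex angles, in radians, on the unit sphere."""
    sides = (angle_between(B, C), angle_between(A, C), angle_between(A, B))
    # The angle at a vertex equals the angle between the normals of the
    # two great-circle planes through that vertex.
    angles = (
        angle_between(cross(A, B), cross(A, C)),
        angle_between(cross(B, C), cross(B, A)),
        angle_between(cross(C, A), cross(C, B)),
    )
    return sides, angles


def spherical_excess(angles):
    """Angle sum minus pi; equals the triangle's area on a unit sphere."""
    return sum(angles) - math.pi


# Octant triangle: north pole and two equator points 90 degrees apart.
sides, angles = spherical_triangle(unit_vector(90, 0), unit_vector(0, 0), unit_vector(0, 90))
print(sides, angles, spherical_excess(angles))
```

For the octant triangle every side and angle is a right angle, and the excess is π/2, one eighth of the sphere's area 4π.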
Visualizing data breaches
The image below is a static screen shot of an interactive visualization of the world’s biggest data breaches. The site lets you filter the data by industry and type of breach. See the site for credits and the raw data.
Topping out
There’s an ancient tradition of construction workers putting a Christmas tree on top of a building when it reaches its full height. I happened to drive by a recently topped out building this morning.
Complex exponentials
Here’s something that comes up occasionally, a case where I have to tell someone “It doesn’t work that way.” I’ll write it up here so next time I can just send them a link instead of retyping my explanation. Rules for exponents The rules for manipulating expressions with real numbers carry over to complex numbers […]
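The summary stops before the example, so the one here is an assumption. It is a standard case where the real-number rule (a^b)^c = a^(bc) fails for complex numbers. Since e^(2πi) = 1, the left side below is 1^i = 1, while the right side is e^(2πi·i) = e^(−2π). Python's ** operator uses the principal branch of the logarithm.

```python
import cmath
import math

base = cmath.exp(2j * math.pi)        # e^(2 pi i), equal to 1 up to rounding
lhs = base ** 1j                      # (e^(2 pi i))^i on the principal branch: about 1
rhs = cmath.exp((2j * math.pi) * 1j)  # e^((2 pi i) i) = e^(-2 pi), about 0.00187
print(lhs, rhs)
```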
Sine sum
Sam Walters posted something interesting on Twitter yesterday that I hadn't seen before: The sines of the positive integers have just the right balance of pluses and minuses to keep their sum in a fixed interval. (Not hard to show.) #math pic.twitter.com/RxeoWg6bhn, Sam Walters ☕️ (@SamuelGWalters), November 29, 2018. If for some reason your browser […]
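The "not hard to show" part is a telescoping sum. Because 2 sin(1/2) sin n = cos(n − 1/2) − cos(n + 1/2), the first N terms sum to (cos(1/2) − cos(N + 1/2)) / (2 sin(1/2)). That pins every partial sum between roughly −0.128 and 1.958. A quick Python check:

```python
import math


def partial_sums(N: int):
    """Running sums sin(1) + ... + sin(n) for n = 1..N."""
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += math.sin(n)
        out.append(s)
    return out


def closed_form(N: int) -> float:
    """Telescoped value of sum_{n=1}^N sin n."""
    return (math.cos(0.5) - math.cos(N + 0.5)) / (2 * math.sin(0.5))


lower = (math.cos(0.5) - 1) / (2 * math.sin(0.5))  # about -0.128
upper = (math.cos(0.5) + 1) / (2 * math.sin(0.5))  # about 1.958
sums = partial_sums(10_000)
print(min(sums), max(sums), lower, upper)
```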
My Twitter graveyard
I ran into The Google Cemetery the other day, a site that lists Google products that have come and gone. Google receives a lot of criticism when they discontinue a product, which is odd for a couple reasons. First, the products are free, so no one is entitled to them. Second, it’s great for a […]
Poetic description of privacy-preserving analysis
Erlingsson et al give a poetic description of privacy-preserving analysis in their RAPPOR paper [1]. They say that the goal is to … allow the forest of client data to be studied, without permitting the possibility of looking at individual trees. Related posts What is differential privacy? Data privacy consulting [1] Úlfar Erlingsson, Vasyl Pihur, and […]
Searching for Mersenne primes
The nth Mersenne number is M_n = 2^n − 1. A Mersenne prime is a Mersenne number which is also prime. So far 50 have been found [1]. A necessary condition for M_n to be prime is that n is prime, so searches for Mersenne primes only test prime values of n. It's not sufficient for n to be prime […]
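The standard efficient test for Mersenne numbers is the Lucas-Lehmer test. For an odd prime p, M_p is prime exactly when the sequence s_0 = 4, s_{k+1} = s_k^2 − 2 reaches 0 mod M_p at step p − 2. The direct Python sketch below is fine for small exponents. Real searches rely on FFT-based arithmetic and are far more elaborate.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def lucas_lehmer(p: int) -> bool:
    """True iff M_p = 2^p - 1 is prime; p must itself be prime."""
    if p == 2:
        return True  # M_2 = 3; the iteration below assumes p is odd
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in range(2, 130) if is_prime(p) and lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```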
Searching for Fermat primes
Fermat numbers have the form F_n = 2^(2^n) + 1. Fermat numbers are prime if n = 0, 1, 2, 3, or 4. Nobody has confirmed that any other Fermat numbers are prime. Maybe there are only five Fermat primes and we've found all of them. But there might be infinitely many Fermat primes. Nobody knows. There's a specialized test for […]
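The summary breaks off before naming the specialized test. We assume it is Pépin's test, the standard one for Fermat numbers: for n ≥ 1, F_n is prime exactly when 3^((F_n − 1)/2) ≡ −1 (mod F_n). One modular exponentiation settles the question, although the numbers grow enormous quickly.

```python
def fermat(n: int) -> int:
    return 2 ** (2 ** n) + 1


def pepin(n: int) -> bool:
    """Pépin's test for the Fermat number F_n, n >= 1."""
    if n < 1:
        raise ValueError("Pépin's test applies for n >= 1 (F_0 = 3 is prime)")
    F = fermat(n)
    return pow(3, (F - 1) // 2, F) == F - 1  # is 3^((F-1)/2) = -1 mod F?


print([n for n in range(1, 12) if pepin(n)])  # [1, 2, 3, 4]
```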
Geometry of an oblate spheroid
We all live on an oblate spheroid [1], so it could be handy to know a little about oblate spheroids. Eccentricity Conventional notation uses a for the equatorial radius and c for the polar radius. Oblate means a > c. The eccentricity e is defined by e = √(1 − c²/a²). For a perfect sphere, a = c and so e = 0. The eccentricity for earth is […]
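In code, with the WGS 84 reference ellipsoid as an assumed example: its defining constants are the equatorial radius a and the flattening f, and the polar radius follows as c = a(1 − f).

```python
import math


def eccentricity(a: float, c: float) -> float:
    """Eccentricity of an oblate spheroid with equatorial radius a >= polar radius c."""
    if c > a:
        raise ValueError("an oblate spheroid requires a >= c")
    return math.sqrt(1 - (c / a) ** 2)


a = 6378137.0          # WGS 84 equatorial radius, meters
f = 1 / 298.257223563  # WGS 84 flattening
c = a * (1 - f)        # polar radius, about 6356752 m
print(eccentricity(a, c))      # about 0.0818
print(eccentricity(1.0, 1.0))  # 0.0 for a sphere
```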
All possible scales
Pete White contacted me in response to a blog post I wrote enumerating musical scales. He has written a book on the subject, with audio, that he is giving away. He asked if I would host the content, and I am hosting it here. Here are a couple screen shots from the book to give […]
Ellipsoid distance on Earth
To first approximation, Earth is a sphere. But it bulges at the equator, and to second approximation, Earth is an oblate spheroid. Earth is not exactly an oblate spheroid either, but the error in the oblate spheroid model is about 100x smaller than the error in the spherical model. Finding the distance between two points […]
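One way to get most of the benefit of the oblate spheroid model without a full iterative method is Lambert's formula for long lines. It computes a spherical central angle from reduced latitudes and then applies a first-order correction in the flattening. The sketch below compares it with a plain spherical (haversine) distance. The WGS 84 constants and the 6,371,008.8 m mean radius are assumed choices, and the formula breaks down for nearly antipodal points. The summary does not say which method the post itself uses.

```python
import math

A_WGS84 = 6378137.0          # equatorial radius, meters
F_WGS84 = 1 / 298.257223563  # flattening
R_MEAN = 6371008.8           # mean Earth radius for the spherical model, meters


def central_angle(lat1, lon1, lat2, lon2):
    """Haversine central angle (radians) between points given in radians."""
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def spherical_distance(lat1, lon1, lat2, lon2, R=R_MEAN):
    """Great-circle distance in meters; inputs in degrees."""
    return R * central_angle(*map(math.radians, (lat1, lon1, lat2, lon2)))


def lambert_distance(lat1, lon1, lat2, lon2, a=A_WGS84, f=F_WGS84):
    """Lambert's formula for long lines on an oblate spheroid; inputs in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    b1 = math.atan((1 - f) * math.tan(phi1))  # reduced latitudes
    b2 = math.atan((1 - f) * math.tan(phi2))
    sigma = central_angle(b1, lam1, b2, lam2)
    if sigma == 0:
        return 0.0
    P, Q = (b1 + b2) / 2, (b2 - b1) / 2
    X = (sigma - math.sin(sigma)) * math.sin(P) ** 2 * math.cos(Q) ** 2 / math.cos(sigma / 2) ** 2
    Y = (sigma + math.sin(sigma)) * math.cos(P) ** 2 * math.sin(Q) ** 2 / math.sin(sigma / 2) ** 2
    return a * (sigma - f / 2 * (X + Y))


# Equator to North Pole along a meridian.
print(spherical_distance(0, 0, 90, 0), lambert_distance(0, 0, 90, 0))
```

For the equator-to-pole case, the spheroid answer comes out within a few meters of the accepted WGS 84 quarter-meridian length of about 10,001,966 m. The spherical answer is off by several kilometers.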
Sequence alignment
In my previous post I illustrated the Levenshtein edit distance by comparing the opening paragraphs of Finnegans Wake by James Joyce and a parody by Adam Roberts. In this post I’ll show how to align two sequences using the sequence alignment algorithms of Needleman-Wunsch and Hirschberg. These algorithms can be used to compare any sequences, though they […]
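For concreteness, here is a compact Needleman-Wunsch sketch in Python. The scoring of +1 per match, −1 per mismatch, and −1 per gap is an assumed simple scheme and may differ from what the post uses. This version keeps the full score table and therefore uses O(nm) memory. Hirschberg's algorithm finds an optimal alignment in linear space by divide and conquer.

```python
def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment: returns (score, aligned x, aligned y) with '-' for gaps."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def s(i, j):
        return match if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j), F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("GATTACA", "GCATGCU")
print(score)
print(ax)
print(ay)
```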
Levenshtein distance from Finnegans Wake to Return of the Jedi
I ran into a delightfully strange blog post today called Finnegans Ewok that edits the first few paragraphs of Finnegans Wake to make it into something like Return of the Jedi. The author, Adam Roberts, said via Twitter “What I found interesting here was how little I had to change Joyce’s original text. Tweak a couple […]
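The edit distance itself has a short dynamic-programming implementation. This Python sketch counts insertions, deletions, and substitutions at a cost of 1 each, and it keeps only two rows of the table at a time.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character edits turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```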
Rényi Differential Privacy
Differential privacy, specifically ε-differential privacy, gives strong privacy guarantees, but it can be overly cautious by focusing on worst-case scenarios. The generalization (ε, δ)-differential privacy was introduced to make ε-differential privacy more flexible. Rényi differential privacy (RDP) is a new generalization of ε-differential privacy by Ilya Mironov that is comparable to the (ε, δ) version but has several […]
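The Gaussian mechanism gives a small illustration of why RDP is convenient. This Python sketch relies on two standard results from Mironov's paper. First, adding N(0, σ²) noise to a query with sensitivity Δ satisfies (α, αΔ²/(2σ²))-RDP. Second, (α, ε)-RDP implies (ε + log(1/δ)/(α − 1), δ)-differential privacy. Under composition, the RDP values simply add at each order. The sensitivity, noise level, δ, and grid of orders are made-up parameters.

```python
import math


def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP epsilon at order alpha > 1 for the Gaussian mechanism."""
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)


def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """Epsilon of the (epsilon, delta)-DP guarantee implied by (alpha, rdp_eps)-RDP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)


def composed_dp_epsilon(steps, sensitivity, sigma, delta, orders):
    """Compose `steps` Gaussian queries in RDP, convert, and keep the best order."""
    return min(
        rdp_to_dp(steps * gaussian_rdp(a, sensitivity, sigma), a, delta)
        for a in orders
    )


orders = [1.5, 2, 4, 8, 16, 32, 64]  # arbitrary grid of RDP orders
for steps in (1, 10, 100):
    eps = composed_dp_epsilon(steps, sensitivity=1.0, sigma=4.0, delta=1e-5, orders=orders)
    print(steps, round(eps, 3))
```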
Rényi Entropy
The most common way of measuring information is Shannon entropy, but there are others. Rényi entropy, developed by Hungarian mathematician Alfréd Rényi, generalizes Shannon entropy and includes other entropy measures as special cases. Rényi entropy of order α If a discrete random variable X has n possible values, where the ith outcome has probability pi, then the Rényi entropy […]
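Here is a direct Python transcription of the definition H_α(X) = log(Σ p_i^α) / (1 − α). The limiting cases α = 1 (Shannon entropy) and α = ∞ (min-entropy) are handled separately. Logs are base 2 here, which is an arbitrary choice.

```python
import math


def renyi_entropy(probs, alpha: float, base: float = 2.0) -> float:
    """Rényi entropy of order alpha >= 0 for a discrete distribution."""
    # Drop zero-probability outcomes; H_0 then counts the support.
    p = [x for x in probs if x > 0]
    if alpha == 1:
        return -sum(x * math.log(x, base) for x in p)  # Shannon entropy
    if math.isinf(alpha):
        return -math.log(max(p), base)  # min-entropy
    return math.log(sum(x ** alpha for x in p), base) / (1 - alpha)


p = [0.5, 0.25, 0.25]
print([round(renyi_entropy(p, a), 4) for a in (0, 1, 2, math.inf)])
```

For a uniform distribution all orders agree. Otherwise H_α decreases as α grows.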
Curry-Howard-Lambek correspondence
Curry-Howard-Lambek is a set of correspondences between logic, programming, and category theory. You may have heard of the slogan proofs-as-programs or propositions-as-types. These refer to the Curry-Howard correspondence between mathematical proofs and programs. Lambek’s name is appended to the Curry-Howard correspondence to represent connections to category theory. The term Curry-Howard isomorphism is often used but is an overstatement. Logic […]
International internet privacy law
Scott Hanselman interviewed attorney Gary Nissenbaum in show #647 of Hanselminutes. The title was "How GDPR is effecting the American Legal System." Can Europe pass laws constraining American citizens? Didn't we settle that question in 1776, or at least by 1783? And yet it is inevitable that European law affects Americans. And in fact Nissenbaum […]
Prime denominators and nines complement
Let p be a prime. If the repeating decimal for the fraction a/p has even period, then the second half of the decimals are the 9's complement of the first half. This is known as Midy's theorem. For a small example, take 1/7 = 0.142857142857… and notice that 142 + 857 = 999. That is, 8, 5, […]
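Midy's theorem is easy to check numerically. This Python sketch computes one period of a/p by long division, where the period is the multiplicative order of 10 mod p. It then tests whether the two halves add digit by digit to 9.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def period(p: int) -> int:
    """Length of the repeating block of 1/p: the multiplicative order of 10 mod p."""
    if p % 2 == 0 or p % 5 == 0:
        raise ValueError("p must be coprime to 10")
    k, r = 1, 10 % p
    while r != 1:
        r = r * 10 % p
        k += 1
    return k


def repetend(a: int, p: int) -> list:
    """Digits of one period of a/p for 0 < a < p, by long division."""
    digits, r = [], a
    for _ in range(period(p)):
        r *= 10
        digits.append(r // p)
        r %= p
    return digits


def midy(a: int, p: int):
    """True/False for the 9's complement property, or None when the period is odd."""
    d = repetend(a, p)
    if len(d) % 2:
        return None
    half = len(d) // 2
    return all(x + y == 9 for x, y in zip(d[:half], d[half:]))


print(repetend(1, 7), midy(1, 7))  # [1, 4, 2, 8, 5, 7] True
```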
Kilogram redefined in terms of Planck constant
The General Conference on Weights and Measures voted today to redefine the kilogram. The official definition no longer refers to the mass of the International Prototype of the Kilogram (IPK) stored at the BIPM (Bureau International des Poids et Mesures) in France. The ampere, kelvin, and mole have also been redefined. The vote took place today, 2018-11-16, and […]
Comparing bfloat16 range and precision to other 16-bit numbers
Deep learning has spurred interest in novel floating point formats. Algorithms often don’t need as much precision as standard IEEE-754 doubles or even single precision floats. Lower precision makes it possible to hold more numbers in memory, reducing the time spent swapping numbers in and out of memory. Also, low-precision circuits are far less complex. […]
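The range and precision figures follow from the bit layout. With k exponent bits and m fraction bits, the bias is 2^(k−1) − 1. The largest finite value is (2 − 2^(−m)) × 2^bias, the smallest normal value is 2^(1 − bias), and machine epsilon is 2^(−m). This Python sketch computes those quantities for fp16 (5, 10), bfloat16 (8, 7), and float32 (8, 23). It also converts a value to bfloat16 by keeping the top 16 bits of its float32 encoding. That conversion truncates, whereas hardware typically rounds to nearest even, so treat it as an approximation.

```python
import math
import struct


def format_stats(exp_bits: int, frac_bits: int):
    """(largest finite, smallest normal, epsilon) for an IEEE-style binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    largest = (2 - 2.0 ** -frac_bits) * 2.0 ** bias
    smallest_normal = 2.0 ** (1 - bias)
    epsilon = 2.0 ** -frac_bits
    return largest, smallest_normal, epsilon


def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16: keep sign, 8 exponent bits, and top 7 fraction bits of float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


for name, k, m in [("fp16", 5, 10), ("bfloat16", 8, 7), ("float32", 8, 23)]:
    largest, smallest, eps = format_stats(k, m)
    print(f"{name:9} max {largest:.4g}  min normal {smallest:.4g}  eps {eps:.4g}")

print(to_bfloat16(math.pi))  # 3.140625
```

The output shows the trade-off: bfloat16 keeps float32's range but has much coarser precision than fp16.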
Why “work smarter, not harder” bothers me
One of my most popular posts on Twitter was an implicit criticism of the cliché “work smarter, not harder.” Productivity tip: Work hard. — John D. Cook (@JohnDCook) October 8, 2015 I agree with the idea that you can often be more productive by stepping back and thinking about what you’re doing. I’ve written before, […]
New expansions of confluent hypergeometric function
Last week Bujanda et al published a paper [1] that gives new expansions for the confluent hypergeometric function. I'll back up and explain what that means before saying more about the new paper. Hypergeometric functions Hypergeometric functions are something of a "grand unified theory" of special functions. Many functions that come up in applications are special […]
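For reference, the function being expanded is Kummer's confluent hypergeometric function 1F1(a; b; z) = Σ (a)_k / (b)_k · z^k / k!, where (x)_k is the rising factorial. Summing that series directly, as in this Python sketch, works for moderate arguments. It is not the new expansion from the paper, and it loses accuracy to cancellation for large negative z. Two special cases make easy checks: 1F1(a; a; z) = e^z and 1F1(1; 2; z) = (e^z − 1)/z.

```python
import math


def hyp1f1(a: float, b: float, z: float, rel_tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Sum the 1F1(a; b; z) power series until terms are negligible.

    b must not be zero or a negative integer.
    """
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)  # ratio of consecutive series terms
        total += term
        if abs(term) <= rel_tol * abs(total):
            return total
    raise RuntimeError("series did not converge within max_terms")


print(hyp1f1(1.5, 1.5, 2.0), math.exp(2.0))
print(hyp1f1(1.0, 2.0, 1.5), (math.exp(1.5) - 1) / 1.5)
```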
Big data and privacy
How does big data impact privacy? Which is a bigger risk to your privacy, being part of a little database or a big database? Rows vs Columns People commonly speak of big data in terms of volume—the “four v’s” of big data being volume, variety, velocity, and veracity—but what we’re concerned with here might better be […]