Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-05-06 23:17
Area of quadrilateral as a determinant
I've written several posts about how determinants come up in geometry. These determinants often look similar, having columns related to coordinates and a column of ones. You can find several examples here along with an explanation for this pattern. If you have three points z1, z2, and z3 in the complex plane, you can find [...]The post Area of quadrilateral as a determinant first appeared on John D. Cook.
A very accurate logarithm approximation
The previous post looked at an efficient way to approximate nth roots of fractions near 1 by hand. This post does the same for logarithms. As before, we assume x = p/q and define s = p + q and d = p - q. Because we're interested in values of x near 1, d is [...]The post A very accurate logarithm approximation first appeared on John D. Cook.
Handy approximation for roots of fractions
This post will discuss a curious approximation with a curious history. Approximation: Let x be a number near 1, written as a fraction x = p / q. Then define s and d as the sum and difference of the numerator and denominator: s = p + q and d = p - q. Since we [...]The post Handy approximation for roots of fractions first appeared on John D. Cook.
Uncovering names masked with stars
Sometimes I'll see things like my name partially concealed as J*** C*** and think "a lot of good that does." Masking letters reveals more than people realize. For example, when you see that someone's first name is four letters and begins with J, there's about a 70% chance they're male and there's a 44% chance [...]The post Uncovering names masked with stars first appeared on John D. Cook.
Almost ASCII
I was working recently with a gigabyte file that had a dozen non-ASCII characters. This is very common. The ASCII character set is not quite big enough for a lot of tasks. Of course it's completely inadequate if you're writing Japanese, but it's almost enough for documents written in English and a few other languages. [...]The post Almost ASCII first appeared on John D. Cook.
A knight’s tour of an infinite chessboard
Let ℤ² be the lattice of points in the plane with integer coordinates. You could think of these points as being the centers of the squares in a chessboard extending to infinity in every direction. Cantor tells us that the points in ℤ² are countable. What's more surprising is that you could count the points [...]The post A knight's tour of an infinite chessboard first appeared on John D. Cook.
Natural one-liners
I learned to use Unix in college (this was before Linux), but it felt a little mysterious. Clearly it was developed by really smart people, but what were the problems that motivated their design choices? Some of these are widely repeated. For example, commands have terse names because you may have to transmit commands over a glacial [...]The post Natural one-liners first appeared on John D. Cook.
When is less data less private?
If I give you a database, I give you every row in the database. So if you delete some rows from the database, you have less information, not more, right? This seems very simple, and it mostly is, but there are a couple subtleties. A common measure in data privacy is k-anonymity. The idea is [...]The post When is less data less private? first appeared on John D. Cook.
Additive functions
A function f from positive integers to real numbers is defined to be additive if for relatively prime numbers m and n, f(mn) = f(m) + f(n). The function f is called completely additive if the above holds for all positive integers m and n, i.e. we drop the requirement that m and n are relatively [...]The post Additive functions first appeared on John D. Cook.
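The two definitions can be told apart with a small experiment. The sketch below is an illustration, not taken from the post: it uses two standard number-theoretic functions, the number of distinct prime factors (additive) and the number of prime factors counted with multiplicity (completely additive). The function names and the brute-force check are assumptions made for this example.

```python
from math import gcd

def distinct_prime_factors(n: int) -> int:
    """Number of distinct primes dividing n (additive, not completely additive)."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def prime_factors_with_multiplicity(n: int) -> int:
    """Number of prime factors of n counted with multiplicity (completely additive)."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count + (1 if n > 1 else 0)

def satisfies_additivity(f, limit: int, coprime_only: bool) -> bool:
    """Brute-force check of f(mn) = f(m) + f(n) for 1 <= m, n < limit."""
    for m in range(1, limit):
        for n in range(1, limit):
            if coprime_only and gcd(m, n) != 1:
                continue
            if f(m * n) != f(m) + f(n):
                return False
    return True

print(satisfies_additivity(distinct_prime_factors, 50, coprime_only=True))            # True
print(satisfies_additivity(distinct_prime_factors, 50, coprime_only=False))           # False
print(satisfies_additivity(prime_factors_with_multiplicity, 50, coprime_only=False))  # True
```

The second check fails already at m = n = 2: the number 4 has one distinct prime factor, not two.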
Frequency analysis
Suppose you have a list of encrypted surnames of US citizens. If the list is long enough, the encrypted name that occurs most often probably corresponds to Smith. The second most common encrypted name probably corresponds to Johnson, and so forth. This kind of inference is analogous to solving a cryptogram puzzle by counting [...]The post Frequency analysis first appeared on John D. Cook.
Security by obscurity
Security-by-obscurity is a bad idea in general. It's better, for example, to have a login page than to give your site an obscure URL. It's better to encrypt a file than to hide it in some odd directory. It's better to use a well-vetted encryption algorithm than to roll your own. There are people [...]The post Security by obscurity first appeared on John D. Cook.
Advanced questions about a basic diagram
I saw a hand-drawn version of the diagram above yesterday and noticed that the points were too evenly distributed. That got me to thinking: is there any objective way to say that this famous diagram is in some sense complete? If you were to make a diagram with more points, what would they be? Simple [...]The post Advanced questions about a basic diagram first appeared on John D. Cook.
How much metadata is in a photo?
A few days ago I wrote about the privacy implications of metadata in a PDF. This post will do the same for photos. You can see the metadata in a photo using exiftool. By default cameras include time and location data. I ran this tool on a photo I took in Seattle a few years [...]The post How much metadata is in a photo? first appeared on John D. Cook.
The Borwein integrals
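The Borwein integrals introduced in [1] are a famous example of how proof-by-example can go wrong. Define sinc(x) as sin(x)/x. Then the following equations hold: $\int_0^\infty \operatorname{sinc}(x)\,dx = \pi/2$, $\int_0^\infty \operatorname{sinc}(x)\operatorname{sinc}(x/3)\,dx = \pi/2$, and so on through $\int_0^\infty \operatorname{sinc}(x)\operatorname{sinc}(x/3)\cdots\operatorname{sinc}(x/13)\,dx = \pi/2$. However, $\int_0^\infty \operatorname{sinc}(x)\operatorname{sinc}(x/3)\cdots\operatorname{sinc}(x/15)\,dx = \pi/2 - \delta$, where $\delta \approx 2.3 \times 10^{-11}$. This is where many presentations end, concluding with the moral that a pattern can hold for a while and then stop. But I'd [...]The post The Borwein integrals first appeared on John D. Cook.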
Avoiding Multiprocessing Errors in Bash Shell
Suppose you have two Linux processes trying to modify a file at the same time and you don't want them stepping on each other's work and making a mess. A common solution is to use a "lock" mechanism (a.k.a. "mutex"). One process locks the "lock" and by this action has sole ownership of a [...]The post Avoiding Multiprocessing Errors in Bash Shell first appeared on John D. Cook.
This-way-up and Knuth arrows
I was looking today at a cardboard box that had the "this way up" symbol on it and wondered whether there is a Unicode value for it. Apparently not. But there is an ISO code for it: ISO 7000 symbol 0623. It's an international standard symbol for indicating how to orient a package. The name [...]The post This-way-up and Knuth arrows first appeared on John D. Cook.
Factoring pseudoprimes
Fermat's little theorem says that if p is a prime number, then for any positive integer b < p we have $b^{p-1} = 1 \pmod p$. This theorem gives a necessary but not sufficient condition for a number to be prime. Fermat's primality test: The converse of Fermat's little theorem is not always true, but [...]The post Factoring pseudoprimes first appeared on John D. Cook.
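A quick way to see "necessary but not sufficient" is to run the congruence on a prime and on a composite number. This is a minimal sketch; the helper name `passes_fermat` and the choice of 101 and 341 as examples are assumptions made for illustration and do not come from the post.

```python
def passes_fermat(n: int, b: int) -> bool:
    """True if b^(n-1) = 1 (mod n), i.e. n passes the Fermat test to base b."""
    return pow(b, n - 1, n) == 1

# For a prime p, every base 1 <= b < p passes.
print(all(passes_fermat(101, b) for b in range(1, 101)))  # True

# 341 = 11 * 31 is composite, yet it passes for base 2 ...
print(passes_fermat(341, 2))  # True
# ... while base 3 exposes it as composite.
print(passes_fermat(341, 3))  # False
```

The three-argument form of `pow` does modular exponentiation without ever forming the huge power.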
Do comments in a LaTeX file change the output?
When you add a comment to a LaTeX file, it makes no visible change to the output. The comment is ignored as far as the appearance of the file. But is that comment somehow included in the file anyway? If you compile a LaTeX file to PDF, then edit it by throwing in a comment, [...]The post Do comments in a LaTeX file change the output? first appeared on John D. Cook.
Your PDF may reveal more than you intend
When you create a PDF file, what you see is not all you get. There is metadata embedded in the file that might be useful. It also might reveal information you'd rather not reveal. The previous post looked at just the time stamp on a file. This post will look at more metadata, focusing on [...]The post Your PDF may reveal more than you intend first appeared on John D. Cook.
If you save a file as PDF twice, you get two different files
If you save a file as a PDF twice, you won't get exactly the same file both times. To illustrate this, I created a LibreOffice document containing "Hello world." and saved it twice, first as humpty.pdf then as dumpty.pdf. Then I compared the two files. % diff humpty.pdf dumpty.pdf Binary files humpty.pdf and dumpty.pdf differ [...]The post If you save a file as PDF twice, you get two different files first appeared on John D. Cook.
Is Low Precision Arithmetic Safe?
The popularity of low precision arithmetic for computing has exploded since the 2017 release of the Nvidia Volta GPU. The half precision tensor cores of Volta offered a massive 16X performance gain over double precision for key operations. The "race to the bottom" for lower precision computations continues: some have even solved significant problems using [...]The post Is Low Precision Arithmetic Safe? first appeared on John D. Cook.
How likely is a random variable to be far from its center?
There are many answers to the question in the title: How likely is a random variable to be far from its center? The answers depend on how much you're willing to assume about your random variable. The more you can assume, the stronger your conclusion. The answers also depend on what you mean by "center," [...]The post How likely is a random variable to be far from its center? first appeared on John D. Cook.
Connecting the FFT and quadratic reciprocity
Some readers will look at the title of this post and think "Ah yes, the FFT. I use it all the time. But what is this quadratic reciprocity?" Others will look at the same title and think "Gauss called the quadratic reciprocity theorem the jewel in the crown of mathematics. But what is this FFT [...]The post Connecting the FFT and quadratic reciprocity first appeared on John D. Cook.
Two-digit zip codes
It's common to truncate US zip codes to the first three digits for privacy reasons. Truncating to the first two digits is less common, but occurs in some data sets. HIPAA Safe Harbor requires sparse 3-digit zip codes to be suppressed; even when rolled up to three digits some regions are still sparsely populated. How [...]The post Two-digit zip codes first appeared on John D. Cook.
Bessel zero spacing
Bessel functions are to polar coordinates what sines and cosines are to rectangular coordinates. This is why Bessel functions often arise in applications with radial symmetry. The locations of the zeros of Bessel functions are important in applications, and so you can find software for computing these zeros in mathematical libraries. In days gone by [...]The post Bessel zero spacing first appeared on John D. Cook.
Coloring the queen’s graph
Suppose we have an n × n chessboard. The case n = 8 is of course most common, but we consider all positive integer values of n. The graph of a chess piece has an edge between two squares if and only if the piece can legally move between the two squares. Now suppose we [...]The post Coloring the queen's graph first appeared on John D. Cook.
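The definition of a piece's graph translates directly into code. The sketch below builds the edge list of the queen's graph, assuming an otherwise empty board so that a queen can reach any square on the same row, column, or diagonal. The function name `queen_edges` is hypothetical.

```python
from itertools import combinations

def queen_edges(n: int) -> list:
    """Edges of the queen's graph on an n x n board (empty board assumed)."""
    squares = [(r, c) for r in range(n) for c in range(n)]

    def queen_can_move(a, b) -> bool:
        (r1, c1), (r2, c2) = a, b
        # Same row, same column, or same diagonal.
        return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

    return [(a, b) for a, b in combinations(squares, 2) if queen_can_move(a, b)]

print(len(queen_edges(8)))  # 728
```

The count for n = 8 splits into 448 rook-type edges (rows and columns) and 280 bishop-type edges (diagonals).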
Regex to match SWIFT-BIC codes
A SWIFT-BIC number identifies a bank, not a particular bank account. The BIC part stands for Bank Identifier Code. I had to look up the structure of SWIFT-BIC codes recently, and here it is: Four letters to identify the bank Two letters to identify the country Two letters or digits to identify the location Optionally, [...]The post Regex to match SWIFT-BIC codes first appeared on John D. Cook.
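The three parts listed above can be written as a pattern. This is only a partial sketch: the excerpt is cut off at the optional part, so the pattern below checks just the first eight characters and does not model whatever may follow. Restricting letters to uppercase is an assumption, and the name `has_bic_prefix` and the sample strings are made up for illustration.

```python
import re

# Four letters (bank), two letters (country), two letters or digits (location).
# The optional trailing part is elided in the excerpt and is NOT modeled here.
BIC_PREFIX = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}")

def has_bic_prefix(s: str) -> bool:
    """True if s starts with eight characters in the bank/country/location layout."""
    return BIC_PREFIX.match(s) is not None

print(has_bic_prefix("DEUTDEFF"))  # True
print(has_bic_prefix("DEUT12FF"))  # False: country part must be letters
print(has_bic_prefix("DEU"))       # False: too short
```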
Bad takes on chaos theory
I just finished reading The Three Body Problem. At the end of the book is a preview of Cixin Liu's book Supernova Era. A bit of dialog in that preview stood out to me because it touches on themes I've written about before. I've heard about that. When a butterfly flaps its wings, there's [...]The post Bad takes on chaos theory first appeared on John D. Cook.
New Ways To Make Code Run Faster
The news from Meta last week is a vivid reminder of the importance of making code run faster and more power-efficiently. Meta intends to purchase 350,000 Nvidia H100 GPUs this year [1]. Assuming 350W TDP [2] and $0.1621 per kW-h [3] average US energy cost, one expects a figure of $174 million per year in [...]The post New Ways To Make Code Run Faster first appeared on John D. Cook.
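The $174 million figure can be reproduced from the three numbers quoted above. The sketch assumes every GPU draws its full 350 W around the clock for a 365-day year; that assumption is mine and is not stated in the excerpt.

```python
gpus = 350_000
watts_per_gpu = 350           # TDP, assumed to be drawn continuously
usd_per_kwh = 0.1621          # average US energy cost quoted above
hours_per_year = 24 * 365     # assumes a 365-day year

kwh_per_year = gpus * watts_per_gpu / 1000 * hours_per_year
cost = kwh_per_year * usd_per_kwh

print(f"${cost / 1e6:.0f} million per year")  # $174 million per year
```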
Brute force cryptanalysis
A naive view of simple substitution ciphers is that they are secure because there are 26! ways to permute the English alphabet, and so an attacker would have to try 26! ≈ 4 × 10^26 permutations. However, such brute force is not required. In practice, simple substitution ciphers are breakable by hand in a few [...]The post Brute force cryptanalysis first appeared on John D. Cook.
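The size of that key space is easy to confirm. The snippet below is a small check using only the standard library. Expressing the count in bits is an addition for context and is not part of the excerpt.

```python
import math

keys = math.factorial(26)

print(keys)            # 403291461126605635584000000
print(f"{keys:.1e}")   # 4.0e+26

# The same count expressed as an equivalent key length in bits.
bits = math.log2(keys)
print(f"about {bits:.0f} bits")
```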
Straddling checkerboard encryption
Introduction: Computers fundamentally changed cryptography, opening up new possibilities for making and breaking codes. At first it may not have been clear which side benefited most, but now it's clear that computers gave more power to code makers than code breakers. We now have cryptographic primitives that cannot be attacked more efficiently than by brute [...]The post Straddling checkerboard encryption first appeared on John D. Cook.
Email subscription changes
I will soon be discontinuing the email subscription option for this blog. I recommend that email subscribers switch over to subscribing to the RSS feed for the blog. If you're unfamiliar with RSS, here is an article on how to get started. (I recommend RSS in general, and not just for subscribing to this blog. [...]The post Email subscription changes first appeared on John D. Cook.
Beta inequality symmetries
I was thinking about the work I did when I worked in biostatistics at MD Anderson. This work was practical rather than mathematically elegant, useful in its time but not of long-term interest. However, one result came out of this work that I would call elegant, and that was a symmetry I found. Let X [...]The post Beta inequality symmetries first appeared on John D. Cook.
When is a function of two variables separable?
Given a function f(x,y), how can you tell whether f can be factored into the product of a function g(x) of x alone and a function h(y) of y alone? Depending on how an expression for f is written, it may or may not be obvious whether f(x, y) can be separated into g(x) h(y). There [...]The post When is a function of two variables separable? first appeared on John D. Cook.
Applications of Bernoulli differential equations
When a nonlinear first order ordinary differential equation has the form $y' + P(x)\,y = Q(x)\,y^n$ with $n \ne 1$, the change of variables $u = y^{1-n}$ turns the equation into a linear equation in u. The equation is known as Bernoulli's equation, though Leibniz came up with the same technique. Apparently the history is complicated [1]. It's nice that Bernoulli's equation can [...]The post Applications of Bernoulli differential equations first appeared on John D. Cook.
The IQ Test That AI Can’t Pass
Large language models have recently achieved remarkable test scores on well-known academic and professional exams (see, e.g., [1], p. 6). On such tests, these models are at times said to reach human-level performance. However, there is one test that humans can pass but every AI method known to have been tried has abysmally failed. The [...]The post The IQ Test That AI Can't Pass first appeared on John D. Cook.
New Twitter account for cryptography
I've started a new Twitter account: @CryptographyTip. The icon for the account is the symbol for XOR, a common operation in encryption. I intend to post about cryptography theory as well as practical matters such as software and file formats. You can find a list of my other technical twitter accounts here. You can also [...]The post New Twitter account for cryptography first appeared on John D. Cook.
Means of means bounding the logarithmic mean
The geometric, logarithmic, and arithmetic means of a and b are defined as follows: $G = \sqrt{ab}$, $L = (b - a)/(\log b - \log a)$, and $A = (a + b)/2$. A few days ago I mentioned that $G \le L \le A$. The logarithmic mean slips between the geometric and arithmetic means. Or to put it another way, the logarithmic mean is bounded by the geometric and arithmetic means. You can [...]The post Means of means bounding the logarithmic mean first appeared on John D. Cook.
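The ordering of the three means is easy to spot-check numerically. The sketch below uses the standard definitions G = sqrt(ab), L = (b - a)/(log b - log a), and A = (a + b)/2 for positive a ≠ b. The function name `three_means` and the sample pairs are chosen only for illustration.

```python
import math

def three_means(a: float, b: float):
    """Return (geometric, logarithmic, arithmetic) means of positive a != b."""
    g = math.sqrt(a * b)
    l = (b - a) / (math.log(b) - math.log(a))
    m = (a + b) / 2
    return g, l, m

for a, b in [(1, 2), (0.5, 8), (3, 100)]:
    g, l, m = three_means(a, b)
    print(f"a={a}, b={b}: G={g:.4f} <= L={l:.4f} <= A={m:.4f}")
```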
Base 64 encoding remainder problem
I've mentioned base 64 encoding a few times here, but I've left out a detail. This post fills in that detail. Base 64 encoding comes up in multiple contexts in which you want to represent binary data in text form. I've mentioned base 64 encoding in the context of Gnu ASCII armor. A more common [...]The post Base 64 encoding remainder problem first appeared on John D. Cook.
Binary to text to binary
Gnu Privacy Guard includes a way to encode binary files as plain ASCII text files, and turn these text files back into binary. This is intended as a way to transmit encrypted data, but it can be used to convert any kind of file from binary to text and back to binary. To illustrate this, [...]The post Binary to text to binary first appeared on John D. Cook.
Why “a caret, euro, trademark” ’ in a file?
Why might you see aTM in the middle of an otherwise intelligible file? The reason is very similar to the reason you might see , which I explained in the previous post. You might want to read that post first if you're not familiar with Unicode and character encodings. It all has to do with [...]The post Why a caret, euro, trademark" aTM in a file? first appeared on John D. Cook.
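The excerpt is cut off before the explanation. One common way this three-character sequence arises, offered here as an assumption rather than as the post's own account, is a right single quotation mark (U+2019) encoded as UTF-8 and then read as Windows-1252:

```python
quote = "\u2019"              # right single quotation mark
raw = quote.encode("utf-8")   # three bytes in UTF-8
print(raw)                    # b'\xe2\x80\x99'

# Reading those same bytes as Windows-1252 yields three separate characters.
garbled = raw.decode("cp1252")
print(garbled)                # â€™
```

Each of the three bytes maps to its own Windows-1252 character: 0xE2 is â, 0x80 is €, and 0x99 is ™.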
A valid character to represent an invalid character
You may have seen a web page with the symbol scattered throughout the text, especially in older web pages. What is this symbol and why does it appear unexpected? The symbol we're discussing is a bit of a paradox. It's the (valid) Unicode character to represent an invalid Unicode character. If you just read [...]The post A valid character to represent an invalid character first appeared on John D. Cook.
When zeros at natural numbers implies zero everywhere
Suppose a function f(z) equals 0 at z = 0, 1, 2, 3, .... Under what circumstances might you be able to conclude that f is zero everywhere? Clearly you need some hypothesis on f. For example, the function sin(z) is zero at every integer but certainly not constantly zero. Carlson's theorem says that if [...]The post When zeros at natural numbers implies zero everywhere first appeared on John D. Cook.
When High Performance Computing Is Not High Performance
Everybody cares about codes running fast on their computers. Hardware improvements over recent decades have made this possible. But how well are we taking advantage of hardware speedups? Consider these two C++ code examples. Assume here n = 10000000. void sub(int* a, int* b) { for (int i=0; i<n; ++i) a[i] = i + [...]The post When High Performance Computing Is Not High Performance first appeared on John D. Cook.
Leading zeros
The confusion between numbers such as 7 and 007 comes up everywhere. We know they're different (James Bond isn't Agent 7) and yet the distinction isn't quite trivial. How should software handle the two kinds of numbers? The answer isn't as simple as "Do what the user expects" because different users have different expectations. Excel: If you [...]The post Leading zeros first appeared on John D. Cook.
Ky Fan’s inequality
Let $x = (x_1, \ldots, x_n)$ with each component satisfying $0 < x_i \le 1/2$. Define the complement x' by taking the complement of each entry: $x' = (1 - x_1, \ldots, 1 - x_n)$. Let G and A represent the geometric and arithmetic mean respectively. Then Ky Fan's inequality says $G(x)/G(x') \le A(x)/A(x')$. Now let H be the harmonic mean. Since in general $H \le G \le A$, you might guess that [...]The post Ky Fan's inequality first appeared on John D. Cook.
Previous digital signature standard expires next month
The Digital Signature Standard (DSS) FIPS 186-4, first published in 2013, expires a few days from now, on February 3, 2024. It is superseded by NIST FIPS 186-5. This new version was published on February 3, 2023, giving everyone a year to adopt the new standard before it became required. The differences between the [...]The post Previous digital signature standard expires next month first appeared on John D. Cook.
Integral representations of means
The average of two numbers, a and b, can be written as the average of x over the interval [a, b]. This is easily verified as follows: $\frac{1}{b-a}\int_a^b x\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$. The average is the arithmetic mean. We can represent other means as above if we generalize the pattern to be $\varphi^{-1}\left(\frac{1}{b-a}\int_a^b \varphi(x)\,dx\right)$. For the arithmetic mean, $\varphi(x) = x$. Logarithmic mean: If [...]The post Integral representations of means first appeared on John D. Cook.
Sierpiński's inequality
Let $A_n$, $G_n$ and $H_n$ be the arithmetic mean, geometric mean, and harmonic mean of a set of n numbers. When n = 2, the arithmetic mean times the harmonic mean is the geometric mean squared. The proof is simple: for two numbers a and b, $A_2 H_2 = \frac{a+b}{2} \cdot \frac{2ab}{a+b} = ab = G_2^2$. When n > 2 we no longer have equality. However, W. Sierpiński, perhaps best known [...]The post Sierpiński's inequality first appeared on John D. Cook.
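The n = 2 identity, and its failure for n > 2, can be checked numerically. This is a minimal sketch: the function name `three_means` and the sample values are chosen for illustration, and the inequality that replaces the identity is not shown because the excerpt is cut off before stating it.

```python
import math

def three_means(xs):
    """Return (arithmetic, geometric, harmonic) means of positive numbers xs."""
    n = len(xs)
    a = sum(xs) / n
    g = math.prod(xs) ** (1 / n)
    h = n / sum(1 / x for x in xs)
    return a, g, h

# n = 2: arithmetic mean times harmonic mean equals geometric mean squared.
a, g, h = three_means([3.0, 7.0])
print(math.isclose(a * h, g**2))  # True

# n = 3: the equality no longer holds.
a, g, h = three_means([1.0, 2.0, 3.0])
print(math.isclose(a * h, g**2))  # False
```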
The Five Safes data privacy framework
The Five Safes decision framework was created a couple decades ago by Felix Ritchie at the UK Office for National Statistics. It is a framework for evaluating the safe use of confidential data, particularly by government agencies. You can find a description of the Five Safes, for example, in NIST SP 800-188. The Five Safes [...]The post The Five Safes data privacy framework first appeared on John D. Cook.