Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-04-26 03:16
Möbius transformations over a finite field
A Möbius transformation is a function of the form f(z) = (az + b)/(cz + d) where ad - bc = 1. We usually think of z as a complex number, but it doesn't have to be. We could define Möbius transformations in any context where we can multiply, add, and divide, i.e. over any field. In particular, we could work over [...]The post Möbius transformations over a finite field first appeared on John D. Cook.
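As a rough illustration of the idea, here is a minimal Python sketch of a Möbius transformation z -> (az + b)/(cz + d) over the integers mod a prime p. The function name `mobius_mod_p`, the choice p = 7, and the coefficients are assumptions made for this example, not taken from the post. Division is done by multiplying by a modular inverse, and the input where the denominator vanishes is reported as `None` (it would map to a point at infinity).

```python
def mobius_mod_p(a, b, c, d, z, p):
    """Apply z -> (a*z + b)/(c*z + d) over the integers mod a prime p.

    Returns None when the denominator is zero mod p (the pole).
    """
    if (a * d - b * c) % p != 1:
        raise ValueError("expected ad - bc = 1 (mod p)")
    denominator = (c * z + d) % p
    if denominator == 0:
        return None
    # pow(x, -1, p) computes the modular inverse (Python 3.8+).
    return (a * z + b) * pow(denominator, -1, p) % p


# Hypothetical example: a=2, b=1, c=1, d=1 over the field with 7 elements.
images = [mobius_mod_p(2, 1, 1, 1, z, 7) for z in range(7)]
print(images)
```

Away from the pole, the map sends distinct inputs to distinct outputs, which is what one would expect of an invertible transformation.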
Sort and remove duplicates
A common idiom in command line processing of text files is ... | sort | uniq | ... Some process produces lines of text. You want to pipe that text through sort to sort the lines in alphabetical order, then pass it to uniq to filter out all but the unique lines. The uniq utility [...]The post Sort and remove duplicates first appeared on John D. Cook.
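For readers who think in Python rather than shell, the effect of the idiom can be sketched as follows. This is only an analogy: the helper name `sort_unique` is made up for this example, and Python's default string ordering is by code point, which may differ from the locale-dependent ordering `sort` uses.

```python
def sort_unique(lines):
    """Return the distinct lines in sorted order, like `sort | uniq`."""
    return sorted(set(lines))


print(sort_unique(["pear", "apple", "pear", "fig"]))
```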
Swish function and a Swiss mathematician
The previous post looked at the swish function and related activation functions for deep neural networks designed to address the "dying ReLU problem." Unlike many activation functions, the function f(x) is not monotone but has a minimum near x0 = -1.2784. The exact location of the minimum is where W is the Lambert W function, [...]The post Swish function and a Swiss mathematician first appeared on John D. Cook.
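The location of the minimum can be checked numerically. The sketch below assumes that f is the swish function with unit scale, f(x) = x / (1 + exp(-x)); the summary does not show the formula, so that definition is an assumption here. It finds the sign change of the derivative by bisection on the interval [-2, -1].

```python
import math


def swish(x):
    """Swish with unit scale: x * sigmoid(x) (assumed definition)."""
    return x / (1.0 + math.exp(-x))


def derivative_numerator(x):
    """Numerator of swish'(x); the denominator (1 + exp(-x))**2 is positive."""
    return 1.0 + math.exp(-x) * (1.0 + x)


def find_minimum(lo=-2.0, hi=-1.0, tol=1e-10):
    """Bisect for the point where the derivative changes sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if derivative_numerator(mid) < 0:
            lo = mid  # still decreasing, so the minimum lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


x0 = find_minimum()
print(x0, swish(x0))
```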
Swish, mish, and serf
Swish, mish, and serf are neural net activation functions. The names are fun to say, but more importantly the functions have been shown to improve neural network performance by solving the "dying ReLU problem." This happens when a large number of node weights become zero during training and do not contribute further to the learning [...]The post Swish, mish, and serf first appeared on John D. Cook.
Generating and inspecting an RSA private key
In principle you generate an RSA key by finding two large prime numbers, p and q, and computing n = pq. You could, for example, generate random numbers by rolling dice, then type the numbers into Mathematica to test each for primality until you find a couple prime numbers of the right size. In practice [...]The post Generating and inspecting an RSA private key first appeared on John D. Cook.
RSA encryption in practice
At its core, RSA encryption is modular exponentiation. That is, given a message m, the encrypted form of m is x = m^e mod n where e is a publicly known exponent and n is a product of two large primes. The number n is made public but only the holder of the private key [...]The post RSA encryption in practice first appeared on John D. Cook.
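The core operation can be sketched with Python's built-in three-argument `pow`. The primes below are toy values chosen for illustration only; real keys use primes hundreds of digits long, and practical RSA adds padding that this sketch omits.

```python
# Toy parameters (assumptions for illustration; far too small to be secure).
p, q = 61, 53
n = p * q                      # public modulus
e = 17                         # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)            # private exponent: inverse of e mod phi (Python 3.8+)

m = 65                         # message, must be smaller than n
x = pow(m, e, n)               # encryption: x = m^e mod n
recovered = pow(x, d, n)       # decryption: m = x^d mod n

print(x, recovered)
```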
Code to convert words to Major system numbers
A few days ago I wrote about using the CMU Pronouncing Dictionary to search for words that decode to certain numbers in the Major mnemonic system. You can find a brief description of the Major system in that post. As large as the CMU dictionary is, it did not contain words mapping to some three-digit [...]The post Code to convert words to Major system numbers first appeared on John D. Cook.
Software and the Allee effect
The Allee effect is named after Warder Clyde Allee who added a term to the famous logistic equation. His added term is highlighted in blue. Here N is the population of a species over time, r is the intrinsic rate of increase, K is the carrying capacity, and A is the critical point. If you [...]The post Software and the Allee effect first appeared on John D. Cook.
Solved problems becoming unsolved
"That's a solved problem. So nobody knows how to solve it anymore." Once a problem is deemed "solved" interest in the problem plummets. "Solved" problems may not be fully solved, but sufficiently solved that the problem is no longer fashionable. Practical issues remain, but interest moves elsewhere. The software written for the problem slowly decays. [...]The post Solved problems becoming unsolved first appeared on John D. Cook.
The cobbler’s son
There's an old saying "The cobbler's son has no shoes." It's generally taken to mean that we can neglect to do for ourselves something we do for other people. I've been writing a few scripts for my personal use, things I've long intended to do but only recently got around to doing. I said something [...]The post The cobbler's son first appeared on John D. Cook.
Date sequence from the command line
I was looking back at Jeroen Janssens' book Data Science at the Command Line and his dseq utility caught my eye. This utility prints out a sequence of dates relative to the current date. I've needed this and didn't know it. Suppose you have a CSV file and you need to add a column of [...]The post Date sequence from the command line first appeared on John D. Cook.
Up-down permutations
An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example 1, 5, 3, 4, 2 is an up-down permutation of 1, 2, 3, 4, 5 because 1 < 5 > 3 < 4 > 2. Up-down permutations are [...]The post Up-down permutations first appeared on John D. Cook.
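The definition translates directly into a check on adjacent pairs. The following is a small sketch; the function names are chosen for this example. It tests the permutation from the text and counts up-down permutations of small sets by brute force.

```python
from itertools import permutations


def is_up_down(seq):
    """True if seq alternates up, down, up, ... starting with an ascent."""
    pairs = zip(seq, seq[1:])
    return all((a < b) if i % 2 == 0 else (a > b)
               for i, (a, b) in enumerate(pairs))


def count_up_down(n):
    """Count up-down permutations of 1..n by exhaustive enumeration."""
    return sum(is_up_down(perm) for perm in permutations(range(1, n + 1)))


print(is_up_down([1, 5, 3, 4, 2]))
print([count_up_down(n) for n in range(1, 6)])
```

Brute force is fine for small n but grows factorially, so it is only meant to make the definition concrete.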
Variance of binned data
Suppose you have data that for some reason has been summarized into bins of width h. You don't have the original data, only the number of counts in each bin. You can't exactly find the sample mean or sample variance of the data because you don't actually have the data. But what's the best you [...]The post Variance of binned data first appeared on John D. Cook.
Ancient estimate of π and modern numerical analysis
A very crude way to estimate π would be to find the perimeter of squares inside and outside a unit circle. The outside square has sides of length 2, so 2π < 8. The inside square has sides of length 2/√2, so 8/√2 < 2π. This tells us π is between 2.82 and 4. Not [...]The post Ancient estimate of π and modern numerical analysis first appeared on John D. Cook.
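The arithmetic behind the two bounds can be written out in a few lines. This sketch compares the perimeters of the inscribed and circumscribed squares with the circumference 2π of the unit circle; the variable names are chosen for this example.

```python
import math

outer_side = 2.0                   # square circumscribed about the unit circle
inner_side = 2.0 / math.sqrt(2.0)  # square inscribed in the unit circle

# Circumference 2*pi lies between the two perimeters, so divide by 2.
upper_bound = 4 * outer_side / 2
lower_bound = 4 * inner_side / 2

print(lower_bound, math.pi, upper_bound)
```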
ARPAbet and the Major mnemonic system
ARPAbet is a phonetic spelling system developed by (you guessed it) ARPA, before it became DARPA. The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet. In [...]The post ARPAbet and the Major mnemonic system first appeared on John D. Cook.
Ruzsa distance
A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group. Subset difference Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A - B [...]The post Ruzsa distance first appeared on John D. Cook.
Finding the imaginary part of an analytic function from the real part
A function f of a complex variable z = x + iy can be factored into real and imaginary parts: f(x + iy) = u(x, y) + i v(x, y), where x and y are real numbers, and u and v are real-valued functions of two real values. Suppose you are given u(x, y) and you want to find v(x, y). The function v is called [...]The post Finding the imaginary part of an analytic function from the real part first appeared on John D. Cook.
Every Japanese prefecture shrinking
It's well known that the population of Japan has been decreasing for years, and so I was a little puzzled by a recent headline saying that Japan's population has dropped in every one of its 47 prefectures. Although the national population is in decline, until now not all of the nation's 47 prefectures dropped in [...]The post Every Japanese prefecture shrinking first appeared on John D. Cook.
Named entity recognition
Named entity recognition (NER) is a task of natural language processing: pull out named things from text. It sounds trivial at first. Just create a giant list of named things and compare against that. But suppose, for example, University of Texas is on your list. If Texas is also on your list, do you report [...]The post Named entity recognition first appeared on John D. Cook.
Jaccard index and jazz albums
Jaccard index is a way of measuring the similarity of sets. The Jaccard index, or Jaccard similarity coefficient, of two sets A and B is the number of elements in their intersection, A ∩ B, divided by the number of elements in their union, A ∪ B. Jaccard similarity is a robust way to compare [...]The post Jaccard index and jazz albums first appeared on John D. Cook.
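With Python sets the definition is nearly a one-liner. In this sketch the function name and the sample sets are placeholders invented for illustration, and returning 1.0 for two empty sets is a convention assumed here rather than something the post specifies.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical personnel lists for two albums.
album_one = {"trumpet", "piano", "bass", "drums"}
album_two = {"saxophone", "piano", "bass", "drums"}
print(jaccard_index(album_one, album_two))
```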
Trying NLP on Middle English
It's not fair to evaluate NLP software on a language it wasn't designed to process, but I wanted to try it anyway. The models in the spaCy software library were trained on modern English text and not on Middle English. Nevertheless, spaCy does a pretty good job of parsing Chaucer's Canterbury Tales, written over 600 [...]The post Trying NLP on Middle English first appeared on John D. Cook.
Extending harmonic numbers
For a positive integer n, the nth harmonic number is defined to be the sum of the reciprocals of the first n positive integers: H_n = 1 + 1/2 + 1/3 + ... + 1/n. How might we extend this definition so that n does not have to be a positive integer? First approach One way to extend harmonic numbers is as follows. Start with the [...]The post Extending harmonic numbers first appeared on John D. Cook.
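For positive integers the definition can be computed exactly with rational arithmetic. This sketch covers only the integer case described above, not any of the extensions the post goes on to discuss; the function name is chosen for this example.

```python
from fractions import Fraction


def harmonic(n):
    """Exact nth harmonic number: 1 + 1/2 + ... + 1/n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sum(Fraction(1, k) for k in range(1, n + 1))


print(harmonic(4), float(harmonic(4)))
```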
A note on Zipf’s law
Very often when a number is large, and we don't know or care exactly how large it is, we can model it as infinite. This may make no practical difference and can make calculations much easier. I give several examples of this in the article Infinite is easier than big. When you run across a [...]The post A note on Zipf's law first appeared on John D. Cook.
Natural language processing and unnatural text
I recently evaluated two software applications designed to find PII (personally identifiable information) in free text using natural language processing. Both failed badly, passing over obvious examples of PII. By contrast, I also tried natural language processing software on a nonsensical poem, and the software did quite well. Doctor's notes It occurred to me later [...]The post Natural language processing and unnatural text first appeared on John D. Cook.
How rare is it to encounter a rare word?
I recently ran across a paper on typesetting rare Chinese characters. From the abstract: Written Chinese has tens of thousands of characters. But most available fonts contain only around 6 to 12 thousand common characters that can meet the needs of everyday users. However, in publications and information exchange in many professional fields, a number [...]The post How rare is it to encounter a rare word? first appeared on John D. Cook.
How an LLM might leak medical data
Machine learning models occasionally memorize training data. Under the right prompt, a model could return portions of the training data verbatim. If a large language model is trained on deidentified medical data, along with data that overlaps with the medical data, it could potentially leak details of a person's medical history. I'm not saying that [...]The post How an LLM might leak medical data first appeared on John D. Cook.
V-statistics
A few days ago I wrote about U-statistics, statistics which can be expressed as the average of a symmetric function over all combinations of elements of a set. V-statistics can be written as an average over all products of elements of a set. Let S be a statistical sample of size n and let [...]The post V-statistics first appeared on John D. Cook.
Filtering on how words are being used
Yesterday I wrote about how you could use the spaCy Python library to find proper nouns in a document. Now suppose you want to refine this and find proper nouns that are the subjects of sentences or proper nouns that are direct objects. This post was motivated by a project in which I needed to [...]The post Filtering on how words are being used first appeared on John D. Cook.
Forever chemicals and blood donation
I saw a headline saying that donating blood lowers the level of forever chemicals in your body. This post will give a back-of-the-envelope calculation to show that this idea is plausible. Suppose there are chemicals in your bloodstream that do not break down and that your body will not filter out. Suppose you have about [...]The post Forever chemicals and blood donation first appeared on John D. Cook.
Searching for proper nouns
Suppose you want to find all the proper nouns in a document. You could grep for every word that starts with a capital letter with something like grep '\b[A-Z]\w+' but this would return the first word of each sentence in addition to the words you're after. You could grep for capitalized words that are not [...]The post Searching for proper nouns first appeared on John D. Cook.
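The drawback mentioned above is easy to see with the same pattern in Python's `re` module. The sample sentence is made up for this example.

```python
import re

CAPITALIZED = re.compile(r"\b[A-Z]\w+")

# Hypothetical sample text.
text = "Yesterday Alice met Bob in Paris."
matches = CAPITALIZED.findall(text)
print(matches)
```

The match list includes the sentence-initial word "Yesterday" along with the proper nouns, which is exactly the false positive the pattern cannot avoid on its own.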
Moments of Tukey’s g-and-h distribution
John Tukey developed his so-called g-and-h distribution to be very flexible, having a wide variety of possible values of skewness and kurtosis. Although the reason for the distribution's existence is its range of possible skewness and kurtosis values, calculating the skewness and kurtosis of the distribution is not simple. Definition Let be the function of [...]The post Moments of Tukey's g-and-h distribution first appeared on John D. Cook.
Symmetric functions and U-statistics
A symmetric function is a function whose value is unchanged under every permutation of its arguments. The previous post showed how three symmetric functions of the sides of a triangle, a + b + c, ab + bc + ac, and abc, are related to the perimeter, inner radius, and outer radius. It also mentioned that [...]The post Symmetric functions and U-statistics first appeared on John D. Cook.
Relating perimeter, inner radius, outer radius, and sides of a triangle
Suppose a triangle T has sides a, b, and c. Let s be the semi-perimeter, i.e. half the perimeter. Let r be the inner radius, the radius of the largest circle that can fit inside T. Let R be the outer radius, the radius of the smallest circle that can enclose T. Then three simple [...]The post Relating perimeter, inner radius, outer radius, and sides of a triangle first appeared on John D. Cook.
Experiments with Bing chat
My two previous posts looked at experiments with ChatGPT and Google Bard. This post will look at redoing the same experiments with Microsoft's Bing Chat: looking for mnemonic encodings and simplifying Boolean expressions. When you open up Bing chat you can select a conversational style: More creative, More balanced, or More precise. I chose "more precise" [...]The post Experiments with Bing chat first appeared on John D. Cook.
Boolean function minimization with AI
I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression. I didn't expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it [...]The post Boolean function minimization with AI first appeared on John D. Cook.
Large language models and mnemonics
The Major mnemonic system encodes numbers as words in order to make them easier to remember. Digits correspond to consonant sounds (not spellings) as explained here. You can use the system ad hoc, improvising an encoding of a word as needed, or you can memorize canonical encodings of numbers, also known as pegs. Pegs have [...]The post Large language models and mnemonics first appeared on John D. Cook.
When does a function have an addition theorem?
Motivating examples The addition theorem for cosine says that cos(x + y) = cos x cos y - sin x sin y and the addition theorem for hyperbolic cosine is analogous, though with a sign change. An addition theorem is a theorem that relates a function's value at x + y to its values at x and at y. The squaring function satisfies a very simple addition theorem [...]The post When does a function have an addition theorem? first appeared on John D. Cook.
How to mark a language in HTML
In HTML you can mark the language of a piece of text by putting it inside span tags and setting the lang attribute to a two-letter abbreviation. For example, <span lang="fr">Allons enfants de la Patrie, Le jour de gloire est arrivé !</span> indicates that the first two lines of the French national anthem are in [...]The post How to mark a language in HTML first appeared on John D. Cook.
Russian transliteration hack
I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form & + transliteration + cy;. For example, the Cyrillic letter is based on the Greek letter and [...]The post Russian transliteration hack first appeared on John D. Cook.
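The entity pattern described above can be tried out with Python's standard `html` module, which knows the HTML5 named entities. The helper name and the sample word are assumptions made for this sketch, and it only works for transliterations that actually have a matching entity.

```python
import html


def cyrillic_from_translit(letters):
    """Build '&' + transliteration + 'cy;' for each letter and decode it."""
    entities = "".join(f"&{letter}cy;" for letter in letters)
    return html.unescape(entities)


# Hypothetical example: the transliterated letters m, i, r.
word = cyrillic_from_translit(["m", "i", "r"])
print(word)
```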
Symbols for transforms
I was looking through HTML entities and ran across &Fouriertrf;. I searched for all entities ending in trf; and also found &Mellintrf;, &Laplacetrf;, and &zeetrf;. Apparently "trf" stands for "transform" and these symbols are intended to be used to represent the Fourier transform, Mellin transform, Laplace transform, and z-transform. You would not know from the Unicode [...]The post Symbols for transforms first appeared on John D. Cook.
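One way to see which characters these entities stand for is to decode them with Python's standard library and look up their Unicode names. This is a small sketch; the loop and variable names are chosen for this example.

```python
import html
import unicodedata

entities = ["&Fouriertrf;", "&Mellintrf;", "&Laplacetrf;", "&zeetrf;"]

decoded = {}
for entity in entities:
    char = html.unescape(entity)
    decoded[entity] = char
    # Print the entity, its code point, and the official Unicode name.
    print(entity, f"U+{ord(char):04X}", unicodedata.name(char))
```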
Visualizing a determinant identity
The previous post discussed an algorithm developed by Charles Dodgson (better known as Lewis Carroll) for computing determinants. The key identity for proving that Dodgson's algorithm is correct involves the Desnanot-Jacobi identity from 1841. The identity is intimidating in its symbolic form and yet easy to visualize. In algebraic form the identity says Here a [...]The post Visualizing a determinant identity first appeared on John D. Cook.
How Lewis Carroll computed determinants
Charles Dodgson, better known by his pen name Lewis Carroll, discovered a method of calculating determinants now known variously as the method of contractants, Dodgson condensation, or simply condensation. The method was devised for ease of computation by hand, but it has features that make it a practical method for computation by machine. Overview The [...]The post How Lewis Carroll computed determinants first appeared on John D. Cook.
Gram matrix
An elegant algebraic identity says If x is the vector [a b] and y is the vector [c d] then this identity can be written where the dot indicates the usual dot product. I posted this on Twitter the other day. Gram matrix Now suppose that x and y are vectors of any length n. [...]The post Gram matrix first appeared on John D. Cook.
Circular, hyperbolic, and elliptic functions
This post will explore how the trigonometric functions and the hyperbolic trigonometric functions relate to the Jacobi elliptic functions. There are six circular functions: sin, cos, tan, sec, csc, and cot. There are six hyperbolic functions: just stick an 'h' on the end of each of the circular functions. There are an infinite number of [...]The post Circular, hyperbolic, and elliptic functions first appeared on John D. Cook.
Mnemonic hexagons
I posted some notes here about mnemonics for trig identities and hyperbolic trig identities.The post Mnemonic hexagons first appeared on John D. Cook.
Best line to fit three points
Suppose you want to fit a line to three data points. If no line passes exactly through your points, what's the best compromise you could make? Chebyshev suggested the best thing to do is find the minmax line, the line that minimizes the maximum error. That is, for each candidate line, find the vertical distance [...]The post Best line to fit three points first appeared on John D. Cook.
Lehman’s inequality, circuits, and LaTeX
Let A, B, C, and D be positive numbers. Then Lehman's inequality says Proof by circuit This inequality can be proved analytically, but Lehman's proof is interesting because he uses electrical circuits [1]. Let A, B, C, and D be the resistances of resistors arranged as in the circuit on the left. Resistors R1 and [...]The post Lehman's inequality, circuits, and LaTeX first appeared on John D. Cook.
Contraharmonic mean quirk
A few weeks ago I wrote about the contraharmonic mean. Given two positive numbers a and b, their contraharmonic mean is C(a, b) = (a² + b²)/(a + b). This mean has the unusual property that increasing one of the two inputs could decrease the mean. If you take the partial derivative with respect to a you can see that it is zero [...]The post Contraharmonic mean quirk first appeared on John D. Cook.
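The quirk is easy to observe numerically. The sketch below uses the standard definition of the contraharmonic mean, (a² + b²)/(a + b), and sample inputs chosen for illustration: holding b fixed at 1 and raising a from 0.1 to 0.4 lowers the mean.

```python
def contraharmonic(a, b):
    """Contraharmonic mean of two positive numbers."""
    return (a * a + b * b) / (a + b)


small = contraharmonic(0.1, 1.0)
larger_input = contraharmonic(0.4, 1.0)
print(small, larger_input)
```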
Technological schadenfreude
I had a tweet go viral yesterday, at least relatively viral. Elon Musk could tweet a punctuation mark and get orders of magnitude more traffic, but this was viral by my standards [1]. "That schadenfreude-like feeling when you realize something you felt you should learn but didn't is now obsolete." Apparently this resonated [...]The post Technological schadenfreude first appeared on John D. Cook.
Complex differential equations
Differential equations in the complex plane are different from differential equations on the real line. Suppose you have an nth order linear differential equation as follows. The real case If the a's are continuous, real-valued functions on an open interval of the real line, then there are n linearly independent solutions over that interval. The [...]The post Complex differential equations first appeared on John D. Cook.