Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-05-06 18:03
Tracking and the Euler rotation theorem
Suppose you are in an air traffic control tower observing a plane moving in a straight line and you want to rotate your frame of reference to align with the plane. In the new frame the plane is moving along a coordinate axis with no component of motion in the other directions. You could do [...]The post Tracking and the Euler rotation theorem first appeared on John D. Cook.
Using WordNet to create a PAO system
NLP software infers parts of speech by context. For example, the spaCy NLP software can determine the parts of speech in the poem Jabberwocky even though the words are nonsense. More on this here. If you want to tell the parts of speech for isolated words, maybe software like spaCy isn't the right tool. You [...]The post Using WordNet to create a PAO system first appeared on John D. Cook.
Memorizing four-digit numbers
The Major mnemonic system is a method of converting numbers to words that can be more easily memorized. The basics of the system can be written on an index card, but there are practical details that are seldom written down. Presentations of the Major system can be misleading, intentionally or unintentionally, by implying that it [...]The post Memorizing four-digit numbers first appeared on John D. Cook.
The numerical range ellipse
Let A be an n × n complex matrix. The numerical range of A is the image of x*Ax over the unit sphere. That is, the numerical range of A is the set W(A) in ℂ defined by W(A) = {x*Ax | x ∈ ℂ^n and ||x|| = 1} where x* is the conjugate transpose of [...]The post The numerical range ellipse first appeared on John D. Cook.
Random slices of a sphube
Ben Grimmer posted something yesterday on Twitter: A nice mathematical puzzle If you take a 4-norm ball and cut it carefully, you will find a two-norm ball. 3D printed visual evidence below. The puzzle: Why does this happen and how much more generally does it happen? (This question was first posed to me by Pablo [...]The post Random slices of a sphube first appeared on John D. Cook.
Twin stars and twin primes
Are there more twin stars or twin primes? If the twin prime conjecture is true, there are an infinite number of twin primes, and that would settle the question. We don't know whether there are infinitely many twin primes, and it's a little challenging to find any results on how many twin primes we're sure [...]The post Twin stars and twin primes first appeared on John D. Cook.
Simple way to distribute points on a sphere
Evenly placing points on a sphere is a difficult problem. It's impossible in general, and so you distribute the points as evenly as you can. The results vary according to how you measure how evenly the points are spread. However, there is a fast and simple way to distribute points that may be good enough, [...]The post Simple way to distribute points on a sphere first appeared on John D. Cook.
Spherical coordinate Rosetta Stone
If you've only seen one definition of spherical coordinates, you may be shocked to discover that there are multiple conventions. In particular, mathematicians and geoscientists have different conventions. As Volker Michel put it in his book on constructive approximation, Many mathematicians have faced weird jigsaw puzzles with misplaced continents after using a data set from a [...]The post Spherical coordinate Rosetta Stone first appeared on John D. Cook.
Creating a Traveling Salesman Tour of Texas with Mathematica
A Traveling Salesman tour visits a list of destinations using the shortest path. There's an obvious way to find the shortest path connecting N points: try all N! paths and see which one is shortest. Unfortunately, that might take a while. Texas has 254 counties, and so calculating a tour of Texas counties by brute [...]The post Creating a Traveling Salesman Tour of Texas with Mathematica first appeared on John D. Cook.
Area and volume of hypersphere cap
A spherical cap is the portion of a sphere above some horizontal plane. For example, the polar ice cap of the earth is the region above some latitude. I mentioned in this post that the area above a latitude φ is 2πR²(1 − sin φ), where R is the earth's radius. Latitude is the angle up from the equator. [...]The post Area and volume of hypersphere cap first appeared on John D. Cook.
Random points in a high-dimensional orthant
In high dimensions, randomly chosen vectors are very likely nearly orthogonal. I'll unpack this a little bit then demonstrate it by simulation. Then I'll look at what happens when we restrict our attention to points with positive coordinates. *** The lengths of vectors don't contribute to the angles between them, so we may as well [...]The post Random points in a high-dimensional orthant first appeared on John D. Cook.
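The near-orthogonality claim can be checked with a small simulation. This is a minimal sketch, not the post's own code: the function names, the sample sizes, and the trick of taking absolute values to land in the positive orthant are all assumptions made for illustration.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Pick a random direction by normalizing independent Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cosine(dim, pairs=100, seed=0, positive=False):
    """Average |cos(angle)| over random pairs of unit vectors.

    If positive is True, every coordinate is replaced by its absolute
    value, which restricts the vectors to the positive orthant
    (an illustrative assumption, not necessarily the post's method).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = random_unit_vector(dim, rng)
        v = random_unit_vector(dim, rng)
        if positive:
            u = [abs(x) for x in u]
            v = [abs(x) for x in v]
        # For unit vectors the cosine of the angle is the dot product.
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / pairs

if __name__ == "__main__":
    print(mean_abs_cosine(1000))                 # close to 0: nearly orthogonal
    print(mean_abs_cosine(1000, positive=True))  # clearly away from 0
```

Because lengths do not affect angles, the sketch normalizes every vector first and then only looks at dot products.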
Cosine similarity does not satisfy the triangle inequality
The previous post looked at cosine similarity for embeddings of words in vector spaces. Word embeddings like word2vec map words into high-dimensional vector spaces in such a way that related words correspond to vectors that are roughly parallel. Ideally the more similar the words, the smaller the angle between their corresponding vectors. The cosine similarity [...]The post Cosine similarity does not satisfy the triangle inequality first appeared on John D. Cook.
Angles between words
Natural language processing represents words as high-dimensional vectors, on the order of 100 dimensions. For example, the glove-wiki-gigaword-50 set of word vectors contains 50-dimensional vectors, and the glove-wiki-gigaword-200 set of word vectors contains 200-dimensional vectors. The intent is to represent words in such a way that the angle between vectors is related to similarity [...]The post Angles between words first appeared on John D. Cook.
Productive constraints
This post will discuss two scripting languages, but that's not what the post is really about. It's really about expressiveness and (or versus) productivity. *** I was excited to discover the awk programming language sometime in college because I had not used a scripting language before. Compared to C, awk was high-level luxury. Then a [...]The post Productive constraints first appeared on John D. Cook.
Möbius transformations over a finite field
A Möbius transformation is a function of the form f(z) = (az + b)/(cz + d) where ad - bc = 1. We usually think of z as a complex number, but it doesn't have to be. We could define Möbius transformations in any context where we can multiply, add, and divide, i.e. over any field. In particular, we could work over [...]The post Möbius transformations over a finite field first appeared on John D. Cook.
Sort and remove duplicates
A common idiom in command line processing of text files is ... | sort | uniq | ... Some process produces lines of text. You want to pipe that text through sort to sort the lines in alphabetical order, then pass it to uniq to filter out all but the unique lines. The uniq utility [...]The post Sort and remove duplicates first appeared on John D. Cook.
Swish function and a Swiss mathematician
The previous post looked at the swish function and related activation functions for deep neural networks designed to address the "dying ReLU problem." Unlike many activation functions, the function f(x) is not monotone but has a minimum near x0 = -1.2784. The exact location of the minimum is x0 = -W(1/e) - 1, where W is the Lambert W function, [...]The post Swish function and a Swiss mathematician first appeared on John D. Cook.
Swish, mish, and serf
Swish, mish, and serf are neural net activation functions. The names are fun to say, but more importantly the functions have been shown to improve neural network performance by solving the "dying ReLU problem." This happens when a large number of node weights become zero during training and do not contribute further to the learning [...]The post Swish, mish, and serf first appeared on John D. Cook.
Generating and inspecting an RSA private key
In principle you generate an RSA key by finding two large prime numbers, p and q, and computing n = pq. You could, for example, generate random numbers by rolling dice, then type the numbers into Mathematica to test each for primality until you find a couple prime numbers of the right size. In practice [...]The post Generating and inspecting an RSA private key first appeared on John D. Cook.
RSA encryption in practice
At its core, RSA encryption is modular exponentiation. That is, given a message m, the encrypted form of m is x = m^e mod n where e is a publicly known exponent and n is a product of two large primes. The number n is made public but only the holder of the private key [...]The post RSA encryption in practice first appeared on John D. Cook.
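As a rough illustration of the modular exponentiation described above, here is a toy sketch using Python's built-in three-argument pow. The primes, exponent, and message are tiny values chosen for readability and are assumptions for this example; real RSA uses far larger primes plus padding, which this sketch omits.

```python
# Toy parameters chosen for illustration only; far too small for real use.
p, q = 61, 53
n = p * q                    # public modulus, n = pq
e = 17                       # public exponent
phi = (p - 1) * (q - 1)      # Euler's totient of n
d = pow(e, -1, phi)          # private exponent: inverse of e mod phi (Python 3.8+)

def encrypt(m):
    """Textbook RSA encryption: x = m^e mod n."""
    return pow(m, e, n)

def decrypt(x):
    """Textbook RSA decryption: m = x^d mod n."""
    return pow(x, d, n)

if __name__ == "__main__":
    m = 65
    x = encrypt(m)
    print(x, decrypt(x))
```

The three-argument form of pow reduces modulo n at every step, so the intermediate numbers never grow beyond the size of n squared.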
Code to convert words to Major system numbers
A few days ago I wrote about using the CMU Pronouncing Dictionary to search for words that decode to certain numbers in the Major mnemonic system. You can find a brief description of the Major system in that post. As large as the CMU dictionary is, it did not contain words mapping to some three-digit [...]The post Code to convert words to Major system numbers first appeared on John D. Cook.
Software and the Allee effect
The Allee effect is named after Warder Clyde Allee who added a term to the famous logistic equation. His added term is highlighted in blue. Here N is the population of a species over time, r is the intrinsic rate of increase, K is the carrying capacity, and A is the critical point. If you [...]The post Software and the Allee effect first appeared on John D. Cook.
Solved problems becoming unsolved
"That's a solved problem. So nobody knows how to solve it anymore." Once a problem is deemed "solved" interest in the problem plummets. "Solved" problems may not be fully solved, but sufficiently solved that the problem is no longer fashionable. Practical issues remain, but interest moves elsewhere. The software written for the problem slowly decays. [...]The post Solved problems becoming unsolved first appeared on John D. Cook.
The cobbler’s son
There's an old saying "The cobbler's son has no shoes." It's generally taken to mean that we can neglect to do for ourselves something we do for other people. I've been writing a few scripts for my personal use, things I've long intended to do but only recently got around to doing. I said something [...]The post The cobbler's son first appeared on John D. Cook.
Date sequence from the command line
I was looking back at Jeroen Janssens' book Data Science at the Command Line and his dseq utility caught my eye. This utility prints out a sequence of dates relative to the current date. I've needed this and didn't know it. Suppose you have a CSV file and you need to add a column of [...]The post Date sequence from the command line first appeared on John D. Cook.
Up-down permutations
An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example 1, 5, 3, 4, 2 is an up-down permutation of 1, 2, 3, 4, 5 because 1 < 5 > 3 < 4 > 2. Up-down permutations are [...]The post Up-down permutations first appeared on John D. Cook.
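The definition above translates directly into a short check. The following is a minimal sketch; the names is_up_down and count_up_down are hypothetical helpers introduced here, and the brute-force count is only practical for small n.

```python
from itertools import permutations

def is_up_down(seq):
    """Return True if seq[0] < seq[1] > seq[2] < seq[3] > ... holds."""
    for i in range(len(seq) - 1):
        if i % 2 == 0 and not seq[i] < seq[i + 1]:
            return False  # an "up" step was expected here
        if i % 2 == 1 and not seq[i] > seq[i + 1]:
            return False  # a "down" step was expected here
    return True

def count_up_down(n):
    """Count up-down permutations of 1..n by checking all n! orderings."""
    return sum(is_up_down(p) for p in permutations(range(1, n + 1)))

if __name__ == "__main__":
    print(is_up_down([1, 5, 3, 4, 2]))  # the example from the text
    print([count_up_down(n) for n in range(1, 7)])
```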
Variance of binned data
Suppose you have data that for some reason has been summarized into bins of width h. You don't have the original data, only the number of counts in each bin. You can't exactly find the sample mean or sample variance of the data because you don't actually have the data. But what's the best you [...]The post Variance of binned data first appeared on John D. Cook.
Ancient estimate of π and modern numerical analysis
A very crude way to estimate π would be to find the perimeter of squares inside and outside a unit circle. The outside square has sides of length 2, so 2π < 8. The inside square has sides of length 2/√2, so 8/√2 < 2π. This tells us π is between 2.82 and 4. Not [...]The post Ancient estimate of π and modern numerical analysis first appeared on John D. Cook.
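The square-based bounds can be reproduced in a few lines. This is a sketch of the arithmetic only, assuming a unit circle with one square inscribed in it and one circumscribed around it; the variable names are illustrative.

```python
import math

# Circumscribed square: each side equals the circle's diameter, 2.
outer_perimeter = 4 * 2

# Inscribed square: its diagonal is the diameter, so each side is 2 / sqrt(2).
inner_side = 2 / math.sqrt(2)
inner_perimeter = 4 * inner_side

# The circumference 2*pi lies between the two perimeters,
# so pi lies between half of each.
lower = inner_perimeter / 2
upper = outer_perimeter / 2

if __name__ == "__main__":
    print(lower, math.pi, upper)
```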
ARPAbet and the Major mnemonic system
ARPAbet is a phonetic spelling system developed by, you guessed it, ARPA, before it became DARPA. The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet. In [...]The post ARPAbet and the Major mnemonic system first appeared on John D. Cook.
Ruzsa distance
A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group. Subset difference Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A - B [...]The post Ruzsa distance first appeared on John D. Cook.
Finding the imaginary part of an analytic function from the real part
A function f of a complex variable z = x + iy can be factored into real and imaginary parts: f(z) = u(x, y) + i v(x, y), where x and y are real numbers, and u and v are real-valued functions of two real values. Suppose you are given u(x, y) and you want to find v(x, y). The function v is called [...]The post Finding the imaginary part of an analytic function from the real part first appeared on John D. Cook.
Every Japanese prefecture shrinking
It's well known that the population of Japan has been decreasing for years, and so I was a little puzzled by a recent headline saying that Japan's population has dropped in every one of its 47 prefectures. Although the national population is in decline, until now not all of the nation's 47 prefectures dropped in [...]The post Every Japanese prefecture shrinking first appeared on John D. Cook.
Named entity recognition
Named entity recognition (NER) is a task of natural language processing: pull named things out of text. It sounds trivial at first. Just create a giant list of named things and compare against that. But suppose, for example, University of Texas is on your list. If Texas is also on your list, do you report [...]The post Named entity recognition first appeared on John D. Cook.
Jaccard index and jazz albums
Jaccard index is a way of measuring the similarity of sets. The Jaccard index, or Jaccard similarity coefficient, of two sets A and B is the number of elements in their intersection, A ∩ B, divided by the number of elements in their union, A ∪ B. Jaccard similarity is a robust way to compare [...]The post Jaccard index and jazz albums first appeared on John D. Cook.
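The definition above maps directly onto Python's set operations. This is a minimal sketch; the function name and the placeholder set members are assumptions for illustration, and returning 1.0 for two empty sets is a convention chosen here, not something stated in the excerpt.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    # Placeholder personnel lists for two hypothetical albums.
    album_1 = {"player_a", "player_b", "player_c"}
    album_2 = {"player_b", "player_c", "player_d"}
    print(jaccard_index(album_1, album_2))  # 2 shared out of 4 total
```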
Trying NLP on Middle English
It's not fair to evaluate NLP software on a language it wasn't designed to process, but I wanted to try it anyway. The models in the spaCy software library were trained on modern English text and not on Middle English. Nevertheless, spaCy does a pretty good job of parsing Chaucer's Canterbury Tales, written over 600 [...]The post Trying NLP on Middle English first appeared on John D. Cook.
Extending harmonic numbers
For a positive integer n, the nth harmonic number is defined to be the sum of the reciprocals of the first n positive integers: H_n = 1 + 1/2 + 1/3 + ... + 1/n. How might we extend this definition so that n does not have to be a positive integer? First approach One way to extend harmonic numbers is as follows. Start with the [...]The post Extending harmonic numbers first appeared on John D. Cook.
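For positive integers the definition is a one-line sum. The sketch below uses exact rational arithmetic so the results are not blurred by floating point; the function name harmonic is an illustrative choice, and the code covers only the integer case, not the extension the post goes on to discuss.

```python
from fractions import Fraction

def harmonic(n):
    """nth harmonic number: sum of reciprocals of the first n positive integers."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, harmonic(n), float(harmonic(n)))
```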
A note on Zipf’s law
Very often when a number is large, and we don't know or care exactly how large it is, we can model it as infinite. This may make no practical difference and can make calculations much easier. I give several examples of this in the article Infinite is easier than big. When you run across a [...]The post A note on Zipf's law first appeared on John D. Cook.
Natural language processing and unnatural text
I recently evaluated two software applications designed to find PII (personally identifiable information) in free text using natural language processing. Both failed badly, passing over obvious examples of PII. By contrast, I also tried natural language processing software on a nonsensical poem, and the software did quite well. Doctor's notes It occurred to me later [...]The post Natural language processing and unnatural text first appeared on John D. Cook.
How rare is it to encounter a rare word?
I recently ran across a paper on typesetting rare Chinese characters. From the abstract: Written Chinese has tens of thousands of characters. But most available fonts contain only around 6 to 12 thousand common characters that can meet the needs of everyday users. However, in publications and information exchange in many professional fields, a number [...]The post How rare is it to encounter a rare word? first appeared on John D. Cook.
How an LLM might leak medical data
Machine learning models occasionally memorize training data. Under the right prompt, a model could return portions of the training data verbatim. If a large language model is trained on deidentified medical data, along with data that overlaps with the medical data, it could potentially leak details of a person's medical history. I'm not saying that [...]The post How an LLM might leak medical data first appeared on John D. Cook.
V-statistics
A few days ago I wrote about U-statistics, statistics which can be expressed as the average of a symmetric function over all combinations of elements of a set. V-statistics can be written as an average over all products of elements of a set. Let S be a statistical sample of size n and let [...]The post V-statistics first appeared on John D. Cook.
Filtering on how words are being used
Yesterday I wrote about how you could use the spaCy Python library to find proper nouns in a document. Now suppose you want to refine this and find proper nouns that are the subjects of sentences or proper nouns that are direct objects. This post was motivated by a project in which I needed to [...]The post Filtering on how words are being used first appeared on John D. Cook.
Forever chemicals and blood donation
I saw a headline saying that donating blood lowers the level of forever chemicals in your body. This post will give a back-of-the-envelope calculation to show that this idea is plausible. Suppose there are chemicals in your bloodstream that do not break down and that your body will not filter out. Suppose you have about [...]The post Forever chemicals and blood donation first appeared on John D. Cook.
Searching for proper nouns
Suppose you want to find all the proper nouns in a document. You could grep for every word that starts with a capital letter with something like grep '\b[A-Z]\w+' but this would return the first word of each sentence in addition to the words you're after. You could grep for capitalized words that are not [...]The post Searching for proper nouns first appeared on John D. Cook.
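The limitation of the capital-letter pattern is easy to see on a short sample. This sketch applies the same regular expression through Python's re module rather than grep; the sample sentence is made up for illustration.

```python
import re

# Same pattern as the grep example: a word boundary, one capital letter,
# then one or more word characters.
CAPITALIZED = re.compile(r"\b[A-Z]\w+")

def capitalized_words(text):
    """Return every capitalized word, including sentence-initial ones."""
    return CAPITALIZED.findall(text)

if __name__ == "__main__":
    sample = "Yesterday Alice flew to Paris. She met Bob there."
    # "Yesterday" and "She" are matched along with the proper nouns.
    print(capitalized_words(sample))
```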
Moments of Tukey’s g-and-h distribution
John Tukey developed his so-called g-and-h distribution to be very flexible, having a wide variety of possible values of skewness and kurtosis. Although the reason for the distribution's existence is its range of possible skewness and kurtosis values, calculating the skewness and kurtosis of the distribution is not simple. Definition Let be the function of [...]The post Moments of Tukey's g-and-h distribution first appeared on John D. Cook.
Symmetric functions and U-statistics
A symmetric function is a function whose value is unchanged under every permutation of its arguments. The previous post showed how three symmetric functions of the sides of a triangle, a + b + c, ab + bc + ac, and abc, are related to the perimeter, inner radius, and outer radius. It also mentioned that [...]The post Symmetric functions and U-statistics first appeared on John D. Cook.
Relating perimeter, inner radius, outer radius, and sides of a triangle
Suppose a triangle T has sides a, b, and c. Let s be the semi-perimeter, i.e. half the perimeter. Let r be the inner radius, the radius of the largest circle that can fit inside T. Let R be the outer radius, the radius of the smallest circle that can enclose T. Then three simple [...]The post Relating perimeter, inner radius, outer radius, and sides of a triangle first appeared on John D. Cook.
Experiments with Bing chat
My two previous posts looked at experiments with ChatGPT and Google Bard. This post will look at redoing the same experiments with Microsoft's Bing Chat: looking for mnemonic encodings and simplifying Boolean expressions. When you open up Bing chat you can select a conversational style: More creative, More balanced, or More precise. I chose "more precise" [...]The post Experiments with Bing chat first appeared on John D. Cook.
Boolean function minimization with AI
I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression. I didn't expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it [...]The post Boolean function minimization with AI first appeared on John D. Cook.
Large language models and mnemonics
The Major mnemonic system encodes numbers as words in order to make them easier to remember. Digits correspond to consonant sounds (not spellings) as explained here. You can use the system ad hoc, improvising an encoding of a word as needed, or you can memorize canonical encodings of numbers, also known as pegs. Pegs have [...]The post Large language models and mnemonics first appeared on John D. Cook.