Feed john-d-cook John D. Cook


Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2024-11-23 12:16
QR Codes and Percolation
Percolation theory looks at problems such as the probability of being able to traverse some region with random obstacles. It is motivated by problems such as modeling the flow of a fluid in a porous medium. Here’s a percolation problem for QR codes: What is the probability that there is a path from one side […]
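Here is a rough Monte Carlo sketch of the question. The assumptions are mine, not the post's. Each module of a 21 x 21 grid (the size of a version 1 QR code) is independently light with probability 0.5. A path may move only between edge-adjacent light modules. The question is whether some path connects the left edge to the right edge. Real QR codes have fixed finder patterns and are not independent coin flips, so treat the estimate as illustrative.

```python
import random
from collections import deque

def has_crossing(grid):
    """True if light cells (True) link the left column to the right column,
    moving only up, down, left, or right."""
    rows, cols = len(grid), len(grid[0])
    queue = deque((r, 0) for r in range(rows) if grid[r][0])
    seen = set(queue)
    while queue:  # breadth-first search from every light cell on the left edge
        r, c = queue.popleft()
        if c == cols - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

def crossing_probability(size, p_light=0.5, trials=2000, seed=1):
    """Monte Carlo estimate of P(left-to-right path) on a random size x size grid.
    Independence of cells and p_light = 0.5 are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p_light for _ in range(size)] for _ in range(size)]
        hits += has_crossing(grid)
    return hits / trials

print(crossing_probability(21))  # 21 x 21 modules, the size of a version 1 QR code
```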
Why is an empty sum 0 and an empty product 1?
In response to my earlier post on why 0! should be 1, several people replied that 0! = 1 because an empty product is 1. You can define the factorial of an integer n as the product of all positive numbers less than or equal to n. There are no positive integers less than or equal […]
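Code shows one way to see the convention. 0 and 1 are the identities for addition and multiplication. Those are exactly the values that keep "split a list anywhere, then combine the parts" valid even when one part is empty. Python's `sum` and `math.prod` (Python 3.8+) already follow this convention. Defining factorial as a product over `range(1, n + 1)` then gives 0! = 1 for free.

```python
import math

# An empty sum is the additive identity; an empty product is the multiplicative identity.
print(sum([]))        # 0
print(math.prod([]))  # 1

def factorial(n):
    """n! as the product of the positive integers <= n; for n = 0 the range is empty."""
    return math.prod(range(1, n + 1))

print(factorial(0), factorial(5))  # 1 120

# The identities keep splitting consistent for every split point, including the ends,
# where one side of the split is an empty list.
xs = [2, 3, 7]
for k in range(len(xs) + 1):
    assert sum(xs[:k]) + sum(xs[k:]) == sum(xs)
    assert math.prod(xs[:k]) * math.prod(xs[k:]) == math.prod(xs)
```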
Quantifying uncertainty
The primary way to quantify uncertainty is to use probability. Subject to certain axioms that aim to capture common-sense rules for quantifying uncertainty, probability theory is essentially the only way. (This is Cox’s theorem.) Other methods, such as fuzzy logic, may be useful, though they must violate common sense (at least as defined by Cox’s theorem) […]
Defining zero factorial
Things are defined the way they are for good reasons. This seems blatantly obvious now, but it was eye-opening when I learned this my first year in college. Our professor, Mike Starbird, asked us to go home and think about how convergence of a series should be defined. Not how it is defined, but how […]
Why not statistics
Jordan Ellenberg’s parents were both statisticians. In his interview with Strongly Connected Components, Jordan explains why he went into mathematics rather than statistics. I tried. I tried to learn some statistics actually when I was younger and it’s a beautiful subject. But at the time I think I found the shakiness of the philosophical underpinnings […]
Another reason we don’t apply the 80-20 rule
I’ve written about the 80-20 rule several times because it keeps coming up. I’d like to believe that each time I revisit it I understand it a little better. In its simplest form the 80-20 rule says 80% of your outputs come from 20% of your inputs. You might find that 80% of your revenue comes from 20% of […]
Endorsements
I’ve added a page for endorsements to my site. Thanks to everyone who let me use their photo and quote. If you’d like to contribute an endorsement, please contact me.
Magicians vs Repairmen
From The World Beyond Your Head: The appeal of magic is that it promises to render objects plastic to the will without one’s getting too entangled with them. Treated at arm’s length, the object can issue no challenge to the self. … The clearest contrast … that I can think of is the repairman, who […]
Looking ten years ahead
From Freeman Dyson: Economic forecasting is useful for predicting the future up to about ten years ahead. Beyond ten years the quantitative changes which the forecast accesses are usually sidetracked or made irrelevant by qualitative changes in the rules of the game. Qualitative changes are produced by human cleverness … or by human stupidity … Neither […]
Key fobs and interstellar space
From JPL scientist Rich Terrile: In everyone’s pocket right now is a computer far more powerful than the one we flew on Voyager, and I don’t mean your cell phone—I mean the key fob that unlocks your car. These days technology is equated with computer technology. For example, the other day I heard someone talk […]
Integration by Darts
Monte Carlo integration has been called “Integration by Darts,” a clever pun on “integration by parts.” I ran across the phrase looking at some slides by Brian Hayes, but apparently it’s been around a while. The explanation that Monte Carlo is “integration by darts” is fine as a 0th order explanation, but it can be […]
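This is a minimal sketch of Monte Carlo integration in its usual averaging form. Draw uniform points on [a, b], average f at those points, and multiply by b - a. It is not literal dart throwing at a region under a curve (the hit-or-miss variant). The sample size and seed are arbitrary choices. The standard error shrinks like 1/√n.

```python
import math
import random

def monte_carlo_integral(f, a, b, n=100_000, seed=42):
    """Estimate the integral of f over [a, b] by averaging f at n uniform random points.
    Returns (estimate, estimated standard error)."""
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance of f(U)
    width = b - a
    return width * mean, width * math.sqrt(var / n)

estimate, stderr = monte_carlo_integral(math.sin, 0.0, math.pi)
print(estimate, stderr)  # the exact integral of sin over [0, pi] is 2
```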
Bayes factors vs p-values
Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericci last week. He attributes the example to “Bernardo (2010)” though I have not been able to find the exact […]
Pros and cons of the term “data science”
I’ve resisted using the term “data science,” and enjoy poking fun at it now and then, but I’ve decided it’s not such a bad label after all. Here are some of the pros and cons of the term. (Listing “cons” first seems backward, but I’m currently leaning toward the pro side, so I thought I […]
Replace data with measurements
To tell whether a statement about data is over-hyped, see whether it retains its meaning if you replace data with measurements. So a request like “Please send me the data from your experiment” becomes “Please send me the measurements from your experiment.” Same thing. But rousing statements about the power of data become banal or even […]
Clinical trials and machine learning
Arguments over the difference between statistics and machine learning are often pointless. There is a huge overlap between the two approaches to analyzing data, sometimes obscured by differences in vocabulary. However, there is one distinction that is helpful. Statistics aims to build accurate models of phenomena, implicitly leaving the exploitation of these models to others. Machine learning aims to solve […]
Fitting a triangular distribution
Sometimes you only need a rough fit to some data and a triangular distribution will do. As the name implies, this is a distribution whose density function graph is a triangle. The triangle is determined by its base, running between points a and b, and a point c somewhere in between where the altitude intersects the base. […]
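The density determined by a, b, and c is piecewise linear, with height 2/(b - a) at the apex so the total area is 1. The sketch below codes that density and samples from it with Python's `random.triangular`, whose arguments are ordered (low, high, mode). It does not reproduce how the post fits a, b, and c to data.

```python
import random

def triangular_pdf(x, a, b, c):
    """Density of the triangular distribution on [a, b] with peak (mode) at c, a <= c <= b."""
    if x < a or x > b:
        return 0.0
    if x < c:
        return 2 * (x - a) / ((b - a) * (c - a))  # rising edge
    if x > c:
        return 2 * (b - x) / ((b - a) * (b - c))  # falling edge
    return 2 / (b - a)  # the apex: height 2/(b - a) makes the area 1

# The standard library can sample from it; note the argument order (low, high, mode).
rng = random.Random(0)
samples = [rng.triangular(1.0, 4.0, 2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # the mean (a + b + c)/3 is 7/3 here
```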
A subtle way to over-fit
If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So in machine learning and statistics, people split their data into two parts. They train the model on one half, and see how well it […]
Mathematical arbitrage
I suspect there’s a huge opportunity in moving mathematics from the pure column to the applied column. There may be a lot of useful math that never sees application because the experts are unconcerned with or unaware of applications. In particular I wonder what applications there may be of number theory, especially analytic number theory. […]
Mathematical modeling in Milton
In Book VIII of Paradise Lost, the angel Raphael tells Adam what difficulties men will have with astronomy: Hereafter, when they come to model heaven And calculate the stars: how they will wield The mighty frame, how build, unbuild, contrive To save appearances, how gird the sphere With centric and eccentric scribbled o’er, Cycle […]
Partitioning natural numbers with pi
Every positive integer is either part of the sequence ⌊ nπ ⌋ or the sequence ⌊ nπ/(π – 1) ⌋ where n ranges over positive integers, and no positive integer is in both sequences. This is a special case of Beatty’s theorem.
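Here is a quick numerical check of the claim for the integers up to 10,000. It relies on floating-point floors, which is safe at this scale but is not a proof. The theorem itself applies because π is irrational and 1/π + (π − 1)/π = 1.

```python
import math

def beatty(r, limit):
    """Terms floor(n*r), n = 1, 2, ..., that are <= limit."""
    terms = []
    n = 1
    while math.floor(n * r) <= limit:
        terms.append(math.floor(n * r))
        n += 1
    return terms

limit = 10_000
pi = math.pi
first = set(beatty(pi, limit))
second = set(beatty(pi / (pi - 1), limit))

# Beatty's theorem: 1/pi + 1/(pi/(pi - 1)) = 1/pi + (pi - 1)/pi = 1 and pi is irrational,
# so the two sequences split the positive integers with no overlap.
assert first.isdisjoint(second)
assert first | second == set(range(1, limit + 1))
print(sorted(first)[:5], sorted(second)[:5])  # [3, 6, 9, 12, 15] [1, 2, 4, 5, 7]
```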
Extremely small probabilities
One objection to modeling adult heights with a normal distribution is that the former is obviously positive but the latter can be negative. However, by this model negative heights are astronomically unlikely. I’ll explain below how one can take “astronomically” literally in this context. A common model says that men’s and women’s heights are normally […]
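This sketch shows how small such a probability is. The excerpt is cut off before the actual parameters, so the mean of 70 inches and standard deviation of 3 inches below are hypothetical. The lower tail is computed with `math.erfc` of a positive argument rather than as 1 minus a CDF value. This keeps the answer from rounding to zero.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), via erfc so tiny lower tails stay accurate."""
    return 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2)))

# Hypothetical parameters for illustration only: mean 70 inches, standard deviation 3 inches.
mu, sigma = 70.0, 3.0
p_negative = normal_cdf(0.0, mu, sigma)  # P(height < 0), about 23 standard deviations out
print(p_negative)              # on the order of 1e-120
print(math.log10(p_negative))
```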
Atavachron
In the Star Trek episode “All Our Yesterdays” the people of the planet Sarpeidon have escaped into their past because their sun is about to become a supernova. They did this via a time machine called the Atavachron. One detail of the episode has stuck with me since I first saw it many years ago: although people can go back […]
Why isn’t everything normally distributed?
Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem. If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about […]
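A small simulation illustrates the central limit theorem explanation above. Each draw adds 30 independent uniform effects on [-1, 1]. The sum already behaves much like a normal variable, with about 68.3% of draws within one standard deviation of the mean. The number of factors, their uniform distribution, and their independence are assumptions chosen so that the central limit theorem applies.

```python
import random
import statistics

def sum_of_effects(n_factors, rng):
    """One draw built as the sum of n independent uniform(-1, 1) effects."""
    return sum(rng.uniform(-1, 1) for _ in range(n_factors))

rng = random.Random(7)
n_factors = 30
draws = [sum_of_effects(n_factors, rng) for _ in range(20_000)]

# For a normal distribution about 68.3% of the mass lies within one standard deviation.
mu = statistics.fmean(draws)
sd = statistics.pstdev(draws)  # theory: each uniform(-1, 1) has variance 1/3, so sd = sqrt(10)
within_one_sd = sum(abs(x - mu) <= sd for x in draws) / len(draws)
print(round(within_one_sd, 3))  # close to 0.683
```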
Machine learning and magic
When I first heard about a lie detector as a child, I was puzzled. How could a machine detect lies? If it could, why couldn’t you use it to predict the future? For example, you could say “IBM stock will go up tomorrow” and let the machine tell you whether you’re lying. Of course lie […]
Quaternions in Paradise Lost
Last night I checked a few books out from a library. One was Milton’s Paradise Lost and another was Kuipers’ Quaternions and Rotation Sequences. I didn’t expect any connection between these two books, but there is one. The following lines from Book V of Paradise Lost, starting at line 180, are quoted in Kuipers’ book: Air […]
Technical notes
For the last fifteen Wednesdays I’ve been posting links to technical notes. This is the end of the series. You can find most of the links from previous Wednesday posts on one page by going to technical notes from the navigation menu at the top of the site.
Oil on a parking lot
Oil on a wet parking lot
Graphemes
Here’s something amusing I ran across in the glossary of Programming Perl: grapheme A graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. Grapheme, or more fully, a grapheme cluster string is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” […]
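The "ȫ" from the glossary entry makes the code point vs. grapheme distinction easy to see. Python's `len` counts code points, not graphemes. The precomposed character U+022B is one code point, but its canonical decomposition is three: o, combining diaeresis, and combining macron. Perl exposes whole grapheme clusters through `\X` in regular expressions. Python's standard `re` module has no equivalent, so this sketch only shows the code-point side.

```python
import unicodedata

composed = "\u022b"  # ȫ as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # canonical decomposition

print(len(composed), len(decomposed))  # 1 3
print([unicodedata.name(ch) for ch in decomposed])
# Both strings display as the same single user-visible character.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```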
Too easy
When people sneer at a technology for being too easy to use, it’s worth trying out. If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way. The problem with the “OK […]
Clinical trial software
This week’s resource post lists some of the projects I managed or contributed to while working at MD Anderson Cancer Center. CRMSimulator is used to design CRM trials, dose-finding based only on toxicity outcomes. BMA-CRMSimulator is a variation on CRMSimulator using Bayesian model averaging. EffTox is used for dose-finding based on toxicity and efficacy outcomes. TTEConduct […]
Finding the best dose
In a dose-finding clinical trial, you have a small number of doses to test, and you hope to find the one with the best response. Here “best” may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc. Since your goal is to find the best dose, it seems natural to compare dose-finding […]
Career advice from Einstein
“If I would be a young man again and had to decide how to make my living, I would not try to become a scientist or scholar or teacher. I would rather choose to be a plumber or a peddler, in the hope to find that modest degree of independence still available under present circumstances.” […]
The opposite of an idiot
The origin of the word idiot is “one’s own,” the same root as idiom. So originally an idiot was someone in his own world, someone who takes no outside input. The historical meaning carries over to some degree: When you see a smart person do something idiotic, it’s usually because he’s acting alone. The opposite of […]
Successful companies with incompetent employees
It’s not hard to imagine how a company filled with great people can thrive. More intriguing are the companies that inspire Dilbert cartoons and yet manage to succeed. When a company thrives despite bad service and incompetent employees, they’re doing something right that isn’t obvious. Not everyone can be incompetent. Someone somewhere in the company must be very competent […]
Stand-alone code for numerical computing
For this week’s resource post, see the page Stand-alone code for numerical computing. It points to small, self-contained bits of code for special functions (log gamma, erf, etc.) and for random number generation (normal, Poisson, gamma, etc.). The code is available in Python, C++, and C# versions. It could easily be translated into other languages […]
Random walks and the arcsine law
Suppose you stand at 0 and flip a fair coin. If the coin comes up heads, you take a step to the right. Otherwise you take a step to the left. How much of the time will you spend to the right of where you started? As the number of steps N goes to infinity, […]
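This simulation sketches the arcsine law. Each walk takes 500 steps. A step counts as "right of the start" when the segment it traces lies on the positive side. The empirical distribution of the fraction of such steps is compared with the limiting CDF (2/π) arcsin(√x). The step count, number of walks, and seed are arbitrary. The arcsine density is U-shaped, so fractions near 0 or 1 are more common than fractions near 1/2.

```python
import math
import random

def fraction_positive(n_steps, rng):
    """Fraction of steps whose segment lies to the right of the starting point."""
    position = 0
    positive = 0
    for _ in range(n_steps):
        previous = position
        position += 1 if rng.random() < 0.5 else -1
        # A segment touching 0 counts as positive if its other endpoint is positive.
        positive += previous > 0 or position > 0
    return positive / n_steps

def arcsine_cdf(x):
    """Limiting probability that the fraction of time spent on the right is <= x."""
    return 2 / math.pi * math.asin(math.sqrt(x))

rng = random.Random(3)
fractions = [fraction_positive(500, rng) for _ in range(1000)]
for x in (0.1, 0.5, 0.9):
    empirical = sum(f <= x for f in fractions) / len(fractions)
    print(x, round(empirical, 3), round(arcsine_cdf(x), 3))
```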
Playing with continued fractions and Khinchin’s constant
Take a real number x and expand it as a continued fraction. Compute the geometric mean of the first n coefficients. Aleksandr Khinchin proved that for almost all real numbers x, as n → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” […]
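Khinchin's theorem is an "almost all" statement, and a float is a poor stand-in for a typical real. With 53 bits of precision, it yields only on the order of fifteen trustworthy coefficients. The sketch below sidesteps that by using an exact rational with a random 4000-bit numerator as a stand-in for a random real in [0, 1). Its first thousand or so coefficients are reliable. That substitution, the seed, and n = 1000 are my assumptions. Convergence is slow and noisy, so expect the result to land only roughly near 2.685.

```python
import math
import random
from fractions import Fraction

def continued_fraction(x, max_terms):
    """First coefficients [a0, a1, a2, ...] of the continued fraction of a rational x."""
    terms = []
    while len(terms) < max_terms:
        a = math.floor(x)
        terms.append(a)
        frac = x - a
        if frac == 0:  # rationals have finite expansions
            break
        x = 1 / frac
    return terms

# Stand-in for a random real: an exact rational with 4000 random bits (an assumption).
rng = random.Random(2024)
x = Fraction(rng.getrandbits(4000), 2**4000)
coeffs = continued_fraction(x, 1001)[1:]  # drop a0 = 0; the theorem concerns a1, a2, ...

geometric_mean = math.exp(sum(math.log(a) for a in coeffs) / len(coeffs))
print(len(coeffs), geometric_mean)  # compare with Khinchin's constant 2.685452001...
```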
Grand unification of mathematics
Greg Egan’s short story Glory features a “xenomathematician” who discovers that an ancient civilization had produced a sort of grand unification of their various branches of mathematics. It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different […]
Natural optima occur in the middle
Akin’s eighth law of spacecraft design says: “In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.” When I first read this I immediately thought of several examples where theory said that an optimum was at an extreme, but experience said otherwise. Linear programming (LP) says […]
“MTD” is misleading
Dose-finding trials of chemotherapy agents look for the MTD: maximum tolerated dose. The idea is to give patients as much chemotherapy as they can tolerate, hoping to do maximum damage to tumors without doing too much damage to patients. But “maximum tolerated dose” implies a degree of personalization that rarely exists in clinical trials. Phase I […]
Decide what to abandon
Sometimes it’s rational to walk away from something you’ve invested a great deal in. It’s hard to imagine how investors could abandon something as large and expensive as a shopping mall. And yet it must have been a sensible decision. If anyone disagreed, they could buy the abandoned mall on the belief that they could make a […]
Code Project articles
This week’s resource post lists some articles along with source code I’ve posted on CodeProject. Probability Pitfalls in Random Number Generation includes several lessons learned the hard way. Simple Random Number Generation is a random number generator written in C# based on George Marsaglia’s MWC (multiply-with-carry) algorithm. Finding probability distribution parameters from percentiles Numerical computing Avoiding […]
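Here is a minimal Python sketch of a Marsaglia-style two-component multiply-with-carry (MWC) generator. The multipliers 36969 and 18000 are a widely circulated combination attributed to Marsaglia, and the default seeds are his commonly quoted values. Whether the CodeProject article uses exactly these constants is an assumption. Read this as an illustration of the technique rather than a port of that C# code.

```python
class MultiplyWithCarry:
    """Two 16-bit multiply-with-carry components combined into one 32-bit output.
    Constants and default seeds are illustrative assumptions."""

    def __init__(self, z=362436069, w=521288629):
        self.z = z & 0xFFFFFFFF
        self.w = w & 0xFFFFFFFF

    def next_uint32(self):
        # Each state keeps a 16-bit value in its low half and the carry in its high half.
        self.z = 36969 * (self.z & 0xFFFF) + (self.z >> 16)
        self.w = 18000 * (self.w & 0xFFFF) + (self.w >> 16)
        return ((self.z << 16) + self.w) & 0xFFFFFFFF  # mimic 32-bit unsigned wraparound

    def next_uniform(self):
        """A float strictly between 0 and 1."""
        return (self.next_uint32() + 1.0) / (2**32 + 2)

gen = MultiplyWithCarry()
print([gen.next_uint32() for _ in range(3)], gen.next_uniform())
```

Two small components are cheap to update, and combining them gives a much longer period than either component alone.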
Perl regex twitter account
I’ve started a new Twitter account @PerlRegex for Perl regular expressions. My original account, @RegexTip, is for regular expressions in general and doesn’t go into much detail regarding any particular implementation. @PerlRegex goes into the specifics of regular expressions in Perl. Why specifically Perl regular expressions? Because Perl has the most powerful support for regular […]
Another reason natural logarithms are natural
In mathematics, log means natural logarithm by default; the burden of explanation is on anyone taking logarithms to a different base. I elaborate on this a little here. Looking through Andrew Gelman and Jennifer Hill’s regression book, I noticed a justification for natural logarithms I hadn’t thought about before. We prefer natural logs (that is, logarithms […]
Miscellaneous math resources
Every Wednesday I’ve been pointing out various resources on my web site. So far they’ve all been web pages, but the following are all PDF files. Probability and statistics: How to test a random number generator Predictive probabilities for normal outcomes One-arm binary predictive probability Relating two definitions of expectation Illustrating the error in the […]
Rare letter combinations and key chords
A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe […]
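This sketch counts letter bigrams in a text and ranks all 676 lowercase pairs from rarest to most common. The sample sentence is a placeholder, and meaningful counts require a large corpus. Ignoring pairs that straddle word boundaries is one choice among several.

```python
import re
from collections import Counter
from itertools import product
from string import ascii_lowercase

def bigram_counts(text):
    """Count adjacent letter pairs inside words, ignoring case and non-letters."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

# A placeholder corpus; a real analysis would use a large body of text.
sample = "The quick brown fox jumps over the lazy dog while the cat sleeps."
counts = bigram_counts(sample)

all_bigrams = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
rarest = sorted(all_bigrams, key=lambda bg: (counts[bg], bg))  # least frequent first
print(counts.most_common(3))
print(rarest[:10])  # pairs that never appear in the sample come first
```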
Disappearing data projections
Suppose you have data in an N-dimensional space where N is large and consider the cube [-1, 1]^N. The coordinate basis vectors start in the center of the cube and poke out through the middle of the faces. The diagonals of the cube run from the center to one of the corners. If your points cluster along one […]
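A quick computation illustrates the geometry described above. The corner (1, …, 1) is at distance √N from the center, while a face center is at distance 1. The angle between a diagonal and a basis vector therefore has cosine 1/√N, which approaches 90° as N grows. The dimensions printed below are arbitrary choices.

```python
import math

def cos_diagonal_vs_axis(n):
    """Cosine of the angle between the main diagonal (1, ..., 1) of [-1, 1]^n
    and the coordinate basis vector e_1."""
    diagonal = [1.0] * n
    e1 = [1.0] + [0.0] * (n - 1)
    dot = sum(d * e for d, e in zip(diagonal, e1))
    norms = math.sqrt(sum(d * d for d in diagonal)) * math.sqrt(sum(e * e for e in e1))
    return dot / norms  # equals 1/sqrt(n)

for n in (2, 3, 10, 100, 10_000):
    c = cos_diagonal_vs_axis(n)
    print(n, round(c, 4), round(math.degrees(math.acos(c)), 2))
```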