Feed john-d-cook John D. Cook

Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-02-20 18:47
Unix-like shells on Windows
This post gives some notes on ways to create a Unix-like command line experience on Windows, without using a virtual machine like VMWare or a quasi-virtual machine like Cygwin. Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!) There […]
Data, code, and regulation
Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation. In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, […]
Subway map of the solar system
This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out. It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the […]
Fibonacci number system
Every positive integer can be written as the sum of distinct Fibonacci numbers. For example, 10 = 8 + 2, the sum of the fifth Fibonacci number and the second. This decomposition is unique if you impose the extra requirement that consecutive Fibonacci numbers are not allowed. [1] It’s easy to see that the rule against consecutive […]
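The decomposition described above can be sketched with a greedy algorithm: repeatedly subtract the largest Fibonacci number that still fits. This is a minimal illustration, not code from the post. The function name `zeckendorf` is hypothetical, and the sequence is assumed to start 1, 2, 3, 5, 8 so that 8 is the fifth Fibonacci number, matching the excerpt.

```python
def zeckendorf(n):
    """Return distinct, non-consecutive Fibonacci numbers that sum to n.

    Assumes the Fibonacci sequence is indexed as 1, 2, 3, 5, 8, ...
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Build the Fibonacci numbers that could appear in the sum.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    # Greedy step: take the largest Fibonacci number that still fits.
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts


print(zeckendorf(10))   # [8, 2]
print(zeckendorf(100))  # [89, 8, 3]
```

Taking the largest term first is what keeps consecutive Fibonacci numbers out of the result: after subtracting the largest one that fits, the remainder is always smaller than the next Fibonacci number down.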
New monthly newsletter
Thank you for reading my blog. I’m starting a new email newsletter to address two things that readers have mentioned. Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts. Others have said they’d like to know more about […]
Information hiding
One of the basic principles of software development is information hiding. People agree that it’s desirable, but may not realize they have different ideas of what it means. And when done poorly, well-meaning attempts to make software more maintainable backfire. Leo Brodie cautions … we should clarify. From what, or whom, are we hiding information? […]
Rotating PDF pages with Python
Yesterday I got a review copy of Automate the Boring Stuff with Python. It explains, among other things, how to manipulate PDFs from Python. This morning I needed to rotate some pages in a PDF, so I decided to try out the method in the book. The sample code uses PyPDF2. I’m using Conda for […]
RSS feeds for Twitter accounts
Twitter once provided RSS feeds for all Twitter accounts. They no longer provide this service. However, third parties can create RSS feeds from the content of Twitter accounts. BazQux has done this for my daily tip accounts, so you can subscribe to any of my accounts via RSS using the feeds linked to below. AlgebraFact AnalysisFact […]
Scientifically valid, practically invalid
In a recent episode of EconTalk, Phil Rosenzweig describes how the artificial conditions necessary to make experiments scientifically valid can also make the results practically invalid. Rosenzweig discusses experiments designed to study decision making. In order to make clean comparisons, subjects are presented with discrete choices over which they have no control. They cannot look for […]
The Mozart Myth
I don’t know how many times I’ve heard about how Mozart would compose entire musical scores in his head and only write them down once they were finished. Even authors who stress that creativity requires false starts and hard work have said that Mozart may have been an exception. But maybe he wasn’t. In his new book How to […]
Pedantic arithmetic rules
Generations of math teachers have drilled into their students that they must reduce fractions. That serves some purpose in the early years, but somewhere along the way students need to learn reducing fractions is not only unnecessary, but can be bad for communication. For example, if the fraction 45/365 comes up in the discussion of […]
QR Codes and Percolation
Percolation theory looks at problems such as the probability of being able to traverse some region with random obstacles. It is motivated by problems such as modeling the flow of a fluid in a porous medium. Here’s a percolation problem for QR codes: What is the probability that there is a path from one side […]
Why is an empty sum 0 and an empty product 1?
In response to my earlier post on why 0! should be 1, several people replied that 0! = 1 because an empty product is 1. You can define the factorial of an integer n as the product of all positive numbers less than or equal to n. There are no positive integers less than or equal […]
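Python's built-ins follow the same conventions, which gives a quick way to check the claim. The sketch below is an illustration rather than code from the post; the helper name `factorial` is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math

# An empty sum is 0 and an empty product is 1.
print(sum([]))        # 0
print(math.prod([]))  # 1


def factorial(n):
    """Product of all positive integers less than or equal to n."""
    return math.prod(range(1, n + 1))


# For n = 0 the range is empty, so the product is the empty product.
print(factorial(0))   # 1
print(factorial(5))   # 120
```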
Quantifying uncertainty
The primary way to quantify uncertainty is to use probability. Subject to certain axioms that aim to capture common-sense rules for quantifying uncertainty, probability theory is essentially the only way. (This is Cox’s theorem.) Other methods, such as fuzzy logic, may be useful, though they must violate common sense (at least as defined by Cox’s theorem) […]
Defining zero factorial
Things are defined the way they are for good reasons. This seems blatantly obvious now, but it was eye-opening when I learned this my first year in college. Our professor, Mike Starbird, asked us to go home and think about how convergence of a series should be defined. Not how it is defined, but how […]
Why not statistics
Jordan Ellenberg’s parents were both statisticians. In his interview with Strongly Connected Components Jordan explains why he went into mathematics rather than statistics. I tried. I tried to learn some statistics actually when I was younger and it’s a beautiful subject. But at the time I think I found the shakiness of the philosophical underpinnings […]
Another reason we don’t apply the 80-20 rule
I’ve written about the 80-20 rule several times because it keeps coming up. I’d like to believe that each time I revisit it I understand it a little better. In its simplest form the 80-20 rule says 80% of your outputs come from 20% of your inputs. You might find that 80% of your revenue comes from 20% of […]
Endorsements
I’ve added a page for endorsements to my site. Thanks to everyone who let me use their photo and quote. If you’d like to contribute an endorsement, please contact me.
Magicians vs Repairmen
From The World Beyond Your Head: The appeal of magic is that it promises to render objects plastic to the will without one’s getting too entangled with them. Treated at arm’s length, the object can issue no challenge to the self. … The clearest contrast … that I can think of is the repairman, who […]
Looking ten years ahead
From Freeman Dyson: Economic forecasting is useful for predicting the future up to about ten years ahead. Beyond ten years the quantitative changes which the forecast assesses are usually sidetracked or made irrelevant by qualitative changes in the rules of the game. Qualitative changes are produced by human cleverness … or by human stupidity … Neither […]
Key fobs and interstellar space
From JPL scientist Rich Terrile: In everyone’s pocket right now is a computer far more powerful than the one we flew on Voyager, and I don’t mean your cell phone—I mean the key fob that unlocks your car. These days technology is equated with computer technology. For example, the other day I heard someone talk […]
Integration by Darts
Monte Carlo integration has been called “Integration by Darts,” a clever pun on “integration by parts.” I ran across the phrase looking at some slides by Brian Hayes, but apparently it’s been around a while. The explanation that Monte Carlo is “integration by darts” is fine as a 0th order explanation, but it can be […]
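As a 0th order illustration of the darts picture, the sketch below estimates π by throwing random points at the unit square and counting how many land inside the quarter circle. It is an assumed example, not taken from the post; the function name `estimate_pi` and the fixed seed are hypothetical choices.

```python
import random


def estimate_pi(num_darts, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle has area pi/4 and the unit square has area 1.
    return 4.0 * hits / num_darts


print(estimate_pi(100_000))
```

The error shrinks like one over the square root of the number of darts, so each extra digit of accuracy costs roughly a hundred times more samples.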
Bayes factors vs p-values
Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions. The following illustration of this difference comes from a talk by Luis Pericchi last week. He attributes the example to “Bernardo (2010)” though I have not been able to find the exact […]
Pros and cons of the term “data science”
I’ve resisted using the term “data science,” and enjoy poking fun at it now and then, but I’ve decided it’s not such a bad label after all. Here are some of the pros and cons of the term. (Listing “cons” first seems backward, but I’m currently leaning toward the pro side, so I thought I […]
Replace data with measurements
To tell whether a statement about data is over-hyped, see whether it retains its meaning if you replace data with measurements. So a request like “Please send me the data from your experiment” becomes “Please send me the measurements from your experiment.” Same thing. But rousing statements about the power of data become banal or even […]
Clinical trials and machine learning
Arguments over the difference between statistics and machine learning are often pointless. There is a huge overlap between the two approaches to analyzing data, sometimes obscured by differences in vocabulary. However, there is one distinction that is helpful. Statistics aims to build accurate models of phenomena, implicitly leaving the exploitation of these models to others. Machine learning aims to solve […]
Fitting a triangular distribution
Sometimes you only need a rough fit to some data and a triangular distribution will do. As the name implies, this is a distribution whose density function graph is a triangle. The triangle is determined by its base, running between points a and b, and a point c somewhere in between where the altitude intersects the base. […]
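The triangle described above pins down the density completely: the area must be 1, so the peak at c has height 2/(b - a). The sketch below evaluates that density. It is only an illustration of the distribution's shape, not the fitting method from the post, and the function name `triangular_pdf` is hypothetical.

```python
def triangular_pdf(x, a, b, c):
    """Density of the triangular distribution with base [a, b] and mode c."""
    if not (a <= c <= b) or a == b:
        raise ValueError("need a <= c <= b and a < b")
    if x < a or x > b:
        return 0.0
    peak = 2.0 / (b - a)  # height that makes the triangle's area equal 1
    if x < c:
        return peak * (x - a) / (c - a)  # rising edge
    if x > c:
        return peak * (b - x) / (b - c)  # falling edge
    return peak


print(triangular_pdf(1.0, a=0.0, b=4.0, c=1.0))  # 0.5
```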
A subtle way to over-fit
If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So in machine learning and statistics, people split their data into two parts. They train the model on one half, and see how well it […]
Mathematical arbitrage
I suspect there’s a huge opportunity in moving mathematics from the pure column to the applied column. There may be a lot of useful math that never sees application because the experts are unconcerned with or unaware of applications. In particular I wonder what applications there may be of number theory, especially analytic number theory. […]
Mathematical modeling in Milton
In Book VIII of Paradise Lost, the angel Raphael tells Adam what difficulties men will have with astronomy: Hereafter, when they come to model heaven And calculate the stars: how they will wield The mighty frame, how build, unbuild, contrive To save appearances, how gird the sphere With centric and eccentric scribbled o’er, Cycle […]
Partitioning natural numbers with pi
Every positive integer is either part of the sequence ⌊ nπ ⌋ or the sequence ⌊ nπ/(π – 1) ⌋ where n ranges over positive integers, and no positive integer is in both sequences. This is a special case of Beatty’s theorem.
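The partition can be checked numerically for small values. The sketch below is a hedged illustration using floating point arithmetic, which is accurate enough for the modest range tested here. The helper name `beatty` and the cutoff of 1000 are assumptions made for the example.

```python
import math


def beatty(r, limit):
    """Terms floor(n*r) for n = 1, 2, 3, ... that do not exceed limit."""
    terms = []
    n = 1
    while math.floor(n * r) <= limit:
        terms.append(math.floor(n * r))
        n += 1
    return terms


limit = 1000
a = beatty(math.pi, limit)
b = beatty(math.pi / (math.pi - 1), limit)

print(a[:4])  # [3, 6, 9, 12]
print(b[:4])  # [1, 2, 4, 5]

# Every integer from 1 to limit appears in exactly one sequence.
print(sorted(a + b) == list(range(1, limit + 1)))  # True
```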
Extremely small probabilities
One objection to modeling adult heights with a normal distribution is that the former is obviously positive but the latter can be negative. However, by this model negative heights are astronomically unlikely. I’ll explain below how one can take “astronomically” literally in this context. A common model says that men’s and women’s heights are normally […]
Atavachron
In the Star Trek episode “All Our Yesterdays” the people of the planet Sarpeidon have escaped into their past because their sun is about to become a supernova. They did this via a time machine called the Atavachron. One detail of the episode has stuck with me since I first saw it many years ago: although people can go back […]
Why isn’t everything normally distributed?
Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem. If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about […]
Machine learning and magic
When I first heard about a lie detector as a child, I was puzzled. How could a machine detect lies? If it could, why couldn’t you use it to predict the future? For example, you could say “IBM stock will go up tomorrow” and let the machine tell you whether you’re lying. Of course lie […]
Quaternions in Paradise Lost
Last night I checked a few books out from a library. One was Milton’s Paradise Lost and another was Kuipers’ Quaternions and Rotation Sequences. I didn’t expect any connection between these two books, but there is one. The following lines from Book V of Paradise Lost, starting at line 180, are quoted in Kuipers’ book: Air […]
Technical notes
For the last fifteen Wednesdays I’ve been posting links to technical notes. This is the end of the series. You can find most of the links from previous Wednesday posts on one page by going to technical notes from the navigation menu at the top of the site.
Oil on a parking lot
Oil on a wet parking lot
Graphemes
Here’s something amusing I ran across in the glossary of Programming Perl: grapheme A graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. Grapheme, or more fully, a grapheme cluster string is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” […]
Too easy
When people sneer at a technology for being too easy to use, it’s worth trying out. If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way. The problem with the “OK […]
Clinical trial software
This week’s resource post lists some of the projects I managed or contributed to while working at MD Anderson Cancer Center. CRMSimulator is used to design CRM trials, dose-finding based only on toxicity outcomes. BMA-CRMSimulator is a variation on CRMSimulator using Bayesian model averaging. EffTox is used for dose-finding based on toxicity and efficacy outcomes. TTEConduct […]
Finding the best dose
In a dose-finding clinical trial, you have a small number of doses to test, and you hope to find the one with the best response. Here “best” may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc. Since your goal is to find the best dose, it seems natural to compare dose-finding […]
Career advice from Einstein
“If I would be a young man again and had to decide how to make my living, I would not try to become a scientist or scholar or teacher. I would rather choose to be a plumber or a peddler, in the hope to find that modest degree of independence still available under present circumstances.” […]
The opposite of an idiot
The origin of the word idiot is “one’s own,” the same root as idiom. So originally an idiot was someone in his own world, someone who takes no outside input. The historical meaning carries over to some degree: When you see a smart person do something idiotic, it’s usually because he’s acting alone. The opposite of […]
Successful companies with incompetent employees
It’s not hard to imagine how a company filled with great people can thrive. More intriguing are the companies that inspire Dilbert cartoons and yet manage to succeed. When a company thrives despite bad service and incompetent employees, they’re doing something right that isn’t obvious. Not everyone can be incompetent. Someone somewhere in the company must be very competent […]
Stand-alone code for numerical computing
For this week’s resource post, see the page Stand-alone code for numerical computing. It points to small, self-contained bits of code for special functions (log gamma, erf, etc.) and for random number generation (normal, Poisson, gamma, etc.). The code is available in Python, C++, and C# versions. It could easily be translated into other languages […]
Random walks and the arcsine law
Suppose you stand at 0 and flip a fair coin. If the coin comes up heads, you take a step to the right. Otherwise you take a step to the left. How much of the time will you spend to the right of where you started? As the number of steps N goes to infinity, […]
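The walk described above is easy to simulate. The sketch below is an assumed illustration, not code from the post: it counts the fraction of steps after which the walker is strictly to the right of 0. The function name `fraction_right_of_start` and the choice to count only strictly positive positions are hypothetical.

```python
import random


def fraction_right_of_start(num_steps, seed=None):
    """Fraction of steps after which a fair random walk is to the right of 0."""
    rng = random.Random(seed)
    position = 0
    steps_on_right = 0
    for _ in range(num_steps):
        position += 1 if rng.random() < 0.5 else -1  # heads: right, tails: left
        if position > 0:
            steps_on_right += 1
    return steps_on_right / num_steps


# Each run gives one sample of the fraction; repeat with different seeds
# to see how widely the fraction varies from walk to walk.
samples = [fraction_right_of_start(10_000, seed=s) for s in range(5)]
print(samples)
```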
Playing with continued fractions and Khinchin’s constant
Take a real number x and expand it as a continued fraction. Compute the geometric mean of the first n coefficients. Aleksandr Khinchin proved that for almost all real numbers x, as n → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” […]
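The experiment described above can be tried in a few lines. This is a rough sketch under stated assumptions: it works in floating point, so only the first dozen or so coefficients of a number like π can be trusted, and it leaves the integer part out of the geometric mean. The function names are hypothetical.

```python
import math


def continued_fraction(x, n):
    """First n continued fraction coefficients of x, limited by float precision."""
    coeffs = []
    for _ in range(n):
        a = math.floor(x)
        coeffs.append(a)
        frac = x - a
        if frac == 0:
            break  # x was a (floating point) integer, so the expansion ends
        x = 1.0 / frac
    return coeffs


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


coeffs = continued_fraction(math.pi, 5)
print(coeffs)  # [3, 7, 15, 1, 292]

# Skip the integer part and average the remaining coefficients.
print(geometric_mean(coeffs[1:]))
```

Seeing the convergence to Khinchin's constant takes far more coefficients than double precision can supply, so a serious experiment would need extended precision arithmetic.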
Grand unification of mathematics
Greg Egan’s short story Glory features a “xenomathematician” who discovers that an ancient civilization had produced a sort of grand unification of their various branches of mathematics. It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different […]
Natural optima occur in the middle
Akin’s eighth law of spacecraft design says In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point. When I first read this I immediately thought of several examples where theory said that an optimum was at an extreme, but experience said otherwise. Linear programming (LP) says […]