Feed: John D. Cook


Link https://www.johndcook.com/blog
Feed http://feeds.feedburner.com/TheEndeavour?format=xml
Updated 2025-04-26 03:16
Monthly highlights
If you enjoy reading the articles here, you might like a monthly review of the most popular posts. I send out a newsletter at the end of each month. I’ve sent out around 20 so far. They all have two parts: a review of the most popular posts of the month, and a few words […]
Cover time of a graph: cliques, chains, and lollipops
Cover time The cover time of a graph is the expected time it takes a simple random walk on the graph to visit every node. A simple random walk starts at some node, then at each step chooses with equal probability one of the adjacent nodes. The cover time is defined to be the maximum […]
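The definition in this excerpt is easy to simulate. Here is a minimal Python sketch (the function name and graph choices are mine, not from the post) estimating cover times for a chain and a clique on five nodes:

```python
import random

def cover_time(adj, start=0, trials=2000, seed=42):
    """Estimate the expected cover time of a graph by Monte Carlo.

    adj is an adjacency list: adj[i] is the list of neighbors of node i.
    Each walk starts at `start` and at every step moves to a uniformly
    chosen neighbor until all nodes have been visited.
    """
    rng = random.Random(seed)
    n = len(adj)
    total = 0
    for _ in range(trials):
        node, visited, steps = start, {start}, 0
        while len(visited) < n:
            node = rng.choice(adj[node])
            visited.add(node)
            steps += 1
        total += steps
    return total / trials

# A chain (path graph) on 5 nodes: 0-1-2-3-4
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
# A clique on 5 nodes: every node adjacent to every other node
clique = [[j for j in range(5) if j != i] for i in range(5)]
```

Starting from an end of the chain, the walk takes noticeably longer to cover the graph than it does on the clique, which is the contrast the post's title hints at.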
Changing names
I’ve just started reading Laurus, an English translation of a contemporary Russian novel. The book opens with this paragraph. He had four names at various times. A person’s life is heterogeneous, so this could be seen as an advantage. Life’s parts sometimes have little in common, so little that it might appear that various people […]
Bernoulli numbers, Riemann zeta, and strange sums
In the previous post, we looked at sums of the first n consecutive powers, i.e. sums of the form 1ᵖ + 2ᵖ + … + nᵖ where p was a positive integer. Here we look at what happens when we let p be a negative integer and we let n go to infinity. We’ll learn more about Bernoulli numbers and we’ll see what […]
Sums of consecutive powers
There’s a well-known formula for the sum of the first n positive integers: 1 + 2 + 3 + … + n = n(n + 1) / 2. There’s also a formula for the sum of the first n squares: 1² + 2² + 3² + … + n² = n(n + 1)(2n + 1) / 6. […]
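Both closed forms are easy to sanity-check in Python; this brute-force comparison is mine, not from the post:

```python
def sum_powers(n, p):
    """Brute-force 1^p + 2^p + ... + n^p."""
    return sum(k**p for k in range(1, n + 1))

# Check the two closed forms quoted above for the first 50 values of n
for n in range(1, 51):
    assert sum_powers(n, 1) == n * (n + 1) // 2
    assert sum_powers(n, 2) == n * (n + 1) * (2 * n + 1) // 6
```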
Rapidly mixing random walks on graphs
Random walks mix faster on some graphs than on others. Rapid mixing is one of the ways to characterize expander graphs. By rapid mixing we mean that a random walk approaches its limiting (stationary) probability distribution quickly relative to random walks on other graphs. Here’s a graph that supports rapid mixing. Pick a prime p and label nodes 0, 1, 2, 3, […]
Top three posts of 2016
These were the most popular posts this year: Random number generator seed mistakes Unusual proof that there are infinitely many primes Literate programming: presenting code in human order
Branch cuts and Common Lisp
“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were […]
Subjectivity in statistics
Andrew Gelman on subjectivity in statistics: Bayesian methods are often characterized as “subjective” because the user must choose a prior distribution, that is, a mathematical expression of prior information. The prior distribution requires information and user input, that’s for sure, but I don’t see this as being any more “subjective” than other aspects of a […]
Most mathematical problem statement
Every so often college students will ask me for advice regarding going into applied math. I’ll tell them the first step in an application, and often the hardest step, is formulating a problem, not solving it. People don’t approach you with mathematical problems per se but problems that can be turned into mathematical problems. Nobody is going […]
An integral with a couple lessons
If you present calculus students with a definite integral, their Pavlovian response is “Take the anti-derivative, evaluate it at the limits, and subtract.” They think that’s what it means. But it’s not what a definite integral means. It’s how you (usually) calculate its value. This is not a pedantic fine point but a practically important distinction. It pays […]
How a couple failed auditions worked out well
When I was in high school, one year I made the Region choir. I had no intention of competing at the next level, Area, because I didn’t think I stood a chance of going all the way to State, and because the music was really hard: Stravinsky’s Symphony of Psalms. My choir director persuaded me […]
New iPhone app MathFeed for math news
Francis Su has created an iPhone app MathFeed that gives a stream of new math content: blog posts, book reviews, popular journal articles, and tweets. You can also get the same content via Twitter. Check it out!
Setting up Emacs shell on a Mac
Here are a few things I’ve had to figure out in the process of setting up Emacs on a Mac, in particular with getting shell-mode to work as I’d like. Maybe this will save someone else some time if they want to do the same. I’ve used a Mac occasionally since the days of the […]
Some frequently asked questions
I don’t have an FAQ page per se, but I’ve written a few blog posts where I answer some questions, and here I’ll answer a few more. Should I get a PhD? See my answer here and take a look at some of the other answers on the same site. Do you have any advice for people […]
Longhorn tribute to fallen Aggies
For many years, rivals University of Texas and Texas A&M University played each other in football on Thanksgiving. In 1999, the game fell one week after the collapse of the Aggie Bonfire killed 12 A&M students and injured 27. The University of Texas band’s half time show that year was a beautiful tribute to the fallen A&M students.
A different kind of network book
Yesterday I got a review copy of The Power of Networks. There’s some math inside, but not much, and what’s there is elementary. I’d say it’s not a book about networks per se but a collection of topics associated with networks: cell phone protocols, search engines, auctions, recommendation engines, etc. It would be a good […]
Hard work
The pinned tweet on my Twitter account at the moment says “Productivity tip: work hard.” It’s gotten a lot of positive feedback, so I assume it has resonated with a few people. Productivity tip: Work hard. — John D. Cook (@JohnDCook) October 8, 2015 I don’t know how people take it, but here’s what I […]
Ultra-reliable software
From a NASA page advocating formal methods: We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10^-7/hour). Emphasis added. Developing medium-reliability and high-reliability software are almost entirely different professions. Using typical […]
Technological allegiances
I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from […]
Truncated exponential series inequality
Define Tₙ(x) to be the Taylor series for exp(x) truncated after n terms: Tₙ(x) = 1 + x + x²/2! + … + xⁿ⁻¹/(n − 1)!. How does this function compare to its limit, exp(x)? We might want to know because it’s often useful to have polynomial upper or lower bounds on exp(x). For x > 0 it’s clear that exp(x) is larger than Tₙ(x) since the discarded terms […]
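Taking “truncated after n terms” to mean the terms up through the power x^(n−1) — my reading, since the excerpt’s formula is cut off — the inequality for x > 0 can be checked numerically:

```python
import math

def T(n, x):
    """Taylor series of exp(x) truncated after n terms: sum of x^k / k! for k < n."""
    return sum(x**k / math.factorial(k) for k in range(n))

# For x > 0 every discarded term is positive, so exp(x) > T_n(x)
for x in (0.1, 1.0, 3.0):
    for n in (1, 2, 5, 10):
        assert math.exp(x) > T(n, x)
```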
Speed and correctness
Comment from Paul Phillips on making things easy to understand: It’s always been “We can’t do it that way. It would be too slow.” You know what’s slow? Spending all day trying to figure out why it doesn’t work. That’s slow. That’s the slowest thing I know.
Random squares
In geometry, you’d say that if a square has side x, then it has area x². In calculus, you’d say more. First you’d say that if a square has side near x, then it has area near x². That is, area is a continuous function of the length of a side. As the length of the side […]
Normal hazard continued fraction
The hazard function of a probability distribution is the instantaneous probability density of an event given that it hasn’t happened yet. This works out to be the ratio of the PDF (probability density function) to the CCDF (complementary cumulative distribution function). For the standard normal distribution, the hazard function is φ(x) / (1 − Φ(x)), and it has a surprisingly simple […]
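The ratio of PDF to CCDF is straightforward to compute with the standard library; here is a sketch (the continued fraction the post’s title refers to isn’t reproduced here):

```python
import math

def normal_hazard(x):
    """Hazard of the standard normal: pdf(x) / (1 - cdf(x)).

    The tail probability 1 - Phi(x) equals erfc(x / sqrt(2)) / 2,
    which avoids catastrophic cancellation for large x.
    """
    pdf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    ccdf = math.erfc(x / math.sqrt(2)) / 2
    return pdf / ccdf

# At 0 the hazard is 2 / sqrt(2 pi); for large x it behaves like x + 1/x
```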
A short, unusual proof that there are infinitely many primes
Sam Northshield [1] came up with the following clever proof that there are infinitely many primes. Suppose there are only finitely many primes and let P be their product. Then 0 < ∏ sin(π/p) = ∏ sin(π(1 + 2P)/p) = 0, where the products run over all primes p. The original publication gives the calculation above with no explanation. Here’s a little commentary to explain the calculation. Since prime numbers are greater than 1, sin(π/p) is […]
A curious property of catenaries
Suppose you have a flat line f(x) = k and an interval [a, b]. Then the area under the line and over the interval is k times the length of the segment of the line. Surprisingly, the same is true for a catenary with scale k. With the flat line, the length of the segment of the graph is […]
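The claim is easy to verify numerically: for y = k cosh(x/k), the arc length element sqrt(1 + y′²) dx simplifies to cosh(x/k) dx, so the area element y dx is exactly k times the length element. A quick midpoint-rule check (my own, not from the post):

```python
import math

def catenary_area_and_length(k, a, b, steps=50000):
    """Midpoint-rule integrals of the area under y = k*cosh(x/k) over [a, b]
    and the arc length of that curve over the same interval."""
    h = (b - a) / steps
    area = length = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        area += k * math.cosh(x / k) * h
        # For a catenary, sqrt(1 + y'(x)^2) simplifies to cosh(x/k)
        length += math.cosh(x / k) * h
    return area, length

area, length = catenary_area_and_length(k=2.0, a=-1.0, b=3.0)
# area agrees with k * length up to floating point rounding
```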
Natural growth
Interesting passage from Small is Beautiful: Economics as if People Mattered by E. F. Schumacher: Nature always, so to speak, knows where and when to stop. There is a measure in all natural things—in their size, speed, or violence. As a result, the system of nature, of which man is a part, tends to be […]
When does a function equal its Taylor series?
Taylor’s theorem says f(x) = f(a) + f′(a)(x − a) + f″(a)(x − a)²/2! + f‴(a)(x − a)³/3! + … When does the thing on the left equal the thing on the right? A few things could go wrong: Maybe not all the terms on the right side exist, i.e. the function f might not be infinitely differentiable. Maybe f is infinitely differentiable but the series diverges. Maybe f is infinitely differentiable but the […]
Valuing results and information
Chris Wiggins gave an excellent talk at Rice this afternoon on data science at the New York Times. In the Q&A afterward, someone asked how you would set up a machine learning algorithm where you’re trying to optimize for outcomes and for information. Here’s how I’ve approached this dilemma in the past. Information and outcomes are not […]
Computing discrete logarithms with baby-step giant-step algorithm
At first “discrete logarithm” sounds like a contradiction in terms. Logarithms aren’t discrete, not as we usually think of them. But you can define and compute logarithms in modular arithmetic. What is a logarithm? It’s the solution to an exponential equation. For example, the logarithm base 10 of 2 is the solution to the equation […]
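Baby-step giant-step itself fits in a few lines of Python. This sketch assumes the modulus is prime and a solution exists, and uses Python 3.8+’s three-argument pow with a negative exponent for modular inverses; the example numbers are mine:

```python
import math

def bsgs(g, h, p):
    """Solve g^x = h (mod p) by baby-step giant-step.
    Assumes p is prime and a solution exists; returns the smallest such x."""
    m = math.isqrt(p) + 1
    # Baby steps: record g^j for j = 0..m-1, keeping the smallest j per value
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)
        e = e * g % p
    # Giant steps: multiply h by (g^-m)^i until we hit a baby-step value
    factor = pow(g, -m, p)   # modular inverse power, Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None  # no solution

x = bsgs(2, 10, 101)  # discrete log base 2 of 10 mod the prime 101
```

Here bsgs(2, 10, 101) returns 25, and indeed 2^25 = 10 (mod 101).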
Interim analysis, futility monitoring, and predictive probability
An interim analysis of a clinical trial is an unusual analysis. At the end of the trial you want to estimate how well some treatment X works. For example, you want to know how likely it is that treatment X works better than the control treatment Y. But in the middle of the trial you want to know something more subtle. It’s […]
Periods of fractions
Suppose you have a fraction a/b where 0 < a < b, and a and b are relatively prime integers. The decimal expansion of a/b either terminates or it has an initial non-repeating part followed by a repeating part. How long is the non-repeating part? How long is the period of the repeating part? The answer depends on the prime factorization […]
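The answer alluded to above can be turned into code: the factors of 2 and 5 in the denominator control the non-repeating part, and the period is the multiplicative order of 10 modulo what remains. A sketch (names mine, not from the post):

```python
def expansion_shape(a, b):
    """For a/b in lowest terms (0 < a < b), return (non-repeating length,
    period of the repeating part); a period of 0 means it terminates.
    Neither quantity depends on the numerator a."""
    def strip(n, p):
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        return n, e
    m, e2 = strip(b, 2)
    m, e5 = strip(m, 5)
    if m == 1:
        return max(e2, e5), 0   # denominator is 2^i 5^j: terminating decimal
    # Period = multiplicative order of 10 modulo the stripped denominator
    k, r = 1, 10 % m
    while r != 1:
        r = r * 10 % m
        k += 1
    return max(e2, e5), k
```

For example, 1/7 = 0.142857… has no non-repeating part and period 6, while 1/12 = 0.0833… has a two-digit non-repeating part and period 1.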
Speeding up R code
People often come to me with R code that’s running slower than they’d like. It’s not unusual to make the code 10 or even 100 times faster by rewriting it in C++. Not all that speed improvement comes from changing languages. Some of it comes from better algorithms, eliminating redundancy, etc. Why bother optimizing? If […]
The big deal about neural networks
In their book Computer Age Statistical Inference, Brad Efron and Trevor Hastie give a nice description of neural networks and deep learning. The knee-jerk response [to neural networks] from statisticians was “What’s the big deal? A neural network is just a nonlinear model, not too different from many other generalizations of linear models.” While this […]
Gentle introduction to R
The R language is closely tied to statistics. Its ancestor was named S, because it was a language for Statistics. The open source descendant could have been named ‘T’, but its creators chose to call it ‘R.’ Most people learn R as they learn statistics: Here’s a statistical concept, and here’s how you can compute it in R. […]
Turning math inside-out
Here’s one of the things about category theory that takes a while to get used to. Mathematical objects are usually defined internally. For example, the Cartesian product P of two sets A and B is defined to be the set of all ordered pairs (a, b) where a comes from A and b comes from B. The definition of P depends on the elements […]
Optimal team size
Kevlin Henney’s keynote at GOTO Copenhagen this year discussed how project time varies as a function of the number of people on the project. The most naive assumption is that the time is inversely proportional to the number of people. That is t = W/n where t is the calendar time to completion, W is a measure […]
Efficiency of C# on Linux
This week I attended Mads Torgersen’s talk Why you should take another look at C#. Afterward I asked him about the efficiency of C# on Linux. When I last looked into it, it wasn’t good. A few years ago I asked someone on my team to try running some C# software on Linux using Mono. The code worked […]
GOTO Copenhagen
I gave a talk this morning at GOTO Copenhagen 2016 on ways to mix R with other programming languages: Rcpp, HaskellR, R Markdown, etc. It’s been fun to see some people I haven’t seen since I spoke at the GOTO and YOW conferences four years ago. Photo above by conference photographer Fritz Schumann.
Mathematical modeling for medical devices
We’re about to see a lot of new, powerful, inexpensive medical devices come out. And to my surprise, I’ve contributed to a few of them. Growing compute power and shrinking sensors open up possibilities we’re only beginning to explore. Even when the things we want to observe elude direct measurement, we may be able to infer them from […]
Publishable
For an article to be published, it has to be published somewhere. Each journal has a responsibility to select articles relevant to its readership. Articles that make new connections might be unpublishable because they don’t fit into a category. For example, I’ve seen papers rejected by theoretical journals for being too applied, and the same papers […]
One of my favorite proofs: Lagrange multipliers
One of my lightbulb moments in college was when my professor, Jim Vick, explained the Lagrange multiplier theorem. The way I’d seen it stated in a calculus text gave me no feel for why it should be true, but his explanation made sense immediately. Suppose f(x) is a function of several variables, i.e. x is a vector, and g(x) = c […]
Uncertainty in a probability
Suppose you did a pilot study with 10 subjects and found a treatment was effective in 7 out of the 10 subjects. With no more information than this, what would you estimate the probability to be that the treatment is effective in the next subject? Easy: 0.7. Now what would you estimate the probability to be […]
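The excerpt breaks off before answering the second question, so here is a hedged sketch of one standard way to carry uncertainty about a probability: with a uniform Beta(1, 1) prior (my assumption, not necessarily the post’s), 7 successes in 10 trials give a Beta(8, 4) posterior.

```python
import math

# Beta(8, 4) posterior: uniform Beta(1, 1) prior plus 7 successes, 3 failures.
# The prior choice is an illustrative assumption, not taken from the post.
a, b = 1 + 7, 1 + 3
posterior_mean = a / (a + b)                                    # 2/3, close to 0.7
posterior_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # about 0.13
```

The point estimate barely moves, but the posterior standard deviation of roughly 0.13 makes the uncertainty in that 0.7 explicit.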
New Twitter account: FormalFact
I’m starting a new Twitter account for logic and formal methods: @FormalFact. Expect to see tweets about constructive logic, type theory, formal proofs, proof assistants, etc. The image for the account is a bowtie, a pun on formality. It’s also the symbol for natural join in relational algebra.
Münchausen numbers
The number 3435 has the following curious property: 3435 = 3³ + 4⁴ + 3³ + 5⁵. It is called a Münchausen number, an allusion to the fictional Baron Münchausen. When each digit is raised to its own power and summed, you get the original number back. The only other Münchausen number is 1. At least in […]
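The property is a one-liner to search for. This sketch (function name mine) uses the common convention that a zero digit contributes nothing:

```python
def is_munchausen(n):
    """True if n equals the sum of its digits raised to their own powers.
    Zero digits are skipped, i.e. 0^0 is taken to contribute 0 here."""
    return n == sum(d**d for d in map(int, str(n)) if d != 0)

hits = [n for n in range(1, 10000) if is_munchausen(n)]
```

Searching below 10000 turns up exactly 1 and 3435.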
Beta reduction: The difference typing makes
Beta reduction is essentially function application. If you have a function described by what it does to x and apply it to an argument t, you rewrite the xs as ts. The formal definition of β-reduction is more complicated than this in order to account for free versus bound variables, but this informal description is sufficient […]
Less likely to get half, more likely to get near half
I was catching up on Engines of our Ingenuity episodes this evening when the following line jumped out at me: If I flip a coin a million times, I’m virtually certain to get 50 percent heads and 50 percent tails. Depending on how you understand that line, it’s either imprecise or false. The more times you […]
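The distinction can be made concrete: the chance of exactly half heads in 2n fair flips is C(2n, n)/2^(2n), which by Stirling’s formula is about 1/√(πn) — small for large n, even though the fraction of heads concentrates near 1/2. A sketch (names mine):

```python
import math

def prob_exactly_half(two_n):
    """P(exactly two_n/2 heads in two_n fair coin flips)."""
    n = two_n // 2
    return math.comb(two_n, n) / 2**two_n

def stirling_estimate(two_n):
    """Asymptotic approximation 1 / sqrt(pi * n) from Stirling's formula."""
    return 1 / math.sqrt(math.pi * (two_n // 2))
```

For a million flips the Stirling estimate gives about 0.0008, so “exactly 50 percent heads” is in fact quite unlikely; what is virtually certain is being close to 50 percent.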
Insufficient statistics
Experience with the normal distribution makes people think all distributions have (useful) sufficient statistics [1]. If you have data from a normal distribution, then the sufficient statistics are the sample mean and sample variance. These statistics are “sufficient” in that the entire data set isn’t any more informative than those two statistics. They effectively condense […]
Reversing WYSIWYG
The other day I found myself saying that I preferred org-mode files to Jupyter notebooks because with org-mode, what you see is what you get. Then I realized I was using “what you see is what you get” (WYSIWYG) in exactly the opposite of the usual sense. Jupyter notebooks are WYSIWYG in the same sense […]
Floating point: between blissful ignorance and unnecessary fear
Most programmers are at one extreme or another when it comes to floating point arithmetic. Some are blissfully ignorant that anything can go wrong, while others believe that danger lurks around every corner when using floating point. The limitations of floating point arithmetic are something to be aware of, and ignoring these limitations can cause problems, like crashing […]