Twitter follower distribution
A conversation this morning prompted the question of how many Twitter accounts have between 10,000 and 20,000 followers. I hadn't thought about the distribution of follower counts in a while and was curious to revisit the topic.
Apparently this question was more popular five years ago. When I did a few searches on the topic, all the results I got back were five or more years old. Also, data about the Twitter network was more available then than it is now.
This post will come up with an estimate based on almost no data, approaching this like a Fermi problem.
Model
I'm going to assume that the number of followers for the $n$th most followed account is
$f = c n^{-\alpha}$
for some constants $c$ and $\alpha$. A lot of networks approximately follow some distribution like this, and often $\alpha$ is somewhere between 1 and 3.
Two data points
I've got two unknowns, so I need two data points. Wikipedia has a list of the 50 most followed Twitter accounts, and the 50th account has 37.7 million followers. (I chose the 50th account on the list rather than one nearer the top because power laws typically fit better after the first few data points.)
I believe that a few years ago the median number of Twitter followers was 0: more than half of Twitter accounts had no followers. Let's assume the median number of followers is 1. But median out of what? I think I read there are about 350 million total Twitter accounts, and about 200 million active accounts. So should we base our median on 350 million accounts or 200 million accounts? We'll split the difference at 275 million accounts and assume the median account is the 137,500,000th most popular account.
Solve for parameters
So now we have two equations:
$c \cdot 50^{-\alpha} = 37{,}700{,}000$
$c \cdot 137{,}500{,}000^{-\alpha} = 1$
and taking logs gives us two linear equations in two unknowns. We solve this to find $\alpha = 1.1$ and $c = 2.9 \times 10^9$. The estimate of $\alpha$ is about the size we expected, so that's a good sign.
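Here is a minimal Python sketch of the log-linear solve described above. The function name `fit_power_law` and the variable names are illustrative choices, not part of the original calculation. Solving the two equations exactly as written gives roughly α ≈ 1.18 and c ≈ 3.8 × 10^9, a little different from the rounded figures quoted above, so either set of values should be read as approximate.

```python
import math


def fit_power_law(n1, f1, n2, f2):
    """Fit f = c * n**(-alpha) through two (rank, followers) points.

    Taking logs gives log f = log c - alpha * log n, which is linear
    in the unknowns (log c, alpha). Subtracting the two equations
    eliminates log c and leaves alpha.
    """
    alpha = (math.log(f1) - math.log(f2)) / (math.log(n2) - math.log(n1))
    # Back-substitute into the first equation to recover c.
    c = f1 * n1 ** alpha
    return alpha, c


# The two data points as stated in the post: (rank, followers).
alpha, c = fit_power_law(50, 37_700_000, 137_500_000, 1)
print(f"alpha = {alpha:.2f}, c = {c:.2g}")

# Same fit with a different guess for the median rank (an assumption,
# used here only to show how sensitive the fit is to that guess).
alpha_alt, c_alt = fit_power_law(50, 37_700_000, 350_000_000, 1)
print(f"alpha = {alpha_alt:.2f}, c = {c_alt:.2g}")
```

Changing the second rank to 350,000,000 gives α ≈ 1.1 and c ≈ 2.9 × 10^9, so the fitted parameters depend noticeably on the guessed median rank.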
Take this with a grain of salt. We've guessed a very simple model and fit it with just two data points, one of which we based on an educated guess.
Final estimate
We assumed $f = c n^{-\alpha}$ and we can solve this for $n$. If an account has $f$ followers, we'd estimate its rank $n$ as
$n = (c / f)^{1/\alpha}$.
So we'd estimate the number of accounts with between 10,000 and 20,000 followers to be
$(c / 10000)^{1/\alpha} - (c / 20000)^{1/\alpha}$,
which is about 40,000. I expect this final estimate is not bad, say within an order of magnitude, despite all the crude assumptions made along the way.
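The final step can be sketched in a few lines of Python. This is only an illustration: the function names `rank_for_followers` and `accounts_between` are made up for this example, and the parameters are the rounded values quoted in the post (α = 1.1, c = 2.9 × 10^9), not a fresh fit.

```python
def rank_for_followers(f, c, alpha):
    """Invert f = c * n**(-alpha) to get the estimated rank n of an
    account with f followers."""
    return (c / f) ** (1 / alpha)


def accounts_between(lo, hi, c, alpha):
    """Estimated number of accounts with between lo and hi followers.

    Fewer followers means a larger rank number, so the count is the
    rank at lo minus the rank at hi.
    """
    return rank_for_followers(lo, c, alpha) - rank_for_followers(hi, c, alpha)


# Rounded parameters quoted in the post (assumed here, not re-derived).
ALPHA = 1.1
C = 2.9e9

estimate = accounts_between(10_000, 20_000, C, ALPHA)
print(f"Estimated accounts with 10,000 to 20,000 followers: {estimate:,.0f}")
```

With these rounded parameters the result lands in the low 40,000s. Small changes in α and c move the figure by tens of percent, which is in keeping with treating it as an order-of-magnitude estimate.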
The post Twitter follower distribution first appeared on John D. Cook.