Top Programming Languages Methodology

by Stephen Cass, from IEEE Spectrum

No one can look over the shoulder of every person writing code, whether that is a child writing a Java script for a personal Minecraft server, a mobile app developer hoping to hit it big, or an aerospace engineer writing mission-critical code for a voyage to Mars. Our Top Programming Languages interactive therefore has to estimate each language's popularity indirectly.

We do this by constructing measures of popularity from a variety of data sources that we believe are good proxies for active interest in each programming language. In total, we identify 59 programming languages. We then weight each data source to create an overall index of popularity. Below, we describe the sources of data we use to get the measures, and the weighting scheme we use to produce the overall indices.

By popularity, we mean we are trying to rank languages that are in active use. We look at three different aspects of popularity: languages in active use among typical IEEE members and working software engineers (the "Spectrum" ranking), languages that are in demand by employers (the "Jobs" ranking), and languages that are in the zeitgeist (the "Trending" ranking).

We gauged the popularity of languages using the following sources for a total of eight metrics (see below). We gathered the information for all metrics in June-July 2023. The data were gathered manually to avoid results being biased by API changes or terminations, and because the names of many programming languages (C++, Scheme) collided with common terms found in research papers and job ads or were difficult for a search engine to parse. When a large number of search results made it impractical to resolve ambiguities by examining all of them individually, we used a sample from the data source, with the sample size chosen to estimate the true mean with 95 percent confidence. Not all data sources contain information for every programming language; when a language is missing from a source, we interpret this as the language having "no hits" (i.e., not being popular) in that source.
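As an illustration of how a sample size for 95 percent confidence might be chosen, here is a minimal Python sketch using Cochran's formula with a finite population correction. The article does not say which formula or margin of error was used; the 5 percent margin, the worst-case proportion of 0.5, and the function name sample_size are all assumptions made for this example.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Estimate how many search results to examine by hand.

    Assumptions (not stated in the article): Cochran's formula,
    a 5 percent margin of error, and a worst-case proportion p = 0.5.
    z = 1.96 corresponds to a 95 percent confidence level.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks the sample for smaller result sets.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical result counts, for illustration only.
    for total_results in (2000, 5000):
        print(total_results, sample_size(total_results))
```

Under these assumptions, result sets in the low thousands call for samples of a few hundred items, which is in line with the sample sizes of roughly 300 mentioned for the IEEE sources below.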

The results from each metric are normalized to produce a relative popularity score between 0 and 1. Then the individual metrics are multiplied by a weight factor, combined, and the result renormalized to produce an aggregate popularity score.
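The normalize, weight, combine, and renormalize steps can be sketched in Python as follows. This is only a sketch under stated assumptions: the article does not specify how scores are scaled to the 0-to-1 range, so dividing by the largest value is an assumption here, and the metric names, language names, counts, and weights are invented for illustration (the actual rankings are computed in R).

```python
def normalize(values):
    """Scale values into the 0-to-1 range by dividing by the maximum.

    Dividing by the maximum is an assumption; the article only says
    that scores are normalized to lie between 0 and 1.
    """
    top = max(values.values(), default=0)
    if top == 0:
        return {key: 0.0 for key in values}
    return {key: value / top for key, value in values.items()}


def aggregate(metrics, weights, languages):
    """Combine per-metric counts into one renormalized score per language."""
    totals = {lang: 0.0 for lang in languages}
    for name, counts in metrics.items():
        # A language missing from a source is treated as having no hits.
        filled = {lang: counts.get(lang, 0) for lang in languages}
        scores = normalize(filled)
        for lang in languages:
            totals[lang] += weights.get(name, 0.0) * scores[lang]
    # Renormalize so the top language scores 1.
    return normalize(totals)


if __name__ == "__main__":
    # Hypothetical data: two metrics, three placeholder languages.
    languages = ["LangA", "LangB", "LangC"]
    metrics = {
        "search": {"LangA": 100, "LangB": 50},
        "jobs": {"LangA": 10, "LangB": 40, "LangC": 20},
    }
    weights = {"search": 1.0, "jobs": 0.5}
    for lang, score in aggregate(metrics, weights, languages).items():
        print(lang, round(score, 3))
```

Passing a different weights dictionary to the same function is enough to produce a different ranking from the same underlying metrics.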

In aggregating metrics, we hope to compensate for statistical quirks that might distort a language's popularity score in any particular source of data. Varying the weight factors allows us to emphasize different types of popularity and so create the different results for the Spectrum, Jobs, and Trending rankings. We fully acknowledge that these weights are subjective, but they are based on our understanding of the sources and our prior coverage of software topics.

The Top Programming Languages was originally created by data journalist Nick Diakopoulos. Our statistical methodology advisor is Hilary Wething. Rankings are computed using R.

Google

Google is the leading search engine in the world, making it an ideal fit for estimating language popularity. We measured the number of hits for each language by searching on the template "X programming language" (with quotation marks) and manually recording the number of results returned by the search. We took the measurement in June 2023. We like this measure because it indicates the volume of online information resources about each programming language.

Stack Overflow

Stack Overflow is a popular site where programmers can ask questions about coding. We recorded the number of questions tagged with each language in the week prior to our search (June-July 2023). For the Mathematica/Wolfram language, we relied on the sister "Stack" for the Mathematica platform and tallied the number of programming-related questions asked in the past week. These data were gathered manually. This measure indicates which programming languages are currently trending.

IEEE Xplore Digital Library

IEEE maintains a digital library with millions of conference and journal articles covering a wide array of scientific and engineering disciplines. We searched for articles that mention each of the languages using the template "X programming" for the years 2022 and 2023, because this is the smallest timeframe for which we could access articles. For search results that returned thousands of articles, we identified the correct sample size for a 95 percent confidence interval (usually a little over 300) and pulled that number of articles. For each language we sampled, we identified the share of articles in the sample that use the programming language and then multiplied the total number of articles by this share to estimate the likely total number of articles that reference the language. We conducted this search in June 2023. This metric captures the prevalence of the different programming languages as used and referenced in engineering scholarship.
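The extrapolation from a manually examined sample to the full set of search results amounts to a simple proportion. The following Python sketch shows the arithmetic; the function name estimate_total and the numbers in the example are hypothetical and are not figures from the actual survey.

```python
def estimate_total(total_results, sample_size, relevant_in_sample):
    """Scale the share of relevant items in a sample up to all results.

    total_results: number of items the search returned
    sample_size: number of items examined by hand
    relevant_in_sample: examined items that really concern the language
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")
    share = relevant_in_sample / sample_size
    return total_results * share


if __name__ == "__main__":
    # Hypothetical example: 4,000 results, 320 examined, 80 found relevant.
    print(estimate_total(4000, 320, 80))
```

The same calculation applies to any source where results are sampled rather than examined in full.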

IEEE Job Site

We measured the demand for different programming languages in job postings on the IEEE Job Site. For search results that returned thousands of listings, we identified the correct sample size for a 95 percent confidence interval (usually around 300 results) and pulled that number of job listings to manually examine. For each language we sampled, we identified the share of listings in the sample that use the programming language and then multiplied the total number of job listings by this share to estimate the likely total number of job listings that reference the language. Additionally, because some of the languages we track could be ambiguous in plain text (such as D, Go, J, Ada, and R), we searched for job postings with those words in the job description and then manually examined the results, again sampling entries if the number of results was large. The search was conducted in July 2023. We like the IEEE Job Site for its large number of non-U.S. listings, making it an ideal source for measuring global popularity.

CareerBuilder

We measured the demand for different programming languages on the CareerBuilder job site. We searched for "Developer" jobs offered within the United States, as this is the most popular job title for programmers. We sampled 400 job ads and manually examined them to identify which languages employers mentioned in the postings. The search was conducted in July 2023. We like the CareerBuilder site as a way to gauge the popularity of languages in programmer jobs in the United States.

GitHub

GitHub is a public repository for many volunteer-driven open-source software projects. We used data gathered by GitHut 2.0, which measures the top 50 languages used by the number of repositories tagged with that language and draws from GitHub's public API. We use two metrics from GitHub: repositories that have been "starred" by users to reflect long-term interests, and the number of pull requests to indicate current activity. The data cover the second quarter of 2023. These measures indicate what languages coders choose to work in when they have a personal choice.

Trinity College Dublin Library

The library of Trinity College Dublin is one of six legal deposit libraries in Ireland and the United Kingdom. A copy of any book published or distributed in Ireland must be deposited with the library, and on request any U.K. publisher or distributor must also deposit a book. We searched for all books published in the year to date that had their subject matter categorized as computer programming and totaled the number of returns. The search was conducted in June 2023. We like this library collection because it represents a large and categorized sample of works, primarily in the English language.

Discord

Discord is a popular chat-room platform where many programmers exchange information. We counted the number of tags that correspond to each language. Because many nonprogramming topics also have dedicated Discord servers, results were manually examined for languages whose names could refer to something else; for example, "Julia" could refer to the programming language or the Sesame Street puppet. Disboard was searched in June 2023. Disboard lists many public Discord servers, and many young coders use the site, contributing a different demographic of coders.
