The Popularity of Data Science Software is a great article showing the popularity of various data analytics software from different perspectives. Inspired by this article, I want to look at something I am personally interested in: the popularity of statistics software in the (UK) academic job market.
While the article mentioned above discusses popularity in the job market in general, as well as in scholarly articles, it does not look at the academic job market. Another reason to revisit this issue is that data science, analytics, analysis and statistical jobs are not all alike. An example is Microsoft Excel, which was not included in the article’s job market analysis. Excel is nevertheless widely used. If you add “Excel” to the search terms used in that article and run the search today, you will find that Excel is the third most popular software in data jobs advertised on Indeed, just behind Python and SQL. Seriously, I was once turned down for a data analytics job because I did not do well in the Excel test during the interview. However, it seems that jobs that rely heavily on Excel are a different kind of job from those using Python, SQL, R and so on, and an experienced Python user may not want one of those Excel jobs anyway.
Another example is SAS. There is a kind of job titled “SAS Programmer”. However, having worked and interacted with many statisticians in many UK academic institutions, I have never heard of, nor can I imagine, any researcher or statistician who would consider themselves a “programmer” (be it a SAS programmer or an R programmer). Do they really want a “programmer” job? Similarly, many economists are good at data and statistics, but I would never call them “Stata Programmers” (Stata seems very popular among economists). Would Nate Silver (who has a degree in economics from the University of Chicago and used to work as an economic consultant) call himself a “Stata Programmer”? Stata advertises itself as “for researchers; by researchers”. As a researcher myself, I am interested in how popular the major statistics packages are in the academic job market, including the software most used in scholarly articles (SPSS, R, SAS, Stata) and the top software in the general job market (Python).
To investigate this, I used http://www.jobs.ac.uk, the major academic job advertisement website. As you can see from the domain, it is mainly for the UK, but institutions from other countries do post jobs there regularly. The majority of jobs are posted by universities, but companies do occasionally post research jobs as well. The website has a built-in search function, but searching it for R is impossible, so I will use Google to search the site instead. The search terms are the software name plus the data-related terms (these differ slightly from this guide, partly because Google limits the number of words in a search and partly because some terms, such as “business analytics”, are irrelevant in academia and were removed):
site:jobs.ac.uk software AND ("big data" OR "data analytics" OR "machine learning" OR "statistical analysis" OR "data mining" OR "data science" OR "data scientist" OR "statistical software" OR "predictive analytics" OR "artificial intelligence" OR "predictive modelling" OR "statistical modelling" OR "statistical tools" OR "statistician")
Stata and SPSS are the easiest for obvious reasons: their names are used as they are. Python is a general-purpose programming language, so I only search for its main machine learning library, “scikit-learn”. Some academic job ads list statistics alongside many other quantitative skills such as mathematics and physics, and Python is often just one of several languages listed along with Java and C++, so it is not clear how the job would use Python. I therefore think a search for “scikit-learn” makes more sense. SAS could mean many things. I have seen School of Advanced Study, Statistical Advisory Service, Student Administration and Support Services… all abbreviated as SAS. Statistical Advisory Service in particular may distort the results a little. Finally, for R, I used the following: (" R " OR " R,"). Note that this is far from perfect, since I saw people’s names such as “Firstname R Surname” and email addresses (r.surname@someemail.com) in the search results. Thus it is likely that the number of R jobs has been greatly inflated.
The following are the results of the search run today:
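To make the query construction concrete, here is a minimal sketch in Python that assembles the Google query for each package. The function name `build_query`, the dictionary layout, and the quoting of scikit-learn are my own assumptions for illustration; only the data-related terms and the R pattern come from the text above.

```python
# Data-related terms taken from the search query quoted above.
DATA_TERMS = [
    "big data", "data analytics", "machine learning", "statistical analysis",
    "data mining", "data science", "data scientist", "statistical software",
    "predictive analytics", "artificial intelligence", "predictive modelling",
    "statistical modelling", "statistical tools", "statistician",
]

# Software-specific search terms as described in the text.
# Assumption: scikit-learn is wrapped in double quotes; the exact form
# typed into Google is not given in the post.
SOFTWARE_TERMS = {
    "SPSS": "SPSS",
    "R": '(" R " OR " R,")',
    "SAS": "SAS",
    "Stata": "Stata",
    "Python (scikit-learn)": '"scikit-learn"',
}


def build_query(software_term, data_terms=DATA_TERMS, site="jobs.ac.uk"):
    """Return a Google query string restricted to `site` (hypothetical helper)."""
    quoted_terms = " OR ".join(f'"{term}"' for term in data_terms)
    return f"site:{site} {software_term} AND ({quoted_terms})"


if __name__ == "__main__":
    # Print one query per package, ready to paste into Google.
    for name, term in SOFTWARE_TERMS.items():
        print(f"{name}:\n  {build_query(term)}\n")
```

Passing a shorter list as `data_terms` gives a narrower search without changing anything else.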
Software | Number of jobs |
SPSS | 29 |
R | 262 |
SAS | 59 |
Stata | 64 |
Python (scikit-learn) | 5 |
As discussed above, it is important to differentiate between different kinds of jobs. In particular, terms like “data science” or “artificial intelligence” may imply dealing with just the (big) data itself or with robots, which is more of a computer science matter than an attempt to make sense of data. Now let’s redo the above search without the big data, data science and AI terms, i.e. using:
site:jobs.ac.uk software AND ("data analytics" OR "machine learning" OR "statistical analysis" OR "data mining" OR "statistical software" OR "predictive analytics" OR "predictive modelling" OR "statistical modelling" OR "statistical tools" OR "statistician")
Software | Number of jobs |
SPSS | 29 |
R | 167 |
SAS | 41 |
Stata | 58 |
Python (scikit-learn) | 5 |
It seems R is the most popular, but its count is probably greatly inflated by false matches such as initials and email addresses. Stata is the second most popular statistical software in the (UK) academic job market, followed by SAS, SPSS, and Python (scikit-learn).
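As a quick check on the ranking, the two sets of counts can be put side by side. The sketch below is only a convenience for the arithmetic: the numbers are copied from the two tables, while the names `full_search`, `narrow_search` and `compare` are my own.

```python
# Counts copied from the two tables in this post.
full_search = {"SPSS": 29, "R": 262, "SAS": 59, "Stata": 64,
               "Python (scikit-learn)": 5}
narrow_search = {"SPSS": 29, "R": 167, "SAS": 41, "Stata": 58,
                 "Python (scikit-learn)": 5}


def compare(full, narrow):
    """Return (name, full, narrow, drop, drop_pct) rows, ranked by narrow count."""
    rows = []
    for name, full_count in full.items():
        narrow_count = narrow[name]
        drop = full_count - narrow_count
        drop_pct = 100 * drop / full_count
        rows.append((name, full_count, narrow_count, drop, drop_pct))
    # Highest count in the narrower search first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, full, narrow, drop, pct in compare(full_search, narrow_search):
        print(f"{name:<22} {full:>4} {narrow:>4}  -{drop:<3} ({pct:.0f}% fewer)")
```

R loses the most results when the data science and AI terms are removed (95 fewer), while SPSS and scikit-learn do not change at all.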