Kaggle State of Machine Learning and Data Science 2020 PDF
Key Results
Education
Employment
Technology
Conclusion
Report Methodology
The content of this report focuses on respondents who are
currently employed and chose their current job title as
“data scientist”. There are many other job titles that
support data science and machine learning workflows and
you can find their responses in the complete 2020 survey
dataset on Kaggle.
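As a rough illustration of the job-title filter described above, the sketch below selects rows by job title from survey-style data. The column name `Q5`, the title string `Data Scientist`, the helper `filter_by_title`, and the sample rows are all assumptions made for illustration and are not taken from this report; check the real column names in the survey dataset before reusing it. The "currently employed" condition is not modeled here.

```python
import csv
import io


def filter_by_title(rows, title_column="Q5", title="Data Scientist"):
    """Keep only rows whose job-title column equals the given title.

    `title_column` and `title` are hypothetical defaults, not confirmed
    against the published dataset.
    """
    return [row for row in rows if row.get(title_column, "").strip() == title]


# Made-up sample rows standing in for the survey CSV.
SAMPLE = """Q3,Q5
India,Data Scientist
Germany,Data Analyst
Brazil,Data Scientist
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
data_scientists = filter_by_title(rows)
print(len(data_scientists), "of", len(rows), "sample rows kept")
```

The same function can be pointed at rows read from the real file with `csv.DictReader(open(path, newline=""))`, where `path` is wherever the dataset was saved.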
Technology
More data scientists use cloud computing compared to
2019 results
Gender identity of data scientists
Man 81.9%
Woman 16.4%
Nonbinary 0.3%
Prefer not to say 1.1%
Prefer to self-describe 0.4%
Age ranges of data scientists
0-17
18-21 6.9%
22-24 13.7%
25-29 25.2%
30-34 20.1%
35-39 13.4%
40-44 8.7%
45-49 5%
50-54 3.1%
55-59 1.5%
60-69 1.8%
70+ 0.6%
Most common nationalities
[Bar chart. Country labels recovered: United Kingdom, Poland, Australia, Turkey, Canada, Spain, Nigeria, Germany, Japan, France, Russia, Brazil, Other, U.S.A., India. Bar values recovered, not matched to countries: 21.8%, 14.5%, 6.7%, 4.6%, 4.2%, 3%, 3.3%, 2.8%, 2.8%, 2.1%, 2.4%, 2.6%, 1.4%, 1.5%, 1.8%.]
Responses per country
[Map. Legend: # of respondents, from 50 to 300+.]
Education level of Kaggle data scientists
No formal education past high school 0.6%
Some college/university study without earning a bachelor’s degree 2.4%
Bachelor’s degree 24.2%
Master’s degree 51.1%
Doctoral degree 17.2%
Professional degree 3.2%
I prefer not to answer 1.3%
Coursera 62.9%
Udemy 34.7%
University Courses (resulting in a university degree) 30.8%
Kaggle Learn Courses 30.1%
DataCamp 29.6%
edX 22.5%
Udacity 19.2%
Fast.ai 11.8%
Other 9.9%
Cloud-certification programs (direct from AWS, Azure, GCP, or similar) 9.1%
None 7.4%
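[Bar chart with two values per category; title not recovered. Categories: 20 or more years, 10-20 years, 5-10 years, 3-5 years, 1-2 years, < 1 years. Values in extraction order, not matched to categories: 7.6%, 13.3%, 19.6%, 21.9%, 29.2%, 27.9%, 25.3%, 17.3%, 7%, 9.3%, 0.8%, 2.1%.]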
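[Bar chart with two values per category; title not recovered. Category labels recovered: 10-15 years, 5-10 years, 4-5 years, 3-4 years, 2-3 years, 1-2 years, Under 1 year. Values in extraction order, not matched to categories: 5.1%, 3.9%, 8.6%, 13%, 19.6%, 10.9%, 15.3%, 12.3%, 17.2%, 15.9%, 15.8%, 21.4%, 12.1%, 17.9%, 5.6%.]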
[...] the six figures, based on these survey results. Global [...] distributed.
[...] 90% make less than $50,000 USD per year.
Global salary distribution for data scientists
300,000-500,000 0.7%
250,000-299,999 0.3%
200,000-249,999 1.6%
150,000-199,999 4.7%
125,000-149,999 4.5%
100,000-124,999 6.8%
90,000-99,999 3.5%
80,000-89,999 3.2%
70,000-79,999 4.2%
60,000-69,999 3.7%
50,000-59,999 4.3%
40,000-49,999 5.5%
30,000-39,999 5%
25,000-29,999 3.5%
20,000-24,999 3.6%
15,000-19,999 4%
10,000-14,999 5.7%
7,500-9,999 2.7%
5,000-7,499 3.1%
4,000-4,999 1.8%
3,000-3,999 1.8%
2,000-2,999 2%
1,000-1,999 4.3%
$0-999 18.6%
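To make the shape of the global distribution easier to read, the sketch below accumulates the shares listed above from the lowest salary bucket upward and reports the bucket in which the cumulative share first reaches half of the listed total. The function names are illustrative, and treating "half of the listed total" as the median bucket is an assumption of this sketch, not a method stated in the report.

```python
# Global salary shares (percent of respondents), copied from the chart above
# and ordered from the lowest bucket to the highest.
GLOBAL_SALARY_SHARES = [
    ("$0-999", 18.6),
    ("1,000-1,999", 4.3),
    ("2,000-2,999", 2.0),
    ("3,000-3,999", 1.8),
    ("4,000-4,999", 1.8),
    ("5,000-7,499", 3.1),
    ("7,500-9,999", 2.7),
    ("10,000-14,999", 5.7),
    ("15,000-19,999", 4.0),
    ("20,000-24,999", 3.6),
    ("25,000-29,999", 3.5),
    ("30,000-39,999", 5.0),
    ("40,000-49,999", 5.5),
    ("50,000-59,999", 4.3),
    ("60,000-69,999", 3.7),
    ("70,000-79,999", 4.2),
    ("80,000-89,999", 3.2),
    ("90,000-99,999", 3.5),
    ("100,000-124,999", 6.8),
    ("125,000-149,999", 4.5),
    ("150,000-199,999", 4.7),
    ("200,000-249,999", 1.6),
    ("250,000-299,999", 0.3),
    ("300,000-500,000", 0.7),
]


def cumulative_shares(shares):
    """Return (bucket, cumulative percent) pairs, lowest bucket first."""
    running = 0.0
    result = []
    for bucket, pct in shares:
        running += pct
        # Round to one decimal to match the precision of the source figures.
        result.append((bucket, round(running, 1)))
    return result


def median_bucket(shares):
    """Return the first bucket where the cumulative share reaches half the listed total."""
    half = sum(pct for _, pct in shares) / 2
    for bucket, cumulative in cumulative_shares(shares):
        if cumulative >= half:
            return bucket
    raise ValueError("empty distribution")


for bucket, cumulative in cumulative_shares(GLOBAL_SALARY_SHARES):
    print(f"{bucket:>17}  {cumulative:5.1f}%")
print("Median bucket:", median_bucket(GLOBAL_SALARY_SHARES))
```

With the figures as listed, the shares sum to 99.1%, and the cumulative share first passes half of that total in the 25,000-29,999 bucket.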
150,000-199,999 21.3%
125,000-149,999 18%
100,000-124,999 18.6%
90,000-99,999 6.9%
80,000-89,999 5.3%
70,000-79,999 4.7%
60,000-69,999 0.8%
50,000-59,999 0.6%
40,000-49,999 1.1%
30,000-39,999 0.3%
20,000-24,999 0.3%
15,000-19,999 0.3%
10,000-14,999 0.8%
5,000-7,499 0.3%
4,000-4,999 0.3%
3,000-3,999 0.3%
1,000-1,999 0.3%
$0-999 5%
150,000-199,999 0.6%
125,000-149,999 1%
100,000-124,999 1.2%
90,000-99,999 0.8%
80,000-89,999 1%
70,000-79,999 1.6%
60,000-69,999 0.8%
50,000-59,999 2.6%
40,000-49,999 3.4%
30,000-39,999 4.5%
25,000-29,999 4.7%
20,000-24,999 6.7%
15,000-19,999 7.3%
10,000-14,999 9.7%
7,500-9,999 5.7%
5,000-7,499 4.9%
4,000-4,999 3%
3,000-3,999 1.6%
2,000-2,999 1.6%
1,000-1,999 4%
$0-999 32%
Median salary for data scientists by country
USA 125,000-149,999
Germany 70,000-79,999
Japan 40,000-49,999
Russia 10,000-14,999
Brazil 10,000-14,999
India 7,500-9,999
Company size (# of employees)
0-49 employees 37.3%
50-249 employees 13.7%
250-999 employees 10%
1000-9,999 employees 17%
10,000+ employees 22%
Data science teams (# of employees)
0 9.2%
1-2 23.3%
3-4 18.8%
5-9 15.1%
10-14 7.4%
15-19 3.4%
20+ 22.9%
Machine learning adoption in the enterprise over time (legend: 2020, 2019, 2018)
[Three bars per category, listed top to bottom as extracted; the year for each bar was not recovered.]
I do not know 7.8% 3.8% 4.9%
No (we do not use ML methods) 6.9% 5.5% 4.2%
We recently started using ML methods (i.e., models in production for less than 2 years) 23.9% 30.7% 32.9%
We are exploring ML methods (and maybe one day put a model into production) 17.6% 16.7% 19%
US vs global enterprise spending in the past 5 years ($USD)
Spend Global USA
$100K+ 11.6% 25.6%
$10K-$99,999 15.1% 20.8%
$1K-$9,999 21.3% 18.9%
$100-$999 16.3% 11.5%
$1-$99 10.2% 4.2%
$0 25.6% 18.9%
Jupyter-based IDEs continue to be the go-to tool for data
scientists, with around three-quarters of Kaggle data
scientists using them. However, this has decreased from last
year’s 83%. Visual Studio Code is in the second spot with
just over 33%. This is the first year it has been separated
out from Visual Studio. The two combined for over 43% this
year, versus under 30% in 2019.
JupyterLab 74.1%
PyCharm 31.9%
RStudio 31.5%
Spyder 21.8%
Notepad++ 19.4%
Vim, Emacs, or similar 11%
MATLAB 5.8%
Other 5.6%
None 0.7%
Linear or Logistic Regression 83.7%
Decision Trees or Random Forests 78.1%
Recurrent Neural Networks 30.2%
Evolutionary Approaches 6.5%
Other 4.5%
None 1.7%
Scikit-learn 82.8%
TensorFlow 50.5%
Keras 50.5%
XGBoost 48.4%
PyTorch 30.9%
LightGBM 26.1%
Caret 14.1%
CatBoost 13.7%
Prophet 10%
Fast.ai 7.5%
Tidymodels 7.2%
H2O-3 6%
MXNet 2.1%
Other 3.7%
None 3.2%
JAX 0.7%
Amazon Web Services (AWS) 48.2%
Google Cloud Platform (GCP) 35.3%
None 17.1%
Other 4.1%
Oracle Cloud 3%
No/None 20.3%
Amazon Elastic Container Service 14.4%
Microsoft Azure Container Instances 12.5%
Other 3.4%
No/None 55.2%
Google Cloud AI Platform/Google Cloud ML Engine 14.8%
Azure Machine Learning Studio 12.9%
Azure Cognitive Services 6.4%
Other 2.9%
Data scientist usage of business intelligence tools
None 38.8%
Tableau 33.3%
Other 6.4%
Qlik 5%
Salesforce 2.8%
Looker 2.5%
Alteryx 2.1%
Sisense 1.2%
Domo 0.7%
Database usage by data scientists
MySQL 35.6%
PostgreSQL 28.9%
MongoDB 18.7%
SQLite 16.5%
None 15.4%
Other 7.9%
Snowflake 5.6%
Automated machine learning framework usage
No/None 68.1%
TensorBoard 21.6%
Other 5.4%
Trains 3.1%
Neptune 2.3%
Polyaxon 0.9%
Guild.ai 0.8%
Comet.ml 0.7%
Sacred+Omniboard 0.6%