
Exploratory Data Analysis Using Python

The document is a Jupyter notebook for exploratory data analysis using Python, specifically analyzing the Stack Overflow Developer Survey 2020 dataset. It details the process of downloading the dataset, loading it into a Pandas DataFrame, and preparing the data for analysis by selecting relevant columns and handling missing values. The analysis focuses on demographics, programming experience, and employment-related information of survey respondents.

Uploaded by obrama59
Copyright © All Rights Reserved

05/06/2023, 02:02 Exploratory Data Analysis using Python.ipynb - Colaboratory

!pip install opendatasets --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Requirement already satisfied: opendatasets in /usr/local/lib/python3.10/dist-packages (0.1.22)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from opendatasets) (4.65.0)
Requirement already satisfied: kaggle in /usr/local/lib/python3.10/dist-packages (from opendatasets) (1.5.13)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from opendatasets) (8.1.3)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (1.1
Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2022.
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2.27
Requirement already satisfied: python-slugify in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (1.26.
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.10/dist-packages (from python-slugify->
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->kaggle->opend

import opendatasets as od

od.download('stackoverflow-developer-survey-2020')
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/
94609408it [00:15, 5945838.42it/s]
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/
16384it [00:00, 190956.92it/s]
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/
8192it [00:00, 71149.66it/s]

Let's verify that the dataset was downloaded into the directory stackoverflow-developer-survey-2020 and retrieve the list of files in the
dataset.

import os

https://colab.research.google.com/drive/1pILANvXLtIlJAjSdfl5q0Vxj9vqthN3T#scrollTo=yqxgBbUIECP5&printMode=true 1/41

os.listdir('stackoverflow-developer-survey-2020')

['survey_results_public.csv', 'survey_results_schema.csv', 'README.txt']

You can browse through the downloaded files using the "File" > "Open" menu option in Jupyter. It seems like the dataset contains three files:

README.txt - Information about the dataset
survey_results_schema.csv - The list of questions, and shortcodes for each question
survey_results_public.csv - The full list of responses to the questions

Let's load the CSV files using the Pandas library. We'll use the name survey_raw_df for the data frame to indicate this is unprocessed data
that we might clean, filter, and modify to prepare a data frame ready for analysis.

import pandas as pd

survey_raw_df = pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv')

survey_raw_df

       Respondent  MainBranch                                         Hobbyist  Age   Age1stCode  CompFreq  CompTotal  ConvertedComp  Country             CurrencyDesc     ...
0      1           I am a developer by profession                     Yes       NaN   13          Monthly   NaN        NaN            Germany             European Eu...   ...
1      2           I am a developer by profession                     No        NaN   19          NaN       NaN        NaN            United Kingdom      Pound sterli...  ...
2      3           I code primarily as a hobby                        Yes       NaN   15          NaN       NaN        NaN            Russian Federation  NaN              ...
3      4           I am a developer by profession                     Yes       25.0  18          NaN       NaN        NaN            Albania             Albanian l...    ...
4      5           I used to be a developer by profession, but no...  Yes       31.0  16          NaN       NaN        NaN            United States       NaN              ...
...    ...         ...                                                ...       ...   ...         ...       ...        ...            ...                 ...              ...
64456  64858       NaN                                                Yes       NaN   16          NaN       NaN        NaN            United States       NaN              ...
64457  64867       NaN                                                Yes       NaN   NaN         NaN       NaN        NaN            Morocco             NaN              ...
64458  64898       NaN                                                Yes       NaN   NaN         NaN       NaN        NaN            Viet Nam            NaN              ...
64459  64925       NaN                                                Yes       NaN   NaN         NaN       NaN        NaN            Poland              NaN              ...
64460  65112       NaN                                                Yes       NaN   NaN         NaN       NaN        NaN            Spain               NaN              ...

64461 rows × 61 columns

The dataset contains over 64,000 responses to 60 questions (although many questions are optional). The responses have been anonymized to
remove personally identifiable information, and each respondent has been assigned a randomized respondent ID.

Let's view the list of columns in the data frame.

survey_raw_df.columns
Index(['Respondent', 'MainBranch', 'Hobbyist', 'Age', 'Age1stCode', 'CompFreq',
'CompTotal', 'ConvertedComp', 'Country', 'CurrencyDesc',
'CurrencySymbol', 'DatabaseDesireNextYear', 'DatabaseWorkedWith',
'DevType', 'EdLevel', 'Employment', 'Ethnicity', 'Gender', 'JobFactors',
'JobSat', 'JobSeek', 'LanguageDesireNextYear', 'LanguageWorkedWith',
'MiscTechDesireNextYear', 'MiscTechWorkedWith',
'NEWCollabToolsDesireNextYear', 'NEWCollabToolsWorkedWith', 'NEWDevOps',
'NEWDevOpsImpt', 'NEWEdImpt', 'NEWJobHunt', 'NEWJobHuntResearch',
'NEWLearn', 'NEWOffTopic', 'NEWOnboardGood', 'NEWOtherComms',
'NEWOvertime', 'NEWPurchaseResearch', 'NEWPurpleLink', 'NEWSOSites',
'NEWStuck', 'OpSys', 'OrgSize', 'PlatformDesireNextYear',
'PlatformWorkedWith', 'PurchaseWhat', 'Sexuality', 'SOAccount',
'SOComm', 'SOPartFreq', 'SOVisitFreq', 'SurveyEase', 'SurveyLength',
'Trans', 'UndergradMajor', 'WebframeDesireNextYear',
'WebframeWorkedWith', 'WelcomeChange', 'WorkWeekHrs', 'YearsCode',
'YearsCodePro'],
dtype='object')

It appears that shortcodes for questions have been used as column names.

We can refer to the schema file to see the full text of each question. The schema file contains only two columns: Column and QuestionText.
We can load it as a Pandas Series with Column as the index and QuestionText as the value.

schema_fname = 'stackoverflow-developer-survey-2020/survey_results_schema.csv'
schema_raw = pd.read_csv(schema_fname, index_col='Column').QuestionText

schema_raw

Column
Respondent Randomized respondent ID number (not in order ...
MainBranch Which of the following options best describes ...
Hobbyist Do you code as a hobby?
Age What is your age (in years)? If you prefer not...
Age1stCode At what age did you write your first line of c...
...
WebframeWorkedWith Which web frameworks have you done extensive d...

WelcomeChange Compared to last year, how welcome do you feel...


WorkWeekHrs On average, how many hours per week do you wor...
YearsCode Including any education, how many years have y...
YearsCodePro NOT including education, how many years have y...
Name: QuestionText, Length: 61, dtype: object

We can now use schema_raw to retrieve the full question text for any column in survey_raw_df.

schema_raw['YearsCodePro']

'NOT including education, how many years have you coded professionally (as a part of your work)?'
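The same read_csv(..., index_col=...) pattern can be tried on a tiny in-memory file. The sketch below uses a made-up, abridged two-row stand-in for the schema file (only the Column/QuestionText layout mirrors the real one) and shows that the resulting Series supports both single-label lookup and list-based selection, which is what schema_raw[selected_columns] relies on later.

```python
import io

import pandas as pd

# Made-up, abridged stand-in for survey_results_schema.csv; only the
# Column/QuestionText layout mirrors the real file.
schema_csv = io.StringIO(
    "Column,QuestionText\n"
    "Age,What is your age (in years)?\n"
    "Hobbyist,Do you code as a hobby?\n"
)

# index_col='Column' turns the shortcodes into the index; selecting the
# QuestionText column then yields a Series: shortcode -> question text.
schema = pd.read_csv(schema_csv, index_col="Column").QuestionText

print(schema["Hobbyist"])           # single label -> one question string
print(schema[["Age", "Hobbyist"]])  # list of labels -> a smaller Series
```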

We've now loaded the dataset. We're ready to move on to the next step of preprocessing & cleaning the data for our analysis.

Save and upload your notebook


Whether you're running this Jupyter notebook online or on your computer, it's essential to save your work from time to time. You can continue
working on a saved notebook later or share it with friends and colleagues to let them execute your code. Jovian offers an easy way of saving
and sharing your Jupyter notebooks online.

# Select a project name


project='python-eda-stackoverflow-survey'

Data Preparation & Cleaning


While the survey responses contain a wealth of information, we'll limit our analysis to the following areas:

Demographics of the survey respondents and the global programming community
Distribution of programming skills, experience, and preferences
Employment-related information, preferences, and opinions


Let's select a subset of columns with the relevant data for our analysis.

selected_columns = [
    # Demographics
    'Country',
    'Age',
    'Gender',
    'EdLevel',
    'UndergradMajor',
    # Programming experience
    'Hobbyist',
    'Age1stCode',
    'YearsCode',
    'YearsCodePro',
    'LanguageWorkedWith',
    'LanguageDesireNextYear',
    'NEWLearn',
    'NEWStuck',
    # Employment
    'Employment',
    'DevType',
    'WorkWeekHrs',
    'JobSat',
    'JobFactors',
    'NEWOvertime',
    'NEWEdImpt'
]

len(selected_columns)

20


Let's extract a copy of the data from these columns into a new data frame survey_df. We can continue to modify survey_df without affecting the
original data frame.

survey_df = survey_raw_df[selected_columns].copy()

schema = schema_raw[selected_columns]
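Calling .copy() on the column selection makes it explicit that survey_df is independent of survey_raw_df. It also avoids pandas' SettingWithCopyWarning when the subset is modified later. A minimal sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical stand-in for survey_raw_df.
raw = pd.DataFrame({
    "Age": [25.0, 31.0],
    "Country": ["Albania", "United States"],
    "Extra": [1, 2],
})

# Select a list of columns, then take an explicit copy.
subset = raw[["Age", "Country"]].copy()
subset.loc[0, "Age"] = 99.0  # modifies the copy only

print(raw.loc[0, "Age"])     # original value is untouched
print(subset.loc[0, "Age"])
```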

Let's view some basic information about the data frame.

survey_df.shape

(64461, 20)

survey_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64461 entries, 0 to 64460
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country 64072 non-null object
1 Age 45446 non-null float64
2 Gender 50557 non-null object
3 EdLevel 57431 non-null object
4 UndergradMajor 50995 non-null object
5 Hobbyist 64416 non-null object
6 Age1stCode 57900 non-null object
7 YearsCode 57684 non-null object
8 YearsCodePro 46349 non-null object
9 LanguageWorkedWith 57378 non-null object
10 LanguageDesireNextYear 54113 non-null object
11 NEWLearn 56156 non-null object
12 NEWStuck 54983 non-null object
13 Employment 63854 non-null object
14 DevType 49370 non-null object
15 WorkWeekHrs 41151 non-null float64
16 JobSat 45194 non-null object
17 JobFactors 49349 non-null object
18 NEWOvertime 43231 non-null object
19 NEWEdImpt 48465 non-null object
dtypes: float64(2), object(18)
memory usage: 9.8+ MB

Most columns have the data type object because they contain text answers or mix numbers with non-numeric text. Empty values (NaN) on their own
would not prevent a numeric type. It appears that every column contains some empty values, since the Non-Null count for every column is lower
than the total number of rows (64461). We'll need to deal with empty values and manually adjust the data type for each column on a case-by-case basis.
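Reading Non-Null counts column by column is tedious. A compact alternative is to compute the percentage of missing values per column. The sketch below runs on a hypothetical toy frame; on the notebook's data the same expression would be applied to survey_df.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (hypothetical values, not survey data).
df = pd.DataFrame({
    "Age": [25.0, np.nan, 31.0, np.nan],
    "Country": ["Germany", "Albania", None, "Spain"],
})

# isna() marks missing cells. The column-wise mean of those booleans is the
# fraction missing, so multiplying by 100 gives a percentage per column.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```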

Only two of the columns were detected as numeric (Age and WorkWeekHrs), even though a few other columns contain mostly numeric values. To make our
analysis easier, let's convert some of these columns into numeric data types while ignoring any non-numeric values. With errors='coerce', the
non-numeric values are converted to NaN.

survey_df['Age1stCode'] = pd.to_numeric(survey_df.Age1stCode, errors='coerce')


survey_df['YearsCode'] = pd.to_numeric(survey_df.YearsCode, errors='coerce')
survey_df['YearsCodePro'] = pd.to_numeric(survey_df.YearsCodePro, errors='coerce')
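What errors='coerce' does can be seen on a small hypothetical Series that mixes numeric strings with a free-text answer, similar in shape to the columns converted above. The text answer here is made up for illustration.

```python
import pandas as pd

# Hypothetical answers: numeric strings, a free-text answer, and a blank.
answers = pd.Series(["13", "Less than 1 year", "7", None])

# errors='coerce' turns anything that cannot be parsed into NaN
# instead of raising a ValueError.
numeric = pd.to_numeric(answers, errors="coerce")
print(numeric)
print(numeric.dtype)  # float64, since NaN needs a float column
```

One design note: coercion silently discards text answers. If such answers carry meaning, mapping them to a number explicitly before calling pd.to_numeric is an alternative.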

Let's now view some basic statistics about numeric columns.

survey_df.describe()


       Age           Age1stCode    YearsCode     YearsCodePro  WorkWeekHrs
count  45446.000000  57473.000000  56784.000000  44133.000000  41151.000000
mean   30.834111     15.476572     12.782051     8.869667      40.782174
std    9.585392      5.114081      9.490657      7.759961      17.816383
min    1.000000      5.000000      1.000000      1.000000      1.000000
25%    24.000000     12.000000     6.000000      3.000000      40.000000
50%    29.000000     15.000000     10.000000     6.000000      40.000000
75%    35.000000     18.000000     17.000000     12.000000     44.000000
max    279.000000    85.000000     50.000000     50.000000     475.000000

There seems to be a problem with the Age column, as the minimum value is 1 and the maximum is 279. This is a common issue with surveys:
responses may contain invalid values due to accidental or intentional errors while responding. A simple fix is to treat rows where the age is
higher than 100 years or lower than 10 years as invalid survey responses and remove them. We can do this using the .drop method.

survey_df.drop(survey_df[survey_df.Age < 10].index, inplace=True)
survey_df.drop(survey_df[survey_df.Age > 100].index, inplace=True)


The same holds for WorkWeekHrs, whose maximum is 475 hours. Let's ignore entries where the value is higher than 140 hours (~20 hours per day).

survey_df.drop(survey_df[survey_df.WorkWeekHrs > 140].index, inplace=True)
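The three .drop calls can also be expressed as a single boolean mask. Comparisons with NaN evaluate to False, so rows with a missing Age or WorkWeekHrs are kept, just as with .drop. A minimal sketch on hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: an implausible age, an implausible work week,
# a missing age and a missing work week.
df = pd.DataFrame({
    "Age": [1.0, 29.0, np.nan, 279.0, 35.0],
    "WorkWeekHrs": [40.0, 475.0, 45.0, 40.0, np.nan],
})

# Flag rows to discard; NaN comparisons are False, so missing values survive.
invalid = (df.Age < 10) | (df.Age > 100) | (df.WorkWeekHrs > 140)
cleaned = df[~invalid]
print(cleaned)
```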

The gender column also allows for picking multiple options. We'll remove values containing more than one option to simplify our analysis.

survey_df['Gender'].value_counts()

Man 45895
Woman 3835
Non-binary, genderqueer, or gender non-conforming 385
Man;Non-binary, genderqueer, or gender non-conforming 121
Woman;Non-binary, genderqueer, or gender non-conforming 92
Woman;Man 73
Woman;Man;Non-binary, genderqueer, or gender non-conforming 25
Name: Gender, dtype: int64

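import numpy as np

# Blank out rows whose Gender answer contains more than one option (';').
# Note: DataFrame.where replaces every column of the matching rows with NaN,
# not only the Gender column.
survey_df.where(~(survey_df.Gender.str.contains(';', na=False)), np.nan, inplace=True)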
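Because DataFrame.where with a row mask blanks every column of the matching rows, the other answers from those respondents are lost as well. If the goal is only to discard multi-option Gender values, an assignment restricted to that column keeps the rest of the row. The sketch contrasts the two on hypothetical toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one multi-option Gender answer.
df = pd.DataFrame({
    "Gender": ["Man", "Woman;Man", "Woman", np.nan],
    "Country": ["India", "Poland", "Spain", "France"],
})

multi = df.Gender.str.contains(";", na=False)

# Row-wide: every column of the matching rows becomes NaN.
whole_row = df.where(~multi, np.nan)

# Column-only: just Gender is blanked; Country is preserved.
gender_only = df.copy()
gender_only.loc[multi, "Gender"] = np.nan

print(whole_row)
print(gender_only)
```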



We've now cleaned up and prepared the dataset for analysis. Let's take a look at a sample of rows from the data frame.

survey_df.sample(10)


       Country         Age   Gender  EdLevel                                             UndergradMajor                                      Hobbyist  Age1stCode  YearsCode  YearsCodePro  ...
55323  France          27.0  Woman   Master’s degree (M.A., M.S., M.Eng., MBA, etc.)     Another engineering discipline (such as civil,...   Yes       13.0        6.0        4.0           ...
35965  United States   NaN   NaN     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        Computer science, computer engineering, or sof...   No        11.0        3.0        1.0           ...
53520  United States   28.0  Man     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        Computer science, computer engineering, or sof...   No        19.0        9.0        6.0           ...
51707  South Africa    62.0  Man     Some college/university study without earning ...   Fine arts or performing arts (such as graphic ...   Yes       33.0        7.0        2.0           ...
54551  India           22.0  Man     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        Computer science, computer engineering, or sof...   Yes       16.0        7.0        3.0           ...
24140  Poland          NaN   Man     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        A business discipline (such as accounting, fin...   Yes       20.0        15.0       10.0          ...
7061   India           28.0  Man     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        Computer science, computer engineering, or sof...   Yes       17.0        10.0       6.0           ...
3263   United States   40.0  Man     Bachelor’s degree (B.A., B.S., B.Eng., etc.)        Another engineering discipline (such as civil,...   Yes       20.0        14.0       12.0          ...
24309  Sweden          23.0  Man     Some college/university study without earning ...   Web development or web design                       Yes       17.0        5.0        2.0           ...
55423  United Kingdom  36.0  Man     Master’s degree (M.A., M.S., M.Eng., MBA, etc.)     Computer science, computer engineering, or sof...   Yes       15.0        13.0       10.0          ...

Exploratory Analysis and Visualization

Before we ask questions about the survey responses, it would help to understand the respondents' demographics, i.e., country, age, gender,
education level, employment level, etc. It's essential to explore these variables to understand how representative the survey is of the worldwide
programming community. A survey of this scale generally tends to have some selection bias.

Let's begin by importing matplotlib.pyplot and seaborn.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Country
Let's look at the number of countries from which there are responses in the survey and plot the 15 countries with the highest number of
responses.

schema.Country

'Where do you live?'

survey_df.Country.nunique()

183

We can identify the countries with the highest number of respondents using the value_counts method.

top_countries = survey_df.Country.value_counts().head(15)
top_countries

United States 12371


India 8364
United Kingdom 3881
Germany 3864
Canada 2175
France 1884
Brazil 1804
Netherlands 1332
Poland 1259
Australia 1199
Spain 1157
Italy 1115
Russian Federation 1085


Sweden 879
Pakistan 802
Name: Country, dtype: int64
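Raw counts make it hard to judge how dominant the top countries are. Passing normalize=True to value_counts gives each country's share of the respondents who answered the question. A sketch on a hypothetical toy Country column:

```python
import pandas as pd

# Hypothetical Country answers, including one blank.
country = pd.Series(["United States", "India", "United States", "Germany", None])

# normalize=True divides each count by the number of non-null answers.
share_pct = country.value_counts(normalize=True).mul(100).round(1)
print(share_pct)
```

On the notebook's data, survey_df.Country.value_counts(normalize=True).head(15) would give the same breakdown as top_countries, expressed as fractions.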

We can visualize this information using a bar chart.

plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title(schema.Country)
sns.barplot(x=top_countries.index, y=top_countries);


It appears that a disproportionately high number of respondents are from the US and India, probably because the survey is in English, and these
countries have the highest English-speaking populations. We can already see that the survey may not be representative of the global
programming community - especially from non-English speaking countries. Programmers from non-English speaking countries are almost
certainly underrepresented.

Exercise: Try finding the percentage of responses from English-speaking vs. non-English speaking countries. You can use this list of languages
spoken in different countries.

Age
The distribution of respondents' age is another crucial factor to look at. We can use a histogram to visualize it.

plt.figure(figsize=(12, 6))
plt.title(schema.Age)
plt.xlabel('Age')
plt.ylabel('Number of respondents')

plt.hist(survey_df.Age, bins=np.arange(10,80,5), color='purple');


It appears that a large percentage of respondents are 20-45 years old. This is somewhat representative of the programming community in general,
since many young people have taken up computer science as their field of study or profession in the last 20 years.
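The bins argument also determines which respondents appear in the histogram at all. np.arange(10, 80, 5) produces edges from 10 to 75, so ages above 75 fall outside the last bin and are not drawn. plt.hist bins its data the same way np.histogram does, so the effect can be checked directly on a few hypothetical ages:

```python
import numpy as np

# The bin edges used in the plt.hist call above: 10, 15, ..., 75.
bins = np.arange(10, 80, 5)

# np.histogram uses the same edges; the last bin [70, 75] includes 75.
ages = np.array([12, 29, 29, 44, 74, 75, 80])  # hypothetical ages
counts, edges = np.histogram(ages, bins=bins)
print(counts.sum())  # 6: the age of 80 lies beyond the last edge
```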

Exercise: You may want to filter out responses by age (or age group) if you'd like to analyze and compare the survey results for different age
groups. Create a new column called AgeGroup containing values like Less than 10 years , 10-18 years , 18-30 years , 30-45 years ,
45-60 years and Older than 60 years . Then, repeat the analysis in the rest of this notebook for each age group.
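One way to build the AgeGroup column is pd.cut with explicit bin edges and labels. The exercise does not say which group a boundary age such as 18 belongs to. The sketch assumes left-closed bins (right=False), so 18 lands in "18-30 years" and 60 in "Older than 60 years". The ages are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; in the notebook this would be survey_df.Age.
ages = pd.Series([8, 10, 17, 18, 29, 30, 45, 59, 60, 72, np.nan])

labels = ["Less than 10 years", "10-18 years", "18-30 years",
          "30-45 years", "45-60 years", "Older than 60 years"]

# right=False gives bins [0, 10), [10, 18), ..., [60, inf).
# This boundary convention is an assumption, not part of the exercise.
age_group = pd.cut(ages, bins=[0, 10, 18, 30, 45, 60, np.inf],
                   labels=labels, right=False)
print(age_group.value_counts(sort=False))
```

Applied to the notebook, this would read survey_df['AgeGroup'] = pd.cut(survey_df.Age, ...). Missing ages stay NaN.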


Gender
Let's look at the distribution of responses for the Gender. It's a well-known fact that women and non-binary genders are underrepresented in the
programming community, so we might expect to see a skewed distribution here.

schema.Gender

'Which of the following describe you, if any? Please check all that apply. If you prefer not to answer, you may
leave this question blank.'

gender_counts = survey_df.Gender.value_counts()
gender_counts

Man 45895
Woman 3835
Non-binary, genderqueer, or gender non-conforming 385
Name: Gender, dtype: int64

A pie chart would be a great way to visualize the distribution.



plt.figure(figsize=(12,6))
plt.title(schema.Gender)
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%', startangle=180);


Only about 8% of survey respondents who have answered the question identify as women or non-binary. This number is lower than the overall
percentage of women & non-binary genders in the programming community - which is estimated to be around 12%.
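The "about 8%" figure can be checked directly from the gender_counts output shown earlier. The counts below are copied from that output.

```python
import pandas as pd

# Counts copied from the gender_counts output above.
gender_counts = pd.Series({
    "Man": 45895,
    "Woman": 3835,
    "Non-binary, genderqueer, or gender non-conforming": 385,
})

# Share of respondents (among those who answered) who are not men.
share = gender_counts.drop("Man").sum() / gender_counts.sum()
print(f"{share:.1%}")  # 8.4%
```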

Exercise: It would be interesting to compare the survey responses & preferences across genders. Repeat this analysis with these breakdowns.
How do the relative education levels differ across genders? How do the salaries vary? You may find this analysis on the Gender Divide in Data
Science useful.

Education Level
Formal education in computer science is often considered an essential requirement for becoming a programmer. However, there are many free
resources & tutorials available online to learn programming. Let's compare the education levels of respondents to gain some insight into this.
We'll use a horizontal bar plot here.

sns.countplot(y=survey_df.EdLevel)
plt.xticks(rotation=75);
plt.title(schema['EdLevel'])
plt.ylabel(None);


It appears that well over half of the respondents hold a bachelor's or master's degree, so most programmers seem to have some college
education. However, it's not clear from this graph alone if they hold a degree in computer science.

Exercises: The graph currently shows the number of respondents for each option. Can you modify it to show the percentage instead? Further,
try comparing the percentages for each degree for men vs. women.
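Both parts of this exercise can be approached with standard pandas calls. value_counts(normalize=True) gives overall percentages, and pd.crosstab with normalize="columns" gives a per-gender breakdown in which each gender column sums to 100%. The rows and shortened labels below are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for survey_df[['EdLevel', 'Gender']].
df = pd.DataFrame({
    "EdLevel": ["Bachelor's", "Master's", "Bachelor's", "Bachelor's", "Master's", None],
    "Gender":  ["Man", "Man", "Woman", "Woman", "Woman", "Man"],
})

# Share of each education level among respondents who answered.
ed_pct = df.EdLevel.value_counts(normalize=True) * 100

# Rows with a missing EdLevel are left out of the crosstab.
ed_by_gender = pd.crosstab(df.EdLevel, df.Gender, normalize="columns") * 100

print(ed_pct)
print(ed_by_gender.round(1))
```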

Let's also plot undergraduate majors, but this time we'll convert the numbers into percentages and sort the values to make it easier to visualize
the order.

schema.UndergradMajor

'What was your primary field of study?'

undergrad_pct = survey_df.UndergradMajor.value_counts() * 100 / survey_df.UndergradMajor.count()

sns.barplot(x=undergrad_pct, y=undergrad_pct.index)

plt.title(schema.UndergradMajor)
plt.ylabel(None);
plt.xlabel('Percentage');


It turns out that 40% of programmers holding a college degree have a field of study other than computer science, which is very encouraging. It
seems to suggest that while a college education is helpful in general, you do not need to pursue a major in computer science to become a
successful programmer.
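The non-computer-science share can also be computed directly rather than read off the chart. The sketch assumes the computer-science option is the only label starting with "Computer science", and it uses hypothetical, shortened answers.

```python
import pandas as pd

# Hypothetical UndergradMajor answers (labels shortened for illustration).
majors = pd.Series([
    "Computer science, computer engineering, or software engineering",
    "Another engineering discipline",
    "Computer science, computer engineering, or software engineering",
    "Mathematics or statistics",
    "Web development or web design",
    None,
])

# Among respondents who answered, the fraction whose major is not CS.
answered = majors.dropna()
non_cs_pct = (~answered.str.startswith("Computer science")).mean() * 100
print(round(non_cs_pct, 1))  # 60.0
```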

Exercises: Analyze the NEWEdImpt column for respondents who hold some college degree vs. those who don't. Do you notice any difference in
opinion?

Employment
Freelancing or contract work is a common choice among programmers, so it would be interesting to compare the breakdown between full-time,
part-time, and freelance work. Let's visualize the data from the Employment column.

schema.Employment

'Which of the following best describes your current employment status?'

(survey_df.Employment.value_counts(normalize=True, ascending=True)*100).plot(kind='barh', color='g')


plt.title(schema.Employment)
plt.xlabel('Percentage');

It appears that close to 10% of respondents are employed part time or as freelancers.

Exercise: Add a new column EmploymentType containing the values Enthusiast (student or not employed but looking for work),
Professional (employed full-time, part-time or freelancing), and Other (not employed or retired). For each of the graphs that follow, show a
comparison between Enthusiast and Professional .
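A dictionary passed to Series.map is one straightforward way to derive EmploymentType. The answer labels in the mapping are an assumption about the survey's wording and should be checked against survey_df.Employment.unique() first. Any label missing from the dictionary maps to NaN.

```python
import pandas as pd

# ASSUMPTION: these are the Employment answer labels; verify them with
# survey_df.Employment.unique() before relying on this mapping.
employment_type = {
    "Student": "Enthusiast",
    "Not employed, but looking for work": "Enthusiast",
    "Employed full-time": "Professional",
    "Employed part-time": "Professional",
    "Independent contractor, freelancer, or self-employed": "Professional",
    "Not employed, and not looking for work": "Other",
    "Retired": "Other",
}

employment = pd.Series(["Employed full-time", "Student", "Retired", None])  # toy values
result = employment.map(employment_type)  # unmapped or missing values become NaN
print(result)
```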


The DevType field contains information about the roles held by respondents. Since the question allows multiple answers, the column contains
lists of values separated by a semi-colon ; , making it a bit harder to analyze directly.

schema.DevType

'Which of the following describe you? Please select all that apply.'

survey_df.DevType.value_counts()

Developer, full-stack
4396
Developer, back-end
3056
Developer, back-end;Developer, front-end;Developer, full-stack
2214
Developer, back-end;Developer, full-stack
1465
Developer, front-end
1390

...
Database administrator;Developer, back-end;Developer, front-end;Developer, full-stack;Developer, QA or test;Senior
executive/VP 1
Database administrator;Developer, back-end;Developer, front-end;Developer, full-stack;Product manager;Senior
executive/VP 1
Developer, back-end;Developer, full-stack;Developer, mobile;DevOps specialist;Educator;System administrator
1
Data or business analyst;Database administrator;Developer, back-end;Developer, desktop or enterprise
applications;Developer, front-end;Developer, mobile;Engineering manager 1
Data or business analyst;Developer, mobile;Senior executive/VP;System administrator
1
Name: DevType, Length: 8213, dtype: int64
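The 8213 distinct values above are combinations of roles, not individual roles. If only per-role totals are needed, splitting on ';' and exploding to one row per option is a short alternative to building a wide frame. A sketch on hypothetical answers in the same format:

```python
import pandas as pd

# Hypothetical multi-select answers in the ';'-separated DevType format.
dev_type = pd.Series([
    "Developer, back-end;Developer, full-stack",
    "Developer, full-stack",
    None,
    "Developer, back-end;Developer, front-end;Developer, full-stack",
])

# split() -> list per answer; explode() -> one row per option;
# value_counts() tallies individual roles and skips missing answers.
role_counts = dev_type.str.split(";").explode().value_counts()
print(role_counts)
```

Only ';' separates options. The commas inside labels such as "Developer, back-end" are left alone.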

Let's define a helper function that turns a column containing lists of values (like survey_df.DevType ) into a data frame with one column for
each possible option.

def split_multicolumn(col_series):
    """Split a ';'-separated multi-answer column into one boolean column per option."""
    result_df = col_series.to_frame()
    options = []
    # Iterate over the non-null values of the column
    # (.items() replaces .iteritems(), which was removed in pandas 2.0)
    for idx, value in col_series[col_series.notnull()].items():
        # Break each value into a list of options
        for option in value.split(';'):
            # Add the option as a new column the first time it is seen
            if option not in result_df.columns:
                options.append(option)
                result_df[option] = False
            # Mark the value in the option column as True
            result_df.at[idx, option] = True
    return result_df[options]
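pandas also has a built-in vectorized shortcut for this kind of one-hot split: Series.str.get_dummies. It returns 0/1 columns (cast to bool below), turns missing answers into rows of zeros, and sorts the option columns by name. The split_multicolumn helper above keeps them in first-seen order instead. A sketch on hypothetical answers:

```python
import pandas as pd

# Hypothetical ';'-separated answers.
langs = pd.Series(["C#;HTML/CSS;JavaScript", "JavaScript;Swift", None])

# One column per distinct option; astype(bool) matches the True/False style.
dummies = langs.str.get_dummies(sep=";").astype(bool)
print(dummies)
```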

dev_type_df = split_multicolumn(survey_df.DevType)

<ipython-input-56-26a445763b0d>:5: FutureWarning: iteritems is deprecated and will be removed in a future version. U


for idx, value in col_series[col_series.notnull()].iteritems():

dev_type_df


       Developer, desktop or    Developer,  Developer,  Designer  Developer,  Developer,  Developer,  DevOps      Developer, game  ...
       enterprise applications  full-stack  mobile                front-end   back-end    QA or test  specialist  or graphics
0      True                     True        False       False     False       False       False       False       False            ...
1      False                    True        True        False     False       False       False       False       False            ...
2      False                    False       False       False     False       False       False       False       False            ...
3      False                    False       False       False     False       False       False       False       False            ...
4      False                    False       False       False     False       False       False       False       False            ...
...    ...                      ...         ...         ...       ...         ...         ...         ...         ...              ...
64456  False                    False       False       False     False       False       False       False       False            ...
64457  False                    False       False       False     False       False       False       False       False            ...
64458  False                    False       False       False     False       False       False       False       False            ...
64459  False                    False       False       False     False       False       False       False       False            ...
64460  False                    False       False       False     False       False       False       False       False            ...

64306 rows × 23 columns

dev_type_df has one column for each option that can be selected as a response. If a respondent has chosen an option, the
corresponding column's value is True. Otherwise, it is False.

We can now use the column-wise totals to identify the most common roles.

dev_type_totals = dev_type_df.sum().sort_values(ascending=False)
dev_type_totals

Developer, back-end 26996


Developer, full-stack 26915
Developer, front-end 18128
Developer, desktop or enterprise applications 11687
Developer, mobile 9406
DevOps specialist 5915
Database administrator 5658
Designer 5262
System administrator 5185
Developer, embedded applications or devices 4701
Data or business analyst 3970
Data scientist or machine learning specialist 3939

Developer, QA or test 3893


Engineer, data 3700
Academic researcher 3502
Educator 2895
Developer, game or graphics 2751
Engineering manager 2699
Product manager 2471
Scientist 2060
Engineer, site reliability 1921
Senior executive/VP 1292
Marketing or sales professional 625
dtype: int64

As one might expect, the most common roles include "Developer" in the name.
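Summing a Boolean data frame counts the True values in each column, because pandas treats True as 1 and False as 0 in arithmetic. A minimal sketch of the step above on a made-up three-respondent frame (the role names are real survey options, but the values and the name `toy_dev_type_df` are invented for illustration):

```python
import pandas as pd

# Hypothetical one-hot data: one row per respondent, one column per role
toy_dev_type_df = pd.DataFrame({
    "Developer, back-end": [True, True, False],
    "Developer, front-end": [True, False, False],
    "Designer": [False, False, False],
})

# True counts as 1 and False as 0, so each column sum is the number
# of respondents who selected that role
toy_totals = toy_dev_type_df.sum().sort_values(ascending=False)
print(toy_totals)
```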

Exercises:

Can you figure out what percentage of respondents work in roles related to data science?
Which positions have the highest percentage of women?

We've only explored a handful of the 20 columns that we selected. Explore and visualize the remaining columns using the empty cells below.

Asking and Answering Questions


We've already gained several insights about the respondents and the programming community by exploring individual columns of the dataset.
Let's ask some specific questions and try to answer them using data frame operations and visualizations.

Q: What are the most popular programming languages in 2020?

To answer this, we can use the LanguageWorkedWith column. Similar to DevType, respondents were allowed to choose multiple options here.

survey_df.LanguageWorkedWith

0 C#;HTML/CSS;JavaScript
1 JavaScript;Swift
2 Objective-C;Python;Swift
3 NaN
4 HTML/CSS;Ruby;SQL
...
64456 NaN
64457 Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G...
64458 NaN
64459 HTML/CSS
64460 C#;HTML/CSS;Java;JavaScript;SQL
Name: LanguageWorkedWith, Length: 64306, dtype: object

First, we'll split this column into a data frame containing a column of each language listed in the options.

languages_worked_df = split_multicolumn(survey_df.LanguageWorkedWith)

<ipython-input-56-26a445763b0d>:5: FutureWarning: iteritems is deprecated and will be removed in a future version. U
  for idx, value in col_series[col_series.notnull()].iteritems():
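The FutureWarning above comes from `Series.iteritems`, which pandas deprecated in favour of `Series.items` and removed in pandas 2.0, so on newer versions `split_multicolumn` would raise an error instead of warning if it still calls `.iteritems()`. Changing that call to `.items()` is the smallest fix. As an alternative, pandas' built-in `Series.str.get_dummies(sep=...)` performs the same kind of split in one call. The function below is a hypothetical replacement (the name `split_multicolumn_dummies` is ours, not the notebook's); unlike the output shown below, its columns come out in alphabetical order, and rows with a missing value come out as all False.

```python
import pandas as pd

def split_multicolumn_dummies(col_series: pd.Series) -> pd.DataFrame:
    """Split a ';'-separated multiple-choice column into one Boolean column per option.

    Hypothetical alternative to split_multicolumn. Missing values (NaN/None)
    produce a row of all False. Columns are sorted alphabetically.
    """
    # get_dummies returns 0/1 integers; convert them to True/False
    return col_series.str.get_dummies(sep=";").astype(bool)

# Usage on a tiny made-up column
sample = pd.Series(["C#;HTML/CSS;JavaScript", "JavaScript;Swift", None])
print(split_multicolumn_dummies(sample))
```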

languages_worked_df


       C#     HTML/CSS  JavaScript  Swift  Objective-C  Python  Ruby   SQL    Java   PHP    ...  VBA    Perl   Scala  C++
0      True   True      True        False  False        False   False  False  False  False  ...  False  False  False  False
1      False  False     True        True   False        False   False  False  False  False  ...  False  False  False  False
2      False  False     False       True   True         True    False  False  False  False  ...  False  False  False  False
3      False  False     False       False  False        False   False  False  False  False  ...  False  False  False  False
4      False  True      False       False  False        False   True   True   False  False  ...  False  False  False  False
...    ...    ...       ...         ...    ...          ...     ...    ...    ...    ...    ...  ...    ...    ...    ...
64456  False  False     False       False  False        False   False  False  False  False  ...  False  False  False  False
64457  True   True      True        True   True         True    True   True   True   True   ...  True   True   True   True
64458  False  False     False       False  False        False   False  False  False  False  ...  False  False  False  False
64459  False  True      False       False  False        False   False  False  False  False  ...  False  False  False  False
64460  True   True      True        False  False        False   False  True   True   False  ...  False  False  False  False

64306 rows × 25 columns

It appears that a total of 25 languages were included among the options. Let's aggregate these to identify the percentage of respondents who
selected each language.

languages_worked_percentages = languages_worked_df.mean().sort_values(ascending=False) * 100
languages_worked_percentages

JavaScript 59.893323
HTML/CSS 55.801947
SQL 48.444935
Python 39.001026
Java 35.618760
Bash/Shell/PowerShell 29.239884
C# 27.803004
PHP 23.130035
TypeScript 22.461357
C++ 21.114670
C 19.236152
Go 7.758219
Kotlin 6.887382
Ruby 6.229590
Assembly 5.447392
VBA 5.394520
Swift 5.226573
R 5.064846
Rust 4.498803
Objective-C 3.603085
Dart 3.517557
Scala 3.150561
Perl 2.757130
Haskell 1.861413
Julia 0.782198
dtype: float64
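Taking the mean of a Boolean column gives the fraction of True values (True counts as 1, False as 0), so multiplying by 100 turns it into a percentage. Because respondents who skipped the question have all-False rows, these percentages are relative to all 64,306 respondents, not only to those who answered. The sketch below uses made-up answers (the names `answers`, `toy_df`, and `answered` are illustrative) to show both the calculation and how the denominator changes if non-respondents are excluded first.

```python
import pandas as pd

# Made-up answers from five respondents; the last one skipped the question
answers = pd.Series(["Python;SQL", "Python", "SQL;Rust", "Python;Rust", None])
toy_df = answers.str.get_dummies(sep=";").astype(bool)

# Percentage of *all* respondents (non-respondents count as all False)
pct_all = toy_df.mean().sort_values(ascending=False) * 100

# Percentage of respondents who actually answered the question
answered = answers.notnull()
pct_answered = toy_df[answered].mean().sort_values(ascending=False) * 100

print(pct_all)       # Python 60, SQL and Rust 40 each
print(pct_answered)  # Python 75, SQL and Rust 50 each
```

Which denominator is more appropriate depends on the question being asked; the analysis here uses all respondents.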

We can plot this information using a horizontal bar chart.

plt.figure(figsize=(12, 12))
sns.barplot(x=languages_worked_percentages, y=languages_worked_percentages.index)
plt.title("Languages used in the past year");
plt.xlabel('Percentage of respondents (%)');


Perhaps unsurprisingly, JavaScript and HTML/CSS come out on top, as web development is one of today's most sought-after skills. It also happens to be one of the easiest areas to get started in. SQL is necessary for working with relational databases, so it's no surprise that nearly half of the respondents work with SQL regularly. Python seems to be the popular choice for other forms of development, beating out Java, which was the industry standard for server and application development for over two decades.

Exercises:

What are the most common languages used by students? How does the list compare with the most common languages used by
professional developers?
What are the most common languages among respondents who do not describe themselves as "Developer, front-end"?
What are the most common languages among respondents who work in fields related to data science?
What are the most common languages used by developers older than 35 years of age?
What are the most common languages used by developers in your home country?
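Each of these exercises follows the same pattern: build a Boolean mask over respondents, select the matching rows of a one-hot language frame such as languages_worked_df, and take the column-wise mean. One possible starting point, sketched on made-up data for the second exercise; the frame `toy_survey` and its values are invented, and treating a missing DevType as "not front-end" is an assumption you may want to change.

```python
import pandas as pd

# Made-up survey rows: each respondent's roles and languages
toy_survey = pd.DataFrame({
    "DevType": ["Developer, front-end", "Developer, back-end", None,
                "Data scientist or machine learning specialist"],
    "LanguageWorkedWith": ["JavaScript;HTML/CSS", "Java;SQL", "Python", "Python;SQL"],
})
languages_df = toy_survey.LanguageWorkedWith.str.get_dummies(sep=";").astype(bool)

# Mask: respondents who do NOT list "Developer, front-end".
# na=False makes a missing DevType count as "not front-end".
not_front_end = ~toy_survey.DevType.str.contains("Developer, front-end", regex=False, na=False)

# Percentage of the selected respondents using each language
subset_pct = languages_df[not_front_end].mean().sort_values(ascending=False) * 100
print(subset_pct)
```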

Q: Which languages are most people interested in learning over the next year?

For this, we can use the LanguageDesireNextYear column, processing it the same way as the previous one.

languages_interested_df = split_multicolumn(survey_df.LanguageDesireNextYear)
languages_interested_percentages = languages_interested_df.mean().sort_values(ascending=False) * 100
languages_interested_percentages

<ipython-input-56-26a445763b0d>:5: FutureWarning: iteritems is deprecated and will be removed in a future version. U
  for idx, value in col_series[col_series.notnull()].iteritems():
Python 41.143906
JavaScript 40.425466
HTML/CSS 32.028116
SQL 30.799614
TypeScript 26.451653
C# 21.058688
Java 20.464653
Go 19.432090
Bash/Shell/PowerShell 18.057413
Rust 16.270643
C++ 15.014151
Kotlin 14.760676
PHP 10.947657
C 9.359935
Swift 8.692812
Dart 7.308805
R 6.571704
Ruby 6.425528
Scala 5.326097
Haskell 4.593662
Assembly 3.766367
Julia 2.540976
Objective-C 2.338818
Perl 1.761888
VBA 1.611047
dtype: float64

plt.figure(figsize=(12, 12))
sns.barplot(x=languages_interested_percentages, y=languages_interested_percentages.index)
plt.title("Languages people are interested in learning over the next year");
plt.xlabel('Percentage of respondents (%)');


Once again, it's not surprising that Python is the language most people are interested in learning, since it is an easy-to-learn, general-purpose programming language well suited to a variety of domains: application development, numerical computing, data analysis, machine learning, big data, cloud automation, web scraping, scripting, etc. We're using Python for this very analysis, so we're in good company!

Exercises: Repeat the exercises from the previous question, replacing "most common languages" with "languages people are interested in
learning/using."

Q: Which are the most loved languages, i.e., languages that a high percentage of the people who have used them want to continue learning and using over the next year?
While this question may seem tricky at first, it's straightforward to solve using Pandas array operations. Here's what we can do:

Create a new data frame languages_loved_df that contains a True value for a language only if the corresponding values in languages_worked_df and languages_interested_df are both True.
Take the column-wise sum of languages_loved_df and divide it by the column-wise sum of languages_worked_df to get the percentage of each language's past users who "love" it.
Sort the results in decreasing order and plot a horizontal bar graph.
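The three steps above can be checked on a small made-up example where the expected percentages are easy to verify by hand: two respondents used Rust and both want to keep using it, while three used Perl and only one wants to continue. The frames `worked` and `interested` are invented stand-ins for languages_worked_df and languages_interested_df; only the sorting part of the last step is shown.

```python
import pandas as pd

# Made-up data: four respondents, two languages
worked = pd.DataFrame({"Rust": [True, True, False, False],
                       "Perl": [True, True, True, False]})
interested = pd.DataFrame({"Rust": [True, True, True, False],
                           "Perl": [False, True, False, False]})

# Step 1: True only where a respondent used the language AND wants to keep using it
loved = worked & interested

# Step 2: loved users as a percentage of past users, per language
# Step 3 (sorting part): decreasing order
loved_pct = (loved.sum() * 100 / worked.sum()).sort_values(ascending=False)
print(loved_pct)  # Rust 100.0, Perl about 33.3
```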

languages_loved_df = languages_worked_df & languages_interested_df

languages_loved_percentages = (languages_loved_df.sum() * 100 / languages_worked_df.sum()).sort_values(ascending=False)

plt.figure(figsize=(12, 12))
sns.barplot(x=languages_loved_percentages, y=languages_loved_percentages.index)
plt.title("Most loved languages");
plt.xlabel('Percentage of past users (%)');


Rust has been Stack Overflow's most-loved language for five years in a row. The second most-loved language is TypeScript, a popular alternative to JavaScript for web development.

Python features at number 3, despite already being one of the most widely-used languages in the world. Python has a solid foundation, is easy
to learn & use, has a large ecosystem of domain-specific libraries, and a massive worldwide community.

Exercises: What are the most dreaded languages, i.e., languages that people have used in the past year but do not want to learn or use over the next year? Hint: ~languages_interested_df.
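One possible way to approach this exercise, sketched with the same kind of made-up frames (`worked` and `interested` are invented stand-ins): `~` inverts every Boolean, so `worked & ~interested` marks respondents who used a language but did not select it for next year. One caveat, assuming split_multicolumn leaves rows with missing answers as all False (as the tables above show): on the real data, `~languages_interested_df` also marks respondents who skipped LanguageDesireNextYear as "not interested", which may inflate the dreaded percentages. Restricting both frames to respondents who answered both questions avoids that.

```python
import pandas as pd

# Made-up data: four respondents, two languages
worked = pd.DataFrame({"Rust": [True, True, False, False],
                       "Perl": [True, True, True, False]})
interested = pd.DataFrame({"Rust": [True, True, True, False],
                           "Perl": [False, True, False, False]})

# ~ flips every Boolean, so ~interested marks "does NOT want to use it next year"
dreaded = worked & ~interested

# Share of past users who do not want to keep using the language
dreaded_pct = (dreaded.sum() * 100 / worked.sum()).sort_values(ascending=False)
print(dreaded_pct)  # Perl about 66.7, Rust 0.0
```

For each language, the loved and dreaded percentages computed this way add up to 100.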

Q: In which countries do developers work the highest number of hours per week? Consider countries with more than 250
responses only.

To answer this question, we'll need to use the groupby data frame method to aggregate the rows for each country. We'll also need to filter the
results to only include the countries with more than 250 respondents.

countries_df = survey_df.groupby('Country')[['WorkWeekHrs']].mean().sort_values('WorkWeekHrs', ascending=False)

high_response_countries_df = countries_df.loc[survey_df.Country.value_counts() > 250].head(15)

high_response_countries_df

               WorkWeekHrs
Country
Iran             44.337748
Israel           43.915094
China            42.150000
United States    41.802982
Greece           41.402724
Viet Nam         41.391667
South Africa     41.023460
Turkey           40.982143
Sri Lanka        40.612245
New Zealand      40.457551
Belgium          40.444444
Canada           40.208837
Hungary          40.194340
Bangladesh       40.097458
India            40.090603

Asian countries such as Iran, Israel, and China have the highest average working hours, followed by the United States. However, there isn't much variation overall: among these 15 countries, the averages range only from about 40 to 44 hours per week.
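The `.loc` filter in the cell above works because pandas aligns the Boolean Series returned by `survey_df.Country.value_counts() > 250` with the country index of countries_df, and both contain the same set of countries. A more explicit variant computes the mean and the number of responses in a single `groupby` call using named aggregation, then filters on the count. A minimal sketch with made-up countries "A", "B", "C" and a threshold of 1 instead of 250 so the tiny example has something to filter:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C"],
    "WorkWeekHrs": [40.0, 45.0, 50.0, 38.0, 42.0, 60.0],
})

# Mean hours and number of responses per country in one pass.
# "size" counts all rows per country, like value_counts() on Country.
stats = toy.groupby("Country").agg(
    WorkWeekHrs=("WorkWeekHrs", "mean"),
    Responses=("WorkWeekHrs", "size"),
)

# Keep countries with more than 1 response (250 in the analysis above),
# then sort by average hours
high_response = stats[stats.Responses > 1].sort_values("WorkWeekHrs", ascending=False)
print(high_response)
```

Country "C" has the highest average but only one response, so it is dropped, which is exactly what the response threshold is meant to guard against.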

Exercises:

How do the average work hours compare across continents? You may find this list of countries in each continent useful.
Which role has the highest average number of hours worked per week? Which one has the lowest?
How do the hours worked compare between freelancers and developers working full-time?

Q: How important is it to start young to build a career in programming?

Let's create a scatter plot of Age vs. YearsCodePro (i.e., years of coding experience) to answer this question.

schema.YearsCodePro

'NOT including education, how many years have you coded professionally (as a part of your work)?'

sns.scatterplot(x='Age', y='YearsCodePro', hue='Hobbyist', data=survey_df)
plt.xlabel("Age")
plt.ylabel("Years of professional coding experience");

You can see points all over the graph, including respondents of many different ages with only a few years of professional experience, which suggests that you can start programming professionally at any age. Many people who have been coding professionally for several decades also seem to enjoy it as a hobby.
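One way to make this observation concrete is to subtract years of professional experience from age, which gives a rough estimate of the age at which each respondent started coding professionally. The sketch below assumes both columns have already been converted to numbers (for example with `pd.to_numeric`) and uses made-up values; the column name `ProStartAge` is our own, not part of the survey.

```python
import pandas as pd

# Made-up respondents; the last one did not report professional experience
toy = pd.DataFrame({"Age": [22.0, 35.0, 50.0, 41.0],
                    "YearsCodePro": [1.0, 12.0, 3.0, None]})

# Rough age at which each respondent began coding professionally
toy["ProStartAge"] = toy["Age"] - toy["YearsCodePro"]

# describe() ignores missing values (NaN) when summarizing
print(toy["ProStartAge"].describe())
```

A wide spread in `ProStartAge` on the real data would support the point that people start professionally at many different ages.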

We can also view the distribution of the Age1stCode column to see when the respondents tried programming for the first time.

plt.title(schema.Age1stCode)
sns.histplot(x=survey_df.Age1stCode, bins=30, kde=True);

As you might expect, most people seem to have had some exposure to programming before the age of 40. However, there are people of all ages and walks of life learning to code.

Exercises:

How does programming experience change opinions & preferences? Repeat the entire analysis while comparing the responses of people
who have more than ten years of professional programming experience vs. those who don't. Do you see any interesting trends?
Compare the years of professional coding experience across different genders.

Hopefully, you are already thinking of many more questions you'd like to answer using this data. Use the empty cells below to ask and answer
more questions.

Let's save and commit our work before continuing.

import jovian

jovian.commit()


[jovian] Detected Colab notebook...


[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian,
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian.
Also, you can also delete this cell, it's no longer necessary.

Inferences and Conclusions


We've drawn many inferences from the survey. Here's a summary of a few of them:

Based on the survey respondents' demographics, we can infer that the survey is somewhat representative of the overall programming
community. However, it has fewer responses from programmers in non-English-speaking countries and from women and non-binary people.

The programming community is not as diverse as it could be. Although things are improving, we should make more efforts to support &
encourage underrepresented communities, whether in terms of age, country, race, gender, or otherwise.

Although most programmers hold a college degree, a reasonably large percentage did not have computer science as their college major.
Hence, a computer science degree isn't compulsory for learning to code or building a career in programming.

A significant percentage of programmers either work part-time or as freelancers, which can be a great way to break into the field,
especially when you're just getting started.

JavaScript and HTML/CSS were the most used languages in 2020, followed by SQL and Python.

Python is the language most people are interested in learning, since it is an easy-to-learn, general-purpose programming language well suited to various domains.

Rust and TypeScript are the most "loved" languages in 2020, and both have fast-growing communities. Python is a close third, despite already being a widely used language.
Programmers worldwide seem to be working for around 40 hours a week on average, with slight variations by country.

You can learn and start programming professionally at any age, and many programmers with decades of professional experience also enjoy programming as a hobby.


Exercises
There's a wealth of information to be discovered using the survey, and we've barely scratched the surface. Here are some ideas for further
exploration:

Repeat the analysis for different age groups & genders, and compare the results
Pick a different set of columns (we chose 20 out of 65) to analyze other facets of the data
Prepare an analysis focusing on diversity, and identify areas where underrepresented communities are on par with the majority (e.g.,
education) and where they aren't (e.g., salaries)
Compare the results of this year's survey with the previous years and identify interesting trends

References and Future Work


Check out the following resources to learn more about the dataset and tools used in this notebook:

Stack Overflow Developer Survey: https://insights.stackoverflow.com/survey
Pandas user guide: https://pandas.pydata.org/docs/user_guide/index.html
Matplotlib user guide: https://matplotlib.org/3.3.1/users/index.html
Seaborn user guide & tutorial: https://seaborn.pydata.org/tutorial.html
opendatasets Python library: https://github.com/JovianML/opendatasets

As a next step, you can try out a project on another dataset of your choice: https://jovian.ml/aakashns/zerotopandas-course-project-starter

import jovian

jovian.commit()

[jovian] Detected Colab notebook...


[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian,
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian.
Also, you can also delete this cell, it's no longer necessary.
