DSBDA Mini Project - Ipynb - Colab
4 NaN NaN
[5 rows x 24 columns]
[5]: # Last five rows
print("The last five rows are: ")
data.tail()
45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) \
7840 NaN NaN
7841 NaN NaN
7842 NaN NaN
7843 NaN NaN
7844 NaN NaN
[5 rows x 24 columns]
'Total Individuals Vaccinated'],
dtype='object')
[8]: data.describe()
max 98275.000000 6.236742e+07
25% 5.739350e+06 5.023407e+06
50% 3.716590e+07 3.365402e+07
75% 7.441663e+07 6.685368e+07
max 1.349420e+08 1.156684e+08
[8 rows x 22 columns]
[9]: data.describe(include='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7845 entries, 0 to 7844
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Updated On 7845 non-null object
1 State 7845 non-null object
2 Total Doses Administered 7621 non-null float64
3 Sessions 7621 non-null float64
4 Sites 7621 non-null float64
5 First Dose Administered 7621 non-null float64
6 Second Dose Administered 7621 non-null float64
7 Male (Doses Administered) 7461 non-null float64
8 Female (Doses Administered) 7461 non-null float64
9 Transgender (Doses Administered) 7461 non-null float64
10 Covaxin (Doses Administered) 7621 non-null float64
11 CoviShield (Doses Administered) 7621 non-null float64
12 Sputnik V (Doses Administered) 2995 non-null float64
13 AEFI 5438 non-null float64
14 18-44 Years (Doses Administered) 1702 non-null float64
15 45-60 Years (Doses Administered) 1702 non-null float64
16 60+ Years (Doses Administered) 1702 non-null float64
17 18-44 Years(Individuals Vaccinated) 3733 non-null float64
18 45-60 Years(Individuals Vaccinated) 3734 non-null float64
19 60+ Years(Individuals Vaccinated) 3734 non-null float64
20 Male(Individuals Vaccinated) 160 non-null float64
21 Female(Individuals Vaccinated) 160 non-null float64
22 Transgender(Individuals Vaccinated) 160 non-null float64
23 Total Individuals Vaccinated 5919 non-null float64
dtypes: float64(22), object(2)
memory usage: 1.4+ MB
[11]: data.isnull().sum()
[11]: Updated On 0
State 0
Total Doses Administered 224
Sessions 224
Sites 224
First Dose Administered 224
Second Dose Administered 224
Male (Doses Administered) 384
Female (Doses Administered) 384
Transgender (Doses Administered) 384
Covaxin (Doses Administered) 224
CoviShield (Doses Administered) 224
Sputnik V (Doses Administered) 4850
AEFI 2407
18-44 Years (Doses Administered) 6143
45-60 Years (Doses Administered) 6143
60+ Years (Doses Administered) 6143
18-44 Years(Individuals Vaccinated) 4112
45-60 Years(Individuals Vaccinated) 4111
60+ Years(Individuals Vaccinated) 4111
Male(Individuals Vaccinated) 7685
Female(Individuals Vaccinated) 7685
Transgender(Individuals Vaccinated) 7685
Total Individuals Vaccinated 1926
dtype: int64
The dataset contains many null values, so we need to replace them with the mean (for numerical
data) or the mode (for categorical data). Here we work on "First Dose Administered" and
"Second Dose Administered". Both columns are of type float, so we replace their NaN values
with the column mean (average).
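The mean/mode rule described above can be sketched as follows. This is a minimal illustration, not the notebook's own cell: the frame `toy` and its values are made up, and only the column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are invented.
toy = pd.DataFrame({
    "State": ["Goa", "Goa", None, "Kerala"],
    "First Dose Administered": [100.0, np.nan, 300.0, 200.0],
    "Second Dose Administered": [10.0, 20.0, np.nan, np.nan],
})

for col in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[col]):
        fill_value = toy[col].mean()      # mean() skips NaN by default
    else:
        fill_value = toy[col].mode()[0]   # most frequent non-null value
    toy[col] = toy[col].fillna(fill_value)

print(toy)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is best treated as a simple baseline.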
print("Average of First Dose:", avg_firstdose)
C:\Users\gunsm\AppData\Local\Temp\ipykernel_28768\3112630467.py:2:
FutureWarning: A value is trying to be set on a copy of a DataFrame or Series
through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work
because the intermediate object on which we are setting values always behaves as
a copy.
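This warning is usually triggered by a call of the form `df[col].fillna(value, inplace=True)`. The imputing line itself is not visible in this export, so that is an assumption. The sketch below shows two forms that should avoid the warning, using a made-up frame named `toy`.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame; the values are invented for illustration.
toy = pd.DataFrame({"First Dose Administered": [1.0, np.nan, 3.0]})
avg_firstdose = toy["First Dose Administered"].mean()

# Option 1: fill the column and assign the result back to the frame.
option1 = toy.copy()
option1["First Dose Administered"] = (
    option1["First Dose Administered"].fillna(avg_firstdose)
)

# Option 2: call fillna on the frame itself with a column-to-value mapping.
option2 = toy.copy()
option2.fillna({"First Dose Administered": avg_firstdose}, inplace=True)

print(option1.equals(option2))
```

Both options operate on the DataFrame directly rather than on an intermediate Series, which is the situation the warning describes.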
Male (Doses Administered) Female (Doses Administered) \
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
... ... ...
7840 NaN NaN
7841 NaN NaN
7842 NaN NaN
7843 NaN NaN
7844 NaN NaN
7841 NaN
7842 NaN
7843 NaN
7844 NaN
[14]: # Average of Second Dose Administered
avg_seconddose = data["Second Dose Administered"].astype("float").mean(axis = 0)
print("Average of Second Dose:", avg_seconddose)
C:\Users\gunsm\AppData\Local\Temp\ipykernel_28768\81232257.py:2: FutureWarning:
A value is trying to be set on a copy of a DataFrame or Series through chained
assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work
because the intermediate object on which we are setting values always behaves as
a copy.
7843 7.414415e+06 1.773755e+06
7844 7.414415e+06 1.773755e+06
4 NaN
... ...
7840 NaN
7841 NaN
7842 NaN
7843 NaN
7844 NaN
[7845 rows x 24 columns]
0.0.2 Number of persons vaccinated state-wise with the first dose in India
0.0.3 Number of persons vaccinated state-wise with the second dose in India
0.0.4 Number of Males vaccinated
[18]: male = data["Male(Individuals Vaccinated)"].sum()
print("The total number of male individuals vaccinated is", int(male))
[20]: # Convert the 'Updated On' column to datetime (adjust the format if necessary)
data["Updated On"] = pd.to_datetime(data["Updated On"], format="%d/%m/%Y", errors='coerce')
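With `errors='coerce'`, any entry that does not match the given format becomes `NaT` instead of raising an error, so it is worth counting those entries after the conversion. A small sketch, assuming a made-up Series `raw` in place of the real "Updated On" column:

```python
import pandas as pd

# Hypothetical sample: two valid dates, one in another format, one non-date.
raw = pd.Series(["16/01/2021", "17/01/2021", "2021-01-18", "not a date"])
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
print("Unparseable entries:", parsed.isna().sum())
```

A non-zero count suggests the format string needs adjusting or that the column mixes several date formats.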
[22]: # Group by state and sum total doses
state_vaccine_data = data.groupby('State')['Total Doses Administered'].sum().sort_values(ascending=False)
plt.xticks(rotation=90)
plt.xlabel("State")
plt.ylabel("Total Doses Administered")
plt.title("Top 15 States by Vaccine Doses Administered")
plt.show()
C:\Users\gunsm\AppData\Local\Temp\ipykernel_28768\1155984545.py:6:
FutureWarning:
sns.barplot(x=state_vaccine_data.head(15).index,
y=state_vaccine_data.head(15).values, palette="coolwarm")
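One caveat about the grouping in this cell: `.sum()` adds up every dated row for a state. If each row holds a cumulative running total as of its "Updated On" date (an assumption about this dataset that is worth checking, not something the notebook states), summing counts earlier doses repeatedly, and the per-state maximum may be closer to the true total. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical cumulative totals for two states over three dates.
toy = pd.DataFrame({
    "State": ["A", "A", "A", "B", "B", "B"],
    "Total Doses Administered": [10.0, 25.0, 40.0, 5.0, 5.0, 12.0],
})

grouped = toy.groupby("State")["Total Doses Administered"]
summed = grouped.sum()   # adds every running total together
latest = grouped.max()   # largest running total per state

print(pd.DataFrame({"sum of rows": summed, "max of rows": latest}))
```

If the rows turn out to be daily increments rather than running totals, `.sum()` is the right aggregation and no change is needed.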
[23]: # Group by state and sum doses
dose_comparison = data.groupby('State')[['First Dose Administered', 'Second Dose Administered']].sum()
plt.xlabel("State")
plt.ylabel("Doses Administered")
plt.title("First vs Second Dose Administered by State")
plt.xticks(rotation=90)
plt.legend(["First Dose", "Second Dose"])
plt.show()
[24]: # Sum gender-wise vaccination
gender_vaccine_data = data[['Male (Doses Administered)', 'Female (Doses Administered)', 'Transgender (Doses Administered)']].sum()
[25]: # Sum up vaccination by age group
age_group_vaccine_data = data[['18-44 Years (Doses Administered)', '45-60 Years (Doses Administered)', '60+ Years (Doses Administered)']].sum()
plt.xlabel("Age Group")
plt.ylabel("Doses Administered")
plt.title("Vaccination by Age Group")
plt.show()
C:\Users\gunsm\AppData\Local\Temp\ipykernel_28768\1697523184.py:6:
FutureWarning:
sns.barplot(x=age_group_vaccine_data.index, y=age_group_vaccine_data.values,
palette="magma")