Descriptive Analytics.ipynb - Colab
import pandas as pd
data = pd.read_csv('/content/sample_data/Inc_Exp_Data (1).csv')
data.head()
(output: first five rows of the table; most cell values were lost in the export)
data.shape
(50, 7)
data.columns
info(): reports the number of rows and columns, the column names, and the non-null count and data type of each column.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Mthly_HH_Income           50 non-null     int64
 1   Mthly_HH_Expense          50 non-null     int64
 2   No_of_Fly_Members         50 non-null     int64
 3   Emi_or_Rent_Amt           50 non-null     int64
 4   Annual_HH_Income          50 non-null     int64
 5   Highest_Qualified_Member  50 non-null     object
 6   No_of_Earning_Members     50 non-null     int64
dtypes: int64(6), object(1)
memory usage: 2.9+ KB
https://colab.research.google.com/drive/1yFLS5fSuCYx2dUpVf3vYKqOb0zfT2epy#printMode=true 1/9
8/23/24, 11:47 AM descriptive analytics.ipynb - Colab
data.describe()
import statistics as st
st.mean(data['Mthly_HH_Income'])
41558
st.variance(data['Mthly_HH_Income'])
681100853.0612245
st.stdev(data['Mthly_HH_Income'])
26097.908978713687
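The variance and standard deviation above are the sample versions (divisor n - 1), and the standard deviation is the square root of the variance. A minimal sketch of that relationship, and of the matching pandas calls, on a small made-up list (the values are illustrative assumptions, not rows from the CSV):

```python
import math
import statistics as st

import pandas as pd

# Hypothetical incomes, for illustration only (not rows from the CSV).
values = [5000, 10000, 20000, 45000, 100000]
incomes = pd.Series(values)

# Sample variance: divisor is n - 1 in both libraries.
sample_var = st.variance(values)
pandas_var = incomes.var()            # ddof=1 is the pandas default

# Population variance: divisor is n.
pop_var = st.pvariance(values)
pandas_pop_var = incomes.var(ddof=0)

# Standard deviation is the square root of the variance.
sample_std = st.stdev(values)

print(sample_var, pandas_var)
print(pop_var, pandas_pop_var)
print(sample_std, math.sqrt(sample_var))
```

On the notebook's DataFrame the pandas equivalents would be data['Mthly_HH_Income'].mean(), .var() and .std().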
data['No_of_Fly_Members'].unique()
array([3, 2, 1, 5, 4, 6, 7])
st.mode(data['No_of_Fly_Members'])
data['No_of_Fly_Members'].value_counts()
No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64
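The mode is the value with the highest count in this frequency table. A minimal sketch that rebuilds a column from the counts printed above (the row order is arbitrary, so this is not the original column) and reads the mode three ways; given these counts, each should report 4:

```python
import statistics as st

import pandas as pd

# Frequency table copied from the value_counts() output above.
counts = {4: 15, 6: 10, 3: 9, 2: 8, 5: 5, 7: 2, 1: 1}

# Rebuild a column with those frequencies (row order is arbitrary here).
members = pd.Series([size for size, n in counts.items() for _ in range(n)])

mode_from_statistics = st.mode(members)
mode_from_counts = members.value_counts().idxmax()
modes_from_pandas = members.mode().tolist()  # a list, because ties are possible

print(mode_from_statistics, mode_from_counts, modes_from_pandas)
```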
st.mode(data['No_of_Earning_Members'])
data['Highest_Qualified_Member'].value_counts()
Highest_Qualified_Member
Graduate 19
Under-Graduate 10
Professional 10
Post-Graduate 6
Illiterate 5
Name: count, dtype: int64
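For a categorical column the counts are often easier to read as shares of the total. A minimal sketch using the counts printed above; on the original column, value_counts(normalize=True) should give the same fractions directly:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {"Graduate": 19, "Under-Graduate": 10, "Professional": 10,
     "Post-Graduate": 6, "Illiterate": 5},
    name="count",
)

# Share of each category, as a fraction and as a percentage.
shares = counts / counts.sum()
percentages = (shares * 100).round(1)

print(percentages)
```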
1. matplotlib.pyplot
2. seaborn
import matplotlib.pyplot as plt

# size of chart (width, height) in inches
plt.figure(figsize=(3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
# x & y axis labels
plt.xlabel('Income')
plt.ylabel('Expenditure')
plt.title('Income vs expenditure')
plt.show()
line plot: each column is plotted against the row index (0 to 49).
plt.figure(figsize=(3,3))
plt.plot(data['Mthly_HH_Income'],label='income' )
plt.plot(data['Mthly_HH_Expense'], label='expenditure')
plt.legend() # show a legend built from the label= values above
plt.show()
pie chart: for categorical variables with few unique values; shows the proportion of each category.
x = data['No_of_Earning_Members'].value_counts()
print(x)
No_of_Earning_Members
1 33
2 12
3 4
4 1
Name: count, dtype: int64
plt.figure(figsize=(3,3))
plt.pie(x,labels=x.index, autopct='%.0f%%' )
plt.show()
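The autopct='%.0f%%' argument formats each wedge's share of the total as a whole-number percentage. A minimal sketch of the same arithmetic in plain Python, using the counts printed above (this mirrors the formatting only; it does not draw anything):

```python
# Counts copied from the value_counts() output above.
counts = {1: 33, 2: 12, 3: 4, 4: 1}
total = sum(counts.values())

# Same percentage arithmetic and format string as autopct='%.0f%%'.
labels = {k: '%.0f%%' % (100 * n / total) for k, n in counts.items()}

print(labels)  # {1: '66%', 2: '24%', 3: '8%', 4: '2%'}
```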
histogram: used for a single numeric variable; the values are divided into intervals (bins), and the height of each bar is the number of values that fall in that bin.
print(data['Mthly_HH_Income'].min())
print(data['Mthly_HH_Income'].max())
5000
100000
plt.figure(figsize=(3,3))
plt.hist(data['Mthly_HH_Income'], bins = 10)
plt.show()
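With bins = 10 the range from the minimum to the maximum is split into 10 equal-width intervals. A minimal sketch of the resulting bin edges, using only the min and max printed above and NumPy (which plt.hist relies on for binning):

```python
import numpy as np

# Minimum, maximum and bin count taken from the cells above.
low, high, bins = 5000, 100000, 10

# Equal-width bins need bins + 1 edges between low and high.
edges = np.linspace(low, high, bins + 1)
width = (high - low) / bins

# NumPy's own helper gives the same edges for data spanning low..high.
numpy_edges = np.histogram_bin_edges([low, high], bins=bins)

print(width)
print(edges)
```

Each bar in the histogram therefore covers an income range of 9500.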
earning = data['No_of_Earning_Members'].unique()
#print(earning)
plt.hist(data['No_of_Earning_Members'])
plt.xlabel('No. of earning members')
plt.ylabel('Count')
plt.xticks(earning)
plt.show()
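For a variable with only a few integer values, the default 10 bins are narrower than the gaps between values, so the bars do not sit centred on the tick marks. One possible alternative, sketched here with the counts printed earlier for No_of_Earning_Members, is to place the bin edges halfway between the integers (the edge values are an assumption chosen for this column):

```python
import numpy as np

# Counts copied from the value_counts() output for No_of_Earning_Members.
counts = {1: 33, 2: 12, 3: 4, 4: 1}

# Rebuild the column from its frequencies (row order is arbitrary here).
earning = np.repeat(list(counts.keys()), list(counts.values()))

# Edges at 0.5, 1.5, ..., 4.5 give one bin centred on each integer value.
edges = np.arange(0.5, 5.5, 1.0)
heights, _ = np.histogram(earning, bins=edges)

print(edges)
print(heights)
```

The same edges can be passed to the plot as plt.hist(data['No_of_Earning_Members'], bins=edges).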
plt.figure(figsize= (3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
plt.xlabel('income')
plt.ylabel('expenditure')
plt.show()
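# Passing the raw column draws one wedge per row (50 wedges), not one per family size;
# a pie chart needs the category counts from value_counts().
plt.pie(data['No_of_Fly_Members'])
plt.show()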
data['No_of_Fly_Members'].unique()
array([3, 2, 1, 5, 4, 6, 7])
x = data['No_of_Fly_Members'].value_counts()
print(x)
No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64
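These counts are what a pie chart of family size would be drawn from: one wedge per distinct value, labelled with that value. A minimal sketch under the assumption that the counts above are used directly; it selects the off-screen "Agg" backend so it runs without a display, whereas in Colab the default backend with plt.show() would be used:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({4: 15, 6: 10, 3: 9, 2: 8, 5: 5, 7: 2, 1: 1}, name="count")

fig, ax = plt.subplots(figsize=(3, 3))
# One wedge per family size, labelled with the size and its percentage share.
ax.pie(counts, labels=counts.index, autopct="%.0f%%")
ax.set_title("No. of family members")

n_wedges = len(ax.patches)
print(n_wedges)
```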