8/23/24, 11:47 AM descriptive analytics.ipynb - Colab
Income and expenditure CSV dataset from Kaggle
Load the dataset into a DataFrame (table)
import pandas as pd
data = pd.read_csv('/content/sample_data/Inc_Exp_Data (1).csv')
data.head()
   Mthly_HH_Income  Mthly_HH_Expense  No_of_Fly_Members  Emi_or_Rent_Amt  Annual_HH_I...
0             5000              8000                  3             2000  ...
1             6000              7000                  2             3000  ...
2            10000              4500                  2                0  1...
3            10000              2000                  1                0  ...
4            12500             12000                  2             3000  1...
data.shape
(50, 7)
data.columns
Index(['Mthly_HH_Income', 'Mthly_HH_Expense', 'No_of_Fly_Members',
'Emi_or_Rent_Amt', 'Annual_HH_Income', 'Highest_Qualified_Member',
'No_of_Earning_Members'],
dtype='object')
Descriptive statistics uses the following measures:
1. Central tendency: mean, median, mode
2. Frequency measures: how frequently each value or event occurs
3. Measures of variation: range, variance, standard deviation (SD)
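To make the three families of measures concrete, here is a minimal sketch using only Python's standard library. It uses the five Mthly_HH_Income values visible in the data.head() output above rather than the full CSV, so the numbers describe those five rows only; the variable names are illustrative choices.

```python
import statistics as st
from collections import Counter

# Monthly incomes of the first five households, copied from data.head()
incomes = [5000, 6000, 10000, 10000, 12500]

# 1. Central tendency
mean_income = st.mean(incomes)      # arithmetic average
median_income = st.median(incomes)  # middle value after sorting
mode_income = st.mode(incomes)      # most frequent value

# 2. Frequency: how often each value occurs
frequency = Counter(incomes)

# 3. Variation
income_range = max(incomes) - min(incomes)  # largest minus smallest
sd_income = st.stdev(incomes)               # sample standard deviation

print(mean_income, median_income, mode_income)
print(frequency[10000], income_range, round(sd_income, 2))
```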
info() reports the number of rows and columns, the column names, and the non-null count and data type of each column
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Mthly_HH_Income 50 non-null int64
1 Mthly_HH_Expense 50 non-null int64
2 No_of_Fly_Members 50 non-null int64
3 Emi_or_Rent_Amt 50 non-null int64
4 Annual_HH_Income 50 non-null int64
5 Highest_Qualified_Member 50 non-null object
6 No_of_Earning_Members 50 non-null int64
dtypes: int64(6), object(1)
memory usage: 2.9+ KB
describe() summarizes the numeric columns/attributes: count, mean, std, min, quartiles, max
data.describe()
       Mthly_HH_Income  Mthly_HH_Expense  No_of_Fly_Members  Emi_or_Rent_Amt  Annual_...
count        50.000000         50.000000          50.000000        50.000000  5.0...
mean      41558.000000      18818.000000           4.060000      3060.000000  4.9...
std       26097.908979      12090.216824           1.517382      6241.434948  3.2...
min        5000.000000       2000.000000           1.000000         0.000000  6.4...
25%       23550.000000      10000.000000           3.000000         0.000000  2.5...
50%       35000.000000      15500.000000           4.000000         0.000000  4.4...
75%       50375.000000      25000.000000           5.000000      3500.000000  5.9...
max      100000.000000      50000.000000           7.000000     35000.000000  1.4...
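The 25%, 50% and 75% rows are quartiles: the values below which a quarter, half and three quarters of the observations fall (the 50% row is the median). As a rough sketch, the standard library's statistics.quantiles with method='inclusive' interpolates linearly between sorted values, which should correspond to the linear interpolation pandas uses by default. The example uses the five incomes shown by data.head(), so its results apply to those five rows only.

```python
import statistics as st

# Monthly incomes of the first five households, copied from data.head()
incomes = [5000, 6000, 10000, 10000, 12500]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = st.quantiles(incomes, n=4, method='inclusive')

# Interquartile range: the spread of the middle half of the data
iqr = q3 - q1

print(q1, q2, q3, iqr)
```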
Central tendency and variation using the statistics module
import statistics as st
st.mean(data['Mthly_HH_Income'])
41558
st.variance(data['Mthly_HH_Income'])
681100853.0612245
st.stdev(data['Mthly_HH_Income'])
26097.908978713687
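st.variance and st.stdev are the sample versions (they divide by n - 1), which is why the st.stdev result agrees with the std row of data.describe(); pandas also divides by n - 1 by default. The sketch below contrasts them with the population versions (st.pvariance, st.pstdev, which divide by n) on the five incomes from data.head(). It illustrates the formulas on those five rows and is not a result for the full dataset.

```python
import statistics as st

# Monthly incomes of the first five households, copied from data.head()
incomes = [5000, 6000, 10000, 10000, 12500]
n = len(incomes)
mean = st.mean(incomes)

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in incomes)

sample_var = ss / (n - 1)   # same definition as st.variance
population_var = ss / n     # same definition as st.pvariance

print(sample_var, st.variance(incomes))
print(population_var, st.pvariance(incomes))
print(st.stdev(incomes), st.pstdev(incomes))
```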
data['No_of_Fly_Members'].unique()
array([3, 2, 1, 5, 4, 6, 7])
st.mode(data['No_of_Fly_Members'])
data['No_of_Fly_Members'].value_counts()
No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64
st.mode(data['No_of_Earning_Members'])
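value_counts() and st.mode() are two views of the same information: the mode is simply the value with the highest count. The sketch below rebuilds the No_of_Fly_Members column from the counts printed above (so it does not need the CSV file) and recovers the mode with collections.Counter; the names `counts` and `family_sizes` are illustrative.

```python
from collections import Counter
import statistics as st

# Counts copied from data['No_of_Fly_Members'].value_counts()
counts = {4: 15, 6: 10, 3: 9, 2: 8, 5: 5, 7: 2, 1: 1}

# Expand the counts back into one entry per household
family_sizes = [size for size, c in counts.items() for _ in range(c)]

freq = Counter(family_sizes)
most_common_size, most_common_count = freq.most_common(1)[0]

print(len(family_sizes))                    # number of households
print(most_common_size, most_common_count)  # value with the highest count
print(st.mode(family_sizes))                # same value via the statistics module
```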
The Highest_Qualified_Member column is categorical: it takes only a few distinct values
data['Highest_Qualified_Member'].value_counts()
Highest_Qualified_Member
Graduate 19
Under-Graduate 10
Professional 10
Post-Graduate 6
Illiterate 5
Name: count, dtype: int64
Data visualization: graphs and charts
Python provides packages for visualization:
1. matplotlib.pyplot
2. seaborn
Common chart types: line, bar, pie, histogram, box, scatter
import matplotlib.pyplot as plt
Scatter plot: visualizes the relationship between two variables/attributes/columns
1. Each data point is drawn as a dot
The trend here: expenditure tends to increase as income increases
# size of chart
plt.figure(figsize=(3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
# x & y axis labels
plt.xlabel('Income')
plt.ylabel('Expenditure')
plt.title('Income vs expenditure')
plt.show()
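The visual trend can be backed by a number: Pearson's correlation coefficient r ranges from -1 to 1, and a positive value means the two columns tend to rise together. Below is a minimal sketch that computes r from its definition for the five rows shown by data.head(). Five rows are far too few to characterize the dataset, so treat the result as an illustration of the calculation only. The helper name `pearson_r` is hypothetical.

```python
import math

# First five rows of Mthly_HH_Income and Mthly_HH_Expense, from data.head()
income = [5000, 6000, 10000, 10000, 12500]
expense = [8000, 7000, 4500, 2000, 12000]

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(income, expense)
print(round(r, 3))
```

On the full DataFrame the same quantity is available as data['Mthly_HH_Income'].corr(data['Mthly_HH_Expense']).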
Line plot: each column is drawn as a line against the row index
In general, a family's monthly expenditure is less than its income
plt.figure(figsize=(3,3))
plt.plot(data['Mthly_HH_Income'],label='income' )
plt.plot(data['Mthly_HH_Expense'], label='expenditure')
plt.legend() # show the legend built from the label= arguments
plt.show()
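The comparison between expenditure and income can also be checked row by row instead of by eye. This is a small sketch on the five rows from data.head() (illustrative only; the full dataset has 50 rows):

```python
# First five rows of Mthly_HH_Income and Mthly_HH_Expense, from data.head()
income = [5000, 6000, 10000, 10000, 12500]
expense = [8000, 7000, 4500, 2000, 12000]

# True where a household spends less than it earns
spends_less = [e < i for i, e in zip(income, expense)]

n_less = sum(spends_less)           # True counts as 1
share_less = n_less / len(income)   # fraction of households

print(n_less, share_less)
```

With the DataFrame loaded, (data['Mthly_HH_Expense'] < data['Mthly_HH_Income']).mean() gives the same fraction for all 50 households.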
Pie chart: for categorical variables (few unique values); shows the proportion of each category
1. A circular figure divided into wedges, each sized by its category's share
x = data['No_of_Earning_Members'].value_counts()
print(x)
No_of_Earning_Members
1 33
2 12
3 4
4 1
Name: count, dtype: int64
plt.figure(figsize=(3,3))
plt.pie(x,labels=x.index, autopct='%.0f%%' )
plt.show()
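autopct='%.0f%%' labels each wedge with its share of the total, rounded to a whole percent. The shares can be checked by hand from the counts printed above; the sketch below does that in plain Python (the name `earning_counts` is illustrative).

```python
# Counts copied from data['No_of_Earning_Members'].value_counts()
earning_counts = {1: 33, 2: 12, 3: 4, 4: 1}

total = sum(earning_counts.values())

# Share of each category as a percentage of all households
shares = {k: 100 * v / total for k, v in earning_counts.items()}

for members, pct in shares.items():
    # Same format string that autopct applies to each wedge
    print(members, '%.0f%%' % pct)
```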
Histogram: used for a single variable; its values are divided into intervals (bins)
1. Each bar shows the count of values that fall in one bin
print(data['Mthly_HH_Income'].min())
print(data['Mthly_HH_Income'].max())
5000
100000
plt.figure(figsize=(3,3))
plt.hist(data['Mthly_HH_Income'], bins = 10)
plt.show()
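With bins = 10, the range from the minimum (5000) to the maximum (100000) is cut into ten equal-width intervals, each 9500 wide. The sketch below reproduces that arithmetic and counts values per bin in plain Python. It assumes the usual convention (followed by NumPy's histogram) that every bin includes its left edge and only the last bin also includes its right edge. The function name `bin_counts` and the six sample values are illustrative.

```python
def bin_counts(values, lo, hi, bins):
    """Count values (all assumed to lie in [lo, hi]) in equal-width bins."""
    width = (hi - lo) / bins
    edges = [lo + k * width for k in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Index of the bin containing v; the maximum goes in the last bin
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    return edges, counts

# Five incomes from data.head() plus the maximum reported above
sample = [5000, 6000, 10000, 10000, 12500, 100000]
edges, counts = bin_counts(sample, lo=5000, hi=100000, bins=10)

print(edges[:3])   # first few bin edges
print(counts)      # number of sample values in each bin
```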
earning = data['No_of_Earning_Members'].unique()
#print(earning)
plt.hist(data['No_of_Earning_Members'])
plt.xlabel('No. of earning members')
plt.ylabel('Count')
plt.xticks(earning)
plt.show()
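For a small set of integer values like this one, the default ten bins between 1 and 4 are narrower than the gaps between the values, so the bars do not sit centered on the ticks. One common remedy, offered here as a suggestion rather than what the notebook does, is to place the bin edges halfway between the integers. The edges can be computed without any plotting library:

```python
# Distinct values from data['No_of_Earning_Members'].value_counts()
values = [1, 2, 3, 4]

# Edges at 0.5, 1.5, ..., 4.5 so that each integer sits mid-bin
edges = [v - 0.5 for v in range(min(values), max(values) + 2)]

# Midpoint of each bin, to confirm the bars would be centered on the values
centers = [(a + b) / 2 for a, b in zip(edges, edges[1:])]

print(edges)
print(centers)
```

These edges could then be passed to the plot as plt.hist(data['No_of_Earning_Members'], bins=edges).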
plt.figure(figsize= (3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
plt.xlabel('income')
plt.ylabel('expenditure')
plt.show()
# Passing the raw column draws one wedge per row (50 wedges), not one per family size
plt.pie(data['No_of_Fly_Members'])
plt.show()
data['No_of_Fly_Members'].unique()
array([3, 2, 1, 5, 4, 6, 7])
x = data['No_of_Fly_Members'].value_counts()
print(x)
No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64
plt.pie(x, labels= x.index)
plt.show()