Descriptive Analytics - Ipynb - Colab

The document outlines a descriptive analytics project using a CSV dataset of household income and expenditure. It details the loading of data into a pandas DataFrame, the exploration of data characteristics, and the application of descriptive statistics to analyze central tendencies and variations. Additionally, it includes visualizations such as scatter plots, line plots, pie charts, and histograms to illustrate relationships and distributions within the data.

Uploaded by

lsivakum


8/23/24, 11:47 AM descriptive analytics.ipynb - Colab

Income and expenditure CSV dataset from Kaggle

Load the dataset into a DataFrame (table)

import pandas as pd
data = pd.read_csv('/content/sample_data/Inc_Exp_Data (1).csv')

data.head()

   Mthly_HH_Income  Mthly_HH_Expense  No_of_Fly_Members  Emi_or_Rent_Amt  Annual_HH_I
0             5000              8000                  3             2000
1             6000              7000                  2             3000
2            10000              4500                  2                0  1
3            10000              2000                  1                0
4            12500             12000                  2             3000  1

data.shape

(50, 7)

data.columns

Index(['Mthly_HH_Income', 'Mthly_HH_Expense', 'No_of_Fly_Members',
       'Emi_or_Rent_Amt', 'Annual_HH_Income', 'Highest_Qualified_Member',
       'No_of_Earning_Members'],
      dtype='object')

Descriptive statistics use the following measures:

1. Central tendency: mean, median, mode
2. Frequency measures: how frequently events occur
3. Measures of variation: range, variance, standard deviation (SD)
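
As a minimal sketch of the three kinds of measures listed above, the snippet below uses Python's standard `statistics` module and `collections.Counter` on the five `No_of_Fly_Members` values visible in the `data.head()` output (3, 2, 2, 1, 2). The variable names are illustrative, and five rows are far too few to describe the full dataset.

```python
import statistics as st
from collections import Counter

# First five No_of_Fly_Members values, copied from the data.head() output
family_sizes = [3, 2, 2, 1, 2]

# 1. Central tendency
mean_size = st.mean(family_sizes)
median_size = st.median(family_sizes)
mode_size = st.mode(family_sizes)

# 2. Frequency: how often each value occurs
frequencies = Counter(family_sizes)

# 3. Variation: range, sample variance, sample standard deviation
value_range = max(family_sizes) - min(family_sizes)
variance = st.variance(family_sizes)
std_dev = st.stdev(family_sizes)

print(mean_size, median_size, mode_size)
print(dict(frequencies))
print(value_range, variance, std_dev)
```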

info(): reports the number of rows and columns, the column names, the non-null count and data type of each column, and the memory usage.

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Mthly_HH_Income 50 non-null int64
1 Mthly_HH_Expense 50 non-null int64
2 No_of_Fly_Members 50 non-null int64
3 Emi_or_Rent_Amt 50 non-null int64
4 Annual_HH_Income 50 non-null int64
5 Highest_Qualified_Member 50 non-null object
6 No_of_Earning_Members 50 non-null int64
dtypes: int64(6), object(1)
memory usage: 2.9+ KB

describe() summarizes the numeric columns/attributes

data.describe()

       Mthly_HH_Income  Mthly_HH_Expense  No_of_Fly_Members  Emi_or_Rent_Amt  Annual_
count        50.000000         50.000000          50.000000        50.000000      5.0
mean      41558.000000      18818.000000           4.060000      3060.000000      4.9
std       26097.908979      12090.216824           1.517382      6241.434948      3.2
min        5000.000000       2000.000000           1.000000         0.000000      6.4
25%       23550.000000      10000.000000           3.000000         0.000000      2.5
50%       35000.000000      15500.000000           4.000000         0.000000      4.4
75%       50375.000000      25000.000000           5.000000      3500.000000      5.9
max      100000.000000      50000.000000           7.000000     35000.000000      1.4

Central tendency and variation using the statistics module

import statistics as st

st.mean(data['Mthly_HH_Income'])

41558

st.variance(data['Mthly_HH_Income'])

681100853.0612245

st.stdev(data['Mthly_HH_Income'])

26097.908978713687
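
The two results above are tied together: the standard deviation is the square root of the variance. The check below is a small sketch that uses only the numbers printed in this notebook. `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1), which is also the pandas default and is why `data.describe()` reports the same `std` for this column.

```python
import math

# Values printed by st.variance(...) and st.stdev(...) in the notebook
variance = 681100853.0612245
std_dev = 26097.908978713687

# The standard deviation is the square root of the variance
recomputed = math.sqrt(variance)
print(recomputed)
print(math.isclose(recomputed, std_dev, rel_tol=1e-9))
```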

data['No_of_Fly_Members'].unique()

array([3, 2, 1, 5, 4, 6, 7])


st.mode(data['No_of_Fly_Members'])

data['No_of_Fly_Members'].value_counts()

No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64
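
The frequency table above is enough to recover both the mode and the mean of `No_of_Fly_Members`. The sketch below rebuilds the 50 values from the printed counts (the variable names are illustrative) and cross-checks them against the `mean` row of `data.describe()`.

```python
import statistics as st

# Counts copied from data['No_of_Fly_Members'].value_counts()
counts = {4: 15, 6: 10, 3: 9, 2: 8, 5: 5, 7: 2, 1: 1}

# Expand the frequency table back into the individual observations
values = [size for size, n in counts.items() for _ in range(n)]

print(len(values))        # 50 rows, matching data.shape
print(st.mode(values))    # 4, the most frequent family size
print(st.mean(values))    # 4.06, matching the describe() output
```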

st.mode(data['No_of_Earning_Members'])

The Highest_Qualified_Member column is categorical: it takes only a few distinct values.

data['Highest_Qualified_Member'].value_counts()

Highest_Qualified_Member
Graduate 19
Under-Graduate 10
Professional 10
Post-Graduate 6
Illiterate 5
Name: count, dtype: int64
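
For a categorical column like this one a mean cannot be computed, so the mode is the natural measure of central tendency. A small sketch, rebuilding the labels from the counts printed above (the variable names are illustrative):

```python
import statistics as st
from collections import Counter

# Counts copied from data['Highest_Qualified_Member'].value_counts()
counts = Counter({
    'Graduate': 19,
    'Under-Graduate': 10,
    'Professional': 10,
    'Post-Graduate': 6,
    'Illiterate': 5,
})

# Rebuild the 50 labels; statistics.mode also works on strings
labels = list(counts.elements())

print(len(counts))             # 5 distinct categories
print(st.mode(labels))         # Graduate
print(counts.most_common(1))   # [('Graduate', 19)]
```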

Data visualization: graphs and charts

Python provides packages for visualization:

1. matplotlib.pyplot
2. seaborn

Common chart types: line, bar, pie, histogram, box, scatter

import matplotlib.pyplot as plt

Scatter plot: visualizes the relationship between two variables (attributes/columns).

1. Data points are represented as dots.

The trend: expenditure increases as income increases.


# size of chart
plt.figure(figsize=(3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
# x & y axis labels
plt.xlabel('Income')
plt.ylabel('Expenditure')
plt.title('Income vs expenditure')
plt.show()
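
A scatter plot shows the trend visually; a correlation coefficient puts a number on it. The sketch below implements Pearson's r with the standard library (the `pearson_r` helper is hypothetical, not part of the notebook) and applies it to the five rows shown by `data.head()`. Five rows are too few to characterize the dataset. On the full DataFrame, pandas computes the same statistic with `data['Mthly_HH_Income'].corr(data['Mthly_HH_Expense'])`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# First five rows of Mthly_HH_Income and Mthly_HH_Expense from data.head()
income = [5000, 6000, 10000, 10000, 12500]
expense = [8000, 7000, 4500, 2000, 12000]

r = pearson_r(income, expense)
print(round(r, 3))  # between -1 and 1; positive means the two rise together
```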

Line plot: plots each column's values against the row index.

In general, a family's monthly expenditure is less than its income.

plt.figure(figsize=(3,3))
plt.plot(data['Mthly_HH_Income'],label='income' )
plt.plot(data['Mthly_HH_Expense'], label='expenditure')
plt.legend()  # show the label of each line
plt.show()
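
The claim that expenditure is generally below income can be checked directly by comparing the two columns row by row. A minimal sketch on the five rows from `data.head()` follows; on the full DataFrame the equivalent would be `(data['Mthly_HH_Expense'] < data['Mthly_HH_Income']).mean()`.

```python
# First five rows of the two columns, copied from data.head()
income = [5000, 6000, 10000, 10000, 12500]
expense = [8000, 7000, 4500, 2000, 12000]

# True where the household spends less than it earns
spends_less = [e < i for i, e in zip(income, expense)]
share = sum(spends_less) / len(spends_less)

print(spends_less)  # [False, False, True, True, True]
print(share)        # 0.6
```

In this small sample the two lowest-income households spend more than they earn, so the share computed on all 50 rows may differ.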


Pie chart: used for categorical variables (few unique values) to show the proportion of each category.

1. A circular figure whose slices show the proportions.

x = data['No_of_Earning_Members'].value_counts()
print(x)

No_of_Earning_Members
1 33
2 12
3 4
4 1
Name: count, dtype: int64

plt.figure(figsize=(3,3))
plt.pie(x,labels=x.index, autopct='%.0f%%' )
plt.show()
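
The `autopct='%.0f%%'` argument is an ordinary printf-style format string: matplotlib applies it to each slice's percentage, `%.0f` rounds to zero decimals, and `%%` prints a literal percent sign. The sketch below reproduces the slice labels by hand from the counts printed above; the variable names are illustrative.

```python
# Counts copied from data['No_of_Earning_Members'].value_counts()
counts = {1: 33, 2: 12, 3: 4, 4: 1}
total = sum(counts.values())

# Percentage of each slice, formatted the way autopct='%.0f%%' would
labels = ['%.0f%%' % (100 * n / total) for n in counts.values()]

print(total)   # 50
print(labels)  # ['66%', '24%', '8%', '2%']
```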

Histogram: used for a single variable; its values are divided into intervals (bins).

1. Bars show the count of values in each bin.

print(data['Mthly_HH_Income'].min())
print(data['Mthly_HH_Income'].max())

5000
100000

plt.figure(figsize=(3,3))
plt.hist(data['Mthly_HH_Income'], bins = 10)

plt.show()
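
With `bins = 10`, `plt.hist` splits the interval from the column minimum to its maximum into ten equal-width bins. Using the minimum and maximum printed above, the bin edges can be worked out by hand, as this sketch shows (the variable names are illustrative):

```python
# Minimum and maximum of Mthly_HH_Income, as printed above
low, high = 5000, 100000
bins = 10

# Equal-width bins: each one spans (high - low) / bins
width = (high - low) / bins
edges = [low + k * width for k in range(bins + 1)]

print(width)  # 9500.0
print(edges)
```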


earning = data['No_of_Earning_Members'].unique()
#print(earning)
plt.hist(data['No_of_Earning_Members'])
plt.xlabel('No. of earning members')
plt.ylabel('Count')
plt.xticks(earning)
plt.show()


plt.figure(figsize= (3,3))
plt.scatter(data['Mthly_HH_Income'], data['Mthly_HH_Expense'])
plt.xlabel('income')
plt.ylabel('expenditure')
plt.show()

# Passing the raw column draws one slice per row (50 slices), not one per family size
plt.pie(data['No_of_Fly_Members'])
plt.show()

data['No_of_Fly_Members'].unique()

array([3, 2, 1, 5, 4, 6, 7])


x = data['No_of_Fly_Members'].value_counts()
print(x)

No_of_Fly_Members
4 15
6 10
3 9
2 8
5 5
7 2
1 1
Name: count, dtype: int64

plt.pie(x, labels= x.index)

plt.show()

