DSBDA3

The document contains Python code for data analysis using pandas and matplotlib, focusing on two datasets: sales data and the Iris dataset. It includes operations such as reading CSV files, displaying data, checking for missing values, calculating summary statistics, and grouping data by categories. The sales data is grouped by age group to analyze income, while the Iris dataset is summarized by species.

Uploaded by

naitikpawar22


In [1]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt

In [4]: sales = pd.read_csv('C:/Users/prajw/Desktop/Indexs/DSBDA print/DSBDA3/data.csv')
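The absolute Windows path above resolves only on the original author's machine. As a minimal sketch for re-running the cells without `data.csv`, the ten rows shown in Out[5] and Out[7] can be built inline. The name `sales_inline` is hypothetical and is not part of the original notebook.

```python
import pandas as pd

# Rows copied from the head()/tail() outputs of this notebook (Out[5], Out[7]).
# `sales_inline` is an illustrative stand-in for the frame read from data.csv.
sales_inline = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Mike", "Sarah", "Rahul",
                 "Priya", "Ravi", "Sneha", "Amit", "Neha"],
        "AgeGroup": ["20-30", "30-40", "20-30", "40-50", "30-40",
                     "40-50", "20-30", "30-40", "20-30", "40-50"],
        "Income": [30000, 45000, 28000, 52000, 46000,
                   51000, 31000, 47000, 29000, 50000],
    }
)

# Same shape as the frame loaded from the CSV: 10 rows, 3 columns.
print(sales_inline.shape)
```

Assigning this frame to `sales` should let the remaining sales cells run unchanged, assuming the CSV holds no columns beyond the three displayed.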

In [5]: sales.head()

Out[5]:
    Name  AgeGroup  Income
0   John     20-30   30000
1  Alice     30-40   45000
2   Mike     20-30   28000
3  Sarah     40-50   52000
4  Rahul     30-40   46000

In [7]: sales.tail() #Show me the last 5 rows from the table named sales

Out[7]:
    Name  AgeGroup  Income
5  Priya     40-50   51000
6   Ravi     20-30   31000
7  Sneha     30-40   47000
8   Amit     20-30   29000
9   Neha     40-50   50000

In [9]: sales.shape #Show me the number of rows and columns

Out[9]: (10, 3)

In [11]: sales.isnull().sum() #It tells you how many missing values (nulls) are in each column of your sales table.

Out[11]: Name 0
AgeGroup 0
Income 0
dtype: int64

In [13]: sales.count() #tells you how many non-missing (non-null) values are present in each column

Out[13]: Name 10
AgeGroup 10
Income 10
dtype: int64

In [14]: sales.sum() #Totals every column; string columns are concatenated, so only Income is a meaningful sum

Out[14]: Name JohnAliceMikeSarahRahulPriyaRaviSnehaAmitNeha
AgeGroup 20-3030-4020-3040-5030-4040-5020-3030-4020-304...
Income 409000
dtype: object
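In Out[14] the `Name` and `AgeGroup` entries are string concatenations, not totals. A minimal sketch of restricting the sum to numeric data follows. It rebuilds the rows shown in this notebook under the hypothetical name `sales_demo` so that it runs without the CSV file.

```python
import pandas as pd

# Illustrative copy of the rows displayed in Out[5] and Out[7].
sales_demo = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Mike", "Sarah", "Rahul",
                 "Priya", "Ravi", "Sneha", "Amit", "Neha"],
        "AgeGroup": ["20-30", "30-40", "20-30", "40-50", "30-40",
                     "40-50", "20-30", "30-40", "20-30", "40-50"],
        "Income": [30000, 45000, 28000, 52000, 46000,
                   51000, 31000, 47000, 29000, 50000],
    }
)

# Option 1: sum a single numeric column.
total_income = sales_demo["Income"].sum()

# Option 2: ask pandas to skip non-numeric columns across the whole frame.
numeric_totals = sales_demo.sum(numeric_only=True)

print(total_income)
print(numeric_totals)
```

Either form avoids the concatenated strings; the second scales better when a frame has several numeric columns.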

In [17]: print(sales.columns) #This will show you the actual column names

Index(['Name', 'AgeGroup', 'Income'], dtype='object')

In [20]: grouped = sales.groupby('AgeGroup')['Income']

In [23]: # Calculate summary statistics
summary_stats = grouped.agg(['mean', 'median', 'min', 'max', 'std'])
print("Summary Statistics grouped by AgeGroup:")
print(summary_stats)

Summary Statistics grouped by AgeGroup:
             mean   median    min    max          std
AgeGroup
20-30     29500.0  29500.0  28000  31000  1290.994449
30-40     46000.0  46000.0  45000  47000  1000.000000
40-50     51000.0  51000.0  50000  52000  1000.000000
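The `std` column is the sample standard deviation: pandas divides by n - 1 by default. As a hedged check of the 20-30 row, the figure can be reproduced by hand from the four incomes in that group (John, Mike, Ravi and Amit in Out[5] and Out[7]). The variable names below are illustrative only.

```python
import math

# Incomes of the four people in the 20-30 age group, taken from the tables above.
incomes_20_30 = [30000, 28000, 31000, 29000]

n = len(incomes_20_30)
mean = sum(incomes_20_30) / n

# Sum of squared deviations from the group mean.
squared_dev = sum((x - mean) ** 2 for x in incomes_20_30)

# Sample standard deviation: divide by n - 1, not n.
sample_std = math.sqrt(squared_dev / (n - 1))

print(mean, round(sample_std, 6))
```

Dividing by n instead would give the population standard deviation, which is smaller and would not agree with the table.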

In [22]: income_groups = grouped.apply(list).to_dict()
print("\nIncome values grouped by AgeGroup:")
for group, values in income_groups.items():
    print(f"{group}: {values}")

Income values grouped by AgeGroup:

20-30: [30000, 28000, 31000, 29000]
30-40: [45000, 46000, 47000]
40-50: [52000, 51000, 50000]
In [ ]: # Iris dataset

In [1]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

In [2]: iris = pd.read_csv('C:/Users/prajw/Desktop/Indexs/DSBDA print/DSBDA3/iris.csv')

In [3]: iris.head()

Out[3]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2   setosa
1                4.9               3.0                1.4               0.2   setosa
2                4.7               3.2                1.3               0.2   setosa
3                4.6               3.1                1.5               0.2   setosa
4                5.0               3.6                1.4               0.2   setosa

In [4]: iris.tail()

Out[4]:
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)    species
145                6.7               3.0                5.2               2.3  virginica
146                6.3               2.5                5.0               1.9  virginica
147                6.5               3.0                5.2               2.0  virginica
148                6.2               3.4                5.4               2.3  virginica
149                5.9               3.0                5.1               1.8  virginica

In [5]: iris.shape

Out[5]: (150, 5)

In [6]: print(iris.columns)

Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)', 'species'],
      dtype='object')

In [10]: iris.describe()

Out[10]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
count         150.000000        150.000000         150.000000        150.000000
mean            5.843333          3.057333           3.758000          1.199333
std             0.828066          0.435866           1.765298          0.762238
min             4.300000          2.000000           1.000000          0.100000
25%             5.100000          2.800000           1.600000          0.300000
50%             5.800000          3.000000           4.350000          1.300000
75%             6.400000          3.300000           5.100000          1.800000
max             7.900000          4.400000           6.900000          2.500000

In [14]: print(iris['species'].unique())

['setosa' 'versicolor' 'virginica']

In [15]: # Check what species are available

print("Available species in dataset:", iris['species'].unique())

Available species in dataset: ['setosa' 'versicolor' 'virginica']

In [16]: # Now use the correct species names (based on the above output)
species_list = iris['species'].unique()
for species in species_list:
    print(f"\nStatistical summary for {species}:")
    subset = iris[iris['species'] == species]
    print(subset.describe(percentiles=[.25, .5, .75]))
Statistical summary for setosa:
sepal length (cm) sepal width (cm) petal length (cm) \
count 50.00000 50.000000 50.000000
mean 5.00600 3.428000 1.462000
std 0.35249 0.379064 0.173664
min 4.30000 2.300000 1.000000
25% 4.80000 3.200000 1.400000
50% 5.00000 3.400000 1.500000
75% 5.20000 3.675000 1.575000
max 5.80000 4.400000 1.900000

petal width (cm)

count 50.000000
mean 0.246000
std 0.105386
min 0.100000
25% 0.200000
50% 0.200000
75% 0.300000
max 0.600000

Statistical summary for versicolor:

sepal length (cm) sepal width (cm) petal length (cm) \
count 50.000000 50.000000 50.000000
mean 5.936000 2.770000 4.260000
std 0.516171 0.313798 0.469911
min 4.900000 2.000000 3.000000
25% 5.600000 2.525000 4.000000
50% 5.900000 2.800000 4.350000
75% 6.300000 3.000000 4.600000
max 7.000000 3.400000 5.100000

petal width (cm)

count 50.000000
mean 1.326000
std 0.197753
min 1.000000
25% 1.200000
50% 1.300000
75% 1.500000
max 1.800000

Statistical summary for virginica:

sepal length (cm) sepal width (cm) petal length (cm) \
count 50.00000 50.000000 50.000000
mean 6.58800 2.974000 5.552000
std 0.63588 0.322497 0.551895
min 4.90000 2.200000 4.500000
25% 6.22500 2.800000 5.100000
50% 6.50000 3.000000 5.550000
75% 6.90000 3.175000 5.875000
max 7.90000 3.800000 6.900000

petal width (cm)

count 50.00000
mean 2.02600
std 0.27465
min 1.40000
25% 1.80000
50% 2.00000
75% 2.30000
max 2.50000
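The per-species loop can also be expressed with `groupby`. The following is a sketch only: to stay self-contained it uses just the ten rows visible in Out[3] and Out[4] (five setosa, five virginica) under the hypothetical name `iris_sample`, so its numbers will not match the full 150-row summaries above.

```python
import pandas as pd

# Ten rows copied from the head()/tail() outputs; not the full dataset.
iris_sample = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
        "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4, 5.2, 5.0, 5.2, 5.4, 5.1],
        "species": ["setosa"] * 5 + ["virginica"] * 5,
    }
)

# One describe() per species, computed in a single call.
per_species = iris_sample.groupby("species").describe()

# The columns form a MultiIndex of (measurement, statistic),
# so selecting one measurement yields its statistics per species.
print(per_species["sepal length (cm)"][["count", "mean", "min", "max"]])
```

With the full frame, `iris.groupby('species').describe()` should yield the same statistics as the loop, arranged as one row per species.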

In [17]: iris['species'] = iris['species'].str.strip()

In [18]: iris[iris['species'] == 'Iris-setosa'] # Returns no rows: this file labels the species 'setosa', not 'Iris-setosa'

Out[18]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) species
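Out[18] has no rows because this file labels the species `setosa`, `versicolor` and `virginica`, without the `Iris-` prefix found in some other copies of the dataset. A minimal sketch of guarding against that mismatch is to check the label before filtering. The helper `filter_species` and the frame `iris_sample` are hypothetical; the sample rows are taken from Out[3] and Out[4].

```python
import pandas as pd

# Four rows copied from the outputs above, enough to illustrate the check.
iris_sample = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 6.7, 6.3],
        "species": ["setosa", "setosa", "virginica", "virginica"],
    }
)


def filter_species(frame, label):
    """Return the rows whose species equals `label`; raise if the label is absent."""
    known = set(frame["species"])
    if label not in known:
        raise ValueError(f"{label!r} not found; available labels: {sorted(known)}")
    return frame[frame["species"] == label]


# A label that exists returns the matching rows.
setosa_rows = filter_species(iris_sample, "setosa")
print(len(setosa_rows))

# A mistyped label fails loudly instead of silently returning an empty frame.
try:
    filter_species(iris_sample, "Iris-setosa")
except ValueError as err:
    print(err)
```

Raising an error is one design choice among several; stripping a known prefix before comparison would be another, if the data source is known to vary.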

In [ ]:
