NumPy, Pandas & Visualisation Test

The document outlines 7 assignments involving loading and analyzing various datasets using pandas and matplotlib. The tasks include calculating summary statistics, filtering data, grouping data, creating new columns, and plotting distributions and relationships. The goal is to gain experience manipulating and exploring different types of real-world data.

Uploaded by

Sunil Jaswal
Copyright
© All Rights Reserved

Assignment 01:

Load the Iris dataset into a pandas DataFrame

1. Find the mean and median of the 'sepal_length' column.
2. Calculate the 75th percentile of the 'petal_width' column for each species in the Iris dataset.
3. Create a new column in the Iris DataFrame called 'sepal_area', which is the product of 'sepal_length' and
'sepal_width'.
4. Remove all rows in the Iris DataFrame where 'petal_length' is greater than twice the standard deviation of
'petal_length' for that species.
5. Normalize all numerical columns in the Iris DataFrame (except the 'species' column) using Min-Max scaling.
6. Find the three most common combinations of 'sepal_length', 'sepal_width', and 'petal_length' in the Iris
dataset.
7. Group the Iris DataFrame by 'species' and find the row with the highest 'sepal_width' for each group.
8. Replace all negative values in the 'petal_width' column of the Iris DataFrame with the mean of the
non-negative values in that column.
9. Calculate the correlation matrix for the 'sepal_length', 'sepal_width', 'petal_length', and 'petal_width'
columns in the Iris dataset and find the feature with the highest absolute correlation with 'petal_width'.
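Several of the tasks above can be sketched in a few lines of pandas. The snippet below uses a tiny hand-made Iris-like sample so it runs standalone (the full dataset loads via `seaborn.load_dataset("iris")`); the column names match the tasks.

```python
import pandas as pd

# Tiny Iris-like sample; the real data has 150 rows across 3 species.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Task 1: mean and median of 'sepal_length'
mean_sl = df["sepal_length"].mean()
median_sl = df["sepal_length"].median()

# Task 2: 75th percentile of 'petal_width' per species
p75 = df.groupby("species")["petal_width"].quantile(0.75)

# Task 3: new 'sepal_area' column
df["sepal_area"] = df["sepal_length"] * df["sepal_width"]

# Task 5: Min-Max scale every numeric column
num = df.select_dtypes("number")
df[num.columns] = (num - num.min()) / (num.max() - num.min())

# Task 9: feature with the highest absolute correlation with 'petal_width'
corr = df[["sepal_length", "sepal_width",
           "petal_length", "petal_width"]].corr()
best = corr["petal_width"].drop("petal_width").abs().idxmax()
```

Note that Min-Max scaling is a linear transform, so the correlation matrix in task 9 is unaffected by the normalization in task 5.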

Assignment 02:

Load the Titanic dataset (available in seaborn) into a pandas DataFrame

1. Find the average age of passengers for each class (1st, 2nd, and 3rd).
2. Create a new DataFrame that contains the count of male and female passengers in each age group (e.g.,
0-10, 11-20, etc.).
3. Find the name and ticket number of the passenger(s) who paid the highest fare and survived the disaster.
4. Calculate the survival rate for passengers who were traveling alone (without any siblings, spouses, parents, or
children) versus those who were traveling with family members.
5. For each passenger, calculate the age difference with the oldest sibling (if any) and the age difference with
the youngest sibling (if any).
6. Find the most common deck letter (A, B, C, etc.) for each passenger class.
7. Group the Titanic DataFrame by 'Embarked' (port of embarkation) and find the percentage of passengers
who survived in each group.
8. Calculate the correlation matrix for the 'Age', 'Fare', and 'Survived' columns in the Titanic dataset and find the
feature with the highest absolute correlation with 'Survived'.
9. Create a new DataFrame that contains the 'Pclass', 'Sex', 'Age', and 'Fare' columns from the Titanic dataset
and pivot it to have 'Pclass' as the index, 'Sex' as the columns, and 'Fare' as the values, with 'Age' as the
weights.
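A few of these Titanic tasks reduce to `groupby` aggregations. The sketch below assumes seaborn's lower-case column names (`pclass`, `sibsp`, `parch`, `survived`, `embarked`) and uses a small made-up sample in place of `seaborn.load_dataset("titanic")`.

```python
import pandas as pd

# Small Titanic-like sample with seaborn-style column names.
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3],
    "age":      [38.0, 35.0, 27.0, 22.0, 4.0, 30.0],
    "fare":     [71.3, 53.1, 13.0, 7.25, 16.7, 8.05],
    "sibsp":    [1, 1, 0, 1, 1, 0],
    "parch":    [0, 0, 0, 0, 1, 0],
    "survived": [1, 1, 1, 0, 1, 0],
    "embarked": ["C", "S", "S", "S", "S", "S"],
})

# Task 1: average age per passenger class
avg_age = df.groupby("pclass")["age"].mean()

# Task 4: survival rate for passengers travelling alone vs with family
alone = (df["sibsp"] + df["parch"]) == 0
rates = df.groupby(alone)["survived"].mean()   # index True = alone

# Task 7: percentage of survivors per port of embarkation
pct = df.groupby("embarked")["survived"].mean() * 100
```

The same `groupby(...).mean()` pattern also covers task 8 once a correlation matrix is built with `df[["age", "fare", "survived"]].corr()`.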

Assignment 03:

Load the Planets dataset from seaborn

1. Scatter plot: Visualize the relationship between 'orbital_period' and 'mass' of the planets.
2. Bar plot: Display the count of planets discovered by each method.
3. Histogram: Visualize the distribution of 'distance' of planets from their respective stars.
4. Pair plot: Show pairwise relationships between 'orbital_period', 'mass', 'year', and 'distance'.
5. Violin plot: Compare the distribution of 'year' for different 'method' of planet discovery.
6. Swarm plot: Visualize the 'mass' of planets discovered by different 'method'.
7. Heatmap: Create a heatmap to show the correlation matrix between numerical columns in the dataset.
8. Point plot: Compare the mean 'orbital_period' for each 'year' of planet discovery.
9. Bar plot with error bars: Show the mean 'mass' of planets for different 'method' of discovery with confidence
intervals.
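A minimal sketch of tasks 1, 2, and 7, using plain matplotlib on a tiny Planets-like sample (the full data loads via `seaborn.load_dataset("planets")`); the sample values here are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny Planets-like sample with the columns named in the tasks.
df = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Transit",
               "Imaging", "Radial Velocity", "Transit"],
    "year": [2007, 2010, 2011, 2008, 2009, 2012],
    "mass": [7.1, 0.02, 0.05, 4.8, 1.2, 0.1],
    "orbital_period": [269.3, 3.2, 4.1, 5000.0, 326.0, 2.7],
    "distance": [77.4, 200.0, 150.0, 45.0, 56.0, 310.0],
})

# Task 1: scatter plot of orbital period vs mass
fig1, ax1 = plt.subplots()
ax1.scatter(df["orbital_period"], df["mass"])
ax1.set_xlabel("orbital_period")
ax1.set_ylabel("mass")

# Task 2: bar plot of planet counts per discovery method
counts = df["method"].value_counts()
fig2, ax2 = plt.subplots()
ax2.bar(counts.index, counts.values)

# Task 7: heatmap of the numeric correlation matrix
corr = df.select_dtypes("number").corr()
fig3, ax3 = plt.subplots()
im = ax3.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax3.set_xticks(range(len(corr)), corr.columns, rotation=45)
ax3.set_yticks(range(len(corr)), corr.columns)
fig3.colorbar(im)
```

For the remaining tasks, seaborn provides these plot types directly: `sns.countplot` (task 2), `sns.violinplot` (task 5), `sns.swarmplot` (task 6), `sns.heatmap` (task 7), `sns.pointplot` (task 8), and `sns.barplot` with its default confidence intervals (task 9).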
Assignment 04:

Load the Penguins dataset into a pandas DataFrame

1. Display the first 5 rows.
2. Calculate the average 'bill_length_mm' for each species of penguins.
3. Find the penguin with the highest 'body_mass_g' and display its species and other information.
4. Create a new DataFrame containing only the penguins with 'sex' as 'MALE' and 'island' as 'Torgersen'.
5. Calculate the correlation matrix for 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', and
'body_mass_g'.
6. For each species of penguins, find the mean, median, minimum, and maximum 'body_mass_g'.
7. Replace any missing values in the 'sex' column with the most frequent value in that column.
8. Create a new column in the DataFrame called 'bill_area', which is the product of 'bill_length_mm' and
'bill_depth_mm'.
9. Group the DataFrame by 'species' and calculate the average 'body_mass_g' and 'flipper_length_mm' for each
species.
10. Calculate the total count of penguins for each 'island' and 'sex' combination.
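The penguin tasks follow the same patterns; a sketch on a small made-up sample with the real dataset's column names (the full data: `seaborn.load_dataset("penguins")`).

```python
import pandas as pd

# Penguins-like sample; one missing 'sex' value to exercise task 7.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "island":  ["Torgersen", "Torgersen", "Biscoe", "Biscoe", "Dream"],
    "bill_length_mm":  [39.1, 38.6, 47.5, 49.0, 50.2],
    "bill_depth_mm":   [18.7, 17.2, 14.2, 15.0, 18.8],
    "flipper_length_mm": [181, 191, 217, 220, 198],
    "body_mass_g":     [3750, 3800, 5200, 5550, 3775],
    "sex": ["MALE", None, "FEMALE", "MALE", "MALE"],
})

# Task 2: average bill length per species
avg_bill = df.groupby("species")["bill_length_mm"].mean()

# Task 3: penguin with the highest body mass
heaviest = df.loc[df["body_mass_g"].idxmax()]

# Task 7: fill missing 'sex' values with the most frequent value
df["sex"] = df["sex"].fillna(df["sex"].mode()[0])

# Task 8: new 'bill_area' column
df["bill_area"] = df["bill_length_mm"] * df["bill_depth_mm"]

# Task 10: count of penguins per island/sex combination
counts = pd.crosstab(df["island"], df["sex"])
```

Task 6's mean/median/min/max per species is a one-liner: `df.groupby("species")["body_mass_g"].agg(["mean", "median", "min", "max"])`.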

Assignment 05:

Load the Diamonds dataset into a pandas DataFrame.

1. Calculate the average price for each cut of diamonds.
2. Find the diamond with the highest carat and display its details, including its cut, color, and clarity.
3. Create a new DataFrame containing only the diamonds with 'cut' as 'Ideal' and 'color' as 'D'.
4. Calculate the correlation matrix for 'carat', 'depth', 'table', and 'price' columns.
5. For each clarity grade of diamonds, find the mean, median, minimum, and maximum 'carat'.
6. Replace any missing values in the 'depth' column with the mean value of that column.
7. Create a new column in the DataFrame called 'volume', which is the product of 'x', 'y', and 'z' columns.
8. Group the DataFrame by 'cut' and 'color' and calculate the average price and carat for each group.
9. Calculate the total count of diamonds for each 'cut' and 'clarity' combination.
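A sketch of the Diamonds tasks on a five-row made-up sample (the full data: `seaborn.load_dataset("diamonds")`); the column names match the real dataset.

```python
import numpy as np
import pandas as pd

# Diamonds-like sample; one missing 'depth' value to exercise task 6.
df = pd.DataFrame({
    "carat": [0.23, 0.21, 1.01, 0.90, 2.03],
    "cut":   ["Ideal", "Premium", "Ideal", "Good", "Premium"],
    "color": ["E", "E", "D", "D", "F"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
    "depth": [61.5, np.nan, 62.0, 63.1, 60.8],
    "table": [55.0, 61.0, 56.0, 58.0, 57.0],
    "price": [326, 326, 5000, 3100, 14000],
    "x": [3.95, 3.89, 6.40, 6.10, 8.10],
    "y": [3.98, 3.84, 6.45, 6.15, 8.05],
    "z": [2.43, 2.31, 3.98, 3.85, 4.95],
})

# Task 1: average price per cut
avg_price = df.groupby("cut")["price"].mean()

# Task 2: diamond with the highest carat
biggest = df.loc[df["carat"].idxmax()]

# Task 6: fill missing 'depth' with the column mean
df["depth"] = df["depth"].fillna(df["depth"].mean())

# Task 7: new 'volume' column from the x, y, z dimensions
df["volume"] = df["x"] * df["y"] * df["z"]

# Task 9: count of diamonds per cut/clarity combination
counts = pd.crosstab(df["cut"], df["clarity"])
```

Task 8's two-key grouping is `df.groupby(["cut", "color"])[["price", "carat"]].mean()`.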

Assignment 06:

Load the dataset Assignment06.csv.

1. Here, the 'Area' column contains data in a mixed format of alphabetical and integer characters. Remove the last two zeros from each value; for example, the first record's 'Area' value A100100 becomes A1001 after conversion.
2. Filter the records for 2011 into a separate dataframe.
3. Find the average value of 'geo_count' for the year 2022.
4. Find the overall average of 'ec_count' for the years 2010, 2011, and 2012.
5. Plot the KDE distribution of 'ec_count' for rows where 'anzsic06' is 'F371'.
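Since Assignment06.csv is not included here, the sketch below builds a small frame with the column names the tasks mention (`Area`, `year`, `geo_count`, `ec_count`, `anzsic06`); the sample rows are assumptions, not real records.

```python
import pandas as pd

# Assumed stand-in for pd.read_csv("Assignment06.csv").
df = pd.DataFrame({
    "Area": ["A100100", "A100200", "A215400"],
    "year": [2011, 2011, 2022],
    "geo_count": [96, 198, 42],
    "ec_count": [130, 110, 25],
    "anzsic06": ["A011", "F371", "F371"],
})

# Task 1: drop the trailing two zeros from 'Area' (A100100 -> A1001)
df["Area"] = df["Area"].str.replace(r"00$", "", regex=True)

# Task 2: records for 2011 in a separate dataframe
df_2011 = df[df["year"] == 2011].copy()

# Task 3: average 'geo_count' for 2022
avg_geo_2022 = df.loc[df["year"] == 2022, "geo_count"].mean()

# Task 4: overall average of 'ec_count' across 2010-2012
avg_ec = df.loc[df["year"].isin([2010, 2011, 2012]), "ec_count"].mean()

# Task 5 (needs matplotlib):
# df.loc[df["anzsic06"] == "F371", "ec_count"].plot(kind="kde")
```

The anchored regex `00$` removes exactly one trailing pair of zeros, which matches the A100100 → A1001 example.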

Assignment 07:

Load the dataset Assignment07.csv

1. Find the most ordered item in the given dataframe.
2. Find the item with the highest total quantity sold.
3. Find the number of orders placed on '12-07-2011' and total quantity sold on this date.
4. Plot the number of orders placed with respect to each country.
5. Find the CustomerID of the person having the net highest number of quantity.
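Assignment07.csv is not included either; the sketch assumes online-retail-style columns (`InvoiceNo`, `Description`, `Quantity`, `InvoiceDate`, `Country`, `CustomerID`) and made-up sample rows.

```python
import pandas as pd

# Assumed stand-in for pd.read_csv("Assignment07.csv").
df = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "536366", "536367", "536368"],
    "Description": ["MUG", "LANTERN", "MUG", "MUG", "LANTERN"],
    "Quantity":   [6, 2, 4, 8, 20],
    "InvoiceDate": ["12-07-2011", "12-07-2011", "12-07-2011",
                    "13-07-2011", "13-07-2011"],
    "Country":    ["UK", "UK", "France", "UK", "Germany"],
    "CustomerID": [17850, 17850, 13047, 17850, 12583],
})

# Task 1: most frequently ordered item (by number of order lines)
most_ordered = df["Description"].value_counts().idxmax()

# Task 2: item with the highest total quantity sold
top_qty_item = df.groupby("Description")["Quantity"].sum().idxmax()

# Task 3: distinct orders and total quantity on 12-07-2011
day = df[df["InvoiceDate"] == "12-07-2011"]
n_orders = day["InvoiceNo"].nunique()
total_qty = day["Quantity"].sum()

# Task 5: customer with the highest total quantity
top_customer = df.groupby("CustomerID")["Quantity"].sum().idxmax()
```

Note that tasks 1 and 2 can disagree: an item ordered on many invoices in small amounts can lose on total quantity to an item ordered once in bulk, which is why task 1 counts rows while task 2 sums `Quantity`.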
