Guidelines: Data Exploration and Visualization

B.Sc. Prog.

Computer Science Sem IV

DSE 02a: Data Exploration and Visualization

(Admission 2022 onwards)

Weeks 1 to 2. Unit 1 (10): Creating and Manipulating NumPy arrays
Topics: creating arrays; indexing and slicing; mathematical operations with NumPy arrays.
Chapter: Ch 4: 4.1 (up to pg 103); usage of the rand(), randn() and randint() functions of NumPy.
Ref: [1]

Weeks 3 to 5. Unit 2 (15): Data Manipulation with Pandas
Topics: Series and DataFrame objects; importing and exporting data from various file formats into a pandas DataFrame; data selection and filtering (indexing, slicing, conditional filtering using boolean indexing); data cleaning (handling missing data in Pandas and outlier detection); data manipulation (sorting, reshaping, merging).
Chapter: Ch 5: 5.1, 5.2 (up to pg 149), 5.3; Ch 6: 6.1 (pg 169-172, 175); Ch 7: 7.1, 7.2 (up to pg 202, 205-206); Ch 8: 8.1 (pg 221-223), 8.2 (pg 227-231), 8.3 (pg 243-245).
Ref: [1]

Weeks 6 to 9. Unit 3 (5): Grouping and Aggregation with Pandas
Topics: grouping data using Pandas; applying aggregation functions such as sum, mean, count, etc. to grouped data; using pivot tables and cross-tabulation for data summarization.
Chapter: Ch 10: 10.1 (up to pg 293), 10.2, 10.3 (up to pg 303), 10.4.
Ref: [1]

Weeks 10 to 13. Unit 4 (10): Data Visualization with Matplotlib and Seaborn
Topics: introduction to Matplotlib and Seaborn to plot data using figures and subplots; line plots, scatter plots, and bar plots; visualizing distributions using histograms and box plots; customizing plot aesthetics and adding annotations.
Chapter: Ch 9: 9.1 (pg 253-264, 267), 9.2.
Ref: [1]

Weeks 14 to 15. Unit 5 (5): Interactive Visualizations with Plotly
Topics: introduction to the Plotly library for interactive visualization; creating interactive line plots, scatter plots, and bar plots; adding interactivity with hover effects, zooming, and panning.
Chapter: Chapter 8 (up to the topic "Use of bar charts in Plotly").
Ref: [4]

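Unit 1 refers to NumPy's random-number functions. The following is a minimal sketch of how rand(), randn() and randint() differ, using the legacy `np.random` module functions; the seed value and the 2x3 shapes are arbitrary choices for illustration.

```python
import numpy as np

# Seed NumPy's legacy global generator so every run draws the same numbers.
# (The seed 42 and the shapes below are arbitrary, illustrative choices.)
np.random.seed(42)

# rand(d0, d1, ...): floats drawn uniformly from the half-open interval [0, 1).
uniform = np.random.rand(2, 3)

# randn(d0, d1, ...): floats drawn from the standard normal distribution N(0, 1).
normal = np.random.randn(2, 3)

# randint(low, high, size): integers from [low, high); high itself is never drawn.
integers = np.random.randint(1, 10, size=(2, 3))

print("rand:\n", uniform)
print("randn:\n", normal)
print("randint:\n", integers)
```

Note that rand() and randn() take the dimensions as separate arguments, while randint() takes a `size` tuple. Newer NumPy code often uses a Generator from `np.random.default_rng()` instead, whose `random()`, `standard_normal()` and `integers()` methods play the same roles.
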
References

1. McKinney W. Python for Data Analysis: Data Wrangling with Pandas, NumPy and IPython, 2nd edition. O'Reilly Media, 2018.

2. Molin S. Hands-On Data Analysis with Pandas. Packt Publishing, 2019.

3. VanderPlas J. Python Data Science Handbook: Essential Tools for Working with Data, 2nd edition. O'Reilly Media, Inc.

4. Rahman K. Python Data Visualization Essentials Guide: Become a Data Visualization Expert by Building Strong Proficiency in Pandas, Matplotlib, Seaborn, Plotly, NumPy, and Bokeh. BPB, 2021.

Additional References:

1. Chen D. Y. Pandas for Everyone: Python Data Analysis. Pearson, 2018.

Online references/material:
1. https://www.indeepdata.com/blog/exploratory-data-analysis/
2. https://plotly.com/python/

Suggestive Practice Questions:

Use a dataset of your choice from the Open Data Portal (https://data.gov.in/) or the UCI repository, or load one from the scikit-learn or Seaborn libraries, for the following exercises to practice the concepts learnt.

1. Write a program using the NumPy library to perform the following tasks:

A. Generate a 5x2 integer array with values ranging from 50 to 100, where consecutive elements differ by 5. Reshape the resulting array to shape 10x1.
B. Create a 1D random array with values ranging from 1 to 100. Calculate statistical measures such as the minimum, maximum, mean, median, standard deviation, number of unique values, frequency of each unique value, and the most frequent value in the array.
C. Create a 5x5 diagonal matrix (a scaled identity matrix) in which all the diagonal elements are set to 5.
D. Consider a dataset containing the heights (in centimeters) and weights (in kilograms) of 20
individuals. Your task is to perform various operations using the NumPy library to analyze the
data.
a. Create a NumPy array called "heights" with the following height values: [165, 170,
175, 168, 172, 180, 160, 169, 176, 171, 174, 182, 158, 167, 173, 179, 163, 166, 177,
181]. Create a NumPy array called "weights" with the following weight values: [60, 65,
70, 75, 80, 85, 55, 58, 63, 68, 72, 77, 50, 62, 67, 74, 52, 57, 69, 73].
b. Create a new NumPy array called "combined" by stacking the heights and weights
arrays such that the shape of the resulting array is 20 x 2.
c. Calculate and print the mean height and weight of the individuals in the dataset.
d. Find and print the index of the shortest and tallest individuals in the dataset.
e. Sort the array by the individuals' heights.
f. Swap the positions of the two columns in the array.
g. Retrieve the records of individuals weighing less than 70 kg.
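A minimal sketch of question 1 using only NumPy. It reads part A as `np.arange(50, 100, 5)`, the ten values 50 to 95, which exactly fill a 5x2 array; the array size (30) and seed in part B are arbitrary choices, and all variable names are illustrative rather than prescribed by the exercise.

```python
import numpy as np

# Variable names below are illustrative, not required by the exercise.

# A. arange(50, 100, 5) yields 50, 55, ..., 95 (the stop value 100 is excluded):
#    ten values that fill a 5x2 array; reshape then gives a 10x1 column.
arr_a = np.arange(50, 100, 5).reshape(5, 2)
arr_a_col = arr_a.reshape(10, 1)

# B. 30 random integers in [1, 100]; the size and seed are arbitrary choices.
np.random.seed(0)
arr_b = np.random.randint(1, 101, size=30)  # high=101 is exclusive, so 100 can occur
values, counts = np.unique(arr_b, return_counts=True)  # sorted unique values + frequencies
stats = {
    "min": arr_b.min(),
    "max": arr_b.max(),
    "mean": arr_b.mean(),
    "median": np.median(arr_b),
    "std": arr_b.std(),                        # population std (ddof=0)
    "n_unique": values.size,
    "counts": dict(zip(values.tolist(), counts.tolist())),
    "most_frequent": values[counts.argmax()],  # ties go to the smallest value
}
print(stats)

# C. A scaled identity matrix: 5 on the diagonal, 0 elsewhere.
diag5 = 5 * np.eye(5, dtype=int)

# D. Heights (cm) and weights (kg) of 20 individuals, as given in the question.
heights = np.array([165, 170, 175, 168, 172, 180, 160, 169, 176, 171,
                    174, 182, 158, 167, 173, 179, 163, 166, 177, 181])
weights = np.array([60, 65, 70, 75, 80, 85, 55, 58, 63, 68,
                    72, 77, 50, 62, 67, 74, 52, 57, 69, 73])

combined = np.column_stack((heights, weights))         # b. shape (20, 2)
mean_height, mean_weight = combined.mean(axis=0)       # c. column means
shortest_idx = heights.argmin()                        # d. index of the minimum height
tallest_idx = heights.argmax()                         #    index of the maximum height
sorted_by_height = combined[combined[:, 0].argsort()]  # e. rows ordered by height
swapped = combined[:, [1, 0]]                          # f. weight column first
below_70 = combined[combined[:, 1] < 70]               # g. boolean mask on weight

print(f"Mean height: {mean_height:.1f} cm, mean weight: {mean_weight:.1f} kg")
print("Shortest at index", shortest_idx, "| tallest at index", tallest_idx)
print(below_70)
```

`np.column_stack` could equally be replaced by `np.stack((heights, weights), axis=1)`, which builds the same 20x2 array.
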

2. Write a program using the Pandas library to perform the following operations on the penguins dataset from
the Seaborn library:
A. Load the penguins dataset into a Pandas dataframe.
B. Determine the number of observations/records and the number of attributes in the dataframe.
C. Display the names of the attributes, row indexes, and data types of each attribute in the dataframe.
D. Display the first 5 and last 5 records of the dataframe.
E. Retrieve the values of the second column for the third and fourth records.
F. Display a summary of the data distribution for all attributes in the dataframe.
G. Compute the pairwise correlation between all numeric attributes in the dataframe.
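Question 2 can be sketched as a loader plus a helper that gathers each answer. This is a minimal sketch, assuming pandas 1.5 or later (for the `numeric_only` argument of `DataFrame.corr`) and an internet connection the first time `sns.load_dataset` runs; the function names `load_penguins` and `explore` are hypothetical. The penguins data has text columns (species, island, sex), so the correlation in part G is restricted to numeric attributes.

```python
import pandas as pd

# Helper names below are hypothetical, chosen for this sketch.


def load_penguins() -> pd.DataFrame:
    """A. Load Seaborn's penguins dataset (downloaded and cached on first use)."""
    import seaborn as sns  # imported here so explore() also works on any DataFrame
    return sns.load_dataset("penguins")


def explore(df: pd.DataFrame) -> dict:
    """Collect the answers to parts B to G in one dictionary."""
    n_records, n_attributes = df.shape                 # B. rows, columns
    return {
        "n_records": n_records,
        "n_attributes": n_attributes,
        "columns": list(df.columns),                   # C. attribute names
        "index": df.index,                             #    row index
        "dtypes": df.dtypes,                           #    type of each attribute
        "head": df.head(5),                            # D. first 5 records
        "tail": df.tail(5),                            #    last 5 records
        "col2_rows3_4": df.iloc[2:4, 1],               # E. positions 2-3 = 3rd and 4th rows
        "summary": df.describe(include="all"),         # F. numeric and categorical summary
        "corr": df.corr(numeric_only=True),            # G. Pearson correlation, numeric columns
    }
```

Typical use: `results = explore(load_penguins())`, then `print(results["summary"])` or any other key. `iloc` is used in part E because the question refers to positions (second column, third and fourth records) rather than labels.
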

3. Consider the Titanic dataset, which contains information about passengers on board the Titanic, including
their age, gender, passenger class, survival status, and other attributes. Write a program using the Pandas
library to perform the following operations on the Titanic dataset:
A. Load the Titanic dataset into a Pandas DataFrame.
B. Check for any duplicate records and missing values in the dataset and handle them appropriately.
C. Calculate and display the total number of passengers who survived and those who did not.
D. Filter the DataFrame to select only the records of passengers who were under the age of 18.
E. Calculate the average age of passengers in each passenger class.
F. Create a new column in the DataFrame called "Family Size" that represents the total number of family
members (including the passenger) on board.
G. Calculate the correlation between age and fare attributes of the dataset.
H. Create a contingency table that shows the count of passengers by survival status (survived or not) and passenger class (first, second, or third class).
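A minimal sketch of question 3, assuming Seaborn's copy of the Titanic data with its lowercase columns (`survived`, `pclass`, `age`, `sibsp`, `parch`, `fare`, `embarked`, `deck`); the function names are hypothetical. The cleaning policy (drop the sparsely filled `deck` column, drop rows without an embarkation port, fill missing ages with the median) is one reasonable choice, not the only one. Because this version of the data has no passenger name or ID column, rows flagged as duplicates could be different passengers with identical attributes, so `clean()` reports the count and can keep them with `drop_duplicates=False`.

```python
import pandas as pd

# Function names below are hypothetical, chosen for this sketch.


def load_titanic() -> pd.DataFrame:
    """A. Load Seaborn's copy of the Titanic data (downloaded on first use)."""
    import seaborn as sns  # imported here so the helpers below work on any DataFrame
    return sns.load_dataset("titanic")


def clean(df: pd.DataFrame, drop_duplicates: bool = True):
    """B. Report duplicates and missing values, then apply one simple policy."""
    report = {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum(),
    }
    out = df.drop_duplicates() if drop_duplicates else df.copy()
    out = out.drop(columns=["deck"], errors="ignore")   # sparsely filled column
    out = out.dropna(subset=["embarked"])                # drop rows with no port
    out = out.assign(age=out["age"].fillna(out["age"].median()))  # median imputation
    return out, report


def analyze(df: pd.DataFrame) -> dict:
    """Parts C to H on an already cleaned DataFrame."""
    # F. siblings/spouses + parents/children + the passenger themself
    df = df.assign(**{"Family Size": df["sibsp"] + df["parch"] + 1})
    return {
        "df": df,
        "survival_counts": df["survived"].value_counts().rename(
            index={0: "did not survive", 1: "survived"}),              # C
        "minors": df[df["age"] < 18],                                   # D
        "mean_age_by_class": df.groupby("pclass")["age"].mean(),        # E
        "age_fare_corr": df["age"].corr(df["fare"]),                    # G. Pearson r
        "survival_by_class": pd.crosstab(df["survived"], df["pclass"]), # H
    }
```

Typical use: `cleaned, report = clean(load_titanic())`, then `results = analyze(cleaned)`. Because ages are imputed before parts D, E and G run, those answers depend on the imputation choice; removing the `fillna` step restricts them to recorded ages, since pandas skips missing values in `mean` and `corr`.
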

4. Use the iris dataset from the scikit-learn library to generate various visual representations of the data using the Matplotlib and/or Seaborn libraries, with proper legends and labels. Perform the following tasks:

A. Create a scatter plot to visualize the relationship between petal length and petal width for different
instances of iris flowers.
B. Generate histograms to display the data distribution of each of the four attributes in the iris dataset.
C. Construct a pie chart to illustrate the frequency count of each flower type in the iris dataset.
D. Create a pair plot that shows the relationship between every pair of attributes in the iris dataset (using the Seaborn library only).
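A minimal sketch of question 4, assuming scikit-learn 0.23 or later (for `load_iris(as_frame=True)`) and a `species` column built from `iris.target_names`; the helper names are hypothetical. Each helper returns the Matplotlib (or Seaborn) objects, so they can be displayed with `plt.show()` or saved with `fig.savefig()`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Helper names below are hypothetical, chosen for this sketch.
# Column names used by scikit-learn's iris DataFrame:
FEATURES = ["sepal length (cm)", "sepal width (cm)",
            "petal length (cm)", "petal width (cm)"]


def load_iris_frame() -> pd.DataFrame:
    """Return the iris data with a readable 'species' column."""
    from sklearn.datasets import load_iris
    iris = load_iris(as_frame=True)
    names = dict(enumerate(iris.target_names))          # 0 -> 'setosa', ...
    return iris.frame.assign(species=iris.target.map(names))


def scatter_petals(df: pd.DataFrame):
    """A. Petal length vs width, one color and legend entry per species."""
    fig, ax = plt.subplots()
    for name, group in df.groupby("species"):
        ax.scatter(group["petal length (cm)"], group["petal width (cm)"], label=name)
    ax.set_xlabel("Petal length (cm)")
    ax.set_ylabel("Petal width (cm)")
    ax.set_title("Iris: petal length vs petal width")
    ax.legend(title="Species")
    return fig, ax


def feature_histograms(df: pd.DataFrame, bins: int = 15):
    """B. One histogram per attribute on a 2x2 grid of subplots."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, col in zip(axes.ravel(), FEATURES):
        ax.hist(df[col], bins=bins, edgecolor="black")
        ax.set_xlabel(col)
        ax.set_ylabel("Frequency")
        ax.set_title(f"Distribution of {col}")
    fig.tight_layout()
    return fig, axes


def species_pie(df: pd.DataFrame):
    """C. Share of each species, labeled with percentages."""
    counts = df["species"].value_counts()
    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%", startangle=90)
    ax.set_title("Frequency count of each iris species")
    return fig, ax


def pair_plot(df: pd.DataFrame):
    """D. Pairwise relationships of all four attributes (Seaborn only)."""
    import seaborn as sns
    return sns.pairplot(df, vars=FEATURES, hue="species")
```

Typical use: `df = load_iris_frame()`, then `scatter_petals(df)` (or any other helper) followed by `plt.show()`. Passing `vars=FEATURES` keeps the numeric `target` column out of the pair plot.
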

5. Create the visualizations from question 4 (parts A and C) using the Plotly library.

Contributors:
