Lab #2 - Data Analysis With NumPy and Pandas

This document outlines a lab assignment for a course on AI and Machine Learning using Python, focusing on data analysis with NumPy and Pandas. It includes instructions for various tasks such as creating and manipulating arrays, working with Pandas Series and DataFrames, reading data from CSV files, and analyzing datasets. Additionally, it emphasizes collaborative learning through discussions on key concepts learned during the lab.

Uploaded by

wasaykhan1219


PROG25211 AI and Machine Learning - Python

Lab #2
Data Analysis with NumPy and Pandas

INSTRUCTIONS
1. All activities in this lab should be conducted individually, unless indicated otherwise.
2. Apply the material seen in class and video/readings to complete the steps in the lab, answering the
questions and taking screenshots of your work and pasting them into this document as indicated.
3. Upload this document with your screenshots and/or answers to the appropriate dropbox in the assignments
section before the due date and time indicated in Slate.
4. Complete the corresponding self-assessment in the quizzes section within 3 days of submitting this
document with your answers.

PART #1 – WORKING WITH NUMPY - ARRAYS FROM PYTHON LISTS AND ARRAY FUNCTIONS
OBJECTIVE: To practice using NumPy to generate and manipulate arrays in Python by applying fundamental
array operations such as creating sequences, generating random numbers, and indexing to extract
values such as the maximum and minimum. This exercise reinforces basic array syntax, introduces random
number generation, and strengthens data exploration skills through hands-on coding and output interpretation.

CONDUCT THE FOLLOWING STEPS:

1. Create a new Jupyter Notebook.
2. Write the Python code according to the instructions below (add the instructions as comments) and take a
screenshot of the code and results and paste it in the box below:
a. Import the NumPy library
b. Create a Python list with 1,2,3
c. Create a NumPy array from the list
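
A minimal sketch of step 2, assuming the variable names my_list and my_array (they are illustrative and not given in the lab):

```python
import numpy as np  # a. Import the NumPy library

# b. Create a Python list with 1, 2, 3 (my_list is a hypothetical name)
my_list = [1, 2, 3]

# c. Create a NumPy array from the list
my_array = np.array(my_list)

print(my_list)         # a plain Python list
print(my_array)        # a NumPy ndarray holding the same values
print(type(my_array))
```

The array supports element-wise arithmetic (for example my_array * 2), which a plain list does not.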

3. Write the Python code according to the instructions below (add the instructions as comments) and take a
screenshot of the code and results and paste it in the box below:
a. Generate an array from 0-9
b. Generate an array from 0-10 with a step of 2
c. Generate an array size 10 of random numbers from 0-1
d. Generate an array size 10 of random numbers from 0-10
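
One possible reading of step 3 is sketched below. It assumes that "0-10" in (b) includes 10 and that (d) asks for floating-point values; if integers are wanted, np.random.randint could be used instead. All variable names are illustrative.

```python
import numpy as np

# a. Array from 0 to 9 (the stop value 10 is exclusive)
zero_to_nine = np.arange(10)

# b. Array from 0 to 10 with a step of 2 (stop=11 so that 10 is included)
evens = np.arange(0, 11, 2)

# c. Ten random floats drawn uniformly from [0, 1)
rand_unit = np.random.rand(10)

# d. Ten random floats in [0, 10), obtained by scaling (assumption: floats, not integers)
rand_ten = np.random.rand(10) * 10

print(zero_to_nine)
print(evens)
print(rand_unit)
print(rand_ten)
```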

4. Write the Python code according to the instructions below (add the instructions as comments) and take a
screenshot of the code and results and paste it in the box below:
a. Create an array of 10 random integers between 1 and 100.
b. What is the max value in the array?
c. What is the index of the max value?
d. What is the minimum value?
e. What is the index of the minimum value?
f. What is the value at index 3?
g. What are the values at index 2 through 4?
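
Step 4 could be approached as in the sketch below. It assumes "between 1 and 100" is inclusive on both ends, which is why the upper bound passed to randint is 101. The variable names are illustrative.

```python
import numpy as np

# a. Ten random integers from 1 to 100 (the upper bound of randint is exclusive)
values = np.random.randint(1, 101, size=10)

max_value = values.max()      # b. largest value
max_index = values.argmax()   # c. index of the largest value
min_value = values.min()      # d. smallest value
min_index = values.argmin()   # e. index of the smallest value
at_three = values[3]          # f. value at index 3
two_to_four = values[2:5]     # g. indexes 2, 3 and 4 (slice end is exclusive)

print(values)
print(max_value, max_index, min_value, min_index)
print(at_three, two_to_four)
```

Because the numbers are random, the printed values differ on every run unless a seed is set.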

PART #2 – WORKING WITH PANDAS DATA STRUCTURE (SETS AND SERIES)

OBJECTIVE: To understand that Pandas is an open-source library that provides high-performance, user-friendly
data structures and data analysis tools for Python. Explore data structures such as Series and the concept of
key-values.

5. Create lists and a dictionary using the steps below, then take a screenshot of the code and
results and paste it in the box below:
a. Import the pandas library (numpy is also needed)
b. Create a list named labels containing 'a', 'b', 'c' for a label column.
c. Create a list named values1 with the values 7, 8, 9.
d. Create a dictionary with keys 'a', 'b', 'c', 'd' and corresponding values 10, 11, 12, 13.

6. Create set1 and set2 as Pandas Series using values1/labels and dictionary, then add them to get set3.
Take a screenshot of the code and results and paste it in the box below:
a. Create set1 as a Pandas Series using values1 and labels for data and index.
b. Create set2 as a Pandas Series from the dictionary.
c. Add set1 and set2 to create set3.
d. Display set3.
e. Get the value by the key ‘a’
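
Steps 5 and 6 can be sketched together as follows. The name dictionary is an assumption, since the lab does not name the dictionary variable; the other names come from the steps.

```python
import numpy as np
import pandas as pd

# Step 5: the label list, the value list and the dictionary
labels = ['a', 'b', 'c']
values1 = [7, 8, 9]
dictionary = {'a': 10, 'b': 11, 'c': 12, 'd': 13}  # hypothetical variable name

# Step 6a: Series with values1 as data and labels as index
set1 = pd.Series(data=values1, index=labels)

# Step 6b: Series from the dictionary (keys become the index)
set2 = pd.Series(dictionary)

# Step 6c: addition aligns on index labels; 'd' exists only in set2, so it becomes NaN
set3 = set1 + set2

# Step 6d: display the result
print(set3)

# Step 6e: look up a value by its key
value_a = set3['a']
print(value_a)
```

Note that set3 has a float dtype, because the missing 'd' entry in set1 forces a NaN into the result.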

PART #3 – USING DATA FRAMES IN PANDAS

OBJECTIVE: To introduce Pandas DataFrames, a 2-dimensional data structure similar to a SQL table or an Excel
spreadsheet, to analyze and manipulate datasets. Explore the manipulation of rows and columns in DataFrames,
including extracting subframes and adding and removing columns.

7. Import NumPy and Pandas, then create a DataFrame using three lists of random integers, with labels
['a','b','c'] and columns ['W','X','Y','Z']. Take a screenshot of the code and results and paste it in the box
below:
a. Import NumPy and Pandas.
b. Generate three lists of four random integers between 1 and 20 using NumPy.
c. Define a list named labels with values ['a', 'b', 'c'].
d. Define a list named cols with values ['W', 'X', 'Y', 'Z'].
e. Combine the three lists into a list of lists, dataArray.
f. Create a DataFrame dataFrame using dataArray as data, labels as index, and cols as columns.
g. Display the DataFrame.
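
A minimal sketch of step 7 follows. The names list1, list2 and list3 are assumptions, and "between 1 and 20" is read as inclusive.

```python
import numpy as np
import pandas as pd

# b. Three lists of four random integers from 1 to 20 (upper bound of randint is exclusive)
list1 = np.random.randint(1, 21, 4).tolist()
list2 = np.random.randint(1, 21, 4).tolist()
list3 = np.random.randint(1, 21, 4).tolist()

# c, d. Row labels and column names
labels = ['a', 'b', 'c']
cols = ['W', 'X', 'Y', 'Z']

# e. List of lists: each inner list becomes one row
dataArray = [list1, list2, list3]

# f. Build the DataFrame
dataFrame = pd.DataFrame(data=dataArray, index=labels, columns=cols)

# g. Display it
print(dataFrame)
```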

8. To display column data from a DataFrame, use the column name to access it, which returns a Pandas
Series. Take a screenshot of the code and results and paste it in the box below:
a. Access the 'X' column of the DataFrame, which produces a Pandas Series.
b. Use double brackets with ['X', 'Y'] to select these columns, producing a DataFrame.
c. Access the 'Z' column of the DataFrame using dot notation, which produces a Pandas Series.
d. Create a new column in the DataFrame, sum the 'X' and 'Y' columns, and name the new column
'Sum(X+Y)'
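
The selections in step 8 can be sketched as below. To keep the example self-contained, the DataFrame is rebuilt here with fixed placeholder numbers instead of the random values from step 7; the names x_col, xy_cols and z_col are illustrative.

```python
import pandas as pd

# Placeholder frame with fixed values (stand-in for the random frame from step 7)
dataFrame = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=['a', 'b', 'c'],
    columns=['W', 'X', 'Y', 'Z'],
)

x_col = dataFrame['X']           # a. single brackets return a Series
xy_cols = dataFrame[['X', 'Y']]  # b. double brackets (a list of names) return a DataFrame
z_col = dataFrame.Z              # c. dot notation also returns a Series

# d. New column holding the element-wise sum of X and Y
dataFrame['Sum(X+Y)'] = dataFrame['X'] + dataFrame['Y']

print(type(x_col), type(xy_cols), type(z_col))
print(dataFrame)
```

Dot notation only works when the column name is a valid Python identifier, so the new 'Sum(X+Y)' column has to be accessed with brackets.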

9. Columns and rows can also be removed from the data frame. Take a screenshot of the code and results
and paste it in the box below:
a. Use the drop function (method in the data frame) to remove the sum column. To make changes
permanent, use inplace=True. Set axis=1 to specify column operations, such as removing the
'Sum(X+Y)' column.
b. Display the data frame after removing the sum column.
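
A short sketch of step 9, again using a placeholder frame with fixed values so that it runs on its own:

```python
import pandas as pd

# Placeholder frame that already contains the sum column
dataFrame = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=['a', 'b'],
    columns=['W', 'X', 'Y', 'Z'],
)
dataFrame['Sum(X+Y)'] = dataFrame['X'] + dataFrame['Y']

# a. axis=1 targets columns; inplace=True modifies dataFrame instead of returning a copy
dataFrame.drop('Sum(X+Y)', axis=1, inplace=True)

# b. Display the result
print(dataFrame)
```

Without inplace=True, drop returns a new DataFrame and leaves the original unchanged, so the result would have to be assigned back.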

10. Rows can be added to the data frame. Take a screenshot of the code and results and paste it in the box
below:
a. Create a new DataFrame newRow with a single row [1, 2, 3, 4] and columns ['W', 'X', 'Y', 'Z'],
and index ['d'].
b. Use the concat function in Pandas to concatenate newRow to the existing dataFrame.
c. Display the updated DataFrame.
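
Step 10 might look like the following sketch. The existing frame is a placeholder with fixed values; newRow follows the step description.

```python
import pandas as pd

# Placeholder for the existing frame
dataFrame = pd.DataFrame(
    [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=['a', 'b', 'c'],
    columns=['W', 'X', 'Y', 'Z'],
)

# a. One-row DataFrame with index 'd' (note the nested list: one inner list per row)
newRow = pd.DataFrame([[1, 2, 3, 4]], columns=['W', 'X', 'Y', 'Z'], index=['d'])

# b. concat returns a new DataFrame, so the result is assigned back
dataFrame = pd.concat([dataFrame, newRow])

# c. Display the updated frame
print(dataFrame)
```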

PART #4 – USING PANDAS FOR READING DATA FRAMES FROM CSV FILES

OBJECTIVE: To read a dataset and start preparing it for analysis using methods in the Pandas library.
Determine which information is useful, what can be removed, and how to adjust the data for easier analysis.

It is important to know what each column is and what its values represent.

● PassengerId - an index value assigned to the data entry.
● Survived - 1 if they survived, 0 otherwise.
● Pclass - Ticket class: 1 = 1st, 2 = 2nd, 3 = 3rd
● Name - Name of the passenger.
● Sex - male or female for each passenger.
● Age - age of the passenger
● SibSp - No. of siblings / spouses aboard the Titanic
● Parch - No. of parents / children aboard the Titanic
● Ticket - Ticket Number
● Fare - Passenger Fare
● Embarked - Indicates where the passenger boarded.

11. Conduct the following steps, adding the steps as markdown comments, and take a screenshot of the
code and results and paste it in the box below:
a. We are going to read in a data set for the Titanic passenger list. Download it from:
https://www.kaggle.com/datasets/yasserh/titanic-dataset?select=Titanic-Dataset.csv
b. Import NumPy and Pandas.
c. Read the Titanic dataset from 'Titanic-Dataset.csv' into a DataFrame named titanicDataFrame.
d. Use the head function (method in the dataframe object) to display the first 5 records as well as
column names
e. Use the info function (method in the dataframe object) to display metadata information about
columns.
f. From the data frame information, how would you identify columns in the data frame that have
null values? Write the answer in the following box after the screenshot of the code and results.
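
The reading and inspection calls in step 11 can be sketched as below. So that the example runs without the download, it reads three made-up placeholder rows from an in-memory string; these rows are not taken from the real dataset. In the lab, the call would be pd.read_csv('Titanic-Dataset.csv').

```python
import io
import pandas as pd

# Made-up placeholder rows (NOT real passenger data), standing in for Titanic-Dataset.csv
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age\n"
    "1,0,3,Passenger A,male,22\n"
    "2,1,1,Passenger B,female,\n"
    "3,1,3,Passenger C,female,26\n"
)

# c. In the lab: titanicDataFrame = pd.read_csv('Titanic-Dataset.csv')
titanicDataFrame = pd.read_csv(sample_csv)

# d. First 5 records with column names
print(titanicDataFrame.head())

# e. Column metadata: dtype and non-null count per column
titanicDataFrame.info()

# Optional cross-check: number of missing values per column
null_counts = titanicDataFrame.isnull().sum()
print(null_counts)
```

In the info output, a column whose non-null count is lower than the total number of entries contains missing values; the isnull().sum() call is one way to confirm this.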

12. Review the dataset to identify and remove columns with excessive null data, minimal relevance or
unknown value meanings.
13. In the box below, list the columns that are candidates for removal and provide a solid rationale (at least 3
columns). Hint: age column should not be removed.

14. Remove the 3 columns identified above using the drop method.
15. Display the updated DataFrame information to review changes. Take a screenshot of the code and
results and paste it in the box below:
a. Delete the 3 columns as indicated using the drop method.
b. Display the first 5 rows of the DataFrame using the head function.
16. After reviewing and removing columns with excessive null data or minimal relevance, remove all rows
with NaN values in the 'Age' column. Take a screenshot of the code and results and paste it in the box
below:
a. Use the index property to create a list of row indexes where Age is NaN.
b. Use the list of indexes to remove those rows from the data frame.
c. Check the updated DataFrame information.
d. Convert the 'Sex' column to numerical values with male=0 and female=1 for easier analysis.
e. Check the updated DataFrame information.
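
One way to carry out step 16 is sketched below on a small made-up frame (placeholder values, not real data). The name nan_age_rows is illustrative, and Series.map is one of several ways to do the conversion in (d).

```python
import numpy as np
import pandas as pd

# Made-up placeholder frame with two missing ages
titanicDataFrame = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, np.nan, 26.0, np.nan],
})

# a. Index labels of the rows where Age is NaN
nan_age_rows = titanicDataFrame[titanicDataFrame['Age'].isnull()].index.tolist()

# b. Drop those rows (axis=0, the default, targets rows)
titanicDataFrame.drop(nan_age_rows, inplace=True)

# c. Check the result
titanicDataFrame.info()

# d. Map the text categories to numbers: male=0, female=1
titanicDataFrame['Sex'] = titanicDataFrame['Sex'].map({'male': 0, 'female': 1})

# e. Check again; Sex should now have an integer dtype
titanicDataFrame.info()
```

Any value in the Sex column that is not a key of the mapping would become NaN, so it is worth checking the unique values before mapping.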

PART #5 – USING PANDAS FOR ANALYZING DATA SETS

OBJECTIVE: To download and examine a dataset in the form of a csv file. Explore the data and use methods in
Pandas library to describe the characteristics of the data (descriptive analytics) such as filtering data and
identifying max, min and average values.

17. Conduct the following steps, adding the steps as markdown comments, and take a screenshot of the
code and results and paste it in the box below:
a. Import the appropriate libraries
b. Download the dataset from:
https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023
c. Read the file into a data frame
d. Display the information about the data frame
e. Display the first 5 records of the data frame
f. Create a data frame called canDF containing only the employees whose residence is Canada

A) How many of these employees have residence in Canada?

B) What is the min, max, and average salary for your canDF?

C) What is the number of unique values for the job title for Canadian employees?

D) What is the name of the highest paid employee for Canadian employees?
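
The filtering and summary calls behind questions A to D can be sketched as follows. The frame below is made-up placeholder data, and the column names employee_residence, salary_in_usd and job_title, as well as the country code 'CA', are assumptions about the downloaded file that should be checked against the info() output. The name salaryDF is also hypothetical.

```python
import pandas as pd

# Made-up placeholder data; column names are assumed, verify them with info()
salaryDF = pd.DataFrame({
    'job_title': ['Data Scientist', 'Data Engineer', 'Data Scientist', 'ML Engineer'],
    'salary_in_usd': [100000, 120000, 90000, 150000],
    'employee_residence': ['CA', 'US', 'CA', 'CA'],
})

# f. Boolean filter keeps only rows with Canadian residence
canDF = salaryDF[salaryDF['employee_residence'] == 'CA']

# A) Number of rows in the filtered frame
canadian_count = len(canDF)

# B) Minimum, maximum and mean salary
stats = canDF['salary_in_usd'].agg(['min', 'max', 'mean'])

# C) Number of distinct job titles
unique_titles = canDF['job_title'].nunique()

# D) Row with the highest salary; idxmax returns its index label
top_row = canDF.loc[canDF['salary_in_usd'].idxmax()]

print(canadian_count)
print(stats)
print(unique_titles)
print(top_row)
```

If the file turns out to have no column with employee names, the job title in top_row is the closest available description of the highest-paid entry.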

18. Provide a screenshot of the code and results and paste it in the box below:

PART #6 – LESSONS LEARNED ABOUT DATA ANALYSIS WITH NUMPY AND PANDAS
OBJECTIVE: After exploring data structures in Python and analyzing datasets using libraries such as Pandas,
students will work in teams of two (assigned by the instructor) to reflect on their learning. Together, each team
will discuss and identify the top concepts they found most important or impactful from the lesson and lab
activities. Teams will then collaborate to write a joint conclusion, summarizing their key takeaways and insights.
This exercise aims to reinforce understanding, encourage critical reflection, and develop collaborative
communication skills.

CONDUCT THE FOLLOWING STEPS:

1. Pair up with a classmate designated by the instructor and have your lab answers and notes ready.
2. Hold a discussion with your teammate and brainstorm at least 5 key concepts that were learned from the
lectures, videos or lab activities in this lesson. Each of the concepts should be stated in the following
way:
The concept of ____ is used for _______ and ____ and ____ . It is important since _____.

Example:
The concept of computers being programmable is used for better understanding programming
languages and how to select a programming language and how to design new systems that are more
flexible and powerful. It is important since programmable computers are key tools for professionals and
careers are built around the craft of programming computers.

3. Rate each of the key concepts from 1 to 5 (5 is the highest) according to its impact on your
knowledge and understanding of data analysis with NumPy and Pandas.
4. Copy and paste the joint top 3 key concepts in the following box:

5. Based on the top 3 concepts, write up a joint summary of the concepts and write a solid conclusion that
can be drawn from the lessons learned.
6. Copy and paste the joint summary and conclusion.

RUBRIC

CRITERIA | % | LEVEL 3 (100%) | LEVEL 2 (50%) | LEVEL 1 (25%) | LEVEL 0 (0%)
Part #1 – Working With Numpy - Arrays From Python Lists And Array Functions | 15% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
Part #2 – Working With Pandas Data Structure (Sets And Series) | 15% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
Part #3 – Using Data Frames In Pandas | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
Part #4 – Using Pandas For Reading Data Frames From Csv Files | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
Part #5 – Using Pandas For Analyzing Data Sets | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
Part #6 – Lessons Learned about Data Analysis with Numpy and Pandas | 10% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed.
