Lab #2 - Data Analysis With NumPy and Pandas

This document outlines a lab assignment for a course on AI and Machine Learning using Python, focusing on data analysis with NumPy and Pandas. It includes instructions for various tasks such as creating and manipulating arrays, working with Pandas Series and DataFrames, reading data from CSV files, and analyzing datasets. Additionally, it emphasizes collaborative learning through discussions on key concepts learned during the lab.

Uploaded by

wasaykhan1219
Copyright
© All Rights Reserved
PROG25211 AI and Machine Learning - Python

Lab #2
Data Analysis with NumPy and Pandas

INSTRUCTIONS
1. All activities in this lab should be conducted individually, unless indicated otherwise.
2. Apply the material seen in class and video/readings to complete the steps in the lab, answering the
questions and taking screenshots of your work and pasting them into this document as indicated.
3. Upload this document with your screenshots and/or answers to the appropriate dropbox in the assignments
section before the due date and time indicated in Slate.
4. Complete the corresponding self-assessment in the quizzes section within 3 days of submitting this
document with your answers.

PART #1 – WORKING WITH NUMPY - ARRAYS FROM PYTHON LISTS AND ARRAY FUNCTIONS

OBJECTIVE: To practice using NumPy to generate and manipulate arrays in Python. Apply fundamental
array operations such as creating sequences, generating random numbers, and indexing to extract
specific and statistical values. This exercise reinforces basic array syntax, introduces random number
generation, and strengthens data exploration skills through hands-on coding and output interpretation.

CONDUCT THE FOLLOWING STEPS:


1. Create a new Jupyter Notebook.
2. Write the Python code according to the instructions below (add the instructions as comments), then take a
screenshot of the code and results and paste it in the box below:
a. Import the NumPy library
b. Create a Python list with 1,2,3
c. Create a NumPy array from the list
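
As a minimal sketch of step 2, the conversion from a list to an array might look like the following. The variable names my_list and my_array are illustrative assumptions, not names required by the lab.

```python
import numpy as np

# Create a Python list with 1, 2, 3
my_list = [1, 2, 3]

# Create a NumPy array from the list
my_array = np.array(my_list)

print(my_array)        # the array built from the list
print(type(my_array))  # shows that the result is a numpy.ndarray
```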

3. Write the Python code according to the instructions below (add the instructions as comments), then take a
screenshot of the code and results and paste it in the box below:
a. Generate an array from 0-9
b. Generate an array from 0-10 with a step of 2
c. Generate an array size 10 of random numbers from 0-1
d. Generate an array size 10 of random numbers from 0-10
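
One possible way to approach step 3 is sketched below. It assumes the random values are floats and uses NumPy's Generator API with an arbitrary seed; the course material may use np.random.rand or np.random.randint instead, and the variable names are illustrative.

```python
import numpy as np

# Seeded generator so the output is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)

# Array from 0 to 9 (the stop value 10 is excluded)
zero_to_nine = np.arange(10)

# Array from 0 to 10 with a step of 2 (stop is exclusive, so use 11 to include 10)
evens = np.arange(0, 11, 2)

# 10 random floats in the half-open interval [0, 1)
rand_unit = rng.random(10)

# 10 random floats in the half-open interval [0, 10)
rand_ten = rng.uniform(0, 10, 10)

print(zero_to_nine)
print(evens)
print(rand_unit)
print(rand_ten)
```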

4. Write the Python code according to the instructions below (add the instructions as comments), then take a
screenshot of the code and results and paste it in the box below:
a. Create an array of 10 random integers between 1 and 100.
b. What is the max value in the array?
c. What is the index of the max value?
d. What is the minimum value?
e. What is the index of the minimum value?
f. What is the value at index 3?
g. What are the values at index 2 through 4?
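
The questions in step 4 could be answered along the following lines. This is a sketch that assumes "between 1 and 100" includes both endpoints; the seed and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# 10 random integers from 1 to 100; endpoint=True makes the upper bound inclusive
values = rng.integers(1, 100, size=10, endpoint=True)

max_value = values.max()       # largest value
max_index = values.argmax()    # position of the largest value
min_value = values.min()       # smallest value
min_index = values.argmin()    # position of the smallest value
at_three = values[3]           # value at index 3
two_through_four = values[2:5] # indexes 2, 3 and 4 (slice end is exclusive)

print(values)
print(max_value, max_index, min_value, min_index)
print(at_three, two_through_four)
```

Note that the slice 2:5 is needed to include index 4, since the end of a slice is not included.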

PART #2 – WORKING WITH PANDAS DATA STRUCTURE (SETS AND SERIES)


OBJECTIVE: To understand that Pandas is an open-source library that provides high-performance, user-friendly
data structures and data analysis tools for Python. Explore data structures such as Series and the concept of
key-values.

5. Create lists and dictionaries using the steps below, then take a screenshot of the code and
results and paste it in the box below:
a. Import the Pandas library (NumPy is also needed)
b. Create a list named labels containing 'a', 'b', 'c' for a label column.
c. Create a list named values1 with the values 7, 8, 9.
d. Create a dictionary with keys 'a', 'b', 'c', 'd' and corresponding values 10, 11, 12, 13.

6. Create set1 and set2 as Pandas Series using values1/labels and dictionary, then add them to get set3.
Take a screenshot of the code and results and paste it in the box below:
a. Create set1 as a Pandas Series using values1 and labels for data and index.
b. Create set2 as a Pandas Series from the dictionary.
c. Add set1 and set2 to create set3.
d. Display set3.
e. Get the value by the key ‘a’
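
Steps 5 and 6 can be sketched as follows. The dictionary variable name is an assumption, since the lab does not name it. Because Series addition aligns on index labels, the key 'd' exists only in set2 and is expected to come out as NaN in set3.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
values1 = [7, 8, 9]
dictionary = {'a': 10, 'b': 11, 'c': 12, 'd': 13}  # variable name is illustrative

# Series from a list of values with an explicit index
set1 = pd.Series(data=values1, index=labels)

# Series from a dictionary: keys become the index
set2 = pd.Series(dictionary)

# Addition matches values by index label, not by position
set3 = set1 + set2
print(set3)

# Look up a value by its key
print(set3['a'])

# 'd' has no partner in set1, so the sum is missing
print(np.isnan(set3['d']))
```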

PART #3 – USING DATA FRAMES IN PANDAS


OBJECTIVE: To introduce Pandas DataFrames, a 2-dimensional data structure similar to a SQL table or an
Excel spreadsheet, to analyze and manipulate datasets. Explore the manipulation of rows and columns in
DataFrames, including extracting subframes and adding and removing columns.

7. Import NumPy and Pandas, then create a DataFrame using three lists of random integers, with labels
['a','b','c'] and columns ['W','X','Y','Z']. Take a screenshot of the code and results and paste it in the box
below:
a. Import NumPy and Pandas.
b. Generate three lists of four random integers between 1 and 20 using NumPy.
c. Define a list named labels with values ['a', 'b', 'c'].
d. Define a list named cols with values ['W', 'X', 'Y', 'Z'].
e. Combine the three lists into a list of lists, dataArray.
f. Create a DataFrame dataFrame using dataArray as data, labels as index, and cols as columns.
g. Display the DataFrame.
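
A minimal sketch of step 7 is shown below. It assumes the Generator API with an arbitrary seed; row1, row2 and row3 are illustrative names, while labels, cols, dataArray and dataFrame follow the lab text.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

# Three lists of four random integers between 1 and 20 (upper bound 21 is exclusive)
row1 = rng.integers(1, 21, size=4).tolist()
row2 = rng.integers(1, 21, size=4).tolist()
row3 = rng.integers(1, 21, size=4).tolist()

labels = ['a', 'b', 'c']
cols = ['W', 'X', 'Y', 'Z']

# List of lists: each inner list becomes one row
dataArray = [row1, row2, row3]

dataFrame = pd.DataFrame(data=dataArray, index=labels, columns=cols)
print(dataFrame)
```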

8. To display column data from a DataFrame, use the column name to access it, which returns a Pandas
Series. Take a screenshot of the code and results and paste it in the box below:
a. Access the 'X' column of the DataFrame, which produces a Pandas Series.
b. Use double brackets with ['X', 'Y'] to select these columns, producing a DataFrame.
c. Access the 'Z' column of the DataFrame using dot notation, which produces a Pandas Series.
d. Create a new column in the DataFrame, sum the 'X' and 'Y' columns, and name the new column
'Sum(X+Y)'
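
The column operations in step 8 might look like this. The sketch uses fixed illustrative values in place of the lab's random data so the results are easy to check; x_series, xy_frame and z_series are assumed names.

```python
import pandas as pd

# Fixed values stand in for the random data generated in the lab
dataFrame = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=['a', 'b', 'c'],
    columns=['W', 'X', 'Y', 'Z'],
)

# Single brackets with one name return a Series
x_series = dataFrame['X']

# Double brackets (a list of names) return a DataFrame
xy_frame = dataFrame[['X', 'Y']]

# Dot notation also returns a Series (works only for valid Python identifiers)
z_series = dataFrame.Z

# Assigning to a new column name adds the column
dataFrame['Sum(X+Y)'] = dataFrame['X'] + dataFrame['Y']

print(type(x_series), type(xy_frame), type(z_series))
print(dataFrame)
```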

9. Columns and rows can also be removed from the data frame. Take a screenshot of the code and results
and paste it in the box below:
a. Use the drop function (a method of the data frame) to remove the 'Sum(X+Y)' column. Set axis=1
to indicate that a column, not a row, is being removed. To modify the data frame itself instead
of returning a modified copy, use inplace=True.
b. Display the data frame after removing the sum column.
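
Step 9 could be sketched as follows, again with fixed illustrative values rather than the lab's random data.

```python
import pandas as pd

# Illustrative frame that already contains the sum column
dataFrame = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=['a', 'b', 'c'],
    columns=['W', 'X', 'Y', 'Z'],
)
dataFrame['Sum(X+Y)'] = dataFrame['X'] + dataFrame['Y']

# axis=1 targets columns; inplace=True modifies dataFrame instead of returning a copy
dataFrame.drop('Sum(X+Y)', axis=1, inplace=True)

print(dataFrame)
```

An alternative with the same effect is to reassign the result, as in dataFrame = dataFrame.drop('Sum(X+Y)', axis=1).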

10. Rows can be added to the data frame. Take a screenshot of the code and results and paste it in the box
below:
a. Create a new DataFrame newRow with a single row [1, 2, 3, 4] and columns ['W', 'X', 'Y', 'Z'],
and index ['d'].
b. Use the concat function in Pandas to concatenate newRow to the existing dataFrame.
c. Display the updated DataFrame.
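
A sketch of step 10 is given below, with fixed illustrative values for the existing rows. Note that concat returns a new DataFrame, so the result has to be assigned back.

```python
import pandas as pd

# Illustrative existing frame
dataFrame = pd.DataFrame(
    [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=['a', 'b', 'c'],
    columns=['W', 'X', 'Y', 'Z'],
)

# A one-row DataFrame: note the nested list, one inner list per row
newRow = pd.DataFrame([[1, 2, 3, 4]], columns=['W', 'X', 'Y', 'Z'], index=['d'])

# concat stacks the frames vertically by default (axis=0)
dataFrame = pd.concat([dataFrame, newRow])

print(dataFrame)
```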

PART #4 – USING PANDAS FOR READING DATA FRAMES FROM CSV FILES

OBJECTIVE: To read a dataset and start preparing it for analysis using methods in the Pandas library.
Determine which information is useful, what can be removed, and how to adjust the data for easier analysis.

It is important to know what each column is and what its values represent.

● PassengerId - An index value assigned to the data entry.
● Survived - 1 if the passenger survived, 0 otherwise.
● Pclass - Ticket class: 1 = 1st, 2 = 2nd, 3 = 3rd.
● Name - Name of the passenger.
● Sex - male or female for each passenger.
● Age - Age of the passenger.
● SibSp - No. of siblings / spouses aboard the Titanic.
● Parch - No. of parents / children aboard the Titanic.
● Ticket - Ticket number.
● Fare - Passenger fare.
● Embarked - Indicates where the passenger boarded.

11. Conduct the following steps, adding the steps as markdown comments, and take a screenshot of the
code and results and paste it in the box below:
a. We are going to read in a data set for the Titanic passenger list. Download it from:
https://www.kaggle.com/datasets/yasserh/titanic-dataset?select=Titanic-Dataset.csv
b. Import NumPy and Pandas.
c. Read the Titanic dataset from 'Titanic-Dataset.csv' into a DataFrame named titanicDataFrame.
d. Use the head function (method in the dataframe object) to display the first 5 records as well as
column names
e. Use the info function (method in the dataframe object) to display metadata information about
columns.
f. From the data frame information, how would you identify columns in the data frame that have
null values? Write the answer in the following box after the screenshot of the code and results.
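
The reading and inspection calls in step 11 can be sketched as below. To keep the example self-contained, it reads a few made-up rows from an in-memory string instead of the real file; the rows and the reduced column set are invented for illustration only. In the lab, the call would be pd.read_csv('Titanic-Dataset.csv').

```python
import io
import pandas as pd

# Made-up sample rows standing in for 'Titanic-Dataset.csv' (not real passenger data)
sample_csv = io.StringIO(
    "PassengerId,Survived,Sex,Age,Fare\n"
    "1,0,male,22,7.25\n"
    "2,1,female,,71.28\n"
    "3,1,female,26,7.92\n"
)

titanicDataFrame = pd.read_csv(sample_csv)

# First 5 records (or fewer) together with the column names
print(titanicDataFrame.head())

# Column names, non-null counts and dtypes
titanicDataFrame.info()

# Explicit count of missing values per column
null_counts = titanicDataFrame.isna().sum()
print(null_counts)
```

In the info output, a column whose non-null count is lower than the total number of entries contains missing values; isna().sum() reports the same information as a count per column.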

12. Review the dataset to identify and remove columns with excessive null data, minimal relevance or
unknown value meanings.
13. In the box below, list the columns that are candidates for removal (at least 3 columns) and provide a
solid rationale for each. Hint: the Age column should not be removed.

14. Remove the 3 columns identified above using the drop method.
15. Display the updated DataFrame information to review changes. Take a screenshot of the code and
results and paste it in the box below:
a. Delete the 3 columns as indicated using the drop method.
b. Display the first 5 rows of the DataFrame using the head function.
16. After reviewing and removing columns with excessive null data or minimal relevance, remove all rows
with NaN values in the 'Age' column. Take a screenshot of the code and results and paste it in the box
below:
a. Use the index property to create a list of row indexes where Age is NaN.
b. Use the list of indexes to remove those rows from the data frame.
c. Check the updated DataFrame information.
d. Convert the 'Sex' column to numerical values with male=0 and female=1 for easier analysis.
e. Check the updated DataFrame information.
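
One way to carry out step 16 is sketched below on a small made-up frame; the values are invented and nan_age_indexes is an assumed name. The same result could be reached with dropna(subset=['Age']), but the lab asks for the index-based approach.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
titanicDataFrame = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Fare': [7.25, 71.28, 7.92, 8.05],
})

# Filter the rows where Age is NaN, then take their index labels as a list
nan_age_indexes = titanicDataFrame[titanicDataFrame['Age'].isna()].index.tolist()

# drop removes rows by index label by default (axis=0)
titanicDataFrame = titanicDataFrame.drop(nan_age_indexes)
titanicDataFrame.info()

# Replace the text categories with numbers: male=0, female=1
titanicDataFrame['Sex'] = titanicDataFrame['Sex'].map({'male': 0, 'female': 1})
titanicDataFrame.info()
```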

PART #5 – USING PANDAS FOR ANALYZING DATA SETS


OBJECTIVE: To download and examine a dataset in the form of a CSV file. Explore the data and use methods in
the Pandas library to describe the characteristics of the data (descriptive analytics), such as filtering data
and identifying max, min and average values.

17. Conduct the following steps, adding the steps as markdown comments, and take a screenshot of the
code and results and paste it in the box below:
a. Import the appropriate libraries
b. Download the dataset from:
https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023
c. Read the file into a data frame
d. Display the information about the data frame
e. Display the first 5 records of the data frame
f. Create a data frame named canDF that contains only the employees whose residence is Canada

A) How many of these employees have residence in Canada?

B) What is the min, max, and average salary for your canDF?

C) What is the number of unique values for the job title for Canadian employees?

D) What is the name of the highest paid employee for Canadian employees?
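
The filtering and descriptive calls for step 17 might be sketched as follows. The frame below is made up, and the column names job_title, salary_in_usd and employee_residence, as well as the country code 'CA', are assumptions about the dataset that should be checked against the output of info and head before use.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed, not confirmed by the lab
salaries = pd.DataFrame({
    'job_title': ['Data Scientist', 'Data Engineer', 'Data Scientist', 'ML Engineer'],
    'salary_in_usd': [120000, 95000, 140000, 110000],
    'employee_residence': ['CA', 'US', 'CA', 'CA'],
})

# Boolean mask keeps only the rows with Canadian residence
canDF = salaries[salaries['employee_residence'] == 'CA']

count = len(canDF)                            # number of matching rows
min_salary = canDF['salary_in_usd'].min()
max_salary = canDF['salary_in_usd'].max()
avg_salary = canDF['salary_in_usd'].mean()
unique_titles = canDF['job_title'].nunique()  # number of distinct job titles

# idxmax returns the index label of the highest salary; loc fetches that row
top_row = canDF.loc[canDF['salary_in_usd'].idxmax()]

print(count, min_salary, max_salary, avg_salary, unique_titles)
print(top_row)
```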

18. Provide a screenshot of the code and results and paste it in the box below:

PART #6 – LESSONS LEARNED ABOUT DATA ANALYSIS WITH NUMPY AND PANDAS

OBJECTIVE: After exploring data structures in Python and analyzing datasets using libraries such as Pandas,
students will work in teams of two (assigned by the instructor) to reflect on their learning. Together, each team
will discuss and identify the top concepts they found most important or impactful from the lesson and lab
activities. Teams will then collaborate to write a joint conclusion, summarizing their key takeaways and insights.
This exercise aims to reinforce understanding, encourage critical reflection, and develop collaborative
communication skills.

CONDUCT THE FOLLOWING STEPS:
1. Pair up with a classmate designated by the instructor and have your lab answers and notes ready.
2. Hold a discussion with your teammate and brainstorm at least 5 key concepts that were learned from the
lectures, videos or lab activities in this lesson. Each of the concepts should be stated in the following
way:
The concept of ____ is used for _______ and ____ and ____ . It is important since _____.

Example:
The concept of computers being programmable is used for better understanding programming
languages and how to select a programming language and how to design new systems that are more
flexible and powerful. It is important since programmable computers are key tools for professionals and
careers are built around the craft of programming computers.

3. Rate each of the key concepts in order of importance from 1 to 5 (5 is the highest) in terms of its
impact on your knowledge and understanding of data analysis with NumPy and Pandas.
4. Copy and paste the joint top 3 key concepts in the following box:

5. Based on the top 3 concepts, write up a joint summary of the concepts and write a solid conclusion that
can be drawn from the lessons learned.
6. Copy and paste the joint summary and conclusion.

RUBRIC

| CRITERIA | % | LEVEL 3 (100%) | LEVEL 2 (50%) | LEVEL 1 (25%) | LEVEL 0 (0%) |
| --- | --- | --- | --- | --- | --- |
| Part #1 – Working With NumPy - Arrays From Python Lists And Array Functions | 15% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
| Part #2 – Working With Pandas Data Structure (Sets And Series) | 15% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
| Part #3 – Using Data Frames In Pandas | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
| Part #4 – Using Pandas For Reading Data Frames From CSV Files | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
| Part #5 – Using Pandas For Analyzing Data Sets | 20% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
| Part #6 – Lessons Learned About Data Analysis With NumPy And Pandas | 10% | All section steps were completed correctly. | Some of the section steps were completed correctly. | Only a few section steps were completed correctly. | No steps were completed. |
