Assignment

The document outlines an assignment for a Data Science course requiring students to perform exploratory data analysis on a dataset named nyt1.csv. Students must categorize users into age groups, calculate click-through rates, and create visualizations based on user demographics. The assignment emphasizes the use of Python for data manipulation and analysis, with a submission deadline of May 20, 2023.

Uploaded by

Usama Ali


Department of Computer Science

CSC-487: Introduction to Data Science

Semester 1 (SPRING 2023)

NAME: NAEEM UR REHMAN
REG NO: 02-134202-053

Assignment No. 01
Submission Due Date: 20 May 2023
Marks: 05
Instructions (if any):

• You are expected to create data frames in order to perform exploratory data analysis.
• This assignment requires you to perform the tasks below on one dataset, nyt1.csv, available from the link below.
• The assignment must be submitted in Python.
• Submit a soft copy on the LMS.

Question 1 CLO2-PLO3-BT level C3

There are 31 datasets named nyt1.csv, nyt2.csv, ..., nyt31.csv, which you can find
here: https://github.com/oreillymedia/doing_data_science.
Each one represents one (simulated) day's worth of ads shown and clicks recorded on the New York
Times home page in May 2012. Each row represents a single user. There are five columns: age, gender
(0=female, 1=male), number of impressions, number of clicks, and logged-in.
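A minimal sketch of loading one day's file and checking that the five columns are present. The column names (Age, Gender, Impressions, Clicks, Signed_In) are the ones the answers in this assignment index by; the helper name load_day and the three sample rows are made up for illustration and stand in for the real nyt1.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for nyt1.csv: three made-up rows with the five columns.
SAMPLE_CSV = io.StringIO(
    "Age,Gender,Impressions,Clicks,Signed_In\n"
    "36,0,3,0,1\n"
    "73,1,3,0,1\n"
    "0,0,5,1,0\n"
)

EXPECTED_COLUMNS = ["Age", "Gender", "Impressions", "Clicks", "Signed_In"]


def load_day(source):
    """Read one day's CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


day = load_day(SAMPLE_CSV)
print(day.shape)
print(day.dtypes)
```

With the real data, the same function could be called with the path to nyt1.csv instead of the in-memory sample.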

Once you have the data loaded, it’s time for some EDA:
1. Create a new variable, age_group, that categorizes users as "<18", "18-24",
"25-34", "35-44", "45-54", "55-64", and "65+".

For a single day:

Answer:

Code:

import pandas as pd

# Define the age group function
def get_age_group(age):
    if age < 18:
        return '<18'
    elif age < 25:
        return '18-24'
    elif age < 35:
        return '25-34'
    elif age < 45:
        return '35-44'
    elif age < 55:
        return '45-54'
    elif age < 65:
        return '55-64'
    else:
        return '65+'

# Read the CSV file into a DataFrame
data_frame = pd.read_csv('D:/Assignments/Alpha_Usama/assig1.csv')

# Add the age_group column to the DataFrame.
# Series.apply() works like Python's map(): it calls get_age_group once per
# value in the 'Age' column and collects the results. (DataFrame.apply() is
# the variant that takes an axis: 0 applies the function to each column,
# 1 to each row.)
data_frame['age_group'] = data_frame['Age'].apply(get_age_group)

# Write the updated DataFrame back to the CSV file
data_frame.to_csv('D:/Assignments/Alpha_Usama/assig1.csv', index=False)

Screenshot:
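As an alternative to calling a Python function once per row, the same seven categories can be produced in one vectorized step with pandas.cut. This is only a sketch: the names AGE_BINS, AGE_LABELS and add_age_group are hypothetical, the bin edges are chosen to reproduce the if/elif thresholds in the answer above, and the sample ages are made up.

```python
import pandas as pd

# Bin edges mirror the if/elif thresholds. With right=False each interval is
# closed on the left, so [18, 25) covers ages 18 through 24.
AGE_BINS = [-float("inf"), 18, 25, 35, 45, 55, 65, float("inf")]
AGE_LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def add_age_group(df, age_column="Age"):
    """Return a copy of df with a categorical age_group column added."""
    out = df.copy()
    out["age_group"] = pd.cut(
        out[age_column], bins=AGE_BINS, labels=AGE_LABELS, right=False
    )
    return out


# Made-up ages that sit on and around the category boundaries.
ages = pd.DataFrame({"Age": [0, 17, 18, 24, 25, 64, 65, 90]})
grouped = add_age_group(ages)
print(grouped)
```

The result is a categorical column, so the labels keep their natural order when used for grouping or plotting.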
2. Plot the distributions of the number of impressions and the click-through rate (CTR = # clicks / # impressions)
for these seven age categories. Define a new variable to segment or categorize users based on their
click behavior.
Answer:
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('C:/Users/Alpha_Usama/Desktop/nyt1.csv')

# Calculate CTR. Users with zero impressions get NaN (0/0), because CTR is
# undefined for them.
df['CTR'] = df['Clicks'] / df['Impressions']

# Create click_segment variable.
# Users whose CTR is NaN fail the comparison and fall into 'Low Clicker'.
median_ctr = df['CTR'].median()

def click_segment(row):
    if row['CTR'] >= median_ctr:
        return 'High Clicker'
    else:
        return 'Low Clicker'

df['click_segment'] = df.apply(click_segment, axis=1)

# Plot histogram of Impressions for 18-24 age group.
# 'Age' is numeric, so the group is selected by range rather than by label.
plt.hist(df[df['Age'].between(18, 24)]['Impressions'])
plt.xlabel('Number of Impressions')
plt.ylabel('Frequency')
plt.title('Distribution of Impressions for 18-24 Age Group')
plt.show()

# Plot CTR distribution for each distinct value of 'Age'
age_categories = df['Age'].unique()
for category in age_categories:
    plt.hist(df[df['Age'] == category]['CTR'].dropna())
    plt.xlabel('CTR')
    plt.ylabel('Frequency')
    plt.title(f'Distribution of CTR for age {category}')
    plt.show()
Screenshot:
Note: Because the CTR plots were drawn for each value of the continuous Age column in nyt1.csv, the code
produced a large number of graphs, one per age. Three of them were chosen for the result.
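The plotting loop in the answer runs once per distinct age. One way to get a single plot per age category instead is to bin the ages first and then group by the bin. The sketch below assumes the same column names as the answer; prepare, plot_ctr_by_age_group, the bin constants and the sample rows are all hypothetical. Unlike the answer's code, it drops users with zero impressions before computing CTR, which is a design choice rather than something the assignment prescribes.

```python
import pandas as pd

AGE_BINS = [-float("inf"), 18, 25, 35, 45, 55, 65, float("inf")]
AGE_LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def prepare(df):
    """Drop zero-impression rows (CTR undefined), then add CTR and age_group."""
    out = df[df["Impressions"] > 0].copy()
    out["CTR"] = out["Clicks"] / out["Impressions"]
    out["age_group"] = pd.cut(
        out["Age"], bins=AGE_BINS, labels=AGE_LABELS, right=False
    )
    return out


def plot_ctr_by_age_group(df):
    """Draw one CTR histogram per age group that has data (seven at most)."""
    import matplotlib.pyplot as plt  # imported here so prepare() works without it

    for label, group in df.groupby("age_group", observed=True):
        plt.hist(group["CTR"], bins=20)
        plt.xlabel("CTR")
        plt.ylabel("Frequency")
        plt.title(f"Distribution of CTR for {label}")
        plt.show()


# Made-up rows; the third user has zero impressions and is dropped.
sample = pd.DataFrame({
    "Age": [16, 22, 22, 40, 70],
    "Impressions": [5, 4, 0, 10, 2],
    "Clicks": [1, 0, 0, 1, 1],
})
prepared = prepare(sample)
print(prepared[["age_group", "CTR"]])
```

Calling plot_ctr_by_age_group(prepared) would then show one histogram for each category present in the data.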
3. Explore the data and make visual and quantitative comparisons across user
segments/demographics (<18-year-old males versus <18-year-old females, or
logged-in versus not, for example).
Answer:
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file
data = pd.read_csv("C:/Users/NAEEM UR REHMAN/Desktop/nyt1.csv")

# Define the segments based on age and gender (Gender: 0 = female, 1 = male).
# Each label is built from an age part and a gender part, which gives the
# same four segments as a chain of nested conditions.
age_label = (data["Age"] < 18).map({True: "<18-year-old", False: ">=18-year-old"})
gender_label = (data["Gender"] == 1).map({True: "males", False: "females"})
data["segments"] = age_label + " " + gender_label

# Define the segments based on sign-in status
data["signed_in_segments"] = (data["Signed_In"] == 1).map(
    {True: "Signed-in users", False: "Non-signed-in users"}
)

# Group the data by the segments and total the impressions and clicks
grouped_data = data.groupby(["segments", "signed_in_segments"])[["Impressions", "Clicks"]].sum()

# Plot the data
grouped_data.plot(kind="bar")
plt.title("Impressions and clicks by user segments")
plt.xlabel("User segments")
plt.ylabel("Count")
plt.show()
Screenshot:
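Raw totals of impressions and clicks mostly reflect how many users fall into each segment. For the quantitative comparison the question asks for, a rate such as CTR per segment is easier to compare. The sketch below is one possible approach, not part of the submitted answer: ctr_by_segment is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def ctr_by_segment(df, by):
    """Aggregate CTR per segment: total clicks divided by total impressions."""
    totals = df.groupby(by)[["Impressions", "Clicks"]].sum()
    totals["CTR"] = totals["Clicks"] / totals["Impressions"]
    return totals


# Made-up rows using the column names from the answer above.
sample = pd.DataFrame({
    "Gender": [0, 0, 1, 1],
    "Signed_In": [1, 1, 1, 0],
    "Impressions": [4, 6, 5, 5],
    "Clicks": [1, 0, 2, 0],
})
by_gender = ctr_by_segment(sample, "Gender")
print(by_gender)
```

The by argument also accepts a list of columns, for example ["Gender", "Signed_In"], to compare combined segments. A segment with zero total impressions would produce NaN or inf in the CTR column.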
4. Create metrics/measurements/statistics that summarize the data.

Examples include the mean, median, variance, and max, which can be
calculated across the various user segments.

Answer:

Code:

import pandas as pd

# Load the data from the CSV file
data_frame = pd.read_csv('C:/Users/NAEEM UR REHMAN/Desktop/nyt1.csv')

# Define the user segments first
user_segments = ['Age', 'Gender', 'Signed_In']

# Calculate the metrics for each user segment
metrics = {}
for segment in user_segments:
    # groupby() splits the rows into one group per distinct value of the
    # segment column, so that statistics can be computed group by group.
    group = data_frame.groupby(segment)

    # agg() (an alias of aggregate()) takes a list of function names and
    # returns one column per statistic for each group.
    metrics[segment] = {}
    metrics[segment]['Impressions'] = group['Impressions'].agg(['mean', 'median', 'var', 'max'])
    metrics[segment]['Clicks'] = group['Clicks'].agg(['mean', 'median', 'var', 'max'])

# Print the metrics
for segment in user_segments:
    print(f"Metrics for {segment}:")
    print(metrics[segment])

Screenshot:
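The same four statistics can also be collected into a single table per segment by aggregating both columns at once. This is a sketch under the same column-name assumptions as the answer; summarize is a hypothetical helper and the sample rows are made up. Note that pandas computes 'var' as the sample variance (ddof=1).

```python
import pandas as pd


def summarize(df, by):
    """Mean, median, variance and max of Impressions and Clicks per segment."""
    stats = ["mean", "median", "var", "max"]
    return df.groupby(by)[["Impressions", "Clicks"]].agg(stats)


# Made-up rows: two users who are not signed in and two who are.
sample = pd.DataFrame({
    "Signed_In": [0, 0, 1, 1],
    "Impressions": [2, 4, 6, 10],
    "Clicks": [0, 0, 1, 3],
})
summary = summarize(sample, "Signed_In")
print(summary)
```

The resulting columns are two-level, so a single value is read with a tuple, for example summary.loc[1, ("Impressions", "var")].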