Assignment
Assignment No. 01
Submission Due Date: 20 May 2023
Marks: 05
Instructions (Any):
You are supposed to create data frames in order to perform exploratory data analysis.
This assignment requires you to perform the tasks below on one data set, nyt1.csv, available from the
link mentioned below.
Assignments must be submitted in Python.
Submit a soft copy on the LMS.
Once you have the data loaded, it’s time for some EDA:
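A minimal first look at the data can print its shape, column types, missing-value counts and summary statistics. This is a sketch rather than part of the required answers: it assumes the columns are named Age, Gender, Impressions, Clicks and Signed_In, and the five rows are made up so the snippet runs without the file. With the real data, pass the path to your copy of nyt1.csv to pd.read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Made-up stand-in for nyt1.csv (assumed columns: Age, Gender, Impressions, Clicks, Signed_In).
# With the real file, use pd.read_csv("path/to/nyt1.csv") instead.
csv_text = """Age,Gender,Impressions,Clicks,Signed_In
36,0,3,0,1
73,1,3,0,1
30,0,5,1,0
21,1,4,0,1
15,0,0,0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Shape, column types and missing values give a quick sense of the data
print(df.shape)
df.info()
print(df.isna().sum())

# Summary statistics (count, mean, std, quartiles) for every numeric column
print(df.describe())
```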
1. Create a new variable, age_group, that categorizes users as "<18", "18-24", "25-34", "35-44",
"45-54", "55-64", and "65+".
Answer:
Code:
import pandas as pd
def get_age_group(age):
    # Map a numeric age to one of the seven age bands
    if age < 18:
        return '<18'
    elif age <= 24:
        return '18-24'
    elif age <= 34:
        return '25-34'
    elif age <= 44:
        return '35-44'
    elif age <= 54:
        return '45-54'
    elif age <= 64:
        return '55-64'
    else:
        return '65+'
data_frame = pd.read_csv('D:/Assignments/Alpha_Usama/assig1.csv')
# apply() takes a function as input and applies it along an axis of the DataFrame.
# With tabular data you must specify the axis the function acts on:
# axis=0 passes each column to the function, axis=1 passes each row.
data_frame['age_group'] = data_frame.apply(lambda row: get_age_group(row['Age']), axis=1)
data_frame.to_csv('D:/Assignments/Alpha_Usama/assig1.csv', index=False)
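The same seven bands can also be built without a Python-level function by using pd.cut, which bins a whole column at once and is usually faster on a large file. A minimal sketch, assuming integer ages in an Age column (a fractional age such as 17.5 would land in "18-24" here, whereas get_age_group puts it in "<18"); the sample ages are made up for illustration.

```python
import pandas as pd

# Made-up ages for illustration; in the assignment these come from the Age column of the CSV
data_frame = pd.DataFrame({"Age": [0, 17, 18, 24, 25, 44, 64, 65, 90]})

# Right-closed bins: (-inf, 17] -> "<18", (17, 24] -> "18-24", ..., (64, inf) -> "65+"
bins = [float("-inf"), 17, 24, 34, 44, 54, 64, float("inf")]
labels = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
data_frame["age_group"] = pd.cut(data_frame["Age"], bins=bins, labels=labels)

print(data_frame)
```

Because pd.cut returns an ordered categorical column, later groupby results and plots keep the bands in age order instead of sorting the labels alphabetically.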
Screenshot:
Department of Computer Science
2. Plot the distributions of the number of impressions and the click-through rate (CTR = # clicks / # impressions)
for these seven age categories. Define a new variable to segment or categorize users based on their
click behavior.
Answer:
Code:
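import pandas as pd
import matplotlib.pyplot as plt
# Load data
df = pd.read_csv('C:/Users/Alpha_Usama/Desktop/nyt1.csv')
# Calculate CTR (users with 0 impressions get NaN, since 0/0 is undefined)
df['CTR'] = df['Clicks'] / df['Impressions']
# Create click_segment variable: users at or above the median CTR are "High Clicker".
# Rows with a NaN CTR fall into "Low Clicker" because NaN >= x is always False.
median_ctr = df['CTR'].median()
def click_segment(row):
    if row['CTR'] >= median_ctr:
        return 'High Clicker'
    else:
        return 'Low Clicker'
# Apply the function to every row (axis=1) to add the new column
df['click_segment'] = df.apply(click_segment, axis=1)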
Note: Because I was taking continuous data from the CSV file nyt1, it produced many graphs across the
age groups, but I have chosen 3 graphs for the result.
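One way to draw the two requested distributions is a box plot per age group for impressions and for CTR, with a per-group mean table as the quantitative companion. A minimal sketch, assuming columns Age, Impressions and Clicks; the rows are made up, and the output file name age_group_distributions.png is hypothetical. Users with no impressions get a NaN CTR and are left out of the CTR boxes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for nyt1.csv (assumed columns: Age, Impressions, Clicks)
df = pd.DataFrame({
    "Age": [15, 16, 22, 23, 30, 31, 40, 41, 50, 52, 60, 61, 70, 80],
    "Impressions": [5, 3, 4, 6, 2, 5, 7, 3, 4, 0, 6, 5, 3, 4],
    "Clicks": [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
})

labels = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
bins = [float("-inf"), 17, 24, 34, 44, 54, 64, float("inf")]
df["age_group"] = pd.cut(df["Age"], bins=bins, labels=labels)

# CTR is undefined when a user saw no impressions, so those rows are left as NaN
df["CTR"] = df["Clicks"] / df["Impressions"].where(df["Impressions"] > 0)

# One list of values per age group, in band order; dropna() skips users without a CTR
impressions = [df.loc[df["age_group"] == g, "Impressions"] for g in labels]
ctr = [df.loc[df["age_group"] == g, "CTR"].dropna() for g in labels]

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, data, title in zip(axes, [impressions, ctr], ["Impressions", "CTR"]):
    ax.boxplot(data)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_title(f"{title} by age group")
    ax.set_xlabel("age_group")
fig.tight_layout()
fig.savefig("age_group_distributions.png")  # hypothetical output file name

# Numeric companion to the plots: mean impressions and mean CTR per age group
summary = df.groupby("age_group", observed=True)[["Impressions", "CTR"]].mean()
print(summary)
```

Box plots are used here because they show the median and spread of each group side by side; histograms per group (for example with ax.hist) would work equally well if the shape of each distribution matters more.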
3. Explore the data and make visual and quantitative comparisons across user
segments/demographics (for example, <18-year-old males versus <18-year-old females, or
logged-in versus not logged-in users).
Answer:
Code:
import pandas as pd
import matplotlib.pyplot as plt
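# This loads the data; change the path to wherever your copy of nyt1.csv is stored
data_frame = pd.read_csv('C:/Users/Alpha_Usama/Desktop/nyt1.csv')
data_frame['CTR'] = data_frame['Clicks'] / data_frame['Impressions']
metrics = {}
for segment in ['Gender', 'Signed_In']:
    # groupby() splits the data into groups so that operations can be computed on each group
    group = data_frame.groupby(segment)
    metrics[segment] = {
        'mean_impressions': group['Impressions'].mean(),
        'mean_clicks': group['Clicks'].mean(),
        'mean_CTR': group['CTR'].mean(),
    }
    print(metrics[segment])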
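A direct comparison of the two example segments from the question (under-18 users split by gender, and signed-in versus not signed-in users) can be sketched with one small helper. Assumptions: Gender and Signed_In are 0/1 codes whose meaning should be checked against the data set, the helper name compare is hypothetical, and the rows are made up. Overall CTR is computed as total clicks divided by total impressions per group, which weights heavy users more than an average of per-user CTRs would.

```python
import pandas as pd

# Made-up rows standing in for nyt1.csv; Gender and Signed_In are assumed to be 0/1 codes
df = pd.DataFrame({
    "Age": [15, 16, 17, 14, 30, 45, 25, 50],
    "Gender": [0, 1, 0, 1, 1, 0, 0, 1],
    "Impressions": [4, 5, 2, 6, 3, 5, 4, 2],
    "Clicks": [1, 0, 0, 1, 0, 1, 0, 1],
    "Signed_In": [1, 1, 1, 1, 1, 1, 0, 0],
})


def compare(frame, by):
    """Hypothetical helper: users, mean impressions and overall CTR for each value of `by`."""
    grouped = frame.groupby(by)
    out = grouped.agg(users=("Age", "count"), mean_impressions=("Impressions", "mean"))
    # Total clicks / total impressions within each group
    out["overall_ctr"] = grouped["Clicks"].sum() / grouped["Impressions"].sum()
    return out


# Under-18 users split by gender code
under_18 = df[df["Age"] < 18]
by_gender = compare(under_18, "Gender")
print(by_gender)

# Signed-in versus not signed-in users
by_login = compare(df, "Signed_In")
print(by_login)
```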
Screenshot: