
Assignment 1

The document outlines an assignment on data handling using Python, submitted by students Samanpreet Singh and Hardeep Singh. It includes code for loading a dataset on phone usage in India, displaying data characteristics, identifying null values, and printing a specified range of the dataset. The dataset contains 53,058 rows and 19 columns, with various features related to phone usage and demographics.

Uploaded by

unknownloves2329

Assignment 1: Data Handling using Python

Roll number: 24071227 and 24071232


Student Name: Samanpreet Singh and Hardeep Singh
Group: 7
Date of submission: 28-01-2025
Submitted to: Dr. Sukhjeet Kaur Ranade & Ms. Rama Rani
Program Title: Dataset of Phone Usage in India

Code
# Import the pandas library
import pandas as pd

# Load the original data (raw string, so the backslash is not read as an escape)
df = pd.read_csv(r"E:\phone_usage_india_dirty.csv")

# Print a dashed line to separate the sections of the output
print("-" * 40)

# Display the number of rows, columns and Datatypes


datatypes=df.dtypes

print("Number of rows:",df.shape[0])
print("Number of columns:",df.shape[1])

print("-" * 40)
print("Data types:")
print(datatypes)

print("-" * 40)

# List the integer-typed columns (treated here as the continuous features)
# Note: float64 columns are not matched by 'int', so only integer columns are listed
continue_features = df.select_dtypes(include=['int']).columns

# Print the selected features
print("Continue_features:")
for feature in continue_features:
    print(feature)

print("-" * 40)

# Display the dataset size (Number of rows)


print("Dataset size (Number of rows):",df.shape[0])

print("-" * 40)

# Find the number of null values in each column


null_values = df.isnull().sum()

# Print the results


print("Null values in the dataset")
print(null_values)

print("-" * 40)

# Select the object, category, float and int columns so that discrete
# (categorical) features can be spotted by their low number of unique values
categorical_features = df.select_dtypes(include=['object', 'category', 'float', 'int'])

# Count the number of unique values in each selected column
category_counts = categorical_features.nunique()

# Print the results with the same column width for the header and the rows
print(f"{'Feature':<30} {'Unique Categories'}")
for feature, count in category_counts.items():
    print(f"{feature:<30} {count}")

print("-" * 40)

# Print the range of rows specified by the user
def print_csv_range(path, start, end):
    try:
        # Load the CSV file into a DataFrame
        df = pd.read_csv(path)

        # Print the specified range of rows (start included, end excluded)
        print(df.iloc[start:end])

    # Handle a missing file or any other error
    except FileNotFoundError:
        print(f"Error: The file at '{path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Get the range from the user
start = int(input("enter the starting range value:"))
end = int(input("Enter the ending range value:"))

print_csv_range(r"E:\phone_usage_india_dirty.csv", start, end)

print("-" * 40)
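The integer-only selection in the listing reports a single column, because most numeric columns in this dataset are stored as float64. A minimal sketch of one alternative split is given here, assuming a small made-up DataFrame: the name `toy` and its values are hypothetical and are not taken from the CSV file. It is an illustration of `select_dtypes`, not the method required by the assignment.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not from the assignment's CSV.
toy = pd.DataFrame({
    "Age": [31.0, 39.0, None, 21.0],
    "Gender": ["Female", "Male", None, "Male"],
    "Customer_Satisfaction": [2, 4, 0, 0],
})

# "number" matches both integer and floating-point columns.
numeric_features = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated here as a categorical candidate.
categorical_features = toy.select_dtypes(exclude="number").columns.tolist()

print("Numeric features:", numeric_features)
print("Categorical features:", categorical_features)
```

A numeric column with only a few distinct values, such as Customer_Satisfaction, may still be better handled as a discrete feature, so the unique-value counts remain useful alongside the dtype split.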
Output
----------------------------------------
Number of rows: 53058
Number of columns: 19
----------------------------------------
Data types:
User ID                          object
Age                              float64
Gender                           object
Location                         object
Phone Brand                      object
OS                               object
Screen Time (hrs/day)            float64
Data Usage (GB/month)            float64
Calls Duration (mins/day)        float64
Number of Apps Installed         float64
Social Media Time (hrs/day)      float64
E-commerce Spend (INR/month)     float64
Streaming Time (hrs/day)         float64
Gaming Time (hrs/day)            float64
Monthly Recharge Cost (INR)      float64
Primary Use                      object
Timestamp                        object
Customer_Satisfaction            int64
Customer_Lifetime_Value          float64
dtype: object
----------------------------------------
Continue_features:
Customer_Satisfaction
----------------------------------------
Dataset size (Number of rows): 53058
----------------------------------------
Null values in the dataset
User ID                          5452
Age                              5048
Gender                           5335
Location                         5247
Phone Brand                      5303
OS                               5353
Screen Time (hrs/day)            4822
Data Usage (GB/month)            4997
Calls Duration (mins/day)        5238
Number of Apps Installed         5456
Social Media Time (hrs/day)      5292
E-commerce Spend (INR/month)     5018
Streaming Time (hrs/day)         5334
Gaming Time (hrs/day)            5245
Monthly Recharge Cost (INR)      5270
Primary Use                      5289
Timestamp                        0
Customer_Satisfaction            0
Customer_Lifetime_Value          5018
dtype: int64
----------------------------------------
Feature                          Unique Categories
User ID                          17665
Age                              47
Gender                           3
Location                         10
Phone Brand                      10
OS                               2
Screen Time (hrs/day)            112
Data Usage (GB/month)            492
Calls Duration (mins/day)        2945
Number of Apps Installed         191
Social Media Time (hrs/day)      56
E-commerce Spend (INR/month)     8195
Streaming Time (hrs/day)         76
Gaming Time (hrs/day)            51
Monthly Recharge Cost (INR)      1901
Primary Use                      5
Timestamp                        53058
Customer_Satisfaction            5
Customer_Lifetime_Value          48040
----------------------------------------
enter the starting range value:100
Enter the ending range value:2000
      User ID  Age   Gender  Location   Phone Brand   ...  Monthly Recharge Cost (INR)  Primary Use    Timestamp            Customer_Satisfaction  Customer_Lifetime_Value
100   U00101   31.0  Female  Pune       Apple         ...  NaN                          Education      2023-01-05 04:00:00  2                      168606.410928
101   U00102   39.0  Male    Pune       Apple         ...  NaN                          Social Media   2023-01-05 05:00:00  4                      64562.179611
102   U00103   15.0  NaN     Mumbai     Realme        ...  687.0                        Work           2023-01-05 06:00:00  0                      136436.260992
103   U00104   21.0  Male    NaN        Apple         ...  1129.0                       Entertainment  2023-01-05 07:00:00  0                      86387.988358
104   U00105   56.0  NaN     NaN        Motorola      ...  1037.0                       Entertainment  2023-01-05 08:00:00  1                      91850.388545
...   ...      ...   ...     ...        ...           ...  ...                          ...            ...                  ...                    ...
1995  U01996   28.0  Other   Bangalore  Google Pixel  ...  547.0                        Entertainment  2023-03-25 03:00:00  2                      108289.398497
1996  U01997   57.0  Male    NaN        Motorola      ...  900.0                        Work           2023-03-25 04:00:00  3                      89147.220506
1997  U01998   23.0  NaN     Ahmedabad  OnePlus       ...  509.0                        Entertainment  2023-03-25 05:00:00  2                      374678.862645
1998  U01999   25.0  Male    Pune       NaN           ...  138.0                        Gaming         2023-03-25 06:00:00  1                      129811.377124
1999  U02000   35.0  Female  Kolkata    Vivo          ...  NaN                          Gaming         2023-03-25 07:00:00  1                      142723.216312

[1900 rows x 19 columns]
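Two details of this output can be checked on a small example. The null counts are easier to compare as a share of the rows, and the row count in the last line follows from the half-open slicing of `iloc`: a start of 100 and an end of 2000 give rows 100 to 1999, that is 2000 - 100 = 1900 rows. The sketch below assumes a hypothetical 10-row frame named `toy`; its values are illustrative only and the variable names are not part of the assignment.

```python
import pandas as pd

# Hypothetical frame of 10 rows; the values are illustrative only.
toy = pd.DataFrame({
    "Age": [31.0, None, 15.0, 21.0, None, 28.0, 57.0, 23.0, 25.0, 35.0],
    "Customer_Satisfaction": [2, 4, 0, 0, 1, 2, 3, 2, 1, 1],
})

# isnull().mean() gives the fraction of missing entries in each column.
null_share = toy.isnull().mean() * 100
print(null_share.round(1))

# iloc slices are half-open: the start row is included, the end row is excluded.
start, end = 2, 7
subset = toy.iloc[start:end]
print(len(subset))  # equals end - start when both lie within the frame
```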
