
GR P Assignment Code

Uploaded by

S U P R E M
Copyright
© All Rights Reserved

import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Assignment/laptop_price.csv',
encoding='latin1')

print(df.head())

# Display data types of each column

print(df.dtypes)

# Statistical summary

summary = df.describe()

print(summary)

# Replace '?' placeholders with NaN so pandas treats them as missing values

import numpy as np

df = df.replace('?', np.nan)

# Now you can handle missing values using methods like imputation or dropping

# Identify missing values

missing_values = df.isnull().sum()

print(missing_values)
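The comment above mentions dropping and imputation without showing either. The following is a minimal sketch of both options on a small made-up frame; the `toy` data and the choice of the median as the fill value are assumptions for illustration, not part of the original assignment.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the laptop data; the values are made up.
toy = pd.DataFrame({
    "Inches": [15.6, np.nan, 13.3, 14.0],
    "Weight": [2.1, 1.4, np.nan, 1.6],
    "Price_euros": [900.0, 1200.0, 1500.0, 700.0],
})

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: fill numeric gaps with the median of each column.
imputed = toy.fillna(toy.median(numeric_only=True))

print(len(dropped))                  # rows left after dropping
print(imputed.isnull().sum().sum())  # missing values left after imputing
```

Dropping is simplest when only a few rows are affected; imputing keeps every row at the cost of introducing estimated values.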

# Define independent and dependent variables

independent_variables = ['Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution',
                         'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight']

dependent_variable = 'Price_euros'

Correlation Coefficient between Independent and Dependent Variables

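# Strip the 'kg' suffix from 'Weight' and convert it to a number

df['Weight'] = pd.to_numeric(df['Weight'].astype(str).str.replace('kg', '', regex=False), errors='coerce')

# Assumption: 'Ram' is stored as text such as '8GB'. Strip the suffix so the column
# can be used in the correlation and the regression (harmless if it is already numeric)

df['Ram'] = pd.to_numeric(df['Ram'].astype(str).str.replace('GB', '', regex=False), errors='coerce')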

# Select only numerical columns for correlation calculation

numerical_columns = ['Inches', 'Ram', 'Weight', 'Price_euros']

correlation = df[numerical_columns].corr()

print(correlation['Price_euros'])
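As a rough illustration of what `corr()` reports, the sketch below computes the Pearson coefficient by hand and compares it with the pandas result. The `toy` values are made up, and the names `r_manual` and `r_pandas` are only for this example.

```python
import numpy as np
import pandas as pd

# Made-up values, used only to show the arithmetic.
toy = pd.DataFrame({
    "Ram": [4.0, 8.0, 8.0, 16.0, 32.0],
    "Price_euros": [400.0, 700.0, 900.0, 1500.0, 2600.0],
})

x = toy["Ram"].to_numpy()
y = toy["Price_euros"].to_numpy()

# Pearson r = sum of co-deviations / sqrt(product of the squared-deviation sums)
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity (method='pearson' is the default)
r_pandas = toy["Ram"].corr(toy["Price_euros"])

print(r_manual, r_pandas)
```

A value near 1 or -1 indicates a strong linear relationship with price, while a value near 0 indicates little linear relationship.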

Model Creation to Predict Laptop Prices

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error

# Define independent and dependent variables

X = df[['Inches', 'Ram', 'Weight']]

y = df['Price_euros']

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Linear Regression model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions

predictions = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, predictions)

print("Mean Squared Error:", mse)
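MSE is expressed in squared euros, which makes it hard to read on its own. The sketch below shows, on made-up numbers, how MSE relates to RMSE (back in euros) and to R² (share of variance explained). The arrays and the `_toy` names are assumptions for illustration only.

```python
import numpy as np

# Made-up prices in euros, for illustration only.
actual = np.array([1000.0, 1500.0, 800.0, 2000.0])
predicted = np.array([1100.0, 1400.0, 900.0, 1800.0])

# MSE: mean of the squared errors, in euros squared
mse_toy = np.mean((actual - predicted) ** 2)

# RMSE: square root of MSE, back in euros
rmse_toy = np.sqrt(mse_toy)

# R^2: improvement over always predicting the mean price
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_toy = 1 - ss_res / ss_tot

print(mse_toy, rmse_toy, r2_toy)
```

On the real data, the same quantities are available from `mean_squared_error` and `r2_score` in `sklearn.metrics`.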


Supervised vs. Unsupervised Model

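The model created above is a supervised learning model: it is trained on labeled data,
where each row pairs the input features (Inches, Ram, Weight) with a known output
(Price_euros). The algorithm learns a mapping from the inputs to the output, and because
the output is a continuous value, this is a regression task. An unsupervised model, by
contrast, would receive only the inputs and look for structure in them without any target.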
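The distinction can be seen in what each estimator's `fit()` receives. The sketch below is only an illustration on made-up data; `KMeans` is brought in as an example of an unsupervised method and is not used in the assignment itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Made-up data: one feature (RAM in GB) and a known price for each row.
X_toy = np.array([[4.0], [4.0], [8.0], [16.0], [32.0], [32.0]])
y_toy = np.array([400.0, 450.0, 800.0, 1500.0, 2900.0, 3000.0])

# Supervised: fit() receives both the inputs and the known outputs.
reg = LinearRegression().fit(X_toy, y_toy)
print(reg.predict(np.array([[12.0]])))

# Unsupervised: fit() receives only the inputs and groups them without a target.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_toy)
print(km.labels_)
```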

Model Comparison with Different Independent Variables

# Extract numeric values from 'ScreenResolution' column

df['ScreenResolution_Width'] = df['ScreenResolution'].str.extract(r'(\d+)x\d+', expand=False)

df['ScreenResolution_Height'] = df['ScreenResolution'].str.extract(r'\d+x(\d+)', expand=False)

# Convert the new columns to numeric

df['ScreenResolution_Width'] = pd.to_numeric(df['ScreenResolution_Width'],
errors='coerce')

df['ScreenResolution_Height'] = pd.to_numeric(df['ScreenResolution_Height'],
errors='coerce')

# Drop rows where width or height couldn't be extracted

df.dropna(subset=['ScreenResolution_Width', 'ScreenResolution_Height'],
inplace=True)
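To see what the pattern captures, here is a small sketch on sample strings. The strings are assumptions about how the `ScreenResolution` column might look, and the single two-group pattern is an alternative to the two separate extractions above, not the original approach.

```python
import pandas as pd

# Sample strings are assumptions about how the column might look.
res = pd.Series([
    "Full HD 1920x1080",
    "IPS Panel Retina Display 2560x1600",
    "1366x768",
    "unknown",
])

# Two capture groups give two columns in a single pass; unmatched rows become NaN.
dims = res.str.extract(r"(\d+)x(\d+)")
dims.columns = ["width", "height"]
dims = dims.apply(pd.to_numeric, errors="coerce")

print(dims)
```

Rows without a `WIDTHxHEIGHT` pattern come back as NaN, which is why the script drops them before modelling.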

# Define different independent variables

X_new = df[['Ram', 'Weight', 'ScreenResolution_Width', 'ScreenResolution_Height']]

# Split the data into training and testing sets

# The target is taken from df again so it matches the rows left after dropna()

X_train_new, X_test_new, y_train, y_test = train_test_split(
    X_new, df['Price_euros'], test_size=0.2, random_state=42)

# Create and train the Linear Regression model with new variables

model_new = LinearRegression()

model_new.fit(X_train_new, y_train)
# Make predictions with new variables

predictions_new = model_new.predict(X_test_new)

# Evaluate the model with new variables

mse_new = mean_squared_error(y_test, predictions_new)

print("Mean Squared Error with new variables:", mse_new)
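When comparing feature sets, it helps to run each one through the same split and the same metric. The sketch below wraps those steps in a hypothetical helper, `evaluate_features`, and runs it on synthetic data; the helper name, the `Noise` column, and the generated prices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate_features(data, features, target="Price_euros"):
    """Hypothetical helper: fit a linear model on `features`, return test MSE."""
    X = data[features]
    y = data[target]  # taken from the same frame, so rows always line up
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    fitted = LinearRegression().fit(X_tr, y_tr)
    return mean_squared_error(y_te, fitted.predict(X_te))


# Synthetic stand-in for the laptop data (random values, not real prices).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Ram": rng.choice([4, 8, 16, 32], size=200),
    "Weight": rng.uniform(1.0, 3.0, size=200),
    "Noise": rng.normal(size=200),
})
toy["Price_euros"] = 60 * toy["Ram"] + 100 * toy["Weight"] + rng.normal(0, 50, size=200)

results = {
    "Ram + Weight": evaluate_features(toy, ["Ram", "Weight"]),
    "Noise only": evaluate_features(toy, ["Noise"]),
}
for name, score in results.items():
    print(name, round(score, 1))
```

The feature set with the lower test MSE predicts held-out prices more closely, provided both models are scored on the same rows.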
