SHYAM 1 Final 2.o

Uploaded by

arunharini641

ANALYSIS OF LAPTOP PRICE DATASET ON LENOVO

SYNOPSIS

The Lenovo laptop price dataset provides a comprehensive collection of laptop
specifications along with their corresponding prices, aimed at facilitating data analysis and
predictive modelling. The dataset includes features such as brand, model, processor type, RAM
size, storage capacity, screen size, and additional attributes like graphics card, operating system,
and battery life. This dataset can be used to analyse pricing trends, understand the influence of
different specifications on the price, and develop machine learning models to predict the price
of a laptop based on its features. It serves as a valuable resource for market research, pricing
strategy development, and consumer decision-making insights.

The laptop price dataset contains detailed information on various laptop models,
including key features, specifications, and pricing. Each entry typically includes attributes such
as brand, model name, processor type, RAM size, storage capacity (HDD or SSD), screen size,
resolution, GPU details, operating system, and the price of the laptop. This dataset is valuable
for analysing trends in laptop pricing, comparing models based on features, and understanding
how specifications influence cost. It is particularly useful for data-driven decision-making,
helping consumers and businesses find the best value laptops based on their needs and budget.

The dataset can also be used for exploratory data analysis, price prediction modelling, and
market trend analysis. With features like processor speed, storage type, and RAM size
influencing laptop performance, this data offers insights into how technological advancements
affect pricing. Additionally, it can help identify patterns such as price variations across different
brands, premium vs. budget segments, and the impact of high-end features like dedicated GPUs
or high-resolution displays. By leveraging this dataset, analysts can develop machine learning
models to predict laptop prices based on specifications or examine consumer preferences in the
laptop market.

INTRODUCTION

1.1 INTRODUCTION:

The rapid growth of the laptop market, driven by technological advancements and consumer
demand, has led to a wide variety of laptop models, brands, and specifications. As a result,
consumers are often faced with the challenge of selecting the right laptop based on their budget
and required features. In this context, analysing laptop prices becomes crucial, both for
consumers looking to make informed purchasing decisions and for businesses aiming to
position their products competitively.

This mini-project focuses on analysing a Lenovo laptop price dataset, which
includes various features such as brand, specifications (processor, RAM, storage), and price.
The primary objective of this project is to explore and understand the key factors influencing
laptop prices. Through data analysis, we will uncover trends, patterns, and relationships that
can help predict Lenovo laptop prices based on different features.

The project will involve several steps, including data preprocessing, exploratory data analysis
(EDA), and building models for price prediction. By the end of this project, we aim to provide
insights into how different factors contribute to laptop pricing and potentially build a model
that can estimate the price of a laptop given its specifications.

Key Objectives:

1. Data Exploration: Understanding the structure of the dataset, identifying key variables,
and performing initial data cleaning.

2. Price Trends Analysis: Analysing how price varies with respect to different features
such as brand, RAM size, processor type, and storage capacity.

3. Predictive Modelling: Building machine learning models to predict laptop prices based
on their specifications.

1.2. ABOUT THE PROJECT:

The project aims to analyse and predict Lenovo laptop prices using a comprehensive dataset
that encompasses various specifications and features of laptops available in the market. The
dataset includes attributes such as processor type, RAM size, storage capacity, screen size,
brand, and additional features, allowing for a detailed examination of factors that influence
pricing. Through data preprocessing techniques, including handling missing values and
encoding categorical variables, we will prepare the data for analysis. Dimensionality reduction
techniques, such as Principal Component Analysis (PCA) and t-Distributed Stochastic
Neighbour Embedding (t-SNE), will be employed to simplify the dataset and enhance
visualization, helping to identify key trends and relationships among features. Additionally,
machine learning algorithms will be utilized to build predictive models that estimate laptop
prices based on the selected features. The project's ultimate goal is to provide insights into the
pricing dynamics of laptops, enabling consumers to make informed purchasing decisions and
helping manufacturers understand market trends. Through rigorous analysis and modelling, this
project seeks to contribute valuable knowledge to the field of e-commerce and consumer
electronics.

• Data Collection and Preprocessing: We will begin by collecting a diverse dataset that
encompasses a wide range of laptops available in the market. Data preprocessing steps will
include handling missing values, removing duplicates, and encoding categorical variables to
ensure that the dataset is clean and suitable for analysis. This stage is crucial for maintaining
the integrity of the data and improving the accuracy of subsequent analyses.

• Exploratory Data Analysis (EDA): EDA will be conducted to understand the underlying
patterns and relationships within the data. This will involve visualizing distributions,
examining correlations among features, and identifying trends that may influence laptop
prices. We will use techniques like correlation matrices and scatter plots to uncover
significant insights, helping to guide our feature selection process.

• Dimensionality Reduction: To streamline the dataset and enhance visualization, we will
implement dimensionality reduction techniques such as Principal Component Analysis (PCA)
and t-Distributed Stochastic Neighbour Embedding (t-SNE). These methods will help reduce
the complexity of the data while preserving essential information, making it easier to
analyse and interpret.

• Feature Selection: Utilizing techniques like Recursive Feature Elimination (RFE) and
evaluating feature importance from tree-based models (e.g., Random Forest), we will identify
the most significant features that contribute to laptop pricing. This focused approach will
improve model performance and reduce the risk of overfitting.

• Conclusion and Future Work: The project will conclude with a comprehensive report
summarizing the findings, insights, and recommendations based on the analysis. We will also
discuss potential future work, including exploring additional datasets, refining models, or
incorporating advanced techniques such as deep learning for more accurate price predictions.

1.3. DATA OVERVIEW:

The Lenovo laptop price dataset serves as a comprehensive resource for understanding
the various attributes that influence laptop pricing. Typically, it includes features
such as Brand, RAM, Storage, Processor Type, Screen Size, Operating System, and Price
(USD). Each of these features provides valuable insights into consumer preferences and market
trends. For example, attributes like RAM and Storage capacity can significantly affect
performance and are often correlated with higher prices. The dataset may also include
categorical variables that require encoding for analysis. A data overview table summarizes the
key features, their data types, and a brief description of each, offering a quick reference for
analysts and data scientists.

FEATURE                 DATA TYPE     DESCRIPTION

Brand                   Categorical   The manufacturer or brand name of the laptop.
Model                   Categorical   The specific model name of the laptop.
RAM (GB)                Numerical     The amount of RAM installed in the laptop.
Storage (GB)            Numerical     The storage capacity of the laptop (SSD/HDD).
Processor Type          Categorical   The type of processor used (Intel or AMD).
Processor Speed (GHz)   Numerical     The clock speed of the processor in gigahertz.
Screen Size (inches)    Numerical     The diagonal size of the laptop screen.
Operating System        Categorical   The operating system installed (e.g., Windows or macOS).
Price (USD)             Numerical     The retail price of the laptop in U.S. dollars.
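
The schema in the overview table can be expressed directly in pandas. The following is a minimal sketch, assuming the column names match the table headings; the two rows and the model names "Model A" and "Model B" are hypothetical placeholders, not values from the actual dataset.

```python
import pandas as pd

# Two placeholder rows that follow the schema of the overview table.
# All values here are illustrative assumptions, not real dataset entries.
laptops = pd.DataFrame({
    "Brand": ["Lenovo", "Lenovo"],
    "Model": ["Model A", "Model B"],
    "RAM (GB)": [8, 16],
    "Storage (GB)": [512, 1024],
    "Processor Type": ["Intel", "AMD"],
    "Processor Speed (GHz)": [2.4, 3.2],
    "Screen Size (inches)": [14.0, 15.6],
    "Operating System": ["Windows", "Windows"],
    "Price (USD)": [650.0, 980.0],
})

# Mark the categorical features explicitly so later steps can tell them
# apart from the numerical ones.
categorical_cols = ["Brand", "Model", "Processor Type", "Operating System"]
laptops = laptops.astype({col: "category" for col in categorical_cols})

print(laptops.dtypes)
```

Declaring the categorical columns up front makes it easy to select them later for encoding, for example with `laptops.select_dtypes("category")`.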

2. DATA PREPROCESSING

Data preprocessing for the Lenovo laptop price dataset is a crucial step that involves
preparing the raw data for analysis and model building. It begins with handling missing values
through imputation methods such as filling numerical data with the mean or median and
categorical data with the mode or a new category. Next, categorical variables like brand and
processor type need to be encoded into numerical form using techniques like one-hot encoding
or label encoding to make them suitable for machine learning algorithms. Outliers in features
such as price, RAM, or storage are identified and handled to prevent them from skewing model
performance. Numerical features are then scaled using normalization or standardization to
ensure that attributes like price, RAM, and weight have comparable ranges, which helps
improve model accuracy. Finally, splitting the dataset into training and testing sets ensures that
models are evaluated fairly on unseen data. This entire process enhances the dataset's quality
and ensures that the machine learning models can make reliable predictions.

Data preprocessing for the laptop price dataset on Lenovo is an essential step to ensure data
quality and improve the accuracy of predictive models. The process starts by addressing
missing values, where numerical columns like price, RAM, or storage are imputed with
statistical measures such as the mean or median, and categorical columns like brand or
operating system are handled by filling with the most frequent category or creating a new
category, such as "Unknown." Next, categorical features are transformed into numerical form
through encoding techniques like one-hot encoding or label encoding, making them usable in
machine learning models. Outlier detection is then performed to identify and treat any extreme
values in features like price or screen size that could negatively impact model performance.
Following this, numerical data is scaled using standardization or normalization to ensure all
features have comparable ranges, preventing any one feature from disproportionately
influencing the model. Finally, the dataset is split into training and testing sets, ensuring models
are trained and evaluated properly on different subsets of data. This preprocessing pipeline
ensures the dataset is clean, consistent, and ready for analysis, ultimately enhancing the
predictive capabilities of the model.
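
The preprocessing steps described above (imputation, encoding, scaling, and a train/test split) can be chained in a single scikit-learn pipeline. The sketch below is one possible arrangement, not the project's actual implementation; the toy data, the column names `RAM`, `Storage`, `Processor`, and `Price`, and the 75/25 split are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with a few missing entries.
df = pd.DataFrame({
    "RAM": [4, 8, np.nan, 16, 8, 32, 16, 8],
    "Storage": [256, 512, 512, np.nan, 256, 1024, 512, 256],
    "Processor": ["Intel", "AMD", "Intel", np.nan, "AMD", "Intel", "Intel", "AMD"],
    "Price": [450, 620, 700, 980, 540, 1650, 1100, 600],
})
X = df.drop(columns="Price")
y = df["Price"]

numeric_cols = ["RAM", "Storage"]
categorical_cols = ["Processor"]

# Numerical columns: median imputation, then standardization.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with "Unknown", then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Split first, then fit the preprocessing on the training part only,
# so that no statistics from the test rows leak into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)

print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the imputers and the scaler on the training rows alone is a deliberate choice: it keeps the evaluation on the test set honest.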

2.1. HANDLING MISSING VALUES:
Handling missing values in a dataset, such as the Lenovo laptop price dataset,
requires careful consideration of the type of data, the percentage of missing values, and the
possible reasons behind the missing values. Common strategies for handling missing values
are described below.

Handling missing values in the Lenovo laptop price dataset is a critical step in
ensuring the accuracy and reliability of predictive models. Depending on the type and extent
of missing data, different strategies can be applied. For numerical features like price, RAM, or
storage, common methods include mean or median imputation, which fill missing values with
the average or middle value of the column. Categorical features, such as brand or operating
system, can be handled by imputing the most frequent category (mode) or assigning a new
category like "Unknown." If missing data is extensive and concentrated in specific rows or
columns, it may be appropriate to remove those rows or columns altogether. In more advanced
cases, machine learning algorithms like K-Nearest Neighbours (KNN) or regression models
can be used to predict missing values based on the relationships between other features. The
chosen strategy should balance preserving data integrity while minimizing bias to maintain the
predictive power of the model.
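
As a small illustration of the model-based approach mentioned above, the following sketch uses scikit-learn's `KNNImputer` to fill gaps from the most similar rows. The toy values, the column names, and the choice of `n_neighbors=2` are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numerical specifications with two missing entries.
specs = pd.DataFrame({
    "RAM": [4, 8, 8, 16, 16, 32],
    "Storage": [256, 512, np.nan, 512, 1024, 1024],
    "Price": [450, 620, 640, np.nan, 1100, 1650],
})

# Each missing value is replaced by the average of that column over the
# two nearest rows, measured on the columns that are present.
imputer = KNNImputer(n_neighbors=2)
specs_imputed = pd.DataFrame(imputer.fit_transform(specs), columns=specs.columns)

print(specs_imputed)
```

Because the neighbour search is distance-based, columns with large ranges (such as price) dominate unless the features are scaled first, so in practice scaling before imputation is worth considering.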

CODING:
# Import necessary libraries
import pandas as pd
import numpy as np

# Sample DataFrame (replace this with your actual data)
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, np.nan, 28],
        'City': ['New York', 'London', 'Paris', np.nan]}
df = pd.DataFrame(data)

# Check for missing values
print("Missing Values in Each Column (Count):")
print(df.isnull().sum())

# Option 1: drop rows or columns with missing values (if there's too much missing data)
# df_cleaned = df.dropna()
# missing_percentage = df.isnull().sum() / len(df) * 100
# columns_to_drop = missing_percentage[missing_percentage > 50].index
# df_cleaned = df.drop(columns=columns_to_drop)

# Option 2: impute. Fill the numerical column with its mean
# (use the median instead if the column is skewed or has outliers)
df['Age'] = df['Age'].fillna(df['Age'].mean())
# df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill the categorical column with its most frequent value (mode)
df['City'] = df['City'].fillna(df['City'].mode()[0])

# Option 3: forward fill or backward fill (mainly useful for ordered data)
# df['Age'] = df['Age'].ffill()
# df['Age'] = df['Age'].bfill()

print("\nMissing Values After Handling:")
print(df.isnull().sum())

E.g.: Original missing values

2.2. DATA TRANSFORMATION:

Data transformation is a critical step in preparing a laptop price dataset for analysis and
modelling. It involves converting raw data into a format that better suits the analytical needs of
the model. For the laptop price dataset of Lenovo, transformations might include scaling
numeric features like RAM, Storage, and Price to normalize the values, ensuring that features
with larger ranges don’t dominate the model’s learning process. Standardization or
normalization can be applied to features such as Price and Processor Speed to make the dataset
more uniform. Categorical features like Brand, Processor Type, and Storage Type may need to
be encoded into numeric values using techniques such as one-hot encoding or label encoding.
This helps machine learning algorithms interpret categorical data effectively. Additionally,
transforming skewed features, such as converting Price using log transformation, can help
reduce the impact of outliers. Overall, data transformation enhances the dataset’s structure,
enabling better model performance and more accurate price predictions.
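
The effect of the log transformation mentioned above can be checked numerically. This is a minimal sketch that reuses the sample prices from this section; the use of `np.log1p` (log of 1 + x) rather than a plain logarithm is an assumption chosen so that zero values would not cause errors.

```python
import numpy as np
import pandas as pd

# Sample prices (same illustrative values as in the listing of this section).
prices = pd.Series([500, 800, 1200, 2000, 700, 1500, 1800, 3000], name="Price")

# Compress the long right tail of the price distribution.
log_prices = np.log1p(prices)

print("Skewness before:", round(prices.skew(), 3))
print("Skewness after: ", round(log_prices.skew(), 3))

# Predictions made on the log scale are mapped back with the inverse function.
recovered = np.expm1(log_prices)
```

A model trained on `log_prices` predicts log prices, so its output has to be passed through `np.expm1` before it is reported in dollars.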

CODING:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Sample data (replace this with the actual dataset)
data = {
    'Price': [500, 800, 1200, 2000, 700, 1500, 1800, 3000],
    'RAM': [4, 8, 16, 32, 8, 16, 32, 64],  # in GB
    'SSD': [256, 512, 1024, 2048, 512, 1024, 2048, 4096],  # in GB
    'Screen_Size': [13, 14, 15, 17, 15, 14, 13, 17],  # in inches
    'Weight': [3.5, 2.8, 2.5, 4.0, 2.9, 2.6, 3.2, 3.9],  # in kg
    'Brand': ['HP', 'Dell', 'Apple', 'Lenovo', 'HP', 'Apple', 'Dell', 'Lenovo']  # Categorical variable
}
df = pd.DataFrame(data)

# Standardize Price to zero mean and unit variance
scaler = StandardScaler()
df['Price_Scaled'] = scaler.fit_transform(df[['Price']]).ravel()

# Normalize RAM to the [0, 1] range
min_max_scaler = MinMaxScaler()
df['RAM_Norm'] = min_max_scaler.fit_transform(df[['RAM']]).ravel()

# Derive a new feature before encoding so that it appears in the final output
df['Price_per_GB_RAM'] = df['Price'] / df['RAM']

# One-hot encode Brand, dropping the first category to avoid a redundant column
df_encoded = pd.get_dummies(df, columns=['Brand'], drop_first=True)

print("Transformed Dataset:")
print(df_encoded)

OUTPUT:

2.3 DATA CLEANING:

Data cleaning for the Lenovo laptop price dataset involves identifying and
correcting inconsistencies, inaccuracies, and missing information to ensure the data is reliable
for analysis and modelling. The process begins by removing or imputing missing values in
critical columns like price, RAM, or storage, using techniques such as mean or median
imputation for numerical fields and mode imputation or assigning an "Unknown" category for
categorical fields like brand or operating system. Duplicate rows, if any, are identified and
removed to prevent skewing the analysis. Inconsistent data formats across columns, such as
variations in units for weight or storage (e.g., GB, TB), are standardized to maintain uniformity.
Outliers, such as unusually high or low prices that may not reflect realistic values, are detected
and handled through capping or removal to prevent them from distorting results. The cleaning
process also includes fixing typographical errors, normalizing categorical values (e.g., ensuring
consistent brand names), and ensuring all necessary columns are present and correctly
formatted. This thorough data cleaning ensures the dataset is accurate, consistent, and suitable
for building robust machine learning models.
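
Three of the cleaning steps described above (normalizing brand names, standardizing storage units, and capping price outliers) can be sketched as follows. The raw strings, the helper `storage_to_gb`, the 1 TB = 1024 GB convention, and the 1.5 × IQR capping rule are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw entries with inconsistent formatting and one extreme price.
raw = pd.DataFrame({
    "Brand": [" lenovo", "Lenovo ", "LENOVO", "Lenovo", "lenovo", "Lenovo"],
    "Storage": ["256GB", "512 GB", "1TB", "2 TB", "512GB", "1 TB"],
    "Price": [450, 620, 900, 9999, 640, 1100],
})

# Normalize brand names: strip whitespace and use consistent capitalization.
raw["Brand"] = raw["Brand"].str.strip().str.title()


def storage_to_gb(text):
    """Convert strings such as '512 GB' or '1TB' to gigabytes (1 TB = 1024 GB)."""
    text = text.strip().upper()
    number = float(text[:-2])  # everything before the two-letter unit
    return number * 1024 if text.endswith("TB") else number


raw["Storage_GB"] = raw["Storage"].apply(storage_to_gb)

# Cap prices outside 1.5 * IQR of the quartiles instead of deleting the rows.
q1, q3 = raw["Price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
raw["Price_Capped"] = raw["Price"].clip(lower=lower, upper=upper)

print(raw)
```

Capping keeps the row in the dataset while limiting its influence; whether capping or removal is more appropriate depends on whether the extreme value is a data-entry error or a genuine high-end model.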

CODING:
import pandas as pd
import numpy as np

# Sample data with missing values and one wrongly typed entry
data = {
    'Price': [500, 800, 1200, np.nan, 700, 1500, 1800, 3000, 3000],
    'RAM': [4, 8, 16, 32, 8, 16, 32, 64, 'NaN'],  # One entry is a string
    'SSD': [256, 512, 1024, 2048, 512, 1024, 2048, np.nan, 4096],
    'Screen_Size': [13, 14, np.nan, 17, 15, 14, 13, 17, 17],
    'Weight': [3.5, 2.8, 2.5, 4.0, np.nan, 2.6, 3.2, 3.9, 3.9]
}
df = pd.DataFrame(data)

print("Original Dataset:")
print(df)

# Convert RAM to a numeric column; entries that cannot be parsed become NaN
df['RAM'] = pd.to_numeric(df['RAM'], errors='coerce')

# Fill all missing numeric values (including those from the coercion) with the column mean
df = df.fillna(df.mean(numeric_only=True))

# Remove duplicate rows and rebuild the index
df = df.drop_duplicates().reset_index(drop=True)

print("\nCleaned Dataset:")
print(df)

OUTPUT:

E.g.: Data cleaning

3. DATA ANALYSIS

Data analysis for the Lenovo laptop price dataset involves exploring the
relationships between various laptop features and their impact on price, with the goal of
identifying trends and insights. The process begins with descriptive statistics to summarize the
dataset, such as calculating the mean, median, and distribution of key variables like price,
RAM, and storage. Visualizations like bar plots, histograms, and scatter plots are then used to
observe the distribution of laptop prices across brands, processor types, and other
specifications, highlighting which factors are most correlated with price variations. Correlation
analysis is conducted to understand the strength and direction of relationships between
numerical features, such as the positive correlation expected between price and RAM or storage
capacity. Grouping and aggregation techniques are applied to identify price trends by brand or
performance tier, showing which brands command higher premiums or which hardware
configurations offer the best value. Through this analysis, important insights emerge, such as
how specific features (e.g., screen size, GPU type) influence price, providing valuable guidance
for further modelling and decision-making.

Data analysis for the laptop price dataset on Lenovo focuses on understanding the relationships
between various laptop features and their impact on price. The process begins with descriptive
statistics, where key metrics such as the mean, median, and range of features like price, RAM,
storage, and screen size are examined to gain an initial understanding of the dataset.
Visualizations, such as histograms, box plots, and scatter plots, are used to explore distributions
and detect patterns or outliers in the data. For instance, scatter plots can reveal the correlation
between price and RAM, or between price and processor type, helping to identify which
features are most influential in determining the price. Additionally, bar charts can be used to
analyse price variations across different brands and operating systems. Correlation matrices
help in identifying multicollinearity among numerical variables, which might influence model
performance. This exploratory data analysis provides valuable insights into the dataset, guiding
feature selection and informing subsequent machine learning model development for price
prediction.
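
The grouping and aggregation step described above might look like the following sketch. The toy rows and the column names `Processor`, `RAM`, and `Price` are assumptions for illustration, not values from the actual dataset.

```python
import pandas as pd

# Hypothetical laptops across several processor tiers.
laptops = pd.DataFrame({
    "Processor": ["i3", "i5", "i5", "i7", "i7", "Ryzen 5", "Ryzen 7", "i3"],
    "RAM": [4, 8, 16, 16, 32, 8, 16, 8],
    "Price": [400, 650, 800, 1200, 1700, 600, 1150, 450],
})

# Price summary per processor tier, cheapest tier first.
by_processor = (
    laptops.groupby("Processor")["Price"]
    .agg(["count", "mean", "median"])
    .sort_values("mean")
)
print(by_processor)

# Strength and direction of the linear relationship between RAM and price.
ram_price_corr = laptops["RAM"].corr(laptops["Price"])
print("Correlation between RAM and Price:", round(ram_price_corr, 3))
```

The same pattern works for any categorical column, for example grouping by brand or operating system instead of processor.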

3.1. DESCRIPTIVE STATISTICS:

Descriptive statistics provide a summary of the central tendency, dispersion, and shape of a
dataset's distribution. For the Lenovo laptop price dataset, this means examining key metrics
such as the mean, median, and range of features like price, RAM, storage, and screen size.
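
A summary of this kind can be produced with `DataFrame.describe()`. The sketch below uses hypothetical values and column names; skewness is appended as an extra row to cover the "shape" aspect.

```python
import pandas as pd

# Hypothetical numerical features of a few laptops.
laptops = pd.DataFrame({
    "RAM": [4, 8, 8, 16, 16, 32],
    "Storage": [256, 512, 512, 512, 1024, 1024],
    "Screen_Size": [14.0, 15.6, 14.0, 15.6, 13.3, 15.6],
    "Price": [450, 620, 640, 980, 1100, 1650],
})

# Central tendency and dispersion: count, mean, std, min, quartiles, max.
summary = laptops.describe()

# Shape of each distribution: positive skew indicates a long right tail.
summary.loc["skew"] = laptops.skew()

print(summary.round(2))
```

The `50%` row of the result is the median, so the table covers mean, median, spread, and range in one view.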

3.2. ANALYZING PRICE AND TRENDS:


Analysing price trends in the laptop price dataset on Lenovo reveals insights into market
dynamics and consumer preferences. For instance, if the average price has been steadily
increasing over the years, it may indicate a growing demand for high-performance laptops.
Conversely, a decline in prices could suggest market saturation or increased competition.
Seasonal fluctuations might also be evident, with prices dropping during major sales events
like Black Friday. Additionally, the correlation between specifications (like RAM or storage)
and price can highlight the value consumers place on certain features. Overall, these trends help
inform purchasing decisions and guide manufacturers in product development.

Analysing price trends in the laptop price dataset can provide valuable insights into the market
dynamics and consumer preferences. Typically, laptops with higher specifications—such as
increased RAM, storage capacity, or premium processors—tend to command higher prices.
Brands also play a crucial role, with premium brands often pricing their models higher due to
factors like build quality, brand value, and after-sales services. Seasonal trends might also
emerge, with price drops around major sale events like Black Friday or during back-to-school
promotions. By tracking these trends, businesses can better understand pricing strategies, and
consumers can identify the best times or features to prioritize when purchasing. This analysis
is crucial for both manufacturers looking to optimize pricing models and consumers seeking
the best value for money.

CODING:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('laptop_price_data.csv')

# Group by a category (here, Brand) and compute the mean price
price_trends = data.groupby('Brand')['Price (USD)'].mean().sort_values()

# Plot the bar chart
plt.figure(figsize=(10, 6))
price_trends.plot(kind='bar', color='skyblue')

# Add labels and title
plt.title('Average Laptop Prices by Brand', fontsize=16)
plt.xlabel('Laptop Brand', fontsize=12)
plt.ylabel('Average Price (USD)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

OUTPUT:

3.3. INSIGHTS FROM THE DATASET:

RAM and Storage Influence on Price: Higher RAM (e.g., 16 GB) and larger storage (e.g., 1 TB
SSD) are strong predictors of higher laptop prices, reflecting their importance in performance.

Processor Impact: Laptops with advanced processors (e.g., Intel i7, AMD Ryzen 7) tend to be
priced significantly higher, indicating that processor type plays a crucial role in pricing.

Brand Value: Premium brands like Apple, Dell, and HP have higher average prices compared
to lesser-known brands, signifying the role of brand loyalty and perceived quality in price
determination.

Price Segmentation: The market appears segmented into budget, mid-range, and premium
categories, with lower-priced models targeting students or casual users and high-end laptops
catering to gamers and professionals.

SSD vs. HDD: Laptops with SSD storage are priced higher than those with traditional HDDs,
suggesting the growing consumer preference for faster, more efficient storage.

Effect of Screen Size: Larger screen sizes (15.6 inches and above) often correlate with higher
prices, indicating that consumers are willing to pay more for larger displays.

Seasonal Pricing Trends: Prices may drop during major sale events (like Black Friday or
back-to-school promotions), offering opportunities for cost-conscious buyers.

4. FEATURE SELECTION

Feature selection is a crucial step in the data preprocessing phase that involves selecting a subset
of relevant features for model training. In the Lenovo laptop price dataset, key
features such as RAM, Storage, Processor type, and Brand can significantly influence the price.
Correlation analysis helps to identify relationships between these features and the target
variable, allowing us to understand which attributes contribute most to variations in laptop
prices. For instance, high correlation coefficients between Price and RAM or Storage indicate
that these features should be retained in the model, as they provide valuable information. On
the other hand, features with low correlation might be considered for removal to reduce
dimensionality and enhance model performance.

Feature selection is an essential process in the data preprocessing phase of machine learning,
aimed at identifying the most relevant variables for model training. In the context of a laptop
price dataset, effective feature selection helps to enhance model performance by retaining
features that significantly contribute to predicting the price while eliminating redundant or
irrelevant variables. This process not only reduces the complexity of the model, leading to faster
training times and improved interpretability but also mitigates the risk of overfitting. For
instance, features like RAM, Storage, Processor type, and Brand often show strong correlations
with price and can provide critical insights into consumer preferences. By using statistical
techniques such as correlation analysis, recursive feature elimination, or tree-based methods,
one can systematically assess the importance of each feature, ensuring that the final model is
both efficient and robust. Ultimately, careful feature selection plays a pivotal role in building
predictive models that accurately reflect the underlying data patterns, resulting in better
decision-making and more actionable insights.
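
A simple correlation-based filter, as described above, can be sketched like this. The toy data, the column names, and the cut-off `THRESHOLD = 0.5` are assumptions; a suitable threshold depends on the dataset, and this filter only captures linear relationships.

```python
import pandas as pd

# Hypothetical numerical features and the target column "Price".
laptops = pd.DataFrame({
    "RAM": [4, 8, 8, 16, 16, 32, 16, 8],
    "Storage": [256, 256, 512, 512, 1024, 1024, 512, 256],
    "Screen_Size": [15.6, 14.0, 15.6, 14.0, 13.3, 15.6, 14.0, 15.6],
    "Price": [420, 600, 680, 950, 1300, 1800, 1000, 560],
})

THRESHOLD = 0.5  # assumed cut-off on the absolute correlation with price

# Correlation of every feature with the target, ranked by absolute strength.
target_corr = laptops.corr()["Price"].drop("Price")
ranked = target_corr.abs().sort_values(ascending=False)

# Keep only features whose absolute correlation reaches the threshold.
selected = ranked[ranked >= THRESHOLD].index.tolist()

print(ranked.round(3))
print("Selected features:", selected)
```

Ranking by the absolute value matters: a strongly negative correlation is just as informative for prediction as a strongly positive one.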

4.1. CORRELATION MATRIX:

A correlation matrix for the Lenovo laptop price dataset helps identify the
strength and direction of relationships between various features, such as RAM, Storage,
Processor type, and Price. A high positive correlation (close to +1) between variables like RAM
and Price, or Storage and Price, indicates that as RAM or storage capacity increases, the laptop
price tends to rise as well. This suggests that higher-specification laptops command a premium.
On the other hand, features with a correlation close to zero have little linear relationship with
pricing, while a strongly negative correlation would mean that the price tends to fall as the
feature increases. For example, there may be weaker correlations between screen size or battery life and
price, indicating that while these features matter to some extent, they are not the primary drivers
of cost. By analysing these correlations, businesses can identify which features have the most
significant influence on pricing, allowing for more targeted marketing and product
development strategies.

A high correlation between processor type (e.g., Intel i7, Ryzen 7) and price would further
show that laptops equipped with premium processors command higher prices. On the other
hand, a lower correlation with features like battery life or screen size might indicate that while
these factors contribute to consumer decisions, they don’t significantly affect pricing compared
to core hardware components like RAM and storage.

The correlation matrix can also reveal interesting dynamics between non-numeric factors, like
brand and price. While brand itself may not have a numerical value, laptops from premium
brands such as Apple or Dell could show an implicit relationship with price due to their higher
perceived value, which could be reflected in a positive correlation when brand-specific dummy
variables are used.

By analysing the correlations, companies can better understand what features to prioritize in
their pricing strategies. For instance, a high correlation between SSD storage and Price might
encourage manufacturers to promote SSD-equipped laptops as premium products. In summary,
the correlation matrix serves as a key tool for uncovering which features most impact price,
guiding product development, marketing, and consumer decision-making.
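
To illustrate how brand can enter a correlation matrix through dummy variables, here is a minimal sketch. The brands, specifications, and prices are made-up values, and the 0.8 cut-off for flagging overlapping features is an assumption.

```python
import pandas as pd

# Hypothetical laptops from three brands.
laptops = pd.DataFrame({
    "Brand": ["Lenovo", "Lenovo", "Dell", "Apple", "Dell", "Apple", "Lenovo", "Apple"],
    "RAM": [4, 8, 8, 16, 16, 32, 8, 16],
    "Storage": [256, 512, 512, 512, 1024, 1024, 256, 1024],
    "Price": [500, 700, 800, 1500, 1100, 2000, 900, 1800],
})

# One 0/1 column per brand, so that brand can take part in the correlation.
encoded = pd.get_dummies(laptops, columns=["Brand"], dtype=float)
corr_matrix = encoded.corr()
print(corr_matrix.round(2))

# Flag pairs of predictors that overlap strongly with each other.
features = corr_matrix.drop(index="Price", columns="Price")
high_pairs = [
    (a, b, round(features.loc[a, b], 2))
    for i, a in enumerate(features.columns)
    for b in features.columns[i + 1:]
    if abs(features.loc[a, b]) > 0.8
]
print("Highly correlated predictor pairs:", high_pairs)
```

In this toy data the `Brand_Apple` column correlates positively with price simply because the made-up Apple rows are the most expensive ones; the real dataset may show a different picture.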

4.1 CORRELATION MATRIX OUTPUT

4.2. SELECTING RELEVANT FEATURES:

Selecting relevant features for the Lenovo laptop price dataset is crucial for
building an effective predictive model. The process typically begins with data preprocessing,
which includes handling missing values and encoding categorical variables. Correlation
analysis helps identify relationships between features and the target variable, allowing for the
visualization of how various attributes, such as processor speed, RAM, and storage, correlate
with laptop prices. Subsequently, a machine learning model, such as a Random Forest regressor,
can be employed to assess feature importance, highlighting which features contribute the most
to price prediction. Finally, Recursive Feature Elimination (RFE) can be utilized to refine the
feature set by recursively selecting the most significant features while eliminating the least
impactful ones. By systematically identifying and retaining only the most relevant features, the
model can improve in accuracy, interpretability, and computational efficiency, ultimately
leading to better insights and predictions regarding Lenovo laptop prices.

Correlation Analysis:

• Use a correlation matrix to visualize relationships between features and the target
variable (laptop price).
• Identify highly correlated features that may provide similar information.

Feature Importance:

• Train a machine learning model, such as Random Forest, to assess feature importance
scores.
• Visualize the top features that significantly influence laptop prices.

Recursive Feature Elimination (RFE):

• Implement RFE to iteratively remove the least important features based on a chosen
model.
• Focus on retaining features that contribute the most to predictive performance.

Domain Knowledge:

• Leverage knowledge of the laptop market to include features likely to affect price, such
as brand, specifications (RAM, processor type, storage), and condition (new vs. used).

Model Evaluation:

• Use cross-validation to evaluate the performance of the model with the selected features
and ensure robustness.

Iterative Process:

• Feature selection is iterative; continuously refine the feature set based on model
performance and insights gained during analysis.
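
The feature importance, RFE, and cross-validation steps listed above can be combined as in the following sketch. The data is synthetic: the price formula, the `Noise` column, and the choice of keeping three features are assumptions made so that the example is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic data: price depends on RAM, storage, and processor speed only.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "RAM": rng.choice([4, 8, 16, 32], size=n),
    "Storage": rng.choice([256, 512, 1024], size=n),
    "Processor_Speed": rng.uniform(1.8, 4.0, size=n),
    "Screen_Size": rng.choice([13.3, 14.0, 15.6], size=n),
    "Noise": rng.normal(size=n),  # deliberately irrelevant feature
})
y = (30 * X["RAM"] + 0.5 * X["Storage"] + 200 * X["Processor_Speed"]
     + rng.normal(0, 50, size=n))

# Step 1: feature importance from a Random Forest.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).round(3))

# Step 2: RFE repeatedly drops the least important feature until 3 remain.
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
               n_features_to_select=3)
selector.fit(X, y)
selected = X.columns[selector.support_].tolist()
print("Selected features:", selected)

# Step 3: 5-fold cross-validation using only the selected features.
scores = cross_val_score(RandomForestRegressor(n_estimators=100, random_state=0),
                         X[selected], y, cv=5, scoring="r2")
print("Mean R^2:", round(scores.mean(), 3))
```

For a stricter evaluation, the selector would be placed inside a pipeline so that feature selection is repeated within each cross-validation fold; selecting on the full data first, as done here for brevity, can make the scores slightly optimistic.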

4.3 DIMENSIONALITY REDUCTION TECHNIQUES:

Dimensionality reduction techniques are essential for managing and analysing a laptop
price dataset, particularly when dealing with a large number of features. These methods aim to
reduce the number of variables while retaining the most critical information, improving model
performance and interpretability. One widely used technique is Principal Component Analysis
(PCA), which transforms the original features into a smaller set of uncorrelated variables called
principal components, capturing the majority of the variance in the data. Another effective
method is t-distributed Stochastic Neighbour Embedding (t-SNE), which is particularly useful
for visualizing high-dimensional data in lower dimensions while preserving the local structure
of the data. Linear Discriminant Analysis (LDA) can also be employed when the target variable
is categorical, helping to find the feature subspace that best separates the different classes (e.g.,
laptop brands or categories). Additionally, autoencoders, a type of neural network, can learn
efficient representations of the input data by compressing it into a lower-dimensional space and
then reconstructing it. By applying these dimensionality reduction techniques, analysts can
simplify complex datasets, enhance computational efficiency, and potentially improve the
predictive accuracy of models aimed at estimating laptop prices.
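
As a rough illustration of the PCA step described above, the sketch below standardizes four numeric specifications and projects them onto two principal components. It assumes scikit-learn is available; the feature values are synthetic and the relationships between them (SSD size following RAM, screen size following weight) are invented for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical, synthetic specifications (not taken from the dataset).
rng = np.random.default_rng(1)
n = 150
ram = rng.choice([4, 8, 16, 32], size=n).astype(float)
ssd = 32 * ram + rng.normal(0, 60, size=n)             # assumed to track RAM
weight = rng.uniform(1.0, 3.0, size=n)
screen = 12 + 2 * weight + rng.normal(0, 0.3, size=n)  # assumed to track weight
X = np.column_stack([ram, ssd, weight, screen])

# PCA is scale-sensitive, so standardize each column first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions of greatest variance.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

print("Reduced shape:", components.shape)
print("Variance explained per component:", pca.explained_variance_ratio_)
```

Because the four columns form two strongly correlated pairs, two components retain most of the variance here; on the real dataset the explained-variance ratios would guide how many components to keep.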

• Principal Component Analysis (PCA):

• Transforms the data into a lower-dimensional space while retaining the most variance.
• Identifies the directions (principal components) along which the variance is maximized.

• t-Distributed Stochastic Neighbour Embedding (t-SNE):

• A non-linear technique primarily used for data visualization in two or three dimensions.
• Preserves local structures, making it effective for visualizing clusters in
high-dimensional datasets.

• Linear Discriminant Analysis (LDA):

• A supervised method that reduces dimensionality while maximizing the separation
between multiple classes.
• Useful for feature extraction and classification tasks.

Autoencoders:

• A type of neural network that learns to encode data into a lower-dimensional
representation and then reconstruct it.
• Effective for capturing complex patterns in data while reducing dimensions.

Independent Component Analysis (ICA):

• Similar to PCA but focuses on finding independent components rather than uncorrelated ones.
• Commonly used in signal processing and image analysis.

Random Projections:

• A technique that uses random matrices to project high-dimensional data into a
lower-dimensional space.
• Retains distances between points with high probability, making it efficient and scalable.

Feature Selection Methods:

• Techniques like Recursive Feature Elimination (RFE) and feature importance from
models (e.g., Random Forest) can help reduce dimensionality by selecting only the most
relevant features.

Kernel PCA:

• An extension of PCA that applies kernel methods to capture non-linear relationships in
the data.
• Suitable for datasets whose structure cannot be captured by linear projections.

Singular Value Decomposition (SVD):

• A mathematical technique that decomposes a matrix into its singular values and vectors.
• Often used in collaborative filtering and image compression.
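
To make the link between SVD and dimensionality reduction concrete, here is a small NumPy sketch. It decomposes a centred feature matrix, keeps the top `k` singular directions (which coincide with the principal components), and measures the reconstruction error. The matrix is random and purely illustrative.

```python
import numpy as np

# Hypothetical numeric feature matrix: 50 laptops, 4 features.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0] + 0.1 * X[:, 3]  # make one column nearly redundant

# Centre the columns, then decompose: Xc = U @ diag(S) @ Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Share of total variance carried by each singular direction.
explained = S**2 / np.sum(S**2)
print("Explained variance ratio:", explained)

# Project onto the first k right singular vectors.
k = 2
X_reduced = Xc @ Vt[:k].T          # same as U[:, :k] * S[:k]

# Rank-k reconstruction and its Frobenius-norm error.
X_approx = X_reduced @ Vt[:k]
error = np.linalg.norm(Xc - X_approx)
print("Reconstruction error:", error)
```

The error equals the square root of the sum of the squared discarded singular values, which is why small trailing singular values indicate dimensions that can be dropped with little loss.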

5.DATA VISUALIZATION

Data visualization is an essential part of analysing the Lenovo laptop price dataset, as it
transforms complex numerical data into intuitive graphical formats that enhance
comprehension and insight. By employing visualization libraries such as Matplotlib and
Seaborn, we can create various types of charts and plots to better understand the relationships
among different variables. For example, histograms can illustrate the distribution of laptop
prices, revealing important information such as skewness, outliers, and the range of prices
available in the market. Box plots serve as effective tools for comparing price variations across
different brands or laptop types, highlighting key statistics like medians and interquartile
ranges.

Scatter plots can further elucidate relationships between price and critical features such as RAM
size, processor type, and storage capacity. These plots allow us to visually assess how these
specifications impact the pricing of laptops, providing a clear understanding of market
dynamics. Additionally, a heatmap can be employed to visualize correlation matrices,
indicating which features are most strongly associated with laptop prices. This can be
particularly useful for identifying redundant features or key drivers of price.

Furthermore, employing pie charts or bar charts to represent the proportions of different laptop
types or operating systems in the dataset can help us understand consumer preferences and
trends. By integrating these various visualization techniques, we can derive actionable insights
that guide consumers in their purchasing decisions and assist manufacturers in understanding
market trends and customer needs. Ultimately, effective data visualization not only clarifies
complex datasets but also facilitates more informed decision-making, making it an invaluable
tool in the analysis of the laptop price dataset.

Understanding Price Distribution: Data visualization techniques, such as histograms and box
plots, can be employed to examine the distribution of laptop prices. By visualizing the price
range and identifying potential outliers, stakeholders can gain insights into market trends and
pricing strategies, enabling them to make informed decisions about product offerings and
marketing approaches.
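
The numbers that a histogram and box plot summarize can also be computed directly. The sketch below uses a short list of made-up prices (hypothetical, in USD) and applies the 1.5 × IQR rule, which is the default whisker length in Matplotlib box plots, to flag outliers.

```python
import pandas as pd

# Hypothetical prices in USD; the last value mimics a high-end model.
prices = pd.Series(
    [450, 520, 600, 640, 700, 750, 820, 900, 980, 3200],
    name="price_usd",
)

# Quartiles and interquartile range (the "box" in a box plot).
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(prices.describe())
print("Skewness:", prices.skew())  # positive means a long right tail
print("Outlier bounds:", lower, upper)
print("Outliers:", outliers.tolist())
```

A positive skewness together with a few high outliers is the pattern one would expect if a handful of premium models sit well above the typical price range.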

Analysing Feature Relationships: Scatter plots and correlation heatmaps can help analyse the
relationships between laptop prices and key features, such as RAM, processor type, and storage
capacity. These visualizations can reveal how different specifications influence pricing, aiding
both consumers in selecting laptops that fit their needs and manufacturers in optimizing their
product designs based on market demand.
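
The values behind a correlation heatmap come from a correlation matrix, which pandas can compute in one call. The sketch below uses small illustrative values; the column names are hypothetical. Note that `ssd_gb` is an exact multiple of `ram_gb` in this toy data, so the two are perfectly correlated, which is the kind of redundancy a heatmap makes visible.

```python
import pandas as pd

# Illustrative values only; column names are assumptions.
df = pd.DataFrame({
    "price_usd": [500, 800, 1200, 2000, 700, 1500, 1800, 3000],
    "ram_gb": [4, 8, 16, 32, 8, 16, 32, 64],
    "ssd_gb": [256, 512, 1024, 2048, 512, 1024, 2048, 4096],
    "screen_in": [13, 14, 15, 17, 15, 14, 13, 17],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()
print(corr.round(2))

# Rank features by the strength of their correlation with price.
price_corr = corr["price_usd"].drop("price_usd").sort_values(ascending=False)
print(price_corr)
```

Passing `corr` to a heatmap function such as `seaborn.heatmap(corr, annot=True)` would render the same matrix graphically.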

CODING:

import matplotlib.pyplot as plt

# Placeholder categories and values for a basic bar chart
categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 24, 36, 40, 29]

plt.bar(categories, values, color='skyblue')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart Example')
plt.show()

5.1. VISUALIZATION OF FEATURES:

Price Distribution:
A histogram of Lenovo laptop prices can reveal the overall distribution and
spread of prices across different laptop models. This visualization helps us understand the most
common price ranges and identify any extreme values, such as high-end gaming laptops or
budget-friendly options. A box plot can complement this by highlighting outliers, showing the
range of typical laptop prices, and providing insights into which models fall into premium or
low-budget categories.

Brand vs. Price:

A bar chart that groups laptops by brand and their average prices is useful for understanding
brand influence on pricing. For example, luxury brands like Apple or high-end gaming brands
like Alienware tend to have higher average prices compared to more budget-friendly brands
like Acer or Lenovo. A box plot for each brand can further illustrate the price variance within
the brand, showing whether the brand offers a wide range of models or sticks to a specific price
segment.
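
The per-brand averages and spreads described above can be computed with a pandas group-by before plotting. The brands and prices below are hypothetical examples, not figures from the dataset.

```python
import pandas as pd

# Hypothetical brand/price pairs in USD.
df = pd.DataFrame({
    "brand": ["Lenovo", "Lenovo", "Lenovo", "Acer", "Acer", "Apple", "Apple"],
    "price_usd": [550, 900, 1400, 480, 720, 1300, 2400],
})

# Average price per brand (bar chart input) plus spread statistics
# (what a per-brand box plot would show).
summary = df.groupby("brand")["price_usd"].agg(["mean", "median", "min", "max"])
summary["range"] = summary["max"] - summary["min"]
summary = summary.sort_values("mean", ascending=False)
print(summary)
```

Calling `summary["mean"].plot(kind="bar")` would turn the first column into the bar chart discussed in the text.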

Processor Type vs. Price:

Processors play a significant role in laptop pricing. A box plot grouped by processor types (e.g.,
Intel i3, i5, i7, AMD Ryzen 5, Ryzen 7) can show the price distribution for laptops with different
CPUs. Higher-end processors are likely to correspond with higher prices, while budget models
tend to have older or lower-tier processors. This visualization helps in determining the price
premium for specific processors and understanding how much a better processor drives up the
price.

RAM Size and SSD Storage vs. Price:

The relationship of RAM size and SSD storage to price can be captured using scatter
plots or line charts. RAM capacity (e.g., 8GB, 16GB, or 32GB) tends to be positively
correlated with price: more RAM generally means a higher price. Similarly, SSD
storage size (e.g., 256GB, 512GB, 1TB) can be visualized to show how much additional storage
capacity impacts pricing. These plots make it easy to observe trends in hardware upgrades and
their effect on laptop costs.

Screen Size and Weight vs. Price:

A scatter plot for screen size versus price can help visualize how screen dimensions influence
the price. Larger screen sizes, such as 17-inch laptops, often come with higher prices due to
better display quality and larger form factors, while smaller, ultra-portable 13-inch laptops may
also command a premium for portability. Similarly, a scatter plot for weight vs. price can show
how lighter laptops, typically built for portability, are often more expensive than heavier
models. This correlation is important for users prioritizing portability and design.

GPU Type and Gaming Laptops vs. Price:

For laptops aimed at gamers or professionals needing graphical power, a bar chart can show
how the GPU type (e.g., NVIDIA GeForce, AMD Radeon) impacts the price. Gaming laptops
with dedicated graphics cards generally cost significantly more than those without, and
visualizing the GPU types alongside prices provides insights into how much more a user might
expect to pay for enhanced graphics performance.

Battery Life vs. Price:

A box plot or line chart comparing battery life (hours) with price can illustrate whether laptops
with longer battery life, such as ultrabooks, have higher prices. This is particularly relevant for
users who prioritize mobility and work on-the-go, helping to identify laptops with a good
balance of battery life and price.

CODING:

import pandas as pd
import matplotlib.pyplot as plt

# Illustrative values for three placeholder features
data = {
    'Feature1': [12, 7, 9, 13, 5, 18, 14, 10, 9, 16],
    'Feature2': [22, 25, 19, 24, 26, 21, 23, 22, 20, 24],
    'Feature3': [34, 31, 32, 36, 35, 33, 31, 34, 32, 30]
}

df = pd.DataFrame(data)

# Draw one histogram per column
df.hist(bins=10, color='skyblue', edgecolor='black', figsize=(10, 5))
plt.suptitle('Histograms of Features')
plt.tight_layout()
plt.show()

6.1 VISUALIZATION OF FEATURES

5.2. RELATIONSHIP BETWEEN PRICE AND OTHER FEATURES:

The relationship between Lenovo laptop prices and other features is
typically influenced by a combination of performance, brand, and build quality. Processor type
is one of the most significant factors, as high-performance CPUs like Intel i7 or AMD Ryzen
7 often drive prices up compared to entry-level processors. RAM size also plays a critical role,
with laptops featuring 16GB or 32GB of RAM generally priced higher than models with 4GB
or 8GB, as they offer better multitasking and performance. Storage capacity, especially SSD
size (e.g., 512GB or 1TB SSD), is another key factor influencing price, as larger, faster storage
options tend to increase costs.

Brand reputation significantly impacts pricing, with premium brands like Apple and Dell
typically having higher prices due to their design, build quality, and brand value. Screen size
and resolution also correlate with price; larger screens and higher-resolution displays (e.g., 4K)
command higher prices. Dedicated GPUs for gaming or graphic-intensive tasks can
substantially increase the price compared to laptops with integrated graphics. Finally, battery
life and portability (lighter, thinner designs) are often priced higher as consumers are willing
to pay more for convenience and mobility.

In essence, laptop prices are driven by a complex interplay of performance-related features,
brand, and design elements, where each additional enhancement typically leads to a higher
price bracket.
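
One simple way to put numbers on how much each feature adds to the price is an ordinary least-squares fit. The report does not state that this model was used; it is offered here only as a sketch. The data are synthetic, and the "true" coefficients (30 USD per GB of RAM, 25 USD per inch of screen) are assumptions chosen to generate the example.

```python
import numpy as np

# Hypothetical, synthetic data.
rng = np.random.default_rng(3)
n = 100
ram_gb = rng.choice([4, 8, 16, 32], size=n).astype(float)
screen_in = rng.choice([13.0, 14.0, 15.6, 17.3], size=n)

# Assumed ground truth, used only to generate example prices.
price = 200 + 30 * ram_gb + 25 * screen_in + rng.normal(0, 20, size=n)

# Design matrix with an intercept column, then a least-squares solve.
A = np.column_stack([np.ones(n), ram_gb, screen_in])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, per_gb, per_inch = coef

print(f"Base price: {intercept:.1f} USD")
print(f"Price per extra GB of RAM: {per_gb:.2f} USD")
print(f"Price per extra inch of screen: {per_inch:.2f} USD")
```

The fitted coefficients estimate the price premium of each feature while holding the other fixed, which a pairwise scatter plot alone cannot show.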

CODING:

import matplotlib.pyplot as plt

ram_sizes = [4, 8, 16, 32]  # In GB
ssd_sizes = [256, 512, 1024, 2048]  # In GB
screen_sizes = [13, 14, 15, 17]  # In inches
prices = [500, 800, 1200, 2000]  # In USD

plt.figure(figsize=(10, 6))

# Each feature is plotted against the same price list on a shared x-axis
plt.plot(ram_sizes, prices, marker='o', label='RAM Size vs Price', color='blue')
plt.plot(ssd_sizes, prices, marker='s', label='SSD Size vs Price', color='green')
plt.plot(screen_sizes, prices, marker='^', label='Screen Size vs Price', color='red')

plt.xlabel('Feature Values')
plt.ylabel('Price (USD)')
plt.title('Relationship Between Laptop Price and Other Features')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:

5.3. INSIGHTS FROM VISUAL REPRESENTATIONS:

The visualizations of the Lenovo laptop price dataset further illustrate
important trends and relationships among various features. For instance, the pie chart depicting
brand distribution indicates a market dominated by a few key players, with brands like HP and
Lenovo occupying significant market shares. This concentration suggests that these brands
have successfully positioned themselves as reliable options for consumers, while niche brands
may struggle to gain visibility and sales.

Moreover, the box plots showcasing processor types reveal not only differences in average
prices but also the price variance within each processor category. High-performance processors
such as Intel i7 or AMD Ryzen 7 are associated with wider price ranges, indicating a diverse
offering of laptops that cater to both budget-conscious consumers and high-end users seeking
premium performance. This suggests that within each processor category, there are laptops
designed for various use cases, from casual browsing to demanding gaming and professional
applications.

Another insight arises from analysing the scatter plots comparing weight and battery life against
price. These visualizations reveal a trend where lighter laptops often come at a premium price,
emphasizing the growing consumer preference for portability without compromising on
performance. Additionally, laptops with extended battery life generally command higher prices,
reflecting the increasing demand for mobile devices that can sustain long working hours
without frequent recharging. This trend indicates a shift in consumer priorities towards usability
and convenience, driving manufacturers to innovate in lightweight designs and efficient battery
technologies.

Finally, the line chart highlighting the relationships between price and features such as RAM,
SSD size, and screen size emphasizes the importance of technological advancements in driving
laptop prices upward. As more consumers seek powerful, multitasking devices, manufacturers
are likely to continue integrating higher specifications, thereby increasing the overall price
point of laptops. This suggests a potential market trend where laptops with advanced features
become the norm, potentially leaving budget options as outliers in a rapidly evolving
technology landscape. Overall, these insights gleaned from the visual representations provide
a comprehensive understanding of the dynamics shaping the laptop market and guide both
consumers and manufacturers in making informed decisions.

CODING:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {
    'Price': [500, 800, 1200, 2000, 700, 1500, 1800, 3000],
    'RAM': [4, 8, 16, 32, 8, 16, 32, 64],  # in GB
    'SSD': [256, 512, 1024, 2048, 512, 1024, 2048, 4096],  # in GB
    'Screen_Size': [13, 14, 15, 17, 15, 14, 13, 17]  # in inches
}
df = pd.DataFrame(data)

sns.set(style="whitegrid")
fig, axs = plt.subplots(1, 3, figsize=(18, 5))

# Scatter plot with fitted regression line: price vs RAM
sns.regplot(x='RAM', y='Price', data=df, ax=axs[0], marker='o', color='blue')
axs[0].set_title('Price vs RAM Size')
axs[0].set_xlabel('RAM Size (GB)')
axs[0].set_ylabel('Price (USD)')

# Price vs SSD size
sns.regplot(x='SSD', y='Price', data=df, ax=axs[1], marker='s', color='green')
axs[1].set_title('Price vs SSD Size')
axs[1].set_xlabel('SSD Size (GB)')
axs[1].set_ylabel('Price (USD)')

# Price vs screen size
sns.regplot(x='Screen_Size', y='Price', data=df, ax=axs[2], marker='^', color='red')
axs[2].set_title('Price vs Screen Size')
axs[2].set_xlabel('Screen Size (inches)')
axs[2].set_ylabel('Price (USD)')

plt.tight_layout()
plt.show()

6.CONCLUSION AND FUTURE ENHANCEMENT

6.1 CONCLUSION:
In conclusion, analysing the laptop price dataset of Lenovo provides valuable insights
into the factors influencing laptop pricing. Variables such as brand, processor type, RAM,
storage capacity, and display size play significant roles in determining a laptop's price. By
understanding these factors, consumers can make more informed purchasing decisions, and
businesses can tailor their pricing strategies to meet market demands. Additionally, identifying
trends within the dataset can help predict future pricing models, enabling both buyers and
sellers to better navigate the evolving laptop market.

The analysis of the laptop price dataset of Lenovo highlights several key factors that directly
affect the pricing of laptops. Features such as processor type, RAM size, storage type (SSD or
HDD), and screen size emerge as major determinants of a laptop’s cost. Higher-end models
with powerful processors, more RAM, and SSD storage generally command higher prices.

Additionally, brand reputation and product features like display resolution, graphics card, and
battery life also contribute to variations in laptop pricing. Premium brands tend to price their
products higher, even for similar specifications, due to factors like design, build quality, and
customer service.

Understanding these factors can help consumers choose laptops that offer the best value for
money while enabling retailers to adjust their pricing strategies to meet market demand. In
summary, the dataset offers valuable insights for both buyers and sellers, providing a clearer
picture of how different features and brand positioning influence laptop prices.

6.2 FUTURE ENHANCEMENT:
For future enhancement of the analysis of the Lenovo laptop price dataset, several
improvements can be made to enrich analysis and insights. Incorporating additional variables
like GPU performance, battery capacity, and build materials could provide a more
comprehensive understanding of laptop pricing. Adding real-time market data, user reviews,
and demand trends could further enhance predictions of price fluctuations. Expanding the
dataset to include new and emerging brands, as well as more detailed warranty and after-sales
service information, would improve its relevance. Additionally, leveraging machine learning
models to predict future laptop prices based on historical trends and technological
advancements could offer more robust forecasting capabilities.

Future enhancements to the Lenovo laptop price dataset can greatly improve its
utility and depth of analysis. One key improvement would be to include additional variables
such as GPU specifications, battery life, and build materials, as these factors also influence the
price of laptops. Expanding the dataset with real-time pricing data, as well as incorporating
user reviews and satisfaction ratings, would offer a more comprehensive understanding of how
consumer preferences and market trends affect pricing.

Moreover, adding data on newer brands and emerging models will ensure the dataset stays
relevant in a rapidly evolving tech landscape. Machine learning models can be developed to
predict future laptop prices based on historical trends and advancements in technology,
providing valuable insights for consumers and retailers alike. Including region-specific pricing
data would also be beneficial, offering insights into how global supply chains and regional
markets influence pricing. These enhancements would make the dataset more robust and
valuable for future analysis and forecasting.

REFERENCES

WEB REFERENCES:

1. https://www.kaggle.com/datasets/mohidabdulrehman/laptop-price-dataset
2. https://colab.research.google.com/drive/1Rhd21aDg9OEX-u1LJiVus59X99rBscCb#scrollTo=kHowkfwowMij
