Numeric
ASSIGNMENT-2:
NUMERIC DATA TYPE:
1.MACHINE LEARNING MODEL:
LINEAR REGRESSION:
PROGRAM:
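import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
# Handle missing values by filling with the mean (or you could drop them)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']]  # Feature(s); assumed to match the other programs in this report
y = df['Price']  # Target
# 80/20 train/test split; random_state=42 is an assumed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
threshold = 0.10
accuracy = np.mean(np.abs(y_pred - y_test) <= threshold * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.show()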
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
6. Make Predictions:
The trained model is used to predict Price for the testing data. These predictions are then
compared to the actual prices in the test set to evaluate the model's performance.
7. Calculate R-squared (R²):
The R² score is calculated to measure the model's performance. R² is the proportion of the
variance in the actual prices that the model's predictions explain. A score of 1 indicates
perfect prediction, while 0 indicates that the model explains none of the variability in the
target variable (it does no better than always predicting the mean price).
8. Calculate Accuracy:
Accuracy is calculated by determining the percentage of predictions that are within a 10%
range of the actual prices. This gives a sense of how close the predictions are to the real
values.
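The two metrics in steps 7 and 8 can be sketched in plain Python as follows. This is a minimal illustration, not the program's own code: the function names and the price values are made up for the example, and r_squared uses the standard definition 1 - SS_res / SS_tot, which is the quantity scikit-learn's r2_score reports.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # Sum of squared prediction errors
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of the actual values from their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def within_threshold_accuracy(actual, predicted, threshold=0.10):
    """Return the percentage of predictions within `threshold` of the actual value."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) <= threshold * abs(a)
    )
    return 100 * hits / len(actual)


# Hypothetical prices, made up for illustration only
actual_prices = [10.0, 20.0, 30.0, 40.0]
predicted_prices = [10.5, 19.0, 36.0, 40.0]

r2 = r_squared(actual_prices, predicted_prices)
accuracy = within_threshold_accuracy(actual_prices, predicted_prices)
print(f"R-squared: {r2:.3f}")
print(f"Accuracy: {accuracy:.2f}%")
```

In this example three of the four predictions are within 10% of the actual price, so the accuracy is 75%, while the one large miss (36 against 30) accounts for most of the shortfall in R².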
Decision Trees:
PROGRAM:
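import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
# Handle missing values by filling with the mean (or you could drop them)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']]  # Feature(s)
y = df['Price']  # Target
# 80/20 train/test split; the seed is assumed to match the model's random_state
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeRegressor(random_state=42)
# Train the model
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
threshold = 0.10
accuracy = np.mean(np.abs(y_pred - y_test) <= threshold * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")
# Sort the test set by Quantity so the lines are drawn from left to right
sorted_indices = np.argsort(X_test['Quantity'].to_numpy())
X_test_sorted = X_test.iloc[sorted_indices]
y_test_sorted = y_test.iloc[sorted_indices]
y_pred_sorted = y_pred[sorted_indices]
plt.figure(figsize=(10, 6))
# Colors and line styles below are illustrative choices
plt.plot(X_test_sorted['Quantity'], y_test_sorted, color='blue', label='Actual Prices')
plt.plot(X_test_sorted['Quantity'], y_pred_sorted, color='red', linestyle='--', label='Predicted Prices')
plt.xlabel('Quantity')
plt.ylabel('Prices')
plt.legend()
plt.show()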
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
Load the Dataset: The dataset is loaded from a CSV file. This data likely includes
columns like 'Quantity' and 'Price,' among others.
Handle Missing Values: Any missing values in the 'Quantity' and 'Price' columns are
filled with the mean of those columns. This ensures that the dataset is complete and ready for
analysis.
Feature and Target Selection: The 'Quantity' column is selected as the feature (input)
and the 'Price' column as the target (output) for the regression model.
Split the Data: The dataset is split into training and testing sets. The training set is used to
train the model, while the testing set is used to evaluate its performance. Typically, 80% of
the data is used for training, and 20% is used for testing.
Model Creation: A Decision Tree Regressor is created. This type of model is used for
predicting continuous values, making it suitable for regression tasks like predicting prices.
Model Training: The Decision Tree model is trained using the training data (both
features and target). The model learns patterns in the data during this step.
Make Predictions: The trained model is then used to make predictions on the testing set.
These predictions are compared against the actual values to assess model performance.
Calculate R-squared: R-squared is a statistical measure that indicates how well the
model’s predictions match the actual data. A higher R-squared value suggests a better fit.
Calculate Accuracy: The accuracy is calculated by checking how many predictions are
within a certain percentage (10% in this case) of the actual values. The reported figure had
an additional 50 percentage points added to it, which overstates the share of predictions
within the threshold; the raw percentage is the meaningful measure.
Sorting for Visualization: The test set data is sorted based on the 'Quantity' feature. This
makes the plot easier to interpret, as it shows the relationship between the quantity and the
predicted prices in an orderly manner.
Plotting Results: The actual and predicted prices are plotted against the quantity. The plot
uses different colors and line styles to distinguish between the actual and predicted values,
making it easy to visualize the model’s performance.
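The sorting step described above can be sketched in plain Python. The values below are hypothetical and only illustrate the idea: compute the order of the quantities once, then apply that same order to the actual and predicted prices so that each point stays paired with its quantity.

```python
# Hypothetical test-set values, made up for illustration only
quantities = [5, 1, 3, 2]
actual_prices = [50.0, 12.0, 33.0, 20.0]
predicted_prices = [48.0, 10.0, 30.0, 21.0]

# Positions that would put `quantities` in ascending order
# (the same information that argsort returns)
sorted_indices = sorted(range(len(quantities)), key=lambda i: quantities[i])

# Apply the same ordering to every sequence so the points stay paired
quantities_sorted = [quantities[i] for i in sorted_indices]
actual_sorted = [actual_prices[i] for i in sorted_indices]
predicted_sorted = [predicted_prices[i] for i in sorted_indices]

print(sorted_indices)
print(quantities_sorted)
print(actual_sorted)
print(predicted_sorted)
```

Sorting each list independently would break the pairing between a quantity and its prices, which is why a single index order is reused for all three.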
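K-Nearest Neighbors (KNN):
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
# Handle missing values by filling with the mean (or you could drop them)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']]  # Feature(s)
y = df['Price']  # Target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 split; seed assumed
model = KNeighborsRegressor(n_neighbors=5)  # k = 5 nearest neighbors
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
accuracy = np.mean(np.abs(y_pred - y_test) <= 0.10 * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')  # Perfect-prediction diagonal
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.show()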
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
1. Load the Dataset:
o Start by loading the dataset into a DataFrame from a CSV file. The data likely
contains multiple features, including the quantity of items sold and the price.
2. Handle Missing Values:
o Missing values in the "Quantity" and "Price" columns are filled with the mean of
the respective column. This ensures that the dataset is complete and ready for
analysis.
3. Feature Selection:
o Identify the features (input variables) and the target variable (output). In this case,
"Quantity" is the feature, and "Price" is the target variable that you want to
predict.
4. Split the Data:
o Divide the data into training and testing sets. The training set is used to train the
model, while the testing set is used to evaluate the model's performance. Typically,
80% of the data is used for training and 20% for testing.
5. Create a Model:
o Use the K-Nearest Neighbors (KNN) algorithm to create a regression model. The
number of neighbors (k) is set to 5, which means the model will consider the 5
nearest points when making predictions.
6. Train the Model:
o Fit the KNN model to the training data. For KNN this mainly means storing the
"Quantity" and "Price" pairs of the training set so they can be searched at
prediction time.
7. Make Predictions:
o Use the trained model to predict the prices on the test set, which the model hasn't
seen before. The predictions are based on the quantities in the test set.
8. Calculate R-squared:
o Calculate the R-squared value, which measures how well the model's predictions
match the actual data. A higher R-squared value indicates a better fit.
9. Calculate Accuracy:
o Calculate the percentage of predictions that fall within 10% of the actual prices.
10. Visualize the Results:
o Create a scatter plot to visualize the relationship between the actual prices and the
predicted prices. A diagonal line is added to the plot to represent perfect
predictions, helping to assess how close the predictions are to the actual values.
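To make the role of k concrete, the prediction rule of a KNN regressor can be sketched in plain Python. This is a simplified illustration, assuming a single feature and an unweighted average of the neighbors' prices; the function name knn_predict and the data are hypothetical.

```python
def knn_predict(train_x, train_y, query, k=5):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Rank training points by absolute distance to the query (one feature only)
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    # Average the targets of the k closest points
    return sum(train_y[i] for i in nearest) / len(nearest)


# Hypothetical training data, made up for illustration only
train_quantity = [1, 2, 3, 4, 5, 6, 7, 8]
train_price = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]

middle_prediction = knn_predict(train_quantity, train_price, query=4, k=5)
edge_prediction = knn_predict(train_quantity, train_price, query=8, k=5)
print(middle_prediction)
print(edge_prediction)
```

For a query in the middle of the data the five neighbors surround it evenly, while at the edge all five neighbors lie on one side, so the averaged prediction is pulled toward the interior of the training data.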
PROGRAM:
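import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
# Handle missing values by filling with the mean (or you could drop them)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']].values # Feature(s)
y = df['Price'].values # Target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 split; seed assumed
# Fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = Sequential([
    Dense(64, activation='relu'),
    Dense(1)
])
model.compile(optimizer='adam', loss='mean_squared_error')
# epochs and validation_split must be set; their values are not given in this report
model.fit(X_train, y_train, epochs=epochs, validation_split=validation_split)
y_pred = model.predict(X_test).flatten()
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
accuracy = np.mean(np.abs(y_pred - y_test) <= 0.10 * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.show()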
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
1. Data Loading
The dataset is loaded from a CSV file into a DataFrame using pandas. This data
likely contains information about sales transactions.
2. Handling Missing Values
The dataset might have some missing values in certain columns, particularly in
Quantity and Price.
Missing values in these columns are filled with the mean of the respective column.
This ensures that no rows are discarded because of missing values.
3. Feature and Target Selection
The code assumes that the Price column is the target variable (the value to be
predicted).
The Quantity column is selected as the feature (the input variable used to make
predictions).
4. Data Splitting
The data is split into two sets: a training set and a testing set. The training set is used
to train the model, and the testing set is used to evaluate its performance.
Typically, 80% of the data is used for training, and 20% is reserved for testing.
5. Feature Scaling
The Quantity values are standardized with StandardScaler. The scaler is fitted on the
training set and then applied unchanged to the testing set.
6. Model Building
A Sequential neural network is defined. It includes a 64-unit Dense layer with ReLU
activation and a single-unit Dense output layer.
7. Model Compilation
The model is compiled with the adam optimizer and mean_squared_error as the loss
function.
The optimizer updates the weights of the neural network to minimize the loss function
during training.
8. Model Training
The model is trained on the training data for a specified number of epochs (iterations
over the entire training dataset).
A portion of the training data is also used as validation data to monitor the model's
performance on unseen data during training.
9. Making Predictions
After training, the model is used to make predictions on the test set.
10. Model Evaluation
The performance of the model is evaluated using the R-squared metric. This metric
indicates how well the model's predictions match the actual values.
An additional accuracy metric is calculated to determine the percentage of predictions
that are within 10% of the actual values.
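The scaling step in the program above fits the scaler on the training set and reuses it on the test set. The plain-Python sketch below illustrates that pattern under stated assumptions: the helper names and values are hypothetical, and the standard deviation is the population form (dividing by n), which is the convention StandardScaler uses.

```python
import math


def fit_standardizer(values):
    """Return the mean and population standard deviation of `values`."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def standardize(values, mean, std):
    """Scale `values` with a mean and standard deviation computed elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical quantities, made up for illustration only
train_quantity = [2.0, 4.0, 6.0, 8.0]
test_quantity = [5.0, 10.0]

# Statistics come from the training set only
mean, std = fit_standardizer(train_quantity)
train_scaled = standardize(train_quantity, mean, std)
test_scaled = standardize(test_quantity, mean, std)
print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction. The test values generally do not, because they are scaled with the training statistics; this keeps information about the test set out of the preprocessing.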
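PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']] # Feature(s)
y = df['Price'].values # Target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)  # 80/20 split; seed assumed
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
# 5 epochs, batch size 10, 10% of the training data held out for validation
history = model.fit(X_train, y_train, epochs=5, batch_size=10, validation_split=0.1)
y_pred = model.predict(X_test).flatten()
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
accuracy = np.mean(np.abs(y_pred - y_test) <= 0.10 * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()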
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
1. Data Loading:
The dataset is loaded from a CSV file into a DataFrame using the pandas library. The data
represents sales transactions, including features like quantity and price.
2. Handling Missing Values:
Missing values in the dataset are handled by filling them with the mean of the respective
columns. Specifically, if there are any missing values in the 'Quantity' or 'Price' columns, they
are replaced with the mean value of that column.
3. Feature and Target Selection:
The 'Quantity' column is selected as the feature (X), and the 'Price' column is chosen as the
target variable (y). The goal is therefore to predict the price from the quantity sold.
4. Feature Scaling:
The features are standardized using StandardScaler so that they have a mean of 0 and a
standard deviation of 1. This step helps to improve the performance and training stability of
the neural network.
5. Data Splitting:
The dataset is split into training and testing sets using an 80-20 split. The training set is used
to train the model, while the test set is used to evaluate its performance.
6. Model Building:
A Sequential neural network is built. It includes a 32-unit dense layer with ReLU activation
and a single-unit dense output layer with linear activation.
7. Model Compilation:
The model is compiled with the Adam optimizer and Mean Squared Error (MSE) as the loss
function. The optimizer updates the model weights during training, and the loss
function measures how far the model's predictions are from the actual prices.
8. Model Training:
The model is trained on the training data for 5 epochs with a batch size of 10. A portion of the
training data (10%) is used as validation data to monitor the model's performance during
training.
9. Making Predictions:
After training, the model makes predictions on the test set. These predictions are then
flattened to match the format of the actual prices.
10. Model Evaluation:
The model's performance is evaluated using the R-squared (R²) metric, which indicates how
well the model's predictions match the actual data. A higher R² value means better model
performance.
11. Accuracy Calculation:
Additionally, the accuracy is calculated based on a threshold. This accuracy measures how
many predictions fall within a certain percentage (e.g., 10%) of the actual values.
12. Plotting the Training History:
Finally, the training and validation losses are plotted over the epochs to visualize the model's
learning process. This plot helps in understanding whether the model is overfitting,
underfitting, or training effectively.
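The training settings described above (5 epochs, a batch size of 10, 10% of the training data held out for validation, MSE loss) can be illustrated with a small plain-Python loop. This is a deliberately simplified sketch and not the program's method: it uses plain mini-batch gradient descent on a one-weight linear model instead of Adam and a Dense network, and the data, learning rate, and function names are hypothetical. The validation samples are taken from the end of the data, which is how Keras's validation_split selects them.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w * x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, epochs=5, batch_size=10, validation_fraction=0.1, learning_rate=0.1):
    # Hold out the last fraction of the samples for validation
    n_val = max(1, int(len(xs) * validation_fraction))
    train_x, val_x = xs[:-n_val], xs[-n_val:]
    train_y, val_y = ys[:-n_val], ys[-n_val:]
    w, b = 0.0, 0.0
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        # One epoch is one full pass over the training samples, batch by batch
        for start in range(0, len(train_x), batch_size):
            bx = train_x[start:start + batch_size]
            by = train_y[start:start + batch_size]
            # Gradients of the batch MSE with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / len(bx)
            grad_b = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        # Record both losses once per epoch, as a training history would
        history["loss"].append(mse(train_x, train_y, w, b))
        history["val_loss"].append(mse(val_x, val_y, w, b))
    return w, b, history


# Hypothetical, already-scaled quantities in a deterministic mixed order,
# with prices generated from the line price = 3 * quantity + 2
xs = [((i * 37) % 100 - 49.5) / 29 for i in range(100)]
ys = [3 * x + 2 for x in xs]

w, b, history = train(xs, ys)
for epoch, (loss, val_loss) in enumerate(zip(history["loss"], history["val_loss"]), start=1):
    print(f"Epoch {epoch}: loss={loss:.6f} val_loss={val_loss:.6f}")
```

Plotting history["loss"] and history["val_loss"] against the epoch number gives the kind of curve described in the final step: both losses falling together suggests the model is still learning, while a validation loss that rises as the training loss keeps falling is a common sign of overfitting.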
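Convolutional Neural Network (CNN):
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
# file_path must point to the sales CSV (the path is not given in this report)
df = pd.read_csv(file_path)
df['Quantity'] = df['Quantity'].fillna(df['Quantity'].mean())
df['Price'] = df['Price'].fillna(df['Price'].mean())
X = df[['Quantity']].values # Feature(s)
y = df['Price'].values # Target
# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Reshape data to fit into CNN (e.g., treat data as 1x1 images)
X_reshaped = X_scaled.reshape(-1, 1, 1, 1)
X_train, X_test, y_train, y_test = train_test_split(X_reshaped, y, test_size=0.2, random_state=42)  # 80/20 split and seed assumed
model = Sequential([
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1)
])
model.compile(optimizer=Adam(), loss='mean_squared_error')
# Train for 5 epochs; validation_split must be set (its value is not given in this report)
model.fit(X_train, y_train, epochs=5, validation_split=validation_split)
y_pred = model.predict(X_test).flatten()
# Calculate R-squared
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
# Define a threshold for accuracy (e.g., predictions within 10% of the actual value)
accuracy = np.mean(np.abs(y_pred - y_test) <= 0.10 * np.abs(y_test)) * 100
print(f"Accuracy: {accuracy:.2f}%")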
OUTPUT:
STEP BY STEP PROCESS EXPLANATION:
Importing Libraries:
The code starts by importing essential libraries such as Pandas for data manipulation,
NumPy for numerical operations, Matplotlib for plotting (though not used directly
here), and several modules from TensorFlow and Scikit-learn for building and
evaluating a machine learning model.
Loading the Dataset:
A CSV file containing sales transaction data is loaded into a Pandas DataFrame. This
dataset includes columns such as 'Quantity' and 'Price'.
Handling Missing Values:
Missing values in the 'Quantity' and 'Price' columns are filled with the mean value of
their respective columns, so that there are no gaps in the data.
Feature and Target Selection:
The 'Quantity' column is selected as the feature, which will be used to predict the
target variable, 'Price'.
Data Standardization:
The 'Quantity' values are standardized to have a mean of 0 and a standard deviation of
1. This process helps the model to converge more quickly and can improve
performance.
Data Reshaping:
The standardized 'Quantity' data is reshaped into a 4-dimensional array to fit the input
requirements of a Convolutional Neural Network (CNN). Each 'Quantity' value is
treated as a tiny 1x1 image.
Splitting the Data:
The dataset is split into training and testing sets. The training set is used to train the
model, while the testing set is used to evaluate its performance.
Building the Model:
A Sequential model is defined. It includes a Flatten layer, a 64-unit Dense layer with
ReLU activation, and a single-unit Dense output layer.
Compiling the Model:
The model is compiled using the Adam optimizer and mean squared error as the loss
function, which is typical for regression tasks.
Training the Model:
The model is trained on the training data for 5 epochs. A portion of the training data is
used as a validation set to monitor the model's performance during training.
Making Predictions:
The trained model predicts prices for the test set, and the predictions are flattened into
a one-dimensional array.
Calculating R-squared:
The R-squared value is calculated to evaluate the goodness of fit of the model.
R-squared indicates how well the model's predictions match the actual values.
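The reshaping step can be pictured with nested Python lists. The sketch below is only an illustration with hypothetical values and helper names: it turns a flat list of n standardized quantities into an n x 1 x 1 x 1 structure and then flattens each sample again, which is what a Flatten layer does to each input.

```python
def reshape_to_1x1_images(values):
    """Turn a flat list of n values into an n x 1 x 1 x 1 nested list."""
    return [[[[v]]] for v in values]


def flatten_sample(sample):
    """Collapse one nested sample back into a flat list of its values."""
    if not isinstance(sample, list):
        return [sample]
    flat = []
    for item in sample:
        flat.extend(flatten_sample(item))
    return flat


def shape_of(nested):
    """Return the shape of a regularly nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# Hypothetical standardized quantities, made up for illustration only
scaled_quantities = [-1.2, 0.0, 0.4, 0.8]

images = reshape_to_1x1_images(scaled_quantities)
flattened = [flatten_sample(image) for image in images]
print(shape_of(images))
print(flattened)
```

Because each "image" holds a single value, flattening returns exactly one number per sample, so the layers after Flatten receive the same single feature that a plain dense network would.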