PA DA2 - Merged
BCSE334L-Predictive Analytics
FALL SEM 2024-2025
Slot: E1+TE1
Submitted by-
Arnav Bahuguna
Reg - 21BCE3795
Q. Develop a comprehensive prediction model based on four machine learning techniques using a
real-time dataset of your choice. Your task includes the following components:
1. Modeling: Develop prediction models using some machine learning techniques of your choice.
(min 4 techniques)
2. Model Tuning: Discuss the tuning methods applied to optimize each regression model.
3. Model Validation: Validate the performance of your models using appropriate metrics. This should
include:
a. Split of data into training and testing sets.
b. Calculation of performance metrics such as Mean Absolute Error (MAE), Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), and R-squared (R²) score.
c. Comparison of the performance of different models and selection of the best-performing
model.
4. Report: Compile a detailed report summarizing the entire process. Your report should include:
a. Introduction and objective of the prediction model.
b. Comprehensive details of the data preprocessing, modeling, tuning, and validation steps.
c. Interpretation of the results and insights gained from the models.
d. Conclusion and any potential future work or improvements.
Ensure that your assignment is well-structured, clearly written, and demonstrates a deep
understanding of regression techniques and their application to real-time datasets. Use high-quality
English and support your explanations with relevant references and citations where appropriate.
1. Introduction
In this project, we aim to develop a comprehensive prediction model using four machine learning
techniques on a real-time dataset. The dataset used is a housing dataset loaded from a CSV file
named Housing.csv. This dataset contains various features relevant to house pricing, such as the
number of bedrooms, bathrooms, square footage of living space, and more. The dataset is sourced
from Kaggle and is used for regression modeling to predict house prices. The objective is to
predict the target variable (house price, a continuous variable) by training and optimizing
several regression models.
The models evaluated include Ridge Regression, Decision Tree Regressor, Random Forest Regressor,
and Support Vector Regressor (SVR). The key goals of this report are:
• To implement and optimize at least four different machine learning models.
• To compare the performance of the models based on various metrics such as MAE, MSE,
RMSE, and R².
• To identify the best-performing model and discuss any potential improvements for future work.
2. Data Preprocessing
The dataset used for this project consists of features (independent variables) and a target variable
(price) to be predicted. Data preprocessing steps included:
1. Handling Missing Values: The dataset was checked for missing values; none were found (see df.isnull().sum() in the notebook), so no rows or columns needed to be dropped or imputed.
2. Scaling: We applied StandardScaler to standardize the features to have a mean of 0 and a
standard deviation of 1.
3. Train-Test Split: The dataset was split into 80% training and 20% testing sets to evaluate the
model performance on unseen data.
All the preprocessing steps were completed during the previous iteration of the project and are
therefore not described in detail in this document. The features were also standardized, which puts
extreme values on a comparable scale across columns; a brief sketch of the pipeline is given below.
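A minimal sketch of the preprocessing pipeline summarized above (the 80/20 split follows the report; the random_state value and the exact columns dropped are assumptions):

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv('Housing.csv')
df = df.dropna()                              # no missing values in this dataset; kept as a safeguard

X = df.drop(['price', 'id', 'date'], axis=1)  # numeric feature columns (assumed selection)
y = df['price']                               # target: house price

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

# 80% training / 20% testing split (random_state fixed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)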
3. Model Tuning (Regression)
Decision Tree Regressor
Impact of Tuning:
• The best combination found was max_depth = 10 and min_samples_split = 20, allowing the
tree to maintain a good balance between complexity and generalization.
Error Metrics | Default Hyperparameters | Tuned Hyperparameters
Mean Absolute Error (MAE) | 122431.539019124 | 110636.27198458016
Mean Squared Error (MSE) | 50792182369.69726 | 39237127498.66877
Root Mean Squared Error (RMSE) | 225371.21016158487 | 198083.63763488585
R Squared (R²) Score | 0.6421405678374152 | 0.6568032082982447
Random Forest Regressor
Impact of Tuning:
• The best combination was n_estimators = 200 and max_depth = 20, providing enough trees to
reduce variance and a controlled depth to prevent overfitting.
Error Metrics | Default Hyperparameters | Tuned Hyperparameters
Mean Absolute Error (MAE) | 86510.18535006807 | 86224.04226690672
Mean Squared Error (MSE) | 25090348717.306355 | 25156235320.69841
Root Mean Squared Error (RMSE) | 158399.3330709014 | 158607.1729799709
R Squared (R²) Score | 0.7740575447454721 | 0.7707744616996358
• Note that the tuned scores are nearly identical to the defaults (MAE improves marginally while
MSE, RMSE, and R² are marginally worse), indicating the default Random Forest configuration
was already close to optimal for this data.
Support Vector Regressor (SVR)
Impact of Tuning:
• The best combination found was C = 10 and kernel = 'linear'. The larger C relaxes the
regularization, and replacing the default RBF kernel with a linear one gave a much better
balance between bias and variance on this dataset, as the error metrics below show.
Error Metrics | Default Hyperparameters | Tuned Hyperparameters
Mean Absolute Error (MAE) | 222343.80394267367 | 137747.75623270954
Mean Squared Error (MSE) | 148191154279.18042 | 72231554707.12172
Root Mean Squared Error (RMSE) | 384956.04200892913 | 268759.28766671807
R Squared (R²) Score | -289701.7426570366 | -1.812122075633066
• The negative R² values indicate that the SVR predictions fit the test data worse than simply
predicting the mean price. Since R² = 1 - SS_res / SS_tot, it drops below 0 (outside the usual
[0, 1] range) whenever the residual sum of squares exceeds the total sum of squares, which is
what happens here for SVR.
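A toy example (hypothetical numbers, for illustration only) showing how R² becomes negative when predictions are worse than the mean baseline:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([400.0, 500.0, 600.0])   # every prediction is 300 too high
print(r2_score(y_true, y_pred))            # -12.5: far worse than predicting the mean (200)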
8. Huber Regression
• Purpose:
o Huber Regression is a robust regression method that combines the strengths of both L1
and L2 loss functions. It is especially useful when dealing with datasets containing
outliers.
o It minimizes squared errors for smaller residuals (like MSE) but uses absolute errors for
larger residuals (like MAE), providing resilience to outliers.
• Effect:
o Small residuals are treated with an MSE approach, encouraging the model to fit closely
to most data points.
o Large residuals are treated with MAE, reducing the influence of outliers on the model,
which minimizes their effect on the overall regression line.
o The Huber threshold parameter (delta; exposed as epsilon in scikit-learn) controls the
cut-off between the MSE and MAE regimes. A lower threshold treats more residuals as
outliers and makes the model more robust to them, while a higher threshold behaves
more like ordinary least squares (see the sketch after this list).
o Overall, Huber Regression provides a balance between robustness and accuracy,
making it ideal for datasets with a few extreme outliers.
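A minimal sketch of Huber Regression fitted on the train/test split from Section 2 (the epsilon and alpha values shown are illustrative defaults, not the settings used in the attached notebook):

from sklearn.linear_model import HuberRegressor

# epsilon is scikit-learn's name for the Huber threshold (delta above);
# smaller epsilon -> more residuals handled by the robust (absolute-error) branch.
huber = HuberRegressor(epsilon=1.35, alpha=0.0001, max_iter=1000)
huber.fit(X_train, y_train)
huber_preds = huber.predict(X_test)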
4. Modeling (Classification)
Now, the ‘price’ column has been categorized into four classes (Low, Medium, High, and Very High)
using quartiles. This splits the continuous price values into four roughly equal-sized groups based
on the distribution of the data, as sketched below.
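A minimal sketch of the quartile binning described above (the column name price_class and the use of pd.qcut are assumptions; the attached notebook stores the result in a dataframe called dfc):

import pandas as pd

# Bin the continuous price into four quartile-based classes.
dfc = df.copy()
dfc['price_class'] = pd.qcut(dfc['price'], q=4,
                             labels=['Low', 'Medium', 'High', 'Very High'])
dfc = dfc.drop('price', axis=1)   # the class label replaces the raw price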
1. Logistic Regression:
• Purpose: Logistic Regression is a simple and interpretable model that is commonly used for
binary and multiclass classification tasks. It models the probability that an instance belongs to
a particular class using a logistic (sigmoid) function, making it suitable for linearly separable
data.
• Effect: Logistic Regression tends to perform well when the relationship between the features
and the target class is linear. However, in more complex datasets with non-linear interactions
(such as this housing dataset, where price categories depend on both continuous and
categorical-like features), Logistic Regression can struggle to capture these complexities. It
relies heavily on well-separated classes, and in this case, its accuracy was lower than more
complex models like Random Forest.
5. Model Validation
To evaluate the performance of the models, we used the following metrics:
• Mean Absolute Error (MAE): The average of the absolute differences between predicted and
actual values. It measures how far the predictions are from the actual values on average.
Lower MAE indicates better model performance, with predictions closer to actual values.
• Mean Squared Error (MSE): The average of the squared differences between predicted and
actual values. It gives a larger penalty to larger errors because it squares the errors. Lower MSE
means better performance, but it is more sensitive to outliers than MAE.
• Root Mean Squared Error (RMSE): The square root of MSE, bringing the error back to the
original units of the target variable, making it more interpretable. Like MSE, a lower RMSE
indicates better performance, and it penalizes larger errors more heavily than MAE.
• R² Score: The proportion of the variance in the dependent variable that is predictable from the
independent variables. It indicates how well the model fits the data. R² typically ranges from 0 to
1, with values closer to 1 indicating a better fit (1 is a perfect fit); it can be negative when a model
fits the test data worse than simply predicting the mean, as seen for SVR above.
• Accuracy: Accuracy is one of the simplest and most commonly used metrics for evaluating
classification models. It measures the proportion of correctly classified instances out of the
total instances in the dataset.
• Classification Report: The classification report presents the main classification metrics on a
per-class basis. This gives deeper insight into classifier behavior than global accuracy alone,
which can mask weaknesses in an individual class of a multiclass problem. It includes
Precision, Recall, F1-Score, and Support.
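A minimal sketch of how these metrics can be computed with scikit-learn (variable names follow the attached notebook; y_test holds the regression targets in the first block and the class labels in the second, matching the two phases of the notebook):

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, accuracy_score, classification_report)

# Regression metrics for the tuned Random Forest predictions
mae = mean_absolute_error(y_test, y_pred_rf)
mse = mean_squared_error(y_test, y_pred_rf)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_rf)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.4f}")

# Classification metrics for the Random Forest classifier predictions
print("Accuracy:", round(accuracy_score(y_test, rf_preds) * 100, 2))
print(classification_report(y_test, rf_preds))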
Link to ipynb
The .ipynb notebook is also attached below for reference.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Housing.csv')
df.head()
[5 rows x 21 columns]
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 21613 non-null int64
1 date 21613 non-null object
2 price 21613 non-null float64
3 bedrooms 21613 non-null int64
4 bathrooms 21613 non-null float64
5 sqft_living 21613 non-null int64
6 sqft_lot 21613 non-null int64
7 floors 21613 non-null float64
8 waterfront 21613 non-null int64
9 view 21613 non-null int64
10 condition 21613 non-null int64
11 grade 21613 non-null int64
12 sqft_above 21613 non-null int64
13 sqft_basement 21613 non-null int64
14 yr_built 21613 non-null int64
15 yr_renovated 21613 non-null int64
16 zipcode 21613 non-null int64
17 lat 21613 non-null float64
18 long 21613 non-null float64
19 sqft_living15 21613 non-null int64
20 sqft_lot15 21613 non-null int64
dtypes: float64(5), int64(15), object(1)
memory usage: 3.5+ MB
df.isnull().sum()
id 0
price 0
bedrooms 0
bathrooms 0
sqft_living 0
sqft_lot 0
floors 0
waterfront 0
view 0
condition 0
grade 0
sqft_above 0
sqft_basement 0
yr_built 0
yr_renovated 0
zipcode 0
lat 0
long 0
sqft_living15 0
sqft_lot15 0
dtype: int64
df[df.columns].plot(kind='box', figsize=(20,10))
<Axes: >
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Standardize all numeric feature columns to zero mean and unit variance
ftransform = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors',
              'waterfront', 'view', 'condition', 'grade', 'sqft_above',
              'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode',
              'lat', 'long', 'sqft_living15', 'sqft_lot15']
df[ftransform] = scaler.fit_transform(df[ftransform])
df[df.columns].plot(kind='box', figsize=(20,10))
<Axes: >
# Drop the target and the non-feature columns (id and the raw date string) before PCA
features = df.drop(['price', 'id', 'date'], axis=1)
y = df['price']
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)          # keep components explaining 95% of the variance
pca_features = pca.fit_transform(features)
# 80/20 train-test split described in Section 2 (random_state assumed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    pca_features, y, test_size=0.2, random_state=42)
Linear Regression
from sklearn.linear_model import LinearRegression, Ridge
lin_model = LinearRegression()
lin_model.fit(X_train, y_train)
LinearRegression()
lin_pred = lin_model.predict(X_test)
rdg_model = Ridge()
rdg_model.fit(X_train, y_train)
Ridge()
rdg_preds = rdg_model.predict(X_test)
from sklearn.tree import DecisionTreeRegressor
dcst_model = DecisionTreeRegressor()
dcst_model.fit(X_train, y_train)
DecisionTreeRegressor()
dcst_preds = dcst_model.predict(X_test)
from sklearn.svm import SVR
svr_model = SVR()
svr_model.fit(X_train,y_train)
svr_preds = svr_model.predict(X_test)
MODEL TUNING
from sklearn.model_selection import GridSearchCV
grid_rdg_model = GridSearchCV(Ridge(), param_grid={'alpha': [0.01, 0.1, 1, 10, 100]},
                              cv=3, scoring='neg_mean_squared_error')
grid_rdg_model.fit(X_train, y_train)
GridSearchCV(cv=3, estimator=Ridge(),
             param_grid={'alpha': [0.01, 0.1, 1, 10, 100]},
             scoring='neg_mean_squared_error')
best_ridge = grid_rdg_model.best_estimator_
grid_rdg_preds = best_ridge.predict(X_test)
param_grid_dt = {
'max_depth': [5, 10, 20, None],
'min_samples_split': [2, 10, 20]
}
grid_dcst = GridSearchCV(dcst_model, param_grid_dt, cv=3,
scoring='neg_mean_squared_error')
grid_dcst.fit(X_train, y_train)
GridSearchCV(cv=3, estimator=DecisionTreeRegressor(),
param_grid={'max_depth': [5, 10, 20, None],
'min_samples_split': [2, 10, 20]},
scoring='neg_mean_squared_error')
best_dcst = grid_dcst.best_estimator_
y_pred_dcst = best_dcst.predict(X_test)
param_grid_rf = {
'n_estimators': [50, 100, 200],
'max_depth': [10, 20, None]
}
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(random_state=42)
grid_search_rf = GridSearchCV(rf, param_grid_rf, cv=3,
scoring='neg_mean_squared_error')
grid_search_rf.fit(X_train, y_train)
GridSearchCV(cv=3, estimator=RandomForestRegressor(random_state=42),
param_grid={'max_depth': [10, 20, None],
'n_estimators': [50, 100, 200]},
scoring='neg_mean_squared_error')
best_rf = grid_search_rf.best_estimator_
y_pred_rf = best_rf.predict(X_test)
param_grid_svr = {
'C': [0.1, 1, 10],
'kernel': ['linear', 'rbf']
}
svr = SVR()
grid_search_svr = GridSearchCV(svr, param_grid_svr, cv=3,
scoring='neg_mean_squared_error')
grid_search_svr.fit(X_train, y_train)
GridSearchCV(cv=3, estimator=SVR(),
param_grid={'C': [0.1, 1, 10], 'kernel': ['linear',
'rbf']},
scoring='neg_mean_squared_error')
best_svr = grid_search_svr.best_estimator_
y_pred_svr = best_svr.predict(X_test)
from sklearn.neighbors import KNeighborsRegressor
knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
KNeighborsRegressor()
y_pred_knn = knn_reg.predict(X_test)
from sklearn.linear_model import ElasticNet
elastic_net_reg = ElasticNet()
elastic_net_reg.fit(X_train, y_train)
ElasticNet()
y_pred_elastic_net = elastic_net_reg.predict(X_test)
from sklearn.linear_model import BayesianRidge
bayesian_ridge_reg = BayesianRidge()
bayesian_ridge_reg.fit(X_train, y_train)
BayesianRidge()
y_pred_bayesian_ridge = bayesian_ridge_reg.predict(X_test)
from sklearn.linear_model import HuberRegressor
huber_reg = HuberRegressor()
huber_reg.fit(X_train, y_train)
HuberRegressor()
y_pred_huber = huber_reg.predict(X_test)
df.head()
# dfc: copy of df in which 'price' has been replaced by the quartile-based class
# labels described in Section 4 (its construction is omitted in this extract).
dfc.head()
# Classification phase: re-run PCA on the scaled features and split again, this time
# with the quartile class labels as the target (column name and split parameters assumed).
pca = PCA(n_components=0.95)
pca_features = pca.fit_transform(features)
X_train, X_test, y_train, y_test = train_test_split(
    pca_features, dfc['price_class'], test_size=0.2, random_state=42)
from sklearn.linear_model import LogisticRegression
log_c = LogisticRegression(max_iter=1000)
log_c.fit(X_train, y_train)
log_preds = log_c.predict(X_test)
print("Accuracy: ", round(accuracy_score(y_test, log_preds)*100, 2))
print("Classification Report: ", classification_report(y_test,
log_preds))
Accuracy: 64.57
Classification Report: precision recall f1-score support
from sklearn.ensemble import RandomForestClassifier
rf_c = RandomForestClassifier()
rf_c.fit(X_train, y_train)
rf_preds = rf_c.predict(X_test)
print("Accuracy: ", round(accuracy_score(y_test, rf_preds)*100, 2))
print("Classification Report: ", classification_report(y_test, rf_preds))
Accuracy: 72.62
Classification Report: precision recall f1-score support
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, y_train)
svc_pred = svc.predict(X_test)
print("Accuracy: ", round(accuracy_score(y_test, svc_pred)*100, 2))
print("Classification Report: ", classification_report(y_test,
svc_pred))
Accuracy: 72.39
Classification Report: precision recall f1-score support
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)
print("Accuracy: ", round(accuracy_score(y_test, knn_pred)*100, 2))
print("Classification Report: ", classification_report(y_test, knn_pred))
Accuracy: 70.19
Classification Report: precision recall f1-score support