ML Internship Project Report 2024
ABOUT
in
Machine Learning
Submitted By
INTRODUCTION
YouTube advertisers pay content creators based on the ad views and clicks received by the goods or
services they are promoting. Ad views can also be estimated from other metrics such as likes and
comments. The problem at hand is therefore to predict ad views from these metrics, which calls for
training several regression models and selecting the one that predicts best. The data has to be
cleaned before it is fed into the algorithms in order to obtain better results.
Objective
To build a machine learning regression model that predicts the YouTube ad view count of a video
from its other YouTube metrics.
In classic terms, machine learning is a type of artificial intelligence that enables self-learning
from data and then applies that learning without the need for human intervention.
Linear Regression
Linear regression is a supervised machine learning algorithm whose predicted output is continuous
and has a constant slope. It is used to predict values within a continuous range (e.g. sales, price)
rather than to classify them into categories (e.g. cat, dog). There are two main types:
1. Simple regression (one input feature)
2. Multiple regression (two or more input features)
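Simple regression fits a straight line through one input feature by ordinary least squares; multiple regression applies the same least-squares idea to several features. The pure-Python sketch below covers only the simple case, using the closed-form slope (covariance of x and y over the variance of x). The function names and the toy numbers are illustrative assumptions, not taken from the project's data.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical toy data: ad views rising linearly with likes.
likes = [10, 20, 30, 40]
adviews = [25, 45, 65, 85]
slope, intercept = fit_simple_linear_regression(likes, adviews)
print(slope, intercept)               # 2.0 5.0
print(predict(slope, intercept, 50))  # 105.0
```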
Support Vector Machine
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both
classification and regression, although it is mostly used for classification. In the SVM algorithm,
each data item is plotted as a point in n-dimensional space (where n is the number of features),
with the value of each feature being the value of a particular coordinate. The regression variant,
used in this project, is called a Support Vector Regressor (SVR).
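In regression form, an SVM tries to keep as many training points as possible inside a tube of half-width epsilon around its prediction and penalises only the points that fall outside it. The sketch below shows just this epsilon-insensitive loss, not a full SVR solver (which also adds a penalty on the size of the weights and is normally solved by a library). The function name and the epsilon value are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0):
    """Mean epsilon-insensitive loss, the error measure behind support vector regression.

    Errors no larger than epsilon cost nothing; larger errors cost only the amount
    by which they exceed epsilon (a linear, not squared, penalty).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions: the first error (0.5) lies inside the tube.
print(epsilon_insensitive_loss([10, 20, 30], [10.5, 23, 26], epsilon=1.0))  # (0 + 2 + 3) / 3
```

Because small errors are ignored entirely, the fitted SVR depends only on the points on or outside the tube (the support vectors).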
Decision Tree
Decision tree analysis involves building a tree-shaped diagram to chart out a course of action or a
statistical probability analysis. It breaks a complex problem down into branches, and each branch of
the decision tree represents a possible outcome.
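For regression, a decision tree splits the data on feature thresholds and predicts, at each leaf, the mean target of the training rows that reach it. The sketch below grows such a tree on a single feature by choosing, at every node, the threshold that most reduces the sum of squared errors. The function names, stopping rules (`max_depth`, `min_samples`) and toy data are illustrative assumptions, not the project's implementation.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean: the impurity used for regression."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=2, min_samples=2):
    """Recursively split 1-D data at the threshold that minimises left SSE + right SSE."""
    # Return a leaf holding the mean target when the node is deep, small or unsplittable.
    if depth >= max_depth or len(ys) < min_samples or len(set(xs)) == 1:
        return {"leaf": mean(ys)}
    best = None
    # Excluding the largest value guarantees both sides of every split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold)
    _, threshold = best
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree(*zip(*left_pairs), depth + 1, max_depth, min_samples),
        "right": build_tree(*zip(*right_pairs), depth + 1, max_depth, min_samples),
    }


def predict_tree(tree, x):
    """Follow the branches until a leaf is reached and return its mean."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical data with a jump between x = 3 and x = 4.
tree = build_tree([1, 2, 3, 4, 5, 6], [5, 5, 5, 20, 20, 20], max_depth=2)
print(tree["threshold"], predict_tree(tree, 2), predict_tree(tree, 5))  # 3 5.0 20.0
```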
Artificial Neural Network
An artificial neural network (ANN) is a computing system designed to simulate the way the human
brain analyzes and processes information. It is one of the foundations of modern artificial
intelligence (AI) and can solve problems that are difficult for humans or for classical statistical
methods. ANNs learn from data, so their results tend to improve as more data becomes available.
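Concretely, a feed-forward network passes its inputs through layers of weighted sums, each followed by a non-linear activation. The pure-Python sketch below performs one forward pass of a tiny regression network (2 inputs, 3 ReLU hidden units, 1 linear output). The weights are arbitrary placeholders; learning them by backpropagation, which a library such as Keras handles, is not shown.

```python
def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = activation(sum_i inputs[i] * weights[i][j] + biases[j])."""
    outputs = []
    for j, b in enumerate(biases):
        z = sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(x, params):
    """Hidden ReLU layer followed by a single linear output unit (regression)."""
    hidden = dense(x, params["w1"], params["b1"], relu)
    return dense(hidden, params["w2"], params["b2"])[0]


# Hypothetical weights: w1 is 2 inputs x 3 hidden units, w2 is 3 hidden units x 1 output.
params = {
    "w1": [[1.0, -1.0, 0.5], [0.5, 2.0, -1.0]],
    "b1": [0.0, 0.5, 0.0],
    "w2": [[1.0], [1.0], [2.0]],
    "b2": [0.1],
}
print(forward([2.0, 1.0], params))  # hidden = [2.5, 0.5, 0.0], output = 3.1
```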
Data Description
The file train.csv contains metrics and other details for about 15,000 YouTube videos. The metrics
include the number of views, likes, dislikes and comments; the published date, duration and
category are also included. The train.csv file also contains the number of ad views, which is our
target variable for prediction.
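Before a regression model can use the duration field it has to be numeric. Assuming, hypothetically, that durations are stored as ISO 8601 time spans such as "PT7M37S" (the report does not state the format), a small regex-based converter could look like this. Day components ("P1DT...") are deliberately not handled in this sketch.

```python
import re

# Assumption: durations look like ISO 8601 time spans, e.g. "PT1H2M3S", "PT7M37S", "PT45S".
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_seconds(text):
    """Convert an hours/minutes/seconds ISO 8601 duration string to a whole number of seconds."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # Missing components (e.g. no hours) come back as None and count as zero.
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(duration_to_seconds("PT7M37S"))   # 457
print(duration_to_seconds("PT1H2M3S"))  # 3723
```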
2. Visualise the dataset using heatmaps and other plots. You can also study the distribution of
each attribute.
5. Normalise your data and split it into training, validation and test sets in an appropriate
ratio.
6. Train linear regression and Support Vector Regressor models and compute their errors.
8. Build an artificial neural network and train it with different numbers of layers and
hyperparameters. Experiment a little. Use Keras.
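For step 2 above, a correlation heatmap colours each cell by the correlation between two attributes. The sketch below computes those Pearson coefficients in pure Python; in practice pandas' `DataFrame.corr()` (Pearson by default) together with a plotting function such as seaborn's `heatmap` would typically be used. The column names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (undefined if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named columns: the values a heatmap colours."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy columns.
data = {
    "views": [100, 200, 300, 400],
    "likes": [10, 20, 30, 40],
    "dislikes": [4, 3, 2, 1],
}
matrix = correlation_matrix(data)
print(matrix[("views", "likes")])     # 1.0
print(matrix[("views", "dislikes")])  # -1.0
```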
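For step 5 above, one possible sketch is shown below. The 70/15/15 ratio, the seed and the choice of min-max scaling are assumptions, since the report does not specify them. It splits first and then fits the scaling bounds on the training rows only, so that no information from the validation or test rows leaks into training; as a consequence, scaled validation and test values can fall slightly outside [0, 1].

```python
import random


def train_val_test_split(rows, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle a copy of rows and cut it into train/validation/test lists by the given ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit_min_max(train_rows):
    """Per-column (min, max) computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_min_max(rows, bounds):
    """Scale each column with (value - min) / (max - min); constant columns map to 0.0."""
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0 for value, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Hypothetical feature rows: [views, likes, comments].
rows = [[i * 100, i * 10, i] for i in range(1, 21)]
train, val, test = train_val_test_split(rows)
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
val_scaled = apply_min_max(val, bounds)
print(len(train), len(val), len(test))  # 14 3 3
```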
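For step 6 above, the errors can be computed as follows. Root mean squared error is the metric reported in the results; mean absolute error is included only as a common companion metric and is not part of the original results. The toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ad-view targets and model predictions (errors: -10, 10, -30, 20).
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]
print(rmse(actual, predicted))  # sqrt(1500 / 4), about 19.36
print(mae(actual, predicted))   # 17.5
```

RMSE squares each error before averaging, so it punishes a few large misses more heavily than MAE does.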
Visualization:
Root Mean Squared Error (one value per model; the model names were not recoverable from the source):
28907.84 | 25385.70 | 35018.37 | 28907.84 | 28801.96
Best Model
After training all the algorithms on the training dataset, we found that the Random Forest
Regressor had a lower root mean squared error than the other algorithms. A lower root mean squared
error means that the predictions are, on average, closer to the true values, so we use the Random
Forest model to predict the test dataset.
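The selection rule described above (keep the model with the lowest RMSE) can be sketched as follows. Scoring on held-out validation predictions rather than on the training data is an assumption made here to avoid rewarding overfitting; the model names and numbers are hypothetical and are not the project's results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of one model's predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_best_model(y_val, predictions_by_model):
    """Return (name, rmse) of the model whose validation predictions have the lowest RMSE."""
    scores = {name: rmse(y_val, preds) for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical validation targets and per-model predictions.
y_val = [100, 200, 300]
predictions = {
    "linear_regression": [120, 180, 330],
    "decision_tree": [90, 210, 310],
    "random_forest": [105, 195, 298],
}
print(select_best_model(y_val, predictions))  # ('random_forest', sqrt(18), about 4.24)
```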
Conclusions
We had many different ideas for the project, but may originally have been too ambitious for our
goals. Our original aim was to predict how many views an advertisement receives, that is, its ad
view count. We were hoping that … Some further things that we could have tried, had we had more
time, include …