
INTERNSHIP REPORT ON

“MACHINE LEARNING INTERN,
INDIAN SERVERS, ANDHRA PRADESH”
SUBMITTED BY
DINESH KUMAR.P [622121202301]
B.TECH INFORMATION TECHNOLOGY
PAAVAI ENGINEERING COLLEGE
NAMAKKAL – 607318

UNDER THE GUIDANCE OF


SAI SATHISH.D
DECLARATION

I hereby declare that the internship training entitled “MACHINE LEARNING”,
submitted to the Department of INFORMATION TECHNOLOGY in PAAVAI ENGINEERING
COLLEGE, NAMAKKAL for the industrial internship in the degree of BACHELOR
OF TECHNOLOGY, is a record of original work done by me under the
guidance of ANITHA.P

PLACE: Namakkal

DATE:

SIGNATURE OF HOD SIGNATURE OF STUDENT


DAY 01 TO DAY 05

INTRODUCTION

Operation of the Organization


The race for digital transformation is on. In this globally connected on-demand
world with rapid advancements in internet technologies, businesses worldwide are
under constant pressure to add innovative real-time capabilities to their applications
to respond to market opportunities. Every business worldwide is building event-
driven, real-time applications - from financial services, transportation, and energy,
to retail, healthcare, and Gaming companies. Our endeavor is to make it easy to
develop innovative real-time applications and efficient to operate them in
production. We have a proven record of building highly scalable, world-class
consulting processes that offer tremendous business advantages to our clients in the
form of huge cost benefits, definitive results and consistent project deliveries across
the globe. We strive to improve your business by delivering the full range of
competencies, including improving operational performance, developing and applying
business strategies to improve financial reports, and defining strategic goals and
then measuring and managing them.

Major Milestones
Skills have become the global currency of the 21st century. In a world where
competition for jobs, pay increases, and academic success continues to increase,
certifications offer hope because they are a credible, third-party assessment of one’s
skill and knowledge for a given subject. Some of the key benefits achieved by the
students by certification are Validation of knowledge, Increased marketability,
Increased earning power, Enhanced academic performance, Improved reputation,
Enhanced credibility, Increased confidence, Respect from peers. With Knowledge
Solution India’s certification, students have improved their academic performance:
the grade point average of certified college students rose from 6.9 to 7.8, their
graduation rate rose from 78.4% to 94.5%, and dropout rates were reduced to
between 0.2% and 1.0%.
DAY 06 TO DAY 10

Data set
This section briefly describes the data used for the research. Data from multiple
sources was used in this project. Most of the data was extracted from the public
website Yocket (Yocket.com); data on rankings, fees and enrolment in colleges was
obtained from The Mentors Circle, a leading educational consultancy firm in India.
Data from both sources was integrated to form a staging data-set. For predicting the
chance of a student getting shortlisted in universities, the final data-set was
divided into multiple data-sets, each representing a particular university. For
predicting the list of universities suitable for students based on their profile, the
staging data-set was updated to contain only the records of students who had
successfully secured admission in the universities. The table below shows the
different features of the data-sets.

Dataset extraction and transformation


Data related to college rankings was collected in .csv format, and the data related
to students’ profiles was extracted into .csv files using the data extraction tool
provided by Mozenda (n.d.). Because the data came from a public portal, many records
had missing or irrelevant values; data cleaning was performed in Microsoft Excel by
deleting the records with unwanted or missing values. Unwanted columns were removed
from the data-set. Once the data-set was cleaned, the data was transformed to be
suitable for the model. The original data-set used the TOEFL score to represent
language proficiency, so that the language score had a consistent metric. Similarly,
the undergraduate scores of the students were recorded either as a percentage or as a
CGPA; all percentage records were converted to CGPA by dividing the percentage score
by 9.5.
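
The percentage-to-CGPA step can be sketched as follows. This is a minimal illustration, not the procedure actually run during the internship (which used Microsoft Excel): the function name `percentage_to_cgpa`, the cap at 10, and the sample scores are all hypothetical, and the divisor assumes the common 10-point convention percentage = CGPA × 9.5.

```python
def percentage_to_cgpa(percentage: float, factor: float = 9.5) -> float:
    """Convert a percentage to a 10-point CGPA, assuming percentage = CGPA * factor."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: cap at 10, because 100 / 9.5 slightly exceeds the 10-point scale.
    return min(round(percentage / factor, 2), 10.0)


# Hypothetical sample scores, not taken from the report's data-set.
scores = [76.0, 85.5, 95.0]
cgpas = [percentage_to_cgpa(s) for s in scores]
```

With these sample values the converted scores are 8.0, 9.0 and 10.0, which puts percentage and CGPA records on the same scale.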
DAY 11 TO DAY 15

Algorithms

➢ Linear Regression: Linear Regression is a machine learning algorithm
based on supervised learning. It performs a regression task. Regression
models a target prediction value based on independent variables. It is
mostly used for finding out the relationship between variables and for
forecasting. Regression models differ in the kind of relationship they
assume between the dependent and independent variables, and in the
number of independent variables used.

➢ Artificial Neural Networks: ANNs are intended to simulate the behavior of
biological systems composed of “neurons”. They are computational
models inspired by an animal’s central nervous system and are capable of
machine learning as well as pattern recognition. They are presented as
systems of interconnected “neurons” which can compute values from
inputs. A neural network is an oriented graph. It consists of nodes, which
in the biological analogy represent neurons, connected by arcs, which
correspond to dendrites and synapses. Each arc is associated with a
weight, and each node applies an activation function to the values it
receives as input along its incoming arcs, adjusted by the weights of
the arcs. A neural network is a machine learning algorithm based on the
model of a human neuron. The human brain consists of billions of
neurons, which send and process signals in the form of electrical and
chemical signals.
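
The behavior of a single node can be sketched in a few lines of NumPy. This is only an illustration of the weighted-sum-plus-activation idea: the sigmoid activation, the bias term, and all numeric values are assumptions chosen for the example, not details given in this report.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation: squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the incoming values plus a bias, passed through the activation."""
    return sigmoid(np.dot(weights, inputs) + bias)


# Hypothetical values chosen for illustration only.
x = np.array([0.5, -1.0, 2.0])   # values arriving along the incoming arcs
w = np.array([0.4, 0.3, -0.2])   # one weight per arc
b = 0.1                          # bias term (an assumption, not named in the text)
out = neuron_output(x, w, b)
```

Here the weighted sum is 0.2 − 0.3 − 0.4 + 0.1 = −0.4, so the node outputs sigmoid(−0.4), a value slightly below 0.5.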
DAY 16 TO DAY 20

Existing System

Bibodi et al. (n.d.) used multiple machine learning models to create a system that
would help students shortlist the universities suitable for them; a second model was
created to help colleges decide on the enrolment of a student. The Naive Bayes
algorithm was used to predict the likelihood of success of an application, and
multiple classification algorithms such as Decision Tree, Random Forest, Naive Bayes
and SVM were compared and evaluated on their accuracy to select the best candidates
for the college. The GRADE system was developed by Waters and Miikkulainen (2013) to
support the admission process for graduate students in the Department of Computer
Science at the University of Texas at Austin. The main objective of the project was
to develop a system that helps the admission committee of the university make better
and faster decisions. Logistic regression and SVM were used to create the model; both
models performed equally well, and the final system was developed using logistic
regression due to its simplicity. The time required by the admission committee to
review the applications was reduced by 74%, but human intervention was still required
to make the final decision on the status of the application. Nandeshwar et al. (2014)
created a similar model to predict the enrolment of a student in the university based
on factors like SAT score, GPA score, residency, race, etc. The model was created
using the Multiple Logistic Regression algorithm and achieved an accuracy rate of
only 67%.
DAY 21 TO DAY 25

MODULES DESCRIPTION
Exploratory Data Analysis: Performed initial investigations on the data to discover
patterns, spot anomalies, test hypotheses and check assumptions with the help of
summary statistics and graphical representations.

Data Visualization: Using data visualization, I summarized the data with graphs,
pictures and maps, so that the human mind has an easier time processing and
understanding the given data. Data visualization plays a significant role in the
representation of both small and large data sets, but it is especially useful for
large data sets, where it is impossible to see all of the data, let alone process and
understand it manually.

Training and Testing: In this project, datasets are split into two subsets. The first
subset is known as the training data: it is the portion of the actual dataset that is
fed into the machine learning model to discover and learn patterns. In this way, it
trains the model. The other subset is known as the testing data; it is held back and
used to evaluate the trained model.

Train and Evaluate Linear Regression: Simple linear regression is an approach for
predicting a quantitative response using a single feature (or "predictor" or "input
variable"). It takes the following form:

$$y = \beta_0 + \beta_1 x$$
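
The split-then-fit workflow described in these modules can be sketched as follows. This is a minimal example on synthetic data, not the project's own pipeline: the helper `train_test_split`, the 80/20 ratio, the seeds and the generated data are all assumptions made for illustration.

```python
import numpy as np


def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Synthetic data following y = 2 + 3x plus small noise (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=100)

x_train, x_test, y_train, y_test = train_test_split(x, y)

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
beta1, beta0 = np.polyfit(x_train, y_train, deg=1)

# Evaluate on the held-out rows only, which the fit never saw.
test_mse = np.mean((y_test - (beta0 + beta1 * x_test)) ** 2)
```

Fitting on the training rows and scoring on the test rows gives an estimate of how the line performs on unseen data, which is the purpose of the split.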
Limitation: The existing system relied only on the GRE, TOEFL and undergraduate
scores of the student and did not take into consideration other important factors
like SOP and LOR. It also did not account for the student’s research work in the
related field. The model achieved only 67% accuracy.
DAY 26 TO DAY 30

Linear Regression
Linear regression is a type of supervised machine learning algorithm that
computes the linear relationship between a dependent variable and one or more
independent features. When there is only one independent feature, it is known as
simple (univariate) linear regression; with more than one feature, it is known as
multiple linear regression.

Why Linear Regression is Important?


The interpretability of linear regression is a notable strength. The model’s
equation provides clear coefficients that elucidate the impact of each independent
variable on the dependent variable, facilitating a deeper understanding of the
underlying dynamics. Its simplicity is a virtue, as linear regression is transparent,
easy to implement, and serves as a foundational concept for more complex
algorithms.
Linear regression is not merely a predictive tool; it forms the basis for various
advanced models. Techniques like regularization and support vector machines
draw inspiration from linear regression, expanding its utility. Additionally, linear
regression is a cornerstone in assumption testing, enabling researchers to validate
key assumptions about the data.

The best Fit Line


Our primary objective when using linear regression is to locate the best-fit line,
that is, the line for which the error between the predicted and actual values is as
small as possible.
The best-fit line equation describes a straight line that represents the relationship
between the dependent and independent variables. The slope of the line indicates
how much the dependent variable changes for a unit change in the independent
variable(s).
DAY 31 TO DAY 34

Simple Linear Regression

This is the simplest form of linear regression, and it involves only one
independent variable and one dependent variable. The equation for simple linear
regression is:

$$Y = \beta_0 + \beta_1 X$$

where:
• Y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope

Multiple Linear Regression


This involves more than one independent variable and one dependent
variable. The equation for multiple linear regression is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where:
• Y is the dependent variable
• X1, X2, …, Xp are the independent variables
• β0 is the intercept
• β1, β2, …, βp are the slopes

The goal of the algorithm is to find the best-fit line equation that can predict the
values based on the independent variables.
In regression, a set of records with X and Y values is used to learn a function, so
that Y can be predicted for a previously unseen X using the learned function. Because
the Y to be predicted is continuous, regression requires a function that predicts a
continuous Y given the independent features X.
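
Learning such a function from X and Y records can be sketched with an ordinary least-squares solve. The example below is a hedged illustration: the helper names `fit_linear_regression` and `predict` and the noise-free synthetic data are assumptions, and `np.linalg.lstsq` is used here instead of the gradient descent approach covered later in this report.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return (intercept, slopes) that minimize the sum of squared errors."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]


def predict(X, intercept, slopes):
    """Evaluate intercept + X @ slopes for each row of X."""
    return intercept + X @ slopes


# Noise-free synthetic data generated from Y = 1 + 2*X1 - 3*X2 (illustrative only).
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

b0, b = fit_linear_regression(X, y)

# Predict Y for a previously unseen X.
y_new = predict(np.array([[6.0, 2.0]]), b0, b)
```

Because the data is noise-free, the fit recovers the intercept 1 and slopes 2 and −3, and the prediction for the new record (6, 2) is 1 + 12 − 6 = 7.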
DAY 35 TO DAY 39

Gradient Descent for Linear Regression


A linear regression model can be trained using the optimization algorithm gradient
descent, which iteratively modifies the model’s parameters to reduce the mean squared
error (MSE) of the model on a training dataset. To update the θ1 and θ2 values so as to
reduce the cost function (minimizing the MSE) and achieve the best-fit line, the model
uses gradient descent. The idea is to start with random θ1 and θ2 values and then
iteratively update them until the cost reaches a minimum.
A gradient is nothing but a derivative: it describes how the output of a function
changes with a small variation in its inputs.
Let’s differentiate the cost function (J) with respect to θ1 and θ2. With the prediction
$\hat{y}_i = \theta_1 + \theta_2 x_i$ and the cost
$J = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$:

$$\frac{\partial J}{\partial \theta_1} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$

$$\frac{\partial J}{\partial \theta_2} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i$$

Finding the coefficients of a linear equation that best fits the training data is the
objective of linear regression. The coefficients are updated by moving in the direction
of the negative gradient of the MSE with respect to the coefficients. If α is the
learning rate, the intercept and the coefficient of X are updated as:

$$\theta_1 \leftarrow \theta_1 - \alpha\,\frac{\partial J}{\partial \theta_1}, \qquad \theta_2 \leftarrow \theta_2 - \alpha\,\frac{\partial J}{\partial \theta_2}$$

Assumptions of Simple Linear Regression

Linearity: The independent and dependent variables have a linear relationship with
one another.

Independence: The observations in the dataset are independent of each other. This
means that the value of the dependent variable for one observation does not depend on
the value of the dependent variable for another observation. If the observations are not
independent, then linear regression will not be an accurate model.

Homoscedasticity: Across all levels of the independent variable(s), the variance of
the errors is constant. This indicates that the value of the independent variable(s) has
no impact on the variance of the errors. If the variance of the residuals is not constant,
then linear regression will not be an accurate model.

Normality: The residuals should be normally distributed. This means that the residuals
should follow a bell-shaped curve. If the residuals are not normally distributed, then
linear regression will not be an accurate model.
DAY 40 TO DAY 44

Assumptions of Multiple Linear Regression

For Multiple Linear Regression, all four of the assumptions from Simple Linear
Regression apply. In addition, there are a few more considerations:

1. No multicollinearity: There is little or no correlation between the
independent variables. Multicollinearity occurs when two or more independent
variables are highly correlated with each other, which can make it difficult to
determine the individual effect of each variable on the dependent variable. If
there is multicollinearity, then multiple linear regression will not be an accurate
model.

2. Additivity: The model assumes that the effect of changes in a predictor
variable on the response variable is consistent regardless of the values of the
other variables. This assumption implies that there is no interaction between
variables in their effects on the dependent variable.

3. Feature Selection: In multiple linear regression, it is essential to carefully
select the independent variables that will be included in the model. Including
irrelevant or redundant variables may lead to overfitting and complicate the
interpretation of the model.

4. Overfitting: Overfitting occurs when the model fits the training data too
closely, capturing noise or random fluctuations that do not represent the true
underlying relationship between variables. This can lead to poor generalization
performance on new, unseen data.

Acunetix is an automated web application security testing tool that audits your web
applications by checking for vulnerabilities like SQL Injection, Cross site scripting and
other exploitable vulnerabilities. In general, Acunetix scans any website or web application
that is accessible via a web browser and uses the HTTP/HTTPS protocol.
DAY 45 TO DAY 49

Multicollinearity
Multicollinearity is a statistical phenomenon that occurs when two or more independent
variables in a multiple regression model are highly correlated, making it difficult to assess
the individual effects of each variable on the dependent variable.

Two techniques are commonly used to detect multicollinearity:

• Correlation Matrix: Examining the correlation matrix among the independent
variables is a common way to detect multicollinearity. High correlations (close
to 1 or -1) indicate potential multicollinearity.
• VIF (Variance Inflation Factor): VIF is a measure that quantifies how much
the variance of an estimated regression coefficient increases if your predictors
are correlated. A high VIF (typically above 10) suggests multicollinearity.
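
Both checks can be sketched with NumPy. This is an illustrative example under stated assumptions: the helper names, the synthetic predictors and the seed are hypothetical, and the VIF is computed from its usual definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors.

```python
import numpy as np


def correlation_matrix(X):
    """Pairwise Pearson correlations between the columns of X."""
    return np.corrcoef(X, rowvar=False)


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coeffs, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coeffs
        r_squared = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic predictors: x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = correlation_matrix(X)
vifs = variance_inflation_factors(X)
```

In this setup the correlation between the first two columns is close to 1 and their VIFs are far above the threshold of 10, while the unrelated third column has a VIF near 1.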

Evaluation Metrics for Linear Regression


A variety of evaluation measures can be used to determine the strength of any linear
regression model. These assessment metrics often give an indication of how well the
model is producing the observed outputs.
The most common measurements are:

Mean Square Error (MSE)


Mean squared error is an evaluation metric that calculates the average of the squared
differences between the actual and predicted values over all the data points. The
difference is squared to ensure that negative and positive differences don’t cancel each
other out.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Here,
• $n$ is the number of data points.
• $y_i$ is the actual or observed value for the ith data point.
• $\hat{y}_i$ is the predicted value for the ith data point.
MSE is a way to quantify the accuracy of a model’s predictions. MSE is sensitive to
outliers, as large errors contribute significantly to the overall score.
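
A minimal sketch of the MSE calculation is shown below. The function name and the sample values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


# Hypothetical values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

# Squared errors are 0.25, 0, 1 and 0.25, so the mean is 1.5 / 4 = 0.375.
mse = mean_squared_error(actual, predicted)

# Replacing the last prediction with a large miss (error of 10) shows the
# outlier sensitivity: squared errors become 0.25, 0, 1 and 100.
mse_outlier = mean_squared_error(actual, [2.5, 5.0, 8.0, 19.0])
```

A single large error raises the score from 0.375 to 25.3125, which is why MSE is described as sensitive to outliers.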
DAY 50 TO DAY 54

Product of Vector
In vector multiplication there are two kinds of products: scalar and vector. The dot
product is a kind of multiplication that results in a scalar quantity. The cross
product is a kind of multiplication that results in a vector quantity. Vector products
are used to define other derived vector quantities, as in the equations for torque,
angular velocity, and acceleration. All of these quantities involve operations that
produce vectors from vectors, and these operations are usually vector products.
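
The two products can be illustrated with NumPy’s `np.dot` and `np.cross`. The vectors below are hypothetical; the torque line uses the standard relation τ = r × F as an example of a derived vector quantity.

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

# Dot product: a scalar, zero here because the vectors are perpendicular.
dot = np.dot(a, b)

# Cross product: a vector perpendicular to both inputs, here the z unit vector.
cross = np.cross(a, b)

# Torque as a cross product of lever arm and force (illustrative values).
r = np.array([2.0, 0.0, 0.0])
force = np.array([0.0, 3.0, 0.0])
torque = np.cross(r, force)
```

The torque for these values points along the z axis with magnitude 2 × 3 = 6.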

Position Vector
The position vector is used to denote the position of the particle on the Cartesian plane
with respect to the origin as a reference.

Velocity
• The average velocity is the ratio of total displacement to total time.

Gradient Descent
• Gradient Descent is a cornerstone of model optimization. At its core, it is a
numerical optimization algorithm that aims to find the optimal parameters
(weights and biases) of a neural network by minimizing a defined cost function.
• Gradient Descent (GD) is a widely used optimization algorithm in machine
learning and deep learning that minimises the cost function of a neural network
model during training. It works by iteratively adjusting the weights or parameters
of the model in the direction of the negative gradient of the cost function until the
minimum of the cost function is reached.
• The learning happens during back propagation while training the neural
network-based model. Gradient Descent is used to optimize the weights and
biases based on the cost function. The cost function evaluates the difference
between the actual and predicted outputs.
• Gradient Descent is a fundamental optimization algorithm in machine
learning used to minimize the cost or loss function during model training.
DAY 55 TO DAY 58

Root Mean Squared Error (RMSE)


The root mean squared error is the square root of the mean of the squared residuals.
It describes how well the observed data points match the predicted values, or the
model’s absolute fit to the data.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

To obtain an unbiased estimate, the sum of the squared residuals must be divided by
the number of degrees of freedom rather than by the total number of data points in
the model. The resulting figure is referred to as the Residual Standard Error (RSE).

RMSE is not as good a metric as R-squared. Root Mean Squared Error can fluctuate
when the units of the variables vary, since its value depends on the variables’ units
(it is not a normalized measure).
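
The difference between RMSE and RSE, and the unit dependence of RMSE, can be sketched as follows. The function names and sample values are hypothetical, and the degrees of freedom are taken as n − p − 1 for a model with p predictors plus an intercept, which is the usual convention and an assumption here.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Square root of the mean squared residual (divides by n)."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(residuals ** 2))


def residual_standard_error(y_true, y_pred, n_predictors):
    """Like RMSE, but divides by the degrees of freedom n - p - 1."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    dof = len(residuals) - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough data points for the number of predictors")
    return np.sqrt(np.sum(residuals ** 2) / dof)


# Hypothetical values for illustration; the squared residuals sum to 1.5.
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.0, 8.0, 9.5])

rmse_value = rmse(actual, predicted)                         # sqrt(1.5 / 4)
rse_value = residual_standard_error(actual, predicted, 1)    # sqrt(1.5 / 2)

# Changing units (for example metres to centimetres) rescales RMSE by the same factor.
rmse_scaled = rmse(actual * 100, predicted * 100)
```

The same fit reports an RMSE 100 times larger after the unit change, which illustrates why RMSE values are only comparable between models that predict the same quantity in the same units.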

Coefficient of Determination (R-squared)


R-squared is a statistic that indicates how much of the variation the developed model
can explain or capture. For a least-squares fit with an intercept, evaluated on the
training data, it lies in the range 0 to 1. In general, the better the model matches
the data, the greater the R-squared number.
In mathematical notation, it can be expressed as:

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

• Residual Sum of Squares (RSS): The sum of the squares of the residuals over all
data points is known as the residual sum of squares, or RSS. It is a measurement
of the difference between the output that was observed and what was predicted:
$\mathrm{RSS} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

• Total Sum of Squares (TSS): The sum of the squared deviations of the data points
from the mean of the response variable is known as the total sum of squares, or
TSS: $\mathrm{TSS} = \sum_{i=1}^{n}(y_i - \bar{y})^2$.

The R-squared metric is a measure of the proportion of variance in the dependent
variable that is explained by the independent variables in the model.
Adjusted R-Squared Error
Adjusted R² measures the proportion of variance in the dependent variable that is
explained by the independent variables in a regression model. Adjusted R-squared
accounts for the number of predictors in the model and penalizes the model for
including irrelevant predictors that don’t contribute significantly to explaining the
variance in the dependent variable. With n data points and k predictors:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
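
R-squared and its adjusted form can be computed directly from the residual and total sums of squares. The sketch below uses hypothetical function names and sample values; the adjusted formula is the standard 1 − (1 − R²)(n − 1)/(n − k − 1) for n data points and k predictors.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical values for illustration: RSS = 1.5 and TSS = 20.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

r2 = r_squared(actual, predicted)              # 1 - 1.5 / 20 = 0.925
adj_one = adjusted_r_squared(r2, 4, 1)         # 1 - 0.075 * 3 / 2 = 0.8875
adj_two = adjusted_r_squared(r2, 4, 2)         # 1 - 0.075 * 3 / 1 = 0.775
```

For the same R², the adjusted value drops as more predictors are counted, which is the penalty described above.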
DAY 59 AND DAY 60

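CODINGS:
import numpy as np


class LinearRegression:
    def __init__(self):
        # 'm' holds the slope and 'c' the intercept of the fitted line
        self.parameters = {}

    def forward_propagation(self, train_input):
        # Predict with the current line: y = m * x + c
        m = self.parameters['m']
        c = self.parameters['c']
        predictions = np.multiply(m, train_input) + c
        return predictions

    def cost_function(self, predictions, train_output):
        # Mean squared error between actual and predicted values
        cost = np.mean((train_output - predictions) ** 2)
        return cost

    def backward_propagation(self, train_input, train_output, predictions):
        # Gradients of the cost with respect to m and c; the constant
        # factor 2 from the square is absorbed into the learning rate
        derivatives = {}
        df = predictions - train_output
        dm = np.mean(np.multiply(train_input, df))
        dc = np.mean(df)
        derivatives['dm'] = dm
        derivatives['dc'] = dc
        return derivatives

    def update_parameters(self, derivatives, learning_rate):
        # Step against the gradient, scaled by the learning rate
        self.parameters['m'] = self.parameters['m'] - \
            learning_rate * derivatives['dm']
        self.parameters['c'] = self.parameters['c'] - \
            learning_rate * derivatives['dc']

    def train(self, train_input, train_output, learning_rate, iters):
        # Initialize random parameters
        self.parameters['m'] = np.random.uniform(0, 1) * -1
        self.parameters['c'] = np.random.uniform(0, 1) * -1

        # The listing in the report stops after initialization; the loop
        # below is a minimal completion built from the methods above
        loss = []
        for _ in range(iters):
            predictions = self.forward_propagation(train_input)
            loss.append(self.cost_function(predictions, train_output))
            derivatives = self.backward_propagation(
                train_input, train_output, predictions)
            self.update_parameters(derivatives, learning_rate)
        return self.parameters, loss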
DAY 61 AND DAY 62

Lasso Regression (L1 Regularization)


Lasso regression is a technique for regularizing a linear regression model: it
adds a penalty term to the linear regression objective function to prevent
overfitting.

The objective function after applying lasso regression is:

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda\sum_{j=1}^{p}\left|\theta_j\right|$$

• the first term is the least squares loss, representing the squared
difference between predicted and actual values.
• the second term is the L1 regularization term; it penalizes the sum of the
absolute values of the regression coefficients θj, scaled by the
regularization strength λ.

Ridge Regression (L2 Regularization)


Ridge regression is a linear regression technique that adds a regularization term to
the standard linear objective. Again, the goal is to prevent overfitting by penalizing
large coefficients in the linear regression equation. It is useful when the dataset
has multicollinearity, where predictor variables are highly correlated.

The objective function after applying ridge regression is:

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda\sum_{j=1}^{p}\theta_j^2$$

• the first term is the least squares loss, representing the squared
difference between predicted and actual values.
• the second term is the L2 regularization term; it penalizes the sum of the
squares of the regression coefficients θj, scaled by the regularization
strength λ.

Elastic Net Regression


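Elastic net regression is a hybrid regularization technique that combines the
power of both L1 and L2 regularization in the linear regression objective.

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \alpha\lambda\sum_{j=1}^{p}\left|\theta_j\right| + (1 - \alpha)\lambda\sum_{j=1}^{p}\theta_j^2$$

• The first term is the least squares loss.
• The second term is the L1 (lasso) regularization and the third is the L2 (ridge)
regularization.
• λ is the overall regularization strength.
• α controls the mix between L1 and L2 regularization.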
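
The three regularized objectives can be compared side by side in a short sketch. The function names, the data and the parameter values are hypothetical, and the scaling is an assumption: the loss is taken as the mean squared error and the penalties are left unscaled, whereas some libraries and textbooks include extra constant factors such as 1/2.

```python
import numpy as np


def mse_loss(X, y, theta, intercept):
    """Mean squared difference between actual and predicted values."""
    return np.mean((y - (intercept + X @ theta)) ** 2)


def lasso_objective(X, y, theta, intercept, lam):
    """Loss plus the L1 penalty: lam * sum of |theta_j|."""
    return mse_loss(X, y, theta, intercept) + lam * np.sum(np.abs(theta))


def ridge_objective(X, y, theta, intercept, lam):
    """Loss plus the L2 penalty: lam * sum of theta_j squared."""
    return mse_loss(X, y, theta, intercept) + lam * np.sum(theta ** 2)


def elastic_net_objective(X, y, theta, intercept, lam, alpha):
    """Loss plus a mix of both penalties; alpha = 1 is lasso, alpha = 0 is ridge."""
    l1 = np.sum(np.abs(theta))
    l2 = np.sum(theta ** 2)
    return mse_loss(X, y, theta, intercept) + lam * (alpha * l1 + (1 - alpha) * l2)


# Hypothetical data and coefficients: the loss is 8, the L1 norm 3, the squared L2 norm 5.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
theta = np.array([1.0, -2.0])

lasso = lasso_objective(X, y, theta, 0.0, lam=0.1)                      # 8 + 0.3
ridge = ridge_objective(X, y, theta, 0.0, lam=0.1)                      # 8 + 0.5
enet = elastic_net_objective(X, y, theta, 0.0, lam=0.1, alpha=0.5)      # 8 + 0.4
```

The intercept is passed separately and left out of the penalty, which is the usual practice; only the slope coefficients are shrunk.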
