Internship Report PDF 2022


Computer Science and Engineering (Visvesvaraya Technological University)


Machine Learning using Python

CHAPTER 1

ORGANIZATION INTRODUCTION
1.1 ABOUT THE ORGANIZATION
The purpose of the business is the development of various types of testing
equipment, including IC testers, component testers, GSM modules, GPS modules, and
microcontroller trainer kits based on the 8051/8052; microcontroller-based educational
systems; and ARM-7/8051/MSP430-based development kits. We also undertake development
of custom-designed, application-specific equipment.

HISTORY

aMSa Embedded Solutions

Since 2013, we have been contributing successfully to the design and development
of various types of hardware circuit boards, electronics products, and industrial and
academic projects. With a strong industrial background and experience, we offer student
training, corporate training, and placement. aMSa is a dedicated team of Bachelor of
Engineering graduates who work as technical specialists in embedded companies in
Bangalore and Hubli. With this industrial expertise, we would like to help students
become aware of industry demands and gain work experience during their training in our
academy.

Dept. of CSE, GEC Haveri [2022-23] Page 1


COMPANY STRATEGY
aMSa Embedded Solutions will be promoted in various local media. Clients have
already begun promotion by word of mouth, and the response has been favorable. Formal
advertising is planned to begin three months before the center is scheduled to open.

1.2 PROFILE OF THE ORGANIZATION


Name                aMSa Embedded Solutions
Address             aMSa Embedded Solutions, Baligar Building, Unkal Cross Stop,
                    Shirur Park Road, Vidyanagar, Hubli – 580031
Mobile No           0836-2270318
Email ID            [email protected]
Area of Operation   Hubli

CHAPTER 2
ABOUT THE DEPARTMENT
2.1 COMPANY SERVICES
• Electronic Product Design
• Embedded systems design using 8-, 16-, and 32-bit microcontrollers.
• Embedded Software & Firmware development
• Re-engineering services to correct flaws or to optimize existing designs for lowering costs.
• Electronics Consulting services

2.2 TRAINING
Engineering students must now complete a mandatory period of internship to gain
practical experience, a decision prompted by industry complaints about the poor
employability of BE/BTech graduates. Students in college do not always have the
opportunity to learn the practical skills required for skill development and
employment, so it is better to join an internship in the required domain at a company
that has the potential to shape the student's future with hands-on experience in
technologies including hardware and software development.

Service Offerings:

1. Application Development
2. Web-based Applications
3. Windows-based Applications
4. IoT-based Applications

2.3 Department of Research and Development

AES has an R&D team of four members: two work as Hardware Design Engineers and
two as Firmware Design Engineers. Their responsibilities are as follows.

Hardware Design Engineer Responsibility:


✓ Design and develop hardware products and systems.
✓ Design, define, and simulate analog and digital circuits.
✓ Design, test and debug digital, analog, and RF based circuits.
✓ Debug and evaluate motherboard and resolve circuit problems.
✓ Assist and support CAD team or design team in schematic designs and
developments.
✓ Coordinate with software teams on designs and development of devices.
✓ Work with software teams on device bring-up and development.
✓ Research, evaluate, develop and apply new processes to products and
applications.

Box Building and Board Assembling:

AES has a production team of two members, whose responsibilities are as follows:

✓ Sub-Level Product Assembly
✓ Product Assembly
✓ System Level Assembly


✓ Testing including Functional, Final, Environmental and Burn-In


✓ Software Loading and Product Configuration
✓ Warehousing & Order Fulfillment & Traceability
✓ Packaging & Labeling including Bar Coding
✓ Aftermarket Service and Depot Repair of EIT Built Products

CHAPTER 3
TASK PERFORMED
3.1 Machine Learning
Machine learning is a growing technology that enables computers to learn
automatically from past data. It uses various algorithms to build mathematical models
and make predictions from historical data or information. Currently, it is used for
tasks such as image recognition, speech recognition, email filtering, Facebook
auto-tagging, recommender systems, and many more.

What is Machine Learning? In the real world, we are surrounded by humans who
can learn from their experiences, and we have computers or machines that work on our
instructions. But can a machine also learn from experience or past data as a human
does? This is where machine learning comes in. Machine learning is regarded as a subset
of artificial intelligence that is mainly concerned with the development of algorithms
that allow a computer to learn from data and past experience on its own. The term
machine learning was first introduced by Arthur Samuel in 1959. In summary: a machine
has the ability to learn if it can improve its performance by gaining more data.

How does Machine Learning work? A machine learning system learns from
historical data, builds prediction models, and, whenever it receives new data, predicts
the output for it. The accuracy of the predicted output depends in part on the amount
of data, since a larger amount of data generally helps to build a better model that
predicts the output more accurately. Suppose we have a complex problem in which we need
to make predictions. Instead of writing code for it, we feed the data to generic
algorithms, and with the help of these algorithms the machine builds the logic from the
data and predicts the output. Machine learning has changed our way of thinking about
such problems. The block diagram below explains the working of a machine learning
algorithm:

Features of Machine Learning:

• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining in that it also deals with large
amounts of data.

The following key points show the importance of machine learning:

• Rapid increase in the production of data.
• Solving complex problems that are difficult for a human.
• Decision making in various sectors, including finance.
• Finding hidden patterns and extracting useful information from data.

3.1.1 Classification of Machine Learning


➢ Supervised Learning
Supervised learning, also known as supervised machine learning, is a subcategory
of machine learning and artificial intelligence. It is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately. As input
data is fed into the model, it adjusts its weights until the model has been fitted
appropriately, which occurs as part of the cross-validation process. Supervised learning
helps organizations solve a variety of real-world problems at scale, such as classifying
spam into a separate folder from your inbox.


Fig: Flow Model of Algorithms of Supervised Learning.


How supervised learning works: Supervised learning uses a training set to teach
models to yield the desired output. This training dataset includes inputs and correct
outputs, which allow the model to learn over time. The algorithm measures its accuracy
through the loss function, adjusting until the error has been sufficiently minimized.
In data mining, supervised learning can be separated into two types of problems:
classification and regression.

Classification uses an algorithm to accurately assign test data into specific
categories. It recognizes specific entities within the dataset and attempts to draw some
conclusions on how those entities should be labeled or defined. Common classification
algorithms are linear classifiers, support vector machines (SVM), decision trees, k-nearest
neighbor, and random forest, which are described in more detail below.

Regression is used to understand the relationship between dependent and
independent variables. It is commonly used to make projections, such as sales revenue
for a given business. Linear regression, logistic regression, and polynomial regression
are popular regression algorithms, although logistic regression is mainly used for
classification.
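
As a minimal sketch of how the error can be measured for the two problem types above,
the Python snippet below computes a mean squared error for regression and an error rate
for classification. The function names and all sample values are hypothetical and are
not taken from the internship project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; a common loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(labels_true, labels_pred):
    """Fraction of misclassified samples; a simple measure for classification."""
    wrong = sum(t != p for t, p in zip(labels_true, labels_pred))
    return wrong / len(labels_true)


# Hypothetical regression targets and model predictions.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])

# Hypothetical class labels and model predictions (one of four is wrong).
err = error_rate(["spam", "ham", "spam", "ham"], ["spam", "spam", "spam", "ham"])

print(mse, err)
```

A training procedure would adjust the model until a value such as `mse` stops decreasing.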


➢ Unsupervised Learning

Unsupervised learning, also known as unsupervised machine learning, uses machine
learning algorithms to analyze and cluster unlabeled datasets. These algorithms
discover hidden patterns or data groupings without the need for human intervention. Its
ability to discover similarities and differences in information makes it well suited
to exploratory data analysis, cross-selling strategies, customer segmentation, and
image recognition.

Unsupervised learning models are used for three main tasks: clustering,
association, and dimensionality reduction.

• Clustering: Clustering is a data mining technique that groups unlabeled data
based on their similarities or differences.
• Association: An association rule is a rule-based method for finding relationships
between variables in a given dataset.
• Dimensionality reduction: Dimensionality reduction is a technique used when the
number of features, or dimensions, in a given dataset is too high. It reduces the number
of data inputs to a manageable size while preserving the integrity of the dataset as
much as possible.
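
To illustrate the clustering task, here is a small sketch of k-means on one-dimensional
data. The report does not say which clustering algorithm was studied, so k-means, the
function name `k_means_1d`, the starting centroids, and the sample points are all
assumptions chosen for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points by repeatedly assigning each point to its nearest
    centroid and then moving each centroid to the mean of its group."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No labels are supplied; the grouping comes only from the distances between points.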

Fig: Flow Model of Algorithms of Unsupervised Learning.


➢ Reinforcement learning

Reinforcement learning is an area of machine learning. It is about taking suitable
actions to maximize reward in a particular situation. It is employed by various software
and machines to find the best possible behavior or path to take in a specific
situation. Reinforcement learning differs from supervised learning in that, in
supervised learning, the training data comes with the answer key, so the model is trained
with the correct answers, whereas in reinforcement learning there is no answer key and
the reinforcement agent decides what to do to perform the given task. In the absence of
a training dataset, it is bound to learn from its experience.

There are two types of Reinforcement learning:

• Positive: Positive reinforcement occurs when an event that follows a particular
behavior increases the strength and frequency of that behavior. In other words, it has
a positive effect on the behavior.

• Negative: Negative reinforcement is the strengthening of a behavior because a
negative condition is stopped or avoided. It increases the behavior and helps enforce a
minimum standard of performance, but it provides only enough incentive to meet that
minimum.
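
The idea of learning from experience rather than from an answer key can be sketched
with a simple epsilon-greedy agent that must discover which of several actions pays the
most. This is only an illustration: the function name `run_bandit`, the fixed reward
values, and the settings `steps`, `epsilon`, and `seed` are assumptions, and real tasks
usually have noisy rewards.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Estimate the value of each action from the rewards it returns."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once to start
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            # Exploit the action with the best estimate so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical fixed rewards for three actions; the agent is never told them.
estimates, counts = run_bandit([1.0, 5.0, 2.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, best_action)
```

The agent is never given the correct action; it infers it from the rewards it collects.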

Fig: Flow Model of Algorithms of Reinforcement Learning.


3.2 Machine Learning Algorithms


Machine learning is an application of artificial intelligence (AI) that gives
systems the ability to learn and improve automatically from experience without being
explicitly programmed. Machine learning focuses on the development of computer
programs that can access data and use it to learn for themselves.

Fig: Types of Machine Learning Algorithms

3.3.1 Linear Regression


Linear regression is used to identify the relationship between a dependent
variable and one or more independent variables and is typically used to make
predictions about future outcomes. When there is only one independent variable and one
dependent variable, it is known as simple linear regression. When there are several
independent variables, it is referred to as multiple linear regression. Each type of
linear regression seeks to plot a line of best fit, which is calculated through the
method of least squares. Unlike other regression models, this line is straight when
plotted on a graph.
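
The least-squares line for simple linear regression can be sketched in a few lines of
Python. The closed-form slope and intercept used here are the standard ones; the
function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted y for a new x = 5
print(slope, intercept, prediction)
```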


Fig: Linear Regression

Advantages

• Linear regression is an extremely simple method. It is easy and intuitive to use
and understand.

• A person with only a knowledge of high school mathematics can understand and use
it. In addition, it works reasonably well in many cases. Even when it does not fit the
data exactly, we can use it to find the nature of the relationship between the two
variables.

Disadvantages

• By definition, linear regression models only linear relationships between dependent
and independent variables. It assumes there is a straight-line relationship
between them, which is sometimes incorrect.
• Linear regression is very sensitive to anomalies in the data (outliers).

3.3.2 Logistic Regression

While linear regression is used when the dependent variable is continuous,
logistic regression is selected when the dependent variable is categorical, meaning it
has binary outputs such as "true" and "false" or "yes" and "no." While both regression
models seek to understand relationships between data inputs, logistic regression is
mainly used to solve binary classification problems, such as spam identification.

3.3.2.1 Key Features


• Logistic regression predicts whether something is True (1) or False (0) instead of
predicting something continuous, such as size.
• It has an S-shaped curve.
• We can take our linear regression model and convert it into a logistic regression
model with the help of the sigmoid function.
• Logistic regression's ability to provide probabilities and classify new samples using
continuous and discrete measurements makes it a popular machine learning method.

Fig: Linear Regression v/s Logistic Regression

In logistic regression, you get a probability score that reflects the probability
of an event occurring. Each row of the training dataset represents one such event, for
example whether a given email is spam, whether a mass of cells is malignant, or
whether a user will buy a product.
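
The link between a linear model and a probability score can be sketched as follows: a
weighted sum of the features is passed through the sigmoid function, and the result is
compared with a threshold. The weights, bias, feature values, and the 0.5 threshold are
hypothetical and have not been learned from any data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (the S-shaped curve)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Linear combination of the features followed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model parameters and one hypothetical sample.
weights, bias = [0.8, -0.5], 0.1
p = predict_probability([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # classify using a 0.5 threshold
print(round(p, 3), label)
```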

Advantages
➢ It does not require high computational power.
➢ It is easily interpretable.
➢ It is widely used by data analysts and data scientists.
➢ It is very easy to implement.
➢ It does not strictly require scaling of features, although scaling can help
regularized or gradient-based solvers converge.


Disadvantages

➢ Logistic regression does not handle a large number of categorical
features/variables well.
➢ It is vulnerable to overfitting.
➢ Logistic regression will not perform well with independent (X) variables that are
not correlated with the target (Y) variable.

3.3.3 KNN
K-nearest neighbors, or KNN, is a simple algorithm that uses the entire
dataset in its training phase. Whenever a prediction is required for an unseen data
instance, it searches through the entire training dataset for the k most similar
instances, and the most common outcome among those k instances (or their average, for
regression) is returned as the prediction. KNN is often used in search applications
where you are looking for similar items, such as "find items similar to this one."
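
A minimal sketch of KNN classification is shown below, assuming Euclidean distance as
the similarity measure and a majority vote among the k nearest training points. The
function name `knn_predict` and the sample points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # The whole training set is scanned for every prediction.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labelled training data with two groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(train_points, train_labels, query=(2.0, 2.0), k=3)
print(prediction)
```

Nothing is fitted in advance; all of the work happens at prediction time.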

3.3.3.1 Features of KNN Algorithm


• KNN is a Supervised Learning algorithm that uses labelled input data set to predict the
output of the data points.
• It is one of the simplest Machine learning algorithms and it can be easily implemented
for a varied set of problems.
• It is mainly based on feature similarity. KNN checks how similar a data point is to
its neighbor and classifies the data point into the class it is most similar to.
• KNN is a lazy algorithm; this means that it memorizes the training data set instead
of learning a discriminative function from the training data.
• KNN can be used for solving both classification and regression problems.

Advantages
• The algorithm is simple and easy to implement.
• There’s no need to build a model, tune several parameters, or make additional
assumptions.
• The algorithm is versatile. It can be used for classification, regression, and search.


• The training phase of K-nearest neighbor classification is much faster compared to other
classification algorithms.
Disadvantages
▪ The algorithm gets significantly slower as the number of examples and/or
predictors/independent variables increase.
▪ The testing phase of K-nearest neighbor classification is slower and costlier in terms
of time and memory. It requires large memory for storing the entire training dataset
for prediction.
▪ KNN is also not suitable for high-dimensional data.

3.3.4 SVM
“Support Vector Machine” (SVM) is a supervised machine learning algorithm that
can be used for both classification and regression challenges. However, it is mostly used
in classification problems. In the SVM algorithm, we plot each data item as a point in
n-dimensional space (where n is the number of features) with the value of each
feature being the value of a particular coordinate.
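
Once the data items are points in n-dimensional space, a linear SVM separates the two
classes with a hyperplane and classifies a point by the side it falls on. The sketch
below shows only that scoring step together with the hinge loss that SVM training
minimizes. It does not train a model; the hyperplane `w`, `b` and the sample points are
hypothetical.

```python
def decision_value(x, w, b):
    """Signed score w.x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def hinge_loss(x, y, w, b):
    """Zero when the point is on the correct side with a margin of at least 1."""
    return max(0.0, 1.0 - y * decision_value(x, w, b))


# Hypothetical hyperplane x1 + x2 = 5 in a 2-feature space.
w, b = [1.0, 1.0], -5.0

print(classify([4.0, 4.0], w, b))       # point above the line
print(classify([1.0, 1.0], w, b))       # point below the line
print(hinge_loss([2.5, 3.0], 1, w, b))  # correct side, but inside the margin
```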

3.3.4.1 Features of SVM

• SVM is a Supervised Learning algorithm that uses labelled input data set to predict
the output of the data points.
• It is a widely used machine learning algorithm that can be applied to a varied
set of problems.
• SVM can be used for solving both classification and regression problems.

Fig: Plot of ideal SVM Algorithm


Advantages
• SVM works relatively well when there is a clear margin of separation between classes.
• SVM is more effective in high dimensional spaces.
• SVM is relatively memory efficient.
• SVM is effective in cases where the number of dimensions is greater than number of
samples.
Disadvantages
• The SVM algorithm is not well suited to very large data sets, because training time
grows quickly with the number of samples.
• SVM does not perform very well when the data set is noisy, i.e. when the target
classes overlap.
• When the number of features greatly exceeds the number of training samples, the SVM
can overfit unless the kernel and regularization are chosen carefully.

3.3.5 Decision Tree

A decision tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and each
leaf node represents the outcome. It is a graphical representation of all the possible
solutions to a problem or decision based on given conditions. In a decision tree, there
are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to
make decisions and have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.

3.3.5.1 Features of Decision tree

• Decision Trees usually mimic human thinking ability while making a decision, so it
is easy to understand.
• The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
• It is very easy to understand and implement.


3.3.5.2 Working

In a decision tree, to predict the class of a given record, the algorithm starts
from the root node of the tree. It compares the value of the root attribute with
the corresponding attribute of the record (real dataset) and, based on the comparison,
follows the branch and jumps to the next node. At the next node, the algorithm again
compares the attribute value with the sub-nodes and moves further. It continues the
process until it reaches a leaf node of the tree.
The complete process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3.
Continue this process until a stage is reached where the nodes cannot be classified
further; the final node is called a leaf node.
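
Step-2 above can be sketched with Gini impurity as the attribute selection measure. The
report does not say which ASM was used, so Gini is an assumption here, and the function
names and the four sample rows are made up for illustration rather than taken from the
Titanic data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, labels, attribute):
    """Impurity left after splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, labels, a))


# Made-up records for illustration only.
rows = [
    {"sex": "female", "cabin_class": "first"},
    {"sex": "female", "cabin_class": "third"},
    {"sex": "male", "cabin_class": "first"},
    {"sex": "male", "cabin_class": "third"},
]
labels = ["survived", "survived", "died", "died"]

chosen = best_attribute(rows, labels, ["sex", "cabin_class"])
print(chosen)
```

A full tree builder would apply `best_attribute` recursively to each subset, as in Step-5.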

Fig: Ideal diagram of a Decision Tree


Advantages

• It is simple to understand, as it follows the same process a human follows when
making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement of data cleaning compared to other algorithms.

Disadvantages

• A decision tree can contain many layers, which makes it complex.
• It may have an overfitting issue, which can be mitigated using the Random Forest
algorithm.

CHAPTER 4
REFLECTION
4.1 PROJECT: TITANIC SURVIVAL PREDICTION
➢ Data collection:
The first step is data collection. Data can be obtained from many sources, such as
the company itself, the Kaggle repository, surveys, or third-party APIs. The dataset is
imported as a comma-separated values (CSV) file, and the required modules are imported.

➢ Train Test split operation:


After feature engineering is completed and we have a dataset with no null values
and with dummy variables, we split the data into training and testing datasets. 20% of
the data is the test dataset and 80% is the training dataset, which we feed to our
algorithm. After learning from the training data, the model can make predictions on the
testing data.
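
The 80/20 split can be sketched as follows. The helper name `split_rows`, the seed, and
the placeholder rows are assumptions; in practice a library routine such as
scikit-learn's `train_test_split` is commonly used for this step, and the report does
not say which one was used.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Ten placeholder rows standing in for the prepared dataset.
rows = list(range(10))
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting avoids holding out only the last rows of the file.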


➢ Filling Null values:


Our job is to fill the null values in each feature so that they do not affect the
accuracy of our model.
There are two types of data: 1) numerical data and 2) categorical data.
For numerical data, we use median imputation and replace the null values with the
median of that column. Using the median rather than the mean reduces the impact of
outliers that may be present in the data.
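
A small sketch of median imputation for one numerical column is given below. The
function name `fill_with_median` and the age values are hypothetical; missing entries
are represented by `None`.

```python
import statistics


def fill_with_median(values):
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    median = statistics.median(known)
    return [median if v is None else v for v in values]


# Hypothetical age column with two missing entries and one outlier (80.0).
ages = [22.0, None, 38.0, 26.0, None, 80.0]
filled = fill_with_median(ages)
print(filled)
```

For these values the median of the known ages is 32.0, while their mean is 41.5, which
shows how a single outlier pulls the mean upward.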

➢ Algorithm used
The algorithm I used here is logistic regression, which is generally used in binary
classification problems. It works best when the data is roughly linearly
separable, and the algorithm finds the line that best separates the two classes.
It predicts the probability that a particular event will happen.
We then create the classifier, fit the training data to it, and use the
predict function to obtain the output. Once the model is ready, it is time to evaluate
the performance of our classifier.
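
Evaluating a binary classifier can be sketched by counting the four cells of a
confusion matrix and deriving the accuracy from them. The label lists are hypothetical
and are not the results obtained in the project.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical true labels (1 = survived) and classifier predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
```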

➢ Problem statement:
Here we can choose any of the models to predict the survival of a test sample.
Since we have evaluated all models using a confusion matrix, we predict with the model
that has the highest accuracy. We performed prediction on the dataset using logistic
regression, SVC, and Random Forest. Additionally, the final prediction is made through
the majority vote of the algorithms using an ensemble technique. As the table above
shows, the Random Forest model has the highest accuracy.
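
The voting step can be sketched as a simple majority vote over the predictions of the
three models. The function name `majority_vote` and the prediction lists are
hypothetical stand-ins, not the outputs of the trained models.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """For each sample, return the label predicted by most of the models."""
    voted = []
    for votes in zip(*predictions_per_model):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Hypothetical predictions from three models for four test samples.
logistic_preds = [1, 0, 1, 0]
svc_preds = [1, 1, 0, 0]
forest_preds = [1, 0, 0, 1]

final = majority_vote([logistic_preds, svc_preds, forest_preds])
print(final)
```

With three voters and two classes a tie cannot occur, so no tie-breaking rule is needed.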

4.2 HARDWARE REQUIREMENTS


• System Processor: Intel Core i5/i7.
• RAM: 4 GB.
• Hard disk capacity: 500 GB


4.3 SOFTWARE REQUIREMENTS
• Operating System: Windows 8/10
• Programming Language: Python
• Framework: Anaconda
• IDE: Jupyter Notebook
• ML Libraries: NumPy, Pandas, Matplotlib.

4.4 SNAPSHOT
RESULT


CONCLUSION
Data cleaning is the first step in performing data analysis. Exploratory data
analysis (EDA) helps one understand the dataset and the dependencies among the
attributes. EDA is used to figure out the relationships between the features of the
dataset using various graphical techniques. The ones used above are ggplot and
histograms. By applying EDA, some conclusions are drawn and facts are found. Age has a
strong influence on survival. We can see from table-2 that as age increases, survival
decreases.
