Course Report

Uploaded by

Vrishhti Goel

Mini Project/Internship Assessment-KCS554

MACHINE LEARNING
COURSE
REPORT

By-Aparna Joshi
Roll no: 2100910130026
Under the mentorship of- Mrs. Aparna Srivastava
TABLE OF CONTENTS:

Declaration
Acknowledgement
Certificate
Introduction
Modules
Learning outcomes
Project
Utility of course and Conclusion
References
Declaration

I hereby declare that this submission is my own work and that, to the best of my knowledge
and belief, it contains no material previously published or written by another person, nor
material which to a substantial extent has been accepted for the award of any other degree or
diploma of the university or other institute of higher learning, except where due
acknowledgement has been made in the text.

Name: Aparna Joshi

Roll No:2100910130026

Date: 22/12/23
Acknowledgement

We would like to express our sincere gratitude to our mentor, Mrs. Aparna
Srivastava, Assistant Professor, Department of Information Technology, whose role as
project guide was extremely valuable for the project. We are thankful for the keen interest
she took in advising and helping us throughout the project. Without her guidance, this
project would not have been possible.

We are also thankful to all the Professors and Faculty Members in the department for their
teachings and academic support.

Date: 22/12/23 Aparna Joshi


CERTIFICATE:
INTRODUCTION

I took a course on Machine Learning on Internshala, taught by Kunal Jain, Pranav Dar, and
Aishwarya Singh.

The course mainly covers the following modules:


MODULE-1

• Get Started with Internshala Trainings


• What is Machine Learning
• How Machine Learning Works
• Types of Machine Learning – Supervised and Unsupervised

MODULE-2
• Training Overview Video
• Types of Data
• Graphical and Analytical Representation of Data
• Limitations of Traditional Data Analysis

MODULE-3
• Introduction to Python and Installing Jupyter Notebook
• Basic Libraries in Python (Pandas, Numpy, Matplotlib)
• Understanding Basics of Python Programming (Conditional and Iterative Statements, and Functions)
• Basic Data Exploration
• Advanced Functions for Data Manipulation

MODULE-4
• Context Setting and Problem Statement
• Data exploration - Target Variable
• Data Exploration - Independent Numerical Variables
• Data Exploration - Categorical Variables
• Splitting of Data
• Feature Scaling of Data

MODULE-5
• Building Your First Predictive Model (Regression) and Evaluate Performance
• Introduction to Linear Regression
• Understanding Gradient Descent
• Assumptions of Linear Regression
• Implementing Linear Regression
• Feature Engineering
MODULE-6
• Common Dimensionality Reduction Techniques
A primary aim of dimensionality reduction is to avoid overfitting. Training data with
considerably fewer features helps the model remain simple, so it makes fewer
assumptions.

• Filter strategy
• Wrapper strategy
• Embedded strategy

• Advanced Dimensionality Reduction Techniques

Principal Component Analysis (PCA)

Non-negative matrix factorization (NMF)

Linear discriminant analysis (LDA)

Missing Values Ratio

Low Variance Filter
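
The last two techniques in the list above are simple enough to sketch directly. The following is a minimal illustration in plain Python, not the course's implementation: the thresholds `max_missing` and `min_variance`, the helper names, and the toy table are all assumptions made for the example.

```python
from statistics import pvariance


def missing_ratio(column):
    """Fraction of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def filter_features(table, max_missing=0.5, min_variance=0.01):
    """Keep columns with few missing values and non-negligible variance."""
    kept = {}
    for name, column in table.items():
        # Missing Values Ratio: drop columns with too many gaps.
        if missing_ratio(column) > max_missing:
            continue
        # Low Variance Filter: drop columns that are nearly constant.
        present = [value for value in column if value is not None]
        if pvariance(present) < min_variance:
            continue
        kept[name] = column
    return kept


# Hypothetical toy table: column name -> list of values.
table = {
    "income": [2.0, 3.5, 1.0, 4.2],
    "mostly_missing": [None, None, None, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
}
selected = filter_features(table)
print(sorted(selected))
```

Here only the `income` column survives: one column is dropped for having too many missing entries and the other for carrying no variation.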

MODULE-7
• Understanding the Basics of Logistic Regression
Logistic regression is a supervised machine learning algorithm mainly used for classification
tasks, where the goal is to predict the probability that an instance belongs to a given class. It
is a statistical algorithm that analyzes the relationship between a set of independent
variables and a binary dependent variable. It is a powerful tool for decision-making. For
example, it can classify an email as spam or not spam.
• Evaluation Metrics: Evaluation metrics are quantitative measures that assess the performance and
effectiveness of a statistical or machine learning model. They provide insights into how well the
model is performing and help in comparing different models or algorithms.
• Implementing Logistic Regression
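
The two ideas in this module can be sketched together. The block below is an illustrative example in plain Python, not the course's code: it shows how a logistic model turns a weighted sum into a probability with the sigmoid function, and how accuracy, precision, and recall are computed from predictions. The weights, bias, samples, and labels are hypothetical values chosen for the example; in practice the weights would be learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical model parameters and data, for illustration only.
weights, bias = [2.0, -1.0], -0.5
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y_true = [1, 0, 0, 0]

probabilities = [predict_proba(weights, bias, s) for s in samples]
y_pred = [1 if p >= 0.5 else 0 for p in probabilities]
metrics = classification_metrics(y_true, y_pred)
print(y_pred, metrics)
```

The 0.5 cut-off used to turn probabilities into class labels is a common default, and it can be moved to trade precision against recall.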

MODULE-8
• Introduction to Decision Tree
• Logic Behind Decision Tree
• Implementing Decision Tree
• Improving Model Performance by Pruning/Hyperparameters Tuning

A decision tree in machine learning is a versatile, interpretable algorithm used for predictive
modelling. It structures decisions based on input data, making it suitable for both classification
and regression tasks.

How is a decision tree formed?


The process of forming a decision tree involves recursively partitioning the data based on the values of
different attributes. The algorithm selects the best attribute to split the data at each internal node, based
on certain criteria such as information gain or Gini impurity. This splitting process continues until a
stopping criterion is met, such as reaching a maximum depth or having a minimum number of instances
in a leaf node.
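
A single split decision can be sketched as follows. This is a minimal example using Gini impurity on one numeric attribute, with candidate thresholds taken as midpoints between sorted distinct values; the function names and the toy data are assumptions for illustration, and a full tree would apply the same step recursively to each resulting partition.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split, weighted by the size of each side."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Return (threshold, impurity) of the best split, or (None, parent impurity)."""
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one numeric attribute and a class label.
incomes = [1.0, 2.0, 3.0, 4.0]
approved = ["N", "N", "Y", "Y"]
threshold, impurity = best_threshold(incomes, approved)
print(threshold, impurity)
```

On this toy data the split at 2.5 separates the two classes completely, so the weighted impurity drops from 0.5 at the parent node to 0.0.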

MODULE-9
• Basics of Ensemble Techniques
• Random Forest
• Implementation of Bagging and Random Forest

Ensemble methods are techniques that aim to improve the accuracy of results by
combining multiple models instead of using a single model. The combined models can increase
accuracy significantly, which has boosted the popularity of ensemble methods
in machine learning.

• Ensemble methods aim to improve predictive performance by combining several models into
one more reliable model.
• The most popular ensemble methods are boosting, bagging, and stacking.
• Ensemble methods suit both regression and classification, where they can reduce bias and variance to
boost the accuracy of models.
1. Bagging

Bagging, short for bootstrap aggregating, is mainly applied in classification and regression. It is
commonly used with decision trees, where it improves accuracy by reducing variance to a large extent.
Lower variance reduces overfitting, which is a challenge for many predictive models.

Bagging consists of two steps: bootstrapping and aggregation. Bootstrapping is a sampling
technique in which samples are drawn from the whole population (set) with replacement. Sampling
with replacement randomizes the selection procedure. The base learning algorithm is then run on
each sample.

Aggregation combines the predictions of all the base models into one outcome, for example by
majority vote in classification or by averaging in regression. Without aggregation, predictions
would be less accurate because not all outcomes would be taken into consideration.

Bagging is advantageous since weak base learners are combined to form a single strong learner that is more
stable than the individual learners. It also reduces variance, thereby reducing the overfitting of models. One
limitation of bagging is that it is computationally expensive. It can also lead to more bias in models when
the proper bagging procedure is not followed.
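
The bootstrapping and aggregation steps can be sketched in a few lines. This is an illustrative example only: the base learner here is a deliberately simple nearest-class-mean classifier on one feature (an assumption made to keep the code short, where the course used decision trees), and the data, seed, and number of models are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_nearest_mean(sample):
    """Base learner: store the mean feature value of each class."""
    by_class = {}
    for x, label in sample:
        by_class.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in by_class.items()}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def bagging_fit(data, n_models=25, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_nearest_mean(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_nearest_mean(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature, label) pairs with two classes.
data = [(float(x), 0) for x in range(1, 6)] + [(float(x), 1) for x in range(6, 11)]
models = bagging_fit(data)
print(bagging_predict(models, 0.0), bagging_predict(models, 11.0))
```

For a regression task, the majority vote in `bagging_predict` would be replaced by an average of the base learners' outputs.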

2. Boosting

Boosting is an ensemble technique that learns from previous predictor mistakes to make better predictions in
the future. The technique combines several weak base learners to form one strong learner, thus significantly
improving the predictability of models. Boosting works by arranging weak learners in a sequence, such that
each weak learner learns from the errors of the previous learner in the sequence to create a better predictive model.
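
The sequential idea can be sketched with one specific variant, gradient boosting for regression with squared error, in which each new weak learner (a one-split "stump") is fitted to the residual errors left by the learners before it. This variant, the learning rate, and the toy data are assumptions chosen for illustration; the course may have presented a different boosting algorithm such as AdaBoost.

```python
LEARNING_RATE = 0.5  # hypothetical shrinkage factor applied to each stump


def fit_stump(xs, residuals):
    """Best single-threshold split (needs at least two distinct x values)."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boosted_predict(base, stumps, x):
    """Start from the base value and add each stump's scaled correction."""
    return base + LEARNING_RATE * sum(stump_predict(s, x) for s in stumps)


def boost(xs, ys, n_rounds=20):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - boosted_predict(base, stumps, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps


def mse(xs, ys, base, stumps):
    return sum((y - boosted_predict(base, stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy regression data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
base, stumps = boost(xs, ys)
error_before = mse(xs, ys, base, [])
error_after = mse(xs, ys, base, stumps)
print(error_before, error_after)
```

Comparing the two printed errors shows how much the sequence of corrections improves on the constant starting prediction for the training data.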

3. Stacking

Stacking, another ensemble method, is often referred to as stacked generalization. This technique works by
training a model to combine the predictions of several other learning algorithms.

MODULE-10

• Clustering
• Understanding K-means
• Implementation of K-means

o The method of identifying similar groups of data in a large dataset is called clustering or cluster
analysis.
o Clustering is one of the most popular techniques in data science. Entities in each group are
comparatively more similar to one another than to entities in other groups.
o K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-defined clusters that need to be created in the
process. For example, if K=2, there will be two clusters, for K=3, there will be three clusters, and so on.
o The k-means clustering algorithm mainly performs two tasks:
   o Determines the best positions for the K center points, or centroids, by an iterative process.
   o Assigns each data point to its closest centroid. The data points nearest to a particular centroid
     form a cluster.

Hence each cluster contains data points with some commonalities, and it is separated from the other clusters.
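
The two tasks listed above can be sketched as a short loop that alternates between assigning points and updating centroids. This is a minimal illustration in plain Python: taking the first K points as the starting centroids is a simplification assumed here (random or smarter initialization is common in practice), and the points are hypothetical.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100):
    # Simplification: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iter):
        # Task 2: assign each data point to its closest centroid.
        assignments = [closest(p, centroids) for p in points]
        # Task 1: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clusters are stable
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

With K=2 the first three points end up in one cluster and the last three in the other, even though both starting centroids lie in the first group.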
LEARNING OUTCOMES:

Learning Machine Learning can lead to various learning outcomes, encompassing technical,
practical, and ethical aspects. Here are some of the learning outcomes observed during
the course:

1. **Machine Learning and Computer Vision Skills:**

i. **Understanding:** Gain a deeper understanding of machine learning algorithms
and computer vision techniques, especially those related to image classification
and object detection.

ii. **Practical Application:** Apply machine learning concepts to real-world
problems, enhancing practical skills in model development and training.

2. **Programming Proficiency:**

a. **Coding Skills:** Improve programming proficiency, particularly in languages like
Python, and become adept at using relevant libraries and frameworks such as
TensorFlow, PyTorch, and OpenCV.

3. **Model Deployment and Integration:**

a. **Deployment Skills:** Learn how to deploy machine learning models, integrate
them into applications, and ensure they work effectively in real-world scenarios.

4. **Data Preprocessing and Labeling:**

a. **Data Handling:** Develop skills in preprocessing and handling image data, as well as
labeling datasets for supervised learning tasks.

5. **System Architecture and Integration:**

a. **System Design:** Gain experience in designing and integrating components of a
system, including databases, backend services, and front-end interfaces.

6. **Cloud Computing Knowledge:**

a. **Cloud Services:** Understand the use of cloud platforms like AWS, Azure, or
Google Cloud for deploying and scaling machine learning applications.

7. **Containerization:**

a. **Docker Skills:** Learn how to use Docker for containerization, ensuring consistency and
portability of applications across different environments.

8. **Continuous Integration/Continuous Deployment (CI/CD):**

a. **CI/CD Practices:** Gain familiarity with CI/CD tools and practices for automating testing,
integration, and deployment processes.

9. **Project Management:**

a. **Project Planning:** Develop project management skills by planning and executing tasks
within a timeline, including milestones and deliverables.

10. **Privacy and Ethical Considerations:**

a. **Ethical Awareness:** Understand the ethical implications of developing and deploying
technology, particularly in the context of privacy and data security.

PROJECT OVERVIEW
HOME LOAN PREDICTION MODEL

Description: Generally, loan prediction involves the lender looking at various background information about
the applicant and deciding whether the bank should grant the loan. Parameters like credit score, loan
amount, lifestyle, career, and assets are the deciding factors in getting the loan approved. If, in the past,
people with parameters similar to yours have paid their dues on time, it is more likely that your loan will be
granted as well.

Machine learning algorithms can exploit this dependency on past experiences and comparisons with other
applicants, and formulate a data science problem to predict the loan status of a new applicant using similar
rules.

Objective: This model is based on SVM (support vector machine), one of the most popular
supervised learning algorithms, which is used to solve both classification and regression problems. However,
it is primarily used for classification problems.

The main goal of SVM is to find the best line or decision boundary (hyperplane) that
segregates n-dimensional space into classes, so that a new data point can easily be put in the correct
category in the future.
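
How a hyperplane places a new data point in a category can be sketched as follows. This example shows only the decision rule of a linear SVM, not the training step that finds the hyperplane: the weights and bias are hypothetical values picked for illustration, whereas a trained model (for instance one fitted with a library such as scikit-learn) would learn them from the loan data.

```python
import math


def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign tells which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(weights, bias, point):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1


def distance_to_hyperplane(weights, bias, point):
    """Perpendicular distance from the point to the decision boundary."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(weights, bias, point)) / norm


# Hypothetical hyperplane in two dimensions: x1 + x2 - 5 = 0.
weights, bias = [1.0, 1.0], -5.0

for point in [(1.0, 1.0), (4.0, 4.0)]:
    print(point, classify(weights, bias, point), distance_to_hyperplane(weights, bias, point))
```

Training an SVM amounts to choosing the weights and bias so that this distance, for the training points closest to the boundary, is as large as possible.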

Loan-Prediction-Classification

o A classification problem which predicts if a loan will get approved or not.
o Dataset: The data has 615 rows and 13 columns.
o Dataset description (Variable - Description):
   o Loan_ID - Unique Loan ID
   o Gender - Male/ Female
   o Married - Applicant married (Y/N)
   o Dependents - Number of dependents
   o Education - Applicant Education (Graduate/ Under Graduate)
   o Self_Employed - Self employed (Y/N)
   o ApplicantIncome - Applicant income
   o CoapplicantIncome - Coapplicant income
   o LoanAmount - Loan amount in thousands
   o Loan_Amount_Term - Term of loan in months
   o Credit_History - Credit history meets guidelines
   o Property_Area - Urban/ Semi Urban/ Rural
   o Loan_Status - Loan approved (Y/N)


SAMPLE FROM MODEL:

Training data:

Test data:

Output: The accuracy on the training data was around 0.79 (about 79%) and that on the test data was
observed to be 0.84 (about 84%), which are reasonably close to each other.
UTILITY OF COURSE:

CONCLUSION:

After completion of the course it can be concluded that machine learning is a powerful tool for solving
complex problems, and it has a wide range of applications. Whether you’re looking to predict stock prices,
classify images, or understand natural language, there is a machine learning model that can help you achieve
your goals. With its ability to learn from data and make predictions, machine learning has the potential to
revolutionize the way we work and live.
REFERENCES:

1. Reference Books:

Caruana, Rich, et al. “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-
day readmission.” Proceedings of the 21th ACM SIGKDD international conference on knowledge
discovery and data mining. (2015).

Ancona, Marco, et al. “Towards better understanding of gradient-based attribution methods for deep
neural networks.” arXiv preprint arXiv:1711.06104 (2017).

2. Reference Links:
For datasets:
https://kaggle.com/datasets/
YouTube: https://youtu.be/XckM1pFgZmg?
https://www.projectpro.io/
Training, validation, and creation of the model were done on JupyterLab:
https://jupyter.org/

*********THE END********
