Course Report
MACHINE LEARNING
COURSE
REPORT
By: Aparna Joshi
Roll no: 2100910130026
Under the mentorship of Mrs. Aparna Srivastava
TABLE OF CONTENTS:
Declaration
Acknowledgement
Certificate
Introduction
Modules
Learning outcomes
Project
References
Declaration
I hereby declare that this submission is my own work and that, to the best of my knowledge
and belief, it contains no material previously published or written by another person, nor
material which to a substantial extent has been accepted for the award of any other degree or
diploma of the university or other institute of higher learning, except where due
acknowledgement has been made in the text.
Roll No:2100910130026
Date: 22/12/23
Acknowledgement
We would like to express our sincere gratitude to our mentor, Mrs. Aparna
Srivastava, Assistant Professor, Department of Information Technology, whose role as
project guide was extremely valuable for the project. We are thankful for the keen interest
she took in advising us and helping us throughout the project; without her guidance this
project would not have been possible.
We are also thankful to all the Professors and Faculty Members in the department for their
teaching and academic support.
The course taken was Machine Learning, taught by Kunal Jain, Pranav Dar, and Aishwarya Singh
on Internshala.
MODULE-2
• Training Overview Video
• Types of Data
• Graphical and Analytical Representation of Data
• Limitations of Traditional Data Analysis
MODULE-3
• Introduction to Python and Installing Jupyter Notebook
• Basic Libraries in Python (Pandas, Numpy, Matplotlib)
• Understanding Basics of Python Programming (Conditional- Iterative Statements and Function)
• Basic Data Exploration
• Advanced Functions for Data Manipulation
MODULE-4
• Context Setting and Problem Statement
• Data exploration - Target Variable
• Data Exploration - Independent Numerical Variables
• Data Exploration - Categorical Variables
• Splitting of Data
• Feature Scaling of Data
MODULE-5
• Building Your First Predictive Model (Regression) and Evaluate Performance
• Introduction to Linear Regression
• Understanding Gradient Descent
• Assumptions of Linear Regression
• Implementing Linear Regression
• Feature Engineering
MODULE-6
• Common Dimensionality Reduction Techniques
A primary aim of dimensionality reduction is to avoid overfitting. Training data with
considerably fewer features helps keep the model simple, so that it makes fewer
assumptions.
• Filter strategy
• Wrapper strategy
• Embedded strategy
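The three strategies listed above can be contrasted with a minimal sketch. It assumes scikit-learn is available and uses synthetic data; the specific selectors (an ANOVA F-test for the filter, recursive feature elimination for the wrapper, random forest importances for the embedded strategy) are illustrative choices, not necessarily the ones used in the course.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2, random_state=0
)

# Filter: score every feature on its own, without training a predictive model.
filter_selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper: repeatedly fit a model and discard the weakest feature each round.
wrapper_selector = RFE(
    LogisticRegression(max_iter=1000), n_features_to_select=5
).fit(X, y)

# Embedded: selection falls out of training itself, here via tree importances.
# By default, features with importance at or above the mean importance are kept.
embedded_selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)
).fit(X, y)

selectors = {
    "filter": filter_selector,
    "wrapper": wrapper_selector,
    "embedded": embedded_selector,
}
for name, selector in selectors.items():
    kept = np.flatnonzero(selector.get_support())
    print(f"{name:9s} keeps features {kept.tolist()}")
```

The filter is the cheapest because no model is trained, while the wrapper is the most expensive because the model is refitted many times.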
MODULE-7
• Understanding the Basics of Logistic Regression
Logistic regression is a supervised machine learning algorithm mainly used for classification
tasks, where the goal is to predict the probability that an instance belongs to a given class. It
is a statistical algorithm that analyzes the relationship between a set of independent
variables and a binary dependent variable. It is a powerful tool for decision-making, for
example deciding whether an email is spam or not.
• Evaluation Metrics: Evaluation metrics are quantitative measures that assess the performance and
effectiveness of a statistical or machine learning model. They provide insights into how well the
model is performing and help in comparing different models or algorithms.
• Implementing Logistic Regression
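As a rough illustration of this module, the sketch below fits a logistic regression model and computes a few common evaluation metrics. It assumes scikit-learn and uses synthetic data rather than the course dataset; the choice of metrics is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(class = 1).
proba = model.predict_proba(X_test)[:, 1]
# predict applies a 0.5 threshold to that probability.
pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name:9s} {value:.3f}")
```

Accuracy alone can mislead when one class is much rarer than the other, which is why precision and recall are usually reported alongside it.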
MODULE-8
• Introduction to Decision Tree
• Logic Behind Decision Tree
• Implementing Decision Tree
• Improving Model Performance by Pruning/Hyperparameters Tuning
A decision tree in machine learning is a versatile, interpretable algorithm used for predictive
modelling. It structures decisions based on input data, making it suitable for both classification
and regression tasks.
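The effect of pruning and hyperparameter tuning can be seen in a small sketch. It assumes scikit-learn and its bundled breast cancer dataset; the values chosen for `max_depth` and `ccp_alpha` are arbitrary illustrations, not tuned settings from the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A fully grown tree keeps splitting until its leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A pruned tree: max_depth caps growth up front, and ccp_alpha removes
# branches whose contribution does not justify their complexity.
pruned_tree = DecisionTreeClassifier(
    max_depth=3, ccp_alpha=0.01, random_state=0
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(
        f"{name:6s} leaves={tree.get_n_leaves():3d} "
        f"train={tree.score(X_train, y_train):.3f} "
        f"test={tree.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting that pruning is meant to reduce.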
MODULE-9
• Basics of Ensemble Techniques
• Random Forest
• Implementation of Bagging and Random Forest
Ensemble methods are techniques that aim to improve the accuracy of results by
combining multiple models instead of using a single model. The combined models can increase the
accuracy of the results significantly, which has boosted the popularity of ensemble methods
in machine learning.
• Ensemble methods aim at improving predictability in models by combining several models to make
one very reliable model.
• The most popular ensemble methods are boosting, bagging, and stacking.
• Ensemble methods are ideal for regression and classification, where they reduce bias and variance to
boost the accuracy of models.
1. Bagging
Bagging, the short form of bootstrap aggregating, is mainly applied in classification and regression. It
is most often used with decision trees, where it reduces variance to a large extent. The
reduction in variance improves accuracy and reduces overfitting, which is a challenge for many predictive
models.
Bagging consists of two steps, bootstrapping and aggregation. Bootstrapping is a sampling
technique where samples are drawn from the whole population (set) with replacement.
Sampling with replacement makes the selection procedure randomized. The base learning
algorithm is then run on each sample to complete the procedure.
Aggregation in bagging combines the predictions of all the base models into a single
outcome. Without aggregation, predictions would be less accurate because not all outcomes are taken into
consideration. The aggregation is based either on the predicted probabilities from the bootstrapped models or on
all of their predicted outcomes, for example by majority vote.
Bagging is advantageous since weak base learners are combined to form a single strong learner that is more
stable than the individual learners. It also reduces variance, thereby reducing the overfitting of models. One
limitation of bagging is that it is computationally expensive. It can also lead to more bias in models when
the proper procedure of bagging is ignored.
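The two steps, bootstrapping and aggregation, can be written out by hand in a short sketch. It assumes NumPy and scikit-learn decision trees as the base learners, with synthetic data; the number of models is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with labels 0 and 1 (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rng = np.random.default_rng(0)
n_models = 25  # odd, so a majority vote between two classes cannot tie

trees = []
for i in range(n_models):
    # Bootstrapping: draw as many rows as the training set has, with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Aggregation: each tree votes, and the majority class wins.
votes = np.stack([tree.predict(X_test) for tree in trees])  # (n_models, n_test)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
single_acc = single_tree.score(X_test, y_test)
bagged_acc = float((bagged_pred == y_test).mean())
print(f"single tree accuracy: {single_acc:.3f}")
print(f"bagged trees accuracy: {bagged_acc:.3f}")
```

In practice scikit-learn's `BaggingClassifier` and `RandomForestClassifier` wrap these steps; the manual loop is shown only to make them visible.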
2. Boosting
Boosting is an ensemble technique that learns from previous predictor mistakes to make better predictions in
the future. The technique combines several weak base learners to form one strong learner, thus significantly
improving the predictability of models. Boosting works by arranging weak learners in a sequence, such that
each weak learner learns from the mistakes of the previous learner in the sequence to create a better predictive model.
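A minimal sketch of this sequential idea follows. It assumes scikit-learn and uses gradient boosting with depth-1 trees as the weak learners, which is one of several boosting variants and not necessarily the one covered in the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each stage adds one depth-1 tree fitted to the errors left by earlier stages.
booster = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0
).fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each added learner.
staged_acc = [
    float((stage_pred == y_test).mean())
    for stage_pred in booster.staged_predict(X_test)
]

for n_learners in (1, 10, 100):
    print(f"{n_learners:3d} learners: test accuracy {staged_acc[n_learners - 1]:.3f}")
```

Printing the accuracy at several stages shows how the ensemble changes as more weak learners are added to the sequence.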
3. Stacking
Stacking, another ensemble method, is often referred to as stacked generalization. This technique works by
training one learning algorithm (a meta-learner) to combine the predictions of several other learning algorithms.
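Stacking can be sketched as follows, assuming scikit-learn. The base learners (a shallow decision tree and k-nearest neighbours) and the logistic regression meta-learner are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The meta-learner is trained on cross-validated predictions of the base
# learners, so it learns how much to trust each of them.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
).fit(X_train, y_train)

stack_acc = stack.score(X_test, y_test)
print(f"stacked accuracy: {stack_acc:.3f}")
```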
MODULE-10
• Clustering
• Understanding K-means
• Implementation of K-means
o The method of identifying similar groups of data in a large dataset is called clustering or cluster
analysis.
o It is one of the most popular techniques in data science. Entities in
each group are more similar to one another than to entities in the other groups.
o K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into
different clusters. Here K is the number of clusters to be created, chosen in advance:
if K=2 there will be two clusters, if K=3 there will be three clusters, and so on.
o The k-means clustering algorithm mainly performs two tasks:
o Determines the best positions for the K center points (centroids) by an iterative process.
o Assigns each data point to its closest centroid. The data points nearest to a particular
centroid form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the other clusters.
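The two tasks described above can be sketched directly in NumPy. This is a minimal illustration on synthetic two-dimensional data, not the course's implementation; the function name `k_means` and its parameters are hypothetical.

```python
import numpy as np


def k_means(points, k, n_iters=100, seed=0):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, labels


# Two well-separated synthetic blobs (an assumption for illustration).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = k_means(points, k=2)
print("centroids:\n", centroids.round(2))
print("cluster sizes:", np.bincount(labels, minlength=2).tolist())
```

With K=2 and two clearly separated blobs, each blob should end up as its own cluster; on real data the result depends on the initial centroids and on the choice of K.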
LEARNING OUTCOMES:
Studying machine learning can lead to various learning outcomes, encompassing technical,
practical, and ethical aspects. Some of the learning outcomes observed
in the course are:
• Programming Proficiency
• Cloud Services: Understand the use of cloud platforms like AWS, Azure, or
Google Cloud for deploying and scaling machine learning applications.
• Containerization
o Docker Skills: Learn how to use Docker for containerization, ensuring consistency and
portability of applications across different environments.
o CI/CD Practices: Gain familiarity with CI/CD tools and practices for automating testing,
integration, and deployment processes.
• Project Management
o Project Planning: Develop project management skills by planning and executing tasks
PROJECT OVERVIEW
HOME LOAN PREDICTION MODEL
Description: Generally, loan prediction involves the lender looking at various background information about
the applicant and deciding whether the bank should grant the loan. Parameters like credit score, loan
amount, lifestyle, career, and assets are the deciding factors in getting the loan approved. If, in the past,
people with parameters similar to yours have paid their dues on time, it is more likely that your loan would be
granted as well.
Machine learning algorithms can exploit this dependency on past experiences and comparisons with other
applicants and formulate a data science problem to predict the loan status of a new applicant using similar
rules.
Objective: This model is based on SVM (support vector machine), which is one of the most popular
supervised learning algorithms and is used to solve both classification and regression problems. However,
it is primarily used for classification problems.
The main goal of SVM is to create the best line or decision boundary (hyperplane) that can
segregate n-dimensional space into classes, so that new data points can easily be put in the correct
category in the future.
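The idea of a separating hyperplane can be sketched as below. This is not the project's actual code or dataset: it assumes scikit-learn, a linear kernel, and synthetic data standing in for the loan records.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for applicant features and loan status (an assumption).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# SVMs are sensitive to feature scale, so the features are standardized first.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)

# With a linear kernel the hyperplane is w . x + b = 0 in the scaled feature space.
svm = clf.named_steps["svc"]
print("w shape:", svm.coef_.shape, "b:", svm.intercept_.round(3))

# The sign of the decision function tells which side of the hyperplane a point is on.
decision = clf.decision_function(X_test)
pred = clf.predict(X_test)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Comparing training and test accuracy, as the report does for its own model, is a quick check for overfitting.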
Loan-Prediction-Classification
o Dataset Description-
o Variable
o Description
Training data:
Test data:
Output: The accuracy on the training data was around 0.79 (79%) and that on the test data was observed to be 0.84 (84%),
which are reasonably close.
UTILITY OF COURSE:
CONCLUSION:
After completion of the course it can be concluded that machine learning is a powerful tool for solving
complex problems, and it has a wide range of applications. Whether you’re looking to predict stock prices,
classify images, or understand natural language, there is a machine learning model that can help you achieve
your goals. With its ability to learn from data and make predictions, machine learning has the potential to
revolutionize the way we work and live.
REFERENCES:
1. Reference Books:
Caruana, Rich, et al. “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-
day readmission.” Proceedings of the 21th ACM SIGKDD international conference on knowledge
discovery and data mining. (2015).
Ancona, Marco, et al. “Towards better understanding of gradient-based attribution methods for deep
neural networks.” arXiv preprint arXiv:1711.06104 (2017).
2. Reference Links:
For datasets:
https://kaggle.com/datasets/
YouTube: https://youtu.be/XckM1pFgZmg?
https://www.projectpro.io/
Training, validation, and creation of the model were done in JupyterLab:
https://jupyter.org/
*********THE END********