Machine Learning
UNIT - 1
INTRODUCTION TO MACHINE LEARNING AND DATA PREPROCESSING
Introduction to Machine Learning (ML) — Types of ML (Supervised, Unsupervised,
Reinforcement Learning) — Applications of ML — Data Preprocessing: Data
Cleaning, Handling Missing Values, Feature Scaling (Normalization, Standardization),
Data Splitting (Training, Validation, Test Sets), Data Encoding (Categorical, One-hot
Encoding)
Basics of AI and ML
Artificial Intelligence (AI) is the field of making machines (like computers)
perform tasks that normally need human intelligence, such as learning,
reasoning, and solving problems. For example, when you talk to Siri or Alexa,
they interpret your speech and respond in natural language.
Advantages of ML
•Interpretability: Models like decision trees and linear regression are easier to understand and explain
than deep neural networks.
•Lower Computational Requirements: Generally requires less powerful hardware and computational
resources.
•Effectiveness with Small Data: Performs well with smaller datasets where deep learning might struggle.
•Faster Training: Typically has shorter training times compared to deep learning models.
•Flexibility with Different Algorithms: Offers a variety of algorithms for different types of problems.
•Ease of Implementation: Often simpler to implement and fine-tune compared to deep learning.
•Less Data Augmentation: Many traditional ML algorithms work without the data augmentation that deep
learning often relies on, although the data still needs cleaning, scaling, and encoding.
•Suitability for Structured Data: Performs exceptionally well with structured data like tabular data.
•Lower Risk of Overfitting: Easier to regularize and avoid overfitting, especially with smaller datasets.
•Wider Applicability: Suitable for a broad range of applications, including finance, healthcare, and
marketing.
Drawbacks of ML
•Manual Feature Engineering: Requires manual extraction and selection of features by domain experts,
which can be time-consuming and may not capture all relevant information.
•Performance on Complex Data: May struggle with complex data types such as images, audio, and
unstructured text, requiring extensive preprocessing and domain-specific knowledge.
•Scalability with Data Size: Often requires a large number of hand-crafted features and may not scale well
to large datasets because of computational and memory limitations.
Note that interpretability, modest computational requirements, and good performance on smaller datasets
are strengths of traditional ML rather than drawbacks; they are the trade-off for the limitations listed
above.
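The manual feature engineering mentioned in the drawbacks list can be illustrated with a small sketch.
The function name, the chosen features, and the sample message are all hypothetical; in practice a
domain expert decides which features are worth extracting.

```python
def extract_features(message):
    """Turn a raw text message into a few hand-crafted numeric features.

    These three features are illustrative assumptions, not a recommended set.
    """
    words = message.lower().split()
    return {
        "num_words": len(words),                 # message length in words
        "num_exclamations": message.count("!"),  # how "shouty" the text is
        "mentions_free": int("free" in words),   # 1 if the word "free" appears
    }


# Hypothetical example message
features = extract_features("Win a FREE prize now!!!")
print(features)
```

Every feature here had to be thought of and coded by hand, which is why this step becomes expensive
for complex data such as images or audio.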
Types of ML
In supervised learning, we train a model using data with known answers
(labels). Because the correct answers are known, the algorithm can compare
its predictions with them and adjust the model to reduce its errors. Once
trained, the model can predict outcomes for new, unseen data.
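As a minimal sketch of this idea, the example below fits a straight line to labeled data and then
predicts the answer for a new input. The data (hours studied and exam scores) and the function names
are made up for illustration, and least-squares line fitting is only one of many supervised algorithms.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from inputs xs and known answers ys
    using the closed-form least-squares solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the trained model to predict the answer for a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied (input) and exam score (label)
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(hours, scores)      # training step
prediction = predict(model, 6.0)     # prediction for unseen input
print(model, prediction)
```

The training step only ever sees the labeled pairs; the value 6.0 is new data the model has not seen.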
In unsupervised learning, the training data is not labeled, meaning no
correct answers are attached to it. Since the data isn't labeled, the
algorithm gets no guidance about what the right output is. This data is fed
to the machine learning algorithm, which looks for patterns or structure on
its own, such as groups of similar items.
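A minimal sketch of pattern finding without labels is shown below: a simplified two-cluster version
of k-means on one-dimensional data. The function name and the spending values are hypothetical, and
starting the centers at the minimum and maximum is a simplification chosen to keep the example
deterministic.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups around two centers."""
    centers = [min(values), max(values)]  # simple deterministic start
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        # Assignment step: put each value in the group of its nearest center
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical unlabeled data: monthly spending of six customers
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
centers, groups = two_means(spend)
print(centers, groups)
```

No one told the algorithm which customers are "low" or "high" spenders; the two groups emerge from
the data alone.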
Reinforcement Learning (RL) is a type of machine learning where an agent
learns to make decisions by interacting with an environment and receiving
rewards or penalties. The agent improves its strategy over time based on
the feedback it receives. It’s like learning through trial and error.
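The trial-and-error loop can be sketched with a two-armed bandit, one of the simplest RL settings.
The environment (two slot machines with made-up payout probabilities), the function name, and the
epsilon-greedy strategy are assumptions chosen for illustration, not the only way to do RL.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Agent repeatedly picks an action, receives a reward, and updates
    its estimate of how good each action is (epsilon-greedy strategy)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # learned value of each action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # Environment gives feedback: reward 1 with the action's probability
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental average of rewards seen for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: action 1 pays out more often than action 0
estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, counts, best_action)
```

The agent is never told the payout probabilities; it discovers the better action only through the
rewards it receives.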