Advanced Machine Learning 3

The document discusses advanced machine learning concepts, including concept learning through the Find-S algorithm, fraud detection using the PyCaret library, and various word embedding techniques in NLP. It illustrates how concept learning derives general rules from specific examples, highlights the automation of fraud detection with minimal coding, and compares traditional and advanced word embedding methods. Key insights emphasize the importance of feature engineering in fraud detection and the advantages of modern embedding techniques for understanding word meanings.


Master of Business Administration

Batch 2023-25

Advanced Machine Learning

Assignment 3

Submitted to:
Dr. Praveen Gujjar

Submitted By:
Rishwanth GS
USN: 23MBAR0069
1. Demonstrate the Use of Concept Learning with Example
Concept learning is the process of deriving a general rule from specific training examples. It
plays a crucial role in supervised machine learning, where a model learns to classify new
data based on given attributes. One of the simplest approaches in concept learning is the
Find-S Algorithm, which finds the most specific hypothesis that fits all positive examples.
The algorithm starts with the most restrictive hypothesis and generalizes it as it encounters
new positive examples.
Example: Predicting if a Person Will Play Tennis
Consider the following dataset where we predict whether a person will play tennis based on
weather conditions.

Outlook Temperature Humidity Wind Play Tennis

Sunny Hot High Weak No

Sunny Hot High Strong No

Overcast Hot High Weak Yes

Rain Mild High Weak Yes

Rain Cool Normal Weak Yes

Using Find-S, we begin with the most specific hypothesis, h = (∅, ∅, ∅, ∅), where ∅ means
"no value accepted". The first positive example replaces each ∅ with that example's attribute
value; each later positive example generalizes any attribute that disagrees to '?':
1. First positive example (Overcast, Hot, High, Weak): h = (Overcast, Hot, High, Weak)
2. Second positive example (Rain, Mild, High, Weak): h = (?, ?, High, Weak)
3. Third positive example (Rain, Cool, Normal, Weak): h = (?, ?, ?, Weak)
Final Hypothesis
(?, ?, ?, Weak) → The person will play tennis if the wind is weak, regardless of other
conditions.
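The generalization steps above can be sketched in plain Python (a minimal illustration using
the tennis dataset from the table; attribute order is Outlook, Temperature, Humidity, Wind):

```python
# Find-S: find the most specific hypothesis consistent with all positive examples.
# The returned hypothesis uses '?' to mean "any value is acceptable".

def find_s(examples):
    hypothesis = None
    for attributes, label in examples:
        if label != "Yes":            # Find-S ignores negative examples
            continue
        if hypothesis is None:        # first positive example initializes h
            hypothesis = list(attributes)
        else:                         # generalize attributes that disagree to '?'
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

data = [
    (("Sunny",    "Hot",  "High",   "Weak"),   "No"),
    (("Sunny",    "Hot",  "High",   "Strong"), "No"),
    (("Overcast", "Hot",  "High",   "Weak"),   "Yes"),
    (("Rain",     "Mild", "High",   "Weak"),   "Yes"),
    (("Rain",     "Cool", "Normal", "Weak"),   "Yes"),
]

print(find_s(data))  # ['?', '?', '?', 'Weak']
```

Running this reproduces the final hypothesis derived above: only Wind = Weak survives all
three positive examples.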
Limitations
 Find-S assumes there are no contradictory examples (all data is clean).
 It does not handle negative examples or missing data well.
 It finds only one hypothesis and ignores alternative possibilities.
Despite its simplicity, Find-S introduces the core idea of concept learning and is a stepping
stone to more advanced classification models like Decision Trees and Neural Networks.

2. Illustrate Fraud Detection Using PyCaret


Fraud detection is a critical machine learning application that helps identify suspicious
transactions in financial systems. It involves classifying transactions as either fraudulent (1)
or non-fraudulent (0) based on various features such as transaction amount, location, and
user behavior. PyCaret, an automated machine learning (AutoML) library, simplifies fraud
detection by handling data preprocessing, model selection, and evaluation with minimal
coding.
Example: Detecting Fraudulent Transactions
We use a dataset where each transaction has features like amount, time, transaction type, and
location. Our goal is to predict whether a transaction is fraudulent.
Step 1: Install and Import PyCaret
!pip install pycaret
from pycaret.classification import *
import pandas as pd
Step 2: Load the Fraud Detection Dataset
from pycaret.datasets import get_data
data = get_data('fraud') # Example dataset
print(data.head())
Step 3: Set Up PyCaret
clf = setup(data, target='Class', session_id=123)
 target='Class' means we are predicting whether a transaction is fraudulent.
Step 4: Train a Model
model = create_model('rf') # Train a Random Forest classifier
Step 5: Evaluate the Model
evaluate_model(model)
This opens an interactive dashboard to analyze performance using metrics like Accuracy,
Precision, Recall, and F1-score.
Step 6: Make Predictions
predictions = predict_model(model)
print(predictions[['Class', 'Label']].head())
 Class is the actual fraud label, while Label is the model’s prediction (in PyCaret 3.x
this column is named prediction_label).
Key Insights
1. PyCaret automates fraud detection by selecting the best algorithm.
2. Fraud detection requires feature engineering, including past transaction history
and behavioral analysis.
3. Combining PyCaret with deep learning and anomaly detection can further
improve fraud identification.
This approach makes fraud detection accessible even for those with minimal coding
experience while delivering high accuracy.
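To make the feature-engineering point concrete without any library, the sketch below derives
one simple behavioral feature: how far a new transaction amount deviates from a user's past
average. The function name and the idea of flagging large deviations are illustrative
assumptions, not part of PyCaret; such a feature would be added to the dataset before setup().

```python
from statistics import mean, pstdev

def amount_zscore(history, amount):
    """Standard deviations between a new amount and the user's historical average."""
    avg = mean(history)
    sd = pstdev(history) or 1.0   # avoid division by zero for constant histories
    return (amount - avg) / sd

# Past transaction amounts for one user
history = [20.0, 35.0, 25.0, 30.0]

print(round(amount_zscore(history, 27.0), 2))   # typical amount -> small deviation
print(round(amount_zscore(history, 500.0), 2))  # unusually large -> worth flagging
```

A model trained on such derived features (rather than raw amounts alone) can pick up the
behavioral patterns mentioned in the insights above.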

3. Highlight the Various Word Embedding Techniques


Word embeddings are techniques used in Natural Language Processing (NLP) to convert
words into numerical vectors while preserving their meaning. Traditional text
representation methods, such as One-Hot Encoding and TF-IDF, do not capture semantic
relationships between words. More advanced techniques like Word2Vec, GloVe, FastText,
and BERT allow models to understand word meanings based on context.
Example: Different Word Embedding Techniques
1. One-Hot Encoding
 Assigns a unique binary vector to each word in a vocabulary.
 Example: For ["dog", "cat", "fish"]:
o dog = [1, 0, 0], cat = [0, 1, 0], fish = [0, 0, 1]
 Limitation: High dimensionality and no word similarity captured.
2. TF-IDF (Term Frequency-Inverse Document Frequency)
 Weighs words based on how often they appear in a document relative to
all documents.
 Example: "Apple" in a fruit-related article will have a higher TF-IDF score than in
an article about technology.
 Limitation: Does not capture word meaning or relationships.
3. Word2Vec
 Developed by Google, it predicts words based on context.
 Types:
o CBOW (Continuous Bag of Words): Predicts a word from surrounding words.
o Skip-Gram: Predicts surrounding words from a given word.
 Example: "King" − "Man" + "Woman" ≈ "Queen"
 Advantage: Captures semantic relationships.
4. GloVe (Global Vectors for Word Representation)
 Uses word co-occurrence statistics.
 Example: "Ice" and "winter" appear together frequently, so their vectors will
be similar.
 Advantage: Captures meaning based on word usage patterns.
5. FastText
 Enhances Word2Vec by considering subwords.
 Example: The word "playing" is represented by character n-grams such as <pl, pla, lay,
ayi, yin, ing, ng>, improving rare word handling.
 Advantage: Useful for morphologically rich languages.
6. BERT (Bidirectional Encoder Representations from
Transformers)
 Uses deep learning to generate context-aware embeddings.
 Example:
o In "I deposited money in the bank", "bank" means financial institution.
o In "She sat by the river bank", "bank" means land near a river.
 Advantage: Most powerful for context-aware NLP tasks.
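The contrast between the count-based techniques above can be shown with only the standard
library. The sketch below builds one-hot vectors and raw TF-IDF scores for a toy corpus;
the exact IDF formula (unsmoothed log) is an assumption, and real libraries such as
scikit-learn apply different smoothing and normalization:

```python
import math

docs = [
    ["apple", "fruit", "sweet"],
    ["apple", "phone", "technology"],
    ["fruit", "market", "fresh"],
]

vocab = sorted({w for d in docs for w in d})

# One-hot: each word gets a unique standard-basis vector (no similarity captured)
one_hot = {w: [1 if i == j else 0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

def tf_idf(word, doc, docs):
    tf = doc.count(word) / len(doc)            # term frequency within the document
    df = sum(word in d for d in docs)          # number of documents containing the word
    idf = math.log(len(docs) / df)             # plain (unsmoothed) idf, an assumption
    return tf * idf

# "apple" appears in 2 of 3 docs, so it scores lower than "sweet" (1 of 3)
print(tf_idf("apple", docs[0], docs))
print(tf_idf("sweet", docs[0], docs))
```

Note that both representations treat "apple" identically in the fruit and phone documents;
it is exactly this limitation that Word2Vec, GloVe, FastText, and BERT address.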
