Unit 3 & 4 BDA Notes
Classification
Examples:
Decision Trees
Random Forest
Naive Bayes
Neural Networks
Prediction
Prediction, on the other hand, deals with forecasting continuous values based
on input data. It is widely used in regression tasks where the output is a
numeric value rather than a category.
Examples:
Weather forecasting
Sales forecasting
Linear Regression
Polynomial Regression
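As a rough illustration of prediction with linear regression, the sketch below fits a straight line by ordinary least squares and uses it to forecast a new value. The function name fit_line and the spend/sales numbers are made up for this example and are not taken from the notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend vs. sales (illustrative numbers only).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)

# Predict a continuous value for an unseen input.
forecast = slope * 5.0 + intercept
print(slope, intercept, forecast)
```

The output is a number rather than a class label, which is the distinction between prediction and classification drawn above.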
1. Overfitting – Decision trees can grow very deep, making them overly
complex and prone to memorizing rather than generalizing.
2. Instability – A small change in the data can lead to a completely
different tree structure.
2. At each step, the algorithm selects the best feature to divide the data,
often using metrics like Gini Index or Entropy (Information Gain).
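The split metrics named above can be sketched as follows. This is a minimal illustration under simple assumptions (categorical labels, one candidate split); the function names and the tiny yes/no data set are invented for the example.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Hypothetical labels before and after a candidate split.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]

print(gini(parent), entropy(parent), information_gain(parent, split))
```

A tree-building algorithm would evaluate a score like this for every candidate feature and keep the split with the highest gain (or lowest Gini index).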
Bayesian Classification
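Bayes' Theorem
P(A|B) = P(B|A) * P(A) / P(B)
Where:
P(A|B) is the probability of event A occurring given that event B has
occurred.
P(B|A) is the probability of event B occurring given that event A has
occurred.
P(A) and P(B) are the probabilities of A and B on their own.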
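A small worked example may help. Bayes' theorem states P(A|B) = P(B|A) * P(A) / P(B); the sketch below applies it to a spam-filter scenario. All probabilities here are hypothetical numbers chosen for illustration, and posterior is an invented helper name.

```python
def posterior(prior_a, likelihood_b_given_a, prob_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / prob_b


# Hypothetical numbers: A = "message is spam", B = "message contains 'free'".
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# Law of total probability gives P(B).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(round(p_spam_given_word, 4))
```

With these assumed inputs, seeing the word raises the probability of spam from the prior of 0.2 to about 0.71.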
Limitations
Advantages of Backpropagation
Classification Accuracy
Recall – Captures how well the model finds all relevant instances.
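Recall is commonly computed as true positives divided by the sum of true positives and false negatives. The sketch below shows this for binary labels; the function name and the label lists are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Return 0.0 when there are no positive instances at all.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

score = recall(y_true, y_pred)
print(score)
```

Here three of the four actual positives are found, so the recall is 0.75; the false positive at the fifth position does not affect recall.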
Improving Accuracy
Clustering
Types of Clustering:
Fraud detection
Spatial Mining
Web Mining
Web mining is the process of extracting useful insights from web data,
including webpages, links, and user interactions. It helps businesses,
researchers, and organizations gain valuable knowledge from online sources.
Applications:
E-commerce personalization
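One basic web mining step is pulling the links out of a page so that the link structure can be analysed. The sketch below does this with Python's standard html.parser module; the LinkExtractor class name and the sample HTML are assumptions made for the example, and a real crawler would also need fetching, URL normalisation, and politeness rules.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content (not fetched from any real site).
page = (
    '<html><body>'
    '<a href="/cart">Cart</a> '
    '<a href="https://example.com/item/1">Item</a>'
    '</body></html>'
)

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```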
Text Mining
Text mining focuses on extracting meaningful insights from unstructured text
data, such as emails, articles, and social media posts. It combines natural
language processing (NLP) and machine learning techniques.
Key Techniques:
Applications:
Spam detection
Automated summarization
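A common first step in text mining is turning raw text into word counts that a machine learning model can use. The sketch below tokenizes a few messages and builds term frequencies; the function name, the regular expression, and the sample messages are illustrative assumptions rather than a technique named in the notes.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical messages, e.g. candidates for spam detection.
messages = ["Win a free prize now", "Free entry, win cash now"]

counts = Counter()
for message in messages:
    counts.update(term_frequencies(message))

print(counts.most_common(3))
```

Counts like these can serve as input features for a classifier such as Naive Bayes.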
Unit 4
Hadoop's journey began in 2002 when Doug Cutting and Mike Cafarella
were working on the Apache Nutch project, a web search engine. They
needed a way to store and process massive amounts of data efficiently.
Key Milestones:
10. Sqoop & Flume – Tools for data ingestion from external sources.
3. MapReduce
4. Hadoop Common
Design of HDFS
1. Block Storage – Files are split into blocks and distributed across
nodes.
3. Write Once, Read Many – Optimized for batch processing rather than
frequent updates.
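4. High Availability – Uses an active NameNode and a standby NameNode so
that the standby can take over on failure. The Secondary NameNode only
performs checkpointing and is not a failover node, and Federation scales
the namespace across several NameNodes rather than providing redundancy.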
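The block storage idea can be sketched as below. The sketch assumes a block size of 128 MB and a replication factor of 3, which are the usual HDFS defaults in Hadoop 2 and later. The round-robin placement and the node names are simplifications invented for this example; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (128 MB)
REPLICATION = 3                 # assumed default replication factor


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and assign replica nodes.

    Placement is plain round-robin, a simplification of HDFS's real policy.
    Replicas of a block are distinct only if replication <= len(nodes).
    """
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # The last block may be smaller than block_size.
        size = min(block_size, file_size - i * block_size)
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, size, replicas))
    return plan


# Hypothetical cluster and a 300 MB file.
nodes = ["node1", "node2", "node3", "node4"]
plan = plan_blocks(300 * 1024 * 1024, nodes)

for block_id, size, replicas in plan:
    print(block_id, size // (1024 * 1024), "MB", replicas)
```

Under these assumptions the 300 MB file becomes three blocks (128 MB, 128 MB, 44 MB), each stored on three different nodes.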
4. Application Layer – Includes tools like Hive, Pig, and Spark for data
analysis.
1. Prerequisites
2. Installation Steps
3. Development Tools
Hadoop Distribution
Here are some essential Hadoop commands used for managing HDFS: