Module 2-2

The document discusses informed search strategies, including best-first search, greedy best-first search, and A* search. It also covers heuristic functions and their impact on search performance, as well as the relationships between machine learning, data science, data mining, and data analytics.

Uploaded by nagraj1312003

Module 2

INFORMED (HEURISTIC) SEARCH STRATEGIES

1. Introduction: Discusses how informed search strategies use problem-specific knowledge efficiently.
2. Best-First Search: Prioritizes node expansion based on an evaluation function, f(n).
3. Evaluation Function: Represents a cost estimate; nodes with the lowest evaluation are expanded first.
4. Implementation: Similar to uniform-cost search, but uses f instead of g for prioritization.
5. Heuristic Function: Often included in f as h(n), estimating the cost to reach the goal.
6. Special Cases: Depth-first search can be viewed as a special case of best-first tree search with an appropriate choice of f.
7. Efficiency: Utilizes problem-specific knowledge, particularly through heuristic functions, to search efficiently.
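The expansion order described above can be sketched with a priority queue keyed on the evaluation function f(n). This is a minimal illustration, not the text's own implementation; the graph representation (an adjacency function) and the function names are assumptions made for the example:

```python
import heapq

def best_first_search(start, goal, neighbors, f):
    """Generic best-first search: always expand the frontier node
    with the lowest evaluation f(n). Returns a path or None."""
    frontier = [(f(start), start)]          # priority queue ordered by f
    came_from = {start: None}               # also serves as the visited set
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                    # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (f(nxt), nxt))
    return None

# Greedy best-first search falls out by taking f(n) = h(n):
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
path = best_first_search('A', 'D', lambda n: graph[n], lambda n: h[n])
```

Uniform-cost search would instead order the queue by g (the accumulated path cost), which requires tracking costs along paths rather than evaluating nodes in isolation.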

Greedy Best-First Search:


1. Approach: Prioritizes nodes based solely on the heuristic function, without considering the path cost already incurred.
2. Node Selection: Expands the node that appears closest to the goal according to the heuristic.
3. Heuristic Function: Provides a cost estimate from the current node to the goal; lower values are preferred.
4. Efficiency: Quick in finding solutions, especially with accurate heuristics.
5. Optimality: May not find optimal solutions; prone to local optima.
6. Application: Suitable for domains where approximate solutions are acceptable or for large search spaces.
7. Example: In pathfinding, prioritizes nodes closer to the destination, potentially leading to suboptimal paths.

A* Search:
1. Overview: A* search is an informed search algorithm that combines the strengths of Dijkstra's algorithm (uniform-cost search) and greedy best-first search by considering both the cost to reach a node and the estimated cost to reach the goal.
2. Node Selection: A* selects nodes for expansion based on the sum of the actual cost from the start node to the current node, g(n), and the estimated cost from the current node to the goal, h(n), using the evaluation function f(n) = g(n) + h(n).
3. Optimality: A* is optimal when the heuristic function h(n) is admissible (it never overestimates the true cost to reach the goal) and consistent (satisfies the triangle inequality).
4. Efficiency: A* is efficient at finding optimal solutions, especially when a good heuristic function is provided, as it focuses the search on the most promising paths.
5. Completeness: A* is complete in finite state spaces, meaning it will eventually find a solution if one exists.
6. Memory Usage: A* requires memory to store visited nodes and the priority queue (frontier), which can be significant in large search spaces.
7. Applications: Widely used in pathfinding, puzzle solving, and other optimization problems where finding the optimal solution is important.
8. Example: In pathfinding, A* expands nodes so that paths likely to lead to the goal are prioritized while the actual distance traveled is still taken into account, finding the shortest path efficiently.
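The pathfinding example can be sketched as a minimal A* on a 4-connected grid, with Manhattan distance as the admissible heuristic. The grid setup, parameters, and function name are illustrative assumptions, not from the text:

```python
import heapq

def a_star(start, goal, grid_w, grid_h, walls):
    """A* on a 4-connected grid with unit step cost.
    f(n) = g(n) + h(n), h = Manhattan distance (admissible here).
    Returns the cost of the shortest path, or None."""
    def h(n):
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    frontier = [(h(start), 0, start)]       # (f, g, node)
    best_g = {start: 0}                     # cheapest known cost per node
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < grid_w and 0 <= ny < grid_h and nxt not in walls:
                ng = g + 1
                if ng < best_g.get(nxt, float('inf')):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None
```

Setting g to zero in the priority would recover greedy best-first search; dropping h recovers uniform-cost search.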

Heuristic Functions:
1. Definition: Heuristic functions estimate the cost from a given state to the goal in search algorithms.
2. Admissibility: Heuristics should not overestimate the true cost to reach the goal for optimality.
3. Consistency: Consistent heuristics ensure efficiency in certain algorithms by maintaining a specific
relationship between node costs.
4. Examples: Common heuristics include Manhattan distance, Euclidean distance, and the number of misplaced tiles in puzzles.
5. Selection: Effective heuristics balance informativeness with computational efficiency, often leveraging
domain-specific knowledge.
6. Impact: The quality of the heuristic directly influences the efficiency of search algorithms.
7. Trade-offs: Achieving optimal performance often involves balancing between admissibility and
informativeness.

Effect of Heuristic Accuracy on Performance:


1. Efficiency: Accurate heuristics lead to faster convergence to optimal solutions.
2. Optimal Solutions: They increase the likelihood of finding optimal solutions.
3. Reduced Search Space: Accurate heuristics decrease the search space, reducing computational overhead.
4. Faster Convergence: Algorithms converge more quickly with accurate heuristics.
5. Trade-offs: Designing highly accurate heuristics can be challenging and may increase computational
complexity.
6. Informativeness: Heuristics should balance accuracy and informativeness to avoid excessive computational
overhead.
7. Domain Specificity: Effectiveness depends on the problem domain, requiring tailoring to exploit domain-
specific knowledge.

Need for Machine Learning:


1. Complex Patterns: Machine learning is essential for handling complex patterns and relationships within
data that may be difficult to capture using traditional programming approaches.
2. Large Datasets: With the increasing availability of large datasets, machine learning algorithms can
efficiently process and extract valuable insights from vast amounts of information.
3. Automation: Machine learning enables automation of tasks that would otherwise be time-consuming or
impractical to perform manually, leading to increased efficiency and productivity.
4. Personalization: Machine learning algorithms can personalize experiences for users by analyzing their
preferences, behaviors, and historical data to provide tailored recommendations and services.
5. Prediction and Forecasting: Machine learning models excel at prediction and forecasting tasks, allowing
businesses to anticipate future trends, make informed decisions, and mitigate risks.
6. Optimization: Machine learning techniques can optimize processes and systems by identifying optimal
solutions to complex problems, such as resource allocation, scheduling, and logistics.
7. Adaptability: Machine learning models can adapt and improve over time as they are exposed to new data,
enabling continuous learning and refinement of predictions and recommendations.
8. Advanced Technologies: Machine learning is a fundamental component of many advanced technologies,
including natural language processing, computer vision, robotics, and autonomous systems.
Overall, machine learning plays a crucial role in addressing the challenges posed by large and complex
datasets, enabling automation, personalization, prediction, optimization, and powering advanced technologies
across various domains.
1. AI and ML Relationship:
- AI aims to develop intelligent agents, while ML is a branch of AI focused on learning from data.
- ML's resurgence came with data-driven systems, emphasizing finding patterns in data.
2. Machine Learning:
- ML extracts patterns from data for prediction, including learning from examples and reinforcement
learning.
- It is depicted as a sub-branch of AI; a trained model can handle previously unseen instances and generate results for them.
3. Deep Learning:
- A sub-branch of ML that uses multi-layer neural networks inspired by the neurons of the human brain.
- Deep learning has revolutionized tasks such as image recognition and natural language processing.
Machine Learning, Data Science, Data Mining, and Data Analytics:

1. Machine Learning (ML):

- ML focuses on creating algorithms that can learn from and make predictions or decisions based on data.

- It involves building models that can automatically improve their performance over time.

- ML is a subset of both Data Science and Artificial Intelligence (AI).

2. Data Science:

- Data Science is an interdisciplinary field that encompasses various techniques and methods to extract insights and
knowledge from data.

- It involves collecting, processing, analyzing, and interpreting large volumes of structured and unstructured data to
uncover patterns, trends, and relationships.

- Data Science incorporates elements of statistics, ML, data visualization, and domain knowledge.

3. Data Mining:

- Data Mining is the process of discovering patterns, correlations, or anomalies in large datasets to extract useful
information.

- It often involves applying ML algorithms to identify hidden patterns or relationships that may not be immediately
apparent.

- Data Mining is a subset of both Data Science and ML, focusing specifically on uncovering patterns in data.

4. Data Analytics:

- Data Analytics involves the exploration, interpretation, and communication of meaningful patterns and insights
derived from data.

- It encompasses various techniques, including statistical analysis, ML, and data visualization, to support decision-
making and business strategy.

- Data Analytics can be descriptive (what happened), diagnostic (why it happened), predictive (what will happen), or
prescriptive (what action to take).
Supervised Learning:
1. Supervised Learning: In supervised learning, models learn from labeled data, with a clear distinction
between input features and corresponding output labels.
2. Classification Task: This is a specific type of supervised learning where the goal is to categorize input data
into predefined classes or categories.
3. Training and Testing: Supervised learning involves two main stages: training, where the model learns from labeled data, and testing, where its performance is evaluated on unseen data to assess its ability to generalize.
4. Input and Output: Input attributes represent the features or characteristics of the data, while the output
variable, also known as the target variable, is what the model aims to predict.
5. Examples and Algorithms: Supervised learning encompasses various tasks and algorithms, with
classification being one example. Common classification algorithms include Decision Trees, Random Forest,
Support Vector Machines (SVM), Naïve Bayes, and Artificial Neural Networks (including Deep Learning
networks like CNNs).
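The train/predict split described above can be illustrated with perhaps the simplest supervised learner, a 1-nearest-neighbour classifier. This is a hypothetical minimal sketch, not one of the algorithms listed, chosen only because it fits in a few lines:

```python
def nearest_neighbor_classify(train, query):
    """1-NN classification: 'training' is just storing labeled examples;
    prediction returns the label of the closest stored example."""
    def dist2(p, q):
        # squared Euclidean distance between feature tuples
        return sum((a - b) ** 2 for a, b in zip(p, q))

    features, label = min(train, key=lambda ex: dist2(ex[0], query))
    return label

# Labeled training data: (input features, output label)
train = [((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b')]
prediction = nearest_neighbor_classify(train, (4, 5))   # unseen test input
```

Even this toy learner exhibits the key supervised-learning structure: labeled inputs at training time, and generalization to inputs never seen before at test time.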

Unsupervised Learning:
1. Unsupervised Learning: Unsupervised learning is a type of machine learning where the model learns
patterns and structures from unlabeled data.
2. No Labels: Unlike supervised learning, there are no predefined output labels provided in the training data.
The model must discover the underlying structure of the data on its own.
3. Clustering and Dimensionality Reduction: Unsupervised learning tasks include clustering, where the
goal is to group similar data points together, and dimensionality reduction, which involves reducing the
number of features in the data while preserving its essential characteristics.
4. Discovering Patterns: The model learns to identify patterns, trends, and relationships within the data
without explicit guidance or supervision.
5. Applications: Unsupervised learning is used in various applications such as customer segmentation,
anomaly detection, recommendation systems, and exploratory data analysis. It can uncover hidden insights
and structures in data that may not be immediately apparent.

Cluster Analysis:
1. Unsupervised Learning: Cluster analysis is a type of unsupervised learning where objects are grouped
into clusters based on similarities in their attributes.
2. Grouping Objects: It aims to partition data into disjoint clusters, where objects within the same cluster are more similar to each other than to those in other clusters.
3. Examples: Common applications include image segmentation, medical image analysis for detecting
abnormalities, and clustering gene signatures in databases.
4. Clustering Scheme: Clustering algorithms group data objects into clusters, as illustrated in Figure 1.9,
where images of dogs and cats are grouped into distinct clusters.
5. Key Algorithms: Important clustering algorithms include k-means and hierarchical methods, which
partition data based on criteria such as minimizing intra-cluster variance or maximizing inter-cluster
differences.
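The k-means algorithm from point 5 can be sketched in plain Python. For simplicity the initial centroids are passed in explicitly; real implementations typically choose them randomly or with a seeding scheme such as k-means++:

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's k-means: repeatedly (1) assign each point to its nearest
    centroid, then (2) move each centroid to the mean of its cluster.
    This minimizes intra-cluster variance."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:                     # assignment step
            i = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        centroids = [                        # update step (keep empty clusters)
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

cents, clusters = k_means([(0, 0), (0, 1), (10, 10), (10, 11)],
                          [(0, 0), (10, 10)])
```

Hierarchical clustering, by contrast, builds a tree of merges (agglomerative) or splits (divisive) rather than iterating on a fixed k.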
Semi-supervised Learning:
1. Utilization of Unlabeled Data: Semi-supervised learning uses unlabeled data by assigning pseudo-labels,
complementing the limited labeled data.
2. Combination of Data: Labeled and pseudo-labeled data are merged to create a larger dataset for training
models, reducing the need for extensive labeling.
3. Cost-Effective: It offers a cost-effective approach to model training, leveraging unlabeled data without the
expense of extensive labeling efforts.

Reinforcement Learning:
1. Human-like Learning: Reinforcement learning mirrors human learning, where agents interact with the
environment and receive rewards for their actions.
2. Reward Maximization: Agents strive to maximize cumulative rewards by learning from feedback received
during interactions with the environment.
3. Experience-based Learning: Learning occurs through experience as agents explore actions, receive
rewards or punishments, and adjust their strategies accordingly.
Example of Grid Game:
1. Interactive Exploration: In the grid game, the agent explores paths from a starting point to a goal, learning from the rewards and punishments encountered along the way.
2. Model-Free Approach: Reinforcement learning in the grid game doesn't require an explicit model of the
environment, focusing on learning from interactions.
3. Strategy Development: Through trial and error, the agent develops strategies to navigate obstacles and
reach the goal, optimizing its behavior based on learned experiences.
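The grid game can be sketched with tabular Q-learning on a deliberately tiny, hypothetical setup: a one-dimensional corridor of states with a reward only at the goal. The hyperparameters and names are illustrative assumptions:

```python
import random

def q_learning_grid(width, goal, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Model-free Q-learning on a corridor: states 0..width-1, actions
    -1 (left) and +1 (right); reward 1 on reaching the goal, else 0.
    The agent learns purely from interaction, with no model of the world."""
    random.seed(0)                           # deterministic for reproducibility
    q = {(s, a): 0.0 for s in range(width) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0                                # each episode starts at the left end
        while s != goal:
            # epsilon-greedy: explore sometimes, otherwise exploit Q
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), width - 1)       # walls clamp movement
            r = 1.0 if s2 == goal else 0.0
            # temporal-difference update toward r + gamma * best future value
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, -1)], q[(s2, 1)])
                                  - q[(s, a)])
            s = s2
    return q

q = q_learning_grid(5, 4)
```

After training, the learned Q-values near the goal should prefer moving toward it, illustrating strategy development through trial and error.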
MACHINE LEARNING PROCESS
Machine Learning Applications:
1. Business Bankruptcy Prediction: Using machine learning to analyze financial data and predict business
firm bankruptcies, aiding in risk assessment and decision-making.
2. Banking Fraud Detection: Machine learning algorithms identify loan defaulters and detect credit card fraud, enhancing security and risk management in banking operations.
3. Image Processing: Machine learning powers image search engines, object identification, and image
classification, enabling various applications in fields like healthcare, surveillance, and entertainment.
4. Audio/Voice Chatbots: Machine learning enables the development of chatbots for customer support,
speech-to-text conversion, and text-to-voice conversion, improving user interaction and accessibility.
5. Telecommunication Fraud Detection: Machine learning analyzes telecommunication data to detect
fraudulent calls and identify customer churn, improving service quality and security.
6. Marketing Analysis: Machine learning tools analyze retail sales data, market basket patterns, and customer
travel patterns, aiding in marketing strategy formulation and decision-making.
7. Game AI: Machine learning techniques develop intelligent game programs for games like chess and GO,
enhancing player experience and providing challenging opponents.
8. Natural Language Translation: Machine learning powers language translation services, text
summarization, and sentiment analysis, facilitating communication across languages and contexts.
9. Web Analysis and Services: Machine learning algorithms identify access patterns, detect email spams, and
personalize web services, enhancing user experience and security online.
10. Medicine Diagnosis and Treatment Prediction: Machine learning models predict diseases based on
symptoms and assess treatment effectiveness using patient history, improving healthcare delivery and patient
outcomes.

Big Data refers to large and complex datasets that are difficult to manage and analyze using traditional
methods. It's characterized by four main attributes:
1. Volume: Big Data involves a massive amount of data, ranging from terabytes to petabytes and beyond.
2. Velocity: Data is generated at a high speed, requiring rapid processing to keep up with the continuous stream
of incoming data.
3. Variety: Big Data comes in various forms, including structured (like databases), semi-structured (like XML),
and unstructured (like text, images, and videos).
4. Veracity: Veracity refers to the reliability and quality of the data, as Big Data often includes data from
diverse sources with varying levels of accuracy.
These characteristics pose challenges but also offer opportunities for organizations to derive insights and make
informed decisions.

Big Data Analytics refers to the process of collecting, preprocessing, and analyzing data to generate useful
insights for decision-making. There are four main types of data analytics:
1. Descriptive Analytics:
- Focuses on describing the main features of the data.
- Quantifies collected data without making inferences.
- Essentially statistics without inference.
2. Diagnostic Analytics:
- Aims to answer the question "Why?"
- Also known as causal analysis.
- Seeks to identify the cause and effect of events, such as reasons for low product sales.
3. Predictive Analytics:
- Focuses on predicting future outcomes based on historical data.
- Applies algorithms to identify patterns and make predictions.
- Core aspect of machine learning, aiming to answer "What will happen in the future given this data?"
4. Prescriptive Analytics:
- Goes beyond prediction to suggest the best course of action.
- Helps in decision-making by providing a set of actions to take.
- Assists organizations in planning for the future and mitigating risks.
These types of analytics play crucial roles in leveraging data to drive business decisions and improve
organizational performance.

The Big Data Analysis Framework


1. Data Connection Layer:
- Responsible for connecting to various data sources, including databases, data warehouses, streaming data
sources, and external APIs.
- Ingests raw data into the system for further processing.
2. Data Management Layer:
- Handles the storage, organization, and processing of the ingested data.
- Includes functionalities like data cleaning, transformation, and storage optimization.
- Utilizes distributed storage systems like Hadoop Distributed File System (HDFS) or cloud storage
solutions.
3. Data Analytics Layer:
- Core layer where data analysis and processing take place.
- Applies various analytics techniques such as descriptive, diagnostic, predictive, and prescriptive analytics.
- Utilizes distributed computing frameworks like Apache Spark, Apache Flink, or TensorFlow for scalable
and efficient processing.
4. Presentation Layer:
- Presents the analyzed data and insights to end-users in a human-readable format.
- Includes visualization tools, dashboards, reports, and interactive interfaces.
- Enables decision-makers to understand and interpret the results of data analysis effectively.
This layered architecture provides a structured approach to perform data analytics on big data, enabling
organizations to derive valuable insights and make informed decisions.

Data preprocessing
Data preprocessing involves several essential steps to prepare raw data for analysis.
1. **Cleaning Data:**
- Handling missing values and outliers.
- Correcting errors and inconsistencies.
2. **Transforming Data:**
- Scaling numerical features.
- Encoding categorical variables.
- Creating or modifying features.
3. **Reducing Data:**
- Reducing dimensionality.
- Sampling data if needed.
4. **Integrating Data:**
- Combining data from different sources.
5. **Normalizing Data:**
- Ensuring data is on a similar scale.
6. **Discretizing Data:**
- Converting continuous variables into discrete intervals.
7. **Handling Imbalanced Data:**
- Addressing class imbalance if present.
By performing these preprocessing steps, analysts can enhance the quality and usability of data for subsequent
analysis and modeling tasks.
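Two of the steps above, normalizing and discretizing, can be sketched directly (the function names are illustrative, and min-max scaling is only one of several normalization choices):

```python
def min_max_scale(values):
    """Normalize a numeric feature to [0, 1]: (x - min) / (max - min).
    Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins):
    """Convert continuous values in [0, 1] into `bins` equal-width
    intervals, returning the interval index for each value."""
    return [min(int(v * bins), bins - 1) for v in values]

scaled = min_max_scale([10, 20, 30])   # put features on a common scale
binned = discretize(scaled, 2)         # two equal-width intervals
```

Other steps, such as encoding categorical variables or handling class imbalance, follow the same pattern of transforming raw columns into model-ready form.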
To do:
- Data visualization
- Sums on: best-first search, A*, central tendency (mean, median, mode), standard deviation, mean absolute deviation (MAD), correlation, the Gaussian elimination method, and matrices.
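For the statistics sums listed above, the definitions of central tendency, standard deviation, and MAD can be written out as follows (population formulas assumed, dividing by n rather than n - 1):

```python
def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data; average of the two middle
    values when the count is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def mode(xs):
    """Most frequent value (one of them, if there are ties)."""
    return max(set(xs), key=xs.count)

def std_dev(xs):
    """Population standard deviation: sqrt of the mean squared deviation."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def mad(xs):
    """Mean absolute deviation: average distance from the mean."""
    m = mean(xs)
    return mean([abs(x - m) for x in xs])

data = [2, 4, 4, 4, 5, 5, 7, 9]
```

On this sample, mean = 5, median = 4.5, mode = 4, standard deviation = 2.0, and MAD = 1.5, which makes a useful worked check when doing the sums by hand.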
