Machine Learning and Data Science ANSWER

The document provides a comprehensive overview of machine learning, data science, and artificial intelligence, covering key concepts, algorithms, and techniques. It includes definitions and examples of supervised, unsupervised, and reinforcement learning, as well as discussions on evaluation metrics, search algorithms, and knowledge representation. Additionally, it addresses advanced topics such as natural language processing, computer vision, and ethical concerns in AI.

Uploaded by

ppal7191
Copyright
© All Rights Reserved

✅Machine Learning and Data Science (1–25)

1. What is machine learning and how is it different from traditional programming?


Machine Learning is a subset of AI that enables systems to learn from data and make
predictions or decisions without being explicitly programmed. Unlike traditional programming,
where rules are manually defined, ML models automatically derive rules from patterns in data.

2. What are the different types of machine learning algorithms?

• Supervised Learning (e.g., regression, classification)
• Unsupervised Learning (e.g., clustering, dimensionality reduction)
• Reinforcement Learning (e.g., agent-based learning through reward signals)

3. What is supervised learning and give some examples?


Supervised learning is where the model is trained on labeled data. Examples include:

• Classification (e.g., spam detection)
• Regression (e.g., predicting house prices)

4. What is unsupervised learning and give some examples?


Unsupervised learning deals with unlabeled data and aims to find hidden patterns. Examples:

• Clustering (e.g., customer segmentation)
• Dimensionality reduction (e.g., PCA)

5. What is reinforcement learning?


Reinforcement Learning involves an agent that learns by interacting with an environment,
receiving rewards or penalties to learn optimal actions.

6. What is a feature in machine learning?


A feature is an individual measurable property or characteristic of a phenomenon being
observed. It is the input variable used by the model.

7. What is feature engineering and why is it important?


Feature engineering involves creating, transforming, or selecting the most relevant features
from raw data to improve model performance. It's crucial because better features often lead to
better models.

8. What are some common feature selection techniques?

• Filter methods (e.g., Chi-square test)
• Wrapper methods (e.g., Recursive Feature Elimination)
• Embedded methods (e.g., Lasso)

9. What is dimensionality reduction?


Dimensionality reduction is the process of reducing the number of input variables while
preserving as much information as possible, to simplify models and improve performance.

10. What is Principal Component Analysis (PCA)?

PCA is a linear dimensionality reduction technique that transforms data into a new coordinate
system with uncorrelated variables called principal components, ordered by the amount of
variance they explain.
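
As a rough illustration of "ordered by the amount of variance they explain", the sketch below computes the explained-variance ratio of the two principal components of 2-D data, using the closed-form eigenvalues of the 2x2 covariance matrix. The function name `pca_2d_explained_variance` is made up for this example, and it assumes at least two points that are not all identical.

```python
import math

def pca_2d_explained_variance(points):
    """Return the fraction of total variance explained by each of the
    two principal components of 2-D data, largest first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [mid + spread, mid - spread]
    total = sxx + syy  # assumed non-zero (points are not all identical)
    return [ev / total for ev in eigenvalues]

# Points lying exactly on the line y = x: one component carries all variance.
ratios = pca_2d_explained_variance([(0, 0), (1, 1), (2, 2)])
```

In practice a library routine (for example in Scikit-learn) would be used; the closed form here only works for two dimensions.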

11. What is the difference between classification and regression?

• Classification predicts discrete labels (e.g., cat or dog)
• Regression predicts continuous values (e.g., temperature)

12. What is clustering in machine learning?


Clustering is an unsupervised learning technique that groups similar data points based on
features, such as in K-Means clustering.

13. What is the k-nearest neighbors (KNN) algorithm?


KNN is a non-parametric algorithm that classifies data based on the majority class of its 'k'
nearest neighbors in feature space.
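
A minimal sketch of the majority-vote idea, assuming Euclidean distance and small in-memory data. The function name `knn_predict` and the toy points are illustrative only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (0.5, 0.5), k=3)
```

There is no training step: all the work happens at prediction time, which is why KNN is called non-parametric and can be slow on large datasets.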

14. Explain decision trees and their advantages.


A Decision Tree is a flowchart-like model that splits data into branches to arrive at a decision.
Advantages: Easy to understand, handles both numerical and categorical data, no need for
feature scaling.

15. What is a random forest?


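A Random Forest is an ensemble of decision trees, each trained on a bootstrap sample of the
data with a random subset of features considered at each split. Predictions are combined by
majority vote (classification) or averaging (regression), which reduces overfitting compared
with a single tree.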

16. What is the difference between bagging and boosting?

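• Bagging trains models independently on bootstrap samples, so they can be built in
parallel, and combines them to reduce variance (e.g., Random Forest)
• Boosting builds models sequentially, each focusing on the errors of the previous ones, to
reduce bias (e.g., Gradient Boosting)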

17. What is gradient boosting?


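Gradient Boosting builds an ensemble of weak learners (typically shallow decision trees)
sequentially. Each new learner is fit to the negative gradient of the loss with respect to the
current predictions, which for squared error is simply the residual, so each step corrects the
errors of the ensemble built so far.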
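
The sequential error-correcting idea can be sketched for 1-D regression with squared error, where each weak learner is a one-split "stump" fit to the current residuals. The names `fit_stump` and `gradient_boost`, the learning rate, and the number of rounds are all assumptions of this sketch, not a reference implementation.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a prediction function built from sequentially fitted stumps."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

model = gradient_boost([1, 2, 3, 4], [1, 1, 5, 5])
```

The learning rate deliberately takes only a small step toward each stump's correction, which is why many rounds are needed.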

18. What is a confusion matrix and how is it used?


A confusion matrix is a 2x2 or NxN table used to evaluate classification models. It shows true vs.
predicted classes and helps calculate accuracy, precision, recall, and F1-score.

19. What are precision, recall, and F1-score?

• Precision: TP / (TP + FP) – How many predicted positives are truly positive
• Recall: TP / (TP + FN) – How many actual positives are identified
• F1-score: Harmonic mean of precision and recall – balances the two
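
The three formulas can be checked on a small example. The function name `precision_recall_f1` and the counts are hypothetical; returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
# Precision is 8/10 and recall is 8/16; F1 sits between them, closer to the lower value.
```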

20. What is ROC-AUC and why is it important?


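ROC-AUC evaluates classification performance across all decision thresholds. The ROC curve
plots the true positive rate against the false positive rate, and AUC is the area under it,
which equals the probability that the classifier ranks a random positive instance higher than
a random negative one. An AUC of 0.5 corresponds to random guessing; higher is better.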
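
The ranking interpretation of AUC can be computed directly by comparing every positive score with every negative score. This is a quadratic-time sketch for small inputs; the name `roc_auc` and the convention of counting ties as half a win are assumptions of the example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```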

21. What is cross-validation and why is it used?


Cross-validation splits the dataset into multiple parts and trains and validates the model
several times on different splits. This gives a more reliable estimate of generalization
performance and helps detect overfitting.
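
A minimal sketch of how k-fold splitting rotates the validation set. The generator name `k_fold_indices` is made up for the example, and it does not shuffle; real workflows usually shuffle (or stratify) before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train_idx, val_idx in k_fold_indices(n_samples=5, k=2):
    pass  # fit on train_idx, evaluate on val_idx, then average the scores
```

Every sample appears in exactly one validation fold, so each data point is used for both training and validation across the k rounds.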

22. What is the difference between parametric and non-parametric models?

• Parametric models assume a fixed functional form with a fixed number of parameters
(e.g., Linear Regression)
• Non-parametric models make fewer assumptions, and their complexity can grow with the
data (e.g., KNN, Decision Trees)

23. What is regularization and what are L1 and L2 penalties?


Regularization adds a penalty term to the loss function to reduce overfitting.

• L1 (Lasso) encourages sparsity by driving some coefficients exactly to zero (feature selection)
• L2 (Ridge) shrinks coefficients toward zero without making them exactly zero
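
Written out, the two penalized objectives look as follows. The symbols are assumptions of this sketch: L(w) is the unregularized loss, w_j are the model coefficients, and λ ≥ 0 is the regularization strength chosen by the user.

```latex
\begin{aligned}
\text{L1 (Lasso):}\quad & J(w) = L(w) + \lambda \sum_{j} |w_j| \\
\text{L2 (Ridge):}\quad & J(w) = L(w) + \lambda \sum_{j} w_j^{2}
\end{aligned}
```

Larger λ means a stronger penalty and a simpler model; λ = 0 recovers the original loss.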

24. What are some commonly used Python libraries for data science?

• Pandas, NumPy – data manipulation
• Scikit-learn – machine learning
• Matplotlib, Seaborn – visualization
• TensorFlow, PyTorch – deep learning

25. What is the CRISP-DM process in data science?


CRISP-DM (Cross-Industry Standard Process for Data Mining) includes:

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

26. What is Artificial Intelligence?


Artificial Intelligence (AI) is the simulation of human intelligence in machines,
enabling them to perform tasks such as reasoning, learning, perception, and
decision-making autonomously.

27. What are the main goals of AI?

• Automation of cognitive tasks
• Learning from data
• Mimicking human intelligence
• Solving complex problems efficiently
• Improving decision-making capabilities

28. What are the different types of AI?


• Narrow AI: Performs specific tasks (e.g., voice assistants)
• General AI: Can perform any intellectual task a human can do
• Super AI: Hypothetical AI that surpasses human intelligence

29. What is the difference between narrow AI and general AI?

• Narrow AI specializes in one task with high efficiency
• General AI can understand, learn, and apply knowledge across different
domains like a human

30. What is the Turing Test and why is it important?


The Turing Test evaluates a machine's ability to exhibit human-like intelligence in
conversation. If a human evaluator cannot reliably distinguish the machine's responses
from a human's, the machine passes the test. It matters as an early and influential
benchmark for machine intelligence.

31. What are intelligent agents?


An intelligent agent is an autonomous entity that perceives its environment through
sensors and acts upon it using actuators to achieve specific goals.

32. What is the difference between AI, Machine Learning, and Deep Learning?

• AI is the broader field of creating intelligent systems
• Machine Learning is a subset that enables systems to learn from data
• Deep Learning is a specialized branch of ML using neural networks with
multiple layers

33. What is the role of knowledge representation in AI?


Knowledge representation allows AI systems to store, retrieve, and manipulate
knowledge logically and efficiently, forming the backbone for reasoning and
decision-making.

34. What are some applications of AI in real life?

• Virtual assistants (e.g., Siri, Alexa)
• Fraud detection
• Healthcare diagnostics
• Autonomous vehicles
• Recommendation systems

35. What are the main challenges in AI development?

• Data privacy and security
• Bias in training data
• High computational cost
• Lack of explainability
• Ethical and legal concerns

✅Search Algorithms

36. What is a search algorithm in AI?


A search algorithm helps find solutions or paths in a problem space by exploring
possible states using strategies like BFS, DFS, or A*.

37. Explain the difference between uninformed and informed search.

• Uninformed search has no domain-specific knowledge (e.g., BFS, DFS)
• Informed search uses heuristics to guide the search (e.g., A*)

38. What is Breadth-First Search (BFS)?


BFS explores all nodes at the present depth before moving to the next level. It is
complete (for a finite branching factor) and finds the shortest path when all step costs
are equal, but it can consume a lot of memory.
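
A minimal sketch of BFS on an unweighted graph stored as an adjacency dictionary. The function name `bfs_shortest_path` and the example graph are illustrative assumptions.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    queue = deque([[start]])  # each queue entry is a full path
    visited = {start}
    while queue:
        path = queue.popleft()  # FIFO order explores level by level
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = bfs_shortest_path(graph, "A", "E")  # ['A', 'B', 'D', 'E']
```

Storing whole paths in the queue keeps the code short but makes the memory cost noted above easy to see: the queue can hold an entire level of the search tree at once.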

39. What is Depth-First Search (DFS)?


DFS explores as far as possible down each branch before backtracking. It uses less
memory but may get stuck in infinite paths if not handled properly.
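
One common way to "handle it properly" is to track visited nodes so cycles cannot trap the search. The sketch below is an iterative DFS under that assumption; `dfs_order` and the example graph are illustrative names.

```python
def dfs_order(graph, start):
    """Return nodes in the order an iterative depth-first search visits them."""
    order = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()  # LIFO order dives deep before going wide
        if node in seen:
            continue  # skip already-visited nodes, which breaks cycles
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is explored first.
        for neighbor in reversed(graph.get(node, [])):
            stack.append(neighbor)
    return order

# The edge C -> A forms a cycle; the `seen` set prevents looping forever.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": []}
order = dfs_order(graph, "A")  # ['A', 'B', 'D', 'C']
```

A visited set does not help on infinite state spaces; there a depth limit or iterative deepening is the usual remedy.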

40. What is the A* algorithm and how does it work?


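A* combines the actual cost so far, g(n), with a heuristic estimate of the remaining cost,
h(n), and expands the node with the lowest f(n) = g(n) + h(n). It is complete and optimal
when the heuristic is admissible (never overestimates); graph-search variants that never
reopen nodes also need a consistent heuristic.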
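
The f(n) = g(n) + h(n) rule can be sketched on a small grid, assuming 4-directional moves of cost 1 and the Manhattan distance as heuristic (admissible and consistent in that setting). The grid encoding, with '#' marking walls, and the name `a_star` are assumptions of this example.

```python
import heapq

def a_star(grid, start, goal):
    """Return the number of steps on a shortest path, or None if unreachable."""
    def h(cell):
        # Manhattan distance never overestimates on a 4-connected unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry; a cheaper route to this cell was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(open_heap, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None

grid = ["...",
        ".#.",
        "..."]
steps = a_star(grid, (0, 0), (2, 2))  # 4
```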

41. What is a heuristic function?


A heuristic function estimates the cost from the current state to the goal. It helps
prioritize paths in informed search algorithms.

42. What is the difference between greedy search and A* search?

• Greedy Search uses only the heuristic: f(n) = h(n)
• A* uses both the cost so far and the heuristic: f(n) = g(n) + h(n)

Greedy search is often faster but can return suboptimal paths; A* is optimal when its
heuristic is admissible.

43. What are the limitations of search algorithms?

• High memory usage
• Computational inefficiency
• May not scale well with large state spaces
• Incomplete or suboptimal solutions in some cases

44. What is the hill climbing algorithm?


Hill climbing is a local search algorithm that moves in the direction of increasing value
(higher heuristic) and stops when it reaches a peak. It may get stuck in local maxima.
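
A minimal sketch of steepest-ascent hill climbing over a 1-D "landscape" of scores, showing how the result depends on the starting point. The names `hill_climb`, `landscape`, `score`, and `neighbors` are made up for this example.

```python
def hill_climb(score, start, neighbors):
    """Move to the best neighbor until no neighbor improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current  # a peak: local or global maximum
        current = best

# Hypothetical landscape: index 1 is a local peak, index 4 the global peak.
landscape = [1, 3, 2, 5, 9, 4]
score = lambda i: landscape[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

stuck = hill_climb(score, 0, neighbors)   # stops at index 1 (local maximum)
found = hill_climb(score, 3, neighbors)   # reaches index 4 (global maximum)
```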

45. What is simulated annealing?


Simulated annealing overcomes local maxima by occasionally allowing worse moves. As
it "cools," the probability of accepting worse moves decreases, encouraging global
optimization.
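
A sketch of the idea, assuming the common Metropolis acceptance rule exp(Δ / T) for worse moves and a geometric cooling schedule. Neither choice is stated in the text above, and the names `acceptance_probability` and `simulated_annealing` and all parameter defaults are illustrative.

```python
import math
import random

def acceptance_probability(current_score, candidate_score, temperature):
    """Always accept improvements; accept worse moves with probability exp(delta / T)."""
    if candidate_score >= current_score:
        return 1.0
    return math.exp((candidate_score - current_score) / temperature)

def simulated_annealing(score, start, neighbors,
                        temperature=10.0, cooling=0.95, steps=500, seed=0):
    """Maximize `score`, returning the best state seen during the run."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    current = best = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        p = acceptance_probability(score(current), score(candidate), temperature)
        if rng.random() < p:
            current = candidate
        if score(current) > score(best):
            best = current
        # Cool down; the floor avoids division by zero.
        temperature = max(temperature * cooling, 1e-9)
    return best
```

At high temperature almost any move is accepted, which lets the search leave a local peak; as the temperature falls the behavior approaches plain hill climbing.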

✅Knowledge Representation and Reasoning

46. What is a knowledge base in AI?


A knowledge base is a structured repository of facts, rules, and relationships used by
AI systems to reason and draw conclusions.

47. What is propositional logic?


Propositional logic represents knowledge using true/false propositions connected by logical
operators (AND, OR, NOT, IMPLIES). It's simple but limited in expressiveness.

48. What is first-order logic?


First-order logic extends propositional logic by using quantifiers, predicates, and
variables, enabling representation of more complex knowledge and relationships.

49. What is an ontology in AI?


An ontology defines a set of concepts and categories in a domain and the relationships
between them. It's essential for semantic understanding and reasoning.

50. What is reasoning and inference?


Reasoning is the process of deriving logical conclusions from known facts or rules.
Inference is the actual mechanism that applies logic to draw conclusions.

51. What is forward and backward chaining?

• Forward chaining starts with known facts and applies rules to infer
conclusions.
• Backward chaining starts with a goal and works backward to verify if known
facts support it.
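
Forward chaining can be sketched as a loop that keeps firing rules until no new fact appears. The rule format, a (premises, conclusion) pair, the function name `forward_chain`, and the example facts are assumptions of this sketch.

```python
def forward_chain(facts, rules):
    """Return all facts derivable from `facts` using `rules`.

    Each rule is a (premises, conclusion) pair: if every premise is known,
    the conclusion is added to the known facts.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rule base.
rules = [
    (["rain"], "wet_ground"),
    (["wet_ground", "cold"], "ice"),
]
derived = forward_chain(["rain", "cold"], rules)
# derived contains 'wet_ground' and 'ice' in addition to the starting facts
```

Backward chaining would instead start from a goal such as "ice" and recursively check whether the premises of a rule concluding it can be established.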

52. What is non-monotonic reasoning?


Non-monotonic reasoning allows conclusions to be withdrawn when new information
contradicts them. It reflects human-like reasoning under uncertainty.

53. What is fuzzy logic and how is it used in AI?


Fuzzy logic allows reasoning with degrees of truth rather than binary logic. It's useful in
control systems (e.g., washing machines, air conditioners) where inputs are vague.
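
"Degrees of truth" can be illustrated with a triangular membership function and the standard min/max/complement operators. The temperature ranges below are invented for the example, and `triangular` assumes left < peak < right.

```python
def triangular(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

# Hypothetical fuzzy sets over room temperature in degrees Celsius.
warm = triangular(25, left=20, peak=30, right=40)   # 0.5: "somewhat warm"
cool = triangular(25, left=10, peak=18, right=26)   # small but non-zero
warm_and_not_cool = fuzzy_and(warm, fuzzy_not(cool))
```

A fuzzy controller would combine several such degrees through rules and then convert the result back to a crisp output, such as a fan speed.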

54. What is the difference between deterministic and non-deterministic systems?

• Deterministic: Same input always produces the same output
• Non-deterministic: Outcomes may vary even with the same input

55. What is a constraint satisfaction problem (CSP)?


CSPs are problems where variables must be assigned values that satisfy a set of
constraints (e.g., Sudoku, scheduling). Solved using backtracking and heuristics.
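
A minimal backtracking sketch for one kind of CSP, where the only constraint is that neighboring variables must take different values (as in map coloring). The function name `solve_csp` and the triangle example are assumptions; real solvers add heuristics such as choosing the most constrained variable first.

```python
def solve_csp(variables, domains, neighbors, assignment=None):
    """Assign each variable a value so that no two neighbors share a value.

    Returns a dict mapping variable -> value, or None if no solution exists.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment
    # Pick the next unassigned variable (no ordering heuristic in this sketch).
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Consistent if no already-assigned neighbor uses the same value.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = solve_csp(variables, domains, neighbors, assignment)
            if result is not None:
                return result
            del assignment[var]  # backtrack and try the next value
    return None

# Hypothetical map: three mutually adjacent regions.
variables = ["A", "B", "C"]
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
three_colors = {v: ["red", "green", "blue"] for v in variables}
solution = solve_csp(variables, three_colors, neighbors)
```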

✅Machine Learning Basics (in AI context)

56. What is machine learning in the context of AI?


Machine Learning is a core subset of AI focused on enabling systems to learn patterns
from data and make decisions without being explicitly programmed.

57. What are the types of machine learning?

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

58. What is supervised learning?


In supervised learning, models are trained using labeled data to predict output.
Common examples: classification and regression tasks.

59. What is unsupervised learning?


Unsupervised learning works with unlabeled data to discover hidden structures or
patterns. Clustering and association are key examples.

60. What is reinforcement learning?


Reinforcement learning trains agents to take actions in an environment by receiving
rewards or penalties to maximize cumulative reward.

61. What is overfitting and underfitting?

• Overfitting: Model learns noise in training data; poor generalization
• Underfitting: Model is too simple to capture the underlying patterns

62. What is the bias-variance tradeoff?


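It describes the tradeoff between bias (error from overly simple assumptions, which leads
to underfitting) and variance (sensitivity to fluctuations in the training data, which
leads to overfitting). The goal is to find the balance that minimizes error on unseen data.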

63. What are common evaluation metrics in ML models?

• Accuracy
• Precision
• Recall
• F1-Score
• ROC-AUC
• Mean Squared Error (for regression)

64. What is a confusion matrix?


A confusion matrix displays the count of true/false positives and negatives to help
assess classification model performance.

65. What is cross-validation?


Cross-validation evaluates model performance by dividing data into k subsets and
rotating the training/validation sets to ensure consistency and robustness.

✅Advanced Topics and Applications

66. What is Natural Language Processing (NLP)?


NLP enables machines to understand, interpret, and generate human language.
Applications include chatbots, sentiment analysis, and translation.

67. What are the main components of a speech recognition system?

• Acoustic Model
• Language Model
• Feature Extractor
• Decoder
• Lexicon

68. What is computer vision?


Computer Vision enables systems to understand and interpret visual inputs like images
and videos. Tasks include object detection, image classification, and facial recognition.

69. What is deep learning and how does it relate to AI?


Deep Learning is a branch of ML based on neural networks with multiple layers. It
powers complex AI applications such as image recognition and language models.

70. What are neural networks?


Neural Networks are models loosely inspired by the human brain, consisting of layers of
interconnected nodes (neurons) that apply weighted sums and nonlinear activations to
input data to learn patterns.

71. What is backpropagation?


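Backpropagation computes the gradient of the loss with respect to every weight in a
neural network by applying the chain rule backward from the output layer. An optimizer
such as gradient descent then uses these gradients to adjust the weights and reduce error.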
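
To make the chain rule concrete, the sketch below works through a deliberately tiny network with one hidden unit, y = w2 * tanh(w1 * x), and squared-error loss. The architecture, the function names, and the numbers are assumptions chosen only to keep the gradients readable.

```python
import math

def forward(w1, w2, x):
    """Forward pass: hidden activation h and output y."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Return (dL/dw1, dL/dw2), computed by applying the chain rule backward."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target            # derivative of 0.5 * (y - target)^2
    dL_dw2 = dL_dy * h            # y = w2 * h
    dL_dh = dL_dy * w2
    dL_dz = dL_dh * (1 - h ** 2)  # derivative of tanh(z) is 1 - tanh(z)^2
    dL_dw1 = dL_dz * x            # z = w1 * x
    return dL_dw1, dL_dw2

# One gradient-descent step with a hypothetical learning rate of 0.1.
w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
g1, g2 = backprop(w1, w2, x, target)
w1_new, w2_new = w1 - 0.1 * g1, w2 - 0.1 * g2
```

Real frameworks automate exactly this bookkeeping across millions of weights.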

72. What is a convolutional neural network (CNN)?


CNNs are specialized neural networks for processing grid-like data such as images. They
use convolutional layers to automatically detect spatial features.
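
What a single convolutional filter computes can be sketched in plain Python: slide a small kernel over the image and take a weighted sum at each position. This assumes no padding and stride 1, and, as in most deep learning libraries, the kernel is not flipped (strictly a cross-correlation). The name `conv2d_valid` and the example values are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A vertical-edge detector responds where brightness changes from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]
response = conv2d_valid(image, kernel)
# Each row of `response` is [0, 1, 0]: non-zero only at the edge.
```

In a CNN the kernel values are not hand-picked like this; they are learned during training.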

73. What is a recurrent neural network (RNN)?


RNNs are designed to handle sequential data by retaining a memory of previous inputs.
They're used in tasks like language modeling and time series forecasting.

74. What are ethical concerns with AI?

• Bias and fairness
• Data privacy
• Job displacement
• Misuse in surveillance or autonomous weapons
• Lack of transparency

75. What is explainable AI (XAI) and why is it important?

XAI focuses on making AI decisions transparent and understandable. It’s essential for
trust, accountability, and regulatory compliance, especially in sensitive domains like
healthcare and finance.
