40 Questions of Python, AIML Data Analysis

The document covers fundamental concepts in Python programming, including data types, control structures, and object-oriented programming. It also discusses machine learning basics, such as types of learning, overfitting, model evaluation, and the confusion matrix. Key topics include the differences between interpreters and compilers, data handling techniques, and stages of building machine learning models.

Uploaded by

caps39968

PYTHON FUNDAMENTALS

Q1: What is the fundamental difference between an interpreter and a compiler?

A: An interpreter translates code line-by-line and executes it immediately, while a compiler translates the entire code
into machine code before execution.

Q2: What are the built-in data types in Python?

A: The built-in data types in Python include integers, floats, strings, booleans, lists, dictionaries, tuples, and sets.

Q3: Explain the 'if' statement in Python.

A: The 'if' statement in Python is used for conditional execution. It executes a block of code only if a specified
condition is true.

Q4: What is a list in Python?

A: A list is a data structure in Python that can hold a collection of items. Lists are mutable, ordered, and
can contain elements of different data types.

Q5: What is a dictionary in Python?

A: A dictionary in Python is a collection of key-value pairs. It is mutable, and each key within a dictionary
must be unique. Since Python 3.7, dictionaries preserve insertion order (earlier versions treated them as unordered).

Q6: Explain tuples in Python.

A: Tuples are immutable sequences in Python, typically used to store collections of heterogeneous data. They are
created using parentheses and can contain elements of different data types.

Q7: What are sets in Python?

A: Sets in Python are unordered collections of unique elements. They are mutable but do not allow duplicate values.

Q8: What is Object-Oriented Programming (OOP)?

A: Object-Oriented Programming (OOP) is a programming paradigm based on the concept of objects, which
can contain data in the form of attributes and code in the form of methods.

Q9: Give an example of a language that uses an interpreter.

A: Python is an example of a language that uses an interpreter. In the standard implementation (CPython), source code
is first compiled to bytecode, which the interpreter then executes.

Q10: What is the difference between a float and an integer in Python?

A: A float is a data type that represents floating-point numbers (decimal numbers), while an integer represents whole
numbers without any decimal point.

Q11: What is the purpose of the 'else' statement in Python's 'if' condition?

A: The 'else' statement is used to execute a block of code when the condition specified in the 'if' statement is false.

Q12: How do you access elements in a list in Python?

A: Elements in a list can be accessed using indexing. Indexing starts from 0, so the first element is at index 0, the
second at index 1, and so on.

Q13: Can a dictionary in Python contain duplicate values?

A: Yes, a dictionary in Python can contain duplicate values. Only the keys must be unique within a dictionary.

Q14: What is the main difference between a list and a tuple?


A: The main difference is that lists are mutable (can be changed), while tuples are immutable (cannot be changed
after creation).

Q15: How do you remove an item from a set in Python?

A: You can use the 'remove()' method to remove a specific item from a set, or 'discard()' method which won't raise an
error if the item is not present.
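
The difference between the two methods can be seen in a short sketch. The set `colors` and the flag `raised` are hypothetical names chosen for illustration.

```python
# Illustrative set; the names are hypothetical.
colors = {"red", "green", "blue"}

colors.remove("red")       # "red" is present, so it is removed
colors.discard("yellow")   # "yellow" is absent, so this silently does nothing

try:
    colors.remove("yellow")  # remove() raises KeyError for a missing item
    raised = False
except KeyError:
    raised = True

print(sorted(colors))  # ['blue', 'green']
print(raised)          # True
```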

Q16: What is a class in Python?

A: A class in Python is a blueprint for creating objects. It defines the attributes and methods common to all objects of
a certain kind.

Q17: Explain the advantage of using a compiler over an interpreter.

A: The main advantage of using a compiler is that it translates the entire code into machine code before execution,
potentially leading to faster execution compared to interpretation.

Q18: How do you check the type of a variable in Python?

A: You can use the 'type()' function to check the type of a variable in Python. For example, 'type(variable_name)'
returns the type of 'variable_name'.

Q19: What is the purpose of the 'elif' statement in Python's 'if-elif-else' ladder?

A: The 'elif' statement is used to check multiple conditions one by one. It is executed if the previous conditions
are false and its condition is true.
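
A minimal sketch of an 'if-elif-else' ladder follows. The function name `grade` and the score thresholds are assumptions made up for this example.

```python
def grade(score):
    """Return a letter grade for a numeric score (illustrative thresholds)."""
    if score >= 90:        # checked first
        return "A"
    elif score >= 75:      # checked only if the 'if' condition was false
        return "B"
    else:                  # runs when every condition above was false
        return "C"


print(grade(95), grade(80), grade(40))  # A B C
```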

Q20: How do you append an element to a list in Python?

A: You can use the 'append()' method to add an element to the end of a list in Python.

Q21: How do you access the value associated with a key in a dictionary?

A: You can access the value associated with a key in a dictionary using square brackets with the key inside. For
example, 'my_dict[key]' returns the value associated with 'key'.

Q22: Can you change the elements of a tuple after it has been created?

A: No, tuples are immutable, which means you cannot change their elements after they have been created.

Q23: What is the difference between 'add()' and 'update()' methods in sets?

A: The 'add()' method adds a single element to a set, while the 'update()' method adds multiple elements from
another set (or any iterable) to the current set.

Q24: What is inheritance in OOP?

A: Inheritance is a mechanism in OOP that allows a new class to inherit properties and behaviors (attributes and
methods) from an existing class.

Q25: Give an example of a language that uses a compiler.

A: C and C++ are examples of languages that use compilers. They compile the code into machine code before
execution.

Q26: What is the difference between 'int()' and 'float()' functions in Python?

A: The 'int()' function converts a value to an integer, while the 'float()' function converts a value to a floating-point number.

Q27: What is the purpose of the 'for' loop in Python?

A: The 'for' loop in Python is used to iterate over a sequence (such as a list, tuple, or string) and execute a block of
code for each item in the sequence.

Q28: How do you remove an element from a list in Python?

A: You can remove an element from a list using 'remove()', 'pop()', or 'del'. The 'remove()' method
removes the first occurrence of a specified value. 'pop()' removes the element at a specific index (the last element if
no index is given) and returns it. 'del' removes the element at a specific index, or deletes the entire list if used
without an index.
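
The three removal options can be compared side by side in a small sketch. The list `items` is a hypothetical example.

```python
items = ["a", "b", "c", "b", "d"]

items.remove("b")      # drops only the first "b" -> ['a', 'c', 'b', 'd']
popped = items.pop(1)  # removes and returns index 1 ('c') -> ['a', 'b', 'd']
del items[0]           # deletes index 0 -> ['b', 'd']

print(items, popped)   # ['b', 'd'] c
```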

Q29: Can a dictionary have multiple values for the same key?

A: No, each key in a dictionary must be unique. If you try to assign a new value to an existing key, it will overwrite
the previous value associated with that key.
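
As a rough illustration of the overwrite behaviour, and of one common workaround (storing a list as the value), consider the sketch below. The dictionaries `ages` and `phones` are hypothetical.

```python
ages = {"alice": 30}
ages["alice"] = 31   # same key: the old value 30 is overwritten
ages["bob"] = 31     # a different key may hold the same value

# To keep several values under one key, store a list as the value.
phones = {"alice": ["555-0100"]}
phones["alice"].append("555-0101")

print(ages)    # {'alice': 31, 'bob': 31}
print(phones)  # {'alice': ['555-0100', '555-0101']}
```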

Q30: What is the syntax for creating an empty tuple in Python?

A: An empty tuple can be created using empty parentheses '()'. For example, 'my_tuple = ()'.

Q31: How do you perform set intersection and set union operations in Python?

A: Set intersection can be performed using the '&' operator or 'intersection()' method, while set union can be
performed using the '|' operator or 'union()' method.
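
A short sketch of both operations, using two made-up sets `a` and `b`. The results are sorted before printing only because sets have no guaranteed display order.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

common = a & b                 # same result as a.intersection(b)
combined = a | b               # same result as a.union(b)

print(sorted(common))          # [3, 4]
print(sorted(combined))        # [1, 2, 3, 4, 5]
print(common == a.intersection(b), combined == a.union(b))  # True True
```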

Q32: What is encapsulation in OOP?

A: Encapsulation is the bundling of data (attributes) and methods (functions) that operate on the data into a single
unit (class). It helps in hiding the internal state of an object and restricting direct access to it from outside the class.

Q33: Explain the concept of 'just-in-time' compilation.

A: Just-in-time (JIT) compilation is a hybrid approach that combines aspects of both interpretation and compilation.
It involves compiling code into machine code at runtime, just before executing it, allowing for optimizations tailored
to the specific runtime environment.

Q34: What is the difference between a string and a list in Python?

A: A string is a sequence of characters, while a list is a collection of items that can be of different data types. Strings
are immutable, meaning they cannot be changed after creation, while lists are mutable and can be modified.

Q35: How do you exit a loop prematurely in Python?

A: You can exit a loop prematurely using the 'break' statement. When the 'break' statement is encountered within a
loop, the loop is terminated immediately, and control passes to the next statement after the loop.

Q36: What is the difference between the 'extend()' and 'append()' methods in Python lists?

A: The 'extend()' method is used to add elements from another list (or any iterable) to the end of the current list,
effectively extending it. The 'append()' method, on the other hand, adds a single element to the end of the list.
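
The contrast is easiest to see when the argument is itself a list. The lists `nums` and `other` below are hypothetical.

```python
nums = [1, 2]
nums.append([3, 4])   # the whole list becomes ONE new element

other = [1, 2]
other.extend([3, 4])  # each element is added separately
other.extend("ab")    # any iterable works; a string contributes its characters

print(nums)   # [1, 2, [3, 4]]
print(other)  # [1, 2, 3, 4, 'a', 'b']
```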

Q37: How do you check if a key exists in a dictionary?

A: You can use the 'in' keyword to check if a key exists in a dictionary. For example, 'if key in my_dict:' checks if 'key'
exists in 'my_dict'.

Q38: Can you concatenate two tuples in Python?

A: Yes, you can concatenate two tuples using the '+' operator. For example, 'tuple1 + tuple2' will concatenate 'tuple1'
and 'tuple2' into a new tuple.

Q39: What is the purpose of the 'difference()' method in sets?

A: The 'difference()' method in sets is used to get the difference between two sets. It returns a new
set containing elements that are present in the first set but not in the second set.

Q40: What is polymorphism in OOP?


A: Polymorphism is the ability of objects to take on different forms or behave differently based on the context in
which they are used. It allows objects of different classes to be treated as objects of a common superclass, enabling
code reuse and flexibility in design.
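
One possible illustration of polymorphism through a common superclass is sketched below. The classes `Shape`, `Square`, and `Rectangle` are hypothetical and exist only for this example.

```python
class Shape:
    """Common superclass; subclasses override area()."""

    def area(self):
        raise NotImplementedError("subclasses must implement area()")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# The same call, shape.area(), behaves differently depending on the object's class.
shapes = [Square(2), Rectangle(2, 3)]
areas = [shape.area() for shape in shapes]
print(areas)  # [4, 6]
```

The loop treats every object as a `Shape` and never checks its concrete class, which is the flexibility the answer above describes.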

AI AND ML FUNDAMENTALS
1. What Are the Different Types of Machine Learning?

Types of Machine Learning

There are several types of machine learning, each with special characteristics and applications. Some of the main
types of machine learning algorithms are as follows:

• Supervised Machine Learning

• Unsupervised Machine Learning

• Semi-Supervised Machine Learning

• Reinforcement Learning

2. What is Overfitting, and How Can You Avoid It?

• Overfitting and underfitting are the two main problems that cause poor performance in machine learning
models.

• Overfitting occurs when the model fits the training data too closely and tries to capture each and
every data point fed to it. As a result, it starts capturing noise and inaccurate values from the dataset, which
degrades the performance of the model.

• An overfitted model does not perform accurately on the test/unseen dataset and cannot generalize well.

• An overfitted model is said to have low bias and high variance.

Ways to Prevent Overfitting

Although overfitting reduces the performance of a model, it can be prevented in several ways. Using a linear model
helps avoid overfitting; however, many real-world problems are non-linear, so other techniques are needed as well.
Below are several ways that can be used to prevent overfitting:

1. Early Stopping

2. Train with more data

3. Feature Selection

4. Cross-Validation

5. Data Augmentation

6. Regularization

3. What is ‘training Set’ and ‘test Set’ in a Machine Learning Model? How Much Data Will You Allocate for Your
Training, Validation, and Test Sets?

• The training data is the biggest (in size) subset of the original dataset, which is used to train or fit the
machine learning model.

• The test dataset is another subset of the original data, which is independent of the training dataset and is used to
evaluate the trained model.

• Typically, 20% - 30% of the data is used for testing and the remaining 70% - 80% for training the model. When a
validation set is needed (for example, for tuning hyperparameters), it is usually held out from the training portion.

4. How Do You Handle Missing or Corrupted Data in a Dataset?

• Deleting rows or columns. We usually use this method when it comes to empty cells.

• Replacing the missing data with aggregated values.

• Creating an unknown category.

• Predicting missing values.
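
The first three options above can be sketched in plain Python as follows. The sample lists `ages` and `cities`, and the use of `None` to mark a missing value, are assumptions made for this example.

```python
from statistics import mean

ages = [25, None, 31, 40, None]   # None marks a missing value (assumption)

# Option 1: delete the records with missing values.
dropped = [a for a in ages if a is not None]

# Option 2: replace missing values with an aggregate (here, the mean of known values).
fill_value = mean(dropped)
imputed = [fill_value if a is None else a for a in ages]

# Option 3: for categorical data, introduce an explicit "unknown" category.
cities = ["Pune", None, "Delhi"]
labelled = [c if c is not None else "unknown" for c in cities]

print(dropped)
print(imputed)
print(labelled)
```

Predicting missing values (the fourth option) requires training a model on the complete rows and is not shown here.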

5. How Can You Choose a Classifier Based on a Training Set Data Size?

• Choosing a classifier based on the size of the training set involves considering several factors such as the
complexity of the problem, the amount of available data, and the computational resources available.

• For small training sets, simple classifiers like Naive Bayes or shallow decision trees may be more suitable, as they
are less prone to overfitting and require less data to train effectively. These classifiers are also computationally
less expensive, making them a practical choice for limited data scenarios.

• On the other hand, for large training sets, more complex classifiers like ensemble methods (e.g., random
forests, gradient boosting) or deep learning models (e.g., neural networks) may be more appropriate. These
classifiers are capable of capturing intricate patterns in the data but require a large amount of data to
generalize well and avoid overfitting.

6. Explain the Confusion Matrix with Respect to Machine Learning Algorithms.

A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It
displays the number of correct and incorrect predictions made by the model. It is often
used to measure the performance of classification models, which aim to predict a categorical label for each input
instance.

The matrix displays the number of test instances that fall into each of the following outcomes:

• True positives (TP): the model correctly predicts a positive data point.

• True negatives (TN): the model correctly predicts a negative data point.

• False positives (FP): the model predicts positive for a data point that is actually negative.

• False negatives (FN): the model predicts negative for a data point that is actually positive.
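
The four counts can be computed directly from the actual and predicted labels. The sketch below assumes binary labels with 1 as the positive class; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary classifier. Hypothetical helper."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}

precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(precision, recall)  # 0.75 0.75
```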

7. What Is a False Positive and False Negative and How Are They Significant?

• A false positive is an outcome where the model incorrectly predicts the positive class.

• A false negative is an outcome where the model incorrectly predicts the negative class.

In statistical hypothesis testing, the false positive rate is by convention usually set to 5%: for tests where there is
no meaningful difference between treatment and control, we will falsely conclude that there is a “statistically
significant” difference 5% of the time. Tests that are conducted with this 5% false positive rate are said to be run at
the 5% significance level.

8. What Are the Three Stages of Building a Model in Machine Learning?

• Data preparation

Data preparation is the process of preparing raw data so that it is suitable for further processing and analysis.

• Model creation

Model creation is the process of feeding an ML algorithm with data to help identify and learn good values for all
attributes involved.

• Deployment

Model deployment in machine learning is the process of integrating your model into an existing production
environment where it can take in an input and return an output.

9. What is Supervised Learning?

Supervised learning is a category of machine learning that uses labelled datasets to train algorithms to predict
outcomes and recognize patterns.

10. What is Unsupervised Learning?

Unsupervised learning, also known as unsupervised machine learning, uses machine learning (ML) algorithms to
analyse and cluster unlabelled data sets. These algorithms discover hidden patterns or data groupings without the
need for human intervention.

11. What is ‘Naive’ in a Naive Bayes?

Naive Bayes is a simple classification algorithm based on Bayes’ theorem of conditional probability (named after
Thomas Bayes).

The algorithm is called naive because it assumes that the features are independent of one
another and contribute equally to the outcome.

12. What is PCA? When do you use it?

Principal component analysis, or PCA, is a dimensionality reduction method that is often used to reduce the
dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of
the information in the large set.

The most important use of PCA is to represent a multivariate data table as a smaller set of variables (summary
indices) in order to observe trends, jumps, clusters and outliers.

13. What are Support Vectors in SVM?

The data points (vectors) that are closest to the hyperplane and that affect the position of the hyperplane are
called support vectors.

14. What is Bias in Machine Learning?

Bias in ML is a sort of error in which some aspects of a dataset are given more weight and/or representation than
others.

15. Explain the Difference Between Classification and Regression?

| Classification | Regression |
|---|---|
| The target variable is discrete. | The target variable is continuous. |
| Problems such as spam email classification and disease prediction are solved using classification algorithms. | Problems such as house price prediction and rainfall prediction are solved using regression algorithms. |
| We try to find the best possible decision boundary, which separates the two classes with the maximum possible separation. | We try to find the best-fit line, which represents the overall trend in the data. |
| Evaluation metrics like Precision, Recall, and F1-Score are used to evaluate the performance of classification algorithms. | Evaluation metrics like Mean Squared Error, R2-Score, and MAPE are used to evaluate the performance of regression algorithms. |
| Typical problems are binary classification and multi-class classification. | Typical models are linear regression models as well as non-linear models. |
| Input data are independent variables with a categorical dependent variable. | Input data are independent variables with a continuous dependent variable. |
| The algorithm's task is mapping the input value (x) to a discrete output variable (y). | The algorithm's task is mapping the input value (x) to a continuous output variable (y). |
| Output is categorical labels. | Output is continuous numerical values. |
| Objective is to predict categorical/class labels. | Objective is to predict continuous numerical values. |
| Example use cases are spam detection, image recognition, and sentiment analysis. | Example use cases are stock price prediction, house price prediction, and demand forecasting. |
| Examples of classification algorithms are: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Naive Bayes, Neural Networks, Multi-layer Perceptron (MLP), etc. | Examples of regression algorithms are: Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Support Vector Regression (SVR), Decision Trees for Regression, Random Forest Regression, K-Nearest Neighbors (K-NN) Regression, Neural Networks for Regression, etc. |

16. Explain the terms Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning?

Artificial intelligence (AI) refers to computer systems capable of performing complex tasks that historically only a
human could do, such as reasoning, making decisions, or solving problems.

Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data
and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is
inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and
other data to produce accurate insights and predictions.

17. What is the difference between deep learning and machine learning?

| S.No. | Machine Learning | Deep Learning |
|---|---|---|
| 1. | Machine Learning is a superset of Deep Learning. | Deep Learning is a subset of Machine Learning. |
| 2. | The data represented in Machine Learning is quite different compared to Deep Learning, as it uses structured data. | The data representation used in Deep Learning is quite different, as it uses artificial neural networks (ANN). |
| 3. | Machine Learning is an evolution of AI. | Deep Learning is an evolution of Machine Learning. Basically, it is how deep the machine learning is. |
| 4. | Machine learning typically works with thousands of data points. | Big data: millions of data points. |
| 5. | Outputs: numerical values, such as a classification or a score. | Anything from numerical values to free-form elements, such as free text and sound. |
| 6. | Uses various types of automated algorithms that turn to model functions and predict future action from data. | Uses a neural network that passes data through processing layers to interpret data features and relations. |
| 7. | Algorithms are directed by data analysts to examine specific variables in data sets. | Algorithms are largely self-directed on data analysis once they are put into production. |
| 8. | Machine Learning is widely used to stay competitive and learn new things. | Deep Learning solves complex machine-learning issues. |
| 9. | Training can be performed using the CPU (Central Processing Unit). | A dedicated GPU (Graphics Processing Unit) is typically needed for training. |
| 10. | More human intervention is involved in getting results. | Although more difficult to set up, deep learning requires less intervention once it is running. |
| 11. | Machine learning systems can be swiftly set up and run, but their effectiveness may be constrained. | Although they require additional setup time, deep learning algorithms can produce results immediately (although the quality is likely to improve over time as more data becomes available). |
| 12. | Its model takes less time to train due to its small size. | Training takes a huge amount of time because of the very large number of data points. |
| 13. | Humans explicitly do feature engineering. | Feature engineering is largely not needed because important features are automatically detected by neural networks. |
| 14. | Machine learning applications are simpler compared to deep learning and can be executed on standard computers. | Deep learning systems utilize much more powerful hardware and resources. |
| 15. | The results of an ML model are easy to explain. | The results of deep learning are difficult to explain. |
| 16. | Machine learning models can be used to solve straightforward or moderately challenging issues. | Deep learning models are appropriate for resolving challenging issues. |
| 17. | Banks, doctor’s offices, and mailboxes all employ machine learning already. | Deep learning technology enables increasingly sophisticated and autonomous algorithms, such as self-driving automobiles or surgical robots. |
| 18. | Machine learning involves training algorithms to identify patterns and relationships in data. | Deep learning uses complex neural networks with multiple layers to analyze more intricate patterns and relationships. |
| 19. | Machine learning algorithms can range from simple linear models to more complex models such as decision trees and random forests. | Deep learning algorithms are based on artificial neural networks that consist of multiple layers and nodes. |
| 20. | Machine learning algorithms typically require less data than deep learning algorithms, but the quality of the data is more important. | Deep learning algorithms require large amounts of data to train the neural networks but can learn and improve on their own as they process more data. |
| 21. | Machine learning is used for a wide range of applications, such as regression, classification, and clustering. | Deep learning is mostly used for complex tasks such as image and speech recognition, natural language processing, and autonomous systems. |
| 22. | Machine learning algorithms are often less accurate on complex tasks, but they are usually easier to train and require fewer computational resources. | Deep learning algorithms are often more accurate than machine learning algorithms on complex tasks, but they can also be more difficult to train and may require more computational resources. |

18. How do you select important variables while working on a data set?

When working on a data set, there are several methods to select important variables, depending on the nature of the
data and the specific goals of the analysis. Some common approaches include:

Univariate Selection: This involves selecting variables based on their individual performance in relation to the target
variable, using statistical tests such as t-tests, ANOVA, or correlation coefficients.

Feature Importance: Techniques such as decision trees, random forests, or gradient boosting can be used to rank
variables based on their importance in predicting the target variable.

Lasso Regression: This method involves adding a penalty for non-zero coefficients, effectively shrinking some
coefficients to zero, thus performing variable selection.

Principal Component Analysis (PCA): This technique transforms the original variables into a new set of uncorrelated
variables, and the importance of the original variables can be assessed based on the variance they explain.

Domain Knowledge: Subject matter experts can provide valuable insights into which variables are likely to be
important based on their understanding of the underlying processes.

Automated Feature Selection: There are various algorithms and tools that can automatically select important
variables based on predefined criteria, such as recursive feature elimination or forward/backward selection.

19. There are many machine learning algorithms available now. Given a data set, how can one determine which
algorithm to use for it?

1. Understand Your Project Goal. ...

2. Analyze Your Data by Size, Processing, and Annotation Required. ...

3. Evaluate the Speed and Training Time. ...

4. Find Out the Linearity of Your Data. ...

5. Decide on the Number of Features and Parameters.

20. How are covariance and correlation different from one another?

Covariance indicates the direction of the linear relationship between variables. Correlation, on the other hand,
measures both the strength and direction of the linear relationship between two variables.
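
The distinction can be made concrete with a small calculation. The sketch below uses the sample covariance and the Pearson correlation coefficient; the function names and the sample data are assumptions for illustration. Rescaling one variable changes the covariance but leaves the correlation unchanged, which is why only correlation expresses strength.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
y_scaled = [value * 100 for value in y]   # same relationship, different units

print(covariance(x, y), covariance(x, y_scaled))    # covariance grows with the units
print(correlation(x, y), correlation(x, y_scaled))  # correlation stays the same
```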

21. State the differences between causality and correlation?

Causation means one thing causes another; in other words, action A causes outcome B. On the other hand,
correlation is simply a relationship where action A relates to action B, but one event doesn't necessarily cause the
other event to happen.

22. We look at machine learning software almost all the time. How do we apply Machine Learning to Hardware?

AI and ML can be used for hardware design at different stages of the design cycle and levels of abstraction.

There are two primary processors used as part of most AI/ML tasks: central processing units (CPUs) and graphics
processing units (GPUs).

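23. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?

One-hot encoding replaces a categorical column with one binary column per category, so it increases the
dimensionality of the dataset. Label encoding maps each category to an integer within a single column, so the
dimensionality stays the same.

To prevent biases from being introduced, One-Hot Encoding is preferable for nominal data (where there is no
inherent order among categories). Label encoding, however, might be more appropriate for ordinal data (where
categories naturally have an order).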
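
A plain-Python sketch of both encodings is given below. The column `colors` and the alphabetical ordering of categories are assumptions for this example; libraries may order categories differently.

```python
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))   # ['blue', 'green', 'red']

# Label encoding: one integer per category, still a single column.
label_map = {category: index for index, category in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary column per category, so 1 column becomes 3.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]

print(label_encoded)  # [2, 1, 0, 1]
print(one_hot)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```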

24. What is Semi-supervised Machine Learning?

Semi-supervised learning is a branch of machine learning that combines supervised and unsupervised learning by
using both labeled and unlabeled data to train artificial intelligence (AI) models for classification and regression
tasks.

Though semi-supervised learning is generally employed for the same use cases in which one might otherwise use
supervised learning methods, it’s distinguished by various techniques that incorporate unlabeled data into model
training, in addition to the labeled data required for conventional supervised learning.

Semi-supervised learning methods are especially relevant in situations where obtaining a sufficient amount
of labeled data is prohibitively difficult or expensive, but large amounts of unlabeled data are relatively
easy to acquire. In such scenarios, neither fully supervised nor unsupervised learning methods will provide adequate
solutions.

25. How do you choose which algorithm to use for a dataset?

Steps to Choose the Best Machine Learning Algorithm

Here is a step-by-step procedure for choosing the right machine learning algorithm:

1. Understand Your Problem: Begin by gaining a deep understanding of the problem you are trying to solve.
What is your goal? Is the problem one of classification, regression, clustering, or something else?
What kind of data are you working with?

2. Process the Data: Ensure that your data is in the right format for your chosen algorithm. Clean and prepare
your data before modelling.

3. Exploration of Data: Conduct data analysis to gain insights into your data. Visualizations and
statistics help you understand the relationships within your data.

4. Metrics Evaluation: Decide on the metrics that will measure the success of the model. Choose a
metric that aligns with your problem.

5. Simple Models: Begin with simple, easy-to-learn algorithms. For classification, try logistic regression or a
decision tree. A simple model provides a baseline for comparison.

6. Use Multiple Algorithms: Try multiple algorithms to check which one performs best on your dataset. These
may include:

• Decision Trees

• Gradient Boosting (XGBoost, LightGBM)

• Random Forest

• k-Nearest Neighbors (KNN)

• Naive Bayes

• Support Vector Machines (SVM)

• Neural Networks (Deep Learning)

7. Hyperparameter Tuning: Grid Search and Random Search can help adjust the parameters of the chosen
algorithm to find the best combination.

8. Cross-Validation: Use cross-validation to assess the performance of your models. This helps
detect and prevent overfitting.

9. Comparing Results: Evaluate the models' performance using the chosen evaluation metrics. Compare their
performance and choose the one that best aligns with the problem's goal.

10. Consider Model Complexity: Balance the complexity of the model against its performance, and prefer the
algorithm that generalizes better.
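
The cross-validation step can be sketched as an index split in plain Python. The helper `k_fold_indices` is hypothetical and shows only how the folds are formed; fitting and scoring a model on each fold is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

Each sample lands in exactly one validation fold, so every data point is used for both training and validation across the k rounds.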

26. Mention the difference between Data Mining and Machine learning?

| S.No. | Data Mining | Machine Learning |
|---|---|---|
| 1. | Extracting useful information from a large amount of data | Introduces algorithms from data as well as from past experience |
| 2. | Used to understand the data flow | Teaches the computer to learn and understand from the data flow |
| 3. | Huge databases with unstructured data | Existing data as well as algorithms |
| 4. | Models can be developed using data mining techniques | Machine learning algorithms can be used in decision trees, neural networks and some other areas of artificial intelligence |
| 5. | More human interference is involved | No human effort required after design |
| 6. | It is used in cluster analysis | It is used in web search, spam filters, fraud detection and computer design |
| 7. | Data mining abstracts from the data warehouse | Machine learning reads machine |
| 8. | Data mining is more of a research activity using methods like machine learning | Self-learned; trains the system to do intelligent tasks |
| 9. | Applied in a limited area | Can be used in a vast area |
| 10. | Uncovering hidden patterns and insights | Making accurate predictions or decisions based on data |
| 11. | Exploratory and descriptive | Predictive and prescriptive |
| 12. | Historical data | Historical and real-time data |
| 13. | Patterns, relationships, and trends | Predictions, classifications, and recommendations |
| 14. | Clustering, association rule mining, outlier detection | Regression, classification, clustering, deep learning |
| 15. | Data cleaning, transformation, and integration | Data cleaning, transformation, and feature engineering |
| 16. | Strong domain knowledge is often required | Domain knowledge is helpful, but not always necessary |
| 17. | Can be used in a wide range of applications, including business, healthcare, and social science | Primarily used in applications where prediction or decision-making is important, such as finance, manufacturing, and cybersecurity |

27. What is inductive machine learning?

The Inductive Learning Algorithm (ILA) is an iterative, inductive machine learning algorithm used to generate a set of classification rules of the form “IF-THEN” from a set of examples. It produces rules at each iteration and appends them to the rule set.

There are basically two methods of knowledge extraction: from domain experts and through machine learning. For a very large amount of data, domain experts are neither practical nor reliable, so we move towards the machine learning approach. One way to use machine learning is to replicate the expert’s logic in the form of algorithms, but this work is tedious, time-consuming, and expensive. So we move towards inductive algorithms, which generate the strategy for performing a task themselves and do not need separate instructions at each step.

28. What are the three stages to build the hypotheses or model in machine learning?

The three stages of building a machine learning model are:

• Model Building: Choose a suitable algorithm for the model and train it according to the requirement.

• Model Testing: Check the accuracy of the model using the test data.

• Applying the Model: Make the required changes after testing and use the final model for real-time projects.

29. What is the standard approach to supervised learning?

The standard approach to supervised learning is to split the set of examples into a training set and a test set.
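
The split described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `train_test_split`, the 80/20 ratio, and the fixed seed are assumptions made for the example, not part of the original answer.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(10))
train, test = train_test_split(examples, test_fraction=0.2)
print(len(train), len(test))
```

The model is fitted on `train` only, and `test` is held back to estimate how well it generalizes to unseen examples.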

30. List down various approaches for machine learning?

Based on the methods and way of learning, machine learning is divided into mainly four types, which are:

1. Supervised Machine Learning

2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning

31. Explain what is the function of ‘Unsupervised Learning’?

Unsupervised learning, also known as unsupervised machine learning, uses machine learning (ML) algorithms to analyze and cluster unlabeled data sets. These algorithms discover hidden patterns or data groupings without the need for human intervention.

Unsupervised learning's ability to discover similarities and differences in information makes it well suited to exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.
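
As a rough illustration of grouping unlabeled data, the sketch below runs a very small k-means loop on one-dimensional values. The helper `kmeans_1d`, the sample spending figures, and the starting centers are all assumptions chosen for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

spend = [5, 7, 6, 50, 52, 49]  # hypothetical customer spending, no labels
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(clusters)
```

No labels are supplied: the low-spending and high-spending groups emerge from the data alone, which is the kind of customer segmentation mentioned above.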

32. What is algorithm independent machine learning?

Algorithm independent machine learning refers to the development of machine learning models that are not tied to specific algorithms. In traditional machine learning, specific algorithms such as decision trees, support vector machines, or neural networks are used to train models. However, algorithm independent machine learning aims to create models that can adapt to different algorithms, allowing for more flexibility and potentially better performance. This approach focuses on building models that are agnostic to the underlying algorithms, making it easier to switch between different algorithms based on the specific requirements of a task or problem.

33. What is a classifier in machine learning?

Classification is defined as the process of recognizing, understanding, and grouping objects and ideas into preset categories, a.k.a. “sub-populations.” A classifier is the algorithm or trained model that performs this assignment. With the help of pre-categorized training datasets, classification programs in machine learning leverage a wide range of algorithms to classify future datasets into the relevant categories.

Classification algorithms use input training data to predict the likelihood or probability that new data will fall into one of the predetermined categories. One of the most common applications of classification is filtering emails into “spam” or “non-spam”, as used by today’s top email service providers.

34. What are the advantages of Naive Bayes?

The following are some of the benefits of the Naive Bayes classifier:

• It is simple and easy to implement

• It doesn’t require as much training data

• It handles both continuous and discrete data

• It is highly scalable with the number of predictors and data points

• It is fast and can be used to make real-time predictions

• It is not sensitive to irrelevant features
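
To show how little code a Naive Bayes classifier needs, here is a minimal sketch of a word-count (multinomial) variant with add-one smoothing. The function names, the four toy messages, and the "spam"/"ham" labels are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs is a list of (words, label) pairs. Returns the counts the model needs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def predict(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior of the class
        for w in words:
            # Add-one (Laplace) smoothing avoids zero probability for unseen words.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train_naive_bayes(docs)
print(predict(model, ["free", "money"]))
```

Training is a single pass of counting, which is why the method is fast and works with small training sets.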

35. What is Inductive Logic Programming in Machine Learning?

Inductive logic programming is the subfield of machine learning that uses first-order logic to represent hypotheses and data. Because first-order logic is expressive and declarative, inductive logic programming specifically targets problems involving structured data and background knowledge.

36. What is Model Selection in Machine Learning?

Model selection is the process of choosing the machine learning model most appropriate for a given problem. It can be used to compare models of the same type that have been configured with different hyperparameters, as well as models of different types.

37. What are the two methods used for the calibration in Supervised Learning?

Platt Calibration (also called Platt scaling) and Isotonic Regression.

Platt scaling is preferable when the calibration curve has a sigmoid shape and when little calibration data is available. Isotonic regression, being a non-parametric method, is preferable for non-sigmoid calibration curves and in situations where plenty of additional data can be used for calibration.

38. What is the difference between heuristics for rule learning and heuristics for decision trees?

• The heuristic for rule learning is open to change as the learning proceeds.
• In decision trees, on the other hand, the heuristics are fixed so that a decision can be reached.
• The key difference is that the heuristics for decision trees evaluate the average quality of a number of disjoint sets, while rule learners only evaluate the quality of the set of instances covered by the candidate rule.

39. What is ensemble learning?

Ensemble methods in machine learning combine the insights obtained from multiple learning models to facilitate accurate and improved decisions. These methods follow the same principle as consulting several opinions before buying an air conditioner.

In learning models, noise, variance, and bias are the major sources of error. Ensemble methods help minimize these error-causing factors, thereby improving the accuracy and stability of machine learning (ML) algorithms.

Example 2: Assume that you are developing an app for the travel industry. It is obvious that before making the app public, you will want to get crucial feedback on bugs and potential loopholes that are affecting the user experience. What are your available options for obtaining critical feedback? 1) Soliciting opinions from your parents, spouse, or close friends. 2) Asking your co-workers who travel regularly and then evaluating their response. 3) Rolling out your travel and tourism app in beta to gather feedback from non-biased audiences and the travel community.

Think for a moment about what you are doing. You are taking into account different views and ideas from a wide range of people to fix issues that are limiting the user experience. The ensemble neural network and ensemble algorithm do precisely the same thing.

Example 3: Imagine a group of blindfolded people playing the touch-and-tell game, where they are asked to touch and explore a mini donut factory that none of them has ever seen before. Since they are blindfolded, their version of what a mini donut factory looks like will vary, depending on the parts of the appliance they touch. Now, suppose they are individually asked to describe what they touched. In that case, their individual experiences will give a precise description of specific parts of the mini donut factory. Still, collectively, their combined experiences will provide a highly detailed account of the entire equipment.
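
The idea of combining several opinions can be sketched as simple majority voting. The three "models" below are hypothetical threshold rules invented for the example; a real ensemble would combine trained models in the same way.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers, each a simple rule on one number.
models = [
    lambda x: "high" if x > 10 else "low",
    lambda x: "high" if x > 20 else "low",
    lambda x: "high" if x > 30 else "low",
]

def ensemble_predict(x):
    # Collect one prediction per model, then let the majority decide.
    return majority_vote([m(x) for m in models])

print(ensemble_predict(25))
```

Each rule on its own is crude, but the vote smooths out individual mistakes, which is the error-reduction effect described above.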

40. What is dimension reduction in Machine Learning?

Dimensionality reduction refers to the method of reducing the number of variables in a training dataset used to develop machine learning models. The process keeps the dimensionality of the data in check by projecting high-dimensional data to a lower-dimensional space that encapsulates the 'core essence' of the data.

DATA ANALYSIS FUNDAMENTALS


1Q) What is data analysis?

1A) Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, patterns, and insights.

2Q) Why is data analysis important?

2A) Data analysis helps in making informed decisions, understanding trends, solving problems, and gaining competitive advantages in various fields.

3Q) What are the steps involved in data analysis?

3A) The steps include data collection, data cleaning, data exploration, data visualization, data modeling, and interpretation of results.

4Q) Explain the term 'data cleaning'.

4A) Data cleaning involves identifying and correcting errors, inconsistencies, and missing values in a dataset to ensure its accuracy and reliability for analysis.

5Q) What is data exploration?

5A) Data exploration involves summarizing the main characteristics of a dataset, such as its distribution, central tendency, and relationships between variables, using descriptive statistics and visualization techniques.

6Q) What are descriptive statistics?

6A) Descriptive statistics are numerical and graphical techniques used to summarize and describe the main features of a dataset, including measures of central tendency, variability, and distribution.

7Q) Give examples of descriptive statistics.

7A) Examples include mean, median, mode, range, standard deviation, variance, histograms, box plots, and scatter plots.

8Q) Explain the purpose of data visualization.

8A) Data visualization is used to present data visually through graphs, charts, and maps, making it easier to understand complex patterns, trends, and relationships in the data.

9Q) What are the common types of data visualizations?

9A) Common types include bar charts, line graphs, pie charts, histograms, box plots, scatter plots, and heat maps.

10Q) What is a histogram?

10A) A histogram is a graphical representation of the distribution of numerical data, showing the frequency of values within different intervals or bins.
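
The binning behind a histogram can be sketched without any plotting library. The helper `histogram`, the sample ages, and the bin width of 10 are assumptions for the example.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

ages = [21, 25, 29, 32, 35, 38, 41, 58]
bins = histogram(ages, bin_width=10)
for start, count in bins.items():
    # One '#' per value gives a rough text version of the bars.
    print(f"{start}-{start + 9}: {'#' * count}")
```

A plotting tool draws the same counts as bars; the choice of bin width controls how coarse or fine the picture of the distribution is.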

11Q) Explain the concept of correlation.

11A) Correlation measures the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a perfect positive correlation, and 0 indicates no correlation.
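
The coefficient described above is usually the Pearson correlation. A minimal sketch, assuming two equal-length lists of numbers that are not constant; the function name `pearson` and the sample data are made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4]
print(pearson(hours, [2, 4, 6, 8]))   # values rise together
print(pearson(hours, [8, 6, 4, 2]))   # one rises as the other falls
```

The first pair lies on a rising straight line and the second on a falling one, so they sit at the two ends of the -1 to +1 range.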

12Q) How is correlation different from causation?

12A) Correlation indicates a relationship between two variables but does not imply causation, that is, that changes in one variable cause changes in the other. Causation requires additional evidence to establish a cause-and-effect relationship.

13Q) What is a scatter plot?

13A) A scatter plot is a graphical representation of the relationship between two continuous variables, with one variable plotted on the x-axis and the other on the y-axis, showing individual data points.

14Q) Explain the concept of central tendency.

14A) Central tendency refers to the tendency of data to cluster around a central value or average. Measures of central tendency include the mean, median, and mode.

15Q) What is the mean?

15A) The mean, also known as the average, is the sum of all values in a dataset divided by the number of values.

16Q) What is the median?

16A) The median is the middle value of a dataset when arranged in ascending or descending order. If there is an
even number of values, the median is the average of the two middle values.

17Q) What is the mode?

17A) The mode is the value that appears most frequently in a dataset.
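
The three measures of central tendency defined above are available in Python's standard `statistics` module. The sample scores are made up for the example.

```python
import statistics

scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# With an even number of values, the median averages the two middle ones.
even_median = statistics.median([1, 2, 3, 4])

print(mean, median, mode, even_median)
```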

18Q) Explain the concept of variability.

18A) Variability measures the extent to which data points differ from each other. Measures of variability include the range, standard deviation, and variance.

19Q) What is the range?

19A) The range is the difference between the maximum and minimum values in a dataset, representing the spread of the data.

20Q) What is standard deviation?

20A) Standard deviation is the square root of the variance. It indicates how far data points typically lie from the mean, providing a measure of the dispersion or spread of the data.

21Q) What is variance?

21A) Variance is the average of the squared differences between each data point and the mean, representing the variability of the data.
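
These measures of spread can be computed with the standard `statistics` module. The sketch assumes the population formulas (dividing by n), which match the definition of variance given above; the data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # maximum minus minimum
variance = statistics.pvariance(data)  # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

# The sample versions divide by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(data_range, variance, std_dev, sample_variance)
```

The population versions are appropriate when the data is the whole group of interest; the sample versions are the usual choice when the data is a sample from a larger population.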

22Q) Explain the concept of outliers.

22A) Outliers are data points that significantly differ from the rest of the dataset. They can distort statistical analyses and should be carefully examined to determine whether they represent valid data or errors.

23Q) How can outliers be identified?

23A) Outliers can be identified using statistical methods such as the interquartile range (IQR), z-scores, or visual inspection of box plots and scatter plots.
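
A minimal sketch of the IQR method, assuming the common rule of thumb that flags values more than 1.5 × IQR beyond the quartiles. The helper name `iqr_outliers`, the multiplier default, and the readings are assumptions for the example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(readings))
```

Flagged values still need the manual check described above: the rule identifies candidates, it does not decide whether they are errors.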

24Q) What is a box plot?

24A) A box plot, also known as a box-and-whisker plot, is a graphical representation of the distribution of numerical data through its quartiles, median, and outliers.

25Q) Explain the concept of data transformation.

25A) Data transformation involves converting or modifying the original data to meet specific assumptions or requirements for analysis, such as normalizing data, standardizing scales, or applying mathematical functions.

26Q) What is data normalization?

26A) Data normalization is the process of scaling numerical data to a standard range, typically between 0 and 1 or -1 and 1, to eliminate differences in scale and facilitate analysis. It helps in comparing variables with different units and ensures that no variable dominates the analysis due to its scale.

27Q) What is data standardization?

27A) Data standardization, also known as z-score normalization, involves scaling numerical data to have a mean of 0 and a standard deviation of 1. It allows for easier interpretation of data by expressing values in terms of standard deviations from the mean.
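
Both scaling methods are short enough to sketch directly. The function names and sample heights are assumptions for the example, and both helpers assume the values are not all identical (otherwise the divisor would be zero).

```python
import statistics

def min_max_normalize(values):
    """Scale values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Convert values to z-scores: mean 0, standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

heights_cm = [150, 160, 170, 180, 190]
normalized = min_max_normalize(heights_cm)
z_scores = standardize(heights_cm)
print(normalized)
print(z_scores)
```

Normalization pins the smallest value to 0 and the largest to 1, while standardization expresses each value as a number of standard deviations from the mean.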

28Q) Explain the concept of data aggregation.

28A) Data aggregation involves combining individual data points into groups, bins, or summary statistics to reduce the complexity of the dataset while preserving essential information. It is often used to analyze large datasets or create visualizations.
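
Aggregation can be sketched as grouping records by a key and summarizing each group. The sales records and region names below are made up for the example.

```python
from collections import defaultdict

# Hypothetical (region, amount) sales records.
sales = [("North", 120), ("South", 80), ("North", 200), ("East", 50), ("South", 20)]

totals = defaultdict(int)
counts = defaultdict(int)
for region, amount in sales:
    totals[region] += amount
    counts[region] += 1

# One summary row per region instead of one row per sale.
averages = {region: totals[region] / counts[region] for region in totals}
print(dict(totals))
print(averages)
```

Five individual records collapse into three per-region summaries, which is the reduction in complexity described above.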

29Q) What are pivot tables?

29A) Pivot tables are data summarization tools used in spreadsheet programs like Microsoft Excel or Google Sheets. They allow users to rearrange and summarize tabular data to extract insights by dragging and dropping variables into rows, columns, or value fields.

30Q) What is data filtering?

30A) Data filtering involves selecting specific subsets of data based on predefined criteria or conditions. It helps in focusing the analysis on relevant data points and excluding irrelevant or erroneous entries.

31Q) Explain the concept of data mining.

31A) Data mining is the process of discovering patterns, trends, and insights from large datasets using statistical and machine learning techniques. It aims to extract valuable knowledge from data to support decision-making and prediction.

32Q) What are some common data mining techniques?

32A) Common data mining techniques include classification, clustering, association rule mining, regression analysis, and anomaly detection.

33Q) What is classification in data mining?

33A) Classification is a data mining technique used to categorize data into predefined classes or categories based on input features. It involves building a predictive model that assigns new observations to the most likely class based on their characteristics.

34Q) What is clustering in data mining?

34A) Clustering is a data mining technique used to group similar data points together based on their characteristics or attributes. It aims to discover natural groupings or clusters within a dataset without predefined class labels.

35Q) What is association rule mining?

35A) Association rule mining is a data mining technique used to discover interesting relationships or associations between variables in large datasets. It identifies patterns such as frequent itemsets or rules indicating co-occurrence or correlation between items.
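
Two standard measures for an association rule are support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). A minimal sketch; the function names and the shopping baskets are assumptions for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of both sides together divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Bread and milk appear together in two of the four baskets, and two of the three baskets containing bread also contain milk, so the rule "bread implies milk" holds in about two thirds of the cases where it applies.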

36Q) What is regression analysis?

36A) Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. It helps in predicting the value of the dependent variable based on the values of the independent variables.
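
For one independent variable, the simplest case is a straight line fitted by least squares. This is a sketch under that assumption; the function name `fit_line` and the study-hours data are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4]         # independent variable
scores = [52, 54, 56, 58]    # dependent variable
slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the score for 5 hours of study.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

Here each extra hour adds two points, so the fitted line has slope 2 and intercept 50 and predicts a score of 60 for five hours.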

37Q) What is anomaly detection?

37A) Anomaly detection is a data mining technique used to identify unusual or abnormal observations in a dataset that deviate from expected behavior. It is used for detecting fraud, errors, or outliers in various domains.

38Q) What is data storytelling?

38A) Data storytelling is the process of using data visualizations, narratives, and compelling storytelling techniques to communicate insights and findings derived from data analysis effectively. It helps in making data-driven decisions and influencing stakeholders.

39Q) Explain the concept of exploratory data analysis (EDA).

39A) Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. EDA helps in understanding the data, generating hypotheses, and identifying patterns or relationships for further investigation.

40Q) What are some common tools and software used for data analysis?

40A) Common tools and software for data analysis include spreadsheet programs like Microsoft Excel, statistical software like R and Python with libraries such as Pandas, NumPy, and SciPy, business intelligence tools like Tableau and Power BI, and programming environments like Jupyter Notebook.
