AIML
→C FUNDAMENTALS
→DATA STRUCTURES
→JAVA BASICS
→BASICS OF OS
→DBMS
→ARTIFICIAL INTELLIGENCE BASICS
→MACHINE LEARNING FUNDAMENTALS
C FUNDAMENTALS
5. What is a Preprocessor?
A preprocessor is a software program that processes a source file before
sending it to be compiled. Inclusion of header files, macro expansions,
conditional compilation, and line control are all possible with the
preprocessor.
8. What is the purpose of #include?
#include is a preprocessor directive that tells the preprocessor to insert the contents of the named header file into the program before compilation, for example the standard input-output header stdio.h.
16. Explain the use of volatile keyword in C.
The volatile keyword is used to indicate that a variable may be changed unexpectedly by factors outside the program's control, such as hardware interrupts or concurrent threads, so the compiler must not optimize away accesses to it.
DATA STRUCTURES
A queue is a linear data structure that allows users to store items in a list in a systematic manner. Items are added (enqueued) at the rear end and removed (dequeued) from the front.
Some applications of queue data structure:
• Breadth-first search algorithm in graphs
• Operating systems: job scheduling, disk scheduling, CPU scheduling, etc.
• Call management in call centres
Stack vs Queue:
• Stack is a linear data structure where data is added and removed from the top; queue is a linear data structure where data is added at the rear end and removed from the front.
• Stack is based on the LIFO (Last In First Out) principle; queue is based on the FIFO (First In First Out) principle.
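The LIFO/FIFO contrast can be sketched in Python, using a list as the stack and collections.deque as the queue (the data is illustrative):

```python
from collections import deque

# Stack: items are added and removed at the same end (the top) -> LIFO.
stack = []
for item in [1, 2, 3]:
    stack.append(item)      # push onto the top
stack_order = [stack.pop() for _ in range(len(stack))]      # pop from the top

# Queue: items enter at the rear and leave from the front -> FIFO.
queue = deque()
for item in [1, 2, 3]:
    queue.append(item)      # enqueue at the rear
queue_order = [queue.popleft() for _ in range(len(queue))]  # dequeue from the front

print(stack_order)  # [3, 2, 1] - last in, first out
print(queue_order)  # [1, 2, 3] - first in, first out
```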
6. What is a linked list data structure? What are the applications for the
Linked list?
A linked list can be thought of as a series of nodes (or items) connected by links. Each node stores its data together with a reference to the next node in the sequence; the links, rather than physical placement in memory, determine the order of the elements.
Some applications of linked list data structure:
• Stacks, queues, binary trees, and graphs can be implemented using linked lists.
• Dynamic management for Operating System memory.
• Round robin scheduling for operating system tasks.
• Forward and backward navigation in web browsers.
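The node-and-link structure described above can be sketched in Python (the class names are illustrative, not from the original notes):

```python
class Node:
    """A node holds data and a link (reference) to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    """A minimal singly linked list: append at the tail, traverse from the head."""
    def __init__(self):
        self.head = None

    def append(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:   # walk the links to the last node
            cur = cur.next
        cur.next = node

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:        # the links, not memory order, define traversal
            out.append(cur.data)
            cur = cur.next
        return out

ll = LinkedList()
for x in [10, 20, 30]:
    ll.append(x)
print(ll.to_list())  # [10, 20, 30]
```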
Array vs linked list:
• An array is a collection of data elements of the same type; a linked list is a collection of entities known as nodes, where each node is divided into two sections: data and address.
• An array keeps its data elements in a single contiguous block of memory; a linked list stores its elements anywhere in memory.
9. What is binary tree data structure? What are the applications for binary trees?
A binary tree is a data structure used to organize data in a way that allows efficient retrieval and manipulation. Each node in a binary tree has at most two children, referred to as the left and right child. It is widely used in computer networks for storing routing table information. Some of the applications are:
• Decision trees
• Expression evaluation
• Database indices
A binary search tree is a binary tree that stores items in sorted order: for each node, every key in its left subtree is smaller than the node's key, and every key in its right subtree is larger. Each node stores a key, used to locate the item, and optionally an associated value.
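The sorted-order property can be sketched in Python (keys only; names are illustrative):

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert a key, keeping smaller keys to the left and larger keys to the right."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Follow the ordering property: one comparison per level."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def inorder(root):
    """In-order traversal of a BST yields the keys in sorted order."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
print(inorder(root))      # [1, 3, 6, 8, 10]
print(search(root, 6))    # True
print(search(root, 7))    # False
```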
13. What is the difference between the Breadth First Search (BFS) and Depth
First Search (DFS)?
Breadth First Search (BFS) vs Depth First Search (DFS):
• BFS stands for "Breadth First Search"; DFS stands for "Depth First Search".
• BFS performs better when the target is close to the source; DFS performs better when the target is far from the source.
• BFS uses a queue: traversed nodes are removed from the front of the queue as their neighbours are explored. DFS uses a stack: visited nodes are pushed onto the stack and removed when there are no more nodes left to visit along that branch.
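The queue-versus-stack distinction can be sketched over a small adjacency-list graph (the graph data is made up for illustration):

```python
from collections import deque

# A small directed graph as an adjacency list (illustrative data).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def bfs(start):
    """Explore level by level: a queue yields nodes closest to the source first."""
    order, seen, frontier = [], {start}, deque([start])
    while frontier:
        node = frontier.popleft()          # FIFO -> breadth first
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return order

def dfs(start):
    """Explore as deep as possible: a stack dives down one branch before backtracking."""
    order, seen, frontier = [], set(), [start]
    while frontier:
        node = frontier.pop()              # LIFO -> depth first
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        for nbr in reversed(graph[node]):  # reversed so neighbours pop in list order
            frontier.append(nbr)
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D']
print(dfs("A"))  # ['A', 'B', 'D', 'C']
```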
JAVA BASICS
Variables are locations in memory that can hold values. Before assigning any value to
a variable, it must be declared.
2. What are the kinds of variables in Java? What are their uses?
• Java has three kinds of variables namely, the instance variable, the local variable and
the class variable.
• Local variables are used inside blocks as counters or in methods as temporary variables
and are used to store information needed by a single method.
• Instance variables are used to define attributes or the state of a particular object and are
used to store information needed by multiple methods in the objects.
• Class variables are global to a class and to all the instances of the class, and are useful for communicating between different objects of the same class or keeping track of global state.
• Variables with the same data type can be declared together. Local variables
must be given a value before usage.
A variable's type can be any data type that Java supports: one of the eight primitive data types, the name of a class or interface, or an array.
A literal represents a value of a certain type, where the type describes how that value behaves. There are different types of literals, namely number literals, character literals, boolean literals, string literals, etc.
7. What is an array?
An array is a collection of elements of the same type stored under a single name. The array variable's declaration indicates the type of object that the array holds. Ex: int arr[];
9. What is a string?
A string is an object of the String class that represents a sequence of characters.
10. When a string literal is used in the program, Java automatically creates
instances of the string class.
True
The addition operator (+).
The array subscript expression can be used to change the values of the elements of the
array.
If a variable is declared final, its value cannot be changed after initialization; it becomes a constant.
15. What is meant by garbage collection?
Garbage collection is the process by which the Java runtime automatically reclaims memory occupied by objects that are no longer referenced by the program.
Methods are functions that operate on instances of the classes in which they are defined. Objects can communicate with each other using methods and can call methods in other classes. A method definition has four parts: the name of the method, the type of object or primitive type the method returns, a list of parameters, and the body of the method. A method's signature is the combination of the first three parts mentioned above.
The getClass() method can be used to find out what class an object belongs to. This method is defined in the Object class and is available to all objects.
The package statement defines a namespace in which classes are stored. If you omit the package statement, the classes are put into the default package. Signature: package pkg;
Use:
• It specifies the package to which the classes defined in a file belong.
• A package is both a naming and a visibility control mechanism.
22. What do you understand by package access specifier?
Package (default) access: when no access specifier is given, a member is visible only to other classes in the same package.
Private: Anything declared private cannot be seen outside of its class.
An interface is similar to a class, but it may contain only method signatures, not bodies.
BASICS OF OS
6. What is Thrashing?
Thrashing is a situation when the performance of a computer degrades or collapses.
Thrashing occurs when a system spends more time processing page faults than executing
transactions. While processing page faults is necessary in order to appreciate the benefits
of virtual memory, thrashing has a negative effect on the system. As the page fault rate
increases, more transactions need processing from the paging device. The queue at the
paging device increases, resulting in increased service time for a page fault.
7. What is Buffer?
A buffer is a memory area that stores data being transferred between two devices or
between a device and an application.
Virtual memory creates an illusion that each user has one or more contiguous address
spaces, each beginning at address zero. The sizes of such virtual address spaces are
generally very high. The idea of virtual memory is to use disk space to extend the RAM.
Running processes don’t need to care whether the memory is from RAM or disk. The
illusion of such a large amount of memory is created by subdividing the virtual memory
into smaller pieces, which can be loaded into physical memory whenever they are needed
by a process.
A kernel is the central component of an operating system that manages the operations of
computers and hardware. It basically manages operations of memory and CPU time. It
is a core component of an operating system. Kernel acts as a bridge between applications
and data processing performed at the hardware level using interprocess communication
and system calls.
Multi-programming increases CPU utilization by organizing jobs (code and data) so that
the CPU always has one to execute. The main objective of multiprogramming is to keep
multiple jobs in main memory. If one job is busy with I/O, the CPU can be assigned to another job.
17. Briefly explain FCFS.
FCFS stands for First Come First served. In the FCFS scheduling algorithm, the job that
arrived first in the ready queue is allocated to the CPU and then the job that came second
and so on. FCFS is a non-preemptive scheduling algorithm: a process holds the CPU until it either terminates or performs I/O. Thus, if a longer job has been assigned to the CPU, many shorter jobs after it will have to wait.
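The waiting-time effect of a long job arriving first can be sketched with a few made-up burst times:

```python
def fcfs_waiting_times(burst_times):
    """FCFS: each job waits for the total burst time of every job ahead of it."""
    waits, elapsed = [], 0
    for burst in burst_times:
        waits.append(elapsed)   # time spent waiting before getting the CPU
        elapsed += burst
    return waits

# A long job (burst 10) arriving first makes the short jobs behind it wait.
waits = fcfs_waiting_times([10, 1, 2])
avg_wait = sum(waits) / len(waits)
print(waits)     # [0, 10, 11]
print(avg_wait)  # 7.0
```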
A round-robin scheduling algorithm schedules processes fairly by giving each job a fixed time slot or quantum, interrupting the job if it is not completed by then, and moving on to the next job in the ready queue. Cycling through the jobs in fixed time slices is what makes this scheduling fair.
A redundant array of independent disks is a set of several physical disk drives that the
operating system sees as a single logical unit. It played a significant role in narrowing
the gap between increasingly fast processors and slow disk drives. RAID has different
levels:
• Level-0
• Level-1
• Level-2
• Level-3
• Level-4
• Level-5
• Level-6
20. What is Banker’s algorithm?
The banker's algorithm is a resource allocation and deadlock avoidance algorithm that tests for safety by simulating the allocation of the predetermined maximum possible amounts of all resources, then makes a "safe-state" check to test for possible activities, before deciding whether allocation should be allowed to continue.
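The safe-state check can be sketched as follows (the matrices are illustrative, not from the original notes):

```python
def is_safe(available, max_need, allocated):
    """Banker's safety check: try to find an order in which every process
    can obtain its remaining need and then release what it holds."""
    n = len(max_need)
    need = [[m - a for m, a in zip(max_need[i], allocated[i])] for i in range(n)]
    work, finished = list(available), [False] * n
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(len(work))):
                # Process i can run to completion and release its allocation.
                work = [w + a for w, a in zip(work, allocated[i])]
                finished[i] = True
                progress = True
    return all(finished)

# Three processes, one resource type (numbers made up for illustration).
safe = is_safe(available=[3],
               max_need=[[9], [4], [7]],
               allocated=[[3], [1], [3]])
print(safe)  # True - an order exists: P1, then P2, then P0
```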
21. State the main difference between logical and physical address space?
• Address Space: The Logical Address Space is the set of all logical addresses generated by the CPU in reference to a program; the Physical Address Space is the set of all physical addresses mapped to the corresponding logical addresses.
• Visibility: Users can view the logical address of a program, but can never view its physical address.
• Generation: The logical address is generated by the CPU; the physical address is computed by the MMU.
• Access: The user can use the logical address to access the physical address, but can only reach physical addresses indirectly, never directly.
22. How does dynamic loading aid in better memory space utilization?
With dynamic loading, a routine is not loaded until it is called. This method is especially
useful when large amounts of code are needed in order to handle infrequently occurring
cases such as error routines.
The concept of overlays is that a running process does not use the complete program at the same time; it uses only some part of it. The overlay technique loads only the part currently required, and once that part is done, it is unloaded so that the next required part can be loaded and run. Formally: "the process of transferring a block of program code or other data into internal memory, replacing what is already stored".
As processes are loaded into and removed from memory, the free memory space is broken into pieces that are too small to be used by other processes. When a process cannot be allocated a memory block because each free block is too small, and those blocks remain permanently unused, this is called fragmentation. The problem occurs in dynamic memory allocation systems when the free blocks are so small that they cannot satisfy any request.
Paging is a memory management technique used to fetch processes from secondary memory into main memory in the form of pages. In paging, each process is split into parts where the size of each part is the same as the page size; the size of the last part may be less than the page size. The pages of a process are stored in the frames of main memory depending on their availability.
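The page-number/offset split behind paging can be sketched as follows (page size and page table are illustrative):

```python
PAGE_SIZE = 1024          # bytes per page (illustrative)

# A tiny page table: page number -> frame number (illustrative mapping).
page_table = {0: 5, 1: 2, 2: 7}

def translate(logical_address):
    """Split the logical address into (page, offset) and map page -> frame."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    frame = page_table[page]
    return frame * PAGE_SIZE + offset

print(translate(0))      # page 0, offset 0 -> frame 5 -> 5120
print(translate(1030))   # page 1, offset 6 -> frame 2 -> 2054
```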
26. How does swapping result in better memory management?
Swapping temporarily moves processes that are not currently running from main memory to a backing store and brings them back when needed. This frees main memory for active processes, so more processes can be kept ready to run.
Some classical problems of process synchronization:
• Bounded-buffer
• Readers-writers
• Dining philosophers
• Sleeping barber
28. What is the Direct Access Method?
The direct access method is based on a disk model of a file, which is viewed as a numbered sequence of blocks or records. It allows arbitrary blocks to be read or written, and is advantageous when accessing large amounts of information. Separately, direct memory access (DMA) is a method that allows an input/output (I/O) device to send or receive data directly to or from main memory, bypassing the CPU to speed up memory operations. The process is managed by a chip known as a DMA controller (DMAC).
Thrashing occurs when processes on the system frequently access pages that are not available in main memory.
30. What is the best page size when designing an operating system?
The best paging size varies from system to system, so there is no single best when it
comes to page size. There are different factors to consider in order to come up with a
suitable page size, such as page table, paging time, and its effect on the overall efficiency
of the operating system.
32. What is caching?
The cache is a smaller and faster memory that stores copies of data from frequently used main memory locations. There are several independent caches in a CPU, which store instructions and data. Cache memory is used to reduce the average time to access data from main memory.
The Assembler is used to translate the program written in Assembly language into
machine code. The source program is an input of an assembler that contains assembly
language instructions. The output generated by the assembler is the object code or
machine code understandable by the computer.
35. What are interrupts?
An interrupt is a signal emitted by hardware or software when a process or an event needs immediate attention. It alerts the processor to a high-priority task requiring interruption of the current working process. In I/O devices, one of the bus control lines is dedicated to this purpose (the interrupt request line), and the routine executed in response to an interrupt is called the Interrupt Service Routine (ISR).
GUI is short for Graphical User Interface. It provides users with an interface wherein
actions can be performed by interacting with icons and graphical symbols.
Advantages of semaphores:
• They are machine-independent.
• Easy to implement.
• Correctness is easy to determine.
• Can have many different critical sections with different semaphores.
• Semaphores can acquire many resources simultaneously.
• No waste of resources due to busy waiting.
40. What is a bootstrap program in the OS?
A bootstrap program is the first program that runs when a computer is powered on. It initializes the hardware and loads the operating system kernel into memory.
Some mechanisms for inter-process communication:
• Pipes (Same Process): This allows a flow of data in one direction only. Analogous to simplex systems (keyboard). Data from the output is usually buffered until the input process receives it; the processes must have a common origin.
• Named Pipes (Different Processes): This is a pipe with a specific name. It can be used by processes that do not share a common process origin, e.g. a FIFO, where data written to the pipe is read first-in, first-out.
• Shared Memory: This allows the interchange of data through a defined area of memory. Semaphore values have to be obtained before data can be accessed in shared memory.
• Sockets: This method is mostly used to communicate over a network between a client and a server. It allows for a standard connection that is computer- and OS-independent.
• Preemptive scheduling has to maintain the integrity of shared data, which makes it costlier than non-preemptive scheduling.
44. What is the zombie process?
A process that has finished the execution but still has an entry in the process table to
report to its parent process is known as a zombie process. A child process always first
becomes a zombie before being removed from the process table. The parent process
reads the exit status of the child process which reaps off the child process entry from the
process table.
A process whose parent process no longer exists, i.e. has either finished or terminated without waiting for its child process to terminate, is called an orphan process.
Starvation: Starvation is a resource management problem where a process does not get
the resources it needs for a long time because the resources are being allocated to other
processes.
Switching the CPU to another process means saving the state of the old process and loading the saved state of the new process. In context switching, the state of the old process is stored in its Process Control Block so that the old process can later be resumed from the same point at which it was interrupted.
49. What is the difference between the Operating system and kernel?
The kernel is the core component of an operating system, while the operating system as a whole also includes system programs and the user interface; every operating system contains a kernel.
51. What is PCB?
The process control block (PCB) is a block that is used to track the process's execution status. A PCB contains information about the process, i.e. registers, quantum, priority, etc. The process table is an array of PCBs; that is, it logically contains a PCB for all of the current processes in the system.
• Switching context
• Switching to user mode
• Jumping to the proper location in the user program to restart that program
• Min response time [Time when a process produces the first response]
• Condition variables
• Semaphores
• File locks
User-level thread vs kernel-level thread:
• The OS does not recognize user-level threads; kernel threads are recognized by the OS.
• Implementation of user threads is easy; implementation of kernel threads is complicated.
• Context switch time is less for user-level threads and more for kernel-level threads.
• User-level thread context switches require no hardware support; kernel-level thread switches need hardware support.
• If one user-level thread performs a blocking operation, the entire process will be blocked; if one kernel thread performs a blocking operation, another thread can continue execution.
• User-level threads are designed as dependent threads; kernel-level threads are designed as independent threads.
A shared address space provides extremely high-bandwidth, low-latency communication between separate tasks within an application.
4. Multitasking is a feature of the OS; multi-threading is a feature of the process.
5. Multitasking is sharing of computing resources (CPU, memory, devices, etc.) among processes; multi-threading is sharing of computing resources among the threads of a single process.
66. Define the term Bounded waiting?
A system is said to satisfy the bounded waiting condition if a process that wants to enter its critical section will be able to enter it within some finite time.
67. What are the solutions to the critical section problem?
There are three solutions to the critical section problem:
• Software solutions
• Hardware solutions
• Semaphores
68. What is a Banker’s algorithm?
The banker's algorithm is a resource allocation and deadlock avoidance algorithm that tests for safety by simulating the allocation of the predetermined maximum possible amounts of all resources, then makes a "safe-state" check to test for possible activities, before deciding whether allocation should be allowed to continue.
71. What are the necessary conditions which can lead to a deadlock in a system?
Mutual Exclusion: There is a resource that cannot be shared.
Hold and Wait: A process is holding at least one resource and waiting for another
resource, which is with some other process.
No Preemption: The operating system is not allowed to take a resource back from a process until the process gives it back.
Circular Wait: A set of processes are waiting for each other in circular form.
72. What are the issues related to concurrency?
• Non-atomic: Operations that are non-atomic but interruptible by multiple
processes can cause problems.
• A directed edge from node A to node B shows that statement A executes first and then statement B executes.
A deadlock occurs when there are two or more processes that each hold some resources and wait for resources held by the other(s).
76. What is the goal and functionality of memory management?
The goal and functionality of memory management are as follows;
• Relocation
• Protection
• Sharing
• Logical organization
• Physical organization
77. Write a difference between physical address and logical address?
1. Basic: The logical address is the virtual address generated by the CPU; the physical address is a location in a memory unit.
2. Address space: The set of all logical addresses generated by the CPU in reference to a program is referred to as the Logical Address Space; the set of all physical addresses mapped to the corresponding logical addresses is referred to as the Physical Address Space.
3. Visibility: The user can view the logical address of a program, but can never view the physical address of the program.
4. Access: The user uses the logical address to access the physical address; the user cannot directly access the physical address.
5. Generation: The logical address is generated by the CPU; the physical address is computed by the MMU.
78. Explain address binding?
The Association of program instruction and data to the actual physical memory locations
is called Address Binding.
• When we do not know beforehand how much memory will be needed for the program.
• When we want data structures without any upper limit of memory space.
Internal fragmentation vs external fragmentation:
1. In internal fragmentation, fixed-sized memory blocks are assigned to the process; in external fragmentation, variable-sized memory blocks are assigned to the process.
2. Internal fragmentation happens when the memory block assigned is larger than the process requires; external fragmentation happens when a process is removed from memory, leaving scattered holes.
3. The solution to internal fragmentation is the best-fit block; the solution to external fragmentation is compaction, paging and segmentation.
4. Internal fragmentation occurs when memory is divided into fixed-sized partitions; external fragmentation occurs when memory is divided into variable-size partitions based on the sizes of processes.
5. The difference between the memory allocated and the required space is called internal fragmentation; unused spaces formed between non-contiguous memory fragments that are too small to serve a new process are called external fragmentation.
Advantages:
• In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.
Disadvantages:
• Hash tables become quite inefficient when there are many collisions.
• A Hashtable does not allow null keys or values, unlike a HashMap.
• Define Compaction: Compaction is a technique for overcoming external fragmentation, in which memory contents are shuffled to place all free memory together in one large block.
9. In paging, the system must maintain a free frame list; in segmentation, the system must maintain a list of holes in main memory.
10. Paging is invisible to the user; segmentation is visible to the user.
11. In paging, the processor needs the page number and offset to calculate the absolute address; in segmentation, the processor uses the segment number and offset to calculate the full address.
• A higher degree of multiprogramming.
• Allocating memory is easy and cheap.
• Eliminates external fragmentation.
• Data (page frames) can be scattered all over physical memory.
• Pages are mapped appropriately anyway.
• Large programs can be written, as the virtual space available is huge
compared to physical memory.
• Less I/O required leads to faster and easy swapping of processes.
• More physical memory is available, as programs are stored in virtual memory, so they occupy much less space in actual physical memory.
• More efficient swapping
88. How to calculate performance in virtual memory?
Effective access time = (1-p) x Memory access time + p x page fault time
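Plugging illustrative numbers into the formula above shows how sensitive the result is to the page fault rate p:

```python
def effective_access_time(p, memory_access_ns, page_fault_ns):
    """Effective access time = (1 - p) * memory access time + p * page fault time,
    where p is the page fault rate."""
    return (1 - p) * memory_access_ns + p * page_fault_ns

# Illustrative figures: 200 ns memory access, 8 ms (8,000,000 ns) fault service time.
eat = effective_access_time(p=0.0001, memory_access_ns=200, page_fault_ns=8_000_000)
print(eat)  # ~999.98 ns: one fault per 10,000 accesses raises 200 ns to about 1 microsecond
```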
• Create
• Open
• Read
• Write
• Rename
• Delete
• Append
• Truncate
• Close
91. Define the term Bit-Vector?
A Bitmap or Bit Vector is a series or collection of bits where each bit corresponds to a
disk block. The bit can take two values: 0 and 1: 0 indicates that the block is allocated
and 1 indicates a free block.
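The convention described (1 = free, 0 = allocated) can be sketched using a Python integer as the bit vector (block numbers are illustrative):

```python
NUM_BLOCKS = 8

# Start with every block free: bits 0..7 all set to 1.
bitmap = (1 << NUM_BLOCKS) - 1            # 0b11111111

def is_free(bm, block):
    return (bm >> block) & 1 == 1         # bit value 1 means the block is free

def allocate(bm, block):
    return bm & ~(1 << block)             # clear the bit: 0 means allocated

def release(bm, block):
    return bm | (1 << block)              # set the bit back to 1 (free)

bitmap = allocate(bitmap, 3)
print(is_free(bitmap, 3))   # False - block 3 is now allocated
print(is_free(bitmap, 4))   # True  - block 4 is still free
bitmap = release(bitmap, 3)
print(is_free(bitmap, 3))   # True
```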
A file allocation table (FAT) is used by operating systems to keep track of files on file systems and disks. It is a table maintained on a hard disk that provides a map of the clusters (the basic units of logical storage on a hard disk) in which a file has been stored.
There are some main advantages of a multiprocessor system:
• Enhanced performance.
• Multiple applications.
Recovery from deadlock may involve aborting deadlocked processes or selecting a victim process to preempt. Deciding when to invoke a deadlock-detection algorithm depends on two factors: one is how often a deadlock is likely to occur under the implementation of this algorithm; the other is how many processes will be affected by deadlock when this algorithm is applied.
DBMS
5. Define a database
Defining a database means specifying the data types, structures, and constraints of the data to be stored, using a Data Definition Language.
12. What is an internal level schema?
The internal level has an internal schema, which describes the physical storage structure of the database.
Object data models: a group of higher-level implementation data models closer to conceptual data models.
The conceptual level –has a conceptual schema which describes the structure of the
database for users. It hides the details of the physical storage structures, and concentrates
on describing entities, data types, relationships, user operations and constraints. Usually
a representational data model is used to describe the conceptual schema.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance – Relation instance is a finite set of tuples in the RDBMS
system. Relation instances never have duplicate tuples.
9. Relation key - Every row has one or more attributes, known as the relation key, which can identify the row uniquely.
10. Attribute domain – Every attribute has some predefined value and scope
which is known as attribute domain
Example of a relation instance: (1, Google), (2, Amazon), (3, Apple).
22. List the operations which can be done on the relational model.
The operations are insert, update, delete and select.
• Insert is used to insert data into the relation.
• Delete is used to delete tuples from the table.
• Modify allows you to change the values of some attributes in existing tuples.
• Select allows you to choose a specific range of data.
ARTIFICIAL INTELLIGENCE BASICS
5. Define Artificial Intelligence in terms of rational thinking.
"The study of mental faculties through the use of computational models." - Charniak & McDermott
"The study of the computations that make it possible to perceive, reason and act." - Winston
7. Define Agent.
An agent is anything that can be viewed as perceiving (i.e. understanding) its environment through sensors and acting upon that environment through actuators.
9. What are the factors that a rational agent should depend on at any given time?
1. The performance measure that defines the degree of success.
2. Everything that the agent has perceived so far. We will call this complete perceptual history the percept sequence.
3. What the agent knows about the environment.
4. The actions that the agent can perform.
12. Give the structure of an agent in an environment.
An agent interacts with its environment through sensors and actuators.
26. What is Environment Class (EC) and Environment Generator (EG)?
EC - It is defined as a group of environments.
EG - It selects an environment from the environment class in which the agent has to run.
i. Toy problems
ii. Real world problems
43. Define Backtracking search.
Backtracking search is a variant of depth-first search in which only one successor is generated at a time rather than all successors; each partially expanded node remembers which successor to generate next.
Memory-bounded search: iterative deepening A* search, simplified memory-bounded A* search. Iterative improvement search: hill climbing, simulated annealing.
53. Define Heuristic function, h(n).
h(n) is defined as the estimated cost of the cheapest path from node n to a goal node.
60. What is RBFS?
RBFS (recursive best-first search) keeps track of the f-value of the best alternative path available from any ancestor of the current node. It remembers the f-value of the best leaf in the forgotten subtree and can therefore decide whether it is worth re-expanding the subtree at some later time.
61. Define iterative deepening search.
Iterative deepening is a strategy that sidesteps the issue of choosing the best depth limit by trying all possible depth limits: first depth 0, then depth 1, then depth 2, and so on.
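The strategy can be sketched over a small tree (the tree data is made up for illustration):

```python
# A small tree as an adjacency list (illustrative data).
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}

def depth_limited_search(node, goal, limit):
    """Plain depth-first search that refuses to go below the given depth limit."""
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(depth_limited_search(child, goal, limit - 1) for child in tree[node])

def iddfs(root, goal, max_depth=10):
    """Try depth limit 0, then 1, then 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        if depth_limited_search(root, goal, limit):
            return limit          # the depth at which the goal was first found
    return None

print(iddfs("A", "E"))  # 2 - found at depth 2 after trying limits 0 and 1
print(iddfs("A", "A"))  # 0
```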
69. Write the time & space complexity associated with depth limited search.
Time complexity = O(b^l), where b is the branching factor and l is the depth limit.
Space complexity = O(b·l)
79. List some drawbacks of hill climbing process.
Local maxima: A local maximum, as opposed to a global maximum, is a peak that is lower than the highest peak in the state space. Once a local maximum is reached, the algorithm halts even though the solution may be far from satisfactory.
Plateaux: A plateau is an area of the state space where the evaluation function is essentially flat. The search will conduct a random walk.
80. What is the meaning of greedy local search?
It picks a good neighbor state without thinking ahead about where to go next.
* It avoids repeated states as far as its memory allows.
* It is complete if the available memory is sufficient to store the shallowest solution path.
* It is optimal if enough memory is available to store the shallowest optimal solution path. Otherwise, it returns the best solution that can be reached with the available memory.
* When enough memory is available for the entire search tree, the search is optimally efficient.
*Hill climbing.
*Simulated annealing.
90. What are the things that an agent knows in online search problems?
a. ACTIONS(s)
b. The step-cost function c(s, a, s′)
c. GOAL-TEST(s)
109. What are the three levels in describing knowledge based agent?
• Logical level
• Implementation level
• Knowledge level or epistemological level
111. Define Semantics
The semantics of the language defines the truth of each sentence with respect to each
possible world. With this semantics, when a particular configuration exists within an agent, the agent believes the corresponding sentence.
113. What are the 3 types of symbols which are used to indicate objects, relations and functions?
i) Constant symbols for objects
ii) Predicate symbols for relations
iii) Function symbols for functions
117. What are the two functions we use to query and update a knowledge base?
ASK and TELL.
120. Define synchronic and diachronic sentences.
Sentences dealing with the same time are called synchronic sentences. Sentences that allow reasoning across time are called diachronic sentences.
121. What are the 2 types of synchronic rules? i. Diagnostic rules ii. Causal rules.
131. Define Modus Ponens rule in propositional logic.
Modus Ponens is the best-known standard pattern of inference that can be applied to derive chains of conclusions leading to the desired goal: given an implication α ⇒ β and the premise α, it infers β.
136. Explain the function of the Rete algorithm.
The Rete algorithm preprocesses the set of rules in the knowledge base to construct a dataflow network in which each node is a literal from a rule premise.
i. Set of Support (SOS)
ii. Usable axioms
iii. Rewrites (or demodulators)
iv. A set of parameters and sentences
MACHINE LEARNING FUNDAMENTALS
Supervised Learning
In supervised learning, the model learns from labeled data: each training example pairs an input with the correct output.
Unsupervised Learning
In unsupervised learning, we don't have labeled data. A model can identify patterns,
anomalies, and relationships in the input data.
Reinforcement Learning
Using reinforcement learning, the model can learn based on the rewards it received for
its previous action.
• Neural Networks (back propagation)
• Probabilistic networks
• Nearest Neighbor
• Support vector machines
4. Name the three dataset categories used while creating a model.
• Training dataset
• Validation dataset
• Test dataset
5. What is Overfitting, and How Can You Avoid It?
Overfitting is a situation that occurs when a model learns the training set too well, taking up random fluctuations in the training data as concepts. These impact the model's ability to generalize and do not apply to new data.
When a model is evaluated on its own training data, it may show close to 100 percent accuracy (technically, a very small loss). But when we use the test data, there may be errors and low accuracy. This condition is known as overfitting.
There are multiple ways of avoiding overfitting, such as:
• Regularization, which adds a cost term for the features involved to the
objective function
• Making a simpler model: with fewer variables and parameters, the
variance can be reduced
• Cross-validation methods, such as k-fold, can also be used
• If some model parameters are likely to cause overfitting, regularization
techniques like LASSO can be used to penalize these parameters
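The k-fold idea above can be sketched in a few lines. This is a minimal pure-Python illustration (the `k_fold_indices` helper is hypothetical, not a library function): each fold serves once as the validation set while the remaining samples form the training set.

```python
# Minimal k-fold cross-validation sketch (pure Python, hypothetical helper).
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        val = indices[start:stop]                 # this fold validates
        train = indices[:start] + indices[stop:]  # the rest trains
        yield train, val

for train, val in k_fold_indices(10, 5):
    print(len(train), len(val))  # 8 2, printed five times
```

In practice a library routine (e.g., scikit-learn's `KFold`) would also shuffle the data before splitting.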
Training Set: the set of examples given to the model to analyze and learn from.
Test Set: used to test the accuracy of the hypothesis generated by the model.
6. What is ‘training Set’ and ‘test Set’ in a Machine Learning Model? How Much Data
Will You Allocate for Your Training, Validation, and Test Sets?
Consider a case where you have labeled data for 1,000 records. One way to train the
model is to expose all 1,000 records during the training process. Then you take a small
set of the same data to test the model, which would give good results in this case.
But, this is not an accurate way of testing. So, we set aside a portion of that data called
the ‘test set’ before starting the training process. The remaining data is called the
‘training set’ that we use for training the model. The training set passes through the
model multiple times until the accuracy is high, and errors are minimized.
Now, we pass the test data to check if the model can accurately predict the values and
determine if training is effective. If you get errors, you either need to change your
model or retrain it with more data.
Regarding the question of how to split the data into a training set and test set, there is
no fixed rule, and the ratio can vary based on individual preferences.
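The hold-out split described above can be sketched as follows. This is a minimal pure-Python illustration; the 80/20 ratio is just a common convention, not a fixed rule, and the `train_test_split` helper here is hypothetical.

```python
# Simple hold-out split sketch (pure Python, hypothetical helper).
import random

def train_test_split(records, test_ratio=0.2, seed=42):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, not the original
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]  # (training set, test set)

train, test = train_test_split(list(range(1000)))
print(len(train), len(test))  # 800 200
```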
7. How Do You Handle Missing or Corrupted Data in a Dataset?
One of the easiest ways to handle missing or corrupted data is to drop those rows or
columns or replace them entirely with some other value.
There are two useful approaches in Pandas:
• isnull() and dropna() will help to find the columns/rows with missing
data and drop them
• fillna() will replace the missing values with a placeholder value
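A small sketch of those Pandas methods on a tiny made-up frame (assuming pandas and NumPy are installed; the column names and values are purely illustrative):

```python
# Finding, dropping, and filling missing values with Pandas.
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["NY", "LA", None]})

print(df.isnull().sum())   # count of missing values per column
dropped = df.dropna()      # drop rows containing any missing value
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})
print(len(dropped), len(filled))  # 1 3
```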
8. How Can You Choose a Classifier Based on a Training Set Data Size?
When the training set is small, a model with high bias and low variance (for example,
Naive Bayes) tends to work better, because it is less likely to overfit.
When the training set is large, models with low bias and high variance tend to perform
better, as they can capture complex relationships.
Here,
For actual values:
Total Yes = 12+1 = 13
Total No = 3+9 = 12
Similarly, for predicted values:
Total Yes = 12+3 = 15
Total No = 1+9 = 10
For a model to be accurate, the values across the diagonals should be high. The total
sum of all the values in the matrix equals the total observations in the test data set.
For the above matrix, total observations = 12+3+1+9 = 25
Now, accuracy = sum of the values across the diagonal/total dataset
= (12+9) / 25
= 21 / 25
= 84%
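The accuracy calculation above can be written out directly, using the same confusion-matrix counts:

```python
# Accuracy from the 2x2 confusion matrix in the example above.
# Layout assumed: [[actual-yes/pred-yes, actual-yes/pred-no],
#                  [actual-no/pred-yes,  actual-no/pred-no]]
matrix = [[12, 1],
          [3, 9]]

total = sum(sum(row) for row in matrix)  # 25 observations
correct = matrix[0][0] + matrix[1][1]    # diagonal: 12 + 9
accuracy = correct / total
print(accuracy)  # 0.84
```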
10. What Is a False Positive and False Negative and How Are They Significant?
False positives are those cases that wrongly get classified as True but are False.
False negatives are those cases that wrongly get classified as False but are True.
In the term ‘False Positive,’ the word ‘Positive’ refers to the ‘Yes’ row of the predicted
value in the confusion matrix. The complete term indicates that the system has
predicted it as a positive, but the actual value is negative.
True positive = 12
11. What Are the Three Stages of Building a Model in Machine Learning?
The three stages of building a machine learning model are:
• Model Building
Choose a suitable algorithm for the model and train it according to the
requirement
• Model Testing
Check the accuracy of the model through the test data
• Applying the Model
Make the required changes after testing and use the final model for real-
time projects
Here, it’s important to remember that the model needs to be checked once in a while to
make sure it’s working correctly, and modified as needed to keep it up to date.
12. What is Deep Learning?
Deep learning is a subset of machine learning that involves systems that think and
learn like humans using artificial neural networks. The term ‘deep’ comes from the fact
that you can have several layers of neural networks.
One of the primary differences between machine learning and deep learning is that
feature engineering is done manually in machine learning. In the case of deep learning,
the model consisting of neural networks will automatically determine which features to
use (and which not to use).
Machine Learning:
• Enables machines to take decisions on their own, based on past data
• Most features need to be identified in advance and manually coded
• The problem is divided into parts, solved individually, and then combined
Deep Learning:
• Enables machines to take decisions with the help of artificial neural networks
• The machine learns the features from the data it is provided
• The problem is solved in an end-to-end manner
13. Give a popular application of machine learning that you see on a day-to-day basis.
The recommendation engines implemented by major e-commerce websites use machine
learning.
14. What Are the Applications of Supervised Machine Learning in Modern Businesses?
• Email Spam Detection
Here we train the model using historical data that consists of emails
categorized as spam or not spam. This labeled information is fed as
input to the model.
• Healthcare Diagnosis
By providing images regarding a disease, a model can be trained to
detect if a person is suffering from the disease or not.
• Sentiment Analysis
This refers to the process of using algorithms to mine documents and
determine whether they’re positive, neutral, or negative in sentiment.
• Fraud Detection
By training the model to identify suspicious patterns, we can detect
instances of possible fraud.
There are two techniques used in unsupervised learning: clustering and association.
Clustering
Clustering problems involve data to be divided into subsets. These subsets, also called
clusters, contain data that are similar to each other. Different clusters reveal different
details about the objects, unlike classification or regression.
Association
In association problems, we identify patterns of association between variables or items.
For example, an e-commerce website can suggest other items for you to buy, based on
the prior purchases you have made, spending habits, items in your wishlist, other
customers’ purchase habits, and so on.
17. What is the Difference Between Supervised and Unsupervised Machine Learning?
• Supervised learning - This model learns from the labeled data and makes
a future prediction as output
• Unsupervised learning - This model uses unlabeled input data and allows
the algorithm to act on that information without guidance.
18. What is the Difference Between Inductive Machine Learning and Deductive Machine
Learning?
K-Means is an unsupervised learning algorithm.
The classifier is called ‘naive’ because it makes assumptions that may or may not turn
out to be correct.
The algorithm assumes that the presence of one feature of a class is not related to the
presence of any other feature (absolute independence of features), given the class
variable.
For instance, a fruit may be considered to be a cherry if it is red in color and round in
shape, regardless of other features. This assumption may or may not be right (as an
apple also matches the description).
21. Explain How a System Can Play a Game of Chess Using Reinforcement Learning.
Reinforcement learning has an environment and an agent. The agent performs some
actions to achieve a specific goal. Every time the agent performs a task that is taking it
towards the goal, it is rewarded. And, every time it takes a step that goes against that goal
or in the reverse direction, it is penalized.
Earlier, chess programs had to determine the best moves after much research on
numerous factors. Building a machine designed to play such games would require
many rules to be specified.
With reinforcement learning, we don’t have to deal with this problem as the learning agent
learns by playing the game. It will make a move (decision), check if it’s the right move
(feedback), and keep the outcomes in memory for the next step it takes (learning).
There is a reward for every correct decision the system takes and punishment for the
wrong one.
Once a user buys something from Amazon, Amazon stores that purchase data for future
reference and finds products that are most likely also to be bought. This is possible
because of the Association algorithm, which can identify patterns in a given dataset.
24. When Will You Use Classification over Regression?
Classification is used when your target is categorical, while regression is used when
your target variable is continuous. Both classification and regression belong to the
category of supervised machine learning algorithms.
Examples of classification problems include:
• Predicting yes or no
• Estimating gender
• Breed of an animal
• Type of color
Examples of regression problems include:
• Estimating sales and price of a product
• Predicting the score of a team
• Predicting the amount of rainfall
• The supervised machine learning algorithm will then determine which
type of emails are being marked as spam based on spam words
like the lottery, free offer, no money, full refund, etc.
• The next time an email is about to hit your inbox, the spam filter will use
statistical analysis and algorithms like Decision Trees and SVM to
determine how likely the email is spam
• If the likelihood is high, it will label it as spam, and the email won’t hit
your inbox
• Based on the accuracy of each model, we will use the algorithm with the
highest accuracy after testing all the models
A ‘random forest’ is a supervised machine learning algorithm that is generally used for
classification problems. It operates by constructing multiple decision trees during the
training phase. The random forest chooses the decision of the majority of the trees as
the final decision.
27. Considering a Long List of Machine Learning Algorithms, given a Data Set,
How Do You Decide Which One to Use?
There is no master algorithm for all situations. Choosing an algorithm depends on the
following questions:
• How much data do you have, and is it continuous or categorical?
• Is the problem related to classification, association, clustering, or
regression? Predefined variables (labeled), unlabeled, or mix?
• What is the goal?
Based on the above questions, the following algorithms can be used:
28. What is Bias and Variance in a Machine Learning Model?
Bias
Bias in a machine learning model occurs when the predicted values are further from the
actual values. Low bias indicates a model where the prediction values are very close to
the actual ones.
Underfitting: High bias can cause an algorithm to miss the relevant relations between
features and target outputs.
Variance
Variance refers to the amount the target model will change when trained with
different training data. For a good model, the variance should be minimized.
Overfitting: High variance can cause an algorithm to model the random noise in the
training data rather than the intended outputs.
29. What is the Trade-off Between Bias and Variance?
The bias-variance decomposition essentially decomposes the learning error from any
algorithm by adding the bias, variance, and a bit of irreducible error due to noise in the
underlying dataset.
Necessarily, if you make the model more complex and add more variables, you’ll lose
bias but gain variance. To get the optimally-reduced amount of error, you’ll have to
trade off bias and variance. Neither high bias nor high variance is desired.
High bias and low variance algorithms train models that are consistent, but inaccurate on
average.
High variance and low bias algorithms train models that are accurate but inconsistent.
Precision
Precision is the ratio of events you correctly recall to the total number of events you
recall (a mix of correct and wrong recalls).
Precision = (True Positive) / (True Positive + False Positive)
Recall
Recall is the ratio of events you correctly recall to the total number of actual events.
Recall = (True Positive) / (True Positive + False Negative)
31. What is a Decision Tree Classification?
A decision tree builds classification (or regression) models as a tree structure, with
datasets broken up into ever-smaller subsets while developing the decision tree,
literally in a tree-like way with branches and nodes.
Decision trees can handle both categorical and numerical data.
32. What is Pruning in Decision Trees, and How Is It Done?
Pruning is a technique in machine learning that reduces the size of decision trees. It
reduces the complexity of the final classifier, and hence improves predictive accuracy
by the reduction of overfitting.
• Starting at the leaves, each node is replaced with its most popular class
34. Explain the K Nearest Neighbor Algorithm.
In K nearest neighbors, K is an integer greater than 1. For every new data point we want
to classify, we compute which neighboring group it is closest to.
Let us classify an object using the following example. Consider there are three clusters:
• Football
• Basketball
• Tennis ball
Let the new data point to be classified be a black ball. We use KNN to classify it. Assume
K = 5 (initially).
Observe that the five selected points do not all belong to the same cluster. There are
three tennis balls, one basketball, and one football.
When multiple classes are involved, we prefer the majority. Here the majority is with
the tennis balls, so the new data point is assigned to this cluster.
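The ball example can be sketched directly: classify a new point by the majority label among its K = 5 nearest neighbors. The coordinates and the `knn_classify` helper below are made up for illustration.

```python
# Tiny KNN sketch: majority vote among the K nearest neighbors.
from collections import Counter
import math

def knn_classify(points, labels, query, k=5):
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], query))
    nearest = [labels[i] for i in order[:k]]       # labels of the k closest
    return Counter(nearest).most_common(1)[0][0]   # majority label

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (5, 5)]
labels = ["tennis", "tennis", "tennis", "football", "football",
          "basketball", "tennis"]
print(knn_classify(points, labels, (2, 2)))  # tennis
```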
35. What is a Recommendation System?
Anyone who has used Spotify or shopped at Amazon will recognize a recommendation
system: It’s an information filtering system that predicts what a user might want to hear
or see based on choice patterns provided by the user.
F1 = 2 * (P * R) / (P + R)
The F1 score is one when both Precision and Recall scores are one.
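Precision, recall, and the F1 score can be computed from the confusion-matrix counts used earlier in this section (TP = 12, FP = 3, FN = 1):

```python
# Precision, recall, and F1 from the earlier confusion matrix.
tp, fp, fn = 12, 3, 1

precision = tp / (tp + fp)  # 12/15 = 0.8
recall = tp / (tp + fn)     # 12/13 ≈ 0.923
f1 = 2 * (precision * recall) / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.923 0.857
```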
41. Explain Correlation and Covariance?
Correlation: Correlation tells us how strongly two random variables are related to each
other. It takes values between -1 to +1.
Covariance: Covariance tells us the direction of the linear relationship between two
random variables. It can take any value between - ∞ and + ∞.
Formula to calculate covariance (sample covariance): Cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
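Both quantities can be computed in a few lines of plain Python (the helper names and the sample data are illustrative; this uses the sample covariance with an n − 1 denominator):

```python
# Sample covariance and Pearson correlation, sketched in pure Python.
import math

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    sx = math.sqrt(covariance(xs, xs))  # sample standard deviations
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]   # perfectly linearly related
print(covariance(xs, ys), correlation(xs, ys))  # ≈ 3.333 and 1.0
```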
42. What are Support Vectors in SVM?
Support vectors are the data points that are nearest to the hyperplane. They influence
the position and orientation of the hyperplane, and removing them would alter it. The
support vectors help us build our support vector machine model.
Example: A Random Forest with 100 trees can provide much better results than using
just one decision tree.
Gini Impurity: splitting the nodes of a decision tree using Gini impurity is followed when the target variable is categorical.
46. How does the Support Vector Machine algorithm handle self-learning?
The SVM algorithm has a learning rate and expansion rate which takes care of self-
learning. The learning rate compensates or penalizes the hyperplanes for making all the
incorrect moves while the expansion rate handles finding the maximum separation area
between different classes.
47. What are the assumptions you need to take before starting with linear regression?
There are primarily 5 assumptions for a Linear Regression model:
• Multivariate normality
• No auto-correlation
• Homoscedasticity
• Linear relationship
• No or little multicollinearity
48. What is the difference between Lasso and Ridge regression?
Lasso (also known as L1) and Ridge (also known as L2) regression are two popular
regularization techniques used to avoid overfitting. These methods penalize the
coefficients to find the optimum solution and reduce complexity.
The Lasso regression works by penalizing the sum of the absolute values of the
coefficients. In Ridge or L2 regression, the penalty function is determined by the sum of
the squares of the coefficients.
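The two penalty terms can be written out directly: L1 (Lasso) sums the absolute values of the coefficients, while L2 (Ridge) sums their squares. The helper names, the `alpha` regularization-strength parameter, and the sample coefficients below are illustrative.

```python
# L1 (Lasso) and L2 (Ridge) penalty terms, sketched in pure Python.
def l1_penalty(coefs, alpha=1.0):
    return alpha * sum(abs(c) for c in coefs)

def l2_penalty(coefs, alpha=1.0):
    return alpha * sum(c * c for c in coefs)

coefs = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(coefs), l2_penalty(coefs))  # 4.0 6.5
```

Note how the zero coefficient contributes nothing to either penalty; in practice the L1 penalty's shape is what pushes coefficients exactly to zero during optimization.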
Neural Networks
5. What are hyperparameters?
Hyperparameters are parameters whose values control the learning process and
determine the values of model parameters that a learning algorithm ends up learning.
6. Define ReLU.
Rectified Linear Unit (ReLU) helps solve the vanishing gradient problem. ReLU is a
piecewise linear function that outputs the input directly if it is positive; otherwise,
it outputs zero.
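The definition above is just max(0, x), which can be sketched in plain Python:

```python
# ReLU: output the input if positive, otherwise zero.
def relu(x):
    return x if x > 0 else 0.0

print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```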
The vanishing gradient problem is a problem faced when training neural networks using
gradient-based methods like backpropagation. It makes it difficult to learn and tune the
parameters of the earlier layers in the network.
8. Define normalization.
Normalization is a data pre-processing tool used to bring the numerical data to a common
scale without distorting its shape.
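One common form of normalization is min-max scaling, which rescales values to a common [0, 1] range without distorting the shape of the distribution. The helper and sample values below are illustrative.

```python
# Min-max normalization: rescale values into [0, 1].
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 15, 30]))  # [0.0, 0.5, 0.25, 1.0]
```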
Advantages of ReLU:
a) ReLU is simple to compute and has a predictable gradient for the backpropagation of
the error.
b) It is easy to implement and very fast.
Disadvantages of deep learning
Except for the input layer, each node in the other layers uses a nonlinear activation
function. Each node computes a weighted sum of the data coming in from the previous
layer and passes that sum through its activation function to produce its output. MLP uses
a supervised learning method called “backpropagation.” In backpropagation, the neural
network calculates the error with the help of a cost function and propagates this error
backward from where it came, adjusting the weights to train the model more accurately.
23. What Is Gradient Descent?
Gradient Descent is an optimization algorithm used to minimize the cost function, i.e.,
to minimize the error. The aim is to find the local or global minimum of a function. The
gradient determines the direction the model should take to reduce the error.
In a Feedforward Neural Network, signals travel in one direction, from input to output.
There are no feedback loops; the network considers only the current input. It cannot
memorize previous inputs (e.g., CNN).
25. What Are the Applications of a Recurrent Neural Network (RNN)?
The RNN can be used for sentiment analysis, text mining, and image captioning.
Recurrent Neural Networks can also address time series problems such as predicting the
prices of stocks in a month or quarter.
ReLU (or Rectified Linear Unit) is the most widely used activation function. It gives an
output of X if X is positive and zero otherwise. ReLU is often used for hidden layers.
28. What Will Happen If the Learning Rate Is Set Too Low or Too High?
When your learning rate is too low, training of the model will progress very slowly as we
are making minimal updates to the weights. It will take many updates before reaching
the minimum point.
If the learning rate is set too high, drastic weight updates cause undesirable divergent
behavior in the loss function. The model may fail to converge (never settle on a good
output) or even diverge (the updates become too chaotic for the network to train).
Batch normalization is the technique to improve the performance and stability of neural
networks by normalizing the inputs in every layer so that they have mean output
activation of zero and standard deviation of one.
30. What Is the Difference Between Batch Gradient Descent and Stochastic Gradient
Descent?
Batch Gradient Descent:
• Computes the gradient using the entire dataset.
• Takes time to converge because the volume of data is huge and the weights update
slowly.
Stochastic Gradient Descent:
• Computes the gradient using a single sample.
• Converges much faster than batch gradient descent because the weights are updated
more frequently.
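The update rule both variants share can be sketched in one dimension. This minimal example (made up for illustration) minimizes f(x) = (x − 3)², whose gradient is f'(x) = 2(x − 3); each step moves against the gradient.

```python
# One-dimensional gradient descent sketch minimizing f(x) = (x - 3)^2.
def gradient_descent(lr=0.1, steps=100, x=0.0):
    for _ in range(steps):
        grad = 2 * (x - 3)  # derivative of (x - 3)^2
        x -= lr * grad      # step against the gradient
    return x

print(round(gradient_descent(), 4))  # 3.0
```

Batch gradient descent would compute `grad` from the whole dataset at each step, while stochastic gradient descent would estimate it from a single sample.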
Underfitting refers to a model that is neither well-trained on the data nor able to
generalize to new information. This usually happens when there is too little, or
poor-quality, data to train the model. An underfit model shows both poor performance
and low accuracy.
32. How Are Weights Initialized in a Network? There are two methods here: we can either
initialize the weights to zero or assign them randomly.
Initializing all weights to 0: This makes your model similar to a linear model. All the
neurons and every layer perform the same operation, giving the same output and making
the deep net useless.
Initializing all weights randomly: Here, the weights are assigned randomly by initializing
them very close to 0. It gives better accuracy to the model since every neuron performs
different computations. This is the most commonly used method.
33. What Are the Different Layers on CNN? There are four layers in CNN:
1. Convolutional Layer - the layer that performs a convolutional operation,
creating several smaller picture windows to go over the data.
2. ReLU Layer - it brings non-linearity to the network and converts all the
negative pixels to zero. The output is a rectified feature map.
3. Pooling Layer - pooling is a down-sampling operation that reduces the
dimensionality of the feature map.
4. Fully Connected Layer - this layer takes the pooled feature map and
produces the final classification.
34. What is Pooling in CNN, and How Does It Work?
Pooling is used to reduce the spatial dimensions of a CNN. It performs down-sampling
operations to reduce the dimensionality and creates a pooled feature map by sliding a
filter over the feature map.
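A common concrete case is 2×2 max pooling with stride 2, sketched here in plain Python on a made-up 4×4 feature map:

```python
# 2x2 max pooling with stride 2 over a small feature map.
def max_pool_2x2(fmap):
    pooled = []
    for r in range(0, len(fmap) - 1, 2):
        row = []
        for c in range(0, len(fmap[0]) - 1, 2):
            # keep the maximum of each 2x2 window
            row.append(max(fmap[r][c], fmap[r][c + 1],
                           fmap[r + 1][c], fmap[r + 1][c + 1]))
        pooled.append(row)
    return pooled

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 0, 8]]
print(max_pool_2x2(fmap))  # [[6, 4], [7, 9]]
```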
42. What is the significance of using the Fourier transform in Deep Learning tasks?
The Fourier transform function efficiently analyzes, maintains, and manages large
datasets. You can use it to generate real-time array data that is helpful for processing
multiple signals.
43. What do you understand by transfer learning? Name a few commonly used transfer
learning models.
Transfer learning is the process of transferring the learning from a model to another
model without having to train it from scratch. It takes critical parts of a pre-trained model
and applies them to solve new but similar machine learning problems.
The input image gets fully covered by the filter and specified stride. The padding type is
named SAME as the output size is the same as the input size (when stride=1).
Padding == “VALID” means there is no padding in the input image: the filter window
always stays inside the input image, and it is assumed that all the dimensions are valid
so that the input image gets fully covered by the filter and the stride you define.
The Swish function works better than ReLU for a variety of deeper models.
The derivative of Swish can be written as: y’ = y + sigmoid(x) * (1 - y)
47. What are the reasons for mini-batch gradient being so useful?
• Mini-batch gradient is highly efficient compared to stochastic gradient
descent.
• It lets you attain generalization by finding the flat minima.
• Mini-batch gradient helps avoid local minima to allow gradient
approximation for the whole dataset.
48. What do you understand by Leaky ReLU activation function?
Leaky ReLU is an advanced version of the ReLU activation function. In general, the
ReLU function defines the gradient to be 0 when all the values of inputs are less than
zero. This deactivates the neurons. To overcome this problem, Leaky ReLU activation
functions are used. It has a very small slope for negative values instead of a flat slope.
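Leaky ReLU is a one-line change from plain ReLU: a small slope for negative inputs instead of a flat zero. The slope value 0.01 below is a common choice, not a fixed constant.

```python
# Leaky ReLU: a small slope for negative inputs keeps neurons from dying.
def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.02, 0.0, 3.0]
```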
49. What is Data Augmentation in Deep Learning?
Data Augmentation is the process of creating new data by enhancing the size and quality
of training datasets to ensure better models can be built using them. There are different
techniques to augment data such as numerical data augmentation, image augmentation,
GAN-based augmentation, and text augmentation.
• CNN allows you to look at the weights of a filter and visualize what the
network learned. So, this gives a better understanding of the model. CNN
trains models in a hierarchical way, i.e., it learns the patterns by explaining
complex patterns using simpler ones.
52. Which strategy does not prevent a model from over-fitting to the training data?
1. Dropout
2. Pooling
3. Data augmentation
4. Early stopping
Answer: 2) Pooling - it’s a layer in CNN that performs a downsampling operation, so it does not prevent overfitting.
53. Explain two ways to deal with the vanishing gradient problem in a deep neural
network.
• Use the ReLU activation function instead of the sigmoid function
• Initialize neural networks using Xavier initialization that works with tanh
activation.
54. Why is a deep neural network better than a shallow neural network?
Both deep and shallow neural networks can approximate the values of a function. But
the deep neural network is more efficient as it learns something new in every layer. A
shallow neural network has only one hidden layer. But a deep neural network has several
hidden layers that create a deeper representation and computation capability.
55. What is the need to add randomness in the weight initialization process?
If you set the weights to zero, then every neuron at each layer will produce the same
result and the same gradient value during backpropagation. So, the neural network won’t
be able to learn the function as there is no asymmetry between the neurons. Hence,
adding randomness to the weight initialization process is crucial.