Python QA

The document provides an overview of various Python data types, methods, and programming concepts including control flow, functions, file handling, exception handling, modules, and object-oriented programming. It includes practical examples for each concept, such as data types like int, float, and str, as well as methods for lists, dictionaries, and sets. Additionally, it covers advanced topics like generators, lambda functions, and the use of built-in functions and modules for mathematical operations and regular expressions.

Uploaded by

sh.ashfaqueme49

Extracted Text from the Image

Data Types
• int: Integer (e.g., x = 5)
• float: Decimal (e.g., y = 3.14)
• str: String (e.g., name = "Data Science")
• bool: Boolean (e.g., flag = True)
• list: Ordered, mutable collection (e.g., lst = [1, 2, 3])
• tuple: Ordered, immutable collection (e.g., tpl = (1, 2, 3))
• set: Unordered, unique elements (e.g., s = {1, 2, 3})
• dict: Key-value pairs (e.g., d = {"a": 1, "b": 2})

String Methods
• .upper(), .lower(): Convert case (e.g., "data".upper() → "DATA")
• .strip(): Removes leading/trailing spaces (e.g., " data ".strip() → "data")
• .replace(): Replaces substrings (e.g., "AI".replace("A", "ML") → "MLI")
• .split(), .join(): Splits/join strings (e.g., "a b".split() → ['a', 'b'])
• .find(), .index(): Finds substring (e.g., "science".find("i") → 2)
• .count(): Counts occurrences (e.g., "banana".count("a") → 3)
• .startswith(), .endswith(): Checks start/end (e.g., "data".startswith("d") → True)
• f"{}" (f-string): String formatting (e.g., f"Value: {x}" → "Value: 5")
List Methods
• .append(): Adds item (e.g., lst.append(4))
• .extend(): Adds multiple items (e.g., lst.extend([5, 6]))
• .insert(): Inserts at index (e.g., lst.insert(1, "AI"))
• .remove(): Removes first occurrence (e.g., lst.remove(2))
• .pop(): Removes & returns last item (e.g., lst.pop())
• .clear(): Empties list (e.g., lst.clear())
• .index(): Finds index of value (e.g., lst.index(3))
• .count(): Counts occurrences (e.g., lst.count(1))
• .sort(): Sorts list (e.g., lst.sort())
• .reverse(): Reverses list (e.g., lst.reverse())

Tuple
• tuple(): Creates an immutable sequence (e.g., t = tuple([1, 2, 3]) → (1, 2, 3))
• len(): Returns length (e.g., len((1,2,3)) → 3)
• index(): Finds index of an element (e.g., (1,2,3).index(2) → 1)
• count(): Counts occurrences (e.g., (1,1,2).count(1) → 2)
Set Methods
• .add(): Adds an element (e.g., s.add(5))
• .remove(): Removes an element (e.g., s.remove(3))
• .discard(): Removes element if exists (e.g., s.discard(3))
• .pop(): Removes & returns an arbitrary element (e.g., s.pop())
• .clear(): Empties the set (e.g., s.clear())
• .union(): Combines sets (e.g., s1.union(s2))
• .intersection(): Finds common elements (e.g., s1.intersection(s2))
• .difference(): Finds elements of s1 that are not in s2 (e.g., s1.difference(s2))

Dictionary Methods
• .keys(): Returns all keys (e.g., d.keys())
• .values(): Returns all values (e.g., d.values())
• .items(): Returns key-value pairs (e.g., d.items())
• .get(): Retrieves value by key (e.g., d.get("name"))
• .update(): Updates dictionary (e.g., d.update({"age": 25}))
• .pop(): Removes key-value pair by key (e.g., d.pop("age"))
• .popitem(): Removes last inserted item (e.g., d.popitem())
• .clear(): Empties dictionary (e.g., d.clear())

Control Flow
• if, elif, else: Conditional statements (e.g., if x > 5: print("High"))
• for: Loops over sequences (e.g., for i in range(5): print(i))
• while: Loops until condition is false (e.g., while x < 10: x += 1)
• break: Exits loop early (e.g., if x == 5: break)
• continue: Skips to next iteration (e.g., if x == 5: continue)
• pass: Placeholder for future code (e.g., if x > 5: pass)

Functions
• def: Defines a function (e.g., def add(a, b): return a + b)
• return: Returns a value from a function (e.g., return result)
• lambda: Anonymous function (e.g., lambda x: x * 2 → 4 for x=2)
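• map(): Applies a function to each item of an iterable and returns an iterator (e.g., list(map(str.upper, ['a', 'b'])) → ['A', 'B'])
• filter(): Keeps the items that satisfy a condition and returns an iterator (e.g., list(filter(lambda x: x > 2, [1,2,3])) → [3])
• reduce(): Performs a cumulative operation; requires from functools import reduce (e.g., reduce(lambda x, y: x + y, [1,2,3]) → 6)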
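
The three helpers above can be sketched together as follows. This is only an illustration: the sample list nums is made up, and list() is used because map() and filter() return lazy iterators in Python 3.

```python
from functools import reduce  # reduce() is not a built-in in Python 3

nums = [1, 2, 3]  # hypothetical sample data

doubled = list(map(lambda x: x * 2, nums))   # apply a function to every item
large = list(filter(lambda x: x > 2, nums))  # keep only items passing the test
total = reduce(lambda x, y: x + y, nums)     # fold the list into a single value

print(doubled, large, total)
```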

File Handling
• open(): Opens a file (e.g., f = open('data.txt', 'r'))
• .read(), .readline(), .readlines(): Reads file content (e.g., f.read())
• .write(), .writelines(): Writes data to a file (e.g., f.write("Hello"))
• .close(): Closes the file (e.g., f.close())
• with open() as: Handles files safely (e.g., with open('file.txt') as f:)

Exception Handling
• try, except, finally: Handles errors (e.g., try: x=1/0 except ZeroDivisionError: print("Error"))
• raise: Raises an exception (e.g., raise ValueError("Invalid Input"))
• assert: Debugging check (e.g., assert x > 0, "x must be positive")

Modules
• import: Imports a module (e.g., import numpy as np)
• from import: Imports specific function (e.g., from math import sqrt)
• as: Renames module (e.g., import pandas as pd)
• dir(): Lists attributes of an object (e.g., dir(str))
• help(): Displays documentation (e.g., help(list))

Classes & OOP (Object-Oriented Programming)


• class: Defines a class (e.g., class Car:)
• __init__: Constructor method (e.g., def __init__(self, name): self.name = name)
• self: Refers to the instance of the class (e.g., self.attribute)
• @staticmethod: Defines a static method that doesn’t use self (e.g., @staticmethod def greet(): return "Hello")
• @classmethod: Defines a class method (e.g., @classmethod def from_string(cls, string): return cls(*string.split()))
• @property: Defines a read-only property (e.g., @property def get_name(self): return self.name)
• inheritance: Enables one class to inherit from another (e.g., class Car(Vehicle):)
• super(): Calls parent class methods (e.g., super().__init__())
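
A minimal sketch that combines the keywords above in one place. The Vehicle and Car classes and the label property are hypothetical names chosen for illustration, not part of any library.

```python
class Vehicle:
    def __init__(self, name):
        self.name = name


class Car(Vehicle):
    def __init__(self, name, brand):
        super().__init__(name)  # reuse the parent constructor
        self.brand = brand

    @classmethod
    def from_string(cls, string):
        # Alternative constructor: "X5 BMW" -> Car("X5", "BMW")
        return cls(*string.split())

    @staticmethod
    def greet():
        # Needs neither self nor cls
        return "Hello"

    @property
    def label(self):
        # Read-only attribute computed on access
        return f"{self.brand} {self.name}"


car = Car.from_string("X5 BMW")
print(car.label, Car.greet())
```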

Special Methods (Dunder Methods)


• __str__: Returns string representation (e.g., def __str__(self): return "Car Object")
• __repr__: Returns a detailed representation (e.g., def __repr__(self): return "Car('BMW')")
• __len__: Defines length behavior (e.g., def __len__(self): return len(self.data))
• __call__: Makes an object callable (e.g., def __call__(self): return "Called")
• __getitem__: Enables indexing (e.g., def __getitem__(self, index): return self.data[index])
• __setitem__: Modifies an item (e.g., def __setitem__(self, index, value): self.data[index] = value)
• __iter__: Makes an object iterable (e.g., def __iter__(self): return iter(self.data))
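
To see these methods working together, here is a small sketch of a list-like container. The Playlist class is a made-up example; only the dunder method names come from Python itself.

```python
class Playlist:
    def __init__(self, songs):
        self.data = list(songs)

    def __str__(self):
        return f"Playlist with {len(self.data)} songs"

    def __repr__(self):
        return f"Playlist({self.data!r})"

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __setitem__(self, index, value):
        self.data[index] = value

    def __iter__(self):
        return iter(self.data)

    def __call__(self):
        return "Called"


pl = Playlist(["a", "b"])
pl[1] = "c"            # uses __setitem__
print(len(pl), pl[0])  # uses __len__ and __getitem__
print(list(pl), pl())  # uses __iter__ and __call__
```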

Built-in Functions
• sum(): Computes sum (e.g., sum([1,2,3]) → 6)
• min(), max(): Finds min/max value (e.g., max([1,2,3]) → 3)
• abs(): Returns absolute value (e.g., abs(-5) → 5)
• round(): Rounds a number (e.g., round(3.14, 1) → 3.1)
• sorted(): Sorts a sequence (e.g., sorted([3,1,2]) → [1,2,3])
• enumerate(): Adds index while iterating (e.g., for i, val in enumerate(['a', 'b']): print(i, val))
• zip(): Combines iterables (e.g., list(zip([1,2], ['a', 'b'])) → [(1, 'a'), (2, 'b')])
• any(), all(): Checks if any/all elements meet a condition (e.g., all([True, False]) → False)

Datetime Module
• datetime.datetime: Represents date & time (e.g., datetime.datetime.now())
• datetime.date: Represents only the date (e.g., datetime.date.today())
• datetime.timedelta: Represents time difference (e.g., datetime.timedelta(days=5))
• datetime.datetime.strptime(), datetime.datetime.strftime(): Convert between string and datetime (e.g., datetime.datetime.strptime("2025-02-19", "%Y-%m-%d"))
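
A short sketch of a round trip through these classes, assuming the module is imported as import datetime (so the parsing function is reached as datetime.datetime.strptime). The date string is the one used in the example above.

```python
import datetime

# Parse a string into a datetime, shift it, and format it back to a string
start = datetime.datetime.strptime("2025-02-19", "%Y-%m-%d")
later = start + datetime.timedelta(days=5)
text = later.strftime("%Y-%m-%d")

print(text)  # 2025-02-24
```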

Random Module (For Generating Random Data)


• random.randint(): Returns a random integer within a range, both ends included (e.g., random.randint(1, 10) → 7)
• random.choice(): Selects a random item from a list (e.g., random.choice(['A', 'B', 'C']) → 'B')
• random.shuffle(): Shuffles elements of a list in place (e.g., random.shuffle(my_list))
• random.random(): Returns a random float between 0 and 1 (e.g., random.random() → 0.657)
• random.sample(): Returns a subset of elements (e.g., random.sample(range(10), 3) → [2, 8, 5])

Math Module (For Mathematical Operations)


• math.sqrt(): Computes square root (e.g., math.sqrt(25) → 5.0)
• math.pow(): Computes power (e.g., math.pow(2, 3) → 8.0)
• math.pi: Returns the value of π (e.g., math.pi → 3.14159)
• math.e: Returns Euler’s number (e.g., math.e → 2.718)
• math.sin(), math.cos(), math.tan(): Trigonometric functions (e.g., math.sin(math.pi/2) → 1.0)
• math.log(): Computes natural logarithm (e.g., math.log(10) → 2.302)

Regular Expressions (For Pattern Matching)


• import re: Imports the regex module (re).
• re.search(): Searches for a pattern anywhere in text and returns a match object or None (e.g., re.search(r'\d+', "Price is 100").group() → '100')
• re.match(): Matches a pattern only at the start of the string (e.g., re.match(r'Hello', "Hello World").group() → 'Hello')
• re.findall(): Finds all matches in a string (e.g., re.findall(r'\d+', "Price 100, Discount 50") → ['100', '50'])
• re.sub(): Replaces a pattern in text (e.g., re.sub(r'\d+', 'X', "Price 100") → "Price X")
• re.compile(): Compiles a regex for repeated use (e.g., pattern = re.compile(r'\d+'))
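
The functions above can be tried on one sample string, as in this sketch. The text value is illustrative. Note that re.search() and re.match() return a match object (or None), so .group() is needed to get the matched text.

```python
import re

text = "Price 100, Discount 50"  # sample input for illustration

m = re.search(r"\d+", text)           # first match anywhere in the string
first = m.group() if m else None      # guard against None when nothing matches

at_start = re.match(r"Price", text)   # only matches at the beginning
no_match = re.match(r"\d+", text)     # None: the string does not start with a digit

all_numbers = re.findall(r"\d+", text)
masked = re.sub(r"\d+", "X", text)

pattern = re.compile(r"\d+")          # reusable compiled pattern
count = len(pattern.findall(text))

print(first, all_numbers, masked, count)
```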

1. Python Basics
Que-What is indentation in Python?
Indentation is used to define code blocks instead of curly braces.

Que-How do you take user input in Python?


input("Enter something: ")

Que-What is the default data type of input()?


str (string).

Que-How do you convert a string to an integer?


int("10")

Que-What is the use of type() function?


It returns the data type of a variable.

2. Lists, Tuples, and Sets


Que-How do you add an element to a list?
my_list.append(5)

Que-How do you remove an element from a list?


my_list.remove(5)

Que-How do you access the last element of a list?


my_list[-1]

Que-How do you merge two sets?


set1 | set2 or set1.union(set2)

Que-How do you convert a list to a tuple?


tuple(my_list)

3. Class, Dictionary, Conditional, and Loop


Que-How do you access a dictionary value by key?
my_dict["key"]

Que-What happens if you access a non-existent key in a dictionary?


A KeyError occurs.

Que-How do you check if a dictionary is empty?


if not my_dict:

Que-How do you write an if-else statement in one line?


x = a if condition else b

Que-How do you iterate over dictionary keys and values?


for k, v in my_dict.items():
4. For Loop Practice
Que-How do you loop over a list?
for item in my_list:

Que-How do you iterate in reverse order?


for item in reversed(my_list):

Que-How do you loop over a range with a step?


for i in range(0, 10, 2):

Que-How do you find the sum of a list using a loop?


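total = 0
for item in my_list: total += item
(The built-in sum(my_list) gives the same result without an explicit loop.)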

Que-How do you break a loop early?


break

5. While Loop
Que-What is an infinite loop?
A loop that never stops running.

Que-How do you stop an infinite loop?


Use break or Ctrl + C.

Que-How do you decrement in a while loop?


while x > 0: x -= 1

Que-What happens if the condition in a while loop is always False?


The loop never executes.

Que-How do you use else with a while loop?


while condition: ... else: ...
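
As a sketch of the while ... else form: the else block runs only when the loop ends because its condition became false, not when it is left through break. The helper name find_first_negative is hypothetical.

```python
def find_first_negative(values):
    i = 0
    while i < len(values):
        if values[i] < 0:
            result = values[i]
            break          # skips the else block
        i += 1
    else:
        result = None      # runs only if the loop finished without break
    return result


print(find_first_negative([3, -1, -2]))
print(find_first_negative([1, 2]))
```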

6. Comprehension
Que-How do you create a set using comprehension?
{x for x in range(5)}

Que-How do you create a dictionary using comprehension?


{x: x**2 for x in range(5)}

Que-How do you filter items using comprehension?


[x for x in my_list if x > 10]

Que-How do you use nested loops in comprehension?


[x*y for x in range(3) for y in range(2)]

Que-How do you use a function inside list comprehension?


[func(x) for x in my_list]
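
The comprehension forms in this section can be run side by side, as in the sketch below. The ranges are arbitrary sample inputs.

```python
unique = {x % 3 for x in range(10)}                    # set comprehension
squares = {x: x**2 for x in range(5)}                  # dict comprehension
evens = [x for x in range(10) if x % 2 == 0]           # filtering
products = [x * y for x in range(3) for y in range(2)] # nested loops, x is the outer loop
lengths = [len(word) for word in ["a", "bb", "ccc"]]   # calling a function per item

print(unique, squares, evens, products, lengths)
```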
7. Functions
Que-How do you define a function?
def my_function():

Que-What does a function return if no return statement is given?


None

Que-How do you specify default arguments?


def func(x=10):

Que-What is the difference between *args and **kwargs?


*args collects extra positional arguments into a tuple, **kwargs collects extra keyword arguments into a dictionary.

Que-How do you return multiple values from a function?


return a, b, c
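
A small sketch of variable arguments and multiple return values. The function names describe and min_max are made up for illustration. Returning a, b actually returns one tuple, which the caller can unpack.

```python
def describe(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    return len(args), sorted(kwargs)


def min_max(values):
    # "Multiple" return values are packed into a single tuple
    return min(values), max(values)


count, names = describe(1, 2, 3, unit="cm", scale=2)
low, high = min_max([4, 1, 9])
print(count, names, low, high)
```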

8. Generator Functions
Que-What is a generator function?
A function that uses yield to return values lazily.

Que-How do you create a generator?


Use yield instead of return.

Que-What is the advantage of a generator?


It saves memory as values are generated on demand.

Que-How do you iterate over a generator?


for val in generator():

Que-How do you convert a generator to a list?


list(generator())
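
A minimal generator sketch, using a hypothetical countdown function. Values are produced one at a time, and a generator that has been partly consumed continues from where it stopped.

```python
def countdown(n):
    # Each yield hands back one value and pauses until the next request
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)   # pull one value manually
rest = list(gen)    # consume whatever is left

print(first, rest)
```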

9. Lambda Functions
Que-What is a lambda function?
An anonymous function defined using lambda.

Que-How do you define a lambda function?


lambda x: x + 2

Que-How do you use a lambda function inside map()?


map(lambda x: x*2, my_list)

Que-Can a lambda function have multiple parameters?


Yes, lambda x, y: x + y.

Que-Can a lambda function return multiple values?


No, but it can return a tuple.
10. Map, Reduce, and Filter
Que-What does map() do?
Applies a function to all elements in an iterable.

Que-What does filter() do?


Filters elements based on a condition.

Que-What does reduce() do?


Applies a function cumulatively to elements.

Que-How do you sum a list using reduce()?


reduce(lambda x, y: x + y, my_list)

Que-What is required for reduce() to work?


Import from functools: from functools import reduce

11. OOPS Concepts


Que-What is a class?
A blueprint for creating objects.

Que-What is an object?
An instance of a class.

Que-How do you define a class?


class MyClass:

Que-What is self in Python classes?


A reference to the instance of the class.

Que-How do you create an instance of a class?


obj = MyClass()

12. Polymorphism
Que-What is polymorphism?
The ability of objects of different classes to respond to the same method call or interface, each in its own way.

Que-How do you implement polymorphism?


By overriding methods in subclasses.

Que-How do you use method overloading in Python?


Python does not support method overloading directly; default arguments or *args are typically used instead.

Que-What is operator overloading?


Using special methods like __add__ to redefine operators.

Que-Which dunder method is used for string representation?


__str__()
13. Encapsulation
Que-What is encapsulation in Python?
It is the bundling of data and methods into a single unit (class).

Que-How do you define a private variable in Python?


Prefix it with double underscore: self.__private_var.

Que-Can private variables be accessed outside the class?


Not directly by that name, but they remain reachable through name mangling: obj._ClassName__private_var.

Que-How do you define a protected variable in Python?


Prefix it with a single underscore: self._protected_var.

Que-Why is encapsulation important?


It helps in data hiding and protects data from direct modification.
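
The naming conventions above can be checked with a short sketch. The Account class is hypothetical. The point is that a double-underscore attribute is renamed by Python (name mangling) rather than truly hidden.

```python
class Account:
    def __init__(self, balance):
        self._note = "internal use"  # single underscore: "protected" by convention only
        self.__balance = balance     # double underscore: stored as _Account__balance

    def balance(self):
        return self.__balance


acct = Account(100)

try:
    acct.__balance                   # no such attribute under this name outside the class
    direct_access_ok = True
except AttributeError:
    direct_access_ok = False

mangled = acct._Account__balance     # works, but bypasses the intended interface
print(direct_access_ok, mangled, acct.balance())
```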

14. Inheritance
Que-What is inheritance in Python?
It allows a class to derive properties and methods from another class.

Que-How do you create a subclass in Python?


class Child(Parent):

Que-How do you call a parent class method in a child class?


super().method_name()

Que-What is multiple inheritance?


A class inheriting from more than one parent class.

Que-What is method overriding?


Redefining a method in the subclass that exists in the parent class.

15. Abstraction
Que-What is abstraction in Python?
Hiding implementation details and exposing only necessary functionalities.

Que-Which module provides abstraction support?


abc (Abstract Base Classes).

Que-How do you define an abstract class?


from abc import ABC, abstractmethod
class MyClass(ABC):

Que-How do you define an abstract method?


@abstractmethod
def my_method(self): pass

Que-Can an abstract class be instantiated?


No, it must be subclassed and implemented.
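
A minimal sketch of an abstract base class, with hypothetical Shape and Square classes. Instantiating the abstract class raises TypeError, while a subclass that implements every abstract method can be created normally.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must provide this."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()                  # abstract class: cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

print(instantiated, Square(3).area())
```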
16. Decorators
Que-What is a decorator in Python?
A function that modifies another function’s behavior.

Que-How do you define a decorator?


def decorator(func):
    def wrapper():
        func()
    return wrapper

Que-How do you apply a decorator?


Use @decorator_name before the function definition.

Que-Can a decorator take arguments?


Yes, by using nested functions.

Que-What is functools.wraps used for?


It preserves the original function's metadata.
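
The answers above can be combined in one sketch: a decorator that takes an argument (hence the extra level of nesting) and uses functools.wraps to keep the wrapped function's name and docstring. The names repeat and greet are illustrative only.

```python
import functools


def repeat(times):
    # Outer function receives the decorator's argument
    def decorator(func):
        @functools.wraps(func)  # copy __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=2)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"


print(greet("Ada"), greet.__name__)
```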

17. Class Methods


Que-What is a class method?
A method that operates on the class, not the instance.

Que-How do you define a class method?


Use @classmethod.

Que-What is the first parameter of a class method?


cls (class reference).

Que-How do you call a class method?


ClassName.method_name() or instance.method_name().

Que-How is a class method different from a static method?


A class method receives cls, while a static method does not.

18. Static Methods


Que-What is a static method?
A method that does not access class or instance attributes.

Que-How do you define a static method?


Use @staticmethod.

Que-Can a static method access self or cls?


No, it works independently of class instances.

Que-How do you call a static method?


ClassName.method_name() or instance.method_name().

Que-When should you use a static method?


When the method logic does not depend on instance or class attributes.
19. Special (Magic/Dunder) Methods
Que-What are special (dunder) methods?
Methods with double underscores used for operator overloading and built-in behaviors.

Que-What is __init__ used for?


It initializes a new instance of a class.

Que-What is __str__ used for?


Defines the string representation of an object.

Que-What is __len__ used for?


Returns the length of an object.

Que-What is __call__ used for?


Allows an instance of a class to be called like a function.

20. Property Decorators - Getters, Setters, and Deleters


Que-What is the purpose of @property?
It defines a method as a property.

Que-How do you create a getter method?


Use @property.

Que-How do you create a setter method?


Use @property_name.setter.

Que-How do you delete a property?


Use @property_name.deleter.

Que-Why use property decorators?


To control attribute access and validation.
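
A sketch of a getter, setter, and deleter on one attribute. The Temperature class and its validation rule are hypothetical. Note that the setter and deleter decorators are named after the property (here celsius), not after the word "property".

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter, so it is validated

    @property
    def celsius(self):          # getter
        return self._celsius

    @celsius.setter
    def celsius(self, value):   # setter with validation
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @celsius.deleter
    def celsius(self):          # deleter
        del self._celsius


t = Temperature(20)
t.celsius = 25
print(t.celsius)
```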

21. Working with Files


Que-How do you open a file in Python?
open("file.txt", "r")

Que-How do you read a file?


file.read()

Que-How do you write to a file?


file.write("Hello")

Que-How do you close a file?


file.close()

Que-What is the advantage of with open()?


It automatically closes the file.
22. Reading and Writing Files
Que-How do you read lines from a file?
file.readlines()

Que-How do you write multiple lines to a file?


file.writelines(["line1\n", "line2\n"])  (writelines() does not add newline characters itself)

Que-What happens if you open a file in append mode?


Data is added to the file instead of overwriting.

Que-How do you check if a file exists?


import os; os.path.exists("file.txt")

Que-How do you delete a file?


os.remove("file.txt")
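
The file operations from the last two sections can be tried safely in a temporary directory, as in this sketch. The file name demo.txt is arbitrary, and tempfile is used only so the example leaves nothing behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "w") as f:                # "w" creates or overwrites
        f.writelines(["line1\n", "line2\n"])  # newlines must be supplied explicitly

    with open(path, "a") as f:                # "a" appends to the end
        f.write("line3\n")

    with open(path) as f:                     # default mode is "r"
        lines = f.readlines()

    os.remove(path)
    exists_after = os.path.exists(path)

print(lines, exists_after)
```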

23. Exception Handling


Que-What is exception handling?
It prevents program crashes due to runtime errors.

Que-How do you handle exceptions?


Using try-except blocks.

Que-How do you handle multiple exceptions?


except (TypeError, ValueError):

Que-How do you use finally?


finally executes code regardless of exceptions.

Que-What is raise used for?


To manually trigger an exception.

24. Multithreading
Que-What is multithreading?
Running multiple threads concurrently within a single process.

Que-How do you create a thread?


import threading; thread = threading.Thread(target=func)

Que-How do you start a thread?


thread.start()

Que-What is the GIL?


Global Interpreter Lock: a mutex in CPython that lets only one thread execute Python bytecode at a time.

Que-How do you make a thread wait?


thread.join()
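
A minimal sketch of creating, starting, and joining threads. The worker function and the shared results list are made up for illustration. A lock is used here to make the shared update explicit.

```python
import threading

results = []
lock = threading.Lock()


def worker(n):
    with lock:                 # protect the shared list
        results.append(n * n)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()                  # begin running worker in each thread
for t in threads:
    t.join()                   # wait until each thread has finished

print(sorted(results))
```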
25. Multiprocessing
Que-What is multiprocessing?
Running processes in parallel for CPU-bound tasks.

Que-How do you create a process?


from multiprocessing import Process
process = Process(target=func)

Que-How do you start a process?


process.start()

Que-How do you terminate a process?


process.terminate()

Que-Why use multiprocessing over multithreading?


To bypass Python’s GIL and utilize multiple CPU cores.

26. Custom Exception Handling


Que-How do you define a custom exception?
class MyException(Exception): pass

Que-How do you raise a custom exception?


raise MyException("Error message")

Que-Can custom exceptions inherit from Exception?


Yes, all custom exceptions should inherit from Exception.

Que-Why use custom exceptions?


To create meaningful and domain-specific error messages.

Que-Can you define multiple custom exceptions?


Yes, by creating different classes inheriting from Exception.

27. List of General Use Exceptions


Que-What is TypeError?
Raised when an operation is applied to an inappropriate type.

Que-What is ValueError?
Raised when a function gets an argument of the right type but invalid value.

Que-What is IndexError?
Raised when accessing an index that does not exist in a sequence.

Que-What is KeyError?
Raised when trying to access a key that does not exist in a dictionary.

Que-What is ZeroDivisionError?
Raised when division by zero occurs.
28. Best Practices for Exception Handling
Que-Why should you avoid using a generic except?
It catches all exceptions, making debugging difficult.

Que-How do you log exceptions?


Using the logging module.

Que-Why should you use finally?


To ensure resource cleanup, like closing a file.

Que-What is exception chaining?


Using raise ... from ... to maintain context between exceptions.

Que-Why should you handle specific exceptions?


To provide better error handling and debugging.
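
These practices can be sketched together: catch a specific exception, then re-raise it as a domain-specific one with raise ... from so the original cause is kept. ConfigError and parse_port are hypothetical names.

```python
class ConfigError(Exception):
    """Hypothetical domain-specific error."""


def parse_port(text):
    try:
        return int(text)
    except ValueError as exc:  # handle only the error we expect
        raise ConfigError(f"invalid port: {text!r}") from exc


try:
    parse_port("abc")
except ConfigError as err:
    cause = err.__cause__      # the original ValueError is preserved
    message = str(err)

print(message, type(cause).__name__)
```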

29. Logging & Debugging


Que-What is the purpose of the logging module?
To record runtime messages and debug information.

Que-How do you log an error message?


import logging; logging.error("An error occurred")

Que-What are different log levels in Python?


DEBUG, INFO, WARNING, ERROR, CRITICAL

Que-How do you configure logging to a file?


logging.basicConfig(filename="app.log", level=logging.DEBUG)

Que-What is the default logging level?


WARNING

30. Modules and Import Statements


Que-How do you import a module in Python?
import module_name

Que-How do you import a specific function from a module?


from module_name import function_name

Que-How do you rename a module while importing?


import module_name as alias

Que-How do you check all functions in a module?


dir(module_name)

Que-How do you install an external module?


pip install module_name
31. Working with Buffered Read and Write
Que-What is buffered file reading?
Reading large files in chunks to optimize memory.

Que-How do you read a file line by line?


with open("file.txt") as f:
    for line in f:

Que-How do you flush a file write buffer?


file.flush()

Que-What does file.seek(0) do?


Moves the cursor to the beginning of the file.

Que-What is the benefit of using buffered I/O?


It improves performance when handling large files.

32. Multithreading vs. Multiprocessing


Que-What is the key difference between threads and processes?
Threads share memory, processes run in separate memory spaces.

Que-When should you use multithreading?


When the task is I/O-bound (e.g., file reading, network calls).

Que-When should you use multiprocessing?


When the task is CPU-bound (e.g., computations, data processing).

Que-How do you create a process pool?


from multiprocessing import Pool; Pool(processes=4)

Que-Does Python truly execute threads in parallel?


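Not for CPU-bound Python code in standard CPython: the Global Interpreter Lock (GIL) lets only one thread run Python bytecode at a time, although threads can still overlap while waiting on I/O.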

33. Advanced Python Topics


Que-What is a metaclass in Python?
A class that defines the behavior of other classes.

Que-What is the purpose of __slots__?


It restricts attribute creation in a class for memory optimization.

Que-What is duck typing in Python?


If an object behaves like a type, it is treated as that type.

Que-What is monkey patching?


Dynamically modifying a class or module at runtime.

Que-What is the difference between deep copy and shallow copy?


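A shallow copy (copy.copy) creates a new outer object but shares references to the nested objects, while a deep copy (copy.deepcopy) recursively clones the nested objects as well.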
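
The difference shows up as soon as a nested object is modified, as in this sketch. The nested list is sample data.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # inner lists are cloned too

original[0].append(99)           # mutate a nested list

print(shallow[0])  # sees the change: the inner list is shared
print(deep[0])     # unaffected: it has its own inner list
```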
34. Iterators and Generators
Que-What is an iterator in Python?
An object with __iter__() and __next__() methods.

Que-How do you create an iterator?


Define a class with __iter__() and __next__().

Que-What is a generator?
A function that yields values lazily.

Que-How do you create a generator function?


Use yield instead of return.

Que-How do you manually get the next value of a generator?


next(generator)
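
A minimal class-based iterator, for comparison with the generator approach. The Countdown class is a made-up example. __next__ signals the end by raising StopIteration.

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration    # tells for-loops and list() to stop
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))

it = Countdown(1)
print(next(it), next(it, "done"))  # the second argument is a default once exhausted
```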

35. Python Memory Management


Que-How does Python manage memory?
Using automatic garbage collection and reference counting.

Que-What is reference counting?


Python keeps track of the number of references to an object.

Que-What is garbage collection?


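Python frees objects whose reference count drops to zero; a cyclic garbage collector (the gc module) also reclaims groups of objects that only reference each other.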

Que-How do you manually trigger garbage collection?


import gc; gc.collect()

Que-How do you check memory usage in Python?


Using sys.getsizeof(obj), which reports the size of a single object in bytes (not the objects it refers to).

36. Python Best Practices


Que-Why should you use virtual environments?
To manage dependencies separately for projects.

Que-How do you create a virtual environment?


python -m venv env_name

Que-How do you activate a virtual environment?


source env_name/bin/activate (Linux) or env_name\Scripts\activate (Windows).

Que-How do you freeze package dependencies?


pip freeze > requirements.txt

Que-How do you install dependencies from a file?


pip install -r requirements.txt
37. Advanced List Methods
Que-How do you reverse a list?
my_list[::-1] (returns a new reversed list) or my_list.reverse() (reverses in place)

Que-How do you find the index of an element?


my_list.index(value)

Que-How do you count occurrences of an element?


my_list.count(value)

Que-How do you remove duplicates from a list?


list(set(my_list)) (order not preserved) or list(dict.fromkeys(my_list)) (keeps first-seen order)

Que-How do you flatten a nested list?


[item for sublist in nested_list for item in sublist]
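
A quick sketch of the list idioms in this section on sample data. The order-preserving de-duplication with dict.fromkeys is shown as one common option, and the flattening comprehension handles one level of nesting only.

```python
my_list = [3, 1, 3, 2, 1]          # sample data
nested_list = [[1, 2], [3], [4, 5]]

reversed_copy = my_list[::-1]                    # new list, original untouched
unique_ordered = list(dict.fromkeys(my_list))    # keeps first-seen order
unique_any_order = set(my_list)                  # order not guaranteed
flat = [item for sublist in nested_list for item in sublist]

print(reversed_copy, unique_ordered, flat)
```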

38. Advanced String Manipulation


Que-How do you split a string into a list?
string.split(separator)

Que-How do you join a list into a string?


separator.join(my_list)

Que-How do you replace substrings in a string?


string.replace(old, new)

Que-How do you strip whitespace from a string?


string.strip()

Que-How do you check if a string starts with a specific word?


string.startswith("word")

39. Miscellaneous Python Concepts


Que-What is the difference between is and ==?
is checks object identity, == checks value equality.

Que-How do you check the Python version?


python --version

Que-What is the difference between json.load() and json.loads()?


load() reads from a file, loads() reads from a string.

Que-What is the difference between repr() and str()?


repr() gives an unambiguous representation, str() is human-readable.

Que-How do you sort a dictionary by values?


sorted(my_dict.items(), key=lambda x: x[1])
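
A short sketch of two answers from this section, using made-up sample values. sorted() on the items gives a list of (key, value) pairs, and wrapping it in dict() rebuilds a dictionary in that order. The second part contrasts == with is.

```python
scores = {"a": 3, "b": 1, "c": 2}  # sample data

pairs = sorted(scores.items(), key=lambda kv: kv[1])  # list of tuples, ascending by value
by_value = dict(pairs)                                # dicts keep insertion order

first = [1, 2]
second = [1, 2]
same_value = first == second    # equal contents
same_object = first is second   # two distinct list objects

print(pairs, list(by_value), same_value, same_object)
```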
1. Pandas
1. What is Pandas used for?
It is a Python library for data manipulation and analysis.

2. What are the two primary Pandas data structures?


Series (1D) and DataFrame (2D).

3. How do you create a Pandas DataFrame?


df = pd.DataFrame(data)

4. How to read a CSV file in Pandas?


df = pd.read_csv("file.csv")
5. How to write a DataFrame to a CSV file?
df.to_csv("file.csv", index=False)

6. How do you check for missing values in a DataFrame?


df.isnull().sum()

7. How do you fill missing values in Pandas?


df.fillna(value, inplace=True)

8. How to drop missing values?


df.dropna(inplace=True)

9. How to get summary statistics of a DataFrame?


df.describe()

10. How do you select a specific column in Pandas?


df["column_name"]

11. How do you select multiple columns?


df[["col1", "col2"]]

12. How to filter rows in Pandas?


df[df["col"] > value]

13. How to sort a DataFrame?


df.sort_values(by="column", ascending=True)

14. How to rename columns?


df.rename(columns={"old_name": "new_name"}, inplace=True)

15. How to reset the index of a DataFrame?


df.reset_index(drop=True, inplace=True)

16. How to set a specific column as an index?


df.set_index("column", inplace=True)

17. How to group data in Pandas?


df.groupby("column").mean(numeric_only=True)

18. How to merge two DataFrames?


df_merged = pd.merge(df1, df2, on="key")

19. How to concatenate DataFrames?


df_concat = pd.concat([df1, df2], axis=0)

20. How to check for duplicate values?


df.duplicated().sum()
21. How to drop duplicate values?
df.drop_duplicates(inplace=True)

22. How to apply a function to a column?


df["col"] = df["col"].apply(func)

23. How to convert a column’s datatype?


df["col"] = df["col"].astype(dtype)

24. How to extract year from a datetime column?


df["year"] = df["date_col"].dt.year

25. How to get unique values of a column?


df["col"].unique()

2. NumPy
26. What is NumPy used for?
Efficient numerical computations and array operations.

27. How do you create a NumPy array?


arr = np.array([1, 2, 3])

28. How do you create a NumPy array of zeros?


np.zeros((3,3))

29. How to create a NumPy array of ones?


np.ones((3,3))

30. How to generate a range of numbers using NumPy?


np.arange(1, 10, 2)

31. How to generate random numbers in NumPy?


np.random.rand(3,3)

32. How to generate a random integer array?


np.random.randint(1, 10, size=(3,3))

33. How to find the shape of a NumPy array?


arr.shape

34. How to find the data type of a NumPy array?


arr.dtype

35. How to reshape a NumPy array?


arr.reshape(2,3)
36. How to find the mean of an array?
np.mean(arr)

37. How to find the median of an array?


np.median(arr)

38. How to find the standard deviation of an array?


np.std(arr)

39. How to get the maximum value in an array?


np.max(arr)

40. How to get the minimum value in an array?


np.min(arr)

41. How to perform element-wise addition in NumPy?


arr1 + arr2

42. How to perform element-wise multiplication?


arr1 * arr2

43. How to get the dot product of two arrays?


np.dot(arr1, arr2)

44. How to get the inverse of a matrix?


np.linalg.inv(matrix)

45. How to get the determinant of a matrix?


np.linalg.det(matrix)

46. How to concatenate two NumPy arrays?


np.concatenate((arr1, arr2), axis=0)

47. How to split a NumPy array?


np.split(arr, 2)

48. How to find unique values in an array?


np.unique(arr)

49. How to save a NumPy array to a file?


np.save("filename.npy", arr)

50. How to load a NumPy array from a file?


np.load("filename.npy")
3. Seaborn
51. What is Seaborn used for?
Statistical data visualization.

52. How to import Seaborn?


import seaborn as sns

53. How to load a sample dataset in Seaborn?


df = sns.load_dataset("tips")

54. How to create a scatter plot?


sns.scatterplot(x="col1", y="col2", data=df)

55. How to create a line plot?


sns.lineplot(x="col1", y="col2", data=df)

56. How to create a bar plot?


sns.barplot(x="col1", y="col2", data=df)

57. How to create a box plot?


sns.boxplot(x="col1", y="col2", data=df)

58. How to create a violin plot?


sns.violinplot(x="col1", y="col2", data=df)

59. How to create a heatmap?


sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")

60. How to create a pairplot?


sns.pairplot(df)

4. Matplotlib
61. What is Matplotlib used for?
It is a Python library for creating static, animated, and interactive visualizations.

62. How do you import Matplotlib?


import matplotlib.pyplot as plt

63. How do you create a simple plot?


plt.plot([1, 2, 3, 4])
plt.show()

64. How to add a title to a plot?


plt.title("My Plot")
65. How to label x and y axes?
plt.xlabel("X-axis")
plt.ylabel("Y-axis")

66. How to set axis limits in Matplotlib?


plt.xlim(0, 10); plt.ylim(0, 100)

67. How to change line color in a plot?


plt.plot(x, y, color='red')

68. How to change line style in a plot?


plt.plot(x, y, linestyle='--')

69. How to add a legend to a plot?


plt.legend(["Line 1"])

70. How to create a scatter plot in Matplotlib?


plt.scatter(x, y)

71. How to create a bar plot?


plt.bar(x, y)

72. How to create a histogram?


plt.hist(data, bins=10)

73. How to create a pie chart?


plt.pie(sizes, labels=labels, autopct="%1.1f%%")

74. How to create multiple subplots?


fig, ax = plt.subplots(2, 2)

75. How to change figure size in Matplotlib?


plt.figure(figsize=(10,5))

76. How to save a plot as an image?


plt.savefig("plot.png")

77. How to display a grid in Matplotlib?


plt.grid(True)

78. How to set log scale for axes?


plt.yscale("log")

79. How to change font size in Matplotlib?


plt.rcParams["font.size"] = 14

80. How to rotate x-axis labels?


plt.xticks(rotation=45)

81. How to change marker style in a scatter plot?


plt.scatter(x, y, marker='o')

82. How to create a horizontal bar plot?


plt.barh(x, y)

83. How to add annotations to a plot?


plt.annotate("Point", xy=(2, 4), xytext=(3, 5), arrowprops=dict(arrowstyle="->"))

84. How to plot multiple lines in the same graph?


plt.plot(x1, y1, label="Line 1")
plt.plot(x2, y2, label="Line 2")
plt.legend()

85. How to use different colormaps in Matplotlib?


plt.scatter(x, y, c=z, cmap="viridis")

5. Plotly
86. What is Plotly used for?
It is a Python library for interactive data visualization.

87. How to import Plotly?


import plotly.express as px

88. How to create a scatter plot in Plotly?


fig = px.scatter(df, x="col1", y="col2")
fig.show()

89. How to create a line plot?


fig = px.line(df, x="col1", y="col2")
fig.show()

90. How to create a bar chart?


fig = px.bar(df, x="col1", y="col2")
fig.show()

91. How to create a histogram in Plotly?


fig = px.histogram(df, x="col")
fig.show()

92. How to create a box plot in Plotly?


fig = px.box(df, x="col1", y="col2")
fig.show()

93. How to create a violin plot in Plotly?


fig = px.violin(df, x="col1", y="col2")
fig.show()

94. How to create a heatmap in Plotly?


import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(z=df.values))
fig.show()

95. How to create a 3D scatter plot?


fig = px.scatter_3d(df, x="col1", y="col2", z="col3")
fig.show()

96. How to create an animated plot?


fig = px.scatter(df, x="col1", y="col2", animation_frame="time")
fig.show()

97. How to add a title to a Plotly plot?


fig.update_layout(title="My Plot")

98. How to change axis labels in Plotly?


fig.update_layout(xaxis_title="X-Axis", yaxis_title="Y-Axis")

99. How to change color scale in a Plotly plot?


fig.update_coloraxes(colorscale="Viridis")

100. How to customize a Plotly plot theme?


fig.update_layout(template="plotly_dark")

101. How to show data labels in a bar plot?


fig.update_traces(text=df["col"], textposition="outside")

102. How to set the figure size in Plotly?


fig.update_layout(width=800, height=500)

103. How to add a hover tooltip in Plotly?


fig.update_traces(hoverinfo="text")

104. How to create a grouped bar chart?


fig = px.bar(df, x="col1", y="col2", color="category", barmode="group")
fig.show()

105. How to create a stacked bar chart?


fig = px.bar(df, x="col1", y="col2", color="category", barmode="stack")
fig.show()

106. How to plot a choropleth map in Plotly?


fig = px.choropleth(df, locations="country", color="value")
fig.show()

107. How to add reference lines in a Plotly plot?


fig.add_shape(type="line", x0=0, y0=0, x1=1, y1=1, line=dict(color="Red"))

108. How to add custom annotations in Plotly?


fig.add_annotation(x=2, y=4, text="Custom Annotation")

109. How to save a Plotly figure as an image?


fig.write_image("plot.png")  # static image export requires the kaleido package

110. How to export Plotly graphs to HTML?


fig.write_html("plot.html")

Basic Probability Concepts


1. What is probability?
Probability measures the likelihood of an event occurring, ranging from 0 to 1.

2. What are independent events?


Events where the occurrence of one does not affect the probability of another.

3. What are mutually exclusive events?


Events that cannot occur simultaneously (e.g., getting heads and tails in a single coin
toss).

4. What is conditional probability?


The probability of an event occurring given that another event has already occurred,
P(A | B) = P(A ∩ B) / P(B).

5. What is Bayes’ Theorem?


A formula to find the probability of an event given prior knowledge:
P(A | B) = P(B | A) · P(A) / P(B).
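
A worked example makes the formula concrete. The sketch below assumes a hypothetical diagnostic test (1% prevalence, 95% sensitivity, 5% false-positive rate); these numbers and the function name are illustrative only. P(B) is obtained with the Law of Total Probability.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A) and P(B | not A)."""
    # Law of Total Probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior (prevalence) is small.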

6. What is the Law of Total Probability?


The total probability of an event is the sum of its probabilities over all mutually
exclusive and exhaustive cases: P(A) = Σ P(A | B_i) · P(B_i).

7. What is the difference between discrete and continuous probability distributions?


Discrete distributions deal with countable outcomes, while continuous distributions
deal with uncountable ranges.

8. What is the complement rule in probability?


The probability of an event not occurring is 1 minus the probability of it occurring:
P(A') = 1 - P(A).

9. What is the probability of at least one event occurring in multiple trials?


1 - P(none), where P(none) is the probability that the event occurs in none of the trials.

10. What is the probability of independent events A and B occurring together?


P(A ∩ B) = P(A) × P(B).

Probability Distributions
11. What is a probability distribution?
A function that describes the likelihood of different outcomes in a dataset.

12. What is a Bernoulli distribution?


A distribution for binary (0 or 1) outcomes, with P(success) = p.

13. What is a Binomial distribution?


The probability distribution of the number of successes in n independent trials,
each with success probability p.

14. What is a Poisson distribution?


Models the number of events occurring in a fixed interval of time or space.

15. What is a Normal distribution?


A symmetric, bell-shaped distribution defined by mean μ and standard
deviation σ.

16. What is the Central Limit Theorem?


The distribution of sample means approaches a normal distribution as sample size increases,
regardless of the population's distribution (provided its variance is finite).
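
The theorem can be checked with a small simulation. This is a sketch under assumptions: the population is exponential with mean 1 (clearly non-normal), and the sample sizes, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_means(sample_size, num_samples=2000):
    """Means of repeated samples drawn from an exponential population (mean 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


means_small = sample_means(2)
means_large = sample_means(50)

# The spread of the sample means shrinks roughly like 1 / sqrt(sample_size).
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)
print(spread_small, spread_large)
```

A histogram of means_large should look close to a bell curve centered near 1, while means_small remains visibly skewed.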

17. What is a Gaussian distribution?


Another name for the normal distribution.

18. What is the difference between normal and standard normal distribution?
A normal distribution can have any mean μ and standard deviation σ; the standard normal
distribution is the special case with a mean of 0 and a standard deviation of 1.

19. What is the Exponential distribution used for?


Models time until the next event occurs.

20. What is the Uniform distribution?


A distribution where all outcomes are equally likely.

Statistical Measures
21. What is the mean?
The average of all values: μ = (Σx) / n.

22. What is the median?


The middle value when data is sorted.

23. What is the mode?


The most frequently occurring value in a dataset.

24. What is variance?


The average squared deviation of values from the mean: σ² = Σ(x - μ)² / n
(the sample variance divides by n - 1 instead).

25. What is standard deviation?


The square root of variance, representing data spread.

26. What is the coefficient of variation?


A normalized measure of dispersion: CV = σ / μ, often expressed as a percentage.

27. What is skewness?


Measures asymmetry in a distribution.

28. What is kurtosis?


Measures the "tailedness" of a distribution.

29. What is covariance?


Measures the directional relationship between two variables.

30. What is correlation?


A standardized measure of linear relationship, ranging from -1 to 1.
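
The measures in this section can be computed with the standard library. The sketch below uses two small made-up lists; population formulas (dividing by n) are an assumption made to keep the arithmetic easy to check by hand.

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data (assumed)
y = [1, 3, 2, 5, 4, 6, 8, 9]
n = len(x)

mean_x = statistics.fmean(x)          # 5.0
median_x = statistics.median(x)       # 4.5
mode_x = statistics.mode(x)           # 4
variance_x = statistics.pvariance(x)  # population variance: 4
std_x = statistics.pstdev(x)          # 2.0
cv_x = std_x / mean_x                 # coefficient of variation: 0.4

# Skewness as the third standardized moment; positive means a longer right tail.
skewness_x = sum((v - mean_x) ** 3 for v in x) / n / std_x ** 3

# Population covariance and Pearson correlation between x and y.
mean_y = statistics.fmean(y)
std_y = statistics.pstdev(y)
covariance_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
correlation_xy = covariance_xy / (std_x * std_y)
```

Covariance depends on the units of x and y, whereas the correlation is unit-free and always lies between -1 and 1.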

Inferential Statistics
31. What is hypothesis testing?
A method to test assumptions about a population using sample data.

32. What is a null hypothesis (H0)?


The assumption that there is no effect or difference.

33. What is an alternative hypothesis (H1)?


The assumption that there is a significant effect or difference.

34. What is a Type I error?


Rejecting H0 when it is actually true (false positive).

35. What is a Type II error?


Failing to reject H0 when it is actually false (false negative).

36. What is a p-value?


The probability of obtaining results at least as extreme as observed, assuming
H0 is true.

37. What is confidence interval?


A range of values, computed from sample data, that is expected to contain the population
parameter at a stated confidence level (e.g., 95%).

38. What is statistical power?


The probability of correctly rejecting a false null hypothesis.

39. What is an F-test?


Compares variances between two populations.

40. What is an ANOVA test?


Tests whether means of three or more groups are significantly different.

Advanced Concepts
41. What is a Markov Chain?
A stochastic process where future states depend only on the present state.

42. What is the Law of Large Numbers?


As sample size increases, sample mean approaches population mean.

43. What is the difference between parametric and non-parametric tests?


Parametric tests assume a specific distribution; non-parametric tests do not.

44. What is Maximum Likelihood Estimation (MLE)?


A method for estimating parameters by maximizing the likelihood function.

45. What is the Bootstrap method?


A resampling technique for estimating sampling distribution.

46. What is Monte Carlo Simulation?


A computational method using repeated random sampling.
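
A classic illustration is estimating π by sampling random points in the unit square and counting how many fall inside the quarter circle. The function name, point count, and seed below are illustrative assumptions.

```python
import random


def estimate_pi(num_points, seed=0):
    """Estimate pi from the fraction of random points inside the quarter unit circle."""
    rng = random.Random(seed)  # local generator so the result is repeatable
    inside = sum(
        1
        for _ in range(num_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    # Area of quarter circle / area of unit square = pi / 4.
    return 4 * inside / num_points


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The estimate improves as num_points grows, which is also a practical demonstration of the Law of Large Numbers.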

47. What is Bayesian Inference?


Updating probabilities as more evidence is introduced.

48. What is a Chi-Square test?


A statistical test to determine if categorical variables are independent.

49. What is the difference between one-tailed and two-tailed tests?


One-tailed tests predict direction; two-tailed tests check for any difference.

50. What is the difference between descriptive and inferential statistics?


Descriptive statistics summarize data; inferential statistics draw conclusions.

Probability Theory in Machine Learning


51. What is the probability mass function (PMF)?
The function that gives the probability of discrete random variables.

52. What is the probability density function (PDF)?


The function whose area over an interval gives the probability that a continuous random
variable falls in that interval.

53. What is the cumulative distribution function (CDF)?


The probability that a variable takes a value less than or equal to a certain number.

54. What is the difference between PMF and PDF?


PMF is for discrete variables, while PDF is for continuous variables.

55. What is an expected value?


The mean of a probability distribution:
E(X) = Σ x P(x).

56. What is the variance of a probability distribution?


Measures how much values deviate from the expected value:
Var(X) = E[(X - μ)²].

57. What is the entropy of a distribution?


Measures the uncertainty in a probability distribution:
H(X) = -Σ P(x) log P(x).

58. What is cross-entropy?


A loss function measuring the difference between two probability distributions.

59. What is the Kullback-Leibler (KL) divergence?


Measures how one probability distribution differs from another.
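
Entropy, cross-entropy, and KL divergence are closely related: KL(P || Q) equals the cross-entropy of P and Q minus the entropy of P. The sketch below assumes discrete distributions given as lists of probabilities, natural logarithms, and that Q is non-zero wherever P is non-zero; the function names are illustrative.

```python
import math


def entropy(p):
    """H(P) = -sum P(x) log P(x), skipping zero-probability outcomes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum P(x) log Q(x)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(P || Q) = H(P, Q) - H(P); zero only when P and Q match."""
    return cross_entropy(p, q) - entropy(p)


p = [0.5, 0.5]  # true distribution (assumed)
q = [0.9, 0.1]  # model's predicted distribution (assumed)
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))
```

KL divergence is not symmetric: kl_divergence(p, q) and kl_divergence(q, p) generally differ.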

60. What is Jensen's Inequality?


For a convex function f, f(E[X]) ≤ E[f(X)]; the inequality reverses when f is concave.

Bayesian Statistics & Probability


61. What is a prior probability in Bayesian statistics?
The initial belief about a probability before observing data.

62. What is a posterior probability?


The updated probability after incorporating evidence.

63. What is a likelihood function?


The probability of observed data given model parameters.

64. What is Markov Chain Monte Carlo (MCMC)?


A method for estimating probability distributions using Markov Chains.

65. What is the difference between frequentist and Bayesian statistics?


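Frequentists treat parameters as fixed unknowns and probability as long-run frequency;
Bayesians treat parameters as random variables and update a prior belief with observed data.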

66. What is Maximum A Posteriori (MAP) estimation?


The mode of the posterior distribution, incorporating prior belief.

67. What is the difference between MAP and MLE?


MLE maximizes likelihood, while MAP includes prior knowledge.

68. What is conjugate prior in Bayesian inference?


A prior distribution that results in a posterior of the same form.

69. What is the Beta distribution used for?


Modeling probabilities for binary outcomes.

70. What is the Dirichlet distribution used for?


Modeling categorical probabilities in Bayesian analysis.
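
The Beta distribution is the conjugate prior for binary (Bernoulli/Binomial) data, so the posterior can be written down without any integration. The sketch below assumes a uniform Beta(1, 1) prior and 7 successes in 10 trials; the function names and the numbers are illustrative.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binary outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution (valid for alpha, beta > 1)."""
    return (alpha - 1) / (alpha + beta - 2)


# Uniform prior Beta(1, 1), then 7 successes and 3 failures are observed.
post_alpha, post_beta = beta_binomial_update(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(post_alpha, post_beta)  # 8 / 12
map_estimate = beta_mode(post_alpha, post_beta)    # 7 / 10
```

With a uniform prior the MAP estimate (0.7) coincides with the MLE, 7/10; an informative prior would pull it away from the MLE.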

Regression Analysis
71. What is the difference between correlation and regression?
Correlation measures association, while regression predicts outcomes.

72. What is heteroscedasticity in regression?


When the variance of errors is not constant across observations.

73. What is multicollinearity?


When predictor variables are highly correlated, affecting model stability.

74. What is the coefficient of determination (R²)?


A measure of how well the model explains variance in the data.

75. What is adjusted R²?


A modified R² that accounts for the number of predictors.

76. What is ordinary least squares (OLS)?


A method that minimizes the sum of squared residuals in regression.
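
For a single predictor, the OLS solution has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope · mean(x). The sketch below applies it to a small made-up dataset and also computes R²; the function names and data are illustrative assumptions.

```python
def fit_ols(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]                 # illustrative data (assumed)
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_ols(x, y)    # approximately 1.99 and 0.05
r2 = r_squared(x, y, slope, intercept)
```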

77. What is ridge regression?


A regression technique that adds L2 regularization to reduce overfitting.

78. What is lasso regression?


A regression technique that adds L1 regularization for feature selection.

79. What is the difference between L1 and L2 regularization?


L1 (lasso) can shrink some coefficients exactly to zero, which removes features; L2 (ridge)
shrinks coefficients toward zero without eliminating them.

80. What is an interaction effect in regression?


When the effect of one independent variable on the dependent variable depends on the
value of another independent variable.

Statistical Testing & Confidence Intervals


81. What is a t-test?
A test comparing the means of two groups.

82. What is the difference between paired and independent t-tests?


Paired t-tests compare related samples, while independent t-tests compare different
groups.

83. What is the Mann-Whitney U test?


A non-parametric test comparing two independent distributions.

84. What is the Wilcoxon signed-rank test?


A non-parametric alternative to the paired t-test.

85. What is a Z-test?


A test comparing means when population variance is known.
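
A one-sample, two-sided Z-test can be sketched with the standard library. The sample statistics below (sample mean 103, hypothesized mean 100, known population standard deviation 15, n = 100) are assumed values chosen for easy arithmetic, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_std, n):
    """Return the z statistic and two-sided p-value for a one-sample Z-test."""
    standard_error = population_std / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-sided p-value: probability of a result at least as extreme as |z| under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_stat, p_value = one_sample_z_test(
    sample_mean=103, population_mean=100, population_std=15, n=100
)
print(z_stat, round(p_value, 4))  # 2.0 0.0455
```

At a 5% significance level this p-value would lead to rejecting H0; at 1% it would not.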

86. What is a one-sample t-test?


A test checking if a sample mean differs from a known value.

87. What is a two-sample t-test?


A test comparing means of two independent samples.

88. What is the Shapiro-Wilk test?


A test for normality in a dataset.

89. What is the Kolmogorov-Smirnov test?


A test comparing a sample distribution to a reference distribution.

90. What is Levene's test?


A test for homogeneity of variances in different groups.

Miscellaneous Topics in Statistics for Data Science


91. What is Simpson's Paradox?
A trend appearing in groups but reversing when groups are combined.

92. What is the Birthday Paradox?


The probability that two people in a group share the same birthday is higher than
intuition suggests.
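
The probability can be computed with the complement rule: 1 minus the probability that all birthdays are distinct. The sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def shared_birthday_probability(group_size, days=365):
    """Probability that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


print(round(shared_birthday_probability(23), 4))  # 0.5073
```

With only 23 people the probability already exceeds 50%.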

93. What is A/B testing?


A statistical method to compare two versions of a product or strategy.

94. What is Bootstrapping in statistics?


A resampling technique to estimate statistics from small samples.

95. What is the Bonferroni correction?


A method for adjusting p-values when performing multiple comparisons.

96. What is Expectation-Maximization (EM)?


A statistical algorithm for finding maximum likelihood estimates when data is
incomplete.

97. What is Gibbs Sampling?


A Markov Chain Monte Carlo method for sampling from complex distributions.

98. What is Principal Component Analysis (PCA)?


A dimensionality reduction technique using eigenvectors and eigenvalues.

99. What is Singular Value Decomposition (SVD)?


A matrix factorization technique used in data compression and noise reduction.

100. What is an outlier and how can it be detected?


An observation significantly different from others, detected using Z-scores, IQR, or
visualization techniques.

Exploratory Data Analysis (EDA)


1. What is Exploratory Data Analysis (EDA)?
- EDA is the process of analyzing datasets to summarize their main characteristics using
statistical and visualization techniques.

2. Why is EDA important?


- EDA helps in understanding the data, identifying patterns, detecting outliers, and finding
correlations, which are crucial for building effective models.

3. What are the main steps in EDA?


- Data collection, data cleaning, data visualization, statistical analysis, and feature
engineering.

4. What tools are commonly used for EDA?


- Python (Pandas, NumPy, Matplotlib, Seaborn), R, Tableau, Excel, and SQL.

5. What is the role of visualizations in EDA?


- Visualizations help in understanding data distributions, relationships between variables,
and identifying anomalies.

6. What are some common data visualization techniques?


- Histograms, scatter plots, box plots, heatmaps, and pair plots.

7. What is the difference between univariate and multivariate analysis?


- Univariate analysis examines one variable at a time, while multivariate analysis examines
relationships between multiple variables.

8. What is a missing value, and how do you handle it?


- Missing values are data points that are not recorded. They can be handled by removing,
imputing, or flagging.

9. What are outliers? How do you detect them?


- Outliers are data points significantly different from others. Detection methods include
box plots, z-scores, and the IQR method.

10. What is the significance of data cleaning in EDA?


- Cleaning ensures that the data is accurate, consistent, and ready for analysis, which leads
to better model performance.

Data Cleaning
11. How do you identify duplicates in a dataset?
- Use methods like `.duplicated()` in Pandas, or GROUP BY with HAVING COUNT(*) > 1 in SQL
(SELECT DISTINCT returns the de-duplicated rows).

12. What is data normalization?


- It’s the process of scaling data to a range, often [0,1] or [-1,1], to ensure uniformity.

13. What is data standardization?


- It’s the process of rescaling data to have a mean of 0 and a standard deviation of 1.
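
The two rescaling methods differ only in the reference points they use. A minimal sketch with the standard library, assuming a small made-up list with at least two distinct values (otherwise both formulas divide by zero); the function names are illustrative.

```python
import statistics


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


values = [10, 20, 30, 40, 50]  # illustrative data (assumed)
normalized = min_max_normalize(values)   # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = standardize(values)
```

Normalization is sensitive to extreme values because a single outlier sets the min or max; standardization is less affected but does not bound the result.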

14. How do you deal with categorical data?


- Convert using techniques like one-hot encoding, label encoding, or embeddings.

15. What is imputation, and why is it important?


- Imputation is replacing missing values with estimated ones. It’s crucial for maintaining
data integrity.

16. What is the difference between mean and median imputation?


- Mean imputation replaces missing values with the average, while median uses the
middle value.

17. What is feature scaling, and why is it important?


- Feature scaling ensures all features contribute equally to the model by bringing them to
a comparable scale.

18. How do you identify inconsistent data?


- By looking for anomalies like typos, mixed formats, or invalid values.

19. What is the IQR method for outlier detection?


- The interquartile range (IQR) is the range between the 25th percentile (Q1) and the 75th
percentile (Q3). Outliers are data points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
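
The IQR rule can be sketched with the standard library. The data below is made up, the function name is illustrative, and the "inclusive" quantile method is an assumption (other quantile conventions give slightly different fences).

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # illustrative data (assumed)
print(iqr_outliers(data))  # [102]
```

Here Q1 = 12 and Q3 = 13.75, so the fences are 9.375 and 16.375 and only 102 is flagged.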

20. What is data deduplication?


- It’s the process of removing duplicate rows or entries in a dataset.

Data Visualization
21. What is a histogram?
- A histogram visualizes the frequency distribution of a variable.

22. When would you use a scatter plot?


- To analyze relationships between two continuous variables.

23. What is a box plot?


- A box plot displays the distribution of data, highlighting medians and outliers.

24. What is a heatmap?


- A heatmap visualizes data using colors, often to show correlations or intensity.

25. What is a pair plot?


- A pair plot visualizes pairwise relationships in a dataset, often using scatter plots.

26. What is a bar chart?


- A bar chart represents categorical data with rectangular bars.

27. What is the purpose of a pie chart?


- To display proportions of categories as slices of a pie.

28. How do you visualize missing data?


- Using heatmaps or bar plots to display the count or location of missing values.

29. What is a violin plot?


- A violin plot combines a box plot with a kernel density estimate to show data
distributions.

30. What are the best practices for data visualization?


- Keep it simple, label axes, use appropriate chart types, and avoid clutter.

Statistical Questions
31. What is a correlation matrix?
- A table showing correlation coefficients between variables.

32. What is the difference between correlation and causation?


- Correlation shows a relationship between variables; causation implies one variable
causes the other.

33. What are p-values?


- A p-value is the probability of observing a result at least as extreme as the one obtained,
assuming the null hypothesis is true.

34. What is hypothesis testing in EDA?


- It’s a statistical method to test assumptions about data.

35. What is skewness?


- Skewness measures the asymmetry of a distribution.

36. What is kurtosis?


- Kurtosis measures the “tailedness” of a distribution.

37. What is a z-score?


- A z-score measures how many standard deviations a value is from the mean.

38. What is ANOVA?


- Analysis of Variance tests differences between group means.

39. How do you check for data normality?


- Using visualizations like histograms or statistical tests like the Shapiro-Wilk test.

40. What is a chi-square test?


- A statistical test to assess relationships between categorical variables.

Domain-Specific EDA Questions


41. How do you handle time-series data in EDA?
- Analyze trends, seasonality, and autocorrelations.

42. How do you deal with geospatial data in EDA?


- Use maps and clustering techniques.

44. What is feature engineering in EDA?


- Creating new features or modifying existing ones to improve model performance.

45. What are lag features in time-series data?


- Lag features are previous time points used to predict future ones.

46. How do you detect seasonality in data?


- Using visualizations or statistical methods like decomposition.

47. How do you analyze customer data?


- Segment customers, analyze churn rates, and study purchase behaviors.

48. What is cohort analysis?


- Grouping data based on shared characteristics to analyze behaviors over time.

49. How do you analyze sales data?


- Examine trends, outliers, and seasonality using visualizations and aggregations.

50. What is a lift chart?


- A lift chart evaluates model performance by comparing the results obtained with the model
against those expected from random selection (no model).

Advanced EDA Techniques


51. What is dimensionality reduction in EDA?
- Reducing the number of variables using techniques like PCA or t-SNE while preserving
significant information.

52. What is PCA (Principal Component Analysis)?


- PCA transforms data into a set of orthogonal components to capture the most variance.

53. What is t-SNE (t-Distributed Stochastic Neighbor Embedding)?


- A technique for visualizing high-dimensional data in two or three dimensions.

54. What is clustering, and how is it used in EDA?


- Clustering groups data points based on similarities, helping identify patterns.

55. What is a dendrogram?


- A tree-like diagram used to visualize hierarchical clustering results.

56. What is the purpose of sampling in EDA?


- Sampling reduces data size for faster analysis while retaining data characteristics.

57. What is feature interaction analysis?


- Examining how features interact and influence each other in the dataset.

58. What is the Gini index?


- A measure of inequality or diversity in data, often used in decision trees.

59. How do you analyze imbalanced datasets during EDA?


- By examining class distributions and using techniques like SMOTE or resampling.

60. What is stratified sampling?


- Dividing the data into subgroups and sampling proportionally from each group.

61. What is the role of correlation heatmaps in feature selection?


- Heatmaps help identify redundant features by visualizing correlations.

62. How do you handle multicollinearity in EDA?


- By removing highly correlated features or using dimensionality reduction.

63. What is VIF (Variance Inflation Factor)?


- A measure to detect multicollinearity among features.

64. What is cross-tabulation in EDA?


- A table to summarize relationships between two categorical variables.

65. What is the difference between supervised and unsupervised EDA?


- Supervised EDA involves labeled data, while unsupervised EDA explores patterns without
labels.

66. How do you analyze seasonal data trends?


- By decomposing time series into trend, seasonality, and residuals.

67. What are hierarchical clustering methods?


- Techniques to build clusters based on nested data relationships.

68. What is a silhouette score?


- A metric to evaluate the quality of clusters.

69. What is LOF (Local Outlier Factor)?


- A method to detect density-based outliers.

70. How do you visualize high-dimensional data?


- Using techniques like PCA, t-SNE, or parallel coordinates.

71. What is feature importance analysis?


- Identifying the most significant features impacting the target variable.

72. What is an elbow method in clustering?


- A technique to determine the optimal number of clusters in k-means.

73. What is auto-correlation in time-series analysis?


- The correlation of a time series with its own lagged values.

74. What is detrending in time-series analysis?


- Removing the trend component to analyze residual patterns.

75. What is a lag plot?


- A plot to identify autocorrelation by comparing a variable with its lagged values.

Practical EDA Applications


76. How do you analyze marketing data?
- By studying campaign performance, customer segmentation, and ROI.

77. What is churn analysis?


- Analyzing reasons why customers stop using a service or product.

78. How do you analyze web traffic data?


- By examining page views, bounce rates, and user behavior.

79. How do you conduct sentiment analysis in EDA?


- By analyzing text data to determine positive, neutral, or negative sentiments.

80. What is the role of A/B testing in EDA?


- Comparing two versions of a variable to analyze performance differences.

81. How do you analyze health data?


- By identifying trends, risk factors, and patient outcomes.

82. How do you analyze fraud detection data?


- By identifying anomalies and unusual patterns in transactional data.

83. What is anomaly detection in EDA?


- Identifying unusual data points that deviate from the norm.

84. What is network analysis in EDA?


- Analyzing relationships and structures in networked data.

85. What is survival analysis in EDA?


- Analyzing time-to-event data to estimate survival rates.

86. How do you perform cohort retention analysis?


- Tracking how user groups behave over time.

87. What is lift analysis in marketing?


- Measuring the effectiveness of marketing campaigns by analyzing incremental impacts.

88. How do you analyze financial data?


- By examining trends, volatility, and risk factors.

89. What is the purpose of feature encoding?


- Converting categorical data into numerical formats for analysis.

90. How do you deal with large datasets in EDA?


- Using sampling, distributed computing, or optimized libraries like Dask.

91. What is dimensional stacking?


- A visualization technique for high-dimensional categorical data.

92. What is Simpson’s Paradox?


- A trend that appears in groups but reverses when combined.

93. How do you analyze social media data?


- By examining likes, shares, comments, and engagement metrics.

94. How do you deal with biased data?


- By identifying bias sources and rebalancing the dataset.

95. What is the purpose of a residual plot?


- To check the goodness-of-fit for regression models.

96. How do you analyze operational data?


- By identifying bottlenecks, inefficiencies, and KPIs.

97. What is cross-validation in EDA?


- A technique to validate model performance on unseen data.

98. How do you explore sparse datasets?


- By focusing on non-zero values and using sparse data structures.

99. What is a word cloud in EDA?


- A visualization that shows the frequency of words in text data.

100. How do you document EDA findings?


- By creating reports, dashboards, and notebooks with visualizations and insights.

Feature Engineering

1. What is feature engineering?


- Feature engineering is the process of creating, modifying, or selecting features to
improve the performance of machine learning models.

2. Why is feature engineering important?


- It helps models learn better by providing relevant, meaningful, and high-quality input
data.

3. What are the main steps in feature engineering?


- Feature creation, feature transformation, feature selection, and feature scaling.

4. What is feature selection?


- Selecting the most relevant features to reduce dimensionality and improve model
performance.

5. What are categorical features?


- Features that represent categories or labels, such as gender or product type.

6. What is one-hot encoding?


- A method of converting categorical variables into binary vectors.

7. What is label encoding?


- A method of converting categorical labels into numeric values.

8. When should you use one-hot encoding vs. label encoding?


- Use one-hot encoding for nominal categories and label encoding for ordinal categories.

9. What is feature scaling?


- Transforming features to a comparable scale, such as normalization or standardization.

10. What is normalization?


- Scaling data to a range of [0,1].

11. What is standardization?


- Scaling data to have a mean of 0 and a standard deviation of 1.

12. What is feature transformation?


- Modifying features to make them more suitable for a model, such as applying log or
square root transformations.

13. What is binning in feature engineering?


- Dividing continuous variables into discrete bins or intervals.

14. What is polynomial feature generation?


- Creating new features by raising existing features to powers or combining them.

15. What is interaction feature generation?


- Creating new features by combining two or more existing features.

16. What is feature extraction?


- Deriving new features from existing data, often using dimensionality reduction
techniques.

17. What is PCA (Principal Component Analysis)?


- A technique for reducing dimensionality by transforming data into principal components.

18. What is t-SNE?


- A technique for visualizing high-dimensional data in two or three dimensions.

19. What is feature importance?


- A measure of how much each feature contributes to the model’s predictions.

20. What is feature hashing?


- A method of encoding categorical variables using a hash function.

Handling Categorical Features


21. How do you handle high-cardinality categorical features?
- Use techniques like target encoding, hashing, or dimensionality reduction.

22. What is target encoding?


- Replacing categorical values with the mean of the target variable for each category.

23. What is frequency encoding?


- Encoding categories based on their frequency in the dataset.

24. What is ordinal encoding?


- Assigning numerical values to categories based on their order.

25. What is mean encoding?


- Replacing categories with the mean of the target variable grouped by the category.

26. What is dummy variable trap?


- A situation where one-hot encoding causes multicollinearity by including redundant
features.

27. How do you avoid the dummy variable trap?


- Drop one column after one-hot encoding.
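
The effect of dropping one column can be seen in a small pure-Python sketch. The function name and the color data are illustrative assumptions; in practice a library encoder would normally be used.

```python
def one_hot_encode(values, drop_first=False):
    """Return {category: 0/1 column}; optionally drop the first category."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category is implied when all remaining columns are 0.
        categories = categories[1:]
    return {c: [1 if v == c else 0 for v in values] for c in categories}


colors = ["red", "green", "blue", "green"]  # illustrative data (assumed)
full = one_hot_encode(colors)
reduced = one_hot_encode(colors, drop_first=True)
print(full)     # three columns: blue, green, red
print(reduced)  # two columns: green, red
```

In the full encoding the columns always sum to 1 in every row, which is the redundancy behind the dummy variable trap; the reduced encoding removes it.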

28. What is embedding in feature engineering?


- Representing categorical variables as dense numerical vectors, often used in deep
learning.

29. What is count encoding?


- Encoding categories based on their count in the dataset.

30. What is binary encoding?


- An encoding that maps each category to an integer and then writes that integer in binary,
with one column per bit; it needs fewer columns than one-hot encoding.

Handling Missing Data


31. How do you handle missing data in feature engineering?
- Use techniques like imputation, deletion, or flagging.

32. What is mean imputation?


- Replacing missing values with the mean of the column.

33. What is median imputation?


- Replacing missing values with the median of the column.

34. What is mode imputation?


- Replacing missing values with the mode of the column.

35. What is forward fill?


- Filling missing values with the last observed value.

36. What is backward fill?


- Filling missing values with the next observed value.
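
Forward and backward fill can be sketched in plain Python. The sketch assumes missing values are represented by None and that the data is ordered (for example by time); the function names are illustrative.

```python
def forward_fill(values):
    """Replace each None with the last observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Replace each None with the next observed value after it."""
    return forward_fill(values[::-1])[::-1]


series = [None, 1, None, None, 4, None]  # illustrative data (assumed)
print(forward_fill(series))   # [None, 1, 1, 1, 4, 4]
print(backward_fill(series))  # [1, 1, 4, 4, 4, None]
```

Leading gaps survive a forward fill and trailing gaps survive a backward fill, so the two are sometimes applied one after the other.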

37. What is KNN imputation?


- Filling missing values using the mean or median of the k-nearest neighbors.

38. What is iterative imputation?


- Predicting missing values using other features iteratively.

39. What is missing indicator?


- Adding a binary feature to indicate whether a value was missing.

40. When should you drop rows with missing values?


- When the missing data is random and constitutes a small portion of the dataset.

Questions About Feature Selection


41. What are filter methods in feature selection?
- Methods that use statistical measures, such as correlation or chi-square tests, to select
features.

42. What are wrapper methods in feature selection?


- Methods that evaluate subsets of features using model performance, such as forward
selection or backward elimination.

43. What are embedded methods in feature selection?


- Methods that perform feature selection during model training, such as Lasso or tree-
based models.

44. What is Recursive Feature Elimination (RFE)?


- A wrapper method that recursively removes the least important features based on model
performance.

45. What is the role of feature importance in feature selection?


- To identify and prioritize the most relevant features for the model.

46. What is mutual information in feature selection?


- A measure of the dependency between features and the target variable.

47. What is the role of variance in feature selection?


- Features with low variance are often removed as they contribute little to the model.

48. What is feature elimination?


- Removing irrelevant or redundant features from the dataset.

49. What is the role of correlation in feature selection?


- Highly correlated features can be removed to avoid multicollinearity.

50. What is dimensionality reduction in feature engineering?


- Reducing the number of features while retaining the most important information, often
using techniques like PCA or autoencoders.

1. AI vs ML vs DL vs DS (15 Questions)
1. What is Artificial Intelligence (AI)?
AI is the simulation of human intelligence in machines to perform cognitive tasks.
2. What is Machine Learning (ML)?
ML is a subset of AI that enables machines to learn patterns from data without
explicit programming.

3. What is Deep Learning (DL)?


DL is a subset of ML that uses multi-layered neural networks to learn complex data
representations.

4. What is Data Science (DS)?


DS is an interdisciplinary field that combines AI, ML, statistics, and domain
knowledge to extract insights from data.

5. How does ML differ from AI?


ML is a subset of AI that focuses on pattern learning, while AI includes rule-based
systems and other intelligent automation techniques.

6. How does DL differ from ML?


DL uses neural networks with multiple layers (deep architectures), whereas ML
typically involves feature engineering and shallow models.

7. How does DS differ from AI and ML?


DS encompasses AI, ML, statistics, and data processing techniques for decision-
making.

8. Which real-world applications use AI?


AI is used in chatbots, recommendation systems, autonomous vehicles, and fraud
detection.

9. Which real-world applications use ML?


ML is used in spam filtering, credit scoring, and predictive analytics.

10. Which real-world applications use DL?


DL is used in image recognition, speech-to-text, and natural language processing
(NLP).

11. Which real-world applications use DS?


DS is used in customer segmentation, risk assessment, and data-driven decision-
making.

12. What is the role of a Data Scientist?


A Data Scientist analyzes, processes, and models data to generate insights and
predictive analytics.

13. What is the role of a Machine Learning Engineer?


An ML Engineer builds, trains, and optimizes machine learning models for production
systems.
14. What is the role of a Deep Learning Engineer?
A DL Engineer specializes in designing, training, and deploying neural network
models.

15. What are the major skills required for AI, ML, and DS?
AI requires problem-solving; ML requires statistics & algorithms; DS requires Python,
SQL, and data visualization.

2. Supervised, Unsupervised & Reinforcement Learning


16. What is Supervised Learning?
A type of ML where the model learns from labeled data to make predictions.

17. What is Unsupervised Learning?


A type of ML where the model learns patterns from unlabeled data without explicit
supervision.

18. What is Reinforcement Learning (RL)?


A learning paradigm where an agent learns by interacting with an environment and
receiving rewards.

19. What are examples of Supervised Learning?


Classification (spam detection) and Regression (house price prediction).

20. What are examples of Unsupervised Learning?


Clustering (customer segmentation) and Dimensionality Reduction (PCA).

21. What are examples of Reinforcement Learning?


Robotics, game-playing AI, and autonomous vehicles.

22. What is the main difference between Supervised and Unsupervised Learning?
Supervised learning uses labeled data, while unsupervised learning does not.

23. How does Reinforcement Learning differ from Supervised Learning?


RL learns from rewards and penalties, while SL learns from labeled data.

24. What are the main challenges in Reinforcement Learning?


Reward sparsity, exploration-exploitation trade-off, and computational complexity.

25. Which ML technique is best for finding hidden patterns in data?


Unsupervised learning, specifically clustering techniques like K-Means or DBSCAN.

3. Train, Test & Validation


26. What is Training Data?
Data used to train the ML model and adjust parameters.
27. What is Validation Data?
Data used to tune hyperparameters and prevent overfitting.

28. What is Test Data?


Data used to evaluate the final model's performance on unseen data.

29. What is the typical split ratio for Train, Validation, and Test sets?
70-80% training, 10-15% validation, 10-15% testing.

30. Why is a Validation Set important?


It helps in hyperparameter tuning and prevents overfitting.
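
One possible way to produce a 70/15/15 split is sketched below. The function name, the fractions, and the seed are assumptions for illustration; real projects often rely on a library helper instead.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle the rows, then cut them into train, validation and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

data = list(range(100))  # stand-in for 100 samples
train, val, test = train_val_test_split(data)
```

The test part is set aside first and should only be used once, for the final evaluation.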

4. Variance, Bias, Overfitting & Underfitting


31. What is Variance in ML?
Variance measures how much model predictions change with different datasets.

32. What is Bias in ML?


Bias represents the error due to overly simplistic assumptions in the learning
algorithm.

33. What is Overfitting?


When a model learns noise and performs well on training data but poorly on unseen
data.

34. What is Underfitting?


When a model is too simple to capture patterns in the data, leading to high bias.

35. What is the Bias-Variance Tradeoff?


The balance between high bias (underfitting) and high variance (overfitting).

36. How can Overfitting be prevented?


Using techniques like regularization, dropout, and cross-validation.

37. How can Underfitting be fixed?


Increasing model complexity, adding features, or reducing regularization.

38. Which ML models are prone to Overfitting?


Decision Trees and Deep Neural Networks.

39. Which ML models are prone to Underfitting?


Linear Regression and Naïve Bayes.

40. What is Regularization?


A technique to reduce overfitting by adding a penalty term to the loss function.

Feature Engineering
1. Handling Missing Values
41. How can missing values be handled?
Using deletion, imputation (mean, median, mode), or predictive modeling.

42. What is Mean Imputation?


Replacing missing values with the mean of the column.

43. What is the best method for handling categorical missing values?
Mode imputation or creating a new category like "Unknown".

44. What is KNN Imputation?


Filling missing values using K-Nearest Neighbors.

45. How does Multiple Imputation work?


It creates multiple datasets with different imputed values and aggregates predictions.

2. Handling Imbalanced Datasets


46. What is an imbalanced dataset?
A dataset where one class has significantly more samples than another.

47. How to handle an imbalanced dataset?


Using techniques like oversampling, undersampling, or SMOTE.

48. What is Oversampling?


Increasing the number of samples in the minority class.

49. What is Undersampling?


Removing samples from the majority class to balance the dataset.

50. What is the impact of imbalanced data on model performance?


It can bias the model towards the majority class, leading to poor generalization.

3. SMOTE (Synthetic Minority Over-sampling Technique)


51. What is SMOTE?
SMOTE is a technique that generates synthetic samples for the minority class to
balance the dataset.

52. How does SMOTE work?


It creates new data points by interpolating between existing minority class samples.

53. When should SMOTE be used?


When dealing with highly imbalanced datasets in classification problems.

54. What is the downside of SMOTE?


It can generate unrealistic synthetic samples, leading to overfitting.

55. How does SMOTE differ from random oversampling?


SMOTE creates synthetic samples instead of duplicating existing ones.
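
The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration rather than the reference algorithm: the function `smote_sample`, the neighbor count `k`, and the toy points are all assumptions.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its
    k nearest minority-class neighbors."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 9.0]]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
```

Every synthetic point lies on a line segment between two existing minority samples, which is what separates this from duplicating rows.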
4. Data Interpolation
56. What is Data Interpolation?
A technique to estimate missing or unknown values based on known data points.

57. Which interpolation methods are commonly used?


Linear, Polynomial, and Spline interpolation.

58. How does Linear Interpolation work?


It estimates missing values by assuming a straight-line relationship between known
data points.

59. What is Spline Interpolation?


A method that fits piecewise polynomials to the data for smooth approximation.

60. When should Data Interpolation be avoided?


When the gaps of missing values are long or the underlying data is highly non-linear.
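
Linear interpolation, as described above, can be sketched in a few lines. Missing entries are represented as `None`, the function name is hypothetical, and only gaps with a known value on both sides are filled.

```python
def linear_interpolate(values):
    """Fill interior None gaps along a straight line between the nearest
    known values on either side."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

series = [2.0, None, None, 8.0, None, 12.0]
result = linear_interpolate(series)
```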

5. Handling Outliers
61. What are Outliers?
Data points that significantly deviate from the normal pattern of the dataset.

62. How can Outliers be detected?


Using methods like Z-score, IQR (Interquartile Range), or visualizations (boxplots).

63. What is the IQR method?


A technique that defines outliers as values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1.

64. What are common ways to handle Outliers?


Removal, transformation (log, square root), or capping using percentiles.

65. Why should Outliers not always be removed?


Because they might contain valuable information, especially in fraud detection or
anomaly detection.
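
A minimal sketch of the IQR rule using the standard library is shown below. The readings are made up, and the quartile method (`"inclusive"`) is one of several conventions, so other tools may place the bounds slightly differently.

```python
from statistics import quantiles

def iqr_bounds(values, factor=1.5):
    """Return the lower and upper fences of the IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

readings = [11, 12, 12, 13, 12, 11, 14, 13, 100]
lower, upper = iqr_bounds(readings)
outliers = [v for v in readings if v < lower or v > upper]
```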

6. Feature Selection Update


66. What is Feature Selection?
The process of selecting the most relevant features to improve model performance.

67. What are the types of Feature Selection techniques?


Filter, Wrapper, and Embedded methods.

68. What is a Filter Method?


A technique that ranks features based on statistical measures like correlation and
mutual information.
69. What is a Wrapper Method?
A technique that selects features based on model performance, like Recursive
Feature Elimination (RFE).

70. What is an Embedded Method?


A technique where feature selection occurs as part of the model training process,
such as LASSO regression.

7. Feature Extraction
71. What is Feature Extraction?
Transforming raw data into new features that better represent patterns in data.

72. How does PCA help in Feature Extraction?


PCA reduces dimensionality by extracting principal components that explain
variance.

73. What is TF-IDF in Feature Extraction?


A technique used in NLP to weigh words based on their importance in a document.

74. What are Wavelet Transforms used for?


Feature extraction in signal processing and time-series analysis.

75. How does Autoencoder help in Feature Extraction?


Autoencoders compress input data into a latent representation for feature learning.

8. Feature Scaling Normalization


76. What is Feature Scaling?
A technique used to standardize feature values within a fixed range for better model
performance.

77. Why is Feature Scaling important?


It puts features on a comparable scale so that no feature dominates purely because of its units, which matters most for distance-based models.

78. What is Min-Max Scaling?


A normalization technique that scales features between 0 and 1.

79. What is Standardization (Z-score Scaling)?


A technique that transforms data to have zero mean and unit variance.

80. Which models require Feature Scaling?


Models like KNN, SVM, and Neural Networks are sensitive to feature scale.

9. Normalization Min-Max Scaling


81. What is Min-Max Normalization?
A scaling technique that transforms data within a range of [0,1] using (x - min) / (max
- min).
82. When is Min-Max Scaling preferred?
When data is uniformly distributed and outliers are not a major concern.

83. What is the drawback of Min-Max Scaling?


It is sensitive to outliers, which can distort the scale.

84. Which ML models work best with Min-Max Scaling?


Deep Learning models and K-Means clustering.

85. How does Min-Max Scaling affect gradient descent?


It speeds up convergence by keeping feature values in a controlled range.
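
The two scaling formulas discussed above, Min-Max scaling and standardization, can be compared on the same column. This is a sketch that assumes the column is not constant (otherwise the denominators are zero); the function names are illustrative.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """(x - min) / (max - min): maps the column onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """(x - mean) / std: gives the column zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = min_max_scale(heights)
z_scores = standardize(heights)
```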

10. Unit Vectors Feature Scaling


86. What is Unit Vector Scaling?
A technique that scales feature vectors to have a magnitude of 1.

87. What is the formula for Unit Vector Scaling?


x_scaled = x / ||x||, where ||x|| is the Euclidean norm.

88. When is Unit Vector Scaling used?


When working with text data in NLP (TF-IDF) and clustering algorithms.

89. What is the impact of Unit Vector Scaling?


It normalizes the direction of feature vectors while preserving relative differences.

90. How does Unit Vector Scaling differ from Min-Max Scaling?
Min-Max rescales each feature (column) into [0,1], while Unit Vector scaling rescales each sample (row) to length 1, keeping its direction.
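
The formula x_scaled = x / ||x|| can be checked with a short sketch. The helper name is an assumption, and the input is assumed to be a non-zero vector.

```python
import math

def to_unit_vector(x):
    """Divide a vector by its Euclidean norm so its length becomes 1."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

u = to_unit_vector([3.0, 4.0])  # the norm of [3, 4] is 5
length = math.sqrt(sum(v * v for v in u))
```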

11. PCA (Principal Component Analysis)


91. What is PCA?
A dimensionality reduction technique that transforms correlated variables into
uncorrelated components.

92. What is the main objective of PCA?


To reduce feature dimensionality while retaining maximum variance in data.

93. How are principal components computed?


Using eigenvectors and eigenvalues of the covariance matrix.

94. What is the explained variance ratio in PCA?


It represents the proportion of total variance retained by each principal component.

95. When should PCA be avoided?


When interpretability of features is important or when dealing with categorical data.
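
To make the eigenvector description concrete, the sketch below finds the first principal component of a small dataset with power iteration on the covariance matrix. Power iteration is used here only to keep the example dependency-free; libraries typically use a full eigendecomposition or SVD. The data and function name are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the top eigenvector of the covariance matrix, its eigenvalue,
    and the explained variance ratio."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue, eigenvalue / total_variance

# Points lying close to the line y = x.
points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]
component, eigenvalue, ratio = first_principal_component(points)
```

Because the points lie almost on a line, a single component carries nearly all of the variance.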

12. Data Encoding


96. What is Data Encoding in ML?
Transforming categorical variables into numerical form for model training.

97. What are types of Data Encoding?


Label Encoding, One-Hot Encoding, Ordinal Encoding, and Target Encoding.

98. What is Label Encoding?


Assigning unique numerical values to categorical labels.

99. What is One-Hot Encoding?


Creating binary variables for each category in a categorical feature.

100.When is One-Hot Encoding preferred over Label Encoding?


When categorical values have no ordinal relationship.

13. Nominal vs One-Hot Encoding


101.What is Nominal Encoding?
A technique for categorical features without order, often using One-Hot Encoding.

102.What is the disadvantage of One-Hot Encoding?


It increases feature dimensionality significantly with many categories.

103.What is Dummy Variable Trap?


A situation where One-Hot Encoding introduces multicollinearity in linear models.

104.How to avoid the Dummy Variable Trap?


By dropping one category from the One-Hot Encoded features.

105.Which ML models benefit from One-Hot Encoding?


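Linear models, SVMs, KNN, and neural networks, which would otherwise treat integer labels as ordered; tree-based models such as Decision Trees and Random Forest can often work with label-encoded features as well.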
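
The sketch below shows One-Hot Encoding with and without dropping the first category, which is the fix for the Dummy Variable Trap mentioned above. The function `one_hot` and the color data are illustrative assumptions.

```python
def one_hot(values, drop_first=False):
    """Return the column names and one binary row per input value."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category is implied when all remaining columns are 0.
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "green", "blue", "green"]
cols_full, full = one_hot(colors)
cols_drop, dropped = one_hot(colors, drop_first=True)
```

In the full encoding every row sums to 1, so any one column can be derived from the others; that redundancy is the source of the multicollinearity.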

14. Covariance and Correlation


106.What is Covariance?
A measure of how two variables change together.

107.What is Correlation?
A normalized measure of the relationship between two variables, ranging
from -1 to 1.

108.What is the difference between Covariance and Correlation?


Correlation standardizes covariance, making it easier to compare.

109.Which correlation method is commonly used?


Pearson correlation coefficient.

110.Why is Correlation preferred over Covariance?


Correlation is scale-independent, making it more interpretable.
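
A small experiment can illustrate the scale-independence claim: changing the unit of one variable changes the covariance but not the Pearson correlation. The measurements and function names below are made up for the example.

```python
import math

def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

metres = [1.0, 2.0, 3.0, 4.0]
weights = [2.0, 4.1, 5.9, 8.2]
centimetres = [m * 100 for m in metres]  # same data, different unit

cov_m = covariance(metres, weights)
cov_cm = covariance(centimetres, weights)
r_m = pearson(metres, weights)
r_cm = pearson(centimetres, weights)
```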
Machine Learning Algorithms
1. Simple Linear Regression
1. What is Simple Linear Regression?
It is a statistical method to model the relationship between a dependent variable and
a single independent variable.

2. What is the equation of Simple Linear Regression?


$y = mx + c$, where $m$ is the slope and $c$ is the intercept.

3. What assumptions does Simple Linear Regression make?


Linearity, independence, homoscedasticity, and normality of residuals.

4. How is the best-fit line determined in Linear Regression?


By minimizing the sum of squared residuals using the Least Squares method.

5. Which evaluation metrics are used for Simple Linear Regression?


R-squared, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
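
For a single predictor, the Least Squares solution has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal sketch with invented data:

```python
def fit_line(x, y):
    """Ordinary least squares for y = m*x + c with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    c = my - m * mx
    return m, c

hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]  # generated as 2 * hours + 50
m, c = fit_line(hours, scores)
```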

2. Multi Linear Regression


6. What is Multiple Linear Regression?
It models the relationship between a dependent variable and multiple independent
variables.

7. What is Multicollinearity?
A condition where independent variables are highly correlated, affecting model
stability.

8. How can Multicollinearity be detected?


Using the Variance Inflation Factor (VIF).

9. How does Multi Linear Regression handle categorical variables?


By using One-Hot Encoding or Label Encoding.

10. How can overfitting be reduced in Multi Linear Regression?


By using techniques like Ridge and Lasso Regression.

3. Polynomial Regression
11. What is Polynomial Regression?
A regression technique where the relationship between variables is modeled as an
nth-degree polynomial.

12. What is the equation of a second-degree Polynomial Regression?


$y = ax^2 + bx + c$.
13. Why is Polynomial Regression preferred over Linear Regression?
It captures non-linear relationships between variables.

14. What is the risk of using higher-degree Polynomial Regression?


Overfitting, as the model becomes too complex.

15. How to avoid overfitting in Polynomial Regression?


Use cross-validation and select an optimal polynomial degree.

4. R-squared & Adjusted R-squared


16. What is R-squared?
A measure of how well a regression model explains variance in the dependent
variable.

17. What is the range of R-squared?


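Usually between 0 and 1, where 1 indicates a perfect fit; it can be negative when the model fits worse than simply predicting the mean, for example on held-out data.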

18. Why is Adjusted R-squared preferred over R-squared in multiple regression?


It accounts for the number of predictors and adjusts for overfitting.

19. How is Adjusted R-squared calculated?


$1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$, where $n$ is the number of observations and $k$ is the number of predictors.

20. Can R-squared decrease when adding more variables?


No, but Adjusted R-squared can decrease if the added variables do not improve the
model.
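
Both quantities can be computed directly from their definitions. The sketch below uses invented values and assumes a model with one predictor (k = 1).

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.8, 5.3, 6.9, 9.4, 10.6]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(y_true), k=1)
```

The adjusted value is slightly lower because it charges the model for the predictor it uses.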

5. MSE, MAE & RMSE


21. What is Mean Squared Error (MSE)?
The average of the squared differences between predicted and actual values.

22. Why is Root Mean Squared Error (RMSE) preferred over MSE?
RMSE is in the same unit as the target variable, making it more interpretable.

23. How does Mean Absolute Error (MAE) differ from MSE?
MAE calculates the average absolute errors, while MSE squares the errors.

24. Which metric is more sensitive to outliers: MAE or MSE?


MSE, because it squares the differences.

25. When should MAE be preferred over RMSE?


When dealing with non-Gaussian error distributions or outliers.
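
The three error metrics can be compared on a toy prediction set in which one prediction is far off. The numbers are invented for illustration.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 30.0, 50.0]  # the last prediction misses by 10

mae_value = mae(y_true, y_pred)
mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

The single large error contributes 100 of the 108 squared-error units, which is why MSE and RMSE react to it much more strongly than MAE.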

6. Simple Linear Regression with Python


26. Which library is used for Linear Regression in Python?
scikit-learn.

27. How to import Linear Regression from scikit-learn?


from sklearn.linear_model import LinearRegression.

28. How to fit a Simple Linear Regression model in Python?


model = LinearRegression().fit(X_train, y_train).

29. How to get the coefficients of a trained regression model?


model.coef_ for slope and model.intercept_ for the intercept.

30. How to make predictions using a trained model?


y_pred = model.predict(X_test).

7. Multi Linear Regression with Python


31. How to prepare data for Multi Linear Regression in Python?
Convert categorical variables using pd.get_dummies() and standardize numerical
variables.

32. Which method is used to handle multicollinearity in Python?


Compute the Variance Inflation Factor (VIF) with variance_inflation_factor from statsmodels.stats.outliers_influence, then drop or combine features with a high VIF.

33. How to fit a Multi Linear Regression model using sklearn?


model = LinearRegression().fit(X_train, y_train).

34. How to interpret model coefficients in Multi Linear Regression?


Each coefficient represents the change in the dependent variable for a unit change in
that feature.

35. How to evaluate Multi Linear Regression in Python?


Using metrics like r2_score(y_test, y_pred) and mean_squared_error(y_test, y_pred).

8. Ridge Regression
36. What is Ridge Regression?
A regression technique that applies L2 regularization to reduce overfitting.

37. What is the penalty term in Ridge Regression?


$\lambda \sum w^2$, which penalizes large coefficients.

38. How to implement Ridge Regression in Python?


from sklearn.linear_model import Ridge; model = Ridge(alpha=1.0).fit(X_train,
y_train).

39. What is the effect of increasing alpha in Ridge Regression?


It shrinks coefficients toward zero, reducing model complexity.
40. Can Ridge Regression perform feature selection?
No, it only shrinks coefficients but does not eliminate them.

9. Lasso Regression
41. What is Lasso Regression?
A regression technique that applies L1 regularization to perform feature selection.

42. How does Lasso Regression help in feature selection?


It forces some coefficients to become zero, effectively removing them.

43. How to implement Lasso Regression in Python?


from sklearn.linear_model import Lasso; model = Lasso(alpha=1.0).fit(X_train,
y_train).

44. What happens if the alpha value is too high in Lasso Regression?
The model can underfit by eliminating too many features.

45. How does Lasso differ from Ridge Regression?


Lasso can remove features (L1 penalty), while Ridge only shrinks coefficients (L2
penalty).

10. Elastic Net Regression


46. What is Elastic Net Regression?
A combination of Ridge (L2) and Lasso (L1) regression to balance regularization and
feature selection.

47. What is the Elastic Net formula?


$\text{Loss} = \text{RSS} + \lambda_1 \sum |w| + \lambda_2 \sum w^2$.

48. How does Elastic Net differ from Ridge and Lasso?
It balances both techniques, preventing limitations like Lasso selecting too few
features.

49. When should Elastic Net be preferred over Ridge or Lasso?


When features are highly correlated, as it stabilizes feature selection.

50. How to implement Elastic Net Regression in Python?


from sklearn.linear_model import ElasticNet; model = ElasticNet(alpha=1.0,
l1_ratio=0.5).fit(X_train, y_train).
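
The three penalties (Ridge, Lasso, Elastic Net) can be compared by evaluating the loss formula directly. This sketch follows the textbook form RSS + λ1·Σ|w| + λ2·Σw²; note that scikit-learn expresses the same idea through `alpha` and `l1_ratio` and scales the terms differently. The weights and λ values are made up.

```python
def elastic_net_loss(y_true, y_pred, weights, lambda1, lambda2):
    """RSS plus an L1 penalty (Lasso part) and an L2 penalty (Ridge part)."""
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return rss + lambda1 * l1 + lambda2 * l2

weights = [3.0, -4.0]
y_true = [1.0, 2.0]
y_pred = [1.0, 2.0]  # perfect fit, so only the penalties remain

ridge_only = elastic_net_loss(y_true, y_pred, weights, lambda1=0.0, lambda2=0.1)
lasso_only = elastic_net_loss(y_true, y_pred, weights, lambda1=0.5, lambda2=0.0)
combined = elastic_net_loss(y_true, y_pred, weights, lambda1=0.5, lambda2=0.1)
```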

11. Decision Trees


51. What is a Decision Tree?
A tree-based algorithm that splits data into branches based on feature values to
classify or predict outcomes.
52. What is entropy in Decision Trees?
A measure of impurity or randomness in data, calculated as $-\sum p_i \log_2 p_i$.

53. What is Gini Impurity in Decision Trees?


A measure of how often a randomly chosen element would be incorrectly classified.

54. What is pruning in Decision Trees?


A technique to reduce model complexity by removing unnecessary branches.

55. How to implement a Decision Tree Classifier in Python?


from sklearn.tree import DecisionTreeClassifier; model =
DecisionTreeClassifier().fit(X_train, y_train).
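
Entropy and Gini Impurity can both be computed from class proportions. The sketch below uses the common definition Gini = 1 − Σ p_i², which the text above describes in words; the label lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """-sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

mixed = ["yes", "yes", "no", "no"]   # evenly split node
pure = ["yes", "yes", "yes", "yes"]  # node with a single class
```

An evenly split binary node is the most impure case, while a pure node scores zero on both measures.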

12. Support Vector Machines


56. What is Support Vector Machine (SVM)?
A supervised learning algorithm that finds the optimal hyperplane to classify data
points.

57. What is the kernel trick in SVM?


A method to transform non-linearly separable data into higher dimensions for
classification.

58. What are the common kernels used in SVM?


Linear, Polynomial, RBF (Radial Basis Function), and Sigmoid.

59. How does the C parameter affect SVM?


A higher C penalizes misclassified points more heavily, approaching a hard-margin classifier (low bias, high variance), while a lower C allows a softer, wider margin (higher bias, lower variance).

60. How to implement SVM in Python?


from sklearn.svm import SVC; model = SVC(kernel='rbf', C=1).fit(X_train, y_train).

13. Naïve Bayes


61. What is Naïve Bayes?
A probabilistic classifier based on Bayes' theorem with an assumption of feature
independence.

62. What are the types of Naïve Bayes classifiers?


Gaussian, Multinomial, and Bernoulli Naïve Bayes.

63. What is the formula for Bayes’ Theorem?


$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$.
64. Why is Naïve Bayes called ‘naïve’?
Because it assumes that features are conditionally independent, which is rarely true
in real-world data.

65. How to implement Naïve Bayes in Python?


from sklearn.naive_bayes import GaussianNB; model = GaussianNB().fit(X_train,
y_train).
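
A worked example of Bayes' theorem, with probabilities invented purely for illustration: suppose 20% of emails are spam, a certain word appears in 60% of spam emails and in 5% of non-spam emails.

```python
p_spam = 0.2             # P(A): prior probability of spam
p_word_given_spam = 0.6  # P(B|A)
p_word_given_ham = 0.05  # P(B|not A)

# P(B) by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_spam_given_word = p_word_given_spam * p_spam / p_word
```

Under these assumed numbers, seeing the word raises the spam probability from 0.2 to about 0.75.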

14. Ensemble Learning


66. What is Ensemble Learning?
A technique that combines multiple weak models to create a stronger predictive
model.

67. What are the types of Ensemble Learning?


Bagging, Boosting, and Stacking.

68. What is Bagging in Ensemble Learning?


A method where multiple models are trained on different subsets of the data and
their predictions are averaged.

69. What is the Out-of-Bag (OOB) score in Bagging?


A performance estimate of a Bagging model based on unused training samples.

70. How to implement a Random Forest Classifier in Python?


from sklearn.ensemble import RandomForestClassifier; model =
RandomForestClassifier(n_estimators=100).fit(X_train, y_train).

15. Boosting
71. What is Boosting in Machine Learning?
A technique that sequentially trains models, with each model correcting the errors of
the previous one.

72. What is the difference between Bagging and Boosting?


Bagging trains models independently, while Boosting trains them sequentially.

73. What is XGBoost?


An optimized Gradient Boosting algorithm that is highly efficient and scalable.

74. What is the learning rate in Boosting?


A hyperparameter that controls how much each weak learner contributes to the final
prediction.

75. How to implement XGBoost in Python?


from xgboost import XGBClassifier; model = XGBClassifier().fit(X_train, y_train).
16. K-Means Clustering
76. What is K-Means Clustering?
A clustering algorithm that partitions data into K clusters by minimizing intra-cluster
variance.

77. What is the main disadvantage of K-Means?


It requires selecting K beforehand and is sensitive to outliers.

78. How is the optimal number of clusters determined in K-Means?


Using the Elbow Method or Silhouette Score.

79. What is the difference between K-Means and Hierarchical Clustering?


K-Means requires K as input, while Hierarchical Clustering creates a tree of clusters.

80. How to implement K-Means in Python?


from sklearn.cluster import KMeans; model = KMeans(n_clusters=3).fit(X_train).

17. Time Series


81. What is a Time Series?
A sequence of data points indexed in time order.

82. What are the components of Time Series?


Trend, Seasonality, Cyclicity, and Irregularity.

83. What is the difference between Stationary and Non-Stationary Time Series?
A stationary series has constant mean and variance, while a non-stationary series
does not.

84. What are ACF and PACF used for in Time Series Analysis?
ACF (Autocorrelation Function) checks correlation at different lags, while PACF
(Partial ACF) isolates direct correlations.

85. How to implement ARIMA in Python?


from statsmodels.tsa.arima.model import ARIMA; model = ARIMA(y_train,
order=(1,1,1)).fit().

18. Anomaly Detection


86. What is Anomaly Detection?
It identifies rare or unusual patterns in data that deviate from expected behavior.

87. What is Isolation Forest in Anomaly Detection?


A tree-based algorithm that isolates anomalies by randomly selecting splits and
observing isolation depth.

88. How does DBSCAN detect anomalies?


It labels points in low-density regions as outliers based on the number of neighbors
within a certain distance.

89. What is the Local Outlier Factor (LOF)?


A density-based algorithm that detects anomalies by comparing local density around
a point to its neighbors.

90. How to implement Isolation Forest in Python?


from sklearn.ensemble import IsolationForest; model = IsolationForest().fit(X_train).

19. Logistic Regression


91. What is Logistic Regression?
A classification algorithm that predicts probabilities using a sigmoid function.

92. What is the sigmoid function formula?


$\sigma(z) = \frac{1}{1 + e^{-z}}$

93. What is the difference between Logistic and Linear Regression?


Logistic Regression is used for classification, while Linear Regression is used for
continuous predictions.

94. What is the decision boundary in Logistic Regression?


A threshold (typically 0.5) that separates positive and negative classes.

95. How to implement Logistic Regression in Python?


from sklearn.linear_model import LogisticRegression; model =
LogisticRegression().fit(X_train, y_train).
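
The sigmoid function and the 0.5 threshold can be sketched directly. Here `z` stands for the linear score (weights times features plus bias), and the function names are assumptions for the example.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def classify(z, threshold=0.5):
    """Predict the positive class when the probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0

p_zero = sigmoid(0.0)       # a score of 0 sits exactly on the boundary
label_pos = classify(2.0)
label_neg = classify(-2.0)
```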

20. Principal Component Analysis (PCA)


96. What is PCA?
A dimensionality reduction technique that transforms correlated features into
orthogonal components.

97. What is the Curse of Dimensionality?


The problem where high-dimensional data increases computational cost and reduces
model performance.

98. What is the mathematical intuition behind PCA?


It finds eigenvectors of the covariance matrix and projects data onto the top
eigenvectors.

99. What is the role of eigenvalues in PCA?


Eigenvalues indicate the variance explained by each principal component.

100.How to implement PCA in Python?


from sklearn.decomposition import PCA; X_pca =
PCA(n_components=2).fit_transform(X_train).
21. Cross-Validation & Hyperparameter Tuning
101.What is Cross-Validation?
A technique to evaluate model performance by splitting data into multiple training
and testing subsets.

102.What is k-Fold Cross-Validation?


Data is divided into k subsets, with k-1 used for training and 1 for testing, repeated k
times.

103.What is the difference between GridSearchCV and RandomizedSearchCV?


GridSearchCV searches all parameter combinations, while RandomizedSearchCV
samples a random subset.

104.What is Hyperparameter Tuning?


The process of choosing the best hyperparameters (settings fixed before training, such as learning rate or tree depth) to improve performance.

105.How to implement GridSearchCV in Python?


from sklearn.model_selection import GridSearchCV; grid = GridSearchCV(model,
param_grid).fit(X_train, y_train).
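
The k-fold procedure described above can be illustrated by generating the index splits by hand. This sketch does not shuffle, and the generator name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Each sample lands in the test fold exactly once and in the training folds k-1 times.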

22. Performance Metrics


106.What is Precision in classification?
The proportion of true positives among predicted positives:
$\text{Precision} = \frac{TP}{TP + FP}$.

107.What is Recall in classification?


The proportion of true positives among actual positives: $\text{Recall} = \frac{TP}{TP + FN}$.

108.What is F1-Score?
The harmonic mean of Precision and Recall: $F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$.

109.What is the ROC curve?


A plot of True Positive Rate vs. False Positive Rate at different thresholds.

110.How to implement the Confusion Matrix in Python?


from sklearn.metrics import confusion_matrix; cm = confusion_matrix(y_test,
y_pred).
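
Precision, Recall, and F1 follow directly from the confusion-matrix counts. The labels below are invented, with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```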

23. Clustering & DBSCAN


111.What is DBSCAN?
A density-based clustering algorithm that groups points based on density rather than
predefined clusters.
112.What are the main parameters of DBSCAN?
eps (the maximum distance between two points for them to count as neighbors) and min_samples (the minimum number of points needed to form a dense region).

113.What is the advantage of DBSCAN over K-Means?


It does not require the number of clusters as input and can detect noise and
outliers.

114.What is the Silhouette Score?


A metric to evaluate clustering quality based on intra-cluster and inter-cluster
distances.

115.How to implement DBSCAN in Python?


from sklearn.cluster import DBSCAN; model = DBSCAN(eps=0.5,
min_samples=5).fit(X_train).

24. End-to-End Machine Learning Project


116.What are the main steps in an ML project?
Problem Definition → Data Collection → Data Preprocessing → Model Training →
Evaluation → Deployment.

117.What is the difference between training, validation, and test sets?


Training is for model learning, validation is for hyperparameter tuning, and test is for
final evaluation.

118.What is the purpose of Model Deployment?


To integrate the trained model into a real-world application for predictions.

119.What is Model Drift?


The degradation of model performance over time due to changes in data distribution.

120.What tools are used for ML model deployment?


Flask, FastAPI, Streamlit, Docker, AWS Lambda, Google Cloud AI, etc.

25. Time Series Analysis


121.What are the common models for Time Series Forecasting?
ARIMA, SARIMA, Prophet, LSTM, and Exponential Smoothing.

122.What is the difference between ACF and PACF?


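ACF measures the total correlation between a series and its lag k, including indirect
effects carried through intermediate lags, while PACF measures the direct correlation
at lag k after removing the effect of the shorter lags.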

123.What is differencing in Time Series?


A technique to remove trends and make a series stationary by subtracting previous
values.
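
A minimal sketch of differencing, assuming a plain Python list as the series (the helper `difference` is hypothetical): one pass removes a linear trend, and a second pass removes a quadratic one.

```python
def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each value."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


linear_trend = [3, 5, 7, 9, 11]
first_diff = difference(linear_trend)                    # constant after one pass

quadratic_trend = [1, 4, 9, 16, 25]
second_diff = difference(difference(quadratic_trend))    # constant after two passes
print(first_diff, second_diff)
```

The number of passes needed corresponds to the d term in an ARIMA order such as (1, 1, 1).
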
124.What is the Box-Jenkins Methodology?
A systematic approach for ARIMA modeling, including identification, estimation, and
validation.

125.How to implement ARIMA in Python?


from statsmodels.tsa.arima.model import ARIMA; model = ARIMA(y_train,
order=(1,1,1)).fit().

1. Introduction to Deep Learning & Use Cases


1. What is Deep Learning?
A subset of ML that uses neural networks with multiple layers to learn complex
patterns.

2. What are common applications of Deep Learning?


Image recognition, NLP, autonomous driving, recommendation systems, healthcare
diagnostics, etc.

3. What is the key advantage of Deep Learning over traditional ML?


It automatically extracts features from raw data without manual feature engineering.

4. What is the difference between Deep Learning and Machine Learning?


Deep Learning uses deep neural networks, while ML relies on algorithms like SVM,
Decision Trees, etc.

5. Which hardware accelerates Deep Learning computations?


GPUs and TPUs enhance training speed by parallelizing matrix operations.

2. Neural Network, Perceptron & Mathematical Explanation


6. What is a Perceptron?
A single-layer neural network model that acts as a binary classifier.

7. What is the mathematical equation of a perceptron?


$y = f(WX + b)$, where $W$ is the weights, $X$ is the input, $b$ is the bias,
and $f$ is an activation function.
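
As a small worked instance of y = f(WX + b), the sketch below uses a step function for f and hand-picked weights (an assumption for illustration, not learned values) so that the perceptron behaves like a logical AND gate.

```python
def perceptron(x, w, b):
    """Compute f(w . x + b) with a step activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0


w, b = [1.0, 1.0], -1.5          # hand-picked for the AND function (assumed)
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(x, w, b) for x in inputs]
print(outputs)
```

Only the input (1, 1) pushes the weighted sum above zero, so only it is classified as 1.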

8. What is the difference between Perceptron and Multi-Layer Perceptron (MLP)?


A Perceptron has a single layer of weights and no hidden layers, while an MLP has one
or more hidden layers with non-linear activations.

9. What is the role of bias in neural networks?


Bias helps the model shift activation functions to fit data better.

10. Why do deep networks perform better than shallow ones?


They learn complex hierarchical features and capture non-linear relationships.

3. Mathematical Concepts in Deep Learning


11. What mathematical concepts are used in Deep Learning?
Linear algebra, calculus, probability, optimization, and statistics.

12. What is the chain rule in backpropagation?


It computes gradients by recursively applying derivatives from output to input layers.
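
A tiny numeric check of the chain rule, using the made-up composite function y = (3x + 1)^2. The analytic gradient multiplies the derivative of the outer function by the derivative of the inner one and is compared with a finite-difference estimate.

```python
def inner(x):
    return 3 * x + 1


def outer(u):
    return u ** 2


def chain_rule_gradient(x):
    """dy/dx = d(outer)/du * d(inner)/dx."""
    u = inner(x)
    d_outer_du = 2 * u
    d_inner_dx = 3
    return d_outer_du * d_inner_dx


def numerical_gradient(x, h=1e-6):
    """Central finite-difference estimate of dy/dx."""
    return (outer(inner(x + h)) - outer(inner(x - h))) / (2 * h)


analytic = chain_rule_gradient(2.0)
numeric = numerical_gradient(2.0)
print(analytic, numeric)
```

Backpropagation applies the same multiplication layer by layer, starting from the loss.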

13. What is the purpose of matrix multiplication in neural networks?


It computes weighted sums of inputs, crucial for transformations in layers.

14. What is the gradient in deep learning?


It represents the partial derivative of a loss function w.r.t. model parameters.

15. Why is normalization important in Deep Learning?


It ensures stable gradient updates and accelerates convergence.

4. Activation Functions
16. What is an activation function?
A function applied to a neuron's weighted input that introduces non-linearity, allowing
the network to model complex relationships.

17. What is the formula for the sigmoid activation function?


$\sigma(x) = \frac{1}{1 + e^{-x}}$

18. Why is ReLU preferred over sigmoid in deep networks?


Its gradient is 1 for positive inputs, which mitigates vanishing gradients, and it is
cheap to compute, allowing faster training.

19. What is the main issue with ReLU?


The "dying ReLU" problem, where neurons whose inputs stay negative output zero, receive
zero gradient, and stop learning.

20. What is Leaky ReLU, and why is it used?


It solves the dying ReLU problem by allowing small gradients for negative inputs.
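
The three activation functions discussed in this section can be written in a few lines. This is a scalar sketch using only the standard library; the slope alpha=0.01 for Leaky ReLU is a common default, assumed here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope instead of being zeroed out.
    return x if x > 0 else alpha * x


for value in (-2.0, 0.0, 3.0):
    print(value, sigmoid(value), relu(value), leaky_relu(value))
```

For a negative input, ReLU returns exactly zero while Leaky ReLU returns a small negative value, so its gradient does not vanish.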

5. Forward & Backward Propagation


21. What is forward propagation in neural networks?
It computes predictions by passing inputs through network layers.

22. What is the key mathematical operation in forward propagation?


Matrix multiplication of inputs and weights, followed by activation.

23. What is backpropagation?


An algorithm that computes the gradient of the loss with respect to each weight using
the chain rule; an optimizer then uses these gradients to update the weights.

24. What is the role of gradients in backpropagation?


They guide weight updates in the direction that minimizes loss.
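
To tie forward propagation, backpropagation, and the weight update together, here is a minimal sketch for a single sigmoid neuron trained on one example with a squared-error loss. The names, the learning rate, and the data point are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w, b, x, y, lr):
    # Forward pass: weighted sum, activation, loss.
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2
    # Backward pass (chain rule): dL/dw = dL/da * da/dz * dz/dw.
    dloss_da = 2 * (a - y)
    da_dz = a * (1 - a)
    grad_w = dloss_da * da_dz * x
    grad_b = dloss_da * da_dz
    # Update: step against the gradient.
    return w - lr * grad_w, b - lr * grad_b, loss


w, b = 0.0, 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step(w, b, x=1.0, y=1.0, lr=0.5)
    losses.append(loss)
print(losses[0], losses[-1])
```

The recorded loss shrinks over the iterations because each update moves the parameters in the direction that reduces it.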

25. What is the vanishing gradient problem?


When gradients become too small in deep networks, slowing learning.
6. Implementation of ANN using Keras
26. What is Keras?
A high-level API for building deep learning models on TensorFlow.

27. How to define a sequential ANN in Keras?


from tensorflow.keras.models import Sequential; from tensorflow.keras.layers import Dense;
model = Sequential([Dense(64, activation='relu'), Dense(1, activation='sigmoid')])

28. Which optimizer is commonly used in Keras for ANN training?


Adam optimizer (keras.optimizers.Adam).

29. What is the loss function for binary classification in Keras?


binary_crossentropy.

30. How to compile and train a model in Keras?


model.compile(loss='binary_crossentropy', optimizer='adam'); model.fit(X_train,
y_train, epochs=10).

7. Loss Functions
31. What is a loss function?
A function that quantifies the error between predicted and actual values.

32. Which loss function is used for regression problems?


Mean Squared Error (MSE).

33. Which loss function is used for multi-class classification?


Categorical Cross-Entropy.

34. Why is MSE preferred over MAE in deep learning?


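MSE penalizes large errors more and has a smooth gradient that shrinks near the minimum,
while MAE has a constant-magnitude gradient and is not differentiable at zero. MSE is,
however, more sensitive to outliers.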

35. What is KL Divergence loss?


A measure of difference between two probability distributions.
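
The loss functions named in this section can be sketched directly from their definitions. The helper names below are hypothetical, and the eps clipping in the cross-entropy is an assumption added to avoid log(0).

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)      # keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(mse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

The single error of size 2 contributes 4 to the MSE sum but only 2 to the MAE sum, which is how MSE penalizes large errors more heavily.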

8. Optimizers
36. What is an optimizer in Deep Learning?
An algorithm that updates network weights to minimize loss.

37. What is the difference between SGD and Adam?


SGD applies a single global learning rate to all parameters, while Adam adapts the
learning rate for each parameter using running averages of past gradients and squared
gradients.

38. What is momentum in optimizers?


It helps accelerate updates by using past gradients.
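
A minimal sketch of the momentum update, assuming a one-dimensional loss f(w) = (w - 3)^2 and hypothetical hyperparameters. The velocity term accumulates past gradients, and the weight moves along that velocity.

```python
def sgd_momentum(grad_fn, w, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum on a single scalar weight."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad_fn(w)   # blend in past gradients
        w = w - lr * velocity
    return w


# Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2 * (w - 3), w=0.0)
print(w_final)
```

With beta set to 0 the update reduces to plain gradient descent.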

39. What is the advantage of RMSprop?


It normalizes learning rates for different parameters, improving convergence.
40. Which optimizer is best for sparse data?
Adaptive optimizers such as Adagrad, RMSprop, or Adam.

9. TensorFlow
41. What is TensorFlow?
An open-source deep learning framework for building neural networks.

42. What are Tensors in TensorFlow?


Multi-dimensional arrays used for computations.

43. How to define a simple Tensor in TensorFlow?


import tensorflow as tf; tensor = tf.constant([[1,2], [3,4]]).

44. What is the role of tf.GradientTape()?


It records operations for automatic differentiation.

45. What is a TensorFlow checkpoint?


A mechanism to save and restore model weights.

10. PyTorch
46. What is PyTorch?
A flexible deep learning library known for dynamic computation graphs.

47. What is the difference between PyTorch and TensorFlow?


PyTorch builds dynamic graphs at run time, while TensorFlow 1.x used static graphs;
TensorFlow 2.x runs eagerly by default.

48. How to define a tensor in PyTorch?


import torch; tensor = torch.tensor([[1,2], [3,4]]).

49. What is Autograd in PyTorch?


A module for automatic differentiation.

50. How to move a PyTorch tensor to GPU?


tensor.cuda() or tensor.to('cuda'); both return a copy of the tensor on the GPU.

1. CNN Foundations and Architectures


1. What is the foundation of CNN (Convolutional Neural Networks)?
CNNs are specialized for processing grid-like data, such as images, by using
convolutional layers to extract features.

2. What is a convolutional layer in CNN?


It applies a filter to the input data to create feature maps, detecting specific patterns.

3. What is a pooling layer in CNN?


A layer that reduces the spatial dimensions of feature maps to prevent overfitting
and reduce computation.
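
The convolutional and pooling layers described above can be sketched on nested lists. The functions `conv2d` and `max_pool2d`, the toy image, and the kernel are assumptions for illustration; there is no padding or stride option, and the filter is not flipped.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]          # toy diagonal-difference filter (assumed)

feature_map = conv2d(image, kernel)
pooled = max_pool2d(image)
print(feature_map)
print(pooled)
```

Pooling halves each spatial dimension of the 4x4 input here, which is where the reduction in computation comes from.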

4. What is LeNet architecture?


LeNet is an early CNN architecture consisting of convolutional layers followed by fully
connected layers, primarily for digit recognition.

5. What is AlexNet?
A deep CNN model that revolutionized image classification by using more layers and
GPUs for training, achieving top performance in the 2012 ImageNet competition.

6. What makes VGGNet different from other CNN architectures?


VGGNet uses a deep network with small 3x3 convolution filters, simplifying the
design and improving performance.

7. What is ResNet and why is it important?


ResNet uses skip connections (or residual connections) to solve the vanishing
gradient problem, allowing deeper networks.

8. What is the Inception network in CNNs?


Inception networks use a multi-branch architecture, applying different convolutions
at each layer to capture various types of features.

9. What is RCNN (Region-based CNN)?


RCNN uses selective search to propose candidate regions and then applies CNN to
classify these regions for object detection.

10. What is Fast RCNN?


Fast RCNN improves RCNN by running the CNN once over the entire image and then using
RoI pooling to extract features for each proposed region from the shared feature map,
making the process faster.

11. What is Faster RCNN?


Faster RCNN introduces a Region Proposal Network (RPN) to generate object
proposals, speeding up the object detection pipeline.

12. What is Non-Maximum Suppression (NMS)?


NMS is used in object detection to eliminate overlapping bounding boxes by keeping
the one with the highest confidence score.
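
A minimal sketch of greedy NMS, assuming boxes in (x1, y1, x2, y2) format and a hypothetical IoU threshold of 0.5: the highest-scoring box is kept, and any remaining box that overlaps it too much is dropped.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes that survive suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop boxes that overlap the kept box more than the threshold.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]   # toy detections
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.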

13. What is the role of the fully connected layer in CNN?


The fully connected layer at the end of CNN is responsible for combining features
learned in earlier layers to make final predictions.

14. What is the difference between convolution and correlation in CNN?


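Convolution flips the filter before sliding it over the input, while correlation does
not. Most deep learning libraries actually implement cross-correlation; because the
filters are learned, the distinction rarely matters in CNNs.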
15. What is the purpose of using dropout in CNN?
Dropout is used as a regularization technique to prevent overfitting by randomly
deactivating certain neurons during training.

2. YOLO (You Only Look Once) Architectures


16. What is YOLO V2?
YOLO V2 is an improved version of YOLO that uses anchor boxes to predict bounding
boxes and improves accuracy and speed.

17. What is YOLO V3?


YOLO V3 is a more accurate version with a new backbone (Darknet-53) and multi-scale
detection, at some cost in speed compared with YOLO V2.

18. What is YOLO V4?


YOLO V4 introduces advanced techniques like CIoU loss, self-adversarial training, and
more optimizations for better performance.

19. What is YOLO V5?


YOLO V5, developed by Ultralytics, is an unofficial, highly optimized, and easy-to-use
version for real-time object detection.

20. What is YOLO V6?


YOLO V6 is an advanced version that includes further optimizations for real-time
inference and accuracy in object detection tasks.

21. What is YOLO V7?


YOLO V7 introduces enhancements like better accuracy, faster inference, and support
for more complex real-world scenarios.

22. What is RoboFlow?


RoboFlow is a platform for annotating and preparing data for training deep learning
models, especially for YOLO-based models.

23. What is Custom Training with YOLO V5?


Custom training in YOLO V5 involves fine-tuning the pre-trained model with your
custom dataset for specific object detection tasks.

24. How to train custom data using YOLO V7?


Training with YOLO V7 involves preparing the dataset, configuring the model’s
configuration file, and running training scripts.
25. How does YOLO perform face recognition?

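YOLO handles the detection step: it treats face detection as an object detection
problem and uses its grid-based approach to localize faces. Recognizing whose face it
is requires a separate recognition model applied to the detected face crops.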

3. Generative Adversarial Networks (GANs)


26. What is a Generative Adversarial Network (GAN)?
A GAN consists of two networks, a generator and a discriminator, competing to
create and evaluate realistic data.

27. How does GAN training work?


The generator creates fake data, and the discriminator evaluates it, with both
improving iteratively through adversarial training.

28. What is DCGAN (Deep Convolutional GAN)?


DCGAN is a type of GAN that uses deep convolutional layers in both the generator
and discriminator for image generation.

29. What is StyleGAN?


StyleGAN generates high-quality images with controllable attributes, leveraging a
style-based generator architecture.

30. What is WGAN (Wasserstein GAN)?


WGAN introduces a new loss function (Wasserstein distance) to improve the stability
of GAN training and reduce mode collapse.

31. How is GAN used for image generation?


GANs can generate realistic images by training the generator to produce images that
are indistinguishable from real data.

32. What is mode collapse in GANs?


Mode collapse occurs when the generator produces limited variety in the output,
reducing the diversity of generated samples.

33. What are the challenges of training GANs?


Training GANs is difficult due to instability, mode collapse, and the delicate balance
required between the generator and discriminator.

34. How is GAN applied in data augmentation?


GANs can generate additional synthetic data, which is useful for augmenting training
datasets in tasks with limited data.

35. What is a typical use case for GANs?


GANs are widely used for image generation, style transfer, super-resolution, and
inpainting, among other creative applications.

1. Introduction to NLP
1. What is NLP (Natural Language Processing)?
NLP is a field of AI that focuses on the interaction between computers and human
language, enabling machines to understand, interpret, and generate human
language.
2. Why is NLP important?
NLP allows machines to process and analyze large amounts of natural language data,
facilitating tasks like sentiment analysis, translation, and chatbots.

3. What are the main tasks in NLP?


NLP tasks include tokenization, part-of-speech tagging, named entity recognition,
machine translation, and text classification.

4. What is Text Preprocessing in NLP?


Text preprocessing involves cleaning and preparing raw text for analysis by tasks like
removing stop words, punctuation, and stemming or lemmatization.

5. What are the challenges of NLP?


NLP faces challenges such as ambiguity, language complexity, context understanding,
and language differences.

2. History of NLP
6. What is the history of NLP?
NLP began in the 1950s with rule-based systems and evolved into statistical models
in the 1990s, and more recently, into deep learning-based models.

7. What is the Turing Test in NLP?


The Turing Test, proposed by Alan Turing in 1950, tests a machine’s ability to exhibit
intelligent behavior indistinguishable from that of a human.

8. What is the major advancement in NLP in recent years?


Deep learning models like Transformers, BERT, and GPT have drastically improved
NLP performance in tasks like machine translation and question answering.

3. Web Scraping and Text Processing


9. What is web scraping?
Web scraping is the process of extracting data from websites by parsing HTML
content.

10. What is regex in NLP?


Regex (regular expressions) is used to identify specific patterns in text, such as phone
numbers, emails, or dates.

11. What is tokenization?


Tokenization is the process of breaking a text into smaller units like words or
sentences.
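
Regex and tokenization can be combined in a short sketch using the standard `re` module. The patterns below are deliberately simple assumptions for illustration (the e-mail pattern is far from complete), and the sample text is made up.

```python
import re

text = "NLP is fun! Email info@example.com for details."

# Sentence tokenization: split on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word tokenization: runs of letters, digits, or underscores.
words = re.findall(r"\w+", sentences[0])

# Pattern matching: a simplified e-mail pattern.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

print(sentences)
print(words)
print(emails)
```

Libraries such as NLTK or spaCy handle abbreviations, contractions, and other edge cases that these patterns ignore.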

12. What is sentence processing in NLP?


Sentence processing involves parsing and analyzing sentences to extract meaning,
such as sentence segmentation and syntactic structure.
13. What is lemmatization in text processing?
Lemmatization reduces words to their dictionary form (lemma) using vocabulary and
context such as part of speech, unlike stemming, which strips suffixes using heuristic
rules.

4. Word Embedding and Vectorization


14. What is word embedding?
Word embedding represents words in a continuous vector space where semantically
similar words are close together.

15. What is Word2Vec?


Word2Vec is a neural network-based model that learns word embeddings either by
predicting context words from a target word (Skip-gram) or a target word from its
context (CBOW).

16. What is a co-occurrence vector?


A co-occurrence vector captures the frequency with which words appear together in
a given window, used for representing word meanings.

17. What is Doc2Vec?


Doc2Vec extends Word2Vec by representing entire documents as vectors, learning to
predict context words for a document.

18. What is TF-IDF (Term Frequency-Inverse Document Frequency)?


TF-IDF is a statistical measure used to evaluate the importance of a word in a
document relative to a corpus, balancing frequency and uniqueness.

19. What is the difference between Bag of Words and TF-IDF?


Bag of Words counts word frequency in documents, while TF-IDF adjusts these
counts by how common the word is across all documents, emphasizing more unique
terms.
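
The difference between the two representations shows up on a toy corpus. This sketch uses the plain textbook formula tf * log(N / df); libraries such as scikit-learn apply smoothed variants, so their numbers will differ. The corpus and helper names are assumptions.

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"]]       # toy tokenized corpus (assumed)


def bag_of_words(doc):
    """Raw term counts for one document."""
    return Counter(doc)


def tf_idf(doc, corpus):
    """Unsmoothed TF-IDF score for each term in one document."""
    counts = Counter(doc)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for d in corpus if term in d)   # documents containing term
        idf = math.log(len(corpus) / df)
        scores[term] = tf * idf
    return scores


bow = bag_of_words(docs[0])
scores = tf_idf(docs[0], docs)
print(bow)
print(scores)
```

"the" appears in every document, so its count of 1 in Bag of Words becomes a TF-IDF score of 0, while the rarer "cat" keeps a positive weight.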

5. NLP Libraries
20. What is TextBlob?
TextBlob is a Python library for processing textual data, offering tools for part-of-
speech tagging, noun phrase extraction, sentiment analysis, and translation.

21. What is NLTK?


NLTK (Natural Language Toolkit) is a comprehensive library for NLP in Python,
providing tools for text processing, classification, tokenization, and parsing.

22. What is Gensim?


Gensim is a Python library for unsupervised learning, focused on topic modeling and
document similarity, especially with word and document embeddings.

23. What is spaCy?


spaCy is an advanced NLP library in Python, optimized for large-scale text processing
with pre-trained models for various NLP tasks.

24. What is the role of word embeddings in NLP?


Word embeddings capture semantic relationships between words, enabling better
handling of synonyms, analogies, and linguistic structure in text analysis.

6. Recurrent Neural Networks (RNN) and Variants


25. What is an RNN (Recurrent Neural Network)?
RNN is a neural network architecture designed for processing sequences of data,
maintaining a memory of previous inputs.

26. What is LSTM (Long Short-Term Memory)?


LSTM is a type of RNN designed to handle long-range dependencies in sequences by
using gating mechanisms to control memory flow.

27. What is Bi-LSTM?


Bi-LSTM is an extension of LSTM that processes sequences in both forward and
backward directions, capturing context from both past and future inputs.

28. How does RNN work in NLP?


RNNs process sequences of text by maintaining a hidden state that is updated at
each time step, useful for tasks like sentiment analysis and machine translation.

29. What is the vanishing gradient problem in RNNs?


The vanishing gradient problem occurs when gradients become too small during
backpropagation, making it difficult for RNNs to learn long-range dependencies.

30. What is the advantage of LSTM over standard RNN?


LSTMs mitigate the vanishing gradient problem and are better at capturing long-term
dependencies in sequential data.

31. What is the purpose of sequence-to-sequence models (Seq2Seq)?


Seq2Seq models, using RNNs or LSTMs, are used for tasks like machine translation,
where the input and output are both sequences of varying lengths.

7. Attention Mechanism and Transformers


32. What is self-attention in NLP?
Self-attention allows a model to focus on different parts of a sequence for each word,
enabling the capture of dependencies regardless of distance in the text.
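
A stripped-down sketch of scaled dot-product self-attention. To keep it short, the queries, keys, and values are all the raw token vectors (a real Transformer would first apply learned projection matrices), and the toy embeddings are made up.

```python
import math


def softmax(xs):
    m = max(xs)                                   # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """X: list of token vectors. Uses Q = K = V = X (no learned weights)."""
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, X))
                        for c in range(d)])
    return outputs, weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # toy token embeddings
outputs, weights = self_attention(X)
print(weights[0])
```

Each row of weights sums to 1, and every token can attend to every other token regardless of how far apart they are in the sequence.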

33. What is the attention mechanism in neural networks?


The attention mechanism allows the model to weigh the importance of different
input parts, improving tasks like translation and summarization.

34. What is the Transformer model?


Transformer is a model architecture relying entirely on attention mechanisms and
eliminating recurrent layers for parallel processing of sequences.

35. What is BERT (Bidirectional Encoder Representations from Transformers)?


BERT is a pre-trained transformer-based model designed to understand context in
both directions, improving performance in NLP tasks like question answering.

36. How does BERT differ from traditional RNNs?


BERT uses a bidirectional attention mechanism to capture contextual information in
both directions, unlike RNNs which process sequences in one direction.

37. What is GPT (Generative Pretrained Transformer)?


GPT is a transformer-based model focused on generating text, trained on large
corpora of data, and fine-tuned for various NLP tasks.

38. What is masked language modeling in BERT?


Masked language modeling involves randomly masking words in a sentence and
training the model to predict the masked words based on the context.

39. What is the difference between BERT and GPT?


BERT is designed for understanding text through bidirectional context, while GPT
focuses on generating coherent text with unidirectional context.

8. NLP Projects and Applications


40. What is a question-answering project using BERT?
A question-answering project using BERT involves fine-tuning the model on a dataset
to answer questions based on a given context or passage.

41. What is sequence-to-sequence in NLP?


Sequence-to-sequence models are used for transforming one sequence into another,
such as translating text from one language to another.

42. What are encoder-decoder models in NLP?


Encoder-decoder models use two neural networks, one to encode input sequences
and the other to decode them into the output sequence.

43. How does Seq2Seq with attention work?


Seq2Seq with attention improves traditional Seq2Seq models by allowing the
decoder to focus on different parts of the input sequence during decoding.

44. What is text normalization in NLP?


Text normalization involves converting text into a consistent format by removing
noise, such as converting all text to lowercase or removing punctuation.

Computer Vision Projects


1. What is object tracking in computer vision?
Object tracking involves following the movement of a specific object across frames in
a video, typically using algorithms like Kalman filters or Deep SORT.

2. What is image classification?


Image classification assigns a label or category to an entire image, such as identifying
objects, animals, or scenes, using models like CNNs.

3. What is image-to-text in computer vision?


Image-to-text refers to the process of converting visual information from an image
into a textual description, often using OCR (Optical Character Recognition) or caption
generation models.

4. How does a vision-based attendance system work?


A vision-based attendance system uses facial recognition or object detection to
automatically mark the attendance of individuals based on their images.

5. What is sign language detection in computer vision?


Sign language detection involves recognizing hand gestures and movements in
images or videos and converting them into readable text or speech.

6. What is a shredder system in computer vision?


A shredder system uses computer vision techniques to detect and sort materials or
objects based on image classification for processing in shredding machines.

Big Data
Introduction to Big Data
7. What is Big Data?
Big Data refers to extremely large datasets that cannot be processed using traditional
data management tools due to their volume, variety, velocity, and complexity.

8. What are the applications of Big Data?


Big Data applications include predictive analytics, business intelligence, personalized
recommendations, fraud detection, and real-time data processing.

9. What is a Big Data pipeline?


A Big Data pipeline refers to a series of processes and tools that ingest, process,
analyze, and store large datasets, often using distributed computing frameworks like
Hadoop or Spark.

10. What is Hadoop?


Hadoop is an open-source framework for storing and processing large datasets in a
distributed computing environment using HDFS (Hadoop Distributed File System) and
MapReduce.
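
The MapReduce model can be illustrated with a single-machine word count in plain Python. This only mimics the map, shuffle, and reduce phases; it does not use Hadoop, and the function names and input lines are assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data big", "data pipeline"]          # toy input (assumed)
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In Hadoop the map and reduce functions run in parallel on different nodes, with HDFS holding the input and output.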

11. What is Hadoop Architecture?


Hadoop architecture consists of HDFS for storage, YARN for resource management,
and MapReduce for data processing, all distributed across multiple nodes in a cluster.

12. What are Hadoop Commands?


Hadoop commands are used to interact with Hadoop's HDFS and other components,
such as hadoop fs -ls, hadoop fs -put, hadoop fs -get, etc., for managing data files.

Spark Overview
13. What is Apache Spark?
Apache Spark is an open-source, distributed computing framework designed for fast
data processing and analytics, supporting in-memory computation and real-time data
streaming.

14. How to install Apache Spark?


Apache Spark can be installed by downloading the binary from the official website or
using package managers like apt or brew, followed by configuring environment
variables.

15. What is RDD in Spark?


RDD (Resilient Distributed Dataset) is the fundamental data structure in Spark,
representing an immutable, distributed collection of objects that can be processed in
parallel.

16. What are DataFrames in Spark?


DataFrames in Spark are a distributed collection of data organized into named
columns, providing a higher-level abstraction over RDDs and optimized for
performance.

17. What is Spark Architecture?


Spark architecture consists of a driver program that schedules work, a cluster manager
that allocates resources, and executors on worker nodes that run the distributed tasks
in parallel.

18. What is Spark MLlib?


Spark MLlib is a scalable machine learning library in Spark that provides algorithms
for classification, regression, clustering, and collaborative filtering.

19. What is Spark NLP?


Spark NLP is a natural language processing library built on top of Apache Spark,
designed for processing and analyzing large text datasets using scalable machine
learning models.

Data Engineering
20. What is Data Engineering?
Data Engineering involves designing, building, and maintaining systems and
infrastructure for collecting, storing, and processing large volumes of data for
analysis.
21. What is Docker?
Docker is an open-source platform for automating the deployment and management
of applications in lightweight containers that ensure consistency across different
environments.

22. How to install Docker?


Docker can be installed by downloading the Docker Desktop application for
Windows/Mac or by installing Docker Engine with a package manager such as apt on Linux
systems.

23. What is a practical use of Docker in data science?


Docker is used to create reproducible environments for machine learning models,
ensuring dependencies and configurations are consistent across various machines.

24. What is Spark DataFrame in practice?


Spark DataFrame is commonly used in practice for handling structured data,
performing SQL-like queries, and integrating with external data sources like CSV,
Parquet, and Hive.

25. What is a practical implementation of Spark?


A practical implementation of Spark includes reading large datasets, performing
transformations, running machine learning algorithms, and storing results for further
analysis.

Introduction to Power BI
1. What is Power BI?
Power BI is a business intelligence tool by Microsoft used for data visualization,
reporting, and analytics.

2. What are the main components of Power BI?


Power BI Desktop, Power BI Service, Power BI Mobile, Power BI Report Server, and
Power BI Embedded.

3. What are the different views in Power BI Desktop?


Report View, Data View, and Model View.

4. What is Power BI Service?


Power BI Service is a cloud-based SaaS platform used to publish, share, and
collaborate on reports and dashboards.

5. What is the Power BI Gateway?


A bridge that connects on-premises data sources to Power BI Service for scheduled
refresh.

6. What is Power Query in Power BI?


Power Query is a data transformation and ETL tool used to clean and shape data
before loading it into Power BI.

7. What is Power Pivot in Power BI?


Power Pivot is an in-memory data modeling component used for creating
relationships and calculations using DAX.

8. What is Power BI Embedded?


Power BI Embedded allows developers to integrate Power BI reports and dashboards
into applications.

9. What is the difference between Power BI Free, Pro, and Premium?


Free allows basic reporting, Pro enables sharing and collaboration, and Premium
offers dedicated capacity for large-scale reporting.

10. What file formats does Power BI support?


Excel (.xlsx, .xlsm), CSV, XML, JSON, PDF, Parquet, and Power BI Dataflows.

Data Sources and Connectivity


11. How does Power BI connect to different data sources?
Using built-in connectors for databases, cloud services, and APIs.

12. What are some common data sources for Power BI?
SQL Server, Azure, SharePoint, Excel, and Web APIs.

13. What is DirectQuery in Power BI?


DirectQuery allows Power BI to fetch real-time data from the source without loading
it into memory.

14. What is the difference between Import Mode and DirectQuery?


Import Mode loads data into memory for fast performance, whereas DirectQuery
fetches real-time data directly from the source.

15. What is a Power BI Dataflow?


A Power BI Dataflow is a cloud-based ETL solution used to clean and transform data
before using it in reports.

16. How do you connect Power BI to an API?


Using the Web Connector and REST API integration with Power Query.

17. What is a Power BI Gateway used for?


To connect on-premises data sources to Power BI Service securely.

18. What are the types of Power BI Gateway?


Personal Mode (for individual use) and Standard Mode (for enterprise-level data
sharing).
19. How do you refresh data in Power BI?
By setting up scheduled refresh in Power BI Service or manually refreshing in Power
BI Desktop.

20. Can Power BI connect to a NoSQL database like MongoDB?


Yes, using ODBC or third-party connectors.

Data Modeling and DAX (Data Analysis Expressions)


21. What is DAX in Power BI?
DAX is a formula language used in Power BI for creating calculated columns,
measures, and tables.

22. What is a calculated column in Power BI?


A column created using DAX that is stored in the model and computed row by row.

23. What is a measure in Power BI?


A dynamic aggregation calculation using DAX that updates based on user interaction.

24. What is the difference between SUM() and SUMX()?


SUM() aggregates a column, while SUMX() iterates row by row using an expression.
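
DAX itself is not shown here; as a loose Python analogy (the table and column names are made up), SUM() corresponds to adding up one existing column, while SUMX() corresponds to evaluating an expression for each row and then adding up the results.

```python
sales = [
    {"qty": 2, "price": 5.0},
    {"qty": 1, "price": 10.0},
]   # toy table (assumed)

# Analogous to SUM(Sales[qty]): aggregate a single column.
total_qty = sum(row["qty"] for row in sales)

# Analogous to SUMX(Sales, Sales[qty] * Sales[price]):
# evaluate the expression row by row, then sum.
revenue = sum(row["qty"] * row["price"] for row in sales)

print(total_qty, revenue)
```

The row-by-row form is needed whenever the value to aggregate is not already stored as a column.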

25. What is the purpose of CALCULATE() in DAX?


CALCULATE() modifies the filter context of an expression dynamically.

26. What is the role of ALL() in DAX?


ALL() removes filters from a column or table in a DAX expression.

27. What does the FILTER() function do in DAX?


FILTER() returns a subset of a table based on a given condition.

28. What is a relationship in Power BI?


A link between tables that allows for proper data modeling and aggregation.

29. What is the difference between Star Schema and Snowflake Schema?
Star Schema has a central fact table with directly linked dimension tables, while
Snowflake Schema normalizes dimensions into multiple related tables.

30. What is a composite model in Power BI?


A model that combines Import Mode and DirectQuery within the same dataset.

Data Visualization and Reports


31. What are the types of visualizations available in Power BI?
Bar charts, line charts, pie charts, tables, matrices, maps, scatter plots, and more.

32. What is a slicer in Power BI?


A slicer is a filtering tool that allows users to interactively filter reports.
33. What is a hierarchy in Power BI?
A structured representation of data that allows users to drill down into different
levels of detail.

34. What is conditional formatting in Power BI?


Conditional formatting changes visual elements like colors or fonts based on data
values.

35. What is a drill-through in Power BI?


A drill-through enables users to navigate from a summary report to a detailed report
for deeper insights.

36. What is the difference between a table and a matrix in Power BI?
A table displays data in a flat structure, while a matrix allows for grouping and
aggregation.

37. What are Bookmarks in Power BI?


Bookmarks save the current state of a report page for quick access.
38. What is a tooltip in Power BI?
A tooltip is an additional information popup that appears when hovering over a
visualization.

39. What is a KPI in Power BI?


A KPI (Key Performance Indicator) visualization tracks business goals against actual
performance.

40. How do you create a custom visualization in Power BI?


By using Power BI’s Custom Visual SDK or importing from AppSource.

Power BI Performance Optimization


41. How do you improve Power BI report performance?
Use aggregations, optimize DAX calculations, reduce data size, and use Import Mode.

42. What is query folding in Power BI?


Query folding is the process where Power Query pushes transformations to the data
source for better performance.

43. How do you reduce report loading time?


By limiting the number of visuals, using pre-aggregated data, and optimizing DAX
formulas.

44. What is incremental refresh in Power BI?


Incremental refresh loads only new or changed data instead of refreshing the entire
dataset.
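
A minimal Python sketch of the idea, assuming each row carries a modification date. Power BI itself implements incremental refresh with date-range partitions; the row keys ('id', 'modified', 'value') and the function name here are hypothetical and only illustrate "load what changed, keep the rest".

```python
from datetime import date

def incremental_refresh(stored, source, last_refresh):
    """Merge only source rows modified after last_refresh into the stored copy.

    Returns the merged rows (sorted by id) and the number of rows loaded.
    """
    merged = {row["id"]: row for row in stored}
    changed = [row for row in source if row["modified"] > last_refresh]
    for row in changed:
        merged[row["id"]] = row  # insert new rows, overwrite changed ones
    return sorted(merged.values(), key=lambda r: r["id"]), len(changed)

stored = [
    {"id": 1, "modified": date(2024, 1, 5), "value": 10},
    {"id": 2, "modified": date(2024, 1, 6), "value": 20},
]
source = [
    {"id": 1, "modified": date(2024, 1, 5), "value": 10},   # unchanged
    {"id": 2, "modified": date(2024, 2, 1), "value": 25},   # changed
    {"id": 3, "modified": date(2024, 2, 2), "value": 30},   # new
]

rows, loaded = incremental_refresh(stored, source, last_refresh=date(2024, 1, 31))
print(loaded, [r["value"] for r in rows])
```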

45. How do you optimize DAX queries?


Prefer simple aggregations such as SUM() over iterators such as SUMX() where
possible, avoid unnecessary iterators, and optimize row context evaluations.

Power BI Security and Deployment

46. What is Row-Level Security (RLS) in Power BI?
RLS restricts data visibility for users based on assigned roles.

47. What is Object-Level Security (OLS) in Power BI?


OLS restricts access to specific tables or columns for certain users.

48. How do you publish a Power BI report?


By using the “Publish” button in Power BI Desktop to upload it to Power BI Service.

49. What is a workspace in Power BI?


A workspace is a shared environment in Power BI Service for collaboration on reports
and dashboards.

50. What is Power BI Premium?


A paid tier that provides dedicated capacity, larger datasets, and enhanced
performance.

Advanced Power BI Features


51. What is Composite Modeling in Power BI?
Composite Modeling allows combining Import Mode and DirectQuery within the
same dataset.

52. What are Hybrid Tables in Power BI?


Hybrid Tables combine Import Mode with DirectQuery for real-time and historical
data analysis.

53. What are Dataflows in Power BI?


Dataflows enable ETL processes within Power BI Service for reusable data
preparation.

54. What is a Shared Dataset in Power BI?


A dataset that multiple reports can use without duplicating data models.

55. How does AI Insights work in Power BI?


AI Insights apply machine learning models to data transformations using Azure ML
and Power Query.

56. What are Paginated Reports in Power BI?


Paginated Reports generate detailed, pixel-perfect reports suitable for printing.

57. How does the Smart Narrative feature work in Power BI?
Smart Narrative automatically generates text-based insights from data visualizations.
58. What are Sensitivity Labels in Power BI?
Sensitivity Labels classify and protect data by enforcing security and compliance
policies.

59. What is the difference between Power Automate and Power BI?
Power Automate automates workflows, while Power BI focuses on data visualization
and reporting.

60. What is the purpose of Q&A in Power BI?


The Q&A feature allows users to ask natural language questions and get instant
insights.

Power BI Integration
61. How do you integrate Power BI with Azure Synapse?
Using DirectQuery or Import Mode to connect with Azure Synapse Analytics.

62. Can Power BI connect to Google BigQuery?


Yes, using the native Google BigQuery connector.

63. How do you integrate Power BI with Python?


By enabling the Python scripting option in Power Query and using Python visuals.

64. How do you integrate Power BI with R?


By enabling R scripting in Power BI and using R visuals for advanced analytics.

65. How can Power BI be embedded in a web application?


Using Power BI Embedded APIs and embedding reports via iframes or JavaScript SDK.

66. Can Power BI connect to Snowflake?


Yes, using the Snowflake connector in Power BI.

67. What is the difference between Power BI and Tableau?


Power BI is tightly integrated with Microsoft products, while Tableau offers greater
customization and flexibility.

68. Can Power BI be integrated with SharePoint?


Yes, using SharePoint Lists, SharePoint Online, and embedding reports.

69. What is PowerApps and how does it relate to Power BI?


PowerApps allows building applications that can be embedded in Power BI reports
for interactivity.

70. Can Power BI connect to AWS services like Redshift?


Yes, using the Amazon Redshift connector.

Power BI Deployment & Automation

71. What are Power BI Deployment Pipelines?
Deployment Pipelines enable CI/CD workflows for Power BI reports across
development, test, and production.

72. How do you schedule a data refresh in Power BI?


Using Scheduled Refresh in Power BI Service settings.

73. How do you automate report distribution in Power BI?


Using Power BI subscriptions or Power Automate.

74. What is Incremental Refresh in Power BI?


Incremental Refresh loads only new or modified data instead of refreshing the entire
dataset.

75. What is the Power BI REST API?


Power BI REST API enables automation and integration of Power BI reports, datasets,
and dashboards.
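
As an illustration, the sketch below builds (but does not send) a request to the REST API's dataset refresh endpoint using only the standard library. The workspace ID, dataset ID, and access token are hypothetical placeholders; obtaining a real token through Microsoft Entra ID is assumed and not shown.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_refresh_request(group_id, dataset_id, access_token):
    """Build (but do not send) a POST request that asks for a dataset refresh."""
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )

# Placeholder values, for illustration only.
request = build_refresh_request("my-workspace-id", "my-dataset-id", "my-access-token")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```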

76. How do you deploy a Power BI report to multiple environments?


Using Power BI Deployment Pipelines or manually publishing versions.

77. What are Workspaces in Power BI?


Workspaces are collaborative environments in Power BI Service for report sharing
and development.

78. How do you restrict report access to specific users?


Using Row-Level Security (RLS) or access control settings in Power BI Service.

79. How do you automate Power BI report publishing?


Using Power BI REST API and Power Automate.

80. What is XMLA Endpoint in Power BI?


XMLA Endpoint enables external connectivity to Power BI datasets for advanced data
modeling.

Power BI Performance Tuning & Best Practices


81. How can you optimize Power BI performance?
Use Import Mode, optimize DAX queries, enable aggregations, and reduce visual
complexity.

82. What is Query Folding in Power BI?


Query Folding pushes transformations back to the data source for better
performance.
83. How do you handle large datasets in Power BI?
Use Aggregations, Composite Models, and Incremental Refresh.
84. How do you optimize DAX calculations in Power BI?
Reduce calculated columns, use measures instead, and optimize row context
evaluations.

85. What are Aggregations in Power BI?


Aggregations precompute summary data to improve report performance.
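
A rough Python analogy, assuming hypothetical detail rows with 'year', 'region', and 'amount' keys: the summary is computed once, and later lookups read it instead of scanning the detail rows again.

```python
from collections import defaultdict

def build_aggregation(detail_rows):
    """Precompute the total amount per (year, region) from detail rows."""
    summary = defaultdict(float)
    for row in detail_rows:
        summary[(row["year"], row["region"])] += row["amount"]
    return dict(summary)

detail_rows = [
    {"year": 2023, "region": "East", "amount": 100.0},
    {"year": 2023, "region": "East", "amount": 40.0},
    {"year": 2023, "region": "West", "amount": 70.0},
    {"year": 2024, "region": "East", "amount": 55.0},
]

aggregated = build_aggregation(detail_rows)
# A report query now reads one precomputed value instead of scanning all rows.
print(aggregated[(2023, "East")])
```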

86. How do you reduce memory usage in Power BI?


Remove unnecessary columns, use relationships efficiently, and minimize cardinality.

87. What is Cardinality in Power BI?


Cardinality refers to the uniqueness of values in a column, impacting performance.
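
To make this concrete, the sketch below counts distinct values per column in a small, hypothetical set of rows. Columns near the top of the result (such as unique IDs) are the high-cardinality ones to look at first when reducing model size.

```python
def column_cardinality(rows):
    """Return the number of distinct values per column, highest first."""
    distinct = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    counts = {column: len(values) for column, values in distinct.items()}
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

orders = [
    {"order_id": 1001, "country": "US", "status": "shipped"},
    {"order_id": 1002, "country": "US", "status": "shipped"},
    {"order_id": 1003, "country": "DE", "status": "shipped"},
    {"order_id": 1004, "country": "US", "status": "shipped"},
]

cardinality = column_cardinality(orders)
print(cardinality)
```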

88. What is a calculated table in Power BI?


A table created using DAX that exists in memory and is not loaded from the source.
89. How do you optimize DirectQuery performance in Power BI?
Minimize visual interactions, reduce queries, and use indexed columns in the
database.

90. What are Materialized Views, and how do they help Power BI?
Materialized Views precompute query results for faster performance in relational
databases.

Power BI Advanced Analytics & AI Features


91. What is AI Insights in Power BI?
AI Insights apply ML models for sentiment analysis, text analytics, and anomaly
detection.

92. What is Decomposition Tree in Power BI?


A visualization that allows users to drill down into data hierarchies dynamically.

93. What is Key Influencers Visual in Power BI?


A visualization that helps identify factors influencing a particular outcome.

94. What is Anomaly Detection in Power BI?


A feature that highlights unusual patterns in data automatically.

95. How do you perform clustering in Power BI?


Using the built-in Clustering option in scatter plots.

96. What is Explain Feature in Power BI?


A feature that auto-generates explanations for data trends and insights.

97. What is Forecasting in Power BI?


A feature that uses statistical methods to predict future trends based on historical
data.
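
The answer above does not name a specific method. As one example of a statistical forecasting technique, here is a minimal sketch of simple exponential smoothing; the function name and the smoothing factor alpha are assumptions for illustration, not Power BI settings.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    Each new observation is blended with the running level:
    level = alpha * value + (1 - alpha) * level
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

monthly_sales = [10, 20, 30]
print(exponential_smoothing_forecast(monthly_sales, alpha=0.5))
```

A higher alpha weights recent observations more heavily; with alpha equal to 1 the forecast is simply the last observed value.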

98. Can you integrate Power BI with Azure Machine Learning?


Yes, using Azure ML Web Services or Power BI AI Insights.

99. What is Sentiment Analysis in Power BI?


An AI-driven analysis that classifies text data as positive, negative, or neutral.

100. How do you use Python or R for advanced analytics in Power BI?


By enabling Python or R scripting within Power BI and using visuals for machine learning
models.

Power BI Data Modeling & DAX


1. What is the difference between Calculated Columns and Measures in Power
BI?
Calculated Columns are computed row by row at refresh and stored in the
model, while Measures are evaluated at query time in the current filter context.
2. What is the function of the RELATED() function in Power BI?
RELATED() fetches a value from another table connected via a one-to-many
relationship.
3. What is the difference between RELATED() and LOOKUPVALUE()?
RELATED() requires an active relationship, while LOOKUPVALUE() retrieves a
value without relationships.
4. What is the purpose of USERELATIONSHIP() in DAX?
USERELATIONSHIP() enables the use of inactive relationships in calculations.
5. What is the function of SUMX() in DAX?
SUMX() iterates over a table and applies row-wise calculations before
summing.
6. How do you create a rolling average in Power BI?
Using the AVERAGEX() function combined with a time-based filter context.
7. What is the difference between DISTINCT() and VALUES() in Power BI?
Both return the unique values of a column, but VALUES() also includes the
blank row added for unmatched rows in a related table, while DISTINCT() does not.
8. What is the role of ALL() function in DAX?
ALL() removes all filters from a table or column in calculations.
9. How do you create a dynamic ranking in Power BI?
Using RANKX() function along with filter context manipulation.
10. What is the significance of EARLIER() in DAX?
EARLIER() returns a column value from an outer row context when row
contexts are nested.
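
Two of the answers above, SUMX() and the rolling average, can be sketched in plain Python. This is an analogy for the row-wise logic only, not DAX; the function names and sample rows are hypothetical.

```python
def sumx(table, expression):
    """Evaluate expression for each row, then sum the results (like SUMX)."""
    return sum(expression(row) for row in table)

def rolling_average(values, window):
    """Average of the last `window` values at each position.

    Early positions use however many values are available so far.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result

sales = [
    {"qty": 2, "price": 5.0},
    {"qty": 1, "price": 10.0},
]
# Row-wise calculation (qty * price) before summing.
print(sumx(sales, lambda row: row["qty"] * row["price"]))
print(rolling_average([10, 20, 30, 40], window=2))
```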

Power BI Advanced Visualization Techniques


1. What is Drillthrough in Power BI?
Drillthrough allows users to navigate from a summarized report to a detailed
view of specific data.
2. What is a Tooltip Page in Power BI?
A report page designed as a custom tooltip to provide additional context
when hovering over visuals.
3. How do you enable Dynamic Titles in Power BI?
Using DAX to create a measure and assigning it as a title in the visualization
settings.
4. What is a Field Parameter in Power BI?
A feature that enables dynamic axis and column selection in visuals using
slicers.
5. How do you implement small multiples in Power BI?
Using the Small Multiples feature in line charts, bar charts, and column charts
for category-based visual splits.
6. What is a Synced Slicer in Power BI?
A slicer that applies filters across multiple pages with synchronized
interactions.
7. How do you display Top N values dynamically in Power BI?
Using a parameterized measure with RANKX() and a slicer input.
8. What is a Matrix Visual in Power BI?
A visual similar to Pivot Tables in Excel, allowing hierarchical data
representation.
9. How do you create a Heatmap in Power BI?
Using Conditional Formatting on a matrix or table visual based on value
intensity.
10. What is a KPI Visual in Power BI?
A visualization that tracks key metrics against targets with trend indicators.
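
The dynamic Top N answer above can be illustrated with a small Python sketch. It is an analogy for the ranking logic, not DAX: `n` plays the role of the slicer input, and ties share a rank, with the next rank skipped. The function name and category totals are hypothetical.

```python
def top_n(totals, n):
    """Return (rank, name, value) for every category ranked n or better.

    Rank is 1 plus the number of categories with a strictly higher value,
    so tied categories share a rank and may return more than n rows.
    """
    ranked = []
    for name, value in totals.items():
        rank = 1 + sum(1 for other in totals.values() if other > value)
        if rank <= n:
            ranked.append((rank, name, value))
    return sorted(ranked)

category_totals = {"A": 100, "B": 300, "C": 200, "D": 300}
print(top_n(category_totals, 2))
print(top_n(category_totals, 3))
```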

Power BI Security & Governance


1. What is Object-Level Security (OLS) in Power BI?
OLS restricts access to entire tables or columns in a dataset.
2. What is Row-Level Security (RLS) in Power BI?
RLS filters data based on user roles to show only relevant records.
3. What is the difference between Static RLS and Dynamic RLS?
Static RLS applies predefined filters, whereas Dynamic RLS filters data based
on user identity.
4. How do you implement Dynamic Row-Level Security in Power BI?
Using a security table with relationships and the USERPRINCIPALNAME()
function.
5. What is Data Sensitivity Labeling in Power BI?
A compliance feature that classifies and protects sensitive data within reports
and datasets.
6. How does Power BI handle GDPR compliance?
By allowing data encryption, sensitivity labels, and governance monitoring in
Power BI Service.
7. What is Data Masking in Power BI?
Hiding or obfuscating sensitive data using calculated columns or security
rules.
8. How do you audit user activities in Power BI?
Using Power BI Audit Logs in Microsoft Purview or Admin Portal.
9. What is a Power BI Service Tenant?
A dedicated Power BI environment managed within an organization's
Microsoft 365 subscription.
10. How do you manage dataset refresh failures in Power BI?
Checking the refresh history, increasing gateway resources, and optimizing
query folding.
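
The dynamic RLS pattern described above (a security table plus the signed-in user's identity) can be sketched in Python. This only mirrors the filtering logic; in Power BI the identity comes from USERPRINCIPALNAME() and the filter is a role definition. The table layout, column names, and email addresses are hypothetical.

```python
def visible_rows(rows, security_table, user_principal_name):
    """Return only the rows whose region is mapped to the signed-in user."""
    allowed = {
        entry["region"]
        for entry in security_table
        if entry["email"].lower() == user_principal_name.lower()
    }
    return [row for row in rows if row["region"] in allowed]

security_table = [
    {"email": "ana@example.com", "region": "East"},
    {"email": "ben@example.com", "region": "West"},
    {"email": "ben@example.com", "region": "North"},
]
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "North", "amount": 75},
]

print(visible_rows(sales, security_table, "ben@example.com"))
```

A user with no entry in the security table sees no rows at all.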

Power BI Data Connectivity & Performance Optimization


1. What is a Gateway in Power BI?
A Gateway enables on-premises data connectivity for scheduled refresh and
live queries.
2. What is the difference between Personal Gateway and Enterprise Gateway?
Personal Gateway is for individual use, while Enterprise Gateway supports
shared connections in organizations.
3. What is Power Query Optimization?
Reducing transformation complexity, leveraging Query Folding, and
minimizing applied steps.
4. What is Direct Lake Mode in Power BI?
Direct Lake Mode enables near real-time access to data stored in OneLake in
Fabric.
5. How do you implement data partitioning in Power BI?
Using Incremental Refresh, which partitions a table by date range.
