Python QA
Data Types
• int: Integer (e.g., x = 5)
• float: Floating-point number (e.g., y = 3.14)
• str: String (e.g., name = "Data Science")
• bool: Boolean (e.g., flag = True)
• list: Ordered, mutable collection (e.g., lst = [1, 2, 3])
• tuple: Ordered, immutable collection (e.g., tpl = (1, 2, 3))
• set: Unordered, unique elements (e.g., s = {1, 2, 3})
• dict: Key-value pairs (e.g., d = {"a": 1, "b": 2})
String Methods
• .upper(), .lower(): Convert case (e.g., "data".upper() → "DATA")
• .strip(): Removes leading/trailing spaces (e.g., " data ".strip() → "data")
• .replace(): Replaces substrings (e.g., "AI".replace("A", "ML") → "MLI")
• .split(), .join(): Splits/joins strings (e.g., "a b".split() → ['a', 'b']; "-".join(['a', 'b']) → "a-b")
• .find(), .index(): Finds substring position (e.g., "science".find("i") → 2); .find() returns -1 if absent, .index() raises ValueError
• .count(): Counts occurrences (e.g., "banana".count("a") → 3)
• .startswith(), .endswith(): Checks start/end (e.g., "data".startswith("d")
→ True)
• f"{}" (f-string): String formatting (e.g., f"Value: {x}" → "Value: 5")
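A minimal sketch of the string methods above in one place. The sample text and variable names are illustrative assumptions, not part of the original list.

```python
text = "  data science  "

cleaned = text.strip()      # remove leading/trailing spaces
words = cleaned.split()     # split on whitespace
joined = "-".join(words)    # join the pieces back with a separator

# .find() returns -1 when the substring is missing; .index() raises ValueError.
missing = cleaned.find("z")
try:
    cleaned.index("z")
    index_failed = False
except ValueError:
    index_failed = True

x = 5
label = f"Value: {x}"       # f-string formatting

print(cleaned.upper(), words, joined, missing, index_failed, label)
```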
List Methods
• .append(): Adds item (e.g., lst.append(4))
• .extend(): Adds multiple items (e.g., lst.extend([5, 6]))
• .insert(): Inserts at index (e.g., lst.insert(1, "AI"))
• .remove(): Removes first occurrence (e.g., lst.remove(2))
• .pop(): Removes & returns last item (e.g., lst.pop())
• .clear(): Empties list (e.g., lst.clear())
• .index(): Finds index of value (e.g., lst.index(3))
• .count(): Counts occurrences (e.g., lst.count(1))
• .sort(): Sorts list (e.g., lst.sort())
• .reverse(): Reverses list (e.g., lst.reverse())
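A short sketch chaining the list methods above, assuming the starting list lst = [1, 2, 3] from the Data Types section. Note that .sort() and .reverse() change the list in place and return None.

```python
lst = [1, 2, 3]

lst.append(4)          # [1, 2, 3, 4]
lst.extend([5, 6])     # [1, 2, 3, 4, 5, 6]
lst.insert(1, "AI")    # [1, 'AI', 2, 3, 4, 5, 6]
lst.remove("AI")       # removes the first matching item
last = lst.pop()       # removes and returns the last item
lst.reverse()          # reverses in place
result = lst.sort()    # sorts in place; the return value is None

print(lst, last, result)
```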
Tuple
• tuple(): Creates an immutable sequence (e.g., t = (1, 2, 3))
• len(): Returns length (e.g., len((1,2,3)) → 3)
• index(): Finds index of an element (e.g., (1,2,3).index(2) → 1)
• count(): Counts occurrences (e.g., (1,1,2).count(1) → 2)
Set Methods
• .add(): Adds an element (e.g., s.add(5))
• .remove(): Removes an element; raises KeyError if it is missing (e.g., s.remove(3))
• .discard(): Removes an element if present; no error if it is missing (e.g., s.discard(3))
• .pop(): Removes & returns an arbitrary element (e.g., s.pop())
• .clear(): Empties the set (e.g., s.clear())
• .union(): Combines sets (e.g., s1.union(s2))
• .intersection(): Finds common elements (e.g., s1.intersection(s2))
• .difference(): Finds elements in the first set that are not in the second (e.g., s1.difference(s2))
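A small sketch of the set operations above. The two sets s1 and s2 are illustrative values; the try/except shows how .remove() and .discard() behave on a missing element.

```python
s1 = {1, 2, 3}
s2 = {3, 4, 5}

union = s1.union(s2)            # every element from both sets
common = s1.intersection(s2)    # elements present in both
only_s1 = s1.difference(s2)     # elements of s1 that are not in s2

s1.discard(42)                  # missing element: silently ignored
try:
    s1.remove(42)               # missing element: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

print(union, common, only_s1, remove_failed)
```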
Dictionary Methods
• .keys(): Returns all keys (e.g., d.keys())
• .values(): Returns all values (e.g., d.values())
• .items(): Returns key-value pairs (e.g., d.items())
• .get(): Retrieves value by key (e.g., d.get("name"))
• .update(): Updates dictionary (e.g., d.update({"age": 25}))
• .pop(): Removes key-value pair by key (e.g., d.pop("age"))
• .popitem(): Removes last inserted item (e.g., d.popitem())
• .clear(): Empties dictionary (e.g., d.clear())
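A brief sketch of the dictionary methods above. The keys and values ("name", "Ada", "city", "London") are made up for illustration.

```python
d = {"name": "Ada", "age": 25}

missing = d.get("city")                   # None instead of a KeyError
fallback = d.get("city", "Unknown")       # default value when the key is absent

d.update({"age": 26, "city": "London"})   # overwrite one key, add another
age = d.pop("age")                        # remove a key and return its value
last_item = d.popitem()                   # remove and return the last inserted pair
pairs = list(d.items())                   # remaining key-value pairs

print(missing, fallback, age, last_item, pairs)
```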
Control Flow
• if, elif, else: Conditional statements (e.g., if x > 5: print("High"))
• for: Loops over sequences (e.g., for i in range(5): print(i))
• while: Loops until condition is false (e.g., while x < 10: x += 1)
• break: Exits loop early (e.g., if x == 5: break)
• continue: Skips to next iteration (e.g., if x == 5: continue)
• pass: Placeholder for future code (e.g., if x > 5: pass)
Functions
• def: Defines a function (e.g., def add(a, b): return a + b)
• return: Returns a value from a function (e.g., return result)
• lambda: Anonymous function (e.g., lambda x: x * 2 → 4 for x=2)
• map(): Applies function to an iterable (e.g., map(str.upper, ['a', 'b']))
• filter(): Filters values based on condition (e.g., filter(lambda x: x > 2,
[1,2,3]))
• reduce(): Performs cumulative operation; import it from functools (e.g.,
reduce(lambda x, y: x + y, [1,2,3]) → 6)
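A runnable sketch of the function tools above. Two details are easy to miss: map() and filter() return lazy iterators, so they are wrapped in list() here, and reduce() has to be imported from functools.

```python
from functools import reduce


def add(a, b):
    """Return the sum of two values."""
    return a + b


doubled = (lambda x: x * 2)(2)                        # call a lambda directly
upper = list(map(str.upper, ["a", "b"]))              # apply a function to each item
large = list(filter(lambda x: x > 2, [1, 2, 3]))      # keep items that pass the test
total = reduce(add, [1, 2, 3])                        # fold the list into one value

print(doubled, upper, large, total)
```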
File Handling
• open(): Opens a file (e.g., f = open('data.txt', 'r'))
• .read(), .readline(), .readlines(): Reads file content (e.g., f.read())
• .write(), .writelines(): Writes data to a file (e.g., f.write("Hello"))
• .close(): Closes the file (e.g., f.close())
• with open() as: Handles files safely (e.g., with open('file.txt') as f:)
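A minimal sketch of writing and reading a file with the `with` statement, which closes the file automatically. The file name data.txt and its contents are illustrative, and a temporary directory is assumed so the example leaves nothing behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")  # hypothetical file name

    # Write: .write() takes one string, .writelines() takes a list of strings.
    with open(path, "w") as f:
        f.write("Hello\n")
        f.writelines(["line 2\n", "line 3\n"])

    # Read: .readline() returns one line, .readlines() returns the rest as a list.
    with open(path) as f:
        first = f.readline()
        rest = f.readlines()

    closed_after_with = f.closed  # the with block has already closed the file

print(first.strip(), rest, closed_after_with)
```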
Exception Handling
• try, except, finally: Handles errors (e.g., try: x=1/0 except
ZeroDivisionError: print("Error"))
• raise: Raises an exception (e.g., raise ValueError("Invalid Input"))
• assert: Debugging check (e.g., assert x > 0, "x must be positive")
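The one-line examples above are compressed; in real code each clause goes on its own line. A sketch, with safe_divide and require_positive as hypothetical helper names:

```python
def safe_divide(a, b):
    """Hypothetical helper: return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        print("Error: division by zero")
        return None
    finally:
        # Runs whether or not an exception occurred.
        print("finally always runs")


def require_positive(x):
    """Hypothetical helper: raise ValueError for non-positive input."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x


ok = safe_divide(6, 3)
failed = safe_divide(1, 0)

try:
    require_positive(-1)
    message = None
except ValueError as err:
    message = str(err)

print(ok, failed, message)
```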
Modules
• import: Imports a module (e.g., import numpy as np)
• from import: Imports specific function (e.g., from math import sqrt)
• as: Gives a module or imported name an alias (e.g., import pandas as pd)
• dir(): Lists attributes of an object (e.g., dir(str))
• help(): Displays documentation (e.g., help(list))
Built-in Functions
• sum(): Computes sum (e.g., sum([1,2,3]) → 6)
• min(), max(): Finds min/max value (e.g., max([1,2,3]) → 3)
• abs(): Returns absolute value (e.g., abs(-5) → 5)
• round(): Rounds a number (e.g., round(3.14, 1) → 3.1)
• sorted(): Sorts a sequence (e.g., sorted([3,1,2]) → [1,2,3])
• enumerate(): Adds index while iterating (e.g., for i, val in enumerate(['a',
'b']): print(i, val))
• zip(): Combines iterables (e.g., list(zip([1,2], ['a', 'b'])) → [(1, 'a'), (2, 'b')])
• any(), all(): Checks if any/all elements meet a condition (e.g., all([True,
False]) → False)
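A compact sketch of the built-ins above on small illustrative lists. sorted() returns a new list and leaves the original unchanged; enumerate() and zip() are wrapped in list() so their contents can be inspected.

```python
values = [3, 1, 2]
letters = ["a", "b", "c"]

ordered = sorted(values)                # new sorted list; values is untouched
indexed = list(enumerate(letters))      # (index, item) pairs
paired = list(zip(ordered, letters))    # items matched by position

has_large = any(v > 2 for v in values)  # True if at least one item passes
all_large = all(v > 2 for v in values)  # True only if every item passes

print(sum(values), min(values), max(values), abs(-5), round(3.14, 1))
print(ordered, indexed, paired, has_large, all_large)
```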
Datetime Module
• datetime.datetime: Represents date & time (e.g.,
datetime.datetime.now())
• datetime.date: Represents only the date (e.g., datetime.date.today())
• datetime.timedelta: Represents time difference (e.g.,
datetime.timedelta(days=5))
• datetime.datetime.strptime(), .strftime(): Parse a string into a datetime and format a
datetime as a string (e.g., datetime.datetime.strptime("2025-02-19", "%Y-%m-%d"))
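A short sketch of parsing, shifting, and formatting a date. It imports the datetime class directly (from datetime import datetime), so strptime is called as datetime.strptime; with a plain import datetime the same call is datetime.datetime.strptime.

```python
from datetime import datetime, timedelta

parsed = datetime.strptime("2025-02-19", "%Y-%m-%d")  # string -> datetime
later = parsed + timedelta(days=5)                     # add a time difference
formatted = later.strftime("%Y-%m-%d")                 # datetime -> string

print(parsed, later, formatted)
```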
1. Python Basics
Que-What is indentation in Python?
Indentation is used to define code blocks instead of curly braces.
5. While Loop
Que-What is an infinite loop?
A loop that never stops running.
6. Comprehension
Que-How do you create a set using comprehension?
{x for x in range(5)}
8. Generator Functions
Que-What is a generator function?
A function that uses yield to return values lazily.
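A minimal sketch of a generator function. The name count_up_to is hypothetical; the point is that values are produced one at a time, only when requested.

```python
def count_up_to(n):
    """Hypothetical generator: yield 1..n one value at a time."""
    i = 1
    while i <= n:
        yield i      # pause here until the next value is requested
        i += 1


gen = count_up_to(3)
first = next(gen)        # runs the body up to the first yield
remaining = list(gen)    # consumes whatever values are left

print(first, remaining)
```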
9. Lambda Functions
Que-What is a lambda function?
An anonymous function defined using lambda.
Que-What is an object?
An instance of a class.
12. Polymorphism
Que-What is polymorphism?
The ability of objects of different classes to respond to the same method call or interface, each in its own way.
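A small sketch of polymorphism through a shared method name. The classes Dog and Cat are illustrative; the loop calls speak() without caring which class each object belongs to.

```python
class Dog:
    def speak(self):
        return "Woof"


class Cat:
    def speak(self):
        return "Meow"


# The same call works on both objects; each class supplies its own behavior.
sounds = [animal.speak() for animal in (Dog(), Cat())]
print(sounds)
```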
14. Inheritance
Que-What is inheritance in Python?
It allows a class to derive properties and methods from another class.
15. Abstraction
Que-What is abstraction in Python?
Hiding implementation details and exposing only necessary functionalities.
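One common way to express abstraction in Python is the abc module; the sketch below also shows inheritance, since Square derives from Shape. The class names are illustrative assumptions.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: exposes area() without saying how it is computed."""

    @abstractmethod
    def area(self):
        ...

    def describe(self):
        # Shared behavior inherited by every subclass.
        return f"{type(self).__name__} with area {self.area()}"


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()                      # abstract classes cannot be instantiated
    cannot_instantiate = False
except TypeError:
    cannot_instantiate = True

description = Square(3).describe()
print(cannot_instantiate, description)
```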
24. Multithreading
Que-What is multithreading?
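Running multiple threads concurrently within a single process. In standard CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads mainly help with I/O-bound work.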
Que-What is ValueError?
Raised when a function gets an argument of the right type but invalid value.
Que-What is IndexError?
Raised when accessing an index that does not exist in a sequence.
Que-What is KeyError?
Raised when trying to access a key that does not exist in a dictionary.
Que-What is ZeroDivisionError?
Raised when division by zero occurs.
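A quick sketch that triggers each of the four exceptions above and records its class name. The helper error_name is hypothetical.

```python
def error_name(func):
    """Hypothetical helper: call func and return the name of the error it raises."""
    try:
        func()
    except (ValueError, IndexError, KeyError, ZeroDivisionError) as err:
        return type(err).__name__
    return None


names = [
    error_name(lambda: int("abc")),       # right type (str), invalid value
    error_name(lambda: [1, 2, 3][5]),     # index outside the list
    error_name(lambda: {"a": 1}["b"]),    # key not in the dictionary
    error_name(lambda: 1 / 0),            # division by zero
]
print(names)
```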
28. Best Practices for Exception Handling
Que-Why should you avoid using a generic except?
A bare except catches every exception, including KeyboardInterrupt and SystemExit, which hides real bugs and makes debugging difficult.
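A sketch of the alternative: catch only the exception you expect, so unrelated errors still surface. The helper parse_int is a hypothetical example.

```python
def parse_int(text):
    """Hypothetical helper: return int(text), or None if text is not a valid number."""
    try:
        return int(text)
    except ValueError:       # only the error we expect from bad input
        return None


good = parse_int("42")
bad = parse_int("abc")

# A different kind of mistake (passing None) is not swallowed.
try:
    parse_int(None)
    bug_surfaced = False
except TypeError:
    bug_surfaced = True

print(good, bad, bug_surfaced)
```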
Que-What is a generator?
An iterator, created by a generator function or generator expression, that produces values lazily.
2. NumPy
26. What is NumPy used for?
Efficient numerical computations and array operations.
4. Matplotlib
61. What is Matplotlib used for?
It is a Python library for creating static, animated, and interactive visualizations.
5. Plotly
86. What is Plotly used for?
It is a Python library for interactive data visualization.
Probability Distributions
11. What is a probability distribution?
A function that describes how likely each possible outcome of a random variable is.
18. What is the difference between normal and standard normal distribution?
A normal distribution can have any mean μ and standard deviation σ; the standard normal distribution is the special case with a mean of 0 and a standard deviation of 1.
Statistical Measures
21. What is the mean?
The average of all values: $\mu = \frac{\sum x}{n}$.
22. What is the median?
The middle value when data is sorted.
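A worked sketch of both measures using the standard library. The data values are made up for illustration; for an even number of values, the median is the average of the two middle ones.

```python
import statistics

data = [2, 4, 4, 5, 10]

mean = sum(data) / len(data)                    # sum of values divided by count
median = statistics.median(data)                # middle value of the sorted data
even_median = statistics.median([1, 2, 3, 4])   # average of the two middle values

print(mean, median, even_median)
```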
Inferential Statistics
31. What is hypothesis testing?
A method to test assumptions about a population using sample data.
Advanced Concepts
41. What is a Markov Chain?
A stochastic process where future states depend only on the present state.
42. What is the Law of Large Numbers?
As sample size increases, sample mean approaches population mean.
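A small simulation sketch of the Law of Large Numbers, assuming a fair six-sided die (population mean 3.5) and a fixed random seed so runs are repeatable. The sample sizes are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability


def sample_mean(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


# The sample mean tends to move closer to 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))

large_mean = sample_mean(100_000)
```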
Regression Analysis
71. What is the difference between correlation and regression?
Correlation measures association, while regression predicts outcomes.
Data Cleaning
11. How do you identify duplicates in a dataset?
- Use methods like `.duplicated()` in Pandas, or GROUP BY with HAVING COUNT(*) > 1 in SQL (the DISTINCT keyword removes duplicates rather than identifying them).
Data Visualization
21. What is a histogram?
- A histogram visualizes the frequency distribution of a variable.
Statistical Questions
31. What is a correlation matrix?
- A table showing correlation coefficients between variables.
1. AI vs ML vs DL vs DS (15 Questions)
1. What is Artificial Intelligence (AI)?
AI is the simulation of human intelligence in machines to perform cognitive tasks.
2. What is Machine Learning (ML)?
ML is a subset of AI that enables machines to learn patterns from data without
explicit programming.
15. What are the major skills required for AI, ML, and DS?
AI requires problem-solving; ML requires statistics & algorithms; DS requires Python,
SQL, and data visualization.
22. What is the main difference between Supervised and Unsupervised Learning?
Supervised learning uses labeled data, while unsupervised learning does not.
29. What is the typical split ratio for Train, Validation, and Test sets?
70-80% training, 10-15% validation, 10-15% testing.
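A plain-Python sketch of a 70/15/15 split, one choice within the ranges above. The function name, the default ratios, and the fixed seed are assumptions for illustration; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def train_val_test_split(items, train=0.70, val=0.15, seed=0):
    """Hypothetical helper: shuffle items and cut them into three parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # shuffle a copy, reproducibly
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]  # whatever remains
    return train_part, val_part, test_part


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))
```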
Feature Engineering
1. Handling Missing Values
41. How can missing values be handled?
Using deletion, imputation (mean, median, mode), or predictive modeling.
43. What is the best method for handling categorical missing values?
Mode imputation or creating a new category like "Unknown".
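A plain-Python sketch of the imputation options above, using None to mark missing values. The sample columns are made up; with Pandas the same idea is usually written with fillna().

```python
import statistics

ages = [25, None, 31, 40, None]          # numeric column with gaps
colors = ["red", None, "blue", "red"]    # categorical column with a gap

# Mean imputation for the numeric column.
observed_ages = [a for a in ages if a is not None]
mean_age = statistics.mean(observed_ages)
ages_filled = [a if a is not None else mean_age for a in ages]

# Mode imputation for the categorical column.
observed_colors = [c for c in colors if c is not None]
mode_color = statistics.mode(observed_colors)
colors_mode = [c if c is not None else mode_color for c in colors]

# Alternative: keep missingness visible as its own category.
colors_unknown = [c if c is not None else "Unknown" for c in colors]

print(ages_filled, colors_mode, colors_unknown)
```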
5. Handling Outliers
61. What are Outliers?
Data points that significantly deviate from the normal pattern of the dataset.
7. Feature Extraction
71. What is Feature Extraction?
Transforming raw data into new features that better represent patterns in data.
90. How does Unit Vector Scaling differ from Min-Max Scaling?
Min-Max rescales each feature to a fixed range such as [0, 1], while Unit Vector scaling divides each sample vector by its norm so that it has length 1, preserving its direction.
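A sketch contrasting the two scalers in plain Python. It assumes Unit Vector scaling means dividing a sample by its L2 (Euclidean) norm, and the function names and inputs are illustrative.

```python
import math


def min_max_scale(values):
    """Rescale one feature column to [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def unit_vector_scale(vector):
    """Divide one sample by its L2 norm. Assumes the vector is not all zeros."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


scaled_feature = min_max_scale([10, 20, 30])   # works down a column
scaled_sample = unit_vector_scale([3, 4])      # works across one row

print(scaled_feature, scaled_sample)
```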
107. What is Correlation?
A normalized measure of the relationship between two variables, ranging
from -1 to 1.
7. What is Multicollinearity?
A condition where independent variables are highly correlated, affecting model
stability.
3. Polynomial Regression
11. What is Polynomial Regression?
A regression technique where the relationship between variables is modeled as an
nth-degree polynomial.
22. Why is Root Mean Squared Error (RMSE) preferred over MSE?
RMSE is in the same unit as the target variable, making it more interpretable.
23. How does Mean Absolute Error (MAE) differ from MSE?
MAE calculates the average absolute errors, while MSE squares the errors.
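A worked sketch of the three error metrics on a tiny made-up example. Squaring makes the single large error dominate MSE, and taking the square root brings RMSE back to the target's units.

```python
import math

actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 10.0]

errors = [a - p for a, p in zip(actual, predicted)]   # 1.0, 0.0, -3.0

mae = sum(abs(e) for e in errors) / len(errors)       # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)       # mean squared error
rmse = math.sqrt(mse)                                  # root mean squared error

print(mae, mse, rmse)
```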
8. Ridge Regression
36. What is Ridge Regression?
A regression technique that applies L2 regularization to reduce overfitting.
9. Lasso Regression
41. What is Lasso Regression?
A regression technique that applies L1 regularization to perform feature selection.
44. What happens if the alpha value is too high in Lasso Regression?
The model can underfit by eliminating too many features.
48. How does Elastic Net differ from Ridge and Lasso?
It combines the L1 (Lasso) and L2 (Ridge) penalties, which mitigates limitations such as
Lasso keeping only one feature from a group of correlated features.
15. Boosting
71. What is Boosting in Machine Learning?
A technique that sequentially trains models, with each model correcting the errors of
the previous one.
83. What is the difference between Stationary and Non-Stationary Time Series?
A stationary series has a constant mean, variance, and autocorrelation structure over time,
while a non-stationary series does not.
84. What are ACF and PACF used for in Time Series Analysis?
ACF (Autocorrelation Function) checks correlation at different lags, while PACF
(Partial ACF) isolates direct correlations.
108. What is F1-Score?
The harmonic mean of Precision and Recall: $F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$.
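A direct sketch of the F1 formula. Returning 0.0 when both inputs are zero is an assumed convention to avoid dividing by zero, and the sample values are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)   # equal inputs give the same value back
skewed = f1_score(0.5, 1.0)     # pulled toward the lower of the two

print(balanced, skewed)
```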
4. Activation Functions
16. What is an activation function?
A function, usually non-linear, applied to a neuron's output so that the network can model non-linear relationships.
7. Loss Functions
31. What is a loss function?
A function that quantifies the error between predicted and actual values.
8. Optimizers
36. What is an optimizer in Deep Learning?
An algorithm that updates network weights to minimize loss.
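A toy sketch of a loss function and the simplest optimizer, plain gradient descent, on a single weight. The target value 3.0, the learning rate, and the step count are arbitrary assumptions; real frameworks compute gradients automatically.

```python
def loss(w):
    """Squared error between the weight w and an illustrative target of 3.0."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0                 # initial weight
learning_rate = 0.1     # step size, chosen for illustration

for step in range(100):
    w -= learning_rate * gradient(w)   # move against the gradient

print(w, loss(w))
```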
9. TensorFlow
41. What is TensorFlow?
An open-source deep learning framework for building neural networks.
10. PyTorch
46. What is PyTorch?
A flexible deep learning library known for dynamic computation graphs.
5. What is AlexNet?
A deep CNN model that revolutionized image classification by using more layers and
GPUs for training, achieving top performance in the 2012 ImageNet competition.
YOLO can detect faces by treating face detection as an object detection problem,
using its grid-based approach to localize and classify faces.
1. Introduction to NLP
1. What is NLP (Natural Language Processing)?
NLP is a field of AI that focuses on the interaction between computers and human
language, enabling machines to understand, interpret, and generate human
language.
2. Why is NLP important?
NLP allows machines to process and analyze large amounts of natural language data,
facilitating tasks like sentiment analysis, translation, and chatbots.
2. History of NLP
6. What is the history of NLP?
NLP began in the 1950s with rule-based systems and evolved into statistical models
in the 1990s, and more recently, into deep learning-based models.
5. NLP Libraries
20. What is TextBlob?
TextBlob is a Python library for processing textual data, offering tools for part-of-
speech tagging, noun phrase extraction, sentiment analysis, and translation.
Big Data
Introduction to Big Data
7. What is Big Data?
Big Data refers to extremely large datasets that cannot be processed using traditional
data management tools due to their volume, variety, velocity, and complexity.
Spark Overview
13. What is Apache Spark?
Apache Spark is an open-source, distributed computing framework designed for fast
data processing and analytics, supporting in-memory computation and real-time data
streaming.
Data Engineering
20. What is Data Engineering?
Data Engineering involves designing, building, and maintaining systems and
infrastructure for collecting, storing, and processing large volumes of data for
analysis.
21. What is Docker?
Docker is an open-source platform for automating the deployment and management
of applications in lightweight containers that ensure consistency across different
environments.
Introduction to Power BI
1. What is Power BI?
Power BI is a business intelligence tool by Microsoft used for data visualization,
reporting, and analytics.
12. What are some common data sources for Power BI?
SQL Server, Azure, SharePoint, Excel, and Web APIs.
29. What is the difference between Star Schema and Snowflake Schema?
Star Schema has a central fact table with directly linked dimension tables, while
Snowflake Schema normalizes dimensions into multiple related tables.
36. What is the difference between a table and a matrix in Power BI?
A table displays data in a flat list of rows and columns, while a matrix works like a pivot
table, grouping data across both rows and columns and showing aggregated values.
57. How does the Smart Narrative feature work in Power BI?
Smart Narrative automatically generates text-based insights from data visualizations.
58. What are Sensitivity Labels in Power BI?
Sensitivity Labels classify and protect data by enforcing security and compliance
policies.
59. What is the difference between Power Automate and Power BI?
Power Automate automates workflows, while Power BI focuses on data visualization
and reporting.
Power BI Integration
61. How do you integrate Power BI with Azure Synapse?
Using DirectQuery or Import Mode to connect with Azure Synapse Analytics.
90. What are Materialized Views, and how do they help Power BI?
Materialized Views store precomputed query results in a relational database, so Power BI
queries that read from them run faster.