
Ilovepdf Merged

This document provides a comprehensive guide on top Python interview questions, covering basic to advanced topics, including OOP and coding problems. It also includes a separate section on Data Engineering interview questions, focusing on Python, AWS, PySpark, SQL, and real-world scenarios. The guide is formatted for easy conversion to PDF and offers practical examples and coding exercises.

Uploaded by

tejaswini6299
Copyright
© © All Rights Reserved

Here’s a **PDF-ready version** of the **Top Python Interview Questions with Answers and Examples**. You can copy this into a word processor (such as Microsoft Word or Google Docs) and save it as a PDF.

---

# **Top Python Interview Questions with Answers & Examples**

*(PDF Version)*

## **Table of Contents**

1. [Basic Python Questions](#basic-python-questions)

2. [Intermediate Python Questions](#intermediate-python-questions)

3. [Advanced Python Questions](#advanced-python-questions)

4. [Python OOP Questions](#python-oop-questions)

5. [Python Coding Problems](#python-coding-problems)

---

## **1. Basic Python Questions**

### **Q1. What is Python?**

Python is a **high-level, interpreted, dynamically typed** programming language known for its
**simplicity** and **readability**.

### **Q2. What are Python’s key features?**


✔ Easy-to-learn syntax

✔ Interpreted (source is compiled to bytecode and run by the interpreter, with no separate build step)

✔ Dynamically typed (no explicit variable types)

✔ Cross-platform (Windows, Linux, macOS)

✔ Supports **OOP, functional, and procedural** programming

### **Q3. Difference between `list` and `tuple`?**

| Feature | List (`[]`) | Tuple (`()`) |
|--------------|------------|-------------|
| **Mutable?** | Yes | No |
| **Speed** | Slightly slower to create; uses more memory | Slightly faster to create; uses less memory |
| **Use Case** | Dynamic data | Fixed data |

**Example:**

```python

my_list = [1, 2, 3] # Mutable

my_tuple = (1, 2, 3) # Immutable

```
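
A practical consequence of immutability is that tuples can be used as dictionary keys while lists cannot. The sketch below illustrates this; the names `coords` and `city_by_coords` are made up for the example.

```python
coords = (52.5, 13.4)

# Item assignment on a tuple raises TypeError.
try:
    coords[0] = 0.0
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# A tuple of hashable items is itself hashable, so it works as a dict key.
city_by_coords = {coords: "Berlin"}

# A list is unhashable, so using it as a key raises TypeError.
try:
    {[52.5, 13.4]: "Berlin"}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(tuple_is_mutable, list_is_hashable, city_by_coords[(52.5, 13.4)])
```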

### **Q4. What is PEP 8?**

Python’s **style guide** for clean, readable code. Key rules:

- Use **4 spaces** for indentation

- Limit lines to **79 characters**


- Use `snake_case` for variables (`my_variable`)

- Use `CamelCase` for classes (`MyClass`)

### **Q5. How is memory managed in Python?**

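- **Reference counting** (CPython's primary mechanism): each object tracks how many references point to it and is freed as soon as that count reaches zero.

- **Garbage collection**: a cyclic collector (the `gc` module) periodically reclaims groups of objects that reference each other and therefore never reach a count of zero.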
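
The two mechanisms can be observed with a small experiment. This is a sketch that assumes CPython (other implementations may free objects at different times); the `Node` class is a hypothetical placeholder, and weak references are used only as probes.

```python
import gc
import weakref


class Node:
    """Tiny placeholder class (hypothetical) that can hold a reference to a partner."""

    def __init__(self):
        self.partner = None


# 1) Reference counting: the object is freed as soon as its last reference is deleted.
a = Node()
probe_a = weakref.ref(a)
del a
freed_by_refcount = probe_a() is None

# 2) A reference cycle keeps both counts above zero, so only the cycle collector frees it.
gc.disable()  # Prevent an automatic collection from interfering with the demo
x, y = Node(), Node()
x.partner, y.partner = y, x
probe_x = weakref.ref(x)
del x, y
alive_before_collect = probe_x() is not None
gc.collect()
freed_after_collect = probe_x() is None
gc.enable()

print(freed_by_refcount, alive_before_collect, freed_after_collect)
```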

---

## **2. Intermediate Python Questions**

### **Q6. What are Python decorators?**

Decorators **modify functions** without changing their code.

**Example:**

```python

def my_decorator(func):
    def wrapper():
        print("Before function")
        func()
        print("After function")
    return wrapper


@my_decorator
def say_hello():
    print("Hello!")


say_hello()

```

**Output:**

```

Before function

Hello!

After function

```
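
The decorator above only works for functions that take no arguments. A more general sketch forwards `*args` and `**kwargs` and uses `functools.wraps` to keep the wrapped function's name and docstring. The names `log_calls` and `add` are illustrative, not part of the original example.

```python
import functools


def log_calls(func):
    """Record every call's arguments, then delegate to the wrapped function."""

    @functools.wraps(func)  # Copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b=0):
    """Return a + b."""
    return a + b


result = add(2, b=3)
print(result, add.__name__, add.calls)
```

Without `functools.wraps`, `add.__name__` would report `wrapper`, which makes debugging and introspection harder.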

### **Q7. Difference between `==` and `is`?**

- `==` → **Value equality**

- `is` → **Object identity** (both names refer to the same object in memory)

**Example:**

```python

a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a == b)  # True (same values)
print(a is b)  # True (same object)
print(a == c)  # True (same values)
print(a is c)  # False (different objects)

```

### **Q8. What are `*args` and `**kwargs`?**

- `*args` → Variable **positional** arguments (tuple).

- `**kwargs` → Variable **keyword** arguments (dict).

**Example:**

```python

def example(*args, **kwargs):
    print("Args:", args)
    print("Kwargs:", kwargs)


example(1, 2, name="Alice")

```

**Output:**

```

Args: (1, 2)

Kwargs: {'name': 'Alice'}

```
### **Q9. What is a lambda function?**

An **anonymous function** defined with `lambda`.

**Example:**

```python

square = lambda x: x ** 2

print(square(5)) # Output: 25

```

### **Q10. Explain `__init__` and `__str__`**

- `__init__` → Initializer, often called the constructor (sets up a newly created object).

- `__str__` → Human-readable string representation (used by `print()` and `str()`).

**Example:**

```python

class Person:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Person: {self.name}"


p = Person("Alice")
print(p)  # Output: Person: Alice

```

---

## **3. Advanced Python Questions**

### **Q11. What is the GIL?**

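- In CPython, the **Global Interpreter Lock (GIL)** allows **only one thread** to execute Python bytecode at a time.

- **Workaround**: Use `multiprocessing` for CPU-bound tasks. Threads remain useful for I/O-bound work because the GIL is released while a thread waits on I/O.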

### **Q12. Multithreading vs Multiprocessing?**

| Feature | Multithreading | Multiprocessing |
|--------------|---------------|----------------|
| **Memory** | Shared | Separate |
| **GIL** | Affected | Not affected |
| **Use Case** | I/O-bound | CPU-bound |
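
To illustrate the I/O-bound row, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`. The function `fake_download` is a hypothetical stand-in for a blocking network call; `time.sleep` releases the GIL, so the four waits overlap instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(task_id):
    """Stand-in for a blocking I/O call (hypothetical); sleeping releases the GIL."""
    time.sleep(0.2)
    return task_id * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, range(4)))
elapsed = time.perf_counter() - start

# Sequentially this would take about 0.8 s; with threads the waits overlap.
print(results, round(elapsed, 2))
```

For CPU-bound functions the same pattern would typically use `ProcessPoolExecutor` instead, at the cost of pickling arguments and results between processes.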

### **Q13. What are metaclasses?**

Metaclasses **define how classes are created** (default: `type`).

**Example:**

```python
class Meta(type):
    def __new__(cls, name, bases, dct):
        dct['version'] = 1.0  # Inject a class attribute at creation time
        return super().__new__(cls, name, bases, dct)


class MyClass(metaclass=Meta):
    pass


print(MyClass.version)  # Output: 1.0

```

### **Q14. What are generators?**

Generators **yield values one at a time** (memory-efficient).

**Example:**

```python

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1


for num in count_up_to(5):
    print(num)  # Prints 1, 2, 3, 4, 5 (one per line)

```
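
The memory-efficiency claim can be checked with a rough comparison. This sketch compares the container sizes reported by `sys.getsizeof` (which does not count the integers stored in the list); the `squares` generator is illustrative.

```python
import sys


def squares(n):
    """Yield i*i lazily instead of building the whole sequence up front."""
    for i in range(n):
        yield i * i


as_list = [i * i for i in range(100_000)]
as_generator = squares(100_000)

list_bytes = sys.getsizeof(as_list)             # Grows with the number of items
generator_bytes = sys.getsizeof(as_generator)   # Small and constant

total_from_generator = sum(as_generator)        # Consumes the generator once
print(list_bytes, generator_bytes, total_from_generator)
```

A generator can only be iterated once: after `sum()` has consumed it, a second pass yields nothing.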

### **Q15. What is monkey patching?**

Modifying a class **at runtime**.

**Example:**

```python

class MyClass:
    def original(self):
        return "Original"


def patched(self):
    return "Patched"


MyClass.original = patched  # Replace the method on the class at runtime

obj = MyClass()
print(obj.original())  # Output: Patched

```

---

## **4. Python OOP Questions**


### **Q16. What is inheritance?**

A class (the child) **derives attributes and methods** from another class (the parent) and can override them.

**Example:**

```python

class Animal:
    def speak(self):
        return "Sound"


class Dog(Animal):
    def speak(self):  # Overrides Animal.speak
        return "Bark"


dog = Dog()
print(dog.speak())  # Output: Bark

```
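
A child class often extends the parent's behavior rather than replacing it outright. The sketch below uses `super()` for that; the `name` and `breed` attributes and the `describe` method are made up for the illustration.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} makes a sound"


class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)  # Reuse the parent's initialization
        self.breed = breed

    def describe(self):
        base = super().describe()  # Extend, rather than replace, the parent method
        return f"{base} (a {self.breed} that barks)"


dog = Dog("Rex", "beagle")
description = dog.describe()
print(description)
```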

### **Q17. What is polymorphism?**

Different classes can expose **methods with the same name**, so the same call works on objects of either type.

**Example:**

```python
class Cat:
    def speak(self):
        return "Meow"


class Duck:
    def speak(self):
        return "Quack"


def animal_sound(animal):
    print(animal.speak())  # Works for any object with a speak() method


cat = Cat()
duck = Duck()

animal_sound(cat)   # Output: Meow
animal_sound(duck)  # Output: Quack

```

### **Q18. What is encapsulation?**

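Restricts **direct access** to data. Python has no truly private attributes: a leading `_` marks an attribute as internal by convention, and a leading `__` triggers name mangling.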

**Example:**

```python

class BankAccount:
    def __init__(self, balance):
        self.__balance = balance  # Name-mangled to _BankAccount__balance

    def get_balance(self):
        return self.__balance


account = BankAccount(1000)
print(account.get_balance())  # Output: 1000
# print(account.__balance)  # AttributeError (the name is mangled)

```
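
Name mangling discourages outside access but does not prevent it. The following sketch shows what actually happens and offers a read-only `property` as a common alternative to a getter method; the `balance` property is an addition for illustration.

```python
class BankAccount:
    def __init__(self, balance):
        self.__balance = balance  # Stored as _BankAccount__balance

    @property
    def balance(self):
        """Read-only view of the balance (no setter is defined)."""
        return self.__balance


account = BankAccount(1000)

# Outside the class, the unmangled name does not exist.
try:
    account.__balance
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still reachable, so this is a convention, not a security boundary.
mangled_value = account._BankAccount__balance

# Assigning to a property without a setter raises AttributeError.
try:
    account.balance = 5
    property_is_read_only = False
except AttributeError:
    property_is_read_only = True

print(direct_access_failed, mangled_value, property_is_read_only)
```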

---

## **5. Python Coding Problems**

### **Q19. Reverse a string**

```python

def reverse(s):
    return s[::-1]  # Slice with step -1 walks the string backwards


print(reverse("hello"))  # Output: olleh

```

### **Q20. Check if a string is a palindrome**


```python

def is_palindrome(s):
    return s == s[::-1]


print(is_palindrome("madam"))  # Output: True

```

### **Q21. Factorial of a number**

```python

def factorial(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n == 0 else n * factorial(n - 1)


print(factorial(5))  # Output: 120

```

### **Q22. Fibonacci sequence**

```python

def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        print(a, end=" ")
        a, b = b, a + b


fibonacci(5)  # Output: 0 1 1 2 3

```

### **Q23. Count character occurrences**

```python

from collections import Counter


def count_chars(s):
    return Counter(s)


print(count_chars("hello"))  # Output: Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})

```

---

## **Final Notes**

✅ **Best for interviews** (FAANG, startups, coding rounds)

✅ **Covers basics to advanced Python**

✅ **Includes OOP & real-world examples**

Happy coding! 🚀🐍

Here’s a **comprehensive PDF-ready guide** covering **Data Engineering Interview Questions** with **Python, AWS, PySpark, SQL, and real-world scenarios**, including **diagrams, coding exercises, and SQL challenges**:

---

# **Data Engineering Interview Cheat Sheet**

*(Python + AWS + PySpark + SQL + System Design)*

## **Table of Contents**

1. [Python for Data Engineering](#python-for-data-engineering)

2. [AWS for Data Engineers](#aws-for-data-engineers)

3. [PySpark Deep Dive](#pyspark-deep-dive)

4. [SQL Mastery](#sql-mastery)

5. [Real-World System Design](#real-world-system-design)

6. [Coding Exercises](#coding-exercises)

7. [AWS Architecture Diagrams](#aws-architecture-diagrams)

---

## **1. Python for Data Engineering**

### **Key Concepts**

- **ETL Pipelines**: Use `pandas` (with chunked reads), generators, and `Dask` to process large datasets.
- **Parallel Processing**: `multiprocessing` for CPU-bound tasks.

### **Example: Memory-Efficient CSV Reader**

```python

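import pandas as pd


def process(chunk):
    """Placeholder for a custom transformation (hypothetical)."""
    print(len(chunk))


# Process a large file (e.g. 1M rows) in chunks of 10k rows
for chunk in pd.read_csv('large.csv', chunksize=10000):
    process(chunk)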

```
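
The same idea works without pandas. As a sketch using only the standard library, a generator can yield fixed-size batches of rows so that only one batch is in memory at a time. The helper `read_in_batches` and the in-memory sample data are assumptions made for the example.

```python
import csv
import io
from itertools import islice


def read_in_batches(file_obj, batch_size):
    """Yield lists of at most batch_size rows (as dicts) from a CSV file object."""
    reader = csv.DictReader(file_obj)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# An in-memory file stands in for a large CSV on disk.
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")

batch_sizes = []
total = 0
for batch in read_in_batches(sample, batch_size=2):
    batch_sizes.append(len(batch))
    total += sum(int(row["amount"]) for row in batch)

print(batch_sizes, total)
```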

### **Common Libraries**

| Library | Use Case |
|--------------|----------|
| **Pandas** | Small-to-medium ETL |
| **Dask** | Out-of-core DataFrames |
| **Luigi** | Pipeline orchestration |

---

## **2. AWS for Data Engineers**

### **Core Services**


| Service | Key Feature | Interview Question |
|--------------|------------|---------------------|
| **S3** | Virtually unlimited object storage | "How would you optimize S3 for query performance?" |
| **Glue** | Serverless ETL | "Explain how the Glue Data Catalog works." |
| **Redshift** | Columnar data warehouse | "Compare Redshift vs Snowflake." |
| **EMR** | Managed Spark/Hadoop clusters | "How do you handle EMR cluster sizing?" |

### **S3 Optimization Tips**

- **Partitioning**: `s3://bucket/date=2024-01-01/`

- **File Formats**: Use **Parquet/ORC** (columnar) over CSV.

- **Lifecycle Policies**: Move old data to **S3 Glacier**.
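
The partitioning tip can be sketched as a small helper that builds Hive-style `key=value` prefixes, which query engines can use to skip irrelevant partitions. The function name and the bucket and table names below are hypothetical.

```python
from datetime import date, timedelta


def partition_prefix(bucket, table, day):
    """Build a Hive-style partition prefix such as s3://bucket/table/date=2024-01-01/."""
    return f"s3://{bucket}/{table}/date={day.isoformat()}/"


def prefixes_for_range(bucket, table, start, end):
    """List the prefixes a query for [start, end] would need to read."""
    days = (end - start).days + 1
    return [partition_prefix(bucket, table, start + timedelta(days=i)) for i in range(days)]


first = partition_prefix("my-bucket", "events", date(2024, 1, 1))
needed = prefixes_for_range("my-bucket", "events", date(2024, 1, 30), date(2024, 2, 1))
print(first)
print(needed)
```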

---

## **3. PySpark Deep Dive**

### **Optimization Techniques**

1. **Partitioning**:

```python

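# repartition returns a new DataFrame; rows are hashed on "date" into 100 partitions
df = df.repartition(100, "date")  # Balances work only if dates are evenly distributed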

```

2. **Broadcast Join**:

```python
from pyspark.sql.functions import broadcast

df_large.join(broadcast(df_small), "key")  # Ships the small table to every executor

```

3. **Caching**:

```python

df.cache() # For iterative algorithms

```

### **Common Interview Questions**

- **Q**: "How does Spark handle failures?"

**A**: Uses **RDD lineage** to recompute lost partitions.

- **Q**: "Explain `groupByKey` vs `reduceByKey`."

**A**: `reduceByKey` is usually faster because it combines the values for each key within a partition before shuffling, so less data crosses the network.
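
The effect of combining before the shuffle can be shown without a cluster. The following is a pure-Python simulation, not Spark code: the two lists stand for partitions, and the functions (hypothetical names) count how many records would cross the shuffle under each strategy.

```python
from collections import defaultdict

# Two "partitions" of (key, value) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def sum_values(records):
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def group_then_sum(parts):
    """groupByKey-style: every record is shuffled, then values are summed."""
    shuffled = [record for part in parts for record in part]
    return sum_values(shuffled), len(shuffled)


def combine_then_sum(parts):
    """reduceByKey-style: sum per key inside each partition, then shuffle the partial sums."""
    shuffled = []
    for part in parts:
        shuffled.extend(sum_values(part).items())
    return sum_values(shuffled), len(shuffled)


grouped_totals, grouped_shuffled = group_then_sum(partitions)
reduced_totals, reduced_shuffled = combine_then_sum(partitions)
print(grouped_totals, grouped_shuffled)
print(reduced_totals, reduced_shuffled)
```

Both strategies produce the same totals, but the second one shuffles one record per key per partition instead of one per input row.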

---

## **4. SQL Mastery**

### **Must-Know Concepts**

1. **Window Functions**:

```sql

SELECT user_id,
       SUM(revenue) OVER (PARTITION BY user_id) AS total_revenue
FROM sales;
```

2. **CTEs vs Subqueries**:

```sql

WITH top_users AS (
    SELECT user_id FROM users ORDER BY revenue DESC LIMIT 10
)
SELECT * FROM top_users;

```
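
Both concepts can be tried locally with Python's built-in `sqlite3` module. This is a sketch that assumes the bundled SQLite is version 3.25 or newer (required for window functions); the table contents are made-up sample data, and the CTE here aggregates per user rather than reading a `users` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 40.0), (3, 5.0)],
)

# Window function: every row keeps its detail and gains the per-user total.
window_rows = conn.execute(
    """
    SELECT user_id,
           revenue,
           SUM(revenue) OVER (PARTITION BY user_id) AS total_revenue
    FROM sales
    ORDER BY user_id, revenue
    """
).fetchall()

# CTE: name an intermediate result, then query it like a table.
top_users = conn.execute(
    """
    WITH user_totals AS (
        SELECT user_id, SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY user_id
    )
    SELECT user_id FROM user_totals ORDER BY total_revenue DESC LIMIT 2
    """
).fetchall()
conn.close()

print(window_rows)
print(top_users)
```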

### **Performance Tuning**

- **Indexes**: Create on columns used in JOIN/WHERE clauses.

- **Query Plan**: Use `EXPLAIN ANALYZE` (e.g. in PostgreSQL) to debug slowness; note that it actually executes the query.

---

## **5. Real-World System Design**

### **Scenario: Design a Clickstream Pipeline**

1. **Ingest**: Kafka/Kinesis for real-time clicks.

2. **Process**: Spark for aggregation (e.g., clicks/user).

3. **Store**: S3 (Parquet) + Redshift for analytics.

4. **Orchestrate**: Airflow to schedule daily batches.


**Diagram**:

```

[Kafka] → [Spark Streaming] → [S3] → [Glue ETL] → [Redshift]

```

### **Key Interview Questions**

- **Q**: "How would you handle late-arriving data?"

**A**: Use **watermarking** in Spark Structured Streaming.

- **Q**: "How to ensure data quality?"

**A**: Implement **Great Expectations** or unit tests.
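
The idea behind watermarking can be sketched in plain Python. This is a simplified model, not Spark's implementation: the watermark trails the latest event time seen so far by a fixed allowance, and events older than the watermark are dropped. The function name and sample events are hypothetical.

```python
from datetime import datetime, timedelta


def apply_watermark(events, allowed_lateness):
    """Split (event_time, payload) pairs, given in arrival order, into accepted and dropped."""
    max_event_time = None
    accepted, dropped = [], []
    for event_time, payload in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness
        if event_time >= watermark:
            accepted.append(payload)
        else:
            dropped.append(payload)
    return accepted, dropped


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    (t0, "a"),
    (t0 + timedelta(minutes=10), "b"),
    (t0 + timedelta(minutes=5), "c"),   # 5 minutes late: within the allowance
    (t0 + timedelta(minutes=30), "d"),
    (t0 + timedelta(minutes=12), "e"),  # 18 minutes late: behind the watermark
]
accepted, dropped = apply_watermark(events, timedelta(minutes=10))
print(accepted, dropped)
```

A larger allowance accepts more late data but forces the engine to keep aggregation state around for longer.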

---

## **6. Coding Exercises**

### **PySpark: Find Top 10 Products**

```python

from pyspark.sql import functions as F

# df_sales is assumed to have product_id and revenue columns
top_products = (
    df_sales.groupBy("product_id")
    .agg(F.sum("revenue").alias("total_revenue"))
    .orderBy(F.desc("total_revenue"))
    .limit(10)
)
```

### **SQL: Rolling 7-Day Average**

```sql

SELECT date,
       AVG(sales) OVER (
           ORDER BY date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_avg  -- assumes exactly one row per day
FROM daily_sales;

```
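
The query can be checked against a tiny dataset using `sqlite3`. This sketch assumes SQLite 3.25 or newer for window-function support; the eight days of sample sales are invented. The first six rows average over fewer than seven days because the frame is still filling up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (date TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(f"2024-01-{day:02d}", day * 10.0) for day in range(1, 9)],  # 10, 20, ..., 80
)

rolling = conn.execute(
    """
    SELECT date,
           AVG(sales) OVER (
               ORDER BY date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM daily_sales
    ORDER BY date
    """
).fetchall()
conn.close()

for row in rolling:
    print(row)
```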

---

## **7. AWS Architecture Diagrams**

### **Batch ETL Pipeline**

```mermaid

graph LR
    A[S3 Raw Zone] --> B[Glue ETL Job]
    B --> C[S3 Processed Zone]
    C --> D[Athena/Redshift]

```

### **Real-Time Pipeline**

```mermaid
graph LR
    A[Kinesis] --> B[Lambda]
    B --> C[Firehose]
    C --> D[S3]

```

---

## **Final Tips**

✅ **Python**: Focus on **generators**, `pandas`, and `Dask`.

✅ **AWS**: Know **S3, Glue, Redshift, EMR** cold.

✅ **PySpark**: Master **optimizations** (partitioning, caching).

✅ **SQL**: Practice **window functions** and **query plans**.

✅ **System Design**: Draw diagrams + mention **cost/scale tradeoffs**.


Good luck! 🚀

---
**To save as PDF**:

1. Copy this into **Google Docs/Microsoft Word**.

2. Export as **PDF**.

3. For **diagrams**, use [Mermaid Live Editor](https://mermaid.live/) or draw.io.
