Python Interview Q&A for Data Science and Gen AI
Generated on: June 25, 2025
Experience Level: 2+ Years
1. What are Python's key features that make it suitable for data science and Gen AI development?
Answer: Python has simple syntax, a large ecosystem of libraries (like Pandas, NumPy,
Scikit-learn, TensorFlow, PyTorch, Transformers), strong community support, and flexibility for rapid
prototyping.
2. Explain the difference between a list, tuple, set, and dictionary in Python.
Answer: Lists are ordered and mutable; tuples are ordered and immutable; sets are unordered,
mutable, and contain only unique, hashable elements; dictionaries are mutable mappings from unique,
hashable keys to values and preserve insertion order (Python 3.7+).
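A minimal sketch of the four types side by side. The variable names and values are made up for illustration.

```python
# A quick tour of the four built-in collection types.
point_list = [3, 1, 3]        # list: ordered, mutable, allows duplicates
point_list.append(2)          # in-place modification is allowed

point_tuple = (3, 1, 3)       # tuple: ordered, immutable
try:
    point_tuple[0] = 9        # item assignment raises TypeError
except TypeError:
    tuple_is_immutable = True

unique_values = {3, 1, 3}     # set: duplicates are dropped, no index access

ages = {"ana": 31}            # dict: maps unique, hashable keys to values
ages["bo"] = 27               # new keys are appended in insertion order

print(point_list, unique_values, list(ages))
```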
3. What is list comprehension? Provide an example.
Answer: List comprehension provides a concise way to create lists. Example: [x**2 for x in
range(5)] produces [0, 1, 4, 9, 16].
4. How does Python handle memory management?
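Answer: CPython frees an object as soon as its reference count drops to zero. Reference counting
alone cannot reclaim objects that refer to each other, so a cyclic garbage collector, exposed
through the 'gc' module, periodically detects and collects such reference cycles.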
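The sketch below shows why the cyclic collector is needed, assuming CPython. The Node class and the cycle_is_collected helper are hypothetical names used only for this demonstration.

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node."""

    def __init__(self):
        self.other = None


def cycle_is_collected():
    """Return (alive_before, alive_after) for an object trapped in a cycle."""
    gc.disable()                  # pause automatic collection so the demo is deterministic
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # a and b now reference each other
        probe = weakref.ref(a)    # a weak reference does not keep `a` alive
        del a, b                  # counts stay above zero because of the cycle
        alive_before = probe() is not None
        gc.collect()              # the cyclic collector finds and frees the pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        gc.enable()


print(cycle_is_collected())
```

On CPython the object survives the 'del' and disappears only after 'gc.collect()' runs.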
5. What are Python generators and how are they useful?
Answer: A generator is a function that uses the 'yield' keyword to produce values one at a time,
pausing between values instead of building the whole result in memory. This makes generators
useful for large datasets or infinite streams.
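As a small illustration, the hypothetical squares() generator below is infinite, yet only the values that are actually requested get computed.

```python
import itertools


def squares():
    """Yield 0, 1, 4, 9, ... forever, one value per request."""
    n = 0
    while True:
        yield n * n   # execution pauses here until the next value is requested
        n += 1


# islice takes the first five values without exhausting the infinite stream.
first_five = list(itertools.islice(squares(), 5))

# A generator expression feeds sum() without building an intermediate list.
total = sum(x * x for x in range(1_000))

print(first_five, total)
```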
6. Explain the use of decorators in Python.
Answer: A decorator is a callable that takes a function (or method) and returns a replacement for
it, applied with the '@name' syntax above the definition. Decorators are often used for logging,
timing, access control, and memoization.
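A minimal timing decorator might look like the sketch below. The names timed, last_elapsed, and total are illustrative choices, not a standard API.

```python
import functools
import time


def timed(func):
    """Decorator that records how long the last call to func took."""

    @functools.wraps(func)            # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None       # no call has been timed yet
    return wrapper


@timed
def total(values):
    """Sum an iterable of numbers."""
    return sum(values)


print(total(range(1000)), total.last_elapsed)
```

Using 'functools.wraps' matters in practice: without it, 'total.__name__' would report 'wrapper'.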
7. What is the difference between shallow copy and deep copy?
Answer: A shallow copy creates a new outer object but fills it with references to the same nested
objects; a deep copy recursively copies the nested objects as well. Use 'copy.copy()' for shallow
and 'copy.deepcopy()' for deep copies.
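The difference only shows up with nested, mutable data, as in this short sketch (the list contents are arbitrary):

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)             # mutate a nested list
original.append([5])               # mutate the outer list

print(shallow)  # inner change is visible, outer change is not
print(deep)     # neither change is visible
```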
8. What libraries do you use in Python for data manipulation and analysis?
Answer: Pandas, NumPy, SciPy for analysis; Matplotlib, Seaborn, Plotly for visualization;
Scikit-learn for ML; Statsmodels for statistics.
9. How do you handle missing data using Pandas?
Answer: Use 'isnull()', 'dropna()', or 'fillna()' to detect, remove, or fill missing values respectively.
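A short sketch of the three calls on a made-up DataFrame, assuming pandas and NumPy are installed. The column names and fill values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "city": ["Pune", "Delhi", None],
})

missing_per_column = df.isnull().sum()   # count of missing values in each column
complete_rows = df.dropna()              # keep only rows with no missing values
filled = df.fillna({                     # per-column fill values
    "age": df["age"].mean(),             # mean() skips NaN by default
    "city": "unknown",
})

print(missing_per_column)
print(filled)
```

Whether to drop or fill depends on how much data is missing and on what the downstream model can tolerate.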
10. What are groupby operations in Pandas and when do you use them?
Answer: 'groupby()' splits the data into groups, applies a function (like mean or sum), and
combines the result. Useful for aggregation and summary statistics.
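The split-apply-combine pattern can be sketched on a small, made-up sales table as follows, assuming pandas is installed.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "revenue": [100, 80, 120, 60],
})

# Split by region, apply sum and mean to revenue, combine into one table.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])

print(summary)
```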
11. Explain the difference between NumPy arrays and Python lists.
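Answer: NumPy arrays hold elements of a single data type in a contiguous block of memory, so they
support vectorized operations, are much faster for numerical work, and use less memory than Python
lists, which store references to arbitrary objects.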
12. How do you apply vectorized operations using NumPy?
Answer: You can perform operations directly on arrays without loops. Example: 'arr * 2' multiplies
each element by 2.
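For comparison, here is a brief sketch of the same doubling done with an array expression and with an explicit loop, plus a conditional update via 'np.where'. The price values are invented.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

doubled = prices * 2                                  # one expression, no Python loop
loop_version = [p * 2 for p in prices.tolist()]       # equivalent element-by-element loop

# Element-wise condition: apply a 10% discount only to prices above 15.
discounted = np.where(prices > 15, prices * 0.9, prices)

print(doubled, discounted)
```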
13. What is broadcasting in NumPy?
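Answer: Broadcasting allows operations between arrays of different shapes by virtually stretching
the smaller array, without copying its data. Shapes are compared from the trailing dimension
backwards, and two dimensions are compatible when they are equal or one of them is 1.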
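A small sketch of broadcasting a row, a column, and a vector of column means against a 2x3 matrix (the values are arbitrary):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)   -> treated as (1, 3)
col = np.array([[100], [200]])          # shape (2, 1)

plus_row = matrix + row                 # row is reused for every matrix row
plus_col = matrix + col                 # column is reused for every matrix column
centered = matrix - matrix.mean(axis=0) # subtract each column's mean, shape (3,)

print(plus_row)
print(plus_col)
print(centered)
```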
14. How do you visualize data using Matplotlib or Seaborn in Python?
Answer: Use 'plt.plot()', 'plt.hist()', 'sns.barplot()', etc. Matplotlib is low-level; Seaborn is built on top
and provides higher-level APIs.
15. How do you use Python to interact with transformer-based models like GPT or BERT?
Answer: Use libraries like Hugging Face Transformers to load models, tokenize input, and
generate predictions or embeddings.
16. What is Hugging Face Transformers library and how have you used it?
Answer: It's a Python library to work with pretrained NLP models. I've used it to build
question-answering, text generation, and RAG-based systems.
17. How do you fine-tune a pre-trained language model using Python?
Answer: Use the Hugging Face Trainer API: load a pretrained model and its tokenizer, tokenize a
custom dataset, define the training arguments, pass all of these to a Trainer, and call
'trainer.train()'.
18. What is tokenization in NLP and how is it implemented in Python?
Answer: Tokenization splits text into tokens. In Transformers, use model-specific tokenizers like
'AutoTokenizer.from_pretrained(...)'.
19. Explain the concept of embeddings and how you generate them using Python.
Answer: Embeddings are dense vector representations of text in which semantically similar inputs
lie close together. Generate them using models like BERT or Sentence Transformers via Python
libraries.
20. How do you implement RAG (Retrieval-Augmented Generation) in Python?
Answer: Embed the documents and index them in a vector store (like FAISS), retrieve the documents
most similar to the query, then pass them together with the query to a generative model (like
GPT-2).
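The retrieval and prompt-building steps can be sketched in pure Python. This is a toy under stated assumptions, not a production pipeline: embed() uses word counts as a stand-in for a real embedding model, a sorted list stands in for a vector store such as FAISS, and the generation step is left out. All function names and documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (stand-in for a vector store)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Combine retrieved context with the query; the result would go to a generative model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "pandas groupby aggregates rows by key",
    "numpy broadcasting expands array shapes",
    "generators yield values lazily",
]
prompt = build_prompt("how does numpy broadcasting work", docs)
print(prompt)
```

In a real system, the prompt would be passed to the generative model, and the embedding and indexing steps would be computed once and reused across queries.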
21. Write a Python function to check if a string is a palindrome.
Answer:
def is_palindrome(s):
    return s == s[::-1]
22. Write a Python script to read a CSV file and calculate summary statistics.
Answer: import pandas as pd
df = pd.read_csv('data.csv')
print(df.describe())
23. How do you handle exceptions in Python? Give an example.
Answer: Use try-except blocks. Example:
try:
    result = 10 / 0
except ZeroDivisionError:
    print('Cannot divide by zero')
24. Explain the difference between synchronous and asynchronous programming in Python.
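Answer: Synchronous code runs one statement at a time and blocks while waiting on I/O.
Asynchronous code uses 'async/await' with an event loop (the 'asyncio' module) so that a single
thread can switch to other tasks while one is waiting. This improves throughput for I/O-bound work
without threads; it does not speed up CPU-bound work.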
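A minimal 'asyncio' sketch follows, with 'asyncio.sleep' standing in for a network or disk wait. The fetch coroutine and the events list are hypothetical names for this example.

```python
import asyncio

events = []  # records the order in which tasks start and finish


async def fetch(name, delay):
    events.append(f"start {name}")
    await asyncio.sleep(delay)      # yields control to the event loop while "waiting"
    events.append(f"end {name}")
    return name.upper()


async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fetch("a", 0.03),
        fetch("b", 0.02),
        fetch("c", 0.01),
    )


results = asyncio.run(main())
print(results)
print(events)
```

All three tasks start before any of them finishes, which is the behavior that distinguishes this from three sequential calls.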
25. What are Python's data classes and where would you use them?
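Answer: Data classes (Python 3.7+) reduce boilerplate for classes that primarily store data by
generating methods such as '__init__', '__repr__', and '__eq__' from the annotated fields. Use
'@dataclass' from the 'dataclasses' module.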
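As an illustration, a hypothetical Experiment record for tracking training runs could be written like this:

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    learning_rate: float = 1e-3
    tags: list = field(default_factory=list)   # fresh list per instance, not a shared default


run_a = Experiment("baseline")
run_b = Experiment("baseline")

print(run_a)            # generated __repr__ lists every field
print(run_a == run_b)   # generated __eq__ compares field by field
```

'field(default_factory=list)' is needed because a bare mutable default such as 'tags: list = []' is rejected by '@dataclass'.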