Day 2 Python Interview QnA
1. **What is Python?**
- Python is a high-level, interpreted programming language known for its readability and
simplicity. It's widely used in various fields, including data analysis.
6. **What is NumPy?**
- NumPy is a Python library for numerical computations, providing support for arrays,
matrices, and a wide range of mathematical functions.
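As a small illustration of the array support described above, the sketch below builds a 1-D array and a 2-D matrix and applies a few operations to them. The variable names and values are made up for the example.

```python
import numpy as np

# A 1-D array and a 2-D array (matrix); the values are arbitrary
values = np.array([1.0, 2.0, 3.0, 4.0])
matrix = np.array([[1, 2], [3, 4]])

# Arithmetic is applied element-wise, without an explicit loop
doubled = values * 2

# Mathematical functions operate on whole arrays at once
roots = np.sqrt(values)

# The @ operator computes a matrix product
product = matrix @ matrix

print(doubled)
print(roots)
print(product)
```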
21. **What libraries are commonly used for data visualization in Python?**
- Common libraries include Matplotlib, Seaborn, and Plotly.
22. **How do you create a simple line plot using Matplotlib?**
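- Pass the x and y values to `plt.plot`, then call `plt.show()` to display the figure:
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]    # example data
y = [1, 4, 9, 16]
plt.plot(x, y)
plt.show()
```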
30. **What is the difference between deep copy and shallow copy?**
- A shallow copy creates a new object but inserts references into it to the objects found in the
original. A deep copy creates a new object and recursively adds copies of nested objects found
in the original.
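The difference shows up as soon as a nested object is mutated. The following sketch uses the standard library `copy` module on a made-up nested list; the variable names are illustrative only.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)     # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

# Mutate a nested list through the original
original[0].append(99)

print(shallow[0])  # [1, 2, 99]  (shares the inner list with the original)
print(deep[0])     # [1, 2]      (has its own independent inner list)

# Appending to the outer list affects neither copy,
# because both copies have their own outer list
original.append([5, 6])
print(len(shallow), len(deep))  # 2 2
```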
### Data Analytics Concepts
**Question:** Write a function that takes a DataFrame and a column name, and returns the
mean of that column.
```python
import pandas as pd

def mean_of_column(df, column_name):
    return df[column_name].mean()

# Example usage
data = {'A': [1, 2, 3, 4], 'B': [5, 6, None, 8]}
df = pd.DataFrame(data)
print(mean_of_column(df, 'A'))  # Output: 2.5
```
**Question:** Write a function to filter rows in a DataFrame where a specified column’s values
are greater than a given threshold.
```python
import pandas as pd

def filter_above_threshold(df, column_name, threshold):
    # Boolean mask keeps only the rows whose value exceeds the threshold
    return df[df[column_name] > threshold]

# Example usage
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(filter_above_threshold(df, 'A', 2))  # Rows where A is 3 and 4
```
**Question:** Write a function that returns the sum of values in a specific column grouped by
another column.
```python
import pandas as pd

def sum_grouped_by(df, group_column, sum_column):
    return df.groupby(group_column)[sum_column].sum()

# Example usage
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [1, 2, 3, 4]}
df = pd.DataFrame(data)
print(sum_grouped_by(df, 'Category', 'Values'))  # Output: A 4, B 6
```
**Question:** Write a function that replaces missing values in a DataFrame with the mean of
their respective columns.
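```python
import pandas as pd

def fill_missing_with_mean(df):
    # numeric_only=True skips non-numeric columns, which have no mean
    return df.fillna(df.mean(numeric_only=True))

# Example usage
data = {'A': [1, None, 3], 'B': [None, 2, 3]}
df = pd.DataFrame(data)
print(fill_missing_with_mean(df))  # A's NaN becomes 2.0, B's NaN becomes 2.5
```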
**Question:** Write code to create a bar plot of the average values of a column grouped by
another column.
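```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_average_bar(df, group_column, value_column):
    # One possible implementation: average per group, then one bar per group
    df.groupby(group_column)[value_column].mean().plot(kind='bar')
    plt.show()

# Example usage
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [1, 2, 3, 4]}
df = pd.DataFrame(data)
plot_average_bar(df, 'Category', 'Values')
```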
**Question:** Write a function that detects outliers in a DataFrame column using the IQR
method.
```python
import pandas as pd

def detect_outliers_iqr(df, column_name):
    Q1 = df[column_name].quantile(0.25)
    Q3 = df[column_name].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Keep only the rows that fall outside [lower_bound, upper_bound]
    return df[(df[column_name] < lower_bound) | (df[column_name] > upper_bound)]

# Example usage
data = {'Values': [1, 2, 3, 4, 100]}
df = pd.DataFrame(data)
print(detect_outliers_iqr(df, 'Values'))  # Output: the row where Values is 100
```
**Question:** Write a function that adds a specified number of days to a date column in a
DataFrame.
```python
import pandas as pd

def add_days_to_date(df, date_column, days):
    # Note: this overwrites the column in the DataFrame that was passed in
    df[date_column] = pd.to_datetime(df[date_column]) + pd.Timedelta(days=days)
    return df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-02']}
df = pd.DataFrame(data)
print(add_days_to_date(df, 'Date', 5))  # Dates become 2023-01-06 and 2023-01-07
```
1. **What is Python?**
- Python is a high-level, interpreted programming language known for its readability and
versatility. It is widely used in data analytics, web development, automation, and more.
6. **What is Pandas?**
- Pandas is a powerful data manipulation and analysis library for Python. It provides data
structures like Series and DataFrames.
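To make the two data structures concrete, here is a minimal sketch that builds one of each. The fruit names and prices are invented for the example.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
prices = pd.Series([10.0, 12.5, 9.0], index=['apple', 'banana', 'cherry'])

# A DataFrame is a two-dimensional table made of labeled columns
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry'],
    'price': [10.0, 12.5, 9.0],
})

# Selecting a single column of a DataFrame returns a Series
price_column = df['price']

print(prices['banana'])             # value looked up by its label
print(type(price_column).__name__)  # Series
print(df.shape)                     # (rows, columns)
```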
18. **What are the differences between `loc` and `iloc` in Pandas?**
- `loc` selects by index label, while `iloc` selects by integer position. With the default integer index the two return the same row for `0`:
```python
df.loc[0]   # Row whose index label is 0
df.iloc[0]  # First row by position
```
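The distinction is easier to see when the index labels are not 0, 1, 2. The sketch below uses a made-up DataFrame with labels 10, 20, 30; the column name and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'value': [100, 200, 300]}, index=[10, 20, 30])

by_label = df.loc[10, 'value']   # row with index label 10
by_position = df.iloc[0, 0]      # first row, first column

# Label slices include both endpoints; position slices exclude the stop
label_slice = df.loc[10:20]      # labels 10 and 20
position_slice = df.iloc[0:1]    # position 0 only

# There is no row labeled 0 here, so loc raises KeyError
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_label, by_position)
print(len(label_slice), len(position_slice))
print(label_zero_exists)
```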
22. **How do you calculate the mean and standard deviation using NumPy?**
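- Use `np.mean` and `np.std`. Note that `np.std` computes the population standard deviation by default:
```python
import numpy as np

data = [1, 2, 3, 4, 5]   # example values
mean = np.mean(data)
std_dev = np.std(data)   # population std (ddof=0); pass ddof=1 for the sample std
```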
23. **What is linear regression, and how can you implement it in Python?**
- Linear regression is a method to model the relationship between a dependent variable and
one or more independent variables. It can be implemented using `scikit-learn`:
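```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # example feature matrix: one row per sample
y = [2, 4, 6, 8]           # example target values
model = LinearRegression().fit(X, y)
```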
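To show what such a fit computes, here is a rough sketch of ordinary least squares for a single independent variable. It uses NumPy's `np.linalg.lstsq` rather than scikit-learn, and the data is made up so that it follows y = 2x + 1 exactly.

```python
import numpy as np

# Made-up data that lies exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the fit includes an intercept
A = np.column_stack([x, np.ones_like(x)])

# Find the slope and intercept that minimize the squared error ||A @ coef - y||^2
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(float(slope), 6), round(float(intercept), 6))
```

With noisy real data the recovered slope and intercept would only approximate the underlying relationship.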