
Day 2 Python Interview QnA

Uploaded by

spandushetty28


### Basic Python Questions

1. **What is Python?**
- Python is a high-level, interpreted programming language known for its readability and
simplicity. It's widely used in various fields, including data analysis.

2. How do you install Python?

- You can install Python from the official Python website or use package managers like `apt`,
`brew`, or `conda`.

3. What are lists and tuples in Python?

- Lists are mutable, ordered collections of items. Tuples are immutable, ordered collections.
Lists use square brackets (`[]`), while tuples use parentheses (`()`).

4. What are dictionaries in Python?

- Dictionaries are mutable collections of key-value pairs with unique, hashable keys. They are
defined using curly braces (`{}`) and, since Python 3.7, preserve insertion order.
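A short sketch of everyday dictionary operations; the `prices` dictionary and its keys are illustrative only.

```python
# Illustrative data: item name -> price
prices = {'apple': 0.5, 'banana': 0.25}

prices['cherry'] = 3.0           # add a new key-value pair
prices['apple'] = 0.6            # update an existing key
missing = prices.get('kiwi', 0)  # .get() returns a default instead of raising KeyError
del prices['banana']             # remove a key

# Keys are iterated in insertion order (guaranteed since Python 3.7)
for name, price in prices.items():
    print(name, price)
```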

5. How do you handle exceptions in Python?

- Use the `try` and `except` blocks to catch and handle exceptions. Optionally, you can use
`finally` for cleanup actions.

### Data Manipulation Questions

6. **What is NumPy?**
- NumPy is a Python library for numerical computations, providing support for arrays,
matrices, and a wide range of mathematical functions.

7. How do you create a NumPy array?

- Use `numpy.array()`, `numpy.zeros()`, or `numpy.ones()` functions to create arrays.
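A minimal sketch of the creation functions named above, plus `numpy.arange()` (not mentioned in the answer, added here as another common option); the values and shapes are arbitrary examples.

```python
import numpy as np

a = np.array([1, 2, 3])          # from a Python list
z = np.zeros((2, 3))             # 2x3 array filled with 0.0
o = np.ones(4, dtype=int)        # 1-D array of four integer 1s
r = np.arange(0, 10, 2)          # evenly spaced values: 0, 2, 4, 6, 8

print(a.shape, z.shape, o, r)
```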

8. What are the advantages of using Pandas?

- Pandas is excellent for data manipulation and analysis, providing DataFrame structures,
handling missing data, and easy data filtering.

9. How do you read a CSV file in Pandas?

- Use `pandas.read_csv('filename.csv')` to read a CSV file into a DataFrame.

10. How do you handle missing data in Pandas?

- Use `DataFrame.dropna()` to remove missing values or `DataFrame.fillna(value)` to replace
them with a specified value.
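A small illustration of both approaches on a made-up DataFrame; passing a dictionary to `fillna()` lets each column get its own fill value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, 5.0, 6.0]})

dropped = df.dropna()            # keep only rows with no missing values
filled = df.fillna(0)            # replace every NaN with 0
filled_by_col = df.fillna({'A': df['A'].mean(), 'B': 0})  # per-column fill values

print(dropped, filled, filled_by_col, sep='\n\n')
```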

### Data Analysis Questions

11. **What is data wrangling?**
- Data wrangling is the process of cleaning and transforming raw data into a format suitable
for analysis.

12. What is the difference between a Series and a DataFrame in Pandas?

- A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional
labeled data structure with columns that can be of different types.

13. How do you group data in Pandas?

- Use the `groupby()` method to group data based on specific columns.

14. What is a pivot table in Pandas?

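- A pivot table summarizes data by aggregating values (for example, a sum or mean) for each
combination of one or more row keys and column keys. In Pandas it is built with `pd.pivot_table()`
or `DataFrame.pivot_table()`.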
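A sketch using `pd.pivot_table()` on a hypothetical `sales` table: regions become rows, products become columns, and each cell holds the total units.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North'],
    'product': ['A', 'B', 'A', 'B', 'A'],
    'units': [10, 5, 8, 7, 2],
})

# Rows = region, columns = product, cells = total units (0 where no data)
table = pd.pivot_table(sales, index='region', columns='product',
                       values='units', aggfunc='sum', fill_value=0)
print(table)
```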

15. How do you merge two DataFrames in Pandas?

- Use `pd.merge(df1, df2, on='key_column')` to merge two DataFrames based on a common
column.

### Statistical Analysis Questions

16. What is the purpose of the `describe()` method in Pandas?

- For numeric columns, the `describe()` method returns summary statistics: count, mean, std, min,
the 25th, 50th, and 75th percentiles, and max. For object (text) columns it reports count, unique,
top, and freq.

17. How do you calculate correlation in Pandas?

- Use the `DataFrame.corr()` method to compute pairwise correlation of columns.

18. What is hypothesis testing?

- Hypothesis testing is a statistical method for deciding, based on sample data, whether there is
enough evidence to reject a null hypothesis in favor of an alternative hypothesis.

19. What are p-values?

- A p-value is the probability of observing results at least as extreme as those in the sample,
assuming the null hypothesis is true. A p-value below the chosen significance level (commonly 0.05)
is taken as evidence to reject the null hypothesis.

20. What is linear regression?

- Linear regression is a statistical method used to model the relationship between a
dependent variable and one or more independent variables.

### Data Visualization Questions

21. **What libraries are commonly used for data visualization in Python?**
- Common libraries include Matplotlib, Seaborn, and Plotly.
22. **How do you create a simple line plot using Matplotlib?**
- Use:
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]       # example x-values
y = [10, 20, 15, 25]   # example y-values
plt.plot(x, y)
plt.show()
```

23. What is Seaborn?

- Seaborn is a Python data visualization library based on Matplotlib that provides a high-level
interface for drawing attractive statistical graphics.

24. How do you create a scatter plot using Seaborn?

- Use:
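```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'column1': [1, 2, 3, 4], 'column2': [3, 5, 4, 7]})  # example data
sns.scatterplot(data=df, x='column1', y='column2')
```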

25. What is a box plot?

- A box plot is a graphical representation of the distribution of a dataset, highlighting the
median, quartiles, and potential outliers.

### Advanced Python Questions

26. What are lambda functions in Python?

- Lambda functions are small anonymous functions defined with the `lambda` keyword. They
can take any number of arguments, but their body is limited to a single expression.
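A typical use is a throwaway function passed to another function; the `people` list below is illustrative data.

```python
# Sort (name, age) pairs by age using a lambda as the sort key
people = [('Ana', 31), ('Ben', 25), ('Cy', 40)]
by_age = sorted(people, key=lambda person: person[1])

# Lambdas also pair naturally with map()
squares = list(map(lambda x: x * x, [1, 2, 3]))

# Equivalent named function, for comparison
def get_age(person):
    return person[1]

assert sorted(people, key=get_age) == by_age
```

For anything longer than one expression, or anything reused, a named `def` function is usually clearer.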

27. What is list comprehension?

- List comprehension is a concise way to create lists in Python using a single line of code.
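The sketch below builds the same list twice, once with an explicit loop and once with a comprehension of the form `[expression for item in iterable if condition]`.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: square the even numbers
evens_squared_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_squared_loop.append(n ** 2)

# Same result as a list comprehension
evens_squared = [n ** 2 for n in numbers if n % 2 == 0]
```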

28. What is the purpose of the `apply()` function in Pandas?

- The `apply()` function applies a function along an axis of a DataFrame (to each column with
`axis=0`, the default, or to each row with `axis=1`) or to each element of a Series.

29. How do you install external libraries in Python?

- Use `pip install library_name` to install external libraries.

30. **What is the difference between deep copy and shallow copy?**
- A shallow copy creates a new object but inserts references into it to the objects found in the
original. A deep copy creates a new object and recursively adds copies of nested objects found
in the original.
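The difference shows up when a nested object is mutated after copying. This sketch uses the standard `copy` module on an illustrative nested list.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)             # mutate a nested object

print(shallow[0])  # [1, 2, 99]: the shallow copy shares the inner list
print(deep[0])     # [1, 2]: the deep copy is unaffected
```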
### Data Analytics Concepts

31. What is data normalization?

- Data normalization is the process of scaling data to fit within a specific range, often [0, 1] or
[-1, 1].

32. What is feature engineering?

- Feature engineering is the process of using domain knowledge to create new features from
raw data to improve model performance.

33. What is the difference between supervised and unsupervised learning?

- Supervised learning uses labeled data to train models, while unsupervised learning finds
patterns in unlabeled data.

34. What are outliers, and how can they be detected?

- Outliers are data points that differ significantly from the rest of the data. They can be
detected using statistical methods such as Z-scores or IQR.
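A minimal Z-score sketch using only the standard library; `zscore_outliers` is a hypothetical helper, and the threshold is an assumption. With very small samples the largest possible Z-score is bounded (at most sqrt(n - 1) with the population standard deviation), so the example uses a cutoff of 2.5 rather than the common 3.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []                      # all values identical: no outliers
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]  # illustrative data
print(zscore_outliers(data, threshold=2.5))    # [50]
```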

35. What is the purpose of data validation?

- Data validation ensures that data is accurate, complete, and meets the specified criteria
before being used for analysis.

### SQL Integration Questions

36. How can you connect Python to a SQL database?

- Use libraries like `sqlite3`, `SQLAlchemy`, or `pyodbc` to connect to SQL databases.

37. What is the purpose of the `pandas.read_sql()` function?

- The `read_sql()` function is used to read SQL query results into a Pandas DataFrame.
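A self-contained sketch that connects with the standard-library `sqlite3` module and loads a query result with `pd.read_sql()`. The in-memory database and the `employees` table are illustrative; other databases would use their own driver or a SQLAlchemy engine.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database so the example needs no server or file
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)')
conn.executemany('INSERT INTO employees (name, salary) VALUES (?, ?)',
                 [('Ana', 5000.0), ('Ben', 4200.0), ('Cy', 6100.0)])
conn.commit()

# Run a query and get the result as a DataFrame
df = pd.read_sql('SELECT name, salary FROM employees WHERE salary > 4500 ORDER BY name', conn)
print(df)
conn.close()
```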

38. How do you perform a SQL join in Pandas?

- Use `pd.merge(df1, df2, on='key_column', how='inner')` to perform SQL-like joins in Pandas. The
`how` argument selects the join type: `'inner'` (the default), `'left'`, `'right'`, or `'outer'`.
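A sketch comparing three join types on two illustrative tables; customer 4 appears only in `orders`, and customers 2 and 3 have no orders.

```python
import pandas as pd

customers = pd.DataFrame({'cust_id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cy']})
orders = pd.DataFrame({'cust_id': [1, 1, 4], 'amount': [20, 35, 50]})

inner = pd.merge(customers, orders, on='cust_id', how='inner')  # only keys present in both
left = pd.merge(customers, orders, on='cust_id', how='left')    # every customer, NaN if no order
outer = pd.merge(customers, orders, on='cust_id', how='outer')  # keys from either side

print(inner, left, outer, sep='\n\n')
```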

39. What is a primary key in a database?

- A primary key is a column (or set of columns) whose value uniquely identifies each record in a
table: no two records can share the same primary key value, and it cannot be NULL.

40. What is a foreign key?

- A foreign key is a column (or set of columns) in one table that references the primary key (or
another unique key) of another table, establishing a relationship between the two.
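Both constraints can be seen in a small SQLite sketch using the standard-library `sqlite3` module. The `departments` and `employees` tables are illustrative. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # SQLite enforces foreign keys only when enabled
conn.execute('CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('''CREATE TABLE employees (
                    emp_id  INTEGER PRIMARY KEY,
                    name    TEXT,
                    dept_id INTEGER REFERENCES departments(dept_id))''')
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (10, 'Ana', 1)")  # OK: department 1 exists

try:
    conn.execute("INSERT INTO employees VALUES (11, 'Ben', 99)")  # no department 99
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True  # rejected: the foreign key points at a missing row

n_employees = conn.execute('SELECT COUNT(*) FROM employees').fetchone()[0]
conn.close()
```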

### Machine Learning Questions

41. **What is the purpose of the `train_test_split()` function?**
- The `train_test_split()` function splits a dataset into training and testing sets to evaluate
model performance.

42. What is overfitting?

- Overfitting occurs when a model learns the training data too well, capturing noise and
fluctuations rather than the underlying trend.

43. What are decision trees?

- Decision trees are a type of supervised learning algorithm that splits data into branches
based on feature values to make predictions.

44. What is cross-validation?

- Cross-validation is a technique used to assess the performance of a model by dividing the
data into subsets and training/testing multiple times.
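A from-scratch sketch of how k-fold cross-validation splits indices; `k_fold_indices` is a hypothetical helper that uses contiguous folds without shuffling. In practice, scikit-learn's `KFold` or `cross_val_score` handle this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal, contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(n_samples=10, k=3):
    print('test fold:', test)
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training k - 1 times.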

45. What is a confusion matrix?

- A confusion matrix is a table used to evaluate the performance of a classification model by
comparing predicted and actual classifications.
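For a binary classifier the matrix has four cells. This sketch counts them directly; the labels are made-up, and the 2x2 layout (rows = actual, columns = predicted, negative class first) follows the convention of scikit-learn's `confusion_matrix` for labels `[0, 1]`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # correctly predicted positive
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # correctly predicted negative
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Rows = actual class (0, 1), columns = predicted class (0, 1)
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```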

### Data Ethics Questions

46. What is data privacy?

- Data privacy refers to the proper handling and protection of sensitive data, ensuring
individuals' rights and freedoms are respected.

47. What is bias in data analysis?

- Bias refers to systematic errors that can lead to incorrect conclusions or unfair treatment of
certain groups in data analysis.

48. How can you ensure data integrity?

- Data integrity can be ensured through validation rules, access controls, and regular audits of
data sources and processes.

49. What is GDPR?

- The General Data Protection Regulation (GDPR) is a regulation in the EU that governs data
protection and privacy, giving individuals greater control over their personal data.

50. Why is data transparency important?

- Data transparency builds trust, allows for verification of findings, and ensures accountability
in data handling and analysis.

### More Advanced Topics

51. **What is the difference between K-means and hierarchical clustering?**
- K-means: A partitioning method that divides the data into a specified number of clusters (k). It
initializes k centroids, assigns each data point to the nearest centroid, and then updates each
centroid to the mean of its assigned points. This process repeats until convergence.
- Hierarchical clustering: This method builds a tree-like structure of clusters (a dendrogram). It
can be agglomerative (bottom-up) or divisive (top-down). Agglomerative clustering starts with each
point as its own cluster and merges the most similar clusters, while divisive clustering starts
with one cluster and splits it.
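A sketch running both methods with scikit-learn on six illustrative 2-D points forming two obvious groups. `AgglomerativeClustering` is the bottom-up variant; the number of clusters (2) is an assumption supplied up front for both. To draw the dendrogram itself, `scipy.cluster.hierarchy` provides `linkage` and `dendrogram`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Two well-separated groups of 2-D points (illustrative data)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# K-means: k is chosen up front and centroids are refined iteratively
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Agglomerative (bottom-up) hierarchical clustering, tree cut at 2 clusters
agglo = AgglomerativeClustering(n_clusters=2, linkage='ward').fit(X)

print(kmeans.labels_, kmeans.cluster_centers_)
print(agglo.labels_)
```

Cluster ids are arbitrary, so the two methods may number the same groups differently.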

### Theory Questions

1. What is the difference between Python lists and arrays?

- Lists can hold different data types and are dynamic in size, while arrays (from the `numpy`
library) are fixed in size and hold homogeneous data types for better performance in numerical
computations.

2. Explain the concept of DataFrames in Pandas.

- DataFrames are two-dimensional, size-mutable, and potentially heterogeneous tabular data
structures with labeled axes (rows and columns), ideal for data manipulation and analysis.

3. What is the purpose of the `groupby()` function in Pandas?

- The `groupby()` function is used to split the data into groups based on some criteria, allowing
for operations like aggregation, transformation, or filtration.

4. How does the `apply()` function work in Pandas?

- The `apply()` function calls a function on each column (`axis=0`, the default) or each row
(`axis=1`) of a DataFrame, or on each element of a Series, enabling complex data manipulations.

5. What are some common methods to handle missing data in a dataset?

- Common methods include removing rows/columns with missing values (`dropna()`), filling
them with specific values (`fillna()`), or using interpolation methods.

### Coding Questions

#### 1. Data Manipulation

**Question:** Write a function that takes a DataFrame and a column name, and returns the
mean of that column.

```python
import pandas as pd

def mean_of_column(df, column_name):
    # Series.mean() skips NaN values by default
    return df[column_name].mean()

# Example usage
data = {'A': [1, 2, 3, 4], 'B': [5, 6, None, 8]}
df = pd.DataFrame(data)
print(mean_of_column(df, 'A'))  # Output: 2.5
```

#### 2. Filtering Data

**Question:** Write a function to filter rows in a DataFrame where a specified column’s values
are greater than a given threshold.

```python
import pandas as pd

def filter_above_threshold(df, column_name, threshold):
    # Boolean mask keeps only the rows where the condition is True
    return df[df[column_name] > threshold]

# Example usage
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(filter_above_threshold(df, 'A', 2))  # Rows where A is 3 or 4
```

#### 3. Grouping Data

**Question:** Write a function that returns the sum of values in a specific column grouped by
another column.

```python
import pandas as pd

def sum_grouped_by(df, group_column, sum_column):
    return df.groupby(group_column)[sum_column].sum()

# Example usage
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [1, 2, 3, 4]}
df = pd.DataFrame(data)
print(sum_grouped_by(df, 'Category', 'Values'))  # Output: A 4, B 6
```

#### 4. Handling Missing Values

**Question:** Write a function that replaces missing values in a DataFrame with the mean of
their respective columns.
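```python
import pandas as pd

def fill_missing_with_mean(df):
    # numeric_only=True skips non-numeric columns (needed in pandas 2.x
    # when the DataFrame also contains text columns)
    return df.fillna(df.mean(numeric_only=True))

# Example usage
data = {'A': [1, None, 3], 'B': [None, 2, 3]}
df = pd.DataFrame(data)
print(fill_missing_with_mean(df))
```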

#### 5. Data Visualization

**Question:** Write code to create a bar plot of the average values of a column grouped by
another column.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_average_bar(df, group_column, value_column):
    averages = df.groupby(group_column)[value_column].mean()
    averages.plot(kind='bar')
    plt.title(f'Average {value_column} by {group_column}')
    plt.xlabel(group_column)
    plt.ylabel(f'Average {value_column}')
    plt.show()

# Example usage
data = {'Category': ['A', 'B', 'A', 'B'], 'Values': [1, 2, 3, 4]}
df = pd.DataFrame(data)
plot_average_bar(df, 'Category', 'Values')
```

### Additional Theory Questions

6. What is the purpose of normalization and standardization in data preprocessing?

- Normalization (min-max scaling) rescales data to a fixed range such as [0, 1], while
standardization rescales data to have a mean of 0 and a standard deviation of 1.
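The rescalings can be contrasted with plain NumPy on an illustrative array. scikit-learn's `MinMaxScaler` and `StandardScaler` apply the same formulas column by column (`StandardScaler` uses the population standard deviation, as `x.std()` does here).

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Min-max normalization to [0, 1]: (x - min) / (max - min)
x_minmax = (x - x.min()) / (x.max() - x.min())

# The same idea mapped to [-1, 1]
x_sym = 2 * x_minmax - 1

# Standardization (z-score): (x - mean) / std, giving mean 0 and std 1
x_std = (x - x.mean()) / x.std()

print(x_minmax, x_sym, x_std, sep='\n')
```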

7. Explain the importance of exploratory data analysis (EDA).

- EDA is crucial for understanding data distributions, identifying patterns, detecting anomalies,
and informing feature selection for modeling.

8. What is a correlation matrix?

- A correlation matrix is a table showing correlation coefficients between variables, helping to
understand relationships and dependencies.

9. What are the benefits of using Python for data analytics?

- Python offers extensive libraries (e.g., Pandas, NumPy, Matplotlib), ease of use, community
support, and flexibility for various data manipulation tasks.

10. How do you handle categorical variables in machine learning?

- Categorical variables can be handled using encoding techniques like one-hot encoding or
label encoding to convert them into a numerical format.
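Both encodings can be produced with Pandas alone; the `color` column is illustrative. Label encoding implies an order between categories, so it suits ordinal data or tree-based models better than linear ones.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

# One-hot encoding: one 0/1 indicator column per category
one_hot = pd.get_dummies(df['color'], prefix='color', dtype=int)

# Label encoding: map each category to an integer code (alphabetical here)
df['color_code'] = df['color'].astype('category').cat.codes

print(one_hot)
print(df)
```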

### Additional Coding Challenges

#### 6. Outlier Detection

**Question:** Write a function that detects outliers in a DataFrame column using the IQR
method.

```python
def detect_outliers_iqr(df, column_name):
Q1 = df[column_name].quantile(0.25)
Q3 = df[column_name].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
return df[(df[column_name] < lower_bound) | (df[column_name] > upper_bound)]

# Example usage
data = {'Values': [1, 2, 3, 4, 100]}
df = pd.DataFrame(data)
print(detect_outliers_iqr(df, 'Values')) # Output: Rows with outliers
```

#### 7. Date and Time Manipulation

**Question:** Write a function that adds a specified number of days to a date column in a
DataFrame.

```python
def add_days_to_date(df, date_column, days):
df[date_column] = pd.to_datetime(df[date_column]) + pd.Timedelta(days=days)
return df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-02']}
df = pd.DataFrame(data)
print(add_days_to_date(df, 'Date', 5))
```

### Basic Python Questions

1. **What is Python?**
- Python is a high-level, interpreted programming language known for its readability and
versatility. It is widely used in data analytics, web development, automation, and more.

2. What are Python lists?

- Lists are mutable sequences in Python that can hold a collection of items. They are defined
using square brackets `[]`.

3. How do you create a function in Python?

- A function is defined using the `def` keyword followed by the function name and
parentheses. For example:
```python
def my_function():
    return "Hello, World!"
```

4. What are tuples in Python?

- Tuples are immutable sequences, defined using parentheses `()`, that can store a collection
of items.

5. How do you handle exceptions in Python?

- Exceptions are handled using `try` and `except` blocks:
```python
try:
    # Code that may raise an exception
    result = 10 / 0
except ZeroDivisionError as exc:
    # Code that handles the exception
    print(f"Could not divide: {exc}")
```

### Data Manipulation with Pandas

6. **What is Pandas?**
- Pandas is a powerful data manipulation and analysis library for Python. It provides data
structures like Series and DataFrames.

7. How do you read a CSV file into a Pandas DataFrame?

- Use `pd.read_csv('filename.csv')` to read a CSV file.

8. How do you filter rows in a DataFrame?

- You can filter rows using boolean indexing:
```python
filtered_df = df[df['column_name'] > value]
```

9. How do you handle missing data in Pandas?

- You can use `df.dropna()` to remove missing values or `df.fillna(value)` to fill them with a
specified value.

10. How do you group data in Pandas?

- Use the `groupby()` method:
```python
grouped = df.groupby('column_name').mean(numeric_only=True)
```

### Data Visualization

11. What libraries can be used for data visualization in Python?

- Common libraries include Matplotlib, Seaborn, and Plotly.

12. How do you create a simple line plot using Matplotlib?

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]       # example x-values
y = [10, 20, 15, 25]   # example y-values
plt.plot(x, y)
plt.show()
```

13. What is Seaborn, and how does it relate to Matplotlib?

- Seaborn is a statistical data visualization library built on top of Matplotlib, offering a high-
level interface for drawing attractive graphics.

14. How do you create a scatter plot using Seaborn?

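```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'column_x': [1, 2, 3, 4], 'column_y': [3, 5, 4, 7]})  # example data
sns.scatterplot(data=df, x='column_x', y='column_y')
```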

15. What is a histogram, and how do you create one in Python?

- A histogram is a graphical representation of the distribution of numerical data. You can
create one using:
```python
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # example values
plt.hist(data, bins=10)
plt.show()
```

### Advanced Python Questions

16. What are lambda functions in Python?

- Lambda functions are anonymous functions defined using the `lambda` keyword. They can
take any number of arguments, but their body must be a single expression.

17. How do you merge two DataFrames in Pandas?

- Use `pd.merge(df1, df2, on='column_name')`.

18. **What are the differences between `loc` and `iloc` in Pandas?**
- `loc` is label-based indexing, while `iloc` is position-based indexing. For example:
```python
df.loc[0]   # Row whose index label is 0 (the first row only with the default RangeIndex)
df.iloc[0]  # First row by position, whatever its label
```
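The difference only becomes visible when index labels and positions disagree. This sketch uses an illustrative DataFrame with a shuffled integer index, and also shows that `loc` slices include the end label while `iloc` slices exclude the end position.

```python
import pandas as pd

# Non-default index labels make label vs. position visible
df = pd.DataFrame({'score': [90, 75, 82]}, index=[2, 0, 1])

by_label = df.loc[0, 'score']    # row whose label is 0 -> 75
by_position = df.iloc[0, 0]      # first row by position -> 90

# Slicing: loc includes the end label, iloc excludes the end position
df2 = pd.DataFrame({'v': range(5)})
label_slice = df2.loc[1:3]       # labels 1, 2, 3 -> three rows
position_slice = df2.iloc[1:3]   # positions 1, 2 -> two rows
```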

19. What is a DataFrame in Pandas?

- A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns).

20. Explain the concept of "vectorization" in Python.

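- Vectorization means applying an operation to entire arrays at once instead of looping over
individual elements in Python. Libraries such as NumPy and Pandas run these operations in
optimized compiled code, which is usually much faster than an explicit Python loop.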
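A small comparison on illustrative price and quantity arrays: the loop performs one Python-level multiplication per element, while the vectorized version is a single array expression.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Loop version: one Python-level operation per element
totals_loop = [p * q for p, q in zip(prices, quantities)]

# Vectorized version: element-wise multiply of whole arrays
totals_vec = prices * quantities

grand_total = totals_vec.sum()
print(totals_vec, grand_total)
```

Both give the same numbers; the vectorized form is shorter and, on large arrays, typically far faster.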

### Statistical Analysis

21. What is NumPy?

- NumPy is a fundamental library for numerical computing in Python, providing support for
arrays, matrices, and a collection of mathematical functions.

22. **How do you calculate the mean and standard deviation using NumPy?**
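```python
import numpy as np
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # example values
mean = np.mean(data)               # 5.0
std_dev = np.std(data)             # 2.0 (population std, ddof=0 by default)
sample_std = np.std(data, ddof=1)  # sample std (divides by n - 1)
```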

23. **What is linear regression, and how can you implement it in Python?**
- Linear regression is a method to model the relationship between a dependent variable and
one or more independent variables. It can be implemented using `scikit-learn`:
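```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # one feature per sample (X must be 2-D)
y = [3, 5, 7, 9]           # here y = 2x + 1 exactly
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # approximately [2.] and 1.0
```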

24. How do you perform hypothesis testing in Python?

- You can use libraries like `SciPy` to perform various tests (e.g., t-tests, chi-square tests):
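```python
from scipy import stats

sample1 = [5.1, 4.9, 5.3, 5.0, 5.2]  # example measurements
sample2 = [5.8, 6.0, 5.7, 6.1, 5.9]
t_statistic, p_value = stats.ttest_ind(sample1, sample2)
print(t_statistic, p_value)
```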

25. What is the Central Limit Theorem?

- The Central Limit Theorem states that, for independent samples drawn from a population with
finite variance, the distribution of the sample mean approaches a normal distribution as the
sample size increases, regardless of the shape of the original distribution.
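A quick simulation sketch with the standard library: draw many samples from a skewed exponential distribution (mean 1, standard deviation 1) and look at the distribution of their means. The sample size (50) and number of repetitions (2000) are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

def sample_mean(n):
    # Mean of n draws from an exponential distribution with mean 1 and std 1
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

# CLT prediction: the means center on 1 with spread close to 1 / sqrt(n)
print(statistics.mean(means), statistics.stdev(means), 1 / n ** 0.5)
```

A histogram of `means` looks close to a bell curve even though the individual draws are strongly right-skewed.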

### SQL and Data Queries

26. How can you connect to a SQL database using Python?

- You can use libraries like `sqlite3` or `SQLAlchemy` to connect to databases.

27. What is the purpose of the `GROUP BY` clause in SQL?

- The `GROUP BY` clause groups rows that share the same values in the specified columns into
summary rows. It is typically combined with aggregate functions such as `COUNT()`, `SUM()`, or
`AVG()`.
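A runnable sketch using an in-memory SQLite database via the standard-library `sqlite3` module; the `sales` table is illustrative. The Pandas equivalent would be `df.groupby('region')['amount'].agg(['sum', 'mean'])`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (region TEXT, amount REAL)')
conn.executemany('INSERT INTO sales VALUES (?, ?)',
                 [('North', 100.0), ('South', 80.0), ('North', 50.0), ('South', 20.0)])

# One summary row per region: GROUP BY combined with aggregate functions
rows = conn.execute('''
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY region
''').fetchall()
print(rows)  # [('North', 150.0, 75.0), ('South', 100.0, 50.0)]
conn.close()
```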

28. How do you perform a SQL JOIN in Pandas?

- You can use the `merge()` function to perform SQL-like joins:
```python
result = pd.merge(df1, df2, on='key', how='inner')
```

29. What is a primary key in a database?

- A primary key is a unique identifier for a record in a table, ensuring that no two rows have
the same value in that column.

30. How do you handle SQL injections in Python?

- Use parameterized queries or ORM frameworks like SQLAlchemy to prevent SQL injection
attacks.
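The sketch below contrasts string formatting with a parameterized query in `sqlite3`; the `users` table and the malicious input are illustrative. Placeholder syntax depends on the driver (`sqlite3` uses `?` or `:name`; some other drivers use `%s`).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, role TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)', [('ana', 'admin'), ('ben', 'viewer')])

user_input = "nobody' OR '1'='1"   # a classic injection attempt

# Unsafe: string formatting splices the input into the SQL text
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()  # matches every user

# Safe: the ? placeholder passes the input as data, never as SQL
safe_rows = conn.execute('SELECT name FROM users WHERE name = ?', (user_input,)).fetchall()
conn.close()
```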

### Machine Learning Basics

31. What is the difference between supervised and unsupervised learning?

- Supervised learning uses labeled data to train models, while unsupervised learning
identifies patterns in unlabeled data.
32. **How do you split data into training and testing sets?**
- You can use `train_test_split` from `scikit-learn`:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

33. What is overfitting in machine learning?

- Overfitting occurs when a model learns the noise in the training data rather than the actual
underlying patterns, leading to poor performance on new data.

34. What are decision trees?

- Decision trees are a type of supervised learning algorithm used for classification and
regression that splits data into branches based on feature values.

35. How do you evaluate the performance of a machine learning model?

- Performance can be evaluated using metrics such as accuracy, precision, recall, F1-score,
and ROC-AUC for classification tasks, and mean squared error (MSE) for regression tasks.
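To make the formulas concrete, here is a from-scratch sketch of precision, recall, F1, and MSE with made-up counts and values. scikit-learn offers `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `mean_squared_error` in `sklearn.metrics`; ROC-AUC is not reimplemented here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)        # 0.8, 0.667, 0.727
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])  # (0.25 + 0 + 1) / 3
```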

### Data Wrangling and Transformation

36. What is data wrangling?

- Data wrangling is the process of cleaning and transforming raw data into a usable format for
analysis.

37. How do you pivot a DataFrame in Pandas?

- You can use the `pivot()` method:
```python
pivot_df = df.pivot(index='column1', columns='column2', values='column3')
```

38. What is one-hot encoding?

- One-hot encoding is a technique to convert categorical variables into a binary matrix format,
allowing algorithms to work with categorical data.

39. How do you concatenate DataFrames in Pandas?

- Use the `concat()` function:
```python
result = pd.concat([df1, df2])
```

40. How do you normalize data in Python?

- You can normalize data using the `MinMaxScaler` from `scikit-learn`:
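```python
from sklearn.preprocessing import MinMaxScaler
data = [[1.0], [5.0], [9.0]]  # 2-D input: one row per sample, one column per feature
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)  # [[0.0], [0.5], [1.0]]
```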

### Final Questions and Scenarios

41. Can you explain the importance of data visualization?

- Data visualization helps communicate insights effectively, making complex data more
understandable and facilitating decision-making.

42. How would you handle imbalanced datasets?

- Techniques include resampling (over-sampling the minority class or under-sampling the
majority class), using different evaluation metrics, and employing algorithms that handle
imbalance naturally.
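A minimal random over-sampling sketch for a binary problem; `random_oversample` is a hypothetical helper and the dataset is illustrative. The `imbalanced-learn` package provides `RandomOverSampler` for real use. Resample only the training split, so duplicated rows cannot leak into the test set.

```python
import random
from collections import Counter

random.seed(42)

# Illustrative imbalanced dataset: 8 negatives (0), 2 positives (1)
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

def random_oversample(X, y, minority=1):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_extra = (len(y) - len(minority_idx)) - len(minority_idx)
    extra = [random.choice(minority_idx) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 8, 1: 8})
```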

43. What is feature engineering, and why is it important?

- Feature engineering involves creating new features from existing data to improve model
performance.

COMP1942 Question Paper
No ratings yet
COMP1942 Question Paper
7 pages
100 Python Interview Questions
No ratings yet
100 Python Interview Questions
68 pages
Viva
No ratings yet
Viva
7 pages
Data science
No ratings yet
Data science
16 pages
Top 100 Python Interview Questions for Data Analyst
No ratings yet
Top 100 Python Interview Questions for Data Analyst
10 pages
Data Analytics Lab QA
No ratings yet
Data Analytics Lab QA
7 pages
Viva Voce
No ratings yet
Viva Voce
5 pages
Python Unit 2 Question Bank (2)
No ratings yet
Python Unit 2 Question Bank (2)
5 pages
DAL_Oral_QB
No ratings yet
DAL_Oral_QB
2 pages
UNIT 4 Data Science Notes
No ratings yet
UNIT 4 Data Science Notes
4 pages
pandas_Trick_ques
No ratings yet
pandas_Trick_ques
2 pages
Data Analysis Concepts Explanation (1)
No ratings yet
Data Analysis Concepts Explanation (1)
3 pages
Viva_Answers
No ratings yet
Viva_Answers
3 pages
Data Science Mid-II Question Bank
No ratings yet
Data Science Mid-II Question Bank
1 page
Python Libraries Questions
No ratings yet
Python Libraries Questions
3 pages
MCQ_QB
No ratings yet
MCQ_QB
2 pages
ds viva
No ratings yet
ds viva
9 pages
DS1
No ratings yet
DS1
20 pages
pandasmohali
No ratings yet
pandasmohali
6 pages
cls10datascience_24082024_113123
No ratings yet
cls10datascience_24082024_113123
4 pages
Analystics Data Cleaning Questions Interview
No ratings yet
Analystics Data Cleaning Questions Interview
8 pages
MY Question Bank
No ratings yet
MY Question Bank
3 pages
Pandas
No ratings yet
Pandas
12 pages
VIP Question Bank for DPV for Theory Exam
No ratings yet
VIP Question Bank for DPV for Theory Exam
6 pages
data science
No ratings yet
data science
10 pages
Interview Questions About Python Programming
No ratings yet
Interview Questions About Python Programming
16 pages
Interview Preparation Data Science Analyse
No ratings yet
Interview Preparation Data Science Analyse
9 pages
Python Interview Questions For Data Analytics
No ratings yet
Python Interview Questions For Data Analytics
2 pages
Data Science Interview ques.
No ratings yet
Data Science Interview ques.
141 pages
Python Interview Questions.docx
No ratings yet
Python Interview Questions.docx
23 pages
data science
No ratings yet
data science
28 pages
Python Ques
No ratings yet
Python Ques
5 pages
Sac QB 2023-2024
No ratings yet
Sac QB 2023-2024
2 pages
1742275703376
No ratings yet
1742275703376
3 pages
100_interview_questions
No ratings yet
100_interview_questions
15 pages
Python Interviews Question
No ratings yet
Python Interviews Question
47 pages
FEATURE ENGINEERING ASSIGNMENT
No ratings yet
FEATURE ENGINEERING ASSIGNMENT
7 pages
40_NumPy_and_Pandas_interview_questions_with_answers_1740141557
No ratings yet
40_NumPy_and_Pandas_interview_questions_with_answers_1740141557
6 pages
CSE445 NSU Week_3
No ratings yet
CSE445 NSU Week_3
48 pages
Data Science QnA
No ratings yet
Data Science QnA
15 pages
Python Pandas
No ratings yet
Python Pandas
15 pages
python2 materials
No ratings yet
python2 materials
27 pages
jenisha INTERNSHIP REPORT-2.docx (1)
No ratings yet
jenisha INTERNSHIP REPORT-2.docx (1)
19 pages
Python Interview Questions by Skill Arbitrage
No ratings yet
Python Interview Questions by Skill Arbitrage
3 pages
Python 1
No ratings yet
Python 1
14 pages
Top Python Questions 1735201448
No ratings yet
Top Python Questions 1735201448
25 pages
Questions
No ratings yet
Questions
4 pages
Python Numpy Pandas Interview Questions
No ratings yet
Python Numpy Pandas Interview Questions
8 pages
Da Ans (GKJ)
No ratings yet
Da Ans (GKJ)
11 pages
How To Add Pandas To Spyder?: Ans-Import Pandas As PD
No ratings yet
How To Add Pandas To Spyder?: Ans-Import Pandas As PD
3 pages
Chapter 4 - Data Science
No ratings yet
Chapter 4 - Data Science
4 pages
INTRO TO PYTHON - DATACAMP
No ratings yet
INTRO TO PYTHON - DATACAMP
10 pages
DOC-20250315-WA0005.
No ratings yet
DOC-20250315-WA0005.
29 pages
Python DA Interview Topics
No ratings yet
Python DA Interview Topics
2 pages
Course_ Introduction to Data Science (SD211105)
No ratings yet
Course_ Introduction to Data Science (SD211105)
10 pages
Data Analytics at NP IT SOLUTIONS
No ratings yet
Data Analytics at NP IT SOLUTIONS
4 pages
Phyton
No ratings yet
Phyton
11 pages
Viva Questions
No ratings yet
Viva Questions
7 pages
Machine Learning with Python: A Comprehensive Guide with a Practical Example
From Everand
Machine Learning with Python: A Comprehensive Guide with a Practical Example
MARTIN NEEL
No ratings yet
Data Driven Guide for Python Programming : Master Essentials to Advanced Data Structures
From Everand
Data Driven Guide for Python Programming : Master Essentials to Advanced Data Structures
Younes Hamdani
No ratings yet
Digital Engineering: Complex System Design
From Everand
Digital Engineering: Complex System Design
S Mathioudakis
No ratings yet
Hierarchical and Partitional Clustering
No ratings yet
Hierarchical and Partitional Clustering
3 pages
2 - Review Article - Introduction To Multivariate Analysis
No ratings yet
2 - Review Article - Introduction To Multivariate Analysis
8 pages
DM Mod5
No ratings yet
DM Mod5
49 pages
Clustering in Machine Learning
No ratings yet
Clustering in Machine Learning
7 pages
UNIT5
No ratings yet
UNIT5
60 pages
DM GTU Study Material Presentations Unit-5 21052021124400PM
No ratings yet
DM GTU Study Material Presentations Unit-5 21052021124400PM
63 pages
DM-Model Question Paper Solutions
No ratings yet
DM-Model Question Paper Solutions
27 pages
MLT Unit-1
No ratings yet
MLT Unit-1
19 pages
Instant Access to Practical Guide to Cluster Analysis in R Unsupervised Machine Learning Alboukadel Kassambara ebook Full Chapters
100% (2)
Instant Access to Practical Guide to Cluster Analysis in R Unsupervised Machine Learning Alboukadel Kassambara ebook Full Chapters
52 pages
Topic 18:: Cluster Analysis and MDS
No ratings yet
Topic 18:: Cluster Analysis and MDS
38 pages
unit-3
No ratings yet
unit-3
30 pages
Ml ppt
No ratings yet
Ml ppt
19 pages
Classification of Painting Style
No ratings yet
Classification of Painting Style
9 pages
A UAV-Swarm-Communication Model Using A Machine-Learning
No ratings yet
A UAV-Swarm-Communication Model Using A Machine-Learning
19 pages
Machine Learning ISA-2 Answer Bank
No ratings yet
Machine Learning ISA-2 Answer Bank
28 pages
Minerals 11 01178
No ratings yet
Minerals 11 01178
16 pages
Data Science Questions and Answers - Clustering
No ratings yet
Data Science Questions and Answers - Clustering
4 pages
A Diana Algoritma
No ratings yet
A Diana Algoritma
2 pages
Unit 3 Clustering Algorithm
No ratings yet
Unit 3 Clustering Algorithm
44 pages
THEORY FILE - Machine Learning (6th Sem)!!
No ratings yet
THEORY FILE - Machine Learning (6th Sem)!!
26 pages
Distance Based Models
No ratings yet
Distance Based Models
58 pages
DSML-ML09. Unsupervised Learning
No ratings yet
DSML-ML09. Unsupervised Learning
69 pages
MCQ Machine Learning
No ratings yet
MCQ Machine Learning
23 pages
UNIT 3 Data Mining
No ratings yet
UNIT 3 Data Mining
11 pages
Unit-4 new
No ratings yet
Unit-4 new
36 pages
A Simple Guide To Centroid Based Clustering (With Python Code)
No ratings yet
A Simple Guide To Centroid Based Clustering (With Python Code)
25 pages
Module -05 Machine Learning(BCS602) Search Creators
No ratings yet
Module -05 Machine Learning(BCS602) Search Creators
47 pages
Social Network Analysis Unit-3
No ratings yet
Social Network Analysis Unit-3
28 pages
Intermediate R - Cluster Analysis
33% (3)
Intermediate R - Cluster Analysis
27 pages

Day 2 Python Interview QnA

Uploaded by

Day 2 Python Interview QnA

Uploaded by

### Basic Python Questions

2. **How do you install Python?**

3. **What are lists and tuples in Python?**

4. **What are dictionaries in Python?**

5. **How do you handle exceptions in Python?**

### Data Manipulation Questions

7. **How do you create a NumPy array?**

8. **What are the advantages of using Pandas?**

9. **How do you read a CSV file in Pandas?**

10. **How do you handle missing data in Pandas?**

### Data Analysis Questions

12. **What is the difference between a Series and a DataFrame in Pandas?**

13. **How do you group data in Pandas?**

14. **What is a pivot table in Pandas?**

15. **How do you merge two DataFrames in Pandas?**

### Statistical Analysis Questions

16. **What is the purpose of the `describe()` method in Pandas?**

17. **How do you calculate correlation in Pandas?**

18. **What is hypothesis testing?**

19. **What are p-values?**

20. **What is linear regression?**

### Data Visualization Questions

23. **What is Seaborn?**

24. **How do you create a scatter plot using Seaborn?**

25. **What is a box plot?**

### Advanced Python Questions

26. **What are lambda functions in Python?**
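
A short illustration: a lambda is an anonymous function limited to a single expression. It is most often used as a small key or callback argument.

```python
# Equivalent to: def square(x): return x * x
square = lambda x: x * x
print(square(4))

# Typical use: a key function passed to sorted()
people = [("Asha", 28), ("Ravi", 35), ("Meera", 22)]
by_age = sorted(people, key=lambda person: person[1])
print(by_age)
```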

27. **What is list comprehension?**
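
A sketch comparing a loop with the equivalent list comprehension, which follows the pattern `[expression for item in iterable if condition]`.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version
squares_of_evens_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_of_evens_loop.append(n ** 2)

# Same result as a list comprehension
squares_of_evens = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_of_evens)
```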

28. **What is the purpose of the `apply()` function in Pandas?**
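
A minimal sketch of `apply()` on a column and on rows, assuming pandas is installed. The data and column names are invented.

```python
import pandas as pd

df = pd.DataFrame({"price": [100, 250, 40], "qty": [2, 1, 5]})

# DataFrame.apply with axis=1: the function receives each row as a Series
df["total"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Series.apply: the function is called on each element of one column
df["price_band"] = df["price"].apply(lambda p: "high" if p >= 200 else "low")

print(df)
```

For simple arithmetic like `total`, the vectorized form `df["price"] * df["qty"]` gives the same result and is usually much faster. `apply` is most useful when the logic cannot be expressed with built-in vectorized operations.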

29. **How do you install external libraries in Python?**

31. **What is data normalization?**

32. **What is feature engineering?**

33. **What is the difference between supervised and unsupervised learning?**

34. **What are outliers, and how can they be detected?**

35. **What is the purpose of data validation?**

### SQL Integration Questions

36. **How can you connect Python to a SQL database?**
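
A sketch using the standard-library `sqlite3` module with an in-memory database. Other databases use their own drivers (for example `psycopg2` for PostgreSQL) or SQLAlchemy, but follow a similar connect / execute / fetch pattern. The table and data are hypothetical.

```python
import sqlite3

# ":memory:" creates a temporary database; pass a file path for a persistent one
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 10.0), ("notebook", 45.5)],
)
conn.commit()

cur.execute("SELECT name, price FROM products ORDER BY price")
rows = cur.fetchall()
print(rows)
conn.close()
```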

37. **What is the purpose of the `pandas.read_sql()` function?**
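
A minimal sketch, assuming pandas is installed. `pd.read_sql` runs a query and returns the result set as a DataFrame. A `sqlite3` connection is used so the example needs no server, and the table is invented.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "IT", 80), ("Ravi", "HR", 55), ("Meera", "IT", 90)],
)

# Query parameters are passed separately via params, never formatted into the string
df = pd.read_sql(
    "SELECT name, salary FROM employees WHERE dept = ? ORDER BY salary",
    conn,
    params=("IT",),
)
print(df)
conn.close()
```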

38. **How do you perform a SQL join in Pandas?**

39. **What is a primary key in a database?**

40. **What is a foreign key?**
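
A small SQLite schema illustrating both a primary key and a foreign key, using hypothetical table names. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so that line matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Primary key: uniquely identifies each row of a table
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Foreign key: values must match an existing primary key in another table
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)
    )
""")

conn.execute("INSERT INTO departments VALUES (1, 'IT')")
conn.execute("INSERT INTO employees VALUES (100, 'Asha', 1)")  # OK: department 1 exists

try:
    conn.execute("INSERT INTO employees VALUES (101, 'Ravi', 99)")  # no department 99
    fk_rejected = False
except sqlite3.IntegrityError as exc:
    fk_rejected = True
    print("Rejected:", exc)

try:
    conn.execute("INSERT INTO departments VALUES (1, 'HR')")  # duplicate primary key
    pk_rejected = False
except sqlite3.IntegrityError as exc:
    pk_rejected = True
    print("Rejected:", exc)

conn.close()
```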

### Machine Learning Questions

42. **What is overfitting?**

43. **What are decision trees?**

44. **What is cross-validation?**
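
A minimal pure-Python sketch of k-fold cross-validation: each sample is held out exactly once, and the scores are averaged. The helper `k_fold_indices` and the "predict the training mean" model are hypothetical stand-ins. In practice, scikit-learn's `KFold` and `cross_val_score` handle this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation (no shuffling)."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Toy model: predict the mean of the training targets; score = mean absolute error
y = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9]
scores = []
for train_idx, test_idx in k_fold_indices(len(y), k=5):
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    mae = sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)
    scores.append(mae)

print([round(s, 2) for s in scores])
print("mean CV error:", round(sum(scores) / len(scores), 2))
```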

45. **What is a confusion matrix?**
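
A minimal sketch that counts the four cells of a binary confusion matrix and derives accuracy, precision and recall. The labels are invented, and `confusion_matrix_binary` is a hypothetical helper. Note that `sklearn.metrics.confusion_matrix` orders labels ascending, so for 0/1 labels its top-left cell is TN rather than TP.

```python
from collections import Counter


def confusion_matrix_binary(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(1, 1)], counts[(0, 1)], counts[(1, 0)], counts[(0, 0)]


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix_binary(y_true, y_pred)

# Layout used here: rows = actual class, columns = predicted class
print("          pred 1  pred 0")
print(f"actual 1  {tp:6}  {fn:6}")
print(f"actual 0  {fp:6}  {tn:6}")

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```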

### Data Ethics Questions

46. **What is data privacy?**

47. **What is bias in data analysis?**

48. **How can you ensure data integrity?**

49. **What is GDPR?**

50. **Why is data transparency important?**

### More Advanced Topics

### Theory Questions

1. **What is the difference between Python lists and arrays?**

2. **Explain the concept of DataFrames in Pandas.**

3. **What is the purpose of the `groupby()` function in Pandas?**

4. **How does the `apply()` function work in Pandas?**

5. **What are some common methods to handle missing data in a dataset?**

### Coding Questions

#### 1. Data Manipulation

#### 2. Filtering Data
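
A sketch of common row-filtering idioms, assuming pandas is installed and using made-up data. It shows boolean masks, `isin()` and `query()`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "age": [28, 35, 22, 41],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Boolean mask: combine conditions with & / | and wrap each one in parentheses
over_25_in_pune = df[(df["age"] > 25) & (df["city"] == "Pune")]

# isin() for membership tests
metro = df[df["city"].isin(["Delhi", "Mumbai"])]

# query() takes the condition as a string expression
older = df.query("age >= 35")

print(over_25_in_pune)
```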

#### 3. Grouping Data

#### 4. Handling Missing Values

#### 5. Data Visualization

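```python
def plot_average_bar(df, group_column, value_column):
    return df.groupby(group_column)[value_column].mean().plot(kind="bar")  # bar chart of the mean per group
```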

### Additional Theory Questions

6. **What is the purpose of normalization and standardization in data preprocessing?**
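
A plain-Python sketch contrasting the two techniques on invented values. Min-max normalization rescales data to [0, 1], while standardization (z-score) gives mean 0 and standard deviation 1. The population standard deviation is used here, matching scikit-learn's `StandardScaler`.

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Min-max normalization: (v - min) / (max - min)
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization: (v - mean) / std
mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation (ddof = 0)
standardized = [(v - mean) / std for v in values]

print(normalized)
print([round(z, 3) for z in standardized])
```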

7. **Explain the importance of exploratory data analysis (EDA).**

8. **What is a correlation matrix?**

9. **What are the benefits of using Python for data analytics?**

10. **How do you handle categorical variables in machine learning?**
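
A sketch of the two most common encodings, assuming pandas is installed and using invented columns. Unordered (nominal) categories are one-hot encoded. Ordered (ordinal) categories are mapped to integers that keep their order, and the `size_order` mapping is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "size": ["S", "L", "M", "S"],
})

# Nominal: one 0/1 indicator column per category
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Ordinal: integers that preserve the natural order S < M < L
size_order = {"S": 0, "M": 1, "L": 2}
df["size_code"] = df["size"].map(size_order)

print(one_hot)
print(df)
```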

### Additional Coding Challenges

#### 6. Outlier Detection
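
A minimal sketch of the interquartile-range (IQR) rule, assuming pandas is installed and using made-up readings. Points more than 1.5 × IQR below Q1 or above Q3 are flagged. The 1.5 factor is the conventional choice, not a fixed law.

```python
import pandas as pd

s = pd.Series([12, 14, 13, 15, 14, 13, 95, 12, 16, -40])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]

print(f"bounds: [{lower}, {upper}]")
print(outliers.tolist())
```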

#### 7. Date and Time Manipulation
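
A sketch of typical date handling in pandas, assuming pandas is installed and using invented order dates. It covers parsing strings, extracting parts with the `.dt` accessor, and doing arithmetic with `Timedelta`.

```python
import pandas as pd

df = pd.DataFrame({"order_date": ["2024-01-15", "2024-02-03", "2024-02-28"]})

# Parse strings into datetime64 values
df["order_date"] = pd.to_datetime(df["order_date"])

# The .dt accessor exposes date components
df["month"] = df["order_date"].dt.month
df["weekday"] = df["order_date"].dt.day_name()

# Date arithmetic
df["due_date"] = df["order_date"] + pd.Timedelta(days=30)
df["days_since_first"] = (df["order_date"] - df["order_date"].min()).dt.days

print(df)
```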

2. What are Python lists?

3. How do you create a function in Python?

4. What are tuples in Python?

5. How do you handle exceptions in Python?

7. How do you read a CSV file into a Pandas DataFrame?

8. How do you filter rows in a DataFrame?

9. How do you handle missing data in Pandas?

10. How do you group data in Pandas?

11. What libraries can be used for data visualization in Python?

12. How do you create a simple line plot using Matplotlib?

13. What is Seaborn, and how does it relate to Matplotlib?

14. How do you create a scatter plot using Seaborn?

15. What is a histogram, and how do you create one in Python?

16. What are lambda functions in Python?

17. How do you merge two DataFrames in Pandas?

19. What is a DataFrame in Pandas?

20. Explain the concept of "vectorization" in Python.
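
A short NumPy illustration, assuming NumPy is installed. Vectorization replaces an explicit Python loop with one expression over whole arrays, so the element-wise loop runs in compiled code instead of the interpreter.

```python
import numpy as np

prices = np.array([100.0, 250.0, 40.0, 80.0])
qty = np.array([2, 1, 5, 3])

# Loop version: Python-level iteration, one element at a time
totals_loop = []
for p, q in zip(prices, qty):
    totals_loop.append(p * q)

# Vectorized version: a single array expression
totals = prices * qty

print(totals)
```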

21. What is NumPy?

24. How do you perform hypothesis testing in Python?

25. What is the Central Limit Theorem?
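
A small simulation of the Central Limit Theorem using only the standard library. Means of samples drawn from a uniform (non-normal) population cluster around the population mean of 0.5, and their spread is close to σ/√n ≈ 0.2887/√50 ≈ 0.041. The sample sizes and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)

n = 50             # size of each sample
num_samples = 2000 # number of sample means to draw

# Population: uniform on [0, 1), mean 0.5, standard deviation 1/sqrt(12)
sample_means = [statistics.mean(random.random() for _ in range(n)) for _ in range(num_samples)]

print(round(statistics.mean(sample_means), 3))   # close to 0.5
print(round(statistics.stdev(sample_means), 3))  # close to 0.041
```

A histogram of `sample_means` would look roughly bell-shaped, even though the underlying population is flat.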

26. How can you connect to a SQL database using Python?

27. What is the purpose of the `GROUP BY` clause in SQL?
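
A sketch run through `sqlite3` so it is self-contained, using a hypothetical `orders` table. `GROUP BY` collapses rows that share a value into one row per group, with aggregates computed per group. `HAVING` then filters the groups, whereas `WHERE` filters rows before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Asha", 250), ("Ravi", 100), ("Asha", 300), ("Ravi", 50), ("Meera", 80)],
)

rows = conn.execute("""
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY total DESC
""").fetchall()

print(rows)
conn.close()
```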

28. How do you perform a SQL JOIN in Pandas?

29. What is a primary key in a database?

30. How do you handle SQL injections in Python?
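
A minimal demonstration with `sqlite3` and a hypothetical `users` table of why parameterized queries are the standard defence. When input is formatted into the SQL string it can rewrite the query. When it is passed as a parameter, the driver treats it strictly as a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("asha", "admin"), ("ravi", "viewer")])

user_input = "nobody' OR '1'='1"  # malicious input

# UNSAFE: string formatting splices the input into the SQL text
unsafe_sql = f"SELECT username FROM users WHERE username = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()  # the OR clause matches every row

# SAFE: placeholder + parameters tuple; the input is compared as a plain string
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```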

31. What is the difference between supervised and unsupervised learning?

33. What is overfitting in machine learning?