Question Bank (1&2)


QUESTION BANK FOR EDA UNIT 1

2 MARKS
1. What does EDA stand for in data science?
2. Can you recall two primary goals of Exploratory Data Analysis (EDA)?
3. Name two software tools commonly used for EDA.
4. Define data transformation in the context of EDA.
5. Mention one key benefit of merging databases in EDA.
6. What is the primary purpose of reshaping and pivoting data during EDA?
7. Give an example of a visual aid commonly used in EDA.
8. Explain the significance of EDA in the data science process.
9. Compare and contrast EDA with classical statistical analysis. How are they different?
10. How does Bayesian analysis differ from EDA in terms of data exploration?
11. Describe how data transformation techniques can improve the quality of EDA.
12. Why is it important to compare and contrast data during EDA?
13. How can visual aids, such as histograms, help in understanding data distribution?
14. Explain the concept of merging databases and its role in EDA.
15. What is the purpose of data reshaping and pivoting, and when is it typically
performed during EDA?
16. Describe one common data transformation technique used in EDA.

16 MARKS

1. What does EDA stand for in data science?
2. Name two fundamental components of EDA.
3. List two software tools commonly used for EDA.
4. Define data transformation in the context of EDA.
5. Explain the significance of EDA in the data science process.
6. Compare and contrast EDA with classical statistical analysis.
7. How does Bayesian analysis differ from EDA in terms of approach and goals?
8. Can you provide an example of a situation where EDA is more appropriate than
classical analysis?
9. Given a dataset, describe a specific EDA technique you would use to explore its
distribution.
10. Imagine you have two datasets you need to merge for analysis. What EDA
considerations should you take into account before merging them?
11. Provide an example of reshaping and pivoting data in EDA. How can this help in data
exploration?
12. What are the potential challenges or limitations of using EDA in real-world data
analysis projects?
13. Compare the advantages and disadvantages of visual aids, such as histograms and
scatter plots, in EDA.
14. How can EDA uncover hidden patterns or anomalies in data that might not be evident
through classical analysis?
15. Assess the impact of data quality on the effectiveness of EDA. How can poor data
quality affect the results of EDA?
16. In a given business scenario, explain why a data scientist might choose EDA over
Bayesian analysis when trying to gain insights from a large dataset.
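
As a reference point for the Unit 1 questions on distributions, merging, and reshaping, the following is a minimal sketch in pandas. The data, the column names (`store`, `month`, `revenue`, `region`), and the choice of two bins are all invented for illustration; it is one possible way to approach these steps, not a model answer.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100.0, 120.0, 80.0, 95.0],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Exploring a distribution: summary statistics plus binned counts,
# which are the numbers a histogram would draw.
summary = sales["revenue"].describe()
bin_counts = pd.cut(sales["revenue"], bins=2).value_counts().sort_index()

# Checks before merging: is the key unique in the lookup table,
# and does every key on the left have a match on the right?
key_is_unique = regions["store"].is_unique
unmatched = set(sales["store"]) - set(regions["store"])

# Merge, asking pandas to verify the many-to-one relationship.
merged = sales.merge(regions, on="store", how="left", validate="many_to_one")

# Reshape from long to wide: one row per store, one column per month.
wide = merged.pivot(index="store", columns="month", values="revenue")
```

The wide layout makes month-to-month comparison per store a matter of reading across a row, which is the usual motivation for pivoting during exploration.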

***************************

QUESTION BANK FOR UNIT 2


2 MARKS

1. What is the primary data structure used in Pandas for handling tabular data?
2. Name two common Pandas data structures for one-dimensional data.
3. How do you access the first five rows of a DataFrame using Pandas?
4. In Pandas, what attribute is used to check the shape (number of rows and columns) of a
DataFrame?
5. What function is used to load a CSV file into a Pandas DataFrame?
6. Explain the purpose of the head() method in Pandas.
7. What is the default index for a newly created DataFrame in Pandas?
8. How do you drop a column from a DataFrame in Pandas?
9. Describe the difference between a Series and a DataFrame in Pandas.
10. Explain the concept of hierarchical indexing in Pandas. Provide an example.
11. How can you handle missing data in a Pandas DataFrame?
12. What is the difference between the concat() and merge() functions in Pandas for
combining DataFrames?
13. Describe the process of grouping data in Pandas and mention a function used for
aggregation within groups.
14. What is the purpose of a pivot table in Pandas, and how is it created?
15. How do vectorized string operations differ from regular string operations in Pandas?
16. Explain the difference between the append() and join() methods when combining
DataFrames in Pandas.
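
The basic operations these short questions refer to can be sketched as follows. The CSV text and the column names (`name`, `dept`, `salary`) are made up for illustration, and mean imputation is only one of several possible strategies for missing values.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,dept,salary\nAsha,HR,50000\nRavi,IT,\nMeena,IT,70000\n"
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()        # first five rows by default (only three exist here)
n_rows, n_cols = df.shape     # shape is an attribute, so no parentheses
default_index = df.index      # a RangeIndex starting at 0

salary = df["salary"]         # selecting one column yields a Series

# Missing data: fill the empty salary with the column mean (NaN is skipped).
filled = df.fillna({"salary": df["salary"].mean()})

# Dropping a column returns a new DataFrame; df itself is unchanged.
no_dept = df.drop(columns=["dept"])

# Grouping with an aggregation function applied within each group.
mean_by_dept = filled.groupby("dept")["salary"].mean()

# Vectorized string operation: applied to every element without a Python loop.
upper_names = df["name"].str.upper()
```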

16 MARKS

1. Define what Pandas is and explain its significance in data manipulation.
2. List and briefly describe the primary Pandas objects used for data manipulation.
3. Recall the purpose of data indexing in Pandas and how it facilitates data selection.
4. Name two common methods for handling missing data in Pandas.
5. Explain the concept of hierarchical indexing in Pandas. Provide an example to
illustrate its use.
6. Differentiate between concatenation and merging of datasets in Pandas. When would
you use each?
7. Describe how vectorized string operations work in Pandas. Give an example of a
practical use case.
8. Why is it important to use the appropriate aggregation functions when grouping data
in Pandas? Provide an example.
9. Given a dataset, write code in Pandas to perform a left join between two DataFrames
using a common key.
10. Create a Pandas DataFrame with hierarchical indexing and demonstrate how to select
data from specific levels.
11. Given a dataset with missing values, apply suitable Pandas methods to fill in missing
data based on a chosen strategy.
12. Using Pandas, create a pivot table from a dataset, and explain the steps involved.
13. Given a real-world dataset, describe a scenario where hierarchical indexing would be
particularly useful for data analysis.
14. Analyze a dataset using Pandas to find the mean, median, and standard deviation of a
specific numeric column. Interpret the results.
15. Compare and contrast the benefits and drawbacks of using the "concat" and "merge"
methods in Pandas for combining datasets.
16. Given a dataset containing text data, perform text preprocessing and analysis using
Pandas' vectorized string operations to extract meaningful insights.
17. Evaluate the impact of missing data on the results of a statistical analysis. Discuss
strategies to handle missing data effectively using Pandas.
18. Compare and contrast the "append" and "concat" methods in Pandas for combining
DataFrames. When would you choose one method over the other?
19. Critically analyze a case study where hierarchical indexing in Pandas was employed
to solve a complex data analysis problem. What were the key benefits of using
hierarchical indexing in this context?
20. Design a Pandas workflow to merge and clean two separate datasets with different
structures and create a single cohesive DataFrame for further analysis.
21. Create a step-by-step guide on how to perform a pivot table operation in Pandas,
including data preparation, indexing, aggregation, and visualization of results.
22. Develop a custom function in Pandas that automates the process of handling missing
data based on user-defined criteria. Provide an example of its usage.
23. Propose a data analysis project that leverages Pandas' capabilities for string
operations. Explain the problem statement and the expected outcomes.
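
For the coding questions above (left join, hierarchical indexing, pivot tables, missing data), a minimal sketch under stated assumptions follows. All data and column names are invented, and `fill_missing` is a hypothetical helper written for this example, not a pandas function. Note also that `DataFrame.append()` was removed in pandas 2.0, so on current versions row-wise combination is done with `pd.concat()`.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
orders = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [250, 100, 75]})
customers = pd.DataFrame({"cust_id": [1, 2], "city": ["Chennai", "Madurai"]})

# Left join on a common key: every order is kept, and cust_id 3
# gets a missing city because it has no match in customers.
left = orders.merge(customers, on="cust_id", how="left")

# concat stacks frames along an axis; it does not match rows on a key.
stacked = pd.concat([orders, orders], ignore_index=True)

# Hierarchical (multi-level) index with levels "year" and "quarter".
idx = pd.MultiIndex.from_product(
    [["2022", "2023"], ["Q1", "Q2"]], names=["year", "quarter"]
)
sales = pd.Series([10, 20, 30, 40], index=idx)
sales_2023 = sales.loc["2023"]                  # select on the outer level
q1_all_years = sales.xs("Q1", level="quarter")  # select on the inner level

# Pivot table: total amount per city (rows with a missing city are dropped).
table = left.pivot_table(index="city", values="amount", aggfunc="sum")


def fill_missing(df, strategy="mean"):
    """Hypothetical helper: handle missing values by a user-chosen strategy."""
    out = df.copy()
    if strategy == "drop":
        return out.dropna()
    num_cols = out.select_dtypes("number").columns
    if strategy == "mean":
        fill_values = out[num_cols].mean()
    elif strategy == "median":
        fill_values = out[num_cols].median()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    out[num_cols] = out[num_cols].fillna(fill_values)
    return out


example = fill_missing(pd.DataFrame({"x": [1.0, None, 3.0]}), strategy="mean")
```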
