DS Question Bank Unit-2 Part-1

The document is a question bank for a Data Science course focusing on Unit 2, Part 1. It covers topics such as handling large volumes of data, data wrangling phases, combining datasets in pandas, and challenges in merging data. Additionally, it includes practical tasks like identifying duplicates, handling missing values, and performing group calculations.
Copyright © All Rights Reserved

DATA SCIENCE

QUESTION BANK
UNIT-2
PART-1

1. Discuss the problems that arise when handling large volumes of data, and their solutions.
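As an illustration, two commonly used remedies can be sketched with pandas, assuming the data arrives as CSV. The first processes the file in chunks and aggregates as it goes. The second stores columns in compact dtypes. The in-memory CSV and the column names `category` and `amount` are hypothetical. For a real file you would pass its path instead of `io.StringIO`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file too large to load at once.
csv_text = "category,amount\n" + "\n".join(
    f"{'abc'[i % 3]},{i}" for i in range(10_000)
)

# Remedy 1: stream the file in chunks and aggregate incrementally, so only
# `chunksize` rows (plus small partial results) are held in memory at a time.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1_000):
    partials.append(chunk.groupby("category")["amount"].sum())
totals = pd.concat(partials).groupby(level=0).sum()

# Remedy 2: shrink the in-memory footprint with compact dtypes.
df = pd.read_csv(io.StringIO(csv_text))
before = df.memory_usage(deep=True).sum()
df["category"] = df["category"].astype("category")              # repeated strings -> integer codes
df["amount"] = pd.to_numeric(df["amount"], downcast="integer")  # int64 -> smallest fitting int
after = df.memory_usage(deep=True).sum()

print(totals)
print(f"memory: {before} -> {after} bytes")
```

Chunking trades one large read for many small ones. It works well for aggregations that can be combined from partial results, such as sums and counts.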

2. Define data wrangling and clearly explain its phases/steps:


a. Clean
b. Transform
c. Merge
d. Shape
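The four phases can be traced on a toy example. This is a minimal sketch that assumes pandas. The tables `sales` and `stores` and all of their values are hypothetical, and each phase uses just one representative operation out of many possible ones.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one duplicate row, one missing value, messy text.
sales = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S2", "S2"],
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "revenue": [100.0, 100.0, np.nan, 250.0, 300.0],
    "city": [" pune", " pune", "MUMBAI ", "mumbai", "Mumbai"],
})
stores = pd.DataFrame({"store": ["S1", "S2"], "manager": ["Asha", "Ravi"]})

# a. Clean: drop exact duplicate rows and fill the missing revenue with 0.
clean = sales.drop_duplicates().fillna({"revenue": 0.0})

# b. Transform: normalise text and derive a new column.
clean["city"] = clean["city"].str.strip().str.title()
clean["revenue_k"] = clean["revenue"] / 1000

# c. Merge: attach store attributes from a second table on the shared key.
merged = clean.merge(stores, on="store", how="left")

# d. Shape: pivot to one row per store and one column per month.
shaped = merged.pivot_table(index="store", columns="month",
                            values="revenue", aggfunc="sum")
print(shaped)
```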

3. How can data in pandas be combined? Discuss all the ways.
[Hint: combining and merging datasets, merging on index, concatenating along an axis,
combining data with overlap]
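The four approaches named in the hint map onto the pandas calls `pd.merge`, `DataFrame.join`, `pd.concat` and `combine_first`. The sketch below shows one call of each on small hypothetical frames. The names `left`, `right`, `a` and `b` are illustrative only.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# 1. Database-style merge on a shared column (default how="inner").
merged = pd.merge(left, right, on="key")

# 2. Merging on index: join aligns rows by their index labels.
joined = left.set_index("key").join(right.set_index("key"), how="outer")

# 3. Concatenation: stack objects along an axis (rows by default).
stacked = pd.concat([left, right], ignore_index=True)

# 4. Combining with overlap: values in `a` win; its gaps are patched from `b`.
a = pd.Series([1.0, np.nan, 3.0, np.nan], index=["p", "q", "r", "s"])
b = pd.Series([9.0, 2.0, 9.0], index=["p", "q", "s"])
patched = a.combine_first(b)

print(merged, joined, stacked, patched, sep="\n\n")
```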

4. Explain the following:


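a. How can you merge two DataFrames when one of them has a hierarchical (multi-level)
index (hierarchical index merging)?
[Hint: pd.merge(left1, right1, left_on='key', right_index=True, how='outer') merges a
single key column against an index; for a hierarchical index, pass one column per
index level, e.g. left_on=['key1', 'key2'], right_index=True]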
b. How are DataFrames that have the same or similar indexes but non-overlapping
columns combined?
[Hint: left2.join(right2, how='outer')]
c. Discuss the various arguments of the concat function.
i. Create two DataFrames: one with index values [‘a’, ‘b’] and another with index
values [‘a’, ‘b’, ‘c’, ‘d’].
ii. Concatenate them along axis=1 using an ‘inner’ join.
[Hint: pd.concat([s1, s4], axis=1, join='inner')]
d. Discuss the different join types and apply each of them to two DataFrames.
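Parts (a) to (d) can be tried out with the pandas calls the hints point to. The sketch below is a minimal one. All frames and values are invented for illustration, and names such as `lefth`, `righth`, `df1`, `left3` and `right3` are hypothetical, chosen only to echo the hints' naming style. Other `pd.concat` arguments, such as `verify_integrity` and `sort`, are not exercised here.

```python
import pandas as pd

# (a) Hierarchical index merge: list one left column per index level.
lefth = pd.DataFrame({"key1": ["Ohio", "Ohio", "Nevada"],
                      "key2": [2000, 2001, 2001],
                      "data": [1.0, 2.0, 3.0]})
righth = pd.DataFrame(
    {"event": [10.0, 20.0, 30.0]},
    index=pd.MultiIndex.from_tuples(
        [("Ohio", 2000), ("Ohio", 2001), ("Nevada", 2002)],
        names=["key1", "key2"]),
)
hier = pd.merge(lefth, righth, left_on=["key1", "key2"],
                right_index=True, how="outer")

# (b) Similar indexes, non-overlapping columns: join aligns on the index.
left2 = pd.DataFrame({"Ohio": [1.0, 3.0, 5.0], "Nevada": [2.0, 4.0, 6.0]},
                     index=["a", "c", "e"])
right2 = pd.DataFrame({"Missouri": [7.0, 9.0, 11.0, 13.0],
                       "Alabama": [8.0, 10.0, 12.0, 14.0]},
                      index=["b", "c", "d", "e"])
joined = left2.join(right2, how="outer")

# (c) concat arguments: axis, join, keys, ignore_index.
df1 = pd.DataFrame({"one": [0, 1]}, index=["a", "b"])
df2 = pd.DataFrame({"two": [5, 6, 7, 8]}, index=["a", "b", "c", "d"])
inner = pd.concat([df1, df2], axis=1, join="inner")  # only labels in both: a, b
outer = pd.concat([df1, df2], axis=1)                # default join="outer"
keyed = pd.concat([df1, df2], axis=1, keys=["first", "second"])  # MultiIndex columns
rows = pd.concat([df1, df2], ignore_index=True)      # axis=0, index renumbered

# (d) The four join types applied with pd.merge.
left3 = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right3 = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})
by_how = {how: pd.merge(left3, right3, on="key", how=how)
          for how in ["inner", "left", "right", "outer"]}
for how, result in by_how.items():
    print(how, sorted(result["key"]))
```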

5. Discuss the challenges encountered when merging and combining datasets, such as
handling missing values, duplicate entries, and performance issues. How can these
challenges be mitigated?
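Two of these challenges, duplicate keys and unmatched keys, can be surfaced and handled with existing `pd.merge` options. The sketch below covers only those two and does not address performance. It uses `validate` to make a merge fail loudly on unexpected duplicate keys and `indicator=True` to audit which rows matched. The `orders` and `customers` tables and the fill values are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

orders = pd.DataFrame({"cust_id": [1, 2, 2, 4], "amount": [50, 20, 30, 70]})
customers = pd.DataFrame({"cust_id": [1, 2, 2, 3],
                          "name": ["Ann", "Bob", "Bob", "Cy"]})

# Duplicate keys silently multiply rows; `validate` raises instead.
try:
    pd.merge(orders, customers, on="cust_id", validate="many_to_one")
except MergeError as exc:
    print("duplicate keys detected:", exc)

# Mitigation: de-duplicate the lookup table, then merge with validation on.
customers = customers.drop_duplicates(subset="cust_id")
merged = pd.merge(orders, customers, on="cust_id", how="outer",
                  validate="many_to_one", indicator=True)

# Unmatched keys become NaN; `_merge` records each row's origin
# ("both", "left_only", "right_only") so gaps can be audited, then filled.
print(merged["_merge"].value_counts())
merged["name"] = merged["name"].fillna("unknown")
merged["amount"] = merged["amount"].fillna(0)
```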
6. Load a dataset and identify duplicate records.
a) Remove them while keeping the first occurrence.
b) Replace missing values in a dataset using:
i. Mean for numerical columns
ii. Mode for categorical columns
c) Group data by multiple columns and calculate the mean for each group.
d) Compute the percentage contribution of each category in a column.
e) Count unique values in each column of a dataset.
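Tasks a) to e) can be sketched in pandas on a small hypothetical DataFrame standing in for a loaded file. The sketch assumes that every non-numeric column counts as categorical. Note that `mode()` returns all tied values in sorted order, so `.iloc[0]` picks the first of them. The column names `dept`, `level` and `salary` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "dept":   ["HR", "HR", "IT", "IT", "IT", "Ops", "HR"],
    "level":  ["jr", "jr", "sr", "jr", "jr", None, "jr"],
    "salary": [30.0, 30.0, 80.0, np.nan, 90.0, 40.0, 30.0],
})

# Identify duplicates: every repeat after the first occurrence is flagged.
dupes = df[df.duplicated()]

# a) Remove duplicates, keeping the first occurrence.
df = df.drop_duplicates(keep="first").reset_index(drop=True)

# b) Fill missing values: mean for numeric columns, mode for the others.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())
    else:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

# c) Group by multiple columns and compute the mean of each group.
group_means = df.groupby(["dept", "level"])["salary"].mean()

# d) Percentage contribution of each category in a column.
pct = df["dept"].value_counts(normalize=True) * 100

# e) Number of unique values in each column.
n_unique = df.nunique()

print(dupes, group_means, pct, n_unique, sep="\n\n")
```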
