Python Pandas
Pandas is a popular open-source data manipulation and analysis library for Python. It
provides powerful data structures and tools for working with structured data, making it
an essential tool for data scientists, analysts, and developers working with tabular and
time series data. Here's a detailed overview of pandas:
Key Features:
1. Data Structures:
- Series: 1-dimensional labeled array capable of holding any data type.
- DataFrame: 2-dimensional labeled data structure with columns of potentially
  different types, akin to a spreadsheet or SQL table.
- Index: Immutable sequence used for indexing and aligning data.
2. Data Input/Output:
- Supports reading and writing data from/to various file formats such as CSV,
  Excel, SQL databases, JSON, HTML, HDF5, Parquet, and more.
- Large files can be read piece by piece (for example with the chunksize
  parameter of read_csv()) instead of being loaded all at once.
3. Data Manipulation:
- Powerful methods for selecting, filtering, and transforming data.
- Ability to handle missing data easily with methods like dropna() and fillna().
- Grouping and aggregation operations using the groupby() method.
- Merging and joining datasets with merge() and join().
4. Data Cleaning and Preparation:
- Handling duplicate data with the drop_duplicates() method.
- Reshaping and pivoting data with pivot() and pivot_table().
- Renaming columns, changing data types, and handling outliers.
5. Data Analysis:
- Descriptive statistics like mean, median, mode, standard deviation, etc.
- Time series analysis with built-in functionality for date/time indexing and
  resampling.
6. Data Visualization:
- Integration with popular visualization libraries like Matplotlib and Seaborn.
- Quick plotting of data using the plot() method directly on pandas objects.
7. Performance:
- Vectorized operations built on NumPy, which are typically much faster than
  equivalent Python loops.
- pandas works on data held in memory; datasets that do not fit in memory can be
  processed in chunks, and memory use can be reduced with compact data types
  such as category.
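One common way to keep memory use bounded when a file is too large to load at once is to read it in chunks. The following is a minimal sketch, not a tuned solution; the CSV content and the column name value are hypothetical and are built in memory so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a large file on disk.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
n_chunks = 0
# With chunksize set, read_csv returns an iterator of DataFrames holding
# at most 4 rows each, so only one chunk is in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    n_chunks += 1

print(n_chunks, total)
```

With a real file, the io.StringIO object would be replaced by the file path; the running total is just one example of an aggregate that can be built up chunk by chunk.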
Example Usage:
import pandas as pd

# Creating a DataFrame from a dictionary of columns
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)

# Writing the DataFrame to a CSV file and reading it back
df.to_csv('data.csv', index=False)
df = pd.read_csv('data.csv')

# Selecting data
ages = df['Age']                  # Selecting a single column
over_30 = df.loc[df['Age'] > 30]  # Selecting rows based on a condition

# Data manipulation
df['Age'] += 1                       # Incrementing age by 1
df['City'] = df['City'].str.upper()  # Converting city names to uppercase

# Data analysis
print(df.describe())                     # Summary statistics of numeric columns
print(df.groupby('City')['Age'].mean())  # Mean age by city

# Data visualization (requires Matplotlib to be installed)
df['Age'].plot(kind='hist')  # Histogram of ages
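The cleaning, merging, and pivoting methods listed under Key Features can be sketched in the same way. The data below is made up for illustration: the sales and countries tables and their column names are assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value and one duplicate row.
sales = pd.DataFrame({
    "City": ["Paris", "Paris", "London", "London", "London"],
    "Year": [2022, 2023, 2022, 2023, 2023],
    "Sales": [10.0, np.nan, 7.0, 9.0, 9.0],
})

sales = sales.drop_duplicates()              # drops the repeated London/2023 row
sales["Sales"] = sales["Sales"].fillna(0.0)  # replaces the missing value with 0

# Hypothetical lookup table, attached with a left join on the City column.
countries = pd.DataFrame({"City": ["Paris", "London"],
                          "Country": ["France", "UK"]})
merged = sales.merge(countries, on="City", how="left")

# One row per city, one column per year, summing Sales in each cell.
table = merged.pivot_table(index="City", columns="Year",
                           values="Sales", aggfunc="sum")
print(table)
```

Filling missing sales with 0 is only one possible choice; depending on the analysis, dropping the row with dropna() or interpolating may be more appropriate.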
Installation:
pip install pandas
Conclusion:
Pandas is an indispensable tool for data manipulation and analysis in Python. Its
intuitive syntax, powerful functionality, and excellent performance make it the go-to
choice for handling structured data in various domains, including finance, research,
machine learning, and more. Whether you're cleaning messy data, performing
complex analyses, or visualizing insights, pandas provides the tools you need to work
efficiently with your data.
MCQs
1. What is pandas?
- A) A snake species
- B) A data manipulation and analysis library in Python
- C) A type of data visualization tool
- D) A web development framework
- **Answer: B**
4. What is the primary data structure in pandas for holding 1-dimensional data?
- A) Array
- B) DataFrame
- C) Series
- D) List
- **Answer: C**
5. Which of the following is NOT specifically designed for handling missing data in pandas?
- A) `dropna()`
- B) `fillna()`
- C) `replace()`
- D) `interpolate()`
- **Answer: C**
8. Which of the following methods can be used for merging two DataFrames in pandas?
- A) `concat()`
- B) `join()`
- C) `merge()`
- D) All of the above
- **Answer: D**
10. How can you plot data directly from a pandas DataFrame?
- A) Using `matplotlib.pyplot.plot()`
- B) Using `pandas.plot()`
- C) Using `seaborn.plot()`
- D) Using `DataFrame.plot()`
- **Answer: D**
12. Which method is used for performing element-wise arithmetic operations between two DataFrames in pandas?
- A) `add()`
- B) `merge()`
- C) `combine()`
- D) `apply()`
- **Answer: A**
14. Which of the following is NOT a valid method to select data in pandas?
- A) Using square brackets `[]`
- B) Using `.loc[]`
- C) Using `.ix[]`
- D) Using `.select()`
- **Answer: D** (`.ix[]` was valid in older releases but was removed in pandas 1.0)
16. Which of the following methods is used for forward filling missing values in pandas?
- A) `ffill()`
- B) `fill_forward()`
- C) `forward_fill()`
- D) `forward_fillna()`
- **Answer: A**
20. Which of the following is NOT a valid parameter for the `read_csv()` function in pandas?
- A) `header`
- B) `rows`
- C) `index_col`
- D) `dtype`
- **Answer: B**
21. How can you change the data type of a column in a DataFrame in pandas?
- A) Using the `.astype()` method
- B) Using the `.change_dtype()` method
- C) Using the `.modify_dtype()` method
- D) Using the `.convert_dtype()` method
- **Answer: A**
22. Which of the following methods can be used to pivot data in pandas?
- A) `pivot()`
- B) `transpose()`
- C) `reshape()`
- D) `pivot_table()`
- **Answer: A and D** (`pivot()` reshapes without aggregating; `pivot_table()` also aggregates duplicate entries)
26. How can you apply a function to each element in a pandas Series?
- A) Using a for loop
- B) Using the `.apply()` method
- C) Using the `.map()` method
- D) Using the `.transform()` method
- **Answer: B** (`.map()` in option C also applies a function element-wise to a Series)
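A short sketch of element-wise application on a Series. The sample values and the squaring function are arbitrary choices for illustration; it also shows that `.map()` gives the same result as `.apply()` for a plain function.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Both calls invoke the function once per element of the Series.
squared_apply = s.apply(lambda x: x ** 2)
squared_map = s.map(lambda x: x ** 2)

print(squared_apply.tolist())
print(squared_map.equals(squared_apply))
```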
27. What does the `resample()` method in pandas do?
- A) Reshapes the DataFrame
- B) Resamples time series data
- C) Reverses the DataFrame
- D) Replaces missing values
- **Answer: B**
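As an illustration of resampling, the sketch below groups six hypothetical daily readings into 2-day bins and sums each bin. The dates and values are made up for the example.

```python
import pandas as pd

# Six hypothetical daily readings starting on 1 January 2024.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Group the readings into consecutive 2-day bins and sum each bin.
two_day = daily.resample("2D").sum()
print(two_day)
```

`resample()` requires a datetime-like index (or an `on=` column); the aggregation that follows it (`sum()`, `mean()`, and so on) decides how each bin is summarized.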
28. Which of the following methods is used to extract specific rows and columns from a DataFrame in pandas?
- A) `.iloc[]`
- B) `.loc[]`
- C) `.ix[]`
- D) All of the above
- **Answer: A and B** (`.ix[]` was removed in pandas 1.0, so "All of the above" only holds for older releases)
29. How can you perform a left join between two DataFrames in pandas?
- A) Using the `left_join()` method
- B) Using the `join()` method with `how='left'` parameter
- C) Using the `merge()` method with `how='left'` parameter
- D) Using the `concat()` method with `join='left'` parameter
- **Answer: B and C** (`join()` aligns on the index and uses `how='left'` by default; `merge()` joins on columns)
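A minimal sketch of a left join done two ways, using two small made-up tables that share a hypothetical key column. Rows of the left table without a match keep their place and receive a missing value.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b"], "y": [10, 20]})

# Column-based left join with merge().
merged = left.merge(right, on="key", how="left")

# Index-based left join with join(); both frames are indexed by key first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(merged)
print(joined)
```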
30. Which function is used to calculate the correlation between columns in a DataFrame in pandas?
- A) `correlate()`
- B) `covariance()`
- C) `corr()`
- D) `correlation()`
- **Answer: C**
31. How can you check for missing values in a DataFrame in pandas?
- A) Using the `is_missing()` method
- B) Using the `missing_values()` function
- C) Using the `isna()` or `isnull()` methods
- D) Using the `check_missing()` function
- **Answer: C**
32. Which of the following is NOT a valid parameter for the `pivot_table()` function in pandas?
- A) `index`
- B) `columns`
- C) `values`
- D) `groups`
- **Answer: D**
36. Which of the following is NOT a valid method for reindexing in pandas?
- A) `reindex()`
- B) `reset_index()`
- C) `set_index()`
- D) `index()`
- **Answer: D**
37. How can you create a new column in a DataFrame based on values from existing columns?
- A) Using the `create_column()` method
- B) Using the `add_column()` method
- C) Using assignment with square brackets `[]`
- D) Using the `insert_column()` method
- **Answer: C**
39. How can you select the first `n` rows of a DataFrame in pandas?
- A) Using the `head(n)` method
- B) Using the `first(n)` method
- C) Using the `select_first(n)` method
- D) Using the `top(n)` method
- **Answer: A**
44. How can you drop rows with missing values in a DataFrame in pandas?
- A) Using the `dropna()` method
- B) Using the `remove_missing()` method
- C) Using the `drop_missing()` method
- D) Using the `delete_missing()` method
- **Answer: A**
45. Which method is used to calculate the mean of each group in a DataFrame in pandas?
- A) `group_mean()`
- B) `mean_group()`
- C) `groupby().mean()`
- D) `aggregate_mean()`
- **Answer: C**
49. Which of the following methods can be used to fill missing values in a DataFrame in pandas?
- A) `fill()`
- B) `fill_missing()`
- C) `fill_value()`
- D) `fillna()`
- **Answer: D**
54. Which method is used to calculate the cumulative maximum of elements in a DataFrame in pandas?
- A) `cummax()`
- B) `max_cum()`
- C) `cumulative_max()`
- D) `maximum_cum()`
- **Answer: A**
55. What does the `shift()` method do with negative values in pandas?
- A) Shifts the values to the left
- B) Shifts the values to the right
- C) Shifts the index to the left
- D) Shifts the index to the right
- **Answer: A** (values move toward the start of the index; the index itself is unchanged)
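The direction of a negative shift is easy to check on a small Series. In this sketch (sample values are arbitrary), `shift(-1)` moves every value one position toward the start and leaves a missing value at the end.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

backward = s.shift(-1)  # values move toward the start of the index
forward = s.shift(1)    # values move toward the end of the index

print(backward.tolist())
print(forward.tolist())
```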
57. What does the `explode()` method do with non-list elements in pandas?
- A) Leaves them unchanged, each in its own row
- B) Removes them from the DataFrame
- C) Raises an error
- D) Converts them to missing values
- **Answer: A**
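The behavior of `explode()` on mixed content can be verified with a small sketch. The Series below is a made-up example containing a list, a scalar, and an empty list.

```python
import pandas as pd

s = pd.Series([[1, 2], 3, []])

# The list is expanded into one row per element, the scalar 3 is returned
# unchanged, and the empty list becomes a single missing value.
exploded = s.explode()
print(exploded)
```

The original index labels are repeated for the expanded rows, so the result can still be traced back to the source row.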
58. How can you calculate the median of each column in a DataFrame in pandas?
- A) Using the `median()` method
- B) Using the `agg()` method with the `'median'` parameter
- C) Using the `groupby()` method followed by the `median()` method
- D) Using the `stats()` method with the `'median'` parameter
- **Answer: A** (`agg('median')` in option B returns the same result)
59. Which method is used to calculate the minimum of each group in a DataFrame in pandas?
- A) `group_min()`
- B) `min_group()`
- C) `groupby().min()`
- D) `aggregate_min()`
- **Answer: C**
61. How can you calculate the skewness of each column in a DataFrame in pandas?
- A) Using the `skewness()` method
- B) Using the `stats()` method with the `'skew'` parameter
- C) Using the `groupby()` method followed by the `skew()` method
- D) Using the `aggregate()` method with the `'skew'` parameter
- **Answer: D**
62. What does the `min_periods` parameter in pandas `rolling()` function specify?
- A) The minimum number of elements required for the rolling window
- B) The minimum value of the rolling window
- C) The minimum period for the rolling window
- D) The minimum index value for the rolling window
- **Answer: A**
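The effect of `min_periods` can be seen by comparing two rolling means over the same data. This is a small sketch with arbitrary sample values and a window of 3.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# By default a window of 3 needs 3 observations, so the first two results are NaN.
strict = s.rolling(window=3).mean()

# With min_periods=1 a result is produced as soon as one observation is available.
relaxed = s.rolling(window=3, min_periods=1).mean()

print(strict.tolist())
print(relaxed.tolist())
```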
65. How can you calculate the kurtosis of each column in a DataFrame in pandas?
- A) Using the `kurtosis()` method
- B) Using the `stats()` method with the `'kurt'` parameter
- C) Using the `groupby()` method followed by the `kurt()` method
- D) Using the `aggregate()` method with the `'kurt'` parameter
- **Answer: A** (`kurtosis()` is an alias of `kurt()`; `aggregate('kurt')` in option D also works)
67. How can you calculate the percentile of each column in a DataFrame in pandas?
- A) Using the `quantile()` method
- B) Using the `stats()` method with the `'percentile'` parameter
- C) Using the `groupby()` method followed by the `percentile()` method
- D) Using the `aggregate()` method with the `'percentile'` parameter
- **Answer: A**
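In pandas, percentiles are computed with `quantile()`, which takes a fraction between 0 and 1 rather than a percentage. A minimal sketch on a made-up DataFrame, computing the 25th percentile of each column:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# q=0.25 requests the 25th percentile; the result is a Series indexed by column.
p25 = df.quantile(0.25)
print(p25)
```

Passing a list such as `[0.25, 0.5, 0.75]` returns one row per requested quantile.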
69. How can you calculate the percentage change of each column in a DataFrame in pandas?
- A) Using the `change()` method
- B) Using the `percentage_change()` method
- C) Using the `pct_change()` method
- D) Using the `change_percentage()` method
- **Answer: C**
71. How can you calculate the rank of each column in a DataFrame in pandas?
- A) Using the `rank()` method
- B) Using the `stats()` method with the `'rank'` parameter
- C) Using the `groupby()` method followed by the `rank()` method
- D) Using the `aggregate()` method with the `'rank'` parameter
- **Answer: A**
73. How can you calculate the rolling mean of each column in a DataFrame in pandas?
- A) Using `rolling(window).mean()`
- B) Using the `mean()` method with the `rolling=True` parameter
- C) Using the `groupby()` method followed by the `rolling_mean()` method
- D) Using the `agg()` method with the `'rolling_mean'` parameter
- **Answer: A**
77. How can you calculate the sum of each column in a DataFrame in pandas?
- A) Using the `sum()` method
- B) Using the `stats()` method with the `'sum'` parameter
- C) Using the `groupby()` method followed by the `sum()` method
- D) Using the `aggregate()` method with the `'sum'` parameter
- **Answer: A**
79. How can you calculate the variance of each column in a DataFrame in pandas?
- A) Using the `variance()` method
- B) Using the `stats()` method with the `'var'` parameter
- C) Using the `groupby()` method followed by the `var()` method
- D) Using the `aggregate()` method with the `'var'` parameter
- **Answer: D**
81. How can you calculate the covariance between columns in a DataFrame in pandas?
- A) Using the `cov()` method
- B) Using the `stats()` method with the `'cov'` parameter
- C) Using the `groupby()` method followed by the `cov()` method
- D) Using the `aggregate()` method with the `'cov'` parameter
- **Answer: A**
83. How can you calculate the cumulative product of each column in a DataFrame in pandas?
- A) Using the `cumprod()` method
- B) Using the `stats()` method with the `'cumprod'` parameter
- C) Using the `groupby()` method followed by the `cumprod()` method
- D) Using the `aggregate()` method with the `'cumprod'` parameter
- **Answer: A**
85. How can you calculate the skewness of each column in a DataFrame in pandas?
- A) Using the `skew()` method
- B) Using the `stats()` method with the `'skew'` parameter
- C) Using the `groupby()` method followed by the `skew()` method
- D) Using the `aggregate()` method with the `'skew'` parameter
- **Answer: A**
87. How can you calculate the rolling median of each column in a DataFrame in pandas?
- A) Using `rolling(window).median()`
- B) Using the `median()` method with the `rolling=True` parameter
- C) Using the `groupby()` method followed by the `rolling_median()` method
- D) Using the `agg()` method with the `'rolling_median'` parameter
- **Answer: A**
89. How can you calculate the cumulative sum of each column in a DataFrame in pandas?
- A) Using the `cumsum()` method
- B) Using the `stats()` method with the `'cumsum'` parameter
- C) Using the `groupby()` method followed by the `cumsum()` method
- D) Using the `aggregate()` method with the `'cumsum'` parameter
- **Answer: A**
93. How can you calculate the correlation between columns in a DataFrame in pandas?
- A) Using the `corr()` method
- B) Using the `stats()` method with the `'corr'` parameter
- C) Using the `groupby()` method followed by the `corr()` method
- D) Using the `aggregate()` method with the `'corr'` parameter
- **Answer: A**
97. How can you calculate the rolling standard deviation of each column in a DataFrame in pandas?
- A) Using `rolling(window).std()`
- B) Using the `std()` method with the `rolling=True` parameter
- C) Using the `groupby()` method followed by the `rolling_std()` method
- D) Using the `agg()` method with the `'rolling_std'` parameter
- **Answer: A**
99. How can you calculate the sum of each row in a DataFrame in pandas?
- A) Using the `sum()` method with the `axis=0` parameter
- B) Using the `sum()` method with the `axis=1` parameter
- C) Using the `groupby()` method followed by the `sum()` method
- D) Using the `aggregate()` method with the `'sum'` parameter
- **Answer: B**
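The difference between the two axis values can be checked on a tiny made-up DataFrame: `axis=0` collapses the rows and yields one sum per column, while `axis=1` collapses the columns and yields one sum per row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

col_sums = df.sum(axis=0)  # one value per column: a and b
row_sums = df.sum(axis=1)  # one value per row: index 0 and 1

print(col_sums.tolist())
print(row_sums.tolist())
```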