S.No Questions
1 Under what circumstances is pivot_table() in pandas used?
The `pivot_table()` in pandas is used to summarize and aggregate data by transforming it into a new
table format, where rows and columns represent unique values from the original dataset. It is especially
useful for creating summaries, such as totals, averages, or counts, across different categories or
combinations of categories.
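The summary described above can be sketched as follows. This is a minimal illustration, assuming a small made-up sales table; the column names `region`, `product`, and `amount` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [100, 150, 200, 250, 50],
})

# Rows become the unique regions, columns become the unique products,
# and each cell holds the total amount for that region/product pair.
summary = sales.pivot_table(values="amount", index="region",
                            columns="product", aggfunc="sum")
print(summary)
```

Changing `aggfunc` to "mean" or "count" gives averages or counts over the same categories.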
2 Using appropriate data visualization modules, develop a Python code snippet that generates a simple
sinusoidal wave in an empty gridded axes.
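import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2 * np.pi, 100) # One full period of the sine wave
# Create a plot with gridded axes and draw the wave
fig, ax = plt.subplots()
ax.grid(True)
ax.plot(x, np.sin(x))
plt.show()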
3. No Units: It is a dimensionless measure, meaning it does not depend on the units of the variables.
This makes it easier to compare across different datasets.
4. Linear Relationship: The coefficient only measures the strength and direction of a linear
relationship between two variables. Non-linear relationships may not be accurately represented.
5. Sensitivity to Outliers: Pearson correlation is sensitive to outliers, which can significantly affect the
value of the coefficient.
6. Assumes Continuous Variables: It assumes that both variables are continuous and approximately
normally distributed. It can still be computed for non-normally distributed data, but the result should be
interpreted cautiously.
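The properties listed above can be checked numerically. The sketch below uses `np.corrcoef` on made-up data; the arrays and the choice of outlier value are assumptions for illustration only.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)   # 1, 2, ..., 10
y = 2.0 * x + 1.0                   # exactly linear in x

# Perfect linear relationship: r is 1 (up to rounding).
r_linear = np.corrcoef(x, y)[0, 1]

# No units: rescaling x (e.g. metres to centimetres) leaves r unchanged.
r_rescaled = np.corrcoef(x * 100.0, y)[0, 1]

# Sensitivity to outliers: corrupting one point changes r drastically.
y_outlier = y.copy()
y_outlier[-1] = -100.0
r_outlier = np.corrcoef(x, y_outlier)[0, 1]

# Linear only: a perfect but non-linear (parabolic) relationship
# over a symmetric range gives r close to 0.
x_sym = np.arange(-5, 6, dtype=float)
r_parabola = np.corrcoef(x_sym, x_sym ** 2)[0, 1]

print(r_linear, r_rescaled, r_outlier, r_parabola)
```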
4 Summarize some built-in Pandas aggregations.
Pandas provides several built-in aggregation functions that are useful for summarizing data. Here are
some common ones:
1. sum() - Calculates the sum of values along the specified axis.
2. mean() - Computes the average of values.
3. median() - Finds the median (middle value) of the data.
4. min() / max() - Returns the minimum or maximum value.
5. count() - Counts the number of non-missing values.
6. std() - Calculates the standard deviation, showing how much the data deviates from the mean.
7. var() - Computes the variance, measuring the spread of data.
8. prod() - Returns the product of values.
9. mode() - Identifies the most common value(s) in the dataset.
10. describe() - Generates descriptive statistics, including count, mean, std, min, quartiles, and max.
11. agg() - Allows applying multiple aggregations to a DataFrame or Series using different functions.
These functions can be used directly on DataFrames or Series and are useful for data analysis,
exploration, and summarization.
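A few of these aggregations in use, as a minimal sketch. The DataFrame and its column names `score` and `hours` are made up for illustration.

```python
import pandas as pd

# Hypothetical data; one missing value in 'hours'.
df = pd.DataFrame({"score": [10, 20, 20, 40],
                   "hours": [1.0, 2.0, None, 4.0]})

total = df["score"].sum()                  # 10 + 20 + 20 + 40
average = df["score"].mean()
middle = df["score"].median()
non_missing = df["hours"].count()          # the missing value is not counted
most_common = df["score"].mode().tolist()  # mode() returns a Series

# agg() applies several aggregations to every column at once.
summary = df.agg(["min", "max", "mean"])
print(summary)
```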
5 State the advantages of using NumPy arrays.
NumPy arrays offer several advantages:
1. Speed: Faster than Python lists due to efficient memory usage and C-based implementation.
2. Vectorization: Supports element-wise operations without loops, enabling quick computations.
3. Mathematical Functions: Provides many built-in functions for easy numerical operations.
4. Multi-Dimensional Support: Easily handles arrays of multiple dimensions (e.g., matrices).
5. Integration: Compatible with libraries like Pandas, SciPy, and scikit-learn.
6. Boolean Indexing: Efficiently filter and manipulate data based on conditions.
7. Cross-Platform: Works seamlessly across different systems.
These benefits make NumPy ideal for data analysis, scientific computing, and machine learning.
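Three of these points (vectorization, boolean indexing, and multi-dimensional support) can be sketched briefly. The array values and variable names are illustrative assumptions.

```python
import numpy as np

prices = np.array([10.0, 25.0, 40.0, 5.0])

# Vectorization: one expression scales every element, no Python loop.
with_tax = prices * 1.1

# Boolean indexing: keep only the elements that satisfy a condition.
expensive = prices[prices > 20]

# Multi-dimensional support: a 2 x 3 matrix, summed down each column.
matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
column_totals = matrix.sum(axis=0)

print(with_tax, expensive, column_totals)
```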
6 Outline the two types of NumPy's UFuncs.
NumPy's Universal Functions (UFuncs) are of two main types:
1. Unary UFuncs
Definition: These operate on a single input array element-wise.
Examples:
np.sqrt(): Computes the square root of each element.
np.exp(): Calculates the exponential (e^x) of each element.
np.sin(), np.cos(), etc.: Trigonometric functions applied to each element.
np.abs(): Returns the absolute value of each element.
2. Binary UFuncs
Definition: These operate on two input arrays element-wise.
Examples:
np.add(): Adds corresponding elements of two arrays.
np.subtract(): Subtracts elements of one array from another.
np.multiply(): Multiplies elements of two arrays.
np.maximum(), np.minimum(): Finds the maximum or minimum between corresponding
elements.
These UFuncs enable fast, element-wise operations on arrays, making computations efficient and
straightforward.
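A short sketch of both kinds of UFunc on small made-up arrays:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 10.0])

# Unary UFuncs: one input array, applied element by element.
roots = np.sqrt(a)
magnitudes = np.abs(-a)

# Binary UFuncs: two input arrays, combined element by element.
sums = np.add(a, b)          # same result as a + b
larger = np.maximum(a, b)    # element-wise maximum of the two arrays

print(roots, magnitudes, sums, larger)
```

The arithmetic operators (`+`, `-`, `*`) on arrays are shorthand for the corresponding binary UFuncs.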
7 List the attributes of a NumPy array. Give an example of each.
ndarray.ndim - Returns the number of dimensions (axes) of the array.
Ex: import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) # Output: 2
ndarray.shape - Gives the shape of the array as a tuple, showing the size along each dimension.
Ex: print(arr.shape) # Output: (2, 3)
ndarray.size - Returns the total number of elements in the array.
Ex: print(arr.size) # Output: 6
ndarray.dtype - Displays the data type of the array elements.
Ex: print(arr.dtype) # Output: int64 (or similar, depending on the system)
8 Create a DataFrame with key and data pairs A-10, B-20, A-40, C-5, B-10, C-10. Find the sum for each
key and display the result grouped by key.
import pandas as pd
# Create the data
data = {'Key': ['A', 'B', 'A', 'C', 'B', 'C'],
'Value': [10, 20, 40, 5, 10, 10]}
# Create DataFrame
df = pd.DataFrame(data)
# Group by 'Key' and sum the values
grouped_sum = df.groupby('Key').sum()
# Display the result
print(grouped_sum)
9 Define a dictionary in Python.
A dictionary in Python is a collection of key-value pairs, where each key is associated with a specific
value. It is defined using curly braces `{}`. Keys are unique and immutable, and values can be accessed,
modified, or removed using their corresponding keys.
Example:
student = {"name": "Alice", "age": 25}
print(student["name"])
# Output: Alice
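The example above only reads a value. Modifying and removing entries by key can be sketched the same way; the extra keys `grade` and `email` are hypothetical.

```python
student = {"name": "Alice", "age": 25}

student["age"] = 26            # modify the value stored under an existing key
student["grade"] = "A"         # add a new key-value pair
removed = student.pop("name")  # remove a key and get its value back

# get() returns a default instead of raising KeyError for a missing key.
email = student.get("email", "not set")

print(student, removed, email)
```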
10 What is Hierarchical data in a dataframe?
Hierarchical data in a DataFrame refers to data organized with multiple levels of indexing, known as
a MultiIndex. This allows for a nested structure where rows or columns are grouped by multiple
criteria (e.g., Year and Region). It is useful for representing and analyzing complex datasets.
Example:
import pandas as pd
Output:
Sales
2022 North 150
South 200
2023 North 250
South 300
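The code that produced the output above is not shown in full. One way to build a DataFrame of that form is sketched below, assuming the two index levels are year and region; this is an illustration, not necessarily the original snippet.

```python
import pandas as pd

# Two-level row index: (year, region) pairs.
index = pd.MultiIndex.from_tuples(
    [(2022, "North"), (2022, "South"), (2023, "North"), (2023, "South")]
)
df = pd.DataFrame({"Sales": [150, 200, 250, 300]}, index=index)
print(df)

# Selecting by the outer level returns all rows for that year.
sales_2022 = df.loc[2022]

# Giving both levels selects a single value.
north_2023 = df.loc[(2023, "North"), "Sales"]
```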
data = {'Product': ['A', 'A', 'B', 'B'], 'Sales': [100, 150, 200, 250]}
df = pd.DataFrame(data)
grouped = df.groupby('Product').sum()
print(grouped)
Output:
Sales
Product
A 250
B 450
12 What is data indexing?
Data indexing in Python, particularly with pandas, refers to the process of selecting, accessing, and
organizing data within a DataFrame or Series. An index acts as a label to identify rows or columns,
allowing you to retrieve data efficiently. Indexing helps in quick data lookup, slicing, and filtering.
Key Points:
- The row index is used to identify and access rows.
- The column index is used to identify and access columns.
- Indices can be single-level (simple) or multi-level (hierarchical).
Example:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['ID1', 'ID2', 'ID3'])
# Accessing data using index
print(df.loc['ID2']) # Access row with index 'ID2'
Output:
Name Bob
Age 30
Name: ID2, dtype: object
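Lookup, slicing, and filtering can be sketched on the same small table. The variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['ID1', 'ID2', 'ID3'])

# Label-based lookup of a single cell (row label, column label).
bob_age = df.loc['ID2', 'Age']

# Position-based lookup: first row, first column.
first_name = df.iloc[0, 0]

# Label slices with loc include both endpoints.
first_two = df.loc['ID1':'ID2']

# Filtering with a boolean condition on a column.
older = df[df['Age'] > 28]

print(bob_age, first_name, list(first_two.index), list(older['Name']))
```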
13 What is broadcasting? State the rules of broadcasting.
Broadcasting in Python, especially with NumPy, refers to the ability of NumPy to perform
element-wise operations on arrays of different shapes by "stretching" the smaller array to match the
shape of the larger one without actually copying data. This feature makes it easy to perform arithmetic
operations on arrays of varying dimensions.
Rules of Broadcasting:
1. Same Shape: If the two arrays have the same shape, they are compatible for element-wise
operations.
2. One of the Dimensions is 1: If the shapes differ in some dimension, NumPy can still operate
on the arrays when one of them has size `1` in that dimension. The array with size `1` is "stretched"
(repeated) along that dimension to match the other array. If the sizes differ and neither is `1`,
NumPy raises an error.
3. Different Numbers of Dimensions: If the two arrays have different numbers of dimensions, the
shape of the array with fewer dimensions is padded with `1`s on its left side until both shapes have
the same length.
Example:
import numpy as np
# Array A: shape (3, 1)
A = np.array([[1], [2], [3]])
# Array B: shape (3,), treated as (1, 3) during broadcasting
B = np.array([10, 20, 30])
# Broadcasting: resulting shape (3, 3)
result = A * B
print(result)
Output:
[[10 20 30]
[20 40 60]
[30 60 90]]
Here, `A` is stretched horizontally, and `B` is stretched vertically to perform element-wise
multiplication.
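The rules can also be checked by looking only at the resulting shapes, including a case where broadcasting fails. A minimal sketch with arrays of ones (the shapes are chosen for illustration):

```python
import numpy as np

A = np.ones((3, 1))
B = np.ones(3)        # shape (3,), padded on the left to (1, 3)
C = np.ones((2, 3))

shape_ab = (A + B).shape   # (3, 1) with (1, 3): both size-1 axes stretch
shape_bc = (B + C).shape   # (1, 3) with (2, 3): the size-1 axis stretches

# (3, 1) with (2, 3): the leading sizes are 3 and 2, and neither is 1,
# so the shapes are incompatible and NumPy raises ValueError.
try:
    A + C
    incompatible = False
except ValueError:
    incompatible = True

print(shape_ab, shape_bc, incompatible)
```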
14 Enumerate the operations on missing data.
Operations on missing data in Python, especially with pandas, include methods to detect, handle, and
fill missing values. Common operations include:
1. Detection of Missing Data:
- `isnull()`: Returns `True` for missing values and `False` for non-missing values.
- `notnull()`: Returns `True` for non-missing values and `False` for missing values.
import pandas as pd
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]}
df = pd.DataFrame(data)
print(df.isnull())
2. Removing Missing Data:
- `dropna()`: Removes rows or columns with missing values.
- `dropna(axis=1)`: Removes columns with missing values.
df_cleaned = df.dropna() # Removes rows with any missing values
3. Filling Missing Data:
- `fillna(value)`: Replaces missing values with a specified value.
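- `ffill()`: Forward fills missing values using the previous non-missing value (older code writes this as `fillna(method='ffill')`, which is deprecated in recent pandas versions).
- `bfill()`: Backward fills missing values using the next non-missing value (older code: `fillna(method='bfill')`).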
df_filled = df.fillna(0) # Replaces missing values with 0
4. Replacing Missing Data with Statistical Values:
- `df.fillna(df.mean(numeric_only=True))`: Replaces missing values in each numeric column with that column's mean.
- Other functions like `median()`, `mode()`, or custom logic can also be used.
df_filled = df.fillna({'Age': df['Age'].mean()}) # Replace missing values in 'Age' with the column mean
These operations allow effective handling of missing data to ensure accurate analysis and
computations.
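Forward and backward filling are easiest to see on a single Series. The sketch below uses the `ffill()` and `bfill()` methods on made-up values:

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0])

n_missing = int(s.isnull().sum())  # number of missing entries

forward = s.ffill()    # each gap takes the previous non-missing value
backward = s.bfill()   # each gap takes the next non-missing value
dropped = s.dropna()   # gaps are removed entirely

print(n_missing, forward.tolist(), backward.tolist(), dropped.tolist())
```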
15 What is NumPy in Python used for? Write a Python program to create an array.
NumPy is a Python library used for working with arrays.
import numpy as np
np.array([1, 4, 2, 5, 3])
Output: array([1, 4, 2, 5, 3])
0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
18 What is a DataFrame?
A DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible
column names. It can also be thought of as a sequence of aligned Series objects.
Example:
states = pd.DataFrame({'population': population, 'area': area})
print(states)
Output:
            population    area
California    38332521  423967
Florida       19552860  170312
Illinois      12882135  149995
New York      19651127  141297
Texas         26448193  695662
22 How can a pandas DataFrame be constructed?
i) From a single Series object
ii) From a list of dicts
iii) From a dictionary of Series objects
iv) From a two-dimensional NumPy array
# Creating a Series
s = pd.Series([1, 2, 3, 4])
df_from_list_of_dicts = pd.DataFrame(data)
print(df_from_list_of_dicts)
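The snippets for routes i) and ii) are incomplete above. A minimal sketch of both follows; the column name `Value` and the records in the list are assumptions made for illustration.

```python
import pandas as pd

# i) From a single Series object: the Series becomes one column.
s = pd.Series([1, 2, 3, 4])
df_from_series = s.to_frame(name="Value")   # 'Value' is a hypothetical name

# ii) From a list of dicts: each dict is a row, keys become column names.
records = [{"Name": "Alice", "Age": 25},    # hypothetical records
           {"Name": "Bob", "Age": 30}]
df_from_list_of_dicts = pd.DataFrame(records)

print(df_from_series)
print(df_from_list_of_dicts)
```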
iii) From a Dictionary of Series Objects
Each `Series` in the dictionary corresponds to a column, with keys serving as column names.
data = {
'Name': pd.Series(['Alice', 'Bob', 'Charlie']),
'Age': pd.Series([25, 30, 35])
}
df_from_dict_of_series = pd.DataFrame(data)
print(df_from_dict_of_series)
iv) From a Two-Dimensional NumPy Array
A two-dimensional NumPy array can be passed to create a DataFrame, along with optional row and
column labels.
import numpy as np