Numpy, Pandas & Visualisation Test
Assignment 02:
1. Find the average age of passengers for each class (1st, 2nd, and 3rd).
2. Create a new DataFrame that contains the count of male and female passengers in each age group (e.g., 0-10, 11-20, etc.).
3. Find the name and ticket number of the passenger(s) who paid the highest fare and survived the disaster.
4. Calculate the survival rate for passengers who were traveling alone (without any siblings, spouses, parents, or children) versus those who were traveling with family members.
5. For each passenger, calculate the age difference with the oldest sibling (if any) and the age difference with the youngest sibling (if any).
6. Find the most common deck letter (A, B, C, etc.) for each passenger class.
7. Group the Titanic DataFrame by 'Embarked' (port of embarkation) and find the percentage of passengers who survived in each group.
8. Calculate the correlation matrix for the 'Age', 'Fare', and 'Survived' columns in the Titanic dataset and find the feature with the highest absolute correlation with 'Survived'.
9. Create a new DataFrame that contains the 'Pclass', 'Sex', 'Age', and 'Fare' columns from the Titanic dataset and pivot it to have 'Pclass' as the index, 'Sex' as the columns, and 'Fare' as the values, with 'Age' as the weights.
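A minimal pandas sketch of tasks 1-4 and 6-9 follows. It assumes Kaggle-style Titanic column names; besides the ones named in the tasks, 'Name', 'Ticket', 'SibSp', 'Parch', and 'Cabin' are assumptions. It also runs on a small hypothetical toy DataFrame instead of the real file, and every variable name is illustrative. Two readings are also assumptions: task 3 is taken as the highest fare among survivors, and the 'Age' weights in task 9 are taken to mean an Age-weighted average Fare per cell, sum(Fare * Age) / sum(Age).

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with Kaggle-style Titanic column names (assumed);
# with the real file you would load it instead, e.g. pd.read_csv("train.csv").
titanic = pd.DataFrame({
    "Name": ["Allen, Miss. A", "Brown, Mr. B", "Cole, Mrs. C", "Dean, Mr. D",
             "Ellis, Mrs. E", "Ford, Master. F", "Gray, Mr. G", "Hill, Miss. H",
             "Ives, Mr. I", "Jones, Mrs. J"],
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3, 2],
    "Sex": ["female", "male", "female", "male", "female",
            "male", "male", "female", "male", "female"],
    "Age": [29, 40, 36, 30, 24, 5, 22, 18, np.nan, 48],
    "SibSp": [0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
    "Parch": [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    "Ticket": ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10"],
    "Fare": [211.0, 600.0, 512.0, 13.0, 26.0, 15.0, 7.0, 8.0, 8.0, 30.0],
    "Cabin": ["B5", "C22", "B51", None, "D56", None, None, "F33", None, "D20"],
    "Embarked": ["S", "C", "C", "S", "S", "Q", "S", "Q", "S", "C"],
    "Survived": [1, 0, 1, 0, 1, 1, 0, 0, 0, 1],
})

# 1. Mean age per class; mean() skips missing ages.
avg_age = titanic.groupby("Pclass")["Age"].mean()

# 2. Male/female counts per age group; bins are (0, 10], (10, 20], ..., (70, 80].
edges = list(range(0, 90, 10))
labels = ["0-10"] + [f"{lo + 1}-{lo + 10}" for lo in edges[1:-1]]
age_group = pd.cut(titanic["Age"], bins=edges, labels=labels,
                   include_lowest=True).rename("AgeGroup")
sex_by_age = (titanic.groupby([age_group, "Sex"], observed=True)
              .size().unstack(fill_value=0))

# 3. Highest fare among survivors; keeps every passenger tied at the maximum.
survivors = titanic[titanic["Survived"] == 1]
top_payers = survivors.loc[survivors["Fare"] == survivors["Fare"].max(), ["Name", "Ticket"]]

# 4. Survival rate (mean of the 0/1 column); "alone" means SibSp + Parch == 0.
alone = (titanic["SibSp"] + titanic["Parch"]) == 0
company = alone.map({True: "alone", False: "with family"})
survival_by_company = titanic.groupby(company)["Survived"].mean()

# 6. Most common deck letter (first character of Cabin) per class.
common_deck = (
    titanic.assign(Deck=titanic["Cabin"].str[0])
    .dropna(subset=["Deck"])
    .groupby("Pclass")["Deck"]
    .agg(lambda s: s.mode().iloc[0])  # mode() is sorted, so a tie goes to the earlier letter
)

# 7. Percentage of survivors per port of embarkation.
survival_pct_by_port = titanic.groupby("Embarked")["Survived"].mean() * 100

# 8. Correlation matrix, then the feature with the largest |correlation| with Survived.
corr = titanic[["Age", "Fare", "Survived"]].corr()
strongest = corr["Survived"].drop("Survived").abs().idxmax()

# 9. Pclass x Sex pivot of Fare weighted by Age: sum(Fare * Age) / sum(Age) per cell.
sub = titanic[["Pclass", "Sex", "Age", "Fare"]].dropna()
sub = sub.assign(FareXAge=sub["Fare"] * sub["Age"])
sums = sub.pivot_table(index="Pclass", columns="Sex",
                       values=["FareXAge", "Age"], aggfunc="sum")
weighted_fare = sums["FareXAge"] / sums["Age"]
```

Task 5 is not sketched. 'SibSp' only counts the siblings and spouses aboard and does not say which passengers they are, so matching siblings (for example by surname and ticket) would be a heuristic rather than a lookup. The age bins are right-closed, so a fractional age such as 10.5 lands in the "11-20" label.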
Assignment 03:
1. Scatter plot: Visualize the relationship between 'orbital_period' and 'mass' of the planets.
2. Bar plot: Display the count of planets discovered by each method.
3. Histogram: Visualize the distribution of 'distance' of planets from their respective stars.
4. Pair plot: Show pairwise relationships between 'orbital_period', 'mass', 'year', and 'distance'.
5. Violin plot: Compare the distribution of 'year' across the different discovery methods ('method').
6. Swarm plot: Visualize the 'mass' of planets discovered by each 'method'.
7. Heatmap: Create a heatmap of the correlation matrix of the numerical columns in the dataset.
8. Point plot: Compare the mean 'orbital_period' for each 'year' of planet discovery.
9. Bar plot with error bars: Show the mean 'mass' of planets for each discovery 'method', with confidence intervals as error bars.
Assignment 04:
Assignment 05:
Assignment 06:
1. The 'Area' column holds data in a mixed format of alphabetical and integer characters. Remove the last two zeros from each value: for example, the 'Area' value in the first record is A100100, which becomes A1001 after conversion.
2. Filter the DataFrame to keep only the records for 2011, and store them in a separate DataFrame.
3. Find the average value of 'geo_count' for the year 2022.
4. Find the net average of 'ec_count' for the years 2010, 2011, and 2012.
5. Plot the KDE distribution of 'ec_count' for records where 'anzsic06' is F371.
Assignment 07: