Understanding.results.with.Python.B0DCY757YS
Chapter 1 Introduction
1. Purpose
2. About the Execution Environment for Source Code
Chapter 2 For beginners
1. Generating and Plotting a Sine Wave with Python
2. Scatter Plot of Random Points
3. Population Comparison of Cities
4. Creating a Histogram of Random Numbers
5. Simple Linear Regression with Synthetic Data
6. Box Plot Creation with Python
7. Heatmap of a 5x5 Matrix
8. Violin Plot Comparison
9. Comparing Monthly Sales of Two Products Using Python
10. Scatter Plot Matrix for 4D Dataset
11. Bar Chart of Average Student Scores
12. Market Share Analysis of Five Companies
13. Histogram of Ages
14. Polynomial Regression Curve
15. Creating a Box Plot for Heights Comparison
16. Generate a Heatmap of Random Values
17. Violin Plot for Weight Comparison
18. 3D Scatter Plot Generation
19. Population Growth Line Plot
20. Bar Chart of Company Revenues
21. Budget Expense Distribution
22. Histogram Analysis of Student Test Scores
23. Exponential Regression Analysis for Sales Growth Prediction
24. Generating a Heatmap from a 15x15 Random Matrix for Data Analysis Practice
25. Violin Plot for Age Comparison Across Groups
26. 3D Surface Plot of a Trigonometric Function
Chapter 3 For advanced
1. Temperature Variation Analysis Over a Week
2. Generating a Scatter Plot Matrix from a 6-Dimensional Dataset
3. Sales Analysis of Products Over Quarters
4. Data Analysis with Python: Creating a Pie Chart for Activity Distribution
5. Creating a Histogram to Analyze Income Distribution in a City
6. Logarithmic Regression Analysis for Sales Forecasting
7. Generate a Heatmap from Random Data in Python
8. 3D Scatter Plot Generation Using Python
9. Visualizing Stock Prices with Python
10. Creating a Bar Chart to Visualize Employee Distribution Across Departments
11. Vehicle Distribution Analysis in a City
12. Analyzing the Weight Distribution of Individuals
13. Quadratic Regression with Synthetic Data
14. Creating a Box Plot to Compare Product Prices Across Categories
15. Generating and Analyzing a Heatmap from a 25x5 Matrix of Random Values
16. Creating Violin Plots for Activity Duration Analysis
17. 3D Surface Plot of a Mathematical Function
18. Rainfall Data Analysis Using Python
19. Scatter Plot Matrix for Customer Purchase Data Analysis
20. Creating a Bar Chart to Compare Company Profits Over Three Years
21. Creating a Histogram for Product Length Distribution Analysis
22. Comparing Temperature Data Across Cities Using Python
23. Generate and Analyze a Heatmap from Random Data
24. Analyzing Vehicle Speed Data Using Violin Plots
25. 3D Scatter Plot Generation for Analyzing Customer Locations in 3D Space
26. Analyzing Monthly Product Sales Using Python
27. Website Visitor Analysis with Bar Charts
28. Creating a Pie Chart for Library Book Distribution Analysis
29. Analyzing Customer Height Distribution for Clothing Store Inventory
30. Sinusoidal Regression for Data Analysis
31. Analyzing Animal Weights with Box Plots
32. Visualizing Random Data with Heatmaps
33. Creating a Violin Plot for Task Completion Times
34. Creating a 3D Surface Plot from a Parametric Equation
35. Create a Line Plot of Hourly Temperature Variations Over a Day
36. Scatter Plot Matrix for 10-Dimensional Data Analysis
37. Creating a Bar Chart for Product Sales Analysis
38. Creating a Pie Chart for Beverage Distribution
39. Creating a Histogram of Ages
40. Logistic Regression Curve with Synthetic Data
41. Analyzing Plant Lengths with Box Plots
42. Generating a Heatmap from Random Data
43. Analyzing Game Scores: Creating Violin Plots
44. 3D Scatter Plot for Data Analysis Practice
45. Analyzing Monthly Household Expenses Over a Year
46. Bar Chart Creation for Product Sales Analysis
47. Analyzing Clothing Inventory Distribution with a Pie Chart
48. Creating a Histogram of 700 Individuals' Weights
49. Piecewise Regression with Synthetic Data
50. Creating a Box Plot to Compare Prices of Various Electronic Devices
51. Generating and Analyzing a Heatmap from a 45x45 Random Value Matrix
52. Violin Plot Analysis of Event Durations
53. Analyzing Fractal Patterns in 3D Surface Plots
54. Weekly Factory Production Line Plot Analysis
55. Customer Distribution Analysis Across Multiple Restaurants
56. Visualizing the Distribution of Electronics in a Store
57. Analyzing the Distribution of Item Lengths in a Product Inventory
58. Spline Regression Curve with Synthetic Data
59. Comparative Analysis of Tree Heights Using Box Plots
60. Generating and Analyzing a Heatmap from Random Data
61. Project Completion Time Analysis with Violin Plot
62. Generating a 3D Scatter Plot with Python
63. Daily Energy Consumption Line Plot
64. Library Book Borrowing Analysis
65. Analyzing Furniture Distribution in a Household
66. Creating a Histogram for Age Distribution Analysis
67. Plotting a Rational Regression Curve with Synthetic Data
68. Analyzing and Visualizing Bird Species Weight Data Using Box Plots
69. Generating and Visualizing a Heatmap from a 55x55 Matrix of Random Values
70. Analyzing and Visualizing Sports Scores Using Violin Plots
71. Generating a 3D Surface Plot of a Chaotic System
72. Analyzing Monthly Rainfall Trends Over Three Years
73. Scatter Plot Matrix Analysis for Multidimensional Data in Marketing Analytics
74. Create a Bar Chart of Employee Counts in Different Companies
75. Generating a Pie Chart for Gadget Distribution in a Store
76. Histogram of Weights for Data Analysis Practice
77. Comparing Insect Lengths Using Python Data Analysis
78. Heatmap Generation Using Python for Data Analysis
79. Comparing Activity Durations with a Violin Plot
80. 3D Scatter Plot Generation with Python
81. Visualizing Company Revenue Over a Decade
82. Scatter Plot Matrix Analysis for a 15-Dimensional Marketing Dataset
83. Visualizing Park Visitor Data with Python
84. Vehicle Fleet Distribution Analysis
85. Histogram Analysis of Heights for Business Insights
86. Moving Average Curve with Synthetic Data
87. Creating a Box Plot to Compare Fruit Prices
88. Generate a Heatmap from a 65x65 Matrix of Random Values
89. Comparative Analysis of Animal Speeds Using Violin Plot
90. Generating and Analyzing a 3D Surface Plot of a Complex Algebraic Function
Chapter 4 Request for review evaluation
Appendix: Execution Environment
Chapter 1 Introduction
1. Purpose
This ebook is designed for those who already have a basic understanding of
programming and are looking to deepen their skills in Python for data
analysis and statistical computation through hands-on practice. With 100
exercises, each accompanied by clear visual representations of the output
and detailed explanations, the learning process is made intuitive and
engaging. Whether you’re on the go or have just a few moments to spare,
this book allows you to easily expand your knowledge.
By running the provided source code, you can gain a more profound
understanding of the concepts. Each exercise is presented with both the
source code and the corresponding output, ensuring a comprehensive
learning experience. Through this structured approach, you’ll not only
reinforce your existing knowledge but also develop new insights into
Python’s capabilities for data analysis.
2. About the Execution Environment for Source Code
For information on the execution environment used for the source code in
this book, please refer to the appendix at the end of the book.
Chapter 2 For beginners
1. Generating and Plotting a Sine Wave with Python
Importance★★★★★
Difficulty★★☆☆☆
You are a data analyst tasked with helping a client visualize periodic data.
The client needs to see a simple sine wave plotted to understand the basic
behavior of their periodic data over time.
Generate the necessary data and create a plot to visualize this sine wave
using Python.
The data should represent one full cycle of the sine wave, from 0 to 2π, with
enough points to provide a smooth curve.
Use the appropriate libraries to generate the data and create the plot.
Ensure that the plot is labeled correctly and clearly shows the sine wave
pattern.
import numpy as np
【Code Answer】
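import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2 * np.pi, 100)  # 100 evenly spaced points over one full cycle (0 to 2π)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('x (radians)')
plt.ylabel('sin(x)')
plt.show()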
2. Scatter Plot of Random Points
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
【Diagram Answer】
【Code Answer】
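import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(50)  # 50 random x-coordinates in [0, 1)
y = np.random.rand(50)  # 50 random y-coordinates in [0, 1)
plt.scatter(x, y)
plt.title('Scatter Plot of Random Points')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()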
To solve this problem, you need to use Python to generate and plot random
data points.
First, we import the necessary libraries: numpy for generating random
numbers and matplotlib.pyplot for plotting.
We then use numpy's rand function to create two arrays of 50 random
numbers each, representing the x and y coordinates of the points.
The scatter function from matplotlib.pyplot is used to create the scatter plot.
We add a title and labels for the x and y axes to make the plot more
informative.
Finally, the show function displays the plot.
【Trivia】
Scatter plots are a fundamental tool in data analysis, allowing for the
visualization of relationships between two variables.
They are particularly useful for identifying correlations, clusters, and
outliers in data.
The matplotlib library is one of the most widely used plotting libraries in
Python, offering a variety of functions for creating different types of plots
and charts.
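Scatter plots show correlation visually. A numeric companion is Pearson's correlation coefficient r. The following minimal sketch uses numpy's np.corrcoef on two synthetic variables; the names y_related and y_unrelated are illustrative and do not come from the exercise:

```python
import numpy as np

np.random.seed(0)
x = np.random.rand(50)                                  # 50 uniform values in [0, 1)
y_related = 2 * x + np.random.normal(0, 0.1, 50)        # linear in x plus small noise
y_unrelated = np.random.rand(50)                        # generated independently of x

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_related = np.corrcoef(x, y_related)[0, 1]
r_unrelated = np.corrcoef(x, y_unrelated)[0, 1]

print(f"r (related):   {r_related:.2f}")
print(f"r (unrelated): {r_unrelated:.2f}")
```

A value of r close to 1 or -1 corresponds to points hugging a line in the scatter plot. A value near 0 corresponds to a shapeless cloud like the one in this exercise.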
3. Population Comparison of Cities
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a city planning department. You have
been asked to create a bar chart that compares the populations of five
different cities. The purpose of this chart is to help the department
understand population distribution and make informed decisions.
Create a Python script that generates a bar chart using the following cities
and their respective populations:
City A: 1,000,000
City B: 750,000
City C: 500,000
City D: 1,250,000
City E: 900,000
Use the provided data within the script and ensure the chart is clearly
labeled.
cities = ['City A', 'City B', 'City C', 'City D', 'City E'] # List of city names
【Code Answer】
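import matplotlib.pyplot as plt
cities = ['City A', 'City B', 'City C', 'City D', 'City E'] # List of city names
populations = [1000000, 750000, 500000, 1250000, 900000]  # Population of each city
plt.bar(cities, populations, color='skyblue')
plt.title('Population Comparison of Cities')
plt.xlabel('City')
plt.ylabel('Population')
plt.show()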
【Trivia】
‣ The matplotlib library was originally written by John D. Hunter and is
now maintained by a large community of developers.
‣ Bar charts are useful for comparing quantities of different categories or
groups.
‣ In addition to bar charts, matplotlib can be used to create a wide variety
of plots, including line plots, scatter plots, histograms, and pie charts.
‣ The matplotlib library is highly customizable, allowing users to change
the appearance of plots, add annotations, and create complex visualizations.
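When comparing categories, sorting the bars and expressing each value as a share of the total often makes a bar chart easier to read. The following sketch applies this to the city populations from this exercise. The sorting step is an optional extra and is not part of the exercise itself:

```python
cities = ['City A', 'City B', 'City C', 'City D', 'City E']
populations = [1_000_000, 750_000, 500_000, 1_250_000, 900_000]

total = sum(populations)
# Pair each city with its population and sort from largest to smallest
ranked = sorted(zip(cities, populations), key=lambda pair: pair[1], reverse=True)

for city, pop in ranked:
    share = 100 * pop / total
    print(f"{city}: {pop:,} ({share:.1f}% of total)")
```

You can pass the sorted names and values directly to plt.bar to draw the bars in descending order.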
4. Creating a Histogram of Random Numbers
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst for a company that wants to understand the
distribution of certain metrics in their dataset.
Your task is to generate a histogram of 1000 random numbers drawn from a
normal distribution.
This will help visualize the distribution and identify any potential
anomalies.
The data should be generated within the code, and the final histogram
should be displayed using Python's data analysis and visualization libraries.
import numpy as np
data = np.random.randn(1000)
【Diagram Answer】
【Code Answer】
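import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)  # 1000 draws from a standard normal distribution
plt.hist(data, bins=30, edgecolor='black')  # 30 bins, each outlined in black
plt.title('Histogram of 1000 Normally Distributed Random Numbers')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()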
To create the histogram, first, generate 1000 random numbers drawn from a
normal distribution using the numpy library's randn function.
This function produces numbers with a mean of 0 and a standard deviation
of 1.
Next, use matplotlib's hist function to create the histogram.
The bins parameter specifies the number of bins to divide the data into, and
the edgecolor parameter adds a black border to the bins for better
visualization.
Finally, add a title and labels for the x and y axes using plt.title, plt.xlabel,
and plt.ylabel functions respectively.
Call plt.show to display the histogram.
This process helps in visualizing the distribution of the generated data,
making it easier to identify patterns and anomalies.
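plt.hist performs the binning internally. np.histogram returns the same kind of bin counts and edges as plain arrays, which is useful for checking what the chart shows. The following sketch assumes 30 bins; the exercise does not fix the number:

```python
import numpy as np

np.random.seed(0)
data = np.random.randn(1000)                    # standard normal: mean 0, standard deviation 1

counts, edges = np.histogram(data, bins=30)     # 30 counts and 31 bin edges

print("sample mean:", round(data.mean(), 3))    # close to 0
print("sample std: ", round(data.std(), 3))     # close to 1
print("total count:", counts.sum())             # every value falls in exactly one bin
print("bin width:  ", round(edges[1] - edges[0], 3))
```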
【Trivia】
‣ Histograms are one of the most common ways to visualize the
distribution of a dataset.
‣ The shape of a histogram can reveal a lot about the data, such as whether
it is normally distributed, skewed, or has outliers.
‣ The number of bins can significantly affect the appearance of a
histogram. Too few bins can oversimplify the data, while too many bins can
overcomplicate it.
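Instead of guessing a bin count, you can ask numpy to apply a standard rule. The following sketch compares a fixed count of 10 with the Sturges and Freedman-Diaconis ('fd') rules, using np.histogram_bin_edges (available in NumPy 1.15 and later):

```python
import numpy as np

np.random.seed(0)
data = np.random.randn(1000)

for rule in [10, 'sturges', 'fd']:
    edges = np.histogram_bin_edges(data, bins=rule)   # a fixed count or the name of a rule
    print(f"{str(rule):>8}: {len(edges) - 1} bins")
```

For 1000 points, Sturges' rule gives ceil(log2(1000) + 1) = 11 bins. The Freedman-Diaconis rule bases the bin width on the interquartile range, so it adapts better to spread and outliers.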
5. Simple Linear Regression with Synthetic Data
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst at a retail company.
Your manager has asked you to analyze the relationship between the
amount spent on advertising and the sales revenue.
To do this, you need to plot a simple linear regression line using synthetic
data.
Generate synthetic data for advertising spend and sales revenue, then plot
the data points and the regression line.
Make sure to label the axes and provide a legend.
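import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
advertising_spend = np.random.rand(100) * 1000  # assumed spend range: 0 to 1000
sales_revenue = 200 + 5 * advertising_spend + np.random.randn(100) * 300  # assumed linear trend plus noise
plt.scatter(advertising_spend, sales_revenue)
plt.show()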
【Diagram Answer】
【Code Answer】
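import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
advertising_spend = np.random.rand(100) * 1000  # assumed spend range: 0 to 1000
sales_revenue = 200 + 5 * advertising_spend + np.random.randn(100) * 300  # assumed linear trend plus noise
slope, intercept = np.polyfit(advertising_spend, sales_revenue, 1)  # least-squares straight line
plt.scatter(advertising_spend, sales_revenue, label='Data')
plt.plot(advertising_spend, slope * advertising_spend + intercept, color='red', label='Regression line')
plt.title('Advertising Spend vs. Sales Revenue')
plt.xlabel('Advertising Spend')
plt.ylabel('Sales Revenue')
plt.legend()
plt.show()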
【Trivia】
Linear regression is one of the simplest and most commonly used
algorithms in machine learning.
It assumes a linear relationship between the input variables (independent
variables) and the single output variable (dependent variable).
Despite its simplicity, linear regression can be very powerful, especially
when the relationship between variables is indeed linear.
It is also the basis for more complex algorithms and is often used as a
benchmark model in predictive analytics.
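Under the linear assumption, the least-squares line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The following sketch applies these formulas to synthetic advertising data and cross-checks the result against np.polyfit. The true slope of 5 and intercept of 200 are assumptions chosen for illustration:

```python
import numpy as np

np.random.seed(0)
advertising_spend = np.random.rand(100) * 1000                             # hypothetical spend values
sales_revenue = 200 + 5 * advertising_spend + np.random.randn(100) * 300   # true line plus noise

# Closed-form least-squares estimates
x_mean, y_mean = advertising_spend.mean(), sales_revenue.mean()
slope = (np.sum((advertising_spend - x_mean) * (sales_revenue - y_mean))
         / np.sum((advertising_spend - x_mean) ** 2))
intercept = y_mean - slope * x_mean

# np.polyfit with degree 1 solves the same least-squares problem
poly_slope, poly_intercept = np.polyfit(advertising_spend, sales_revenue, 1)

print(f"closed form: slope={slope:.3f}, intercept={intercept:.1f}")
print(f"np.polyfit:  slope={poly_slope:.3f}, intercept={poly_intercept:.1f}")
```

Both approaches should agree to within floating-point error, and the estimated slope should land near the assumed true value of 5.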
6. Box Plot Creation with Python
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst at a retail company. You have been given three
datasets representing the sales figures of three different products over the
past year. Your task is to create a box plot to visualize the distribution of
sales for these products.
Write a Python script to generate this box plot. The script should include the
following steps:
Generate three datasets of sales figures.
Create a box plot to compare the distributions of these datasets.
Ensure the plot is properly labeled with titles and axis labels.
Use the provided code snippet to generate the sample data.
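import numpy as np
np.random.seed(0)
product_a = np.random.normal(200, 20, 100)  # assumed mean and spread for each product
product_b = np.random.normal(250, 30, 100)
product_c = np.random.normal(300, 25, 100)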
【Code Answer】
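import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
product_a = np.random.normal(200, 20, 100)  # assumed mean and spread for each product
product_b = np.random.normal(250, 30, 100)
product_c = np.random.normal(300, 25, 100)
plt.boxplot([product_a, product_b, product_c])
plt.xticks([1, 2, 3], ['Product A', 'Product B', 'Product C'])
plt.title('Sales Distribution of Three Products')
plt.xlabel('Products')
plt.ylabel('Sales Figures')
plt.show()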
【Trivia】
Box plots, also known as whisker plots, were introduced by John Tukey in
1970. They are particularly useful for identifying outliers and understanding
the spread and skewness of data.
In addition to matplotlib, other libraries like seaborn can also be used to
create box plots in Python. seaborn provides a higher-level interface for
drawing attractive and informative statistical graphics.
7. Heatmap of a 5x5 Matrix
Importance★★★★☆
Difficulty★★☆☆☆
You are working as a data analyst for a company that needs to visualize
random data for a presentation.
Your task is to generate a heatmap of a 5x5 matrix with random values
between 0 and 1.
The heatmap will help in visually analyzing the distribution of these
random values.
Write the Python code to generate and display this heatmap.
Ensure that the code generates the random data within the script itself.
import numpy as np
data = np.random.rand(5, 5)
【Diagram Answer】
【Code Answer】
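import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(5, 5)  # 5x5 matrix of random values in [0, 1)
plt.imshow(data, cmap='viridis')
plt.colorbar()
plt.title('Heatmap of a 5x5 Random Matrix')
plt.show()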
【Trivia】
‣ Heatmaps are a great way to visualize matrix data and are widely used in
various fields such as biology (e.g., gene expression data), finance (e.g.,
correlation matrices), and sports analytics.
‣ The cmap parameter in plt.imshow can take various values like 'viridis',
'plasma', 'inferno', and 'magma', each providing a different color scheme.
‣ The numpy library is highly optimized for numerical operations and is a
fundamental package for scientific computing in Python.
8. Violin Plot Comparison
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a company that wants to visualize the
distribution of two different datasets.
Your task is to create a violin plot to compare these two datasets.
Generate the datasets within your code and ensure the plot is clear and
informative.
Use Python and the appropriate libraries to accomplish this task.
import numpy as np
np.random.seed(10)
data = [np.random.normal(0, 1, 100), np.random.normal(2, 1.5, 100)]  # two assumed datasets
【Code Answer】
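import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(10)
data = [np.random.normal(0, 1, 100), np.random.normal(2, 1.5, 100)]  # two assumed datasets
sns.violinplot(data=data)
plt.title('Violin Plot Comparison of Two Datasets')
plt.xlabel('Dataset')
plt.ylabel('Value')
plt.show()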
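9. Comparing Monthly Sales of Two Products Using Python
import numpy as np
import pandas as pd
np.random.seed(0)
months = np.arange(1, 13)  # months 1 to 12
sales_data = pd.DataFrame({
    'Month': months,
    'Product A': np.random.randint(100, 500, 12),  # assumed sales range
    'Product B': np.random.randint(100, 500, 12),
})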
【Diagram Answer】
【Code Answer】
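import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
months = np.arange(1, 13)  # months 1 to 12
sales_data = pd.DataFrame({
    'Month': months,
    'Product A': np.random.randint(100, 500, 12),  # assumed sales range
    'Product B': np.random.randint(100, 500, 12),
})
plt.plot(sales_data['Month'], sales_data['Product A'], color='blue', label='Product A')
plt.plot(sales_data['Month'], sales_data['Product B'], color='orange', label='Product B')
plt.title('Monthly Sales of Two Products')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.legend()
plt.grid(True)
plt.show()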
To solve this problem, you first need to generate a dataset containing the
monthly sales data for two products.
This is done using numpy to create an array of months and to generate
random sales figures for each product.
The pandas library is then used to organize this data into a DataFrame,
which is a tabular structure that makes it easy to manage and manipulate the
data.
The core of this exercise is the use of matplotlib to create a line plot.
You start by importing the necessary libraries, including matplotlib.pyplot,
which is essential for creating plots in Python.
Two line plots are generated, one for each product, with distinct colors and
labels. This makes it easy to compare the sales trends between the two
products.
The xlabel, ylabel, and title functions are used to add labels and a title to the
plot, ensuring clarity and context.
The legend function is included to distinguish between the two products in
the plot. Finally, plt.grid(True) adds a grid to the plot for better readability.
The plt.show() command is crucial as it renders the plot, allowing you to
visually compare the sales data.
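Once both series are in one DataFrame, a few column operations can quantify what the line plot shows. The following sketch uses hypothetical column names ('Product A', 'Product B') and made-up sales figures chosen only for illustration:

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Month': range(1, 13),
    'Product A': [120, 135, 150, 160, 155, 170, 180, 175, 190, 200, 210, 230],  # illustrative values
    'Product B': [140, 130, 145, 150, 165, 160, 170, 185, 180, 195, 205, 215],
})

# Monthly difference: a positive value means Product A outsold Product B
sales_data['A minus B'] = sales_data['Product A'] - sales_data['Product B']
months_a_leads = int((sales_data['A minus B'] > 0).sum())

annual_totals = sales_data[['Product A', 'Product B']].sum()
print(sales_data)
print("Months where Product A leads:", months_a_leads)
print(annual_totals)
```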
【Trivia】
The practice of using line plots to compare multiple data series is a common
method in data analysis. It is particularly useful for identifying trends over
time, such as seasonal sales patterns or the impact of marketing campaigns
on product performance.
10. Scatter Plot Matrix for 4D Dataset
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a retail company. Your manager has
asked you to analyze the relationships between four key performance
indicators (KPIs): sales, customer satisfaction, number of returns, and
marketing spend. Generate a scatter plot matrix to visualize these
relationships. Create the dataset within the code.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(0)
data = pd.DataFrame({
'sales': np.random.rand(100),
'customer_satisfaction': np.random.rand(100),
'returns': np.random.rand(100),
'marketing_spend': np.random.rand(100)
})
【Diagram Answer】
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
data = pd.DataFrame({
'sales': np.random.rand(100),
'customer_satisfaction': np.random.rand(100),
'returns': np.random.rand(100),
'marketing_spend': np.random.rand(100)
})
sns.pairplot(data)
plt.show()
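A pair plot is visual. DataFrame.corr() gives the matching pairwise Pearson correlations as numbers. The following sketch regenerates the same random KPI columns. Because each column is an independent random draw, the off-diagonal values should be close to zero, which matches the patternless clouds in the pair plot:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame({
    'sales': np.random.rand(100),
    'customer_satisfaction': np.random.rand(100),
    'returns': np.random.rand(100),
    'marketing_spend': np.random.rand(100),
})

corr = data.corr()      # 4x4 matrix of pairwise Pearson correlations
print(corr.round(2))
```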
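11. Bar Chart of Average Student Scores
import numpy as np
np.random.seed(0)
subjects = ['Subject 1', 'Subject 2', 'Subject 3', 'Subject 4', 'Subject 5']  # placeholder subject names
data = {subject: np.random.randint(50, 101, 30) for subject in subjects}  # 30 scores per subject, 50 to 100
print(data)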
【Diagram Answer】
【Code Answer】
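import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
subjects = ['Subject 1', 'Subject 2', 'Subject 3', 'Subject 4', 'Subject 5']  # placeholder subject names
data = {subject: np.random.randint(50, 101, 30) for subject in subjects}  # 30 scores per subject, 50 to 100
average_scores = [np.mean(data[subject]) for subject in subjects]
plt.figure(figsize=(10, 6))
plt.bar(subjects, average_scores, color='skyblue')
plt.xlabel('Subjects')
plt.ylabel('Average Score')
plt.title('Average Scores of Students in Different Subjects')
plt.ylim(0, 100)
plt.show()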
The goal is to create a bar chart showing the average scores of students
across five subjects using Python. First, we use numpy to generate random
scores for 30 students in each subject.
The scores are drawn uniformly from a fixed range (50 to 100). The data is
stored in a dictionary, with subjects as keys and each subject's scores as
values. We calculate the average score for each subject using numpy's mean
function. Using matplotlib, we set up the plot:
‣ plt.figure(figsize=(10, 6)) sets the size of the plot.
‣ plt.bar(subjects, average_scores, color='skyblue') creates the bar chart,
with the subject names on the x-axis and the average scores on the y-axis.
‣ plt.xlabel('Subjects') and plt.ylabel('Average Score') label the axes for
clarity.
‣ plt.title('Average Scores of Students in Different Subjects') provides a title
for the chart.
‣ plt.ylim(0, 100) makes the y-axis run from 0 to 100, matching the range of
possible scores.
Finally, plt.show() displays the chart.
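Averages alone hide how spread out the scores are. As an optional extension that is not part of the exercise, you can also compute each subject's standard deviation. The subject names are placeholders, and drawing the scores with np.random.randint(50, 101, 30) is an assumption (the upper bound is exclusive, so 101 allows a score of 100):

```python
import numpy as np

np.random.seed(0)
subjects = ['Subject 1', 'Subject 2', 'Subject 3', 'Subject 4', 'Subject 5']   # placeholder names
scores = {s: np.random.randint(50, 101, 30) for s in subjects}                # 30 integer scores in [50, 100]

means = np.array([scores[s].mean() for s in subjects])
stds = np.array([scores[s].std(ddof=1) for s in subjects])                   # sample standard deviation

for subject, m, sd in zip(subjects, means, stds):
    print(f"{subject}: mean {m:.1f}, std {sd:.1f}")
```

Calling plt.bar(subjects, means, yerr=stds, capsize=5) would then draw the spread as error bars on top of the average bars.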
【Trivia】
Bar charts are widely used in statistics to compare the frequency, count, or
other measures (such as mean) for different discrete categories of data.
They provide a clear and straightforward way to visualize the relative sizes
of different groups. Matplotlib, a powerful plotting library in Python, offers
extensive customization options for creating and fine-tuning such
visualizations.
12. Market Share Analysis of Five Companies
Importance★★★★☆
Difficulty★★☆☆☆
A company wants to visualize the market share distribution of its top 5
competitors to better understand the competitive landscape.
Your task is to generate a pie chart that displays the market share
percentages of these companies.
You should use Python for data analysis and visualization.
Create the data for the market shares within your code and then generate the
pie chart.
The market shares are as follows:
Company A: 25%, Company B: 20%, Company C: 15%, Company D: 30%,
Company E: 10%.
market_shares = {'Company A': 25, 'Company B': 20, 'Company C': 15,
'Company D': 30, 'Company E': 10}
【Diagram Answer】
【Code Answer】
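import matplotlib.pyplot as plt
market_shares = {'Company A': 25, 'Company B': 20, 'Company C': 15,
                 'Company D': 30, 'Company E': 10}
companies = list(market_shares.keys())
shares = list(market_shares.values())
plt.figure(figsize=(8, 8))
plt.pie(shares, labels=companies, autopct='%1.1f%%', startangle=90)  # show each share as a percentage
plt.title('Market Share of Top 5 Competitors')
plt.show()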
【Trivia】
‣ The pie chart is one of the simplest forms of data visualization and is best
used for representing parts of a whole.
‣ Matplotlib is a powerful library in Python that allows for a wide range of
static, animated, and interactive visualizations.
‣ While pie charts are popular, they are not always the best choice for
comparing the parts with one another, especially when there are many
segments or the values are very similar. Bar charts can sometimes be more
effective in these cases.
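Each pie slice's angle is its share of 360 degrees, which is why similar values are hard to tell apart. In this data, a 20% slice and a 25% slice differ by only 18 degrees. The following sketch uses the market shares from this exercise:

```python
market_shares = {'Company A': 25, 'Company B': 20, 'Company C': 15,
                 'Company D': 30, 'Company E': 10}

total = sum(market_shares.values())                     # 100 for this data
angles = {name: 360 * share / total for name, share in market_shares.items()}

for name, angle in angles.items():
    print(f"{name}: {market_shares[name]}% -> {angle:.0f} degrees")
```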
13. Histogram of Ages
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst at a marketing firm. Your manager has asked you to
analyze the age distribution of a sample of 100 customers to better
understand the target audience for a new product.
Create a histogram to visualize the age distribution of these 100 customers.
Generate the sample data within your code.
Ensure the histogram is clearly labeled with appropriate titles and axis
labels.
Use Python for this task and include all necessary imports in your code.
import numpy as np
【Code Answer】
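import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
ages = np.random.randint(18, 71, 100)  # assumed: 100 customer ages between 18 and 70
plt.hist(ages, bins=10, edgecolor='black')
plt.title('Age Distribution of 100 Customers')
plt.xlabel('Age')
plt.ylabel('Number of Customers')
plt.show()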
【Trivia】
‣ Histograms are a type of bar chart that represent the distribution of
numerical data.
‣ They are particularly useful for understanding the frequency distribution
of data points in different intervals.
‣ The choice of the number of bins can significantly affect the appearance
and interpretability of the histogram.
‣ numpy and matplotlib are two of the most commonly used libraries in
Python for data analysis and visualization.
14. Polynomial Regression Curve
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a company that wants to understand the
relationship between their marketing spend and sales. The company
suspects that the relationship is not linear and might be better captured by a
polynomial regression model. Your task is to generate synthetic data that
simulates this relationship and plot a polynomial regression curve to
visualize it. Use Python to create the data and generate the plot. Make sure
the plot is clear and well-labeled.
【Data Generation Code Example】
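import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
X = np.random.rand(100) * 10  # assumed marketing spend range: 0 to 10
y = 2 * X**2 + 3 * X + np.random.randn(100) * 10  # quadratic trend plus noise
plt.scatter(X, y)
plt.xlabel('Marketing Spend')
plt.ylabel('Sales')
plt.show()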
【Diagram Answer】
【Code Answer】
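import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
np.random.seed(0)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)  # sorted so the fitted curve is drawn left to right
y = 2 * X**2 + 3 * X + np.random.randn(100, 1) * 10
degree = 2
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())  # expand features, then fit
model.fit(X, y)
y_pred = model.predict(X)
plt.scatter(X, y, label='Data')
plt.plot(X, y_pred, color='red', label=f'Degree {degree} fit')
plt.xlabel('Marketing Spend')
plt.ylabel('Sales')
plt.legend()
plt.show()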
【Trivia】
‣ Polynomial regression can capture more complex relationships than linear
regression, but it can also lead to overfitting if the degree of the polynomial
is too high.
‣ The make_pipeline function in scikit-learn is useful for chaining together
multiple steps in a machine learning workflow, such as preprocessing and
model fitting, into a single object.
‣ Adding too many polynomial terms can make the model overly sensitive
to small fluctuations in the data, leading to poor generalization on new data.
This is overfitting, the high-variance side of the bias-variance tradeoff.
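You can check the overfitting risk empirically: fit polynomials of different degrees on part of the data and measure the error on the rest. The following sketch uses np.polyfit instead of the scikit-learn pipeline to keep it short. It assumes the same quadratic data-generating formula as this exercise, with 200 points and a 150/50 train/test split:

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(200) * 10                       # assumed spend range: 0 to 10
y = 2 * X**2 + 3 * X + np.random.randn(200) * 10   # quadratic trend plus noise

# Fit on the first 150 points, measure error on the 50 held-out points
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

test_mse = {}
for degree in (1, 2, 10):
    coeffs = np.polyfit(X_train, y_train, degree)  # least-squares polynomial of this degree
    predictions = np.polyval(coeffs, X_test)
    test_mse[degree] = np.mean((y_test - predictions) ** 2)
    print(f"degree {degree:>2}: held-out MSE = {test_mse[degree]:.1f}")
```

Degree 1 underfits the curved trend, and degree 2 matches how the data were generated. Degree 10 usually gains little or nothing on the held-out points despite its extra flexibility.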
15. Creating a Box Plot for Heights Comparison
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a fitness company. They have
collected height data from their clients and want to compare the heights of
men and women to identify any significant differences. Your task is to
create a box plot that visualizes the height distribution for both men and
women using Python. Please generate a sample dataset within your code
with heights for both genders and create a box plot for comparison. Ensure
the data includes at least 50 entries for each gender.
【Data Generation Code Example】
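import numpy as np
np.random.seed(0)
heights_men = np.random.normal(175, 7, 50)  # assumed mean 175 cm, sd 7 cm
heights_women = np.random.normal(162, 6, 50)  # assumed mean 162 cm, sd 6 cm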
【Diagram Answer】
【Code Answer】
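import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
heights_men = np.random.normal(175, 7, 50)  # assumed mean 175 cm, sd 7 cm
heights_women = np.random.normal(162, 6, 50)  # assumed mean 162 cm, sd 6 cm
data = {
    'Height': np.concatenate([heights_men, heights_women]),
    'Gender': ['Men'] * 50 + ['Women'] * 50,
}
df = pd.DataFrame(data)
plt.figure(figsize=(10, 6))
plt.boxplot([df[df['Gender'] == 'Men']['Height'], df[df['Gender'] == 'Women']['Height']],
            labels=['Men', 'Women'])
plt.title('Height Comparison: Men vs. Women')
plt.ylabel('Height (cm)')
plt.show()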
【Trivia】
‣ Box plots, also known as box-and-whisker plots, were introduced by John
Tukey in the 1970s.
‣ The box plot is particularly useful in descriptive statistics as it provides a
graphical summary of data, showing its spread and skewness.
‣ In a box plot, the box represents the interquartile range (IQR), which
contains the middle 50% of the data. The line inside the box is the median.
The "whiskers" extend to the smallest and largest values within 1.5 * IQR
from the lower and upper quartiles, respectively. Data points outside this
range are considered outliers and are plotted individually.
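The quartile, whisker, and outlier rules described above can be computed directly. The following sketch uses np.percentile with its default linear interpolation and the 1.5 × IQR fences; this should correspond to the default whisker setting of plt.boxplot. The function name box_plot_stats is a hypothetical helper, not a library function:

```python
import numpy as np

def box_plot_stats(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    # Whiskers stop at the most extreme data points still inside the fences
    return {
        'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
        'whiskers': (inside.min(), inside.max()),
        'outliers': outliers.tolist(),
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5. The whiskers therefore end at 1 and 9, and the value 100 is plotted as a separate outlier point.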
16. Generate a Heatmap of Random Values
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst at a retail company. Your manager has asked you to
generate a heatmap to visualize the performance of various stores across
different regions. To simulate this, create a 10x10 matrix of random values
representing the sales data. Use Python to generate this matrix and create a
heatmap to visualize the data. Ensure the heatmap is clear and easy to
interpret.
【Data Generation Code Example】
import numpy as np
【Code Answer】
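import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.rand(10, 10)  # simulated sales values for 10 stores across 10 regions
plt.figure(figsize=(8, 6))
sns.heatmap(data, annot=True, fmt=".2f", cmap="viridis")
plt.title('Sales Performance Heatmap')
plt.xlabel('Region')
plt.ylabel('Store')
plt.show()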
To generate the heatmap, we first import the necessary libraries: numpy for
creating the random data, matplotlib.pyplot for plotting, and seaborn for
creating the heatmap.
We create a 10x10 matrix of random values using np.random.rand(10, 10).
This matrix simulates the sales data for different stores across various
regions.
Next, we set the size of the figure using plt.figure(figsize=(8, 6)) to ensure
the heatmap is large enough to be easily readable.
We then use sns.heatmap(data, annot=True, fmt=".2f", cmap="viridis") to
create the heatmap. The annot=True parameter adds the numerical values to
each cell, fmt=".2f" formats these values to two decimal places, and
cmap="viridis" sets the color map to "viridis" for better visual distinction.
Finally, we add a title and labels to the axes using plt.title('Sales
Performance Heatmap'), plt.xlabel('Region'), and plt.ylabel('Store')
respectively. The plt.show() function is called to display the heatmap.
This exercise helps in understanding how to visualize data using heatmaps,
which is a common technique in data analysis for identifying patterns and
trends in complex datasets.
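To locate the standout cell numerically, combine np.argmax with np.unravel_index. This converts the flat index of the largest value back into (row, column) coordinates. The following sketch treats rows as stores and columns as regions, matching the axis labels used above:

```python
import numpy as np

np.random.seed(0)
data = np.random.rand(10, 10)     # simulated sales: rows = stores, columns = regions

store, region = np.unravel_index(np.argmax(data), data.shape)
print(f"Highest value {data[store, region]:.2f} at store {store}, region {region}")

# Row and column means summarize each store and each region
best_store = int(np.argmax(data.mean(axis=1)))
best_region = int(np.argmax(data.mean(axis=0)))
print("Best store on average:", best_store)
print("Best region on average:", best_region)
```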
【Trivia】
‣ Heatmaps are particularly useful in fields like bioinformatics, where they
are used to visualize gene expression data.
‣ The seaborn library is built on top of matplotlib and provides a high-level
interface for drawing attractive statistical graphics.
‣ The "viridis" color map is designed to be perceptually uniform, making it
easier to interpret the data accurately.
17. Violin Plot for Weight Comparison
Importance★★★★☆
Difficulty★★★☆☆
A client has collected weight data for three different groups of individuals
and wants to visualize the distribution of weights for each group using a
violin plot. Your task is to generate a violin plot comparing the weights of
these three groups.
Use the following code to create sample data for the weights of the three
groups. Ensure that the plot is properly labeled and includes a legend.
The purpose of this exercise is to practice Python data analysis and
statistical visualization.
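import numpy as np
import pandas as pd
np.random.seed(42)
group1 = np.random.normal(70, 10, 100)  # assumed mean and sd for each group
group2 = np.random.normal(80, 12, 100)
group3 = np.random.normal(65, 8, 100)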
【Code Answer】
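import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
group1 = np.random.normal(70, 10, 100)  # assumed mean and sd for each group
group2 = np.random.normal(80, 12, 100)
group3 = np.random.normal(65, 8, 100)
df = pd.DataFrame({
    'Weight': np.concatenate([group1, group2, group3]),
    'Group': ['Group 1'] * 100 + ['Group 2'] * 100 + ['Group 3'] * 100,
})
plt.figure(figsize=(10, 6))
sns.violinplot(x='Group', y='Weight', data=df)
plt.title('Weight Distribution by Group')
plt.xlabel('Group')
plt.ylabel('Weight')
plt.show()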
To create the violin plot, we first import the necessary libraries: NumPy for
numerical operations, pandas for data manipulation, matplotlib for plotting,
and seaborn for statistical visualization.
We set a random seed for reproducibility. Then, we generate three groups of
weight data using the np.random.normal function, which creates normally
distributed data. Each group has a different mean and standard deviation to
simulate real-world variability.
Next, we combine these groups into a single pandas DataFrame with two
columns: 'Weight' and 'Group'. The 'Weight' column contains the weight
data, and the 'Group' column indicates the group each weight belongs to.
We then create a violin plot using seaborn's violinplot function, specifying
'Group' as the x-axis and 'Weight' as the y-axis. The plt.figure function is
used to set the figure size. We add titles and labels for clarity. Finally, the
plt.show function displays the plot.
Violin plots are useful for visualizing the distribution of data across
different categories. They combine aspects of box plots and kernel density
plots, showing both summary statistics and the density of the data.
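The density outline of a violin is a kernel density estimate (KDE). The following minimal Gaussian KDE is written in numpy. It uses Scott's rule of thumb for the bandwidth, h = σ · n^(-1/5); this is an assumption, and seaborn's own defaults may differ in detail. The function gaussian_kde defined here is a hypothetical helper, not scipy.stats.gaussian_kde. The weight group is also hypothetical:

```python
import numpy as np

def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)        # Scott's rule of thumb for 1D data
    z = (grid[:, None] - samples[None, :]) / bandwidth      # scaled distance from each grid point to each sample
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)      # standard normal kernel
    return kernels.sum(axis=1) / (n * bandwidth)

np.random.seed(42)
weights = np.random.normal(70, 10, 100)                     # hypothetical group: mean 70, sd 10
grid = np.linspace(weights.min() - 30, weights.max() + 30, 1000)
density = gaussian_kde(weights, grid)

dx = grid[1] - grid[0]
area = density.sum() * dx                                   # Riemann-sum approximation of the integral
print(f"area under the curve: {area:.3f}")                 # should be very close to 1
print(f"density peaks near weight {grid[np.argmax(density)]:.1f}")
```

A violin plot draws this curve mirrored on both sides of a vertical axis, usually with a small box plot inside.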
【Trivia】
Violin plots were introduced by J.L. Hintze and R.D. Nelson in 1998 as a
way to combine the benefits of box plots and density plots. They are
particularly useful for comparing the distribution of data across multiple
groups, as they provide more information about the data's density and
variability than standard box plots.
18. 3D Scatter Plot Generation
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a tech company. Your manager has
asked you to generate a 3D scatter plot of 100 random points in 3D space to
visualize the distribution of some experimental data.
Your task is to write a Python script that generates this plot.
Ensure that the plot is clearly labeled with appropriate axis titles.
The data should be generated within the script without reading from or
writing to any files.
Use the following guidelines:
Generate 100 random points for X, Y, and Z coordinates.
The range for each coordinate should be between 0 and 100.
Plot these points in a 3D scatter plot.
Label the axes as 'X-axis', 'Y-axis', and 'Z-axis'.
The plot should be displayed using a Python library.
import numpy as np
【Code Answer】
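import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 100, 100)  # 100 random X coordinates between 0 and 100
y = np.random.uniform(0, 100, 100)  # 100 random Y coordinates
z = np.random.uniform(0, 100, 100)  # 100 random Z coordinates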
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')
plt.show()
To solve this problem, we first need to generate random data points for the
X, Y, and Z coordinates.
We use the numpy library to generate 100 random values for each
coordinate within the range of 0 to 100.
The np.random.uniform function is used for this purpose, which generates
random numbers from a uniform distribution.
Next, we use the matplotlib library to create a 3D scatter plot.
The mpl_toolkits.mplot3d module provides the necessary tools to create 3D
plots.
We start by creating a figure object using plt.figure().
Then, we add a 3D subplot to this figure using fig.add_subplot(111,
projection='3d').
The projection='3d' argument specifies that this subplot will be a 3D plot.
We plot the generated data points using the ax.scatter(x, y, z) method, where
x, y, and z are the arrays of random points.
Finally, we label the axes using ax.set_xlabel('X-axis'), ax.set_ylabel('Y-axis'), and ax.set_zlabel('Z-axis').
The plot is displayed using plt.show().
This exercise helps in understanding how to generate random data, create
3D plots, and label axes in Python using numpy and matplotlib.
【Trivia】
‣ The matplotlib library is one of the most widely used plotting libraries in
Python, known for its flexibility and ease of use.
‣ The mpl_toolkits.mplot3d toolkit has shipped with matplotlib since
version 0.99, allowing users to create 3D plots.
‣ 3D scatter plots are particularly useful for visualizing the relationship
between three variables and can help in identifying patterns or clusters in
the data.
‣ The numpy library is often used in data analysis and scientific computing
for its powerful array operations and random number generation
capabilities.
19. Population Growth Line Plot
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a city planning department. The city has
collected population data over the past 10 years and wants to visualize this
data to understand the growth trend.
Create a Python script that generates a line plot showing the growth of the
population over 10 years.
You need to generate the input data within the script and then use it to
create the plot.
Ensure that the plot has appropriate labels for the x-axis (Years), y-axis
(Population), and a title (Population Growth Over 10 Years).
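【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)  # assumed seed, for reproducibility
years = np.arange(2014, 2024)  # ten consecutive years (assumed range)
# Hypothetical data: a base population plus random yearly increases
population = 100000 + np.cumsum(np.random.randint(1000, 5000, size=10))
plt.plot(years, population, marker='o')
plt.title('Population Growth Over 10 Years')
plt.xlabel('Years')
plt.ylabel('Population')
plt.grid(True)
plt.show()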
【Trivia】
‣ Matplotlib is one of the most widely used plotting libraries in Python,
known for its flexibility and extensive customization options.
‣ The np.random.randint function is useful for generating random integers
within a specified range, which can be helpful for creating synthetic
datasets for testing and development purposes.
‣ Line plots are particularly effective for visualizing trends over time,
making them a common choice for time series data analysis.
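The np.random.randint bullet above can be illustrated with a short sketch. It assumes a base population of 100,000 and yearly increases between 1,000 and 4,999. Both values are hypothetical and chosen only to produce a growing series. The sketch also computes the year-over-year growth rate, which quantifies the trend the line plot shows.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for reproducibility

# randint(low, high, size) draws integers from low up to, but NOT including, high
yearly_increase = np.random.randint(1000, 5000, size=10)

# Hypothetical series: a base population plus the running total of increases
population = 100_000 + np.cumsum(yearly_increase)

# Year-over-year growth rate in percent (9 values for 10 years)
growth_pct = np.diff(population) / population[:-1] * 100
print(population)
print(np.round(growth_pct, 2))
```

Because randint excludes its upper bound, a range such as 20 to 60 inclusive must be written as randint(20, 61).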
20. Bar Chart of Company Revenues
Importance★★★★☆
Difficulty★★☆☆☆
A client has provided you with the annual revenue data of four different
companies.
Your task is to create a bar chart to visually represent this data.
The companies and their respective revenues (in million dollars) are as
follows:
Company A: 120
Company B: 90
Company C: 150
Company D: 110
Use Python to generate a bar chart to help the client visualize the revenue
distribution.
You need to write the Python code that generates this bar chart.
The data should be created within the code itself.
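【Diagram Answer】
【Code Answer】
import matplotlib.pyplot as plt
# Annual revenues in million dollars, from the problem statement
companies = ['Company A', 'Company B', 'Company C', 'Company D']
revenues = [120, 90, 150, 110]
plt.bar(companies, revenues)
plt.xlabel('Companies')
plt.ylabel('Revenue (million dollars)')
plt.title('Revenue of Companies')
plt.show()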
【Trivia】
The matplotlib library was originally written by John D. Hunter and is now
maintained by a large community of developers.
It is one of the most widely used plotting libraries in the Python ecosystem.
Bar charts are particularly useful for comparing quantities across different
categories and are one of the simplest yet most effective ways to visualize
data.
In addition to bar charts, matplotlib can create line plots, scatter plots,
histograms, and many other types of visualizations.
21. Budget Expense Distribution
Importance★★★★☆
Difficulty★★★☆☆
You are a financial analyst tasked with helping a client understand their
monthly expenses by visualizing the distribution of their budget.
Your goal is to generate a pie chart that clearly shows the percentage
distribution of various expense categories.
The expense categories and their respective amounts are as follows:
Rent: $1200
Groceries: $400
Utilities: $150
Transportation: $100
Entertainment: $200
Savings: $300
Write a Python script to generate a pie chart representing these expenses.
Ensure the pie chart is clearly labeled with each category and its percentage
of the total budget.
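Generate the input data within the script.
【Code Answer】
import matplotlib.pyplot as plt
# Expense categories and monthly amounts (USD) from the problem statement
categories = ['Rent', 'Groceries', 'Utilities', 'Transportation', 'Entertainment', 'Savings']
amounts = [1200, 400, 150, 100, 200, 300]
fig, ax = plt.subplots()
# autopct writes each slice's share of the total onto the chart
ax.pie(amounts, labels=categories, autopct='%1.1f%%')
ax.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.title('Monthly Expense Distribution')
plt.show()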
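The percentage on each slice is that category's amount divided by the total budget. The short worked example below uses only the amounts given in the problem statement. It reproduces the one-decimal percentages that a format string such as matplotlib's autopct='%1.1f%%' would display.

```python
# Expense amounts from the problem statement (USD per month)
expenses = {
    'Rent': 1200,
    'Groceries': 400,
    'Utilities': 150,
    'Transportation': 100,
    'Entertainment': 200,
    'Savings': 300,
}

total = sum(expenses.values())  # 2350

# Each slice's share of the total, formatted to one decimal place
shares = {name: amount / total * 100 for name, amount in expenses.items()}
for name, pct in shares.items():
    print(f'{name}: {pct:.1f}%')
```

Rent alone accounts for about 51.1% of the $2,350 monthly total, so it will take up roughly half of the pie.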
【Trivia】
‣ Matplotlib was originally developed by John D. Hunter in 2003.
‣ Its pyplot interface was designed to closely resemble MATLAB, a popular
commercial software package for data visualization and analysis.
‣ Pie charts are useful for showing the relative proportions of different
categories in a dataset, but they can become difficult to interpret with too
many slices.
‣ It's often recommended to use pie charts for datasets with fewer than six
categories to maintain clarity.
22. Histogram Analysis of Student Test Scores
Importance★★★★☆
Difficulty★★☆☆☆
A school administrator has provided you with the test scores of 200
students from the latest examination.
The school needs to analyze the distribution of these scores to identify the
performance trends of the students.
Your task is to create a histogram of the test scores using Python to
visualize the data distribution.
Please write the Python code necessary to generate this histogram.
Use the generated histogram to help the school understand the performance
levels and whether there are any significant clusters of high or low scores.
Be sure to also provide insight into the most common score range and any
notable patterns observed from the histogram.
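【Code Answer】
import random
import matplotlib.pyplot as plt
random.seed(42)  # fixed seed so the scores are reproducible
# Hypothetical data: 200 scores around a mean of 70 (SD 15), clamped to 0-100
test_scores = [min(100, max(0, round(random.gauss(70, 15)))) for _ in range(200)]
plt.hist(test_scores, bins=10, edgecolor='black')
plt.title('Distribution of Test Scores')
plt.xlabel("Test Scores")
plt.ylabel("Number of Students")
plt.show()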
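The task also asks for the most common score range. matplotlib's hist computes its bins with np.histogram, so the minimal sketch below calls np.histogram directly with the same bins=10 setting and reports the fullest bin. The function name most_common_range and the normally distributed scores are hypothetical, used for illustration only.

```python
import numpy as np

def most_common_range(scores, bins=10):
    """Return (count, left_edge, right_edge) for the histogram bin holding the most scores."""
    counts, edges = np.histogram(scores, bins=bins)
    i = int(np.argmax(counts))  # first bin with the highest count
    return int(counts[i]), float(edges[i]), float(edges[i + 1])

# Hypothetical scores, for illustration only
rng = np.random.default_rng(42)
scores = np.clip(rng.normal(70, 15, 200), 0, 100)
count, low, high = most_common_range(scores)
print(f'{count} students scored between {low:.1f} and {high:.1f}')
```

The bin edges run from the lowest to the highest observed score, so they shift with the data. Pass explicit edges, such as bins=range(0, 101, 10), if round-number ranges are preferred.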
【Trivia】
The term "histogram" was introduced by Karl Pearson, a British mathematician,
in the late 19th century.
They are now a standard tool in statistical analysis and data visualization
across various fields.
One interesting aspect of histograms is that they can reveal the underlying
distribution of data, such as normal distribution, skewed distribution, or
bimodal distribution, which can be critical for more advanced statistical
analysis.
23. Exponential Regression Analysis for Sales Growth Prediction
Importance★★★★☆
Difficulty★★★☆☆
A retail company has observed rapid growth in the sales of one of its
products and suspects that the growth follows an exponential pattern. Your
task is to confirm this hypothesis by fitting an exponential regression model
to the sales data. The company has monthly sales data for the past 12
months; for this exercise, you need to generate synthetic sales data that follows an
exponential trend, fit an exponential regression model to this data, and
visualize both the actual sales and the predicted regression curve. Present
your findings in a plot. The company is interested in understanding how
well the exponential model fits the data and any potential deviations from
this model.
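【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
np.random.seed(0)
# Hypothetical data: 12 months of sales along 100 * e^(0.2 * month) with ~5% noise
months = np.arange(1, 13)
sales = 100 * np.exp(0.2 * months) * (1 + 0.05 * np.random.randn(12))
def exponential_model(x, a, b):
    # Exponential model: sales = a * e^(b * month)
    return a * np.exp(b * x)
# Estimate a and b; p0 gives curve_fit a reasonable starting point
params, _ = curve_fit(exponential_model, months, sales, p0=(100, 0.1))
plt.figure(figsize=(10, 6))
plt.scatter(months, sales, label='Actual sales')
plt.plot(months, exponential_model(months, *params), color='red', label='Exponential fit')
plt.title('Exponential Regression of Monthly Sales')
plt.xlabel('Months')
plt.ylabel('Sales')
plt.legend()
plt.show()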
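To judge how well the exponential model fits, you can compute a goodness-of-fit number alongside the plot. The sketch below uses a common alternative to nonlinear fitting: it takes the logarithm of the sales, fits a straight line with np.polyfit, and reports R² on the original scale. This log-linear fit is a swapped-in technique, not the book's method. It weights errors differently from curve_fit, so the two can give slightly different coefficients on noisy data. The function name is hypothetical.

```python
import numpy as np

def fit_exponential_loglinear(x, y):
    """Fit y = a * exp(b * x) by linear least squares on ln(y); all y must be > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # ln(y) = ln(a) + b * x is a straight line; polyfit returns [slope, intercept]
    b, ln_a = np.polyfit(x, np.log(y), 1)
    a = np.exp(ln_a)
    y_pred = a * np.exp(b * x)
    # R^2 on the original scale: share of the variance in y explained by the fit
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return a, b, 1 - ss_res / ss_tot

months = np.arange(1, 13)
sales = 100 * np.exp(0.2 * months)  # noise-free hypothetical sales
a, b, r2 = fit_exponential_loglinear(months, sales)
print(round(a, 3), round(b, 3), round(r2, 3))  # 100.0 0.2 1.0
```

On real data, an R² well below 1, or residuals that grow systematically in later months, would point to deviations from a pure exponential trend.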
【Trivia】
Exponential growth is often observed in phenomena like population growth,
radioactive decay, and compound interest. In business, recognizing
exponential patterns early can be key to scaling operations efficiently and
capitalizing on rapid growth opportunities.
24. Generating a Heatmap from a 15x15 Random Matrix for Data Analysis Practice
Importance★★★☆☆
Difficulty★★☆☆☆
Your client has tasked you with generating a visual representation of a
15x15 matrix, where each cell contains a random value. This visual will help
in understanding the distribution of the values across the matrix, which is
crucial for their ongoing data analysis project. Using Python, create a
heatmap to visually represent the data in this matrix. The focus should be on
how the data distribution can be analyzed using the heatmap, not just on
generating the visual. The matrix must be generated within the code itself,
with values randomly assigned. Ensure that the code is concise and can be
easily executed by someone with a basic understanding of Python.
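【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# 15x15 matrix of random values in [0, 1)
matrix = np.random.rand(15, 15)
plt.imshow(matrix, cmap='viridis')  # colormap choice is an assumption
plt.colorbar(label='Value')
plt.title('Heatmap of a 15x15 Random Matrix')
plt.xlabel('Column Index')
plt.ylabel('Row Index')
plt.show()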
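Since the exercise stresses analyzing the distribution rather than only drawing it, a few numeric summaries can back up what the eye reads from the heatmap. The sketch below assumes the matrix is filled with np.random.rand. For values drawn uniformly from [0, 1), the expected mean is 0.5 and the expected standard deviation is 1/√12 ≈ 0.289. Noticeably different values, or one row or column that stands out, would suggest structure rather than pure noise.

```python
import numpy as np

np.random.seed(42)
matrix = np.random.rand(15, 15)  # same shape as the exercise; values in [0, 1)

# Summaries that complement what the eye reads off the heatmap
row_means = matrix.mean(axis=1)   # one value per row
col_means = matrix.mean(axis=0)   # one value per column
peak_row, peak_col = np.unravel_index(np.argmax(matrix), matrix.shape)

print(f'overall mean {matrix.mean():.3f}, std {matrix.std():.3f}')
print(f'largest value at row {peak_row}, column {peak_col}')
print('row with highest mean:', int(np.argmax(row_means)))
```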
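25. Violin Plot for Age Comparison Across Groups
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)  # assumed seed value, for reproducibility
departments = ['Sales', 'Engineering', 'HR', 'Marketing']  # hypothetical names
# 100 random ages from 20 to 60 per department (randint excludes the upper bound)
df = pd.DataFrame({dept: np.random.randint(20, 61, size=100) for dept in departments})
plt.figure(figsize=(10, 6))
sns.violinplot(data=df)  # wide-form data: one violin per column
plt.title('Age Distribution of Employees by Department')
plt.xlabel('Department')
plt.ylabel('Age')
plt.xticks(range(len(departments)), departments)
plt.show()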
In this exercise, we are tasked with creating a violin plot to visualize the age
distribution of employees across four departments.
▸ Data Generation:
We first import necessary libraries: numpy, pandas, matplotlib.pyplot, and
seaborn.
We set a random seed for reproducibility, ensuring that our random numbers
can be recreated.
We define the four departments and generate random ages between 20 and
60 for 100 employees in each department using np.random.randint. This
creates a dictionary where each key is a department and the value is an
array of ages.
We convert this dictionary into a pandas DataFrame, which organizes our
data in a tabular format.
▸ Creating the Violin Plot:
We use plt.figure to define the size of our plot.
The sns.violinplot function from the Seaborn library is used to create the
violin plot. This plot combines a box plot and a kernel density plot, showing
the distribution of the data across different categories.
We add a title and labels for the x and y axes to provide context for the
viewer.
plt.xticks is used to set the x-axis labels to the names of the departments.
Finally, we call plt.show() to render the plot.
This exercise not only helps in visualizing data but also enhances
understanding of how different departments may vary in terms of employee
age distribution, providing valuable insights for human resource
management.
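The outline of each violin is a kernel density estimate (KDE). A KDE places a smooth bump on every observation and adds the bumps together. The NumPy sketch below computes a plain Gaussian KDE to show the idea. The bandwidth of 3 years and the function name are arbitrary assumptions, and seaborn's own bandwidth rule and implementation differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point (rows) to every sample (columns)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the estimate integrates to 1
    return kernels.sum(axis=1) / (samples.size * bandwidth)

rng = np.random.default_rng(0)
ages = rng.integers(20, 61, size=100)  # hypothetical ages 20-60 for one department
grid = np.linspace(15, 65, 101)
density = gaussian_kde_1d(ages, grid, bandwidth=3.0)
print('age with the highest estimated density:', grid[np.argmax(density)])
```

A larger bandwidth gives a smoother, wider violin. A smaller one follows individual clusters of ages more closely but also picks up more noise.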
【Trivia】
Violin plots are particularly useful for comparing multiple distributions
because they show the density of the data at different values. Unlike box
plots, which only show summary statistics, violin plots provide a more
detailed view of the distribution shape, making them an excellent choice for
exploratory data analysis.
26. 3D Surface Plot of a Trigonometric Function
Importance★★★★☆
Difficulty★★★☆☆
A company is analyzing the behavior of a trigonometric function to
optimize their product design. They want to visualize the surface of the
function $z = \sin(\sqrt{x^2 + y^2})$ over a grid of $x$
and $y$ values ranging from -5 to 5. Your task is to generate the input data
for this function and create a 3D surface plot.
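【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
# 100 evenly spaced values from -5 to 5 along each axis
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
# z = sin(sqrt(x^2 + y^2)): the height depends only on the distance from the origin
Z = np.sin(np.sqrt(X**2 + Y**2))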
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
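Two details carry the surface plot: np.meshgrid expands the 1D axes into 2D coordinate arrays, and the function depends only on the radius r = √(x² + y²). The sketch below checks both on a tiny grid. The helper name surface is hypothetical. The square root is taken of x² + y², which is never negative; taking it of x + y directly would produce NaN wherever x + y < 0.

```python
import numpy as np

# meshgrid turns two 1D axes into 2D coordinate arrays, one entry per grid point
x = np.linspace(-5, 5, 3)   # [-5, 0, 5]
y = np.linspace(-5, 5, 5)
X, Y = np.meshgrid(x, y)
print(X.shape, Y.shape)  # (5, 3) (5, 3): rows follow y, columns follow x

def surface(x, y):
    # z = sin(sqrt(x^2 + y^2)) depends only on the distance r from the origin
    return np.sin(np.sqrt(x ** 2 + y ** 2))

print(surface(0.0, 0.0))        # 0.0 at the origin (r = 0)
print(surface(np.pi / 2, 0.0))  # 1.0, the first peak, at r = pi/2
```

Because the surface is radially symmetric, points at the same distance from the origin, such as (3, 4) and (5, 0), have the same height. That symmetry is why the plot shows concentric ripples.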
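Chapter 3 For advanced
1. Temperature Variation Analysis Over a Week
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Temperatures around a mean of 22°C with slight random variations
temperatures = 22 + np.random.randn(7)
plt.plot(days, temperatures, marker='o', linestyle='-', color='b')
# Add titles and labels
plt.title('Temperature Variation Over a Week')
plt.xlabel('Day of the Week')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.show()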
In this exercise, you are required to create a line plot that visualizes the
temperature variation over a week.
The sample dataset is generated using a list of days and randomly generated
temperature values around a mean of 22°C with slight variations introduced
by the np.random.randn() function.
The plot function is used to generate a line graph, where the x-axis
represents the days of the week, and the y-axis represents the temperature.
Each point on the graph is marked with an 'o' marker to make individual
data points more visible.
The plt.title, plt.xlabel, and plt.ylabel functions are used to add appropriate
labels to the graph, making it easier to understand.
The grid is enabled using plt.grid(True) to enhance readability by adding a
background grid.
Finally, plt.show() is called to display the plot.
This exercise helps to understand how to generate synthetic data for
analysis and create simple visualizations using Python's matplotlib library.
Understanding these basic plotting techniques is crucial for any data
analysis task, as visualizing data is often the first step in understanding and
communicating trends, patterns, and insights.
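A line plot is easier to discuss when a few numbers are reported with it. The sketch below generates temperatures the same way (a mean of 22°C plus standard normal noise). It uses the Generator method standard_normal, which draws from the same distribution as np.random.randn; the seed is arbitrary. It then reports the hottest and coldest days and the largest day-to-day change.

```python
import numpy as np

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
temperatures = 22 + rng.standard_normal(7)  # mean 22°C, standard deviation 1°C

# Simple summaries that complement the line plot
hottest = days[int(np.argmax(temperatures))]
coldest = days[int(np.argmin(temperatures))]
daily_change = np.diff(temperatures)  # change from each day to the next
weekly_range = temperatures.max() - temperatures.min()

print(f'weekly mean {temperatures.mean():.1f}°C, range {weekly_range:.1f}°C')
print('hottest:', hottest, '| coldest:', coldest)
print('largest day-to-day change:', round(float(np.abs(daily_change).max()), 2))
```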
【Trivia】
Did you know that weather data has been recorded systematically since the
17th century? Early instruments, such as thermometers and barometers,
were developed in Europe and allowed for the first accurate recordings of
temperature and atmospheric pressure, laying the groundwork for modern
meteorology.
2. Generating a Scatter Plot Matrix from a 6-Dimensional Dataset
Importance★★★☆☆
Difficulty★★★☆☆
A retail company wants to analyze the relationships between different
metrics of their products to improve sales strategies. They have six
dimensions of data: Price, Rating, Reviews, Stock, Discount, and Sales.
Your task is to generate a scatter plot matrix to visualize the relationships
among these six dimensions. Create the input data within the code.
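【Diagram Answer】
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
n = 100  # number of products (assumed)
# Hypothetical value ranges for the six metrics named in the task
data = {
    'Price': np.random.uniform(10, 100, n),
    'Rating': np.random.uniform(1, 5, n),
    'Reviews': np.random.randint(0, 500, n),
    'Stock': np.random.randint(0, 200, n),
    'Discount': np.random.uniform(0, 0.5, n),
    'Sales': np.random.randint(0, 1000, n),
}
df = pd.DataFrame(data)
# One scatter plot per pair of columns, with histograms on the diagonal
pd.plotting.scatter_matrix(df, figsize=(12, 12), diagonal='hist')
plt.show()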
【Trivia】
Scatter plot matrices are particularly useful in exploratory data analysis
(EDA) as they allow analysts to quickly identify relationships, trends, and
potential outliers in the data. They are commonly used in fields such as
finance, marketing, and healthcare to visualize complex datasets.
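A correlation matrix is a compact numeric companion to a scatter plot matrix. Each cell gives the Pearson correlation for one panel of the grid. The sketch below uses np.corrcoef on three hypothetical metrics: sales that fall as price rises, and stock that is unrelated noise. For a pandas DataFrame, df.corr() produces the same Pearson table by default.

```python
import numpy as np

rng = np.random.default_rng(0)
price = rng.uniform(10, 100, 50)
# Hypothetical metrics: sales fall as price rises, stock is unrelated noise
sales = 1000 - 5 * price + rng.normal(0, 20, 50)
stock = rng.uniform(0, 200, 50)

# np.corrcoef treats each row as one variable and returns a 3x3 matrix
corr = np.corrcoef([price, sales, stock])
print(np.round(corr, 2))
```

Values near -1 or 1 correspond to panels where the points fall close to a line. Values near 0 correspond to shapeless clouds, although a strong nonlinear pattern can also produce a correlation near 0.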
3. Sales Analysis of Products Over Quarters
Importance★★★☆☆
Difficulty★★☆☆☆
A retail company wants to analyze the sales performance of three products
(Product A, Product B, Product C) over four quarters. The company needs
to visualize this data to understand trends and make informed decisions.
Create a Python script to generate sample sales data for these products and
display a bar chart showing their sales across the four quarters.
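【Diagram Answer】
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Long-format sales table: one row per (product, quarter) pair
data = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product C'] * 4,
    'Quarter': ['Q1'] * 3 + ['Q2'] * 3 + ['Q3'] * 3 + ['Q4'] * 3,
    'Sales': np.random.randint(100, 500, size=12),  # assumed sales range
})
# Pivot so each quarter is a row and each product is a column
pivot = data.pivot(index='Quarter', columns='Product', values='Sales')
pivot.plot(kind='bar')
plt.title('Quarterly Sales by Product')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.xticks(rotation=0)
plt.legend(title='Products')
plt.show()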
In this exercise, we are focusing on visualizing sales data for three products
over four quarters using Python.
▸ Data Generation:
The first part of the code generates a DataFrame containing sales data for
three products across four quarters. The Product column lists the products,
the Quarter column indicates the respective quarters, and the Sales column
contains randomly generated sales figures.
▸ Data Pivoting:
The data is then pivoted to create a format suitable for plotting. This means
transforming the DataFrame so that each product's sales figures are
organized by quarter, allowing for a clear comparison across products.
▸ Plotting:
The plot method is used to create a bar chart. The kind='bar' argument
specifies that we want a bar chart. The title, x-label, and y-label are set to
make the chart informative. The xticks(rotation=0) ensures that the quarter
labels are horizontal for better readability. Finally, plt.show() displays the
chart.
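The pivoting step is easiest to see on a tiny table. In the sketch below, with two products and two quarters of hypothetical figures, DataFrame.pivot turns the long format (one row per observation) into the wide format that .plot(kind='bar') draws as grouped bars.

```python
import pandas as pd

# Long format: one row per (product, quarter) observation (hypothetical figures)
long_df = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product A', 'Product B'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2'],
    'Sales': [100, 150, 120, 90],
})

# Wide format: quarters become the index, products become columns
wide = long_df.pivot(index='Quarter', columns='Product', values='Sales')
print(wide)
print(wide.sum())  # total sales per product
```

pivot raises an error if the same product and quarter appear twice. If the raw data can contain repeated entries, pivot_table with an aggregation such as aggfunc='sum' is the safer choice.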
This exercise not only helps in understanding how to manipulate and
visualize data using Python but also emphasizes the importance of data
analysis in making business decisions. By visualizing sales trends,
companies can identify which products are performing well and which may
need further marketing efforts.
【Trivia】
Visualizing data through charts and graphs is a powerful way to
communicate insights effectively. Bar charts, in particular, are excellent for
comparing different categories, making them a staple in data analysis and
reporting.
4. Data Analysis with Python: Creating a Pie Chart for Activity Distribution
Importance★★★☆☆
Difficulty★★☆☆☆
A small company wants to analyze how its employees spend their time
during a typical workday. They are interested in understanding the
distribution of time spent on various activities, such as meetings, project
work, emails, and breaks. Your task is to create a pie chart that visualizes
this distribution. Generate the input data within your code.
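【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
activities = ['Meetings', 'Project Work', 'Emails', 'Breaks']
# Hypothetical minutes spent on each activity in a typical workday
time_spent = np.array([90, 240, 60, 60])
total_time = time_spent.sum()
percentages = (time_spent / total_time) * 100
plt.figure(figsize=(8, 6))
plt.pie(percentages, labels=activities, autopct='%1.1f%%')
plt.title('Distribution of Time Spent on Workday Activities')
plt.axis('equal')  # keep the pie circular
plt.show()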
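5. Creating a Histogram to Analyze Income Distribution in a City
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
# Annual incomes: normal, mean $50,000, SD $15,000 (1,000 residents assumed)
incomes = np.random.normal(50000, 15000, 1000)
plt.hist(incomes, bins=30, edgecolor='black')
plt.title('Income Distribution in the City')
plt.xlabel('Annual Income (USD)')
plt.ylabel('Number of Residents')
plt.show()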
In this exercise, the primary goal is to learn how to create and interpret a
histogram using Python, which is a key tool in data analysis and statistical
interpretation.
The provided data represents the annual income of residents in a city. This
data is generated using a normal distribution with a mean (average) income
of $50,000 and a standard deviation of $15,000. This is a simplification: real
income distributions are usually right-skewed, and a normal distribution can
even produce a few negative values, but it is adequate for practicing histograms.
The histogram is a type of bar chart that shows the frequency of data within
specified ranges (or "bins"). Each bar represents the number of data points
(incomes) that fall within a specific range. In this case, the bins=30
argument divides the income data into 30 intervals, allowing for a detailed
view of the distribution.
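To see what bins=30 does, you can compute the bins directly. matplotlib's hist uses np.histogram to build its bars, so the sketch below shows the same binning: 30 equal-width intervals from the smallest to the largest income, which needs 31 edges. The sample size of 1,000 residents is an assumption.

```python
import numpy as np

np.random.seed(0)
incomes = np.random.normal(50000, 15000, 1000)  # sample size is an assumption

# 30 equal-width bins spanning the smallest to the largest income
counts, edges = np.histogram(incomes, bins=30)
bin_width = edges[1] - edges[0]

print(len(counts), len(edges))  # 30 31
print(counts.sum())             # 1000: every income lands in exactly one bin
print(f'each bin spans about ${bin_width:,.0f}')
```

More bins show finer detail but noisier bar heights. Fewer bins give a smoother but coarser picture.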
The edgecolor='black' argument is used to make the bars visually distinct
by adding a black border around each one. This improves the clarity of the
histogram.
The plt.title(), plt.xlabel(), and plt.ylabel() functions are used to label the
graph, which is crucial for making the visualization understandable to
others. The title, "Income Distribution in the City," gives a clear indication
of what the histogram represents, while the x-axis and y-axis labels
("Annual Income (USD)" and "Number of Residents," respectively) provide
context to the plotted data.
By analyzing the histogram, you can identify trends such as the most
common income range, the spread of income levels, and whether the
distribution is skewed towards higher or lower incomes. This information is
valuable for making informed decisions in urban planning and policy-
making.
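Whether a distribution is skewed can be checked numerically as well as visually. The sketch below computes the Fisher-Pearson coefficient of skewness, which should match scipy.stats.skew with its default bias=True. The function name is hypothetical. A positive value means a long right tail, the typical shape of real incomes; a value near 0 indicates rough symmetry, as expected for the normally generated data in this exercise.

```python
import numpy as np

def sample_skewness(values):
    """Fisher-Pearson coefficient of skewness: third central moment / std**3."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (population variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5

print(sample_skewness([1, 2, 3, 4, 5]))       # 0.0: symmetric
print(sample_skewness([1, 1, 2, 2, 10]) > 0)  # True: long right tail
```

For right-skewed data the mean usually lies above the median, so comparing the two is another quick check.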
【Trivia】
Histograms are not only used in income analysis but also widely used in
various fields like quality control, weather forecasting, and finance. For
example, in finance, histograms are used to observe the distribution of
returns for an asset, which can help in assessing risk.
6. Logarithmic Regression Analysis for Sales Forecasting
Importance★★★☆☆
Difficulty★★★☆☆
A retail company has been tracking the sales performance of a new product
over the past several months.
The sales data shows rapid growth at first that then slows and begins to
level off, suggesting a logarithmic pattern.
As a data analyst, your task is to analyze this data and create a logarithmic
regression model to forecast future sales.
First, you need to generate synthetic sales data that follows a logarithmic
trend.
Then, plot this data along with the logarithmic regression curve.
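【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)  # assumed seed; the data below are hypothetical
months = np.arange(1, 25)
sales = 50 + 30 * np.log(months) + np.random.normal(0, 5, months.size)
# Logarithmic regression: sales = a + b*ln(month) is linear in ln(month)
b, a = np.polyfit(np.log(months), sales, 1)
plt.scatter(months, sales, label='Sales data')
plt.plot(months, a + b * np.log(months), color='red', label='Logarithmic fit')
plt.title('Logarithmic Regression of Sales')
plt.xlabel('Months')
plt.ylabel('Sales')
plt.legend()
plt.show()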
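The task also asks for a forecast of future sales. A logarithmic model y = a + b·ln(x) is linear in ln(x), so an ordinary straight-line fit on the transformed input estimates a and b. The fitted formula can then be evaluated at future months. The sketch below assumes noise-free hypothetical history (a = 50, b = 30) so the recovered coefficients are easy to check; the function names are illustrative.

```python
import numpy as np

def fit_log_model(x, y):
    """Least-squares fit of y = a + b * ln(x); x must be positive."""
    b, a = np.polyfit(np.log(x), y, 1)  # polyfit returns [slope, intercept]
    return a, b

def forecast(a, b, future_x):
    return a + b * np.log(future_x)

months = np.arange(1, 13)
sales = 50 + 30 * np.log(months)  # noise-free hypothetical history
a, b = fit_log_model(months, sales)
next_quarter = forecast(a, b, np.array([13, 14, 15]))
print(round(a, 2), round(b, 2))  # 50.0 30.0
print(np.round(next_quarter, 1))
```

The forecast keeps rising but by smaller amounts each month, which is the levelling-off behavior the company expects.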
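7. Generate a Heatmap from Random Data in Python
【Diagram Answer】
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
# 20x20 matrix of random floats in [0, 1), simulating sales volume
data = np.random.rand(20, 20)
plt.imshow(data, cmap='hot')
plt.colorbar()  # color scale legend
plt.title('Sales Volume by Region and Product Category')
plt.xlabel('Product Categories')
plt.ylabel('Regions')
plt.show()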
In this exercise, you will learn how to generate a heatmap using Python,
which is a powerful tool for visualizing data.
Data Generation: The first step involves creating a 20x20 matrix filled with
random values. This simulates the sales volume for different product
categories across various regions. The numpy library is utilized for this
purpose, specifically the np.random.rand(20, 20) function, which generates
a matrix of the specified dimensions filled with random floats in the
interval [0, 1).
Visualization: To visualize the data, we use the matplotlib library, which is
widely used for plotting in Python. The plt.imshow() function is employed
to display the matrix as an image. The cmap='hot' argument specifies the
color map to use, where lower values are darker and higher values are
lighter, effectively representing lower and higher sales volumes.
Enhancing the Plot: The plt.colorbar() function adds a color bar to the side
of the plot, indicating the scale of values represented by the colors. Titles
and labels for the axes are added using plt.title(), plt.xlabel(), and
plt.ylabel() to make the plot informative.
Displaying the Heatmap: Finally, plt.show() is called to render the heatmap
on the screen.
This exercise not only demonstrates how to create a heatmap but also
provides insights into data visualization techniques in Python, which is
essential for data analysis and reporting in various fields, including
business, healthcare, and scientific research.
【Trivia】
Heatmaps are commonly used in various fields such as finance, biology,
and marketing to identify trends and patterns in data. They provide a quick
visual representation that can help in making informed decisions based on
data analysis.
8. 3D Scatter Plot Generation Using Python
Importance★★★★☆
Difficulty★★★☆☆
A retail company wants to analyze customer purchasing behavior based on
three different features: age, income, and spending score. Your task is to
generate a 3D scatter plot with 200 data points representing these features.
Each point should represent a customer, with age ranging from 18 to 70,
income ranging from $30,000 to $120,000, and spending scores ranging
from 1 to 100. Create the data within your code.
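【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
age = np.random.randint(18, 71, size=200)  # ages 18-70 (upper bound exclusive)
income = np.random.randint(30000, 120001, size=200)  # $30,000-$120,000
spending_score = np.random.randint(1, 101, size=200)  # scores 1-100
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(age, income, spending_score, c='blue', marker='o')
ax.set_xlabel('Age')
ax.set_ylabel('Income ($)')
ax.set_zlabel('Spending Score')
ax.set_title('Customer Purchasing Behavior')
plt.show()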
In this exercise, you will learn how to generate a 3D scatter plot using
Python, which is a valuable skill in data analysis and visualization.
Understanding the Data: The data consists of three features: age, income,
and spending score. Each feature is important for understanding customer
behavior in a retail context.
Generating Random Data: The code uses the numpy library to create
random data points.
np.random.randint(18, 71, size=200) generates 200 random integers for age
between 18 and 70.
np.random.randint(30000, 120001, size=200) generates income values
between $30,000 and $120,000.
np.random.randint(1, 101, size=200) generates spending scores between 1
and 100.
▸ Creating the 3D Scatter Plot:
The matplotlib library is used for plotting.
A figure is created using plt.figure(), and a 3D subplot is added with
fig.add_subplot(111, projection='3d').
The ax.scatter() function plots the data points in 3D space, where c='blue'
specifies the color of the points and marker='o' specifies the shape of the
points.
▸ Labeling Axes and Title:
The axes are labeled using ax.set_xlabel(), ax.set_ylabel(), and
ax.set_zlabel(), which helps in understanding what each axis represents.
A title is added to the plot using ax.set_title().
Displaying the Plot: Finally, plt.show() is called to render the plot on the
screen.
This exercise not only helps you practice data generation and visualization
but also enhances your understanding of how to represent multi-
dimensional data effectively.
【Trivia】
Did you know that data visualization is a crucial step in data analysis? It
helps to identify patterns, trends, and outliers in the data, making it easier to
communicate findings to stakeholders. The 3D scatter plot is particularly
useful when dealing with three variables, allowing for a more
comprehensive view of the data relationships.
9. Visualizing Stock Prices with Python
Importance★★★★☆
Difficulty★★★☆☆
A financial analyst wants to visualize the stock prices of a company over
the course of a year to identify trends and patterns. Your task is to create a
line plot using Python that displays the stock prices for each month.
Generate the input data within your code.
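【Diagram Answer】
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Month-end dates for 2023 (use freq='M' on pandas versions older than 2.2)
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='ME')
# Random monthly prices between 100 and 200
prices = np.random.uniform(100, 200, len(dates))
data = pd.DataFrame({'Date': dates, 'Price': prices})
plt.figure(figsize=(10, 5))
plt.plot(data['Date'], data['Price'], marker='o')
plt.title('Monthly Stock Prices in 2023')
plt.xlabel('Date')
plt.ylabel('Price')
plt.xticks(rotation=45)
plt.grid()
plt.tight_layout()
plt.show()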
In this exercise, you will learn how to visualize stock prices using Python, a
crucial skill for data analysis and statistical interpretation.
Data Generation: The code begins by importing necessary libraries: numpy,
pandas, and matplotlib.pyplot.
numpy is used for numerical operations, while pandas is essential for data
manipulation and analysis.
matplotlib.pyplot is the plotting interface of Matplotlib, the library that
allows you to create static, animated, and interactive visualizations in Python.
Creating Dates: The pd.date_range() function generates dates
from January 1, 2023, to December 31, 2023, with a month-end frequency
(freq='M'; pandas 2.2 and later spell this alias 'ME'). This creates the last
day of each month within the specified range.
Generating Prices: The np.random.uniform() function generates random
stock prices between 100 and 200 for each month. This simulates the stock
price data for the year.
DataFrame Creation: A DataFrame is created using pd.DataFrame(), which
organizes the dates and prices into a structured format that can be easily
manipulated and visualized.
Plotting the Data: The plt.figure() function sets the size of the plot. The
plt.plot() function is used to create a line plot, where data['Date'] is on the x-
axis and data['Price'] is on the y-axis. The marker='o' argument adds
markers to each data point.
Adding Titles and Labels: The plt.title(), plt.xlabel(), and plt.ylabel()
functions add a title and labels to the axes, enhancing the readability of the
plot.
Formatting the X-axis: The plt.xticks(rotation=45) function rotates the x-
axis labels for better visibility.
Displaying the Grid: The plt.grid() function adds a grid to the plot, making
it easier to read the values.
Final Adjustments: The plt.tight_layout() function adjusts the padding of
the plot to make sure everything fits well without overlapping.
Showing the Plot: Finally, plt.show() displays the plot.
This exercise not only helps you understand how to visualize data in Python
but also emphasizes the importance of data analysis in making informed
business decisions.
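The month-end alias used in the Creating Dates step changed name in pandas 2.2 ('M' became 'ME'). One version-independent option, sketched below, is to pass the pd.offsets.MonthEnd offset object as the frequency. This is an alternative to the string alias, not the book's code.

```python
import pandas as pd

# MonthEnd offset: equivalent to freq='M' (older pandas) or freq='ME' (pandas 2.2+)
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq=pd.offsets.MonthEnd())
print(len(dates))                          # 12
print(dates[0].date(), dates[-1].date())   # 2023-01-31 2023-12-31
```

The first date is January 31 rather than January 1, because a month-end frequency snaps each point to the last day of its month.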
【Trivia】
Did you know that data visualization is a powerful tool in data analysis? It
helps to convey complex data insights in a clear and understandable
manner, making it easier for stakeholders to make informed decisions based
on visual trends and patterns.
10. Creating a Bar Chart to Visualize Employee Distribution Across Departments
Importance★★★☆☆
Difficulty★★☆☆☆
You have been hired by a mid-sized company to help them analyze their
workforce distribution across different departments. The HR department
wants a clear visualization to understand which departments have the
highest and lowest number of employees. Your task is to create a Python
script that generates a bar chart to display the number of employees in five
different departments. Please generate the data within the script, ensuring
the values are realistic for a company of this size.
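【Code Answer】
import matplotlib.pyplot as plt
departments = ['Sales', 'Engineering', 'HR', 'Marketing', 'Finance']
# Hypothetical headcounts for a mid-sized company
employee_count = [45, 80, 12, 25, 18]
plt.bar(departments, employee_count)
plt.xlabel('Departments')
plt.ylabel('Number of Employees')
plt.title('Employee Distribution Across Departments')
plt.show()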
To solve this problem, the first step is to import the necessary library,
matplotlib.pyplot, which is a common library used for creating
visualizations in Python. This problem involves creating a bar chart, so we
start by generating sample data that represents the number of employees in
different departments. In this case, we are considering five departments:
Sales, Engineering, HR, Marketing, and Finance. Each department is
associated with a corresponding number of employees, which is stored in
the employee_count list. The plt.bar() function is then used to create the bar
chart, where the first argument represents the categories (departments) and
the second argument represents the values (employee count). Labels for the
x-axis, y-axis, and the chart title are added using plt.xlabel(), plt.ylabel(),
and plt.title(), respectively. Finally, plt.show() is called to display the bar
chart. This exercise is beneficial for learning how to create basic
visualizations in Python, which is a crucial skill in data analysis and
reporting.
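HR's actual question is which departments have the highest and lowest headcounts, so it helps to compute that explicitly and sort the bars accordingly. The sketch below uses plain Python on hypothetical headcounts for the five departments.

```python
# Hypothetical headcounts for five departments
employee_count = {'Sales': 45, 'Engineering': 80, 'HR': 12, 'Marketing': 25, 'Finance': 18}

largest = max(employee_count, key=employee_count.get)
smallest = min(employee_count, key=employee_count.get)

# Sorting from largest to smallest makes the ranking easy to read off a bar chart
ranked = sorted(employee_count.items(), key=lambda item: item[1], reverse=True)
print('largest:', largest, '| smallest:', smallest)  # largest: Engineering | smallest: HR
print([name for name, _ in ranked])
```

To plot the sorted version, unpack the pairs with names, counts = zip(*ranked) and pass both to plt.bar.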
【Trivia】
Did you know that bar charts are one of the most widely used chart types
for data visualization? They are particularly useful for comparing the
quantities of different categories, making them ideal for situations like this
where you want to compare the number of employees across
departments. Bar charts can be oriented either horizontally or vertically,
depending on what best suits the data being presented.
11. Vehicle Distribution Analysis in a City
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a city transportation department.
The department wants to analyze the distribution of different types of
vehicles within the city to optimize traffic flow and resource allocation.
Your task is to generate a pie chart that visually represents the distribution
of various types of vehicles (e.g., cars, buses, trucks, motorcycles) in the
city.
The input data is not provided; you need to generate a sample dataset
representing the number of each type of vehicle.
Write the Python code required to create this dataset and plot the pie chart
using Matplotlib.
The chart should clearly show the percentage distribution of each vehicle
type.
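【Code Answer】
import random
import matplotlib.pyplot as plt
vehicle_types = ['Cars', 'Buses', 'Trucks', 'Motorcycles']
vehicle_counts = [random.randint(100, 1000) for _ in vehicle_types]  # assumed range
plt.pie(vehicle_counts, labels=vehicle_types, autopct='%1.1f%%')  # labels and percentages
plt.title('Vehicle Distribution in the City')
plt.show()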
In this exercise, you are asked to analyze the distribution of different types
of vehicles within a city by generating a pie chart.
The purpose of this task is to practice Python data analysis and visualization
techniques.
You begin by importing the necessary library, matplotlib.pyplot, which is a
powerful plotting library in Python.
The random module is used to generate sample data for the different vehicle
types.
The vehicle types are stored in the list vehicle_types, and the corresponding
counts of each vehicle type are generated randomly and stored in
vehicle_counts.
The plt.pie() function is used to create the pie chart, with labels parameter
assigning the vehicle types to each slice of the pie, and autopct displaying
the percentage of each type on the chart.
Finally, the plt.show() function displays the chart, providing a visual
representation of the vehicle distribution.
This type of analysis is practical for real-world applications where visual
data representation can inform decision-making processes in fields such as
transportation and urban planning.
【Trivia】
The pie chart is generally credited to William Playfair, who used it in his
Statistical Breviary (1801) to represent data visually. It has since become a standard tool in data
visualization for illustrating proportional data. However, experts advise
using pie charts only when the data categories are limited in number, as too
many slices can make the chart difficult to interpret.
12. Analyzing the Weight Distribution of Individuals
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a health and wellness company.
The company is conducting a study to understand the weight distribution
among a sample of 300 individuals.
Your task is to generate a histogram that visualizes this weight distribution.
Additionally, analyze the distribution to determine if it follows a normal
distribution and describe the central tendency of the data.
Use Python to simulate the data and create the histogram.
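【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)  # assumed seed, for reproducibility
# Hypothetical weights for 300 individuals: normal, mean 70 kg, SD 10 kg
weights = np.random.normal(70, 10, 300)
plt.hist(weights, bins=20, edgecolor='black')
plt.title('Histogram of Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()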
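The task also asks whether the weights look normal and what their central tendency is. The sketch below is an informal check under the assumption of hypothetical normal data (mean 70 kg, SD 10 kg). It compares the mean and median, which are close for a symmetric distribution, and measures the share of values within one standard deviation, about 68% for a normal distribution. For a formal test, functions such as scipy.stats.shapiro could be used instead.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(70, 10, 300)  # hypothetical: mean 70 kg, SD 10 kg

mean, median, std = weights.mean(), np.median(weights), weights.std(ddof=1)
# For a normal distribution, about 68% of values lie within one SD of the mean
within_1sd = np.mean(np.abs(weights - mean) <= std)

print(f'mean {mean:.1f} kg, median {median:.1f} kg, SD {std:.1f} kg')
print(f'{within_1sd:.0%} of individuals lie within one SD of the mean')
```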
【Trivia】
Histograms are one of the most basic yet powerful tools in statistical
analysis.
They provide an immediate visual summary of the distribution of a dataset,
making it easier to understand underlying patterns.
In real-world applications, histograms are frequently used in quality control
processes, economics, and any field where understanding data distribution
is crucial.
13. Quadratic Regression with Synthetic Data
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a company that wants to model the
relationship between advertising spend and sales.
The company believes that the relationship is quadratic, meaning that after
a certain point, additional spending results in diminishing returns.
Your task is to create a synthetic dataset that simulates this scenario and
then plot a quadratic regression curve to visualize the relationship.
Use Python to generate the data and plot the curve.
import numpy as np
np.random.seed(0)
X = 2 - 3 * np.random.normal(0, 1, 100)
Y = X - 2 * (X ** 2) + np.random.normal(-3, 3, 100)
【Diagram Answer】
【Code Answer】
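import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
np.random.seed(0)
X = 2 - 3 * np.random.normal(0, 1, 100)
Y = X - 2 * (X ** 2) + np.random.normal(-3, 3, 100)
# Reshape data into column vectors, since scikit-learn expects 2-D inputs
X = X[:, np.newaxis]
Y = Y[:, np.newaxis]
# Expand X into [1, X, X^2] so that a linear model can fit a quadratic curve
polynomial_features = PolynomialFeatures(degree=2)
X_poly = polynomial_features.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, Y)
Y_poly_pred = model.predict(X_poly)
# Sort by X so the fitted curve is drawn as one smooth line
order = np.argsort(X[:, 0])
plt.scatter(X, Y, s=10, label='Data')
plt.plot(X[order], Y_poly_pred[order], color='red', label='Quadratic fit')
plt.title('Quadratic Regression')
plt.xlabel('Advertising Spend')
plt.ylabel('Sales')
plt.legend()
plt.show()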
‣ The task involves generating synthetic data to simulate a real-world
scenario where the relationship between two variables is quadratic.
‣ First, we use NumPy to create random data points for X, which represent
advertising spend, and Y, which represent sales.
‣ The relationship is defined by the quadratic equation Y = X - 2 * (X ** 2)
+ noise, where the noise is drawn from a normal distribution with mean -3
and standard deviation 3 to simulate variability.
‣ The PolynomialFeatures class from sklearn.preprocessing is used to
transform the input data X to include polynomial terms up to the specified
degree (in this case, 2 for quadratic).
‣ We then fit a linear regression model using these polynomial features.
This allows us to model non-linear relationships by transforming the input
space.
‣ The LinearRegression class from sklearn.linear_model is used to fit the
model to the polynomial-transformed data.
‣ Finally, we plot the original data points and the quadratic regression curve
using Matplotlib. The scatter plot shows the data points, and the line plot
shows the fitted quadratic curve.
‣ The plot is labeled with titles and axis labels to make it clear what the
data represents.
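Because the data were generated from Y = -2X² + X plus noise with mean -3, a correct quadratic fit should recover coefficients close to -2, 1 and -3. A quick cross-check is sketched below with NumPy's np.polyfit instead of scikit-learn; this is a swapped-in tool that fits the same least-squares quadratic, not the method used in the answer code.

```python
import numpy as np

np.random.seed(0)
X = 2 - 3 * np.random.normal(0, 1, 100)
Y = X - 2 * (X ** 2) + np.random.normal(-3, 3, 100)

# Least-squares fit of Y = a*X^2 + b*X + c; polyfit returns the highest power first
a, b, c = np.polyfit(X, Y, deg=2)
print(f"Fitted curve: Y = {a:.2f}*X^2 + {b:.2f}*X + {c:.2f}")

# R^2 of the fit: the share of the variance in Y explained by the quadratic
Y_hat = np.polyval([a, b, c], X)
r_squared = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)
print(f"R^2: {r_squared:.3f}")
```

The intercept absorbs the -3 mean of the noise term, so c should land near -3 rather than 0.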
【Trivia】
‣ Quadratic regression is a type of polynomial regression that is used when
data shows a parabolic trend.
‣ It is particularly useful in scenarios where there is an initial increase in
response with an increase in the predictor variable, followed by a decrease.
‣ This type of analysis can be applied in various fields, such as economics,
biology, and engineering, to model complex relationships.
14. Creating a Box Plot to Compare Product
Prices Across Categories
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst at a retail company. The company wants to
understand the price distribution of products in four different categories:
Electronics, Furniture, Clothing, and Groceries. Your task is to create a box
plot that visually compares the price distributions across these four
categories. First, generate a sample dataset with random prices for each
category. Then, use this dataset to create the box plot. Ensure that the plot
clearly shows the median, quartiles, and any outliers for each category.
【Data Generation Code Example】
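import numpy as np
import pandas as pd
np.random.seed(42)
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
data = {'Category': np.random.choice(categories, 200),
        'Price': np.random.uniform(5, 500, 200)}
df = pd.DataFrame(data)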
【Diagram Answer】
【Code Answer】
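import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
# 200 products, each with a random category and a uniform price in [5, 500)
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
data = {'Category': np.random.choice(categories, 200),
        'Price': np.random.uniform(5, 500, 200)}
df = pd.DataFrame(data)
plt.figure(figsize=(10, 6))
# One price array per category, in the same order as the labels
plt.boxplot([df[df['Category'] == c]['Price'] for c in categories],
            labels=categories)
plt.title('Price Distribution by Product Category')
plt.xlabel('Category')
plt.ylabel('Price ($)')
plt.show()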
The task is to create a box plot that compares the price distributions of
products across four different categories.
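A box plot is useful for visualizing the distribution of data based on five
summary statistics: the minimum, first quartile (Q1), median, third quartile
(Q3), and maximum.
In Matplotlib's default style, the whiskers extend to the most extreme data
points within 1.5 times the interquartile range (Q3 - Q1) of the box, and any
points beyond them are drawn individually as outliers, making it easier to
identify unusual data points.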
The code starts by importing necessary libraries such as NumPy, pandas,
and Matplotlib.
NumPy is used to generate random prices, while pandas is used to organize
the data into a DataFrame.
Matplotlib is the library used to create the box plot.
The sample data is generated by first creating a list of categories and then
randomly selecting a category for each of the 200 products.
For each product, a random price is generated using np.random.uniform,
which draws prices uniformly between 5 and 500.
In the plotting section, a figure of size 10x6 inches is created.
The plt.boxplot function is used to generate the box plot.
The function takes as input a list of price arrays, each corresponding to a
different category.
The labels parameter names each box on the x-axis after its category, and a
title and axis labels are added for clarity.
Finally, plt.show() displays the plot, showing the price distribution for each
category, which allows the retail company to easily compare the pricing
patterns of different product categories.
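To see the numbers a box plot draws, the quartiles and Matplotlib's default outlier fences (1.5 times the IQR beyond the box) can be computed directly. This pandas sketch regenerates data in the way the explanation describes (200 products, random categories, uniform prices between 5 and 500); the generation details are assumptions based on that description.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
df = pd.DataFrame({
    'Category': np.random.choice(categories, 200),  # random category per product
    'Price': np.random.uniform(5, 500, 200),       # uniform prices in [5, 500)
})

# Quartiles per category: the box spans Q1..Q3 with the median line inside
summary = df.groupby('Category')['Price'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['Q1', 'Median', 'Q3']
summary['IQR'] = summary['Q3'] - summary['Q1']

# Points outside these fences would be drawn as individual outlier markers
summary['Lower fence'] = summary['Q1'] - 1.5 * summary['IQR']
summary['Upper fence'] = summary['Q3'] + 1.5 * summary['IQR']
print(summary.round(1))
```

With uniformly distributed prices the fences fall outside the 5 to 500 range, so no outliers are expected here; skewed real-world prices would typically produce some.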
【Trivia】
The box plot, also known as a box-and-whisker plot, was introduced by John
Tukey in 1970. It's a standard way of displaying the distribution of data
based on a five-number summary. Box plots are especially useful in
exploratory data analysis for identifying outliers and understanding the
central tendency and variability of the data.
15. Generating and Analyzing a Heatmap from a
25x5 Matrix of Random Values
Importance★★★☆☆
Difficulty★★☆☆☆
A retail company wants to analyze the distribution of sales across different
regions to identify potential areas for expansion.
They have divided their target market into a 25x5 grid, with each cell
representing a different region.
To simulate and analyze this data, generate a heatmap from a 25x5 matrix
filled with random sales data.
After generating the heatmap, provide insights into how the data is
distributed and identify any patterns or anomalies.
Use Python's data analysis and visualization libraries to create the heatmap.
Do not use any external data sources; generate the data within your code.
import numpy as np
【Code Answer】
import numpy as np
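The answer listing for this exercise is cut off after its import line. One way to produce the 25x5 heatmap is sketched below, assuming sales values drawn uniformly between 0 and 100 (the range and units are not specified in the exercise) and using Matplotlib's imshow with a colorbar. It also prints the highest- and lowest-selling cells as a starting point for spotting anomalies.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
# Assumed: 25 x 5 grid of regions, sales drawn uniformly from [0, 100)
sales = np.random.uniform(0, 100, size=(25, 5))

fig, ax = plt.subplots(figsize=(5, 10))
im = ax.imshow(sales, cmap='viridis', aspect='auto')
fig.colorbar(im, ax=ax, label='Sales')
ax.set_title('Simulated Sales by Region (25x5 grid)')
ax.set_xlabel('Column')
ax.set_ylabel('Row')

# Locate the extreme cells as (row, column) grid coordinates
top = tuple(int(i) for i in np.unravel_index(np.argmax(sales), sales.shape))
bottom = tuple(int(i) for i in np.unravel_index(np.argmin(sales), sales.shape))
print(f"Highest sales at {top}: {sales[top]:.1f}")
print(f"Lowest sales at {bottom}: {sales[bottom]:.1f}")
plt.show()
```

Because every cell is an independent random draw, any apparent pattern here is noise; with real sales data, clusters of bright cells would point to candidate regions for expansion.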
【Trivia】
The concept of heatmaps originated in the 19th century when early forms of
heatmaps were used to show temperature variations.
Today, heatmaps are widely used in various fields, including web analytics,
biology, and finance, to visualize data and identify patterns.
16. Creating Violin Plots for Activity Duration
Analysis
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a fitness app company.
The company has collected data on the durations of six different activities:
Running, Cycling, Swimming, Yoga, Weightlifting, and Meditation.
Your task is to create a violin plot to visualize the distribution of durations
for these activities.
This will help the company understand which activities have the most
variability in duration.
Generate the data within your code and ensure that the plot is clear and
informative.
import numpy as np
import pandas as pd
np.random.seed(42)
【Code Answer】
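import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
activities = ['Running', 'Cycling', 'Swimming', 'Yoga', 'Weightlifting',
              'Meditation']
means = [40, 60, 45, 50, 55, 20]  # hypothetical mean durations (minutes)
stds = [10, 20, 12, 8, 15, 5]     # hypothetical standard deviations
df = pd.DataFrame({'Activity': np.repeat(activities, 100),
                   'Duration': np.concatenate([np.random.normal(m, s, 100)
                                               for m, s in zip(means, stds)])})
sns.violinplot(x='Activity', y='Duration', data=df)
plt.title('Distribution of Activity Durations')
plt.xlabel('Activity')
plt.ylabel('Duration (minutes)')
plt.xticks(rotation=45)
plt.show()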
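The company's question is which activities vary most in duration. Besides reading the widths of the violins, the spread can be quantified per activity. The sketch below uses hypothetical durations (the means and standard deviations are assumptions, not values from the exercise) and ranks activities by standard deviation and interquartile range.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
activities = ['Running', 'Cycling', 'Swimming', 'Yoga', 'Weightlifting',
              'Meditation']
means = [40, 60, 45, 50, 55, 20]  # hypothetical mean durations (minutes)
stds = [10, 20, 12, 8, 15, 5]     # hypothetical standard deviations
df = pd.DataFrame({'Activity': np.repeat(activities, 100),
                   'Duration': np.concatenate([np.random.normal(m, s, 100)
                                               for m, s in zip(means, stds)])})

# Spread per activity: standard deviation and interquartile range (IQR)
grouped = df.groupby('Activity')['Duration']
spread = pd.DataFrame({'Std': grouped.std(),
                       'IQR': grouped.quantile(0.75) - grouped.quantile(0.25)})
print(spread.sort_values('Std', ascending=False).round(1))
```

The activity at the top of this table should also have the widest, tallest violin in the plot.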
import numpy as np
x = np.linspace(-6, 6, 100)
y = np.linspace(-6, 6, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
【Diagram Answer】
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 100)
y = np.linspace(-6, 6, 100)
x, y = np.meshgrid(x, y)
# Radial distance from the origin, so the surface forms concentric ripples
z = np.sin(np.sqrt(x**2 + y**2))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
【Trivia】
‣ The function f(x, y) = sin(sqrt(x^2 + y^2)) is known as the "ripple"
function because it creates a pattern similar to ripples on a pond.
‣ 3D surface plots are commonly used in various fields, including finance,
engineering, and physics, to visualize complex functions and data.
‣ The matplotlib library is one of the most widely used plotting libraries in
Python, offering extensive capabilities for 2D and 3D plotting.
18. Rainfall Data Analysis Using Python
Importance★★★★☆
Difficulty★★★☆☆
A local agricultural company is interested in analyzing the rainfall data over
the past year to make informed decisions about crop irrigation. Your task is
to create a line plot showing the monthly rainfall for the last 12 months. The
rainfall data should be generated within the code itself.
【Data Generation Code Example】
import numpy as np
import pandas as pd
【Code Answer】
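import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
data = pd.DataFrame({'Month': np.arange(1, 13),  # months 1-12
                     'Rainfall': np.random.uniform(20, 200, 12)})  # assumed range (mm)
plt.figure(figsize=(10, 5))
plt.plot(data['Month'], data['Rainfall'], marker='o')
plt.title('Monthly Rainfall Over the Last 12 Months')
plt.xlabel('Month')
plt.ylabel('Rainfall (mm)')
plt.xticks(data['Month'])
plt.grid()
plt.show()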
【Trivia】
Did you know that rainfall data is crucial for predicting crop yields?
Accurate rainfall analysis can significantly enhance agricultural
productivity.
Python's data visualization capabilities make it a popular choice among data
scientists for analyzing trends and patterns in various fields, including
agriculture, finance, and health.
19. Scatter Plot Matrix for Customer Purchase
Data Analysis
Importance★★★★☆
Difficulty★★★☆☆
A retail company wants to understand the relationship between different
factors that influence customer purchasing behavior. They have gathered
data on 8 different variables, including age, income, product category
preference, purchase frequency, average spending, time spent on the
website, customer satisfaction, and the number of products reviewed.
The company needs a detailed analysis to identify patterns or correlations
among these variables to better target their marketing efforts.
Generate a scatter plot matrix of the 8-dimensional dataset to visually
explore potential relationships between these variables. The scatter plot
matrix should be created using Python.
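import numpy as np
import pandas as pd
columns = ['Age', 'Income', 'Category', 'Frequency', 'Spending', 'WebTime', 'Satisfaction', 'Reviews']
# Assumed: 200 customers with uniform random values; column names are hypothetical
data = np.random.rand(200, 8)
df = pd.DataFrame(data, columns=columns)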
【Diagram Answer】
【Code Answer】
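import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = ['Age', 'Income', 'Category', 'Frequency', 'Spending', 'WebTime', 'Satisfaction', 'Reviews']
data = np.random.rand(200, 8)  # assumed: 200 customers, uniform values; names hypothetical
df = pd.DataFrame(data, columns=columns)
pd.plotting.scatter_matrix(df, figsize=(14, 14))
plt.show()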
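A scatter plot matrix shows pairwise relationships visually; a correlation matrix puts a number on each pair and makes the strongest relationships easy to list. The sketch below uses hypothetical column names based on the task description and purely random data (an assumption), so all correlations should be near zero; with real customer data, the same code would surface the pairs worth a closer look in the matrix.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Hypothetical column names based on the eight variables in the task
columns = ['Age', 'Income', 'Category', 'Frequency',
           'Spending', 'WebTime', 'Satisfaction', 'Reviews']
df = pd.DataFrame(np.random.rand(200, 8), columns=columns)

corr = df.corr()  # Pearson correlation for every pair of columns

# Keep each pair once (upper triangle, diagonal excluded) and rank by |r|
rows, cols = np.triu_indices(len(columns), k=1)
pairs = pd.Series(corr.to_numpy()[rows, cols],
                  index=[f"{columns[i]} vs {columns[j]}" for i, j in zip(rows, cols)])
pairs = pairs.sort_values(key=np.abs, ascending=False)
print(pairs.head(5).round(3))
```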
【Trivia】
Did you know that scatter plot matrices are often referred to as "sploms"?
The term "splom" stands for Scatter PLOt Matrix. Scatter plot matrices belong
to the exploratory data analysis tradition associated with John W. Tukey, a
pioneering statistician. These matrices are particularly useful when
dealing with multivariate data, as they provide a compact way to visualize
the relationships between all pairs of variables in a single view.
20. Creating a Bar Chart to Compare Company
Profits Over Three Years
Importance★★★★☆
Difficulty★★★☆☆
You are a financial analyst working for a consulting firm.
Your client, a portfolio manager, has requested a comparative analysis of
the profits of four companies over the past three years.
The goal is to visualize the trend in profits and identify which company has
shown the most consistent growth.
You need to create a bar chart that clearly displays the profits of these
companies across the three-year period.
This chart will be used to help the client make decisions on which company
to invest in further.
Generate the data for the companies’ profits and write Python code to
produce the required bar chart.
Focus on using data analysis and visualization techniques efficiently to
convey the needed insights.
import numpy as np
import pandas as pd
【Code Answer】
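import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Hypothetical companies and years; profits are random integers (assumed range 50-149)
companies = ['Company A', 'Company B', 'Company C', 'Company D']
years = ['2021', '2022', '2023']
df = pd.DataFrame(np.random.randint(50, 150, size=(4, 3)), index=companies, columns=years)
df.T.plot(kind='bar')  # transpose: years on the x-axis, one bar per company
plt.title('Company Profits Over Three Years')
plt.xlabel('Year')
plt.ylabel('Profit')
plt.xticks(rotation=0)
plt.legend(title='Companies')
plt.show()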
This exercise focuses on creating a bar chart to compare the profits of four
companies over three years.
To begin, random data is generated using numpy to simulate the profits for
each company in each year.
This data is structured in a DataFrame using pandas, with companies as the
rows and years as the columns.
The data is then transposed to facilitate plotting, placing years on the x-axis
and profits on the y-axis.
The matplotlib library is used to create a bar chart, which is an effective
way to visually compare the profits across different years and companies.
The chart is customized with a title, axis labels, and a legend, which helps
in making the data easily interpretable.
This type of visualization is particularly useful in financial analysis, as it
allows stakeholders to quickly assess performance trends over time.
By plotting the data, analysts can provide insights into which companies are
consistently performing well, making it easier for clients to make informed
investment decisions.
The main learning points include data manipulation using pandas, creating
visualizations with matplotlib, and understanding how to interpret bar
charts in a business context.
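The client's actual question, which company shows the most consistent growth, can be answered numerically as well as visually. A small pandas sketch computes year-over-year growth per company and how much it varies. Company names, years and the profit range are hypothetical; the layout (companies as rows, years as columns) follows the explanation above. With only three years, this is a rough indicator at best.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Hypothetical companies, years and profit range
companies = ['Company A', 'Company B', 'Company C', 'Company D']
years = ['2021', '2022', '2023']
df = pd.DataFrame(np.random.randint(50, 150, size=(4, 3)),
                  index=companies, columns=years)

# Year-over-year growth: each year's profit divided by the previous year's, minus 1
growth = pd.DataFrame(df.iloc[:, 1:].to_numpy() / df.iloc[:, :-1].to_numpy() - 1,
                      index=companies, columns=years[1:])

summary = pd.DataFrame({
    'Mean growth': growth.mean(axis=1),
    'Growth std': growth.std(axis=1),  # lower = steadier year-to-year growth
    'Always grew': (growth > 0).all(axis=1),
})
print(summary.sort_values('Growth std').round(3))
```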
【Trivia】
Bar charts are among the most common types of visualizations used in
business analytics due to their simplicity and clarity.
They allow easy comparison of different groups, making them ideal for
displaying performance metrics like profit, sales, and other financial data.
21. Creating a Histogram for Product Length
Distribution Analysis
Importance★★★★☆
Difficulty★★☆☆☆
A company is analyzing the length distribution of its newly manufactured
products to ensure consistency in production quality. You have been given
the task of visualizing the distribution of lengths for 400 products.
Create a Python script that generates a histogram to represent the
distribution of these product lengths.
The data for the lengths should be generated randomly within a realistic
range that a company might expect for their product, such as between 50
cm and 150 cm.
The script should plot the histogram and provide labels for both axes.
import random
【Code Answer】
import random
In this exercise, we generate random length data for 400 products using the
random.uniform function, which creates floating-point numbers within a
specified range—in this case, between 50 and 150 cm.
This range is chosen to reflect a plausible variation in product lengths,
depending on what the company manufactures.
The lengths are stored in a list called lengths, which is then used as input
for the histogram.
The histogram is created using the plt.hist() function from the Matplotlib
library, where the data is grouped into 20 bins. The bins parameter
determines how the data is divided on the x-axis, with each bin representing
a range of product lengths.
The edgecolor='black' parameter is used to add a black border around each
bin, making the individual bins easier to distinguish.
The x-axis (plt.xlabel) and y-axis (plt.ylabel) are labeled to indicate that the
x-axis represents the product lengths in centimeters, while the y-axis shows
the frequency of products falling within each length range.
A title is added using plt.title() to give context to the histogram, and the
plt.grid(True) function is used to add a grid to the plot, making it easier to
read the values. Finally, plt.show() is called to display the histogram.
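The answer listing stops after import random, but the explanation spells out every step. Putting those steps together gives a sketch like the following. The variable name lengths, the 50 to 150 cm range, the 20 bins and the black edges come from the explanation; the seed and the title text are assumptions.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)  # assumed seed, for reproducible output
# 400 product lengths drawn uniformly between 50 and 150 cm
lengths = [random.uniform(50, 150) for _ in range(400)]

plt.hist(lengths, bins=20, edgecolor='black')  # 20 equal-width bins over the data range
plt.title('Distribution of Product Lengths')
plt.xlabel('Length (cm)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
```

Because the draws are uniform, expect a roughly flat histogram; real production lengths would usually cluster around a target value.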
【Trivia】
Histograms are one of the most effective ways to visualize the distribution
of data, especially when you need to quickly understand the spread and
concentration of values within a dataset. They are widely used in quality
control processes across various industries to ensure that product
dimensions stay within acceptable limits.
22. Comparing Temperature Data Across Cities
Using Python
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst at a weather monitoring company. Your manager has
asked you to create a box plot comparing the temperatures recorded in five
different cities over the past week. The cities are New York, Los Angeles,
Chicago, Houston, and Miami. Use Python to generate the necessary data
and create the box plot.
【Data Generation Code Example】
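import numpy as np
import pandas as pd
np.random.seed(0)
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
means = [15, 22, 12, 25, 28]  # assumed mean temperature (°C) per city
# Seven daily readings per city; the 3 °C standard deviation is an assumption
temperature_data = {c: np.random.normal(loc=m, scale=3, size=7) for c, m in zip(cities, means)}
df = pd.DataFrame(temperature_data)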
【Diagram Answer】
【Code Answer】
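import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
means = [15, 22, 12, 25, 28]  # assumed mean temperature (°C) per city
# Seven daily readings per city; the 3 °C standard deviation is an assumption
temperature_data = {c: np.random.normal(loc=m, scale=3, size=7) for c, m in zip(cities, means)}
df = pd.DataFrame(temperature_data)
plt.figure(figsize=(10, 6))
plt.boxplot([df[c] for c in cities], labels=cities)  # one box per city
plt.title('Temperature Distribution Across Cities (Past Week)')
plt.ylabel('Temperature (°C)')
plt.grid()
plt.show()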
In this exercise, you will learn how to create a box plot in Python using the
Matplotlib library, which is a powerful tool for data visualization. A box
plot provides a visual summary of the central tendency, dispersion, and
skewness of a dataset. It shows the median, quartiles, and potential outliers,
making it an excellent choice for comparing distributions across different
groups—in this case, the temperatures in five cities.
To begin, you will generate synthetic temperature data for each city using
the NumPy library. The np.random.normal function is used to create
normally distributed data points, where loc specifies the mean temperature
for each city, and scale determines the standard deviation. The size
parameter indicates the number of data points generated.
Next, you will organize this data into a Pandas DataFrame, which allows for
easy manipulation and plotting. The DataFrame will contain columns
corresponding to each city, filled with the generated temperature data.
Finally, you will use Matplotlib to create the box plot. The plt.boxplot
function takes a list of data arrays (one for each city) and plots them. You
will also set the title and label the y-axis to indicate that the temperatures
are measured in degrees Celsius. The plt.grid() function adds a grid to the
plot for better readability, and plt.show() displays the plot.
This exercise will help you understand how to visualize data effectively,
which is a crucial skill in data analysis and statistics.
【Trivia】
Did you know that box plots are particularly useful for identifying outliers
in your data? Outliers are data points that fall significantly outside the range
of the rest of the data, and box plots visually highlight these points,
allowing analysts to investigate them further.
23. Generate and Analyze a Heatmap from
Random Data
Importance★★★★☆
Difficulty★★★☆☆
A market research company wants to visualize the distribution of customer
satisfaction scores across various products. They want you to simulate a
30x30 matrix representing these scores, where each element in the matrix is
a random value between 0 and 1. Your task is to generate this data, create a
heatmap, and analyze any patterns or anomalies that might be visible. Write
the code to generate the heatmap and explain how such visualizations can
be useful for identifying trends or outliers in the data.
【Data Generation Code Example】
import numpy as np
【Code Answer】
import numpy as np
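The answer listing for the 30x30 heatmap is cut off after its import line. One way to complete it is sketched below with Matplotlib's imshow. The seed, colormap and the row-average summary are assumptions added for illustration, not part of the original exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
scores = np.random.rand(30, 30)  # 30x30 satisfaction scores in [0, 1)

fig, ax = plt.subplots(figsize=(8, 7))
im = ax.imshow(scores, cmap='coolwarm', vmin=0, vmax=1)
fig.colorbar(im, ax=ax, label='Satisfaction score')
ax.set_title('Simulated Customer Satisfaction (30x30)')
ax.set_xlabel('Column index')
ax.set_ylabel('Row index')

# Summarise each row; with purely random data these averages stay close to 0.5
row_means = scores.mean(axis=1)
print(f"Highest-scoring row: {int(row_means.argmax())} ({row_means.max():.2f})")
print(f"Lowest-scoring row: {int(row_means.argmin())} ({row_means.min():.2f})")
plt.show()
```

Since each value is an independent draw, the heatmap should look like uniform noise. A genuine trend would show up as bands or blocks of similar colour, and an isolated cell that differs sharply from its neighbours would be a candidate outlier worth investigating.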
import numpy as np
np.random.seed(42)
cars_speeds=[np.random.normal(70,10)for _ in range(100)]
trucks_speeds=[np.random.normal(60,8)for _ in range(100)]
motorcycles_speeds=[np.random.normal(85,15)for _ in range(100)]
【Diagram Answer】
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
cars_speeds = [np.random.normal(70, 10) for _ in range(100)]
trucks_speeds = [np.random.normal(60, 8) for _ in range(100)]
motorcycles_speeds = [np.random.normal(85, 15) for _ in range(100)]
# Combine into one list of speeds with a matching list of category labels
vehicle_speeds = cars_speeds + trucks_speeds + motorcycles_speeds
vehicle_types = ['Cars'] * 100 + ['Trucks'] * 100 + ['Motorcycles'] * 100
sns.violinplot(x=vehicle_types, y=vehicle_speeds)
plt.title('Speed Distribution by Vehicle Type')
plt.xlabel('Vehicle Type')
plt.ylabel('Speed (km/h)')
plt.show()
【Trivia】
Did you know that violin plots are named for their resemblance to the shape
of a violin? Unlike box plots, which only show summary statistics like the
median and quartiles, violin plots provide a richer visualization of the data
distribution, showing both the probability density and summary statistics
simultaneously.
25. 3D Scatter Plot Generation for Analyzing
Customer Locations in 3D Space
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a logistics company that wants to
visualize the distribution of customer locations in a 3D space. The company
is planning to optimize delivery routes by analyzing the geographical
spread of its customers across different regions. Generate a 3D scatter plot
representing 300 customer locations in 3D space using random data
points. The company wants to see how the customers are distributed along
the X, Y, and Z coordinates. Ensure that the data points cover a wide range
of values to give a clear picture of customer distribution. Create the data
points directly in the code and generate the plot. Your task is to write the
Python code necessary to create this scatter plot, ensuring that the data is
randomly distributed. The focus is not just on generating the plot but also on
understanding how to work with 3D data and analyzing its distribution.
【Data Generation Code Example】
import numpy as np
【Code Answer】
import numpy as np
In this exercise, you are asked to generate a 3D scatter plot using randomly
distributed data points.
The primary goal is to practice working with 3D data and understand how
to visualize it using Python.
You begin by generating three sets of random numbers representing the X,
Y, and Z coordinates of the points.
These coordinates simulate customer locations in a 3D space, allowing the
logistics company to analyze how these locations are spread out
geographically.
The numpy library is used to generate the random data points within a
specified range (-100 to 100 in this case).
This range is chosen to ensure a wide distribution of points, providing a
comprehensive view of the customer locations.
After generating the data, you use the matplotlib library to create a 3D
scatter plot.
The Axes3D object is added to the figure, allowing you to plot in three
dimensions.
The scatter function plots the points, and labels are added to each axis to
make the plot easier to interpret.
Finally, the plt.show() function is called to display the plot, giving you a
visual representation of the data.
This visualization helps in understanding how customers are spread across
different regions, which can be valuable for optimizing delivery routes.
By practicing with this example, you learn how to generate, visualize, and
analyze 3D data, which is an essential skill in many areas of data analysis
and statistics.
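The answer listing stops after its import line, while the explanation describes the rest. A sketch assembled from that description follows: 300 points with X, Y and Z drawn uniformly from -100 to 100. The seed, marker size and colour mapping are assumptions. In Matplotlib 3.2 and later, passing projection='3d' is enough to create the 3D axes (an Axes3D); importing mpl_toolkits.mplot3d explicitly is only needed on older versions.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumed seed
n_points = 300
# Customer coordinates drawn uniformly from [-100, 100) on each axis
x = np.random.uniform(-100, 100, n_points)
y = np.random.uniform(-100, 100, n_points)
z = np.random.uniform(-100, 100, n_points)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')  # creates an Axes3D
ax.scatter(x, y, z, c=z, cmap='viridis', s=15)  # colour by Z as a depth cue
ax.set_title('Customer Locations in 3D Space')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
```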
【Trivia】
Did you know that 3D scatter plots are commonly used in fields like
astronomy to visualize the distribution of stars and galaxies in space? They
are also widely used in marketing to analyze customer segments across
multiple dimensions, such as age, income, and purchasing behavior.
26. Analyzing Monthly Product Sales Using
Python
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a retail company.
Your manager has provided you with daily sales data for a specific product
over the past month.
You need to analyze this data and create a visual representation that will
help the company understand the product's sales trends.
Specifically, you need to generate a line plot showing the daily sales of the
product throughout the month.
Create the necessary data within the code, and then use Python to produce a
line plot.
import numpy as np
import pandas as pd
##Creating a DataFrame
【Code Answer】
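import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Creating a DataFrame
days = np.arange(1, 32)  # days 1 to 31
sales = np.random.randint(50, 200, size=31)  # assumed range of daily unit sales
df = pd.DataFrame({'Day': days, 'Sales': sales})
plt.figure(figsize=(10, 6))
plt.plot(df['Day'], df['Sales'], marker='o')
plt.title('Daily Sales Over One Month')
plt.xlabel('Day')
plt.ylabel('Sales')
plt.grid(True)
plt.show()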
In this exercise, we are simulating the analysis of daily sales data for a
product over one month.
The goal is to visualize the sales trends by creating a line plot.
The first step involves generating a synthetic dataset.
We use numpy to create an array representing the days of the month (from 1
to 31) and generate random sales numbers using numpy.random.randint,
which simulates the daily sales figures.
These random values are intended to represent the variability in daily sales
over the month.
Next, we store this data in a pandas DataFrame.
A DataFrame is a two-dimensional labeled data structure that is well-suited
for handling and analyzing structured data.
After the data is prepared, we use matplotlib.pyplot to create the line plot.
We set up the figure size for better visualization, plot the data using the plot
function, and add markers to each data point for clarity.
The plot includes titles and labels for the x and y axes, making it easier to
understand the context of the data.
Finally, we enable gridlines for improved readability of the plot.
This exercise not only demonstrates how to generate and plot data in
Python but also emphasizes the importance of visualizing data to gain
insights into sales trends.
【Trivia】
Did you know that line plots are one of the simplest yet most effective ways
to visualize time series data?
They are widely used in fields such as finance, economics, and sales
analysis because they clearly show trends and patterns over time.
27. Website Visitor Analysis with Bar Charts
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst at a digital marketing agency.
Your client wants to understand the web traffic patterns for four different
websites over the past month.
Using Python, create a bar chart to visualize the number of visitors to these
websites.
The websites are named Site A, Site B, Site C, and Site D.
Generate sample data for the number of visitors for each site and create a
bar chart to present this data.
import numpy as np
np.random.seed(0)
【Code Answer】
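import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
websites = ['Site A', 'Site B', 'Site C', 'Site D']
visitor_counts = np.random.randint(1000, 5000, size=4)  # assumed range of monthly visitors
plt.bar(websites, visitor_counts)
plt.title('Website Visitors Over the Past Month')
plt.xlabel('Websites')
plt.ylabel('Number of Visitors')
plt.show()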
‣ This task involves creating a bar chart using Python to visualize data.
‣ First, we generate sample data for the number of visitors to four websites
using the numpy library.
‣ The numpy.random.randint function is used to create random integers
within a specified range, simulating visitor counts.
‣ The matplotlib.pyplot library is used for plotting the bar chart.
‣ The plt.bar() function creates a bar chart with the website names on the x-
axis and the visitor counts on the y-axis.
‣ The plt.xlabel() and plt.ylabel() functions label the x-axis and y-axis,
respectively, providing context for the data.
‣ The plt.title() function adds a title to the chart, making it clear what the
visualization represents.
‣ Finally, plt.show() displays the chart.
‣ This exercise helps understand how to visualize categorical data using bar
charts, a common task in data analysis.
【Trivia】
‣ Bar charts are one of the most commonly used data visualization tools
because they are easy to understand and interpret.
‣ They are particularly useful for comparing quantities across different
categories.
‣ In Python, the matplotlib library is a powerful tool for creating a wide
range of static, animated, and interactive visualizations.
28. Creating a Pie Chart for Library Book
Distribution Analysis
Importance★★★★☆
Difficulty★★☆☆☆
A local library wants to analyze the distribution of different types of books
in its collection to better understand the preferences of its visitors. You have
been asked to generate a pie chart that shows the proportion of various book
categories in the library. To achieve this, you need to first generate sample
data for the types of books and then use Python to create a pie chart that
visualizes this distribution. Generate the data programmatically and create
the pie chart accordingly.
【Data Generation Code Example】
import random
import numpy as np
【Code Answer】
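import random
import matplotlib.pyplot as plt
book_categories = ['Fiction', 'Non-fiction', 'Science', 'History', 'Children']  # assumed categories
num_books = [random.randint(50, 300) for _ in book_categories]
category_data = dict(zip(book_categories, num_books))
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# Convert the dict views to lists, which ax.pie accepts as wedge sizes and labels
ax.pie(list(category_data.values()), labels=list(category_data.keys()),
       autopct='%1.1f%%')
ax.set_title('Distribution of Book Categories')
plt.show()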
In this exercise, you are tasked with generating and visualizing the
distribution of different types of books in a library using a pie chart.
Pie charts are a simple yet effective way to visualize the proportional
representation of various categories within a dataset.
▸ Here’s a detailed breakdown of the steps taken:
‣ Data Generation: First, we generate random data representing the number
of books in different categories.
This is done using the random.randint() function, which creates a list of
random integers between 50 and 300 for each book category.
The book categories are stored in the list book_categories, and their
corresponding counts are stored in num_books.
We then combine these two lists into a dictionary category_data, where
keys are the book categories and values are the number of books in each
category.
‣ Data Visualization: To visualize the data, we use the matplotlib library,
which is a powerful tool for creating static, animated, and interactive
visualizations in Python.
We create a figure and an axis using plt.figure() and fig.add_subplot(),
respectively.
The ax.pie() function is used to create the pie chart, where
category_data.values() provides the sizes of each wedge, and
category_data.keys() provides the labels.
The autopct parameter is set to '%1.1f%%', which formats the percentage
value displayed on each wedge to one decimal place.
Finally, we set the title of the pie chart using ax.set_title() and display the
chart with plt.show().
This exercise is crucial for understanding how to create visualizations based
on data, which is a common requirement in data analysis tasks.
It teaches you how to generate data programmatically and visualize it in a
way that is easy to interpret and communicate to others.
By focusing on the steps required to achieve this, you gain hands-on
experience in using Python for data analysis and visualization.
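The autopct string uses Python's printf-style formatting: %1.1f prints a number with one decimal place and %% emits a literal percent sign. Matplotlib applies it to each wedge's share of the total, expressed in percent. The snippet below reproduces that calculation by hand for a few hypothetical book counts, which is a quick way to check what the wedge labels will say.

```python
# Hypothetical book counts per category
category_data = {'Fiction': 100, 'Science': 50, 'History': 30}

total = sum(category_data.values())
labels = {}
for category, count in category_data.items():
    share = 100.0 * count / total         # percentage share of this wedge
    labels[category] = '%1.1f%%' % share  # same format string as autopct
print(labels)  # {'Fiction': '55.6%', 'Science': '27.8%', 'History': '16.7%'}
```

Note that the rounded labels add up to 100.1% here; rounding each wedge separately means the displayed percentages need not sum to exactly 100%.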
【Trivia】
Did you know that pie charts were first used in 1801 by William Playfair, a
Scottish engineer and political economist?
He is also credited with inventing several other types of graphs, including
the line chart and bar chart.
29. Analyzing Customer Height Distribution for
Clothing Store Inventory
Importance★★★★☆
Difficulty★★★☆☆
A clothing store chain wants to optimize its inventory by better
understanding the height distribution of its customers.
You are tasked with analyzing the height data of 500 randomly selected
individuals, which represents a sample of their customer base.
Using Python, create a histogram to visualize this height distribution.
This will help the store in deciding the range of sizes to keep in stock.
Generate the data within your script, assuming that the heights follow a
normal distribution with a mean of 170 cm and a standard deviation of 10
cm.
Your analysis should focus on how the data is distributed and any
observations that might inform inventory decisions.
import numpy as np
【Code Answer】
import numpy as np
The task involves generating and analyzing height data for 500 individuals
to understand customer height distribution.
This analysis is crucial for a clothing store as it helps determine the size
range for inventory.
We assume the heights follow a normal distribution with a mean (average)
height of 170 cm and a standard deviation of 10 cm.
The standard deviation indicates how much the height values deviate from
the mean.
To generate the height data, we use NumPy's np.random.normal() function.
This function creates random data following a normal distribution based on
the specified mean and standard deviation.
The generated data is then visualized using a histogram, a common method
for displaying frequency distributions.
The histogram is created with Matplotlib's plt.hist() function.
We specify 30 bins, which determine how the data is grouped along the x-
axis.
Each bin represents a range of heights, and the y-axis shows the frequency
of heights within each range.
The resulting plot shows the shape of the distribution, typically bell-shaped
for normally distributed data.
This information helps the store identify which height ranges are most
common and adjust their stock sizes accordingly.
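The answer listing for this exercise ends after its import line. The steps in the explanation (500 heights from np.random.normal with a mean of 170 cm and a standard deviation of 10 cm, plotted with plt.hist using 30 bins) can be assembled as in the sketch below. The seed and the percentile summary at the end are additions for illustration: the 5th to 95th percentile range is one simple way to turn the histogram into a size range for stocking decisions.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumed seed, for reproducible output
heights = np.random.normal(loc=170, scale=10, size=500)

plt.hist(heights, bins=30, edgecolor='black')
plt.title('Customer Height Distribution')
plt.xlabel('Height (cm)')
plt.ylabel('Frequency')
plt.show()

# Height range covering the middle 90% of the simulated customers
low, high = np.percentile(heights, [5, 95])
print(f"Mean: {heights.mean():.1f} cm; 90% of customers between {low:.1f} and {high:.1f} cm")
```

For a normal distribution with these parameters, the theoretical 5th and 95th percentiles are about 153.6 cm and 186.4 cm, so the sample values should land close to those.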
【Trivia】
Histograms are not only useful for visualizing data distributions but also for
detecting outliers.
In a normal distribution, outliers would appear as isolated bars far from the
mean.
This could indicate potential measurement errors or unique customer
characteristics.
30. Sinusoidal Regression for Data Analysis
Importance★★★★☆
Difficulty★★★☆☆
A client in the agricultural sector wants to predict the seasonal yield of a
particular crop based on temperature variations throughout the year.
They believe that the yield follows a sinusoidal pattern due to the seasonal
temperature changes.
Your task is to simulate the temperature data for a year and plot a sinusoidal
regression curve to visualize the relationship.
Use Python to generate synthetic temperature data and fit a sinusoidal
regression model to this data.
【Code Answer】
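import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2 * np.pi, 100)  # one year of time, mapped to 0..2*pi radians
y = 15 + 10 * np.sin(x) + np.random.normal(0, 2, x.size)  # synthetic noisy temperatures
A = np.column_stack([np.sin(x), np.cos(x), np.ones_like(x)])  # sine/cosine basis, one-year period assumed
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit of y = a*sin(x) + b*cos(x) + c
plt.plot(x, y, 'o', x, A @ coef, 'r-')  # data points, then the fitted sinusoid
plt.title('Sinusoidal Regression')
plt.xlabel('Time (radians)')
plt.ylabel('Temperature')
plt.legend(['Synthetic data', 'Sinusoidal fit'])
plt.show()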
【Trivia】
‣ Sinusoidal regression is particularly useful in fields like meteorology and
agriculture, where periodic patterns are common.
‣ The method can also be applied to model biological rhythms or economic
cycles, showcasing its versatility in various domains.
31. Analyzing Animal Weights with Box Plots
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a wildlife conservation organization.
Your task is to analyze the weight distribution of six different animal
species to understand their health and growth patterns.
Create a box plot to visually compare the weights of these species.
Use Python to generate random sample data for the weights of these
animals, ensuring each species has a different weight distribution.
Provide insights based on the box plot you create.
【Diagram Answer】
【Code Answer】
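import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)  # reproducible sample data
species = ['Wolf', 'Deer', 'Lion', 'Tiger', 'Bear', 'Zebra']  # hypothetical species list
means = [40, 90, 190, 220, 300, 380]  # assumed mean weights (kg)
stds = [5, 12, 20, 25, 40, 35]  # assumed standard deviations (kg)
weights = {s: np.random.normal(loc=m, scale=sd, size=100)
           for s, m, sd in zip(species, means, stds)}
data = pd.DataFrame(weights)
plt.figure(figsize=(10, 6))
plt.boxplot([data[s] for s in species], labels=species)  # 'labels' is renamed 'tick_labels' in Matplotlib 3.9+
plt.title('Weight Distribution of Six Animal Species')
plt.xlabel('Species')
plt.ylabel('Weight (kg)')
plt.grid(True)
plt.show()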
The task involves creating a box plot to compare the weights of different
animal species.
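A box plot is a graphical representation that displays the distribution of data
based on a five-number summary: minimum, first quartile (Q1), median,
third quartile (Q3), and maximum. In Matplotlib's default style, the whiskers
stop at the most extreme data point within 1.5 times the interquartile range
of the box, and anything beyond that is drawn as an individual outlier point.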
It helps in identifying outliers and understanding the spread and skewness
of the data.
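The five-number summary can be computed directly with NumPy's np.percentile, which is a handy way to check the numbers a box plot draws. The sample below is hypothetical weights for a single species.

```python
import numpy as np

# Hypothetical weights (kg) for a single species
weights = np.array([180, 185, 190, 192, 195, 200, 205, 210, 260])

# Five-number summary using np.percentile's default linear interpolation
minimum, q1, median, q3, maximum = np.percentile(weights, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)  # 180.0 190.0 195.0 205.0 260.0
```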
In this exercise, we use Python libraries such as NumPy, pandas, and
Matplotlib.
NumPy is used to generate random data, simulating the weights of different
animal species.
The np.random.normal function generates data following a normal
distribution, where loc is the mean and scale is the standard deviation.
This allows us to create realistic weight distributions for each species.
The data is stored in a pandas DataFrame, which is a versatile data structure
for handling tabular data.
Pandas makes it easy to manipulate and analyze data, and it integrates well
with Matplotlib for visualization.
Matplotlib is used to create the box plot.
The plt.boxplot function takes a list of data arrays and creates a box plot for
each.
We label the x-axis with the species names and the y-axis with the weight
units (kg).
Additional plot features like the title, grid, and labels are added for clarity
and better presentation.
By analyzing the box plot, you can compare the central tendency and
variability of weights across species, helping to draw insights about their
health and growth.
【Trivia】
‣ The box plot was introduced by John Tukey in the 1970s as a part of
exploratory data analysis.
‣ Box plots are particularly useful for comparing distributions between
several groups or datasets.
‣ They are also known as box-and-whisker plots because of the lines
(whiskers) extending from the boxes, which indicate variability outside the
upper and lower quartiles.
32. Visualizing Random Data with Heatmaps
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst at a retail company.
Your manager wants to visualize customer shopping patterns to identify
potential trends.
To simulate this, generate a heatmap of a 35x35 matrix of random values,
which represents different shopping behaviors across various customer
segments and time periods.
Use Python to create this visualization.
The goal is to understand how to create and interpret heatmaps for data
analysis.
【Code Answer】
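import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(35, 35)  # 35x35 uniform random values in [0, 1)
plt.colorbar(plt.imshow(data, cmap='viridis', aspect='auto'), label='Intensity')
plt.title('Simulated Customer Shopping Patterns')
plt.xlabel('Customer Segments')
plt.ylabel('Time Periods')
plt.show()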
In this exercise, you will create a heatmap to visualize random data, which
is a common technique in data analysis to represent complex datasets.
A heatmap is a graphical representation of data where individual values are
represented by colors.
This allows for quick identification of patterns, trends, and anomalies
within the data.
The process begins by generating a 35x35 matrix of random values using
NumPy's rand function, which creates an array of the given shape and
populates it with random samples from a uniform distribution over [0, 1).
These values are used to simulate different shopping behaviors across
customer segments and time periods.
Next, the imshow function from Matplotlib is used to display the matrix as
a heatmap.
The cmap parameter specifies the colormap, which in this case is set to
'viridis', a popular choice for its perceptual uniformity.
The aspect parameter is set to 'auto' so the cells stretch to fill the plotting
area instead of being forced to be square.
A colorbar is added to the plot using plt.colorbar, providing a reference for
interpreting the intensity of the colors.
Labels and a title are added to the plot to provide context, helping viewers
understand what the heatmap represents.
Finally, plt.show() is called to display the plot.
This exercise demonstrates how to use Python libraries to create
visualizations that can aid in data analysis, making it easier to derive
insights from complex datasets.
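Reading a heatmap by eye can be complemented with a few NumPy calls. The sketch below assumes the same 35x35 layout, with rows as time periods and columns as customer segments (which is how imshow places them), and finds the most intense cell and the segment with the highest average. With uniform random data any "pattern" is pure chance, so this only demonstrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
data = rng.random((35, 35))     # uniform values in [0, 1); rows = time periods, columns = segments

# Position of the single most intense cell
row, col = np.unravel_index(np.argmax(data), data.shape)
print(f'Peak {data[row, col]:.3f} at time period {row}, segment {col}')

# Average intensity per customer segment (column-wise mean)
segment_means = data.mean(axis=0)
print('Segment with the highest average:', segment_means.argmax())
```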
【Trivia】
Heatmaps are widely used in various fields, including biology, finance, and
marketing, to visualize complex data.
They are particularly popular in genomics for visualizing gene expression
data and in finance for representing correlations between different financial
instruments.
The choice of colormap can significantly impact the interpretation of a
heatmap, so it's important to select one that accurately represents the data's
characteristics.
33. Creating a Violin Plot for Task Completion Times
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a productivity software company.
The company wants to understand the distribution of time taken by users to
complete four different tasks within their application.
Your task is to create a violin plot to visually compare the time distributions
for these tasks.
Generate synthetic data representing the time taken (in minutes) to
complete each task for 100 users.
Use Python to create a violin plot that shows the distribution of completion
times for each task.
【Diagram Answer】
【Code Answer】
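import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
data = {f'Task {i}': np.random.normal(loc=30, scale=5, size=100) for i in range(1, 5)}  # minutes
df = pd.DataFrame(data)
df_long = df.melt(var_name='Task', value_name='Time (minutes)')  # wide -> long format
plt.figure(figsize=(10, 6))
sns.violinplot(x='Task', y='Time (minutes)', data=df_long, inner='quartile')
plt.title('Task Completion Time Distribution')
plt.xlabel('Task')
plt.ylabel('Time (minutes)')
plt.show()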
The task requires generating synthetic data to simulate the time taken by
users to complete four different tasks.
This is done using NumPy's np.random.normal function, which generates
random numbers following a normal distribution.
In this case, the mean (loc) is set to 30 minutes, and the standard deviation
(scale) is set to 5 minutes, simulating realistic task completion times.
The data is organized into a Pandas DataFrame for easy manipulation and
analysis.
To create a violin plot using Seaborn, the data needs to be in a "long"
format, where each row represents a single observation.
This is achieved using the melt function from Pandas, which transforms the
DataFrame from wide to long format.
The sns.violinplot function is then used to create the plot, with
inner='quartile' to display the quartiles within the violin shapes.
The plot is customized with titles and labels using Matplotlib's plt functions
to improve readability and presentation.
Violin plots are useful for visualizing the distribution and density of data,
providing insights into the spread and skewness of the data.
They combine the features of a box plot with a kernel density plot, offering
a comprehensive view of the data distribution.
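The wide-to-long reshaping described above is easiest to see on a tiny table. In this sketch the task names and times are hypothetical; each column of the wide DataFrame becomes a block of rows in the long one, with the former column name stored in a Task column.

```python
import pandas as pd

# Wide format: one column per task (hypothetical times in minutes)
wide = pd.DataFrame({'Task 1': [28.0, 31.5], 'Task 2': [35.2, 29.8]})

# Long format: one row per observation, the shape sns.violinplot(x=..., y=...) expects
long_df = wide.melt(var_name='Task', value_name='Time')
print(long_df)
```

The result can then be passed as sns.violinplot(x='Task', y='Time', data=long_df, inner='quartile').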
【Trivia】
‣ Violin plots are particularly useful when comparing multiple categories,
as they provide a clear visual representation of differences in data
distributions.
‣ The shape of the violin indicates the density of the data at different
values, with wider sections representing higher data density.
‣ Seaborn, used here for creating the violin plot, is a Python data
visualization library based on Matplotlib, offering a high-level interface for
drawing attractive and informative statistical graphics.
34. Creating a 3D Surface Plot from a Parametric Equation
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a company that designs 3D models for
virtual reality applications.
Your task is to visualize a complex 3D surface to understand its geometric
properties better.
The company has provided you with a parametric equation that describes
the surface.
Your goal is to generate a 3D surface plot using Python to analyze the shape
and features of the surface.
Use the following parametric equations for the surface:
$x(u,v) = (1 + 0.5\cdot\cos(v))\cdot\cos(u)$
$y(u,v) = (1 + 0.5\cdot\cos(v))\cdot\sin(u)$
$z(u,v) = 0.5\cdot\sin(v)$
where $u$ ranges from 0 to $2\pi$ and $v$ ranges from 0 to $2\pi$.
Your task is to write Python code to create and display a 3D surface plot of
this parametric surface.
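These equations describe a torus: a tube of radius 0.5 whose centre circle has radius 1 around the z-axis. Since $\sqrt{x^2+y^2} = 1 + 0.5\cos(v)$ and $z = 0.5\sin(v)$, eliminating $u$ and $v$ gives $(\sqrt{x^2+y^2} - 1)^2 + z^2 = 0.25$. The short NumPy check below confirms this numerically on a grid; the grid size of 50 is an arbitrary choice.

```python
import numpy as np

# Parameter grid over the full range 0..2*pi for both u and v
u, v = np.meshgrid(np.linspace(0, 2 * np.pi, 50), np.linspace(0, 2 * np.pi, 50))
x = (1 + 0.5 * np.cos(v)) * np.cos(u)
y = (1 + 0.5 * np.cos(v)) * np.sin(u)
z = 0.5 * np.sin(v)

# Implicit torus equation: (sqrt(x^2 + y^2) - 1)^2 + z^2 = 0.5^2
residual = (np.sqrt(x**2 + y**2) - 1) ** 2 + z**2 - 0.25
print('Largest deviation:', np.abs(residual).max())
```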
【Diagram Answer】
【Code Answer】
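import numpy as np
import matplotlib.pyplot as plt
u = np.linspace(0, 2 * np.pi, 100)
v = np.linspace(0, 2 * np.pi, 100)
u, v = np.meshgrid(u, v)  # 2D parameter grid
x = (1 + 0.5 * np.cos(v)) * np.cos(u)
y = (1 + 0.5 * np.cos(v)) * np.sin(u)
z = 0.5 * np.sin(v)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
ax.set_box_aspect((3, 3, 1))  # match the data ranges so the surface is not distorted
ax.set_title('Parametric Surface')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()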
【Trivia】
Did you know that parametric equations are not only used in 3D modeling
but also in computer graphics to create complex animations and
simulations?
They allow for more flexibility and control over the shapes and curves,
making them a powerful tool in various fields, including engineering and
virtual reality.
35. Create a Line Plot of Hourly Temperature Variations Over a Day
Importance★★★★☆
Difficulty★★☆☆☆
A weather data analysis company has hired you to help visualize the
temperature changes throughout the day for a particular city.
Your task is to create a line plot that shows the hourly temperature over a
24-hour period.
You need to generate a sample dataset where temperatures are recorded at
each hour.
Use Python to create this plot.
【Code Answer】
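import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Hour': np.arange(24)})  # hours 0..23
df['Temperature'] = 20 + 5 * np.sin((df['Hour'] - 9) * 2 * np.pi / 24)  # assumed daily cycle peaking at 15:00
plt.plot(df['Hour'], df['Temperature'], marker='o')
plt.xlabel('Hour of Day')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.show()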
【Trivia】
Did you know that sinusoidal patterns are often used to model natural
phenomena? The daily temperature cycle is a perfect example, as it’s
influenced by the Earth’s rotation and the angle of sunlight. These patterns
are not only found in weather but also in other areas such as economics,
biology, and even music. Learning to recognize and model these patterns
can be incredibly useful in various scientific and engineering fields.
36. Scatter Plot Matrix for 10-Dimensional Data Analysis
Importance★★★☆☆
Difficulty★★★☆☆
A retail company wants to analyze the relationship between various product
features and sales performance. They have collected data on 10 different
features for 100 products, including price, weight, dimensions, and
customer ratings. Your task is to create a scatter plot matrix to visualize the
relationships among these features. Generate the input data within your
code.
【Data Generation Code Example】
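import numpy as np
import pandas as pd
np.random.seed(42)
columns = [f'Feature{i}' for i in range(1, 11)]  # hypothetical names for the 10 features
data = np.random.rand(100, 10)  # 100 products x 10 features
df = pd.DataFrame(data, columns=columns)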
【Diagram Answer】
【Code Answer】
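import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
columns = [f'Feature{i}' for i in range(1, 11)]  # hypothetical names for the 10 features
data = np.random.rand(100, 10)  # 100 products x 10 features
df = pd.DataFrame(data, columns=columns)
pd.plotting.scatter_matrix(df, figsize=(15, 15), diagonal='hist')
plt.show()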
【Trivia】
Scatter plot matrices are particularly useful in exploratory data analysis
(EDA) as they allow analysts to quickly identify correlations, trends, and
outliers among multiple variables.
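A correlation matrix is a compact numeric companion to a scatter plot matrix. The sketch below generates random data of the same shape (100 products, 10 features; the names Feature1 to Feature10 are placeholders) and lists the three most strongly correlated pairs. Because the columns are independent uniform noise, every correlation will be close to zero; with real product data the same ranking would point to the panels worth a closer look.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
columns = [f'Feature{i}' for i in range(1, 11)]  # placeholder feature names
df = pd.DataFrame(np.random.rand(100, 10), columns=columns)

corr = df.corr()  # 10x10 Pearson correlation matrix
# Keep each pair once: upper triangle, excluding the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()
print(pairs.abs().sort_values(ascending=False).head(3))
```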
37. Creating a Bar Chart for Product Sales Analysis
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a retail company.
Your task is to analyze the sales data of five different products over the past
two years.
The company wants to visualize this data to better understand sales trends
and make informed business decisions.
Create a bar chart that displays the sales of these products over the two
years to help the company identify patterns and opportunities for growth.
Use Python to generate the data and create the visualization.
【Diagram Answer】
【Code Answer】
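import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## Create a DataFrame with random sales data for 5 products over 2 years
data = {'Product': ['Product A', 'Product B', 'Product C', 'Product D',
                    'Product E'],
        'Year 1': np.random.randint(100, 500, 5),
        'Year 2': np.random.randint(100, 500, 5)}
df = pd.DataFrame(data)
fig, ax = plt.subplots()
x = np.arange(len(df['Product']))  # one position per product
width = 0.35  # width of each bar
ax.bar(x - width / 2, df['Year 1'], width, label='Year 1')  # left bar of each pair
ax.bar(x + width / 2, df['Year 2'], width, label='Year 2')  # right bar of each pair
ax.set_title('Sales of Five Products Over Two Years')
ax.set_xlabel('Products')
ax.set_ylabel('Sales')
ax.set_xticks(x)
ax.set_xticklabels(df['Product'])
ax.legend()
plt.show()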
‣ This exercise involves creating a bar chart to visualize sales data using
Python.
‣ We start by importing necessary libraries: pandas for data manipulation,
numpy for numerical operations, and matplotlib.pyplot for plotting.
‣ The data is generated using numpy.random.randint, which creates random
integers to simulate sales figures for five products over two years.
‣ A pandas.DataFrame is used to store this data, making it easy to
manipulate and visualize.
‣ The matplotlib library is then used to create a bar chart. The plt.subplots()
function initializes the plotting area.
‣ We define the width of the bars and calculate their positions using
numpy.arange to ensure they are placed correctly on the x-axis.
‣ One set of bars is plotted per year using ax.bar(), each shifted by half a
bar width to either side of the product's position so the two years sit side
by side.
‣ Labels and titles are added for clarity, and ax.set_xticks() and
ax.set_xticklabels() are used to label the x-axis with product names.
‣ Finally, plt.show() displays the plot. This exercise demonstrates how to
use Python for data analysis and visualization, skills that are essential for
making data-driven decisions.
【Trivia】
‣ Bar charts are one of the most common types of data visualization and are
particularly useful for comparing quantities across different categories.
‣ The first known bar chart was created by William Playfair in 1786, who is
considered one of the pioneers of statistical graphics.
38. Creating a Pie Chart for Beverage Distribution
Importance★★★★☆
Difficulty★★☆☆☆
A local grocery store wants to visualize the distribution of different types of
beverages they sell.
They have the following categories: "Soda", "Juice", "Water", "Tea", and
"Coffee".
Your task is to create a pie chart that represents the percentage distribution
of these beverages.
Generate the data within the code, and ensure that the pie chart is displayed
correctly.
【Code Answer】
plt.show()
The task is to create a pie chart using Python to visualize the distribution of
different types of beverages in a store.
To achieve this, we use the matplotlib library, which is a popular tool for
data visualization in Python.
First, we import the pyplot module from matplotlib, which provides a
MATLAB-like interface for plotting.
We define two lists: beverages, which contains the names of the beverage
categories, and counts, which contains the number of items for each
category.
These lists represent the data that will be visualized in the pie chart.
The plt.pie() function is used to create the pie chart.
▸ This function takes several parameters:
‣ counts: The sizes of each wedge in the pie chart.
‣ labels: The labels for each wedge, which are the beverage names in this
case.
‣ autopct: A string format that determines how the percentage labels are
displayed on the chart. Here, '%1.1f%%' formats the percentage to one
decimal place.
‣ startangle: The angle, in degrees counterclockwise from the positive
x-axis, at which the first wedge starts. It is set to 140 degrees here purely
for visual appeal.
The plt.title() function sets the title of the chart, helping viewers understand
what the chart represents.
The plt.axis('equal') function ensures that the pie chart is drawn as a circle
rather than an ellipse, which can happen if the aspect ratio is not set to
equal.
Finally, plt.show() displays the pie chart. This function renders the chart in
a window, allowing the user to see the visual representation of the data.
This exercise demonstrates how to use Python for basic data visualization,
which is a crucial skill in data analysis and presentation.
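Putting the steps described above together gives a sketch like the following. The beverage counts are hypothetical placeholders, since the original figures are not shown; substitute the store's real numbers.

```python
import matplotlib.pyplot as plt

beverages = ['Soda', 'Juice', 'Water', 'Tea', 'Coffee']
counts = [120, 80, 150, 60, 90]  # hypothetical item counts (total 500)

# autopct formats each wedge's share; startangle rotates the first wedge
wedges, texts, autotexts = plt.pie(counts, labels=beverages,
                                   autopct='%1.1f%%', startangle=140)
plt.title('Beverage Distribution')
plt.axis('equal')  # equal aspect ratio keeps the pie circular
plt.show()
```

With these counts, Soda's wedge is labeled 24.0% and Water's 30.0%.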
【Trivia】
Did you know that the pie chart was popularized by the Scottish engineer
William Playfair in the early 19th century?
Although pie charts are widely used, some data visualization experts argue
that they are not always the best choice for representing data, especially
when there are many categories or when the differences between categories
are subtle.
In such cases, bar charts or other types of visualizations might be more
effective.
39. Creating a Histogram of Ages
Importance★★★★☆
Difficulty★★☆☆☆
A marketing company wants to analyze the age distribution of their
potential customers.
They have collected age data from 600 individuals.
Your task is to create a histogram to visualize this age distribution.
Generate the age data randomly, assuming the ages range from 18 to 80.
Use Python to create the histogram and provide insights into the age
distribution.
【Code Answer】
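import numpy as np
import matplotlib.pyplot as plt
ages = np.random.randint(18, 81, 600)  # 600 ages from 18 to 80 (upper bound is exclusive)
# Create a histogram
plt.hist(ages, bins=15, color='skyblue', edgecolor='black')
plt.title('Age Distribution of Potential Customers')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()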
‣ The task involves generating a dataset of ages for 600 individuals, which
is achieved using the numpy library.
‣ The numpy.random.randint function is used to create an array of random
integers representing ages between 18 and 80. Because its upper bound is
exclusive, the call must use 81 as the limit for age 80 to be included.
‣ Once the data is generated, the matplotlib.pyplot library is used to create
a histogram.
‣ The plt.hist function takes the age data as input and creates a histogram.
The bins parameter is set to 15, which divides the age range into 15
intervals.
‣ The color and edgecolor parameters are used to style the bars of the
histogram.
‣ The plt.title, plt.xlabel, and plt.ylabel functions are used to add a title and
labels to the axes, making the plot more informative.
‣ plt.grid(True) adds a grid to the background, which can help in
visualizing the distribution more clearly.
‣ Finally, plt.show() is called to display the histogram.
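Two details from the steps above can be checked with NumPy alone. np.random.randint excludes its upper bound, so reaching age 80 requires 81 as the limit, and np.histogram (which plt.hist uses for its counting) shows that 15 bins are described by 16 edges. The seed is an assumption added for reproducibility.

```python
import numpy as np

np.random.seed(0)  # assumed seed, for reproducibility
ages = np.random.randint(18, 81, size=600)  # high is exclusive: ages 18..80

counts, edges = np.histogram(ages, bins=15)
print(len(edges))    # 16: 15 bins need 16 edges
print(counts.sum())  # 600: every age lands in exactly one bin
print(np.round(edges[1] - edges[0], 2))  # width of each bin in years
```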
【Trivia】
‣ Histograms are a popular tool in data analysis because they provide a
visual representation of the distribution of a dataset.
‣ They are particularly useful for identifying patterns, such as skewness,
and for detecting outliers.
‣ The choice of the number of bins can significantly affect the appearance
of the histogram and the insights drawn from it.
40. Logistic Regression Curve with Synthetic Data
Importance★★★★☆
Difficulty★★★☆☆
A marketing company wants to predict whether a customer will respond
positively to a new product advertisement.
They believe that the likelihood of a positive response can be modeled
using logistic regression based on several features of the customer data.
Your task is to create synthetic data that simulates this scenario and plot a
logistic regression curve to visualize the relationship.
Use Python to generate the data and produce the plot.
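At the heart of logistic regression is the sigmoid (logistic) function, which turns a linear score w·x + b into a probability between 0 and 1; for a binary problem this is the positive-class probability that scikit-learn's LogisticRegression.predict_proba reports. The coefficients below are hypothetical, chosen only to show the shape of the mapping, not fitted values.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for two features (not fitted values)
w = np.array([1.5, -0.8])
b = 0.2

X = np.array([[0.0, 0.0], [1.0, 0.5], [-2.0, 1.0]])  # three example customers
probabilities = sigmoid(X @ w + b)
print(probabilities.round(3))
```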
【Diagram Answer】
【Code Answer】
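import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_features=2,
                           n_informative=2, n_redundant=0, n_clusters_per_class=1,
                           random_state=42)
data = pd.DataFrame(X, columns=['Feature1', 'Feature2'])
data['Response'] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
## Predict probabilities
probabilities = model.predict_proba(X_test)[:, 1]  # probability of a positive response
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], probabilities, c=y_test, cmap='coolwarm', edgecolors='k')
plt.title('Logistic Regression: Predicted Probability of Positive Response')
plt.xlabel('Feature1')
plt.ylabel('Predicted Probability')
plt.colorbar(label='Actual Response')
plt.show()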
【Diagram Answer】
【Code Answer】
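import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
## Hypothetical plant types with assumed mean (loc) and spread (scale) in cm
data = {'Plant A': np.random.normal(loc=30, scale=4, size=50),
        'Plant B': np.random.normal(loc=45, scale=6, size=50),
        'Plant C': np.random.normal(loc=25, scale=3, size=50),
        'Plant D': np.random.normal(loc=50, scale=8, size=50)}
df = pd.DataFrame(data)
## Create a box plot
plt.figure(figsize=(10, 6))
df.boxplot()
plt.title('Length Distribution by Plant Type')
plt.xlabel('Plant Types')
plt.ylabel('Length (cm)')
plt.grid(True)
plt.show()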
‣ This task involves creating a box plot to analyze the lengths of different
plant types.
Box plots are a useful statistical tool for visualizing the distribution, central
tendency, and variability of data.
‣ We start by importing the necessary libraries: numpy for numerical
operations, pandas for data manipulation, and matplotlib.pyplot for plotting.
‣ The sample data is generated using numpy.random.normal, which creates
normally distributed data for each plant type.
The loc parameter sets the mean, and the scale parameter sets the standard
deviation.
This simulates realistic variations in plant lengths.
‣ The generated data is stored in a dictionary, where each key represents a
plant type, and the values are arrays of lengths.
This dictionary is then converted into a pandas DataFrame for easier
manipulation and plotting.
‣ The box plot is created using the boxplot method of the DataFrame,
which automatically handles the plotting of each column.
‣ The plot is customized with titles and labels to make it informative.
The grid is enabled to improve readability.
‣ Finally, plt.show() displays the plot, allowing the user to visually interpret
the data.
The box plot will show the median, quartiles, and potential outliers for each
plant type, providing insights into their growth patterns.
【Trivia】
‣ Box plots were introduced by John Tukey in 1977 as part of his
exploratory data analysis techniques.
They are sometimes called "box-and-whisker plots" because of the whiskers
that extend from the boxes to indicate variability outside the upper and
lower quartiles.
‣ In a box plot, the "box" represents the interquartile range (IQR), which
contains the middle 50% of the data.
The line inside the box indicates the median of the data.
‣ Outliers are often plotted as individual points beyond the whiskers,
providing a clear view of any anomalies in the data.
This makes box plots particularly useful for identifying outliers and
understanding the spread of data.
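The outlier rule behind the whiskers can be written out explicitly. This sketch uses Tukey's 1.5 × IQR fences, which match the default whisker length (whis=1.5) of Matplotlib's and pandas' box plots; the plant lengths are hypothetical.

```python
import numpy as np

def iqr_outliers(values):
    """Return the values lying beyond 1.5 * IQR from the box (Tukey's fences)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical lengths (cm) for one plant type, with one unusually long plant
lengths = np.array([48.0, 50.5, 49.2, 51.0, 50.1, 47.8, 52.3, 49.9, 65.0])
print(iqr_outliers(lengths))  # [65.]
```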
42. Generating a Heatmap from Random Data
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a retail company.
Your manager wants to visualize the sales performance across different
regions in a 40x40 grid.
Each cell in the grid represents a region, and the value represents the sales
performance.
Generate a heatmap using random values to simulate the sales data.
Ensure that the heatmap is clearly labeled and visually appealing to present
at the next team meeting.
Use Python to create this visualization.
【Code Answer】
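import numpy as np
import matplotlib.pyplot as plt
sales = np.random.rand(40, 40)  # simulated sales performance for a 40x40 grid of regions
plt.colorbar(plt.imshow(sales, cmap='viridis', interpolation='nearest'), label='Sales Performance')
plt.title('Sales Performance by Region')
plt.xlabel('Region X')
plt.ylabel('Region Y')
plt.show()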
【Trivia】
Heatmaps are widely used in various fields, including biology for
visualizing gene expression data, and in sports analytics to show player
movements or activity levels on the field.
The choice of color map can significantly impact the interpretation of data,
and it's important to choose one that accurately represents the data's
characteristics.
43. Analyzing Game Scores: Creating Violin Plots
Importance★★★★☆
Difficulty★★★☆☆
A gaming company wants to analyze the score distribution of players across
four different games to understand the variability and distribution of scores.
They have collected score data from 100 players for each game and need to
create a visual comparison using violin plots.
Your task is to create a violin plot to compare the score distributions for
these four games.
Use the generated data in your analysis. Ensure that the violin plots clearly
show the distribution of scores for each game.
【Code Answer】
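import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
data = pd.DataFrame({'Game 1': np.random.normal(70, 10, 100),  # hypothetical score distributions
                     'Game 2': np.random.normal(60, 15, 100),
                     'Game 3': np.random.normal(80, 5, 100),
                     'Game 4': np.random.normal(65, 20, 100)})
plt.figure(figsize=(10, 6))
sns.violinplot(data=data)  # one violin per column
plt.title('Score Distributions Across Four Games')
plt.ylabel('Scores')
plt.xlabel('Games')
plt.show()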
【Trivia】
The violin plot was introduced by Jerry Hintze and Ray Nelson in 1998.
It is particularly useful in statistical data analysis for comparing the
distribution of data across different categories, making it a valuable tool for
exploratory data analysis.
44. 3D Scatter Plot for Data Analysis Practice
Importance★★★★☆
Difficulty★★★☆☆
A company is interested in visualizing the distribution of their product sales
data in a 3D space.
They have 400 data points, each representing a sale with three attributes:
price, quantity, and discount.
Your task is to generate a 3D scatter plot to help them understand the
relationship between these attributes.
Create the data using random values and plot it using Python.
Use the plot to identify any patterns or clusters that might indicate trends in
sales.
【Code Answer】
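import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
## 400 sales with hypothetical ranges for price, quantity and discount (%)
data = np.column_stack([np.random.uniform(10, 100, 400),   # price
                        np.random.randint(1, 51, 400),     # quantity
                        np.random.uniform(0, 30, 400)])    # discount
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data[:, 0], data[:, 1], data[:, 2], c='b', marker='o')
ax.set_title('Product Sales in 3D')
ax.set_xlabel('Price')
ax.set_ylabel('Quantity')
ax.set_zlabel('Discount')
plt.show()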
【Trivia】
‣ 3D scatter plots are a powerful tool for visualizing relationships between
three variables. They are commonly used in data analysis to identify
clusters, trends, or outliers.
‣ While 3D plots can provide more information than 2D plots, they can also
be harder to interpret, especially when dealing with large datasets.
‣ Matplotlib is widely used for creating static, interactive, and animated
visualizations in Python. It is highly customizable and supports a wide
range of plot types.
45. Analyzing Monthly Household Expenses Over a Year
Importance★★★★☆
Difficulty★★☆☆☆
A customer wants to understand their monthly household expenses over the
last year to better plan their budget for the upcoming year.
Create a Python program that generates a line plot showing the monthly
expenses for a household.
The generated data should simulate the monthly expenses for a household
over 12 months.
The customer wants to visualize these expenses to identify any trends or
unusual spikes in spending.
Create a line plot that clearly shows the monthly expenses.
【Code Answer】
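import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)  # reproducible sample
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'Month': months, 'Expenses': np.random.normal(2000, 250, 12)})  # mean $2000, sd $250
plt.plot(df['Month'], df['Expenses'], marker='o')
plt.title('Monthly Household Expenses Over a Year')
plt.xlabel('Month')
plt.ylabel('Expenses ($)')
plt.grid(True)
plt.show()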
In this exercise, you are tasked with creating a Python program to generate
a line plot of monthly household expenses over a year.
The purpose of this task is to provide a visual representation of the
expenses, which can help in identifying patterns, trends, or unusual
spending.
The first step involves generating the data, which simulates household
expenses for each of the 12 months.
This is done by using a normal distribution centered around a typical
monthly expense of $2000, with a standard deviation of $250 to introduce
some variation.
The data is stored in a pandas DataFrame, where the 'Month' column
represents the months of the year, and the 'Expenses' column contains the
corresponding expenses.
In the next step, you plot the data using Matplotlib, a powerful plotting
library in Python.
The plt.plot() function is used to create a line plot, with 'Month' on the x-
axis and 'Expenses' on the y-axis.
The marker='o' argument is added to show individual data points on the
line, making it easier to identify specific values.
The plt.title(), plt.xlabel(), and plt.ylabel() functions are used to add a title
and labels to the axes, providing context to the plot.
Finally, plt.grid(True) adds a grid to the plot, which helps in better
visualizing the data points and trends.
The plt.show() function displays the plot to the user.
This exercise is essential for understanding how to visualize time-series
data and interpret trends, which is a common task in data analysis and
budget planning.
It also demonstrates the importance of data visualization in making
informed decisions based on numerical data.
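To make "unusual spikes" concrete, one simple approach is to flag months that sit well above the yearly average. The sketch below generates expenses the same way (mean of 2000 dollars, standard deviation of 250 dollars); the seed and the 1.5-standard-deviation threshold are assumptions to tune, not part of the original exercise.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'Month': months,
                   'Expenses': np.random.normal(2000, 250, 12)})

# Flag months more than 1.5 standard deviations above the yearly mean
threshold = df['Expenses'].mean() + 1.5 * df['Expenses'].std()
spikes = df[df['Expenses'] > threshold]
print(f'Threshold: ${threshold:,.2f}')
print(spikes)
```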
【Trivia】
Did you know that line plots are one of the most commonly used types of
charts in data analysis?
They are especially useful for displaying data trends over time, making
them a go-to choice for time-series analysis, financial data, and scientific
research.
46. Bar Chart Creation for Product Sales Analysis
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a retail company that operates several
stores.
Your manager has asked you to analyze the sales performance of six
different stores.
Create a bar chart to visualize the number of products sold by each store.
This will help in understanding which stores are performing well and which
need improvement.
Use Python to generate the data and create the chart.
【Code Answer】
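import random
import matplotlib.pyplot as plt
stores = ['Store A', 'Store B', 'Store C', 'Store D', 'Store E', 'Store F']
sales = {store: random.randint(50, 200) for store in stores}  # products sold per store (50 to 200)
plt.bar(list(sales.keys()), list(sales.values()), color='skyblue')
plt.xlabel('Store')
plt.ylabel('Products Sold')
plt.show()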
In this exercise, you are tasked with creating a bar chart to visualize the
sales data of different stores.
This is a common task in data analysis, where visualizations help convey
insights from data.
The first step involves generating random sales data for six stores.
This is done using Python's random module, which allows you to create a
list of random integers representing sales figures.
The random.randint(50, 200) function generates a random integer between
50 and 200 (both ends inclusive), simulating the number of products sold by
each store.
The data is then organized into a dictionary format, with store names as
keys and sales numbers as values.
For visualization, the matplotlib.pyplot library is used, which is a powerful
tool for creating static, interactive, and animated visualizations in Python.
The plt.bar() function is used to create a bar chart, where the first argument
is the list of store names and the second is the list of sales figures.
The color parameter is set to 'skyblue' to give the bars a distinct color.
The plt.title(), plt.xlabel(), and plt.ylabel() functions are used to add a title
and labels to the x and y axes, respectively.
Finally, plt.show() is called to display the chart.
This exercise demonstrates how to use Python for data visualization, which
is a crucial skill in data analysis and business intelligence.
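Once the sales are in a dictionary, ranking the stores answers the manager's question of which ones are doing well and which need attention. The seed here is an assumption, added only so the output is repeatable.

```python
import random

random.seed(1)  # assumed seed for repeatable output
stores = ['Store A', 'Store B', 'Store C', 'Store D', 'Store E', 'Store F']
sales = {store: random.randint(50, 200) for store in stores}  # 50..200 inclusive

# Sort (store, units) pairs from best to worst seller
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
print('Top performer:', ranked[0])
print('Needs attention:', ranked[-1])
```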
【Trivia】
Did you know that bar charts are one of the most popular types of data
visualization?
They are widely used because they are simple to create and easy to
interpret, making them ideal for comparing quantities across different
categories.
Bar charts can be displayed vertically or horizontally, and they are
particularly effective when dealing with categorical data.
47. Analyzing Clothing Inventory Distribution with a Pie Chart
Importance★★★★☆
Difficulty★★★☆☆
You are the manager of a retail clothing store and have been asked to
present a visual representation of the store's current inventory to the sales
team. To do this, you decide to create a pie chart that shows the distribution
of different types of clothing in your shop. The categories of clothing
include "Shirts," "Pants," "Jackets," "Shoes," and "Accessories." Your task is
to analyze the data and generate a pie chart that visually represents the
percentage share of each clothing category. You must use Python to create
this pie chart.
【Data Generation Code Example】
import numpy as np
categories=['Shirts','Pants','Jackets','Shoes','Accessories']
quantities=np.array([150,100,75,125,50])
【Diagram Answer】
【Code Answer】
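import numpy as np
import matplotlib.pyplot as plt
categories = ['Shirts', 'Pants', 'Jackets', 'Shoes', 'Accessories']
quantities = np.array([150, 100, 75, 125, 50])
plt.pie(quantities, labels=categories, autopct='%1.1f%%')
plt.title('Clothing Inventory Distribution')
plt.axis('equal')  # keep the pie circular
plt.show()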
This exercise focuses on using Python for basic data analysis and
visualization.
To achieve the goal, we first generate the data, which consists of the
quantities of each clothing category.
The quantities are stored in a NumPy array, which is a powerful tool for
numerical operations in Python.
Next, we use Matplotlib, a popular library for creating static, animated, and
interactive visualizations in Python.
In the code, we use the plt.pie function to create a pie chart.
This function takes the array of quantities as input and generates a pie chart
where each slice represents a category's proportion of the total.
The labels parameter specifies the names of the categories, and
autopct='%1.1f%%' formats the percentage labels on the pie chart to one
decimal place.
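The percentages that autopct prints are simply each quantity divided by the total. The short sketch below reuses the exercise's data and computes them explicitly; the name shares is illustrative.

```python
import numpy as np

categories = ['Shirts', 'Pants', 'Jackets', 'Shoes', 'Accessories']
quantities = np.array([150, 100, 75, 125, 50])

# Each slice's share of the whole, as a percentage of the total (500 items)
shares = quantities / quantities.sum() * 100

for name, pct in zip(categories, shares):
    # Same format string that autopct uses: one decimal place plus a % sign
    print(f"{name}: {'%1.1f%%' % pct}")
```

With 500 items in total, Shirts account for 150/500 = 30.0% and Accessories for 50/500 = 10.0%, which matches the labels drawn on the pie.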
Finally, plt.title adds a title to the chart, and plt.show displays the pie chart
to the user.
This exercise is important for beginners to learn basic data analysis and
visualization techniques using Python.
Understanding how to create visual representations of data is crucial for
effectively communicating insights and making informed business
decisions.
【Trivia】
Did you know that the pie chart is usually credited to William Playfair, who
used it in his Statistical Breviary of 1801? His best-known pie chart showed
how the territory of the Turkish Empire was divided among Asia, Europe,
and Africa. Today, pie charts are commonly used in business and statistics
to represent the composition of a whole in a simple and visually appealing way.
48. Creating a Histogram of 700 Individuals'
Weights
Importance★★★★☆
Difficulty★★★☆☆
You have been hired by a health clinic to analyze the distribution of body
weights among 700 individuals who recently participated in a health check-
up. The clinic wants to better understand the general weight distribution of
their patients to plan health programs and allocate resources accordingly.
Your task is to generate a random dataset of these 700 individuals' weights
(in kilograms) and create a histogram to visualize the distribution. Ensure
that the weights are normally distributed with a mean of 70 kg and a
standard deviation of 15 kg.
【Data Generation Code Example】
import numpy as np
【Code Answer】
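import numpy as np
import matplotlib.pyplot as plt
weights = np.random.normal(loc=70, scale=15, size=700)  # 700 weights, mean 70 kg, SD 15 kg
plt.hist(weights, bins=30, edgecolor='black')  # bin count is an assumption
plt.xlabel('Weight (kg)')
plt.ylabel('Number of Individuals')
plt.grid(True)
plt.show()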
import numpy as np
import pandas as pd
y = np.piecewise(x, [x < 30, (x >= 30) & (x < 70), x >= 70],
【Code Answer】
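import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
# Hypothetical data: the original segment formulas were lost, so these slopes are assumed
x = np.linspace(0, 100, 200)  # advertising spend
y = np.piecewise(x, [x < 30, (x >= 30) & (x < 70), x >= 70],
                 [lambda t: 2 * t, lambda t: 60 + 5 * (t - 30), lambda t: 260 + (t - 70)])
y = y + np.random.normal(0, 10, x.size)  # add random noise
def fit_piecewise(x, y, breakpoints):
    """Fit a separate straight line (slope, intercept) to each segment."""
    models = []
    start = 0
    for bp in list(breakpoints) + [np.inf]:
        mask = (x >= start) & (x < bp)
        model = np.polyfit(x[mask], y[mask], 1)
        models.append(model)
        start = bp
    return models
breakpoints = [30, 70]
models = fit_piecewise(x, y, breakpoints)
# Plotting
plt.figure(figsize=(10, 6))
plt.scatter(x, y, s=10, alpha=0.5, label='Data')
for (lo, hi), model in zip([(0, 30), (30, 70), (70, 100)], models):
    xs = np.linspace(lo, hi, 50)
    plt.plot(xs, np.polyval(model, xs), label=f'Fit for {lo}-{hi}')
plt.xlabel('Advertising Spend')
plt.ylabel('Sales Revenue')
plt.legend()
plt.grid()
plt.show()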
【Trivia】
Piecewise regression is particularly useful in fields like economics and
marketing, where relationships between variables may not be constant
across their entire range. It allows analysts to capture more complex
behaviors in the data, leading to better predictions and insights.
50. Creating a Box Plot to Compare Prices of
Various Electronic Devices
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for an electronics retailer, and the
company has requested an analysis of the pricing distribution for several
popular electronic devices. Your task is to create a box plot that compares
the prices of eight different electronic devices. This will help the company
understand the pricing trends and identify any outliers in the market. To
proceed, first generate a random dataset representing the prices of these
devices. Then, using this dataset, create a box plot to visualize the
distribution of prices for each device.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(42)
data = pd.DataFrame(prices)
【Diagram Answer】
【Code Answer】
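import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
# Hypothetical device list and mean prices (original values lost); SD fixed at 50
mean_prices = {'Laptop': 1000, 'Smartphone': 700, 'Tablet': 450, 'Camera': 600,
               'Smartwatch': 250, 'Console': 400, 'Speaker': 150, 'Headphones': 100}
prices = {device: np.random.normal(mean, 50, 100) for device, mean in mean_prices.items()}
data = pd.DataFrame(prices)
plt.figure(figsize=(10, 6))
plt.boxplot(data.values)  # one box per column (device)
plt.xticks(range(1, len(data.columns) + 1), data.columns)
plt.xlabel('Device')
plt.ylabel('Price ($)')
plt.show()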
Box plots are a useful way to visualize the distribution of data, highlighting
the median, quartiles, and potential outliers.
In this task, you first generate a dataset containing the prices of eight
different electronic devices.
Each device's prices are randomly generated using a normal distribution
with different means and a constant standard deviation.
This ensures that the prices vary realistically between different types of
devices, reflecting how high-end products like laptops are generally more
expensive than accessories like headphones.
Once the data is generated, the box plot is created using matplotlib.
The box plot shows the interquartile range (IQR), which represents the
middle 50% of the data, with a line at the median price.
Whiskers extend from the box to the most extreme data points that lie within
1.5 times the IQR of the box edges (below Q1 or above Q3), and points beyond
that range are plotted individually as potential outliers.
This visualization allows you to compare the central tendency and
variability of prices across different devices, helping to identify products
with particularly high or low price distributions.
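The quartiles, IQR, and whisker limits described above can be computed directly with NumPy. This is a sketch for a single hypothetical device (mean price 500, standard deviation 50, plus one deliberately extreme price of 800); those values are assumptions, not part of the exercise. Matplotlib's boxplot applies the same 1.5 * IQR rule by default through its whis parameter.

```python
import numpy as np

np.random.seed(42)
# Hypothetical prices for one device: normal with mean 500 and SD 50 (assumed values)
prices = np.random.normal(loc=500, scale=50, size=100)
prices = np.append(prices, 800.0)  # one deliberately extreme price

q1, median, q3 = np.percentile(prices, [25, 50, 75])
iqr = q3 - q1

# Tukey fences: 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences
inside = prices[(prices >= lower_fence) & (prices <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(f"Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")
print(f"whiskers: {whisker_low:.1f} to {whisker_high:.1f}")
print("outliers:", np.round(outliers, 1))
```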
Understanding these statistical concepts is crucial for data analysis, as it
helps in making informed decisions based on data trends and identifying
anomalies that might require further investigation.
By completing this exercise, you gain practical experience in generating
and analyzing data distributions using Python, which is a fundamental skill
in data science.
【Trivia】
The box plot was introduced by John Tukey in 1970 as part of his work on
exploratory data analysis.
Tukey's goal was to provide simple, visual tools to help people understand
data distributions and identify potential anomalies without requiring
complex statistical calculations.
Today, box plots are widely used across various fields, from finance to
biology, due to their simplicity and effectiveness in summarizing data
distributions.
51. Generating and Analyzing a Heatmap from a
45x45 Random Value Matrix
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a retail company.
Your task is to simulate and visualize the correlation between different
stores' sales data by generating a heatmap of a randomly generated 45x45
matrix.
This matrix will represent the correlation between 45 different stores.
Your goal is to generate this matrix, create a heatmap from it, and analyze
the resulting heatmap to identify any clusters or patterns that might suggest
relationships between store sales.
Write the necessary Python code to generate the random matrix and display
it as a heatmap.
You do not need to actually analyze the heatmap; just focus on creating and
displaying it.
import numpy as np
np.random.seed(42)
【Code Answer】
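import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
matrix = np.random.rand(45, 45)  # random values in [0, 1)
plt.figure(figsize=(10, 8))
sns.heatmap(matrix, cmap='viridis', annot=False)
plt.title("Simulated Correlation Between Stores")  # title text assumed
plt.xlabel("Store Index")
plt.ylabel("Store Index")
plt.show()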
To tackle this problem, you begin by generating a 45x45 matrix filled with
random values using NumPy's np.random.rand() function.
This function produces random values in the interval [0, 1), which stand in
for correlations between different stores (real correlation coefficients range
from -1 to 1).
You set a random seed with np.random.seed(42) to ensure that the generated
random numbers are reproducible, which is crucial when dealing with data
analysis, as it allows others to replicate your results.
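A matrix from np.random.rand is not a true correlation matrix: it is not symmetric, its diagonal is not 1, and it never contains negative values. If a more realistic stand-in is wanted, one option is to simulate sales for each store and let np.corrcoef compute the correlations. In this sketch the 52-week sales history is a hypothetical assumption; the resulting 45x45 matrix can be passed to sns.heatmap in place of the random one.

```python
import numpy as np

np.random.seed(42)
# Hypothetical: 52 weeks of simulated sales for each of 45 stores (rows = stores)
weekly_sales = np.random.rand(45, 52)

# np.corrcoef treats each row as one variable, giving a 45x45 Pearson matrix
corr = np.corrcoef(weekly_sales)

print(corr.shape)
```

Because the simulated sales are independent, most off-diagonal values will sit near zero; real store data would show clusters where sales move together.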
Next, you move on to the visualization part, where you use Matplotlib and
Seaborn libraries to create and display the heatmap.
The sns.heatmap() function is employed to generate the heatmap, with the
cmap parameter set to 'viridis' to provide a visually appealing color
gradient.
The annot=False option is used to keep the heatmap clean by not displaying
the individual values inside the cells.
Finally, you add titles and labels using Matplotlib's plt.title(), plt.xlabel(),
and plt.ylabel() functions to make the heatmap easy to interpret.
The plt.show() function is called to display the heatmap.
This code provides a simple yet effective way to simulate and visualize the
relationships between different stores using a heatmap, making it easier to
identify potential patterns or clusters.
【Trivia】
Heatmaps are a popular visualization tool in data analysis because they
allow for the easy identification of patterns, correlations, and outliers in
large datasets.
In this case, a heatmap of a random matrix doesn't have real-world
implications, but in actual business scenarios, it could represent anything
from customer purchase behavior to the similarity of product sales across
different regions.
52. Violin Plot Analysis of Event Durations
Importance★★★★☆
Difficulty★★★☆☆
A company is analyzing the durations of five different events to improve
their scheduling efficiency. The events are: "Event A", "Event B", "Event
C", "Event D", and "Event E". Your task is to create a violin plot that
compares the durations of these events based on simulated data. Use Python
to generate the data and visualize it.
【Data Generation Code Example】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
data = pd.DataFrame({
'Event': np.repeat(['Event A', 'Event B', 'Event C', 'Event D', 'Event E'],
100),
【Code Answer】
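import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
data = pd.DataFrame({
    'Event': np.repeat(['Event A', 'Event B', 'Event C', 'Event D', 'Event E'], 100),
    # Only Event A's parameters (mean 30, SD 5) survive in the source;
    # the means and SDs for Events B to E are hypothetical placeholders
    'Duration': np.concatenate([np.random.normal(loc=30, scale=5, size=100),
                                np.random.normal(loc=45, scale=10, size=100),
                                np.random.normal(loc=60, scale=8, size=100),
                                np.random.normal(loc=40, scale=12, size=100),
                                np.random.normal(loc=50, scale=6, size=100)])
})
plt.figure(figsize=(10, 6))
sns.violinplot(x='Event', y='Duration', data=data)
plt.title('Distribution of Event Durations')
plt.xlabel('Events')
plt.ylabel('Duration (minutes)')
plt.show()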
【Trivia】
Violin plots are particularly useful for comparing multiple categories
because they provide more information than box plots. They show the
density of the data at different values, allowing for a deeper understanding
of the distribution shape.
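The density outline of each violin is a kernel density estimate (KDE). As a rough illustration of the idea, the sketch below estimates the density of simulated Event A durations (mean 30, SD 5, as in the exercise) with SciPy's gaussian_kde. Seaborn computes its own KDE internally, so its bandwidth and trimming may differ slightly from this sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde

np.random.seed(42)
# Event A's durations as in the exercise: mean 30 minutes, SD 5
durations = np.random.normal(loc=30, scale=5, size=100)

kde = gaussian_kde(durations)   # Gaussian kernel, Scott's rule bandwidth by default
grid = np.linspace(durations.min() - 10, durations.max() + 10, 400)
density = kde(grid)             # estimated density at each grid point

# The widest part of the violin sits at the peak of this curve
peak = grid[np.argmax(density)]
print(f"density peaks near {peak:.1f} minutes")

# Trapezoid rule: the area under a density curve should be close to 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area under curve: {area:.3f}")
```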
53. Analyzing Fractal Patterns in 3D Surface Plots
Importance★★★☆☆
Difficulty★★★★☆
A client in the field of mathematical visualization has asked you to
analyze the surface characteristics of a specific fractal pattern.
They believe that visualizing this pattern in a 3D plot will help them
understand the distribution and density variations across different regions of
the fractal.
Your task is to generate a 3D surface plot of the fractal pattern using
Python.
While the primary goal is to generate this plot, the underlying objective is
to analyze the data and its implications for the client's needs.
You should also explain how the characteristics of the fractal can be
interpreted from the resulting visualization.
import numpy as np
x = np.linspace(-2, 2, 1000)
y = np.linspace(-2, 2, 1000)
X, Y = np.meshgrid(x, y)
Z = np.abs(np.sin(X + Y))
【Diagram Answer】
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-2, 2, 1000)
y = np.linspace(-2, 2, 1000)
X, Y = np.meshgrid(x, y)
## Evaluate the surface height |sin(X + Y)| at each grid point (no iteration is involved)
Z = np.abs(np.sin(X + Y))
## Create a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
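Note that |sin(X + Y)| is a smooth, periodic surface rather than a fractal: it involves no iteration and no self-similarity. If a genuinely fractal surface is wanted, one common choice, swapped in here purely as an illustration, is the Mandelbrot set's escape-time count used as the Z height. The grid (200x200 points) and max_iter=50 are assumptions, and the helper name mandelbrot_counts is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def mandelbrot_counts(x, y, max_iter=50):
    """Escape-time iteration z -> z**2 + c on a grid; returns iterations before |z| > 2."""
    X, Y = np.meshgrid(x, y)
    c = X + 1j * Y
    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iter, dtype=int)
    active = np.ones(c.shape, dtype=bool)   # points that have not escaped yet
    for i in range(max_iter):
        z[active] = z[active] ** 2 + c[active]
        escaped = active & (np.abs(z) > 2)
        counts[escaped] = i
        active &= ~escaped
    return X, Y, counts

x = np.linspace(-2, 1, 200)
y = np.linspace(-1.5, 1.5, 200)
X, Y, Z = mandelbrot_counts(x, y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('Re(c)')
ax.set_ylabel('Im(c)')
ax.set_zlabel('Iterations before escape')
plt.show()
```

Flat plateaus at the maximum height mark points that never escape (the set itself), while the jagged, rapidly changing region around them is where the fractal detail lives.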
import numpy as np
import pandas as pd
## Create DataFrame
【Code Answer】
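import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
## Create DataFrame (hypothetical: 52 weeks of simulated production; values assumed)
np.random.seed(0)
df = pd.DataFrame({"Week": np.arange(1, 53), "Production": np.random.normal(1000, 50, 52).round()})
plt.plot(df["Week"], df["Production"])
plt.show()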
【Trivia】
Line plots are particularly useful for time-series data because they clearly
show trends, patterns, and potential outliers over time. When analyzing data
like weekly production, these plots can reveal seasonality, growth, or
decline trends, which are critical for strategic planning and operational
adjustments in manufacturing industries.
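One simple way to make a trend easier to see on a weekly line plot is to overlay a moving average. A sketch with pandas' rolling(), using hypothetical production figures (a base of 1000 units, a slope of 5 units per week, and random noise, all assumed):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Hypothetical weekly production: upward trend plus noise (all values assumed)
weeks = np.arange(1, 27)
production = 1000 + 5 * weeks + np.random.normal(0, 40, weeks.size)
df = pd.DataFrame({"Week": weeks, "Production": production})

# A 4-week rolling mean smooths week-to-week noise so the trend stands out
df["Rolling Mean (4 wk)"] = df["Production"].rolling(window=4).mean()
print(df.head(6))
```

The first three rolling values are NaN because a full 4-week window is not yet available. Plotting the rolling column with a second plt.plot call draws the smoothed trend over the raw weekly values.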
55. Customer Distribution Analysis Across
Multiple Restaurants
Importance★★★★☆
Difficulty★★☆☆☆
A restaurant chain wants to analyze the distribution of customers across its
7 different locations. The management needs a visual representation of the
number of customers in each restaurant to better understand which locations
are performing well and which ones need improvement.
Your task is to create a bar chart that shows the number of customers in
each of the 7 restaurants.
You will need to generate sample data for the number of customers in each
restaurant, then create a bar chart using this data.
Please generate the data for the number of customers in each restaurant
within the code itself.
The names of the restaurants are: "Bistro A", "Café B", "Diner C", "Eatery
D", "Grill E", "House F", and "Inn G".
import random
restaurants = ["Bistro A", "Café B", "Diner C", "Eatery D", "Grill E",
"House F", "Inn G"]
【Code Answer】
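import random
import matplotlib.pyplot as plt
restaurants = ["Bistro A", "Café B", "Diner C", "Eatery D", "Grill E",
               "House F", "Inn G"]
customer_counts = [random.randint(50, 200) for _ in restaurants]
plt.bar(restaurants, customer_counts)
plt.title("Number of Customers per Restaurant")
plt.xlabel("Restaurants")
plt.ylabel("Number of Customers")
plt.show()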
To solve this problem, you first need to import the necessary library,
matplotlib.pyplot, which is used to create visualizations in Python.
You also import the random library, which allows you to generate random
numbers. This is useful for creating the sample data for the number of
customers in each restaurant.
The list restaurants contains the names of the seven different restaurants.
This list is used to label the x-axis of the bar chart.
Next, you generate a list of customer counts using a list comprehension.
The random.randint(50, 200) function generates random integers between
50 and 200 for each restaurant, simulating the number of customers. This
data is stored in the customer_counts list.
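To see which locations are performing well, the simulated counts can be ranked. This minimal sketch also fixes the random seed so the same counts appear on every run (the seed value 1 is an arbitrary assumption). Note that random.randint includes both endpoints, so the counts range from 50 to 200 inclusive.

```python
import random

random.seed(1)  # fixed seed so the "random" counts are reproducible (seed value assumed)
restaurants = ["Bistro A", "Café B", "Diner C", "Eatery D", "Grill E",
               "House F", "Inn G"]
# random.randint includes both endpoints, so counts fall in 50..200
customer_counts = [random.randint(50, 200) for _ in restaurants]

# Pair each name with its count and sort from busiest to quietest
ranking = sorted(zip(restaurants, customer_counts), key=lambda pair: pair[1], reverse=True)
for name, count in ranking:
    print(f"{name}: {count}")
```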
To create the bar chart, you use the plt.bar() function, which takes the
restaurants list as the x-axis labels and the customer_counts list as the
heights of the bars.
The plt.xlabel() and plt.ylabel() functions label the x and y axes,
respectively, while plt.title() adds a title to the chart.
Finally, plt.show() displays the bar chart.
This exercise teaches you how to generate random data, create a basic bar
chart in Python, and label different elements of the chart. Understanding
how to visualize data is a fundamental skill in data analysis, helping to
convey insights clearly and effectively.
【Trivia】
Bar charts are one of the most common ways to visualize categorical data.
They are especially useful when comparing different groups, such as
customer distribution across various locations in this problem.
However, it's important to choose the right type of chart based on the nature
of the data. For example, if you were comparing data over time, a line chart
might be more appropriate.
Mastering the use of different chart types in Python will greatly enhance
your ability to analyze and communicate data effectively.
56. Visualizing the Distribution of Electronics in a
Store
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for an electronics store chain. The store
manager has asked you to help visualize the current inventory distribution
of different types of electronics.
Your task is to generate a pie chart that shows the proportion of various
electronics categories in the store, such as smartphones, laptops, tablets, and
televisions.
To do this, you need to create a sample dataset that represents the number of
items available in each category, then use Python to generate a pie chart.
This visualization will help the manager quickly understand the inventory
distribution and make informed decisions about stock management.
【Code Answer】
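import matplotlib.pyplot as plt
# The list categories contains the names of the electronics types available in the store.
categories = ['Smartphones', 'Laptops', 'Tablets', 'Televisions']
# The list counts holds the number of items in each corresponding category (hypothetical values).
counts = [120, 80, 60, 40]
# startangle=140 rotates the start of the pie chart by 140 degrees for better visualization.
plt.pie(counts, labels=categories, autopct='%1.1f%%', startangle=140)
plt.title('Electronics Inventory Distribution')
plt.axis('equal')  # equal aspect ratio keeps the pie circular
plt.show()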
In this exercise, you are creating a pie chart to visualize the distribution of
different categories of electronics in a store.
This task involves basic data analysis and visualization skills, crucial for
understanding inventory distribution.
The Python library matplotlib.pyplot is used for generating the pie chart,
which is one of the most common tools for data visualization.
The lists categories and counts are created to hold the names of the
electronics categories and their respective quantities in the store.
The plt.pie() function is used to create the pie chart, where labels assigns
the names to each slice, and autopct displays the percentage of the total for
each category.
The startangle parameter adjusts the starting angle of the pie chart for a
more aesthetically pleasing layout.
Calling plt.axis('equal') keeps the pie chart circular, and plt.show()
displays it, completing the visualization process.
Understanding how to generate and interpret such charts is important for
making data-driven decisions in inventory management and other business
scenarios.
【Trivia】
Did you know that pie charts are often criticized for being less effective
than other types of charts, like bar charts, for comparing relative sizes?
However, they are still widely used because they offer a quick and intuitive
way to represent data as parts of a whole.
In cases where the exact proportions are less important, pie charts can be a
very effective communication tool.
57. Analyzing the Distribution of Item Lengths in
a Product Inventory
Importance★★★☆☆
Difficulty★★☆☆☆
A company wants to analyze the lengths of different items in their inventory
to optimize their storage solutions.
The company has recorded the lengths of 800 different items, and you are
tasked with visualizing the distribution of these lengths.
Your goal is to create a histogram that displays the frequency distribution of
the item lengths.
This analysis will help the company understand how item lengths are
distributed, allowing them to design better storage compartments.
Generate a random dataset representing the lengths of 800 items, where the
lengths are normally distributed with a mean of 50 units and a standard
deviation of 10 units.
Then, write the Python code to create and display a histogram of these
lengths.
import numpy as np
【Code Answer】
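import numpy as np
import matplotlib.pyplot as plt
lengths = np.random.normal(loc=50, scale=10, size=800)  # 800 items, mean 50, SD 10
plt.hist(lengths, bins=30, edgecolor='black')  # bin count is an assumption
plt.xlabel('Length (units)')
plt.ylabel('Frequency')
plt.grid(True)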
plt.show()
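Before reading the histogram, it is worth checking that the generated data match the requested parameters. A sketch (the seed and the 30-bin choice are assumptions):

```python
import numpy as np

np.random.seed(0)  # assumed seed for reproducibility
lengths = np.random.normal(loc=50, scale=10, size=800)

print(f"sample mean: {lengths.mean():.2f}, sample SD: {lengths.std(ddof=1):.2f}")

# For a normal distribution about 68% of values lie within one SD of the mean
within_one_sd = np.mean(np.abs(lengths - 50) <= 10)
print(f"share within 40-60 units: {within_one_sd:.1%}")

# np.histogram returns the same bar heights plt.hist would draw for bins=30
counts, edges = np.histogram(lengths, bins=30)
print("tallest bin:", edges[counts.argmax()], "to", edges[counts.argmax() + 1])
```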
【Trivia】
The term "histogram" was introduced by Karl Pearson, one of the pioneers of
statistics, in the late 19th century.
They have since become one of the most widely used tools in exploratory
data analysis, helping statisticians and data scientists alike to visualize and
understand the underlying patterns in their data.
58. Spline Regression Curve with Synthetic Data
Importance★★★☆☆
Difficulty★★☆☆☆
A retail company wants to analyze the relationship between advertising
expenditure and sales revenue to optimize their marketing budget. They
suspect a non-linear relationship and would like to visualize this using a
spline regression curve. Your task is to create synthetic data representing
advertising expenditure (in thousands of yen) and corresponding sales
revenue (in thousands of yen). Use Python to generate this data and plot a
spline regression curve to visualize the relationship.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(0)
【Code Answer】
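import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline  # choice of spline library is an assumption
np.random.seed(0)
x = np.sort(np.random.uniform(0, 100, 100))  # hypothetical ad spend (thousand yen)
y = 50 * np.log1p(x) + np.random.normal(0, 10, x.size)  # hypothetical revenue (thousand yen)
spline = UnivariateSpline(x, y, s=x.size * 10 ** 2)  # smoothing s = n * (noise SD)**2 heuristic
xs = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y, s=10, label='Synthetic data')
plt.plot(xs, spline(xs), color='red', label='Spline fit')
plt.legend()
plt.show()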
import numpy as np
【Code Answer】
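import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
# Hypothetical species and mean heights in metres (the original data were lost)
species = ['Oak', 'Pine', 'Maple', 'Birch']
heights = [np.random.normal(mean, 3, 50) for mean in (20, 25, 18, 15)]
plt.boxplot(heights)
plt.xticks(range(1, len(species) + 1), species)
plt.xlabel('Tree Species')
plt.ylabel('Height (m)')
plt.grid()
plt.show()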
【Trivia】
Box plots are particularly useful for identifying outliers in the data. Outliers
are data points that fall outside the expected range, which can indicate
unusual growth patterns or measurement errors. By examining the box plot,
one can quickly assess the spread and symmetry of the data, making it a
valuable tool in data analysis.
60. Generating and Analyzing a Heatmap from
Random Data
Importance★★★☆☆
Difficulty★★☆☆☆
A retail company is analyzing customer purchase patterns and needs to
visualize these patterns as a heatmap.
This heatmap will be created using a 50x50 matrix where each cell
represents a particular metric, such as the frequency of purchases in specific
regions of the store.
The company wants to use this heatmap to identify hot spots and optimize
the store layout.
Your task is to generate the 50x50 matrix with random values representing
the frequency of purchases and create a heatmap to visualize this data.
Use Python to generate the data and display the heatmap.
import numpy as np
【Code Answer】
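import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(50, 50)  # random purchase-frequency values in [0, 1)
plt.imshow(data, cmap='hot')  # colormap choice assumed
plt.colorbar(label='Purchase frequency')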
plt.show()
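Since the goal is to find hot spots, the highest-valued cells can also be located numerically. A sketch on a seeded 50x50 random matrix (the seed and the 95th-percentile cut-off are assumptions):

```python
import numpy as np

np.random.seed(0)  # assumed seed
purchases = np.random.rand(50, 50)  # same kind of 50x50 matrix as the exercise

# Locate the single hottest cell (row, column) in the matrix
row, col = np.unravel_index(np.argmax(purchases), purchases.shape)
print(f"hottest cell: row {row}, column {col}, value {purchases[row, col]:.3f}")

# Cells in the top 5% of values are candidate "hot spots"
threshold = np.quantile(purchases, 0.95)
hot_spots = np.argwhere(purchases >= threshold)
print(f"{len(hot_spots)} cells at or above the 95th percentile ({threshold:.3f})")
```

With purely random values the hot spots are scattered; in real purchase data, adjacent hot cells would point to busy areas of the store.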
【Trivia】
The term "heat map" became popular in the 1990s, although color-coded
matrix displays of data are considerably older.
They have since become an essential tool in data science and are used
across various industries for tasks ranging from website analytics to
biological data analysis.
61. Project Completion Time Analysis with Violin
Plot
Importance★★★★☆
Difficulty★★★☆☆
A project management company wants to analyze the time taken to
complete six different projects. They have collected data on the completion
times (in hours) for each project. Your task is to create a violin plot to
visualize the distribution of completion times for each project. Use the
provided code to generate the sample data and create the plot.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(42)
data = {
df = pd.DataFrame(data)
【Diagram Answer】
【Code Answer】
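import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
# Six projects with 100 completion times (hours) each; the means and SDs are hypothetical
params = [(40, 5), (55, 8), (35, 4), (60, 10), (45, 6), (50, 7)]
data = {f'Project {i}': np.random.normal(loc=mean, scale=sd, size=100)
        for i, (mean, sd) in enumerate(params, start=1)}
df = pd.DataFrame(data)
plt.figure(figsize=(10, 6))
sns.violinplot(data=df)  # one violin per DataFrame column
plt.title('Distribution of Project Completion Times')
plt.xlabel('Projects')
plt.ylabel('Completion Time (hours)')
plt.show()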
In this exercise, you will learn how to create a violin plot using Python's
data analysis and visualization libraries. A violin plot is a method of
plotting numeric data and can be understood as a combination of a box plot
and a density plot. It provides a visual summary of the data distribution,
showing the probability density of the data at different values.
▸ Importing Libraries: We start by importing necessary libraries:
numpy for numerical operations,
pandas for data manipulation,
matplotlib.pyplot for plotting,
seaborn for enhanced data visualization.
▸ Generating Sample Data: We create a dictionary containing completion
times for six different projects. The np.random.normal function generates
random numbers following a normal distribution, where:
loc is the mean (average time for completion),
scale is the standard deviation (how spread out the times are),
size is the number of data points (100 in this case).
▸ Creating a DataFrame: We convert the dictionary into a pandas DataFrame,
which organizes our data in a tabular format, making it easier to work with.
▸ Plotting the Violin Plot:
We set the figure size for better visibility.
The sns.violinplot(data=df) function creates the violin plot, where each
'violin' represents the distribution of completion times for each project.
Titles and labels are added for clarity.
▸ Displaying the Plot: Finally, plt.show() renders the plot on the screen.
This exercise helps you understand how to visualize data distributions
effectively, which is crucial for data analysis and interpretation in real-
world scenarios.
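By default, seaborn draws a miniature box plot inside each violin, marking the quartiles. Those numbers can be computed with DataFrame.quantile. The project means and standard deviations here are hypothetical, since the exercise's values are not shown:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
# Hypothetical completion times (hours) for six projects; the means and SDs are assumed
params = [(40, 5), (55, 8), (35, 4), (60, 10), (45, 6), (50, 7)]
df = pd.DataFrame({
    f"Project {i}": np.random.normal(loc=mean, scale=sd, size=100)
    for i, (mean, sd) in enumerate(params, start=1)
})

# The inner box of each violin marks these quartiles for its column
summary = df.quantile([0.25, 0.5, 0.75]).T
summary.columns = ["Q1", "Median", "Q3"]
summary["IQR"] = summary["Q3"] - summary["Q1"]
print(summary.round(1))
```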
【Trivia】
Violin plots are particularly useful when comparing multiple groups, as they
not only show the median and interquartile ranges like box plots but also
provide insights into the density of the data at different values. This makes
them ideal for understanding the underlying distribution of completion
times across different projects.
62. Generating a 3D Scatter Plot with Python
Importance★★★☆☆
Difficulty★★☆☆☆
A client wants to visualize customer data in a 3D scatter plot to identify
patterns in purchasing behavior based on three different features: age,
income, and spending score. Your task is to generate a dataset with 500
random points representing these features and create a 3D scatter plot to
visualize this data.
【Data Generation Code Example】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
data = pd.DataFrame({
'Age': np.random.randint(18, 70, 500),
【Code Answer】
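import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
data = pd.DataFrame({
    'Age': np.random.randint(18, 70, 500),
    'Income': np.random.randint(20000, 120000, 500),  # hypothetical range
    'Spending Score': np.random.randint(1, 101, 500)  # hypothetical 1-100 scale
})
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['Age'], data['Income'], data['Spending Score'], c='blue', marker='o')
ax.set_xlabel('Age')
ax.set_ylabel('Income')
ax.set_zlabel('Spending Score')
ax.set_title('Customer Data in 3D')
plt.show()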
In this exercise, you will learn how to generate a 3D scatter plot using
Python, which is a powerful tool for data analysis and visualization. The
goal is to understand how to manipulate data and visualize it effectively,
which is essential for making data-driven decisions.
To start, we generate a dataset with 500 random points. This dataset
includes three features: Age, Income, and Spending Score. The numpy
library is used to create random integers for these features, which simulates
customer data. The pandas library is then used to create a DataFrame,
which is a convenient way to store and manipulate tabular data.
Next, we visualize this data in a 3D scatter plot using matplotlib. The
plt.figure() function creates a new figure for plotting, and add_subplot(111,
projection='3d') specifies that we want a 3D plot. The scatter method is
used to plot the points in 3D space, where we pass the three features as the
x, y, and z coordinates. The color and marker style can also be customized.
Finally, we label the axes and give the plot a title to make it clear what the
data represents. The plt.show() function displays the plot. This exercise not
only teaches you how to create visualizations but also emphasizes the
importance of understanding the data you are working with.
【Trivia】
3D scatter plots are particularly useful for visualizing relationships between
three variables, allowing analysts to identify trends, clusters, and outliers in
the data.
63. Daily Energy Consumption Line Plot
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst working for a utility company. Your manager has
asked you to analyze the daily energy consumption of a household over a
month to identify trends and patterns. Create a line plot that visualizes this
data, which will help in understanding peak usage days and potential
energy-saving opportunities. Generate the input data within your code.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(0)
print(data)
【Diagram Answer】
【Code Answer】
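import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
dates = pd.date_range(start='2024-01-01', periods=30)
# Poisson daily usage plus a linear upward trend; lam=20 and the 0.5 kWh/day slope are assumed
consumption = np.random.poisson(lam=20, size=30) + np.arange(30) * 0.5
data = pd.DataFrame({'Date': dates, 'Energy Consumption (kWh)': consumption})
plt.figure(figsize=(10, 5))
plt.plot(data['Date'], data['Energy Consumption (kWh)'], marker='o')
plt.title('Daily Energy Consumption Over a Month')
plt.xlabel('Date')
plt.ylabel('Energy Consumption (kWh)')
plt.xticks(rotation=45)
plt.grid()
plt.tight_layout()
plt.show()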
In this exercise, you will learn how to visualize data using Python,
specifically focusing on daily energy consumption. The goal is to create a
line plot that clearly represents the energy usage of a household over a
month.
First, we import the necessary libraries: NumPy for numerical operations,
Pandas for data manipulation, and Matplotlib for plotting.
Next, we set a random seed to ensure that our results are reproducible. We
generate a date range for 30 days starting from January 1, 2024.
For the energy consumption data, we use a Poisson distribution to simulate
daily energy usage. The Poisson distribution is normally used for count data,
so here it is simply a convenient way to generate non-negative values that
vary from day to day. We also add a linear trend to the generated data to
reflect increasing consumption over the month.
We then create a DataFrame to hold our dates and energy consumption
values.
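Because a linear trend was added to the simulated data, fitting a straight line with np.polyfit should roughly recover its slope. A sketch assuming a Poisson mean of 20 kWh and a trend of 0.5 kWh per day (both assumed values):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range(start="2024-01-01", periods=30)
# Assumed parameters: Poisson mean of 20 kWh/day plus a 0.5 kWh/day upward trend
consumption = np.random.poisson(lam=20, size=30) + 0.5 * np.arange(30)
data = pd.DataFrame({"Date": dates, "Energy Consumption (kWh)": consumption})

# Fit a straight line (degree 1) against the day index to recover the trend
day_index = np.arange(len(data))
slope, intercept = np.polyfit(day_index, data["Energy Consumption (kWh)"], 1)
print(f"estimated trend: {slope:.2f} kWh per day")
```

The estimate will not match 0.5 exactly because the Poisson noise is large relative to the trend over only 30 days.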
In the plotting section, we set the figure size for better visibility. We plot the
data using a line plot with markers for each point. The title, x-label, and y-
label are added for clarity. We also rotate the x-ticks for better readability
and enable a grid for easier visualization of trends. Finally, we call
plt.show() to display the plot.
This exercise helps you understand how to manipulate and visualize data
using Python, which is a crucial skill in data analysis and statistics.
【Trivia】
Visualizing data is a powerful way to communicate insights. Line plots are
particularly useful for showing trends over time, making them ideal for time
series data like energy consumption.
64. Library Book Borrowing Analysis
Importance★★★☆☆
Difficulty★★☆☆☆
A local library network wants to analyze the borrowing patterns of its patrons.
It has data on the number of books borrowed from its 8 libraries over
the last month. Your task is to create a bar chart that visualizes this data.
Generate the sample data within your code.
【Data Generation Code Example】
import numpy as np
import pandas as pd
libraries = ['Library A', 'Library B', 'Library C', 'Library D', 'Library E',
'Library F', 'Library G', 'Library H']
【Code Answer】
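import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
libraries = ['Library A', 'Library B', 'Library C', 'Library D', 'Library E',
             'Library F', 'Library G', 'Library H']
borrowed = np.random.randint(50, 201, size=len(libraries))  # 50 to 200 books per library
df = pd.DataFrame({'Library': libraries, 'Books Borrowed': borrowed})
plt.bar(df['Library'], df['Books Borrowed'])
plt.title('Books Borrowed per Library (Last Month)')
plt.xlabel('Libraries')
plt.ylabel('Number of Books Borrowed')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()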
In this exercise, you will learn how to visualize data using Python's
Matplotlib library. Visualization is a crucial part of data analysis as it helps
to convey information clearly and effectively.
First, we import the necessary libraries: NumPy for numerical operations,
Pandas for data manipulation, and Matplotlib for plotting.
Next, we define a list of library names and generate random borrowing data
using NumPy's randint function. This function creates an array of random
integers within a specified range. In this case, we simulate the number of
books borrowed from each library, with values ranging between 50 and 200.
We then create a Pandas DataFrame to organize our data, which makes it
easier to manipulate and visualize. The DataFrame contains two columns:
one for the library names and another for the corresponding number of
books borrowed.
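Sorting the DataFrame before plotting makes the bar chart easier to read, since the busiest library appears first. A sketch (the seed is an assumption; NumPy's randint excludes its upper bound, so 201 is used to allow values up to 200):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed
libraries = ['Library A', 'Library B', 'Library C', 'Library D', 'Library E',
             'Library F', 'Library G', 'Library H']
# numpy's randint excludes the upper bound, so 201 gives values from 50 to 200
borrowed = np.random.randint(50, 201, size=len(libraries))
df = pd.DataFrame({'Library': libraries, 'Books Borrowed': borrowed})

# Sorting before plotting puts the busiest library first in the bar chart
df_sorted = df.sort_values('Books Borrowed', ascending=False).reset_index(drop=True)
print(df_sorted)
print('Total books borrowed:', df['Books Borrowed'].sum())
```

Passing df_sorted['Library'] and df_sorted['Books Borrowed'] to plt.bar then draws the bars in descending order.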
Finally, we use Matplotlib to create a bar chart. The bar function draws the
bars, with the libraries on the x-axis and the number of books borrowed on
the y-axis. We also add titles and labels to the axes for clarity. The xticks
function rotates the x-axis labels for better readability, and tight_layout
ensures that the layout fits well within the figure area. The show function
displays the chart.
By completing this exercise, you will gain practical experience in data
visualization, which is an essential skill in data analysis and statistics.
【Trivia】
Data visualization helps to identify trends, patterns, and outliers in data,
making it easier to draw insights and make informed decisions.
65. Analyzing Furniture Distribution in a
Household
Importance★★★★☆
Difficulty★★☆☆☆
You have been hired by a home decor company to analyze the distribution
of different types of furniture in a client's house. The company wants to
understand which categories of furniture are most common to optimize their
future product offerings.
Create a Python program that generates a pie chart showing the distribution
of different furniture types in the client's house.
The furniture types are as follows: "Chairs", "Tables", "Beds", "Sofas",
"Cabinets", and "Others".
Use this data to generate a pie chart, and make sure the proportions are
accurately represented.
Your task is to write a Python program to achieve this.
【Code Answer】
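import matplotlib.pyplot as plt
furniture_data = {'Chairs': 12, 'Tables': 4, 'Beds': 3, 'Sofas': 2, 'Cabinets': 5, 'Others': 6}  # hypothetical counts
sizes = list(furniture_data.values())
plt.figure(figsize=(6, 6))
plt.pie(sizes, labels=list(furniture_data.keys()), autopct='%1.1f%%', startangle=90)  # start angle assumed
plt.axis('equal')  # keep the pie circular
plt.show()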
This exercise requires you to generate a pie chart based on the distribution
of different furniture types in a house.
In Python, the matplotlib library is typically used for creating visualizations
like pie charts.
The data for this exercise is stored in a dictionary where the keys represent
the types of furniture, and the values represent their quantities.
The plt.pie() function from matplotlib is used to create the pie chart.
The sizes list represents the portions of the pie, which are the values from
our dictionary.
The labels list contains the names of each furniture type.
The autopct argument is used to display the percentage of each slice directly
on the pie chart, formatted to one decimal place.
The startangle argument rotates the start of the pie chart, making it more
aesthetically pleasing.
Finally, plt.axis('equal') ensures that the pie chart is perfectly circular.
This exercise helps you practice using basic data structures like dictionaries,
along with the matplotlib library for visualization, which is crucial in data
analysis and presentation.
【Trivia】
The first pie chart is generally credited to Scottish engineer and political
economist William Playfair, who published it in his Statistical Breviary in
1801. It showed the proportions of the Turkish Empire lying in Asia, Europe,
and Africa. Pie charts have since become a staple in data visualization,
particularly for representing categorical data distributions.
66. Creating a Histogram for Age Distribution
Analysis
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a retail company that wants to
understand the age distribution of their customers.
The company has collected the ages of 900 customers and wants to
visualize this data in a histogram to better understand the distribution and
any patterns that might emerge.
Using Python, generate a dataset representing the ages of 900 customers,
and then create a histogram to display the age distribution.
The histogram should be analyzed to provide insights on the most common
age groups among the customers.
【Data Generation Code Example】
import numpy as np
【Code Answer】
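import numpy as np
import matplotlib.pyplot as plt
ages = np.random.randint(18, 81, 900)
plt.hist(ages, bins=15, color='blue', edgecolor='black')
plt.ylabel('Number of Customers')
plt.show()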
In this exercise, you are required to generate a dataset that represents the
ages of 900 customers and visualize the distribution using a histogram.
A histogram is a graphical representation of the distribution of numerical
data, where the data is divided into bins (or intervals), and the frequency of
data points within each bin is depicted by the height of the corresponding
bar.
To start, you will use the numpy library to generate a random sample of
ages. In this case, we generate 900 random integers between 18 and 80,
simulating the ages of customers.
The function np.random.randint(18, 81, 900) is used to create this dataset.
The parameters 18 and 81 set the range of ages (inclusive for 18 and
exclusive for 81), while 900 specifies the number of data points.
After generating the data, the next step is to create the histogram using the
matplotlib library.
The function plt.hist(ages, bins=15, color='blue', edgecolor='black') is used
to create the histogram. The bins parameter controls the number of intervals
(15 in this case), color sets the color of the bars, and edgecolor defines the
color of the bar edges.
Finally, the title and labels are added to the histogram to make it more
informative.
plt.title('Age Distribution of 900 Customers') sets the title of the histogram,
while plt.xlabel('Age') and plt.ylabel('Number of Customers') label the x-
axis and y-axis, respectively.
The plt.show() function then displays the histogram.
Through this exercise, you gain practical experience in data visualization,
specifically in creating and interpreting histograms, which is a crucial skill
in data analysis.
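The task also asks which age groups are most common. One way to read that off the same 15 bins is sketched below. It uses NumPy's newer default_rng generator with a fixed seed (an assumption; the exercise uses np.random.randint) so the result is reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ages = rng.integers(18, 81, size=900)  # 900 ages, 18 to 80 inclusive

# plt.hist returns the bar heights and the bin edges it used
counts, edges, _ = plt.hist(ages, bins=15, color='blue', edgecolor='black')
plt.title('Age Distribution of 900 Customers')
plt.xlabel('Age')
plt.ylabel('Number of Customers')

# The tallest bar marks the most common age group
top = int(np.argmax(counts))
print(f"Most common age group: {edges[top]:.1f}-{edges[top + 1]:.1f} "
      f"({int(counts[top])} customers)")
plt.show()
```

Because these simulated ages are uniform, the tallest bar here reflects sampling noise; with real customer data the same lookup points to a genuine peak.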
【Trivia】
Histograms are one of the most commonly used tools in exploratory data
analysis.
They provide a visual summary of the data distribution and can help
identify patterns such as skewness, the presence of outliers, and the
modality of the data.
In the context of customer data, histograms are particularly useful for
understanding demographics, spending behaviors, and other characteristics
that follow a distribution.
67. Plotting a Rational Regression Curve with
Synthetic Data
Importance★★★☆☆
Difficulty★★☆☆☆
A customer wants to analyze the relationship between the amount of
advertising spend and sales revenue for their new product. They suspect
that the relationship is not linear and would like to visualize this using a
rational regression curve. Your task is to generate synthetic data that
simulates this scenario and plot a rational regression curve based on the
generated data.
【Data Generation Code Example】
import numpy as np
import pandas as pd
【Code Answer】
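import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def rational_func(x, a, b):
    # Rational model: rises with x and levels off toward a as x grows
    return a * x / (b + x)
x_fit = np.linspace(1, 100, 100)  # Generate x values for the fit line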
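The fitting step can be sketched end to end as follows. This version assumes SciPy's curve_fit for the least-squares fit and uses invented "true" parameters (a = 1000, b = 25) plus Gaussian noise for the synthetic advertising data; the function and variable names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def rational_model(x, a, b):
    """Saturating rational curve: grows with x and levels off toward a."""
    return a * x / (b + x)


rng = np.random.default_rng(42)
ad_spend = np.linspace(1, 100, 50)  # advertising spend (hypothetical units)
# Synthetic revenue: true curve plus Gaussian noise
revenue = rational_model(ad_spend, 1000, 25) + rng.normal(0, 20, ad_spend.size)

# Least-squares fit; p0 gives rough starting guesses for a and b
params, _ = curve_fit(rational_model, ad_spend, revenue, p0=[revenue.max(), 10])
a_hat, b_hat = params

x_fit = np.linspace(1, 100, 100)  # x values for the fitted line
plt.scatter(ad_spend, revenue, label='Synthetic data')
plt.plot(x_fit, rational_model(x_fit, a_hat, b_hat), color='red',
         label=f'Fit: {a_hat:.0f}x / ({b_hat:.1f} + x)')
plt.xlabel('Advertising Spend')
plt.ylabel('Sales Revenue')
plt.legend()
plt.show()
```

The parameter a is the revenue ceiling the curve approaches, and b is the spend at which revenue reaches half of that ceiling.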
【Trivia】
Rational regression is particularly useful when the relationship between
variables is expected to be hyperbolic, which is common in economic and
biological systems. Understanding this type of regression can provide
deeper insights into complex relationships in data.
68. Analyzing and Visualizing Bird Species Weight
Data Using Box Plots
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a wildlife conservation organization.
The organization has collected data on the weights of 10 different bird
species in a specific region.
Your task is to analyze the weight distribution of these species to identify
any outliers and compare their weight ranges.
To do this, you need to generate a box plot that displays the distribution of
weights for each species.
The data for each species should be generated using random values to
simulate realistic bird weights.
Use Python to create this box plot and ensure that your code is efficient and
clear.
【Data Generation Code Example】
import numpy as np
np.random.seed(42)
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
plt.figure(figsize=(10, 6))
plt.boxplot(data, labels=species)
plt.xlabel('Bird Species')
plt.ylabel('Weight (grams)')
plt.show()
In this exercise, we aim to use Python to analyze and visualize the weight
distributions of 10 different bird species.
We generate synthetic data to simulate the weights for each bird species
using the numpy library.
The weights are normally distributed around a mean that increases by 5
grams for each species, starting from 50 grams.
The numpy.random.normal() function is used for this purpose, where loc
specifies the mean, scale specifies the standard deviation, and size
determines the number of samples.
This ensures that each species has a distinct weight range while still
allowing for some overlap.
Once the data is generated, we use the matplotlib library to create a box
plot.
A box plot is an effective way to visualize the spread of the data, showing
the median, quartiles, and potential outliers.
The plt.boxplot() function takes in the data and labels, and then the plot is
customized with a title and axis labels.
This visual representation allows us to quickly compare the weight
distributions of the different species and identify any species with unusually
high or low weights.
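A runnable sketch of this workflow follows. The mean weights (50 g, rising by 5 g per species) come from the description above; the 5 g standard deviation, the 100 samples per species, and the "Species 1" to "Species 10" labels are assumptions. It also counts outliers with the 1.5 × IQR rule, which is the rule Matplotlib's box-plot whiskers use by default. Tick labels are set with plt.xticks, which sidesteps the renaming of boxplot's labels argument to tick_labels in recent Matplotlib releases.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
species = [f'Species {i + 1}' for i in range(10)]  # placeholder names
# Mean weight starts at 50 g and rises by 5 g per species (sd and size assumed)
data = [np.random.normal(loc=50 + 5 * i, scale=5, size=100) for i in range(10)]

# Flag points outside 1.5 x IQR, the default whisker rule of plt.boxplot
outliers = {}
for name, weights in zip(species, data):
    q1, q3 = np.percentile(weights, [25, 75])
    iqr = q3 - q1
    mask = (weights < q1 - 1.5 * iqr) | (weights > q3 + 1.5 * iqr)
    outliers[name] = weights[mask]

plt.figure(figsize=(10, 6))
plt.boxplot(data)
plt.xticks(range(1, len(species) + 1), species, rotation=45)
plt.title('Weight Distribution of 10 Bird Species')
plt.xlabel('Bird Species')
plt.ylabel('Weight (grams)')
plt.tight_layout()
plt.show()

for name, values in outliers.items():
    print(f'{name}: {len(values)} outlier(s)')
```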
【Trivia】
Box plots were introduced by John Tukey in the 1970s as a part of his work
in exploratory data analysis.
They are particularly useful for comparing the distribution and variability of
data across different categories.
In the context of wildlife studies, box plots can help researchers quickly
assess variations in animal characteristics, such as weight or size, across
different species or regions.
69. Generating and Visualizing a Heatmap from a
55x55 Matrix of Random Values
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst at a tech company. The product team has requested a
heatmap visualization to help understand the distribution of random data
across a grid. Your task is to generate a 55x55 matrix filled with random
values, then create a heatmap to visualize this data. Please generate the data
within the script (do not load from an external file), and provide the
necessary code to produce the heatmap. The visualization should help the
team identify areas of high and low concentration in the data grid.
【Data Generation Code Example】
import numpy as np
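A heatmap of this kind can be sketched with Matplotlib's imshow. The sketch assumes uniform values from np.random.rand and the 'viridis' colormap; it also lists the five largest cells as one simple way to point the team to high-concentration areas.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
matrix = np.random.rand(55, 55)  # 55x55 grid of uniform values in [0, 1)

plt.figure(figsize=(8, 7))
plt.imshow(matrix, cmap='viridis', interpolation='nearest')
plt.colorbar(label='Value')
plt.title('Heatmap of a 55x55 Random Matrix')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()

# Row/column positions of the five largest values ("hot spots"), largest first
flat_top = np.argsort(matrix, axis=None)[-5:][::-1]
rows, cols = np.unravel_index(flat_top, matrix.shape)
for r, c in zip(rows, cols):
    print(f'row {r}, col {c}: {matrix[r, c]:.3f}')
```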
【Code Answer】
import numpy as np
import pandas as pd
np.random.seed(0)
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
stds = [10, 1, 2, 1, 1, 6, 5]
plt.figure(figsize=(12, 8))
plt.xlabel('Sport')
plt.ylabel('Score')
plt.show()
In this exercise, the main goal is to teach the reader how to generate and
analyze a violin plot using Python. The violin plot is a powerful tool for
visualizing the distribution of data across different categories, in this case,
various sports.
We start by importing the necessary libraries: numpy for generating random
data, pandas for data manipulation, and matplotlib and seaborn for data
visualization. The data is generated by creating a normal distribution of
scores for each sport. This is done using numpy.random.normal, which takes a
mean, a standard deviation, and the number of data points to generate.
The data is stored in a pandas DataFrame, which makes it easy to manipulate
and visualize. We then melt this DataFrame to convert it into long format,
where each row represents a single score and its corresponding sport. This
step is needed because the violinplot call names one column for the x-axis
(the sport) and one for the y-axis (the score), so both must exist as columns
of the same DataFrame.
The violinplot function is then used to create the plot, where we specify the
x-axis as the sport categories and the y-axis as the scores. The resulting plot
shows the distribution of scores for each sport, revealing how variable the
scores are within each sport and how the distributions differ between sports.
Understanding and interpreting these violin plots is crucial for analyzing data
distributions, which is an important skill in data analysis and statistics. The
use of different sports with varying score ranges in this problem provides a
realistic scenario, helping readers grasp the concept in a practical context.
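The wide-to-long step and the plot can be sketched as follows. The sport names ("Sport A" to "Sport G") and the mean scores are hypothetical; only the standard deviations [10, 1, 2, 1, 1, 6, 5] appear in the listing above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(0)
sports = [f'Sport {c}' for c in 'ABCDEFG']   # hypothetical names
means = [70, 9, 8, 6, 9, 30, 20]             # hypothetical mean scores
stds = [10, 1, 2, 1, 1, 6, 5]

# Wide format: one column of 100 simulated scores per sport
wide = pd.DataFrame({s: np.random.normal(m, sd, 100)
                     for s, m, sd in zip(sports, means, stds)})

# Long format: one row per (Sport, Score) pair
long_df = wide.melt(var_name='Sport', value_name='Score')

plt.figure(figsize=(12, 8))
sns.violinplot(data=long_df, x='Sport', y='Score')
plt.title('Score Distributions by Sport')
plt.xlabel('Sport')
plt.ylabel('Score')
plt.show()
```

After melting, the 7 columns of 100 scores become 700 rows with just two columns, Sport and Score.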
【Trivia】
Violin plots are similar to box plots, but they provide a more detailed view
of the data's distribution by also showing the kernel density estimation. This
makes violin plots particularly useful for comparing the distribution of
multiple categories in a single visualization.
71. Generating a 3D Surface Plot of a Chaotic
System
Importance★★★★☆
Difficulty★★★☆☆
A customer in the field of data visualization wants to analyze a chaotic
system's behavior over time. They require a 3D surface plot to visualize the
relationship between three variables: time, x, and y. Your task is to generate
the required input data within the code and create a surface plot that
illustrates this chaotic behavior.
【Data Generation Code Example】
import numpy as np
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
【Code Answer】
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
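The surface height Z can come from many systems. As one hypothetical choice that is genuinely chaotic, the sketch below uses the logistic map x(n+1) = r * x(n) * (1 - x(n)): the height is the value after 50 iterations, plotted against the growth rate r and the starting value x0. For most values of r above about 3.57 the map is chaotic, so neighbouring starting values end up far apart and the surface becomes jagged, while for smaller r every start settles on the same value. The grid ranges and iteration count are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

r = np.linspace(2.5, 3.99, 200)     # growth-rate parameter (X axis)
x0 = np.linspace(0.01, 0.99, 200)   # initial value (Y axis)
R, X0 = np.meshgrid(r, x0)

# Iterate the logistic map element-wise over the whole grid
Z = X0.copy()
for _ in range(50):
    Z = R * Z * (1 - Z)

fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(R, X0, Z, cmap='viridis')
ax.set_xlabel('Growth rate r')
ax.set_ylabel('Initial value x0')
ax.set_zlabel('Value after 50 steps')
plt.show()
```

At r = 2.5 the map converges to the fixed point 1 - 1/r = 0.6 from every starting value, so that edge of the surface is flat.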
【Trivia】
Did you know that chaotic systems can be found in various fields, including
weather patterns, stock market fluctuations, and even population dynamics
in ecology? Understanding these systems can help in predicting behaviors
and making informed decisions.
72. Analyzing Monthly Rainfall Trends Over
Three Years
Importance★★★★☆
Difficulty★★★☆☆
You are a data analyst working for a weather forecasting company.
Your task is to analyze and visualize the monthly rainfall data over the past
three years.
You need to create a line plot that shows the monthly rainfall trends to
identify any patterns or anomalies that might help in improving the
accuracy of future predictions.
The data should include monthly rainfall amounts for three consecutive
years.
Generate the sample data within your code and create a line plot to display
the results.
【Data Generation Code Example】
import numpy as np
import pandas as pd
【Code Answer】
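import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Monthly dates from January 2021 to December 2023 (36 month starts)
dates = pd.date_range(start='2021-01-01', end='2023-12-01', freq='MS')
# Random monthly rainfall between 50 and 200 mm
rainfall = np.random.uniform(50, 200, len(dates))
data = pd.DataFrame({'Date': dates, 'Rainfall': rainfall})
plt.plot(data['Date'], data['Rainfall'])
plt.title('Monthly Rainfall (2021-2023)')
plt.xlabel('Date')
plt.ylabel('Rainfall (mm)')
plt.grid(True)
plt.show()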
This exercise focuses on using Python for data analysis and statistical
visualization.
The problem simulates a real-world scenario where monthly rainfall data is
analyzed to identify trends.
To begin, a date range is generated to cover three years, from January 2021
to December 2023.
Random rainfall data is generated using numpy's uniform function, which
creates a realistic range of values between 50 and 200 mm.
This simulates the variation in monthly rainfall.
The generated dates and rainfall values are then combined into a pandas
DataFrame, which is a common structure for managing and analyzing data
in Python.
Next, the data is visualized using matplotlib, a powerful plotting library.
The line plot generated by plt.plot() allows for easy identification of trends
or anomalies in the data over the three-year period.
Grid lines are added to the plot to improve readability, and labels are
provided for both the axes and the title.
This type of visualization is crucial for understanding weather patterns and
could be used in conjunction with more advanced statistical methods to
improve forecasting models.
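As a first step toward spotting trends and anomalies, the sketch below adds a 3-month centred rolling mean and marks months that lie more than 1.5 standard deviations from the overall mean. The seed, the window length, and the 1.5 threshold are illustrative assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1)
dates = pd.date_range(start='2021-01-01', end='2023-12-01', freq='MS')  # 36 month starts
data = pd.DataFrame({'Date': dates,
                     'Rainfall': np.random.uniform(50, 200, len(dates))})

# Trend: 3-month centred rolling mean smooths month-to-month noise
data['Rolling'] = data['Rainfall'].rolling(window=3, center=True).mean()

# Anomalies: months more than 1.5 standard deviations from the overall mean
z = (data['Rainfall'] - data['Rainfall'].mean()) / data['Rainfall'].std()
anomalies = data[z.abs() > 1.5]

plt.figure(figsize=(12, 5))
plt.plot(data['Date'], data['Rainfall'], label='Monthly rainfall')
plt.plot(data['Date'], data['Rolling'], label='3-month rolling mean')
plt.scatter(anomalies['Date'], anomalies['Rainfall'], color='red', zorder=3,
            label='|z| > 1.5')
plt.title('Monthly Rainfall, 2021-2023')
plt.xlabel('Date')
plt.ylabel('Rainfall (mm)')
plt.legend()
plt.grid(True)
plt.show()
```

The first and last months have no rolling value, because a centred 3-month window needs a neighbour on each side.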
【Trivia】
Did you know that Mawsynram, India, receives about 467.4 inches of rain in
an average year, one of the highest average annual totals recorded anywhere?
This small village is one of the wettest places on Earth, receiving rain
almost every day during the monsoon season.
Studying such extreme weather conditions can help improve predictive
models for heavy rainfall and related natural disasters.
73. Scatter Plot Matrix Analysis for
Multidimensional Data in Marketing Analytics
Importance★★★★☆
Difficulty★★★☆☆
A marketing firm has collected data on 14 different metrics related to
customer behavior and product interactions across several campaigns. The
firm wants to understand the relationships between these metrics to identify
patterns or correlations that could inform future marketing strategies.
Your task is to generate a scatter plot matrix to visualize the pairwise
relationships between these metrics. You are required to first generate
synthetic data for these 14 metrics, ensuring that the data contains varying
degrees of correlation among different pairs of metrics. Then, use Python to
create a scatter plot matrix to visualize the relationships between all
possible pairs of metrics. Make sure to include proper labels and ensure the
matrix is easily interpretable for non-technical stakeholders.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(data, columns=metrics)
【Code Answer】
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(0)
df = pd.DataFrame(data, columns=metrics)
sns.pairplot(df)
plt.show()
Scatter plot matrices are a useful tool for visualizing the relationships
between multiple variables. Each off-diagonal cell in the matrix is a scatter
plot of one pair of metrics, allowing us to observe the pairwise correlations
visually, while the diagonal cells show the distribution of each individual
metric. In the context of marketing analytics, this can help identify
relationships between different customer behavior metrics. For instance, a
strong linear pattern in a scatter plot between two metrics might indicate a
correlation, suggesting that changes in one metric are associated with
changes in the other.
To perform this analysis, we first generated synthetic data using the
numpy.random.multivariate_normal function. This function draws samples
from a multivariate normal distribution with a specified mean vector and
covariance matrix. In this case, the built-in correlations come from the
non-zero off-diagonal entries of the covariance matrix. The synthetic data is
then loaded into a pandas DataFrame, which is ideal for handling and
analyzing tabular data.
To visualize the relationships between the metrics, we use the
seaborn.pairplot function, which automatically creates a scatter plot matrix.
The plt.suptitle function is used to add a title to the entire matrix, and the
matrix is displayed using plt.show(). This visualization helps in quickly
identifying any potential correlations or patterns across the different
metrics, providing valuable insights for marketing strategy.
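The built-in correlations need a covariance matrix that is valid (positive semi-definite) yet has off-diagonal entries of mixed sign and strength. One way to get one, sketched below under stated assumptions, is a low-rank factor product plus a positive diagonal term. The metric names, the 300 samples, and the 4 factors are hypothetical, and the plot uses pandas.plotting.scatter_matrix in place of seaborn.pairplot to keep the example to pandas and Matplotlib.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_metrics, n_samples = 14, 300
metrics = [f'Metric {i + 1}' for i in range(n_metrics)]  # hypothetical names

# Covariance = loadings @ loadings.T + positive diagonal:
# valid by construction, with correlations of mixed sign and strength
loadings = rng.normal(size=(n_metrics, 4))
cov = loadings @ loadings.T + np.diag(rng.uniform(0.5, 2.0, n_metrics))

data = rng.multivariate_normal(mean=np.zeros(n_metrics), cov=cov, size=n_samples)
df = pd.DataFrame(data, columns=metrics)

# Five strongest pairwise correlations, to read alongside the matrix
corr = df.corr().to_numpy()
i, j = np.triu_indices(n_metrics, k=1)   # each pair counted once
order = np.argsort(-np.abs(corr[i, j]))[:5]
for k in order:
    print(f'{metrics[i[k]]} vs {metrics[j[k]]}: r = {corr[i[k], j[k]]:+.2f}')

pd.plotting.scatter_matrix(df, figsize=(20, 20), diagonal='hist', alpha=0.5)
plt.suptitle('Pairwise Relationships Between 14 Marketing Metrics')
plt.show()
```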
【Trivia】
Scatter plot matrices are particularly useful in the early stages of data
exploration. They allow analysts to quickly assess the relationships between
variables without making any assumptions about the nature of these
relationships. One limitation of scatter plot matrices is that they become
difficult to interpret for very high-dimensional data (more than about 20
dimensions), because the number of panels grows with the square of the
number of variables (400 panels for 20 variables). In such cases,
dimensionality reduction techniques like PCA (Principal Component Analysis)
might be used before visualization.
74. Create a Bar Chart of Employee Counts in
Different Companies
Importance★★★★★
Difficulty★★☆☆☆
A client from a business consulting firm wants to visualize the number of
employees across various companies they are analyzing.
Your task is to create a bar chart displaying the number of employees in 9
different companies.
Use Python to generate the data and create the chart.
Ensure that the bar chart is clear, with each company labeled properly on
the x-axis and the number of employees on the y-axis.
【Data Generation Code Example】
import numpy as np
import pandas as pd
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.xlabel('Company')
plt.ylabel('Number of Employees')
plt.xticks(rotation=45)
plt.show()
In this exercise, you will learn how to generate and visualize data using
Python.
The goal is to create a bar chart that shows the number of employees in
different companies.
To start, you use numpy to generate random employee counts for each
company.
These counts range from 50 to 500. You then store the data in a pandas
DataFrame for easy manipulation.
Next, you use matplotlib, a popular library for data visualization, to create
the bar chart.
The plt.bar function is used to create the bars, with the company names on
the x-axis and the number of employees on the y-axis.
Labels for the x-axis and y-axis are added using plt.xlabel and plt.ylabel,
respectively.
The chart title is set with plt.title. Finally, plt.xticks(rotation=45) rotates the
x-axis labels for better readability.
This exercise reinforces the process of data generation, manipulation, and
visualization, which are crucial skills in data analysis and statistics.
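Put together, the steps above give a program along these lines. The company names are placeholders, randint(50, 501) is used so that 500 itself can occur (NumPy's upper bound is exclusive), and the per-bar count labels from plt.bar_label (Matplotlib 3.4 or later) are an optional addition.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
companies = [f'Company {letter}' for letter in 'ABCDEFGHI']  # 9 placeholder names
employees = np.random.randint(50, 501, size=len(companies))  # 50-500 inclusive

df = pd.DataFrame({'Company': companies, 'Employees': employees})

plt.figure(figsize=(10, 6))
bars = plt.bar(df['Company'], df['Employees'], color='steelblue')
plt.bar_label(bars)  # print each count above its bar
plt.title('Number of Employees by Company')
plt.xlabel('Company')
plt.ylabel('Number of Employees')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```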
【Trivia】
Bar charts are one of the most common ways to visualize categorical data.
They are particularly effective when you want to compare quantities across
different categories.
Matplotlib offers extensive customization options for bar charts, including
color, width, and orientation, allowing for detailed and precise visual
representation of data.
75. Generating a Pie Chart for Gadget
Distribution in a Store
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a retail store that sells various types of
gadgets. The store manager has asked you to create a visual representation
of the current distribution of different types of gadgets in the store. Your task
is to generate a pie chart to illustrate the proportion of each type of gadget
in the inventory. For this exercise, you will need to create a sample dataset
that includes the following gadget types: 'Smartphones', 'Tablets', 'Laptops',
'Cameras', and 'Smartwatches'. Each gadget type should have a different
quantity, and these quantities should be generated randomly. Write Python
code to create this dataset, analyze the data, and generate a pie chart that
shows the distribution of the gadgets.
【Data Generation Code Example】
import random
gadget_types=['Smartphones','Tablets','Laptops','Cameras','Smartwatches']
【Code Answer】
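import random
import matplotlib.pyplot as plt
gadget_types = ['Smartphones', 'Tablets', 'Laptops', 'Cameras', 'Smartwatches']
# Random stock level between 50 and 200 for each gadget type
quantities = [random.randint(50, 200) for _ in gadget_types]
plt.pie(quantities, labels=gadget_types, autopct='%1.1f%%', startangle=140)
plt.title('Gadget Distribution in the Store')  # Setting the title of the chart
plt.show()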
To solve this problem, the first step is to import the necessary libraries,
which are random for generating random numbers and matplotlib.pyplot for
creating the pie chart.
Next, you create a list of gadget types that are available in the store. Each
gadget type is represented as a string in a list. After defining the gadget
types, you generate a list of quantities using the random.randint() function,
which generates random integers between 50 and 200 for each gadget type.
This randomness simulates different stock levels for each gadget type.
With the data prepared, you use the plt.pie() function from the matplotlib
library to create the pie chart. The labels parameter assigns the gadget types
to their corresponding slices in the chart. The autopct parameter formats the
percentage labels on each slice, and startangle=140 rotates the chart to start
from a specific angle for better visualization.
Finally, the plt.title() function is used to add a title to the chart, making it
clear that the chart represents the gadget distribution in the store. The
plt.show() function then displays the pie chart to the user.
This exercise emphasizes the importance of data visualization in
understanding and analyzing data distributions. It also demonstrates how to
use Python for generating random data and creating visual representations,
which are essential skills in data analysis.
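The percentage printed on each slice by autopct='%1.1f%%' is that slice's quantity divided by the total, times 100, formatted to one decimal place. The short sketch below reproduces that arithmetic without plotting; the fixed seed is an added assumption so the numbers repeat between runs.

```python
import random

random.seed(0)
gadget_types = ['Smartphones', 'Tablets', 'Laptops', 'Cameras', 'Smartwatches']
quantities = [random.randint(50, 200) for _ in gadget_types]

total = sum(quantities)
# Same arithmetic and format string that autopct applies to each slice
labels = ['%1.1f%%' % (100 * q / total) for q in quantities]
for name, q, text in zip(gadget_types, quantities, labels):
    print(f'{name:<12} {q:>4} units  {text}')
```

Because each label is rounded separately, the printed percentages can add up to slightly more or less than 100%.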
【Trivia】
Pie charts are best used when you need to show the proportions of a whole
and are most effective when there are limited categories to compare. If there
are too many categories or if the differences between the categories are
subtle, a pie chart might not be the best choice for data visualization. In
such cases, a bar chart is usually more appropriate.
76. Histogram of Weights for Data Analysis
Practice
Importance★★★★☆
Difficulty★★☆☆☆
A health and fitness company wants to analyze the weights of its 1000
clients to understand their distribution. Create a histogram that displays the
weights of these individuals. The weights should be generated using a
normal distribution with a mean of 70 kg and a standard deviation of 10 kg.
Your task is to write the code that generates the sample data and creates the
histogram.
【Code Answer】
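import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
# 1000 client weights: normal distribution, mean 70 kg, standard deviation 10 kg
weights = np.random.normal(loc=70, scale=10, size=1000)
plt.hist(weights, bins=30, edgecolor='black')  # bin count chosen for illustration
plt.title('Histogram of Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
plt.show()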
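Because the weights are drawn from a known normal distribution, a useful check is to draw the histogram as a density and overlay the theoretical curve for mean 70 kg and standard deviation 10 kg. The sketch assumes 30 bins and writes the normal density formula out directly rather than importing it from SciPy.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
mean, sd = 70, 10
weights = np.random.normal(loc=mean, scale=sd, size=1000)

# density=True scales the bars so their total area is 1, like a pdf
plt.hist(weights, bins=30, density=True, alpha=0.6, edgecolor='black')

# Normal pdf: exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 pi))
x = np.linspace(weights.min(), weights.max(), 200)
pdf = np.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))
plt.plot(x, pdf, color='red', label='Normal(70, 10) pdf')

print(f'sample mean = {weights.mean():.2f} kg, '
      f'sample sd = {weights.std(ddof=1):.2f} kg')
plt.title('Histogram of Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Density')
plt.legend()
plt.show()
```

If the bars follow the red curve closely, the sample behaves like the distribution it was drawn from; clear departures would suggest skewness or outliers.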
【Trivia】
Histograms are a fundamental tool in data analysis and statistics, allowing
us to visualize the distribution of data points across different ranges. They
are particularly useful for identifying patterns, such as skewness or the
presence of outliers, in the data.
77. Comparing Insect Lengths Using Python Data
Analysis
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst at a research institute studying various insect species.
You have been tasked with visualizing the lengths of 11 different types of
insects to understand their size distribution. Create a box plot that compares
the lengths of these insects. Use the provided code to generate sample data
for the analysis.
【Data Generation Code Example】
import numpy as np
import pandas as pd
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.boxplot(data, labels=species)
plt.xlabel('Insect Species')
plt.ylabel('Length (mm)')
plt.grid()
plt.show()
In this exercise, you will learn how to create a box plot using Python,
specifically with the Matplotlib library. Box plots are useful for visualizing
the distribution of data points, highlighting the median, quartiles, and
potential outliers.
Import Libraries: The first step involves importing the necessary libraries:
NumPy for numerical operations, Pandas for data manipulation, and
Matplotlib for plotting.
Generate Sample Data: The sample data consists of lengths of 11 different
insect species. We use the np.random.normal function to simulate lengths
based on a normal distribution. Each species has a different mean (loc) and
standard deviation (scale), which reflects the variability in insect sizes.
Create a DataFrame: We organize the generated lengths into a Pandas
DataFrame, where each column corresponds to an insect species and each
row represents a length measurement.
Plotting: Using Matplotlib, we create a box plot. The plt.boxplot function
takes the DataFrame and labels it with the species names. The plot displays
the median, quartiles, and any outliers in the data.
Customization: We add titles and labels to make the plot informative. The
plt.grid() function enhances readability by adding a grid to the background.
Display the Plot: Finally, plt.show() renders the plot, allowing you to
visualize the lengths of the insects.
This exercise not only helps in understanding how to visualize data but also
emphasizes the importance of data analysis in biological research.
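A sketch of the data-generation and plotting steps follows. The 11 species names, their mean lengths, and their standard deviations are invented placeholders. Printing the quartile rows of df.describe() is a handy cross-check, since those are the same median and quartile values the boxes display.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
species = [f'Insect {i + 1}' for i in range(11)]  # placeholder names
means = np.linspace(5, 55, 11)                     # assumed mean lengths (mm)
scales = means * 0.15                              # assumed spread: 15% of the mean

# One column per species, one row per measured individual
df = pd.DataFrame({name: np.random.normal(loc=m, scale=s, size=50)
                   for name, m, s in zip(species, means, scales)})

print(df.describe().loc[['25%', '50%', '75%']].round(1))

plt.figure(figsize=(10, 6))
plt.boxplot(df.to_numpy())  # each column becomes one box
plt.xticks(range(1, len(species) + 1), species, rotation=45)
plt.title('Length Distribution of 11 Insect Species')
plt.xlabel('Insect Species')
plt.ylabel('Length (mm)')
plt.grid()
plt.show()
```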
【Trivia】
Box plots are particularly useful in comparing multiple groups and can
reveal insights about the data distribution that might not be obvious from
other types of plots.
78. Heatmap Generation Using Python for Data
Analysis
Importance★★★★☆
Difficulty★★★☆☆
A retail company wants to visualize the sales performance across different
regions in a 60x60 grid format. Create a heatmap to represent random sales
data for each region. Your task is to generate this data and visualize it using
Python.
【Data Generation Code Example】
import numpy as np
【Code Answer】
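import numpy as np
import matplotlib.pyplot as plt
sales_data = np.random.rand(60, 60)  # assumed: uniform random sales values, 60x60 grid
plt.imshow(sales_data, interpolation='nearest')
plt.colorbar(label='Sales Performance')
plt.ylabel('Region Y')
plt.show()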
【Trivia】
Heatmaps are widely used in various fields, including finance, biology, and
web analytics, to visualize complex data in a more understandable format.
79. Comparing Activity Durations with a Violin
Plot
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst at a fitness center, and you need to compare the
durations of eight different activities to understand which ones take the
most time. Create a violin plot that visualizes the distribution of durations
for these activities. The activities are: Running, Cycling, Swimming, Yoga,
Weightlifting, Pilates, Hiking, and Dancing.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(durations)
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
plt.figure(figsize=(12, 6))
sns.violinplot(data=df)
plt.xlabel('Activities')
plt.ylabel('Duration (minutes)')
plt.xticks(rotation=45)
plt.show()
In this exercise, you will learn how to create a violin plot using Python's
Seaborn and Matplotlib libraries, which are essential for data visualization.
A violin plot is a method of plotting numeric data and can be understood as
a combination of a box plot and a kernel density plot. It provides a visual
representation of the distribution of the data across different categories,
which in this case are the eight activities.
▸ Data Generation:
First, we generate synthetic data for the durations of each activity using a
normal distribution. The numpy.random.normal function is used to create
random data points centered around a mean (loc) with some variability
(scale). This simulates realistic durations for each activity.
▸ DataFrame Creation:
The generated data is stored in a Pandas DataFrame, which makes it easy to
manipulate and visualize the data. Each column in the DataFrame
corresponds to a different activity, and each row corresponds to a different
observation of that activity's duration.
▸ Plotting:
We utilize Seaborn's violinplot function to create the plot. The data
parameter takes the DataFrame we created. The plot displays the
distribution of durations for each activity, making it easy to compare them
visually.
The plt.title, plt.xlabel, and plt.ylabel functions are used to label the plot
appropriately. The plt.xticks(rotation=45) function rotates the x-axis labels
for better readability.
This exercise not only helps you understand how to visualize data
distributions but also prepares you for more complex data analysis tasks.
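The same comparison can be sketched without Seaborn by passing the DataFrame's columns to Matplotlib's Axes.violinplot, which draws one violin per column. The mean durations and spreads below are hypothetical, and negative draws are clipped to zero because a duration cannot be negative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
activities = ['Running', 'Cycling', 'Swimming', 'Yoga',
              'Weightlifting', 'Pilates', 'Hiking', 'Dancing']
means = [30, 45, 40, 60, 50, 55, 90, 35]  # hypothetical mean durations (minutes)
scales = [8, 12, 10, 10, 15, 10, 25, 9]   # hypothetical standard deviations

durations = {a: np.clip(np.random.normal(m, s, 200), 0, None)
             for a, m, s in zip(activities, means, scales)}
df = pd.DataFrame(durations)

fig, ax = plt.subplots(figsize=(12, 6))
ax.violinplot(df.to_numpy(), showmedians=True)  # one violin per column
ax.set_xticks(range(1, len(activities) + 1))
ax.set_xticklabels(activities, rotation=45)
ax.set_title('Distribution of Activity Durations')
ax.set_xlabel('Activities')
ax.set_ylabel('Duration (minutes)')
plt.tight_layout()
plt.show()

# Activities ranked by median duration, longest first
print(df.median().sort_values(ascending=False).round(1))
```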
【Trivia】
Violin plots are particularly useful when comparing multiple categories, as
they show not only the central tendency (mean or median) but also the
distribution shape, which can reveal insights about the variability and
skewness of the data.
80. 3D Scatter Plot Generation with Python
Importance★★★★☆
Difficulty★★★☆☆
A customer wants to visualize the distribution of their sales data across
three different regions. They have requested a 3D scatter plot to better
understand the performance in each region. Create a Python script that
generates 600 random data points representing sales figures in three
dimensions (X, Y, Z) and plots them in a 3D scatter plot.
【Data Generation Code Example】
import numpy as np
np.random.seed(42)
x = np.random.rand(600) * 100
y = np.random.rand(600) * 100
z = np.random.rand(600) * 100
【Code Answer】
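import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.random.rand(600) * 100
y = np.random.rand(600) * 100
z = np.random.rand(600) * 100
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()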
import pandas as pd
print(data)
【Code Answer】
import pandas as pd
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.xticks(data['Year'])
plt.legend()
plt.grid()
plt.show()
In this exercise, you are tasked with visualizing a company's revenue data
over a decade using Python. The goal is to create a line plot that effectively
communicates the trends in revenue over the years.
To achieve this, we first need to import the necessary libraries: pandas for
data manipulation and matplotlib.pyplot for plotting.
Next, we generate the input data. In this case, we create a list of years from
2014 to 2023 and a corresponding list of revenue figures. This data is then
organized into a DataFrame, which is a convenient structure for handling
tabular data in Python.
The plotting process begins by setting the figure size for better visibility.
We then use the plot function to create a line plot, specifying the x-axis as
the years and the y-axis as the revenue. The marker parameter adds points
to the line, making it easier to see individual data points.
We enhance the plot by adding a title, labeling the axes, and customizing
the x-ticks to show each year. A legend is included to identify the revenue
line, and a grid is added for better readability.
Finally, we call plt.show() to display the plot. This visualization will help
the company's management to quickly grasp revenue trends and make
informed decisions based on historical performance.
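A minimal sketch of that line plot follows. The revenue figures are placeholders, and the year-over-year growth column computed with pct_change() is an optional addition that makes the trend explicit in numbers as well as in the chart.

```python
import pandas as pd
import matplotlib.pyplot as plt

years = list(range(2014, 2024))                               # 2014-2023
revenue = [120, 135, 150, 148, 170, 185, 160, 195, 220, 240]  # placeholder values ($M)
data = pd.DataFrame({'Year': years, 'Revenue': revenue})

# Percentage change from the previous year (the first year has none -> NaN)
data['Growth (%)'] = data['Revenue'].pct_change() * 100
print(data.round(1))

plt.figure(figsize=(10, 5))
plt.plot(data['Year'], data['Revenue'], marker='o', label='Revenue')
plt.title('Company Revenue, 2014-2023')
plt.xlabel('Year')
plt.ylabel('Revenue (million USD)')
plt.xticks(data['Year'])
plt.legend()
plt.grid()
plt.show()
```

With these placeholder values, 2015 shows 12.5% growth, since (135 - 120) / 120 = 0.125.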
【Trivia】
Visualizing data is a crucial part of data analysis, as it allows stakeholders
to quickly understand complex information. Line plots are particularly
effective for showing trends over time, making them a popular choice in
business analytics.
82. Scatter Plot Matrix Analysis for a 15-
Dimensional Marketing Dataset
Importance★★★★☆
Difficulty★★★☆☆
You have been hired as a data analyst by a marketing firm that recently
conducted a comprehensive survey on customer preferences across 15
different product categories. Your task is to visualize the relationships
among these 15 variables to identify any underlying patterns or correlations
that might help in the development of targeted marketing strategies.
Generate a scatter plot matrix to visualize the pairwise relationships
between all 15 variables in the dataset. Ensure the data is randomly
generated and resembles typical customer preference scores, ranging
between 0 and 100. Use this visualization to identify any clusters or
correlations that could inform marketing decisions.
【Data Generation Code Example】
import numpy as np
import pandas as pd
np.random.seed(42)
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
sns.pairplot(data)
plt.show()
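With 15 variables the matrix has 225 panels, so it helps to rank the pairwise correlations numerically as well. The sketch assumes integer scores from 0 to 100 for 200 hypothetical respondents and placeholder category names. Because these scores are independent random draws, all correlations should be close to zero, which is the baseline to compare against once real survey data is used.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
categories = [f'Category {i + 1}' for i in range(15)]  # placeholder names
# Independent integer scores from 0 to 100 (randint's upper bound is exclusive)
data = pd.DataFrame(np.random.randint(0, 101, size=(200, 15)), columns=categories)

corr = data.corr().to_numpy()
i, j = np.triu_indices(len(categories), k=1)   # each of the 105 pairs once
order = np.argsort(-np.abs(corr[i, j]))[:5]
for k in order:
    print(f'{categories[i[k]]} vs {categories[j[k]]}: r = {corr[i[k], j[k]]:+.2f}')
```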
import numpy as np
import pandas as pd
data
【Code Answer】
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.xlabel('Parks')
plt.ylabel('Number of Visitors')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
In this exercise, you will learn how to visualize data using Python,
specifically focusing on creating a bar chart with the Matplotlib library.
Data Generation: The first step involves generating sample data. We create
a list of park names and a corresponding list of random visitor numbers
using NumPy. The np.random.randint function generates random integers
between 100 and 1000, simulating the number of visitors to each park.
Data Organization: We then organize this data into a Pandas DataFrame,
which is a powerful data structure for handling and analyzing data in
Python. This DataFrame contains two columns: one for the park names and
another for the visitor counts.
Data Visualization: The next part involves visualizing this data. We use the
plt.bar function from Matplotlib to create a bar chart. The x-axis represents
the parks, while the y-axis represents the number of visitors. We also
customize the chart with a title, axis labels, and rotate the x-axis labels for
better readability.
Displaying the Chart: Finally, the plt.show() function is called to display the
chart. This process helps you understand how to analyze and visualize data
effectively, which is a crucial skill in data analysis and statistics.
This exercise not only reinforces your understanding of Python
programming but also enhances your ability to interpret and present data
visually, making it a valuable tool in your analytical toolkit.
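A small point about the data-generation step: NumPy's np.random.randint(low, high) excludes high, so randint(100, 1000) produces visitor counts from 100 to 999. The sketch below uses hypothetical park names, because the original names are not listed here. It builds the same kind of two-column DataFrame and then checks that range on a larger sample.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Hypothetical park names (the originals are not given)
parks = ['Park A', 'Park B', 'Park C', 'Park D', 'Park E']

# randint's upper bound is exclusive: values are drawn from 100..999
visitors = np.random.randint(100, 1000, size=len(parks))
df = pd.DataFrame({'Park': parks, 'Visitors': visitors})
print(df)

# A large sample makes the half-open range visible
sample = np.random.randint(100, 1000, size=100_000)
print(sample.min(), sample.max())  # min is at least 100, max is at most 999
```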
【Trivia】
Did you know that visualizing data can significantly improve
comprehension and retention of information? Studies show that people
remember visual information better than text alone, making data
visualization an essential skill in data analysis.
84. Vehicle Fleet Distribution Analysis
Importance★★★★☆
Difficulty★★★☆☆
You are working as a data analyst for a logistics company that manages a
diverse fleet of vehicles.
The company wants to understand the distribution of different types of
vehicles in their fleet to optimize resource allocation.
Your task is to generate a pie chart that visually represents this distribution.
Use Python to create this chart, and ensure that you provide the company
with insights into the proportions of each vehicle type.
To start, generate a sample dataset of vehicles, then proceed to create the
chart.
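import random
vehicle_types = ["Truck", "Van", "Car", "Motorcycle", "Bicycle"]
# Assumption: each vehicle type has between 10 and 100 units
vehicle_counts = [random.randint(10, 100) for _ in vehicle_types]
fleet_data = dict(zip(vehicle_types, vehicle_counts))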
【Diagram Answer】
【Code Answer】
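import random
import matplotlib.pyplot as plt
vehicle_types = ["Truck", "Van", "Car", "Motorcycle", "Bicycle"]
# Assumption: each vehicle type has between 10 and 100 units
vehicle_counts = [random.randint(10, 100) for _ in vehicle_types]
# autopct prints each slice's percentage; startangle rotates the first slice
plt.pie(vehicle_counts, labels=vehicle_types, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Vehicle Types in the Fleet')
plt.show()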
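The task also asks for insights into the proportions of each vehicle type. The real fleet counts are not given, so this minimal sketch assumes 10 to 100 vehicles per type. It computes the same percentages that autopct='%1.1f%%' prints on the pie slices and lists them from largest to smallest, which is easier to compare than slice areas.

```python
import random

random.seed(1)
vehicle_types = ["Truck", "Van", "Car", "Motorcycle", "Bicycle"]
# Assumption: 10 to 100 vehicles of each type (real counts are not given)
vehicle_counts = [random.randint(10, 100) for _ in vehicle_types]

total = sum(vehicle_counts)
# Share of each type as a percentage: the quantity autopct labels on the pie
shares = {vt: 100 * count / total for vt, count in zip(vehicle_types, vehicle_counts)}

# List the types from largest to smallest share
for vt, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{vt:<10} {share:5.1f}%")
```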
【Trivia】
Pie charts, while useful for displaying simple data distributions, can be
misleading if not used carefully.
For example, they are less effective when there are many categories with
small differences between them, making it difficult to discern proportions.
In such cases, bar charts or other types of visualizations might be more
appropriate.
85. Histogram Analysis of Heights for Business Insights
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a health and wellness company. The
company is conducting a study to better understand the height distribution
of its client base, which includes 1100 individuals. Your task is to generate a
histogram of the heights to provide a visual understanding of the
distribution. This will help the company tailor their services, such as
designing ergonomic furniture or fitness programs, to better fit the physical
characteristics of their clients. Create a Python script that generates a
histogram based on simulated height data for these 1100 individuals.
The histogram should provide insights into the overall distribution and any
potential anomalies.
Your deliverable should include both the code to generate the data and the
code to create the histogram.
import numpy as np
heights = np.random.normal(170, 10, 1100)  # assumed mean 170 cm, SD 10 cm
【Code Answer】
import numpy as np
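The Code Answer above stops after its import. The sketch below shows one way to complete it, assuming heights follow a normal distribution with mean 170 cm and standard deviation 10 cm (values the exercise does not specify). It generates the 1,100 heights, flags values more than three standard deviations from the mean as potential anomalies, and draws the histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
# Assumed distribution: mean 170 cm, standard deviation 10 cm
heights = np.random.normal(loc=170, scale=10, size=1100)

# Simple anomaly check: values more than 3 standard deviations from the mean
mean, std = heights.mean(), heights.std()
outliers = heights[np.abs(heights - mean) > 3 * std]
print(f"Mean: {mean:.1f} cm, SD: {std:.1f} cm, potential anomalies: {outliers.size}")

plt.figure(figsize=(10, 6))
plt.hist(heights, bins=30, edgecolor='black')
plt.axvline(mean, color='red', linestyle='--', label=f'Mean ({mean:.1f} cm)')
plt.title('Height Distribution of 1,100 Clients')
plt.xlabel('Height (cm)')
plt.ylabel('Number of Clients')
plt.legend()
plt.show()
```

The 3-standard-deviation cutoff is only a common rule of thumb. Skewness or a second peak in the histogram would be worth investigating even when no single value crosses it.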
【Trivia】
Histograms are one of the most common tools in data analysis to
understand the distribution of a single variable.
They can reveal important characteristics of data, such as skewness,
bimodality, and the presence of outliers.
For example, in quality control, histograms are frequently used to identify
whether processes meet standards or if there are defects that need
addressing.
86. Moving Average Curve with Synthetic Data
Importance★★★☆☆
Difficulty★★☆☆☆
You are a data analyst for a retail company. You have been tasked with
analyzing sales data to identify trends over time. Your goal is to plot a
moving average curve to smooth out the fluctuations in the sales data.
Create a Python code snippet that generates synthetic sales data and plots
the moving average curve. Use this code as a basis for your analysis.
【Data Generation Code Example】
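import numpy as np
import pandas as pd
np.random.seed(0)
# Assumptions: 100 daily records from 2023-01-01; cumulative normal steps around a base of 100, plus noise
dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
sales = 100 + np.cumsum(np.random.normal(0, 1, 100)) + np.random.normal(0, 0.5, 100)
data = pd.DataFrame({'Date': dates, 'Sales': sales})
data.head()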
【Diagram Answer】
【Code Answer】
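import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
# Assumptions: 100 daily records from 2023-01-01; cumulative normal steps around a base of 100, plus noise
dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
sales = 100 + np.cumsum(np.random.normal(0, 1, 100)) + np.random.normal(0, 0.5, 100)
data = pd.DataFrame({'Date': dates, 'Sales': sales})
# 7-day moving average; the first 6 values are NaN (no full window yet)
data['Moving_Average'] = data['Sales'].rolling(window=7).mean()
plt.figure(figsize=(12,6))
plt.plot(data['Date'], data['Sales'], label='Sales', color='blue')
plt.plot(data['Date'], data['Moving_Average'], label='7-Day Moving Average', color='orange')
plt.title('Sales with 7-Day Moving Average')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.grid()
plt.show()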
In this exercise, you will learn how to plot a moving average curve using
synthetic sales data in Python. The moving average is a statistical
calculation that helps smooth out short-term fluctuations and highlight
longer-term trends in data.
First, we generate synthetic sales data using a normal distribution. The
numpy library is used to create random sales figures, which are then
cumulatively summed to simulate a sales trend over time. We also add some
noise to the sales data to make it more realistic.
Next, we create a DataFrame using the pandas library, which allows us to
organize our data efficiently. The DataFrame consists of two columns:
'Date' and 'Sales'. We then compute the moving average of the sales data
using the rolling() method, specifying a window of 7 days. This means that
each point in the moving average series is the average of that day's sales and
the sales of the six preceding days; the first six points, which lack a full
7-day window, are NaN.
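To see exactly what rolling(window=7).mean() produces, the sketch below applies it to ten illustrative daily values rather than the exercise's synthetic data. It shows the six leading NaN values and confirms that each full-window entry is the plain average of the current value and the six before it.

```python
import pandas as pd

# Ten illustrative daily sales figures, only to show the mechanics
sales = pd.Series([10, 12, 11, 13, 15, 14, 16, 18, 17, 19], dtype=float)
ma = sales.rolling(window=7).mean()

# The first 6 entries are NaN: fewer than 7 observations are available
print(ma.isna().sum())  # 6

# Entry 6 averages entries 0..6 (the current day and the 6 before it)
print(ma.iloc[6], sales.iloc[0:7].mean())  # 13.0 13.0
```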
Finally, we use matplotlib to create a visual representation of the sales data
and the moving average. We plot the sales data in blue and the moving
average in orange, adding titles and labels to make the chart informative.
The plt.show() function displays the plot.
This exercise not only helps you understand how to plot data but also
emphasizes the importance of moving averages in data analysis, particularly
in identifying trends over time.
【Trivia】
Did you know that moving averages are widely used in various fields,
including finance, economics, and even meteorology? They help analysts
make informed decisions by filtering out noise and providing a clearer view
of trends.
87. Creating a Box Plot to Compare Fruit Prices
Importance★★★★☆
Difficulty★★☆☆☆
You are working as a data analyst for a grocery store chain.
The store manager has asked you to compare the prices of different types of fruits sold across various branches to identify pricing patterns.
Your task is to create a box plot that visualizes the price distribution of 12
different types of fruits.
The data for these fruit prices will be generated randomly for this exercise.
Use Python to create this visualization.
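import random
import pandas as pd
# Generate random fruit price data for 12 types of fruits across multiple stores
# Assumptions: hypothetical fruit names, 20 stores, prices between 0.5 and 5.0
fruits = ['Apple', 'Banana', 'Orange', 'Grape', 'Mango', 'Pineapple', 'Strawberry', 'Blueberry', 'Peach', 'Pear', 'Cherry', 'Kiwi']
data = {fruit: [round(random.uniform(0.5, 5.0), 2) for _ in range(20)] for fruit in fruits}
df = pd.DataFrame(data)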
【Diagram Answer】
【Code Answer】
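import random
import pandas as pd
import matplotlib.pyplot as plt
# Generate random fruit price data for 12 types of fruits across multiple stores
# Assumptions: hypothetical fruit names, 20 stores, prices between 0.5 and 5.0
fruits = ['Apple', 'Banana', 'Orange', 'Grape', 'Mango', 'Pineapple', 'Strawberry', 'Blueberry', 'Peach', 'Pear', 'Cherry', 'Kiwi']
data = {fruit: [round(random.uniform(0.5, 5.0), 2) for _ in range(20)] for fruit in fruits}
df = pd.DataFrame(data)
plt.figure(figsize=(10, 6))
df.boxplot()  # one box per column, i.e. per fruit type
plt.title("Price Distribution of 12 Fruit Types")
plt.xlabel("Fruit Type")
plt.ylabel("Price")
plt.xticks(rotation=45)
plt.show()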
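The exercise does not explain what each box represents. The sketch below uses a few illustrative prices, not the exercise's random data, to compute the statistics behind each box: the quartiles, the interquartile range (IQR), and Tukey's 1.5 × IQR rule. That rule is matplotlib's default (whis=1.5) for deciding which points are drawn individually as outliers.

```python
import pandas as pd

# Illustrative prices for two fruits (not data from the exercise)
df = pd.DataFrame({
    'Apple':  [1.2, 1.4, 1.5, 1.5, 1.6, 1.8, 2.0, 4.5],
    'Banana': [0.5, 0.6, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0],
})

# The box spans Q1 to Q3; the line inside it is the median
q1, median, q3 = df.quantile(0.25), df.median(), df.quantile(0.75)
iqr = q3 - q1

# Whiskers reach the most extreme points inside these limits;
# points beyond them are plotted as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = {col: df[col][(df[col] < lower[col]) | (df[col] > upper[col])].tolist()
            for col in df.columns}
print(pd.DataFrame({'Q1': q1, 'Median': median, 'Q3': q3, 'IQR': iqr}))
print(outliers)  # {'Apple': [4.5], 'Banana': []}
```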
【Trivia】
Did you know that box plots were first introduced by John Tukey in the
1970s?
Tukey was an American mathematician who contributed significantly to the
field of statistics.
Box plots are particularly useful when comparing distributions between
multiple groups.
88. Generate a Heatmap from a 65x65 Matrix of Random Values
Importance★★★☆☆
Difficulty★★☆☆☆
A retail company wants to analyze the sales performance across different
regions. They have decided to visualize the sales data using a heatmap.
Your task is to generate a 65x65 matrix of random sales figures (values
between 0 and 100) to represent sales data across various regions and then
create a heatmap from this matrix.
Please write the Python code to generate this data and create the heatmap
visualization.
import numpy as np
import matplotlib.pyplot as plt
# 65x65 matrix of random sales figures between 0 and 100
sales_data = np.random.rand(65, 65) * 100
【Code Answer】
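import numpy as np
import matplotlib.pyplot as plt
sales_data = np.random.rand(65, 65) * 100  # 65x65 random sales figures in [0, 100)
plt.imshow(sales_data, cmap='viridis', aspect='auto')
plt.title('Sales Performance Heatmap')
plt.colorbar(label='Sales Figures')
plt.ylabel('Region')
plt.show()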
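A heatmap of independent random values has no real structure, so any apparent hot spot is chance. As a numerical complement to the visual check, the sketch below assumes the same 65x65 matrix of uniform values in [0, 100). It locates the largest cell and computes the per-row and per-column averages that the colors only suggest.

```python
import numpy as np

np.random.seed(0)
# 65x65 matrix of random sales figures in [0, 100), as in the exercise
sales_data = np.random.rand(65, 65) * 100

# Brightest cell: argmax returns a flat index, unravel_index converts it to (row, column)
row, col = np.unravel_index(np.argmax(sales_data), sales_data.shape)
print(f"Highest value {sales_data[row, col]:.2f} at row {row}, column {col}")

# Average of each row and each column of the matrix
row_means = sales_data.mean(axis=1)
col_means = sales_data.mean(axis=0)
print(row_means.argmax(), col_means.argmax())
```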
【Trivia】
Heatmaps are a powerful visualization tool that can represent complex data
in an easily interpretable format. They are commonly used in various fields,
including finance, marketing, and health sciences, to visualize patterns and
trends.
89. Comparative Analysis of Animal Speeds Using Violin Plot
Importance★★★★☆
Difficulty★★☆☆☆
You are a data analyst working for a wildlife research organization. The
organization wants to visualize the speed distribution of various animals to
understand their mobility capabilities better. The speed data (in km/h) for
nine different types of animals has been collected. Your task is to create a
violin plot to compare the speed distributions of these animals and provide
insights into their mobility patterns. Write Python code that (1) generates
synthetic speed data for nine types of animals, where each animal has a
different number of speed observations and the speeds vary around a
mean value typical for that species, and (2) creates a violin plot to compare
these distributions visually. Ensure that the code is efficient and concise.
【Data Generation Code Example】
import numpy as np
np.random.seed(42)
【Code Answer】
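import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
# Assumed (mean, SD, count) per animal; only the cheetah (100) and elephant (25) means come from the text
params = {'Cheetah': (100, 10, 50), 'Lion': (80, 8, 40), 'Horse': (70, 7, 60), 'Greyhound': (65, 6, 45), 'Kangaroo': (55, 5, 35),
          'Zebra': (60, 6, 55), 'Elephant': (25, 3, 30), 'Deer': (50, 5, 42), 'Rabbit': (45, 5, 48)}
# Flatten into (animal, speed) tuples, then split into two parallel sequences
pairs = [(name, s) for name, (mu, sd, n) in params.items() for s in np.random.normal(mu, sd, n)]
animals, speeds = zip(*pairs)
plt.figure(figsize=(12, 6))
sns.violinplot(x=list(animals), y=list(speeds))
plt.title("Speed Distribution of Nine Animal Types")
plt.xlabel("Animal Type")
plt.ylabel("Speed (km/h)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()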
The code begins by importing the necessary libraries: numpy for generating
random data, matplotlib.pyplot for plotting, and seaborn for creating the
violin plot.
Next, we generate synthetic speed data for nine different animals. Each
animal's speed data is generated using a normal distribution, with a
specified mean and standard deviation that are typical for that species. For
example, cheetahs are known to be the fastest land animals, so their mean
speed is set to 100 km/h. In contrast, elephants are much slower, with a
mean speed of 25 km/h. The np.random.seed(42) ensures that the random
data generated is reproducible.
The data is then flattened into a list of tuples, where each tuple consists of
an animal type and its corresponding speed observation. This long-form
layout, with one entry per observation, is a convenient input for Seaborn's
plotting functions.
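The flattening step can be sketched as follows, assuming hypothetical (mean, standard deviation, count) parameters for three of the nine animals; only the cheetah and elephant means come from the explanation. The list of tuples is turned into a two-column DataFrame. A per-animal summary then checks that each group has its own observation count and sits near its assumed mean.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
# Assumed (mean, SD, number of observations) for three of the animals;
# only the cheetah and elephant means come from the explanation
params = {'Cheetah': (100, 10, 50), 'Elephant': (25, 3, 30), 'Horse': (70, 7, 40)}

# One (animal, speed) tuple per observation: the long-form layout
records = [(name, speed)
           for name, (mu, sd, n) in params.items()
           for speed in np.random.normal(mu, sd, n)]
df = pd.DataFrame(records, columns=['Animal', 'Speed'])

# Per-animal summary: counts differ by design, means sit near the assumed values
print(df.groupby('Animal')['Speed'].agg(['count', 'mean', 'std']).round(1))
```

The same DataFrame can be passed directly to Seaborn as sns.violinplot(data=df, x='Animal', y='Speed').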
Finally, the sns.violinplot() function is used to create the violin plot. The x-
axis represents the different animal types, while the y-axis shows the speed
in km/h. The plot is customized with titles and axis labels to ensure clarity.
The plt.xticks(rotation=45) rotates the animal names on the x-axis for better
readability, and plt.tight_layout() adjusts the layout to prevent overlap.
A violin plot is particularly useful in this context because it displays the
whole distribution of speed data for each animal: each violin's width traces a
kernel density estimate, and Seaborn's default inner box marks the median and
interquartile range. This visualization helps in understanding not just the
average speed of each animal but also the variability in their speeds, which
can be crucial for ecological studies.
【Trivia】
Did you know that the cheetah, often considered the fastest land animal, can
accelerate from 0 to 100 km/h in just a few seconds? However, it can only
maintain this speed for a short burst due to the immense energy required.
90. Generating and Analyzing a 3D Surface Plot of a Complex Algebraic Function
Importance★★★☆☆
Difficulty★★★☆☆
You have been hired by a mathematical visualization company to develop a
3D surface plot for a complex algebraic function.
The company needs a plot to visualize the function in three dimensions for
educational purposes.
Your task is to write a Python script that will generate a 3D surface plot of
the function and analyze its behavior.
Ensure that your code includes both the creation of input data and the
generation of the plot.
The company is interested in visualizing the function f(x, y) = sin(sqrt(x^2 + y^2)) / sqrt(x^2 + y^2) over a defined range of x and y values.
They also require basic statistical analysis, such as calculating the mean and
standard deviation of the function's values across the grid.
Create the necessary input data programmatically, generate the 3D surface
plot, and include the required statistical analysis.
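import numpy as np
x = np.linspace(-10, 10, 101)  # assumed range; an odd count puts 0 on the grid
y = np.linspace(-10, 10, 101)
X, Y = np.meshgrid(x, y)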
【Code Answer】
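import numpy as np
import matplotlib.pyplot as plt
# Assumed grid: 101 points per axis on [-10, 10], so the origin is included
x = np.linspace(-10, 10, 101)
y = np.linspace(-10, 10, 101)
X, Y = np.meshgrid(x, y)
R = np.sqrt(X**2 + Y**2)
with np.errstate(invalid='ignore', divide='ignore'):
    Z = np.sin(R) / R  # 0/0 at the origin gives NaN
Z = np.where(np.isnan(Z), 1.0, Z)  # the limit of sin(r)/r as r -> 0 is 1
mean_z = np.mean(Z)
std_z = np.std(Z)
print(f"Mean of Z: {mean_z:.4f}, Standard deviation of Z: {std_z:.4f}")
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
ax.set_title('f(x, y) = sin(sqrt(x^2 + y^2)) / sqrt(x^2 + y^2)')
plt.show()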
In this exercise, we begin by creating a grid of x and y values using
NumPy’s linspace and meshgrid functions.
These functions are essential in numerical computing for generating evenly
spaced values over a specified range and creating coordinate matrices from
coordinate vectors.
The function f(x, y) is defined as sin(sqrt(x^2 + y^2)) / sqrt(x^2 + y^2).
This function is particularly interesting because of its removable singularity
at the origin (0, 0): evaluating it there gives 0/0, which NumPy returns as NaN,
so we replace that value with 1, the limit of sin(r)/r as r approaches 0.
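The NaN replacement can be cross-checked against NumPy's built-in np.sinc. This function computes the normalized sinc sin(πt)/(πt) and returns 1 at t = 0, so scaling the argument by 1/π gives sin(r)/r with the origin handled automatically. The sketch assumes a 101-point grid on [-10, 10] so that the origin lies on the grid.

```python
import numpy as np

# Assumed grid: 101 points on [-10, 10], so x = 0 and y = 0 lie on it
x = np.linspace(-10, 10, 101)
X, Y = np.meshgrid(x, x)
R = np.sqrt(X**2 + Y**2)

# Approach from the text: divide, then replace the 0/0 NaN at the origin with 1
with np.errstate(invalid='ignore'):
    Z_manual = np.sin(R) / R
Z_manual = np.where(np.isnan(Z_manual), 1.0, Z_manual)

# Alternative: np.sinc(t) = sin(pi*t)/(pi*t) with value 1 at t = 0,
# so np.sinc(R / np.pi) equals sin(R)/R without producing a NaN
Z_sinc = np.sinc(R / np.pi)

print(np.allclose(Z_manual, Z_sinc))  # True
```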
Next, we calculate basic statistical measures, the mean and standard deviation,
of the computed Z values to provide insight into the distribution of the
function values.
These statistics summarize the overall behavior of the
function over the defined grid.
Finally, the 3D surface plot is generated using Matplotlib’s plot_surface
function, which visualizes the relationship between x, y, and z in three
dimensions.
The plot is enhanced by adding labels to each axis and a title for clarity.
This exercise demonstrates the power of Python in both data generation and
visualization, along with basic statistical analysis, making it highly
applicable in mathematical modeling and education.
【Trivia】
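The function f(x, y) = sin(sqrt(x^2 + y^2)) / sqrt(x^2 + y^2) is a radially
symmetric form of the unnormalized sinc function, sinc(r) = sin(r) / r.
The sinc function is significant in signal processing: its normalized form,
sin(πt) / (πt), is the kernel used to reconstruct a band-limited continuous
signal from its discrete samples.
The 3D surface plot of this function reveals concentric ripples whose height
decays with distance from the origin, a wave-like structure characteristic of
functions involving sine and cosine.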
Chapter 4 Request for review evaluation
Dear Reader,
Thank you for taking the time to read this book on Python data analysis and
statistical analysis.
As an author, I am deeply grateful for your interest and support.
This book is designed for those who have a basic understanding of
programming and want to dive deeper into the world of data analysis using
Python.
Through 100 practical exercises, you will learn how to apply various
techniques and tools to extract insights from data.
One of the key features of this book is the inclusion of figures showing the
results of running the source code, along with detailed explanations.
This visual approach helps to simplify complex concepts and makes the
learning process more engaging and effective.
I sincerely hope that this book has been a valuable resource for you and that
you have gained new skills and knowledge that you can apply in your work
or personal projects.
If you have any feedback, comments, or suggestions, I would greatly
appreciate it if you could take a moment to share them with me.
Your input is invaluable as it helps me to improve my writing and create
better content for future readers.
Even if you only have time to leave a star rating, it would mean a lot to me
and would help to guide my future writing endeavors.
Thank you once again for your support and for being a part of this journey.
I look forward to continuing to provide valuable resources and to connect
with readers like yourself.
Best regards,
Appendix: Execution Environment
In this eBook, we will use Google Colab to run Python code.
Google Colab is a free Python execution environment that runs in your
browser.
Below are the steps to use Google Colab to execute Python code.