Report Python
HANOI, 2023
ACKNOWLEDGEMENT
------*------
We would like to extend our heartfelt gratitude to all who have supported and
guided us throughout the preparation of this report for the Python for Data
Science lecture.
First and foremost, we owe our deepest thanks to our esteemed lecturer, Nguyen
Tuan Long, whose insights, patience, and dedication have been invaluable. Their
profound knowledge and persistent encouragement have been the guiding light
throughout our research and analysis. Without their unwavering support and
guidance, this report would not have been possible.
We are also immensely grateful to our fellow classmates and group members. The
collaborative spirit, constant exchange of ideas, and shared dedication among all
four of us have made this journey both rewarding and enlightening. Each member
brought a unique perspective and expertise that enriched the report and made it a
true team effort.
Lastly, we appreciate our families and friends for their understanding, patience,
and encouragement during the late nights and intense discussions.
Sincerely,
TABLE OF CONTENTS
------*------
DESCRIPTION OF PROJECT IMPLEMENTATION PROCESS USING
MANIM .......................................................................................................................... 5
I. Design and development steps for the animations ................................................ 5
1.1. Understanding the Problem: ............................................................................ 5
1.2. Developing Animation for Each Step: .............................................................. 5
II. Tools and techniques used ...................................................................................... 7
III. Algorithms and logic behind the animations........................................................ 8
3.2. Introduction to the problem ................................................................. 8
3.2. Constructing and optimizing the loss function................................................... 9
3.2.1. Model............................................................................................................... 9
3.2.2. Loss function ................................................................................................. 10
3.2.3. Gradient Descent .......................................................................................... 11
3.4. Discussion ........................................................................................................ 13
3.4.1. Problems that can be solved by linear regression ........................................ 13
3.4.2. Limitations of linear regression .................................................................... 13
IV. Challenges and difficulties faced. ........................................................................ 14
PRESENTATION OF COMPLETED ANIMATIONS ........................................... 16
I. Linear Regression model ..................................................................................... 16
1.1. Introduction................................................................................................... 16
II. Loss Function ........................................................................................................ 18
2.1. Introduction to the Loss Function .................................................. 18
2.2. Gradient Descent .......................................................................................... 22
III. Training/testing process....................................................................................... 22
3.1. Training/Testing process ................................................................................ 22
3.2. Explanation of the MetricScene class .............................................. 25
IV. Solving an application problem .......................................................... 25
4.1. Explanation of the TableExamples class ......................................... 25
4.2. Explanation of the FitScene2 class .................................................. 27
V. Explanation of the last scene ............................................................... 31
EVALUATION OF USING MANIM ........................................................................ 32
CONCLUSION ............................................................................................................ 34
REFERENCES ............................................................................................................ 35
INTRODUCTION
------*------
In the realm of mathematical animation, few tools possess the capability to
seamlessly blend elegance with mathematical precision as effectively as Manim
— the Mathematical Animation Engine. Developed primarily by Grant Sanderson
of 3Blue1Brown, Manim has emerged as a powerful open-source library that
transforms the way we visualize and communicate mathematical concepts. This
essay explores the intersection of Manim's prowess and the fundamental
principles of linear regression, delving into how this animation engine breathes
life into abstract mathematical constructs.
This report serves a dual purpose — to unveil the capabilities of Manim in the
context of linear regression and to elucidate the fundamental principles that
underlie this statistical technique. By marrying the elegance of visual storytelling
with the rigor of mathematical analysis, we aim to not only showcase the
versatility of Manim but also provide a comprehensive understanding of linear
regression, making it accessible to a broader audience.
Downloaded by Vu Bui ([email protected])
lOMoARcPSD|18620449
Data Understanding: The dataset we use has a particular property: the points
must be scattered around a straight line so that the linear regression model works
effectively and produces accurate results. As can be seen, we use housing-price
data and describe it in a graph. The data are indeed scattered around a straight
line, making them suitable for the application of linear regression.
and the b (or β0) coefficient controls the intercept of the line. In machine
learning, it is also known as the bias.
Gradient Descent: After that, we need to find the m and b coefficients that
minimize the loss function. These coefficients can be found with a variety of
techniques; one of them is gradient descent, an algorithm that finds the
minimum of the function J(θ) based on its derivative.
Train/Test Split: Dividing the data into training and testing sets to evaluate
model performance. The training set is used to fit the regression line, and the
test set is then used to validate it; this is done to make sure that the regression
performs well on data it has not seen before. Metrics used to evaluate linear
regression range from R² and the standard error of the estimate to prediction
intervals and statistical significance.
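As an illustration of one such metric, R² can be computed by hand as 1 − SS_res / SS_tot. The following is a small sketch in plain Python, independent of the sklearn implementation used later in the project; the sample numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual vs. predicted house prices:
# predictions close to the actual values give R^2 near 1
actual = [448, 509, 535, 551]
predicted = [450, 505, 540, 548]
print(r_squared(actual, predicted))
```

A model that predicts every point exactly scores R² = 1, while a model no better than predicting the mean scores 0.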
As for tools, we wrote our code as Python classes inheriting from Manim's
Scene class to structure each animation scene. Google Colab provides a
convenient browser-based environment for development without local software
installs. To combine the finished scenes into a video, we exported the animations
and edited them together with video software, since rendering all scenes into one
video with Manim is complex and time-consuming. Moreover, we utilized several
Python libraries that were new to us. For instance, we used pandas to read the
data file and turn it into the graphs needed for visualizing the data. This step
opened the door to a simple way of creating graphs from data, which helps a lot
in a statistical project that involves coding. Besides, the sklearn library provides
essential functions, such as fitting a linear model, so we did not need to put much
effort into implementing the optimization and its derivatives ourselves.
In this section, we delve into the underlying algorithms and logic governing
the creation of animations, particularly focusing on linear regression as a
foundational technique. The linear regression algorithm solves problems with
real-valued outputs, for example: predicting house prices, predicting stock prices,
predicting age, etc. It is a supervised algorithm where the relationship between
the input and output is described by a linear function. This algorithm is also
referred to as linear fitting or linear least squares.
In the hypothetical situation where one is tasked with estimating the price of a
50-square-meter house, a logical approach involves plotting a line that best aligns
with the given data points. Subsequently, the price of the house at the 50-square-
meter point can be calculated.
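This approach can be sketched in plain Python: fit the best line through sample (area, price) points with the closed-form least-squares solution (rather than the gradient-descent route described below), then read off the prediction at the 50-square-meter point. The data values here are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = w1 * x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w1 * mean_x  # intercept makes the line pass through the means
    return w1, w0

# Hypothetical (area, price) data scattered around a line
areas = [30, 32, 40, 45, 60]
prices = [448, 509, 580, 640, 830]
w1, w0 = fit_line(areas, prices)
print(w1 * 50 + w0)  # estimated price of a 50-square-meter house
```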
y = w1 · x + w0
Therefore, the task of finding the straight line is equivalent to determining the
values of w0 and w1. To facilitate formulaic representation, we designate the
data in the data table as (x1, y1) = (30, 448), (x2, y2) = (32, 509), ...
This signifies that a house with area xi corresponds to an actual price yi. The
value predicted by the current model is denoted as:
ŷi = w1 · xi + w0.
Here, N denotes the number of data points; J is (up to a constant factor) the sum
of squared residuals between the actual prices yi and the predictions ŷi. Several
observations can be made:
• J is non-negative.
• The smaller J is, the closer the line is to the data points. If J = 0, the line
passes exactly through all data points.
Hence, the task of finding the line (model) that best fits the dataset becomes the
task of finding w0, w1 such that the loss function J is minimized.
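This loss can be computed directly. The sketch below assumes the common 1/(2N) mean-squared-error convention for J, since the exact scaling of the original formula is given as an image.

```python
def loss_J(w0, w1, xs, ys):
    """Loss J(w0, w1): scaled sum of squared residuals; smaller J = closer fit.
    (The 1/(2N) scaling is one common convention, assumed here.)"""
    n = len(xs)
    return sum((y - (w1 * x + w0)) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# J is zero when the line passes exactly through every point
print(loss_J(1, 2, [0, 1, 2], [1, 3, 5]))  # -> 0.0
```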
In this step, compute the partial derivative of J with respect to w0 (Dw0) and the
partial derivative with respect to w1 (Dw1), using the equation above, for each
data point (xi, yi). Finally, sum these per-point contributions to obtain the total
Dw0 and Dw1. In other words, we compute the gradient of J(w0, w1) over the
current dataset.
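A minimal sketch of this gradient computation in plain Python, again assuming the 1/(2N) mean-squared-error form of J:

```python
def gradient_J(w0, w1, xs, ys):
    """Gradient of J: per-point partial derivatives Dw0, Dw1, summed over
    the data and averaged (assumes the 1/(2N) mean-squared-error form of J)."""
    n = len(xs)
    Dw0 = sum((w1 * x + w0) - y for x, y in zip(xs, ys)) / n
    Dw1 = sum(((w1 * x + w0) - y) * x for x, y in zip(xs, ys)) / n
    return Dw0, Dw1

# At an exact fit (here y = 2x + 1) the gradient vanishes
print(gradient_J(1, 2, [0, 1, 2], [1, 3, 5]))  # -> (0.0, 0.0)
```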
In essence, iterate Step 2 a sufficiently large number of times (e.g., 100 or 1,000
iterations, depending on the problem and the learning rate) until J(w0, w1)
reaches a small enough value. The values of w0 and w1 that we are left with are
the optimal ones.
Choosing the learning rate L is extremely important. There are three scenarios to
consider:
• If L is too small: each update decreases the function only slightly, so Step 2
must be performed many times for the function to reach its smallest value.
• If L is reasonable: after a reasonable number of Step 2 iterations, the function
will reach a small enough value.
• If L is too large: the updates overshoot and never reach the minimum value of
the function.
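The full loop, repeating Step 2 with a learning rate L, can be sketched as follows; the data and hyperparameter values are illustrative only.

```python
def gradient_descent(xs, ys, L=0.01, iterations=1000):
    """Repeat Step 2: move (w0, w1) against the gradient with learning rate L."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the 1/(2N) mean-squared-error loss
        Dw0 = sum((w1 * x + w0) - y for x, y in zip(xs, ys)) / n
        Dw1 = sum(((w1 * x + w0) - y) * x for x, y in zip(xs, ys)) / n
        w0 -= L * Dw0
        w1 -= L * Dw1
    return w0, w1

# A reasonable L converges toward the true line y = 2x + 1
w0, w1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7], L=0.05, iterations=5000)
print(round(w0, 3), round(w1, 3))  # -> 1.0 2.0
```

Setting L much larger (e.g., 1.0 on this data) makes the iterates diverge, which is the overshoot case described above.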
The best way to check whether the learning rate is appropriate is to examine the
value of the function after each execution of Step 2 by plotting a graph.
Comments:
• The algorithm works very well in cases where it is not possible to find the
minimum value using linear algebra.
• The most important part of the algorithm is calculating the derivative of
the function with respect to each variable and then repeating Step 2.
3.4. Discussion
3.4.1. Problems that can be solved by linear regression
The function y ≈ f(x) = xᵀw is linear in both w and x. In practice, linear
regression can also be applied to models that are linear only in w. For example,
y ≈ w1·x1 + w2·x2 + w3·x1² + w4·sin(x2) + w5·x1·x2 + w0
can be handled by treating x1, x2, x1², sin(x2), and x1·x2 as new features and
then applying linear regression to this transformed data. However, discovering
feature functions like sin(x2) or x1·x2 is relatively nontrivial. Polynomial
regression, with new feature vectors of the form [1, x1, x1², …]ᵀ, is used more
often.
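A small pure-Python sketch of this idea: the feature map is nonlinear in x, but the model stays linear in w, so ordinary least squares (solved here via the normal equations) still applies. The helper name and data values are hypothetical.

```python
def fit_linear_in_w(xs, ys, features):
    """Least squares for y ~ w . phi(x): linear in w even when phi is nonlinear.
    Solves the normal equations (Phi^T Phi) w = Phi^T y by Gaussian elimination."""
    Phi = [features(x) for x in xs]
    k = len(Phi[0])
    # Build the k x k system A w = b
    A = [[sum(p[i] * p[j] for p in Phi) for j in range(k)] for i in range(k)]
    b = [sum(p[i] * y for p, y in zip(Phi, ys)) for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w

# Quadratic data y = 1 + 2x + 3x^2 is recovered with features [1, x, x^2]
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
w = fit_linear_in_w(xs, ys, lambda x: [1, x, x ** 2])
print([round(v, 6) for v in w])  # -> [1.0, 2.0, 3.0]
```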
The animation opens with the title “Visualizing Linear Regression”, which is our
project topic. It then introduces the table of contents of our video.
In the leading scene, our mission is to set up a situation where linear regression
can be applied. Moreover, since our real-life example is house pricing, we drew a
house in the scene, with its parts assembled from many Manim mobjects, along
with a table containing the dataset of house prices by area.
The next scene introduces the Linear Regression model, which is the main point
of the project. The title uses the same format as in the previous scene, and the
gradient color effect makes it more eye-catching. The following parts, however,
are among the most important scenes, as they build a comprehensive
understanding of the Linear Regression model.
Figure 8: Correlation scene
In this animation, we use a straight line on the xy-axes to illustrate the
correlation captured by the linear function.
line.add_updater(
    lambda l: l.become(
        Line(start=ax.c2p(0, b.get_value()),
             end=ax.c2p(10, m.get_value() * 10 + b.get_value())).set_color(YELLOW)
    )
)
return data, m, b, ax, points, line
Next, we create three versions of the linear regression function, and notation for
the function in LaTeX format, using Manim's MathTex class.
To keep the line on the graph moving with the coefficients, we define a helper
function blink and call it on the relevant parts of the equation:
blink(eq5[1], m, .50)
blink(eq5[3], b, 2.0)
An important part of calculating the Loss Function is finding the residuals from
the data points to the line y = m·x + b. Consequently, when creating the graph
that illustrates the Loss Function, we need a function to animate the residuals on
the graph:

def create_residual_model(scene, data, m, b, ax, points, line) -> tuple:
    residuals = VGroup()  # collects the residual segments (assumed initialization)
    for d in data:
        residual = Line(start=ax.c2p(d.x, d.y),
                        end=ax.c2p(d.x, m.get_value() * d.x +
                                   b.get_value())).set_color(RED)  # from each data dot to the line
        scene.play(Create(residual), run_time=.3)

        residual.add_updater(  # update the residuals when m and b change
            lambda r, d=d:
                r.become(Line(start=ax.c2p(d.x, d.y),
                              end=ax.c2p(d.x, m.get_value() * d.x +
                                         b.get_value())).set_color(RED)))
        residuals += residual
The add_updater method is employed to dynamically update the position of
each residual as the values of the slope 𝑚 and y-intercept 𝑏 change. This ensures
that the residuals move accordingly during any subsequent animations that modify
the linear regression model.
def flex_residuals():
For the next part, to create the given scene, we add a new class.
Y = df.values[:, -1]
# Split the dataset into training and testing sets (2/3 train, 1/3 test)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/3,
                                                    random_state=7)
# Fit a Linear Regression model on the training set
model = LinearRegression()
fit = model.fit(X_train, Y_train)
# Calculate the R^2 score on the test set
result = model.score(X_test, Y_test)
The coefficients of the trained model, representing the slope (m) and
intercept (b), are stored using Manim's ValueTracker. An axes system is set
up to create a visual representation of the data points and the linear regression
model. The training and testing data points are plotted on the graph as blue dots.
# Store the coefficients of the model into m and b using ValueTrackers
m = ValueTracker(fit.coef_.flatten()[0])
b = ValueTracker(fit.intercept_.flatten()[0])
# Set up the coordinate axes for visualization
ax = Axes(
    x_range=[0, 100, 20],
    y_range=[-40, 200, 40],
    axis_config={"include_tip": False},
)
# Plot the training and testing data points
train_points = [Dot(point=ax.c2p(p.x, p.y), radius=.15, color=BLUE) for p in
                pd.DataFrame(data={'x': X_train.flatten(),
                                   'y': Y_train.flatten()}).itertuples()]
test_points = [Dot(point=ax.c2p(p.x, p.y), radius=.15, color=BLUE) for p in
               pd.DataFrame(data={'x': X_test.flatten(),
                                  'y': Y_test.flatten()}).itertuples()]
Figure 21: Introduction to the problem
The initial animation introduces the topic with a title, "Linear Regression," and
a subtitle highlighting the context, "Hanoi Housing Price Problems." The use of
gradient colors enhances visual appeal.
The first table animation displays historical data on house areas and
corresponding prices. The color-coded labels emphasize the relevance of the
information. The table is introduced with a smooth entry animation, followed by
the gradual presentation of column labels. A second table animation introduces
the scenario of predicting the price for a 70 m² house. The row corresponding to
the prediction task is highlighted in blue, distinguishing it from the historical data.
This animation effectively conveys the transition from historical data to the
predictive task.
Arrows and braces are employed to illustrate the flow of information. The
arrows indicate the transition from historical data to prediction, while braces
provide textual annotations such as "Historical data" and "Prediction," aiding in
viewer comprehension. The animations are orchestrated to create a coherent flow,
ensuring that each element is introduced at an appropriate moment.
This scene begins by presenting both training and testing data points, along
with the initial linear regression model. The quality of the model is assessed
through the R² score, a metric indicating its predictive performance.
The left panel consolidates various elements for better visibility, and the
conclusion text summarizes the optimal house price according to the linear
regression model.
First, it displays the group name ("GROUP 6") and the label "DIRECTED
BY" at the top of the screen. These texts are created with specific font styles and
sizes, and after a brief pause (self.wait(2)), they fade out of view.
In summary, while Matplotlib and Plotly are versatile tools for a wide range
of visualizations, Manim's specialization in mathematical graphics sets it apart,
making it a preferred choice for those seeking to convey complex mathematical
concepts through dynamic and visually compelling animations.
CONCLUSION
------*------
In summarizing the implementation process, our visualization proved to be a
harmonious blend of aesthetic appeal and educational depth. The animation not
only presented a visually engaging representation of linear regression but also
served as an effective pedagogical tool, seamlessly integrating quantitative
assessments like the R² score for a comprehensive understanding of the model's
accuracy.
Looking ahead, our venture opens the door to exciting possibilities for future
improvements and developments. Suggestions include refining visual aesthetics,
experimenting with color schemes, and introducing interactive elements to elevate
the user experience. Feature expansion is also on the horizon, with considerations
for incorporating regularization techniques and exploring alternative algorithms,
ensuring a more comprehensive exploration of linear regression concepts.
REFERENCES
------*------
1. Khandelwal, R. (2019). Linear Regression Using Gradient Descent. Towards Data Science. Retrieved from https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931