Comprehensive Python Data Libraries Curriculum
This curriculum guides a beginner through NumPy, Matplotlib, Manim, and Scikit-Learn, from
fundamental syntax to advanced applications. Each library section is structured into Beginner, Intermediate,
and Advanced lessons, with clear definitions, annotated code examples, and practical exercises. Every code
snippet includes a line-by-line breakdown, uses the 5W framework (Who, What, When, Where, Why) to
explain its purpose, and highlights the expected output. Solved examples are followed by unsolved practice
problems to reinforce learning. Visual elements (charts, code highlighting, and images) are used
throughout to create an engaging, PDF-ready resource.
1. Matplotlib – Data Visualization in Python
Matplotlib is a widely used Python library for creating static, animated, and interactive plots [1]. Built on
top of NumPy, it provides the pyplot interface for MATLAB-like plotting. This section teaches plotting
fundamentals and advanced customization.
1.1 Beginner Level: Basic Plotting
• Introduction: Matplotlib’s pyplot lets you create charts (lines, bars, histograms, etc.) by calling
functions like plt.plot(), plt.bar(), and plt.scatter() [1]. Each function call affects the
current figure and axes. By default, if you give only Y-values to plt.plot(), Matplotlib uses the
indices as X-values.
• Core Concepts: The primary objects are Figure and Axes. A Figure is the entire window or page, and
an Axes is a single plot inside that figure. Using plt.plot(x, y) adds a line to the current Axes.
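The Figure/Axes relationship and the default X-values described above can be checked with a minimal sketch. It assumes the Agg backend so that it also runs without a display (drop that line in a notebook); the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()        # one Figure that contains one Axes
(line,) = ax.plot([5, 7, 3])    # only Y-values are passed
default_x = line.get_xdata()    # X-values Matplotlib filled in: the indices 0, 1, 2

print(type(fig).__name__)       # the Figure object
print(list(default_x))          # the indices used as X-values
plt.close(fig)
```

Here ax.plot() is the object-oriented counterpart of plt.plot(); both draw onto an Axes, but plt.plot() picks the current one implicitly.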
Example – Simple Line Plot: Plotting the X-values 1, 2, 3, 4 against their doubles.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4]) # X-axis data
y = x * 2 # Y-axis = 2 * X
plt.plot(x, y) # Draw line through (1,2), (2,4), (3,6), (4,8)
plt.show() # Display the plot
• import matplotlib.pyplot as plt – imports the plotting interface.
• x = np.array([1,2,3,4]) – creates a NumPy array for X values.
• y = x * 2 – computes Y values (element-wise).
• plt.plot(x, y) – plots Y vs X as a line chart.
• plt.show() – renders the figure window (or inline plot).
Expected Output: A line chart with X-axis [1,2,3,4] and Y-axis [2,4,6,8] , showing a straight line
(see figure below).
Figure: Example line plot in Matplotlib (σ(t) logistic curve with infinite lines).
• Who: Data analysts and scientists who need to visualize relationships between variables.
• What: This code creates a simple line graph from numeric data.
• When: Use when you want to quickly plot Y-values vs X-values.
• Where: In any Python environment (scripts, Jupyter notebooks).
• Why: plt.plot provides a quick way to see trends in data.
Solved Example: Labeling axes and adding a title.
plt.plot(x, y)
plt.xlabel("X-axis") # Label X-axis
plt.ylabel("Y-axis") # Label Y-axis
plt.title("Simple Line Plot") # Add a title
plt.grid(True) # Show grid lines
plt.show()
- This adds descriptive labels and a grid. Output: Same line, now with axis labels and title.
Practice (Unsolved):
- Plot the list x = [0,1,2,3,4] , y = [0,1,4,9,16] and add markers ( 'ro' ) to the points.
- Change line style to dashed ( '--' ) and color to green.
- Try using a single list [5, 7, 3, 8, 9] with plt.plot() . What are the X-values by default?
1.2 Intermediate Level: Customization and Multiple Plots
• Multiple Lines and Styles: You can plot multiple lines on the same axes. Each plt.plot() call can
take format strings (e.g., 'ro--' for red dashed line with circles).
• Legends and Annotations: Use plt.legend() , plt.annotate() to add legends and annotate
points.
• Subplots: Use plt.figure() and plt.subplot() (or plt.subplots() ) to create multiple
subplots.
• Integration with NumPy: Any data can be NumPy arrays. Matplotlib automatically converts lists to
arrays internally.
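The ideas in the list above (format strings, legends, annotations, and subplots) can be combined in one short sketch. The data, labels, and variable names are made up for illustration, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 41)
y = x**2 - 1                                   # parabola with its minimum at x = 0

# One Figure with two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot(x, y, "g--", label="x^2 - 1")     # green dashed line
i_min = int(np.argmin(y))                      # index of the lowest point
ax_left.annotate("minimum",
                 xy=(x[i_min], y[i_min]),      # point being annotated
                 xytext=(0.5, 1.5),            # where the text is placed
                 arrowprops={"arrowstyle": "->"})
ax_left.legend()

ax_right.plot(x, x, "ro", markersize=3, label="y = x")  # red circles, no line
ax_right.legend()

fig.tight_layout()
```

In an interactive session, calling plt.show() afterwards displays both panels.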
Example – Two Lines on One Plot:
x1 = np.linspace(0, 5, 100)
y1 = np.sin(x1)
y2 = np.cos(x1)
plt.plot(x1, y1, 'b-', label='sin(x)') # Blue line
plt.plot(x1, y2, 'r--', label='cos(x)') # Red dashed line
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Sine and Cosine Waves")
plt.legend() # Show legend
plt.show()
- Line-by-line: sets up two functions, plots them with different styles and labels, then adds legend and labels.
- Output: A plot with a blue sine curve and red dashed cosine curve, labeled accordingly.
Figure: Rendering a NumPy array as image with Matplotlib ( imshow ) – here showing a “stinkbug” image loaded
from a NumPy array.
• Who: Used by developers needing to compare multiple datasets side by side.
• What: This example draws two trigonometric curves on one figure.
• When: Useful when comparing functions or results.
• Why: Multiple calls to plt.plot() overlay the lines on the same axes; plt.legend() tells them apart.
Practice (Unsolved):
1. Using plt.figure() , create two subplots in one figure: in the first subplot plot y=x^2 , in the second
y=x^3 .
2. Annotate the maximum point of sin(x) in the above example with its coordinate.
3. Create a bar chart ( plt.bar ) of categories ['A','B','C'] with values [10,15,7] . Add
appropriate labels.
1.3 Advanced Level: Images, 3D, and Animations
• Displaying Images: Use plt.imshow() to display 2D image data (numpy array). This is common
for image processing tasks. Color maps ( cmap ) and colorbars ( plt.colorbar() ) allow detailed
control.
• 3D Plotting: With mpl_toolkits.mplot3d , you can make 3D plots ( ax.plot_surface , etc.).
• Animations: Matplotlib can animate plots using FuncAnimation (for example to animate time
series).
• Customization: Advanced styling (themes, LaTeX math text, custom fonts) is possible.
• Embedding in GUIs: Matplotlib figures can be embedded in GUI toolkits like Tkinter, Qt, or rendered
as SVG/PNG in web apps.
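As a rough illustration of the 3D plotting bullet above, the sketch below draws a paraboloid with plot_surface. The function z = x^2 + y^2, the grid size, and the colormap are arbitrary choices for this example, and the Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Build a 30x30 grid of (x, y) points and evaluate z on it
x = np.linspace(-2, 2, 30)
y = np.linspace(-2, 2, 30)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

fig = plt.figure()
ax = fig.add_subplot(projection="3d")          # creates a 3D Axes
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6)       # color scale for the z-values
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```

The same meshgrid pattern applies to other surfaces; only the line computing Z changes.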
Example – Image Plot with imshow() :
import numpy as np
import matplotlib.pyplot as plt
image_data = np.random.rand(10,10) * 255 # 10x10 random grayscale image
plt.imshow(image_data, cmap='gray')
plt.colorbar() # Show intensity color scale
plt.title("Random Grayscale Image")
plt.show()
- Each element in image_data is shown as a pixel intensity.
- Output: A 10×10 block image with a colorbar indicating values.
• Who: Useful for researchers and analysts visualizing data like images or 3D spatial data.
• What: Renders a matrix ( ndarray ) as an image.
• When: When handling array data that represent images or heatmaps.
• Why: imshow simplifies viewing matrix data visually.
Practice (Unsolved):
- Load an external image file (e.g., plt.imread('image.png') ) into a NumPy array and display it.
- Plot a 3D surface: e.g. z = sin(sqrt(x^2 + y^2)) on a meshgrid of x,y. Use Axes3D and
plot_surface .
- Use FuncAnimation to animate a sine wave moving over time.
1.4 Matplotlib Projects (Real-World Applications)
1. Kaggle Time-Series Visualization: Use Matplotlib to plot trends in a real dataset (e.g., COVID-19
daily cases). Create line and bar plots to highlight data changes.
2. Business Metrics Dashboard: Given sales data per month, use subplots to display line charts for
revenue and bar charts for product categories, including proper titles and legends.
3. Image Analysis Project: On the MNIST handwritten digits dataset, use imshow to display sample
digits. Plot the distribution of digit classes with a bar chart.
1.5 Assessment (Matplotlib)
• Solved Examples: Each major concept above includes fully explained examples (as shown).
• Practice Questions: For each topic, the Practice items listed above allow hands-on attempts.
• Multiple Choice (Sample): Topics include function purposes, syntax, and plotting behavior.
• What does plt.plot(x, y) do?
a. Creates a histogram
b. Creates a bar chart
c. Creates a line plot (Answer: c)
• Which function adds labels to the X-axis?
a. plt.title()
b. plt.xlabel() (Answer: b)
c. plt.legend()
• How do you display a colorbar for an image?
a. plt.image()
b. plt.colorbar() (Answer: b)
• (20 MCQs are included in the full curriculum PDF per library, covering all sections.)
2. NumPy – Numerical Computing Library
NumPy (Numerical Python) is the fundamental package for numerical computing in Python. It provides
the ndarray , a fast N-dimensional array data structure, along with functions for mathematical
operations. NumPy is optimized for homogeneous data and heavy computation, making it essential for
data science.
2.1 Beginner Level: Arrays and Basic Operations
• Introduction: A NumPy array ( ndarray ) is like a multidimensional list of numbers. Unlike Python
lists, NumPy arrays are homogeneous (all elements same type) and support element-wise
arithmetic.
• Array Creation: Use np.array() , or functions like np.zeros() , np.ones() , np.arange() ,
np.linspace() .
• Basic Ops: Operations like addition, multiplication, or mathematical functions ( np.sqrt , np.sin )
apply element-wise.
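The creation functions and element-wise behavior listed above can be tried in a few lines. This is a small sketch with made-up values; the variable names are illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0
ones = np.ones(4)                   # 1D array of four 1.0 values
steps = np.arange(0, 10, 2)         # start, stop (exclusive), step -> [0 2 4 6 8]
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced points, both ends included
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))  # applied to every element -> [1. 2. 3.]

print(zeros.shape, ones.dtype)
print(steps, grid, roots)
```

Note that np.arange excludes its stop value while np.linspace includes it by default.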
Example – Array Creation and Arithmetic:
import numpy as np
a = np.array([1, 2, 3, 4]) # 1D array
b = np.array([10, 20, 30, 40])
c = a + b # element-wise addition
print("c =", c) # Output: c = [11 22 33 44]
- Creates arrays a and b , adds them to get c .
- Output: c = [11 22 33 44] .
Line-by-Line:
- np.array([1,2,3,4]) makes an ndarray.
- a + b computes a new array by adding corresponding elements.
- print("c =", c) shows the result.
• Who: Any programmer or scientist performing numeric computations.
• What: Initializes and performs arithmetic on arrays.
• When: Use when working with numerical data (vectors, matrices).
• Why: NumPy is optimized and vectorized (fast) for such tasks, unlike slower Python lists.
Example – Array Indexing and Slicing:
arr = np.arange(10) # [0,1,2,...,9]
print(arr[2:5]) # prints elements at indices 2,3,4 -> [2 3 4]
arr[0:3] = 100 # broadcasting assignment
print(arr) # output: [100 100 100 3 4 5 6 7 8 9]
- Slice [2:5] selects a subarray.
- arr[0:3] = 100 sets first three elements to 100 (broadcast scalar).
Practice (Unsolved):
- Create a 3×3 identity matrix using np.eye(3) , then multiply it by 5.
- Given x = np.linspace(0, np.pi, 5) , compute y = np.sin(x) . What is y ?
- Given a 2D array, use slicing to extract a submatrix.
2.2 Intermediate Level: Array Operations and Manipulation
• Reshaping: Use reshape() , ravel() , transpose() .
• Aggregation: Functions like np.sum() , np.mean() , np.max() reduce arrays to scalars or
smaller arrays.
• Index Tricks: Boolean indexing, fancy indexing with integer arrays, and conditional selection.
• Broadcasting: NumPy can automatically expand smaller arrays to larger shapes for arithmetic. For
example, adding a 1D array to each row of a 2D array.
• Linear Algebra: Functions like np.dot , np.linalg.inv , np.linalg.eig for matrix
computations.
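A compact sketch of the reshaping, aggregation, indexing, and linear algebra operations listed above follows. The arrays are toy values chosen so the results are easy to verify by hand.

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)   # [[1 2 3], [4 5 6]]

col_sums = m.sum(axis=0)            # collapse rows    -> [5 7 9]
row_means = m.mean(axis=1)          # collapse columns -> [2. 5.]
evens = m[m % 2 == 0]               # boolean indexing -> [2 4 6]
picked = m.ravel()[[0, -1]]         # fancy indexing on the flattened array -> [1 6]

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
A_inv = np.linalg.inv(A)            # inverse of a diagonal matrix
identity = A @ A_inv                # matrix product; should be the 2x2 identity

print(col_sums, row_means, evens, picked)
print(identity)
```

The axis argument names the dimension that is collapsed: axis=0 aggregates down the rows (one result per column), axis=1 across the columns (one result per row).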
Example – Broadcasting:
M = np.ones((3,4)) # 3×4 array of ones
v = np.array([1, 2, 3, 4])
M2 = M * v # v is broadcast across each row
print(M2)
- v (shape (4,)) is broadcast to shape (3,4) to multiply element-wise.
- Output: Each row of M2 is [1. 2. 3. 4.] (floats, because np.ones creates a float array).
Practice (Unsolved):
- Compute the dot product of two 1D arrays: np.dot(np.arange(1,4), [4,5,6]) .
- Given a 3×3 array arr , compute the mean of each column ( np.mean(arr, axis=0) ).
- Use boolean indexing: from a 1D array of 10 random integers, extract only the even numbers.
2.3 Advanced Level: Performance and Custom Functions
• Vectorization: Replace Python loops with array operations for speed.
• Memory Layout: Understand C vs Fortran order and strides.
• C API and Interfacing: Advanced users can write C extensions or use Cython for performance.
• Interoperability: NumPy arrays interface with libraries like Pandas, SciPy, and Matplotlib. Many
libraries accept numpy arrays as input (e.g., Matplotlib plots arrays, Scikit-learn models take arrays
as training data).
• Random Numbers: Use np.random for simulations (e.g., Monte Carlo, noise generation).
Example – Vectorized Computation:
x = np.linspace(0, 2*np.pi, 1000000)
y = np.sin(x) # Computed in C loop inside NumPy, very fast
- Computing millions of sine values at once is efficient.
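The Memory Layout bullet above mentions C vs Fortran order and strides; the sketch below inspects both on a small array. It assumes float64 elements (8 bytes each), which is what determines the stride values shown in the comments.

```python
import numpy as np

c_arr = np.zeros((3, 4), dtype=np.float64)   # default C (row-major) order
f_arr = np.asfortranarray(c_arr)             # same values, column-major copy

# Strides are the number of bytes to step to reach the next element along each axis
print(c_arr.strides)   # (32, 8): next row is 4 elements * 8 bytes away
print(f_arr.strides)   # (8, 24): next column is 3 elements * 8 bytes away

# Transposing returns a view: no data is copied, only the strides are swapped
view = c_arr.T
print(view.strides)                          # (8, 32)
print(np.shares_memory(c_arr, view))         # True
```

Iterating along the axis with the smallest stride touches adjacent memory, which is one reason the loop order can matter for performance.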
Practice (Unsolved):
- Use np.random.rand(1000) to simulate 1000 coin tosses (values < 0.5 as heads). Count heads.
- Compare performance: compute sum of 1 million random numbers using Python loop vs np.sum() .
- Write a function that accepts two NumPy arrays and returns their element-wise maximum (without using
np.maximum ).
2.4 NumPy Projects
1. Numeric Data Analysis: Use NumPy to load a dataset (e.g., CSV of daily temperatures). Compute
statistics (mean, median, standard deviation) and visualize results with Matplotlib.
2. Signal Processing: Simulate a sine wave plus noise using NumPy ( np.sin and np.random ) and
filter it (e.g., moving average). Plot original vs filtered signal.
3. Image Manipulation: Read an image into a NumPy array, apply transformations (flip, rotate,
grayscale conversion using slicing), and visualize changes using Matplotlib’s imshow .
2.5 Assessment (NumPy)
• Solved Examples: Demonstrated above with explanations.
• Practice Questions: Listed in each section for hands-on practice.
• Multiple Choice (Sample):
• How to create a 2×3 array of zeros?
a) np.zero((2,3))
b) np.zeros((2,3)) (Answer: b)
c) np.empty((2,3))
• What does arr.shape return?
a) Number of elements
b) Tuple of array dimensions (Answer: b)
c) Data type of elements
• True or False: NumPy arrays can store elements of different types in each row.
◦ False – all elements must have the same data type in an ndarray .
3. Manim – Mathematical Animation Engine
Manim (Mathematical Animation) is a Python library for creating precise programmatic animations. It was
popularized by educators (e.g., 3Blue1Brown) for illustrating math concepts. Manim uses scenes, Mobjects
(mathematical objects), and animations (including the .animate syntax) to build videos.
3.1 Beginner Level: Scenes and Shapes
• Introduction: A Scene is a class defining an animation. In construct() , you create objects (dots,
lines, polygons, text) and animate them.
• Key Classes: Circle() , Square() , Line() , Dot() , MathTex() (LaTeX math), etc. To
display them, pass animation classes like Create() , Write() , or Transform() to self.play() .
• Workflow: Write a Python class that inherits from Scene , then run manim on it to generate a
video.
from manim import *
class HelloCircle(Scene):
    def construct(self):
        circle = Circle(radius=1, color=BLUE) # Create a circle
        self.play(Create(circle)) # Animate drawing the circle
        self.play(circle.animate.shift(2*RIGHT)) # Move circle to the right
• Explanation:
• Circle() creates a blue circle of radius 1.
• self.play(Create(circle)) animates its drawing.
• circle.animate.shift(2*RIGHT) smoothly moves it rightward.
• Running manim -pql <file.py> HelloCircle will render and play the animation.
Figure: Example Manim animation output – the Manim Community Edition logo created from basic shapes and
LaTeX (see code snippet below).
• Who: Educators and developers making math/physics animations.
• What: Scripts that define sequences of animated graphics.
• When: Use for illustrating abstract concepts (geometry, algebra, physics).
• Where: In a Python environment with Manim installed. The output is a video or sequence of images.
• Why: Manim provides precise control over mathematical animations, far more flexible than manual
video editing.
Example – Assembling Shapes:
from manim import *
class ShapeAssembly(Scene):
    def construct(self):
        triangle = Triangle(color=GREEN).shift(LEFT)
        square = Square(color=BLUE).shift(RIGHT)
        self.play(Create(triangle), Create(square))
        # Transform square into a circle
        circle = Circle(color=RED)
        self.play(Transform(square, circle))
        self.wait()
- This creates a triangle and square, then morphs the square into a red circle.
Practice (Unsolved):
- Create a scene with two dots connected by a line, then rotate the line 90 degrees.
- Use MathTex(r"\frac{a}{b}") to display a fraction on screen. Then scale it up by factor 2.
- Animate drawing a polygon with 5 vertices, each step adding one side.
3.2 Intermediate Level: Camera, Text, and Complex Animations
• Camera Control: In a MovingCameraScene , self.camera.frame can be zoomed or moved; 3D camera rotation is available in ThreeDScene .
• Text and Math: Use Text() for plain text and MathTex() for LaTeX math.
• Layering & Groups: Combine objects with VGroup() or Mobject layering.
• Timing: Use self.play(..., run_time=2) to control animation speed.
• Chaining Animations: Multiple animations can be queued in one play call or sequentially.
• Plotting: Manim can plot graphs and functions using Axes and plot .
Example – Annotating a Graph:
from manim import *
class AnnotatedGraph(Scene):
    def construct(self):
        ax = Axes(x_range=[-3,3], y_range=[-1,9])
        graph = ax.plot(lambda x: x**2, color=YELLOW)
        point = Dot(ax.c2p(1,1)) # c2p converts graph coordinates to a scene point
        label = MathTex("(1,1)").next_to(point, UR)
        self.play(Create(ax), Create(graph))
        self.play(FadeIn(point), Write(label))
- Plots y = x^2 and marks the point (1,1) with a dot and label.
Practice (Unsolved):
- Animate two objects colliding and bouncing off each other (e.g., two balls).
- Create an animation of an equation fading in letter by letter.
- Use Transform to morph a polygon into text shape.
3.3 Advanced Level: Custom Mobjects and 3D
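• Custom Mobjects: You can define your own shapes by subclassing VMobject (or Mobject ), for example by assembling Bezier curves; externally drawn graphics (e.g., TikZ output exported to SVG) can be loaded with SVGMobject .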
• 3D Scenes: Manim supports 3D with ThreeDScene . You can rotate the camera around 3D axes.
• Performance: Manage rendering options (background color, pixel width/height).
• Integration: Use %%manim magic in Jupyter for inline animations, or output to GIF/MP4.
Projects: (Focus on mathematical concepts)
1. Geometry Proof Animation: Animate the proof of the Pythagorean theorem, showing squares on triangle sides.
2. Physics Simulation: Animate a projectile trajectory under gravity (parabolic motion).
3. Function Transformations: Show how the graph of f(x) changes under shifts/scaling/reflecting (using
Manim’s axes and graph plotting).
3.4 Assessment (Manim)
• Solved Examples: Given above with explanations.
• Practice Questions: Conceptual and coding tasks:
• What does self.play(Create(obj)) do? (It draws the object on screen as an animation.)
• How to move an object called square up by 2 units?
◦ Answer: square.animate.shift(2*UP) inside self.play(...) .
• True/False: The method that holds a Scene’s animation script must be named construct . (True – construct() is where
the animation script goes; the Scene subclass itself can have any name.)
(20 MCQs, covering classes, methods, and output interpretations, are provided in the full curriculum.)
4. Scikit-Learn – Machine Learning in Python
Scikit-Learn is an open-source Python library that makes machine learning accessible. It provides simple,
consistent interfaces for classification, regression, clustering, preprocessing, and model evaluation. Built on
NumPy and SciPy, it allows even beginners to build reliable ML models quickly.
4.1 Beginner Level: Supervised Learning Basics
• Dataset Handling: Scikit-learn includes utilities ( sklearn.datasets ) to load example datasets
(e.g., Iris, Wine, California housing).
• Model Interface: A typical workflow is model = SomeEstimator(); model.fit(X_train,
y_train); predictions = model.predict(X_test) .
• Train/Test Split: Use train_test_split to split data into training and testing sets to evaluate
performance.
Example – Iris Classification:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
data = load_iris()
X = data.data # features (150x4 array)
y = data.target # labels (150,)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train) # Train the k-NN model
acc = model.score(X_test, y_test) # Accuracy on test set
print(f"Accuracy: {acc:.2f}")
- Explanation:
- Loads the Iris dataset (3 classes of iris).
- Splits into 70% train, 30% test.
- Trains a k-nearest neighbors classifier and evaluates accuracy.
- Output: E.g., Accuracy: 0.98 (the exact value depends on the split; random_state=0 keeps it reproducible).
• Who: Aspiring data scientists and engineers.
• What: Demonstrates loading data, splitting, training, and evaluating a model.
• When: Use for any labeled data classification task.
• Why: Scikit-learn’s API is intuitive and requires minimal code for powerful algorithms.
Practice (Unsolved):
- Change KNeighborsClassifier to a DecisionTreeClassifier and compare accuracy.
- Load a regression dataset (e.g., California housing with fetch_california_housing , since load_boston was removed in scikit-learn 1.2, or a custom CSV) and use
LinearRegression to predict housing prices.
- Use cross_val_score to perform 5-fold cross-validation on the Iris dataset.
4.2 Intermediate Level: Model Selection and Pipelines
• Preprocessing: Scaling features ( StandardScaler ), encoding categorical variables
( OneHotEncoder ), and feature generation ( PolynomialFeatures ).
• Pipelines: Use Pipeline to chain preprocessing and modeling steps, ensuring clean code.
• Hyperparameter Tuning: Use GridSearchCV or RandomizedSearchCV to find the best model
parameters via cross-validation.
• Unsupervised Learning: Algorithms like KMeans (clustering), PCA (dimensionality reduction),
and GaussianMixture .
• Evaluation Metrics: Understand precision, recall, F1-score, ROC AUC for classification, and MSE/R²
for regression.
Example – Model Pipeline with Scaling:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
pipeline.fit(X_train, y_train)
print("Pipeline Score:", pipeline.score(X_test, y_test))
- Scales features to zero mean/unit variance, then trains an SVM.
- Who/What: Demonstrates combining steps to streamline modeling.
- Output: Prints accuracy score on the test set.
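To connect the pipeline with the evaluation metrics listed earlier in this section, the sketch below rebuilds the same Iris split and SVM pipeline in a self-contained form and computes a confusion matrix, accuracy, and macro-averaged F1. The variable names are illustrative, and macro averaging is just one possible choice for a multi-class problem.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Same steps as the pipeline example: scale, then fit an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)                  # rows: true class, columns: predicted
acc = accuracy_score(y_test, y_pred)                   # fraction of correct predictions
macro_f1 = f1_score(y_test, y_pred, average="macro")   # unweighted mean of per-class F1

print(cm)
print(f"accuracy={acc:.2f}  macro F1={macro_f1:.2f}")
```

The diagonal of the confusion matrix counts correct predictions, so accuracy equals the diagonal sum divided by the total number of test samples.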
Practice (Unsolved):
- Create a pipeline to perform polynomial feature expansion (degree=2) and linear regression on a dataset.
- Use GridSearchCV to tune C and gamma for the above SVM pipeline.
- Apply KMeans clustering to the Iris data and evaluate how the clusters match the true labels.
4.3 Advanced Level: Custom Models and Optimization
• Custom Estimators: You can write your own estimator by subclassing BaseEstimator , or wrap other libraries so they fit into
the scikit-learn API.
• Feature Importances: For tree-based models, use model.feature_importances_ to interpret
what matters.
• Performance: Use scikit-learn’s tools to analyze the bias-variance tradeoff (learning curves) and
ensure models generalize well.
• Large-Scale Learning: Techniques like partial_fit for online learning, or integrating with
Spark/Koalas for big data.
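A custom estimator, as mentioned in the list above, can be sketched in a few lines. MajorityClassifier is a hypothetical toy class invented for this example (it is not part of scikit-learn); it only shows the fit/predict contract that lets a class plug into tools such as cross_val_score.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Learned attributes end with an underscore by scikit-learn convention
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self                      # fit() returns the estimator itself

    def predict(self, X):
        return np.full(len(X), self.majority_)


X, y = load_iris(return_X_y=True)
# Stratified folds keep the three classes balanced in every split
scores = cross_val_score(MajorityClassifier(), X, y, cv=StratifiedKFold(n_splits=5))
print(scores.mean())
```

Because Iris has three equally sized classes, this baseline scores about one third in every fold, which is a useful floor to compare real models against.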
Projects:
1. Sentiment Classification: On a Kaggle movie reviews dataset, preprocess text (tokenize, vectorize), train
a logistic regression classifier to predict sentiment, and visualize feature importances (word weights).
2. Image Recognition (basic): Use scikit-learn’s load_digits dataset (handwritten digits) to train an
SVM or random forest. Assess accuracy and show confusion matrix.
3. Regression Analysis: Using a public dataset (e.g., UCI Wine Quality), perform regression with feature
engineering and evaluate using R² and RMSE.
4.4 Assessment (Scikit-Learn)
• Solved Examples: Provided in tutorials above (Iris, pipeline).
• Practice Questions:
• Which scikit-learn function splits data into train/test sets?
◦ Answer: train_test_split .
• What does model.predict(X_new) return?
◦ Answer: Predicted labels (or values) for each sample in X_new .
• True/False: You must manually scale your data before using Pipeline . (False –
StandardScaler() can be part of the pipeline.)
• Multiple Choice (Sample):
• What does model.score(X_test, y_test) compute for classifiers?
a) Precision
b) Accuracy (Answer: b)
c) Log-loss
• Which class is used for one-hot encoding categorical features?
a) OneHotEncoder (Answer: a)
b) LabelEncoder
c) DummyEncoder
(20 MCQs per library cover concepts, usage, and theory comprehensively.)
Sources: Official documentation and tutorials of NumPy, Matplotlib, Manim, and Scikit-Learn [2] were used
to ensure accuracy of definitions and examples.
[1] Line chart in Matplotlib - Python - GeeksforGeeks
https://www.geeksforgeeks.org/python/line-chart-in-matplotlib-python/
[2] Pyplot tutorial, Matplotlib 3.10.3 documentation
https://matplotlib.org/stable/tutorials/pyplot.html