PBBML L11
https://youtu.be/iYmMyKE6FN4?si=fYYSPmTO7fh9If5g
Image Classification Steps
1. Pre-processing: This step involves cleaning and preparing the
image data for processing by the AI model. This includes tasks
such as removing noise, resizing images, and normalizing pixel
values.
2. Object Detection: Identifying and locating objects within the
image. This is done using algorithms that can detect edges,
corners, and other features in the image.
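The pre-processing step above can be sketched with NumPy. This is a minimal illustration, not a production pipeline: the 4x4 image, the [0, 1] normalization target, and the stride-2 "resize" are all invented for the example.

```python
import numpy as np

# Hypothetical 4x4 grayscale image with raw 0-255 pixel values
img = np.array([[  0,  64, 128, 255],
                [ 32,  96, 160, 224],
                [ 16,  80, 144, 208],
                [ 48, 112, 176, 240]], dtype=np.uint8)

# Normalize pixel values to the [0, 1] range
img_norm = img.astype(np.float32) / 255.0

# Naive "resize": keep every second pixel (real pipelines use
# proper interpolation, e.g. via Pillow or OpenCV)
img_small = img_norm[::2, ::2]
```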
Wildlife Image Classification
• WildlifeInsights.org uses AI and camera
traps to streamline wildlife monitoring for
conservation.
• Camera traps, equipped with motion
sensors, automatically capture images
when animals pass by.
• These images are then uploaded to the
Wildlife Insights platform, where AI models
analyze them to identify species, filter out
blank images, and detect possible threats
like poachers.
https://neptune.ai/blog/self-driving-cars-with-convolutional-neural-networks-cnn
Self-Driving Cars
2. Data Processing and Decision Making:
• Mapping and Localization: The car uses high-definition maps and GPS
to determine its precise location and orientation on the road. This
information is crucial for planning routes, identifying intersections, and
understanding traffic patterns.
• Object Detection and Tracking: AI algorithms are used to detect and
track objects in the environment, such as other cars, pedestrians, and
cyclists. This helps the car anticipate their movements and take
appropriate actions to avoid collisions.
https://www.youtube.com/watch?v=Zl3YSPilT-w
Convolutional Neural Networks (CNNs)
Computer Vision
• Humans recognize objects in images with their eyes; CNNs give
computers a comparable ability.
• A CNN is like the “Super-Detective” of the AI world, picking out
subtle clues and patterns.
• CNNs offer a powerful solution for image recognition and
pattern analysis.
CNN Deep Dive
A specialized type of artificial neural network designed for pattern
recognition, particularly within images.
• Layers: Multiple interconnected layers that process information.
• Filters: Small matrices that scan the input image for specific
features (e.g., edges, corners).
• Activation Functions: Introduce non-linearity to enable complex
pattern learning.
Image https://youtu.be/QzY57FaENXg?si=IDObco3URr3xFYLr
The Convolution Process
• Filters slide across the input image (convolution).
• Each filter detects a specific feature.
• The output of each filter is a feature map highlighting where that feature is
present in the image.
http://taewan.kim/post/cnn/
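The sliding-filter idea can be sketched in plain NumPy. The 4x4 image and the 2x2 vertical-edge kernel below are made-up examples, assuming a "valid" convolution with no padding or stride.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter across the image ('valid' convolution, no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the patch with the kernel and sum
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return fmap

# Image that is dark on the left, bright on the right
image = np.array([[0, 0, 1, 1]] * 4)
# Kernel that responds to a left-to-right brightness increase
edge_kernel = np.array([[-1, 1],
                        [-1, 1]])
feature_map = convolve2d(image, edge_kernel)
```

The feature map peaks exactly where the vertical edge sits, which is the "highlighting" the bullet above describes.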
How CNNs Work: Pooling
• Pooling layers downsample the feature maps.
• Pooling is like zooming out on a big, detailed map: some of the fine
details can be lost, but you still see the major landmarks and roads.
• In CNNs, pooling layers "zoom out" on the feature maps created by the
convolution layers.
• This reduces the computational load and makes the network less sensitive to
small variations in the input.
Image https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
How CNNs Work: Pooling
• Types of Pooling
• Max Pooling: like finding the tallest mountain in a region.
It takes the largest value from each section to highlight the strongest
features.
• Average Pooling: like calculating the average height of the mountains
in a region. It takes the average value of each section,
providing a smoother representation of the features.
http://taewan.kim/post/cnn/
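Both pooling types from the slide can be sketched with NumPy reshaping; the 4x4 feature map and 2x2 pooling window are invented for the example.

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Downsample a feature map by taking the max or mean of each block."""
    h, w = fmap.shape
    blocks = fmap[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
pooled_max = pool2d(fmap, 2, "max")   # strongest value per 2x2 region
pooled_avg = pool2d(fmap, 2, "avg")   # smoother average per 2x2 region
```

Note how both outputs are 2x2 instead of 4x4: that size reduction is what cuts the computational load.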
Layers and Feature Hierarchy
• Early layers detect simple features (edges, corners).
• Deeper layers learn more complex features by combining the outputs of
earlier layers (windows, doors, faces).
• This creates a hierarchy of features, allowing the network to recognize
complex objects.
What are Convolutional Neural Networks
(CNNs)? IBM Technology
Applications of CNNs
• Image Classification: Identifying the objects in an image (e.g., cat, dog, car).
• Object Detection: Locating and classifying objects within an image (e.g., self-
driving cars).
• Facial Recognition: Identifying individuals based on their facial features.
• Medical Imaging: Analyzing medical images to detect diseases (e.g., cancer
detection in X-rays).
• Natural Language Processing: Text classification, sentiment analysis.
https://www.superannotate.com/blog/guide-to-convolutional-neural-networks
So what are neural networks?
Chapter 1, Deep Learning | 3Blue1Brown
Image https://www.youtube.com/watch?v=aircAruvnKk
Pandas
• An open-source Python library.
• Essential for data manipulation and analysis, especially in data
science and machine learning.
• Provides efficient data structures. It works seamlessly with
structured data like tables and time series.
• Makes it easy to clean, transform, and analyze your data.
Pandas
Core Data Structures
• Series: A one-dimensional labeled array that can
hold any data type (integers, strings, floats, etc.).
It is essentially a single column of data, with an
index for each value.
• DataFrame: A two-dimensional labeled data
structure, similar to a table in a database or an
Excel spreadsheet. It has rows and columns,
making it ideal for handling datasets.
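A minimal sketch of the two core structures; the index labels and city data are invented examples.

```python
import pandas as pd

# Series: one labeled column of values
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a labeled table with rows and columns
df = pd.DataFrame({"city": ["Oslo", "Lima"],
                   "population": [700_000, 9_700_000]})
```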
Pandas
Data Loading
• Pandas has built-in functions to read data from various formats,
such as CSV, Excel, SQL databases, JSON, and more.
• For example, pd.read_csv("file.csv") loads a CSV file into a
DataFrame, making it straightforward to work with external
datasets.
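pd.read_csv() also accepts any file-like object, so the loading step can be demonstrated without a file on disk; the CSV text below is made up.

```python
import io
import pandas as pd

# With a real file this would be: df = pd.read_csv("file.csv")
csv_text = "name,population\nOslo,700000\nLima,9700000\n"
df = pd.read_csv(io.StringIO(csv_text))
```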
df.dtypes
• It returns the data types of each column in
a DataFrame. It helps you understand the
type of data stored in each column, which
is crucial for data cleaning and analysis.
• Common data types you might see include:
• int64 for integer values,
• float64 for decimal numbers,
• object for strings or mixed data,
• datetime64 for date and time data,
• bool for Boolean values.
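A small invented DataFrame showing how each Python value maps to the dtypes listed above:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Oslo", "Lima"],          # strings  -> object
    "population": [700000, 9700000],   # integers -> int64
    "density": [1.5, 3.2],             # decimals -> float64
    "capital": [True, True],           # booleans -> bool
})
print(df.dtypes)
```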
df.info()
• It provides a concise summary of a DataFrame,
giving important details about its structure and
contents:
• Index and Columns: DataFrame’s index type, the
number of entries, and the names of each column.
• Non-Null Counts: For each column, it shows how
many non-null (non-missing) values are present,
which helps in assessing if there’s any missing data.
• Data Types: The data type of each column (e.g.,
int64, float64, object, etc.) is displayed, allowing you
to understand what kind of data each column holds.
• Memory Usage: It shows the memory usage of the
DataFrame, which is helpful when working with
large datasets.
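A minimal sketch with invented data; the buf argument simply redirects the summary (which df.info() normally prints to stdout) into a string so it can be inspected.

```python
import io
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima", None],
                   "population": [700000, 9700000, 500000]})
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)   # columns, non-null counts, dtypes, memory usage
```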
df.describe()
• It returns a summary of statistical measures for numerical
columns in a DataFrame. It is helpful for quickly
gaining insights into the distribution and central
tendencies of your data.
• When df.describe() is used with numerical columns,
it typically returns:
• Count: The number of non-null values in each column.
• Mean: The average value of each column.
• Standard Deviation (std): Measures how spread out the values are
from the mean.
• Minimum (min): The smallest value in each column.
• 25% (1st Quartile): The value below which 25% of the data falls.
• 50% (Median or 2nd Quartile): The middle value, dividing the data in
half.
• 75% (3rd Quartile): The value below which 75% of the data falls.
• Maximum (max): The largest value in each column.
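The statistics above can be checked on a tiny invented column:

```python
import pandas as pd

df = pd.DataFrame({"population": [100, 200, 300, 400]})
desc = df.describe()
# desc is itself a DataFrame indexed by "count", "mean", "std",
# "min", "25%", "50%", "75%", "max"
```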
df['column name'].mode()
• It is used to find the most frequent value(s) in a
specific column of a DataFrame.
• Here's a breakdown of what each part does:
• df: Refers to the DataFrame you’re working with.
• ['column name']: Specifies the particular column within the
DataFrame you want to analyze (replace column name
with the actual name of your column).
• .mode(): A method that calculates the mode, or most
frequently occurring value(s) in the column.
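A quick invented example; note that .mode() returns a Series, because a tie would produce several values.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
most_common = df["color"].mode()   # Series; ties would give several rows
```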
df.loc[df['population']>=1000]
• It is used to filter rows in a DataFrame based on a condition.
• Here's a breakdown of what each part does:
• df: Refers to the DataFrame you’re working with.
• ['population']: Specifies the column in the DataFrame, in this case, population.
• df['population'] >= 1000: This creates a boolean mask (a series of True or False values)
where each row is True if the value in the population column is greater than or equal to
1000 and False otherwise.
• df.loc[ ... ]: The loc method is used to select rows (or columns) by label or condition.
Here, it selects all rows where the condition inside (i.e., df['population'] >= 1000) is True.
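The filtering steps above on a small invented DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "C"],
                   "population": [500, 1500, 3000]})
# Boolean mask: True where population >= 1000, used by .loc to keep rows
big = df.loc[df["population"] >= 1000]
```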
df.isnull().sum()
• df.isnull(): This part checks each cell in the
DataFrame (df) and returns a new DataFrame of the
same size.
• If a cell in the original DataFrame has a missing value (like
NaN or None), the corresponding cell in the new
DataFrame will be True. Otherwise, it will be False.
• .sum(): This method is then applied to the DataFrame
of True and False values. By default, sum() operates
column-wise (along axis=0). It essentially treats True
as 1 and False as 0, adding them up for each column.
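A minimal invented example of the two-step count described above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3],
                   "b": [1, 2, 3]})
# isnull() marks missing cells True; sum() counts them per column
missing_per_column = df.isnull().sum()
```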
df[df['Column'].isna()]
• This code finds all the rows in your data where the value in a
given column is missing. For example, with a DataFrame df2 and a
"Humidity" column:
• df2['Humidity']: This part isolates the "Humidity" column from your df2
DataFrame. It's like focusing your attention specifically on the humidity
readings in your table.
• .isna(): This function goes through each entry in the "Humidity" column
and checks if it's a missing value (NaN). It creates a kind of "checklist".
• If a humidity reading is present, it marks it as False (meaning "not
missing"). If a humidity reading is missing (NaN), it marks it as True
(meaning "missing").
• df2[...]: This uses the "checklist" created in the previous step to
filter the entire df2 DataFrame. It only keeps the rows where the
corresponding checklist entry is True (i.e., where humidity is
missing).
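The Humidity example as runnable code, with invented readings:

```python
import pandas as pd

df2 = pd.DataFrame({"Humidity": [40.0, None, 55.0]})
# Keep only the rows where Humidity is missing (NaN)
missing_rows = df2[df2["Humidity"].isna()]
```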
df['Column'].mode()
• df['Column'].mode() looks at a column in your data and tells you
which value appears most often.
• .mode() simply finds the most common value in a column.
df['column name'].fillna(value, inplace=True/False)
• Imagine you have a form with some blank spaces.
.fillna() is like a tool that helps you fill in those blanks.
• df['Column']: This selects the specific column in your data
where you want to fill the blanks. Think of it like choosing
which part of the form you want to work on.
• .fillna(): This is the function that does the actual filling.
You need to tell it what to fill the blanks with.
• value: This is what you want to put in the blanks. It could
be a specific number (like 0), a word (like "Missing"), or
even the average of other values in the column.
• inplace=True/False:
• inplace=True: This means you want to directly change your
original form. Like using a pen to fill in the blanks permanently.
• inplace=False: This means you want to create a copy of the form
with the blanks filled in, but keep your original form as it was. Like
making a photocopy and filling that one in.
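A sketch with invented data, filling the blank with the column's own mean; inplace=False (the default) means the "photocopy" behavior: the original stays untouched.

```python
import pandas as pd

df = pd.DataFrame({"Humidity": [40.0, None, 55.0]})
# Fill the missing value with the mean of the present values (47.5)
filled = df["Humidity"].fillna(df["Humidity"].mean())
```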
df.sort_values('Column', ascending=True/False)
• This code sorts a Pandas DataFrame named df by
the values in the column named 'Column'.
• df: This refers to the Pandas DataFrame you're
working with. DataFrames are essentially tables of
data in Python.
• .sort_values(): This is a method that DataFrames
have. It allows you to sort the DataFrame by one or
more columns.
• 'Column': This is the name of the column you want
to sort by.
• ascending=True/False
• True: sorts the data from smallest to largest values
• False: sorts the data from largest to smallest values
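Sorting a small invented DataFrame with the largest values first:

```python
import pandas as pd

df = pd.DataFrame({"city": ["B", "A", "C"],
                   "population": [2000, 1000, 3000]})
# ascending=False puts the biggest population at the top
biggest_first = df.sort_values("population", ascending=False)
```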
df['Column'].isin()
• This code checks if each value in the
'Column' of a Pandas DataFrame (df) is
present in a specified list or set of values.
• df: This refers to your Pandas DataFrame.
• ['Column']: This selects the column named
'Column' from your DataFrame.
• .isin(): This is a method that Series (like a
single column from a DataFrame) have in
Pandas. It checks whether each value in the
Series is present in a given sequence of
values (like a list, tuple, or set).
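An invented example showing the mask .isin() produces and how it is used to filter rows:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima", "Tokyo"]})
mask = df["city"].isin(["Oslo", "Tokyo"])   # True where the value is in the list
subset = df[mask]
```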
Seaborn
• A Python data visualization library based on Matplotlib.
• Creates informative and attractive statistical graphics.
• High-level interface: Easy to use, even for beginners.
• Beautiful aesthetics: Comes with attractive default styles and color palettes.
• Statistical focus: Designed for visualizing distributions, relationships, and patterns in
data.
• Integration with Pandas: Works seamlessly with Pandas DataFrames.
https://seaborn.pydata.org/
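A minimal sketch of the Pandas integration, assuming seaborn and matplotlib are installed; the day/sales data and the output filename are invented.

```python
import matplotlib
matplotlib.use("Agg")        # headless backend, no display window needed
import pandas as pd
import seaborn as sns

# Invented example data in a Pandas DataFrame
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed"],
                   "sales": [3, 7, 5]})
ax = sns.barplot(data=df, x="day", y="sales")  # one bar per day
ax.figure.savefig("sales.png")
```

Seaborn reads the column names straight from the DataFrame, so the axes are labeled automatically.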
Data Visualization
1. Balance and White Space
Breathing Room for Your Data
• Declutter: Use alignment, repetition, contrast, and hierarchy.
• White Space: Give elements room to breathe for easy
understanding.
• Focus: One topic per screen, use interactive elements for
details.
Image https://www.youtube.com/watch?v=0Smgm2UTUSo
Data Visualization
2. Patterns and Colors
Guiding Your Visual Cues
• Patterns: Consistent use of color (e.g., green = good, red = bad)
• Consistency: Same chart type for the same data throughout.
• Contrast: Use color to highlight key differences.
• Color Palette: Explore options, but avoid overuse.
Data Visualization
3. Text and Fonts
Words Matter Too
• Clarity: Concise text that guides without
overwhelming.
• Font Choice: Stick to two easy-to-read
fonts.
• Labels: Clear axis labels with units,
add detail only when needed.
• Direct Labeling: Place labels on data
points, not off to the side.
Data Visualization
Clarity and Visual Type
The Right Chart for the Job
• Chart Choice: Use the best visual for your data
(bar charts > pie charts for comparisons).
• Simplicity: Avoid unnecessary 3D or complex
visuals.
• Testing: Make sure your visuals effectively
communicate the message.
Using Design Techniques for Clear and
Appealing Data Visualization | Null Queries
Lec11 Summary
• Convolutional Neural Network
• Pandas & Seaborn Code
• Data Visualization
• Practice the codes with PBBM11_p_Name.ipynb