C2 - Preparing Data for Statistical Machine Learning Models
AI personal assistants have gained enormous popularity in the last couple of years,
helping automate or accelerate day-to-day work.
The validation dataset is used to check the model's performance during the training
process. It allows us to choose which model 'settings' (hyperparameters) result in the
highest accuracy.
The testing dataset is used to check the performance of the model on data it has not
seen during training.
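As a quick illustration, these splits can be produced with scikit-learn's train_test_split; the sketch below uses dummy data and an 80/10/10 split, both of which are arbitrary choices for demonstration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for real features and labels.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# First split off 20% of the data, then divide that part half-and-half
# into validation and test sets (80/10/10 overall).
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```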
Let's say we want to predict the price of a stock. What information would you look
at to make this prediction?
The role of the machine learning model is to understand the underlying patterns
and information that the features hold. This implicit set of information is learned
and stored within the model's trainable parameters.
- Text Data: The features could be only the input text (for generation), or could
include useful derived features like "most frequent words" and "document length"
(for prediction).
- Image Data: The features could include only the color intensities inside the
image, or could also include shape, pixel depth, and more.
- Time Series and Audio Data: The features could include only the timestamp
and the value for each interval, but could also include other inferred
features like spectral and pitch features.
The inputs and outputs of the machine learning model should be in numerical
format. Let's say we want to predict whether a person will get approved for a home loan.
After downloading our data, we need to load it. From the dataset card on Kaggle, we
see that we have two CSV files: one for training and one for testing. We will work
with the training file.
To load the training CSV file, we will use the pandas library.
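A minimal sketch of this step is shown below; the file name train.csv is an assumption and should match the file downloaded from Kaggle.

```python
import pandas as pd

# Load the training split; the file name "train.csv" is an assumption.
df = pd.read_csv("train.csv")

# Quick look at the first few rows.
print(df.head())
```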
Now that we have our dataset, let's select the categorical features. We can identify
which features are categorical by using the .info() function in pandas, which tells
us the data type of each feature.
Alternatively, we can use .select_dtypes() to select the features that have the
'object' type and print a list of their names.
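Both approaches look roughly like this, assuming the DataFrame from the previous step is called df:

```python
# Option 1: inspect the dtype of every column (categorical features show up as "object").
df.info()

# Option 2: select only the object-typed columns and list their names.
categorical_features = df.select_dtypes(include="object").columns.tolist()
print(categorical_features)
```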
Let's explore how we can use each encoding type in Python code for the 'property
area' feature in our dataset. The first thing we need to do is take a look at the
'categories' of this feature and their values.
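A quick way to look at the categories and how often each occurs is value_counts(); the column name 'Property_Area' is an assumption about how the feature is spelled in the CSV.

```python
# Inspect the categories of the property area feature and how often each occurs.
# The column name "Property_Area" is an assumption; adjust it to match your CSV.
print(df["Property_Area"].value_counts())
```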
We can also use target encoding, with 'Loan Amount Term' as the target variable.
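A minimal sketch of target encoding with pandas: each category is replaced by the mean of the chosen target variable within that category. The column names below are assumptions about the dataset's spelling.

```python
# Target encoding: replace each category by the mean of the target variable
# ("Loan_Amount_Term") observed for that category.
# Column names are assumptions; adjust them to match your CSV.
means = df.groupby("Property_Area")["Loan_Amount_Term"].mean()
df["Property_Area_encoded"] = df["Property_Area"].map(means)

print(df[["Property_Area", "Property_Area_encoded"]].head())
```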
The size of an image is described by the number of pixels in the height and the
width.
Example: 1920x1080. This means that the screen will have a width of 1,920
pixels while the height of the screen will be 1,080 pixels. This results in a grand
total of 1,920 × 1,080 = 2,073,600 pixels on-screen.
Each pixel has an intensity value that describes the intensity of color at that pixel.
To visualize the pixel values, we will use the OpenCV library to load the image and
print its values.
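A minimal sketch of this step (the file name image.jpg is an assumption):

```python
import cv2

# Load the image; the file name "image.jpg" is an assumption.
img = cv2.imread("image.jpg")

# img is a NumPy array of shape (height, width, 3) with BGR channel order.
print(img.shape)

# Pixel intensities are integers in [0, 255]; print the top-left 2x2 block of pixels.
print(img[:2, :2])
```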
Black and white images: Pixels are either 0 (black) or 1 (white), representing a
binary image.
1. Hamming Window:
2. Blackman-Harris Window:
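Both windows can be generated with NumPy and SciPy; the window length of 256 samples below is an arbitrary choice.

```python
import numpy as np
from scipy.signal.windows import blackmanharris

# Arbitrary window length of 256 samples.
N = 256

# Hamming window: a raised cosine that tapers the edges of a frame.
hamming = np.hamming(N)

# Blackman-Harris window: stronger side-lobe suppression than Hamming.
bh = blackmanharris(N)

print(hamming[:5], bh[:5])
```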
The Fourier Transform (FT) converts a signal from the time domain (how a
signal changes over time) to the frequency domain (what frequencies are
present in the signal and their amplitudes).
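A small sketch with NumPy's FFT: a 440 Hz sine wave is generated and its dominant frequency is recovered from the spectrum (the sampling rate and the tone are arbitrary choices).

```python
import numpy as np

# Generate one second of a 440 Hz sine wave sampled at 8 kHz (arbitrary choices).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

# FFT: time domain -> frequency domain (real-input variant).
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The peak magnitude should sit at (approximately) 440 Hz.
print(freqs[np.argmax(np.abs(spectrum))])
```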
Before we transform text into its numeric format, we need to pre-process the text:
Mandatory pre-processing:
1. Tokenization
2. Vocabulary Building
Optional pre-processing:
1. Remove links, tags, emojis, stop words, extra spaces, and punctuation.
2. Transform all text into lowercase.
3. Replace accented characters, numbers, and special characters with a different
representation.
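A minimal sketch of a few of these optional steps using Python's re module (the example sentence is made up):

```python
import re

text = "Check out https://example.com!!  AMAZING results 😀 <br>"

# Remove links, HTML tags, and emojis / non-ASCII symbols.
text = re.sub(r"https?://\S+", " ", text)
text = re.sub(r"<[^>]+>", " ", text)
text = text.encode("ascii", "ignore").decode()

# Lowercase, drop punctuation, and collapse extra spaces.
text = text.lower()
text = re.sub(r"[^\w\s]", " ", text)
text = re.sub(r"\s+", " ", text).strip()

print(text)  # "check out amazing results"
```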
Five main types of tokenization are used to process text for natural language
models:
Character Tokenization: Breaks text into individual characters.
- Pros: Handles rare or unknown words effectively. Compact vocabulary size. Robust to spelling variations and errors.
- Cons: Produces longer sequences, increasing computational cost. Loss of semantic information at the character level. Requires deeper models.
- Best for: Languages with no spaces (e.g., Chinese, Japanese). Handling rare or OOV words. Autocompletion and text generation tasks.

Byte-Level Tokenization: Breaks text into individual bytes or characters, typically at the byte level, to handle any character, including rare or unseen ones.
- Pros: Handles all characters, including special and unseen ones, with no OOV issues. Compact vocabulary size. Works well with any text, regardless of language or structure.
- Cons: Produces very fine-grained tokenization (often single-character level), which can lead to long sequences. Loss of higher-level semantic meaning.
- Best for: Language-agnostic tasks. Low-resource languages. Models requiring robust handling of all characters, such as multilingual or code-switching tasks.

Word Tokenization: Breaks text into individual words, usually based on spaces or punctuation.
- Pros: Easy to understand and implement. Retains semantic meaning at the word level. Works well for languages with clear word boundaries (e.g., English).
- Cons: Struggles with out-of-vocabulary (OOV) words. Large vocabulary size increases computational cost. Sensitive to spelling variations.
- Best for: Traditional NLP tasks (e.g., sentiment analysis, text classification).
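A tiny illustration of the word-, character-, and byte-level strategies on the same string, using plain Python (real tokenizers are more involved):

```python
text = "Tokenization matters."

# Word-level: split on whitespace.
print(text.split())                     # ['Tokenization', 'matters.']

# Character-level: every character becomes a token.
print(list(text)[:8])                   # ['T', 'o', 'k', 'e', 'n', 'i', 'z', 'a']

# Byte-level: work on the UTF-8 bytes of the text.
print(list(text.encode("utf-8"))[:8])   # [84, 111, 107, 101, 110, 105, 122, 97]
```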
The basic idea of BPE is to iteratively merge the most frequent pair of
consecutive bytes or characters in a text corpus until a predefined vocabulary
size is reached. The resulting subword units can be used to represent the original
text in a more compact and efficient way.
BPE is one of the most popular tokenization methods and is widely used in the
training of LLMs.
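As an illustration, GPT-2's tokenizer is a byte-level BPE tokenizer and can be loaded from the Hugging Face transformers library (assuming the library is installed and the model files can be downloaded):

```python
from transformers import AutoTokenizer

# GPT-2 uses a byte-level BPE tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

print(tokenizer.tokenize("Tokenization matters."))
print(tokenizer.encode("Tokenization matters."))
```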
Similar to byte pair encoding, but instead of using raw frequency to decide which
sub-words to merge, a probability function (P) is used to determine the
likelihood of sub-words occurring in the sentence (S).
BERT, a very famous language model, uses the WordPiece tokenization method, so
we can just load its tokenizer.
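A minimal sketch using the Hugging Face transformers library (assuming it is installed):

```python
from transformers import AutoTokenizer

# BERT's tokenizer is a WordPiece tokenizer; subword continuations are prefixed with "##".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("Tokenization matters."))
# e.g. ['token', '##ization', 'matters', '.']
```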
Unlike WordPiece and BPE, which iteratively merge subwords, this method
starts with a large set of subwords and prunes them to reach the desired
vocabulary size.
A unigram language model is trained on a corpus of data and can be loaded and
used for tokenization.
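For example, XLNet's tokenizer is a SentencePiece Unigram model and can be loaded the same way we loaded BERT's (the model choice is illustrative; the sentencepiece package must be installed):

```python
from transformers import AutoTokenizer

# XLNet's tokenizer is a SentencePiece Unigram model.
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

print(tokenizer.tokenize("Tokenization matters."))
```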
Sentence Tokenization: Breaks text into complete sentences, treating each sentence as a single unit or token.
- Pros: Captures broader semantic meaning by preserving sentence context. Reduces sequence length compared to word- or character-level tokenization. Facilitates tasks that require sentence understanding (e.g., sentiment analysis, machine translation).
- Cons: May lose finer details from within sentences (e.g., individual word meaning). Cannot handle complex structures (e.g., subword-level nuances, context beyond the sentence).
- Best for: Sentence-based tasks such as sentiment analysis, machine translation, question answering, and document classification.
Sentence tokenizers are straightforward: they are either rule-based, like NLTK,
spaCy, regex, and CoreNLP, or a machine learning model can be trained to predict
sentence boundaries, since rule-based approaches might become too nuanced.
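A minimal rule-based example with NLTK's sentence tokenizer (assuming nltk is installed; the Punkt model is downloaded on first use):

```python
import nltk
from nltk.tokenize import sent_tokenize

# Download the Punkt sentence boundary models (resource name depends on the NLTK version).
nltk.download("punkt")
nltk.download("punkt_tab")

text = "It was cold. We stayed inside! Did it snow?"
print(sent_tokenize(text))
# ['It was cold.', 'We stayed inside!', 'Did it snow?']
```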
The vocabulary is a set of unique tokens available in the corpus. Based on the
tokenization strategy, they could be characters, bytes, words, sub-words, or
sentences. In addition, a few special tokens are usually added to the vocabulary:
1. <UNK>: for unknown words (words that did not occur in the training corpus)
2. <STRT>: for tokens at the beginning of sentences
3. <END>: for the last token in a sentence or paragraph.
4. </W>: in case of byte, character, or sub-word encoding, this token is used to
indicate the end of a word.
There are many more special sequences that we will come across in the
upcoming chapters.
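As a small sketch, here is one common way to build a word-level vocabulary that includes a few of the special tokens above (the mapping scheme is illustrative):

```python
corpus = ["the cat sat", "the dog ran"]

# Special tokens first, then the unique words found in the corpus.
special_tokens = ["<UNK>", "<STRT>", "<END>"]
words = sorted({word for sentence in corpus for word in sentence.split()})
vocab = {token: idx for idx, token in enumerate(special_tokens + words)}

print(vocab)

# Map a new sentence to ids, falling back to <UNK> for unseen words.
sentence = "the cat ran fast"
ids = [vocab.get(word, vocab["<UNK>"]) for word in sentence.split()]
print(ids)
```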