Data Preprocessing

Uploaded by

Anil anil1318
DATA PREPROCESSING

Data Cleaning

Data cleaning involves removing irrelevant information, such as URLs, hashtags, and user handles. It also involves removing stop words.
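
As a minimal sketch of the cleaning step, the snippet below strips URLs, hashtags, and user handles with regular expressions from Python's standard `re` module. The patterns and the `clean_text` name are illustrative assumptions, not part of the original document, and real data may need stricter rules (for example, the handle pattern would also clip part of an email address).

```python
import re

# Illustrative patterns (assumptions): tune them to the data at hand.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_text(text: str) -> str:
    """Remove URLs, user handles, and hashtags, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # URLs go first so their contents are not re-matched
    text = HANDLE_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Loving this! https://example.com #happy @anil"))
```

Replacing each match with a space, rather than an empty string, keeps neighbouring words from being glued together before the whitespace is collapsed.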

Text Tokenization

Text tokenization involves breaking down the text into individual words or tokens. This step is essential because it allows the text to be analyzed at the word level.
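
Word-level tokenization can be sketched with a single regular expression. This is a simplified illustration under the assumption of English text; the `tokenize` name and the pattern are hypothetical, and library tokenizers handle many more edge cases.

```python
import re

# A word is a run of letters or digits, optionally followed by an
# apostrophe part such as "'t" or "'s" (assumption: English text).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word-level tokens."""
    return TOKEN_RE.findall(text.lower())


print(tokenize("It's a great day, isn't it?"))
```

Punctuation is dropped rather than emitted as separate tokens, which suits word-level analysis but would lose information for tasks where punctuation matters.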

Stop Word Removal

Stop words are words that are commonly used in the English language but do not carry significant meaning, such as "a," "an," "the," and "is." Removing stop words can help to reduce the dimensionality of the dataset and improve the accuracy of the classification task.
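
A minimal sketch of stop word removal over an already tokenized sentence follows. The stop list here is a tiny hand-written assumption that extends the four examples above; practical stop lists are much longer and are usually taken from an NLP library.

```python
# Illustrative stop list (assumption): only "a", "an", "the", and "is"
# come from the text above; the rest are added for the example.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def remove_stop_words(tokens):
    """Return the tokens that are not in the stop list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "movie", "is", "a", "masterpiece"]))
```

Storing the stop list as a set keeps each membership check constant-time, which matters once the list grows to a few hundred words.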

Stemming and Lemmatization

Stemming and lemmatization are techniques used to reduce the variation of words in the dataset. Stemming reduces a word to its root form by stripping suffixes with simple rules, so the result is not always a real word. Lemmatization reduces a word to its base (dictionary) form, known as its lemma.
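
The difference between the two techniques can be illustrated with a toy example. Both functions below are deliberately naive assumptions made for illustration: `naive_stem` is a crude suffix stripper, not the Porter algorithm, and `lookup_lemma` uses a tiny hand-written dictionary where a real lemmatizer would use a full lexicon.

```python
# Suffixes are checked in this order; the first match wins.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical mini-lexicon mapping inflected forms to dictionary forms.
LEMMAS = {"studies": "study", "running": "run", "better": "good", "was": "be"}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


for word in ("studies", "running", "better"):
    print(word, naive_stem(word), lookup_lemma(word))
```

For "studies" the stemmer yields "stud", which is not a word, while the lookup yields "study". Irregular forms such as "better" are left untouched by suffix stripping but can be resolved by a lexicon.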

Feature Encoding

Feature encoding involves transforming categorical data into numerical data that can be used for analysis. This step is essential because most machine learning algorithms can only work with numerical data.
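
Two common ways to encode a categorical column are label encoding (one integer per category) and one-hot encoding (one binary column per category). The sketch below shows both in plain Python; the function names and the sentiment labels are assumptions chosen for illustration.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer index, in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def one_hot(value, mapping):
    """Return a binary vector with a single 1 at the category's index."""
    vector = [0] * len(mapping)
    vector[mapping[value]] = 1
    return vector


# Hypothetical sentiment labels used only for this example.
labels = ["positive", "negative", "neutral", "positive"]
mapping = fit_label_encoder(labels)

print(mapping)
print([mapping[label] for label in labels])
print([one_hot(label, mapping) for label in labels])
```

Label encoding is compact but implies an ordering between categories, so one-hot encoding is often preferred when the categories have no natural order.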
