Bavya NLP 0.1

The document outlines a natural language processing project focused on word frequency analysis, measures of central tendency, and visualization. It details steps for preprocessing text, calculating word frequencies, and analyzing word lengths, providing Python implementations and outputs for each section. Key findings include the most common word 'data' appearing three times and a mean word length of 6.06.

NATURAL LANGUAGE PROCESSING

NAME : Bavya C
CLASS : AI-DS ‘A’
ROLLNO : 22AD010
1. Word Frequency Analysis
Explanation:

Word frequency analysis identifies how often each word appears in a text. It helps determine the text's dominant
themes and frequent patterns.

Steps to Solve:

• Preprocessing: Clean the text to remove punctuation, convert it to lowercase, and split it into words.
• Count Total Words: Count all the words in the cleaned list.
• Calculate Word Frequencies: Use a dictionary or collections.Counter to calculate the frequency of each word.
• Find the Most Common Word: Identify the word with the highest count.

Python Implementation:
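The original code is not reproduced in this copy, so the following is a minimal sketch of the steps above. The sample text is an assumption reconstructed loosely from the word list in the output below; the exact counts reported there depend on the original input, which is not shown.

```python
import re
from collections import Counter

# Assumed sample text (hypothetical; the original input is not shown),
# so counts here may differ slightly from the outputs reported below.
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

# Preprocessing: lowercase the text and keep only alphabetic tokens,
# which removes punctuation and splits into words in one step.
words = re.findall(r"[a-z]+", text.lower())

# Count total words and per-word frequencies.
total_words = len(words)
frequencies = Counter(words)

# Find the most common word.
word, count = frequencies.most_common(1)[0]

print(f"Total words: {total_words}")
print(f"Word frequencies: {frequencies}")
print(f"Most common word: '{word}' appears {count} times.")
```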

OUTPUT:

Total words: 31

Word frequencies: Counter({'data': 3, 'is': 2, 'and': 2, 'that': 1, 'science': 1, 'an': 1, 'interdisciplinary': 1, 'field': 1, 'uses': 1,
'various': 1, 'techniques': 1, 'algorithms': 1, 'tools': 1, 'to': 1, 'extract': 1, 'insights': 1, 'knowledge': 1, 'from': 1, 'structured': 1,
'unstructured': 1, 'driven': 1, 'decisionmaking': 1, 'transforming': 1, 'industries': 1, 'worldwide': 1})

Most common word: 'data' appears 3 times.


2. Measures of Central Tendency
Explanation:

Word lengths in the text are analyzed using three statistical measures:

Mean: The average length of words.

Median: The middle value in the sorted word lengths.

Mode: The most frequently occurring word length.

Steps to Solve:

• Preprocess the text and calculate word lengths.
• Use statistical formulas or libraries to compute the mean, median, and mode.
• Evaluate which measure best represents the data.

Python Implementation:
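As in the previous section, the original code is not included in this copy. This sketch computes the three measures with Python's standard statistics module, again using an assumed sample text, so the figures in the output below (which come from the original input) will not match exactly.

```python
import re
import statistics

# Assumed sample text (hypothetical; the original input is not shown).
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

# Preprocess the text and calculate word lengths.
words = re.findall(r"[a-z]+", text.lower())
lengths = [len(w) for w in words]

# Mean, median, and mode of the word lengths.
mean_length = statistics.mean(lengths)
median_length = statistics.median(lengths)
mode_length = statistics.mode(lengths)

print(f"Mean word length: {mean_length:.2f}")
print(f"Median word length: {median_length}")
print(f"Mode word length: {mode_length}")
```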
OUTPUT:

Mean word length: 6.06

Median word length: 6.0

Mode word length: 4

Typical word length: the median, as it reduces the impact of very short or very long words.

3. Visualization
Explanation:

Visualizing the word frequencies offers insights into the text's structure and focus:

Top 5 Words: Identifies the most frequently occurring words.

Bar Chart: Compares the frequencies of these top words.

Insights: Highlights dominant themes or filler words.

Steps to Solve:

1. Extract the top 5 most common words.
2. Plot their frequencies using a bar chart.
3. Analyze the chart to draw conclusions.

Python Implementation:
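The original plotting code is also missing from this copy. The sketch below follows the three steps above with matplotlib, again on an assumed sample text, so the top-5 words it plots may differ from those listed in the output below.

```python
import re
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Assumed sample text (hypothetical; the original input is not shown).
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

words = re.findall(r"[a-z]+", text.lower())

# Step 1: extract the top 5 most common words.
top5 = Counter(words).most_common(5)
labels, counts = zip(*top5)

# Step 2: plot their frequencies as a bar chart.
plt.figure(figsize=(6, 4))
plt.bar(labels, counts)
plt.title("Top 5 Word Frequencies")
plt.xlabel("Word")
plt.ylabel("Frequency")
plt.tight_layout()
plt.savefig("top5_words.png")

# Step 3: the chart makes dominant words (here 'data') and filler
# words (e.g. 'is', 'and') easy to compare at a glance.
print(top5)
```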
OUTPUT (Bar Chart):

A bar chart with the following:

Words: data, is, and, that, science

Frequencies: 3, 2, 2, 1, 1
