Bavya NLP 0.1

The document outlines a natural language processing project focused on word frequency analysis, measures of central tendency, and visualization. It details steps for preprocessing text, calculating word frequencies, and analyzing word lengths, providing Python implementations and outputs for each section. Key findings include the most common word 'data' appearing three times and a mean word length of 6.06.

NATURAL LANGUAGE PROCESSING

NAME : Bavya C
CLASS : AI-DS ‘A’
ROLLNO : 22AD010
1. Word Frequency Analysis
Explanation:

Word frequency analysis identifies how often each word appears in a text. It helps determine the text's dominant
themes and frequent patterns.

Steps to Solve:

• Preprocessing: Clean the text to remove punctuation, convert it to lowercase, and split it into words.
• Count Total Words: Count all the words in the cleaned list.
• Calculate Word Frequencies: Use a dictionary or collections.Counter to calculate the frequency of each word.
• Find the Most Common Word: Identify the word with the highest count.

Python Implementation:
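The original code is not reproduced in this copy, so the following is a minimal sketch of the steps above. The sample text is an assumption reconstructed loosely from the word list in the output below; the exact counts reported there depend on the original input, which is not shown.

```python
import re
from collections import Counter

# Assumed sample text (hypothetical; the original input is not shown),
# so counts here may differ slightly from the outputs reported below.
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

# Preprocessing: lowercase the text and keep only alphabetic tokens,
# which removes punctuation and splits into words in one step.
words = re.findall(r"[a-z]+", text.lower())

# Count total words and per-word frequencies.
total_words = len(words)
frequencies = Counter(words)

# Find the most common word.
word, count = frequencies.most_common(1)[0]

print(f"Total words: {total_words}")
print(f"Word frequencies: {frequencies}")
print(f"Most common word: '{word}' appears {count} times.")
```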

OUTPUT:

Total words: 31

Word frequencies: Counter({'data': 3, 'is': 2, 'and': 2, 'that': 1, 'science': 1, 'an': 1, 'interdisciplinary': 1, 'field': 1, 'uses': 1,
'various': 1, 'techniques': 1, 'algorithms': 1, 'tools': 1, 'to': 1, 'extract': 1, 'insights': 1, 'knowledge': 1, 'from': 1, 'structured': 1,
'unstructured': 1, 'driven': 1, 'decisionmaking': 1, 'transforming': 1, 'industries': 1, 'worldwide': 1})

Most common word: 'data' appears 3 times.


2. Measures of Central Tendency
Explanation:

Word lengths in the text are analyzed using three statistical measures:

Mean: The average length of words.

Median: The middle value in the sorted word lengths.

Mode: The most frequently occurring word length.

Steps to Solve:

• Preprocess the text and calculate word lengths.
• Use statistical formulas or libraries to compute the mean, median, and mode.
• Evaluate which measure best represents the data.

Python Implementation:
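As in the previous section, the original code is not included in this copy. This sketch computes the three measures with Python's standard statistics module, again using an assumed sample text, so the figures in the output below (which come from the original input) will not match exactly.

```python
import re
import statistics

# Assumed sample text (hypothetical; the original input is not shown).
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

# Preprocess the text and calculate word lengths.
words = re.findall(r"[a-z]+", text.lower())
lengths = [len(w) for w in words]

# Mean, median, and mode of the word lengths.
mean_length = statistics.mean(lengths)
median_length = statistics.median(lengths)
mode_length = statistics.mode(lengths)

print(f"Mean word length: {mean_length:.2f}")
print(f"Median word length: {median_length}")
print(f"Mode word length: {mode_length}")
```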
OUTPUT:

Mean word length: 6.06

Median word length: 6.0

Mode word length: 4

Typical word length: the median, as it reduces the impact of very short or very long words.

3. Visualization
Explanation:

Visualizing the word frequencies offers insights into the text's structure and focus:

Top 5 Words: Identifies the most frequently occurring words.

Bar Chart: Compares the frequencies of these top words.

Insights: Highlights dominant themes or filler words.

Steps to Solve:

1. Extract the top 5 most common words.
2. Plot their frequencies using a bar chart.
3. Analyze the chart to draw conclusions.

Python Implementation:
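The original plotting code is also missing from this copy. The sketch below follows the three steps above with matplotlib, again on an assumed sample text, so the top-5 words it plots may differ from those listed in the output below.

```python
import re
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Assumed sample text (hypothetical; the original input is not shown).
text = ("Data science is an interdisciplinary field that uses techniques, "
        "algorithms, and tools to extract insights from structured and "
        "unstructured data. Data driven decision making is transforming "
        "industries worldwide.")

words = re.findall(r"[a-z]+", text.lower())

# Step 1: extract the top 5 most common words.
top5 = Counter(words).most_common(5)
labels, counts = zip(*top5)

# Step 2: plot their frequencies as a bar chart.
plt.figure(figsize=(6, 4))
plt.bar(labels, counts)
plt.title("Top 5 Word Frequencies")
plt.xlabel("Word")
plt.ylabel("Frequency")
plt.tight_layout()
plt.savefig("top5_words.png")

# Step 3: the chart makes dominant words (here 'data') and filler
# words (e.g. 'is', 'and') easy to compare at a glance.
print(top5)
```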
OUTPUT (Bar Chart):

A bar chart with the following:

Words: data, is, and, that, science

Frequencies: 3, 2, 2, 1, 1
