Theory of Computation
Theory of Computation
__________________________________________________
. a student of SY.BSC-CS Roll no: _______ has completed the required number of practicals
. in the subject of ____________________________________________________________ ... .
……………….. ……………………
. Prof. Incharge Principal
……….….….…. …...…..………….
…………………….. ……………………….
. Date College Stamp
Professor Name: Jayashree Patade Class : SY.BSC-CS
Semester : Sem IV (2023-2024)
*******
Theory-1
Tokenization
Tokenization is a fundamental process in natural language processing (NLP) that involves breaking down
a text into smaller units, or tokens. These tokens can be words, phrases, symbols, or any meaningful
chunks, depending on the level of granularity required for analysis. The purpose of tokenization is to
transform raw text into a format that is easier to process, analyze, and understand.
In simpler terms, imagine you have a sentence: "The quick brown fox jumps over the lazy dog."
Tokenization would break this sentence into individual units or tokens: "The," "quick," "brown," "fox,"
"jumps," "over," "the," "lazy", "dog", ".". Each of these words is now a separate token.
Basic Tokenization: The most straightforward tokenization is word tokenization, where the input text is
divided into words based on spaces. Punctuation marks are often treated as separate tokens as well.
For example, the sentence "Hello, how are you?" would be tokenized into: "Hello," ",", "how," "are,"
"you," "?"
Sentence Tokenization: Sometimes, it's necessary to tokenize a larger piece of text into sentences. This
process involves splitting the text into individual sentences.
Consider the paragraph: "Tokenization is crucial. It helps in various NLP tasks." Sentence tokenization
would yield: "Tokenization is crucial." and "It helps in various NLP tasks."
Whitespace Tokenization: Another simple tokenization method involves splitting the text based on
whitespace characters, such as spaces or tabs.
For instance, the string "apple banana cherry" would be tokenized into: "apple," "banana," "cherry."
Tokenization Challenges: Tokenization is not always a straightforward task due to challenges like
handling contractions, abbreviations, or languages with no clear word boundaries.
Consider the word "can't." It is a contraction of "cannot," and a simple tokenizer might treat it as "can"
and "'t." Advanced tokenizers take context into account to handle such cases better.
Tokenization Libraries: Many programming languages offer libraries for tokenization. In Python, the
Natural Language Toolkit (nltk) and spaCy are commonly used. These libraries provide pre-built
functions to tokenize text efficiently.
Use Cases:
Tokenization is a crucial preprocessing step in various NLP applications. Some common use cases
include:
Named entity recognition: Identifying entities like names, locations, and organizations.
1 | Page
Practical-1: Write a program for tokenization of given input
Source code:
Output:
2 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is tokenization?
2. What is sentence tokenization?
3. What is whitespace tokenization?
4. Which library is used for tokenization in python?
3 | Page
Theory-2
Regular Expression
Regular Expressions:
Definition: Regular expressions (regex or regexp) are sequences of characters that define a search pattern.
They are used for pattern matching within strings, allowing for the identification of specific sequences of
characters. Regular expressions are powerful tools for text processing and are widely used in
programming, text editors, and various other applications.
Components:
Metacharacters: Special characters with a reserved meaning (e.g., '*', '+', '.', '|').
Character Classes: A set of characters enclosed in square brackets (e.g., [abc] matches 'a', 'b', or 'c').
Anchors: Specify the position in the string (e.g., '^' for the start, '$' for the end).
Grouping: Parentheses are used to group expressions and establish the order of operations.
Example:
The regular expression a*b matches strings like "b", "ab", "aab", etc., where 'a' can occur zero or more
times followed by 'b'.
Regular Grammar:
Definition: A regular grammar is a type of formal grammar that generates regular languages. Regular
languages can be recognized by finite automata, and regular grammars are less expressive than
context-free grammars.
Components:
Production Rules: Define how non-terminals can be replaced by sequences of terminals and
non-terminals.
Start Symbol: Designates the initial non-terminal symbol from which the derivation begins.
Regular Production: A production rule of the form A → xB or A → x, where A and B are non-terminals,
and x is a string of terminals.
Example:
S → aA | b
A → aA | ε
This generates strings in the form of b or strings starting with 'a' followed by any number of 'a's.
Ex: b,a,aa,aaa,aaaa
4 | Page
Practical-2: Write a program for generating regular expressions for regular grammar
Source code:
Output:
5 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is a regular expression?
2. What is regular grammar?
3. What are terminals in grammar?
4. What are non-terminals in grammar?
6 | Page
Theory-3
Language
In the context of formal languages and grammars, a language is a set of strings (sequences of symbols)
that are generated or recognized by a given formal grammar. The formal grammar specifies the rules for
constructing valid strings in the language. In this explanation, The following provides a basic
understanding of languages and how they relate to grammars.
Symbols: Symbols are an entity or individual objects, which can be any letter, alphabet or any picture.
Example: 1, a, b, #
Alphabet (Σ): The alphabet of a language is the set of symbols or characters from which strings in the
language are constructed. For example, in the language of binary strings, the alphabet is {0, 1}.
String: A string is a finite sequence of symbols from the alphabet. For instance, "0101" is a string over the
binary alphabet.
Language (L): A language is a set of strings over a given alphabet. It represents the set of all valid or
accepted strings according to the rules defined by a formal grammar.
Example: 1
Example: 2
Formal Grammar: A formal grammar is a set of rules that defines the syntax of a language. It consists of:
Non-terminal Symbols: Symbols that can be replaced by sequences of symbols according to the rules.
Production Rules: Rules that define how non-terminal symbols can be replaced by sequences of terminal
and/or non-terminal symbols.
Start Symbol: The initial non-terminal symbol from which the derivation process begins.
Derivation: The process of applying production rules in a formal grammar to generate a string in the
language is called derivation. Starting with the start symbol, successive applications of production rules
lead to the formation of strings in the language.
Generative Power: A formal grammar's generative power refers to its ability to generate or produce
strings in a language. Different types of grammars (regular, context-free, context-sensitive, etc.) have
varying generative powers.
S -> aS | bA | ε
A -> ε
This grammar generates the language L, which consists of strings of the form a* b. The asterisk (*)
denotes zero or more occurrences of the symbol 'a'. Strings such as "b", "ab", "aab", etc., are part of this
language.
In summary, a language defined by a formal grammar represents a set of valid strings, and the grammar
provides the rules for generating or recognizing those strings. Different types of grammars have different
expressive powers, allowing for the description of various language classes.
7 | Page
Practical-3: Write a program for generating derivation sequence / language for the given sequence of
productions
Source code:
Output:
8 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is symbol in context of finite automata?
2. What is alphabet?
3. What is string?
4. What is language?
9 | Page
Theory-4
Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
F: final state
δ: Transition function
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the null
move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
10 | Page
Practical-4: Design a Program for creating machine that accepts three consecutive one.
Source code:
Output:
11 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Finite Automata (FA) ?
2. What are the type of FA?
3. What is DFA?
4. What is NFA?
For Faculty Use
Correction Formative Assessment Timely completion Attendance
Parameters [ ] of practical [ ] Learning
Attitude[ ]
12 | Page
Theory-5
Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
F: final state
δ: Transition function
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the null
move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
13 | Page
Practical-5: Design a Program for creating machine that accepts the string always ending with 101.
Source code:
Output:
14 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Finite Automata (FA) ?
2. What are the type of FA?
3. What is DFA?
4. What is NFA?
15 | Page
Theory-6
Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
F: final state
δ: Transition function
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the null
move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
16 | Page
Practical-6: Design a program for accepting decimal number divisible by 2.
Source code:
Output:
17 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Finite Automata (FA) ?
2. What are the type of FA?
3. What is DFA?
4. What is NFA?
18 | Page
Theory-7
Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
F: final state
δ: Transition function
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the null
move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
19 | Page
Practical-7: Design a program for creating a machine which accepts string having equal no. of 1’s and 0’s.
Source code:
Output:
20 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Finite Automata (FA) ?
2. What are the type of FA?
3. What is DFA?
4. What is NFA?
21 | Page
Theory-8
Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
F: final state
δ: Transition function
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the null
move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
22 | Page
Practical-8: Design a program for creating a machine which count number of 1’s and 0’s in a given string.
Source code:
"
Output:
23 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Finite Automata (FA) ?
2. What are the type of FA?
3. What is DFA?
4. What is NFA?
24 | Page
Theory-9
Pushdown automata
Pushdown automata is a way to implement a CFG in the same way we design DFA for a regular grammar.
A DFA can remember a finite amount of information, but a PDA can remember an infinite amount of
information.
Pushdown automata is simply an NFA augmented with an "external stack memory". The addition of stack
is used to provide a last-in-first-out memory management capability to Pushdown automata. Pushdown
automata can store an unbounded amount of information on the stack. It can access a limited amount of
information on the stack. A PDA can push an element onto the top of the stack and pop off an element
from the top of the stack. To read an element into the stack, the top elements must be popped off and are
lost.
A PDA is more powerful than FA. Any language which can be acceptable by FA can also be acceptable
by PDA. PDA also accepts a class of language which even cannot be accepted by FA. Thus PDA is much
more superior to FA.
PDA Components:
Input tape: The input tape is divided in many cells or symbols. The input head is read-only and may only
move from left to right, one symbol at a time.
Finite control: The finite control has some pointer which points the current symbol which is to be read.
Stack: The stack is a structure in which we can push and remove the items from one end only. It has an
infinite size. In PDA, the stack is used to store the items temporarily.
Γ: a stack symbol which can be pushed and popped from the stack
δ: mapping function which is used for moving from current state to next state.
ID is an informal notation of how a PDA computes an input string and make a decision that string is
accepted or rejected.
Turnstile Notation:
25 | Page
Practical-9: Design a PDA to accept WCWR where w is any string and WR is reverse of that string and C is a
Special symbol.
Source code:
Output:
26 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is PDA?
2. What is NFA?
3. What is DFA?
4. What is Stack?
27 | Page
Theory-10
Turing Machine
Turing machine was invented in 1936 by Alan Turing. It is an accepting device which accepts Recursive
Enumerable Language generated by type 0 grammar.
The model has a facility by which the input at left or right on the tape can be read easily.
The machine can produce a certain output based on its input. Sometimes it may be required that the same
input has to be used to generate the output. So in this machine, the distinction between input and output
has been removed. Thus a common set of alphabets can be used for the Turing machine.
The mapping function shows the mapping from states of finite automata and input symbol on the tape to
the next states, external symbols and the direction for moving the tape head. This is known as a triple or a
program for turing machine.
That means in q0 state, if we read symbol 'a' then it will go to state q1, replaced a by X and move ahead
right(R stands for right).
The turning machine can be modelled with the help of the following representation.
1. The input tape is having an infinite number of cells, each cell containing one input symbol and thus the
input string can be placed on tape. The empty tape is filled by blank characters.
2. The finite control and the tape head which is responsible for reading the current input symbol. The tape
head can move to left to right.
4. Finite set of symbols called external symbols which are used in building the logic of turing machine.
28 | Page
Practical-10: Design a Turing machine that’s accepts the following language aⁿ bⁿ cⁿ where n>0.
Source code:
Output:
29 | Page
Result and Discussion:
Learning Outcomes:
Course Outcomes:
Conclusion:
Viva Question:
1. What is Turing Machine?
2. Recursive Enumerable language is generated by which grammar?
3. Type 0 grammar is also known as?
4. Which type of Language is generated by Unrestricted grammar?
30 | Page