ChatGPT-NLP Course Summary

The document outlines the syllabus for a Natural Language Processing (NLP) course at Savitribai Phule Pune University, covering six units over 42 hours. Topics include the introduction to NLP, language syntax and semantics, language modeling, information retrieval, NLP tools and techniques, and applications of NLP. Each unit features key concepts, case studies, and mapped course outcomes to ensure comprehensive learning.


NLP Course Summary

Course Contents

Unit I Introduction to Natural Language Processing 07 Hours

Introduction: Natural Language Processing, Why NLP is hard?, Programming Languages vs Natural Languages, Are natural languages regular?, Finite automata for NLP, Stages of NLP, Challenges and Issues (Open Problems) in NLP
Basics of text processing: Tokenization, Stemming, Lemmatization, Part-of-Speech Tagging
#Exemplar/Case Studies: Why English is not a regular language:
http://cs.haifa.ac.il/~shuly/teaching/08/nlp/complexity.pdf#page=20
*Mapping of Course Outcomes for Unit I: CO1

Unit II Language Syntax and Semantics 07 Hours

Morphological Analysis: What is Morphology? Types of Morphemes, Inflectional morphology & Derivational morphology, Morphological parsing with Finite State Transducers (FST)
Syntactic Analysis: Syntactic Representations of Natural Language, Parsing Algorithms, Probabilistic context-free grammars, and Statistical parsing
Semantic Analysis: Lexical Semantics, Relations among lexemes & their senses – Homonymy, Polysemy, Synonymy, Hyponymy, WordNet, Word Sense Disambiguation (WSD), Dictionary-based approach, Latent Semantic Analysis
#Exemplar/Case Studies: Study of Stanford Parser and POS Tagger
https://nlp.stanford.edu/software/lex-parser.html
https://nlp.stanford.edu/software/tagger.html
*Mapping of Course Outcomes for Unit II: CO2

Unit III Language Modelling 07 Hours

Probabilistic language modeling, Markov models, Generative models of language, Log-Linear Models, Graph-based Models
N-gram models: Simple n-gram models, Estimation of parameters and smoothing, Evaluating language models
Word Embeddings/Vector Semantics: Bag-of-words, TF-IDF, word2vec, doc2vec, Contextualized representations (BERT)
Topic Modelling: Latent Dirichlet Allocation (LDA), Latent Semantic Analysis, Non-Negative Matrix Factorization
#Exemplar/Case Studies: Study of language modelling for Indian languages.
*Mapping of Course Outcomes for Unit III: CO3

Unit IV Information Retrieval using NLP 07 Hours

Information Retrieval: Introduction, Vector Space Model
Named Entity Recognition: NER System Building Process, Evaluating NER System
Entity Extraction, Relation Extraction, Reference Resolution, Coreference Resolution, Cross-Lingual Information Retrieval
#Exemplar/Case Studies: Natural Language Processing based Information Extraction & Retrieval:
https://www.cdac.in/index.aspx?id=mc_cli_cross_lingual_info
*Mapping of Course Outcomes for Unit IV: CO4

Unit V NLP Tools and Techniques 07 Hours

Prominent NLP Libraries: Natural Language Toolkit (NLTK), spaCy, TextBlob, Gensim, etc.
Linguistic Resources: Lexical Knowledge Networks, WordNets, Indian Language WordNet (IndoWordNet), VerbNets, PropBank, Treebanks, Universal Dependency Treebanks
Word Sense Disambiguation: Lesk Algorithm, Walker's algorithm, WordNets for Word Sense Disambiguation
#Exemplar/Case Studies:
Hindi WordNet: https://www.cfilt.iitb.ac.in/wordnet/webhwn/
Sanskrit WordNet: https://www.cfilt.iitb.ac.in/wordnet/webswn/
Indic NLP Library: http://anoopkunchukuttan.github.io/indic_nlp_library/
*Mapping of Course Outcomes for Unit V: CO5

Unit VI Applications of NLP 07 Hours

Machine Translation: Rule-based techniques, Statistical Machine Translation (SMT), Cross-Lingual Translation
Sentiment Analysis, Question Answering, Text Entailment, Discourse Processing, Dialog and Conversational Agents, Natural Language Generation
#Exemplar/Case Studies: Study working of Google Translate; Study working of IBM Watson Natural Language Processing

oral qa

Here’s a cleaned and organized version of the course content for your Natural
Language Processing (NLP) syllabus from Savitribai Phule Pune University –
Final Year Computer Engineering. Each unit includes the main topics, suggested
exemplar/case studies, and mapped Course Outcomes (COs):

Unit I: Introduction to Natural Language Processing (07 Hours)

Topics Covered:

Definition and need for NLP

Why NLP is hard?

Programming Languages vs Natural Languages

Are Natural Languages Regular?

Finite Automata for NLP



Stages of NLP

Challenges and Open Issues in NLP

Basics of Text Processing:

Tokenization

Stemming

Lemmatization

Part-of-Speech (POS) Tagging

Exemplar/Case Study:

Why English is not a regular language:


Complexity of English Language (PDF)

Mapped Course Outcome:

CO1

Unit II: Language Syntax and Semantics (07 Hours)


Topics Covered:

Morphological Analysis:

Morphology, Morphemes

Inflectional vs Derivational Morphology

Finite State Transducers for Morphological Parsing

Syntactic Analysis:

Syntactic Representations

Parsing Algorithms



Probabilistic Context-Free Grammars

Statistical Parsing

Semantic Analysis:

Lexical Semantics

Word Relations: Homonymy, Polysemy, Synonymy, Hyponymy

WordNet

Word Sense Disambiguation (WSD)

Dictionary-based and Latent Semantic Analysis

Exemplar/Case Study:

Stanford Parser and POS Tagger:


Lex Parser
Tagger

Mapped Course Outcome:

CO2

Unit III: Language Modelling (07 Hours)


Topics Covered:

Probabilistic Language Modeling:

Markov Models

Generative Models

Log-Linear Models

Graph-based Models



N-gram Models:

Simple N-grams, Parameter Estimation

Smoothing Techniques

Evaluation of Language Models

Vector Semantics:

Bag-of-Words, TF-IDF

word2vec, doc2vec

Contextualized Embeddings (e.g., BERT)

Topic Modelling:

LDA, Latent Semantic Analysis

Non-Negative Matrix Factorization

Exemplar/Case Study:

Language Modeling for Indian Languages

Mapped Course Outcome:

CO3

Unit IV: Information Retrieval using NLP (07 Hours)


Topics Covered:

Information Retrieval Basics:

Introduction, Vector Space Model

Entity Processing:



Named Entity Recognition (NER)

Entity Extraction

Relation and Reference Resolution

Coreference Resolution

Cross-Lingual Information Retrieval

Exemplar/Case Study:

NLP-based Information Extraction and Retrieval:


CDAC Cross-Lingual IR

Mapped Course Outcome:

CO4

Unit V: NLP Tools and Techniques (07 Hours)


Topics Covered:

NLP Libraries:

NLTK, spaCy, TextBlob, Gensim

Linguistic Resources:

WordNets (English & Indian Languages)

IndoWordNet, VerbNets, PropBank

Treebanks, Universal Dependency Treebanks

Word Sense Disambiguation Techniques:

Lesk Algorithm

Walker’s Algorithm



WordNets for WSD

Exemplar/Case Study:

Hindi WordNet: WebHwn

Sanskrit WordNet: WebSwn

Indic NLP Library: Indic Library

Mapped Course Outcome:

CO5

Unit VI: Applications of NLP (07 Hours)


Topics Covered:

Machine Translation:

Rule-Based, Statistical Machine Translation

Cross-Lingual Translation

Sentiment Analysis

Question Answering Systems

Text Entailment

Discourse Processing

Dialog and Conversational Agents

Natural Language Generation

Exemplar/Case Study:



Working of Google Translate

Working of IBM Watson NLP for Oral QA

Let me know if you'd like a summarized PowerPoint presentation, PDF, or mind map version of this content.

give me oral QA

Here's an Oral Question-Answer (QA) set based on the Natural Language Processing (NLP) syllabus you've provided. These questions are suitable for viva, oral exams, or interview preparation. They are organized unit-wise, with a focus on conceptual clarity.

✅ Unit I – Introduction to NLP


Q1. What is Natural Language Processing?
A: NLP is a field of AI that enables computers to understand, interpret, and generate
human language.

Q2. Why is NLP considered difficult?


A: Because natural languages are ambiguous, unstructured, and have complex
grammar rules which vary by context and culture.

Q3. What are the stages of NLP?


A: Lexical analysis, syntactic analysis, semantic analysis, discourse integration, and
pragmatic analysis.



Q4. What is tokenization?
A: It is the process of splitting text into individual words or tokens.

Q5. How is stemming different from lemmatization?


A: Stemming chops off word ends; lemmatization returns the base form using
vocabulary and morphology.
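
To make the contrast concrete, here is a minimal Python sketch using NLTK (one of the libraries covered in Unit V); the sample sentence is invented, and it assumes the punkt and wordnet corpora have already been downloaded with nltk.download():

```python
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The flies were flying around while the studies continued"
tokens = word_tokenize(text)            # tokenization: split text into word tokens

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for tok in tokens:
    # Stemming applies crude suffix-stripping rules, while lemmatization looks
    # the word up in WordNet: "flies" -> "fli" (stem) but "fly" (lemma).
    print(tok, stemmer.stem(tok), lemmatizer.lemmatize(tok))
```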

✅ Unit II – Syntax and Semantics


Q6. What is morphology in NLP?
A: Study of word formation and structure using morphemes.

Q7. What are inflectional and derivational morphemes?


A: Inflectional morphemes change tense or number; derivational morphemes create
new words.

Q8. What is a context-free grammar (CFG)?


A: A formal grammar where every production rule is of the form A → β, with A as a
non-terminal.

Q9. What is Word Sense Disambiguation (WSD)?


A: The process of identifying the correct meaning of a word in context.

Q10. What are homonymy and polysemy?


A: Homonyms share the same form but have unrelated meanings; a polysemous word has several related senses.

✅ Unit III – Language Modelling


Q11. What is an N-gram model?
A: A probabilistic model that predicts the next word based on the previous N-1 words.



Q12. What is smoothing in N-gram models?
A: Techniques used to handle zero probabilities for unseen word combinations.
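
As a sketch of how add-one (Laplace) smoothing redistributes probability mass, here is a toy bigram model in plain Python; the two-sentence corpus and the probabilities it produces are purely illustrative:

```python
from collections import Counter

corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "deep", "learning", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size

def p_laplace(w2, w1):
    # P(w2 | w1) with add-one smoothing: (count(w1 w2) + 1) / (count(w1) + V)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_laplace("like", "i"))    # seen bigram
print(p_laplace("nlp", "deep"))  # unseen bigram still gets non-zero probability
```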

Q13. What is TF-IDF?


A: Term Frequency-Inverse Document Frequency; it reflects the importance of a word
in a document.
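
A short illustrative sketch with scikit-learn's TfidfVectorizer (an assumed tooling choice, not something mandated by the syllabus); the three documents are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["nlp makes machines read text",
        "machines learn from text data",
        "students read books"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x terms

# Terms that occur in fewer documents get a higher IDF and hence larger weights.
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```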

Q14. What is word2vec?


A: A model that represents words in a dense vector space capturing semantic
meaning.
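
A minimal gensim sketch (gensim 4.x argument names assumed); the toy corpus is far too small to learn meaningful vectors and is only meant to show the API shape:

```python
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["dogs", "bark", "at", "strangers"]]

# sg=1 selects the skip-gram variant; sg=0 would use CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["queen"]                        # 50-dimensional dense vector
print(model.wv.most_similar("king", topn=2))   # nearest neighbours in vector space
```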

Q15. What is topic modeling?


A: A technique to discover abstract topics within a collection of texts, e.g., using LDA.

✅ Unit IV – Information Retrieval


Q16. What is Information Retrieval (IR)?
A: Finding relevant documents or data from a large corpus based on a query.

Q17. What is Named Entity Recognition (NER)?


A: Identifying entities like person names, locations, dates, etc., in text.

Q18. What is coreference resolution?


A: Identifying when different words refer to the same entity in a text.

Q19. What is cross-lingual IR?


A: Retrieving information in one language based on a query in another.

Q20. How does relation extraction work?


A: It identifies semantic relationships between entities in a sentence.

✅ Unit V – NLP Tools and Techniques


Q21. What is NLTK?
A: A Python library for symbolic and statistical NLP.

Q22. What is spaCy used for?


A: Industrial-strength NLP tasks like tokenization, POS tagging, NER.
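
A short spaCy sketch showing those pipeline steps; it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`, and the example sentence is invented:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Pune next January.")

for token in doc:
    print(token.text, token.pos_)   # tokenization + POS tagging
for ent in doc.ents:
    print(ent.text, ent.label_)     # named entities, e.g. ORG, GPE, DATE
```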

Q23. What is WordNet?


A: A lexical database of English words grouped into synonyms and linked by semantic
relations.

Q24. What is the Lesk algorithm?


A: A WSD algorithm that uses dictionary definitions (overlaps) to determine sense.
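
NLTK ships a simplified Lesk implementation in nltk.wsd; the sketch below attempts to disambiguate "bank" in a river context (the sentence is made up, the WordNet corpus must be downloaded, and simplified Lesk can still pick a wrong sense on short contexts):

```python
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

sentence = "He sat on the bank of the river and watched the water"
sense = lesk(word_tokenize(sentence), "bank", pos="n")

# Prints the chosen WordNet synset and its gloss, or a note if nothing matched.
print(sense, "-", sense.definition() if sense else "no sense found")
```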

Q25. What is IndoWordNet?


A: A multilingual WordNet for Indian languages developed by IIT Bombay.

✅ Unit VI – Applications of NLP


Q26. What is Machine Translation?
A: Automatically translating text from one language to another.

Q27. What is Statistical Machine Translation (SMT)?


A: A translation technique based on probabilities derived from bilingual corpora.

Q28. What is Sentiment Analysis?


A: Detecting the sentiment (positive, negative, neutral) from text.

Q29. What are dialog systems?


A: AI systems that can converse with humans in natural language (e.g., chatbots).

Q30. What is text entailment?


A: Determining if one text logically follows from another.



Would you like a printable version of this QA sheet or a presentation format with
speaker notes for teaching/revision?

give me more unit wise

Here’s an extended set of Unit-wise Oral Q&A for Natural Language


Processing (NLP) based on your syllabus — ideal for oral exams, interviews, and
concept revision.

✅ Unit I – Introduction to NLP (Extended)


Q1. What makes natural languages ambiguous?
A: Words can have multiple meanings, sentence structures can vary, and context
affects interpretation.

Q2. What are the main components of an NLP pipeline?


A: Lexical analysis, syntactic analysis, semantic analysis, discourse integration,
pragmatic analysis.

Q3. How is a natural language different from a programming language?


A: Natural languages are informal, ambiguous, and evolve over time; programming
languages are formal and have strict syntax.

Q4. What is POS tagging?


A: It assigns parts of speech (noun, verb, adjective, etc.) to each word in a sentence.

Q5. What are finite automata used for in NLP?


A: Modeling regular grammars and recognizing regular languages, though natural
languages are not fully regular.



✅ Unit II – Language Syntax and Semantics (Extended)
Q6. What is derivational morphology?
A: It forms a new word by adding prefixes/suffixes (e.g., happy → happiness).

Q7. What is morphological parsing?


A: Breaking down words into morphemes to understand their structure and meaning.

Q8. What is parsing in NLP?


A: The process of analyzing sentence structure using a grammar.

Q9. What is the difference between lexical and compositional semantics?


A: Lexical semantics deals with word meanings; compositional semantics explains
how meanings combine in phrases/sentences.

Q10. What is WordNet and how is it useful in semantics?


A: A lexical database that links words with their meanings and relationships, used for
WSD and synonym detection.

✅ Unit III – Language Modelling (Extended)


Q11. What is a generative language model?
A: A model that can generate text by learning the joint probability of sequences of
words.

Q12. What is the role of smoothing in language modeling?


A: To handle zero-probability problems by redistributing some probability mass to
unseen events.

Q13. How does word2vec work?


A: It learns word embeddings using neural networks based on surrounding context
(CBOW or Skip-gram).



Q14. What is the difference between BERT and word2vec?
A: BERT provides contextualized embeddings; word2vec gives static embeddings.

Q15. What are the applications of topic modeling?


A: Document classification, content summarization, recommendation systems.

✅ Unit IV – Information Retrieval using NLP (Extended)


Q16. What is the Vector Space Model in IR?
A: A model that represents documents and queries as vectors to calculate similarity
(e.g., cosine similarity).
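
A small sketch of this idea: documents and a query are turned into TF-IDF vectors and ranked by cosine similarity (scikit-learn assumed as the tooling; the documents and query are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["cricket match in mumbai", "stock market news", "weather in mumbai today"]
query = ["mumbai weather"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)     # one TF-IDF vector per document
query_vector = vectorizer.transform(query)       # query in the same vector space

scores = cosine_similarity(query_vector, doc_vectors)[0]
ranked = sorted(zip(scores, docs), reverse=True)  # highest similarity first
print(ranked)
```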

Q17. What is Entity Extraction?


A: Identifying key entities (people, organizations, locations) in text.

Q18. What is the difference between NER and coreference resolution?


A: NER identifies entities; coreference resolution links different mentions of the same
entity.

Q19. What are evaluation metrics for NER?


A: Precision, Recall, F1-score.

Q20. What is the challenge in Cross-Lingual Information Retrieval?


A: Matching queries in one language with documents in another language.

✅ Unit V – NLP Tools and Techniques (Extended)


Q21. What is Gensim used for?
A: Topic modeling and document similarity analysis using models like LDA and
word2vec.
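
A minimal gensim LDA sketch; the toy corpus, the number of topics, and the number of passes are arbitrary illustrative choices:

```python
from gensim import corpora
from gensim.models import LdaModel

texts = [["cricket", "match", "score", "team"],
         ["election", "vote", "party", "leader"],
         ["team", "player", "score", "win"]]

dictionary = corpora.Dictionary(texts)                # word <-> id mapping
bow_corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words vectors

lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                            # top words per latent topic
```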



Q22. What are Treebanks?
A: Annotated text corpora that show the syntactic structure of sentences.

Q23. What is Universal Dependency Treebank?


A: A multilingual treebank standard for syntactic annotation.

Q24. What is the purpose of VerbNet and PropBank?


A: To provide structured information about verb argument structures and semantics.

Q25. How does the Lesk algorithm perform WSD?


A: By finding the sense of a word whose dictionary definition overlaps most with its
context.

✅ Unit VI – Applications of NLP (Extended)


Q26. What are the types of Machine Translation systems?
A: Rule-based, Statistical (SMT), Neural Machine Translation (NMT).

Q27. What is text entailment in NLP?


A: Determining whether one sentence logically follows from another.

Q28. How do question answering systems work?


A: They use information retrieval, NLP, and sometimes reasoning to return accurate
answers to natural questions.

Q29. What is the goal of discourse processing?


A: To understand how sentences relate in context, coherence, and flow.

Q30. What are conversational agents?


A: AI systems like chatbots that interact with users through natural language
dialogue.



Would you like me to generate flashcards, a quiz, or a PDF handout of this oral QA
for easy review?

oral QA unit wise deeply

Here is a comprehensive, deep oral Q&A set for each unit of your NLP syllabus, designed for viva voce, oral exams, or detailed concept interviews. Each question goes beyond definitions to probe understanding and reasoning.

✅ Unit I: Introduction to NLP (Deep Oral QA)


Q1. What makes natural language processing fundamentally different from traditional
programming problems?
A: Natural languages are inherently ambiguous, unstructured, and full of exceptions.
Unlike programming languages that follow strict grammar rules, natural languages
vary by context, region, and user intent, making interpretation by machines extremely
challenging.

Q2. Explain why natural languages are not regular languages.


A: Regular languages can be represented by finite automata and are limited in
expressiveness. Natural languages, however, require memory (e.g., nested structures
in grammar like "The cat that the dog chased ran away"), which can't be captured by
regular expressions or finite automata — they often require context-free or context-
sensitive grammars.

Q3. What are the primary stages of an NLP pipeline?


A:



1. Text preprocessing (tokenization, normalization)

2. Morphological analysis

3. Syntactic analysis (parsing)

4. Semantic analysis

5. Discourse integration

6. Pragmatic analysis

Q4. What challenges are still open problems in NLP?


A:

Word Sense Disambiguation

Handling sarcasm and humor

Cross-lingual NLP

Context understanding and reasoning

Bias and fairness in language models

Q5. Differentiate between stemming and lemmatization with examples.


A: Stemming cuts word suffixes without understanding grammar (e.g., playing → play,
flies → fli). Lemmatization uses vocabulary and morphological analysis to return the
dictionary form (flies → fly).

✅ Unit II: Language Syntax and Semantics (Deep Oral QA)

Q6. How does morphological analysis help in NLP?
A: It breaks down words into morphemes, aiding in understanding meaning and grammatical function, which is crucial for POS tagging, parsing, and translation.

Q7. Explain the role of finite-state transducers in morphological parsing.


A: FSTs model the relationship between surface forms and lexical forms by applying
rules that map input (inflected word) to output (root + features), making them
efficient for real-time parsing.

Q8. What is the significance of probabilistic parsing?


A: It uses probability to choose the most likely syntactic structure among many
possibilities, especially useful in ambiguous sentences (e.g., I saw the man with the
telescope).

Q9. Compare Homonymy and Polysemy with examples.


A: Homonymy: different meanings, same word (e.g., bank of river vs bank account).
Polysemy: related meanings (e.g., paper as material and academic publication).

Q10. What is Word Sense Disambiguation, and why is it hard?


A: It’s the process of identifying the correct sense of a word in context. It’s difficult
due to multiple valid meanings, subtle context shifts, and insufficient training data for
rare senses.

✅ Unit III: Language Modelling (Deep Oral QA)


Q11. What is the purpose of a language model in NLP?
A: To estimate the probability of sequences of words, allowing machines to predict the
next word, correct spelling errors, and understand natural language context.

Q12. Explain the problem of data sparsity in N-gram models and how it’s solved.
A: Data sparsity arises when many word sequences are never seen in training data.
It’s handled using smoothing techniques like Laplace, Good-Turing, and backoff
models.

Q13. How do word embeddings improve NLP tasks over traditional methods like BoW?
A: Word embeddings (e.g., word2vec) capture semantic similarity in vector space, unlike BoW, which ignores context and word order. For instance, king - man + woman ≈ queen in word2vec.

Q14. How does BERT differ from traditional embeddings?


A: BERT provides contextual embeddings where the same word can have different
vectors based on sentence context (e.g., bank in river vs finance), while traditional
models like word2vec provide static vectors.

Q15. Describe Latent Dirichlet Allocation (LDA) in topic modeling.


A: LDA is a generative probabilistic model that assumes documents are mixtures of
topics, and topics are distributions over words. It uses Bayesian inference to uncover
hidden thematic structures.

✅ Unit IV: Information Retrieval Using NLP (Deep Oral QA)

Q16. What are the limitations of Boolean retrieval in IR and how does the Vector
Space Model overcome them?
A: Boolean retrieval is rigid (AND/OR/NOT), returning exact matches. Vector Space
Model allows ranking documents based on similarity scores (like cosine similarity),
enabling partial and relevance-based matching.

Q17. How is NER useful in real-world applications?


A: NER identifies key entities (names, dates, locations), aiding in summarization,
information extraction, knowledge graph construction, and question answering.

Q18. What is relation extraction and how is it performed?


A: It identifies semantic relationships between entities (e.g., Barack Obama – born in –
Hawaii). It can be rule-based, supervised ML-based, or use pre-trained models.

Q19. How does coreference resolution enhance text understanding?


A: It links pronouns and noun phrases to the entities they refer to, helping maintain context across sentences (e.g., Mary loves coffee. She drinks it daily. – She = Mary, it = coffee).

Q20. Explain the concept and need for Cross-Lingual Information Retrieval.
A: CLIR enables users to query in one language and retrieve documents in another. It
is essential for multilingual societies and global access to information, often using
machine translation or bilingual dictionaries.

✅ Unit V: NLP Tools and Techniques (Deep Oral QA)


Q21. What are the advantages of using spaCy over NLTK?
A: spaCy is faster, designed for production use, and supports deep learning. NLTK is
better for teaching and prototyping due to its comprehensive documentation and
datasets.

Q22. What are Treebanks and how are they used in NLP?
A: Treebanks are corpora annotated with syntactic or semantic sentence structure.
They’re used to train parsers and evaluate syntactic analysis algorithms.

Q23. How does WordNet assist in Word Sense Disambiguation?


A: WordNet provides senses, definitions, synonyms, and relations. Algorithms like Lesk
use this information to compare context with sense definitions.

Q24. What is the Lesk algorithm and what is its limitation?


A: It disambiguates word senses by overlapping dictionary definitions with context.
Limitation: highly dependent on the quality and completeness of definitions.

Q25. What role do linguistic resources like IndoWordNet play for Indian languages?
A: IndoWordNet provides structured semantic relations for Indian languages, enabling
translation, WSD, and cross-lingual NLP applications.



✅ Unit VI: Applications of NLP (Deep Oral QA)
Q26. How does rule-based machine translation differ from statistical MT?
A: Rule-based uses grammatical rules and dictionaries; statistical MT learns from
aligned bilingual corpora using probability. SMT handles ambiguity better but requires
large data.

Q27. What are the steps involved in Sentiment Analysis?


A: Preprocessing → Tokenization → POS tagging → Feature extraction → Classification
(positive, negative, neutral).
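
As a rough illustration of the final classification step, here is a TextBlob sketch (TextBlob is listed in Unit V); its polarity score is a simple lexicon-based heuristic, and the two reviews are invented:

```python
from textblob import TextBlob

reviews = ["The service was excellent and the staff were friendly.",
           "Terrible experience, the product broke in a day."]

for text in reviews:
    # Polarity ranges from -1 (negative) to +1 (positive).
    polarity = TextBlob(text).sentiment.polarity
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(label, round(polarity, 2), "-", text)
```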

Q28. How do QA systems differ from traditional search engines?


A: QA systems aim to provide direct answers to questions using NLP and IR
techniques, while search engines return a list of relevant documents.

Q29. What challenges exist in building dialog systems?


A: Handling context, understanding multi-turn conversations, managing user intent,
and generating natural responses.

Q30. How does natural language generation work in applications like chatbots or
summaries?
A: NLG converts structured data into coherent text using rule-based, template-based,
or neural network approaches like GPT or T5.

Would you like these in PDF, presentation, or printable flashcard format?

Course Contents

Unit I Introduction to Decision Support Systems and Business Intelligence 07 Hours

Decision support systems: Definition of system, representation of the decision-making process, evolution of information systems, Decision Support System, development of a decision support system, the four stages of Simon's decision-making process, and common strategies and approaches of decision makers
Business Intelligence: BI, its components & architecture, previewing the future of BI, crafting a better experience for all business users, end-user assumptions, setting up data for BI, data, information and knowledge, the role of mathematical models, Business Intelligence architectures, ethics and business intelligence
#Exemplar/Case Studies: Decision support system in business intelligence:
https://www.riverlogic.com/blog/five-decision-support-system-examples
*Mapping of Course Outcomes for Unit I: CO1

Unit II The Architecture of DW and BI 07 Hours

BI and DW architectures and their types - Relation between BI and DW - OLAP (Online Analytical Processing) definitions - Different OLAP architectures - Data models - Tools in Business Intelligence - Role of DSS, EIS, MIS and digital dashboards - Need for Business Intelligence
Difference between OLAP and OLTP - Dimensional analysis - What are cubes? Drill-down and roll-up - Slice and dice or rotation - OLAP models - ROLAP versus MOLAP - Defining schemas: stars, snowflakes and fact constellations.
#Exemplar/Case Studies: A case study on the retail industry:
https://www.diva-portal.org/smash/get/diva2:831050/FULLTEXT01.pdf
*Mapping of Course Outcomes for Unit II: CO2

Unit III Reporting Authoring 07 Hours

Building reports with relational vs multidimensional data models; Types of reports – list, crosstabs, statistics, chart, map, financial, etc.; Data grouping & sorting, filtering reports, adding calculations to reports, conditional formatting, adding summary lines to reports; Drill-up, drill-down, drill-through capabilities; Run or schedule reports, different output forms – PDF, Excel, CSV, XML, etc.
#Exemplar/Case Studies: Power BI Case Study – How the tool reduced hassles of Heathrow & Edsby:
https://data-flair.training/blogs/power-bi-case-study/
*Mapping of Course Outcomes for Unit III: CO3



Unit IV Data Preparation 07 Hours

Data validation: Incomplete data, data affected by noise. Data transformation: Standardization, feature extraction. Data reduction: Sampling, feature selection, principal component analysis, data discretization. Data exploration:
1. Univariate analysis: Graphical analysis of categorical attributes, graphical analysis of numerical attributes, measures of central tendency for numerical attributes, measures of dispersion for numerical attributes, identification of outliers for numerical attributes
2. Bivariate analysis: Graphical analysis, measures of correlation for numerical attributes, contingency tables for categorical attributes
3. Multivariate analysis: Graphical analysis, measures of correlation for numerical attributes
#Exemplar/Case Studies: Case study on the data preparation phase of a BI system:
https://blog.panoply.io/load-and-transform-how-to-prepare-your-data-for-business-intelligence
*Mapping of Course Outcomes for Unit IV: CO4

Unit V Impact of Machine Learning in the Business Intelligence Process 07 Hours

Classification: Classification problems, evaluation of classification models, Bayesian methods, logistic regression. Clustering: Clustering methods, partition methods, hierarchical methods, evaluation of clustering models. Association Rules: Structure of association rules, Apriori algorithm
#Exemplar/Case Studies: Business applications for comparing the performance of a stock over a period of time:
https://cleartax.in/s/stock-market-analysis
*Mapping of Course Outcomes for Unit V: CO5

Unit VI BI Applications 07 Hours

Tools for Business Intelligence, role of analytical tools in BI, case study of analytical tools: WEKA, KNIME, RapidMiner, R;
Data analytics, business analytics, ERP and Business Intelligence, BI and operations management, BI in inventory management systems, BI and human resource management, BI applications in CRM, BI applications in marketing, BI applications in logistics and production, role of BI in finance, BI applications in banking, BI applications in telecommunications, BI in salesforce management
#Exemplar/Case Studies: Logistics planning in the food industry:
https://www.foodlogistics.com/case-studies
https://www.barrettdistribution.com/food-distribution-case-study

deep oral QA unit wise

Here is a deep, unit-wise oral Q&A guide for your subject "Decision Support
Systems and Business Intelligence" based on the syllabus of Savitribai Phule
Pune University. Each unit includes in-depth, application-focused, and conceptual
questions ideal for oral exams, vivas, or interviews.



✅ Unit I: Introduction to DSS and BI – Deep Oral QA

Q1. What is a Decision Support System (DSS)? How does it differ from MIS?
A: A DSS supports complex decision-making and problem-solving. Unlike MIS, which
focuses on routine operations and reports, DSS is analytical and interactive, helping
managers make strategic decisions.

Q2. Explain Simon’s decision-making model with real-life business examples.


A: Simon proposed four stages:

1. Intelligence (identify problem),

2. Design (generate alternatives),

3. Choice (select best),

4. Implementation.
Example: A retail chain deciding on expansion uses DSS to analyze sales data
(intelligence), compare locations (design), choose the best city (choice), and start
operations (implementation).

Q3. What are the key components of Business Intelligence architecture?


A:

Data sources

ETL (Extract, Transform, Load)

Data warehouse

OLAP engines

Reporting tools

Dashboards/Visualization



Users

Q4. How are data, information, and knowledge different?


A:

Data: Raw facts (e.g., sales numbers).

Information: Processed data with meaning (e.g., total monthly sales).

Knowledge: Insights derived from information (e.g., customers prefer Product A).

Q5. What ethical issues arise in BI?


A:

Data privacy and surveillance

Biased algorithms

Misuse of personal data

Transparency in decision-making

✅ Unit II: Architecture of DW and BI – Deep Oral QA

Q6. Differentiate between OLTP and OLAP with business scenarios.
A:

OLTP: Used in day-to-day operations (e.g., ATM transactions).

OLAP: Used for analysis and decision-making (e.g., analyzing monthly banking
trends).

Q7. Explain star and snowflake schemas with diagrams.


A:



Star schema: Central fact table linked to dimension tables. Simple but
redundant.

Snowflake schema: Normalized dimension tables. More complex but less redundant.

Q8. What is the importance of drill-down and slice-and-dice in OLAP?


A:

Drill-down: View data at a more detailed level.

Slice-and-dice: Analyze data across different dimensions (e.g., by region or product).

Q9. What is a cube in data warehousing?


A: A multidimensional dataset allowing analysis from multiple perspectives like time,
geography, and product.

Q10. Explain ROLAP vs MOLAP. Which is better for large datasets?


A:

ROLAP: Works with relational databases; scalable for large data.

MOLAP: Pre-computed cubes; faster but limited in volume.

✅ Unit III: Reporting & Authoring – Deep Oral QA


Q11. What is the difference between list reports and crosstab reports?
A:

List report: Flat, linear data presentation.

Crosstab: Pivot-style, showing aggregation across two dimensions.



Q12. Why is conditional formatting important in reports?
A: It highlights key values (e.g., low stock, high revenue) for quick insights.

Q13. Explain drill-through capability with an example.


A: Clicking on summarized data (like total sales) to view detailed records (e.g.,
customer-wise sales breakdown).

Q14. How do different output formats (CSV, XML, Excel) serve various stakeholders?
A:

CSV: Lightweight, for developers.

Excel: Interactive, used by analysts.

XML: Structured, used in system integrations.

Q15. What factors must be considered when scheduling reports?


A:

Data refresh frequency

Audience need

Resource availability

Security/access controls

✅ Unit IV: Data Preparation – Deep Oral QA


Q16. Why is data validation crucial before BI analysis?
A: Dirty or incomplete data can lead to false insights and poor decision-making.

Q17. Differentiate between feature extraction and feature selection.


A:



Feature extraction: Derives new features (e.g., PCA).

Feature selection: Chooses the most relevant existing features.

Q18. How does principal component analysis (PCA) reduce dimensionality?


A: PCA converts correlated features into fewer uncorrelated components while
retaining maximum variance.
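
A small scikit-learn sketch of that idea; the feature values are invented, and the data are standardized first because PCA is sensitive to feature scale:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.array([[200, 20, 5, 1.0],
              [220, 22, 6, 1.1],
              [300, 31, 9, 2.0],
              [310, 30, 8, 2.1],
              [150, 16, 4, 0.8]])

X_scaled = StandardScaler().fit_transform(X)   # put all features on the same scale
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                 # (5, 2): fewer features, same rows
print(pca.explained_variance_ratio_)   # share of variance kept per component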

Q19. What is the role of univariate and bivariate analysis in data exploration?
A:

Univariate: Analyzing single attributes (e.g., age distribution).

Bivariate: Understanding relationships (e.g., sales vs discount).

Q20. How do you detect outliers in numerical data?


A:

Z-score

Boxplot (IQR method)

Visualizations (scatter plots)
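
As a quick illustration of the boxplot/IQR rule from the list above, a short NumPy sketch with invented sales figures:

```python
import numpy as np

sales = np.array([120, 130, 125, 128, 135, 127, 400])   # 400 looks suspicious

q1, q3 = np.percentile(sales, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # standard 1.5*IQR fences

outliers = sales[(sales < lower) | (sales > upper)]
print("bounds:", lower, upper)
print("outliers:", outliers)   # flags the 400 value
```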

✅ Unit V: Machine Learning in BI – Deep Oral QA


Q21. What is the use of logistic regression in business intelligence?
A: Predict binary outcomes (e.g., will a customer buy or not). Useful in churn
prediction, lead scoring.
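
A hedged churn-style sketch with scikit-learn; the tenure/spend features and labels are invented, and a real model would need proper train/test evaluation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: tenure in months, monthly spend; label 1 = churned, 0 = retained.
X = np.array([[1, 90], [2, 80], [3, 85], [10, 40], [12, 35], [15, 30]])
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

new_customer = np.array([[2, 88]])
print(model.predict(new_customer))         # predicted class (churn or not)
print(model.predict_proba(new_customer))   # class probabilities for scoring
```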

Q22. Compare hierarchical and partition clustering methods.


A:

Partition (e.g., K-means): Divides into k clusters based on centroids.



Hierarchical: Builds tree of clusters (dendrogram) – top-down or bottom-up.
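
A small K-means sketch of the partition approach above, grouping customers by annual income and spending score; the numbers and the choice of k=2 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([[15, 80], [16, 75], [17, 85],    # low income, high spend
                      [70, 20], [75, 15], [80, 25]])   # high income, low spend

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # centroid of each cluster
```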

Q23. How is the performance of classification models evaluated?


A:

Accuracy

Precision, Recall

F1 Score

ROC Curve

Q24. Explain the Apriori algorithm with a business use-case.


A: Apriori finds frequent itemsets (e.g., {milk, bread} often bought together) and
generates association rules for market basket analysis.
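
A toy plain-Python sketch of the support-counting step at the heart of Apriori (only frequent pairs, not the full level-wise candidate generation); the baskets and support threshold are invented:

```python
from itertools import combinations
from collections import Counter

baskets = [{"milk", "bread", "butter"},
           {"milk", "bread"},
           {"bread", "butter"},
           {"milk", "bread", "eggs"}]
min_support = 0.5   # an itemset must appear in at least 50% of baskets

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: count / len(baskets)
                  for pair, count in pair_counts.items()
                  if count / len(baskets) >= min_support}
print(frequent_pairs)   # e.g. ('bread', 'milk') with support 0.75
```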

Q25. What is overfitting in ML, and how can it affect BI decisions?


A: Overfitting models perform well on training data but poorly on real-world data,
leading to unreliable decisions.

✅ Unit VI: BI Applications – Deep Oral QA


Q26. How do ERP systems integrate with BI platforms?
A: ERPs manage transactions (HR, Finance), and BI tools analyze ERP data for trends,
performance metrics, and forecasts.

Q27. Discuss the role of BI in supply chain and logistics.


A: BI helps in demand forecasting, route optimization, inventory control, reducing
delivery times, and cost savings.

Q28. What is the role of tools like RapidMiner and KNIME in BI?
A: They are drag-and-drop platforms for data mining, predictive analytics, and model
deployment without much coding.



Q29. How does BI support CRM systems?
A: By analyzing customer behavior, preferences, and feedback to improve sales,
service, and retention.

Q30. Give real-life examples of BI in financial services.


A: Fraud detection, credit scoring, risk modeling, portfolio analysis, and regulatory
compliance.

Would you like this content as a PDF for printing, a PowerPoint presentation, or
quiz flashcards for revision?
