Mtech-Syllabus-Data Science - Sem2
Curriculum Handbook for M.Tech – Data Science
SEMESTER II
Semester: II Year: 2019-2020
Pre-requisites:
Software systems, programming, data structures and algorithms.
Good programming skills (preferably in Java), Operating Systems,
Distributed Computing Systems,
Introduction to Cloud Computing,
Design and Analysis of Algorithms.
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Describe the basic concepts and technologies of distributed systems. L2
CO2 Illustrate the requirements and challenges of designing, building and managing distributed systems. L2
CO3 Analyze different scalable distributed system designs. L4
CO4 Analyze use cases for managing distributed file systems. L4
CO5 Implement and analyze scalable distributed databases. L3
Teaching Methodology:
Blackboard teaching and PPT
Assignment
Assessment Methods
Open Book Test for 10 Marks.
Assignment evaluation for 10 Marks.
Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
A final examination of 100 marks will be conducted and evaluated for 50 marks.
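As a worked illustration of the scheme above, the following minimal sketch combines the components into a score out of 100. It assumes that the continuous assessment is the open book test (10) plus the assignment (10) plus the average of the best two internals (30), and that the 100-mark final examination is halved. The function and parameter names are hypothetical and not part of the handbook.

```python
def course_total(open_book, assignment, internals, final_exam):
    """Hypothetical helper: combine the assessment components listed above."""
    if len(internals) != 3:
        raise ValueError("expected three internal test scores")
    # Keep the two highest internal scores and average them (out of 30).
    best_two = sorted(internals, reverse=True)[:2]
    internal_avg = sum(best_two) / 2
    # Assumed continuous assessment total: 10 + 10 + 30 = 50 marks.
    cie = open_book + assignment + internal_avg
    # Final examination is written for 100 marks and evaluated for 50.
    see = final_exam / 2
    return cie + see
```

For example, internals of 20, 25 and 30 give an internal average of 27.5, since the lowest score is dropped.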
COURSE CONTENT
Unit – I 8 Hrs
Distributed System Models and Enabling Technologies: Scalable Computing over the Internet, The Age of Internet
Computing, Scalable Computing Trends and New Paradigms, The Internet of Things and Cyber-Physical Systems,
Technologies for Network-Based Systems, Multicore CPUs and Multithreading Technologies, GPU Computing to
Exascale and Beyond, Memory, Storage, and Wide-Area Networking, Virtual Machines and Virtualization Middleware,
Data Centre Virtualization for Cloud Computing, System Models for Distributed and Cloud Computing, Clusters of
Cooperative Computers, Grid Computing Infrastructures, Peer-to-Peer Network Families, Cloud Computing over the
Internet, Software Environments for Distributed Systems and Clouds, Service-Oriented Architecture (SOA), Trends
toward Distributed Operating Systems, Parallel and Distributed Programming Models, Performance, Security, and
Energy Efficiency, Performance Metrics and Scalability Analysis, Fault Tolerance and System Availability, Network
Threats and Data Integrity, Energy Efficiency in Distributed Computing.
Unit – II 8 Hrs
Virtual Machines and Virtualization of Clusters and Data Centres: Implementation Levels of Virtualization,
Levels of Virtualization Implementation, Design Requirements and Providers, Virtualization Support at the OS Level,
Middleware Support for Virtualization, Virtualization Structures/Tools and Mechanisms, Hypervisor and Xen
Architecture, Binary Translation with Full Virtualization, Para-Virtualization with Compiler Support, Virtualization
of CPU, Memory, and I/O Devices, Hardware Support for Virtualization, CPU Virtualization, Memory Virtualization,
I/O Virtualization, Virtualization in Multi-Core Processors, Virtual Clusters and Resource Management, Physical
versus Virtual Clusters, Live VM Migration Steps and Performance Effects, Migration of Memory, Files, and Network
Resources, Dynamic Deployment of Virtual Clusters.
Containers and Serverless: Kernel namespaces and cgroups. Use cases: Docker, Kubernetes.
Unit – III 8 Hrs
Cloud Platform Architecture over Virtualized Data Centers: Cloud Computing and Service Models, Public, Private,
and Hybrid Clouds, Cloud Ecosystem and Enabling Technologies, Infrastructure-as-a-Service (IaaS),
Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Public Cloud Platforms: GAE, AWS, and Azure.
Data Science in the cloud: AWS Machine Learning, Azure Machine Learning, IBM Bluemix, Sense.io, Domino DataLabs, DataJoy,
PythonAnywhere.
Unit – IV 7 Hrs
MapReduce and the New Software Stack: Distributed File Systems, MapReduce, Algorithms Using MapReduce,
Extensions to MapReduce, The Communication Cost Model, Complexity Theory for MapReduce.
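To make the MapReduce topic concrete, here is a minimal single-machine sketch of the map, shuffle and reduce phases applied to word counting. It is plain Python rather than Hadoop or any real framework, and all function names are hypothetical; a real system would distribute each phase across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` groups the pairs by word before summing, which is the step a distributed implementation performs over the network.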
Unit – V 8 Hrs
Analysing Big Data: The Challenges of Data Science, Introducing Apache Spark. Introduction to Data Analysis with
Scala and Spark: Scala for Data Scientists, The Spark Programming Model, Record Linkage, Getting Started: The Spark
Shell and SparkContext, Bringing Data from the Cluster to the Client, Shipping Code from the Client to the
Cluster, Structuring Data with Tuples and Case Classes, Aggregations, Creating Histograms, Summary Statistics for
Continuous Variables, Creating Reusable Code for Computing Summary Statistics, Simple Variable Selection and
Scoring.
Text Books:
1. Kai Hwang, G. C. Fox, J. J. Dongarra, “Distributed and Cloud Computing”, Morgan Kaufmann Publishers.
2. Jure Leskovec, Anand Rajaraman, Jeff Ullman, “Mining of Massive Datasets”, 2nd Edition, Cambridge
University Press. http://www.mmds.org/
3. Sandy Ryza, Uri Laserson, Josh Wills, Sean Owen, “Advanced Analytics with Spark”, 2nd Edition,
O'Reilly Media, ISBN: 9781491972946.
Semester: II Year: 2019-2020
Department: Information Science and Engineering Course Type: Core
Course Title: Neural Network & Deep Learning Course Code: 19DS22
L-T-P: 3-0-2 Credits: 04
Total Contact Hours: 39 hrs Duration of SEE: 3 hrs
SEE Marks: 50 CIE Marks: 50
Pre-requisites:
Machine learning-I, Data mining
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Understand the basic concepts of artificial neural networks. L2
CO2 Model neurons and neural networks, and analyze ANN learning and its applications. L4
CO3 Develop different single-layer and multi-layer perceptron learning algorithms. L3
CO4 Design another class of layered networks using deep learning principles. L3
Teaching Methodology:
Blackboard teaching and PPT
Executable Codes/ Live Demonstration
Programming Assignment
Assessment Methods
Online certification from Coursera/edX, etc. for 10 marks
Programming assignments evaluated using rubrics for 10 marks
Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
A final examination of 100 marks will be conducted and evaluated for 50 marks.
Unit – I 8 Hrs
Introduction to Neural Networks: Neural Network, Human Brain, Models of Neuron, Neural networks viewed as
directed graphs, Biological Neural Network, Artificial neuron, Artificial Neural Network architecture, ANN learning,
analysis and applications, Historical notes.
Learning Processes: Introduction, Error correction learning, Memory-based learning, Hebbian learning, Competitive
learning, Boltzmann learning, credit assignment problem, learning with and without teacher, learning tasks, Memory
and Adaptation.
Unit – II 8 Hrs
Single-layer Perceptron: Introduction, Pattern Recognition, Linear classifier, Simple perceptron, Perceptron learning
algorithm, Modified Perceptron learning algorithm, Adaptive linear combiner, Continuous perceptron, Learning in
continuous perceptron, Limitations of the Perceptron.
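As an illustration of the perceptron learning algorithm named in this unit, the sketch below trains a single threshold unit with the classic error-correction rule. It is a minimal example under stated assumptions: binary 0/1 labels, a step activation, and hypothetical function names and default parameters that do not come from the prescribed textbooks.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Train a single perceptron; returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else 0
            delta = y - pred  # 0 when correct, +1 or -1 when wrong
            if delta != 0:
                # Error-correction rule: move the boundary toward the sample.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w, b

def predict(w, b, x):
    """Classify one input vector with the trained unit."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Trained on the four input pairs of logical AND, the loop settles on a separating line within a few passes. For XOR no such line exists, so the error count never reaches zero, which is the limitation this unit refers to.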
Unit – IV 7 Hrs
Introduction to Deep learning: Neuro architectures as necessary building blocks for the DL techniques, Deep
Learning & Neocognitron, Deep Convolutional Neural Networks, Recurrent Neural Networks (RNN)
Unit – V 8 Hrs
Feature extraction, Deep Belief Networks, Restricted Boltzmann Machines, Autoencoders, Training of Deep Neural
Networks, Applications and examples (Google, image/speech recognition), Deep Learning Tools: TensorFlow, Caffe,
Theano, Torch.
Text Books:
1. Neural Networks: A Comprehensive Foundation, Simon Haykin, 2nd Edition, 1999, Pearson
Prentice Hall, ISBN-13: 978-0-13-147139-9.
2. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.
Reference Books:
1. Introduction to Artificial Neural Systems, Jacek M. Zurada, 1992, West Publishing Company,
ISBN: 9780534954604
2. Learning and Soft Computing, Vojislav Kecman, 1st Edition, 2004, Pearson Education, ISBN: 0-262-11255-8
3. Neural Network Design, M. T. Hagan, H. B. Demuth, M. Beale, 2002, Thomson Learning, ISBN-10:
0-9717321-1-6 / ISBN-13: 978-0-9717321-1-7
Online Materials
Prerequisite:
Machine learning-I
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Understand key concepts, tools and approaches for pattern recognition on complex data sets. L2
CO2 Understand kernel methods for handling high-dimensional and non-linear patterns. L2
CO3 Apply state-of-the-art algorithms such as Support Vector Machines and Bayesian networks. L3
CO4 Solve real-world machine learning tasks from data to inference. L2
CO5 Demonstrate the theoretical concepts and the motivations behind different learning frameworks. L3
Teaching Methodology:
Assessment Methods:
CO1 2 2 2
CO2 2 3 2
CO3 2 3 3 2 2
CO4 3 2 3 3 2
CO5 3 1 2 2 1 2
19DS23 2 2 3 2 1 2
COURSE CONTENT
Unit – I 8 Hrs
Instance based learning and learning set of rules: K- Nearest Neighbor Learning – Locally Weighted Regression –
Radial Basis Functions – Case Based Reasoning – Sequential Covering Algorithms – Learning Rule Sets – Learning
First Order Rules – Learning Sets of First Order Rules – Induction as Inverted Deduction – Inverting Resolution
(TextBook1)
Unit – II 8 Hrs
Support Vector machine: Maximum margin hyperplanes: Rationale for Maximum Margin, Linear SVM: Separable
Case: Linear Decision Boundary, Margin of a Linear Classifier, Learning a Linear SVM model, Linear SVM: Non-
separable Case, Nonlinear SVM: Attribute Transformation, Learning a Nonlinear SVM, Kernel Trick, Characteristics
of SVM. (Chapter 5.5 from TextBook-2)
Unit – III 8 Hrs
Transfer Learning: Introduction, Transfer in inductive learning: Inductive transfer, Bayesian transfer, Hierarchical
transfer, Transfer with Missing Data or Class Labels.
Transfer in reinforcement learning: Starting-Point Methods, Imitation Methods, Hierarchical Methods, Alteration
Methods, New RL Algorithms
Avoiding negative transfer: Rejecting Bad Information, Choosing a Source Task, Modelling Task Similarity;
Automatically mapping tasks: Equalizing Task Representations, Trying Multiple Mappings, Mapping by Analogy; The
future of transfer learning
Unit – IV 8 Hrs
Analytical learning: Introduction, Learning with Perfect domain theories, Remarks on Explanation based learning;
Explanation based learning of search control knowledge
Combining Inductive and Analytical learning: Motivation, Inductive-Analytical approaches to learning, Using prior
knowledge to initialize the hypothesis, Using prior knowledge to alter the search objective, Using prior knowledge to
augment search operators. (TextBook1)
Unit – V 7 Hrs
Reinforcement Learning: Introduction, The Learning Task, Q Learning, Nondeterministic Rewards and Actions,
Temporal Difference learning, Generalization from Examples, Relationship to Dynamic programming. (TextBook1)
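The Q learning topic above can be illustrated with a single tabular update step. This is a minimal sketch assuming a dictionary-based Q table keyed by (state, action), a learning rate `alpha` and a discount factor `gamma`; the function name, the default values and the state and action labels in the test are hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new estimate."""
    # Best estimated value obtainable from the next state (0.0 if unseen).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Blend the old estimate with the newly observed one-step return.
    q[(state, action)] = (1 - alpha) * old + alpha * (reward + gamma * best_next)
    return q[(state, action)]
```

With `alpha` set to 1 the rule reduces to the deterministic form, where the estimate is simply replaced by the reward plus the discounted best next value.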
Text Books:
1. Tom M. Mitchell, “Machine Learning”, McGraw-Hill Education (INDIAN EDITION), 2013.
2. Amanda Casari, Alice Zheng, “Feature Engineering for Machine Learning”, O’Reilly, 2018.
Reference Books:
1. Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to
Build Intelligent Systems”, O’Reilly.
Online Materials
1. Machine Learning by Stanford University-Coursera
2. Machine Learning with TensorFlow on Google Cloud Platform Specialization-Coursera
3. Become a Machine Learning Engineer – Udacity.
Semester: II Year: 2019-2020
Pre-requisites:
Teaching Methodology:
Assessment Methods:
Authentication vs. Authorization, Authentication Methods – Password authentication, Public Key Cryptography, Biometric
authentication, Out of band, Authentication Protocols – SSL, Password Authentication Protocol (PAP), Kerberos, Email
authentication – PGP, Database authentication, Message authentication; secure hash functions and Authorization
Approaches to HMAC; public-key cryptography principles; public-key cryptography algorithms, digital signatures, key
management. Kerberos, X.509 directory authentication service. Authorization Definition, Multilayer authorization,
Text books:
1. Cryptography and Network Security: Principles and Practice, William Stallings, 6th Edition,
Pearson Education
2. The Algorithmic Foundations of Differential Privacy, Cynthia Dwork and Aaron Roth. DOI:
10.1561/0400000042.
Reference books:
1. https://s3.amazonaws.com/assets.datacamp.com/production/course_6412/slides/chapter1.pdf
2. Privacy-Preserving Data Mining: Models and Algorithms, Charu C. Aggarwal, Philip S. Yu, Springer
3. Principles of Information Security, Information Security Professional - Michael E. Whitman and
Herbert J. Mattord, 4th Edition, Thomson.
Semester: II Year: 2019-2020
Prerequisite:
Database Management Systems
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Describe Big Data and its importance with its applications. L2
CO2 Differentiate various big data technologies like Hadoop MapReduce, Pig, Hive, HBase and NoSQL. L4
CO3 Apply tools and techniques to analyze Big Data. L3
CO4 Design a solution for a given problem using suitable Big Data techniques. L4
Teaching Methodology:
Black board teaching/ Power Point presentations
Executable Codes/ Live Demonstration
Programming Assignment
Assessment Methods:
Online certification for 10 marks
Programming assignments evaluated using rubrics for 10 marks
Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
A final examination of 100 marks will be conducted and evaluated for 50 marks.
Unit – I 10 Hrs
INTRODUCTION TO BIG DATA: Big Data and its Importance – Four V’s of Big Data – Drivers for Big Data –
Introduction to Big Data Analytics – Big Data Analytics applications
BIG DATA TECHNOLOGIES: Hadoop’s Parallel World – Data discovery – Open source technology for Big Data
Analytics – Cloud and Big Data – Predictive Analytics – Mobile Business Intelligence and Big Data – Crowd Sourcing
Analytics – Inter- and Trans-Firewall Analytics – Information Management.
Unit – II 10 Hrs
PROCESSING BIG DATA: Integrating disparate data stores - Mapping data to the programming framework -
Connecting and extracting data from storage - Transforming data for processing - Subdividing data in preparation for
Hadoop Map Reduce.
Unit – III 10 Hrs
HADOOP MAPREDUCE: Employing Hadoop Map Reduce - Creating the components of Hadoop Map Reduce jobs
- Distributing data processing across server farms - Executing Hadoop Map Reduce jobs - Monitoring the progress of
job flows - The Building Blocks of Hadoop Map Reduce - Distinguishing Hadoop daemons - Investigating the Hadoop
Distributed File System - Selecting appropriate execution modes: local, pseudo-distributed, fully distributed.
Unit – IV 12 Hrs
BIG DATA TOOLS AND TECHNIQUES: Installing and Running Pig – Comparison with Databases – Pig Latin –
User-Defined Functions – Data Processing Operators – Installing and Running Hive – HiveQL – Tables – Querying
Data – User-Defined Functions – Oracle Big Data
Unit – V 10 Hrs
ADVANCED ANALYTICS PLATFORM: Real-Time Architecture – Orchestration and Synthesis Using Analytics
Engines – Discovery using Data at Rest – Implementation of Big Data Analytics – Big Data Convergence – Analytics
Business Maturity Model.
Text Books:
1. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business Intelligence and
Analytic Trends for Today’s Business”, 1st Edition, Wiley CIO Series, 2013.
2. Arvind Sathi, “Big Data Analytics: Disruptive Technologies for Changing the Game”, 1st Edition, IBM
Corporation, 2012.
3. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with
Advanced Analytics”, 1st Edition, Wiley and SAS Business Series, 2012.
4. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly, 2012.
Additional Reference Book
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN:
9788126551071, 2015.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.
4. Jay Liebowitz, “Big Data and Business Analytics”, CRC Press, 2013.
Online Materials
1. http://www.bigdatauniversity.com/
2. https://www.coursera.org/courses?query=big%20data%20analytics
Semester: II Year: 2019-2020
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Describe the significance of a global platform for data retrieval and processing among different business cultures of the world. L2
CO2 Develop domain knowledge of various technologies and their application to facilitate managerial decision making/MIS. L2
CO3 Enable communication for data-driven decision making. L3
CO4 Implement cross-functional collaboration to enhance efficiency and productivity. L3
Teaching Methodology:
ICT enabled Classroom teaching
Case study
Practical / live assignment
Interactive classroom discussions
Assessment Methods
Group Discussion for 10 Marks.
Assignment evaluation for 10 Marks.
Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
A final examination of 100 marks will be conducted and evaluated for 50 marks.
Unit – I 12 Hrs
Introduction to Business Analytics: Why Analytics, Business Analytics: the Science of data driven decision making,
Descriptive Analysis, Predictive Analytics, Prescriptive Analytics, Big Data Analytics, Web and Social media
Analytics, Machine Learning Algorithms, Framework for data driven decision making, Analytics Capability Building,
Roadmap, Challenges, Types (Descriptive, Predictive and Prescriptive), Business Intelligence versus Business
Analytics, Transaction Processing v/s Analytic Processing, OLTP v/s OLAP, OLAP Operations, Data models for
OLTP
Unit – II 10 Hrs
Descriptive Analytics: Introduction, Data Types and Scales, Types of Data Measurement Scale, Population and
Sample.
Data Warehouse: Definition, characteristics, framework, Data lake, Business Reporting, Visual Analytics: Definition,
concepts, Different types of charts and graphs, Emergence of data visualization and visual analytics
Unit – III 10 Hrs
Data Mining: Concepts and applications, Data mining process, Text & Web Analytics, Text analytics and text mining
overview, Text mining applications, Web mining overview, Sentiment analysis overview, Supply Chain and
Operations Analytics, Customer Analytics, Project Management, Decision Analysis, Process Analytics, Market
Intelligence
Unit – IV 12 Hrs
Social Network Analysis: Overview of SNA, history and resources, Mathematical foundations, matrices and graph
theory, Whole versus personal networks, one-mode versus two-mode network data, Collecting network data, Informant
accuracy, Network visualizations, Cohesive subgroups, bottom-up and top-down approaches, Block models, Egocentric
SNA, design and applications
Unit – V 8 Hrs
Business Performance Management: Business performance management cycle, KPI, Dashboard, Analytics in
Business Support Functions, Sales & Marketing Analytics, HR Analytics, Financial Analytics, Production and
operations analytics, Analytics in Industries: Telecom, Retail, Healthcare, Financial Services
Text Books:
1. U. Dinesh Kumar, “Business Analytics – The Science of Data Driven Decision Making”, Wiley 2017.
2. Ramesh Sharda, Dursun Delen, Efraim Turban, “Business Intelligence: A Managerial Perspective on
Analytics”, Pearson, 3e.
3. Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. A classic,
essential textbook on SNA.
Reference Books:
1. Jesper Thorlund & Gert H.N. Laursen, “Business Analytics for Managers: Taking Business Intelligence
Beyond Reporting”, Wiley
2. Sahil Raj, “Business Analytics”, Cengage
3. James R. Evans, “Business Analytics”, Pearson
4. https://www.bebr.ufl.edu/sites/default/files/ANG5420_Syllabus.pdf
Prerequisite:
Fundamentals of Networks, Data Mining, Graph Theory,
Advanced Algorithms
Course Outcomes:
Students will be able to
CO’s Course Learning Outcomes BL
CO1 Understand the basics of Social Network Models and analysis. L2
CO2 Analyse social network models for community detection. L4
CO3 Implement link prediction and event detection. L3
CO4 Analyse social influence and contributing factors. L4
Teaching Methodology:
Black board teaching
Power Point presentations
Assessment Methods:
Rubrics for evaluation of case study 20 Marks
Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
A final examination of 100 marks will be conducted and evaluated for 50 marks.
Unit – I 10 Hrs
Social Networks : An Introduction; Types of Networks: General Random Networks, Small World Networks, Scale-
Free Networks; Examples of Information Networks; Network Centrality Measures; Strong and Weak ties; Homophily
Walks: Random walk-based proximity measures, Other graph-based proximity measures. Clustering with
random-walk based measures
Unit – II 12 Hrs
Link Prediction: Feature based Link Prediction, Bayesian Probabilistic Models, Probabilistic Relational Models,
Linear Algebraic Methods: Network Evolution based Probabilistic Model, Hierarchical Probabilistic Model, Relational
Bayesian Network. Relational Markov Network.
Unit – IV 10 Hrs
Event Detection: Classification of Text Streams, Event Detection and Tracking: Bag of Words, Temporal, location,
ontology based algorithms. Evolution Analysis in Text Streams, Sentiment analysis.
Unit – V 8 Hrs
Social Influence Analysis: Influence measures, Social Similarity - Measuring Influence, Influencing actions and
interactions. Influence maximization.
Text Books:
1. David Easley, Jon Kleinberg: Networks, Crowds and Markets: Reasoning about a highly connected
world, Cambridge Univ Press 2010
2. S.Wasserman, K.Faust: Social Network Analysis: Methods and Applications, Cambridge Univ Press,
1994
Semester: II Year: 2019-2020
Prerequisites:
Fundamentals of Language Processing.
Course outcomes:
Students will be able to:
CO’s Course Learning Outcomes BL
CO1 Describe the basics of Natural Language Processing. L2
CO2 Analyze syntactic and semantic parsing techniques. L2
CO3 Implement a rule-based system to tackle morphology/syntax of a Language L3
CO4 Describe the various issues of Natural Language Processing. L2
Teaching Methodology:
• Blackboard teaching
• PowerPoint presentations
Assessment Methods:
• Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.
• Rubrics for evaluation of case study 20 Marks
• A final examination of 100 marks will be conducted and evaluated for 50 marks.
PO 1 PO 2 PO 3 PO 4 PO 5 PO 6
CO1 3 2
CO2 3 2 1
CO3 3 2 2 1 2
CO4 3 3 2 2
19DSE254 3 2 2 2 2 2
COURSE CONTENT
Unit – I 11 Hrs
Classical Approaches to Natural Language Processing: Context, The Classical Toolkit: Text Preprocessing, Lexical
Analysis, Syntactic Parsing, Semantic Analysis, Natural Language Generation.
Text Preprocessing: Introduction, Challenges of Text Preprocessing, Character-Set Dependence, Language
Dependence, Corpus Dependence, Application Dependence, Tokenization, Tokenization in Space-Delimited
Languages, Tokenization in Unsegmented Languages, Sentence Segmentation, Sentence Boundary Punctuation,
The Importance of Context, Traditional Rule-Based Approaches.
Lexical Analysis: Introduction, Finite State Morphonology, Closing Remarks on Finite State Morphonology,
Finite State Morphology, Disjunctive Affixes, Inflectional Classes, and Exceptionality, Further Remarks on
Finite State Lexical Analysis, “Difficult” Morphology and Lexical Analysis, Isomorphism Problems,
Contiguity Problems, Paradigm-Based Lexical Analysis, Paradigmatic Relations and Generalization.
Unit – II 12 Hrs
Syntactic Parsing: Introduction, Background, Context-Free Grammars, Example Grammar, Syntax Trees, Other
Grammar Formalisms, Basic Concepts in Parsing, Parsing as Deduction, Deduction Systems, The CKY Algorithm,
Chart Parsing, Bottom-Up Left-Corner Parsing, Top-Down Earley-Style Parsing, Example Session.
Semantic Analysis: Basic Concepts and Issues in Natural Language Semantics, Theories and Approaches to Semantic
Representation, Logical Approaches, Discourse Representation Theory, Pustejovsky’s Generative Lexicon, Natural
Semantic Metalanguage, Object-Oriented Semantics, Relational Issues in Lexical Semantics, Sense Relations and
Ontologies, Roles, Fine-Grained Lexical-Semantic Analysis: Three Case Studies, Emotional Meanings: “Sadness”
and “Worry” in English, Ethnogeographical Categories: “Rivers” and “Creeks”, Functional Macro-Categories,
Prospectus and “Hard Problems”.
Unit – III 08 Hrs
Corpus Creation: Introduction, Corpus Size, Balance, Representativeness, and Sampling, Data Capture and
Copyright, Corpus Markup and Annotation, Multilingual Corpora, Multimodal Corpora.
Part-of-Speech Tagging: Introduction, Parts of Speech, Part-of-Speech Problem, The General Framework, Part-of-Speech Tagging
Approaches, Rule-Based Approaches, Markov Model Approaches, Maximum Entropy Approaches, Other Statistical
and Machine Learning Approaches, Methods and Relevant Work, Combining Taggers.
Unit – V 8 Hrs
Information Retrieval: Introduction, Indexing, Indexing Dimensions, Indexing Process, IR Models: Classical
Boolean Model, Vector-Space Models, Probabilistic Models, Query Expansion and Relevance Feedback, Advanced
Models, Evaluation and Failure Analysis, Evaluation Campaigns, Evaluation Measures, Failure Analysis, Natural
Language Processing and Information Retrieval, Morphology, Orthographic Variation and Spelling Errors, Syntax,
Semantics, Related Applications.
Text Analytics: text analytics systems, Named entity recognition, Disambiguation, Document clustering: identification
of sets of similar text documents, Term frequency-inverse document frequency (TF-IDF), Analysis and Evaluation of
Current Graph-Based Text Mining Researches, Coreference: Relationship, Case study on Biomedical text mining.
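The TF-IDF weighting listed above can be sketched in a few lines. This minimal example assumes whitespace tokenization, term frequency normalized by document length, and the unsmoothed inverse document frequency log(N / df); library implementations often use smoothed variants, and the function name is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per input document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document gets weight zero under this variant, which is why common words contribute little when documents are compared or clustered.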
Text Books:
1. Nitin Indurkhya, Fred J. Damerau, “Handbook of Natural Language Processing”, Chapman & Hall/CRC
Publications, 2nd Edition, 2010.
Reference Books:
1. Tanveer Siddiqui, U. S. Tiwary, “Natural Language Processing & Information Retrieval”, Oxford
University Press, 2008.
2. Anne Kao & Stephen R. Poteet, “Natural Language Processing and Text Mining”, Springer-Verlag, 2007.