B. SC - Data Science
B. Sc
DATA SCIENCE
TABLE OF CONTENTS
Note: The BOS is to provide the final soft copy in PDF and Word formats and four hard copies
in bound form to the office of the Dean, Academic Affairs.
Members present:
Dr. M. Kamala Kumari - Chairman, Dept of CSE, AKNU, RJY
Dr. P. Venkateswara Rao - Member, Dept of CSE, AKNU, RJY
Mrs. A. M. Sirisha - Coordinator, Dept of CSE, AKNU, RJY
Mr. M. Simhadri - Member, Lecturer, Aditya Degree College, Kakinada
Resolutions:
1. Resolved the revised common program structure and the revised/updated course-wise syllabi
(in the prescribed format) as per the guidelines issued by APSCHE.
2. Resolved the regulations on scheme of examination and marks/grading system of the
University UG programs.
3. Prepared the Model question papers in prescribed format.
4. Prepared the list of equipment/software requirement for each lab/practical
5. Given the eligibility of student for joining the course
6. Given the eligibility of faculty for teaching the course
7. Given the list of paper-setters/paper evaluators with phone, email-id in the prescribed
format
Note 1: For Semester–V, for the domain subject DATA SCIENCE, any one of the three pairs of
SECs shall be chosen as courses 6 and 7, i.e., 6A & 7A or 6B & 7B or 6C & 7C. The
pair shall not be broken (ABC allotment is random, not on any priority basis).
Note 2: One of the main objectives of Skill Enhancement Courses (SEC) is to inculcate field
skills related to the domain subject in students. The syllabus of SEC will be partially
skill oriented. Hence, teachers shall also impart practical training to students on the
field skills embedded in the syllabus citing related real field situations.
Suitable levels of positions for these graduates in industry or government organizations
(e.g., technical assistants, scientists, school teachers) are to be clearly defined, with reliable
justification.
Data Science is a fast-growing interdisciplinary field, focusing on the analysis of data to extract
knowledge and insight. This course will introduce students to the collection, preparation,
analysis, modelling and visualization of data, covering both conceptual and practical issues.
Examples and case studies from diverse fields will be presented, and hands-on use of statistical
and data manipulation software will be included.
UNIT I:
Defining Data Science and Big data, Benefits and Uses, facets of Data, Data Science Process.
History and Overview of R, Getting Started with R, R Nuts and Bolts
UNIT II:
The Data Science Process: Overview of the Data Science Process-Setting the research goal,
Retrieving Data, Data Preparation, Exploration, Modeling, data Presentation and Automation.
Getting Data In and Out of R, Using the readr Package, Interfaces to the Outside World.
UNIT III:
Machine Learning: Understanding why data scientists use machine learning-What is machine
learning and why we should care about, Applications of machine learning in data science,
Where it is used in data science, The modeling process, Types of Machine Learning-Supervised
and Unsupervised.
UNIT IV:
Handling large Data on a Single Computer: The problems we face when handling large data,
General Techniques for handling large volumes of data, Generating programming tips for
dealing with large datasets. Case study: Predicting malicious URLs (this can be implemented in
R).
UNIT V:
Subsetting R Objects, Vectorized Operations, Managing Data Frames with the dplyr package, Control
Structures, Functions, Scoping Rules of R, Coding Standards for R, Loop Functions, Debugging,
Simulation.
TEXT BOOKS:
1. Davy Cielen, Arno D. B. Meysman, Mohamed Ali, “Introducing Data Science”,
Manning Publications, 2016.
2. Roger D. Peng, “R Programming for Data Science”, Leanpub, 2015.
REFERENCE BOOKS:
1. Nina Zumel, John Mount, “Practical Data Science with R”, Manning Publications, 2014.
2. Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta, “Practical
Data Science Cookbook”, Packt Publishing Ltd., 2014.
B. Sc Semester: I Credits:1
Course: 1 Introduction To Data Science and R Programming Lab Hrs/Wk: 2
B. Sc Semester: II Credits:4
Course: 2 DATA MINING CONCEPTS AND TECHNIQUES Hrs/Wk: 4
Aim and objectives of Course:
To understand Data mining techniques and algorithms.
Comprehend the data mining environments and application.
Learning outcomes of Course:
Students who complete this course will be able to
Compare various conceptions of data mining as evidenced in both research and
application.
Evaluate mathematical methods underlying the effective application of data mining.
Should be able to apply the type of techniques based on the problems considered
UNIT I:
An idea on Data Warehouse, Data mining-KDD versus data mining, Stages of the Data Mining
Process-Task primitives, Data Mining Techniques – Data mining knowledge representation.
UNIT II
Data mining query languages- Integration of Data Mining System with a Data Warehouse-
Issues, Data pre-processing – Data Cleaning, Data transformation – Feature selection –
Dimensionality reduction
UNIT III
Concept Description: Characterization and Comparison: What is Concept Description, Data
Generalization by Attribute-Oriented Induction (AOI), AOI for Data Characterization, Efficient
Implementation of AOI.
Mining Frequent Patterns, Associations and Correlations: Basic Concepts, Frequent Itemset
Mining Methods: Apriori method, generating Association Rules, Improving the Efficiency of
Apriori, Pattern-Growth Approach for mining Frequent Itemsets.
UNIT-IV
Classification Basic Concepts: Basic Concepts, Decision Tree Induction: Decision
Tree Induction Algorithm, Attribute Selection Measures, Tree Pruning. Bayes Classification
Methods.
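The attribute selection measures named in this unit can be illustrated with information gain, the entropy-based measure commonly used in decision tree induction. The following is a minimal sketch in Python; the toy rows, the attribute names and the helper functions `entropy` and `information_gain` are hypothetical and are not part of the prescribed syllabus.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Hypothetical toy data: each row is a dict of attribute values.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

In this toy data "outlook" separates the classes perfectly while "windy" carries no information, so a tree builder using this measure would split on "outlook" first.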
UNIT-V
Classification by Back Propagation: Multi-Layer Feed-Forward Neural Network. Support
Vector Machines: Cases when the data are linearly separable and linearly inseparable.
Cluster Analysis: Cluster Analysis, Partitioning Methods, Hierarchical methods, Density-based
methods-DBSCAN.
TEXT BOOKS:
1. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd
Edition, Morgan Kaufmann Publishers, 2011.
2. Adelchi Azzalini, Bruno Scarpa, “Data Analysis and Data Mining”, 2nd Edition, Oxford
University Press Inc., 2012.
REFERENCE BOOKS:
1. Alex Berson and Stephen J. Smith, “Data Warehousing, Data Mining & OLAP”, 10th
Edition, Tata McGraw Hill Edition, 2007.
2. G.K. Gupta, “Introduction to Data Mining with Case Studies”, 1st Edition, Eastern
Economy Edition, PHI, 2006.
ADIKAVI NANNAYA UNIVERSITY:: RAJAHMAHENDRAVARAM
B.Sc Data Science Syllabus (w.e.f :20-21 A.Y)
Student Activities:
1. Students should be able to implement Data Mining algorithms provided the relevant
data
2. Given the data, students can visualize all statistical measures
3. Differentiate the types of mining problems and identify what type of algorithms are to
be implemented.
Continuous assessment:
Let the students be tested in the following questions from each unit
1. What are Data Mining and KDD? Where does Data Mining fit in the KDD process?
2. Describe all Preprocessing methods
3. Explain Data Description and AOI Algorithm
4. Explain Classification and Write any Decision tree induction algorithm
5. Explain the concept of clustering and write any algorithm to form clusters.
B. Sc Semester: II Credits:1
Course: 2 DATA MINING CONCEPTS AND TECHNIQUES LAB Hrs/Wk: 2
1. Get and clean data using swirl exercises. (Install and load the ‘swirl’ package, then install
that topic from swirl.)
2. Visualize all statistical measures (Mean, Mode, Median, Range, Inter Quartile Range, etc.)
using Histograms, Boxplots and Scatter Plots.
3. Create a data frame with the following structure.
4. Create a data frame with 10 observations and 3 variables and add new rows and columns to it
using ‘rbind’ and ‘cbind’ function.
5. Create a function to discretize a numeric variable into 3 quantiles and label them as low,
medium, and high. Apply it on each attribute of any dataset to create a new data frame,
‘discrete’, with categorical variables and the class label.
6. Create a simple scatter plot using any dataset using ‘dplyr’ library. Use the same data to
indicate distribution densities using box whiskers.
7. Write R Programs to implement k-means clustering, k-medoids clustering and density based
clustering on any datasets.
8. Write an R Program to implement decision trees using the ‘readingSkills’ dataset.
9. Implement decision trees using any dataset using package party and ‘rpart’.
10. Train SVM Model by taking any dataset.
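Exercise 5 above (discretizing a numeric variable into three quantile bins) can be sketched as follows. The lab itself is written for R, where `cut` combined with `quantile` is a common way to do this; the version below is only an illustrative Python equivalent using the standard library, and the `ages` data and the `discretize` helper are hypothetical.

```python
from statistics import quantiles

def discretize(values):
    """Label each value low/medium/high using the two tercile cut points."""
    low_cut, high_cut = quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("low")
        elif v <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

# Hypothetical numeric attribute.
ages = [21, 25, 30, 35, 40, 45, 50, 55, 60]
age_levels = discretize(ages)
```

Applying the same function to every numeric column of a dataset and collecting the results gives the categorical data frame the exercise asks for.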
SECTION-A
Answer any FIVE of the following 5 x 5=25M
SECTION-B
Answer ALL the following Questions. 5X10=50M
10. a). Describe the process of data cleaning and data transformation in pre-processing.
(OR)
b). Explain various data reduction and dimensionality reduction techniques in the pre-processing
step of data mining.
11. a). Discuss concept description and data generalization by AOI for data characterization.
(OR)
b). Explain frequent itemset mining methods using a frequent pattern mining algorithm.
12. a). Explain the algorithm for constructing a decision tree from training samples.
(OR)
b). Explain Bayes' theorem.
UNIT II:
Built-in Data Structures, Functions, Files and Operating System. NumPy Basics: Arrays and
Vectorized Computation, The NumPy ndarray, Universal Functions, Array-Oriented
Programming with Arrays, File Input and Output with Arrays, Linear Algebra, Pseudorandom
Number Generation.
UNIT III:
Getting Started with Pandas: Introduction to Pandas Data Structures, Essential
Functionality, Summarizing and Computing Descriptive Statistics
Data Loading, Storage and File Formats: Reading and Writing Data in Text Format, Binary
Data Formats, Interacting with Web APIs, Interacting with Databases.
UNIT IV:
Data Cleaning and Preparation: Handling Missing Data, Data Transformation, String
Manipulation.
Data Wrangling: Join, Combine and Reshape: Hierarchical Indexing, Combining and
Merging Datasets, Reshaping and Pivoting.
UNIT V:
Introduction to Modeling Libraries in Python: Interfacing between pandas and Model code,
Creating model descriptions with Patsy, Introduction to statsmodels.
Plotting and Visualization: A brief matplotlib API Primer, Plotting with Pandas and Seaborn,
Other Python visualization tools.
TEXT BOOKS:
1. Wes McKinney, “Python for Data Analysis”, O’Reilly Publications, Second Edition
2. Charles R. Severance, “Python for Everybody: Exploring Data Using Python 3”
REFERENCE BOOKS:
3. John Zelle, Michael Smith, “Python Programming”, Second Edition, 2010
Co-curricular Activities
Take up any application which involves Python coding. Example Case studies/Simulators:
(https://knightlab.northwestern.edu/2014/06/05/five-mini-programming-projects-for-the-python-beginner/)
4. Hangman
Continuous assessment:
Let the students be tested in the following questions from each unit
1. What is Data Analysis? List the differences between data analysis and data analytics.
SECTION-A
Answer any FIVE of the following 5 x 5=25M
1) What are data analysis and data analytics? What are the differences between them?
2) Explain the different built-in data structures in Python.
3) How is pandas used in Python?
4) Explain reshaping and pivoting.
5) What is pandas?
6) Explain universal functions.
7) Explain interacting with databases.
8) Explain different Python visualization tools.
SECTION-B
Answer ALL the following Questions. 5X10=50M
9) a) Why is Python used for data analysis? What is meant by a library? Explain at least six
Python libraries.
(OR)
b) What are Python and Jupyter Notebook? Why are they used?
10) a) What is NumPy? Why and how is NumPy used in Python? Explain with an
example.
(OR)
b) Write a program to generate a pseudorandom number in Python, and write a program to
find the number of elements in an array.
11) a) Explain predictive and descriptive statistics. Explain with formulas.
(OR)
b) Explain how data is loaded and stored in different file formats in Python.
12) a) What are the different data cleaning and preparation methods? Explain.
(OR)
b) Write a Python program on hierarchical indexing and on joining and combining data.
13) a) How is a model description created in Python? Explain with a program.
(OR)
b) Explain, with an example, how the Matplotlib package is used for plotting and visualization
in Python.
B Sc Semester: IV Credits: 4
Course: 4 BIG DATA ANALYTICS USING SPARK Hrs/Wk: 4
UNIT I:
Introduction to Big Data: What is Big Data-Characteristics, Data in the Warehouse and Data
in Hadoop, Why is Big Data Important- When to consider Big Data Solution, Applications.
Introduction to Hadoop: Hadoop- definition, Application development in Hadoop. The
building blocks of Hadoop, Name Node, Data Node, Secondary Name Node, Job Tracker and
Task Tracker.
UNIT II:
Introduction to Spark: What is Apache Spark, Why Spark when Hadoop is there, Spark
Features, Spark components, Spark program flow, Spark Eco System. Differences between
implementation of programs in Hadoop and Spark Programming environments.
UNIT III:
Spark Fundamentals- Using the spark-in-action VM, Using Spark Shell and writing first spark
program, Basic RDD actions and transformations.
Spark SQL-Working with Data Frames, Using SQL Commands, Saving and loading Data
Frame.
UNIT IV:
Streaming in Spark- Writing spark streaming applications, Using external data sources,
structured streaming.
Spark MLlib-Introduction to Machine Learning. Definition of Machine Learning, Machine
Learning with Spark.
UNIT V:
Graph Representation in MapReduce: Graph Processing with Spark, Spark GraphX, GraphX
features, Graph Examples, Graph algorithms-Shortest Path Algorithm.
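The shortest path algorithm listed in this unit can be sketched on a single machine before moving to a distributed setting. The example below is a plain-Python version of Dijkstra's algorithm, not GraphX code and not the Spark API; the adjacency-list format, the sample graph and the function name `shortest_paths` are assumptions made for illustration.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm: least total edge weight from source to each reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted directed graph: vertex -> list of (neighbour, weight).
graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 3)]}
distances = shortest_paths(graph, "A")
```

Here the cheapest route from A to B goes through C (cost 3) rather than along the direct edge (cost 4), which is the kind of case the algorithm is designed to catch.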
TEXT BOOKS:
1. Understanding Big Data Analytics for Enterprise Class Hadoop and Streaming Data by
Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, 1st Edition,
TMH,2012.
2. Spark in Action, Petar Zečević, Marko Bonaći, Manning Publications, 2016.
3. Learning Spark, Holden Karau, A. Konwinski, et al., O’Reilly Publications.
REFERENCE BOOKS:
1. Hadoop in Action by Chuck Lam, Manning Publishers.
2. Hadoop: The Definitive Guide by Tom White, 3rd Edition, O’Reilly.
3. Mining of Massive Datasets, Anand Rajaraman, Jeffrey D. Ullman, Wiley Publications.
Student Activities:
Take any dataset and do the following machine learning
steps. (https://www.guru99.com/pyspark-tutorial.html)
Continuous assessment:
Let the students be tested in the following questions from each unit
B Sc Semester: IV Credits: 1
Course: 4 BIG DATA ANALYTICS USING SPARK PROGRAMMING LAB Hrs/Wk: 2
2. Install Hadoop
3. Install Spark on top of Hadoop
4. Create and Implement the transformations in RDDs
5. Create a data frame from an existing RDD using Spark Session
6. Execute a Word Count example in Spark Shell by creating RDDs.
7. Implement Spark SQL Queries in Python.
8. Write a Program to find the maximum temperature given the recordings of one year.
9. Write a Program to implement the Pi estimation.
10. Write a User Defined Function to convert a given text to Uppercase.
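The Pi estimation exercise in the list above is usually done with a Monte Carlo method: sample random points in the unit square and count how many fall inside the quarter circle. The sketch below shows only the sampling logic in plain Python; it does not use Spark, and in the lab the sampling loop would instead be distributed over an RDD. The function name `estimate_pi` and the fixed seed are assumptions made so that the run is repeatable.

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate pi as 4 * (fraction of random points inside the unit quarter circle)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

pi_estimate = estimate_pi(100_000)
```

The estimate improves slowly with the number of samples, which is why the exercise is a natural fit for a cluster: each partition can count its own hits independently and the counts are summed at the end.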
SECTION-A
Answer any FIVE of the following 5 x 5=25M
SECTION-B
Answer ALL the following Questions. 5X10=50M
9) a) What are the differences between the data in Hadoop and in a warehouse?
(OR)
b) Explain the building blocks of Hadoop.
10) a) Explain the components of Spark and the program flow in Spark.
(OR)
b) Explain the differences between the implementation of programs in the Hadoop and Spark
programming environments.
11) a) Explain RDD transformations and actions.
(OR)
b) With Spark SQL commands, explain how to save and load data in a data frame.
12) a) Explain different external data sources.
(OR)
b) How are machine learning concepts implemented in Spark?
13) a) Explain graph processing with Spark using MapReduce.
(OR)
B. Sc Semester: IV Credits: 4
Course: 5 DATA VISUALIZATION Hrs/Wk: 4
UNIT I:
Creating Visual Analytics with tableau desktop, connecting to your data-How to Connect to your
data, What are generated Values? Knowing when to use a direct connection, Joining tables with
tableau, blending different data sources in a single worksheet.
UNIT II:
Building your first Visualization- How Show Me works- Chart types, Text Tables, Maps, bar chart,
Line charts, Area Fill charts and Pie charts, scatter plot, Bullet graph, Gantt charts, Sorting data
in tableau, Enhancing Views with filters, sets, groups and hierarchies.
UNIT III:
Creating calculations to enhance your data- What is aggregation, what are calculated values
and table calculations, Using the calculation dialog box to create, Building formulas using table
calculations, Using table calculation functions
UNIT IV:
Using maps to improve insights-Create a Standard Map View, Plotting your own locations
on a map, Replace Tableau’s standard maps, Shaping data to enable Point-to-Point mapping.
UNIT V:
Developing an ad hoc analysis environment- generating new data with forecasts, providing
self-service ad hoc analysis with parameters, Editing views in Tableau Server.
TEXT BOOKS:
1. Tableau Your Data, Daniel G. Murray and the InterWorks BI team, Wiley Publications
2. Tableau Data Visualization Cookbook, Ashutosh Nandeshwar, Packt Publishing.
3. Storytelling with Data: A Data Visualization Guide for Business Professionals by
Cole Nussbaumer Knaflic (2014)
4. ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (2009)
REFERENCE BOOKS:
5. Designing Data Visualizations: Representing Informational Relationships by Noah
Iliinsky, Julie Steele (2011)
6. Alexandru C. Telea – “Data Visualization principles and practice” Second Edition,
CRC Publications
7. Joshua N. Milligan – “Learning Tableau 2019” – Third Edition – Packt Publications
Student Activity
Create a sample super store data set and visualize the following requirements
General Requirements
1. Dashboard size is 1250px wide by 750px tall.
2. Prefer using containers
3. The dashboard has a total of 5 containers (no more, no less)
4. The Filter Pane
5. Each filter has some padding
Business Requirements
1. Show four filters- Category, Sub-Category, Region, and Segment. These filters should
have only relevant values.
2. The dashboard should have the title “Executive sales”
3. The first chart should have the title “YTS KPIs” and should show the following-
Total Discount
Overall Profit
Total Quantity and
Total Sales
4. The second graph should have the title as “Sales” and should show monthly sales per
year. Make sure it is an area chart with proper formatting.
5. The third graph should have the title “Profit” and should show monthly profit per year.
Make sure it is an area chart with proper formatting.
Continuous assessment:
Let the students be tested in the following questions from each unit
10. What are generated values? Join tables using Tableau
11. Create any visualization charts using Chart types, Text Tables, Maps, bar chart, Line
charts, Area Fill charts and Pie charts, scatter plot etc.,
12. What is aggregation, what are calculated values and table calculations?
13. Using Standard Map View, Plot your own locations on a map
14. Develop an Adhoc analysis environment.
B. Sc Semester: IV Credits: 1
Course: 5 DATA VISUALIZATION LAB Hrs/Wk: 2
4. Create Maps
SECTION-A
Answer any FIVE of the following 5 x 5=25M
SECTION-B
Answer ALL the following Questions. 5X10=50M
9. a) Explain how to blend different data sources in a single work sheet
(OR)
b) Discuss how different tables are joined with tableau.
10. a) Discuss how to work with filters to enhance views
(OR)
b) What are the different sets, groups and hierarchies in visualization?
11. a) What is aggregation? Explain how calculations are created using the calculation dialog box.
(OR)
b) Discuss how formulas are built using table calculations.
12. a) Discuss how to create a standard map view with an example.
(OR)
b) Explain how data shaping is done to enable point-to-point mapping.
13. a) How is self-service ad hoc analysis provided with parameters?
(OR)
b) Explain methods for generating new data with forecasts.
Learning Outcomes
Students at the successful completion of the course will be able to:
1. Understand Big Data and its usage
2. Identify various Data Quality and Preprocessing methods
3. Learn different Clustering techniques and Frequent Pattern Mining
4. Understand Regression, Classification and additional Predictive Methods
Syllabus: (Total Hours: 90 including Teaching, Lab and internal exams, etc.)
UNIT I:
Introduction to Data Analytics: Big Data and Data Science, Big Data Architectures, A Short
Taxonomy of Data Analytics, Examples of Data Use, History on Methodologies for Data
Analytics. Descriptive Statistics: Scale Types, Descriptive Univariate Analysis, Descriptive
Bivariate Analysis.
UNIT II:
Descriptive Multivariate Analysis: Multivariate Frequencies, Multivariate Data Visualization,
Multivariate Statistics, Infographics and Word Clouds. Data Quality and Preprocessing: Data
Quality, Converting to a Different Scale Type, Converting to a Different Scale, Data
Transformation, Dimensionality Reduction.
UNIT III:
Clustering: Distance Measures, Clustering Validation, Clustering Techniques.
Frequent Pattern Mining: Frequent Itemsets, Association Rules, Behind Support and
Confidence, Other Types of Pattern.
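Support and confidence, the two measures behind the association rules in this unit, are simple ratios over a set of transactions. The following sketch computes both for a small basket example; the transactions and the helper names `support` and `confidence` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support(transactions, {"bread", "milk"})        # 2 of 4 transactions
conf = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread transactions
```

A rule such as bread => milk is normally kept only if both values exceed user-chosen minimum thresholds.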
UNIT IV:
Regression: Predictive Performance Estimation, Finding the Parameters of the Model,
Technique and Model Selection.
Classification: Binary Classification, Predictive Performance Measures for Classification,
Distance-based Learning Algorithms, Probabilistic Classification Algorithms.
UNIT V:
Additional Predictive Methods: Search-based Algorithms, Optimization-based Algorithms.
Advanced Predictive Topics: Ensemble Learning, Algorithm Bias, Non-binary Classification
Tasks, Advanced Data Preparation Techniques for Prediction.
TEXT BOOKS:
1. “A General Introduction to Data Analytics” by João Mendes Moreira, André C. P. L. F.
de Carvalho, Tomáš Horváth, 2019 Edition, Wiley Publications.
2. “Data Analytics: Principles, Tools and Practices” by Dr. Gaurav Aroraa, Chitra Lele,
Dr. Munish Jindal, 2022 Edition, BPB Publications
3. “Data Analytics” by Anil Maheshwari, First Edition, McGraw Hill Education
OBJECTIVES:
To implement Map Reduce programs for processing big data
To realize storage of big data using HBase, MongoDB
To analyze big data using linear models
To analyze big data using machine learning techniques such as SVM / Decision tree
classification and clustering
LIST OF EXPERIMENTS
Hadoop
5. Implement an application that stores big data in HBase / MongoDB / Pig using
Hadoop / R.
TEXT BOOKS:
2. “Data Analytics: Principles, Tools and Practices” by Dr. Gaurav Aroraa, Chitra Lele,
Dr. Munish Jindal, 2022 Edition, BPB Publications.
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
Course 6A: Data Analytics with Tableau
Time:3Hrs Max.Marks:75
Section – A
Answer any FIVE of the following. 5x5=25M
UNIT I:
Problems and Search: What is Artificial Intelligence, The AI Problems, The Underlying
Assumption, What is an AI Technique.
Problems, Problems Spaces, and Search: Defining the problem as a state space search,
production systems, problems characteristics, issues in the design of search programs.
UNIT II:
Heuristic Search Techniques: Generate-and-test, Hill Climbing, Best-First Search, Problem
Reduction, Constraint Satisfaction, Means-Ends Analysis
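Of the heuristic search techniques listed in this unit, hill climbing is the easiest to show in a few lines. The sketch below is a steepest-ascent variant: it moves to the best-scoring neighbour until no neighbour improves on the current state. The one-dimensional objective, the neighbour function and the name `hill_climb` are hypothetical choices for illustration.

```python
def hill_climb(score, start, neighbours, max_steps=1000):
    """Steepest-ascent hill climbing: stop when no neighbour scores higher."""
    current = start
    for _ in range(max_steps):
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current  # local maximum reached
        current = best
    return current

def score(x):
    """Hypothetical objective with a single peak at x = 7."""
    return -(x - 7) ** 2

def neighbours(x):
    """Neighbouring states: one step left or right on the integer line."""
    return [x - 1, x + 1]

peak = hill_climb(score, 0, neighbours)
```

On an objective with several peaks the same procedure can stop at a local maximum, which is the main weakness of the technique.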
UNIT III:
Knowledge Representation Issues: Representations and Mapping, Approaches to Knowledge
Representation, The frame problem. Using Predicate Logic: Representing simple facts in logic,
Representing ISA relationships, predicates, Resolution
UNIT IV:
Representing Knowledge using Rules: Procedural Vs Declarative knowledge, Logic
Programming, Forward Vs Backward Reasoning, Matching, Control Knowledge
UNIT V:
Symbolic Reasoning under Uncertainty: Introduction to Non-monotonic Reasoning, Logics for
Non-monotonic Reasoning, Implementation issues, Augmenting a Problem solver,
implementation: DFS, BFS.
Statistical Reasoning: Probability and Bayes Theorem, Certainty Factors and Rule-Based
Systems, Bayesian Networks, Dempster-Shafer Theory.
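The DFS and BFS implementations named in this unit can be sketched on a small graph. This is a minimal illustration, assuming the graph is stored as a dictionary of adjacency lists; the sample graph and the function names are hypothetical.

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: visit vertices level by level using a queue."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

def dfs(graph, start, visited=None):
    """Depth-first search: follow each branch to its end before backtracking."""
    if visited is None:
        visited = []
    visited.append(start)
    for nxt in graph.get(start, []):
        if nxt not in visited:
            dfs(graph, nxt, visited)
    return visited

# Hypothetical directed graph as adjacency lists.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
bfs_order = bfs(graph, "A")
dfs_order = dfs(graph, "A")
```

The two visiting orders differ only in when D is reached: BFS finishes the first level (B, C) before D, while DFS reaches D through B before returning to C.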
TEXT BOOK:
1. Artificial Intelligence, Second Edition, Elaine Rich, Kevin Knight, Tata McGraw-Hill
Edition.
REFERENCE BOOK:
1. Russell, S., & Norvig, P. Artificial Intelligence: A Modern Approach. Third Edition.
Pearson New International Edition. 2014.
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
Course 7A: AI Concepts and Techniques with Python
Time:3Hrs Max.Marks:75
SECTION-A
Answer any FIVE of the following 5x5=25M
1. What is AI Technique?
2. Define State space search
3. Explain Generate and test
4. What is heuristic search technique?
5. What is resolution?
6. Explain Uncertainty implementation issues
7. Explain Bayes Theorem
8. Define Dempster-Shafer Theory.
SECTION-B
Answer ALL the following Questions. 5X10=50M
10. a) Define Heuristic search? What are the advantages of Heuristic search?
(OR)
b) Describe the Hill climbing.
11. a) What is predicate logic? Explain the predicate logic representation with reference to
suitable example.
(OR)
b) Describe the approaches to Knowledge Representation and explain the Issues in
Knowledge Representation
12. a) Explain Procedural Vs Declarative knowledge
(OR)
b) Explain the Issues in Knowledge Representation. Write notes on control
knowledge.
13. a) Show how to implement Non-monotonic reasoning using JTMS in medical
diagnosis. Consider rules such as “If you have a runny nose, assume you have a
cold unless it is Allergy season.”
(OR)
b) Explain logics for Non-monotonic reasoning and discuss the implementation
issues.
Syllabus: (Total Hours: 90 including Teaching, Lab and internal exams, etc.)
UNIT I:
Machine Learning Basics: What is machine learning? Key terminology, Key tasks of machine
learning, How to choose right algorithm, steps in developing a machine learning, why python?
Getting started with Numpy library
Classifying with k-Nearest Neighbors: The k-Nearest Neighbors classification algorithm,
Parsing and importing data from a text file, Creating scatter plots with Matplotlib, Normalizing
numeric values
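The k-Nearest Neighbors algorithm and the normalization step listed in this unit can be sketched together, since distance-based classifiers are sensitive to feature scale. The example below uses only the standard library; the data, the labels and the helpers `min_max_normalize` and `knn_classify` are hypothetical, and scaling the query point together with the training points is a simplification.

```python
from collections import Counter
from math import dist

def min_max_normalize(points):
    """Rescale every feature (column) to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans)) for p in points]

def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(zip(points, labels), key=lambda pl: dist(query, pl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data: two features on very different scales; the last row is the query.
raw = [(1.0, 100.0), (2.0, 120.0), (8.0, 900.0), (9.0, 950.0), (1.5, 110.0)]
labels = ["A", "A", "B", "B"]
scaled = min_max_normalize(raw)
predicted = knn_classify(scaled[4], scaled[:4], labels, k=3)
```

Without normalization the second feature would dominate the Euclidean distance simply because its values are larger.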
UNIT II:
Splitting datasets one feature at a time-Decision trees: Introducing decision trees, measuring
consistency in a dataset, using recursion to construct a decision tree, plotting trees in Matplotlib
UNIT III:
Classifying with probability theory-Naïve Bayes: Using probability distributions for
classification, learning the naïve Bayes classifier, Parsing data from RSS feeds, using naïve
Bayes to reveal regional attitudes
UNIT IV:
Logistic regression: Classification with logistic regression and the sigmoid function, Using
optimization to find the best regression coefficients, the gradient descent optimization algorithm,
Dealing with missing values in our data
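The combination of the sigmoid function and gradient-based optimization described in this unit can be sketched for a single feature. The code below runs batch gradient descent on the log-loss; the learning rate, the epoch count, the toy data and the name `fit_logistic` are assumptions for illustration, and the prescribed text book may present the equivalent update as gradient ascent on the likelihood.

```python
from math import exp

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical one-feature data: small values belong to class 0, large values to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low, p_high = sigmoid(w * 1.0 + b), sigmoid(w * 4.0 + b)
```

A prediction is read off by thresholding the sigmoid output at 0.5, so `p_low` should fall below and `p_high` above that value.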
UNIT V:
Support vector machines: Introducing support vector machines, using the SMO algorithm for
optimization, using kernels to “transform” data, Comparing support vector machines with other
classifiers
TEXT BOOK:
1. Machine Learning in Action, Peter Harrington, Manning Publications
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a CSV file.
5. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
6. Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data
Set. You can use Java/Python ML library classes/API.
7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for
this problem.
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
SECTION-A
SECTION-B
Answer ALL the following Questions. 5X10=50M
12. A) Discuss classification with logistic regression and the sigmoid function.
( OR )
B) Discuss gradient descent optimization algorithm.
Syllabus: (Total Hours: 90 including Teaching, Lab and internal exams, etc.)
UNIT I:
Unsupervised Learning: Clustering: k-means clustering algorithm, Improving cluster
performance with post processing, Bisecting k-means, Example: clustering points on a map
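The k-means algorithm named in this unit alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch follows; the starting centroids are fixed by hand rather than chosen randomly, and the points and the function name `k_means` are hypothetical.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: repeat the assignment and centroid-update steps."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Bisecting k-means, also listed in this unit, reuses the same routine with two centroids to split one cluster at a time.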
UNIT II:
Association analysis : Apriori algorithm: Association analysis, The Apriori principle, Finding
frequent item sets with the Apriori algorithm, Mining association rules from frequent item sets,
uncovering patterns in congressional voting
UNIT III:
Finding frequent item sets: FP-growth – FP trees, Build FP-tree, mining frequent items from an
FP-tree, finding co-occurring words in a Twitter feed, mining a click stream from a news site.
UNIT IV:
Principal component analysis: Dimensionality reduction techniques, using PCA to reduce the
dimensionality of semiconductor manufacturing data
UNIT V:
Singular value decomposition: Applications of the SVD, Matrix factorization, SVD in Python,
Collaborative filtering–based recommendation engines, a restaurant dish recommendation
engine
TEXT BOOK:
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
SECTION – A
Answer any FIVE of the following Questions. 5 X 5 = 25M
11. a) Define Finding frequent item sets: FP-growth –FP trees, Build FP-tree
( OR )
b) List out steps to find co-occurring words in a Twitter feed
Syllabus: (Total Hours: 90 including Teaching, Lab and internal exams, etc.)
UNIT I:
Natural Language Processing: What is NLP? NLP and linguistics -Syntax and semantics,
Pragmatics and context, Two views of NLP, Tasks and super tasks. Linguistic tools- Sentence
delimiters and tokenizers, Stemmers and taggers, Noun phrase and name recognizers, Parsers
and grammars.
UNIT II:
Document Retrieval: Information retrieval, Indexing technology. Query processing: Boolean
search, Ranked retrieval, Probabilistic retrieval, Language modeling. Evaluating search
engines: Evaluation studies, Evaluation Metrics, Relevance Judgments, Total system evaluation.
Attempts to enhance search performance: Table of contents, Query expansion and thesauri,
Query expansion from relevance information.
UNIT III:
Information extraction: The Message Understanding Conferences, Regular expressions. Finite
automata in FASTUS: Finite State Machines and regular languages, Finite State Machines as
parsers. Pushdown automata and context-free grammars: Analyzing case reports,
Context free grammars, Parsing with a pushdown automaton, Coping with incompleteness and
ambiguity.
UNIT IV:
Text categorization: Overview of categorization tasks and methods, Handcrafted rule based
methods. Inductive learning for text classification: Naïve Bayes classifiers, Linear classifiers,
Decision trees and decision lists, Nearest Neighbor algorithms. Combining classifiers: Data
fusion, Boosting, Using multiple classifiers.
UNIT V:
Text mining: What is text mining? Reference and coreference, Named entity recognition, The
coreference task, Automatic summarization: Summarization tasks, Constructing summaries
from document fragments, Multi-document summarization (MDS). Testing of automatic
summarization programs: Evaluation problems in summarization research, Building a corpus
for training and testing.
TEXT BOOK:
1. Natural Language Processing for Online Applications, Text Retrieval Extraction &
Categorization. Peter Jackson, Isabelle Moulinier, Thomson Legal & Regulatory.
1. INSTALLATION
2. WORD TOKENIZER
3. SENTENCE TOKENIZER
4. PARAGRAPH TOKENIZER
5. PROBABILISTIC PARSING
6. PROBABILISTIC CONTEXT FREE GRAMMAR
7. LEARNING GRAMMAR
8. CONDITIONAL FREQUENCY DISTRIBUTIONS
9. LEXICAL ANALYSER
10. WORDNET
11. CONTEXT FREE GRAMMAR
12. LARGE CONTEXT FREE GRAMMAR AND PARSING
13. NAMED ENTITY RECOGNITION
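The word and sentence tokenizer exercises in the list above can be previewed with regular expressions before a full NLP toolkit is installed. The sketch below is a naive stand-in built on the standard `re` module, not the tokenizers of any particular library; the splitting rules, the sample text and the function names are assumptions.

```python
import re

def sentence_tokenize(text):
    """Split after ., ! or ? followed by whitespace (abbreviations will break this rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence):
    """Return runs of word characters and single punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Data science is fun. Is NLP hard? Try it!"
sentences = sentence_tokenize(text)
words = word_tokenize(sentences[0])
```

Toolkit tokenizers handle abbreviations, contractions and quotation marks far more carefully, which is why the lab treats tokenization as a separate exercise.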
TEXT BOOK:
1. Natural Language Processing with Python, Steven Bird, Ewan Klein and Edward Loper, O’Reilly, First Edition.
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
SECTION – A
Answer any FIVE of the following Questions. 5 X 5 = 25M
SECTION – B
Answer ALL the Following Questions. 5 X 10 = 50M
UNIT I:
Introduction to Deep Learning: Artificial intelligence, machine learning and deep learning,
history of machine learning, Why deep learning? Why now?
The mathematical building blocks of neural networks: A first look at a neural network, Data
representations for neural networks, The gears of neural networks: tensor operations, The
engine of neural networks: gradient-based optimization.
UNIT II:
Getting started with neural networks: Anatomy of a neural network, Introduction to Keras,
Setting up a deep-learning workstation, Classifying movie reviews: a binary classification
Example, Classifying newswires: a multiclass classification example, Predicting house prices:
a regression example.
Fundamentals of machine learning: Four branches of machine learning, Evaluating machine-
learning models, Data preprocessing, feature engineering and feature learning, Overfitting
and underfitting, The universal workflow of machine learning.
UNIT III:
Deep learning for computer vision: Introduction to convnets, Training a convnet from scratch
on a small dataset, Using a pretrained convnet, Visualizing what convnets learn.
UNIT IV:
Deep learning for text and sequences: Working with text data, Understanding recurrent neural
networks, Advanced use of recurrent neural networks, Sequence processing with convnets.
UNIT V:
Advanced deep-learning best practices: Going beyond the Sequential model: the Keras
functional API, Inspecting and monitoring deep-learning models using Keras callbacks and
TensorBoard, Getting the most out of your models.
TEXT BOOKS:
1. “Deep Learning with Python” by Francois Chollet, 2018 Edition, Manning
Publications.
2. “Deep Learning with Python” by Nikhil Ketkar, Jojo Moolayil, Second Edition, Apress.
3. “Python Deep Learning” by Ivan Vasilev, Daniel Slater, Second Edition, Packt
Publications.
i. Instantiating a Convnet
TEXT BOOKS:
B. Sc DEGREE EXAMINATION
SEMESTER –V (Skill Enhancement Course-Elective)
SECTION – A
2. Write about the relationship between network, layers, loss function and optimizer.
SECTION – B
Answer ALL the Following Questions. 5 X 10 = 50M