Syllabus of Big Data Analysis - Proposed

This proposed syllabus covers 9 units on topics related to big data analysis including introductions to big data, Hadoop architecture, machine learning, processing streaming data, parallel programming, cloud computing, social network mining, text mining, and big data privacy, ethics and security. The course aims to provide foundational training in big data processing technologies and tools like MapReduce, Hadoop, and its ecosystem components to enable effective participation in big data projects.

Proposed Syllabus of Big Data Analysis:

PURPOSE:
This course provides practical, foundation-level training that enables immediate and effective participation in big data
projects. It provides grounding in basic and advanced big data processing methods, technologies, and tools,
including MapReduce, Hadoop, and the Hadoop ecosystem.
INSTRUCTIONAL OBJECTIVES:
1. Learn tips and tricks for Big Data use cases and solutions.
2. Learn to build and maintain reliable, scalable, distributed systems with Apache Hadoop.
3. Apply Hadoop ecosystem components to practical big data applications.
PREREQUISITE:
Knowledge of basic analytical algorithms and graph theory, and programming experience in any object-oriented language.

UNIT I – INTRODUCTION TO BIG DATA
Introduction, distributed file system, Big Data and its importance, Five Vs, Drivers for Big Data, Big Data analytics,
Big Data applications, Algorithms using MapReduce, applications such as Matrix-Vector Multiplication by MapReduce.
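
As a brief illustration of the matrix-vector multiplication topic listed in this unit, the sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine. It is not Hadoop code: the function names and the (row, column, value) input layout are assumptions made for the example, and the vector is assumed to fit in memory on every mapper.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the row index)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the partial products of each row to get one output entry."""
    return {row: sum(values) for row, values in grouped.items()}

def matvec_mapreduce(matrix_entries, vector):
    return reduce_phase(shuffle(map_phase(matrix_entries, vector)))

# Hypothetical input: the 2x2 matrix [[1, 2], [3, 4]] stored as (row, col, value).
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
vector = [5.0, 6.0]
result = matvec_mapreduce(entries, vector)
print(result)
```

Storing the matrix as individual (row, column, value) triples mirrors how a sparse matrix could be split across input files, since each triple can be mapped independently of the others.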
UNIT II – HADOOP ARCHITECTURE
Apache Hadoop & Hadoop Ecosystem, Moving Data in and out of Hadoop, Understanding inputs and outputs of
MapReduce, Data Serialization, Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands,
Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce
paradigm, Map and Reduce tasks, Job and Task trackers, Cluster Setup: SSH & Hadoop Configuration, HDFS
Administration, Monitoring & Maintenance, Hive Architecture and Installation, Comparison with Traditional
Databases, HiveQL: Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase
concepts, Advanced Usage, Schema Design, Advanced Indexing, how HBase uses ZooKeeper, and how to build
applications with ZooKeeper.
UNIT III – MACHINE LEARNING
Evaluating clustering models, validating models, cluster analysis, K-means algorithm, Naïve Bayes,
Memorization, Machine Learning and Soft Computing: Rationale, Motivations, Needs, Basics: supervised and
unsupervised learning models, Basic Tools of Soft Computing: Neural Networks, Fuzzy Logic Systems, and Support
Vector Machines, Basic Mathematics of Soft Computing, Learning and Statistical Approaches to Regression and
Classification, Support Vector Machines, Single-Layer Networks: The Perceptron, The Adaptive Linear Neuron
(Adaline) and the Least Mean Square Algorithm, Multilayer Perceptron: The Error Backpropagation Algorithm,
The Generalized Delta Rule, Heuristics or Practical Aspects of the Error Backpropagation Algorithm, basics of deep
learning.
UNIT IV – PROCESSING & STORING STREAMING DATA
Distributed Stream Data Processing: Coordination, Partitions and Merges, Transactions, Duplicate Detection using
Bloom Filters, Apache Spark Streaming Examples, Choosing a storage system, NoSQL Storage Systems, Visualizing
Data, Mobile Streaming Apps, Timed Counting and Summation, Stochastic Optimization, Delivering Time Series
Data.
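
To make the Bloom filter topic in this unit concrete, the following is a minimal sketch of duplicate detection over a stream. The class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests as hash functions are all assumptions chosen for readability, not a recommended production design.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))

def deduplicate(stream, bloom):
    """Yield events the filter has not seen; rare false positives drop new events."""
    for event in stream:
        if bloom.might_contain(event):
            continue
        bloom.add(event)
        yield event

bloom = BloomFilter()
unique_events = list(deduplicate(["a", "b", "a", "c", "b"], bloom))
print(unique_events)
```

The trade-off is that memory use stays fixed regardless of stream length, at the cost of occasionally discarding an event that was in fact new.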
UNIT V – PARALLEL PROGRAMMING
Parallel programming with Message Passing Interface (MPI): MPI compilation and running process, Implementation
of MPI for clusters, Dynamic process management, Fault tolerance, RMA, Performance measurement, Parallel
Virtual Machine (PVM): Overview, Setup, console details, Extended PVM.
UNIT VI – CLOUD COMPUTING
Cloud Enabling Technologies, Characteristics of Cloud Computing, Benefits of Cloud Computing, Cloud Service
Models, Cloud Deployment models, Cloud computing Infrastructure, Cloud Challenges, Understanding IaaS,
Improving performance through Load balancing, Server Types within IaaS solutions, utilizing cloud-based NAS
devices, Understanding Cloud-based data storage, Cloud-based backup devices.
UNIT VII – SOCIAL NETWORK MINING
Cascading Behaviour in Networks: Diffusion in Networks, Modelling Diffusion, Cascades and Clusters, Thresholds,
Extensions of the Basic Cascade Model, Six Degrees of Separation, Structure and Randomness, Decentralized
Search, Empirical Analysis and Generalized Models, Analysis of Decentralized Search, Clustering of Social
Network graphs, Betweenness, closeness, and degree centrality, Girvan-Newman algorithm, Discovery of
communities, Network Modularity, Cliques and Bipartite graphs, Graph partitioning methods, PageRank,
importance of any node.
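
As one way to picture the PageRank and node-importance topics in this unit, the sketch below runs a basic power iteration over a tiny directed graph. The graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions; the code also assumes every link target appears as a key in the graph dictionary.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of linked nodes."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical graph: "c" is linked to by "a", "b", and "d".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
most_important = max(ranks, key=ranks.get)
print(most_important, ranks)
```

Here a node's importance comes from the rank of the nodes linking to it, which is why "d", having no incoming links, keeps only the baseline share.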
UNIT VIII – TEXT MINING
Introduction, Core text mining operations, Pre-processing techniques, Categorization, Pointwise mutual information,
word frequency analysis, Clustering, Information extraction, Probabilistic models for information extraction, Text
mining applications.
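
For the pointwise mutual information topic in this unit, the usual definition is PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below estimates these probabilities from document-level co-occurrence counts; the whitespace tokenizer, the function name, and the four sample documents are assumptions made only for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(documents):
    """Return PMI for every word pair that co-occurs in at least one document."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())  # count each word once per document
        word_counts.update(words)
        pair_counts.update(combinations(sorted(words), 2))
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n_docs
        p_x = word_counts[x] / n_docs
        p_y = word_counts[y] / n_docs
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical mini-corpus.
docs = ["big data", "big data", "text mining", "text data"]
scores = pmi_scores(docs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A positive score means the two words appear together more often than independence would predict, and a negative score means less often.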
UNIT IX – BIG DATA PRIVACY, ETHICS AND SECURITY
Privacy, Reidentification of Anonymous People, Why Big Data Privacy Is Self-Regulating, Ethics, Ownership,
Ethical Guidelines, Big Data Security, Organizational Security, Default Hadoop Model without security, Hadoop
Kerberos Security Implementation & Configuration, Integrating Hadoop with Enterprise Security Systems, Securing
Sensitive Data in Hadoop, SIEM systems, Setting up audit logging in a Hadoop cluster.

Books & References:


1. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
2. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN:
9788126551071, 2015.
3. Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012.
4. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 3rd
ed., 2010.
5. Lior Rokach and Oded Maimon, “Data Mining and Knowledge Discovery Handbook”, Springer, 2nd edition,
2010.
6. Sudheesh Narayanan, “Securing Hadoop”, Packt Publishing, 2013.
7. Ben Spivey, Joey Echeverria, “Hadoop Security: Protecting Your Big Data Platform”, O’Reilly Media, 2015.
8. Anthony T. Velte, Toby J. Velte, Robert Elsenpeter, “Cloud Computing: A Practical Approach”, Tata McGraw
Hill Edition, Fourth Reprint, 2010.
9. Kris Jamsa, “Cloud Computing: SaaS, PaaS, IaaS, Virtualization, Business Models, Mobile, Security and More”,
Jones & Bartlett Learning Company LLC, 2013.
10. Rajkumar Buyya, “High Performance Cluster Computing: Programming and Applications”, Vol. 2, Prentice Hall
PTR, NJ, USA, 1999.
11. Byron Ellis, “Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data”, Wiley, 1st edition,
2014.
12. Sherif Sakr, “Large Scale and Big Data: Processing and Management”, CRC Press, 2014.
13. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced
Analytics”, Wiley, 2012.
14. Ronen Feldman and James Sanger, “The Text Mining Handbook: Advanced Approaches in Analyzing
Unstructured Data”, Cambridge University Press, 2006.
