BDA Module1
Institute of Technology
Department of Computer Science & Engineering
Strive for Excellence
Semester: 7th
Course Name: Big Data and Analytics
Course Code: 18CS72
Module No: 1
Presenter: A. K. Sreeja
Course Outline
1. Big Data
2. Scalability & Parallel Processing
3. Designing Data Architecture
4. Data Sources, Quality, Pre-Processing and Storing
5. Data Storage & Analysis
6. Big Data Analytics Applications & Case Studies
Course Outcomes:
CO1: Understand fundamentals of Big Data analytics.
CO2: Investigate Hadoop framework and Hadoop Distributed File System.
CO3: Illustrate the concepts of NoSQL using MongoDB and Cassandra for Big Data.
CO4: Demonstrate the MapReduce programming model to process the big data along with Hadoop tools.
CO5: Use Machine Learning algorithms for real world big data.
CO6: Analyze web contents and Social Networks to
Course Syllabus
CREDITS – 04
Text Books:
1. Raj Kamal and Preeti Saxena, "Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning", McGraw Hill Education, 2018. ISBN: 9789353164966, 9353164966.
2. Douglas Eadline, "Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem", 1st Edition, Pearson Education, 2016. ISBN-13: 978-9332570351.
Reference Books:
1. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media, 2015. ISBN-13: 978-9352130672.
2. Boris Lublinsky, Kevin T Smith, Alexey Yakubovich, "Professional Hadoop Solutions", 1st Edition, Wrox Press, 2014. ISBN-13: 978-8126551071.
3. Eric Sammer, "Hadoop Operations: A Guide for Developers and Administrators", 1st Edition, O'Reilly Media, 2012. ISBN-13: 978-9350239261.
4. Arshdeep Bahga, Vijay Madisetti, "Big Data Analytics: A Hands-On Approach", 1st Edition, VPT Publications, 2018. ISBN-13: 978-0996025577.
Module 1
1. Big Data
• Definition of Data
• Definition of Web data
• Classification of Data: Structured, Semi-structured and Unstructured
• Definition of Big Data
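The three classes of data listed above can be illustrated with a small sketch. The sample records and field names below are made up for illustration and do not come from the course material; the usual reading is assumed, namely that structured data follows a fixed schema, semi-structured data carries its own keys or tags, and unstructured data has no predefined model.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = list(csv.DictReader(io.StringIO("id,name,age\n1,Asha,30\n2,Ravi,41\n")))

# Semi-structured: self-describing keys; the fields may differ per record.
semi_structured = json.loads('[{"id": 1, "name": "Asha"}, {"id": 2, "tags": ["new"]}]')

# Unstructured: free text with no predefined data model.
unstructured = "Asha called about her order and asked for a refund."

print(structured[0]["name"])              # column access by a known field name
print(sorted(semi_structured[1].keys()))  # keys must be discovered per record
print(len(unstructured.split()))          # only generic text operations apply
```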
Big Data
Big Data Characteristics
Big Data Types
Big Data Classification
Big Data Handling Techniques
• Grid Computing:
Grid computing is a form of distributed computing in which a group of
computers at several locations are connected to one another to accomplish a
common task.
The computers that make up a grid may be spread across remote locations.
Like cloud computing, grid computing is scalable.
Cloud computing depends on sharing resources (for example, networks,
servers, storage, applications and services) to attain coordination and
coherence among them, much as grid computing does.
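The idea of several machines cooperating on a common task can be sketched on one machine. This is a simulation only: worker threads stand in for grid nodes, and the function names (node_task, split, run_on_grid) are hypothetical. A real grid would dispatch the chunks to remote computers over a network.

```python
from concurrent.futures import ThreadPoolExecutor

def node_task(chunk):
    # Work done by one simulated grid node: sum the squares in its chunk.
    return sum(x * x for x in chunk)

def split(data, parts):
    # Divide the data into at most `parts` contiguous chunks.
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_on_grid(data, nodes=4):
    # Scatter the chunks to the nodes, then combine the partial results.
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(node_task, chunks))
    return sum(partials)

total = run_on_grid(list(range(1, 101)))
print(total)
```

Raising the nodes argument adds more workers to the same task, which is the sense in which such a setup scales.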
Data Enrichment
Data enrichment refers to operations or processes which refine, enhance or
improve the raw data.
Data Editing
Data editing refers to the process of reviewing and adjusting the acquired
datasets.
Data Reduction
Data reduction enables the transformation of acquired information into an
ordered, correct and simplified form.
Data Wrangling
Data wrangling refers to the process of transforming and mapping the data.
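The four pre-processing operations defined above can be sketched on a tiny dataset. The records, the validity check (no negative amounts) and the REGION lookup table are all hypothetical, chosen only to show one possible instance of each step.

```python
raw = [
    {"name": " asha ", "city": "Pune", "amount": "120"},
    {"name": "Ravi", "city": "Pune", "amount": "-5"},   # fails the check below
    {"name": "Meera", "city": "Delhi", "amount": "80"},
]

# Wrangling: transform and map raw fields into a usable form.
wrangled = [
    {"name": r["name"].strip().title(), "city": r["city"], "amount": int(r["amount"])}
    for r in raw
]

# Editing: review the records and drop those that fail a check.
edited = [r for r in wrangled if r["amount"] >= 0]

# Enrichment: add information that improves the raw data.
REGION = {"Pune": "West", "Delhi": "North"}  # hypothetical lookup table
enriched = [{**r, "region": REGION.get(r["city"], "Unknown")} for r in edited]

# Reduction: condense into an ordered, simplified summary per region.
totals = {}
for r in enriched:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
reduced = sorted(totals.items())

print(reduced)
```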
Big Data analytics uses large volumes of data to identify and derive intelligence about
individuals by means of predictive models.
Some findings follow:
• Building health profiles of individual patients, and predictive models for better
diagnosis and better treatment.
• Aggregating a large volume and variety of information from multiple sources, from
DNA, proteins and metabolites to cells, tissues, organs, organisms and ecosystems, which
can enhance the understanding of the biology of diseases. Big Data creates patterns and
models through data mining and helps in better understanding and research.
• Using data from wearable devices, which record during active as well as inactive
periods, to provide a better understanding of patient health and better risk profiling of
the user for certain diseases.
Module 2