Syllabus of BDA

The document outlines a course syllabus on Big Data, covering three main units: Introduction to Big Data, Big Data Technologies, and Data Science in Big Data. It includes topics such as the characteristics of Big Data, Hadoop ecosystem, NoSQL databases, AI applications, and the iterative nature of data science projects. The syllabus also highlights tools and frameworks used in Big Data analytics and data science, including Jupyter Notebook and Tableau.

Course Syllabus and Suggested Readings

Unit-1 Introduction to Big Data Contact Hours: 15

Understanding Big Data and the 5 V’s
    Introduction to Big Data – Definition and Characteristics; The 5 V’s of
    Big Data – Volume: Data at scale, Velocity: Real-time data processing,
    Variety: Structured, semi-structured, unstructured data, Veracity:
    Uncertainty and trustworthiness in data, Value: Transforming data into
    insights; Challenges and Opportunities in Big Data; Big Data Use Cases
    in Real-World Applications

Big Data Architecture
    Fundamentals of Big Data Architecture: Data ingestion, storage,
    processing and visualization layers
    Hadoop Ecosystem in Big Data Architecture: Tools like HDFS, YARN,
    Hive and Sqoop
    Streaming Data in Big Data: Tools such as Apache Kafka and Flink
    Real-World Big Data Architecture: Lambda and Kappa Architectures,
    Hybrid Architecture for batch and real-time processing

The Hadoop Ecosystem
    Introduction to the Hadoop Ecosystem; HDFS (Hadoop Distributed File
    System): Architecture and Functionality; MapReduce Programming
    Model: Workflow and Applications; YARN (Yet Another Resource
    Negotiator): Resource Management; Tools in the Ecosystem: Pig, HBase,
    Flume, and Oozie; Data Processing with Hadoop: ETL, Analytics and
    Reporting

Unit-2 Big Data Technologies Contact Hours: 15

Big Data Frameworks
    Big Data Frameworks: Hadoop, Apache Spark, and their Comparison;
    NoSQL databases: MongoDB, Cassandra, and HBase; Big Data
    Visualization Tools: Tableau, Power BI, and Zeppelin; Real-Time Big
    Data Processing: Apache Storm and Flink; Emerging trends in Big Data
    Technologies

Big SQL and NoSQL Databases
    Overview of SQL vs. NoSQL: Differences and Use Cases; Introduction
    to Big SQL: Big SQL Features – Scalability, support for structured and
    unstructured data, Query optimization Techniques in Big SQL; NoSQL
    Database Types: Key-Value stores (Redis, DynamoDB), Document
    stores (CouchDB), Column-family stores (Cassandra, HBase), Graph
    Databases (Neo4j); Advantages and limitations of Big SQL and NoSQL

AI in Big Data
    Introduction to IBM Watson: Overview and capabilities of Watson AI,
    Watson’s role in Big Data and decision-making; Key Watson Services:
    Watson Discovery, Watson Studio, and Watson Assistant, Integration of
    Watson with Big Data tools
    AI and Machine Learning Applications in Big Data: Natural Language
    Processing (NLP), Sentiment Analysis and Predictive Analytics

Unit-3 Data Science in Big Data Contact Hours: 15

The Iterative Nature of Data Science Projects
    Introduction to Data Science Projects: Stages and Lifecycle; Iterative
    process in Data Science: Problem Definition, Data collection and
    exploration, Model development and evaluation, Refinement and
    deployment; Importance of Iteration: Continuous improvement and error
    correction; Tools supporting Iteration: Notebooks, Version Control and
    CI/CD

Notebooks in Data Science
    Introduction to Data Science Notebooks: Characteristics – Interactive,
    reproducible and modular workflow, Key benefits – Visualization,
    documentation and collaboration
    Programming Languages for Data Science: Python – Libraries like
    pandas, NumPy and Matplotlib, R – Strengths in statistical analysis and
    visualization; Mechanisms and Tools in Notebooks: Code cells,
    markdown, widgets, and extensions, Integration with Git and other data
    tools

Notebooks and Data Science tools in Big Data
    Major Data Science Notebooks: Jupyter Notebook, Google Colab and
    Zeppelin, Comparing features: Offline vs. cloud, extensions and
    performance
    Getting started with Jupyter Notebook: Installation, environment setup,
    and basic usage, Working with Python and R in Jupyter
    Introduction to Tableau: Key features and use-cases, Data connection,
    visualization building and dashboard creation
    Collaboration and Presentation tools for Data Insights
