The document outlines a course syllabus on Big Data, covering three main units: Introduction to Big Data, Big Data Technologies, and Data Science in Big Data. It includes topics such as the characteristics of Big Data, Hadoop ecosystem, NoSQL databases, AI applications, and the iterative nature of data science projects. The syllabus also highlights tools and frameworks used in Big Data analytics and data science, including Jupyter Notebook and Tableau.
Syllabus of BDA
Course Syllabus and Suggestive Readings
Unit 1: Introduction to Big Data
Contact Hours: 15

Understanding Big Data and the 5 V’s
Introduction to Big Data – Definition and Characteristics
The 5 V’s of Big Data – Volume: Data at scale, Velocity: Real-time data processing, Variety: Structured, semi-structured, unstructured data, Veracity: Uncertainty and trustworthiness in data, Value: Transforming data into insights
Challenges and Opportunities in Big Data
Big Data Use Cases in Real-World Applications

Big Data Architecture
Fundamentals of Big Data Architecture: Data ingestion, storage, processing and visualization layers
Hadoop Ecosystem in Big Data Architecture: Tools like HDFS, YARN, Hive and Sqoop
Streaming Data in Big Data: Tools such as Apache Kafka and Flink
Real-World Big Data Architecture: Lambda and Kappa Architectures, Hybrid Architecture for batch and real-time processing

The Hadoop Ecosystem
Introduction to the Hadoop Ecosystem
HDFS (Hadoop Distributed File System): Architecture and Functionality
MapReduce Programming Model: Workflow and Applications
YARN (Yet Another Resource Negotiator): Resource Management
Tools in the Ecosystem: Pig, HBase, Flume, and Oozie
Data Processing with Hadoop: ETL, Analytics and Reporting

Unit 2: Big Data Technologies
Contact Hours: 15

Big Data Frameworks
Big Data Frameworks: Hadoop, Apache Spark, and their Comparison
NoSQL Databases: MongoDB, Cassandra, and HBase
Big Data Visualization Tools: Tableau, Power BI, and Zeppelin
Real-Time Big Data Processing: Apache Storm and Flink
Emerging Trends in Big Data Technologies

Big SQL and NoSQL Databases
Overview of SQL vs. NoSQL: Differences and Use Cases
Introduction to Big SQL: Big SQL Features – Scalability, support for structured and unstructured data, Query optimization techniques in Big SQL
NoSQL Database Types: Key-Value stores (Redis, DynamoDB), Document stores (CouchDB), Column-family stores (Cassandra, HBase), Graph databases (Neo4j)
Advantages and Limitations of Big SQL and NoSQL

AI in Big Data
Introduction to IBM Watson: Overview and capabilities of Watson AI, Watson’s role in Big Data and decision-making
Key Watson Services: Watson Discovery, Watson Studio, and Watson Assistant, Integration of Watson with Big Data tools
AI and Machine Learning Applications in Big Data: Natural Language Processing (NLP), Sentiment Analysis and Predictive Analytics

Unit 3: Data Science in Big Data
Contact Hours: 15

The Iterative Nature of Data Science Projects
Introduction to Data Science Projects: Stages and Lifecycle
Iterative Process in Data Science: Problem definition, Data collection and exploration, Model development and evaluation, Refinement and deployment
Importance of Iteration: Continuous improvement and error correction
Tools Supporting Iteration: Notebooks, Version Control and CI/CD

Notebooks in Data Science
Introduction to Data Science Notebooks: Characteristics – Interactive, reproducible and modular workflow, Key benefits – Visualization, documentation and collaboration
Programming Languages for Data Science: Python – Libraries like pandas, NumPy and Matplotlib, R – Strengths in statistical analysis and visualization
Mechanisms and Tools in Notebooks: Code cells, markdown, widgets, and extensions, Integration with Git and other data tools

Notebooks and Data Science Tools in Big Data
Major Data Science Notebooks: Jupyter Notebook, Google Colab and Zeppelin, Comparing features: Offline vs. cloud, extensions and performance
Getting Started with Jupyter Notebook: Installation, environment setup, and basic usage, Working with Python and R in Jupyter
Introduction to Tableau: Key features and use cases, Data connection, visualization building and dashboard creation
Collaboration and Presentation Tools for Data Insights