Big Data Analytics Application

The document outlines the requirements for a semester project in Big Data Analytics & Applications for the Department of Computing & Information System. Students must form teams of two, select a topic from specified research areas, conduct a literature review to identify a research gap, and write a scientific paper following a structured format. The project is due on December 6, 2024, and emphasizes originality to avoid plagiarism penalties.

Department of Computing & Information System (CIS)

Subject: Big Data Analytics & Applications


Course Code: CIS 312
Semester Project
Fall 2024
Last date of Submission: 6th December 2024
---------------------------------------------------------------------------------------------------------------------------

Follow the instructions to complete your final semester project:


Instructions:
1. Form research teams; each team must have exactly two members.
2. Choose one topic from Research Area 01 or Research Area 02; each team may pick only
one topic.
3. Identify a research gap in your chosen topic through a literature review (using
databases such as Google Scholar and IEEE Xplore).
4. Write research questions and objectives based on your literature review.
5. Finally, write a scientific paper presenting your findings, structured as follows:
a. Abstract
b. Introduction
c. Literature Review
d. Methodology / Models
e. Result Analysis & Discussion
f. Conclusion
g. References
Research Area 01
1. Scalable Data Processing & Storage Technologies
• Distributed Systems: Optimization of frameworks like Hadoop, Apache Spark, and Flink.
• Cloud-based Data Management: Efficient data lakes, multi-cloud management, and hybrid
storage.
• Edge Computing: Processing data closer to the source to reduce latency.
• High-Performance Computing (HPC): Parallel computing techniques to handle large-scale
datasets.
Research Questions:
• How can data storage and retrieval efficiency be improved in large distributed environments?
• What novel architectures can optimize data processing at the edge?

2. Big Data Privacy & Security


• Privacy-preserving Analytics: Homomorphic encryption, differential privacy techniques.
• Secure Data Sharing & Collaboration: Federated learning across multiple stakeholders.
• Data Governance & Compliance: Handling GDPR, CCPA, and other regulations.
• Anomaly Detection: Identifying malicious activities in large datasets.
Research Questions:
• How can data utility be balanced with privacy guarantees?
• What methods can ensure security in real-time analytics?

3. Machine Learning and AI for Big Data


• Deep Learning at Scale: Training large-scale models with distributed data.
• AutoML for Big Data: Automating the selection and tuning of models.
• Reinforcement Learning: Adapting learning algorithms to large dynamic datasets.
• Explainable AI (XAI): Interpreting complex models used for analytics.
Research Questions:
• How can ML algorithms be optimized for streaming data?
• What methods can enhance the interpretability of large-scale models?

4. Real-time and Streaming Analytics


• Event Processing Frameworks: Handling high-velocity data streams.
• Data Streams Integration: Combining streams with static datasets.
• Latency Reduction Techniques: Optimizing for low-latency insights.
Research Questions:
• How can scalable algorithms for real-time decision-making be developed?
• What novel architectures can better support streaming analytics?

5. Big Data Visualization & Interaction


• Visual Analytics: Creating scalable visualization tools for large datasets.
• Interactive Dashboards: Ensuring performance with high-dimensional data.
• Cognitive Load Management: Designing interfaces for data comprehension.
Research Questions:
• What techniques enhance user interaction with large datasets?
• How can visualizations be adapted to non-technical audiences?

6. Big Data Applications in Specific Domains


• Healthcare: Predictive models for patient outcomes, epidemic detection.
• Finance: Fraud detection, algorithmic trading.
• Smart Cities: Urban planning using sensor data.
• Agriculture: Precision farming and crop yield prediction.
Research Questions:
• How can domain-specific challenges be addressed using Big Data analytics?
• What are the unique data integration challenges in cross-domain analytics?

7. Ethics and Bias in Big Data Analytics


• Bias Detection & Mitigation: Identifying biases in datasets and algorithms.
• Fairness in Decision Making: Ensuring equitable outcomes from data-driven decisions.
• Algorithmic Accountability: Ensuring transparency and accountability in analytics systems.
Research Questions:
• How can hidden biases in Big Data pipelines be detected and eliminated?
• What frameworks can ensure the ethical use of Big Data?
8. IoT and Sensor Data Analytics
• IoT Data Fusion: Combining data from heterogeneous sensors.
• Predictive Maintenance: Analyzing sensor data for equipment failure.
• Energy-efficient Analytics: Reducing energy consumption in IoT analytics.
Research Questions:
• How can noisy and incomplete sensor data be handled?
• What architectures support continuous learning from IoT devices?

Research Area 02
Designing architectures to support streaming analytics requires handling high-velocity, real-time data
efficiently. Novel architectures focus on processing data in motion, reducing latency, scaling
dynamically, and providing timely insights. Here are some cutting-edge architectures to support
streaming analytics:
1. Lambda Architecture 2.0 (Augmented Lambda)
Overview: Extends the traditional Lambda architecture by adding more real-time processing
capabilities. It separates data into:
• Batch layer: Stores historical data for deep analysis.
• Speed layer: Processes real-time data streams.
• Serving layer: Combines results from both layers for quick query responses.
Novelty:

• Uses stream-first processing, where real-time data is prioritized over batch jobs.
• Advances in tools like Apache Beam support unified APIs for both batch and streaming data.
Use Cases: Real-time fraud detection and live analytics for social media feeds.
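As a minimal illustration (not part of any specific framework), the three layers above can be sketched in plain Python: a precomputed batch view, a speed layer that counts events arriving after the last batch run, and a serving layer that merges both at query time. All names and data are hypothetical.

```python
from collections import defaultdict

# Batch layer: a precomputed view over historical events (hypothetical data).
batch_view = {"user_a": 100, "user_b": 42}

# Speed layer: incrementally counts events that arrived after the last batch run.
speed_view = defaultdict(int)

def ingest_realtime(event):
    """Speed layer: update the real-time view as each event arrives."""
    speed_view[event["user"]] += 1

def query(user):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view.get(user, 0) + speed_view.get(user, 0)

# New events arrive on the stream before the next batch recomputation.
for e in [{"user": "user_a"}, {"user": "user_c"}]:
    ingest_realtime(e)

print(query("user_a"))  # 101
print(query("user_c"))  # 1
```

The key property of the serving layer is visible here: queries see both the deep (batch) history and the most recent (speed) events without waiting for the next batch job.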

2. Kappa Architecture
Overview: A simplification of the Lambda architecture, focusing only on real-time processing. Instead
of separating batch and speed layers, all data is treated as a stream.
Core Technology:

• Tools like Kafka Streams, Flink, and Apache Pulsar make Kappa feasible.
Novelty:

• Data is processed continuously, even for historical datasets, by replaying event logs.
• Avoids the complexity of maintaining two separate processing paths (batch and real-time).
Use Cases: Real-time recommendations, IoT data analytics.
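The Kappa idea of "everything is a stream" can be sketched as follows: one processing function handles both live events and historical reprocessing, because reprocessing is simply a replay of the event log. The event log contents here are hypothetical.

```python
# Kappa-style sketch: a single processing path; historical data is handled
# by replaying the same event log through the same stream logic.
event_log = [
    {"sensor": "s1", "reading": 20},
    {"sensor": "s1", "reading": 22},
    {"sensor": "s2", "reading": 5},
]

def build_state(events):
    """Single stream processor: keep the latest reading per sensor.
    Live events and log replays go through this same code path."""
    state = {}
    for event in events:
        state[event["sensor"]] = event["reading"]
    return state

# Reprocessing history is just a replay of the log, not a separate batch job.
state = build_state(event_log)
print(state)  # {'s1': 22, 's2': 5}
```

This is why Kappa avoids the dual-path complexity of Lambda: there is no batch codebase to keep in sync with the streaming one.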
3. Microservices-based Streaming Architecture
Overview: Leverages microservices to create loosely coupled, independently deployable services for
streaming data.

• Data Pipelines are broken into smaller components, each service processing and forwarding
data.
• Services communicate via event-driven platforms like Apache Kafka, RabbitMQ, or AWS
Kinesis.
Novelty:

• Auto-scaling ensures services adapt dynamically to changes in data load.


• Fault isolation: Failures in one service do not crash the entire pipeline.
• Supports polyglot programming—different services can be built using different languages or
frameworks.
Use Cases: E-commerce order tracking, event-driven financial analytics.
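A toy version of this decoupling can be shown with in-process queues standing in for a broker such as Kafka: each "service" only reads from and writes to queues, never calls the other directly. The service and field names are hypothetical.

```python
from queue import Queue

# Queues stand in for broker topics (e.g., Kafka) between two toy services.
orders = Queue()     # topic: raw orders
enriched = Queue()   # topic: enriched orders

def enrichment_service():
    """Consumes raw orders, forwards enriched events downstream."""
    while not orders.empty():
        order = orders.get()
        order["status"] = "validated"
        enriched.put(order)

def analytics_service():
    """Independently consumes enriched events; a failure here would not
    affect the enrichment service, only this consumer."""
    total = 0
    while not enriched.empty():
        total += enriched.get()["amount"]
    return total

orders.put({"id": 1, "amount": 10})
orders.put({"id": 2, "amount": 5})
enrichment_service()
total = analytics_service()
print(total)  # 15
```

Because the services share only message schemas, each one could be rewritten in a different language or scaled independently, which is the polyglot and auto-scaling point above.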

4. Graph-based Streaming Architecture


Overview: Uses stream processing frameworks such as Apache Flink or Apache Storm, which model
computations as Directed Acyclic Graphs (DAGs) of operators, to handle complex event streams.

• Data flows through a sequence of operators organized as a graph.


Novelty:

• Supports stateful streaming with windowed processing.


• Optimizes for multi-path data flows—data can be processed along different paths in parallel.
Use Cases: Network traffic monitoring, fraud detection pipelines, complex IoT event streams.
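The operator-graph idea can be sketched with a simple linear chain of operators (a degenerate DAG): each record flows through parse, filter, and transform stages, and a filter can drop it mid-graph. Operator names and data are hypothetical.

```python
# Minimal DAG-of-operators sketch: each operator transforms a record and
# passes it on; returning None drops the record mid-graph.
def parse(record):
    return {"value": int(record)}

def filter_positive(event):
    return event if event["value"] > 0 else None

def scale(event):
    return {"value": event["value"] * 10}

# The pipeline is a directed acyclic graph; here, a simple linear chain.
pipeline = [parse, filter_positive, scale]

def run(stream, operators):
    out = []
    for record in stream:
        event = record
        for op in operators:
            event = op(event)
            if event is None:   # filtered out mid-graph
                break
        else:
            out.append(event)
    return out

print(run(["1", "-2", "3"], pipeline))  # [{'value': 10}, {'value': 30}]
```

Real engines like Flink generalize this chain to branching and merging paths, add windowed state per operator, and run independent paths in parallel.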

5. Serverless Streaming Architecture


Overview: Employs serverless computing platforms (e.g., AWS Lambda, Google Cloud Functions,
Azure Functions) to process streaming data in an event-driven, pay-per-use manner.

• Serverless functions are invoked when new data arrives, ensuring real-time responses.
Novelty:

• Elastic scaling: Resources scale up/down automatically with data volume.


• Cost-efficiency: Pay only for computing time used during events.
• Easy integration with cloud-native streaming platforms (e.g., Amazon Kinesis).
Use Cases: Real-time log monitoring and anomaly detection in cloud infrastructure.
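The per-event invocation model can be sketched without any cloud provider: a toy dispatcher plays the role of the FaaS runtime and invokes a handler once per incoming event. The handler logic and event shape are hypothetical.

```python
# Serverless-style sketch: a handler is invoked once per incoming event;
# "platform_dispatch" below is a stand-in for a FaaS runtime such as
# AWS Lambda or Google Cloud Functions wired to a stream source.
invocations = []

def handler(event):
    """Hypothetical function triggered for each new log line."""
    invocations.append(event)
    return "ERROR" in event["line"]

def platform_dispatch(events):
    """Toy event source: one handler invocation per event, billed per call."""
    return [handler(e) for e in events]

alerts = platform_dispatch([
    {"line": "INFO boot"},
    {"line": "ERROR disk full"},
])
print(alerts)  # [False, True]
```

Note what is absent: there is no long-running server loop to provision; compute exists only for the duration of each invocation, which is the pay-per-use point above.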

6. Federated Streaming Architecture


Overview: Designed for environments where data cannot be centralized due to privacy, compliance, or
bandwidth constraints.
• Distributed nodes analyze local data streams independently, and only minimal insights
(aggregates or models) are shared.
• Often combined with Federated Learning techniques.

Novelty:

• Preserves privacy by processing data locally and sharing only necessary outcomes.
• Reduces network congestion by minimizing data transfer.
Use Cases: Healthcare analytics, smart cities, and distributed IoT networks.
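A minimal sketch of the federated pattern: each node computes a local summary (count and sum) over data that never leaves it, and only these aggregates are combined centrally. The readings are hypothetical.

```python
# Federated sketch: raw readings stay on their node; only aggregates
# (count and sum) are shared with the coordinator.
node_a_readings = [70, 72, 68]   # stays on node A
node_b_readings = [80, 82]       # stays on node B

def local_summary(readings):
    """Runs on each node: only this aggregate leaves the node."""
    return {"n": len(readings), "total": sum(readings)}

def global_mean(summaries):
    """Runs centrally on shared aggregates, never on raw records."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n

mean = global_mean([local_summary(node_a_readings),
                    local_summary(node_b_readings)])
print(mean)  # 74.4
```

The same shape scales up to federated learning, where the shared "summary" is a model update rather than a count and sum.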

7. Streaming Digital Twins Architecture


Overview: Creates digital twins—virtual models of physical entities—powered by real-time data
streams. Each twin constantly updates based on sensor data or live inputs.

• Edge computing components process data streams locally and sync with the central model.
Novelty:

• Combines edge analytics with real-time model updates for situational awareness.
• Supports predictive analytics through live simulation of scenarios.
Use Cases: Predictive maintenance in manufacturing, autonomous vehicles, and energy grid
management.
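The twin-as-live-model idea can be sketched as a small class whose state is synchronized with a sensor stream and which can answer analytical queries from that state. The pump example, thresholds, and field names are hypothetical.

```python
# Digital-twin sketch: a virtual model of a pump, updated from a live
# sensor stream; the twin answers predictive queries from its own state.
class PumpTwin:
    def __init__(self):
        self.temperature = None
        self.readings = 0

    def update(self, sensor_event):
        """Sync the virtual model with each incoming sensor reading."""
        self.temperature = sensor_event["temp"]
        self.readings += 1

    def overheating(self, limit=90):
        """A simple predictive check against the twin's current state."""
        return self.temperature is not None and self.temperature > limit

twin = PumpTwin()
for event in [{"temp": 70}, {"temp": 95}]:
    twin.update(event)
print(twin.overheating())  # True
```

In a production setup, the update path would typically run at the edge near the sensors, with periodic synchronization to a central copy of the twin, as the overview notes.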

8. In-Memory Streaming Architecture


Overview: Uses in-memory data grids (e.g., Apache Ignite, Redis) to process streams with minimal
latency. Data is held entirely in memory to avoid disk I/O bottlenecks.
Novelty:

• Achieves near real-time processing by reducing disk-based operations.


• Facilitates stateful computations for long-running processes (e.g., session management).
Use Cases: High-frequency trading, real-time recommendation engines.
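The stateful, memory-resident computation described above can be sketched with a sliding window held entirely in RAM, standing in for state kept in a grid like Redis or Ignite. Window size and prices are hypothetical.

```python
from collections import deque

# In-memory sketch: a sliding window of recent trades held entirely in RAM
# (a stand-in for state in an in-memory grid such as Redis or Apache Ignite).
WINDOW = 3
window = deque(maxlen=WINDOW)   # old entries are evicted in memory; no disk I/O

def on_trade(price):
    """Stateful computation: moving average over the last WINDOW trades."""
    window.append(price)
    return sum(window) / len(window)

averages = [on_trade(p) for p in [10, 20, 30, 40]]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Because both the window and the aggregate live in memory, each update costs only a few in-memory operations, which is how such systems reach the latencies needed for high-frequency trading.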

These architectures aim to balance scalability, latency, fault tolerance, and ease of integration. Each is
tailored to specific use cases—whether edge computing, privacy-preserving analytics, or high-
performance real-time systems—making them critical for the next generation of streaming analytics.

General Instructions:
❖ Submit a soft copy: a single .docx file containing the assignment.
❖ Submit it in the BLC's assignment section.
❖ Marks will be deducted if plagiarized work is submitted.
