Big Data Analytics Application

The document outlines the requirements for a semester project in Big Data Analytics & Applications for the Department of Computing & Information System. Students must form teams of two, select a topic from specified research areas, conduct a literature review to identify a research gap, and write a scientific paper following a structured format. The project is due on December 6, 2024, and emphasizes originality to avoid plagiarism penalties.

Department of Computing & Information System (CIS)

Subject: Big Data Analytics & Applications


Course Code: CIS 312
Semester Project
Fall 2024
Last date of Submission: 6th December 2024
---------------------------------------------------------------------------------------------------------------------------

Follow the instructions to complete your final semester project:


Instructions:
1. Form research teams; each team must have exactly two members.
2. Choose one topic from Research Area 01 or Research Area 02; each team may pick only
one topic.
3. Identify a research gap in your chosen topic through a literature review (using
databases such as Google Scholar and IEEE Xplore).
4. Write research questions and objectives based on your literature review.
5. Finally, write a scientific paper presenting your findings, structured as follows:
a. Abstract
b. Introduction
c. Literature Review
d. Methodology / Models
e. Result Analysis & Discussion
f. Conclusion
g. References
Research Area 01
1. Scalable Data Processing & Storage Technologies
• Distributed Systems: Optimization of frameworks like Hadoop, Apache Spark, and Flink.
• Cloud-based Data Management: Efficient data lakes, multi-cloud management, and hybrid
storage.
• Edge Computing: Processing data closer to the source to reduce latency.
• High-Performance Computing (HPC): Parallel computing techniques to handle large-scale
datasets.
Research Questions:
• How can data storage and retrieval efficiency be improved in large distributed environments?
• What novel architectures can optimize data processing at the edge?

2. Big Data Privacy & Security


• Privacy-preserving Analytics: Homomorphic encryption, differential privacy techniques.
• Secure Data Sharing & Collaboration: Federated learning across multiple stakeholders.
• Data Governance & Compliance: Handling GDPR, CCPA, and other regulations.
• Anomaly Detection: Identifying malicious activities in large datasets.
Research Questions:
• How can data utility be balanced with privacy guarantees?
• What methods can ensure security in real-time analytics?

3. Machine Learning and AI for Big Data


• Deep Learning at Scale: Training large-scale models with distributed data.
• AutoML for Big Data: Automating the selection and tuning of models.
• Reinforcement Learning: Adapting learning algorithms to large dynamic datasets.
• Explainable AI (XAI): Interpreting complex models used for analytics.
Research Questions:
• How can ML algorithms be optimized for streaming data?
• What methods can enhance the interpretability of large-scale models?

4. Real-time and Streaming Analytics


• Event Processing Frameworks: Handling high-velocity data streams.
• Data Streams Integration: Combining streams with static datasets.
• Latency Reduction Techniques: Optimizing for low-latency insights.
Research Questions:
• How can scalable algorithms for real-time decision-making be developed?
• What novel architectures can better support streaming analytics?

5. Big Data Visualization & Interaction


• Visual Analytics: Creating scalable visualization tools for large datasets.
• Interactive Dashboards: Ensuring performance with high-dimensional data.
• Cognitive Load Management: Designing interfaces for data comprehension.
Research Questions:
• What techniques enhance user interaction with large datasets?
• How can visualizations be adapted to non-technical audiences?

6. Big Data Applications in Specific Domains


• Healthcare: Predictive models for patient outcomes, epidemic detection.
• Finance: Fraud detection, algorithmic trading.
• Smart Cities: Urban planning using sensor data.
• Agriculture: Precision farming and crop yield prediction.
Research Questions:
• How can domain-specific challenges be addressed using Big Data analytics?
• What are the unique data integration challenges in cross-domain analytics?

7. Ethics and Bias in Big Data Analytics


• Bias Detection & Mitigation: Identifying biases in datasets and algorithms.
• Fairness in Decision Making: Ensuring equitable outcomes from data-driven decisions.
• Algorithmic Accountability: Ensuring transparency and accountability in analytics systems.
Research Questions:
• How can hidden biases in Big Data pipelines be detected and eliminated?
• What frameworks can ensure the ethical use of Big Data?
8. IoT and Sensor Data Analytics
• IoT Data Fusion: Combining data from heterogeneous sensors.
• Predictive Maintenance: Analyzing sensor data for equipment failure.
• Energy-efficient Analytics: Reducing energy consumption in IoT analytics.
Research Questions:
• How can noisy and incomplete sensor data be handled?
• What architectures support continuous learning from IoT devices?

Research Area 02
Designing architectures to support streaming analytics requires handling high-velocity, real-time data
efficiently. Novel architectures focus on processing data in motion, reducing latency, scaling
dynamically, and providing timely insights. Here are some cutting-edge architectures to support
streaming analytics:
1. Lambda Architecture 2.0 (Augmented Lambda)
Overview: Extends the traditional Lambda architecture by adding more real-time processing
capabilities. It separates data into:
• Batch layer: Stores historical data for deep analysis.
• Speed layer: Processes real-time data streams.
• Serving layer: Combines results from both layers for quick query responses.
Novelty:

• Uses stream-first processing, where real-time data is prioritized over batch jobs.
• Advances in tools like Apache Beam support unified APIs for both batch and streaming data.
Use Cases: Real-time fraud detection and live analytics for social media feeds.
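As a minimal illustration (not part of any specific framework), the three layers above can be sketched in plain Python: a precomputed batch view, a speed layer that counts events arriving after the last batch run, and a serving layer that merges both at query time. All names and data are hypothetical.

```python
from collections import defaultdict

# Batch layer: a precomputed view over historical events (hypothetical data).
batch_view = {"user_a": 100, "user_b": 42}

# Speed layer: incrementally counts events that arrived after the last batch run.
speed_view = defaultdict(int)

def ingest_realtime(event):
    """Speed layer: update the real-time view as each event arrives."""
    speed_view[event["user"]] += 1

def query(user):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view.get(user, 0) + speed_view.get(user, 0)

# New events arrive on the stream before the next batch recomputation.
for e in [{"user": "user_a"}, {"user": "user_c"}]:
    ingest_realtime(e)

print(query("user_a"))  # 101
print(query("user_c"))  # 1
```

The key property of the serving layer is visible here: queries see both the deep (batch) history and the most recent (speed) events without waiting for the next batch job.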

2. Kappa Architecture
Overview: A simplification of the Lambda architecture, focusing only on real-time processing. Instead
of separating batch and speed layers, all data is treated as a stream.
Core Technology:

• Tools like Kafka Streams, Flink, and Apache Pulsar make Kappa feasible.
Novelty:

• Data is processed continuously, even for historical datasets, by replaying event logs.
• Avoids the complexity of maintaining two separate processing paths (batch and real-time).
Use Cases: Real-time recommendations, IoT data analytics.
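The Kappa idea of "everything is a stream" can be sketched as follows: one processing function handles both live events and historical reprocessing, because reprocessing is simply a replay of the event log. The event log contents here are hypothetical.

```python
# Kappa-style sketch: a single processing path; historical data is handled
# by replaying the same event log through the same stream logic.
event_log = [
    {"sensor": "s1", "reading": 20},
    {"sensor": "s1", "reading": 22},
    {"sensor": "s2", "reading": 5},
]

def build_state(events):
    """Single stream processor: keep the latest reading per sensor.
    Live events and log replays go through this same code path."""
    state = {}
    for event in events:
        state[event["sensor"]] = event["reading"]
    return state

# Reprocessing history is just a replay of the log, not a separate batch job.
state = build_state(event_log)
print(state)  # {'s1': 22, 's2': 5}
```

This is why Kappa avoids the dual-path complexity of Lambda: there is no batch codebase to keep in sync with the streaming one.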
3. Microservices-based Streaming Architecture
Overview: Leverages microservices to create loosely coupled, independently deployable services for
streaming data.

• Data Pipelines are broken into smaller components, each service processing and forwarding
data.
• Services communicate via event-driven platforms like Apache Kafka, RabbitMQ, or AWS
Kinesis.
Novelty:

• Auto-scaling ensures services adapt dynamically to changes in data load.


• Fault isolation: Failures in one service do not crash the entire pipeline.
• Supports polyglot programming—different services can be built using different languages or
frameworks.
Use Cases: E-commerce order tracking, event-driven financial analytics.
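A toy version of this decoupling can be shown with in-process queues standing in for a broker such as Kafka: each "service" only reads from and writes to queues, never calls the other directly. The service and field names are hypothetical.

```python
from queue import Queue

# Queues stand in for broker topics (e.g., Kafka) between two toy services.
orders = Queue()     # topic: raw orders
enriched = Queue()   # topic: enriched orders

def enrichment_service():
    """Consumes raw orders, forwards enriched events downstream."""
    while not orders.empty():
        order = orders.get()
        order["status"] = "validated"
        enriched.put(order)

def analytics_service():
    """Independently consumes enriched events; a failure here would not
    affect the enrichment service, only this consumer."""
    total = 0
    while not enriched.empty():
        total += enriched.get()["amount"]
    return total

orders.put({"id": 1, "amount": 10})
orders.put({"id": 2, "amount": 5})
enrichment_service()
total = analytics_service()
print(total)  # 15
```

Because the services share only message schemas, each one could be rewritten in a different language or scaled independently, which is the polyglot and auto-scaling point above.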

4. Graph-based Streaming Architecture


Overview: Uses stream processing frameworks such as Apache Flink or Apache Storm, which model
computations as Directed Acyclic Graphs (DAGs) of operators, to handle complex event streams.

• Data flows through a sequence of operators organized as a graph.


Novelty:

• Supports stateful streaming with windowed processing.


• Optimizes for multi-path data flows—data can be processed along different paths in parallel.
Use Cases: Network traffic monitoring, fraud detection pipelines, complex IoT event streams.
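The operator-graph idea can be sketched with a simple linear chain of operators (a degenerate DAG): each record flows through parse, filter, and transform stages, and a filter can drop it mid-graph. Operator names and data are hypothetical.

```python
# Minimal DAG-of-operators sketch: each operator transforms a record and
# passes it on; returning None drops the record mid-graph.
def parse(record):
    return {"value": int(record)}

def filter_positive(event):
    return event if event["value"] > 0 else None

def scale(event):
    return {"value": event["value"] * 10}

# The pipeline is a directed acyclic graph; here, a simple linear chain.
pipeline = [parse, filter_positive, scale]

def run(stream, operators):
    out = []
    for record in stream:
        event = record
        for op in operators:
            event = op(event)
            if event is None:   # filtered out mid-graph
                break
        else:
            out.append(event)
    return out

print(run(["1", "-2", "3"], pipeline))  # [{'value': 10}, {'value': 30}]
```

Real engines like Flink generalize this chain to branching and merging paths, add windowed state per operator, and run independent paths in parallel.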

5. Serverless Streaming Architecture


Overview: Employs serverless computing platforms (e.g., AWS Lambda, Google Cloud Functions,
Azure Functions) to process streaming data in an event-driven, pay-per-use manner.

• Serverless functions are invoked when new data arrives, ensuring real-time responses.
Novelty:

• Elastic scaling: Resources scale up/down automatically with data volume.


• Cost-efficiency: Pay only for computing time used during events.
• Easy integration with cloud-native streaming platforms (e.g., Amazon Kinesis).
Use Cases: Real-time log monitoring and anomaly detection in cloud infrastructure.
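The per-event invocation model can be sketched without any cloud provider: a toy dispatcher plays the role of the FaaS runtime and invokes a handler once per incoming event. The handler logic and event shape are hypothetical.

```python
# Serverless-style sketch: a handler is invoked once per incoming event;
# "platform_dispatch" below is a stand-in for a FaaS runtime such as
# AWS Lambda or Google Cloud Functions wired to a stream source.
invocations = []

def handler(event):
    """Hypothetical function triggered for each new log line."""
    invocations.append(event)
    return "ERROR" in event["line"]

def platform_dispatch(events):
    """Toy event source: one handler invocation per event, billed per call."""
    return [handler(e) for e in events]

alerts = platform_dispatch([
    {"line": "INFO boot"},
    {"line": "ERROR disk full"},
])
print(alerts)  # [False, True]
```

Note what is absent: there is no long-running server loop to provision; compute exists only for the duration of each invocation, which is the pay-per-use point above.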

6. Federated Streaming Architecture


Overview: Designed for environments where data cannot be centralized due to privacy, compliance, or
bandwidth constraints.
• Distributed nodes analyze local data streams independently, and only minimal insights
(aggregates or models) are shared.
• Often combined with Federated Learning techniques.

Novelty:

• Preserves privacy by processing data locally and sharing only necessary outcomes.
• Reduces network congestion by minimizing data transfer.
Use Cases: Healthcare analytics, smart cities, and distributed IoT networks.
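A minimal sketch of the federated pattern: each node computes a local summary (count and sum) over data that never leaves it, and only these aggregates are combined centrally. The readings are hypothetical.

```python
# Federated sketch: raw readings stay on their node; only aggregates
# (count and sum) are shared with the coordinator.
node_a_readings = [70, 72, 68]   # stays on node A
node_b_readings = [80, 82]       # stays on node B

def local_summary(readings):
    """Runs on each node: only this aggregate leaves the node."""
    return {"n": len(readings), "total": sum(readings)}

def global_mean(summaries):
    """Runs centrally on shared aggregates, never on raw records."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n

mean = global_mean([local_summary(node_a_readings),
                    local_summary(node_b_readings)])
print(mean)  # 74.4
```

The same shape scales up to federated learning, where the shared "summary" is a model update rather than a count and sum.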

7. Streaming Digital Twins Architecture


Overview: Creates digital twins—virtual models of physical entities—powered by real-time data
streams. Each twin constantly updates based on sensor data or live inputs.

• Edge computing components process data streams locally and sync with the central model.
Novelty:

• Combines edge analytics with real-time model updates for situational awareness.
• Supports predictive analytics through live simulation of scenarios.
Use Cases: Predictive maintenance in manufacturing, autonomous vehicles, and energy grid
management.
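The twin-as-live-model idea can be sketched as a small class whose state is synchronized with a sensor stream and which can answer analytical queries from that state. The pump example, thresholds, and field names are hypothetical.

```python
# Digital-twin sketch: a virtual model of a pump, updated from a live
# sensor stream; the twin answers predictive queries from its own state.
class PumpTwin:
    def __init__(self):
        self.temperature = None
        self.readings = 0

    def update(self, sensor_event):
        """Sync the virtual model with each incoming sensor reading."""
        self.temperature = sensor_event["temp"]
        self.readings += 1

    def overheating(self, limit=90):
        """A simple predictive check against the twin's current state."""
        return self.temperature is not None and self.temperature > limit

twin = PumpTwin()
for event in [{"temp": 70}, {"temp": 95}]:
    twin.update(event)
print(twin.overheating())  # True
```

In a production setup, the update path would typically run at the edge near the sensors, with periodic synchronization to a central copy of the twin, as the overview notes.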

8. In-Memory Streaming Architecture


Overview: Uses in-memory data grids (e.g., Apache Ignite, Redis) to process streams with minimal
latency. Data is held entirely in memory to avoid disk I/O bottlenecks.
Novelty:

• Achieves near real-time processing by reducing disk-based operations.


• Facilitates stateful computations for long-running processes (e.g., session management).
Use Cases: High-frequency trading, real-time recommendation engines.
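The stateful, memory-resident computation described above can be sketched with a sliding window held entirely in RAM, standing in for state kept in a grid like Redis or Ignite. Window size and prices are hypothetical.

```python
from collections import deque

# In-memory sketch: a sliding window of recent trades held entirely in RAM
# (a stand-in for state in an in-memory grid such as Redis or Apache Ignite).
WINDOW = 3
window = deque(maxlen=WINDOW)   # old entries are evicted in memory; no disk I/O

def on_trade(price):
    """Stateful computation: moving average over the last WINDOW trades."""
    window.append(price)
    return sum(window) / len(window)

averages = [on_trade(p) for p in [10, 20, 30, 40]]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Because both the window and the aggregate live in memory, each update costs only a few in-memory operations, which is how such systems reach the latencies needed for high-frequency trading.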

These architectures aim to balance scalability, latency, fault tolerance, and ease of integration. Each is
tailored to specific use cases—whether edge computing, privacy-preserving analytics, or high-
performance real-time systems—making them critical for the next generation of streaming analytics.

General Instructions:
❖ Submit a soft copy: a single .docx file containing the assignment.
❖ Submit it in the BLC's assignment section.
❖ Marks will be deducted if plagiarized work is submitted.
