Assignment Big Data Analytics
Assignment: 01
Date: 07/11/2024
Question 1
Describe the evolution of Hadoop and its significance in the context of big data
processing.
Explain how Hadoop's ecosystem has grown to support various data science tasks.
List and briefly describe some of the key tools and technologies that are part of the
Hadoop ecosystem.
Question 2
Explain the concept of a distributed file system in Hadoop.
Discuss how the distributed file system enables cost-effective storage and processing
of large datasets.
Describe the role of resource management and scheduling in Hadoop.
Question 3
Discuss the advantages of using Hadoop for data science tasks, including its ability to
handle unstructured and semi-structured data.
Explain how Hadoop supports multiple programming languages and provides robust
scheduling and resource management capabilities.
Provide examples of how Hadoop's tools and technologies, such as Apache Pig,
Apache Sqoop, and Apache Flume, can be used to support data science workflows.
Big Data Analytics
Assignment: 02
Date: 12/12/2024
Clustering
Overview of Clustering
Uses of Clustering
Designing a Similarity Measure
Distance Functions
Similarity Functions
Clustering Algorithms
Example: Clustering Algorithms
k-means Clustering
Latent Dirichlet Allocation
Evaluating the Clusters and Choosing the Number of Clusters
Building Big Data Clustering Solutions
Example: Topic Modeling with Latent Dirichlet Allocation
Feature Generation
Running Latent Dirichlet Allocation
Big Data Analytics
Assignment: 03
Date: 09/01/2025
Anomaly Detection with Hadoop
Overview
Uses of Anomaly Detection
Types of Anomalies in Data
Approaches to Anomaly Detection
Rules-based Methods
Supervised Learning Methods
Unsupervised Learning Methods
Semi-Supervised Learning Methods
Tuning Anomaly Detection Systems
Building a Big Data Anomaly Detection Solution with Hadoop
Example: Detecting Network Intrusions
Data Ingestion
Building a Classifier
Evaluating Performance