Amazon Capstone Project

Uploaded by

famell qawiem
Copyright
© All Rights Reserved

To meet the company's objectives for ingesting and transforming data into its data lake, and for
providing dashboards that visualise that data, the following options can be investigated:

Data Ingestion via IoT Sensors:

Use Amazon Kinesis Data Streams or AWS IoT Core to capture real-time data from IoT sensors.

Configure Amazon Kinesis Data Firehose to transform the data and deliver it to the data lake in near real time.

Create an AWS Glue crawler to catalogue the data and make it available for analysis.

Use Apache Hadoop-based frameworks such as Apache Spark on Amazon EMR to process and
analyse IoT data.
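
As an illustration of the first step, the sketch below builds the arguments for a Kinesis Data Streams PutRecord call from a single sensor reading. The stream name, the payload fields (sensor_id, temperature_c) and the ingested_at stamp are assumptions made for the example; they do not come from the project brief.

```python
import json
from datetime import datetime, timezone


def build_put_record_params(stream_name, reading):
    """Build keyword arguments for the Kinesis Data Streams PutRecord call.

    `reading` is a hypothetical sensor payload such as
    {"sensor_id": "s-1", "temperature_c": 21.5}.
    """
    payload = dict(reading)
    # Stamp the record at ingestion time if the sensor did not supply one.
    payload.setdefault("ingested_at", datetime.now(timezone.utc).isoformat())
    return {
        "StreamName": stream_name,
        # Kinesis expects the record body as bytes.
        "Data": json.dumps(payload, sort_keys=True).encode("utf-8"),
        # Records with the same partition key are routed to the same shard,
        # so keying by sensor keeps each sensor's readings together.
        "PartitionKey": str(payload["sensor_id"]),
    }


if __name__ == "__main__":
    params = build_put_record_params(
        "iot-sensor-stream",  # hypothetical stream name
        {"sensor_id": "s-1", "temperature_c": 21.5},
    )
    print(params["PartitionKey"], json.loads(params["Data"]))
    # With boto3 installed and AWS credentials configured, the same
    # dictionary could be sent with:
    #   boto3.client("kinesis").put_record(**params)
```

Keeping the parameter building separate from the network call makes the payload format easy to check before any AWS resources exist.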

Database Data Ingestion:

To migrate data from an on-premises database to Amazon S3, use AWS Database Migration Service
(DMS).

Use AWS Glue ETL jobs or Apache Spark on Amazon EMR to prepare the data for analysis.

Store the transformed data in the data lake for subsequent use.
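
A DMS replication task takes a table-mappings JSON document that selects which schemas and tables to migrate. The sketch below generates such a document; the schema names and the helper function names are hypothetical, and the endpoints and replication instance that a real task also needs are not shown.

```python
import json


def selection_rule(rule_id, schema, table="%"):
    """Return one DMS selection rule that includes the matching tables."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{rule_id}",
        # "%" is the DMS wildcard, so the default selects every table.
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def build_table_mappings(schemas):
    """Build the table-mappings JSON document for a DMS replication task."""
    rules = [selection_rule(i, s) for i, s in enumerate(schemas, start=1)]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # Hypothetical schema names for the on-premises database.
    print(build_table_mappings(["sales", "inventory"]))
```

The resulting string is the kind of value that would be supplied as the TableMappings argument when the task is created, for example through the boto3 DMS client's create_replication_task call.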

Third-Party Data Ingestion:

Obtain additional data from third-party organisations through their APIs or data transfer mechanisms.

Before placing the data in the data lake, use AWS Lambda or Amazon EC2 instances to process and enrich it.

Use AWS Glue or Apache Spark to perform transformations as needed.
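
The processing step could look roughly like the sketch below, which adds provenance fields to each third-party record and writes the batch to S3 as newline-delimited JSON. The field names (source, received_at), the bucket and the key are assumptions for illustration; the s3_client argument is expected to behave like boto3.client("s3").

```python
import json
from datetime import datetime, timezone


def enrich(record, source):
    """Return a copy of a third-party record with provenance fields added."""
    enriched = dict(record)
    enriched["source"] = source
    enriched["received_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def to_json_lines(records):
    """Serialise records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def store(s3_client, bucket, key, records, source):
    """Enrich records and write them to the data lake as one S3 object.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    body = to_json_lines(enrich(r, source) for r in records)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key
```

Inside a Lambda function, store would be called from the handler with the records fetched from the third-party API. Passing the client in as an argument also lets the logic be exercised locally with a stub.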

Cleaning and Transformation of Data:

Use AWS Glue ETL jobs or Apache Spark on Amazon EMR to clean, transform, and enrich the
ingested data.

To reuse the team's existing Apache Hadoop skills, run Apache Spark on Amazon EMR, which
provides a comparable environment and comparable capabilities.
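
The kind of cleaning rules such a job applies can be sketched in plain Python, as below. This is not Glue or Spark code, and the field names (sensor_id, temperature_c) are hypothetical; in a Spark job the same rules would normally be expressed as DataFrame operations such as dropna and dropDuplicates.

```python
def clean_records(records):
    """Drop incomplete rows, normalise types, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        # Skip rows where a required field is missing or empty.
        if any(row.get(f) in (None, "") for f in ("sensor_id", "temperature_c")):
            continue
        try:
            sensor_id = str(row["sensor_id"]).strip().lower()
            temperature = float(row["temperature_c"])
        except (TypeError, ValueError):
            continue  # temperature was not numeric
        # Deduplicate on the normalised values, keeping the first occurrence.
        if (sensor_id, temperature) in seen:
            continue
        seen.add((sensor_id, temperature))
        cleaned.append({"sensor_id": sensor_id, "temperature_c": temperature})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"sensor_id": " S-1 ", "temperature_c": "21.5"},
        {"sensor_id": "s-1", "temperature_c": 21.5},
        {"sensor_id": "s-2", "temperature_c": None},
    ]
    print(clean_records(sample))
```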

Dashboard Design:

Create dynamic dashboards and visualisations using Amazon QuickSight, a cloud-scale business
intelligence (BI) tool.

Connect QuickSight to the transformed data in the data lake (for example, by querying Amazon S3 through Amazon Athena) and build visualisations on it.

For insights and analysis, share the dashboards with the analytics team and other stakeholders.

By using AWS services such as Amazon S3, Amazon Kinesis, AWS Glue, and Amazon EMR, the
organisation can ingest and transform data from multiple sources into its data lake with
technologies comparable to its present Apache Hadoop-based setup. Analysts can then use
Amazon QuickSight to build interactive dashboards that visualise the data.
