Final project on data lakes with AWS
• IoT sensors sending real-time data, represented by the "Data Stream" logo.
• A database with historical records, represented by the "Multimedia" logo.
• Additional data from third-party entities to enrich the internally generated data, represented
by the "Database" logo.
• The data stream is going to be managed by Amazon Kinesis Data Firehose. This service will
continuously prepare and load the data into our storage.
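To make the ingestion step concrete, here is a minimal pure-Python sketch of how incoming sensor readings could be grouped for Firehose's PutRecordBatch API, which accepts at most 500 records per call. The record format and field names are assumptions for illustration; a real producer would send each batch with the boto3 Firehose client.

```python
# Hypothetical sketch: batching sensor readings for Firehose's
# PutRecordBatch API, which accepts at most 500 records per call.
import json

MAX_BATCH_RECORDS = 500  # PutRecordBatch per-call record limit

def to_batches(readings, batch_size=MAX_BATCH_RECORDS):
    """Serialize readings and group them into Firehose-sized batches."""
    records = [{"Data": (json.dumps(r) + "\n").encode()} for r in readings]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

# Example: 1,200 readings become three batches (500 + 500 + 200).
batches = to_batches([{"sensor": i, "temp": 20.0} for i in range(1200)])
```

Newline-delimited JSON is used here because it keeps the objects Firehose delivers to S3 easy to query later.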
• Kinesis Video Streams is adapted for multimedia. It ingests, durably stores, encrypts,
and indexes video streams for real-time and batch analytics.
• Snowcone is rugged, secure, and designed for use outside of a traditional data center. Its
compact size makes it ideal for confined spaces or when portability is a necessity. You can use
Snowcone in the backpacks of first responders or for IoT, vehicles, and even drones. You can
run edge computing applications, and you can ship the device with data to AWS for offline
data transfer, or you can transfer data online with AWS DataSync from edge locations.
• As suggested in the scenario, the storage that we are going to use is Amazon S3, and
indeed that's a good choice. Amazon Simple Storage Service (Amazon S3) is an object storage service
that offers scalable capacity, data availability, top-tier security, and performance. Customers
of all sizes and across all industries can store and protect any amount of data for nearly all
use cases, such as data lakes and cloud-native and mobile applications. With cost-effective
storage classes and user-friendly management features, you can optimize costs, organize
data, and configure precise access controls to meet specific operational, organizational, and
compliance requirements.
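One way to "organize data and configure precise access controls" in practice is to lay out the bucket with Hive-style partition keys, which Glue and Athena can use to prune partitions. The sketch below builds such a key; the `raw/` prefix and source names are assumptions for illustration.

```python
# Hypothetical sketch: a Hive-style partitioned key layout for the
# data lake's S3 bucket, so Glue and Athena can prune partitions.
from datetime import datetime, timezone

def object_key(source: str, event_time: datetime, filename: str) -> str:
    """Build a key like raw/iot/year=2024/month=05/day=17/part-0001.json."""
    return (
        f"raw/{source}/"
        f"year={event_time.year}/month={event_time.month:02d}/day={event_time.day:02d}/"
        f"{filename}"
    )

key = object_key("iot", datetime(2024, 5, 17, tzinfo=timezone.utc), "part-0001.json")
```

A consistent prefix scheme also makes it easy to scope IAM policies to a single zone or source.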
• In the data lake, we are also using AWS Glue. It helps us extract data from sources,
transform it, and load it into targets by running a script.
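A real Glue job script would use the awsglue library (GlueContext, DynamicFrames); the pure-Python sketch below only illustrates the transform step of such a script, with raw and target field names that are assumptions.

```python
# Hypothetical sketch of the transform step a Glue script performs:
# normalize raw records before loading them into the curated zone.
def transform(raw_records):
    """Keep complete records and map raw field names to the lake schema."""
    out = []
    for rec in raw_records:
        if rec.get("temp") is None:          # drop incomplete readings
            continue
        out.append({
            "sensor_id": rec["id"],          # rename to the target schema
            "temperature_c": float(rec["temp"]),
        })
    return out

rows = transform([{"id": 1, "temp": "21.5"}, {"id": 2, "temp": None}])
```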
For the consumption part, we are going to use the following services:
• The first one is Amazon EMR, which is suggested because the client wants to use
Hadoop. Indeed, Amazon EMR is a managed cluster platform that simplifies running big data
frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze large
amounts of data. By using these frameworks and associated open-source projects like Apache
Hive and Apache Pig, you can process data for analytical and business intelligence workloads.
Additionally, you can use Amazon EMR to transform and move large amounts of data to and
from other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon
S3) and Amazon DynamoDB.
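Since the client wants Hadoop, it is worth recalling the MapReduce model that Hadoop on EMR parallelizes across the cluster. The toy word count below runs the same map → shuffle → reduce phases in a single process, purely to illustrate the programming model.

```python
# Hypothetical sketch of the MapReduce model that Hadoop (and EMR)
# distributes: a map phase emits key/value pairs, a shuffle groups
# them by key, and a reduce phase aggregates each group.
from collections import defaultdict

def map_phase(line):
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    return key, sum(values)

def mapreduce(lines):
    groups = defaultdict(list)               # shuffle: group pairs by key
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

counts = mapreduce(["big data", "big clusters"])
# counts == {"big": 2, "data": 1, "clusters": 1}
```

On EMR the map and reduce phases run on many nodes against data in HDFS or S3; the structure of the program stays the same.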
• The second one is Amazon Athena. Amazon Athena is an interactive query service that makes
it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard
SQL. In just a few steps in the AWS Management Console, you can point Athena to your data
stored in Amazon S3 and start using standard SQL to run ad-hoc queries and get results in
seconds.
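As a concrete illustration, here are the parameters such an ad-hoc query would take through Athena's StartQueryExecution API. The table, database, and results-bucket names are assumptions; a real call would pass this dict to boto3's Athena client as `client.start_query_execution(**params)`.

```python
# Hypothetical sketch: parameters for an Athena StartQueryExecution
# call. Table, database, and bucket names are made up for illustration.
params = {
    "QueryString": (
        "SELECT sensor_id, avg(temperature_c) AS avg_temp "
        "FROM sensor_readings "
        "WHERE year = '2024' "
        "GROUP BY sensor_id"
    ),
    "QueryExecutionContext": {"Database": "datalake"},
    "ResultConfiguration": {"OutputLocation": "s3://my-athena-results/"},
}
```

Note how the WHERE clause filters on the partition column, so Athena only scans the matching S3 prefixes.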
• The third one is Amazon QuickSight for the visualization of the data. Amazon QuickSight is an
ultra-fast and user-friendly cloud-based business analytics service that enables all employees
in an organization to quickly create visualizations, perform ad-hoc analysis, and gain market
insights from their data, anywhere, on any device. Load CSV and Excel files, connect to SaaS
applications like Salesforce, access on-premises databases such as SQL Server, MySQL, and
PostgreSQL, and seamlessly discover your AWS data sources such as Amazon Redshift,
Amazon RDS, Amazon Aurora, Amazon Athena, and Amazon S3. QuickSight allows
organizations to scale their business analytics capabilities for hundreds of thousands of users
and deliver fast, responsive query performance using a robust in-memory engine (SPICE).
• The last one is Amazon Redshift. Amazon Redshift simplifies and cost-effectively enables high-
performance querying of petabytes of structured data, allowing you to create powerful
reports and dashboards using your existing business intelligence tools.
For the governance and security part, we are going to use these services:
• The first one is AWS Lake Formation. AWS Lake Formation makes it easy to create secure
data lakes, making data available for large-scale analytics.
• The second one is AWS CloudTrail which can monitor and record account activity across your
entire AWS infrastructure, giving you control over storage, analysis, and corrective actions.
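To show what "analysis and corrective actions" can start from, the sketch below pulls the fields an audit check would typically inspect out of a CloudTrail event record. The event itself is minimal and made up; real records carry many more fields.

```python
# Hypothetical sketch: extracting audit-relevant fields from a
# (minimal, made-up) CloudTrail event record.
import json

event_json = """{
  "eventTime": "2024-05-17T12:00:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "userIdentity": {"type": "IAMUser", "userName": "etl-job"}
}"""

event = json.loads(event_json)
summary = (event["userIdentity"]["userName"], event["eventName"])
# summary == ("etl-job", "PutObject")
```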