dynamodb

DynamoDB is a fully managed, serverless NoSQL database by AWS that offers high availability, low latency, and scalability for massive workloads. It features a key-value data model with a primary key structure and integrates with IAM for security and authorization. Additionally, DynamoDB Accelerator (DAX) enhances performance by providing an in-memory caching layer, allowing for microsecond latency in data retrieval.

So now let's talk about DynamoDB.

DynamoDB is a fully managed, highly available database with replication across three Availability Zones. It is part of the NoSQL database family, so it's not a relational database.
DynamoDB is one of the flagship products of AWS. It scales to massive workloads and it's a distributed, serverless database, which means that we don't provision any servers.
With RDS or with ElastiCache we need to provision an instance type, but with DynamoDB we don't.
So it's called a serverless database, but there are still servers in the backend, we just don't see them.
So DynamoDB is great because it scales to millions of requests per second, trillions of rows, and hundreds of terabytes of storage.
It has fast and consistent performance. And so anytime you need a single digit
millisecond latency with low latency retrieval, DynamoDB is the database for you.
So you would be looking for keywords in your exam, such as serverless and low
latency. For example, single digit millisecond latency.
It is integrated with IAM for security, authorization, and administration. It has low cost and auto-scaling capabilities. And finally, it has Standard and Infrequent Access (IA) table classes, based on how you want to classify your data for cost savings.
So the question you may have is, what type of data goes into DynamoDB? Well, it's a
key value database and the data looks like this.
You have a primary key, which is made of one or two columns, a partition key and a
sort key. And then attributes on the right hand side where you can define your own
columns for your data.
Finally, all the items are going to be row by row. And this is how a DynamoDB table
works. Very, very simple, but again, remember it's a NoSQL database.
It has low latency retrieval of data and you also get access to a serverless
database.
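That key-value model can be sketched in plain Python (this is just a conceptual model, not the DynamoDB API; the attribute names user_id and game_id are made up for illustration): items are addressed by their primary key, made of a partition key and an optional sort key, and the remaining attributes are free-form.

```python
# Toy model of a DynamoDB table: items are addressed by a primary key
# (partition key + sort key); the other attributes are schemaless.
table = {}

def put_item(partition_key, sort_key, **attributes):
    # Each item is stored under its full primary key.
    table[(partition_key, sort_key)] = {
        "user_id": partition_key,
        "game_id": sort_key,
        **attributes,
    }

def get_item(partition_key, sort_key):
    # Single-digit-millisecond lookups in real DynamoDB; a dict lookup here.
    return table.get((partition_key, sort_key))

put_item("u1", "g1", score=92)
put_item("u1", "g2", score=47, result="win")  # different attributes are fine

print(get_item("u1", "g2"))
# → {'user_id': 'u1', 'game_id': 'g2', 'score': 47, 'result': 'win'}
```

Note how the two items carry different attributes: only the primary key is fixed.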
Then we have DynamoDB Accelerator, also called DAX; in the exam, both names can be used. It is a fully managed in-memory cache for DynamoDB. So this is a cache that's specific to DynamoDB, so it's not like ElastiCache. So for example, say your application wants to access tables in DynamoDB, and you want to cache the most frequently read objects; then you would use DAX, or DynamoDB Accelerator, as a cache in between. And DAX is made just for DynamoDB. Okay? So you would not use ElastiCache in this case. Even though you could, you would use DAX because it's already really, really well integrated with DynamoDB.
So this is going to give you a 10x performance improvement. So instead of single-digit millisecond latency to read records, you will have microsecond latency when accessing your DynamoDB tables.
It's going to be fully secure, highly scalable, and highly available. And the difference with ElastiCache, again, is that DAX is fully integrated with DynamoDB, whereas ElastiCache can be used for other types of databases to provide caching capabilities.
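The caching idea behind DAX can be sketched as a cache-aside read path in plain Python (a toy sketch only; the real DAX client is a drop-in replacement for the DynamoDB client that manages the cache for you, with no code like this to write):

```python
import time

def fetch_from_dynamodb(key):
    # Placeholder for a real table read (~single-digit milliseconds in DynamoDB).
    time.sleep(0.001)
    return {"user_id": key, "first_name": "Stephane"}

cache = {}  # DAX plays this role: an in-memory, microsecond-latency cache

def cached_get(key):
    # Cache hit: served from memory, no table read at all.
    if key in cache:
        return cache[key]
    # Cache miss: read the table, then populate the cache for next time.
    item = fetch_from_dynamodb(key)
    cache[key] = item
    return item

cached_get("1234")  # first read hits the table
cached_get("1234")  # second read is served from the in-memory cache
```

Frequently read items are therefore served in microseconds instead of milliseconds, which is exactly the benefit the lecture describes.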
So that's it for this lecture. I hope you liked it and I will see you in the next
lecture.

Okay, so let's go ahead and practice DynamoDB a little bit. So I'm going to create
a table, and I'll call this one DemoTable.
Now in DynamoDB what you need to do is specify a partition key, so I specify user_id, and sort keys are definitely out of scope for the exam,
so let's just consider the partition key. Okay, so now for the settings in DynamoDB, again, I'm going to leave the default settings;
you don't need to know the details of how it works, and then you scroll down and you click on Create table.
So our table is now creating, and what we notice is that we are creating a table without creating a database.
So the database already exists, it's serverless, we don't need to provision
servers.
We just have to say, "Hey, look, I want this table, please create it for me and I don't care how it's being run." And that's the whole power of DynamoDB,
that's the whole power of serverless services.
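The same table could also be created through the AWS SDK; here is a minimal sketch of the CreateTable request in boto3-style parameters (assuming on-demand billing, which is one possible choice; the actual SDK call is shown in a comment since it needs AWS credentials):

```python
def create_table_request(table_name, partition_key):
    # Parameters for the DynamoDB CreateTable API: only the key attributes
    # are declared up front; all other attributes remain schemaless.
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: no capacity to provision
    }

params = create_table_request("DemoTable", "user_id")
# With boto3 installed and credentials configured, you would run:
# import boto3
# boto3.client("dynamodb").create_table(**params)
```

Notice there is no "create database" step anywhere: you only describe the table and its key, which is the serverless point made above.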
So now that the table is ready we can click on View items, and practice a little bit inserting some data into this table. So currently zero items are returned because I haven't created anything, but we can create an item in DynamoDB. And for user_id I can say 1234, so that will be my user ID. We can have a first name equal to Stephane, and we can have a last name equal to Maarek, and then, finally, we could have a number. So I can see a lot of different types we can have; I choose number, and it could be my favorite number, and it's 42, and click on Create item. And here we go, my item was written into DynamoDB. So this is super easy; yet, again, I did not have to specify any database, just my table, then write some items, and so on. And we didn't have to specify a schema, it just gets automatically inferred; now we have four attributes, or columns. Now I can create a second item, so 45678, and then add a new string, for example, first_name. In here we're going to have Alice, and click on Create item. So, as we can see, in this example, I didn't specify a favorite number or last name for Alice, I just specified first name Alice, and it was still accepted by DynamoDB. So it's a very flexible type of database, it's a very flexible way to insert data, and this whole set of features and particularities makes DynamoDB really, really good.
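For reference, the two items above look like this in DynamoDB's low-level wire format, where every attribute value is tagged with its type ("S" for string, "N" for number); note the second item simply omits attributes:

```python
# DynamoDB items in the low-level API format: typed attribute values.
item_1 = {
    "user_id": {"S": "1234"},
    "first_name": {"S": "Stephane"},
    "last_name": {"S": "Maarek"},
    "favorite_number": {"N": "42"},  # numbers are sent as strings on the wire
}
item_2 = {
    "user_id": {"S": "45678"},
    "first_name": {"S": "Alice"},  # no last name or favorite number: still valid
}

# With boto3 you would write each item with:
# boto3.client("dynamodb").put_item(TableName="DemoTable", Item=item_1)
```

Only the partition key is mandatory; the set of attributes can differ from item to item, which is the schema flexibility shown in the console.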
But the difference between DynamoDB and, say, RDS is that DynamoDB will have all the data living within one single table, and there's no way to join it with another table. So it's not a relational database; that's why DynamoDB is a NoSQL database, a "not only SQL" database. And the idea here is that, yes, we cannot link this table to another table, so we need to make sure that all the relevant data is well formatted within our main DynamoDB table. So it changes a little bit how you see database design and so on. But that's it for DynamoDB, a very quick hands-on to show you an overview; with DynamoDB there is a lot more to learn, and that is the focus of the Certified Developer exam, not the Cloud Practitioner exam. So, for this example, this is enough, and when you're done and
ready you can just delete the table. You can delete all the CloudWatch alarms
associated with it, and just type delete in this box and you'll be good to go. So
that's it for this lecture, I hope you liked it, and I will see you in the next lecture.

Okay, so now let's talk about DynamoDB global tables. So this is one feature you need to know about for DynamoDB. It's a way for you to make a DynamoDB table
accessible with low latency, and this is the key, in multiple regions. So let's take an example. We have our DynamoDB table in us-east-1, and we'll set it up as a global table. So to start, of course, our users can read and write to this DynamoDB table in Northern Virginia, us-east-1. But it is possible for us to set up some replication for this global table. So we can create a global table in Paris, eu-west-3, and we'll say they're the same, so there's two-way replication between these tables. That means that the same data is in us-east-1 and in eu-west-3, but users that are close to the Paris region can access this global table with low latency in Paris. And this is true for one to 10 regions, if you want to. Okay? A global table is truly global, and users can read and write to the table in any specific region; there will just be replication between these regions. So the fact that there is read/write access in any region of AWS on this global table makes it active-active replication, because you can actively write to any region and it will actively be replicated into the other regions. Okay. So that's it, all you
need to know for the exam. I will see you in the next session.

Next type of database we have is Redshift. Redshift is a database that is based on PostgreSQL,
but it is not used for OLTP. OLTP stands for Online Transaction Processing. That is
what RDS was good for. Instead, Redshift is OLAP or Online Analytical Processing,
which is used to do analytics and data warehousing. So anytime in the exam you are
seeing the database needs to be a warehouse and to do analytics on it, then
Redshift is going to be your answer. With Redshift, you do not load data
continuously; you load it, for example, every hour. The idea with Redshift is that it is really, really good at analyzing data and making some computations. So it boasts 10x better performance than other data warehouses, and scales to petabytes of data. The data is stored in columns, so it is called columnar storage of data, instead of row-based. So anytime you see columnar, again, think Redshift, and it has something called the Massively Parallel Query Execution (MPP) engine to do these computations very, very quickly. It is pay-as-you-go, based on the instances you
have provisioned, and it has a SQL interface to perform the queries. It is also, finally, integrated with BI (Business Intelligence) tools such as QuickSight or Tableau, if you want to create dashboards on top of the data in your data warehouse. So
that is it. Just a high level overview, that a data warehouse is used to do some
computation on your datasets and do some analytics and possibly build some
visualizations through dashboards on it. So for that use case, Redshift will be
perfect. So that is it for this lecture. I hope you liked it, and I will see you in the next lecture.

Another type of database we have on AWS is Amazon EMR, and EMR stands for Elastic MapReduce. So EMR is actually not really a database. It's used to create what's called a Hadoop cluster, for when you want to do big data on AWS, and a Hadoop cluster is used to analyze and process vast amounts of data. So Hadoop is an open source technology, and it allows multiple servers that work in a cluster to
analyze the data together, and so when you're using EMR, you can create a cluster
made of hundreds of EC2 instances that will be collaborating together to analyze
your data. So as part of the Hadoop ecosystem, the Big Data ecosystem, you will see project names such as Apache Spark, HBase, Presto, and Flink, and all these things
will be working on top of your Hadoop cluster. So what is EMR then? Well, EMR takes
care of provisioning all these EC2 instances and configuring them so that they work
together and can analyze data together, from a big data perspective. Finally, it has
auto-scaling and it is integrated with Spot instances, and the use cases for EMR
will be data processing, machine learning, web indexing, or big data in general. So
from an exam perspective, any time you see Hadoop cluster, think no more, it's
going to be Amazon EMR. That's it, I hope that was helpful, and I will see you in the next lecture.

So now let's talk about Amazon Athena. Amazon Athena is a
serverless query service to perform analytics against your objects stored in Amazon
S3. So the idea is that you would use the SQL query language to query these files, but you don't need to load them. They just need to be in S3 and Athena will do the rest. So these files can be formatted in different ways, such as CSV, JSON, ORC, Avro, and Parquet, and Athena is built on the Presto engine, if you must know.
Now, how does it work? So users will load data into Amazon S3, and then Amazon Athena
will be used to query and analyze the data. Very, very simple. And then if you
wanted to, you could have some reporting on top of Athena, such as using Amazon
QuickSight. Now the pricing for Athena is around $5 per terabyte of data scanned.
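A sketch of what running such a query looks like with the SDK, in boto3-style StartQueryExecution parameters (the database name and results bucket here are made up for illustration), followed by the scan-cost arithmetic at roughly $5 per terabyte:

```python
def athena_query_request(sql, database, output_s3):
    # Parameters for Athena's StartQueryExecution API: the SQL text, which
    # database to resolve table names in, and where to write query results.
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

params = athena_query_request(
    "SELECT status, COUNT(*) FROM elb_logs GROUP BY status",
    database="logs_db",                   # hypothetical database name
    output_s3="s3://my-athena-results/",  # hypothetical results bucket
)
# With boto3: boto3.client("athena").start_query_execution(**params)

def scan_cost_usd(terabytes_scanned, price_per_tb=5.0):
    # Athena bills on data scanned; compression and columnar formats
    # (e.g., Parquet) shrink the scanned bytes and therefore the cost.
    return terabytes_scanned * price_per_tb

print(scan_cost_usd(0.2))  # scanning 200 GB costs about $1
```

Nothing is loaded anywhere: the files stay in S3 and only the query and result location are specified, which is the serverless point above.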
And if you use compressed data, or data stored in a columnar fashion, then you're going to have cost savings, because less data is being scanned. So the use
cases of Athena are multiple, but anytime you see business intelligence, analytics, or reporting, or analyzing VPC Flow Logs, ELB logs, CloudTrail logs, or platform logs, all these kinds of logs in AWS, then Athena's going to be a really,
really good option. So from an exam perspective, whenever you see "serverless", "analyze data in S3", and "use SQL", then think Amazon Athena. That's it. I hope you liked it. And I will see you in the next lecture.

Now let's talk about Amazon QuickSight. So it's a serverless, machine learning-powered business intelligence
service to create interactive dashboards. So behind this very complicated tagline,
all you have to remember is that Amazon QuickSight allows you to create dashboards
on your databases, so it can visually represent your data and show your business
users the insights they're looking for, okay. So QuickSight allows you to create
all these kind of cool graphs, charts, and so on. So it's fast, it's automatically
scalable. It's embeddable and there's per-session pricing, so you don't have to
provision any servers. The use cases are business analytics, building visualizations, performing ad-hoc analysis, and getting business insights from data. And
in terms of integrations, there are so many, but, for example, QuickSight can run
on top of your RDS database, it can run on top of Aurora, Athena, Redshift, Amazon
S3, and so on. So QuickSight is your go-to tool for BI in AWS. That's it. I will see you in the next lecture.

Now let's talk about DocumentDB. So the same way we
had Aurora as the way for AWS to implement a sort of big cloud native version of
PostgreSQL and MySQL, we have DocumentDB, which is an Aurora version for MongoDB.
So MongoDB, if you don't know, that's the logo on the top right corner of your
screen. It is another NoSQL database, and you need to remember this for the exam.
So DocumentDB is a NoSQL database, and it's based on top of the MongoDB technology.
So it's compatible with MongoDB. So MongoDB is used to store, query, and index JSON data, and you have a similar deployment concept as Aurora with DocumentDB. So that means it's a fully managed database. It's highly available. Data is replicated across three Availability Zones, and the storage will grow automatically in increments of 10 gigabytes, up to 64 terabytes of storage. Okay? And DocumentDB has been engineered so it can scale to workloads with millions of requests per second. So at the exam, if you see anything related to
MongoDB, think DocumentDB, or if you see anything related to NoSQL databases, think
DocumentDB and also DynamoDB. That's it for this lecture. I hope you liked it, and I will see you in the next lecture.

Now, let's talk about Amazon Neptune. Neptune
is a fully-managed graph database. An example of a graph dataset would be something we all know: a social network. So, if we look at a social network, people are friends, they like, they connect, they read, they comment, and so on. So users have friends, posts have comments, comments have likes from users, users share and like posts, and so all these things are interconnected, and they create a graph. And this is why Neptune is a great
choice of database when it comes to graph datasets. So, Neptune has replication
across 3 AZs, with up to 15 read replicas. It's used to build and run applications that work with highly connected datasets, like a social network, because Neptune is optimized to run complex and hard queries on top of these graph datasets. You can store up to billions of relations in the database and query the graph with milliseconds latency. It's highly available with replication across multiple Availability Zones. And it's also great for storing knowledge graphs. For
example, the Wikipedia database is a knowledge graph because all the Wikipedia
articles are interconnected with each other. Other use cases are fraud detection, recommendation engines, and social networking. So, from an exam perspective, anytime you see
anything related to graph databases, think no more than Neptune. That's it. I hope
you liked it and I will see you in the next lecture.

Now let's talk about Amazon QLDB, which stands for Quantum Ledger Database. So what is it? A ledger is a book
recording financial transactions, and so QLDB is going to be used just to have a ledger of financial transactions. It's a fully managed database, it's serverless, highly available, and has replication of data across three Availability Zones. And it's used to review the history of all the changes made to your application data over time; that's why it's called a ledger. So it's an immutable system as well, which means that once you write something to the database, it cannot be removed or modified.
And there is also a way to have a cryptographic signature to make sure that indeed
nothing has been removed. So how does it work? Well, there is behind the scenes a
journal, and so a journal has a sequence of modifications. And so anytime a
modification is made, there is a cryptographic hash that is computed which
guarantees that nothing has been deleted or modified and so this can be verified by
anyone using the database. So this is extremely helpful for financial transactions
because you wanna make sure that obviously no financial transaction is disappearing
from your database which makes QLDB a great ledger database in the cloud. So you
get two to three times better performance than common ledger blockchain frameworks
and also you can manipulate data using SQL. Now, as you'll see in the next lecture
there's also another database technology called Amazon Managed Blockchain. But
the difference between QLDB and Managed Blockchain is that with QLDB, there is no
concept of decentralization. That means that there's just a central database owned
by Amazon that allows you to write this journal. Okay. And this is in line with
many financial regulation rules. So the difference between QLDB and Managed
Blockchain is that QLDB has a central authority component and it's a ledger,
whereas Managed Blockchain is going to have a decentralization component as well.
Okay. So that's it, anytime you see financial transactions and ledger, think QLDB.
I will see you in the next lecture.
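The journal mechanism described above can be sketched as a hash chain in plain Python (a conceptual sketch only, not the QLDB API): each entry's digest covers the previous entry's digest, so deleting or modifying any past entry breaks verification.

```python
import hashlib
import json

def entry_hash(previous_hash, data):
    # Each journal entry's hash covers the previous hash, chaining entries.
    payload = previous_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(journal, data):
    # New modifications are only ever appended, never edited in place.
    previous = journal[-1]["hash"] if journal else ""
    journal.append({"data": data, "hash": entry_hash(previous, data)})

def verify(journal):
    # Recompute the whole chain; any removed or altered entry changes
    # the hashes, so anyone using the database can detect tampering.
    previous = ""
    for entry in journal:
        if entry["hash"] != entry_hash(previous, entry["data"]):
            return False
        previous = entry["hash"]
    return True

journal = []
append(journal, {"tx": "credit", "amount": 100})
append(journal, {"tx": "debit", "amount": 30})
print(verify(journal))              # True: the ledger is intact
journal[0]["data"]["amount"] = 999  # tamper with history...
print(verify(journal))              # False: the change is detected
```

This is why no financial transaction can silently disappear from the journal, which is the guarantee that makes QLDB suitable as a ledger.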
