CloudComputing DATABASE

This document provides an overview of data services on IBM Cloud, detailing various types of data storage solutions including structured and unstructured data, relational and NoSQL databases, and their respective features. It highlights the importance of data processing for organizational productivity and decision-making, as well as the role of cloud technologies in enabling scalable data management. Additionally, it discusses specific database offerings such as IBM Cloudant, its functionalities, and how to interact with it through an HTTP API.

Uploaded by

dawasthi952

Introduction to data services on IBM Cloud

Overview:
This unit provides an overview of the types of data stores that are used in cloud computing. You will also learn about the data
services offerings that are available through the cloud development platform.

Data is defined as a set of facts, statistics, or figures. It can be in many formats, such as text documents, images, audio, or
videos.
Raw data, which is the most basic format of data, is processed to produce useful information. Data processing and analysis
help modern organizations increase their productivity and make better business decisions.
Data can be categorized into two main categories: structured data and unstructured data. Structured data is
formatted, highly organized data that fits easily into data models with fixed fields. An example is a list of student or
employee records that includes names, ages, and addresses. Unstructured data is the opposite of structured data. It is
unorganized and raw, has no formal structure, and is sometimes described as loosely structured data. Examples include unstructured text
and multimedia, such as email messages, webpages, documents, photos, audio files, and videos.
There is a popular saying that data is the new oil, because data and the information that is obtained by
processing it play an important role in modern organizations and contribute to the development of new
business models. The most successful organizations are those that can capture, manage, and
derive key insights from their corporate data. Cloud technologies enable small organizations to design and set up data platforms
and use data analysis services on the cloud quickly, and to benefit from the scalability, reliability, and quality of service
that the cloud provides. These factors help these organizations evolve quickly and grow faster in the market.

How data is stored:

Capitalizing on data requires storage for that data. There are many formats for storing this data. For example:
• Flat files (including XML files)
• Excel spreadsheets
• Relational databases (for example, Db2, MySQL, and PostgreSQL)
• NoSQL databases (for example, Cloudant, MongoDB, and Redis)
• Object-based storage (for example, IBM Cloud Object Storage)
In enterprise systems (systems that are used to run businesses), data is stored by using a database.
Databases are used by modern organizations to organize and store their data so that the data can easily be accessed,
managed, and updated. A database management system (DBMS) is system software for creating and managing databases.
A DBMS provides users and programmers with a systematic way to create, retrieve, update, and manage data.
A data model focuses on the type of data that is required, the way it must be organized, and the manipulation process that is
performed on the data to provide a complete and accurate structure for data within the information system. A data model
organizes data elements and standardizes how the data elements relate to each other.
Two data models that have many database implementations are the relational and NoSQL data models. The following
sections describe these data models through their databases.
A relational database is a database that works on the relational model that was described by Edgar F. Codd in 1969. A
relational database stores the data in a set of tables so that the data can be easily accessed by using Structured Query
Language (SQL) statements, which is why a relational database is sometimes called an SQL database.
The key features of the relational database are as follows:
• Data is split over many tables to avoid duplication.
• Primary and unique keys are defined to prevent duplicate rows.
• Integrity is maintained by using foreign keys to prevent rows from referring to rows in other tables (for example,
locations and departments) that no longer exist.
• All transactions in relational database models follow the atomicity, consistency, isolation, and durability (ACID)
properties.
• Relational databases cannot work with unstructured data, so unstructured data must be analyzed, organized, and
transformed to structured data to be stored in relational databases.
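The key features above can be tried out with a minimal sketch that uses Python's built-in sqlite3 module instead of one of the IBM Cloud services. The table names, column names, and sample rows are assumptions that were chosen for illustration.

```python
import sqlite3


def demo_relational_integrity():
    """Show primary keys, foreign keys, and transactions in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE employees ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " department_id INTEGER NOT NULL REFERENCES departments(id))"
    )
    conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Sales')")
    conn.execute("INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1)")
    conn.commit()

    attempts = [
        ("duplicate primary key",
         "INSERT INTO employees (id, name, department_id) VALUES (10, 'Bob', 1)"),
        ("missing department",
         "INSERT INTO employees (id, name, department_id) VALUES (11, 'Cy', 99)"),
        ("delete referenced department",
         "DELETE FROM departments WHERE id = 1"),
    ]
    rejected = []
    for label, sql in attempts:
        try:
            with conn:  # one transaction: commit on success, roll back on error
                conn.execute(sql)
        except sqlite3.IntegrityError:
            rejected.append(label)

    # Data that is split over two tables is recombined with a join.
    rows = conn.execute(
        "SELECT e.name, d.name FROM employees e"
        " JOIN departments d ON d.id = e.department_id"
    ).fetchall()
    conn.close()
    return rejected, rows


rejected, rows = demo_relational_integrity()
print(rejected)
print(rows)
```

All three invalid statements are rejected by the database itself, so the application does not need its own checks for duplicate keys or dangling references.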
• Relational databases (SQL databases):
▪ IBM Db2 Hosted
Db2 is a Relational Database Management System (RDBMS). Db2 stores, analyzes, and retrieves data efficiently. With Db2
Hosted, you can run Db2 with full administrative access on a cloud infrastructure. It eliminates the cost, complexity, and risk
of managing your own infrastructure.
▪ Databases for PostgreSQL
PostgreSQL is a powerful, open source object-relational database that is highly customizable. It is a feature-rich enterprise
database with JSON support that has the best of both the SQL and NoSQL worlds.
▪ Compose for MySQL (Beta)
MySQL is a fast, easy-to-use, and flexible RDBMS. It is the central component of the Linux, Apache, MySQL, and PHP
(LAMP) web service model, and it has many connectors, including Python, PHP, and C++, for development needs.
A NoSQL database (originally referred to as "non-SQL" or "nonrelational") provides a mechanism for storage and retrieval of
data that is modeled by means other than the tabular relations that are used in relational databases. NoSQL databases originated in
21st-century website architectures that faced challenges with the relational model.
The approach that is followed by a NoSQL database is as follows:
• Unlike a relational database, a NoSQL database is typically eventually consistent and can provide read/write operations
at a lower latency.
• Denormalize the data model: Data can be stored as it is used in the application.

The key features of a NoSQL database are as follows:
• Flexibility and scalability: It provides flexibility because some NoSQL databases are tuned for speed at the expense of
data consistency, and other NoSQL databases sacrifice speed for scalability.
• Schema-less: The data is not stored in strictly defined schemas like those in a relational database.
• It does not require a predefined data model for storage, such as specific row and column names and sizes.
• It is optimized to work on distributed hardware.
• It uses relatively simple queries that can be processed quickly across much larger data sets.
• It can process unstructured data and store it in its original format.
A key-value database stores each item as a key that is paired with its value. Here are the pros and cons of a key-value
database:
Pros:
• The data model is flexible (no predefined structure).
• The data structure is more application-oriented and simplifies the application design (no mapping between code
and the relational database).
Cons:
• There is no association between attributes from which a real data model can be constructed.
• The data model is more application-oriented and less reusable by different applications.
• The database does not support complex queries (no join or aggregation queries, and update and delete can be
done only by primary key).
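The trade-offs above can be illustrated with a toy in-memory store. This is a minimal sketch, not the API of Redis or etcd; the class name, method names, and sample records are hypothetical.

```python
class KeyValueStore:
    """Toy key-value store: every built-in operation addresses exactly one key."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value  # create or overwrite by key

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        # Returns True when the key existed and was removed.
        return self._items.pop(key, None) is not None

    def scan(self):
        # The store cannot filter by attribute; callers must inspect every item.
        return list(self._items.items())


store = KeyValueStore()
# Values need no predefined structure: the two records have different fields.
store.put("user:1", {"name": "Ada", "city": "Boston"})
store.put("user:2", {"name": "Bob", "city": "Austin", "tags": ["admin"]})

# Lookup by key is direct.
ada = store.get("user:1")

# A query by attribute has to be written in the application as a full scan.
in_austin = [key for key, value in store.scan() if value.get("city") == "Austin"]
print(ada, in_austin)
```

The flexible values show the first advantage, and the scan shows the main drawback: any query that is not a lookup by key becomes application code.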

• Key-value pair in-memory databases:
▪ Databases for Redis
Redis is an open source, in-memory data structure store that is used as a database, cache, and message broker. It supports
data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, and geospatial
indexes with radius queries.
▪ Databases for etcd
etcd is a key-value store that developers can use to coordinate and manage server clusters or provide lightning-fast
metadata storage.
Here are the main characteristics of a document NoSQL database:
• Data is stored as documents in the JSON or XML formats. The structure of each document is not imposed; each document
has a flexible schema (schema-less).
• The database supports adding, updating, or deleting individual fields in a document and creating an index
of the document fields, which enables fast access without using a primary key and more complex requests than a key-value
pair.
• Indexing and analytical queries are available by using paradigms like MapReduce.
MapReduce is a way of processing huge amounts of data in parallel by breaking it into small blocks or pieces across many
nodes and then combining or reducing the results of those nodes.
• The database does not support object-relational mapping in the way that relational databases do.
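The MapReduce idea that is described above can be sketched in a single process. This is a minimal illustration that counts words; the function names and sample lines are assumptions, and in a real cluster each block would be mapped on a different node.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Break the input into small blocks, as a cluster would spread it over nodes."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: emit a (key, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_group(values):
    """Reduce step: combine all values for one key into a single result."""
    return sum(values)


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    mapped = [map_block(block) for block in blocks]  # independent, so parallelizable
    grouped = shuffle(mapped)
    return {key: reduce_group(values) for key, values in grouped.items()}


counts = word_count(["cloud data store", "data store", "Cloud data"])
print(counts)
```

Because each block is mapped without looking at any other block, the map step can run on many nodes at once; only the grouped values have to be brought together for the reduce step.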
• Document databases:
▪ Cloudant
Cloudant is a highly scalable and performant document store that excels at pushing and retrieving data to and from the edge. It is a fully managed
JSON document database that offers independent serverless scaling of provisioned throughput capacity and storage. Cloudant is
compatible with Apache CouchDB and accessible through a simple-to-use HTTPS API for web, mobile, serverless, and IoT
applications. It is explained in detail later in this document.
▪ Databases for MongoDB
MongoDB is a JSON document store with a rich query and aggregation framework. Its features include high availability, automated
backup orchestration, and de-coupled scaling of storage and RAM.
▪ Databases for Elasticsearch
Elasticsearch combines the power of a full text search engine with the indexing strengths of a JSON document database to create a
powerful tool for rich data analysis on large volumes of data. Its features include high availability, automated backup orchestration,
and de-coupled scaling of storage and RAM.
A column-oriented database uses keys and dynamic groupings of columns to store data across distributed servers. To
improve speed, similar data is grouped. Column-oriented databases are designed to have millions of columns and billions of
rows.
Main characteristics:
• Keys include row, column family, and columns, so queries can be done on these keys.
• Multiple types of keys can be created, such as row or composite keys.
• Support for time stamping during inserts enables automatic versioning.
• Strong support of MapReduce for analytics for large amounts of data.
• Similar data is grouped to improve speed, especially for querying data by using aggregate functions such as count, sum,
avg, max, and min.
• Compared to row-oriented databases, columnar databases can better compress data and save storage space, which
permits aggregate functions to be performed rapidly.
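The difference between a row layout and a column layout can be sketched with plain Python lists. This is an illustration of the idea only, not how ScyllaDB or Cassandra store data; the field names and values are hypothetical.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "east", "amount": 80},
    {"id": 3, "region": "west", "amount": 200},
    {"id": 4, "region": "west", "amount": 50},
]


def to_columns(rows):
    """Column-oriented layout: all values of one field are stored together."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(ROWS)

# An aggregate reads a single column instead of touching every field of every row.
total = sum(columns["amount"])
largest = max(columns["amount"])

# Grouped, similar values compress well.
encoded_regions = run_length_encode(columns["region"])
print(total, largest, encoded_regions)
```

The aggregates touch only the amount column, and the region column shrinks from four values to two pairs, which is the effect that the last two bullets describe.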
• Columnar databases:
▪ Compose for ScyllaDB (Beta)
ScyllaDB is a highly performant, drop-in replacement for the Cassandra wide-column distributed database. ScyllaDB is
written in C++ rather than Java, which Cassandra uses, for better resource usage that can result in up to 10 times better
performance in benchmarks.
Graph databases store information as entities (nodes) and relationships (edges). Graph databases are ideal for
representing data that is connected through multiple degrees of separation and for determining the distances between data items.
The main difference between the graph model and the relational model is that the graph model has a more flexible structure.
A graph model relies on nodes and relationships to describe the relations. In the relational model, the relations among the
tables can be described by using special properties, such as foreign keys.
Here are some key use cases:
• A powerful approach when the data is highly connected and related: A graph database can show relationships
among various data items that are critical, and it can find inferences and rules among data.
• Social data analysis: For example, social networking sites can benefit by quickly locating friends, friends of
friends, and likes.
• Filter mapping, pattern determination, and optimization problems in graph applications: For example, routing,
spatial, and map applications can use graphs to easily model their data for finding close locations or building the
shortest routes for directions.
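Two of the use cases above, finding friends of friends and building a shortest route, can be sketched with an adjacency map and a breadth-first search. This is a minimal sketch in plain Python rather than a graph database query; the people and their friendships are hypothetical.

```python
from collections import deque

# Nodes are people; an edge means "is a friend of" in both directions.
FRIENDS = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy", "eli"},
    "eli": {"dee"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = graph[person]
    two_hops = {other for friend in direct for other in graph[friend]}
    return two_hops - direct - {person}


def shortest_path(graph, start, goal):
    """Breadth-first search returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):  # sorted for a repeatable result
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


suggestions = friends_of_friends(FRIENDS, "ana")
route = shortest_path(FRIENDS, "ana", "eli")
print(suggestions, route)
```

Both questions are answered by following edges from node to node, without the joins that a relational model would need for each extra degree of separation.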

• Graph databases:
▪ Compose for JanusGraph (Beta)
JanusGraph is a scalable graph database that is optimized for storing and querying highly interconnected data that is
modeled as millions or billions of vertices and edges.
IBM Cloudant

Cloudant is an IBM software product that is primarily delivered as a cloud-based, non-relational, distributed
database service. Cloudant is based on the Apache-backed CouchDB project and the open source
BigCouch project.
IBM acquired Cloudant, a Boston-based cloud database startup, in 2014.
IBM Cloudant is a NoSQL database as a service (DBaaS) that is optimized for handling heavy workloads of
concurrent reads and writes in the cloud. These workloads are typical for large, fast-growing web and mobile
apps. It is built to scale globally, run continuously, and handle various data types, such as JSON, full-text,
and geospatial.
Cloudant ensures that the flow of data between an application and its database remains uninterrupted and
performs to the users’ satisfaction. The data replication technology also allows developers to put data closer
to where their applications need it most.
Cloudant frees developers from worrying about managing the database, which enables them to focus on the
application. Cloudant eliminates the risk, cost, and distractions of database scalability, which enables
you to regain valuable time and enables your applications to scale larger and remain consistently available to users
worldwide.
Data is stored and sent in JSON format. Documents are accessed through a simple REST-based
HTTP API. Anything that is encoded into JSON can be stored as a document.
Documents in Cloudant:
Cloudant documents are containers for data, and the documents are JSON objects. All documents in Cloudant must contain
the following unique fields:
• An identifier _id field serves as the document key. It can be created by the application or generated automatically by
Cloudant.
• A revision number _rev field is automatically generated and used internally by the Cloudant database as a revision
number. A revision number is added to your documents by the server when you insert or modify them. When you update a
document, you must specify the latest _rev; otherwise, the request fails. This check also helps avoid conflicting data states.
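The role of the _id and _rev fields can be sketched with a toy in-memory store that applies the same rule: an update must carry the latest revision. The class, the exception, and the way the revision string is computed are assumptions for illustration and are not Cloudant's actual implementation.

```python
import hashlib
import json
import uuid


class ConflictError(Exception):
    """Raised when an update does not carry the latest _rev."""


class ToyDocumentStore:
    def __init__(self):
        self._docs = {}

    def save(self, doc):
        doc = dict(doc)
        # The _id can come from the application or be generated by the store.
        doc_id = doc.setdefault("_id", uuid.uuid4().hex)
        current = self._docs.get(doc_id)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(f"stale or missing _rev for {doc_id}")
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        # Assumed format: a counter plus a digest of the document content.
        body = {key: value for key, value in doc.items() if key != "_rev"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:32]
        doc["_rev"] = f"{generation}-{digest}"
        self._docs[doc_id] = doc
        return {"id": doc_id, "rev": doc["_rev"]}


store = ToyDocumentStore()
first = store.save({"_id": "student-1", "name": "Ada"})
second = store.save({"_id": "student-1", "name": "Ada L.", "_rev": first["rev"]})

try:
    # Reusing the first revision after the document changed is rejected.
    store.save({"_id": "student-1", "name": "Stale", "_rev": first["rev"]})
    stale_update_rejected = False
except ConflictError:
    stale_update_rejected = True
print(first["rev"], second["rev"], stale_update_rejected)
```

The rejected third write shows how the revision check keeps one client from silently overwriting a change that another client made in the meantime.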
Cloudant Dashboard:
Cloudant Dashboard is a cloud-based web interface that makes it easy to develop, administer, and monitor
your databases. You can perform many tasks, such as:
• View and manage Cloudant databases.
• View and create documents.
• Create and run queries.
• Manage the permissions to the database.
• View capacity usage (reads/second, writes/second, storage limit, and so on).
• Manage the plan settings (upgrade plan, raise throughput capacity, and so on).
You can also display the contents of a Cloudant document in IBM Cloud by selecting the database. Then,
select All Documents to display the list of documents. You can open each of the documents in the list
to display or modify the document contents.
Cloudant HTTP API:
Cloudant uses an HTTP API to provide simple, web-based access to data in the Cloudant data store. The HTTP API is a
programmatic way of accessing the data from your applications. It provides several HTTP access methods for data read,
add, update, and delete functions.
The following HTTP Request methods can be used to apply the create, read, update, and delete operations on Cloudant
documents by directly referencing the document ID:
• GET: Request a specific JSON document.
• POST: Set values and create documents.
• PUT: Create databases, and create or update documents.
• DELETE: Delete a specific document.
To create a document, you can send a POST request to https://$USERNAME.cloudant.com/$DATABASE with the
document's JSON content in the request body.
To update (or create) a document, you can send a PUT request to
https://$USERNAME.cloudant.com/$DATABASE/$DOCUMENT_ID with the updated JSON content, including the latest
_rev value in the request body.
To delete a document, you can send a DELETE request to
https://$USERNAME.cloudant.com/$DATABASE/$DOCUMENT_ID?rev=$REV where $REV is the document's latest _rev.
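The four operations above can be sketched with Python's standard urllib module. The sketch only builds the request objects and does not send them. The account name, database name, document, and helper function names are hypothetical, and authentication is left out because this section does not describe it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical account; stands in for https://$USERNAME.cloudant.com.
BASE_URL = "https://example-account.cloudant.com"


def build_request(method, path, body=None):
    """Create an HTTP request; a JSON body is attached only when one is given."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}/{path}", data=data, headers=headers, method=method)


def create_document(database, doc):
    return build_request("POST", database, doc)


def read_document(database, doc_id):
    return build_request("GET", f"{database}/{quote(doc_id, safe='')}")


def update_document(database, doc_id, doc, rev):
    # The latest _rev travels in the request body.
    return build_request("PUT", f"{database}/{quote(doc_id, safe='')}", {**doc, "_rev": rev})


def delete_document(database, doc_id, rev):
    # The latest _rev travels in the query string.
    return build_request("DELETE", f"{database}/{quote(doc_id, safe='')}?rev={quote(rev)}")


requests = [
    create_document("students", {"name": "Ada"}),
    read_document("students", "student-1"),
    update_document("students", "student-1", {"name": "Ada L."}, "1-abc"),
    delete_document("students", "student-1", "2-def"),
]
for request in requests:
    print(request.get_method(), request.full_url)
# With valid credentials, urllib.request.urlopen(request) would send each request.
```

Building the requests separately from sending them makes it easy to check the method, URL, and body of each operation before any credentials are involved.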

Cloudant indexes:
A database index is a sorted data structure that enables quick access to a portion of the data. By default, IBM Cloudant
generates a primary index for the _id field so that it can retrieve data by _id.
A user can create secondary indexes for other fields if there are many queries that run on these fields.
After you create an index, a design document is generated on Cloudant to describe the index that is created. Design
documents are used to build indexes, validate updates, and format query results.
The index type is either text or JSON. Text indexes are powered by Cloudant search indexes, which enable you to query a
database by using the Lucene Query Parser syntax. JSON indexes are powered by MapReduce.
Cloudant Query:
Before you query for a specific field, it is a best practice to create an index for each field in the selector to optimize query
performance.
The JSON body that is provided shows an example of a Cloudant query request body. In this example, the response of this
request returns Cloudant documents that have lastname = ‘Brown’ and location = ‘New York City, NY’. Only the
firstname, lastname, and location fields of each document are returned. Some advanced operators can be used in the
Cloudant query, such as the $eq (equal) and $gt (greater than) operators that are used to search for documents.
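A request body of the kind that is described above can be sketched as follows. The selector and fields keys follow the Cloudant Query format, in which the body is posted to the database's _find endpoint. The helper functions, the sample documents, and the small local matcher are assumptions that only imitate the behavior for illustration.

```python
import json


def build_query(selector, fields=None):
    """Assemble a query body: which documents to match and which fields to return."""
    body = {"selector": selector}
    if fields:
        body["fields"] = list(fields)
    return body


def matches(doc, selector):
    """Local stand-in for the server: supports bare values, $eq, and $gt only."""
    for field, condition in selector.items():
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a bare value means equality
        if field not in doc:
            return False
        for operator, expected in condition.items():
            if operator == "$eq":
                ok = doc[field] == expected
            elif operator == "$gt":
                ok = doc[field] > expected
            else:
                raise ValueError(f"operator not covered by this sketch: {operator}")
            if not ok:
                return False
    return True


def run_locally(docs, query):
    fields = query.get("fields")
    hits = [doc for doc in docs if matches(doc, query["selector"])]
    if fields is None:
        return hits
    return [{name: doc[name] for name in fields if name in doc} for doc in hits]


query = build_query(
    {"lastname": "Brown", "location": "New York City, NY"},
    fields=["firstname", "lastname", "location"],
)
print(json.dumps(query, indent=2))

# Hypothetical documents, including an "age" field to show the $gt operator.
DOCS = [
    {"_id": "1", "firstname": "Sally", "lastname": "Brown", "location": "New York City, NY", "age": 41},
    {"_id": "2", "firstname": "Tom", "lastname": "Brown", "location": "Boston, MA", "age": 29},
    {"_id": "3", "firstname": "Lee", "lastname": "Smith", "location": "New York City, NY", "age": 35},
]
result = run_locally(DOCS, query)
older = run_locally(DOCS, build_query({"age": {"$gt": 30}}, fields=["firstname"]))
print(result, older)
```

Writing a bare value in the selector is shorthand for the $eq operator, which is why the first query needs no explicit operators.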

You can use Cloudant endpoints to create, list, update, and delete indexes in a database, and to query data by using these
indexes.
The JSON body that is provided shows an example of a create index request body. In this example, an index of type text is
created for a field that is called “foo”. After the creation of this index, Cloudant queries that search for Cloudant
documents by using the “foo” field are more efficient and faster.
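A create index request body for the “foo” field can be sketched as follows. The layout follows the Cloudant Query index format, in which the body is posted to the database's _index endpoint. The helper function names and the index name are hypothetical.

```python
import json


def build_text_index(field, field_type="string", name=None):
    """Body for a text index on one field; text index fields carry a name and a type."""
    body = {
        "index": {"fields": [{"name": field, "type": field_type}]},
        "type": "text",
    }
    if name:
        body["name"] = name
    return body


def build_json_index(fields, name=None):
    """Body for a JSON index; its fields are listed by name only."""
    body = {"index": {"fields": list(fields)}, "type": "json"}
    if name:
        body["name"] = name
    return body


text_index = build_text_index("foo", name="foo-text-index")
json_index = build_json_index(["lastname", "location"])
print(json.dumps(text_index, indent=2))
print(json.dumps(json_index, indent=2))
```

The second helper covers the JSON index type, which suits queries that filter on the same fields repeatedly, such as lastname and location.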
HTTP status codes:
Cloudant uses HTTP status codes that are returned in HTTP response headers.
More information about the result might also be included in the response body.
The following example status codes adhere to the widely accepted status codes for HTTP:
• 200 - OK
• 201 - Created
• 400 - Bad request
• 401 - Unauthorized
• 404 - Not Found
For example, if you try to use https://$USERNAME.cloudant.com/$DATABASE/$DOCUMENT_ID to retrieve a document
that does not exist in the database, Cloudant responds with status code 404 in the header, and other information about the
error is returned in the response body as JSON.
The language-specific libraries often include error handling for these various cases.
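Error handling of the kind that such libraries provide can be sketched as a function that turns a status code and a response body into either a result or an exception. The exception class, the function name, and the sample response bodies are assumptions for illustration; the error and reason keys follow the CouchDB-style error format.

```python
import json

# The example status codes that are listed above.
STATUS_MEANINGS = {
    200: "OK",
    201: "Created",
    400: "Bad request",
    401: "Unauthorized",
    404: "Not Found",
}


class DocumentStoreError(Exception):
    """Carries the HTTP status together with details from the response body."""

    def __init__(self, status, error, reason):
        super().__init__(f"{status} {error}: {reason}")
        self.status = status
        self.error = error
        self.reason = reason


def parse_response(status, body_text):
    """Return the decoded body for 200 and 201; raise for any other status."""
    try:
        body = json.loads(body_text) if body_text else {}
    except json.JSONDecodeError:
        body = {"reason": body_text}  # keep a non-JSON body as the reason
    if status in (200, 201):
        return body
    if not isinstance(body, dict):
        body = {}
    error = body.get("error", STATUS_MEANINGS.get(status, "unknown"))
    raise DocumentStoreError(status, error, body.get("reason", ""))


created = parse_response(201, '{"ok": true, "id": "student-1", "rev": "1-abc"}')

try:
    parse_response(404, '{"error": "not_found", "reason": "missing"}')
    missing = None
except DocumentStoreError as exc:
    missing = (exc.status, exc.error, exc.reason)
print(created, missing)
```

Keeping the status code on the exception lets calling code react differently to a missing document (404) and to a failed login (401).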
