Chapter 7 What Is Graph Database - 1
A relational database stores data in tables and relates them through joins. A graph
database instead stores data as a graph made up of nodes (the entities) and edges (the
relationships between them).
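To make the contrast concrete, here is a minimal sketch that holds the same facts first as table-like rows with a join table, then as nodes and edges. The names, ids, and the `neighbors` helper are hypothetical illustration data, not taken from any particular database product.

```python
# Hypothetical toy data: the same facts stored two ways.

# Relational style: one table of entities plus a join table of relationships.
people = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Chen"},
]
friendships = [  # join table rows: (person_id, friend_id)
    (1, 2),
    (2, 3),
]

# Graph style: nodes keyed by id, edges stored as adjacency sets.
nodes = {p["id"]: p["name"] for p in people}
edges = {node_id: set() for node_id in nodes}
for a, b in friendships:
    edges[a].add(b)  # friendship is undirected, so store both directions
    edges[b].add(a)


def neighbors(node_id):
    """Return the names of the nodes directly connected to node_id."""
    return sorted(nodes[n] for n in edges[node_id])


print(neighbors(2))
```

In the graph form, finding who is connected to a node is a direct lookup on that node's edges rather than a scan or join over a separate table.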
Graph Analytics/Processing
Graph analytics refers to the analysis performed on data stored in a graph database or
knowledge graph.
The relationship is the same as that between data management and data analysis: you
organize the data in a graph database before performing graph analytics on it.
In graph analytics, queries are executed by traversing the edges that connect the entities.
For relationship-heavy queries, this is typically faster than the equivalent multi-join
query on a relational database.
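A query that follows edges can be sketched as a bounded breadth-first traversal. The example below assumes a small in-memory adjacency list with made-up names; `within_hops` is a hypothetical helper, not the query API of any real graph database.

```python
from collections import deque

# Hypothetical adjacency list: each key is an entity, each value its direct connections.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in at most max_hops edges."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]  # the starting entity is not part of its own result
    return hops


print(within_hops(graph, "alice", 2))
```

Each extra hop here is one more round of edge lookups, whereas the relational equivalent would typically add another self-join on the relationship table.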
For visualization, you can distinguish entity types such as person or city by assigning
colors, weights, formats, and labels of your choice.
Clustering
Clustering enables grouping objects based on the characteristics they exhibit. Clustering is
extremely useful when you want to categorize your graph data in a customized way.
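One very simple, structure-based way to group graph data is to collect nodes that can reach each other, that is, the connected components. The sketch below assumes an undirected adjacency list with hypothetical node names; production community-detection algorithms are more refined than this.

```python
def connected_components(graph):
    """Group nodes into clusters in which every node can reach every other node."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        cluster = []
        stack = [start]
        seen.add(start)
        while stack:  # depth-first walk over one component
            node = stack.pop()
            cluster.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical undirected graph with two groups and one isolated node.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "x": ["y"],
    "y": ["x"],
    "solo": [],
}

print(connected_components(graph))
```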
Path analysis
Path analysis involves finding the shortest or the widest path between two nodes. This
kind of analysis is used in social network analysis and supply chain optimization.
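Both kinds of path can be sketched in a few lines. The example assumes a hypothetical graph stored as nested dictionaries, where each edge carries a capacity. The shortest path is taken to mean fewest edges (breadth-first search), and the widest path is taken to mean the path whose smallest edge capacity is as large as possible (a Dijkstra-style search over bottleneck capacity).

```python
import heapq
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: the path with the fewest edges, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def widest_path_capacity(graph, source, target):
    """Largest bottleneck capacity over all source-target paths (0 if unreachable)."""
    best = {source: float("inf")}
    heap = [(-best[source], source)]  # max-heap via negated capacity
    while heap:
        neg_cap, node = heapq.heappop(heap)
        cap = -neg_cap
        if node == target:
            return cap
        if cap < best.get(node, 0):
            continue  # stale heap entry
        for nxt, edge_cap in graph[node].items():
            new_cap = min(cap, edge_cap)
            if new_cap > best.get(nxt, 0):
                best[nxt] = new_cap
                heapq.heappush(heap, (-new_cap, nxt))
    return 0


# Hypothetical undirected network; the numbers are edge capacities.
links = {
    "A": {"B": 10, "C": 3},
    "B": {"A": 10, "D": 4},
    "C": {"A": 3, "D": 8},
    "D": {"B": 4, "C": 8},
}

print(shortest_path(links, "A", "D"))
print(widest_path_capacity(links, "A", "D"))
```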
Predictive graph analysis
Predictive analysis, in a graph database, is analysis performed on past graph data to
predict which nodes or edges are likely to appear in the future.
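A common baseline for predicting future edges is to score unconnected node pairs by how much their neighborhoods overlap. The sketch below uses the Jaccard coefficient on a hypothetical snapshot of past connections. It is one simple heuristic under those assumptions, not the method of any specific product.

```python
from itertools import combinations


def jaccard_link_scores(graph):
    """Score each unconnected node pair by neighbor overlap (Jaccard coefficient)."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to predict
        union = graph[u] | graph[v]
        if union:
            scores[(u, v)] = len(graph[u] & graph[v]) / len(union)
    # Highest score first: these pairs are the most likely future edges.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of past connections (undirected, stored as neighbor sets).
past = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

for pair, score in jaccard_link_scores(past):
    print(pair, round(score, 2))
```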
Let’s see some real-world use cases to understand these better.
Recommendation Engines
The “You may also know” or “You may also like” recommendations on social media
platforms and entertainment applications are examples of graph analytics at work.
With graph analytics, these platforms identify a stream or creator that interests you and
recommend content from that stream or creator in your feed.
The “You may also know” recommendations are usually based on the school or college
you attended, the company you worked for, or a mutual connection.
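The signals named above (shared school, shared company, mutual connections) can be combined into a simple score. In this sketch the profiles, the connections, and the equal weighting of each signal are all hypothetical assumptions. Real platforms use far richer models.

```python
from collections import Counter

# Hypothetical profiles and existing connections.
profiles = {
    "maya": {"school": "State U", "company": "Acme"},
    "noah": {"school": "State U", "company": "Globex"},
    "omar": {"school": "City College", "company": "Acme"},
    "pia": {"school": "City College", "company": "Initech"},
    "quin": {"school": "State U", "company": "Acme"},
}
connections = {
    "maya": {"noah"},
    "noah": {"maya", "omar", "quin"},
    "omar": {"noah"},
    "pia": set(),
    "quin": {"noah"},
}


def people_you_may_know(user, top_n=3):
    """Rank non-connected people by mutual connections plus shared school/company."""
    scores = Counter()
    for other in profiles:
        if other == user or other in connections[user]:
            continue
        mutual = len(connections[user] & connections[other])
        shared = sum(
            profiles[user][key] == profiles[other][key]
            for key in ("school", "company")
        )
        score = mutual + shared  # assumed weighting: one point per signal
        if score:
            scores[other] = score
    return scores.most_common(top_n)


print(people_you_may_know("maya"))
```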
Compliance
Graph analytics simplifies the implementation of regulatory compliance and
company-specific policies.
Examples include detecting transactions that involve sanctioned businesses or banned
geographies, unauthorized transactions, and so on.
Knowledge graphs also help in uncovering cyber-attack networks.
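A sanctions check of this kind can be sketched as a traversal that starts at the sanctioned entities and walks payment edges backwards, so that accounts paying them directly or through intermediaries are flagged. The account names, the `payments` structure, and the hop limit are hypothetical.

```python
from collections import deque

# Hypothetical payment edges: sender -> list of receivers.
payments = {
    "acct_1": ["acct_2"],
    "acct_2": ["shell_co"],
    "shell_co": ["sanctioned_ltd"],
    "acct_3": ["acct_4"],
    "acct_4": [],
    "sanctioned_ltd": [],
}
sanctioned = {"sanctioned_ltd"}


def exposure(payments, sanctioned, max_hops):
    """Map each account to its hop distance from a sanctioned entity, up to max_hops."""
    # Reverse the edges so the search can start from the sanctioned entities.
    reverse = {node: [] for node in payments}
    for sender, receivers in payments.items():
        for receiver in receivers:
            reverse.setdefault(receiver, []).append(sender)

    distance = {entity: 0 for entity in sanctioned}
    queue = deque(sanctioned)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # stop expanding at the hop limit
        for sender in reverse.get(node, []):
            if sender not in distance:
                distance[sender] = distance[node] + 1
                queue.append(sender)
    return {n: d for n, d in distance.items() if n not in sanctioned}


print(exposure(payments, sanctioned, max_hops=2))
```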
Fraud detection
E-commerce businesses can use knowledge graphs to detect and stop orders placed from
hacked accounts, false refund claims, and similar abuse.
Banks and financial institutions can tackle fraudulent insurance claims, unauthorized
transactions, transactions from hacked accounts, and so on.
Due to the prevailing pandemic, many banks offer the convenience of opening accounts
online. Banks can use graph analytics to identify and stop bad actors from opening
multiple accounts with no intention of using them.
From the fraud detection perspective, knowledge graph applications are mostly reactive.
But with appropriate ML and AI algorithms in place, such activities can also be stopped
proactively.
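Detecting one party behind several accounts can be sketched by linking applications that share an identifier and then grouping the linked accounts. The fields (`phone`, `device`), the sample records, and the union-find grouping are illustrative assumptions, not a description of how any bank implements this.

```python
from collections import defaultdict

# Hypothetical online account applications.
applications = [
    {"account": "A1", "phone": "555-0101", "device": "dev-9"},
    {"account": "A2", "phone": "555-0102", "device": "dev-9"},
    {"account": "A3", "phone": "555-0102", "device": "dev-4"},
    {"account": "A4", "phone": "555-0199", "device": "dev-7"},
]


def linked_account_groups(applications, keys=("phone", "device")):
    """Group accounts that share any identifier, directly or through other accounts."""
    parent = {app["account"]: app["account"] for app in applications}

    def find(account):
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    first_seen = {}  # (field, value) -> first account that used it
    for app in applications:
        for key in keys:
            identifier = (key, app[key])
            if identifier in first_seen:
                parent[find(app["account"])] = find(first_seen[identifier])
            else:
                first_seen[identifier] = app["account"]

    groups = defaultdict(list)
    for account in parent:
        groups[find(account)].append(account)
    # Only groups with more than one account are suspicious.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(linked_account_groups(applications))
```

Note that A1 and A3 share no identifier directly. They end up in the same group only because both are linked to A2, which is the kind of indirect relationship a graph makes easy to follow.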
Operations Optimization
Shortest-path analytics is used to optimize operations and increase business efficiency
while lowering costs. Applications are plentiful: identifying the shortest route in
supply chain management, building product distribution networks for different
geographies, and so on.
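For a supply chain, the shortest route usually means the lowest total cost over weighted edges, which Dijkstra's algorithm finds when all costs are non-negative. The sketch below assumes a hypothetical directed route map whose numbers stand for cost or distance per leg.

```python
import heapq


def cheapest_route(routes, origin, destination):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    best = {origin: 0}
    previous = {}
    heap = [(0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            path = [node]
            while node in previous:  # walk back to the origin
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best[node]:
            continue  # stale heap entry
        for nxt, leg_cost in routes.get(node, {}).items():
            new_cost = cost + leg_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


# Hypothetical directed route map: origin -> {next stop: cost of that leg}.
routes = {
    "factory": {"hub_north": 4, "hub_south": 2},
    "hub_north": {"store": 5},
    "hub_south": {"hub_north": 1, "store": 8},
}

print(cheapest_route(routes, "factory", "store"))
```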
Training an accurate ML model requires large amounts of data, computing power, and
infrastructure. Training a machine learning model in-house is difficult for most
organizations, given the time and cost. A cloud ML platform provides the compute, storage,
and services required to train machine learning models.
Cloud computing makes machine learning more accessible, flexible, and cost-effective
while allowing developers to build ML algorithms faster. Depending on the use case, an
organization may choose different cloud services to support their ML training projects (GPU
as a service) or leverage pre-trained models for their applications (AI as a service).
There are several barriers to entry for deploying machine learning capabilities into
enterprise applications. The expertise required to build, train, and deploy machine learning
models adds to the cost of labor, development, and infrastructure, along with the need to
purchase and operate specialized hardware equipment.
Many of these problems can be addressed by cloud computing. Public clouds and AIaaS
services help organizations leverage machine learning capabilities to solve business
problems without having to undertake the technical burden.
That said, cloud computing has limitations for machine learning workloads, which can be
summarized as follows:
Doesn’t replace experts—ML systems, even if they are managed on the cloud, still require
human monitoring and optimization. There are practical limits to what AI can do without
human oversight and intervention. Algorithms do not understand everything about a
situation and do not know how to respond to every possible input.
Data mobility—when running ML models in the cloud, it can be challenging to transition
systems from one cloud or service to another. This requires moving the data in a way that
doesn't affect model performance. Machine learning models are often sensitive to small
changes in the input data. For example, a model may not work well if you need to change
the format or size of your data.
Security concerns—cloud-based machine learning is subject to the same concerns as any
cloud computing platform. Cloud-based machine learning systems are often exposed to
public networks and can be compromised by attackers, who might manipulate ML results or
run up infrastructure costs. Cloud-based ML models are also vulnerable to denial of service
(DoS) attacks. Many of these threats do not exist when models are deployed behind a
corporate firewall.
Each AIaaS vendor offers various AI and machine learning services with different features
and pricing models. For example, some cloud AI providers offer specialized hardware for
specific AI tasks, like GPU as a Service (GPUaaS) for intensive workloads. Other services,
like AWS SageMaker, provide a fully managed platform to build and train machine learning
algorithms.
GPUaaS is often delivered as SaaS, ensuring you can focus on building, training, and
deploying AI solutions to end users. You can also use GPUaaS with a server model.
Computationally intensive tasks consume massive amounts of CPU power. GPUaaS lets you
offload some of this work to a GPU to free up resources and improve performance output.
AWS SageMaker
SageMaker is Amazon’s fully managed machine learning (ML) service. It enables you to
quickly build and train ML models and deploy them directly into a production environment.
Here are key features of AWS SageMaker:
Azure Machine Learning plays a comparable role on Microsoft's cloud. Individuals and
teams can use it to deploy ML models into an auditable and secure production
environment. It includes tools that help automate and accelerate ML workflows and
integrate models into services and applications, backed by durable Azure Resource
Manager APIs.
For Python users, scikit-learn is a common first choice for classical machine learning.
Deep learning models are most commonly developed with TensorFlow, PyTorch, or
MXNet.
Scala users commonly use Spark MLlib.
R is an older language rather than a framework, but it has many machine learning
packages and is still commonly used by data scientists.
In the Java world, the H2O framework from H2O.ai is a common choice.
Pre-Tuned AI Services
Cloud machine learning platforms provide optimized AI services for use cases like
computer vision, natural language processing, speech synthesis, and predictive analytics.
These services are typically trained and tested using more data than is available to most
businesses. They are deployed on service endpoints with sufficient compute resources,
including hardware accelerators, to ensure excellent response times and high scalability.
Cloud providers offer automated machine learning services that let you tune
hyperparameters and test multiple algorithms simultaneously. For example, Azure offers
AutoML, which supports different ensemble modeling methods and incorporates best
practices for building an ML model. It also provides a centralized workspace to keep track
of your artifacts, including the full model history.
When you deploy your model, you must monitor it continuously to ensure it functions
properly. Monitor performance to verify if the model’s predictions are relevant and accurate.
Some cloud ML platforms offer automated data drift monitoring. Look out for data drift to
keep the predictions relevant (the input data can diverge from the training data over time).
When data drift occurs, revisit your dataset and retrain the model with more relevant data.
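One way to quantify how far incoming data has diverged from the training data is the population stability index (PSI), computed per feature. The sketch below is a plain-Python version under stated assumptions: equal-width bins taken from the training range, a small floor to avoid taking the log of zero, and made-up sample values. Managed drift monitors on cloud platforms may use different statistics.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample (expected) and a live sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values
            counts[index] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    expected_share = proportions(expected)
    actual_share = proportions(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(expected_share, actual_share)
    )


# Hypothetical feature values: training data versus two live samples.
training = [float(x) for x in range(100)]
live_same = list(training)
live_shifted = [x + 50.0 for x in training]

print(population_stability_index(training, live_same))
print(population_stability_index(training, live_shifted))
```

A PSI near zero means the two distributions match. A common rule of thumb treats values above roughly 0.25 as a signal to revisit the dataset, although that threshold is a convention rather than a guarantee.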
Streaming in Cloud
A cloud video streaming service stores your video data (or someone else's) in the cloud
and streams it from there. A good cloud video streaming service hosts video, delivers it
reliably whenever you want, and scales to reach millions of viewers. Popular examples
include Netflix and Hulu, as well as services like YouTube, Vimeo, and api.video.
Before cloud video streaming was a service you could buy, broadcasting your video
content to millions required running the right servers and hardware yourself.