Chapter 7 What Is Graph Database - 1

Uploaded by

Binod Timilsaina

What Is a Graph Database?

A relational database stores data in tables, which are combined through joins when a query needs related records. In a graph database, by contrast, the data is stored as a graph made up of nodes and edges.

Nodes and Edges

Nodes are the entities, and edges are the relationships between them.
For example, consider the following graphs.

Simple Knowledge Graphs

Graph 1 conveys that Jack is the brother of Jane. Here, Jack is the source entity and Jane is the target entity, connected by the edge “brother of”.
You can tell the source and target entities apart by the direction of the arrow on the edge: the arrow points to the target entity.
In Graph 2, we see that Jane is the sister of Jack.
Graphs 1 and 2 have unidirectional edges, which means that the source and target entities can be distinguished.
In Graph 3, the relationship is bidirectional and conveys that Jack and Jane are siblings.
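The three graphs can be sketched in code as directed (source, relation, target) triples. This is a minimal illustration and does not show how any particular graph database stores data. The `Edge` type, the `targets` helper, and the relation name "sibling of" for Graph 3 are assumptions. Modeling a bidirectional edge as two opposite directed edges is only one common convention.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    """A directed edge: the arrow points from `source` to `target`."""
    source: str
    relation: str
    target: str


# Graphs 1 and 2: one unidirectional edge each, so the roles are fixed.
graph1 = [Edge("Jack", "brother of", "Jane")]
graph2 = [Edge("Jane", "sister of", "Jack")]


def bidirectional(a: str, relation: str, b: str) -> List[Edge]:
    """Graph 3 (assumed convention): a symmetric relationship stored as two opposite edges."""
    return [Edge(a, relation, b), Edge(b, relation, a)]


graph3 = bidirectional("Jack", "sibling of", "Jane")  # relation name is hypothetical


def targets(edges: List[Edge], source: str, relation: str) -> List[str]:
    """Follow every edge with the given relation that starts at `source`."""
    return [e.target for e in edges if e.source == source and e.relation == relation]


print(targets(graph1, "Jack", "brother of"))  # ['Jane']
print(targets(graph1, "Jane", "brother of"))  # [] (the edge only runs one way)
print(targets(graph3, "Jane", "sibling of"))  # ['Jack']
```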
Each of the graphs above has only one edge. Now let’s look at a quick example of a multi-edge graph.
Multi-edged Knowledge Graph
Many enterprises today use Big Data technologies to store all their data, in different formats (files, video, audio, etc.), in a single place, regardless of how it will be used in the future.
Now imagine all your business information visualized as knowledge graphs.
You could easily uncover relationships between entities or identify networks of relationships, and use this information to identify business opportunities.

Traditional Database vs Graph Database

A graph database differs from a traditional (relational) database in the following ways:
1. Data is stored as graphs in a graph database, rather than as rows and columns in a traditional database.
2. Graphs are natural-language friendly: relationships read like simple English, for example “Jack is the brother of Jane”.
3. For queries that follow many relationships, a graph database is typically faster than a traditional database, which must join tables to follow each relationship.
4. Graph visuals easily reveal network and relationship insights.
5. Graphs make it easy to add new data attributes, which in a traditional database usually requires changing the table schema first.
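The flexibility point can be illustrated with a small side-by-side comparison. In this sketch a plain Python dictionary stands in for a property graph, so no real graph database is involved. The relational side uses SQLite from the Python standard library. The node IDs and the `city` value are hypothetical.

```python
import sqlite3

# Property-graph style (hypothetical in-memory stand-in): each node carries its own
# open-ended property dictionary, so there is no table-wide schema to change.
nodes = {
    "jack": {"label": "Person", "name": "Jack"},
    "jane": {"label": "Person", "name": "Jane"},
}
edges = [("jack", "BROTHER_OF", "jane")]

# Adding an attribute touches only the node that has it.
nodes["jane"]["city"] = "Pokhara"  # illustrative value

# Relational style: the table must be altered before any row can hold the new
# attribute, and every existing row gets the new column (NULL by default).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [("jack", "Jack"), ("jane", "Jane")])
conn.execute("ALTER TABLE person ADD COLUMN city TEXT")
conn.execute("UPDATE person SET city = ? WHERE id = ?", ("Pokhara", "jane"))

print(conn.execute("SELECT id, city FROM person ORDER BY id").fetchall())
# [('jack', None), ('jane', 'Pokhara')]
```

Adding a column is possible in the relational case, but it is a schema-level change that applies to every row. The graph node simply gains a property.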

Graph Analytics/Processing
Graph analytics refers to the analysis performed on data stored in a knowledge graph.
It relates to graph databases the way data analysis relates to data management: you organize the data in a graph database before performing graph analytics.
In graph analytics, queries are executed by traversing the edges that connect the entities. For queries that follow many relationships, this is typically faster than running the equivalent query on a relational database.
To visualize the data, you can distinguish entity types such as a person or a city by adding colors, weights, and formatting, and label them however you want.
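The following is a minimal sketch of answering a query by traversing edges. Each hop is a lookup in an adjacency list. In a relational schema, a friendship table would typically need one more self-join per hop. The `friends` data and the `within_hops` helper are hypothetical. Breadth-first search is used here as a generic traversal, not as the engine of any specific graph database.

```python
from collections import deque
from typing import Dict, List

# Hypothetical adjacency list: each person maps directly to their friends.
friends: Dict[str, List[str]] = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Dan"],
    "Cat": ["Ann", "Eve"],
    "Dan": ["Bob", "Fay"],
    "Eve": ["Cat"],
    "Fay": ["Dan"],
}


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first traversal: every node reachable from `start` in at most
    `max_hops` edges, mapped to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


print(within_hops(friends, "Ann", 2))  # {'Bob': 1, 'Cat': 1, 'Dan': 2, 'Eve': 2}
```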

Types of Graph Analytics

Depending on your goal, graph analytics can be used in different ways. Let’s look at them briefly.
Node strength analysis
Node strength analysis determines the significance of a specific node in a network of nodes. The higher the strength, the more important the node is to the network.
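The section does not give a formula for node strength. One common definition in network analysis is weighted degree: the sum of the weights of the edges attached to a node. With all weights equal to 1, this reduces to the plain number of connections. The sketch below assumes that definition and uses hypothetical edges.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical undirected weighted edges: (node_a, node_b, weight).
edges: List[Tuple[str, str, float]] = [
    ("A", "B", 3.0),
    ("A", "C", 1.0),
    ("B", "C", 2.0),
    ("C", "D", 0.5),
]


def node_strength(weighted_edges: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Strength of a node = sum of the weights of all edges touching it."""
    strength: Dict[str, float] = defaultdict(float)
    for a, b, w in weighted_edges:
        strength[a] += w
        strength[b] += w
    return dict(strength)


ranking = sorted(node_strength(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [('B', 5.0), ('A', 4.0), ('C', 3.5), ('D', 0.5)]
```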

Edge strength analysis

As the name indicates, edge strength analysis is about the weight of an edge connecting two nodes. This analysis helps determine whether the edge between two nodes is strong or weak.
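When edges carry explicit weights, their strength can be read directly from the weight. When they do not, one structural proxy from social network analysis is neighborhood overlap: the share of the two endpoints' neighbors that they have in common. The sketch below assumes that measure. The edge list and the 0.5 cut-off between strong and weak are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical unweighted, undirected edges.
edges: List[Tuple[str, str]] = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
STRONG_THRESHOLD = 0.5  # hypothetical cut-off between "strong" and "weak"


def build_adjacency(edge_list: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighborhood_overlap(adj: Dict[str, Set[str]], a: str, b: str) -> float:
    """Common neighbors of a and b divided by all neighbors of either (a and b excluded)."""
    common = adj[a] & adj[b]
    either = (adj[a] | adj[b]) - {a, b}
    return len(common) / len(either) if either else 0.0


adj = build_adjacency(edges)
for a, b in edges:
    score = neighborhood_overlap(adj, a, b)
    label = "strong" if score >= STRONG_THRESHOLD else "weak"
    print(f"{a}-{b}: {score:.2f} ({label})")
# A-B: 1.00 (strong)
# A-C: 0.50 (strong)
# B-C: 0.50 (strong)
# C-D: 0.00 (weak)
```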

Clustering
Clustering enables grouping objects based on the characteristics they exhibit. Clustering is
extremely useful when you want to categorize your graph data in a customized way.
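In a graph, the simplest form of clustering is finding connected components: nodes end up in the same group whenever some path links them. More refined community-detection algorithms (for example, Louvain) split such groups further by edge density. The sketch below shows only connected components, on hypothetical data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group nodes so that two nodes share a group if and only if a path links them.
    Nodes that appear in no edge are not returned."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in list(adj):
        if start in seen:
            continue
        # Iterative depth-first search collects everything reachable from `start`.
        stack, group = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            group.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Dan", "Eve")]
for group in connected_components(edges):
    print(sorted(group))
# ['Ann', 'Bob', 'Cat']
# ['Dan', 'Eve']
```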
Path analysis
Path analysis involves finding the shortest and widest paths between two nodes. This kind of analysis is used in social network analysis and supply chain optimization.
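Both path types can be computed with variants of Dijkstra's algorithm. The shortest path minimizes the sum of edge lengths. The widest (maximum-bottleneck) path maximizes the smallest capacity along the route. The network below is hypothetical, with a distance and a capacity on each edge. The sketch assumes non-negative distances and positive capacities.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical network: graph[u] = [(v, distance_km, capacity_tonnes), ...]
Graph = Dict[str, List[Tuple[str, float, float]]]
graph: Graph = {
    "Depot": [("A", 4, 6), ("B", 2, 3)],
    "A": [("Store", 5, 8)],
    "B": [("A", 1, 3), ("Store", 8, 2)],
    "Store": [],
}


def rebuild(prev: Dict[str, str], src: str, dst: str) -> List[str]:
    """Walk predecessor links back from dst; empty list if dst was never reached."""
    if dst != src and dst not in prev:
        return []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def shortest_path(g: Graph, src: str, dst: str) -> Tuple[Optional[float], List[str]]:
    """Dijkstra: minimise the SUM of distances (distances must be non-negative)."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # the first time dst is popped, its distance is final
        if d > best[u]:
            continue  # stale entry superseded by a shorter route
        for v, dist, _cap in g[u]:
            nd = d + dist
            if nd < best.get(v, float("inf")):
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return best.get(dst), rebuild(prev, src, dst)


def widest_path(g: Graph, src: str, dst: str) -> Tuple[Optional[float], List[str]]:
    """Dijkstra variant: maximise the MINIMUM capacity along the path (the bottleneck)."""
    best = {src: float("inf")}
    prev: Dict[str, str] = {}
    heap = [(-float("inf"), src)]  # heapq is a min-heap, so widths are negated
    while heap:
        neg_w, u = heapq.heappop(heap)
        w = -neg_w
        if u == dst:
            break
        if w < best[u]:
            continue  # stale entry superseded by a wider route
        for v, _dist, cap in g[u]:
            nw = min(w, cap)
            if nw > best.get(v, 0.0):
                best[v], prev[v] = nw, u
                heapq.heappush(heap, (-nw, v))
    return best.get(dst), rebuild(prev, src, dst)


print(shortest_path(graph, "Depot", "Store"))  # (8.0, ['Depot', 'B', 'A', 'Store'])
print(widest_path(graph, "Depot", "Store"))    # (6, ['Depot', 'A', 'Store'])
```

The two answers differ. The shortest route detours through B, but B's low-capacity edges make it a poor route when throughput is what matters.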
Predictive graph analysis
In a graph database, predictive analysis uses past graph data to predict the edges or nodes that will appear in the future.
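A classic baseline for predicting future edges is common-neighbor link prediction. Pairs of nodes that are not yet connected but share many neighbors are ranked as the most likely to connect. The sketch below assumes that scoring rule and uses a hypothetical friendship graph.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical snapshot of an undirected friendship graph.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),
         ("Bob", "Dan"), ("Cat", "Dan"), ("Dan", "Eve")]


def predict_links(edge_list: List[Tuple[str, str]], top_k: int = 3) -> List[Tuple[int, str, str]]:
    """Rank currently unconnected pairs by how many neighbors they share."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    candidates = []
    for a, b in combinations(sorted(adj), 2):
        if b not in adj[a]:  # only pairs without an edge today
            candidates.append((len(adj[a] & adj[b]), a, b))
    # Highest score first; ties broken alphabetically for a stable result.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates[:top_k]


print(predict_links(edges))  # [(2, 'Ann', 'Dan'), (1, 'Bob', 'Eve'), (1, 'Cat', 'Eve')]
```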
Let’s see some real-world use cases to understand these better.

Graph Analytics in Action

There are several ways businesses tap into Graph Analytics to unlock hidden relationship
insights.
1. Social network analysis
2. Recommendation engines
3. Compliance
4. Fraud detection
5. Operations optimization
6. National security and defense
Social Network Analysis
Social networks that we use in our day-to-day lives, like Facebook, LinkedIn, and Instagram, are some of the best examples of knowledge graph applications.
Knowledge graphs make it easy to identify the influencers of a specific target audience on social media platforms. Getting the word out about your offerings through influencers, called “influencer marketing”, is now the new normal.
Companies also visualize connections and identify networks to reach an influencer or the final decision-makers. Mutual connections help a lot on LinkedIn.
Finding talent is another application of social network analysis, and there is much more!

Recommendation Engines
The “You may also know” or “You may also like” recommendations on social media platforms and entertainment applications are examples of graph analytics applications.
With graph analytics, these platforms identify a stream or creator that interests you and recommend content from that stream or creator in your feed.
“You may also know” recommendations are usually based on the school or college you studied at, the company you worked for, or a mutual connection.
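The sketch below shows a “You may also like” recommendation as a two-hop walk over a bipartite user-creator graph. It first finds users who follow at least one creator you follow, then counts which other creators those users follow. The `follows` data and the `you_may_also_like` function are hypothetical, and real platforms likely combine many more signals.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical edges of a bipartite graph: user -> creators the user follows.
follows: Dict[str, Set[str]] = {
    "u1": {"cooking", "travel"},
    "u2": {"cooking", "travel", "music"},
    "u3": {"cooking", "gaming"},
    "u4": {"music"},
    "u5": {"cooking", "music"},
}


def you_may_also_like(user: str, graph: Dict[str, Set[str]], top_k: int = 2) -> List[str]:
    """Hop 1: creators `user` follows. Hop 2: other users following any of them.
    Then count, over those users, each creator that `user` does not follow yet."""
    mine = graph[user]
    counts: Counter = Counter()
    for other, theirs in graph.items():
        if other == user or not (mine & theirs):
            continue  # skip users with no creator in common
        counts.update(theirs - mine)
    return [creator for creator, _ in counts.most_common(top_k)]


print(you_may_also_like("u1", follows))  # ['music', 'gaming']
```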

Compliance
Graph analytics simplifies implementing regulatory compliance or company-specific policies.
Examples include detecting transactions involving sanctioned businesses or banned geographies, unauthorized transactions, etc.
Knowledge graphs also help in busting cyber-attack networks.
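In a payment graph, a compliance check can flag accounts that paid a sanctioned business directly. Because the data is a graph, the same check can also reach one hop further, to accounts that paid through an intermediary. The payment edges, account names, and sanctions list below are hypothetical. The one-intermediary extension is an illustrative assumption, not a rule from any specific regulation.

```python
from typing import Iterable, Set, Tuple

# Hypothetical payment edges (payer -> payee) and a hypothetical sanctions list.
payments = [
    ("acme", "broker"),
    ("broker", "ShellCo"),
    ("zeta", "ShellCo"),
    ("acme", "supplier"),
]
sanctioned = {"ShellCo"}


def exposed_accounts(edges: Iterable[Tuple[str, str]], blocked: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (direct, indirect): accounts that paid a blocked entity themselves,
    and accounts that paid one of those direct payers (one intermediary)."""
    edges = list(edges)
    direct = {payer for payer, payee in edges if payee in blocked}
    indirect = {payer for payer, payee in edges if payee in direct} - direct - blocked
    return direct, indirect


direct, indirect = exposed_accounts(payments, sanctioned)
print(sorted(direct), sorted(indirect))  # ['broker', 'zeta'] ['acme']
```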

Fraud detection
E-commerce businesses can use knowledge graphs to detect and stop orders placed from hacked accounts, false refund claims, etc.
Banks and financial institutions can use them to tackle fraudulent insurance claims, unauthorized transactions, transactions from hacked accounts, and so on.
Because of the pandemic, many banks offer the convenience of opening accounts online. Banks can use graph analytics to identify and stop trouble-makers who open multiple accounts with no intention of using them.
For fraud detection, knowledge graph applications are mostly reactive. But with appropriate ML and AI algorithms in place, such activities can be stopped proactively as well.
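For the online account-opening case, one simple graph signal is a single identifier node, such as a phone number or a device, linked to several different accounts. The sketch below inverts hypothetical account-to-identifier edges and reports such shared identifiers. The data, the identifier format, and the `min_accounts` threshold are assumptions. A shared identifier is a reason for review, not proof of fraud.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical sign-ups: account -> identifier nodes it is linked to.
signups: Dict[str, Set[str]] = {
    "acct1": {"phone:555-0101", "device:d1"},
    "acct2": {"phone:555-0101", "device:d2"},
    "acct3": {"phone:555-0199", "device:d1"},
    "acct4": {"phone:555-0150", "device:d9"},
}


def shared_identifiers(links: Dict[str, Set[str]], min_accounts: int = 2) -> Dict[str, Set[str]]:
    """Invert account -> identifier edges and keep identifiers that at least
    `min_accounts` distinct accounts point to (candidates for manual review)."""
    accounts_by_id: Dict[str, Set[str]] = defaultdict(set)
    for account, identifiers in links.items():
        for ident in identifiers:
            accounts_by_id[ident].add(account)
    return {i: accts for i, accts in accounts_by_id.items() if len(accts) >= min_accounts}


for ident, accts in sorted(shared_identifiers(signups).items()):
    print(ident, sorted(accts))
# device:d1 ['acct1', 'acct3']
# phone:555-0101 ['acct1', 'acct2']
```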
Operations Optimization
Shortest-path graph analytics is used to optimize operations and increase business efficiency while lowering costs. There are plenty of applications, such as identifying the shortest route in supply chain management, building product distribution networks for different geographies, and so on.

National Security and Defense

The application of graph analytics to national security and defense is quite controversial, as it crosses the line of citizens’ privacy. Governments analyze individual chat messages, online activities, and calls to identify and arrest people involved in criminal activities, and also to avoid unnecessary suspicion of innocent civilians.

What Is Machine Learning in the Cloud?

Machine Learning (ML) is a subset of artificial intelligence that emulates human learning, allowing machines to improve their predictive capabilities until they can perform tasks autonomously, without being explicitly programmed. ML-driven software applications can predict new outcomes based on historical training data.
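Here is a deliberately tiny sketch of predicting new outcomes from historical training data. An ordinary least-squares line is fitted to hypothetical historical pairs and then used to predict an unseen input. The `fit_line` function and the data are illustrative only.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: advertising spend (x) vs. sales (y).
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(history_x, history_y)  # "training"
print(slope * 5.0 + intercept)  # prediction for an unseen x = 5.0 -> 11.0
```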

Training an accurate ML model requires large amounts of data, computing power, and
infrastructure. Training a machine learning model in-house is difficult for most
organizations, given the time and cost. A cloud ML platform provides the compute, storage,
and services required to train machine learning models.

Cloud computing makes machine learning more accessible, flexible, and cost-effective
while allowing developers to build ML algorithms faster. Depending on the use case, an
organization may choose different cloud services to support their ML training projects (GPU
as a service) or leverage pre-trained models for their applications (AI as a service).

Benefits of Machine Learning in the Cloud

Many organizations are capable of building machine learning models in-house, using open source frameworks such as scikit-learn, TensorFlow, or PyTorch. However, even if in-house teams are capable of building algorithms, they often find it difficult to deploy models to production and scale them to real-life workloads, which often require large computing clusters.

There are several barriers to entry for deploying machine learning capabilities into
enterprise applications. The expertise required to build, train, and deploy machine learning
models adds to the cost of labor, development, and infrastructure, along with the need to
purchase and operate specialized hardware equipment.
Many of these problems can be addressed by cloud computing. Public clouds and AIaaS services help organizations leverage machine learning capabilities to solve business problems without having to take on the full technical burden.

The key benefits of cloud computing for machine learning workloads can be summarized as
follows:

• On-demand pricing models make it possible to embark on ML initiatives without a large capital investment.
• The cloud provides the speed and performance of GPUs and FPGAs without requiring an investment in hardware.
• The cloud allows businesses to easily experiment with machine learning capabilities and scale as projects move into production and demand for those capabilities grows.
• The cloud allows access to ML capabilities without advanced skills in artificial intelligence or data science.

Limitations of Machine Learning in the Cloud

Doesn’t replace experts—ML systems, even if they are managed on the cloud, still require
human monitoring and optimization. There are practical limits to what AI can do without
human oversight and intervention. Algorithms do not understand everything about a
situation and do not know how to respond to every possible input.
Data mobility—when running ML models in the cloud, it can be challenging to transition
systems from one cloud or service to another. This requires moving the data in a way that
doesn't affect model performance. Machine learning models are often sensitive to small
changes in the input data. For example, a model may not work well if you need to change
the format or size of your data.
Security concerns—cloud-based machine learning is subject to the same concerns as any
cloud computing platform. Cloud-based machine learning systems are often exposed to
public networks and can be compromised by attackers, who might manipulate ML results or
run up infrastructure costs. Cloud-based ML models are also vulnerable to denial of service
(DoS) attacks. Many of these threats do not exist when models are deployed behind a
corporate firewall.

Types of Cloud-Based Machine Learning Services

Artificial Intelligence as a Service (AIaaS)
Artificial Intelligence as a Service (AIaaS) is a delivery model in which vendors provide artificial intelligence (AI) capabilities as a service, reducing their customers’ risk and initial investment.
It helps customers experiment with various cloud AI offerings and test different machine
learning (ML) algorithms, using the services that suit their scenario best.

Each AIaaS vendor offers various AI and machine learning services with different features
and pricing models. For example, some cloud AI providers offer specialized hardware for
specific AI tasks, like GPU as a Service (GPUaaS) for intensive workloads. Other services,
like AWS SageMaker, provide a fully managed platform to build and train machine learning
algorithms.

GPU as a Service (GPUaaS)

GPU as a Service (GPUaaS) providers eliminate the need to set up on-premises GPU infrastructure. These services let you elastically provision GPU resources on demand. This helps reduce the costs associated with in-house GPU infrastructure, increases scalability and flexibility, and enables more organizations to implement large-scale GPU computing solutions.

GPUaaS is often delivered as SaaS, so you can focus on building, training, and deploying AI solutions for end users. You can also use GPUaaS with a server model. Computationally intensive tasks consume massive amounts of CPU power; GPUaaS lets you offload some of this work to a GPU to free up resources and improve performance.

Popular Cloud Machine Learning Platforms

Here are three popular machine learning platforms offered by the leading cloud providers.

AWS SageMaker
SageMaker is Amazon’s fully managed machine learning (ML) service. It enables you to
quickly build and train ML models and deploy them directly into a production environment.
Here are key features of AWS SageMaker:

• An integrated Jupyter authoring notebook instance: provides easy access to data sources for analysis and exploration. There is no need to manage servers.
• Common machine learning algorithms: the service provides algorithms optimized for running efficiently against big data in a distributed environment.
• Native support for custom algorithms and frameworks: SageMaker provides flexible distributed training options designed to adjust to specific workflows.
• Quick deployment: the service lets you use the SageMaker console or SageMaker Studio to quickly deploy a model into a scalable and secure environment.
• Pay per usage: AWS SageMaker bills training and hosting by usage minutes. There are no minimum fees or upfront commitments.
Azure Machine Learning
Azure Machine Learning is a cloud-based service that helps accelerate and manage the entire ML project lifecycle. You can use it in workflows to train and deploy ML models, create your own model, or use a model from an open source framework like PyTorch or TensorFlow. It also lets you manage MLOps, so you can monitor, retrain, and redeploy your models.

Individuals and teams can use this service to deploy ML models into an auditable and secure production environment. It includes tools that help automate and accelerate ML workflows and integrate models into services and applications, as well as tools backed by durable Azure Resource Manager APIs.

Google Cloud AutoML

AutoML is Google Cloud’s machine learning service. It does not require extensive
knowledge of machine learning. AutoML can help you build on Google’s ML capabilities to
create custom ML models tailored to your specific needs. It lets you integrate your models
into applications and websites.

How to Choose a Cloud Machine Learning Platform

Support for ETL or ELT Pipelines
Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) are two common data pipeline models. Machine learning and deep learning amplify the need for data transformation to meet the specific requirements of ML models. ELT gives you more flexibility if you need to change transformations midway, because the raw data is already loaded into the target and can be transformed again. Such changes are commonly needed during the load phase, which is the most time-consuming phase in many big data projects.
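The difference in ordering can be sketched with an in-memory dictionary standing in for the target warehouse. All functions and records here are hypothetical. The point is that ELT keeps the raw data in the target, so a changed transformation can be re-applied without a new extract.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def extract() -> List[Row]:
    """Stand-in for reading from a source system (hypothetical records)."""
    return [{"amount": "12.50", "currency": "usd"}, {"amount": "7", "currency": "EUR"}]


def transform(rows: List[Row]) -> List[Row]:
    """Normalise types and codes into the shape a downstream model expects."""
    return [{"amount": float(r["amount"]), "currency": str(r["currency"]).upper()} for r in rows]


def etl(warehouse: Dict[str, List[Row]]) -> None:
    # ETL: transform in flight; only the transformed rows reach the target.
    warehouse["sales"] = transform(extract())


def elt(warehouse: Dict[str, List[Row]]) -> None:
    # ELT: load the raw rows first, then transform inside the target.
    warehouse["raw_sales"] = extract()
    warehouse["sales"] = transform(warehouse["raw_sales"])


def retransform(warehouse: Dict[str, List[Row]],
                new_transform: Callable[[List[Row]], List[Row]]) -> None:
    """ELT only: re-run a changed transformation from the stored raw rows,
    without extracting from the source again."""
    warehouse["sales"] = new_transform(warehouse["raw_sales"])


wh: Dict[str, List[Row]] = {}
elt(wh)
retransform(wh, lambda rows: [{"amount_cents": round(float(r["amount"]) * 100)} for r in rows])
print(wh["sales"])  # [{'amount_cents': 1250}, {'amount_cents': 700}]
```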

Support for Scale-Up and Scale-Out Training

When training large-scale models, it can be very useful for notebooks to have access to
multiple large virtual machines or containers. Training can greatly benefit from accelerators
such as GPUs, TPUs, and FPGAs. A cloud machine learning platform should provide access
to these resources at an affordable cost.

Support for Machine Learning Frameworks

Most data scientists have a preferred machine learning and deep learning framework and
programming language:

• For Python users, scikit-learn is often the best choice for machine learning.
• Deep learning models are most commonly developed with TensorFlow, PyTorch, or MXNet.
• Scala users commonly use Spark MLlib.
• R is an older language, but it has many basic machine learning packages and is still commonly used by many data scientists.
• In the Java world, the H2O.ai framework is a common choice.

Pre-Tuned AI Services
Cloud machine learning platforms provide optimized AI services for use cases like
computer vision, natural language processing, speech synthesis, and predictive analytics.
These services are typically trained and tested using more data than is available to most
businesses. They are deployed on service endpoints with sufficient compute resources,
including hardware accelerators, to ensure excellent response times and high scalability.

Monitor Prediction Performance

Cloud-based machine learning platforms should provide the tools to monitor model
performance and respond to changes. Models that provided excellent performance at first
can degrade in performance over time due to changes to data inputs. The platform should
provide observability capabilities that let you identify performance issues and understand
their root cause, allowing you to tune the model or retrain it on a more relevant dataset.

Training Machine Learning Projects in the Cloud

Identify and Understand Your Data Sources

Sort through your data and identify the sources—this could be a complicated and time-
consuming process, especially if you have incomplete data. If you need to move data from
on-premises environments to the cloud, take into account data transfer rates in case of large
data volumes, and check for any compliance or legal restrictions. It is important to provision
the appropriate storage resources to store your dataset and compute resources to process it.

Engineer the Features

Start your modeling process using iterative steps. First, conduct feature engineering to
determine the variables you want to model. Next, start training the model. Feature
engineering is a complicated but critical process and requires business and domain
knowledge for exploratory data analysis. One challenge is to ensure you have the right
number of variables to enable the model’s functionality while avoiding noise.

Train and Validate Your Model

Model training is a standard procedure with iterative testing and training steps. Cloud-based
machine learning is useful for testing multiple machine learning models, given the
flexibility of cloud computing resources. The algorithms you use depend on your business requirements, data accuracy requirements, data volume and availability, parameters, and the computing task (e.g., classification or prediction).

Cloud providers offer automated machine learning services that let you tune
hyperparameters and test multiple algorithms simultaneously. For example, Azure offers
AutoML, which supports different ensemble modeling methods and incorporates best
practices for building an ML model. It also provides a centralized workspace to keep track
of your artifacts, including the full model history.
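Automated ML services do far more than this. Their core idea can still be sketched in a few lines: try several hyperparameter values and keep the one that scores best on held-out validation data. The k-nearest-neighbors model, the data, and the candidate grid are hypothetical choices for illustration. They do not describe how Azure AutoML works internally.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, str]  # (feature value, class label)

# Hypothetical 1-D data; (2.5, "b") is a deliberately noisy training point.
train: List[Point] = [(1, "a"), (2, "a"), (2.5, "b"), (3, "a"), (6, "b"), (7, "b"), (8, "b")]
validation: List[Point] = [(2.4, "a"), (2.6, "a"), (7.5, "b")]


def knn_predict(data: Sequence[Point], x: float, k: int) -> str:
    """Majority label among the k training points nearest to x (use odd k for 2 classes)."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(data: Sequence[Point], holdout: Sequence[Point], k: int) -> float:
    return sum(knn_predict(data, x, k) == y for x, y in holdout) / len(holdout)


def grid_search(data, holdout, grid=(1, 3, 5)):
    """Score every candidate k on the validation set; keep the first best one."""
    scores = {k: accuracy(data, holdout, k) for k in grid}
    best_k = max(grid, key=lambda k: scores[k])
    return best_k, scores


best_k, scores = grid_search(train, validation)
print(best_k, scores)  # 3 {1: 0.3333333333333333, 3: 1.0, 5: 1.0}
```

With k = 1 the model follows the noisy point and misclassifies two validation points. Larger k smooths that out, which the validation score reveals.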

Deploy and Monitor Your Model

Once you’ve built a model that meets your business objectives, you can deploy it at scale. If you trained the model on a cloud-based ML platform, deployment should be straightforward. This typically involves defining the model endpoint, specifying the computing resources that should run the model, and flipping the switch.

When you deploy your model, you must monitor it continuously to ensure it functions
properly. Monitor performance to verify if the model’s predictions are relevant and accurate.
Some cloud ML platforms offer automated data drift monitoring. Look out for data drift to
keep the predictions relevant (the input data can diverge from the training data over time).
When data drift occurs, revisit your dataset and retrain the model with more relevant data.
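One common statistic for detecting data drift is the Population Stability Index (PSI). It compares how a feature's values are distributed across bins in the training data versus in production. Cloud platforms may use other statistics. The bin cut points and data below are hypothetical. The 0.25 threshold follows a commonly quoted rule of thumb, not a standard.

```python
import bisect
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], cut_points: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; bins are (-inf, c0), [c0, c1), ..., [c_last, +inf).
    Empty bins get a tiny floor so the logarithm below stays defined."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(training: Sequence[float], production: Sequence[float], cut_points: Sequence[float]) -> float:
    """Population Stability Index: sum over bins of (p - t) * ln(p / t)."""
    t = bin_shares(training, cut_points)
    p = bin_shares(production, cut_points)
    return sum((pi - ti) * math.log(pi / ti) for ti, pi in zip(t, p))


training = [1, 2, 3, 4, 5, 6, 7, 8]
production = [5, 6, 7, 8, 9, 10, 11, 12]  # hypothetical shifted inputs
cuts = [3, 6]

score = psi(training, production, cuts)
print(round(score, 2), "drift" if score > 0.25 else "stable")
```

Identical distributions give a PSI of 0. The shifted production sample here scores well above the threshold, which would be the signal to revisit the dataset and retrain.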

Streaming in the Cloud
A cloud video streaming service streams and stores your video data (or someone else's video data) in the cloud. A good cloud video streaming service hosts video, delivers it reliably whenever you want, and scales to reach millions of viewers. Popular cloud video streaming services include Netflix and Hulu, but the category also includes services like YouTube, Vimeo, and api.video.

Before cloud video streaming was a service you could buy, if you wanted to broadcast your
video content to millions, you needed the right servers and hardware to do so.

How does cloud video streaming work?

Cloud video streaming uses a network of servers that host and deliver video. When you are
ready to live stream or upload a video for viewing, the services you've selected will
transcode your video to prepare and optimize it for transmission.

Features of a cloud video streaming service

The ideal cloud video service will offer customers everything they need to ensure video
broadcasting success. That means including:
API access - Depending on the type of service, a cloud video service's API features will be wildly different. For example, Netflix's API lets you access data about its movie and TV titles, while YouTube's API lets you retrieve information and videos for different users and add them to applications.
Live streaming and recording - Transmitting live video and audio is a key feature of a
cloud video streaming service, but equally important is live recording. Not everyone can
tune in to your live stream when it's happening. That's why a good cloud video streaming
service will allow you to record your live stream for playback later.
Video player - Cloud video streaming services include a video player. HTML5 video players are the most popular and common; even Netflix uses HTML5 for its video player.
Monetization - Depending on a streaming service's angle, monetization is included either so you can monetize your content or so the service can monetize the content it shows you. Who controls the monetization depends on the service.
Video analytics - A useful feature for a cloud video streaming service is analytics. No matter the type of platform, knowing how it's being used is valuable. Some platforms, for example Hulu, won't expose analytics to the customer, because that information is meant for the people offering the platform as a service.
Content Delivery Network - A video streaming service is only as good as its content delivery network (CDN). Today, big companies like Akamai ensure that their servers can process all types of content with ease, including video. However, when it comes to video, the very best content delivery comes from CDNs designed and optimized solely to handle video. Many popular streaming services already do this by running their own CDNs.
Support - If something goes wrong while you're watching content on a streaming service or
building your own, you will want help with it. That's why every good video streaming
service offers help in the form of live support and documentation to answer your questions.
Privacy and Security - All users and customers want privacy for their personal data and the ability to add privacy and security options. For example, someone who paid to view content on a streaming service wants to know that no one else can use their account without their knowledge. And someone building their own service will want to be able to choose how to restrict and display their content to customers on their platform.