Introduction To Smart Systems

Big data and cloud computing are interrelated concepts. Big data refers to vast amounts of structured, semi-structured, or unstructured data from various sources that is analyzed to identify trends or patterns. Cloud computing provides on-demand access to computing resources and services. The cloud enables scalable and cost-effective big data processing by providing vast infrastructure that users can assemble as needed for analytics projects. While the cloud offers benefits like scalability, agility and cost savings, it also presents disadvantages such as network dependence, storage costs, security risks, and lack of standardization that must be considered.


Introduction to smart systems
Big data and Cloud services

 What is big data in the cloud?

 Big data and cloud computing are two distinctly different ideas, but the two concepts have become so interwoven that they are almost inseparable. It's important to define the two ideas and see how they relate.

 Big data

 Big data refers to vast amounts of data that can be structured, semi-structured or unstructured. It is all about analytics and is usually derived from different sources, such as user input, IoT sensors and sales data.
 Big data also refers to the act of
processing enormous volumes of
data to address some query, as
well as identify a trend or pattern.
 Data is analyzed through a set of
mathematical algorithms, which
vary depending on what the data
means, how many sources are
involved and the business's intent
behind the analysis.
 Distributed computing software
platforms, such as Apache Hadoop,
Databricks and Cloudera, are used
to split up and organize such
complex analytics.
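At its core, this split-and-merge approach is the MapReduce pattern. A toy sketch in plain Python (the chunks and word counts are invented for illustration; a real Hadoop or Spark job distributes the map and reduce phases across many machines):

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Map: each worker emits (word, 1) pairs for its chunk of the data set.
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce: counts for the same key are merged across all workers.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# The data set is split into chunks that independent workers process in parallel.
chunks = ["big data in the cloud", "the cloud scales big data"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(mapped)
print(counts["big"])  # 2
```

The same split/map/merge structure underlies real word counts, log aggregations and join pipelines; the platforms named above handle the distribution, scheduling and fault tolerance around it.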
Cloud

 Cloud computing provides computing resources and services on demand. A user can easily assemble the desired infrastructure of cloud-based compute instances and storage resources, connect cloud services, upload data sets and perform analyses in the cloud. Users can engage almost limitless resources across the public cloud, use those resources for as long as needed and then dismiss the environment -- paying only for the resources and services that were actually used.
The pros of big data in the cloud

 Scalability

 A typical business data center faces limits in physical space, power, cooling and the budget to
purchase and deploy the sheer volume of hardware it needs to build a big data infrastructure. By
comparison, a public cloud manages hundreds of thousands of servers spread across a fleet of
global data centers. The infrastructure and software services are already there, and users can
assemble the infrastructure for a big data project of almost any size.

 Agility

 Not all big data projects are the same. One project may need 100 servers, and another project
might demand 2,000 servers. With cloud, users can employ as many resources as needed to
accomplish a task and then release those resources when the task is complete.

 Cost

 A business data center is an enormous capital expense. Beyond hardware, businesses must also
pay for facilities, power, ongoing maintenance and more. The cloud works all those costs into a
flexible rental model where resources and services are available on demand and follow a pay-per-
use model.
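The trade-off can be seen with some back-of-the-envelope arithmetic (the $0.10 per server-hour rate and the job sizes below are hypothetical, not any provider's actual pricing):

```python
def pay_per_use_cost(instances, hourly_rate, hours):
    """Cost of renting compute only for the hours a job actually runs."""
    return instances * hourly_rate * hours

# A 2,000-server analytics job that runs for 6 hours at a hypothetical
# $0.10/server-hour...
burst_job = pay_per_use_cost(instances=2000, hourly_rate=0.10, hours=6)

# ...versus keeping 100 servers of owned capacity powered on for a month
# (720 hours) at the same notional rate, regardless of utilization.
steady_dc = pay_per_use_cost(instances=100, hourly_rate=0.10, hours=720)

print(burst_job)   # cost of the short burst
print(steady_dc)   # cost of always-on capacity
```

The point of the sketch is the shape of the model, not the numbers: in the rental model cost follows actual usage, so a huge short-lived job can be cheaper than a modest always-on footprint.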
 Accessibility
 Many clouds provide a global footprint, which enables resources and services
to deploy in most major global regions. This enables data and processing
activity to take place proximally to the region where the big data task is
located. For example, if a bulk of data is stored in a certain region of a cloud
provider, it's relatively simple to implement the resources and services for a
big data project in that specific cloud region -- rather than sustaining the cost
of moving that data to another region.
 Resilience
 Data is the real value of big data projects, and the benefit of cloud resilience
is in data storage reliability. Clouds replicate data as a matter of standard
practice to maintain high availability in storage resources, and even more
durable storage options are available in the cloud.
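Replication of this kind can be sketched as a toy model (real cloud storage replicates across data centers and handles consistency far more carefully):

```python
class ReplicatedStore:
    """Toy model of a storage service that writes every object to N replicas."""

    def __init__(self, replicas=3):
        self.nodes = [dict() for _ in range(replicas)]

    def put(self, key, value):
        # Standard practice: every write lands on all replicas.
        for node in self.nodes:
            node[key] = value

    def get(self, key, failed=()):
        # A read succeeds as long as any one replica is still reachable.
        for i, node in enumerate(self.nodes):
            if i not in failed and key in node:
                return node[key]
        raise KeyError(key)

store = ReplicatedStore(replicas=3)
store.put("dataset", "sales-2024.csv")
# Even with two of three replicas down, the data remains available.
print(store.get("dataset", failed={0, 1}))  # sales-2024.csv
```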
The cons of big data in the cloud

 Public clouds and many third-party big data services have proven their value in big data use
cases. Despite the benefits, businesses must also consider some of the potential pitfalls. Some
major disadvantages of big data in the cloud can include the following.

 Network dependence

 Cloud use depends on complete network connectivity from the LAN, across the internet, to the
cloud provider's network. Outages along that network path can result in increased latency at best
or complete cloud inaccessibility at worst. While an outage might not impact a big data project in
the same ways that it would affect a mission-critical workload, the effect of outages should still be
considered in any big data use of the cloud.
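A common way to soften transient outages along that path is to retry with exponential backoff. A minimal sketch (the `flaky_upload` function below simulates an outage; cloud SDKs typically build in comparable retry logic):

```python
import time

def call_with_backoff(request, retries=4, base_delay=0.01):
    """Retry a flaky network call with exponential backoff rather than
    failing outright on a transient outage."""
    for attempt in range(retries):
        try:
            return request()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # the outage outlasted the retry budget
            time.sleep(base_delay * 2 ** attempt)

# Simulate a transient outage: the first two calls fail, the third succeeds.
attempts = {"n": 0}
def flaky_upload():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("network path to cloud provider unavailable")
    return "uploaded"

result = call_with_backoff(flaky_upload)
print(result, "after", attempts["n"], "attempts")  # uploaded after 3 attempts
```

Backoff absorbs brief blips, but as the text notes, a prolonged outage still leaves the cloud unreachable -- retries only buy time, not availability.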

 Storage costs

 Data storage in the cloud can present a substantial long-term cost for big data projects. The three
principal issues are data storage, data migration and data retention. It takes time to load large
amounts of data into the cloud, and then those storage instances incur a monthly fee. If the data is
moved again, there may be additional fees. Also, big data sets are often time-sensitive, meaning
that some data may have no value to a big data analysis even hours into the future. Retaining
unnecessary data costs money, so businesses must employ comprehensive data retention and
deletion policies to manage cloud storage costs around big data.
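A retention policy of this kind might be sketched as follows (the object names, sizes, ages and the $0.023/GB-month rate are invented for illustration; real policies are usually expressed as storage lifecycle rules rather than application code):

```python
from datetime import datetime, timedelta, timezone

PRICE_PER_GB_MONTH = 0.023  # hypothetical rate, roughly typical of object storage

def apply_retention(objects, now, max_age_days=30):
    """Drop stored objects older than the retention window and report the
    monthly fee for what remains."""
    keep = [o for o in objects
            if now - o["stored"] <= timedelta(days=max_age_days)]
    fee = sum(o["size_gb"] for o in keep) * PRICE_PER_GB_MONTH
    return keep, round(fee, 2)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
objects = [
    {"name": "clicks-may.parquet", "size_gb": 500, "stored": now - timedelta(days=10)},
    {"name": "clicks-jan.parquet", "size_gb": 800, "stored": now - timedelta(days=120)},
]
keep, fee = apply_retention(objects, now)
print([o["name"] for o in keep])  # only the recent data set survives
print(fee)                        # monthly fee for the retained data
```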
 Security
 The data involved in big data projects can involve proprietary or personally
identifiable data that is subject to data protection and other industry- or
government-driven regulations. Cloud users must take the steps needed to
maintain security in cloud storage and computing through adequate
authentication and authorization, encryption for data at rest and in flight, and
copious logging of how they access and use data.
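A minimal sketch of the authorization-plus-audit-logging idea (the grants table and user names are made up; a real deployment would rely on the cloud provider's IAM, key management and encryption services rather than hand-rolled checks):

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

# Hypothetical grants table: which actions each principal is authorized for.
GRANTS = {"analyst": {"read"}, "pipeline": {"read", "write"}}

def access(user, action, record):
    """Authorize an action against the grants table and log every attempt,
    recording only a hash of the record rather than the raw PII."""
    allowed = action in GRANTS.get(user, set())
    fingerprint = hashlib.sha256(record.encode()).hexdigest()[:12]
    audit.info("user=%s action=%s record=%s allowed=%s",
               user, action, fingerprint, allowed)
    if not allowed:
        raise PermissionError(f"{user} may not {action}")
    return True

print(access("analyst", "read", "jane.doe@example.com"))  # True
```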
 Lack of standardization
 There is no single way to architect, implement or operate a big data
deployment in the cloud. This can lead to poor performance and expose the
business to possible security risks. Business users should document big data
architecture along with any policies and procedures related to its use. That
documentation can become a foundation for optimizations and improvements
for the future.
Choose the right cloud deployment model

 Hybrid cloud
 A hybrid cloud is useful when sharing specific resources. For example, a
hybrid cloud might enable big data storage in the local private cloud --
effectively keeping data sets local and secure -- and use the public cloud for
compute resources and big data analytical services. However, hybrid clouds
can be more complex to build and manage, and users must deal with all of
the issues and concerns of both public and private clouds.
 Multi-cloud
 With multiple clouds, users can maintain availability and capture cost advantages between providers.
However, resources and services are rarely identical between clouds, so
multiple clouds are more complex to manage. This cloud model also has
more risks of security oversights and compliance breaches than single public
cloud use. Considering the scope of big data projects, the added complexity
of multi-cloud deployments can add unnecessary challenges to the effort.
 Private cloud
 Private clouds give businesses control over their cloud environment, often to
accommodate specific regulatory, security or availability requirements.
However, it is more costly because a business must own and operate the
entire infrastructure. Thus, a private cloud might only be used for small-scale
big data projects involving sensitive data.
 Public cloud
 The combination of on-demand resources and scalability makes public cloud
ideal for almost any size of big data deployment. However, public cloud users
must manage the cloud resources and services they use. In a shared
responsibility model, the public cloud provider handles the security of the
cloud, while users must configure and manage security in the cloud.
Providers
 AWS

• Amazon Elastic MapReduce

• AWS Deep Learning AMIs

• Amazon SageMaker

 Microsoft Azure

• Azure HDInsight

• Azure Analysis Services

• Azure Databricks

 Google Cloud

• Google BigQuery

• Google Cloud Dataproc

• Google Cloud AutoML


What is machine learning?
 Machine learning is a branch of artificial intelligence
(AI) and computer science which focuses on the use
of data and algorithms to imitate the way that
humans learn, gradually improving its accuracy.

 Over the last couple of decades, the technological advances in storage and processing power have enabled some innovative products based on machine learning, such as Netflix's recommendation engine and self-driving cars.
• Artificial Intelligence: a
program that can sense,
reason, act and adapt.
• Machine Learning:
algorithms whose
performance improves as
they are exposed to more
data over time.
• Deep Learning: subset of
machine learning in which
multilayered neural
networks learn from vast
amounts of data.
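The defining property of machine learning above -- performance that improves with exposure to more data -- can be shown with a deliberately tiny "model" that merely estimates the mean of a noisy signal (the signal and sin-based noise are invented for illustration):

```python
import math

# The "model" learns the mean of a signal whose true value is 10.0, observed
# with deterministic sin(i) noise standing in for measurement error.
TRUE_VALUE = 10.0
samples = [TRUE_VALUE + math.sin(i) for i in range(10_000)]

def learned_estimate(data):
    # Learning here is just averaging: the estimate is refined by every sample.
    return sum(data) / len(data)

error_10 = abs(learned_estimate(samples[:10]) - TRUE_VALUE)
error_10k = abs(learned_estimate(samples) - TRUE_VALUE)
print(error_10k < error_10)  # True: more data, lower error
```

Real models (regressions, neural networks) are vastly more expressive, but the same principle drives them: accuracy generally improves as the training data grows.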
Application of Smart systems
 Streaming: Event stream processing correlates and enriches events from multiple
sources to discover meaningful associations. Relationships can be formed through a
combination of both spatial and temporal scales or through data aggregation itself.
Companies drive tremendous value by merely correlating data from multiple
streams enabling event recognition and notification.
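A toy sketch of this kind of correlation (the badge and door-sensor streams are invented; production stream processors such as Apache Flink or Kafka Streams do windowed joins at scale, with watermarks and state management):

```python
def correlate(stream_a, stream_b, window=5):
    """Pair events from two streams that share a key and occur within
    `window` seconds of each other (a temporal join)."""
    matches = []
    for key_a, t_a in stream_a:
        for key_b, t_b in stream_b:
            if key_a == key_b and abs(t_a - t_b) <= window:
                matches.append((key_a, t_a, t_b))
    return matches

# e.g. badge swipes and door-sensor events from two IoT sources:
badges = [("door-7", 100), ("door-3", 130)]
sensors = [("door-7", 102), ("door-3", 160)]
print(correlate(badges, sensors))  # only door-7 falls inside the window
```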

 Enrichment: Data enrichment provides the ability to combine data that is in flight
with data at rest from a tertiary source as a means of augmenting the data.
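In code, enrichment is essentially a lookup join between an in-flight event and a reference table loaded from the tertiary source (all names and fields below are illustrative):

```python
# Data at rest: a reference table loaded once from a tertiary source,
# e.g. a customer master database.
customer_table = {"c-001": {"segment": "enterprise", "region": "EMEA"}}

def enrich(event, reference):
    """Augment an in-flight event with attributes from data at rest."""
    extra = reference.get(event["customer_id"], {})
    return {**event, **extra}  # merged record: original fields plus enrichment

event = {"customer_id": "c-001", "amount": 250}
print(enrich(event, customer_table))
```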

 Archiving: A data lake provides a distributed data store (e.g., Hadoop) that can host
structured (relational dataset), semi-structured (XML, JSON) or unstructured data
(ex. PDF, document). A data archive will provide future enablement of data mining
and machine learning. How data is stored goes a long way in dictating how the data
is used. There are trade-offs here, captured by the CAP theorem (conjectured by Eric
Brewer): a distributed data store cannot provide more than two of the
following three guarantees: consistency, availability and partition tolerance.
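The data lake's ability to host all three shapes of data side by side can be sketched with a plain directory standing in for a distributed store such as HDFS (file names and contents below are invented):

```python
import json
import tempfile
from pathlib import Path

# A toy data lake: one store hosting structured rows, semi-structured JSON
# and unstructured text side by side, each kept in its native format.
lake = Path(tempfile.mkdtemp())

# Structured (relational-style rows)
(lake / "sales.csv").write_text("order_id,total\n1,99.50\n")
# Semi-structured (JSON)
(lake / "clickstream.json").write_text(json.dumps({"user": "u1", "page": "/home"}))
# Unstructured (free text, standing in for a PDF or document)
(lake / "notes.txt").write_text("Customer called about invoice 1.")

print(sorted(p.name for p in lake.iterdir()))
```

Because nothing is forced into a single schema on write, later data mining and machine learning jobs can each interpret the raw files as they need to.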
 Analyzing: How smart can a system be without analysis? Through
combinations of data mining and machine learning (e.g., Apache
Spark MLlib), Smart Systems can become increasingly cognitive and
autonomous. Through iterations, systems can discover new
patterns and new meanings and use them to find new
opportunities and capabilities to automate.

 Automation: Automation is achieved through technology-based algorithms that computers use to control both physical and software components, allowing Smart Systems to perform functions such as the closed-loop control seen in modern building management systems.
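Closed-loop control of the kind used in building management systems can be sketched as a proportional controller: sense the error, apply a correction proportional to it, repeat. The setpoint, gain and starting temperature below are illustrative:

```python
def closed_loop(setpoint, temperature, steps=50, gain=0.3):
    """Proportional closed-loop control: each cycle senses the error between
    setpoint and measurement and nudges the actuator to reduce it."""
    for _ in range(steps):
        error = setpoint - temperature
        temperature += gain * error  # heater/chiller output proportional to error
    return temperature

# Drive a room from 15 C toward a 21 C setpoint.
final = closed_loop(setpoint=21.0, temperature=15.0)
print(round(final, 3))  # converges close to 21.0
```

Real building controllers typically add integral and derivative terms (PID) to remove steady-state error and damp oscillation, but the feedback loop itself is exactly this shape.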
