Introduction To Smart Systems
Big data and Cloud services

The pros of big data in the cloud
Scalability
A typical business data center faces limits in physical space, power, cooling and the budget to
purchase and deploy the sheer volume of hardware it needs to build a big data infrastructure. By
comparison, a public cloud manages hundreds of thousands of servers spread across a fleet of
global data centers. The infrastructure and software services are already there, and users can
assemble the infrastructure for a big data project of almost any size.
Agility
Not all big data projects are the same. One project may need 100 servers, and another project
might demand 2,000 servers. With the cloud, users can employ as many resources as needed to
accomplish a task and then release those resources when the task is complete.
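That elasticity can be sketched with AWS's boto3 SDK; this is a hedged example, and the cluster size, EMR release label and role names are illustrative, not a recommendation.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Request only as many nodes as this particular task needs.
response = emr.run_job_flow(
    Name="nightly-analytics",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 100,                  # sized to the task at hand
        "KeepJobFlowAliveWhenNoSteps": False,  # release the nodes when the steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Launched cluster:", response["JobFlowId"])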
Cost
A business data center is an enormous capital expense. Beyond hardware, businesses must also
pay for facilities, power, ongoing maintenance and more. The cloud rolls all of those costs into a
flexible rental model in which resources and services are available on demand and billed on a
pay-per-use basis.
Accessibility
Many clouds provide a global footprint, which enables resources and services
to deploy in most major global regions. This lets data storage and processing
take place close to where the big data task originates. For example, if the
bulk of the data is stored in a certain region of a cloud provider, it's
relatively simple to deploy the resources and services for a big data project
in that specific cloud region -- rather than sustaining the cost of moving
that data to another region.
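As a small illustration, assuming boto3 and a data set that already lives in one AWS region, the compute-side clients can simply be pinned to that same region; the region and service choices here are illustrative.

import boto3

data_region = "eu-west-1"  # region where the bulk of the data already lives
s3 = boto3.client("s3", region_name=data_region)          # access the storage in place
athena = boto3.client("athena", region_name=data_region)  # query the data where it sits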
Resilience
Data is the real value of big data projects, and the chief benefit of cloud
resilience is reliable data storage. Clouds replicate data as a matter of
standard practice to maintain high availability in storage resources, and
even more durable storage options are available in the cloud.
The cons of big data in the cloud
Public clouds and many third-party big data services have proven their value in big data use
cases. Despite the benefits, businesses must also consider some of the potential pitfalls. Some
major disadvantages of big data in the cloud can include the following.
Network dependence
Cloud use depends on end-to-end network connectivity from the LAN, across the internet, to the
cloud provider's network. Outages along that network path can result in increased latency at best
or complete cloud inaccessibility at worst. While an outage might not impact a big data project in
the same ways that it would affect a mission-critical workload, the effect of outages should still be
considered in any big data use of the cloud.
Storage costs
Data storage in the cloud can present a substantial long-term cost for big data projects. The three
principal issues are data storage, data migration and data retention. It takes time to load large
amounts of data into the cloud, and then those storage instances incur a monthly fee. If the data is
moved again, there may be additional fees. Also, big data sets are often time-sensitive, meaning
that some data may have no value to a big data analysis even hours into the future. Retaining
unnecessary data costs money, so businesses must employ comprehensive data retention and
deletion policies to manage cloud storage costs around big data.
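One way to enforce such a policy, sketched here under the assumption of AWS S3 with boto3 (the bucket name, prefix and retention window are illustrative), is a lifecycle rule that expires time-sensitive raw data automatically:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bigdata-bucket",           # illustrative bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-stale-raw-data",
                "Filter": {"Prefix": "raw/"},  # apply only to the raw landing data
                "Status": "Enabled",
                "Expiration": {"Days": 7},     # delete once the data loses analytical value
            }
        ]
    },
)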
Security
Big data projects can involve proprietary or personally identifiable
information that is subject to data protection and other industry- or
government-driven regulations. Cloud users must take the steps needed to
maintain security in cloud storage and computing through adequate
authentication and authorization, encryption for data at rest and in flight,
and copious logging of how data is accessed and used.
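As a minimal sketch of encryption at rest, assuming AWS S3 with boto3 (the bucket, key alias and object names are illustrative): the upload below requests server-side encryption with a KMS key, while the SDK's TLS endpoints cover the data in flight.

import boto3

s3 = boto3.client("s3")  # boto3 talks to HTTPS endpoints, protecting data in flight
with open("customers.parquet", "rb") as body:
    s3.put_object(
        Bucket="example-bigdata-bucket",
        Key="datasets/customers.parquet",
        Body=body,
        ServerSideEncryption="aws:kms",           # encrypt at rest with a KMS key
        SSEKMSKeyId="alias/example-bigdata-key",  # illustrative key alias
    )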
Lack of standardization
There is no single way to architect, implement or operate a big data
deployment in the cloud. This can lead to poor performance and expose the
business to possible security risks. Business users should document their big
data architecture along with any policies and procedures related to its use.
That documentation can become a foundation for future optimizations and
improvements.
Choose the right cloud deployment model
Hybrid cloud
A hybrid cloud is useful when a business needs to split resources between environments. For example, a
hybrid cloud might enable big data storage in the local private cloud --
effectively keeping data sets local and secure -- and use the public cloud for
compute resources and big data analytical services. However, hybrid clouds
can be more complex to build and manage, and users must deal with all of
the issues and concerns of both public and private clouds.
Multi-cloud
With multiple clouds, users can improve availability and take advantage of cost differences between providers.
However, resources and services are rarely identical between clouds, so
multiple clouds are more complex to manage. This cloud model also has
more risks of security oversights and compliance breaches than single public
cloud use. Considering the scope of big data projects, the added complexity
of multi-cloud deployments can add unnecessary challenges to the effort.
Private cloud
Private clouds give businesses control over their cloud environment, often to
accommodate specific regulatory, security or availability requirements.
However, a private cloud is more costly because the business must own and
operate the entire infrastructure. Thus, a private cloud might only be
justified for sensitive, small-scale big data projects.
Public cloud
The combination of on-demand resources and scalability makes public cloud
ideal for almost any size of big data deployment. However, public cloud users
must manage the cloud resources and services they use. In a shared
responsibility model, the public cloud provider handles the security of the
cloud, while users must configure and manage security in the cloud.
Providers
AWS
• Amazon SageMaker
Microsoft Azure
• Azure HDInsight
• Azure Databricks
Google Cloud
• Google BigQuery
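As one hedged example of these services in use, a Google BigQuery query from Python might look like the sketch below; the project, dataset and table names are assumptions for illustration.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # illustrative project ID
query = """
    SELECT region, COUNT(*) AS events
    FROM `example-project.analytics.events`
    GROUP BY region
"""
for row in client.query(query).result():  # submits the job and waits for rows
    print(row.region, row.events)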
Application of Smart systems
Enrichment: Data enrichment combines data that is in flight with data at rest
from a tertiary source as a means of augmenting the data.
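A minimal sketch of that idea in plain Python, with an illustrative lookup table standing in for the tertiary source at rest:

reference = {"BER": "Berlin", "LHR": "London"}  # data at rest (tertiary lookup table)

def enrich(record):
    # Augment the in-flight record with an attribute from the data at rest.
    record["city"] = reference.get(record["airport"], "unknown")
    return record

stream = [{"airport": "BER", "delay": 12}, {"airport": "LHR", "delay": 3}]
enriched = [enrich(r) for r in stream]  # each record now carries the looked-up city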
Archiving: A data lake provides a distributed data store (e.g., Hadoop) that can host
structured (relational data sets), semi-structured (XML, JSON) or unstructured data
(e.g., PDF documents). A data archive enables future data mining and machine
learning. How data is stored goes a long way in dictating how the data is used.
There are many trade-offs, as shown by the CAP theorem conjectured by Eric
Brewer, which illustrates that a distributed data store cannot provide more than
two of the following three guarantees: consistency, availability and partition
tolerance.
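For instance, here is a hedged PySpark sketch of archiving semi-structured data into an HDFS-backed lake as Parquet; the paths and application name are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("archive").getOrCreate()

events = spark.read.json("hdfs:///landing/events/")          # semi-structured JSON input
events.write.mode("append").parquet("hdfs:///lake/events/")  # columnar archive for future mining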
Analyzing: How smart can a system be without analysis? Through
combinations of data mining and machine learning (e.g., Spark
ML), smart systems can become increasingly cognitive and
autonomous. Through iteration, systems can discover new
patterns and new meanings and use them to find new
opportunities and capabilities to automate.
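A minimal sketch of that discovery loop with Spark ML, assuming the illustrative lake path and numeric feature columns below: k-means clustering groups the records so that previously unseen patterns can be inspected.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("discover-patterns").getOrCreate()
df = spark.read.parquet("hdfs:///lake/events/")  # archived data from the lake

# Assemble numeric columns into the single feature vector Spark ML expects.
assembled = VectorAssembler(
    inputCols=["delay", "duration"], outputCol="features"
).transform(df)

model = KMeans(k=5, seed=42).fit(assembled)     # discover five candidate clusters
clustered = model.transform(assembled)          # adds a 'prediction' column per record
clustered.groupBy("prediction").count().show()  # inspect the newly found groupings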