Big Data and Cloud
Big Data is a concept that deals with storing, processing and analyzing
large amounts of data. Cloud computing, on the other hand, is about
offering the infrastructure to enable such processes in a cost-effective and
efficient manner.
Big data has also become key in machine learning to train complex
models and facilitate AI. While there is benefit to big data, the sheer
amount of computing resources and software services needed to support
big data efforts can strain the financial and intellectual capital of even the
largest businesses. The cloud has made great strides in filling the need for
big data. It can provide almost limitless computing resources and services
that make big data initiatives possible for any business.
Big data refers to vast amounts of data that can be structured,
semi-structured or unstructured. It is primarily used for analytics and is
usually derived from different sources, such as user input, IoT sensors
and sales data.
Big data also refers to the act of processing enormous volumes of data to
address some query, as well as identify a trend or pattern. Data is
analyzed through a set of mathematical algorithms, which vary depending
on what the data means, how many sources are involved and the
business's intent behind the analysis. Distributed computing software
platforms, such as Apache Hadoop, Databricks and Cloudera, are used to
split up and organize such complex analytics.
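The split-and-combine idea behind platforms like Hadoop can be sketched in plain Python. This is a single-process illustration of the map and reduce phases, not real distributed code; the log lines are invented sample data.

```python
from collections import defaultdict

def map_phase(records):
    # Map step: each record is split into (key, 1) pairs, the way a
    # Hadoop mapper emits intermediate key-value pairs.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reduce step: pairs sharing a key are combined into one total.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# In a real cluster the records would be partitioned across many nodes;
# here the whole pipeline runs in one process for illustration.
logs = ["error timeout", "ok", "error disk", "ok ok"]
counts = reduce_phase(map_phase(logs))
```

On a cluster, the mapper output would be shuffled across the network so that all pairs with the same key land on the same reducer; the combining logic itself is the same.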
The problem with big data is the size of the computing and networking
infrastructure needed to build a big data facility. The financial investment
in servers, storage and dedicated networks can be substantial, as can the
software expertise required to set up an effective distributed
computing environment. And, once an organization makes an investment
in big data, it's only valuable to the business when it's operating -- it's
worthless when idle. The demands of big data have long kept the
technology limited to only the largest and best-funded organizations. This
is where cloud computing has made incredible inroads.
Cloud computing services can be broken down into three models that
stack on top of one another: infrastructure as a service (IaaS), platform
as a service (PaaS) and software as a service (SaaS). For big data
projects, the cloud offers several benefits:
Agility
Not all big data projects are the same. One project may need 100 servers, and
another project might demand 2,000 servers. With cloud, users can employ as
many resources as needed to accomplish a task and then release those resources
when the task is complete.
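The provision-then-release pattern can be modeled with a toy resource pool. The class and server counts are illustrative, not any provider's API:

```python
class ElasticPool:
    """Toy model of cloud elasticity: acquire servers for a task,
    then release them so billing stops (names are illustrative)."""

    def __init__(self):
        self.active = 0

    def provision(self, servers):
        self.active += servers
        return self.active

    def release(self, servers):
        self.active = max(0, self.active - servers)
        return self.active

pool = ElasticPool()
pool.provision(100)    # small project: 100 servers
pool.release(100)      # done: back to zero, no idle cost
pool.provision(2000)   # a larger project scales independently
peak = pool.active
```

The point is that capacity follows the task: between projects the pool sits at zero, which is exactly what an owned data center cannot do.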
Cost
A business data center is an enormous capital expense. Beyond hardware,
businesses must also pay for facilities, power, ongoing maintenance and more.
The cloud works all those costs into a flexible rental model where resources and
services are available on demand and follow a pay-per-use model.
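The difference between the two cost models can be made concrete with a small calculation. All figures here are made up for illustration, not real provider pricing:

```python
def on_demand_cost(hours_used, rate_per_hour):
    # Pay-per-use: billed only for hours actually consumed.
    return hours_used * rate_per_hour

def owned_cost(capex, monthly_opex, months):
    # Owned data center: capital outlay plus fixed running costs,
    # paid whether the hardware is busy or idle.
    return capex + monthly_opex * months

# Illustrative numbers only.
cloud = on_demand_cost(hours_used=200, rate_per_hour=3.0)
datacenter = owned_cost(capex=50_000, monthly_opex=2_000, months=12)
```

For a bursty big data workload that runs a few hundred hours per year, the pay-per-use side of this comparison usually wins; for a workload that runs continuously, the balance can tip the other way.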
Accessibility
Many clouds provide a global footprint, which enables resources and services to
deploy in most major global regions. This enables data and processing activity to
take place proximally to the region where the big data task is located. For
example, if a bulk of data is stored in a certain region of a cloud provider, it's
relatively simple to implement the resources and services for a big data project in
that specific cloud region -- rather than sustaining the cost of moving that data to
another region.
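The place-compute-near-data decision can be sketched as picking the region that already holds the most data, so the bulk of the data set never crosses a region boundary. The region names and terabyte figures are hypothetical:

```python
def best_region(data_by_region):
    """Place compute in the region holding the most data, avoiding
    cross-region transfer for the bulk of the data set."""
    return max(data_by_region, key=data_by_region.get)

# Hypothetical inventory: terabytes stored per cloud region.
inventory = {"us-east-1": 120, "eu-west-1": 15, "ap-south-1": 3}
region = best_region(inventory)
```

A real placement decision would also weigh compute prices and regulatory constraints per region, but data gravity is usually the dominant factor.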
Resilience
Data is the real value of big data projects, and the benefit of cloud resilience is in
data storage reliability. Clouds replicate data as a matter of standard practice to
maintain high availability in storage resources, and even more durable storage
options are available in the cloud.
Network dependence
Cloud use depends on complete network connectivity from the LAN, across the
internet, to the cloud provider's network. Outages along that network path can
result in increased latency at best or complete cloud inaccessibility at worst.
While an outage might not impact a big data project in the same ways that it
would affect a mission-critical workload, the effect of outages should still be
considered in any big data use of the cloud.
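One common way to absorb brief outages is retry with exponential backoff, so a transient network failure degrades to added latency instead of a hard error. A minimal sketch with a simulated flaky endpoint:

```python
import time

def call_with_retry(request, attempts=3, base_delay=0.01):
    """Retry a flaky network call with exponential backoff: a brief
    outage becomes added latency rather than outright failure."""
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # outage outlasted the retry budget
            time.sleep(base_delay * 2 ** attempt)

# Simulated endpoint that fails twice and then recovers.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("link down")
    return "ok"

result = call_with_retry(flaky)
```

Batch-oriented big data jobs tolerate this kind of delay well, which is one reason outages hurt them less than interactive, mission-critical workloads.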
Storage costs
Data storage in the cloud can present a substantial long-term cost for big data
projects. The three principal issues are data storage, data migration and data
retention. It takes time to load large amounts of data into the cloud, and then those
storage instances incur a monthly fee. If the data is moved again, there may be
additional fees. Also, big data sets are often time-sensitive, meaning that some
data may have no value to a big data analysis even hours into the future. Retaining
unnecessary data costs money, so businesses must employ comprehensive data
retention and deletion policies to manage cloud storage costs around big data.
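A retention policy of this kind reduces to filtering objects by age. The object names and 30-day window below are illustrative; in a real cloud the same logic would typically be expressed as a lifecycle rule on a storage bucket:

```python
from datetime import datetime, timedelta, timezone

def expire_objects(objects, retention_days):
    """Split stored objects into those within the retention window
    and those old enough to delete."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    kept = {name: ts for name, ts in objects.items() if ts >= cutoff}
    deleted = sorted(set(objects) - set(kept))
    return kept, deleted

now = datetime.now(timezone.utc)
objects = {
    "clicks-today.parquet": now,
    "clicks-last-quarter.parquet": now - timedelta(days=90),
}
kept, deleted = expire_objects(objects, retention_days=30)
```

Running such a sweep on a schedule keeps the monthly storage bill proportional to the data that still has analytical value.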
Security
The data involved in big data projects can involve proprietary or personally
identifiable data that is subject to data protection and other industry- or
government-driven regulations. Cloud users must take the steps needed to
maintain security in cloud storage and computing through adequate authentication
and authorization, encryption for data at rest and in flight, and copious logging of
how they access and use data.
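The authorization-plus-logging requirement can be sketched as a single gate through which every data access passes. The role names and grant table are invented for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

# Toy policy: which actions each role may perform (illustrative).
GRANTS = {"analyst": {"read"}, "admin": {"read", "write"}}

def access(user, role, action, dataset):
    """Authorize an action against the role's grants and log every
    attempt, allowed or denied, for later audit."""
    allowed = action in GRANTS.get(role, set())
    audit.info("user=%s role=%s action=%s dataset=%s allowed=%s",
               user, role, action, dataset, allowed)
    if not allowed:
        raise PermissionError(f"{role} may not {action} {dataset}")
    return True

ok = access("dana", "analyst", "read", "sales2024")
```

Logging denied attempts as well as granted ones is what makes the audit trail useful when a regulator or incident response team asks who touched which data.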
Lack of standardization
There is no single way to architect, implement or operate a big data deployment in
the cloud. This can lead to poor performance and expose the business to possible
security risks. Business users should document big data architecture along with
any policies and procedures related to its use. That documentation can become a
foundation for optimizations and improvements for the future.
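Such documentation is most useful when it is checkable. A minimal sketch: validate each deployment description against the fields the team has agreed to record (the field names here are assumptions, not a standard):

```python
# Fields the team has agreed every deployment document must record
# (illustrative, not an industry standard).
REQUIRED = {"region", "storage_class", "encryption", "retention_days"}

def validate_deployment(doc):
    """Return the agreed-upon fields missing from a deployment
    description, so gaps in the documentation are flagged early."""
    return sorted(REQUIRED - doc.keys())

doc = {"region": "eu-west-1", "storage_class": "standard",
       "encryption": "aes-256"}
missing = validate_deployment(doc)
```

Running this check in review or CI turns the documented architecture from a static file into an enforced convention.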
Private cloud
Private clouds give businesses control over their cloud environment, often
to accommodate specific regulatory, security or availability requirements.
However, a private cloud is more costly because the business must own and
operate the entire infrastructure. Thus, a private cloud might only be used
for sensitive, small-scale big data projects.
Providers not only offer services and documentation, but can also arrange
for support and consulting to help businesses optimize their big data
projects. A sampling of available big data services from the top three
providers includes the following:
AWS
Amazon Elastic MapReduce, AWS Deep Learning AMIs, Amazon SageMaker
Microsoft Azure
Azure HDInsight, Azure Analysis Services, Azure Databricks
Google Cloud
Google BigQuery, Google Cloud Dataproc, Google Cloud AutoML
Parameters of comparison: Big Data vs. Cloud Computing