
CASE STUDY
D.V SRIJA - 321506402090
4/6 CSE-2
UNIT 7 - INSIDE CLOUD
 INTRODUCTION TO CLOUD COMPUTING
 INTRODUCTION TO MAP REDUCE
 BIG DATA AND ITS IMPACT ON CLOUD
COMPUTING
 HADOOP-OVERVIEW OF BIG DATA
 BUSINESS IMPACT OF CLOUD COMPUTING
1. Introduction to Cloud Computing:
What is Cloud Computing?
Cloud computing is the on-demand access to
computing resources—physical servers or virtual servers,
data storage, networking capabilities, application
development tools, software, AI-powered analytic tools and
more—over the internet with pay-per-use pricing.
The cloud computing model offers customers greater
flexibility and scalability compared to traditional on-premises
infrastructure.
Cloud computing plays a pivotal role in our everyday
lives, whether we are accessing a cloud application like Google
Gmail, streaming a movie on Netflix or playing a
cloud-hosted video game.
Types of Cloud Computing:
Public:
A public cloud is a type of cloud computing in which a
cloud service provider makes computing resources available
to users over the public internet. These resources include
SaaS applications and individual virtual machines (VMs).
Private:
A private cloud is a cloud environment where all cloud
infrastructure and computing resources are dedicated to one
customer only. Private cloud combines many benefits of
cloud computing—including elasticity, scalability and ease of
service delivery—with the access control, security and
resource customization of on-premises infrastructure.
Hybrid:
A hybrid cloud is just what it sounds like: a combination
of public cloud, private cloud and on-premises environments.
Specifically (and ideally), a hybrid cloud connects a
combination of these three environments into a single,
flexible infrastructure for running the organization’s
applications and workloads.

Cloud computing services:


IaaS (Infrastructure-as-a-Service):
IaaS (Infrastructure-as-a-Service) provides on-demand
access to fundamental computing resources—physical and
virtual servers, networking and storage—over the internet on
a pay-as-you-go basis.
PaaS (Platform-as-a-Service):
PaaS provides software developers with an on-demand
platform—hardware, complete software stack, infrastructure
and development tools—for running, developing and
managing applications without the cost, complexity and
inflexibility of maintaining that platform on-premises. The
platform typically includes servers, networks, storage,
operating system software, middleware and databases.
SaaS (Software-as-a-Service):
SaaS (Software-as-a-Service), also known as cloud-based
software or cloud applications, is application software hosted
in the cloud.
SaaS is the primary delivery model for most commercial
software today, ranging from CRM applications (for example,
Salesforce) to robust enterprise database and artificial
intelligence (AI) software.
2. MapReduce Overview:
 WHAT IS MAPREDUCE?
MapReduce is a data processing model used to process
data in parallel in a distributed environment. It was introduced
in 2004 in the paper "MapReduce: Simplified Data Processing
on Large Clusters," published by Google.

Steps in MapReduce:
The map phase takes input data in the form of pairs and
returns a list of <key, value> pairs. The keys are not
necessarily unique at this stage.
Using the output of the map phase, sort and shuffle are
applied by the Hadoop framework. Sort and shuffle act on the
list of <key, value> pairs and emit each unique key together
with the list of values associated with it: <key, list(values)>.
The output of sort and shuffle is sent to the reducer phase.
The reducer applies a user-defined function to the list of
values for each unique key, and the final <key, value> output
is stored or displayed.
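The flow above can be simulated in plain Python. The sketch below is only an illustration (the function names and the sample documents are made up for the example and are not part of Hadoop itself); it walks a small word count through the map, sort-and-shuffle, and reduce phases:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit a <word, 1> pair for every word in every document.
    return [(word, 1) for doc in documents for word in doc.split()]

def shuffle_phase(pairs):
    # Sort and shuffle: sort by key, then group values so each unique
    # key maps to the list of values emitted for it.
    pairs = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

def reduce_phase(grouped):
    # Reduce: apply the user-defined function (here, sum) to each key's values.
    return {key: sum(values) for key, values in grouped}

documents = ["cloud computing and big data", "big data on the cloud"]
print(reduce_phase(shuffle_phase(map_phase(documents))))
# {'and': 1, 'big': 2, 'cloud': 2, 'computing': 1, 'data': 2, 'on': 1, 'the': 1}
```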
Let us take a real-world example to comprehend the
power of MapReduce. Twitter receives around 500 million
tweets per day, which is nearly 6,000 tweets per second. The
following steps show how Twitter could manage its tweets
with the help of MapReduce.
Tokenize: Tokenizes the tweets into maps of tokens and
writes them as key-value pairs.
Filter: Filters unwanted words from the maps of
tokens and writes the filtered maps as key-value pairs.
Count: Generates a token counter per word.
Aggregate Counters: Prepares an aggregate of similar
counter values into small manageable units.
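As a rough sketch of these four steps, the following Python fragment applies them to a couple of made-up tweets (the tweets and the stop-word list are invented for the example, and the final aggregation step is interpreted loosely as bucketing words by their counts):

```python
from collections import Counter

tweets = ["cloud computing is the future", "big data needs the cloud"]
stop_words = {"is", "the"}  # hypothetical list of unwanted words

# Tokenize: split each tweet into (token, 1) key-value pairs.
tokens = [(word, 1) for tweet in tweets for word in tweet.split()]

# Filter: drop unwanted words from the token maps.
filtered = [(word, one) for word, one in tokens if word not in stop_words]

# Count: generate a counter per word.
counts = Counter()
for word, one in filtered:
    counts[word] += one

# Aggregate Counters: group the per-word counts into small,
# manageable units (here, bucketed by count value).
aggregated = {}
for word, total in counts.items():
    aggregated.setdefault(total, []).append(word)

print(counts)      # Counter({'cloud': 2, 'computing': 1, ...})
print(aggregated)  # {2: ['cloud'], 1: ['computing', 'future', ...]}
```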

3. Big Data and Its Impact on Cloud Computing:


Big Data is a term that describes extremely large and
complex sets of structured and unstructured data that are
too cumbersome to process through traditional database
management tools. The true power of Big Data lies in the
opportunity for deep analysis it provides. Analyzing Big Data
can lead to uncovering patterns, correlations, and insights
that are invaluable in making data-driven decisions and
automating various aspects of a business.
One of the most popular frameworks for understanding
Big Data is the concept of the Three Vs: Volume, Velocity, and
Variety.
Volume refers to the sheer size of the data, often
ranging from terabytes to petabytes.
Velocity indicates the speed at which new data is
generated and processed. Businesses like social media
platforms may deal with real-time or near-real-time
information that requires rapid processing.
Variety stands for the different types of data; in addition
to traditional structured data, Big Data can include text,
images, sound, video, and more.

Challenges of implementing big data


The most commonly reported big data challenges
include:
1.Lack of data talent and skills:
Data scientists, data analysts, and data engineers are
in short supply—and are some of the most highly sought
after (and highly paid) professionals in the IT industry. Lack of
big data skills and experience with advanced data tools is one
of the primary barriers to realizing value from big data
environments.
2.Speed of data growth:
Big data, by nature, is always rapidly changing and
increasing. Without a solid infrastructure in place that can
handle your processing, storage, network, and security
needs, it can become extremely difficult to manage.
3.Problems with data quality:
Data quality directly impacts the quality of decision-
making, data analytics, and planning strategies. Raw data is
messy and can be difficult to curate. Having big data doesn’t
guarantee results unless the data is accurate, relevant, and
properly organized for analysis. Poor-quality data can slow
down reporting and, if not addressed, can lead to misleading
results and worthless insights.
4.Security concerns:
Big data contains valuable business and customer
information, making big data stores high-value targets for
attackers. Since these datasets are varied and complex, it can
be harder to implement comprehensive strategies and
policies to protect them.
4. Hadoop: Overview and Its Role in Cloud Computing:
Hadoop is an open source framework based on Java that
manages the storage and processing of large amounts of data
for applications. Hadoop uses distributed storage and parallel
processing to handle big data and analytics jobs, breaking
workloads down into smaller tasks that can be run at the
same time.

How does Hadoop work?


Hadoop allows for the distribution of datasets across a
cluster of commodity hardware. Processing is performed in
parallel on multiple servers simultaneously.
Software clients input data into Hadoop. HDFS handles
metadata and the distributed file system. MapReduce then
processes and converts the data. Finally, YARN divides the
jobs across the computing cluster.
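As a concrete (and simplified) illustration of this flow, a client can submit a job through Hadoop Streaming, which lets ordinary scripts act as the mapper and reducer while HDFS and YARN handle storage and scheduling. The word-count mapper below is only a sketch; the file names and the location of the streaming jar shown in the comments are assumptions that vary between installations:

```python
#!/usr/bin/env python3
# mapper.py: reads raw text lines from stdin and emits "<word>\t1" pairs.
# Hadoop Streaming pipes each HDFS input split into this script and collects
# whatever it prints to stdout as intermediate key-value pairs. A matching
# reducer.py would read the sorted pairs and sum the counts per word.
#
# A typical, installation-dependent submission command looks like:
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#       -input /data/tweets -output /data/wordcount \
#       -mapper mapper.py -reducer reducer.py \
#       -file mapper.py -file reducer.py
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # One tab-separated key-value pair per line.
        print(f"{word}\t1")
```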

Modules of Hadoop:
HDFS: Hadoop Distributed File System. Google published its
GFS (Google File System) paper, and HDFS was developed on
that basis. Files are broken into blocks and stored on nodes
across the distributed architecture.
YARN: Yet Another Resource Negotiator, used for job
scheduling and for managing the cluster.
MapReduce: A framework that helps Java programs perform
parallel computation on data using key-value pairs. The Map
task takes input data and converts it into a data set that can
be computed as key-value pairs. The output of the Map task
is consumed by the Reduce task, and the output of the
reducer gives the desired result.
Hadoop Common: Java libraries that are used to start Hadoop
and are used by the other Hadoop modules.
Hadoop Architecture:
The Hadoop architecture is a package of the file system,
MapReduce engine and the HDFS (Hadoop Distributed File
System). The MapReduce engine can be MapReduce/MR1 or
YARN/MR2.
A Hadoop cluster consists of a single master and multiple
slave nodes. The master node runs the Name Node and Job
Tracker, whereas each slave node runs a Data Node and Task
Tracker.
Role of Hadoop in Cloud Computing:
Hadoop plays a significant role in cloud computing by
enhancing data storage, processing, and analysis capabilities.
Here are some key aspects:
Scalability: Cloud environments can quickly scale resources
up or down based on demand. Hadoop’s ability to add nodes
easily aligns well with cloud elasticity, allowing organizations
to handle large datasets efficiently.
Cost Efficiency: Using commodity hardware in cloud
environments reduces costs significantly. Organizations can
leverage Hadoop on cloud platforms without the need for
expensive infrastructure.
Data Storage and Management: Hadoop can store vast
amounts of structured and unstructured data in the cloud,
making it easier for organizations to manage and analyze
diverse data sources.
Integration with Other Cloud Services: Hadoop can
integrate with various cloud services, including data lakes,
analytics tools, and machine learning platforms, providing a
comprehensive ecosystem for big data solutions.
Flexibility and Accessibility: Cloud-based Hadoop
deployments allow users to access data and analytics tools
from anywhere, facilitating collaboration and real-time data
processing.
Disaster Recovery and Backup: Cloud providers often
offer robust backup and disaster recovery options, ensuring
that Hadoop data is secure and recoverable.

5. Business Impact of Cloud Computing:


1. Cost Efficiency:
Reduced Capital Expenditure: Businesses can avoid the
high costs of purchasing and maintaining hardware and
software by utilizing cloud services, converting fixed costs
into variable costs.
Pay-as-You-Go Model: Organizations can pay only for the
resources they use, leading to better cost management and
budgeting.
2. Enhanced Data Security and Compliance:
Robust Security Measures: Many cloud providers offer
advanced security features and compliance certifications,
which can be more effective than traditional in-house
solutions.
Regular Updates: Cloud services often include automatic
updates and patches, ensuring systems are secure and up to
date.
3. Business Continuity and Disaster Recovery:
Data Backup Solutions: Cloud computing simplifies
data backup and recovery processes, reducing downtime and
ensuring business continuity in case of disasters.
Geographic Redundancy: Data can be replicated
across multiple locations, enhancing resilience and
availability.
4. Enhanced Customer Experience:
Personalization: Businesses can analyze customer data
more effectively, enabling personalized services and better
customer engagement.
Faster Service Delivery: Cloud solutions can improve
response times and service delivery, enhancing overall
customer satisfaction.
 Conclusion:
Cloud computing will affect a large part of the computer
industry, including software companies and Internet service
providers. Cloud computing makes it very easy for companies
to provide their products to end users without worrying about
hardware configurations and other server requirements.
Cloud computing and virtualization are distinguished by the
fact that, in the cloud, the control-plane activities centered on
the creation, management, and maintenance of the virtual
environment are outsourced to an automated layer of APIs
and management servers.
In simple words, virtualization is a building block of cloud
computing in which interaction with the hypervisor is
managed manually. In cloud computing, by contrast, these
activities are self-managing: an API (Application Programming
Interface) is exposed so that users can consume the cloud
service on their own.
