SRM Unit 1 PPT Full Unit 1 Information Storage and Management
18CSE360T
UNIT - 1
S1 – SLO1 - Introduction to Information Storage and Management
Introduction to Information Storage and Management
Data
It is a collection of raw facts from which conclusions may be
drawn.
Data is converted into a more convenient form and stored in a computer as digital data.
Factors for digital data growth are:
Increase in data-processing capabilities
Lower cost of digital storage
Affordable and faster communication technology
Rapid increase of applications and smart devices
Types of Data
Data can be classified as:
Structured
Unstructured
Majority of data being created is unstructured
Big Data
It refers to data sets whose sizes are beyond the ability of commonly
used software tools to capture, store, manage, and process within
acceptable time limits.
Includes both structured and unstructured data generated by variety of
sources
Big data analysis in real time requires new techniques and tools that
provide:
High performance
Massively parallel processing (MPP) data platforms
Advanced analytics
Big data analytics provide an opportunity to translate large volumes of
data into right decisions
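The partition-then-combine idea behind massively parallel processing can be sketched on a single machine. This is only an illustration: threads stand in for the separate nodes of an MPP platform, and the function names (`partition`, `local_sum`, `parallel_total`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, parts):
    """Split the records into `parts` roughly equal slices."""
    return [records[i::parts] for i in range(parts)]


def local_sum(chunk):
    """Work done independently on one partition (one 'node')."""
    return sum(chunk)


def parallel_total(records, workers=4):
    """Process every partition concurrently, then combine the partial results."""
    chunks = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_sum, chunks))
    return sum(partials)


print(parallel_total(list(range(1, 101))))  # 5050
```

A real MPP platform distributes the partitions across many machines; the structure of the computation is the same.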
Big Data Ecosystem and interconnection
Geospatial data acquisition
Remote sensing Data collection
UAV function during disaster
Source: Milan Erdelj and Enrico Natalizio, UAV-Assisted Disaster Management: Applications and Open Issues, 2016 International Workshop on Wireless Sensor, Actuator and Robot Networks - ICNC Workshop
Collection of data using GPS and satellite remote sensing techniques
Requirement of Information
Growth of digital information has resulted in information
explosion
Necessary to store, protect, optimize the information
To gain competitive advantage
To derive new business opportunity
Storage
Stores data created by individuals and organizations
Provides access to data for further processing
Storage devices are:
Media card in a cell phone or digital camera
DVDs, CD-ROMs
Disk drives
Disk arrays
Tapes
UNIT - 1
S1 – SLO2 - Evolution of Storage Technology and Architecture
Evolution of Storage Architecture
Server-centric storage architecture
In the open systems, the storage was typically internal to the server. These storage
devices could not be shared with any other servers
(server-centric storage architecture)
Each server has a limited number of storage devices
Any administrative tasks, such as maintenance of the server or increasing
storage capacity, might result in unavailability of information.
Rapid increase in production of departmental servers in an enterprise resulted in
unprotected, unmanaged, fragmented islands of information and increased
capital and operating expenses.
To overcome these challenges, storage technology evolved from non-intelligent
internal storage to intelligent networked storage.
Evolution of Storage Architecture
information-centric architecture
Technology Evolution
Unit - 1
S2 – SLO1 - Data Centre Infrastructure
Data Center Infrastructure
Data Center Infrastructure - Example
A customer places an order through a client machine connected over a LAN/ WAN
to a host running an order-processing application.
The client accesses the DBMS on the host through the application to provide order-
related information, such as the customer name, address, payment method, products
ordered, and quantity ordered.
The DBMS uses the host operating system to write this data to the physical disks in
the storage array.
The storage network provides the communication link between the host and the
storage array and transports read and write requests between them.
The storage array, after receiving the read or write request from the host, performs
the necessary operations to store the data on physical disks.
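The request path in this example can be sketched as a chain of small objects. This is a toy model under stated assumptions: the class names (`Host`, `StorageNetwork`, `StorageArray`), the dictionary used in place of physical disks, and the sample order are all hypothetical.

```python
class StorageArray:
    """Stands in for the physical disks behind the storage network."""

    def __init__(self):
        self.disks = {}  # key -> record, a toy substitute for disk blocks

    def handle(self, op, key, record=None):
        if op == "write":
            self.disks[key] = record
            return True
        if op == "read":
            return self.disks.get(key)
        raise ValueError(f"unsupported operation: {op}")


class StorageNetwork:
    """Carries read and write requests between the host and the array."""

    def __init__(self, array):
        self.array = array

    def transport(self, op, key, record=None):
        return self.array.handle(op, key, record)


class Host:
    """Runs the order-processing application and its DBMS."""

    def __init__(self, network):
        self.network = network

    def place_order(self, order_id, order):
        # The application hands the order to the DBMS, which issues a write.
        return self.network.transport("write", order_id, order)

    def lookup_order(self, order_id):
        return self.network.transport("read", order_id)


array = StorageArray()
host = Host(StorageNetwork(array))
host.place_order(1001, {"customer": "A. Kumar", "product": "disk", "quantity": 2})
print(host.lookup_order(1001)["quantity"])  # 2
```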
Key Characteristics of a Data Center
Availability: ensure the availability of information when required such as financial
services, telecommunications, and e-commerce.
Security: must establish policies, procedures, and core element integration to prevent
unauthorized access to information.
Scalability: business growth often requires deploying more servers, new applications,
and additional databases. Resources should scale based on requirements, without
interrupting business operations.
Performance: provide optimal performance based on the required service levels.
Data integrity: Data integrity refers to mechanisms, such as error correction codes or
parity bits, which ensure that data is stored and retrieved exactly as it was received.
Capacity: require adequate resources to store and process large amounts of data efficiently.
Based on the requirement, the data center must provide additional capacity without interrupting
availability or with minimal disruption. Capacity may be managed by reallocating the existing
resources or by adding new resources.
Manageability: provide easy and integrated management of all its elements. Manageability
can be achieved through automation and reduction of human (manual) intervention in
common tasks.
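As a small illustration of the data integrity mechanism mentioned above, the sketch below stores an even-parity bit alongside the data and rechecks it on read. The function names are hypothetical. A single parity bit only detects an odd number of flipped bits; it cannot correct them, which is the job of error correction codes.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the number of 1-bits in data is odd, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def store(data: bytes):
    """Simulate a write: keep the payload together with its parity bit."""
    return data, parity_bit(data)


def verify(data: bytes, stored_parity: int) -> bool:
    """Simulate a read: recompute the parity and compare with the stored bit."""
    return parity_bit(data) == stored_parity


payload, parity = store(b"order-1001")
corrupted = bytes([payload[0] ^ 0b00000001]) + payload[1:]  # flip one bit

print(verify(payload, parity))    # True: intact data passes the check
print(verify(corrupted, parity))  # False: the single flipped bit is detected
```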
Managing a Data Center
Monitoring - Continuous process of gathering information on various elements and services
Virtualization
Virtualization is a technique of abstracting physical resources and
making them appear as logical resources
For example partitioning of raw disks
Virtualization
Virtualization is the creation of a virtual environment (rather than actual),
such as a server, a desktop, a storage device, an operating system or
network resources.
In a cloud system, users share the data and resources present in the cloud, such as applications;
this sharing is made possible by virtualization.
All servers and software applications required by cloud providers are maintained by a
third party, and this service is chargeable.
Normal vs Virtual
Hardware Virtualization
Virtual machine software, or a virtual machine manager (VMM), installed directly on the
hardware system is known as hardware virtualization.
The main job of the hypervisor is to control and monitor the processor, memory and other
hardware resources.
After virtualization of the hardware system, different operating systems can be installed on it
and different applications run on those OSs.
Usage:
Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.
Type:
Full Virtualization – The complete simulation of the actual hardware takes place to allow the
software to run an unmodified guest OS.
Para Virtualization – In this type of virtualization, unmodified software runs in a modified OS
as a separate system.
Partial Virtualization – In this type of hardware virtualization, the software may need
modification to run.
Network Virtualization
It refers to the management and monitoring of a computer network as
a single entity from a single software-based administrator’s
console.
It is intended to allow network optimization (effective use) of data
transfer rates, scalability, reliability, flexibility, and security.
It also automates many network administrative tasks.
Usage:
Network virtualization is specifically useful for networks that
experience a huge, rapid, and unpredictable traffic increase.
The intended result of network virtualization provides improved
network productivity and efficiency.
Two categories:
Internal: Provide network-like functionality to a single system.
External: Combine many networks or parts of networks into a
virtual unit.
Storage Virtualization
Multiple network storage resources are present as a single
storage device for easier and more efficient management of these
resources. It provides various advantages as follows:
Improved storage management in a heterogeneous IT
environment
Easy updates, better availability
Reduced downtime (unavailability)
Better storage utilization
Automated management
Type
Block- It works before the file system exists. It replaces controllers
and takes over at the disk level.
File- The server that uses the storage must have software installed
on it in order to enable file-level usage.
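How several storage resources can appear as a single device may be sketched with a logical-to-physical block map. This is a simplified illustration, not how any particular product works; the class name `VirtualVolume` and the device names are hypothetical.

```python
class VirtualVolume:
    """Presents several physical devices as one contiguous block address space."""

    def __init__(self, devices):
        # devices maps a (hypothetical) device name to its capacity in blocks.
        self.extents = []  # (first_logical_block, end_logical_block, device_name)
        start = 0
        for name, blocks in devices.items():
            self.extents.append((start, start + blocks, name))
            start += blocks
        self.size = start

    def locate(self, logical_block):
        """Translate a logical block number to (device, block on that device)."""
        for first, end, name in self.extents:
            if first <= logical_block < end:
                return name, logical_block - first
        raise IndexError(f"block {logical_block} is outside the volume")


volume = VirtualVolume({"array_a": 100, "array_b": 50})
print(volume.size)         # 150
print(volume.locate(0))    # ('array_a', 0)
print(volume.locate(120))  # ('array_b', 20)
```

The host sees one 150-block volume; only the mapping layer knows which device actually holds each block.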
Memory Virtualization
It introduces a way to decouple memory from the server to provide a shared,
distributed or networked function.
It enhances performance by providing greater memory capacity without any
addition to the main memory. To do so, a portion of the disk drive serves as an
extension of the main memory.
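The idea of using disk as an extension of main memory can be sketched with a toy pager. Assumptions are labeled here: the class name `VirtualMemory` is hypothetical, main memory holds a fixed number of pages, and the least recently used page is moved to a dictionary that stands in for the disk.

```python
from collections import OrderedDict


class VirtualMemory:
    """Toy pager: a small 'RAM' backed by a larger 'disk' area."""

    def __init__(self, ram_frames):
        self.ram_frames = ram_frames
        self.ram = OrderedDict()  # page -> data, least recently used first
        self.disk = {}            # pages swapped out of RAM

    def _load(self, page):
        """Make sure the page is in RAM, evicting the LRU page if RAM is full."""
        if page in self.ram:
            self.ram.move_to_end(page)
            return
        if len(self.ram) >= self.ram_frames:
            victim, data = self.ram.popitem(last=False)
            self.disk[victim] = data
        self.ram[page] = self.disk.pop(page, None)

    def write(self, page, data):
        self._load(page)
        self.ram[page] = data

    def read(self, page):
        self._load(page)
        return self.ram[page]


vm = VirtualMemory(ram_frames=2)
vm.write(0, "a")
vm.write(1, "b")
vm.write(2, "c")   # RAM is full, so page 0 is moved to disk
print(vm.read(0))  # a (brought back from disk, evicting page 1)
```

The program addresses three pages even though only two fit in RAM at once.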
Software Virtualization
Types:
Operating system
Application virtualization
Service virtualization
Data Virtualization
You can easily manipulate data in an organized way without needing technical
details such as how it is formatted or where it is
physically located.
It decreases the data errors and workload.
Desktop virtualization
It provides work convenience and security.
Because the desktop is accessed remotely, you are able to work from any location and on any
PC.
It provides a lot of flexibility for employees to work from home or on the go.
It also protects confidential data from being lost or stolen by keeping it safe on
central servers.
It is also essential for restricted data.
Cloud Computing
On-demand self-service
Broad network access
Resource pooling
Rapid elasticity
Measured service
Characteristics of Cloud Computing
On-demand Self-service
Technologies and descriptions:
Grid computing: Form of distributed computing. Enables resources of numerous computers in a
network to work on a single task at the same time
Utility computing: Service provisioning model that offers computing resources as a metered service
Virtualization: Abstracts physical characteristics of IT resources from resource users. Enables
resource pooling and creating virtual resources from pooled resources
Service-oriented architecture (SOA): Provides a set of services that can communicate with each other
Benefits of Cloud Computing
Benefits and descriptions:
Reduced IT cost: Reduces the up-front capital expenditure (CAPEX)
Business agility (quick flow of work): Provides the ability to deploy new resources quickly.
Enables businesses to reduce time-to-market
Flexible scaling: Enables consumers to scale up, scale down, scale out, or scale in the demand
for computing resources easily. Consumers can unilaterally and automatically scale computing
resources
High availability: Ensures resource availability at varying levels, depending on consumer’s
policy and priority
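Automatic scale-out and scale-in can be illustrated with a simple proportional rule. This is a sketch under assumptions, not any provider's algorithm: the function name, the 60% utilization target, and the instance limits are all hypothetical.

```python
def desired_instances(current: int, utilization_pct: int, target_pct: int = 60,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Size the pool so that average utilization lands near the target."""
    # Ceiling division on integers avoids floating-point rounding surprises.
    needed = -(-current * utilization_pct // target_pct)
    return max(min_instances, min(max_instances, needed))


print(desired_instances(4, 90))   # 6: load is high, so scale out
print(desired_instances(4, 30))   # 2: load is low, so scale in
print(desired_instances(8, 100))  # 10: capped at max_instances
```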
Cloud Service Models
Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
Software-as-a-Service (SaaS)
Infrastructure-as-a-Service
Consumers deploy their software, including OS and application on
provider’s infrastructure
Computing resources such as processing power, memory,
storage, and networking components are offered as service
Example: Amazon Elastic Compute Cloud
Consumers have control over the OSs and deployed applications
[Figure: IaaS resource stack; recoverable labels: Database, OS, Network]
Platform-as-a-Service
Provider’s resources: database, OS, compute, storage, network
Software-as-a-Service
Provider’s resources: application, database, OS, compute, storage, network
Infrastructure as a Service (IaaS)
A service model that involves outsourcing the basic infrastructure used to support operations,
including storage, hardware, servers, and networking components.
The service provider owns the infrastructure equipment and is responsible for housing,
running, and maintaining it.
The customer typically pays on a per-use basis.
Example: DigitalOcean, Linode, Rackspace, Amazon Web Services (AWS), Cisco Metapod,
Microsoft Azure, Google Compute Engine (GCE)
Platform as a Service (PaaS)
A service model that involves outsourcing the basic infrastructure and platform (Windows, Unix).
PaaS facilitates deploying applications without the cost and complexity of buying and managing
the underlying hardware and software where the applications are hosted.
Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine,
Apache Stratos, OpenShift
Software as a Service (SaaS)
Also referred to as “software on demand,” SaaS involves outsourcing the infrastructure,
platform, and software/applications.
Services are available to the customer for a fee, pay-as-you-go, or a no charge model.
Example: Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting
Cloud Deployment Models
Public
Private
Hybrid
Community
In the public cloud model, the cloud infrastructure is provisioned for open use by the general
public.
It may be owned, managed, and operated by a business, academic, or government
organization, or some combination of them.
It exists on the premises of the cloud provider (jointly).
Consumers use the cloud services offered by the providers via the Internet and pay
metered usage charges or subscription fees.
An advantage of the public cloud is its low capital cost with enormous scalability.
For consumers, these benefits come with certain risks:
no control over the resources
the security of confidential data, network performance, and inter-operability issues.
Example: Popular public cloud service providers are Amazon, Google, and
Salesforce.com.
In the private cloud model, the cloud infrastructure is provisioned for exclusive use by a
single organization comprising multiple consumers (for example, business units).
It may be owned, managed, and operated by the organization, a third party, or some
combination of them, and it may exist on or off premises.
In the community cloud model, the cloud infrastructure is provisioned for
exclusive use by a specific community of consumers from
organizations that have shared concerns (for example, mission, security
requirements, policy, and compliance considerations).
It may be owned, managed, and operated by one or more of the
organizations in the community, a third party, or some combination
of them, and it may exist on or off premises.
In a community cloud, the costs are spread over fewer consumers
than in a public cloud.
This option is therefore more expensive but might offer a higher level of
privacy, security, and compliance.
The community cloud also offers organizations access to a vast
pool of resources compared to the private cloud.
An example in which a community cloud could be useful is
government agencies.
If various agencies within the government operate under similar
guidelines, they could all share the same infrastructure and lower
their individual agency’s investment.
In the hybrid cloud model, the cloud infrastructure is a composition of two
or more distinct cloud infrastructures (private, community, or
public) that remain unique entities, but are bound together by standardized or
proprietary technology that enables data and application
portability (for example, load balancing between clouds).
The hybrid model allows an organization to deploy less critical
applications and data to the public cloud, leveraging the
scalability and cost-effectiveness of the public cloud.
The organization’s mission-critical applications and data remain on
the private cloud that provides greater security.
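The placement policy described above can be written as a one-line rule. This is a minimal sketch: the `Workload` type, the `mission_critical` flag, and the sample workload names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    mission_critical: bool


def choose_cloud(workload: Workload) -> str:
    """Mission-critical workloads stay private; the rest go to the public cloud."""
    return "private" if workload.mission_critical else "public"


workloads = [Workload("payroll-db", True), Workload("marketing-site", False)]
placement = {w.name: choose_cloud(w) for w in workloads}
print(placement)  # {'payroll-db': 'private', 'marketing-site': 'public'}
```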
Cloud Challenges – Consumer’s Perspective
Cloud Challenges – Provider’s Perspective
Unit - 1
S3 – SLO1 – Key challenges in managing information
Key Challenges in Managing Information
Exploding digital universe
Multifold increase of information growth
Increasing dependency on information
The strategic use of information plays
Changing value of information
Information that is valuable today may become less important
tomorrow
Constraints:
Cost
Physical environment
Maintenance and support
Compliance – regulatory and legal
Hardware and software infrastructure
Interoperability and compatibility
Challenges during Data collection in physical mode
Landslide in Himalayan terrain
Rock slide in Himalayan terrain
Cloud/rain in Himalayan terrain
Less storage capacity and less power backup during data collection
Unit - 1
S3 – SLO2 – Data Center Environment: Application
Data Center Environment
Application
DBMS
Host or Compute
Connectivity
Storage
Data Center Environment - Application