SRM Unit 1 PPT Full Unit 1 Information Storage and Management

Uploaded by

Senthil Kumar

UNIT - 1

INFORMATION STORAGE AND MANAGEMENT

18CSE360T

1
UNIT - 1
S1 – SLO1 - Introduction to Information Storage and Management

2
Introduction to Information Storage and Management

 Information is increasingly important in our daily lives.


 We live in an information-dependent, on-demand world.
 We need information whenever we require it.
 We access the Internet daily to search, participate in social networking, send and
receive e-mails, and share pictures and videos.
 Equipped with a growing number of content-generating devices, more
information is created by individuals than by organizations (including
businesses, governments, non-profits, and so on).
 Information created by individuals gains value when shared with others.
When created, information resides locally on devices, such as cell phones,
smartphones, tablets, cameras, and laptops. To be shared, this information
needs to be uploaded to central data repositories (data centers) via
networks.
Cycle of information

4
Data
 It is a collection of raw facts from which conclusions may be
drawn.
 Data is converted into a more convenient form and stored in a computer as digital data.
 Factors for digital data growth are:
 Increase in data-processing capabilities
 Lower cost of digital storage
 Affordable and faster communication technology
 Rapid increase of applications and smart devices
Types of Data
Data can be classified as:
 Structured
 Unstructured
The majority of the data being created is unstructured.

6
Big Data
 It refers to data sets whose sizes are beyond the ability of commonly
used software tools to capture, store, manage, and process within
acceptable time limits.
 Includes both structured and unstructured data generated by a variety of
sources
 Big data analysis in real time requires new techniques and tools that
provide:
 High performance
 Massively parallel processing (MPP) data platforms
 Advanced analytics
 Big data analytics provides an opportunity to translate large volumes of
data into the right decisions
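
The partition-and-merge idea behind parallel data platforms can be sketched at toy scale. This is only an illustration, not an MPP platform: the threads stand in for the many nodes such a platform would use, and the function names and sample log lines are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(partition):
    """Process one partition independently of the others."""
    return Counter(word.lower() for line in partition for word in line.split())


def parallel_word_count(lines, workers=4):
    """Split the input into partitions, count each one concurrently, then merge."""
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # merge step: add the per-partition counts
    return total


# Hypothetical sample data
log_lines = ["error disk full", "warning disk slow", "error network down"]
totals = parallel_word_count(log_lines, workers=2)
print(totals.most_common(2))
```

Each partition is processed without reference to the others, so in principle the same pattern spreads across many machines; only the final merge needs the partial results together.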

7
Big Data Ecosystem and interconnection

8
Geospatial data acquisition

Source: Elements of remote sensing (Lillesand and Kiefer, 1994).


9
Geospatial data

10
Remote sensing data collection: UAV function during disaster
[Figure: data collection by aerial flight and data collection by drone; pre-earthquake and post-earthquake imagery; input for machine learning.]
Source: Milan Erdelj and Enrico Natalizio, UAV-Assisted Disaster Management: Applications and Open Issues, 2016 International Workshop on Wireless Sensor, Actuator and Robot Networks - ICNC Workshop
Collection of data using GPS and satellite remote sensing techniques
[Figure panels: (1) Landslide in satellite image; (2) Landslide in field photo; (3) GPS based locational data; (4) Landslide mapping in GIS; (5) Landslide area calculation.]
[Other figure labels: Quick Bird image 2016; field observation 2016; landslide area identification; field verification; landslide vector layer creation; GPS & GIS integration; area calculation 2015 and 2016; input for machine learning.]


Information
 Information is the knowledge derived from data

Requirement of Information
 Growth of digital information has resulted in an information
explosion
 It is necessary to store, protect, and optimize information
 To gain competitive advantage
 To derive new business opportunities

13
Storage
 Stores data created by individuals and organizations
 Provides access to data for further processing
 Storage devices are:
 Media card in a cell phone or digital camera
 DVDs, CD-ROMs
 Disk drives
 Disk arrays
 Tapes

14
UNIT - 1
S1 – SLO2 - Evolution of Storage Technology and Architecture

15
Evolution of Storage Architecture
Server-centric storage architecture

 Organizations had centralized computers (mainframes) and information
storage devices (tape reels and disk packs) in their data center.
 The evolution of open systems enabled business units/departments to have
their own servers and storage.

 In the open systems, the storage was typically internal to the server. These storage
devices could not be shared with any other servers
(server-centric storage architecture)
 Each server has a limited number of storage devices
 Any administrative tasks, such as maintenance of the server or increasing
storage capacity, might result in unavailability of information.
 The rapid proliferation of departmental servers in an enterprise resulted in
unprotected, unmanaged, fragmented islands of information and increased
capital and operating expenses.
 To overcome these challenges, storage technology evolved from non-intelligent
internal storage to intelligent networked storage.
16
Evolution of Storage Architecture
information-centric architecture

 Transformation of server-centric to information-centric architecture


 In this architecture, storage devices are managed centrally and
independent of servers.
 These centrally-managed storage
devices are shared with multiple
servers.
 When a new server is deployed in the
environment, storage is assigned
from the same shared storage
devices to that server.
 The capacity of shared storage can be
increased dynamically by adding
more storage devices without
impacting information availability.
 In this architecture, information management is easier and more
cost-effective.
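
A minimal sketch of the information-centric idea, assuming a simple capacity model in gigabytes: servers are assigned storage from one centrally managed pool, and the pool can grow without touching existing assignments. The class name, method names, and sizes are hypothetical.

```python
class SharedStoragePool:
    """Centrally managed storage that is shared by multiple servers."""

    def __init__(self):
        self.capacity_gb = 0
        self.assigned = {}  # server name -> GB assigned from the pool

    def add_device(self, size_gb):
        """Grow the pool; existing assignments are not affected."""
        self.capacity_gb += size_gb

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.assigned.values())

    def assign(self, server, size_gb):
        """Give a server storage from the shared pool."""
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.assigned[server] = self.assigned.get(server, 0) + size_gb


pool = SharedStoragePool()
pool.add_device(500)
pool.add_device(500)
pool.assign("server-a", 600)   # spans more than one physical device
pool.add_device(1000)          # capacity grows while server-a keeps its storage
pool.assign("server-b", 900)   # a newly deployed server draws from the same pool
print(pool.free_gb)
```

In the server-centric model each server would be limited to its own internal devices; here the 600 GB assignment is possible even though no single device is that large.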
Technology Evolution
 Direct-attached storage (DAS):
 Connects directly to a server (host) or a group of servers in a cluster.
 Storage can be either internal or external to the server.
 External DAS alleviated the challenges of limited internal storage
capacity.
 Just a Bunch of Disks (JBOD):
 Consists of numerous disk drives inside a single storage
enclosure.
 Redundant Array of Independent Disks (RAID):
 Developed to address the cost, performance, and availability
requirements of data.
 RAID technology continues to evolve.
 It is used in all storage architectures, such as DAS and SAN.
 Storage area network (SAN):
 High-performance Fibre Channel (FC) network to facilitate block-
level communication between servers and storage.
 Storage is partitioned and assigned to a server for accessing its data.
 SAN offers scalability, availability, performance, and cost benefits
compared to DAS.
18
Technology Evolution

 Network-attached storage (NAS):
 Storage for file-serving applications.
 Connects to an existing communication network (LAN) and
provides file access to heterogeneous clients.
 Offers higher scalability, availability, performance, and cost
benefits than general-purpose file servers.

 Internet Protocol SAN (IP-SAN):
 One of the latest evolutions in storage architecture
 Convergence of technologies used in SAN and NAS
 Provides block-level communication across a local or wide
area network (LAN or WAN)
 Results in greater consolidation and availability of data.

19
Technology Evolution

20
Unit - 1
S2 – SLO1 - Data Centre Infrastructure

21
Data Center Infrastructure

 A data center is a facility that contains various IT resources to
provide centralized data-processing capabilities.

 Data centers store and manage large amounts of mission-critical data.

 The data center infrastructure includes:
 Computers
 Storage systems
 Network devices
 Power backups
 Environmental controls (such as air conditioning and fire
suppression).

22
Data Center Infrastructure

Core elements of a data center


 Application - computer program that provides the logic for
computing operations
 Database management system (DBMS) - Provides a
structured way to store data in logically organized tables that
are interrelated
 Host or Compute - computing platform (hardware and
software) that runs applications and databases
 Network - data path that facilitates communication among
various networked devices
 Storage - device that stores data persistently for subsequent
use
These core elements work together to address data-processing
requirements

23
Data Center Infrastructure - Example

Online order transaction system (Five core elements of a data center)

 A customer places an order through a client machine connected over a LAN/ WAN
to a host running an order-processing application.
 The client accesses the DBMS on the host through the application to provide order-
related information, such as the customer name, address, payment method, products
ordered, and quantity ordered.
 The DBMS uses the host operating system to write this data to the physical disks in
the storage array.
 The storage network provides the communication link between the host and the
storage array and transports the requests to read or write data between them.
 The storage array, after receiving the read or write request from the host, performs
the necessary operations to store the data on physical disks.
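
The order flow above can be sketched with Python's built-in `sqlite3` module standing in for the DBMS and a local file standing in for the storage array. This is a single-machine simplification: there is no client, LAN/WAN, or storage network here, and the table layout, function names, and sample order are assumptions.

```python
import os
import sqlite3
import tempfile


def place_order(db_path, customer, product, quantity):
    """Application logic: validate the order, then hand it to the DBMS."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders ("
                "id INTEGER PRIMARY KEY, customer TEXT, product TEXT, quantity INTEGER)"
            )
            cursor = conn.execute(
                "INSERT INTO orders (customer, product, quantity) VALUES (?, ?, ?)",
                (customer, product, quantity),
            )
            return cursor.lastrowid
    finally:
        conn.close()


def read_order(db_path, order_id):
    """Read the order back from persistent storage through the DBMS."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT customer, product, quantity FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
    finally:
        conn.close()


# The database file plays the role of the physical disks in the storage array.
db_path = os.path.join(tempfile.mkdtemp(), "orders.db")
order_id = place_order(db_path, "Asha", "disk array", 2)
stored = read_order(db_path, order_id)
print(order_id, stored)
```

The application never touches the file directly: it talks to the DBMS, and the DBMS relies on the host operating system to write the data to storage, which mirrors the layering on the slide.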
24
Key Characteristics of a Data Center
 Availability: A data center should ensure the availability of information when required.
This is critical for businesses such as financial services, telecommunications, and e-commerce.
 Security: Data centers must establish policies, procedures, and core element integration to prevent
unauthorized access to information.
 Scalability: Business growth often requires deploying more servers, new applications,
and additional databases. Data center resources should scale based on requirements, without
interrupting business operations.
 Performance: All the elements of the data center should provide optimal performance based on
the required service levels.
 Data integrity: Mechanisms such as error correction codes or parity bits ensure that data is
stored and retrieved exactly as it was received.
 Capacity: Data center operations require adequate resources to store and process large amounts
of data efficiently. Based on the requirement, the data center must provide additional capacity
without interrupting availability or with minimal disruption. Capacity may be managed by
reallocating the existing resources or by adding new resources.
 Manageability: A data center should provide easy and integrated management of all its elements.
Manageability can be achieved through automation and reduction of human (manual) intervention in
common tasks.
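
The parity bit mentioned under data integrity can be sketched in a few lines. This is an even-parity illustration only: it detects an odd number of flipped bits and cannot correct them, unlike a full error correction code. The function names and the sample payload are assumptions.

```python
def parity_bit(data: bytes) -> int:
    """Return 0 or 1 so that the count of 1 bits in data plus parity is even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def is_intact(data: bytes, stored_parity: int) -> bool:
    """Recompute the parity on retrieval and compare it with the stored bit."""
    return parity_bit(data) == stored_parity


original = b"order#42"
stored_parity = parity_bit(original)

# Simulate a single-bit error in the first byte while the data is at rest.
corrupted = bytes([original[0] ^ 0b00000100]) + original[1:]

print(is_intact(original, stored_parity))
print(is_intact(corrupted, stored_parity))
```

Flipping one bit always changes the parity, so the check fails for the corrupted copy; flipping two bits would go unnoticed, which is why real storage systems use stronger codes.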
Managing a Data Center
 Monitoring - Continuous process of gathering information on various elements and
services running in a data center
 Reporting - Details on resource performance, capacity, and utilization
 Provisioning (proper supply) - Configuration and allocation of resources to meet the
capacity, availability, performance, and security requirements
[Figure: managing a data center comprises monitoring, reporting, and provisioning.]

Virtualization and cloud computing have changed the way data center infrastructure
resources are provisioned and managed.
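
One way to picture how monitoring, reporting, and provisioning feed each other is the sketch below. The utilization samples, the 80 percent threshold, and the function names are assumptions chosen for illustration, not values from any management tool.

```python
from statistics import mean


def report(samples):
    """Reporting: summarize monitored utilization samples (percent) per resource."""
    return {
        name: {"avg": mean(values), "peak": max(values)}
        for name, values in samples.items()
    }


def needs_provisioning(summary, peak_threshold=80):
    """Provisioning hint: resources whose peak utilization reaches the threshold."""
    return sorted(
        name for name, stats in summary.items() if stats["peak"] >= peak_threshold
    )


# Monitoring: hypothetical utilization readings gathered over time.
samples = {
    "storage-array-1": [55, 62, 91],
    "host-1": [20, 35, 30],
}
summary = report(samples)
flagged = needs_provisioning(summary)
print(flagged)
```

Only `storage-array-1` is flagged, because its peak reading crosses the assumed threshold; an administrator or an automated workflow could then allocate more capacity to it.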
Unit - 1
S2 – SLO2 - Virtualization and Cloud Computing

27
Virtualization
 Virtualization is a technique of abstracting physical resources and
making them appear as logical resources
 For example, partitioning of raw disks
 Pools physical resources and provides an aggregated view of physical
resource capabilities
 Virtual resources can be created from pooled physical resources
 Improves utilization of physical IT resources
 Analogy: converting a hard copy to a soft copy
 (a physical map to a scanned map, a book to an e-book)

28
Virtualization
 Virtualization is the creation of a virtual (rather than actual) version of something,
such as a server, a desktop, a storage device, an operating system, or
network resources.

 It plays a very important role in cloud computing technology.

 In a cloud system, users share resources in the cloud, such as applications;
this sharing is made possible by virtualization.

 A main use of virtualization technology is to provide cloud users with
standard versions of applications.

 The servers and software applications required by cloud providers are
maintained by third parties, and this service is chargeable.

29
Normal vs Virtual

Hardware Virtualization
 Hardware virtualization is when the virtual machine software, or virtual machine
manager (VMM), is installed directly on the hardware system.
 The main job of the hypervisor is to control and monitor the processor, memory,
and other hardware resources.
 After the hardware system is virtualized, different operating systems can be installed
on it, and different applications can run on those operating systems.

Usage:
 Hardware virtualization is mainly done for server platforms, because controlling
virtual machines is much easier than controlling a physical server.

Types:
 Full virtualization – A complete simulation of the actual hardware takes place to
allow the software to run an unmodified guest OS.
 Paravirtualization – The software runs unmodified in a modified OS as a separate
system.
 Partial virtualization – The software may need modification to run.

34
Network Virtualization
 It refers to the management and monitoring of a computer network as
a single entity from a single software-based administrator’s
console.
 It is intended to allow network optimization (effective use) of data
transfer rates, scalability, reliability, flexibility, and security.
 It also automates many network administrative tasks.

Usage:
 Network virtualization is especially useful for networks that
experience a huge, rapid, and unpredictable traffic increase.
 The intended result of network virtualization is improved
network productivity and efficiency.

Two categories:
 Internal: Provide network-like functionality to a single system.
 External: Combine many networks or parts of networks into a
virtual unit.
35
Storage Virtualization
 Multiple network storage resources are presented as a single
storage device for easier and more efficient management of these
resources. It provides various advantages as follows:
 Improved storage management in a heterogeneous IT
environment
 Easy updates, better availability
 Reduced downtime (unavailability)
 Better storage utilization
 Automated management

Types:
 Block - It works before the file system exists. It replaces controllers
and takes over at the disk level.
 File - The server that uses the storage must have software installed
on it in order to enable file-level usage.
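
Block-level storage virtualization can be pictured as an address translation: the host sees one range of logical blocks, and a mapping layer decides which physical device holds each block. The sketch below assumes simple concatenation of devices; the class name, device names, and sizes are hypothetical, and real products use more elaborate mappings.

```python
class VirtualVolume:
    """Presents several physical devices as one contiguous range of logical blocks."""

    def __init__(self, devices):
        # devices: list of (device_name, number_of_blocks) in concatenation order
        self.devices = list(devices)
        self.total_blocks = sum(blocks for _, blocks in self.devices)

    def locate(self, logical_block):
        """Translate a logical block number into (device_name, physical_block)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block out of range")
        offset = logical_block
        for name, blocks in self.devices:
            if offset < blocks:
                return name, offset
            offset -= blocks  # skip past this device and keep looking


volume = VirtualVolume([("disk-a", 100), ("disk-b", 50)])
print(volume.total_blocks)
print(volume.locate(0))
print(volume.locate(120))
```

The host addresses blocks 0 to 149 of a single device and never learns that block 120 actually lives on the second disk, which is what makes management of heterogeneous storage easier.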

36
Memory Virtualization
 It introduces a way to decouple memory from the server to provide a shared,
distributed, or networked function.
 It enhances performance by providing greater memory capacity without any
addition to the main memory. To do so, a portion of the disk drive serves as an
extension of the main memory.

Application-level integration – Applications running on connected computers
connect directly to the memory pool through an API or the file system.

Operating system-level integration – The operating system first connects to the
memory pool and makes that pooled memory available to applications.

37
Software Virtualization

 It enables the main computer to create and run
one or more virtual environments.
 It is used to emulate a complete computer system so that a
guest OS can run.

For instance, it lets Linux run as a guest on a computer that natively runs
Microsoft Windows (or vice versa, running Windows as a guest
on Linux).

Types:
 Operating system virtualization
 Application virtualization
 Service virtualization

38
Data Virtualization
 It lets you manipulate data easily, in an organized way, without needing
technical details such as how the data is formatted or where it is
physically located.
 It decreases data errors and workload.

39
Desktop virtualization
 It provides work convenience and security.
 Because the desktop can be accessed remotely, you can work from any location and on any
PC.
 It provides a lot of flexibility for employees to work from home or on the go.
 It also protects confidential data from being lost or stolen by keeping it safe on
central servers.
 It is also essential for restricted data.

40
Cloud Computing

 Enables individuals and organizations to use IT resources as a
service over a network
 Enables self-service requesting and automates the request-fulfillment
process
 Enables users to scale up or scale down the usage of computing
resources quickly
 Enables consumption-based metering
 Consumers pay only for the resources they use
o Example: CPU hours used, amount of data transferred, and
gigabytes of data stored
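
Consumption-based metering reduces to multiplying each metered quantity by its rate and summing the results. The rates and usage figures below are hypothetical and do not reflect any provider's pricing.

```python
# Hypothetical unit prices in an arbitrary currency.
RATES = {
    "cpu_hours": 0.05,       # per CPU hour used
    "gb_transferred": 0.01,  # per gigabyte of data transferred
    "gb_stored": 0.02,       # per gigabyte of data stored
}


def bill(usage, rates=RATES):
    """Charge only for what was consumed: sum of rate times metered amount."""
    return round(sum(rates[item] * amount for item, amount in usage.items()), 2)


usage = {"cpu_hours": 100, "gb_transferred": 250, "gb_stored": 500}
total = bill(usage)
print(total)
```

A consumer who uses nothing in a period is charged nothing, which is the difference from paying up front for owned infrastructure.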

41
Cloud Computing

Cloud computing allows customers to use resources (e.g.,
networks, servers, storage, applications, and services) that are
hosted by service providers.

Typically, this is done on a pay-per-use or charge-per-use basis.

Characteristics of Cloud Computing

 On-demand self-service
 Broad network access
 Resource pooling
 Rapid elasticity
 Measured service

42
Characteristics of Cloud Computing
On-demand Self-service

Enables consumers to unilaterally provision computing capabilities
(examples: server time and storage capacity) as needed,
automatically
Consumers view the service catalogue via a Web-based user interface
and use it to request a service
Broad Network Access
 Computing capabilities are available over the network
 Computing capabilities are accessed from a broad range of client
platforms such as:
 Desktop computer
 Laptop
 Tablet
 Mobile device
43
Characteristics of Cloud Computing
Resource Pooling
 Provider’s computing resources are pooled to serve multiple
consumers using a multitenant model
 Resources are assigned from the pool according to consumer
demand
 Consumers have no control or knowledge over the exact location
of the provided resources
Rapid Elasticity (flexibility)
 Computing capabilities can be elastically provisioned and
released
 Computing capabilities are scaled rapidly with consumer’s
demand (Provides a sense of unlimited scalability)
Measured Service
 Cloud computing provides a metering system that continuously
monitors resource consumption and generates reports
 Helps to control and optimize resource use
 Helps to generate billing and chargeback reports
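
Rapid elasticity can be sketched as a rule that recomputes how many instances are needed whenever demand changes. The per-instance capacity, the minimum and maximum limits, and the function name are assumptions for illustration; real platforms apply their own scaling policies.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance, minimum=1, maximum=20):
    """Return the instance count that covers current demand, within set limits."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(minimum, min(maximum, needed))


# Demand rises and falls; capacity follows it in both directions.
for load in (40, 950, 5000, 120):
    print(load, desired_instances(load, capacity_per_instance=100))
```

The upper limit is a reminder that scalability only appears unlimited to the consumer: the provider still provisions from a finite pool.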
Cloud Enabling Technologies

Technology and description:

Grid computing
 Form of distributed computing
 Enables resources of numerous computers in a network to work on a single task at the same
time

Utility computing
 Service provisioning model that offers computing resources as a metered service

Virtualization
 Abstracts physical characteristics of IT resources from resource users
 Enables resource pooling and creating virtual resources from pooled resources

Service-oriented architecture (SOA)
 Provides a set of services that can communicate with each other
Benefits of Cloud Computing

Benefit and description:

Reduced IT cost
 Reduces the up-front capital expenditure (CAPEX)

Business agility (quick flow of work)
 Provides the ability to deploy new resources quickly
 Enables businesses to reduce time-to-market

Flexible scaling
 Enables consumers to scale up, scale down, scale out, or scale in the demand for
computing resources easily
 Consumers can unilaterally and automatically scale computing resources

High availability
 Ensures resource availability at varying levels, depending on consumer’s policy and
priority
Cloud Service Models
 Infrastructure-as-a-Service (IaaS)
 Platform-as-a-Service (PaaS)
 Software-as-a-Service (SaaS)
Infrastructure-as-a-Service
 Consumers deploy their software, including the OS and applications, on the
provider’s infrastructure
 Computing resources such as processing power, memory,
storage, and networking components are offered as a service
 Example: Amazon Elastic Compute Cloud
 Consumers have control over the OSs and deployed applications

[Figure: IaaS resource split.]
Consumer’s resources: Application, Database, OS
Provider’s resources (cloud): Compute, Storage, Network
Platform-as-a-Service

 Consumers deploy consumer-created or acquired applications onto the
provider’s computing platform
 Computing platform is offered as a service
 Example: Google App Engine and Microsoft Windows Azure
Platform
 Consumer has control over deployed applications

[Figure: PaaS resource split.]
Consumer’s resources: Application
Provider’s resources (cloud): Database, OS, Compute, Storage, Network
Software-as-a-Service

 Consumers use the provider’s applications running on the cloud
infrastructure
 Applications are offered as a service
 Examples: EMC Mozy and Salesforce.com
 Service providers exclusively manage computing infrastructure and
software to support services

[Figure: SaaS resource split.]
Provider’s resources (cloud): Application, Database, OS, Compute, Storage, Network
Infrastructure as a Service (IaaS)
 A service model that involves outsourcing the basic infrastructure used to support
operations, including storage, hardware, servers, and networking components.
 The service provider owns the infrastructure equipment and is responsible for housing,
running, and maintaining it. The customer typically pays on a per-use basis.
 Examples: DigitalOcean, Linode, Rackspace, Amazon Web Services (AWS), Cisco Metapod,
Microsoft Azure, Google Compute Engine (GCE)

Platform as a Service (PaaS)
 A service model that involves outsourcing the basic infrastructure and platform
(Windows, Unix).
 PaaS facilitates deploying applications without the cost and complexity of buying and
managing the underlying hardware and software where the applications are hosted.
 Examples: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine,
Apache Stratos, OpenShift

Software as a Service (SaaS)
 Also referred to as “software on demand,” a service model that involves outsourcing the
infrastructure, platform, and software/applications.
 Services are available to the customer for a fee, pay-as-you-go, or a no charge model.
 Examples: Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting
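
The split of responsibility across the three service models can be written as a small lookup. The layer names follow the resource diagrams in these slides; the function and variable names are assumptions for illustration.

```python
# Layers from top to bottom, as drawn in the service model diagrams.
LAYERS = ["Application", "Database", "OS", "Compute", "Storage", "Network"]

# How many layers, counted from the top, belong to the consumer in each model.
CONSUMER_LAYER_COUNT = {"IaaS": 3, "PaaS": 1, "SaaS": 0}


def responsibilities(model):
    """Return which layers the consumer and the provider are responsible for."""
    split = CONSUMER_LAYER_COUNT[model]
    return {"consumer": LAYERS[:split], "provider": LAYERS[split:]}


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, responsibilities(model))
```

Moving from IaaS to SaaS shifts the dividing line upward: the consumer controls less, and the provider manages more.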
Cloud Deployment Models
 Public
 Private
 Hybrid
 Community
 In the public cloud model, the cloud infrastructure is provisioned for open use by the general
public.
 It may be owned, managed, and operated by a business, academic, or government
organization, or some combination of them.
 It exists on the premises of the cloud provider (jointly).
 Consumers use the cloud services offered by the providers via the Internet and pay
metered usage charges or subscription fees.
 An advantage of the public cloud is its low capital cost with enormous scalability.
 For consumers, these benefits come with certain risks:
 no control over the resources
 concerns about the security of confidential data, network performance, and interoperability.
Example: Popular public cloud service providers are Amazon, Google, and
Salesforce.com.

53
 In the private cloud model, the cloud infrastructure is provisioned for exclusive use by a
single organization comprising multiple consumers (for example, business units).
 It may be owned, managed, and operated by the organization, a third party, or some
combination of them, and it may exist on or off premises.

On-premise private cloud:


 The on-premise private cloud, also
known as internal cloud, is hosted by an
organization within its own data centers.
 This model enables organizations to
standardize their cloud service
management processes and security,
although this model has limitations in
terms of size and resource scalability.
 Organizations would also need to sustain
the capital and operational costs for the
physical resources.
 This is suitable for organizations that
require complete control over their
applications, infrastructure
configurations, and security
mechanisms.
Externally hosted private cloud:

 This type of private cloud is hosted external to an organization and is
managed by a third-party organization.
 The third-party organization facilitates an exclusive cloud environment for
a specific organization with full guarantee of privacy and
confidentiality.

55
 In the community cloud model, the cloud infrastructure is provisioned for
exclusive use by a specific community of consumers from
organizations that have shared concerns (for example, mission, security
requirements, policy, and compliance considerations).
 It may be owned, managed, and operated by one or more of the
organizations in the community, a third party, or some combination
of them, and it may exist on or off premises.

56
 In a community cloud, the costs are spread over fewer consumers
than in a public cloud.
 Hence, this option is more expensive but might offer a higher level of
privacy, security, and compliance.
 The community cloud also offers organizations access to a vaster
pool of resources than the private cloud.
 An example in which a community cloud could be useful is
government agencies.
 If various agencies within the government operate under similar
guidelines, they could all share the same infrastructure and lower
their individual agency’s investment.

57
 In the hybrid cloud model, the cloud infrastructure is a composition of two
or more distinct cloud infrastructures (private, community, or
public).
 These remain unique entities but are bound together by standardized or
proprietary technology that enables data and application
portability (for example, load balancing between clouds).
 The hybrid model allows an organization to deploy less critical
applications and data to the public cloud, leveraging the
scalability and cost-effectiveness of the public cloud.
 The organization’s mission-critical applications and data remain on
the private cloud that provides greater security.
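
The placement rule described for the hybrid model can be sketched as a simple policy. The workload names and the single `mission_critical` flag are assumptions; a real placement decision would weigh more factors, such as regulation and latency.

```python
def place_workload(workload):
    """Keep mission-critical workloads private; send the rest to the public cloud."""
    return "private" if workload["mission_critical"] else "public"


# Hypothetical workloads
workloads = [
    {"name": "payroll-db", "mission_critical": True},
    {"name": "marketing-site", "mission_critical": False},
    {"name": "test-environment", "mission_critical": False},
]

placement = {w["name"]: place_workload(w) for w in workloads}
print(placement)
```

Less critical workloads benefit from the scalability and cost-effectiveness of the public cloud, while the critical one stays where the organization has greater control over security.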

58
Cloud Challenges – Consumer’s Perspective

 Security and regulation
 Consumers are hesitant to transfer control of sensitive data
 Regulations may prevent organizations from using cloud services
 Network latency
 Real-time applications may suffer due to network latency and
limited bandwidth
 Supportability
 Service provider might not support open access environments
 Incompatible hypervisors could impact VM migration
 Vendor lock-in
 Restricts consumers from changing their cloud service providers
 Lack of standardization across cloud-based platforms

59
Cloud Challenges – Provider’s Perspective

 Service warranty and service cost


 Resources must be kept ready to meet unpredictable demand
 Hefty penalty, if SLAs are not fulfilled
 Complexity in deploying vendor software in the cloud
 Many vendors do not provide cloud-ready software licenses
 Higher cost of cloud-ready software licenses
 No standard cloud access interface
 Cloud consumers want open APIs
 Need agreement among cloud providers for standardization

60
Unit - 1
S3 – SLO1 – Key challenges in managing information

61
Key Challenges in Managing Information
 Exploding digital universe
 Multifold increase of information growth
 Increasing dependency on information
 The strategic use of information plays an important role in business success
 Changing value of information
 Information that is valuable today may become less important
tomorrow

Constraints:
 Cost
 Physical environment
 Maintenance and support
 Compliance – regulatory and legal
 Hardware and software infrastructure
 Interoperability and compatibility
62
Challenges during data collection in physical mode
[Figure: field photos. Labels: landslide in Himalayan terrain; dense settlement on a steep slope; rock collection on a steep slope; steep, rugged topography; data collection constrained by less storage capacity and less power backup.]
Natural constraints during physical-mode data collection
[Figure: field photos. Labels: cloud/rain in Himalayan terrain; rock slide in Himalayan terrain; steep, high-elevation peaks; ice accumulation at higher altitude.]

Unit - 1
S3 – SLO2 – Data Center Environment: Application

65
Data Center Environment
[Figure: the data center environment comprises Application, DBMS, Host or Compute, Connectivity, and Storage.]

Data Center Environment - Application

 An application is a computer program that provides the logic for
computing operations
 The application sends requests to the underlying operating system
to perform read/write (R/W) operations on the storage devices
 Examples
 e-mail
 enterprise resource planning (ERP)
 decision support system (DSS)
 resource management
 backup
 authentication and antivirus applications
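
How an application asks the operating system to perform read/write operations can be shown with Python's `os` module, whose `os.open`, `os.write`, and `os.read` functions are thin wrappers over the underlying system calls. The file name and message are hypothetical, and a temporary directory stands in for the storage device.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "message.txt")

# Write: the application asks the OS to create the file and store the bytes.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
    written = os.write(fd, b"hello storage")
finally:
    os.close(fd)

# Read: the application asks the OS to fetch the bytes back from storage.
fd = os.open(path, os.O_RDONLY)
try:
    data = os.read(fd, 1024)
finally:
    os.close(fd)

print(written, data)
```

The application only names the file and the bytes; where those bytes land on the physical device is decided by the operating system and the storage beneath it.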
