OpenStack Swift is a distributed object storage service designed for scalable and redundant storage of unstructured data, such as documents and media files. It utilizes a unique architecture that ensures data integrity and replication across multiple nodes, allowing for cost-effective storage solutions. Swift is accessible via a RESTful API and supports extensive metadata, making it suitable for applications requiring large-scale data management.
Swift
Topics
Introducing Object Storage
Features and Benefits
Object Storage Characteristics
Swift Components
Swift Architecture
Cluster Architecture
Ring Builder
Swift Replications
Cinder Snapshots and Backups

What is Swift?
Swift is a highly available, distributed, eventually consistent object storage service.
OpenStack Swift is used to back up and archive unstructured data, such as documents, images, audio and video files, emails, and virtual machine images.
It is used for redundant, scalable data storage using clusters of standardized servers.
It is a long-term storage system for large amounts of static data which can be retrieved and updated easily.

Deep Dive into Swift
Swift uses a distributed architecture with no central point of control, providing greater scalability, redundancy, and permanence.
Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster.
If a node fails, OpenStack works to replicate its content from other active nodes.
Object Storage is ideal for cost-effective, scale-out storage.
It provides a distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving, and data retention.

Features and Benefits

Swift Interaction with REST API

Swift Components

Object Storage Building Blocks

Proxy Servers
Proxy servers handle all of the incoming API requests.
Once a proxy server receives a request, it determines the storage node based on the object's URL.
Proxy servers also coordinate responses, handle failures, and coordinate timestamps.
They use a shared-nothing architecture and can be scaled as needed based on projected workloads.
A minimum of two proxy servers should be deployed behind a separately managed load balancer.
If one proxy server fails, the others take over.

Rings
A ring represents a mapping between the names of entities stored in the cluster and their physical locations on disks.
There are separate rings for accounts, containers, and objects.
When components of the system need to perform an operation on an object, container, or account, they interact with the corresponding ring to determine the appropriate location in the cluster.
The ring maintains this mapping using zones, devices, partitions, and replicas.

Zones
Zones are configured to isolate failure boundaries.
Each data replica resides in a separate zone.
A zone could be a single drive or a grouping of a few drives.
The goal of zones is to allow the cluster to tolerate significant outages of storage servers without losing all replicas of the data.

Accounts and Containers
Each account and container is an individual SQLite database.
These databases are distributed across the cluster.
An account database contains the list of containers in that account.
A container database contains the list of objects in that container.
Each account in the system has a database that references all of its containers, and each container database references each object in order to keep track of object data locations.

Partitions
A partition is a collection of stored data.
This includes account databases, container databases, and objects.
Partitions are core to the replication system.
System replicators and object uploads or downloads operate on partitions.
A partition is just a directory sitting on a disk with a corresponding hash table of what it contains.

Replicators
Replicators continuously examine each partition.
For each local partition, the replicator compares it against the replicated copies in the other zones to see if there are any differences.
Replication takes place by examining hashes (a hash file is created for each partition).
If the hashes are different, then it is time to replicate, and the directory that needs to be replicated is copied.
If a zone goes down, one of the nodes containing a replica notices and proactively copies data to a handoff location.

Swift in Use
The following shows the use case for object uploads and downloads and introduces the components.

Upload
The client uses the REST API to make an HTTP request to PUT an object into an existing container.
The cluster receives the request.
First, the system must know where the data is going to go (the account name, container name, and object name are all used to determine the partition where this object belongs).
A lookup in the ring figures out which storage nodes contain the partition.
The data is then sent to each storage node where it is placed.
At least two of the three writes must be successful before the client is notified that the upload was successful.
The container database is updated asynchronously to reflect that there is a new object in it.

Downloads
A request comes in for an account/container/object.
Using the same consistent hashing, the partition index is determined.
A lookup in the ring reveals which storage nodes contain that partition.
A request is made to one of the storage nodes to fetch the object; if that fails, requests are made to the other nodes.

Swift Architecture - High Level
Architecture - Description
Proxy Server - responsible for tying together the rest of the Swift architecture; responsible for encoding and decoding object data; handles failures
Storage Policies - provide a way for object storage providers to differentiate service levels, features, and behaviours of a Swift deployment
Account Server - responsible for listings of containers rather than objects
Container Server - handles listings of objects, which are stored as SQLite database files
Object Server - stores, retrieves, and deletes objects stored on local devices
Auditors - crawl the local server, checking the integrity of the objects, containers, and accounts
Updaters - update container or account data
Replication - responsible for keeping the system in a consistent state in temporary error conditions like network outages or drive failures

SWIFT
The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. It's built for scale and optimized for durability, availability, and concurrency across the entire data set. Swift is ideal for storing unstructured data that can grow without bound. OpenStack Object Storage (swift) is used for redundant, scalable data storage using clusters of standardized servers to store petabytes of accessible data. It is a long-term storage system for large amounts of static data which can be retrieved and updated.
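
The upload and download flows described above both begin by hashing the account, container, and object names to a partition and then asking the ring which storage nodes hold that partition. The Python sketch below illustrates that idea under simplifying assumptions. The names `PART_POWER`, `DEVICES`, `partition_for`, `nodes_for`, and `quorum_met` are hypothetical, the device list is invented, and the modulo-based placement is a stand-in for a real ring, which is produced by the ring builder and also accounts for zones and device weights. A real deployment also mixes a cluster-specific secret into the hashed path, which is omitted here.

```python
import hashlib
import struct

PART_POWER = 4   # hypothetical: 2**4 = 16 partitions
REPLICAS = 3     # three copies, matching the "two of the three writes" rule
# Hypothetical devices, one per zone, so each replica lands in a different zone.
DEVICES = ["zone1-dev1", "zone2-dev1", "zone3-dev1", "zone4-dev1"]


def partition_for(account, container, obj):
    """Hash the object path and keep the top PART_POWER bits as the partition."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path).digest()
    top32 = struct.unpack(">I", digest[:4])[0]
    return top32 >> (32 - PART_POWER)


def nodes_for(partition):
    """Toy ring lookup: choose REPLICAS distinct devices for a partition."""
    return [DEVICES[(partition + i) % len(DEVICES)] for i in range(REPLICAS)]


def quorum_met(successes, replicas=REPLICAS):
    """Assume an upload succeeds once a majority of replicas acknowledge it."""
    return successes >= replicas // 2 + 1


part = partition_for("AUTH_demo", "photos", "cat.jpg")
targets = nodes_for(part)
print("partition:", part)
print("storage nodes:", targets)
print("2 of 3 writes ok:", quorum_met(2))
print("1 of 3 writes ok:", quorum_met(1))
```

Because the partition depends only on the object path, an upload and a later download of the same object resolve to the same set of storage nodes without consulting any central index.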
Object Storage uses a distributed architecture with no central point of control, providing greater scalability, redundancy, and permanence. Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster. Storage clusters scale horizontally by adding new nodes. Should a node fail, OpenStack works to replicate its content from other active nodes. Because OpenStack uses software logic to ensure data replication and distribution across different devices, inexpensive commodity hard drives and servers can be used in lieu of more expensive equipment. Object Storage is ideal for cost-effective, scale-out storage. It provides a fully distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving, and data retention.

Swift Characteristics
Swift is an object storage system that is part of the OpenStack project.
Swift is open-source and freely available.
Swift currently powers the largest object storage clouds, including Rackspace Cloud Files, the HP Cloud, IBM SoftLayer Cloud, and countless private object storage clusters.
Swift can be used as a stand-alone storage system or as part of a cloud compute environment.
Swift runs on standard Linux distributions and on standard x86 server hardware.
Swift, like Amazon S3, has an eventual consistency architecture, which makes it ideal for building massive, highly distributed infrastructures with lots of unstructured data serving global sites.
All objects (data) stored in Swift have a URL.
Applications store and retrieve data in Swift via an industry-standard RESTful HTTP API.
Objects can have extensive metadata, which can be indexed and searched.
All objects are stored with multiple copies and are replicated in as-unique-as-possible availability zones and/or regions.
Swift is scaled by adding additional nodes, which allows for a cost-effective linear storage expansion.
When adding or replacing hardware, data does not have to be migrated to a new storage system, i.e. there are no fork-lift upgrades.
Failed nodes and drives can be swapped out while the cluster is running with no downtime.
New nodes and drives can be adde