
Ceph at CSC

This document describes how CSC implements Ceph storage in its cloud infrastructure. Key points: CSC uses Ceph for unified block, object, and file storage in its OpenStack-based cloud offering; the Ceph infrastructure includes production, test, and development clusters ranging from 180 TB to 960 TB of raw capacity; the clusters are deployed and managed with Ansible, with Grafana dashboards for metrics and the ELK stack for log management; future plans include expanding capacity to 3 PB and introducing a storage POD layout, SSD journals, and erasure coding.


How Do We Do

Ceph @ CSC
#whoami
Karan Singh

System Specialist, Cloud Storage

CSC-IT Center for Science


FINLAND

[email protected]

•  Author of Learning Ceph – Packt Publishing, 2015

•  Author of Ceph Cookbook – Packt Publishing, 2016

•  Technical reviewer of Mastering Ceph – Packt Publishing, 2016

•  www.ksingh.co.in – tune in for my blog


CSC – IT Center for Science
•  Founded in 1971

•  Finnish non-profit organization, funded by the Ministry of Education

•  Connected Finland to the Internet in 1988

•  Most powerful academic computing facility in the Nordics

•  ISO 27001:2013 certified

•  Public cloud offering: Pouta Cloud Services

More information
o  https://www.csc.fi/
o  https://research.csc.fi/cloud-computing
CSC Cloud Offering
•  Pouta Cloud Service [IaaS]
o  cPouta – public cloud, general purpose
o  ePouta – public cloud, purpose-built for sensitive data

•  Built using OpenStack

•  Uses upstream OpenStack packages, no distribution

•  Storage: both Ceph and non-Ceph
Our Need for Ceph
•  To build our own storage – not to buy a black box

•  Software-defined, using commodity hardware

•  Unified – block, object, (file)

•  Tightly integrates with OpenStack

•  Open source, no vendor lock-in

•  Scalable and highly available
Our Need for Ceph
•  Remove the SPOF for storage in OpenStack

•  OpenStack alone is complex enough – let's make it a bit less so
o  by using Ceph for all storage needs

•  To stay up to date with the community
o  Ceph is the most widely used storage backend for OpenStack

•  Need for object storage
Storage Complexity
[Diagram: storage without Ceph – Nova instances use local disk on the compute nodes and LUNs from an enterprise array exposed through Gateway-1/Gateway-2, while Cinder and Glance storage reaches the OpenStack controller over NFS.]
This is why we chose Ceph

•  One storage to rule them all

•  Goes hand in hand with OpenStack

•  Supports instance live migration and copy-on-write (CoW) clones

•  Bonus for using Ceph
o  OpenStack Manila (shared filesystem)
o  On the way

http://www.slideshare.net/ircolle/what-is-a-ceph-and-why-do-i-care-openstack-storage-colorado-openstack-meetup-october-14-2014
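For context, wiring OpenStack Cinder to a Ceph RBD pool takes only a few lines of configuration. The snippet below is an illustrative sketch, not CSC's configuration; the backend name, pool, Ceph user, and secret UUID are placeholders:

    # cinder.conf – illustrative RBD backend section (names are placeholders)
    [rbd-ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret UUID>
    rbd_flatten_volume_from_snapshot = false
    rbd_max_clone_depth = 5

With Glance images also stored in RBD (and show_image_direct_url = True on the Glance side), new volumes can be copy-on-write clones of the image, and if Nova ephemeral disks are placed on RBD as well, live migration becomes straightforward because nothing lives on local disk.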
Ceph Infrastructure
ePouta Cloud Service

Production Cluster
•  10 x HP DL380
o  E5-2450, 8c, 2.10 GHz
o  24 GB memory
o  12 x 3 TB SATA
o  2 x 40 GbE
•  Ceph Firefly 0.80.8
•  CentOS 6.6, kernel 3.10.69
•  360 TB raw

Test Cluster
•  5 x HP DL380
o  E5-2450, 8c, 2.10 GHz
o  24 GB memory
o  12 x 3 TB SATA
o  2 x 40 GbE
•  Ceph Hammer 0.94.3
•  CentOS 6.6, kernel 3.10.69
•  180 TB raw

Development Cluster
•  4 x HP SL4540
o  2 x E5-2470, 8c, 2.30 GHz
o  192 GB memory
o  60 x 4 TB SATA
o  2 x 10 GbE
•  Ceph Hammer 0.94.3
•  CentOS 6.6, kernel 3.10.69
•  960 TB raw
Ceph Infrastructure (cont.)
cPouta Cloud Service

Pre-Production Cluster
•  4 x HP SL4540
o  2 x E5-2470, 8c, 2.30 GHz
o  192 GB memory
o  60 x 4 TB SATA
o  2 x 10 GbE
•  Object storage service
•  Ceph Firefly 0.80.10
•  CentOS 6.5, kernel 2.6.32
•  240 OSDs / 870 TB available

Proof of Concept – Fujitsu Eternus CD10000
•  4 x Primergy RX300 S8
o  2 x E5-2640, 8c, 2.00 GHz
o  128 GB memory
o  1 x 10 GbE / 1 x 40 GbE
o  15 x 900 GB SAS 2.5" 10K
o  1 x 800 GB Fusion ioDrive2 PCIe SSD
•  4 x Eternus JX40 JBOD
o  24 x 900 GB SAS 2.5" 10K
•  Ceph Firefly 0.80.7
•  CentOS 6.6, kernel 3.10.42
•  156 OSDs / 126 TB available
Our toolkit for Ceph
•  OS deployment, package mgmt.
o  Spacewalk

•  Ansible (see the sketch below)
o  End-to-end system configuration
o  Network, kernel, packages, OS tuning, NTP, etc.
o  Metric collection, monitoring, central logging
o  Entire Ceph deployment
o  System / Ceph administration

•  Performance metrics & dashboards
o  collectd, Graphite, Grafana

•  Monitoring and log management
o  OpsView, ELK stack

•  Version control
o  Git, GitHub
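As a flavour of what the Ansible side looks like, here is a minimal illustrative sketch – the modules (yum, template, service) are standard Ansible, but the host group, template path, and task selection are placeholders rather than CSC's actual playbooks:

    # illustrative playbook sketch – not CSC's actual automation
    - hosts: ceph_osd_nodes
      become: true
      tasks:
        - name: Install Ceph packages from the configured repository
          yum:
            name: ceph
            state: present

        - name: Distribute the cluster configuration (template path is a placeholder)
          template:
            src: templates/ceph.conf.j2
            dest: /etc/ceph/ceph.conf

        - name: Keep clocks in sync – Ceph monitors are sensitive to clock skew
          service:
            name: ntpd
            state: started
            enabled: true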
Live Demo

Near Future
•  CSC Espoo DC [ePouta Cloud Storage]
o  Next 8–12 months → 3 PB raw
o  Introduce a storage POD layout for scalability and better failure domains
o  Dedicated monitor nodes
o  SSD journals
o  Erasure coding

•  CSC Kajaani DC [cPouta Cloud Storage]
o  Early next year → add ~850 TB of new capacity (total ~1.8 PB raw)
o  Enable full OpenStack support (Nova, Glance, Cinder, Swift)
o  Erasure coding

•  Miscellaneous
o  Multi-DC replication [Espoo – Kajaani]
Long Term
Build a Ceph environment that is:

•  Multi-petabyte (~10 PB usable)
•  Hyper-scalable
•  Multi-rack fault tolerant

Storage PODs

•  Currently a design on paper
•  Still thinking about the best way to do it
•  Interested to know what others are doing
Disks, Nodes, Racks

[Diagram: disks are grouped into storage nodes, and storage nodes are grouped into racks.]


More Racks … Hyper Scale

[Diagram: a single Ceph cluster spanning a dozen racks – how to manage it effectively?]


Storage POD

•  A storage POD is a group of racks

•  Eases management in a hyper-scale environment
•  Scalable, modular design
•  Can sustain multi-rack failure
•  Requires CRUSH failure domain changes (see the sketch after this list)
•  Primary copy → one POD
•  Secondary & tertiary copies → the other two PODs
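A rough sketch of what such a CRUSH change could look like. This is illustrative only, not CSC's actual map: "pod" is already a bucket type in the default CRUSH hierarchy, the bucket and rule names and weights below are placeholders, and the rule picks one POD per replica and then one host inside it (the root "default" would contain pod-1, pod-2, pod-3):

    # decompiled CRUSH map excerpt (illustrative)
    pod pod-1 {
        id -10                              # placeholder bucket id
        alg straw
        hash 0                              # rjenkins1
        item rack-1 weight 432.000
        item rack-2 weight 432.000
    }

    rule replicated_across_pods {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 0 type pod       # one POD per replica
        step chooseleaf firstn 1 type host  # one host (OSD) within each POD
        step emit
    }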

Storage POD in action
[Diagram: the cluster's racks grouped into POD-1, POD-2, and POD-3.]


Scaling up Multi Rack
[Diagram: more racks added within each of POD-1, POD-2, and POD-3.]


Scaling up…even more racks
[Diagram: each POD grown to many racks (POD-1, POD-2, POD-3).]


Scaling up…several PODs
[Diagram: the cluster scaled out to several PODs.]
Some Recommendations
•  Monitor nodes
o  Use dedicated monitor nodes; avoid sharing them with OSDs
o  Use SSDs for the Ceph monitor LevelDB

•  OSD nodes
o  Avoid overloading your SSD journals – you might not get the performance you expect
o  Node preference:
o  #1 Thin node (10–16 disks)
o  #2 Thick node (16–30 disks)
o  #3 Fat node (> 30 disks)
o  If using fat nodes, use several of them
Operational Experience
•  Use dedicated disks for the OS, OSD data, and the OSD journal (the journal can be shared)

•  Plan your requirements well and choose the PG count wisely for a production cluster (see the example below)
o  Increasing the PG count is one of the most intensive operations
o  Decreasing the PG count is not allowed
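For illustration, a commonly used rule of thumb (not from this deck) is roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two; the pool name and numbers below are placeholders:

    # ~240 OSDs, 3 replicas: (240 * 100) / 3 = 8000 → round up to 8192
    ceph osd pool set volumes pg_num 8192
    ceph osd pool set volumes pgp_num 8192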

•  Ceph version upgrades / rolling upgrades work like a charm

•  For thick and fat OSD nodes, tune the kernel (applied as sketched below)
o  kernel.pid_max=4194303
o  kernel.threads-max=200000
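One way to apply these – a minimal sketch; persisting them in /etc/sysctl.conf (or via configuration management) keeps them across reboots:

    sysctl -w kernel.pid_max=4194303
    sysctl -w kernel.threads-max=200000
    # persist: add the same lines to /etc/sysctl.conf, then reload
    sysctl -p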

Operational Experience
•  If you are seeing blocked ops / slow OSD requests, don't worry – you are not alone
o  ceph health detail → find the OSD → find the node → check EVERYTHING on that node → mark the OSD out (commands sketched below)
o  If the problem is on most of the nodes → check the NETWORK
o  Interface errors, MTU, configuration, network blocking, architecture, switch logs, removing an interface, bonding
o  Even a cable change worked for us (after a switch firmware upgrade the cable type was no longer supported)
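A minimal sketch of that workflow with stock Ceph commands (the OSD id is a placeholder):

    ceph health detail          # lists the OSDs with blocked / slow requests
    ceph osd find 42            # maps OSD 42 to its host and CRUSH location
    # inspect the disk, controller, memory, and network on that host, then:
    ceph osd out 42             # mark it out so data rebalances away from it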

•  Tune CRUSH to the optimal tunables profile
o  # ceph osd crush tunables optimal
o  Caution: this will trigger a lot of data movement
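It can be worth checking the profile currently in effect before switching (standard command):

    # ceph osd crush show-tunables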

•  Ceph recovery/backfilling can starve your clients of IO; you may want to throttle it

ceph tell osd.\* injectargs '--osd_recovery_max_active 1 --osd_recovery_max_single_start 1 --osd_recovery_op_priority 50 --osd_recovery_max_chunk 1048576 --osd_recovery_threads 1 --osd_max_backfills 1 --osd_backfill_scan_min 4 --osd_backfill_scan_max 8'
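injectargs only affects running daemons; to make the same throttling persistent across OSD restarts, the values can also go into ceph.conf (a sketch, values copied from the command above):

    [osd]
    osd_recovery_max_active = 1
    osd_recovery_max_single_start = 1
    osd_recovery_op_priority = 50
    osd_recovery_max_chunk = 1048576
    osd_recovery_threads = 1
    osd_max_backfills = 1
    osd_backfill_scan_min = 4
    osd_backfill_scan_max = 8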
[Screenshots: #1 – cluster health OK; #2]
Operational Experience
•  Increasing the filestore max/min sync interval values helped to a certain extent
o  filestore_max_sync_interval = 140
o  filestore_min_sync_interval = 100

•  A firmware upgrade on the network switches, together with replacing the physical network cables, fixed the issue.

Advice: always check your network TWICE!
THANK YOU
