http://highscalability.com/blog/2008/7/26/sharding-the-hibernate-way.html

Sharding the Hibernate Way

Saturday, July 26, 2008 at 3:04AM

Todd Hoff in Shard, hibernate

Update: A very nice JavaWorld podcast interview with Google engineer
Max Ross on Hibernate Shards. Max explains what Hibernate Shards is
(horizontal partitioning), how it works (pretty well), virtual shards (don't
ask), what they need to do in the future (query, replication, operational
tools), and how it relates to Google App Engine (not much).

To scale you are supposed to partition your data. Sounds good, but how
do you do it? When you actually sit down to work out all the details, it's
not that easy. Hibernate Shards to the rescue! Hibernate Shards is an
extension to the core Hibernate product that adds facilities for horizontal
partitioning. If you know the core Hibernate API, you know the Shards
API. No learning curve at all. Here is what a few members of the core
group had to say about the Hibernate Shards open source project.
Although there are some limitations, from the sound of it they are doing
useful stuff in the right way, and it's very much worth looking at,
especially if you use Hibernate or some other ORM layer.

Information Sources
1. Google Developer Podcast Episode Six: The Hibernate
Shards Open Source Project. This is the document summarized
here.
2. Hibernate Shards Project Page
3. Hibernate Shards Dev Discussion Group.
4. Ryan Barrett’s Scaling on the Cheap presentation. Many of the
lessons from here are in Hibernate Shards.
Compiled by http://weibo.com/developerworks, 2012-11-11
5. JavaWorld podcast interview: Sharding with Max Ross - Hibernate
Shards. Max Ross is the Google engineer who spends his days working on
the Google App Engine datastore. On the side he works on Hibernate
Shards, another scalability-obsessed open source project.

What is Hibernate Shards?

1. Shard: a piece of a split-up data set. If your data doesn't fit on one
machine, split it into pieces; each piece is called a shard.
2. Sharding: the process of splitting up data. For example, putting
employees 1-10,000 on shard1 and employees 10,001-20,000 on
shard2.
3. Sharding is used when you have too much data to fit in one single
relational database. If your database has a JDBC driver, Hibernate can
talk to it, and if Hibernate can talk to it, Hibernate Shards can talk
to it.
4. Most people don't want to shard because it makes everything
complex. But when you have too much data, when you fill your
database up, you need another solution, which can be to shard the
data across multiple relational databases. The complexity arises
because your application has to have the smarts to access multiple
databases and that's where Hibernate Shards tries to help.
5. The structure of the data is identical from server to server: the
same schema is used across all databases (MySQL, etc.).
6. Hibernate was chosen because it's a good ORM tool used internally
at Google, but at Google scale (really, really big) sharding had to be
added, because Hibernate didn't support that sort of scale out of the
box.
7. The learning curve for a Hibernate user is zero because the
Hibernate API is the same. The Shards implementation hasn't
violated the API (yet). Sharded versions of Session, Criteria, and
SessionFactory are available, so the programmer doesn't need to change
code. Query isn't implemented yet because features like aggregation
and grouping are very difficult to implement across databases.
8. How does it compare to MySQL's horizontal partitioning? Hibernate
Shards is for situations where you have too much data to fit in a single
database. MySQL partitioning may let you delay the point at which you
need to shard, but it is still a single database and you'll eventually
run into its limits.
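To make the employee example in item 2 concrete, here is a minimal sketch of range-based shard routing. It assumes fixed blocks of 10,000 IDs per shard and two shards named shard1 and shard2; the class EmployeeRangeRouter is hypothetical and not part of the Hibernate Shards API.

```java
import java.util.List;

/**
 * Hypothetical range-based router for the employee example:
 * IDs 1-10,000 go to shard1, 10,001-20,000 to shard2, and so on.
 * Not part of the Hibernate Shards API.
 */
final class EmployeeRangeRouter {
    private final int rangeSize;            // employees per shard (assumed fixed)
    private final List<String> shardNames;  // e.g. "shard1", "shard2"

    EmployeeRangeRouter(int rangeSize, List<String> shardNames) {
        if (rangeSize <= 0 || shardNames.isEmpty()) {
            throw new IllegalArgumentException("need a positive range size and at least one shard");
        }
        this.rangeSize = rangeSize;
        this.shardNames = List.copyOf(shardNames);
    }

    /** Returns the shard that owns the given 1-based employee ID. */
    String shardFor(long employeeId) {
        if (employeeId < 1) {
            throw new IllegalArgumentException("employee IDs start at 1");
        }
        long index = (employeeId - 1) / rangeSize;  // 1..10,000 -> 0, 10,001..20,000 -> 1
        if (index >= shardNames.size()) {
            // No range covers this ID: capacity must be added before it can be stored.
            throw new IllegalStateException("no shard configured for ID " + employeeId);
        }
        return shardNames.get((int) index);
    }

    public static void main(String[] args) {
        EmployeeRangeRouter router =
            new EmployeeRangeRouter(10_000, List.of("shard1", "shard2"));
        System.out.println(router.shardFor(1));       // shard1
        System.out.println(router.shardFor(10_000));  // shard1
        System.out.println(router.shardFor(10_001));  // shard2
    }
}
```

Adding capacity is easy only while new IDs land in a new range; moving existing ranges between shards is exactly the repartitioning that makes resharding operationally difficult.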

Schema Design for Shards

1. When sharding you have to consider the general issues of
distributed data design for high data volumes. These aren’t
Hibernate Shards specific issues, but are general to the problem
space.
2. Schema design is the most important part of the sharding process,
and you'll have to do it up front.
3. You need to pick a dimension, a root level entity, that is easily
sharded. Users and customers are common examples.
4. Accept the fact that those entities, and all the entities that hang off
them, will be stored in separate physical spaces. Querying across
different shards will be difficult. So will management and just about
everything else you take for granted.
5. How data are distributed is controlled by a pluggable strategy
layer.
6. Plan for the future by picking a strategy that will last you a long
time. Repartitioning/resharding the data is operationally very
difficult, and there are no management tools for it yet.
7. Build simpler models that don't contain as many relationships,
because cross-shard relationships aren't supported. Your object
graphs should be contained on one shard as much as possible.
8. Models with lots and lots of objects pointing to each other may not
be good candidates for sharding.

9. Because the shards design doesn’t modify Hibernate core, you can
design using shards from the start, even though you only have one
database. Then when you need to start scaling it will be easier to
grow.
10. Existing systems with shardable tables shouldn’t take very long to
get up and running.
11. Policy decisions can drive sharding. For example, let's say
customers don't want their data intermingling, so each customer
would get their own database. In this case the application would
shard on the customer as a matter of policy, not just for scaling
reasons.
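The root-entity advice in items 3, 4, 7 and 11 can be sketched as follows, assuming customers are the root entity and orders hang off them. Dependent entities never pick a shard of their own; they inherit their customer's shard, so each customer's object graph stays on one shard. The modulo placement and the dedicated-shard override for policy-driven isolation are illustrative assumptions, and CustomerRootSharding is a hypothetical name, not a Hibernate Shards type.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: customers are the root entity; orders inherit their
 * customer's shard so each customer's object graph stays on one shard.
 * Not the Hibernate Shards API.
 */
final class CustomerRootSharding {
    record Order(long id, long customerId, String item) {}

    private final int sharedShards;                                   // shards 0..sharedShards-1 are shared
    private final Map<Long, Integer> dedicatedShards = new HashMap<>(); // policy: customer -> own database

    CustomerRootSharding(int sharedShards) {
        this.sharedShards = sharedShards;
    }

    /** Policy-driven placement: this customer's data must not intermingle with anyone else's. */
    void assignDedicatedShard(long customerId, int shardId) {
        if (shardId < sharedShards) {
            throw new IllegalArgumentException("dedicated shards must be outside the shared range");
        }
        dedicatedShards.put(customerId, shardId);
    }

    /** The root entity's shard: a policy override if one exists, else customer ID modulo shard count. */
    int shardForCustomer(long customerId) {
        Integer dedicated = dedicatedShards.get(customerId);
        return dedicated != null ? dedicated : (int) Math.floorMod(customerId, (long) sharedShards);
    }

    /** Dependent entities never choose a shard of their own; they follow their root. */
    int shardForOrder(Order order) {
        return shardForCustomer(order.customerId());
    }

    public static void main(String[] args) {
        CustomerRootSharding sharding = new CustomerRootSharding(4);
        sharding.assignDedicatedShard(42L, 7);  // hypothetical customer with its own database
        System.out.println(sharding.shardForOrder(new Order(1L, 10L, "book")));   // 2 (10 mod 4)
        System.out.println(sharding.shardForOrder(new Order(2L, 42L, "laptop"))); // 7 (policy)
    }
}
```

Keeping dedicated shards outside the shared range matters: if a dedicated shard were also a modulo target, other customers' data could land on it and the isolation policy would be broken.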

The Sharding Code's Relationship to Hibernate
1. Hibernate Shards encapsulates knowledge of all the shards. This
knowledge is not in the database or the application. It's at the
Hibernate persistence layer which provides a unified view of all the
databases so the application doesn't have to know.
2. Shards doesn't fully support Hibernate's query interface.
Hibernate offers both a Criteria interface and a Query interface. The
sharded Criteria interface is robust, but that doesn't help JPA (Java
Persistence API), which is query based.
3. Sharding should work with every database Hibernate works with,
since Shards is a layer on top of Hibernate core, beneath the standard
Hibernate interfaces. Programmers aren't aware of it.
4. What they are doing is figuring out how to do standard things like
saving, updating, and querying objects across multiple databases
using standard Hibernate interfaces. If Hibernate can talk to it, they
can talk to it.
5. A sharded session wraps one Hibernate session per shard, so
Hibernate's capabilities are preserved.

6. Shards can't manage cross-shard foreign-key relationships (yet). It
does have runtime checks to detect when cross-shard relationships are
used accidentally. There is no foreign key constraint checking and no
Hibernate lazy loading across shards. From a programming perspective
you can have IDs that reference objects on other shards; it's just that
Hibernate won't know about these relationships.
7. Now that the base software is done, these more advanced features
can be considered. It may take changes in Hibernate core.
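A toy model of items 5 and 6: a facade that holds one session per shard, rejects a mapped association that would point across shards, and lets a plain ID field that happens to name an object on another shard pass unchecked. ShardSession is an in-memory stand-in for a real Hibernate Session, IDs are resolved through an in-memory directory (an assumption), and every name here is hypothetical. This illustrates the described behavior, not how Hibernate Shards is implemented internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sharded-session sketch: a facade over one session per shard
 * with a runtime check against accidental cross-shard associations.
 * Not the Hibernate Shards API.
 */
final class ShardedSessionSketch {
    /** parent is a "mapped association" (checked); looseRefId is a plain ID column (not checked). */
    record Entity(String id, Entity parent, String looseRefId) {}

    /** In-memory stand-in for a per-shard Hibernate Session. */
    static final class ShardSession {
        final Map<String, Entity> store = new HashMap<>();
    }

    private final List<ShardSession> sessions = new ArrayList<>();
    private final Map<String, Integer> shardOfId = new HashMap<>(); // ID -> shard directory (assumed)
    private final ToIntFunction<Entity> selection;                 // where new objects are saved

    ShardedSessionSketch(int shardCount, ToIntFunction<Entity> selection) {
        for (int i = 0; i < shardCount; i++) {
            sessions.add(new ShardSession());
        }
        this.selection = selection;
    }

    void save(Entity entity) {
        int target = selection.applyAsInt(entity);
        if (entity.parent() != null) {
            Integer parentShard = shardOfId.get(entity.parent().id());
            // Runtime check: a mapped association must not cross shards.
            if (parentShard != null && parentShard != target) {
                throw new IllegalStateException("cross-shard association: " + entity.id()
                    + " (shard " + target + ") -> " + entity.parent().id()
                    + " (shard " + parentShard + ")");
            }
        }
        // looseRefId is deliberately not checked: to Hibernate it is just a value.
        sessions.get(target).store.put(entity.id(), entity);
        shardOfId.put(entity.id(), target);
    }

    /** Callers fetch by ID without knowing which shard holds the object. */
    Entity get(String id) {
        Integer shard = shardOfId.get(id);
        return shard == null ? null : sessions.get(shard).store.get(id);
    }

    public static void main(String[] args) {
        // Hypothetical selection policy: IDs starting with "a" go to shard 0, others to shard 1.
        ShardedSessionSketch session =
            new ShardedSessionSketch(2, e -> e.id().startsWith("a") ? 0 : 1);
        Entity alice = new Entity("alice", null, null);
        session.save(alice);
        session.save(new Entity("alice-order", alice, null)); // same shard: allowed
        session.save(new Entity("bob", null, "alice"));       // plain ID across shards: allowed
        try {
            session.save(new Entity("bob-order", alice, null)); // shard 1 -> shard 0: rejected
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Running main prints the rejection message for bob-order, while bob's plain reference to alice is stored without complaint, mirroring "Hibernate won't know about these relationships."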

Pluggable Strategies Determine How Data Are Split Across Shards
1. A strategy dictates how data are spread across the shards. It's an
interface you need to implement. There are three strategies:
* Shard Resolution Strategy: how you will retrieve your objects, i.e.
which shards to search for an object with a given ID.
* Shard Selection Strategy: where new objects are saved.
* Access Strategy: once you figure out which shards you are talking
to, how you want to access them (serially, two at a time, in
parallel, etc.).
2. The goal is to make strategies as flexible as possible, so you can
decide how your data are sharded.
3. A couple of implementations are provided out of the box:
* Round Robin: the first object goes to the first shard, the second to
the second shard, and so on, looping back to the first shard at the end.
* Attribute Based: look at attributes in the data to determine the
shard. You can shard users by country, for example.
4. Configuration is done by creating a prototype configuration shared by
all shards (remember, same schema). Then you specify what's different
from shard to shard, like URL, user name and password, and dialect
(MySQL, Postgres, etc.). A sharded session factory is then created, so
developers use the standard Hibernate interfaces.
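The strategy split above can be sketched with simplified interfaces. The names below are hypothetical stand-ins chosen for illustration and do not reproduce the library's actual signatures. RoundRobin and AttributeBased play the role of selection strategies; the two access methods show the serial-versus-parallel choice an access strategy makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Hypothetical, simplified strategy types; not the Hibernate Shards API. */
final class ShardStrategies {
    record User(String name, String country) {}

    /** Selection: which shard should a new object be saved to? */
    interface SelectionStrategy<T> {
        int selectShard(T newObject);
    }

    /** First object to shard 0, second to shard 1, ..., then loop back to shard 0. */
    static final class RoundRobin<T> implements SelectionStrategy<T> {
        private final AtomicInteger counter = new AtomicInteger();
        private final int shardCount;

        RoundRobin(int shardCount) { this.shardCount = shardCount; }

        @Override
        public int selectShard(T newObject) {
            return Math.floorMod(counter.getAndIncrement(), shardCount);
        }
    }

    /** Looks at an attribute of the object (e.g. a user's country) to pick the shard. */
    static final class AttributeBased<T> implements SelectionStrategy<T> {
        private final Function<T, String> attribute;
        private final Map<String, Integer> shardByValue;

        AttributeBased(Function<T, String> attribute, Map<String, Integer> shardByValue) {
            this.attribute = attribute;
            this.shardByValue = Map.copyOf(shardByValue);
        }

        @Override
        public int selectShard(T newObject) {
            String value = attribute.apply(newObject);
            Integer shard = shardByValue.get(value);
            if (shard == null) throw new IllegalArgumentException("no shard configured for " + value);
            return shard;
        }
    }

    /** Access: run the per-shard operation on each shard, one after another. */
    static <R> List<R> accessSerially(int shardCount, IntFunction<R> perShard) {
        List<R> results = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) results.add(perShard.apply(shard));
        return results;
    }

    /** Access: run the per-shard operation on all shards at once; results stay in shard order. */
    static <R> List<R> accessInParallel(int shardCount, IntFunction<R> perShard) {
        ExecutorService pool = Executors.newFixedThreadPool(shardCount);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int shard = 0; shard < shardCount; shard++) {
                final int s = shard;
                futures.add(pool.submit(() -> perShard.apply(s)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) results.add(future.get());
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard operation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SelectionStrategy<User> rr = new RoundRobin<>(2);
        SelectionStrategy<User> byCountry =
            new AttributeBased<>(User::country, Map.of("US", 0, "DE", 1));
        User ann = new User("ann", "DE");
        System.out.println(rr.selectShard(ann) + " " + rr.selectShard(ann)); // 0 1
        System.out.println(byCountry.selectShard(ann));                      // 1
        System.out.println(accessInParallel(2, shard -> "rows from shard " + shard));
    }
}
```

Because every strategy sits behind a small interface, swapping round robin for attribute-based placement, or serial for parallel access, changes no calling code, which is the flexibility item 2 is after.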

Some Limitations
1. Full Hibernate HQL is not yet supported (maybe it is now, but I
couldn’t tell).
2. Distributed queries are handled by applying a standard HQL query
to each shard, merging the results, and applying the filters. This all
happens in the application server so using very large data sets could
be a problem. It’s left to the intelligence of the developers to do the
right thing to manage performance.
3. No mirroring or data replication. Replication here means having
common tables, like zip codes, available on all shards.
4. No clean way to manage read-only data you want on every shard for
performance and referential integrity reasons. Say you have country
data. It makes sense to replicate that data on each shard so all
queries using that data can stay on the shard.
5. No handling of failover situations, which is no different from core
Hibernate. You could handle it in your connection pool or some other
layer. It's not considered part of the shard/OR mapping layer.
6. There's a need for management tools that work across shards, for
example to repartition data on a live system.
7. It's possible to shard across different databases as long as you keep
the same schema in each database.
8. The number of shards you can have is somewhat limited because
each shard is backed by a connection pool, which means a lot of
database connections. And ORDER BY operations across
databases must be done in memory, so a lot of memory could be
used on large data sets.
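Items 2 and 8 describe scatter-gather: run the same query on every shard, then merge and order the results in the application server. A minimal sketch, assuming each shard returns its rows already sorted by the ORDER BY column (for example because every shard ran the same ORDER BY ... LIMIT query), is a standard k-way merge with a priority queue. This is a common technique, not necessarily the one Hibernate Shards uses, and Row and ScatterGatherQuery are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical scatter-gather merge of per-shard results; not the Hibernate Shards API. */
final class ScatterGatherQuery {
    /** One row returned by a shard; sortKey stands in for the ORDER BY column. */
    record Row(int shard, long sortKey, String payload) {}

    /** Heap entry: the current head row of one shard's sorted result plus the rest of it. */
    private record Cursor(Row head, Iterator<Row> rest) {}

    /**
     * Merges per-shard results that are each already sorted by sortKey and
     * returns the global first `limit` rows in sortKey order.
     */
    static List<Row> mergeSorted(List<List<Row>> perShardResults, int limit) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.head().sortKey()));
        for (List<Row> shardRows : perShardResults) {
            Iterator<Row> it = shardRows.iterator();
            if (it.hasNext()) heap.add(new Cursor(it.next(), it));
        }
        List<Row> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor smallest = heap.poll();          // globally smallest remaining row
            merged.add(smallest.head());
            if (smallest.rest().hasNext()) {        // advance that shard's cursor
                heap.add(new Cursor(smallest.rest().next(), smallest.rest()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Row>> results = List.of(
            List.of(new Row(0, 3, "c"), new Row(0, 9, "i")),
            List.of(new Row(1, 1, "a"), new Row(1, 4, "d")),
            List.of(new Row(2, 2, "b")));
        for (Row r : mergeSorted(results, 4)) {
            System.out.println(r.sortKey() + " " + r.payload() + " (shard " + r.shard() + ")");
        }
    }
}
```

The heap holds only one row per shard, but every shard's result list is already in application-server memory, which is why very large result sets remain a problem.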

Related Articles
1. An Unorthodox Approach to Database Design: The Coming of the Shard.

Article originally appeared on High Scalability (http://highscalability.com/).

See website for complete article licensing information.
