Sharding The Hibernate Way
To scale, you are supposed to partition your data. Sounds good, but how
do you do it? When you actually sit down to work out all the details, it’s
not that easy. Hibernate Shards to the rescue! Hibernate Shards is “an
extension to the core Hibernate product that adds facilities for horizontal
partitioning.” If you know the core Hibernate API, you know the Shards
API. No learning curve at all. Here is what a few members of the core
group had to say about the Hibernate Shards open source project.
Although there are some limitations, from the sound of it they are doing
useful stuff in the right way, and it’s very much worth looking at,
especially if you use Hibernate or some other ORM layer.
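As a minimal sketch of what horizontal partitioning means at the application level, assuming a simple hash-modulo scheme, each entity can be routed to a shard by its key. The names `ShardRouter` and `shardFor` are hypothetical and are not the Hibernate Shards API; this only illustrates the kind of decision a sharding layer has to make for every new object.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: route an entity to one of N shards by hashing its key.
// ShardRouter and the hash-modulo scheme are illustrative assumptions,
// not the Hibernate Shards API.
final class ShardRouter {
    private final List<String> shardIds;

    ShardRouter(List<String> shardIds) {
        if (shardIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardIds = List.copyOf(shardIds);
    }

    // The same key always maps to the same shard while the shard count is fixed.
    // floorMod keeps the bucket non-negative even for negative hash codes.
    String shardFor(Object key) {
        int bucket = Math.floorMod(Objects.hashCode(key), shardIds.size());
        return shardIds.get(bucket);
    }
}
```

Hash-modulo routing is easy to reason about, but changing the number of shards remaps most keys, which is one reason tools for repartitioning a live system matter.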
Information Sources
1. Google Developer Podcast Episode Six: The Hibernate
Shards Open Source Project. This is the document summarized
here.
2. Hibernate Shards Project Page.
3. Hibernate Shards Dev Discussion Group.
4. Ryan Barrett’s Scaling on the Cheap presentation. Many of its
lessons are reflected in Hibernate Shards.
http://weibo.com/developerworks 2012-11-11 整理 第 1/7页
http://highscalability.com/blog/2008/7/26/sharding-the-hibernate-way.html
Some Limitations
1. Full Hibernate HQL is not yet supported (maybe it is now, but I
couldn’t tell).
2. Distributed queries are handled by applying a standard HQL query
to each shard, merging the results, and then applying the filters. This
all happens in the application server, so very large data sets could be
a problem. It’s left to developers to do the right thing to manage
performance.
3. No mirroring or data replication. Replication here means having
common tables, such as zip codes, available on all shards.
4. No clean way to manage read-only data you want on every shard for
performance and referential integrity reasons. Say you have country
data. It makes sense to replicate that data on each shard so all
queries using that data can stay on the shard.
5. No handling of failover situations, which is the same as in core
Hibernate. You could handle it in your connection pool or some other
layer; it’s not considered part of the shard/OR mapping layer.
6. There’s a need for management tools that work across shards, for
example, to repartition data on a live system.
7. It’s possible to shard across different databases as long as you keep
the same schema in each database.
8. The number of shards you can have is somewhat limited because
each shard is backed by its own connection pool, which adds up to a
lot of database connections. And ORDER BY operations across
databases must be done in memory, so a lot of memory could be
used on large data sets.
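The distributed query approach in the second limitation (run the same query on every shard, merge the results, then filter in the application server) is a scatter-gather. The sketch below is hypothetical and sequential: each shard is modeled as a `Supplier` that runs the query locally, and the post-merge “filters” are assumed to be an ordering plus a result limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical scatter-gather sketch; not the Hibernate Shards API.
// Each Supplier stands in for "run the same query against one shard".
final class ScatterGather {
    static <T> List<T> query(List<Supplier<List<T>>> shardQueries,
                             Comparator<? super T> order,
                             int maxResults) {
        // Scatter: run the identical query on every shard (sequentially here).
        List<T> merged = new ArrayList<>();
        for (Supplier<List<T>> shardQuery : shardQueries) {
            merged.addAll(shardQuery.get()); // every row lands in app-server memory
        }
        // Gather: re-apply ordering and the result limit after the merge,
        // because no single shard saw the full data set.
        merged.sort(order);
        if (merged.size() > maxResults) {
            return new ArrayList<>(merged.subList(0, maxResults));
        }
        return merged;
    }
}
```

Every shard’s rows sit in application-server memory before the sort, which is exactly why very large result sets are a concern. One common mitigation, assumed here rather than taken from the source, is to push the limit down so each shard returns at most `maxResults` rows; the global top N is always contained in the union of the per-shard top N.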
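For the fourth limitation, one application-level workaround is to broadcast every write of a small reference table, such as countries, to all shards and periodically check that the copies agree. This is a hypothetical sketch: `ReferenceTable` stands in for one shard’s copy of the table, and nothing here is a Hibernate Shards feature.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a Hibernate Shards feature: keep a small
// read-only reference table (e.g. country codes) identical on every shard.
final class ReferenceData {
    // Stand-in for one shard's copy of the reference table: code -> name.
    static final class ReferenceTable {
        final Map<String, String> rows = new LinkedHashMap<>();
    }

    // Broadcast the same upsert to every shard so queries that join
    // against this data can stay on a single shard.
    static void upsertEverywhere(List<ReferenceTable> shards, String code, String name) {
        for (ReferenceTable shard : shards) {
            shard.rows.put(code, name);
        }
    }

    // Reconciliation check: true if every shard holds an identical copy.
    static boolean isConsistent(List<ReferenceTable> shards) {
        if (shards.isEmpty()) {
            return true;
        }
        Map<String, String> expected = shards.get(0).rows;
        for (ReferenceTable shard : shards) {
            if (!shard.rows.equals(expected)) {
                return false;
            }
        }
        return true;
    }
}
```

The broadcast loop is not atomic across shards, so a failure partway through leaves the copies out of sync; the consistency check is there to catch that gap.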
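The in-memory ORDER BY in the eighth limitation does not have to re-sort everything if each shard already returns its rows in order. A minimal sketch, assuming per-shard results arrive pre-sorted by the same ordering, is a k-way merge over a heap; `ShardMerge` and `mergeSorted` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge sketch; assumes each shard already returned
// its rows sorted by the same ORDER BY as `order`.
final class ShardMerge {
    // Heap entry: the next unconsumed row of one shard plus the rest of that shard.
    private static final class Head<R> {
        final R row;
        final Iterator<R> rest;

        Head(R row, Iterator<R> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    static <T> List<T> mergeSorted(List<? extends Iterable<T>> shardResults,
                                   Comparator<? super T> order,
                                   int limit) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<Head<T>>((a, b) -> order.compare(a.row, b.row));
        // Seed the heap with the first row from each non-empty shard.
        for (Iterable<T> shard : shardResults) {
            Iterator<T> it = shard.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it));
            }
        }
        List<T> out = new ArrayList<>();
        // Repeatedly take the globally smallest row and refill from its shard.
        while (!heap.isEmpty() && out.size() < limit) {
            Head<T> smallest = heap.poll();
            out.add(smallest.row);
            if (smallest.rest.hasNext()) {
                heap.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

With k shards and n rows consumed, the merge costs O(n log k). If the per-shard iterators were backed by streaming cursors rather than fully materialized lists, the heap would hold only one pending row per shard.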
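The connection pressure in the eighth limitation is simple multiplication. Assuming every application server keeps one pool per shard, with A application servers, S shards, and P connections per pool (illustrative numbers, not figures from the source), each application server holds S·P connections and the fleet as a whole holds A·S·P:

```latex
C_{\text{per app server}} = S \cdot P = 20 \cdot 25 = 500,
\qquad
C_{\text{total}} = A \cdot S \cdot P = 10 \cdot 20 \cdot 25 = 5000
```

Each application server’s connection count grows linearly with the number of shards, which is what caps how many shards are practical.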
Related Articles
1. An Unorthodox Approach to Database Design: The Coming of
the Shard.