
SPIDER Storage Engine Database Sharding by Storage Engine

1. Spider is a storage engine for MySQL that allows databases and tables to be partitioned across multiple database servers, enabling database sharding without application modifications.
2. It creates table links between local and remote databases, supporting operations like joins across servers. Spider also supports XA transactions and table partitioning across databases.
3. Database sharding using Spider provides performance benefits like application sharding but is easier to implement and maintain, as the application does not need to handle synchronization or joins across servers.

Uploaded by

Santiago Lertora
Copyright
© Attribution Non-Commercial (BY-NC)

Introducing a new storage engine for MySQL

“SPIDER”

Database Sharding by Storage Engine


Spider – spinning a rainbow web

ST Global., Inc
Kentoku SHIBA
Problem of too many records

[Diagram: one DB server holding tbl_a and tbl_b]

Too many records in the DB server.
ST Global., Inc All Rights Reserved
Problem of too many requests

[Diagram: update requests arriving at a single Master DB that holds tbl_a and tbl_b]

Too many requests to the Master DB.


DB SHARDING by an application
can resolve these problems.



Structure sample of DB sharding by application

[Diagram: tbl_a, tbl_b and tbl_c are stored on DB1, DB2 and DB3. 1. A request (select tbl_b) reaches the application (AP). 2. The AP chooses a connection and gets the data. 3. Response.]


Do you have any problems?

It is very difficult to retrofit “DB sharding by application” into an application, and doing so takes a lot of effort.

But by using Spider, you can implement DB sharding very easily, without modifying the application.

Structure sample of DB sharding by Spider

[Diagram: tbl_a, tbl_b and tbl_c are stored on DB1, DB2 and DB3, with SPIDER in front of them. 1. A request (select tbl_b) reaches the application (AP). 2. The AP just connects to Spider. 3. Response.]


What is "Spider"

1. The Spider storage engine creates table links from local databases to remote databases.

2. Spider can be used to shard databases.

3. Spider supports XA transactions and table partitioning.

4. Spider is released to the public under the GPL.

http://spiderformysql.com

Table link

    Create table tbl_a (
      col_a int,
      col_b int,
      primary key (col_a)
    ) engine = Spider
    Connection ' host "DB1", table "tbl_a", user "user", password "pass" ';

    select tbl_a.col_a, tbl_b.col_c
    from tbl_a, tbl_b
    where tbl_a.col_a = 1 and tbl_a.col_b = tbl_b.col_b

[Diagram: the local DB holds tbl_a as a Spider storage engine table and tbl_b as a table of another storage engine; DB1 holds tbl_a as a table of another storage engine. 1. Request. 2. Spider gets the tbl_a data from DB1. 3. The local DB joins tbl_a and tbl_b. 4. Response.]

Spider lets tables on remote MySQL servers be used as if they were tables on the local MySQL server.

XA transaction with Spider

my.cnf
------------------
……
spider_internal_xa=1
……

[Diagram: the local DB holds Spider tables tbl_a, tbl_b and tbl_c, linked to DB1, DB2 and DB3. 1. A commit request reaches the local DB. 2. XA prepare is sent to DB1, DB2 and DB3. 3. XA commit is sent to DB1, DB2 and DB3. 4. Response.]

Spider can be used for DB clustering.
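The prepare-then-commit order in the XA slide is a two-phase commit. The following Python sketch shows only that ordering. `RemoteServer` and `commit_everywhere` are hypothetical names invented for this illustration, the servers are in-memory stand-ins, and nothing here is Spider's actual implementation.

```python
class RemoteServer:
    """In-memory stand-in for one remote MySQL server (hypothetical)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def xa_prepare(self):
        # Phase 1: the server either promises it can commit or refuses.
        if not self.can_prepare:
            raise RuntimeError(f"{self.name} could not prepare")
        self.state = "prepared"

    def xa_commit(self):
        # Phase 2: make the prepared work permanent.
        self.state = "committed"

    def xa_rollback(self):
        self.state = "rolled back"


def commit_everywhere(servers):
    """Commit on every server, or roll back on every server."""
    try:
        for server in servers:
            server.xa_prepare()
    except RuntimeError:
        # One refusal is enough to undo the work on all servers.
        for server in servers:
            server.xa_rollback()
        return False
    for server in servers:
        server.xa_commit()
    return True
```

If every stand-in server can prepare, all of them end in the "committed" state. If any one refuses, all of them end in "rolled back", which is the all-or-nothing behaviour the slide relies on.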


Table partitioning with Spider

    Create table tbl_a (
      col_a int,
      col_b int,
      primary key (col_a)
    ) engine = Spider
    Connection ' table "tbl_a", user "user", password "pass" '
    partition by list (mod(col_a, 3)) (
      partition pt1 values in (0) comment 'host "DB1"',
      partition pt2 values in (1) comment 'host "DB2"',
      partition pt3 values in (2) comment 'host "DB3"'
    );

    select tbl_a.col_a, tbl_b.col_c from tbl_a, tbl_b
    where tbl_a.col_a = 1 and tbl_a.col_b = tbl_b.col_b

[Diagram: the local DB holds tbl_a as a Spider table and tbl_b; DB1, DB2 and DB3 hold the tbl_a rows with col_a%3=0, col_a%3=1 and col_a%3=2. 1. Request. 2. Spider gets the data from the remote DBs. 3. The local DB joins tbl_a and tbl_b. 4. Response.]

Spider supports DB sharding.

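As a rough illustration of how the partition rule on this slide maps a key to a server, the Python sketch below mirrors `partition by list(mod(col_a, 3))`. The dictionary and function names are assumptions made for this example, and the sketch handles only non-negative keys.

```python
from collections import defaultdict

# Mirrors partitions pt1, pt2 and pt3 and the host named in each comment.
PARTITION_HOSTS = {0: "DB1", 1: "DB2", 2: "DB3"}


def host_for(col_a):
    """Return the host that stores the row with this col_a value."""
    if col_a < 0:
        raise ValueError("this sketch only handles non-negative keys")
    return PARTITION_HOSTS[col_a % 3]


def group_by_host(keys):
    """Group key values by the host that would store them."""
    groups = defaultdict(list)
    for key in keys:
        groups[host_for(key)].append(key)
    return dict(groups)
```

For the query on the slide (tbl_a.col_a = 1), `host_for(1)` selects DB2, so only one remote server needs to be asked.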
DB SHARDING
by Spider
DB sharding by application

DB sharding by an application is usually used to resolve performance problems caused by growing data volume or a growing number of update requests.


DB sharding by application
[Diagram: tbl_a is split across DB1 (col_a%3=0), DB2 (col_a%3=1) and DB3 (col_a%3=2). 1. A request for tbl_a.col_a = 1 reaches one of the applications AP1, AP2 or AP3. 2. The application chooses a connection and gets the data. 3. Response.]

DB sharding by an application can resolve performance problems.

DB sharding by application

But…
DB sharding by application has the following problems.
– Tables on different database servers cannot be joined.
– Applications must implement synchronized updates across different database servers themselves, or give them up.
– The application engineers need a high level of database skill to implement database sharding.
– It is very difficult to retrofit “DB sharding by application” into an application, and doing so takes a lot of effort.
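To make the first problem concrete, here is a hedged Python sketch of what an application has to do by hand for the query used earlier in this deck (tbl_a.col_a = 1 joined to tbl_b on col_b). The in-memory dictionaries stand in for database servers, and all names and sample rows are invented for illustration.

```python
# Hypothetical in-memory stand-ins for database servers (sample rows invented).
TBL_A_SHARDS = [
    {3: {"col_a": 3, "col_b": 30}},  # DB1 holds rows with col_a % 3 == 0
    {1: {"col_a": 1, "col_b": 10}},  # DB2 holds rows with col_a % 3 == 1
    {2: {"col_a": 2, "col_b": 20}},  # DB3 holds rows with col_a % 3 == 2
]
TBL_B = {  # tbl_b lives on a different server, keyed by col_b
    10: {"col_b": 10, "col_c": "x"},
    20: {"col_b": 20, "col_c": "y"},
}


def select_col_a_and_col_c(col_a):
    """Application-side version of the join; returns (col_a, col_c) or None."""
    # The application itself must know the sharding rule and pick the server.
    shard = TBL_A_SHARDS[col_a % 3]
    row_a = shard.get(col_a)
    if row_a is None:
        return None
    # The join is a second lookup written by hand, because no single
    # server can see both tables.
    row_b = TBL_B.get(row_a["col_b"])
    if row_b is None:
        return None
    return row_a["col_a"], row_b["col_c"]
```

Every query shape needs its own routine like this one, which is the effort the slide describes.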


DB sharding by Spider

You get an additional choice now!!

You can use Spider for SHARDING!


DB sharding by Spider
[Diagram: tbl_a is split across DB1 (col_a%3=0), DB2 (col_a%3=1) and DB3 (col_a%3=2). Three DBs that each hold tbl_a as a Spider table sit between the applications AP1, AP2 and AP3 and the remote databases. 1. Request from client (tbl_a.col_a = 1). 2. Request from application. 3. Spider chooses a connection and gets the data. 4. Response to application. 5. Response to client.]

DB sharding by Spider can resolve performance problems.

DB sharding by Spider

And…
– Tables on different servers can be joined.
– The application does not need to implement synchronized updates. (Spider does it.)
– The application engineers can develop applications without DB sharding skills.
– It is very easy to deploy on the database side, because it usually requires no changes to the application or the SQL.


Case study
About Sagool.tv

Sagool.tv is a video streaming website, like www.youtube.com.
But all the content is collected from the web by crawlers, and viewers can watch videos passively, like TV.

Sagool.tv was created by Team Lab Inc.

http://www.team-lab.com
http://www.team-lab.net

[Screenshot: Sagool.tv search page]
[Screenshot: Sagool.tv video streaming page]

Sagool.tv previous Architecture

[Diagram: components are Master DBs, Slave DBs (fed by replication from the masters), crawlers, full-text search servers, APs and batch servers. The batch servers 1. get data and 2. register it, again and again.]

Batch processing must create a full-text index every day.


The Problem of Sagool.tv

But…

The read performance of the DB degraded as the number of records increased.

The batch processing took well over 24 hours once the data exceeded 30 million records.

In this case, MySQL Cluster could not be used because the servers had hard memory limits.

So Spider was used.

Sagool.tv Architecture with SPIDER


[Diagram: the previous architecture with data sharding by Spider added. tbl_a is sharded across four DBs (col_a%4=0, 1, 2 and 3). A DB with a Spider tbl_a receives replication from the Master DBs, and the batch servers 1. get data through Spider and 2. register it with the full-text search servers, again and again.]

We added a slave DB that uses Spider, plus 4 remote databases.
Then we added a MySQL server with Spider on the batch server.

Sagool.TV: Increase the performance
As a result

1. By using Spider, the read performance of the DB improved dramatically because the number of records on each server decreased.

– DB performance increased 10x.
– Batch processing became 5x faster.
(Batch runs currently take about 8 hours.)

2. Spider is transparent, so there is NO need to modify the applications.
3. Spider can be applied only to the parts of the DB that have problems, without changing the architecture.

SPIDER makes SHARDING easy


Another Example: KADOKAWord.jp

KADOKAWA is the largest publisher in Japan.

It runs a large number of websites (over 80) for its media, books, and merchandise.

KADOKAWord.jp is a cross-search service for the content of those websites.
KADOKAWord.jp is operated by KADOKAWA MEDIA MANAGEMENT CO.,LTD.


KADOKAWord.jp with SPIDER

At KADOKAWord.jp, Blackhole and Spider were used because spikes in log traffic from its group sites occur often.


KADOKAWord.jp: Architecture with SPIDER

[Diagram: 1. The APs write log rows to tbl_a on the front DBs (Blackhole). 2. Replication carries the log rows to further DBs. 3. The log data is collected using Spider into tbl_a on the statistical DB.]

So far, high log traffic has caused no problems.


Expanding MySQL’s basic functionality
using Spider
Expanding MySQL’s basic functions

Spider can reinforce and extend the features of other storage engines by working together with them.


Some samples of the expanded functionality

1. Slave trigger
2. Double timestamp
3. Multistep partitioning
4. Range partitioning for MySQL Cluster
5. Parallel replication
6. Synchronous replication
7. "Insert delayed" for InnoDB
8. Effective use for query cache
etc…

1:Slave trigger

With row-based replication, which is available from MySQL 5.1, row changes made by a trigger on the master side are logged. On the slave side, only the row changes are applied, and the trigger is not executed.

Using Spider…
you can use triggers on the slave side.


1:Structure sample of slave trigger

[Diagram: 1. An update request reaches tbl_a on the Master DB. 2. Replication carries it to tbl_a, a Spider table, on the Slave DB. 3. Spider sends the update request to tbl_a on another DB. 4. A trigger on that DB updates tbl_b.]

You can use the trigger to gather statistical information without affecting service performance.

2:Double timestamp

In MySQL, only one timestamp column per table can be set automatically to the current time.

Using Spider…
you can use two such timestamp columns.
(You can use more timestamp columns when a Spider table links to another Spider table.)

2:Structure sample of double timestamp

[Diagram: the local DB holds tbl_a as a Spider table (col_a timestamp, col_b datetime); the remote DB holds tbl_a (col_a datetime, col_b timestamp).

1. Insert request (to the local DB):
  insert into tbl_a (col_a, col_b) values (null, null);
2. Insert request from Spider (to the remote DB):
  insert into tbl_a (col_a, col_b) values ('2009-04-23 14:00:00', null);]

Double timestamp is easy to do in the application, too.
But with Spider you get an additional choice.

3:Multistep partitioning

The table partitioning feature available in MySQL 5.1 supports only two steps: partitioning and sub-partitioning.

Using Spider…
you can use four-step partitioning: two steps on the Spider table and two steps on the remote table.
(You can use more partitioning steps when a Spider table links to another Spider table.)


3:Structure sample of multistep partitioning

[Diagram: on the Spider DB, tbl_a is 1. partitioned by col_a (col_a%2=0, col_a%2=1) and 2. sub-partitioned by col_b (col_b%2=0, col_b%2=1). 3. Each of the four sub-partitions connects to a remote tbl_a on its own DB. 4. Each remote tbl_a is partitioned by col_c and 5. partitioned again, giving two col_c pieces under each remote table.]

You can divide the table data into smaller pieces.
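The stacked steps can be sketched as a single placement function. This Python example is only an illustration: the function names are invented, keys are assumed to be non-negative integers, and step 5 ("partition again") is left out because the slide does not name the column it uses.

```python
from itertools import product


def placement(col_a, col_b, col_c):
    """Return (remote_table_index, col_c_piece) for one row.

    Assumes non-negative integer keys.
    """
    # Steps 1 and 2 happen on the Spider table: partition by col_a % 2,
    # then sub-partition by col_b % 2, which selects one of four remote tables.
    remote_table_index = (col_a % 2) * 2 + (col_b % 2)
    # Step 4 happens on the remote table: partition by col_c % 2.
    col_c_piece = col_c % 2
    return remote_table_index, col_c_piece


def count_pieces():
    """Count the distinct pieces reachable over all parity combinations."""
    return len({placement(a, b, c) for a, b, c in product(range(2), repeat=3)})
```

With two choices at each of the three modelled steps, the sketch yields 2 x 2 x 2 = 8 pieces, the same number of col_c pieces drawn on the slide.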


4:Range partitioning for MySQL Cluster

MySQL Cluster can only use (LINEAR) KEY partitioning.

Using Spider…
you can use other types of partitioning with MySQL Cluster.


4:Structure sample of range partitioning for MySQL Cluster

[Diagram: 1. On the Spider DB, tbl_a is range-partitioned by col_a (col_a < 100, col_a < 200, col_a < 300, col_a >= 300). 2. Each partition connects to a remote ndb table tbl_a on its own DB.]

Range partitioning can be achieved virtually this way.

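The range rule in the diagram can be sketched with a sorted list of upper bounds. In this Python example the table names `ndb1` to `ndb4` are hypothetical labels for the four remote ndb tables, and the lookup is an illustration of the routing idea rather than Spider's code.

```python
from bisect import bisect_right

# Partition boundaries from the slide: <100, <200, <300, >=300.
UPPER_BOUNDS = [100, 200, 300]
# Hypothetical labels for the four remote ndb tables, in partition order.
NDB_TABLES = ["ndb1", "ndb2", "ndb3", "ndb4"]


def table_for(col_a):
    """Return the remote table whose range contains col_a."""
    # bisect_right counts how many upper bounds are <= col_a, which is
    # exactly the index of the first range whose upper bound exceeds col_a.
    return NDB_TABLES[bisect_right(UPPER_BOUNDS, col_a)]
```

A value equal to a boundary, such as 100, falls into the next range (col_a < 200), which matches "less than" partition bounds.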
5:Parallel replication

MySQL replication is serial.

Using Spider…
you can use parallel replication.


5:Structure sample of parallel replication
[Diagram: 1. An update request reaches tbl_a, a Spider table, on the first DB. 2. Spider divides the request across four DBs by col_a%4 (0, 1, 2 and 3). 3. Each of the four DBs replicates to its own slave DB, in parallel. 4. A tbl_a on a final DB collects from the four slave DBs.]

High-speed replication.


5:Spider can implement parallel replication in many ways

Here is another case…

[Diagram: 1. A full backup is restored into tbl_a, a Spider table, on the first DB. 2. Spider divides the data across four DBs by col_a%4 (0, 1, 2 and 3). 3. Each of the four DBs replicates to its own slave DB, in parallel. 4. A tbl_a on a final DB collects the data from the four slave DBs.]

High-speed restoring.


6:Synchronous replication

Synchronous replication is not available in MySQL yet.

Using Spider…
you can use synchronous replication.


6:Structure sample of synchronous replication
[Diagram: 1. An update request reaches tbl_a on the first DB. 2. A trigger on tbl_a updates tbl_b, a Spider table on the same DB. 3. Spider sends the update request to tbl_a on the second DB.]

Regarding availability, both servers must keep working; if trouble develops on either of the two servers, an error is generated.


7:"Insert delayed" for InnoDB

You cannot use "insert delayed" with InnoDB tables.

Using Spider…
you can use "insert delayed" with InnoDB tables.
("insert delayed" runs as a separate transaction.)


7:Structure sample of "insert delayed" for InnoDB
[Diagram: 1. An insert request reaches a Spider tbl_a on one of several front DBs. 2. The row is put into that DB's delayed queue. 3. Later, Spider sends the insert request to tbl_a on the destination DB.]

If you can use additional servers, you can create a big queue.


8:Effective use for query cache

The query cache does not treat two statements as the same unless every word matches. Cache effectiveness falls when many complicated "select" statements are in use, because fewer of them are judged to be the same.

Using Spider…
Spider itself does not support the query cache, but cache effectiveness can be kept high, because MySQL divides and simplifies a "select" statement for each table, and Spider sends the simplified statement to the remote server.


8:Structure sample of effective use for query cache

[Diagram: the local DB holds Spider tables tbl_a, tbl_b and tbl_c, linked to DB1, DB2 and DB3.

1. Request (to the local DB):
  select ~ from tbl_a, tbl_b, tbl_c
  where tbl_a.col_a = 1
  and tbl_a.col_b = tbl_b.col_b
  and tbl_a.col_c = tbl_c.col_c
2. Requests from Spider (to DB1, DB2 and DB3):
  select ~ from tbl_a where col_a = 1
  select ~ from tbl_b where col_b = ~
  select ~ from tbl_c where col_c = ~
3. Response.]

The query cache is used on DB1, DB2 and DB3.
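The effect can be illustrated with a toy cache keyed on exact statement text. This Python sketch is an assumption-laden model: `ExactTextCache` is an invented class, not MySQL's query cache, and the statement strings reuse the slide's "~" placeholders.

```python
class ExactTextCache:
    """Toy cache that reuses a result only for byte-identical statements."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def run(self, statement):
        if statement in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[statement] = f"result of: {statement}"
        return self.store[statement]


def demo():
    """Compare two different joins with the per-table statement they share."""
    joined = ExactTextCache()
    joined.run("select ~ from tbl_a, tbl_b where tbl_a.col_a = 1 "
               "and tbl_a.col_b = tbl_b.col_b")
    joined.run("select ~ from tbl_a, tbl_c where tbl_a.col_a = 1 "
               "and tbl_a.col_c = tbl_c.col_c")
    # Both joins read tbl_a with the same condition, so the simplified
    # statement sent to DB1 is identical each time.
    db1 = ExactTextCache()
    db1.run("select ~ from tbl_a where col_a = 1")
    db1.run("select ~ from tbl_a where col_a = 1")
    return (joined.hits, joined.misses), (db1.hits, db1.misses)
```

In this model the two joins never share a cache entry, while the second per-table statement on DB1 is served from the cache.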


Spider Storage Engine

Conclusion
Spider Storage Engine …

1. Can reinforce and extend the features of other storage engines by working together with them.

2. Makes it possible to use tables on remote MySQL servers as if they were on the local MySQL server.

3. Can synchronize updates to remote MySQL servers using XA transactions.

4. Supports the table partitioning available in MySQL 5.1, and can connect each partition to a different server.

With these four features …

Spider can be used for DB sharding with transactions.

Spider makes sharding easy without losing functionality.

Applications do not need to be changed. Spider can be used only where it is needed.


Future road map
- Summer 2009
・ Available for General Release.
・ “Engine-condition-pushdown” will be available.
“Intelligent search” updated and more capable.

- Fall 2009
・ “Savepoint” will be available.
Spider will be able to roll back to a savepoint.
Currently, Spider can only commit or roll back an entire transaction.
・ Spider will be available on Drizzle.
Drizzle is a “slimmed down version of MySQL” designed
for scalability and performance. (Cloud computing)

- Winter 2009
・ Oracle tables will be linkable with Spider tables.

Any Questions?

Thank you for your time!!

ST Global., Inc
Kentoku SHIBA
http://spiderformysql.com
