SPIDER Storage Engine Database Sharding by Storage Engine
“SPIDER”
ST Global., Inc
Kentoku SHIBA
Problem of too many records
[Figure: tables tbl_a and tbl_b on the DB servers; update requests go to the master DB.]
[Figure: 1. Request; 2. The AP chooses a connection and gets the data (select tbl_b); 3. Response.]
SPIDER
[Figure: 1. Request; 2. The AP just connects to Spider; 3. Response. Panels: select tbl_b; commit; tbl_a.col_a = 1.]
But…
DB sharding by application has the following problems.
– Tables on different database servers cannot be joined.
– Applications must either implement synchronized updates across different database servers themselves or abandon them.
– Application engineers need a high level of database skill to implement database sharding.
– It is very difficult to implement “DB sharding by application” in an application, and doing so requires a lot of effort.
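As a rough illustration of the first problem above, the sketch below shows what an application has to do when tbl_a and tbl_b live on different servers: fetch the rows from each server separately and join them in application code. The two dictionaries standing in for servers, the columns other than col_a, and the function name are all hypothetical.

```python
# Hypothetical stand-ins for two database servers; each one holds one table.
SERVER_1 = {"tbl_a": [{"col_a": 1, "name": "x"}, {"col_a": 2, "name": "y"}]}
SERVER_2 = {"tbl_b": [{"col_a": 1, "value": 10}, {"col_a": 1, "value": 20}]}


def join_in_application(rows_a, rows_b, key):
    """Hash join done by the application, because no single server sees both tables."""
    index = {}
    for row in rows_b:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in rows_a:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Two separate fetches (one per server), then a join in application memory.
rows_a = SERVER_1["tbl_a"]
rows_b = SERVER_2["tbl_b"]
result = join_in_application(rows_a, rows_b, "col_a")
```

Every query that touches both tables needs code of this kind, which is the effort the list above refers to.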
[Figure: 1. Request … 5. Response, from client to client; tbl_a.col_a = 1.]
DB sharding by Spider can
resolve performance problems.
ST Global., Inc All Rights Reserved
DB sharding by Spider
And…
– Tables on different servers can be joined.
– The application does not need to implement synchronized updates. (Spider does it.)
– Application engineers can develop applications without DB sharding skills.
– It is very easy to deploy, because it usually requires no changes to the application or its SQL.
http://www.team-lab.com
http://www.team-lab.net
Sagool.tv (search page)
[Figure: crawlers feed master-master replicated DBs; replication to slave DBs; full-text search.]
But…
[Figure: replication repeated again, again… Data sharding of tbl_a by Spider across four DBs (col_a%4=0, col_a%4=1, col_a%4=2, col_a%4=3); APs and batch jobs: 1. Get data, 2. Register.]
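The col_a%4 labels in the figure describe a modulo rule that assigns each row of tbl_a to one of four DBs. A minimal sketch of that rule follows; the shard names and function names are hypothetical, and with Spider this routing happens inside the storage engine rather than in application code.

```python
# Hypothetical shard names; the figure labels the shards only by col_a % 4.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(col_a: int) -> str:
    """Return the shard that holds the row with this col_a value."""
    return SHARDS[col_a % len(SHARDS)]


def group_by_shard(col_a_values):
    """Group keys by shard so that each server is contacted only once."""
    groups = {}
    for value in col_a_values:
        groups.setdefault(shard_for(value), []).append(value)
    return groups
```

For example, `shard_for(5)` picks the shard labeled col_a%4=1, because 5 % 4 is 1.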
KADOKAWord.jp is operated
by KADOKAWA MEDIA MANAGEMENT CO.,LTD.
It is a cross-search service for the content of their websites.
At KADOKAWord.jp,
Blackhole and Spider were used
because…
[Figure: tbl_a on the DBs; 2. Replication using Spider; 3. Log data collecting.]
So far,
there have been no problems with high log traffic.
1. Slave trigger
2. Double timestamp
3. Multistep partitioning
4. Range partitioning for MySQL Cluster
5. Parallel replication
6. Synchronous replication
7. “Insert delayed” for InnoDB
8. Effective use of the query cache
etc…
1:Slave trigger
Using Spider…
you can use triggers on the slave side.
[Figure: 1. Update request to tbl_a on the master DB.]
Using Spider…
you can use two timestamp columns.
(You can use more timestamp columns when a Spider table
links to another Spider table.)
2:Structure sample of double timestamp
Spider table                         Remote table
tbl_a                                tbl_a
---------------------                ---------------------
col_a timestamp                      col_a datetime
col_b datetime                       col_b timestamp

1.Insert request
insert into tbl_a (col_a, col_b) values (null, null);

2.Insert request from Spider
insert into tbl_a (col_a, col_b) values ('2009-04-23 14:00:00', null);
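The two insert statements above can be read as a two-stage fill: a null written to a timestamp column is replaced by the current time on whichever server declares that column as timestamp. The sketch below models this under that assumption; the function names, the schema dictionaries, and the one-second gap between the two servers' clocks are hypothetical.

```python
from datetime import datetime


def apply_insert(schema, values, now):
    """Model one insert: a NULL (None) in a timestamp column becomes `now`."""
    row = {}
    for column, col_type in schema.items():
        value = values.get(column)
        if value is None and col_type == "timestamp":
            value = now
        row[column] = value
    return row


# Column types taken from the structure sample above.
SPIDER_SCHEMA = {"col_a": "timestamp", "col_b": "datetime"}
REMOTE_SCHEMA = {"col_a": "datetime", "col_b": "timestamp"}


def insert_via_spider(values, spider_now, remote_now):
    """Fill timestamps on the Spider side, then forward the row to the remote side."""
    forwarded = apply_insert(SPIDER_SCHEMA, values, spider_now)
    return apply_insert(REMOTE_SCHEMA, forwarded, remote_now)


row = insert_via_spider(
    {"col_a": None, "col_b": None},
    spider_now=datetime(2009, 4, 23, 14, 0, 0),
    remote_now=datetime(2009, 4, 23, 14, 0, 1),  # hypothetical remote clock
)
```

In this model col_a records when the Spider server received the row and col_b records when the remote server stored it.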
Using Spider…
you can use four-step partitioning: two steps on the
Spider table and two steps on the remote table.
(You can use more partitioning steps when a Spider table links to another
Spider table.)
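One way to picture the four steps is as four successive routing decisions, assuming each table contributes one partition level and one subpartition level. The sketch below is only an illustration under that assumption; the column names (region, user_id, year, item_id) and the partitioning rules are hypothetical and do not come from the slides.

```python
def four_step_route(region: int, user_id: int, year: int, item_id: int):
    """Return (Spider partition, Spider subpartition, remote partition, remote subpartition)."""
    spider_partition = region % 2                  # step 1: partition on the Spider table
    spider_subpartition = user_id % 2              # step 2: subpartition on the Spider table
    remote_partition = 0 if year < 2009 else 1     # step 3: range partition on the remote table
    remote_subpartition = item_id % 2              # step 4: subpartition on the remote table
    return (spider_partition, spider_subpartition, remote_partition, remote_subpartition)
```

With two choices at each step, this toy rule spreads rows over 2 × 2 × 2 × 2 = 16 leaf partitions.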
Using Spider…
you can use other partitioning types, such as range partitioning, with
MySQL Cluster.
Using Spider…
you can use parallel replication.
Using Spider…
you can use synchronous replication.
Using Spider…
you can use "insert delayed" for InnoDB tables.
(The "insert delayed" runs as a separate transaction.)
The "query cache" treats two statements as the same only if every word
matches. When many complicated “select” statements are in use, fewer of
them are judged to be the same, so the effectiveness of the cache falls.
Using Spider…
Spider does not support the "query cache" itself, but the effectiveness of the
cache can be kept high, because MySQL divides and simplifies
a "select" statement for each table, and Spider sends the simplified statement to the remote server.
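The effect described above can be sketched with a toy cache keyed on the exact statement text. The cache class, the table tbl_c, and the statement texts are hypothetical; they are not the statements Spider actually sends, only an illustration of why simpler per-table statements repeat more often.

```python
class ExactTextCache:
    """Toy cache keyed on the exact statement text, counting hits and misses."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def run(self, statement: str):
        if statement in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[statement] = f"result of {statement}"
        return self.store[statement]


# Two different joins sent as whole statements: the texts differ, so no hits.
whole = ExactTextCache()
whole.run("select * from tbl_a, tbl_b where tbl_a.col_a = tbl_b.col_a and tbl_a.col_a = 1")
whole.run("select * from tbl_a, tbl_c where tbl_a.col_a = tbl_c.col_a and tbl_a.col_a = 1")

# The same joins split into per-table statements: the tbl_a part repeats and hits.
split = ExactTextCache()
for statement in [
    "select * from tbl_a where col_a = 1",
    "select * from tbl_b where col_a = 1",
    "select * from tbl_a where col_a = 1",
    "select * from tbl_c where col_a = 1",
]:
    split.run(statement)
```

Here the whole-statement cache never hits, while the per-table cache hits once on the repeated tbl_a statement.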
Conclusion
Spider Storage Engine ・・・・・
4. Supports table partitioning, which is available in MySQL 5.1, and Spider can
connect to a different server for each partition.
- Fall 2009
・ “Savepoint” will be available.
Spider will be able to roll back to a savepoint.
Currently, Spider can only commit or roll back an entire transaction.
・ Spider will be available on Drizzle.
Drizzle is a “slimmed down version of MySQL” designed
for scalability and performance. (Cloud computing)
- Winter 2009
・ Oracle tables will be linked with Spider tables.
Any Questions?
ST Global., Inc
Kentoku SHIBA
http://spiderformysql.com