Edit report at https://bugs.php.net/bug.php?id=53866&edit=1

 ID:                 53866
 Comment by:         reeze dot xia at gmail dot com
 Reported by:        marcin dot babij at nasza-klasa dot pl
 Summary:            Zend engine's hashtable performance tweaks
 Status:             Assigned
 Type:               Feature/Change Request
 Package:            Scripting Engine problem
 PHP Version:        trunk-SVN-2011-01-28 (SVN)
 Assigned To:        dmitry
 Block user comment: N
 Private report:     N

 New Comment:

This seems dead. No response for a long time :(


Previous Comments:
------------------------------------------------------------------------
[2011-01-28 13:27:48] marcin dot babij at nasza-klasa dot pl

Description:
------------
What was done:
- The hash function in zend_hash.h was rewritten and became much faster, 
without losing its most important properties.
- Hashtable implementation was changed from simple chaining to open addressing 
with linear probing, but with linked buckets that are not stored in the hash 
array itself, which causes:
-- The Bucket structure to lose 2 pointers.
-- Searching to work similarly, but without following pointers stored in 
different memory locations. Inserting, deleting and rehashing don't need to 
update a linked list, but must search for the first empty bucket, which is 
fast because it scans contiguous memory.
-- The load factor to decrease from 1.0 to 0.5-0.75, giving fewer collisions 
and a faster hashtable, which in turn increases the memory footprint a little.
- Open addressing alone doesn't change performance significantly, so the next 
step was to create a new array (arEmpty) of nTableSize bytes, which keeps 
track of used/empty buckets and makes inserting and rehashing much faster. In 
the future it can be tested as a bit-array of nTableSize/8 bytes.
- More macros were added to replace repetitive constructs.
- New constants were added to allow:
-- Creating new hashtables with a size of at least X (where 4 and 8 are 
reasonable), which avoids rehashing and reallocating memory while growing to 
size 2 and then to 4.
-- Extending small tables by a factor of 4 instead of 2, which lowers the 
rehashing cost for most hashtables at the cost of slightly higher memory 
consumption.
-- Using a different load factor depending on table size: closer to 1 for 
large tables, closer to 0.5 for small tables.
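
To make the layout described above concrete, here is a minimal sketch of open 
addressing with linear probing, where the slot array holds pointers to 
separately allocated buckets and a byte array marks empty slots. It is not the 
code from the patch: every sketch_* name, the stand-in "times 33" hash, and the 
fixed 0.75 load cap are assumptions made for illustration. Only arEmpty and 
nTableSize come from the description. Resizing and deletion are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is not the patch's code. */
typedef struct {
    unsigned long h;   /* cached hash of the key */
    const char *key;   /* key string, not owned by the table */
    int value;
} sketch_bucket;

typedef struct {
    size_t nTableSize;          /* must be a power of two */
    size_t nUsed;               /* number of occupied slots */
    sketch_bucket **arBuckets;  /* slots hold pointers; buckets live elsewhere */
    unsigned char *arEmpty;     /* one byte per slot: 1 = empty, 0 = used */
} sketch_table;

/* Stand-in "times 33" string hash (an assumption, not the new hash). */
static unsigned long sketch_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static int sketch_init(sketch_table *t, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    t->nTableSize = size;
    t->nUsed = 0;
    t->arBuckets = calloc(size, sizeof *t->arBuckets);
    t->arEmpty = malloc(size);
    if (!t->arBuckets || !t->arEmpty) return -1;
    memset(t->arEmpty, 1, size);  /* every slot starts empty */
    return 0;
}

/* Probe linearly from the home slot; an empty slot ends the search. */
static sketch_bucket *sketch_find(const sketch_table *t, const char *key) {
    unsigned long h = sketch_hash(key);
    size_t mask = t->nTableSize - 1;
    size_t i = h & mask;
    for (size_t n = 0; n < t->nTableSize; n++, i = (i + 1) & mask) {
        if (t->arEmpty[i]) return NULL;
        if (t->arBuckets[i]->h == h && strcmp(t->arBuckets[i]->key, key) == 0)
            return t->arBuckets[i];
    }
    return NULL;
}

/* Returns 0 on success, -1 if the load cap is hit or allocation fails. */
static int sketch_insert(sketch_table *t, const char *key, int value) {
    sketch_bucket *b = sketch_find(t, key);
    if (b) { b->value = value; return 0; }       /* update in place */
    if ((t->nUsed + 1) * 4 > t->nTableSize * 3)  /* assumed 0.75 load cap */
        return -1;                               /* a real table would grow */
    b = malloc(sizeof *b);
    if (!b) return -1;
    b->h = sketch_hash(key);
    b->key = key;
    b->value = value;
    size_t mask = t->nTableSize - 1;
    size_t i = b->h & mask;
    while (!t->arEmpty[i]) i = (i + 1) & mask;   /* scan the flag bytes only */
    t->arBuckets[i] = b;
    t->arEmpty[i] = 0;
    t->nUsed++;
    return 0;
}
```

The insert loop reads only arEmpty, one contiguous run of bytes, which is the 
reason given above for the faster inserting and rehashing. Deleting from a 
linearly probed table needs extra care (tombstones or shifting entries back), 
which this sketch does not attempt.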

What should be done:
- http://lxr.php.net/xref/PHP_TRUNK/ext/standard/html_tables/html_table_gen.php#722
 should be changed and html_tables.h regenerated, but this will require 
rewriting the hashtable engine from C in PHP.
- APC should be fixed.
- APC should be fixed

What can be done:
- Make new constants configurable by php.ini.
- Test if changing arEmpty from byte-array to bit-array helps on performance.
- Tweak default constants' values using some real-life benchmarks.
- Prove (or modify and prove) that the hash function has no collisions when 
two keys differ in no more than 6 bytes, which would let memcmp skip the first 
(or last) 6 bytes of the key. A simpler property may also be proven: that it 
has no collisions when both keys are no longer than 6 bytes, which would let 
most string keys skip memcmp altogether.
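
The bit-array variant of arEmpty mentioned above could look roughly like the 
sketch below: one bit per slot instead of one byte, so about nTableSize/8 bytes 
in total. The sketch_* names and the "bit set means empty" convention are 
assumptions for illustration, not part of the patch. Whether this is faster 
than the byte array is exactly what would need to be benchmarked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit-array replacement for a byte-per-slot arEmpty. */
typedef struct {
    size_t nTableSize;    /* number of slots tracked */
    unsigned char *bits;  /* bit i set = slot i is empty */
} sketch_bitmap;

static int sketch_bitmap_init(sketch_bitmap *m, size_t nTableSize) {
    size_t nbytes = (nTableSize + 7) / 8;  /* round up to whole bytes */
    m->nTableSize = nTableSize;
    m->bits = malloc(nbytes);
    if (!m->bits) return -1;
    memset(m->bits, 0xFF, nbytes);         /* all slots start empty */
    return 0;
}

static int sketch_is_empty(const sketch_bitmap *m, size_t i) {
    return (m->bits[i >> 3] >> (i & 7)) & 1;
}

static void sketch_mark_used(sketch_bitmap *m, size_t i) {
    m->bits[i >> 3] &= (unsigned char)~(1u << (i & 7));
}

static void sketch_mark_empty(sketch_bitmap *m, size_t i) {
    m->bits[i >> 3] |= (unsigned char)(1u << (i & 7));
}

/* First empty slot at or after start, wrapping around once.
 * Returns nTableSize when every slot is in use. */
static size_t sketch_first_empty(const sketch_bitmap *m, size_t start) {
    for (size_t n = 0; n < m->nTableSize; n++) {
        size_t i = (start + n) % m->nTableSize;
        if (sketch_is_empty(m, i)) return i;
    }
    return m->nTableSize;
}
```

The trade-off is a smaller flag array (better cache use) against the extra 
shift and mask on every access.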



------------------------------------------------------------------------


