Due to many hours of replag, which is only going to get worse over the next 7-8 hours (meaning at least 12 hours of replag), I've cancelled the current run of rebuildTermsSearchKey.php.
Whilst trying to work out where to start again from:
mysql:wikiadmin@db35 [wikidatawiki]> select min(term_row_id) from wb_terms where term_search_key = '';
+------------------+
| min(term_row_id) |
+------------------+
|           247135 |
+------------------+
1 row in set (1 min 9.97 sec)
mysql:wikiadmin@db35 [wikidatawiki]> select * from wb_terms where term_row_id > 247130 limit 10;
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
| term_row_id | term_entity_id | term_entity_type | term_language | term_type | term_text          | term_search_key    |
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
|      247131 |          41253 | item             | bn            | alias     | Movie theaters     | movie theaters     |
|      247132 |          41253 | item             | bn            | alias     | Movie house        | movie house        |
|      247133 |          41253 | item             | bn            | alias     | Exhibition         | exhibition         |
|      247134 |          41253 | item             | bn            | alias     | Film theatre       | film theatre       |
|      247135 |          41253 | item             | bn            | alias     | �                  |                    |
|      247136 |          41253 | item             | bn            | alias     | সিনেমা             | সিনেমা             |
|      247137 |          41253 | item             | bn            | alias     | Film exhibitor     | film exhibitor     |
|      247138 |          41253 | item             | bn            | alias     | Matinee            | matinee            |
|      247139 |          41253 | item             | bn            | alias     | Picture house      | picture house      |
|      247140 |          41253 | item             | bn            | alias     | Moviegoer          | moviegoer          |
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
10 rows in set (0.05 sec)
mysql:wikiadmin@db35 [wikidatawiki]> select min(term_row_id) from wb_terms where term_row_id > 247140 AND term_search_key = '';
+------------------+
| min(term_row_id) |
+------------------+
|           254476 |
+------------------+
1 row in set (15.35 sec)
mysql:wikiadmin@db35 [wikidatawiki]> select * from wb_terms where term_row_id = 254476;
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
| term_row_id | term_entity_id | term_entity_type | term_language | term_type | term_text | term_search_key |
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
|      254476 |          41607 | item             | bn            | alias     | �         |                 |
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
1 row in set (0.00 sec)
These term_text values show as a square box in my shell, and the resulting term_search_key is ''.
This makes manually finding a starting point difficult, as above. --only-missing would help, but it would still have to find each of these rows whose key is apparently still '', attempt to repopulate it, and then move on to the next one. This might take a while.
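One way to avoid stepping through the empty keys one min() at a time would be to list a batch of them in a single query. This is only a sketch: the lower bound 247140 is taken from the session above, the LIMIT of 50 is an arbitrary choice, and I have not run it, so it may be as slow as the min() queries.

```sql
-- Sketch: list the next batch of rows whose search key is still empty,
-- in id order. Isolated ids suggest odd rows like 247135; a long run of
-- consecutive ids suggests where the cancelled run actually stopped.
SELECT term_row_id, term_entity_id, term_language, term_type
FROM wb_terms
WHERE term_row_id > 247140      -- lower bound taken from the session above
  AND term_search_key = ''
ORDER BY term_row_id
LIMIT 50;                       -- arbitrary batch size
```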
So my first question is: why is the term_search_key coming out as ''? Is this correct? If necessary, we can try to get the results dumped somewhere so we can work out what the character is. Alternatively, with the IDs above, you might be able to find out through the end-user interface.
I can/will start the script again when the replag is fixed. In the meantime, it would be useful to find out whether the above is right, wrong, or something we don't care about.
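As a possible way of working out what the character is without relying on the terminal's rendering, the raw bytes could be dumped with MySQL's HEX() function. This is a sketch using the two row ids found above; the column aliases are my own, and how CHAR_LENGTH() counts depends on the column's character set.

```sql
-- Sketch: show the stored bytes of the two term_text values that render
-- as a box, so the character(s) can be identified from the hex output.
SELECT term_row_id,
       HEX(term_text)         AS term_text_hex,  -- bytes as stored
       LENGTH(term_text)      AS byte_length,    -- length in bytes
       CHAR_LENGTH(term_text) AS char_length     -- length in characters
FROM wb_terms
WHERE term_row_id IN (247135, 254476);
```

If the hex turns out to be something like a lone control or formatting character, that may explain why it ends up as an empty search key, but that is a guess until the bytes are known.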
Version: master
Severity: normal