
CirrusSearch triggers false positive results in CI
Closed, Resolved | Public | BUG REPORT

Description

The CirrusSearch message cirrussearch-boost-templates contains "arrows" written as <-- (and <---). The CI i18n check treats these as possible HTML that has to be reviewed manually, so the job fails with "HTML detected. Manual review required". For example, at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1012930 :

[mediawiki-i18n-check-docker] $ /bin/bash /tmp/jenkins9302411617296347522.sh
+ contains_disallowed_html log/additions.txt
+ cat log/additions.txt
+ sed -E -e 's/<\/? ?(abbr|b|bdi|big|br|charinsert|code|dd|del|div|dl|dt|em|h1|h2|h3|h4|hr|i|kbd|li|mark|math|NDL|nowiki|ol|p|page|pagelist|pages|pre|ref|samp|small|span|strong|sub|sup|syntaxhighlight|templatedata|templatestyles|tt|u|ul|user|var)( ((alttext|class|dir|display|id|lang|title|xml:lang|xmlns)=\\?["'\''][^=<>"'\'']*\\?["'\'']))* ?\/?>//g' -e 's/<!--//g' -e 's/<https?:\/\/[a-zA-Z0-9./-]*>//g'
+ grep '<'
+	"cirrussearch-boost-templates": " # antepin ni baris percis kaya' gini --> \n# Kalo atu halaman ada atu derini sablonan, mangka dia punya ponten penyarian dikali ama prosèntase nyang diatur.\n# Perobahan ke mari langsung tokcèr.\n# Nahwunya begini:\n#   * Semuanya deri lèter \"#\" ke ujung baris ièlah atu sautan.\n#   * Saban baris trakosong ièlah nama sablonan bakal pengaru-aruan, dengen ruang nama, gedé-kecilnya hurup èn macem-macemnya, diintilin ama lèter \"|\" èn diintilin ama angka nyang abisnya ada lèter \"%\".\n# Tulad baris nyang bagus:\n# Sablonan:Bagus|150%\n# Sablonan:Bagus Pisan|300%\n# Sablonan:Jelèk|50%\n# Tulad baris nyang boncos:\n# Sablonan:Foo|150.234234% <-- kaga' bolé ada penenger désimal!\n# Foo|150% <--- tèhnisnya jalan, cuman bakal anterserènta halaman Foo deri ruang nama utama\n# Lu bisa jajal perobahan pengamplasan tibang nglakonin pepintaan nyang diawalin ama boost-templates:\"XX\" nyang mana XX ièlah semua sablonan nyang lu pèngèn aru-aruin nyang kepisah ama apstan gantinya pecah baris.\n# Pepintaan nyang nentuin boost-templates:\"XX\" nyuèkin isi di ni kotak.\n #  antepin ni baris percis kaya' gini -->",
+ echo 'HTML detected. Manual review required'
HTML detected. Manual review required
+ exit 1
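To see why the arrows are flagged, the check can be reproduced locally. The sketch below copies the sed/grep pipeline from the log into a `contains_disallowed_html` function. Reading from standard input instead of `log/additions.txt`, and the two sample lines, are assumptions made for illustration. The sed step deletes allowed tags such as `<pre>` and every literal `<!--`, so a real HTML comment passes; `<--` is not `<!--`, so its `<` survives and `grep '<'` reports the line.

```shell
#!/usr/bin/env bash
# Minimal reproduction of the i18n HTML check shown in the CI log.
# Assumption: input comes from stdin here (the CI job reads log/additions.txt);
# the sed/grep pipeline itself is copied from the log.

contains_disallowed_html() {
  # 1) strip allowed tags, 2) strip "<!--", 3) strip bare <http(s)://...> links,
  # then print any line that still contains "<" (grep exits 0 if one is found).
  sed -E \
    -e 's/<\/? ?(abbr|b|bdi|big|br|charinsert|code|dd|del|div|dl|dt|em|h1|h2|h3|h4|hr|i|kbd|li|mark|math|NDL|nowiki|ol|p|page|pagelist|pages|pre|ref|samp|small|span|strong|sub|sup|syntaxhighlight|templatedata|templatestyles|tt|u|ul|user|var)( ((alttext|class|dir|display|id|lang|title|xml:lang|xmlns)=\\?["'\''][^=<>"'\'']*\\?["'\'']))* ?\/?>//g' \
    -e 's/<!--//g' \
    -e 's/<https?:\/\/[a-zA-Z0-9./-]*>//g' |
    grep '<'
}

# "<!--" and "<pre>" are stripped, leaving only "-->", so this line passes.
comment_line='# <!-- leave this line exactly as it is --> <pre>'
# "<--" is not "<!--", so its "<" survives the sed step and the line is flagged.
arrow_line='# Template:Foo|150.234234% <-- no decimal separator allowed!'

for line in "$comment_line" "$arrow_line"; do
  if printf '%s\n' "$line" | contains_disallowed_html >/dev/null; then
    echo "flagged: $line"
  else
    echo "ok:      $line"
  fi
done
```

Running it prints "ok:" for the comment line and "flagged:" for the arrow line. This also explains why the logged message line starts with a bare "-->": its "<!--" was removed by the sed step before grep ran.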

The simplest fix is to replace <-- (and <---) with #. Since everything from "#" to the end of a line is a comment in this message, the result is just as valid, conveys the same information, and does not trigger the false positive (tested here).
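The proposed fix can be checked against the same pipeline. This is a sketch, assuming the check from the CI log (redefined here so the snippet runs on its own). The sample lines are modelled on the message rather than copied from it, and the `sed -E 's/ <-+ / # /g'` rewrite is a hypothetical mechanical stand-in for editing the message by hand. Once the arrows become "#" comments, no "<" is left, so grep finds nothing and the check passes.

```shell
#!/usr/bin/env bash
# Sketch: confirm that "#" comments pass the CI HTML check where "<--" fails.
# The pipeline is copied from the CI log; the sample lines and the sed
# rewrite below are illustrative assumptions, not the actual patch.

contains_disallowed_html() {
  sed -E \
    -e 's/<\/? ?(abbr|b|bdi|big|br|charinsert|code|dd|del|div|dl|dt|em|h1|h2|h3|h4|hr|i|kbd|li|mark|math|NDL|nowiki|ol|p|page|pagelist|pages|pre|ref|samp|small|span|strong|sub|sup|syntaxhighlight|templatedata|templatestyles|tt|u|ul|user|var)( ((alttext|class|dir|display|id|lang|title|xml:lang|xmlns)=\\?["'\''][^=<>"'\'']*\\?["'\'']))* ?\/?>//g' \
    -e 's/<!--//g' \
    -e 's/<https?:\/\/[a-zA-Z0-9./-]*>//g' |
    grep '<'
}

# Two lines modelled on the message, still using arrows.
before='# Template:Foo|150.234234% <-- no decimal separator allowed!
# Foo|150% <--- technically works, but boosts the page Foo in the main namespace'

# Hypothetical rewrite: turn each " <--" / " <---" arrow into a " # " comment.
# "<!--" is untouched because "!" does not match "-+".
after=$(printf '%s\n' "$before" | sed -E 's/ <-+ / # /g')
printf '%s\n' "$after"

if printf '%s\n' "$before" | contains_disallowed_html >/dev/null; then
  echo "before: flagged"
fi
if printf '%s\n' "$after" | contains_disallowed_html >/dev/null; then
  echo "after:  flagged"
else
  echo "after:  passes"
fi
```

The rewritten lines keep the same explanatory text, only introduced by "#" instead of an arrow.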
