User Details
- User Since
- Oct 9 2014, 4:50 PM (524 w, 6 d)
- Availability
- Available
- IRC Nick
- ottomata
- LDAP User
- Ottomata
- MediaWiki User
- Ottomata [ Global Accounts ]
Tue, Oct 29
Merged!
Approved
<3 thank you!
Skein support in Kubernetes might not be required
Indeed!
we need to have the spark3-submit binary be a symlink to spark-submit, as we use it extensively in airflow-dags
Mon, Oct 28
Incident report has been moved to Wikitech.
Just sent this email to the flink and paimon user email groups.
How do we help the revision-level MERGE INTO? The suggested partitioning schema doesn't, because we have about 150k events at the hour level and about 3.6M at the day level.
@xcollazo maybe stupid idea:
Sat, Oct 26
I think MW will write various logs to files in ./cache, which is a mounted volume so you should be able to tail them from your host machine. Is there anything tricky in there?
Fri, Oct 25
so how would one go about "enriching" wmf_dumps.wikitext_raw_rc2 with a diff column? the job could filter the full history for only the pages changed in that hour (broadcast join) and then do the self join, but that would still require a full pass over the data which seems expensive. This certainly is solvable, e.g. one could decrease the update interval, but it is tempting to instead implement the diff as a streaming "enrichment" pipeline.
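To make the batch option above concrete, here is a minimal sketch in plain Python rather than Spark. The row fields (`page_id`, `revision_id`, `parent_revision_id`, `wikitext`) and the function name `enrich_with_diff` are assumptions for illustration, not the actual schema of wmf_dumps.wikitext_raw_rc2. It filters the history down to the pages changed in the hour (the role the broadcast join would play), then self-joins each revision to its parent to compute a diff.

```python
import difflib

# Hypothetical, simplified stand-in for rows of the full-history table.
history = [
    {"page_id": 1, "revision_id": 10, "parent_revision_id": None, "wikitext": "Hello\n"},
    {"page_id": 1, "revision_id": 11, "parent_revision_id": 10, "wikitext": "Hello\nWorld\n"},
    {"page_id": 2, "revision_id": 20, "parent_revision_id": None, "wikitext": "Untouched\n"},
]


def enrich_with_diff(history, changed_page_ids):
    """Return rows for the changed pages, each with an added 'diff' field."""
    # Step 1: keep only pages changed in this hour. Note this still
    # scans every row of the history, which is the expensive part.
    changed = set(changed_page_ids)
    subset = [row for row in history if row["page_id"] in changed]

    # Step 2: self join, looking up each revision's parent by revision id.
    by_revision = {row["revision_id"]: row for row in subset}
    enriched = []
    for row in subset:
        parent = by_revision.get(row["parent_revision_id"])
        parent_text = parent["wikitext"] if parent else ""
        diff = "".join(
            difflib.unified_diff(
                parent_text.splitlines(keepends=True),
                row["wikitext"].splitlines(keepends=True),
                fromfile="parent",
                tofile="revision",
            )
        )
        enriched.append({**row, "diff": diff})
    return enriched


enriched = enrich_with_diff(history, changed_page_ids=[1])
```

Even in this toy form, the filter in step 1 touches every row of the history, which is the cost that makes a streaming enrichment tempting.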
A quick codesearch (https://codesearch.wmcloud.org/search/?q=kafka-php&files=&excludeFiles=&repos=) and local grep yields no results, so this knot might have neatly tied itself.
Other use cases:
ACK, I see comment from David in CR.
EventStreams uses OpenAPI specs and doc UI to show a human readable form. E.g. https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_page_change_v1
This is an amazing idea, thank you! We have always wanted to make schema.wikimedia.org more readable, but never had time to prioritize it. The existing UI was something I did best effort in like a day or two, so please! A replacement would be amazing.
@BTullis do you think it would be possible to add authentication and a public domain to this service? I think Metrics Platform folks would really like this. cc @mpopov @phuedx (This would allow you to use a stream.wikimedia.org UI but for all internal event streams).