
Epic: Implement SEO improvements suggested by Go Fish Digital
Closed, Resolved (Public)

Description

I've been working with Go Fish Digital to figure out how the search engine optimisation of the Wikimedia wikis could be improved. The outcome of the project was a large set of recommendations which, if implemented, would likely improve the search result rankings of our sites.

This task is an epic containing all the recommendations provided by Go Fish Digital, which should be implemented to improve our rankings.

Related Objects

Status                   Assigned
Resolved                 Krinkle
Invalid                  None
Invalid                  None
Resolved                 mpopov
Resolved                 Imarlier
Resolved                 Imarlier
Resolved                 mpopov
Declined                 None
Invalid                  None
Resolved                 ovasileva
Resolved                 ovasileva
Resolved                 pmiazga
Open                     None
Resolved                 ovasileva
Resolved                 mpopov
Resolved                 ovasileva
Resolved                 ovasileva
Resolved                 None
Resolved                 Jdlrobson
Resolved                 None
Resolved                 None
Resolved                 None
Resolved                 Jdforrester-WMF
Resolved                 Tbayer
Resolved                 Tbayer
Resolved                 None
Resolved                 ovasileva
Invalid                  Tbayer
Resolved (Jan 13 2019)   mpopov
Resolved                 mpopov

Event Timeline

Have we actually engaged as a vendor a company with this on their website? https://gofishdigital.com/online-reputation-management/

Yes. If you have a specific complaint, it might be helpful if you stated it clearly.

It looks to me like WMF is working with (and from the looks of T192893 and T193052, have provided with access to private information) a company that might be in the business of whitewashing Wikipedia articles.

If you have evidence that they're violating the Terms of Use, then I suggest contacting Legal. Potentially libellous accusations like this do not belong on Phabricator.

More technically: I don't believe that Google is actually touching our front-end pages at all. They don't spider us any more, AIUI; we give them a direct feed of the (Parsoid-format HTML) content of our pages (from RESTBase) and notify them directly whenever page content changes.

So a bunch of these tweaks to PHP-generated UX HTML would have exactly zero effect on Google results, since Google never sees the front-end HTML. Some might be useful for WMF results in other search engines (Bing?), but I bet our search traffic from these non-Google sites is not very high. Some tweaks might also be useful for 3rd-party wikis where Google is not using their WMF-focused pipeline; but 3rd-party wikis don't typically use Wikidata or language links, for instance.

Similarly, Google does seem to rewrite search results to the mobile site when you search on mobile, but this seems to be a Google-internal optimization. It doesn't (AFAIK) have anything to do with the HTML we give them. We should probably have a conversation with our contacts at Google about how exactly their search/spider pipeline works before expending effort on any of these changes. Some may be useful. Others may be more efficiently implemented with changes on Google's side.

EDIT: softened wording, added discussion of impact on 3rd party wikis.

More technically: has anyone informed Go Fish Digital that Google isn't actually touching our front-end pages at all? They don't spider us any more, AIUI; we give them a direct feed of the (Parsoid-format HTML) content of our pages and notify them directly whenever page content changes.

So a bunch of these tweaks to PHP-generated UX HTML would have exactly zero effect on Google results, since Google never sees the front-end HTML. It might be useful for other search engines (Bing?), but I bet our search traffic from non-Google sites is not very high.

Are you sure? There was some analysis of our page view logs, and there were lots of hits from the crawlers of different search engines, including Google. I don't know the details myself, but they were definitely accessing our sites.

I believe they still hit our front end for zh.wikipedia.org, because I haven't finished implementing LanguageConverter for the Parsoid output yet (T43716, T190689). Finishing LanguageConverter parity is a priority at the moment so that Google can stop using their legacy crawler for zhwiki. There might be other corner cases where they still use their spider. You can actually test this directly by searching Google for content which appears only in Parsoid-format HTML (or only in the UX, or only in the mobile front end). This was easier to do when Parsoid's output had more bugs/differences compared to the PHP parser's, since there were more searchable corner cases to find. But I used to be able to easily verify in this way that the non-Parsoid content was not indexed.
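A rough sketch of how one might automate the "content that appears only in one rendering" check described above, using the public REST (Parsoid) and action (PHP parser) APIs; the article title and the crude token-level comparison are just illustrative choices, not part of the original suggestion:

```python
# Sketch: fetch the Parsoid HTML and the PHP-parser HTML for the same page,
# then count strings present in only one rendering. Those strings are
# candidates to search for on Google to see which rendering got indexed.
import requests

WIKI = "https://en.wikipedia.org"
TITLE = "Ankara"  # any article title without spaces, for simplicity

def parsoid_html(title: str) -> str:
    # RESTBase/Parsoid HTML, i.e. the feed Google reportedly ingests.
    r = requests.get(f"{WIKI}/api/rest_v1/page/html/{title}")
    r.raise_for_status()
    return r.text

def legacy_html(title: str) -> str:
    # PHP-parser ("front end") HTML via the action API.
    r = requests.get(
        f"{WIKI}/w/api.php",
        params={"action": "parse", "page": title, "prop": "text", "format": "json"},
    )
    r.raise_for_status()
    return r.json()["parse"]["text"]["*"]

if __name__ == "__main__":
    p, l = parsoid_html(TITLE), legacy_html(TITLE)
    only_parsoid = set(p.split()) - set(l.split())
    only_legacy = set(l.split()) - set(p.split())
    print("tokens only in Parsoid HTML:", len(only_parsoid))
    print("tokens only in legacy HTML: ", len(only_legacy))
```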

We should be able to look at page view logs and the RESTBase logs to identify the Google crawler by User-Agent. Verifying details with Google is a good idea regardless, as they could run multiple search pipelines or do other tricks. They also hit our API directly. I believe Aaron was on the most recent call with our Google contacts, as they needed us to raise the ORES limits for their use.
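A minimal sketch of that User-Agent check, assuming request logs in a plain combined-log-like text format piped on stdin (the actual Wikimedia logging pipeline is different, so treat the field layout as an assumption):

```python
# Sketch: count Googlebot requests from web request logs read on stdin,
# bucketed by whether they hit RESTBase, the action API, or regular pages.
# Usage: zcat access.log.gz | python googlebot_buckets.py
import re
import sys
from collections import Counter

GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
counts = Counter()

for line in sys.stdin:
    if not GOOGLEBOT.search(line):
        continue
    if "/api/rest_v1/" in line:
        counts["rest_v1"] += 1
    elif "/w/api.php" in line:
        counts["action_api"] += 1
    else:
        counts["page_view"] += 1

for bucket, n in counts.most_common():
    print(f"{bucket}\t{n}")
```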

Z. Z. from Google is at Wikimania. He confirmed they still spider the site at a low rate, but only to check errors (i.e. to sanity-check their internal representation against what the site actually displays, to keep us honest and to validate our parsing and their internal pipeline). They use a variety of sources to build their representation, including ORES, Wikidata, RESTBase, the recentchanges feed, and direct queries to the action API.
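For reference, the recentchanges feed mentioned there is publicly consumable via the EventStreams service; a small illustrative consumer sketch (not Google's actual pipeline) might look like this:

```python
# Sketch: follow the public recentchange EventStream (server-sent events) and
# react to enwiki edits; a re-indexing consumer would refetch the page's
# Parsoid HTML from RESTBase at this point.
import json
import requests

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

with requests.get(STREAM, stream=True, headers={"Accept": "text/event-stream"}) as resp:
    for raw in resp.iter_lines():
        if not raw or not raw.startswith(b"data: "):
            continue  # skip SSE event names, comments, and keep-alives
        event = json.loads(raw[len(b"data: "):])
        if event.get("wiki") == "enwiki" and event.get("type") == "edit":
            print(event.get("title"), event.get("revision", {}).get("new"))
```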

I'd really be interested to know what's potentially libelous about labeling activity such as this as whitewashing (from https://gofishdigital.com/online-reputation-management/):

The primary platforms that define your online reputation include:
[...]
Wikipedia
[...]

With Online Reputation Management, we work hard to make all of the positive information easy to find. At the same time, we use many different strategies and tactics to diminish the visibility of negative content, or in some cases, remove it from the web altogether. The end result is a positive online reputation because when people search your name or brand, they immediately find positive content.

Why is Wikimedia Foundation Inc. engaging with a company that engages in Wikipedia whitewashing? If you'd prefer, I can also ask on a mailing list, though Phabricator Maniphest seems like a reasonable enough venue.

This task is about "implementing SEO improvements suggested by Go Fish Digital" (emphasis by me) so mailing list sounds more appropriate for your question when it comes to task scope.

SEO came up in the Audiences 1 QCI presentation, and it was mentioned that one open question was whether Google uses the same ingestion pipeline for all languages/wikis, or whether certain things would work differently on (say) English Wikipedia vs. Spanish Wikisource.

I couldn't find a better place to discuss this (is there a phab task for the research to answer this question?), so I'll put it here.

As far as I know, Google currently uses the same "special Wikipedia" ingestion pipeline for all wikis *except those using LanguageConverter*, which (pending resolution of T43716: [EPIC] Support language variant conversion in Parsoid) use a different, more generic pipeline. I assume this applies to the Wikipedias; I'm not certain they use it for Wikisource, Wikivoyage, etc.

But this suggests in particular that we can use zhwiki/srwiki/etc as a good control case if we do this research, since we "know" that these wikis are using the "old" pipeline, and we "know" that enwiki is using the "new" pipeline. So we can come up with some experimental questions, then see which wikis cluster with zhwiki and which cluster with enwiki.

We also have some contacts at Google, so we could probably just ask them directly. But it's worth coming up with our own metrics and monitoring here, both to sanity-check the info we get from our direct contacts and so that we have some sort of dashboard notification in case the pipeline changes in the future, whether intentionally or not.
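As one example of the kind of metric such a dashboard could track (purely a sketch; the API key, custom-search-engine id, and marker phrases below are placeholders): periodically ask Google's Custom Search JSON API whether a distinctive phrase from a recent edit is already findable on a "new pipeline" wiki (enwiki) versus an "old pipeline" control wiki (zhwiki), and alert if that gap shifts.

```python
# Sketch: check whether a marker phrase from a recent edit is already indexed,
# using Google's Custom Search JSON API. API_KEY, CX, and the phrases are
# placeholders; the API is quota-limited and needs a configured search engine.
import requests

API_KEY = "YOUR_API_KEY"
CX = "YOUR_SEARCH_ENGINE_ID"  # a custom search engine scoped to *.wikipedia.org

def is_indexed(phrase: str, site: str) -> bool:
    r = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": f'"{phrase}" site:{site}'},
    )
    r.raise_for_status()
    total = r.json().get("searchInformation", {}).get("totalResults", "0")
    return int(total) > 0

if __name__ == "__main__":
    checks = [
        ("marker phrase from a recent enwiki edit", "en.wikipedia.org"),
        ("marker phrase from a recent zhwiki edit", "zh.wikipedia.org"),
    ]
    for phrase, site in checks:
        print(site, "indexed" if is_indexed(phrase, site) else "not yet indexed")
```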

Niedzielski subscribed.

Substituting the targeted task T209306 for this epic of epics in the Readers Web quarterly goals. All Readers Web SEO work that has Phabricator tasking should now appear under T209306.

"Optimizing" a website for search engines appears to be the wrong approach to website building to me. If a search engine's algorithm fails to correctly balance the relevance of a website to its users, then the problem is in the algorithm, not the website.

I object to any kind of "SEO" measures. If there are accessibility-related issues to fix, then please describe and fix them as accessibility issues, not "SEO" issues.

ZZ from Google is at Wikimania 2019 and will be on the panel at https://wikimania.wikimedia.org/wiki/2019%3AQuality/Idea_jam_on_quality on Sunday.

If we have any remaining SEO questions we should try to meet up and get them addressed.

Krinkle claimed this task.
Krinkle changed the status of subtask T198949: Add navbox links to mobile page HTML from Duplicate to Invalid.