Provide web crawler data logs to Go Fish Digital
I'm working with Go Fish Digital to help me understand more about what we can do to improve our SEO. As part of their work, they need some of our data.

They need crawl logs from the following bots:

  • Googlebot
  • Googlebot mobile
  • Googlebot smartphone
  • Bingbot
  • Baiduspider
  • Yandexbot


  • They want about four days' worth of raw data on the activity of these bots on our sites.
  • The data shouldn't be older than, say, one month, but it doesn't necessarily need to be the four days immediately preceding today.
  • Data format should be gzipped TSV.
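For the gzipped TSV requirement, here is a minimal sketch of how the export could be written. The function name `write_gzipped_tsv` and the column names in the usage note are hypothetical; the actual log schema is not specified in this task.

```python
import csv
import gzip


def write_gzipped_tsv(path, header, rows):
    """Write a header and rows to a gzip-compressed, tab-separated file."""
    # Text mode ("wt") lets csv.writer emit strings straight into the gzip stream.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

A call might look like `write_gzipped_tsv("crawl.tsv.gz", ["timestamp", "ip", "user_agent"], rows)`, where `rows` is any iterable of equal-length sequences.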


  • I'll get a public key using 2048-bit RSA to transmit the data to them securely.
  • It's possible that they'll come back to us and ask for more or less data, so it'd be good if you could be prepared for that.
  • I believe these bots use specific user agents which should make your job a bit easier.
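Since the bots identify themselves through their user agents, a first pass could simply match on the crawler name in the UA string. The sketch below is an assumption about how that matching might be done, not the method actually used; `CRAWLER_PATTERNS` and `claimed_crawler` are hypothetical names, and the patterns should be checked against each search engine's published UA strings. A match only shows which crawler a request claims to be, because a UA is trivially spoofed.

```python
import re

# Case-insensitive substrings that each crawler is assumed to include in its UA.
CRAWLER_PATTERNS = {
    "Googlebot": re.compile(r"googlebot", re.IGNORECASE),
    "Bingbot": re.compile(r"bingbot", re.IGNORECASE),
    "Baiduspider": re.compile(r"baiduspider", re.IGNORECASE),
    "YandexBot": re.compile(r"yandexbot", re.IGNORECASE),
}


def claimed_crawler(user_agent):
    """Return the crawler a user agent claims to be, or None if it claims none."""
    for name, pattern in CRAWLER_PATTERNS.items():
        if pattern.search(user_agent or ""):
            return name
    return None
```

Variants such as the mobile or smartphone Googlebot would also match the `Googlebot` pattern here, so they can be separated later if Go Fish Digital needs them split out.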

They have signed a master service agreement which fully covers our privacy policy, data retention, and data security requirements, and the agreement received signoff from Jim Buatti (in Legal) and Toby (the Chief Product Officer), amongst others.

Very time sensitive.

@Deskana: Googlebot Mobile does not seem to be a thing anymore according to its lack of presence on

There was an update to …in 2015, so I doubt it's still operational. If Go Fish Digital knows that Googlebot Mobile is still operational, can they please provide the UA?

Also, how rigorous does this need to be? Specifically, should I verify every crawler's IP address with reverse DNS lookup to make sure the hostnames belong to Yandex, Bing, Google, Baidu? (I suspect the answer is yes but just wanted to make sure because that part isn't trivial.)

@chelsyx Can you please help me? I’ve been able to find documentation (user agent strings and instructions for verifying) for Google’s, Bing’s, and Yandex’s crawlers, but the best I’ve been able to find for Baidu (in English) is a blog post by a third party from 2011. I suspect any official documentation about what Baiduspider’s UA looks like these days (or how to verify it) would be on Baidu's website and in Chinese.

Here's an example of the documentation I'm looking for:

Baiduspider FAQ (in English, includes the UA):
How to identify Baiduspider (in Chinese; let me know if you can't understand it with Google Translate):

Thank you so much!!!

@Deskana: Googlebot Mobile does not seem to be a thing anymore according to its lack of presence on

There was an update to …in 2015, so I doubt it's still operational. If Go Fish Digital knows that Googlebot Mobile is still operational, can they please provide the UA?

I'll check with them.

Also, how rigorous does this need to be? Specifically, should I verify every crawler's IP address with reverse DNS lookup to make sure the hostnames belong to Yandex, Bing, Google, Baidu? (I suspect the answer is yes but just wanted to make sure because that part isn't trivial.)

Yeah, the extra check makes sense.

@Deskana: Progress update: I have 4 days of data (~12 GB gzipped), and right now I have a script verifying ~20K IP addresses to determine which ones are legit and which ones spoofed the UA to pose as one of those crawlers. As you might expect, that part is taking some time.

A side benefit of doing the verification is that I will have an extra deliverable for the traffic team of all the IP addresses from those days that misrepresented themselves.

Once that's done, I'll just be waiting for upload instructions and a public encryption key.
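For reference, the IP verification described above is usually done as a forward-confirmed reverse DNS check: look up the hostname for the IP, confirm it falls under a domain the crawler's operator uses, then resolve that hostname and confirm it maps back to the same IP. The following is a rough sketch under stated assumptions, not the script actually used. The domain suffixes in `CRAWLER_DOMAINS` are assumptions that should be checked against each search engine's own verification instructions, all function names are hypothetical, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Hostname suffixes assumed for each operator; verify against official docs.
CRAWLER_DOMAINS = {
    "google": (".googlebot.com", ".google.com"),
    "bing": (".search.msn.com",),
    "yandex": (".yandex.ru", ".yandex.net", ".yandex.com"),
    "baidu": (".baidu.com", ".baidu.jp"),
}


def reverse_lookup(ip):
    """Return the reverse-DNS hostname for an IP, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses a hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_crawler(ip, crawler, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for one IP and one claimed crawler."""
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 1: the reverse-DNS name must sit under one of the operator's domains.
    if not hostname.endswith(CRAWLER_DOMAINS[crawler]):
        return False
    # Step 2: that hostname must resolve back to the original IP.
    return ip in forward(hostname)
```

The resolver functions are passed in as parameters so the check can be exercised without network access. IPs that claim a crawler UA but fail this check would form the list of addresses that misrepresented themselves.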

@mpopov Thanks! I asked them for the public key and upload instructions last Friday, and I should hear back from them soon.

Just uploaded the data to Go Fish Digital.

