Scraper improvements

MrMamen (11747) on 3/17/2017 7:39 AM

The scrapers are really useful tools for MobyGames, but they have their limits. As far as I can tell, there are only two ways to use them: 1) adding a new game entry, and 2) adding a new group of promo images.

The latter is rather straightforward, and the only limitation I have found is that there is no way to use it to append new images to an existing group. I have seen a few cases where the store images were updated with a new release, and adding these to a group manually is more time-consuming.

The new game entry scraper is great, as it imports releases (for the App Store, both the original and the latest release information), cover images, promo images, and ad blurbs. Apart from promo images, it is the only way to import any of these automatically. Since the code already exists, it would be great if it could also scrape app icons, ad blurbs, and releases for existing entries.

My suggestions are:
1) Support more than one scrape source when entering a new game.
2a) Add a "contribute by scraping" option that would try to import everything the new game scraper does. It would probably need some user input, such as "select existing image group" or "create new", to avoid duplicate entries.
2b) If 2a is too complicated, add a scraper similar to the promo image one for cover images, releases, and ad blurbs.
3) Perhaps tech specs could be scraped from many of these sites as well.

Tracy Poff (2096) on 3/19/2017 12:18 AM

The latter is rather straightforward, and the only limitation I have found is that there is no way to append new images to a group with it. I have seen a few examples where the store images have been updated with a new release, but it is more time-consuming to manually add these to a group.

Yes, this is intentional. For the moment, at least, the scraper has no way to know which images are new, so we would end up with many duplicate images. It's not perfect, but I think it handles the average case well enough.
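To illustrate why "which images are new" is harder than it sounds, here is a minimal sketch of the most naive check an append mode could use, assuming images are compared by their exact bytes. The function names `digest` and `new_images` are hypothetical and not part of the MobyGames scrapers. The limitation shows up immediately: a re-encoded or resized copy of the same screenshot hashes differently, so it would be treated as new.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def new_images(existing: list[bytes], scraped: list[bytes]) -> list[bytes]:
    """Return scraped images whose bytes do not exactly match one already in the group.

    Only byte-identical duplicates are caught; a re-encoded or resized copy
    of the same image produces a different hash and is returned as "new".
    """
    seen = {digest(img) for img in existing}
    result = []
    for img in scraped:
        h = digest(img)
        if h not in seen:
            seen.add(h)  # also skip repeats within the scraped batch itself
            result.append(img)
    return result


group = [b"screenshot-1", b"screenshot-2"]
store = [b"screenshot-2", b"screenshot-3", b"screenshot-3"]
print(new_images(group, store))  # [b'screenshot-3']
```

Catching near-duplicates would need something fuzzier, such as perceptual hashing, which is exactly where the duplicate risk described above comes from.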

1) Support more than one scrape source when entering a new game. 2a) Add an option for "contribute by scraping" which would try to import everything which the new game scraper does. This should probably need some user input such as "select existing image group" or "create new", to avoid double entries.

I would probably solve #1 with some variation of #2a. You would pick one source to set the title, official website, etc., and then pull additional details from other sources, such as new platforms or ad blurbs, using the same interface as for existing games. The software will need some logic to decide what to import and how to resolve conflicts, but that should not be an insurmountable problem.
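A rough sketch of the "one primary source plus extras" idea, under loudly stated assumptions: `ScrapedGame` and `merge` are hypothetical names, and the record holds only a few of the fields mentioned above. Identity fields (title, website) come from the primary source only, and disagreements are collected for a contributor to review rather than resolved silently. Additive fields (platforms, ad blurbs) are unioned across sources.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScrapedGame:
    """Hypothetical record produced by one scraper run."""
    source: str
    title: str
    website: Optional[str] = None
    platforms: set[str] = field(default_factory=set)
    ad_blurbs: list[str] = field(default_factory=list)


def merge(primary: ScrapedGame, others: list[ScrapedGame]) -> tuple[ScrapedGame, list[str]]:
    """Take title and website from the primary source and add platforms and
    ad blurbs from the others. Returns the merged record plus a list of
    conflicts for a contributor to review."""
    # Copy the mutable fields so the primary record is left untouched.
    merged = ScrapedGame(primary.source, primary.title, primary.website,
                         set(primary.platforms), list(primary.ad_blurbs))
    conflicts: list[str] = []
    for other in others:
        # Identity fields: keep the primary's value, flag any disagreement.
        if other.title != primary.title:
            conflicts.append(f"title: {primary.title!r} ({primary.source}) "
                             f"vs {other.title!r} ({other.source})")
        if other.website and primary.website and other.website != primary.website:
            conflicts.append(f"website: {primary.website} vs {other.website} ({other.source})")
        # Additive fields: union, skipping exact duplicates.
        merged.platforms |= other.platforms
        for blurb in other.ad_blurbs:
            if blurb not in merged.ad_blurbs:
                merged.ad_blurbs.append(blurb)
    return merged, conflicts


steam = ScrapedGame("Steam", "Example Game", "https://example.com", {"Windows"}, ["Blurb A"])
gog = ScrapedGame("GOG", "Example Game", None, {"Windows", "Linux"}, ["Blurb A", "Blurb B"])
merged, conflicts = merge(steam, [gog])
print(sorted(merged.platforms), merged.ad_blurbs, conflicts)
# ['Linux', 'Windows'] ['Blurb A', 'Blurb B'] []
```

Flagging conflicts instead of picking a winner keeps a human in the loop, which matches the "select existing or create new" style of user input suggested earlier in the thread.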

3) Perhaps tech specs could be scraped from a lot of these sites as well.

That's on my to-do list. Most tech specs cannot be scraped, but some can reasonably be inferred; e.g. single-player Steam games are labeled as such on Steam, which should correspond to our 'Number of Players: Offline' tech spec. It will take a bit of finesse, but something can probably be done.
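One way such inference might look, as a sketch only: a small table mapping store feature labels to (tech spec, value) suggestions. Only the 'Number of Players: Offline' spec comes from the post above; the "Single-player" label string, the "1 Player" value, and the names `FEATURE_TO_SPEC` and `infer_specs` are assumptions for illustration. Unmapped features are ignored, and the output is meant as a suggestion for a contributor to confirm, not a final value.

```python
# Hypothetical mapping from store feature labels to (tech spec, value) pairs.
# The label string and the spec value are illustrative assumptions.
FEATURE_TO_SPEC = {
    "Single-player": ("Number of Players: Offline", "1 Player"),
}


def infer_specs(store_features: list[str]) -> dict[str, list[str]]:
    """Return suggested tech specs for features we know how to map.

    Features with no entry in FEATURE_TO_SPEC are skipped, since most
    tech specs cannot be inferred from store pages at all."""
    suggestions: dict[str, list[str]] = {}
    for feature in store_features:
        if feature in FEATURE_TO_SPEC:
            spec, value = FEATURE_TO_SPEC[feature]
            values = suggestions.setdefault(spec, [])
            if value not in values:
                values.append(value)
    return suggestions


print(infer_specs(["Single-player", "Steam Achievements"]))
# {'Number of Players: Offline': ['1 Player']}
```

Keeping the mapping as data rather than code means new inferences can be added one row at a time, in line with making incremental improvements.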

It's invisible from the outside, but I recently made a change that should make these tools considerably easier to update. So although major changes like scraping for existing games are some way off, I hope to keep making incremental improvements as we go along.