"Import credits" tool to help get names from the source


Mtik333 (29529) on 8/24/2023 4:17 PM · edited

Recently I shared with approvers a tool that helps pull text out of large credits sources, so you don't have to type all this crap or OCR each screenshot manually. You can read about it here: https://github.com/Mtik333/OCRCreditsMobygamesTest

If you want to try it out, head to http://vps-4a7c24ce.vps.ovh.net:8080/greeting. Before starting, I recommend watching these two videos to get a rough idea of how to work with the tool:

https://www.youtube.com/watch?v=CMHLKLidumI and https://www.youtube.com/watch?v=PD7dfzLSFww

Important: this tool will NOT give you 100% valid output. You have to review the combined text and possibly correct it to match groups, double-check the spelling of roles and names, etc. But at least you won't have to type it all manually.

Feel free to provide feedback (here, through GitHub, or via any other channel). Just keep in mind that I'm using the Tesseract library (so it's not Google Vision or other non-open-source stuff), and it seems to fail at telling some letters apart in certain situations (m vs. rn, l vs. I). There's not much I can do about that, although I'm thinking about adding an option to 'resize' images in the backend to see if that improves things. And I'm too stupid to think about automatic thresholding or other binarization, as most such cases would need to be 'interpreted' case by case.
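
For anyone curious what "resize" and "automatic thresholding" could look like, here is a rough sketch in plain Python. It is not code from the tool: the function names (`upscale_nearest`, `otsu_threshold`, `binarize`) are made up for illustration, and the image is assumed to be a grayscale list of pixel rows with values 0..255. A real backend would more likely use an image library for this. Otsu's method picks a single global threshold, so it will probably not rescue every credits screen, which fits the "case by case" point above.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, each pixel in 0..255 (assumed format)


def upscale_nearest(img: Image, factor: int) -> Image:
    """Enlarge the image by an integer factor using nearest-neighbour sampling."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Image = []
    for row in img:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(img: Image) -> int:
    """Return the grey level that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between_var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(img: Image, threshold: int) -> Image:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if px > threshold else 0 for px in row] for row in img]


if __name__ == "__main__":
    tiny = [[10, 12, 200], [11, 210, 205]]  # toy data, not a real screenshot
    t = otsu_threshold(tiny)
    print(t, binarize(upscale_nearest(tiny, 2), t))
```

The idea would be to run something like this on each screenshot before handing it to Tesseract and compare the recognized text with and without it. Whether it actually helps with m vs. rn or l vs. I would have to be checked on real credits images.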