Topic on User talk:Ladsgroup

Mike Peel (talkcontribs)

Hi, thanks for the talk yesterday on ORES, and I hope you didn't mind my questions/comments. :-)

I've been working on adding interwikis for new articles on some Wikipedias (and also Commons!) to Wikidata items, but I'm wondering if there are better ways of doing it (currently I just auto-search for matches in a Python script and manually say yes/no to adding each one). I've just proposed a potential Outreachy project to improve the code I'm currently using; see https://phabricator.wikimedia.org/T290718. As part of that, I'm wondering if machine learning might be applicable here. It feels like there's a great training set in all of the other articles that already have sitelinks, which could be used to assess how good potential matches are. The highest-confidence matches could then be added automatically, so only lower-confidence ones need manual checking. I know of machine learning, but not how to actually do it!

If you think this might be possible, would you be interested in being a co-mentor for the Outreachy project, and we can make it a bit more ML-focused?
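The triage idea in this post (add the highest-confidence matches automatically, leave lower-confidence ones for manual checking) could be sketched roughly as follows. This is only an illustration, not the script under discussion: the `Candidate` type, the `triage` function, and both threshold values are hypothetical, and where the confidence score comes from (for example, a model trained on existing sitelinks) is left open.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    article_title: str  # new article that has no sitelink yet
    item_id: str        # candidate Wikidata item, e.g. "Q42"
    confidence: float   # hypothetical match score in [0, 1]


def triage(candidates, auto_threshold=0.95, review_threshold=0.5):
    """Split candidates into auto-add, manual-review, and discard lists.

    Both thresholds are placeholder values; in practice they would need
    tuning against matches that have already been checked by hand.
    """
    auto_add, needs_review, discarded = [], [], []
    for candidate in candidates:
        if candidate.confidence >= auto_threshold:
            auto_add.append(candidate)
        elif candidate.confidence >= review_threshold:
            needs_review.append(candidate)
        else:
            discarded.append(candidate)
    return auto_add, needs_review, discarded
```

With thresholds like these, only the middle band would still need a manual yes/no, which is the part of the current workflow that takes the time.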

Ladsgroup (talkcontribs)

Thanks. It is a great idea, and I added it to my list of work that AI could be used for. There are several ways to attack the problem, but building a machine learning system would definitely help. I suggest not adding it to this Outreachy work; instead, part of that Outreachy work would be to make the code pluggable, so that a service/API can be built later and your code can then easily use its recommendations. How does that sound? cc @Lydia Pintscher (WMDE)

Mike Peel (talkcontribs)

Thanks for the reply. I can't see a clear way to make the code 'pluggable'. I think that if we're going to use ML, then either it has to be built in, or there has to be a clear way to query it for a yes/no answer, which could perhaps be added as a proceed/stop check.
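One possible reading of a yes/no "proceed/stop check" that also keeps the code pluggable is to pass the check in as a function, so that a later ML service could be swapped in without touching the rest of the script. The sketch below is an assumption about how that might look, not existing code: `MatchCheck`, `AddAction`, `always_proceed`, and `process_match` are hypothetical names, and the step that actually writes the sitelink is left as a caller-supplied function.

```python
from typing import Callable

# A check takes (article title, Wikidata item ID) and answers yes/no.
MatchCheck = Callable[[str, str], bool]
# The action that actually adds the sitelink; supplied by the caller.
AddAction = Callable[[str, str], None]


def always_proceed(article_title: str, item_id: str) -> bool:
    """Default check: no ML available, so never block the match."""
    return True


def process_match(
    article_title: str,
    item_id: str,
    add_sitelink: AddAction,
    check: MatchCheck = always_proceed,
) -> bool:
    """Add the sitelink only if the pluggable check says to proceed.

    Returns True if the sitelink was added, False if the check said stop.
    """
    if not check(article_title, item_id):
        return False
    add_sitelink(article_title, item_id)
    return True
```

A check backed by a remote service would then only need to match the `MatchCheck` signature, for example by wrapping a request and comparing the returned score against a threshold.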
