User:Multichill/Using OpenCV to categorize files

At the time of writing, Commons contains about 150,000 uncategorized files. This is only about 1.25% of all files, but it is always nice to lower that number even further. A lot of categorization work has already been done by the CategorizationBot, but all of that work is based on how a file is used. No categorization has yet been done based on the contents of the file itself.

OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. It can be used to "recognize" images, so it could be used to move uncategorized files into one of the "Unidentified" topic categories based on the characteristics of the image. OpenCV contains several approaches we could use to "recognize" images:

Cascade classification
Machine learning
- Object Categorization
  - Normal Bayes classifier: Using the Normal Bayes classifier for image categorization in OpenCV
  - Bag of Words model: The Bag of Words model in OpenCV 2.2
    - bagofwords_classification.cpp can be wrapped as a Python module with the help of Boost.Python
- Sample dataset for training
  - Caltech-256 Object Category Dataset: http://www.vision.caltech.edu/Image_Datasets/Caltech256/
  - PASCAL Visual Object Classes: http://pascallin.ecs.soton.ac.uk/challenges/VOC/
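
In the Bag of Words approach, local descriptors (for example SIFT or SURF) from many training images are clustered into a vocabulary of "visual words", and each image is then described by a histogram of how often each word occurs; that histogram is what a classifier such as the Normal Bayes classifier gets to see. Below is a minimal sketch of that encoding step in plain Python, assuming the descriptors have already been extracted and the vocabulary (for example k-means cluster centres) already exists. nearest_word and bow_histogram are hypothetical helpers, not functions from OpenCV or from bagofwords_classification.cpp:

```python
import math
from collections import Counter
from typing import List, Sequence

# A descriptor or a visual word is just a fixed-length vector of floats.
Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Return the index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)), key=lambda i: math.dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[float]:
    """Assign every descriptor to its nearest visual word and return the word
    frequencies, normalised to sum to 1 (all zeros if there are no descriptors)."""
    if len(descriptors) == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total for i in range(len(vocabulary))]


# Usage: a toy two-word vocabulary in two dimensions.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0), (11.0, 10.0)]
print(bow_histogram(descriptors, vocabulary))  # [0.5, 0.5]
```

For real image data, OpenCV's own BOWImgDescriptorExtractor class does this step in C++ and is the better choice; the Python loop above only shows the idea.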

Some frequently occurring subjects in uncategorized files:

People, could go to Category:Unidentified people
Maps, could go to Category:Unidentified maps
Flags, could go to Category:Unidentified flags
Plants, could go to Category:Unidentified plants
Coats of arms, could go to Category:Unidentified coats of arms
Buildings, could go to Category:Unidentified buildings
Trains, could go to Category:Unidentified trains
Automobiles, could go to Category:Unidentified automobiles
Buses, could go to Category:Unidentified buses
Diagrams
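
The list above is effectively a lookup table from a recognized subject to a target category. A minimal sketch, assuming a classifier that returns one of these subjects as a lower-case label string; the label strings, TARGET_CATEGORIES and target_category are hypothetical. Diagrams have no target category yet, so they map to None and such files would be left alone:

```python
from typing import Dict, Optional

# Hypothetical classifier labels mapped to the suggested Commons categories.
# "diagrams" has no target category yet, so it maps to None.
TARGET_CATEGORIES: Dict[str, Optional[str]] = {
    "people": "Category:Unidentified people",
    "maps": "Category:Unidentified maps",
    "flags": "Category:Unidentified flags",
    "plants": "Category:Unidentified plants",
    "coats of arms": "Category:Unidentified coats of arms",
    "buildings": "Category:Unidentified buildings",
    "trains": "Category:Unidentified trains",
    "automobiles": "Category:Unidentified automobiles",
    "buses": "Category:Unidentified buses",
    "diagrams": None,
}


def target_category(label: str) -> Optional[str]:
    """Return the category for a predicted label, or None when there is no
    target (unknown labels and "diagrams")."""
    return TARGET_CATEGORIES.get(label.strip().lower())


print(target_category("Coats of arms"))  # Category:Unidentified coats of arms
```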

I installed OpenCV as explained here:

I already had Python 2.7 installed
Installed the Python eggs of NumPy and SciPy
Downloaded and installed the (rather large) Windows package
Copied the contents of "C:\opencv\build\python\2.7" to "C:\Python27\Lib\site-packages"
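
After these steps, "import cv2" in the Python interpreter should work, assuming the copied files provide the cv2 module (as the OpenCV 2.4 Windows packages did). On a current Python 3 installation, a small sketch like the following checks that cv2, numpy and scipy can all be found without actually importing them; check_modules is a hypothetical helper name:

```python
import importlib.util
from typing import Dict, Iterable

# Modules the setup above should provide: cv2 (OpenCV), numpy and scipy.
REQUIRED_MODULES = ("cv2", "numpy", "scipy")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to True if Python can find it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(name, "found" if found else "MISSING")
```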

In the C:\opencv\samples\ directory there are two folders with example Python programs. They are fun and useful to play around with!

The first test is to use an existing, ready-made classifier for face detection, in combination with Pywikipedia, to fill Category:Unidentified people (bot tagged). The first results look promising: I see a lot of faces, but also some false positives. The next step is probably to start training some filters on Commons images.
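
A minimal sketch of such a face-detection pass, assuming a Haar cascade file such as haarcascade_frontalface_default.xml from the OpenCV data directory. The size threshold in looks_like_person is a hypothetical heuristic for cutting down false positives, not a tested rule, and the actual page edit through Pywikipedia is left out: add_category only prepares the new page text. All three function names are hypothetical:

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), as detectMultiScale reports faces
CATEGORY = "[[Category:Unidentified people (bot tagged)]]"


def detect_faces(path: str, cascade_path: str) -> Tuple[List[Rect], Tuple[int, int]]:
    """Run an OpenCV Haar cascade over an image file.

    Returns the detected face rectangles and the image size as (width, height).
    """
    import cv2  # imported here so the helpers below also work without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade " + cascade_path)
    image = cv2.imread(path)
    if image is None:
        raise IOError("could not read image " + path)
    gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    height, width = gray.shape[:2]
    return [tuple(int(v) for v in face) for face in faces], (width, height)


def looks_like_person(faces: Sequence[Rect], size: Tuple[int, int], min_fraction: float = 0.01) -> bool:
    """Hypothetical heuristic: accept the file only if at least one face covers
    min_fraction of the image area, so tiny (often false) detections are ignored."""
    width, height = size
    image_area = width * height
    if image_area <= 0:
        return False
    return any(w * h >= min_fraction * image_area for (_x, _y, w, h) in faces)


def add_category(wikitext: str, category: str = CATEGORY) -> str:
    """Append the category line to the page text unless it is already present."""
    if category in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + category + "\n"
```

A bot run would call detect_faces on each downloaded file and, only when looks_like_person returns True, save the text returned by add_category back to the file page with Pywikipedia.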

See also User:DrTrigonBot, which has similar Python code.
