
We have a great job: we get paid to watch pornographic clips. More seriously, we work in the R&D department of Inventos, which deals with automatic filtering of web content: moderation, copyright protection, and so on. We were tasked with building a system that automatically detects pornographic content. Here we describe how we approached the task.
General classification approach
After reviewing the currently available implementations of pornography detection in video, we decided to approach the problem comprehensively, that is, to use several different signs of pornography. The video is passed through several detectors, each of which returns an estimate of how pornographic the video is, each with its own accuracy. The resulting estimates are then combined into a single final estimate.
We decided not to evaluate the entire video at once, but to look for small fragments. The fragment size was chosen based on the accuracy of the final classification.
Using several detectors lets us combine them, add new ones, and work on each of them separately. At present, the system consists of four detectors:
- nature of movement;
- color;
- frame content;
- sound.
Each of these detectors returns the probability that a fragment is pornographic. All that remains is to compute the overall probability.
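The article does not say how the per-detector probabilities are merged. As a minimal sketch of one common option, assuming the detectors are roughly independent and an even prior, their log-odds can be summed with optional weights. The function name `combine_probabilities` and the weights are hypothetical, not part of the described system.

```python
import math

def combine_probabilities(probs, weights=None, eps=1e-6):
    """Combine detector probabilities via a weighted sum of log-odds.

    Assumption: detectors are treated as independent; this is an
    illustrative rule, not the combination rule used in the article.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total = 0.0
    for p, w in zip(probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += w * math.log(p / (1.0 - p))
    # Map the summed log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical outputs of the four detectors for one fragment.
fragment_probability = combine_probabilities([0.9, 0.8, 0.7, 0.6])
```

With this rule, a detector that returns 0.5 contributes nothing, and detectors that agree reinforce each other. In practice the weights would have to be fitted on labelled fragments.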
Now a little more detail about each detector separately.
Nature of movement
Searching for rhythmic movement in the frame is where we started our work. But first, a few words about classification itself. The essence of classification is to divide a set of objects into classes (two, in our case). To do this, we:
- take a training set of objects that we classify manually;
- create a procedure for selecting the parameters of a statistical model;
- train the model on the training set of objects;
- evaluate the accuracy of the model on a test set.
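The steps above can be sketched in a few lines. This is a toy illustration only: the "model" is a single threshold on a scalar score, and the labelled data, function names, and split ratio are all hypothetical, not the statistical model used in the system.

```python
import random

def split_samples(samples, test_fraction=0.25, seed=0):
    """Shuffle manually labelled samples and split them into train/test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(samples, threshold):
    """Fraction of samples whose label matches (score >= threshold)."""
    correct = sum((score >= threshold) == bool(label)
                  for score, label in samples)
    return correct / len(samples)

def fit_threshold(train):
    """Parameter selection: pick the threshold with the best training accuracy."""
    candidates = sorted({score for score, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Hypothetical (score, label) pairs; label 1 means "pornographic".
labelled = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
            (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

train, test = split_samples(labelled)
threshold = fit_threshold(train)
test_accuracy = accuracy(test, threshold)  # estimate on unseen samples
```

Keeping the test set out of parameter selection is the point of the last step: accuracy measured on the training set alone tends to be optimistic.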
That's all there is to it. So the first task was to collect fragments with characteristic rhythmic movement (collecting fragments without porn was not difficult). We watched a number of videos, then cut out and saved the scenes with such movement. This took 60 man-hours (for classification, the more objects the better).
We will describe the technical details of the rhythmic movement search in later articles. Here we note only that our method is based on spatio-temporal filters.
Colour
With color, things are easier than with movement. Each point in the picture has coordinates in some color space. We simply determine where a point with these coordinates occurs more often: in images of a naked human body or in other areas of the picture. From this data we obtain a measure of how much of the video fragment is filled with naked bodies. Again, we will not go into the specific implementation here and will only say a few words about the color space used. We settled on the YUV color model because:
- there are only two color coordinates (U and V);
- by discarding the brightness coordinate (Y), we can ignore differences in the brightness of objects;
- no additional conversion is needed when working with video.
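A minimal sketch of this idea, under stated assumptions: (U, V) values are binned into a coarse 2D histogram, and a pixel counts as skin when its bin is relatively more frequent in skin samples than in other samples. The bin size, function names, and sample pixels are hypothetical. The RGB conversion (standard BT.601 coefficients) is included only to keep the example self-contained; with decoded video the U and V planes would be read directly.

```python
from collections import Counter

def rgb_to_uv(r, g, b):
    """Convert RGB to the U and V chroma coordinates (BT.601), dropping Y."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return 0.492 * (b - y), 0.877 * (r - y)

def uv_bin(pixel, bin_size=8):
    """Map an (r, g, b) pixel to a coarse (U, V) histogram bin."""
    u, v = rgb_to_uv(*pixel)
    return (int(u // bin_size), int(v // bin_size))

def uv_frequencies(pixels, bin_size=8):
    """Relative frequency of each (U, V) bin in a set of training pixels."""
    counts = Counter(uv_bin(p, bin_size) for p in pixels)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def skin_fraction(frame_pixels, skin_freq, other_freq, bin_size=8):
    """Share of pixels whose bin is more common in skin than elsewhere."""
    if not frame_pixels:
        return 0.0
    skin = 0
    for p in frame_pixels:
        b = uv_bin(p, bin_size)
        if skin_freq.get(b, 0.0) > other_freq.get(b, 0.0):
            skin += 1
    return skin / len(frame_pixels)

# Hypothetical training pixels (r, g, b) sampled from labelled regions.
skin_freq = uv_frequencies([(224, 172, 150), (198, 134, 110)])
other_freq = uv_frequencies([(30, 120, 200), (40, 160, 60)])
fraction = skin_fraction([(224, 172, 150), (30, 120, 200)],
                         skin_freq, other_freq)
```

Averaging `skin_fraction` over the frames of a fragment would give one possible per-fragment color score; how the actual detector aggregates frames is not described here.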
Frame content
When searching for pornography, individual frames cannot be ignored either: they carry useful information of their own. To extract it directly from the frames, we decided to use the Bag of Visual Words approach. First, "visual words" are defined: fragments or samples that best characterize frames with and without porn. Together they form a vocabulary of visual words. Then, when classifying, the detector estimates how pornographic a frame is from the presence of particular words in it.
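As a rough sketch of the representation step, assuming the vocabulary is already available (it is commonly obtained by clustering local descriptors, for example with k-means), each local descriptor of a frame is assigned to its nearest visual word and the frame is summarized by a normalized word histogram. The two-dimensional descriptors and function names below are hypothetical; real descriptors have many more dimensions.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared L2)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((a - b) ** 2
                          for a, b in zip(descriptor, vocabulary[i])),
    )

def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one frame."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]

# Hypothetical vocabulary of two visual words and four frame descriptors.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
frame_descriptors = [(1.0, 1.0), (9.0, 9.0), (11.0, 10.0), (0.0, 2.0)]
histogram = bovw_histogram(frame_descriptors, vocabulary)
```

The histogram is then the feature vector passed to a classifier trained on frames with and without porn, following the same train/test procedure described earlier.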
Sound
The sound detector is based on two basic parameters that help us recognize pornography:
Thus, we can judge (with some probability, of course) whether an audio fragment contains moans. The detector classifies a fragment based on these two parameters.
Conclusion
Is that all? Of course not. This is just an introduction. We decided not to dump the technical details of all the detectors here, but to describe them in separate articles. The detectors are fundamentally different, the work on them was carried out separately, and the amount of work (and therefore the length of the description) differs for each.
So, to be continued:
- Licenzero: simple movements
- Licenzero: looking for skin color porn