Moreover, describing images by local areas of interest is more beneficial for some types of images than for others. An automatic method for determining whether an image is of this type would improve the performance of the system. A possible extension to this study would therefore be the adaptive computation of the size of the local images, based on a measure of the edge density of the image or on an entropic criterion: the more information an image is expected to contain, the more numerous, but smaller, the local images can be.
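The adaptive sizing idea could be sketched as follows. This is a minimal illustration, assuming grey-level Shannon entropy as the information measure and a linear mapping from entropy to tile size; the function names, the size range, and the mapping are our own assumptions, not values from the paper.

```python
import numpy as np

def adaptive_tile_size(gray, min_size=16, max_size=64, bins=256):
    """Choose a local-image (tile) size from the global grey-level entropy.

    High-entropy images, expected to carry more information, receive
    smaller and therefore more numerous tiles. The size bounds and the
    linear mapping are illustrative choices.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))   # in [0, 8] bits for 8-bit images
    t = entropy / 8.0                   # normalise to [0, 1]
    # High entropy -> small tiles; low entropy -> large tiles.
    return int(round(max_size - t * (max_size - min_size)))

def split_into_tiles(gray, size):
    """Cut the image into non-overlapping square local images."""
    h, w = gray.shape
    return [gray[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]
```

A uniform image then yields few large tiles, while a noisy, information-rich image yields many small ones; an edge-density criterion would only change the computation of `t`.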
We presented a simple system for unifying textual and visual information. We showed that the visual information reduces the errors of the textual information without a thesaurus by about 50%, which is very promising given the simplicity of the method. Our system can be added as a fast visual filter (see figure 4) to the results of an image query on a search engine (such as Google) issued with a small number of keywords, and thus without the use of a thesaurus (otherwise no image would be found by the search engine).
We could reverse the experiment by comparing the textual indices to the visual classes. This would make it possible to correct a bad textual indexing using the visual content. For example, if a statistical plot of the working population were automatically labelled `woman' and `worker', a comparison with the visual classes representing `woman' would highlight the indexing error, and the word `woman' could then be automatically removed from the keyword set of this image.
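The correction step described above could be sketched as a simple pruning rule. This is an illustrative sketch only: the function name, the score dictionary, and the threshold are our assumptions, standing in for whatever similarity the system computes between an image and the visual class associated with each keyword.

```python
def prune_keywords(keywords, visual_scores, threshold=0.2):
    """Remove keywords contradicted by the visual content.

    `visual_scores` maps each keyword to the similarity between the
    image and the visual class representing that keyword (e.g. the
    class `woman'). Keywords whose visual support falls below the
    (illustrative) threshold are dropped from the index.
    """
    return [kw for kw in keywords
            if visual_scores.get(kw, 0.0) >= threshold]
```

For the example above, a statistical plot would score very low against the `woman' class, so `prune_keywords(['woman', 'worker'], {'woman': 0.05, 'worker': 0.6})` would keep only `worker`.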