
Introduction

Within the research field of multimodal image indexing, the visual modality is dominant; despite the rich semantics of the textual modality, it is largely ignored in combination with the visual one. Information retrieval systems based on the textual modality are now very efficient [1]. The simple and widely used vector-space model introduced by Salton [14] has demonstrated its robustness. But such systems require the construction of an index (or a thesaurus), which is mostly carried out by documentalists who manually assign a limited number of keywords describing the image content.

On the other side, existing image search engines allow users to search for images via a keyword interface or via query by image example [5], [6], [2], [12], [8], [11]. Most of them are based on visual similarity measures between a reference image and a test image. Nevertheless, most WWW image engines allow the user to form a query only in terms of keywords. To build the image index, keywords are extracted heuristically from the HTML documents containing each image, and/or from the image URL. For a precise query, the user can add keywords that narrow the scope of possible result images; at the same time, the query must contain only a small number of keywords in order to keep the number of answers low. Unfortunately, it is difficult to include visual cues within a WWW navigator framework. It could therefore be interesting to use a second filter stage, adding visual cues that have been put in correspondence with a given textual thesaurus, in order to refine the query.

In this paper we demonstrate such a system, which combines textual and visual statistics in a single stochastic fusion for content-based image retrieval (CBIR). By truly unifying textual and visual statistics, one would expect to obtain better results than with either used separately. Textual statistics are captured in vector form and used first in an Ascendant Hierarchical Classification (AHC), resulting in a few semantic classes. Visual statistics, based on color and orientation histograms, are then drawn inside these classes. The last stage is a fusion approach that takes advantage of the coupling between the textual content of the document and its image content. Each of these steps is illustrated by a sketch below.

Search performance experiments are reported for a database of 600 images collected by Editing, a press agency involved in the RNTL Muse Project [3]. All pictures are manually indexed with keywords from a hierarchical thesaurus and saved in an XML file following the MPEG-7 format [9]. Results of the visuo-textual classification show an improvement of 54% over a direct classification using textual information alone.
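As a concrete illustration of the vector-space model cited above, the following sketch builds TF-IDF weighted keyword vectors and compares them by cosine similarity. The vocabulary and image annotations are hypothetical toy data, and TF-IDF is a common instantiation of Salton's model rather than the exact weighting used in this work:

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        """Build TF-IDF weighted keyword vectors (vector-space model)."""
        n = len(docs)
        df = Counter()                      # document frequency of each keyword
        for doc in docs:
            df.update(set(doc))
        vocab = sorted(df)
        vectors = []
        for doc in docs:
            tf = Counter(doc)
            vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
        return vocab, vectors

    def cosine(u, v):
        """Cosine similarity between two keyword vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Hypothetical keyword annotations for three images.
    docs = [["beach", "sea", "summer"], ["sea", "boat"], ["city", "night"]]
    vocab, vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))   # share "sea" -> positive score
    print(cosine(vecs[0], vecs[2]))   # no common keyword -> 0.0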
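Ascendant Hierarchical Classification is, in the usual terminology, agglomerative (bottom-up) hierarchical clustering. A minimal sketch using SciPy, assuming hypothetical keyword vectors and an arbitrary target of two semantic classes:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Hypothetical TF-IDF vectors for five images (one row per image).
    X = np.array([[1.0, 0.4, 0.0],
                  [0.9, 0.5, 0.1],
                  [0.0, 0.1, 1.0],
                  [0.1, 0.0, 0.9],
                  [0.8, 0.6, 0.0]])

    Z = linkage(X, method="ward")                    # bottom-up (ascendant) merging
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 classes
    print(labels)                                    # e.g. [1 1 2 2 1]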
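For the visual stage and the final fusion, the sketch below is one plausible reading: a quantized hue histogram, a gradient-orientation histogram, histogram intersection as the visual similarity measure, and a weighted combination of textual and visual scores. The histogram parameters, the intersection measure, and the linear fusion rule (alpha) are assumptions made for illustration; they are not the stochastic fusion defined later in the paper.

    import numpy as np

    def color_histogram(img, bins=16):
        """Normalized histogram of the hue channel (img: HxWx3 HSV array)."""
        h, _ = np.histogram(img[..., 0], bins=bins, range=(0, 256))
        return h / h.sum()

    def orientation_histogram(gray, bins=8):
        """Normalized histogram of gradient orientations (gray: HxW array)."""
        gy, gx = np.gradient(gray.astype(float))
        angles = np.arctan2(gy, gx)                  # orientations in [-pi, pi]
        h, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
        return h / h.sum()

    def histogram_intersection(h1, h2):
        """Classical similarity between two normalized histograms."""
        return np.minimum(h1, h2).sum()

    def fused_score(text_sim, visual_sim, alpha=0.5):
        """Hypothetical linear fusion of textual and visual similarities."""
        return alpha * text_sim + (1 - alpha) * visual_sim

    # Hypothetical usage on a random array standing in for a real HSV image.
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(64, 64, 3))
    print(fused_score(text_sim=0.7,
                      visual_sim=histogram_intersection(color_histogram(img),
                                                        color_histogram(img))))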