Next: Classification using visual features Up: Enhancement of Textual Images Previous: Construction of textual semantic


Classification using textual features

All the features (textual or visual) are vectors of various lengths, described in the following sections. After normalisation, they are compared to the features of the reference set according to the Kullback-Leibler distance [*]. All pictures are indexed by keywords from a thesaurus and saved in an XML file following the MPEG-7 format [9]. A Java package (org.w3c.dom) is used to extract keywords from the XML files. The hierarchical thesaurus is composed of 1200 keywords with an average depth of 3. Below is an example of an XML file including the keywords ``Telephone'' and ``Radio'' (simplified MPEG-7 schema).
<?xml version="1.0" encoding="UTF-8"?>
<mpeg7:Mpeg7 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
             xmlns:mpeg7="http://www.mpeg7.org/2001/MPEG-7_Schema">
  <mpeg7:DescriptionMetadata>
    <mpeg7:LastUpdate>2002-10-02</mpeg7:LastUpdate>
    <mpeg7:PrivateIdentifier>BAR9501001C-1</mpeg7:PrivateIdentifier>
    <mpeg7:CreationTime>2002-10-02</mpeg7:CreationTime>
  </mpeg7:DescriptionMetadata>
  <mpeg7:ContentDescription xsi:type="ContentEntityType">
     <mpeg7:Creation>
       <mpeg7:Title>Development of mobile</mpeg7:Title>
       <mpeg7:KeywordAnnotation>
         <mpeg7:Keyword>Telephone</mpeg7:Keyword>
         <mpeg7:Keyword>Radio</mpeg7:Keyword>
       </mpeg7:KeywordAnnotation>
     </mpeg7:Creation>
  </mpeg7:ContentDescription>
  <mpeg7:ContentDescription xsi:type="ViewDescriptionType">
    <mpeg7:Image>
      <mpeg7:MediaUri>BAR9501001C-1.jpg</mpeg7:MediaUri>
    </mpeg7:Image>
  </mpeg7:ContentDescription>
</mpeg7:Mpeg7>
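Keyword extraction with org.w3c.dom can be sketched as follows. This is a minimal illustration, not the authors' actual code: the class name, method name, and the inline sample document are ours, and the MPEG-7 namespace URI is taken from the example above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class KeywordExtractor {
    static final String MPEG7_NS = "http://www.mpeg7.org/2001/MPEG-7_Schema";

    // Collects the text content of every mpeg7:Keyword element in the document.
    public static List<String> extractKeywords(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required to match elements by namespace URI
        Document doc = f.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagNameNS(MPEG7_NS, "Keyword");
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++)
            keywords.add(nodes.item(i).getTextContent().trim());
        return keywords;
    }

    public static void main(String[] args) throws Exception {
        // A stripped-down version of the MPEG-7 example above.
        String xml = "<?xml version=\"1.0\"?>"
            + "<mpeg7:Mpeg7 xmlns:mpeg7=\"" + MPEG7_NS + "\">"
            + "<mpeg7:KeywordAnnotation>"
            + "<mpeg7:Keyword>Telephone</mpeg7:Keyword>"
            + "<mpeg7:Keyword>Radio</mpeg7:Keyword>"
            + "</mpeg7:KeywordAnnotation>"
            + "</mpeg7:Mpeg7>";
        System.out.println(extractKeywords(xml)); // [Telephone, Radio]
    }
}
```

The extracted keywords would then be looked up in the thesaurus to build the textual vectors.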
A first experiment consists in classifying the test set using the DKL criterion. This estimated classification can then be compared to the a priori classes obtained by AHC. Each class $C_k$ of the reference set $B_{Ex}$ is represented by an average textual vector $\vec{C_k^t}^*$, the mean of the textual vectors of the images it contains. The class of an image $d_{T}$ of the test set $B_{Test}$, described by its normalized textual vector $\vec{d_T^t}^*$, is then computed as:

\begin{displaymath}
C^t(d_T)=\mathrm{argmin}_{k\in\{1,2,\dots,c\}}DKL(\vec{d_T^t}^*,\vec{C_k^t}^*).
\end{displaymath}
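The minimisation above amounts to a nearest-centroid rule under the Kullback-Leibler divergence. A minimal sketch, assuming the standard asymmetric form $\sum_i p_i \log(p_i/q_i)$ and strictly positive centroid components wherever the document vector is positive (the class and method names are illustrative, not from the paper):

```java
public class TextualClassifier {
    // Kullback-Leibler divergence DKL(p || q) between two normalized vectors.
    // Terms with p[i] == 0 contribute nothing (0 * log 0 is taken as 0).
    static double dkl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++)
            if (p[i] > 0) d += p[i] * Math.log(p[i] / q[i]);
        return d;
    }

    // argmin_k DKL(d, C_k): index of the closest class centroid.
    static int classify(double[] d, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = dkl(d, centroids[k]);
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 3-term vocabulary with two class centroids.
        double[][] centroids = { {0.7, 0.2, 0.1}, {0.1, 0.2, 0.7} };
        double[] doc = {0.6, 0.3, 0.1};
        System.out.println(classify(doc, centroids)); // prints 0
    }
}
```

In the experiments, the centroids are the average textual vectors $\vec{C_k^t}^*$ of the AHC classes of $B_{Ex}$, and `doc` is the normalized vector $\vec{d_T^t}^*$ of a test image.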

We then run two textual experiments: the first extends the textual vectors using the thesaurus as explained in section 2; the second uses the textual vectors directly, without any extension. Table 2 gives the Error Rate (ER) obtained in both cases.

Table 2: Classification Error Rate (ER) in %, with or without thesaurus extension

  Textual with thesaurus | Textual without thesaurus
  -----------------------+--------------------------
          1.17           |          13.72


We notice that when the vectors are extended by the thesaurus, the error rate is very low (1.17%). In contrast, the vectors without thesaurus information produce an error rate of nearly 14%. Since we aim to use our system in the case of reduced textual information, as described previously, we will not extend the textual vectors with the thesaurus in the following section.
Tollari Sabrina 2003-08-28