next up previous
Next: Classification using textual features Up: Enhancement of Textual Images Previous: Introduction

Construction of textual semantic reference classes

First, in order to map textual and visual information, we need a certain number of semantic classes, each containing a few image samples. For this purpose, textual statistics are captured in vector form and we run the Ascendant Hierarchical Classification (AHC) algorithm (Lance and Williams, 1967) described in this section. Other methods, such as a Hopfield network, can also be used to build semantic classes [17]. Let $D=\{d_1, d_2, \dots, d_m\}$ be a document set and $T=\{t_1, t_2, \dots, t_n\}$ a keyword set; the vector model ([13], [15], [14], [1]) describes the document $d_i$ as:

\begin{displaymath}\vec{d_i}=(\omega_{1,i},\ \omega_{2,i},\ \dots,\ \omega_{j,i},\ \dots,\ \omega_{n,i})\end{displaymath}

where $\omega_{j,i}$ is the term weight; the best-known weighting scheme is tf-idf. In this study, for each keyword of the thesaurus, the corresponding vector element is set to 1 if the keyword belongs to the image and to 0 otherwise, so $\omega_{j,i}\in \{0,1\}$. The hierarchical structure of the thesaurus implies that if an image is indexed by $t_j$ and $t_j \prec t_k$, then it is also indexed by $t_k$. Therefore, using the thesaurus, one can extend the vector $\vec{d_i}$ [10] so that $\forall j,k \in [1,n]$, $\omega_{k,i}=1$ if $\omega_{j,i}=1$ and $t_j \prec t_k$, and $\omega_{k,i}=0$ otherwise. The usual similarity measure in the vector model is the cosine. Let $d_k$ and $d_l$ be two images:
\begin{displaymath}sim(\vec{d_k},\vec{d_l})=\frac{\sum_{j=1}^{n}\omega_{j,k}\,\omega_{j,l}}{\sqrt{\sum_{j=1}^{n}\omega_{j,k}^{2}}\ \sqrt{\sum_{j=1}^{n}\omega_{j,l}^{2}}}\end{displaymath}
where $\omega_{j,k}, \omega_{j,l} \in \{0,1\}$. In this case, a simple distance is then defined as:
\begin{displaymath}dist(k,l)=1-sim(\vec{d_k},\vec{d_l})\end{displaymath}
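As a sketch, the binary vector construction with thesaurus expansion, the cosine similarity, and the derived distance can be written as follows (the keyword lists and the `BROADER` relation encoding $t_j \prec t_k$ are illustrative stand-ins, not the paper's actual thesaurus):

```python
import math

# Illustrative thesaurus fragment: maps a term t_j to its broader terms t_k
# (t_j < t_k in the hierarchy). Not the thesaurus used in the paper.
BROADER = {"automobile": ["transport"], "transport": []}
TERMS = ["automobile", "transport", "portrait"]

def to_vector(keywords):
    """Binary term vector, expanded upward through the thesaurus hierarchy."""
    active = set()
    stack = list(keywords)
    while stack:
        t = stack.pop()
        if t not in active:
            active.add(t)
            stack.extend(BROADER.get(t, []))
    return [1 if t in active else 0 for t in TERMS]

def cosine(u, v):
    """Cosine similarity between two term vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def dist(u, v):
    """Distance derived from the cosine similarity."""
    return 1.0 - cosine(u, v)

d1 = to_vector(["automobile"])              # expanded to include "transport"
d2 = to_vector(["transport", "portrait"])
```

Here `to_vector(["automobile"])` yields `[1, 1, 0]` because the expansion step also sets the broader term "transport".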
Two classes $C_p$ and $C_q$ are merged if the distance $D(C_p,C_q)$ is small enough. A first definition of $D(C_p,C_q)$ is the nearest-neighbour (single-linkage) distance:

\begin{displaymath}D(C_p,C_q)=\min\{ dist(i,j)\ ;\ i\in C_p,\ j\in C_q\}\end{displaymath}

but on our database it generates classes that are either too small or too large. The maximum-diameter (complete-linkage) distance:

\begin{displaymath}D(C_p,C_q)=\max\{ dist(i,j)\ ;\ i\in C_p,\ j\in C_q\}\end{displaymath}

gives classes of uniform size, but without semantic homogeneity. A third usual distance, the average distance:

\begin{displaymath}D(C_p,C_q)=\frac{\sum_{i\in C_p,\ j\in C_q} dist(i,j)}{Card(C_p)\times Card(C_q)}\end{displaymath}

gives essentially the same results as the first one. We therefore defined another distance, thresholding the maximum-diameter method with an empirical value (0.7). The continuation criterion $T$ in the final AHC algorithm (see below) is defined so as to ensure both semantic homogeneity within a class and enough image samples per class: classes are merged until the last merge distance obtained exceeds 0.55.

Ascendant Hierarchical Classification (AHC)
program AHC
    E: the set of n elements to classify
    Dist: the n*n array of distances between elements
    C: a set of semantic classes
    For each element e in E
      Add Class(e) to C
    end For
    While T do
      Merge the two nearest classes in C
    end While
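A minimal Python sketch of this AHC, using the thresholded maximum-diameter (complete-linkage) merge criterion described above; `elements`, the `dist` callable, and the sample values below are illustrative stand-ins for the paper's image data:

```python
def ahc(elements, dist, stop=0.55, diameter_cap=0.7):
    """Ascendant (agglomerative) hierarchical classification.

    Classes are merged by smallest maximum-diameter distance
    (complete linkage), capped at diameter_cap (0.7 in the paper),
    until the best merge distance exceeds the stopping threshold
    (0.55 in the paper): the continuation criterion T.
    """
    classes = [[e] for e in elements]          # one class per element
    while len(classes) > 1:
        best = None
        for p in range(len(classes)):
            for q in range(p + 1, len(classes)):
                # maximum-diameter distance between classes p and q
                d = max(dist(i, j) for i in classes[p] for j in classes[q])
                if d <= diameter_cap and (best is None or d < best[0]):
                    best = (d, p, q)
        if best is None or best[0] > stop:     # criterion T fails: stop merging
            break
        d, p, q = best
        classes[p] = classes[p] + classes[q]   # merge the two nearest classes
        del classes[q]
    return classes

# Illustrative usage on scalar "documents" with absolute-difference distance:
groups = ahc([0.0, 0.1, 0.9, 1.0], lambda a, b: abs(a - b))
```

With these toy values the two tight pairs are merged and the result is two classes, since merging them would exceed the diameter cap.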
Finally, after removing the classes having fewer than 8 samples, we obtain 24 a priori classes (some are given in Table 1), for a total of 517 images.

Table 1: List of the most frequent terms of 10 classes
Class $T_{f_1}$ $T_{f_2}$ $T_{f_3}$
1 Mexique Politique Portrait  
2 Israël Judaïsme Patrimoine  
3 Constructeurs Transport Automobile  
4 Contemporaine Portrait Rhône  
5 Portrait Armée de l'air Aéronautique  
6 Société Famille Enfant  
7 Cameroun Agriculture Géographie physique  
8 Municipalité Portrait Les Verts  
9 Elevages Santé Police national  
10 Portrait Média Administrations  

Each semantic class is then randomly divided into two partitions: a reference set $B_{Ex}$ and a test set $B_{Test}$. As described later on, the reference set will be used to compute the most probable textual, visual, or visuo-textual class of any image of the test set. Automatic scoring of each classification method is then easily computed against the a priori semantic class of each image.
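The random split of each class into a reference set and a test set can be sketched as follows (the 50/50 ratio and the fixed seed are illustrative assumptions, not stated in the paper):

```python
import random

def split_class(images, ratio=0.5, seed=0):
    """Randomly partition one semantic class into a reference set
    B_Ex and a test set B_Test. The ratio is an illustrative choice."""
    rng = random.Random(seed)   # fixed seed for reproducibility (assumption)
    shuffled = list(images)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * ratio)
    return shuffled[:k], shuffled[k:]   # (B_Ex, B_Test)
```

Applying `split_class` independently to each of the 24 classes yields the two partitions used for classification and scoring.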

Tollari Sabrina 2003-08-28