We now merge the textual and visual indices in order to improve the results obtained with the textual classification alone. The two main fusion strategies are early fusion and late fusion. The first is usual in content-based image retrieval (CBIR); the second allows more freedom for adaptive weighting in a probabilistic framework. We choose the second one in this study.
For each image and each class, one computes the textual distance described in Section 3. This distance is then normalized, and we estimate the probability of membership in the class.
We use the same formula for the five visual features A.
Therefore, the combination of the posteriors is given by a weighted product, where the weight of each classifier is derived from its error rate (ER); an additional parameter increases the contrast between classes. The final class is the one maximizing the combined posterior.
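The chain described above (class distances normalized into posteriors, posteriors combined by a weighted product with a contrast parameter, final class chosen by maximization) can be sketched as follows. The paper's exact normalization and ER-weighting formulas are not reproduced in this excerpt, so this sketch assumes a softmax over negated normalized distances and a simple product rule with a contrast exponent `gamma`; function names and the toy distance values are illustrative only.

```python
import math

def distances_to_posteriors(distances):
    # Normalize the per-class distances to [0, 1], then apply a softmax
    # over the negated values: a smaller distance yields a larger
    # class-membership probability. (Assumed form, not the paper's formula.)
    lo, hi = min(distances), max(distances)
    norm = [(d - lo) / (hi - lo + 1e-12) for d in distances]
    scores = [math.exp(-d) for d in norm]
    total = sum(scores)
    return [s / total for s in scores]

def late_fusion(posterior_lists, gamma=1.0):
    # Product rule over the textual and visual posteriors;
    # gamma > 1 sharpens (increases the contrast of) the combined
    # distribution. A per-classifier ER weight could multiply each term.
    n_classes = len(posterior_lists[0])
    combined = [1.0] * n_classes
    for posteriors in posterior_lists:
        for c in range(n_classes):
            combined[c] *= posteriors[c] ** gamma
    total = sum(combined)
    return [c / total for c in combined]

# Toy example: one textual and two visual distance vectors over 3 classes.
p_text = distances_to_posteriors([0.2, 0.9, 0.7])
p_vis1 = distances_to_posteriors([0.3, 0.8, 0.6])
p_vis2 = distances_to_posteriors([0.1, 0.95, 0.5])
fused = late_fusion([p_text, p_vis1, p_vis2], gamma=2.0)
final_class = max(range(len(fused)), key=fused.__getitem__)
```

The product rule makes each modality act as a veto: a class that any single classifier finds very unlikely receives a small combined posterior, which is one common rationale for late fusion.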
Figure 3 presents the results obtained by fusing the textual classification without thesaurus (ER 13.72%) with several visual classifications. The first result (T+Vis[Local]) is obtained using only the best classifications from the early fusion of the ROI features (). The second (T+Vis[Global]) considers only the classifications on the global indices. The third (T+Vis[Local+Global]) uses the best parameters of the early fusion of the local and global indices (). The last (T+Vis[Dir+Global]) takes into account the global features for the red, green, blue and brightness attributes, and the local direction computed by DKL(r1,r1).
In this figure, one notices that our simple ROI function generally improves the classification compared to the global one for the same parameter value. Naturally, all methods converge to the textual ER ().
Table 7 summarizes the improvement of the textual classification brought by the visual classification.
Table 7: Results of the late fusion of the visual and textual classifications (in %).