A mixture of slides presented at Visual'99, Amsterdam; ICDAR'99, Bangalore; and GRCE'99, Paris.





Using pen-based outlines for object-based annotation and image-based queries




 
                Lambert Schomaker
                Edward de Leau
                Louis Vuurpijl

NICI, Nijmegen Institute for Cognition and Information
University of Nijmegen, P.O.Box 9104
6500 HE Nijmegen, The Netherlands
Tel: +31 24 3616029 / Fax: +31 24 3616066
schomaker@nici.kun.nl
hwr.nici.kun.nl

cogn-eng.gif
 




projects in the Cognitive Engineering group at NICI:





Schomaker, de Leau & Vuurpijl





overview







  • image-based retrieval & the user

  • design

  • pattern recognition

  • performance





usability problems in image-based retrieval





  • There are already quite a few image-retrieval systems available on the WWW, but:

  • What do users want?

Question: "Did you need an image ..."        Yes   No   N/A
"...with a particular object on it?"         122   41     7
"...with a particular color on it?"           25  137     8
"...with a particular texture on it?"         23  137    10

(results of a WWW questionnaire, N = 170 responses)





usability problems in image-based retrieval
what do users want?



  • Object search!
    often: the 'basic categories' (Rosch, 1972)
    cf. Hoenkamp, Schomaker & Stegeman, SIGIR'99
  • Not: 'feature configurations'
    or 'layout structures'





queries and matching methods in image-based search


   Query             Matched with:                        Matching algorithm
A  keywords          manually provided textual            free text and information-
                     image annotations                    retrieval (IR) methods
B  keywords          textual and contextual information   free text and
                     in the image neighbourhood           IR methods
C  exemplar image    image bitmap                         template matching or
                                                          feature-based
D  layout structure  image bitmap                         texture and color segmentation
E  object outline    image bitmap, contours               feature-based
F  object sketch     image bitmap                         feature-based

figs/trees.gif
'outline'
= closed curve drawn around an object on a photograph





usability problems in image-based retrieval
questions:





  • are the users able to produce the queries?

  • do they like to use the query method?

  • what classification performance is required? (and how to measure performance?)

  • is the system able to explain the results? (Picard: 'explainable features')

  • can the system learn from previous queries in a user community?





design considerations





  1. focus on object-based representations and queries
  2. focus on photographic images with identifiable objects for which a verbal description can be given
  3. exploit the perceptual abilities of the user
  4. exploit human fine motor control: use a pen to draw object outlines
  5. allow for incremental annotation of image material (to obtain PR bootstrap)
  6. start with a limited content domain


figs/a-lot-of-horses.gif

(multiple outlines per photograph are allowed)

animal collection outlines



figs/horses.gif




typical bodyworks shape of motor bicycle



figs/bike-bodyworks-red.gif


(note the distribution of points of high curvature along the outline)

figs/bkmotor-nospeech-light.gif


A query to find an engine





bodyworks shapes of motor bicycle



figs/bodyworks-ok.gif






motor-bicycle collection driver shapes



figs/drivers-ok.gif






motor-bicycle collection engine shapes



figs/engines-ok.gif






motor-bicycle collection frame shapes



figs/frames-ok.gif






algorithm
matching possibilities



  • (a) match the query outline (x_i, y_i) with all outlines which are present in the database

  • (b) match the image content I(x,y) within the outline (x_i, y_i) with existing templates in the database

  • (c) match a query outline with the image edges ∇I(x,y) of unseen photographs (!)

Simple 1-NN matching will be used for all feature categories.
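As a rough sketch, the 1-NN ranking step could look as follows (illustrative only; `nearest_neighbour` and the toy feature vectors are not from the original system):

```python
import numpy as np

def nearest_neighbour(query, database):
    """Return indices of database entries sorted by Euclidean
    distance to the query feature vector (1-NN = first index).

    query:    1-D feature vector
    database: 2-D array, one feature vector per row
    """
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)

# toy usage: three stored outline feature vectors, one query
db = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2]])
ranking = nearest_neighbour(np.array([0.0, 0.1]), db)
print(ranking[0])  # → 0, the index of the best match
```

The same ranking routine can serve all three match variants, since each produces a fixed-length feature vector per outline.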





algorithm
outline features

figs/outline-features.gif
The raw outline is resampled to a fixed number of samples (100). The centre of gravity is translated to (0,0) and the size is normalized to an r.m.s. radius s_r of one, yielding the normalized outline (x̂_i, ŷ_i). From the starting point B, the matching process tries both the clockwise and the counter-clockwise direction, retaining the best result of the two match variants. Other normalizations, such as left/right or up/down mirroring, are optional. In addition, the running angles (cos φ, sin φ) are added as a feature group, as well as the angle histogram p(φ).
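The resampling and normalization steps above can be sketched as follows (a minimal Python illustration; `normalize_outline` is a hypothetical helper, and arc-length resampling is an assumption about how the 100 equidistant samples are obtained):

```python
import numpy as np

def normalize_outline(x, y, n_samples=100):
    """Resample a raw outline to a fixed number of points,
    translate its centre of gravity to (0,0) and scale it
    to unit r.m.s. radius. Sketch, not the original NICI code."""
    # resample by cumulative arc length to n_samples equidistant points
    d = np.hypot(np.diff(x), np.diff(y))
    s = np.concatenate(([0.0], np.cumsum(d)))
    t = np.linspace(0.0, s[-1], n_samples)
    xr = np.interp(t, s, x)
    yr = np.interp(t, s, y)
    # translate the centre of gravity to the origin
    xr -= xr.mean()
    yr -= yr.mean()
    # scale so that the r.m.s. radius equals one
    r = np.sqrt(np.mean(xr**2 + yr**2))
    return xr / r, yr / r
```

Running angles and their histogram can then be derived from the differences of consecutive normalized points.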





algorithm
image features

The following 68 features were derived from the pixels within the closed object outline:

color centroids The centre of gravity for each of the RGB channels, giving 6 features: R(x,y), G(x,y) and B(x,y)
color histogram The histogram of the occurrence of 8 main colors: black, blue, green, cyan, red, magenta, yellow and white
intensity histogram A histogram over 10 levels of pixel intensity
RGB statistics The minimum and maximum values of each of the RGB channels, together with their average and standard deviation (12 features)
texture descriptors A table of five textures was used, with five statistical features each (25 features)
invariant moments Seven statistical high-order moments which are invariant to size and rotation
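Two of the feature groups, the RGB statistics and the intensity histogram, can be sketched as follows (illustrative Python; the function names and the crude luminance estimate are assumptions, not the original implementation):

```python
import numpy as np

def rgb_stats(pixels):
    """Min, max, mean and standard deviation per RGB channel
    for the pixels inside an object outline (12 features).
    pixels: array of shape (n, 3), one RGB triple per pixel."""
    feats = []
    for c in range(3):
        ch = pixels[:, c]
        feats.extend([ch.min(), ch.max(), ch.mean(), ch.std()])
    return np.array(feats)

def intensity_histogram(pixels, n_bins=10):
    """Normalized histogram over n_bins levels of pixel intensity."""
    intensity = pixels.mean(axis=1)   # crude luminance estimate
    hist, _ = np.histogram(intensity, bins=n_bins, range=(0, 255))
    return hist / hist.sum()
```

The remaining groups (color centroids, color histogram, texture descriptors, invariant moments) follow the same pattern: each maps the masked pixel set to a fixed-length vector, so all 68 features can be concatenated for 1-NN matching.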





results
data set

Data set: 200 mixed JPEG and GIF photographs of motor bicycles. Within this set, 750 outlines were drawn around image parts in the following classes: exhaust, wheels, engine, frame, pedal, fuel tank, saddle, driver, mirror, license plate, bodyworks, head light, fuel tank lid, light, rear light; in total 15 object classes with 50 outline samples per class.





results
outline matching & within-outline image matching

Results are reported as the average percentage of correct hits in the top-10 hit list (P10), averaged over the n = 50 outline instances per class, each of which was used as a probe in nearest-neighbour matching. The query itself was excluded from the matching process.
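The P10 measure itself is simple to state in code (a sketch; `p10` is a hypothetical helper name):

```python
def p10(hit_list_classes, query_class, k=10):
    """Percentage of entries in the top-k hit list that share
    the query's class (the P10 measure used here, with k=10)."""
    top = hit_list_classes[:k]
    return 100.0 * sum(c == query_class for c in top) / k

# toy hit list: 7 of the top 10 hits share the query class
hits = ['wheel'] * 7 + ['engine'] * 3
print(p10(hits, 'wheel'))  # → 70.0
```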


Query           I. P10 (%)  II. P10 (%)     III. P10 (%)  IV. P10 (%)
                (x̂, ŷ)      (cos φ, sin φ)  p(φ)          image-based
wheels 77.6 81.8 36.0 58.2
exhaust 75.4 79.4 34.0 34.6
engine 57.0 51.4 31.6 49.6
frame 52.0 33.8 38.8 69.4
pedal 47.4 47.2 22.8 33.0
driver 43.6 43.4 20.2 50.2
saddle 41.4 39.2 15.0 20.2
fuel tank 41.4 43.2 23.2 22.8
mirror 40.6 39.8 11.2 22.4
license plate 36.0 47.8 30.2 21.8
bodywork 31.0 26.6 14.4 22.4
head light 30.6 38.2 13.2 30.4
fuel tank lid 29.6 35.8 25.8 23.4
light 21.6 19.4 11.0 27.4
rear light 14.8 14.8 9.0 33.0

Although the outlines generally outperform the pixel-based features in this experiment, a class-dependent feature selection may yield reversed results for some classes.





algorithm
outline vs edge matching

Ultimately, one will want to use the set of outlines to perform object classification in unseen images, for which only the 'bottom-up' edge representation can be computed. Assuming that scale and translation are already approximately correct, how well can we match the human-generated outlines with the edges?



For each point (X_i, Y_i) on a raw outline, a convolution is calculated as follows. Let ∇I(x,y) be an estimate of the smoothed absolute luminance gradient of an image I(x,y), averaged over a number of suitable directions. The local match between an outline point and the edge representation of the image can then be calculated as:

M(X_i, Y_i; ∇I)  =  Σ_{dx=-w}^{+w}  Σ_{dy=-w}^{+w}  ∇I(X_i+dx, Y_i+dy) / √(dx² + dy²)        (1)
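A direct reading of Eq. (1) in Python could look as follows. One assumption is made that the slide does not spell out: the centre point (dx = dy = 0), where the 1/√(dx²+dy²) weight is undefined, is skipped.

```python
import numpy as np

def local_match(grad, xi, yi, w):
    """Distance-weighted sum of edge strength around outline point
    (xi, yi), following Eq. (1). The centre point is skipped,
    since its weight is undefined (an assumption).

    grad: 2-D array holding the smoothed absolute gradient of I(x,y)
    """
    m = 0.0
    for dx in range(-w, w + 1):
        for dy in range(-w, w + 1):
            if dx == 0 and dy == 0:
                continue
            m += grad[yi + dy, xi + dx] / np.sqrt(dx**2 + dy**2)
    return m
```

Summing or maximizing M over all outline points then gives the two hit-list sorting criteria used on the next slide.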





results
outline vs edge matching

figs/edge-results.gif

Figure 1: Results of matching human-drawn outlines against bottom-up calculated image edges, expressed as the percentage of outline instances that are correctly associated with their original image. The two curves represent the results of sorting the hit list on the mean convolution output M (solid line) or on the maximum value of M (stippled line). This performance measure differs from the P10 table above, because here instances are matched as opposed to classes.





Improved outline vs edge matching

The outline vs edge matching results presented above can be improved. Since an object class often appears against a stereotypical background (a cow on a meadow, an engine part amid shaded metallic textures), it may be useful to perform class-dependent edge matching. This can be done by using the human-produced outlines as the training target for an MLP edge detector:

48icvo.gif

generic edge detector (spurious edge pixels!) 48icv_eq.gif

class-dependent edge detector (MLP 49x25x9x1)

48icv_mlp.gif
(training set: heterogeneous set of 100+ motor bicycles; the outline parts determine the edge target output per 7x7 field)

Note that these results are only preliminary, because scale and translation invariance have not been addressed here.
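The 49x25x9x1 topology from the slide can be sketched as a plain feed-forward pass (the weights below are random placeholders; in the experiment they were trained on the outline-derived edge targets):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# layer sizes from the slide: 7x7 input field -> 25 -> 9 -> 1
sizes = [49, 25, 9, 1]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def edge_probability(field_7x7):
    """Forward pass of the 49x25x9x1 MLP edge detector: maps a
    7x7 luminance field to an edge probability in (0, 1)."""
    a = field_7x7.reshape(-1)
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return float(a[0])
```

Sliding this detector over an image yields a class-dependent edge map, which suppresses the spurious edge pixels of the generic detector shown above.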

Conclusion

  • promising results (esp. when compared to HWR problems)

  • successfully applied to a set of aircraft images

  • computation time...

  • refinement of edge preprocessing will improve the 'bottom-up' search for outlines in unseen images

  • domain-dependent and object-dependent use of features: ideal environment for the multiple-agent paradigm

  • ongoing work: S/N ratios for mouse & pen-based outlines



