Part (b) of tutorial, presented at ICDAR'99, Bangalore, Sept. 19.





Pen-based annotation of photographs

Lambert Schomaker
NICI / Nijmegen University
The Netherlands
hwr.nici.kun.nl


Overview

  • Co-workers: L. Vuurpijl, E. de Leau, A. Baris, M. Koenen, E. Hoenkamp

  • Image-based queries on WWW: existing methods and their problems

  • Query types

  • A pen-based annotation and query scheme

  • Experiments


 






Acronyms galore





  • IBIR - image-based information retrieval

  • CBIR - content-based image retrieval

  • QBIC - Query By Image Content



  • PBIR - pen-based image retrieval!


 






Existing Methods





  • QBIC (IBM)
  • VisualSEEk (Columbia)
  • FourEyes (MIT Media Lab)
  • WebSEEk (Columbia)
  • Excalibur
  • ImageRover
  • Chabot
  • Piction
  • (in short: a booming business)


QBIC (IBM)

Features:
colors, texture, edges, ... later: shape (bitmap) added
Matching:
layout, full-image templates, shape (bitmap, mouse)

figures/qbic-color-layout-query.gif

... but what was the user's intention with the query (top left photo = query)?



VisualSEEk

Features:
colors, texture, edges, ... later: shape (bitmap) added
Matching:
layout, full-image templates, shape (bitmap, mouse)

figures/screendump-VisualSEEk.gif

... cumbersome; feature selection/weighting requires knowledge ...


FourEyes (MIT Media Lab)

figures/four-eyes.gif

  • imposed block segmentation
  • textual annotation per block
  • labels are propagated on the basis of texture matching
... the relation between texture and object-related content is unclear; arbitrary segmentation creates ambiguity ...


 






Query types



     Query                  Matched with                                  Matching algorithm
  A  keywords               manually provided textual image annotations  free text and information-retrieval (IR) methods
  B  keywords               textual and contextual information in the    free text and IR methods
                            image neighbourhood
  C  exemplar image         image bitmap                                  template matching or feature-based
  D  rectangular sub-image  image bitmap                                  template matching or feature-based
  E  layout structure       image bitmap                                  texture and color segmentation
  F  object outline         image bitmap, contours                        feature-based
  G  object sketch          image bitmap                                  feature-based

(excluded from this table: point-and-click navigation in systematically organized image bases)


 






Problems





  • full-image template matching yields poor retrieval results

  • feature-based matching requires considerable input and knowledge from the user

  • layout-based matching suits only a subset of image needs

  • the reasons behind a retrieved image list are unclear: the features and the matching scheme are not easily explained to the user (Picard, 1995)

  • can the system learn from previous queries in a user community?


 






Ergonomic, Cognitive & Perceptual aspects





  • computer users continuously weigh the value of system responses against the effort spent on input actions

  • a WWW survey (N = 170 respondents) revealed that users are interested in objects (71%), not in layout, texture or abstract features; the preferred image type is photographs (68%) (Schomaker et al., 1999)

  • objects are best recognized from 'canonical views' (Blanz et al., 1999), and photographers know and exploit this


 






How to realize such object-based image search?





  • object recognition in an open domain? Not possible yet.

  • manual annotation / textual queries? Possible, but expensive. MPEG-4 and notably MPEG-7 allow for sophisticated annotation. But who is going to do it, the content provider or the user, and how?

  • "what if a form of annotation existed by which intelligent pattern classification could be bootstrapped, on the basis of machine learning?"


 






Design considerations





  1. focus on object-based representations and queries
  2. focus on photographic images with identifiable objects for which a verbal description can be given
  3. exploit the perceptual abilities of the human user
  4. exploit human fine motor control: use a pen to draw object outlines
  5. allow for incremental annotation of image material (to bootstrap pattern recognition)
  6. start with a limited content domain


figures/a-lot-of-horses.gif

(multiple outlines per photograph are allowed)



Example: animal collection, the outlines



figures/horses.gif


 






Example: 'bodyworks' shape of motorcycles



figures/bike-bodyworks-red.gif


(note the distribution of points of high curvature along the outline)

figures/bkmotor-nospeech-light.gif


A query to find an engine


 






Annotation





figures/retann.gif

  • after producing pen outlines, it is useful to ask the user for a text label (keyboard, speech or handwriting)

  • in the human cognitive representation of objects, a distinction can be made between 'basic categories' and sub-ordinate and super-ordinate levels (Rosch, 1978). The word "chair" produces a mental picture, while the more abstract "furniture" does not. This has consequences for the textual annotation of images.


 






Algorithm: a few matching possibilities



  • (a) match the query outline $(\vec{x}, \vec{y})$ with all outlines which are present in the database

  • (b) match the image content $I(x,y)$ within the outline $(\vec{x}, \vec{y})$ with existing templates in the database,

  • (c) match a query outline with the image edges $\nabla I(x,y)$ of unseen photographs (!)

Simple 1-NN matching will be used for all feature categories. Outline matching: search over the best starting point and clockwise/counterclockwise tracing direction (a sketch follows below).
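
As an illustration of (a), a minimal Python sketch of such an outline matcher follows. The resampling length m = 64, the function names and the mean point-wise Euclidean distance are assumptions for illustration, not the system's actual implementation; the translation/scale normalization discussed in the conclusions is omitted here.

  import numpy as np

  def resample(outline, m=64):
      """Resample a polyline of shape (k, 2) to m points spaced
      equidistantly along its arc length."""
      outline = np.asarray(outline, dtype=float)
      seg = np.hypot(*np.diff(outline, axis=0).T)    # segment lengths
      d = np.concatenate(([0.0], np.cumsum(seg)))    # cumulative arc length
      t = np.linspace(0.0, d[-1], m)
      return np.column_stack([np.interp(t, d, outline[:, 0]),
                              np.interp(t, d, outline[:, 1])])

  def outline_distance(a, b, m=64):
      """Mean point-wise distance, minimized over all cyclic starting
      points and both tracing directions (clockwise/counterclockwise)."""
      a, b = resample(a, m), resample(b, m)
      best = np.inf
      for cand in (b, b[::-1]):                      # both directions
          for s in range(m):                         # all starting points
              shifted = np.roll(cand, s, axis=0)
              best = min(best, np.hypot(*(a - shifted).T).mean())
      return best

  def query_1nn(query_outline, database):
      """1-NN retrieval: database is a list of (outline, label) pairs."""
      return min(database,
                 key=lambda e: outline_distance(query_outline, e[0]))[1]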


 










System architecture

figures/model.gif



  • HP-UX and Linux
  • C and Tcl/Tk
  • outline input: mouse or pen


 






Performance measurement aspects





  • class labels are needed (!?). Total number of instances is N. Number of instances per class is r.

  • a hit list of n images is assumed (e.g., n = 10)

  • Performance measures?

    • Precision P: the percentage of images in the retrieved hit list that belong to the intended class (i.e., relative to n)
    • Recall R: the percentage of intended images in the hit list, relative to the total number r of instances of that class (a sketch of both measures follows below)

  • Criterion for "good Precision"?
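
A minimal sketch of the two measures, assuming the hit list is given as a list of class labels (the function name and the example labels are hypothetical):

  def precision_recall(hit_labels, target, r):
      """hit_labels: class labels of the n retrieved images;
      target: the intended class; r: total number of instances of
      that class in the collection. Returns (P, R) as proportions."""
      hits = sum(1 for lab in hit_labels if lab == target)
      return hits / len(hit_labels), hits / r

  # e.g., 4 correct images in a hit list of n = 10, with r = 50 targets:
  P, R = precision_recall(list("ABAACABCBC"), "A", 50)
  print(P, R)   # 0.4 0.08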


 






Hit list: accident or meaningful?





figures/hit-list.gif


 






Hit list: accident or meaningful?





Assume a collection of N items of which r are of the target type (type A) and the remaining N-r are of another type (type B). Wanted is the probability of obtaining exactly X items of type A in a subset of n elements randomly drawn from the total of N items. Then X is distributed according to the hypergeometric distribution:

$$P(X = x) = \frac{\binom{r}{x} \binom{N-r}{n-x}}{\binom{N}{n}}$$

Given a precision proportion q for the result of a particular query, there will be x = n q correct items in the hit list. For a meaningful result we want q >> P(X = x).


Example (given N=750 images in total
               r=50  instances in target class, 
               n=16  images in hit list):

P(X=0) = 0.33
P(X=1) = 0.38  
P(X=2) = 0.21

i.e., finding 1 hit in a list of 16 is not so unlikely by chance: p = 0.38
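
The example values can be reproduced with a few lines of Python (a sketch; math.comb requires Python 3.8+):

  from math import comb

  def hypergeom_pmf(x, N, r, n):
      """P(exactly x target-class items in a hit list of n, drawn at
      random without replacement from N images containing r targets)."""
      return comb(r, x) * comb(N - r, n - x) / comb(N, n)

  N, r, n = 750, 50, 16
  for x in range(3):
      print(f"P(X={x}) = {hypergeom_pmf(x, N, r, n):.2f}")
  # P(X=0) = 0.33, P(X=1) = 0.38, P(X=2) = 0.21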


 






Test the concept: Can the users do it?



figures/outlines-33-gif.gif
Number of subjects producing an outline: 33; number of photographs: 10.
Photographs: brain, Buddha, Christmas tree, monster truck, jukebox, locomotive, motorcycle, mushroom cloud, pistol. Results kindly provided by Arie Baris.


 






Test the concept: Can the users do it?

figures/locomotive.gif

  • Answer: yes, but multiple interpretations are sometimes possible (locomotive with or without smoke)

  • Subjects differ in the precision with which they follow all curvature peaks


 






Hierarchical clustering on outlines

figures/outline_tree.gif

  • the cluster structure shows the separability of the shapes (a clustering sketch follows below)
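
One way to build such a tree, sketched here under the assumption that pairwise distances come from the outline matcher above and that SciPy's average-linkage clustering is an acceptable stand-in for the hierarchical method of Vuurpijl & Schomaker (1997):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram
  from scipy.spatial.distance import squareform

  def cluster_outlines(outlines):
      """Average-linkage clustering on pairwise outline distances
      (outline_distance as in the earlier sketch)."""
      n = len(outlines)
      d = np.zeros((n, n))
      for i in range(n):
          for j in range(i + 1, n):
              d[i, j] = d[j, i] = outline_distance(outlines[i], outlines[j])
      return linkage(squareform(d), method="average")  # condensed input

  # Z = cluster_outlines(outlines)
  # dendrogram(Z)   # the tree shows the separability of the shapes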


 






(b) Conclusion



  • Users can do it (but it is more 'expensive' than typing keywords)

  • Outline matching works well for objects of medium complexity.

  • The simpler the shape, the more important the pixel content within the outline becomes

  • Catch: to generalize via outline-to-image-edge matching,
    translation, scale, rotation and mirror invariance must be solved!
    This is easy in outline-to-outline matching (a sketch follows below).
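
Translation and scale invariance in outline-to-outline matching can indeed be handled by a simple normalization, sketched here; the choice of unit RMS radius as the scale measure is an assumption, not necessarily the system's actual method.

  import numpy as np

  def normalize_outline(outline):
      """Remove translation (centroid to origin) and scale (unit RMS
      radius). Mirror/direction differences are handled in matching by
      also trying the reversed point order; rotation invariance would
      require e.g. principal-axis alignment or a rotation search."""
      p = np.asarray(outline, dtype=float)
      p = p - p.mean(axis=0)                      # translation invariance
      rms = np.sqrt((p ** 2).sum(axis=1).mean())
      return p / rms                              # scale invariance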


 






(b) Conclusion (continued)



  • Problems of occlusion and perspective play less of a role than expected

  • Multiple-outline matching allows for geometric structure matching in multiple objects (Del Bimbo, 1998)

  • You can do more with a pen than handwriting alone!


 




References

Blanz, V., Tarr, M.J. & Buelthoff, H.H. (1999).
      What object attributes determine canonical views?
      Perception, 28, 575-599.

Del Bimbo, A. & Vicario, E. (1998).
      Using weighted spatial relationships in retrieval by visual contents.
      IEEE Workshop on Content-Based Access of Image and Video Databases,
      in conjunction with CVPR '98, Santa Barbara, California, USA, 35-39.

Pentland, A., Picard, R. & Sclaroff, S. (1994).
      Photobook: Tools for content-based manipulation of image databases.
      SPIE Storage and Retrieval of Image & Video Databases II, Feb. 1994.
      TR #255.

Picard, R.W. (1995).
      Light-years from Lena: Video and image libraries of the future.
      Proceedings of the International Conference on Image Processing
      (ICIP), Oct. 1995, Washington DC, USA, Vol. I, 310-313.

Rosch, E. (1978).
      Cognition and Categorization. Hillsdale, NJ: Erlbaum.

Schomaker, L., de Leau, E. & Vuurpijl, L. (1999).
      Using pen-based outlines for object-based annotation and
      image-based queries. In: D.P. Huijsmans & A.W.M. Smeulders (Eds.),
      Visual Information and Information Systems. New York: Springer,
      pp. 585-592.

Schomaker, L., Vuurpijl, L. & de Leau, E. (1999).
      New use for the pen: outline-based image queries.
      Proceedings of the 5th International Conference on Document Analysis
      and Recognition (ICDAR '99). Piscataway, NJ: IEEE, pp. 293-296.

Vuurpijl, L. & Schomaker, L. (1997).
      Finding structure in diversity: A hierarchical clustering method for
      the categorization of allographs in handwriting.
      Proceedings of the Fourth International Conference on Document Analysis
      and Recognition. Piscataway, NJ: IEEE CS, pp. 387-393.
      ISBN 981-02-3084-2.

URLs as of 14-9-1999

http://www.qbic.almaden.ibm.com/
http://www.media.mit.edu/~tpminka/photobook/foureyes/
http://www.ctr.columbia.edu/~jrsmith/VisualSEEk/


Tutorial "new pen-based applications" ICDAR'99 Bangalore. Copyright 1999 L. Schomaker cogn-eng.gif