Part (b) of tutorial, presented at ICDAR'99, Bangalore, Sept. 19.





Pen-based annotation of photographs

Lambert Schomaker
NICI / Nijmegen University
The Netherlands
hwr.nici.kun.nl


Overview

  • Co-workers: L. Vuurpijl, E. de Leau, A. Baris, M. Koenen, E. Hoenkamp

  • Image-based queries on WWW: existing methods and their problems

  • Query types

  • A pen-based annotation and query scheme

  • Experiments


 






Acronyms galore





  • IBIR - image-based information retrieval

  • CBIR - content-based image retrieval

  • QBIC - Query By Image Content



  • PBIR - pen-based image retrieval!


 






Existing Methods





  • QBIC (IBM)
  • VisualSEEk (Columbia)
  • FourEyes (MIT Media Lab)
  • WebSEEk (Columbia)
  • Excalibur
  • ImageRover
  • Chabot
  • Piction
  • (in short: a booming business)


QBIC (IBM)

Features:
colors, texture, edges, ... later: shape (bitmap) added
Matching:
layout, full-image templates, shape (bitmap, mouse)

figures/qbic-color-layout-query.gif

... but what was the user's intention with the query (top left photo = query)?



VisualSEEk

Features:
colors, texture, edges, ... later: shape (bitmap) added
Matching:
layout, full-image templates, shape (bitmap, mouse)

figures/screendump-VisualSEEk.gif

... cumbersome; feature selection/weighting requires knowledge ...


FourEyes (MIT Media Lab)

figures/four-eyes.gif

  • imposed block segmentation
  • textual annotation per block
  • labels are propagated on the basis of texture matching
... the relation between texture and object-related content is unclear; arbitrary segmentation creates ambiguity ...


 






Query types



     Query                  Matched with                                  Matching algorithm
  A  keywords               manually provided textual image annotations  free text and information-retrieval (IR) methods
  B  keywords               textual and contextual information in the    free text and IR methods
                            image neighbourhood
  C  exemplar image         image bitmap                                  template matching or feature-based
  D  rectangular sub-image  image bitmap                                  template matching or feature-based
  E  layout structure       image bitmap                                  texture and color segmentation
  F  object outline         image bitmap, contours                        feature-based
  G  object sketch          image bitmap                                  feature-based

(excluded from this table: point-and-click navigation in systematically organized image bases)


 






Problems





  • full-image template matching yields poor retrieval results

  • feature-based matching requires considerable input and knowledge from the user

  • layout-based matching suits only a subset of image needs

  • the reasons behind a retrieved image list are unclear: the features and the matching scheme are not easily explained to the user (Picard, 1995)

  • can the system learn from previous queries in a user community?


 






Ergonomic, Cognitive & Perceptual aspects





  • computer users continuously weigh the value of system responses against the effort spent on input actions

  • a WWW survey (N = 170 respondents) revealed that users are interested in objects (71%), not in layout, texture or abstract features; the preferred image type is photographs (68%) (Schomaker et al., 1999)

  • objects are best recognized from 'canonical views' (Blanz et al., 1999), and photographers know and exploit this


 






How to realize such object-based image search?





  • object recognition in an open domain? Not possible yet.

  • manual annotation / textual queries? Possible, but expensive. MPEG-4 and notably MPEG-7 allow for sophisticated annotation. But who is going to do it, the content provider or the user, and how?

  • "what if a form of annotation existed by which intelligent pattern classification could be bootstrapped, on the basis of machine learning?"


 






Design considerations





  1. focus on object-based representations and queries
  2. focus on photographic images with identifiable objects for which a verbal description can be given
  3. exploit the perceptual abilities of the human user
  4. exploit human fine motor control: use a pen to draw object outlines
  5. allow for incremental annotation of image material (to bootstrap pattern recognition)
  6. start with a limited content domain


figures/a-lot-of-horses.gif

(multiple outlines per photograph are allowed)



Example: animal collection, the outlines



figures/horses.gif


 






Example: 'bodyworks' shape of motorcycles



figures/bike-bodyworks-red.gif


(note the distribution of points of high curvature along the outline)

figures/bkmotor-nospeech-light.gif


A query to find an engine


 






Annotation





figures/retann.gif

  • after producing pen outlines, it is useful to ask the user for a text label (keyboard, speech or handwriting)

  • in the human cognitive representation of objects, a distinction can be made between 'basic categories' and sub-ordinate and super-ordinate levels (Rosch, 1978). The word "chair" produces a mental picture, while the more abstract "furniture" does not. This has consequences for the textual annotation of images.


 






Algorithm: a few matching possibilities



  • (a) match the query outline $(\vec{x}, \vec{y})$ with all outlines which are present in the database

  • (b) match the image content $I(x,y)$ within the outline $(\vec{x}, \vec{y})$ with existing templates in the database,

  • (c) match a query outline with the image edges $\nabla I(x,y)$ of unseen photographs (!)

Simple 1-NN matching will be used for all feature categories. Outline matching: search over the best starting point and clockwise/counterclockwise tracing direction (a sketch follows below).
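
As an illustration of (a), a minimal Python sketch of such an outline matcher follows. The resampling length m = 64, the function names and the mean point-wise Euclidean distance are assumptions for illustration, not the system's actual implementation; the translation/scale normalization discussed in the conclusions is omitted here.

  import numpy as np

  def resample(outline, m=64):
      """Resample a polyline of shape (k, 2) to m points spaced
      equidistantly along its arc length."""
      outline = np.asarray(outline, dtype=float)
      seg = np.hypot(*np.diff(outline, axis=0).T)    # segment lengths
      d = np.concatenate(([0.0], np.cumsum(seg)))    # cumulative arc length
      t = np.linspace(0.0, d[-1], m)
      return np.column_stack([np.interp(t, d, outline[:, 0]),
                              np.interp(t, d, outline[:, 1])])

  def outline_distance(a, b, m=64):
      """Mean point-wise distance, minimized over all cyclic starting
      points and both tracing directions (clockwise/counterclockwise)."""
      a, b = resample(a, m), resample(b, m)
      best = np.inf
      for cand in (b, b[::-1]):                      # both directions
          for s in range(m):                         # all starting points
              shifted = np.roll(cand, s, axis=0)
              best = min(best, np.hypot(*(a - shifted).T).mean())
      return best

  def query_1nn(query_outline, database):
      """1-NN retrieval: database is a list of (outline, label) pairs."""
      return min(database,
                 key=lambda e: outline_distance(query_outline, e[0]))[1]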


 










System architecture

figures/model.gif



  • HP-UX and Linux
  • C and Tcl/Tk
  • outline input: mouse or pen


 






Performance measurement aspects





  • class labels are needed (!?). Total number of instances is N. Number of instances per class is r.

  • a hit list of n images is assumed (e.g., n = 10)

  • Performance measures?

    • Precision P: the percentage of images in the retrieved hit list that belong to the intended class (i.e., relative to n)
    • Recall R: the percentage of intended images in the hit list, relative to the total number r of instances of that class (a sketch of both measures follows below)

  • Criterion for "good Precision"?
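
A minimal sketch of the two measures, assuming the hit list is given as a list of class labels (the function name and the example labels are hypothetical):

  def precision_recall(hit_labels, target, r):
      """hit_labels: class labels of the n retrieved images;
      target: the intended class; r: total number of instances of
      that class in the collection. Returns (P, R) as proportions."""
      hits = sum(1 for lab in hit_labels if lab == target)
      return hits / len(hit_labels), hits / r

  # e.g., 4 correct images in a hit list of n = 10, with r = 50 targets:
  P, R = precision_recall(list("ABAACABCBC"), "A", 50)
  print(P, R)   # 0.4 0.08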


 






Hit list: accident or meaningful?





figures/hit-list.gif


 






Hit list: accident or meaningful?





Assume a collection of N items of which r are of the target type (type A) and the remaining N-r are of another type (type B). Wanted is the probability of obtaining exactly X items of type A in a subset of n elements randomly drawn from the total of N items. Then X is distributed according to the hypergeometric distribution:

$$P(X = x) = \frac{\binom{r}{x} \binom{N-r}{n-x}}{\binom{N}{n}}$$

Given a precision proportion q for the result of a particular query, there will be x = n q correct items in the hit list. For a meaningful result we want q >> P(X = x).


Example (given N=750 images in total
               r=50  instances in target class, 
               n=16  images in hit list):

P(X=0) = 0.33
P(X=1) = 0.38  
P(X=2) = 0.21

i.e., finding 1 hit in a list of 16 is not so unlikely by chance: p = 0.38
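
The example values can be reproduced with a few lines of Python (a sketch; math.comb requires Python 3.8+):

  from math import comb

  def hypergeom_pmf(x, N, r, n):
      """P(exactly x target-class items in a hit list of n, drawn at
      random without replacement from N images containing r targets)."""
      return comb(r, x) * comb(N - r, n - x) / comb(N, n)

  N, r, n = 750, 50, 16
  for x in range(3):
      print(f"P(X={x}) = {hypergeom_pmf(x, N, r, n):.2f}")
  # P(X=0) = 0.33, P(X=1) = 0.38, P(X=2) = 0.21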


 






Test the concept: Can the users do it?



figures/outlines-33-gif.gif
Number of subjects producing an outline: 33; number of photographs: 10.
Photographs: brain, Buddha, Christmas tree, monster truck, jukebox, locomotive, motorcycle, mushroom cloud, pistol. Results kindly provided by Arie Baris.


 






Test the concept: Can the users do it?

figures/locomotive.gif

  • Answer: yes, but multiple interpretations are sometimes possible (locomotive with or without smoke)

  • Subjects differ in the precision with which they follow all curvature peaks


 






Hierarchical clustering on outlines

figures/outline_tree.gif

  • the cluster structure shows the separability of the shapes (a clustering sketch follows below)
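
One way to build such a tree, sketched here under the assumption that pairwise distances come from the outline matcher above and that SciPy's average-linkage clustering is an acceptable stand-in for the hierarchical method of Vuurpijl & Schomaker (1997):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram
  from scipy.spatial.distance import squareform

  def cluster_outlines(outlines):
      """Average-linkage clustering on pairwise outline distances
      (outline_distance as in the earlier sketch)."""
      n = len(outlines)
      d = np.zeros((n, n))
      for i in range(n):
          for j in range(i + 1, n):
              d[i, j] = d[j, i] = outline_distance(outlines[i], outlines[j])
      return linkage(squareform(d), method="average")  # condensed input

  # Z = cluster_outlines(outlines)
  # dendrogram(Z)   # the tree shows the separability of the shapes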


 






(b) Conclusion



  • Users can do it (but it is more 'expensive' than typing keywords)

  • Outline matching works well for objects of medium complexity.

  • The simpler the shape, the more important the pixel content within the outline becomes

  • Catch: to generalize via outline-to-image-edge matching,
    translation, scale, rotation and mirror invariance must be solved!
    This is easy in outline-to-outline matching (a sketch follows below).
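
Translation and scale invariance in outline-to-outline matching can indeed be handled by a simple normalization, sketched here; the choice of unit RMS radius as the scale measure is an assumption, not necessarily the system's actual method.

  import numpy as np

  def normalize_outline(outline):
      """Remove translation (centroid to origin) and scale (unit RMS
      radius). Mirror/direction differences are handled in matching by
      also trying the reversed point order; rotation invariance would
      require e.g. principal-axis alignment or a rotation search."""
      p = np.asarray(outline, dtype=float)
      p = p - p.mean(axis=0)                      # translation invariance
      rms = np.sqrt((p ** 2).sum(axis=1).mean())
      return p / rms                              # scale invariance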


 






(b) Conclusion (continued)



  • Problems of occlusion and perspective play less of a role than expected

  • Multiple-outline matching allows for geometric structure matching in multiple objects (Del Bimbo, 1998)

  • You can do more with a pen than handwriting alone!


 




References

Blanz, V., Tarr, M.J. & Buelthoff, H.H. (1999).
      What object attributes determine canonical views?
      Perception, 28, 575-599.

Del Bimbo, A. & Vicario, E. (1998).
      Using weighted spatial relationships in retrieval by visual contents.
      IEEE Workshop on Content-Based Access of Image and Video Databases,
      in conjunction with CVPR '98, Santa Barbara, California, USA, 35-39.

Pentland, A., Picard, R. & Sclaroff, S. (1994).
      Photobook: Tools for content-based manipulation of image databases.
      SPIE Storage and Retrieval of Image & Video Databases II, Feb. 1994.
      TR #255.

Picard, R.W. (1995).
      Light-years from Lena: Video and image libraries of the future.
      Proceedings of the International Conference on Image Processing
      (ICIP), Oct. 1995, Washington DC, USA, Vol. I, 310-313.

Rosch, E. (1978).
      Cognition and Categorization. Hillsdale, NJ: Erlbaum.

Schomaker, L., de Leau, E. & Vuurpijl, L. (1999).
      Using pen-based outlines for object-based annotation and
      image-based queries. In: D.P. Huijsmans & A.W.M. Smeulders (Eds.),
      Visual Information and Information Systems. New York: Springer,
      pp. 585-592.

Schomaker, L., Vuurpijl, L. & de Leau, E. (1999).
      New use for the pen: outline-based image queries.
      Proceedings of the 5th International Conference on Document Analysis
      and Recognition (ICDAR '99). Piscataway, NJ: IEEE, pp. 293-296.

Vuurpijl, L. & Schomaker, L. (1997).
      Finding structure in diversity: A hierarchical clustering method for
      the categorization of allographs in handwriting.
      Proceedings of the Fourth International Conference on Document Analysis
      and Recognition. Piscataway, NJ: IEEE CS, pp. 387-393.
      ISBN 981-02-3084-2.

URLs as of 14-9-1999

http://www.qbic.almaden.ibm.com/
http://www.media.mit.edu/~tpminka/photobook/foureyes/
http://www.ctr.columbia.edu/~jrsmith/VisualSEEk/


Tutorial "new pen-based applications" ICDAR'99 Bangalore. Copyright 1999 L. Schomaker cogn-eng.gif