Common Image Model

This document describes in detail the proposal for a Common Image Model.

See also the Image data access chapter for information about working with CVB images.

What constitutes an image?

Physically, images are created through optical projection of some section of the real world onto a two-dimensional surface, e.g. the retina or a CCD or CMOS device.

The incident light reacts with a discrete set of sensors (rods, cones, photosites, ...) distributed over this surface, changing the state of each sensor, which serves to record local properties such as gray value or color.

To us, an image in its most primitive sense is a state of such an ensemble of sensors. Mathematically, it is a function assigning to every sensor a state in that sensor's state space.

Parametrizing the sensors by their surface coordinates and assuming that their states can be modeled in some common, finite-dimensional vector space, you arrive at an analytical model of images:

1. There is a subset P of the plane whose coordinate pairs (X, Y) are called pixels.

2. There is some vector space V, of dimension d, the members of which may (but needn't) be called colors.

3. There is a function v assigning to every pixel (X, Y) a vector v(X, Y) in V.

The set P, the dimension d and the function v are the essential properties of an image.

With a basis of V fixed, (2) and (3) are equivalent to the existence of scalar functions v_0, v_1, ..., v_{d-1} such that v(X, Y) = (v_0(X, Y), v_1(X, Y), ..., v_{d-1}(X, Y)).

The functions v_i are essentially scalar (gray scale) images and are called color planes or, more generally, image planes (RedPlane, BluePlane, GreenPlane, HuePlane, IntensityPlane, SaturationPlane, ...).

Again equivalently, there is a scalar function v on [0, ..., d - 1] x P such that v(i, X, Y) = v_i(X, Y) (e.g. v(Green, 100, 120) = 17.5, ...).

This is the simplest and most general model of an image.

It includes gray scale images, color images, pyramids, stereo images, finite time sequences of images, ...
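
The equivalence between the vector-valued function v, the planes v_i and the scalar function v(i, X, Y) can be sketched in a few lines of Python. This is only an illustration of the abstract model, not CVB code; the names make_image, plane and value are hypothetical.

```python
# Illustrative sketch of the abstract image model (not CVB code).
# An image is a mapping from pixels (X, Y) in P to vectors in V.

def make_image(pixels, d, fill=0.0):
    """Return v as a dict: every pixel in P gets its own d-dimensional vector."""
    return {p: [fill] * d for p in pixels}

def plane(v, i):
    """Scalar image v_i: the i-th component of v at every pixel."""
    return {p: vec[i] for p, vec in v.items()}

def value(v, i, x, y):
    """Scalar function v(i, X, Y) on [0, ..., d - 1] x P."""
    return v[(x, y)][i]

# In the general model P need not be a rectangle: any set of pixels works.
P = {(100, 120), (101, 120), (100, 121)}
RED, GREEN, BLUE = 0, 1, 2          # indices with respect to a fixed basis of V
v = make_image(P, d=3)
v[(100, 120)][GREEN] = 17.5

# All three views of the image agree on the same number.
assert value(v, GREEN, 100, 120) == 17.5
assert plane(v, GREEN)[(100, 120)] == v[(100, 120)][GREEN]
```

A dictionary keyed by pixel is general but slow and memory hungry, which is exactly why the simplifications that follow are needed.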

To make the model suitable for computing, three problems have to be solved:

a) The representation of the model in memory

b) Given (X, Y), the rapid decision whether (X, Y) is in P (a legal pixel)

c) The rapid computation and/or modification of the function v

Problems (a) and (b) are made solvable by the following simplifications:

i. P is a sub-rectangle in a two-dimensional square lattice. In fact, since a global translation of image coordinates is not relevant, we can assume that (0, 0) is one of the corners of the rectangle.
Therefore P is specified by the integral image properties (Width, Height).
This allows problem (b) to be solved with four comparisons.
The lattice assumption causes problems with rotations and other geometric transformations, so interpolation mechanisms have to be provided.
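
The four-comparison test and the need for interpolation can be sketched as follows. This is a minimal illustration, not CVB code: the function names are hypothetical, and bilinear interpolation is used only as one common choice, since the text does not say which interpolation mechanisms CVB provides.

```python
import math

def is_legal_pixel(x, y, width, height):
    """Problem (b): four comparisons decide whether (x, y) lies in P."""
    return 0 <= x and x < width and 0 <= y and y < height

def bilinear(plane, width, height, x, y):
    """Estimate a plane (given as a list of rows) at a non-lattice point (x, y).

    Returns None unless all four neighbouring lattice points are legal pixels.
    For simplicity this also rejects points on the last row or column.
    """
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    # P is a rectangle, so checking two opposite corners covers all four.
    if not (is_legal_pixel(x0, y0, width, height)
            and is_legal_pixel(x1, y1, width, height)):
        return None
    fx, fy = x - x0, y - y0
    top = (1 - fx) * plane[y0][x0] + fx * plane[y0][x1]
    bottom = (1 - fx) * plane[y1][x0] + fx * plane[y1][x1]
    return (1 - fy) * top + fy * bottom

# A 2 x 2 gray scale plane; a rotation might ask for the value at (0.5, 0.5).
gray = [[0, 10],
        [20, 30]]
center = bilinear(gray, 2, 2, 0.5, 0.5)   # average of the four pixels
```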

ii. The range of each function v_i is a set T_i of numbers conveniently handled by a computer, anywhere from 8-bit scalars (the usual case) to 64-bit (double precision) scalars; T_i may vary from image plane to image plane.
The data type T_i should support basic arithmetic functions.
The number of bits used to represent the i'th image plane is a further property of an image.
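
To make the role of the bit count concrete, the sketch below computes the value range of an integer data type from its number of bits. It is an illustration only: value_range is a hypothetical helper, the per-plane bit depths are invented for the example, and floating point types are not covered.

```python
def value_range(bits, signed=False):
    """Smallest and largest integer representable with the given bit count."""
    if bits < 1:
        raise ValueError("bits must be positive")
    if signed:
        # Two's complement range.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

# Hypothetical three-plane image whose planes use different bit depths.
bits_per_plane = [8, 10, 16]
ranges = [value_range(b) for b in bits_per_plane]
```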

To summarise the definition of a CVB image, here is a list of its essential properties:

• There are positive integers Width and Height defining the basic rectangular domain P = [0, ..., Width - 1] x [0, ..., Height - 1] in the plane.

• There is a number d, called the dimension of the image and possibly interpreted as the number of color planes.

• For each i = 0, ..., d - 1 there is an ordinal data type T_i, identified by the number of bits involved, and a corresponding function v_i: P -> T_i.
The function v_i is interpreted as the i'th color plane.
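
The essential properties listed above can be collected in a small data structure. The following is a sketch of the model only, under the assumption of plain row-major storage; the class and method names are hypothetical and do not correspond to the CVB API, and stored values are not checked against the bit count.

```python
from dataclasses import dataclass, field

@dataclass
class Plane:
    bits: int    # number of bits identifying the data type T_i
    data: list   # row-major storage of v_i: data[y * width + x]

@dataclass
class Image:
    width: int
    height: int
    planes: list = field(default_factory=list)

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Width and Height must be positive integers")

    @property
    def dimension(self):
        """The number d of image planes."""
        return len(self.planes)

    def add_plane(self, bits, fill=0):
        self.planes.append(Plane(bits, [fill] * (self.width * self.height)))

    def _index(self, i, x, y):
        if not (0 <= i < self.dimension):
            raise IndexError("no such plane")
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("pixel outside P")
        return y * self.width + x

    def get(self, i, x, y):
        """Evaluate v_i(x, y)."""
        return self.planes[i].data[self._index(i, x, y)]

    def put(self, i, x, y, value):
        """Modify v_i at (x, y)."""
        self.planes[i].data[self._index(i, x, y)] = value

# A 4 x 3 image with three 8-bit planes, e.g. an RGB image.
rgb = Image(width=4, height=3)
for _ in range(3):
    rgb.add_plane(bits=8)
rgb.put(1, 2, 1, 200)
```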

CVB Image

An image in the sense of CVB can therefore be regarded as a vertical stack of one-dimensional (single-plane) images.
The height of the stack is the dimension of V (e.g. 1 for gray scale images, 3 for RGB color images, 2 for stereo or complex images, the height of the pyramid for pyramids, the length of the time sequence for time sequences, etc.).

The diagram illustrating the image concept shows the following elements:

• The coordinate system, defined by an origin and a 2x2 matrix

• The number of image planes (dimension), e.g. 1 for monochrome, 3 for RGB

• A data type of up to 255 bits per pixel per plane, signed or unsigned, integer or floating point

• A Virtual Pixel Address Table (VPAT) per plane

The planes are doubly linked lists which can be manipulated.
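
The text names the VPAT without describing its layout. The sketch below shows one way such a table can work, assuming it stores one offset per column and one offset per row, so that the address of a pixel is the sum of the two. The function names and the layout are assumptions made for illustration, not the CVB API.

```python
def linear_vpat(width, height, bytes_per_pixel=1):
    """Offset tables for a plain row-major plane without padding (assumed layout)."""
    x_entries = [x * bytes_per_pixel for x in range(width)]
    y_entries = [y * width * bytes_per_pixel for y in range(height)]
    return x_entries, y_entries

def read_pixel(buffer, vpat, x, y):
    """Look up an 8-bit pixel: offset = x_entries[x] + y_entries[y]."""
    x_entries, y_entries = vpat
    return buffer[x_entries[x] + y_entries[y]]

# A 3 x 2 plane with one byte per pixel.
buffer = bytes([1, 2, 3,
                4, 5, 6])
vpat = linear_vpat(3, 2)

# Flip the plane vertically by reversing the row offsets.
# The pixel data itself is neither copied nor modified.
flipped = (vpat[0], vpat[1][::-1])
```

Under this assumption, geometric operations such as mirroring reduce to rewriting the table, which addresses problem (c) without touching the pixel data.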