Comments on image processing

From Endrov


Pixel formats?

Because image processing is computationally heavy, many low-level issues are exposed to the user. One of these is which numerical values you can store, something you never have to think about when working with, for example, spreadsheets. The relevant properties are the size in bits, integer versus floating point, and signed versus unsigned. The importance of knowing these cannot be overemphasized.

  • Unsigned integers of N bits can store values within [0, 2^N-1]
  • Signed integers of N bits can store values within [-2^(N-1), 2^(N-1)-1] (two's complement, the usual representation). Essentially, one bit is sacrificed to represent the sign of the value, so the largest positive value is roughly halved.
  • Floating point values always have a sign and can store a large range. They keep only a limited number of significant digits and round values to fit into memory (the details are very complicated).
  • Integers are usually of size 8 ("byte/char"), 16 ("short"), 32 and 64 bits
  • Floating point values are commonly of size 32 ("floats") and 64 ("doubles") bits
  • 8 bits are 1 byte
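
As a rough illustration of the integer ranges listed above, the following sketch computes them for a given bit width. It assumes two's-complement signed integers (which is what Java uses); the class and method names are made up for this example and are not part of Endrov.

```java
final class IntegerRanges {
    /** Largest value of an unsigned integer with the given number of bits (1 to 32 here). */
    static long unsignedMax(int bits) {
        return (1L << bits) - 1;
    }

    /** Smallest value of a signed two's-complement integer with the given number of bits. */
    static long signedMin(int bits) {
        return -(1L << (bits - 1));
    }

    /** Largest value of a signed two's-complement integer with the given number of bits. */
    static long signedMax(int bits) {
        return (1L << (bits - 1)) - 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16, 32}) {
            System.out.printf("%2d bits: unsigned [0, %d], signed [%d, %d]%n",
                    bits, unsignedMax(bits), signedMin(bits), signedMax(bits));
        }
    }
}
```

For 8 bits this gives [0, 255] unsigned and [-128, 127] signed, matching Java's own `Byte.MIN_VALUE` and `Byte.MAX_VALUE`.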

You need to have these ranges in mind when carrying out operations. If an integer value doesn't fit, it will "overflow", which means it wraps around and continues from the other end of the range. From a user point of view, these are garbage values and you want to avoid them.
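
To make the wrap-around concrete, here is a small sketch of what happens when a sum is forced into 8 bits. The helper names are hypothetical; the unsigned case is emulated with a bit mask because Java has no unsigned 8-bit type.

```java
final class OverflowDemo {
    /** Adds two values and stores the result in a signed 8-bit byte, wrapping on overflow. */
    static byte addAsSignedByte(int a, int b) {
        return (byte) (a + b);
    }

    /** Adds two values as unsigned 8-bit numbers, wrapping on overflow into [0, 255]. */
    static int addAsUnsignedByte(int a, int b) {
        return (a + b) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(addAsSignedByte(127, 1));     // prints -128
        System.out.println(addAsUnsignedByte(200, 100)); // prints 44
    }
}
```

Neither result resembles the true sum (128 and 300), which is why overflowed values are best treated as garbage.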

Choice of pixel format when working with images

My opinion is that 8-bit cameras are still sufficient, since the signal noise usually already limits the useful precision. When processing the data, however, unsigned 8-bit is not suitable:

  • Usually the entire range of a pixel is used, so adding two pixel values together will easily overflow
  • Subtracting two pixels commonly gives a negative value. The sign is information you want to keep, so use a signed integer by all means
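
A minimal sketch of the two points above, assuming the raw pixels arrive as unsigned 8-bit values stored in Java `byte`s (a common convention, but an assumption here, as are the method names). The results are kept in a signed 16-bit `short` so that neither the sign nor the full sum is lost.

```java
final class PixelArithmetic {
    /** Difference of two unsigned 8-bit pixels, kept in a signed 16-bit short. */
    static short difference(byte a, byte b) {
        // "& 0xFF" reinterprets the signed Java byte as an unsigned value in [0, 255].
        return (short) ((a & 0xFF) - (b & 0xFF));
    }

    /** Sum of two unsigned 8-bit pixels, kept in a signed 16-bit short. */
    static short sum(byte a, byte b) {
        return (short) ((a & 0xFF) + (b & 0xFF));
    }

    public static void main(String[] args) {
        byte dark = (byte) 10;
        byte bright = (byte) 200;
        System.out.println(difference(dark, bright)); // prints -190
        System.out.println(sum(bright, bright));      // prints 400
    }
}
```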

A signed 16-bit integer fits the sum of 128 unsigned 8-bit integers (32767/255 ≈ 128.5), which covers an area of 11x11 or a volume of 5x5x5. This is enough to work with small neighborhoods, such as convolutions or local sums, but it is far from enough to sum the entire image. Signed 32-bit integers hold the sum of about 8.4 million 8-bit pixels, which is enough for this in most cases.
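
The arithmetic behind these numbers can be checked with a short sketch. It assumes the worst case, where every pixel has the maximum value; the class and method names are illustrative only.

```java
final class AccumulatorCapacity {
    /**
     * Number of unsigned pixels, all at their maximum value, whose sum still
     * fits in a signed accumulator of the given width.
     */
    static long maxSummands(int pixelBits, int accumulatorBits) {
        long pixelMax = (1L << pixelBits) - 1;
        long accumulatorMax = (1L << (accumulatorBits - 1)) - 1;
        return accumulatorMax / pixelMax;
    }

    public static void main(String[] args) {
        System.out.println(maxSummands(8, 16)); // prints 128
        System.out.println(maxSummands(8, 32)); // prints 8421504
    }
}
```

Since 11x11 = 121 and 5x5x5 = 125 are both at most 128, those neighborhoods fit in 16 bits even in the worst case.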

A point against using more than 8 bits in the input images is that even larger data types will then be needed down the road, which adds to memory consumption. For practical purposes, unsigned 12-bit is the maximum.

Look-up tables are no longer useful

Colors have in the past been remapped according to a table from "old color" to "new color", also called a LUT (Look-Up Table). These allowed fast, arbitrarily complicated maps for 8-bit images. These days 8-bit is rather limited when working with images (but might be enough during acquisition). 16-bit images require 128 KB tables, and 32-bit images require 16 GB tables. Floating point data cannot make use of LUTs at all since it is not properly discretized.
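
The table sizes follow from one entry per possible input value. The sketch below assumes each entry is as wide as the input pixel (an assumption about the table layout, which is what the figures above imply); the names are hypothetical.

```java
final class LutSize {
    /** Bytes needed for a table with one entry per possible input value. */
    static long lutBytes(int bits) {
        long entries = 1L << bits;     // one entry per representable value
        long bytesPerEntry = bits / 8; // each entry as wide as the pixel itself
        return entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println(lutBytes(8));  // prints 256
        System.out.println(lutBytes(16)); // prints 131072 (128 KB)
        System.out.println(lutBytes(32)); // prints 17179869184 (16 GB)
    }
}
```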

The solution is to apply the remapping functions explicitly and never create the tables. More instructions have to be executed, but memory I/O is reduced and floating point values can be used.
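
One way this could look in code is to pass the mapping as a function and evaluate it per pixel. This is only a sketch under that assumption, not how Endrov itself implements remapping; `ExplicitRemap` and `remap` are names invented for the example.

```java
import java.util.function.DoubleUnaryOperator;

final class ExplicitRemap {
    /** Applies the mapping to every pixel and returns a new array; no table is ever built. */
    static float[] remap(float[] pixels, DoubleUnaryOperator mapping) {
        float[] out = new float[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = (float) mapping.applyAsDouble(pixels[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] pixels = {0.0f, 1.5f, 100.0f};
        // Any function works as a mapping, including ones on non-integer values.
        float[] result = remap(pixels, v -> 2 * v + 1);
        System.out.println(java.util.Arrays.toString(result)); // prints [1.0, 4.0, 201.0]
    }
}
```

The cost is one function evaluation per pixel instead of one memory read, but the same code works unchanged for any pixel type that can be converted to a floating point value.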

Acquired images are different from viewed images

Those who work with computer graphics think of colors as being in the range 0-255 (for 8-bit), with a smallest and a largest possible pixel value. This turns out to be a very bad paradigm when working with acquired images. Instead, think of the pixel value as a photon count: there is no physical limit, but the pixel data type might impose a numerical limit. Endrov allows you to mix pixel data formats, and then it is important to stop thinking "8-bit value 255 is all white", because that simply does not fit with the other idea that "16-bit value 65535 is all white". The count 255 is the same no matter what the underlying type is. There is, however, a map into screen color space, which is user controlled. The important thing is that this map is applied only when viewing; it has nothing to do with the actual image processing.
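
The separation between stored counts and displayed intensity could be sketched as below. The linear window with user-chosen `low` and `high` limits is an assumption made for illustration; the actual mapping Endrov offers may differ, and the names are hypothetical.

```java
final class DisplayMapping {
    /**
     * Maps a pixel count to an 8-bit screen intensity using a user-chosen
     * display window [low, high]. Assumes high > low.
     */
    static int toScreen(double count, double low, double high) {
        double scaled = (count - low) / (high - low) * 255.0;
        // Clamp: counts outside the window are valid data, just not displayable.
        return (int) Math.round(Math.max(0.0, Math.min(255.0, scaled)));
    }

    public static void main(String[] args) {
        // The same count, 255, viewed through two different windows.
        System.out.println(toScreen(255, 0, 255));   // prints 255
        System.out.println(toScreen(255, 0, 65535)); // prints 1
    }
}
```

The stored value is never modified; only the window changes how it is shown, so "all white" is a property of the view and not of the data.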