Authors

Sreevatsan Madhavan is a Technical Lead in the Sensors Engineering Technology Unit (TU) at MulticoreWare. He is an expert in Image Signal Processing and camera sensors and has worked on several ISP platforms. He has extensive experience in ISP tuning, imaging algorithms, and image quality analysis.
Gokhun Tanyer is the Technology Unit Head of Sensor Fusion Engineering at MulticoreWare. He excels in systems engineering and next-generation algorithms for radar, MIMO, sparse arrays, super-resolution, activity perception, and multi-sensor fusion. He has won numerous awards and accolades for his publications and holds patents for his work on automotive radar techniques.
Introduction
Modern sensing requires data that is optimal for feature extraction, whether through classical computer vision systems or deep learning-based models. Most data used for sensing tasks (object detection, tracking, localization, mapping, segmentation, etc.) takes the form of digital images of different modalities (RGB, depth maps, point clouds, etc.). Image Signal Processors (ISPs) and the optimal tuning of imaging pipelines have a significant impact on the output of modern sensing systems, both in model accuracy and in the wealth of information conveyed. This article provides an overview of ISPs, the different blocks of an ISP, and their effects.
1. An Overview of ISPs
Image Signal Processors (ISPs) are subsystems within SoCs that handle the pre-processing of image data (snapshot frames, video streams) collected by the SoC. This includes image processing operations such as color correction and white balance. The output digital image/video data can serve any edge application such as object detection, both as training data for the model and as live data whose inference results constitute the information consumed by the end user. The indispensability of ISPs arises from:
- The rapid evolution of image sensor technology (increased information capacity through advances in semiconductor technology, balanced against storage and latency constraints), which requires image processing algorithms to be implemented in hardware for pre-processing.
- Image data being at the forefront of many modern sensing applications, which necessitates efficient capture and handling of large volumes of such data.
With the growing need for accurate and robust models deployed in large-scale commercial and industrial environments, Image Quality (IQ) tuning has become a fundamental step in the development of perception-based machine learning models.
2. Essential Components of an ISP
A simple ISP used to convert RAW data into RGB image data consists of a pipeline of individual modules, each of which performs a specific operation. The high-level breakdown of the ISP is as follows:
- ADC – Analog to Digital Converter: Converts analog signals recorded by the CMOS image sensor into digital signals.
- Memory unit: Stores image data at various parts of the pipeline for quick and efficient processing of image frames.
- Signal processor: Performs signal processing operations on the RAW digital signals to convert them into a standard digital format while enhancing image quality. This module is divided into sub-modules such as linearization and black level subtraction, lens shading correction, etc. The different blocks of an ISP and their effects are discussed briefly below, starting with a minimal software sketch of the pipeline after this list.
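As a rough sketch of how such a pipeline is composed, the Python snippet below chains illustrative placeholder stages (identity operations here, not any specific ISP's API); sketches of the individual stages appear in the sections that follow:

```python
import numpy as np

# Placeholder stages (identity operations in this sketch); the order
# follows the sub-modules described in the sections below.
STAGES = [
    ("linearization / black level", lambda x: x),
    ("lens shading correction",     lambda x: x),
    ("CFA interpolation",           lambda x: x),
    ("white balance",               lambda x: x),
    ("color correction",            lambda x: x),
    ("noise filtering",             lambda x: x),
    ("edge enhancement",            lambda x: x),
]

def run_pipeline(raw: np.ndarray) -> np.ndarray:
    """Run RAW sensor data through each pipeline stage in order."""
    x = raw.astype(np.float32)
    for name, stage in STAGES:
        x = stage(x)  # each stage consumes and produces an image array
    return x
```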
2.1. Linearization and Black Level Subtraction
The image sensor outputs a non-linear response for most of the received signal across every color channel. This is a consequence of the tone mapping applied to the RAW data to compress the dynamic range of the sensor output. These non-linear responses must be converted to linear ones before image processing steps such as white balance can be applied; this conversion is referred to as Linearization.
In the case of high bit-depth sensors used in WDR (Wide Dynamic Range) applications, the data is compressed using a PWL (Piece-Wise Linear) transform; this is called companding. The RAW data received from the sensor must be de-companded using the PWL LUT before further processing. In addition, every camera sensor produces a non-zero output current even when the received signal is zero (dark current). This dark current offsets the actual pixel values recorded by the sensor, so it must be estimated and removed before any image processing is applied. The estimated value is called the Black Level.
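A minimal sketch of de-companding and black-level subtraction with NumPy, assuming an invented 12-bit-to-20-bit PWL curve and an illustrative black level (real knee points and offsets come from the sensor datasheet or calibration):

```python
import numpy as np

# Illustrative PWL de-companding LUT: companded code -> linear value.
# The knee points below are invented for this sketch.
COMPANDED_KNEES = np.array([0, 512, 2048, 4095], dtype=np.float32)
LINEAR_KNEES    = np.array([0, 2048, 65536, 1048575], dtype=np.float32)

BLACK_LEVEL = 64.0  # illustrative dark-current offset in the linear domain

def decompand(raw: np.ndarray) -> np.ndarray:
    """Invert the sensor's piece-wise linear compression via the LUT."""
    return np.interp(raw.astype(np.float32), COMPANDED_KNEES, LINEAR_KNEES)

def subtract_black_level(linear: np.ndarray) -> np.ndarray:
    """Remove the estimated dark-current offset, clamping at zero."""
    return np.clip(linear - BLACK_LEVEL, 0.0, None)
```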
2.2. Lens Shading Correction
Lens shading / vignetting is a consequence of the lens arrangement placed over the active area of an image sensor. The intensity of light reaching the sensor falls off with distance from the optical center, and the effect becomes more pronounced in wide-FOV (wide-angle lens) applications. Some photographic applications treat a certain level of vignetting as a desirable effect; however, most human- or machine-perception applications consider shading an undesirable artifact that must be removed for accurate outputs.
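As an illustration, lens shading correction can be modeled as a radial gain map, here a simple even polynomial in the normalized distance from the optical center with made-up coefficients (real ISPs typically use per-channel gain grids obtained from flat-field calibration):

```python
import numpy as np

def lens_shading_correct(img: np.ndarray, k2: float = 0.4,
                         k4: float = 0.2) -> np.ndarray:
    """Apply a radial gain map: gain = 1 + k2*r^2 + k4*r^4, where r is
    the normalized distance from the image center. k2 and k4 are
    illustrative values, not calibrated coefficients."""
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (cx ** 2 + cy ** 2)
    gain = 1.0 + k2 * r2 + k4 * r2 ** 2
    if img.ndim == 3:
        gain = gain[..., None]  # broadcast the gain over color channels
    return img * gain
```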
2.3. Color Filter Array Interpolation
Most RGB camera sensors do not record 3-channel data for every pixel at capture time. Each CMOS pixel on the sensor captures only a single channel (R, G, or B). This type of data is commonly referred to as Bayer data, and there are 4 major patterns (RGGB, GRBG, GBRG, BGGR) used to capture it.
This is achieved by placing a color filter array (CFA) on top of the CMOS pixels so that only one color channel's response passes through at each pixel. The patterns used in CFA design are called Bayer patterns, and the resulting data is called mosaiced data.
The processing step in which Bayer data is converted into 3-channel RGB data, using some form of pixel interpolation, is called CFA interpolation (commonly referred to as de-mosaicing).
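As a hedged example, the classic bilinear demosaic for the RGGB arrangement can be written with small convolution kernels; production ISPs use edge-aware interpolation to avoid the zipper artifacts this simple version produces:

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear_rggb(raw: np.ndarray) -> np.ndarray:
    """Bilinear CFA interpolation of an RGGB Bayer mosaic (H x W array)."""
    h, w = raw.shape
    # Binary masks marking where each channel was actually sampled.
    r_mask = np.zeros((h, w), dtype=np.float32)
    b_mask = np.zeros((h, w), dtype=np.float32)
    r_mask[0::2, 0::2] = 1.0
    b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask

    # Green has 4 axial neighbors at every missing site; red/blue are
    # sampled more sparsely and need the wider kernel.
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], np.float32) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], np.float32) / 4.0

    r = convolve2d(raw * r_mask, k_rb, mode="same")
    g = convolve2d(raw * g_mask, k_g,  mode="same")
    b = convolve2d(raw * b_mask, k_rb, mode="same")
    return np.dstack([r, g, b])
```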
2.4. White Balance
Any scene illuminated by a light source (daylight/sunlight, LED streetlights, indoor incandescent bulbs) carries a color cast over the entire scene, depending on the Correlated Color Temperature of the illuminant (measured in Kelvin).
For most human/machine vision applications, it is desirable to have a neutral scene without a color tint. The white balance algorithm estimates the color temperature of the scene and applies gains to each color channel (R, G, B) to correct this color cast.
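A minimal sketch of the popular gray-world estimator, which assumes the scene averages to neutral gray (real AWB algorithms combine several estimators and constrain the gains to plausible illuminants):

```python
import numpy as np

def gray_world_awb(rgb: np.ndarray) -> np.ndarray:
    """Scale R and B so all channel means match the green channel's mean."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    gains = means[1] / means  # gains normalized so green stays at 1.0
    return rgb * gains
```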
2.5. Color Correction
Color imbalance is intrinsically present in images captured by modern RGB sensors, since the Bayer format samples the green channel twice as often as red or blue (mosaiced data). Color fidelity is also display-dependent: modern display systems have widely varying parameters, including what is referred to as display gamma.
The color correction algorithm uses a reference of the desired display characteristics to correct the colors in an image so that it appears visually acceptable / pleasing on the display in question. This processing step is often skipped for data fed to machine vision applications, as vision models tend to produce more accurate results on natural, uncorrected images.
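As an illustration, color correction is commonly implemented as a 3x3 matrix applied in linear light, followed by gamma encoding for the target display. The matrix below is invented for this sketch (rows sum to 1 so neutral grays are preserved); real matrices come from color-chart calibration:

```python
import numpy as np

# Illustrative 3x3 color correction matrix, not a calibrated one.
CCM = np.array([
    [ 1.60, -0.40, -0.20],
    [-0.30,  1.50, -0.20],
    [-0.10, -0.50,  1.60],
], dtype=np.float32)

def color_correct(rgb_linear: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Apply the CCM to linear RGB in [0, 1], then gamma-encode."""
    corrected = np.clip(rgb_linear @ CCM.T, 0.0, 1.0)
    return corrected ** (1.0 / gamma)
```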
2.6. Noise Filtering
Camera image sensors, by virtue of the technology used in their fabrication, are susceptible to degradation from electronic noise (shot, thermal, amplifier) and environmental noise. These noise components are random in nature and can be modelled statistically (Gaussian and Poisson distributions). As scene brightness decreases, the signal amplifier applies higher gains during capture (analog gain) and pre-processing (digital gain), which amplifies the noise along with the real signal.
Noise reduction is therefore an integral part of image signal processing, particularly for human vision, since the human eye is sensitive to such variations in intensity. Machine vision, however, is more tolerant of noise fluctuations, and this step is often skipped because complex models treat noise characteristics as part of the input (models are trained on noisy data to develop robustness to these characteristics).
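For human-vision outputs, an edge-preserving filter such as OpenCV's bilateral filter is a common choice; the parameters below are illustrative, and a real ISP scales the filter strength with the analog/digital gain in use (i.e., it filters harder in low light):

```python
import cv2
import numpy as np

def denoise(img_u8: np.ndarray) -> np.ndarray:
    """Edge-preserving noise reduction on an 8-bit image using a
    bilateral filter; d, sigmaColor, sigmaSpace are illustrative."""
    return cv2.bilateralFilter(img_u8, d=7, sigmaColor=30, sigmaSpace=7)
```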
2.7. Edge Enhancement
Modern cameras in machine vision applications such as object detection and character recognition must resolve data down to the most minute details (texture details, alphanumeric symbols, etc.). State-of-the-art image sensors have very high resolutions, with millions of pixels recorded in every frame.
Noise, an undesired artifact, is removed by filtering the data, but minute details are often indistinguishable from high-frequency noise and are affected as well. Edge enhancement after noise reduction helps recover many of the details impacted by noise filtering.
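A common form of edge enhancement is unsharp masking: add back a scaled copy of the high-frequency residual (the image minus its Gaussian blur). This sketch uses illustrative strength and blur parameters; ISPs typically gate the boost by local gradient so flat, noisy regions are not re-amplified:

```python
import cv2
import numpy as np

def edge_enhance(img: np.ndarray, amount: float = 0.5) -> np.ndarray:
    """Unsharp mask: out = img + amount * (img - blur(img))."""
    blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)
```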
Conclusion
MulticoreWare has a wealth of expertise in developing algorithms for RGB camera image processing using traditional Computer Vision (CV) algorithms and Deep Learning techniques. Excelling in sensor-driven autonomy, our expertise spans LiDAR, Radar, ToF cameras, thermal, and other sensors. To learn more, write to us at info@multicorewareinc.com.