This case study describes our role in optimizing a Chroma Correction algorithm pipeline, and in planning future enhancements to it, for one of our clients.
The Client
The client is a leading global developer of semiconductor solutions. They were building the world’s smallest image sensor for smartphone cameras and ISPs, along with the corresponding software pipeline around it.
The Project
The client had a complex image processing pipeline as part of their RGB sensor and camera ISP module. The goal of the project was to speed up the Chroma Correction module of this software pipeline by a factor of ~10x.
Challenges
- The starting point was a very naïve, unoptimized version of the algorithm
- Substantial dependency on third-party libraries such as OpenCV
- Data bandwidth issues had to be managed carefully across modules
Typical Software Optimization Workflow
A typical Software Optimization workflow can be split into the following phases:
Phase 1: This phase involves modifying, compiling and building the application on the target platform, ideally with all compiler optimizations disabled. The goal is to establish the correctness of the software before any tuning begins.
Phase 2: This phase is profiling: finding the areas of code where the application spends most of its run time.
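As a rough illustration of the profiling phase, the sketch below accumulates wall-clock time per named pipeline stage. The `StageTimer` class and the stage names are hypothetical and are not taken from the client's code. In practice a sampling profiler would normally be used alongside this kind of manual instrumentation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named pipeline stage.
class StageTimer {
public:
    // Runs `stage` once and adds its elapsed time to the bucket called `name`.
    void run(const std::string& name, const std::function<void()>& stage) {
        assert(static_cast<bool>(stage));  // the stage must be callable
        const auto start = std::chrono::steady_clock::now();
        stage();
        const auto stop = std::chrono::steady_clock::now();
        totals_ms_[name] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
        ++calls_[name];
    }

    // Total time spent in `name` so far, in milliseconds (0 if never run).
    double total_ms(const std::string& name) const {
        const auto it = totals_ms_.find(name);
        return it == totals_ms_.end() ? 0.0 : it->second;
    }

    // Number of times `name` has been run (0 if never run).
    int calls(const std::string& name) const {
        const auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

    // Prints one line per stage so the most expensive stages stand out.
    void report() const {
        for (const auto& entry : totals_ms_) {
            std::printf("%-24s %10.3f ms over %d call(s)\n",
                        entry.first.c_str(), entry.second,
                        calls_.at(entry.first));
        }
    }

private:
    std::map<std::string, double> totals_ms_;
    std::map<std::string, int> calls_;
};
```

Each stage of the pipeline would be wrapped in a call such as `timer.run("chroma_correction", [&] { /* stage body */ });`, and `timer.report()` then shows which stages dominate the run time.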
Phase 3: This phase is where the actual optimization happens:
- Enabling relevant compiler optimizations
- Using cache-friendly algorithms
- Making optimal use of available registers and memory transfers
- Applying hardware-specific optimizations
All the phases and their interdependencies can be represented pictorially as below.

Solutions Proposed
- Create a control flow graph
- Hand-optimize modules to replace API calls from OpenCV
- Design cache-aware algorithms to reduce cache thrashing
- Loop Optimizations
  - Code Motion/Loop Invariant
  - Iteration Reordering
  - Loop Unrolling
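The loop optimizations listed above can be illustrated with a minimal sketch. The per-pixel formula (a fixed-point saturation gain around the neutral chroma value 128), the function names and the planar 8-bit buffer layout are all assumptions made for illustration; they are not the client's Chroma Correction algorithm. The naive version walks the image column by column and recomputes a loop-invariant gain for every pixel. The optimized version hoists the invariant (code motion), iterates row by row so that memory is read sequentially (iteration reordering), and processes four pixels per iteration (loop unrolling).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamps an int to the 8-bit range.
static inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Hypothetical per-pixel operation: scale chroma around the neutral value 128.
// gain_q8 is a fixed-point gain with 8 fractional bits (256 == 1.0).
static inline std::uint8_t correct_pixel(std::uint8_t v, int gain_q8) {
    return clamp_u8(((static_cast<int>(v) - 128) * gain_q8) / 256 + 128);
}

// Naive baseline: column-major walk over row-major data (strided accesses),
// with the loop-invariant gain recomputed for every pixel.
void chroma_gain_naive(const std::vector<std::uint8_t>& src,
                       std::vector<std::uint8_t>& dst,
                       int width, int height, int saturation_percent) {
    const std::size_t count =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(src.size() == count && dst.size() == count);
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            const int gain_q8 = (saturation_percent * 256) / 100;  // invariant
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            dst[idx] = correct_pixel(src[idx], gain_q8);
        }
    }
}

// Optimized variant producing identical output.
void chroma_gain_optimized(const std::vector<std::uint8_t>& src,
                           std::vector<std::uint8_t>& dst,
                           int width, int height, int saturation_percent) {
    const std::size_t count =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(src.size() == count && dst.size() == count);

    // Code motion: the gain does not depend on x or y, so compute it once.
    const int gain_q8 = (saturation_percent * 256) / 100;

    // Iteration reordering: rows in the outer loop give sequential accesses.
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        const std::uint8_t* in = src.data() + row;
        std::uint8_t* out = dst.data() + row;

        // Loop unrolling: four pixels per iteration.
        int x = 0;
        for (; x + 4 <= width; x += 4) {
            out[x]     = correct_pixel(in[x],     gain_q8);
            out[x + 1] = correct_pixel(in[x + 1], gain_q8);
            out[x + 2] = correct_pixel(in[x + 2], gain_q8);
            out[x + 3] = correct_pixel(in[x + 3], gain_q8);
        }
        // Remainder loop for widths that are not a multiple of four.
        for (; x < width; ++x) {
            out[x] = correct_pixel(in[x], gain_q8);
        }
    }
}
```

Modern compilers often perform some of these transformations on their own at higher optimization levels, so any hand-written variant should be measured against the compiler-optimized baseline before it is adopted.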
The MulticoreWare Advantage
MulticoreWare’s core strength is deep-rooted expertise in performance optimization, especially for image and video processing pipelines. We have in-depth experience in creating software solutions and developing tools for multi-core and heterogeneous computing environments. This project combined optimization with video/image processing, another area in which MulticoreWare is regarded as a market leader.
Redefining the Technical Architecture – Drawing on our experience in developing bare-metal image/video APIs that are available as open-source SDKs (x265/rpp/rocAL), the MulticoreWare team was able to remove the dependency on third-party libraries such as OpenCV. Once the external dependency was removed, the next step was designing the new control flow.
Outcome
Within the estimated project timeline, the MulticoreWare team achieved a ~8x performance speedup for the algorithm.
