Case Studies

Optimizing & Enhancing the Performance of an Image Processing Algorithm

November 30, 2022

This case study highlights our role in creating an optimized pipeline for a Chroma Correction algorithm, along with future enhancements, for one of our clients.

The Client

The client is a leading global developer of semiconductor solutions. They were building the world’s smallest image sensor for smartphone cameras and ISPs, along with the corresponding software pipeline.

The Project

The client had a complex image processing pipeline as part of their RGB sensor and camera ISP module. The goal of the project was to speed up the Chroma Correction module of this software pipeline by a factor of ~10x.


Challenges

  • Only a very naïve version of the algorithm was available as a starting point
  • Substantial dependency on third-party libraries like OpenCV
  • Data bandwidth issues had to be managed optimally across modules

Typical Software Optimization Workflow

A typical Software Optimization workflow can be split into the following phases:

Phase 1: This phase involves modifying, compiling, and building the application on the target platform, ideally with all compiler optimizations disabled. The goal is to verify the correctness of the software.

Phase 2: This phase is profiling, which identifies the areas of code where the application spends most of its run time.
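
As a rough illustration of the profiling phase, the sketch below times a single stage with `std::chrono::steady_clock` and keeps the best of several runs. The names `sum_pixels` and `best_time_us` are hypothetical stand-ins, not part of the client's pipeline; in practice a dedicated profiler would usually complement manual timing like this.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one pipeline stage: sums all pixel values.
std::uint64_t sum_pixels(const std::vector<std::uint8_t>& img) {
    std::uint64_t total = 0;
    for (std::uint8_t p : img) total += p;
    return total;
}

// Runs `fn` `repeats` times and returns the best wall-clock time in
// microseconds. Taking the minimum reduces noise from other processes.
template <typename Fn>
double best_time_us(Fn&& fn, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        fn();
        const auto stop = clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(stop - start).count();
        if (best < 0.0 || us < best) best = us;
    }
    return best;
}
```

When timing a stage this way, the result should be written to something the compiler cannot discard (for example a `volatile` variable), otherwise an optimizing build may remove the work being measured.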

Phase 3: This phase is where the actual optimization happens:

  • Enabling relevant compiler optimizations
  • Cache Friendly Algorithms
  • Optimal usage of available registers & memory transfers
  • Hardware specific optimizations
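
To make the idea of a cache-friendly algorithm concrete, here is a minimal sketch of a tiled (blocked) image transpose. It is a generic textbook technique, not the client's code; the function name `transpose_tiled` and the default tile size of 32 are assumptions that would need tuning for the target cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transposes a rows x cols row-major image into dst (cols x rows).
// Working tile by tile keeps both the source rows and the destination
// rows of the current block in cache, instead of striding through the
// whole destination for every source row.
void transpose_tiled(const std::vector<std::uint8_t>& src,
                     std::vector<std::uint8_t>& dst,
                     std::size_t rows, std::size_t cols,
                     std::size_t tile = 32) {
    assert(src.size() == rows * cols);
    assert(dst.size() == rows * cols);
    assert(tile > 0);
    for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
        const std::size_t r1 = std::min(r0 + tile, rows);
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            const std::size_t c1 = std::min(c0 + tile, cols);
            // Inner loops touch only one tile of src and one tile of dst.
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```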

All the phases and their interdependencies can be represented pictorially as below:

Phases of a typical Software Optimization workflow

Solutions Proposed

  • Create control flow graph
  • Hand-optimize modules to replace API calls from OpenCV
  • Design Cache-Aware Algorithm to reduce cache thrashing
  • Loop Optimizations
    • Loop-Invariant Code Motion
    • Iteration Reordering
    • Loop Unrolling
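
The three loop optimizations listed above can be sketched on a simple gain operation over an image plane. This is an illustrative example only: `scale_naive`, `scale_optimized`, and the Q8 fixed-point gain are hypothetical, and the code assumes the scaled values still fit in 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: walks a row-major plane column by column (strided access)
// and recomputes the loop-invariant gain for every pixel.
void scale_naive(std::vector<std::uint16_t>& plane, std::size_t rows,
                 std::size_t cols, unsigned num, unsigned den) {
    assert(plane.size() == rows * cols && den != 0);
    for (std::size_t x = 0; x < cols; ++x)
        for (std::size_t y = 0; y < rows; ++y) {
            const unsigned gain_q8 = (num << 8) / den;  // invariant work
            plane[y * cols + x] = static_cast<std::uint16_t>(
                (plane[y * cols + x] * gain_q8) >> 8);
        }
}

// Optimized: the gain is hoisted out of the loop (code motion), pixels are
// visited in memory order (iteration reordering), and the body is unrolled
// by four with a scalar tail loop for the remainder.
void scale_optimized(std::vector<std::uint16_t>& plane, std::size_t rows,
                     std::size_t cols, unsigned num, unsigned den) {
    assert(plane.size() == rows * cols && den != 0);
    const unsigned gain_q8 = (num << 8) / den;
    auto scale = [gain_q8](std::uint16_t v) {
        return static_cast<std::uint16_t>((v * gain_q8) >> 8);
    };
    std::uint16_t* p = plane.data();
    const std::size_t n = rows * cols;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        p[i]     = scale(p[i]);
        p[i + 1] = scale(p[i + 1]);
        p[i + 2] = scale(p[i + 2]);
        p[i + 3] = scale(p[i + 3]);
    }
    for (; i < n; ++i) p[i] = scale(p[i]);  // leftover pixels
}
```

Modern compilers often apply some of these transformations automatically at higher optimization levels, so it is worth profiling before and after any manual rewrite.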

The MulticoreWare Advantage & Approach

MulticoreWare has deep-rooted expertise in performance optimization, especially for image and video processing pipelines. We have in-depth experience in creating software solutions and developing tools for multi-core and heterogeneous computing environments. This project combined optimization with video/image processing, another area where MulticoreWare is considered a market leader.

Redefining the Technical Architecture: Drawing on our experience developing bare-metal image/video APIs that are available as open-source SDKs (x265/rpp/rocAL), the MulticoreWare team was able to remove the dependency on third-party libraries such as OpenCV. Once the external dependency was removed, designing the new control flow was the next step.
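
As a hedged illustration of what replacing a general-purpose library call with a hand-written routine can look like, the sketch below converts interleaved 8-bit RGB to full-range YCbCr using BT.601 weights approximated in Q8 fixed point. The case study does not say which OpenCV calls were replaced, so the choice of a color-space conversion and the name `rgb_to_ycbcr` are assumptions; rounding may also differ slightly from any particular library's output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved RGB (R,G,B,R,G,B,...) to interleaved Y,Cb,Cr.
// Integer weights sum to 256 for Y and to 0 for Cb/Cr, so gray inputs
// map to chroma 128. The +128*256 term adds the chroma offset before the
// shift, which also keeps the shifted value non-negative.
void rgb_to_ycbcr(const std::vector<std::uint8_t>& rgb,
                  std::vector<std::uint8_t>& ycc) {
    assert(rgb.size() % 3 == 0);
    assert(ycc.size() == rgb.size());
    for (std::size_t i = 0; i < rgb.size(); i += 3) {
        const int r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
        const int y  = (77 * r + 150 * g + 29 * b + 128) >> 8;
        const int cb = (-43 * r - 85 * g + 128 * b + 128 * 256 + 128) >> 8;
        const int cr = (128 * r - 107 * g - 21 * b + 128 * 256 + 128) >> 8;
        // Saturated colors can round up to 256, so clamp to the 8-bit range.
        ycc[i]     = static_cast<std::uint8_t>(std::min(y, 255));
        ycc[i + 1] = static_cast<std::uint8_t>(std::min(cb, 255));
        ycc[i + 2] = static_cast<std::uint8_t>(std::min(cr, 255));
    }
}
```

A single-purpose routine like this avoids the generality of a library call (format checks, temporary allocations, dispatch), and gives full control over memory layout for later cache and loop optimizations.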


Within the estimated project timeline, the MulticoreWare team achieved a ~8x performance speedup for the algorithm.
