MulticoreWare

Case Studies

Machine Learning and Predictive Analytics

August 25, 2023

The Client

The client is one of the leading chip manufacturers in the industry. They wanted to benchmark and improve multithreaded machine learning (ML) inference performance on high-core-count CPUs, recognizing that maximizing CPU utilization is key to improving ML inference throughput and overall system efficiency.

The Project

The project centered on optimizing the performance of ONNX Runtime, a widely used open-source framework for ML inferencing. The primary goal was to leverage the full potential of high-core-count CPUs by addressing a limitation on Windows: ONNX Runtime could use only a single processor group, which left CPUs with more than 64 threads underutilized.
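To make the 64-thread ceiling concrete: Windows places logical processors into processor groups of at most 64 each, so software confined to one group can reach at most 64 logical processors. The sketch below works through that arithmetic. The function names are illustrative and not part of ONNX Runtime, and the real split of processors across groups is chosen by Windows based on the machine's topology, so the result is an upper bound rather than a measurement.

```python
import math

MAX_GROUP_SIZE = 64  # Windows limit on logical processors per processor group


def min_processor_groups(logical_cpus: int) -> int:
    """Smallest number of processor groups that can hold all logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be positive")
    return math.ceil(logical_cpus / MAX_GROUP_SIZE)


def single_group_ceiling(logical_cpus: int) -> float:
    """Upper bound on the share of the machine reachable from one group."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be positive")
    return min(logical_cpus, MAX_GROUP_SIZE) / logical_cpus


if __name__ == "__main__":
    # Hypothetical CPU sizes chosen only to illustrate the arithmetic.
    for cpus in (32, 64, 128, 192):
        share = single_group_ceiling(cpus)
        print(f"{cpus} logical CPUs: at least {min_processor_groups(cpus)} "
              f"group(s), one group covers at most {share:.0%}")
```

Under this model, a 128-thread CPU needs at least two groups, and a runtime limited to one group can keep at most half of the machine busy.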

Challenges

  • Processor Group Limitation: ONNX Runtime’s inability to use more than one processor group on Windows left CPUs with more than 64 threads underutilized.
  • Inference Performance: The client struggled to achieve optimal inference performance on high-core-count CPUs because ONNX Runtime could not use all of the available hardware resources.
  • Dependency Integration: Implementing processor group support required modifications not only within ONNX Runtime but also in its dependency, Eigen.
  • CPU Utilization Profiling: Identifying the root cause of the performance issue required profiling the inference process and analyzing CPU utilization.
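As an illustration of what utilization profiling can reveal, the sketch below averages per-CPU utilization over consecutive blocks of 64 logical processors and flags blocks that sit idle. It is a minimal sketch under stated assumptions: the helper names and the idle threshold are hypothetical, the sample values are synthetic rather than measurements from this project, and treating each consecutive block of 64 CPUs as one processor group is a simplification.

```python
from statistics import mean


def mean_utilization_by_group(per_cpu_util, group_size=64):
    """Average utilization (0-100) for each consecutive block of CPUs."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [
        mean(per_cpu_util[start:start + group_size])
        for start in range(0, len(per_cpu_util), group_size)
    ]


def idle_groups(per_cpu_util, group_size=64, idle_below=5.0):
    """Indices of blocks whose average utilization is under the threshold."""
    averages = mean_utilization_by_group(per_cpu_util, group_size)
    return [index for index, avg in enumerate(averages) if avg < idle_below]


if __name__ == "__main__":
    # Synthetic sample: 128 logical CPUs, only the first 64 doing work.
    sample = [90.0] * 64 + [1.0] * 64
    print("mean utilization per block:", mean_utilization_by_group(sample))
    print("idle blocks:", idle_groups(sample))
```

A profile in which one block is saturated while the others stay near zero is the kind of pattern that would point toward a single-group restriction rather than a slow model.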

The MulticoreWare Advantage & Approach

MulticoreWare’s team has in-depth expertise in ML software stack optimization and applied it to the challenges faced by the client. Our approach involved:

  • Profiling and Analysis: Profiling and analyzing the inference process thoroughly, which identified the lack of processor group support as a key bottleneck.
  • Codebase Analysis: Studying the ONNX Runtime and Eigen codebases to understand their intricacies and dependencies.
  • Adding Processor Group Support: Adding processor group support to ONNX Runtime and integrating it into Eigen, ensuring compatibility and optimal performance.
  • Parameter Optimization: Optimizing the inference benchmarks by fine-tuning parameters so that the ML models use the available hardware resources efficiently.
  • Parallel Inference Feature: Introducing a parallel inference feature in ONNX Runtime’s performance test suite to further exploit the potential of high-core-count CPUs.
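The idea behind processor group support can be sketched as a placement plan that spreads worker threads over every group instead of only the first. On Windows, a thread's placement is expressed as a group number plus a 64-bit mask within that group (the GROUP_AFFINITY structure). The sketch below only computes such pairs; it is not the code that was added to ONNX Runtime or Eigen, and the function name and the round-robin policy are assumptions made for illustration.

```python
def plan_thread_affinity(num_threads, group_sizes):
    """Return one (group, mask) pair per worker thread.

    Threads are spread over every logical CPU in every group and wrap
    around when there are more threads than CPUs.
    """
    if num_threads < 0:
        raise ValueError("num_threads must not be negative")
    if not group_sizes or any(size < 1 or size > 64 for size in group_sizes):
        raise ValueError("each group must hold between 1 and 64 CPUs")

    # Enumerate every (group, cpu-within-group) slot on the machine.
    slots = [
        (group, cpu)
        for group, size in enumerate(group_sizes)
        for cpu in range(size)
    ]

    plan = []
    for thread_index in range(num_threads):
        group, cpu = slots[thread_index % len(slots)]
        plan.append((group, 1 << cpu))  # single-bit mask: pin to one CPU
    return plan


if __name__ == "__main__":
    # Hypothetical machine: two groups of 64 logical CPUs each.
    plan = plan_thread_affinity(128, [64, 64])
    groups_used = sorted({group for group, _ in plan})
    print("groups used:", groups_used)
```

With a plan like this, thread 64 onward lands in the second group, so a 128-thread pool touches both groups rather than stacking every thread into one.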

Outcome

Through MulticoreWare’s expert intervention, the client achieved remarkable results:

  • Enhanced CPU Utilization: Implementing processor group support and optimizing parameters improved CPU utilization on high-core-count CPUs, addressing the 64-thread limitation.
  • Increased Inference Throughput: The optimizations resulted in significantly higher inference throughput, enabling the client to extract more value from their hardware.
  • System Efficiency: The project’s success resulted in enhanced system efficiency, maximizing the utilization of the available hardware resources.
  • Deeper Collaboration: MulticoreWare’s in-depth understanding of the ML software stack and dependencies fostered a deeper collaboration between the client and MulticoreWare, setting the stage for potential future projects.

Conclusion

This case study underscores how MulticoreWare’s expertise in optimizing ML software stacks for the underlying hardware architecture, combined with our innovative approach, enabled a major chip manufacturer to overcome limitations, optimize performance, and maximize the potential of high-core-count CPUs.
