Machine Learning and Predictive Analytics
August 25, 2023
The Client
The client is one of the industry's leading chip manufacturers. They set out to benchmark and improve multithreaded machine learning (ML) inference performance on high-core-count CPUs, recognizing that maximizing CPU utilization is key to improving inference throughput and overall system efficiency.
The Project
The project centered on optimizing ONNX Runtime, a widely used open-source framework for ML inferencing. The primary goal was to unlock the full potential of high-core-count CPUs by addressing a limitation where, on Windows, ONNX Runtime could utilize only a single processor group. Because Windows assigns logical processors to groups of at most 64, this left CPUs with more than 64 hardware threads underutilized.
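To make the limitation concrete, here is a minimal Win32 sketch (our illustration, not code from the engagement) that enumerates processor groups. On a 128-thread CPU, Windows typically reports two groups of 64 logical processors each, and a process whose threads never set a group affinity is scheduled within a single group.

```cpp
// Minimal sketch: enumerate Windows processor groups.
// GetActiveProcessorGroupCount / GetActiveProcessorCount are
// standard Win32 APIs (Windows 7+); the program is illustrative.
#include <windows.h>
#include <cstdio>

int main() {
    WORD groups = GetActiveProcessorGroupCount();
    DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    std::printf("%u processor group(s), %lu logical processors total\n",
                (unsigned)groups, (unsigned long)total);
    for (WORD g = 0; g < groups; ++g) {
        std::printf("  group %u: %lu logical processors\n",
                    (unsigned)g, (unsigned long)GetActiveProcessorCount(g));
    }
    return 0;
}
```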
Challenges
- Processor Group Limitation: ONNX Runtime's inability to use more than one processor group on Windows left CPUs with more than 64 logical processors underutilized (a group-affinity sketch follows this list).
- Inference Performance: The client could not reach optimal inference performance on high-core-count CPUs because ONNX Runtime did not exploit all available hardware resources.
- Dependency Integration: Implementing processor group support required modifications not only within ONNX Runtime but also in its dependency, the Eigen library, whose thread pool underpins ONNX Runtime's CPU execution.
- CPU Utilization Profiling: Pinpointing the root cause of the performance gap required profiling the inference process and analyzing CPU utilization across all cores.
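The crux of "processor group support" is explicit thread placement. The sketch below shows the general shape of such placement logic, pinning a worker thread to one group with SetThreadGroupAffinity; the helper name and the policy around it are our illustration, and the actual ONNX Runtime and Eigen changes differ in detail.

```cpp
// Illustrative group-affinity placement (not the actual patch):
// each worker thread pins itself to one processor group so that,
// collectively, the pool spans all groups instead of just one.
#include <windows.h>

// Hypothetical helper: bind the calling thread to processor group
// `group`, enabling every logical processor in that group.
static bool PinCurrentThreadToGroup(WORD group) {
    DWORD count = GetActiveProcessorCount(group);
    if (count == 0) return false;

    GROUP_AFFINITY affinity = {};
    affinity.Group = group;
    // All-ones mask for a full 64-processor group, otherwise the
    // low `count` bits.
    affinity.Mask = (count >= 64) ? ~KAFFINITY(0)
                                  : ((KAFFINITY(1) << count) - 1);
    return SetThreadGroupAffinity(GetCurrentThread(), &affinity,
                                  nullptr) != 0;
}
```

Worker `i` of a pool would then call `PinCurrentThreadToGroup(i % GetActiveProcessorGroupCount())` before entering its run loop, so the pool's threads span every group rather than crowding into one.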
The MulticoreWare Advantage & Approach
MulticoreWare's team has in-depth expertise in ML software stack optimization and demonstrated it in tackling the challenges faced by the client. Our approach involved:
- Profiling and Analysis: Thoroughly profiling and analyzing the inference process, which identified the lack of processor group support as the key bottleneck.
- Codebase Analysis: Delving into the codebase of ONNX runtime and Eigen to understand the intricacies and dependencies.
- Adding Processor Group Support: Adding processor group support to ONNX Runtime and integrating the corresponding changes into the Eigen repository, ensuring compatibility and optimal performance.
- Parameter Optimization: Fine-tuning the inference benchmark parameters so that ML models efficiently utilize the available hardware resources (see the sketch after this list).
- Parallel Inference Feature: Introducing a parallel inference feature to ONNX Runtime's performance test suite, allowing multiple inference requests to run concurrently and thereby extracting every ounce of performance from high-core-count CPUs.
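As an example of the tuning surface involved, the sketch below configures an inference session through the public ONNX Runtime C++ API. The thread counts and model path are placeholders for illustration, not the client's settings; the calls themselves (SetIntraOpNumThreads, SetInterOpNumThreads, SetExecutionMode) are standard ONNX Runtime session options.

```cpp
// Illustrative ONNX Runtime session tuning; thread counts and the
// model path are placeholders, not the client's configuration.
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ort-tuning");

    Ort::SessionOptions opts;
    opts.SetIntraOpNumThreads(64);        // threads used within one operator
    opts.SetInterOpNumThreads(2);         // operators that may run concurrently
    opts.SetExecutionMode(ORT_PARALLEL);  // enable inter-op parallelism
    opts.SetGraphOptimizationLevel(ORT_ENABLE_ALL);

    // Windows builds take a wide-character model path.
    Ort::Session session(env, L"model.onnx", opts);

    // A throughput benchmark would now build input tensors and call
    // session.Run(...) in a loop, optionally from several threads,
    // in the spirit of the parallel inference feature described above.
    return 0;
}
```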
Outcome
Through MulticoreWare’s expert intervention, the client achieved remarkable results:
- Enhanced CPU Utilization: Processor group support and parameter optimization improved CPU utilization on high-core-count CPUs, removing the 64-logical-processor ceiling.
- Increased Inference Throughput: The optimizations resulted in significantly higher inference throughput, enabling the client to extract more value from their hardware.
- System Efficiency: Overall system efficiency improved, with the available hardware resources utilized far more fully.
- Deeper Collaboration: MulticoreWare's command of the ML software stack and its dependencies fostered closer collaboration between the client and MulticoreWare, setting the stage for future projects.
Conclusion
This case study underscores how MulticoreWare's expertise in optimizing ML software stacks for the underlying hardware architecture, combined with our innovative approach, enabled a major chip manufacturer to overcome a platform limitation, optimize performance, and maximize the potential of high-core-count CPUs.