This Bengaluru Startup Made the Fastest Inference Engine, Beating Together AI and Fireworks AI

Simplismart’s software-level optimisations enabled Llama 3.1 8B to achieve a throughput of over 343 tokens per second.
Image by Nalini Nirad
Inference speed is a hot topic right now as companies rush to fine-tune and deploy their own AI models. Conversations around test-time compute are also heating up, with models like OpenAI's o1 showcasing 'thinking' and reasoning skills after the prompt, relying on infrastructure-heavy computation even after training is complete.

This is why companies like Groq, SambaNova, and Cerebras Systems have gained traction by building their own hardware, delivering unparalleled inference performance and competing with the likes of NVIDIA and AMD. However, Simplismart, a Bengaluru-based startup led by former Oracle and Google engineers, has emerged as a leader in creating high-performance AI deployment tools. It competes on inference speed from the software side, rather than the hardware. Sim

Mohit Pandey
Mohit writes about AI in simple, explainable, and often funny words. He's especially passionate about chatting with those building AI for Bharat, with the occasional detour into AGI.