Published on September 15, 2025
In AI News

Cursor is Using Real Time Reinforcement Learning to Improve Suggestions for Developers

The new Tab model makes 21% fewer suggestions, with a 28% higher acceptance rate.

By Supreeth Koundinya

Cursor, an AI-powered coding platform, has announced an upgrade for its Tab model—the autocomplete system that provides suggestions for developers.

The company stated that this upgrade reduces low-quality suggestions while boosting accuracy, resulting in “21% fewer suggestions than the previous model while having a 28% higher acceptance rate”.

“Achieving a high accept rate isn’t just about making the model smarter, but also knowing when to suggest and when not to,” Cursor said in the blog post.

To solve the problem, Cursor considered training a separate model to predict whether a suggestion would be accepted or not. Cursor referenced a 2022 research study in which this method was used with GitHub Copilot.

That approach applied a logistic regression filter to features such as the programming language, recent acceptance history and trailing characters, and hid suggestions that scored low.
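As a rough illustration of how such a filter could work, the sketch below scores a suggestion with logistic regression and hides it when the score is low. The feature names, weights and threshold are all hypothetical, picked by hand for the example; neither the study's nor Cursor's actual values appear in this article.

```python
import math

# Hypothetical weights, chosen by hand for illustration only; a real filter
# would learn them from logged accept/reject data.
WEIGHTS = {
    "lang_python": 0.4,
    "prev_accepted": 1.2,
    "trailing_char_is_open_paren": 0.6,
}
BIAS = -1.5
SHOW_THRESHOLD = 0.3  # assumed cut-off: hide suggestions scoring below this


def accept_probability(features: dict[str, float]) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_show(features: dict[str, float]) -> bool:
    """Show the suggestion only if its predicted acceptance clears the threshold."""
    return accept_probability(features) >= SHOW_THRESHOLD
```

In this sketch, a Python suggestion that follows an accepted one clears the threshold, while a suggestion with no supporting features is hidden.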

While Cursor stated that the solution was viable in terms of predicting whether a user would accept a suggestion or not, the AI coding platform noted, “We wanted a more general mechanism that reused the powerful representation of the code learned by the Tab model.”

“Instead of filtering out bad suggestions, we wanted to alter the Tab model to avoid producing bad suggestions in the first place,” added Cursor.

Thus, Cursor used policy gradient methods, a reinforcement learning (RL) approach, to solve the problem. The model receives a reward when suggestions are accepted, a penalty when they are rejected and nothing when it chooses to stay silent.
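To see why this reward scheme teaches a model when to stay quiet, consider the minimal sketch below. The reward values are assumptions made for the example and are not stated in this article. With them, showing a suggestion only pays off when the chance of acceptance is above a break-even point.

```python
# Assumed reward values, for illustration only.
R_ACCEPT = 0.75   # reward when a shown suggestion is accepted
R_REJECT = -0.25  # penalty when a shown suggestion is rejected
R_SILENT = 0.0    # staying silent earns nothing


def expected_reward_if_shown(p_accept: float) -> float:
    """Average reward for showing a suggestion accepted with probability p_accept."""
    return p_accept * R_ACCEPT + (1.0 - p_accept) * R_REJECT


def break_even_probability() -> float:
    """Solve p * R_ACCEPT + (1 - p) * R_REJECT = R_SILENT for p."""
    return (R_SILENT - R_REJECT) / (R_ACCEPT - R_REJECT)
```

Under these assumed values the break-even acceptance probability is 0.25: below it, silence has the higher expected reward, so a policy that maximises reward learns to withhold the suggestion.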

This method requires ‘on-policy’ data, which is feedback collected from the model that is currently being used. Cursor addressed this by deploying new checkpoints to users multiple times a day and retraining the model quickly on fresh interactions.
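The loop described above can be sketched as a toy simulation. This is not Cursor's system: the policy here is a single number that decides between "suggest" and "stay silent", the users are simulated with a fixed acceptance probability, and the reward values, learning rate and batch size are all assumptions. What it does show is the on-policy pattern, where each update uses only data gathered from the checkpoint currently deployed.

```python
import math
import random

R_ACCEPT, R_REJECT, R_SILENT = 1.0, -1.0, 0.0  # assumed reward values


def p_suggest(theta: float) -> float:
    """Probability that the policy shows a suggestion (sigmoid of its one parameter)."""
    return 1.0 / (1.0 + math.exp(-theta))


def collect_on_policy_batch(theta, p_accept, n, rng):
    """Simulate n interactions with the currently deployed policy."""
    batch = []
    for _ in range(n):
        suggested = rng.random() < p_suggest(theta)
        if not suggested:
            reward = R_SILENT
        elif rng.random() < p_accept:
            reward = R_ACCEPT
        else:
            reward = R_REJECT
        batch.append((suggested, reward))
    return batch


def reinforce_step(theta, batch, lr):
    """One policy-gradient (REINFORCE) update from a batch of on-policy data."""
    p = p_suggest(theta)
    grad = 0.0
    for suggested, reward in batch:
        # For a Bernoulli policy with logit theta, d/dtheta log pi(action) = action - p.
        grad += reward * ((1.0 if suggested else 0.0) - p)
    return theta + lr * grad / len(batch)


def train(p_accept, checkpoints=200, batch_size=500, lr=0.5, seed=0):
    """Alternate between collecting fresh data and updating the checkpoint."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(checkpoints):
        batch = collect_on_policy_batch(theta, p_accept, batch_size, rng)
        theta = reinforce_step(theta, batch, lr)  # the new checkpoint replaces the old one
    return p_suggest(theta)
```

When the simulated users mostly accept, for example `train(0.8)`, the policy drifts toward always suggesting; when they mostly reject, as in `train(0.2)`, it drifts toward staying silent.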

“Currently, it takes us 1.5 to 2 hours to roll out a checkpoint and collect the data for the next step. While this is fast relative to what is typical in the AI industry, there is still room to make it much faster,” Cursor stated.

Cursor said the Tab model runs on every user action on the platform, handling over 400 million requests per day. “We hope this improves your coding experience and plan to develop these methods further in the future,” it said.

“Online RL is one of the most exciting directions for the field, and I’ve been incredibly impressed with Cursor being seemingly the first to implement it successfully at scale with a frontier capability,” an engineer who works on post-training at OpenAI wrote on X.

This is a big deal. It is the first large-scale demonstration of the advantage of real-time reinforcement learning. The recipe is scalable and requires no intervention in principle; the model can adapt forever as long as it is being used.

There is no way to achieve similar… https://t.co/LLMITX4MeN
— Khurram Javed (@KhurramJaved_96) September 12, 2025

In June, Cursor’s parent company Anysphere announced that it had raised $900 million at a $9.9 billion valuation in a round led by Thrive Capital, Accel, Andreessen Horowitz (a16z) and DST.

The company also launched a $200-a-month ‘Ultra’ plan, which promises 20x more usage than the $20-a-month Pro tier.

In the same month, Cursor received a platform update that added automatic code review, memory features and one-click setup of Model Context Protocol (MCP) servers.
