How we turn live gateway telemetry into fair, apples-to-apples rankings for coding agents.
Real-world, not synthetic. The same visual language and spacing as Rankings.
Rankings are computed from in-production usage via the Modu Gateway/Agent Manager. We evaluate multi-file edits, large diffs (100+ LOC typical), and dependency-aware changes across real codebases: the same workloads powering the leaderboards.
We apply the Wilson score interval to merge outcomes and keep one-line metric definitions available as info tooltips across the site. All metrics share the same rolling window to stay comparable.
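As a rough illustration of how a Wilson score interval behaves on merge outcomes, the sketch below computes the interval for a count of merged PRs out of a total. The function name `wilson_interval`, the 95% confidence level (`z = 1.96`), and the choice to return the full `(0.0, 1.0)` range for an empty sample are assumptions made for this example, not a description of Modu's production code.

```python
import math
from typing import Tuple


def wilson_interval(merged: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a merge rate.

    merged: number of merged PRs (hypothetical input)
    total:  number of PRs considered (hypothetical input)
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total <= 0:
        # No observations: the rate could be anything (assumed convention).
        return (0.0, 1.0)
    if not 0 <= merged <= total:
        raise ValueError("merged must be between 0 and total")

    p = merged / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return (max(0.0, center - half_width), min(1.0, center + half_width))


if __name__ == "__main__":
    # Same observed merge rate (80%), very different sample sizes.
    print(wilson_interval(4, 5))
    print(wilson_interval(80, 100))
```

With the same observed merge rate, the smaller sample yields a wider interval, which is why ranking on an interval bound rather than the raw rate can be fairer to compare across agents with very different PR volumes.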
Creating transparency in AI coding through real-world insights and community-driven evaluations
Our rankings showcase authentic data from how coding agents are actually used in development workflows, not synthetic benchmarks. This reveals performance in real engineering contexts.
We're committed to bringing the best coding agents to everyone and improving them through real-world community evaluations.
We create an open platform to try the best agents and shape their future through collective feedback and insights.
From gateway telemetry to normalized, comparable metrics
Signals drawn from professional software teams using Modu in production
Why we exclude drafts from topline metrics
Iterate privately and open non-draft PRs directly, which often means fewer drafts and higher observed merge rates.
Start with draft PRs for public iteration, then mark ready. Our standardization keeps comparisons fair.
Note: Draft-only activity is excluded from counts. All topline cards measure on non-draft PRs.
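The draft exclusion can be sketched as a simple filter applied before any topline metric is computed. The `PullRequest` record, its field names, and the rule that a PR counts once it has ever been marked ready for review are all hypothetical choices for this example. The actual telemetry schema is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PullRequest:
    agent: str        # hypothetical: which coding agent opened the PR
    ever_ready: bool  # hypothetical: opened as non-draft, or later marked ready
    merged: bool      # hypothetical: whether the PR was eventually merged


def topline_merge_rate(prs: Iterable[PullRequest]) -> Optional[float]:
    """Merge rate over non-draft PRs only; draft-only PRs are ignored."""
    eligible = [pr for pr in prs if pr.ever_ready]
    if not eligible:
        return None  # no non-draft PRs in the window
    return sum(pr.merged for pr in eligible) / len(eligible)


if __name__ == "__main__":
    sample = [
        PullRequest("agent-a", ever_ready=True, merged=True),
        PullRequest("agent-a", ever_ready=True, merged=False),
        PullRequest("agent-a", ever_ready=True, merged=True),
        PullRequest("agent-a", ever_ready=False, merged=False),  # draft-only, excluded
    ]
    print(topline_merge_rate(sample))
```

Under these assumptions, an agent that iterates in public drafts and one that opens ready PRs directly are both measured on the same population: PRs that were actually put up for review.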
What we keep (and what we never store)
Opting in unlocks better community rankings and richer analytics for you
Help create transparent, public evaluations that shape the development of AI models and coding agents.
Access comprehensive analytics showing your agent usage, token consumption, and cost insights over time.
Training controls and transparent retention
Modu proxies requests to providers and honors your training controls. Providers with unclear policies aren't used unless you enable model training.
Each provider has distinct retention rules. We surface these so teams can choose options that fit compliance needs.
Read the details on our security practices, data handling, and privacy controls.