Benchmark task
Submit a benchmark task.
Send a real task where coding agents are expensive, unreliable, or hard to evaluate. The task may shape future benchmark specs and tools.
Good benchmark tasks include
- The exact job the agent is supposed to complete.
- What information the agent needs to inspect.
- How success can be checked after the fact.
- Where cost, context, tools, or reliability make the task hard.
What happens next
1. We triage for a concrete cost, reliability, eval, or workflow-architecture problem.
2. If there is a fit, we ask for a small sample: traces, prompts, tool lists, repo instructions, workflow notes, or anonymized task examples.
3. The first output is a scoped path: what to inspect, what to measure, and where savings or leverage are most likely.
Direct email
For lightweight notes, use research@vorplabs.com.