Brave has launched a revamped search API targeting AI apps that need better data retrieval.
The API introduces the LLM Context API, a tool designed to feed large language models (LLMs) structured information instead of lists of URLs.
For engineers building retrieval-augmented generation (RAG) systems, the release argues that the quality of injected context influences output accuracy more than the size of the model itself.
## Context quality versus model size
The industry currently emphasises high-end, closed frontier models. However, Brave shared research suggesting that smaller open-weight models can beat market leaders when the grounding data is high quality.
Brave released benchmark data comparing their internal chatbot, Ask Brave, against major competitors. Ask Brave runs on the open-weights Qwen3 model using the new API. The evaluation used 1,500 randomly sampled queries and employed Claude Opus 4.5 and Claude Sonnet 4.5 as judges.
Results indicate that context quality is a primary factor in answer quality. Ask Brave achieved a higher win rate than Google AI Mode and ChatGPT, but it trailed Grok:
| Model | Average rating (5-point scale) | Win rate | Lose rate |
|---|---|---|---|
| Grok | 4.71 | 59.87% | 10.05% |
| Ask Brave | 4.66 | 49.21% | 15.82% |
| Google AI Mode | 4.39 | 27.07% | 38.17% |
| ChatGPT | 4.32 | 23.87% | 42.22% |
| Perplexity | 4.01 | 10.51% | 64.26% |
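The article reports only the aggregate rates, not the judging protocol. As an illustration, a minimal sketch of how pairwise win/lose rates like those above could be aggregated from per-query judge verdicts (the "win"/"lose"/"tie" verdict shape is an assumption for this example):

```python
from collections import Counter

def aggregate_verdicts(verdicts):
    """Compute win/lose/tie rates from a list of per-query judge verdicts.

    `verdicts` holds one string per query: "win", "lose", or "tie".
    This verdict format is an assumption; the article only reports
    the resulting aggregate percentages.
    """
    counts = Counter(verdicts)
    total = len(verdicts)
    return {
        "win_rate": counts["win"] / total,
        "lose_rate": counts["lose"] / total,
        "tie_rate": counts["tie"] / total,
    }

# 100 mock verdicts roughly matching Ask Brave's reported split
sample = ["win"] * 49 + ["lose"] * 16 + ["tie"] * 35
rates = aggregate_verdicts(sample)
```

In practice each verdict would come from a judge model (here, Claude Opus 4.5 or Claude Sonnet 4.5) comparing two answers to the same query.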
Brave attributes this performance to “data-first ranking.” Systems with limited search index access lagged behind in the tests, suggesting that model capability cannot fully compensate for weak context.
## From HTML to smart chunks
Standard web search is built around URLs and human readers. Brave’s LLM Context API targets the friction developers face when scraping that output for machine consumption: instead of raw HTML, the API returns “smart chunks.”
The system performs real-time extraction to convert pages into a format optimised for LLMs. This process includes:
- Clean text extraction: The system uses query-optimised snippets and markdown conversion.
- Structured data: It preserves JSON-LD schemas and tables with row-level granularity.
- Code context: The API extracts code blocks specifically for technical questions and coding agents.
- Multimedia and forums: It handles forum discussions and YouTube captions directly.
An in-house system ranks these chunks to find relevant information. Developers can configure the final response to fit within a specific token budget.
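The article doesn't specify how the token budget is enforced, but a natural approach is to greedily pack the highest-ranked chunks until the budget is exhausted. A hedged sketch, using a whitespace token counter as a stand-in for a real tokenizer:

```python
def pack_chunks(chunks, token_budget, count_tokens=lambda text: len(text.split())):
    """Greedily pack ranked chunks into a fixed token budget.

    `chunks` is a list of (score, text) pairs. The whitespace-split token
    counter is a placeholder assumption; a real pipeline would use the
    target model's tokenizer, and Brave's actual packing logic may differ.
    """
    ranked = sorted(chunks, key=lambda c: c[0], reverse=True)
    context, used = [], 0
    for score, text in ranked:
        cost = count_tokens(text)
        if used + cost > token_budget:
            continue  # skip any chunk that would exceed the budget
        context.append(text)
        used += cost
    return "\n\n".join(context), used

# Hypothetical ranked chunks: (relevance score, extracted text)
ranked = [(0.9, "alpha beta gamma"), (0.5, "delta epsilon"), (0.2, "zeta")]
context, used = pack_chunks(ranked, token_budget=4)
```

With a budget of 4 mock tokens, the 3-token top chunk and the 1-token last chunk fit, while the 2-token middle chunk is skipped.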
Regarding performance, Brave states that this processing adds less than 130ms of overhead at the 90th percentile (p90) compared to normal search. Total latency for calls to the LLM Context API remains under 600ms at p90.
For platform leads, the data source matters. Relying on third-party scrapers carries legal risk, such as Terms of Service violations, and operational risk if a scraped provider cuts off access.
Brave operates an independent search index, which the company describes as one of three global-scale indexes in the Western world and the only one outside Big Tech. Because Brave owns its infrastructure and does not scrape providers like Google or Bing, it can offer Service Organization Control 2 (SOC 2) Type II attestation and a Zero Data Retention (ZDR) policy.
ZDR means queries are not logged, stored, or linked to identities. For enterprises in regulated sectors, this prevents client data from leaking into a third-party model training set, as Brave does not use query data to train its own models.
## Brave Search developer tools and API pricing
The release consolidates Brave’s offerings into four plans: Search, Answers, Spellcheck, and Autocomplete.
The ‘Search’ plan includes the new LLM Context API alongside web, news, image, and video search. It costs $5 per 1,000 requests. A separate ‘Answers’ plan provides fully researched responses grounded in web results, priced at $4 per 1,000 searches plus $5 per million tokens.
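A quick way to sanity-check spend against these list prices is a small cost helper; the numbers below are taken straight from the pricing above, while the usage figures in the example are illustrative:

```python
def monthly_cost(search_requests, answers_searches=0, answer_tokens=0):
    """Estimate monthly spend from Brave's published list prices.

    Search plan: $5 per 1,000 requests.
    Answers plan: $4 per 1,000 searches plus $5 per million tokens.
    """
    search = search_requests / 1_000 * 5.0
    answers = answers_searches / 1_000 * 4.0 + answer_tokens / 1_000_000 * 5.0
    return search + answers

# Illustrative volumes: 100k Search requests, plus 10k Answers
# searches generating 2M tokens.
search_only = monthly_cost(100_000)
answers_only = monthly_cost(0, answers_searches=10_000, answer_tokens=2_000_000)
```

At those volumes, Search alone comes to $500, and the Answers usage to $50 ($40 for searches plus $10 for tokens).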
To support integration, Brave introduced ‘Skills’. These modular workflows allow AI editors or command-line interfaces to load instructions and scripts dynamically. An API assistant, integrated into the developer portal, provides guidance on endpoints and code examples.
As large language models commoditise, value shifts to the data fed into them. Injecting structured web data into open-weight models can reduce inference costs while maintaining quality. For technical teams, choosing a search API is less about raw results and more about securing a stable, compliant stream of context.
See also: AWS CEO describes AI undermining software fears as ‘overblown’