Daily brief · Token cost · Agent workflow

AI token economics, written like a field brief.

A bright, readable guide to what AI really costs: model prices, coding-agent workflows, benchmark signals, and practical ways to spend fewer tokens.

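Bright, readable, traceable: helping individuals and small teams understand model prices, agent workflows, evaluation signals, and ways to save tokens.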

Start with 15 hooks · Agent JSON · llms.txt

Price Watch: Input, output, cache, batch, context, retry.

Agent Cost: Claude Code, Codex, Cursor, Aider, OpenCode.

Token Saving: Prompt caching, routing, compression, context discipline.
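
As a rough illustration of how the Price Watch dimensions combine, here is a minimal Python sketch of a per-request cost estimate. Everything in it is an assumption for illustration: the `Prices` fields, the `request_cost` helper, and the sample rates are hypothetical and are not any vendor's real pricing. It also simplifies by assuming every retry resends the same tokens and by ignoring any surcharge for writing to a cache.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical prices in USD per million tokens (not real vendor rates)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float
    batch_discount: float = 0.0  # fraction taken off the total, e.g. 0.5


def request_cost(prices, input_tokens, output_tokens, cached_tokens=0,
                 attempts=1, batch=False):
    """Estimate the cost of one logical request, counting retries.

    cached_tokens is the part of input_tokens served from a prompt cache.
    attempts is the total number of tries (1 means no retry).
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    per_attempt = (
        fresh_tokens * prices.input_per_mtok
        + cached_tokens * prices.cached_input_per_mtok
        + output_tokens * prices.output_per_mtok
    ) / 1_000_000
    if batch:
        per_attempt *= 1 - prices.batch_discount
    return per_attempt * attempts


# Illustrative rates only: 2 USD input, 10 USD output, 0.2 USD cached input.
example_prices = Prices(2.0, 10.0, 0.2, batch_discount=0.5)
estimate = request_cost(example_prices, 100_000, 5_000,
                        cached_tokens=80_000, attempts=2)
print(f"estimated cost: ${estimate:.3f}")
```

Even in this toy model, one retry doubles the bill, which is why fewer bad loops can matter as much as a cheaper per-token rate.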

Starting hooks

From top to bottom, each card gives title, date, summary, and the opening signal so readers can decide fast.

2026-07-03·Simon Willison

Fable's judgement

Agent savings can come from fewer bad loops, not only cheaper tokens.

One of the most interesting tips I got from the Fireside Chat I hosted with Cat Wu and Thariq Shihipar from the Claude Code team at AIE on Wednesday was …

Claude Code · agent judgment · workflow

Read source →

2026-07-02·Simon Willison

Release: llm-coding-agent 0.1a0

A minimal coding agent maps where token spend happens.

A coding agent built on LLM

coding agent · LLM · Python

Read source →

2026-07-02·Simon Willison

Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

Prompt optimization can be evaluated with harnesses instead of vibes.

Leveraging the DSPy framework, this project evaluates and refines the core production system prompts used by Datasette Agent’s read-only SQL question answerer. The methodology involves a harness where DSPy agents …

DSPy · system prompt · evaluation

Read source →

2026-07-03·Latent Space

Vercel's Andrew Qu on why agents are a new kind of software

Agent-readable websites are becoming part of the product surface.

The Vercel Chief of Software explains how its agent framework, eve, was created — and why skills, sandboxes and agent-readable websites now matter.

agents · Vercel · sandboxes

Read source →

2026-07-01·Latent Space

How Cursor deploys AI inside the enterprise

Vibe coding becomes a team budget problem when workflows scale.

Cursor's Pauline Brunet explains how her team of Forward Deployed Engineers help organizations implement agents — essentially setting up software factories.

Cursor · software factory · enterprise AI

Read source →

2026-06-30·Simon Willison

What’s new in Claude Sonnet 5

New model releases affect defaults, agent costs, and failure rates.

Claude Sonnet 5 came out this morning. I always head straight for the "what's new" developer docs because they tend to have more actionable information than the official announcement post. …

Claude · Sonnet · model update

Read source →

2026-06-30·Hugging Face / IBM Research

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

Agent benchmarks are moving toward real enterprise migration tasks.

A Blog post by IBM Research on Hugging Face

benchmark · AI agents · Java migration

Read source →

2026-06-29·Simon Willison

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

Open-weight coding models can change the API cost equation.

This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built …

open weights · agentic coding · coding model

Read source →

2026-06-26·Latent Space / AINews

OpenAI reports median internal Codex output tokens grew dramatically

Output token growth is a major hidden cost in agent workflows.

It's happening.

Codex · output tokens · agent usage

Read source →

2026-06-22·Interconnects

GLM-5.2 is the step change for open agents

Chinese and open models now belong in global agent cost/performance comparisons.

A capability threshold I've been carefully monitoring.

GLM · open agents · China models

Read source →

2026-06-18·Hugging Face

Is it agentic enough? Benchmarking open models on your own tooling

Your own tooling may matter more than public leaderboard rank.

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

agent benchmark · open models · tooling

Read source →

2025-05-08·Aider

Qwen3 benchmark results

A durable bridge between Chinese models and coding-agent evaluation.

Benchmark results for Qwen3 models using the Aider polyglot coding benchmark.

Qwen · Aider · coding benchmark

Read source →

2026-07-04·Claude Code Docs

How Claude Code uses prompt caching

Prompt caching directly changes speed and token cost.

Claude Code manages prompt caching automatically. See why a model switch triggers a slow uncached turn, what /compact costs, why CLAUDE.md edits don't apply mid-session, and how to check your cache hit rate.

Claude Code · prompt caching · cache hit rate

Read source →

2026-07-01·Ian Wootten

Ditching Claude for OpenCode and OpenRouter

A real-world case of switching from default tools to OpenRouter and open-weight model workflows.

For the entirety of June I ditched Claude Code and have been using open weight models with Opencode and openrouter.ai. Here

OpenCode · OpenRouter · Claude

Read source →

2026-07-04·Contextify

Contextify - Searchable History for Claude Code and Codex

Agent history and reusable context can reduce repeated token spend.

Your Claude Code and Codex history auto-deletes. Contextify keeps it forever in a searchable database, syncs it across every machine, and runs on macOS and Linux.

Claude Code · Codex · history

Read source →

Readable by people and agents

Static HTML first, with machine-readable endpoints for automation and search.

/llms.txt: site purpose and reading policy
/feed.xml: RSS for briefs and hooks
/data/hooks.json: structured article cards
/sources/: source and citation policy
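
For automation, the structured cards could be filtered by tag with a few lines of Python. This is only a sketch: the actual schema of /data/hooks.json is not documented here, so the field names (`title`, `date`, `source`, `summary`, `tags`) and the helpers `load_cards` and `cards_with_tag` are assumptions. The sample records are copied from two cards on this page and embedded inline, so nothing is fetched.

```python
import json
from datetime import date

# Assumed card shape; the real /data/hooks.json may use different keys.
SAMPLE = """
[
  {"title": "How Claude Code uses prompt caching",
   "date": "2026-07-04",
   "source": "Claude Code Docs",
   "summary": "Prompt caching directly changes speed and token cost.",
   "tags": ["Claude Code", "prompt caching", "cache hit rate"]},
  {"title": "Qwen3 benchmark results",
   "date": "2025-05-08",
   "source": "Aider",
   "summary": "A durable bridge between China models and coding-agent evaluation.",
   "tags": ["Qwen", "Aider", "coding benchmark"]}
]
"""


def load_cards(text):
    """Parse a JSON array of cards and convert each date string to a date."""
    cards = json.loads(text)
    for card in cards:
        card["date"] = date.fromisoformat(card["date"])
    return cards


def cards_with_tag(cards, tag):
    """Return cards carrying the tag (case-insensitive), newest first."""
    wanted = tag.casefold()
    hits = [
        card for card in cards
        if any(t.casefold() == wanted for t in card.get("tags", []))
    ]
    return sorted(hits, key=lambda card: card["date"], reverse=True)


cards = load_cards(SAMPLE)
for card in cards_with_tag(cards, "prompt caching"):
    print(card["date"].isoformat(), card["source"], "-", card["title"])
```

Matching tags case-insensitively keeps the filter tolerant of small differences in how tags are capitalized across cards.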