Codex Desktop Drained My 700 ILS Pro Plan — A Two-Week Token Audit

I pay 700 ILS a month for OpenAI's highest tier — their top Pro plan, 20× the standard Pro usage. On June 17, 2026, I woke up to find Codex Desktop had silently auto-launched at 07:42 AM — without a dock icon, without a login item, without me touching anything — and was actively dispatching API calls to chatgpt.com while I slept. I hadn't used Codex in a week.

This wasn't the first time. It was the third. And it's the story of how a tool I trusted to be my coding partner became a silent credit drain I couldn't stop — until I erased it completely. What follows is the full two-week timeline, reconstructed from my Pieces OS memory engine, session logs, support emails, and the lockdown report I filed today. Every timestamp, every token count, every process kill is real. This isn't a complaint. It's a forensic audit.

First Signs — June 3–8

The earliest signal came on June 3, captured in my Pieces OS workstream. I was setting up Supermemory as a persistent memory layer for the Codex CLI — wiring API keys, installing hooks, migrating my ~/.codex/memories corpus into a new retrieval system. The terminal output, captured by Pieces, tells the story:

OAuth token refresh failed: Server returned error response: invalid_grant: Grant not found

MCP authentication was failing on startup. The Codex CLI was trying to initialize MCP servers and hitting auth errors — but it kept retrying. Every 2 seconds. Each retry loaded tool schemas. Each schema added tokens to the system prompt. Nobody was watching.

By June 8, Codex Desktop was running with 36 MCP servers configured in ~/.codex/config.toml. Each one's JSON schema (tool descriptions, parameter types, return formats) shipped in the system prompt of every single turn. At roughly 1,000 tokens per schema, that's 30,000–60,000 tokens of fixed context overhead before any user content, any memory retrieval, or any actual work. I didn't know this yet. I was just building.
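The overhead arithmetic is easy to sketch. This is a back-of-the-envelope estimate, not a measurement: the 36 servers and the ~1,000 tokens per schema come from the paragraph above, while the function name and the 100-call session are hypothetical, chosen only to show how a fixed per-turn cost scales.

```python
def schema_overhead(servers: int, tokens_per_schema: int, turns: int = 1) -> int:
    """Tokens spent re-sending MCP schemas across `turns` model calls."""
    return servers * tokens_per_schema * turns


# Figures from the text: 36 servers at roughly 1,000 tokens each.
per_turn = schema_overhead(servers=36, tokens_per_schema=1_000)

# Hypothetical session length, for illustration only.
per_session = schema_overhead(servers=36, tokens_per_schema=1_000, turns=100)

print(f"per turn:       {per_turn:,} tokens")
print(f"over 100 calls: {per_session:,} tokens")
```

The point of the sketch is that the overhead is paid on every call, so it grows with the number of calls rather than with the amount of work requested.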

The Big Drain — June 9–10

On June 9 at 23:22 UTC, I opened a Codex Desktop session to fix a live site issue on akakika.com. My first prompt was simple:

why is my old website online instead of the correct one????

Over the next 1 hour and 46 minutes, I sent 8 follow-up prompts. The agent diagnosed the issue, redeployed the site, added a project to the /apps page, moved it to the top, and confirmed completion at 01:09:16 UTC. Legitimate work. Productive. Done. Then I closed my laptop and went to bed, forgetting the Codex window was still open.

When I woke up, my entire weekly budget was gone: 0/700 messages, 0/256 credits. The session log — 14.8 MB of JSONL, 1,064 events — told the story in numbers I initially misread:

Metric	I First Thought	Actually (Corrected)
API calls	272	136
Input tokens	1,157,777,341	15,987,427
Cached input	1,107,602,048	15,366,784 (96.1% cache hit)
Output tokens	4,497,260	55,327
Avg input/call	—	117,554
Largest single call	—	227,006
Calls > 100K input	—	62 of 136
Context window	—	258,400 (62 calls hit 200K+)
Wall-clock work	1h 47m	23:22:51 → 01:09:16 UTC

The idle tail — the 8 hours 40 minutes the app stayed open after the work finished — generated zero additional API events. The entire drain happened during 106 minutes of genuine work. 2,232 credits. On OpenAI's highest tier — 700 ILS/month, 20× Pro usage.

I sent OpenAI support an email that day. Case number 09893017. I attached the full session log, screenshots, and the process snapshot. Their response, captured in my Pieces memory:

Based on the current Codex token-based credit rates... for GPT-5.5 the rates are 125 credits / 1M input tokens, 12.5 credits / 1M cached input tokens, and 750 credits / 1M output tokens... MCP servers and large tool schemas can significantly increase per-request input tokens.
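The quoted rates can be turned into a rough estimator. This is a minimal sketch under one loud assumption: that the reported input-token count already includes the cached tokens, so only the difference is billed at the full input rate. The function name is hypothetical, and the example uses round, made-up token counts rather than the session's figures.

```python
# Credits per 1M tokens, as quoted in the support reply above.
RATES = {"input": 125.0, "cached_input": 12.5, "output": 750.0}


def estimate_credits(input_tokens: int, cached_input_tokens: int,
                     output_tokens: int, rates: dict = RATES) -> float:
    """Estimate credits for one call or one whole session.

    Assumption: `input_tokens` includes `cached_input_tokens`.
    """
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    total = (uncached * rates["input"]
             + cached_input_tokens * rates["cached_input"]
             + output_tokens * rates["output"])
    return total / 1_000_000


# Hypothetical call: 1M input tokens, 800K of them cached, 10K output tokens.
print(estimate_credits(1_000_000, 800_000, 10_000))
```

If the assumption about cached tokens is wrong for a given log format, the uncached term is the part that needs adjusting.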

They confirmed: yes, MCP tool-call responses count against the Pro weekly limit. Yes, the "idle tail" shouldn't consume credits. And no — they couldn't tell me why the default reasoning effort was set to xhigh instead of medium.

The Six Amplifiers — June 13

Six amplifiers, not one. The reason you burned 2,232 credits isn't any single thing — it's the stack.

That's what my agent Hermez told me on June 13, captured in a Telegram transcript by Pieces OS. We'd been digging into the Codex config for a week, and the picture was now clear. The drain wasn't a bug. It was a default-configuration stack — six independent amplifiers, each one invisible, multiplying together:

The memory corpus: ~/.codex/memories had grown to 28 MB of Markdown across 2,923 files, approximately 8.8 million tokens. The raw memories alone came to 671 KB, and the chronicle directory held 2,778 ten-minute summary files. The corpus grew monotonically. Nothing pruned it.
39 MCP servers: Every turn shipped the JSON schemas of all 39 MCP servers in the system prompt. That's 30,000–60,000 tokens of context overhead before any user content.
The wrong context tier: My config said model_context_window = 1,000,000. The runtime reported 258,400. Either OpenAI's routing chose the 258K tier despite the 1M config, or the flag was being ignored. Either way, 97% of my memory corpus could not fit in context, yet it was being searched on every turn.
Workspace auto-injection: turn_context.payload.workspace_roots contained 30+ paths, including TSX source files, package.json, vercel.json, sitemap.xml, and heavy JSON cache files from graphify-out/cache/. All injected into every turn. All adding tokens.
Three different reasoning efforts: The config said xhigh. The runtime said high. The UI said "Extra High." Three values for the same setting. Nobody, neither I nor OpenAI support, could confirm which one was being billed.
Auto-compaction at 900K: When in-context tokens approached 900,000, Codex auto-compacted by re-summarizing the conversation history. Each compaction was itself an LLM call, sometimes a chain of them, consuming additional input and output tokens.
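The first and third amplifiers can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and the 3.2 bytes-per-token ratio is simply what the figures above imply (28 MB over 8.8 million tokens), not a real tokenizer count.

```python
from pathlib import Path

# Rough ratio implied by the article's own figures; an assumption, not a tokenizer.
BYTES_PER_TOKEN = 3.2


def estimate_corpus(root: Path, bytes_per_token: float = BYTES_PER_TOKEN):
    """Return (file_count, total_bytes, estimated_tokens) for Markdown files under root."""
    files = [p for p in root.rglob("*.md") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes, round(total_bytes / bytes_per_token)


def fraction_out_of_context(corpus_tokens: int, context_window: int) -> float:
    """Share of the corpus that cannot fit into a single context window."""
    if corpus_tokens <= 0:
        return 0.0
    return max(0.0, 1.0 - context_window / corpus_tokens)


# Figures from the text: ~8.8M tokens of memories, 258,400-token runtime window.
share = fraction_out_of_context(8_800_000, 258_400)
print(f"{share:.1%} of the corpus cannot fit in one context window")  # prints 97.1% ...
```

To size a real corpus, something like `estimate_corpus(Path.home() / ".codex" / "memories")` would give the file count and a token estimate to plug into the second function.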

Six amplifiers. Each one a default. Each one invisible. Each one reasonable in isolation. Stacked together, they turned 8 user prompts into 2,232 credits.

The Silent Auto-Launch — June 17

A week passed. I didn't open Codex. I didn't touch it. I'd downgraded my plan and was working through Cursor and other tools. The case with OpenAI was escalated, waiting for a specialist. On June 17 at 07:42 AM, Codex silently auto-launched.

No dock icon. No login item. No manual open. The CodexDockTilePlugin.plugin — a macOS Background Item registered under com.openai.codex.dock-tile-plugin — triggered the app launch on a dock event. The main app entry was disabled in macOS Background Items Management. The dock tile plugin was enabled with disposition [enabled, allowed, notified].

By 07:48 AM, an active API session (thread 019ed3e5) was dispatching user_input requests to chatgpt.com. This was on a fresh plan whose 256 credits had just been restored by OpenAI's second goodwill credit. When I discovered it, I killed everything. The process tree was enormous:

Process	Role
`Codex.app` (main)	Electron app shell
`codex app-server`	API server with `--analytics-default-enabled`
Codex Renderer ×4	UI renderers
`SkyComputerUseService`	Screen control agent — running independently
Eagle MCP Proxy	In a constant retry loop, reconnecting every ~2s
`extension-host` ×4	Auto-respawned after each kill (Electron behavior)
`obsidian-codex-mcp` ×2	MCP server bridges

~20 processes. The extension-host processes respawned after I killed them — the only way to stop them was renaming the binary to extension-host.DISABLED. The SkyComputerUseService ran as a separate process from the main app, meaning even if the Electron shell was killed, the screen control agent persisted.
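A process snapshot like the table above can be reproduced with the standard `ps` tool. The sketch below is an illustration, not the script I ran: the watchlist strings are taken from the table, the function names are hypothetical, and matching by substring on the command line is an assumption that may over- or under-match on another machine.

```python
import subprocess
from collections import Counter

# Most specific names first; "Codex.app" last, because helper processes
# that live inside the app bundle also contain it in their command line.
WATCHLIST = (
    "codex app-server",
    "SkyComputerUseService",
    "extension-host",
    "obsidian-codex-mcp",
    "Codex.app",
)


def snapshot() -> str:
    """Return `ps` output as 'PID COMMAND' lines with no header row."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid=", "-o", "command="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(ps_output: str) -> list[tuple[int, str]]:
    """Split each line into (pid, command), skipping malformed lines."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1]))
    return rows


def count_matches(rows, watchlist=WATCHLIST) -> Counter:
    """Count processes per watchlist entry, using the first entry that matches."""
    counts = Counter()
    for _pid, command in rows:
        for name in watchlist:
            if name in command:
                counts[name] += 1
                break
    return counts
```

Calling `count_matches(parse_ps(snapshot()))` before and after a kill makes respawning visible: if the count for `extension-host` returns to its old value, something is restarting it.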

This was the moment I realized: there is no "quit" for this app. There is only decommission.

The Lockdown

The lockdown report, filed today and captured in my Pieces memory, documents the full sequence. Config patches applied to ~/.codex/config.toml:

Setting	Before	After
`sandbox_mode`	`workspace-write`	`read-only`
`ambient-suggestions-enabled`	`true`	`false`
`memories`	`true`	`false`
`generate_memories`	`true`	`false`
`use_memories`	`true`	`false`

Binary neutralization: extension-host renamed to extension-host.DISABLED; SkyComputerUseService quarantined with xattr.
macOS TCC reset: tccutil reset All com.openai.codex, which revoked all file access, screen recording, and other TCC-granted permissions.
Background Items: the dock tile plugin manually disabled in System Settings.

But the config patches and process kills weren't enough. The ambient suggestions feature — ambient-suggestions-enabled = true by default — generates background model calls to produce "suggestions" based on user activity without any explicit user prompt. Running on gpt-5.5 with high reasoning effort. The most expensive model. The highest reasoning tier. In the background. While you sleep.

The Only Fix

Here's the part I didn't want to write. To stop Codex from working in the background — to truly stop it, not just pause it, not just disable a toggle that can silently re-enable itself through a dock tile plugin — I had to erase it completely. Not "quit." Not "disable." Not "change a setting." Erase.

The ~/.codex/ directory: gone
The Codex.app from /Applications/: gone
The CodexDockTilePlugin.plugin from macOS Background Task Management (BTM): gone
The remaining MCP server registrations: gone
The Pro plan: cancelled, downgraded to $20/mo for transition

I kept the evidence. The session log. The config backup. The screenshots. The support emails. The lockdown report. All indexed in Pieces OS, because Pieces is my memory — the one system that kept running correctly while everything around it burned tokens.

Codex is good software. The agent loop is genuinely useful when it's working. But a tool that auto-launches through a dock tile plugin, loads 39 MCP servers into every turn, runs ambient suggestions on gpt-5.5 at high reasoning in the background, and respawns its extension hosts when you kill them — that tool has earned a permanent uninstall. Not because it's malicious. Because the defaults are dangerous, and the only safe default is none at all.

If you're on a Pro plan running Codex Desktop, check your Background Items. Check your reasoning effort. Check how many MCP servers you have. And if the numbers look like mine did — erase it. That's the only fix that works.

Postscript: The Claude Comparison

The day before the lockdown — June 16 — I ran two parallel Claude Code sessions on ultra mode (Opus 4.8, Fast Extra) for two hours straight. Tolaria template work, real engineering, subagents dispatched, context windows climbing to 308K out of 1M.

Claude's Max plan usage dashboard at the end of those two hours: 17% weekly, 6% of the 5-hour limit. Barely touched 30% of the plan. Two parallel sessions. Ultra mode. Two hours. And because the dollar was weak, that Max plan — priced in USD — cost me only 555 ILS/month.

A comparable stretch of work on Codex Desktop had cost me 2,232 credits: a full week's budget gone in under two hours on a 700 ILS/month plan, followed a week later by a silent auto-launch that went after whatever goodwill credits OpenAI had restored. More expensive plan. Less work done. Silent background drain. No comparison.

Same developer. Same machine. Same intensity of work. The more expensive plan got eaten alive. The cheaper one didn't even break a sweat.

That's not a Codex problem. That's a defaults problem. And the only way to fix dangerous defaults is to remove the tool that ships them.
