is it just you?

AI quality is felt, not measured. Here's what the crowd is feeling about Claude and Codex this week.

corpus: records · last updated · classifier v0

one number per model, weekly. trend > absolute level — see methodology.

Claude

— mentions this week
complaints, last full week

trend (90 days, weekly)

top complaint phrases (this week, top 10)

    defection rhetoric ("switching to / cancelled / done with" — rhetoric, not behavior)

    ChatGPT / GPT-5 / Codex

    — mentions this week
    complaints, last full week

    trend (90 days, weekly)

    top complaint phrases (this week, top 10)

      defection rhetoric ("switching to / cancelled / done with" — rhetoric, not behavior)

      methodology & limits

      The dashboard answers one question: is the crowd feeling what you're feeling about Claude / Codex right now? It does not measure user satisfaction, model quality, or actual user churn.

      1. Selection-biased by construction

      The dataset is mentions on Hacker News, r/ClaudeAI, r/OpenAI, r/ChatGPT, r/LocalLLaMA, and r/singularity. People who are angry post; people who are satisfied don't. The complaint rate is a property of what gets posted in complaint forums, not of all users. The chart cannot say "users hate model X" — only "this is what people who post this week are reporting." Example permalinks beneath each top phrase reflect what was public at scrape time; some may have been deleted or edited since.

      2. The classifier undercounts

      A record is counted as a complaint if its text contains any phrase from a hand-curated list (config/complaint_phrases.json). Real complaints that don't use those phrases are missed by design — the trade-off is a classifier you can audit line-by-line. The trend (week-over-week, 90-day) captures direction even when the absolute level is conservative. Don't read the percentage as precise — read it as honest.

      3. The headline number is last completed week, not this week

      Reddit posts arrive through a third-party archive that re-indexes daily, so the current week's rate is always a partial sample until indexing catches up a day or two later. Hacker News data is live. To avoid showing a number that drifts under the visitor, the big headline shows the last completed week — full sample, stable. The "this week so far" chip below each panel is a live-but-partial peek; it'll move slightly until the week closes.

      4. Same phrase list, both models

      The classifier applies an identical phrase list to claude and openai content. No per-model tuning. The comparison is honest only if both are measured the same way. The list is in the repo; audit it.

      5. Two-horse race tracks release calendars

      Most short-term variation in the chart is "days since the last contentious release." When OpenAI ships GPT-N, r/OpenAI spikes. Same for Claude. To compare across models without launch confounding, toggle the x-axis to weeks since release.

      6. Volume is asymmetric

      ChatGPT/Codex are mentioned far more than Claude. The big number on each panel is a rate (% of weekly mentions that contained complaint phrases), not a count. Compare the rates between models — not the per-week mention totals shown beneath them.

      7. Twitter/X is missing

      Free-tier Twitter API access closed in 2023 and the paid tier is too expensive for a free dashboard. Bluesky covers a sliver of similar discourse but is journalist-skewed. This is the biggest gap in the dataset.

      8. Defection language is rhetoric, not behavior

      "Switching to," "cancelled," "done with" are tracked because they're high-signal sentiment. They do not measure who actually churned. People announce departures and stay. The chart label says "defection rhetoric," not "% of users defected."