methodology & limits
The dashboard answers one question: is the crowd feeling what
you're feeling about Claude / Codex right now? It does not measure
user satisfaction, model quality, or actual user churn.
1. Selection-biased by construction
The dataset is mentions on Hacker News, r/ClaudeAI, r/OpenAI, r/ChatGPT,
r/LocalLLaMA, and r/singularity. People who are angry post; people who are
satisfied don't. The complaint rate is a property of what gets posted
in complaint forums, not of all users. The chart cannot say "users
hate model X" — only "this is what people who post this week are reporting."
Example permalinks beneath each top phrase reflect what was public at
scrape time; some may have been deleted or edited since.
2. The classifier undercounts
A record is counted as a complaint if its text contains any phrase from
a hand-curated list (config/complaint_phrases.json).
Real complaints that don't use those phrases are missed by design — the
trade-off is a classifier you can audit line-by-line. The
trend (week-over-week, 90-day) captures direction even
when the absolute level is conservative. Don't read the percentage as
precise — read it as honest.
3. The headline number is last completed week, not this week
Reddit posts arrive through a third-party archive that re-indexes
daily, so the current week's rate is always a partial sample until
indexing catches up a day or two later. Hacker News data is live. To
avoid showing a number that drifts under the visitor, the big
headline shows the last completed week — full
sample, stable. The "this week so far" chip below each panel is a
live-but-partial peek; it'll move slightly until the week closes.
4. Same phrase list, both models
The classifier applies an identical phrase list to claude and openai
content. No per-model tuning. The comparison is honest only if both are
measured the same way. The list is in the repo; audit it.
5. Two-horse race tracks release calendars
Most short-term variation in the chart is "days since the last contentious
release." When OpenAI ships GPT-N, r/OpenAI spikes. Same for Claude. To
compare across models without launch confounding, toggle the x-axis to
weeks since release.
6. Volume is asymmetric
ChatGPT/Codex are mentioned far more than Claude. The big number on
each panel is a rate (% of weekly mentions that
contained complaint phrases), not a count. Compare the rates between
models — not the per-week mention totals shown beneath them.
7. Twitter/X is missing
Free-tier Twitter API access closed in 2023 and the paid tier is too
expensive for a free dashboard. Bluesky covers a sliver of similar
discourse but is journalist-skewed. This is the biggest gap in the
dataset.
8. Defection language is rhetoric, not behavior
"Switching to," "cancelled," "done with" are tracked because they're
high-signal sentiment. They do not measure who actually churned.
People announce departures and stay. The chart label says "defection
rhetoric," not "% of users defected."