Duplicate detection

Twenty users report the same broken save button. Userz turns that into one PR — not twenty.

How it works

When Userz spots that a new report describes the same thing as an existing one, it attaches the new report as a duplicate of the original. The original (the oldest of the cluster) becomes the lead: it carries an LLM-consolidated description, a short theme, and the eventual PR. The agent runs once against the consolidated description — one PR per theme, not one PR per report.

Duplicates remain queryable as full Feedback rows (screenshots, console logs, submitter info all preserved) but are hidden from the dashboard queue. Opening the lead's detail page surfaces them in a "Duplicates of this report" section, and each duplicate shows a backlink to the lead.

Two modes

Pick one in App settings → Batching:

Incremental merge: every new submission is compared against recent feedback in real time (Jaccard token pre-filter → cosine similarity on embeddings). A match attaches immediately; no match routes the submission normally.
Windowed rollup: submissions accumulate in batch_pending for a configurable window (default 24 hours). When you click Trigger duplicate detection, Userz clusters everything in the window in one pass and promotes a lead per cluster.

The two modes are mutually exclusive per App. Both use the helper model you've configured under Settings → AI, so you can pair a cheap helper (Haiku, GPT-4o-mini, GLM-Flash) with a powerful agent.

What the agent sees

When a lead has attached duplicates, the agent's prompt uses the lead's consolidatedText — an LLM-summarized description that covers every member's intent. The PR body cites the lead's feedback id and notes how many reports were consolidated.

When a new duplicate attaches to a lead that's already opened a PR, the lead's consolidated description is updated for future reference, but the in-flight PR is not re-run automatically — you can always trigger a fresh run from the dashboard if the new duplicate adds material context.

Working with duplicates

From the dashboard you can:

Trigger duplicate detection (windowed mode) to cluster the current window immediately rather than wait for the next manual pass.
View attached duplicates on the lead's detail page, with cross-links to each duplicate's full payload.
Tune the similarity threshold (incremental mode) per App, between 0 and 1. The default 0.82 balances recall against false-merges.

The lead is always the oldest member — the original first submission. New duplicates attach to it; the lead is never demoted.

Surfacing the lead back to your users

Each Feedback row — lead or duplicate — keeps its own identity, so end-user-facing surfaces (your own status pages, chat-ops responses) can tell each reporter that their issue is being addressed without revealing how it was grouped. The lead pointer is on the duplicate row's GET /v1/feedback/:id response under duplicateOfId; the lead's response carries duplicateCount, theme, consolidatedText, and a duplicates[] array.