False Positive Flags in Coding Tests: A Support-Grade Playbook
A false positive is not just a candidate problem. It is a queue design problem that can burn support hours, damage brand trust, and still fail an audit if your team cannot explain the decision.

False positives are inevitable. Bad processes are optional: define step-ups, SLAs, and evidence so Support can move fast without weakening controls.
The moment your queue turns into a reputation incident
It is 9:40 AM. A finalist for a remote role fails an integrity check right after a strong coding assessment. The candidate emails your shared inbox with screenshots, tags the CEO on LinkedIn, and asks for their data deletion in the same thread. Meanwhile, the recruiter is asking Support to "just override it" to keep the onsite schedule. If you treat this like a one-off, you will either (a) reject an honest candidate and create churn plus reputational risk, or (b) waive controls in a way that teaches bad actors which flags are bluffable. Support needs a consistent, fast, defensible script: what to check, what to request, what to retake, when to escalate, and how to close the loop in the ATS.
What to do first when a verified honest candidate triggers a flag
Default decision: do not override, do not reject. Move the case into a step-up flow that (1) confirms identity continuity, (2) validates assessment integrity, and (3) documents the rationale in an Evidence Pack. This preserves speed because most honest false positives clear through a bounded retake or a quick manual review, and it preserves security because high-risk patterns still escalate with standardized evidence, not vibes.
Speed: ad hoc back-and-forth creates reviewer fatigue and missed interview slots.
Cost: every unclear case becomes a high-touch ticket that drags TA, Security, and hiring managers into Slack.
Risk: inconsistent exceptions are exactly what auditors and incident responders look for.
Reputation: a single mishandled "fraud" accusation can blow up on social channels and candidate forums.
Fraud is real, but your signals are probabilistic
Checkr reports that 31% of hiring managers say they have interviewed a candidate who later turned out to be using a false identity. Directionally, this implies identity fraud is common enough that you need consistent gating before interviews, not just after an incident. It does not prove your industry, role type, or funnel has the same rate, and it does not tell you which detection signals are most reliable in your stack. Pindrop reports that 1 in 6 applicants to remote roles showed signs of fraud in one real-world pipeline. Directionally, remote hiring increases the attack surface and raises the need for automation plus evidence-based review. It does not mean 1 in 6 are confirmed fraud, and it does not tell you the false-positive rate of any single control.
Ownership, automation, and systems of record
Recommendation: assign one accountable owner for the decision, and keep Support out of the role of "fraud judge." Support should run intake and evidence collection, then route by policy.

Ownership model that works in practice:
- Recruiting Ops owns the policy (risk tiers, retake eligibility, SLAs) and the candidate-facing process.
- Security owns the high-risk escalation criteria and approves any new signals or step-ups.
- Hiring managers own the hiring decision only after integrity is cleared to an interviewable state.

What is automated vs manually reviewed:
- Automated: flag generation, risk-tier assignment, gating actions (block, step-up, allow), creation of an Evidence Pack shell, and ATS status updates via idempotent webhooks.
- Manual: medium-risk adjudication, retake approval, and high-risk escalation review with Security.

Sources of truth:
- The ATS is the system of record for stage and disposition.
- The verification service is the system of record for identity continuity evidence (document, face, voice, and liveness results).
- The assessment platform is the system of record for code telemetry and proctoring signals.
- The IntegrityLens Evidence Pack is the system of record for the decision narrative and artifacts tied to the candidate.
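The automated tier assignment above can be sketched as a small routing function. This is a minimal illustration under stated assumptions: the signal names follow the example policy later in this post, and the thresholds (one strong signal or two weak signals triggers a step-up) are placeholders you would tune; none of this is an IntegrityLens API.

```python
# Illustrative sketch of automated risk-tier assignment and gating.
# Signal names and thresholds are assumptions, not IntegrityLens APIs.

# Signals that confirm identity discontinuity or proxy behavior.
HIGH = {"face-mismatch", "document-invalid", "multi-face-detected"}

# Strong-but-not-conclusive signals that warrant a bounded step-up.
STEP_UP = {"impossible-time-to-solve", "high-copy-paste-burst",
           "liveness-low-quality", "name-normalization-mismatch"}

def assign_tier(signals: set[str]) -> str:
    """Map observed integrity signals to a risk tier per policy."""
    if signals & HIGH:
        return "high"
    strong = len(signals & STEP_UP)
    weak = len(signals - STEP_UP)  # anything else observed, e.g. "vpn-detected"
    if strong >= 1 or weak >= 2:
        return "medium"
    return "low"

def gate_action(tier: str) -> str:
    """The tier, not a human hunch, determines the gating action."""
    return {"low": "allow", "medium": "step-up", "high": "hold-and-escalate"}[tier]
```

The key design property is that Support never evaluates intent: the function maps signals to a tier, and the tier maps to exactly one gating action.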
Definitions you can copy into policy docs
Use these definitions internally so Support, TA, and Security stop arguing about vocabulary mid-incident.

False positive is a detection outcome where a candidate is flagged as risky by a signal, but subsequent review shows no policy violation and the candidate should proceed.
Integrity signal is an observable indicator (device, network, identity continuity, assessment telemetry) that changes risk level but is not, by itself, proof of wrongdoing.
Risk-Tiered Verification is a control design that applies stronger verification steps only when signals indicate higher risk, reducing friction for low-risk candidates while maintaining defensibility.
Evidence Pack is a structured bundle of logs, artifacts, timestamps, and reviewer notes that explains why a candidate was cleared, stepped up, or rejected.
Step-by-step runbook: clear honest candidates without training fraudsters
Recommendation: treat every flagged-but-verified case as a two-question decision: (1) is identity continuous, and (2) is the assessment attempt attributable and consistent.

Step 1: Freeze the state, not the candidate
- Lock the current attempt artifacts (verification result, assessment telemetry, timestamps). Do not ask the candidate to "try again" before you snapshot evidence, or you lose comparability.
- Ensure the ATS reflects "Step-up required" rather than "Rejected" to avoid reputational damage if the case clears.

Step 2: Classify the flag into a Support-friendly bucket
- Identity continuity mismatch (name/DOB mismatch, face mismatch, liveness quality issues).
- Assessment integrity anomaly (tab-switching spikes, copy/paste bursts, impossible timing, multiple faces in frame).
- Environment anomalies (VPN/proxy, device fingerprint volatility, unusual geo change).

Step 3: Apply bounded step-ups (do not debate intent)
- For identity-quality issues: offer a one-tap retake with clearer lighting instructions and a quality threshold. Keep it time-boxed (example: within 24 hours) so scheduling stays predictable.
- For environment anomalies: require a step-up verification before the next interview stage, not a rejection. Ask for a second factor that is hard to spoof, not a long explanation email thread.
- For assessment anomalies: require an attributable retake in a stronger mode (stricter proctoring or a short live verified follow-up) rather than accepting a narrative.

Step 4: Review with an evidence rubric, not gut feel
- Confirm whether the same verified identity appears across attempts (document, face, voice) and whether liveness passed at adequate quality.
- Compare attempt fingerprints: device, network, timestamp patterns, and assessment telemetry. Look for consistency, not perfection.
- Decide: Clear, Step-up again (one time), or Escalate to Security.

Step 5: Close the loop with a candidate-safe explanation
- Never accuse. Use neutral language: "We could not verify continuity on this attempt" or "We need an additional verification step to proceed."
- Provide the next action in one sentence, with an ETA and an appeal path.

Step 6: Write the Evidence Pack note as you go
- Support adds intake context, candidate communications, and which policy clause was applied.
- The reviewer adds the specific artifacts consulted and the decision rationale.
- Recruiting Ops ensures the ATS stage and disposition match the decision, then closes the ticket.
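The two-question decision at the heart of this runbook can be encoded as a small routing helper. This is a sketch under stated assumptions: the inputs are booleans that upstream verification and review have already established (the helper encodes only the routing, not the detection), and the outcome labels and one-retake default are illustrative.

```python
# Sketch of the runbook's two-question decision. Inputs are facts an
# upstream service or reviewer has already established; outcome labels
# and the one-retake budget are illustrative assumptions.

def route_flagged_case(identity_continuous: bool,
                       attempt_attributable: bool,
                       retakes_used: int,
                       max_retakes: int = 1) -> str:
    if not identity_continuous:
        return "escalate-to-security"  # confirmed discontinuity is always high risk
    if attempt_attributable:
        return "clear"                 # both questions answered yes
    if retakes_used < max_retakes:
        return "bounded-retake"        # one stronger-mode retake, time-boxed
    return "manual-review"             # retake budget exhausted: human adjudication
```

Note that "reject" never appears as an automated outcome: adverse action only happens after escalation or manual review, with an Evidence Pack attached.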
SLA targets by lane:
- Low-risk step-up retake decision: same business day.
- Medium-risk manual review: within 1 business day.
- High-risk Security escalation: acknowledge within 4 business hours, decision within 2 business days unless incident response is triggered.
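A naive deadline helper for these targets might look like the following. This sketch deliberately ignores business-hours calendars (a real implementation needs one), and the 8-hour low-risk window is an assumed stand-in for "same business day."

```python
from datetime import datetime, timedelta

# Naive SLA deadline helper matching the targets above. It ignores
# business-hours calendars on purpose; the hour values are assumptions.
SLA_HOURS = {"low": 8, "medium": 24, "high": 4}  # high = acknowledgment SLA

def sla_deadline(tier: str, flagged_at: datetime) -> datetime:
    """Return the latest acceptable response time for a flagged case."""
    return flagged_at + timedelta(hours=SLA_HOURS[tier])
```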
A policy artifact Support can run without guesswork
This example policy is designed so Support can route cases consistently, while Security retains control of high-risk escalation. Customize thresholds to your environment and validate with your Legal and Security teams.
Where IntegrityLens fits
IntegrityLens AI is the first hiring pipeline that combines a full Applicant Tracking System with advanced biometric identity verification, AI screening, and technical assessments. For false positives, it gives you Risk-Tiered Verification controls, Evidence Packs, and workflow automation so Support can resolve flags fast without informal overrides. Used by TA leaders, recruiting ops, and CISOs, IntegrityLens supports:
- ATS workflow as the system of record for stage and disposition
- Identity verification in under three minutes (typical end-to-end document + voice + face in 2-3 minutes) before interviews
- Fraud detection signals that trigger step-ups and review queues
- 24/7 AI screening interviews to reduce scheduling pressure during step-ups
- Coding assessments in 40+ languages with integrity telemetry and defensible rubrics

Anti-patterns that make fraud worse
- "Just override it" exceptions in Slack with no Evidence Pack, which creates audit findings and teaches attackers that persistence wins.
- Unlimited retakes without stronger verification, which lets fraudsters A/B test your controls.
- Zero-tolerance auto-reject on any anomaly, which inflates false positives and pushes good candidates to competitors.
Sources
- Checkr (2025): Hiring Hoax (Manager Survey). https://checkr.com/resources/articles/hiring-hoax-manager-survey-2025
- Pindrop: Why your hiring process is now a cybersecurity vulnerability. https://www.pindrop.com/article/why-your-hiring-process-now-cybersecurity-vulnerability/
Key takeaways
- Treat false positives as an operations design issue: define step-ups, SLAs, and evidence, not ad hoc exceptions.
- Separate "integrity signals" (risk indicators) from "disqualifiers" (policy violations) to keep honest candidates moving.
- Use Risk-Tiered Verification: fast-path low-risk, step-up medium-risk, and security review for high-risk patterns.
- Build an Evidence Pack per case so Support can close disputes without backchanneling Security for every decision.
- Offer bounded retakes with stronger verification rather than blanket rejections or unlimited retries.
Drop this into your internal control repo and map each action to ATS statuses and Support macros.
Design intent: one bounded retake path, one escalation path, and a mandatory Evidence Pack for any adverse action.
```yaml
version: 1
policyName: "false-positive-flag-handling"
scope:
  stages: ["verify-identity", "ai-screen", "coding-assessment", "interview"]
  appliesTo: "flagged-but-not-confirmed-fraud"
riskTiers:
  low:
    definition: "Single weak signal; identity continuity otherwise consistent"
    actions:
      - type: "allow"
        atsStatus: "Proceed"
      - type: "note"
        evidencePackRequired: true
        noteTemplate: "Cleared: weak signal only. See artifacts and timestamps."
  medium:
    definition: "One strong signal or multiple weak signals; no confirmed mismatch"
    actions:
      - type: "step-up"
        stepUpType: "bounded-retake"
        allowedRetakes: 1
        retakeWindowHours: 24
        requirements:
          - "zero-retention-biometrics"
          - "liveness-min-quality:0.70" # tune per vendor calibration
          - "same-identity-required:true"
        atsStatus: "Step-up required"
      - type: "manual-review"
        queue: "Integrity-Review"
        slaHours: 24
        evidencePackRequired: true
  high:
    definition: "Confirmed identity discontinuity or pattern consistent with proxy behavior"
    actions:
      - type: "hold"
        atsStatus: "Security review"
      - type: "escalate"
        queue: "Security"
        slaHours: 8
        evidencePackRequired: true
      - type: "candidate-comms"
        template: "Neutral-stepup"
        prohibitedPhrases: ["fraud", "cheater", "criminal"]
signals:
  identity:
    blocking:
      - name: "face-mismatch"
        mapsToTier: high
      - name: "document-invalid"
        mapsToTier: high
    stepUp:
      - name: "liveness-low-quality"
        mapsToTier: medium
      - name: "name-normalization-mismatch"
        mapsToTier: medium
  assessment:
    stepUp:
      - name: "impossible-time-to-solve"
        mapsToTier: medium
      - name: "multi-face-detected"
        mapsToTier: high
      - name: "high-copy-paste-burst"
        mapsToTier: medium
integrations:
  webhooks:
    mode: "idempotent"
    idempotencyKey: "candidateId:eventType:attemptId"
  writeBack:
    systemOfRecord: "ATS"
    fields: ["riskTier", "atsStatus", "evidencePackUrl", "decisionTimestamp"]
logging:
  retentionDays:
    evidencePack: 180
    rawBiometrics: 0
accessControls:
  rolesAllowed: ["RecruitingOps", "IntegrityReviewer", "Security"]
  supportRole: "read-only-metadata"
```
Outcome proof: What changes
Before
Flagged candidates triggered ad hoc overrides, inconsistent messaging, and frequent Security pings. Support tickets stayed open until a recruiter or manager forced a decision, creating brand-risky comms and weak audit trails.
After
Support ran a consistent step-up flow with one bounded retake lane, a clear escalation lane, and Evidence Packs attached to ATS records. Recruiters stopped requesting informal overrides because the process produced predictable ETAs and defensible outcomes.
Implementation checklist
- Define which flags are informational vs blocking, and publish it internally (Support, TA, Security).
- Create a 2-lane queue: low-friction retake lane and high-risk escalation lane.
- Set an SLA per lane (example: 4 business hours for retake decisions, 1 business day for escalations).
- Require an Evidence Pack for any adverse action or exception.
- Instrument idempotent webhook events so the ATS reflects the current risk tier reliably.
- Add an appeal path that does not require the hiring manager to interpret fraud signals.
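The idempotent webhook item on the checklist can be sketched as a dedup guard keyed the same way as the example policy (`candidateId:eventType:attemptId`). The in-memory set is a stand-in for a durable dedup store, and the function names are illustrative, not a vendor API.

```python
from typing import Callable

# Sketch of idempotent webhook handling. The key shape follows the example
# policy's idempotencyKey; the in-memory set stands in for a durable store.
_processed: set[str] = set()

def handle_webhook(candidate_id: str, event_type: str, attempt_id: str,
                   apply_update: Callable[[], None]) -> bool:
    """Apply the ATS update at most once per (candidate, event, attempt).

    Returns True if the update was applied, False if it was a duplicate
    delivery, so retried webhooks can never double-write the risk tier.
    """
    key = f"{candidate_id}:{event_type}:{attempt_id}"
    if key in _processed:
        return False  # duplicate delivery: safely ignored
    apply_update()
    _processed.add(key)
    return True
```

Because webhook providers typically retry on timeout, the dedup key, not the delivery count, is what keeps the ATS status consistent.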
Questions we hear from teams
- How do we tell a candidate they were flagged without accusing them?
- Use neutral, process-based language: "We could not verify continuity on this attempt" or "We need one additional verification step to proceed." Avoid the words "fraud" or "cheating" in candidate comms and point to a specific next action with an ETA.
- When should Support escalate to Security?
- Escalate only when you have confirmed identity discontinuity (for example, face mismatch across attempts) or a high-risk proxy pattern (for example, multi-face in frame plus device volatility). Everything else should route to a bounded retake or integrity review queue.
- Do retakes increase fraud risk?
- Unlimited retakes do. A single bounded retake with stronger verification reduces false positives without giving attackers infinite attempts to tune their behavior.
- What should an Evidence Pack contain for a false positive case?
- Minimum: risk tier, triggering signals, verification results and timestamps, assessment attempt metadata, reviewer decision with policy clause, candidate communications, and the final ATS disposition.
Ready to secure your hiring pipeline?
Let IntegrityLens help you verify identity, stop proxy interviews, and standardize screening from first touch to final offer.
Watch IntegrityLens in action
See how IntegrityLens verifies identity, detects proxy interviewing, and standardizes screening with AI interviews and coding assessments.
