Generative AI in Coding Tests: Policy That Catches Fraud
A CFO-grade playbook to separate legitimate GenAI-assisted work from proxying, leakage, and identity fraud - without slowing the funnel.
Allow GenAI, but verify authorship. Your control objective is predictive signal, not tool purity.
The day your "top coder" becomes a finance incident
Your team greenlights an offer for a senior engineer after a fast, impressive coding test. Two weeks into onboarding, deliverables stall, access requests look odd, and the manager admits the hire cannot explain the code they supposedly wrote. Now FP&A is modeling rework, backfill, and delayed roadmap revenue while Legal asks what diligence you performed. By the end of this article, you will be able to write a GenAI-allowed coding assessment policy that protects speed while reliably separating modern efficiency from fraud signals that require step-up verification.
Cost: churn and replacement cost exposure when the wrong hire slips through
Speed: cycle time volatility when policies trigger re-tests and appeals
Risk: audit questions about fair process and defensible decisions
Reputation: candidate backlash when "AI cheating" accusations are sloppy
What the fraud stats imply for your cost model
A Checkr survey reports that 31% of hiring managers say they have interviewed a candidate who later turned out to be using a false identity. Directionally, this suggests identity risk is not edge-case noise in modern hiring funnels, especially where remote work is involved. It does not prove your company has the same rate, nor does it isolate technical roles or verified pipelines.

Pindrop reports that 1 in 6 applicants to remote roles showed signs of fraud in one real-world hiring pipeline. Directionally, that means remote pipelines should be treated as an attack surface, not just an HR workflow. It does not prove fraud was confirmed in every flagged case, and it reflects one pipeline's instrumentation and definitions.

When a bad hire happens, the replacement cost is frequently estimated at 50% to 200% of annual salary, depending on role. Directionally, the CFO takeaway is that even a small number of integrity failures can dominate your hiring efficiency gains. It does not mean every mis-hire costs 200%, and it varies heavily with seniority, time-to-fill, and productivity loss. A back-of-envelope sketch after the bullets below makes the exposure concrete.
Your risk is tail-risk: a few high-impact failures, not average-case variance
The real cost is secondary: delays, incident response time, and team churn
The control objective is stability: predictable funnel throughput and defensible exceptions
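To make the tail-risk math concrete, here is a back-of-envelope sketch in Python. Every input below is an illustrative assumption, not a benchmark; swap in your own funnel volume, salary mix, and post-control slip rate.

# Illustrative tail-cost model; all inputs are assumptions, not benchmarks.
hires_per_year = 40
fraud_slip_rate = 0.02              # integrity failures that survive controls
avg_salary = 160_000
replacement_multiple = (0.5, 2.0)   # 50%-200% of salary, per the estimates above

low = hires_per_year * fraud_slip_rate * avg_salary * replacement_multiple[0]
high = hires_per_year * fraud_slip_rate * avg_salary * replacement_multiple[1]
print(f"Expected annual exposure: ${low:,.0f} to ${high:,.0f}")
# -> Expected annual exposure: $64,000 to $256,000

Even at a 2% slip rate, the exposure band rivals the savings from shaving days off time-to-fill, which is why the controls below focus on the tail rather than the average case.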
Definitions you can put in policy
Clear definitions reduce appeals, reduce inconsistent reviewer decisions, and keep your controls from becoming an anti-talent tax.
Who owns this and what is automated
Make ownership explicit before you tune thresholds. The failure mode is cross-functional ambiguity that turns every edge case into an escalation. Recommended model: Recruiting Ops owns policy configuration and queue SLAs, Security owns identity and data handling requirements, and Hiring Managers own rubric scoring and final technical disposition. Finance can require quarterly reporting on exceptions, retest rates, and reviewer workload as an internal control.

Automation should handle 90% of cases: identity verification, session instrumentation, risk scoring, and routing. Manual review should be reserved for step-up cases with clear evidence and consistent dispositions.

Sources of truth: the ATS is the system of record for stages and decisions; the verification subsystem is the system of record for identity status; the assessment system is the system of record for attempt artifacts; and the Evidence Pack links them with timestamps for audit and appeals (a sketch of such a record follows the list below).

Recruiting Ops: owns thresholds, candidate comms templates, and appeal workflow
Security: signs off on biometric retention, access control, and audit logging
Hiring Manager: scores output quality and reasoning, not "tool purity"
TA leader: accountable for funnel leakage and time-to-offer
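Because the Evidence Pack is the audit spine connecting those systems of record, it helps to pin down what one record links. A minimal sketch in Python; the field names here are illustrative assumptions, not a product schema:

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EvidencePack:
    """One assessment attempt, linked across the three systems of record.
    Field names are illustrative, not a product schema."""
    candidate_id: str                 # ATS: stages and decisions
    verification_status: str          # verification subsystem: identity status
    attempt_artifact_url: str         # assessment system: attempt artifacts
    risk_tier: str                    # low | medium | high
    signals_observed: list[str] = field(default_factory=list)
    disposition: str | None = None    # pass | pass-with-note | retest | reject-for-policy
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

pack = EvidencePack(
    candidate_id="ats-123",
    verification_status="verified",
    attempt_artifact_url="https://assessments.example.com/attempts/456",
    risk_tier="medium",
    signals_observed=["paste_to_type_ratio=0.91"],
)

The point is not this exact schema but that every adverse action can cite timestamped pointers into each system of record.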
What should be allowed vs disallowed in a GenAI-enabled coding test
Allow GenAI for productivity, but disallow misrepresentation. The policy line that works operationally is: candidates may use GenAI to accelerate implementation, but must be able to explain and modify their solution live, and must not outsource work to another human or hidden agent. In practice, CFO risk is not that candidates used an LLM. It is that the hiring signal no longer predicts on-the-job performance because the candidate did not produce the work, cannot reason about it, or used prohibited leakage (private repos, prior test prompts, or employer code).
Allowed:
Syntax help and scaffolding with disclosure ("I used a tool to generate a baseline")
Refactoring suggestions that the candidate can justify
Writing unit tests faster, as long as tests align to the prompt
Disallowed:
Proxying: a different person completes the assessment or explains the solution
Prompt leakage: using prior copies of the exact assessment or sharing it externally
Hidden agent behavior: live screen sharing to a helper, remote control, or multiple simultaneous sessions
Integrity signals that separate modern work from misrepresentation
The most defensible approach is signal fusion: no single metric should fail a candidate. Combine identity, environment, and code-behavior signals and use Risk-Tiered Verification to step up only when multiple independent indicators align (a minimal sketch of the routing logic follows the lists below). From a finance perspective, this reduces false positives that cause re-tests and candidate drop-off, while still catching high-risk patterns before offer.
Identity signals: document + face + voice match status, liveness, repeated identities across applicants
Session signals: IP volatility, impossible geovelocity, multi-device switching mid-attempt, repeated restarts
Behavior signals: extreme copy-paste ratios, low keystroke activity with large code deltas, suspiciously consistent completion times across candidates
Code signals: similarity clusters across attempts, answer-template reuse, mismatch between code complexity and explanation quality
Low risk: proceed, log evidence, no candidate friction
Medium risk: add a short explain-and-modify checkpoint in the interview, focused on tradeoffs
High risk: step-up verification and manual review before advancing, with a standardized disposition
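Here is a minimal sketch of that routing logic in Python, mirroring the baseline YAML at the end of this article. The signal and field names are illustrative assumptions, not a product API:

def risk_tier(identity_status: str, signals: dict) -> str:
    """Fuse independent signals into a tier; thresholds are illustrative."""
    # Hard signals: any one routes to step-up verification and manual review.
    if identity_status in ("unverified", "mismatch"):
        return "high"
    if (signals.get("concurrent_session")
            or signals.get("similarity_cluster")
            or signals.get("impossible_geovelocity")):
        return "high"
    # Soft behavioral signals: route to an explain-and-modify checkpoint,
    # never an auto-reject on a single metric.
    if (signals.get("paste_to_type_ratio", 0.0) >= 0.85
            or signals.get("restart_count", 0) >= 2):
        return "medium"
    return "low"  # advance, store the Evidence Pack, no added friction

The design choice that matters: behavioral signals alone can only add a checkpoint; holds require identity failures or hard session and code signals.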
A policy config you can actually implement
This example shows a practical GenAI-allowed policy with step-up controls; the full YAML baseline appears after the key takeaways at the end of this article. It is designed to be reviewed by Recruiting Ops and Security, then enforced consistently across roles.
Step-by-step rollout that keeps cycle time predictable
Start with clarity, then instrumentation, then thresholds. If you tune thresholds first, you will either over-block (candidate churn) or under-block (missed fraud).
Step 1: Define the allowed GenAI behaviors with 3 examples and 3 disallowed behaviors, and require candidate acknowledgement before the test starts
Step 2: Rework the assessment into a Day 1 task: add a small ambiguity, a tradeoff decision, and a short written rationale section that GenAI cannot fake without understanding
Step 3: Turn on identity gating before the assessment for remote roles, and log verification status to the candidate profile
Step 4: Instrument integrity signals and set initial thresholds in shadow mode for 2-3 weeks (illustrative), measuring false positives and reviewer workload (see the shadow-mode sketch after this list)
Step 5: Activate Risk-Tiered Verification actions: explain-and-modify for medium risk, step-up verification and review queue for high risk
Step 6: Standardize dispositions and appeals: "pass", "pass with note", "retest", "reject for policy", and require an Evidence Pack link for any adverse action
Step 7: Review monthly: exceptions volume, retest rate, time in review queue, and any audit findings related to fairness and consistency
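For Step 4, here is a minimal sketch of how shadow-mode results might be scored, assuming you log each attempt's would-be tier next to the reviewer's eventual ground-truth call. Both field names are hypothetical:

def shadow_mode_report(attempts: list[dict]) -> dict:
    """attempts: [{'would_be_tier': 'high', 'ground_truth_fraud': False}, ...]
    Measures what thresholds WOULD have done before enforcement is active."""
    flagged = [a for a in attempts if a["would_be_tier"] in ("medium", "high")]
    false_positives = [a for a in flagged if not a["ground_truth_fraud"]]
    missed = [a for a in attempts
              if a["would_be_tier"] == "low" and a["ground_truth_fraud"]]
    return {
        "flag_rate": len(flagged) / max(len(attempts), 1),
        "false_positive_rate": len(false_positives) / max(len(flagged), 1),
        "missed_fraud_count": len(missed),
        "reviewer_load": len(flagged),  # proxy for review-queue workload
    }

Tune thresholds until false_positive_rate and reviewer_load are tolerable before turning enforcement on.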
Anti-patterns that make fraud worse
These look tough on paper but create more noise, more appeals, and more ways for sophisticated actors to route around your controls.
Zero-tolerance "any GenAI is cheating" policies that punish honest candidates and increase funnel leakage into competitors
Single-signal auto-rejects (for example copy-paste alone) that create brittle controls and high false positive rates
Unlogged exceptions ("the manager okayed it") that fail audits and make inconsistent outcomes inevitable
Where IntegrityLens fits
IntegrityLens AI is built for teams that need to allow modern tooling without losing hiring signal quality. TA leaders and recruiting ops teams run the workflow in one ATS, while CISOs rely on the security posture and audit trail. You can gate coding tests with biometric identity verification (document, face, and voice checks typically complete in under three minutes before interviews), run 24/7 AI screening interviews, and deliver technical assessments across 40+ languages. Integrity signals route candidates into Risk-Tiered Verification and manual review queues, with Evidence Packs that keep decisions defensible without adding swivel-chair work. Compare that with the stitched-together status quo:
Separate ATS, assessment tool, and verification vendor stitched together by spreadsheets
Unreviewable screenshots and subjective notes with no timestamps
Inconsistent step-up decisions that change by recruiter or hiring manager
Key takeaways
- Treat GenAI as a permitted tool with boundaries, not a binary pass-fail trigger, to avoid false rejections that inflate hiring cost.
- Differentiate efficiency from fraud by combining identity signals, environment signals, and code-forensics signals into a risk score with step-up actions.
- Make your policy auditable: log what was allowed, what was observed, and why a step-up review happened.
- Design assessments around Day 1 work products so GenAI use reveals judgment and verification skills instead of trivia recall.
Use this as a baseline configuration for a risk-tiered assessment policy.
It explicitly allows GenAI for efficiency, then uses integrity signals to trigger step-ups and queue reviews.
Tune thresholds in shadow mode first to control false positive rates and reviewer fatigue.
policy:
  name: genai-allowed-coding-assessment-v1
  scope:
    roles:
      - software-engineer
      - data-engineer
      - security-engineer
    locations:
      - remote
      - hybrid
  candidate-disclosure:
    required: true
    prompt: "GenAI tools are allowed for implementation help. You must be able to explain and modify your solution live. Proxying, prompt leakage, and hidden helpers are prohibited."
  allowed-behaviors:
    - "Use GenAI for scaffolding, syntax, refactoring suggestions"
    - "Use GenAI to draft unit tests"
    - "Use public documentation and standard libraries"
  disallowed-behaviors:
    - "Another person completes any part of the assessment or interview"
    - "Sharing or reusing test prompts/solutions outside the platform"
    - "Remote control, screensharing to a helper, or concurrent sessions"
  identity-gating:
    required-before-assessment: true
    target-time-seconds: 180
    methods:
      - document
      - face-liveness
      - voice-match
    privacy:
      biometrics: "zero-retention-biometrics"
      evidence-retention-days: 180
  signals:
    session:
      ip_volatility_threshold: 2
      device_switch_threshold: 1
      concurrent_session: "high"
      impossible_geovelocity: "high"
    behavior:
      paste_to_type_ratio_high: 0.85
      low_keystroke_large_delta: true
      restart_count_threshold: 2
    code:
      similarity_cluster_detection: true
      explanation_mismatch_flag: true
  risk-tiering:
    low:
      when:
        - "identity.status == verified"
        - "signals.concurrent_session == false"
        - "signals.similarity_cluster == false"
      action:
        - "advance"
        - "store-evidence-pack"
    medium:
      when:
        - "signals.paste_to_type_ratio >= 0.85"
        - "signals.restart_count >= 2"
      action:
        - "add-explain-and-modify-checkpoint"
        - "store-evidence-pack"
    high:
      when:
        - "identity.status in [unverified, mismatch]"
        - "signals.concurrent_session == true"
        - "signals.similarity_cluster == true"
        - "signals.impossible_geovelocity == true"
      action:
        - "hold"
        - "step-up-verification"
        - "route-to-manual-review-queue: integrity-review"
        - "store-evidence-pack"
  review-queue:
    name: integrity-review
    sla-hours: 24
    dispositions:
      - pass
      - pass-with-note
      - retest
      - reject-for-policy
    required-notes:
      - "reason"
      - "evidence-pack-link"
  integrations:
    ats:
      source-of-truth: true
      writeback-fields:
        - verification_status
        - risk_tier
        - evidence_pack_url
    webhooks:
      mode: idempotent
      events:
        - assessment.submitted
        - verification.completed
        - risk.updated
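On the consuming side, mode: idempotent implies your handler must tolerate duplicate deliveries. A minimal sketch, assuming a hypothetical delivery_id field on each event; in production the dedupe set would live in a durable store, not process memory:

import json

processed: set[str] = set()  # use a durable store (DB/Redis) in production

def handle_webhook(raw_body: str) -> None:
    event = json.loads(raw_body)
    delivery_id = event["delivery_id"]   # hypothetical field name
    if delivery_id in processed:         # retries replay the same ID
        return                           # second delivery is a no-op
    processed.add(delivery_id)
    if event["type"] == "risk.updated":
        write_back_to_ats(event["candidate_id"], {"risk_tier": event["risk_tier"]})

def write_back_to_ats(candidate_id: str, fields: dict) -> None:
    # Placeholder for your ATS client, per the writeback-fields above.
    print(f"ATS writeback {candidate_id}: {fields}")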
Outcome proof: What changes
Before
GenAI use was treated as cheating by some managers and ignored by others. That inconsistency drove re-tests, escalations, and unclear audit trails when candidates appealed rejections.
After
The team implemented an explicit GenAI-allowed policy with Risk-Tiered Verification, added an explain-and-modify checkpoint for medium-risk cases, and required Evidence Pack links for any adverse decision.
Implementation checklist
- Publish an explicit "Allowed GenAI" policy with examples and candidate acknowledgements
- Instrument integrity signals (identity, device/session, copy-paste, timing, code similarity) and define thresholds; a similarity sketch follows this checklist
- Adopt Risk-Tiered Verification: low friction by default, step-up only on risk
- Route exceptions to an SLA-bound manual review queue with consistent dispositions
- Store an Evidence Pack per candidate attempt for appeal handling and audit defense
- Review false positives monthly to avoid reviewer fatigue and biased outcomes
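For the code-similarity item above, one simple, widely used building block is token shingling with Jaccard overlap. A sketch with an illustrative threshold; treat matches as a routing signal for human review, never an auto-reject:

def shingles(code: str, k: int = 5) -> set[tuple[str, ...]]:
    """k-token shingles; a common unit for near-duplicate detection."""
    tokens = code.split()
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / max(len(sa | sb), 1)

SIMILARITY_FLAG = 0.8  # illustrative; calibrate in shadow mode first
# Example: route a pair of attempts to the integrity-review queue
# when jaccard(attempt_1_code, attempt_2_code) >= SIMILARITY_FLAG.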
Questions we hear from teams
- Should we allow candidates to use ChatGPT or Copilot in coding assessments?
- Yes, if you define boundaries and verify authorship. Allow GenAI for implementation help, then require an explain-and-modify checkpoint and use integrity signals to trigger step-up verification when patterns look like proxying or leakage.
- How do we avoid penalizing honest candidates who paste code snippets?
- Do not auto-reject on a single behavior. Treat high paste ratios as a routing signal to a short explanation checkpoint, and only escalate when identity or session signals corroborate risk.
- What evidence do we need to defend a rejection for cheating?
- A defensible record should include the candidate-facing policy, the candidate acknowledgement, identity verification status, timestamped assessment artifacts, and a documented disposition rationale tied to specific observed signals.
- What is the minimum viable control set for finance leaders worried about cost?
- Start with identity gating before assessments for remote roles, log verification status in the ATS, and use a single manual review queue with an SLA and required Evidence Pack links for any adverse action.
Ready to secure your hiring pipeline?
Let IntegrityLens help you verify identity, stop proxy interviews, and standardize screening from first touch to final offer.
Watch IntegrityLens in action
See how IntegrityLens verifies identity, detects proxy interviewing, and standardizes screening with AI interviews and coding assessments.
