Code Session Replay Playbook to Catch Copy-Paste Farms
A defensible way to separate legitimate open-book problem solving from code-farm paste-ins, without slowing down hiring.

If you cannot replay how the code appeared, you cannot defend how you judged it.
When the assessment looks clean but the timeline is not
The offer is ready. The hiring manager is excited. Then Security gets an email from a customer-facing engineer: "I saw the candidate's code in a forum last week." You pull the assessment submission and it is flawless. That is the problem: a static file tells you nothing about how it was produced.

By the end of this briefing, you will be able to implement replay-based integrity signals for coding assessments, route only high-risk sessions to manual review, and generate audit-ready Evidence Packs without turning hiring into a surveillance program.

The pattern you are defending against is not "candidate used Google". It is a copy-paste coding farm: someone else solves the task offline, then pastes the solution in a few seconds, often with minimal local editing and an implausible progression curve. Replay makes that visible and reviewable.
Why replay changes the risk profile
Session replay is simply a time-ordered reconstruction of editing events: when text was inserted, when blocks were replaced, when the candidate switched tabs, and how the code evolved. You do not need full screen recordings to get value.

A CISO and GC care about two things: (1) can we detect and deter fraudulent completion, and (2) can we explain our decision if challenged. Replay helps with both because it produces a narrative: a timeline you can review, summarize, and attach to an Evidence Pack.

Pindrop reported that 1 in 6 applicants to remote roles showed signs of fraud in one real-world hiring pipeline. Directionally, that implies remote hiring pipelines can be a meaningful attack surface. It does not prove your organization's fraud rate, nor does it break out fraud types specific to coding assessments. You still need your own instrumentation and baselines.
Ownership and flow
Automated vs manual: Automation scores sessions and queues exceptions. Manual review happens only for escalations, using replay plus a short reviewer checklist.
Sources of truth: ATS is the system of record for stages and decisions. The assessment platform is the source for session telemetry. The verification service is the source for identity status. Evidence Packs are the audit artifact that links all three with immutable timestamps.
What to capture (and what not to) for replay integrity signals
Edit events with timestamps (insert, delete, replace), plus diff sizes
Paste events (size, location, frequency) without storing clipboard contents
Focus/blur and tab visibility events (in-app, not OS-level spying)
Run/compile/test events and results, when available
Session start/end and idle durations (see the example event record after this list)
Avoid (high exposure, low hiring value):
Full screen recordings by default
Keystroke logging or raw keypress streams
Storing ID documents alongside assessment telemetry
Storing biometrics beyond verification decision artifacts (use Zero-Retention Biometrics where possible)
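To make the capture list concrete, here is a minimal sketch of the kind of event record a replay pipeline could store. The SessionEvent structure and field names are illustrative assumptions, not a specific product schema; the point is that sizes and timing are recorded while clipboard contents and raw keypresses are not.

from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical event record: enough to reconstruct a timeline,
# nothing that exposes clipboard contents or raw keypresses.
@dataclass
class SessionEvent:
    session_id: str
    timestamp_ms: int                      # server-assigned, monotonic
    kind: Literal["edit", "paste", "run", "focus", "blur", "submit", "idle"]
    chars_inserted: int = 0                # size only, never the pasted text itself
    chars_deleted: int = 0
    file_path: Optional[str] = None        # which file or editor pane was touched
    test_result: Optional[str] = None      # e.g. "pass" or "fail" for run events

# A paste event records its size and location, not its contents.
event = SessionEvent(
    session_id="sess-123",
    timestamp_ms=1_712_000_000_000,
    kind="paste",
    chars_inserted=942,
    file_path="solution.py",
)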
Risk-tiered scoring that catches farms without punishing open-book work
Paste burst: N paste events over X characters within Y seconds
Edit-to-paste ratio: total typed/edited characters divided by pasted characters
Implausible progress curve: large solution appears near the end with minimal preceding scaffolding
Low iteration: few run/test cycles, then a final "perfect" submission
Context switching spikes: repeated blur events immediately before large pastes (directional, not dispositive)
Calibrate by role. A senior engineer might paste a known template and then heavily modify it. A junior role might reasonably have more trial-and-error runs. That is why you use step-ups, not instant rejects. A sketch of deriving these signals from raw events follows.
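As a sketch of how those signals can be derived, the function below turns a time-ordered list of session events (using the hypothetical SessionEvent record above) into the summary metrics the policy scores. Window sizes and metric names are illustrative assumptions; thresholds and weights belong in the policy artifact further down, not in code.

from collections import deque

def derive_signals(events, session_start_ms, session_end_ms):
    """Summarize a time-sorted list of SessionEvent objects into scoring inputs."""
    typed_chars = pasted_chars = run_events = 0
    max_chars_pasted_60s = max_paste_events_60s = 0
    chars_in_final_10pct = 0
    window = deque()  # (timestamp_ms, chars) for pastes in the trailing 60 seconds
    final_10pct_start = session_end_ms - 0.10 * (session_end_ms - session_start_ms)

    for e in events:
        if e.kind == "edit":
            typed_chars += e.chars_inserted
        elif e.kind == "paste":
            pasted_chars += e.chars_inserted
            window.append((e.timestamp_ms, e.chars_inserted))
            while window and e.timestamp_ms - window[0][0] > 60_000:
                window.popleft()
            max_chars_pasted_60s = max(max_chars_pasted_60s, sum(c for _, c in window))
            max_paste_events_60s = max(max_paste_events_60s, len(window))
        elif e.kind == "run":
            run_events += 1
        if e.kind in ("edit", "paste") and e.timestamp_ms >= final_10pct_start:
            chars_in_final_10pct += e.chars_inserted

    return {
        "chars_pasted_in_60s": max_chars_pasted_60s,        # peak paste burst
        "paste_events_in_60s": max_paste_events_60s,
        "ratio": typed_chars / pasted_chars if pasted_chars else float("inf"),
        "final_10pct_time_chars": chars_in_final_10pct,
        "run_events": run_events,
    }

Keeping weights and band cut-offs in the YAML policy below, rather than in code, lets Security and Recruiting Ops tune them without a deployment.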
Implementation steps that survive audit and incident response
1) Set expectations with candidates
Update assessment instructions: state that editing telemetry is collected for integrity and fairness.
Define allowable behavior: "open book" is allowed, impersonation and outsourced completion are not.
2) Instrument event collection
Collect the minimal event types listed above and hash-link event batches to prevent tampering (a minimal hash-chaining sketch follows this step).
Use idempotent webhooks to send finalized session summaries into your ATS record.
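As one way to make event batches tamper-evident, here is a minimal hash-chaining sketch: each batch hash covers the previous hash plus the canonical JSON of the batch, so any later edit to a stored batch breaks the chain. The function name and batch shape are assumptions for illustration.

import hashlib
import json

def chain_hash(prev_hash: str, batch: list) -> str:
    """Hash one batch of events together with the previous batch's hash.
    Canonical JSON (sorted keys, fixed separators) keeps hashes reproducible."""
    canonical = json.dumps(batch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()

# Store each batch alongside its chained hash; verification replays the computation.
genesis = "0" * 64
h1 = chain_hash(genesis, [{"kind": "edit", "timestamp_ms": 1, "chars_inserted": 12}])
h2 = chain_hash(h1, [{"kind": "paste", "timestamp_ms": 2, "chars_inserted": 900}])
# A mismatch when re-deriving h1 or h2 later means the stored events were altered.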
3) Build derived features, not raw surveillance
Convert events into summary metrics (paste size distribution, idle time, run frequency).
Store a short replay timeline that replays diffs, not full screen content.
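A replay timeline can be as lightweight as an ordered list of text diffs that a reviewer scrubs through. The sketch below reconstructs successive snapshots from (position, deleted length, inserted text) diffs; the diff shape is an assumption, chosen to show that no screen capture is needed to see how the code appeared.

def apply_diff(text: str, pos: int, deleted: int, inserted: str) -> str:
    """Apply one edit: remove `deleted` characters at `pos`, then insert `inserted`."""
    return text[:pos] + inserted + text[pos + deleted:]

def replay(diffs):
    """Yield successive snapshots of the file so a reviewer can step the timeline."""
    text = ""
    for pos, deleted, inserted in diffs:
        text = apply_diff(text, pos, deleted, inserted)
        yield text

# Two typed edits, then one large paste that replaces the whole file near the end.
timeline = [
    (0, 0, "def solve(xs):\n    pass\n"),
    (19, 4, "return []"),
    (0, 10_000, "def solve(xs):\n    # 40 lines pasted at once\n    ...\n"),  # oversized delete span clears the file
]
for snapshot in replay(timeline):
    pass  # render each snapshot in the reviewer UI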
4) Create a Risk-Tiered Verification policy for assessments
Low risk: auto-pass on integrity, no human review.
Medium risk: require a 10-15 minute live follow-up on the same code to confirm authorship.
High risk: step up identity verification before any further interviews and require security review of the Evidence Pack.
5) Stand up a review queue with SLAs
Limit who can view replay details (least privilege).
Track reviewer fatigue: if escalations spike, your thresholds are too sensitive or the assessment prompt is being leaked.
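One lightweight way to watch for reviewer fatigue is to compare the escalation rate each week against a target band: sustained drift above the band means thresholds are too sensitive or the prompt has leaked, while drift below it may mean real farms are slipping through. The 5% target and 3-point tolerance below are placeholders, not recommendations.

def escalation_health(total_sessions: int, escalated: int,
                      target_rate: float = 0.05, tolerance: float = 0.03) -> str:
    """Compare this period's escalation rate to a target band (placeholder values)."""
    if total_sessions == 0:
        return "no-data"
    rate = escalated / total_sessions
    if rate > target_rate + tolerance:
        return "too-hot: loosen thresholds or check for prompt leakage"
    if rate < target_rate - tolerance:
        return "too-cold: thresholds may be missing real farms"
    return "within-band"

# 9 escalations out of 60 sessions is 15%, well above a 5% target.
print(escalation_health(total_sessions=60, escalated=9))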
6) Package Evidence Packs for decisions and appeals
Include: session metadata, derived signals, replay highlights, reviewer notes, and decision rationale (a minimal record sketch follows this step).
Keep retention short and policy-driven. Long retention increases discovery exposure without hiring benefit.
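As a sketch of what that package might contain, the structure below links session metadata, derived signals, replay highlights, reviewer notes, and the decision under one ID with an immutable timestamp. Field names are illustrative assumptions rather than a product schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class EvidencePack:
    """One audit artifact per escalation, referenced by ID from the ATS timeline."""
    evidence_pack_id: str
    candidate_id: str
    job_id: str
    session_id: str
    derived_signals: dict      # output of derive_signals(), not raw events
    replay_highlights: list    # timestamps of the segments the reviewer actually viewed
    reviewer_notes: str
    decision: str              # e.g. "step_up_followup_interview"
    decision_rationale: str
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

Retention and access controls then apply to one object per escalation instead of notes scattered across tools.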
Policy artifact: replay-based escalation rules
Use this as a starting point for Security and Recruiting Ops to align on thresholds, step-ups, and retention. Tune thresholds after two weeks of baseline data.
Anti-patterns that make fraud worse
Blanket zero-tolerance for any paste event, which drives false positives and increases appeal volume.
Reviewer access sprawl, where recruiters can view raw replay details and you create unnecessary legal exposure.
No step-up lane, forcing binary pass-fail decisions when a 15-minute authorship follow-up would resolve ambiguity.
Where IntegrityLens fits
Recruiting Ops keeps SLAs intact with automated queues and idempotent webhooks into the ATS timeline.
TA leaders get 24/7 AI interviews and assessment workflows that do not punish legitimate open-book work.
Verification runs in under 3 minutes (typically 2-3 minutes for document + voice + face), with 256-bit AES encryption and infrastructure aligned with SOC 2 Type II and ISO 27001.
Outcome proof: what "good" looks like in practice
Before:
Security investigations started late, usually after an offer exception or a team member raised concerns.
Audit evidence was scattered across the assessment tool, email threads, and ATS notes.
After:
Escalations were queued automatically with consistent reviewer notes and a single Evidence Pack ID attached to the candidate record.
Hiring managers got a lightweight follow-up script for medium-risk cases, avoiding automatic rejections.
Legal signed off because the program minimized data capture, enforced least-privilege access, and documented an appeal flow with time-bound retention.
Impact (qualitative): faster fraud triage, fewer "he said-she said" disputes, and clearer defensibility when a candidate challenges a decision.
Key takeaways
- Replay is less about surveillance and more about reconstructing a defensible timeline when integrity is challenged.
- High-signal indicators are bursty paste events, low edit-to-paste ratios, and implausible progress curves, not "perfect code".
- Risk-tiered policies reduce reviewer fatigue: only a small fraction of sessions should escalate to manual replay.
- Store event telemetry and derived features, not keystrokes or raw screen recordings, to stay privacy-first and audit-ready.
- Evidence Packs should support both enforcement and appeal: what happened, why it triggered, who reviewed, and what action was taken.
A privacy-first policy that converts session replay telemetry into risk bands and defines step-up actions, retention, reviewer access, and appeal handling.
Designed to reduce reviewer fatigue by escalating only composite high-risk patterns, not normal copy-paste of small snippets.
policy:
  name: code-session-replay-integrity
  version: "1.3"
  scope:
    assessment_types: ["coding"]
    roles_in_scope: ["software-engineering", "data-engineering", "security-engineering"]
  data_minimization:
    collect_events: ["edit", "paste", "run", "focus", "blur", "submit", "idle"]
    do_not_collect: ["clipboard_contents", "raw_keystrokes", "full_screen_recording"]
    retention_days:
      low_risk_sessions: 30
      escalated_sessions: 180
  scoring:
    signals:
      paste_burst:
        description: "Large paste activity in a short window"
        threshold:
          chars_pasted_in_60s_gte: 800
          paste_events_in_60s_gte: 2
        weight: 4
      edit_to_paste_ratio:
        description: "Low editing relative to pasted content"
        threshold:
          ratio_lte: 0.25
        weight: 3
      late_solution_jump:
        description: "Majority of solution appears after prolonged idle or near submission"
        threshold:
          final_10pct_time_contains_gte_chars: 600
        weight: 3
      low_iteration:
        description: "Few run/test cycles for a non-trivial task"
        threshold:
          run_events_lte: 1
        weight: 2
  actions:
    risk_bands:
      low:
        score_max: 3
        action: "continue"
      medium:
        score_min: 4
        score_max: 7
        action: "step_up_followup_interview"
        followup:
          duration_minutes: 15
          prompt: "Walk through your approach, modify one function, and add one test."
      high:
        score_min: 8
        action: "security_review_and_reverify"
        reverify:
          required: true
          method: "biometric_identity_check"
  governance:
    reviewer_roles_allowed: ["security-analyst", "recruiting-ops-lead"]
    decision_authority: "security"
    appeals:
      enabled: true
      appeal_window_days: 7
      required_artifacts: ["evidence_pack_id", "reviewer_notes"]
    audit_logging:
      log_fields: ["candidate_id", "job_id", "session_id", "risk_score", "band", "action", "reviewer_id", "timestamp"]
tamper_evidence: "hash_chained_batches"Outcome proof: What changes
Outcome proof: What changes
Before
Assessment submissions were treated as static artifacts, and integrity questions surfaced late in the funnel with fragmented evidence across tools.
After
Replay-based integrity scoring created consistent escalations, step-up follow-ups for ambiguous cases, and Evidence Packs linked to ATS decisions.
Implementation checklist
- Define what you will capture (events, diffs, focus) and what you will not (raw biometrics, full screen video, clipboard contents).
- Create integrity thresholds and a step-up path (replay review, re-verify, short live follow-up).
- Instrument reviewer workflow with SLAs, access controls, and an appeal lane.
- Calibrate signals per role and per question type to avoid false positives for senior engineers.
- Generate an Evidence Pack per escalation and log every decision in your ATS timeline.
Questions we hear from teams
- Is session replay the same as keystroke logging?
- No. A defensible approach captures editor events and diffs needed to reconstruct the code timeline, without capturing raw keypress streams or clipboard contents. The goal is integrity evidence, not surveillance.
- Will this penalize candidates who copy-paste boilerplate or use reference materials?
- It should not if you score composite patterns. Small pastes, templates, and normal research are expected. Escalate only when paste bursts and low edit-to-paste ratios combine with implausible progress curves, then confirm via a short authorship follow-up.
- What is the safest action on a high-risk replay score?
- Step-up, do not auto-reject. Common step-ups are identity re-verification before the next interview and a short live follow-up where the candidate modifies their own solution. This reduces false positives and creates clearer decision evidence.
- How do we make this audit-ready?
- Define ownership, log every action, restrict who can view replay, and generate an Evidence Pack that links session telemetry, derived signals, reviewer notes, and the final decision in your ATS. Keep retention time-bound.
Ready to secure your hiring pipeline?
Let IntegrityLens help you verify identity, stop proxy interviews, and standardize screening from first touch to final offer.
Watch IntegrityLens in action
See how IntegrityLens verifies identity, detects proxy interviewing, and standardizes screening with AI interviews and coding assessments.
