Code Replay: Detect Copy-Paste Farms Without Slowing Hires
An operator playbook for adding replay to coding sessions, turning raw editor telemetry into reviewable integrity signals, and routing only the riskiest sessions into a lightweight, SLA-backed review queue.

Replay doesn't slow hiring when it's risk-tiered. It turns "weird vibes" into evidence you can route, review, and defend.
The green checkmark that turned into a churn call
It's end of quarter. Your VP of Sales is pushing for a faster implementation cycle because two enterprise renewals hinge on on-time delivery. Recruiting celebrates a senior platform engineer hire: top score on the coding assessment, fast turnaround, confident answers. Three weeks later, Support is escalating P1s because deployments are breaking. Engineering leadership is quietly pair-programming with the new hire and realizing something uncomfortable: the new hire can't reproduce even the simplest refactors without "magic" appearing in the editor. This is the revenue version of funnel leakage: you paid for a signal (an assessment pass) that didn't correlate with Day 1 work. Replay is how you stop buying counterfeit signals.
Why this matters to RevOps/CRO: speed, cost, and reputation
Your anxieties are rational:
- Speed: Any additional gate can slow time-to-offer and create candidate drop-off.
- Cost: Bad hires don't just cost replacement; they consume scarce reviewer time, onboarding cycles, and incident bandwidth.
- Risk: Fraudulent hiring is now a security vector, not just an HR mishap. In one real-world pipeline, 1 in 6 applicants to remote roles showed signs of fraud (Pindrop).
- Reputation: If a customer learns you staffed a project with someone who misrepresented identity or skills, you get hit twice: delivery failure and trust erosion.
Replay lets you keep the funnel moving while adding defensible evidence when something looks off. You're not trying to turn your hiring process into surveillance; you're trying to keep your signal-to-noise ratio high enough that downstream execution stays predictable.
Treat assessment integrity as a revenue control, like deal desk policy: 90% auto-approve, 10% escalate with evidence.
Optimize for low reviewer fatigue: fewer, higher-signal reviews beat broad manual scrutiny.
Make the why explicit: protect customers from delivery risk and protect candidates from arbitrary rejections.
What to record (privacy-first) to enable replay
A robust replay system does not require full screen recording. You want minimal telemetry that reconstructs the coding workflow while staying privacy-first:
- Editor events: insert, delete, replace, selection changes (with timestamps)
- Paste events: length, source type (keyboard/menu), and the pasted text or a hashed representation (depending on policy)
- Execution events: run/test invoked, compile errors, unit test failures
- File events: created/renamed, language mode changes
- Environment events: disconnects, tab switches (if available), focus loss
Pragmatic privacy guidance:
- Capture just enough to produce an Evidence Pack and support appeals.
- Use Zero-Retention Biometrics principles for identity steps (don't mix biometrics with code telemetry; keep the concerns separated).
- Encrypt at rest (AES-256 baseline) and apply short retention to raw telemetry; keep only derived summaries longer if Legal prefers.
Common traps to avoid (a telemetry schema sketch follows this list):
Recording webcam video during coding just in case (high friction, high privacy risk).
Keeping raw pasted content indefinitely when a derived paste fingerprint would satisfy review needs.
Letting every hiring manager become a replay detective (reviewer fatigue + inconsistent decisions).
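A minimal sketch, in TypeScript, of what the captured telemetry could look like. The type and field names are illustrative assumptions rather than a prescribed schema; the point is that timestamps, sizes, and optional fingerprints are enough to reconstruct a workflow without storing raw content.

// Minimal, privacy-first replay telemetry (illustrative sketch; names are assumptions).
type ReplayEvent = EditEvent | PasteEvent | RunEvent | FileEvent | EnvironmentEvent;

interface BaseEvent {
  sessionId: string;
  timestampMs: number;                       // milliseconds since session start
}

interface EditEvent extends BaseEvent {
  kind: "insert" | "delete" | "replace" | "selection";
  charCount: number;                         // size of the change (0 for pure selection moves)
  file: string;
}

interface PasteEvent extends BaseEvent {
  kind: "paste";
  charCount: number;
  source: "keyboard" | "menu" | "unknown";
  contentSha256?: string;                    // fingerprint instead of raw text, where policy allows
}

interface RunEvent extends BaseEvent {
  kind: "run" | "test";
  outcome: "pass" | "fail" | "compile-error";
}

interface FileEvent extends BaseEvent {
  kind: "file-created" | "file-renamed" | "language-changed";
  file: string;
}

interface EnvironmentEvent extends BaseEvent {
  kind: "disconnect" | "focus-loss" | "tab-switch";
}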
Turn replay into integrity signals (not vibes)
Replay data becomes useful when you translate it into a small set of measurable signals that correlate with farm patterns. Important: allow open-book resourcefulness. A candidate pasting a small helper snippet or boilerplate isn't fraud; the policy line should be about authorship transfer, not reference use. Start with signals that are hard to fake and easy to explain in an audit (a scoring sketch follows below):
Paste-to-type ratio (session + early window): High ratio in the first N minutes is more suspicious than later.
Paste burst count: Multiple large pastes (>X chars) within a short window often indicates solution injection.
Time-to-first-keystroke: Long idle before any typing can indicate off-screen work or a handoff.
Zero-iteration leap: Solution appears nearly complete without intermediate errors, edits, or tests.
Edit locality: Real work tends to touch multiple regions iteratively; injected solutions often land in one block.
Run/test cadence: Healthy problem-solving shows incremental runs/tests; farms often paste, run once, pass.
Use language/task-specific thresholds: a React task has more boilerplate paste than a pure algorithm task.
Weight early-session behavior more than late-session behavior.
Combine signals; don't reject on one metric (e.g., paste ratio alone).
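As a sketch of how raw events become these signals: the function below derives a paste-to-type ratio, early paste bursts, time-to-first-keystroke, and run cadence from an ordered event stream. The 400-character / first-5-minutes burst definition mirrors the example policy later in this post; the event shape and everything else is an illustrative assumption.

// Derive replay integrity signals from an event stream (illustrative sketch).
interface SessionEvent {
  timestampMs: number;                 // milliseconds since session start
  kind: "edit" | "paste" | "run" | "test";
  charCount?: number;                  // for edit and paste events
}

interface SignalValues {
  pasteRatio: number;                  // pasted_chars / (typed_chars + 1)
  earlyPasteBursts: number;            // pastes >= 400 chars in the first 5 minutes
  timeToFirstKeystrokeSec: number;     // seconds before the first edit event
  runCount: number;                    // run/test invocations (cadence proxy)
}

function computeSignals(events: SessionEvent[]): SignalValues {
  const sorted = [...events].sort((a, b) => a.timestampMs - b.timestampMs);
  let typedChars = 0;
  let pastedChars = 0;
  let earlyPasteBursts = 0;
  let runCount = 0;
  let firstEditMs: number | null = null;

  for (const e of sorted) {
    if (e.kind === "edit") {
      typedChars += e.charCount ?? 0;
      if (firstEditMs === null) firstEditMs = e.timestampMs;
    } else if (e.kind === "paste") {
      pastedChars += e.charCount ?? 0;
      if (e.timestampMs < 5 * 60 * 1000 && (e.charCount ?? 0) >= 400) {
        earlyPasteBursts += 1;         // large paste landing early in the session
      }
    } else {
      runCount += 1;                   // run or test invocation
    }
  }

  // If no typing ever happened, treat time-to-first-keystroke as the full session span.
  const lastMs = sorted.length > 0 ? sorted[sorted.length - 1].timestampMs : 0;
  return {
    pasteRatio: pastedChars / (typedChars + 1),
    earlyPasteBursts,
    timeToFirstKeystrokeSec: (firstEditMs ?? lastMs) / 1000,
    runCount,
  };
}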
Risk-tiered replay review policy (ready to operationalize)
Use a policy file so reviewers aren't making ad hoc decisions and candidates get consistent treatment. The example policy file later in this post routes only high-signal sessions to review and defines an appeal-safe Evidence Pack.
Build a replay review queue that doesn't slow hiring
CRO reality: the moment you add a manual step, your funnel can stall. The fix is to treat replay review like a small, SLA-backed exception queue.
Operational guardrails:
- Keep the reviewer pool small and trained to reduce variance.
- Cap daily reviews per reviewer to avoid fatigue-induced errors.
- Track false positives (appeals granted) as a first-class metric.
Step-by-step (a routing sketch follows these steps):
Auto-score every session (signals above) and assign a risk tier (Low/Medium/High).
Low risk: auto-progress in ATS. No human eyes, no delay.
Medium risk: request a lightweight step-up (e.g., short follow-up questions, or a 10-minute live debug).
High risk: route to a trained reviewer pool (Recruiting Ops + a calibrated engineer) with a strict SLA (e.g., same business day).
Decision outputs: Pass, Step-up, or Reject-with-evidence. Every non-pass must attach an Evidence Pack.
Appeal path: candidates can request reconsideration; reviewers re-check the pack, not gut feelings.
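To show how scoring and routing fit together, here is a minimal sketch. The tier boundaries (low at 4 or less, medium 5 to 8, high at 9 or more) follow the example policy later in this post; the point mapping, where a medium threshold crossing contributes a signal's weight once and a high crossing contributes it twice, is an assumption of this sketch, since the policy file does not specify how crossings aggregate.

// Map weighted signal crossings to a risk tier and routing action (illustrative sketch).
interface SignalDefinition {
  weight: number;
  medium: number;                      // value at or above which the signal reads "medium"
  high: number;                        // value at or above which the signal reads "high"
}

type Tier = "low" | "medium" | "high";

function scoreSignals(
  values: Record<string, number>,
  definitions: Record<string, SignalDefinition>
): number {
  let score = 0;
  for (const [name, def] of Object.entries(definitions)) {
    const value = values[name];
    if (value === undefined) continue;
    if (value >= def.high) score += def.weight * 2;       // assumption: high crossing = 2x weight
    else if (value >= def.medium) score += def.weight;    // assumption: medium crossing = 1x weight
  }
  return score;
}

function routeByScore(score: number): { tier: Tier; action: string } {
  if (score <= 4) return { tier: "low", action: "auto-progress" };
  if (score <= 8) return { tier: "medium", action: "step-up" };      // e.g. 10-minute live debug
  return { tier: "high", action: "manual-review" };                  // integrity-replay queue, same-day SLA
}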
What goes in an Evidence Pack (a serialization sketch follows this list):
Session timeline (events per minute)
Paste segments (or fingerprints) with timestamps
Diff playback markers (key frames)
Run/test events and outputs summary
Risk score breakdown by signal (explainability)
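A minimal serialization sketch for an Evidence Pack covering the contents above; the field names are illustrative assumptions and should match whatever format your reviewers, Recruiting Ops, and Legal agree on.

// Illustrative Evidence Pack shape for escalated sessions (field names are assumptions).
interface EvidencePack {
  sessionId: string;
  candidateRef: string;                                           // ATS identifier, no extra PII
  generatedAt: string;                                            // ISO 8601 timestamp
  timeline: { minute: number; eventCount: number }[];             // session timeline, events per minute
  pasteIndex: { timestampMs: number; charCount: number; contentSha256?: string }[];
  diffKeyframes: { timestampMs: number; file: string; summary: string }[];
  runTestLog: { timestampMs: number; kind: "run" | "test"; outcome: string }[];
  scoreBreakdown: { signal: string; value: number; points: number }[];  // explainability
  totalScore: number;
  tier: "medium" | "high";
}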
Example: What a copy-paste farm looks like in replay (vs. a strong candidate)
Two sessions can both end with a working solution. Replay shows whether they arrived there like an engineer or like a courier.
Pattern A: likely farm injection
- 6–10 minutes idle at start (no typing, no runs)
- One large paste (hundreds or thousands of characters) into the main file
- Minimal edits (rename a variable, adjust an import)
- Single run; tests pass; done
Pattern B: strong, open-book candidate
- Starts with scaffold and comments (small typing bursts)
- Pastes small snippets (e.g., a regex or utility) interleaved with edits
- Runs tests multiple times; fixes errors iteratively
- Refactors naming and structure near the end
Your reviewers should be trained to look for work signatures: incremental construction, error correction, and local reasoning. That's what correlates with Day 1 performance.
If a session is suspicious but not conclusive, prefer a step-up (short live debug) over a hard reject. Zero-tolerance policies can create false rejections of great talent.
Where IntegrityLens fits
IntegrityLens AI is built for exactly this problem: turning fragmented hiring steps into one defensible pipeline: Source candidates → Verify identity → Run interviews → Assess → Offer. Here's how it connects to replay-based integrity signals:
- ATS workflow: Manage candidates, stages, reviewer assignments, and audit trails in one place, with no spreadsheet side-queues.
- Identity verification (under 3 minutes; typically 2–3 minutes end-to-end for document + voice + face): Verify who is showing up before the interview starts, so replay findings aren't blamed on identity uncertainty.
- Fraud detection: Convert replay telemetry into integrity signals and route by Risk-Tiered Verification (auto-pass vs. step-up vs. review).
- AI screening interviews (24/7): When replay flags risk, you can trigger an on-demand, structured follow-up interview without waiting for scheduling.
- Technical assessments (40+ languages): Capture coding workflow signals across common stacks, not just one niche environment.
- Evidence Packs + governance: Package replay evidence in a consistent format for Recruiting Ops, TA leaders, and CISOs to review and sign off.
Who uses it: TA leaders for funnel health, Recruiting Ops for workflow control and consistency, and CISOs for fraud-as-a-security-vector governance.
Less tool sprawl means fewer gaps where fraud hides and fewer handoffs that slow hiring.
A single audit trail reduces "he said / she said" when a hiring decision is challenged.
Idempotent Webhooks make it easier to push risk tiers and decisions into downstream RevOps reporting cleanly.
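If you do push risk tiers downstream, an idempotency key keeps webhook retries from creating duplicate state transitions. A minimal sketch, where the endpoint URL, header name, and payload shape are assumptions and the key mirrors the candidateId:assessmentId:riskTier pattern from the example policy below:

// Push a risk-tier update with an idempotency key so retries are safe (illustrative sketch).
async function pushRiskTier(
  endpoint: string,                    // e.g. your ATS or RevOps reporting webhook endpoint
  candidateId: string,
  assessmentId: string,
  riskTier: "low" | "medium" | "high"
): Promise<void> {
  const idempotencyKey = `${candidateId}:${assessmentId}:${riskTier}`;

  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Idempotency-Key": idempotencyKey,     // receiver deduplicates on this key
    },
    body: JSON.stringify({
      event: "assessment.risk_tier.updated",
      candidateId,
      assessmentId,
      riskTier,
    }),
  });

  if (!response.ok) {
    // Safe to retry: the same key means the receiver applies the transition at most once.
    throw new Error(`Webhook delivery failed with status ${response.status}`);
  }
}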
Risk notes: privacy, fairness, and keeping Legal onside
Replay can be privacy-sensitive if implemented carelessly. The best operator posture is: record the minimum required, be transparent, and design an appeal flow. Key controls to bake in:
- Notice + consent: Tell candidates you capture editor interaction telemetry to ensure assessment integrity, and describe retention.
- Retention limits: Keep raw replay events short-lived; retain only the Evidence Packs needed for disputes.
- Access controls: Restrict replay access to trained reviewers; log access for audits.
- Fairness checks: Monitor whether thresholds disproportionately flag certain groups or geographies (often driven by latency or disconnect patterns).
- Explainability: Every escalation should cite which signals fired. "The replay looked weird" does not survive scrutiny.
This is reputational insurance: you're reducing the chance of a public failure rooted in a bad integrity signal.
It also protects honest candidates by avoiding blanket suspicion and focusing review on evidence.
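One lightweight way to operationalize the fairness and false-positive checks above is to track flag rates by segment and the appeal-grant rate over time. A sketch, with the segment labels and record shape as assumptions:

// Monitor flag rates by segment and appeals granted as a false-positive proxy (sketch).
interface ReviewRecord {
  segment: string;                     // e.g. geography or locale; choose segments with Legal
  flagged: boolean;                    // routed to step-up or manual review
  appealGranted?: boolean;             // only meaningful for flagged sessions
}

function flagRateBySegment(records: ReviewRecord[]): Map<string, number> {
  const totals = new Map<string, { flagged: number; total: number }>();
  for (const r of records) {
    const t = totals.get(r.segment) ?? { flagged: 0, total: 0 };
    t.total += 1;
    if (r.flagged) t.flagged += 1;
    totals.set(r.segment, t);
  }
  const rates = new Map<string, number>();
  for (const [segment, t] of totals) rates.set(segment, t.flagged / t.total);
  return rates;
}

function appealGrantRate(records: ReviewRecord[]): number {
  const flagged = records.filter((r) => r.flagged);
  if (flagged.length === 0) return 0;
  return flagged.filter((r) => r.appealGranted === true).length / flagged.length;
}

A sustained gap in flag rates between segments, or a rising appeal-grant rate, is the cue to revisit thresholds before it becomes a candidate-experience or compliance problem.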
The replay playbook in 5 moves
Instrument the editor for minimal event telemetry (not surveillance).
Convert telemetry into a small set of explainable integrity signals.
Apply Risk-Tiered Verification so most candidates move fast.
Build a small replay review queue with SLAs, training, and an appeal path.
Store Evidence Packs with retention and access controls so decisions are defensible.
Questions CROs should ask in the next hiring ops review
- Where are we currently blind: final output only, or do we see workflow evidence?
- What's our reviewer fatigue rate (how many assessments get manual review, and why)?
- Do we have a documented open-book policy that distinguishes resourcefulness from authorship transfer?
- Can we explain every reject decision with an Evidence Pack that Legal would defend?
- If fraud increases next quarter, can we tighten thresholds without blowing up time-to-offer?
Sources
- Pindrop: Why your hiring process is now a cybersecurity vulnerability (stat: 1 in 6 applicants to remote roles showed signs of fraud): https://www.pindrop.com/article/why-your-hiring-process-now-cybersecurity-vulnerability/
Key takeaways
- Replay is an evidence mechanism, not a "gotcha": you're looking for workflow anomalies (bursts, zero-iteration leaps, paste storms), not punishing normal reference use.
- The best operator pattern is Risk-Tiered Verification: let low-risk sessions pass automatically and route only high-signal sessions to a short review queue.
- Replay plus lightweight policy thresholds reduces reviewer fatigue by focusing humans on the sessions worth watching instead of reading tea leaves in final code.
- Your CRO lens: the goal is protecting funnel integrity and downstream delivery capacity while keeping time-to-offer fast and candidate-friendly.
- Evidence Packs (timestamps, diffs, paste segments, run/test events) give Recruiting Ops and Security something defensible to stand on if decisions are challenged.
A concrete policy for scoring replay integrity signals and routing sessions into auto-pass, step-up, or manual review.
Designed to reduce reviewer fatigue, preserve candidate experience, and generate an Evidence Pack when decisions are contested.
version: "1.0"
policy:
name: "coding-replay-risk-tiered-verification"
owner: "recruiting-ops"
scope:
assessmentType: "coding"
captureMode: "editor-event-replay" # not screen recording
signals:
# Values are illustrative starting points; tune per role/task to reduce false positives.
paste_ratio:
description: "pasted_chars / (typed_chars + 1)"
weight: 3
thresholds:
medium: 1.5
high: 3.0
early_paste_burst:
description: "count of paste events >= 400 chars in first 5 minutes"
weight: 4
thresholds:
medium: 1
high: 2
time_to_first_keystroke_seconds:
description: "seconds from session start to first edit event"
weight: 2
thresholds:
medium: 120
high: 300
zero_iteration_leap:
description: "large solution appears with <2 run/test events and <5 edit events"
weight: 5
thresholds:
medium: 1
high: 1
disconnect_anomaly:
description: "disconnects or focus-loss events > 3"
weight: 1
thresholds:
medium: 3
high: 6
routing:
tiers:
low:
scoreMax: 4
action: "auto-progress"
atsStage: "assessment-passed"
medium:
scoreMin: 5
scoreMax: 8
action: "step-up"
stepUp:
type: "live-debug"
durationMinutes: 10
instructions: "Ask candidate to explain and extend their own solution (add edge-case test + fix)."
atsStage: "step-up-required"
high:
scoreMin: 9
action: "manual-review"
reviewQueue: "integrity-replay"
slaHours: 8
atsStage: "integrity-review"
evidencePack:
generateOn:
- "step-up"
- "manual-review"
contents:
- "session_timeline_summary"
- "paste_event_index" # timestamps, sizes; optionally store hashed paste fingerprints
- "diff_keyframes" # key points in replay, not every keystroke
- "run_test_event_log"
- "signal_score_breakdown"
retentionDays:
rawEvents: 14
evidencePack: 90
accessControls:
allowedRoles:
- "recruiting-ops"
- "security-reviewer"
- "hiring-manager-delegate"
requireReason: true
logAccess: true
appeals:
enabled: true
windowDays: 7
workflow:
- "candidate-requests-appeal"
- "second-reviewer-checks-evidence-pack"
- "if-ambiguous -> schedule-step-up"
integrations:
webhooks:
# Idempotent Webhooks: safe retries without duplicate state transitions.
- name: "ats-risk-tier-update"
idempotencyKey: "candidateId:assessmentId:riskTier"
event: "assessment.risk_tier.updated"
target: "https://your-ats-webhook-endpoint.example.com/integritylens"
Outcome proof: What changes
Before
Assessments were graded on final code output with ad hoc reviewer spot-checks. When suspicious results appeared, the team lacked defensible evidence and either let risk through or over-corrected with blanket manual reviews that slowed time-to-offer.
After
Replay-based integrity signals were added to coding sessions and routed via Risk-Tiered Verification. Only high-signal sessions entered a small review queue with SLAs, and every escalation generated an Evidence Pack suitable for audit and appeals.
Implementation checklist
- Define what counts as allowed open book behavior vs. disallowed transfer of authorship (copy-paste farm, proxy, paid solver).
- Instrument the editor to capture minimal, privacy-first telemetry: events, not full screen recording.
- Set thresholds for paste bursts, time-to-first-keystroke, and zero-iteration solutions; tune for false positives.
- Route high-risk sessions to a small reviewer pool with a strict SLA and an appeal path.
- Store an Evidence Pack with retention controls and access logging for audit readiness.
Questions we hear from teams
- Will replay slow down our hiring funnel?
- Not if you route by risk tier. Most sessions should auto-progress. Replay review becomes an exception queue with an SLA, not a universal manual step.
- Does replay punish candidates who use Google or documentation?
- It shouldn't. Your policy should explicitly allow open-book behavior and focus on authorship-transfer patterns (large paste injection, zero-iteration leaps), not normal reference use.
- Do we need to store the full pasted content to make replay useful?
- Often no. Many teams store paste timestamps and sizes, plus fingerprints/hashes for comparison, and only retain raw text briefly (or not at all) depending on Legal guidance.
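For teams that take the fingerprint route, a minimal sketch of deriving a paste hash without retaining raw text, using Node's built-in crypto module; the whitespace normalization here is a policy choice, not a requirement:

import { createHash } from "node:crypto";

// Derive a paste fingerprint so reviewers can compare pastes across sessions
// without storing the raw pasted content (illustrative sketch).
function pasteFingerprint(pastedText: string): string {
  const normalized = pastedText.replace(/\s+/g, " ").trim();   // tolerate whitespace-only differences
  return createHash("sha256").update(normalized).digest("hex");
}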
- What if a great candidate gets flagged because they're just fast?
- That's why medium/high tiers should trigger a step-up (short live debug) or a second review, not an automatic reject. Track appeals granted as your false-positive indicator.
- How does this connect to identity verification?
- Replay addresses who authored the work behaviorally; identity verification addresses who is present. Together they reduce proxy and farm risk without blanket suspicion.
Ready to secure your hiring pipeline?
Let IntegrityLens help you verify identity, stop proxy interviews, and standardize screening from first touch to final offer.
Watch IntegrityLens in action
See how IntegrityLens verifies identity, detects proxy interviewing, and standardizes screening with AI interviews and coding assessments.
