NYC LL 144: Bias Audit Runbook for Scoring Tools
Turn algorithmic bias audits from a PDF exercise into an instrumented workflow with owners, SLAs, and ATS-anchored evidence packs.

A decision without evidence is not audit-ready. If it is not logged, it is not defensible.
1. THE REAL HIRING PROBLEM
It is 9:10 a.m. and Legal pings you: "We received a request to substantiate how this candidate was scored and ranked for an NYC role. Provide the bias audit artifacts, the scoring logic in effect on that date, and who approved the decision." Your team can produce a vendor PDF from last quarter, but you cannot reconstruct this specific candidate decision. The score came from an auto-screen, the ranking happened in a spreadsheet, and the hiring manager overrode it in chat. No immutable event log. No single source of truth. No timestamps tying the decision to the exact model version and rubric.
Operationally, the risk clusters in three places:
- Audit liability: you cannot prove what happened, who approved it, or which version of the tool produced the score. If it is not logged, it is not defensible.
- SLA breach: the req stalls while teams hunt for artifacts across inboxes, dashboards, and exported CSVs. Time-to-offer delays accumulate exactly where identity and scoring are unverified.
- Mis-hire cost exposure: when you cannot trust funnel integrity, you either over-reject qualified candidates or progress risky ones. SHRM notes replacement costs can range from 50% to 200% of annual salary depending on the role, which turns scoring errors into budget events, not HR events.
NYC LL 144 is not just a compliance checkbox. It is a requirement to run hiring decisions like controlled access decisions: consistent inputs, versioned logic, and evidence packs that can be retrieved on demand.
2. WHY LEGACY TOOLS FAIL
The market did not fail because people ignored bias. It failed because the tooling stack makes auditability optional. Most ATS and point-solution screening vendors were built for throughput, not defensibility. They assume sequential checks in a waterfall: screen first, then verify, then assess, then reconcile notes. That sequence creates two predictable gaps:
- Evidence fragmentation: scores live in one tool, interview notes in another, ranking changes in spreadsheets, and approvals in chat. Shadow workflows are integrity liabilities.
- Unversioned decisioning: models and rubrics change, but candidate-level records do not retain the tool version, prompt version, rubric version, and override rationale in a tamper-resistant format.
Even when an annual "bias audit" report exists, it is rarely bound to the operational reality of day-to-day decisions. No event logs. No unified evidence packs. No review-bound SLAs. No standardized rubric storage per req. When Legal asks, you are left with a PDF and a guess.
3. OWNERSHIP & ACCOUNTABILITY MATRIX
Before you touch process, assign ownership and sources of truth. NYC LL 144 readiness is a coordination problem.
- Recruiting Ops owns: workflow design, req configuration, candidate stage gates, and SLA enforcement. They own the funnel mechanics.
- Security owns: access control, identity gating policy, audit retention rules, and evidence pack integrity requirements. They own defensibility controls.
- The Hiring Manager owns: rubric discipline, structured evaluation completion, and documented rationale for overrides. They own decision quality.
- Analytics (or People Analytics) owns: segmented risk dashboards, time-to-event reporting, and monitoring drift in scoring outcomes across reqs.
The split between automation and manual review should be explicit:
- Automated: identity gate checks, scoring execution, version capture, evidence pack assembly, timestamping, and write-back to the ATS.
- Manual (review-bound): exception handling, override approval, adverse signal review, and periodic audit sign-off.
Sources of truth must be singular, not implied:
- The ATS is the source of truth for lifecycle stage and decision outcome.
- The scoring and ranking system is the source of truth for inputs, outputs, and versions, but must write immutable evidence back into the ATS record.
- The verification layer is the source of truth for identity status at the moment scoring was accessed or executed.
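These assignments can be captured as configuration rather than a slide, so they are reviewable alongside the rest of the controls. The sketch below is illustrative only; the keys, team labels, and groupings are placeholders, not a required schema.
# Illustrative ownership and source-of-truth map (keys and values are placeholders)
accountability:
  owners:
    recruiting_ops: [workflow_design, req_configuration, stage_gates, sla_enforcement]
    security: [access_control, identity_gating_policy, audit_retention, evidence_integrity]
    hiring_manager: [rubric_discipline, structured_evaluations, override_rationale]
    analytics: [risk_dashboards, time_to_event_reporting, scoring_drift_monitoring]
  sources_of_truth:
    lifecycle_stage_and_outcome: ats
    scoring_inputs_outputs_versions: scoring_system   # must write evidence back to the ATS record
    identity_status_at_scoring: verification_layer
  review_mode:
    automated: [identity_gate_checks, scoring_execution, version_capture, evidence_pack_assembly, timestamping, ats_writeback]
    manual_review_bound: [exception_handling, override_approval, adverse_signal_review, audit_sign_off]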

4. MODERN OPERATING MODEL
Run NYC LL 144 bias audits as an instrumented workflow, not a one-time document. The goal is repeatability: every candidate decision can be reconstructed from logs.
Operating model checkpoints:
- Identity verification before access: if a tool score influences ranking, the candidate's identity status must be known at the time of scoring. Use step-up verification for higher-risk roles or suspicious signals.
- Event-based triggers: every scoring action emits an event with a timestamp, actor, tool version, and req context. The event log is your audit spine.
- Automated evidence capture: store inputs (what was scored), outputs (the score and explanation fields, if applicable), and metadata (model version, rubric version, prompt version, dataset window).
- Standardized rubrics: require structured fields for hiring managers, not freeform notes. Rubric completion is a gate, not a suggestion.
- Segmented risk dashboards: monitor time-to-event and integrity signals together. Bias audit readiness improves when delays, overrides, and exception rates are visible per req and per team.
Compliance outcome: when an auditor asks "show me how this candidate was ranked," you can produce a single evidence pack with timestamps and accountable approvers.
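To make the "audit spine" concrete, here is what one emitted event might look like. This is a minimal sketch: the event name and fields mirror the policy artifact later in this post, but the exact schema and every value shown are assumptions to adapt to your own logging pipeline.
# Hypothetical example of a single event on the audit spine (all values are illustrative)
event:
  name: "aedt.score_generated"
  timestamp: "2025-03-14T09:10:42Z"
  req_id: "REQ-2025-0113-NYC"
  candidate_id: "CAND-88217"
  actor_id: "svc-scoring-pipeline"        # system or human actor that produced the score
  identity_status: "verified"             # identity status at the moment of scoring
  model_version_id: "screen-model-v4.2.1"
  rubric_version_id: "swe-backend-rubric-v7"
  prompt_version_id: "screen-prompt-v12"
  input_hash: "sha256:9f2c..."            # hash of the scored inputs, not the raw PII
  output_score: 82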
5. WHERE INTEGRITYLENS FITS
IntegrityLens AI functions as the ATS-anchored control plane that makes LL 144 auditability operational instead of aspirational.
- Identity gate before scoring access, using biometric identity verification (liveness, face match, document authentication) so the score is tied to a verified individual, not an email address.
- Immutable evidence packs that bundle identity status, assessment artifacts, scoring outputs, reviewer notes, overrides, and timestamps into a single retrieval unit.
- Tamper-resistant logs that show who changed a score, when, and why, supporting reviewer accountability and audit defensibility.
- Risk-tiered funnel controls, including step-up verification and review-bound SLAs for exceptions and manual overrides.
- Zero-retention biometrics architecture patterns that support Privacy by Design while still producing compliance-ready audit trails.

6. ANTI-PATTERNS THAT MAKE FRAUD WORSE
- Running bias audits on vendor summaries only, while candidate-level decisions happen in spreadsheets or chat. You end up "auditing the tool" but not "auditing the decision."
- Allowing scoring before identity is verified, then treating verification as a downstream formality. This creates a proxy interview window in which unverified actors can generate "legitimate" scores.
- Permitting manual overrides without an SLA-bound review queue and documented rationale. Manual review without evidence creates audit liabilities and invites inconsistent outcomes.
7. IMPLEMENTATION RUNBOOK
Step 1: Inventory and classify AEDTs (SLA: 5 business days, refreshed quarterly)
- Owner: Head of Compliance (with Recruiting Ops)
- Action: list every tool or workflow that scores, recommends, or ranks candidates for NYC roles. Include rubrics, auto-screen interviews, coding scores, and any weighting logic.
- Evidence: system inventory record with owner, change-control contact, and the decision it influences.
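One way to keep the inventory audit-ready is a structured record per tool rather than a spreadsheet tab. The entry below is a sketch under assumed field names; the vendor, contact, and date values are placeholders.
# Illustrative AEDT inventory entry (schema and values are placeholders)
aedt_inventory:
  - tool: "auto_screen_interview"
    vendor: "example-vendor"
    decision_influenced: "advance_or_reject_at_screen"
    applies_to_locations: ["NYC"]
    owner: "Head of Compliance"
    change_control_contact: "recruiting-ops@company.example"
    scoring_artifacts: [rubric, weighting_config, model_version]
    last_reviewed: "2025-01-15"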
Step 2: Freeze versions per req (SLA: within 24 hours of req open)
- Owner: Recruiting Ops
- Action: bind the req to rubric version, model version, and weighting configuration. No "silent" changes mid-funnel.
- Evidence: version IDs written into the ATS req configuration and the immutable event log.
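In practice, the freeze can be a small record written to the ATS req at open time. The shape below is an assumption, aligned with the required_fields in the policy artifact later in this post; the IDs are illustrative.
# Hypothetical version-freeze record bound to a req at open (values are illustrative)
req_version_freeze:
  req_id: "REQ-2025-0113-NYC"
  frozen_at: "2025-03-01T14:02:00Z"
  model_version_id: "screen-model-v4.2.1"
  rubric_version_id: "swe-backend-rubric-v7"
  weighting_config_id: "weights-2025Q1-a"
  prompt_version_id: "screen-prompt-v12"
  change_policy: "new_req_or_formal_exception"   # no silent mid-funnel changes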
Step 3: Identity gate before scoring execution (SLA: under 3 minutes per candidate where verification is required)
- Owner: Security (policy) + Recruiting Ops (workflow enforcement)
- Action: require identity verification completion before any score that affects ranking is generated or consumed. Use step-up verification for high-risk roles or anomaly signals.
- Evidence: verification timestamp, method, and status attached to the candidate record and evidence pack.
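The evidence for this gate is a small verification summary attached to the candidate record before any score is generated. A minimal sketch, assuming the verification layer exposes method, status, and timestamp; the field names are placeholders.
# Illustrative identity-gate evidence attached to the candidate record
identity_gate:
  candidate_id: "CAND-88217"
  req_id: "REQ-2025-0113-NYC"
  verified_at: "2025-03-14T09:07:58Z"
  method: ["document_authentication", "liveness", "face_match"]
  status: "verified"
  step_up_applied: false                  # true when role risk tier or anomaly signals require it
  max_age_minutes: 1440                   # how long this verification remains valid for scoring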
Step 4: Score generation with full metadata capture (SLA: immediate upon completion)
- Owner: Recruiting Ops (tool configuration) + Analytics (schema)
- Action: when a score is generated, log inputs, outputs, tool version, rubric version, and actor. If an explanation field exists, store it as evidence, not as a UI-only artifact.
- Evidence: immutable event log entry and stored scoring payload linked to candidate and req.
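Beyond the event itself, the stored scoring payload should be retrievable later as evidence. The structure below is a sketch, not a prescribed schema; the explanation field is only present when the tool produces one, and every identifier shown is hypothetical.
# Hypothetical stored scoring payload linked to an aedt.score_generated event
scoring_payload:
  event_ref: "evt-20250314-091042-88217"   # links back to the immutable event log entry
  inputs:
    input_hash: "sha256:9f2c..."           # hash of submitted answers / assessment artifacts
    dataset_window: "2024-10-01..2025-01-01"
  outputs:
    score: 82
    explanation: "Met 4 of 5 rubric dimensions; weak on system design depth."
  versions:
    model_version_id: "screen-model-v4.2.1"
    rubric_version_id: "swe-backend-rubric-v7"
  actor_id: "svc-scoring-pipeline"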
Step 5: Override control with a review-bound SLA (SLA: 24 hours for override approval or rejection)
- Owner: Hiring Manager initiates, Compliance sets policy, Recruiting Ops enforces the queue
- Action: require an override reason code and structured justification. Route to a review queue with an explicit SLA. Expire access by default, not by exception, for anyone trying to override outside policy.
- Evidence: override request event, approver identity, timestamp, and final disposition.
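An override then becomes logged events plus a queue entry with an explicit deadline. The record below is illustrative; the reason codes, IDs, and SLA values are assumptions to adapt to your own policy.
# Illustrative override queue entry with a review-bound SLA
override_request:
  candidate_id: "CAND-88217"
  req_id: "REQ-2025-0113-NYC"
  requester_id: "hm-4471"
  requested_at: "2025-03-14T16:20:00Z"
  review_due_by: "2025-03-15T16:20:00Z"    # 24-hour SLA
  override_reason_code: "rubric_dimension_disputed"
  override_justification: "Candidate demonstrated equivalent depth in follow-up exercise."
  decision:
    approver_id: "comp-0032"
    decided_at: "2025-03-15T09:05:00Z"
    disposition: "approved"
    rationale: "Justification consistent with rubric exception policy."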
Step 6: Evidence pack assembly (SLA: within 5 minutes of decision event)
- Owner: Compliance (requirements) + Recruiting Ops (automation)
- Action: generate a per-candidate evidence pack at each decision boundary: reject, advance, offer. Include identity status, scores, versions, rubrics, overrides, and who approved.
- Evidence: evidence pack ID stored in the ATS with a retrieval path.
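The evidence pack itself can be a manifest that points to the underlying artifacts rather than duplicating them. This is a sketch of one possible shape; the pack ID format and retrieval path are placeholders, not product behavior.
# Hypothetical evidence pack manifest stored against the ATS record
evidence_pack:
  pack_id: "EP-REQ-2025-0113-NYC-CAND-88217-advance"
  assembled_at: "2025-03-15T09:06:10Z"
  decision_boundary: "advance"
  ats_retrieval_path: "ats://candidates/CAND-88217/evidence/EP-REQ-2025-0113-NYC-CAND-88217-advance"   # placeholder path
  contents:
    identity_verification_summary: "identity_gate record"
    scoring_payload_ref: "evt-20250314-091042-88217"
    tool_versions: ["screen-model-v4.2.1", "swe-backend-rubric-v7", "weights-2025Q1-a"]
    reviewer_notes_ref: "note-55912"
    override_chain: ["override_request approved by comp-0032"]
  approver_id: "comp-0032"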
Step 7: Bias audit execution and sign-off (SLA: per LL 144 cadence, plus on material change)
- Owner: Head of Compliance (sign-off) + Analytics (analysis) + Security (controls attestation)
- Action: run the required bias audit, but also validate operational controls: version freezes, override rates, missing logs, and SLA compliance.
- Evidence: audit report plus operational attestation that logs, versions, and evidence packs are complete for the audit window.
One practical policy-as-code artifact you can adopt immediately is a minimal event logging and versioning policy for AEDT decisions, shown below.
Key takeaways
- Treat scoring and ranking like privileged access decisions: identity gate before access, evidence-based scoring, and tamper-resistant logs.
- A decision without evidence is not audit-ready. NYC LL 144 readiness depends on being able to reconstruct the full scoring path per candidate.
- Bias audits fail operationally when models change without versioning, rubrics drift across teams, and evidence lives in screenshots and inboxes.
- Run bias audits as an instrumented workflow: event logs, fixed SLAs, standardized rubrics, and reviewer accountability.
The policy below defines the minimum events and metadata to log for any scoring or ranking action that influences candidate disposition for NYC roles. It is designed to make audits reconstructable per candidate: who, what, when, which version, and what changed.
policy:
  name: nyc-ll144-aedt-audit-controls
  scope:
    locations: ["NYC"]
    applies_to:
      - candidate_scoring
      - candidate_ranking
      - score_override
  gates:
    identity_gate_before_scoring:
      required: true
      max_age_minutes: 1440
      step_up_required_when:
        - signal: "proxy_interview_suspected"
        - signal: "deepfake_signal"
    version_freeze:
      bind_versions_at: "req_open"
      required_fields:
        - model_version_id
        - rubric_version_id
        - weighting_config_id
        - prompt_version_id
      change_control:
        require_new_req_or_formal_exception: true
        exception_sla_hours: 24
  event_logging:
    immutable_log_required: true
    events:
      - name: "aedt.score_generated"
        required_fields:
          - timestamp
          - candidate_id
          - req_id
          - actor_id
          - identity_status
          - model_version_id
          - rubric_version_id
          - input_hash
          - output_score
      - name: "aedt.rank_updated"
        required_fields:
          - timestamp
          - candidate_id
          - req_id
          - actor_id
          - ranking_position
          - ranking_reason_code
          - model_version_id
          - rubric_version_id
      - name: "aedt.override_requested"
        required_fields:
          - timestamp
          - candidate_id
          - req_id
          - requester_id
          - override_reason_code
          - override_justification
      - name: "aedt.override_decided"
        required_fields:
          - timestamp
          - candidate_id
          - req_id
          - approver_id
          - decision
          - decision_rationale
  evidence_pack:
    assemble_on_events:
      - "aedt.score_generated"
      - "aedt.override_decided"
      - "candidate.dispositioned"
    include:
      - identity_verification_summary
      - scoring_payload
      - tool_versions
      - reviewer_notes
      - override_chain
    retention:
      days: 365
  privacy:
    zero_retention_biometrics: true
    store_only_verification_outcome_and_metadata: true
Outcome proof: What changes
Before
Bias audit artifacts existed as periodic PDFs, but candidate-level ranking decisions could not be reconstructed reliably. Overrides happened in chat, and model or rubric updates were not consistently version-bound to reqs.
After
Algorithmic bias audits were paired with candidate-level evidence packs and immutable logs: every score and override had a timestamp, version IDs, and an accountable approver recorded in the ATS record.
Implementation checklist
- Inventory every scoring and ranking input that affects candidate ordering.
- Lock model and rubric versions per req and per candidate decision.
- Define an SLA-bound review queue for exceptions and manual overrides.
- Generate an evidence pack per decision: identity status, tool versions, inputs, outputs, overrides, and approvers.
- Publish an audit calendar and change-control gate for scoring logic updates.
Questions we hear from teams
- What is the operational objective of an NYC LL 144 bias audit?
- To ensure that any automated employment decision tool influencing candidate scoring or ranking can be audited with evidence. Operationally, that means reconstructable decisions: versioned logic, logged events, and accountable approvers per candidate.
- Why do compliance teams get stuck even when a vendor provides an audit report?
- Because the day-to-day decision path often happens outside the vendor report: overrides, ranking changes, rubric drift, and unlogged updates. The audit must bind to the actual workflow, not just the tool.
- What should be logged for defensibility?
- At minimum: timestamps, actor identity, candidate ID, req ID, identity status at time of scoring, tool and rubric version IDs, score outputs, and override rationale and approver for any changes.
- How does identity verification relate to bias audits?
- If the wrong person generates the score, you are auditing noise. Identity gating ties the score to a verified individual, reducing proxy interview exposure and making the evidence pack defensible.
Ready to secure your hiring pipeline?
Let IntegrityLens help you verify identity, stop proxy interviews, and standardize screening from first touch to final offer.
Watch IntegrityLens in action
See how IntegrityLens verifies identity, detects proxy interviewing, and standardizes screening with AI interviews and coding assessments.
