Annotation studio · Image · Video · Speech

Training data for ideas
taking shape.

Zyka Foundry is a precision labeling studio for teams building computer vision, video intelligence, and speech AI. We handle the unglamorous part of the model — so your team can stay focused on the interesting part.

12M+
frames labeled in the last 12 months
98.6%
median inter-annotator agreement
<24h
median turnaround on active pools
7
languages covered for speech work
Trusted by teams shipping real models
Parallax Labs · Helio Health · Northwind AV · Verso Robotics · Meridian Audio · Folio Diagnostics · Kestrel Mobility · Signal & Sample · Cartograph · Mosaic ML · Limen Security · Atlas Geo
[01] What we believe

Models are only as honest as the data behind them. We treat labels like the product they are.

01 /

Specialists, not a crowd

Radiologists do medical imaging. Linguists do phoneme work. Drivers do LiDAR. Matching expertise to modality is not optional — it's the whole job.

02 /

Quality as a measured system

Every pool runs under a QA framework: gold tasks, consensus scoring, calibrated reviewers, and agreement deltas shipped with every batch.

03 /

Your taxonomy, not ours

We don't hand you a generic schema. We sit with your ML team, iterate on edge cases, and encode your judgment into the annotation guide.

[02] Modality · Image

Pixel-level honesty.

Nine image workflows, one quality system. From a tight crop on a bolt in a factory feed to a boundary-accurate mask of a pulmonary nodule — we match the labeler to the domain and the tool to the task.
frame_00124.jpg · 1920 × 1080
Live preview
annotator: m.ortega · reviewer: j.ikeda · IoU 0.94 · IAA 0.97
[03] Modality · Video

Motion, traced
frame by frame.

Video isn't just a stack of images — it's a contract of identity and causality over time. We label with that in mind: consistent IDs, frame-accurate boundaries, and reviewers trained to catch temporal drift before it poisons your model.
scene_42.mp4 · 59.94 fps · 3840 × 2160
00:00:44:23 / 00:02:03:09
person-A (tracking)
person-B (tracking)
vehicle-01 (3D bbox)
action · handoff
action · walking
anomaly · loiter
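
To make "consistent IDs and frame-accurate boundaries" concrete, here is a minimal, hypothetical consistency check of the kind a review tool might run over a single track: boxes that share an ID but jump implausibly between adjacent frames get flagged for a human look. The Box structure, the 0.5 threshold, and the function names are illustrative assumptions, not a description of our internal tooling.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """One axis-aligned box on one frame, belonging to a single track ID."""
    frame: int
    x1: float
    y1: float
    x2: float
    y2: float

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0

def flag_jumps(track: list[Box], min_iou: float = 0.5) -> list[tuple[int, int]]:
    """Flag adjacent-frame pairs where one track's box moves implausibly far."""
    boxes = sorted(track, key=lambda b: b.frame)
    flags = []
    for prev, cur in zip(boxes, boxes[1:]):
        if cur.frame == prev.frame + 1 and iou(prev, cur) < min_iou:
            flags.append((prev.frame, cur.frame))  # candidate ID switch or drifting box
    return flags
```

A check like this only narrows where to look; the flagged frames still go to a reviewer.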
[04] Modality · Speech

Sound with meaning attached.

Speech is the modality where linguistics, acoustics, and judgment collide. Our annotators include trained phoneticians, native speakers of low-resource languages, and certified listeners for subjective quality work.
call_042_agent_customer.wav · 48kHz · stereo
00:02:05.920
speaker · agent
speaker · customer
event · hold-music
emotion · frustration
Transcript

thanks for calling, how can I help?
yeah uh my order it hasn't [0.4s hesitation] arrived yet and I'm kinda losing patience here

Live state

active: speaker · agent
intent: order.status.check
entity: order_id (missing)
sentiment: neutral

[05] HITL · Human-in-the-loop

When the model is right, we stay out of the way. When it isn't,
we close the loop.

HITL isn't a fallback tier for when things go wrong. It's the plumbing between your model in production and the labeled data it still needs to learn from. Confidence-gated, cost-efficient, and wired to surface the samples that actually move your evals.
The loop · Continuously running

  • stream · 1.0×

    Production inference

    Your model makes predictions at full throughput.

  • gate · 85 / 15

    Confidence split

    ~85% auto-accepted. ~15% routed by entropy + uncertainty.

  • review · 3-class QA

    Human review pool

    Matched specialists label, correct, rationalize.

  • delta · JSONL + audit

    Corrected labels

    Disagreement routed to senior review. Rationale logged.

  • queue · ∑ deltas

    Retrain queue

    Corrected labels join SFT / RLHF training pool.

  • artifact · v + 1

    Redeploy

    New checkpoint rolls forward. Confidence gate retightens.
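
A minimal sketch of the gate stage above, assuming a model that emits a per-prediction probability distribution. The threshold values, the `probs` field name, and the `review_queue.jsonl` path are illustrative assumptions, not our production configuration; in practice the two thresholds are tuned so that roughly 85% of traffic clears the gate.

```python
import json
import math

# Illustrative thresholds -- tuned per model and per schema in practice.
MIN_CONFIDENCE = 0.90    # auto-accept only above this max-class probability
MAX_ENTROPY = 0.35       # ...and below this predictive entropy (nats)

def entropy(probs):
    """Shannon entropy of a probability distribution over classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def confident(pred):
    """True if a prediction is sure enough to skip human review."""
    probs = pred["probs"]
    return max(probs) >= MIN_CONFIDENCE and entropy(probs) <= MAX_ENTROPY

def gate(stream, review_path="review_queue.jsonl"):
    """Confidence gate: pass confident predictions through, queue the rest for review."""
    with open(review_path, "a") as queue:
        for pred in stream:
            if confident(pred):
                yield pred                            # auto-accepted, flows downstream
            else:
                queue.write(json.dumps(pred) + "\n")  # routed to the human review pool
```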
3–10×
median throughput gain with pre-label + correct
34%
lower cost-per-label on mature schemas
<6h
median time from prod flag to human review
2.1×
model eval improvement per active-learning batch
· Eight HITL patterns we operate
  • 01 / 3–10× throughput

    Pre-label & correct

    Your model proposes boxes, masks, transcripts, or captions. A trained reviewer approves, fixes, or rejects. On mature schemas, throughput goes up by an order of magnitude and label cost drops just as hard.

  • 02 / Uncertainty-routed

    Active learning

    We route only the samples your model is least sure about — low-confidence predictions, high-entropy distributions, near-decision-boundary cases. The labels you pay for are the ones that move the needle.

  • 03 / Every Nth inference

    Production review pool

    A fixed sample rate of production traffic is mirrored into a review queue. Drift, regressions, and silent failures show up in the agreement delta the day they start — not in the quarterly model eval.

  • 04 / Specialist pools on standby

    Edge-case triage

    When your model hits something weird — a rare medical phenotype, a code-switched utterance, a long-tail class — the task is auto-routed to a specialist pool. No generic-crowd guessing on data that deserves expertise.

  • 05 / Pairwise · rubric · Likert

    RLHF & preference

    Pairwise A/B ranking, rubric-scored absolute rating, and multi-turn dialogue judgment for generative models — image, video, speech, text. We run the calibration pipeline that keeps subjective judgment internally consistent.

  • 06 / Policy · jailbreak · adversarial

    Safety & red-team

    Jailbreak probing, policy-violation review, and adversarial prompt labeling for safety-tuned models. Trained reviewers who know the taxonomy cold and log rationale alongside the label.

  • 07 / Signal, not noise

    Disagreement routing

    When three annotators disagree, the schema is under-specified — not the annotators. Disagreements are routed to a senior reviewer, resolved with written rationale, and become test cases for the next schema revision.

  • 08 / Statistical + human review

    Drift detection & reroute

    Population-level statistics on the production inference stream surface distributional shift. Flagged slices are pulled into a re-labeling queue before the model's confidence catches up with its actual accuracy.
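
To make the statistical half of pattern 08 concrete, here is one common way it can be done: compare the predicted-class mix of each production slice against a reference window using a Population Stability Index, and queue any slice that crosses an alert threshold for re-labeling. The choice of PSI, the 0.2 threshold, and the function names are illustrative assumptions, not a description of our exact monitors.

```python
import math
from collections import Counter

PSI_ALERT = 0.2  # illustrative; a common rule of thumb for "shift worth investigating"

def class_mix(predicted_labels, classes):
    """Normalised class frequencies over one window of predictions."""
    counts = Counter(predicted_labels)
    total = max(len(predicted_labels), 1)
    return {c: counts.get(c, 0) / total for c in classes}

def psi(reference, current, eps=1e-6):
    """Population Stability Index between a reference and a current class mix."""
    score = 0.0
    for c, ref_p in reference.items():
        cur_p = max(current.get(c, 0.0), eps)
        ref_p = max(ref_p, eps)
        score += (cur_p - ref_p) * math.log(cur_p / ref_p)
    return score

def drifted_slices(reference_labels, slices, classes):
    """Yield slice ids whose prediction mix has drifted past the alert threshold."""
    reference = class_mix(reference_labels, classes)
    for slice_id, labels in slices.items():
        if psi(reference, class_mix(labels, classes)) > PSI_ALERT:
            yield slice_id  # pulled into the re-labeling queue for human review
```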

· A note on the word "loop"

A loop implies something that runs. Most HITL pipelines we inherit from customers are not actually loops — they're a one-way conveyor from humans to training data, never back. The difference shows up in model performance twelve months later. We build them as real loops.

[06] How we work

A quiet, careful process
that you never have to manage.

Good annotation looks boring from the outside. No heroic late-night pushes, no Slack fires, no "we'll fix it in the next batch."

The work shows up on time, the agreement numbers are where they should be, and the edge cases are the ones your guide already anticipated.

  • 01

    Kickoff & taxonomy

    We sit with your ML team for 2–3 working sessions to understand the model's failure modes, the decision boundary you want to enforce, and the edge cases that keep your PMs up at night. Output: a versioned annotation guide, test set, and success metric.

    1–2 weeks
  • 02

    Gold set & calibration

    We hand-build 200–1,000 gold tasks with your team, then calibrate annotator pools against them. Annotators who don't hit the agreement bar don't see production data. Simple as that.

    3–5 days
  • 03

    Pilot batch

    A small production batch (1–5% of scope) runs end-to-end through the tool, pool, QA, and export pipeline. We surface schema ambiguities early, before they compound across millions of labels.

    3–7 days
  • 04

    Production at scale

    Full throughput with layered QA — peer review, senior review, gold-task injection, and consensus scoring on disputed tasks. Rolling dashboards on agreement, throughput, and reviewer calibration.

    Ongoing
  • 05

    Retrospective & schema v2

    Every batch ends with a retro. What did annotators argue about? What did the model get wrong after training on this data? That feedback becomes the next version of the guide.

    Per milestone
[07] Quality system

Numbers you can audit,
not adjectives.

Every delivery ships with an agreement report, reviewer calibration curve, and a delta against your gold set. You don't have to trust us — you can check.

Disagreement is the most valuable signal in any annotation pipeline. We surface it, root-cause it, and feed it back into the next schema revision instead of averaging it away.

IAA

Inter-annotator agreement

Krippendorff's α, Cohen's κ, or F1 depending on task shape. Reported per batch and per reviewer.

Gold

Gold-task injection

Annotators see hidden gold tasks at a 3–5% rate. Performance below the bar triggers recalibration.

Consensus

N-way consensus

Critical or ambiguous tasks routed to 3+ annotators. Conflict resolution by senior reviewer with written rationale.

Drift

Reviewer drift monitoring

Weekly calibration checks for reviewers themselves. Humans get lax over time — we control for it explicitly.
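
For readers who want to check the arithmetic behind those agreement reports, here is a bare-bones sketch of one of the metrics named above: Cohen's κ for two annotators on a categorical task, which is observed agreement corrected for chance. The toy labels are invented for illustration; real deliveries also use Krippendorff's α for multi-annotator pools and F1 for detection-style tasks.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    assert n and n == len(labels_b)

    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)

    return (p_o - p_e) / (1 - p_e)

# Toy example: two annotators, six tasks, three classes.
a = ["vehicle", "person", "person", "cyclist", "vehicle", "person"]
b = ["vehicle", "person", "cyclist", "cyclist", "vehicle", "vehicle"]
print(round(cohens_kappa(a, b), 2))  # 0.52
```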

[08] Who we work with

Teams whose data earns its keep.

Domain fit matters more than volume. A pool of drivers will always beat a generic crowd on AV work. A pool of radiologists will always beat a generic crowd on mammography. We staff accordingly.
  • AV · drones · robotics

    Autonomous systems

    LiDAR cuboids, sensor-fusion tracking, lane and drivable-surface segmentation, behavior prediction labels.

    14 AV programs shipped
  • radiology · pathology · ophthalmology

    Healthcare & life sciences

    DICOM-native workflows with clinician annotators. HIPAA environment, IRB-aware de-identification, audit-ready trails.

    4 FDA-cleared customers
  • SFT · RLHF · safety red-team

    Generative AI

    Preference ranking, rubric-driven aesthetic scoring, and multi-turn dialogue judgment for text-to-image, video, and voice models.

    1.2M pairwise judgments
  • CCTV · perimeter · behavior

    Security & surveillance

    Anomaly detection, re-identification across cameras, loitering and intrusion events, crowd density estimation at scale.

    24/7 monitored pool
  • catalog · shelf · fit

    Retail & e-commerce

    Product attribute tagging, shelf compliance audits, try-on pose and garment segmentation, aesthetic search preference data.

    6M SKUs processed
  • EO · SAR · aerial

    Geospatial & satellite

    Building footprints, land-use segmentation, change detection, vessel and vehicle detection from overhead imagery.

    EO + SAR trained pools
  • ASR · intent · QA

    Contact centers & voice

    Transcription, diarization, intent and entity schema work for voice agents, IVR, and conversational analytics.

    7 languages live
  • QA · predictive maintenance

    Industrial & manufacturing

    Surface defect segmentation, machine anomaly audio, thermal imaging classification, assembly compliance.

    Line-speed throughput
[09] Our people

A team of specialists. Not a crowd.

We're a hybrid model. A full-time annotation team of 140+ domain specialists handles the hard work. A vetted extension pool of 800+ trained contributors scales us up when volume spikes.

Annotators are paid above local living wage, receive health coverage, and are employed under local labor law — not gig-work contracts. We believe the quality of a label reflects the conditions it was made in.

140+
full-time specialists
vision · speech · domain experts
800+
extension pool
vetted · calibrated · callable
7
languages in-house
native speakers on speech work
74%
retention at 24 months
industry median ~22%
5
radiologists on staff
plus pathologists, OCT readers
4
linguists on staff
phonetics + computational
[10] Tools & integration

We meet you where your stack already is.

We don't force you onto a proprietary platform. If you've standardized on Label Studio, we run in Label Studio. If you've built an internal tool, we'll train our pool on it. Data never leaves your cloud unless you ask us to mirror it.
Platform
  • CVAT (self-hosted)
  • Label Studio
  • Scale Nucleus
  • Labelbox
  • Supervisely
3D / LiDAR
  • Segments.ai
  • Deepen AI
  • Custom PCD viewer
Speech
  • Praat
  • ELAN
  • Audino
  • Custom MUSHRA rig
RLHF & eval
  • Argilla
  • SurgeHQ schema
  • Custom pairwise UI
Transport
  • S3 / GCS / Azure Blob
  • SFTP
  • Signed-URL handoff
Formats
  • COCO
  • YOLO
  • Pascal VOC
  • CVAT XML
  • KITTI
  • nuScenes
  • DICOM-SR
  • WebVTT
  • TextGrid
  • custom schemas
[11] Security & compliance

Your data stays your data.

We operate on a principle of least access. Annotators see only the fields they need, from workstations inside our secure VPN — no downloads, no screenshots, no phone cameras allowed in sensitive pools.

PII redaction, watermarking, and full audit trails on every task are available by default on healthcare, financial, and government engagements.

SOC 2 Type II
ISO 27001
HIPAA
GDPR
CCPA

Clean-room workstations

Locked-down VDI, no USB, no external network egress, recorded sessions on flagged pools.

End-to-end encryption

AES-256 at rest, TLS 1.3 in transit. Customer-managed keys available on enterprise plans.

Signed BAAs & DPAs

HIPAA BAA and GDPR DPA on every engagement. Sub-processor list published and versioned.

Full audit trail

Every annotation is attributed, timestamped, and reviewable. Deletion certificates on project close-out.

[12] What teams tell us

The nicest thing a customer has said is that they forgot we existed.

"We switched from a crowd platform after six months of arguing with the data. Agreement numbers on our mammography schema went from 0.71 to 0.94 in the first pilot batch. That was the day the CTO stopped asking me about label quality."
Head of ML
Folio Diagnostics

"Their LiDAR team knew the sensor stack. Nobody had to explain what a moving ground return was. We lost a quarter of engineering time on a previous vendor just trying to communicate basic things. That time came back."
Perception Lead
Kestrel Mobility

"Annotation is supposed to be invisible when it's working. For two years now I've been able to forget it's even part of our process — which is the highest compliment I can give a vendor."
VP Data Science
Meridian Audio

"We ran the same 5,000-sample evaluation batch with three other vendors first. Zyka Foundry was the only one whose disagreement pattern was consistent with our own internal review. They were not labeling a different model than us."
Research Scientist
Mosaic ML
[13] Engagement models

Priced for the real shape of the work.

We don't price by the label. Labels are the output, not the unit of work — the real cost is in schema design, annotator calibration, and review. These are the three shapes most engagements take.

Pilot

For teams validating fit
Fixed-scope engagement
from $8,000
  • Up to 20,000 simple labels
  • 1 modality · 1 taxonomy
  • Shared annotation guide authoring
  • Gold set + calibration included
  • Agreement report per batch
  • 2-week standard turnaround
Start a pilot
Most common

Program

For recurring production pipelines
Managed monthly program
from $22,000 / mo
  • Dedicated annotator pool (20–80 people)
  • Dedicated program manager + QA lead
  • SLA on throughput and agreement
  • Weekly reviewer calibration
  • Live dashboards on agreement & throughput
  • Multi-modality supported
  • Direct Slack line to the pool PM
Scope a program

Enterprise

For regulated or high-volume teams
Custom · multi-year
Let's talk
  • Clinician / licensed specialist pools
  • On-premise or customer VPC deployment
  • Customer-managed encryption keys
  • Custom BAA / DPA / DPIA
  • Dedicated legal + security review
  • 24/7 program staffing
  • Data residency guarantees (US / EU / APAC)
Contact us
[14] Questions

Some things worth knowing before we start.

If your question isn't here, the fastest path is a 20-minute call with our delivery team. We'd rather answer the specific thing you care about than write another generic FAQ entry.
[15] Start a conversation

Let's get your data in shape.

Tell us what you're building. We'll reply within one working day with a scope sketch, a rough timeline, and a short list of the sharpest questions we'd ask in a kickoff session.

San Francisco
548 Market St · CA 94104
Amsterdam
Herengracht 182 · 1016 BR
Singapore
8 Marina Blvd · 018981
Mexico City
Polanco · CDMX 11550