Transcripts - Arcmira API

Arcmira serves premium transcripts (diarized, speaker-identified, entity-annotated, and community-correctable) and can generate one on demand for any public YouTube video. Every segment is matched against the entity graph, so people, organizations, products, and topics are annotated inline with character-level spans. Many videos in the index carry only a lightweight preliminary analysis (detected entities, moments, counts) and no premium transcript. Transcript content is premium-only: until premium transcription has run, the transcript endpoint returns a counts-only summary of what the preliminary analysis detected, plus a quote for running the full premium analysis.

GET  /v1/transcripts/{video_id}          # read (teaser until unlocked)
POST /v1/transcriptions                  # submit a video for transcription
GET  /v1/transcriptions/{id}             # poll status (Retry-After)
GET  /v1/transcriptions                  # list your recent requests (?video_id=)
POST /v1/videos/{video_id}/corrections   # submit corrections (free)

Pricing

Transcript access is priced per 15-minute block of video and paid for in rows, the same row credits the rest of the API uses:

125 rows per 15-minute block of video, rounded up, minimum one block.
A 62-minute podcast is 5 blocks = 625 rows. A 9-minute clip is 1 block = 125 rows.
The price is the same whether a premium transcript already exists in the index or Arcmira generates it fresh for you.
Unlocks are permanent and per-account. You pay once per video; every subsequent read is free — and if you unlocked a video before its premium transcript existed, running the premium analysis later costs nothing extra.
If a transcription job fails permanently, the rows are refunded automatically and the unlock is revoked.

Every transcript response includes the exact quote in meta.quote, where quarters is the number of 15-minute blocks and rows is the price:

{ "quarters": 5, "rows": 625 }
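The quote can be estimated client-side from the video's durationSeconds. A minimal sketch: quote_for_duration is a hypothetical helper, and the exact rounding at block boundaries (for example, treating exactly 900 seconds as one block) is an assumption, so meta.quote stays authoritative.

```python
import math

ROWS_PER_BLOCK = 125        # rows charged per 15-minute block
BLOCK_SECONDS = 15 * 60     # one block ("quarter") of video


def quote_for_duration(duration_seconds: float) -> dict:
    """Hypothetical client-side estimate of meta.quote for a video.

    Assumes blocks are counted from durationSeconds, rounded up,
    with a minimum of one block.
    """
    quarters = max(1, math.ceil(duration_seconds / BLOCK_SECONDS))
    return {"quarters": quarters, "rows": quarters * ROWS_PER_BLOCK}


# The 62-minute podcast from the pricing examples:
print(quote_for_duration(62 * 60))  # {'quarters': 5, 'rows': 625}
```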

Corrections are always free (zero rows), and reading a transcript you have already unlocked is free.

Reading a transcript

curl 'https://api.arcmira.com/v1/transcripts/dQw4w9WgXcQ' \
  -H "Authorization: Bearer $ARCMIRA_API_KEY"

The access field tells you where you stand:

`access`	Meaning
`unlocked`	Full premium transcript in the payload.
`locked`	Premium transcript exists; you get ~5 teaser segments and the `meta.quote`.
`premium_pending`	A preliminary analysis exists but premium transcription hasn’t run. The payload carries `detected` — entity-type counts only (`{ people, organizations, products, topics }`) — plus the quote. Run the premium analysis via `POST /v1/transcriptions`.
`not_transcribed`	Not in the index; `meta.quote` is present when the duration is known — submit via `POST /v1/transcriptions`.
`unauthenticated`	Teaser only; authenticate to unlock.

Gating is enforced server-side — locked responses simply do not contain the rest of the transcript, and premium_pending responses contain no transcript text, entity names, or timestamps at all.
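A client usually branches on access before doing anything else. A sketch of that dispatch; next_step and its action names are hypothetical, and only the access values come from the table above.

```python
def next_step(transcript: dict) -> str:
    """Map a GET /v1/transcripts response body to a suggested next action.

    Hypothetical helper; the returned action names are illustrative only.
    """
    access = transcript.get("access")
    if access == "unlocked":
        return "read"          # full premium transcript is in the payload
    if access == "locked":
        return "unlock"        # repeat the GET with ?unlock=true
    if access in ("premium_pending", "not_transcribed"):
        return "submit"        # POST /v1/transcriptions
    if access == "unauthenticated":
        return "authenticate"  # send an Authorization header
    raise ValueError(f"unexpected access value: {access!r}")
```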

Unlocking

When a premium transcript exists (access is locked), pass ?unlock=true to purchase it in the same request: the quoted rows are debited, the permanent unlock is granted, and the full transcript comes back in one round trip. On premium_pending videos there is nothing to unlock yet, so unlock=true is ignored; the purchase is the premium generation itself (POST /v1/transcriptions).

curl 'https://api.arcmira.com/v1/transcripts/dQw4w9WgXcQ?unlock=true' \
  -H "Authorization: Bearer $ARCMIRA_API_KEY"

402 — not enough rows remaining this period (the body includes the quote).
403 — transcript access requires a paid plan.
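The same unlock request with the Python standard library, as a sketch: unlock_request and explain_unlock_status are hypothetical helpers, and the request is only built here, not sent.

```python
import urllib.request
from urllib.parse import quote as url_quote

API_BASE = "https://api.arcmira.com/v1"


def unlock_request(video_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET that reads and unlocks in one round trip."""
    url = f"{API_BASE}/transcripts/{url_quote(video_id, safe='')}?unlock=true"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def explain_unlock_status(status: int) -> str:
    """Summarize the documented outcomes of an unlock request by HTTP status."""
    if status == 402:
        return "not enough rows remaining this period; the body includes the quote"
    if status == 403:
        return "transcript access requires a paid plan"
    if 200 <= status < 300:
        return "ok; check that access is 'unlocked' in the body"
    return "unexpected status"
```

Sending it with urllib.request.urlopen raises urllib.error.HTTPError for the 402 and 403 cases; the error's code attribute carries the status to pass to explain_unlock_status.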

Response shape

{
  "video": { "videoId": "…", "title": "…", "channelName": "…", "durationSeconds": 3720 },
  "access": "unlocked",
  "segments": [
    { "index": 0, "start": 0.4, "end": 6.1, "text": "Welcome back to the show…", "speaker": 0 }
  ],
  "speakers": [
    { "id": 0, "label": "Speaker 0", "entity": { "id": 147403, "name": "Emad Mostaque", "slug": "emad-mostaque" } }
  ],
  "annotations": [
    { "segment_index": 12, "char_start": 34, "char_end": 40, "entity_id": 120034, "entity_type": "organization", "name": "OpenAI", "slug": "openai" }
  ],
  "meta": { "diarized": true, "locked": false, "revision": "rwmksmk", "quote": { "quarters": 5, "rows": 625 } }
}

segments — transcript lines with start/end seconds and a diarization speaker id.
speakers — the diarization map; entries gain an entity once a speaker has been identified as a person.
annotations — entity mentions as character spans inside segment text.
meta.revision — an opaque id for the served transcript plus its approved-correction state. Save it: corrections echo it back, and a changed revision means the transcript changed underneath you.
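Joining the three arrays is straightforward. A sketch with hypothetical helper names, assuming char_start/char_end are half-open offsets into the segment text as returned (whether offsets count code points or UTF-16 units for non-ASCII text is not specified here):

```python
def speaker_names(transcript: dict) -> dict:
    """Map diarization speaker id -> display name (entity name once identified)."""
    names = {}
    for sp in transcript.get("speakers", []):
        entity = sp.get("entity")  # absent or null until identified
        names[sp["id"]] = entity["name"] if entity else sp["label"]
    return names


def mentions(transcript: dict) -> list:
    """Return (segment_index, surface_text, entity_name) for each annotation."""
    text_by_index = {seg["index"]: seg["text"] for seg in transcript.get("segments", [])}
    found = []
    for ann in transcript.get("annotations", []):
        text = text_by_index.get(ann["segment_index"])
        if text is None:
            continue  # segment not in this payload (for example, a teaser)
        surface = text[ann["char_start"]:ann["char_end"]]
        found.append((ann["segment_index"], surface, ann["name"]))
    return found
```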

Submitting a video for transcription

Any public YouTube video up to 12 hours long can be submitted, on paid tiers only. Rows are debited up front and the permanent unlock is granted at submit time, so once the pipeline finishes, GET /v1/transcripts/{video_id} returns the full transcript with no further purchase.

curl -X POST 'https://api.arcmira.com/v1/transcriptions' \
  -H "Authorization: Bearer $ARCMIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }'

{
  "request": {
    "id": "0b0e8f3a-…",
    "videoId": "dQw4w9WgXcQ",
    "status": "queued",
    "stage": "queued",
    "quote": { "quarters": 5, "rows": 625 },
    "etaSeconds": 678,
    "nextPollSeconds": 30
  }
}
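The submit call above, sketched with the Python standard library. submission_request is a hypothetical helper that only builds the request; reusing the same idempotency key when retrying lets the server deduplicate the retry under the idempotency contract.

```python
import json
import uuid
import urllib.request
from typing import Optional

API_BASE = "https://api.arcmira.com/v1"


def submission_request(youtube_url: str, api_key: str,
                       idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /v1/transcriptions for one video.

    Pass the same idempotency_key on every retry of the same submission;
    a fresh UUID is generated only when none is given.
    """
    body = json.dumps({"url": youtube_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )
```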

Useful properties of this endpoint:

Premium transcript already exists? The request short-circuits to complete and the unlock still applies — unified pricing, no duplicate work. A video with only a preliminary analysis does not short-circuit: you’re buying the premium generation, so the pipeline runs.
Already unlocked? You are not charged again (quote.rows in the response shows what was actually debited) — including unlocks purchased before the premium transcript existed.
Already in flight? A second submit for the same video returns the existing request with existing: true.
Idempotency-Key is supported, per the standard idempotency contract.
User requests ride a reserved fast lane through the pipeline — submission latency doesn’t degrade when background indexing is busy.
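These outcomes can be told apart from the response body. A sketch, with the caveat that the text does not say where existing: true appears, so the helper checks both the top level and the request object; summarize_submission and its messages are hypothetical.

```python
def summarize_submission(resp: dict) -> str:
    """Summarize a POST /v1/transcriptions response body.

    Field names (status, quote.rows, nextPollSeconds, existing) follow the
    example response; the location of `existing` is an assumption.
    """
    req = resp["request"]
    debited = req.get("quote", {}).get("rows", 0)  # rows actually debited
    if resp.get("existing") or req.get("existing"):
        return f"already in flight as {req['id']}; keep polling it"
    if req["status"] == "complete":
        return f"premium transcript already existed; {debited} rows debited, read it now"
    poll = req.get("nextPollSeconds", 30)
    return f"queued as {req['id']}; {debited} rows debited; poll in {poll}s"
```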

Polling

GET /v1/transcriptions/{id} derives live status from the pipeline. While the request is in flight the response carries a Retry-After header (seconds) plus etaSeconds and nextPollSeconds in the body. The polite loop is simply:

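import os, time
import requests

API = "https://api.arcmira.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['ARCMIRA_API_KEY']}"}
while True:  # request_id is the "id" returned by POST /v1/transcriptions
    res = requests.get(f"{API}/transcriptions/{request_id}", headers=HEADERS)
    if res.json()["status"] in ("complete", "failed", "refunded"):
        break  # terminal statuses carry no Retry-After header
    time.sleep(int(res.headers.get("Retry-After", "30")))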

Status walks queued → downloading → transcribing → analyzing → complete, and terminal statuses drop the Retry-After header. On complete, fetch GET /v1/transcripts/{video_id}; you were unlocked at submission.

If the pipeline fails permanently (or a request goes stale past 24 hours), the status becomes refunded: the rows come back and the unlock is revoked.

GET /v1/transcriptions (optionally ?video_id=) lists your recent requests.
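For long-running jobs it helps to add an overall deadline to the polling loop. A sketch: fetch is a hypothetical hook returning (json_body, headers) for GET /v1/transcriptions/{id}, so any HTTP client or a test fake can be plugged in; the default deadline mirrors the 24-hour staleness window above, and falling back to nextPollSeconds when Retry-After is missing is a choice of this sketch.

```python
import time
from typing import Callable, Mapping, Tuple

TERMINAL = {"complete", "failed", "refunded"}


def poll_until_terminal(
    fetch: Callable[[], Tuple[dict, Mapping[str, str]]],
    max_wait_seconds: float = 24 * 60 * 60,  # stale requests are refunded past 24 hours
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a terminal status, honoring Retry-After, with an overall deadline."""
    waited = 0.0
    while True:
        body, headers = fetch()
        if body["status"] in TERMINAL:
            return body
        delay = float(headers.get("Retry-After", body.get("nextPollSeconds", 30)))
        if waited + delay > max_wait_seconds:
            raise TimeoutError(f"still {body['status']!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
```

With requests, fetch can simply return (res.json(), res.headers) from a GET on the poll URL; res.headers is case-insensitive, so the Retry-After lookup works as written.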

Corrections

Transcripts are community-correctable, and corrections are free: they cost zero rows and are attributed to the submitting account and key, like Community Review. Corrections are optimistic for you and pending review for everyone else: your own pending corrections come back on the transcript GET immediately, and once a reviewer approves them, they apply for all callers (and bump meta.revision).

Lifecycle

Submit — the correction lands as a pending row, attributed to your key. Speaker identifications additionally create a community-flagged appearance on the person’s page right away.
Pending — visible to you in the transcript GET (edits[], speakerIdentifications[], …); withdrawable via the matching DELETE.
Approved — applied for everyone at read time; meta.revision changes.
Rejected / reverted — removed from view; reverting a speaker identification also deletes the appearance it created.
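The lifecycle can be modeled client-side as a small state machine. A sketch under stated assumptions: the state names are illustrative rather than API values, WITHDRAWN stands for a DELETE while pending, and reverting is assumed to apply to previously approved corrections.

```python
from enum import Enum


class CorrectionState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    REVERTED = "reverted"
    WITHDRAWN = "withdrawn"  # assumption: DELETE while still pending


# Allowed transitions, following the lifecycle described above.
TRANSITIONS = {
    CorrectionState.PENDING: {CorrectionState.APPROVED, CorrectionState.REJECTED,
                              CorrectionState.WITHDRAWN},
    CorrectionState.APPROVED: {CorrectionState.REVERTED},
    CorrectionState.REJECTED: set(),
    CorrectionState.REVERTED: set(),
    CorrectionState.WITHDRAWN: set(),
}


def can_transition(src: CorrectionState, dst: CorrectionState) -> bool:
    return dst in TRANSITIONS[src]


def visible_to(state: CorrectionState, is_author: bool) -> bool:
    """Whether a correction in `state` shows up in a caller's transcript GET."""
    if state is CorrectionState.PENDING:
        return is_author                      # optimistic for the author only
    return state is CorrectionState.APPROVED  # applied for everyone
```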

Unified endpoint

POST /v1/videos/{video_id}/corrections accepts every correction kind; the kind field selects which one:

`kind`	What it does
`line_edit`	Fix the text of one segment.
`speaker_reassign`	Move lines (or part of a line — a sub-line split) to another speaker, a new speaker, or a non-person voice role.
`speaker_identify`	Link a diarization speaker to a person entity.
`add_person`	Propose a person not in the index yet and link the speaker in one action.
`entity_tag`	Tag an entity mention the pipeline missed, as a character span.

curl -X POST 'https://api.arcmira.com/v1/videos/dQw4w9WgXcQ/corrections' \
  -H "Authorization: Bearer $ARCMIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "kind": "line_edit",
    "revision": "rwmksmk",
    "anchor": { "segmentIndex": 12, "contentHash": "1a2b3c" },
    "payload": {
      "segmentIndex": 12,
      "originalText": "Anthropics new cloud 4.5",
      "correctedText": "Anthropic'\''s new Claude 4.5"
    }
  }'
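Building the body in code avoids the shell quoting dance around the apostrophe. A sketch: correction_body is a hypothetical helper that enforces the discriminator and the revision/anchor requirement for content-referencing kinds; per-kind payload schemas beyond line_edit are not validated, and placing seq at the top level of the body is an assumption.

```python
import json
from typing import Optional

KINDS = {"line_edit", "speaker_reassign", "speaker_identify", "add_person", "entity_tag"}
# Kinds that reference segment content and must carry revision + anchor.
ANCHORED_KINDS = {"line_edit", "speaker_reassign", "entity_tag"}


def correction_body(kind: str, payload: dict, revision: Optional[str] = None,
                    anchor: Optional[dict] = None, seq: Optional[int] = None) -> str:
    """Serialize a JSON body for POST /v1/videos/{video_id}/corrections."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if kind in ANCHORED_KINDS and (revision is None or anchor is None):
        raise ValueError(f"{kind} requires revision and anchor")
    body = {"kind": kind}
    if revision is not None:
        body["revision"] = revision
    if anchor is not None:
        body["anchor"] = anchor
    body["payload"] = payload
    if seq is not None:
        body["seq"] = seq  # assumption: seq sits at the top level
    return json.dumps(body)
```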

Anchors, revisions, and ordering

Kinds that reference segment content (line_edit, speaker_reassign, entity_tag) must prove they were made against the transcript you actually saw:

revision — echo meta.revision from the transcript GET.
anchor.contentHash — a djb2 hash of the covered segment text (segments joined with \n for multi-segment selections), base-36 encoded:

function djb2(input) {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(36);
}

seq (optional) — a per-video monotonic counter for clients that submit streams of dependent corrections (sub-line splits re-index later segments, so order matters). One-off submissions can omit it.
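A client that streams dependent corrections can keep one counter per video. A sketch: SeqCounter is hypothetical, and starting each video at 0 is an assumption; rebase adopts the expectedSeq from a 412 body, as the error table below prescribes.

```python
from collections import defaultdict


class SeqCounter:
    """Client-side per-video sequence numbers for dependent corrections."""

    def __init__(self) -> None:
        self._next = defaultdict(int)  # assumption: sequences start at 0

    def take(self, video_id: str) -> int:
        """Return the next seq for video_id and advance the counter."""
        seq = self._next[video_id]
        self._next[video_id] = seq + 1
        return seq

    def rebase(self, video_id: str, expected_seq: int) -> None:
        """Adopt the server's expectedSeq from a 412 response body."""
        self._next[video_id] = expected_seq
```

Because a 409 still consumes the sequence number, the counter is deliberately never rolled back on a dropped correction.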

The error semantics are designed for at-least-once, queue-style clients:

Status	Meaning	What to do
`409`	Revision or anchor mismatch — the transcript changed underneath the correction. Body: `{ reason, currentRevision }`.	Drop or re-anchor this correction and continue; the sequence number was consumed.
`412`	Sequence mismatch. Body: `{ expectedSeq }`.	Refetch the transcript, rebase your local counter, resend.
`401`	Authentication expired.	Pause the queue, re-authenticate, resume — never drop.
`429` / `5xx`	Transient.	Retry the same event with backoff.

Idempotency-Key (use the correction’s client-generated UUID) makes retries safe: replays return the stored final response verbatim with Idempotency-Replayed: true.
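A queue worker can map each response status to the action in the table. A sketch with hypothetical names; statuses outside the table raise so they are not silently dropped, and the backoff constants are arbitrary choices rather than documented values.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"                              # 2xx, including idempotent replays
    DROP_OR_REANCHOR = "drop_or_reanchor"      # 409: seq consumed, continue
    REBASE_AND_RESEND = "rebase_and_resend"    # 412: refetch, adopt expectedSeq
    PAUSE_AND_REAUTH = "pause_and_reauth"      # 401: never drop
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # 429 / 5xx: same Idempotency-Key


def classify(status: int) -> Action:
    """Map a corrections response status to the documented queue action."""
    if 200 <= status < 300:
        return Action.DONE
    if status == 409:
        return Action.DROP_OR_REANCHOR
    if status == 412:
        return Action.REBASE_AND_RESEND
    if status == 401:
        return Action.PAUSE_AND_REAUTH
    if status == 429 or 500 <= status < 600:
        return Action.RETRY_WITH_BACKOFF
    raise ValueError(f"status {status} is not covered by the error table")


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: 1, 2, 4, ... seconds, capped."""
    return min(cap, base * (2 ** attempt))
```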

Purpose-built wrappers

If you don’t need the unified contract, three simpler routes cover the common cases — each supports Idempotency-Key, and each POST returns the row id you can DELETE to withdraw while still pending:

POST   /v1/transcripts/{video_id}/edits          { segmentIndex, originalText, correctedText }
DELETE /v1/transcripts/{video_id}/edits/{id}

POST   /v1/transcripts/{video_id}/speakers       { speakerId, entityId }  or  { speakerId, name }
DELETE /v1/transcripts/{video_id}/speakers/{id}

POST   /v1/transcripts/{video_id}/merges         { sourceName, targetEntityId, replaceWith? }
GET    /v1/transcripts/{video_id}/merges
DELETE /v1/transcripts/{video_id}/merges/{id}

Edits fix segment text.
Speakers identify who a diarized voice is — this creates the community appearance immediately; withdrawing removes it.
Merges fix misattributed name mentions within one video (“mentions of Imad in this video are Emad Mostaque”), optionally respelling the transcript text. Mentions of the same name in other videos are untouched.
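The wrapper bodies are small, but the speakers route takes one of two shapes. A sketch with hypothetical helper names that enforces exactly one of entityId or name; reading replaceWith as the optional respelling is an assumption drawn from the description above.

```python
from typing import Optional


def speaker_body(speaker_id: int, entity_id: Optional[int] = None,
                 name: Optional[str] = None) -> dict:
    """Body for POST /v1/transcripts/{video_id}/speakers: entityId or name, not both."""
    if (entity_id is None) == (name is None):
        raise ValueError("pass exactly one of entity_id or name")
    body = {"speakerId": speaker_id}
    if entity_id is not None:
        body["entityId"] = entity_id  # link to an existing person entity
    else:
        body["name"] = name           # propose a person by name
    return body


def merge_body(source_name: str, target_entity_id: int,
               replace_with: Optional[str] = None) -> dict:
    """Body for POST /v1/transcripts/{video_id}/merges."""
    body = {"sourceName": source_name, "targetEntityId": target_entity_id}
    if replace_with is not None:
        body["replaceWith"] = replace_with  # assumption: respelling for the transcript text
    return body
```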

Rows created through the unified endpoint’s speaker_reassign / entity_tag kinds are withdrawn at:

DELETE /v1/corrections/speaker-edits/{id}
DELETE /v1/corrections/entity-tags/{id}
