
Telegram Attachments

Audience: operators who have already paired their phone with --telegram and want to send files to the fleet or have the fleet send files back. For pairing, usage, and recovery guidance see docs/telegram-remote-steering.md; for the security posture see docs/telegram-threat-model.md; for the developer architecture see docs/telegram-architecture.md.

agents-fleet's Telegram channel supports inbound file uploads (you send a file from your phone, the coordinator can read it) and outbound file delivery (the coordinator chooses to push a file back to every paired chat). Both directions reuse the same AttachmentStore and the same allowlist / auth gate as text steering — pairing requirements do not change.


At a glance

| You want to… | Do this |
| --- | --- |
| Send a screenshot to the fleet | Attach a photo in Telegram; add a caption explaining what to do with it |
| Send a log file or PDF | Attach as a document |
| Send a voice memo | Hold the mic button and record |
| Have the coordinator deliver a generated file back to you | Ask in your prompt; the coordinator decides whether to call send_attachment |

Inbound — sending a file to the bot

  1. From any paired Telegram chat, attach a photo, document, audio, video, voice note, or video note and (optionally) add a caption.

  2. The bot downloads the file (BotTransport.downloadFile) and stores it under ~/.fleet/attachments/<sessionId>/<unique-name> via the AttachmentStore.

  3. Your caption (if any) is echoed into the local CLI transcript as [via Telegram] <caption> exactly like a plain-text message, so the operator at the host sees what was sent.

  4. A second prompt, combining the caption with one bullet per saved file, is synthesized and dispatched to the coordinator:

    text
    take a look at this screenshot
    
    [Operator shared attachment:
     - /home/me/.fleet/attachments/sess-…/screenshot.png (84213 bytes, image/png)]
  5. The coordinator can view the file directly, hand it to a worker, or process it with any tool that accepts a path.
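
The prompt layout from step 4 can be sketched as follows. This is a minimal illustration inferred only from the example above, not the project's code: `SavedAttachment` and `buildAttachmentPrompt` are hypothetical names, and the field names are assumptions.

```typescript
// Hypothetical shape for one stored file; field names are assumptions.
interface SavedAttachment {
  path: string;     // absolute path under ~/.fleet/attachments/<sessionId>/
  bytes: number;    // size of the stored file
  mimeType: string; // MIME type reported by Telegram
}

// Combine the caption (if any) with one bullet per saved file.
// Returns null when nothing was saved, so an empty block is never produced.
function buildAttachmentPrompt(
  caption: string | undefined,
  saved: SavedAttachment[],
): string | null {
  if (saved.length === 0) return null;
  const bullets = saved
    .map((f) => ` - ${f.path} (${f.bytes} bytes, ${f.mimeType})`)
    .join("\n");
  const block = `[Operator shared attachment:\n${bullets}]`;
  const trimmed = (caption || "").trim();
  // Caption first, then a blank line, then the attachment block.
  return trimmed.length > 0 ? `${trimmed}\n\n${block}` : block;
}
```

A message carrying two files would yield two bullets inside the same bracketed block, under the assumption that the real formatter keeps one block per message.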

Filename handling

  • Telegram supplies fileName for documents (and often for audio/video). When present, that name is sanitized and used.
  • Photos, voice notes, and some stripped media arrive without a name. In that case the stored name is <fileUniqueId><ext>, where <ext> is derived from the reported MIME type (image/jpeg → .jpg, audio/ogg → .ogg, application/pdf → .pdf, etc.). Unknown MIME types fall back to .bin.
  • Sanitization (sanitizeAttachmentName in src/bot/attachmentStore.ts): directory separators stripped, leading dots removed, characters outside [a-zA-Z0-9._-] replaced with _, trailing dots/spaces trimmed. Empty names fall back to attachment.bin.
  • Name collisions resolve by appending -1, -2, …, up to -999 to the stem (extension preserved).
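
The naming rules above can be sketched roughly as below. This is not the code in src/bot/attachmentStore.ts: every function name here is hypothetical, the order of the sanitization steps is an assumption, "directory separators stripped" is read as deleting the separator characters, and throwing after -999 is a guess, since the behaviour past that point is not described here.

```typescript
// Only these three MIME/extension pairs are named in the text; a real table is likely larger.
const EXT_BY_MIME: Record<string, string> = {
  "image/jpeg": ".jpg",
  "audio/ogg": ".ogg",
  "application/pdf": ".pdf",
};

function extForMime(mime: string | undefined): string {
  if (mime !== undefined && Object.prototype.hasOwnProperty.call(EXT_BY_MIME, mime)) {
    return EXT_BY_MIME[mime];
  }
  return ".bin"; // unknown or missing MIME type
}

// Sketch of the sanitization rules (step order assumed).
function sanitizeNameSketch(raw: string): string {
  let name = raw.replace(/[\/\\]/g, "");         // strip directory separators
  name = name.replace(/^\.+/, "");               // remove leading dots
  name = name.replace(/[. ]+$/, "");             // trim trailing dots and spaces
  name = name.replace(/[^a-zA-Z0-9._-]/g, "_");  // replace everything else with _
  return name.length > 0 ? name : "attachment.bin";
}

// Use Telegram's fileName when present, else <fileUniqueId><ext>.
function storedName(
  fileName: string | undefined,
  fileUniqueId: string,
  mime: string | undefined,
): string {
  return fileName ? sanitizeNameSketch(fileName) : `${fileUniqueId}${extForMime(mime)}`;
}

// Append -1, -2, … up to -999 to the stem, keeping the extension.
function resolveCollision(name: string, exists: (candidate: string) => boolean): string {
  if (!exists(name)) return name;
  const dot = name.lastIndexOf(".");
  const stem = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : "";
  for (let i = 1; i <= 999; i++) {
    const candidate = `${stem}-${i}${ext}`;
    if (!exists(candidate)) return candidate;
  }
  throw new Error(`no free name for ${name} after 999 attempts`); // assumed behaviour
}
```

Passing `exists` as a callback keeps the sketch independent of the filesystem; the real store presumably checks the session directory directly.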

Failure posture

  • Per-attachment failures (download error, store error) are logged and skipped; other attachments on the same message still proceed.
  • If every attachment fails, the caption is still preserved in the local transcript via the synchronous echo above, but no coordinator prompt is synthesized, so the coordinator never sees an empty [Operator shared attachment: …] block.
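
The skip-and-continue behaviour can be sketched as a simple loop. This is an illustration under assumptions: `Incoming`, `Saved`, `saveAll`, and `shouldPromptCoordinator` are hypothetical names, and the callbacks are synchronous only to keep the control flow easy to follow (the real download path is presumably asynchronous).

```typescript
// Hypothetical stand-ins: one Telegram attachment and one stored-file record.
interface Incoming { fileUniqueId: string }
interface Saved { path: string }

// Try every attachment; a failure on one is logged and skipped,
// and the remaining attachments on the same message still proceed.
function saveAll(
  items: Incoming[],
  saveOne: (item: Incoming) => Saved, // assumed to throw on download or store error
  log: (msg: string) => void,
): Saved[] {
  const saved: Saved[] = [];
  for (const item of items) {
    try {
      saved.push(saveOne(item));
    } catch (err) {
      log(`attachment ${item.fileUniqueId} skipped: ${String(err)}`);
    }
  }
  return saved;
}

// A coordinator prompt is only worth synthesizing when at least one file was saved.
function shouldPromptCoordinator(saved: Saved[]): boolean {
  return saved.length > 0;
}
```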

Outbound — delivering files from the fleet

The coordinator has a send_attachment tool (src/coordinator/tools/sendAttachment.ts) that pushes a file to every paired chat. Workers can produce files (logs, screenshots, generated artifacts) but do not auto-deliver: the coordinator decides whether to forward each one.

Tool surface

jsonc
{
  "name": "send_attachment",
  "parameters": {
    "path": "string (absolute or cwd-relative)",
    "caption": "string (optional)",
    "kind": "auto | document | photo | audio | video | voice (optional, default 'auto')"
  }
}
  • path must point at an existing readable regular file.

  • caption is rendered alongside the file in Telegram (subject to Telegram's caption length cap, 1024 characters in the Bot API).

  • kind=auto picks the Telegram method by extension:

    • .jpg, .jpeg, .png, .gif, .webp → photo
    • .mp3, .wav, .ogg, .m4a, .flac → audio
    • .mp4, .mov, .mkv, .webm, .avi → video
    • anything else → document

    Pass an explicit kind to override (e.g. voice for an OGG/Opus voice note rendered with a waveform).
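
The extension mapping for kind=auto can be sketched as a lookup table. The names `Kind`, `KIND_BY_EXT`, and `resolveKind` are hypothetical, and case-insensitive matching is an assumption; the source does not say how sendAttachment.ts treats upper-case extensions.

```typescript
type Kind = "document" | "photo" | "audio" | "video" | "voice";
type RequestedKind = Kind | "auto";

// Extension table taken from the list above.
const KIND_BY_EXT: Record<string, Kind> = {
  ".jpg": "photo", ".jpeg": "photo", ".png": "photo", ".gif": "photo", ".webp": "photo",
  ".mp3": "audio", ".wav": "audio", ".ogg": "audio", ".m4a": "audio", ".flac": "audio",
  ".mp4": "video", ".mov": "video", ".mkv": "video", ".webm": "video", ".avi": "video",
};

// An explicit kind always wins; "auto" falls back to the extension table.
function resolveKind(path: string, requested: RequestedKind = "auto"): Kind {
  if (requested !== "auto") return requested;
  // Look only at the final path segment so dots in directory names are ignored.
  const base = path.slice(Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\")) + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot >= 0 ? base.slice(dot).toLowerCase() : "";
  return Object.prototype.hasOwnProperty.call(KIND_BY_EXT, ext)
    ? KIND_BY_EXT[ext]
    : "document";
}
```

Note that "voice" never appears in the table: an .ogg file resolves to audio unless the caller asks for voice explicitly.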

Result shape

The tool returns a structured result rather than throwing:

jsonc
{ "ok": true,  "kind": "photo", "path": "/abs/path", "sentCount": 2 }
{ "ok": false, "error": "Cannot read file at /abs/path: ENOENT" }
{ "ok": false, "error": "No attachment channel is attached. …" }

Because failures come back as values rather than exceptions, an outage on one paired chat does not stall the tool call, and a mistyped path surfaces to the model as a recoverable error.
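
One way to model the result shape on the caller's side is a discriminated union on `ok`. This is a sketch, not the tool's published typings: `SendAttachmentResult` and `summarize` are hypothetical names, and only the fields shown in the examples above are assumed to exist.

```typescript
// Union inferred from the example results; the real type may carry more fields.
type SendAttachmentResult =
  | { ok: true; kind: string; path: string; sentCount: number }
  | { ok: false; error: string };

// Turn a result into a one-line summary without ever throwing.
function summarize(result: SendAttachmentResult): string {
  if (result.ok === true) {
    return `sent ${result.path} as ${result.kind} to ${result.sentCount} chat(s)`;
  }
  return `send_attachment failed: ${result.error}`;
}
```

Checking `ok` before touching the other fields lets the compiler enforce that `error` is only read on the failure branch.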

Current wiring status

send_attachment is registered unconditionally in toolHandlers.ts, but the underlying AttachmentSender implementation on TelegramChannel lands in a follow-up PR. Until then the tool reports { ok: false, error: "No attachment channel is attached. …" } in production. The interface contract is stable; once the channel wiring ships no further coordinator-side changes are required.


Worked example

Sending a screenshot for review, getting back an annotated copy:

text
You (Telegram, attaches screenshot.png with caption):
  this dialog renders wrong on the small viewport — what's broken?

Fleet (local CLI transcript echoes):
  [via Telegram] this dialog renders wrong on the small viewport — what's broken?
  [via Telegram] 📎 attachment: screenshot.png → /home/me/.fleet/attachments/sess-abc/screenshot.png

Coordinator (Telegram reply, after viewing the file):
  The flex container is using `align-items: flex-end` which collapses the
  modal under 480px. Patching `src/ui/Dialog.tsx`…

Coordinator (Telegram, later, via send_attachment):
  ↳ patched-dialog.png  «here's the fix rendered at 375×667»

Storage layout

~/.fleet/attachments/
  <sessionId>/                        ← directory mode 0o700
    screenshot.png                    ← file mode 0o600
    voice-AgADBQADxxx.ogg
    log.txt
    log-1.txt                         ← collision suffix
  • ~/.fleet/attachments/ is the root passed to AttachmentStore (src/cli/telegramSetup.ts:368).
  • The sub-directory name comes from the active sessionId (or no-session when one hasn't been minted yet).
  • Files are streamed to a .partial sidecar (writeStream) and renamed into place once the write completes, so partial downloads never appear as completed files.
  • With the default cloud Bot API, Telegram lets a bot download files of up to 20 MB and send files of up to 50 MB (10 MB for photos); agents-fleet does not impose an additional size cap on inbound files.
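
The write-then-rename pattern and the directory and file modes above can be sketched with Node's fs module. This is a simplified illustration, not the AttachmentStore implementation: `saveAtomically` is a hypothetical name, the `<name>.partial` sidecar naming is an assumption, and a single synchronous write replaces the write stream for brevity.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write `data` to <root>/<sessionId or "no-session">/<name> via a .partial sidecar.
function saveAtomically(
  root: string,
  sessionId: string | undefined,
  name: string,
  data: Uint8Array,
): string {
  const dir = path.join(root, sessionId ? sessionId : "no-session");
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); // private session directory
  const finalPath = path.join(dir, name);
  const partialPath = `${finalPath}.partial`;          // sidecar naming is assumed
  try {
    fs.writeFileSync(partialPath, data, { mode: 0o600 });
    // rename is atomic within one filesystem, so readers see either
    // no file or the complete file, never a half-written one.
    fs.renameSync(partialPath, finalPath);
  } catch (err) {
    fs.rmSync(partialPath, { force: true });           // clean up the sidecar
    throw err;
  }
  return finalPath;
}
```

The requested modes are still filtered through the process umask, so a restrictive umask can only tighten them further.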

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Caption echoed locally but no 📎 attachment: line | Download failed (network blip, Telegram quota) | Re-send the file; check AGENTS_FLEET_TG_DEBUG=1 stderr for downloadFile failed |
| Coordinator never reacts to the attachment | Per-attachment store error after download; caption alone reaches the transcript | Check stderr for saveFromStream failed; verify ~/.fleet/attachments/ is writable |
| send_attachment returns "No attachment channel is attached" | The AttachmentSender wiring on TelegramChannel hasn't shipped in your version, or --telegram was never started | Update agents-fleet; confirm at least one chat is paired before calling the tool |
| send_attachment returns "Cannot read file at …" | Path doesn't exist or is not readable by the fleet process | Pass an absolute path that the coordinator process can stat; cwd-relative paths resolve against the fleet's launch directory |
| Stored filename looks like AgADBQADxxx.jpg instead of the original | Telegram did not include fileName (always the case for photos / voice notes) | Cosmetic: the file is intact; rename if you need a friendlier name |

See also