spacecast — how it works & how the sound is recovered

Capturing a live X Space and restreaming it to an X live-video broadcast over RTMP.

The pipeline (what's running during a cast)

A real, logged-in Chromium joins the Space on a burner account (anonymously, so you never show up in the listener list) and renders the speaker/listener grid to a private virtual display (Xvfb :99). ffmpeg screen-grabs that display (x11grab) for video and records the monitor source of a dedicated PulseAudio null sink for audio. It then pushes the combined stream over RTMP to X's Periscope ingest (fr.pscp.tv), which surfaces it as your broadcast.

X Space (live)                          X live broadcast
     │                                        ▲
     ▼                                        │ RTMP push
┌───────────────────┐                  ┌─────────────┐
│ Chromium (headful)│  renders grid →  │   ffmpeg    │
│  joins the Space  │  x11grab :99     │ crop/scale  │
│  (burner acct,    │ ───────────────► │ → libx264   │
│   anonymous)      │                  │ → flv/rtmp  │
└───────────────────┘                  │ fr.pscp.tv  │
     │ audio                           └─────────────┘
     ▼                                        ▲
  PulseAudio null-sink ──── pulse capture ────┘
   (spacecast99)
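As a rough illustration of the pusher box in the diagram, the sketch below assembles one possible ffmpeg argument list that grabs an Xvfb display with x11grab, records a Pulse sink's `.monitor` source, and pushes H.264/AAC in FLV over RTMP. The function name `build_pusher_cmd`, the frame size, the preset and the bitrates are assumptions for illustration, not spacecast's actual settings. The crop/scale step is left out because its geometry is not documented, and the RTMP URL is a placeholder.

```python
import shlex


def build_pusher_cmd(display: str, sink: str, rtmp_url: str,
                     size: str = "1280x720", fps: int = 30) -> list[str]:
    """Hypothetical ffmpeg argv: X display + Pulse sink monitor -> FLV/RTMP.

    Encoder settings are illustrative assumptions; the diagram's crop/scale
    step is omitted because its geometry is not documented.
    """
    return [
        "ffmpeg",
        # Video: screen-grab the session's private Xvfb display (e.g. ":99").
        "-f", "x11grab", "-video_size", size, "-framerate", str(fps),
        "-i", f"{display}.0",
        # Audio: record the monitor source of the session's null sink.
        "-f", "pulse", "-i", f"{sink}.monitor",
        # Low-latency H.264 with a keyframe every 2 s, plus AAC audio.
        "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
        "-pix_fmt", "yuv420p", "-g", str(fps * 2),
        "-c:a", "aac", "-b:a", "128k",
        # RTMP carries an FLV container.
        "-f", "flv", rtmp_url,
    ]


if __name__ == "__main__":
    # Placeholder URL: the real ingest URL and stream key come from X.
    print(shlex.join(build_pusher_cmd(":99", "spacecast99",
                                      "rtmp://example.invalid/x/STREAM_KEY")))
```

Building an argv list instead of a shell string avoids quoting problems when it is handed to `subprocess`.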

Each session owns its own display number and its own audio sink, so concurrent casts never bleed video or audio into one another. That isolation is load-bearing by design: two casts sharing a display or a sink would capture each other's picture or sound.
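One way to express the per-session pairing is sketched below. It is a hypothetical helper, not spacecast's code. It extends the `:99` / `spacecast99` naming to `:N` / `spacecastN` and claims the lowest free number. It also emits the matching `Xvfb` and `pactl load-module module-null-sink` commands, plus an environment for the browser. Routing Chromium through the `PULSE_SINK` environment variable is an assumption about how the browser's audio reaches the sink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Resources owned by one cast (hypothetical helper, not spacecast's code)."""
    number: int

    @property
    def display(self) -> str:
        return f":{self.number}"

    @property
    def sink(self) -> str:
        return f"spacecast{self.number}"

    def setup_cmds(self) -> list[list[str]]:
        return [
            # A private virtual framebuffer for this cast only.
            ["Xvfb", self.display, "-screen", "0", "1280x720x24"],
            # A private null sink; the pusher records "<sink>.monitor".
            ["pactl", "load-module", "module-null-sink",
             f"sink_name={self.sink}"],
        ]

    def browser_env(self) -> dict[str, str]:
        # DISPLAY selects the X server; PULSE_SINK sets the default sink
        # for libpulse clients such as Chromium (assumed routing method).
        return {"DISPLAY": self.display, "PULSE_SINK": self.sink}


def allocate(in_use: set[int], base: int = 99) -> Session:
    """Claim the lowest free number >= base so no two casts share one."""
    n = base
    while n in in_use:
        n += 1
    in_use.add(n)
    return Session(n)
```

Because the display and the sink are derived from one number, a cast cannot end up with its display from one session and its sink from another.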

The audio problem (why a camera Space is hard)

X's web client treats listeners differently depending on the Space type:

Space type	What a plain listener gets	Result
Audio Space	Live audio just plays in the browser → lands in the sink directly.	Easy — works out of the box.
Camera / video Space	Audio is withheld from listeners over WebRTC; only speakers get an audio track. The page's `<video>` element sits at `readyState` 1 (HAVE_METADATA: metadata only, no media data). It is not muted; it is starved.	Silent video unless rescued.

So on a camera Space, calling play(), setting volume = 1 or unmuting does nothing: there is no data to play, so the sink stays at silence (≈ −91 dB). Capturing the browser alone would give you silent video.
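To make "the sink sits at silence" concrete, the sketch below computes the RMS level of signed 16-bit PCM in dBFS and clamps digital silence to −91 dB so that it matches the figure above. The samples could come from recording `spacecast99.monitor`. ffmpeg's `volumedetect` filter reports a comparable `mean_volume`. The −60 dB silence threshold and the function names are assumptions, not values taken from spacecast.

```python
import math
from array import array


def mean_volume_db(samples: array, floor_db: float = -91.0) -> float:
    """RMS level of signed 16-bit PCM in dBFS (0 dB = full scale).

    Digital silence (all zeros) is clamped to floor_db instead of -inf.
    """
    if len(samples) == 0:
        return floor_db
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return floor_db
    db = 10 * math.log10(mean_sq / (32768.0 ** 2))
    return max(db, floor_db)


def sink_is_silent(samples: array, threshold_db: float = -60.0) -> bool:
    # Assumed cut-off: anything quieter than -60 dBFS counts as silence.
    return mean_volume_db(samples) < threshold_db
```

A square wave at one tenth of full scale measures about −20 dB, which is the level the guardian expects once real speech arrives.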

How the sound is recovered — the "audio guardian"

A second, parallel audio path runs alongside the browser and kicks in only when needed:

Sniff the media_key. join-space.js reads it out of the Space's AudioSpaceById API response.
Sample the sink. If audio is already present (normal Space) the guardian does nothing — "browser audio present, no HLS feed needed."
If SILENT, ask X's own API. GET /i/api/1.1/live_video_stream/status/<media_key> (authed with the burner cookies + the public web bearer) returns an HLS playlist (master_dynamic_playlist.m3u8?type=live) — the same live audio, but over HLS instead of the gated WebRTC path.
Verify a real child rendition. The master often points to child playlists that return 404; when X never built the transcode ladder, the Space genuinely cannot be cast with audio. The guardian probes each child and uses only one that returns HTTP 200 and is actually advancing.
Feed it into the sink. A separate ffmpeg pulls that HLS audio into the same spacecast99 sink with -map 0:a:0, which pins a single audio rendition: the master lists three, and pulling all of them produces overlapping voices. The main pusher now captures real speech.
Self-heal. The signed HLS URL expires mid-Space, so on re-detected silence the guardian re-fetches a fresh URL and backs off if none is reachable.
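The verify, feed and self-heal steps above could look roughly like the sketch below. `fetch` is an injected callable that returns `(status, body)`. In a real guardian it would send the burner cookies and the bearer token, which is not shown here. The parser only handles child playlists written as plain URI lines, not `URI=` attributes on `#EXT-X-MEDIA` tags. "Advancing" is judged by `#EXT-X-MEDIA-SEQUENCE` increasing between two fetches. The 4-second wait, the backoff numbers and the stream name `spacecast-hls` are assumptions. The feeder uses ffmpeg's `pulse` output device with `-device` to select the sink.

```python
import time
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

# fetch(url) -> (HTTP status, body text); auth handling is left to the caller.
Fetch = Callable[[str], tuple[int, str]]


def child_playlists(master_url: str, master_body: str) -> list[str]:
    """Plain URI lines of an HLS master playlist, resolved against its URL."""
    return [urljoin(master_url, line.strip())
            for line in master_body.splitlines()
            if line.strip() and not line.startswith("#")]


def media_sequence(body: str) -> Optional[int]:
    for line in body.splitlines():
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            return int(line.split(":", 1)[1])
    return None


def pick_live_child(master_url: str, master_body: str, fetch: Fetch,
                    wait_s: float = 4.0) -> Optional[str]:
    """First child that answers 200 twice with an advancing media sequence."""
    for url in child_playlists(master_url, master_body):
        status, first = fetch(url)
        if status != 200:
            continue  # missing rung of the transcode ladder (e.g. 404)
        time.sleep(wait_s)
        status, second = fetch(url)
        a, b = media_sequence(first), media_sequence(second)
        if status == 200 and a is not None and b is not None and b > a:
            return url
    return None  # nothing live: no castable audio right now


def feeder_cmd(hls_url: str, sink: str) -> list[str]:
    # -map 0:a:0 pins exactly one audio stream so voices never overlap.
    return ["ffmpeg", "-i", hls_url, "-map", "0:a:0",
            "-f", "pulse", "-device", sink, "spacecast-hls"]


def backoff_delays(base: float = 5.0, cap: float = 120.0) -> Iterator[float]:
    """Exponential delays between attempts to re-fetch an expired URL."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)
```

Injecting `fetch` keeps the selection logic testable without touching X's API. It also keeps cookie handling out of the probing code.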

The tell-tale signs it worked: the sink jumps from silence to roughly −20 dB mean, and the pusher ffmpeg bitrate climbs to ~1.3–1.6 Mbit/s. That bitrate is the signature of real audio — a silent cast idles around ~150 kbit/s.
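The bitrate signal can be read straight from the pusher's progress output. ffmpeg writes a status line such as `... time=00:00:30.00 bitrate=1398.1kbits/s speed=1x` to stderr, usually refreshed with carriage returns. The sketch below extracts the number and applies a cut-off. The 600 kbit/s threshold is an assumed midpoint between the ~150 kbit/s and ~1.3–1.6 Mbit/s figures above, not a measured value.

```python
import re
from typing import Optional

# Matches e.g. "bitrate=1398.1kbits/s" or "bitrate= 152.0kbits/s".
_BITRATE = re.compile(r"bitrate=\s*([\d.]+)kbits/s")


def parse_bitrate_kbps(status_line: str) -> Optional[float]:
    """Bitrate in kbit/s from an ffmpeg progress line, or None (e.g. "N/A")."""
    m = _BITRATE.search(status_line)
    return float(m.group(1)) if m else None


def cast_has_audio(status_line: str,
                   threshold_kbps: float = 600.0) -> Optional[bool]:
    """Heuristic: ~150 kbit/s means silence, ~1300-1600 kbit/s means speech."""
    kbps = parse_bitrate_kbps(status_line)
    return None if kbps is None else kbps >= threshold_kbps
```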

The hard limit: if a camera Space's HLS ladder returns 404 on every host, audio is gated on both the WebRTC and HLS paths, and an anonymous listener genuinely cannot cast it with audio. That is an X platform limitation, not a bug; a Space that works simply happens to have its ladder built.