HLS Download Audio and Video Are Separate? Here's How to Fix It

Why modern HLS streams ship audio and video as separate manifests, and three ways to download them as a single playable MP4 in 2026.

Published on May 6, 2026

You hit “download” on an HLS stream and got two files instead of one: video.mp4 with no sound, audio.mp4 with no picture. Or worse, you got just the silent video and never realized the audio existed in a different m3u8 entirely. This is not a bug in your downloader. It is how modern HLS works, and many Chrome HLS extensions on the market today fail the same way.

This post explains exactly why modern HLS separates audio from video, what the manifest actually looks like, and three practical ways to merge them back into a playable MP4 — from a one-line FFmpeg command for technical users to a one-click solution for everyone else.

What modern HLS actually looks like

In the early days of HLS (Apple’s original 2009 draft, eventually published as RFC 8216 in 2017), a typical master playlist pointed at a list of media playlists, each containing .ts segments that already had audio and video muxed together:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1920x1080
1080p.m3u8

Each linked m3u8 contained segments like seg001.ts that you could concatenate and play directly. One file, one stream, one download.

That is not what most production HLS streams look like today. A modern master playlist looks more like this:

#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",NAME="English",DEFAULT=YES,URI="audio/eng/playlist.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",NAME="Spanish",URI="audio/spa/playlist.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",URI="subtitles/eng.m3u8"

#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2",AUDIO="aac",SUBTITLES="subs"
video/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000,RESOLUTION=1920x1080,CODECS="avc1.4d4028,mp4a.40.2",AUDIO="aac",SUBTITLES="subs"
video/1080p/playlist.m3u8

The video manifest now contains only video segments — no audio at all. Audio lives in a completely separate m3u8 referenced by the EXT-X-MEDIA tag with TYPE=AUDIO. The player is responsible for downloading both in parallel and rendering them in sync.
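To make the pairing concrete, here is a minimal Python sketch of what a player (or downloader) does with a master playlist: collect the TYPE=AUDIO renditions by GROUP-ID, then look up the group each video variant names in its AUDIO attribute. The function names and the sample playlist are our own illustration, not part of any library, and the parser ignores most of the tags a real one has to handle.

```python
import re

# KEY=value pairs; a value is either a quoted string or a bare token.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attrs(attr_text):
    """Turn 'A=1,B="x,y"' into {'A': '1', 'B': 'x,y'}."""
    return {key: value.strip('"') for key, value in _ATTR_RE.findall(attr_text)}


def parse_master(master_text):
    """Return (variants, audio_groups) from a master playlist."""
    audio_groups = {}  # GROUP-ID -> list of audio rendition attribute dicts
    variants = []      # one attribute dict (plus URI) per EXT-X-STREAM-INF
    pending = None     # STREAM-INF attributes waiting for their URI line
    for line in master_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA:"):
            attrs = parse_attrs(line.split(":", 1)[1])
            if attrs.get("TYPE") == "AUDIO":
                audio_groups.setdefault(attrs["GROUP-ID"], []).append(attrs)
        elif line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attrs(line.split(":", 1)[1])
        elif line and not line.startswith("#") and pending is not None:
            pending["URI"] = line  # the line after STREAM-INF is the variant URI
            variants.append(pending)
            pending = None
    return variants, audio_groups


def default_audio_uri(variant, audio_groups):
    """Pick the DEFAULT=YES rendition in the variant's audio group, else the first."""
    group = audio_groups.get(variant.get("AUDIO"), [])
    if not group:
        return None  # no separate audio group is referenced by this variant
    for rendition in group:
        if rendition.get("DEFAULT") == "YES":
            return rendition.get("URI")
    return group[0].get("URI")


# Hypothetical sample, shaped like the modern playlist shown above.
SAMPLE = """#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",NAME="English",DEFAULT=YES,URI="audio/eng/playlist.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",NAME="Spanish",URI="audio/spa/playlist.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2560000,CODECS="avc1.4d401f,mp4a.40.2",AUDIO="aac"
video/playlist.m3u8
"""

variants, audio_groups = parse_master(SAMPLE)
for variant in variants:
    print(variant["URI"], "+", default_audio_uri(variant, audio_groups))
```

A downloader that only reads the EXT-X-STREAM-INF lines and skips the EXT-X-MEDIA lookup ends up with exactly the silent video described at the top of this post.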

Why most modern streams do this

The split is not arbitrary. There are four concrete reasons the major streaming platforms moved to it:

1. Multi-audio-track support

A film with English, Spanish, French, German, and Japanese audio tracks would need 5× the storage and 5× the encoding cost if audio were muxed into the video. Splitting audio into its own manifest lets the player switch tracks without re-downloading video.

2. Adaptive bitrate efficiency

The same audio track can pair with 240p, 480p, 720p, 1080p, and 4K video variants. Without separation, every quality level needs its own audio copy. With separation, one audio manifest serves all video qualities.
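The arithmetic behind reasons 1 and 2 is easy to check. The sketch below assumes the worst case for muxing, where every language has to be packaged with every video quality. The function name is ours, and the numbers are the five languages and five qualities used as examples above.

```python
def rendition_counts(video_qualities, audio_languages):
    """Return (muxed, split) counts of renditions the origin must encode and store."""
    muxed = video_qualities * audio_languages  # every quality paired with every language
    split = video_qualities + audio_languages  # video and audio packaged independently
    return muxed, split


muxed, split = rendition_counts(video_qualities=5, audio_languages=5)
print(muxed, split)  # 25 10
```

Adding a sixth language costs five more muxed renditions but only one more audio playlist when the streams are split.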

3. Encrypted-stream key management

Encrypted HLS typically uses AES-128 or SAMPLE-AES. Each media playlist carries its own EXT-X-KEY tags, so separating audio and video lets each have its own keys and key rotation schedule. This matters for live streams, where AES-128 keys may rotate as often as every 60 seconds for DRM-adjacent (but not full DRM) protection.

4. CDN cache efficiency

Audio segments are tiny (~20-50 KB) and rarely change. Video segments are large (~1-5 MB) and dominate bandwidth. Splitting them lets CDNs cache audio aggressively and stream video on a different cache strategy.

This is good engineering. It is also what makes downloading harder.

How to merge them back: three approaches

Approach 1: FFmpeg one-liner (technical users)

If you can read the master playlist and identify the right audio + video URIs, FFmpeg merges them in a single pass without re-encoding:

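ffmpeg \
  -i "https://example.com/video/1080p/playlist.m3u8" \
  -i "https://example.com/audio/eng/playlist.m3u8" \
  -map 0:v:0 -map 1:a:0 \
  -c copy \
  -bsf:a aac_adtstoasc \
  output.mp4

The flags:

-map 0:v:0 -map 1:a:0: take the video from the first input and the audio from the second, rather than relying on FFmpeg's automatic stream selection
-c copy: stream-copy, no re-encode, lossless and fast
-bsf:a aac_adtstoasc: converts ADTS-framed AAC (the form carried in MPEG-TS segments) into the form the MP4 container expects. Recent FFmpeg builds insert this filter automatically; older builds stop with a "Malformed AAC bitstream" error without it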

To find the URLs, open Chrome DevTools, go to the Network tab, filter by m3u8, and play the video. You will see the master playlist load first, then the video media playlist, then the audio media playlist. Right-click each, copy the URL, paste into the FFmpeg command above.
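One detail if you read the URIs out of the master playlist rather than copying them from the Network tab: they are usually relative, and HLS resolves relative URIs against the URL of the playlist that contains them. A short sketch with Python's standard urljoin follows. The example.com URLs are placeholders, and the function name is ours.

```python
from urllib.parse import urljoin


def resolve_playlist_uri(master_url, uri):
    """Resolve a URI found inside a playlist against that playlist's own URL."""
    # urljoin leaves absolute URIs unchanged and drops the base URL's query string,
    # so a signed token on the master URL is NOT carried over automatically.
    return urljoin(master_url, uri)


master = "https://example.com/stream/master.m3u8?token=abc"
print(resolve_playlist_uri(master, "video/1080p/playlist.m3u8"))
print(resolve_playlist_uri(master, "/audio/eng/playlist.m3u8"))
print(resolve_playlist_uri(master, "https://cdn.example.net/a.m3u8"))
```

The three calls cover the cases you will meet in practice: a path relative to the master playlist, a path relative to the host root, and a URI that is already absolute.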

This works. It is also tedious when you do it more than once a week.

Approach 2: yt-dlp / streamlink (semi-technical users)

Both yt-dlp and streamlink can ingest a master playlist and automatically resolve the audio/video pairing:

yt-dlp -f "bv*+ba/b" "https://example.com/master.m3u8"

The -f "bv*+ba/b" format selector means “best video plus best audio, falling back to the best single stream if no separate pair is available.” This handles the common case but struggles with:

Streams that require an authenticated session, unless you hand the tool your browser's cookies (yt-dlp has --cookies-from-browser for this)
Streams behind Cloudflare or token-rotation walls
Live streams (yt-dlp does not record live HLS reliably; streamlink is the better fit of the two)

For VOD with public access this is the easiest CLI path. For anything inside a logged-in session, see Approach 3.

Approach 3: A browser extension that handles it (everyone else)

The fundamental problem with the first two approaches is that the m3u8 URL alone is not enough. Modern streams use:

Per-session signed URLs that expire in 30-120 seconds
Cookies that the originating browser session has but a CLI tool does not
Custom request headers (Origin, Referer) the player sets that anti-bot systems verify
TLS fingerprints that match a real browser, not an HTTP library such as Python's requests
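To see what a CLI tool has to reproduce by hand, here is a rough sketch using only Python's standard library. Every value in it (the URL, the Referer, the cookie string, the user agent) is a placeholder you would have to copy out of your own browser session, and the function name is ours. Note what it cannot do: it does not change the TLS fingerprint, and it does not stop a signed URL from expiring.

```python
import urllib.request


def build_manifest_request(url, referer, origin, cookie_header, user_agent):
    """Build (but do not send) a request that mimics the player's headers."""
    req = urllib.request.Request(url)
    req.add_header("Referer", referer)       # page that embeds the player
    req.add_header("Origin", origin)         # scheme + host of that page
    req.add_header("Cookie", cookie_header)  # session cookies copied from the browser
    req.add_header("User-Agent", user_agent)
    return req


# Placeholder values for illustration only.
req = build_manifest_request(
    url="https://example.com/stream/master.m3u8",
    referer="https://example.com/watch/123",
    origin="https://example.com",
    cookie_header="session=PLACEHOLDER",
    user_agent="Mozilla/5.0 (placeholder)",
)
print(req.full_url)
print(sorted(name for name, _ in req.header_items()))
```

Sending it would be a call to urllib.request.urlopen(req). Even with all four headers in place, the first and last items in the list above are outside what this request can control.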

A browser extension lives inside the same session as the video player. It sees the same cookies, sends the same headers, and inherits the same TLS fingerprint. It also sees both the audio manifest and the video manifest as the page loads them, so it can pair them automatically without you copying URLs anywhere.

This is what we built Video Downloader One-for-All to do. Specifically:

The extension monitors the network for HLS manifest requests as you browse
When it sees a master playlist, it parses it and notes the audio + video manifest URIs
When you click download, it fetches both in parallel using your existing browser session
It runs FFmpeg.wasm (a WebAssembly build of FFmpeg) inside your browser to mux the streams into a single MP4
The merged file lands in your downloads folder, no separate files

No DevTools, no command line, no expired URLs. The whole pipeline runs in your tab, so no part of your video leaves your machine.

Why most other extensions fail at this

If you read recent 1-star reviews of HLS extensions on the Chrome Web Store, the audio-video split problem appears repeatedly:

“Splits into audio and video streams and constantly opens new recordings ad infinitum.”

“It’s not recording audio and is constantly creating new recordings.”

“I'm willing to pay for this. I hope the author updates it soon, otherwise the video and audio tracks are always separate.” (translated from Chinese)

These quotes come from Stream Recorder, the most-installed HLS extension on the Chrome Web Store with over 1 million users. Stream Recorder has not received an update since 2025-08-01. Modern HLS streams break it because its manifest parser predates the audio/video separation pattern. We covered this in detail in Stream Recorder Not Working in 2026? Best HLS/m3u8 Alternative.

The other top-installed HLS extensions (Live Stream Downloader, FetchV, Video Downloader Plus) handle separation with mixed success — some pair them automatically, some only when you select a specific quality, some not at all.

How to verify your downloader handles separation correctly

Pick a known-split HLS source — most live broadcast platforms work as a test. Then:

Start a download with the extension you want to test
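When it finishes, check the file size. A 5-minute 1080p video with audio should be 30-80 MB, of which the audio is only a few MB (about 5 MB at 128 kbps). If you see two files, one large and one small, or a single file that turns out to be silent, separation is broken.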
Open the file in VLC or any media player. If the audio is missing or out of sync, the merge step failed.
If you got a single playable file with audio matching the lip movement, separation is handled correctly.
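File size is only a hint. A more direct check is to ask ffprobe which stream types the file contains. The sketch below assumes ffprobe (it ships with FFmpeg) is on your PATH. The function names are ours, and the JSON parsing is kept separate so it can be exercised without running ffprobe.

```python
import json
import subprocess


def stream_types(ffprobe_json):
    """Return the sorted set of codec types listed in ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return sorted({s["codec_type"] for s in data.get("streams", []) if "codec_type" in s})


def has_audio_and_video(ffprobe_json):
    types = stream_types(ffprobe_json)
    return "audio" in types and "video" in types


def probe(path):
    """Run ffprobe on a file and return its JSON stream listing as text."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "stream=codec_type",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example with canned output, so no media file is needed to try the parser:
silent = '{"streams": [{"codec_type": "video"}]}'
merged = '{"streams": [{"codec_type": "video"}, {"codec_type": "audio"}]}'
print(has_audio_and_video(silent), has_audio_and_video(merged))  # False True
```

On a real download you would call has_audio_and_video(probe("output.mp4")). This confirms the audio track exists, but it does not tell you whether it is in sync, so the playback check still matters.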

You can do the same test with our extension: install it, find a stream you know used to fail, and check whether the resulting file plays end-to-end with audio.

What about live streams?

For live HLS, separation matters even more because:

The audio manifest updates independently from the video manifest (every 6-10 seconds typically)
Recording must keep both buffers in sync without drift
Pause/resume needs to seek both streams together

Our extension handles this; see the Live Stream Recorder page for the live-specific feature surface. The basic FFmpeg command above also works for live HLS with -c copy left in place: FFmpeg keeps reloading both playlists and appending new segments until the stream ends or you stop it. The URL-rotation problem is real, though, and you will likely lose segments at the boundaries.
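The bookkeeping a live recorder needs is easier to see in code. Each reload of a live media playlist repeats segments you already have, so a recorder has to track sequence numbers: the first segment in the playlist has the number given by EXT-X-MEDIA-SEQUENCE (0 if the tag is absent), and each following segment is one higher. The sketch below is our own simplified illustration. It ignores discontinuities, byte ranges, and EXT-X-MAP, and a real recorder would run it separately for the audio playlist and the video playlist, each with its own counter.

```python
def new_segments(playlist_text, last_seen_seq):
    """Return [(sequence_number, uri)] for segments newer than last_seen_seq."""
    media_seq = 0  # default when EXT-X-MEDIA-SEQUENCE is absent
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            media_seq = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            uris.append(line)  # non-comment lines in a media playlist are segment URIs
    numbered = [(media_seq + i, uri) for i, uri in enumerate(uris)]
    return [(seq, uri) for seq, uri in numbered if seq > last_seen_seq]


# Hypothetical reload of a live audio playlist; segment 100 was already saved.
RELOAD = """#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.0,
seg100.aac
#EXTINF:6.0,
seg101.aac
#EXTINF:6.0,
seg102.aac
"""

print(new_segments(RELOAD, last_seen_seq=100))
```

If the first sequence number in a reload is already more than one past the last one you saved, segments have rolled off the playlist before you fetched them. That gap is the lost-segment case mentioned above.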

When the merge produces out-of-sync audio

Sometimes the merge succeeds technically but audio drifts a few hundred milliseconds out of sync with video. This usually means:

The audio and video manifests have different start times — the player handles this with timestamp alignment, but a naive concatenation does not
One stream uses MPEG-TS timestamps, the other uses fragmented MP4 timestamps — they need normalization

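If this happens with the FFmpeg approach, add -itsoffset with a small positive or negative value in seconds (for example -itsoffset -0.2) immediately before the -i of the stream you want to shift, then re-mux. A positive value delays that stream and a negative value advances it. Our extension auto-corrects up to ~500 ms of timestamp skew before muxing. If you see drift beyond that, send us the source URL and we will check it.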
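Because -itsoffset is an input option, its position on the command line matters: it applies to the next -i that follows it. Here is a small sketch that assembles the argument list with the offset placed in front of the audio input. The function name and the example URLs are ours, and the offset value is something you would find by trial, not a recommended default.

```python
def build_mux_command(video_url, audio_url, output, audio_offset=0.0):
    """Build an ffmpeg argument list that stream-copies video + audio into one file."""
    cmd = ["ffmpeg", "-i", video_url]
    if audio_offset:
        # Input option: must come before the -i it applies to (the audio input here).
        cmd += ["-itsoffset", f"{audio_offset:.3f}"]
    cmd += ["-i", audio_url,
            "-map", "0:v:0", "-map", "1:a:0",  # video from input 0, audio from input 1
            "-c", "copy", output]
    return cmd


cmd = build_mux_command(
    "https://example.com/video/1080p/playlist.m3u8",
    "https://example.com/audio/eng/playlist.m3u8",
    "output.mp4",
    audio_offset=-0.2,  # advance the audio by 200 ms
)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) runs the mux. Building it as a list rather than a shell string also avoids quoting problems with the long signed URLs these streams tend to use.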

Common questions

Will separated audio and video be a problem for sites where it used to work?

Some streaming platforms still ship muxed segments, primarily older systems and small-scale CDNs. For those, any HLS downloader produces one file because there is only one manifest. But the major platforms (broadcast networks, video-hosting services, video-on-demand portals) have largely moved to separation, most of them between 2022 and 2025.

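Why doesn’t the browser’s native “Save Video As” work?

The <video> tag’s source is either the master m3u8 itself or a blob: URL that the player’s JavaScript feeds segments into. In the first case, “Save As” tries to download the m3u8 as a text file, not the resolved video, so you get a useless 2 KB text file. In the second, there is no file behind the URL to save. Either way, nothing parses the manifest and downloads the segments for you.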

Can I just record the screen instead?

You can, but: (1) you re-encode video, losing quality; (2) you re-record audio through your OS audio loopback, often at 48 kHz when the source was 44.1 kHz; (3) you cannot capture full-resolution video unless your screen is at the source resolution. Direct HLS download preserves the original codec, resolution, and audio quality bit-for-bit.

Is the FFmpeg.wasm approach as fast as native FFmpeg?

For mux operations (no re-encode), WebAssembly FFmpeg runs at roughly 80-90% of the speed of native FFmpeg. For a 30-minute video this means muxing finishes in ~5 seconds in the browser vs ~4 seconds at the command line, which is well within the noise. Re-encoding is noticeably slower in WebAssembly, but mux operations don’t re-encode.

Bottom line

The audio-video split in modern HLS is a permanent change. Streams will continue to separate audio from video for the next decade because the alternatives are wasteful (storage, encoding, CDN cache).

Three ways to handle it:

FFmpeg command line — works but tedious for repeated use
yt-dlp / streamlink — works for public VOD, breaks on auth and live
Browser extension that auto-pairs and merges — works for everything in a logged-in session

If you are in category 3, install Video Downloader One-for-All. The audio-video split case is the specific reason the extension exists.