Hardens the WebVTT partial parser with explicit error handling, type validation, and defensive checks, so that malformed input fails predictably instead of crashing mid-parse. `_MatchParser` and `parse_fragment` now validate their input and accept only strings or bytes. Timestamp parsing raises a clear error when the pattern does not match, and regex results are checked before use to avoid `NoneType` attribute errors. The `.decode()` step in `parse_fragment` uses a safe fallback so that invalid byte sequences are handled gracefully.
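The checks described above might look roughly like the following sketch. It is not the actual patch: `coerce_fragment`, `parse_timestamp_ms` and the `_TIMESTAMP` pattern are hypothetical names, and the use of `errors='replace'` as the "safe fallback" is an assumption.

```python
import re

# Hypothetical pattern: optional hours, then MM:SS.mmm
_TIMESTAMP = re.compile(r'(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})')


def coerce_fragment(frag_content):
    """Return text for str/bytes input; reject every other type."""
    if isinstance(frag_content, str):
        return frag_content
    if isinstance(frag_content, (bytes, bytearray)):
        # Assumed fallback: undecodable bytes become U+FFFD instead of raising
        return bytes(frag_content).decode('utf-8', errors='replace')
    raise TypeError(
        f'expected str or bytes, got {type(frag_content).__name__}')


def parse_timestamp_ms(text):
    """Parse a WebVTT-style timestamp into milliseconds."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        # Guard the match object before touching .groups()
        raise ValueError(f'invalid timestamp: {text!r}')
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3_600_000 + int(minutes) * 60_000
            + int(seconds) * 1_000 + int(millis))
```

Raising `TypeError` and `ValueError` at the boundary keeps failures close to the bad input rather than surfacing later as an `AttributeError` on `None`.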
* [youtube] Fix `--youtube-skip-dash-manifest`
* [build] Use `$()` in `Makefile`. Closes #3684
* Fix bug in 385ffb467b
* Fix bug in 43d7f5a5d0
* [cleanup] Remove unnecessary `utf-8` from `str.encode`/`bytes.decode`
* [utils] LazyList: Expose unnecessarily "protected" attributes
and other minor cleanup
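
The `utf-8` cleanup listed above relies on UTF-8 being the default codec for `str.encode` and `bytes.decode` in Python 3, so the explicit argument is redundant. A minimal check, with illustrative variable names:

```python
text = 'naïve café'

# The explicit and the default codec produce identical bytes
explicit = text.encode('utf-8')
raw = text.encode()
same_bytes = explicit == raw

# Decoding behaves the same way
round_trip = raw.decode()
same_text = raw.decode('utf-8') == round_trip
```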
Using https://github.com/asottile/pyupgrade
1. `__future__` imports and `coding: utf-8` were removed
2. Files were rewritten with `pyupgrade --py36-plus --keep-percent-format`
3. f-strings were cherry-picked from `pyupgrade --py36-plus`
Extractors are left untouched (except removing header) to avoid unnecessary merge conflicts
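
To illustrate steps 2 and 3: `--keep-percent-format` leaves `%`-style formatting alone, and only selected call sites were converted to f-strings by hand. Both forms produce the same string; the function names below are hypothetical and not taken from the codebase.

```python
def describe_percent(name, count):
    # Left untouched by `pyupgrade --py36-plus --keep-percent-format`
    return '%s: %d cues' % (name, count)


def describe_fstring(name, count):
    # The equivalent f-string form that plain `--py36-plus` may produce
    return f'{name}: {count} cues'
```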
Fixes: https://github.com/yt-dlp/yt-dlp/issues/631#issuecomment-893338552
The previous deduplication algorithm only removed duplicate cues with
identical text, styles and timestamps. This change also merges
‘daisy chains’: sequences of cues with identical text and styles
in which the ending timestamp of one cue equals the starting
timestamp of the next.
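
A minimal sketch of the merging idea, not the code from this change: the `Cue` class and `merge_daisy_chains` are hypothetical, and the input is assumed to be sorted by start time.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str
    settings: str = ''


def merge_daisy_chains(cues):
    """Drop exact duplicates and merge back-to-back cues with equal content."""
    merged = []
    latest = {}  # (text, settings) -> most recent emitted cue with that content
    for cue in cues:
        key = (cue.text, cue.settings)
        prev = latest.get(key)
        if prev is not None:
            if (cue.start, cue.end) == (prev.start, prev.end):
                continue  # exact duplicate
            if cue.start == prev.end:
                prev.end = cue.end  # extend the chain
                continue
        copy = Cue(cue.start, cue.end, cue.text, cue.settings)
        merged.append(copy)
        latest[key] = copy
    return merged
```

For example, three consecutive cues reading "Hi" at 0-1000, 1000-2000 and 2000-3000 ms collapse into a single cue spanning 0-3000 ms, while a later "Hi" at 5000-6000 ms stays separate because it does not touch the chain.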
This deduplication algorithm has the somewhat unfortunate side effect
that NOTE blocks between cues, if found, will be emitted in a different
order relative to their original cues. This may be unwanted if perfect
fidelity is desired, but then so is daisy-chain deduplication itself.
NOTE blocks ought to be ignored by WebVTT players in any case.
Authored by: fstirlitz