mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-02-23 17:05:58 +00:00

Compare commits


43 Commits

Author SHA1 Message Date
github-actions[bot]
e2a9cc7d13 Release 2026.02.21
Created by: bashonly

:ci skip all
2026-02-21 20:22:26 +00:00
Simon Sawicki
646bb31f39 [cleanup] Misc
Authored by: Grub4K
2026-02-21 21:07:56 +01:00
Simon Sawicki
1fbbe29b99 [ie] Limit netrc_machine parameter to shell-safe characters
Also adapts some extractor regexes to adhere to this limitation

See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm

Authored by: Grub4K
2026-02-21 21:07:36 +01:00
bashonly
c105461647 [ie/youtube] Update ejs to 0.5.0 (#16031)
Authored by: bashonly
2026-02-21 20:05:38 +00:00
bashonly
1d1358d09f [ie] Add browser impersonation support to more extractors (#16029)
Closes #7001, Closes #7444, Closes #16004
Authored by: bashonly
2026-02-21 19:24:05 +00:00
blauerdorf
1fe0bf23aa [ie/spankbang] Fix playlist title extraction (#14132)
Closes #14131
Authored by: blauerdorf
2026-02-21 18:57:20 +00:00
blauerdorf
f05e1cd1f1 [ie/spankbang] Support browser impersonation (#14130)
Closes #14129
Authored by: blauerdorf
2026-02-21 18:51:52 +00:00
bashonly
46d5b6f2b7 [ie/learningonscreen] Fix extractor (#16028)
Closes #15934
Authored by: bashonly, 0xvd
2026-02-21 18:27:33 +00:00
LordMZTE
166356d1a1 [ie/opencast] Support oc-p.uni-jena.de URLs (#16026)
Closes #16023
Authored by: LordMZTE
2026-02-21 18:01:34 +00:00
Sipherdrakon
2485653859 [ie/aenetworks] Fix extractor (#14959)
Closes #14578
Authored by: Sipherdrakon
2026-02-21 17:46:59 +00:00
bashonly
f532a91cef [ie/soundcloud] Support browser impersonation (#16020)
Closes #15660
Authored by: bashonly
2026-02-21 14:50:22 +00:00
bashonly
81bdea03f3 [ie/soundcloud] Fix client ID extraction (#16019)
Authored by: bashonly
2026-02-21 00:21:29 +00:00
bashonly
e74076141d [rh:curl_cffi] Deprioritize unreliable impersonate targets (#16018)
Closes #16012
Authored by: bashonly
2026-02-20 23:48:16 +00:00
Parker Wahle
97f03660f5 [ie/SaucePlusChannel] Add extractor (#15830)
Closes #14985
Authored by: regulad
2026-02-20 00:07:48 +00:00
bashonly
772559e3db [ie/tele5] Fix extractor (#16005)
Closes #16003
Authored by: bashonly
2026-02-19 23:53:53 +00:00
Achraf
c7945800e4 [ie/youtube:search:date] Remove broken ytsearchdate support (#15959)
Closes #15898
Authored by: stastix
2026-02-19 23:18:02 +00:00
bashonly
e2444584a3 [ie/facebook:ads] Fix extractor (#16002)
Closes #16000
Authored by: bashonly
2026-02-19 23:08:08 +00:00
bashonly
acfc00a955 [ie/vk] Solve JS challenges using native JS interpreter (#15992)
Closes #12970
Authored by: bashonly, 0xvd
2026-02-19 15:14:37 +00:00
bashonly
224fe478b0 [ie/dailymotion] Fix extraction (#15995)
Fix 2b61a2a4b2

Authored by: bashonly
2026-02-19 15:11:23 +00:00
bashonly
77221098fc [ie/twitter] Fix error handling again (#15999)
Fix 0d8898c3f4

Closes #15998
Authored by: bashonly
2026-02-19 15:03:07 +00:00
CanOfSocks
319a2bda83 [ie/youtube] Extract live adaptive incomplete formats (#15937)
Closes #10148
Authored by: CanOfSocks, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2026-02-18 23:52:13 +00:00
bashonly
2204cee6d8 [ie/youtube] Add more known player JS variants (#15975)
Authored by: bashonly
2026-02-18 20:23:00 +00:00
bashonly
071ad7dfa0 [ie/odnoklassniki] Fix inefficient regular expression (#15974)
Closes #15958
Authored by: bashonly
2026-02-18 20:03:24 +00:00
bashonly
0d8898c3f4 [ie/twitter] Fix error handling (#15993)
Closes #15963
Authored by: bashonly
2026-02-18 19:55:48 +00:00
bashonly
d108ca10b9 [jsinterp] Support string concatenation with + and += (#15990)
Authored by: bashonly
2026-02-17 23:46:20 +00:00
bashonly
c9c8651975 [jsinterp] Stringify bracket notation keys in object access (#15989)
Authored by: bashonly
2026-02-17 23:20:54 +00:00
bashonly
62574f5763 [jsinterp] Fix bitwise operations (#15985)
Authored by: bashonly
2026-02-17 23:10:18 +00:00
Simon Sawicki
abade83f8d [cleanup] Bump ruff to 0.15.x (#15951)
Authored by: Grub4K
2026-02-16 20:11:02 +00:00
bashonly
43229d1d5f [cookies] Ignore cookies with control characters (#15862)
http.cookies.Morsel was patched in Python 3.14.3 and 3.13.12
to raise a CookieError if the cookie name, value or any attribute
of its input contains a control character.

yt_dlp.cookies.LenientSimpleCookie now preemptively discards
any cookies containing control characters, which is consistent
with its more lenient parsing.

Ref: https://github.com/python/cpython/issues/143919

Closes #15849
Authored by: bashonly, syphyr

Co-authored-by: syphyr <syphyr@gmail.com>
2026-02-16 19:59:34 +00:00
Gareth Seddon
8d6e0b29bf [ie/MatchiTV] Add extractor (#15204)
Authored by: gseddon
2026-02-12 08:14:56 +00:00
Corey Wright
1ea7329cc9 [ie/ApplePodcasts] Fix extractor (#15901)
Closes #15900
Authored by: coreywright
2026-02-12 08:09:37 +00:00
doe1080
a13f281012 [ie/tvo] Add extractor (#15903)
Authored by: doe1080
2026-02-09 20:57:54 +00:00
doe1080
02ce3efbfe [ie/tver:olympic] Add extractor (#15885)
Authored by: doe1080
2026-02-09 20:56:39 +00:00
doe1080
1a9c4b8238 [ie/steam] Fix extractor (#15028)
Closes #15014
Authored by: doe1080
2026-02-09 20:33:36 +00:00
bashonly
637ae202ac [ie/gem.cbc.ca] Support standalone, series & Olympics URLs (#15878)
Closes #8382, Closes #8790, Closes #15850
Authored by: bashonly, makew0rld, 0xvd

Co-authored-by: makeworld <makeworld@protonmail.com>
Co-authored-by: 0xvd <0xvd12@gmail.com>
2026-02-07 23:12:45 +00:00
hunter-gatherer8
23c059a455 [ie/1tv] Extract chapters (#15848)
Authored by: hunter-gatherer8
2026-02-06 20:45:47 +00:00
beacdeac
6f38df31b4 [ie/pornhub] Fix extractor (#15858)
Closes #15827
Authored by: beacdeac
2026-02-06 20:41:56 +00:00
doe1080
442c90da3e [ie/locipo] Add extractors (#15486)
Closes #13656
Authored by: doe1080, gravesducking

Co-authored-by: gravesducking <219445875+gravesducking@users.noreply.github.com>
2026-02-04 21:06:39 +00:00
0x∅
133cb959be [ie/xhamster] Fix extractor (#15831)
Closes #15802
Authored by: 0xvd
2026-02-04 20:49:07 +00:00
doe1080
c7c45f5289 [ie/visir] Add extractor (#15811)
Closes #11901
Authored by: doe1080
2026-02-04 15:33:00 +00:00
github-actions[bot]
bb3af7e6d5 Release 2026.02.04
Created by: bashonly

:ci skip all
2026-02-04 00:31:48 +00:00
doe1080
c677d866d4 [ie/unsupported] Update unsupported URLs (#15812)
Closes #8821, Closes #9851, Closes #13220, Closes #14564, Closes #14620
Authored by: doe1080
2026-02-03 23:30:59 +00:00
bashonly
1a895c18aa [ie/youtube] Default to tv player JS variant (#15818)
Closes #15814
Authored by: bashonly
2026-02-03 23:26:30 +00:00
61 changed files with 2011 additions and 523 deletions


@@ -864,3 +864,13 @@ Sytm
zahlman
azdlonky
thematuu
beacdeac
blauerdorf
CanOfSocks
gravesducking
gseddon
hunter-gatherer8
LordMZTE
regulad
stastix
syphyr


@@ -4,6 +4,69 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2026.02.21
#### Important changes
- Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)
- The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters
#### Core changes
- **cookies**: [Ignore cookies with control characters](https://github.com/yt-dlp/yt-dlp/commit/43229d1d5f47b313e1958d719faff6321d853ed3) ([#15862](https://github.com/yt-dlp/yt-dlp/issues/15862)) by [bashonly](https://github.com/bashonly), [syphyr](https://github.com/syphyr)
- **jsinterp**
- [Fix bitwise operations](https://github.com/yt-dlp/yt-dlp/commit/62574f5763755a8637880044630b12582e4a55a5) ([#15985](https://github.com/yt-dlp/yt-dlp/issues/15985)) by [bashonly](https://github.com/bashonly)
- [Stringify bracket notation keys in object access](https://github.com/yt-dlp/yt-dlp/commit/c9c86519753d6cdafa052945d2de0d3fcd448927) ([#15989](https://github.com/yt-dlp/yt-dlp/issues/15989)) by [bashonly](https://github.com/bashonly)
- [Support string concatenation with `+` and `+=`](https://github.com/yt-dlp/yt-dlp/commit/d108ca10b926410ed99031fec86894bfdea8f8eb) ([#15990](https://github.com/yt-dlp/yt-dlp/issues/15990)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- [Add browser impersonation support to more extractors](https://github.com/yt-dlp/yt-dlp/commit/1d1358d09fedcdc6b3e83538a29b0b539cb9be3f) ([#16029](https://github.com/yt-dlp/yt-dlp/issues/16029)) by [bashonly](https://github.com/bashonly)
- [Limit `netrc_machine` parameter to shell-safe characters](https://github.com/yt-dlp/yt-dlp/commit/1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a) by [Grub4K](https://github.com/Grub4K)
- **1tv**: [Extract chapters](https://github.com/yt-dlp/yt-dlp/commit/23c059a455acbb317b2bbe657efd59113bf4d5ac) ([#15848](https://github.com/yt-dlp/yt-dlp/issues/15848)) by [hunter-gatherer8](https://github.com/hunter-gatherer8)
- **aenetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/24856538595a3b25c75e1199146fcc82ea812d97) ([#14959](https://github.com/yt-dlp/yt-dlp/issues/14959)) by [Sipherdrakon](https://github.com/Sipherdrakon)
- **applepodcasts**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1ea7329cc91da38a790174e831fffafcb3ea3c3d) ([#15901](https://github.com/yt-dlp/yt-dlp/issues/15901)) by [coreywright](https://github.com/coreywright)
- **dailymotion**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/224fe478b0ef83d13b36924befa53686290cb000) ([#15995](https://github.com/yt-dlp/yt-dlp/issues/15995)) by [bashonly](https://github.com/bashonly)
- **facebook**: ads: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e2444584a3e590077b81828ad8a12fc4c3b1aa6d) ([#16002](https://github.com/yt-dlp/yt-dlp/issues/16002)) by [bashonly](https://github.com/bashonly)
- **gem.cbc.ca**: [Support standalone, series & Olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/637ae202aca7a990b3b61bc33d692870dc16c3ad) ([#15878](https://github.com/yt-dlp/yt-dlp/issues/15878)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly), [makew0rld](https://github.com/makew0rld)
- **learningonscreen**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/46d5b6f2b7989d8991a59215d434fb8b5a8ec7bb) ([#16028](https://github.com/yt-dlp/yt-dlp/issues/16028)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
- **locipo**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/442c90da3ec680037b7d94abf91ec63b2e5a9ade) ([#15486](https://github.com/yt-dlp/yt-dlp/issues/15486)) by [doe1080](https://github.com/doe1080), [gravesducking](https://github.com/gravesducking)
- **matchitv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/8d6e0b29bf15365638e0ceeb803a274e4db6157d) ([#15204](https://github.com/yt-dlp/yt-dlp/issues/15204)) by [gseddon](https://github.com/gseddon)
- **odnoklassniki**: [Fix inefficient regular expression](https://github.com/yt-dlp/yt-dlp/commit/071ad7dfa012f5b71572d29ef96fc154cb2dc9cc) ([#15974](https://github.com/yt-dlp/yt-dlp/issues/15974)) by [bashonly](https://github.com/bashonly)
- **opencast**: [Support `oc-p.uni-jena.de` URLs](https://github.com/yt-dlp/yt-dlp/commit/166356d1a1cac19cac14298e735eeae44b52c70e) ([#16026](https://github.com/yt-dlp/yt-dlp/issues/16026)) by [LordMZTE](https://github.com/LordMZTE)
- **pornhub**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6f38df31b477cf5ea3c8f91207452e3a4e8d5aa6) ([#15858](https://github.com/yt-dlp/yt-dlp/issues/15858)) by [beacdeac](https://github.com/beacdeac)
- **saucepluschannel**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/97f03660f55696dc9fce56e7ee43fbe3324a9867) ([#15830](https://github.com/yt-dlp/yt-dlp/issues/15830)) by [regulad](https://github.com/regulad)
- **soundcloud**
- [Fix client ID extraction](https://github.com/yt-dlp/yt-dlp/commit/81bdea03f3414dd4d086610c970ec14e15bd3d36) ([#16019](https://github.com/yt-dlp/yt-dlp/issues/16019)) by [bashonly](https://github.com/bashonly)
- [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f532a91cef11075eb5a7809255259b32d2bca8ca) ([#16020](https://github.com/yt-dlp/yt-dlp/issues/16020)) by [bashonly](https://github.com/bashonly)
- **spankbang**
- [Fix playlist title extraction](https://github.com/yt-dlp/yt-dlp/commit/1fe0bf23aa2249858c08408b7cc6287aaf528690) ([#14132](https://github.com/yt-dlp/yt-dlp/issues/14132)) by [blauerdorf](https://github.com/blauerdorf)
- [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f05e1cd1f1052cb40fc966d2fc175571986da863) ([#14130](https://github.com/yt-dlp/yt-dlp/issues/14130)) by [blauerdorf](https://github.com/blauerdorf)
- **steam**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1a9c4b8238434c760b3e27d0c9df6a4a2482d918) ([#15028](https://github.com/yt-dlp/yt-dlp/issues/15028)) by [doe1080](https://github.com/doe1080)
- **tele5**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/772559e3db2eb82e5d862d6d779588ca4b0b048d) ([#16005](https://github.com/yt-dlp/yt-dlp/issues/16005)) by [bashonly](https://github.com/bashonly)
- **tver**: olympic: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/02ce3efbfe51d54cb0866953af423fc6d1f38933) ([#15885](https://github.com/yt-dlp/yt-dlp/issues/15885)) by [doe1080](https://github.com/doe1080)
- **tvo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/a13f281012a21c85f76cf3e320fc3b00d480d6c6) ([#15903](https://github.com/yt-dlp/yt-dlp/issues/15903)) by [doe1080](https://github.com/doe1080)
- **twitter**: [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/0d8898c3f4e76742afb2b877f817fdee89fa1258) ([#15993](https://github.com/yt-dlp/yt-dlp/issues/15993)) by [bashonly](https://github.com/bashonly) (With fixes in [7722109](https://github.com/yt-dlp/yt-dlp/commit/77221098fc5016f12118421982f02b662021972c))
- **visir**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/c7c45f52890eee40565188aee874ff4e58e95c4f) ([#15811](https://github.com/yt-dlp/yt-dlp/issues/15811)) by [doe1080](https://github.com/doe1080)
- **vk**: [Solve JS challenges using native JS interpreter](https://github.com/yt-dlp/yt-dlp/commit/acfc00a955208ee780b4cb18ae26de7b62444153) ([#15992](https://github.com/yt-dlp/yt-dlp/issues/15992)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
- **xhamster**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/133cb959be4d268e2cd6b3f1d9bf87fba4c3743e) ([#15831](https://github.com/yt-dlp/yt-dlp/issues/15831)) by [0xvd](https://github.com/0xvd)
- **youtube**
- [Add more known player JS variants](https://github.com/yt-dlp/yt-dlp/commit/2204cee6d8301e491d8455a2c54fd0e1b23468f5) ([#15975](https://github.com/yt-dlp/yt-dlp/issues/15975)) by [bashonly](https://github.com/bashonly)
- [Extract live adaptive `incomplete` formats](https://github.com/yt-dlp/yt-dlp/commit/319a2bda83f5e54054661c56c1391533f82473c2) ([#15937](https://github.com/yt-dlp/yt-dlp/issues/15937)) by [bashonly](https://github.com/bashonly), [CanOfSocks](https://github.com/CanOfSocks)
- [Update ejs to 0.5.0](https://github.com/yt-dlp/yt-dlp/commit/c105461647315f7f479091194944713b392ca729) ([#16031](https://github.com/yt-dlp/yt-dlp/issues/16031)) by [bashonly](https://github.com/bashonly)
- date, search: [Remove broken `ytsearchdate` support](https://github.com/yt-dlp/yt-dlp/commit/c7945800e4ccd8cad2d5ee7806a872963c0c6d44) ([#15959](https://github.com/yt-dlp/yt-dlp/issues/15959)) by [stastix](https://github.com/stastix)
#### Networking changes
- **Request Handler**: curl_cffi: [Deprioritize unreliable impersonate targets](https://github.com/yt-dlp/yt-dlp/commit/e74076141dc86d5603680ea641d7cec86a821ac8) ([#16018](https://github.com/yt-dlp/yt-dlp/issues/16018)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **cleanup**
- [Bump ruff to 0.15.x](https://github.com/yt-dlp/yt-dlp/commit/abade83f8ddb63a11746b69038ebcd9c1405a00a) ([#15951](https://github.com/yt-dlp/yt-dlp/issues/15951)) by [Grub4K](https://github.com/Grub4K)
- Miscellaneous: [646bb31](https://github.com/yt-dlp/yt-dlp/commit/646bb31f39614e6c2f7ba687c53e7496394cbadb) by [Grub4K](https://github.com/Grub4K)
### 2026.02.04
#### Extractor changes
- **unsupported**: [Update unsupported URLs](https://github.com/yt-dlp/yt-dlp/commit/c677d866d41eb4075b0a5e0c944a6543fc13f15d) ([#15812](https://github.com/yt-dlp/yt-dlp/issues/15812)) by [doe1080](https://github.com/doe1080)
- **youtube**: [Default to `tv` player JS variant](https://github.com/yt-dlp/yt-dlp/commit/1a895c18aaaf00f557aa8cbacb21faa638842431) ([#15818](https://github.com/yt-dlp/yt-dlp/issues/15818)) by [bashonly](https://github.com/bashonly)
### 2026.01.31
#### Extractor changes


@@ -202,9 +202,9 @@ CONTRIBUTORS: Changelog.md
# The following EJS_-prefixed variables are auto-generated by devscripts/update_ejs.py
# DO NOT EDIT!
-EJS_VERSION = 0.4.0
-EJS_WHEEL_NAME = yt_dlp_ejs-0.4.0-py3-none-any.whl
-EJS_WHEEL_HASH = sha256:19278cff397b243074df46342bb7616c404296aeaff01986b62b4e21823b0b9c
+EJS_VERSION = 0.5.0
+EJS_WHEEL_NAME = yt_dlp_ejs-0.5.0-py3-none-any.whl
+EJS_WHEEL_HASH = sha256:674fc0efea741d3100cdf3f0f9e123150715ee41edf47ea7a62fbdeda204bdec
EJS_PY_FOLDERS = yt_dlp_ejs yt_dlp_ejs/yt yt_dlp_ejs/yt/solver
EJS_PY_FILES = yt_dlp_ejs/__init__.py yt_dlp_ejs/_version.py yt_dlp_ejs/yt/__init__.py yt_dlp_ejs/yt/solver/__init__.py
EJS_JS_FOLDERS = yt_dlp_ejs/yt/solver
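`EJS_WHEEL_HASH` above is a plain sha256 digest of the pinned wheel; verifying a downloaded wheel against it is a short check (a sketch only — the file path is a placeholder, and this is not the project's actual verification code):

```python
import hashlib

def verify_wheel(path, expected_sha256):
    """Return True if the file's sha256 hex digest matches the pinned hash."""
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest == expected_sha256

# e.g. verify_wheel('yt_dlp_ejs-0.5.0-py3-none-any.whl', '674fc0ef...')
```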


@@ -406,7 +406,7 @@ Tip: Use `CTRL`+`F` (or `Command`+`F`) to search by keywords
(default)
--live-from-start Download livestreams from the start.
Currently experimental and only supported
-                                     for YouTube and Twitch
+                                     for YouTube, Twitch, and TVer
--no-live-from-start Download livestreams from the current time
(default)
--wait-for-video MIN[-MAX] Wait for scheduled streams to become
@@ -1864,13 +1864,13 @@ The following extractors use this feature:
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
* `webpage_skip`: Skip extraction of embedded webpage data. One or both of `player_response`, `initial_data`. These options are for testing purposes and don't skip any network requests
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
-* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `tv`, `tv_es6`, `phone`, `tablet`. The default is `main`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
+* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `es6_tcc`, `es6_tce`, `tv`, `tv_es6`, `phone`, `house`. The default is `tv`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
* `player_js_version`: The player javascript version to use for n/sig deciphering, in the format of `signature_timestamp@hash` (e.g. `20348@0004de42`). The default is to use what is prescribed by the site, and can be selected with `actual`
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread,max-depth`. Default is `all,all,all,all,all`
* A `max-depth` value of `1` will discard all replies, regardless of the `max-replies` or `max-replies-per-thread` values given
* E.g. `all,all,1000,10,2` will get a maximum of 1000 replies total, with up to 10 replies per thread, and only 2 levels of depth (i.e. top-level comments plus their immediate replies). `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
-* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
+* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash, live adaptive https, and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
@@ -2261,7 +2261,7 @@ with yt_dlp.YoutubeDL(ydl_opts) as ydl:
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
* **YouTube improvements**:
-    * Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefixes (`ytsearch:`, `ytsearchdate:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
+    * Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefix (`ytsearch:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
* Fix for [n-sig based throttling](https://github.com/ytdl-org/youtube-dl/issues/29326) **\***
* Download livestreams from the start using `--live-from-start` (*experimental*)
* Channel URLs download all uploads of the channel, including shorts and live


@@ -337,5 +337,10 @@
"when": "e2ea6bd6ab639f910b99e55add18856974ff4c3a",
"short": "[ie] Fix prioritization of Youtube URL matching (#15596)",
"authors": ["Grub4K"]
},
{
"action": "add",
"when": "1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a",
"short": "[priority] Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)\n - The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters"
}
]


@@ -55,7 +55,7 @@ default = [
"requests>=2.32.2,<3",
"urllib3>=2.0.2,<3",
"websockets>=13.0",
"yt-dlp-ejs==0.4.0",
"yt-dlp-ejs==0.5.0",
]
curl-cffi = [
"curl-cffi>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.15; implementation_name=='cpython'",
@@ -85,7 +85,7 @@ dev = [
]
static-analysis = [
"autopep8~=2.0",
"ruff~=0.14.0",
"ruff~=0.15.0",
]
test = [
"pytest~=8.1",


@@ -506,7 +506,8 @@ The only reliable way to check if a site is supported is to try it.
- **GDCVault**: [*gdcvault*](## "netrc machine") (**Currently broken**)
- **GediDigital**
- **gem.cbc.ca**: [*cbcgem*](## "netrc machine")
-- **gem.cbc.ca:live**
+- **gem.cbc.ca:live**: [*cbcgem*](## "netrc machine")
+- **gem.cbc.ca:olympics**: [*cbcgem*](## "netrc machine")
- **gem.cbc.ca:playlist**: [*cbcgem*](## "netrc machine")
- **Genius**
- **GeniusLyrics**
@@ -734,6 +735,8 @@ The only reliable way to check if a site is supported is to try it.
- **Livestreamfails**
- **Lnk**
- **loc**: Library of Congress
- **Locipo**
- **LocipoPlaylist**
- **Loco**
- **loom**
- **loom:folder**: (**Currently broken**)
@@ -763,6 +766,7 @@ The only reliable way to check if a site is supported is to try it.
- **MarkizaPage**: (**Currently broken**)
- **massengeschmack.tv**
- **Masters**
- **MatchiTV**
- **MatchTV**
- **mave**
- **mave:channel**
@@ -1283,6 +1287,7 @@ The only reliable way to check if a site is supported is to try it.
- **Sangiin**: 参議院インターネット審議中継 (archive)
- **Sapo**: SAPO Vídeos
- **SaucePlus**: Sauce+
- **SaucePlusChannel**
- **SBS**: sbs.com.au
- **sbs.co.kr**
- **sbs.co.kr:allvod_program**
@@ -1550,10 +1555,12 @@ The only reliable way to check if a site is supported is to try it.
- **TVC**
- **TVCArticle**
- **TVer**
- **tver:olympic**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **TVIPlayer**
- **TVN24**: (**Currently broken**)
- **tvnoe**: Televize Noe
- **TVO**
- **tvopengr:embed**: tvopen.gr embedded videos
- **tvopengr:watch**: tvopen.gr (and ethnos.gr) videos
- **tvp**: Telewizja Polska
@@ -1664,6 +1671,7 @@ The only reliable way to check if a site is supported is to try it.
- **ViMP:Playlist**
- **Viously**
- **Viqeo**: (**Currently broken**)
- **Visir**: Vísir
- **Viu**
- **viu:ott**: [*viu*](## "netrc machine")
- **viu:playlist**
@@ -1812,7 +1820,6 @@ The only reliable way to check if a site is supported is to try it.
- **youtube:playlist**: [*youtube*](## "netrc machine") YouTube playlists
- **youtube:recommended**: [*youtube*](## "netrc machine") YouTube recommended videos; ":ytrec" keyword
- **youtube:search**: [*youtube*](## "netrc machine") YouTube search; "ytsearch:" prefix
-- **youtube:search:date**: [*youtube*](## "netrc machine") YouTube search, newest videos first; "ytsearchdate:" prefix
- **youtube:search_url**: [*youtube*](## "netrc machine") YouTube search URLs with sorting and filter support
- **youtube:shorts:pivot:audio**: [*youtube*](## "netrc machine") YouTube Shorts audio pivot (Shorts using audio of a given video)
- **youtube:subscriptions**: [*youtube*](## "netrc machine") YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)


@@ -294,7 +294,7 @@ def expect_info_dict(self, got_dict, expected_dict):
missing_keys = sorted(
test_info_dict.keys() - expected_dict.keys(),
-        key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x))
+        key=ALLOWED_KEYS_SORT_ORDER.index)
if missing_keys:
def _repr(v):
if isinstance(v, str):
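The change above swaps a `lambda x: ALLOWED_KEYS_SORT_ORDER.index(x)` wrapper for the bound method itself. A bound method is already a one-argument callable, so the two are interchangeable as sort keys; illustratively (the list values here are made up for the example):

```python
# A bound method can serve directly as a sort key without a lambda wrapper.
ALLOWED_KEYS_SORT_ORDER = ['id', 'title', 'url', 'duration']  # hypothetical values

missing_keys = ['url', 'id', 'duration']
assert (sorted(missing_keys, key=ALLOWED_KEYS_SORT_ORDER.index)
        == sorted(missing_keys, key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x)))
```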


@@ -76,6 +76,8 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._get_netrc_login_info(netrc_machine='empty_pass'), ('user', ''))
self.assertEqual(ie._get_netrc_login_info(netrc_machine='both_empty'), ('', ''))
self.assertEqual(ie._get_netrc_login_info(netrc_machine='nonexistent'), (None, None))
with self.assertRaises(ExtractorError):
ie._get_netrc_login_info(netrc_machine=';echo rce')
def test_html_search_regex(self):
html = '<p id="foo">Watch this <a href="http://www.youtube.com/watch?v=BaW_jenozKc">video</a></p>'

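The new assertion above expects `_get_netrc_login_info` to raise when the machine name contains shell metacharacters like `;`. A minimal sketch of that kind of allow-list check (hypothetical helper and character set — the real check lives in yt-dlp's extractor base class and defines its own shell-safe set):

```python
import re

# Hypothetical allow-list; yt-dlp defines its own shell-safe character set.
_SHELL_SAFE_MACHINE = re.compile(r'[a-zA-Z0-9._-]+')

def check_netrc_machine(machine):
    """Reject machine names that could inject commands into --netrc-cmd."""
    if not _SHELL_SAFE_MACHINE.fullmatch(machine):
        raise ValueError(f'unsafe netrc machine name: {machine!r}')
    return machine
```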

@@ -205,8 +205,8 @@ class TestLenientSimpleCookie(unittest.TestCase):
),
(
'Test quoted cookie',
-            'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
-            {'keebler': 'E=mc2; L="Loves"; fudge=\012;'},
+            'keebler="E=mc2; L=\\"Loves\\"; fudge=;"',
+            {'keebler': 'E=mc2; L="Loves"; fudge=;'},
),
(
"Allow '=' in an unquoted value",
@@ -328,4 +328,30 @@ class TestLenientSimpleCookie(unittest.TestCase):
'Key=Value; [Invalid]=Value; Another=Value',
{'Key': 'Value', 'Another': 'Value'},
),
# Ref: https://github.com/python/cpython/issues/143919
(
'Test invalid cookie name w/ control character',
'foo\012=bar;',
{},
),
(
'Test invalid cookie name w/ control character 2',
'foo\015baz=bar',
{},
),
(
'Test invalid cookie name w/ control character followed by valid cookie',
'foo\015=bar; x=y;',
{'x': 'y'},
),
(
'Test invalid cookie value w/ control character',
'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
{},
),
(
'Test invalid quoted attribute value w/ control character',
'Customer="WILE_E_COYOTE"; Version="1\\012"; Path="/acme"',
{},
),
)
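The added cases check that any name=value pair containing a control character is discarded while clean pairs in the same header survive. A simplified model of that discard step (not the actual `LenientSimpleCookie` parser, which also handles quoting and cookie attributes):

```python
import re

_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

def drop_control_char_cookies(header):
    """Parse a Cookie header, silently discarding any name=value pair
    that contains a control character."""
    kept = {}
    for part in header.split(';'):
        part = part.strip()
        if not part or _CONTROL_CHARS.search(part):
            continue  # discard pairs with control characters
        name, sep, value = part.partition('=')
        if sep:
            kept[name.strip()] = value.strip()
    return kept
```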


@@ -33,9 +33,12 @@ class Variant(enum.Enum):
tce = 'player_ias_tce.vflset/en_US/base.js'
es5 = 'player_es5.vflset/en_US/base.js'
es6 = 'player_es6.vflset/en_US/base.js'
es6_tcc = 'player_es6_tcc.vflset/en_US/base.js'
es6_tce = 'player_es6_tce.vflset/en_US/base.js'
tv = 'tv-player-ias.vflset/tv-player-ias.js'
tv_es6 = 'tv-player-es6.vflset/tv-player-es6.js'
phone = 'player-plasma-ias-phone-en_US.vflset/base.js'
house = 'house_brand_player.vflset/en_US/base.js'
@dataclasses.dataclass
@@ -102,6 +105,66 @@ CHALLENGES: list[Challenge] = [
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
'ttJC2JfQdSswRAIgGBCxZyAfKyi0cjXCb3DqEctUw-NYdNmOEvaepit0zJAtIEsgOV2SXZjhSHMNy0NXNGa1kOyBf6HPuAuCduh-_',
}),
# 4e51e895: main variant broke sig solving; n challenge is added only for regression testing
Challenge('4e51e895', Variant.main, JsChallengeType.N, {
'0eRGgQWJGfT5rFHFj': 't5kO23_msekBur',
}),
Challenge('4e51e895', Variant.main, JsChallengeType.SIG, {
'AL6p_8AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7':
'AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n9pKizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7',
}),
# 42c5570b: tce variant broke sig solving; n challenge is added only for regression testing
Challenge('42c5570b', Variant.tce, JsChallengeType.N, {
'ZdZIqFPQK-Ty8wId': 'CRoXjB-R-R',
}),
Challenge('42c5570b', Variant.tce, JsChallengeType.SIG, {
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
'EN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavcOmNdYN-wUtgEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt',
}),
# 54bd1de4: tce variant broke sig solving; n challenge is added only for regression testing
Challenge('54bd1de4', Variant.tce, JsChallengeType.N, {
'ZdZIqFPQK-Ty8wId': 'ka-slAQ31sijFN',
}),
Challenge('54bd1de4', Variant.tce, JsChallengeType.SIG, {
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0titeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtp',
}),
# 94667337: tce and es6 variants broke sig solving; n and main/tv variants are added only for regression testing
Challenge('94667337', Variant.main, JsChallengeType.N, {
'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
}),
Challenge('94667337', Variant.main, JsChallengeType.SIG, {
'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
}),
Challenge('94667337', Variant.tv, JsChallengeType.N, {
'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
}),
Challenge('94667337', Variant.tv, JsChallengeType.SIG, {
'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
}),
Challenge('94667337', Variant.es6, JsChallengeType.N, {
'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
}),
Challenge('94667337', Variant.es6, JsChallengeType.SIG, {
'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
}),
Challenge('94667337', Variant.tce, JsChallengeType.N, {
'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
}),
Challenge('94667337', Variant.tce, JsChallengeType.SIG, {
'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
}),
Challenge('94667337', Variant.es6_tce, JsChallengeType.N, {
'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
}),
Challenge('94667337', Variant.es6_tce, JsChallengeType.SIG, {
'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
}),
]
requests: list[JsChallengeRequest] = []


@@ -9,7 +9,12 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import math
from yt_dlp.jsinterp import JS_Undefined, JSInterpreter, js_number_to_string
from yt_dlp.jsinterp import (
JS_Undefined,
JSInterpreter,
int_to_int32,
js_number_to_string,
)
class NaN:
@@ -101,8 +106,16 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f(){return 5 ^ 9;}', 12)
self._test('function f(){return 0.0 << NaN}', 0)
self._test('function f(){return null << undefined}', 0)
# TODO: Does not work due to number too large
# self._test('function f(){return 21 << 4294967297}', 42)
self._test('function f(){return -12616 ^ 5041}', -8951)
self._test('function f(){return 21 << 4294967297}', 42)
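The restored `21 << 4294967297` case works because ECMAScript reduces the shift count with ToUint32 and masks it to 5 bits before shifting. A minimal Python sketch of that semantics (an illustration, not yt-dlp's interpreter code):

```python
def js_lshift(a, b):
    # ECMAScript <<: ToInt32(a) << (ToUint32(b) & 31), wrapped back to int32
    shift = (b % 2**32) & 31
    result = ((a % 2**32) << shift) & 0xFFFFFFFF
    return result - 0x100000000 if result >= 0x80000000 else result
```

Here `js_lshift(21, 4294967297)` yields 42, matching the test above: 4294967297 wraps to 1 under ToUint32.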
def test_string_concat(self):
self._test('function f(){return "a" + "b";}', 'ab')
self._test('function f(){let x = "a"; x += "b"; return x;}', 'ab')
self._test('function f(){return "a" + 1;}', 'a1')
self._test('function f(){let x = "a"; x += 1; return x;}', 'a1')
self._test('function f(){return 2 + "b";}', '2b')
self._test('function f(){let x = 2; x += "b"; return x;}', '2b')
def test_array_access(self):
self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])
@@ -325,6 +338,7 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f() { let a = {m1: 42, m2: 0 }; return [a["m1"], a.m2]; }', [42, 0])
self._test('function f() { let a; return a?.qq; }', JS_Undefined)
self._test('function f() { let a = {m1: 42, m2: 0 }; return a?.qq; }', JS_Undefined)
self._test('function f() { let a = {"1": 123}; return a[1]; }', 123)
def test_regex(self):
self._test('function f() { let a=/,,[/,913,/](,)}/; }', None)
@@ -447,6 +461,22 @@ class TestJSInterpreter(unittest.TestCase):
def test_splice(self):
self._test('function f(){var T = ["0", "1", "2"]; T["splice"](2, 1, "0")[0]; return T }', ['0', '1', '0'])
def test_int_to_int32(self):
for inp, exp in [
(0, 0),
(1, 1),
(-1, -1),
(-8951, -8951),
(2147483647, 2147483647),
(2147483648, -2147483648),
(2147483649, -2147483647),
(-2147483649, 2147483647),
(-2147483648, -2147483648),
(-16799986688, 379882496),
(39570129568, 915423904),
]:
assert int_to_int32(inp) == exp
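The value table above pins down 32-bit wrap-around behaviour. A minimal sketch of such a conversion, assuming only the semantics the test cases encode (not necessarily the actual `int_to_int32` implementation):

```python
def to_int32(n):
    # Wrap to 32 bits, then reinterpret the top bit as the sign bit
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n
```

For instance, 2147483648 lands exactly on the sign bit and comes back as -2147483648, while -16799986688 wraps to 379882496 as in the table.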
def test_js_number_to_string(self):
for test, radix, expected in [
(0, None, '0'),


@@ -1004,6 +1004,7 @@ class TestUrllibRequestHandler(TestRequestHandlerBase):
@pytest.mark.parametrize('handler', ['Requests'], indirect=True)
class TestRequestsRequestHandler(TestRequestHandlerBase):
# ruff: disable[PLW0108] `requests` and/or `urllib3` may not be available
@pytest.mark.parametrize('raised,expected', [
(lambda: requests.exceptions.ConnectTimeout(), TransportError),
(lambda: requests.exceptions.ReadTimeout(), TransportError),
@@ -1017,8 +1018,10 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
# catch-all: https://github.com/psf/requests/blob/main/src/requests/adapters.py#L535
(lambda: urllib3.exceptions.HTTPError(), TransportError),
(lambda: requests.exceptions.RequestException(), RequestError),
# (lambda: requests.exceptions.TooManyRedirects(), HTTPError) - Needs a response object
# Needs a response object
# (lambda: requests.exceptions.TooManyRedirects(), HTTPError),
])
# ruff: enable[PLW0108]
def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
with handler() as rh:
def mock_get_instance(*args, **kwargs):
@@ -1034,6 +1037,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
assert exc_info.type is expected
# ruff: disable[PLW0108] `urllib3` may not be available
@pytest.mark.parametrize('raised,expected,match', [
(lambda: urllib3.exceptions.SSLError(), SSLError, None),
(lambda: urllib3.exceptions.TimeoutError(), TransportError, None),
@@ -1052,6 +1056,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
'3 bytes read, 5 more expected',
),
])
# ruff: enable[PLW0108]
def test_response_error_mapping(self, handler, monkeypatch, raised, expected, match):
from requests.models import Response as RequestsResponse
from urllib3.response import HTTPResponse as Urllib3Response


@@ -239,6 +239,7 @@ class TestTraversal:
'accept matching `expected_type` type'
assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int) is None, \
'reject non matching `expected_type` type'
# ruff: noqa: PLW0108 `type`s get special treatment, so wrap in lambda
assert traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)) == '0', \
'transform type using type function'
assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0) is None, \


@@ -924,6 +924,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('desember', 'is'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)


@@ -448,6 +448,7 @@ def create_fake_ws_connection(raised):
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
class TestWebsocketsRequestHandler:
# ruff: disable[PLW0108] `websockets` may not be available
@pytest.mark.parametrize('raised,expected', [
# https://websockets.readthedocs.io/en/stable/reference/exceptions.html
(lambda: websockets.exceptions.InvalidURI(msg='test', uri='test://'), RequestError),
@@ -459,13 +460,14 @@ class TestWebsocketsRequestHandler:
(lambda: websockets.exceptions.NegotiationError(), TransportError),
# Catch-all
(lambda: websockets.exceptions.WebSocketException(), TransportError),
(lambda: TimeoutError(), TransportError),
(TimeoutError, TransportError),
# These may be raised by our create_connection implementation, which should also be caught
(lambda: OSError(), TransportError),
(lambda: ssl.SSLError(), SSLError),
(lambda: ssl.SSLCertVerificationError(), CertificateVerifyError),
(lambda: socks.ProxyError(), ProxyError),
(OSError, TransportError),
(ssl.SSLError, SSLError),
(ssl.SSLCertVerificationError, CertificateVerifyError),
(socks.ProxyError, ProxyError),
])
# ruff: enable[PLW0108]
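Replacing `lambda: Exc()` with the bare class works because both are zero-argument callables; lambdas survive only where the exception type lives in an optional dependency that may be absent at collection time. A small sketch of the pattern (illustrative names, not the test suite's helpers):

```python
def instantiate(raised):
    # An exception class and a zero-argument lambda are both callable,
    # so parametrized cases can be handled uniformly
    return raised()

builtin_case = TimeoutError                 # always importable: safe to reference directly
optional_case = lambda: ValueError('boom')  # stand-in for an optional-dependency exception
```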
def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
import websockets.sync.client
@@ -482,11 +484,12 @@ class TestWebsocketsRequestHandler:
@pytest.mark.parametrize('raised,expected,match', [
# https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.send
(lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
(lambda: RuntimeError(), TransportError, None),
(lambda: TimeoutError(), TransportError, None),
(lambda: TypeError(), RequestError, None),
(lambda: socks.ProxyError(), ProxyError, None),
(RuntimeError, TransportError, None),
(TimeoutError, TransportError, None),
(TypeError, RequestError, None),
(socks.ProxyError, ProxyError, None),
# Catch-all
# ruff: noqa: PLW0108 `websockets` may not be available
(lambda: websockets.exceptions.WebSocketException(), TransportError, None),
])
def test_ws_send_error_mapping(self, handler, monkeypatch, raised, expected, match):
@@ -499,10 +502,11 @@ class TestWebsocketsRequestHandler:
@pytest.mark.parametrize('raised,expected,match', [
# https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.recv
(lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
(lambda: RuntimeError(), TransportError, None),
(lambda: TimeoutError(), TransportError, None),
(lambda: socks.ProxyError(), ProxyError, None),
(RuntimeError, TransportError, None),
(TimeoutError, TransportError, None),
(socks.ProxyError, ProxyError, None),
# Catch-all
# ruff: noqa: PLW0108 `websockets` may not be available
(lambda: websockets.exceptions.WebSocketException(), TransportError, None),
])
def test_ws_recv_error_mapping(self, handler, monkeypatch, raised, expected, match):


@@ -1168,6 +1168,7 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
# We use Morsel's legal key chars to avoid errors on setting values
_LEGAL_KEY_CHARS = r'\w\d' + re.escape('!#$%&\'*+-.:^_`|~')
_LEGAL_VALUE_CHARS = _LEGAL_KEY_CHARS + re.escape('(),/<=>?@[]{}')
_LEGAL_KEY_RE = re.compile(rf'[{_LEGAL_KEY_CHARS}]+', re.ASCII)
_RESERVED = {
'expires',
@@ -1185,17 +1186,17 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
# Added 'bad' group to catch the remaining value
_COOKIE_PATTERN = re.compile(r'''
\s* # Optional whitespace at start of cookie
[ ]* # Optional whitespace at start of cookie
(?P<key> # Start of group 'key'
[''' + _LEGAL_KEY_CHARS + r''']+?# Any word of at least one letter
[^ =;]+ # Match almost anything here for now and validate later
) # End of group 'key'
( # Optional group: there may not be a value.
\s*=\s* # Equal Sign
[ ]*=[ ]* # Equal Sign
( # Start of potential value
(?P<val> # Start of group 'val'
"(?:[^\\"]|\\.)*" # Any doublequoted string
| # or
\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT # Special case for "expires" attr
\w{3},\ [\w\d -]{9,11}\ [\d:]{8}\ GMT # Special case for "expires" attr
| # or
[''' + _LEGAL_VALUE_CHARS + r''']* # Any word or empty string
) # End of group 'val'
@@ -1203,10 +1204,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
(?P<bad>(?:\\;|[^;])*?) # 'bad' group fallback for invalid values
) # End of potential value
)? # End of optional value group
\s* # Any number of spaces.
(\s+|;|$) # Ending either at space, semicolon, or EOS.
[ ]* # Any number of spaces.
([ ]+|;|$) # Ending either at space, semicolon, or EOS.
''', re.ASCII | re.VERBOSE)
# http.cookies.Morsel raises on values w/ control characters in Python 3.14.3+ & 3.13.12+
# Ref: https://github.com/python/cpython/issues/143919
_CONTROL_CHARACTER_RE = re.compile(r'[\x00-\x1F\x7F]')
def load(self, data):
# Workaround for https://github.com/yt-dlp/yt-dlp/issues/4776
if not isinstance(data, str):
@@ -1219,6 +1224,9 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
continue
key, value = match.group('key', 'val')
if not self._LEGAL_KEY_RE.fullmatch(key):
morsel = None
continue
is_attribute = False
if key.startswith('$'):
@@ -1237,6 +1245,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
value = True
else:
value, _ = self.value_decode(value)
# Guard against control characters in quoted attribute values
if self._CONTROL_CHARACTER_RE.search(value):
# While discarding the entire morsel is not very lenient,
# it's better than http.cookies.Morsel raising a CookieError
# and it's probably better to err on the side of caution
self.pop(morsel.key, None)
morsel = None
continue
morsel[key] = value
@@ -1246,6 +1262,10 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
elif value is not None:
morsel = self.get(key, http.cookies.Morsel())
real_value, coded_value = self.value_decode(value)
# Guard against control characters in quoted cookie values
if self._CONTROL_CHARACTER_RE.search(real_value):
morsel = None
continue
morsel.set(key, real_value, coded_value)
self[key] = morsel
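The control-character guard can be exercised on its own; a sketch using the same pattern the diff introduces:

```python
import re

# Same pattern as _CONTROL_CHARACTER_RE above: C0 control characters plus DEL
control_re = re.compile(r'[\x00-\x1F\x7F]')

def value_is_safe(value):
    # A value containing any control character would make
    # http.cookies.Morsel raise on newer Python versions
    return control_re.search(value) is None
```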


@@ -311,8 +311,10 @@ from .canalsurmas import CanalsurmasIE
from .caracoltv import CaracolTvPlayIE
from .cbc import (
CBCIE,
CBCGemContentIE,
CBCGemIE,
CBCGemLiveIE,
CBCGemOlympicsIE,
CBCGemPlaylistIE,
CBCListenIE,
CBCPlayerIE,
@@ -1029,6 +1031,10 @@ from .livestream import (
)
from .livestreamfails import LivestreamfailsIE
from .lnk import LnkIE
from .locipo import (
LocipoIE,
LocipoPlaylistIE,
)
from .loco import LocoIE
from .loom import (
LoomFolderIE,
@@ -1071,6 +1077,7 @@ from .markiza import (
)
from .massengeschmacktv import MassengeschmackTVIE
from .masters import MastersIE
from .matchitv import MatchiTVIE
from .matchtv import MatchTVIE
from .mave import (
MaveChannelIE,
@@ -1785,7 +1792,10 @@ from .safari import (
from .saitosan import SaitosanIE
from .samplefocus import SampleFocusIE
from .sapo import SapoIE
from .sauceplus import SaucePlusIE
from .sauceplus import (
SaucePlusChannelIE,
SaucePlusIE,
)
from .sbs import SBSIE
from .sbscokr import (
SBSCoKrAllvodProgramIE,
@@ -2174,11 +2184,15 @@ from .tvc import (
TVCIE,
TVCArticleIE,
)
from .tver import TVerIE
from .tver import (
TVerIE,
TVerOlympicIE,
)
from .tvigle import TvigleIE
from .tviplayer import TVIPlayerIE
from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE
from .tvo import TvoIE
from .tvopengr import (
TVOpenGrEmbedIE,
TVOpenGrWatchIE,
@@ -2343,6 +2357,7 @@ from .vimm import (
)
from .viously import ViouslyIE
from .viqeo import ViqeoIE
from .visir import VisirIE
from .viu import (
ViuIE,
ViuOTTIE,
@@ -2541,7 +2556,6 @@ from .youtube import (
YoutubeNotificationsIE,
YoutubePlaylistIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubeShortsAudioPivotIE,


@@ -5,10 +5,12 @@ from ..utils import (
ExtractorError,
GeoRestrictedError,
int_or_none,
make_archive_id,
remove_start,
traverse_obj,
update_url_query,
url_or_none,
)
from ..utils.traversal import traverse_obj
class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
@@ -29,6 +31,19 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'historyvault.com': (None, 'historyvault', None),
'biography.com': (None, 'biography', None),
}
_GRAPHQL_QUERY = '''
query getUserVideo($videoId: ID!) {
video(id: $videoId) {
title
publicUrl
programId
tvSeasonNumber
tvSeasonEpisodeNumber
series {
title
}
}
}'''
def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {
@@ -73,19 +88,39 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand, software_statement = self._DOMAIN_MAP[domain]
if filter_key == 'canonical':
webpage = self._download_webpage(url, filter_value)
graphql_video_id = self._search_regex(
r'<meta\b[^>]+\bcontent="[^"]*\btpid/(\d+)"', webpage,
'id') or self._html_search_meta('videoId', webpage, 'GraphQL video ID', fatal=True)
else:
graphql_video_id = filter_value
result = self._download_json(
f'https://feeds.video.aetnd.com/api/v2/{brand}/videos',
filter_value, query={f'filter[{filter_key}]': filter_value})
result = traverse_obj(
result, ('results',
lambda k, v: k == 0 and v[filter_key] == filter_value),
get_all=False)
if not result:
'https://yoga.appsvcs.aetnd.com/', graphql_video_id,
query={
'brand': brand,
'mode': 'live',
'platform': 'web',
},
data=json.dumps({
'operationName': 'getUserVideo',
'variables': {
'videoId': graphql_video_id,
},
'query': self._GRAPHQL_QUERY,
}).encode(),
headers={
'Content-Type': 'application/json',
})
result = traverse_obj(result, ('data', 'video', {dict}))
media_url = traverse_obj(result, ('publicUrl', {url_or_none}))
if not media_url:
raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
video_id=remove_start(filter_value, '/'))
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
video_id = result['programId']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
@@ -100,9 +135,13 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
'display_id': graphql_video_id,
'_old_archive_ids': [make_archive_id(self, graphql_video_id)],
**traverse_obj(result, {
'series': ('series', 'title', {str}),
'season_number': ('tvSeasonNumber', {int_or_none}),
'episode_number': ('tvSeasonEpisodeNumber', {int_or_none}),
}),
})
return info
@@ -116,7 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
(?:shows/[^/?#]+/)?videos/[^/?#]+
)'''
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'url': 'https://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': {
'id': '22253814',
'ext': 'mp4',
@@ -139,11 +178,11 @@ class AENetworksIE(AENetworksBaseIE):
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
'skip': 'Geo-restricted - This content is not available in your location.',
'skip': 'This content requires a valid, unexpired auth token',
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'url': 'https://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'info_dict': {
'id': '600587331957',
'id': '147486',
'ext': 'mp4',
'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08',
@@ -160,6 +199,8 @@ class AENetworksIE(AENetworksBaseIE):
'season_number': 9,
'series': 'Duck Dynasty',
'age_limit': 0,
'display_id': '600587331957',
'_old_archive_ids': ['aenetworks 600587331957'],
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
@@ -186,6 +227,7 @@ class AENetworksIE(AENetworksBaseIE):
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
'skip': '404 Not Found',
}, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story',
'info_dict': {
@@ -209,6 +251,7 @@ class AENetworksIE(AENetworksBaseIE):
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
'skip': 'This content requires a valid, unexpired auth token',
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True,
@@ -259,7 +302,7 @@ class AENetworksListBaseIE(AENetworksBaseIE):
domain, slug = self._match_valid_url(url).groups()
_, brand, _ = self._DOMAIN_MAP[domain]
playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
base_url = f'http://watch.{domain}'
base_url = f'https://watch.{domain}'
entries = []
for item in (playlist.get(self._ITEMS_KEY) or []):


@@ -11,18 +11,18 @@ from ..utils.traversal import traverse_obj
class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/ferreck-dawn-to-the-break-of-dawn-117/id1625658232?i=1000665010654',
'md5': '82cc219b8cc1dcf8bfc5a5e99b23b172',
'url': 'https://podcasts.apple.com/us/podcast/urbana-podcast-724-by-david-penn/id1531349107?i=1000748574256',
'md5': 'f8a6f92735d0cfbd5e6a7294151e28d8',
'info_dict': {
'id': '1000665010654',
'ext': 'mp3',
'title': 'Ferreck Dawn - To The Break of Dawn 117',
'episode': 'Ferreck Dawn - To The Break of Dawn 117',
'description': 'md5:8c4f5c2c30af17ed6a98b0b9daf15b76',
'upload_date': '20240812',
'timestamp': 1723449600,
'duration': 3596,
'series': 'Ferreck Dawn - To The Break of Dawn',
'id': '1000748574256',
'ext': 'm4a',
'title': 'URBANA PODCAST 724 BY DAVID PENN',
'episode': 'URBANA PODCAST 724 BY DAVID PENN',
'description': 'md5:fec77bacba32db8c9b3dda5486ed085f',
'upload_date': '20260206',
'timestamp': 1770400801,
'duration': 3602,
'series': 'Urbana Radio Show',
'thumbnail': 're:.+[.](png|jpe?g|webp)',
},
}, {
@@ -57,22 +57,22 @@ class ApplePodcastsIE(InfoExtractor):
webpage = self._download_webpage(url, episode_id)
server_data = self._search_json(
r'<script [^>]*\bid=["\']serialized-server-data["\'][^>]*>', webpage,
'server data', episode_id, contains_pattern=r'\[{(?s:.+)}\]')[0]['data']
'server data', episode_id)['data'][0]['data']
model_data = traverse_obj(server_data, (
'headerButtonItems', lambda _, v: v['$kind'] == 'share' and v['modelType'] == 'EpisodeLockup',
'model', {dict}, any))
return {
'id': episode_id,
**self._json_ld(
traverse_obj(server_data, ('seoData', 'schemaContent', {dict}))
or self._yield_json_ld(webpage, episode_id, fatal=False), episode_id, fatal=False),
**traverse_obj(model_data, {
'title': ('title', {str}),
'description': ('summary', {clean_html}),
'url': ('playAction', 'episodeOffer', 'streamUrl', {clean_podcast_url}),
'timestamp': ('releaseDate', {parse_iso8601}),
'duration': ('duration', {int_or_none}),
'episode': ('title', {str}),
'episode_number': ('episodeNumber', {int_or_none}),
'series': ('showTitle', {str}),
}),
'thumbnail': self._og_search_thumbnail(webpage),
'vcodec': 'none',


@@ -124,7 +124,7 @@ class BilibiliBaseIE(InfoExtractor):
**traverse_obj(play_info, {
'quality': ('quality', {int_or_none}),
'format_id': ('quality', {str_or_none}),
'format_note': ('quality', {lambda x: format_names.get(x)}),
'format_note': ('quality', {format_names.get}),
'duration': ('timelength', {float_or_none(scale=1000)}),
}),
**parse_resolution(format_names.get(play_info.get('quality'))),


@@ -10,6 +10,7 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
join_nonempty,
js_to_json,
jwt_decode_hs256,
mimetype2ext,
@@ -25,6 +26,7 @@ from ..utils import (
url_basename,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..utils.traversal import require, traverse_obj, trim_str
@@ -540,6 +542,32 @@ class CBCGemBaseIE(InfoExtractor):
f'https://services.radio-canada.ca/ott/catalog/v2/gem/show/{item_id}',
display_id or item_id, query={'device': 'web'})
def _call_media_api(self, media_id, app_code='gem', display_id=None, headers=None):
media_data = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/',
display_id or media_id, headers=headers, query={
'appCode': app_code,
'connectionType': 'hd',
'deviceType': 'ipad',
'multibitrate': 'true',
'output': 'json',
'tech': 'hls',
'manifestVersion': '2',
'manifestType': 'desktop',
'idMedia': media_id,
})
error_code = traverse_obj(media_data, ('errorCode', {int}))
if error_code == 1:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
if error_code == 35:
self.raise_login_required(method='password')
if error_code != 0:
error_message = join_nonempty(error_code, media_data.get('message'), delim=' - ')
raise ExtractorError(f'{self.IE_NAME} said: {error_message}')
return media_data
def _extract_item_info(self, item_info):
episode_number = None
title = traverse_obj(item_info, ('title', {str}))
@@ -567,7 +595,7 @@ class CBCGemBaseIE(InfoExtractor):
class CBCGemIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]+)'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]{2,4})/?(?:[?#]|$)'
_TESTS = [{
# This is a normal, public, TV show video
'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@@ -709,29 +737,10 @@ class CBCGemIE(CBCGemBaseIE):
if claims_token := self._fetch_claims_token():
headers['x-claims-token'] = claims_token
m3u8_info = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/',
video_id, headers=headers, query={
'appCode': 'gem',
'connectionType': 'hd',
'deviceType': 'ipad',
'multibitrate': 'true',
'output': 'json',
'tech': 'hls',
'manifestVersion': '2',
'manifestType': 'desktop',
'idMedia': item_info['idMedia'],
})
if m3u8_info.get('errorCode') == 1:
self.raise_geo_restricted(countries=['CA'])
elif m3u8_info.get('errorCode') == 35:
self.raise_login_required(method='password')
elif m3u8_info.get('errorCode') != 0:
raise ExtractorError(f'{self.IE_NAME} said: {m3u8_info.get("errorCode")} - {m3u8_info.get("message")}')
m3u8_url = self._call_media_api(
item_info['idMedia'], display_id=video_id, headers=headers)['url']
formats = self._extract_m3u8_formats(
m3u8_info['url'], video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
m3u8_url, video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
self._remove_duplicate_formats(formats)
for fmt in formats:
@@ -801,7 +810,128 @@ class CBCGemPlaylistIE(CBCGemBaseIE):
}), series=traverse_obj(show_info, ('title', {str})))
class CBCGemLiveIE(InfoExtractor):
class CBCGemContentIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:content'
IE_DESC = False # Do not list
_VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>[0-9a-z-]+)/?(?:[?#]|$)'
_TESTS = [{
# Series URL; content_type == 'Season'
'url': 'https://gem.cbc.ca/the-tunnel',
'playlist_count': 3,
'info_dict': {
'id': 'the-tunnel',
},
}, {
# Miniseries URL; content_type == 'Parts'
'url': 'https://gem.cbc.ca/summit-72',
'playlist_count': 1,
'info_dict': {
'id': 'summit-72',
},
}, {
# Olympics URL; content_type == 'Standalone'
'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086',
'info_dict': {
'id': 'ski-jumping-nh-individual-womens-final-30086',
'ext': 'mp4',
'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
'duration': 12793,
'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
'release_timestamp': 1770482100,
'release_date': '20260207',
'live_status': 'was_live',
},
}, {
# Movie URL; content_type == 'Standalone'; requires authentication
'url': 'https://gem.cbc.ca/copa-71',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._search_nextjs_data(webpage, display_id)['props']['pageProps']['data']
content_type = data['contentType']
self.write_debug(f'Routing for content type "{content_type}"')
if content_type == 'Standalone':
new_url = traverse_obj(data, (
'header', 'cta', 'media', 'url', {urljoin('https://gem.cbc.ca/')}))
if CBCGemOlympicsIE.suitable(new_url):
return self.url_result(new_url, CBCGemOlympicsIE)
# Manually construct non-Olympics standalone URLs to avoid returning trailer URLs
return self.url_result(f'https://gem.cbc.ca/{display_id}/s01e01', CBCGemIE)
# Handle series URLs (content_type == 'Season') and miniseries URLs (content_type == 'Parts')
def entries():
for playlist_url in traverse_obj(data, (
'content', ..., 'lineups', ..., 'url', {urljoin('https://gem.cbc.ca/')},
{lambda x: x if CBCGemPlaylistIE.suitable(x) else None},
)):
yield self.url_result(playlist_url, CBCGemPlaylistIE)
return self.playlist_result(entries(), display_id)
class CBCGemOlympicsIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:olympics'
_VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>(?:[0-9a-z]+-)+[0-9]{5,})/s01e(?P<media_id>[0-9]{5,})'
_TESTS = [{
'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086/s01e30086',
'info_dict': {
'id': 'ski-jumping-nh-individual-womens-final-30086',
'ext': 'mp4',
'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
'duration': 12793,
'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
'release_timestamp': 1770482100,
'release_date': '20260207',
'live_status': 'was_live',
},
}]
def _real_extract(self, url):
video_id, media_id = self._match_valid_url(url).group('id', 'media_id')
video_info = self._call_show_api(video_id)
item_info = traverse_obj(video_info, (
'content', ..., 'lineups', ..., 'items',
lambda _, v: v['formattedIdMedia'] == media_id, any, {require('item info')}))
live_status = {
'LiveEvent': 'is_live',
'Replay': 'was_live',
}.get(item_info.get('type'))
release_timestamp = traverse_obj(item_info, (
'metadata', (('live', 'startDate'), ('replay', 'airDate')), {parse_iso8601}, any))
if live_status == 'is_live' and release_timestamp and release_timestamp > time.time():
formats = []
live_status = 'is_upcoming'
self.raise_no_formats('This livestream has not yet started', expected=True)
else:
m3u8_url = self._call_media_api(media_id, 'medianetlive', video_id)['url']
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')
return {
'id': video_id,
'formats': formats,
'live_status': live_status,
'release_timestamp': release_timestamp,
**traverse_obj(item_info, {
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('images', 'card', 'url', {url_or_none}),
'duration': ('metadata', 'replay', 'duration', {int_or_none}),
}),
}
class CBCGemLiveIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:live'
_VALID_URL = r'https?://gem\.cbc\.ca/live(?:-event)?/(?P<id>\d+)'
_TESTS = [
@@ -871,7 +1001,6 @@ class CBCGemLiveIE(InfoExtractor):
'only_matching': True,
},
]
_GEO_COUNTRIES = ['CA']
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -900,19 +1029,8 @@ class CBCGemLiveIE(InfoExtractor):
live_status = 'is_upcoming'
self.raise_no_formats('This livestream has not yet started', expected=True)
else:
stream_data = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/', video_id, query={
'appCode': 'medianetlive',
'connectionType': 'hd',
'deviceType': 'ipad',
'idMedia': video_stream_id,
'multibitrate': 'true',
'output': 'json',
'tech': 'hls',
'manifestType': 'desktop',
})
formats = self._extract_m3u8_formats(
stream_data['url'], video_id, 'mp4', live=live_status == 'is_live')
m3u8_url = self._call_media_api(video_stream_id, 'medianetlive', video_id)['url']
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')
return {
'id': video_id,


@@ -661,9 +661,11 @@ class InfoExtractor:
if not self._ready:
self._initialize_pre_login()
if self.supports_login():
username, password = self._get_login_info()
if username:
self._perform_login(username, password)
# try login only if it would actually do anything
if type(self)._perform_login is not InfoExtractor._perform_login:
username, password = self._get_login_info()
if username:
self._perform_login(username, password)
elif self.get_param('username') and False not in (self.IE_DESC, self._NETRC_MACHINE):
self.report_warning(f'Login with password is not supported for this website. {self._login_hint("cookies")}')
self._real_initialize()
@@ -1385,6 +1387,11 @@ class InfoExtractor:
def _get_netrc_login_info(self, netrc_machine=None):
netrc_machine = netrc_machine or self._NETRC_MACHINE
if not netrc_machine:
raise ExtractorError(f'Missing netrc_machine and {type(self).__name__}._NETRC_MACHINE')
ALLOWED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_'
if netrc_machine.startswith(('-', '_')) or not all(c in ALLOWED for c in netrc_machine):
raise ExtractorError(f'Invalid netrc machine: {netrc_machine!r}', expected=True)
cmd = self.get_param('netrc_cmd')
if cmd:
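The new validation restricts `netrc_machine` to characters that are safe to interpolate into a `netrc_cmd` shell command. A standalone sketch of the same check:

```python
ALLOWED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_'

def is_safe_netrc_machine(machine):
    # Mirrors the guard above: non-empty, no leading '-' or '_',
    # and only shell-safe characters
    if not machine or machine.startswith(('-', '_')):
        return False
    return all(c in ALLOWED for c in machine)
```

A leading `-` is rejected so the machine name can never be mistaken for a command-line flag when substituted into the command.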


@@ -384,8 +384,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
last_error = None
for note, kwargs in (
('Downloading m3u8 information', {}),
('Retrying m3u8 download with randomized headers', {
('Downloading m3u8 information with randomized headers', {
'headers': self._generate_blockbuster_headers(),
}),
('Retrying m3u8 download with Chrome impersonation', {


@@ -1041,8 +1041,6 @@ class FacebookAdsIE(InfoExtractor):
'uploader': 'Casper',
'uploader_id': '224110981099062',
'uploader_url': 'https://www.facebook.com/Casper/',
'timestamp': 1766299837,
'upload_date': '20251221',
'like_count': int,
},
'playlist_count': 2,
@@ -1054,12 +1052,23 @@ class FacebookAdsIE(InfoExtractor):
'uploader': 'Case \u00e0 Chocs',
'uploader_id': '112960472096793',
'uploader_url': 'https://www.facebook.com/Caseachocs/',
'timestamp': 1768498293,
'upload_date': '20260115',
'like_count': int,
'description': 'md5:f02a255fcf7dce6ed40e9494cf4bc49a',
},
'playlist_count': 3,
}, {
'url': 'https://www.facebook.com/ads/library/?id=1704834754236452',
'info_dict': {
'id': '1704834754236452',
'ext': 'mp4',
'title': 'Get answers now!',
'description': 'Ask the best psychics and get accurate answers on questions that bother you!',
'uploader': 'Your Relationship Advisor',
'uploader_id': '108939234726306',
'uploader_url': 'https://www.facebook.com/100068970634636/',
'like_count': int,
'thumbnail': r're:https://.+/.+\.jpg',
},
}, {
'url': 'https://es-la.facebook.com/ads/library/?id=901230958115569',
'only_matching': True,
@@ -1123,8 +1132,11 @@ class FacebookAdsIE(InfoExtractor):
post_data = traverse_obj(
re.findall(r'data-sjs>({.*?ScheduledServerJS.*?})</script>', webpage), (..., {json.loads}))
data = get_first(post_data, (
'require', ..., ..., ..., '__bbox', 'require', ..., ..., ...,
'entryPointRoot', 'otherProps', 'deeplinkAdCard', 'snapshot', {dict}))
'require', ..., ..., ..., '__bbox', 'require', ..., ..., ..., (
('__bbox', 'result', 'data', 'ad_library_main', 'deeplink_ad_archive_result', 'deeplink_ad_archive'),
# old path
('entryPointRoot', 'otherProps', 'deeplinkAdCard'),
), 'snapshot', {dict}))
if not data:
raise ExtractorError('Unable to extract ad data')
@@ -1140,11 +1152,12 @@ class FacebookAdsIE(InfoExtractor):
'title': title,
'description': markup or None,
}, traverse_obj(data, {
'description': ('link_description', {lambda x: x if not x.startswith('{{product.') else None}),
'description': (
(('body', 'text'), 'link_description'),
{lambda x: x if not x.startswith('{{product.') else None}, any),
'uploader': ('page_name', {str}),
'uploader_id': ('page_id', {str_or_none}),
'uploader_url': ('page_profile_uri', {url_or_none}),
'timestamp': ('creation_time', {int_or_none}),
'like_count': ('page_like_count', {int_or_none}),
}))
@@ -1155,7 +1168,8 @@ class FacebookAdsIE(InfoExtractor):
entries.append({
'id': f'{video_id}_{idx}',
'title': entry.get('title') or title,
'description': traverse_obj(entry, 'body', 'link_description') or info_dict.get('description'),
'description': traverse_obj(
entry, 'body', 'link_description', expected_type=str) or info_dict.get('description'),
'thumbnail': url_or_none(entry.get('video_preview_image_url')),
'formats': self._extract_formats(entry),
})


@@ -3,10 +3,12 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
mimetype2ext,
parse_qs,
unescapeHTML,
unified_strdate,
url_or_none,
)
@@ -107,6 +109,11 @@ class FirstTVIE(InfoExtractor):
'timestamp': ('dvr_begin_at', {int_or_none}),
'upload_date': ('date_air', {unified_strdate}),
'duration': ('duration', {int_or_none}),
'chapters': ('episodes', lambda _, v: float_or_none(v['from']) is not None, {
'start_time': ('from', {float_or_none}),
'title': ('name', {str}, {unescapeHTML}),
'end_time': ('to', {float_or_none}),
}),
}),
'id': video_id,
'formats': formats,


@@ -318,9 +318,48 @@ class FloatplaneIE(FloatplaneBaseIE):
self.raise_login_required()
class FloatplaneChannelIE(InfoExtractor):
class FloatplaneChannelBaseIE(InfoExtractor):
"""Subclasses must set _RESULT_IE, _BASE_URL and _PAGE_SIZE"""
def _fetch_page(self, display_id, creator_id, channel_id, page):
query = {
'id': creator_id,
'limit': self._PAGE_SIZE,
'fetchAfter': page * self._PAGE_SIZE,
}
if channel_id:
query['channel'] = channel_id
page_data = self._download_json(
f'{self._BASE_URL}/api/v3/content/creator', display_id,
query=query, note=f'Downloading page {page + 1}')
for post in page_data or []:
yield self.url_result(
f'{self._BASE_URL}/post/{post["id"]}',
self._RESULT_IE, id=post['id'], title=post.get('title'),
release_timestamp=parse_iso8601(post.get('releaseDate')))
def _real_extract(self, url):
creator, channel = self._match_valid_url(url).group('id', 'channel')
display_id = join_nonempty(creator, channel, delim='/')
creator_data = self._download_json(
f'{self._BASE_URL}/api/v3/creator/named',
display_id, query={'creatorURL[0]': creator})[0]
channel_data = traverse_obj(
creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
return self.playlist_result(OnDemandPagedList(functools.partial(
self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
display_id, title=channel_data.get('title') or creator_data.get('title'),
description=channel_data.get('about') or creator_data.get('about'))
class FloatplaneChannelIE(FloatplaneChannelBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
_BASE_URL = 'https://www.floatplane.com'
_PAGE_SIZE = 20
_RESULT_IE = FloatplaneIE
_TESTS = [{
'url': 'https://www.floatplane.com/channel/linustechtips/home/ltxexpo',
'info_dict': {
@@ -346,36 +385,3 @@ class FloatplaneChannelIE(InfoExtractor):
},
'playlist_mincount': 200,
}]
def _fetch_page(self, display_id, creator_id, channel_id, page):
query = {
'id': creator_id,
'limit': self._PAGE_SIZE,
'fetchAfter': page * self._PAGE_SIZE,
}
if channel_id:
query['channel'] = channel_id
page_data = self._download_json(
'https://www.floatplane.com/api/v3/content/creator', display_id,
query=query, note=f'Downloading page {page + 1}')
for post in page_data or []:
yield self.url_result(
f'https://www.floatplane.com/post/{post["id"]}',
FloatplaneIE, id=post['id'], title=post.get('title'),
release_timestamp=parse_iso8601(post.get('releaseDate')))
def _real_extract(self, url):
creator, channel = self._match_valid_url(url).group('id', 'channel')
display_id = join_nonempty(creator, channel, delim='/')
creator_data = self._download_json(
'https://www.floatplane.com/api/v3/creator/named',
display_id, query={'creatorURL[0]': creator})[0]
channel_data = traverse_obj(
creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
return self.playlist_result(OnDemandPagedList(functools.partial(
self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
display_id, title=channel_data.get('title') or creator_data.get('title'),
description=channel_data.get('about') or creator_data.get('about'))
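The refactor above hoists the page-fetching logic into `FloatplaneChannelBaseIE` so that subclasses (e.g. the new `SaucePlusChannelIE`) only supply `_BASE_URL`, `_PAGE_SIZE`, and `_RESULT_IE`. A simplified sketch of the offset-based paging pattern it uses, with `fake_api` standing in for `self._download_json` (the `limit`/`fetch_after` names mirror the real query parameters):

```python
import functools

PAGE_SIZE = 3

def fetch_page(fetch_json, page):
    # Each page request offsets by page * PAGE_SIZE, like the
    # 'fetchAfter' query parameter in the extractor above
    for post in fetch_json(limit=PAGE_SIZE, fetch_after=page * PAGE_SIZE) or []:
        yield post['id']

def fake_api(limit, fetch_after):
    # Stand-in for the /api/v3/content/creator endpoint
    posts = [{'id': f'post{i}'} for i in range(7)]
    return posts[fetch_after:fetch_after + limit]

# functools.partial binds the fetcher, leaving only the page number —
# the same shape OnDemandPagedList expects
pager = functools.partial(fetch_page, fake_api)
```

This is only an assumption-laden sketch of the control flow; the real class also threads `creator_id`/`channel_id` through the query.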


@@ -59,7 +59,7 @@ class GetCourseRuIE(InfoExtractor):
'marafon.mani-beauty.com',
'on.psbook.ru',
]
_BASE_URL_RE = rf'https?://(?:(?!player02\.)[^.]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
_BASE_URL_RE = rf'https?://(?:(?!player02\.)[a-zA-Z0-9-]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
_VALID_URL = [
rf'{_BASE_URL_RE}/(?!pl/|teach/)(?P<id>[^?#]+)',
rf'{_BASE_URL_RE}/(?:pl/)?teach/control/lesson/view\?(?:[^#]+&)?id=(?P<id>\d+)',
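The GetCourse change swaps the permissive `[^.]+` subdomain pattern for `[a-zA-Z0-9-]+`, in line with the shell-safe `netrc_machine` restriction. A small demonstration of the difference on bare hostnames:

```python
import re

# Old pattern: any non-dot characters, including shell metacharacters
OLD_HOST = re.compile(r'(?!player02\.)[^.]+\.getcourse\.ru')
# New pattern: alphanumerics and hyphens only
NEW_HOST = re.compile(r'(?!player02\.)[a-zA-Z0-9-]+\.getcourse\.ru')

def matches(pattern, host):
    return pattern.fullmatch(host) is not None
```

A hostname like `my;rm.getcourse.ru` matched the old pattern but is rejected by the new one, while ordinary subdomains still match.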


@@ -29,7 +29,7 @@ class LearningOnScreenIE(InfoExtractor):
}]
def _real_initialize(self):
if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'):
if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-LOS-LIVE'):
self.raise_login_required(method='session_cookies')
def _real_extract(self, url):

yt_dlp/extractor/locipo.py (new file, 209 lines)

@@ -0,0 +1,209 @@
import functools
import math
from .streaks import StreaksBaseIE
from ..networking import HEADRequest
from ..utils import (
InAdvancePagedList,
clean_html,
js_to_json,
parse_iso8601,
parse_qs,
str_or_none,
)
from ..utils.traversal import require, traverse_obj
class LocipoBaseIE(StreaksBaseIE):
_API_BASE = 'https://web-api.locipo.jp'
_BASE_URL = 'https://locipo.jp'
_UUID_RE = r'[\da-f]{8}(?:-[\da-f]{4}){3}-[\da-f]{12}'
def _call_api(self, path, item_id, note, fatal=True):
return self._download_json(
f'{self._API_BASE}/{path}', item_id,
f'Downloading {note} API JSON',
f'Unable to download {note} API JSON',
fatal=fatal)
class LocipoIE(LocipoBaseIE):
_VALID_URL = [
fr'https?://locipo\.jp/creative/(?P<id>{LocipoBaseIE._UUID_RE})',
fr'https?://locipo\.jp/embed/?\?(?:[^#]+&)?id=(?P<id>{LocipoBaseIE._UUID_RE})',
]
_TESTS = [{
'url': 'https://locipo.jp/creative/fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
'info_dict': {
'id': 'fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
'ext': 'mp4',
'title': 'リアルカレカノ#4 ~伊達さゆりと勉強しよっ?~',
'description': 'md5:70a40c202f3fb7946b61e55fa015094c',
'display_id': '5a2947fe596441f5bab88a61b0432d0d',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1711789200,
'release_date': '20240330',
'series': 'リアルカレカノ',
'series_id': '1142',
'tags': 'count:4',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1756984919,
'upload_date': '20250904',
'uploader': '東海テレビ',
'uploader_id': 'locipo-prod',
},
}, {
'url': 'https://locipo.jp/embed/?id=71a334a0-2b25-406f-9d96-88f341f571c2',
'info_dict': {
'id': '71a334a0-2b25-406f-9d96-88f341f571c2',
'ext': 'mp4',
'title': '#1 オーディション/ゲスト伊藤美来、豊田萌絵',
'description': 'md5:5bbcf532474700439cf56ceb6a15630e',
'display_id': '0ab32634b884499a84adb25de844c551',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1751623200,
'release_date': '20250704',
'series': '声優ラジオのウラカブリLocipo出張所',
'series_id': '1454',
'tags': 'count:6',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1757002966,
'upload_date': '20250904',
'uploader': 'テレビ愛知',
'uploader_id': 'locipo-prod',
},
}, {
'url': 'https://locipo.jp/creative/bff9950d-229b-4fe9-911a-7fa71a232f35?list=69a5b15c-901f-4828-a336-30c0de7612d3',
'info_dict': {
'id': '69a5b15c-901f-4828-a336-30c0de7612d3',
'title': '見て・乗って・語りたい。 東海の鉄道沼',
},
'playlist_mincount': 3,
}, {
'url': 'https://locipo.jp/creative/a0751a7f-c7dd-4a10-a7f1-e12720bdf16c?list=006cff3f-ba74-42f0-b4fd-241486ebda2b',
'info_dict': {
'id': 'a0751a7f-c7dd-4a10-a7f1-e12720bdf16c',
'ext': 'mp4',
'title': '#839 人間真空パック',
'description': 'md5:9fe190333b6975c5001c8c9cbe20d276',
'display_id': 'c2b4c9f4a6d648bd8e3c320e384b9d56',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1746239400,
'release_date': '20250503',
'series': 'でんじろう先生のはぴエネ!',
'series_id': '202',
'tags': 'count:3',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1756975909,
'upload_date': '20250904',
'uploader': '中京テレビ',
'uploader_id': 'locipo-prod',
},
'params': {'noplaylist': True},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
playlist_id = traverse_obj(parse_qs(url), ('list', -1, {str}))
if self._yes_playlist(playlist_id, video_id):
return self.url_result(
f'{self._BASE_URL}/playlist/{playlist_id}', LocipoPlaylistIE)
creatives = self._call_api(f'creatives/{video_id}', video_id, 'Creatives')
media_id = traverse_obj(creatives, ('media_id', {str}, {require('Streaks media ID')}))
webpage = self._download_webpage(url, video_id)
config = self._search_json(
r'window\.__NUXT__\.config\s*=', webpage, 'config', video_id, transform_source=js_to_json)
api_key = traverse_obj(config, ('public', 'streaksVodPlaybackApiKey', {str}, {require('api key')}))
return {
**self._extract_from_streaks_api('locipo-prod', media_id, headers={
'Origin': 'https://locipo.jp',
'X-Streaks-Api-Key': api_key,
}),
**traverse_obj(creatives, {
'title': ('name', {clean_html}),
'description': ('description', {clean_html}, filter),
'release_timestamp': ('publication_started_at', {parse_iso8601}),
'tags': ('keyword', {clean_html}, {lambda x: x.split(',')}, ..., {str.strip}, filter),
'uploader': ('company', 'name', {clean_html}, filter),
}),
**traverse_obj(creatives, ('series', {
'series': ('name', {clean_html}, filter),
'series_id': ('id', {str_or_none}),
})),
'id': video_id,
}
class LocipoPlaylistIE(LocipoBaseIE):
_VALID_URL = [
fr'https?://locipo\.jp/(?P<type>playlist)/(?P<id>{LocipoBaseIE._UUID_RE})',
r'https?://locipo\.jp/(?P<type>series)/(?P<id>\d+)',
]
_TESTS = [{
'url': 'https://locipo.jp/playlist/35d3dd2b-531d-4824-8575-b1c527d29538',
'info_dict': {
'id': '35d3dd2b-531d-4824-8575-b1c527d29538',
'title': 'レシピ集',
},
'playlist_mincount': 135,
}, {
# Redirects to https://locipo.jp/series/1363
'url': 'https://locipo.jp/playlist/fef7c4fb-741f-4d6a-a3a6-754f354302a2',
'info_dict': {
'id': '1363',
'title': 'CBCアナウンサー公式【みてちょてれび】',
'description': 'md5:50a1b23e63112d5c06c882835c8c1fb1',
},
'playlist_mincount': 38,
}, {
'url': 'https://locipo.jp/series/503',
'info_dict': {
'id': '503',
'title': 'FishingLover東海',
'description': '東海地区の釣り場でフィッシングの魅力を余すところなくご紹介!!',
},
'playlist_mincount': 223,
}]
_PAGE_SIZE = 100
def _fetch_page(self, path, playlist_id, page):
creatives = self._download_json(
f'{self._API_BASE}/{path}/{playlist_id}/creatives',
playlist_id, f'Downloading page {page + 1}', query={
'premium': False,
'live': False,
'limit': self._PAGE_SIZE,
'offset': page * self._PAGE_SIZE,
})
for video_id in traverse_obj(creatives, ('items', ..., 'id', {str})):
yield self.url_result(f'{self._BASE_URL}/creative/{video_id}', LocipoIE)
def _real_extract(self, url):
playlist_type, playlist_id = self._match_valid_url(url).group('type', 'id')
if urlh := self._request_webpage(HEADRequest(url), playlist_id, fatal=False):
playlist_type, playlist_id = self._match_valid_url(urlh.url).group('type', 'id')
path = 'playlists' if playlist_type == 'playlist' else 'series'
creatives = self._call_api(
f'{path}/{playlist_id}/creatives', playlist_id, path.capitalize())
entries = InAdvancePagedList(
functools.partial(self._fetch_page, path, playlist_id),
math.ceil(int(creatives['total']) / self._PAGE_SIZE), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
**traverse_obj(creatives, ('items', ..., playlist_type, {
'title': ('name', {clean_html}, filter),
'description': ('description', {clean_html}, filter),
}, any)))
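`LocipoPlaylistIE` uses `InAdvancePagedList` because the API reports the total item count up front, so the page count is known before any page is fetched. A sketch of that arithmetic and the per-page query (constants mirror the extractor; the helper names are illustrative):

```python
import math

PAGE_SIZE = 100

def page_count(total):
    # Number of pages needed to cover `total` items
    return math.ceil(int(total) / PAGE_SIZE)

def page_query(page):
    # Mirrors the limit/offset query sent by _fetch_page
    return {'limit': PAGE_SIZE, 'offset': page * PAGE_SIZE}
```

For the `FishingLover東海` series test above (223+ items), this yields at least three page requests.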


@@ -0,0 +1,38 @@
from .common import InfoExtractor
from ..utils import join_nonempty, unified_strdate
from ..utils.traversal import traverse_obj
class MatchiTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?matchi\.tv/watch/?\?(?:[^#]+&)?s=(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://matchi.tv/watch?s=0euhjzrxsjm',
'info_dict': {
'id': '0euhjzrxsjm',
'ext': 'mp4',
'title': 'Court 2 at Stratford Padel Club 2024-07-13T18:32:24',
'thumbnail': 'https://thumbnails.padelgo.tv/0euhjzrxsjm.jpg',
'upload_date': '20240713',
},
}, {
'url': 'https://matchi.tv/watch?s=FkKDJ9SvAx1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
loaded_media = traverse_obj(
self._search_nextjs_data(webpage, video_id, fatal=False),
('props', 'pageProps', 'loadedMedia', {dict})) or {}
start_date_time = traverse_obj(loaded_media, ('startDateTime', {str}))
return {
'id': video_id,
'title': join_nonempty(loaded_media.get('courtDescription'), start_date_time, delim=' '),
'thumbnail': f'https://thumbnails.padelgo.tv/{video_id}.jpg',
'upload_date': unified_strdate(start_date_time),
'formats': self._extract_m3u8_formats(
f'https://streams.padelgo.tv/v2/streams/m3u8/{video_id}/anonymous/playlist.m3u8',
video_id, 'mp4', m3u8_id='hls'),
}


@@ -25,7 +25,7 @@ class MixcloudBaseIE(InfoExtractor):
%s
}
}''' % (lookup_key, username, f', slug: "{slug}"' if slug else '', object_fields), # noqa: UP031
})['data'][lookup_key]
}, impersonate=True)['data'][lookup_key]
class MixcloudIE(MixcloudBaseIE):


@@ -9,13 +9,13 @@ from ..utils import (
int_or_none,
qualities,
smuggle_url,
traverse_obj,
unescapeHTML,
unified_strdate,
unsmuggle_url,
url_or_none,
urlencode_postdata,
)
from ..utils.traversal import find_element, traverse_obj
class OdnoklassnikiIE(InfoExtractor):
@@ -264,9 +264,7 @@ class OdnoklassnikiIE(InfoExtractor):
note='Downloading desktop webpage',
headers={'Referer': smuggled['referrer']} if smuggled.get('referrer') else {})
error = self._search_regex(
r'[^>]+class="vp_video_stub_txt"[^>]*>([^<]+)<',
webpage, 'error', default=None)
error = traverse_obj(webpage, {find_element(cls='vp_video_stub_txt')})
# Direct link from boosty
if (error == 'The author of this video has not been found or is blocked'
and not smuggled.get('referrer') and mode == 'videoembed'):


@@ -33,7 +33,8 @@ class OpencastBaseIE(InfoExtractor):
vid\.igb\.illinois\.edu|
cursosabertos\.c3sl\.ufpr\.br|
mcmedia\.missioncollege\.org|
clases\.odon\.edu\.uy
clases\.odon\.edu\.uy|
oc-p\.uni-jena\.de
)'''
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
@@ -106,7 +107,7 @@ class OpencastBaseIE(InfoExtractor):
class OpencastIE(OpencastBaseIE):
_VALID_URL = rf'''(?x)
https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella/ui/watch\.html\?
https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella[0-9]*/ui/watch\.html\?
(?:[^#]+&)?id=(?P<id>{OpencastBaseIE._UUID_RE})'''
_API_BASE = 'https://%s/search/episode.json?id=%s'
@@ -131,8 +132,12 @@ class OpencastIE(OpencastBaseIE):
def _real_extract(self, url):
host, video_id = self._match_valid_url(url).group('host', 'id')
return self._parse_mediapackage(
self._call_api(host, video_id)['search-results']['result']['mediapackage'])
response = self._call_api(host, video_id)
package = traverse_obj(response, (
('search-results', 'result'),
('result', ...), # Path needed for oc-p.uni-jena.de
'mediapackage', {dict}, any)) or {}
return self._parse_mediapackage(package)
class OpencastPlaylistIE(OpencastBaseIE):


@@ -128,7 +128,7 @@ class PornHubIE(PornHubBaseIE):
_VALID_URL = rf'''(?x)
https?://
(?:
(?:[^/]+\.)?
(?:[a-zA-Z0-9.-]+\.)?
{PornHubBaseIE._PORNHUB_HOST_RE}
/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/
@@ -506,6 +506,7 @@ class PornHubIE(PornHubBaseIE):
'cast': ({find_elements(attr='data-label', value='pornstar')}, ..., {clean_html}),
}),
'subtitles': subtitles,
'http_headers': {'Referer': f'https://www.{host}/'},
}, info)
@@ -533,7 +534,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = rf'(?P<url>https?://(?:[^/]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_VALID_URL = rf'(?P<url>https?://(?:[a-zA-Z0-9.-]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118,


@@ -1,4 +1,4 @@
from .floatplane import FloatplaneBaseIE
from .floatplane import FloatplaneBaseIE, FloatplaneChannelBaseIE
class SaucePlusIE(FloatplaneBaseIE):
@@ -39,3 +39,19 @@ class SaucePlusIE(FloatplaneBaseIE):
def _real_initialize(self):
if not self._get_cookies(self._BASE_URL).get('__Host-sp-sess'):
self.raise_login_required()
class SaucePlusChannelIE(FloatplaneChannelBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?sauceplus\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
_BASE_URL = 'https://www.sauceplus.com'
_RESULT_IE = SaucePlusIE
_PAGE_SIZE = 20
_TESTS = [{
'url': 'https://www.sauceplus.com/channel/williamosman/home',
'info_dict': {
'id': 'williamosman',
'title': 'William Osman',
'description': 'md5:a67bc961d23c293b2c5308d84f34f26c',
},
'playlist_mincount': 158,
}]


@@ -146,8 +146,8 @@ class SBSIE(InfoExtractor):
'release_year': ('releaseYear', {int_or_none}),
'duration': ('duration', ({float_or_none}, {parse_duration})),
'is_live': ('liveStream', {bool}),
'age_limit': (('classificationID', 'contentRating'), {str.upper}, {
lambda x: self._AUS_TV_PARENTAL_GUIDELINES.get(x)}), # dict.get is unhashable in py3.7
'age_limit': (
('classificationID', 'contentRating'), {str.upper}, {self._AUS_TV_PARENTAL_GUIDELINES.get}),
}, get_all=False),
**traverse_obj(media, {
'categories': (('genres', ...), ('taxonomy', ('genre', 'subgenre'), 'name'), {str}),


@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor, SearchInfoExtractor
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..networking.impersonate import ImpersonateTarget
from ..utils import (
ExtractorError,
float_or_none,
@@ -118,9 +119,9 @@ class SoundcloudBaseIE(InfoExtractor):
self.cache.store('soundcloud', 'client_id', client_id)
def _update_client_id(self):
webpage = self._download_webpage('https://soundcloud.com/', None)
webpage = self._download_webpage('https://soundcloud.com/', None, 'Downloading main page')
for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
script = self._download_webpage(src, None, fatal=False)
script = self._download_webpage(src, None, 'Downloading JS asset', fatal=False)
if script:
client_id = self._search_regex(
r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
@@ -136,13 +137,13 @@ class SoundcloudBaseIE(InfoExtractor):
if non_fatal:
del kwargs['fatal']
query = kwargs.get('query', {}).copy()
for _ in range(2):
for is_first_attempt in (True, False):
query['client_id'] = self._CLIENT_ID
kwargs['query'] = query
try:
return self._download_json(*args, **kwargs)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
if is_first_attempt and isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
self._store_client_id(None)
self._update_client_id()
continue
@@ -152,7 +153,10 @@ class SoundcloudBaseIE(InfoExtractor):
raise
def _initialize_pre_login(self):
self._CLIENT_ID = self.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
self._CLIENT_ID = self.cache.load('soundcloud', 'client_id')
if self._CLIENT_ID:
return
self._update_client_id()
def _verify_oauth_token(self, token):
if self._request_webpage(
@@ -830,6 +834,30 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
'entries': self._entries(base_url, playlist_id),
}
@functools.cached_property
def _browser_impersonate_target(self):
available_targets = self._downloader._get_available_impersonate_targets()
if not available_targets:
# impersonate=True gives a generic warning when no impersonation targets are available
return True
# Any browser target older than chrome-116 is 403'd by Datadome
MIN_SUPPORTED_TARGET = ImpersonateTarget('chrome', '116', 'windows', '10')
version_as_float = lambda x: float(x.version) if x.version else 0
# Always try to use the newest Chrome target available
filtered = sorted([
target[0] for target in available_targets
if target[0].client == 'chrome' and target[0].os in ('windows', 'macos')
], key=version_as_float)
if not filtered or version_as_float(filtered[-1]) < version_as_float(MIN_SUPPORTED_TARGET):
# All available targets are inadequate or newest available Chrome target is too old, so
# warn the user to upgrade their dependency to a version with the minimum supported target
return MIN_SUPPORTED_TARGET
return filtered[-1]
def _entries(self, url, playlist_id):
# Per the SoundCloud documentation, the maximum limit for a linked partitioning query is 200.
# https://developers.soundcloud.com/blog/offset-pagination-deprecated
@@ -844,7 +872,9 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
try:
response = self._call_api(
url, playlist_id, query=query, headers=self._HEADERS,
note=f'Downloading track page {i + 1}')
note=f'Downloading track page {i + 1}',
# See: https://github.com/yt-dlp/yt-dlp/issues/15660
impersonate=self._browser_impersonate_target)
break
except ExtractorError as e:
# Downloading page may result in intermittent 502 HTTP error
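The `_browser_impersonate_target` property above picks the newest available Chrome target on Windows/macOS, since Datadome rejects anything older than chrome-116. A self-contained sketch of that selection logic, with a `namedtuple` standing in for yt-dlp's `ImpersonateTarget`:

```python
from collections import namedtuple

Target = namedtuple('Target', ['client', 'version', 'os'])
MIN_SUPPORTED = Target('chrome', '116', 'windows')

def pick_target(available):
    if not available:
        # No targets at all: return True so the framework emits its
        # generic "no impersonation available" warning
        return True
    version = lambda t: float(t.version) if t.version else 0
    # Keep only Chrome targets on desktop OSes, sorted oldest to newest
    chrome = sorted(
        (t for t in available if t.client == 'chrome' and t.os in ('windows', 'macos')),
        key=version)
    if not chrome or version(chrome[-1]) < version(MIN_SUPPORTED):
        # Nothing adequate: return the minimum supported target to
        # prompt a dependency-upgrade hint downstream
        return MIN_SUPPORTED
    return chrome[-1]
```

This mirrors the cached property's control flow under the stated assumptions; the real code operates on `(target, handler)` tuples from `_get_available_impersonate_targets()`.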


@@ -3,6 +3,7 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
merge_dicts,
parse_duration,
@@ -12,6 +13,7 @@ from ..utils import (
urlencode_postdata,
urljoin,
)
from ..utils.traversal import find_element, traverse_obj, trim_str
class SpankBangIE(InfoExtractor):
@@ -122,7 +124,7 @@ class SpankBangIE(InfoExtractor):
}), headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
})
}, impersonate=True)
for format_id, format_url in stream.items():
if format_url and isinstance(format_url, list):
@@ -178,9 +180,9 @@ class SpankBangPlaylistIE(InfoExtractor):
def _real_extract(self, url):
mobj = self._match_valid_url(url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(
url, playlist_id, headers={'Cookie': 'country=US; mobile=on'})
country = self.get_param('geo_bypass_country') or 'US'
self._set_cookie('.spankbang.com', 'country', country.upper())
webpage = self._download_webpage(url, playlist_id, impersonate=True)
entries = [self.url_result(
urljoin(url, mobj.group('path')),
@@ -189,8 +191,8 @@ class SpankBangPlaylistIE(InfoExtractor):
r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/[^"\'](?:(?!\1).)*)\1',
webpage)]
title = self._html_search_regex(
r'<em>([^<]+)</em>\s+playlist\s*<', webpage, 'playlist title',
fatal=False)
title = traverse_obj(webpage, (
{find_element(tag='h1', attr='data-testid', value='playlist-title')},
{clean_html}, {trim_str(end=' Playlist')}))
return self.playlist_result(entries, playlist_id, title)


@@ -8,15 +8,12 @@ from ..utils import (
extract_attributes,
join_nonempty,
js_to_json,
parse_resolution,
str_or_none,
url_basename,
url_or_none,
)
from ..utils.traversal import (
find_element,
find_elements,
traverse_obj,
trim_str,
)
from ..utils.traversal import find_element, traverse_obj
class SteamIE(InfoExtractor):
@@ -27,7 +24,7 @@ class SteamIE(InfoExtractor):
'id': '105600',
'title': 'Terraria',
},
'playlist_mincount': 3,
'playlist_mincount': 5,
}, {
'url': 'https://store.steampowered.com/app/271590/Grand_Theft_Auto_V/',
'info_dict': {
@@ -37,6 +34,39 @@ class SteamIE(InfoExtractor):
'playlist_mincount': 26,
}]
def _entries(self, app_id, app_name, data_props):
for trailer in traverse_obj(data_props, (
'trailers', lambda _, v: str_or_none(v['id']),
)):
movie_id = str_or_none(trailer['id'])
thumbnails = []
for thumbnail_url in traverse_obj(trailer, (
('poster', 'thumbnail'), {url_or_none},
)):
thumbnails.append({
'url': thumbnail_url,
**parse_resolution(url_basename(thumbnail_url)),
})
formats = []
if hls_manifest := traverse_obj(trailer, ('hlsManifest', {url_or_none})):
formats.extend(self._extract_m3u8_formats(
hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
for dash_manifest in traverse_obj(trailer, ('dashManifests', ..., {url_or_none})):
formats.extend(self._extract_mpd_formats(
dash_manifest, app_id, mpd_id='dash', fatal=False))
self._remove_duplicate_formats(formats)
yield {
'id': join_nonempty(app_id, movie_id),
'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
'formats': formats,
'series': app_name,
'series_id': app_id,
'thumbnails': thumbnails,
}
def _real_extract(self, url):
app_id = self._match_id(url)
@@ -45,32 +75,13 @@ class SteamIE(InfoExtractor):
self._set_cookie('store.steampowered.com', 'lastagecheckage', '1-January-2000')
webpage = self._download_webpage(url, app_id)
app_name = traverse_obj(webpage, ({find_element(cls='apphub_AppName')}, {clean_html}))
data_props = traverse_obj(webpage, (
{find_element(cls='gamehighlight_desktopcarousel', html=True)},
{extract_attributes}, 'data-props', {json.loads}, {dict}))
app_name = traverse_obj(data_props, ('appName', {clean_html}))
entries = []
for data_prop in traverse_obj(webpage, (
{find_elements(cls='highlight_player_item highlight_movie', html=True)},
..., {extract_attributes}, 'data-props', {json.loads}, {dict},
)):
formats = []
if hls_manifest := traverse_obj(data_prop, ('hlsManifest', {url_or_none})):
formats.extend(self._extract_m3u8_formats(
hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
for dash_manifest in traverse_obj(data_prop, ('dashManifests', ..., {url_or_none})):
formats.extend(self._extract_mpd_formats(
dash_manifest, app_id, mpd_id='dash', fatal=False))
movie_id = traverse_obj(data_prop, ('id', {trim_str(start='highlight_movie_')}))
entries.append({
'id': movie_id,
'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
'formats': formats,
'series': app_name,
'series_id': app_id,
'thumbnail': traverse_obj(data_prop, ('screenshot', {url_or_none})),
})
return self.playlist_result(entries, app_id, app_name)
return self.playlist_result(
self._entries(app_id, app_name, data_props), app_id, app_name)
class SteamCommunityIE(InfoExtractor):


@@ -22,7 +22,7 @@ class StreaksBaseIE(InfoExtractor):
_GEO_BYPASS = False
_GEO_COUNTRIES = ['JP']
def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False):
def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False, live_from_start=False):
try:
response = self._download_json(
self._API_URL_TEMPLATE.format('playback', project_id, media_id, ''),
@@ -83,6 +83,10 @@ class StreaksBaseIE(InfoExtractor):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, media_id, 'mp4', m3u8_id='hls', fatal=False, live=is_live, query=query)
for fmt in fmts:
if live_from_start:
fmt.setdefault('downloader_options', {}).update({'ffmpeg_args': ['-live_start_index', '0']})
fmt['is_from_start'] = True
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)


@@ -102,7 +102,7 @@ class TeachableIE(TeachableBaseIE):
_WORKING = False
_VALID_URL = r'''(?x)
(?:
{}https?://(?P<site_t>[^/]+)|
{}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
https?://(?:www\.)?(?P<site>{})
)
/courses/[^/]+/lectures/(?P<id>\d+)
@@ -211,7 +211,7 @@ class TeachableIE(TeachableBaseIE):
class TeachableCourseIE(TeachableBaseIE):
_VALID_URL = r'''(?x)
(?:
{}https?://(?P<site_t>[^/]+)|
{}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
https?://(?:www\.)?(?P<site>{})
)
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)


@@ -9,39 +9,39 @@ class Tele5IE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?tele5\.de/(?P<parent_slug>[\w-]+)/(?P<slug_a>[\w-]+)(?:/(?P<slug_b>[\w-]+))?'
_TESTS = [{
# slug_a and slug_b
'url': 'https://tele5.de/mediathek/stargate-atlantis/quarantane',
'url': 'https://tele5.de/mediathek/star-trek-enterprise/vox-sola',
'info_dict': {
'id': '6852024',
'id': '4140114',
'ext': 'mp4',
'title': 'Quarantäne',
'description': 'md5:6af0373bd0fcc4f13e5d47701903d675',
'episode': 'Episode 73',
'episode_number': 73,
'season': 'Season 4',
'season_number': 4,
'series': 'Stargate Atlantis',
'upload_date': '20240525',
'timestamp': 1716643200,
'duration': 2503.2,
'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/21/c81fcb45-8902-309b-badb-4e6d546b575d.jpeg',
'creators': ['Tele5'],
'title': 'Vox Sola',
'description': 'md5:329d115f74324d4364efc1a11c4ea7c9',
'duration': 2542.76,
'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
'tags': [],
'creators': ['Tele5'],
'series': 'Star Trek - Enterprise',
'season': 'Season 1',
'season_number': 1,
'episode': 'Episode 22',
'episode_number': 22,
'timestamp': 1770491100,
'upload_date': '20260207',
},
}, {
# only slug_a
'url': 'https://tele5.de/mediathek/inside-out',
'url': 'https://tele5.de/mediathek/30-miles-from-nowhere-im-wald-hoert-dich-niemand-schreien',
'info_dict': {
'id': '6819502',
'id': '4102641',
'ext': 'mp4',
'title': 'Inside out',
'description': 'md5:7e5f32ed0be5ddbd27713a34b9293bfd',
'series': 'Inside out',
'upload_date': '20240523',
'timestamp': 1716494400,
'duration': 5343.4,
'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/15/181eba3c-f9f0-3faf-b14d-0097050a3aa4.jpeg',
'creators': ['Tele5'],
'title': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
'description': 'md5:0b731539f39ee186ebcd9dd444a86fc2',
'duration': 4849.96,
'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
'tags': [],
'creators': ['Tele5'],
'series': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
'timestamp': 1770417300,
'upload_date': '20260206',
},
}, {
# playlist
@@ -50,20 +50,27 @@ class Tele5IE(DiscoveryPlusBaseIE):
'id': 'mediathek-schlefaz',
},
'playlist_mincount': 3,
'skip': 'Dead link',
}]
def _real_extract(self, url):
parent_slug, slug_a, slug_b = self._match_valid_url(url).group('parent_slug', 'slug_a', 'slug_b')
playlist_id = join_nonempty(parent_slug, slug_a, slug_b, delim='-')
query = {'environment': 'tele5', 'v': '2'}
query = {
'include': 'default',
'filter[environment]': 'tele5',
'v': '2',
}
if not slug_b:
endpoint = f'page/{slug_a}'
query['parent_slug'] = parent_slug
else:
endpoint = f'videos/{slug_b}'
query['filter[show.slug]'] = slug_a
cms_data = self._download_json(f'https://de-api.loma-cms.com/feloma/{endpoint}/', playlist_id, query=query)
endpoint = f'shows/{slug_a}'
query['filter[video.slug]'] = slug_b
cms_data = self._download_json(f'https://public.aurora.enhanced.live/site/{endpoint}/', playlist_id, query=query)
return self.playlist_result(map(
functools.partial(self._get_disco_api_info, url, disco_host='eu1-prod.disco-api.com', realm='dmaxde', country='DE'),

View File

@@ -51,7 +51,8 @@ class TruthIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
status = self._download_json(f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id)
status = self._download_json(
f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id, impersonate=True)
uploader_id = strip_or_none(traverse_obj(status, ('account', 'username')))
return {
'id': video_id,


@@ -4,6 +4,7 @@ from .streaks import StreaksBaseIE
from ..utils import (
ExtractorError,
GeoRestrictedError,
clean_html,
int_or_none,
join_nonempty,
make_archive_id,
@@ -11,7 +12,9 @@ from ..utils import (
str_or_none,
strip_or_none,
time_seconds,
unified_timestamp,
update_url_query,
url_or_none,
)
from ..utils.traversal import require, traverse_obj
@@ -257,3 +260,113 @@ class TVerIE(StreaksBaseIE):
'id': video_id,
'_old_archive_ids': [make_archive_id('BrightcoveNew', brightcove_id)] if brightcove_id else None,
}
class TVerOlympicIE(StreaksBaseIE):
IE_NAME = 'tver:olympic'
_API_BASE = 'https://olympic-data.tver.jp/api'
_VALID_URL = r'https?://(?:www\.)?tver\.jp/olympic/milanocortina2026/(?P<type>live|video)/play/(?P<id>\w+)'
_TESTS = [{
'url': 'https://tver.jp/olympic/milanocortina2026/video/play/3b1d4462150b42558d9cc8aabb5238d0/',
'info_dict': {
'id': '3b1d4462150b42558d9cc8aabb5238d0',
'ext': 'mp4',
'title': '【開会式】ぎゅっと凝縮ハイライト',
'display_id': 'ref:3b1d4462150b42558d9cc8aabb5238d0',
'duration': 712.045,
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'tags': 'count:1',
'thumbnail': r're:https://.+\.(?:jpg|png)',
'timestamp': 1770420187,
'upload_date': '20260206',
'uploader_id': 'tver-olympic',
},
}, {
'url': 'https://tver.jp/olympic/milanocortina2026/live/play/glts313itwvj/',
'info_dict': {
'id': 'glts313itwvj',
'ext': 'mp4',
'title': '開会式ハイライト',
'channel_id': 'ntv',
'display_id': 'ref:sp_260207_spc_01_dvr',
'duration': 7680,
'live_status': 'was_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'thumbnail': r're:https://.+\.(?:jpg|png)',
'timestamp': 1770420300,
'upload_date': '20260206',
'uploader_id': 'tver-olympic-live',
},
}]
def _real_extract(self, url):
video_type, video_id = self._match_valid_url(url).group('type', 'id')
live_from_start = self.get_param('live_from_start')
if video_type == 'live':
project_id = 'tver-olympic-live'
api_key = 'a35ebb1ca7d443758dc7fcc5d99b1f72'
olympic_data = traverse_obj(self._download_json(
f'{self._API_BASE}/live/{video_id}', video_id), ('contents', 'live', {dict}))
media_id = traverse_obj(olympic_data, ('video_id', {str}))
now = time_seconds()
start_timestamp_str = traverse_obj(olympic_data, ('onair_start_date', {str}))
start_timestamp = unified_timestamp(start_timestamp_str, tz_offset=9)
if not start_timestamp:
raise ExtractorError('Unable to extract on-air start time')
end_timestamp = traverse_obj(olympic_data, (
'onair_end_date', {unified_timestamp(tz_offset=9)}, {require('on-air end time')}))
if now < start_timestamp:
self.raise_no_formats(
f'This program is scheduled to start at {start_timestamp_str} JST', expected=True)
return {
'id': video_id,
'live_status': 'is_upcoming',
'release_timestamp': start_timestamp,
}
elif start_timestamp <= now < end_timestamp:
live_status = 'is_live'
if live_from_start:
media_id += '_dvr'
elif end_timestamp <= now:
dvr_end_timestamp = traverse_obj(olympic_data, (
'dvr_end_date', {unified_timestamp(tz_offset=9)}))
if dvr_end_timestamp and now < dvr_end_timestamp:
live_status = 'was_live'
media_id += '_dvr'
else:
raise ExtractorError(
'This program is no longer available', expected=True)
else:
project_id = 'tver-olympic'
api_key = '4b55a4db3cce4ad38df6dd8543e3e46a'
media_id = video_id
live_status = 'not_live'
olympic_data = traverse_obj(self._download_json(
f'{self._API_BASE}/video/{video_id}', video_id), ('contents', 'video', {dict}))
return {
**self._extract_from_streaks_api(project_id, f'ref:{media_id}', {
'Origin': 'https://tver.jp',
'Referer': 'https://tver.jp/',
'X-Streaks-Api-Key': api_key,
}, live_from_start=live_from_start),
**traverse_obj(olympic_data, {
'title': ('title', {clean_html}, filter),
'alt_title': ('sub_title', {clean_html}, filter),
'channel': ('channel', {clean_html}, filter),
'channel_id': ('channel_id', {clean_html}, filter),
'description': (('description', 'description_l', 'description_s'), {clean_html}, filter, any),
'timestamp': ('onair_start_date', {unified_timestamp(tz_offset=9)}),
'thumbnail': (('picture_l_url', 'picture_m_url', 'picture_s_url'), {url_or_none}, any),
}),
'id': video_id,
'live_status': live_status,
}
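The live/DVR branching in `TVerOlympicIE` above can be condensed into a small decision function; this is a hedged sketch of the logic (names and the `'unavailable'` return value are illustrative, not from the extractor):

```python
def classify_live(now, start, end, dvr_end=None):
    # Mirrors the timestamp branching in TVerOlympicIE._real_extract (sketch only)
    if now < start:
        return 'is_upcoming'
    if start <= now < end:
        return 'is_live'
    # Stream has ended: a DVR replay may still be available for a while
    if dvr_end is not None and now < dvr_end:
        return 'was_live'
    return 'unavailable'
```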

yt_dlp/extractor/tvo.py (new file)

@@ -0,0 +1,152 @@
import json
import urllib.parse
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
parse_duration,
parse_iso8601,
smuggle_url,
str_or_none,
url_or_none,
)
from ..utils.traversal import (
require,
traverse_obj,
trim_str,
)
class TvoIE(InfoExtractor):
IE_NAME = 'TVO'
_VALID_URL = r'https?://(?:www\.)?tvo\.org/video(?:/documentaries)?/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.tvo.org/video/how-can-ontario-survive-the-trade-war',
'info_dict': {
'id': '6377531034112',
'ext': 'mp4',
'title': 'How Can Ontario Survive the Trade War?',
'description': 'md5:e7455d9cd4b6b1270141922044161457',
'display_id': 'how-can-ontario-survive-the-trade-war',
'duration': 3531,
'episode': 'How Can Ontario Survive the Trade War?',
'episode_id': 'how-can-ontario-survive-the-trade-war',
'episode_number': 1,
'season': 'Season 1',
'season_number': 1,
'series': 'TVO at AMO',
'series_id': 'tvo-at-amo',
'tags': 'count:17',
'thumbnail': r're:https?://.+',
'timestamp': 1756944016,
'upload_date': '20250904',
'uploader_id': '18140038001',
},
}, {
'url': 'https://www.tvo.org/video/documentaries/the-pitch',
'info_dict': {
'id': '6382500333112',
'ext': 'mp4',
'title': 'The Pitch',
'categories': ['Documentaries'],
'description': 'md5:9d4246b70dce772a3a396c4bd84c8506',
'display_id': 'the-pitch',
'duration': 5923,
'episode': 'The Pitch',
'episode_id': 'the-pitch',
'episode_number': 1,
'season': 'Season 1',
'season_number': 1,
'series': 'The Pitch',
'series_id': 'the-pitch',
'tags': 'count:8',
'thumbnail': r're:https?://.+',
'timestamp': 1762693216,
'upload_date': '20251109',
'uploader_id': '18140038001',
},
}, {
'url': 'https://www.tvo.org/video/documentaries/valentines-day',
'info_dict': {
'id': '6387298331112',
'ext': 'mp4',
'title': 'Valentine\'s Day',
'categories': ['Documentaries'],
'description': 'md5:b142149beb2d3a855244816c50cd2f14',
'display_id': 'valentines-day',
'duration': 3121,
'episode': 'Valentine\'s Day',
'episode_id': 'valentines-day',
'episode_number': 2,
'season': 'Season 1',
'season_number': 1,
'series': 'How We Celebrate',
'series_id': 'how-we-celebrate',
'tags': 'count:6',
'thumbnail': r're:https?://.+',
'timestamp': 1770386416,
'upload_date': '20260206',
'uploader_id': '18140038001',
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/18140038001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
video_data = self._download_json(
'https://hmy0rc1bo2.execute-api.ca-central-1.amazonaws.com/graphql',
display_id, headers={'Content-Type': 'application/json'},
data=json.dumps({
'operationName': 'getVideo',
'variables': {'slug': urllib.parse.urlparse(url).path.rstrip('/')},
'query': '''query getVideo($slug: String) {
getTVOOrgVideo(slug: $slug) {
contentCategory
description
length
program {
nodeUrl
title
}
programOrder
publishedAt
season
tags
thumbnail
title
videoSource {
brightcoveRefId
}
}
}''',
}, separators=(',', ':')).encode(),
)['data']['getTVOOrgVideo']
brightcove_id = traverse_obj(video_data, (
'videoSource', 'brightcoveRefId', {str_or_none}, {require('Brightcove ID')}))
return {
'_type': 'url_transparent',
'ie_key': BrightcoveNewIE.ie_key(),
'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['CA']}),
'display_id': display_id,
'episode_id': display_id,
**traverse_obj(video_data, {
'title': ('title', {clean_html}, filter),
'categories': ('contentCategory', {clean_html}, filter, all, filter),
'description': ('description', {clean_html}, filter),
'duration': ('length', {parse_duration}),
'episode': ('title', {clean_html}, filter),
'episode_number': ('programOrder', {int_or_none}),
'season_number': ('season', {int_or_none}),
'tags': ('tags', ..., {clean_html}, filter),
'thumbnail': ('thumbnail', {url_or_none}),
'timestamp': ('publishedAt', {parse_iso8601}),
}),
**traverse_obj(video_data, ('program', {
'series': ('title', {clean_html}, filter),
'series_id': ('nodeUrl', {clean_html}, {trim_str(start='/programs/')}, filter),
})),
}
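The GraphQL `slug` variable above is just the URL path with any trailing slash stripped; for illustration:

```python
import urllib.parse

# The extractor passes the URL path (sans trailing slash) as the GraphQL slug
url = 'https://www.tvo.org/video/documentaries/the-pitch'
slug = urllib.parse.urlparse(url).path.rstrip('/')
```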


@@ -131,11 +131,15 @@ class TwitterBaseIE(InfoExtractor):
video_id, headers=headers, query=query, expected_status=allowed_status,
note=f'Downloading {"GraphQL" if graphql else "legacy API"} JSON')
if result.get('errors'):
errors = ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str}))))
if errors and 'not authorized' in errors:
self.raise_login_required(remove_end(errors, '.'))
raise ExtractorError(f'Error(s) while querying API: {errors or "Unknown error"}')
if error_msg := ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str})))):
# Errors with the message 'Dependency: Unspecified' are a false positive
# See https://github.com/yt-dlp/yt-dlp/issues/15963
if error_msg.lower() == 'dependency: unspecified':
self.write_debug(f'Ignoring Twitter API error: "{error_msg}"')
elif 'not authorized' in error_msg.lower():
self.raise_login_required(remove_end(error_msg, '.'))
else:
raise ExtractorError(f'Error(s) while querying API: {error_msg or "Unknown error"}')
return result
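The Twitter error handling above deduplicates and joins the API error messages before classifying them; a hedged, dependency-free sketch (plain dict access stands in for `traverse_obj`, and sorting replaces the unordered set join):

```python
def summarize_api_errors(result):
    # Collect unique string error messages, comma-joined
    messages = {
        err.get('message') for err in result.get('errors') or []
        if isinstance(err.get('message'), str)
    }
    return ', '.join(sorted(messages))

def is_ignorable(error_msg):
    # 'Dependency: Unspecified' is a known false positive (issue #15963)
    return error_msg.lower() == 'dependency: unspecified'
```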
@@ -1078,7 +1082,7 @@ class TwitterIE(TwitterBaseIE):
raise ExtractorError(f'Twitter API says: {cause or "Unknown error"}', expected=True)
elif typename == 'TweetUnavailable':
reason = result.get('reason')
if reason == 'NsfwLoggedOut':
if reason in ('NsfwLoggedOut', 'NsfwViewerHasNoStatedAge'):
self.raise_login_required('NSFW tweet requires authentication')
elif reason == 'Protected':
self.raise_login_required('You are not authorized to view this protected tweet')


@@ -67,6 +67,10 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'plus\.rtl\.de(?!/podcast/)',
r'mediasetinfinity\.es',
r'tv5mondeplus\.com',
r'tv\.rakuten\.co\.jp',
r'watch\.telusoriginals\.com',
r'video\.unext\.jp',
r'www\.web\.nhk',
)
_TESTS = [{
@@ -231,6 +235,23 @@ class KnownDRMIE(UnsupportedInfoExtractor):
# https://github.com/yt-dlp/yt-dlp/issues/14743
'url': 'https://www.tv5mondeplus.com/',
'only_matching': True,
}, {
# https://github.com/yt-dlp/yt-dlp/issues/8821
'url': 'https://tv.rakuten.co.jp/content/519554/',
'only_matching': True,
}, {
# https://github.com/yt-dlp/yt-dlp/issues/9851
'url': 'https://watch.telusoriginals.com/play?assetID=fruit-is-ripe',
'only_matching': True,
}, {
# https://github.com/yt-dlp/yt-dlp/issues/13220
# https://github.com/yt-dlp/yt-dlp/issues/14564
'url': 'https://video.unext.jp/play/SID0062010/ED00337407',
'only_matching': True,
}, {
# https://github.com/yt-dlp/yt-dlp/issues/14620
'url': 'https://www.web.nhk/tv/an/72hours/pl/series-tep-W3W8WRN8M3/ep/QW8ZY6146V',
'only_matching': True,
}]
def _real_extract(self, url):

yt_dlp/extractor/visir.py (new file)

@@ -0,0 +1,116 @@
import re
from .common import InfoExtractor
from ..utils import (
UnsupportedError,
clean_html,
int_or_none,
js_to_json,
month_by_name,
url_or_none,
urljoin,
)
from ..utils.traversal import find_element, traverse_obj
class VisirIE(InfoExtractor):
IE_DESC = 'Vísir'
_VALID_URL = r'https?://(?:www\.)?visir\.is/(?P<type>k|player)/(?P<id>[\da-f-]+)(?:/(?P<slug>[\w.-]+))?'
_EMBED_REGEX = [rf'<iframe[^>]+src=["\'](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://www.visir.is/k/eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
'info_dict': {
'id': 'eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
'ext': 'mp4',
'title': 'Sveppi og Siggi Þór mestu skaphundarnir',
'categories': ['island-i-dag'],
'description': 'md5:e06bd6a0cd8bdde328ad8cf00d3d4df6',
'duration': 792,
'thumbnail': r're:https?://www\.visir\.is/.+',
'upload_date': '20260121',
'view_count': int,
},
}, {
'url': 'https://www.visir.is/k/b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704/tonlistarborgin-reykjavik',
'info_dict': {
'id': 'b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704',
'ext': 'mp4',
'title': 'Tónlistarborgin Reykjavík',
'categories': ['tonlist'],
'description': 'md5:47237589dc95dbde55dfbb163396f88a',
'display_id': 'tonlistarborgin-reykjavik',
'duration': 81,
'thumbnail': r're:https?://www\.visir\.is/.+',
'upload_date': '20251124',
'view_count': int,
},
}, {
'url': 'https://www.visir.is/player/0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
'info_dict': {
'id': '0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
'ext': 'mp4',
'title': 'Sportpakkinn 2. febrúar 2026',
'categories': ['sportpakkinn'],
'display_id': 'sportpakkinn-2.-februar-2026',
'duration': 293,
'thumbnail': r're:https?://www\.visir\.is/.+',
'upload_date': '20260202',
'view_count': int,
},
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.visir.is/g/20262837896d/segir-von-brigdin-med-prinsessuna-rista-djupt',
'info_dict': {
'id': '9ad5e58a-f26f-49f7-8b1d-68f0629485b7-1770059257365',
'ext': 'mp4',
'title': 'Norðmenn tala ekki um annað en prinsessuna',
'categories': ['frettir'],
'description': 'md5:53e2623ae79e1355778c14f5b557a0cd',
'display_id': 'nordmenn-tala-ekki-um-annad-en-prinsessuna',
'duration': 138,
'thumbnail': r're:https?://www\.visir\.is/.+',
'upload_date': '20260202',
'view_count': int,
},
}]
def _real_extract(self, url):
video_type, video_id, display_id = self._match_valid_url(url).group('type', 'id', 'slug')
webpage = self._download_webpage(url, video_id)
if video_type == 'player':
real_url = self._og_search_url(webpage)
if not self.suitable(real_url) or self._match_valid_url(real_url).group('type') == 'player':
raise UnsupportedError(real_url)
return self.url_result(real_url, self.ie_key())
upload_date = None
date_elements = traverse_obj(webpage, (
{find_element(cls='article-item__date')}, {clean_html}, filter, {str.split}))
if date_elements and len(date_elements) == 3:
day, month, year = date_elements
day = int_or_none(day.rstrip('.'))
month = month_by_name(month, 'is')
if day and month and re.fullmatch(r'[0-9]{4}', year):
upload_date = f'{year}{month:02d}{day:02d}'
player = self._search_json(
r'App\.Player\.Init\(', webpage, video_id, 'player', transform_source=js_to_json)
m3u8_url = traverse_obj(player, ('File', {urljoin('https://vod.visir.is/')}))
return {
'id': video_id,
'display_id': display_id,
'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4'),
'upload_date': upload_date,
**traverse_obj(webpage, ({find_element(cls='article-item press-ads')}, {
'description': ({find_element(cls='-large')}, {clean_html}, filter),
'view_count': ({find_element(cls='article-item__viewcount')}, {clean_html}, {int_or_none}),
})),
**traverse_obj(player, {
'title': ('Title', {clean_html}),
'categories': ('Categoryname', {clean_html}, filter, all, filter),
'duration': ('MediaDuration', {int_or_none}),
'thumbnail': ('Image', {url_or_none}),
}),
}
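The Vísir date parsing above splits a string like `"2. febrúar 2026"` into day, month name, and year. A hedged sketch, with a hypothetical subset of Icelandic month names (yt-dlp's `month_by_name(..., 'is')` covers the full list):

```python
import re

# Hypothetical subset; the real helper knows all twelve Icelandic months
IS_MONTHS = {'janúar': 1, 'febrúar': 2, 'mars': 3, 'nóvember': 11, 'desember': 12}

def build_upload_date(text):
    # "2. febrúar 2026" -> "20260202", mirroring the logic in the diff above
    parts = text.split()
    if len(parts) != 3:
        return None
    day_s, month_s, year = parts
    day = int(day_s.rstrip('.')) if day_s.rstrip('.').isdigit() else None
    month = IS_MONTHS.get(month_s.lower())
    if day and month and re.fullmatch(r'[0-9]{4}', year):
        return f'{year}{month:02d}{day:02d}'
    return None
```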


@@ -1,6 +1,7 @@
import collections
import hashlib
import re
import urllib.parse
from .common import InfoExtractor
from .dailymotion import DailymotionIE
@@ -8,6 +9,7 @@ from .odnoklassniki import OdnoklassnikiIE
from .sibnet import SibnetEmbedIE
from .vimeo import VimeoIE
from .youtube import YoutubeIE
from ..jsinterp import JSInterpreter
from ..utils import (
ExtractorError,
UserNotLive,
@@ -36,16 +38,38 @@ class VKBaseIE(InfoExtractor):
def _download_webpage_handle(self, url_or_request, video_id, *args, fatal=True, **kwargs):
response = super()._download_webpage_handle(url_or_request, video_id, *args, fatal=fatal, **kwargs)
challenge_url, cookie = response[1].url if response else '', None
if challenge_url.startswith('https://vk.com/429.html?'):
cookie = self._get_cookies(challenge_url).get('hash429')
if not cookie:
if response is False:
return response
hash429 = hashlib.md5(cookie.value.encode('ascii')).hexdigest()
webpage, urlh = response
challenge_url = urlh.url
if urllib.parse.urlparse(challenge_url).path != '/challenge.html':
return response
self.to_screen(join_nonempty(
video_id and f'[{video_id}]',
'Received a JS challenge response',
delim=' '))
challenge_hash = traverse_obj(challenge_url, (
{parse_qs}, 'hash429', -1, {require('challenge hash')}))
func_code = self._search_regex(
r'(?s)var\s+salt\s*=\s*\(\s*function\s*\(\)\s*(\{.+?\})\s*\)\(\);\s*var\s+hash',
webpage, 'JS challenge salt function')
jsi = JSInterpreter(f'function salt() {func_code}')
salt = jsi.extract_function('salt')([])
self.write_debug(f'Generated salt with native JS interpreter: {salt}')
key_hash = hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()
self.write_debug(f'JS challenge key hash: {key_hash}')
# Requesting the challenge URL with the solved key should set a 'solution429' cookie
self._request_webpage(
update_url_query(challenge_url, {'key': hash429}), video_id, fatal=fatal,
note='Resolving WAF challenge', errnote='Failed to bypass WAF challenge')
update_url_query(challenge_url, {'key': key_hash}), video_id,
'Submitting JS challenge solution', 'Unable to solve JS challenge', fatal=True)
return super()._download_webpage_handle(url_or_request, video_id, *args, fatal=True, **kwargs)
def _perform_login(self, username, password):

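The VK challenge solution above boils down to one hash: the key is the MD5 of the `hash429` query value joined with the interpreter-computed salt by a colon. As a standalone sketch:

```python
import hashlib

def challenge_key(challenge_hash, salt):
    # key = MD5("<hash429 value>:<salt>") hex digest, per the JS challenge flow above
    return hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()
```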

@@ -3,6 +3,7 @@ import re
import urllib.parse
from .common import InfoExtractor
from ..jsinterp import int_to_int32
from ..utils import (
ExtractorError,
clean_html,
@@ -20,73 +21,69 @@ from ..utils import (
)
def to_signed_32(n):
return n % ((-1 if n < 0 else 1) * 2**32)
class _ByteGenerator:
def __init__(self, algo_id, seed):
try:
self._algorithm = getattr(self, f'_algo{algo_id}')
except AttributeError:
raise ExtractorError(f'Unknown algorithm ID "{algo_id}"')
self._s = to_signed_32(seed)
self._s = int_to_int32(seed)
def _algo1(self, s):
# LCG (a=1664525, c=1013904223, m=2^32)
# Ref: https://en.wikipedia.org/wiki/Linear_congruential_generator
s = self._s = to_signed_32(s * 1664525 + 1013904223)
s = self._s = int_to_int32(s * 1664525 + 1013904223)
return s
def _algo2(self, s):
# xorshift32
# Ref: https://en.wikipedia.org/wiki/Xorshift
s = to_signed_32(s ^ (s << 13))
s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 17))
s = self._s = to_signed_32(s ^ (s << 5))
s = int_to_int32(s ^ (s << 13))
s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
s = self._s = int_to_int32(s ^ (s << 5))
return s
def _algo3(self, s):
# Weyl Sequence (k≈2^32*φ, m=2^32) + MurmurHash3 (fmix32)
# Ref: https://en.wikipedia.org/wiki/Weyl_sequence
# https://commons.apache.org/proper/commons-codec/jacoco/org.apache.commons.codec.digest/MurmurHash3.java.html
s = self._s = to_signed_32(s + 0x9e3779b9)
s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
s = to_signed_32(s * to_signed_32(0x85ebca77))
s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 13))
s = to_signed_32(s * to_signed_32(0xc2b2ae3d))
return to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
s = self._s = int_to_int32(s + 0x9e3779b9)
s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))
s = int_to_int32(s * int_to_int32(0x85ebca77))
s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 13))
s = int_to_int32(s * int_to_int32(0xc2b2ae3d))
return int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))
def _algo4(self, s):
# Custom scrambling function involving a left rotation (ROL)
s = self._s = to_signed_32(s + 0x6d2b79f5)
s = to_signed_32((s << 7) | ((s & 0xFFFFFFFF) >> 25)) # ROL 7
s = to_signed_32(s + 0x9e3779b9)
s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 11))
return to_signed_32(s * 0x27d4eb2d)
s = self._s = int_to_int32(s + 0x6d2b79f5)
s = int_to_int32((s << 7) | ((s & 0xFFFFFFFF) >> 25)) # ROL 7
s = int_to_int32(s + 0x9e3779b9)
s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 11))
return int_to_int32(s * 0x27d4eb2d)
def _algo5(self, s):
# xorshift variant with a final addition
s = to_signed_32(s ^ (s << 7))
s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 9))
s = to_signed_32(s ^ (s << 8))
s = self._s = to_signed_32(s + 0xa5a5a5a5)
s = int_to_int32(s ^ (s << 7))
s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 9))
s = int_to_int32(s ^ (s << 8))
s = self._s = int_to_int32(s + 0xa5a5a5a5)
return s
def _algo6(self, s):
# LCG (a=0x2c9277b5, c=0xac564b05) with a variable right shift scrambler
s = self._s = to_signed_32(s * to_signed_32(0x2c9277b5) + to_signed_32(0xac564b05))
s2 = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 18))
s = self._s = int_to_int32(s * int_to_int32(0x2c9277b5) + int_to_int32(0xac564b05))
s2 = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 18))
shift = (s & 0xFFFFFFFF) >> 27 & 31
return to_signed_32((s2 & 0xFFFFFFFF) >> shift)
return int_to_int32((s2 & 0xFFFFFFFF) >> shift)
def _algo7(self, s):
# Weyl Sequence (k=0x9e3779b9) + custom multiply-xor-shift mixing function
s = self._s = to_signed_32(s + to_signed_32(0x9e3779b9))
e = to_signed_32(s ^ (s << 5))
e = to_signed_32(e * to_signed_32(0x7feb352d))
e = to_signed_32(e ^ ((e & 0xFFFFFFFF) >> 15))
return to_signed_32(e * to_signed_32(0x846ca68b))
s = self._s = int_to_int32(s + int_to_int32(0x9e3779b9))
e = int_to_int32(s ^ (s << 5))
e = int_to_int32(e * int_to_int32(0x7feb352d))
e = int_to_int32(e ^ ((e & 0xFFFFFFFF) >> 15))
return int_to_int32(e * int_to_int32(0x846ca68b))
def __next__(self):
return self._algorithm(self._s) & 0xFF
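The refactor above swaps the local `to_signed_32` helper for `int_to_int32` from `yt_dlp.jsinterp`. A hedged sketch of the signed 32-bit wrapping both names describe, plus one step of the `_algo2` xorshift32 using it:

```python
def int_to_int32(n):
    # Wrap an arbitrary Python int to signed 32-bit, matching JS bitwise semantics
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def xorshift32_step(s):
    # One step of _algo2 above: xorshift32 with shifts 13, 17, 5
    s = int_to_int32(s ^ (s << 13))
    s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
    return int_to_int32(s ^ (s << 5))
```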
@@ -213,16 +210,9 @@ class XHamsterIE(InfoExtractor):
'only_matching': True,
}]
def _decipher_format_url(self, format_url, format_id):
parsed_url = urllib.parse.urlparse(format_url)
hex_string, path_remainder = self._search_regex(
r'^/(?P<hex>[0-9a-fA-F]{12,})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
default=(None, None), group=('hex', 'rem'))
if not hex_string:
self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
return None
_VALID_HEX_RE = r'[0-9a-fA-F]{12,}'
def _decipher_hex_string(self, hex_string, format_id):
byte_data = bytes.fromhex(hex_string)
seed = int.from_bytes(byte_data[1:5], byteorder='little', signed=True)
@@ -232,7 +222,33 @@ class XHamsterIE(InfoExtractor):
self.report_warning(f'Skipping format "{format_id}": {e.msg}')
return None
deciphered = bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
return bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
def _decipher_format_url(self, format_url, format_id):
# format_url can be hex ciphertext or a URL with a hex ciphertext segment
if re.fullmatch(self._VALID_HEX_RE, format_url):
return self._decipher_hex_string(format_url, format_id)
elif not url_or_none(format_url):
if re.fullmatch(r'[0-9a-fA-F]+', format_url):
# Hex strings that are too short are expected, so we don't want to warn
self.write_debug(f'Skipping dummy ciphertext for "{format_id}": {format_url}')
else:
# Something has likely changed on the site's end, so we need to warn
self.report_warning(f'Skipping format "{format_id}": invalid ciphertext')
return None
parsed_url = urllib.parse.urlparse(format_url)
hex_string, path_remainder = self._search_regex(
rf'^/(?P<hex>{self._VALID_HEX_RE})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
default=(None, None), group=('hex', 'rem'))
if not hex_string:
self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
return None
deciphered = self._decipher_hex_string(hex_string, format_id)
if not deciphered:
return None
return parsed_url._replace(path=f'/{deciphered}{path_remainder}').geturl()
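The ciphertext layout handled by `_decipher_hex_string` above is: byte 0 is the algorithm ID, bytes 1-4 a little-endian signed seed, and the remainder is XOR ciphertext. A simplified sketch that takes the keystream as an argument instead of building the real `_ByteGenerator`:

```python
import itertools

def xor_decipher(hex_string, byte_stream):
    data = bytes.fromhex(hex_string)
    # byte 0: algorithm id; bytes 1-4: little-endian signed seed (both feed the
    # real keystream generator, ignored in this sketch); rest: ciphertext
    return bytes(b ^ next(byte_stream) for b in data[5:]).decode('latin-1')

# With an all-zero keystream the ciphertext passes through unchanged
plaintext = xor_decipher('0100000000616263', itertools.repeat(0))
```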
@@ -252,7 +268,7 @@ class XHamsterIE(InfoExtractor):
display_id = mobj.group('display_id') or mobj.group('display_id_2')
desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
webpage, urlh = self._download_webpage_handle(desktop_url, video_id, impersonate=True)
error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',


@@ -16,7 +16,7 @@ from ._redirect import (
YoutubeYtBeIE,
YoutubeYtUserIE,
)
from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
from ._search import YoutubeMusicSearchURLIE, YoutubeSearchIE, YoutubeSearchURLIE
from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
@@ -39,7 +39,6 @@ for _cls in [
YoutubeYtBeIE,
YoutubeYtUserIE,
YoutubeMusicSearchURLIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubePlaylistIE,


@@ -28,21 +28,6 @@ class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
}]
class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
_SEARCH_KEY = 'ytsearchdate'
IE_DESC = 'YouTube search, newest videos first'
_SEARCH_PARAMS = 'CAISAhAB8AEB' # Videos only, sorted by date
_TESTS = [{
'url': 'ytsearchdate5:youtube-dl test video',
'playlist_count': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}]
class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube search URLs with sorting and filter support'
IE_NAME = YoutubeSearchIE.IE_NAME + '_url'


@@ -139,11 +139,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
]
_RETURN_TYPE = 'video' # XXX: How to handle multifeed?
_PLAYER_INFO_RE = (
r'/s/player/(?P<id>[a-zA-Z0-9_-]{8,})/(?:tv-)?player',
r'/(?P<id>[a-zA-Z0-9_-]{8,})/player(?:_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?|-plasma-ias-(?:phone|tablet)-[a-z]{2}_[A-Z]{2}\.vflset)/base\.js$',
r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.js$',
)
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
_DEFAULT_CLIENTS = ('android_vr', 'web', 'web_safari')
_DEFAULT_JSLESS_CLIENTS = ('android_vr',)
@@ -1879,17 +1874,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}]
_DEFAULT_PLAYER_JS_VERSION = 'actual'
_DEFAULT_PLAYER_JS_VARIANT = 'main'
_DEFAULT_PLAYER_JS_VARIANT = 'tv'
_PLAYER_JS_VARIANT_MAP = {
'main': 'player_ias.vflset/en_US/base.js',
'tcc': 'player_ias_tcc.vflset/en_US/base.js',
'tce': 'player_ias_tce.vflset/en_US/base.js',
'es5': 'player_es5.vflset/en_US/base.js',
'es6': 'player_es6.vflset/en_US/base.js',
'es6_tcc': 'player_es6_tcc.vflset/en_US/base.js',
'es6_tce': 'player_es6_tce.vflset/en_US/base.js',
'tv': 'tv-player-ias.vflset/tv-player-ias.js',
'tv_es6': 'tv-player-es6.vflset/tv-player-es6.js',
'phone': 'player-plasma-ias-phone-en_US.vflset/base.js',
'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js', # Dead since 19712d96 (2025.11.06)
'house': 'house_brand_player.vflset/en_US/base.js', # Used by Google Drive
}
_INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()}
@@ -2179,13 +2176,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
@classmethod
def _extract_player_info(cls, player_url):
for player_re in cls._PLAYER_INFO_RE:
id_m = re.search(player_re, player_url)
if id_m:
break
else:
raise ExtractorError(f'Cannot identify player {player_url!r}')
return id_m.group('id')
if m := re.search(r'/s/player/(?P<id>[a-fA-F0-9]{8,})/', player_url):
return m.group('id')
raise ExtractorError(f'Cannot identify player {player_url!r}')
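The simplified matcher above assumes the player ID is always a hex run of 8+ characters inside the `/s/player/<id>/` path segment; as a standalone sketch (raising `ValueError` in place of `ExtractorError`):

```python
import re

def extract_player_id(player_url):
    # The player id is an 8+ character hex run in the /s/player/<id>/ segment
    if m := re.search(r'/s/player/(?P<id>[a-fA-F0-9]{8,})/', player_url):
        return m.group('id')
    raise ValueError(f'Cannot identify player {player_url!r}')
```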
def _load_player(self, video_id, player_url, fatal=True):
player_js_key = self._player_js_cache_key(player_url)
@@ -3219,6 +3212,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
])
skip_player_js = 'js' in self._configuration_arg('player_skip')
format_types = self._configuration_arg('formats')
skip_bad_formats = 'incomplete' not in format_types
all_formats = 'duplicate' in format_types
if self._configuration_arg('include_duplicate_formats'):
all_formats = True
@@ -3464,7 +3458,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
https_fmts = []
for fmt_stream in streaming_formats:
if fmt_stream.get('targetDurationSec'):
# Live adaptive https formats are not supported: skip unless extractor-arg given
if fmt_stream.get('targetDurationSec') and skip_bad_formats:
continue
# FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
@@ -3576,7 +3571,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
yield from process_https_formats()
needs_live_processing = self._needs_live_processing(live_status, duration)
skip_bad_formats = 'incomplete' not in format_types
skip_manifests = set(self._configuration_arg('skip'))
if (needs_live_processing == 'is_live' # These will be filtered out by YoutubeDL anyway
@@ -4086,16 +4080,33 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
needs_live_processing = self._needs_live_processing(live_status, duration)
def is_bad_format(fmt):
if needs_live_processing and not fmt.get('is_from_start'):
return True
elif (live_status == 'is_live' and needs_live_processing != 'is_live'
and fmt.get('protocol') == 'http_dash_segments'):
return True
def adjust_incomplete_format(fmt, note_suffix='(Last 2 hours)', pref_adjustment=-10):
fmt['preference'] = (fmt.get('preference') or -1) + pref_adjustment
fmt['format_note'] = join_nonempty(fmt.get('format_note'), note_suffix, delim=' ')
for fmt in filter(is_bad_format, formats):
fmt['preference'] = (fmt.get('preference') or -1) - 10
fmt['format_note'] = join_nonempty(fmt.get('format_note'), '(Last 2 hours)', delim=' ')
# Adjust preference and format note for incomplete live/post-live formats
if live_status in ('is_live', 'post_live'):
for fmt in formats:
protocol = fmt.get('protocol')
# Currently, protocol isn't set for adaptive https formats, but this could change
is_adaptive = protocol in (None, 'http', 'https')
if live_status == 'post_live' and is_adaptive:
# Post-live adaptive formats cause HttpFD to raise "Did not get any data blocks"
# These formats are *only* useful to external applications, so we can hide them
# Set their preference <= -1000 so that FormatSorter flags them as 'hidden'
adjust_incomplete_format(fmt, note_suffix='(ended)', pref_adjustment=-5000)
# Is it live with --live-from-start? Or is it post-live and its duration is >2hrs?
elif needs_live_processing:
if not fmt.get('is_from_start'):
# Post-live m3u8 formats for >2hr streams
adjust_incomplete_format(fmt)
elif live_status == 'is_live':
if protocol == 'http_dash_segments':
# Live DASH formats without --live-from-start
adjust_incomplete_format(fmt)
elif is_adaptive:
# Incomplete live adaptive https formats
adjust_incomplete_format(fmt, note_suffix='(incomplete)', pref_adjustment=-20)
if needs_live_processing:
self._prepare_live_from_start_formats(

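The `adjust_incomplete_format` helper introduced above demotes a format and annotates its note; a standalone sketch of the same behavior (`' '.join(filter(None, ...))` stands in for `join_nonempty`):

```python
def adjust_incomplete_format(fmt, note_suffix='(Last 2 hours)', pref_adjustment=-10):
    # Lower preference and tag the format note, as in the diff above
    fmt['preference'] = (fmt.get('preference') or -1) + pref_adjustment
    fmt['format_note'] = ' '.join(filter(None, [fmt.get('format_note'), note_suffix]))
    return fmt
```

Note that the post-live adaptive case uses `pref_adjustment=-5000`, pushing preference well below -1000 so `FormatSorter` treats those formats as hidden.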

@@ -1,10 +1,10 @@
# This file is generated by devscripts/update_ejs.py. DO NOT MODIFY!
VERSION = '0.4.0'
VERSION = '0.5.0'
HASHES = {
'yt.solver.bun.lib.js': '6ff45e94de9f0ea936a183c48173cfa9ce526ee4b7544cd556428427c1dd53c8073ef0174e79b320252bf0e7c64b0032cc1cf9c4358f3fda59033b7caa01c241',
'yt.solver.core.js': '05964b458d92a65d4fb7a90bcb5921c9fed2370f4e4f2f25badb41f28aff9069e0b3c4e5bf1baf2d3021787b67fc6093cefa44de30cffdc6f9fb25532484003b',
'yt.solver.core.min.js': '0cd3c0b37e095d3cca99443b58fe03980ac3bf2e777c2485c23e1f6052b5ede9f07c7f1c79a9c3af3258ea91a228f099741e7eb07b53125b5dcc84bb4c0054f3',
'yt.solver.core.js': '9742868113d7b0c29e24a95c8eb2c2bec7cdf95513dc7f55f523ba053c0ecf2af7dcb0138b1d933578304f0dda633a6b3bfff64e912b4c547b99dad083428c4b',
'yt.solver.core.min.js': 'aee8c3354cfd535809c871c2a517d03231f89cd184e903af82ee274bcc2e90991ef19cb3f65f2ccc858c4963856ea87f8692fe16d71209f4fc7f41c44b828e36',
'yt.solver.deno.lib.js': '9c8ee3ab6c23e443a5a951e3ac73c6b8c1c8fb34335e7058a07bf99d349be5573611de00536dcd03ecd3cf34014c4e9b536081de37af3637c5390c6a6fd6a0f0',
'yt.solver.lib.js': '1ee3753a8222fc855f5c39db30a9ccbb7967dbe1fb810e86dc9a89aa073a0907f294c720e9b65427d560a35aa1ce6af19ef854d9126a05ca00afe03f72047733',
'yt.solver.lib.min.js': '8420c259ad16e99ce004e4651ac1bcabb53b4457bf5668a97a9359be9a998a789fee8ab124ee17f91a2ea8fd84e0f2b2fc8eabcaf0b16a186ba734cf422ad053',

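The digests in the `HASHES` table above are 128 hex characters, i.e. SHA-512. A verification sketch (the function name and usage are assumptions about how the table is consumed, not taken from `update_ejs.py`):

```python
import hashlib

def verify_bundle(data: bytes, expected_hex: str) -> bool:
    # 128 hex chars => SHA-512 hex digest
    return hashlib.sha512(data).hexdigest() == expected_hex
```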

@@ -60,26 +60,29 @@ var jsc = (function (meriyah, astring) {
}
return value;
}
const nsigExpression = {
type: 'VariableDeclaration',
kind: 'var',
declarations: [
const nsig = {
type: 'CallExpression',
callee: { or: [{ type: 'Identifier' }, { type: 'SequenceExpression' }] },
arguments: [
{},
{
type: 'VariableDeclarator',
init: {
type: 'CallExpression',
callee: { type: 'Identifier' },
arguments: [
{ type: 'Literal' },
{
type: 'CallExpression',
callee: { type: 'Identifier', name: 'decodeURIComponent' },
},
],
},
type: 'CallExpression',
callee: { type: 'Identifier', name: 'decodeURIComponent' },
arguments: [{}],
},
],
};
const nsigAssignment = {
type: 'AssignmentExpression',
left: { type: 'Identifier' },
operator: '=',
right: nsig,
};
const nsigDeclarator = {
type: 'VariableDeclarator',
id: { type: 'Identifier' },
init: nsig,
};
const logicalExpression = {
type: 'ExpressionStatement',
expression: {
@@ -97,6 +100,17 @@ var jsc = (function (meriyah, astring) {
callee: { type: 'Identifier' },
arguments: {
or: [
[
{
type: 'CallExpression',
callee: {
type: 'Identifier',
name: 'decodeURIComponent',
},
arguments: [{ type: 'Identifier' }],
optional: false,
},
],
[
{ type: 'Literal' },
{
@@ -110,6 +124,8 @@ var jsc = (function (meriyah, astring) {
},
],
[
{ type: 'Literal' },
{ type: 'Literal' },
{
type: 'CallExpression',
callee: {
@@ -138,18 +154,18 @@ var jsc = (function (meriyah, astring) {
expression: {
type: 'AssignmentExpression',
operator: '=',
left: { type: 'Identifier' },
right: { type: 'FunctionExpression', params: [{}, {}, {}] },
left: { or: [{ type: 'Identifier' }, { type: 'MemberExpression' }] },
right: { type: 'FunctionExpression' },
},
},
{ type: 'FunctionDeclaration', params: [{}, {}, {}] },
{ type: 'FunctionDeclaration' },
{
type: 'VariableDeclaration',
declarations: {
anykey: [
{
type: 'VariableDeclarator',
init: { type: 'FunctionExpression', params: [{}, {}, {}] },
init: { type: 'FunctionExpression' },
},
],
},
@@ -157,124 +173,150 @@ var jsc = (function (meriyah, astring) {
],
};
function extract$1(node) {
if (!matchesStructure(node, identifier$1)) {
return null;
}
let block;
if (
const blocks = [];
if (matchesStructure(node, identifier$1)) {
if (
node.type === 'ExpressionStatement' &&
node.expression.type === 'AssignmentExpression' &&
node.expression.right.type === 'FunctionExpression' &&
node.expression.right.params.length >= 3
) {
blocks.push(node.expression.right.body);
} else if (node.type === 'VariableDeclaration') {
for (const decl of node.declarations) {
if (
_optionalChain$2([
decl,
'access',
(_) => _.init,
'optionalAccess',
(_2) => _2.type,
]) === 'FunctionExpression' &&
decl.init.params.length >= 3
) {
blocks.push(decl.init.body);
}
}
} else if (
node.type === 'FunctionDeclaration' &&
node.params.length >= 3
) {
blocks.push(node.body);
} else {
return null;
}
} else if (
node.type === 'ExpressionStatement' &&
node.expression.type === 'AssignmentExpression' &&
node.expression.right.type === 'FunctionExpression'
node.expression.type === 'SequenceExpression'
) {
block = node.expression.right.body;
} else if (node.type === 'VariableDeclaration') {
for (const decl of node.declarations) {
for (const expr of node.expression.expressions) {
if (
decl.type === 'VariableDeclarator' &&
_optionalChain$2([
decl,
'access',
(_) => _.init,
'optionalAccess',
(_2) => _2.type,
]) === 'FunctionExpression' &&
_optionalChain$2([
decl,
'access',
(_3) => _3.init,
'optionalAccess',
(_4) => _4.params,
'access',
(_5) => _5.length,
]) === 3
expr.type === 'AssignmentExpression' &&
expr.right.type === 'FunctionExpression' &&
expr.right.params.length === 3
) {
block = decl.init.body;
break;
blocks.push(expr.right.body);
}
}
} else if (node.type === 'FunctionDeclaration') {
block = node.body;
} else {
return null;
}
const relevantExpression = _optionalChain$2([
block,
'optionalAccess',
(_6) => _6.body,
'access',
(_7) => _7.at,
'call',
(_8) => _8(-2),
]);
let call = null;
if (matchesStructure(relevantExpression, logicalExpression)) {
if (
_optionalChain$2([
relevantExpression,
'optionalAccess',
(_9) => _9.type,
]) !== 'ExpressionStatement' ||
relevantExpression.expression.type !== 'LogicalExpression' ||
relevantExpression.expression.right.type !== 'SequenceExpression' ||
relevantExpression.expression.right.expressions[0].type !==
'AssignmentExpression' ||
relevantExpression.expression.right.expressions[0].right.type !==
'CallExpression'
) {
return null;
}
call = relevantExpression.expression.right.expressions[0].right;
} else if (
_optionalChain$2([
relevantExpression,
'optionalAccess',
(_10) => _10.type,
]) === 'IfStatement' &&
relevantExpression.consequent.type === 'BlockStatement'
) {
for (const n of relevantExpression.consequent.body) {
if (!matchesStructure(n, nsigExpression)) {
continue;
for (const block of blocks) {
let call = null;
for (const stmt of block.body) {
if (matchesStructure(stmt, logicalExpression)) {
if (
stmt.type === 'ExpressionStatement' &&
stmt.expression.type === 'LogicalExpression' &&
stmt.expression.right.type === 'SequenceExpression' &&
stmt.expression.right.expressions[0].type ===
'AssignmentExpression' &&
stmt.expression.right.expressions[0].right.type === 'CallExpression'
) {
call = stmt.expression.right.expressions[0].right;
}
} else if (stmt.type === 'IfStatement') {
let consequent = stmt.consequent;
while (consequent.type === 'LabeledStatement') {
consequent = consequent.body;
}
if (consequent.type !== 'BlockStatement') {
continue;
}
for (const n of consequent.body) {
if (n.type !== 'VariableDeclaration') {
continue;
}
for (const decl of n.declarations) {
if (
matchesStructure(decl, nsigDeclarator) &&
_optionalChain$2([
decl,
'access',
(_3) => _3.init,
'optionalAccess',
(_4) => _4.type,
]) === 'CallExpression'
) {
call = decl.init;
break;
}
}
if (call) {
break;
}
}
} else if (stmt.type === 'ExpressionStatement') {
if (
stmt.expression.type !== 'LogicalExpression' ||
stmt.expression.operator !== '&&' ||
stmt.expression.right.type !== 'SequenceExpression'
) {
continue;
}
for (const expr of stmt.expression.right.expressions) {
if (matchesStructure(expr, nsigAssignment) && expr.type) {
if (
expr.type === 'AssignmentExpression' &&
expr.right.type === 'CallExpression'
) {
call = expr.right;
break;
}
}
}
}
if (
n.type !== 'VariableDeclaration' ||
_optionalChain$2([
n,
'access',
(_11) => _11.declarations,
'access',
(_12) => _12[0],
'access',
(_13) => _13.init,
'optionalAccess',
(_14) => _14.type,
]) !== 'CallExpression'
) {
continue;
if (call) {
break;
}
call = n.declarations[0].init;
break;
}
if (!call) {
continue;
}
return {
type: 'ArrowFunctionExpression',
params: [{ type: 'Identifier', name: 'sig' }],
body: {
type: 'CallExpression',
callee: call.callee,
arguments: call.arguments.map((arg) => {
if (
arg.type === 'CallExpression' &&
arg.callee.type === 'Identifier' &&
arg.callee.name === 'decodeURIComponent'
) {
return { type: 'Identifier', name: 'sig' };
}
return arg;
}),
optional: false,
},
async: false,
expression: false,
generator: false,
};
}
if (call === null) {
return null;
}
return {
type: 'ArrowFunctionExpression',
params: [{ type: 'Identifier', name: 'sig' }],
body: {
type: 'CallExpression',
callee: { type: 'Identifier', name: call.callee.name },
arguments:
call.arguments.length === 1
? [{ type: 'Identifier', name: 'sig' }]
: [call.arguments[0], { type: 'Identifier', name: 'sig' }],
optional: false,
},
async: false,
expression: false,
generator: false,
};
return null;
}
function _optionalChain$1(ops) {
let lastAccessLHS = undefined;
@@ -472,8 +514,31 @@ var jsc = (function (meriyah, astring) {
return value;
}
function preprocessPlayer(data) {
const ast = meriyah.parse(data);
const body = ast.body;
const program = meriyah.parse(data);
const plainStatements = modifyPlayer(program);
const solutions = getSolutions(plainStatements);
for (const [name, options] of Object.entries(solutions)) {
plainStatements.push({
type: 'ExpressionStatement',
expression: {
type: 'AssignmentExpression',
operator: '=',
left: {
type: 'MemberExpression',
computed: false,
object: { type: 'Identifier', name: '_result' },
property: { type: 'Identifier', name: name },
optional: false,
},
right: multiTry(options),
},
});
}
program.body.splice(0, 0, ...setupNodes);
return astring.generate(program);
}
function modifyPlayer(program) {
const body = program.body;
const block = (() => {
switch (body.length) {
case 1: {
@@ -506,16 +571,7 @@ var jsc = (function (meriyah, astring) {
}
throw 'unexpected structure';
})();
const found = { n: [], sig: [] };
const plainExpressions = block.body.filter((node) => {
const n = extract(node);
if (n) {
found.n.push(n);
}
const sig = extract$1(node);
if (sig) {
found.sig.push(sig);
}
block.body = block.body.filter((node) => {
if (node.type === 'ExpressionStatement') {
if (node.expression.type === 'AssignmentExpression') {
return true;
@@ -524,41 +580,241 @@ var jsc = (function (meriyah, astring) {
}
return true;
});
block.body = plainExpressions;
for (const [name, options] of Object.entries(found)) {
const unique = new Set(options.map((x) => JSON.stringify(x)));
if (unique.size !== 1) {
const message = `found ${unique.size} ${name} function possibilities`;
throw (
message +
(unique.size
? `: ${options.map((x) => astring.generate(x)).join(', ')}`
: '')
);
return block.body;
}
function getSolutions(statements) {
const found = { n: [], sig: [] };
for (const statement of statements) {
const n = extract(statement);
if (n) {
found.n.push(n);
}
const sig = extract$1(statement);
if (sig) {
found.sig.push(sig);
}
plainExpressions.push({
type: 'ExpressionStatement',
expression: {
type: 'AssignmentExpression',
operator: '=',
left: {
type: 'MemberExpression',
computed: false,
object: { type: 'Identifier', name: '_result' },
property: { type: 'Identifier', name: name },
},
right: options[0],
},
});
}
ast.body.splice(0, 0, ...setupNodes);
return astring.generate(ast);
return found;
}
function getFromPrepared(code) {
const resultObj = { n: null, sig: null };
Function('_result', code)(resultObj);
return resultObj;
}
function multiTry(generators) {
return {
type: 'ArrowFunctionExpression',
params: [{ type: 'Identifier', name: '_input' }],
body: {
type: 'BlockStatement',
body: [
{
type: 'VariableDeclaration',
kind: 'const',
declarations: [
{
type: 'VariableDeclarator',
id: { type: 'Identifier', name: '_results' },
init: {
type: 'NewExpression',
callee: { type: 'Identifier', name: 'Set' },
arguments: [],
},
},
],
},
{
type: 'ForOfStatement',
left: {
type: 'VariableDeclaration',
kind: 'const',
declarations: [
{
type: 'VariableDeclarator',
id: { type: 'Identifier', name: '_generator' },
init: null,
},
],
},
right: { type: 'ArrayExpression', elements: generators },
body: {
type: 'BlockStatement',
body: [
{
type: 'TryStatement',
block: {
type: 'BlockStatement',
body: [
{
type: 'ExpressionStatement',
expression: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'add' },
optional: false,
},
arguments: [
{
type: 'CallExpression',
callee: {
type: 'Identifier',
name: '_generator',
},
arguments: [
{ type: 'Identifier', name: '_input' },
],
optional: false,
},
],
optional: false,
},
},
],
},
handler: {
type: 'CatchClause',
param: null,
body: { type: 'BlockStatement', body: [] },
},
finalizer: null,
},
],
},
await: false,
},
{
type: 'IfStatement',
test: {
type: 'UnaryExpression',
operator: '!',
argument: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'size' },
optional: false,
},
prefix: true,
},
consequent: {
type: 'BlockStatement',
body: [
{
type: 'ThrowStatement',
argument: {
type: 'TemplateLiteral',
expressions: [],
quasis: [
{
type: 'TemplateElement',
value: { cooked: 'no solutions', raw: 'no solutions' },
tail: true,
},
],
},
},
],
},
alternate: null,
},
{
type: 'IfStatement',
test: {
type: 'BinaryExpression',
left: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'size' },
optional: false,
},
right: { type: 'Literal', value: 1 },
operator: '!==',
},
consequent: {
type: 'BlockStatement',
body: [
{
type: 'ThrowStatement',
argument: {
type: 'TemplateLiteral',
expressions: [
{
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'join' },
optional: false,
},
arguments: [{ type: 'Literal', value: ', ' }],
optional: false,
},
],
quasis: [
{
type: 'TemplateElement',
value: {
cooked: 'invalid solutions: ',
raw: 'invalid solutions: ',
},
tail: false,
},
{
type: 'TemplateElement',
value: { cooked: '', raw: '' },
tail: true,
},
],
},
},
],
},
alternate: null,
},
{
type: 'ReturnStatement',
argument: {
type: 'MemberExpression',
object: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'values' },
optional: false,
},
arguments: [],
optional: false,
},
computed: false,
property: { type: 'Identifier', name: 'next' },
optional: false,
},
arguments: [],
optional: false,
},
computed: false,
property: { type: 'Identifier', name: 'value' },
optional: false,
},
},
],
},
async: false,
expression: false,
generator: false,
};
}
function main(input) {
const preprocessedPlayer =
input.type === 'player'


@@ -18,6 +18,14 @@ from .utils import (
)
def int_to_int32(n):
"""Converts an integer to a signed 32-bit integer"""
n &= 0xFFFFFFFF
if n & 0x80000000:
return n - 0x100000000
return n
def _js_bit_op(op):
def zeroise(x):
if x in (None, JS_Undefined):
@@ -28,7 +36,7 @@ def _js_bit_op(op):
return int(float(x))
def wrapped(a, b):
return op(zeroise(a), zeroise(b)) & 0xffffffff
return int_to_int32(op(int_to_int32(zeroise(a)), int_to_int32(zeroise(b))))
return wrapped
@@ -368,6 +376,10 @@ class JSInterpreter:
if not _OPERATORS.get(op):
return right_val
# TODO: This is only correct for str+str and str+number; fix for str+array, str+object, etc
if op == '+' and (isinstance(left_val, str) or isinstance(right_val, str)):
return f'{left_val}{right_val}'
try:
return _OPERATORS[op](left_val, right_val)
except Exception as e:
@@ -377,7 +389,7 @@ class JSInterpreter:
if idx == 'length':
return len(obj)
try:
return obj[int(idx)] if isinstance(obj, list) else obj[idx]
return obj[int(idx)] if isinstance(obj, list) else obj[str(idx)]
except Exception as e:
if allow_undefined:
return JS_Undefined
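The jsinterp change above makes the bitwise wrappers coerce operands and results to signed 32-bit integers, matching JavaScript's `ToInt32` semantics instead of the old unsigned `& 0xffffffff` mask. A standalone check (reusing the `int_to_int32` helper from the diff):

```python
import operator


def int_to_int32(n):
    """Converts an integer to a signed 32-bit integer (as in the diff above)."""
    n &= 0xFFFFFFFF
    if n & 0x80000000:
        return n - 0x100000000
    return n


def js_or(a, b):
    # mirrors the patched wrapper: operands and result are signed 32-bit
    return int_to_int32(operator.or_(int_to_int32(a), int_to_int32(b)))


# In JavaScript, (-1 | 0) === -1; the old unsigned mask yielded 4294967295
print(js_or(-1, 0))            # -> -1
print(int_to_int32(0x80000000))  # -> -2147483648
```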


@@ -175,6 +175,13 @@ _TARGETS_COMPAT_LOOKUP = {
'safari180_ios': 'safari18_0_ios',
}
# These targets are known to be insufficient, unreliable or blocked
# See: https://github.com/yt-dlp/yt-dlp/issues/16012
_DEPRIORITIZED_TARGETS = {
ImpersonateTarget('chrome', '133', 'macos', '15'), # chrome133a
ImpersonateTarget('chrome', '136', 'macos', '15'), # chrome136
}
@register_rh
class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
@@ -192,6 +199,8 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
for version, targets in BROWSER_TARGETS.items()
if curl_cffi_version >= version
), key=lambda x: (
# deprioritize unreliable targets so they are not selected by default
x[1] not in _DEPRIORITIZED_TARGETS,
# deprioritize mobile targets since they give very different behavior
x[1].os not in ('ios', 'android'),
# prioritize tor < edge < firefox < safari < chrome
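The boolean entries in the sort key above work because `False` sorts before `True` in Python, so targets failing a "good" predicate sink toward the front of an ascending sort while preferred targets end up last. A minimal sketch with made-up target names (not yt-dlp's actual target list or selection code):

```python
targets = ['chrome136', 'firefox135', 'chrome133a', 'chrome131']
unreliable = {'chrome133a', 'chrome136'}  # hypothetical deprioritized set

# bool sort keys: False < True, so reliable targets (key True) land at
# the "preferred" end of the ascending sort
ranked = sorted(targets, key=lambda t: (t not in unreliable, t))
print(ranked)  # -> ['chrome133a', 'chrome136', 'chrome131', 'firefox135']
```

Stacking several booleans in one tuple, as the real sort key does, applies each predicate in order of importance.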


@@ -511,7 +511,7 @@ def create_parser():
general.add_option(
'--live-from-start',
action='store_true', dest='live_from_start',
help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
help='Download livestreams from the start. Currently experimental and only supported for YouTube, Twitch, and TVer')
general.add_option(
'--no-live-from-start',
action='store_false', dest='live_from_start',


@@ -75,6 +75,9 @@ MONTH_NAMES = {
'fr': [
'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
'is': [
'janúar', 'febrúar', 'mars', 'apríl', 'maí', 'júní',
'júlí', 'ágúst', 'september', 'október', 'nóvember', 'desember'],
# these follow the genitive grammatical case (dopełniacz)
# some websites might be using nominative, which will require another month list
# https://en.wikibooks.org/wiki/Polish/Noun_cases


@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2026.01.31'
__version__ = '2026.02.21'
RELEASE_GIT_HEAD = '9a9a6b6fe44a30458c1754ef064f354f04a84004'
RELEASE_GIT_HEAD = '646bb31f39614e6c2f7ba687c53e7496394cbadb'
VARIANT = None
@@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2026.01.31'
_pkg_version = '2026.02.21'