mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-02-23 17:05:58 +00:00

Compare commits

43 Commits

Author SHA1 Message Date
github-actions[bot]
e2a9cc7d13 Release 2026.02.21
Created by: bashonly

:ci skip all
2026-02-21 20:22:26 +00:00
Simon Sawicki
646bb31f39 [cleanup] Misc
Authored by: Grub4K
2026-02-21 21:07:56 +01:00
Simon Sawicki
1fbbe29b99 [ie] Limit netrc_machine parameter to shell-safe characters
Also adapts some extractor regexes to adhere to this limitation

See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm

Authored by: Grub4K
2026-02-21 21:07:36 +01:00
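
The shell-safety restriction described in this commit can be sketched in Python. The exact character set and error type yt-dlp uses are not shown on this page, so both are assumptions for illustration only:

```python
import re

# Hypothetical sketch of the check described above: before a netrc machine
# name is interpolated into the --netrc-cmd shell command, restrict it to a
# conservative shell-safe character set. yt-dlp's actual pattern may differ.
SAFE_MACHINE_RE = re.compile(r'[a-zA-Z0-9_.-]+')

def check_netrc_machine(machine: str) -> str:
    """Reject machine names that could inject shell metacharacters."""
    if not SAFE_MACHINE_RE.fullmatch(machine):
        raise ValueError(f'unsafe netrc machine name: {machine!r}')
    return machine
```

A name like `youtube` passes, while a payload such as `;echo rce` (the shape exercised by the advisory's regression test) is rejected before it can reach a shell.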
bashonly
c105461647 [ie/youtube] Update ejs to 0.5.0 (#16031)
Authored by: bashonly
2026-02-21 20:05:38 +00:00
bashonly
1d1358d09f [ie] Add browser impersonation support to more extractors (#16029)
Closes #7001, Closes #7444, Closes #16004
Authored by: bashonly
2026-02-21 19:24:05 +00:00
blauerdorf
1fe0bf23aa [ie/spankbang] Fix playlist title extraction (#14132)
Closes #14131
Authored by: blauerdorf
2026-02-21 18:57:20 +00:00
blauerdorf
f05e1cd1f1 [ie/spankbang] Support browser impersonation (#14130)
Closes #14129
Authored by: blauerdorf
2026-02-21 18:51:52 +00:00
bashonly
46d5b6f2b7 [ie/learningonscreen] Fix extractor (#16028)
Closes #15934
Authored by: bashonly, 0xvd
2026-02-21 18:27:33 +00:00
LordMZTE
166356d1a1 [ie/opencast] Support oc-p.uni-jena.de URLs (#16026)
Closes #16023
Authored by: LordMZTE
2026-02-21 18:01:34 +00:00
Sipherdrakon
2485653859 [ie/aenetworks] Fix extractor (#14959)
Closes #14578
Authored by: Sipherdrakon
2026-02-21 17:46:59 +00:00
bashonly
f532a91cef [ie/soundcloud] Support browser impersonation (#16020)
Closes #15660
Authored by: bashonly
2026-02-21 14:50:22 +00:00
bashonly
81bdea03f3 [ie/soundcloud] Fix client ID extraction (#16019)
Authored by: bashonly
2026-02-21 00:21:29 +00:00
bashonly
e74076141d [rh:curl_cffi] Deprioritize unreliable impersonate targets (#16018)
Closes #16012
Authored by: bashonly
2026-02-20 23:48:16 +00:00
Parker Wahle
97f03660f5 [ie/SaucePlusChannel] Add extractor (#15830)
Closes #14985
Authored by: regulad
2026-02-20 00:07:48 +00:00
bashonly
772559e3db [ie/tele5] Fix extractor (#16005)
Closes #16003
Authored by: bashonly
2026-02-19 23:53:53 +00:00
Achraf
c7945800e4 [ie/youtube:search:date] Remove broken ytsearchdate support (#15959)
Closes #15898
Authored by: stastix
2026-02-19 23:18:02 +00:00
bashonly
e2444584a3 [ie/facebook:ads] Fix extractor (#16002)
Closes #16000
Authored by: bashonly
2026-02-19 23:08:08 +00:00
bashonly
acfc00a955 [ie/vk] Solve JS challenges using native JS interpreter (#15992)
Closes #12970
Authored by: bashonly, 0xvd
2026-02-19 15:14:37 +00:00
bashonly
224fe478b0 [ie/dailymotion] Fix extraction (#15995)
Fix 2b61a2a4b2

Authored by: bashonly
2026-02-19 15:11:23 +00:00
bashonly
77221098fc [ie/twitter] Fix error handling again (#15999)
Fix 0d8898c3f4

Closes #15998
Authored by: bashonly
2026-02-19 15:03:07 +00:00
CanOfSocks
319a2bda83 [ie/youtube] Extract live adaptive incomplete formats (#15937)
Closes #10148
Authored by: CanOfSocks, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2026-02-18 23:52:13 +00:00
bashonly
2204cee6d8 [ie/youtube] Add more known player JS variants (#15975)
Authored by: bashonly
2026-02-18 20:23:00 +00:00
bashonly
071ad7dfa0 [ie/odnoklassniki] Fix inefficient regular expression (#15974)
Closes #15958
Authored by: bashonly
2026-02-18 20:03:24 +00:00
bashonly
0d8898c3f4 [ie/twitter] Fix error handling (#15993)
Closes #15963
Authored by: bashonly
2026-02-18 19:55:48 +00:00
bashonly
d108ca10b9 [jsinterp] Support string concatenation with + and += (#15990)
Authored by: bashonly
2026-02-17 23:46:20 +00:00
bashonly
c9c8651975 [jsinterp] Stringify bracket notation keys in object access (#15989)
Authored by: bashonly
2026-02-17 23:20:54 +00:00
bashonly
62574f5763 [jsinterp] Fix bitwise operations (#15985)
Authored by: bashonly
2026-02-17 23:10:18 +00:00
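
The three jsinterp commits above target JavaScript semantics that differ from Python's. A rough Python illustration of each behavior follows; these helpers are illustrative only, not yt-dlp's interpreter code:

```python
def js_add(a, b):
    # JS `+` / `+=`: concatenates when either operand is a string
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b

def js_bracket_get(obj, key):
    # JS bracket notation stringifies the key: obj[1] reads obj["1"]
    return obj.get(str(key))

def js_to_int32(n):
    # JS bitwise operators coerce operands to signed 32-bit integers
    n = int(n) & 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n
```

For example, `js_add('n=', 5)` yields `'n=5'` where Python's `+` would raise `TypeError`, and `js_to_int32` wraps out-of-range values the way JS bitwise operations do.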
Simon Sawicki
abade83f8d [cleanup] Bump ruff to 0.15.x (#15951)
Authored by: Grub4K
2026-02-16 20:11:02 +00:00
bashonly
43229d1d5f [cookies] Ignore cookies with control characters (#15862)
http.cookies.Morsel was patched in Python 3.14.3 and 3.13.12
to raise a CookieError if the cookie name, value or any attribute
of its input contains a control character.

yt_dlp.cookies.LenientSimpleCookie now preemptively discards
any cookies containing control characters, which is consistent
with its more lenient parsing.

Ref: https://github.com/python/cpython/issues/143919

Closes #15849
Authored by: bashonly, syphyr

Co-authored-by: syphyr <syphyr@gmail.com>
2026-02-16 19:59:34 +00:00
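
The preemptive discarding described in this commit message can be illustrated with a small sketch. This is not the actual `LenientSimpleCookie` implementation, just the idea: drop any pair containing an ASCII control character rather than letting patched CPython's `http.cookies` raise `CookieError`:

```python
import re

# Illustration only: silently discard name=value pairs containing control
# characters, consistent with lenient parsing of the rest of the header.
_CONTROL_RE = re.compile(r'[\x00-\x1f\x7f]')

def parse_cookies_leniently(header: str) -> dict:
    kept = {}
    for pair in header.split(';'):
        name, sep, value = pair.strip().partition('=')
        if not sep or _CONTROL_RE.search(name) or _CONTROL_RE.search(value):
            continue  # drop the offending cookie instead of raising
        kept[name] = value
    return kept
```

Cookies without control characters are preserved unchanged, so well-formed headers parse exactly as before.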
Gareth Seddon
8d6e0b29bf [ie/MatchiTV] Add extractor (#15204)
Authored by: gseddon
2026-02-12 08:14:56 +00:00
Corey Wright
1ea7329cc9 [ie/ApplePodcasts] Fix extractor (#15901)
Closes #15900
Authored by: coreywright
2026-02-12 08:09:37 +00:00
doe1080
a13f281012 [ie/tvo] Add extractor (#15903)
Authored by: doe1080
2026-02-09 20:57:54 +00:00
doe1080
02ce3efbfe [ie/tver:olympic] Add extractor (#15885)
Authored by: doe1080
2026-02-09 20:56:39 +00:00
doe1080
1a9c4b8238 [ie/steam] Fix extractor (#15028)
Closes #15014
Authored by: doe1080
2026-02-09 20:33:36 +00:00
bashonly
637ae202ac [ie/gem.cbc.ca] Support standalone, series & Olympics URLs (#15878)
Closes #8382, Closes #8790, Closes #15850
Authored by: bashonly, makew0rld, 0xvd

Co-authored-by: makeworld <makeworld@protonmail.com>
Co-authored-by: 0xvd <0xvd12@gmail.com>
2026-02-07 23:12:45 +00:00
hunter-gatherer8
23c059a455 [ie/1tv] Extract chapters (#15848)
Authored by: hunter-gatherer8
2026-02-06 20:45:47 +00:00
beacdeac
6f38df31b4 [ie/pornhub] Fix extractor (#15858)
Closes #15827
Authored by: beacdeac
2026-02-06 20:41:56 +00:00
doe1080
442c90da3e [ie/locipo] Add extractors (#15486)
Closes #13656
Authored by: doe1080, gravesducking

Co-authored-by: gravesducking <219445875+gravesducking@users.noreply.github.com>
2026-02-04 21:06:39 +00:00
0x∅
133cb959be [ie/xhamster] Fix extractor (#15831)
Closes #15802
Authored by: 0xvd
2026-02-04 20:49:07 +00:00
doe1080
c7c45f5289 [ie/visir] Add extractor (#15811)
Closes #11901
Authored by: doe1080
2026-02-04 15:33:00 +00:00
github-actions[bot]
bb3af7e6d5 Release 2026.02.04
Created by: bashonly

:ci skip all
2026-02-04 00:31:48 +00:00
doe1080
c677d866d4 [ie/unsupported] Update unsupported URLs (#15812)
Closes #8821, Closes #9851, Closes #13220, Closes #14564, Closes #14620
Authored by: doe1080
2026-02-03 23:30:59 +00:00
bashonly
1a895c18aa [ie/youtube] Default to tv player JS variant (#15818)
Closes #15814
Authored by: bashonly
2026-02-03 23:26:30 +00:00
61 changed files with 2011 additions and 523 deletions

@@ -864,3 +864,13 @@ Sytm
 zahlman
 azdlonky
 thematuu
+beacdeac
+blauerdorf
+CanOfSocks
+gravesducking
+gseddon
+hunter-gatherer8
+LordMZTE
+regulad
+stastix
+syphyr

@@ -4,6 +4,69 @@
 # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
 -->
+### 2026.02.21
+#### Important changes
+- Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)
+    - The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters
+#### Core changes
+- **cookies**: [Ignore cookies with control characters](https://github.com/yt-dlp/yt-dlp/commit/43229d1d5f47b313e1958d719faff6321d853ed3) ([#15862](https://github.com/yt-dlp/yt-dlp/issues/15862)) by [bashonly](https://github.com/bashonly), [syphyr](https://github.com/syphyr)
+- **jsinterp**
+    - [Fix bitwise operations](https://github.com/yt-dlp/yt-dlp/commit/62574f5763755a8637880044630b12582e4a55a5) ([#15985](https://github.com/yt-dlp/yt-dlp/issues/15985)) by [bashonly](https://github.com/bashonly)
+    - [Stringify bracket notation keys in object access](https://github.com/yt-dlp/yt-dlp/commit/c9c86519753d6cdafa052945d2de0d3fcd448927) ([#15989](https://github.com/yt-dlp/yt-dlp/issues/15989)) by [bashonly](https://github.com/bashonly)
+    - [Support string concatenation with `+` and `+=`](https://github.com/yt-dlp/yt-dlp/commit/d108ca10b926410ed99031fec86894bfdea8f8eb) ([#15990](https://github.com/yt-dlp/yt-dlp/issues/15990)) by [bashonly](https://github.com/bashonly)
+#### Extractor changes
+- [Add browser impersonation support to more extractors](https://github.com/yt-dlp/yt-dlp/commit/1d1358d09fedcdc6b3e83538a29b0b539cb9be3f) ([#16029](https://github.com/yt-dlp/yt-dlp/issues/16029)) by [bashonly](https://github.com/bashonly)
+- [Limit `netrc_machine` parameter to shell-safe characters](https://github.com/yt-dlp/yt-dlp/commit/1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a) by [Grub4K](https://github.com/Grub4K)
+- **1tv**: [Extract chapters](https://github.com/yt-dlp/yt-dlp/commit/23c059a455acbb317b2bbe657efd59113bf4d5ac) ([#15848](https://github.com/yt-dlp/yt-dlp/issues/15848)) by [hunter-gatherer8](https://github.com/hunter-gatherer8)
+- **aenetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/24856538595a3b25c75e1199146fcc82ea812d97) ([#14959](https://github.com/yt-dlp/yt-dlp/issues/14959)) by [Sipherdrakon](https://github.com/Sipherdrakon)
+- **applepodcasts**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1ea7329cc91da38a790174e831fffafcb3ea3c3d) ([#15901](https://github.com/yt-dlp/yt-dlp/issues/15901)) by [coreywright](https://github.com/coreywright)
+- **dailymotion**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/224fe478b0ef83d13b36924befa53686290cb000) ([#15995](https://github.com/yt-dlp/yt-dlp/issues/15995)) by [bashonly](https://github.com/bashonly)
+- **facebook**: ads: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e2444584a3e590077b81828ad8a12fc4c3b1aa6d) ([#16002](https://github.com/yt-dlp/yt-dlp/issues/16002)) by [bashonly](https://github.com/bashonly)
+- **gem.cbc.ca**: [Support standalone, series & Olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/637ae202aca7a990b3b61bc33d692870dc16c3ad) ([#15878](https://github.com/yt-dlp/yt-dlp/issues/15878)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly), [makew0rld](https://github.com/makew0rld)
+- **learningonscreen**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/46d5b6f2b7989d8991a59215d434fb8b5a8ec7bb) ([#16028](https://github.com/yt-dlp/yt-dlp/issues/16028)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
+- **locipo**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/442c90da3ec680037b7d94abf91ec63b2e5a9ade) ([#15486](https://github.com/yt-dlp/yt-dlp/issues/15486)) by [doe1080](https://github.com/doe1080), [gravesducking](https://github.com/gravesducking)
+- **matchitv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/8d6e0b29bf15365638e0ceeb803a274e4db6157d) ([#15204](https://github.com/yt-dlp/yt-dlp/issues/15204)) by [gseddon](https://github.com/gseddon)
+- **odnoklassniki**: [Fix inefficient regular expression](https://github.com/yt-dlp/yt-dlp/commit/071ad7dfa012f5b71572d29ef96fc154cb2dc9cc) ([#15974](https://github.com/yt-dlp/yt-dlp/issues/15974)) by [bashonly](https://github.com/bashonly)
+- **opencast**: [Support `oc-p.uni-jena.de` URLs](https://github.com/yt-dlp/yt-dlp/commit/166356d1a1cac19cac14298e735eeae44b52c70e) ([#16026](https://github.com/yt-dlp/yt-dlp/issues/16026)) by [LordMZTE](https://github.com/LordMZTE)
+- **pornhub**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6f38df31b477cf5ea3c8f91207452e3a4e8d5aa6) ([#15858](https://github.com/yt-dlp/yt-dlp/issues/15858)) by [beacdeac](https://github.com/beacdeac)
+- **saucepluschannel**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/97f03660f55696dc9fce56e7ee43fbe3324a9867) ([#15830](https://github.com/yt-dlp/yt-dlp/issues/15830)) by [regulad](https://github.com/regulad)
+- **soundcloud**
+    - [Fix client ID extraction](https://github.com/yt-dlp/yt-dlp/commit/81bdea03f3414dd4d086610c970ec14e15bd3d36) ([#16019](https://github.com/yt-dlp/yt-dlp/issues/16019)) by [bashonly](https://github.com/bashonly)
+    - [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f532a91cef11075eb5a7809255259b32d2bca8ca) ([#16020](https://github.com/yt-dlp/yt-dlp/issues/16020)) by [bashonly](https://github.com/bashonly)
+- **spankbang**
+    - [Fix playlist title extraction](https://github.com/yt-dlp/yt-dlp/commit/1fe0bf23aa2249858c08408b7cc6287aaf528690) ([#14132](https://github.com/yt-dlp/yt-dlp/issues/14132)) by [blauerdorf](https://github.com/blauerdorf)
    - [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f05e1cd1f1052cb40fc966d2fc175571986da863) ([#14130](https://github.com/yt-dlp/yt-dlp/issues/14130)) by [blauerdorf](https://github.com/blauerdorf)
+- **steam**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1a9c4b8238434c760b3e27d0c9df6a4a2482d918) ([#15028](https://github.com/yt-dlp/yt-dlp/issues/15028)) by [doe1080](https://github.com/doe1080)
+- **tele5**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/772559e3db2eb82e5d862d6d779588ca4b0b048d) ([#16005](https://github.com/yt-dlp/yt-dlp/issues/16005)) by [bashonly](https://github.com/bashonly)
+- **tver**: olympic: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/02ce3efbfe51d54cb0866953af423fc6d1f38933) ([#15885](https://github.com/yt-dlp/yt-dlp/issues/15885)) by [doe1080](https://github.com/doe1080)
+- **tvo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/a13f281012a21c85f76cf3e320fc3b00d480d6c6) ([#15903](https://github.com/yt-dlp/yt-dlp/issues/15903)) by [doe1080](https://github.com/doe1080)
+- **twitter**: [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/0d8898c3f4e76742afb2b877f817fdee89fa1258) ([#15993](https://github.com/yt-dlp/yt-dlp/issues/15993)) by [bashonly](https://github.com/bashonly) (With fixes in [7722109](https://github.com/yt-dlp/yt-dlp/commit/77221098fc5016f12118421982f02b662021972c))
+- **visir**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/c7c45f52890eee40565188aee874ff4e58e95c4f) ([#15811](https://github.com/yt-dlp/yt-dlp/issues/15811)) by [doe1080](https://github.com/doe1080)
+- **vk**: [Solve JS challenges using native JS interpreter](https://github.com/yt-dlp/yt-dlp/commit/acfc00a955208ee780b4cb18ae26de7b62444153) ([#15992](https://github.com/yt-dlp/yt-dlp/issues/15992)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
+- **xhamster**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/133cb959be4d268e2cd6b3f1d9bf87fba4c3743e) ([#15831](https://github.com/yt-dlp/yt-dlp/issues/15831)) by [0xvd](https://github.com/0xvd)
+- **youtube**
+    - [Add more known player JS variants](https://github.com/yt-dlp/yt-dlp/commit/2204cee6d8301e491d8455a2c54fd0e1b23468f5) ([#15975](https://github.com/yt-dlp/yt-dlp/issues/15975)) by [bashonly](https://github.com/bashonly)
+    - [Extract live adaptive `incomplete` formats](https://github.com/yt-dlp/yt-dlp/commit/319a2bda83f5e54054661c56c1391533f82473c2) ([#15937](https://github.com/yt-dlp/yt-dlp/issues/15937)) by [bashonly](https://github.com/bashonly), [CanOfSocks](https://github.com/CanOfSocks)
    - [Update ejs to 0.5.0](https://github.com/yt-dlp/yt-dlp/commit/c105461647315f7f479091194944713b392ca729) ([#16031](https://github.com/yt-dlp/yt-dlp/issues/16031)) by [bashonly](https://github.com/bashonly)
+    - date, search: [Remove broken `ytsearchdate` support](https://github.com/yt-dlp/yt-dlp/commit/c7945800e4ccd8cad2d5ee7806a872963c0c6d44) ([#15959](https://github.com/yt-dlp/yt-dlp/issues/15959)) by [stastix](https://github.com/stastix)
+#### Networking changes
+- **Request Handler**: curl_cffi: [Deprioritize unreliable impersonate targets](https://github.com/yt-dlp/yt-dlp/commit/e74076141dc86d5603680ea641d7cec86a821ac8) ([#16018](https://github.com/yt-dlp/yt-dlp/issues/16018)) by [bashonly](https://github.com/bashonly)
+#### Misc. changes
+- **cleanup**
+    - [Bump ruff to 0.15.x](https://github.com/yt-dlp/yt-dlp/commit/abade83f8ddb63a11746b69038ebcd9c1405a00a) ([#15951](https://github.com/yt-dlp/yt-dlp/issues/15951)) by [Grub4K](https://github.com/Grub4K)
+    - Miscellaneous: [646bb31](https://github.com/yt-dlp/yt-dlp/commit/646bb31f39614e6c2f7ba687c53e7496394cbadb) by [Grub4K](https://github.com/Grub4K)
 ### 2026.01.31
 #### Extractor changes

@@ -202,9 +202,9 @@ CONTRIBUTORS: Changelog.md
 # The following EJS_-prefixed variables are auto-generated by devscripts/update_ejs.py
 # DO NOT EDIT!
-EJS_VERSION = 0.4.0
-EJS_WHEEL_NAME = yt_dlp_ejs-0.4.0-py3-none-any.whl
-EJS_WHEEL_HASH = sha256:19278cff397b243074df46342bb7616c404296aeaff01986b62b4e21823b0b9c
+EJS_VERSION = 0.5.0
+EJS_WHEEL_NAME = yt_dlp_ejs-0.5.0-py3-none-any.whl
+EJS_WHEEL_HASH = sha256:674fc0efea741d3100cdf3f0f9e123150715ee41edf47ea7a62fbdeda204bdec
 EJS_PY_FOLDERS = yt_dlp_ejs yt_dlp_ejs/yt yt_dlp_ejs/yt/solver
 EJS_PY_FILES = yt_dlp_ejs/__init__.py yt_dlp_ejs/_version.py yt_dlp_ejs/yt/__init__.py yt_dlp_ejs/yt/solver/__init__.py
 EJS_JS_FOLDERS = yt_dlp_ejs/yt/solver

@@ -406,7 +406,7 @@ Tip: Use `CTRL`+`F` (or `Command`+`F`) to search by keywords
 (default)
 --live-from-start Download livestreams from the start.
 Currently experimental and only supported
-for YouTube and Twitch
+for YouTube, Twitch, and TVer
 --no-live-from-start Download livestreams from the current time
 (default)
 --wait-for-video MIN[-MAX] Wait for scheduled streams to become
@@ -1864,13 +1864,13 @@ The following extractors use this feature:
 * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
 * `webpage_skip`: Skip extraction of embedded webpage data. One or both of `player_response`, `initial_data`. These options are for testing purposes and don't skip any network requests
 * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
-* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `tv`, `tv_es6`, `phone`, `tablet`. The default is `main`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
+* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `es6_tcc`, `es6_tce`, `tv`, `tv_es6`, `phone`, `house`. The default is `tv`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
 * `player_js_version`: The player javascript version to use for n/sig deciphering, in the format of `signature_timestamp@hash` (e.g. `20348@0004de42`). The default is to use what is prescribed by the site, and can be selected with `actual`
 * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
 * `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread,max-depth`. Default is `all,all,all,all,all`
 * A `max-depth` value of `1` will discard all replies, regardless of the `max-replies` or `max-replies-per-thread` values given
 * E.g. `all,all,1000,10,2` will get a maximum of 1000 replies total, with up to 10 replies per thread, and only 2 levels of depth (i.e. top-level comments plus their immediate replies). `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
-* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
+* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash, live adaptive https, and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
 * `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
 * `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
 * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
@@ -2261,7 +2261,7 @@ with yt_dlp.YoutubeDL(ydl_opts) as ydl:
 * **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
 * **YouTube improvements**:
-* Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefixes (`ytsearch:`, `ytsearchdate:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
+* Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefix (`ytsearch:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
 * Fix for [n-sig based throttling](https://github.com/ytdl-org/youtube-dl/issues/29326) **\***
 * Download livestreams from the start using `--live-from-start` (*experimental*)
 * Channel URLs download all uploads of the channel, including shorts and live

@@ -337,5 +337,10 @@
 "when": "e2ea6bd6ab639f910b99e55add18856974ff4c3a",
 "short": "[ie] Fix prioritization of Youtube URL matching (#15596)",
 "authors": ["Grub4K"]
+},
+{
+"action": "add",
+"when": "1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a",
+"short": "[priority] Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)\n - The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters"
 }
 ]

@@ -55,7 +55,7 @@ default = [
 "requests>=2.32.2,<3",
 "urllib3>=2.0.2,<3",
 "websockets>=13.0",
-"yt-dlp-ejs==0.4.0",
+"yt-dlp-ejs==0.5.0",
 ]
 curl-cffi = [
 "curl-cffi>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.15; implementation_name=='cpython'",
@@ -85,7 +85,7 @@ dev = [
 ]
 static-analysis = [
 "autopep8~=2.0",
-"ruff~=0.14.0",
+"ruff~=0.15.0",
 ]
 test = [
 "pytest~=8.1",

@@ -506,7 +506,8 @@ The only reliable way to check if a site is supported is to try it.
 - **GDCVault**: [*gdcvault*](## "netrc machine") (**Currently broken**)
 - **GediDigital**
 - **gem.cbc.ca**: [*cbcgem*](## "netrc machine")
-- **gem.cbc.ca:live**
+- **gem.cbc.ca:live**: [*cbcgem*](## "netrc machine")
+- **gem.cbc.ca:olympics**: [*cbcgem*](## "netrc machine")
 - **gem.cbc.ca:playlist**: [*cbcgem*](## "netrc machine")
 - **Genius**
 - **GeniusLyrics**
@@ -734,6 +735,8 @@ The only reliable way to check if a site is supported is to try it.
 - **Livestreamfails**
 - **Lnk**
 - **loc**: Library of Congress
+- **Locipo**
+- **LocipoPlaylist**
 - **Loco**
 - **loom**
 - **loom:folder**: (**Currently broken**)
@@ -763,6 +766,7 @@ The only reliable way to check if a site is supported is to try it.
 - **MarkizaPage**: (**Currently broken**)
 - **massengeschmack.tv**
 - **Masters**
+- **MatchiTV**
 - **MatchTV**
 - **mave**
 - **mave:channel**
@@ -1283,6 +1287,7 @@ The only reliable way to check if a site is supported is to try it.
 - **Sangiin**: 参議院インターネット審議中継 (archive)
 - **Sapo**: SAPO Vídeos
 - **SaucePlus**: Sauce+
+- **SaucePlusChannel**
 - **SBS**: sbs.com.au
 - **sbs.co.kr**
 - **sbs.co.kr:allvod_program**
@@ -1550,10 +1555,12 @@ The only reliable way to check if a site is supported is to try it.
 - **TVC**
 - **TVCArticle**
 - **TVer**
+- **tver:olympic**
 - **tvigle**: Интернет-телевидение Tvigle.ru
 - **TVIPlayer**
 - **TVN24**: (**Currently broken**)
 - **tvnoe**: Televize Noe
+- **TVO**
 - **tvopengr:embed**: tvopen.gr embedded videos
 - **tvopengr:watch**: tvopen.gr (and ethnos.gr) videos
 - **tvp**: Telewizja Polska
@@ -1664,6 +1671,7 @@ The only reliable way to check if a site is supported is to try it.
 - **ViMP:Playlist**
 - **Viously**
 - **Viqeo**: (**Currently broken**)
+- **Visir**: Vísir
 - **Viu**
 - **viu:ott**: [*viu*](## "netrc machine")
 - **viu:playlist**
@@ -1812,7 +1820,6 @@ The only reliable way to check if a site is supported is to try it.
 - **youtube:playlist**: [*youtube*](## "netrc machine") YouTube playlists
 - **youtube:recommended**: [*youtube*](## "netrc machine") YouTube recommended videos; ":ytrec" keyword
 - **youtube:search**: [*youtube*](## "netrc machine") YouTube search; "ytsearch:" prefix
--- **youtube:search:date**: [*youtube*](## "netrc machine") YouTube search, newest videos first; "ytsearchdate:" prefix
 - **youtube:search_url**: [*youtube*](## "netrc machine") YouTube search URLs with sorting and filter support
 - **youtube:shorts:pivot:audio**: [*youtube*](## "netrc machine") YouTube Shorts audio pivot (Shorts using audio of a given video)
 - **youtube:subscriptions**: [*youtube*](## "netrc machine") YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)


@@ -294,7 +294,7 @@ def expect_info_dict(self, got_dict, expected_dict):
     missing_keys = sorted(
         test_info_dict.keys() - expected_dict.keys(),
-        key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x))
+        key=ALLOWED_KEYS_SORT_ORDER.index)
     if missing_keys:
         def _repr(v):
             if isinstance(v, str):


@@ -76,6 +76,8 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._get_netrc_login_info(netrc_machine='empty_pass'), ('user', ''))
         self.assertEqual(ie._get_netrc_login_info(netrc_machine='both_empty'), ('', ''))
         self.assertEqual(ie._get_netrc_login_info(netrc_machine='nonexistent'), (None, None))
+        with self.assertRaises(ExtractorError):
+            ie._get_netrc_login_info(netrc_machine=';echo rce')

     def test_html_search_regex(self):
         html = '<p id="foo">Watch this <a href="http://www.youtube.com/watch?v=BaW_jenozKc">video</a></p>'
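The new test case above expects `_get_netrc_login_info` to raise on a machine name like `;echo rce`. A hypothetical allow-list check along the lines described in the advisory (the exact character set here is an assumption for illustration, not taken from the patch):

```python
import re

# Hypothetical allow-list of shell-safe characters for netrc machine names
NETRC_MACHINE_RE = re.compile(r'^[a-zA-Z0-9._-]+$')

assert NETRC_MACHINE_RE.match('youtube')
# Shell metacharacters are rejected, so a machine name can't smuggle a command
assert NETRC_MACHINE_RE.match(';echo rce') is None
```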


@@ -205,8 +205,8 @@ class TestLenientSimpleCookie(unittest.TestCase):
         ),
         (
             'Test quoted cookie',
-            'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
-            {'keebler': 'E=mc2; L="Loves"; fudge=\012;'},
+            'keebler="E=mc2; L=\\"Loves\\"; fudge=;"',
+            {'keebler': 'E=mc2; L="Loves"; fudge=;'},
         ),
         (
             "Allow '=' in an unquoted value",
@@ -328,4 +328,30 @@ class TestLenientSimpleCookie(unittest.TestCase):
             'Key=Value; [Invalid]=Value; Another=Value',
             {'Key': 'Value', 'Another': 'Value'},
         ),
+        # Ref: https://github.com/python/cpython/issues/143919
+        (
+            'Test invalid cookie name w/ control character',
+            'foo\012=bar;',
+            {},
+        ),
+        (
+            'Test invalid cookie name w/ control character 2',
+            'foo\015baz=bar',
+            {},
+        ),
+        (
+            'Test invalid cookie name w/ control character followed by valid cookie',
+            'foo\015=bar; x=y;',
+            {'x': 'y'},
+        ),
+        (
+            'Test invalid cookie value w/ control character',
+            'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
+            {},
+        ),
+        (
+            'Test invalid quoted attribute value w/ control character',
+            'Customer="WILE_E_COYOTE"; Version="1\\012"; Path="/acme"',
+            {},
+        ),
     )


@@ -33,9 +33,12 @@ class Variant(enum.Enum):
     tce = 'player_ias_tce.vflset/en_US/base.js'
     es5 = 'player_es5.vflset/en_US/base.js'
     es6 = 'player_es6.vflset/en_US/base.js'
+    es6_tcc = 'player_es6_tcc.vflset/en_US/base.js'
+    es6_tce = 'player_es6_tce.vflset/en_US/base.js'
     tv = 'tv-player-ias.vflset/tv-player-ias.js'
     tv_es6 = 'tv-player-es6.vflset/tv-player-es6.js'
     phone = 'player-plasma-ias-phone-en_US.vflset/base.js'
+    house = 'house_brand_player.vflset/en_US/base.js'


 @dataclasses.dataclass
@@ -102,6 +105,66 @@ CHALLENGES: list[Challenge] = [
         'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
             'ttJC2JfQdSswRAIgGBCxZyAfKyi0cjXCb3DqEctUw-NYdNmOEvaepit0zJAtIEsgOV2SXZjhSHMNy0NXNGa1kOyBf6HPuAuCduh-_',
     }),
+    # 4e51e895: main variant broke sig solving; n challenge is added only for regression testing
+    Challenge('4e51e895', Variant.main, JsChallengeType.N, {
+        '0eRGgQWJGfT5rFHFj': 't5kO23_msekBur',
+    }),
+    Challenge('4e51e895', Variant.main, JsChallengeType.SIG, {
+        'AL6p_8AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7':
+            'AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n9pKizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7',
+    }),
+    # 42c5570b: tce variant broke sig solving; n challenge is added only for regression testing
+    Challenge('42c5570b', Variant.tce, JsChallengeType.N, {
+        'ZdZIqFPQK-Ty8wId': 'CRoXjB-R-R',
+    }),
+    Challenge('42c5570b', Variant.tce, JsChallengeType.SIG, {
+        'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
+            'EN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavcOmNdYN-wUtgEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt',
+    }),
+    # 54bd1de4: tce variant broke sig solving; n challenge is added only for regression testing
+    Challenge('54bd1de4', Variant.tce, JsChallengeType.N, {
+        'ZdZIqFPQK-Ty8wId': 'ka-slAQ31sijFN',
+    }),
+    Challenge('54bd1de4', Variant.tce, JsChallengeType.SIG, {
+        'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
+            'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0titeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtp',
+    }),
+    # 94667337: tce and es6 variants broke sig solving; n and main/tv variants are added only for regression testing
+    Challenge('94667337', Variant.main, JsChallengeType.N, {
+        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
+    }),
+    Challenge('94667337', Variant.main, JsChallengeType.SIG, {
+        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
+            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
+    }),
+    Challenge('94667337', Variant.tv, JsChallengeType.N, {
+        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
+    }),
+    Challenge('94667337', Variant.tv, JsChallengeType.SIG, {
+        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
+            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
+    }),
+    Challenge('94667337', Variant.es6, JsChallengeType.N, {
+        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
+    }),
+    Challenge('94667337', Variant.es6, JsChallengeType.SIG, {
+        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
+            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
+    }),
+    Challenge('94667337', Variant.tce, JsChallengeType.N, {
+        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
+    }),
+    Challenge('94667337', Variant.tce, JsChallengeType.SIG, {
+        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
+            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
+    }),
+    Challenge('94667337', Variant.es6_tce, JsChallengeType.N, {
+        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
+    }),
+    Challenge('94667337', Variant.es6_tce, JsChallengeType.SIG, {
+        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
+            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
+    }),
 ]

 requests: list[JsChallengeRequest] = []


@@ -9,7 +9,12 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 import math

-from yt_dlp.jsinterp import JS_Undefined, JSInterpreter, js_number_to_string
+from yt_dlp.jsinterp import (
+    JS_Undefined,
+    JSInterpreter,
+    int_to_int32,
+    js_number_to_string,
+)


 class NaN:
@@ -101,8 +106,16 @@ class TestJSInterpreter(unittest.TestCase):
         self._test('function f(){return 5 ^ 9;}', 12)
         self._test('function f(){return 0.0 << NaN}', 0)
         self._test('function f(){return null << undefined}', 0)
-        # TODO: Does not work due to number too large
-        # self._test('function f(){return 21 << 4294967297}', 42)
+        self._test('function f(){return -12616 ^ 5041}', -8951)
+        self._test('function f(){return 21 << 4294967297}', 42)
+
+    def test_string_concat(self):
+        self._test('function f(){return "a" + "b";}', 'ab')
+        self._test('function f(){let x = "a"; x += "b"; return x;}', 'ab')
+        self._test('function f(){return "a" + 1;}', 'a1')
+        self._test('function f(){let x = "a"; x += 1; return x;}', 'a1')
+        self._test('function f(){return 2 + "b";}', '2b')
+        self._test('function f(){let x = 2; x += "b"; return x;}', '2b')

     def test_array_access(self):
         self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])
@@ -325,6 +338,7 @@ class TestJSInterpreter(unittest.TestCase):
         self._test('function f() { let a = {m1: 42, m2: 0 }; return [a["m1"], a.m2]; }', [42, 0])
         self._test('function f() { let a; return a?.qq; }', JS_Undefined)
         self._test('function f() { let a = {m1: 42, m2: 0 }; return a?.qq; }', JS_Undefined)
+        self._test('function f() { let a = {"1": 123}; return a[1]; }', 123)

     def test_regex(self):
         self._test('function f() { let a=/,,[/,913,/](,)}/; }', None)
@@ -447,6 +461,22 @@ class TestJSInterpreter(unittest.TestCase):
     def test_splice(self):
         self._test('function f(){var T = ["0", "1", "2"]; T["splice"](2, 1, "0")[0]; return T }', ['0', '1', '0'])

+    def test_int_to_int32(self):
+        for inp, exp in [
+            (0, 0),
+            (1, 1),
+            (-1, -1),
+            (-8951, -8951),
+            (2147483647, 2147483647),
+            (2147483648, -2147483648),
+            (2147483649, -2147483647),
+            (-2147483649, 2147483647),
+            (-2147483648, -2147483648),
+            (-16799986688, 379882496),
+            (39570129568, 915423904),
+        ]:
+            assert int_to_int32(inp) == exp
+
     def test_js_number_to_string(self):
         for test, radix, expected in [
             (0, None, '0'),
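For reference, the two's-complement wrap exercised by the new `test_int_to_int32` cases above can be sketched in plain Python. This is an illustrative reimplementation consistent with the test table, not the `yt_dlp.jsinterp` source:

```python
def int_to_int32(n: int) -> int:
    # Reduce to the low 32 bits, then reinterpret as a signed value,
    # mirroring JavaScript's ToInt32 coercion used by bitwise operators
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

# Spot-check against entries from the test table above
assert int_to_int32(2147483648) == -2147483648
assert int_to_int32(-16799986688) == 379882496
assert int_to_int32(39570129568) == 915423904
```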


@@ -1004,6 +1004,7 @@ class TestUrllibRequestHandler(TestRequestHandlerBase):

 @pytest.mark.parametrize('handler', ['Requests'], indirect=True)
 class TestRequestsRequestHandler(TestRequestHandlerBase):
+    # ruff: disable[PLW0108] `requests` and/or `urllib3` may not be available
     @pytest.mark.parametrize('raised,expected', [
         (lambda: requests.exceptions.ConnectTimeout(), TransportError),
         (lambda: requests.exceptions.ReadTimeout(), TransportError),
@@ -1017,8 +1018,10 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
         # catch-all: https://github.com/psf/requests/blob/main/src/requests/adapters.py#L535
         (lambda: urllib3.exceptions.HTTPError(), TransportError),
         (lambda: requests.exceptions.RequestException(), RequestError),
-        # (lambda: requests.exceptions.TooManyRedirects(), HTTPError) - Needs a response object
+        # Needs a response object
+        # (lambda: requests.exceptions.TooManyRedirects(), HTTPError),
     ])
+    # ruff: enable[PLW0108]
     def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
         with handler() as rh:
             def mock_get_instance(*args, **kwargs):
@@ -1034,6 +1037,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
         assert exc_info.type is expected

+    # ruff: disable[PLW0108] `urllib3` may not be available
     @pytest.mark.parametrize('raised,expected,match', [
         (lambda: urllib3.exceptions.SSLError(), SSLError, None),
         (lambda: urllib3.exceptions.TimeoutError(), TransportError, None),
@@ -1052,6 +1056,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
             '3 bytes read, 5 more expected',
         ),
     ])
+    # ruff: enable[PLW0108]
     def test_response_error_mapping(self, handler, monkeypatch, raised, expected, match):
         from requests.models import Response as RequestsResponse
         from urllib3.response import HTTPResponse as Urllib3Response


@@ -239,6 +239,7 @@ class TestTraversal:
             'accept matching `expected_type` type'
         assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int) is None, \
             'reject non matching `expected_type` type'
+        # ruff: noqa: PLW0108 `type`s get special treatment, so wrap in lambda
         assert traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)) == '0', \
             'transform type using type function'
         assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0) is None, \


@@ -924,6 +924,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(month_by_name(None), None)
         self.assertEqual(month_by_name('December', 'en'), 12)
         self.assertEqual(month_by_name('décembre', 'fr'), 12)
+        self.assertEqual(month_by_name('desember', 'is'), 12)
         self.assertEqual(month_by_name('December'), 12)
         self.assertEqual(month_by_name('décembre'), None)
         self.assertEqual(month_by_name('Unknown', 'unknown'), None)


@@ -448,6 +448,7 @@ def create_fake_ws_connection(raised):

 @pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
 class TestWebsocketsRequestHandler:
+    # ruff: disable[PLW0108] `websockets` may not be available
     @pytest.mark.parametrize('raised,expected', [
         # https://websockets.readthedocs.io/en/stable/reference/exceptions.html
         (lambda: websockets.exceptions.InvalidURI(msg='test', uri='test://'), RequestError),
@@ -459,13 +460,14 @@ class TestWebsocketsRequestHandler:
         (lambda: websockets.exceptions.NegotiationError(), TransportError),
         # Catch-all
         (lambda: websockets.exceptions.WebSocketException(), TransportError),
-        (lambda: TimeoutError(), TransportError),
+        (TimeoutError, TransportError),
         # These may be raised by our create_connection implementation, which should also be caught
-        (lambda: OSError(), TransportError),
-        (lambda: ssl.SSLError(), SSLError),
-        (lambda: ssl.SSLCertVerificationError(), CertificateVerifyError),
-        (lambda: socks.ProxyError(), ProxyError),
+        (OSError, TransportError),
+        (ssl.SSLError, SSLError),
+        (ssl.SSLCertVerificationError, CertificateVerifyError),
+        (socks.ProxyError, ProxyError),
     ])
+    # ruff: enable[PLW0108]
     def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
         import websockets.sync.client

@@ -482,11 +484,12 @@ class TestWebsocketsRequestHandler:
     @pytest.mark.parametrize('raised,expected,match', [
         # https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.send
         (lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
-        (lambda: RuntimeError(), TransportError, None),
-        (lambda: TimeoutError(), TransportError, None),
-        (lambda: TypeError(), RequestError, None),
-        (lambda: socks.ProxyError(), ProxyError, None),
+        (RuntimeError, TransportError, None),
+        (TimeoutError, TransportError, None),
+        (TypeError, RequestError, None),
+        (socks.ProxyError, ProxyError, None),
         # Catch-all
+        # ruff: noqa: PLW0108 `websockets` may not be available
         (lambda: websockets.exceptions.WebSocketException(), TransportError, None),
     ])
     def test_ws_send_error_mapping(self, handler, monkeypatch, raised, expected, match):
@@ -499,10 +502,11 @@ class TestWebsocketsRequestHandler:
     @pytest.mark.parametrize('raised,expected,match', [
         # https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.recv
         (lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
-        (lambda: RuntimeError(), TransportError, None),
-        (lambda: TimeoutError(), TransportError, None),
-        (lambda: socks.ProxyError(), ProxyError, None),
+        (RuntimeError, TransportError, None),
+        (TimeoutError, TransportError, None),
+        (socks.ProxyError, ProxyError, None),
         # Catch-all
+        # ruff: noqa: PLW0108 `websockets` may not be available
         (lambda: websockets.exceptions.WebSocketException(), TransportError, None),
     ])
     def test_ws_recv_error_mapping(self, handler, monkeypatch, raised, expected, match):


@@ -1168,6 +1168,7 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
     # We use Morsel's legal key chars to avoid errors on setting values
     _LEGAL_KEY_CHARS = r'\w\d' + re.escape('!#$%&\'*+-.:^_`|~')
     _LEGAL_VALUE_CHARS = _LEGAL_KEY_CHARS + re.escape('(),/<=>?@[]{}')
+    _LEGAL_KEY_RE = re.compile(rf'[{_LEGAL_KEY_CHARS}]+', re.ASCII)

     _RESERVED = {
         'expires',
@@ -1185,17 +1186,17 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
     # Added 'bad' group to catch the remaining value
     _COOKIE_PATTERN = re.compile(r'''
-        \s*                            # Optional whitespace at start of cookie
+        [ ]*                           # Optional whitespace at start of cookie
         (?P<key>                       # Start of group 'key'
-        [''' + _LEGAL_KEY_CHARS + r''']+?  # Any word of at least one letter
+        [^ =;]+                        # Match almost anything here for now and validate later
         )                              # End of group 'key'
         (                              # Optional group: there may not be a value.
-        \s*=\s*                        # Equal Sign
+        [ ]*=[ ]*                      # Equal Sign
         (                              # Start of potential value
         (?P<val>                       # Start of group 'val'
         "(?:[^\\"]|\\.)*"              # Any doublequoted string
         |                              # or
-        \w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT  # Special case for "expires" attr
+        \w{3},\ [\w\d -]{9,11}\ [\d:]{8}\ GMT  # Special case for "expires" attr
         |                              # or
         [''' + _LEGAL_VALUE_CHARS + r''']*  # Any word or empty string
         )                              # End of group 'val'
@@ -1203,10 +1204,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
         (?P<bad>(?:\\;|[^;])*?)        # 'bad' group fallback for invalid values
         )                              # End of potential value
         )?                             # End of optional value group
-        \s*                            # Any number of spaces.
-        (\s+|;|$)                      # Ending either at space, semicolon, or EOS.
+        [ ]*                           # Any number of spaces.
+        ([ ]+|;|$)                     # Ending either at space, semicolon, or EOS.
         ''', re.ASCII | re.VERBOSE)

+    # http.cookies.Morsel raises on values w/ control characters in Python 3.14.3+ & 3.13.12+
+    # Ref: https://github.com/python/cpython/issues/143919
+    _CONTROL_CHARACTER_RE = re.compile(r'[\x00-\x1F\x7F]')
+
     def load(self, data):
         # Workaround for https://github.com/yt-dlp/yt-dlp/issues/4776
         if not isinstance(data, str):
@@ -1219,6 +1224,9 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
                 continue

             key, value = match.group('key', 'val')
+            if not self._LEGAL_KEY_RE.fullmatch(key):
+                morsel = None
+                continue

             is_attribute = False
             if key.startswith('$'):
@@ -1237,6 +1245,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
                         value = True
                     else:
                         value, _ = self.value_decode(value)
+                        # Guard against control characters in quoted attribute values
+                        if self._CONTROL_CHARACTER_RE.search(value):
+                            # While discarding the entire morsel is not very lenient,
+                            # it's better than http.cookies.Morsel raising a CookieError
+                            # and it's probably better to err on the side of caution
+                            self.pop(morsel.key, None)
+                            morsel = None
+                            continue
                     morsel[key] = value
@@ -1246,6 +1262,10 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
             elif value is not None:
                 morsel = self.get(key, http.cookies.Morsel())
                 real_value, coded_value = self.value_decode(value)
+                # Guard against control characters in quoted cookie values
+                if self._CONTROL_CHARACTER_RE.search(real_value):
+                    morsel = None
+                    continue
                 morsel.set(key, real_value, coded_value)
                 self[key] = morsel


@@ -311,8 +311,10 @@ from .canalsurmas import CanalsurmasIE
 from .caracoltv import CaracolTvPlayIE
 from .cbc import (
     CBCIE,
+    CBCGemContentIE,
     CBCGemIE,
     CBCGemLiveIE,
+    CBCGemOlympicsIE,
     CBCGemPlaylistIE,
     CBCListenIE,
     CBCPlayerIE,
@@ -1029,6 +1031,10 @@ from .livestream import (
 )
 from .livestreamfails import LivestreamfailsIE
 from .lnk import LnkIE
+from .locipo import (
+    LocipoIE,
+    LocipoPlaylistIE,
+)
 from .loco import LocoIE
 from .loom import (
     LoomFolderIE,
@@ -1071,6 +1077,7 @@ from .markiza import (
 )
 from .massengeschmacktv import MassengeschmackTVIE
 from .masters import MastersIE
+from .matchitv import MatchiTVIE
 from .matchtv import MatchTVIE
 from .mave import (
     MaveChannelIE,
@@ -1785,7 +1792,10 @@ from .safari import (
 from .saitosan import SaitosanIE
 from .samplefocus import SampleFocusIE
 from .sapo import SapoIE
-from .sauceplus import SaucePlusIE
+from .sauceplus import (
+    SaucePlusChannelIE,
+    SaucePlusIE,
+)
 from .sbs import SBSIE
 from .sbscokr import (
     SBSCoKrAllvodProgramIE,
@@ -2174,11 +2184,15 @@ from .tvc import (
     TVCIE,
     TVCArticleIE,
 )
-from .tver import TVerIE
+from .tver import (
+    TVerIE,
+    TVerOlympicIE,
+)
 from .tvigle import TvigleIE
 from .tviplayer import TVIPlayerIE
 from .tvn24 import TVN24IE
 from .tvnoe import TVNoeIE
+from .tvo import TvoIE
 from .tvopengr import (
     TVOpenGrEmbedIE,
     TVOpenGrWatchIE,
@@ -2343,6 +2357,7 @@ from .vimm import (
 )
 from .viously import ViouslyIE
 from .viqeo import ViqeoIE
+from .visir import VisirIE
 from .viu import (
     ViuIE,
     ViuOTTIE,
@@ -2541,7 +2556,6 @@ from .youtube import (
     YoutubeNotificationsIE,
     YoutubePlaylistIE,
     YoutubeRecommendedIE,
-    YoutubeSearchDateIE,
     YoutubeSearchIE,
     YoutubeSearchURLIE,
     YoutubeShortsAudioPivotIE,


@@ -5,10 +5,12 @@ from ..utils import (
ExtractorError, ExtractorError,
GeoRestrictedError, GeoRestrictedError,
int_or_none, int_or_none,
make_archive_id,
remove_start, remove_start,
traverse_obj,
update_url_query, update_url_query,
url_or_none,
) )
from ..utils.traversal import traverse_obj
class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
@@ -29,6 +31,19 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'historyvault.com': (None, 'historyvault', None), 'historyvault.com': (None, 'historyvault', None),
'biography.com': (None, 'biography', None), 'biography.com': (None, 'biography', None),
} }
_GRAPHQL_QUERY = '''
query getUserVideo($videoId: ID!) {
video(id: $videoId) {
title
publicUrl
programId
tvSeasonNumber
tvSeasonEpisodeNumber
series {
title
}
}
}'''
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = { query = {
@@ -73,19 +88,39 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
def _extract_aetn_info(self, domain, filter_key, filter_value, url): def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand, software_statement = self._DOMAIN_MAP[domain] requestor_id, brand, software_statement = self._DOMAIN_MAP[domain]
if filter_key == 'canonical':
webpage = self._download_webpage(url, filter_value)
graphql_video_id = self._search_regex(
r'<meta\b[^>]+\bcontent="[^"]*\btpid/(\d+)"', webpage,
'id') or self._html_search_meta('videoId', webpage, 'GraphQL video ID', fatal=True)
else:
graphql_video_id = filter_value
result = self._download_json( result = self._download_json(
f'https://feeds.video.aetnd.com/api/v2/{brand}/videos', 'https://yoga.appsvcs.aetnd.com/', graphql_video_id,
filter_value, query={f'filter[{filter_key}]': filter_value}) query={
result = traverse_obj( 'brand': brand,
result, ('results', 'mode': 'live',
lambda k, v: k == 0 and v[filter_key] == filter_value), 'platform': 'web',
get_all=False) },
if not result: data=json.dumps({
'operationName': 'getUserVideo',
'variables': {
'videoId': graphql_video_id,
},
'query': self._GRAPHQL_QUERY,
}).encode(),
headers={
'Content-Type': 'application/json',
})
result = traverse_obj(result, ('data', 'video', {dict}))
media_url = traverse_obj(result, ('publicUrl', {url_or_none}))
if not media_url:
raise ExtractorError('Show not found in A&E feed (too new?)', expected=True, raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
video_id=remove_start(filter_value, '/')) video_id=remove_start(filter_value, '/'))
title = result['title'] title = result['title']
video_id = result['id'] video_id = result['programId']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id) r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
@@ -100,9 +135,13 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
info.update(self._extract_aen_smil(media_url, video_id, auth)) info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({ info.update({
'title': title, 'title': title,
'series': result.get('seriesName'), 'display_id': graphql_video_id,
'season_number': int_or_none(result.get('tvSeasonNumber')), '_old_archive_ids': [make_archive_id(self, graphql_video_id)],
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')), **traverse_obj(result, {
'series': ('series', 'title', {str}),
'season_number': ('tvSeasonNumber', {int_or_none}),
'episode_number': ('tvSeasonEpisodeNumber', {int_or_none}),
}),
}) })
return info return info
@@ -116,7 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
(?:shows/[^/?#]+/)?videos/[^/?#]+ (?:shows/[^/?#]+/)?videos/[^/?#]+
)''' )'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'https://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': { 'info_dict': {
'id': '22253814', 'id': '22253814',
'ext': 'mp4', 'ext': 'mp4',
@@ -139,11 +178,11 @@ class AENetworksIE(AENetworksBaseIE):
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'skip': 'Geo-restricted - This content is not available in your location.', 'skip': 'This content requires a valid, unexpired auth token',
}, { }, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1', 'url': 'https://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'info_dict': { 'info_dict': {
'id': '600587331957', 'id': '147486',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Inlawful Entry', 'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08', 'description': 'md5:57c12115a2b384d883fe64ca50529e08',
@@ -160,6 +199,8 @@ class AENetworksIE(AENetworksBaseIE):
'season_number': 9, 'season_number': 9,
'series': 'Duck Dynasty', 'series': 'Duck Dynasty',
'age_limit': 0, 'age_limit': 0,
'display_id': '600587331957',
'_old_archive_ids': ['aenetworks 600587331957'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
@@ -186,6 +227,7 @@ class AENetworksIE(AENetworksBaseIE):
         },
         'params': {'skip_download': 'm3u8'},
         'add_ie': ['ThePlatform'],
+        'skip': '404 Not Found',
     }, {
         'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story',
         'info_dict': {
@@ -209,6 +251,7 @@ class AENetworksIE(AENetworksBaseIE):
         },
         'params': {'skip_download': 'm3u8'},
         'add_ie': ['ThePlatform'],
+        'skip': 'This content requires a valid, unexpired auth token',
     }, {
         'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
         'only_matching': True,
@@ -259,7 +302,7 @@ class AENetworksListBaseIE(AENetworksBaseIE):
         domain, slug = self._match_valid_url(url).groups()
         _, brand, _ = self._DOMAIN_MAP[domain]
         playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
-        base_url = f'http://watch.{domain}'
+        base_url = f'https://watch.{domain}'
         entries = []
         for item in (playlist.get(self._ITEMS_KEY) or []):


@@ -11,18 +11,18 @@ from ..utils.traversal import traverse_obj
 class ApplePodcastsIE(InfoExtractor):
     _VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://podcasts.apple.com/us/podcast/ferreck-dawn-to-the-break-of-dawn-117/id1625658232?i=1000665010654',
-        'md5': '82cc219b8cc1dcf8bfc5a5e99b23b172',
+        'url': 'https://podcasts.apple.com/us/podcast/urbana-podcast-724-by-david-penn/id1531349107?i=1000748574256',
+        'md5': 'f8a6f92735d0cfbd5e6a7294151e28d8',
         'info_dict': {
-            'id': '1000665010654',
-            'ext': 'mp3',
-            'title': 'Ferreck Dawn - To The Break of Dawn 117',
-            'episode': 'Ferreck Dawn - To The Break of Dawn 117',
-            'description': 'md5:8c4f5c2c30af17ed6a98b0b9daf15b76',
-            'upload_date': '20240812',
-            'timestamp': 1723449600,
-            'duration': 3596,
-            'series': 'Ferreck Dawn - To The Break of Dawn',
+            'id': '1000748574256',
+            'ext': 'm4a',
+            'title': 'URBANA PODCAST 724 BY DAVID PENN',
+            'episode': 'URBANA PODCAST 724 BY DAVID PENN',
+            'description': 'md5:fec77bacba32db8c9b3dda5486ed085f',
+            'upload_date': '20260206',
+            'timestamp': 1770400801,
+            'duration': 3602,
+            'series': 'Urbana Radio Show',
             'thumbnail': 're:.+[.](png|jpe?g|webp)',
         },
     }, {
@@ -57,22 +57,22 @@ class ApplePodcastsIE(InfoExtractor):
         webpage = self._download_webpage(url, episode_id)
         server_data = self._search_json(
             r'<script [^>]*\bid=["\']serialized-server-data["\'][^>]*>', webpage,
-            'server data', episode_id, contains_pattern=r'\[{(?s:.+)}\]')[0]['data']
+            'server data', episode_id)['data'][0]['data']
         model_data = traverse_obj(server_data, (
             'headerButtonItems', lambda _, v: v['$kind'] == 'share' and v['modelType'] == 'EpisodeLockup',
             'model', {dict}, any))
         return {
             'id': episode_id,
+            **self._json_ld(
+                traverse_obj(server_data, ('seoData', 'schemaContent', {dict}))
+                or self._yield_json_ld(webpage, episode_id, fatal=False), episode_id, fatal=False),
             **traverse_obj(model_data, {
                 'title': ('title', {str}),
                 'description': ('summary', {clean_html}),
                 'url': ('playAction', 'episodeOffer', 'streamUrl', {clean_podcast_url}),
                 'timestamp': ('releaseDate', {parse_iso8601}),
                 'duration': ('duration', {int_or_none}),
-                'episode': ('title', {str}),
-                'episode_number': ('episodeNumber', {int_or_none}),
-                'series': ('showTitle', {str}),
             }),
             'thumbnail': self._og_search_thumbnail(webpage),
             'vcodec': 'none',


@@ -124,7 +124,7 @@ class BilibiliBaseIE(InfoExtractor):
             **traverse_obj(play_info, {
                 'quality': ('quality', {int_or_none}),
                 'format_id': ('quality', {str_or_none}),
-                'format_note': ('quality', {lambda x: format_names.get(x)}),
+                'format_note': ('quality', {format_names.get}),
                 'duration': ('timelength', {float_or_none(scale=1000)}),
             }),
             **parse_resolution(format_names.get(play_info.get('quality'))),


@@ -10,6 +10,7 @@ from ..utils import (
     ExtractorError,
     float_or_none,
     int_or_none,
+    join_nonempty,
     js_to_json,
     jwt_decode_hs256,
     mimetype2ext,
@@ -25,6 +26,7 @@ from ..utils import (
     url_basename,
     url_or_none,
     urlencode_postdata,
+    urljoin,
 )
 from ..utils.traversal import require, traverse_obj, trim_str
@@ -540,6 +542,32 @@ class CBCGemBaseIE(InfoExtractor):
             f'https://services.radio-canada.ca/ott/catalog/v2/gem/show/{item_id}',
             display_id or item_id, query={'device': 'web'})

+    def _call_media_api(self, media_id, app_code='gem', display_id=None, headers=None):
+        media_data = self._download_json(
+            'https://services.radio-canada.ca/media/validation/v2/',
+            display_id or media_id, headers=headers, query={
+                'appCode': app_code,
+                'connectionType': 'hd',
+                'deviceType': 'ipad',
+                'multibitrate': 'true',
+                'output': 'json',
+                'tech': 'hls',
+                'manifestVersion': '2',
+                'manifestType': 'desktop',
+                'idMedia': media_id,
+            })
+        error_code = traverse_obj(media_data, ('errorCode', {int}))
+        if error_code == 1:
+            self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
+        if error_code == 35:
+            self.raise_login_required(method='password')
+        if error_code != 0:
+            error_message = join_nonempty(error_code, media_data.get('message'), delim=' - ')
+            raise ExtractorError(f'{self.IE_NAME} said: {error_message}')
+        return media_data
+
     def _extract_item_info(self, item_info):
         episode_number = None
         title = traverse_obj(item_info, ('title', {str}))
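For reference, the `errorCode` handling that the new `_call_media_api` helper centralizes maps validation-API codes to actions; a standalone sketch of that mapping (illustrative only — the function name here is not part of yt-dlp):

```python
def classify_validation_error(media_data):
    # Mirrors the mapping in the new _call_media_api helper:
    # errorCode 1 -> geo-restriction, 35 -> login required, 0 -> success,
    # anything else -> a generic "<code> - <message>" error
    code = media_data.get('errorCode')
    if code == 1:
        return 'geo_restricted'
    if code == 35:
        return 'login_required'
    if code != 0:
        return f'error: {code} - {media_data.get("message")}'
    return 'ok'

print(classify_validation_error({'errorCode': 0}))   # ok
print(classify_validation_error({'errorCode': 1}))   # geo_restricted
print(classify_validation_error({'errorCode': 35}))  # login_required
```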
@@ -567,7 +595,7 @@ class CBCGemBaseIE(InfoExtractor):
 class CBCGemIE(CBCGemBaseIE):
     IE_NAME = 'gem.cbc.ca'
-    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]+)'
+    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]{2,4})/?(?:[?#]|$)'
     _TESTS = [{
         # This is a normal, public, TV show video
         'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@@ -709,29 +737,10 @@ class CBCGemIE(CBCGemBaseIE):
         if claims_token := self._fetch_claims_token():
             headers['x-claims-token'] = claims_token

-        m3u8_info = self._download_json(
-            'https://services.radio-canada.ca/media/validation/v2/',
-            video_id, headers=headers, query={
-                'appCode': 'gem',
-                'connectionType': 'hd',
-                'deviceType': 'ipad',
-                'multibitrate': 'true',
-                'output': 'json',
-                'tech': 'hls',
-                'manifestVersion': '2',
-                'manifestType': 'desktop',
-                'idMedia': item_info['idMedia'],
-            })
-        if m3u8_info.get('errorCode') == 1:
-            self.raise_geo_restricted(countries=['CA'])
-        elif m3u8_info.get('errorCode') == 35:
-            self.raise_login_required(method='password')
-        elif m3u8_info.get('errorCode') != 0:
-            raise ExtractorError(f'{self.IE_NAME} said: {m3u8_info.get("errorCode")} - {m3u8_info.get("message")}')
+        m3u8_url = self._call_media_api(
+            item_info['idMedia'], display_id=video_id, headers=headers)['url']

         formats = self._extract_m3u8_formats(
-            m3u8_info['url'], video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
+            m3u8_url, video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
         self._remove_duplicate_formats(formats)

         for fmt in formats:
@@ -801,7 +810,128 @@ class CBCGemPlaylistIE(CBCGemBaseIE):
         }), series=traverse_obj(show_info, ('title', {str})))

-class CBCGemLiveIE(InfoExtractor):
+class CBCGemContentIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:content'
IE_DESC = False # Do not list
_VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>[0-9a-z-]+)/?(?:[?#]|$)'
_TESTS = [{
# Series URL; content_type == 'Season'
'url': 'https://gem.cbc.ca/the-tunnel',
'playlist_count': 3,
'info_dict': {
'id': 'the-tunnel',
},
}, {
# Miniseries URL; content_type == 'Parts'
'url': 'https://gem.cbc.ca/summit-72',
'playlist_count': 1,
'info_dict': {
'id': 'summit-72',
},
}, {
# Olympics URL; content_type == 'Standalone'
'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086',
'info_dict': {
'id': 'ski-jumping-nh-individual-womens-final-30086',
'ext': 'mp4',
'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
'duration': 12793,
'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
'release_timestamp': 1770482100,
'release_date': '20260207',
'live_status': 'was_live',
},
}, {
# Movie URL; content_type == 'Standalone'; requires authentication
'url': 'https://gem.cbc.ca/copa-71',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._search_nextjs_data(webpage, display_id)['props']['pageProps']['data']
content_type = data['contentType']
self.write_debug(f'Routing for content type "{content_type}"')
if content_type == 'Standalone':
new_url = traverse_obj(data, (
'header', 'cta', 'media', 'url', {urljoin('https://gem.cbc.ca/')}))
if CBCGemOlympicsIE.suitable(new_url):
return self.url_result(new_url, CBCGemOlympicsIE)
# Manually construct non-Olympics standalone URLs to avoid returning trailer URLs
return self.url_result(f'https://gem.cbc.ca/{display_id}/s01e01', CBCGemIE)
# Handle series URLs (content_type == 'Season') and miniseries URLs (content_type == 'Parts')
def entries():
for playlist_url in traverse_obj(data, (
'content', ..., 'lineups', ..., 'url', {urljoin('https://gem.cbc.ca/')},
{lambda x: x if CBCGemPlaylistIE.suitable(x) else None},
)):
yield self.url_result(playlist_url, CBCGemPlaylistIE)
return self.playlist_result(entries(), display_id)
class CBCGemOlympicsIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:olympics'
_VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>(?:[0-9a-z]+-)+[0-9]{5,})/s01e(?P<media_id>[0-9]{5,})'
_TESTS = [{
'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086/s01e30086',
'info_dict': {
'id': 'ski-jumping-nh-individual-womens-final-30086',
'ext': 'mp4',
'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
'duration': 12793,
'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
'release_timestamp': 1770482100,
'release_date': '20260207',
'live_status': 'was_live',
},
}]
def _real_extract(self, url):
video_id, media_id = self._match_valid_url(url).group('id', 'media_id')
video_info = self._call_show_api(video_id)
item_info = traverse_obj(video_info, (
'content', ..., 'lineups', ..., 'items',
lambda _, v: v['formattedIdMedia'] == media_id, any, {require('item info')}))
live_status = {
'LiveEvent': 'is_live',
'Replay': 'was_live',
}.get(item_info.get('type'))
release_timestamp = traverse_obj(item_info, (
'metadata', (('live', 'startDate'), ('replay', 'airDate')), {parse_iso8601}, any))
if live_status == 'is_live' and release_timestamp and release_timestamp > time.time():
formats = []
live_status = 'is_upcoming'
self.raise_no_formats('This livestream has not yet started', expected=True)
else:
m3u8_url = self._call_media_api(media_id, 'medianetlive', video_id)['url']
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')
return {
'id': video_id,
'formats': formats,
'live_status': live_status,
'release_timestamp': release_timestamp,
**traverse_obj(item_info, {
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('images', 'card', 'url', {url_or_none}),
'duration': ('metadata', 'replay', 'duration', {int_or_none}),
}),
}
class CBCGemLiveIE(CBCGemBaseIE):
     IE_NAME = 'gem.cbc.ca:live'
     _VALID_URL = r'https?://gem\.cbc\.ca/live(?:-event)?/(?P<id>\d+)'
     _TESTS = [
@@ -871,7 +1001,6 @@ class CBCGemLiveIE(InfoExtractor):
             'only_matching': True,
         },
     ]
-    _GEO_COUNTRIES = ['CA']

     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -900,19 +1029,8 @@ class CBCGemLiveIE(InfoExtractor):
             live_status = 'is_upcoming'
             self.raise_no_formats('This livestream has not yet started', expected=True)
         else:
-            stream_data = self._download_json(
-                'https://services.radio-canada.ca/media/validation/v2/', video_id, query={
-                    'appCode': 'medianetlive',
-                    'connectionType': 'hd',
-                    'deviceType': 'ipad',
-                    'idMedia': video_stream_id,
-                    'multibitrate': 'true',
-                    'output': 'json',
-                    'tech': 'hls',
-                    'manifestType': 'desktop',
-                })
-            formats = self._extract_m3u8_formats(
-                stream_data['url'], video_id, 'mp4', live=live_status == 'is_live')
+            m3u8_url = self._call_media_api(video_stream_id, 'medianetlive', video_id)['url']
+            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')

         return {
             'id': video_id,

@@ -661,9 +661,11 @@ class InfoExtractor:
         if not self._ready:
             self._initialize_pre_login()
             if self.supports_login():
-                username, password = self._get_login_info()
-                if username:
-                    self._perform_login(username, password)
+                # try login only if it would actually do anything
+                if type(self)._perform_login is not InfoExtractor._perform_login:
+                    username, password = self._get_login_info()
+                    if username:
+                        self._perform_login(username, password)
                 elif self.get_param('username') and False not in (self.IE_DESC, self._NETRC_MACHINE):
                     self.report_warning(f'Login with password is not supported for this website. {self._login_hint("cookies")}')
             self._real_initialize()
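The gating added above detects whether a subclass actually overrides `_perform_login` by comparing the unbound method against the base class's. A minimal standalone demonstration of that pattern (class names here are hypothetical, not yt-dlp's):

```python
class Base:
    def _perform_login(self, username, password):
        pass  # default no-op, as in InfoExtractor

class NoLogin(Base):
    pass  # does not override _perform_login

class WithLogin(Base):
    def _perform_login(self, username, password):
        print(f'logging in as {username}')

def overrides_login(ie):
    # True only if type(ie) (or an intermediate class) overrides _perform_login
    return type(ie)._perform_login is not Base._perform_login

print(overrides_login(NoLogin()))    # False
print(overrides_login(WithLogin()))  # True
```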
@@ -1385,6 +1387,11 @@ class InfoExtractor:
     def _get_netrc_login_info(self, netrc_machine=None):
         netrc_machine = netrc_machine or self._NETRC_MACHINE
+        if not netrc_machine:
+            raise ExtractorError(f'Missing netrc_machine and {type(self).__name__}._NETRC_MACHINE')
+
+        ALLOWED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_'
+        if netrc_machine.startswith(('-', '_')) or not all(c in ALLOWED for c in netrc_machine):
+            raise ExtractorError(f'Invalid netrc machine: {netrc_machine!r}', expected=True)

         cmd = self.get_param('netrc_cmd')
         if cmd:
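The allow-list introduced here restricts `netrc_machine` to shell-safe characters before it can be interpolated into a `--netrc-cmd` invocation (see GHSA-g3gw-q23r-pgqm in the commit log). The check can be exercised in isolation; a minimal sketch (the helper name is hypothetical, not yt-dlp's API):

```python
ALLOWED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_'

def is_valid_netrc_machine(netrc_machine):
    # Mirrors the new validation: reject empty values, a leading '-' or '_',
    # and any character outside the shell-safe allow-list
    if not netrc_machine:
        return False
    if netrc_machine.startswith(('-', '_')):
        return False
    return all(c in ALLOWED for c in netrc_machine)

print(is_valid_netrc_machine('soundcloud'))   # True
print(is_valid_netrc_machine('gem.cbc.ca'))   # True
print(is_valid_netrc_machine('$(rm -rf ~)'))  # False
print(is_valid_netrc_machine('-evil'))        # False
```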


@@ -384,8 +384,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
         last_error = None
         for note, kwargs in (
-            ('Downloading m3u8 information', {}),
-            ('Retrying m3u8 download with randomized headers', {
+            ('Downloading m3u8 information with randomized headers', {
                 'headers': self._generate_blockbuster_headers(),
             }),
             ('Retrying m3u8 download with Chrome impersonation', {


@@ -1041,8 +1041,6 @@ class FacebookAdsIE(InfoExtractor):
             'uploader': 'Casper',
             'uploader_id': '224110981099062',
             'uploader_url': 'https://www.facebook.com/Casper/',
-            'timestamp': 1766299837,
-            'upload_date': '20251221',
             'like_count': int,
         },
         'playlist_count': 2,
@@ -1054,12 +1052,23 @@ class FacebookAdsIE(InfoExtractor):
             'uploader': 'Case \u00e0 Chocs',
             'uploader_id': '112960472096793',
             'uploader_url': 'https://www.facebook.com/Caseachocs/',
-            'timestamp': 1768498293,
-            'upload_date': '20260115',
             'like_count': int,
             'description': 'md5:f02a255fcf7dce6ed40e9494cf4bc49a',
         },
         'playlist_count': 3,
+    }, {
+        'url': 'https://www.facebook.com/ads/library/?id=1704834754236452',
+        'info_dict': {
+            'id': '1704834754236452',
+            'ext': 'mp4',
+            'title': 'Get answers now!',
+            'description': 'Ask the best psychics and get accurate answers on questions that bother you!',
+            'uploader': 'Your Relationship Advisor',
+            'uploader_id': '108939234726306',
+            'uploader_url': 'https://www.facebook.com/100068970634636/',
+            'like_count': int,
+            'thumbnail': r're:https://.+/.+\.jpg',
+        },
     }, {
         'url': 'https://es-la.facebook.com/ads/library/?id=901230958115569',
         'only_matching': True,
@@ -1123,8 +1132,11 @@ class FacebookAdsIE(InfoExtractor):
         post_data = traverse_obj(
             re.findall(r'data-sjs>({.*?ScheduledServerJS.*?})</script>', webpage), (..., {json.loads}))
         data = get_first(post_data, (
-            'require', ..., ..., ..., '__bbox', 'require', ..., ..., ...,
-            'entryPointRoot', 'otherProps', 'deeplinkAdCard', 'snapshot', {dict}))
+            'require', ..., ..., ..., '__bbox', 'require', ..., ..., ..., (
+                ('__bbox', 'result', 'data', 'ad_library_main', 'deeplink_ad_archive_result', 'deeplink_ad_archive'),
+                # old path
+                ('entryPointRoot', 'otherProps', 'deeplinkAdCard'),
+            ), 'snapshot', {dict}))
         if not data:
             raise ExtractorError('Unable to extract ad data')
@@ -1140,11 +1152,12 @@ class FacebookAdsIE(InfoExtractor):
             'title': title,
             'description': markup or None,
         }, traverse_obj(data, {
-            'description': ('link_description', {lambda x: x if not x.startswith('{{product.') else None}),
+            'description': (
+                (('body', 'text'), 'link_description'),
+                {lambda x: x if not x.startswith('{{product.') else None}, any),
             'uploader': ('page_name', {str}),
             'uploader_id': ('page_id', {str_or_none}),
             'uploader_url': ('page_profile_uri', {url_or_none}),
-            'timestamp': ('creation_time', {int_or_none}),
             'like_count': ('page_like_count', {int_or_none}),
         }))
@@ -1155,7 +1168,8 @@ class FacebookAdsIE(InfoExtractor):
             entries.append({
                 'id': f'{video_id}_{idx}',
                 'title': entry.get('title') or title,
-                'description': traverse_obj(entry, 'body', 'link_description') or info_dict.get('description'),
+                'description': traverse_obj(
+                    entry, 'body', 'link_description', expected_type=str) or info_dict.get('description'),
                 'thumbnail': url_or_none(entry.get('video_preview_image_url')),
                 'formats': self._extract_formats(entry),
             })


@@ -3,10 +3,12 @@ import urllib.parse
 from .common import InfoExtractor
 from ..utils import (
     determine_ext,
+    float_or_none,
     int_or_none,
     join_nonempty,
     mimetype2ext,
     parse_qs,
+    unescapeHTML,
     unified_strdate,
     url_or_none,
 )
@@ -107,6 +109,11 @@ class FirstTVIE(InfoExtractor):
                 'timestamp': ('dvr_begin_at', {int_or_none}),
                 'upload_date': ('date_air', {unified_strdate}),
                 'duration': ('duration', {int_or_none}),
+                'chapters': ('episodes', lambda _, v: float_or_none(v['from']) is not None, {
+                    'start_time': ('from', {float_or_none}),
+                    'title': ('name', {str}, {unescapeHTML}),
+                    'end_time': ('to', {float_or_none}),
+                }),
             }),
             'id': video_id,
             'formats': formats,

View File

@@ -318,9 +318,48 @@ class FloatplaneIE(FloatplaneBaseIE):
         self.raise_login_required()

-class FloatplaneChannelIE(InfoExtractor):
+class FloatplaneChannelBaseIE(InfoExtractor):
"""Subclasses must set _RESULT_IE, _BASE_URL and _PAGE_SIZE"""
def _fetch_page(self, display_id, creator_id, channel_id, page):
query = {
'id': creator_id,
'limit': self._PAGE_SIZE,
'fetchAfter': page * self._PAGE_SIZE,
}
if channel_id:
query['channel'] = channel_id
page_data = self._download_json(
f'{self._BASE_URL}/api/v3/content/creator', display_id,
query=query, note=f'Downloading page {page + 1}')
for post in page_data or []:
yield self.url_result(
f'{self._BASE_URL}/post/{post["id"]}',
self._RESULT_IE, id=post['id'], title=post.get('title'),
release_timestamp=parse_iso8601(post.get('releaseDate')))
def _real_extract(self, url):
creator, channel = self._match_valid_url(url).group('id', 'channel')
display_id = join_nonempty(creator, channel, delim='/')
creator_data = self._download_json(
f'{self._BASE_URL}/api/v3/creator/named',
display_id, query={'creatorURL[0]': creator})[0]
channel_data = traverse_obj(
creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
return self.playlist_result(OnDemandPagedList(functools.partial(
self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
display_id, title=channel_data.get('title') or creator_data.get('title'),
description=channel_data.get('about') or creator_data.get('about'))
class FloatplaneChannelIE(FloatplaneChannelBaseIE):
     _VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
+    _BASE_URL = 'https://www.floatplane.com'
     _PAGE_SIZE = 20
+    _RESULT_IE = FloatplaneIE
     _TESTS = [{
         'url': 'https://www.floatplane.com/channel/linustechtips/home/ltxexpo',
         'info_dict': {
@@ -346,36 +385,3 @@ class FloatplaneChannelIE(InfoExtractor):
         },
         'playlist_mincount': 200,
     }]
-
-    def _fetch_page(self, display_id, creator_id, channel_id, page):
-        query = {
-            'id': creator_id,
-            'limit': self._PAGE_SIZE,
-            'fetchAfter': page * self._PAGE_SIZE,
-        }
-        if channel_id:
-            query['channel'] = channel_id
-        page_data = self._download_json(
-            'https://www.floatplane.com/api/v3/content/creator', display_id,
-            query=query, note=f'Downloading page {page + 1}')
-        for post in page_data or []:
-            yield self.url_result(
-                f'https://www.floatplane.com/post/{post["id"]}',
-                FloatplaneIE, id=post['id'], title=post.get('title'),
-                release_timestamp=parse_iso8601(post.get('releaseDate')))
-
-    def _real_extract(self, url):
-        creator, channel = self._match_valid_url(url).group('id', 'channel')
-        display_id = join_nonempty(creator, channel, delim='/')
-        creator_data = self._download_json(
-            'https://www.floatplane.com/api/v3/creator/named',
-            display_id, query={'creatorURL[0]': creator})[0]
-        channel_data = traverse_obj(
-            creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
-        return self.playlist_result(OnDemandPagedList(functools.partial(
-            self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
-            display_id, title=channel_data.get('title') or creator_data.get('title'),
-            description=channel_data.get('about') or creator_data.get('about'))


@@ -59,7 +59,7 @@ class GetCourseRuIE(InfoExtractor):
         'marafon.mani-beauty.com',
         'on.psbook.ru',
     ]
-    _BASE_URL_RE = rf'https?://(?:(?!player02\.)[^.]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
+    _BASE_URL_RE = rf'https?://(?:(?!player02\.)[a-zA-Z0-9-]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
     _VALID_URL = [
         rf'{_BASE_URL_RE}/(?!pl/|teach/)(?P<id>[^?#]+)',
         rf'{_BASE_URL_RE}/(?:pl/)?teach/control/lesson/view\?(?:[^#]+&)?id=(?P<id>\d+)',


@@ -29,7 +29,7 @@ class LearningOnScreenIE(InfoExtractor):
     }]

     def _real_initialize(self):
-        if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'):
+        if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-LOS-LIVE'):
             self.raise_login_required(method='session_cookies')

     def _real_extract(self, url):

yt_dlp/extractor/locipo.py (new file, 209 lines)

@@ -0,0 +1,209 @@
import functools
import math
from .streaks import StreaksBaseIE
from ..networking import HEADRequest
from ..utils import (
InAdvancePagedList,
clean_html,
js_to_json,
parse_iso8601,
parse_qs,
str_or_none,
)
from ..utils.traversal import require, traverse_obj
class LocipoBaseIE(StreaksBaseIE):
_API_BASE = 'https://web-api.locipo.jp'
_BASE_URL = 'https://locipo.jp'
_UUID_RE = r'[\da-f]{8}(?:-[\da-f]{4}){3}-[\da-f]{12}'
def _call_api(self, path, item_id, note, fatal=True):
return self._download_json(
f'{self._API_BASE}/{path}', item_id,
f'Downloading {note} API JSON',
f'Unable to download {note} API JSON',
fatal=fatal)
class LocipoIE(LocipoBaseIE):
_VALID_URL = [
fr'https?://locipo\.jp/creative/(?P<id>{LocipoBaseIE._UUID_RE})',
fr'https?://locipo\.jp/embed/?\?(?:[^#]+&)?id=(?P<id>{LocipoBaseIE._UUID_RE})',
]
_TESTS = [{
'url': 'https://locipo.jp/creative/fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
'info_dict': {
'id': 'fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
'ext': 'mp4',
'title': 'リアルカレカノ#4 ~伊達さゆりと勉強しよっ?~',
'description': 'md5:70a40c202f3fb7946b61e55fa015094c',
'display_id': '5a2947fe596441f5bab88a61b0432d0d',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1711789200,
'release_date': '20240330',
'series': 'リアルカレカノ',
'series_id': '1142',
'tags': 'count:4',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1756984919,
'upload_date': '20250904',
'uploader': '東海テレビ',
'uploader_id': 'locipo-prod',
},
}, {
'url': 'https://locipo.jp/embed/?id=71a334a0-2b25-406f-9d96-88f341f571c2',
'info_dict': {
'id': '71a334a0-2b25-406f-9d96-88f341f571c2',
'ext': 'mp4',
'title': '#1 オーディション/ゲスト伊藤美来、豊田萌絵',
'description': 'md5:5bbcf532474700439cf56ceb6a15630e',
'display_id': '0ab32634b884499a84adb25de844c551',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1751623200,
'release_date': '20250704',
'series': '声優ラジオのウラカブリLocipo出張所',
'series_id': '1454',
'tags': 'count:6',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1757002966,
'upload_date': '20250904',
'uploader': 'テレビ愛知',
'uploader_id': 'locipo-prod',
},
}, {
'url': 'https://locipo.jp/creative/bff9950d-229b-4fe9-911a-7fa71a232f35?list=69a5b15c-901f-4828-a336-30c0de7612d3',
'info_dict': {
'id': '69a5b15c-901f-4828-a336-30c0de7612d3',
'title': '見て・乗って・語りたい。 東海の鉄道沼',
},
'playlist_mincount': 3,
}, {
'url': 'https://locipo.jp/creative/a0751a7f-c7dd-4a10-a7f1-e12720bdf16c?list=006cff3f-ba74-42f0-b4fd-241486ebda2b',
'info_dict': {
'id': 'a0751a7f-c7dd-4a10-a7f1-e12720bdf16c',
'ext': 'mp4',
'title': '#839 人間真空パック',
'description': 'md5:9fe190333b6975c5001c8c9cbe20d276',
'display_id': 'c2b4c9f4a6d648bd8e3c320e384b9d56',
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'release_timestamp': 1746239400,
'release_date': '20250503',
'series': 'でんじろう先生のはぴエネ!',
'series_id': '202',
'tags': 'count:3',
'thumbnail': r're:https?://.+\.(?:jpg|png)',
'timestamp': 1756975909,
'upload_date': '20250904',
'uploader': '中京テレビ',
'uploader_id': 'locipo-prod',
},
'params': {'noplaylist': True},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
playlist_id = traverse_obj(parse_qs(url), ('list', -1, {str}))
if self._yes_playlist(playlist_id, video_id):
return self.url_result(
f'{self._BASE_URL}/playlist/{playlist_id}', LocipoPlaylistIE)
creatives = self._call_api(f'creatives/{video_id}', video_id, 'Creatives')
media_id = traverse_obj(creatives, ('media_id', {str}, {require('Streaks media ID')}))
webpage = self._download_webpage(url, video_id)
config = self._search_json(
r'window\.__NUXT__\.config\s*=', webpage, 'config', video_id, transform_source=js_to_json)
api_key = traverse_obj(config, ('public', 'streaksVodPlaybackApiKey', {str}, {require('api key')}))
return {
**self._extract_from_streaks_api('locipo-prod', media_id, headers={
'Origin': 'https://locipo.jp',
'X-Streaks-Api-Key': api_key,
}),
**traverse_obj(creatives, {
'title': ('name', {clean_html}),
'description': ('description', {clean_html}, filter),
'release_timestamp': ('publication_started_at', {parse_iso8601}),
'tags': ('keyword', {clean_html}, {lambda x: x.split(',')}, ..., {str.strip}, filter),
'uploader': ('company', 'name', {clean_html}, filter),
}),
**traverse_obj(creatives, ('series', {
'series': ('name', {clean_html}, filter),
'series_id': ('id', {str_or_none}),
})),
'id': video_id,
}
class LocipoPlaylistIE(LocipoBaseIE):
_VALID_URL = [
fr'https?://locipo\.jp/(?P<type>playlist)/(?P<id>{LocipoBaseIE._UUID_RE})',
r'https?://locipo\.jp/(?P<type>series)/(?P<id>\d+)',
]
_TESTS = [{
'url': 'https://locipo.jp/playlist/35d3dd2b-531d-4824-8575-b1c527d29538',
'info_dict': {
'id': '35d3dd2b-531d-4824-8575-b1c527d29538',
'title': 'レシピ集',
},
'playlist_mincount': 135,
}, {
# Redirects to https://locipo.jp/series/1363
'url': 'https://locipo.jp/playlist/fef7c4fb-741f-4d6a-a3a6-754f354302a2',
'info_dict': {
'id': '1363',
'title': 'CBCアナウンサー公式【みてちょてれび】',
'description': 'md5:50a1b23e63112d5c06c882835c8c1fb1',
},
'playlist_mincount': 38,
}, {
'url': 'https://locipo.jp/series/503',
'info_dict': {
'id': '503',
'title': 'FishingLover東海',
'description': '東海地区の釣り場でフィッシングの魅力を余すところなくご紹介!!',
},
'playlist_mincount': 223,
}]
_PAGE_SIZE = 100
def _fetch_page(self, path, playlist_id, page):
creatives = self._download_json(
f'{self._API_BASE}/{path}/{playlist_id}/creatives',
playlist_id, f'Downloading page {page + 1}', query={
'premium': False,
'live': False,
'limit': self._PAGE_SIZE,
'offset': page * self._PAGE_SIZE,
})
for video_id in traverse_obj(creatives, ('items', ..., 'id', {str})):
yield self.url_result(f'{self._BASE_URL}/creative/{video_id}', LocipoIE)
def _real_extract(self, url):
playlist_type, playlist_id = self._match_valid_url(url).group('type', 'id')
if urlh := self._request_webpage(HEADRequest(url), playlist_id, fatal=False):
playlist_type, playlist_id = self._match_valid_url(urlh.url).group('type', 'id')
path = 'playlists' if playlist_type == 'playlist' else 'series'
creatives = self._call_api(
f'{path}/{playlist_id}/creatives', playlist_id, path.capitalize())
entries = InAdvancePagedList(
functools.partial(self._fetch_page, path, playlist_id),
math.ceil(int(creatives['total']) / self._PAGE_SIZE), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
**traverse_obj(creatives, ('items', ..., playlist_type, {
'title': ('name', {clean_html}, filter),
'description': ('description', {clean_html}, filter),
}, any)))
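A standalone sketch of the paging arithmetic used by `LocipoPlaylistIE` above (the `InAdvancePagedList` setup): given a total item count and a fixed page size, the number of requests is `ceil(total / page_size)` and page N starts at offset `N * page_size`. The helper name here is illustrative, not part of the extractor.

```python
import math

PAGE_SIZE = 100

def page_bounds(total, page_size=PAGE_SIZE):
    """Yield (offset, limit) query pairs covering `total` items."""
    for page in range(math.ceil(total / page_size)):
        yield page * page_size, page_size

# e.g. 223 items -> 3 requests at offsets 0, 100, 200
print(list(page_bounds(223)))
```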


@@ -0,0 +1,38 @@
from .common import InfoExtractor
from ..utils import join_nonempty, unified_strdate
from ..utils.traversal import traverse_obj
class MatchiTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?matchi\.tv/watch/?\?(?:[^#]+&)?s=(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://matchi.tv/watch?s=0euhjzrxsjm',
'info_dict': {
'id': '0euhjzrxsjm',
'ext': 'mp4',
'title': 'Court 2 at Stratford Padel Club 2024-07-13T18:32:24',
'thumbnail': 'https://thumbnails.padelgo.tv/0euhjzrxsjm.jpg',
'upload_date': '20240713',
},
}, {
'url': 'https://matchi.tv/watch?s=FkKDJ9SvAx1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
loaded_media = traverse_obj(
self._search_nextjs_data(webpage, video_id, fatal=False),
('props', 'pageProps', 'loadedMedia', {dict})) or {}
start_date_time = traverse_obj(loaded_media, ('startDateTime', {str}))
return {
'id': video_id,
'title': join_nonempty(loaded_media.get('courtDescription'), start_date_time, delim=' '),
'thumbnail': f'https://thumbnails.padelgo.tv/{video_id}.jpg',
'upload_date': unified_strdate(start_date_time),
'formats': self._extract_m3u8_formats(
f'https://streams.padelgo.tv/v2/streams/m3u8/{video_id}/anonymous/playlist.m3u8',
video_id, 'mp4', m3u8_id='hls'),
}


@@ -25,7 +25,7 @@ class MixcloudBaseIE(InfoExtractor):
            %s
          }
        }''' % (lookup_key, username, f', slug: "{slug}"' if slug else '', object_fields),  # noqa: UP031
-        })['data'][lookup_key]
+        }, impersonate=True)['data'][lookup_key]


 class MixcloudIE(MixcloudBaseIE):


@@ -9,13 +9,13 @@ from ..utils import (
     int_or_none,
     qualities,
     smuggle_url,
-    traverse_obj,
     unescapeHTML,
     unified_strdate,
     unsmuggle_url,
     url_or_none,
     urlencode_postdata,
 )
+from ..utils.traversal import find_element, traverse_obj


 class OdnoklassnikiIE(InfoExtractor):
@@ -264,9 +264,7 @@ class OdnoklassnikiIE(InfoExtractor):
             note='Downloading desktop webpage',
             headers={'Referer': smuggled['referrer']} if smuggled.get('referrer') else {})

-        error = self._search_regex(
-            r'[^>]+class="vp_video_stub_txt"[^>]*>([^<]+)<',
-            webpage, 'error', default=None)
+        error = traverse_obj(webpage, {find_element(cls='vp_video_stub_txt')})

         # Direct link from boosty
         if (error == 'The author of this video has not been found or is blocked'
                 and not smuggled.get('referrer') and mode == 'videoembed'):


@@ -33,7 +33,8 @@ class OpencastBaseIE(InfoExtractor):
                             vid\.igb\.illinois\.edu|
                             cursosabertos\.c3sl\.ufpr\.br|
                             mcmedia\.missioncollege\.org|
-                            clases\.odon\.edu\.uy
+                            clases\.odon\.edu\.uy|
+                            oc-p\.uni-jena\.de
                         )'''

     _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
@@ -106,7 +107,7 @@ class OpencastBaseIE(InfoExtractor):
 class OpencastIE(OpencastBaseIE):
     _VALID_URL = rf'''(?x)
-        https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella/ui/watch\.html\?
+        https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella[0-9]*/ui/watch\.html\?
        (?:[^#]+&)?id=(?P<id>{OpencastBaseIE._UUID_RE})'''

     _API_BASE = 'https://%s/search/episode.json?id=%s'
@@ -131,8 +132,12 @@ class OpencastIE(OpencastBaseIE):
     def _real_extract(self, url):
         host, video_id = self._match_valid_url(url).group('host', 'id')
-        return self._parse_mediapackage(
-            self._call_api(host, video_id)['search-results']['result']['mediapackage'])
+        response = self._call_api(host, video_id)
+        package = traverse_obj(response, (
+            ('search-results', 'result'),
+            ('result', ...),  # Path needed for oc-p.uni-jena.de
+            'mediapackage', {dict}, any)) or {}
+        return self._parse_mediapackage(package)


 class OpencastPlaylistIE(OpencastBaseIE):


@@ -128,7 +128,7 @@ class PornHubIE(PornHubBaseIE):
     _VALID_URL = rf'''(?x)
                     https?://
                         (?:
-                            (?:[^/]+\.)?
+                            (?:[a-zA-Z0-9.-]+\.)?
                             {PornHubBaseIE._PORNHUB_HOST_RE}
                             /(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                             (?:www\.)?thumbzilla\.com/video/
@@ -506,6 +506,7 @@ class PornHubIE(PornHubBaseIE):
                 'cast': ({find_elements(attr='data-label', value='pornstar')}, ..., {clean_html}),
             }),
             'subtitles': subtitles,
+            'http_headers': {'Referer': f'https://www.{host}/'},
         }, info)
@@ -533,7 +534,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
 class PornHubUserIE(PornHubPlaylistBaseIE):
-    _VALID_URL = rf'(?P<url>https?://(?:[^/]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+    _VALID_URL = rf'(?P<url>https?://(?:[a-zA-Z0-9.-]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph',
         'playlist_mincount': 118,


@@ -1,4 +1,4 @@
-from .floatplane import FloatplaneBaseIE
+from .floatplane import FloatplaneBaseIE, FloatplaneChannelBaseIE


 class SaucePlusIE(FloatplaneBaseIE):
@@ -39,3 +39,19 @@ class SaucePlusIE(FloatplaneBaseIE):
     def _real_initialize(self):
         if not self._get_cookies(self._BASE_URL).get('__Host-sp-sess'):
             self.raise_login_required()
class SaucePlusChannelIE(FloatplaneChannelBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?sauceplus\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
_BASE_URL = 'https://www.sauceplus.com'
_RESULT_IE = SaucePlusIE
_PAGE_SIZE = 20
_TESTS = [{
'url': 'https://www.sauceplus.com/channel/williamosman/home',
'info_dict': {
'id': 'williamosman',
'title': 'William Osman',
'description': 'md5:a67bc961d23c293b2c5308d84f34f26c',
},
'playlist_mincount': 158,
}]


@@ -146,8 +146,8 @@ class SBSIE(InfoExtractor):
                 'release_year': ('releaseYear', {int_or_none}),
                 'duration': ('duration', ({float_or_none}, {parse_duration})),
                 'is_live': ('liveStream', {bool}),
-                'age_limit': (('classificationID', 'contentRating'), {str.upper}, {
-                    lambda x: self._AUS_TV_PARENTAL_GUIDELINES.get(x)}),  # dict.get is unhashable in py3.7
+                'age_limit': (
+                    ('classificationID', 'contentRating'), {str.upper}, {self._AUS_TV_PARENTAL_GUIDELINES.get}),
             }, get_all=False),
             **traverse_obj(media, {
                 'categories': (('genres', ...), ('taxonomy', ('genre', 'subgenre'), 'name'), {str}),


@@ -6,6 +6,7 @@ import re
 from .common import InfoExtractor, SearchInfoExtractor
 from ..networking import HEADRequest
 from ..networking.exceptions import HTTPError
+from ..networking.impersonate import ImpersonateTarget
 from ..utils import (
     ExtractorError,
     float_or_none,
@@ -118,9 +119,9 @@ class SoundcloudBaseIE(InfoExtractor):
         self.cache.store('soundcloud', 'client_id', client_id)

     def _update_client_id(self):
-        webpage = self._download_webpage('https://soundcloud.com/', None)
+        webpage = self._download_webpage('https://soundcloud.com/', None, 'Downloading main page')
         for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
-            script = self._download_webpage(src, None, fatal=False)
+            script = self._download_webpage(src, None, 'Downloading JS asset', fatal=False)
             if script:
                 client_id = self._search_regex(
                     r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
@@ -136,13 +137,13 @@ class SoundcloudBaseIE(InfoExtractor):
         if non_fatal:
             del kwargs['fatal']
         query = kwargs.get('query', {}).copy()
-        for _ in range(2):
+        for is_first_attempt in (True, False):
             query['client_id'] = self._CLIENT_ID
             kwargs['query'] = query
             try:
                 return self._download_json(*args, **kwargs)
             except ExtractorError as e:
-                if isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
+                if is_first_attempt and isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
                     self._store_client_id(None)
                     self._update_client_id()
                     continue
@@ -152,7 +153,10 @@ class SoundcloudBaseIE(InfoExtractor):
                 raise

     def _initialize_pre_login(self):
-        self._CLIENT_ID = self.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
+        self._CLIENT_ID = self.cache.load('soundcloud', 'client_id')
+        if self._CLIENT_ID:
+            return
+        self._update_client_id()

     def _verify_oauth_token(self, token):
         if self._request_webpage(
@@ -830,6 +834,30 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
             'entries': self._entries(base_url, playlist_id),
         }
@functools.cached_property
def _browser_impersonate_target(self):
available_targets = self._downloader._get_available_impersonate_targets()
if not available_targets:
# impersonate=True gives a generic warning when no impersonation targets are available
return True
# Any browser target older than chrome-116 is 403'd by Datadome
MIN_SUPPORTED_TARGET = ImpersonateTarget('chrome', '116', 'windows', '10')
version_as_float = lambda x: float(x.version) if x.version else 0
# Always try to use the newest Chrome target available
filtered = sorted([
target[0] for target in available_targets
if target[0].client == 'chrome' and target[0].os in ('windows', 'macos')
], key=version_as_float)
if not filtered or version_as_float(filtered[-1]) < version_as_float(MIN_SUPPORTED_TARGET):
# All available targets are inadequate or newest available Chrome target is too old, so
# warn the user to upgrade their dependency to a version with the minimum supported target
return MIN_SUPPORTED_TARGET
return filtered[-1]
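The `_browser_impersonate_target` selection above can be sketched in isolation: filter the candidates to Chrome on Windows/macOS, sort by version, and fall back to a minimum supported target when nothing new enough is available. The `namedtuple` here is an illustrative stand-in for yt-dlp's `ImpersonateTarget`, not the real class.

```python
from collections import namedtuple

Target = namedtuple('Target', ['client', 'version', 'os'])
MIN_SUPPORTED = Target('chrome', '116', 'windows')

def version_as_float(target):
    return float(target.version) if target.version else 0

def pick_target(available):
    # Keep only Chrome on Windows/macOS, oldest-to-newest
    filtered = sorted(
        (t for t in available if t.client == 'chrome' and t.os in ('windows', 'macos')),
        key=version_as_float)
    if not filtered or version_as_float(filtered[-1]) < version_as_float(MIN_SUPPORTED):
        # Nothing adequate available: surface the minimum requirement instead
        return MIN_SUPPORTED
    return filtered[-1]

print(pick_target([
    Target('chrome', '110', 'windows'),
    Target('chrome', '124', 'macos'),
    Target('safari', '17', 'macos'),
]))
```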
     def _entries(self, url, playlist_id):
         # Per the SoundCloud documentation, the maximum limit for a linked partitioning query is 200.
         # https://developers.soundcloud.com/blog/offset-pagination-deprecated
@@ -844,7 +872,9 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
             try:
                 response = self._call_api(
                     url, playlist_id, query=query, headers=self._HEADERS,
-                    note=f'Downloading track page {i + 1}')
+                    note=f'Downloading track page {i + 1}',
+                    # See: https://github.com/yt-dlp/yt-dlp/issues/15660
+                    impersonate=self._browser_impersonate_target)
                 break
             except ExtractorError as e:
                 # Downloading page may result in intermittent 502 HTTP error
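The `_call_api` change above follows a common retry-once-after-refresh pattern: refresh the credential only on the first failed attempt and let the second failure propagate. A minimal generic sketch (names are illustrative, not yt-dlp API):

```python
def call_with_refresh(request, refresh):
    """Call request(); on a permission failure, refresh once and retry."""
    for is_first_attempt in (True, False):
        try:
            return request()
        except PermissionError:
            if is_first_attempt:
                refresh()  # e.g. fetch a fresh client_id
                continue
            raise

# Demo: first attempt fails, refresh fixes the state, retry succeeds
state = {'ok': False}

def request():
    if not state['ok']:
        raise PermissionError
    return 'data'

print(call_with_refresh(request, lambda: state.update(ok=True)))
```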


@@ -3,6 +3,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
+    clean_html,
     determine_ext,
     merge_dicts,
     parse_duration,
@@ -12,6 +13,7 @@ from ..utils import (
     urlencode_postdata,
     urljoin,
 )
+from ..utils.traversal import find_element, traverse_obj, trim_str


 class SpankBangIE(InfoExtractor):
@@ -122,7 +124,7 @@ class SpankBangIE(InfoExtractor):
             }), headers={
                 'Referer': url,
                 'X-Requested-With': 'XMLHttpRequest',
-            })
+            }, impersonate=True)

         for format_id, format_url in stream.items():
             if format_url and isinstance(format_url, list):
@@ -178,9 +180,9 @@ class SpankBangPlaylistIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = self._match_valid_url(url)
         playlist_id = mobj.group('id')
-        webpage = self._download_webpage(
-            url, playlist_id, headers={'Cookie': 'country=US; mobile=on'})
+        country = self.get_param('geo_bypass_country') or 'US'
+        self._set_cookie('.spankbang.com', 'country', country.upper())
+        webpage = self._download_webpage(url, playlist_id, impersonate=True)

         entries = [self.url_result(
             urljoin(url, mobj.group('path')),
@@ -189,8 +191,8 @@ class SpankBangPlaylistIE(InfoExtractor):
             r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/[^"\'](?:(?!\1).)*)\1',
             webpage)]

-        title = self._html_search_regex(
-            r'<em>([^<]+)</em>\s+playlist\s*<', webpage, 'playlist title',
-            fatal=False)
+        title = traverse_obj(webpage, (
+            {find_element(tag='h1', attr='data-testid', value='playlist-title')},
+            {clean_html}, {trim_str(end=' Playlist')}))

         return self.playlist_result(entries, playlist_id, title)


@@ -8,15 +8,12 @@ from ..utils import (
     extract_attributes,
     join_nonempty,
     js_to_json,
+    parse_resolution,
     str_or_none,
+    url_basename,
     url_or_none,
 )
-from ..utils.traversal import (
-    find_element,
-    find_elements,
-    traverse_obj,
-    trim_str,
-)
+from ..utils.traversal import find_element, traverse_obj


 class SteamIE(InfoExtractor):
@@ -27,7 +24,7 @@ class SteamIE(InfoExtractor):
             'id': '105600',
             'title': 'Terraria',
         },
-        'playlist_mincount': 3,
+        'playlist_mincount': 5,
     }, {
         'url': 'https://store.steampowered.com/app/271590/Grand_Theft_Auto_V/',
         'info_dict': {
@@ -37,6 +34,39 @@ class SteamIE(InfoExtractor):
         'playlist_mincount': 26,
     }]
def _entries(self, app_id, app_name, data_props):
for trailer in traverse_obj(data_props, (
'trailers', lambda _, v: str_or_none(v['id']),
)):
movie_id = str_or_none(trailer['id'])
thumbnails = []
for thumbnail_url in traverse_obj(trailer, (
('poster', 'thumbnail'), {url_or_none},
)):
thumbnails.append({
'url': thumbnail_url,
**parse_resolution(url_basename(thumbnail_url)),
})
formats = []
if hls_manifest := traverse_obj(trailer, ('hlsManifest', {url_or_none})):
formats.extend(self._extract_m3u8_formats(
hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
for dash_manifest in traverse_obj(trailer, ('dashManifests', ..., {url_or_none})):
formats.extend(self._extract_mpd_formats(
dash_manifest, app_id, mpd_id='dash', fatal=False))
self._remove_duplicate_formats(formats)
yield {
'id': join_nonempty(app_id, movie_id),
'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
'formats': formats,
'series': app_name,
'series_id': app_id,
'thumbnails': thumbnails,
}
     def _real_extract(self, url):
         app_id = self._match_id(url)
@@ -45,32 +75,13 @@ class SteamIE(InfoExtractor):
         self._set_cookie('store.steampowered.com', 'lastagecheckage', '1-January-2000')
         webpage = self._download_webpage(url, app_id)
-        app_name = traverse_obj(webpage, ({find_element(cls='apphub_AppName')}, {clean_html}))
+        data_props = traverse_obj(webpage, (
+            {find_element(cls='gamehighlight_desktopcarousel', html=True)},
+            {extract_attributes}, 'data-props', {json.loads}, {dict}))
+        app_name = traverse_obj(data_props, ('appName', {clean_html}))

-        entries = []
-        for data_prop in traverse_obj(webpage, (
-            {find_elements(cls='highlight_player_item highlight_movie', html=True)},
-            ..., {extract_attributes}, 'data-props', {json.loads}, {dict},
-        )):
-            formats = []
-            if hls_manifest := traverse_obj(data_prop, ('hlsManifest', {url_or_none})):
-                formats.extend(self._extract_m3u8_formats(
-                    hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
-            for dash_manifest in traverse_obj(data_prop, ('dashManifests', ..., {url_or_none})):
-                formats.extend(self._extract_mpd_formats(
-                    dash_manifest, app_id, mpd_id='dash', fatal=False))
-            movie_id = traverse_obj(data_prop, ('id', {trim_str(start='highlight_movie_')}))
-            entries.append({
-                'id': movie_id,
-                'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
-                'formats': formats,
-                'series': app_name,
-                'series_id': app_id,
-                'thumbnail': traverse_obj(data_prop, ('screenshot', {url_or_none})),
-            })
-        return self.playlist_result(entries, app_id, app_name)
+        return self.playlist_result(
+            self._entries(app_id, app_name, data_props), app_id, app_name)


 class SteamCommunityIE(InfoExtractor):


@@ -22,7 +22,7 @@ class StreaksBaseIE(InfoExtractor):
     _GEO_BYPASS = False
     _GEO_COUNTRIES = ['JP']

-    def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False):
+    def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False, live_from_start=False):
         try:
             response = self._download_json(
                 self._API_URL_TEMPLATE.format('playback', project_id, media_id, ''),
@@ -83,6 +83,10 @@ class StreaksBaseIE(InfoExtractor):
             fmts, subs = self._extract_m3u8_formats_and_subtitles(
                 src_url, media_id, 'mp4', m3u8_id='hls', fatal=False, live=is_live, query=query)
+            for fmt in fmts:
+                if live_from_start:
+                    fmt.setdefault('downloader_options', {}).update({'ffmpeg_args': ['-live_start_index', '0']})
+                    fmt['is_from_start'] = True
             formats.extend(fmts)
             self._merge_subtitles(subs, target=subtitles)


@@ -102,7 +102,7 @@ class TeachableIE(TeachableBaseIE):
     _WORKING = False
     _VALID_URL = r'''(?x)
                     (?:
-                        {}https?://(?P<site_t>[^/]+)|
+                        {}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
                         https?://(?:www\.)?(?P<site>{})
                     )
                     /courses/[^/]+/lectures/(?P<id>\d+)
@@ -211,7 +211,7 @@ class TeachableIE(TeachableBaseIE):
 class TeachableCourseIE(TeachableBaseIE):
     _VALID_URL = r'''(?x)
                     (?:
-                        {}https?://(?P<site_t>[^/]+)|
+                        {}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
                         https?://(?:www\.)?(?P<site>{})
                     )
                     /(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)


@@ -9,39 +9,39 @@ class Tele5IE(DiscoveryPlusBaseIE):
     _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?P<parent_slug>[\w-]+)/(?P<slug_a>[\w-]+)(?:/(?P<slug_b>[\w-]+))?'
     _TESTS = [{
         # slug_a and slug_b
-        'url': 'https://tele5.de/mediathek/stargate-atlantis/quarantane',
+        'url': 'https://tele5.de/mediathek/star-trek-enterprise/vox-sola',
         'info_dict': {
-            'id': '6852024',
+            'id': '4140114',
             'ext': 'mp4',
-            'title': 'Quarantäne',
-            'description': 'md5:6af0373bd0fcc4f13e5d47701903d675',
-            'episode': 'Episode 73',
-            'episode_number': 73,
-            'season': 'Season 4',
-            'season_number': 4,
-            'series': 'Stargate Atlantis',
-            'upload_date': '20240525',
-            'timestamp': 1716643200,
-            'duration': 2503.2,
-            'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/21/c81fcb45-8902-309b-badb-4e6d546b575d.jpeg',
-            'creators': ['Tele5'],
+            'title': 'Vox Sola',
+            'description': 'md5:329d115f74324d4364efc1a11c4ea7c9',
+            'duration': 2542.76,
+            'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
             'tags': [],
+            'creators': ['Tele5'],
+            'series': 'Star Trek - Enterprise',
+            'season': 'Season 1',
+            'season_number': 1,
+            'episode': 'Episode 22',
+            'episode_number': 22,
+            'timestamp': 1770491100,
+            'upload_date': '20260207',
         },
     }, {
         # only slug_a
-        'url': 'https://tele5.de/mediathek/inside-out',
+        'url': 'https://tele5.de/mediathek/30-miles-from-nowhere-im-wald-hoert-dich-niemand-schreien',
         'info_dict': {
-            'id': '6819502',
+            'id': '4102641',
             'ext': 'mp4',
-            'title': 'Inside out',
-            'description': 'md5:7e5f32ed0be5ddbd27713a34b9293bfd',
-            'series': 'Inside out',
-            'upload_date': '20240523',
-            'timestamp': 1716494400,
-            'duration': 5343.4,
-            'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/15/181eba3c-f9f0-3faf-b14d-0097050a3aa4.jpeg',
-            'creators': ['Tele5'],
+            'title': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
+            'description': 'md5:0b731539f39ee186ebcd9dd444a86fc2',
+            'duration': 4849.96,
+            'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
             'tags': [],
+            'creators': ['Tele5'],
+            'series': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
+            'timestamp': 1770417300,
+            'upload_date': '20260206',
         },
     }, {
         # playlist
@@ -50,20 +50,27 @@ class Tele5IE(DiscoveryPlusBaseIE):
             'id': 'mediathek-schlefaz',
         },
         'playlist_mincount': 3,
+        'skip': 'Dead link',
     }]

     def _real_extract(self, url):
         parent_slug, slug_a, slug_b = self._match_valid_url(url).group('parent_slug', 'slug_a', 'slug_b')
         playlist_id = join_nonempty(parent_slug, slug_a, slug_b, delim='-')

-        query = {'environment': 'tele5', 'v': '2'}
+        query = {
+            'include': 'default',
+            'filter[environment]': 'tele5',
+            'v': '2',
+        }
         if not slug_b:
             endpoint = f'page/{slug_a}'
             query['parent_slug'] = parent_slug
         else:
-            endpoint = f'videos/{slug_b}'
-            query['filter[show.slug]'] = slug_a
+            endpoint = f'shows/{slug_a}'
+            query['filter[video.slug]'] = slug_b

-        cms_data = self._download_json(f'https://de-api.loma-cms.com/feloma/{endpoint}/', playlist_id, query=query)
+        cms_data = self._download_json(f'https://public.aurora.enhanced.live/site/{endpoint}/', playlist_id, query=query)

         return self.playlist_result(map(
             functools.partial(self._get_disco_api_info, url, disco_host='eu1-prod.disco-api.com', realm='dmaxde', country='DE'),


@@ -51,7 +51,8 @@ class TruthIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        status = self._download_json(f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id)
+        status = self._download_json(
+            f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id, impersonate=True)
         uploader_id = strip_or_none(traverse_obj(status, ('account', 'username')))
         return {
             'id': video_id,


@@ -4,6 +4,7 @@ from .streaks import StreaksBaseIE
 from ..utils import (
     ExtractorError,
     GeoRestrictedError,
+    clean_html,
     int_or_none,
     join_nonempty,
     make_archive_id,
@@ -11,7 +12,9 @@ from ..utils import (
     str_or_none,
     strip_or_none,
     time_seconds,
+    unified_timestamp,
     update_url_query,
+    url_or_none,
 )
 from ..utils.traversal import require, traverse_obj
@@ -257,3 +260,113 @@ class TVerIE(StreaksBaseIE):
             'id': video_id,
             '_old_archive_ids': [make_archive_id('BrightcoveNew', brightcove_id)] if brightcove_id else None,
         }
class TVerOlympicIE(StreaksBaseIE):
IE_NAME = 'tver:olympic'
_API_BASE = 'https://olympic-data.tver.jp/api'
_VALID_URL = r'https?://(?:www\.)?tver\.jp/olympic/milanocortina2026/(?P<type>live|video)/play/(?P<id>\w+)'
_TESTS = [{
'url': 'https://tver.jp/olympic/milanocortina2026/video/play/3b1d4462150b42558d9cc8aabb5238d0/',
'info_dict': {
'id': '3b1d4462150b42558d9cc8aabb5238d0',
'ext': 'mp4',
'title': '【開会式】ぎゅっと凝縮ハイライト',
'display_id': 'ref:3b1d4462150b42558d9cc8aabb5238d0',
'duration': 712.045,
'live_status': 'not_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'tags': 'count:1',
'thumbnail': r're:https://.+\.(?:jpg|png)',
'timestamp': 1770420187,
'upload_date': '20260206',
'uploader_id': 'tver-olympic',
},
}, {
'url': 'https://tver.jp/olympic/milanocortina2026/live/play/glts313itwvj/',
'info_dict': {
'id': 'glts313itwvj',
'ext': 'mp4',
'title': '開会式ハイライト',
'channel_id': 'ntv',
'display_id': 'ref:sp_260207_spc_01_dvr',
'duration': 7680,
'live_status': 'was_live',
'modified_date': r're:\d{8}',
'modified_timestamp': int,
'thumbnail': r're:https://.+\.(?:jpg|png)',
'timestamp': 1770420300,
'upload_date': '20260206',
'uploader_id': 'tver-olympic-live',
},
}]
def _real_extract(self, url):
video_type, video_id = self._match_valid_url(url).group('type', 'id')
live_from_start = self.get_param('live_from_start')
if video_type == 'live':
project_id = 'tver-olympic-live'
api_key = 'a35ebb1ca7d443758dc7fcc5d99b1f72'
olympic_data = traverse_obj(self._download_json(
f'{self._API_BASE}/live/{video_id}', video_id), ('contents', 'live', {dict}))
media_id = traverse_obj(olympic_data, ('video_id', {str}))
now = time_seconds()
start_timestamp_str = traverse_obj(olympic_data, ('onair_start_date', {str}))
start_timestamp = unified_timestamp(start_timestamp_str, tz_offset=9)
if not start_timestamp:
raise ExtractorError('Unable to extract on-air start time')
end_timestamp = traverse_obj(olympic_data, (
'onair_end_date', {unified_timestamp(tz_offset=9)}, {require('on-air end time')}))
if now < start_timestamp:
self.raise_no_formats(
f'This program is scheduled to start at {start_timestamp_str} JST', expected=True)
return {
'id': video_id,
'live_status': 'is_upcoming',
'release_timestamp': start_timestamp,
}
elif start_timestamp <= now < end_timestamp:
live_status = 'is_live'
if live_from_start:
media_id += '_dvr'
elif end_timestamp <= now:
dvr_end_timestamp = traverse_obj(olympic_data, (
'dvr_end_date', {unified_timestamp(tz_offset=9)}))
if dvr_end_timestamp and now < dvr_end_timestamp:
live_status = 'was_live'
media_id += '_dvr'
else:
raise ExtractorError(
'This program is no longer available', expected=True)
else:
project_id = 'tver-olympic'
api_key = '4b55a4db3cce4ad38df6dd8543e3e46a'
media_id = video_id
live_status = 'not_live'
olympic_data = traverse_obj(self._download_json(
f'{self._API_BASE}/video/{video_id}', video_id), ('contents', 'video', {dict}))
return {
**self._extract_from_streaks_api(project_id, f'ref:{media_id}', {
'Origin': 'https://tver.jp',
'Referer': 'https://tver.jp/',
'X-Streaks-Api-Key': api_key,
}, live_from_start=live_from_start),
**traverse_obj(olympic_data, {
'title': ('title', {clean_html}, filter),
'alt_title': ('sub_title', {clean_html}, filter),
'channel': ('channel', {clean_html}, filter),
'channel_id': ('channel_id', {clean_html}, filter),
'description': (('description', 'description_l', 'description_s'), {clean_html}, filter, any),
'timestamp': ('onair_start_date', {unified_timestamp(tz_offset=9)}),
'thumbnail': (('picture_l_url', 'picture_m_url', 'picture_s_url'), {url_or_none}, any),
}),
'id': video_id,
'live_status': live_status,
}
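The live-status branching in `TVerOlympicIE._real_extract` above reduces to comparing "now" against the on-air start/end times and an optional DVR cutoff. A minimal standalone sketch (plain Unix seconds, not the extractor itself):

```python
def classify(now, start, end, dvr_end=None):
    """Map a timestamp against a broadcast window to a live status."""
    if now < start:
        return 'is_upcoming'
    if start <= now < end:
        return 'is_live'
    if dvr_end is not None and now < dvr_end:
        return 'was_live'  # broadcast over, DVR replay still available
    return 'unavailable'

print(classify(50, 100, 200))        # before start
print(classify(150, 100, 200))       # during broadcast
print(classify(250, 100, 200, 300))  # within the DVR window
```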

yt_dlp/extractor/tvo.py

@@ -0,0 +1,152 @@
import json
import urllib.parse

from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
    clean_html,
    int_or_none,
    parse_duration,
    parse_iso8601,
    smuggle_url,
    str_or_none,
    url_or_none,
)
from ..utils.traversal import (
    require,
    traverse_obj,
    trim_str,
)


class TvoIE(InfoExtractor):
    IE_NAME = 'TVO'
    _VALID_URL = r'https?://(?:www\.)?tvo\.org/video(?:/documentaries)?/(?P<id>[\w-]+)'
    _TESTS = [{
        'url': 'https://www.tvo.org/video/how-can-ontario-survive-the-trade-war',
        'info_dict': {
            'id': '6377531034112',
            'ext': 'mp4',
            'title': 'How Can Ontario Survive the Trade War?',
            'description': 'md5:e7455d9cd4b6b1270141922044161457',
            'display_id': 'how-can-ontario-survive-the-trade-war',
            'duration': 3531,
            'episode': 'How Can Ontario Survive the Trade War?',
            'episode_id': 'how-can-ontario-survive-the-trade-war',
            'episode_number': 1,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'TVO at AMO',
            'series_id': 'tvo-at-amo',
            'tags': 'count:17',
            'thumbnail': r're:https?://.+',
            'timestamp': 1756944016,
            'upload_date': '20250904',
            'uploader_id': '18140038001',
        },
    }, {
        'url': 'https://www.tvo.org/video/documentaries/the-pitch',
        'info_dict': {
            'id': '6382500333112',
            'ext': 'mp4',
            'title': 'The Pitch',
            'categories': ['Documentaries'],
            'description': 'md5:9d4246b70dce772a3a396c4bd84c8506',
            'display_id': 'the-pitch',
            'duration': 5923,
            'episode': 'The Pitch',
            'episode_id': 'the-pitch',
            'episode_number': 1,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'The Pitch',
            'series_id': 'the-pitch',
            'tags': 'count:8',
            'thumbnail': r're:https?://.+',
            'timestamp': 1762693216,
            'upload_date': '20251109',
            'uploader_id': '18140038001',
        },
    }, {
        'url': 'https://www.tvo.org/video/documentaries/valentines-day',
        'info_dict': {
            'id': '6387298331112',
            'ext': 'mp4',
            'title': 'Valentine\'s Day',
            'categories': ['Documentaries'],
            'description': 'md5:b142149beb2d3a855244816c50cd2f14',
            'display_id': 'valentines-day',
            'duration': 3121,
            'episode': 'Valentine\'s Day',
            'episode_id': 'valentines-day',
            'episode_number': 2,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'How We Celebrate',
            'series_id': 'how-we-celebrate',
            'tags': 'count:6',
            'thumbnail': r're:https?://.+',
            'timestamp': 1770386416,
            'upload_date': '20260206',
            'uploader_id': '18140038001',
        },
    }]
    BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/18140038001/default_default/index.html?videoId=%s'

    def _real_extract(self, url):
        display_id = self._match_id(url)
        video_data = self._download_json(
            'https://hmy0rc1bo2.execute-api.ca-central-1.amazonaws.com/graphql',
            display_id, headers={'Content-Type': 'application/json'},
            data=json.dumps({
                'operationName': 'getVideo',
                'variables': {'slug': urllib.parse.urlparse(url).path.rstrip('/')},
                'query': '''query getVideo($slug: String) {
                    getTVOOrgVideo(slug: $slug) {
                        contentCategory
                        description
                        length
                        program {
                            nodeUrl
                            title
                        }
                        programOrder
                        publishedAt
                        season
                        tags
                        thumbnail
                        title
                        videoSource {
                            brightcoveRefId
                        }
                    }
                }''',
            }, separators=(',', ':')).encode(),
        )['data']['getTVOOrgVideo']

        brightcove_id = traverse_obj(video_data, (
            'videoSource', 'brightcoveRefId', {str_or_none}, {require('Brightcove ID')}))

        return {
            '_type': 'url_transparent',
            'ie_key': BrightcoveNewIE.ie_key(),
            'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['CA']}),
            'display_id': display_id,
            'episode_id': display_id,
            **traverse_obj(video_data, {
                'title': ('title', {clean_html}, filter),
                'categories': ('contentCategory', {clean_html}, filter, all, filter),
                'description': ('description', {clean_html}, filter),
                'duration': ('length', {parse_duration}),
                'episode': ('title', {clean_html}, filter),
                'episode_number': ('programOrder', {int_or_none}),
                'season_number': ('season', {int_or_none}),
                'tags': ('tags', ..., {clean_html}, filter),
                'thumbnail': ('thumbnail', {url_or_none}),
                'timestamp': ('publishedAt', {parse_iso8601}),
            }),
            **traverse_obj(video_data, ('program', {
                'series': ('title', {clean_html}, filter),
                'series_id': ('nodeUrl', {clean_html}, {trim_str(start='/programs/')}, filter),
            })),
        }
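The extractor builds its GraphQL request by hand: the `slug` variable is the URL path with any trailing slash removed, and the payload is compact-encoded JSON. A minimal sketch of that payload construction (the endpoint and field names come from the diff above; the query is trimmed to a single field for brevity, and no network request is made):

```python
import json
import urllib.parse


def build_tvo_payload(url):
    # The slug is the URL path minus any trailing slash,
    # e.g. '/video/documentaries/the-pitch'
    slug = urllib.parse.urlparse(url).path.rstrip('/')
    # separators=(',', ':') produces compact JSON, matching the extractor
    return json.dumps({
        'operationName': 'getVideo',
        'variables': {'slug': slug},
        'query': 'query getVideo($slug: String) { getTVOOrgVideo(slug: $slug) { title } }',
    }, separators=(',', ':')).encode()


payload = build_tvo_payload('https://www.tvo.org/video/documentaries/the-pitch/')
```

The compact separators only affect JSON delimiters; the GraphQL query string itself may still contain spaces.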


@@ -131,11 +131,15 @@ class TwitterBaseIE(InfoExtractor):
             video_id, headers=headers, query=query, expected_status=allowed_status,
             note=f'Downloading {"GraphQL" if graphql else "legacy API"} JSON')
-        if result.get('errors'):
-            errors = ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str}))))
-            if errors and 'not authorized' in errors:
-                self.raise_login_required(remove_end(errors, '.'))
-            raise ExtractorError(f'Error(s) while querying API: {errors or "Unknown error"}')
+        if error_msg := ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str})))):
+            # Errors with the message 'Dependency: Unspecified' are a false positive
+            # See https://github.com/yt-dlp/yt-dlp/issues/15963
+            if error_msg.lower() == 'dependency: unspecified':
+                self.write_debug(f'Ignoring Twitter API error: "{error_msg}"')
+            elif 'not authorized' in error_msg.lower():
+                self.raise_login_required(remove_end(error_msg, '.'))
+            else:
+                raise ExtractorError(f'Error(s) while querying API: {error_msg or "Unknown error"}')
         return result
@@ -1078,7 +1082,7 @@ class TwitterIE(TwitterBaseIE):
             raise ExtractorError(f'Twitter API says: {cause or "Unknown error"}', expected=True)
         elif typename == 'TweetUnavailable':
             reason = result.get('reason')
-            if reason == 'NsfwLoggedOut':
+            if reason in ('NsfwLoggedOut', 'NsfwViewerHasNoStatedAge'):
                 self.raise_login_required('NSFW tweet requires authentication')
             elif reason == 'Protected':
                 self.raise_login_required('You are not authorized to view this protected tweet')


@@ -67,6 +67,10 @@ class KnownDRMIE(UnsupportedInfoExtractor):
         r'plus\.rtl\.de(?!/podcast/)',
         r'mediasetinfinity\.es',
         r'tv5mondeplus\.com',
+        r'tv\.rakuten\.co\.jp',
+        r'watch\.telusoriginals\.com',
+        r'video\.unext\.jp',
+        r'www\.web\.nhk',
     )

     _TESTS = [{
@@ -231,6 +235,23 @@ class KnownDRMIE(UnsupportedInfoExtractor):
         # https://github.com/yt-dlp/yt-dlp/issues/14743
         'url': 'https://www.tv5mondeplus.com/',
         'only_matching': True,
+    }, {
+        # https://github.com/yt-dlp/yt-dlp/issues/8821
+        'url': 'https://tv.rakuten.co.jp/content/519554/',
+        'only_matching': True,
+    }, {
+        # https://github.com/yt-dlp/yt-dlp/issues/9851
+        'url': 'https://watch.telusoriginals.com/play?assetID=fruit-is-ripe',
+        'only_matching': True,
+    }, {
+        # https://github.com/yt-dlp/yt-dlp/issues/13220
+        # https://github.com/yt-dlp/yt-dlp/issues/14564
+        'url': 'https://video.unext.jp/play/SID0062010/ED00337407',
+        'only_matching': True,
+    }, {
+        # https://github.com/yt-dlp/yt-dlp/issues/14620
+        'url': 'https://www.web.nhk/tv/an/72hours/pl/series-tep-W3W8WRN8M3/ep/QW8ZY6146V',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

yt_dlp/extractor/visir.py

@@ -0,0 +1,116 @@
import re

from .common import InfoExtractor
from ..utils import (
    UnsupportedError,
    clean_html,
    int_or_none,
    js_to_json,
    month_by_name,
    url_or_none,
    urljoin,
)
from ..utils.traversal import find_element, traverse_obj


class VisirIE(InfoExtractor):
    IE_DESC = 'Vísir'
    _VALID_URL = r'https?://(?:www\.)?visir\.is/(?P<type>k|player)/(?P<id>[\da-f-]+)(?:/(?P<slug>[\w.-]+))?'
    _EMBED_REGEX = [rf'<iframe[^>]+src=["\'](?P<url>{_VALID_URL})']
    _TESTS = [{
        'url': 'https://www.visir.is/k/eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
        'info_dict': {
            'id': 'eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
            'ext': 'mp4',
            'title': 'Sveppi og Siggi Þór mestu skaphundarnir',
            'categories': ['island-i-dag'],
            'description': 'md5:e06bd6a0cd8bdde328ad8cf00d3d4df6',
            'duration': 792,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260121',
            'view_count': int,
        },
    }, {
        'url': 'https://www.visir.is/k/b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704/tonlistarborgin-reykjavik',
        'info_dict': {
            'id': 'b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704',
            'ext': 'mp4',
            'title': 'Tónlistarborgin Reykjavík',
            'categories': ['tonlist'],
            'description': 'md5:47237589dc95dbde55dfbb163396f88a',
            'display_id': 'tonlistarborgin-reykjavik',
            'duration': 81,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20251124',
            'view_count': int,
        },
    }, {
        'url': 'https://www.visir.is/player/0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
        'info_dict': {
            'id': '0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
            'ext': 'mp4',
            'title': 'Sportpakkinn 2. febrúar 2026',
            'categories': ['sportpakkinn'],
            'display_id': 'sportpakkinn-2.-februar-2026',
            'duration': 293,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260202',
            'view_count': int,
        },
    }]
    _WEBPAGE_TESTS = [{
        'url': 'https://www.visir.is/g/20262837896d/segir-von-brigdin-med-prinsessuna-rista-djupt',
        'info_dict': {
            'id': '9ad5e58a-f26f-49f7-8b1d-68f0629485b7-1770059257365',
            'ext': 'mp4',
            'title': 'Norðmenn tala ekki um annað en prinsessuna',
            'categories': ['frettir'],
            'description': 'md5:53e2623ae79e1355778c14f5b557a0cd',
            'display_id': 'nordmenn-tala-ekki-um-annad-en-prinsessuna',
            'duration': 138,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260202',
            'view_count': int,
        },
    }]

    def _real_extract(self, url):
        video_type, video_id, display_id = self._match_valid_url(url).group('type', 'id', 'slug')
        webpage = self._download_webpage(url, video_id)

        if video_type == 'player':
            real_url = self._og_search_url(webpage)
            if not self.suitable(real_url) or self._match_valid_url(real_url).group('type') == 'player':
                raise UnsupportedError(real_url)
            return self.url_result(real_url, self.ie_key())

        upload_date = None
        date_elements = traverse_obj(webpage, (
            {find_element(cls='article-item__date')}, {clean_html}, filter, {str.split}))
        if date_elements and len(date_elements) == 3:
            day, month, year = date_elements
            day = int_or_none(day.rstrip('.'))
            month = month_by_name(month, 'is')
            if day and month and re.fullmatch(r'[0-9]{4}', year):
                upload_date = f'{year}{month:02d}{day:02d}'

        player = self._search_json(
            r'App\.Player\.Init\(', webpage, video_id, 'player', transform_source=js_to_json)
        m3u8_url = traverse_obj(player, ('File', {urljoin('https://vod.visir.is/')}))

        return {
            'id': video_id,
            'display_id': display_id,
            'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4'),
            'upload_date': upload_date,
            **traverse_obj(webpage, ({find_element(cls='article-item press-ads')}, {
                'description': ({find_element(cls='-large')}, {clean_html}, filter),
                'view_count': ({find_element(cls='article-item__viewcount')}, {clean_html}, {int_or_none}),
            })),
            **traverse_obj(player, {
                'title': ('Title', {clean_html}),
                'categories': ('Categoryname', {clean_html}, filter, all, filter),
                'duration': ('MediaDuration', {int_or_none}),
                'thumbnail': ('Image', {url_or_none}),
            }),
        }
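The upload-date logic above splits an element like `2. febrúar 2026` into day, month name, and year, then formats `YYYYMMDD`. A standalone sketch of the same steps; `month_by_name(month, 'is')` is a yt-dlp helper, so the Icelandic month table below is my own stand-in, not taken from the diff:

```python
import re

# Hypothetical stand-in for yt-dlp's month_by_name(month, 'is');
# this month list is an assumption for illustration
ICELANDIC_MONTHS = [
    'janúar', 'febrúar', 'mars', 'apríl', 'maí', 'júní',
    'júlí', 'ágúst', 'september', 'október', 'nóvember', 'desember',
]


def parse_visir_date(text):
    # e.g. '2. febrúar 2026' -> '20260202', mirroring the extractor's checks
    parts = text.split()
    if len(parts) != 3:
        return None
    day_s, month_s, year = parts
    day_s = day_s.rstrip('.')
    day = int(day_s) if day_s.isdigit() else None
    month_s = month_s.lower()
    month = ICELANDIC_MONTHS.index(month_s) + 1 if month_s in ICELANDIC_MONTHS else None
    if day and month and re.fullmatch(r'[0-9]{4}', year):
        return f'{year}{month:02d}{day:02d}'
    return None
```

Any input that does not match the three-part shape simply yields `None`, so `upload_date` stays unset rather than raising.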


@@ -1,6 +1,7 @@
 import collections
 import hashlib
 import re
+import urllib.parse

 from .common import InfoExtractor
 from .dailymotion import DailymotionIE
@@ -8,6 +9,7 @@ from .odnoklassniki import OdnoklassnikiIE
 from .sibnet import SibnetEmbedIE
 from .vimeo import VimeoIE
 from .youtube import YoutubeIE
+from ..jsinterp import JSInterpreter
 from ..utils import (
     ExtractorError,
     UserNotLive,
@@ -36,16 +38,38 @@ class VKBaseIE(InfoExtractor):
     def _download_webpage_handle(self, url_or_request, video_id, *args, fatal=True, **kwargs):
         response = super()._download_webpage_handle(url_or_request, video_id, *args, fatal=fatal, **kwargs)
-        challenge_url, cookie = response[1].url if response else '', None
-        if challenge_url.startswith('https://vk.com/429.html?'):
-            cookie = self._get_cookies(challenge_url).get('hash429')
-        if not cookie:
+        if response is False:
             return response
-        hash429 = hashlib.md5(cookie.value.encode('ascii')).hexdigest()
+        webpage, urlh = response
+        challenge_url = urlh.url
+        if urllib.parse.urlparse(challenge_url).path != '/challenge.html':
+            return response
+        self.to_screen(join_nonempty(
+            video_id and f'[{video_id}]',
+            'Received a JS challenge response',
+            delim=' '))
+        challenge_hash = traverse_obj(challenge_url, (
+            {parse_qs}, 'hash429', -1, {require('challenge hash')}))
+        func_code = self._search_regex(
+            r'(?s)var\s+salt\s*=\s*\(\s*function\s*\(\)\s*(\{.+?\})\s*\)\(\);\s*var\s+hash',
+            webpage, 'JS challenge salt function')
+        jsi = JSInterpreter(f'function salt() {func_code}')
+        salt = jsi.extract_function('salt')([])
+        self.write_debug(f'Generated salt with native JS interpreter: {salt}')
+        key_hash = hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()
+        self.write_debug(f'JS challenge key hash: {key_hash}')
+        # Request with the challenge key and the response should set a 'solution429' cookie
         self._request_webpage(
-            update_url_query(challenge_url, {'key': hash429}), video_id, fatal=fatal,
-            note='Resolving WAF challenge', errnote='Failed to bypass WAF challenge')
+            update_url_query(challenge_url, {'key': key_hash}), video_id,
+            'Submitting JS challenge solution', 'Unable to solve JS challenge', fatal=True)
         return super()._download_webpage_handle(url_or_request, video_id, *args, fatal=True, **kwargs)

     def _perform_login(self, username, password):
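The challenge flow above boils down to: read the last `hash429` value from the challenge URL's query string, evaluate the page's `salt()` function with a JS interpreter, and submit `md5('<hash>:<salt>')` as the key. A sketch of the key computation with made-up hash and salt values (the real ones come from VK's challenge response, and the salt normally requires running the page's JS):

```python
import hashlib
import urllib.parse


def vk_challenge_key(challenge_url, salt):
    # Take the last 'hash429' value from the query string, as the extractor does
    query = urllib.parse.parse_qs(urllib.parse.urlparse(challenge_url).query)
    challenge_hash = query['hash429'][-1]
    # The key is the MD5 hex digest of '<hash>:<salt>'
    return hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()


# Hypothetical values for illustration only
key = vk_challenge_key('https://vk.com/challenge.html?hash429=abc123', 'somesalt')
```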


@@ -3,6 +3,7 @@ import re
 import urllib.parse

 from .common import InfoExtractor
+from ..jsinterp import int_to_int32
 from ..utils import (
     ExtractorError,
     clean_html,
@@ -20,73 +21,69 @@ from ..utils import (
 )


-def to_signed_32(n):
-    return n % ((-1 if n < 0 else 1) * 2**32)
-
-
 class _ByteGenerator:
     def __init__(self, algo_id, seed):
         try:
             self._algorithm = getattr(self, f'_algo{algo_id}')
         except AttributeError:
             raise ExtractorError(f'Unknown algorithm ID "{algo_id}"')
-        self._s = to_signed_32(seed)
+        self._s = int_to_int32(seed)

     def _algo1(self, s):
         # LCG (a=1664525, c=1013904223, m=2^32)
         # Ref: https://en.wikipedia.org/wiki/Linear_congruential_generator
-        s = self._s = to_signed_32(s * 1664525 + 1013904223)
+        s = self._s = int_to_int32(s * 1664525 + 1013904223)
         return s

     def _algo2(self, s):
         # xorshift32
         # Ref: https://en.wikipedia.org/wiki/Xorshift
-        s = to_signed_32(s ^ (s << 13))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 17))
-        s = self._s = to_signed_32(s ^ (s << 5))
+        s = int_to_int32(s ^ (s << 13))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
+        s = self._s = int_to_int32(s ^ (s << 5))
         return s

     def _algo3(self, s):
         # Weyl Sequence (k≈2^32*φ, m=2^32) + MurmurHash3 (fmix32)
         # Ref: https://en.wikipedia.org/wiki/Weyl_sequence
         # https://commons.apache.org/proper/commons-codec/jacoco/org.apache.commons.codec.digest/MurmurHash3.java.html
-        s = self._s = to_signed_32(s + 0x9e3779b9)
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
-        s = to_signed_32(s * to_signed_32(0x85ebca77))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 13))
-        s = to_signed_32(s * to_signed_32(0xc2b2ae3d))
-        return to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
+        s = self._s = int_to_int32(s + 0x9e3779b9)
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))
+        s = int_to_int32(s * int_to_int32(0x85ebca77))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 13))
+        s = int_to_int32(s * int_to_int32(0xc2b2ae3d))
+        return int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))

     def _algo4(self, s):
         # Custom scrambling function involving a left rotation (ROL)
-        s = self._s = to_signed_32(s + 0x6d2b79f5)
-        s = to_signed_32((s << 7) | ((s & 0xFFFFFFFF) >> 25))  # ROL 7
-        s = to_signed_32(s + 0x9e3779b9)
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 11))
-        return to_signed_32(s * 0x27d4eb2d)
+        s = self._s = int_to_int32(s + 0x6d2b79f5)
+        s = int_to_int32((s << 7) | ((s & 0xFFFFFFFF) >> 25))  # ROL 7
+        s = int_to_int32(s + 0x9e3779b9)
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 11))
+        return int_to_int32(s * 0x27d4eb2d)

     def _algo5(self, s):
         # xorshift variant with a final addition
-        s = to_signed_32(s ^ (s << 7))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 9))
-        s = to_signed_32(s ^ (s << 8))
-        s = self._s = to_signed_32(s + 0xa5a5a5a5)
+        s = int_to_int32(s ^ (s << 7))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 9))
+        s = int_to_int32(s ^ (s << 8))
+        s = self._s = int_to_int32(s + 0xa5a5a5a5)
         return s

     def _algo6(self, s):
         # LCG (a=0x2c9277b5, c=0xac564b05) with a variable right shift scrambler
-        s = self._s = to_signed_32(s * to_signed_32(0x2c9277b5) + to_signed_32(0xac564b05))
-        s2 = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 18))
+        s = self._s = int_to_int32(s * int_to_int32(0x2c9277b5) + int_to_int32(0xac564b05))
+        s2 = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 18))
         shift = (s & 0xFFFFFFFF) >> 27 & 31
-        return to_signed_32((s2 & 0xFFFFFFFF) >> shift)
+        return int_to_int32((s2 & 0xFFFFFFFF) >> shift)

     def _algo7(self, s):
         # Weyl Sequence (k=0x9e3779b9) + custom multiply-xor-shift mixing function
-        s = self._s = to_signed_32(s + to_signed_32(0x9e3779b9))
-        e = to_signed_32(s ^ (s << 5))
-        e = to_signed_32(e * to_signed_32(0x7feb352d))
-        e = to_signed_32(e ^ ((e & 0xFFFFFFFF) >> 15))
-        return to_signed_32(e * to_signed_32(0x846ca68b))
+        s = self._s = int_to_int32(s + int_to_int32(0x9e3779b9))
+        e = int_to_int32(s ^ (s << 5))
+        e = int_to_int32(e * int_to_int32(0x7feb352d))
+        e = int_to_int32(e ^ ((e & 0xFFFFFFFF) >> 15))
+        return int_to_int32(e * int_to_int32(0x846ca68b))

     def __next__(self):
         return self._algorithm(self._s) & 0xFF
@@ -213,16 +210,9 @@ class XHamsterIE(InfoExtractor):
         'only_matching': True,
     }]

-    def _decipher_format_url(self, format_url, format_id):
-        parsed_url = urllib.parse.urlparse(format_url)
-        hex_string, path_remainder = self._search_regex(
-            r'^/(?P<hex>[0-9a-fA-F]{12,})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
-            default=(None, None), group=('hex', 'rem'))
-        if not hex_string:
-            self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
-            return None
+    _VALID_HEX_RE = r'[0-9a-fA-F]{12,}'

+    def _decipher_hex_string(self, hex_string, format_id):
         byte_data = bytes.fromhex(hex_string)
         seed = int.from_bytes(byte_data[1:5], byteorder='little', signed=True)
@@ -232,7 +222,33 @@ class XHamsterIE(InfoExtractor):
             self.report_warning(f'Skipping format "{format_id}": {e.msg}')
             return None

-        deciphered = bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
+        return bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
+
+    def _decipher_format_url(self, format_url, format_id):
+        # format_url can be hex ciphertext or a URL with a hex ciphertext segment
+        if re.fullmatch(self._VALID_HEX_RE, format_url):
+            return self._decipher_hex_string(format_url, format_id)
+        elif not url_or_none(format_url):
+            if re.fullmatch(r'[0-9a-fA-F]+', format_url):
+                # Hex strings that are too short are expected, so we don't want to warn
+                self.write_debug(f'Skipping dummy ciphertext for "{format_id}": {format_url}')
+            else:
+                # Something has likely changed on the site's end, so we need to warn
+                self.report_warning(f'Skipping format "{format_id}": invalid ciphertext')
+            return None
+
+        parsed_url = urllib.parse.urlparse(format_url)
+        hex_string, path_remainder = self._search_regex(
+            rf'^/(?P<hex>{self._VALID_HEX_RE})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
+            default=(None, None), group=('hex', 'rem'))
+        if not hex_string:
+            self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
+            return None
+
+        deciphered = self._decipher_hex_string(hex_string, format_id)
+        if not deciphered:
+            return None
         return parsed_url._replace(path=f'/{deciphered}{path_remainder}').geturl()
@@ -252,7 +268,7 @@ class XHamsterIE(InfoExtractor):
         display_id = mobj.group('display_id') or mobj.group('display_id_2')

         desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
-        webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
+        webpage, urlh = self._download_webpage_handle(desktop_url, video_id, impersonate=True)

         error = self._html_search_regex(
             r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
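The format-URL cipher handled above is a simple XOR stream: byte 0 of the decoded hex selects the algorithm, bytes 1-4 are a little-endian signed seed, and the remainder is ciphertext XORed with the generator's byte output. A self-contained round-trip sketch using the xorshift32 generator (algorithm 2), with a local `int_to_int32` mirroring the `yt_dlp.jsinterp` helper; the framing follows the diff, but `encipher` is my own addition for demonstration:

```python
def int_to_int32(n):
    # Wrap to JS-style signed 32-bit, like yt_dlp.jsinterp.int_to_int32
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def xorshift32_bytes(seed):
    # Keystream: low byte of each xorshift32 state
    s = int_to_int32(seed)
    while True:
        s = int_to_int32(s ^ (s << 13))
        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
        s = int_to_int32(s ^ (s << 5))
        yield s & 0xFF


def decipher(hex_string):
    byte_data = bytes.fromhex(hex_string)
    # byte 0 = algorithm ID (assumed to be 2 here), bytes 1-4 = LE signed seed
    seed = int.from_bytes(byte_data[1:5], byteorder='little', signed=True)
    gen = xorshift32_bytes(seed)
    return bytes(b ^ next(gen) for b in byte_data[5:]).decode('latin-1')


def encipher(plaintext, seed=1):
    # Inverse operation, for demonstration only (XOR is its own inverse)
    gen = xorshift32_bytes(seed)
    body = bytes(b ^ next(gen) for b in plaintext.encode('latin-1'))
    return (bytes([2]) + seed.to_bytes(4, 'little', signed=True) + body).hex()
```

For seed 1, the first xorshift32 state is 270369, so the first keystream byte is `270369 & 0xFF == 33`.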


@@ -16,7 +16,7 @@ from ._redirect import (
     YoutubeYtBeIE,
     YoutubeYtUserIE,
 )
-from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
+from ._search import YoutubeMusicSearchURLIE, YoutubeSearchIE, YoutubeSearchURLIE
 from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
 from ._video import YoutubeIE
@@ -39,7 +39,6 @@ for _cls in [
     YoutubeYtBeIE,
     YoutubeYtUserIE,
     YoutubeMusicSearchURLIE,
-    YoutubeSearchDateIE,
     YoutubeSearchIE,
     YoutubeSearchURLIE,
     YoutubePlaylistIE,


@@ -28,21 +28,6 @@ class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
     }]


-class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
-    IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
-    _SEARCH_KEY = 'ytsearchdate'
-    IE_DESC = 'YouTube search, newest videos first'
-    _SEARCH_PARAMS = 'CAISAhAB8AEB'  # Videos only, sorted by date
-    _TESTS = [{
-        'url': 'ytsearchdate5:youtube-dl test video',
-        'playlist_count': 5,
-        'info_dict': {
-            'id': 'youtube-dl test video',
-            'title': 'youtube-dl test video',
-        },
-    }]
-
-
 class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
     IE_DESC = 'YouTube search URLs with sorting and filter support'
     IE_NAME = YoutubeSearchIE.IE_NAME + '_url'


@@ -139,11 +139,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     ]
     _RETURN_TYPE = 'video'  # XXX: How to handle multifeed?

-    _PLAYER_INFO_RE = (
-        r'/s/player/(?P<id>[a-zA-Z0-9_-]{8,})/(?:tv-)?player',
-        r'/(?P<id>[a-zA-Z0-9_-]{8,})/player(?:_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?|-plasma-ias-(?:phone|tablet)-[a-z]{2}_[A-Z]{2}\.vflset)/base\.js$',
-        r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.js$',
-    )
     _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
     _DEFAULT_CLIENTS = ('android_vr', 'web', 'web_safari')
     _DEFAULT_JSLESS_CLIENTS = ('android_vr',)
@@ -1879,17 +1874,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     }]

     _DEFAULT_PLAYER_JS_VERSION = 'actual'
-    _DEFAULT_PLAYER_JS_VARIANT = 'main'
+    _DEFAULT_PLAYER_JS_VARIANT = 'tv'
     _PLAYER_JS_VARIANT_MAP = {
         'main': 'player_ias.vflset/en_US/base.js',
         'tcc': 'player_ias_tcc.vflset/en_US/base.js',
         'tce': 'player_ias_tce.vflset/en_US/base.js',
         'es5': 'player_es5.vflset/en_US/base.js',
         'es6': 'player_es6.vflset/en_US/base.js',
+        'es6_tcc': 'player_es6_tcc.vflset/en_US/base.js',
+        'es6_tce': 'player_es6_tce.vflset/en_US/base.js',
         'tv': 'tv-player-ias.vflset/tv-player-ias.js',
         'tv_es6': 'tv-player-es6.vflset/tv-player-es6.js',
         'phone': 'player-plasma-ias-phone-en_US.vflset/base.js',
-        'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js',  # Dead since 19712d96 (2025.11.06)
+        'house': 'house_brand_player.vflset/en_US/base.js',  # Used by Google Drive
     }
     _INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()}
@@ -2179,13 +2176,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     @classmethod
     def _extract_player_info(cls, player_url):
-        for player_re in cls._PLAYER_INFO_RE:
-            id_m = re.search(player_re, player_url)
-            if id_m:
-                break
-        else:
-            raise ExtractorError(f'Cannot identify player {player_url!r}')
-        return id_m.group('id')
+        if m := re.search(r'/s/player/(?P<id>[a-fA-F0-9]{8,})/', player_url):
+            return m.group('id')
+        raise ExtractorError(f'Cannot identify player {player_url!r}')

     def _load_player(self, video_id, player_url, fatal=True):
         player_js_key = self._player_js_cache_key(player_url)
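The simplified `_extract_player_info` above drops the legacy patterns in favor of the single `/s/player/<id>/` form that current player URLs carry. A quick standalone check of that pattern (the example URL is a plausible shape, not one taken from the diff):

```python
import re


def extract_player_id(player_url):
    # Same single pattern the extractor now relies on
    if m := re.search(r'/s/player/(?P<id>[a-fA-F0-9]{8,})/', player_url):
        return m.group('id')
    raise ValueError(f'Cannot identify player {player_url!r}')


player_id = extract_player_id(
    'https://www.youtube.com/s/player/23010b46/player_ias.vflset/en_US/base.js')
```

URLs without that segment now fail fast instead of falling through multiple regexes.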
@@ -3219,6 +3212,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         ])
         skip_player_js = 'js' in self._configuration_arg('player_skip')
         format_types = self._configuration_arg('formats')
+        skip_bad_formats = 'incomplete' not in format_types
         all_formats = 'duplicate' in format_types
         if self._configuration_arg('include_duplicate_formats'):
             all_formats = True
@@ -3464,7 +3458,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         https_fmts = []
         for fmt_stream in streaming_formats:
-            if fmt_stream.get('targetDurationSec'):
+            # Live adaptive https formats are not supported: skip unless extractor-arg given
+            if fmt_stream.get('targetDurationSec') and skip_bad_formats:
                 continue

             # FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
@@ -3576,7 +3571,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         yield from process_https_formats()

         needs_live_processing = self._needs_live_processing(live_status, duration)
-        skip_bad_formats = 'incomplete' not in format_types
         skip_manifests = set(self._configuration_arg('skip'))
         if (needs_live_processing == 'is_live'  # These will be filtered out by YoutubeDL anyway
@@ -4086,16 +4080,33 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         needs_live_processing = self._needs_live_processing(live_status, duration)

-        def is_bad_format(fmt):
-            if needs_live_processing and not fmt.get('is_from_start'):
-                return True
-            elif (live_status == 'is_live' and needs_live_processing != 'is_live'
-                    and fmt.get('protocol') == 'http_dash_segments'):
-                return True
-
-        for fmt in filter(is_bad_format, formats):
-            fmt['preference'] = (fmt.get('preference') or -1) - 10
-            fmt['format_note'] = join_nonempty(fmt.get('format_note'), '(Last 2 hours)', delim=' ')
+        def adjust_incomplete_format(fmt, note_suffix='(Last 2 hours)', pref_adjustment=-10):
+            fmt['preference'] = (fmt.get('preference') or -1) + pref_adjustment
+            fmt['format_note'] = join_nonempty(fmt.get('format_note'), note_suffix, delim=' ')
+
+        # Adjust preference and format note for incomplete live/post-live formats
+        if live_status in ('is_live', 'post_live'):
+            for fmt in formats:
+                protocol = fmt.get('protocol')
+                # Currently, protocol isn't set for adaptive https formats, but this could change
+                is_adaptive = protocol in (None, 'http', 'https')
+                if live_status == 'post_live' and is_adaptive:
+                    # Post-live adaptive formats cause HttpFD to raise "Did not get any data blocks"
+                    # These formats are *only* useful to external applications, so we can hide them
+                    # Set their preference <= -1000 so that FormatSorter flags them as 'hidden'
+                    adjust_incomplete_format(fmt, note_suffix='(ended)', pref_adjustment=-5000)
+                # Is it live with --live-from-start? Or is it post-live and its duration is >2hrs?
+                elif needs_live_processing:
+                    if not fmt.get('is_from_start'):
+                        # Post-live m3u8 formats for >2hr streams
+                        adjust_incomplete_format(fmt)
+                elif live_status == 'is_live':
+                    if protocol == 'http_dash_segments':
+                        # Live DASH formats without --live-from-start
+                        adjust_incomplete_format(fmt)
+                    elif is_adaptive:
+                        # Incomplete live adaptive https formats
+                        adjust_incomplete_format(fmt, note_suffix='(incomplete)', pref_adjustment=-20)

         if needs_live_processing:
             self._prepare_live_from_start_formats(

@@ -1,10 +1,10 @@
 # This file is generated by devscripts/update_ejs.py. DO NOT MODIFY!
-VERSION = '0.4.0'
+VERSION = '0.5.0'

 HASHES = {
     'yt.solver.bun.lib.js': '6ff45e94de9f0ea936a183c48173cfa9ce526ee4b7544cd556428427c1dd53c8073ef0174e79b320252bf0e7c64b0032cc1cf9c4358f3fda59033b7caa01c241',
-    'yt.solver.core.js': '05964b458d92a65d4fb7a90bcb5921c9fed2370f4e4f2f25badb41f28aff9069e0b3c4e5bf1baf2d3021787b67fc6093cefa44de30cffdc6f9fb25532484003b',
+    'yt.solver.core.js': '9742868113d7b0c29e24a95c8eb2c2bec7cdf95513dc7f55f523ba053c0ecf2af7dcb0138b1d933578304f0dda633a6b3bfff64e912b4c547b99dad083428c4b',
-    'yt.solver.core.min.js': '0cd3c0b37e095d3cca99443b58fe03980ac3bf2e777c2485c23e1f6052b5ede9f07c7f1c79a9c3af3258ea91a228f099741e7eb07b53125b5dcc84bb4c0054f3',
+    'yt.solver.core.min.js': 'aee8c3354cfd535809c871c2a517d03231f89cd184e903af82ee274bcc2e90991ef19cb3f65f2ccc858c4963856ea87f8692fe16d71209f4fc7f41c44b828e36',
     'yt.solver.deno.lib.js': '9c8ee3ab6c23e443a5a951e3ac73c6b8c1c8fb34335e7058a07bf99d349be5573611de00536dcd03ecd3cf34014c4e9b536081de37af3637c5390c6a6fd6a0f0',
     'yt.solver.lib.js': '1ee3753a8222fc855f5c39db30a9ccbb7967dbe1fb810e86dc9a89aa073a0907f294c720e9b65427d560a35aa1ce6af19ef854d9126a05ca00afe03f72047733',
     'yt.solver.lib.min.js': '8420c259ad16e99ce004e4651ac1bcabb53b4457bf5668a97a9359be9a998a789fee8ab124ee17f91a2ea8fd84e0f2b2fc8eabcaf0b16a186ba734cf422ad053',


@@ -60,26 +60,29 @@ var jsc = (function (meriyah, astring) {
         }
         return value;
     }

-    const nsigExpression = {
-        type: 'VariableDeclaration',
-        kind: 'var',
-        declarations: [
+    const nsig = {
+        type: 'CallExpression',
+        callee: { or: [{ type: 'Identifier' }, { type: 'SequenceExpression' }] },
+        arguments: [
+            {},
             {
-                type: 'VariableDeclarator',
-                init: {
-                    type: 'CallExpression',
-                    callee: { type: 'Identifier' },
-                    arguments: [
-                        { type: 'Literal' },
-                        {
-                            type: 'CallExpression',
-                            callee: { type: 'Identifier', name: 'decodeURIComponent' },
-                        },
-                    ],
-                },
+                type: 'CallExpression',
+                callee: { type: 'Identifier', name: 'decodeURIComponent' },
+                arguments: [{}],
             },
         ],
     };
+    const nsigAssignment = {
+        type: 'AssignmentExpression',
+        left: { type: 'Identifier' },
+        operator: '=',
+        right: nsig,
+    };
+    const nsigDeclarator = {
+        type: 'VariableDeclarator',
+        id: { type: 'Identifier' },
+        init: nsig,
+    };

     const logicalExpression = {
         type: 'ExpressionStatement',
         expression: {
@@ -97,6 +100,17 @@ var jsc = (function (meriyah, astring) {
         callee: { type: 'Identifier' },
         arguments: {
           or: [
+            [
+              {
+                type: 'CallExpression',
+                callee: {
+                  type: 'Identifier',
+                  name: 'decodeURIComponent',
+                },
+                arguments: [{ type: 'Identifier' }],
+                optional: false,
+              },
+            ],
             [
               { type: 'Literal' },
               {
@@ -110,6 +124,8 @@ var jsc = (function (meriyah, astring) {
             },
           ],
           [
+            { type: 'Literal' },
+            { type: 'Literal' },
             {
               type: 'CallExpression',
               callee: {
@@ -138,18 +154,18 @@ var jsc = (function (meriyah, astring) {
           expression: {
             type: 'AssignmentExpression',
             operator: '=',
-            left: { type: 'Identifier' },
-            right: { type: 'FunctionExpression', params: [{}, {}, {}] },
+            left: { or: [{ type: 'Identifier' }, { type: 'MemberExpression' }] },
+            right: { type: 'FunctionExpression' },
           },
         },
-        { type: 'FunctionDeclaration', params: [{}, {}, {}] },
+        { type: 'FunctionDeclaration' },
         {
           type: 'VariableDeclaration',
           declarations: {
            anykey: [
              {
                type: 'VariableDeclarator',
-                init: { type: 'FunctionExpression', params: [{}, {}, {}] },
+                init: { type: 'FunctionExpression' },
              },
            ],
          },
@@ -157,124 +173,150 @@ var jsc = (function (meriyah, astring) {
       ],
     };
     function extract$1(node) {
-      if (!matchesStructure(node, identifier$1)) {
-        return null;
-      }
-      let block;
-      if (
-        node.type === 'ExpressionStatement' &&
-        node.expression.type === 'AssignmentExpression' &&
-        node.expression.right.type === 'FunctionExpression'
-      ) {
-        block = node.expression.right.body;
-      } else if (node.type === 'VariableDeclaration') {
-        for (const decl of node.declarations) {
-          if (
-            decl.type === 'VariableDeclarator' &&
-            _optionalChain$2([
-              decl,
-              'access',
-              (_) => _.init,
-              'optionalAccess',
-              (_2) => _2.type,
-            ]) === 'FunctionExpression' &&
-            _optionalChain$2([
-              decl,
-              'access',
-              (_3) => _3.init,
-              'optionalAccess',
-              (_4) => _4.params,
-              'access',
-              (_5) => _5.length,
-            ]) === 3
-          ) {
-            block = decl.init.body;
-            break;
-          }
-        }
-      } else if (node.type === 'FunctionDeclaration') {
-        block = node.body;
-      } else {
-        return null;
-      }
-      const relevantExpression = _optionalChain$2([
-        block,
-        'optionalAccess',
-        (_6) => _6.body,
-        'access',
-        (_7) => _7.at,
-        'call',
-        (_8) => _8(-2),
-      ]);
-      let call = null;
-      if (matchesStructure(relevantExpression, logicalExpression)) {
-        if (
-          _optionalChain$2([
-            relevantExpression,
-            'optionalAccess',
-            (_9) => _9.type,
-          ]) !== 'ExpressionStatement' ||
-          relevantExpression.expression.type !== 'LogicalExpression' ||
-          relevantExpression.expression.right.type !== 'SequenceExpression' ||
-          relevantExpression.expression.right.expressions[0].type !==
-            'AssignmentExpression' ||
-          relevantExpression.expression.right.expressions[0].right.type !==
-            'CallExpression'
-        ) {
-          return null;
-        }
-        call = relevantExpression.expression.right.expressions[0].right;
-      } else if (
-        _optionalChain$2([
-          relevantExpression,
-          'optionalAccess',
-          (_10) => _10.type,
-        ]) === 'IfStatement' &&
-        relevantExpression.consequent.type === 'BlockStatement'
-      ) {
-        for (const n of relevantExpression.consequent.body) {
-          if (!matchesStructure(n, nsigExpression)) {
-            continue;
-          }
-          if (
-            n.type !== 'VariableDeclaration' ||
-            _optionalChain$2([
-              n,
-              'access',
-              (_11) => _11.declarations,
-              'access',
-              (_12) => _12[0],
-              'access',
-              (_13) => _13.init,
-              'optionalAccess',
-              (_14) => _14.type,
-            ]) !== 'CallExpression'
-          ) {
-            continue;
-          }
-          call = n.declarations[0].init;
-          break;
-        }
-      }
-      if (call === null) {
-        return null;
-      }
-      return {
-        type: 'ArrowFunctionExpression',
-        params: [{ type: 'Identifier', name: 'sig' }],
-        body: {
-          type: 'CallExpression',
-          callee: { type: 'Identifier', name: call.callee.name },
-          arguments:
-            call.arguments.length === 1
-              ? [{ type: 'Identifier', name: 'sig' }]
-              : [call.arguments[0], { type: 'Identifier', name: 'sig' }],
-          optional: false,
-        },
-        async: false,
-        expression: false,
-        generator: false,
-      };
+      const blocks = [];
+      if (matchesStructure(node, identifier$1)) {
+        if (
+          node.type === 'ExpressionStatement' &&
+          node.expression.type === 'AssignmentExpression' &&
+          node.expression.right.type === 'FunctionExpression' &&
+          node.expression.right.params.length >= 3
+        ) {
+          blocks.push(node.expression.right.body);
+        } else if (node.type === 'VariableDeclaration') {
+          for (const decl of node.declarations) {
+            if (
+              _optionalChain$2([
+                decl,
+                'access',
+                (_) => _.init,
+                'optionalAccess',
+                (_2) => _2.type,
+              ]) === 'FunctionExpression' &&
+              decl.init.params.length >= 3
+            ) {
+              blocks.push(decl.init.body);
+            }
+          }
+        } else if (
+          node.type === 'FunctionDeclaration' &&
+          node.params.length >= 3
+        ) {
+          blocks.push(node.body);
+        } else {
+          return null;
+        }
+      } else if (
+        node.type === 'ExpressionStatement' &&
+        node.expression.type === 'SequenceExpression'
+      ) {
+        for (const expr of node.expression.expressions) {
+          if (
+            expr.type === 'AssignmentExpression' &&
+            expr.right.type === 'FunctionExpression' &&
+            expr.right.params.length === 3
+          ) {
+            blocks.push(expr.right.body);
+          }
+        }
+      } else {
+        return null;
+      }
+      for (const block of blocks) {
+        let call = null;
+        for (const stmt of block.body) {
+          if (matchesStructure(stmt, logicalExpression)) {
+            if (
+              stmt.type === 'ExpressionStatement' &&
+              stmt.expression.type === 'LogicalExpression' &&
+              stmt.expression.right.type === 'SequenceExpression' &&
+              stmt.expression.right.expressions[0].type ===
+                'AssignmentExpression' &&
+              stmt.expression.right.expressions[0].right.type === 'CallExpression'
+            ) {
+              call = stmt.expression.right.expressions[0].right;
+            }
+          } else if (stmt.type === 'IfStatement') {
+            let consequent = stmt.consequent;
+            while (consequent.type === 'LabeledStatement') {
+              consequent = consequent.body;
+            }
+            if (consequent.type !== 'BlockStatement') {
+              continue;
+            }
+            for (const n of consequent.body) {
+              if (n.type !== 'VariableDeclaration') {
+                continue;
+              }
+              for (const decl of n.declarations) {
+                if (
+                  matchesStructure(decl, nsigDeclarator) &&
+                  _optionalChain$2([
+                    decl,
+                    'access',
+                    (_3) => _3.init,
+                    'optionalAccess',
+                    (_4) => _4.type,
+                  ]) === 'CallExpression'
+                ) {
+                  call = decl.init;
+                  break;
+                }
+              }
+              if (call) {
+                break;
+              }
+            }
+          } else if (stmt.type === 'ExpressionStatement') {
+            if (
+              stmt.expression.type !== 'LogicalExpression' ||
+              stmt.expression.operator !== '&&' ||
+              stmt.expression.right.type !== 'SequenceExpression'
+            ) {
+              continue;
+            }
+            for (const expr of stmt.expression.right.expressions) {
+              if (matchesStructure(expr, nsigAssignment) && expr.type) {
+                if (
+                  expr.type === 'AssignmentExpression' &&
+                  expr.right.type === 'CallExpression'
+                ) {
+                  call = expr.right;
+                  break;
+                }
+              }
+            }
+          }
+          if (call) {
+            break;
+          }
+        }
+        if (!call) {
+          continue;
+        }
+        return {
+          type: 'ArrowFunctionExpression',
+          params: [{ type: 'Identifier', name: 'sig' }],
+          body: {
+            type: 'CallExpression',
+            callee: call.callee,
+            arguments: call.arguments.map((arg) => {
+              if (
+                arg.type === 'CallExpression' &&
+                arg.callee.type === 'Identifier' &&
+                arg.callee.name === 'decodeURIComponent'
+              ) {
+                return { type: 'Identifier', name: 'sig' };
+              }
+              return arg;
+            }),
+            optional: false,
+          },
+          async: false,
+          expression: false,
+          generator: false,
+        };
+      }
+      return null;
     }
     function _optionalChain$1(ops) {
       let lastAccessLHS = undefined;
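Everything in `extract` and `extract$1` hinges on `matchesStructure(node, pattern)`, whose implementation is not part of this diff. A hypothetical Python sketch of the semantics the patterns above appear to rely on (plain keys match recursively, `or` lists alternatives, bare values compare by equality; the `anykey` semantics are guessed from its use on `declarations`):

```python
def matches_structure(node, pattern):
    """Hypothetical structural matcher; NOT the actual jsc implementation."""
    if isinstance(pattern, dict):
        if 'or' in pattern:  # any one alternative may match
            return any(matches_structure(node, p) for p in pattern['or'])
        if 'anykey' in pattern:  # guessed: some element of a list matches a sub-pattern
            return isinstance(node, list) and any(
                matches_structure(item, p)
                for item in node for p in pattern['anykey'])
        # every key in the pattern must match recursively; {} matches any node
        return isinstance(node, dict) and all(
            matches_structure(node.get(key), value)
            for key, value in pattern.items())
    if isinstance(pattern, list):  # element-wise match against a prefix
        return isinstance(node, list) and len(node) >= len(pattern) and all(
            matches_structure(item, p) for item, p in zip(node, pattern))
    return node == pattern  # leaf values (e.g. 'Identifier') compare directly
```

Under these assumed semantics, the `nsig` pattern above would accept any `CallExpression` whose callee is an `Identifier` or `SequenceExpression` and whose second argument is a `decodeURIComponent(...)` call.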
@@ -472,8 +514,31 @@ var jsc = (function (meriyah, astring) {
       return value;
     }
     function preprocessPlayer(data) {
-      const ast = meriyah.parse(data);
-      const body = ast.body;
+      const program = meriyah.parse(data);
+      const plainStatements = modifyPlayer(program);
+      const solutions = getSolutions(plainStatements);
+      for (const [name, options] of Object.entries(solutions)) {
+        plainStatements.push({
+          type: 'ExpressionStatement',
+          expression: {
+            type: 'AssignmentExpression',
+            operator: '=',
+            left: {
+              type: 'MemberExpression',
+              computed: false,
+              object: { type: 'Identifier', name: '_result' },
+              property: { type: 'Identifier', name: name },
+              optional: false,
+            },
+            right: multiTry(options),
+          },
+        });
+      }
+      program.body.splice(0, 0, ...setupNodes);
+      return astring.generate(program);
+    }
+    function modifyPlayer(program) {
+      const body = program.body;
       const block = (() => {
         switch (body.length) {
           case 1: {
@@ -506,16 +571,7 @@ var jsc = (function (meriyah, astring) {
         }
         throw 'unexpected structure';
       })();
-      const found = { n: [], sig: [] };
-      const plainExpressions = block.body.filter((node) => {
-        const n = extract(node);
-        if (n) {
-          found.n.push(n);
-        }
-        const sig = extract$1(node);
-        if (sig) {
-          found.sig.push(sig);
-        }
+      block.body = block.body.filter((node) => {
         if (node.type === 'ExpressionStatement') {
           if (node.expression.type === 'AssignmentExpression') {
             return true;
@@ -524,41 +580,241 @@ var jsc = (function (meriyah, astring) {
         }
         return true;
       });
-      block.body = plainExpressions;
-      for (const [name, options] of Object.entries(found)) {
-        const unique = new Set(options.map((x) => JSON.stringify(x)));
-        if (unique.size !== 1) {
-          const message = `found ${unique.size} ${name} function possibilities`;
-          throw (
-            message +
-            (unique.size
-              ? `: ${options.map((x) => astring.generate(x)).join(', ')}`
-              : '')
-          );
-        }
-        plainExpressions.push({
-          type: 'ExpressionStatement',
-          expression: {
-            type: 'AssignmentExpression',
-            operator: '=',
-            left: {
-              type: 'MemberExpression',
-              computed: false,
-              object: { type: 'Identifier', name: '_result' },
-              property: { type: 'Identifier', name: name },
-            },
-            right: options[0],
-          },
-        });
-      }
-      ast.body.splice(0, 0, ...setupNodes);
-      return astring.generate(ast);
+      return block.body;
+    }
+    function getSolutions(statements) {
+      const found = { n: [], sig: [] };
+      for (const statement of statements) {
+        const n = extract(statement);
+        if (n) {
+          found.n.push(n);
+        }
+        const sig = extract$1(statement);
+        if (sig) {
+          found.sig.push(sig);
+        }
+      }
+      return found;
     }
     function getFromPrepared(code) {
       const resultObj = { n: null, sig: null };
       Function('_result', code)(resultObj);
       return resultObj;
     }
function multiTry(generators) {
return {
type: 'ArrowFunctionExpression',
params: [{ type: 'Identifier', name: '_input' }],
body: {
type: 'BlockStatement',
body: [
{
type: 'VariableDeclaration',
kind: 'const',
declarations: [
{
type: 'VariableDeclarator',
id: { type: 'Identifier', name: '_results' },
init: {
type: 'NewExpression',
callee: { type: 'Identifier', name: 'Set' },
arguments: [],
},
},
],
},
{
type: 'ForOfStatement',
left: {
type: 'VariableDeclaration',
kind: 'const',
declarations: [
{
type: 'VariableDeclarator',
id: { type: 'Identifier', name: '_generator' },
init: null,
},
],
},
right: { type: 'ArrayExpression', elements: generators },
body: {
type: 'BlockStatement',
body: [
{
type: 'TryStatement',
block: {
type: 'BlockStatement',
body: [
{
type: 'ExpressionStatement',
expression: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'add' },
optional: false,
},
arguments: [
{
type: 'CallExpression',
callee: {
type: 'Identifier',
name: '_generator',
},
arguments: [
{ type: 'Identifier', name: '_input' },
],
optional: false,
},
],
optional: false,
},
},
],
},
handler: {
type: 'CatchClause',
param: null,
body: { type: 'BlockStatement', body: [] },
},
finalizer: null,
},
],
},
await: false,
},
{
type: 'IfStatement',
test: {
type: 'UnaryExpression',
operator: '!',
argument: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'size' },
optional: false,
},
prefix: true,
},
consequent: {
type: 'BlockStatement',
body: [
{
type: 'ThrowStatement',
argument: {
type: 'TemplateLiteral',
expressions: [],
quasis: [
{
type: 'TemplateElement',
value: { cooked: 'no solutions', raw: 'no solutions' },
tail: true,
},
],
},
},
],
},
alternate: null,
},
{
type: 'IfStatement',
test: {
type: 'BinaryExpression',
left: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'size' },
optional: false,
},
right: { type: 'Literal', value: 1 },
operator: '!==',
},
consequent: {
type: 'BlockStatement',
body: [
{
type: 'ThrowStatement',
argument: {
type: 'TemplateLiteral',
expressions: [
{
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'join' },
optional: false,
},
arguments: [{ type: 'Literal', value: ', ' }],
optional: false,
},
],
quasis: [
{
type: 'TemplateElement',
value: {
cooked: 'invalid solutions: ',
raw: 'invalid solutions: ',
},
tail: false,
},
{
type: 'TemplateElement',
value: { cooked: '', raw: '' },
tail: true,
},
],
},
},
],
},
alternate: null,
},
{
type: 'ReturnStatement',
argument: {
type: 'MemberExpression',
object: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: {
type: 'CallExpression',
callee: {
type: 'MemberExpression',
object: { type: 'Identifier', name: '_results' },
computed: false,
property: { type: 'Identifier', name: 'values' },
optional: false,
},
arguments: [],
optional: false,
},
computed: false,
property: { type: 'Identifier', name: 'next' },
optional: false,
},
arguments: [],
optional: false,
},
computed: false,
property: { type: 'Identifier', name: 'value' },
optional: false,
},
},
],
},
async: false,
expression: false,
generator: false,
};
}
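For readability: the arrow function that `multiTry` assembles node-by-node above behaves roughly like the following Python analogue (the names `multi_try`/`candidates` are mine, not from the source). Every candidate solver is attempted, distinct successful results are collected in a set, and only a unanimous single answer is accepted.

```python
def multi_try(candidates, value):
    # Run each candidate solver; a failing candidate is silently skipped,
    # mirroring the generated try/catch with an empty handler
    results = set()
    for candidate in candidates:
        try:
            results.add(candidate(value))
        except Exception:
            pass
    if not results:
        raise RuntimeError('no solutions')
    if len(results) != 1:
        # mirrors the generated `invalid solutions: ${_results.join(', ')}` throw
        raise RuntimeError(f'invalid solutions: {", ".join(map(str, results))}')
    return next(iter(results))
```

This replaces the earlier eager uniqueness check in `preprocessPlayer`: disagreement between extracted candidates is now detected lazily, at solve time, instead of failing while preparing the player.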
    function main(input) {
      const preprocessedPlayer =
        input.type === 'player'

View File

@@ -18,6 +18,14 @@ from .utils import (
 )


+def int_to_int32(n):
+    """Converts an integer to a signed 32-bit integer"""
+    n &= 0xFFFFFFFF
+    if n & 0x80000000:
+        return n - 0x100000000
+    return n
+
+
 def _js_bit_op(op):
     def zeroise(x):
         if x in (None, JS_Undefined):
@@ -28,7 +36,7 @@ def _js_bit_op(op):
         return int(float(x))

     def wrapped(a, b):
-        return op(zeroise(a), zeroise(b)) & 0xffffffff
+        return int_to_int32(op(int_to_int32(zeroise(a)), int_to_int32(zeroise(b))))

     return wrapped
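The new `int_to_int32` helper wraps values the way ECMAScript's ToInt32 does, and the rewritten `wrapped` applies it to both operands and the result, so bitwise results can be negative as in JS. A standalone sketch (omitting the `zeroise` coercion for brevity; `js_bit_op` is a simplified stand-in, not the actual yt-dlp function):

```python
import operator


def int_to_int32(n):
    """Converts an integer to a signed 32-bit integer"""
    n &= 0xFFFFFFFF
    if n & 0x80000000:  # high bit set: interpret as negative
        return n - 0x100000000
    return n


def js_bit_op(op, a, b):
    # Both operands and the result are wrapped to signed 32-bit;
    # the old `& 0xffffffff` alone always produced unsigned results
    return int_to_int32(op(int_to_int32(a), int_to_int32(b)))
```

For example, `js_bit_op(operator.or_, 0xFFFFFFFF, 0)` yields `-1`, matching JS `0xFFFFFFFF | 0`, where the previous code would have returned `0xFFFFFFFF`.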
@@ -368,6 +376,10 @@ class JSInterpreter:
             if not _OPERATORS.get(op):
                 return right_val

+            # TODO: This is only correct for str+str and str+number; fix for str+array, str+object, etc
+            if op == '+' and (isinstance(left_val, str) or isinstance(right_val, str)):
+                return f'{left_val}{right_val}'
+
             try:
                 return _OPERATORS[op](left_val, right_val)
             except Exception as e:
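The inserted `+` special case follows JavaScript's coercion rule that any string operand turns addition into concatenation. A minimal sketch of just that branch (hypothetical helper name, covering only the str+str and str+number cases the TODO mentions):

```python
def js_add(left, right):
    # JS: if either operand is a string, '+' concatenates
    if isinstance(left, str) or isinstance(right, str):
        return f'{left}{right}'
    # otherwise fall back to numeric addition
    return left + right
```

So `js_add('n', 3)` gives `'n3'`, like JS `'n' + 3`; in plain Python, `'n' + 3` would raise `TypeError`.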
@@ -377,7 +389,7 @@ class JSInterpreter:
         if idx == 'length':
             return len(obj)
         try:
-            return obj[int(idx)] if isinstance(obj, list) else obj[idx]
+            return obj[int(idx)] if isinstance(obj, list) else obj[str(idx)]
         except Exception as e:
             if allow_undefined:
                 return JS_Undefined

View File

@@ -175,6 +175,13 @@ _TARGETS_COMPAT_LOOKUP = {
     'safari180_ios': 'safari18_0_ios',
 }

+# These targets are known to be insufficient, unreliable or blocked
+# See: https://github.com/yt-dlp/yt-dlp/issues/16012
+_DEPRIORITIZED_TARGETS = {
+    ImpersonateTarget('chrome', '133', 'macos', '15'),  # chrome133a
+    ImpersonateTarget('chrome', '136', 'macos', '15'),  # chrome136
+}
+

 @register_rh
 class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
@@ -192,6 +199,8 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
                 for version, targets in BROWSER_TARGETS.items()
                 if curl_cffi_version >= version
             ), key=lambda x: (
+                # deprioritize unreliable targets so they are not selected by default
+                x[1] not in _DEPRIORITIZED_TARGETS,
                 # deprioritize mobile targets since they give very different behavior
                 x[1].os not in ('ios', 'android'),
                 # prioritize tor < edge < firefox < safari < chrome
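The added key entry uses the usual boolean-in-sort-key trick: membership in `_DEPRIORITIZED_TARGETS` yields `False`, ranking those targets behind reliable ones. A toy sketch of the pattern with made-up target names (the `reverse=True` orientation is my assumption for illustration; the real ordering depends on the surrounding `sorted()` call):

```python
deprioritized = {'chrome-133', 'chrome-136'}  # hypothetical target names
targets = ['chrome-136', 'firefox-135', 'chrome-133', 'safari-18']

# True (not deprioritized) sorts after False, so with reverse=True the
# reliable targets come first and deprioritized ones fall to the end
ranked = sorted(targets, key=lambda t: (t not in deprioritized, t), reverse=True)
```

Here `ranked` ends with the two deprioritized Chrome targets, so a default-target selection that takes the first entry never picks them, while explicitly requesting one still works.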

View File

@@ -511,7 +511,7 @@ def create_parser():
     general.add_option(
         '--live-from-start',
         action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
+        help='Download livestreams from the start. Currently experimental and only supported for YouTube, Twitch, and TVer')
     general.add_option(
         '--no-live-from-start',
         action='store_false', dest='live_from_start',

View File

@@ -75,6 +75,9 @@ MONTH_NAMES = {
     'fr': [
         'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
         'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
+    'is': [
+        'janúar', 'febrúar', 'mars', 'apríl', 'maí', 'júní',
+        'júlí', 'ágúst', 'september', 'október', 'nóvember', 'desember'],
     # these follow the genitive grammatical case (dopełniacz)
     # some websites might be using nominative, which will require another month list
     # https://en.wikibooks.org/wiki/Polish/Noun_cases

View File

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2026.01.31' __version__ = '2026.02.21'
RELEASE_GIT_HEAD = '9a9a6b6fe44a30458c1754ef064f354f04a84004' RELEASE_GIT_HEAD = '646bb31f39614e6c2f7ba687c53e7496394cbadb'
VARIANT = None VARIANT = None
@@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2026.01.31' _pkg_version = '2026.02.21'