Mirror of https://github.com/yt-dlp/yt-dlp.git (synced 2026-02-23 17:05:58 +00:00)

Compare commits: 2026.01.31...2026.02.21 (43 commits)
| SHA1 |
|---|
| e2a9cc7d13 |
| 646bb31f39 |
| 1fbbe29b99 |
| c105461647 |
| 1d1358d09f |
| 1fe0bf23aa |
| f05e1cd1f1 |
| 46d5b6f2b7 |
| 166356d1a1 |
| 2485653859 |
| f532a91cef |
| 81bdea03f3 |
| e74076141d |
| 97f03660f5 |
| 772559e3db |
| c7945800e4 |
| e2444584a3 |
| acfc00a955 |
| 224fe478b0 |
| 77221098fc |
| 319a2bda83 |
| 2204cee6d8 |
| 071ad7dfa0 |
| 0d8898c3f4 |
| d108ca10b9 |
| c9c8651975 |
| 62574f5763 |
| abade83f8d |
| 43229d1d5f |
| 8d6e0b29bf |
| 1ea7329cc9 |
| a13f281012 |
| 02ce3efbfe |
| 1a9c4b8238 |
| 637ae202ac |
| 23c059a455 |
| 6f38df31b4 |
| 442c90da3e |
| 133cb959be |
| c7c45f5289 |
| bb3af7e6d5 |
| c677d866d4 |
| 1a895c18aa |
CONTRIBUTORS
@@ -864,3 +864,13 @@ Sytm
zahlman
azdlonky
thematuu
+beacdeac
+blauerdorf
+CanOfSocks
+gravesducking
+gseddon
+hunter-gatherer8
+LordMZTE
+regulad
+stastix
+syphyr
Changelog.md
@@ -4,6 +4,69 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->

### 2026.02.21

#### Important changes
- Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)
    - The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters
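The fix works by restricting the machine name that gets interpolated into the `--netrc-cmd` shell command to a conservative character set. A minimal sketch of the idea in Python — the exact allowed set below is an illustrative assumption, not yt-dlp's actual whitelist (see commit 1fbbe29b99 for that):

```python
import re

# Hypothetical safe subset: alphanumerics plus hostname-style punctuation.
# yt-dlp's real character set may differ.
_SAFE_MACHINE_RE = re.compile(r'[A-Za-z0-9._-]+')


def validate_netrc_machine(machine: str) -> str:
    """Reject machine names that could smuggle shell metacharacters."""
    if not _SAFE_MACHINE_RE.fullmatch(machine):
        raise ValueError(f'Unsafe netrc machine name: {machine!r}')
    return machine
```

This mirrors the new regression test further down, where `_get_netrc_login_info(netrc_machine=';echo rce')` is expected to raise.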
#### Core changes
- **cookies**: [Ignore cookies with control characters](https://github.com/yt-dlp/yt-dlp/commit/43229d1d5f47b313e1958d719faff6321d853ed3) ([#15862](https://github.com/yt-dlp/yt-dlp/issues/15862)) by [bashonly](https://github.com/bashonly), [syphyr](https://github.com/syphyr)
- **jsinterp**
    - [Fix bitwise operations](https://github.com/yt-dlp/yt-dlp/commit/62574f5763755a8637880044630b12582e4a55a5) ([#15985](https://github.com/yt-dlp/yt-dlp/issues/15985)) by [bashonly](https://github.com/bashonly)
    - [Stringify bracket notation keys in object access](https://github.com/yt-dlp/yt-dlp/commit/c9c86519753d6cdafa052945d2de0d3fcd448927) ([#15989](https://github.com/yt-dlp/yt-dlp/issues/15989)) by [bashonly](https://github.com/bashonly)
    - [Support string concatenation with `+` and `+=`](https://github.com/yt-dlp/yt-dlp/commit/d108ca10b926410ed99031fec86894bfdea8f8eb) ([#15990](https://github.com/yt-dlp/yt-dlp/issues/15990)) by [bashonly](https://github.com/bashonly)

#### Extractor changes
- [Add browser impersonation support to more extractors](https://github.com/yt-dlp/yt-dlp/commit/1d1358d09fedcdc6b3e83538a29b0b539cb9be3f) ([#16029](https://github.com/yt-dlp/yt-dlp/issues/16029)) by [bashonly](https://github.com/bashonly)
- [Limit `netrc_machine` parameter to shell-safe characters](https://github.com/yt-dlp/yt-dlp/commit/1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a) by [Grub4K](https://github.com/Grub4K)
- **1tv**: [Extract chapters](https://github.com/yt-dlp/yt-dlp/commit/23c059a455acbb317b2bbe657efd59113bf4d5ac) ([#15848](https://github.com/yt-dlp/yt-dlp/issues/15848)) by [hunter-gatherer8](https://github.com/hunter-gatherer8)
- **aenetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/24856538595a3b25c75e1199146fcc82ea812d97) ([#14959](https://github.com/yt-dlp/yt-dlp/issues/14959)) by [Sipherdrakon](https://github.com/Sipherdrakon)
- **applepodcasts**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1ea7329cc91da38a790174e831fffafcb3ea3c3d) ([#15901](https://github.com/yt-dlp/yt-dlp/issues/15901)) by [coreywright](https://github.com/coreywright)
- **dailymotion**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/224fe478b0ef83d13b36924befa53686290cb000) ([#15995](https://github.com/yt-dlp/yt-dlp/issues/15995)) by [bashonly](https://github.com/bashonly)
- **facebook**: ads: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e2444584a3e590077b81828ad8a12fc4c3b1aa6d) ([#16002](https://github.com/yt-dlp/yt-dlp/issues/16002)) by [bashonly](https://github.com/bashonly)
- **gem.cbc.ca**: [Support standalone, series & Olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/637ae202aca7a990b3b61bc33d692870dc16c3ad) ([#15878](https://github.com/yt-dlp/yt-dlp/issues/15878)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly), [makew0rld](https://github.com/makew0rld)
- **learningonscreen**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/46d5b6f2b7989d8991a59215d434fb8b5a8ec7bb) ([#16028](https://github.com/yt-dlp/yt-dlp/issues/16028)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
- **locipo**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/442c90da3ec680037b7d94abf91ec63b2e5a9ade) ([#15486](https://github.com/yt-dlp/yt-dlp/issues/15486)) by [doe1080](https://github.com/doe1080), [gravesducking](https://github.com/gravesducking)
- **matchitv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/8d6e0b29bf15365638e0ceeb803a274e4db6157d) ([#15204](https://github.com/yt-dlp/yt-dlp/issues/15204)) by [gseddon](https://github.com/gseddon)
- **odnoklassniki**: [Fix inefficient regular expression](https://github.com/yt-dlp/yt-dlp/commit/071ad7dfa012f5b71572d29ef96fc154cb2dc9cc) ([#15974](https://github.com/yt-dlp/yt-dlp/issues/15974)) by [bashonly](https://github.com/bashonly)
- **opencast**: [Support `oc-p.uni-jena.de` URLs](https://github.com/yt-dlp/yt-dlp/commit/166356d1a1cac19cac14298e735eeae44b52c70e) ([#16026](https://github.com/yt-dlp/yt-dlp/issues/16026)) by [LordMZTE](https://github.com/LordMZTE)
- **pornhub**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6f38df31b477cf5ea3c8f91207452e3a4e8d5aa6) ([#15858](https://github.com/yt-dlp/yt-dlp/issues/15858)) by [beacdeac](https://github.com/beacdeac)
- **saucepluschannel**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/97f03660f55696dc9fce56e7ee43fbe3324a9867) ([#15830](https://github.com/yt-dlp/yt-dlp/issues/15830)) by [regulad](https://github.com/regulad)
- **soundcloud**
    - [Fix client ID extraction](https://github.com/yt-dlp/yt-dlp/commit/81bdea03f3414dd4d086610c970ec14e15bd3d36) ([#16019](https://github.com/yt-dlp/yt-dlp/issues/16019)) by [bashonly](https://github.com/bashonly)
    - [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f532a91cef11075eb5a7809255259b32d2bca8ca) ([#16020](https://github.com/yt-dlp/yt-dlp/issues/16020)) by [bashonly](https://github.com/bashonly)
- **spankbang**
    - [Fix playlist title extraction](https://github.com/yt-dlp/yt-dlp/commit/1fe0bf23aa2249858c08408b7cc6287aaf528690) ([#14132](https://github.com/yt-dlp/yt-dlp/issues/14132)) by [blauerdorf](https://github.com/blauerdorf)
    - [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/f05e1cd1f1052cb40fc966d2fc175571986da863) ([#14130](https://github.com/yt-dlp/yt-dlp/issues/14130)) by [blauerdorf](https://github.com/blauerdorf)
- **steam**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1a9c4b8238434c760b3e27d0c9df6a4a2482d918) ([#15028](https://github.com/yt-dlp/yt-dlp/issues/15028)) by [doe1080](https://github.com/doe1080)
- **tele5**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/772559e3db2eb82e5d862d6d779588ca4b0b048d) ([#16005](https://github.com/yt-dlp/yt-dlp/issues/16005)) by [bashonly](https://github.com/bashonly)
- **tver**: olympic: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/02ce3efbfe51d54cb0866953af423fc6d1f38933) ([#15885](https://github.com/yt-dlp/yt-dlp/issues/15885)) by [doe1080](https://github.com/doe1080)
- **tvo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/a13f281012a21c85f76cf3e320fc3b00d480d6c6) ([#15903](https://github.com/yt-dlp/yt-dlp/issues/15903)) by [doe1080](https://github.com/doe1080)
- **twitter**: [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/0d8898c3f4e76742afb2b877f817fdee89fa1258) ([#15993](https://github.com/yt-dlp/yt-dlp/issues/15993)) by [bashonly](https://github.com/bashonly) (With fixes in [7722109](https://github.com/yt-dlp/yt-dlp/commit/77221098fc5016f12118421982f02b662021972c))
- **visir**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/c7c45f52890eee40565188aee874ff4e58e95c4f) ([#15811](https://github.com/yt-dlp/yt-dlp/issues/15811)) by [doe1080](https://github.com/doe1080)
- **vk**: [Solve JS challenges using native JS interpreter](https://github.com/yt-dlp/yt-dlp/commit/acfc00a955208ee780b4cb18ae26de7b62444153) ([#15992](https://github.com/yt-dlp/yt-dlp/issues/15992)) by [0xvd](https://github.com/0xvd), [bashonly](https://github.com/bashonly)
- **xhamster**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/133cb959be4d268e2cd6b3f1d9bf87fba4c3743e) ([#15831](https://github.com/yt-dlp/yt-dlp/issues/15831)) by [0xvd](https://github.com/0xvd)
- **youtube**
    - [Add more known player JS variants](https://github.com/yt-dlp/yt-dlp/commit/2204cee6d8301e491d8455a2c54fd0e1b23468f5) ([#15975](https://github.com/yt-dlp/yt-dlp/issues/15975)) by [bashonly](https://github.com/bashonly)
    - [Extract live adaptive `incomplete` formats](https://github.com/yt-dlp/yt-dlp/commit/319a2bda83f5e54054661c56c1391533f82473c2) ([#15937](https://github.com/yt-dlp/yt-dlp/issues/15937)) by [bashonly](https://github.com/bashonly), [CanOfSocks](https://github.com/CanOfSocks)
    - [Update ejs to 0.5.0](https://github.com/yt-dlp/yt-dlp/commit/c105461647315f7f479091194944713b392ca729) ([#16031](https://github.com/yt-dlp/yt-dlp/issues/16031)) by [bashonly](https://github.com/bashonly)
    - date, search: [Remove broken `ytsearchdate` support](https://github.com/yt-dlp/yt-dlp/commit/c7945800e4ccd8cad2d5ee7806a872963c0c6d44) ([#15959](https://github.com/yt-dlp/yt-dlp/issues/15959)) by [stastix](https://github.com/stastix)

#### Networking changes
- **Request Handler**: curl_cffi: [Deprioritize unreliable impersonate targets](https://github.com/yt-dlp/yt-dlp/commit/e74076141dc86d5603680ea641d7cec86a821ac8) ([#16018](https://github.com/yt-dlp/yt-dlp/issues/16018)) by [bashonly](https://github.com/bashonly)

#### Misc. changes
- **cleanup**
    - [Bump ruff to 0.15.x](https://github.com/yt-dlp/yt-dlp/commit/abade83f8ddb63a11746b69038ebcd9c1405a00a) ([#15951](https://github.com/yt-dlp/yt-dlp/issues/15951)) by [Grub4K](https://github.com/Grub4K)
    - Miscellaneous: [646bb31](https://github.com/yt-dlp/yt-dlp/commit/646bb31f39614e6c2f7ba687c53e7496394cbadb) by [Grub4K](https://github.com/Grub4K)

### 2026.02.04

#### Extractor changes
- **unsupported**: [Update unsupported URLs](https://github.com/yt-dlp/yt-dlp/commit/c677d866d41eb4075b0a5e0c944a6543fc13f15d) ([#15812](https://github.com/yt-dlp/yt-dlp/issues/15812)) by [doe1080](https://github.com/doe1080)
- **youtube**: [Default to `tv` player JS variant](https://github.com/yt-dlp/yt-dlp/commit/1a895c18aaaf00f557aa8cbacb21faa638842431) ([#15818](https://github.com/yt-dlp/yt-dlp/issues/15818)) by [bashonly](https://github.com/bashonly)

### 2026.01.31

#### Extractor changes
Makefile
@@ -202,9 +202,9 @@ CONTRIBUTORS: Changelog.md

# The following EJS_-prefixed variables are auto-generated by devscripts/update_ejs.py
# DO NOT EDIT!
-EJS_VERSION = 0.4.0
-EJS_WHEEL_NAME = yt_dlp_ejs-0.4.0-py3-none-any.whl
-EJS_WHEEL_HASH = sha256:19278cff397b243074df46342bb7616c404296aeaff01986b62b4e21823b0b9c
+EJS_VERSION = 0.5.0
+EJS_WHEEL_NAME = yt_dlp_ejs-0.5.0-py3-none-any.whl
+EJS_WHEEL_HASH = sha256:674fc0efea741d3100cdf3f0f9e123150715ee41edf47ea7a62fbdeda204bdec
EJS_PY_FOLDERS = yt_dlp_ejs yt_dlp_ejs/yt yt_dlp_ejs/yt/solver
EJS_PY_FILES = yt_dlp_ejs/__init__.py yt_dlp_ejs/_version.py yt_dlp_ejs/yt/__init__.py yt_dlp_ejs/yt/solver/__init__.py
EJS_JS_FOLDERS = yt_dlp_ejs/yt/solver
@@ -406,7 +406,7 @@ Tip: Use `CTRL`+`F` (or `Command`+`F`) to search by keywords
(default)
--live-from-start Download livestreams from the start.
Currently experimental and only supported
-for YouTube and Twitch
+for YouTube, Twitch, and TVer
--no-live-from-start Download livestreams from the current time
(default)
--wait-for-video MIN[-MAX] Wait for scheduled streams to become
@@ -1864,13 +1864,13 @@ The following extractors use this feature:
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
* `webpage_skip`: Skip extraction of embedded webpage data. One or both of `player_response`, `initial_data`. These options are for testing purposes and don't skip any network requests
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
-* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `tv`, `tv_es6`, `phone`, `tablet`. The default is `main`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
+* `player_js_variant`: The player javascript variant to use for n/sig deciphering. The known variants are: `main`, `tcc`, `tce`, `es5`, `es6`, `es6_tcc`, `es6_tce`, `tv`, `tv_es6`, `phone`, `house`. The default is `tv`, and the others are for debugging purposes. You can use `actual` to go with what is prescribed by the site
* `player_js_version`: The player javascript version to use for n/sig deciphering, in the format of `signature_timestamp@hash` (e.g. `20348@0004de42`). The default is to use what is prescribed by the site, and can be selected with `actual`
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread,max-depth`. Default is `all,all,all,all,all`
    * A `max-depth` value of `1` will discard all replies, regardless of the `max-replies` or `max-replies-per-thread` values given
    * E.g. `all,all,1000,10,2` will get a maximum of 1000 replies total, with up to 10 replies per thread, and only 2 levels of depth (i.e. top-level comments plus their immediate replies). `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
-* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
+* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash, live adaptive https, and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
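For embedders, these extractor arguments can also be supplied programmatically via the `extractor_args` option, a dict keyed by lowercase extractor name whose values map argument names to lists of strings. A sketch of the structure, equivalent to the CLI flag shown in the comment:

```python
# Programmatic equivalent of:
#   yt-dlp --extractor-args "youtube:player_js_variant=tv;formats=incomplete" URL
# Each argument value is a list of strings, keyed by the lowercase extractor name.
ydl_opts = {
    'extractor_args': {
        'youtube': {
            'player_js_variant': ['tv'],
            'formats': ['incomplete'],
        },
    },
}

# Then pass the options dict to the embedding API as usual:
#   with yt_dlp.YoutubeDL(ydl_opts) as ydl:
#       ydl.download([url])
```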
@@ -2261,7 +2261,7 @@ with yt_dlp.YoutubeDL(ydl_opts) as ydl:
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.

* **YouTube improvements**:
-    * Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefixes (`ytsearch:`, `ytsearchdate:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
+    * Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefix (`ytsearch:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)
    * Fix for [n-sig based throttling](https://github.com/ytdl-org/youtube-dl/issues/29326) **\***
    * Download livestreams from the start using `--live-from-start` (*experimental*)
    * Channel URLs download all uploads of the channel, including shorts and live
@@ -337,5 +337,10 @@
        "when": "e2ea6bd6ab639f910b99e55add18856974ff4c3a",
        "short": "[ie] Fix prioritization of Youtube URL matching (#15596)",
        "authors": ["Grub4K"]
    },
+    {
+        "action": "add",
+        "when": "1fbbe29b99dc61375bf6d786f824d9fcf6ea9c1a",
+        "short": "[priority] Security: [[CVE-2026-26331](https://nvd.nist.gov/vuln/detail/CVE-2026-26331)] [Arbitrary command injection with the `--netrc-cmd` option](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-g3gw-q23r-pgqm)\n - The argument passed to the command in `--netrc-cmd` is now limited to a safe subset of characters"
+    }
]
@@ -55,7 +55,7 @@ default = [
    "requests>=2.32.2,<3",
    "urllib3>=2.0.2,<3",
    "websockets>=13.0",
-    "yt-dlp-ejs==0.4.0",
+    "yt-dlp-ejs==0.5.0",
]
curl-cffi = [
    "curl-cffi>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.15; implementation_name=='cpython'",

@@ -85,7 +85,7 @@ dev = [
]
static-analysis = [
    "autopep8~=2.0",
-    "ruff~=0.14.0",
+    "ruff~=0.15.0",
]
test = [
    "pytest~=8.1",
@@ -506,7 +506,8 @@ The only reliable way to check if a site is supported is to try it.
 - **GDCVault**: [*gdcvault*](## "netrc machine") (**Currently broken**)
 - **GediDigital**
 - **gem.cbc.ca**: [*cbcgem*](## "netrc machine")
-- **gem.cbc.ca:live**
+- **gem.cbc.ca:live**: [*cbcgem*](## "netrc machine")
+- **gem.cbc.ca:olympics**: [*cbcgem*](## "netrc machine")
 - **gem.cbc.ca:playlist**: [*cbcgem*](## "netrc machine")
 - **Genius**
 - **GeniusLyrics**

@@ -734,6 +735,8 @@ The only reliable way to check if a site is supported is to try it.
 - **Livestreamfails**
 - **Lnk**
 - **loc**: Library of Congress
+- **Locipo**
+- **LocipoPlaylist**
 - **Loco**
 - **loom**
 - **loom:folder**: (**Currently broken**)

@@ -763,6 +766,7 @@ The only reliable way to check if a site is supported is to try it.
 - **MarkizaPage**: (**Currently broken**)
 - **massengeschmack.tv**
 - **Masters**
+- **MatchiTV**
 - **MatchTV**
 - **mave**
 - **mave:channel**

@@ -1283,6 +1287,7 @@ The only reliable way to check if a site is supported is to try it.
 - **Sangiin**: 参議院インターネット審議中継 (archive)
 - **Sapo**: SAPO Vídeos
 - **SaucePlus**: Sauce+
+- **SaucePlusChannel**
 - **SBS**: sbs.com.au
 - **sbs.co.kr**
 - **sbs.co.kr:allvod_program**

@@ -1550,10 +1555,12 @@ The only reliable way to check if a site is supported is to try it.
 - **TVC**
 - **TVCArticle**
 - **TVer**
+- **tver:olympic**
 - **tvigle**: Интернет-телевидение Tvigle.ru
 - **TVIPlayer**
 - **TVN24**: (**Currently broken**)
 - **tvnoe**: Televize Noe
+- **TVO**
 - **tvopengr:embed**: tvopen.gr embedded videos
 - **tvopengr:watch**: tvopen.gr (and ethnos.gr) videos
 - **tvp**: Telewizja Polska

@@ -1664,6 +1671,7 @@ The only reliable way to check if a site is supported is to try it.
 - **ViMP:Playlist**
 - **Viously**
 - **Viqeo**: (**Currently broken**)
+- **Visir**: Vísir
 - **Viu**
 - **viu:ott**: [*viu*](## "netrc machine")
 - **viu:playlist**

@@ -1812,7 +1820,6 @@ The only reliable way to check if a site is supported is to try it.
 - **youtube:playlist**: [*youtube*](## "netrc machine") YouTube playlists
 - **youtube:recommended**: [*youtube*](## "netrc machine") YouTube recommended videos; ":ytrec" keyword
 - **youtube:search**: [*youtube*](## "netrc machine") YouTube search; "ytsearch:" prefix
-- **youtube:search:date**: [*youtube*](## "netrc machine") YouTube search, newest videos first; "ytsearchdate:" prefix
 - **youtube:search_url**: [*youtube*](## "netrc machine") YouTube search URLs with sorting and filter support
 - **youtube:shorts:pivot:audio**: [*youtube*](## "netrc machine") YouTube Shorts audio pivot (Shorts using audio of a given video)
 - **youtube:subscriptions**: [*youtube*](## "netrc machine") YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)
@@ -294,7 +294,7 @@ def expect_info_dict(self, got_dict, expected_dict):

    missing_keys = sorted(
        test_info_dict.keys() - expected_dict.keys(),
-        key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x))
+        key=ALLOWED_KEYS_SORT_ORDER.index)
    if missing_keys:
        def _repr(v):
            if isinstance(v, str):
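The helper change above is a pure simplification (the kind flagged by ruff's PLW0108): a lambda that only forwards its argument to another callable can be replaced by that callable itself. A self-contained illustration with hypothetical key names:

```python
# Sorting a set of keys by their position in a canonical order list.
# `key=ALLOWED_KEYS_SORT_ORDER.index` is equivalent to
# `key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x)`, minus one function call per item.
ALLOWED_KEYS_SORT_ORDER = ['id', 'title', 'duration', 'uploader']

missing_keys = sorted({'uploader', 'id', 'duration'}, key=ALLOWED_KEYS_SORT_ORDER.index)
print(missing_keys)  # ['id', 'duration', 'uploader']
```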
@@ -76,6 +76,8 @@ class TestInfoExtractor(unittest.TestCase):
        self.assertEqual(ie._get_netrc_login_info(netrc_machine='empty_pass'), ('user', ''))
        self.assertEqual(ie._get_netrc_login_info(netrc_machine='both_empty'), ('', ''))
        self.assertEqual(ie._get_netrc_login_info(netrc_machine='nonexistent'), (None, None))
+        with self.assertRaises(ExtractorError):
+            ie._get_netrc_login_info(netrc_machine=';echo rce')

    def test_html_search_regex(self):
        html = '<p id="foo">Watch this <a href="http://www.youtube.com/watch?v=BaW_jenozKc">video</a></p>'
@@ -205,8 +205,8 @@ class TestLenientSimpleCookie(unittest.TestCase):
        ),
        (
            'Test quoted cookie',
-            'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
-            {'keebler': 'E=mc2; L="Loves"; fudge=\012;'},
+            'keebler="E=mc2; L=\\"Loves\\"; fudge=;"',
+            {'keebler': 'E=mc2; L="Loves"; fudge=;'},
        ),
        (
            "Allow '=' in an unquoted value",

@@ -328,4 +328,30 @@ class TestLenientSimpleCookie(unittest.TestCase):
            'Key=Value; [Invalid]=Value; Another=Value',
            {'Key': 'Value', 'Another': 'Value'},
        ),
+        # Ref: https://github.com/python/cpython/issues/143919
+        (
+            'Test invalid cookie name w/ control character',
+            'foo\012=bar;',
+            {},
+        ),
+        (
+            'Test invalid cookie name w/ control character 2',
+            'foo\015baz=bar',
+            {},
+        ),
+        (
+            'Test invalid cookie name w/ control character followed by valid cookie',
+            'foo\015=bar; x=y;',
+            {'x': 'y'},
+        ),
+        (
+            'Test invalid cookie value w/ control character',
+            'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
+            {},
+        ),
+        (
+            'Test invalid quoted attribute value w/ control character',
+            'Customer="WILE_E_COYOTE"; Version="1\\012"; Path="/acme"',
+            {},
+        ),
    )
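The new cases above pin down the hardened behavior from the 2026.02.21 `cookies` change: any cookie whose name or value contains a control character (e.g. `\012`/LF or `\015`/CR) is dropped. A standalone sketch of such a check — illustrative only, not yt-dlp's actual cookie parser:

```python
def contains_control_chars(s: str) -> bool:
    """True if the string contains a C0 control character or DEL (0x7f)."""
    return any(ord(c) < 0x20 or ord(c) == 0x7f for c in s)


# A header like 'foo\n=bar;' should be rejected outright,
# while an ordinary cookie passes:
assert contains_control_chars('foo\n=bar;')
assert not contains_control_chars('Customer="WILE_E_COYOTE"; Path=/acme')
```

Control characters in cookie headers are a classic header-splitting vector, which is why the parser drops the offending cookie entirely rather than trying to sanitize it.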
@@ -33,9 +33,12 @@ class Variant(enum.Enum):
    tce = 'player_ias_tce.vflset/en_US/base.js'
    es5 = 'player_es5.vflset/en_US/base.js'
    es6 = 'player_es6.vflset/en_US/base.js'
+    es6_tcc = 'player_es6_tcc.vflset/en_US/base.js'
+    es6_tce = 'player_es6_tce.vflset/en_US/base.js'
    tv = 'tv-player-ias.vflset/tv-player-ias.js'
    tv_es6 = 'tv-player-es6.vflset/tv-player-es6.js'
    phone = 'player-plasma-ias-phone-en_US.vflset/base.js'
+    house = 'house_brand_player.vflset/en_US/base.js'


@dataclasses.dataclass
@@ -102,6 +105,66 @@ CHALLENGES: list[Challenge] = [
        'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
            'ttJC2JfQdSswRAIgGBCxZyAfKyi0cjXCb3DqEctUw-NYdNmOEvaepit0zJAtIEsgOV2SXZjhSHMNy0NXNGa1kOyBf6HPuAuCduh-_',
    }),
    # 4e51e895: main variant broke sig solving; n challenge is added only for regression testing
    Challenge('4e51e895', Variant.main, JsChallengeType.N, {
        '0eRGgQWJGfT5rFHFj': 't5kO23_msekBur',
    }),
    Challenge('4e51e895', Variant.main, JsChallengeType.SIG, {
        'AL6p_8AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7':
            'AwdY9yAhRzK8rYA_9n97Kizf7_9n97Kizf7_9n9pKizf7_9n97Kizf7_9n97Kizf7_9n97Kizf7',
    }),
    # 42c5570b: tce variant broke sig solving; n challenge is added only for regression testing
    Challenge('42c5570b', Variant.tce, JsChallengeType.N, {
        'ZdZIqFPQK-Ty8wId': 'CRoXjB-R-R',
    }),
    Challenge('42c5570b', Variant.tce, JsChallengeType.SIG, {
        'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
            'EN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavcOmNdYN-wUtgEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt',
    }),
    # 54bd1de4: tce variant broke sig solving; n challenge is added only for regression testing
    Challenge('54bd1de4', Variant.tce, JsChallengeType.N, {
        'ZdZIqFPQK-Ty8wId': 'ka-slAQ31sijFN',
    }),
    Challenge('54bd1de4', Variant.tce, JsChallengeType.SIG, {
        'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt':
            'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0titeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtp',
    }),
    # 94667337: tce and es6 variants broke sig solving; n and main/tv variants are added only for regression testing
    Challenge('94667337', Variant.main, JsChallengeType.N, {
        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
    }),
    Challenge('94667337', Variant.main, JsChallengeType.SIG, {
        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
    }),
    Challenge('94667337', Variant.tv, JsChallengeType.N, {
        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
    }),
    Challenge('94667337', Variant.tv, JsChallengeType.SIG, {
        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
    }),
    Challenge('94667337', Variant.es6, JsChallengeType.N, {
        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
    }),
    Challenge('94667337', Variant.es6, JsChallengeType.SIG, {
        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
    }),
    Challenge('94667337', Variant.tce, JsChallengeType.N, {
        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
    }),
    Challenge('94667337', Variant.tce, JsChallengeType.SIG, {
        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
    }),
    Challenge('94667337', Variant.es6_tce, JsChallengeType.N, {
        'BQoJvGBkC2nj1ZZLK-': 'ib1ShEOGoFXIIw',
    }),
    Challenge('94667337', Variant.es6_tce, JsChallengeType.SIG, {
        'NJAJEij0EwRgIhAI0KExTgjfPk-MPM9MAdzyyPRt=BM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=gwzz':
            'AJEij0EwRgIhAI0KExTgjfPk-MPM9MNdzyyPRtzBM8-XO5tm5hlMCSVpAiEAv7eP3CURqZNSPow8BXXAoazVoXgeMP7gH9BdylHCwgw=',
    }),
]

requests: list[JsChallengeRequest] = []
@@ -9,7 +9,12 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import math

-from yt_dlp.jsinterp import JS_Undefined, JSInterpreter, js_number_to_string
+from yt_dlp.jsinterp import (
+    JS_Undefined,
+    JSInterpreter,
+    int_to_int32,
+    js_number_to_string,
+)


class NaN:
@@ -101,8 +106,16 @@ class TestJSInterpreter(unittest.TestCase):
        self._test('function f(){return 5 ^ 9;}', 12)
        self._test('function f(){return 0.0 << NaN}', 0)
        self._test('function f(){return null << undefined}', 0)
-        # TODO: Does not work due to number too large
-        # self._test('function f(){return 21 << 4294967297}', 42)
+        self._test('function f(){return -12616 ^ 5041}', -8951)
+        self._test('function f(){return 21 << 4294967297}', 42)

    def test_string_concat(self):
        self._test('function f(){return "a" + "b";}', 'ab')
        self._test('function f(){let x = "a"; x += "b"; return x;}', 'ab')
        self._test('function f(){return "a" + 1;}', 'a1')
        self._test('function f(){let x = "a"; x += 1; return x;}', 'a1')
        self._test('function f(){return 2 + "b";}', '2b')
        self._test('function f(){let x = 2; x += "b"; return x;}', '2b')

    def test_array_access(self):
        self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])

@@ -325,6 +338,7 @@ class TestJSInterpreter(unittest.TestCase):
        self._test('function f() { let a = {m1: 42, m2: 0 }; return [a["m1"], a.m2]; }', [42, 0])
        self._test('function f() { let a; return a?.qq; }', JS_Undefined)
        self._test('function f() { let a = {m1: 42, m2: 0 }; return a?.qq; }', JS_Undefined)
+        self._test('function f() { let a = {"1": 123}; return a[1]; }', 123)

    def test_regex(self):
        self._test('function f() { let a=/,,[/,913,/](,)}/; }', None)
@@ -447,6 +461,22 @@ class TestJSInterpreter(unittest.TestCase):
|
||||
def test_splice(self):
|
||||
self._test('function f(){var T = ["0", "1", "2"]; T["splice"](2, 1, "0")[0]; return T }', ['0', '1', '0'])
|
||||
|
||||
def test_int_to_int32(self):
|
||||
for inp, exp in [
|
||||
(0, 0),
|
||||
(1, 1),
|
||||
(-1, -1),
|
||||
(-8951, -8951),
|
||||
(2147483647, 2147483647),
|
||||
(2147483648, -2147483648),
|
||||
(2147483649, -2147483647),
|
||||
(-2147483649, 2147483647),
|
||||
(-2147483648, -2147483648),
|
||||
(-16799986688, 379882496),
|
||||
(39570129568, 915423904),
|
||||
]:
|
||||
assert int_to_int32(inp) == exp
|
||||
|
||||
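The `test_int_to_int32` cases above follow JavaScript's `ToInt32` wrapping: truncate to the low 32 bits, then reinterpret as a signed value. A minimal standalone sketch of that conversion (a hypothetical helper mirroring the tested behaviour, not necessarily yt-dlp's exact implementation):

```python
def int_to_int32(n):
    # Truncate to the low 32 bits (like JS ToUint32)...
    n &= 0xFFFFFFFF
    # ...then reinterpret as a signed 32-bit integer
    return n - 0x100000000 if n >= 0x80000000 else n
```

This reproduces the wrap-around pairs in the test table, e.g. `2147483648` maps to `-2147483648`.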
    def test_js_number_to_string(self):
        for test, radix, expected in [
            (0, None, '0'),

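The `test_js_number_to_string` table above (truncated here) exercises radix conversion in the style of JavaScript's `Number#toString`. An integer-only sketch of that behaviour (the real helper also has to cover floats and special values, which this omits):

```python
def js_number_to_string(n, radix=None):
    # JS Number#toString uses lowercase digits for bases up to 36
    digits = '0123456789abcdefghijklmnopqrstuvwxyz'
    radix = radix or 10
    if n == 0:
        return '0'
    sign, n = ('-', -n) if n < 0 else ('', n)
    out = ''
    while n:
        n, r = divmod(n, radix)
        out = digits[r] + out
    return sign + out
```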
@@ -1004,6 +1004,7 @@ class TestUrllibRequestHandler(TestRequestHandlerBase):

@pytest.mark.parametrize('handler', ['Requests'], indirect=True)
class TestRequestsRequestHandler(TestRequestHandlerBase):
    # ruff: disable[PLW0108] `requests` and/or `urllib3` may not be available
    @pytest.mark.parametrize('raised,expected', [
        (lambda: requests.exceptions.ConnectTimeout(), TransportError),
        (lambda: requests.exceptions.ReadTimeout(), TransportError),
@@ -1017,8 +1018,10 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
        # catch-all: https://github.com/psf/requests/blob/main/src/requests/adapters.py#L535
        (lambda: urllib3.exceptions.HTTPError(), TransportError),
        (lambda: requests.exceptions.RequestException(), RequestError),
        # (lambda: requests.exceptions.TooManyRedirects(), HTTPError) - Needs a response object
        # Needs a response object
        # (lambda: requests.exceptions.TooManyRedirects(), HTTPError),
    ])
    # ruff: enable[PLW0108]
    def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
        with handler() as rh:
            def mock_get_instance(*args, **kwargs):
@@ -1034,6 +1037,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):

        assert exc_info.type is expected

    # ruff: disable[PLW0108] `urllib3` may not be available
    @pytest.mark.parametrize('raised,expected,match', [
        (lambda: urllib3.exceptions.SSLError(), SSLError, None),
        (lambda: urllib3.exceptions.TimeoutError(), TransportError, None),
@@ -1052,6 +1056,7 @@ class TestRequestsRequestHandler(TestRequestHandlerBase):
            '3 bytes read, 5 more expected',
        ),
    ])
    # ruff: enable[PLW0108]
    def test_response_error_mapping(self, handler, monkeypatch, raised, expected, match):
        from requests.models import Response as RequestsResponse
        from urllib3.response import HTTPResponse as Urllib3Response

@@ -239,6 +239,7 @@ class TestTraversal:
            'accept matching `expected_type` type'
        assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int) is None, \
            'reject non matching `expected_type` type'
        # ruff: noqa: PLW0108 `type`s get special treatment, so wrap in lambda
        assert traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)) == '0', \
            'transform type using type function'
        assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0) is None, \

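The assertions above pin down `expected_type`'s dual behaviour: a bare type acts as a filter, while any other callable transforms the value (with exceptions swallowed into `None`). A minimal standalone sketch of that contract (`apply_expected_type` is a hypothetical name for illustration, not yt-dlp's API):

```python
def apply_expected_type(value, expected_type):
    # A bare type filters: mismatching values become None
    if isinstance(expected_type, type):
        return value if isinstance(value, expected_type) else None
    # Any other callable transforms; errors are swallowed into None
    try:
        return expected_type(value)
    except Exception:
        return None
```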
@@ -924,6 +924,7 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(month_by_name(None), None)
        self.assertEqual(month_by_name('December', 'en'), 12)
        self.assertEqual(month_by_name('décembre', 'fr'), 12)
        self.assertEqual(month_by_name('desember', 'is'), 12)
        self.assertEqual(month_by_name('December'), 12)
        self.assertEqual(month_by_name('décembre'), None)
        self.assertEqual(month_by_name('Unknown', 'unknown'), None)

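The cases above imply a lookup keyed by language, falling back to English only when the language itself is unknown (so `'décembre'` without `lang='fr'` returns `None`). A sketch of that shape with just the `en`/`fr` tables (the real util covers more locales, e.g. `is`):

```python
MONTH_NAMES = {
    'en': ['January', 'February', 'March', 'April', 'May', 'June', 'July',
           'August', 'September', 'October', 'November', 'December'],
    'fr': ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
           'août', 'septembre', 'octobre', 'novembre', 'décembre'],
}

def month_by_name(name, lang='en'):
    # Unknown languages fall back to English; unknown names return None
    names = MONTH_NAMES.get(lang) or MONTH_NAMES['en']
    try:
        return names.index(name) + 1
    except (ValueError, TypeError):
        return None
```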
@@ -448,6 +448,7 @@ def create_fake_ws_connection(raised):

@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
class TestWebsocketsRequestHandler:
    # ruff: disable[PLW0108] `websockets` may not be available
    @pytest.mark.parametrize('raised,expected', [
        # https://websockets.readthedocs.io/en/stable/reference/exceptions.html
        (lambda: websockets.exceptions.InvalidURI(msg='test', uri='test://'), RequestError),
@@ -459,13 +460,14 @@ class TestWebsocketsRequestHandler:
        (lambda: websockets.exceptions.NegotiationError(), TransportError),
        # Catch-all
        (lambda: websockets.exceptions.WebSocketException(), TransportError),
        (lambda: TimeoutError(), TransportError),
        (TimeoutError, TransportError),
        # These may be raised by our create_connection implementation, which should also be caught
        (lambda: OSError(), TransportError),
        (lambda: ssl.SSLError(), SSLError),
        (lambda: ssl.SSLCertVerificationError(), CertificateVerifyError),
        (lambda: socks.ProxyError(), ProxyError),
        (OSError, TransportError),
        (ssl.SSLError, SSLError),
        (ssl.SSLCertVerificationError, CertificateVerifyError),
        (socks.ProxyError, ProxyError),
    ])
    # ruff: enable[PLW0108]
    def test_request_error_mapping(self, handler, monkeypatch, raised, expected):
        import websockets.sync.client

@@ -482,11 +484,12 @@ class TestWebsocketsRequestHandler:
    @pytest.mark.parametrize('raised,expected,match', [
        # https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.send
        (lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
        (lambda: RuntimeError(), TransportError, None),
        (lambda: TimeoutError(), TransportError, None),
        (lambda: TypeError(), RequestError, None),
        (lambda: socks.ProxyError(), ProxyError, None),
        (RuntimeError, TransportError, None),
        (TimeoutError, TransportError, None),
        (TypeError, RequestError, None),
        (socks.ProxyError, ProxyError, None),
        # Catch-all
        # ruff: noqa: PLW0108 `websockets` may not be available
        (lambda: websockets.exceptions.WebSocketException(), TransportError, None),
    ])
    def test_ws_send_error_mapping(self, handler, monkeypatch, raised, expected, match):
@@ -499,10 +502,11 @@ class TestWebsocketsRequestHandler:
    @pytest.mark.parametrize('raised,expected,match', [
        # https://websockets.readthedocs.io/en/stable/reference/sync/client.html#websockets.sync.client.ClientConnection.recv
        (lambda: websockets.exceptions.ConnectionClosed(None, None), TransportError, None),
        (lambda: RuntimeError(), TransportError, None),
        (lambda: TimeoutError(), TransportError, None),
        (lambda: socks.ProxyError(), ProxyError, None),
        (RuntimeError, TransportError, None),
        (TimeoutError, TransportError, None),
        (socks.ProxyError, ProxyError, None),
        # Catch-all
        # ruff: noqa: PLW0108 `websockets` may not be available
        (lambda: websockets.exceptions.WebSocketException(), TransportError, None),
    ])
    def test_ws_recv_error_mapping(self, handler, monkeypatch, raised, expected, match):

@@ -1168,6 +1168,7 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
    # We use Morsel's legal key chars to avoid errors on setting values
    _LEGAL_KEY_CHARS = r'\w\d' + re.escape('!#$%&\'*+-.:^_`|~')
    _LEGAL_VALUE_CHARS = _LEGAL_KEY_CHARS + re.escape('(),/<=>?@[]{}')
    _LEGAL_KEY_RE = re.compile(rf'[{_LEGAL_KEY_CHARS}]+', re.ASCII)

    _RESERVED = {
        'expires',
@@ -1185,17 +1186,17 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):

    # Added 'bad' group to catch the remaining value
    _COOKIE_PATTERN = re.compile(r'''
        \s*                 # Optional whitespace at start of cookie
        [ ]*                # Optional whitespace at start of cookie
        (?P<key>            # Start of group 'key'
        [''' + _LEGAL_KEY_CHARS + r''']+?  # Any word of at least one letter
        [^ =;]+             # Match almost anything here for now and validate later
        )                   # End of group 'key'
        (                   # Optional group: there may not be a value.
        \s*=\s*             # Equal Sign
        [ ]*=[ ]*           # Equal Sign
        (                   # Start of potential value
        (?P<val>            # Start of group 'val'
        "(?:[^\\"]|\\.)*"   # Any doublequoted string
        |                   # or
        \w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT  # Special case for "expires" attr
        \w{3},\ [\w\d -]{9,11}\ [\d:]{8}\ GMT   # Special case for "expires" attr
        |                   # or
        [''' + _LEGAL_VALUE_CHARS + r''']*  # Any word or empty string
        )                   # End of group 'val'
@@ -1203,10 +1204,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
        (?P<bad>(?:\\;|[^;])*?)  # 'bad' group fallback for invalid values
        )                   # End of potential value
        )?                  # End of optional value group
        \s*                 # Any number of spaces.
        (\s+|;|$)           # Ending either at space, semicolon, or EOS.
        [ ]*                # Any number of spaces.
        ([ ]+|;|$)          # Ending either at space, semicolon, or EOS.
        ''', re.ASCII | re.VERBOSE)

    # http.cookies.Morsel raises on values w/ control characters in Python 3.14.3+ & 3.13.12+
    # Ref: https://github.com/python/cpython/issues/143919
    _CONTROL_CHARACTER_RE = re.compile(r'[\x00-\x1F\x7F]')

    def load(self, data):
        # Workaround for https://github.com/yt-dlp/yt-dlp/issues/4776
        if not isinstance(data, str):
@@ -1219,6 +1224,9 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
                continue

            key, value = match.group('key', 'val')
            if not self._LEGAL_KEY_RE.fullmatch(key):
                morsel = None
                continue

            is_attribute = False
            if key.startswith('$'):
@@ -1237,6 +1245,14 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
                    value = True
                else:
                    value, _ = self.value_decode(value)
                    # Guard against control characters in quoted attribute values
                    if self._CONTROL_CHARACTER_RE.search(value):
                        # While discarding the entire morsel is not very lenient,
                        # it's better than http.cookies.Morsel raising a CookieError
                        # and it's probably better to err on the side of caution
                        self.pop(morsel.key, None)
                        morsel = None
                        continue

                morsel[key] = value

@@ -1246,6 +1262,10 @@ class LenientSimpleCookie(http.cookies.SimpleCookie):
            elif value is not None:
                morsel = self.get(key, http.cookies.Morsel())
                real_value, coded_value = self.value_decode(value)
                # Guard against control characters in quoted cookie values
                if self._CONTROL_CHARACTER_RE.search(real_value):
                    morsel = None
                    continue
                morsel.set(key, real_value, coded_value)
                self[key] = morsel

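The `_CONTROL_CHARACTER_RE` guard above mirrors the CPython hardening it works around: cookie values containing C0 control characters or DEL are dropped instead of being passed to `http.cookies.Morsel`, which now raises on them. A standalone sketch of the check itself (`is_safe_cookie_value` is a hypothetical helper name for illustration):

```python
import re

# C0 control characters (0x00-0x1F) plus DEL (0x7F)
CONTROL_CHARACTER_RE = re.compile(r'[\x00-\x1F\x7F]')

def is_safe_cookie_value(value):
    # Reject any cookie value containing control characters
    return not CONTROL_CHARACTER_RE.search(value)
```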
@@ -311,8 +311,10 @@ from .canalsurmas import CanalsurmasIE
from .caracoltv import CaracolTvPlayIE
from .cbc import (
    CBCIE,
    CBCGemContentIE,
    CBCGemIE,
    CBCGemLiveIE,
    CBCGemOlympicsIE,
    CBCGemPlaylistIE,
    CBCListenIE,
    CBCPlayerIE,
@@ -1029,6 +1031,10 @@ from .livestream import (
)
from .livestreamfails import LivestreamfailsIE
from .lnk import LnkIE
from .locipo import (
    LocipoIE,
    LocipoPlaylistIE,
)
from .loco import LocoIE
from .loom import (
    LoomFolderIE,
@@ -1071,6 +1077,7 @@ from .markiza import (
)
from .massengeschmacktv import MassengeschmackTVIE
from .masters import MastersIE
from .matchitv import MatchiTVIE
from .matchtv import MatchTVIE
from .mave import (
    MaveChannelIE,
@@ -1785,7 +1792,10 @@ from .safari import (
from .saitosan import SaitosanIE
from .samplefocus import SampleFocusIE
from .sapo import SapoIE
from .sauceplus import SaucePlusIE
from .sauceplus import (
    SaucePlusChannelIE,
    SaucePlusIE,
)
from .sbs import SBSIE
from .sbscokr import (
    SBSCoKrAllvodProgramIE,
@@ -2174,11 +2184,15 @@ from .tvc import (
    TVCIE,
    TVCArticleIE,
)
from .tver import TVerIE
from .tver import (
    TVerIE,
    TVerOlympicIE,
)
from .tvigle import TvigleIE
from .tviplayer import TVIPlayerIE
from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE
from .tvo import TvoIE
from .tvopengr import (
    TVOpenGrEmbedIE,
    TVOpenGrWatchIE,
@@ -2343,6 +2357,7 @@ from .vimm import (
)
from .viously import ViouslyIE
from .viqeo import ViqeoIE
from .visir import VisirIE
from .viu import (
    ViuIE,
    ViuOTTIE,
@@ -2541,7 +2556,6 @@ from .youtube import (
    YoutubeNotificationsIE,
    YoutubePlaylistIE,
    YoutubeRecommendedIE,
    YoutubeSearchDateIE,
    YoutubeSearchIE,
    YoutubeSearchURLIE,
    YoutubeShortsAudioPivotIE,

@@ -5,10 +5,12 @@ from ..utils import (
    ExtractorError,
    GeoRestrictedError,
    int_or_none,
    make_archive_id,
    remove_start,
    traverse_obj,
    update_url_query,
    url_or_none,
)
from ..utils.traversal import traverse_obj


class AENetworksBaseIE(ThePlatformIE):  # XXX: Do not subclass from concrete IE
@@ -29,6 +31,19 @@ class AENetworksBaseIE(ThePlatformIE):  # XXX: Do not subclass from concrete IE
        'historyvault.com': (None, 'historyvault', None),
        'biography.com': (None, 'biography', None),
    }
    _GRAPHQL_QUERY = '''
    query getUserVideo($videoId: ID!) {
        video(id: $videoId) {
            title
            publicUrl
            programId
            tvSeasonNumber
            tvSeasonEpisodeNumber
            series {
                title
            }
        }
    }'''

    def _extract_aen_smil(self, smil_url, video_id, auth=None):
        query = {
@@ -73,19 +88,39 @@ class AENetworksBaseIE(ThePlatformIE):  # XXX: Do not subclass from concrete IE

    def _extract_aetn_info(self, domain, filter_key, filter_value, url):
        requestor_id, brand, software_statement = self._DOMAIN_MAP[domain]
        if filter_key == 'canonical':
            webpage = self._download_webpage(url, filter_value)
            graphql_video_id = self._search_regex(
                r'<meta\b[^>]+\bcontent="[^"]*\btpid/(\d+)"', webpage,
                'id') or self._html_search_meta('videoId', webpage, 'GraphQL video ID', fatal=True)
        else:
            graphql_video_id = filter_value

        result = self._download_json(
            f'https://feeds.video.aetnd.com/api/v2/{brand}/videos',
            filter_value, query={f'filter[{filter_key}]': filter_value})
        result = traverse_obj(
            result, ('results',
                     lambda k, v: k == 0 and v[filter_key] == filter_value),
            get_all=False)
        if not result:
            'https://yoga.appsvcs.aetnd.com/', graphql_video_id,
            query={
                'brand': brand,
                'mode': 'live',
                'platform': 'web',
            },
            data=json.dumps({
                'operationName': 'getUserVideo',
                'variables': {
                    'videoId': graphql_video_id,
                },
                'query': self._GRAPHQL_QUERY,
            }).encode(),
            headers={
                'Content-Type': 'application/json',
            })

        result = traverse_obj(result, ('data', 'video', {dict}))
        media_url = traverse_obj(result, ('publicUrl', {url_or_none}))
        if not media_url:
            raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
                                 video_id=remove_start(filter_value, '/'))
        title = result['title']
        video_id = result['id']
        media_url = result['publicUrl']
        video_id = result['programId']
        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
            r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
        info = self._parse_theplatform_metadata(theplatform_metadata)
@@ -100,9 +135,13 @@ class AENetworksBaseIE(ThePlatformIE):  # XXX: Do not subclass from concrete IE
        info.update(self._extract_aen_smil(media_url, video_id, auth))
        info.update({
            'title': title,
            'series': result.get('seriesName'),
            'season_number': int_or_none(result.get('tvSeasonNumber')),
            'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
            'display_id': graphql_video_id,
            '_old_archive_ids': [make_archive_id(self, graphql_video_id)],
            **traverse_obj(result, {
                'series': ('series', 'title', {str}),
                'season_number': ('tvSeasonNumber', {int_or_none}),
                'episode_number': ('tvSeasonEpisodeNumber', {int_or_none}),
            }),
        })
        return info

@@ -116,7 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
        (?:shows/[^/?#]+/)?videos/[^/?#]+
    )'''
    _TESTS = [{
        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
        'url': 'https://www.history.com/shows/mountain-men/season-1/episode-1',
        'info_dict': {
            'id': '22253814',
            'ext': 'mp4',
@@ -139,11 +178,11 @@ class AENetworksIE(AENetworksBaseIE):
        },
        'params': {'skip_download': 'm3u8'},
        'add_ie': ['ThePlatform'],
        'skip': 'Geo-restricted - This content is not available in your location.',
        'skip': 'This content requires a valid, unexpired auth token',
    }, {
        'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
        'url': 'https://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
        'info_dict': {
            'id': '600587331957',
            'id': '147486',
            'ext': 'mp4',
            'title': 'Inlawful Entry',
            'description': 'md5:57c12115a2b384d883fe64ca50529e08',
@@ -160,6 +199,8 @@ class AENetworksIE(AENetworksBaseIE):
            'season_number': 9,
            'series': 'Duck Dynasty',
            'age_limit': 0,
            'display_id': '600587331957',
            '_old_archive_ids': ['aenetworks 600587331957'],
        },
        'params': {'skip_download': 'm3u8'},
        'add_ie': ['ThePlatform'],
@@ -186,6 +227,7 @@ class AENetworksIE(AENetworksBaseIE):
        },
        'params': {'skip_download': 'm3u8'},
        'add_ie': ['ThePlatform'],
        'skip': '404 Not Found',
    }, {
        'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story',
        'info_dict': {
@@ -209,6 +251,7 @@ class AENetworksIE(AENetworksBaseIE):
        },
        'params': {'skip_download': 'm3u8'},
        'add_ie': ['ThePlatform'],
        'skip': 'This content requires a valid, unexpired auth token',
    }, {
        'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
        'only_matching': True,
@@ -259,7 +302,7 @@ class AENetworksListBaseIE(AENetworksBaseIE):
        domain, slug = self._match_valid_url(url).groups()
        _, brand, _ = self._DOMAIN_MAP[domain]
        playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
        base_url = f'http://watch.{domain}'
        base_url = f'https://watch.{domain}'

        entries = []
        for item in (playlist.get(self._ITEMS_KEY) or []):

@@ -11,18 +11,18 @@ from ..utils.traversal import traverse_obj
class ApplePodcastsIE(InfoExtractor):
    _VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
    _TESTS = [{
        'url': 'https://podcasts.apple.com/us/podcast/ferreck-dawn-to-the-break-of-dawn-117/id1625658232?i=1000665010654',
        'md5': '82cc219b8cc1dcf8bfc5a5e99b23b172',
        'url': 'https://podcasts.apple.com/us/podcast/urbana-podcast-724-by-david-penn/id1531349107?i=1000748574256',
        'md5': 'f8a6f92735d0cfbd5e6a7294151e28d8',
        'info_dict': {
            'id': '1000665010654',
            'ext': 'mp3',
            'title': 'Ferreck Dawn - To The Break of Dawn 117',
            'episode': 'Ferreck Dawn - To The Break of Dawn 117',
            'description': 'md5:8c4f5c2c30af17ed6a98b0b9daf15b76',
            'upload_date': '20240812',
            'timestamp': 1723449600,
            'duration': 3596,
            'series': 'Ferreck Dawn - To The Break of Dawn',
            'id': '1000748574256',
            'ext': 'm4a',
            'title': 'URBANA PODCAST 724 BY DAVID PENN',
            'episode': 'URBANA PODCAST 724 BY DAVID PENN',
            'description': 'md5:fec77bacba32db8c9b3dda5486ed085f',
            'upload_date': '20260206',
            'timestamp': 1770400801,
            'duration': 3602,
            'series': 'Urbana Radio Show',
            'thumbnail': 're:.+[.](png|jpe?g|webp)',
        },
    }, {
@@ -57,22 +57,22 @@ class ApplePodcastsIE(InfoExtractor):
        webpage = self._download_webpage(url, episode_id)
        server_data = self._search_json(
            r'<script [^>]*\bid=["\']serialized-server-data["\'][^>]*>', webpage,
            'server data', episode_id, contains_pattern=r'\[{(?s:.+)}\]')[0]['data']
            'server data', episode_id)['data'][0]['data']
        model_data = traverse_obj(server_data, (
            'headerButtonItems', lambda _, v: v['$kind'] == 'share' and v['modelType'] == 'EpisodeLockup',
            'model', {dict}, any))

        return {
            'id': episode_id,
            **self._json_ld(
                traverse_obj(server_data, ('seoData', 'schemaContent', {dict}))
                or self._yield_json_ld(webpage, episode_id, fatal=False), episode_id, fatal=False),
            **traverse_obj(model_data, {
                'title': ('title', {str}),
                'description': ('summary', {clean_html}),
                'url': ('playAction', 'episodeOffer', 'streamUrl', {clean_podcast_url}),
                'timestamp': ('releaseDate', {parse_iso8601}),
                'duration': ('duration', {int_or_none}),
                'episode': ('title', {str}),
                'episode_number': ('episodeNumber', {int_or_none}),
                'series': ('showTitle', {str}),
            }),
            'thumbnail': self._og_search_thumbnail(webpage),
            'vcodec': 'none',

@@ -124,7 +124,7 @@ class BilibiliBaseIE(InfoExtractor):
            **traverse_obj(play_info, {
                'quality': ('quality', {int_or_none}),
                'format_id': ('quality', {str_or_none}),
                'format_note': ('quality', {lambda x: format_names.get(x)}),
                'format_note': ('quality', {format_names.get}),
                'duration': ('timelength', {float_or_none(scale=1000)}),
            }),
            **parse_resolution(format_names.get(play_info.get('quality'))),

@@ -10,6 +10,7 @@ from ..utils import (
    ExtractorError,
    float_or_none,
    int_or_none,
    join_nonempty,
    js_to_json,
    jwt_decode_hs256,
    mimetype2ext,
@@ -25,6 +26,7 @@ from ..utils import (
    url_basename,
    url_or_none,
    urlencode_postdata,
    urljoin,
)
from ..utils.traversal import require, traverse_obj, trim_str

@@ -540,6 +542,32 @@ class CBCGemBaseIE(InfoExtractor):
            f'https://services.radio-canada.ca/ott/catalog/v2/gem/show/{item_id}',
            display_id or item_id, query={'device': 'web'})

    def _call_media_api(self, media_id, app_code='gem', display_id=None, headers=None):
        media_data = self._download_json(
            'https://services.radio-canada.ca/media/validation/v2/',
            display_id or media_id, headers=headers, query={
                'appCode': app_code,
                'connectionType': 'hd',
                'deviceType': 'ipad',
                'multibitrate': 'true',
                'output': 'json',
                'tech': 'hls',
                'manifestVersion': '2',
                'manifestType': 'desktop',
                'idMedia': media_id,
            })

        error_code = traverse_obj(media_data, ('errorCode', {int}))
        if error_code == 1:
            self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
        if error_code == 35:
            self.raise_login_required(method='password')
        if error_code != 0:
            error_message = join_nonempty(error_code, media_data.get('message'), delim=' - ')
            raise ExtractorError(f'{self.IE_NAME} said: {error_message}')

        return media_data

    def _extract_item_info(self, item_info):
        episode_number = None
        title = traverse_obj(item_info, ('title', {str}))
@@ -567,7 +595,7 @@ class CBCGemBaseIE(InfoExtractor):

class CBCGemIE(CBCGemBaseIE):
    IE_NAME = 'gem.cbc.ca'
    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]+)'
    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]{2,4})/?(?:[?#]|$)'
    _TESTS = [{
        # This is a normal, public, TV show video
        'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@@ -709,29 +737,10 @@ class CBCGemIE(CBCGemBaseIE):
        if claims_token := self._fetch_claims_token():
            headers['x-claims-token'] = claims_token

        m3u8_info = self._download_json(
            'https://services.radio-canada.ca/media/validation/v2/',
            video_id, headers=headers, query={
                'appCode': 'gem',
                'connectionType': 'hd',
                'deviceType': 'ipad',
                'multibitrate': 'true',
                'output': 'json',
                'tech': 'hls',
                'manifestVersion': '2',
                'manifestType': 'desktop',
                'idMedia': item_info['idMedia'],
            })

        if m3u8_info.get('errorCode') == 1:
            self.raise_geo_restricted(countries=['CA'])
        elif m3u8_info.get('errorCode') == 35:
            self.raise_login_required(method='password')
        elif m3u8_info.get('errorCode') != 0:
            raise ExtractorError(f'{self.IE_NAME} said: {m3u8_info.get("errorCode")} - {m3u8_info.get("message")}')

        m3u8_url = self._call_media_api(
            item_info['idMedia'], display_id=video_id, headers=headers)['url']
        formats = self._extract_m3u8_formats(
            m3u8_info['url'], video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
            m3u8_url, video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
        self._remove_duplicate_formats(formats)

        for fmt in formats:
@@ -801,7 +810,128 @@ class CBCGemPlaylistIE(CBCGemBaseIE):
        }), series=traverse_obj(show_info, ('title', {str})))


class CBCGemLiveIE(InfoExtractor):
class CBCGemContentIE(CBCGemBaseIE):
    IE_NAME = 'gem.cbc.ca:content'
    IE_DESC = False  # Do not list
    _VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>[0-9a-z-]+)/?(?:[?#]|$)'
    _TESTS = [{
        # Series URL; content_type == 'Season'
        'url': 'https://gem.cbc.ca/the-tunnel',
        'playlist_count': 3,
        'info_dict': {
            'id': 'the-tunnel',
        },
    }, {
        # Miniseries URL; content_type == 'Parts'
        'url': 'https://gem.cbc.ca/summit-72',
        'playlist_count': 1,
        'info_dict': {
            'id': 'summit-72',
        },
    }, {
        # Olympics URL; content_type == 'Standalone'
        'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086',
        'info_dict': {
            'id': 'ski-jumping-nh-individual-womens-final-30086',
            'ext': 'mp4',
            'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
            'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
            'duration': 12793,
            'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
            'release_timestamp': 1770482100,
            'release_date': '20260207',
            'live_status': 'was_live',
        },
    }, {
        # Movie URL; content_type == 'Standalone'; requires authentication
        'url': 'https://gem.cbc.ca/copa-71',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        data = self._search_nextjs_data(webpage, display_id)['props']['pageProps']['data']
        content_type = data['contentType']
        self.write_debug(f'Routing for content type "{content_type}"')

        if content_type == 'Standalone':
            new_url = traverse_obj(data, (
                'header', 'cta', 'media', 'url', {urljoin('https://gem.cbc.ca/')}))
            if CBCGemOlympicsIE.suitable(new_url):
                return self.url_result(new_url, CBCGemOlympicsIE)

            # Manually construct non-Olympics standalone URLs to avoid returning trailer URLs
            return self.url_result(f'https://gem.cbc.ca/{display_id}/s01e01', CBCGemIE)

        # Handle series URLs (content_type == 'Season') and miniseries URLs (content_type == 'Parts')
        def entries():
            for playlist_url in traverse_obj(data, (
                'content', ..., 'lineups', ..., 'url', {urljoin('https://gem.cbc.ca/')},
                {lambda x: x if CBCGemPlaylistIE.suitable(x) else None},
            )):
                yield self.url_result(playlist_url, CBCGemPlaylistIE)

        return self.playlist_result(entries(), display_id)


class CBCGemOlympicsIE(CBCGemBaseIE):
    IE_NAME = 'gem.cbc.ca:olympics'
    _VALID_URL = r'https?://gem\.cbc\.ca/(?P<id>(?:[0-9a-z]+-)+[0-9]{5,})/s01e(?P<media_id>[0-9]{5,})'
    _TESTS = [{
        'url': 'https://gem.cbc.ca/ski-jumping-nh-individual-womens-final-30086/s01e30086',
        'info_dict': {
            'id': 'ski-jumping-nh-individual-womens-final-30086',
            'ext': 'mp4',
            'title': 'Ski Jumping: NH Individual (Women\'s) - Final',
            'description': 'md5:411c07c8a9a4a36344530b0c726bf8ab',
            'duration': 12793,
            'thumbnail': r're:https://[^.]+\.cbc\.ca/.+\.jpg',
            'release_timestamp': 1770482100,
            'release_date': '20260207',
            'live_status': 'was_live',
        },
    }]

    def _real_extract(self, url):
        video_id, media_id = self._match_valid_url(url).group('id', 'media_id')

        video_info = self._call_show_api(video_id)
        item_info = traverse_obj(video_info, (
            'content', ..., 'lineups', ..., 'items',
            lambda _, v: v['formattedIdMedia'] == media_id, any, {require('item info')}))

        live_status = {
            'LiveEvent': 'is_live',
            'Replay': 'was_live',
        }.get(item_info.get('type'))

        release_timestamp = traverse_obj(item_info, (
            'metadata', (('live', 'startDate'), ('replay', 'airDate')), {parse_iso8601}, any))

        if live_status == 'is_live' and release_timestamp and release_timestamp > time.time():
            formats = []
            live_status = 'is_upcoming'
            self.raise_no_formats('This livestream has not yet started', expected=True)
        else:
            m3u8_url = self._call_media_api(media_id, 'medianetlive', video_id)['url']
            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')

        return {
            'id': video_id,
            'formats': formats,
            'live_status': live_status,
            'release_timestamp': release_timestamp,
            **traverse_obj(item_info, {
                'title': ('title', {str}),
                'description': ('description', {str}),
                'thumbnail': ('images', 'card', 'url', {url_or_none}),
                'duration': ('metadata', 'replay', 'duration', {int_or_none}),
            }),
        }


class CBCGemLiveIE(CBCGemBaseIE):
    IE_NAME = 'gem.cbc.ca:live'
    _VALID_URL = r'https?://gem\.cbc\.ca/live(?:-event)?/(?P<id>\d+)'
    _TESTS = [
@@ -871,7 +1001,6 @@ class CBCGemLiveIE(InfoExtractor):
        'only_matching': True,
    },
    ]
    _GEO_COUNTRIES = ['CA']

    def _real_extract(self, url):
        video_id = self._match_id(url)
@@ -900,19 +1029,8 @@ class CBCGemLiveIE(InfoExtractor):
            live_status = 'is_upcoming'
            self.raise_no_formats('This livestream has not yet started', expected=True)
        else:
            stream_data = self._download_json(
                'https://services.radio-canada.ca/media/validation/v2/', video_id, query={
                    'appCode': 'medianetlive',
                    'connectionType': 'hd',
                    'deviceType': 'ipad',
                    'idMedia': video_stream_id,
                    'multibitrate': 'true',
                    'output': 'json',
                    'tech': 'hls',
                    'manifestType': 'desktop',
                })
            formats = self._extract_m3u8_formats(
                stream_data['url'], video_id, 'mp4', live=live_status == 'is_live')
            m3u8_url = self._call_media_api(video_stream_id, 'medianetlive', video_id)['url']
            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', live=live_status == 'is_live')

        return {
            'id': video_id,

@@ -661,9 +661,11 @@ class InfoExtractor:
        if not self._ready:
            self._initialize_pre_login()
            if self.supports_login():
                username, password = self._get_login_info()
                if username:
                    self._perform_login(username, password)
                # try login only if it would actually do anything
                if type(self)._perform_login is not InfoExtractor._perform_login:
                    username, password = self._get_login_info()
                    if username:
                        self._perform_login(username, password)
                elif self.get_param('username') and False not in (self.IE_DESC, self._NETRC_MACHINE):
                    self.report_warning(f'Login with password is not supported for this website. {self._login_hint("cookies")}')
            self._real_initialize()
@@ -1385,6 +1387,11 @@ class InfoExtractor:

    def _get_netrc_login_info(self, netrc_machine=None):
        netrc_machine = netrc_machine or self._NETRC_MACHINE
        if not netrc_machine:
            raise ExtractorError(f'Missing netrc_machine and {type(self).__name__}._NETRC_MACHINE')
        ALLOWED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_'
        if netrc_machine.startswith(('-', '_')) or not all(c in ALLOWED for c in netrc_machine):
            raise ExtractorError(f'Invalid netrc machine: {netrc_machine!r}', expected=True)

        cmd = self.get_param('netrc_cmd')
        if cmd:

@@ -384,8 +384,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
        last_error = None

        for note, kwargs in (
            ('Downloading m3u8 information', {}),
            ('Retrying m3u8 download with randomized headers', {
            ('Downloading m3u8 information with randomized headers', {
                'headers': self._generate_blockbuster_headers(),
            }),
            ('Retrying m3u8 download with Chrome impersonation', {

@@ -1041,8 +1041,6 @@ class FacebookAdsIE(InfoExtractor):
            'uploader': 'Casper',
            'uploader_id': '224110981099062',
            'uploader_url': 'https://www.facebook.com/Casper/',
            'timestamp': 1766299837,
            'upload_date': '20251221',
            'like_count': int,
        },
        'playlist_count': 2,
@@ -1054,12 +1052,23 @@ class FacebookAdsIE(InfoExtractor):
            'uploader': 'Case \u00e0 Chocs',
            'uploader_id': '112960472096793',
            'uploader_url': 'https://www.facebook.com/Caseachocs/',
            'timestamp': 1768498293,
            'upload_date': '20260115',
            'like_count': int,
            'description': 'md5:f02a255fcf7dce6ed40e9494cf4bc49a',
        },
        'playlist_count': 3,
    }, {
        'url': 'https://www.facebook.com/ads/library/?id=1704834754236452',
        'info_dict': {
            'id': '1704834754236452',
            'ext': 'mp4',
            'title': 'Get answers now!',
            'description': 'Ask the best psychics and get accurate answers on questions that bother you!',
            'uploader': 'Your Relationship Advisor',
            'uploader_id': '108939234726306',
            'uploader_url': 'https://www.facebook.com/100068970634636/',
            'like_count': int,
            'thumbnail': r're:https://.+/.+\.jpg',
        },
    }, {
        'url': 'https://es-la.facebook.com/ads/library/?id=901230958115569',
        'only_matching': True,
@@ -1123,8 +1132,11 @@ class FacebookAdsIE(InfoExtractor):
        post_data = traverse_obj(
            re.findall(r'data-sjs>({.*?ScheduledServerJS.*?})</script>', webpage), (..., {json.loads}))
        data = get_first(post_data, (
            'require', ..., ..., ..., '__bbox', 'require', ..., ..., ...,
            'entryPointRoot', 'otherProps', 'deeplinkAdCard', 'snapshot', {dict}))
            'require', ..., ..., ..., '__bbox', 'require', ..., ..., ..., (
                ('__bbox', 'result', 'data', 'ad_library_main', 'deeplink_ad_archive_result', 'deeplink_ad_archive'),
                # old path
                ('entryPointRoot', 'otherProps', 'deeplinkAdCard'),
            ), 'snapshot', {dict}))
        if not data:
            raise ExtractorError('Unable to extract ad data')

@@ -1140,11 +1152,12 @@ class FacebookAdsIE(InfoExtractor):
            'title': title,
            'description': markup or None,
        }, traverse_obj(data, {
            'description': ('link_description', {lambda x: x if not x.startswith('{{product.') else None}),
            'description': (
                (('body', 'text'), 'link_description'),
                {lambda x: x if not x.startswith('{{product.') else None}, any),
            'uploader': ('page_name', {str}),
            'uploader_id': ('page_id', {str_or_none}),
            'uploader_url': ('page_profile_uri', {url_or_none}),
            'timestamp': ('creation_time', {int_or_none}),
            'like_count': ('page_like_count', {int_or_none}),
        }))

@@ -1155,7 +1168,8 @@ class FacebookAdsIE(InfoExtractor):
            entries.append({
                'id': f'{video_id}_{idx}',
                'title': entry.get('title') or title,
                'description': traverse_obj(entry, 'body', 'link_description') or info_dict.get('description'),
                'description': traverse_obj(
                    entry, 'body', 'link_description', expected_type=str) or info_dict.get('description'),
                'thumbnail': url_or_none(entry.get('video_preview_image_url')),
                'formats': self._extract_formats(entry),
            })

@@ -3,10 +3,12 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
    determine_ext,
    float_or_none,
    int_or_none,
    join_nonempty,
    mimetype2ext,
    parse_qs,
    unescapeHTML,
    unified_strdate,
    url_or_none,
)
@@ -107,6 +109,11 @@ class FirstTVIE(InfoExtractor):
            'timestamp': ('dvr_begin_at', {int_or_none}),
            'upload_date': ('date_air', {unified_strdate}),
            'duration': ('duration', {int_or_none}),
            'chapters': ('episodes', lambda _, v: float_or_none(v['from']) is not None, {
                'start_time': ('from', {float_or_none}),
                'title': ('name', {str}, {unescapeHTML}),
                'end_time': ('to', {float_or_none}),
            }),
        }),
        'id': video_id,
        'formats': formats,

|
||||
self.raise_login_required()
|
||||
|
||||
|
||||
class FloatplaneChannelIE(InfoExtractor):
|
||||
class FloatplaneChannelBaseIE(InfoExtractor):
|
||||
"""Subclasses must set _RESULT_IE, _BASE_URL and _PAGE_SIZE"""
|
||||
|
||||
def _fetch_page(self, display_id, creator_id, channel_id, page):
|
||||
query = {
|
||||
'id': creator_id,
|
||||
'limit': self._PAGE_SIZE,
|
||||
'fetchAfter': page * self._PAGE_SIZE,
|
||||
}
|
||||
if channel_id:
|
||||
query['channel'] = channel_id
|
||||
page_data = self._download_json(
|
||||
f'{self._BASE_URL}/api/v3/content/creator', display_id,
|
||||
query=query, note=f'Downloading page {page + 1}')
|
||||
for post in page_data or []:
|
||||
yield self.url_result(
|
||||
f'{self._BASE_URL}/post/{post["id"]}',
|
||||
self._RESULT_IE, id=post['id'], title=post.get('title'),
|
||||
release_timestamp=parse_iso8601(post.get('releaseDate')))
|
||||
|
||||
def _real_extract(self, url):
|
||||
creator, channel = self._match_valid_url(url).group('id', 'channel')
|
||||
display_id = join_nonempty(creator, channel, delim='/')
|
||||
|
||||
creator_data = self._download_json(
|
||||
f'{self._BASE_URL}/api/v3/creator/named',
|
||||
display_id, query={'creatorURL[0]': creator})[0]
|
||||
|
||||
channel_data = traverse_obj(
|
||||
creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
|
||||
|
||||
return self.playlist_result(OnDemandPagedList(functools.partial(
|
||||
self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
|
||||
display_id, title=channel_data.get('title') or creator_data.get('title'),
|
||||
description=channel_data.get('about') or creator_data.get('about'))
|
||||
|
||||
|
||||
class FloatplaneChannelIE(FloatplaneChannelBaseIE):
|
||||
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
|
||||
_BASE_URL = 'https://www.floatplane.com'
|
||||
_PAGE_SIZE = 20
|
||||
_RESULT_IE = FloatplaneIE
|
||||
_TESTS = [{
|
||||
'url': 'https://www.floatplane.com/channel/linustechtips/home/ltxexpo',
|
||||
'info_dict': {
|
||||
@@ -346,36 +385,3 @@ class FloatplaneChannelIE(InfoExtractor):
|
||||
},
|
||||
'playlist_mincount': 200,
|
||||
}]
|
||||
|
||||
def _fetch_page(self, display_id, creator_id, channel_id, page):
|
||||
query = {
|
||||
'id': creator_id,
|
||||
'limit': self._PAGE_SIZE,
|
||||
'fetchAfter': page * self._PAGE_SIZE,
|
||||
}
|
||||
if channel_id:
|
||||
query['channel'] = channel_id
|
||||
page_data = self._download_json(
|
||||
'https://www.floatplane.com/api/v3/content/creator', display_id,
|
||||
query=query, note=f'Downloading page {page + 1}')
|
||||
for post in page_data or []:
|
||||
yield self.url_result(
|
||||
f'https://www.floatplane.com/post/{post["id"]}',
|
||||
FloatplaneIE, id=post['id'], title=post.get('title'),
|
||||
release_timestamp=parse_iso8601(post.get('releaseDate')))
|
||||
|
||||
def _real_extract(self, url):
|
||||
creator, channel = self._match_valid_url(url).group('id', 'channel')
|
||||
display_id = join_nonempty(creator, channel, delim='/')
|
||||
|
||||
creator_data = self._download_json(
|
||||
'https://www.floatplane.com/api/v3/creator/named',
|
||||
display_id, query={'creatorURL[0]': creator})[0]
|
||||
|
||||
channel_data = traverse_obj(
|
||||
creator_data, ('channels', lambda _, v: v['urlname'] == channel), get_all=False) or {}
|
||||
|
||||
return self.playlist_result(OnDemandPagedList(functools.partial(
|
||||
self._fetch_page, display_id, creator_data['id'], channel_data.get('id')), self._PAGE_SIZE),
|
||||
display_id, title=channel_data.get('title') or creator_data.get('title'),
|
||||
description=channel_data.get('about') or creator_data.get('about'))
|
||||
|
||||
@@ -59,7 +59,7 @@ class GetCourseRuIE(InfoExtractor):
        'marafon.mani-beauty.com',
        'on.psbook.ru',
    ]
    _BASE_URL_RE = rf'https?://(?:(?!player02\.)[^.]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
    _BASE_URL_RE = rf'https?://(?:(?!player02\.)[a-zA-Z0-9-]+\.getcourse\.(?:ru|io)|{"|".join(map(re.escape, _DOMAINS))})'
    _VALID_URL = [
        rf'{_BASE_URL_RE}/(?!pl/|teach/)(?P<id>[^?#]+)',
        rf'{_BASE_URL_RE}/(?:pl/)?teach/control/lesson/view\?(?:[^#]+&)?id=(?P<id>\d+)',

@@ -29,7 +29,7 @@ class LearningOnScreenIE(InfoExtractor):
    }]

    def _real_initialize(self):
        if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'):
        if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-LOS-LIVE'):
            self.raise_login_required(method='session_cookies')

    def _real_extract(self, url):

209 yt_dlp/extractor/locipo.py Normal file
@@ -0,0 +1,209 @@
import functools
import math

from .streaks import StreaksBaseIE
from ..networking import HEADRequest
from ..utils import (
    InAdvancePagedList,
    clean_html,
    js_to_json,
    parse_iso8601,
    parse_qs,
    str_or_none,
)
from ..utils.traversal import require, traverse_obj


class LocipoBaseIE(StreaksBaseIE):
    _API_BASE = 'https://web-api.locipo.jp'
    _BASE_URL = 'https://locipo.jp'
    _UUID_RE = r'[\da-f]{8}(?:-[\da-f]{4}){3}-[\da-f]{12}'

    def _call_api(self, path, item_id, note, fatal=True):
        return self._download_json(
            f'{self._API_BASE}/{path}', item_id,
            f'Downloading {note} API JSON',
            f'Unable to download {note} API JSON',
            fatal=fatal)


class LocipoIE(LocipoBaseIE):
    _VALID_URL = [
        fr'https?://locipo\.jp/creative/(?P<id>{LocipoBaseIE._UUID_RE})',
        fr'https?://locipo\.jp/embed/?\?(?:[^#]+&)?id=(?P<id>{LocipoBaseIE._UUID_RE})',
    ]
    _TESTS = [{
        'url': 'https://locipo.jp/creative/fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
        'info_dict': {
            'id': 'fb5ffeaa-398d-45ce-bb49-0e221b5f94f1',
            'ext': 'mp4',
            'title': 'リアルカレカノ#4 ~伊達さゆりと勉強しよっ?~',
            'description': 'md5:70a40c202f3fb7946b61e55fa015094c',
            'display_id': '5a2947fe596441f5bab88a61b0432d0d',
            'live_status': 'not_live',
            'modified_date': r're:\d{8}',
            'modified_timestamp': int,
            'release_timestamp': 1711789200,
            'release_date': '20240330',
            'series': 'リアルカレカノ',
            'series_id': '1142',
            'tags': 'count:4',
            'thumbnail': r're:https?://.+\.(?:jpg|png)',
            'timestamp': 1756984919,
            'upload_date': '20250904',
            'uploader': '東海テレビ',
            'uploader_id': 'locipo-prod',
        },
    }, {
        'url': 'https://locipo.jp/embed/?id=71a334a0-2b25-406f-9d96-88f341f571c2',
        'info_dict': {
            'id': '71a334a0-2b25-406f-9d96-88f341f571c2',
            'ext': 'mp4',
            'title': '#1 オーディション/ゲスト伊藤美来、豊田萌絵',
            'description': 'md5:5bbcf532474700439cf56ceb6a15630e',
            'display_id': '0ab32634b884499a84adb25de844c551',
            'live_status': 'not_live',
            'modified_date': r're:\d{8}',
            'modified_timestamp': int,
            'release_timestamp': 1751623200,
            'release_date': '20250704',
            'series': '声優ラジオのウラカブリ~Locipo出張所~',
            'series_id': '1454',
            'tags': 'count:6',
            'thumbnail': r're:https?://.+\.(?:jpg|png)',
            'timestamp': 1757002966,
            'upload_date': '20250904',
            'uploader': 'テレビ愛知',
            'uploader_id': 'locipo-prod',
        },
    }, {
        'url': 'https://locipo.jp/creative/bff9950d-229b-4fe9-911a-7fa71a232f35?list=69a5b15c-901f-4828-a336-30c0de7612d3',
        'info_dict': {
            'id': '69a5b15c-901f-4828-a336-30c0de7612d3',
            'title': '見て・乗って・語りたい。 東海の鉄道沼',
        },
        'playlist_mincount': 3,
    }, {
        'url': 'https://locipo.jp/creative/a0751a7f-c7dd-4a10-a7f1-e12720bdf16c?list=006cff3f-ba74-42f0-b4fd-241486ebda2b',
        'info_dict': {
            'id': 'a0751a7f-c7dd-4a10-a7f1-e12720bdf16c',
            'ext': 'mp4',
            'title': '#839 人間真空パック',
            'description': 'md5:9fe190333b6975c5001c8c9cbe20d276',
            'display_id': 'c2b4c9f4a6d648bd8e3c320e384b9d56',
            'live_status': 'not_live',
            'modified_date': r're:\d{8}',
            'modified_timestamp': int,
            'release_timestamp': 1746239400,
            'release_date': '20250503',
            'series': 'でんじろう先生のはぴエネ!',
            'series_id': '202',
            'tags': 'count:3',
            'thumbnail': r're:https?://.+\.(?:jpg|png)',
            'timestamp': 1756975909,
            'upload_date': '20250904',
            'uploader': '中京テレビ',
            'uploader_id': 'locipo-prod',
        },
        'params': {'noplaylist': True},
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        playlist_id = traverse_obj(parse_qs(url), ('list', -1, {str}))
        if self._yes_playlist(playlist_id, video_id):
            return self.url_result(
                f'{self._BASE_URL}/playlist/{playlist_id}', LocipoPlaylistIE)

        creatives = self._call_api(f'creatives/{video_id}', video_id, 'Creatives')
        media_id = traverse_obj(creatives, ('media_id', {str}, {require('Streaks media ID')}))

        webpage = self._download_webpage(url, video_id)
        config = self._search_json(
            r'window\.__NUXT__\.config\s*=', webpage, 'config', video_id, transform_source=js_to_json)
        api_key = traverse_obj(config, ('public', 'streaksVodPlaybackApiKey', {str}, {require('api key')}))

        return {
            **self._extract_from_streaks_api('locipo-prod', media_id, headers={
                'Origin': 'https://locipo.jp',
                'X-Streaks-Api-Key': api_key,
            }),
            **traverse_obj(creatives, {
                'title': ('name', {clean_html}),
                'description': ('description', {clean_html}, filter),
                'release_timestamp': ('publication_started_at', {parse_iso8601}),
                'tags': ('keyword', {clean_html}, {lambda x: x.split(',')}, ..., {str.strip}, filter),
                'uploader': ('company', 'name', {clean_html}, filter),
            }),
            **traverse_obj(creatives, ('series', {
                'series': ('name', {clean_html}, filter),
                'series_id': ('id', {str_or_none}),
            })),
            'id': video_id,
        }


class LocipoPlaylistIE(LocipoBaseIE):
    _VALID_URL = [
        fr'https?://locipo\.jp/(?P<type>playlist)/(?P<id>{LocipoBaseIE._UUID_RE})',
        r'https?://locipo\.jp/(?P<type>series)/(?P<id>\d+)',
    ]
    _TESTS = [{
        'url': 'https://locipo.jp/playlist/35d3dd2b-531d-4824-8575-b1c527d29538',
        'info_dict': {
            'id': '35d3dd2b-531d-4824-8575-b1c527d29538',
            'title': 'レシピ集',
        },
        'playlist_mincount': 135,
    }, {
        # Redirects to https://locipo.jp/series/1363
        'url': 'https://locipo.jp/playlist/fef7c4fb-741f-4d6a-a3a6-754f354302a2',
        'info_dict': {
            'id': '1363',
            'title': 'CBCアナウンサー公式【みてちょてれび】',
            'description': 'md5:50a1b23e63112d5c06c882835c8c1fb1',
        },
        'playlist_mincount': 38,
    }, {
        'url': 'https://locipo.jp/series/503',
        'info_dict': {
            'id': '503',
            'title': 'FishingLover東海',
            'description': '東海地区の釣り場でフィッシングの魅力を余すところなくご紹介!!',
        },
        'playlist_mincount': 223,
    }]
    _PAGE_SIZE = 100

    def _fetch_page(self, path, playlist_id, page):
        creatives = self._download_json(
            f'{self._API_BASE}/{path}/{playlist_id}/creatives',
            playlist_id, f'Downloading page {page + 1}', query={
                'premium': False,
                'live': False,
                'limit': self._PAGE_SIZE,
                'offset': page * self._PAGE_SIZE,
            })

        for video_id in traverse_obj(creatives, ('items', ..., 'id', {str})):
            yield self.url_result(f'{self._BASE_URL}/creative/{video_id}', LocipoIE)

    def _real_extract(self, url):
        playlist_type, playlist_id = self._match_valid_url(url).group('type', 'id')
        if urlh := self._request_webpage(HEADRequest(url), playlist_id, fatal=False):
            playlist_type, playlist_id = self._match_valid_url(urlh.url).group('type', 'id')

        path = 'playlists' if playlist_type == 'playlist' else 'series'
        creatives = self._call_api(
            f'{path}/{playlist_id}/creatives', playlist_id, path.capitalize())

        entries = InAdvancePagedList(
            functools.partial(self._fetch_page, path, playlist_id),
            math.ceil(int(creatives['total']) / self._PAGE_SIZE), self._PAGE_SIZE)

        return self.playlist_result(
            entries, playlist_id,
            **traverse_obj(creatives, ('items', ..., playlist_type, {
                'title': ('name', {clean_html}, filter),
                'description': ('description', {clean_html}, filter),
            }, any)))

38 yt_dlp/extractor/matchitv.py Normal file
@@ -0,0 +1,38 @@
from .common import InfoExtractor
from ..utils import join_nonempty, unified_strdate
from ..utils.traversal import traverse_obj


class MatchiTVIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?matchi\.tv/watch/?\?(?:[^#]+&)?s=(?P<id>[0-9a-zA-Z]+)'
    _TESTS = [{
        'url': 'https://matchi.tv/watch?s=0euhjzrxsjm',
        'info_dict': {
            'id': '0euhjzrxsjm',
            'ext': 'mp4',
            'title': 'Court 2 at Stratford Padel Club 2024-07-13T18:32:24',
            'thumbnail': 'https://thumbnails.padelgo.tv/0euhjzrxsjm.jpg',
            'upload_date': '20240713',
        },
    }, {
        'url': 'https://matchi.tv/watch?s=FkKDJ9SvAx1',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        loaded_media = traverse_obj(
            self._search_nextjs_data(webpage, video_id, fatal=False),
            ('props', 'pageProps', 'loadedMedia', {dict})) or {}
        start_date_time = traverse_obj(loaded_media, ('startDateTime', {str}))

        return {
            'id': video_id,
            'title': join_nonempty(loaded_media.get('courtDescription'), start_date_time, delim=' '),
            'thumbnail': f'https://thumbnails.padelgo.tv/{video_id}.jpg',
            'upload_date': unified_strdate(start_date_time),
            'formats': self._extract_m3u8_formats(
                f'https://streams.padelgo.tv/v2/streams/m3u8/{video_id}/anonymous/playlist.m3u8',
                video_id, 'mp4', m3u8_id='hls'),
        }

@@ -25,7 +25,7 @@ class MixcloudBaseIE(InfoExtractor):
    %s
  }
}''' % (lookup_key, username, f', slug: "{slug}"' if slug else '', object_fields),  # noqa: UP031
            })['data'][lookup_key]
            }, impersonate=True)['data'][lookup_key]


class MixcloudIE(MixcloudBaseIE):

@@ -9,13 +9,13 @@ from ..utils import (
    int_or_none,
    qualities,
    smuggle_url,
    traverse_obj,
    unescapeHTML,
    unified_strdate,
    unsmuggle_url,
    url_or_none,
    urlencode_postdata,
)
from ..utils.traversal import find_element, traverse_obj


class OdnoklassnikiIE(InfoExtractor):
@@ -264,9 +264,7 @@ class OdnoklassnikiIE(InfoExtractor):
            note='Downloading desktop webpage',
            headers={'Referer': smuggled['referrer']} if smuggled.get('referrer') else {})

        error = self._search_regex(
            r'[^>]+class="vp_video_stub_txt"[^>]*>([^<]+)<',
            webpage, 'error', default=None)
        error = traverse_obj(webpage, {find_element(cls='vp_video_stub_txt')})
        # Direct link from boosty
        if (error == 'The author of this video has not been found or is blocked'
                and not smuggled.get('referrer') and mode == 'videoembed'):

@@ -33,7 +33,8 @@ class OpencastBaseIE(InfoExtractor):
                            vid\.igb\.illinois\.edu|
                            cursosabertos\.c3sl\.ufpr\.br|
                            mcmedia\.missioncollege\.org|
                            clases\.odon\.edu\.uy
                            clases\.odon\.edu\.uy|
                            oc-p\.uni-jena\.de
                        )'''
    _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'

@@ -106,7 +107,7 @@ class OpencastBaseIE(InfoExtractor):

class OpencastIE(OpencastBaseIE):
    _VALID_URL = rf'''(?x)
        https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella/ui/watch\.html\?
        https?://(?P<host>{OpencastBaseIE._INSTANCES_RE})/paella[0-9]*/ui/watch\.html\?
        (?:[^#]+&)?id=(?P<id>{OpencastBaseIE._UUID_RE})'''

    _API_BASE = 'https://%s/search/episode.json?id=%s'
@@ -131,8 +132,12 @@ class OpencastIE(OpencastBaseIE):

    def _real_extract(self, url):
        host, video_id = self._match_valid_url(url).group('host', 'id')
        return self._parse_mediapackage(
            self._call_api(host, video_id)['search-results']['result']['mediapackage'])
        response = self._call_api(host, video_id)
        package = traverse_obj(response, (
            ('search-results', 'result'),
            ('result', ...),  # Path needed for oc-p.uni-jena.de
            'mediapackage', {dict}, any)) or {}
        return self._parse_mediapackage(package)


class OpencastPlaylistIE(OpencastBaseIE):

@@ -128,7 +128,7 @@ class PornHubIE(PornHubBaseIE):
    _VALID_URL = rf'''(?x)
                    https?://
                        (?:
                            (?:[^/]+\.)?
                            (?:[a-zA-Z0-9.-]+\.)?
                            {PornHubBaseIE._PORNHUB_HOST_RE}
                            /(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                            (?:www\.)?thumbzilla\.com/video/
@@ -506,6 +506,7 @@ class PornHubIE(PornHubBaseIE):
                'cast': ({find_elements(attr='data-label', value='pornstar')}, ..., {clean_html}),
            }),
            'subtitles': subtitles,
            'http_headers': {'Referer': f'https://www.{host}/'},
        }, info)


@@ -533,7 +534,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):


class PornHubUserIE(PornHubPlaylistBaseIE):
    _VALID_URL = rf'(?P<url>https?://(?:[^/]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
    _VALID_URL = rf'(?P<url>https?://(?:[a-zA-Z0-9.-]+\.)?{PornHubBaseIE._PORNHUB_HOST_RE}/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
    _TESTS = [{
        'url': 'https://www.pornhub.com/model/zoe_ph',
        'playlist_mincount': 118,

@@ -1,4 +1,4 @@
from .floatplane import FloatplaneBaseIE
from .floatplane import FloatplaneBaseIE, FloatplaneChannelBaseIE


class SaucePlusIE(FloatplaneBaseIE):
@@ -39,3 +39,19 @@ class SaucePlusIE(FloatplaneBaseIE):
    def _real_initialize(self):
        if not self._get_cookies(self._BASE_URL).get('__Host-sp-sess'):
            self.raise_login_required()


class SaucePlusChannelIE(FloatplaneChannelBaseIE):
    _VALID_URL = r'https?://(?:(?:www|beta)\.)?sauceplus\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'
    _BASE_URL = 'https://www.sauceplus.com'
    _RESULT_IE = SaucePlusIE
    _PAGE_SIZE = 20
    _TESTS = [{
        'url': 'https://www.sauceplus.com/channel/williamosman/home',
        'info_dict': {
            'id': 'williamosman',
            'title': 'William Osman',
            'description': 'md5:a67bc961d23c293b2c5308d84f34f26c',
        },
        'playlist_mincount': 158,
    }]

@@ -146,8 +146,8 @@ class SBSIE(InfoExtractor):
            'release_year': ('releaseYear', {int_or_none}),
            'duration': ('duration', ({float_or_none}, {parse_duration})),
            'is_live': ('liveStream', {bool}),
            'age_limit': (('classificationID', 'contentRating'), {str.upper}, {
                lambda x: self._AUS_TV_PARENTAL_GUIDELINES.get(x)}),  # dict.get is unhashable in py3.7
            'age_limit': (
                ('classificationID', 'contentRating'), {str.upper}, {self._AUS_TV_PARENTAL_GUIDELINES.get}),
        }, get_all=False),
        **traverse_obj(media, {
            'categories': (('genres', ...), ('taxonomy', ('genre', 'subgenre'), 'name'), {str}),

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor, SearchInfoExtractor
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..networking.impersonate import ImpersonateTarget
from ..utils import (
    ExtractorError,
    float_or_none,
@@ -118,9 +119,9 @@ class SoundcloudBaseIE(InfoExtractor):
        self.cache.store('soundcloud', 'client_id', client_id)

    def _update_client_id(self):
        webpage = self._download_webpage('https://soundcloud.com/', None)
        webpage = self._download_webpage('https://soundcloud.com/', None, 'Downloading main page')
        for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
            script = self._download_webpage(src, None, fatal=False)
            script = self._download_webpage(src, None, 'Downloading JS asset', fatal=False)
            if script:
                client_id = self._search_regex(
                    r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
@@ -136,13 +137,13 @@ class SoundcloudBaseIE(InfoExtractor):
        if non_fatal:
            del kwargs['fatal']
        query = kwargs.get('query', {}).copy()
        for _ in range(2):
        for is_first_attempt in (True, False):
            query['client_id'] = self._CLIENT_ID
            kwargs['query'] = query
            try:
                return self._download_json(*args, **kwargs)
            except ExtractorError as e:
                if isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
                if is_first_attempt and isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
                    self._store_client_id(None)
                    self._update_client_id()
                    continue
@@ -152,7 +153,10 @@ class SoundcloudBaseIE(InfoExtractor):
            raise

    def _initialize_pre_login(self):
        self._CLIENT_ID = self.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
        self._CLIENT_ID = self.cache.load('soundcloud', 'client_id')
        if self._CLIENT_ID:
            return
        self._update_client_id()

    def _verify_oauth_token(self, token):
        if self._request_webpage(
@@ -830,6 +834,30 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
            'entries': self._entries(base_url, playlist_id),
        }

    @functools.cached_property
    def _browser_impersonate_target(self):
        available_targets = self._downloader._get_available_impersonate_targets()
        if not available_targets:
            # impersonate=True gives a generic warning when no impersonation targets are available
            return True

        # Any browser target older than chrome-116 is 403'd by Datadome
        MIN_SUPPORTED_TARGET = ImpersonateTarget('chrome', '116', 'windows', '10')
        version_as_float = lambda x: float(x.version) if x.version else 0

        # Always try to use the newest Chrome target available
        filtered = sorted([
            target[0] for target in available_targets
            if target[0].client == 'chrome' and target[0].os in ('windows', 'macos')
        ], key=version_as_float)

        if not filtered or version_as_float(filtered[-1]) < version_as_float(MIN_SUPPORTED_TARGET):
            # All available targets are inadequate or newest available Chrome target is too old, so
            # warn the user to upgrade their dependency to a version with the minimum supported target
            return MIN_SUPPORTED_TARGET

        return filtered[-1]

    def _entries(self, url, playlist_id):
        # Per the SoundCloud documentation, the maximum limit for a linked partitioning query is 200.
        # https://developers.soundcloud.com/blog/offset-pagination-deprecated
@@ -844,7 +872,9 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudBaseIE):
            try:
                response = self._call_api(
                    url, playlist_id, query=query, headers=self._HEADERS,
                    note=f'Downloading track page {i + 1}')
                    note=f'Downloading track page {i + 1}',
                    # See: https://github.com/yt-dlp/yt-dlp/issues/15660
                    impersonate=self._browser_impersonate_target)
                break
            except ExtractorError as e:
                # Downloading page may result in intermittent 502 HTTP error

@@ -3,6 +3,7 @@ import re
from .common import InfoExtractor
from ..utils import (
    ExtractorError,
    clean_html,
    determine_ext,
    merge_dicts,
    parse_duration,
@@ -12,6 +13,7 @@ from ..utils import (
    urlencode_postdata,
    urljoin,
)
from ..utils.traversal import find_element, traverse_obj, trim_str


class SpankBangIE(InfoExtractor):
@@ -122,7 +124,7 @@ class SpankBangIE(InfoExtractor):
            }), headers={
                'Referer': url,
                'X-Requested-With': 'XMLHttpRequest',
            })
            }, impersonate=True)

        for format_id, format_url in stream.items():
            if format_url and isinstance(format_url, list):
@@ -178,9 +180,9 @@ class SpankBangPlaylistIE(InfoExtractor):
    def _real_extract(self, url):
        mobj = self._match_valid_url(url)
        playlist_id = mobj.group('id')

        webpage = self._download_webpage(
            url, playlist_id, headers={'Cookie': 'country=US; mobile=on'})
        country = self.get_param('geo_bypass_country') or 'US'
        self._set_cookie('.spankbang.com', 'country', country.upper())
        webpage = self._download_webpage(url, playlist_id, impersonate=True)

        entries = [self.url_result(
            urljoin(url, mobj.group('path')),
@@ -189,8 +191,8 @@ class SpankBangPlaylistIE(InfoExtractor):
            r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/[^"\'](?:(?!\1).)*)\1',
            webpage)]

        title = self._html_search_regex(
            r'<em>([^<]+)</em>\s+playlist\s*<', webpage, 'playlist title',
            fatal=False)
        title = traverse_obj(webpage, (
            {find_element(tag='h1', attr='data-testid', value='playlist-title')},
            {clean_html}, {trim_str(end=' Playlist')}))

        return self.playlist_result(entries, playlist_id, title)

@@ -8,15 +8,12 @@ from ..utils import (
    extract_attributes,
    join_nonempty,
    js_to_json,
    parse_resolution,
    str_or_none,
    url_basename,
    url_or_none,
)
from ..utils.traversal import (
    find_element,
    find_elements,
    traverse_obj,
    trim_str,
)
from ..utils.traversal import find_element, traverse_obj


class SteamIE(InfoExtractor):
@@ -27,7 +24,7 @@ class SteamIE(InfoExtractor):
            'id': '105600',
            'title': 'Terraria',
        },
        'playlist_mincount': 3,
        'playlist_mincount': 5,
    }, {
        'url': 'https://store.steampowered.com/app/271590/Grand_Theft_Auto_V/',
        'info_dict': {
@@ -37,6 +34,39 @@ class SteamIE(InfoExtractor):
        'playlist_mincount': 26,
    }]

    def _entries(self, app_id, app_name, data_props):
        for trailer in traverse_obj(data_props, (
            'trailers', lambda _, v: str_or_none(v['id']),
        )):
            movie_id = str_or_none(trailer['id'])

            thumbnails = []
            for thumbnail_url in traverse_obj(trailer, (
                ('poster', 'thumbnail'), {url_or_none},
            )):
                thumbnails.append({
                    'url': thumbnail_url,
                    **parse_resolution(url_basename(thumbnail_url)),
                })

            formats = []
            if hls_manifest := traverse_obj(trailer, ('hlsManifest', {url_or_none})):
                formats.extend(self._extract_m3u8_formats(
                    hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
            for dash_manifest in traverse_obj(trailer, ('dashManifests', ..., {url_or_none})):
                formats.extend(self._extract_mpd_formats(
                    dash_manifest, app_id, mpd_id='dash', fatal=False))
            self._remove_duplicate_formats(formats)

            yield {
                'id': join_nonempty(app_id, movie_id),
                'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
                'formats': formats,
                'series': app_name,
                'series_id': app_id,
                'thumbnails': thumbnails,
            }

    def _real_extract(self, url):
        app_id = self._match_id(url)

@@ -45,32 +75,13 @@ class SteamIE(InfoExtractor):
        self._set_cookie('store.steampowered.com', 'lastagecheckage', '1-January-2000')

        webpage = self._download_webpage(url, app_id)
        app_name = traverse_obj(webpage, ({find_element(cls='apphub_AppName')}, {clean_html}))
        data_props = traverse_obj(webpage, (
            {find_element(cls='gamehighlight_desktopcarousel', html=True)},
            {extract_attributes}, 'data-props', {json.loads}, {dict}))
        app_name = traverse_obj(data_props, ('appName', {clean_html}))

        entries = []
        for data_prop in traverse_obj(webpage, (
            {find_elements(cls='highlight_player_item highlight_movie', html=True)},
            ..., {extract_attributes}, 'data-props', {json.loads}, {dict},
        )):
            formats = []
            if hls_manifest := traverse_obj(data_prop, ('hlsManifest', {url_or_none})):
                formats.extend(self._extract_m3u8_formats(
                    hls_manifest, app_id, 'mp4', m3u8_id='hls', fatal=False))
            for dash_manifest in traverse_obj(data_prop, ('dashManifests', ..., {url_or_none})):
                formats.extend(self._extract_mpd_formats(
                    dash_manifest, app_id, mpd_id='dash', fatal=False))

            movie_id = traverse_obj(data_prop, ('id', {trim_str(start='highlight_movie_')}))
            entries.append({
                'id': movie_id,
                'title': join_nonempty(app_name, 'video', movie_id, delim=' '),
                'formats': formats,
                'series': app_name,
                'series_id': app_id,
                'thumbnail': traverse_obj(data_prop, ('screenshot', {url_or_none})),
            })

        return self.playlist_result(entries, app_id, app_name)
        return self.playlist_result(
            self._entries(app_id, app_name, data_props), app_id, app_name)


class SteamCommunityIE(InfoExtractor):
@@ -22,7 +22,7 @@ class StreaksBaseIE(InfoExtractor):
    _GEO_BYPASS = False
    _GEO_COUNTRIES = ['JP']

    def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False):
    def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False, live_from_start=False):
        try:
            response = self._download_json(
                self._API_URL_TEMPLATE.format('playback', project_id, media_id, ''),
@@ -83,6 +83,10 @@ class StreaksBaseIE(InfoExtractor):

                fmts, subs = self._extract_m3u8_formats_and_subtitles(
                    src_url, media_id, 'mp4', m3u8_id='hls', fatal=False, live=is_live, query=query)
                for fmt in fmts:
                    if live_from_start:
                        fmt.setdefault('downloader_options', {}).update({'ffmpeg_args': ['-live_start_index', '0']})
                        fmt['is_from_start'] = True
                formats.extend(fmts)
                self._merge_subtitles(subs, target=subtitles)
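The `live_from_start` branch above merges ffmpeg arguments into each format's `downloader_options` with the `setdefault(...).update(...)` idiom, which creates the nested dict on first use and then mutates it in place. A minimal standalone illustration (the format dict here is hypothetical):

```python
# Hypothetical format dict, as an extractor might produce it
fmt = {'format_id': 'hls-0'}

# Create 'downloader_options' if missing, then merge in the ffmpeg flag,
# mirroring the live_from_start handling in the hunk above
fmt.setdefault('downloader_options', {}).update(
    {'ffmpeg_args': ['-live_start_index', '0']})
fmt['is_from_start'] = True
```

Because `setdefault` returns the existing value when the key is already present, repeated calls accumulate options instead of overwriting the dict.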
@@ -102,7 +102,7 @@ class TeachableIE(TeachableBaseIE):
    _WORKING = False
    _VALID_URL = r'''(?x)
                    (?:
                        {}https?://(?P<site_t>[^/]+)|
                        {}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
                        https?://(?:www\.)?(?P<site>{})
                    )
                    /courses/[^/]+/lectures/(?P<id>\d+)
@@ -211,7 +211,7 @@ class TeachableIE(TeachableBaseIE):
class TeachableCourseIE(TeachableBaseIE):
    _VALID_URL = r'''(?x)
                    (?:
                        {}https?://(?P<site_t>[^/]+)|
                        {}https?://(?P<site_t>[a-zA-Z0-9.-]+)|
                        https?://(?:www\.)?(?P<site>{})
                    )
                    /(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
@@ -9,39 +9,39 @@ class Tele5IE(DiscoveryPlusBaseIE):
    _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?P<parent_slug>[\w-]+)/(?P<slug_a>[\w-]+)(?:/(?P<slug_b>[\w-]+))?'
    _TESTS = [{
        # slug_a and slug_b
        'url': 'https://tele5.de/mediathek/stargate-atlantis/quarantane',
        'url': 'https://tele5.de/mediathek/star-trek-enterprise/vox-sola',
        'info_dict': {
            'id': '6852024',
            'id': '4140114',
            'ext': 'mp4',
            'title': 'Quarantäne',
            'description': 'md5:6af0373bd0fcc4f13e5d47701903d675',
            'episode': 'Episode 73',
            'episode_number': 73,
            'season': 'Season 4',
            'season_number': 4,
            'series': 'Stargate Atlantis',
            'upload_date': '20240525',
            'timestamp': 1716643200,
            'duration': 2503.2,
            'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/21/c81fcb45-8902-309b-badb-4e6d546b575d.jpeg',
            'creators': ['Tele5'],
            'title': 'Vox Sola',
            'description': 'md5:329d115f74324d4364efc1a11c4ea7c9',
            'duration': 2542.76,
            'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
            'tags': [],
            'creators': ['Tele5'],
            'series': 'Star Trek - Enterprise',
            'season': 'Season 1',
            'season_number': 1,
            'episode': 'Episode 22',
            'episode_number': 22,
            'timestamp': 1770491100,
            'upload_date': '20260207',
        },
    }, {
        # only slug_a
        'url': 'https://tele5.de/mediathek/inside-out',
        'url': 'https://tele5.de/mediathek/30-miles-from-nowhere-im-wald-hoert-dich-niemand-schreien',
        'info_dict': {
            'id': '6819502',
            'id': '4102641',
            'ext': 'mp4',
            'title': 'Inside out',
            'description': 'md5:7e5f32ed0be5ddbd27713a34b9293bfd',
            'series': 'Inside out',
            'upload_date': '20240523',
            'timestamp': 1716494400,
            'duration': 5343.4,
            'thumbnail': 'https://eu1-prod-images.disco-api.com/2024/05/15/181eba3c-f9f0-3faf-b14d-0097050a3aa4.jpeg',
            'creators': ['Tele5'],
            'title': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
            'description': 'md5:0b731539f39ee186ebcd9dd444a86fc2',
            'duration': 4849.96,
            'thumbnail': r're:https://[^/.]+\.disco-api\.com/.+\.jpe?g',
            'tags': [],
            'creators': ['Tele5'],
            'series': '30 Miles from Nowhere - Im Wald hört dich niemand schreien',
            'timestamp': 1770417300,
            'upload_date': '20260206',
        },
    }, {
        # playlist
@@ -50,20 +50,27 @@ class Tele5IE(DiscoveryPlusBaseIE):
            'id': 'mediathek-schlefaz',
        },
        'playlist_mincount': 3,
        'skip': 'Dead link',
    }]

    def _real_extract(self, url):
        parent_slug, slug_a, slug_b = self._match_valid_url(url).group('parent_slug', 'slug_a', 'slug_b')
        playlist_id = join_nonempty(parent_slug, slug_a, slug_b, delim='-')

        query = {'environment': 'tele5', 'v': '2'}
        query = {
            'include': 'default',
            'filter[environment]': 'tele5',
            'v': '2',
        }

        if not slug_b:
            endpoint = f'page/{slug_a}'
            query['parent_slug'] = parent_slug
        else:
            endpoint = f'videos/{slug_b}'
            query['filter[show.slug]'] = slug_a
        cms_data = self._download_json(f'https://de-api.loma-cms.com/feloma/{endpoint}/', playlist_id, query=query)
            endpoint = f'shows/{slug_a}'
            query['filter[video.slug]'] = slug_b

        cms_data = self._download_json(f'https://public.aurora.enhanced.live/site/{endpoint}/', playlist_id, query=query)

        return self.playlist_result(map(
            functools.partial(self._get_disco_api_info, url, disco_host='eu1-prod.disco-api.com', realm='dmaxde', country='DE'),
@@ -51,7 +51,8 @@ class TruthIE(InfoExtractor):

    def _real_extract(self, url):
        video_id = self._match_id(url)
        status = self._download_json(f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id)
        status = self._download_json(
            f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id, impersonate=True)
        uploader_id = strip_or_none(traverse_obj(status, ('account', 'username')))
        return {
            'id': video_id,
@@ -4,6 +4,7 @@ from .streaks import StreaksBaseIE
from ..utils import (
    ExtractorError,
    GeoRestrictedError,
    clean_html,
    int_or_none,
    join_nonempty,
    make_archive_id,
@@ -11,7 +12,9 @@ from ..utils import (
    str_or_none,
    strip_or_none,
    time_seconds,
    unified_timestamp,
    update_url_query,
    url_or_none,
)
from ..utils.traversal import require, traverse_obj

@@ -257,3 +260,113 @@ class TVerIE(StreaksBaseIE):
            'id': video_id,
            '_old_archive_ids': [make_archive_id('BrightcoveNew', brightcove_id)] if brightcove_id else None,
        }


class TVerOlympicIE(StreaksBaseIE):
    IE_NAME = 'tver:olympic'

    _API_BASE = 'https://olympic-data.tver.jp/api'
    _VALID_URL = r'https?://(?:www\.)?tver\.jp/olympic/milanocortina2026/(?P<type>live|video)/play/(?P<id>\w+)'
    _TESTS = [{
        'url': 'https://tver.jp/olympic/milanocortina2026/video/play/3b1d4462150b42558d9cc8aabb5238d0/',
        'info_dict': {
            'id': '3b1d4462150b42558d9cc8aabb5238d0',
            'ext': 'mp4',
            'title': '【開会式】ぎゅっと凝縮ハイライト',
            'display_id': 'ref:3b1d4462150b42558d9cc8aabb5238d0',
            'duration': 712.045,
            'live_status': 'not_live',
            'modified_date': r're:\d{8}',
            'modified_timestamp': int,
            'tags': 'count:1',
            'thumbnail': r're:https://.+\.(?:jpg|png)',
            'timestamp': 1770420187,
            'upload_date': '20260206',
            'uploader_id': 'tver-olympic',
        },
    }, {
        'url': 'https://tver.jp/olympic/milanocortina2026/live/play/glts313itwvj/',
        'info_dict': {
            'id': 'glts313itwvj',
            'ext': 'mp4',
            'title': '開会式ハイライト',
            'channel_id': 'ntv',
            'display_id': 'ref:sp_260207_spc_01_dvr',
            'duration': 7680,
            'live_status': 'was_live',
            'modified_date': r're:\d{8}',
            'modified_timestamp': int,
            'thumbnail': r're:https://.+\.(?:jpg|png)',
            'timestamp': 1770420300,
            'upload_date': '20260206',
            'uploader_id': 'tver-olympic-live',
        },
    }]

    def _real_extract(self, url):
        video_type, video_id = self._match_valid_url(url).group('type', 'id')
        live_from_start = self.get_param('live_from_start')

        if video_type == 'live':
            project_id = 'tver-olympic-live'
            api_key = 'a35ebb1ca7d443758dc7fcc5d99b1f72'
            olympic_data = traverse_obj(self._download_json(
                f'{self._API_BASE}/live/{video_id}', video_id), ('contents', 'live', {dict}))
            media_id = traverse_obj(olympic_data, ('video_id', {str}))

            now = time_seconds()
            start_timestamp_str = traverse_obj(olympic_data, ('onair_start_date', {str}))
            start_timestamp = unified_timestamp(start_timestamp_str, tz_offset=9)
            if not start_timestamp:
                raise ExtractorError('Unable to extract on-air start time')
            end_timestamp = traverse_obj(olympic_data, (
                'onair_end_date', {unified_timestamp(tz_offset=9)}, {require('on-air end time')}))

            if now < start_timestamp:
                self.raise_no_formats(
                    f'This program is scheduled to start at {start_timestamp_str} JST', expected=True)

                return {
                    'id': video_id,
                    'live_status': 'is_upcoming',
                    'release_timestamp': start_timestamp,
                }
            elif start_timestamp <= now < end_timestamp:
                live_status = 'is_live'
                if live_from_start:
                    media_id += '_dvr'
            elif end_timestamp <= now:
                dvr_end_timestamp = traverse_obj(olympic_data, (
                    'dvr_end_date', {unified_timestamp(tz_offset=9)}))
                if dvr_end_timestamp and now < dvr_end_timestamp:
                    live_status = 'was_live'
                    media_id += '_dvr'
                else:
                    raise ExtractorError(
                        'This program is no longer available', expected=True)
        else:
            project_id = 'tver-olympic'
            api_key = '4b55a4db3cce4ad38df6dd8543e3e46a'
            media_id = video_id
            live_status = 'not_live'
            olympic_data = traverse_obj(self._download_json(
                f'{self._API_BASE}/video/{video_id}', video_id), ('contents', 'video', {dict}))

        return {
            **self._extract_from_streaks_api(project_id, f'ref:{media_id}', {
                'Origin': 'https://tver.jp',
                'Referer': 'https://tver.jp/',
                'X-Streaks-Api-Key': api_key,
            }, live_from_start=live_from_start),
            **traverse_obj(olympic_data, {
                'title': ('title', {clean_html}, filter),
                'alt_title': ('sub_title', {clean_html}, filter),
                'channel': ('channel', {clean_html}, filter),
                'channel_id': ('channel_id', {clean_html}, filter),
                'description': (('description', 'description_l', 'description_s'), {clean_html}, filter, any),
                'timestamp': ('onair_start_date', {unified_timestamp(tz_offset=9)}),
                'thumbnail': (('picture_l_url', 'picture_m_url', 'picture_s_url'), {url_or_none}, any),
            }),
            'id': video_id,
            'live_status': live_status,
        }
152
yt_dlp/extractor/tvo.py
Normal file
@@ -0,0 +1,152 @@
import json
import urllib.parse

from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
    clean_html,
    int_or_none,
    parse_duration,
    parse_iso8601,
    smuggle_url,
    str_or_none,
    url_or_none,
)
from ..utils.traversal import (
    require,
    traverse_obj,
    trim_str,
)


class TvoIE(InfoExtractor):
    IE_NAME = 'TVO'
    _VALID_URL = r'https?://(?:www\.)?tvo\.org/video(?:/documentaries)?/(?P<id>[\w-]+)'
    _TESTS = [{
        'url': 'https://www.tvo.org/video/how-can-ontario-survive-the-trade-war',
        'info_dict': {
            'id': '6377531034112',
            'ext': 'mp4',
            'title': 'How Can Ontario Survive the Trade War?',
            'description': 'md5:e7455d9cd4b6b1270141922044161457',
            'display_id': 'how-can-ontario-survive-the-trade-war',
            'duration': 3531,
            'episode': 'How Can Ontario Survive the Trade War?',
            'episode_id': 'how-can-ontario-survive-the-trade-war',
            'episode_number': 1,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'TVO at AMO',
            'series_id': 'tvo-at-amo',
            'tags': 'count:17',
            'thumbnail': r're:https?://.+',
            'timestamp': 1756944016,
            'upload_date': '20250904',
            'uploader_id': '18140038001',
        },
    }, {
        'url': 'https://www.tvo.org/video/documentaries/the-pitch',
        'info_dict': {
            'id': '6382500333112',
            'ext': 'mp4',
            'title': 'The Pitch',
            'categories': ['Documentaries'],
            'description': 'md5:9d4246b70dce772a3a396c4bd84c8506',
            'display_id': 'the-pitch',
            'duration': 5923,
            'episode': 'The Pitch',
            'episode_id': 'the-pitch',
            'episode_number': 1,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'The Pitch',
            'series_id': 'the-pitch',
            'tags': 'count:8',
            'thumbnail': r're:https?://.+',
            'timestamp': 1762693216,
            'upload_date': '20251109',
            'uploader_id': '18140038001',
        },
    }, {
        'url': 'https://www.tvo.org/video/documentaries/valentines-day',
        'info_dict': {
            'id': '6387298331112',
            'ext': 'mp4',
            'title': 'Valentine\'s Day',
            'categories': ['Documentaries'],
            'description': 'md5:b142149beb2d3a855244816c50cd2f14',
            'display_id': 'valentines-day',
            'duration': 3121,
            'episode': 'Valentine\'s Day',
            'episode_id': 'valentines-day',
            'episode_number': 2,
            'season': 'Season 1',
            'season_number': 1,
            'series': 'How We Celebrate',
            'series_id': 'how-we-celebrate',
            'tags': 'count:6',
            'thumbnail': r're:https?://.+',
            'timestamp': 1770386416,
            'upload_date': '20260206',
            'uploader_id': '18140038001',
        },
    }]
    BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/18140038001/default_default/index.html?videoId=%s'

    def _real_extract(self, url):
        display_id = self._match_id(url)
        video_data = self._download_json(
            'https://hmy0rc1bo2.execute-api.ca-central-1.amazonaws.com/graphql',
            display_id, headers={'Content-Type': 'application/json'},
            data=json.dumps({
                'operationName': 'getVideo',
                'variables': {'slug': urllib.parse.urlparse(url).path.rstrip('/')},
                'query': '''query getVideo($slug: String) {
                    getTVOOrgVideo(slug: $slug) {
                        contentCategory
                        description
                        length
                        program {
                            nodeUrl
                            title
                        }
                        programOrder
                        publishedAt
                        season
                        tags
                        thumbnail
                        title
                        videoSource {
                            brightcoveRefId
                        }
                    }
                }''',
            }, separators=(',', ':')).encode(),
        )['data']['getTVOOrgVideo']

        brightcove_id = traverse_obj(video_data, (
            'videoSource', 'brightcoveRefId', {str_or_none}, {require('Brightcove ID')}))

        return {
            '_type': 'url_transparent',
            'ie_key': BrightcoveNewIE.ie_key(),
            'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['CA']}),
            'display_id': display_id,
            'episode_id': display_id,
            **traverse_obj(video_data, {
                'title': ('title', {clean_html}, filter),
                'categories': ('contentCategory', {clean_html}, filter, all, filter),
                'description': ('description', {clean_html}, filter),
                'duration': ('length', {parse_duration}),
                'episode': ('title', {clean_html}, filter),
                'episode_number': ('programOrder', {int_or_none}),
                'season_number': ('season', {int_or_none}),
                'tags': ('tags', ..., {clean_html}, filter),
                'thumbnail': ('thumbnail', {url_or_none}),
                'timestamp': ('publishedAt', {parse_iso8601}),
            }),
            **traverse_obj(video_data, ('program', {
                'series': ('title', {clean_html}, filter),
                'series_id': ('nodeUrl', {clean_html}, {trim_str(start='/programs/')}, filter),
            })),
        }
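The new extractor above posts its GraphQL query as compact JSON — `separators=(',', ':')` drops the spaces the default serializer inserts between items and keys. A small sketch of building such a request body (the query text and slug here are illustrative):

```python
import json


def graphql_body(operation_name, query, variables):
    # Compact JSON payload, shaped like the getVideo request above
    return json.dumps({
        'operationName': operation_name,
        'variables': variables,
        'query': query,
    }, separators=(',', ':')).encode()


body = graphql_body(
    'getVideo',
    'query getVideo($slug: String) { getTVOOrgVideo(slug: $slug) { title } }',
    {'slug': '/video/the-pitch'})
```

The compact form only affects the JSON punctuation; whitespace inside string values (like the query text) is preserved.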
@@ -131,11 +131,15 @@ class TwitterBaseIE(InfoExtractor):
            video_id, headers=headers, query=query, expected_status=allowed_status,
            note=f'Downloading {"GraphQL" if graphql else "legacy API"} JSON')

        if result.get('errors'):
            errors = ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str}))))
            if errors and 'not authorized' in errors:
                self.raise_login_required(remove_end(errors, '.'))
            raise ExtractorError(f'Error(s) while querying API: {errors or "Unknown error"}')
        if error_msg := ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str})))):
            # Errors with the message 'Dependency: Unspecified' are a false positive
            # See https://github.com/yt-dlp/yt-dlp/issues/15963
            if error_msg.lower() == 'dependency: unspecified':
                self.write_debug(f'Ignoring Twitter API error: "{error_msg}"')
            elif 'not authorized' in error_msg.lower():
                self.raise_login_required(remove_end(error_msg, '.'))
            else:
                raise ExtractorError(f'Error(s) while querying API: {error_msg or "Unknown error"}')

        return result

@@ -1078,7 +1082,7 @@ class TwitterIE(TwitterBaseIE):
            raise ExtractorError(f'Twitter API says: {cause or "Unknown error"}', expected=True)
        elif typename == 'TweetUnavailable':
            reason = result.get('reason')
            if reason == 'NsfwLoggedOut':
            if reason in ('NsfwLoggedOut', 'NsfwViewerHasNoStatedAge'):
                self.raise_login_required('NSFW tweet requires authentication')
            elif reason == 'Protected':
                self.raise_login_required('You are not authorized to view this protected tweet')
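The rewritten error handling above joins a de-duplicated set of `errors[].message` strings before deciding how to react. A standalone sketch of that collection step (the payload shape follows the API responses handled above; sorting is added here only to make the output deterministic, where the real code joins an unordered set):

```python
def collect_error_messages(result):
    # Gather unique string messages from a GraphQL-style error payload
    messages = {
        err.get('message') for err in result.get('errors') or []
        if isinstance(err.get('message'), str)
    }
    return ', '.join(sorted(messages))


payload = {'errors': [
    {'message': 'Dependency: Unspecified'},
    {'message': 'Dependency: Unspecified'},
    {'message': 'Not authorized to view'},
]}
```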
@@ -67,6 +67,10 @@ class KnownDRMIE(UnsupportedInfoExtractor):
        r'plus\.rtl\.de(?!/podcast/)',
        r'mediasetinfinity\.es',
        r'tv5mondeplus\.com',
        r'tv\.rakuten\.co\.jp',
        r'watch\.telusoriginals\.com',
        r'video\.unext\.jp',
        r'www\.web\.nhk',
    )

    _TESTS = [{
@@ -231,6 +235,23 @@ class KnownDRMIE(UnsupportedInfoExtractor):
        # https://github.com/yt-dlp/yt-dlp/issues/14743
        'url': 'https://www.tv5mondeplus.com/',
        'only_matching': True,
    }, {
        # https://github.com/yt-dlp/yt-dlp/issues/8821
        'url': 'https://tv.rakuten.co.jp/content/519554/',
        'only_matching': True,
    }, {
        # https://github.com/yt-dlp/yt-dlp/issues/9851
        'url': 'https://watch.telusoriginals.com/play?assetID=fruit-is-ripe',
        'only_matching': True,
    }, {
        # https://github.com/yt-dlp/yt-dlp/issues/13220
        # https://github.com/yt-dlp/yt-dlp/issues/14564
        'url': 'https://video.unext.jp/play/SID0062010/ED00337407',
        'only_matching': True,
    }, {
        # https://github.com/yt-dlp/yt-dlp/issues/14620
        'url': 'https://www.web.nhk/tv/an/72hours/pl/series-tep-W3W8WRN8M3/ep/QW8ZY6146V',
        'only_matching': True,
    }]

    def _real_extract(self, url):
116
yt_dlp/extractor/visir.py
Normal file
@@ -0,0 +1,116 @@
import re

from .common import InfoExtractor
from ..utils import (
    UnsupportedError,
    clean_html,
    int_or_none,
    js_to_json,
    month_by_name,
    url_or_none,
    urljoin,
)
from ..utils.traversal import find_element, traverse_obj


class VisirIE(InfoExtractor):
    IE_DESC = 'Vísir'

    _VALID_URL = r'https?://(?:www\.)?visir\.is/(?P<type>k|player)/(?P<id>[\da-f-]+)(?:/(?P<slug>[\w.-]+))?'
    _EMBED_REGEX = [rf'<iframe[^>]+src=["\'](?P<url>{_VALID_URL})']
    _TESTS = [{
        'url': 'https://www.visir.is/k/eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
        'info_dict': {
            'id': 'eabb8f7f-ad87-46fb-9469-a0f1dc0fc4bc-1769022963988',
            'ext': 'mp4',
            'title': 'Sveppi og Siggi Þór mestu skaphundarnir',
            'categories': ['island-i-dag'],
            'description': 'md5:e06bd6a0cd8bdde328ad8cf00d3d4df6',
            'duration': 792,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260121',
            'view_count': int,
        },
    }, {
        'url': 'https://www.visir.is/k/b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704/tonlistarborgin-reykjavik',
        'info_dict': {
            'id': 'b0a88e02-eceb-4270-855c-8328b76b9d81-1763979306704',
            'ext': 'mp4',
            'title': 'Tónlistarborgin Reykjavík',
            'categories': ['tonlist'],
            'description': 'md5:47237589dc95dbde55dfbb163396f88a',
            'display_id': 'tonlistarborgin-reykjavik',
            'duration': 81,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20251124',
            'view_count': int,
        },
    }, {
        'url': 'https://www.visir.is/player/0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
        'info_dict': {
            'id': '0cd5709e-6870-46d0-aaaf-0ae637de94f1-1770060083580',
            'ext': 'mp4',
            'title': 'Sportpakkinn 2. febrúar 2026',
            'categories': ['sportpakkinn'],
            'display_id': 'sportpakkinn-2.-februar-2026',
            'duration': 293,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260202',
            'view_count': int,
        },
    }]
    _WEBPAGE_TESTS = [{
        'url': 'https://www.visir.is/g/20262837896d/segir-von-brigdin-med-prinsessuna-rista-djupt',
        'info_dict': {
            'id': '9ad5e58a-f26f-49f7-8b1d-68f0629485b7-1770059257365',
            'ext': 'mp4',
            'title': 'Norðmenn tala ekki um annað en prinsessuna',
            'categories': ['frettir'],
            'description': 'md5:53e2623ae79e1355778c14f5b557a0cd',
            'display_id': 'nordmenn-tala-ekki-um-annad-en-prinsessuna',
            'duration': 138,
            'thumbnail': r're:https?://www\.visir\.is/.+',
            'upload_date': '20260202',
            'view_count': int,
        },
    }]

    def _real_extract(self, url):
        video_type, video_id, display_id = self._match_valid_url(url).group('type', 'id', 'slug')
        webpage = self._download_webpage(url, video_id)
        if video_type == 'player':
            real_url = self._og_search_url(webpage)
            if not self.suitable(real_url) or self._match_valid_url(real_url).group('type') == 'player':
                raise UnsupportedError(real_url)
            return self.url_result(real_url, self.ie_key())

        upload_date = None
        date_elements = traverse_obj(webpage, (
            {find_element(cls='article-item__date')}, {clean_html}, filter, {str.split}))
        if date_elements and len(date_elements) == 3:
            day, month, year = date_elements
            day = int_or_none(day.rstrip('.'))
            month = month_by_name(month, 'is')
            if day and month and re.fullmatch(r'[0-9]{4}', year):
                upload_date = f'{year}{month:02d}{day:02d}'

        player = self._search_json(
            r'App\.Player\.Init\(', webpage, video_id, 'player', transform_source=js_to_json)
        m3u8_url = traverse_obj(player, ('File', {urljoin('https://vod.visir.is/')}))

        return {
            'id': video_id,
            'display_id': display_id,
            'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4'),
            'upload_date': upload_date,
            **traverse_obj(webpage, ({find_element(cls='article-item press-ads')}, {
                'description': ({find_element(cls='-large')}, {clean_html}, filter),
                'view_count': ({find_element(cls='article-item__viewcount')}, {clean_html}, {int_or_none}),
            })),
            **traverse_obj(player, {
                'title': ('Title', {clean_html}),
                'categories': ('Categoryname', {clean_html}, filter, all, filter),
                'duration': ('MediaDuration', {int_or_none}),
                'thumbnail': ('Image', {url_or_none}),
            }),
        }
@@ -1,6 +1,7 @@
import collections
import hashlib
import re
import urllib.parse

from .common import InfoExtractor
from .dailymotion import DailymotionIE
@@ -8,6 +9,7 @@ from .odnoklassniki import OdnoklassnikiIE
from .sibnet import SibnetEmbedIE
from .vimeo import VimeoIE
from .youtube import YoutubeIE
from ..jsinterp import JSInterpreter
from ..utils import (
    ExtractorError,
    UserNotLive,
@@ -36,16 +38,38 @@ class VKBaseIE(InfoExtractor):

    def _download_webpage_handle(self, url_or_request, video_id, *args, fatal=True, **kwargs):
        response = super()._download_webpage_handle(url_or_request, video_id, *args, fatal=fatal, **kwargs)
        challenge_url, cookie = response[1].url if response else '', None
        if challenge_url.startswith('https://vk.com/429.html?'):
            cookie = self._get_cookies(challenge_url).get('hash429')
        if not cookie:
        if response is False:
            return response

        hash429 = hashlib.md5(cookie.value.encode('ascii')).hexdigest()
        webpage, urlh = response
        challenge_url = urlh.url
        if urllib.parse.urlparse(challenge_url).path != '/challenge.html':
            return response

        self.to_screen(join_nonempty(
            video_id and f'[{video_id}]',
            'Received a JS challenge response',
            delim=' '))

        challenge_hash = traverse_obj(challenge_url, (
            {parse_qs}, 'hash429', -1, {require('challenge hash')}))

        func_code = self._search_regex(
            r'(?s)var\s+salt\s*=\s*\(\s*function\s*\(\)\s*(\{.+?\})\s*\)\(\);\s*var\s+hash',
            webpage, 'JS challenge salt function')

        jsi = JSInterpreter(f'function salt() {func_code}')
        salt = jsi.extract_function('salt')([])
        self.write_debug(f'Generated salt with native JS interpreter: {salt}')

        key_hash = hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()
        self.write_debug(f'JS challenge key hash: {key_hash}')

        # Request with the challenge key and the response should set a 'solution429' cookie
        self._request_webpage(
            update_url_query(challenge_url, {'key': hash429}), video_id, fatal=fatal,
            note='Resolving WAF challenge', errnote='Failed to bypass WAF challenge')
            update_url_query(challenge_url, {'key': key_hash}), video_id,
            'Submitting JS challenge solution', 'Unable to solve JS challenge', fatal=True)

        return super()._download_webpage_handle(url_or_request, video_id, *args, fatal=True, **kwargs)

    def _perform_login(self, username, password):
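The challenge solution in the hunk above is just an MD5 hex digest over the `hash429` query value and the salt computed by the page's JavaScript, joined with a colon. The derivation in isolation:

```python
import hashlib


def challenge_key(challenge_hash, salt):
    # md5("<hash429 value>:<salt>") hex digest, as computed in the hunk above
    return hashlib.md5(f'{challenge_hash}:{salt}'.encode()).hexdigest()
```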
@@ -3,6 +3,7 @@ import re
 import urllib.parse

 from .common import InfoExtractor
+from ..jsinterp import int_to_int32
 from ..utils import (
     ExtractorError,
     clean_html,
@@ -20,73 +21,69 @@ from ..utils import (
 )


-def to_signed_32(n):
-    return n % ((-1 if n < 0 else 1) * 2**32)
-
-
 class _ByteGenerator:
     def __init__(self, algo_id, seed):
         try:
             self._algorithm = getattr(self, f'_algo{algo_id}')
         except AttributeError:
             raise ExtractorError(f'Unknown algorithm ID "{algo_id}"')
-        self._s = to_signed_32(seed)
+        self._s = int_to_int32(seed)

     def _algo1(self, s):
         # LCG (a=1664525, c=1013904223, m=2^32)
         # Ref: https://en.wikipedia.org/wiki/Linear_congruential_generator
-        s = self._s = to_signed_32(s * 1664525 + 1013904223)
+        s = self._s = int_to_int32(s * 1664525 + 1013904223)
         return s

     def _algo2(self, s):
         # xorshift32
         # Ref: https://en.wikipedia.org/wiki/Xorshift
-        s = to_signed_32(s ^ (s << 13))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 17))
-        s = self._s = to_signed_32(s ^ (s << 5))
+        s = int_to_int32(s ^ (s << 13))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
+        s = self._s = int_to_int32(s ^ (s << 5))
         return s

     def _algo3(self, s):
         # Weyl Sequence (k≈2^32*φ, m=2^32) + MurmurHash3 (fmix32)
         # Ref: https://en.wikipedia.org/wiki/Weyl_sequence
         # https://commons.apache.org/proper/commons-codec/jacoco/org.apache.commons.codec.digest/MurmurHash3.java.html
-        s = self._s = to_signed_32(s + 0x9e3779b9)
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
-        s = to_signed_32(s * to_signed_32(0x85ebca77))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 13))
-        s = to_signed_32(s * to_signed_32(0xc2b2ae3d))
-        return to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 16))
+        s = self._s = int_to_int32(s + 0x9e3779b9)
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))
+        s = int_to_int32(s * int_to_int32(0x85ebca77))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 13))
+        s = int_to_int32(s * int_to_int32(0xc2b2ae3d))
+        return int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 16))

     def _algo4(self, s):
         # Custom scrambling function involving a left rotation (ROL)
-        s = self._s = to_signed_32(s + 0x6d2b79f5)
-        s = to_signed_32((s << 7) | ((s & 0xFFFFFFFF) >> 25))  # ROL 7
-        s = to_signed_32(s + 0x9e3779b9)
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 11))
-        return to_signed_32(s * 0x27d4eb2d)
+        s = self._s = int_to_int32(s + 0x6d2b79f5)
+        s = int_to_int32((s << 7) | ((s & 0xFFFFFFFF) >> 25))  # ROL 7
+        s = int_to_int32(s + 0x9e3779b9)
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 11))
+        return int_to_int32(s * 0x27d4eb2d)

     def _algo5(self, s):
         # xorshift variant with a final addition
-        s = to_signed_32(s ^ (s << 7))
-        s = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 9))
-        s = to_signed_32(s ^ (s << 8))
-        s = self._s = to_signed_32(s + 0xa5a5a5a5)
+        s = int_to_int32(s ^ (s << 7))
+        s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 9))
+        s = int_to_int32(s ^ (s << 8))
+        s = self._s = int_to_int32(s + 0xa5a5a5a5)
         return s

     def _algo6(self, s):
         # LCG (a=0x2c9277b5, c=0xac564b05) with a variable right shift scrambler
-        s = self._s = to_signed_32(s * to_signed_32(0x2c9277b5) + to_signed_32(0xac564b05))
-        s2 = to_signed_32(s ^ ((s & 0xFFFFFFFF) >> 18))
+        s = self._s = int_to_int32(s * int_to_int32(0x2c9277b5) + int_to_int32(0xac564b05))
+        s2 = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 18))
         shift = (s & 0xFFFFFFFF) >> 27 & 31
-        return to_signed_32((s2 & 0xFFFFFFFF) >> shift)
+        return int_to_int32((s2 & 0xFFFFFFFF) >> shift)

     def _algo7(self, s):
         # Weyl Sequence (k=0x9e3779b9) + custom multiply-xor-shift mixing function
-        s = self._s = to_signed_32(s + to_signed_32(0x9e3779b9))
-        e = to_signed_32(s ^ (s << 5))
-        e = to_signed_32(e * to_signed_32(0x7feb352d))
-        e = to_signed_32(e ^ ((e & 0xFFFFFFFF) >> 15))
-        return to_signed_32(e * to_signed_32(0x846ca68b))
+        s = self._s = int_to_int32(s + int_to_int32(0x9e3779b9))
+        e = int_to_int32(s ^ (s << 5))
+        e = int_to_int32(e * int_to_int32(0x7feb352d))
+        e = int_to_int32(e ^ ((e & 0xFFFFFFFF) >> 15))
+        return int_to_int32(e * int_to_int32(0x846ca68b))

     def __next__(self):
         return self._algorithm(self._s) & 0xFF
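As a quick sanity check outside the extractor, the xorshift32 step from `_algo2` can be exercised with a standalone sketch; `int_to_int32` here mirrors the jsinterp helper this changeset adds, and the driver values below are made up for illustration:

```python
def int_to_int32(n):
    # Two's-complement wrap to a signed 32-bit integer, like the jsinterp helper
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n


def xorshift32(s):
    # One state update of _ByteGenerator._algo2 (xorshift32)
    s = int_to_int32(s ^ (s << 13))
    s = int_to_int32(s ^ ((s & 0xFFFFFFFF) >> 17))
    return int_to_int32(s ^ (s << 5))


print(xorshift32(1))         # 270369
print(xorshift32(1) & 0xFF)  # 33, the byte __next__ would emit
print(int_to_int32(2**31))   # -2147483648
```

Masking with `0xFFFFFFFF` before the right shifts reproduces JavaScript's `>>>` on the Python side, which is why every step funnels through the signed-32-bit wrap.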
@@ -213,16 +210,9 @@ class XHamsterIE(InfoExtractor):
         'only_matching': True,
     }]

-    def _decipher_format_url(self, format_url, format_id):
-        parsed_url = urllib.parse.urlparse(format_url)
-
-        hex_string, path_remainder = self._search_regex(
-            r'^/(?P<hex>[0-9a-fA-F]{12,})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
-            default=(None, None), group=('hex', 'rem'))
-        if not hex_string:
-            self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
-            return None
+    _VALID_HEX_RE = r'[0-9a-fA-F]{12,}'
+
+    def _decipher_hex_string(self, hex_string, format_id):
         byte_data = bytes.fromhex(hex_string)
         seed = int.from_bytes(byte_data[1:5], byteorder='little', signed=True)

@@ -232,7 +222,33 @@ class XHamsterIE(InfoExtractor):
             self.report_warning(f'Skipping format "{format_id}": {e.msg}')
             return None

-        deciphered = bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
+        return bytearray(byte ^ next(byte_gen) for byte in byte_data[5:]).decode('latin-1')
+
+    def _decipher_format_url(self, format_url, format_id):
+        # format_url can be hex ciphertext or a URL with a hex ciphertext segment
+        if re.fullmatch(self._VALID_HEX_RE, format_url):
+            return self._decipher_hex_string(format_url, format_id)
+        elif not url_or_none(format_url):
+            if re.fullmatch(r'[0-9a-fA-F]+', format_url):
+                # Hex strings that are too short are expected, so we don't want to warn
+                self.write_debug(f'Skipping dummy ciphertext for "{format_id}": {format_url}')
+            else:
+                # Something has likely changed on the site's end, so we need to warn
+                self.report_warning(f'Skipping format "{format_id}": invalid ciphertext')
+            return None
+
+        parsed_url = urllib.parse.urlparse(format_url)
+
+        hex_string, path_remainder = self._search_regex(
+            rf'^/(?P<hex>{self._VALID_HEX_RE})(?P<rem>[/,].+)$', parsed_url.path, 'url components',
+            default=(None, None), group=('hex', 'rem'))
+        if not hex_string:
+            self.report_warning(f'Skipping format "{format_id}": unsupported URL format')
+            return None
+
+        deciphered = self._decipher_hex_string(hex_string, format_id)
+        if not deciphered:
+            return None

         return parsed_url._replace(path=f'/{deciphered}{path_remainder}').geturl()

@@ -252,7 +268,7 @@ class XHamsterIE(InfoExtractor):
         display_id = mobj.group('display_id') or mobj.group('display_id_2')

         desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
-        webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
+        webpage, urlh = self._download_webpage_handle(desktop_url, video_id, impersonate=True)

         error = self._html_search_regex(
             r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',

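The ciphertext layout that `_decipher_hex_string` parses (one algorithm-ID byte, then a little-endian signed 32-bit seed, then the XORed payload) can be illustrated with a made-up blob; the bytes below are not real site data:

```python
# Hypothetical ciphertext blob: 1 algorithm-ID byte, a little-endian signed
# 32-bit seed, then the XORed payload bytes
blob = bytes.fromhex('02efbeadde') + b'payload'

algo_id = blob[0]
seed = int.from_bytes(blob[1:5], byteorder='little', signed=True)

print(algo_id)  # 2
print(seed)     # -559038737 (0xdeadbeef read as a signed 32-bit value)
```

Reading the seed as signed is what makes the Python generators agree with the site's JavaScript, where all arithmetic happens on signed 32-bit integers.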
@@ -16,7 +16,7 @@ from ._redirect import (
     YoutubeYtBeIE,
     YoutubeYtUserIE,
 )
-from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
+from ._search import YoutubeMusicSearchURLIE, YoutubeSearchIE, YoutubeSearchURLIE
 from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
 from ._video import YoutubeIE

@@ -39,7 +39,6 @@ for _cls in [
     YoutubeYtBeIE,
     YoutubeYtUserIE,
     YoutubeMusicSearchURLIE,
-    YoutubeSearchDateIE,
     YoutubeSearchIE,
     YoutubeSearchURLIE,
     YoutubePlaylistIE,

@@ -28,21 +28,6 @@ class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
     }]


-class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
-    IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
-    _SEARCH_KEY = 'ytsearchdate'
-    IE_DESC = 'YouTube search, newest videos first'
-    _SEARCH_PARAMS = 'CAISAhAB8AEB'  # Videos only, sorted by date
-    _TESTS = [{
-        'url': 'ytsearchdate5:youtube-dl test video',
-        'playlist_count': 5,
-        'info_dict': {
-            'id': 'youtube-dl test video',
-            'title': 'youtube-dl test video',
-        },
-    }]
-
-
 class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
     IE_DESC = 'YouTube search URLs with sorting and filter support'
     IE_NAME = YoutubeSearchIE.IE_NAME + '_url'

@@ -139,11 +139,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     ]
     _RETURN_TYPE = 'video'  # XXX: How to handle multifeed?

-    _PLAYER_INFO_RE = (
-        r'/s/player/(?P<id>[a-zA-Z0-9_-]{8,})/(?:tv-)?player',
-        r'/(?P<id>[a-zA-Z0-9_-]{8,})/player(?:_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?|-plasma-ias-(?:phone|tablet)-[a-z]{2}_[A-Z]{2}\.vflset)/base\.js$',
-        r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.js$',
-    )
     _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
     _DEFAULT_CLIENTS = ('android_vr', 'web', 'web_safari')
     _DEFAULT_JSLESS_CLIENTS = ('android_vr',)
@@ -1879,17 +1874,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     }]

     _DEFAULT_PLAYER_JS_VERSION = 'actual'
-    _DEFAULT_PLAYER_JS_VARIANT = 'main'
+    _DEFAULT_PLAYER_JS_VARIANT = 'tv'
     _PLAYER_JS_VARIANT_MAP = {
         'main': 'player_ias.vflset/en_US/base.js',
         'tcc': 'player_ias_tcc.vflset/en_US/base.js',
         'tce': 'player_ias_tce.vflset/en_US/base.js',
         'es5': 'player_es5.vflset/en_US/base.js',
         'es6': 'player_es6.vflset/en_US/base.js',
         'es6_tcc': 'player_es6_tcc.vflset/en_US/base.js',
         'es6_tce': 'player_es6_tce.vflset/en_US/base.js',
         'tv': 'tv-player-ias.vflset/tv-player-ias.js',
         'tv_es6': 'tv-player-es6.vflset/tv-player-es6.js',
         'phone': 'player-plasma-ias-phone-en_US.vflset/base.js',
         'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js',  # Dead since 19712d96 (2025.11.06)
         'house': 'house_brand_player.vflset/en_US/base.js',  # Used by Google Drive
     }
     _INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()}

@@ -2179,13 +2176,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):

     @classmethod
     def _extract_player_info(cls, player_url):
-        for player_re in cls._PLAYER_INFO_RE:
-            id_m = re.search(player_re, player_url)
-            if id_m:
-                break
-        else:
-            raise ExtractorError(f'Cannot identify player {player_url!r}')
-        return id_m.group('id')
+        if m := re.search(r'/s/player/(?P<id>[a-fA-F0-9]{8,})/', player_url):
+            return m.group('id')
+        raise ExtractorError(f'Cannot identify player {player_url!r}')

     def _load_player(self, video_id, player_url, fatal=True):
        player_js_key = self._player_js_cache_key(player_url)
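The simplified pattern in the new `_extract_player_info` can be checked in isolation; the player URL below is hypothetical but follows the `/s/player/<id>/` shape the regex targets:

```python
import re

# The single pattern from the rewritten _extract_player_info
PLAYER_ID_RE = r'/s/player/(?P<id>[a-fA-F0-9]{8,})/'

# Hypothetical player JS URL for illustration
url = 'https://www.youtube.com/s/player/23dbe13b/player_ias.vflset/en_US/base.js'

m = re.search(PLAYER_ID_RE, url)
print(m.group('id'))  # 23dbe13b
```

Note the character class shrank from `[a-zA-Z0-9_-]` to hex digits only, so the three legacy fallback patterns (including the `vfl...` form) are no longer needed.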
@@ -3219,6 +3212,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         ])
         skip_player_js = 'js' in self._configuration_arg('player_skip')
         format_types = self._configuration_arg('formats')
+        skip_bad_formats = 'incomplete' not in format_types
         all_formats = 'duplicate' in format_types
         if self._configuration_arg('include_duplicate_formats'):
             all_formats = True

@@ -3464,7 +3458,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         https_fmts = []

         for fmt_stream in streaming_formats:
-            if fmt_stream.get('targetDurationSec'):
+            # Live adaptive https formats are not supported: skip unless extractor-arg given
+            if fmt_stream.get('targetDurationSec') and skip_bad_formats:
                 continue

             # FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment

@@ -3576,7 +3571,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         yield from process_https_formats()

         needs_live_processing = self._needs_live_processing(live_status, duration)
-        skip_bad_formats = 'incomplete' not in format_types

         skip_manifests = set(self._configuration_arg('skip'))
         if (needs_live_processing == 'is_live'  # These will be filtered out by YoutubeDL anyway

@@ -4086,16 +4080,33 @@ class YoutubeIE(YoutubeBaseInfoExtractor):

         needs_live_processing = self._needs_live_processing(live_status, duration)

-        def is_bad_format(fmt):
-            if needs_live_processing and not fmt.get('is_from_start'):
-                return True
-            elif (live_status == 'is_live' and needs_live_processing != 'is_live'
-                    and fmt.get('protocol') == 'http_dash_segments'):
-                return True
-
-        for fmt in filter(is_bad_format, formats):
-            fmt['preference'] = (fmt.get('preference') or -1) - 10
-            fmt['format_note'] = join_nonempty(fmt.get('format_note'), '(Last 2 hours)', delim=' ')
+        def adjust_incomplete_format(fmt, note_suffix='(Last 2 hours)', pref_adjustment=-10):
+            fmt['preference'] = (fmt.get('preference') or -1) + pref_adjustment
+            fmt['format_note'] = join_nonempty(fmt.get('format_note'), note_suffix, delim=' ')
+
+        # Adjust preference and format note for incomplete live/post-live formats
+        if live_status in ('is_live', 'post_live'):
+            for fmt in formats:
+                protocol = fmt.get('protocol')
+                # Currently, protocol isn't set for adaptive https formats, but this could change
+                is_adaptive = protocol in (None, 'http', 'https')
+                if live_status == 'post_live' and is_adaptive:
+                    # Post-live adaptive formats cause HttpFD to raise "Did not get any data blocks"
+                    # These formats are *only* useful to external applications, so we can hide them
+                    # Set their preference <= -1000 so that FormatSorter flags them as 'hidden'
+                    adjust_incomplete_format(fmt, note_suffix='(ended)', pref_adjustment=-5000)
+                # Is it live with --live-from-start? Or is it post-live and its duration is >2hrs?
+                elif needs_live_processing:
+                    if not fmt.get('is_from_start'):
+                        # Post-live m3u8 formats for >2hr streams
+                        adjust_incomplete_format(fmt)
+                elif live_status == 'is_live':
+                    if protocol == 'http_dash_segments':
+                        # Live DASH formats without --live-from-start
+                        adjust_incomplete_format(fmt)
+                    elif is_adaptive:
+                        # Incomplete live adaptive https formats
+                        adjust_incomplete_format(fmt, note_suffix='(incomplete)', pref_adjustment=-20)

         if needs_live_processing:
             self._prepare_live_from_start_formats(

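The effect of the new `adjust_incomplete_format` helper on a format dict can be sketched standalone; `join_nonempty` is a yt-dlp utility, so a stdlib stand-in is used here, and the format dict is made up:

```python
def adjust_incomplete_format(fmt, note_suffix='(Last 2 hours)', pref_adjustment=-10):
    # Mirrors the helper in the hunk; ' '.join(filter(None, ...)) stands in
    # for yt-dlp's join_nonempty utility
    fmt['preference'] = (fmt.get('preference') or -1) + pref_adjustment
    fmt['format_note'] = ' '.join(filter(None, [fmt.get('format_note'), note_suffix]))


# Hypothetical post-live adaptive format: pushed below -1000 so it sorts as 'hidden'
fmt = {'format_id': '299', 'format_note': '1080p60'}
adjust_incomplete_format(fmt, note_suffix='(ended)', pref_adjustment=-5000)
print(fmt['preference'], fmt['format_note'])  # -5001 1080p60 (ended)
```

Starting from `fmt.get('preference') or -1` means a format with no preference set ends up at `-1 + pref_adjustment`, which is how `-5000` guarantees a value at or below the `-1000` hiding threshold mentioned in the comments.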
@@ -1,10 +1,10 @@
|
||||
# This file is generated by devscripts/update_ejs.py. DO NOT MODIFY!
|
||||
|
||||
VERSION = '0.4.0'
|
||||
VERSION = '0.5.0'
|
||||
HASHES = {
|
||||
'yt.solver.bun.lib.js': '6ff45e94de9f0ea936a183c48173cfa9ce526ee4b7544cd556428427c1dd53c8073ef0174e79b320252bf0e7c64b0032cc1cf9c4358f3fda59033b7caa01c241',
|
||||
'yt.solver.core.js': '05964b458d92a65d4fb7a90bcb5921c9fed2370f4e4f2f25badb41f28aff9069e0b3c4e5bf1baf2d3021787b67fc6093cefa44de30cffdc6f9fb25532484003b',
|
||||
'yt.solver.core.min.js': '0cd3c0b37e095d3cca99443b58fe03980ac3bf2e777c2485c23e1f6052b5ede9f07c7f1c79a9c3af3258ea91a228f099741e7eb07b53125b5dcc84bb4c0054f3',
|
||||
'yt.solver.core.js': '9742868113d7b0c29e24a95c8eb2c2bec7cdf95513dc7f55f523ba053c0ecf2af7dcb0138b1d933578304f0dda633a6b3bfff64e912b4c547b99dad083428c4b',
|
||||
'yt.solver.core.min.js': 'aee8c3354cfd535809c871c2a517d03231f89cd184e903af82ee274bcc2e90991ef19cb3f65f2ccc858c4963856ea87f8692fe16d71209f4fc7f41c44b828e36',
|
||||
'yt.solver.deno.lib.js': '9c8ee3ab6c23e443a5a951e3ac73c6b8c1c8fb34335e7058a07bf99d349be5573611de00536dcd03ecd3cf34014c4e9b536081de37af3637c5390c6a6fd6a0f0',
|
||||
'yt.solver.lib.js': '1ee3753a8222fc855f5c39db30a9ccbb7967dbe1fb810e86dc9a89aa073a0907f294c720e9b65427d560a35aa1ce6af19ef854d9126a05ca00afe03f72047733',
|
||||
'yt.solver.lib.min.js': '8420c259ad16e99ce004e4651ac1bcabb53b4457bf5668a97a9359be9a998a789fee8ab124ee17f91a2ea8fd84e0f2b2fc8eabcaf0b16a186ba734cf422ad053',
|
||||
|
||||
@@ -60,26 +60,29 @@ var jsc = (function (meriyah, astring) {
|
||||
}
|
||||
return value;
|
||||
}
|
||||
const nsigExpression = {
|
||||
type: 'VariableDeclaration',
|
||||
kind: 'var',
|
||||
declarations: [
|
||||
const nsig = {
|
||||
type: 'CallExpression',
|
||||
callee: { or: [{ type: 'Identifier' }, { type: 'SequenceExpression' }] },
|
||||
arguments: [
|
||||
{},
|
||||
{
|
||||
type: 'VariableDeclarator',
|
||||
init: {
|
||||
type: 'CallExpression',
|
||||
callee: { type: 'Identifier' },
|
||||
arguments: [
|
||||
{ type: 'Literal' },
|
||||
{
|
||||
type: 'CallExpression',
|
||||
callee: { type: 'Identifier', name: 'decodeURIComponent' },
|
||||
},
|
||||
],
|
||||
},
|
||||
type: 'CallExpression',
|
||||
callee: { type: 'Identifier', name: 'decodeURIComponent' },
|
||||
arguments: [{}],
|
||||
},
|
||||
],
|
||||
};
|
||||
const nsigAssignment = {
|
||||
type: 'AssignmentExpression',
|
||||
left: { type: 'Identifier' },
|
||||
operator: '=',
|
||||
right: nsig,
|
||||
};
|
||||
const nsigDeclarator = {
|
||||
type: 'VariableDeclarator',
|
||||
id: { type: 'Identifier' },
|
||||
init: nsig,
|
||||
};
|
||||
const logicalExpression = {
|
||||
type: 'ExpressionStatement',
|
||||
expression: {
|
||||
@@ -97,6 +100,17 @@ var jsc = (function (meriyah, astring) {
|
||||
callee: { type: 'Identifier' },
|
||||
arguments: {
|
||||
or: [
|
||||
[
|
||||
{
|
||||
type: 'CallExpression',
|
||||
callee: {
|
||||
type: 'Identifier',
|
||||
name: 'decodeURIComponent',
|
||||
},
|
||||
arguments: [{ type: 'Identifier' }],
|
||||
optional: false,
|
||||
},
|
||||
],
|
||||
[
|
||||
{ type: 'Literal' },
|
||||
{
|
||||
@@ -110,6 +124,8 @@ var jsc = (function (meriyah, astring) {
|
||||
},
|
||||
],
|
||||
[
|
||||
{ type: 'Literal' },
|
||||
{ type: 'Literal' },
|
||||
{
|
||||
type: 'CallExpression',
|
||||
callee: {
|
||||
@@ -138,18 +154,18 @@ var jsc = (function (meriyah, astring) {
|
||||
expression: {
|
||||
type: 'AssignmentExpression',
|
||||
operator: '=',
|
||||
left: { type: 'Identifier' },
|
||||
right: { type: 'FunctionExpression', params: [{}, {}, {}] },
|
||||
left: { or: [{ type: 'Identifier' }, { type: 'MemberExpression' }] },
|
||||
right: { type: 'FunctionExpression' },
|
||||
},
|
||||
},
|
||||
{ type: 'FunctionDeclaration', params: [{}, {}, {}] },
|
||||
{ type: 'FunctionDeclaration' },
|
||||
{
|
||||
type: 'VariableDeclaration',
|
||||
declarations: {
|
||||
anykey: [
|
||||
{
|
||||
type: 'VariableDeclarator',
|
||||
init: { type: 'FunctionExpression', params: [{}, {}, {}] },
|
||||
init: { type: 'FunctionExpression' },
|
||||
},
|
||||
],
|
||||
},
|
||||
@@ -157,124 +173,150 @@ var jsc = (function (meriyah, astring) {
|
||||
],
|
||||
};
|
||||
function extract$1(node) {
|
||||
if (!matchesStructure(node, identifier$1)) {
|
||||
return null;
|
||||
}
|
||||
let block;
|
||||
if (
|
||||
const blocks = [];
|
||||
if (matchesStructure(node, identifier$1)) {
|
||||
if (
|
||||
node.type === 'ExpressionStatement' &&
|
||||
node.expression.type === 'AssignmentExpression' &&
|
||||
node.expression.right.type === 'FunctionExpression' &&
|
||||
node.expression.right.params.length >= 3
|
||||
) {
|
||||
blocks.push(node.expression.right.body);
|
||||
} else if (node.type === 'VariableDeclaration') {
|
||||
for (const decl of node.declarations) {
|
||||
if (
|
||||
_optionalChain$2([
|
||||
decl,
|
||||
'access',
|
||||
(_) => _.init,
|
||||
'optionalAccess',
|
||||
(_2) => _2.type,
|
||||
]) === 'FunctionExpression' &&
|
||||
decl.init.params.length >= 3
|
||||
) {
|
||||
blocks.push(decl.init.body);
|
||||
}
|
||||
}
|
||||
} else if (
|
||||
node.type === 'FunctionDeclaration' &&
|
||||
node.params.length >= 3
|
||||
) {
|
||||
blocks.push(node.body);
|
||||
} else {
|
||||
return null;
|
||||
}
|
||||
} else if (
|
||||
node.type === 'ExpressionStatement' &&
|
||||
node.expression.type === 'AssignmentExpression' &&
|
||||
node.expression.right.type === 'FunctionExpression'
|
||||
node.expression.type === 'SequenceExpression'
|
||||
) {
|
||||
block = node.expression.right.body;
|
||||
} else if (node.type === 'VariableDeclaration') {
|
||||
for (const decl of node.declarations) {
|
||||
for (const expr of node.expression.expressions) {
|
||||
if (
|
||||
decl.type === 'VariableDeclarator' &&
|
||||
_optionalChain$2([
|
||||
decl,
|
||||
'access',
|
||||
(_) => _.init,
|
||||
'optionalAccess',
|
||||
(_2) => _2.type,
|
||||
]) === 'FunctionExpression' &&
|
||||
_optionalChain$2([
|
||||
decl,
|
||||
'access',
|
||||
(_3) => _3.init,
|
||||
'optionalAccess',
|
||||
(_4) => _4.params,
|
||||
'access',
|
||||
(_5) => _5.length,
|
||||
]) === 3
|
||||
expr.type === 'AssignmentExpression' &&
|
||||
expr.right.type === 'FunctionExpression' &&
|
||||
expr.right.params.length === 3
|
||||
) {
|
||||
block = decl.init.body;
|
||||
break;
|
||||
blocks.push(expr.right.body);
|
||||
}
|
||||
}
|
||||
} else if (node.type === 'FunctionDeclaration') {
|
||||
block = node.body;
|
||||
} else {
|
||||
return null;
|
||||
}
|
||||
const relevantExpression = _optionalChain$2([
|
||||
block,
|
||||
'optionalAccess',
|
||||
(_6) => _6.body,
|
||||
'access',
|
||||
(_7) => _7.at,
|
||||
'call',
|
||||
(_8) => _8(-2),
|
||||
]);
|
||||
let call = null;
|
||||
if (matchesStructure(relevantExpression, logicalExpression)) {
|
||||
if (
|
||||
_optionalChain$2([
|
||||
relevantExpression,
|
||||
'optionalAccess',
|
||||
(_9) => _9.type,
|
||||
]) !== 'ExpressionStatement' ||
|
||||
relevantExpression.expression.type !== 'LogicalExpression' ||
|
||||
relevantExpression.expression.right.type !== 'SequenceExpression' ||
|
||||
relevantExpression.expression.right.expressions[0].type !==
|
||||
'AssignmentExpression' ||
|
||||
relevantExpression.expression.right.expressions[0].right.type !==
|
||||
'CallExpression'
|
||||
) {
|
||||
return null;
|
||||
}
|
||||
call = relevantExpression.expression.right.expressions[0].right;
|
||||
} else if (
|
||||
_optionalChain$2([
|
||||
relevantExpression,
|
||||
'optionalAccess',
|
||||
(_10) => _10.type,
|
||||
]) === 'IfStatement' &&
|
||||
relevantExpression.consequent.type === 'BlockStatement'
|
||||
) {
|
||||
for (const n of relevantExpression.consequent.body) {
|
||||
if (!matchesStructure(n, nsigExpression)) {
|
||||
continue;
|
||||
for (const block of blocks) {
|
||||
let call = null;
|
||||
for (const stmt of block.body) {
|
||||
if (matchesStructure(stmt, logicalExpression)) {
|
||||
if (
|
||||
stmt.type === 'ExpressionStatement' &&
|
||||
stmt.expression.type === 'LogicalExpression' &&
|
||||
stmt.expression.right.type === 'SequenceExpression' &&
|
||||
stmt.expression.right.expressions[0].type ===
|
||||
'AssignmentExpression' &&
|
||||
stmt.expression.right.expressions[0].right.type === 'CallExpression'
|
||||
) {
|
||||
call = stmt.expression.right.expressions[0].right;
|
||||
}
|
||||
} else if (stmt.type === 'IfStatement') {
|
||||
let consequent = stmt.consequent;
|
||||
while (consequent.type === 'LabeledStatement') {
|
||||
consequent = consequent.body;
|
||||
}
|
||||
if (consequent.type !== 'BlockStatement') {
|
||||
continue;
|
||||
}
|
||||
for (const n of consequent.body) {
|
||||
if (n.type !== 'VariableDeclaration') {
|
||||
continue;
|
||||
}
|
||||
for (const decl of n.declarations) {
|
||||
if (
|
||||
matchesStructure(decl, nsigDeclarator) &&
|
||||
_optionalChain$2([
|
||||
decl,
|
||||
'access',
|
||||
(_3) => _3.init,
|
||||
'optionalAccess',
|
||||
(_4) => _4.type,
|
||||
]) === 'CallExpression'
|
||||
) {
|
||||
call = decl.init;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (call) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
} else if (stmt.type === 'ExpressionStatement') {
|
||||
if (
|
||||
stmt.expression.type !== 'LogicalExpression' ||
|
||||
stmt.expression.operator !== '&&' ||
|
||||
stmt.expression.right.type !== 'SequenceExpression'
|
||||
) {
|
||||
continue;
|
||||
}
|
||||
for (const expr of stmt.expression.right.expressions) {
|
||||
if (matchesStructure(expr, nsigAssignment) && expr.type) {
|
||||
if (
|
||||
expr.type === 'AssignmentExpression' &&
|
||||
expr.right.type === 'CallExpression'
|
||||
) {
|
||||
call = expr.right;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if (
|
||||
n.type !== 'VariableDeclaration' ||
|
||||
_optionalChain$2([
|
||||
n,
|
||||
'access',
|
||||
(_11) => _11.declarations,
|
||||
'access',
|
||||
(_12) => _12[0],
|
||||
'access',
|
||||
(_13) => _13.init,
|
||||
'optionalAccess',
|
||||
(_14) => _14.type,
|
||||
]) !== 'CallExpression'
|
||||
) {
|
||||
continue;
|
||||
if (call) {
|
||||
break;
|
||||
}
|
||||
call = n.declarations[0].init;
|
||||
break;
|
||||
}
|
||||
if (!call) {
|
||||
continue;
|
||||
}
|
||||
return {
|
||||
type: 'ArrowFunctionExpression',
|
||||
params: [{ type: 'Identifier', name: 'sig' }],
|
||||
body: {
|
||||
type: 'CallExpression',
|
||||
callee: call.callee,
|
||||
arguments: call.arguments.map((arg) => {
|
||||
if (
|
||||
arg.type === 'CallExpression' &&
|
||||
arg.callee.type === 'Identifier' &&
|
||||
arg.callee.name === 'decodeURIComponent'
|
||||
) {
|
||||
return { type: 'Identifier', name: 'sig' };
|
||||
}
|
||||
return arg;
|
||||
}),
|
||||
optional: false,
|
||||
},
|
||||
async: false,
|
||||
expression: false,
|
||||
generator: false,
|
||||
};
|
||||
}
|
||||
if (call === null) {
|
||||
return null;
|
||||
}
|
||||
return {
|
||||
type: 'ArrowFunctionExpression',
|
||||
params: [{ type: 'Identifier', name: 'sig' }],
|
||||
body: {
|
||||
type: 'CallExpression',
|
||||
callee: { type: 'Identifier', name: call.callee.name },
|
||||
arguments:
|
||||
call.arguments.length === 1
|
||||
? [{ type: 'Identifier', name: 'sig' }]
|
||||
: [call.arguments[0], { type: 'Identifier', name: 'sig' }],
|
||||
optional: false,
|
||||
},
|
||||
async: false,
|
||||
expression: false,
|
||||
generator: false,
|
||||
};
|
||||
return null;
|
||||
}
|
||||
function _optionalChain$1(ops) {
|
||||
let lastAccessLHS = undefined;
|
||||
@@ -472,8 +514,31 @@ var jsc = (function (meriyah, astring) {
|
||||
return value;
|
||||
}
|
||||
function preprocessPlayer(data) {
|
||||
const ast = meriyah.parse(data);
|
||||
const body = ast.body;
|
||||
const program = meriyah.parse(data);
|
||||
const plainStatements = modifyPlayer(program);
|
||||
const solutions = getSolutions(plainStatements);
|
||||
for (const [name, options] of Object.entries(solutions)) {
|
||||
plainStatements.push({
|
||||
type: 'ExpressionStatement',
|
||||
expression: {
|
||||
type: 'AssignmentExpression',
|
||||
operator: '=',
|
||||
left: {
|
||||
type: 'MemberExpression',
|
||||
computed: false,
|
||||
object: { type: 'Identifier', name: '_result' },
|
||||
property: { type: 'Identifier', name: name },
|
||||
optional: false,
|
||||
},
|
||||
right: multiTry(options),
|
||||
},
|
||||
});
|
||||
}
|
||||
program.body.splice(0, 0, ...setupNodes);
|
||||
return astring.generate(program);
|
||||
}
|
||||
function modifyPlayer(program) {
|
||||
const body = program.body;
|
||||
const block = (() => {
|
||||
switch (body.length) {
|
||||
case 1: {
|
||||
@@ -506,16 +571,7 @@ var jsc = (function (meriyah, astring) {
|
||||
}
|
||||
throw 'unexpected structure';
|
||||
})();
|
||||
const found = { n: [], sig: [] };
|
||||
const plainExpressions = block.body.filter((node) => {
|
||||
const n = extract(node);
|
||||
if (n) {
|
||||
found.n.push(n);
|
||||
}
|
||||
const sig = extract$1(node);
|
||||
if (sig) {
|
||||
found.sig.push(sig);
|
||||
}
|
||||
block.body = block.body.filter((node) => {
|
||||
if (node.type === 'ExpressionStatement') {
|
||||
if (node.expression.type === 'AssignmentExpression') {
|
||||
return true;
|
||||
@@ -524,41 +580,241 @@ var jsc = (function (meriyah, astring) {
|
||||
}
|
||||
return true;
|
||||
});
|
||||
block.body = plainExpressions;
|
||||
for (const [name, options] of Object.entries(found)) {
|
||||
const unique = new Set(options.map((x) => JSON.stringify(x)));
|
||||
if (unique.size !== 1) {
|
||||
const message = `found ${unique.size} ${name} function possibilities`;
|
||||
throw (
|
||||
message +
|
||||
(unique.size
|
||||
? `: ${options.map((x) => astring.generate(x)).join(', ')}`
|
||||
: '')
|
||||
);
|
||||
return block.body;
|
||||
}
|
||||
function getSolutions(statements) {
|
||||
const found = { n: [], sig: [] };
|
||||
for (const statement of statements) {
|
||||
const n = extract(statement);
|
||||
if (n) {
|
||||
found.n.push(n);
|
||||
}
|
||||
const sig = extract$1(statement);
|
||||
if (sig) {
|
||||
found.sig.push(sig);
|
||||
}
|
||||
plainExpressions.push({
|
||||
type: 'ExpressionStatement',
|
||||
expression: {
|
||||
type: 'AssignmentExpression',
|
||||
operator: '=',
|
||||
left: {
|
||||
type: 'MemberExpression',
|
||||
computed: false,
|
||||
object: { type: 'Identifier', name: '_result' },
|
||||
property: { type: 'Identifier', name: name },
|
||||
},
|
||||
right: options[0],
|
||||
},
|
||||
});
|
||||
}
|
||||
ast.body.splice(0, 0, ...setupNodes);
|
||||
return astring.generate(ast);
|
||||
return found;
|
||||
}
|
||||
function getFromPrepared(code) {
|
||||
const resultObj = { n: null, sig: null };
|
||||
Function('_result', code)(resultObj);
|
||||
return resultObj;
|
||||
}
|
||||
function multiTry(generators) {
|
||||
return {
|
||||
type: 'ArrowFunctionExpression',
|
||||
params: [{ type: 'Identifier', name: '_input' }],
|
||||
body: {
|
||||
type: 'BlockStatement',
|
||||
body: [
|
||||
{
|
||||
type: 'VariableDeclaration',
|
||||
kind: 'const',
|
||||
declarations: [
|
||||
{
|
||||
type: 'VariableDeclarator',
|
||||
id: { type: 'Identifier', name: '_results' },
|
||||
init: {
|
||||
type: 'NewExpression',
|
||||
callee: { type: 'Identifier', name: 'Set' },
|
||||
arguments: [],
|
||||
},
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
type: 'ForOfStatement',
|
||||
left: {
|
||||
type: 'VariableDeclaration',
|
||||
kind: 'const',
|
||||
declarations: [
|
||||
{
|
||||
type: 'VariableDeclarator',
|
||||
id: { type: 'Identifier', name: '_generator' },
|
||||
init: null,
|
||||
},
|
||||
],
|
||||
},
|
||||
right: { type: 'ArrayExpression', elements: generators },
|
||||
body: {
|
||||
type: 'BlockStatement',
|
||||
body: [
|
||||
{
|
||||
type: 'TryStatement',
|
||||
block: {
|
||||
type: 'BlockStatement',
|
||||
body: [
|
||||
{
|
||||
type: 'ExpressionStatement',
|
||||
expression: {
|
||||
type: 'CallExpression',
|
||||
callee: {
|
||||
type: 'MemberExpression',
|
||||
object: { type: 'Identifier', name: '_results' },
|
||||
computed: false,
|
||||
property: { type: 'Identifier', name: 'add' },
|
||||
optional: false,
|
||||
},
|
||||
arguments: [
|
||||
{
|
||||
type: 'CallExpression',
|
||||
callee: {
|
||||
type: 'Identifier',
|
||||
name: '_generator',
                            },
                            arguments: [
                              { type: 'Identifier', name: '_input' },
                            ],
                            optional: false,
                          },
                        ],
                        optional: false,
                      },
                    },
                  ],
                },
                handler: {
                  type: 'CatchClause',
                  param: null,
                  body: { type: 'BlockStatement', body: [] },
                },
                finalizer: null,
              },
            ],
          },
          await: false,
        },
        {
          type: 'IfStatement',
          test: {
            type: 'UnaryExpression',
            operator: '!',
            argument: {
              type: 'MemberExpression',
              object: { type: 'Identifier', name: '_results' },
              computed: false,
              property: { type: 'Identifier', name: 'size' },
              optional: false,
            },
            prefix: true,
          },
          consequent: {
            type: 'BlockStatement',
            body: [
              {
                type: 'ThrowStatement',
                argument: {
                  type: 'TemplateLiteral',
                  expressions: [],
                  quasis: [
                    {
                      type: 'TemplateElement',
                      value: { cooked: 'no solutions', raw: 'no solutions' },
                      tail: true,
                    },
                  ],
                },
              },
            ],
          },
          alternate: null,
        },
        {
          type: 'IfStatement',
          test: {
            type: 'BinaryExpression',
            left: {
              type: 'MemberExpression',
              object: { type: 'Identifier', name: '_results' },
              computed: false,
              property: { type: 'Identifier', name: 'size' },
              optional: false,
            },
            right: { type: 'Literal', value: 1 },
            operator: '!==',
          },
          consequent: {
            type: 'BlockStatement',
            body: [
              {
                type: 'ThrowStatement',
                argument: {
                  type: 'TemplateLiteral',
                  expressions: [
                    {
                      type: 'CallExpression',
                      callee: {
                        type: 'MemberExpression',
                        object: { type: 'Identifier', name: '_results' },
                        computed: false,
                        property: { type: 'Identifier', name: 'join' },
                        optional: false,
                      },
                      arguments: [{ type: 'Literal', value: ', ' }],
                      optional: false,
                    },
                  ],
                  quasis: [
                    {
                      type: 'TemplateElement',
                      value: {
                        cooked: 'invalid solutions: ',
                        raw: 'invalid solutions: ',
                      },
                      tail: false,
                    },
                    {
                      type: 'TemplateElement',
                      value: { cooked: '', raw: '' },
                      tail: true,
                    },
                  ],
                },
              },
            ],
          },
          alternate: null,
        },
        {
          type: 'ReturnStatement',
          argument: {
            type: 'MemberExpression',
            object: {
              type: 'CallExpression',
              callee: {
                type: 'MemberExpression',
                object: {
                  type: 'CallExpression',
                  callee: {
                    type: 'MemberExpression',
                    object: { type: 'Identifier', name: '_results' },
                    computed: false,
                    property: { type: 'Identifier', name: 'values' },
                    optional: false,
                  },
                  arguments: [],
                  optional: false,
                },
                computed: false,
                property: { type: 'Identifier', name: 'next' },
                optional: false,
              },
              arguments: [],
              optional: false,
            },
            computed: false,
            property: { type: 'Identifier', name: 'value' },
            optional: false,
          },
        },
      ],
    },
    async: false,
    expression: false,
    generator: false,
  };
}

function main(input) {
  const preprocessedPlayer =
    input.type === 'player'
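Decoded by hand, the tail of this AST fragment corresponds roughly to the JavaScript below. This is a sketch, not the fixture's actual source: the function name `solve`, the loop iterable, and the call inside the `try` stand in for parts that lie outside the fragment, while `_input` and `_results` are the identifiers the AST itself names.

```javascript
// Hand-reconstructed sketch of the AST fragment above.
function solve(_input) {
  const _results = new Set();
  // The fragment opens mid-way through a for-of loop whose body is a
  // try/catch around a call taking `_input`; the callee is elided in the
  // fragment, so a placeholder is used here.
  for (const candidate of [_input]) {
    try {
      _results.add(String(candidate)); // placeholder for the elided call
    } catch {}
  }
  if (!_results.size) {
    throw `no solutions`;
  }
  if (_results.size !== 1) {
    // NB: the AST literally calls `_results.join(', ')`, though `.size`
    // suggests a Set (which has no `join`); spread added for runnability
    throw `invalid solutions: ${[..._results].join(', ')}`;
  }
  return _results.values().next().value;
}
```

Note the error paths throw bare template literals rather than `Error` objects, exactly as the `ThrowStatement` nodes encode.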
@@ -18,6 +18,14 @@ from .utils import (
 )
 
 
+def int_to_int32(n):
+    """Converts an integer to a signed 32-bit integer"""
+    n &= 0xFFFFFFFF
+    if n & 0x80000000:
+        return n - 0x100000000
+    return n
+
+
 def _js_bit_op(op):
     def zeroise(x):
         if x in (None, JS_Undefined):
@@ -28,7 +36,7 @@ def _js_bit_op(op):
         return int(float(x))
 
     def wrapped(a, b):
-        return op(zeroise(a), zeroise(b)) & 0xffffffff
+        return int_to_int32(op(int_to_int32(zeroise(a)), int_to_int32(zeroise(b))))
 
     return wrapped
 
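The `int_to_int32` helper added above exists because JavaScript bitwise operators work on signed 32-bit values: the previous `& 0xffffffff` mask kept results in range but lost the sign. A quick sanity check of the two's-complement wrap (the function body is copied from the patch; the assertions are illustrative):

```python
def int_to_int32(n):
    """Converts an integer to a signed 32-bit integer"""
    n &= 0xFFFFFFFF
    if n & 0x80000000:
        return n - 0x100000000
    return n

# JS ToInt32 wraps into [-2**31, 2**31 - 1]:
assert int_to_int32(0xFFFFFFFF) == -1
assert int_to_int32(0x80000000) == -(2 ** 31)
assert int_to_int32(1 << 32) == 0
# Masking alone (the old behavior) keeps the value positive:
assert (-1 & 0xFFFFFFFF) == 4294967295
```

This is why the patch wraps both the operands and the result: intermediate values must also be reinterpreted as signed before the Python-level operator runs.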
@@ -368,6 +376,10 @@ class JSInterpreter:
         if not _OPERATORS.get(op):
             return right_val
 
+        # TODO: This is only correct for str+str and str+number; fix for str+array, str+object, etc
+        if op == '+' and (isinstance(left_val, str) or isinstance(right_val, str)):
+            return f'{left_val}{right_val}'
+
         try:
             return _OPERATORS[op](left_val, right_val)
         except Exception as e:
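The `+` special case above mirrors JavaScript semantics: when either operand is a string, `+` is string concatenation, not addition. A minimal standalone sketch of that rule (the `js_add` name is hypothetical, not from the patch):

```python
def js_add(left_val, right_val):
    # If either operand is a string, JS performs string concatenation;
    # correct for str+str and str+number, per the patch's TODO note
    if isinstance(left_val, str) or isinstance(right_val, str):
        return f'{left_val}{right_val}'
    return left_val + right_val

assert js_add('n=', 42) == 'n=42'
assert js_add(1, '2') == '12'
assert js_add(1, 2) == 3
```

As the TODO in the patch notes, this is incomplete: JS would first convert arrays and objects via ToPrimitive, which an f-string over Python containers does not reproduce.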
@@ -377,7 +389,7 @@ class JSInterpreter:
         if idx == 'length':
             return len(obj)
         try:
-            return obj[int(idx)] if isinstance(obj, list) else obj[idx]
+            return obj[int(idx)] if isinstance(obj, list) else obj[str(idx)]
         except Exception as e:
             if allow_undefined:
                 return JS_Undefined
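The `obj[str(idx)]` change reflects that JS object property keys are strings, so a numeric index into a dict-backed object must be stringified; only arrays (Python lists) take integer indices. A sketch of the lookup rule (the `index` helper name is illustrative, not yt-dlp's):

```python
def index(obj, idx):
    # Lists model JS arrays (int indices); dicts model JS objects,
    # whose keys are always strings, so the index is coerced with str()
    return obj[int(idx)] if isinstance(obj, list) else obj[str(idx)]

arr = ['a', 'b', 'c']
obj = {'0': 'zero', 'length': 2}
assert index(arr, '1') == 'b'   # list: coerce index to int
assert index(obj, 0) == 'zero'  # dict: coerce key to str
```

Under the old code, `obj[idx]` with an integer `0` would miss the string key `'0'` and fall through to the exception path.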
@@ -175,6 +175,13 @@ _TARGETS_COMPAT_LOOKUP = {
     'safari180_ios': 'safari18_0_ios',
 }
 
+# These targets are known to be insufficient, unreliable or blocked
+# See: https://github.com/yt-dlp/yt-dlp/issues/16012
+_DEPRIORITIZED_TARGETS = {
+    ImpersonateTarget('chrome', '133', 'macos', '15'),  # chrome133a
+    ImpersonateTarget('chrome', '136', 'macos', '15'),  # chrome136
+}
+
 
 @register_rh
 class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
@@ -192,6 +199,8 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
             for version, targets in BROWSER_TARGETS.items()
             if curl_cffi_version >= version
         ), key=lambda x: (
+            # deprioritize unreliable targets so they are not selected by default
+            x[1] not in _DEPRIORITIZED_TARGETS,
             # deprioritize mobile targets since they give very different behavior
             x[1].os not in ('ios', 'android'),
             # prioritize tor < edge < firefox < safari < chrome
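The sort-key pattern used above ranks candidates with a tuple of booleans: each `not in` test yields `True` for preferred entries. A self-contained sketch of the idea, not the actual `CurlCFFIRH` code (target names, the `reverse=True` direction, and the tuples here are illustrative assumptions):

```python
# Known-unreliable and mobile targets should not be picked by default
DEPRIORITIZED = {'chrome133a', 'chrome136'}
MOBILE = {'ios', 'android'}

targets = [
    ('chrome136', 'macos'),
    ('safari18_0', 'ios'),
    ('chrome132', 'windows'),
]
# True sorts after False, so with reverse=True the fully-preferred
# (True, True) keys come first and deprioritized entries fall to the back
ranked = sorted(
    targets,
    key=lambda t: (t[0] not in DEPRIORITIZED, t[1] not in MOBILE),
    reverse=True,
)
assert ranked[0] == ('chrome132', 'windows')  # reliable desktop target first
assert ranked[-1] == ('chrome136', 'macos')   # deprioritized target last
```

Because tuples compare lexicographically, earlier key components dominate: reliability outranks the desktop-vs-mobile preference, which in turn outranks the browser ordering in the real code.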
@@ -511,7 +511,7 @@ def create_parser():
     general.add_option(
         '--live-from-start',
         action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
+        help='Download livestreams from the start. Currently experimental and only supported for YouTube, Twitch, and TVer')
     general.add_option(
         '--no-live-from-start',
         action='store_false', dest='live_from_start',
@@ -75,6 +75,9 @@ MONTH_NAMES = {
     'fr': [
         'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
         'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
+    'is': [
+        'janúar', 'febrúar', 'mars', 'apríl', 'maí', 'júní',
+        'júlí', 'ágúst', 'september', 'október', 'nóvember', 'desember'],
     # these follow the genitive grammatical case (dopełniacz)
     # some websites might be using nominative, which will require another month list
     # https://en.wikibooks.org/wiki/Polish/Noun_cases
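Month lists like the Icelandic one added above let the date parser resolve a localized month name to its number by position. A hypothetical lookup sketch (the `is_month_number` helper is illustrative, not yt-dlp's actual parsing code):

```python
# The 'is' month list added in the patch, 1-indexed by position
MONTH_NAMES_IS = [
    'janúar', 'febrúar', 'mars', 'apríl', 'maí', 'júní',
    'júlí', 'ágúst', 'september', 'október', 'nóvember', 'desember']

def is_month_number(name):
    # list.index is 0-based, so add 1 to get the calendar month
    return MONTH_NAMES_IS.index(name.lower()) + 1

assert is_month_number('október') == 10
assert is_month_number('Janúar') == 1
```

Once the name is mapped to a number, a date like "5. október 2025" can be normalized to `20251005` by the usual date helpers.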
@@ -1,8 +1,8 @@
 # Autogenerated by devscripts/update-version.py
 
-__version__ = '2026.01.31'
+__version__ = '2026.02.21'
 
-RELEASE_GIT_HEAD = '9a9a6b6fe44a30458c1754ef064f354f04a84004'
+RELEASE_GIT_HEAD = '646bb31f39614e6c2f7ba687c53e7496394cbadb'
 
 VARIANT = None
 
@@ -12,4 +12,4 @@ CHANNEL = 'stable'
 
 ORIGIN = 'yt-dlp/yt-dlp'
 
-_pkg_version = '2026.01.31'
+_pkg_version = '2026.02.21'