mirror of https://github.com/yt-dlp/yt-dlp.git
synced 2025-08-25 05:48:29 +00:00
Merge branch 'master' into jsi
commit 12c405341c

@@ -770,3 +770,8 @@ NeonMan
 pj47x
 troex
 WouterGordts
+baierjan
+GeoffreyFrogeye
+Pawka
+v3DJG6GL
+yozel

46 Changelog.md
@@ -4,6 +4,52 @@ # Changelog
 # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
 -->

+### 2025.05.22
+
+#### Core changes
+- **cookies**: [Fix Linux desktop environment detection](https://github.com/yt-dlp/yt-dlp/commit/e491fd4d090db3af52a82863fb0553dd5e17fb85) ([#13197](https://github.com/yt-dlp/yt-dlp/issues/13197)) by [mbway](https://github.com/mbway)
+- **jsinterp**: [Fix increment/decrement evaluation](https://github.com/yt-dlp/yt-dlp/commit/167d7a9f0ffd1b4fe600193441bdb7358db2740b) ([#13238](https://github.com/yt-dlp/yt-dlp/issues/13238)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
+
+#### Extractor changes
+- **1tv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/41c0a1fb89628696f8bb88e2b9f3a68f355b8c26) ([#13168](https://github.com/yt-dlp/yt-dlp/issues/13168)) by [bashonly](https://github.com/bashonly)
+- **amcnetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/464c84fedf78eef822a431361155f108b5df96d7) ([#13147](https://github.com/yt-dlp/yt-dlp/issues/13147)) by [bashonly](https://github.com/bashonly)
+- **bitchute**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1d0f6539c47e5d5c68c3c47cdb7075339e2885ac) ([#13081](https://github.com/yt-dlp/yt-dlp/issues/13081)) by [bashonly](https://github.com/bashonly)
+- **cartoonnetwork**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/7dbb47f84f0ee1266a3a01f58c9bc4c76d76794a) ([#13148](https://github.com/yt-dlp/yt-dlp/issues/13148)) by [bashonly](https://github.com/bashonly)
+- **iprima**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/a7d9a5eb79ceeecb851389f3f2c88597871ca3f2) ([#12937](https://github.com/yt-dlp/yt-dlp/issues/12937)) by [baierjan](https://github.com/baierjan)
+- **jiosaavn**
+    - artist: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/586b557b124f954d3f625360ebe970989022ad97) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
+    - playlist, show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/317f4b8006c2c0f0f64f095b1485163ad97c9053) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
+    - show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/6839276496d8814cf16f58b637e45663467928e6) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
+- **lrtradio**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/abf58dcd6a09e14eec4ea82ae12f79a0337cb383) ([#13200](https://github.com/yt-dlp/yt-dlp/issues/13200)) by [Pawka](https://github.com/Pawka)
+- **nebula**: [Support `--mark-watched`](https://github.com/yt-dlp/yt-dlp/commit/20f288bdc2173c7cc58d709d25ca193c1f6001e7) ([#13120](https://github.com/yt-dlp/yt-dlp/issues/13120)) by [GeoffreyFrogeye](https://github.com/GeoffreyFrogeye)
+- **niconico**
+    - [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/f569be4602c2a857087e495d5d7ed6060cd97abe) ([#13236](https://github.com/yt-dlp/yt-dlp/issues/13236)) by [bashonly](https://github.com/bashonly)
+    - live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7a7b85c9014d96421e18aa7ea5f4c1bee5ceece0) ([#13045](https://github.com/yt-dlp/yt-dlp/issues/13045)) by [doe1080](https://github.com/doe1080)
+- **nytimesarticle**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/b26bc32579c00ef579d75a835807ccc87d20ee0a) ([#13104](https://github.com/yt-dlp/yt-dlp/issues/13104)) by [bashonly](https://github.com/bashonly)
+- **once**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/f475e8b529d18efdad603ffda02a56e707fe0e2c) ([#13164](https://github.com/yt-dlp/yt-dlp/issues/13164)) by [bashonly](https://github.com/bashonly)
+- **picarto**: vod: [Support `/profile/` video URLs](https://github.com/yt-dlp/yt-dlp/commit/31e090cb787f3504ec25485adff9a2a51d056734) ([#13227](https://github.com/yt-dlp/yt-dlp/issues/13227)) by [subrat-lima](https://github.com/subrat-lima)
+- **playsuisse**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/d880e060803ae8ed5a047e578cca01e1f0e630ce) ([#12466](https://github.com/yt-dlp/yt-dlp/issues/12466)) by [v3DJG6GL](https://github.com/v3DJG6GL)
+- **sprout**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/cbcfe6378dde33a650e3852ab17ad4503b8e008d) ([#13149](https://github.com/yt-dlp/yt-dlp/issues/13149)) by [bashonly](https://github.com/bashonly)
+- **svtpage**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ea8498ed534642dd7e925961b97b934987142fd3) ([#12957](https://github.com/yt-dlp/yt-dlp/issues/12957)) by [diman8](https://github.com/diman8)
+- **twitch**: [Support `--live-from-start`](https://github.com/yt-dlp/yt-dlp/commit/00b1bec55249cf2ad6271d36492c51b34b6459d1) ([#13202](https://github.com/yt-dlp/yt-dlp/issues/13202)) by [bashonly](https://github.com/bashonly)
+- **vimeo**: event: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/545c1a5b6f2fe88722b41aef0e7485bf3be3f3f9) ([#13216](https://github.com/yt-dlp/yt-dlp/issues/13216)) by [bashonly](https://github.com/bashonly)
+- **wat.tv**: [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/f123cc83b3aea45053f5fa1d9141048b01fc2774) ([#13111](https://github.com/yt-dlp/yt-dlp/issues/13111)) by [bashonly](https://github.com/bashonly)
+- **weverse**: [Fix live extraction](https://github.com/yt-dlp/yt-dlp/commit/5328eda8820cc5f21dcf917684d23fbdca41831d) ([#13084](https://github.com/yt-dlp/yt-dlp/issues/13084)) by [bashonly](https://github.com/bashonly)
+- **xinpianchang**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/83fabf352489d52843f67e6e9cc752db86d27e6e) ([#13245](https://github.com/yt-dlp/yt-dlp/issues/13245)) by [garret1317](https://github.com/garret1317)
+- **youtube**
+    - [Add PO token support for subtitles](https://github.com/yt-dlp/yt-dlp/commit/32ed5f107c6c641958d1cd2752e130de4db55a13) ([#13234](https://github.com/yt-dlp/yt-dlp/issues/13234)) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz)
+    - [Add `web_embedded` client for age-restricted videos](https://github.com/yt-dlp/yt-dlp/commit/0feec6dc131f488428bf881519e7c69766fbb9ae) ([#13089](https://github.com/yt-dlp/yt-dlp/issues/13089)) by [bashonly](https://github.com/bashonly)
+    - [Add a PO Token Provider Framework](https://github.com/yt-dlp/yt-dlp/commit/2685654a37141cca63eda3a92da0e2706e23ccfd) ([#12840](https://github.com/yt-dlp/yt-dlp/issues/12840)) by [coletdjnz](https://github.com/coletdjnz)
+    - [Extract `media_type` for all videos](https://github.com/yt-dlp/yt-dlp/commit/ded11ebc9afba6ba33923375103e9be2d7c804e7) ([#13136](https://github.com/yt-dlp/yt-dlp/issues/13136)) by [bashonly](https://github.com/bashonly)
+    - [Fix `--live-from-start` support for premieres](https://github.com/yt-dlp/yt-dlp/commit/8f303afb43395be360cafd7ad4ce2b6e2eedfb8a) ([#13079](https://github.com/yt-dlp/yt-dlp/issues/13079)) by [arabcoders](https://github.com/arabcoders)
+    - [Fix geo-restriction error handling](https://github.com/yt-dlp/yt-dlp/commit/c7e575e31608c19c5b26c10a4229db89db5fc9a8) ([#13217](https://github.com/yt-dlp/yt-dlp/issues/13217)) by [yozel](https://github.com/yozel)
+
+#### Misc. changes
+- **build**
+    - [Bump PyInstaller to v6.13.0](https://github.com/yt-dlp/yt-dlp/commit/17cf9088d0d535e4a7feffbf02bd49cd9dae5ab9) ([#13082](https://github.com/yt-dlp/yt-dlp/issues/13082)) by [bashonly](https://github.com/bashonly)
+    - [Bump run-on-arch-action to v3](https://github.com/yt-dlp/yt-dlp/commit/9064d2482d1fe722bbb4a49731fe0711c410d1c8) ([#13088](https://github.com/yt-dlp/yt-dlp/issues/13088)) by [bashonly](https://github.com/bashonly)
+- **cleanup**: Miscellaneous: [7977b32](https://github.com/yt-dlp/yt-dlp/commit/7977b329ed97b216e37bd402f4935f28c00eac9e) by [bashonly](https://github.com/bashonly)
+
 ### 2025.04.30

 #### Important changes

30 README.md
@@ -44,6 +44,7 @@
     * [Post-processing Options](#post-processing-options)
     * [SponsorBlock Options](#sponsorblock-options)
     * [Extractor Options](#extractor-options)
+    * [Preset Aliases](#preset-aliases)
 * [CONFIGURATION](#configuration)
     * [Configuration file encoding](#configuration-file-encoding)
     * [Authentication with netrc](#authentication-with-netrc)
@@ -348,8 +349,8 @@ ## General Options:
     --no-flat-playlist              Fully extract the videos of a playlist
                                     (default)
     --live-from-start               Download livestreams from the start.
-                                    Currently only supported for YouTube
-                                    (Experimental)
+                                    Currently experimental and only supported
+                                    for YouTube and Twitch
     --no-live-from-start            Download livestreams from the current time
                                     (default)
     --wait-for-video MIN[-MAX]      Wait for scheduled streams to become
@@ -375,12 +376,12 @@ ## General Options:
                                     an alias starts with a dash "-", it is
                                     prefixed with "--". Arguments are parsed
                                     according to the Python string formatting
-                                    mini-language. E.g. --alias get-audio,-X
-                                    "-S=aext:{0},abr -x --audio-format {0}"
-                                    creates options "--get-audio" and "-X" that
-                                    takes an argument (ARG0) and expands to
-                                    "-S=aext:ARG0,abr -x --audio-format ARG0".
-                                    All defined aliases are listed in the --help
+                                    mini-language. E.g. --alias get-audio,-X "-S
+                                    aext:{0},abr -x --audio-format {0}" creates
+                                    options "--get-audio" and "-X" that takes an
+                                    argument (ARG0) and expands to "-S
+                                    aext:ARG0,abr -x --audio-format ARG0". All
+                                    defined aliases are listed in the --help
                                     output. Alias options can trigger more
                                     aliases; so be careful to avoid defining
                                     recursive options. As a safety measure, each
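Concretely, the reflowed help text describes usage like the following (a hedged illustration; the alias would normally be defined in a config file, and URL is a placeholder):

    # in yt-dlp.conf: --alias get-audio,-X "-S aext:{0},abr -x --audio-format {0}"
    yt-dlp --get-audio mp3 URL    # expands to: -S aext:mp3,abr -x --audio-format mp3 URL
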
@@ -1108,6 +1109,10 @@ ## Extractor Options:
                                     arguments for different extractors

+## Preset Aliases:
+Predefined aliases for convenience and ease of use. Note that future
+versions of yt-dlp may add or adjust presets, but the existing preset
+names will not be changed or removed

     -t mp3                          -f 'ba[acodec^=mp3]/ba/b' -x --audio-format
                                     mp3
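For example, the `-t mp3` preset shown above makes a one-flag audio rip possible (URL is a placeholder):

    yt-dlp -t mp3 URL    # equivalent to: yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 URL
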
@@ -1798,6 +1803,7 @@ #### youtube
 * `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
 * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
 * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
+* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
 * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
 * `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
     * E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
@@ -1807,8 +1813,12 @@ #### youtube
 * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
 * `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
 * `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
-* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
-* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
+* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be any of `gvs` (Google Video Server URLs), `player` (Innertube player request) or `subs` (Subtitles)
+* `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default)
+* `fetch_pot`: Policy to use for fetching a PO Token from providers. One of `always` (always try to fetch a PO Token regardless of whether the client requires one for the given context), `never` (never fetch a PO Token), or `auto` (default; only fetch a PO Token if the client requires one for the given context)
+
+#### youtubepot-webpo
+* `bind_to_visitor_id`: Whether to use the Visitor ID instead of Visitor Data for caching WebPO tokens. Either `true` (default) or `false`

 #### youtubetab (YouTube playlists, channels, feeds, etc.)
 * `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
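Putting the new options together, a hedged command-line illustration (values and URL are placeholders; --extractor-args can be repeated to pass arguments for different extractor keys, as noted under Extractor Options):

    yt-dlp --extractor-args "youtube:player_client=default,-ios;fetch_pot=auto;pot_trace=true" \
           --extractor-args "youtubepot-webpo:bind_to_visitor_id=true" URL
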
@@ -246,7 +246,6 @@ # Supported sites
 - **Canalplus**: mycanal.fr and piwiplus.fr
 - **Canalsurmas**
 - **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
-- **CartoonNetwork**
 - **cbc.ca**
 - **cbc.ca:player**
 - **cbc.ca:player:playlist**
@@ -649,7 +648,10 @@ # Supported sites
 - **jiocinema**: [*jiocinema*](## "netrc machine")
 - **jiocinema:series**: [*jiocinema*](## "netrc machine")
 - **jiosaavn:album**
+- **jiosaavn:artist**
 - **jiosaavn:playlist**
+- **jiosaavn:show**
+- **jiosaavn:show:playlist**
 - **jiosaavn:song**
 - **Joj**
 - **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)
@@ -1081,8 +1083,8 @@ # Supported sites
 - **Photobucket**
 - **PiaLive**
 - **Piapro**: [*piapro*](## "netrc machine")
-- **Picarto**
-- **PicartoVod**
+- **picarto**
+- **picarto:vod**
 - **Piksel**
 - **Pinkbike**
 - **Pinterest**
@@ -1390,7 +1392,6 @@ # Supported sites
 - **Spreaker**
 - **SpreakerShow**
 - **SpringboardPlatform**
-- **Sprout**
 - **SproutVideo**
 - **sr:mediathek**: Saarländischer Rundfunk (**Currently broken**)
 - **SRGSSR**
@@ -1656,6 +1657,7 @@ # Supported sites
 - **vimeo**: [*vimeo*](## "netrc machine")
 - **vimeo:album**: [*vimeo*](## "netrc machine")
 - **vimeo:channel**: [*vimeo*](## "netrc machine")
+- **vimeo:event**: [*vimeo*](## "netrc machine")
 - **vimeo:group**: [*vimeo*](## "netrc machine")
 - **vimeo:likes**: [*vimeo*](## "netrc machine") Vimeo user likes
 - **vimeo:ondemand**: [*vimeo*](## "netrc machine")

@@ -1435,6 +1435,27 @@ def test_load_plugins_compat(self):
         FakeYDL().close()
         assert all_plugins_loaded.value

+    def test_close_hooks(self):
+        # Should call all registered close hooks on close
+        close_hook_called = False
+        close_hook_two_called = False
+
+        def close_hook():
+            nonlocal close_hook_called
+            close_hook_called = True
+
+        def close_hook_two():
+            nonlocal close_hook_two_called
+            close_hook_two_called = True
+
+        ydl = FakeYDL()
+        ydl.add_close_hook(close_hook)
+        ydl.add_close_hook(close_hook_two)
+
+        ydl.close()
+        self.assertTrue(close_hook_called, 'Close hook was not called')
+        self.assertTrue(close_hook_two_called, 'Close hook two was not called')
+

 if __name__ == '__main__':
     unittest.main()
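The `add_close_hook` API exercised above lets callers tie external cleanup to the YoutubeDL lifecycle; a minimal usage sketch (assuming the method behaves on a real YoutubeDL as it does on FakeYDL in the test):

    from yt_dlp import YoutubeDL

    ydl = YoutubeDL()
    ydl.add_close_hook(lambda: print('releasing external resources'))  # runs when ydl.close() is called
    ydl.close()
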
@@ -58,6 +58,14 @@ def test_get_desktop_environment(self):
             ({'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
             ({'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),

+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'gnome'}, _LinuxDesktopEnvironment.GNOME),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'mate'}, _LinuxDesktopEnvironment.GNOME),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),
+
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'my_custom_de', 'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
+
             ({'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
             ({'KDE_FULL_SESSION': 1}, _LinuxDesktopEnvironment.KDE3),
             ({'KDE_FULL_SESSION': 1, 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),
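The added rows pin down the fallback order: an unrecognized XDG_CURRENT_DESKTOP falls through to DESKTOP_SESSION, and the legacy GNOME/KDE session variables are consulted last. A simplified sketch of that precedence (an illustration only, not the actual _get_linux_desktop_environment implementation):

    def guess_desktop_environment(env):
        # Known XDG_CURRENT_DESKTOP values would be checked first (omitted here)
        by_session = {'gnome': 'GNOME', 'mate': 'GNOME', 'kde4': 'KDE4', 'kde': 'KDE3', 'xfce': 'XFCE'}
        session = env.get('DESKTOP_SESSION')
        if session in by_session:              # fall back to DESKTOP_SESSION
            return by_session[session]
        if 'GNOME_DESKTOP_SESSION_ID' in env:  # legacy variables come last
            return 'GNOME'
        if 'KDE_FULL_SESSION' in env:
            return 'KDE3'
        return 'OTHER'
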
@@ -478,6 +478,14 @@ def test_extract_function_with_global_stack(self):
         func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
         self.assertEqual(func([1]), 1111)

+    def test_increment_decrement(self):
+        self._test('function f() { var x = 1; return ++x; }', 2)
+        self._test('function f() { var x = 1; return x++; }', 1)
+        self._test('function f() { var x = 1; x--; return x }', 0)
+        self._test('function f() { var y; var x = 1; x++, --x, x--, x--, y="z", "abc", x++; return --x }', -1)
+        self._test('function f() { var a = "test--"; return a; }', 'test--')
+        self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--')
+

 if __name__ == '__main__':
     unittest.main()
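These cases also document the semantics the interpreter must honor: post-increment yields the old value, comma expressions evaluate left to right, and "--" inside a string literal is plain text. A quick sketch against the interpreter's public API:

    from yt_dlp.jsinterp import JSInterpreter

    jsi = JSInterpreter('function f() { var x = 1; return x++; }')
    assert jsi.call_function('f') == 1  # post-increment returns the value before incrementing
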
@@ -20,7 +20,6 @@
     add_accept_encoding_header,
     get_redirect_method,
     make_socks_proxy_opts,
-    select_proxy,
     ssl_load_certs,
 )
 from yt_dlp.networking.exceptions import (
@@ -28,7 +27,7 @@
     IncompleteRead,
 )
 from yt_dlp.socks import ProxyType
-from yt_dlp.utils.networking import HTTPHeaderDict
+from yt_dlp.utils.networking import HTTPHeaderDict, select_proxy

 TEST_DIR = os.path.dirname(os.path.abspath(__file__))

71 test/test_pot/conftest.py (new file)
@@ -0,0 +1,71 @@
import collections

import pytest

from yt_dlp import YoutubeDL
from yt_dlp.cookies import YoutubeDLCookieJar
from yt_dlp.extractor.common import InfoExtractor
from yt_dlp.extractor.youtube.pot._provider import IEContentProviderLogger
from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest, PoTokenContext
from yt_dlp.utils.networking import HTTPHeaderDict


class MockLogger(IEContentProviderLogger):

    log_level = IEContentProviderLogger.LogLevel.TRACE

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.messages = collections.defaultdict(list)

    def trace(self, message: str):
        self.messages['trace'].append(message)

    def debug(self, message: str):
        self.messages['debug'].append(message)

    def info(self, message: str):
        self.messages['info'].append(message)

    def warning(self, message: str, *, once=False):
        self.messages['warning'].append(message)

    def error(self, message: str):
        self.messages['error'].append(message)


@pytest.fixture
def ie() -> InfoExtractor:
    ydl = YoutubeDL()
    return ydl.get_info_extractor('Youtube')


@pytest.fixture
def logger() -> MockLogger:
    return MockLogger()


@pytest.fixture()
def pot_request() -> PoTokenRequest:
    return PoTokenRequest(
        context=PoTokenContext.GVS,
        innertube_context={'client': {'clientName': 'WEB'}},
        innertube_host='youtube.com',
        session_index=None,
        player_url=None,
        is_authenticated=False,
        video_webpage=None,

        visitor_data='example-visitor-data',
        data_sync_id='example-data-sync-id',
        video_id='example-video-id',

        request_cookiejar=YoutubeDLCookieJar(),
        request_proxy=None,
        request_headers=HTTPHeaderDict(),
        request_timeout=None,
        request_source_address=None,
        request_verify_tls=True,

        bypass_cache=False,
    )

117 test/test_pot/test_pot_builtin_memorycache.py (new file)
@@ -0,0 +1,117 @@
import threading
import time
from collections import OrderedDict
import pytest
from yt_dlp.extractor.youtube.pot._provider import IEContentProvider, BuiltinIEContentProvider
from yt_dlp.utils import bug_reports_message
from yt_dlp.extractor.youtube.pot._builtin.memory_cache import MemoryLRUPCP, memorylru_preference, initialize_global_cache
from yt_dlp.version import __version__
from yt_dlp.extractor.youtube.pot._registry import _pot_cache_providers, _pot_memory_cache


class TestMemoryLRUPCS:

    def test_base_type(self):
        assert issubclass(MemoryLRUPCP, IEContentProvider)
        assert issubclass(MemoryLRUPCP, BuiltinIEContentProvider)

    @pytest.fixture
    def pcp(self, ie, logger) -> MemoryLRUPCP:
        return MemoryLRUPCP(ie, logger, {}, initialize_cache=lambda max_size: (OrderedDict(), threading.Lock(), max_size))

    def test_is_registered(self):
        assert _pot_cache_providers.value.get('MemoryLRU') == MemoryLRUPCP

    def test_initialization(self, pcp):
        assert pcp.PROVIDER_NAME == 'memory'
        assert pcp.PROVIDER_VERSION == __version__
        assert pcp.BUG_REPORT_MESSAGE == bug_reports_message(before='')
        assert pcp.is_available()

    def test_store_and_get(self, pcp):
        pcp.store('key1', 'value1', int(time.time()) + 60)
        assert pcp.get('key1') == 'value1'
        assert len(pcp.cache) == 1

    def test_store_ignore_expired(self, pcp):
        pcp.store('key1', 'value1', int(time.time()) - 1)
        assert len(pcp.cache) == 0
        assert pcp.get('key1') is None
        assert len(pcp.cache) == 0

    def test_store_override_existing_key(self, ie, logger):
        MAX_SIZE = 2
        pcp = MemoryLRUPCP(ie, logger, {}, initialize_cache=lambda max_size: (OrderedDict(), threading.Lock(), MAX_SIZE))
        pcp.store('key1', 'value1', int(time.time()) + 60)
        pcp.store('key2', 'value2', int(time.time()) + 60)
        assert len(pcp.cache) == 2
        pcp.store('key1', 'value2', int(time.time()) + 60)
        # Ensure that the override key gets added to the end of the cache instead of in the same position
        pcp.store('key3', 'value3', int(time.time()) + 60)
        assert pcp.get('key1') == 'value2'

    def test_store_ignore_expired_existing_key(self, pcp):
        pcp.store('key1', 'value2', int(time.time()) + 60)
        pcp.store('key1', 'value1', int(time.time()) - 1)
        assert len(pcp.cache) == 1
        assert pcp.get('key1') == 'value2'
        assert len(pcp.cache) == 1

    def test_get_key_expired(self, pcp):
        pcp.store('key1', 'value1', int(time.time()) + 60)
        assert pcp.get('key1') == 'value1'
        assert len(pcp.cache) == 1
        pcp.cache['key1'] = ('value1', int(time.time()) - 1)
        assert pcp.get('key1') is None
        assert len(pcp.cache) == 0

    def test_lru_eviction(self, ie, logger):
        MAX_SIZE = 2
        provider = MemoryLRUPCP(ie, logger, {}, initialize_cache=lambda max_size: (OrderedDict(), threading.Lock(), MAX_SIZE))
        provider.store('key1', 'value1', int(time.time()) + 5)
        provider.store('key2', 'value2', int(time.time()) + 5)
        assert len(provider.cache) == 2

        assert provider.get('key1') == 'value1'

        provider.store('key3', 'value3', int(time.time()) + 5)
        assert len(provider.cache) == 2

        assert provider.get('key2') is None

        provider.store('key4', 'value4', int(time.time()) + 5)
        assert len(provider.cache) == 2

        assert provider.get('key1') is None
        assert provider.get('key3') == 'value3'
        assert provider.get('key4') == 'value4'

    def test_delete(self, pcp):
        pcp.store('key1', 'value1', int(time.time()) + 5)
        assert len(pcp.cache) == 1
        assert pcp.get('key1') == 'value1'
        pcp.delete('key1')
        assert len(pcp.cache) == 0
        assert pcp.get('key1') is None

    def test_use_global_cache_default(self, ie, logger):
        pcp = MemoryLRUPCP(ie, logger, {})
        assert pcp.max_size == _pot_memory_cache.value['max_size'] == 25
        assert pcp.cache is _pot_memory_cache.value['cache']
        assert pcp.lock is _pot_memory_cache.value['lock']

        pcp2 = MemoryLRUPCP(ie, logger, {})
        assert pcp.max_size == pcp2.max_size == _pot_memory_cache.value['max_size'] == 25
        assert pcp.cache is pcp2.cache is _pot_memory_cache.value['cache']
        assert pcp.lock is pcp2.lock is _pot_memory_cache.value['lock']

    def test_fail_max_size_change_global(self, ie, logger):
        pcp = MemoryLRUPCP(ie, logger, {})
        assert pcp.max_size == _pot_memory_cache.value['max_size'] == 25
        with pytest.raises(ValueError, match='Cannot change max_size of initialized global memory cache'):
            initialize_global_cache(50)

        assert pcp.max_size == _pot_memory_cache.value['max_size'] == 25

    def test_memory_lru_preference(self, pcp, ie, pot_request):
        assert memorylru_preference(pcp, pot_request) == 10000
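Taken together, these tests specify an LRU cache with absolute-expiry semantics. A minimal self-contained sketch of that contract (an illustration, not the provider's actual implementation):

    import time
    from collections import OrderedDict

    class TinyLRU:
        def __init__(self, max_size=25):
            self.cache, self.max_size = OrderedDict(), max_size

        def store(self, key, value, expires_at):
            if expires_at <= int(time.time()):      # ignore already-expired entries
                return
            self.cache.pop(key, None)               # re-inserting moves the key to the end
            self.cache[key] = (value, expires_at)
            while len(self.cache) > self.max_size:
                self.cache.popitem(last=False)      # evict the least recently used entry

        def get(self, key):
            entry = self.cache.pop(key, None)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at <= int(time.time()):      # expired entries are dropped on read
                return None
            self.cache[key] = (value, expires_at)   # refresh recency on hit
            return value
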
47 test/test_pot/test_pot_builtin_utils.py (new file)
@@ -0,0 +1,47 @@
import pytest
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenContext,

)

from yt_dlp.extractor.youtube.pot.utils import get_webpo_content_binding, ContentBindingType


class TestGetWebPoContentBinding:

    @pytest.mark.parametrize('client_name, context, is_authenticated, expected', [
        *[(client, context, is_authenticated, expected) for client in [
            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER']
            for context, is_authenticated, expected in [
            (PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),
            (PoTokenContext.PLAYER, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
            (PoTokenContext.SUBS, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
            (PoTokenContext.GVS, True, ('example-data-sync-id', ContentBindingType.DATASYNC_ID)),
        ]],
        ('WEB_REMIX', PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),
        ('WEB_REMIX', PoTokenContext.PLAYER, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),
        ('ANDROID', PoTokenContext.GVS, False, (None, None)),
        ('IOS', PoTokenContext.GVS, False, (None, None)),
    ])
    def test_get_webpo_content_binding(self, pot_request, client_name, context, is_authenticated, expected):
        pot_request.innertube_context['client']['clientName'] = client_name
        pot_request.context = context
        pot_request.is_authenticated = is_authenticated
        assert get_webpo_content_binding(pot_request) == expected

    def test_extract_visitor_id(self, pot_request):
        pot_request.visitor_data = 'CgsxMjNhYmNYWVpfLSiA4s%2DqBg%3D%3D'
        assert get_webpo_content_binding(pot_request, bind_to_visitor_id=True) == ('123abcXYZ_-', ContentBindingType.VISITOR_ID)

    def test_invalid_visitor_id(self, pot_request):
        # visitor id not alphanumeric (i.e. protobuf extraction failed)
        pot_request.visitor_data = 'CggxMjM0NTY3OCiA4s-qBg%3D%3D'
        assert get_webpo_content_binding(pot_request, bind_to_visitor_id=True) == (pot_request.visitor_data, ContentBindingType.VISITOR_DATA)

    def test_no_visitor_id(self, pot_request):
        pot_request.visitor_data = 'KIDiz6oG'
        assert get_webpo_content_binding(pot_request, bind_to_visitor_id=True) == (pot_request.visitor_data, ContentBindingType.VISITOR_DATA)

    def test_invalid_base64(self, pot_request):
        pot_request.visitor_data = 'invalid-base64'
        assert get_webpo_content_binding(pot_request, bind_to_visitor_id=True) == (pot_request.visitor_data, ContentBindingType.VISITOR_DATA)

92 test/test_pot/test_pot_builtin_webpospec.py (new file)
@@ -0,0 +1,92 @@
import pytest

from yt_dlp.extractor.youtube.pot._provider import IEContentProvider, BuiltinIEContentProvider
from yt_dlp.extractor.youtube.pot.cache import CacheProviderWritePolicy
from yt_dlp.utils import bug_reports_message
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenRequest,
    PoTokenContext,

)
from yt_dlp.version import __version__

from yt_dlp.extractor.youtube.pot._builtin.webpo_cachespec import WebPoPCSP
from yt_dlp.extractor.youtube.pot._registry import _pot_pcs_providers


@pytest.fixture()
def pot_request(pot_request) -> PoTokenRequest:
    pot_request.visitor_data = 'CgsxMjNhYmNYWVpfLSiA4s%2DqBg%3D%3D'  # visitor_id=123abcXYZ_-
    return pot_request


class TestWebPoPCSP:
    def test_base_type(self):
        assert issubclass(WebPoPCSP, IEContentProvider)
        assert issubclass(WebPoPCSP, BuiltinIEContentProvider)

    def test_init(self, ie, logger):
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={})
        assert pcs.PROVIDER_NAME == 'webpo'
        assert pcs.PROVIDER_VERSION == __version__
        assert pcs.BUG_REPORT_MESSAGE == bug_reports_message(before='')
        assert pcs.is_available()

    def test_is_registered(self):
        assert _pot_pcs_providers.value.get('WebPo') == WebPoPCSP

    @pytest.mark.parametrize('client_name, context, is_authenticated', [
        ('ANDROID', PoTokenContext.GVS, False),
        ('IOS', PoTokenContext.GVS, False),
        ('IOS', PoTokenContext.PLAYER, False),
    ])
    def test_not_supports(self, ie, logger, pot_request, client_name, context, is_authenticated):
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={})
        pot_request.innertube_context['client']['clientName'] = client_name
        pot_request.context = context
        pot_request.is_authenticated = is_authenticated
        assert pcs.generate_cache_spec(pot_request) is None

    @pytest.mark.parametrize('client_name, context, is_authenticated, remote_host, source_address, request_proxy, expected', [
        *[(client, context, is_authenticated, remote_host, source_address, request_proxy, expected) for client in [
            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER']
            for context, is_authenticated, remote_host, source_address, request_proxy, expected in [
            (PoTokenContext.GVS, False, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': '123abcXYZ_-', 'cbt': 'visitor_id'}),
            (PoTokenContext.PLAYER, False, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': '123abcXYZ_-', 'cbt': 'video_id'}),
            (PoTokenContext.GVS, True, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': 'example-data-sync-id', 'cbt': 'datasync_id'}),
        ]],
        ('WEB_REMIX', PoTokenContext.PLAYER, False, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': '123abcXYZ_-', 'cbt': 'visitor_id'}),
        ('WEB', PoTokenContext.GVS, False, None, None, None, {'t': 'webpo', 'cb': '123abcXYZ_-', 'cbt': 'visitor_id', 'ip': None, 'sa': None, 'px': None}),
        ('TVHTML5', PoTokenContext.PLAYER, False, None, None, 'http://example.com', {'t': 'webpo', 'cb': '123abcXYZ_-', 'cbt': 'video_id', 'ip': None, 'sa': None, 'px': 'http://example.com'}),

    ])
    def test_generate_key_bindings(self, ie, logger, pot_request, client_name, context, is_authenticated, remote_host, source_address, request_proxy, expected):
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={})
        pot_request.innertube_context['client']['clientName'] = client_name
        pot_request.context = context
        pot_request.is_authenticated = is_authenticated
        pot_request.innertube_context['client']['remoteHost'] = remote_host
        pot_request.request_source_address = source_address
        pot_request.request_proxy = request_proxy
        pot_request.video_id = '123abcXYZ_-'  # same as visitor id to test type

        assert pcs.generate_cache_spec(pot_request).key_bindings == expected

    def test_no_bind_visitor_id(self, ie, logger, pot_request):
        # Should not bind to visitor id if setting is set to False
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={'bind_to_visitor_id': ['false']})
        pot_request.innertube_context['client']['clientName'] = 'WEB'
        pot_request.context = PoTokenContext.GVS
        pot_request.is_authenticated = False
        assert pcs.generate_cache_spec(pot_request).key_bindings == {'t': 'webpo', 'ip': None, 'sa': None, 'px': None, 'cb': 'CgsxMjNhYmNYWVpfLSiA4s%2DqBg%3D%3D', 'cbt': 'visitor_data'}

    def test_default_ttl(self, ie, logger, pot_request):
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={})
        assert pcs.generate_cache_spec(pot_request).default_ttl == 6 * 60 * 60  # should default to 6 hours

    def test_write_policy(self, ie, logger, pot_request):
        pcs = WebPoPCSP(ie=ie, logger=logger, settings={})
        pot_request.context = PoTokenContext.GVS
        assert pcs.generate_cache_spec(pot_request).write_policy == CacheProviderWritePolicy.WRITE_ALL
        pot_request.context = PoTokenContext.PLAYER
        assert pcs.generate_cache_spec(pot_request).write_policy == CacheProviderWritePolicy.WRITE_FIRST

1529 test/test_pot/test_pot_director.py (new file)
File diff suppressed because it is too large
629 test/test_pot/test_pot_framework.py (new file)
@@ -0,0 +1,629 @@
import pytest

from yt_dlp.extractor.youtube.pot._provider import IEContentProvider
from yt_dlp.cookies import YoutubeDLCookieJar
from yt_dlp.utils.networking import HTTPHeaderDict
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenRequest,
    PoTokenContext,
    ExternalRequestFeature,

)

from yt_dlp.extractor.youtube.pot.cache import (
    PoTokenCacheProvider,
    PoTokenCacheSpec,
    PoTokenCacheSpecProvider,
    CacheProviderWritePolicy,
)

import yt_dlp.extractor.youtube.pot.cache as cache

from yt_dlp.networking import Request
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenResponse,
    PoTokenProvider,
    PoTokenProviderRejectedRequest,
    provider_bug_report_message,
    register_provider,
    register_preference,
)

from yt_dlp.extractor.youtube.pot._registry import _pot_providers, _ptp_preferences, _pot_pcs_providers, _pot_cache_providers, _pot_cache_provider_preferences


class ExamplePTP(PoTokenProvider):
    PROVIDER_NAME = 'example'
    PROVIDER_VERSION = '0.0.1'
    BUG_REPORT_LOCATION = 'https://example.com/issues'

    _SUPPORTED_CLIENTS = ('WEB',)
    _SUPPORTED_CONTEXTS = (PoTokenContext.GVS, )

    _SUPPORTED_EXTERNAL_REQUEST_FEATURES = (
        ExternalRequestFeature.PROXY_SCHEME_HTTP,
        ExternalRequestFeature.PROXY_SCHEME_SOCKS5H,
    )

    def is_available(self) -> bool:
        return True

    def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
        return PoTokenResponse('example-token', expires_at=123)


class ExampleCacheProviderPCP(PoTokenCacheProvider):

    PROVIDER_NAME = 'example'
    PROVIDER_VERSION = '0.0.1'
    BUG_REPORT_LOCATION = 'https://example.com/issues'

    def is_available(self) -> bool:
        return True

    def get(self, key: str):
        return 'example-cache'

    def store(self, key: str, value: str, expires_at: int):
        pass

    def delete(self, key: str):
        pass


class ExampleCacheSpecProviderPCSP(PoTokenCacheSpecProvider):

    PROVIDER_NAME = 'example'
    PROVIDER_VERSION = '0.0.1'
    BUG_REPORT_LOCATION = 'https://example.com/issues'

    def generate_cache_spec(self, request: PoTokenRequest):
        return PoTokenCacheSpec(
            key_bindings={'field': 'example-key'},
            default_ttl=60,
            write_policy=CacheProviderWritePolicy.WRITE_FIRST,
        )


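Example providers like these are made discoverable through registration; a hedged sketch of how a plugin might register one (decorator usage assumed from the register_provider import above):

    @register_provider
    class MyPluginProviderPTP(ExamplePTP):  # hypothetical provider reusing the example class
        PROVIDER_NAME = 'my-plugin-provider'
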
class TestPoTokenProvider:

    def test_base_type(self):
        assert issubclass(PoTokenProvider, IEContentProvider)

    def test_create_provider_missing_fetch_method(self, ie, logger):
        class MissingMethodsPTP(PoTokenProvider):
            def is_available(self) -> bool:
                return True

        with pytest.raises(TypeError):
            MissingMethodsPTP(ie=ie, logger=logger, settings={})

    def test_create_provider_missing_available_method(self, ie, logger):
        class MissingMethodsPTP(PoTokenProvider):
            def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
                raise PoTokenProviderRejectedRequest('Not implemented')

        with pytest.raises(TypeError):
            MissingMethodsPTP(ie=ie, logger=logger, settings={})

    def test_barebones_provider(self, ie, logger):
        class BarebonesProviderPTP(PoTokenProvider):
            def is_available(self) -> bool:
                return True

            def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
                raise PoTokenProviderRejectedRequest('Not implemented')

        provider = BarebonesProviderPTP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'BarebonesProvider'
        assert provider.PROVIDER_KEY == 'BarebonesProvider'
        assert provider.PROVIDER_VERSION == '0.0.0'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at (developer has not provided a bug report location) .'

    def test_example_provider_success(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'example'
        assert provider.PROVIDER_KEY == 'Example'
        assert provider.PROVIDER_VERSION == '0.0.1'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at https://example.com/issues .'
        assert provider.is_available()

        response = provider.request_pot(pot_request)

        assert response.po_token == 'example-token'
        assert response.expires_at == 123

    def test_provider_unsupported_context(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})
        pot_request.context = PoTokenContext.PLAYER

        with pytest.raises(PoTokenProviderRejectedRequest):
            provider.request_pot(pot_request)

    def test_provider_unsupported_client(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})
        pot_request.innertube_context['client']['clientName'] = 'ANDROID'

        with pytest.raises(PoTokenProviderRejectedRequest):
            provider.request_pot(pot_request)

    def test_provider_unsupported_proxy_scheme(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})
        pot_request.request_proxy = 'socks4://example.com'

        with pytest.raises(
            PoTokenProviderRejectedRequest,
            match='External requests by "example" provider do not support proxy scheme "socks4". Supported proxy '
                  'schemes: http, socks5h',
        ):
            provider.request_pot(pot_request)

        pot_request.request_proxy = 'http://example.com'

        assert provider.request_pot(pot_request)

    def test_provider_ignore_external_request_features(self, ie, logger, pot_request):
        class InternalPTP(ExamplePTP):
            _SUPPORTED_EXTERNAL_REQUEST_FEATURES = None

        provider = InternalPTP(ie=ie, logger=logger, settings={})

        pot_request.request_proxy = 'socks5://example.com'
        assert provider.request_pot(pot_request)
        pot_request.request_source_address = '0.0.0.0'
        assert provider.request_pot(pot_request)

    def test_provider_unsupported_external_request_source_address(self, ie, logger, pot_request):
        class InternalPTP(ExamplePTP):
            _SUPPORTED_EXTERNAL_REQUEST_FEATURES = tuple()

        provider = InternalPTP(ie=ie, logger=logger, settings={})

        pot_request.request_source_address = None
        assert provider.request_pot(pot_request)

        pot_request.request_source_address = '0.0.0.0'
        with pytest.raises(
            PoTokenProviderRejectedRequest,
            match='External requests by "example" provider do not support setting source address',
        ):
            provider.request_pot(pot_request)

    def test_provider_supported_external_request_source_address(self, ie, logger, pot_request):
        class InternalPTP(ExamplePTP):
            _SUPPORTED_EXTERNAL_REQUEST_FEATURES = (
                ExternalRequestFeature.SOURCE_ADDRESS,
            )

        provider = InternalPTP(ie=ie, logger=logger, settings={})

        pot_request.request_source_address = None
        assert provider.request_pot(pot_request)

        pot_request.request_source_address = '0.0.0.0'
        assert provider.request_pot(pot_request)

    def test_provider_unsupported_external_request_tls_verification(self, ie, logger, pot_request):
        class InternalPTP(ExamplePTP):
            _SUPPORTED_EXTERNAL_REQUEST_FEATURES = tuple()

        provider = InternalPTP(ie=ie, logger=logger, settings={})

        pot_request.request_verify_tls = True
        assert provider.request_pot(pot_request)

        pot_request.request_verify_tls = False
        with pytest.raises(
            PoTokenProviderRejectedRequest,
            match='External requests by "example" provider do not support ignoring TLS certificate failures',
        ):
            provider.request_pot(pot_request)

    def test_provider_supported_external_request_tls_verification(self, ie, logger, pot_request):
        class InternalPTP(ExamplePTP):
            _SUPPORTED_EXTERNAL_REQUEST_FEATURES = (
                ExternalRequestFeature.DISABLE_TLS_VERIFICATION,
            )

        provider = InternalPTP(ie=ie, logger=logger, settings={})

        pot_request.request_verify_tls = True
        assert provider.request_pot(pot_request)

        pot_request.request_verify_tls = False
        assert provider.request_pot(pot_request)

    def test_provider_request_webpage(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})

        cookiejar = YoutubeDLCookieJar()
        pot_request.request_headers = HTTPHeaderDict({'User-Agent': 'example-user-agent'})
        pot_request.request_proxy = 'socks5://example-proxy.com'
        pot_request.request_cookiejar = cookiejar

        def mock_urlopen(request):
            return request

        ie._downloader.urlopen = mock_urlopen

        sent_request = provider._request_webpage(Request(
            'https://example.com',
        ), pot_request=pot_request)

        assert sent_request.url == 'https://example.com'
        assert sent_request.headers['User-Agent'] == 'example-user-agent'
        assert sent_request.proxies == {'all': 'socks5://example-proxy.com'}
        assert sent_request.extensions['cookiejar'] is cookiejar
        assert 'Requesting webpage' in logger.messages['info']

    def test_provider_request_webpage_override(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})

        cookiejar_request = YoutubeDLCookieJar()
        pot_request.request_headers = HTTPHeaderDict({'User-Agent': 'example-user-agent'})
        pot_request.request_proxy = 'socks5://example-proxy.com'
        pot_request.request_cookiejar = cookiejar_request

        def mock_urlopen(request):
            return request

        ie._downloader.urlopen = mock_urlopen

        sent_request = provider._request_webpage(Request(
            'https://example.com',
            headers={'User-Agent': 'override-user-agent-override'},
            proxies={'http': 'http://example-proxy-override.com'},
            extensions={'cookiejar': YoutubeDLCookieJar()},
        ), pot_request=pot_request, note='Custom requesting webpage')

        assert sent_request.url == 'https://example.com'
        assert sent_request.headers['User-Agent'] == 'override-user-agent-override'
        assert sent_request.proxies == {'http': 'http://example-proxy-override.com'}
        assert sent_request.extensions['cookiejar'] is not cookiejar_request
        assert 'Custom requesting webpage' in logger.messages['info']

    def test_provider_request_webpage_no_log(self, ie, logger, pot_request):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})

        def mock_urlopen(request):
            return request

        ie._downloader.urlopen = mock_urlopen

        sent_request = provider._request_webpage(Request(
            'https://example.com',
        ), note=False)

        assert sent_request.url == 'https://example.com'
        assert 'info' not in logger.messages

    def test_provider_request_webpage_no_pot_request(self, ie, logger):
        provider = ExamplePTP(ie=ie, logger=logger, settings={})

        def mock_urlopen(request):
            return request

        ie._downloader.urlopen = mock_urlopen

        sent_request = provider._request_webpage(Request(
            'https://example.com',
        ), pot_request=None)

        assert sent_request.url == 'https://example.com'

    def test_get_config_arg(self, ie, logger):
        provider = ExamplePTP(ie=ie, logger=logger, settings={'abc': ['123D'], 'xyz': ['456a', '789B']})

        assert provider._configuration_arg('abc') == ['123d']
        assert provider._configuration_arg('abc', default=['default']) == ['123d']
        assert provider._configuration_arg('ABC', default=['default']) == ['default']
        assert provider._configuration_arg('abc', casesense=True) == ['123D']
        assert provider._configuration_arg('xyz', casesense=False) == ['456a', '789b']

    def test_require_class_end_with_suffix(self, ie, logger):
        class InvalidSuffix(PoTokenProvider):
            PROVIDER_NAME = 'invalid-suffix'

            def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
                raise PoTokenProviderRejectedRequest('Not implemented')

            def is_available(self) -> bool:
                return True

        provider = InvalidSuffix(ie=ie, logger=logger, settings={})

        with pytest.raises(AssertionError):
            provider.PROVIDER_KEY  # noqa: B018


class TestPoTokenCacheProvider:

    def test_base_type(self):
        assert issubclass(PoTokenCacheProvider, IEContentProvider)

    def test_create_provider_missing_get_method(self, ie, logger):
        class MissingMethodsPCP(PoTokenCacheProvider):
            def store(self, key: str, value: str, expires_at: int):
                pass

            def delete(self, key: str):
                pass

            def is_available(self) -> bool:
                return True

        with pytest.raises(TypeError):
            MissingMethodsPCP(ie=ie, logger=logger, settings={})

    def test_create_provider_missing_store_method(self, ie, logger):
        class MissingMethodsPCP(PoTokenCacheProvider):
            def get(self, key: str):
                pass

            def delete(self, key: str):
                pass

            def is_available(self) -> bool:
                return True

        with pytest.raises(TypeError):
            MissingMethodsPCP(ie=ie, logger=logger, settings={})

    def test_create_provider_missing_delete_method(self, ie, logger):
        class MissingMethodsPCP(PoTokenCacheProvider):
            def get(self, key: str):
                pass

            def store(self, key: str, value: str, expires_at: int):
                pass

            def is_available(self) -> bool:
                return True

        with pytest.raises(TypeError):
            MissingMethodsPCP(ie=ie, logger=logger, settings={})

    def test_create_provider_missing_is_available_method(self, ie, logger):
        class MissingMethodsPCP(PoTokenCacheProvider):
            def get(self, key: str):
                pass

            def store(self, key: str, value: str, expires_at: int):
                pass

            def delete(self, key: str):
                pass

        with pytest.raises(TypeError):
            MissingMethodsPCP(ie=ie, logger=logger, settings={})

    def test_barebones_provider(self, ie, logger):
        class BarebonesProviderPCP(PoTokenCacheProvider):

            def is_available(self) -> bool:
                return True

            def get(self, key: str):
                return 'example-cache'

            def store(self, key: str, value: str, expires_at: int):
                pass

            def delete(self, key: str):
                pass

        provider = BarebonesProviderPCP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'BarebonesProvider'
        assert provider.PROVIDER_KEY == 'BarebonesProvider'
        assert provider.PROVIDER_VERSION == '0.0.0'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at (developer has not provided a bug report location) .'

    def test_create_provider_example(self, ie, logger):
        provider = ExampleCacheProviderPCP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'example'
        assert provider.PROVIDER_KEY == 'ExampleCacheProvider'
        assert provider.PROVIDER_VERSION == '0.0.1'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at https://example.com/issues .'
        assert provider.is_available()

    def test_get_config_arg(self, ie, logger):
        provider = ExampleCacheProviderPCP(ie=ie, logger=logger, settings={'abc': ['123D'], 'xyz': ['456a', '789B']})
        assert provider._configuration_arg('abc') == ['123d']
        assert provider._configuration_arg('abc', default=['default']) == ['123d']
        assert provider._configuration_arg('ABC', default=['default']) == ['default']
        assert provider._configuration_arg('abc', casesense=True) == ['123D']
        assert provider._configuration_arg('xyz', casesense=False) == ['456a', '789b']

    def test_require_class_end_with_suffix(self, ie, logger):
        class InvalidSuffix(PoTokenCacheProvider):
            def get(self, key: str):
                return 'example-cache'

            def store(self, key: str, value: str, expires_at: int):
                pass

            def delete(self, key: str):
                pass

            def is_available(self) -> bool:
                return True

        provider = InvalidSuffix(ie=ie, logger=logger, settings={})

        with pytest.raises(AssertionError):
            provider.PROVIDER_KEY  # noqa: B018


class TestPoTokenCacheSpecProvider:

    def test_base_type(self):
        assert issubclass(PoTokenCacheSpecProvider, IEContentProvider)

    def test_create_provider_missing_supports_method(self, ie, logger):
        class MissingMethodsPCS(PoTokenCacheSpecProvider):
            pass

        with pytest.raises(TypeError):
            MissingMethodsPCS(ie=ie, logger=logger, settings={})

    def test_create_provider_barebones(self, ie, pot_request, logger):
        class BarebonesProviderPCSP(PoTokenCacheSpecProvider):
            def generate_cache_spec(self, request: PoTokenRequest):
                return PoTokenCacheSpec(
                    default_ttl=100,
                    key_bindings={},
                )

        provider = BarebonesProviderPCSP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'BarebonesProvider'
        assert provider.PROVIDER_KEY == 'BarebonesProvider'
        assert provider.PROVIDER_VERSION == '0.0.0'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at (developer has not provided a bug report location) .'
        assert provider.is_available()
        assert provider.generate_cache_spec(request=pot_request).default_ttl == 100
        assert provider.generate_cache_spec(request=pot_request).key_bindings == {}
        assert provider.generate_cache_spec(request=pot_request).write_policy == CacheProviderWritePolicy.WRITE_ALL

    def test_create_provider_example(self, ie, pot_request, logger):
        provider = ExampleCacheSpecProviderPCSP(ie=ie, logger=logger, settings={})
        assert provider.PROVIDER_NAME == 'example'
        assert provider.PROVIDER_KEY == 'ExampleCacheSpecProvider'
        assert provider.PROVIDER_VERSION == '0.0.1'
        assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at https://example.com/issues .'
        assert provider.is_available()
        assert provider.generate_cache_spec(pot_request)
        assert provider.generate_cache_spec(pot_request).key_bindings == {'field': 'example-key'}
        assert provider.generate_cache_spec(pot_request).default_ttl == 60
        assert provider.generate_cache_spec(pot_request).write_policy == CacheProviderWritePolicy.WRITE_FIRST

    def test_get_config_arg(self, ie, logger):
        provider = ExampleCacheSpecProviderPCSP(ie=ie, logger=logger, settings={'abc': ['123D'], 'xyz': ['456a', '789B']})

        assert provider._configuration_arg('abc') == ['123d']
        assert provider._configuration_arg('abc', default=['default']) == ['123d']
        assert provider._configuration_arg('ABC', default=['default']) == ['default']
        assert provider._configuration_arg('abc', casesense=True) == ['123D']
        assert provider._configuration_arg('xyz', casesense=False) == ['456a', '789b']

    def test_require_class_end_with_suffix(self, ie, logger):
        class InvalidSuffix(PoTokenCacheSpecProvider):
            def generate_cache_spec(self, request: PoTokenRequest):
                return None

        provider = InvalidSuffix(ie=ie, logger=logger, settings={})

        with pytest.raises(AssertionError):
            provider.PROVIDER_KEY  # noqa: B018


class TestPoTokenRequest:
    def test_copy_request(self, pot_request):
        copied_request = pot_request.copy()

        assert copied_request is not pot_request
        assert copied_request.context == pot_request.context
        assert copied_request.innertube_context == pot_request.innertube_context
        assert copied_request.innertube_context is not pot_request.innertube_context
        copied_request.innertube_context['client']['clientName'] = 'ANDROID'
        assert pot_request.innertube_context['client']['clientName'] != 'ANDROID'
        assert copied_request.innertube_host == pot_request.innertube_host
        assert copied_request.session_index == pot_request.session_index
        assert copied_request.player_url == pot_request.player_url
        assert copied_request.is_authenticated == pot_request.is_authenticated
        assert copied_request.visitor_data == pot_request.visitor_data
        assert copied_request.data_sync_id == pot_request.data_sync_id
        assert copied_request.video_id == pot_request.video_id
        assert copied_request.request_cookiejar is pot_request.request_cookiejar
        assert copied_request.request_proxy == pot_request.request_proxy
        assert copied_request.request_headers == pot_request.request_headers
        assert copied_request.request_headers is not pot_request.request_headers
        assert copied_request.request_timeout == pot_request.request_timeout
        assert copied_request.request_source_address == pot_request.request_source_address
        assert copied_request.request_verify_tls == pot_request.request_verify_tls
        assert copied_request.bypass_cache == pot_request.bypass_cache


def test_provider_bug_report_message(ie, logger):
    provider = ExamplePTP(ie=ie, logger=logger, settings={})
    assert provider.BUG_REPORT_MESSAGE == 'please report this issue to the provider developer at https://example.com/issues .'
|
||||
|
||||
message = provider_bug_report_message(provider)
|
||||
assert message == '; please report this issue to the provider developer at https://example.com/issues .'
|
||||
|
||||
message_before = provider_bug_report_message(provider, before='custom message!')
|
||||
assert message_before == 'custom message! Please report this issue to the provider developer at https://example.com/issues .'
|
||||
|
||||
|
||||
def test_register_provider(ie):
|
||||
|
||||
@register_provider
|
||||
class UnavailableProviderPTP(PoTokenProvider):
|
||||
def is_available(self) -> bool:
|
||||
return False
|
||||
|
||||
def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
|
||||
raise PoTokenProviderRejectedRequest('Not implemented')
|
||||
|
||||
assert _pot_providers.value.get('UnavailableProvider') == UnavailableProviderPTP
|
||||
_pot_providers.value.pop('UnavailableProvider')
|
||||
|
||||
|
||||
def test_register_pot_preference(ie):
|
||||
before = len(_ptp_preferences.value)
|
||||
|
||||
@register_preference(ExamplePTP)
|
||||
def unavailable_preference(provider: PoTokenProvider, request: PoTokenRequest):
|
||||
return 1
|
||||
|
||||
assert len(_ptp_preferences.value) == before + 1
|
||||
|
||||
|
||||
def test_register_cache_provider(ie):
|
||||
|
||||
@cache.register_provider
|
||||
class UnavailableCacheProviderPCP(PoTokenCacheProvider):
|
||||
def is_available(self) -> bool:
|
||||
return False
|
||||
|
||||
def get(self, key: str):
|
||||
return 'example-cache'
|
||||
|
||||
def store(self, key: str, value: str, expires_at: int):
|
||||
pass
|
||||
|
||||
def delete(self, key: str):
|
||||
pass
|
||||
|
||||
assert _pot_cache_providers.value.get('UnavailableCacheProvider') == UnavailableCacheProviderPCP
|
||||
_pot_cache_providers.value.pop('UnavailableCacheProvider')
|
||||
|
||||
|
||||
def test_register_cache_provider_spec(ie):
|
||||
|
||||
@cache.register_spec
|
||||
class UnavailableCacheProviderPCSP(PoTokenCacheSpecProvider):
|
||||
def is_available(self) -> bool:
|
||||
return False
|
||||
|
||||
def generate_cache_spec(self, request: PoTokenRequest):
|
||||
return None
|
||||
|
||||
assert _pot_pcs_providers.value.get('UnavailableCacheProvider') == UnavailableCacheProviderPCSP
|
||||
_pot_pcs_providers.value.pop('UnavailableCacheProvider')
|
||||
|
||||
|
||||
def test_register_cache_provider_preference(ie):
|
||||
before = len(_pot_cache_provider_preferences.value)
|
||||
|
||||
@cache.register_preference(ExampleCacheProviderPCP)
|
||||
def unavailable_preference(provider: PoTokenCacheProvider, request: PoTokenRequest):
|
||||
return 1
|
||||
|
||||
assert len(_pot_cache_provider_preferences.value) == before + 1
|
||||
|
||||
|
||||
def test_logger_log_level(logger):
|
||||
assert logger.LogLevel('INFO') == logger.LogLevel.INFO
|
||||
assert logger.LogLevel('debuG') == logger.LogLevel.DEBUG
|
||||
assert logger.LogLevel(10) == logger.LogLevel.DEBUG
|
||||
assert logger.LogLevel('UNKNOWN') == logger.LogLevel.INFO
|
@ -316,6 +316,10 @@
        'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
        'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
    ),
    (
        'https://www.youtube.com/s/player/59b252b9/player_ias.vflset/en_US/base.js',
        'D3XWVpYgwhLLKNK4AGX', 'aZrQ1qWJ5yv5h',
    ),
]
@ -644,6 +644,7 @@ def __init__(self, params=None, auto_init=True):
        self._printed_messages = set()
        self._first_webpage_request = True
        self._post_hooks = []
        self._close_hooks = []
        self._progress_hooks = []
        self._postprocessor_hooks = []
        self._download_retcode = 0
@ -912,6 +913,11 @@ def add_post_hook(self, ph):
        """Add the post hook"""
        self._post_hooks.append(ph)

    def add_close_hook(self, ch):
        """Add a close hook, called when YoutubeDL.close() is called"""
        assert callable(ch), 'Close hook must be callable'
        self._close_hooks.append(ch)

    def add_progress_hook(self, ph):
        """Add the download progress hook"""
        self._progress_hooks.append(ph)
@ -1020,6 +1026,9 @@ def close(self):
        self._request_director.close()
        del self._request_director

        for close_hook in self._close_hooks:
            close_hook()

    def trouble(self, message=None, tb=None, is_error=True):
        """Determine action to take when a download problem appears.
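Reviewer note: the new close hooks give embedders a supported way to run cleanup when `YoutubeDL.close()` is called. A minimal hedged sketch of how a caller might use this (the hook body is illustrative, not from this patch):

    from yt_dlp import YoutubeDL

    ydl = YoutubeDL({'quiet': True})

    def release_resources():  # hypothetical cleanup; any zero-argument callable works
        print('tearing down external resources')

    ydl.add_close_hook(release_resources)  # non-callables fail the assert above
    ydl.close()  # closes the request director, then runs release_resources()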
@ -764,11 +764,11 @@ def _get_linux_desktop_environment(env, logger):
    GetDesktopEnvironment
    """
    xdg_current_desktop = env.get('XDG_CURRENT_DESKTOP', None)
    desktop_session = env.get('DESKTOP_SESSION', None)
    desktop_session = env.get('DESKTOP_SESSION', '')
    if xdg_current_desktop is not None:
        for part in map(str.strip, xdg_current_desktop.split(':')):
            if part == 'Unity':
                if desktop_session is not None and 'gnome-fallback' in desktop_session:
                if 'gnome-fallback' in desktop_session:
                    return _LinuxDesktopEnvironment.GNOME
                else:
                    return _LinuxDesktopEnvironment.UNITY
@ -797,35 +797,34 @@ def _get_linux_desktop_environment(env, logger):
                return _LinuxDesktopEnvironment.UKUI
            elif part == 'LXQt':
                return _LinuxDesktopEnvironment.LXQT
        logger.info(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"')
        logger.debug(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"')

    elif desktop_session is not None:
        if desktop_session == 'deepin':
            return _LinuxDesktopEnvironment.DEEPIN
        elif desktop_session in ('mate', 'gnome'):
            return _LinuxDesktopEnvironment.GNOME
        elif desktop_session in ('kde4', 'kde-plasma'):
    if desktop_session == 'deepin':
        return _LinuxDesktopEnvironment.DEEPIN
    elif desktop_session in ('mate', 'gnome'):
        return _LinuxDesktopEnvironment.GNOME
    elif desktop_session in ('kde4', 'kde-plasma'):
        return _LinuxDesktopEnvironment.KDE4
    elif desktop_session == 'kde':
        if 'KDE_SESSION_VERSION' in env:
            return _LinuxDesktopEnvironment.KDE4
        elif desktop_session == 'kde':
            if 'KDE_SESSION_VERSION' in env:
                return _LinuxDesktopEnvironment.KDE4
            else:
                return _LinuxDesktopEnvironment.KDE3
        elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
            return _LinuxDesktopEnvironment.XFCE
        elif desktop_session == 'ukui':
            return _LinuxDesktopEnvironment.UKUI
        else:
            logger.info(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"')

            return _LinuxDesktopEnvironment.KDE3
    elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
        return _LinuxDesktopEnvironment.XFCE
    elif desktop_session == 'ukui':
        return _LinuxDesktopEnvironment.UKUI
    else:
        if 'GNOME_DESKTOP_SESSION_ID' in env:
            return _LinuxDesktopEnvironment.GNOME
        elif 'KDE_FULL_SESSION' in env:
            if 'KDE_SESSION_VERSION' in env:
                return _LinuxDesktopEnvironment.KDE4
            else:
                return _LinuxDesktopEnvironment.KDE3
        logger.debug(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"')

    if 'GNOME_DESKTOP_SESSION_ID' in env:
        return _LinuxDesktopEnvironment.GNOME
    elif 'KDE_FULL_SESSION' in env:
        if 'KDE_SESSION_VERSION' in env:
            return _LinuxDesktopEnvironment.KDE4
        else:
            return _LinuxDesktopEnvironment.KDE3

    return _LinuxDesktopEnvironment.OTHER
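Reviewer note: the net effect is that DESKTOP_SESSION is now consulted even when XDG_CURRENT_DESKTOP is set to something unrecognized, instead of only when it is absent. A hedged sketch against the private helper (internal API, subject to change; env values invented):

    import logging
    from yt_dlp.cookies import _get_linux_desktop_environment

    logger = logging.getLogger('demo')  # only .debug()/.info() are needed here
    env = {'XDG_CURRENT_DESKTOP': 'SomethingUnknown', 'DESKTOP_SESSION': 'mate'}
    print(_get_linux_desktop_environment(env, logger).name)
    # GNOME -- before this fix the function fell through to OTHER, because the
    # DESKTOP_SESSION branch was only reachable when XDG_CURRENT_DESKTOP was unset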
@ -2147,6 +2147,7 @@
from .toggo import ToggoIE
from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE
from .toutiao import ToutiaoIE
from .toutv import TouTvIE
from .toypics import (
    ToypicsIE,
@ -2369,6 +2370,7 @@
    VHXEmbedIE,
    VimeoAlbumIE,
    VimeoChannelIE,
    VimeoEventIE,
    VimeoGroupsIE,
    VimeoIE,
    VimeoLikesIE,
@ -2,7 +2,6 @@
from ..utils import (
    clean_html,
    merge_dicts,
    str_or_none,
    traverse_obj,
    unified_timestamp,
    url_or_none,
@ -138,13 +137,15 @@ def _real_extract(self, url):
            'https://www.lrt.lt/radioteka/api/media', video_id,
            query={'url': f'/mediateka/irasas/{video_id}/{path}'})

        return traverse_obj(media, {
            'id': ('id', {int}, {str_or_none}),
            'title': ('title', {str}),
            'tags': ('tags', ..., 'name', {str}),
            'categories': ('playlist_item', 'category', {str}, filter, all, filter),
            'description': ('content', {clean_html}, {str}),
            'timestamp': ('date', {lambda x: x.replace('.', '/')}, {unified_timestamp}),
            'thumbnail': ('playlist_item', 'image', {urljoin('https://www.lrt.lt')}),
            'formats': ('playlist_item', 'file', {lambda x: self._extract_m3u8_formats(x, video_id)}),
        })
        return {
            'id': video_id,
            'formats': self._extract_m3u8_formats(media['playlist_item']['file'], video_id),
            **traverse_obj(media, {
                'title': ('title', {str}),
                'tags': ('tags', ..., 'name', {str}),
                'categories': ('playlist_item', 'category', {str}, filter, all, filter),
                'description': ('content', {clean_html}, {str}),
                'timestamp': ('date', {lambda x: x.replace('.', '/')}, {unified_timestamp}),
                'thumbnail': ('playlist_item', 'image', {urljoin('https://www.lrt.lt')}),
            }),
        }
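Reviewer note: the dict form of traverse_obj used here maps output keys to traversal paths, applying each {callable} in turn and dropping keys whose path yields nothing. A minimal hedged sketch with made-up data:

    from yt_dlp.utils import clean_html, traverse_obj

    media = {'title': 'Demo', 'content': '<p>summary</p>', 'tags': [{'name': 'radio'}]}
    print(traverse_obj(media, {
        'title': ('title', {str}),
        'description': ('content', {clean_html}, {str}),
        'tags': ('tags', ..., 'name', {str}),
    }))
    # {'title': 'Demo', 'description': 'summary', 'tags': ['radio']}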
@ -32,7 +32,7 @@
    urlencode_postdata,
    urljoin,
)
from ..utils.traversal import find_element, traverse_obj
from ..utils.traversal import find_element, require, traverse_obj


class NiconicoBaseIE(InfoExtractor):
@ -283,35 +283,54 @@ def _yield_dms_formats(self, api_data, video_id):
                lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1
            yield video_fmt

    def _extract_server_response(self, webpage, video_id, fatal=True):
        try:
            return traverse_obj(
                self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id),
                ('data', 'response', {dict}, {require('server response')}))
        except ExtractorError:
            if not fatal:
                return {}
            raise

    def _real_extract(self, url):
        video_id = self._match_id(url)

        try:
            webpage, handle = self._download_webpage_handle(
                'https://www.nicovideo.jp/watch/' + video_id, video_id)
                f'https://www.nicovideo.jp/watch/{video_id}', video_id,
                headers=self.geo_verification_headers())
            if video_id.startswith('so'):
                video_id = self._match_id(handle.url)

            api_data = traverse_obj(
                self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id),
                ('data', 'response', {dict}))
            if not api_data:
                raise ExtractorError('Server response data not found')
            api_data = self._extract_server_response(webpage, video_id)
        except ExtractorError as e:
            try:
                api_data = self._download_json(
                    f'https://www.nicovideo.jp/api/watch/v3/{video_id}?_frontendId=6&_frontendVersion=0&actionTrackId=AAAAAAAAAA_{round(time.time() * 1000)}', video_id,
                    note='Downloading API JSON', errnote='Unable to fetch data')['data']
                    f'https://www.nicovideo.jp/api/watch/v3/{video_id}', video_id,
                    'Downloading API JSON', 'Unable to fetch data', query={
                        '_frontendId': '6',
                        '_frontendVersion': '0',
                        'actionTrackId': f'AAAAAAAAAA_{round(time.time() * 1000)}',
                    }, headers=self.geo_verification_headers())['data']
            except ExtractorError:
                if not isinstance(e.cause, HTTPError):
                    # Raise if original exception was from _parse_json or utils.traversal.require
                    raise
                # The webpage server response has more detailed error info than the API response
                webpage = e.cause.response.read().decode('utf-8', 'replace')
                error_msg = self._html_search_regex(
                    r'(?s)<section\s+class="(?:(?:ErrorMessage|WatchExceptionPage-message)\s*)+">(.+?)</section>',
                    webpage, 'error reason', default=None)
                if not error_msg:
                    reason_code = self._extract_server_response(
                        webpage, video_id, fatal=False).get('reasonCode')
                    if not reason_code:
                        raise
                raise ExtractorError(clean_html(error_msg), expected=True)
                    if reason_code in ('DOMESTIC_VIDEO', 'HIGH_RISK_COUNTRY_VIDEO'):
                        self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
                    elif reason_code == 'HIDDEN_VIDEO':
                        raise ExtractorError(
                            'The viewing period of this video has expired', expected=True)
                    elif reason_code == 'DELETED_VIDEO':
                        raise ExtractorError('This video has been deleted', expected=True)
                    raise ExtractorError(f'Niconico says: {reason_code}')

        availability = self._availability(**(traverse_obj(api_data, ('payment', 'video', {
            'needs_premium': ('isPremium', {bool}),
@ -340,8 +340,9 @@ def _real_extract(self, url):
            'channel_follower_count': ('attributes', 'patron_count', {int_or_none}),
        }))

        # all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, Vimeo
        headers = {'referer': 'https://patreon.com/'}
        # Must be all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, and Vimeo.
        # patreon.com URLs redirect to www.patreon.com; this matters when requesting mux.com m3u8s
        headers = {'referer': 'https://www.patreon.com/'}

        # handle Vimeo embeds
        if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo':
@ -352,7 +353,7 @@ def _real_extract(self, url):
                    v_url, video_id, 'Checking Vimeo embed URL', headers=headers,
                    fatal=False, errnote=False, expected_status=429):  # 429 is TLS fingerprint rejection
                entries.append(self.url_result(
                    VimeoIE._smuggle_referrer(v_url, 'https://patreon.com/'),
                    VimeoIE._smuggle_referrer(v_url, headers['referer']),
                    VimeoIE, url_transparent=True))

        embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none}))
@ -379,11 +380,13 @@ def _real_extract(self, url):
                    'url': post_file['url'],
                })
            elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8':
                formats, subtitles = self._extract_m3u8_formats_and_subtitles(post_file['url'], video_id)
                formats, subtitles = self._extract_m3u8_formats_and_subtitles(
                    post_file['url'], video_id, headers=headers)
                entries.append({
                    'id': video_id,
                    'formats': formats,
                    'subtitles': subtitles,
                    'http_headers': headers,
                })

        can_view_post = traverse_obj(attributes, 'current_user_can_view')
@ -10,7 +10,8 @@


class PicartoIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
    IE_NAME = 'picarto'
    _VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[^/#?]+)/?(?:$|[?#])'
    _TEST = {
        'url': 'https://picarto.tv/Setz',
        'info_dict': {
@ -89,7 +90,8 @@ def _real_extract(self, url):


class PicartoVodIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?:videopopout|\w+/videos)/(?P<id>[^/?#&]+)'
    IE_NAME = 'picarto:vod'
    _VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?:videopopout|\w+(?:/profile)?/videos)/(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'https://picarto.tv/videopopout/ArtofZod_2017.12.12.00.13.23.flv',
        'md5': '3ab45ba4352c52ee841a28fb73f2d9ca',
@ -111,6 +113,18 @@ class PicartoVodIE(InfoExtractor):
            'channel': 'ArtofZod',
            'age_limit': 18,
        },
    }, {
        'url': 'https://picarto.tv/DrechuArt/profile/videos/400347',
        'md5': 'f9ea54868b1d9dec40eb554b484cc7bf',
        'info_dict': {
            'id': '400347',
            'ext': 'mp4',
            'title': 'Welcome to the Show',
            'thumbnail': r're:^https?://.*\.jpg',
            'channel': 'DrechuArt',
            'age_limit': 0,
        },
    }, {
        'url': 'https://picarto.tv/videopopout/Plague',
        'only_matching': True,
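Reviewer note: the widened VOD pattern now also matches profile-scoped video URLs. A quick hedged check (regex copied from the diff):

    import re

    _VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?:videopopout|\w+(?:/profile)?/videos)/(?P<id>[^/?#&]+)'
    for url in ('https://picarto.tv/videopopout/Plague',
                'https://picarto.tv/DrechuArt/videos/400347',
                'https://picarto.tv/DrechuArt/profile/videos/400347'):
        print(re.match(_VALID_URL, url).group('id'))  # Plague, 400347, 400347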
@ -9,11 +9,10 @@
    int_or_none,
    join_nonempty,
    parse_qs,
    traverse_obj,
    update_url_query,
    urlencode_postdata,
)
from ..utils.traversal import unpack
from ..utils.traversal import traverse_obj, unpack


class PlaySuisseIE(InfoExtractor):
@ -5,11 +5,13 @@
from ..utils import (
    OnDemandPagedList,
    float_or_none,
    int_or_none,
    orderedSet,
    str_or_none,
    str_to_int,
    traverse_obj,
    unified_timestamp,
    url_or_none,
)
from ..utils.traversal import require, traverse_obj


class PodchaserIE(InfoExtractor):
@ -21,24 +23,25 @@ class PodchaserIE(InfoExtractor):
            'id': '104365585',
            'title': 'Ep. 285 – freeze me off',
            'description': 'cam ahn',
            'thumbnail': r're:^https?://.*\.jpg$',
            'thumbnail': r're:https?://.+/.+\.jpg',
            'ext': 'mp3',
            'categories': ['Comedy'],
            'categories': ['Comedy', 'News', 'Politics', 'Arts'],
            'tags': ['comedy', 'dark humor'],
            'series': 'Cum Town',
            'series': 'The Adam Friedland Show Podcast',
            'duration': 3708,
            'timestamp': 1636531259,
            'upload_date': '20211110',
            'average_rating': 4.0,
            'series_id': '36924',
        },
    }, {
        'url': 'https://www.podchaser.com/podcasts/the-bone-zone-28853',
        'info_dict': {
            'id': '28853',
            'title': 'The Bone Zone',
            'description': 'Podcast by The Bone Zone',
            'description': r're:The official home of the Bone Zone podcast.+',
        },
        'playlist_count': 275,
        'playlist_mincount': 275,
    }, {
        'url': 'https://www.podchaser.com/podcasts/sean-carrolls-mindscape-scienc-699349/episodes',
        'info_dict': {
@ -51,19 +54,33 @@ class PodchaserIE(InfoExtractor):

    @staticmethod
    def _parse_episode(episode, podcast):
        return {
            'id': str(episode.get('id')),
            'title': episode.get('title'),
            'description': episode.get('description'),
            'url': episode.get('audio_url'),
            'thumbnail': episode.get('image_url'),
            'duration': str_to_int(episode.get('length')),
            'timestamp': unified_timestamp(episode.get('air_date')),
            'average_rating': float_or_none(episode.get('rating')),
            'categories': list(set(traverse_obj(podcast, (('summary', None), 'categories', ..., 'text')))),
            'tags': traverse_obj(podcast, ('tags', ..., 'text')),
            'series': podcast.get('title'),
        }
        info = traverse_obj(episode, {
            'id': ('id', {int}, {str_or_none}, {require('episode ID')}),
            'title': ('title', {str}),
            'description': ('description', {str}),
            'url': ('audio_url', {url_or_none}),
            'thumbnail': ('image_url', {url_or_none}),
            'duration': ('length', {int_or_none}),
            'timestamp': ('air_date', {unified_timestamp}),
            'average_rating': ('rating', {float_or_none}),
        })
        info.update(traverse_obj(podcast, {
            'series': ('title', {str}),
            'series_id': ('id', {int}, {str_or_none}),
            'categories': (('summary', None), 'categories', ..., 'text', {str}, filter, all, {orderedSet}),
            'tags': ('tags', ..., 'text', {str}),
        }))
        info['vcodec'] = 'none'

        if info.get('series_id'):
            podcast_slug = traverse_obj(podcast, ('slug', {str})) or 'podcast'
            episode_slug = traverse_obj(episode, ('slug', {str})) or 'episode'
            info['webpage_url'] = '/'.join((
                'https://www.podchaser.com/podcasts',
                '-'.join((podcast_slug[:30].rstrip('-'), info['series_id'])),
                '-'.join((episode_slug[:30].rstrip('-'), info['id']))))

        return info

    def _call_api(self, path, *args, **kwargs):
        return self._download_json(f'https://api.podchaser.com/{path}', *args, **kwargs)
@ -93,5 +110,5 @@ def _real_extract(self, url):
            OnDemandPagedList(functools.partial(self._fetch_page, podcast_id, podcast), self._PAGE_SIZE),
            str_or_none(podcast.get('id')), podcast.get('title'), podcast.get('description'))

        episode = self._call_api(f'episodes/{episode_id}', episode_id)
        episode = self._call_api(f'podcasts/{podcast_id}/episodes/{episode_id}/player_ids', episode_id)
        return self._parse_episode(episode, podcast)
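Reviewer note: to see what the new webpage_url construction produces, here is a hedged worked example (the slug values are invented for illustration; only the joining logic is from the diff):

    series_id, episode_id = '36924', '104365585'
    podcast_slug = 'the-adam-friedland-show-podcast'  # hypothetical API slug
    episode_slug = 'ep-285-freeze-me-off'             # hypothetical API slug
    url = '/'.join((
        'https://www.podchaser.com/podcasts',
        '-'.join((podcast_slug[:30].rstrip('-'), series_id)),
        '-'.join((episode_slug[:30].rstrip('-'), episode_id))))
    print(url)
    # https://www.podchaser.com/podcasts/the-adam-friedland-show-podcas-36924/ep-285-freeze-me-off-104365585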
@ -697,7 +697,7 @@ def _real_extract(self, url):
        try:
            return self._extract_info_dict(info, full_title, token)
        except ExtractorError as e:
            if not isinstance(e.cause, HTTPError) or not e.cause.status == 429:
            if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
                raise
            self.report_warning(
                'You have reached the API rate limit, which is ~600 requests per '
yt_dlp/extractor/toutiao.py (new file, 121 lines)
@ -0,0 +1,121 @@
import json
import urllib.parse

from .common import InfoExtractor
from ..utils import (
    float_or_none,
    int_or_none,
    str_or_none,
    try_call,
    url_or_none,
)
from ..utils.traversal import find_element, traverse_obj


class ToutiaoIE(InfoExtractor):
    IE_NAME = 'toutiao'
    IE_DESC = '今日头条'

    _VALID_URL = r'https?://www\.toutiao\.com/video/(?P<id>\d+)/?(?:[?#]|$)'
    _TESTS = [{
        'url': 'https://www.toutiao.com/video/7505382061495176511/',
        'info_dict': {
            'id': '7505382061495176511',
            'ext': 'mp4',
            'title': '新疆多地现不明飞行物,目击者称和月亮一样亮,几秒内突然加速消失,气象部门回应',
            'comment_count': int,
            'duration': 9.753,
            'like_count': int,
            'release_date': '20250517',
            'release_timestamp': 1747483344,
            'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
            'uploader': '极目新闻',
            'uploader_id': 'MS4wLjABAAAAeateBb9Su8I3MJOZozmvyzWktmba5LMlliRDz1KffnM',
            'view_count': int,
        },
    }, {
        'url': 'https://www.toutiao.com/video/7479446610359878153/',
        'info_dict': {
            'id': '7479446610359878153',
            'ext': 'mp4',
            'title': '小伙竟然利用两块磁铁制作成磁力减震器,简直太有创意了!',
            'comment_count': int,
            'duration': 118.374,
            'like_count': int,
            'release_date': '20250308',
            'release_timestamp': 1741444368,
            'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
            'uploader': '小莉创意发明',
            'uploader_id': 'MS4wLjABAAAA4f7d4mwtApALtHIiq-QM20dwXqe32NUz0DeWF7wbHKw',
            'view_count': int,
        },
    }]

    def _real_initialize(self):
        if self._get_cookies('https://www.toutiao.com').get('ttwid'):
            return

        urlh = self._request_webpage(
            'https://ttwid.bytedance.com/ttwid/union/register/', None,
            'Fetching ttwid', 'Unable to fetch ttwid', headers={
                'Content-Type': 'application/json',
            }, data=json.dumps({
                'aid': 24,
                'needFid': False,
                'region': 'cn',
                'service': 'www.toutiao.com',
                'union': True,
            }).encode(),
        )

        if ttwid := try_call(lambda: self._get_cookies(urlh.url)['ttwid'].value):
            self._set_cookie('.toutiao.com', 'ttwid', ttwid)
            return

        self.raise_login_required()

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        video_data = traverse_obj(webpage, (
            {find_element(tag='script', id='RENDER_DATA')},
            {urllib.parse.unquote}, {json.loads}, 'data', 'initialVideo',
        ))

        formats = []
        for video in traverse_obj(video_data, (
            'videoPlayInfo', 'video_list', lambda _, v: v['main_url'],
        )):
            formats.append({
                'url': video['main_url'],
                **traverse_obj(video, ('video_meta', {
                    'acodec': ('audio_profile', {str}),
                    'asr': ('audio_sample_rate', {int_or_none}),
                    'audio_channels': ('audio_channels', {float_or_none}, {int_or_none}),
                    'ext': ('vtype', {str}),
                    'filesize': ('size', {int_or_none}),
                    'format_id': ('definition', {str}),
                    'fps': ('fps', {int_or_none}),
                    'height': ('vheight', {int_or_none}),
                    'tbr': ('real_bitrate', {float_or_none(scale=1000)}),
                    'vcodec': ('codec_type', {str}),
                    'width': ('vwidth', {int_or_none}),
                })),
            })

        return {
            'id': video_id,
            'formats': formats,
            **traverse_obj(video_data, {
                'comment_count': ('commentCount', {int_or_none}),
                'duration': ('videoPlayInfo', 'video_duration', {float_or_none}),
                'like_count': ('repinCount', {int_or_none}),
                'release_timestamp': ('publishTime', {int_or_none}),
                'thumbnail': (('poster', 'coverUrl'), {url_or_none}, any),
                'title': ('title', {str}),
                'uploader': ('userInfo', 'name', {str}),
                'uploader_id': ('userInfo', 'userId', {str_or_none}),
                'view_count': ('playCount', {int_or_none}),
                'webpage_url': ('detailUrl', {url_or_none}),
            }),
        }
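Reviewer note: `_real_initialize` above bootstraps the `ttwid` cookie by registering with ByteDance's union endpoint. A standalone hedged sketch of the same handshake, using only the request shape visible in the extractor (stdlib cookie handling; this is not an official API contract):

    import json
    import urllib.request
    from http.cookiejar import CookieJar

    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    req = urllib.request.Request(
        'https://ttwid.bytedance.com/ttwid/union/register/',
        data=json.dumps({
            'aid': 24, 'needFid': False, 'region': 'cn',
            'service': 'www.toutiao.com', 'union': True,
        }).encode(),
        headers={'Content-Type': 'application/json'})
    opener.open(req)  # on success, the response sets a 'ttwid' cookie
    print([c.name for c in jar])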
@ -1,4 +1,5 @@
import base64
import hashlib
import itertools
import re

@ -16,6 +17,7 @@
    str_to_int,
    try_get,
    unified_timestamp,
    update_url_query,
    url_or_none,
    urlencode_postdata,
    urljoin,
@ -171,6 +173,10 @@ def find_dmu(x):
            'player': 'pc_web',
        })

        password_params = {
            'word': hashlib.md5(video_password.encode()).hexdigest(),
        } if video_password else None

        formats = []
        # low: 640x360, medium: 1280x720, high: 1920x1080
        qq = qualities(['low', 'medium', 'high'])
@ -178,7 +184,7 @@ def find_dmu(x):
            'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
        )):
            formats.append({
                'url': m3u8_url,
                'url': update_url_query(m3u8_url, password_params),
                'format_id': f'hls-{quality}',
                'ext': 'mp4',
                'quality': qq(quality),
@ -192,7 +198,7 @@ def find_dmu(x):
            'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
        )):
            formats.append({
                'url': ws_url,
                'url': update_url_query(ws_url, password_params),
                'format_id': f'ws-{mode}',
                'ext': 'mp4',
                'quality': qq(mode),
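Reviewer note: `qualities()` (from yt_dlp.utils) turns the ordered list into a preference function, which is how the low/medium/high ranking in the comment is applied to formats. A minimal sketch:

    from yt_dlp.utils import qualities

    qq = qualities(['low', 'medium', 'high'])
    print(qq('low'), qq('medium'), qq('high'))  # 0 1 2 -- higher is preferred
    print(qq('unknown'))  # -1 for values outside the list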
@ -187,7 +187,7 @@ def _get_thumbnails(self, thumbnail):
            'url': thumbnail,
        }] if thumbnail else None

    def _extract_twitch_m3u8_formats(self, path, video_id, token, signature):
    def _extract_twitch_m3u8_formats(self, path, video_id, token, signature, live_from_start=False):
        formats = self._extract_m3u8_formats(
            f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={
                'allow_source': 'true',
@ -204,7 +204,10 @@ def _extract_twitch_m3u8_formats(self, path, video_id, token, signature):
        for fmt in formats:
            if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'):
                # mpegts does not yet have proper support for av1
                fmt['downloader_options'] = {'ffmpeg_args_out': ['-f', 'mp4']}
                fmt.setdefault('downloader_options', {}).update({'ffmpeg_args_out': ['-f', 'mp4']})
            if live_from_start:
                fmt.setdefault('downloader_options', {}).update({'ffmpeg_args': ['-live_start_index', '0']})
                fmt['is_from_start'] = True

        return formats
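Reviewer note: with `live_from_start=True`, every format gets ffmpeg's `-live_start_index 0`, so the HLS download begins at the first segment instead of the live edge. From the command line this path is reached via the existing flag (channel name is a placeholder):

    yt-dlp --live-from-start https://www.twitch.tv/CHANNEL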
@ -550,7 +553,8 @@ def _real_extract(self, url):
        access_token = self._download_access_token(vod_id, 'video', 'id')

        formats = self._extract_twitch_m3u8_formats(
            'vod', vod_id, access_token['value'], access_token['signature'])
            'vod', vod_id, access_token['value'], access_token['signature'],
            live_from_start=self.get_param('live_from_start'))
        formats.extend(self._extract_storyboard(vod_id, video.get('storyboard'), info.get('duration')))

        self._prefer_source(formats)
@ -633,6 +637,10 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
    _PAGE_LIMIT = 100

    def _entries(self, channel_name, *args):
        """
        Subclasses must define _make_variables() and _extract_entry(),
        as well as set _OPERATION_NAME, _ENTRY_KIND, _EDGE_KIND, and _NODE_KIND
        """
        cursor = None
        variables_common = self._make_variables(channel_name, *args)
        entries_key = f'{self._ENTRY_KIND}s'
@ -672,7 +680,22 @@ def _entries(self, channel_name, *args):
            break


class TwitchVideosIE(TwitchPlaylistBaseIE):
class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
    _OPERATION_NAME = 'FilterableVideoTower_Videos'
    _ENTRY_KIND = 'video'
    _EDGE_KIND = 'VideoEdge'
    _NODE_KIND = 'Video'

    @staticmethod
    def _make_variables(channel_name, broadcast_type, sort):
        return {
            'channelOwnerLogin': channel_name,
            'broadcastType': broadcast_type,
            'videoSort': sort.upper(),
        }


class TwitchVideosIE(TwitchVideosBaseIE):
    _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)'

    _TESTS = [{
@ -751,11 +774,6 @@ class TwitchVideosIE(TwitchPlaylistBaseIE):
        'views': 'Popular',
    }

    _OPERATION_NAME = 'FilterableVideoTower_Videos'
    _ENTRY_KIND = 'video'
    _EDGE_KIND = 'VideoEdge'
    _NODE_KIND = 'Video'

    @classmethod
    def suitable(cls, url):
        return (False
@ -764,14 +782,6 @@ def suitable(cls, url):
                    TwitchVideosCollectionsIE))
                else super().suitable(url))

    @staticmethod
    def _make_variables(channel_name, broadcast_type, sort):
        return {
            'channelOwnerLogin': channel_name,
            'broadcastType': broadcast_type,
            'videoSort': sort.upper(),
        }

    @staticmethod
    def _extract_entry(node):
        return _make_video_result(node)
@ -919,7 +929,7 @@ def _real_extract(self, url):
            playlist_title=f'{channel_name} - Collections')


class TwitchStreamIE(TwitchBaseIE):
class TwitchStreamIE(TwitchVideosBaseIE):
    IE_NAME = 'twitch:stream'
    _VALID_URL = r'''(?x)
                    https?://
@ -982,6 +992,7 @@ class TwitchStreamIE(TwitchBaseIE):
            'skip_download': 'Livestream',
        },
    }]
    _PAGE_LIMIT = 1

    @classmethod
    def suitable(cls, url):
@ -995,6 +1006,20 @@ def suitable(cls, url):
                    TwitchClipsIE))
                else super().suitable(url))

    @staticmethod
    def _extract_entry(node):
        if not isinstance(node, dict) or not node.get('id'):
            return None
        video_id = node['id']
        return {
            '_type': 'url',
            'ie_key': TwitchVodIE.ie_key(),
            'id': 'v' + video_id,
            'url': f'https://www.twitch.tv/videos/{video_id}',
            'title': node.get('title'),
            'timestamp': unified_timestamp(node.get('publishedAt')) or 0,
        }

    def _real_extract(self, url):
        channel_name = self._match_id(url).lower()

@ -1029,6 +1054,16 @@ def _real_extract(self, url):
        if not stream:
            raise UserNotLive(video_id=channel_name)

        timestamp = unified_timestamp(stream.get('createdAt'))

        if self.get_param('live_from_start'):
            self.to_screen(f'{channel_name}: Extracting VOD to download live from start')
            entry = next(self._entries(channel_name, None, 'time'), None)
            if entry and entry.pop('timestamp') >= (timestamp or float('inf')):
                return entry
            self.report_warning(
                'Unable to extract the VOD associated with this livestream', video_id=channel_name)

        access_token = self._download_access_token(
            channel_name, 'stream', 'channelName')

@ -1038,7 +1073,6 @@ def _real_extract(self, url):
        self._prefer_source(formats)

        view_count = stream.get('viewers')
        timestamp = unified_timestamp(stream.get('createdAt'))

        sq_user = try_get(gql, lambda x: x[1]['data']['user'], dict) or {}
        uploader = sq_user.get('displayName')
@ -20,7 +20,6 @@
    remove_end,
    str_or_none,
    strip_or_none,
    traverse_obj,
    truncate_string,
    try_call,
    try_get,
@ -29,6 +28,7 @@
    url_or_none,
    xpath_text,
)
from ..utils.traversal import require, traverse_obj


class TwitterBaseIE(InfoExtractor):
@ -1342,7 +1342,7 @@ def _extract_status(self, twid):
                'tweet_mode': 'extended',
            })
        except ExtractorError as e:
            if not isinstance(e.cause, HTTPError) or not e.cause.status == 429:
            if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
                raise
            self.report_warning('Rate-limit exceeded; falling back to syndication endpoint')
            status = self._call_syndication_api(twid)
@ -1596,8 +1596,8 @@ def _find_dimension(target):

class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
    IE_NAME = 'twitter:broadcast'
    _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'

    _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/(?P<type>broadcasts|events)/(?P<id>\w+)'
    _TESTS = [{
        # untitled Periscope video
        'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
@ -1605,6 +1605,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'id': '1yNGaQLWpejGj',
            'ext': 'mp4',
            'title': 'Andrea May Sahouri - Periscope Broadcast',
            'display_id': '1yNGaQLWpejGj',
            'uploader': 'Andrea May Sahouri',
            'uploader_id': 'andreamsahouri',
            'uploader_url': 'https://twitter.com/andreamsahouri',
@ -1612,6 +1613,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'upload_date': '20200601',
            'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
            'view_count': int,
            'concurrent_view_count': int,
            'live_status': 'was_live',
        },
    }, {
        'url': 'https://twitter.com/i/broadcasts/1ZkKzeyrPbaxv',
@ -1619,6 +1622,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'id': '1ZkKzeyrPbaxv',
            'ext': 'mp4',
            'title': 'Starship | SN10 | High-Altitude Flight Test',
            'display_id': '1ZkKzeyrPbaxv',
            'uploader': 'SpaceX',
            'uploader_id': 'SpaceX',
            'uploader_url': 'https://twitter.com/SpaceX',
@ -1626,6 +1630,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'upload_date': '20210303',
            'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
            'view_count': int,
            'concurrent_view_count': int,
            'live_status': 'was_live',
        },
    }, {
        'url': 'https://twitter.com/i/broadcasts/1OyKAVQrgzwGb',
@ -1633,6 +1639,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'id': '1OyKAVQrgzwGb',
            'ext': 'mp4',
            'title': 'Starship Flight Test',
            'display_id': '1OyKAVQrgzwGb',
            'uploader': 'SpaceX',
            'uploader_id': 'SpaceX',
            'uploader_url': 'https://twitter.com/SpaceX',
@ -1640,21 +1647,58 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
            'upload_date': '20230420',
            'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
            'view_count': int,
            'concurrent_view_count': int,
            'live_status': 'was_live',
        },
    }, {
        'url': 'https://x.com/i/events/1910629646300762112',
        'info_dict': {
            'id': '1LyxBWDRNqyKN',
            'ext': 'mp4',
            'title': '#ガンニバル ウォッチパーティー',
            'concurrent_view_count': int,
            'display_id': '1910629646300762112',
            'live_status': 'was_live',
            'release_date': '20250423',
            'release_timestamp': 1745409000,
            'tags': ['ガンニバル'],
            'thumbnail': r're:https?://[^?#]+\.jpg\?token=',
            'timestamp': 1745403328,
            'upload_date': '20250423',
            'uploader': 'ディズニープラス公式',
            'uploader_id': 'DisneyPlusJP',
            'uploader_url': 'https://twitter.com/DisneyPlusJP',
            'view_count': int,
        },
    }]

    def _real_extract(self, url):
        broadcast_id = self._match_id(url)
        broadcast_type, display_id = self._match_valid_url(url).group('type', 'id')

        if broadcast_type == 'events':
            timeline = self._call_api(
                f'live_event/1/{display_id}/timeline.json', display_id)
            broadcast_id = traverse_obj(timeline, (
                'twitter_objects', 'broadcasts', ..., ('id', 'broadcast_id'),
                {str}, any, {require('broadcast ID')}))
        else:
            broadcast_id = display_id

        broadcast = self._call_api(
            'broadcasts/show.json', broadcast_id,
            {'ids': broadcast_id})['broadcasts'][broadcast_id]
        if not broadcast:
            raise ExtractorError('Broadcast no longer exists', expected=True)
        info = self._parse_broadcast_data(broadcast, broadcast_id)
        info['title'] = broadcast.get('status') or info.get('title')
        info['uploader_id'] = broadcast.get('twitter_username') or info.get('uploader_id')
        info['uploader_url'] = format_field(broadcast, 'twitter_username', 'https://twitter.com/%s', default=None)
        info.update({
            'display_id': display_id,
            'title': broadcast.get('status') or info.get('title'),
            'uploader_id': broadcast.get('twitter_username') or info.get('uploader_id'),
            'uploader_url': format_field(
                broadcast, 'twitter_username', 'https://twitter.com/%s', default=None),
        })
        if info['live_status'] == 'is_upcoming':
            self.raise_no_formats('This live broadcast has not yet started', expected=True)
            return info

        media_key = broadcast['media_key']
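Reviewer note: the events branch above leans on traverse_obj's branching plus the new `require` helper, which raises an ExtractorError instead of silently returning None. A hedged sketch with made-up timeline data:

    from yt_dlp.utils import ExtractorError
    from yt_dlp.utils.traversal import require, traverse_obj

    timeline = {'twitter_objects': {'broadcasts': [{'broadcast_id': '1LyxBWDRNqyKN'}]}}
    print(traverse_obj(timeline, (
        'twitter_objects', 'broadcasts', ..., ('id', 'broadcast_id'), {str}, any)))
    # 1LyxBWDRNqyKN -- ('id', 'broadcast_id') tries both keys per item

    try:
        traverse_obj({}, ('twitter_objects', {require('broadcast ID')}))
    except ExtractorError:
        print('missing broadcast ID raises ExtractorError')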
@ -3,6 +3,7 @@
import itertools
import json
import re
import time
import urllib.parse

from .common import InfoExtractor
@ -13,10 +14,12 @@
    OnDemandPagedList,
    clean_html,
    determine_ext,
    filter_dict,
    get_element_by_class,
    int_or_none,
    join_nonempty,
    js_to_json,
    jwt_decode_hs256,
    merge_dicts,
    parse_filesize,
    parse_iso8601,
@ -39,6 +42,9 @@ class VimeoBaseInfoExtractor(InfoExtractor):
    _NETRC_MACHINE = 'vimeo'
    _LOGIN_REQUIRED = False
    _LOGIN_URL = 'https://vimeo.com/log_in'
    _REFERER_HINT = (
        'Cannot download embed-only video without embedding URL. Please call yt-dlp '
        'with the URL of the page that embeds this video.')
    _IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw=='
    _IOS_CLIENT_HEADERS = {
        'Accept': 'application/vnd.vimeo.*+json; version=3.4.10',
@ -47,6 +53,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
    }
    _IOS_OAUTH_CACHE_KEY = 'oauth-token-ios'
    _ios_oauth_token = None
    _viewer_info = None

    @staticmethod
    def _smuggle_referrer(url, referrer_url):
@ -60,8 +67,21 @@ def _unsmuggle_headers(self, url):
            headers['Referer'] = data['referer']
        return url, data, headers

    def _jwt_is_expired(self, token):
        return jwt_decode_hs256(token)['exp'] - time.time() < 120

    def _fetch_viewer_info(self, display_id=None, fatal=True):
        if self._viewer_info and not self._jwt_is_expired(self._viewer_info['jwt']):
            return self._viewer_info

        self._viewer_info = self._download_json(
            'https://vimeo.com/_next/viewer', display_id, 'Downloading web token info',
            'Failed to download web token info', fatal=fatal, headers={'Accept': 'application/json'})

        return self._viewer_info

    def _perform_login(self, username, password):
        viewer = self._download_json('https://vimeo.com/_next/viewer', None, 'Downloading login token')
        viewer = self._fetch_viewer_info()
        data = {
            'action': 'login',
            'email': username,
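Reviewer note: `_jwt_is_expired` treats a cached viewer JWT as stale two minutes before its actual `exp`, so it is refreshed early rather than failing mid-request. A hedged sketch of the check with a hand-built unsigned token (jwt_decode_hs256 only base64-decodes the payload; it does not verify the signature):

    import base64
    import json
    import time

    from yt_dlp.utils import jwt_decode_hs256

    def _b64(d):  # urlsafe base64 without padding, as in real JWTs
        return base64.urlsafe_b64encode(json.dumps(d).encode()).rstrip(b'=').decode()

    token = '.'.join((_b64({'alg': 'HS256'}), _b64({'exp': int(time.time()) + 60}), 'sig'))
    print(jwt_decode_hs256(token)['exp'] - time.time() < 120)  # True: within the 120s margin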
@ -96,11 +116,10 @@ def _get_video_password(self):
|
||||
expected=True)
|
||||
return password
|
||||
|
||||
def _verify_video_password(self, video_id):
|
||||
def _verify_video_password(self, video_id, path=None):
|
||||
video_password = self._get_video_password()
|
||||
token = self._download_json(
|
||||
'https://vimeo.com/_next/viewer', video_id, 'Downloading viewer info')['xsrft']
|
||||
url = f'https://vimeo.com/{video_id}'
|
||||
token = self._fetch_viewer_info(video_id)['xsrft']
|
||||
url = join_nonempty('https://vimeo.com', path, video_id, delim='/')
|
||||
try:
|
||||
self._request_webpage(
|
||||
f'{url}/password', video_id,
|
||||
@ -117,6 +136,10 @@ def _verify_video_password(self, video_id):
|
||||
raise ExtractorError('Wrong password', expected=True)
|
||||
raise
|
||||
|
||||
def _extract_config_url(self, webpage, **kwargs):
|
||||
return self._html_search_regex(
|
||||
r'\bdata-config-url="([^"]+)"', webpage, 'config URL', **kwargs)
|
||||
|
||||
def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
|
||||
vimeo_config = self._search_regex(
|
||||
r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));',
|
||||
@ -164,6 +187,7 @@ def _parse_config(self, config, video_id):
|
||||
sep_pattern = r'/sep/video/'
|
||||
for files_type in ('hls', 'dash'):
|
||||
for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
|
||||
# TODO: Also extract 'avc_url'? Investigate if there are 'hevc_url', 'av1_url'?
|
||||
manifest_url = cdn_data.get('url')
|
||||
if not manifest_url:
|
||||
continue
|
||||
@ -244,7 +268,10 @@ def _parse_config(self, config, video_id):
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
'live_status': live_status,
|
||||
'release_timestamp': traverse_obj(live_event, ('ingest', 'scheduled_start_time', {parse_iso8601})),
|
||||
'release_timestamp': traverse_obj(live_event, ('ingest', (
|
||||
('scheduled_start_time', {parse_iso8601}),
|
||||
('start_time', {int_or_none}),
|
||||
), any)),
|
||||
# Note: Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
|
||||
# at the same time without actual units specified.
|
||||
'_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'),
|
||||
@ -353,7 +380,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
||||
(?:
|
||||
(?P<u>user)|
|
||||
(?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
|
||||
(?:.*?/)??
|
||||
(?:(?!event/).*?/)??
|
||||
(?P<q>
|
||||
(?:
|
||||
play_redirect_hls|
|
||||
@ -933,8 +960,7 @@ def _try_album_password(self, url):
|
||||
r'vimeo\.com/(?:album|showcase)/([^/]+)', url, 'album id', default=None)
|
||||
if not album_id:
|
||||
return
|
||||
viewer = self._download_json(
|
||||
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
|
||||
viewer = self._fetch_viewer_info(album_id, fatal=False)
|
||||
if not viewer:
|
||||
webpage = self._download_webpage(url, album_id)
|
||||
viewer = self._parse_json(self._search_regex(
|
||||
@ -992,9 +1018,7 @@ def _real_extract(self, url):
|
||||
raise
|
||||
errmsg = error.cause.response.read()
|
||||
if b'Because of its privacy settings, this video cannot be played here' in errmsg:
|
||||
raise ExtractorError(
|
||||
'Cannot download embed-only video without embedding URL. Please call yt-dlp '
|
||||
'with the URL of the page that embeds this video.', expected=True)
|
||||
raise ExtractorError(self._REFERER_HINT, expected=True)
|
||||
# 403 == vimeo.com TLS fingerprint or DC IP block; 429 == player.vimeo.com TLS FP block
|
||||
status = error.cause.status
|
||||
dcip_msg = 'If you are using a data center IP or VPN/proxy, your IP may be blocked'
|
||||
@ -1039,8 +1063,7 @@ def _real_extract(self, url):
|
||||
channel_id = self._search_regex(
|
||||
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
|
||||
if channel_id:
|
||||
config_url = self._html_search_regex(
|
||||
r'\bdata-config-url="([^"]+)"', webpage, 'config URL', default=None)
|
||||
config_url = self._extract_config_url(webpage, default=None)
|
||||
video_description = clean_html(get_element_by_class('description', webpage))
|
||||
info_dict.update({
|
||||
'channel_id': channel_id,
|
||||
@ -1333,8 +1356,7 @@ def _fetch_page(self, album_id, authorization, hashed_pass, page):
|
||||
|
||||
def _real_extract(self, url):
|
||||
album_id = self._match_id(url)
|
||||
viewer = self._download_json(
|
||||
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
|
||||
viewer = self._fetch_viewer_info(album_id, fatal=False)
|
||||
if not viewer:
|
||||
webpage = self._download_webpage(url, album_id)
|
||||
viewer = self._parse_json(self._search_regex(
|
||||
@ -1626,3 +1648,377 @@ def _real_extract(self, url):
|
||||
|
||||
return self.url_result(vimeo_url, VimeoIE, video_id, url_transparent=True,
|
||||
description=description)
|
||||
|
||||
|
||||
class VimeoEventIE(VimeoBaseInfoExtractor):
|
||||
IE_NAME = 'vimeo:event'
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://(?:www\.)?vimeo\.com/event/(?P<id>\d+)(?:/
|
||||
(?:
|
||||
(?:embed/)?(?P<unlisted_hash>[\da-f]{10})|
|
||||
videos/(?P<video_id>\d+)
|
||||
)
|
||||
)?'''
|
||||
_EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=["\'](?P<url>https?://vimeo\.com/event/\d+/embed(?:[/?][^"\']*)?)["\'][^>]*>']
|
||||
_TESTS = [{
|
||||
# stream_privacy.view: 'anybody'
|
||||
'url': 'https://vimeo.com/event/5116195',
|
||||
'info_dict': {
|
||||
'id': '1082194134',
|
||||
'ext': 'mp4',
|
||||
'display_id': '5116195',
|
||||
'title': 'Skidmore College Commencement 2025',
|
||||
'description': 'md5:1902dd5165d21f98aa198297cc729d23',
|
||||
'uploader': 'Skidmore College',
|
||||
'uploader_id': 'user116066434',
|
||||
'uploader_url': 'https://vimeo.com/user116066434',
|
||||
'comment_count': int,
|
||||
'like_count': int,
|
||||
'duration': 9810,
|
||||
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
|
||||
'timestamp': 1747502974,
|
||||
'upload_date': '20250517',
|
||||
'release_timestamp': 1747502998,
|
||||
'release_date': '20250517',
|
||||
'live_status': 'was_live',
|
||||
},
|
||||
'params': {'skip_download': 'm3u8'},
|
||||
'expected_warnings': ['Failed to parse XML: not well-formed'],
|
||||
}, {
|
||||
# stream_privacy.view: 'embed_only'
|
||||
'url': 'https://vimeo.com/event/5034253/embed',
|
||||
'info_dict': {
|
||||
'id': '1071439154',
|
||||
'ext': 'mp4',
|
||||
'display_id': '5034253',
|
||||
'title': 'Advancing Humans with AI',
|
||||
'description': r're:AI is here to stay, but how do we ensure that people flourish in a world of pervasive AI use.{322}$',
|
||||
'uploader': 'MIT Media Lab',
|
||||
'uploader_id': 'mitmedialab',
|
||||
'uploader_url': 'https://vimeo.com/mitmedialab',
|
||||
'duration': 23235,
|
||||
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
|
||||
'chapters': 'count:37',
|
||||
'release_timestamp': 1744290000,
|
||||
'release_date': '20250410',
|
||||
'live_status': 'was_live',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': 'm3u8',
|
||||
'http_headers': {'Referer': 'https://www.media.mit.edu/events/aha-symposium/'},
|
||||
},
|
||||
'expected_warnings': ['Failed to parse XML: not well-formed'],
|
||||
}, {
|
||||
# Last entry on 2nd page of the 37 video playlist, but use clip_to_play_id API param shortcut
|
||||
'url': 'https://vimeo.com/event/4753126/videos/1046153257',
|
||||
'info_dict': {
|
||||
'id': '1046153257',
|
||||
'ext': 'mp4',
|
||||
'display_id': '4753126',
|
||||
'title': 'January 12, 2025 The True Vine (Pastor John Mindrup)',
|
||||
'description': 'The True Vine (Pastor \tJohn Mindrup)',
|
||||
'uploader': 'Salem United Church of Christ',
|
||||
'uploader_id': 'user230181094',
|
||||
'uploader_url': 'https://vimeo.com/user230181094',
|
||||
'comment_count': int,
|
||||
'like_count': int,
|
||||
'duration': 4962,
|
||||
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
|
||||
'timestamp': 1736702464,
|
||||
'upload_date': '20250112',
|
||||
'release_timestamp': 1736702543,
|
||||
'release_date': '20250112',
|
||||
'live_status': 'was_live',
|
||||
},
|
||||
'params': {'skip_download': 'm3u8'},
|
||||
'expected_warnings': ['Failed to parse XML: not well-formed'],
|
||||
}, {
|
||||
# "24/7" livestream
|
||||
'url': 'https://vimeo.com/event/4768062',
|
||||
'info_dict': {
|
||||
'id': '1079901414',
|
||||
'ext': 'mp4',
|
||||
'display_id': '4768062',
|
||||
'title': r're:GRACELAND CAM \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
|
||||
'description': '24/7 camera at Graceland Mansion',
|
||||
'uploader': 'Elvis Presley\'s Graceland',
|
||||
'uploader_id': 'visitgraceland',
|
||||
'uploader_url': 'https://vimeo.com/visitgraceland',
|
||||
'release_timestamp': 1745975450,
|
||||
'release_date': '20250430',
|
||||
'live_status': 'is_live',
|
||||
},
|
||||
'params': {'skip_download': 'livestream'},
|
||||
}, {
|
||||
# stream_privacy.view: 'unlisted' with unlisted_hash in URL path (stream_privacy.embed: 'whitelist')
|
||||
'url': 'https://vimeo.com/event/4259978/3db517c479',
|
||||
'info_dict': {
|
||||
'id': '939104114',
|
||||
'ext': 'mp4',
|
||||
            'display_id': '4259978',
            'title': 'Enhancing Credibility in Your Community Science Project',
            'description': 'md5:eab953341168b9c146bc3cfe3f716070',
            'uploader': 'NOAA Research',
            'uploader_id': 'noaaresearch',
            'uploader_url': 'https://vimeo.com/noaaresearch',
            'comment_count': int,
            'like_count': int,
            'duration': 3961,
            'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
            'timestamp': 1716408008,
            'upload_date': '20240522',
            'release_timestamp': 1716408062,
            'release_date': '20240522',
            'live_status': 'was_live',
        },
        'params': {'skip_download': 'm3u8'},
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }, {
        # "done" event with video_id in URL and unlisted_hash in VimeoIE URL
        'url': 'https://vimeo.com/event/595460/videos/498149131/',
        'info_dict': {
            'id': '498149131',
            'ext': 'mp4',
            'display_id': '595460',
            'title': '2021 Eighth Annual John Cardinal Foley Lecture on Social Communications',
            'description': 'Replay: https://vimeo.com/catholicphilly/review/498149131/544f26a12f',
            'uploader': 'Kearns Media Consulting LLC',
            'uploader_id': 'kearnsmediaconsulting',
            'uploader_url': 'https://vimeo.com/kearnsmediaconsulting',
            'comment_count': int,
            'like_count': int,
            'duration': 4466,
            'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
            'timestamp': 1612228466,
            'upload_date': '20210202',
            'release_timestamp': 1612228538,
            'release_date': '20210202',
            'live_status': 'was_live',
        },
        'params': {'skip_download': 'm3u8'},
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }, {
        # stream_privacy.view: 'password'; stream_privacy.embed: 'public'
        'url': 'https://vimeo.com/event/4940578',
        'info_dict': {
            'id': '1059263570',
            'ext': 'mp4',
            'display_id': '4940578',
            'title': 'TMAC AKC AGILITY 2-22-2025',
            'uploader': 'Paws \'N Effect',
            'uploader_id': 'pawsneffect',
            'uploader_url': 'https://vimeo.com/pawsneffect',
            'comment_count': int,
            'like_count': int,
            'duration': 33115,
            'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
            'timestamp': 1740261836,
            'upload_date': '20250222',
            'release_timestamp': 1740261873,
            'release_date': '20250222',
            'live_status': 'was_live',
        },
        'params': {
            'videopassword': '22',
            'skip_download': 'm3u8',
        },
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }, {
        # API serves a playlist of 37 videos, but the site only streams the newest one (changes every Sunday)
        'url': 'https://vimeo.com/event/4753126',
        'only_matching': True,
    }, {
        # Scheduled for 2025.05.15 but never started; "unavailable"; stream_privacy.view: "anybody"
        'url': 'https://vimeo.com/event/5120811/embed',
        'only_matching': True,
    }, {
        'url': 'https://vimeo.com/event/5112969/embed?muted=1',
        'only_matching': True,
    }, {
        'url': 'https://vimeo.com/event/5097437/embed/interaction?muted=1',
        'only_matching': True,
    }, {
        'url': 'https://vimeo.com/event/5113032/embed?autoplay=1&muted=1',
        'only_matching': True,
    }, {
        # Ended livestream with video_id
        'url': 'https://vimeo.com/event/595460/videos/507329569/',
        'only_matching': True,
    }, {
        # stream_privacy.view: 'unlisted' with unlisted_hash in URL path (stream_privacy.embed: 'public')
        'url': 'https://vimeo.com/event/4606123/embed/358d60ce2e',
        'only_matching': True,
    }]

    _WEBPAGE_TESTS = [{
        # Same result as https://vimeo.com/event/5034253/embed
        'url': 'https://www.media.mit.edu/events/aha-symposium/',
        'info_dict': {
            'id': '1071439154',
            'ext': 'mp4',
            'display_id': '5034253',
            'title': 'Advancing Humans with AI',
            'description': r're:AI is here to stay, but how do we ensure that people flourish in a world of pervasive AI use.{322}$',
            'uploader': 'MIT Media Lab',
            'uploader_id': 'mitmedialab',
            'uploader_url': 'https://vimeo.com/mitmedialab',
            'duration': 23235,
            'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
            'chapters': 'count:37',
            'release_timestamp': 1744290000,
            'release_date': '20250410',
            'live_status': 'was_live',
        },
        'params': {'skip_download': 'm3u8'},
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }]

    _EVENT_FIELDS = (
        'title', 'uri', 'schedule', 'stream_description', 'stream_privacy.embed', 'stream_privacy.view',
        'clip_to_play.name', 'clip_to_play.uri', 'clip_to_play.config_url', 'clip_to_play.live.status',
        'clip_to_play.privacy.embed', 'clip_to_play.privacy.view', 'clip_to_play.password',
        'streamable_clip.name', 'streamable_clip.uri', 'streamable_clip.config_url', 'streamable_clip.live.status',
    )
    _VIDEOS_FIELDS = ('items', 'uri', 'name', 'config_url', 'duration', 'live.status')

    def _call_events_api(
            self, event_id, ep=None, unlisted_hash=None, note=None,
            fields=(), referrer=None, query=None, headers=None,
    ):
        resource = join_nonempty('event', ep, note, 'API JSON', delim=' ')

        return self._download_json(
            join_nonempty(
                'https://api.vimeo.com/live_events',
                join_nonempty(event_id, unlisted_hash, delim=':'), ep, delim='/'),
            event_id, f'Downloading {resource}', f'Failed to download {resource}',
            query=filter_dict({
                'fields': ','.join(fields) or [],
                # Correct spelling with 4 R's is deliberate
                'referrer': referrer,
                **(query or {}),
            }), headers=filter_dict({
                'Accept': 'application/json',
                'Authorization': f'jwt {self._fetch_viewer_info(event_id)["jwt"]}',
                'Referer': referrer,
                **(headers or {}),
            }))

    @staticmethod
    def _extract_video_id_and_unlisted_hash(video):
        if not traverse_obj(video, ('uri', {lambda x: x.startswith('/videos/')})):
            return None, None
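        # A clip 'uri' looks like '/videos/<video_id>' or '/videos/<video_id>:<unlisted_hash>'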
        video_id, _, unlisted_hash = video['uri'][8:].partition(':')
        return video_id, unlisted_hash or None

    def _vimeo_url_result(self, video_id, unlisted_hash=None, event_id=None):
        # VimeoIE can extract more metadata and formats for was_live event videos
        return self.url_result(
            join_nonempty('https://vimeo.com', video_id, unlisted_hash, delim='/'), VimeoIE,
            video_id, display_id=event_id, live_status='was_live', url_transparent=True)

    @classmethod
    def _extract_embed_urls(cls, url, webpage):
        for embed_url in super()._extract_embed_urls(url, webpage):
            yield cls._smuggle_referrer(embed_url, url)

    def _real_extract(self, url):
        url, _, headers = self._unsmuggle_headers(url)
        # XXX: Keep key name in sync with _unsmuggle_headers
        referrer = headers.get('Referer')
        event_id, unlisted_hash, video_id = self._match_valid_url(url).group('id', 'unlisted_hash', 'video_id')

        for retry in (False, True):
            try:
                live_event_data = self._call_events_api(
                    event_id, unlisted_hash=unlisted_hash, fields=self._EVENT_FIELDS,
                    referrer=referrer, query={'clip_to_play_id': video_id or '0'},
                    headers={'Accept': 'application/vnd.vimeo.*+json;version=3.4.9'})
                break
            except ExtractorError as e:
                if retry or not isinstance(e.cause, HTTPError) or e.cause.status not in (400, 403):
                    raise
                response = traverse_obj(e.cause.response.read(), ({json.loads}, {dict})) or {}
                error_code = response.get('error_code')
                if error_code == 2204:
                    self._verify_video_password(event_id, path='event')
                    continue
                if error_code == 3200:
                    raise ExtractorError(self._REFERER_HINT, expected=True)
                if error_msg := response.get('error'):
                    raise ExtractorError(f'Vimeo says: {error_msg}', expected=True)
                raise

        # stream_privacy.view can be: 'anybody', 'embed_only', 'nobody', 'password', 'unlisted'
        view_policy = live_event_data['stream_privacy']['view']
        if view_policy == 'nobody':
            raise ExtractorError('This event has not been made available to anyone', expected=True)

        clip_data = traverse_obj(live_event_data, ('clip_to_play', {dict})) or {}
        # live.status can be: 'streaming' (is_live), 'done' (was_live), 'unavailable' (is_upcoming OR dead)
        clip_status = traverse_obj(clip_data, ('live', 'status', {str}))
        start_time = traverse_obj(live_event_data, ('schedule', 'start_time', {str}))
        release_timestamp = parse_iso8601(start_time)

        if clip_status == 'unavailable' and release_timestamp and release_timestamp > time.time():
            self.raise_no_formats(f'This live event is scheduled for {start_time}', expected=True)
            live_status = 'is_upcoming'
            config_url = None

        elif view_policy == 'embed_only':
            webpage = self._download_webpage(
                join_nonempty('https://vimeo.com/event', event_id, 'embed', unlisted_hash, delim='/'),
                event_id, 'Downloading embed iframe webpage', impersonate=True, headers=headers)
            # The _parse_config result will overwrite live_status w/ 'is_live' if livestream is active
            live_status = 'was_live'
            config_url = self._extract_config_url(webpage)

        else:  # view_policy in ('anybody', 'password', 'unlisted')
            if video_id:
                clip_id, clip_hash = self._extract_video_id_and_unlisted_hash(clip_data)
                if video_id == clip_id and clip_status == 'done' and (clip_hash or view_policy != 'unlisted'):
                    return self._vimeo_url_result(clip_id, clip_hash, event_id)

                video_filter = lambda _, v: self._extract_video_id_and_unlisted_hash(v)[0] == video_id
            else:
                video_filter = lambda _, v: v['live']['status'] in ('streaming', 'done')

            for page in itertools.count(1):
                videos_data = self._call_events_api(
                    event_id, 'videos', unlisted_hash=unlisted_hash, note=f'page {page}',
                    fields=self._VIDEOS_FIELDS, referrer=referrer, query={'page': page},
                    headers={'Accept': 'application/vnd.vimeo.*;version=3.4.1'})

                video = traverse_obj(videos_data, ('data', video_filter, any))
                if video or not traverse_obj(videos_data, ('paging', 'next', {str})):
                    break

            live_status = {
                'streaming': 'is_live',
                'done': 'was_live',
            }.get(traverse_obj(video, ('live', 'status', {str})))

            if not live_status:  # requested video_id is unavailable or no videos are available
                raise ExtractorError('This event video is unavailable', expected=True)
            elif live_status == 'was_live':
                return self._vimeo_url_result(*self._extract_video_id_and_unlisted_hash(video), event_id)
            config_url = video['config_url']

        if config_url:  # view_policy == 'embed_only' or live_status == 'is_live'
            info = filter_dict(self._parse_config(
                self._download_json(config_url, event_id, 'Downloading config JSON'), event_id))
        else:  # live_status == 'is_upcoming'
            info = {'id': event_id}

        if info.get('live_status') == 'post_live':
            self.report_warning('This live event recently ended and some formats may not yet be available')

        return {
            **traverse_obj(live_event_data, {
                'title': ('title', {str}),
                'description': ('stream_description', {str}),
            }),
            'display_id': event_id,
            'live_status': live_status,
            'release_timestamp': release_timestamp,
            **info,
        }

@@ -45,7 +45,7 @@ class XinpianchangIE(InfoExtractor):

    def _real_extract(self, url):
        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id=video_id)
+        webpage = self._download_webpage(url, video_id=video_id, headers={'Referer': url})
        video_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['detail']['video']

        data = self._download_json(
@@ -35,6 +35,7 @@
class _PoTokenContext(enum.Enum):
    PLAYER = 'player'
    GVS = 'gvs'
+    SUBS = 'subs'


# any clients starting with _ cannot be explicitly requested by the user
@@ -787,6 +788,7 @@ def _download_webpage_with_retries(self, *args, retry_fatal=False, retry_on_stat

    def _download_ytcfg(self, client, video_id):
        url = {
            'mweb': 'https://m.youtube.com',
            'web': 'https://www.youtube.com',
            'web_music': 'https://music.youtube.com',
+            'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1',
@@ -23,6 +23,8 @@
    _split_innertube_client,
    short_client_name,
)
+from .pot._director import initialize_pot_director
+from .pot.provider import PoTokenContext, PoTokenRequest
from ...jsinterp import JSInterpreter, PhantomJSwrapper
from ...networking.exceptions import HTTPError
from ...utils import (
@@ -65,9 +67,13 @@
    urljoin,
    variadic,
)
+from ...utils.networking import clean_headers, clean_proxies, select_proxy

STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
+STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token'
+STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context'

+PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'

@@ -1808,6 +1814,11 @@ def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._code_cache = {}
        self._player_cache = {}
+        self._pot_director = None
+
+    def _real_initialize(self):
+        super()._real_initialize()
+        self._pot_director = initialize_pot_director(self)

    def _prepare_live_from_start_formats(self, formats, video_id, live_start_time, url, webpage_url, smuggled_data, is_live):
        lock = threading.Lock()
@@ -2854,7 +2865,8 @@ def _get_config_po_token(self, client: str, context: _PoTokenContext):
            continue

    def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None,
-                       data_sync_id=None, session_index=None, player_url=None, video_id=None, **kwargs):
+                       data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None,
+                       required=False, **kwargs):
        """
        Fetch a PO Token for a given client and context. This function will validate required parameters for a given context and client.

@@ -2868,10 +2880,15 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
        @param session_index: session index.
        @param player_url: player URL.
        @param video_id: video ID.
+        @param webpage: video webpage.
+        @param required: Whether the PO Token is required (i.e. try to fetch unless policy is "never").
        @param kwargs: Additional arguments to pass down. May be more added in the future.
        @return: The fetched PO Token. None if it could not be fetched.
        """

+        # TODO(future): This validation should be moved into pot framework.
+        # Some sort of middleware or validation provider perhaps?
+
        # GVS WebPO Token is bound to visitor_data / Visitor ID when logged out.
        # Must have visitor_data for it to function.
        if player_url and context == _PoTokenContext.GVS and not visitor_data and not self.is_authenticated:
@@ -2893,6 +2910,7 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
                    f'Got a GVS PO Token for {client} client, but missing Data Sync ID for account. Formats may not work.'
                    f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"')

+            self.write_debug(f'{video_id}: Retrieved a {context.value} PO Token for {client} client from config')
            return config_po_token

        # Require GVS WebPO Token if logged in for external fetching
@@ -2902,7 +2920,7 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
                f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"')
            return

-        return self._fetch_po_token(
+        po_token = self._fetch_po_token(
            client=client,
            context=context.value,
            ytcfg=ytcfg,
@@ -2911,11 +2929,68 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
            session_index=session_index,
            player_url=player_url,
            video_id=video_id,
            video_webpage=webpage,
            required=required,
            **kwargs,
        )

        if po_token:
            self.write_debug(f'{video_id}: Retrieved a {context.value} PO Token for {client} client')
            return po_token

    def _fetch_po_token(self, client, **kwargs):
        """(Unstable) External PO Token fetch stub"""
        context = kwargs.get('context')

        # Avoid fetching PO Tokens when not required
        fetch_pot_policy = self._configuration_arg('fetch_pot', [''], ie_key=YoutubeIE)[0]
        if fetch_pot_policy not in ('never', 'auto', 'always'):
            fetch_pot_policy = 'auto'
        if (
            fetch_pot_policy == 'never'
            or (
                fetch_pot_policy == 'auto'
                and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
                and not kwargs.get('required', False)
            )
        ):
            return None

        headers = self.get_param('http_headers').copy()
        proxies = self._downloader.proxies.copy()
        clean_headers(headers)
        clean_proxies(proxies, headers)

        innertube_host = self._select_api_hostname(None, default_client=client)

        pot_request = PoTokenRequest(
            context=PoTokenContext(context),
            innertube_context=traverse_obj(kwargs, ('ytcfg', 'INNERTUBE_CONTEXT')),
            innertube_host=innertube_host,
            internal_client_name=client,
            session_index=kwargs.get('session_index'),
            player_url=kwargs.get('player_url'),
            video_webpage=kwargs.get('video_webpage'),
            is_authenticated=self.is_authenticated,
            visitor_data=kwargs.get('visitor_data'),
            data_sync_id=kwargs.get('data_sync_id'),
            video_id=kwargs.get('video_id'),
            request_cookiejar=self._downloader.cookiejar,

            # All requests that would need to be proxied should be in the
            # context of www.youtube.com or the innertube host
            request_proxy=(
                select_proxy('https://www.youtube.com', proxies)
                or select_proxy(f'https://{innertube_host}', proxies)
            ),
            request_headers=headers,
            request_timeout=self.get_param('socket_timeout'),
            request_verify_tls=not self.get_param('nocheckcertificate'),
            request_source_address=self.get_param('source_address'),

            bypass_cache=False,
        )

        return self._pot_director.get_po_token(pot_request)

    @staticmethod
    def _is_agegated(player_response):
@@ -3064,6 +3139,8 @@ def append_client(*client_names):
                    player_url = self._download_player_url(video_id)
                    tried_iframe_fallback = True

+            pr = initial_pr if client == 'web' else None
+
            visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg)
            data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg)

@@ -3073,16 +3150,24 @@ def append_client(*client_names):
                'video_id': video_id,
                'data_sync_id': data_sync_id if self.is_authenticated else None,
                'player_url': player_url if require_js_player else None,
+                'webpage': webpage,
                'session_index': self._extract_session_index(master_ytcfg, player_ytcfg),
-                'ytcfg': player_ytcfg,
+                'ytcfg': player_ytcfg or self._get_default_ytcfg(client),
            }

-            player_po_token = self.fetch_po_token(
+            # Don't need a player PO token for WEB if using player response from webpage
+            player_po_token = None if pr else self.fetch_po_token(
                context=_PoTokenContext.PLAYER, **fetch_po_token_args)

            gvs_po_token = self.fetch_po_token(
                context=_PoTokenContext.GVS, **fetch_po_token_args)

            fetch_subs_po_token_func = functools.partial(
                self.fetch_po_token,
                context=_PoTokenContext.SUBS,
                **fetch_po_token_args,
            )

            required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']

            if (
@@ -3109,7 +3194,6 @@ def append_client(*client_names):
                    only_once=True)
                deprioritize_pr = True

-            pr = initial_pr if client == 'web' else None
            try:
                pr = pr or self._extract_player_response(
                    client, video_id,
@@ -3127,10 +3211,13 @@ def append_client(*client_names):
            if pr_id := self._invalid_player_response(pr, video_id):
                skipped_clients[client] = pr_id
            elif pr:
-                # Save client name for introspection later
-                sd = traverse_obj(pr, ('streamingData', {dict})) or {}
+                # Save client details for introspection later
+                innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT')
+                sd = pr.setdefault('streamingData', {})
                sd[STREAMING_DATA_CLIENT_NAME] = client
                sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
+                sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context
                sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func
                for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
                    f[STREAMING_DATA_CLIENT_NAME] = client
                    f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
@@ -3192,6 +3279,25 @@ def _report_pot_format_skipped(self, video_id, client_name, proto):
        else:
            self.report_warning(msg, only_once=True)

    def _report_pot_subtitles_skipped(self, video_id, client_name, msg=None):
        msg = msg or (
            f'{video_id}: Some {client_name} client subtitles require a PO Token which was not provided. '
            'They will be discarded since they are not downloadable as-is. '
            f'You can manually pass a Subtitles PO Token for this client with '
            f'--extractor-args "youtube:po_token={client_name}.subs+XXX" . '
            f'For more information, refer to {PO_TOKEN_GUIDE_URL}')

        subs_wanted = any((
            self.get_param('writesubtitles'),
            self.get_param('writeautomaticsub'),
            self.get_param('listsubtitles')))

        # Only raise a warning for non-default clients, to not confuse users.
        if not subs_wanted or client_name in (*self._DEFAULT_CLIENTS, *self._DEFAULT_AUTHED_CLIENTS):
            self.write_debug(msg, only_once=True)
        else:
            self.report_warning(msg, only_once=True)

    def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
        CHUNK_SIZE = 10 << 20
        PREFERRED_LANG_VALUE = 10
@@ -3483,6 +3589,9 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
                    hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}'
                fmts, subs = self._extract_m3u8_formats_and_subtitles(
                    hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
+                for sub in traverse_obj(subs, (..., ..., {dict})):
+                    # HLS subs (m3u8) do not need a PO token; save client name for debugging
+                    sub[STREAMING_DATA_CLIENT_NAME] = client_name
                subtitles = self._merge_subtitles(subs, subtitles)
                for f in fmts:
                    if process_manifest_format(f, 'hls', client_name, self._search_regex(
@@ -3494,6 +3603,9 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
                if po_token:
                    dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}'
                formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
+                for sub in traverse_obj(subs, (..., ..., {dict})):
+                    # TODO: Investigate if DASH subs ever need a PO token; save client name for debugging
+                    sub[STREAMING_DATA_CLIENT_NAME] = client_name
                subtitles = self._merge_subtitles(subs, subtitles)  # Prioritize HLS subs over DASH
                for f in formats:
                    if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token):
@@ -3685,7 +3797,7 @@ def feed_entry(name):
            reason = self._get_text(pemr, 'reason') or get_first(playability_statuses, 'reason')
            subreason = clean_html(self._get_text(pemr, 'subreason') or '')
            if subreason:
-                if subreason == 'The uploader has not made this video available in your country.':
+                if subreason.startswith('The uploader has not made this video available in your country'):
                    countries = get_first(microformats, 'availableCountries')
                    if not countries:
                        regions_allowed = search_meta('regionsAllowed')
@@ -3820,47 +3932,85 @@ def is_bad_format(fmt):
                'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'),
        }

+        def get_lang_code(track):
+            return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
+                    or track.get('languageCode'))
+
+        def process_language(container, base_url, lang_code, sub_name, client_name, query):
+            lang_subs = container.setdefault(lang_code, [])
+            for fmt in self._SUBTITLE_FORMATS:
+                query = {**query, 'fmt': fmt}
+                lang_subs.append({
+                    'ext': fmt,
+                    'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
+                    'name': sub_name,
+                    STREAMING_DATA_CLIENT_NAME: client_name,
+                })
+
        subtitles = {}
-        pctr = traverse_obj(player_responses, (..., 'captions', 'playerCaptionsTracklistRenderer'), expected_type=dict)
-        if pctr:
-            def get_lang_code(track):
-                return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
-                        or track.get('languageCode'))
+        skipped_subs_clients = set()

-            # Converted into dicts to remove duplicates
-            captions = {
-                get_lang_code(sub): sub
-                for sub in traverse_obj(pctr, (..., 'captionTracks', ...))}
-            translation_languages = {
-                lang.get('languageCode'): self._get_text(lang.get('languageName'), max_runs=1)
-                for lang in traverse_obj(pctr, (..., 'translationLanguages', ...))}
+        # Only web/mweb clients provide translationLanguages, so include initial_pr in the traversal
+        translation_languages = {
+            lang['languageCode']: self._get_text(lang['languageName'], max_runs=1)
+            for lang in traverse_obj(player_responses, (
+                ..., 'captions', 'playerCaptionsTracklistRenderer', 'translationLanguages',
+                lambda _, v: v['languageCode'] and v['languageName']))
+        }
+        # NB: Constructing the full subtitle dictionary is slow
+        get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and (
+            self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles'))

-            def process_language(container, base_url, lang_code, sub_name, query):
-                lang_subs = container.setdefault(lang_code, [])
-                for fmt in self._SUBTITLE_FORMATS:
-                    query.update({
-                        'fmt': fmt,
-                    })
-                    lang_subs.append({
-                        'ext': fmt,
-                        'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
-                        'name': sub_name,
-                    })
+        # Filter out initial_pr which does not have streamingData (smuggled client context)
+        prs = traverse_obj(player_responses, (
+            lambda _, v: v['streamingData'] and v['captions']['playerCaptionsTracklistRenderer']))
+        all_captions = traverse_obj(prs, (
+            ..., 'captions', 'playerCaptionsTracklistRenderer', 'captionTracks', ..., {dict}))
+        need_subs_langs = {get_lang_code(sub) for sub in all_captions if sub.get('kind') != 'asr'}
+        need_caps_langs = {
+            remove_start(get_lang_code(sub), 'a-')
+            for sub in all_captions if sub.get('kind') == 'asr'}

-            # NB: Constructing the full subtitle dictionary is slow
-            get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and (
-                self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles'))
-            for lang_code, caption_track in captions.items():
-                base_url = caption_track.get('baseUrl')
-                orig_lang = parse_qs(base_url).get('lang', [None])[-1]
-                if not base_url:
-                    continue
+        for pr in prs:
+            pctr = pr['captions']['playerCaptionsTracklistRenderer']
+            client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME]
+            innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName']
+            required_contexts = self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
+            fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN]
+
+            pot_params = {}
+            already_fetched_pot = False
+
+            for caption_track in traverse_obj(pctr, ('captionTracks', lambda _, v: v['baseUrl'])):
+                base_url = caption_track['baseUrl']
+                qs = parse_qs(base_url)
+                lang_code = get_lang_code(caption_track)
+                requires_pot = (
+                    # We can detect the experiment for now
+                    any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv'))
+                    or _PoTokenContext.SUBS in required_contexts)
+
+                if not already_fetched_pot:
+                    already_fetched_pot = True
+                    if subs_po_token := fetch_subs_po_token_func(required=requires_pot):
+                        pot_params.update({
+                            'pot': subs_po_token,
+                            'potc': '1',
+                            'c': innertube_client_name,
+                        })
+
+                if not pot_params and requires_pot:
+                    skipped_subs_clients.add(client_name)
+                    self._report_pot_subtitles_skipped(video_id, client_name)
+                    break
+
+                orig_lang = qs.get('lang', [None])[-1]
                lang_name = self._get_text(caption_track, 'name', max_runs=1)
                if caption_track.get('kind') != 'asr':
                    if not lang_code:
                        continue
                    process_language(
-                        subtitles, base_url, lang_code, lang_name, {})
+                        subtitles, base_url, lang_code, lang_name, client_name, pot_params)
                    if not caption_track.get('isTranslatable'):
                        continue
                for trans_code, trans_name in translation_languages.items():
@@ -3880,10 +4030,25 @@ def process_language(container, base_url, lang_code, sub_name, query):
                    # Add an "-orig" label to the original language so that it can be distinguished.
                    # The subs are returned without "-orig" as well for compatibility
                    process_language(
-                        automatic_captions, base_url, f'{trans_code}-orig', f'{trans_name} (Original)', {})
+                        automatic_captions, base_url, f'{trans_code}-orig',
+                        f'{trans_name} (Original)', client_name, pot_params)
                    # Setting tlang=lang returns damaged subtitles.
-                    process_language(automatic_captions, base_url, trans_code, trans_name,
-                                     {} if orig_lang == orig_trans_code else {'tlang': trans_code})
+                    process_language(
+                        automatic_captions, base_url, trans_code, trans_name, client_name,
+                        pot_params if orig_lang == orig_trans_code else {'tlang': trans_code, **pot_params})
+
+            # Avoid duplication if we've already got everything we need
+            need_subs_langs.difference_update(subtitles)
+            need_caps_langs.difference_update(automatic_captions)
+            if not (need_subs_langs or need_caps_langs):
+                break
+
+        if skipped_subs_clients and (need_subs_langs or need_caps_langs):
+            self._report_pot_subtitles_skipped(video_id, True, msg=join_nonempty(
+                f'{video_id}: There are missing subtitles languages because a PO token was not provided.',
+                need_subs_langs and f'Subtitles for these languages are missing: {", ".join(need_subs_langs)}.',
+                need_caps_langs and f'Automatic captions for {len(need_caps_langs)} languages are missing.',
+                delim=' '))

        info['automatic_captions'] = automatic_captions
        info['subtitles'] = subtitles

yt_dlp/extractor/youtube/pot/README.md (new file, 309 lines)
@@ -0,0 +1,309 @@
# YoutubeIE PO Token Provider Framework

As part of the YouTube extractor, we have a framework for providing PO Tokens programmatically. This can be used by plugins.

Refer to the [PO Token Guide](https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide) for more information on PO Tokens.

> [!TIP]
> If publishing a PO Token Provider plugin to GitHub, add the [yt-dlp-pot-provider](https://github.com/topics/yt-dlp-pot-provider) topic to your repository to help users find it.


## Public APIs

- `yt_dlp.extractor.youtube.pot.cache`
- `yt_dlp.extractor.youtube.pot.provider`
- `yt_dlp.extractor.youtube.pot.utils`

Everything else is internal-only and no guarantees are made about API stability.

> [!WARNING]
> We will try our best to maintain stability with the public APIs.
> However, due to the nature of extractors and YouTube, we may need to remove or change APIs in the future.
> If you are using these APIs outside yt-dlp plugins, please account for this by importing them safely.
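
For example, a plugin can guard its imports so it fails gracefully on a yt-dlp version where these modules have moved or changed. This is a minimal sketch; how the plugin degrades when the framework is missing is entirely up to you:

```python
try:
    from yt_dlp.extractor.youtube.pot.provider import PoTokenProvider, register_provider
except ImportError:
    # Framework not available in this yt-dlp version; skip registration
    PoTokenProvider = register_provider = None

if register_provider is not None:
    # ... define and register the provider here ...
    pass
```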

## PO Token Provider

`yt_dlp.extractor.youtube.pot.provider`

```python
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenRequest,
    PoTokenContext,
    PoTokenProvider,
    PoTokenResponse,
    PoTokenProviderError,
    PoTokenProviderRejectedRequest,
    register_provider,
    register_preference,
    ExternalRequestFeature,
)
from yt_dlp.networking.common import Request
from yt_dlp.extractor.youtube.pot.utils import get_webpo_content_binding
from yt_dlp.utils import traverse_obj
from yt_dlp.networking.exceptions import RequestError
import json


@register_provider
class MyPoTokenProviderPTP(PoTokenProvider):  # Provider class name must end with "PTP"
    PROVIDER_VERSION = '0.2.1'
    # Define a unique display name for the provider
    PROVIDER_NAME = 'my-provider'
    BUG_REPORT_LOCATION = 'https://issues.example.com/report'

    # -- Validation shortcuts. Set these to None to disable. --

    # Innertube Client Name.
    # For example, "WEB", "ANDROID", "TVHTML5".
    # For a list of WebPO client names,
    # see yt_dlp.extractor.youtube.pot.utils.WEBPO_CLIENTS.
    # Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS
    # for a list of client names currently supported by the YouTube extractor.
    _SUPPORTED_CLIENTS = ('WEB', 'TVHTML5')

    _SUPPORTED_CONTEXTS = (
        PoTokenContext.GVS,
    )

    # If your provider makes external requests to websites (i.e. to youtube.com)
    # using another library or service (i.e., not _request_webpage),
    # set the request features that are supported here.
    # If only using _request_webpage to make external requests, set this to None.
    _SUPPORTED_EXTERNAL_REQUEST_FEATURES = (
        ExternalRequestFeature.PROXY_SCHEME_HTTP,
        ExternalRequestFeature.SOURCE_ADDRESS,
        ExternalRequestFeature.DISABLE_TLS_VERIFICATION
    )

    def is_available(self) -> bool:
        """
        Check if the provider is available (e.g. all required dependencies are available)
        This is used to determine if the provider should be used and to provide debug information.

        IMPORTANT: This method SHOULD NOT make any network requests or perform any expensive operations.

        Since this is called multiple times, we recommend caching the result.
        """
        return True

    def close(self):
        # Optional close hook, called when YoutubeDL is closed.
        pass

    def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
        # ℹ️ If you need to validate the request before making the request to the external source.
        # Raise yt_dlp.extractor.youtube.pot.provider.PoTokenProviderRejectedRequest if the request is not supported.
        if request.is_authenticated:
            raise PoTokenProviderRejectedRequest(
                'This provider does not support authenticated requests'
            )

        # ℹ️ Settings are pulled from extractor args passed to yt-dlp with the key `youtubepot-<PROVIDER_KEY>`.
        # For this example, the extractor arg would be:
        # `--extractor-args "youtubepot-mypotokenprovider:url=https://custom.example.com/get_pot"`
        external_provider_url = self._configuration_arg(
            'url', default=['https://provider.example.com/get_pot'])[0]

        # See below for logging guidelines
        self.logger.trace(f'Using external provider URL: {external_provider_url}')

        # You should use the internal HTTP client to make requests where possible,
        # as it will handle cookies and other networking settings passed to yt-dlp.
        try:
            # See docstring in _request_webpage method for request tips
            response = self._request_webpage(
                Request(external_provider_url, data=json.dumps({
                    'content_binding': get_webpo_content_binding(request),
                    'proxy': request.request_proxy,
                    'headers': request.request_headers,
                    'source_address': request.request_source_address,
                    'verify_tls': request.request_verify_tls,
                    # Important: If your provider has its own caching, please respect `bypass_cache`.
                    # This may be used in the future to request a fresh PO Token if required.
                    'do_not_cache': request.bypass_cache,
                }).encode(), proxies={'all': None}),
                pot_request=request,
                note=(
                    f'Requesting {request.context.value} PO Token '
                    f'for {request.internal_client_name} client from external provider'),
            )

        except RequestError as e:
            # ℹ️ If there is an error, raise PoTokenProviderError.
            # You can specify whether it is expected or not. If it is unexpected,
            # the log will include a link to the bug report location (BUG_REPORT_LOCATION).
            raise PoTokenProviderError(
                'Networking error while fetching PO Token from external provider',
                expected=True
            ) from e

        # Note: PO Token is expected to be base64url encoded
        po_token = traverse_obj(response, 'po_token')
        if not po_token:
            raise PoTokenProviderError(
                'Bad PO Token Response from external provider',
                expected=False
            )

        return PoTokenResponse(
            po_token=po_token,
            # Optional, add a custom expiration timestamp for the token. Use for caching.
            # By default, yt-dlp will use the default ttl from a registered cache spec (see below)
            # Set to 0 or -1 to not cache this response.
            expires_at=None,
        )


# If there are multiple PO Token Providers that can handle the same PoTokenRequest,
# you can define a preference function to increase/decrease the priority of providers.

@register_preference(MyPoTokenProviderPTP)
def my_provider_preference(provider: PoTokenProvider, request: PoTokenRequest) -> int:
    return 50
```

## Logging Guidelines

- Use the `self.logger` object to log messages (a short sketch after this list shows the levels in use).
- When making HTTP requests or any other expensive operation, use `self.logger.info` to log a message to standard non-verbose output.
  - This lets users know what is happening when a time-expensive operation is taking place.
  - It is recommended to include the PO Token context and internal client name in the message if possible.
  - For example, `self.logger.info(f'Requesting {request.context.value} PO Token for {request.internal_client_name} client from external provider')`.
- Use `self.logger.debug` to log a message to the verbose output (`--verbose`).
  - For debugging information visible to users posting verbose logs.
  - Try not to log too much; prefer trace logging for detailed debug messages.
- Use `self.logger.trace` to log a message to the PO Token debug output (`--extractor-args "youtube:pot_trace=true"`).
  - Log as much as you like here as needed for debugging your provider.
- Avoid logging PO Tokens or any sensitive information to debug or info output.
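
A rough sketch of how the three levels might sit together in a provider's `_real_request_pot` (the messages and the `_call_backend` helper are illustrative, not part of the framework):

```python
def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
    # trace: very detailed output, only visible with youtube:pot_trace=true
    self.logger.trace(f'Request proxy: {request.request_proxy}')
    # info: always visible; announce the expensive external call
    self.logger.info(
        f'Requesting {request.context.value} PO Token '
        f'for {request.internal_client_name} client from external provider')
    response = self._call_backend(request)  # hypothetical helper doing the real work
    # debug: only in --verbose logs; never log the PO Token itself
    self.logger.debug('External provider responded without error')
    return PoTokenResponse(po_token=response['po_token'])
```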

## Debugging

- Use `-v --extractor-args "youtube:pot_trace=true"` to enable PO Token debug output.

## Caching

> [!WARNING]
> The following describes more advanced features that most users/developers will not need to use.

> [!IMPORTANT]
> yt-dlp currently has a built-in LRU Memory Cache Provider and a cache spec provider for WebPO Tokens.
> You should only need to implement cache providers if you want an external cache, or a cache spec if you are handling non-WebPO Tokens.

### Cache Providers

`yt_dlp.extractor.youtube.pot.cache`

```python
from yt_dlp.extractor.youtube.pot.cache import (
    PoTokenCacheProvider,
    register_preference,
    register_provider
)

from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest


@register_provider
class MyCacheProviderPCP(PoTokenCacheProvider):  # Provider class name must end with "PCP"
    PROVIDER_VERSION = '0.1.0'
    # Define a unique display name for the provider
    PROVIDER_NAME = 'my-cache-provider'
    BUG_REPORT_LOCATION = 'https://issues.example.com/report'

    def is_available(self) -> bool:
        """
        Check if the provider is available (e.g. all required dependencies are available)
        This is used to determine if the provider should be used and to provide debug information.

        IMPORTANT: This method SHOULD NOT make any network requests or perform any expensive operations.

        Since this is called multiple times, we recommend caching the result.
        """
        return True

    def get(self, key: str):
        # ℹ️ Similar to PO Token Providers, Cache Providers and Cache Spec Providers
        # are passed down extractor args matching key youtubepot-<PROVIDER_KEY>.
        some_setting = self._configuration_arg('some_setting', default=['default_value'])[0]
        return self.my_cache.get(key)

    def store(self, key: str, value: str, expires_at: int):
        # ⚠ expires_at MUST be respected.
        # Cache entries should not be returned if they have expired.
        self.my_cache.store(key, value, expires_at)

    def delete(self, key: str):
        self.my_cache.delete(key)

    def close(self):
        # Optional close hook, called when the YoutubeDL instance is closed.
        pass

# If there are multiple PO Token Cache Providers available, you can
# define a preference function to increase/decrease the priority of providers.

# IMPORTANT: Providers should be ranked by cache lookup speed.
# For example, a memory cache should have a higher preference than a disk cache.

# VERY IMPORTANT: yt-dlp has a built-in memory cache with a priority of 10000.
# Your cache provider should be lower than this.


@register_preference(MyCacheProviderPCP)
def my_cache_preference(provider: PoTokenCacheProvider, request: PoTokenRequest) -> int:
    return 50
```

### Cache Specs

`yt_dlp.extractor.youtube.pot.cache`

These are used to provide information on how to cache a particular PO Token Request.
You might have a different cache spec for different kinds of PO Tokens.

```python
from yt_dlp.extractor.youtube.pot.cache import (
    PoTokenCacheSpec,
    PoTokenCacheSpecProvider,
    CacheProviderWritePolicy,
    register_spec,
)
from yt_dlp.utils import traverse_obj
from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest


@register_spec
class MyCacheSpecProviderPCSP(PoTokenCacheSpecProvider):  # Provider class name must end with "PCSP"
    PROVIDER_VERSION = '0.1.0'
    # Define a unique display name for the provider
    PROVIDER_NAME = 'mycachespec'
    BUG_REPORT_LOCATION = 'https://issues.example.com/report'

    def generate_cache_spec(self, request: PoTokenRequest):

        client_name = traverse_obj(request.innertube_context, ('client', 'clientName'))
        if client_name != 'ANDROID':
            # ℹ️ If the request is not supported by the cache spec, return None
            return None

        # Generate a cache spec for the request
        return PoTokenCacheSpec(
            # Key bindings to uniquely identify the request. These are used to generate a cache key.
            key_bindings={
                'client_name': client_name,
                'content_binding': 'unique_content_binding',
                'ip': traverse_obj(request.innertube_context, ('client', 'remoteHost')),
                'source_address': request.request_source_address,
                'proxy': request.request_proxy,
            },
            # Default Cache TTL in seconds
            default_ttl=21600,

            # Optional: Specify a write policy.
            # WRITE_FIRST will write to the highest priority provider only,
            # whereas WRITE_ALL will write to all providers.
            # WRITE_FIRST may be useful if the PO Token is short-lived
            # and there is no use writing to all providers.
            write_policy=CacheProviderWritePolicy.WRITE_ALL,
        )
```

yt_dlp/extractor/youtube/pot/__init__.py (new file, 3 lines)
@@ -0,0 +1,3 @@
# Trigger import of built-in providers
from ._builtin.memory_cache import MemoryLRUPCP as _MemoryLRUPCP  # noqa: F401
from ._builtin.webpo_cachespec import WebPoPCSP as _WebPoPCSP  # noqa: F401

yt_dlp/extractor/youtube/pot/_builtin/__init__.py (new file, 0 lines)

yt_dlp/extractor/youtube/pot/_builtin/memory_cache.py (new file, 78 lines)
@@ -0,0 +1,78 @@
from __future__ import annotations

import datetime as dt
import typing
from threading import Lock

from yt_dlp.extractor.youtube.pot._provider import BuiltinIEContentProvider
from yt_dlp.extractor.youtube.pot._registry import _pot_memory_cache
from yt_dlp.extractor.youtube.pot.cache import (
    PoTokenCacheProvider,
    register_preference,
    register_provider,
)


def initialize_global_cache(max_size: int):
    if _pot_memory_cache.value.get('cache') is None:
        _pot_memory_cache.value['cache'] = {}
        _pot_memory_cache.value['lock'] = Lock()
        _pot_memory_cache.value['max_size'] = max_size

    if _pot_memory_cache.value['max_size'] != max_size:
        raise ValueError('Cannot change max_size of initialized global memory cache')

    return (
        _pot_memory_cache.value['cache'],
        _pot_memory_cache.value['lock'],
        _pot_memory_cache.value['max_size'],
    )


@register_provider
class MemoryLRUPCP(PoTokenCacheProvider, BuiltinIEContentProvider):
    PROVIDER_NAME = 'memory'
    DEFAULT_CACHE_SIZE = 25

    def __init__(
        self,
        *args,
        initialize_cache: typing.Callable[[int], tuple[dict[str, tuple[str, int]], Lock, int]] = initialize_global_cache,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.cache, self.lock, self.max_size = initialize_cache(self.DEFAULT_CACHE_SIZE)

    def is_available(self) -> bool:
        return True

    def get(self, key: str) -> str | None:
        with self.lock:
            if key not in self.cache:
                return None
            value, expires_at = self.cache.pop(key)
            if expires_at < int(dt.datetime.now(dt.timezone.utc).timestamp()):
                return None
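            # Re-insert the entry so dict insertion order tracks recency; this is what makes the cache LRU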
            self.cache[key] = (value, expires_at)
            return value

    def store(self, key: str, value: str, expires_at: int):
        with self.lock:
            if expires_at < int(dt.datetime.now(dt.timezone.utc).timestamp()):
                return
            if key in self.cache:
                self.cache.pop(key)
            self.cache[key] = (value, expires_at)
            if len(self.cache) > self.max_size:
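                # dicts preserve insertion order, so the first key is the least recently used entry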
                oldest_key = next(iter(self.cache))
                self.cache.pop(oldest_key)

    def delete(self, key: str):
        with self.lock:
            self.cache.pop(key, None)


@register_preference(MemoryLRUPCP)
def memorylru_preference(*_, **__):
    # Memory LRU Cache SHOULD be the highest priority
    return 10000

yt_dlp/extractor/youtube/pot/_builtin/webpo_cachespec.py (new file, 48 lines)
@@ -0,0 +1,48 @@
from __future__ import annotations

from yt_dlp.extractor.youtube.pot._provider import BuiltinIEContentProvider
from yt_dlp.extractor.youtube.pot.cache import (
    CacheProviderWritePolicy,
    PoTokenCacheSpec,
    PoTokenCacheSpecProvider,
    register_spec,
)
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenRequest,
)
from yt_dlp.extractor.youtube.pot.utils import ContentBindingType, get_webpo_content_binding
from yt_dlp.utils import traverse_obj


@register_spec
class WebPoPCSP(PoTokenCacheSpecProvider, BuiltinIEContentProvider):
    PROVIDER_NAME = 'webpo'

    def generate_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
        bind_to_visitor_id = self._configuration_arg(
            'bind_to_visitor_id', default=['true'])[0] == 'true'

        content_binding, content_binding_type = get_webpo_content_binding(
            request, bind_to_visitor_id=bind_to_visitor_id)

        if not content_binding or not content_binding_type:
            return None

        write_policy = CacheProviderWritePolicy.WRITE_ALL
        if content_binding_type == ContentBindingType.VIDEO_ID:
            write_policy = CacheProviderWritePolicy.WRITE_FIRST

        return PoTokenCacheSpec(
            key_bindings={
                't': 'webpo',
                'cb': content_binding,
                'cbt': content_binding_type.value,
                'ip': traverse_obj(request.innertube_context, ('client', 'remoteHost')),
                'sa': request.request_source_address,
                'px': request.request_proxy,
            },
            # Integrity token response usually states it has a ttl of 12 hours (43200 seconds).
            # We will default to 6 hours to be safe.
            default_ttl=21600,
            write_policy=write_policy,
        )

yt_dlp/extractor/youtube/pot/_director.py (new file, 468 lines)
@@ -0,0 +1,468 @@
from __future__ import annotations

import base64
import binascii
import dataclasses
import datetime as dt
import hashlib
import json
import typing
import urllib.parse
from collections.abc import Iterable

from yt_dlp.extractor.youtube.pot._provider import (
    BuiltinIEContentProvider,
    IEContentProvider,
    IEContentProviderLogger,
)
from yt_dlp.extractor.youtube.pot._registry import (
    _pot_cache_provider_preferences,
    _pot_cache_providers,
    _pot_pcs_providers,
    _pot_providers,
    _ptp_preferences,
)
from yt_dlp.extractor.youtube.pot.cache import (
    CacheProviderWritePolicy,
    PoTokenCacheProvider,
    PoTokenCacheProviderError,
    PoTokenCacheSpec,
    PoTokenCacheSpecProvider,
)
from yt_dlp.extractor.youtube.pot.provider import (
    PoTokenProvider,
    PoTokenProviderError,
    PoTokenProviderRejectedRequest,
    PoTokenRequest,
    PoTokenResponse,
    provider_bug_report_message,
)
from yt_dlp.utils import bug_reports_message, format_field, join_nonempty

if typing.TYPE_CHECKING:
    from yt_dlp.extractor.youtube.pot.cache import CacheProviderPreference
    from yt_dlp.extractor.youtube.pot.provider import Preference


class YoutubeIEContentProviderLogger(IEContentProviderLogger):
    def __init__(self, ie, prefix, log_level: IEContentProviderLogger.LogLevel | None = None):
        self.__ie = ie
        self.prefix = prefix
        self.log_level = log_level if log_level is not None else self.LogLevel.INFO

    def _format_msg(self, message: str):
        prefixstr = format_field(self.prefix, None, '[%s] ')
        return f'{prefixstr}{message}'

    def trace(self, message: str):
        if self.log_level <= self.LogLevel.TRACE:
            self.__ie.write_debug(self._format_msg('TRACE: ' + message))

    def debug(self, message: str):
        if self.log_level <= self.LogLevel.DEBUG:
            self.__ie.write_debug(self._format_msg(message))

    def info(self, message: str):
        if self.log_level <= self.LogLevel.INFO:
            self.__ie.to_screen(self._format_msg(message))

    def warning(self, message: str, *, once=False):
        if self.log_level <= self.LogLevel.WARNING:
            self.__ie.report_warning(self._format_msg(message), only_once=once)

    def error(self, message: str):
        if self.log_level <= self.LogLevel.ERROR:
            self.__ie._downloader.report_error(self._format_msg(message), is_error=False)


class PoTokenCache:

    def __init__(
        self,
        logger: IEContentProviderLogger,
        cache_providers: list[PoTokenCacheProvider],
        cache_spec_providers: list[PoTokenCacheSpecProvider],
        cache_provider_preferences: list[CacheProviderPreference] | None = None,
    ):
        self.cache_providers: dict[str, PoTokenCacheProvider] = {
            provider.PROVIDER_KEY: provider for provider in (cache_providers or [])}
        self.cache_provider_preferences: list[CacheProviderPreference] = cache_provider_preferences or []
        self.cache_spec_providers: dict[str, PoTokenCacheSpecProvider] = {
            provider.PROVIDER_KEY: provider for provider in (cache_spec_providers or [])}
        self.logger = logger

    def _get_cache_providers(self, request: PoTokenRequest) -> Iterable[PoTokenCacheProvider]:
        """Sorts available cache providers by preference, given a request"""
        preferences = {
            provider: sum(pref(provider, request) for pref in self.cache_provider_preferences)
            for provider in self.cache_providers.values()
        }
        if self.logger.log_level <= self.logger.LogLevel.TRACE:
            # calling is_available() for every PO Token provider upfront may have some overhead
            self.logger.trace(f'PO Token Cache Providers: {provider_display_list(self.cache_providers.values())}')
            self.logger.trace('Cache Provider preferences for this request: {}'.format(', '.join(
                f'{provider.PROVIDER_KEY}={pref}' for provider, pref in preferences.items())))

        return (
            provider for provider in sorted(
                self.cache_providers.values(), key=preferences.get, reverse=True) if provider.is_available())

    def _get_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
        for provider in self.cache_spec_providers.values():
            if not provider.is_available():
                continue
            try:
                spec = provider.generate_cache_spec(request)
                if not spec:
                    continue
                if not validate_cache_spec(spec):
                    self.logger.error(
                        f'PoTokenCacheSpecProvider "{provider.PROVIDER_KEY}" generate_cache_spec() '
                        f'returned invalid spec {spec}{provider_bug_report_message(provider)}')
                    continue
                spec = dataclasses.replace(spec, _provider=provider)
                self.logger.trace(
                    f'Retrieved cache spec {spec} from cache spec provider "{provider.PROVIDER_NAME}"')
                return spec
            except Exception as e:
                self.logger.error(
                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache spec provider: '
                    f'{e!r}{provider_bug_report_message(provider)}')
                continue
        return None

    def _generate_key_bindings(self, spec: PoTokenCacheSpec) -> dict[str, str]:
        bindings_cleaned = {
            **{k: v for k, v in spec.key_bindings.items() if v is not None},
            # Allow us to invalidate caches if such need arises
            '_dlp_cache': 'v1',
        }
        if spec._provider:
            bindings_cleaned['_p'] = spec._provider.PROVIDER_KEY
        self.logger.trace(f'Generated cache key bindings: {bindings_cleaned}')
        return bindings_cleaned

    def _generate_key(self, bindings: dict) -> str:
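        # Sort the bindings so logically identical requests always produce the same cache key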
|
||||
binding_string = ''.join(repr(dict(sorted(bindings.items()))))
|
||||
return hashlib.sha256(binding_string.encode()).hexdigest()

    def get(self, request: PoTokenRequest) -> PoTokenResponse | None:
        spec = self._get_cache_spec(request)
        if not spec:
            self.logger.trace('No cache spec available for this request, unable to fetch from cache')
            return None

        cache_key = self._generate_key(self._generate_key_bindings(spec))
        self.logger.trace(f'Attempting to access PO Token cache using key: {cache_key}')

        for idx, provider in enumerate(self._get_cache_providers(request)):
            try:
                self.logger.trace(
                    f'Attempting to fetch PO Token response from "{provider.PROVIDER_NAME}" cache provider')
                cache_response = provider.get(cache_key)
                if not cache_response:
                    continue
                try:
                    po_token_response = PoTokenResponse(**json.loads(cache_response))
                except (TypeError, ValueError, json.JSONDecodeError):
                    po_token_response = None
                if not validate_response(po_token_response):
                    self.logger.error(
                        f'Invalid PO Token response retrieved from cache provider "{provider.PROVIDER_NAME}": '
                        f'{cache_response}{provider_bug_report_message(provider)}')
                    provider.delete(cache_key)
                    continue
                self.logger.trace(
                    f'PO Token response retrieved from cache using "{provider.PROVIDER_NAME}" provider: '
                    f'{po_token_response}')
                if idx > 0:
                    # Write back to the highest priority cache provider,
                    # so we stop trying to fetch from lower priority providers
                    self.logger.trace('Writing PO Token response to highest priority cache provider')
                    self.store(request, po_token_response, write_policy=CacheProviderWritePolicy.WRITE_FIRST)

                return po_token_response
            except PoTokenCacheProviderError as e:
                self.logger.warning(
                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: '
                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
                continue
            except Exception as e:
                self.logger.error(
                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: '
                    f'{e!r}{provider_bug_report_message(provider)}',
                )
                continue
        return None

    def store(
        self,
        request: PoTokenRequest,
        response: PoTokenResponse,
        write_policy: CacheProviderWritePolicy | None = None,
    ):
        spec = self._get_cache_spec(request)
        if not spec:
            self.logger.trace('No cache spec available for this request. Not caching.')
            return

        if not validate_response(response):
            self.logger.error(
                f'Invalid PO Token response provided to PoTokenCache.store(): '
                f'{response}{bug_reports_message()}')
            return

        cache_key = self._generate_key(self._generate_key_bindings(spec))
        self.logger.trace(f'Attempting to access PO Token cache using key: {cache_key}')

        default_expires_at = int(dt.datetime.now(dt.timezone.utc).timestamp()) + spec.default_ttl
        cache_response = dataclasses.replace(response, expires_at=response.expires_at or default_expires_at)

        write_policy = write_policy or spec.write_policy
        self.logger.trace(f'Using write policy: {write_policy}')

        for idx, provider in enumerate(self._get_cache_providers(request)):
            try:
                self.logger.trace(
                    f'Caching PO Token response in "{provider.PROVIDER_NAME}" cache provider '
                    f'(key={cache_key}, expires_at={cache_response.expires_at})')
                provider.store(
                    key=cache_key,
                    value=json.dumps(dataclasses.asdict(cache_response)),
                    expires_at=cache_response.expires_at)
            except PoTokenCacheProviderError as e:
                self.logger.warning(
                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: '
                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
            except Exception as e:
                self.logger.error(
                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: '
                    f'{e!r}{provider_bug_report_message(provider)}')

            # WRITE_FIRST should not write to lower priority providers in the case the highest priority provider fails
            if idx == 0 and write_policy == CacheProviderWritePolicy.WRITE_FIRST:
                return
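
The write-policy behaviour is easiest to see with two registered cache providers, A preferred over B. A rough sketch (provider names hypothetical, not from the diff):

# WRITE_ALL (the spec default): store() attempts to write to A, then to B
cache.store(request, response)
# WRITE_FIRST: store() writes to A only and returns; if A fails, B is intentionally skipped.
# get() uses this mode for its write-back, so a hit found in a lower-priority
# provider is promoted only to the top provider.
cache.store(request, response, write_policy=CacheProviderWritePolicy.WRITE_FIRST)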

    def close(self):
        for provider in self.cache_providers.values():
            provider.close()
        for spec_provider in self.cache_spec_providers.values():
            spec_provider.close()


class PoTokenRequestDirector:

    def __init__(self, logger: IEContentProviderLogger, cache: PoTokenCache):
        self.providers: dict[str, PoTokenProvider] = {}
        self.preferences: list[Preference] = []
        self.cache = cache
        self.logger = logger

    def register_provider(self, provider: PoTokenProvider):
        self.providers[provider.PROVIDER_KEY] = provider

    def register_preference(self, preference: Preference):
        self.preferences.append(preference)

    def _get_providers(self, request: PoTokenRequest) -> Iterable[PoTokenProvider]:
        """Sorts available providers by preference, given a request"""
        preferences = {
            provider: sum(pref(provider, request) for pref in self.preferences)
            for provider in self.providers.values()
        }
        if self.logger.log_level <= self.logger.LogLevel.TRACE:
            # calling is_available() for every PO Token provider upfront may have some overhead
            self.logger.trace(f'PO Token Providers: {provider_display_list(self.providers.values())}')
            self.logger.trace('Provider preferences for this request: {}'.format(', '.join(
                f'{provider.PROVIDER_NAME}={pref}' for provider, pref in preferences.items())))

        return (
            provider for provider in sorted(
                self.providers.values(), key=preferences.get, reverse=True)
            if provider.is_available()
        )
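
To make the sorting concrete: every registered preference returns an integer per (provider, request) pair, the scores are summed per provider, and providers are yielded highest-total first with unavailable ones filtered out. A hypothetical trace:

# preference totals: {ExamplePTP: 150, OtherPTP: 0, BrokenPTP: -50}
# BrokenPTP.is_available() -> False
# _get_providers(request) then yields: ExamplePTP, OtherPTP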

    def _get_po_token(self, request) -> PoTokenResponse | None:
        for provider in self._get_providers(request):
            try:
                self.logger.trace(
                    f'Attempting to fetch a PO Token from "{provider.PROVIDER_NAME}" provider')
                response = provider.request_pot(request.copy())
            except PoTokenProviderRejectedRequest as e:
                self.logger.trace(
                    f'PO Token Provider "{provider.PROVIDER_NAME}" rejected this request, '
                    f'trying next available provider. Reason: {e}')
                continue
            except PoTokenProviderError as e:
                self.logger.warning(
                    f'Error fetching PO Token from "{provider.PROVIDER_NAME}" provider: '
                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
                continue
            except Exception as e:
                self.logger.error(
                    f'Unexpected error when fetching PO Token from "{provider.PROVIDER_NAME}" provider: '
                    f'{e!r}{provider_bug_report_message(provider)}')
                continue

            self.logger.trace(f'PO Token response from "{provider.PROVIDER_NAME}" provider: {response}')

            if not validate_response(response):
                self.logger.error(
                    f'Invalid PO Token response received from "{provider.PROVIDER_NAME}" provider: '
                    f'{response}{provider_bug_report_message(provider)}')
                continue

            return response

        self.logger.trace('No PO Token providers were able to provide a valid PO Token')
        return None

    def get_po_token(self, request: PoTokenRequest) -> str | None:
        if not request.bypass_cache:
            if pot_response := self.cache.get(request):
                return clean_pot(pot_response.po_token)

        if not self.providers:
            self.logger.trace('No PO Token providers registered')
            return None

        pot_response = self._get_po_token(request)
        if not pot_response:
            return None

        pot_response.po_token = clean_pot(pot_response.po_token)

        if pot_response.expires_at is None or pot_response.expires_at > 0:
            self.cache.store(request, pot_response)
        else:
            self.logger.trace(
                f'PO Token response will not be cached (expires_at={pot_response.expires_at})')

        return pot_response.po_token

    def close(self):
        for provider in self.providers.values():
            provider.close()
        self.cache.close()


EXTRACTOR_ARG_PREFIX = 'youtubepot'


def initialize_pot_director(ie):
    assert ie._downloader is not None, 'Downloader not set'

    enable_trace = ie._configuration_arg(
        'pot_trace', ['false'], ie_key='youtube', casesense=False)[0] == 'true'

    if enable_trace:
        log_level = IEContentProviderLogger.LogLevel.TRACE
    elif ie.get_param('verbose', False):
        log_level = IEContentProviderLogger.LogLevel.DEBUG
    else:
        log_level = IEContentProviderLogger.LogLevel.INFO

    def get_provider_logger_and_settings(provider, logger_key):
        logger_prefix = f'{logger_key}:{provider.PROVIDER_NAME}'
        extractor_key = f'{EXTRACTOR_ARG_PREFIX}-{provider.PROVIDER_KEY.lower()}'
        return (
            YoutubeIEContentProviderLogger(ie, logger_prefix, log_level=log_level),
            ie.get_param('extractor_args', {}).get(extractor_key, {}))

    cache_providers = []
    for cache_provider in _pot_cache_providers.value.values():
        logger, settings = get_provider_logger_and_settings(cache_provider, 'pot:cache')
        cache_providers.append(cache_provider(ie, logger, settings))
    cache_spec_providers = []
    for cache_spec_provider in _pot_pcs_providers.value.values():
        logger, settings = get_provider_logger_and_settings(cache_spec_provider, 'pot:cache:spec')
        cache_spec_providers.append(cache_spec_provider(ie, logger, settings))

    cache = PoTokenCache(
        logger=YoutubeIEContentProviderLogger(ie, 'pot:cache', log_level=log_level),
        cache_providers=cache_providers,
        cache_spec_providers=cache_spec_providers,
        cache_provider_preferences=list(_pot_cache_provider_preferences.value),
    )

    director = PoTokenRequestDirector(
        logger=YoutubeIEContentProviderLogger(ie, 'pot', log_level=log_level),
        cache=cache,
    )

    ie._downloader.add_close_hook(director.close)

    for provider in _pot_providers.value.values():
        logger, settings = get_provider_logger_and_settings(provider, 'pot')
        director.register_provider(provider(ie, logger, settings))

    for preference in _ptp_preferences.value:
        director.register_preference(preference)

    if director.logger.log_level <= director.logger.LogLevel.DEBUG:
        # calling is_available() for every PO Token provider upfront may have some overhead
        director.logger.debug(f'PO Token Providers: {provider_display_list(director.providers.values())}')
        director.logger.debug(f'PO Token Cache Providers: {provider_display_list(cache.cache_providers.values())}')
        director.logger.debug(f'PO Token Cache Spec Providers: {provider_display_list(cache.cache_spec_providers.values())}')
        director.logger.trace(f'Registered {len(director.preferences)} provider preferences')
        director.logger.trace(f'Registered {len(cache.cache_provider_preferences)} cache provider preferences')

    return director
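
For reference, the settings dict each provider receives above is taken from the 'youtubepot-<provider_key>' extractor-arg namespace, while tracing is toggled through the 'youtube' namespace. A hypothetical invocation (provider key and argument name are illustrative only):

#   yt-dlp --extractor-args 'youtube:pot_trace=true' \
#          --extractor-args 'youtubepot-example:api_url=http://127.0.0.1:4416' URL
# The 'example' provider would then be constructed with
# settings={'api_url': ['http://127.0.0.1:4416']}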


def provider_display_list(providers: Iterable[IEContentProvider]):
    def provider_display_name(provider):
        display_str = join_nonempty(
            provider.PROVIDER_NAME,
            provider.PROVIDER_VERSION if not isinstance(provider, BuiltinIEContentProvider) else None)
        statuses = []
        if not isinstance(provider, BuiltinIEContentProvider):
            statuses.append('external')
        if not provider.is_available():
            statuses.append('unavailable')
        if statuses:
            display_str += f' ({", ".join(statuses)})'
        return display_str

    return ', '.join(provider_display_name(provider) for provider in providers) or 'none'


def clean_pot(po_token: str):
    # Clean and validate the PO Token. This will strip invalid characters off
    # (e.g. additional url params the user may accidentally include)
    try:
        return base64.urlsafe_b64encode(
            base64.urlsafe_b64decode(urllib.parse.unquote(po_token))).decode()
    except (binascii.Error, ValueError):
        raise ValueError('Invalid PO Token')
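
A quick round-trip illustration of the normalization this performs: percent-encoding is unquoted and the token is re-encoded as canonical URL-safe base64, while anything that cannot survive the decode raises:

clean_pot('aGVsbG8gd29ybGQ%3D')  # -> 'aGVsbG8gd29ybGQ=' (unquoted, then re-encoded)
clean_pot('not base64!!')        # -> raises ValueError('Invalid PO Token')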


def validate_response(response: PoTokenResponse | None):
    if (
        not isinstance(response, PoTokenResponse)
        or not isinstance(response.po_token, str)
        or not response.po_token
    ):  # noqa: SIM103
        return False

    try:
        clean_pot(response.po_token)
    except ValueError:
        return False

    if not isinstance(response.expires_at, int):
        return response.expires_at is None

    return response.expires_at <= 0 or response.expires_at > int(dt.datetime.now(dt.timezone.utc).timestamp())


def validate_cache_spec(spec: PoTokenCacheSpec):
    return (
        isinstance(spec, PoTokenCacheSpec)
        and isinstance(spec.write_policy, CacheProviderWritePolicy)
        and isinstance(spec.default_ttl, int)
        and isinstance(spec.key_bindings, dict)
        and all(isinstance(k, str) for k in spec.key_bindings)
        and all(v is None or isinstance(v, str) for v in spec.key_bindings.values())
        and bool([v for v in spec.key_bindings.values() if v is not None])
    )

156 yt_dlp/extractor/youtube/pot/_provider.py Normal file
@@ -0,0 +1,156 @@
from __future__ import annotations

import abc
import enum
import functools

from yt_dlp.extractor.common import InfoExtractor
from yt_dlp.utils import NO_DEFAULT, bug_reports_message, classproperty, traverse_obj
from yt_dlp.version import __version__

# xxx: these could be generalized outside YoutubeIE eventually


class IEContentProviderLogger(abc.ABC):

    class LogLevel(enum.IntEnum):
        TRACE = 0
        DEBUG = 10
        INFO = 20
        WARNING = 30
        ERROR = 40

        @classmethod
        def _missing_(cls, value):
            if isinstance(value, str):
                value = value.upper()
                if value in dir(cls):
                    return cls[value]

            return cls.INFO

    log_level = LogLevel.INFO

    @abc.abstractmethod
    def trace(self, message: str):
        pass

    @abc.abstractmethod
    def debug(self, message: str):
        pass

    @abc.abstractmethod
    def info(self, message: str):
        pass

    @abc.abstractmethod
    def warning(self, message: str, *, once=False):
        pass

    @abc.abstractmethod
    def error(self, message: str):
        pass


class IEContentProviderError(Exception):
    def __init__(self, msg=None, expected=False):
        super().__init__(msg)
        self.expected = expected


class IEContentProvider(abc.ABC):
    PROVIDER_VERSION: str = '0.0.0'
    BUG_REPORT_LOCATION: str = '(developer has not provided a bug report location)'

    def __init__(
        self,
        ie: InfoExtractor,
        logger: IEContentProviderLogger,
        settings: dict[str, list[str]], *_, **__,
    ):
        self.ie = ie
        self.settings = settings or {}
        self.logger = logger
        super().__init__()

    @classmethod
    def __init_subclass__(cls, *, suffix=None, **kwargs):
        if suffix:
            cls._PROVIDER_KEY_SUFFIX = suffix
        return super().__init_subclass__(**kwargs)

    @classproperty
    def PROVIDER_NAME(cls) -> str:
        return cls.__name__[:-len(cls._PROVIDER_KEY_SUFFIX)]

    @classproperty
    def BUG_REPORT_MESSAGE(cls):
        return f'please report this issue to the provider developer at {cls.BUG_REPORT_LOCATION} .'

    @classproperty
    def PROVIDER_KEY(cls) -> str:
        assert hasattr(cls, '_PROVIDER_KEY_SUFFIX'), 'Content Provider implementation must define a suffix for the provider key'
        assert cls.__name__.endswith(cls._PROVIDER_KEY_SUFFIX), f'PoTokenProvider class names must end with "{cls._PROVIDER_KEY_SUFFIX}"'
        return cls.__name__[:-len(cls._PROVIDER_KEY_SUFFIX)]

    @abc.abstractmethod
    def is_available(self) -> bool:
        """
        Check if the provider is available (e.g. all required dependencies are available)
        This is used to determine if the provider should be used and to provide debug information.

        IMPORTANT: This method should not make any network requests or perform any expensive operations.
        It is called multiple times.
        """
        raise NotImplementedError

    def close(self):  # noqa: B027
        pass

    def _configuration_arg(self, key, default=NO_DEFAULT, *, casesense=False):
        """
        @returns A list of values for the setting given by "key"
                 or "default" if no such key is present
        @param default The default value to return when the key is not present (default: [])
        @param casesense When false, the values are converted to lower case
        """
        val = traverse_obj(self.settings, key)
        if val is None:
            return [] if default is NO_DEFAULT else default
        return list(val) if casesense else [x.lower() for x in val]
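
A rough illustration of the lookup semantics, assuming a provider constructed with settings={'deviceName': ['Chrome']} (the key is hypothetical):

#   self._configuration_arg('deviceName')                  -> ['chrome']  (values lowercased by default)
#   self._configuration_arg('deviceName', casesense=True)  -> ['Chrome']
#   self._configuration_arg('missing')                     -> []          (default default)
#   self._configuration_arg('missing', default=None)       -> None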


class BuiltinIEContentProvider(IEContentProvider, abc.ABC):
    PROVIDER_VERSION = __version__
    BUG_REPORT_MESSAGE = bug_reports_message(before='')


def register_provider_generic(
    provider,
    base_class,
    registry,
):
    """Generic function to register a provider class"""
    assert issubclass(provider, base_class), f'{provider} must be a subclass of {base_class.__name__}'
    assert provider.PROVIDER_KEY not in registry, f'{base_class.__name__} {provider.PROVIDER_KEY} already registered'
    registry[provider.PROVIDER_KEY] = provider
    return provider


def register_preference_generic(
    base_class,
    registry,
    *providers,
):
    """Generic function to register a preference for a provider"""
    assert all(issubclass(provider, base_class) for provider in providers)

    def outer(preference):
        @functools.wraps(preference)
        def inner(provider, *args, **kwargs):
            if not providers or isinstance(provider, providers):
                return preference(provider, *args, **kwargs)
            return 0
        registry.add(inner)
        return preference
    return outer
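
A sketch of the decorator mechanics (standalone, with a hypothetical registry; the public register_preference helpers below are built on this):

_my_prefs = set()

@register_preference_generic(IEContentProvider, _my_prefs)  # no provider filter: applies to all
def always_fifty(provider, request):
    return 50

# _my_prefs now holds a wrapper that returns 50 for any provider; had specific
# provider classes been passed, the wrapper would return 0 for everything else.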

8 yt_dlp/extractor/youtube/pot/_registry.py Normal file
@@ -0,0 +1,8 @@
from yt_dlp.globals import Indirect

_pot_providers = Indirect({})
_ptp_preferences = Indirect(set())
_pot_pcs_providers = Indirect({})
_pot_cache_providers = Indirect({})
_pot_cache_provider_preferences = Indirect(set())
_pot_memory_cache = Indirect({})

97 yt_dlp/extractor/youtube/pot/cache.py Normal file
@@ -0,0 +1,97 @@
"""PUBLIC API"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import abc
|
||||
import dataclasses
|
||||
import enum
|
||||
import typing
|
||||
|
||||
from yt_dlp.extractor.youtube.pot._provider import (
|
||||
IEContentProvider,
|
||||
IEContentProviderError,
|
||||
register_preference_generic,
|
||||
register_provider_generic,
|
||||
)
|
||||
from yt_dlp.extractor.youtube.pot._registry import (
|
||||
_pot_cache_provider_preferences,
|
||||
_pot_cache_providers,
|
||||
_pot_pcs_providers,
|
||||
)
|
||||
from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest
|
||||
|
||||
|
||||
class PoTokenCacheProviderError(IEContentProviderError):
|
||||
"""An error occurred while fetching a PO Token"""
|
||||
|
||||
|
||||
class PoTokenCacheProvider(IEContentProvider, abc.ABC, suffix='PCP'):
|
||||
@abc.abstractmethod
|
||||
def get(self, key: str) -> str | None:
|
||||
pass
|
||||
|
||||
@abc.abstractmethod
|
||||
def store(self, key: str, value: str, expires_at: int):
|
||||
pass
|
||||
|
||||
@abc.abstractmethod
|
||||
def delete(self, key: str):
|
||||
pass
|
||||
|
||||
|
||||
class CacheProviderWritePolicy(enum.Enum):
|
||||
WRITE_ALL = enum.auto() # Write to all cache providers
|
||||
WRITE_FIRST = enum.auto() # Write to only the first cache provider
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PoTokenCacheSpec:
|
||||
key_bindings: dict[str, str | None]
|
||||
default_ttl: int
|
||||
write_policy: CacheProviderWritePolicy = CacheProviderWritePolicy.WRITE_ALL
|
||||
|
||||
# Internal
|
||||
_provider: PoTokenCacheSpecProvider | None = None
|
||||
|
||||
|
||||
class PoTokenCacheSpecProvider(IEContentProvider, abc.ABC, suffix='PCSP'):
|
||||
|
||||
def is_available(self) -> bool:
|
||||
return True
|
||||
|
||||
@abc.abstractmethod
|
||||
def generate_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
|
||||
"""Generate a cache spec for the given request"""
|
||||
pass
|
||||
|
||||
|
||||
def register_provider(provider: type[PoTokenCacheProvider]):
|
||||
"""Register a PoTokenCacheProvider class"""
|
||||
return register_provider_generic(
|
||||
provider=provider,
|
||||
base_class=PoTokenCacheProvider,
|
||||
registry=_pot_cache_providers.value,
|
||||
)
|
||||
|
||||
|
||||
def register_spec(provider: type[PoTokenCacheSpecProvider]):
|
||||
"""Register a PoTokenCacheSpecProvider class"""
|
||||
return register_provider_generic(
|
||||
provider=provider,
|
||||
base_class=PoTokenCacheSpecProvider,
|
||||
registry=_pot_pcs_providers.value,
|
||||
)
|
||||
|
||||
|
||||
def register_preference(
|
||||
*providers: type[PoTokenCacheProvider]) -> typing.Callable[[CacheProviderPreference], CacheProviderPreference]:
|
||||
"""Register a preference for a PoTokenCacheProvider"""
|
||||
return register_preference_generic(
|
||||
PoTokenCacheProvider,
|
||||
_pot_cache_provider_preferences.value,
|
||||
*providers,
|
||||
)
|
||||
|
||||
|
||||
if typing.TYPE_CHECKING:
|
||||
CacheProviderPreference = typing.Callable[[PoTokenCacheProvider, PoTokenRequest], int]
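
Not from the diff, but to illustrate this public API: a minimal in-memory cache provider and spec provider might look roughly like the following (all names and values are hypothetical; yt-dlp ships its own implementations):

import time

@register_provider
class ExampleMemoryPCP(PoTokenCacheProvider):  # class name must end in 'PCP'
    PROVIDER_NAME = 'example-memory'
    PROVIDER_VERSION = '0.1.0'
    _cache: dict[str, tuple[int, str]] = {}  # shared class-level dict; not thread-safe

    def is_available(self) -> bool:
        return True

    def get(self, key: str) -> str | None:
        entry = self._cache.get(key)
        if not entry:
            return None
        expires_at, value = entry
        if expires_at <= int(time.time()):  # lazily evict expired entries
            self.delete(key)
            return None
        return value

    def store(self, key: str, value: str, expires_at: int):
        self._cache[key] = (expires_at, value)

    def delete(self, key: str):
        self._cache.pop(key, None)


@register_spec
class ExamplePCSP(PoTokenCacheSpecProvider):  # class name must end in 'PCSP'
    def generate_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
        return PoTokenCacheSpec(
            key_bindings={'t': 'example', 'v': request.video_id},  # at least one value must be non-None
            default_ttl=6 * 60 * 60,  # seconds
        )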

281 yt_dlp/extractor/youtube/pot/provider.py Normal file
@@ -0,0 +1,281 @@
"""PUBLIC API"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import abc
|
||||
import copy
|
||||
import dataclasses
|
||||
import enum
|
||||
import functools
|
||||
import typing
|
||||
import urllib.parse
|
||||
|
||||
from yt_dlp.cookies import YoutubeDLCookieJar
|
||||
from yt_dlp.extractor.youtube.pot._provider import (
|
||||
IEContentProvider,
|
||||
IEContentProviderError,
|
||||
register_preference_generic,
|
||||
register_provider_generic,
|
||||
)
|
||||
from yt_dlp.extractor.youtube.pot._registry import _pot_providers, _ptp_preferences
|
||||
from yt_dlp.networking import Request, Response
|
||||
from yt_dlp.utils import traverse_obj
|
||||
from yt_dlp.utils.networking import HTTPHeaderDict
|
||||
|
||||
__all__ = [
|
||||
'ExternalRequestFeature',
|
||||
'PoTokenContext',
|
||||
'PoTokenProvider',
|
||||
'PoTokenProviderError',
|
||||
'PoTokenProviderRejectedRequest',
|
||||
'PoTokenRequest',
|
||||
'PoTokenResponse',
|
||||
'provider_bug_report_message',
|
||||
'register_preference',
|
||||
'register_provider',
|
||||
]
|
||||
|
||||
|
||||
class PoTokenContext(enum.Enum):
|
||||
GVS = 'gvs'
|
||||
PLAYER = 'player'
|
||||
SUBS = 'subs'
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PoTokenRequest:
|
||||
# YouTube parameters
|
||||
context: PoTokenContext
|
||||
innertube_context: InnertubeContext
|
||||
innertube_host: str | None = None
|
||||
session_index: str | None = None
|
||||
player_url: str | None = None
|
||||
is_authenticated: bool = False
|
||||
video_webpage: str | None = None
|
||||
internal_client_name: str | None = None
|
||||
|
||||
# Content binding parameters
|
||||
visitor_data: str | None = None
|
||||
data_sync_id: str | None = None
|
||||
video_id: str | None = None
|
||||
|
||||
# Networking parameters
|
||||
request_cookiejar: YoutubeDLCookieJar = dataclasses.field(default_factory=YoutubeDLCookieJar)
|
||||
request_proxy: str | None = None
|
||||
request_headers: HTTPHeaderDict = dataclasses.field(default_factory=HTTPHeaderDict)
|
||||
request_timeout: float | None = None
|
||||
request_source_address: str | None = None
|
||||
request_verify_tls: bool = True
|
||||
|
||||
# Generate a new token, do not used a cached token
|
||||
# The token should still be cached for future requests
|
||||
bypass_cache: bool = False
|
||||
|
||||
def copy(self):
|
||||
return dataclasses.replace(
|
||||
self,
|
||||
request_headers=HTTPHeaderDict(self.request_headers),
|
||||
innertube_context=copy.deepcopy(self.innertube_context),
|
||||
)
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PoTokenResponse:
|
||||
po_token: str
|
||||
expires_at: int | None = None
|
||||
|
||||
|
||||
class PoTokenProviderRejectedRequest(IEContentProviderError):
|
||||
"""Reject the PoTokenRequest (cannot handle the request)"""
|
||||
|
||||
|
||||
class PoTokenProviderError(IEContentProviderError):
|
||||
"""An error occurred while fetching a PO Token"""
|
||||
|
||||
|
||||
class ExternalRequestFeature(enum.Enum):
|
||||
PROXY_SCHEME_HTTP = enum.auto()
|
||||
PROXY_SCHEME_HTTPS = enum.auto()
|
||||
PROXY_SCHEME_SOCKS4 = enum.auto()
|
||||
PROXY_SCHEME_SOCKS4A = enum.auto()
|
||||
PROXY_SCHEME_SOCKS5 = enum.auto()
|
||||
PROXY_SCHEME_SOCKS5H = enum.auto()
|
||||
SOURCE_ADDRESS = enum.auto()
|
||||
DISABLE_TLS_VERIFICATION = enum.auto()
|
||||
|
||||
|
||||
class PoTokenProvider(IEContentProvider, abc.ABC, suffix='PTP'):
|
||||
|
||||
# Set to None to disable the check
|
||||
_SUPPORTED_CONTEXTS: tuple[PoTokenContext] | None = ()
|
||||
|
||||
# Innertube Client Name.
|
||||
# For example, "WEB", "ANDROID", "TVHTML5".
|
||||
# For a list of WebPO client names, see yt_dlp.extractor.youtube.pot.utils.WEBPO_CLIENTS.
|
||||
# Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS
|
||||
# for a list of client names currently supported by the YouTube extractor.
|
||||
_SUPPORTED_CLIENTS: tuple[str] | None = ()
|
||||
|
||||
# If making external requests to websites (i.e. to youtube.com)
|
||||
# using another library or service (i.e., not _request_webpage),
|
||||
# add the request features that are supported.
|
||||
# If only using _request_webpage to make external requests, set this to None.
|
||||
_SUPPORTED_EXTERNAL_REQUEST_FEATURES: tuple[ExternalRequestFeature] | None = ()
|
||||
|
||||
def __validate_request(self, request: PoTokenRequest):
|
||||
if not self.is_available():
|
||||
raise PoTokenProviderRejectedRequest(f'{self.PROVIDER_NAME} is not available')
|
||||
|
||||
# Validate request using built-in settings
|
||||
if (
|
||||
self._SUPPORTED_CONTEXTS is not None
|
||||
and request.context not in self._SUPPORTED_CONTEXTS
|
||||
):
|
||||
raise PoTokenProviderRejectedRequest(
|
||||
f'PO Token Context "{request.context}" is not supported by {self.PROVIDER_NAME}')
|
||||
|
||||
if self._SUPPORTED_CLIENTS is not None:
|
||||
client_name = traverse_obj(
|
||||
request.innertube_context, ('client', 'clientName'))
|
||||
if client_name not in self._SUPPORTED_CLIENTS:
|
||||
raise PoTokenProviderRejectedRequest(
|
||||
f'Client "{client_name}" is not supported by {self.PROVIDER_NAME}. '
|
||||
f'Supported clients: {", ".join(self._SUPPORTED_CLIENTS) or "none"}')
|
||||
|
||||
self.__validate_external_request_features(request)
|
||||
|
||||
@functools.cached_property
|
||||
def _supported_proxy_schemes(self):
|
||||
return {
|
||||
scheme: feature
|
||||
for scheme, feature in {
|
||||
'http': ExternalRequestFeature.PROXY_SCHEME_HTTP,
|
||||
'https': ExternalRequestFeature.PROXY_SCHEME_HTTPS,
|
||||
'socks4': ExternalRequestFeature.PROXY_SCHEME_SOCKS4,
|
||||
'socks4a': ExternalRequestFeature.PROXY_SCHEME_SOCKS4A,
|
||||
'socks5': ExternalRequestFeature.PROXY_SCHEME_SOCKS5,
|
||||
'socks5h': ExternalRequestFeature.PROXY_SCHEME_SOCKS5H,
|
||||
}.items()
|
||||
if feature in (self._SUPPORTED_EXTERNAL_REQUEST_FEATURES or [])
|
||||
}
|
||||
|
||||
def __validate_external_request_features(self, request: PoTokenRequest):
|
||||
if self._SUPPORTED_EXTERNAL_REQUEST_FEATURES is None:
|
||||
return
|
||||
|
||||
if request.request_proxy:
|
||||
scheme = urllib.parse.urlparse(request.request_proxy).scheme
|
||||
if scheme.lower() not in self._supported_proxy_schemes:
|
||||
raise PoTokenProviderRejectedRequest(
|
||||
f'External requests by "{self.PROVIDER_NAME}" provider do not '
|
||||
f'support proxy scheme "{scheme}". Supported proxy schemes: '
|
||||
f'{", ".join(self._supported_proxy_schemes) or "none"}')
|
||||
|
||||
if (
|
||||
request.request_source_address
|
||||
and ExternalRequestFeature.SOURCE_ADDRESS not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES
|
||||
):
|
||||
raise PoTokenProviderRejectedRequest(
|
||||
f'External requests by "{self.PROVIDER_NAME}" provider '
|
||||
f'do not support setting source address')
|
||||
|
||||
if (
|
||||
not request.request_verify_tls
|
||||
and ExternalRequestFeature.DISABLE_TLS_VERIFICATION not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES
|
||||
):
|
||||
raise PoTokenProviderRejectedRequest(
|
||||
f'External requests by "{self.PROVIDER_NAME}" provider '
|
||||
f'do not support ignoring TLS certificate failures')
|
||||
|
||||
def request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
|
||||
self.__validate_request(request)
|
||||
return self._real_request_pot(request)
|
||||
|
||||
@abc.abstractmethod
|
||||
def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
|
||||
"""To be implemented by subclasses"""
|
||||
pass
|
||||
|
||||
# Helper functions
|
||||
|
||||
def _request_webpage(self, request: Request, pot_request: PoTokenRequest | None = None, note=None, **kwargs) -> Response:
|
||||
"""Make a request using the internal HTTP Client.
|
||||
Use this instead of calling requests, urllib3 or other HTTP client libraries directly!
|
||||
|
||||
YouTube cookies will be automatically applied if this request is made to YouTube.
|
||||
|
||||
@param request: The request to make
|
||||
@param pot_request: The PoTokenRequest to use. Request parameters will be merged from it.
|
||||
@param note: Custom log message to display when making the request. Set to `False` to disable logging.
|
||||
|
||||
Tips:
|
||||
- Disable proxy (e.g. if calling local service): Request(..., proxies={'all': None})
|
||||
- Set request timeout: Request(..., extensions={'timeout': 5.0})
|
||||
"""
|
||||
req = request.copy()
|
||||
|
||||
# Merge some ctx request settings into the request
|
||||
# Most of these will already be used by the configured ydl instance,
|
||||
# however, the YouTube extractor may override some.
|
||||
if pot_request is not None:
|
||||
req.headers = HTTPHeaderDict(pot_request.request_headers, req.headers)
|
||||
req.proxies = req.proxies or ({'all': pot_request.request_proxy} if pot_request.request_proxy else {})
|
||||
|
||||
if pot_request.request_cookiejar is not None:
|
||||
req.extensions['cookiejar'] = req.extensions.get('cookiejar', pot_request.request_cookiejar)
|
||||
|
||||
if note is not False:
|
||||
self.logger.info(str(note) if note else 'Requesting webpage')
|
||||
return self.ie._downloader.urlopen(req)
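
A rough usage sketch from inside a provider implementation, applying the two docstring tips above (the helper name and URL are hypothetical):

from yt_dlp.networking import Request

def _fetch_challenge(self, pot_request: PoTokenRequest) -> str:
    # hypothetical helper on a PoTokenProvider subclass
    response = self._request_webpage(
        Request(
            'http://127.0.0.1:4416/challenge',  # hypothetical local service
            proxies={'all': None},              # tip: disable proxy for a local service
            extensions={'timeout': 5.0}),       # tip: per-request timeout
        pot_request=pot_request,
        note='Fetching PO Token challenge')
    return response.read().decode()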


def register_provider(provider: type[PoTokenProvider]):
    """Register a PoTokenProvider class"""
    return register_provider_generic(
        provider=provider,
        base_class=PoTokenProvider,
        registry=_pot_providers.value,
    )


def provider_bug_report_message(provider: IEContentProvider, before=';'):
    msg = provider.BUG_REPORT_MESSAGE

    before = before.rstrip()
    if not before or before.endswith(('.', '!', '?')):
        msg = msg[0].title() + msg[1:]

    return f'{before} {msg}' if before else msg


def register_preference(*providers: type[PoTokenProvider]) -> typing.Callable[[Preference], Preference]:
    """Register a preference for a PoTokenProvider"""
    return register_preference_generic(
        PoTokenProvider,
        _ptp_preferences.value,
        *providers,
    )


if typing.TYPE_CHECKING:
    Preference = typing.Callable[[PoTokenProvider, PoTokenRequest], int]
    __all__.append('Preference')

    # Barebones innertube context. There may be more fields.
    class ClientInfo(typing.TypedDict, total=False):
        hl: str | None
        gl: str | None
        remoteHost: str | None
        deviceMake: str | None
        deviceModel: str | None
        visitorData: str | None
        userAgent: str | None
        clientName: str
        clientVersion: str
        osName: str | None
        osVersion: str | None

    class InnertubeContext(typing.TypedDict, total=False):
        client: ClientInfo
        request: dict
        user: dict
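
Pulling the pieces of this file together, a skeletal provider could look roughly like this (the class, name, and returned token are hypothetical; a real implementation would mint a token bound to the request in _real_request_pot):

@register_provider
class ExamplePTP(PoTokenProvider):  # class name must end in 'PTP'
    PROVIDER_NAME = 'example'
    PROVIDER_VERSION = '0.1.0'
    BUG_REPORT_LOCATION = 'https://example.invalid/issues'

    _SUPPORTED_CONTEXTS = (PoTokenContext.GVS,)
    _SUPPORTED_CLIENTS = ('WEB',)
    _SUPPORTED_EXTERNAL_REQUEST_FEATURES = None  # only uses _request_webpage

    def is_available(self) -> bool:
        return True

    def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
        # 'AAAA' is a placeholder that merely survives clean_pot()
        return PoTokenResponse(po_token='AAAA', expires_at=None)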

73 yt_dlp/extractor/youtube/pot/utils.py Normal file
@@ -0,0 +1,73 @@
"""PUBLIC API"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
import contextlib
|
||||
import enum
|
||||
import re
|
||||
import urllib.parse
|
||||
|
||||
from yt_dlp.extractor.youtube.pot.provider import PoTokenContext, PoTokenRequest
|
||||
from yt_dlp.utils import traverse_obj
|
||||
|
||||
__all__ = ['WEBPO_CLIENTS', 'ContentBindingType', 'get_webpo_content_binding']
|
||||
|
||||
WEBPO_CLIENTS = (
|
||||
'WEB',
|
||||
'MWEB',
|
||||
'TVHTML5',
|
||||
'WEB_EMBEDDED_PLAYER',
|
||||
'WEB_CREATOR',
|
||||
'WEB_REMIX',
|
||||
'TVHTML5_SIMPLY_EMBEDDED_PLAYER',
|
||||
)
|
||||
|
||||
|
||||
class ContentBindingType(enum.Enum):
|
||||
VISITOR_DATA = 'visitor_data'
|
||||
DATASYNC_ID = 'datasync_id'
|
||||
VIDEO_ID = 'video_id'
|
||||
VISITOR_ID = 'visitor_id'
|
||||
|
||||
|
||||
def get_webpo_content_binding(
|
||||
request: PoTokenRequest,
|
||||
webpo_clients=WEBPO_CLIENTS,
|
||||
bind_to_visitor_id=False,
|
||||
) -> tuple[str | None, ContentBindingType | None]:
|
||||
|
||||
client_name = traverse_obj(request.innertube_context, ('client', 'clientName'))
|
||||
if not client_name or client_name not in webpo_clients:
|
||||
return None, None
|
||||
|
||||
if request.context == PoTokenContext.GVS or client_name in ('WEB_REMIX', ):
|
||||
if request.is_authenticated:
|
||||
return request.data_sync_id, ContentBindingType.DATASYNC_ID
|
||||
else:
|
||||
if bind_to_visitor_id:
|
||||
visitor_id = _extract_visitor_id(request.visitor_data)
|
||||
if visitor_id:
|
||||
return visitor_id, ContentBindingType.VISITOR_ID
|
||||
return request.visitor_data, ContentBindingType.VISITOR_DATA
|
||||
|
||||
elif request.context in (PoTokenContext.PLAYER, PoTokenContext.SUBS):
|
||||
return request.video_id, ContentBindingType.VIDEO_ID
|
||||
|
||||
return None, None
|
||||
|
||||
|
||||
def _extract_visitor_id(visitor_data):
|
||||
if not visitor_data:
|
||||
return None
|
||||
|
||||
# Attempt to extract the visitor ID from the visitor_data protobuf
|
||||
# xxx: ideally should use a protobuf parser
|
||||
with contextlib.suppress(Exception):
|
||||
visitor_id = base64.urlsafe_b64decode(
|
||||
urllib.parse.unquote_plus(visitor_data))[2:13].decode()
|
||||
# check that visitor id is all letters and numbers
|
||||
if re.fullmatch(r'[A-Za-z0-9_-]{11}', visitor_id):
|
||||
return visitor_id
|
||||
|
||||
return None
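
To make the branching above concrete, a few hypothetical outcomes:

# Unauthenticated GVS request from a WEB client:
#   get_webpo_content_binding(request)
#   -> (request.visitor_data, ContentBindingType.VISITOR_DATA)
# Same request with bind_to_visitor_id=True and decodable visitor_data:
#   -> ('<11-char visitor id>', ContentBindingType.VISITOR_ID)
# PLAYER or SUBS context:
#   -> (request.video_id, ContentBindingType.VIDEO_ID)
# Non-WebPO client (e.g. 'ANDROID'):
#   -> (None, None)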

@@ -590,39 +590,12 @@ def dict_item(key, val):
                return ret, True
            return ret, False

-        for m in re.finditer(rf'''(?x)
-                (?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
-                (?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
-            var = m.group('var1') or m.group('var2')
-            start, end = m.span()
-            sign = m.group('pre_sign') or m.group('post_sign')
-            ret = local_vars[var]
-            local_vars[var] += 1 if sign[0] == '+' else -1
-            if m.group('pre_sign'):
-                ret = local_vars[var]
-            expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
-
-        if not expr:
-            return None, should_return
-
        m = re.match(fr'''(?x)
            (?P<assign>
                (?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*
                (?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
                =(?!=)(?P<expr>.*)$
-            )|(?P<return>
-                (?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
-            )|(?P<attribute>
-                (?P<var>{_NAME_RE})(?:
-                    (?P<nullish>\?)?\.(?P<member>[^(]+)|
-                    \[(?P<member2>{_NESTED_BRACKETS})\]
-                )\s*
-            )|(?P<indexing>
-                (?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
-            )|(?P<function>
-                (?P<fname>{_NAME_RE})\((?P<args>.*)\)$
-            )''', expr)
-        if m and m.group('assign'):
+            )''', expr)
+        if m:  # We are assigning a value to a variable
            left_val = local_vars.get(m.group('out'))

            if not m.group('index'):
@@ -640,7 +613,35 @@ def dict_item(key, val):
                    m.group('op'), self._index(left_val, idx), m.group('expr'), expr, local_vars, allow_recursion)
                return left_val[idx], should_return

-        elif expr.isdigit():
+        for m in re.finditer(rf'''(?x)
+                (?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
+                (?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
+            var = m.group('var1') or m.group('var2')
+            start, end = m.span()
+            sign = m.group('pre_sign') or m.group('post_sign')
+            ret = local_vars[var]
+            local_vars[var] += 1 if sign[0] == '+' else -1
+            if m.group('pre_sign'):
+                ret = local_vars[var]
+            expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
+
+        if not expr:
+            return None, should_return
+
+        m = re.match(fr'''(?x)
+            (?P<return>
+                (?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
+            )|(?P<attribute>
+                (?P<var>{_NAME_RE})(?:
+                    (?P<nullish>\?)?\.(?P<member>[^(]+)|
+                    \[(?P<member2>{_NESTED_BRACKETS})\]
+                )\s*
+            )|(?P<indexing>
+                (?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
+            )|(?P<function>
+                (?P<fname>{_NAME_RE})\((?P<args>.*)\)$
+            )''', expr)
+        if expr.isdigit():
            return int(expr), should_return

        elif expr == 'break':
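
The two hunks above move increment/decrement substitution so that it runs after assignment matching. A sketch of the JS semantics this should produce, via the interpreter's public interface (assuming yt_dlp.jsinterp is importable):

from yt_dlp.jsinterp import JSInterpreter

jsi = JSInterpreter('function f() { var a = 5; return a++ + ++a; }')
assert jsi.call_function('f') == 12  # post-increment yields 5 (a becomes 6); pre-increment yields 7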
@@ -6,7 +6,8 @@
import re
import urllib.parse

-from ._helper import InstanceStoreMixin, select_proxy
+from ._helper import InstanceStoreMixin
+from ..utils.networking import select_proxy
from .common import (
    Features,
    Request,
@@ -13,7 +13,6 @@
from .exceptions import RequestError
from ..dependencies import certifi
from ..socks import ProxyType, sockssocket
-from ..utils import format_field, traverse_obj

if typing.TYPE_CHECKING:
    from collections.abc import Iterable
@@ -82,19 +81,6 @@ def unquote_if_non_empty(s):
}


-def select_proxy(url, proxies):
-    """Unified proxy selector for all backends"""
-    url_components = urllib.parse.urlparse(url)
-    if 'no' in proxies:
-        hostport = url_components.hostname + format_field(url_components.port, None, ':%s')
-        if urllib.request.proxy_bypass_environment(hostport, {'no': proxies['no']}):
-            return
-        elif urllib.request.proxy_bypass(hostport):  # check system settings
-            return
-
-    return traverse_obj(proxies, url_components.scheme or 'http', 'all')
-
-
def get_redirect_method(method, status):
    """Unified redirect method handling"""
@@ -10,7 +10,7 @@

from ..dependencies import brotli, requests, urllib3
from ..utils import bug_reports_message, int_or_none, variadic
-from ..utils.networking import normalize_url
+from ..utils.networking import normalize_url, select_proxy

if requests is None:
    raise ImportError('requests module is not installed')
@@ -41,7 +41,6 @@
    create_socks_proxy_socket,
    get_redirect_method,
    make_socks_proxy_opts,
-    select_proxy,
)
from .common import (
    Features,
@@ -26,7 +26,6 @@
    create_socks_proxy_socket,
    get_redirect_method,
    make_socks_proxy_opts,
-    select_proxy,
)
from .common import Features, RequestHandler, Response, register_rh
from .exceptions import (
@@ -41,7 +40,7 @@
from ..dependencies import brotli
from ..socks import ProxyError as SocksProxyError
from ..utils import update_url_query
-from ..utils.networking import normalize_url
+from ..utils.networking import normalize_url, select_proxy

SUPPORTED_ENCODINGS = ['gzip', 'deflate']
CONTENT_DECODE_ERRORS = [zlib.error, OSError]
@@ -11,8 +11,8 @@
    create_connection,
    create_socks_proxy_socket,
    make_socks_proxy_opts,
-    select_proxy,
)
+from ..utils.networking import select_proxy
from .common import Features, Response, register_rh
from .exceptions import (
    CertificateVerifyError,
@@ -230,6 +230,9 @@ def format_option_help(self, formatter=None):
        formatter.indent()
        heading = formatter.format_heading('Preset Aliases')
        formatter.indent()
+        description = formatter.format_description(
+            'Predefined aliases for convenience and ease of use. Note that future versions of yt-dlp '
+            'may add or adjust presets, but the existing preset names will not be changed or removed')
        result = []
        for name, args in _PRESET_ALIASES.items():
            option = optparse.Option('-t', help=shlex.join(args))
@@ -238,7 +241,7 @@ def format_option_help(self, formatter=None):
        formatter.dedent()
        formatter.dedent()
        help_lines = '\n'.join(result)
-        return f'{formatted_help}\n{heading}{help_lines}'
+        return f'{formatted_help}\n{heading}{description}\n{help_lines}'


def create_parser():
@@ -470,7 +473,7 @@ def _preset_alias_callback(option, opt_str, value, parser):
    general.add_option(
        '--live-from-start',
        action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently only supported for YouTube (Experimental)')
+        help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
    general.add_option(
        '--no-live-from-start',
        action='store_false', dest='live_from_start',
@@ -545,9 +548,9 @@ def _preset_alias_callback(option, opt_str, value, parser):
        help=(
            'Create aliases for an option string. Unless an alias starts with a dash "-", it is prefixed with "--". '
            'Arguments are parsed according to the Python string formatting mini-language. '
-            'E.g. --alias get-audio,-X "-S=aext:{0},abr -x --audio-format {0}" creates options '
+            'E.g. --alias get-audio,-X "-S aext:{0},abr -x --audio-format {0}" creates options '
            '"--get-audio" and "-X" that takes an argument (ARG0) and expands to '
-            '"-S=aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
+            '"-S aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
            'Alias options can trigger more aliases; so be careful to avoid defining recursive options. '
            f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. '
            'This option can be used multiple times'))
@@ -10,7 +10,8 @@
if typing.TYPE_CHECKING:
    T = typing.TypeVar('T')

-from ._utils import NO_DEFAULT, remove_start
+from ._utils import NO_DEFAULT, remove_start, format_field
from .traversal import traverse_obj


def random_user_agent():
@@ -278,3 +279,16 @@ def normalize_url(url):
        query=escape_rfc3986(url_parsed.query),
        fragment=escape_rfc3986(url_parsed.fragment),
    ).geturl()
+
+
+def select_proxy(url, proxies):
+    """Unified proxy selector for all backends"""
+    url_components = urllib.parse.urlparse(url)
+    if 'no' in proxies:
+        hostport = url_components.hostname + format_field(url_components.port, None, ':%s')
+        if urllib.request.proxy_bypass_environment(hostport, {'no': proxies['no']}):
+            return
+        elif urllib.request.proxy_bypass(hostport):  # check system settings
+            return
+
+    return traverse_obj(proxies, url_components.scheme or 'http', 'all')
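
A rough usage illustration of the relocated helper (addresses hypothetical):

proxies = {'http': 'http://127.0.0.1:8080', 'all': 'socks5://127.0.0.1:1080', 'no': 'localhost'}
select_proxy('http://example.com/', proxies)     # -> 'http://127.0.0.1:8080' (scheme-specific wins)
select_proxy('https://example.com/', proxies)    # -> 'socks5://127.0.0.1:1080' (falls back to 'all')
select_proxy('http://localhost:8000/', proxies)  # -> None ('no' bypass matched)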

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py

-__version__ = '2025.04.30'
+__version__ = '2025.05.22'

-RELEASE_GIT_HEAD = '505b400795af557bdcfd9d4fa7e9133b26ef431c'
+RELEASE_GIT_HEAD = '7977b329ed97b216e37bd402f4935f28c00eac9e'

VARIANT = None

@@ -12,4 +12,4 @@

ORIGIN = 'yt-dlp/yt-dlp'

-_pkg_version = '2025.04.30'
+_pkg_version = '2025.05.22'