mirror of https://github.com/yt-dlp/yt-dlp.git synced 2025-07-09 14:58:32 +00:00

Merge branch 'master' of github-cb:c-basalt/yt-dlp into jsi

c-basalt 2025-04-13 12:46:05 -04:00
commit 191740c90c
105 changed files with 7822 additions and 5741 deletions


@ -742,3 +742,21 @@ lfavole
mp3butcher
slipinthedove
YoshiTabletopGamer
Arc8ne
benfaerber
chrisellsworth
fries1234
Kenshin9977
MichaelDeBoey
msikma
pedro
pferreir
red-acid
refack
rysson
somini
thedenv
vallovic
arabcoders
mireq
mlabeeb03


@ -4,6 +4,142 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.03.31
#### Core changes
- [Add `--compat-options 2024`](https://github.com/yt-dlp/yt-dlp/commit/22e34adbd741e1c7072015debd615dc3fb71c401) ([#12789](https://github.com/yt-dlp/yt-dlp/issues/12789)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **francaisfacile**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bb321cfdc3fd4400598ddb12a15862bc2ac8fc10) ([#12787](https://github.com/yt-dlp/yt-dlp/issues/12787)) by [mlabeeb03](https://github.com/mlabeeb03)
- **generic**: [Validate response before checking m3u8 live status](https://github.com/yt-dlp/yt-dlp/commit/9a1ec1d36e172d252714cef712a6d091e0a0c4f2) ([#12784](https://github.com/yt-dlp/yt-dlp/issues/12784)) by [bashonly](https://github.com/bashonly)
- **microsoftlearnepisode**: [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/d63696f23a341ee36a3237ccb5d5e14b34c2c579) ([#12799](https://github.com/yt-dlp/yt-dlp/issues/12799)) by [bashonly](https://github.com/bashonly)
- **mlbtv**: [Fix radio-only extraction](https://github.com/yt-dlp/yt-dlp/commit/f033d86b96b36f8c5289dd7c3304f42d4d9f6ff4) ([#12792](https://github.com/yt-dlp/yt-dlp/issues/12792)) by [bashonly](https://github.com/bashonly)
- **on24**: [Support `mainEvent` URLs](https://github.com/yt-dlp/yt-dlp/commit/e465b078ead75472fcb7b86f6ccaf2b5d3bc4c21) ([#12800](https://github.com/yt-dlp/yt-dlp/issues/12800)) by [bashonly](https://github.com/bashonly)
- **sbs**: [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/29560359120f28adaaac67c86fa8442eb72daa0d) ([#12785](https://github.com/yt-dlp/yt-dlp/issues/12785)) by [bashonly](https://github.com/bashonly)
- **stvr**: [Rename extractor from RTVS to STVR](https://github.com/yt-dlp/yt-dlp/commit/5fc521cbd0ce7b2410d0935369558838728e205d) ([#12788](https://github.com/yt-dlp/yt-dlp/issues/12788)) by [mireq](https://github.com/mireq)
- **twitch**: clips: [Extract portrait formats](https://github.com/yt-dlp/yt-dlp/commit/61046c31612b30c749cbdae934b7fe26abe659d7) ([#12763](https://github.com/yt-dlp/yt-dlp/issues/12763)) by [DmitryScaletta](https://github.com/DmitryScaletta)
- **youtube**
    - [Add `player_js_variant` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/07f04005e40ebdb368920c511e36e98af0077ed3) ([#12767](https://github.com/yt-dlp/yt-dlp/issues/12767)) by [bashonly](https://github.com/bashonly)
    - tab: [Fix playlist continuation extraction](https://github.com/yt-dlp/yt-dlp/commit/6a6d97b2cbc78f818de05cc96edcdcfd52caa259) ([#12777](https://github.com/yt-dlp/yt-dlp/issues/12777)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **cleanup**: Miscellaneous: [5e457af](https://github.com/yt-dlp/yt-dlp/commit/5e457af57fae9645b1b8fa0ed689229c8fb9656b) by [bashonly](https://github.com/bashonly)
### 2025.03.27
#### Core changes
- **jsinterp**: [Fix nested attributes and object extraction](https://github.com/yt-dlp/yt-dlp/commit/a8b9ff3c2a0ae25735e580173becc78545b92572) ([#12760](https://github.com/yt-dlp/yt-dlp/issues/12760)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
#### Extractor changes
- **youtube**: [Make signature and nsig extraction more robust](https://github.com/yt-dlp/yt-dlp/commit/48be862b32648bff5b3e553e40fca4dcc6e88b28) ([#12761](https://github.com/yt-dlp/yt-dlp/issues/12761)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2025.03.26
#### Extractor changes
- **youtube**
    - [Fix signature and nsig extraction for player `4fcd6e4a`](https://github.com/yt-dlp/yt-dlp/commit/a550dfc904a02843a26369ae50dbb7c0febfb30e) ([#12748](https://github.com/yt-dlp/yt-dlp/issues/12748)) by [seproDev](https://github.com/seproDev)
    - [Only cache nsig code on successful decoding](https://github.com/yt-dlp/yt-dlp/commit/ecee97b4fa90d51c48f9154c3a6d5a8ffe46cd5c) ([#12750](https://github.com/yt-dlp/yt-dlp/issues/12750)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2025.03.25
#### Core changes
- [Fix attribute error on failed VT init](https://github.com/yt-dlp/yt-dlp/commit/b872ffec50fd50f790a5a490e006a369a28a3df3) ([#12696](https://github.com/yt-dlp/yt-dlp/issues/12696)) by [Grub4K](https://github.com/Grub4K)
- **utils**: `js_to_json`: [Make function less fatal](https://github.com/yt-dlp/yt-dlp/commit/9491b44032b330e05bd5eaa546187005d1e8538e) ([#12715](https://github.com/yt-dlp/yt-dlp/issues/12715)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- [Fix sorting of HLS audio formats by `GROUP-ID`](https://github.com/yt-dlp/yt-dlp/commit/86ab79e1a5182092321102adf6ca34195803b878) ([#12714](https://github.com/yt-dlp/yt-dlp/issues/12714)) by [bashonly](https://github.com/bashonly)
- **17live**: vod: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3396eb50dcd245b49c0f4aecd6e80ec914095d16) ([#12723](https://github.com/yt-dlp/yt-dlp/issues/12723)) by [subrat-lima](https://github.com/subrat-lima)
- **9now.com.au**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/9d5e6de2e7a47226d1f72c713ad45c88ba01db68) ([#12702](https://github.com/yt-dlp/yt-dlp/issues/12702)) by [bashonly](https://github.com/bashonly)
- **chzzk**: video: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/e2dfccaf808b406d5bcb7dd04ae9ce420752dd6f) ([#12692](https://github.com/yt-dlp/yt-dlp/issues/12692)) by [bashonly](https://github.com/bashonly), [dirkf](https://github.com/dirkf)
- **deezer**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/be5af3f9e91747768c2b41157851bfbe14c663f7) ([#12704](https://github.com/yt-dlp/yt-dlp/issues/12704)) by [seproDev](https://github.com/seproDev)
- **generic**: [Fix MPD base URL parsing](https://github.com/yt-dlp/yt-dlp/commit/5086d4aed6aeb3908c62f49e2d8f74cc0cb05110) ([#12718](https://github.com/yt-dlp/yt-dlp/issues/12718)) by [fireattack](https://github.com/fireattack)
- **streaks**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/801afeac91f97dc0b58cd39cc7e8c50f619dc4e1) ([#12679](https://github.com/yt-dlp/yt-dlp/issues/12679)) by [doe1080](https://github.com/doe1080)
- **tver**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/66e0bab814e4a52ef3e12d81123ad992a29df50e) ([#12659](https://github.com/yt-dlp/yt-dlp/issues/12659)) by [arabcoders](https://github.com/arabcoders), [bashonly](https://github.com/bashonly)
- **viki**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/fe4f14b8369038e7c58f7de546d76de1ce3a91ce) ([#12703](https://github.com/yt-dlp/yt-dlp/issues/12703)) by [seproDev](https://github.com/seproDev)
- **vrsquare**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/b7fbb5a0a16a8e8d3e29c29e26ebed677d0d6ea3) ([#12515](https://github.com/yt-dlp/yt-dlp/issues/12515)) by [doe1080](https://github.com/doe1080)
- **youtube**
    - [Fix PhantomJS nsig fallback](https://github.com/yt-dlp/yt-dlp/commit/4054a2b623bd1e277b49d2e9abc3d112a4b1c7be) ([#12728](https://github.com/yt-dlp/yt-dlp/issues/12728)) by [bashonly](https://github.com/bashonly)
    - [Fix signature and nsig extraction for player `363db69b`](https://github.com/yt-dlp/yt-dlp/commit/b9c979461b244713bf42691a5bc02834e2ba4b2c) ([#12725](https://github.com/yt-dlp/yt-dlp/issues/12725)) by [bashonly](https://github.com/bashonly)
#### Networking changes
- **Request Handler**: curl_cffi: [Support `curl_cffi` 0.10.x](https://github.com/yt-dlp/yt-dlp/commit/9bf23902ceb948b9685ce1dab575491571720fc6) ([#12670](https://github.com/yt-dlp/yt-dlp/issues/12670)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [9dde546](https://github.com/yt-dlp/yt-dlp/commit/9dde546e7ee3e1515d88ee3af08b099351455dc0) by [seproDev](https://github.com/seproDev)
### 2025.03.21
#### Core changes
- [Fix external downloader availability when using `--ffmpeg-location`](https://github.com/yt-dlp/yt-dlp/commit/9f77e04c76e36e1cbbf49bc9eb385fa6ef804b67) ([#12318](https://github.com/yt-dlp/yt-dlp/issues/12318)) by [Kenshin9977](https://github.com/Kenshin9977)
- [Load plugins on demand](https://github.com/yt-dlp/yt-dlp/commit/4445f37a7a66b248dbd8376c43137e6e441f138e) ([#11305](https://github.com/yt-dlp/yt-dlp/issues/11305)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan) (With fixes in [c034d65](https://github.com/yt-dlp/yt-dlp/commit/c034d655487be668222ef9476a16f374584e49a7))
- [Support emitting ConEmu progress codes](https://github.com/yt-dlp/yt-dlp/commit/f7a1f2d8132967a62b0f6d5665c6d2dde2d42c09) ([#10649](https://github.com/yt-dlp/yt-dlp/issues/10649)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **azmedien**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/26a502fc727d0e91b2db6bf4a112823bcc672e85) ([#12375](https://github.com/yt-dlp/yt-dlp/issues/12375)) by [goggle](https://github.com/goggle)
- **bilibiliplaylist**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f5fb2229e66cf59d5bf16065bc041b42a28354a0) ([#12690](https://github.com/yt-dlp/yt-dlp/issues/12690)) by [bashonly](https://github.com/bashonly)
- **bunnycdn**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3a1583ca75fb523cbad0e5e174387ea7b477d175) ([#11586](https://github.com/yt-dlp/yt-dlp/issues/11586)) by [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **canalsurmas**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/01a8be4c23f186329d85f9c78db34a55f3294ac5) ([#12497](https://github.com/yt-dlp/yt-dlp/issues/12497)) by [Arc8ne](https://github.com/Arc8ne)
- **cda**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/be0d819e1103195043f6743650781f0d4d343f6d) ([#12552](https://github.com/yt-dlp/yt-dlp/issues/12552)) by [rysson](https://github.com/rysson)
- **cultureunplugged**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/3042afb5fe342d3a00de76704cd7de611acc350e) ([#12486](https://github.com/yt-dlp/yt-dlp/issues/12486)) by [seproDev](https://github.com/seproDev)
- **dailymotion**: [Improve embed detection](https://github.com/yt-dlp/yt-dlp/commit/ad60137c141efa5023fbc0ac8579eaefe8b3d8cc) ([#12464](https://github.com/yt-dlp/yt-dlp/issues/12464)) by [seproDev](https://github.com/seproDev)
- **gem.cbc.ca**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/eb1417786a3027b1e7290ec37ef6aaece50ebed0) ([#12414](https://github.com/yt-dlp/yt-dlp/issues/12414)) by [bashonly](https://github.com/bashonly)
- **globo**: [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/0e1697232fcbba7551f983fd1ba93bb445cbb08b) ([#12270](https://github.com/yt-dlp/yt-dlp/issues/12270)) by [pedro](https://github.com/pedro)
- **instagram**
    - [Add `app_id` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/a90641c8363fa0c10800b36eb6b01ee22d3a9409) ([#12359](https://github.com/yt-dlp/yt-dlp/issues/12359)) by [chrisellsworth](https://github.com/chrisellsworth)
    - [Fix extraction of older private posts](https://github.com/yt-dlp/yt-dlp/commit/a59abe0636dc49b22a67246afe35613571b86f05) ([#12451](https://github.com/yt-dlp/yt-dlp/issues/12451)) by [bashonly](https://github.com/bashonly)
    - [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/480125560a3b9972d29ae0da850aba8109e6bd41) ([#12410](https://github.com/yt-dlp/yt-dlp/issues/12410)) by [bashonly](https://github.com/bashonly)
    - story: [Support `--no-playlist`](https://github.com/yt-dlp/yt-dlp/commit/65c3c58c0a67463a150920203cec929045c95a24) ([#12397](https://github.com/yt-dlp/yt-dlp/issues/12397)) by [fireattack](https://github.com/fireattack)
- **jamendo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/89a68c4857ddbaf937ff22f12648baaf6b5af840) ([#12622](https://github.com/yt-dlp/yt-dlp/issues/12622)) by [bashonly](https://github.com/bashonly), [JChris246](https://github.com/JChris246)
- **ketnet**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/bbada3ec0779422cde34f1ce3dcf595da463b493) ([#12628](https://github.com/yt-dlp/yt-dlp/issues/12628)) by [MichaelDeBoey](https://github.com/MichaelDeBoey)
- **lbry**
    - [Make m3u8 format extraction non-fatal](https://github.com/yt-dlp/yt-dlp/commit/9807181cfbf87bfa732f415c30412bdbd77cbf81) ([#12463](https://github.com/yt-dlp/yt-dlp/issues/12463)) by [bashonly](https://github.com/bashonly)
    - [Raise appropriate error for non-media files](https://github.com/yt-dlp/yt-dlp/commit/7126b472601814b7fd8c9de02069e8fff1764891) ([#12462](https://github.com/yt-dlp/yt-dlp/issues/12462)) by [bashonly](https://github.com/bashonly)
- **loco**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/983095485c731240aae27c950cb8c24a50827b56) ([#12667](https://github.com/yt-dlp/yt-dlp/issues/12667)) by [DTrombett](https://github.com/DTrombett)
- **magellantv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/172d5fcd778bf2605db7647ebc56b29ed18d24ac) ([#12505](https://github.com/yt-dlp/yt-dlp/issues/12505)) by [seproDev](https://github.com/seproDev)
- **mitele**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7223d29569a48a35ad132a508c115973866838d3) ([#12689](https://github.com/yt-dlp/yt-dlp/issues/12689)) by [bashonly](https://github.com/bashonly)
- **msn**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/4815dac131d42c51e12c1d05232db0bbbf607329) ([#12513](https://github.com/yt-dlp/yt-dlp/issues/12513)) by [seproDev](https://github.com/seproDev), [thedenv](https://github.com/thedenv)
- **n1**: [Fix extraction of newer articles](https://github.com/yt-dlp/yt-dlp/commit/9d70abe4de401175cbbaaa36017806f16b2df9af) ([#12514](https://github.com/yt-dlp/yt-dlp/issues/12514)) by [u-spec-png](https://github.com/u-spec-png)
- **nbcstations**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ebac65aa9e0bf9a97c24d00f7977900d2577364b) ([#12534](https://github.com/yt-dlp/yt-dlp/issues/12534)) by [refack](https://github.com/refack)
- **niconico**
    - [Fix format sorting](https://github.com/yt-dlp/yt-dlp/commit/7508e34f203e97389f1d04db92140b13401dd724) ([#12442](https://github.com/yt-dlp/yt-dlp/issues/12442)) by [xpadev-net](https://github.com/xpadev-net)
    - live: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/c2e6e1d5f77f3b720a6266f2869eb750d20e5dc1) ([#12419](https://github.com/yt-dlp/yt-dlp/issues/12419)) by [bashonly](https://github.com/bashonly)
- **openrec**: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/17504f253564cfad86244de2b6346d07d2300ca5) ([#12608](https://github.com/yt-dlp/yt-dlp/issues/12608)) by [fireattack](https://github.com/fireattack)
- **pinterest**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bd0a66816934de70312eea1e71c59c13b401dc3a) ([#12538](https://github.com/yt-dlp/yt-dlp/issues/12538)) by [mikf](https://github.com/mikf)
- **playsuisse**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/6933f5670cea9c3e2fb16c1caa1eda54d13122c5) ([#12444](https://github.com/yt-dlp/yt-dlp/issues/12444)) by [bashonly](https://github.com/bashonly)
- **reddit**: [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/d9a53cc1e6fd912daf500ca4f19e9ca88994dbf9) ([#12567](https://github.com/yt-dlp/yt-dlp/issues/12567)) by [seproDev](https://github.com/seproDev)
- **rtp**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/8eb9c1bf3b9908cca22ef043602aa24fb9f352c6) ([#11638](https://github.com/yt-dlp/yt-dlp/issues/11638)) by [pferreir](https://github.com/pferreir), [red-acid](https://github.com/red-acid), [seproDev](https://github.com/seproDev), [somini](https://github.com/somini), [vallovic](https://github.com/vallovic)
- **softwhiteunderbelly**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/652827d5a076c9483c36654ad2cf3fe46219baf4) ([#12281](https://github.com/yt-dlp/yt-dlp/issues/12281)) by [benfaerber](https://github.com/benfaerber)
- **soop**: [Fix timestamp extraction](https://github.com/yt-dlp/yt-dlp/commit/8305df00012ff8138a6ff95279d06b54ac607f63) ([#12609](https://github.com/yt-dlp/yt-dlp/issues/12609)) by [msikma](https://github.com/msikma)
- **soundcloud**
    - [Extract tags](https://github.com/yt-dlp/yt-dlp/commit/9deed13d7cce6d3647379e50589c92de89227509) ([#12420](https://github.com/yt-dlp/yt-dlp/issues/12420)) by [bashonly](https://github.com/bashonly)
    - [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/6deeda5c11f34f613724fa0627879f0d607ba1b4) ([#12447](https://github.com/yt-dlp/yt-dlp/issues/12447)) by [bashonly](https://github.com/bashonly)
- **tiktok**
    - [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/99ea2978757a431eeb2a265b3395ccbe4ce202cf) ([#12445](https://github.com/yt-dlp/yt-dlp/issues/12445)) by [bashonly](https://github.com/bashonly)
    - [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/83b119dadb0f267f1fb66bf7ed74c097349de79e) ([#12566](https://github.com/yt-dlp/yt-dlp/issues/12566)) by [seproDev](https://github.com/seproDev)
- **tv8.it**: [Add live and playlist extractors](https://github.com/yt-dlp/yt-dlp/commit/2ee3a0aff9be2be3bea60640d3d8a0febaf0acb6) ([#12569](https://github.com/yt-dlp/yt-dlp/issues/12569)) by [DTrombett](https://github.com/DTrombett)
- **tvw**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/42b7440963866e31ff84a5b89030d1c596fa2e6e) ([#12271](https://github.com/yt-dlp/yt-dlp/issues/12271)) by [fries1234](https://github.com/fries1234)
- **twitter**
    - [Fix syndication token generation](https://github.com/yt-dlp/yt-dlp/commit/b8b47547049f5ebc3dd680fc7de70ed0ca9c0d70) ([#12537](https://github.com/yt-dlp/yt-dlp/issues/12537)) by [bashonly](https://github.com/bashonly)
    - [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/06f6de78db2eceeabd062ab1a3023e0ff9d4df53) ([#12560](https://github.com/yt-dlp/yt-dlp/issues/12560)) by [seproDev](https://github.com/seproDev)
- **vk**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/05c8023a27dd37c49163c0498bf98e3e3c1cb4b9) ([#12510](https://github.com/yt-dlp/yt-dlp/issues/12510)) by [seproDev](https://github.com/seproDev)
- **vrtmax**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/df9ebeec00d658693252978d1ffb885e67aa6ab6) ([#12479](https://github.com/yt-dlp/yt-dlp/issues/12479)) by [bergoid](https://github.com/bergoid), [MichaelDeBoey](https://github.com/MichaelDeBoey), [seproDev](https://github.com/seproDev)
- **weibo**: [Support playlists](https://github.com/yt-dlp/yt-dlp/commit/0bb39788626002a8a67e925580227952c563c8b9) ([#12284](https://github.com/yt-dlp/yt-dlp/issues/12284)) by [4ft35t](https://github.com/4ft35t)
- **wsj**: [Support opinion URLs and impersonation](https://github.com/yt-dlp/yt-dlp/commit/7f3006eb0c0659982bb956d71b0bc806bcb0a5f2) ([#12431](https://github.com/yt-dlp/yt-dlp/issues/12431)) by [refack](https://github.com/refack)
- **youtube**
    - [Fix nsig and signature extraction for player `643afba4`](https://github.com/yt-dlp/yt-dlp/commit/9b868518a15599f3d7ef5a1c730dda164c30da9b) ([#12684](https://github.com/yt-dlp/yt-dlp/issues/12684)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
    - [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/3380febe9984c21c79c3147c1d390a4cf339bc4c) ([#12603](https://github.com/yt-dlp/yt-dlp/issues/12603)) by [seproDev](https://github.com/seproDev)
    - [Split into package](https://github.com/yt-dlp/yt-dlp/commit/4432a9390c79253ac830702b226d2e558b636725) ([#12557](https://github.com/yt-dlp/yt-dlp/issues/12557)) by [coletdjnz](https://github.com/coletdjnz)
    - [Warn on DRM formats](https://github.com/yt-dlp/yt-dlp/commit/e67d786c7cc87bd449d22e0ddef08306891c1173) ([#12593](https://github.com/yt-dlp/yt-dlp/issues/12593)) by [coletdjnz](https://github.com/coletdjnz)
    - [Warn on missing formats due to SSAP](https://github.com/yt-dlp/yt-dlp/commit/79ec2fdff75c8c1bb89b550266849ad4dec48dd3) ([#12483](https://github.com/yt-dlp/yt-dlp/issues/12483)) by [coletdjnz](https://github.com/coletdjnz)
#### Networking changes
- [Add `keep_header_casing` extension](https://github.com/yt-dlp/yt-dlp/commit/7d18fed8f1983fe6de4ddc810dfb2761ba5744ac) ([#11652](https://github.com/yt-dlp/yt-dlp/issues/11652)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K)
- [Always add unsupported suffix on version mismatch](https://github.com/yt-dlp/yt-dlp/commit/95f8df2f796d0048119615200758199aedcd7cf4) ([#12626](https://github.com/yt-dlp/yt-dlp/issues/12626)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [f36e4b6](https://github.com/yt-dlp/yt-dlp/commit/f36e4b6e65cb8403791aae2f520697115cb88dec) by [dirkf](https://github.com/dirkf), [gamer191](https://github.com/gamer191), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **test**: [Show all differences for `expect_value` and `expect_dict`](https://github.com/yt-dlp/yt-dlp/commit/a3e0c7d3b267abdf3933b709704a28d43bb46503) ([#12334](https://github.com/yt-dlp/yt-dlp/issues/12334)) by [Grub4K](https://github.com/Grub4K)
### 2025.02.19
#### Core changes


@ -1772,7 +1772,7 @@ # EXTRACTOR ARGUMENTS
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
-* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
+* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1785,6 +1785,7 @@ #### youtube
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
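
The `player_client` selection rules described above (`default`, `all`, and the `-` exclusion prefix) can be sketched in Python. This is a minimal illustration of the documented semantics only, not yt-dlp's actual implementation; the function name `resolve_player_clients` and its `using_cookies` parameter are hypothetical, and the client lists are copied from the updated `player_client` description.

```python
# Illustrative sketch only: NOT yt-dlp's implementation.
# Client names and defaults are taken from the `player_client` description.
ALL_CLIENTS = [
    "web", "web_safari", "web_embedded", "web_music", "web_creator",
    "mweb", "ios", "android", "android_vr", "tv", "tv_embedded",
]
DEFAULT_CLIENTS = ["tv", "ios", "web"]
DEFAULT_CLIENTS_WITH_COOKIES = ["tv", "web"]


def resolve_player_clients(value, using_cookies=False):
    """Expand a value such as "default,-ios" into an ordered client list."""
    defaults = DEFAULT_CLIENTS_WITH_COOKIES if using_cookies else DEFAULT_CLIENTS
    requested, excluded = [], set()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("-"):
            excluded.add(token[1:])        # "-ios" excludes the ios client
        elif token == "default":
            requested.extend(defaults)
        elif token == "all":
            requested.extend(ALL_CLIENTS)
        elif token in ALL_CLIENTS:
            requested.append(token)
        else:
            raise ValueError(f"unknown client: {token!r}")
    # Keep the first occurrence of each client, then drop the exclusions.
    ordered = list(dict.fromkeys(requested))
    return [client for client in ordered if client not in excluded]


print(resolve_player_clients("default,-ios"))  # ['tv', 'web']
```

Under these assumptions, `youtube:player_client=default,-ios` would leave only `tv` and `web`, which matches the cookie-authenticated default described in the text.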
#### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
@ -1869,6 +1870,9 @@ #### bilibili
#### sonylivseries
* `sort_order`: Episode sort order for series extraction - one of `asc` (ascending, oldest first) or `desc` (descending, newest first). Default is `asc`
#### tver
* `backend`: Backend API to use for extraction - one of `streaks` (default) or `brightcove` (deprecated)
**Note**: These options may be changed/removed in the future without concern for backward compatibility
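
As a rough illustration of how the extractor arguments above are written on the command line, the following Python sketch builds `--extractor-args` values in the `EXTRACTOR:ARG=VAL1,VAL2` form used by the examples in this section (such as `youtube:player_client=default,-ios`). The helper name `format_extractor_args` is hypothetical, and joining several arguments of one extractor with `;` is an assumption based on yt-dlp's `--extractor-args` documentation rather than on anything shown in this diff.

```python
def format_extractor_args(ie_key, args):
    """Build an `--extractor-args` value such as "tver:backend=streaks".

    `args` maps an argument name to a single value or a list of values.
    Assumption: multiple arguments are joined with ";" and multiple
    values of one argument are joined with ",".
    """
    parts = []
    for name, values in args.items():
        if isinstance(values, str):
            values = [values]
        parts.append(f"{name}={','.join(values)}")
    return f"{ie_key}:{';'.join(parts)}"


# The new `tver` backend option from this change
print(format_extractor_args("tver", {"backend": "streaks"}))
# The `player_client` example from the youtube section
print(format_extractor_args("youtube", {"player_client": ["default", "-ios"]}))
# Two youtube arguments combined into one value
print(format_extractor_args(
    "youtube",
    {"player_skip": ["webpage", "configs"], "player_js_variant": "main"},
))
```

Each resulting string would be passed as one quoted value, for example `yt-dlp --extractor-args "tver:backend=streaks" URL`.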
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@ -2218,7 +2222,7 @@ ### Differences in default behavior
* Live chats (if available) are considered as subtitles. Use `--sub-langs all,-live_chat` to download all subtitles except live chat. You can also use `--compat-options no-live-chat` to prevent any live chat/danmaku from downloading
* YouTube channel URLs download all uploads of the channel. To download only the videos in a specific tab, pass the tab's URL. If the channel does not show the requested tab, an error will be raised. Also, `/live` URLs raise an error if there are no live videos instead of silently downloading the entire channel. You may use `--compat-options no-youtube-channel-redirect` to revert all these redirections
* Unavailable videos are also listed for YouTube playlists. Use `--compat-options no-youtube-unavailable-videos` to remove this
-* The upload dates extracted from YouTube are in UTC [when available](https://github.com/yt-dlp/yt-dlp/blob/89e4d86171c7b7c997c77d4714542e0383bf0db0/yt_dlp/extractor/youtube.py#L3898-L3900). Use `--compat-options no-youtube-prefer-utc-upload-date` to prefer the non-UTC upload date.
+* The upload dates extracted from YouTube are in UTC.
* If `ffmpeg` is used as the downloader, the downloading and merging of formats happen in a single step when possible. Use `--compat-options no-direct-merge` to revert this
* Thumbnail embedding in `mp4` is done with mutagen if possible. Use `--compat-options embed-thumbnail-atomicparsley` to force the use of AtomicParsley instead
* Some internal metadata such as filenames are removed by default from the infojson. Use `--no-clean-infojson` or `--compat-options no-clean-infojson` to revert this
@ -2237,9 +2241,10 @@ ### Differences in default behavior
* `--compat-options all`: Use all compat options (**Do NOT use this!**)
* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext,-prefer-vp9-sort`
* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext,-prefer-vp9-sort`
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization,no-youtube-prefer-utc-upload-date` * `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization`
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Same as `--compat-options prefer-vp9-sort`. Use this to enable all future compat options * `--compat-options 2023`: Same as `--compat-options 2024,prefer-vp9-sort`
* `--compat-options 2024`: Currently does nothing. Use this to enable all future compat options
The following compat options restore vulnerable behavior from before security patches:
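
The year aliases in the `--compat-options` list of this hunk form a chain: after the change, `2021` expands to `2022` plus two options, `2022` to `2023` plus four, `2023` to `2024` plus `prefer-vp9-sort`, and `2024` to nothing. A minimal sketch of that expansion follows. The alias table is transcribed from the new side of the hunk, while the `COMPAT_ALIASES` name and the `expand_compat_option` helper are hypothetical and are not yt-dlp's own code.

```python
# Alias table transcribed from the new side of the README hunk.
# The names below are illustrative; this is not yt-dlp's implementation.
COMPAT_ALIASES = {
    '2021': ['2022', 'no-certifi', 'filename-sanitization'],
    '2022': ['2023', 'playlist-match-filter', 'no-external-downloader-progress',
             'prefer-legacy-http-handler', 'manifest-filesize-approx'],
    '2023': ['2024', 'prefer-vp9-sort'],
    '2024': [],  # "Currently does nothing"
}


def expand_compat_option(name):
    """Recursively expand a year alias into the individual options it enables."""
    if name not in COMPAT_ALIASES:
        return [name]  # a plain option expands to itself
    expanded = []
    for member in COMPAT_ALIASES[name]:
        expanded.extend(expand_compat_option(member))
    return expanded


print(expand_compat_option('2023'))  # ['prefer-vp9-sort']
print(expand_compat_option('2021'))
```

Under this reading, adding a future compat option to the `2024` entry would automatically flow into every older year alias, which matches the README's note that `2024` is the alias to use for enabling all future compat options.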

View File

@ -55,8 +55,7 @@ default = [
"websockets>=13.0", "websockets>=13.0",
] ]
curl-cffi = [ curl-cffi = [
"curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'", "curl-cffi>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.11; implementation_name=='cpython'",
"curl-cffi>=0.5.10,!=0.6.*,<0.7.2; os_name!='nt' and implementation_name=='cpython'",
] ]
secretstorage = [ secretstorage = [
"cffi", "cffi",
@ -76,7 +75,7 @@ dev = [
] ]
static-analysis = [ static-analysis = [
"autopep8~=2.0", "autopep8~=2.0",
"ruff~=0.9.0", "ruff~=0.11.0",
] ]
test = [ test = [
"pytest~=8.1", "pytest~=8.1",
@ -387,7 +386,11 @@ select = [
exclude = "*/extractor/lazy_extractors.py,*venv*,*/test/testdata/sigs/player-*.js,.idea,.vscode" exclude = "*/extractor/lazy_extractors.py,*venv*,*/test/testdata/sigs/player-*.js,.idea,.vscode"
[tool.pytest.ini_options] [tool.pytest.ini_options]
addopts = "-ra -v --strict-markers" addopts = [
"-ra", # summary: all except passed
"--verbose",
"--strict-markers",
]
markers = [ markers = [
"download", "download",
] ]
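
The new `curl-cffi` requirement in this file, `>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.11`, accepts the late 0.5 releases and the 0.10 series while skipping 0.6 through 0.9. The sketch below spells out that logic for plain dotted release numbers. The function name is hypothetical, and the check is a simplified stand-in for real PEP 440 matching: it ignores pre-, post- and dev-releases.

```python
def curl_cffi_version_allowed(version):
    """Rough check of '>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.11'.

    Only handles plain dotted release numbers such as '0.10.0'.
    """
    parts = tuple(int(piece) for piece in version.split('.'))
    excluded_series = {(0, 6), (0, 7), (0, 8), (0, 9)}
    return (
        parts >= (0, 5, 10)                    # lower bound
        and parts[:2] not in excluded_series   # the four !=0.N.* clauses
        and parts < (0, 11)                    # upper bound
    )


for candidate in ['0.5.9', '0.5.10', '0.7.1', '0.10.0', '0.11.0']:
    print(candidate, curl_cffi_version_allowed(candidate))
```

For real dependency resolution, a full PEP 440 implementation should be used instead of tuple comparison.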

View File

@ -7,6 +7,7 @@ # Supported sites
- **17live** - **17live**
- **17live:clip** - **17live:clip**
- **17live:vod**
- **1News**: 1news.co.nz article videos - **1News**: 1news.co.nz article videos
- **1tv**: Первый канал - **1tv**: Первый канал
- **20min** - **20min**
@ -200,7 +201,7 @@ # Supported sites
- **blogger.com** - **blogger.com**
- **Bloomberg** - **Bloomberg**
- **Bluesky** - **Bluesky**
- **BokeCC** - **BokeCC**: CC视频
- **BongaCams** - **BongaCams**
- **Boosty** - **Boosty**
- **BostonGlobe** - **BostonGlobe**
@ -224,6 +225,7 @@ # Supported sites
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **Bundesliga** - **Bundesliga**
- **Bundestag** - **Bundestag**
- **BunnyCdn**
- **BusinessInsider** - **BusinessInsider**
- **BuzzFeed** - **BuzzFeed**
- **BYUtv**: (**Currently broken**) - **BYUtv**: (**Currently broken**)
@ -242,6 +244,7 @@ # Supported sites
- **CanalAlpha** - **CanalAlpha**
- **canalc2.tv** - **canalc2.tv**
- **Canalplus**: mycanal.fr and piwiplus.fr - **Canalplus**: mycanal.fr and piwiplus.fr
- **Canalsurmas**
- **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine") - **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
- **CartoonNetwork** - **CartoonNetwork**
- **cbc.ca** - **cbc.ca**
@ -345,8 +348,6 @@ # Supported sites
- **daystar:clip** - **daystar:clip**
- **DBTV** - **DBTV**
- **DctpTv** - **DctpTv**
- **DeezerAlbum**
- **DeezerPlaylist**
- **democracynow** - **democracynow**
- **DestinationAmerica** - **DestinationAmerica**
- **DetikEmbed** - **DetikEmbed**
@ -471,6 +472,7 @@ # Supported sites
- **FoxNewsVideo** - **FoxNewsVideo**
- **FoxSports** - **FoxSports**
- **fptplay**: fptplay.vn - **fptplay**: fptplay.vn
- **FrancaisFacile**
- **FranceCulture** - **FranceCulture**
- **FranceInter** - **FranceInter**
- **francetv** - **francetv**
@ -609,10 +611,10 @@ # Supported sites
- **Inc** - **Inc**
- **IndavideoEmbed** - **IndavideoEmbed**
- **InfoQ** - **InfoQ**
- **Instagram**: [*instagram*](## "netrc machine") - **Instagram**
- **instagram:story**: [*instagram*](## "netrc machine") - **instagram:story**
- **instagram:tag**: [*instagram*](## "netrc machine") Instagram hashtag search URLs - **instagram:tag**: Instagram hashtag search URLs
- **instagram:user**: [*instagram*](## "netrc machine") Instagram user profile (**Currently broken**) - **instagram:user**: Instagram user profile (**Currently broken**)
- **InstagramIOS**: IOS instagram:// URL - **InstagramIOS**: IOS instagram:// URL
- **Internazionale** - **Internazionale**
- **InternetVideoArchive** - **InternetVideoArchive**
@ -661,7 +663,6 @@ # Supported sites
- **KelbyOne**: (**Currently broken**) - **KelbyOne**: (**Currently broken**)
- **Kenh14Playlist** - **Kenh14Playlist**
- **Kenh14Video** - **Kenh14Video**
- **Ketnet**
- **khanacademy** - **khanacademy**
- **khanacademy:unit** - **khanacademy:unit**
- **kick:clips** - **kick:clips**
@ -733,6 +734,7 @@ # Supported sites
- **Livestreamfails** - **Livestreamfails**
- **Lnk** - **Lnk**
- **loc**: Library of Congress - **loc**: Library of Congress
- **Loco**
- **loom** - **loom**
- **loom:folder** - **loom:folder**
- **LoveHomePorn** - **LoveHomePorn**
@ -827,11 +829,11 @@ # Supported sites
- **MotherlessUploader** - **MotherlessUploader**
- **Motorsport**: motorsport.com (**Currently broken**) - **Motorsport**: motorsport.com (**Currently broken**)
- **MovieFap** - **MovieFap**
- **Moviepilot** - **moviepilot**: Moviepilot trailer
- **MoviewPlay** - **MoviewPlay**
- **Moviezine** - **Moviezine**
- **MovingImage** - **MovingImage**
- **MSN**: (**Currently broken**) - **MSN**
- **mtg**: MTG services - **mtg**: MTG services
- **mtv** - **mtv**
- **mtv.de**: (**Currently broken**) - **mtv.de**: (**Currently broken**)
@ -1250,7 +1252,6 @@ # Supported sites
- **rtve.es:infantil**: RTVE infantil - **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams - **rtve.es:live**: RTVE.es live streams
- **rtve.es:television** - **rtve.es:television**
- **RTVS**
- **rtvslo.si** - **rtvslo.si**
- **rtvslo.si:show** - **rtvslo.si:show**
- **RudoVideo** - **RudoVideo**
@ -1305,8 +1306,8 @@ # Supported sites
- **sejm** - **sejm**
- **Sen** - **Sen**
- **SenalColombiaLive**: (**Currently broken**) - **SenalColombiaLive**: (**Currently broken**)
- **SenateGov** - **senate.gov**
- **SenateISVP** - **senate.gov:isvp**
- **SendtoNews**: (**Currently broken**) - **SendtoNews**: (**Currently broken**)
- **Servus** - **Servus**
- **Sexu**: (**Currently broken**) - **Sexu**: (**Currently broken**)
@ -1342,6 +1343,7 @@ # Supported sites
- **Smotrim** - **Smotrim**
- **SnapchatSpotlight** - **SnapchatSpotlight**
- **Snotr** - **Snotr**
- **SoftWhiteUnderbelly**: [*softwhiteunderbelly*](## "netrc machine")
- **Sohu** - **Sohu**
- **SohuV** - **SohuV**
- **SonyLIV**: [*sonyliv*](## "netrc machine") - **SonyLIV**: [*sonyliv*](## "netrc machine")
@ -1398,12 +1400,14 @@ # Supported sites
- **StoryFire** - **StoryFire**
- **StoryFireSeries** - **StoryFireSeries**
- **StoryFireUser** - **StoryFireUser**
- **Streaks**
- **Streamable** - **Streamable**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
- **StretchInternet** - **StretchInternet**
- **Stripchat** - **Stripchat**
- **stv:player** - **stv:player**
- **stvr**: Slovak Television and Radio (formerly RTVS)
- **Subsplash** - **Subsplash**
- **subsplash:playlist** - **subsplash:playlist**
- **Substack** - **Substack**
@ -1536,6 +1540,8 @@ # Supported sites
- **tv5unis** - **tv5unis**
- **tv5unis:video** - **tv5unis:video**
- **tv8.it** - **tv8.it**
- **tv8.it:live**: TV8 Live
- **tv8.it:playlist**: TV8 Playlist
- **TVANouvelles** - **TVANouvelles**
- **TVANouvellesArticle** - **TVANouvellesArticle**
- **tvaplus**: TVA+ - **tvaplus**: TVA+
@ -1556,6 +1562,7 @@ # Supported sites
- **tvp:vod:series** - **tvp:vod:series**
- **TVPlayer** - **TVPlayer**
- **TVPlayHome** - **TVPlayHome**
- **Tvw**
- **Tweakers** - **Tweakers**
- **TwitCasting** - **TwitCasting**
- **TwitCastingLive** - **TwitCastingLive**
@ -1637,8 +1644,6 @@ # Supported sites
- **viewlift** - **viewlift**
- **viewlift:embed** - **viewlift:embed**
- **Viidea** - **Viidea**
- **viki**: [*viki*](## "netrc machine")
- **viki:channel**: [*viki*](## "netrc machine")
- **vimeo**: [*vimeo*](## "netrc machine") - **vimeo**: [*vimeo*](## "netrc machine")
- **vimeo:album**: [*vimeo*](## "netrc machine") - **vimeo:album**: [*vimeo*](## "netrc machine")
- **vimeo:channel**: [*vimeo*](## "netrc machine") - **vimeo:channel**: [*vimeo*](## "netrc machine")
@ -1676,8 +1681,12 @@ # Supported sites
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **vqq:series** - **vqq:series**
- **vqq:video** - **vqq:video**
- **vrsquare**: VR SQUARE
- **vrsquare:channel**
- **vrsquare:search**
- **vrsquare:section**
- **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza - **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
- **VrtNU**: [*vrtnu*](## "netrc machine") VRT MAX - **vrtmax**: [*vrtnu*](## "netrc machine") VRT MAX (formerly VRT NU)
- **VTM**: (**Currently broken**) - **VTM**: (**Currently broken**)
- **VTV** - **VTV**
- **VTVGo** - **VTVGo**

View File

@ -638,6 +638,7 @@ def test_parse_m3u8_formats(self):
'img_bipbop_adv_example_fmp4', 'img_bipbop_adv_example_fmp4',
'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8', 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
[{ [{
# 60kbps (bitrate not provided in m3u8); sorted as worst because it's grouped with lowest bitrate video track
'format_id': 'aud1-English', 'format_id': 'aud1-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a1/prog_index.m3u8', 'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a1/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8', 'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
@ -645,15 +646,9 @@ def test_parse_m3u8_formats(self):
'ext': 'mp4', 'ext': 'mp4',
'protocol': 'm3u8_native', 'protocol': 'm3u8_native',
'audio_ext': 'mp4', 'audio_ext': 'mp4',
'source_preference': 0,
}, { }, {
'format_id': 'aud2-English', # 192kbps (bitrate not provided in m3u8)
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a2/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
'language': 'en',
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
}, {
'format_id': 'aud3-English', 'format_id': 'aud3-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a3/prog_index.m3u8', 'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a3/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8', 'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
@ -661,6 +656,17 @@ def test_parse_m3u8_formats(self):
'ext': 'mp4', 'ext': 'mp4',
'protocol': 'm3u8_native', 'protocol': 'm3u8_native',
'audio_ext': 'mp4', 'audio_ext': 'mp4',
'source_preference': 1,
}, {
# 384kbps (bitrate not provided in m3u8); sorted as best because it's grouped with the highest bitrate video track
'format_id': 'aud2-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a2/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
'language': 'en',
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
'source_preference': 2,
}, { }, {
'format_id': '530', 'format_id': '530',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/prog_index.m3u8', 'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/prog_index.m3u8',
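
The updated expectations in this hunk list the audio renditions worst to best as `aud1-English`, `aud3-English`, `aud2-English`, carrying `source_preference` values 0, 1 and 2. As a small illustration, sorting by that field alone reproduces the expected order. The list below keeps only two fields per entry, and the real format sorter weighs many more fields than this.

```python
# Trimmed-down copies of the three expected audio entries, deliberately shuffled.
audio_formats = [
    {'format_id': 'aud2-English', 'source_preference': 2},
    {'format_id': 'aud1-English', 'source_preference': 0},
    {'format_id': 'aud3-English', 'source_preference': 1},
]

# Ascending sort puts the least preferred rendition first and the best last.
ordered = sorted(audio_formats, key=lambda fmt: fmt['source_preference'])
print([fmt['format_id'] for fmt in ordered])
# ['aud1-English', 'aud3-English', 'aud2-English']
```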

View File

@ -331,10 +331,6 @@ def test_http_connect_auth(self, handler, ctx):
assert proxy_info['proxy'] == server_address assert proxy_info['proxy'] == server_address
assert 'Proxy-Authorization' in proxy_info['headers'] assert 'Proxy-Authorization' in proxy_info['headers']
@pytest.mark.skip_handler(
'Requests',
'bug in urllib3 causes unclosed socket: https://github.com/urllib3/urllib3/issues/3374',
)
def test_http_connect_bad_auth(self, handler, ctx): def test_http_connect_bad_auth(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address: with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh: with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh:

View File

@ -118,6 +118,7 @@ def test_assignments(self):
self._test('function f(){var x = 20; x = 30 + 1; return x;}', 31) self._test('function f(){var x = 20; x = 30 + 1; return x;}', 31)
self._test('function f(){var x = 20; x += 30 + 1; return x;}', 51) self._test('function f(){var x = 20; x += 30 + 1; return x;}', 51)
self._test('function f(){var x = 20; x -= 30 + 1; return x;}', -11) self._test('function f(){var x = 20; x -= 30 + 1; return x;}', -11)
self._test('function f(){var x = 2; var y = ["a", "b"]; y[x%y["length"]]="z"; return y}', ['z', 'b'])
@unittest.skip('Not implemented') @unittest.skip('Not implemented')
def test_comments(self): def test_comments(self):
@ -384,7 +385,7 @@ def test_negative(self):
@unittest.skip('Not implemented') @unittest.skip('Not implemented')
def test_packed(self): def test_packed(self):
jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''') jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, 
.3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME
_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|'))) self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 
28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|
Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|'))) # noqa: SIM905
def test_join(self): def test_join(self):
test_input = list('test') test_input = list('test')
@ -403,6 +404,8 @@ def test_split(self):
test_result = list('test') test_result = list('test')
tests = [ tests = [
'function f(a, b){return a.split(b)}', 'function f(a, b){return a.split(b)}',
'function f(a, b){return a["split"](b)}',
'function f(a, b){let x = ["split"]; return a[x[0]](b)}',
'function f(a, b){return String.prototype.split.call(a, b)}', 'function f(a, b){return String.prototype.split.call(a, b)}',
'function f(a, b){return String.prototype.split.apply(a, [b])}', 'function f(a, b){return String.prototype.split.apply(a, [b])}',
] ]
@ -441,6 +444,9 @@ def test_slice(self):
self._test('function f(){return "012345678".slice(-1, 1)}', '') self._test('function f(){return "012345678".slice(-1, 1)}', '')
self._test('function f(){return "012345678".slice(-3, -1)}', '67') self._test('function f(){return "012345678".slice(-3, -1)}', '67')
def test_splice(self):
self._test('function f(){var T = ["0", "1", "2"]; T["splice"](2, 1, "0")[0]; return T }', ['0', '1', '0'])
def test_js_number_to_string(self): def test_js_number_to_string(self):
for test, radix, expected in [ for test, radix, expected in [
(0, None, '0'), (0, None, '0'),
@ -462,6 +468,16 @@ def test_js_number_to_string(self):
]: ]:
assert js_number_to_string(test, radix) == expected assert js_number_to_string(test, radix) == expected
def test_extract_function(self):
jsi = JSInterpreter('function a(b) { return b + 1; }')
func = jsi.extract_function('a')
self.assertEqual(func([2]), 3)
def test_extract_function_with_global_stack(self):
jsi = JSInterpreter('function c(d) { return d + e + f + g; }')
func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
self.assertEqual(func([1]), 1111)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
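
Two of the new interpreter tests in this file rely on JavaScript semantics that are easy to misread: `T["splice"](2, 1, "0")` mutates the array in place, and `y[x%y["length"]]="z"` writes to index `2 % 2 == 0`. The sketch below mirrors both in plain Python to show why the expected results are `['0', '1', '0']` and `['z', 'b']`. The `js_splice` helper is hypothetical and is not part of `JSInterpreter`.

```python
def js_splice(items, start, delete_count, *new_items):
    """Mimic Array.prototype.splice: mutate `items`, return the removed elements."""
    length = len(items)
    # JavaScript clamps a negative start relative to the end of the array.
    start = max(length + start, 0) if start < 0 else min(start, length)
    delete_count = max(0, min(delete_count, length - start))
    removed = items[start:start + delete_count]
    items[start:start + delete_count] = new_items
    return removed


T = ['0', '1', '2']
removed = js_splice(T, 2, 1, '0')
print(T, removed)  # ['0', '1', '0'] ['2']

x, y = 2, ['a', 'b']
y[x % len(y)] = 'z'  # index 0, because 2 % 2 == 0
print(y)  # ['z', 'b']
```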

View File

@ -614,7 +614,6 @@ def test_source_address(self, handler):
rh, Request(f'http://127.0.0.1:{self.http_port}/source_address')).read().decode() rh, Request(f'http://127.0.0.1:{self.http_port}/source_address')).read().decode()
assert source_address == data assert source_address == data
# Not supported by CurlCFFI
@pytest.mark.skip_handler('CurlCFFI', 'not supported by curl-cffi') @pytest.mark.skip_handler('CurlCFFI', 'not supported by curl-cffi')
def test_gzip_trailing_garbage(self, handler): def test_gzip_trailing_garbage(self, handler):
with handler() as rh: with handler() as rh:
@ -720,6 +719,15 @@ def test_allproxy(self, handler):
rh, Request( rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', proxies={'all': 'http://10.255.255.255'})).close() f'http://127.0.0.1:{self.http_port}/headers', proxies={'all': 'http://10.255.255.255'})).close()
@pytest.mark.skip_handlers_if(lambda _, handler: handler not in ['Urllib', 'CurlCFFI'], 'handler does not support keep_header_casing')
def test_keep_header_casing(self, handler):
with handler() as rh:
res = validate_and_send(
rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', headers={'X-test-heaDer': 'test'}, extensions={'keep_header_casing': True})).read().decode()
assert 'X-test-heaDer: test' in res
@pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True) @pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
class TestClientCertificate: class TestClientCertificate:
@ -1289,6 +1297,7 @@ class HTTPSupportedRH(ValidationRH):
({'legacy_ssl': False}, False), ({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False), ({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError), ({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': True}, UnsupportedRequest),
]), ]),
('Requests', 'http', [ ('Requests', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError), ({'cookiejar': 'notacookiejar'}, AssertionError),
@ -1299,6 +1308,9 @@ class HTTPSupportedRH(ValidationRH):
({'legacy_ssl': False}, False), ({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False), ({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError), ({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': False}, False),
({'keep_header_casing': True}, False),
({'keep_header_casing': 'notabool'}, AssertionError),
]), ]),
('CurlCFFI', 'http', [ ('CurlCFFI', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError), ({'cookiejar': 'notacookiejar'}, AssertionError),
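
The new `test_keep_header_casing` test sends `X-test-heaDer` with the `keep_header_casing` extension and expects the name to reach the server unchanged, whereas header names are otherwise normalized (the `HTTPHeaderDict` test elsewhere in this commit shows `ytdl-test` becoming `Ytdl-Test`). A minimal sketch of that distinction follows. The `serialize_headers` helper is hypothetical and only illustrates the behavior the test checks, not how any request handler implements it.

```python
def serialize_headers(headers, keep_header_casing=False):
    """Render header lines, title-casing names unless the original casing is kept."""
    lines = []
    for name, value in headers.items():
        wire_name = name if keep_header_casing else name.title()
        lines.append(f'{wire_name}: {value}')
    return lines


headers = {'X-test-heaDer': 'test'}
print(serialize_headers(headers))                           # ['X-Test-Header: test']
print(serialize_headers(headers, keep_header_casing=True))  # ['X-test-heaDer: test']
```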

View File

@ -23,7 +23,6 @@
TedTalkIE, TedTalkIE,
ThePlatformFeedIE, ThePlatformFeedIE,
ThePlatformIE, ThePlatformIE,
VikiIE,
VimeoIE, VimeoIE,
WallaIE, WallaIE,
YoutubeIE, YoutubeIE,
@ -331,20 +330,6 @@ def test_subtitles_array_key(self):
self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd') self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd')
@is_download_test
@unittest.skip('IE broken - DRM only')
class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18'
IE = VikiIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'en'})
self.assertEqual(md5(subtitles['en']), '53cb083a5914b2d84ef1ab67b880d18a')
@is_download_test @is_download_test
class TestThePlatformSubtitles(BaseTestSubtitles): class TestThePlatformSubtitles(BaseTestSubtitles):
# from http://www.3playmedia.com/services-features/tools/integrations/theplatform/ # from http://www.3playmedia.com/services-features/tools/integrations/theplatform/

View File

@ -3,19 +3,20 @@
# Allow direct execution # Allow direct execution
import os import os
import sys import sys
import unittest
import unittest.mock
import warnings
import datetime as dt
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import contextlib import contextlib
import datetime as dt
import io import io
import itertools import itertools
import json import json
import pickle
import subprocess import subprocess
import unittest
import unittest.mock
import warnings
import xml.etree.ElementTree import xml.etree.ElementTree
from yt_dlp.compat import ( from yt_dlp.compat import (
@ -218,11 +219,8 @@ def test_sanitize_ids(self):
self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw') self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
self.assertEqual(sanitize_filename('N0Y__7-UOdI', is_id=True), 'N0Y__7-UOdI') self.assertEqual(sanitize_filename('N0Y__7-UOdI', is_id=True), 'N0Y__7-UOdI')
@unittest.mock.patch('sys.platform', 'win32')
def test_sanitize_path(self): def test_sanitize_path(self):
with unittest.mock.patch('sys.platform', 'win32'):
self._test_sanitize_path()
def _test_sanitize_path(self):
self.assertEqual(sanitize_path('abc'), 'abc') self.assertEqual(sanitize_path('abc'), 'abc')
self.assertEqual(sanitize_path('abc/def'), 'abc\\def') self.assertEqual(sanitize_path('abc/def'), 'abc\\def')
self.assertEqual(sanitize_path('abc\\def'), 'abc\\def') self.assertEqual(sanitize_path('abc\\def'), 'abc\\def')
@ -253,10 +251,8 @@ def _test_sanitize_path(self):
# Check with nt._path_normpath if available # Check with nt._path_normpath if available
try: try:
import nt from nt import _path_normpath as nt_path_normpath
except ImportError:
nt_path_normpath = getattr(nt, '_path_normpath', None)
except Exception:
nt_path_normpath = None nt_path_normpath = None
for test, expected in [ for test, expected in [
@ -663,6 +659,8 @@ def test_url_or_none(self):
self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de') self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de')
self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de') self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de')
self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de') self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de')
self.assertEqual(url_or_none('ws://foo.de'), 'ws://foo.de')
self.assertEqual(url_or_none('wss://foo.de'), 'wss://foo.de')
def test_parse_age_limit(self): def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None) self.assertEqual(parse_age_limit(None), None)
@ -1264,6 +1262,7 @@ def test_js_to_json_edgecases(self):
def test_js_to_json_malformed(self): def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"') self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1') self.assertEqual(js_to_json('42a-1'), '42"a"-1')
self.assertEqual(js_to_json('{a: `${e("")}`}'), '{"a": "\\"e\\"(\\"\\")"}')
def test_js_to_json_template_literal(self): def test_js_to_json_template_literal(self):
self.assertEqual(js_to_json('`Hello ${name}`', {'name': '"world"'}), '"Hello world"') self.assertEqual(js_to_json('`Hello ${name}`', {'name': '"world"'}), '"Hello world"')
@ -2087,21 +2086,26 @@ def test_http_header_dict(self):
headers = HTTPHeaderDict() headers = HTTPHeaderDict()
headers['ytdl-test'] = b'0' headers['ytdl-test'] = b'0'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '0')]) self.assertEqual(list(headers.items()), [('Ytdl-Test', '0')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '0')])
headers['ytdl-test'] = 1 headers['ytdl-test'] = 1
self.assertEqual(list(headers.items()), [('Ytdl-Test', '1')]) self.assertEqual(list(headers.items()), [('Ytdl-Test', '1')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '1')])
headers['Ytdl-test'] = '2' headers['Ytdl-test'] = '2'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '2')]) self.assertEqual(list(headers.items()), [('Ytdl-Test', '2')])
self.assertEqual(list(headers.sensitive().items()), [('Ytdl-test', '2')])
self.assertTrue('ytDl-Test' in headers) self.assertTrue('ytDl-Test' in headers)
self.assertEqual(str(headers), str(dict(headers))) self.assertEqual(str(headers), str(dict(headers)))
self.assertEqual(repr(headers), str(dict(headers))) self.assertEqual(repr(headers), str(dict(headers)))
headers.update({'X-dlp': 'data'}) headers.update({'X-dlp': 'data'})
self.assertEqual(set(headers.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data')}) self.assertEqual(set(headers.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data')})
self.assertEqual(set(headers.sensitive().items()), {('Ytdl-test', '2'), ('X-dlp', 'data')})
self.assertEqual(dict(headers), {'Ytdl-Test': '2', 'X-Dlp': 'data'}) self.assertEqual(dict(headers), {'Ytdl-Test': '2', 'X-Dlp': 'data'})
self.assertEqual(len(headers), 2) self.assertEqual(len(headers), 2)
self.assertEqual(headers.copy(), headers) self.assertEqual(headers.copy(), headers)
headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, **headers, **{'X-dlp': 'data2'}) headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, headers, **{'X-dlP': 'data2'})
self.assertEqual(set(headers2.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data2')}) self.assertEqual(set(headers2.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data2')})
self.assertEqual(set(headers2.sensitive().items()), {('Ytdl-test', '2'), ('X-dlP', 'data2')})
self.assertEqual(len(headers2), 2) self.assertEqual(len(headers2), 2)
headers2.clear() headers2.clear()
self.assertEqual(len(headers2), 0) self.assertEqual(len(headers2), 0)
@ -2109,16 +2113,23 @@ def test_http_header_dict(self):
# ensure we prefer latter headers # ensure we prefer latter headers
headers3 = HTTPHeaderDict({'Ytdl-TeSt': 1}, {'Ytdl-test': 2}) headers3 = HTTPHeaderDict({'Ytdl-TeSt': 1}, {'Ytdl-test': 2})
self.assertEqual(set(headers3.items()), {('Ytdl-Test', '2')}) self.assertEqual(set(headers3.items()), {('Ytdl-Test', '2')})
self.assertEqual(set(headers3.sensitive().items()), {('Ytdl-test', '2')})
del headers3['ytdl-tesT'] del headers3['ytdl-tesT']
self.assertEqual(dict(headers3), {}) self.assertEqual(dict(headers3), {})
headers4 = HTTPHeaderDict({'ytdl-test': 'data;'}) headers4 = HTTPHeaderDict({'ytdl-test': 'data;'})
self.assertEqual(set(headers4.items()), {('Ytdl-Test', 'data;')}) self.assertEqual(set(headers4.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers4.sensitive().items()), {('ytdl-test', 'data;')})
# common mistake: strip whitespace from values # common mistake: strip whitespace from values
# https://github.com/yt-dlp/yt-dlp/issues/8729 # https://github.com/yt-dlp/yt-dlp/issues/8729
headers5 = HTTPHeaderDict({'ytdl-test': ' data; '}) headers5 = HTTPHeaderDict({'ytdl-test': ' data; '})
self.assertEqual(set(headers5.items()), {('Ytdl-Test', 'data;')}) self.assertEqual(set(headers5.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers5.sensitive().items()), {('ytdl-test', 'data;')})
# test if picklable
headers6 = HTTPHeaderDict(a=1, b=2)
self.assertEqual(pickle.loads(pickle.dumps(headers6)), headers6)
def test_extract_basic_auth(self): def test_extract_basic_auth(self):
assert extract_basic_auth('http://:foo.bar') == ('http://:foo.bar', None) assert extract_basic_auth('http://:foo.bar') == ('http://:foo.bar', None)

@ -44,7 +44,7 @@ def websocket_handler(websocket):
return websocket.send('2') return websocket.send('2')
elif isinstance(message, str): elif isinstance(message, str):
if message == 'headers': if message == 'headers':
return websocket.send(json.dumps(dict(websocket.request.headers))) return websocket.send(json.dumps(dict(websocket.request.headers.raw_items())))
elif message == 'path': elif message == 'path':
return websocket.send(websocket.request.path) return websocket.send(websocket.request.path)
elif message == 'source_address': elif message == 'source_address':
@ -266,18 +266,18 @@ def test_cookies(self, handler):
with handler(cookiejar=cookiejar) as rh: with handler(cookiejar=cookiejar) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url)) ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers') ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp' assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close() ws.close()
with handler() as rh: with handler() as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url)) ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers') ws.send('headers')
assert 'cookie' not in json.loads(ws.recv()) assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close() ws.close()
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar})) ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
ws.send('headers') ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp' assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close() ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets') @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -287,7 +287,7 @@ def test_cookie_sync_only_cookiejar(self, handler):
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()})) ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()})) ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()}))
ws.send('headers') ws.send('headers')
assert 'cookie' not in json.loads(ws.recv()) assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close() ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets') @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -298,12 +298,12 @@ def test_cookie_sync_delete_cookie(self, handler):
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie')) ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie'))
ws = ws_validate_and_send(rh, Request(self.ws_base_url)) ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers') ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp' assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close() ws.close()
cookiejar.clear_session_cookies() cookiejar.clear_session_cookies()
ws = ws_validate_and_send(rh, Request(self.ws_base_url)) ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers') ws.send('headers')
assert 'cookie' not in json.loads(ws.recv()) assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close() ws.close()
def test_source_address(self, handler): def test_source_address(self, handler):
@ -341,6 +341,14 @@ def test_request_headers(self, handler):
assert headers['test3'] == 'test3' assert headers['test3'] == 'test3'
ws.close() ws.close()
def test_keep_header_casing(self, handler):
with handler(headers=HTTPHeaderDict({'x-TeSt1': 'test'})) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url, headers={'x-TeSt2': 'test'}, extensions={'keep_header_casing': True}))
ws.send('headers')
headers = json.loads(ws.recv())
assert 'x-TeSt1' in headers
assert 'x-TeSt2' in headers
@pytest.mark.parametrize('client_cert', ( @pytest.mark.parametrize('client_cert', (
{'client_certificate': os.path.join(MTLS_CERT_DIR, 'clientwithkey.crt')}, {'client_certificate': os.path.join(MTLS_CERT_DIR, 'clientwithkey.crt')},
{ {

@ -78,6 +78,61 @@
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA', '2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q', '0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
), ),
(
'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'AAOAOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7vgpDL0QwbdV06sCIEzpWqMGkFR20CFOS21Tp-7vj_EMu-m37KtXJoOy1',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpz2ICs6EVdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpz2ICs6EVdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'wAOAOq0QJ8ARAIgXmPlOPSBkkUs1bYFYlJCfe29xx8q7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'wAOAOq0QJ8ARAIgXmPlOPSBkkUs1bYFYlJCfe29xx8q7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/20830619/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-phone-en_US.vflset/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-tablet-en_US.vflset/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/8a8ac953/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
),
] ]
_NSIG_TESTS = [ _NSIG_TESTS = [
@ -205,6 +260,62 @@
'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js', 'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js',
'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg', 'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg',
), ),
(
'https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js',
'Sy4aDGc0VpYRR9ew_', '5UPOT1VhoZxNLQ',
),
(
'https://www.youtube.com/s/player/d50f54ef/player_ias_tce.vflset/en_US/base.js',
'Ha7507LzRmH3Utygtj', 'XFTb2HoeOE5MHg',
),
(
'https://www.youtube.com/s/player/074a8365/player_ias_tce.vflset/en_US/base.js',
'Ha7507LzRmH3Utygtj', 'ufTsrE0IVYrkl8v',
),
(
'https://www.youtube.com/s/player/643afba4/player_ias.vflset/en_US/base.js',
'N5uAlLqm0eg1GyHO', 'dCBQOejdq5s-ww',
),
(
'https://www.youtube.com/s/player/69f581a5/tv-player-ias.vflset/tv-player-ias.js',
'-qIP447rVlTTwaZjY', 'KNcGOksBAvwqQg',
),
(
'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
'ir9-V6cdbCiyKxhr', '2PL7ZDYAALMfmA',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias.vflset/en_US/base.js',
'eWYu5d5YeY_4LyEDc', 'XJQqf-N7Xra3gg',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias.vflset/en_US/base.js',
'o_L251jm8yhZkWtBW', 'lXoxI3XvToqn6A',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias_tce.vflset/en_US/base.js',
'o_L251jm8yhZkWtBW', 'lXoxI3XvToqn6A',
),
(
'https://www.youtube.com/s/player/20830619/tv-player-ias.vflset/tv-player-ias.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-phone-en_US.vflset/base.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-tablet-en_US.vflset/base.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/8a8ac953/player_ias_tce.vflset/en_US/base.js',
'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
),
(
'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
),
] ]
@ -218,6 +329,8 @@ def test_youtube_extract_player_info(self):
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'), ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-de_DE.vflset/base.js', '64dddad9'), ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-de_DE.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-tablet-en_US.vflset/base.js', '64dddad9'), ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-tablet-en_US.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js', 'e7567ecf'),
('https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js', '643afba4'),
# obsolete # obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'), ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'), ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
@ -250,46 +363,51 @@ def t_factory(name, sig_func, url_pattern):
def make_tfunc(url, sig_input, expected_sig): def make_tfunc(url, sig_input, expected_sig):
m = url_pattern.match(url) m = url_pattern.match(url)
assert m, f'{url!r} should follow URL format' assert m, f'{url!r} should follow URL format'
test_id = m.group('id') test_id = re.sub(r'[/.-]', '_', m.group('id') or m.group('compat_id'))
def test_func(self): def test_func(self):
basename = f'player-{name}-{test_id}.js' basename = f'player-{test_id}.js'
fn = os.path.join(self.TESTDATA_DIR, basename) fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn): if not os.path.exists(fn):
urllib.request.urlretrieve(url, fn) urllib.request.urlretrieve(url, fn)
with open(fn, encoding='utf-8') as testf: with open(fn, encoding='utf-8') as testf:
jscode = testf.read() jscode = testf.read()
self.assertEqual(sig_func(jscode, sig_input), expected_sig) self.assertEqual(sig_func(jscode, sig_input, url), expected_sig)
test_func.__name__ = f'test_{name}_js_{test_id}' test_func.__name__ = f'test_{name}_js_{test_id}'
setattr(TestSignature, test_func.__name__, test_func) setattr(TestSignature, test_func.__name__, test_func)
return make_tfunc return make_tfunc
def signature(jscode, sig_input): def signature(jscode, sig_input, player_url):
func = YoutubeIE(FakeYDL())._parse_sig_js(jscode) func = YoutubeIE(FakeYDL())._parse_sig_js(jscode, player_url)
src_sig = ( src_sig = (
str(string.printable[:sig_input]) str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input) if isinstance(sig_input, int) else sig_input)
return func(src_sig) return func(src_sig)
def n_sig(jscode, sig_input): def n_sig(jscode, sig_input, player_url):
ie = YoutubeIE(FakeYDL()) ie = YoutubeIE(FakeYDL())
funcname = ie._extract_n_function_name(jscode) funcname = ie._extract_n_function_name(jscode, player_url=player_url)
jsi = JSInterpreter(jscode) jsi = JSInterpreter(jscode)
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname))) func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname), jscode, player_url))
return func([sig_input]) return func([sig_input])
make_sig_test = t_factory( make_sig_test = t_factory(
'signature', signature, re.compile(r'.*(?:-|/player/)(?P<id>[a-zA-Z0-9_-]+)(?:/.+\.js|(?:/watch_as3|/html5player)?\.[a-z]+)$')) 'signature', signature,
re.compile(r'''(?x)
.+(?:
/player/(?P<id>[a-zA-Z0-9_/.-]+)|
/html5player-(?:en_US-)?(?P<compat_id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?
)\.js$'''))
for test_spec in _SIG_TESTS: for test_spec in _SIG_TESTS:
make_sig_test(*test_spec) make_sig_test(*test_spec)
make_nsig_test = t_factory( make_nsig_test = t_factory(
'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_-]+)/.+.js$')) 'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_/.-]+)\.js$'))
for test_spec in _NSIG_TESTS: for test_spec in _NSIG_TESTS:
make_nsig_test(*test_spec) make_nsig_test(*test_spec)

@ -656,19 +656,21 @@ def __init__(self, params=None, auto_init=True):
if not all_plugins_loaded.value: if not all_plugins_loaded.value:
load_all_plugins() load_all_plugins()
try:
windows_enable_vt_mode()
except Exception as e:
self.write_debug(f'Failed to enable VT mode: {e}')
stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
self._out_files = Namespace( self._out_files = Namespace(
out=stdout, out=stdout,
error=sys.stderr, error=sys.stderr,
screen=sys.stderr if self.params.get('quiet') else stdout, screen=sys.stderr if self.params.get('quiet') else stdout,
console=next(filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
) )
try:
windows_enable_vt_mode()
except Exception as e:
self.write_debug(f'Failed to enable VT mode: {e}')
# hehe "immutable" namespace
self._out_files.console = next(filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None)
if self.params.get('no_color'): if self.params.get('no_color'):
if self.params.get('color') is not None: if self.params.get('color') is not None:
self.params.setdefault('_warnings', []).append( self.params.setdefault('_warnings', []).append(
@ -4152,7 +4154,7 @@ def _get_available_impersonate_targets(self):
(target, rh.RH_NAME) (target, rh.RH_NAME)
for rh in self._request_director.handlers.values() for rh in self._request_director.handlers.values()
if isinstance(rh, ImpersonateRequestHandler) if isinstance(rh, ImpersonateRequestHandler)
for target in rh.supported_targets for target in reversed(rh.supported_targets)
] ]
def _impersonate_target_available(self, target): def _impersonate_target_available(self, target):

@ -1022,8 +1022,9 @@ def _real_main(argv=None):
# List of simplified targets we know are supported, # List of simplified targets we know are supported,
# to help users know what dependencies may be required. # to help users know what dependencies may be required.
(ImpersonateTarget('chrome'), 'curl_cffi'), (ImpersonateTarget('chrome'), 'curl_cffi'),
(ImpersonateTarget('edge'), 'curl_cffi'),
(ImpersonateTarget('safari'), 'curl_cffi'), (ImpersonateTarget('safari'), 'curl_cffi'),
(ImpersonateTarget('firefox'), 'curl_cffi>=0.10'),
(ImpersonateTarget('edge'), 'curl_cffi'),
] ]
available_targets = ydl._get_available_impersonate_targets() available_targets = ydl._get_available_impersonate_targets()
@ -1039,12 +1040,12 @@ def make_row(target, handler):
for known_target, known_handler in known_targets: for known_target, known_handler in known_targets:
if not any( if not any(
known_target in target and handler == known_handler known_target in target and known_handler.startswith(handler)
for target, handler in available_targets for target, handler in available_targets
): ):
rows.append([ rows.insert(0, [
ydl._format_out(text, ydl.Styles.SUPPRESS) ydl._format_out(text, ydl.Styles.SUPPRESS)
for text in make_row(known_target, f'{known_handler} (not available)') for text in make_row(known_target, f'{known_handler} (unavailable)')
]) ])
ydl.to_screen('[info] Available impersonate targets') ydl.to_screen('[info] Available impersonate targets')
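
Note on the hunk above: the handler comparison switches from `==` to `startswith`, which appears to let a known entry carry a version requirement such as `curl_cffi>=0.10` while still matching a handler reported as `curl_cffi`. A simplified sketch with plain strings; the real code compares `ImpersonateTarget` objects with `in`, and `is_listed_as_available` is a hypothetical name used only here:

```python
# Known targets as listed in the diff, reduced to plain strings (assumption:
# the real entries are ImpersonateTarget objects, not strings).
KNOWN_TARGETS = [
    ('chrome', 'curl_cffi'),
    ('safari', 'curl_cffi'),
    ('firefox', 'curl_cffi>=0.10'),
    ('edge', 'curl_cffi'),
]


def is_listed_as_available(known_target, known_handler, available_targets):
    """Hypothetical helper mirroring the any(...) check in the hunk."""
    return any(
        known_target == target and known_handler.startswith(handler)
        for target, handler in available_targets
    )


# Assumed example: an installed handler that reports chrome and firefox.
available = [('chrome', 'curl_cffi'), ('firefox', 'curl_cffi')]
unavailable = [
    target for target, handler in KNOWN_TARGETS
    if not is_listed_as_available(target, handler, available)
]
print(unavailable)
```

With strict equality, the `firefox` entry would never match a handler named `curl_cffi` and would always be reported as unavailable.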

@ -83,7 +83,7 @@ def aes_ecb_encrypt(data, key, iv=None):
@returns {int[]} encrypted data @returns {int[]} encrypted data
""" """
expanded_key = key_expansion(key) expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = [] encrypted_data = []
for i in range(block_count): for i in range(block_count):
@ -103,7 +103,7 @@ def aes_ecb_decrypt(data, key, iv=None):
@returns {int[]} decrypted data @returns {int[]} decrypted data
""" """
expanded_key = key_expansion(key) expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = [] encrypted_data = []
for i in range(block_count): for i in range(block_count):
@ -134,7 +134,7 @@ def aes_ctr_encrypt(data, key, iv):
@returns {int[]} encrypted data @returns {int[]} encrypted data
""" """
expanded_key = key_expansion(key) expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
counter = iter_vector(iv) counter = iter_vector(iv)
encrypted_data = [] encrypted_data = []
@ -158,7 +158,7 @@ def aes_cbc_decrypt(data, key, iv):
@returns {int[]} decrypted data @returns {int[]} decrypted data
""" """
expanded_key = key_expansion(key) expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
decrypted_data = [] decrypted_data = []
previous_cipher_block = iv previous_cipher_block = iv
@ -183,7 +183,7 @@ def aes_cbc_encrypt(data, key, iv, *, padding_mode='pkcs7'):
@returns {int[]} encrypted data @returns {int[]} encrypted data
""" """
expanded_key = key_expansion(key) expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES)) block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = [] encrypted_data = []
previous_cipher_block = iv previous_cipher_block = iv

@ -85,6 +85,7 @@ def communicate_ws(reconnect):
'quality': live_quality, 'quality': live_quality,
'protocol': 'hls+fmp4', 'protocol': 'hls+fmp4',
'latency': live_latency, 'latency': live_latency,
'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
}, },
'room': { 'room': {

@ -336,6 +336,7 @@
from .canalalpha import CanalAlphaIE from .canalalpha import CanalAlphaIE
from .canalc2 import Canalc2IE from .canalc2 import Canalc2IE
from .canalplus import CanalplusIE from .canalplus import CanalplusIE
from .canalsurmas import CanalsurmasIE
from .caracoltv import CaracolTvPlayIE from .caracoltv import CaracolTvPlayIE
from .cartoonnetwork import CartoonNetworkIE from .cartoonnetwork import CartoonNetworkIE
from .cbc import ( from .cbc import (
@ -495,10 +496,6 @@
from .daystar import DaystarClipIE from .daystar import DaystarClipIE
from .dbtv import DBTVIE from .dbtv import DBTVIE
from .dctp import DctpTvIE from .dctp import DctpTvIE
from .deezer import (
DeezerAlbumIE,
DeezerPlaylistIE,
)
from .democracynow import DemocracynowIE from .democracynow import DemocracynowIE
from .detik import DetikEmbedIE from .detik import DetikEmbedIE
from .deuxm import ( from .deuxm import (
@ -686,6 +683,7 @@
) )
from .foxsports import FoxSportsIE from .foxsports import FoxSportsIE
from .fptplay import FptplayIE from .fptplay import FptplayIE
from .francaisfacile import FrancaisFacileIE
from .franceinter import FranceInterIE from .franceinter import FranceInterIE
from .francetv import ( from .francetv import (
FranceTVIE, FranceTVIE,
@ -842,6 +840,7 @@
from .ichinanalive import ( from .ichinanalive import (
IchinanaLiveClipIE, IchinanaLiveClipIE,
IchinanaLiveIE, IchinanaLiveIE,
IchinanaLiveVODIE,
) )
from .idolplus import IdolPlusIE from .idolplus import IdolPlusIE
from .ign import ( from .ign import (
@ -904,6 +903,7 @@
IviIE, IviIE,
) )
from .ivideon import IvideonIE from .ivideon import IvideonIE
from .ivoox import IvooxIE
from .iwara import ( from .iwara import (
IwaraIE, IwaraIE,
IwaraPlaylistIE, IwaraPlaylistIE,
@ -961,7 +961,10 @@
) )
from .kicker import KickerIE from .kicker import KickerIE
from .kickstarter import KickStarterIE from .kickstarter import KickStarterIE
from .kika import KikaIE from .kika import (
KikaIE,
KikaPlaylistIE,
)
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE from .kinopoisk import KinoPoiskIE
from .kommunetv import KommunetvIE from .kommunetv import KommunetvIE
@ -1054,6 +1057,7 @@
) )
from .livestreamfails import LivestreamfailsIE from .livestreamfails import LivestreamfailsIE
from .lnk import LnkIE from .lnk import LnkIE
from .loco import LocoIE
from .loom import ( from .loom import (
LoomFolderIE, LoomFolderIE,
LoomIE, LoomIE,
@ -1061,6 +1065,7 @@
from .lovehomeporn import LoveHomePornIE from .lovehomeporn import LoveHomePornIE
from .lrt import ( from .lrt import (
LRTVODIE, LRTVODIE,
LRTRadioIE,
LRTStreamIE, LRTStreamIE,
) )
from .lsm import ( from .lsm import (
@ -1493,6 +1498,10 @@
) )
from .parler import ParlerIE from .parler import ParlerIE
from .parlview import ParlviewIE from .parlview import ParlviewIE
from .parti import (
PartiLivestreamIE,
PartiVideoIE,
)
from .patreon import ( from .patreon import (
PatreonCampaignIE, PatreonCampaignIE,
PatreonIE, PatreonIE,
@ -1739,6 +1748,7 @@
RoosterTeethSeriesIE, RoosterTeethSeriesIE,
) )
from .rottentomatoes import RottenTomatoesIE from .rottentomatoes import RottenTomatoesIE
from .roya import RoyaLiveIE
from .rozhlas import ( from .rozhlas import (
MujRozhlasIE, MujRozhlasIE,
RozhlasIE, RozhlasIE,
@ -1882,6 +1892,8 @@
SkyItVideoIE, SkyItVideoIE,
SkyItVideoLiveIE, SkyItVideoLiveIE,
TV8ItIE, TV8ItIE,
TV8ItLiveIE,
TV8ItPlaylistIE,
) )
from .skylinewebcams import SkylineWebcamsIE from .skylinewebcams import SkylineWebcamsIE
from .skynewsarabia import ( from .skynewsarabia import (
@ -1985,6 +1997,7 @@
StoryFireSeriesIE, StoryFireSeriesIE,
StoryFireUserIE, StoryFireUserIE,
) )
from .streaks import StreaksIE
from .streamable import StreamableIE from .streamable import StreamableIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
@ -2224,6 +2237,7 @@
TVPlayIE, TVPlayIE,
) )
from .tvplayer import TVPlayerIE from .tvplayer import TVPlayerIE
from .tvw import TvwIE
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE from .twentythreevideo import TwentyThreeVideoIE
@ -2347,10 +2361,6 @@
ViewLiftIE, ViewLiftIE,
) )
from .viidea import ViideaIE from .viidea import ViideaIE
from .viki import (
VikiChannelIE,
VikiIE,
)
from .vimeo import ( from .vimeo import (
VHXEmbedIE, VHXEmbedIE,
VimeoAlbumIE, VimeoAlbumIE,
@ -2395,10 +2405,15 @@
VoxMediaIE, VoxMediaIE,
VoxMediaVolumeIE, VoxMediaVolumeIE,
) )
from .vrsquare import (
VrSquareChannelIE,
VrSquareIE,
VrSquareSearchIE,
VrSquareSectionIE,
)
from .vrt import ( from .vrt import (
VRTIE, VRTIE,
DagelijkseKostIE, DagelijkseKostIE,
KetnetIE,
Radio1BeIE, Radio1BeIE,
VrtNUIE, VrtNUIE,
) )

@ -1,3 +1,4 @@
import datetime as dt
import functools import functools
from .common import InfoExtractor from .common import InfoExtractor
@ -10,7 +11,7 @@
filter_dict, filter_dict,
int_or_none, int_or_none,
orderedSet, orderedSet,
unified_timestamp, parse_iso8601,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
@ -87,9 +88,9 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'uploader_id': 'rlantnghks', 'uploader_id': 'rlantnghks',
'uploader': '페이즈으', 'uploader': '페이즈으',
'duration': 10840, 'duration': 10840,
'thumbnail': r're:https?://videoimg\.sooplive\.co/.kr/.+', 'thumbnail': r're:https?://videoimg\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'upload_date': '20230108', 'upload_date': '20230108',
'timestamp': 1673218805, 'timestamp': 1673186405,
'title': '젠지 페이즈', 'title': '젠지 페이즈',
}, },
'params': { 'params': {
@ -102,7 +103,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'id': '20170411_BE689A0E_190960999_1_2_h', 'id': '20170411_BE689A0E_190960999_1_2_h',
'ext': 'mp4', 'ext': 'mp4',
'title': '혼자사는여자집', 'title': '혼자사는여자집',
'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+', 'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'uploader': '♥이슬이', 'uploader': '♥이슬이',
'uploader_id': 'dasl8121', 'uploader_id': 'dasl8121',
'upload_date': '20170411', 'upload_date': '20170411',
@ -119,7 +120,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'id': '20180327_27901457_202289533_1', 'id': '20180327_27901457_202289533_1',
'ext': 'mp4', 'ext': 'mp4',
'title': '[생]빨개요♥ (part 1)', 'title': '[생]빨개요♥ (part 1)',
'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+', 'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'uploader': '[SA]서아', 'uploader': '[SA]서아',
'uploader_id': 'bjdyrksu', 'uploader_id': 'bjdyrksu',
'upload_date': '20180327', 'upload_date': '20180327',
@ -187,7 +188,7 @@ def _real_extract(self, url):
'formats': formats, 'formats': formats,
**traverse_obj(file_element, { **traverse_obj(file_element, {
'duration': ('duration', {int_or_none(scale=1000)}), 'duration': ('duration', {int_or_none(scale=1000)}),
'timestamp': ('file_start', {unified_timestamp}), 'timestamp': ('file_start', {parse_iso8601(delimiter=' ', timezone=dt.timedelta(hours=9))}),
}), }),
}) })
@ -370,7 +371,7 @@ def _real_extract(self, url):
'title': channel_info.get('TITLE') or station_info.get('station_title'), 'title': channel_info.get('TITLE') or station_info.get('station_title'),
'uploader': channel_info.get('BJNICK') or station_info.get('station_name'), 'uploader': channel_info.get('BJNICK') or station_info.get('station_name'),
'uploader_id': broadcaster_id, 'uploader_id': broadcaster_id,
'timestamp': unified_timestamp(station_info.get('broad_start')), 'timestamp': parse_iso8601(station_info.get('broad_start'), delimiter=' ', timezone=dt.timedelta(hours=9)),
'formats': formats, 'formats': formats,
'is_live': True, 'is_live': True,
'http_headers': {'Referer': url}, 'http_headers': {'Referer': url},
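Note on the timestamp change above: the expected value in the first test moves from 1673218805 to 1673186405, a difference of exactly 32400 seconds (nine hours). That matches the switch from `unified_timestamp` to `parse_iso8601(..., timezone=dt.timedelta(hours=9))`, which reads the API's date strings as UTC+9 instead of UTC. The following is a minimal standard-library sketch of that difference, not yt-dlp code. The helper name `to_timestamp` and the sample string are assumptions chosen to reproduce the two test values.

```python
import datetime as dt

# Korea Standard Time, the fixed UTC+9 offset used in the diff above
KST = dt.timezone(dt.timedelta(hours=9))


def to_timestamp(date_str, tz=dt.timezone.utc):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS' as local time in tz."""
    naive = dt.datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
    return int(naive.replace(tzinfo=tz).timestamp())


# Assumed sample value; the real API response is not shown in the diff
sample = '2023-01-08 23:00:05'
as_utc = to_timestamp(sample)        # old behaviour: string treated as UTC
as_kst = to_timestamp(sample, KST)   # new behaviour: string treated as UTC+9
print(as_utc - as_kst)               # offset between the two readings, in seconds
```

Both readings fall on 2023-01-08 in UTC, which is why `upload_date` stays `'20230108'` in the test.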


@ -146,7 +146,7 @@ class TokFMPodcastIE(InfoExtractor):
'url': 'https://audycje.tokfm.pl/podcast/91275,-Systemowy-rasizm-Czy-zamieszki-w-USA-po-morderstwie-w-Minneapolis-doprowadza-do-zmian-w-sluzbach-panstwowych', 'url': 'https://audycje.tokfm.pl/podcast/91275,-Systemowy-rasizm-Czy-zamieszki-w-USA-po-morderstwie-w-Minneapolis-doprowadza-do-zmian-w-sluzbach-panstwowych',
'info_dict': { 'info_dict': {
'id': '91275', 'id': '91275',
'ext': 'aac', 'ext': 'mp3',
'title': 'md5:a9b15488009065556900169fb8061cce', 'title': 'md5:a9b15488009065556900169fb8061cce',
'episode': 'md5:a9b15488009065556900169fb8061cce', 'episode': 'md5:a9b15488009065556900169fb8061cce',
'series': 'Analizy', 'series': 'Analizy',
@ -164,23 +164,20 @@ def _real_extract(self, url):
raise ExtractorError('No such podcast', expected=True) raise ExtractorError('No such podcast', expected=True)
metadata = metadata[0] metadata = metadata[0]
formats = [] mp3_url = self._download_json(
for ext in ('aac', 'mp3'): 'https://api.podcast.radioagora.pl/api4/getSongUrl',
url_data = self._download_json( media_id, 'Downloading podcast mp3 URL', query={
f'https://api.podcast.radioagora.pl/api4/getSongUrl?podcast_id={media_id}&device_id={uuid.uuid4()}&ppre=false&audio={ext}', 'podcast_id': media_id,
media_id, f'Downloading podcast {ext} URL') 'device_id': str(uuid.uuid4()),
# prevents inserting the mp3 (default) multiple times 'ppre': 'false',
if 'link_ssl' in url_data and f'.{ext}' in url_data['link_ssl']: 'audio': 'mp3',
formats.append({ })['link_ssl']
'url': url_data['link_ssl'],
'ext': ext,
'vcodec': 'none',
'acodec': ext,
})
return { return {
'id': media_id, 'id': media_id,
'formats': formats, 'url': mp3_url,
'vcodec': 'none',
'ext': 'mp3',
'title': metadata.get('podcast_name'), 'title': metadata.get('podcast_name'),
'series': metadata.get('series_name'), 'series': metadata.get('series_name'),
'episode': metadata.get('podcast_name'), 'episode': metadata.get('podcast_name'),


@ -86,7 +86,7 @@ def _parse_video(self, video_data, url=None):
'webpage_url': ( 'webpage_url': (
'id', ({value(url)}, {format_field(template='https://www.bandlab.com/post/%s')}), filter, any), 'id', ({value(url)}, {format_field(template='https://www.bandlab.com/post/%s')}), filter, any),
'url': ('video', 'url', {url_or_none}), 'url': ('video', 'url', {url_or_none}),
'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}), 'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
'description': ('caption', {str}), 'description': ('caption', {str}),
'thumbnail': ('video', 'picture', 'url', {url_or_none}), 'thumbnail': ('video', 'picture', 'url', {url_or_none}),
'view_count': ('video', 'counters', 'plays', {int_or_none}), 'view_count': ('video', 'counters', 'plays', {int_or_none}),
@ -120,7 +120,7 @@ class BandlabIE(BandlabBaseIE):
'duration': 54.629999999999995, 'duration': 54.629999999999995,
'title': 'sweet black', 'title': 'sweet black',
'upload_date': '20231210', 'upload_date': '20231210',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'genres': ['Lofi'], 'genres': ['Lofi'],
'uploader': 'ender milze', 'uploader': 'ender milze',
'comment_count': int, 'comment_count': int,
@ -142,7 +142,7 @@ class BandlabIE(BandlabBaseIE):
'duration': 54.629999999999995, 'duration': 54.629999999999995,
'title': 'sweet black', 'title': 'sweet black',
'upload_date': '20231210', 'upload_date': '20231210',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'genres': ['Lofi'], 'genres': ['Lofi'],
'uploader': 'ender milze', 'uploader': 'ender milze',
'comment_count': int, 'comment_count': int,
@ -158,7 +158,7 @@ class BandlabIE(BandlabBaseIE):
'comment_count': int, 'comment_count': int,
'genres': ['Other'], 'genres': ['Other'],
'uploader_id': 'user8353034818103753', 'uploader_id': 'user8353034818103753',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/',
'timestamp': 1709625771, 'timestamp': 1709625771,
'track': 'PodcastMaerchen4b', 'track': 'PodcastMaerchen4b',
'duration': 468.14, 'duration': 468.14,
@ -178,7 +178,7 @@ class BandlabIE(BandlabBaseIE):
'id': '110343fc-148b-ea11-96d2-0003ffd1fc09', 'id': '110343fc-148b-ea11-96d2-0003ffd1fc09',
'ext': 'm4a', 'ext': 'm4a',
'timestamp': 1588273294, 'timestamp': 1588273294,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/',
'description': 'Final Revision.', 'description': 'Final Revision.',
'title': 'Replay ( Instrumental)', 'title': 'Replay ( Instrumental)',
'uploader': 'David R Sparks', 'uploader': 'David R Sparks',
@ -200,7 +200,7 @@ class BandlabIE(BandlabBaseIE):
'id': '5cdf9036-3857-ef11-991a-6045bd36e0d9', 'id': '5cdf9036-3857-ef11-991a-6045bd36e0d9',
'ext': 'mp4', 'ext': 'mp4',
'duration': 44.705, 'duration': 44.705,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/',
'comment_count': int, 'comment_count': int,
'title': 'backing vocals', 'title': 'backing vocals',
'uploader_id': 'marliashya', 'uploader_id': 'marliashya',
@ -224,7 +224,7 @@ class BandlabIE(BandlabBaseIE):
'view_count': int, 'view_count': int,
'track': 'Positronic Meltdown', 'track': 'Positronic Meltdown',
'duration': 318.55, 'duration': 318.55,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/',
'description': 'Checkout my tracks at AOMX http://aomxsounds.com/', 'description': 'Checkout my tracks at AOMX http://aomxsounds.com/',
'uploader_id': 'microfreaks', 'uploader_id': 'microfreaks',
'title': 'Positronic Meltdown', 'title': 'Positronic Meltdown',
@ -246,7 +246,7 @@ class BandlabIE(BandlabBaseIE):
'comment_count': int, 'comment_count': int,
'uploader': 'Sorakime', 'uploader': 'Sorakime',
'uploader_id': 'sorakime', 'uploader_id': 'sorakime',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/', 'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/',
'timestamp': 1691162128, 'timestamp': 1691162128,
'upload_date': '20230804', 'upload_date': '20230804',
'media_type': 'track', 'media_type': 'track',


@ -1596,16 +1596,16 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, list_id) webpage = self._download_webpage(url, list_id)
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', list_id) initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', list_id)
if traverse_obj(initial_state, ('error', 'code', {int_or_none})) != 200: error = traverse_obj(initial_state, (('error', 'listError'), all, lambda _, v: v['code'], any))
error_code = traverse_obj(initial_state, ('error', 'trueCode', {int_or_none})) if error and error['code'] != 200:
error_message = traverse_obj(initial_state, ('error', 'message', {str_or_none})) error_code = error.get('trueCode')
if error_code == -400 and list_id == 'watchlater': if error_code == -400 and list_id == 'watchlater':
self.raise_login_required('You need to login to access your watchlater playlist') self.raise_login_required('You need to login to access your watchlater playlist')
elif error_code == -403: elif error_code == -403:
self.raise_login_required('This is a private playlist. You need to login as its owner') self.raise_login_required('This is a private playlist. You need to login as its owner')
elif error_code == 11010: elif error_code == 11010:
raise ExtractorError('Playlist is no longer available', expected=True) raise ExtractorError('Playlist is no longer available', expected=True)
raise ExtractorError(f'Could not access playlist: {error_code} {error_message}') raise ExtractorError(f'Could not access playlist: {error_code} {error.get("message")}')
query = { query = {
'ps': 20, 'ps': 20,


@ -53,7 +53,7 @@ class BlueskyIE(InfoExtractor):
'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur', 'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur', 'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$', 'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
'title': 'Bluesky now has video! Update your app to versi...', 'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
'alt_title': 'Bluesky video feature announcement', 'alt_title': 'Bluesky video feature announcement',
'description': r're:(?s)Bluesky now has video! .{239}', 'description': r're:(?s)Bluesky now has video! .{239}',
'upload_date': '20240911', 'upload_date': '20240911',
@ -172,7 +172,7 @@ class BlueskyIE(InfoExtractor):
'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur', 'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur', 'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$', 'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
'title': 'Bluesky now has video! Update your app to versi...', 'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
'alt_title': 'Bluesky video feature announcement', 'alt_title': 'Bluesky video feature announcement',
'description': r're:(?s)Bluesky now has video! .{239}', 'description': r're:(?s)Bluesky now has video! .{239}',
'upload_date': '20240911', 'upload_date': '20240911',
@ -191,7 +191,7 @@ class BlueskyIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '3l7rdfxhyds2f', 'id': '3l7rdfxhyds2f',
'ext': 'mp4', 'ext': 'mp4',
'uploader': 'cinnamon', 'uploader': 'cinnamon 🐇 🏳️‍⚧️',
'uploader_id': 'cinny.bun.how', 'uploader_id': 'cinny.bun.how',
'uploader_url': 'https://bsky.app/profile/cinny.bun.how', 'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide', 'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
@ -255,7 +255,7 @@ class BlueskyIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '3l77u64l7le2e', 'id': '3l77u64l7le2e',
'ext': 'mp4', 'ext': 'mp4',
'title': 'hearing people on twitter say that bluesky isn\'...', 'title': "hearing people on twitter say that bluesky isn't funny yet so post t...",
'like_count': int, 'like_count': int,
'uploader_id': 'thafnine.net', 'uploader_id': 'thafnine.net',
'uploader_url': 'https://bsky.app/profile/thafnine.net', 'uploader_url': 'https://bsky.app/profile/thafnine.net',
@ -387,7 +387,7 @@ def _extract_videos(self, root, video_id, embed_path='embed', record_path='recor
'age_limit': ( 'age_limit': (
'labels', ..., 'val', {lambda x: 18 if x in ('sexual', 'porn', 'graphic-media') else None}, any), 'labels', ..., 'val', {lambda x: 18 if x in ('sexual', 'porn', 'graphic-media') else None}, any),
'description': (*record_path, 'text', {str}, filter), 'description': (*record_path, 'text', {str}, filter),
'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}), 'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
}), }),
}) })
return entries return entries
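Note on the title change above: raising `left` from 50 to 72 lengthens the generated titles, and the updated test strings are consistent with a cut of `left - 3` characters followed by `...`. The sketch below reproduces both expected titles with a stand-in function. `truncate` is a hypothetical helper written for illustration, not yt-dlp's `truncate_string`, and the tail of `caption` is a placeholder because the full post text does not appear in the diff.

```python
def truncate(text, limit):
    """Hypothetical stand-in: keep at most `limit` characters, ending in '...'."""
    if text is None or len(text) <= limit:
        return text
    return text[:limit - 3] + '...'


# Prefix taken from the test titles; everything after it is a placeholder
caption = ('Bluesky now has video! Update your app to version 1.91 or refresh on '
           '<rest of the post text>')

old_title = truncate(caption, 50)
new_title = truncate(caption, 72)
print(old_title)
print(new_title)
```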


@ -24,7 +24,7 @@ def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
class BokeCCIE(BokeCCBaseIE): class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频' IE_DESC = 'CC视频'
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)' _VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{ _TESTS = [{


@ -0,0 +1,84 @@
import json
import time
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
jwt_decode_hs256,
parse_iso8601,
url_or_none,
variadic,
)
from ..utils.traversal import traverse_obj
class CanalsurmasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canalsurmas\.es/videos/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.canalsurmas.es/videos/44006-el-gran-queo-1-lora-del-rio-sevilla-20072014',
'md5': '861f86fdc1221175e15523047d0087ef',
'info_dict': {
'id': '44006',
'ext': 'mp4',
'title': 'Lora del Río (Sevilla)',
'description': 'md5:3d9ee40a9b1b26ed8259e6b71ed27b8b',
'thumbnail': 'https://cdn2.rtva.interactvty.com/content_cards/00f3e8f67b0a4f3b90a4a14618a48b0d.jpg',
'timestamp': 1648123182,
'upload_date': '20220324',
},
}]
_API_BASE = 'https://api-rtva.interactvty.com'
_access_token = None
@staticmethod
def _is_jwt_expired(token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
def _call_api(self, endpoint, video_id, fields=None):
if not self._access_token or self._is_jwt_expired(self._access_token):
self._access_token = self._download_json(
f'{self._API_BASE}/jwt/token/', None,
'Downloading access token', 'Failed to download access token',
headers={'Content-Type': 'application/json'},
data=json.dumps({
'username': 'canalsur_demo',
'password': 'dsUBXUcI',
}).encode())['access']
return self._download_json(
f'{self._API_BASE}/api/2.0/contents/{endpoint}/{video_id}/', video_id,
f'Downloading {endpoint} API JSON', f'Failed to download {endpoint} API JSON',
headers={'Authorization': f'jwtok {self._access_token}'},
query={'optional_fields': ','.join(variadic(fields))} if fields else None)
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._call_api('content', video_id, fields=[
'description', 'image', 'duration', 'created_at', 'tags',
])
stream_info = self._call_api('content_resources', video_id, 'media_url')
formats, subtitles = [], {}
for stream_url in traverse_obj(stream_info, ('results', ..., 'media_url', {url_or_none})):
if determine_ext(stream_url) == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream_url, video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({'url': stream_url})
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_info, {
'title': ('name', {str.strip}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'duration': ('duration', {float_or_none}),
'timestamp': ('created_at', {parse_iso8601}),
'tags': ('tags', ..., {str}),
}),
}
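Note on the token handling above: `_call_api` requests a fresh access token whenever none is cached or the cached one expires in under 300 seconds, as decided by `_is_jwt_expired`. The following standalone sketch shows that check using only the standard library. `decode_jwt_payload` is an assumed stand-in for yt-dlp's `jwt_decode_hs256` that reads the payload without verifying the signature, and `make_unsigned_token` is a hypothetical helper that exists only to build sample tokens.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Stand-in decoder: read the JWT payload without verifying the signature."""
    payload_b64 = token.split('.')[1]
    # JWT segments drop base64 padding; restore it before decoding
    padded = payload_b64 + '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_jwt_expired(token, margin=300):
    """Treat the token as expired once fewer than `margin` seconds remain."""
    return decode_jwt_payload(token)['exp'] - time.time() < margin


def make_unsigned_token(exp):
    """Hypothetical helper: build a sample token with a dummy signature."""
    def b64(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()
    return f"{b64({'alg': 'HS256'})}.{b64({'exp': exp})}.signature"


soon = make_unsigned_token(time.time() + 60)     # inside the 300 s margin
later = make_unsigned_token(time.time() + 3600)  # well outside the margin
print(is_jwt_expired(soon), is_jwt_expired(later))
```

The margin means a token is renewed slightly before its stated expiry rather than at the moment a request fails.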


@ -121,10 +121,7 @@ def _download_age_confirm_page(self, url, video_id, *args, **kwargs):
}, **kwargs) }, **kwargs)
def _perform_login(self, username, password): def _perform_login(self, username, password):
app_version = random.choice(( app_version = '1.2.255 build 21541'
'1.2.88 build 15306',
'1.2.174 build 18469',
))
android_version = random.randrange(8, 14) android_version = random.randrange(8, 14)
phone_model = random.choice(( phone_model = random.choice((
# x-kom.pl top selling Android smartphones, as of 2022-12-26 # x-kom.pl top selling Android smartphones, as of 2022-12-26
@ -190,7 +187,7 @@ def _api_extract(self, video_id):
meta = self._download_json( meta = self._download_json(
f'{self._BASE_API_URL}/video/{video_id}', video_id, headers=self._API_HEADERS)['video'] f'{self._BASE_API_URL}/video/{video_id}', video_id, headers=self._API_HEADERS)['video']
uploader = traverse_obj(meta, 'author', 'login') uploader = traverse_obj(meta, ('author', 'login', {str}))
formats = [{ formats = [{
'url': quality['file'], 'url': quality['file'],


@ -21,7 +21,7 @@ class CHZZKLiveIE(InfoExtractor):
'channel': '진짜도현', 'channel': '진짜도현',
'channel_id': 'c68b8ef525fb3d2fa146344d84991753', 'channel_id': 'c68b8ef525fb3d2fa146344d84991753',
'channel_is_verified': False, 'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1705510344, 'timestamp': 1705510344,
'upload_date': '20240117', 'upload_date': '20240117',
'live_status': 'is_live', 'live_status': 'is_live',
@ -98,7 +98,7 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '침착맨', 'channel': '침착맨',
'channel_id': 'bb382c2c0cc9fa7c86ab3b037fb5799c', 'channel_id': 'bb382c2c0cc9fa7c86ab3b037fb5799c',
'channel_is_verified': False, 'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 15577, 'duration': 15577,
'timestamp': 1702970505.417, 'timestamp': 1702970505.417,
'upload_date': '20231219', 'upload_date': '20231219',
@ -115,7 +115,7 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '라디유radiyu', 'channel': '라디유radiyu',
'channel_id': '68f895c59a1043bc5019b5e08c83a5c5', 'channel_id': '68f895c59a1043bc5019b5e08c83a5c5',
'channel_is_verified': False, 'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 95, 'duration': 95,
'timestamp': 1703102631.722, 'timestamp': 1703102631.722,
'upload_date': '20231220', 'upload_date': '20231220',
@ -131,12 +131,30 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '강지', 'channel': '강지',
'channel_id': 'b5ed5db484d04faf4d150aedd362f34b', 'channel_id': 'b5ed5db484d04faf4d150aedd362f34b',
'channel_is_verified': True, 'channel_is_verified': True,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 4433, 'duration': 4433,
'timestamp': 1703307460.214, 'timestamp': 1703307460.214,
'upload_date': '20231223', 'upload_date': '20231223',
'view_count': int, 'view_count': int,
}, },
}, {
# video_status == 'NONE' but is downloadable
'url': 'https://chzzk.naver.com/video/6325166',
'info_dict': {
'id': '6325166',
'ext': 'mp4',
'title': '와이프 숙제빼주기',
'channel': '이 다',
'channel_id': '0076a519f147ee9fd0959bf02f9571ca',
'channel_is_verified': False,
'view_count': int,
'duration': 28167,
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1742139216.86,
'upload_date': '20250316',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -147,11 +165,7 @@ def _real_extract(self, url):
live_status = 'was_live' if video_meta.get('liveOpenDate') else 'not_live' live_status = 'was_live' if video_meta.get('liveOpenDate') else 'not_live'
video_status = video_meta.get('vodStatus') video_status = video_meta.get('vodStatus')
if video_status == 'UPLOAD': if video_status == 'ABR_HLS':
playback = self._parse_json(video_meta['liveRewindPlaybackJson'], video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
playback['media'][0]['path'], video_id, 'mp4', m3u8_id='hls')
elif video_status == 'ABR_HLS':
formats, subtitles = self._extract_mpd_formats_and_subtitles( formats, subtitles = self._extract_mpd_formats_and_subtitles(
f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}', f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}',
video_id, query={ video_id, query={
@ -161,6 +175,13 @@ def _real_extract(self, url):
'cpl': 'en_US', 'cpl': 'en_US',
}) })
else: else:
fatal = video_status == 'UPLOAD'
playback = self._parse_json(video_meta['liveRewindPlaybackJson'], video_id, fatal=fatal)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
traverse_obj(playback, ('media', 0, 'path')), video_id, 'mp4', m3u8_id='hls', fatal=fatal)
if formats and video_status != 'UPLOAD':
self.write_debug(f'Video found with status: "{video_status}"')
elif not formats:
self.raise_no_formats( self.raise_no_formats(
f'Unknown video status detected: "{video_status}"', expected=True, video_id=video_id) f'Unknown video status detected: "{video_status}"', expected=True, video_id=video_id)
formats, subtitles = [], {} formats, subtitles = [], {}


@ -78,6 +78,7 @@
parse_iso8601, parse_iso8601,
parse_m3u8_attributes, parse_m3u8_attributes,
parse_resolution, parse_resolution,
qualities,
sanitize_url, sanitize_url,
smuggle_url, smuggle_url,
str_or_none, str_or_none,
@ -1569,6 +1570,8 @@ def _yield_json_ld(self, html, video_id, *, fatal=True, default=NO_DEFAULT):
"""Yield all json ld objects in the html""" """Yield all json ld objects in the html"""
if default is not NO_DEFAULT: if default is not NO_DEFAULT:
fatal = False fatal = False
if not fatal and not isinstance(html, str):
return
for mobj in re.finditer(JSON_LD_RE, html): for mobj in re.finditer(JSON_LD_RE, html):
json_ld_item = self._parse_json( json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal, mobj.group('json_ld'), video_id, fatal=fatal,
@ -2177,6 +2180,8 @@ def extract_media(x_media_line):
media_url = media.get('URI') media_url = media.get('URI')
if media_url: if media_url:
manifest_url = format_url(media_url) manifest_url = format_url(media_url)
is_audio = media_type == 'AUDIO'
is_alternate = media.get('DEFAULT') == 'NO' or media.get('AUTOSELECT') == 'NO'
formats.extend({ formats.extend({
'format_id': join_nonempty(m3u8_id, group_id, name, idx), 'format_id': join_nonempty(m3u8_id, group_id, name, idx),
'format_note': name, 'format_note': name,
@ -2189,7 +2194,11 @@ def extract_media(x_media_line):
'preference': preference, 'preference': preference,
'quality': quality, 'quality': quality,
'has_drm': has_drm, 'has_drm': has_drm,
'vcodec': 'none' if media_type == 'AUDIO' else None, 'vcodec': 'none' if is_audio else None,
# Alternate audio formats (e.g. audio description) should be deprioritized
'source_preference': -2 if is_audio and is_alternate else None,
# Save this to assign source_preference based on associated video stream
'_audio_group_id': group_id if is_audio and not is_alternate else None,
} for idx in _extract_m3u8_playlist_indices(manifest_url)) } for idx in _extract_m3u8_playlist_indices(manifest_url))
def build_stream_name(): def build_stream_name():
@ -2284,6 +2293,8 @@ def build_stream_name():
# ignore references to rendition groups and treat them # ignore references to rendition groups and treat them
# as complete formats. # as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none': if audio_group_id and codecs and f.get('vcodec') != 'none':
# Save this to determine quality of audio formats that only have a GROUP-ID
f['_audio_group_id'] = audio_group_id
audio_group = groups.get(audio_group_id) audio_group = groups.get(audio_group_id)
if audio_group and audio_group[0].get('URI'): if audio_group and audio_group[0].get('URI'):
# TODO: update acodec for audio only formats with # TODO: update acodec for audio only formats with
@ -2306,6 +2317,28 @@ def build_stream_name():
formats.append(http_f) formats.append(http_f)
last_stream_inf = {} last_stream_inf = {}
# Some audio-only formats only have a GROUP-ID without any other quality/bitrate/codec info
# Each audio GROUP-ID corresponds with one or more video formats' AUDIO attribute
# For sorting purposes, set source_preference based on the quality of the video formats they are grouped with
# See https://github.com/yt-dlp/yt-dlp/issues/11178
audio_groups_by_quality = orderedSet(f['_audio_group_id'] for f in sorted(
traverse_obj(formats, lambda _, v: v.get('vcodec') != 'none' and v['_audio_group_id']),
key=lambda x: (x.get('tbr') or 0, x.get('width') or 0)))
audio_quality_map = {
audio_groups_by_quality[0]: 'low',
audio_groups_by_quality[-1]: 'high',
} if len(audio_groups_by_quality) > 1 else None
audio_preference = qualities(audio_groups_by_quality)
for fmt in formats:
audio_group_id = fmt.pop('_audio_group_id', None)
if not audio_quality_map or not audio_group_id or fmt.get('vcodec') != 'none':
continue
# Use source_preference since quality and preference are set by params
fmt['source_preference'] = audio_preference(audio_group_id)
fmt['format_note'] = join_nonempty(
fmt.get('format_note'), audio_quality_map.get(audio_group_id), delim=', ')
return formats, subtitles return formats, subtitles
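Note on the audio group ranking added above: audio-only renditions that carry nothing but a GROUP-ID are ranked by the video variants that reference them. The simplified sketch below illustrates the idea with plain dictionaries. `rank_audio_groups` and the `audio_group` key are hypothetical names; the real code uses the temporary `_audio_group_id` key together with yt-dlp's `orderedSet` and `qualities` helpers, and also appends a `low`/`high` note to `format_note`, which is omitted here.

```python
def rank_audio_groups(formats):
    """Give audio-only formats a source_preference based on the video
    formats that reference their audio group (simplified sketch)."""
    video = sorted(
        (f for f in formats if f.get('vcodec') != 'none' and f.get('audio_group')),
        key=lambda f: (f.get('tbr') or 0, f.get('width') or 0))
    # Ordered, de-duplicated group IDs, from lowest to highest video quality
    ordered = list(dict.fromkeys(f['audio_group'] for f in video))
    if len(ordered) < 2:
        return formats  # nothing to rank with a single group
    rank = {group: index for index, group in enumerate(ordered)}
    for f in formats:
        if f.get('vcodec') == 'none' and f.get('audio_group') in rank:
            f['source_preference'] = rank[f['audio_group']]
    return formats


# Illustrative data only; IDs and bitrates are made up
sample_formats = rank_audio_groups([
    {'format_id': 'v-low', 'tbr': 800, 'width': 640, 'audio_group': 'aud-lo'},
    {'format_id': 'v-high', 'tbr': 5000, 'width': 1920, 'audio_group': 'aud-hi'},
    {'format_id': 'a1', 'vcodec': 'none', 'audio_group': 'aud-hi'},
    {'format_id': 'a2', 'vcodec': 'none', 'audio_group': 'aud-lo'},
])
by_id = {f['format_id']: f for f in sample_formats}
print(by_id['a1']['source_preference'], by_id['a2']['source_preference'])
```

In this sample the audio rendition attached to the higher-bitrate video variant receives the larger `source_preference`, so it sorts ahead of the other one when no other quality information is available.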
def _extract_m3u8_vod_duration( def _extract_m3u8_vod_duration(
@ -2935,8 +2968,7 @@ def location_key(location):
segment_duration = None segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info: if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale']) segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil( representation_ms_info['total_number'] = math.ceil(float_or_none(period_duration, segment_duration, default=0))
float_or_none(period_duration, segment_duration, default=0)))
representation_ms_info['fragments'] = [{ representation_ms_info['fragments'] = [{
media_location_key: media_template % { media_location_key: media_template % {
'Number': segment_number, 'Number': segment_number,


@ -5,7 +5,9 @@
int_or_none, int_or_none,
try_get, try_get,
unified_strdate, unified_strdate,
url_or_none,
) )
from ..utils.traversal import traverse_obj
class CrowdBunkerIE(InfoExtractor): class CrowdBunkerIE(InfoExtractor):
@ -44,16 +46,15 @@ def _real_extract(self, url):
'url': sub_url, 'url': sub_url,
}) })
mpd_url = try_get(video_json, lambda x: x['dashManifest']['url']) if mpd_url := traverse_obj(video_json, ('dashManifest', 'url', {url_or_none})):
if mpd_url: fmts, subs = self._extract_mpd_formats_and_subtitles(mpd_url, video_id, mpd_id='dash', fatal=False)
fmts, subs = self._extract_mpd_formats_and_subtitles(mpd_url, video_id)
formats.extend(fmts) formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs) self._merge_subtitles(subs, target=subtitles)
m3u8_url = try_get(video_json, lambda x: x['hlsManifest']['url'])
if m3u8_url: if m3u8_url := traverse_obj(video_json, ('hlsManifest', 'url', {url_or_none})):
fmts, subs = self._extract_m3u8_formats_and_subtitles(mpd_url, video_id) fmts, subs = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts) formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs) self._merge_subtitles(subs, target=subtitles)
thumbnails = [{ thumbnails = [{
'url': image['url'], 'url': image['url'],

View File

@ -1,142 +0,0 @@
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
orderedSet,
)
class DeezerBaseInfoExtractor(InfoExtractor):
def get_data(self, url):
if not self.get_param('test'):
self.report_warning('For now, this extractor only supports the 30 second previews. Patches welcome!')
mobj = self._match_valid_url(url)
data_id = mobj.group('id')
webpage = self._download_webpage(url, data_id)
geoblocking_msg = self._html_search_regex(
r'<p class="soon-txt">(.*?)</p>', webpage, 'geoblocking message',
default=None)
if geoblocking_msg is not None:
raise ExtractorError(
f'Deezer said: {geoblocking_msg}', expected=True)
data_json = self._search_regex(
(r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
webpage, 'data JSON')
data = json.loads(data_json)
return data_id, webpage, data
class DeezerPlaylistIE(DeezerBaseInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?playlist/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.deezer.com/playlist/176747451',
'info_dict': {
'id': '176747451',
'title': 'Best!',
'uploader': 'anonymous',
'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
},
'playlist_count': 29,
}
def _real_extract(self, url):
playlist_id, webpage, data = self.get_data(url)
playlist_title = data.get('DATA', {}).get('TITLE')
playlist_uploader = data.get('DATA', {}).get('PARENT_USERNAME')
playlist_thumbnail = self._search_regex(
r'<img id="naboo_playlist_image".*?src="([^"]+)"', webpage,
'playlist thumbnail')
entries = []
        for s in data.get('SONGS', {}).get('data'):
            formats = [{
                'format_id': 'preview',
                'url': s.get('MEDIA', [{}])[0].get('HREF'),
                'preference': -100,  # Only the first 30 seconds
                'ext': 'mp3',
            }]
            artists = ', '.join(
                orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
            entries.append({
                'id': s.get('SNG_ID'),
                'duration': int_or_none(s.get('DURATION')),
                'title': '{} - {}'.format(artists, s.get('SNG_TITLE')),
                'uploader': s.get('ART_NAME'),
                'uploader_id': s.get('ART_ID'),
                'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
                'formats': formats,
            })

        return {
            '_type': 'playlist',
            'id': playlist_id,
            'title': playlist_title,
            'uploader': playlist_uploader,
            'thumbnail': playlist_thumbnail,
            'entries': entries,
        }


class DeezerAlbumIE(DeezerBaseInfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?album/(?P<id>[0-9]+)'
    _TEST = {
        'url': 'https://www.deezer.com/fr/album/67505622',
        'info_dict': {
            'id': '67505622',
            'title': 'Last Week',
            'uploader': 'Home Brew',
            'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
        },
        'playlist_count': 7,
    }

    def _real_extract(self, url):
        album_id, webpage, data = self.get_data(url)

        album_title = data.get('DATA', {}).get('ALB_TITLE')
        album_uploader = data.get('DATA', {}).get('ART_NAME')
        album_thumbnail = self._search_regex(
            r'<img id="naboo_album_image".*?src="([^"]+)"', webpage,
            'album thumbnail')

        entries = []
        for s in data.get('SONGS', {}).get('data'):
            formats = [{
                'format_id': 'preview',
                'url': s.get('MEDIA', [{}])[0].get('HREF'),
                'preference': -100,  # Only the first 30 seconds
                'ext': 'mp3',
            }]
            artists = ', '.join(
                orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
            entries.append({
                'id': s.get('SNG_ID'),
                'duration': int_or_none(s.get('DURATION')),
                'title': '{} - {}'.format(artists, s.get('SNG_TITLE')),
                'uploader': s.get('ART_NAME'),
                'uploader_id': s.get('ART_ID'),
                'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
                'formats': formats,
                'track': s.get('SNG_TITLE'),
                'track_number': int_or_none(s.get('TRACK_NUMBER')),
                'track_id': s.get('SNG_ID'),
                'artist': album_uploader,
                'album': album_title,
                'album_artist': album_uploader,
            })

        return {
            '_type': 'playlist',
            'id': album_id,
            'title': album_title,
            'uploader': album_uploader,
            'thumbnail': album_thumbnail,
            'entries': entries,
        }


@ -0,0 +1,87 @@
import urllib.parse
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class FrancaisFacileIE(InfoExtractor):
    _VALID_URL = r'https?://francaisfacile\.rfi\.fr/[a-z]{2}/(?:actualit%C3%A9|podcasts/[^/#?]+)/(?P<id>[^/#?]+)'
    _TESTS = [{
        'url': 'https://francaisfacile.rfi.fr/fr/actualit%C3%A9/20250305-r%C3%A9concilier-les-jeunes-avec-la-lecture-gr%C3%A2ce-aux-r%C3%A9seaux-sociaux',
        'md5': '4f33674cb205744345cc835991100afa',
        'info_dict': {
            'id': 'WBMZ58952-FLE-FR-20250305',
            'display_id': '20250305-réconcilier-les-jeunes-avec-la-lecture-grâce-aux-réseaux-sociaux',
            'title': 'Réconcilier les jeunes avec la lecture grâce aux réseaux sociaux',
            'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/05/6b6af52a-f9ba-11ef-a1f8-005056a97652.mp3',
            'ext': 'mp3',
            'description': 'md5:b903c63d8585bd59e8cc4d5f80c4272d',
            'duration': 103.15,
            'timestamp': 1741177984,
            'upload_date': '20250305',
        },
    }, {
        'url': 'https://francaisfacile.rfi.fr/fr/actualit%C3%A9/20250307-argentine-le-sac-d-un-alpiniste-retrouv%C3%A9-40-ans-apr%C3%A8s-sa-mort',
        'md5': 'b8c3a63652d4ae8e8092dda5700c1cd9',
        'info_dict': {
            'id': 'WBMZ59102-FLE-FR-20250307',
            'display_id': '20250307-argentine-le-sac-d-un-alpiniste-retrouvé-40-ans-après-sa-mort',
            'title': 'Argentine: le sac d\'un alpiniste retrouvé 40 ans après sa mort',
            'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/07/8edf4082-fb46-11ef-8a37-005056bf762b.mp3',
            'ext': 'mp3',
            'description': 'md5:7fd088fbdf4a943bb68cf82462160dca',
            'duration': 117.74,
            'timestamp': 1741352789,
            'upload_date': '20250307',
        },
    }, {
        'url': 'https://francaisfacile.rfi.fr/fr/podcasts/un-mot-une-histoire/20250317-le-mot-de-david-foenkinos-peut-%C3%AAtre',
        'md5': 'db83c2cc2589b4c24571c6b6cf14f5f1',
        'info_dict': {
            'id': 'WBMZ59441-FLE-FR-20250317',
            'display_id': '20250317-le-mot-de-david-foenkinos-peut-être',
            'title': 'Le mot de David Foenkinos: «peut-être» - Un mot, une histoire',
            'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/17/4ca6cbbe-0315-11f0-a85b-005056a97652.mp3',
            'ext': 'mp3',
            'description': 'md5:3fe35fae035803df696bfa7af2496e49',
            'duration': 198.96,
            'timestamp': 1742210897,
            'upload_date': '20250317',
        },
    }]

    def _real_extract(self, url):
        display_id = urllib.parse.unquote(self._match_id(url))

        try:  # yt-dlp's default user-agents are too old and blocked by the site
            webpage = self._download_webpage(url, display_id, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
            })
        except ExtractorError as e:
            if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
                raise
            # Retry with impersonation if hardcoded UA is insufficient
            webpage = self._download_webpage(url, display_id, impersonate=True)

        data = self._search_json(
            r'<script[^>]+\bdata-media-id=[^>]+\btype="application/json"[^>]*>',
            webpage, 'audio data', display_id)

        return {
            'id': data['mediaId'],
            'display_id': display_id,
            'vcodec': 'none',
            'title': self._html_extract_title(webpage),
            **self._search_json_ld(webpage, display_id, fatal=False),
            **traverse_obj(data, {
                'title': ('title', {str}),
                'url': ('sources', ..., 'url', {url_or_none}, any),
                'duration': ('sources', ..., 'duration', {float_or_none}, any),
            }),
        }
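
The 403 fallback used by this extractor can be sketched outside yt-dlp as follows. This is a minimal illustration under stated assumptions, not the extractor's actual code path: `HTTPStatusError`, `fetch_with_fallback`, and `fake_fetch` are hypothetical stand-ins for yt-dlp's `HTTPError`, the two `_download_webpage` calls, and the remote site.

```python
class HTTPStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status):
        super().__init__(f'HTTP Error {status}')
        self.status = status


def fetch_with_fallback(fetch, url, primary_options, fallback_options):
    """Try the cheap request first; retry differently only on HTTP 403."""
    try:
        return fetch(url, **primary_options)
    except HTTPStatusError as e:
        if e.status != 403:
            raise  # Any other failure is not a block, so do not mask it
        return fetch(url, **fallback_options)


def fake_fetch(url, user_agent=None, impersonate=False):
    """Toy server (assumption): rejects everything except impersonated requests."""
    if not impersonate:
        raise HTTPStatusError(403)
    return f'<html>{url}</html>'


page = fetch_with_fallback(
    fake_fetch, 'https://example.com/a',
    {'user_agent': 'Mozilla/5.0 (hypothetical)'}, {'impersonate': True})
```

Restricting the retry to status 403 keeps unrelated failures (timeouts, 404s) visible instead of silently triggering the more expensive second request.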


@ -16,6 +16,7 @@
MEDIA_EXTENSIONS, MEDIA_EXTENSIONS,
ExtractorError, ExtractorError,
UnsupportedError, UnsupportedError,
base_url,
determine_ext, determine_ext,
determine_protocol, determine_protocol,
dict_get, dict_get,
@ -2213,10 +2214,21 @@ def hex_or_none(value):
if is_live is not None: if is_live is not None:
info['live_status'] = 'not_live' if is_live == 'false' else 'is_live' info['live_status'] = 'not_live' if is_live == 'false' else 'is_live'
return return
headers = m3u8_format.get('http_headers') or info.get('http_headers') headers = m3u8_format.get('http_headers') or info.get('http_headers') or {}
duration = self._extract_m3u8_vod_duration( display_id = info.get('id')
m3u8_format['url'], info.get('id'), note='Checking m3u8 live status', urlh = self._request_webpage(
errnote='Failed to download m3u8 media playlist', headers=headers) m3u8_format['url'], display_id, 'Checking m3u8 live status', errnote=False,
headers={**headers, 'Accept-Encoding': 'identity'}, fatal=False)
if urlh is False:
return
first_bytes = urlh.read(512)
if not first_bytes.startswith(b'#EXTM3U'):
return
m3u8_doc = self._webpage_read_content(
urlh, urlh.url, display_id, prefix=first_bytes, fatal=False, errnote=False)
if not m3u8_doc:
return
duration = self._parse_m3u8_vod_duration(m3u8_doc, display_id)
if not duration: if not duration:
info['live_status'] = 'is_live' info['live_status'] = 'is_live'
info['duration'] = info.get('duration') or duration info['duration'] = info.get('duration') or duration
@ -2531,7 +2543,7 @@ def _real_extract(self, url):
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles( info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles(
doc, doc,
mpd_base_url=full_response.url.rpartition('/')[0], mpd_base_url=base_url(full_response.url),
mpd_url=url) mpd_url=url)
info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None
self._extra_manifest_info(info_dict, url) self._extra_manifest_info(info_dict, url)
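
The reworked live-status check in this hunk reads the first 512 bytes and only proceeds when the response really starts with `#EXTM3U`. A rough standalone sketch of that idea follows; `parse_vod_duration` and `check_live_status` are hypothetical names, and the duration logic is a simplified assumption about what a VOD-duration parser does (sum the `#EXTINF` values of a playlist that has an end marker), not yt-dlp's implementation.

```python
import re


def parse_vod_duration(m3u8_doc):
    """Sum #EXTINF durations, but only for a finished (VOD) playlist."""
    if '#EXT-X-ENDLIST' not in m3u8_doc:
        return None  # No end marker: the playlist may still be growing
    return int(sum(
        float(d) for d in re.findall(r'#EXTINF:\s*(\d+(?:\.\d+)?)', m3u8_doc)))


def check_live_status(body, info):
    """Update info only when body really is an m3u8 playlist."""
    if not body[:512].startswith(b'#EXTM3U'):
        return info  # Error page or other payload: make no claim
    duration = parse_vod_duration(body.decode('utf-8', 'replace'))
    if not duration:
        info['live_status'] = 'is_live'
    info['duration'] = info.get('duration') or duration
    return info


# Hypothetical response bodies
vod = b'#EXTM3U\n#EXTINF:4.0,\na.ts\n#EXTINF:6.5,\nb.ts\n#EXT-X-ENDLIST\n'
live = b'#EXTM3U\n#EXTINF:4.0,\na.ts\n'
html = b'<html>403 Forbidden</html>'
```

Without the prefix check, an HTML error page would yield no duration and the stream would be mislabelled as live, which appears to be the case this change guards against.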


@ -1,19 +0,0 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
)
class GigyaBaseIE(InfoExtractor):
def _gigya_login(self, auth_data):
auth_info = self._download_json(
'https://accounts.eu1.gigya.com/accounts.login', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(auth_data))
error_message = auth_info.get('errorDetails') or auth_info.get('errorMessage')
if error_message:
raise ExtractorError(
f'Unable to login: {error_message}', expected=True)
return auth_info


@ -6,7 +6,7 @@
) )
class HSEShowBaseInfoExtractor(InfoExtractor): class HSEShowBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE'] _GEO_COUNTRIES = ['DE']
def _extract_redux_data(self, url, video_id): def _extract_redux_data(self, url, video_id):
@ -28,7 +28,7 @@ def _extract_formats_and_subtitles(self, sources, video_id):
return formats, subtitles return formats, subtitles
class HSEShowIE(HSEShowBaseInfoExtractor): class HSEShowIE(HSEShowBaseIE):
_VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/c/tv-shows/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/c/tv-shows/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.hse.de/dpl/c/tv-shows/505350', 'url': 'https://www.hse.de/dpl/c/tv-shows/505350',
@ -64,7 +64,7 @@ def _real_extract(self, url):
} }
class HSEProductIE(HSEShowBaseInfoExtractor): class HSEProductIE(HSEShowBaseIE):
_VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/p/product/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/p/product/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.hse.de/dpl/p/product/408630', 'url': 'https://www.hse.de/dpl/p/product/408630',


@ -1,5 +1,13 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError, str_or_none, traverse_obj, unified_strdate from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
traverse_obj,
unified_strdate,
url_or_none,
)
class IchinanaLiveIE(InfoExtractor): class IchinanaLiveIE(InfoExtractor):
@ -157,3 +165,51 @@ def _real_extract(self, url):
'description': view_data.get('caption'), 'description': view_data.get('caption'),
'upload_date': unified_strdate(str_or_none(view_data.get('createdAt'))), 'upload_date': unified_strdate(str_or_none(view_data.get('createdAt'))),
} }
class IchinanaLiveVODIE(InfoExtractor):
    IE_NAME = '17live:vod'
    _VALID_URL = r'https?://(?:www\.)?17\.live/ja/vod/[^/?#]+/(?P<id>[^/?#]+)'
    _TESTS = [{
        'url': 'https://17.live/ja/vod/27323042/2cf84520-e65e-4b22-891e-1d3a00b0f068',
        'md5': '3299b930d7457b069639486998a89580',
        'info_dict': {
            'id': '2cf84520-e65e-4b22-891e-1d3a00b0f068',
            'ext': 'mp4',
            'title': 'md5:b5f8cbf497d54cc6a60eb3b480182f01',
            'uploader': 'md5:29fb12122ab94b5a8495586e7c3085a5',
            'uploader_id': '27323042',
            'channel': '🌟オールナイトニッポン アーカイブ🌟',
            'channel_id': '2b4f85f1-d61e-429d-a901-68d32bdd8645',
            'like_count': int,
            'view_count': int,
            'thumbnail': r're:https?://.+/.+\.(?:jpe?g|png)',
            'duration': 549,
            'description': 'md5:116f326579700f00eaaf5581aae1192e',
            'timestamp': 1741058645,
            'upload_date': '20250304',
        },
    }, {
        'url': 'https://17.live/ja/vod/27323042/0de11bac-9bea-40b8-9eab-0239a7d88079',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        json_data = self._download_json(f'https://wap-api.17app.co/api/v1/vods/{video_id}', video_id)

        return traverse_obj(json_data, {
            'id': ('vodID', {str}),
            'title': ('title', {str}),
            'formats': ('vodURL', {lambda x: self._extract_m3u8_formats(x, video_id)}),
            'uploader': ('userInfo', 'displayName', {str}),
            'uploader_id': ('userInfo', 'roomID', {int}, {str_or_none}),
            'channel': ('userInfo', 'name', {str}),
            'channel_id': ('userInfo', 'userID', {str}),
            'like_count': ('likeCount', {int_or_none}),
            'view_count': ('viewCount', {int_or_none}),
            'thumbnail': ('imageURL', {url_or_none}),
            'duration': ('duration', {int_or_none}),
            'description': ('description', {str}),
            'timestamp': ('createdAt', {int_or_none}),
        })

yt_dlp/extractor/ivoox.py Normal file

@ -0,0 +1,78 @@
from .common import InfoExtractor
from ..utils import int_or_none, parse_iso8601, url_or_none, urljoin
from ..utils.traversal import traverse_obj
class IvooxIE(InfoExtractor):
    _VALID_URL = (
        r'https?://(?:www\.)?ivoox\.com/(?:\w{2}/)?[^/?#]+_rf_(?P<id>[0-9]+)_1\.html',
        r'https?://go\.ivoox\.com/rf/(?P<id>[0-9]+)',
    )
    _TESTS = [{
        'url': 'https://www.ivoox.com/dex-08x30-rostros-del-mal-los-asesinos-en-audios-mp3_rf_143594959_1.html',
        'md5': '993f712de5b7d552459fc66aa3726885',
        'info_dict': {
            'id': '143594959',
            'ext': 'mp3',
            'timestamp': 1742731200,
            'channel': 'DIAS EXTRAÑOS con Santiago Camacho',
            'title': 'DEx 08x30 Rostros del mal: Los asesinos en serie que aterrorizaron España',
            'description': 'md5:eae8b4b9740d0216d3871390b056bb08',
            'uploader': 'Santiago Camacho',
            'thumbnail': 'https://static-1.ivoox.com/audios/c/d/5/2/cd52f46783fe735000c33a803dce2554_XXL.jpg',
            'upload_date': '20250323',
            'episode': 'DEx 08x30 Rostros del mal: Los asesinos en serie que aterrorizaron España',
            'duration': 11837,
            'tags': ['españa', 'asesinos en serie', 'arropiero', 'historia criminal', 'mataviejas'],
        },
    }, {
        'url': 'https://go.ivoox.com/rf/143594959',
        'only_matching': True,
    }, {
        'url': 'https://www.ivoox.com/en/campodelgas-28-03-2025-audios-mp3_rf_144036942_1.html',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        media_id = self._match_id(url)
        webpage = self._download_webpage(url, media_id, fatal=False)

        data = self._search_nuxt_data(
            webpage, media_id, fatal=False, traverse=('data', 0, 'data', 'audio'))

        direct_download = self._download_json(
            f'https://vcore-web.ivoox.com/v1/public/audios/{media_id}/download-url', media_id, fatal=False,
            note='Fetching direct download link', headers={'Referer': url})

        download_paths = {
            *traverse_obj(direct_download, ('data', 'downloadUrl', {str}, filter, all)),
            *traverse_obj(data, (('downloadUrl', 'mediaUrl'), {str}, filter)),
        }

        formats = []
        for path in download_paths:
            formats.append({
                'url': urljoin('https://ivoox.com', path),
                'http_headers': {'Referer': url},
            })

        return {
            'id': media_id,
            'formats': formats,
            'uploader': self._html_search_regex(r'data-prm-author="([^"]+)"', webpage, 'author', default=None),
            'timestamp': parse_iso8601(
                self._html_search_regex(r'data-prm-pubdate="([^"]+)"', webpage, 'timestamp', default=None)),
            'channel': self._html_search_regex(r'data-prm-podname="([^"]+)"', webpage, 'channel', default=None),
            'title': self._html_search_regex(r'data-prm-title="([^"]+)"', webpage, 'title', default=None),
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'description': self._og_search_description(webpage, default=None),
            **self._search_json_ld(webpage, media_id, default={}),
            **traverse_obj(data, {
                'title': ('title', {str}),
                'description': ('description', {str}),
                'thumbnail': ('image', {url_or_none}),
                'timestamp': ('uploadDate', {parse_iso8601(delimiter=' ')}),
                'duration': ('duration', {int_or_none}),
                'tags': ('tags', ..., 'name', {str}),
            }),
        }


@ -2,10 +2,12 @@
import random import random
from .common import InfoExtractor from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import ( from ..utils import (
clean_html, clean_html,
int_or_none, int_or_none,
try_get, try_get,
urlhandle_detect_ext,
) )
@ -27,7 +29,7 @@ class JamendoIE(InfoExtractor):
'ext': 'flac', 'ext': 'flac',
# 'title': 'Maya Filipič - Stories from Emona I', # 'title': 'Maya Filipič - Stories from Emona I',
'title': 'Stories from Emona I', 'title': 'Stories from Emona I',
'artist': 'Maya Filipič', 'artists': ['Maya Filipič'],
'album': 'Between two worlds', 'album': 'Between two worlds',
'track': 'Stories from Emona I', 'track': 'Stories from Emona I',
'duration': 210, 'duration': 210,
@ -93,9 +95,15 @@ def _real_extract(self, url):
if not cover_url or cover_url in urls: if not cover_url or cover_url in urls:
continue continue
urls.append(cover_url) urls.append(cover_url)
urlh = self._request_webpage(
HEADRequest(cover_url), track_id, 'Checking thumbnail extension',
errnote=False, fatal=False)
if not urlh:
continue
size = int_or_none(cover_id.lstrip('size')) size = int_or_none(cover_id.lstrip('size'))
thumbnails.append({ thumbnails.append({
'id': cover_id, 'id': cover_id,
'ext': urlhandle_detect_ext(urlh, default='jpg'),
'url': cover_url, 'url': cover_url,
'width': size, 'width': size,
'height': size, 'height': size,


@ -1,3 +1,5 @@
import itertools
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
@ -124,3 +126,43 @@ def _extract_formats(self, media_info, video_id):
'vbr': ('bitrateVideo', {int_or_none}, {lambda x: None if x == -1 else x}), 'vbr': ('bitrateVideo', {int_or_none}, {lambda x: None if x == -1 else x}),
}), }),
} }
class KikaPlaylistIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?kika\.de/[\w-]+/(?P<id>[a-z-]+\d+)'

    _TESTS = [{
        'url': 'https://www.kika.de/logo/logo-die-welt-und-ich-562',
        'info_dict': {
            'id': 'logo-die-welt-und-ich-562',
            'title': 'logo!',
            'description': 'md5:7b9d7f65561b82fa512f2cfb553c397d',
        },
        'playlist_count': 100,
    }]

    def _entries(self, playlist_url, playlist_id):
        for page in itertools.count(1):
            data = self._download_json(playlist_url, playlist_id, note=f'Downloading page {page}')
            for item in traverse_obj(data, ('content', lambda _, v: url_or_none(v['api']['url']))):
                yield self.url_result(
                    item['api']['url'], ie=KikaIE,
                    **traverse_obj(item, {
                        'id': ('id', {str}),
                        'title': ('title', {str}),
                        'duration': ('duration', {int_or_none}),
                        'timestamp': ('date', {parse_iso8601}),
                    }))

            playlist_url = traverse_obj(data, ('links', 'next', {url_or_none}))
            if not playlist_url:
                break

    def _real_extract(self, url):
        playlist_id = self._match_id(url)
        brand_data = self._download_json(
            f'https://www.kika.de/_next-api/proxy/v1/brands/{playlist_id}', playlist_id)

        return self.playlist_result(
            self._entries(brand_data['videoSubchannel']['videosPageUrl'], playlist_id),
            playlist_id, title=brand_data.get('title'), description=brand_data.get('description'))
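
The `_entries` generator pages through results by following a `links.next` URL until none is returned. A self-contained sketch of that cursor-style pagination is shown below; `iter_video_urls`, the `fetch_json` callable, and the `PAGES` data are hypothetical and only mirror the response shape the extractor appears to expect.

```python
import itertools


def iter_video_urls(first_page_url, fetch_json):
    """Yield item URLs page by page until no next link is returned."""
    page_url = first_page_url
    for page_num in itertools.count(1):
        data = fetch_json(page_url, page_num)
        for item in data.get('content') or []:
            api_url = (item.get('api') or {}).get('url')
            if api_url:  # Skip items without a usable URL
                yield api_url
        page_url = (data.get('links') or {}).get('next')
        if not page_url:
            break


# Hypothetical two-page response set keyed by page URL
PAGES = {
    'https://example.com/videos?page=1': {
        'content': [{'api': {'url': 'https://example.com/v/1'}}, {'api': {}}],
        'links': {'next': 'https://example.com/videos?page=2'},
    },
    'https://example.com/videos?page=2': {
        'content': [{'api': {'url': 'https://example.com/v/2'}}],
        'links': {},
    },
}

urls = list(iter_video_urls(
    'https://example.com/videos?page=1', lambda url, page_num: PAGES[url]))
```

Because the function is a generator, later pages are only requested when a consumer actually iterates that far, which suits lazily evaluated playlists.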

yt_dlp/extractor/loco.py Normal file
View File

@ -0,0 +1,87 @@
from .common import InfoExtractor
from ..utils import int_or_none, url_or_none
from ..utils.traversal import require, traverse_obj
class LocoIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?loco\.com/(?P<type>streamers|stream)/(?P<id>[^/?#]+)'
    _TESTS = [{
        'url': 'https://loco.com/streamers/teuzinfps',
        'info_dict': {
            'id': 'teuzinfps',
            'ext': 'mp4',
            'title': r're:MS BOLADAO, RESENHA & GAMEPLAY ALTO NIVEL',
            'description': 'bom e novo',
            'uploader_id': 'RLUVE3S9JU',
            'channel': 'teuzinfps',
            'channel_follower_count': int,
            'comment_count': int,
            'view_count': int,
            'concurrent_view_count': int,
            'like_count': int,
            'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/743701a9-98ca-41ae-9a8b-70bd5da070ad.jpg',
            'tags': ['MMORPG', 'Gameplay'],
            'series': 'Tibia',
            'timestamp': int,
            'modified_timestamp': int,
            'live_status': 'is_live',
            'upload_date': str,
            'modified_date': str,
        },
        'params': {
            'skip_download': 'Livestream',
        },
    }, {
        'url': 'https://loco.com/stream/c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
        'md5': '45ebc8a47ee1c2240178757caf8881b5',
        'info_dict': {
            'id': 'c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
            'ext': 'mp4',
            'title': 'PAULINHO LOKO NA LOCO!',
            'description': 'live on na loco',
            'uploader_id': '2MDO7Z1DPM',
            'channel': 'paulinholokobr',
            'channel_follower_count': int,
            'comment_count': int,
            'view_count': int,
            'concurrent_view_count': int,
            'like_count': int,
            'duration': 14491,
            'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/59b5970b-23c1-4518-9e96-17ce341299fe.jpg',
            'tags': ['Gameplay'],
            'series': 'GTA 5',
            'timestamp': 1740612872,
            'modified_timestamp': 1740613037,
            'upload_date': '20250226',
            'modified_date': '20250226',
        },
    }]

    def _real_extract(self, url):
        video_type, video_id = self._match_valid_url(url).group('type', 'id')
        webpage = self._download_webpage(url, video_id)
        stream = traverse_obj(self._search_nextjs_data(webpage, video_id), (
            'props', 'pageProps', ('liveStreamData', 'stream'), {dict}, any, {require('stream info')}))

        return {
            'formats': self._extract_m3u8_formats(stream['conf']['hls'], video_id),
            'id': video_id,
            'is_live': video_type == 'streamers',
            **traverse_obj(stream, {
                'title': ('title', {str}),
                'series': ('game_name', {str}),
                'uploader_id': ('user_uid', {str}),
                'channel': ('alias', {str}),
                'description': ('description', {str}),
                'concurrent_view_count': ('viewersCurrent', {int_or_none}),
                'view_count': ('total_views', {int_or_none}),
                'thumbnail': ('thumbnail_url_small', {url_or_none}),
                'like_count': ('likes', {int_or_none}),
                'tags': ('tags', ..., {str}),
                'timestamp': ('started_at', {int_or_none(scale=1000)}),
                'modified_timestamp': ('updated_at', {int_or_none(scale=1000)}),
                'comment_count': ('comments_count', {int_or_none}),
                'channel_follower_count': ('followers_count', {int_or_none}),
                'duration': ('duration', {int_or_none}),
            }),
        }
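
The `int_or_none(scale=1000)` calls suggest that `started_at` and `updated_at` arrive as millisecond timestamps and are reduced to whole seconds. The sketch below illustrates that conversion; `scaled_int_or_none` is a simplified, hypothetical stand-in rather than yt-dlp's `int_or_none`, and the sample value is invented.

```python
def scaled_int_or_none(value, scale=1):
    """Simplified stand-in: integer-divide by scale, or return None on bad input."""
    if value is None or value == '':
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError, OverflowError):
        return None


started_at_ms = 1740612872123  # Hypothetical millisecond timestamp
timestamp = scaled_int_or_none(started_at_ms, scale=1000)
```

Returning `None` instead of raising lets a missing or malformed field drop out of the result without aborting the whole extraction.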


@ -2,8 +2,11 @@
from ..utils import ( from ..utils import (
clean_html, clean_html,
merge_dicts, merge_dicts,
str_or_none,
traverse_obj, traverse_obj,
unified_timestamp,
url_or_none, url_or_none,
urljoin,
) )
@ -80,7 +83,7 @@ class LRTVODIE(LRTBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
path, video_id = self._match_valid_url(url).groups() path, video_id = self._match_valid_url(url).group('path', 'id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
media_url = self._extract_js_var(webpage, 'main_url', path) media_url = self._extract_js_var(webpage, 'main_url', path)
@ -106,3 +109,42 @@ def _real_extract(self, url):
} }
return merge_dicts(clean_info, jw_data, json_ld_data) return merge_dicts(clean_info, jw_data, json_ld_data)
class LRTRadioIE(LRTBaseIE):
    _VALID_URL = r'https?://(?:www\.)?lrt\.lt/radioteka/irasas/(?P<id>\d+)/(?P<path>[^?#/]+)'
    _TESTS = [{
        # m3u8 download
        'url': 'https://www.lrt.lt/radioteka/irasas/2000359728/nemarios-eiles-apie-pragarus-ir-skaistyklas-su-aiste-kiltinaviciute',
        'info_dict': {
            'id': '2000359728',
            'ext': 'm4a',
            'title': 'Nemarios eilės: apie pragarus ir skaistyklas su Aiste Kiltinavičiūte',
            'description': 'md5:5eee9a0e86a55bf547bd67596204625d',
            'timestamp': 1726143120,
            'upload_date': '20240912',
            'tags': 'count:5',
            'thumbnail': r're:https?://.+/.+\.jpe?g',
            'categories': ['Daiktiniai įrodymai'],
        },
    }, {
        'url': 'https://www.lrt.lt/radioteka/irasas/2000304654/vakaras-su-knyga-svetlana-aleksijevic-cernobylio-malda-v-dalis?season=%2Fmediateka%2Faudio%2Fvakaras-su-knyga%2F2023',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id, path = self._match_valid_url(url).group('id', 'path')
        media = self._download_json(
            'https://www.lrt.lt/radioteka/api/media', video_id,
            query={'url': f'/mediateka/irasas/{video_id}/{path}'})

        return traverse_obj(media, {
            'id': ('id', {int}, {str_or_none}),
            'title': ('title', {str}),
            'tags': ('tags', ..., 'name', {str}),
            'categories': ('playlist_item', 'category', {str}, filter, all, filter),
            'description': ('content', {clean_html}, {str}),
            'timestamp': ('date', {lambda x: x.replace('.', '/')}, {unified_timestamp}),
            'thumbnail': ('playlist_item', 'image', {urljoin('https://www.lrt.lt')}),
            'formats': ('playlist_item', 'file', {lambda x: self._extract_m3u8_formats(x, video_id)}),
        })


@ -1,35 +1,36 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_age_limit, parse_duration, traverse_obj from ..utils import parse_age_limit, parse_duration, url_or_none
from ..utils.traversal import traverse_obj
class MagellanTVIE(InfoExtractor): class MagellanTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magellantv\.com/(?:watch|video)/(?P<id>[\w-]+)' _VALID_URL = r'https?://(?:www\.)?magellantv\.com/(?:watch|video)/(?P<id>[\w-]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.magellantv.com/watch/my-dads-on-death-row?type=v', 'url': 'https://www.magellantv.com/watch/incas-the-new-story?type=v',
'info_dict': { 'info_dict': {
'id': 'my-dads-on-death-row', 'id': 'incas-the-new-story',
'ext': 'mp4', 'ext': 'mp4',
'title': 'My Dad\'s On Death Row', 'title': 'Incas: The New Story',
'description': 'md5:33ba23b9f0651fc4537ed19b1d5b0d7a', 'description': 'md5:936c7f6d711c02dfb9db22a067b586fe',
'duration': 3780.0,
'age_limit': 14, 'age_limit': 14,
'tags': ['Justice', 'Reality', 'United States', 'True Crime'], 'duration': 3060.0,
'tags': ['Ancient History', 'Archaeology', 'Anthropology'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://www.magellantv.com/video/james-bulger-the-new-revelations', 'url': 'https://www.magellantv.com/video/tortured-to-death-murdering-the-nanny',
'info_dict': { 'info_dict': {
'id': 'james-bulger-the-new-revelations', 'id': 'tortured-to-death-murdering-the-nanny',
'ext': 'mp4', 'ext': 'mp4',
'title': 'James Bulger: The New Revelations', 'title': 'Tortured to Death: Murdering the Nanny',
'description': 'md5:7b97922038bad1d0fe8d0470d8a189f2', 'description': 'md5:d87033594fa218af2b1a8b49f52511e5',
'age_limit': 14,
'duration': 2640.0, 'duration': 2640.0,
'age_limit': 0, 'tags': ['True Crime', 'Murder'],
'tags': ['Investigation', 'True Crime', 'Justice', 'Europe'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://www.magellantv.com/watch/celebration-nation', 'url': 'https://www.magellantv.com/watch/celebration-nation?type=s',
'info_dict': { 'info_dict': {
'id': 'celebration-nation', 'id': 'celebration-nation',
'ext': 'mp4', 'ext': 'mp4',
@ -43,10 +44,19 @@ class MagellanTVIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
data = traverse_obj(self._search_nextjs_data(webpage, video_id), ( context = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['reactContext']
'props', 'pageProps', 'reactContext', data = traverse_obj(context, ((('video', 'detail'), ('series', 'currentEpisode')), {dict}, any))
(('video', 'detail'), ('series', 'currentEpisode')), {dict}), get_all=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(data['jwpVideoUrl'], video_id) formats, subtitles = [], {}
for m3u8_url in set(traverse_obj(data, ((('manifests', ..., 'hls'), 'jwp_video_url'), {url_or_none}))):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if not formats and (error := traverse_obj(context, ('errorDetailPage', 'errorMessage', {str}))):
if 'available in your country' in error:
self.raise_geo_restricted(msg=error)
self.raise_no_formats(f'{self.IE_NAME} said: {error}', expected=True)
return { return {
'id': video_id, 'id': video_id,


@ -102,11 +102,10 @@ def add_item(container, item_url, height, id_key='format_id', item_id=None):
item_id = item_id or '%dp' % height item_id = item_id or '%dp' % height
if item_id not in item_url: if item_id not in item_url:
return return
width = int(round(aspect_ratio * height))
container.append({ container.append({
'url': item_url, 'url': item_url,
id_key: item_id, id_key: item_id,
'width': width, 'width': round(aspect_ratio * height),
'height': height, 'height': height,
}) })


@ -4,6 +4,7 @@
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
parse_resolution,
traverse_obj, traverse_obj,
unified_timestamp, unified_timestamp,
url_basename, url_basename,
@ -83,8 +84,8 @@ def _sub_to_dict(subtitle_list):
subtitles.setdefault(sub.pop('tag', 'und'), []).append(sub) subtitles.setdefault(sub.pop('tag', 'und'), []).append(sub)
return subtitles return subtitles
def _extract_ism(self, ism_url, video_id): def _extract_ism(self, ism_url, video_id, fatal=True):
formats = self._extract_ism_formats(ism_url, video_id) formats = self._extract_ism_formats(ism_url, video_id, fatal=fatal)
for fmt in formats: for fmt in formats:
if fmt['language'] != 'eng' and 'English' not in fmt['format_id']: if fmt['language'] != 'eng' and 'English' not in fmt['format_id']:
fmt['language_preference'] = -10 fmt['language_preference'] = -10
@ -218,9 +219,21 @@ class MicrosoftLearnEpisodeIE(MicrosoftMediusBaseIE):
'description': 'md5:7bbbfb593d21c2cf2babc3715ade6b88', 'description': 'md5:7bbbfb593d21c2cf2babc3715ade6b88',
'timestamp': 1676339547, 'timestamp': 1676339547,
'upload_date': '20230214', 'upload_date': '20230214',
'thumbnail': r're:https://learn\.microsoft\.com/video/media/.*\.png', 'thumbnail': r're:https://learn\.microsoft\.com/video/media/.+\.png',
'subtitles': 'count:14', 'subtitles': 'count:14',
}, },
}, {
'url': 'https://learn.microsoft.com/en-gb/shows/on-demand-instructor-led-training-series/az-900-module-1',
'info_dict': {
'id': '4fe10f7c-d83c-463b-ac0e-c30a8195e01b',
'ext': 'mp4',
'title': 'AZ-900 Cloud fundamentals (1 of 6)',
'description': 'md5:3c2212ce865e9142f402c766441bd5c9',
'thumbnail': r're:https://.+/.+\.jpg',
'timestamp': 1706605184,
'upload_date': '20240130',
},
'params': {'format': 'bv[protocol=https]'},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -230,9 +243,32 @@ def _real_extract(self, url):
entry_id = self._html_search_meta('entryId', webpage, 'entryId', fatal=True) entry_id = self._html_search_meta('entryId', webpage, 'entryId', fatal=True)
video_info = self._download_json( video_info = self._download_json(
f'https://learn.microsoft.com/api/video/public/v1/entries/{entry_id}', video_id) f'https://learn.microsoft.com/api/video/public/v1/entries/{entry_id}', video_id)
formats = []
if ism_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoUrl', {url_or_none})):
formats.extend(self._extract_ism(ism_url, video_id, fatal=False))
if hls_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoHLSUrl', {url_or_none})):
formats.extend(self._extract_m3u8_formats(hls_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
if mpd_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoDashUrl', {url_or_none})):
formats.extend(self._extract_mpd_formats(mpd_url, video_id, mpd_id='dash', fatal=False))
for key in ('low', 'medium', 'high'):
if video_url := traverse_obj(video_info, ('publicVideo', f'{key}QualityVideoUrl', {url_or_none})):
formats.append({
'url': video_url,
'format_id': f'video-http-{key}',
'acodec': 'none',
**parse_resolution(video_url),
})
if audio_url := traverse_obj(video_info, ('publicVideo', 'audioUrl', {url_or_none})):
formats.append({
'url': audio_url,
'format_id': 'audio-http',
'vcodec': 'none',
})
return { return {
'id': entry_id, 'id': entry_id,
'formats': self._extract_ism(video_info['publicVideo']['adaptiveVideoUrl'], video_id), 'formats': formats,
'subtitles': self._sub_to_dict(traverse_obj(video_info, ( 'subtitles': self._sub_to_dict(traverse_obj(video_info, (
'publicVideo', 'captions', lambda _, v: url_or_none(v['url']), { 'publicVideo', 'captions', lambda _, v: url_or_none(v['url']), {
'tag': ('language', {str}), 'tag': ('language', {str}),
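
For the direct HTTP formats added in this hunk, the resolution is derived from the video URL via `parse_resolution`. The following sketch approximates that idea with two regular expressions; `sketch_parse_resolution` is a hypothetical, simplified helper (it is not yt-dlp's `parse_resolution`), and the URL is an invented example.

```python
import re


def sketch_parse_resolution(text):
    """Simplified approximation: look for WxH first, then a bare 720p-style height."""
    m = re.search(r'(?<![a-zA-Z0-9])(\d+)\s*[xX]\s*(\d+)(?![a-zA-Z0-9])', text)
    if m:
        return {'width': int(m.group(1)), 'height': int(m.group(2))}
    m = re.search(r'(?<![a-zA-Z0-9])(\d+)[pP](?![a-zA-Z0-9])', text)
    if m:
        return {'height': int(m.group(1))}
    return {}


fmt = {
    'url': 'https://example.com/media/clip_1280x720.mp4',  # Hypothetical URL
    'format_id': 'video-http-high',
    'acodec': 'none',
}
# An empty result leaves the format dict untouched
fmt.update(sketch_parse_resolution(fmt['url']))
```

Returning an empty dict when nothing matches is what makes the `**parse_resolution(video_url)` spread in the extractor safe for URLs that carry no size hint.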


@ -1,5 +1,7 @@
from .telecinco import TelecincoBaseIE from .telecinco import TelecincoBaseIE
from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
) )
@ -79,7 +81,17 @@ class MiTeleIE(TelecincoBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
try: # yt-dlp's default user-agents are too old and blocked by akamai
webpage = self._download_webpage(url, display_id, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
})
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
raise
# Retry with impersonation if hardcoded UA is insufficient to bypass akamai
webpage = self._download_webpage(url, display_id, impersonate=True)
pre_player = self._search_json( pre_player = self._search_json(
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=', r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
webpage, 'Pre Player', display_id)['prePlayer'] webpage, 'Pre Player', display_id)['prePlayer']


@ -10,7 +10,9 @@
parse_iso8601, parse_iso8601,
strip_or_none, strip_or_none,
try_get, try_get,
url_or_none,
) )
from ..utils.traversal import traverse_obj
class MixcloudBaseIE(InfoExtractor): class MixcloudBaseIE(InfoExtractor):
@ -37,7 +39,7 @@ class MixcloudIE(MixcloudBaseIE):
'ext': 'm4a', 'ext': 'm4a',
'title': 'Cryptkeeper', 'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.', 'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach', 'uploader': 'dholbach',
'uploader_id': 'dholbach', 'uploader_id': 'dholbach',
'thumbnail': r're:https?://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'view_count': int, 'view_count': int,
@ -46,10 +48,11 @@ class MixcloudIE(MixcloudBaseIE):
'uploader_url': 'https://www.mixcloud.com/dholbach/', 'uploader_url': 'https://www.mixcloud.com/dholbach/',
'artist': 'Submorphics & Chino , Telekinesis, Porter Robinson, Enei, Breakage ft Jess Mills', 'artist': 'Submorphics & Chino , Telekinesis, Porter Robinson, Enei, Breakage ft Jess Mills',
'duration': 3723, 'duration': 3723,
'tags': [], 'tags': ['liquid drum and bass', 'drum and bass'],
'comment_count': int, 'comment_count': int,
'repost_count': int, 'repost_count': int,
'like_count': int, 'like_count': int,
'artists': list,
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
@ -67,7 +70,7 @@ class MixcloudIE(MixcloudBaseIE):
'upload_date': '20150203', 'upload_date': '20150203',
'uploader_url': 'https://www.mixcloud.com/gillespeterson/', 'uploader_url': 'https://www.mixcloud.com/gillespeterson/',
'duration': 2992, 'duration': 2992,
'tags': [], 'tags': ['jazz', 'soul', 'world music', 'funk'],
'comment_count': int, 'comment_count': int,
'repost_count': int, 'repost_count': int,
'like_count': int, 'like_count': int,
@ -149,8 +152,6 @@ def _real_extract(self, url):
elif reason: elif reason:
raise ExtractorError('Track is restricted', expected=True) raise ExtractorError('Track is restricted', expected=True)
title = cloudcast['name']
stream_info = cloudcast['streamInfo'] stream_info = cloudcast['streamInfo']
formats = [] formats = []
@ -182,47 +183,39 @@ def _real_extract(self, url):
self.raise_login_required(metadata_available=True) self.raise_login_required(metadata_available=True)
comments = [] comments = []
for edge in (try_get(cloudcast, lambda x: x['comments']['edges']) or []): for node in traverse_obj(cloudcast, ('comments', 'edges', ..., 'node', {dict})):
node = edge.get('node') or {}
text = strip_or_none(node.get('comment')) text = strip_or_none(node.get('comment'))
if not text: if not text:
continue continue
user = node.get('user') or {}
comments.append({ comments.append({
'author': user.get('displayName'),
'author_id': user.get('username'),
'text': text, 'text': text,
'timestamp': parse_iso8601(node.get('created')), **traverse_obj(node, {
'author': ('user', 'displayName', {str}),
'author_id': ('user', 'username', {str}),
'timestamp': ('created', {parse_iso8601}),
}),
}) })
tags = []
for t in cloudcast.get('tags'):
tag = try_get(t, lambda x: x['tag']['name'], str)
if not tag:
tags.append(tag)
get_count = lambda x: int_or_none(try_get(cloudcast, lambda y: y[x]['totalCount']))
owner = cloudcast.get('owner') or {}
return { return {
'id': track_id, 'id': track_id,
'title': title,
'formats': formats, 'formats': formats,
'description': cloudcast.get('description'),
'thumbnail': try_get(cloudcast, lambda x: x['picture']['url'], str),
'uploader': owner.get('displayName'),
'timestamp': parse_iso8601(cloudcast.get('publishDate')),
'uploader_id': owner.get('username'),
'uploader_url': owner.get('url'),
'duration': int_or_none(cloudcast.get('audioLength')),
'view_count': int_or_none(cloudcast.get('plays')),
'like_count': get_count('favorites'),
'repost_count': get_count('reposts'),
'comment_count': get_count('comments'),
'comments': comments, 'comments': comments,
'tags': tags, **traverse_obj(cloudcast, {
'artist': ', '.join(cloudcast.get('featuringArtistList') or []) or None, 'title': ('name', {str}),
'description': ('description', {str}),
'thumbnail': ('picture', 'url', {url_or_none}),
'timestamp': ('publishDate', {parse_iso8601}),
'duration': ('audioLength', {int_or_none}),
'uploader': ('owner', 'displayName', {str}),
'uploader_id': ('owner', 'username', {str}),
'uploader_url': ('owner', 'url', {url_or_none}),
'view_count': ('plays', {int_or_none}),
'like_count': ('favorites', 'totalCount', {int_or_none}),
'repost_count': ('reposts', 'totalCount', {int_or_none}),
'comment_count': ('comments', 'totalCount', {int_or_none}),
'tags': ('tags', ..., 'tag', 'name', {str}, filter, all, filter),
'artists': ('featuringArtistList', ..., {str}, filter, all, filter),
}),
} }
@ -295,7 +288,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/', 'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': { 'info_dict': {
'id': 'dholbach_uploads', 'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)', 'title': 'dholbach (uploads)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b', 'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
}, },
'playlist_mincount': 36, 'playlist_mincount': 36,
@ -303,7 +296,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/uploads/', 'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': { 'info_dict': {
'id': 'dholbach_uploads', 'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)', 'title': 'dholbach (uploads)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b', 'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
}, },
'playlist_mincount': 36, 'playlist_mincount': 36,
@ -311,7 +304,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/favorites/', 'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': { 'info_dict': {
'id': 'dholbach_favorites', 'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)', 'title': 'dholbach (favorites)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b', 'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
}, },
# 'params': { # 'params': {
@ -337,7 +330,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'title': 'First Ear (stream)', 'title': 'First Ear (stream)',
'description': 'we maraud for ears', 'description': 'we maraud for ears',
}, },
'playlist_mincount': 269, 'playlist_mincount': 267,
}] }]
_TITLE_KEY = 'displayName' _TITLE_KEY = 'displayName'
@ -361,7 +354,7 @@ class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
'id': 'maxvibes_jazzcat-on-ness-radio', 'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Ness Radio sessions', 'title': 'Ness Radio sessions',
}, },
'playlist_mincount': 59, 'playlist_mincount': 58,
}] }]
_TITLE_KEY = 'name' _TITLE_KEY = 'name'
_DESCRIPTION_KEY = 'description' _DESCRIPTION_KEY = 'description'
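
Note on the Mixcloud tag change above: the removed loop appended a tag only when `if not tag` was true, so real tag names were never collected, which matches the old test expectation of `'tags': []`. The following is a minimal standard-library sketch of that difference, not the extractor's code. The `cloudcast` sample and the helper names `old_tags` and `new_tags` are hypothetical, and `new_tags` only approximates what the `traverse_obj` path in the diff is assumed to return.

```python
# Hypothetical sample shaped like the `tags` field read by the extractor.
cloudcast = {
    'tags': [
        {'tag': {'name': 'jazz'}},
        {'tag': {'name': 'soul'}},
        {'tag': {}},  # entry without a name
    ],
}


def old_tags(data):
    """Mirror of the removed loop, including its inverted condition."""
    tags = []
    for t in data.get('tags') or []:
        tag = (t.get('tag') or {}).get('name')
        if not isinstance(tag, str):
            tag = None
        if not tag:  # inverted check: only missing names were appended
            tags.append(tag)
    return tags


def new_tags(data):
    """Stdlib approximation of ('tags', ..., 'tag', 'name', {str}, filter, all, filter)."""
    names = [
        (t.get('tag') or {}).get('name')
        for t in data.get('tags') or []
        if isinstance(t, dict)
    ]
    tags = [name for name in names if isinstance(name, str) and name]
    # Assumption: an empty result collapses to None so the key is left out.
    return tags or None


print(old_tags(cloudcast))
print(new_tags(cloudcast))
```

With this sample the old helper keeps only the nameless entry, while the new one keeps the two real names.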


@ -449,9 +449,7 @@ def _extract_formats_and_subtitles(self, broadcast, video_id):
if not (m3u8_url and token): if not (m3u8_url and token):
errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str}))) errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str})))
if 'not entitled' in errors: if errors: # Only warn when 'blacked out' or 'not entitled'; radio formats may be available
raise ExtractorError(errors, expected=True)
elif errors: # Only warn when 'blacked out' since radio formats are available
self.report_warning(f'API returned errors for {format_id}: {errors}') self.report_warning(f'API returned errors for {format_id}: {errors}')
else: else:
self.report_warning(f'No formats available for {format_id} broadcast; skipping') self.report_warning(f'No formats available for {format_id} broadcast; skipping')


@ -3,8 +3,8 @@
class MoviepilotIE(InfoExtractor): class MoviepilotIE(InfoExtractor):
_IE_NAME = 'moviepilot' IE_NAME = 'moviepilot'
_IE_DESC = 'Moviepilot trailer' IE_DESC = 'Moviepilot trailer'
_VALID_URL = r'https?://(?:www\.)?moviepilot\.de/movies/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?moviepilot\.de/movies/(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{


@ -1,167 +1,215 @@
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html,
determine_ext, determine_ext,
int_or_none, int_or_none,
unescapeHTML, parse_iso8601,
url_or_none,
) )
from ..utils.traversal import traverse_obj
class MSNIE(InfoExtractor): class MSNIE(InfoExtractor):
_WORKING = False _VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/?#]+/)+(?P<display_id>[^/?#]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d', 'url': 'https://www.msn.com/en-gb/video/news/president-macron-interrupts-trump-over-ukraine-funding/vi-AA1zMcD7',
'md5': '087548191d273c5c55d05028f8d2cbcd',
'info_dict': { 'info_dict': {
'id': 'BBPxU6d', 'id': 'AA1zMcD7',
'display_id': '7-ways-to-get-rid-of-chest-congestion',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Seven ways to get rid of chest congestion', 'display_id': 'president-macron-interrupts-trump-over-ukraine-funding',
'description': '7 Ways to Get Rid of Chest Congestion', 'title': 'President Macron interrupts Trump over Ukraine funding',
'duration': 88, 'description': 'md5:5fd3857ac25849e7a56cb25fbe1a2a8b',
'uploader': 'Health', 'uploader': 'k! News UK',
'uploader_id': 'BBPrMqa', 'uploader_id': 'BB1hz5Rj',
'duration': 59,
'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zMagX.img',
'tags': 'count:14',
'timestamp': 1740510914,
'upload_date': '20250225',
'release_timestamp': 1740513600,
'release_date': '20250225',
'modified_timestamp': 1741413241,
'modified_date': '20250308',
}, },
}, { }, {
# Article, multiple Dailymotion Embeds 'url': 'https://www.msn.com/en-gb/video/watch/films-success-saved-adam-pearsons-acting-career/vi-AA1znZGE?ocid=hpmsn',
'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
'info_dict': { 'info_dict': {
'id': 'BBpc7Nl', 'id': 'AA1znZGE',
'ext': 'mp4',
'display_id': 'films-success-saved-adam-pearsons-acting-career',
'title': "Films' success saved Adam Pearson's acting career",
'description': 'md5:98c05f7bd9ab4f9c423400f62f2d3da5',
'uploader': 'Sky News',
'uploader_id': 'AA2eki',
'duration': 52,
'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zo7nU.img',
'timestamp': 1739993965,
'upload_date': '20250219',
'release_timestamp': 1739977753,
'release_date': '20250219',
'modified_timestamp': 1742076259,
'modified_date': '20250315',
}, },
'playlist_mincount': 4,
}, { }, {
'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf', 'url': 'https://www.msn.com/en-us/entertainment/news/rock-frontman-replacements-you-might-not-know-happened/vi-AA1yLVcD',
'only_matching': True, 'info_dict': {
}, { 'id': 'AA1yLVcD',
'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH', 'ext': 'mp4',
'only_matching': True, 'display_id': 'rock-frontman-replacements-you-might-not-know-happened',
}, { 'title': 'Rock Frontman Replacements You Might Not Know Happened',
# geo restricted 'description': 'md5:451a125496ff0c9f6816055bb1808da9',
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU', 'uploader': 'Grunge (Video)',
'only_matching': True, 'uploader_id': 'BB1oveoV',
}, { 'duration': 596,
'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-raped-woman-comment/vi-AAhvzW6', 'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1yM4OJ.img',
'only_matching': True, 'timestamp': 1739223456,
}, { 'upload_date': '20250210',
# Vidible(AOL) Embed 'release_timestamp': 1739219731,
'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR', 'release_date': '20250210',
'only_matching': True, 'modified_timestamp': 1741427272,
'modified_date': '20250308',
},
}, { }, {
# Dailymotion Embed # Dailymotion Embed
'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L', 'url': 'https://www.msn.com/de-de/nachrichten/other/the-first-descendant-gameplay-trailer-zu-serena-der-neuen-gefl%C3%BCgelten-nachfahrin/vi-AA1B1d06',
'only_matching': True, 'info_dict': {
'id': 'x9g6oli',
'ext': 'mp4',
'title': 'The First Descendant: Gameplay-Trailer zu Serena, der neuen geflügelten Nachfahrin',
'description': '',
'uploader': 'MeinMMO',
'uploader_id': 'x2mvqi4',
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 60,
'thumbnail': 'https://s1.dmcdn.net/v/Y3fO61drj56vPB9SS/x1080',
'tags': ['MeinMMO', 'The First Descendant'],
'timestamp': 1742124877,
'upload_date': '20250316',
},
}, { }, {
# YouTube Embed # Youtube Embed
'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v', 'url': 'https://www.msn.com/en-gb/video/webcontent/web-content/vi-AA1ybFaJ',
'only_matching': True, 'info_dict': {
'id': 'kQSChWu95nE',
'ext': 'mp4',
'title': '7 Daily Habits to Nurture Your Personal Growth',
'description': 'md5:6f233c68341b74dee30c8c121924e827',
'uploader': 'TopThink',
'uploader_id': '@TopThink',
'uploader_url': 'https://www.youtube.com/@TopThink',
'channel': 'TopThink',
'channel_id': 'UCMlGmHokrQRp-RaNO7aq4Uw',
'channel_url': 'https://www.youtube.com/channel/UCMlGmHokrQRp-RaNO7aq4Uw',
'channel_is_verified': True,
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 705,
'thumbnail': 'https://i.ytimg.com/vi/kQSChWu95nE/maxresdefault.jpg',
'categories': ['Howto & Style'],
'tags': ['topthink', 'top think', 'personal growth'],
'timestamp': 1722711620,
'upload_date': '20240803',
'playable_in_embed': True,
'availability': 'public',
'live_status': 'not_live',
},
}, { }, {
# NBCSports Embed # Article with social embed
'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb', 'url': 'https://www.msn.com/en-in/news/techandscience/watch-earth-sets-and-rises-behind-moon-in-breathtaking-blue-ghost-video/ar-AA1zKoAc',
'only_matching': True, 'info_dict': {
'id': 'AA1zKoAc',
'title': 'Watch: Earth sets and rises behind Moon in breathtaking Blue Ghost video',
'description': 'md5:0ad51cfa77e42e7f0c46cf98a619dbbf',
'uploader': 'India Today',
'uploader_id': 'AAyFWG',
'tags': 'count:11',
'timestamp': 1740485034,
'upload_date': '20250225',
'release_timestamp': 1740484875,
'release_date': '20250225',
'modified_timestamp': 1740488561,
'modified_date': '20250225',
},
'playlist_count': 1,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id, page_id = self._match_valid_url(url).groups() locale, display_id, page_id = self._match_valid_url(url).group('locale', 'display_id', 'id')
webpage = self._download_webpage(url, display_id) json_data = self._download_json(
f'https://assets.msn.com/content/view/v2/Detail/{locale}/{page_id}', page_id)
entries = [] common_metadata = traverse_obj(json_data, {
for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage): 'title': ('title', {str}),
video = self._parse_json(unescapeHTML(metadata), display_id) 'description': (('abstract', ('body', {clean_html})), {str}, filter, any),
'timestamp': ('createdDateTime', {parse_iso8601}),
provider_id = video.get('providerId') 'release_timestamp': ('publishedDateTime', {parse_iso8601}),
player_name = video.get('playerName') 'modified_timestamp': ('updatedDateTime', {parse_iso8601}),
if player_name and provider_id: 'thumbnail': ('thumbnail', 'image', 'url', {url_or_none}),
entry = None 'duration': ('videoMetadata', 'playTime', {int_or_none}),
if player_name == 'AOL': 'tags': ('keywords', ..., {str}),
if provider_id.startswith('http'): 'uploader': ('provider', 'name', {str}),
provider_id = self._search_regex( 'uploader_id': ('provider', 'id', {str}),
r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})', })
provider_id, 'vidible id')
entry = self.url_result(
'aol-video:' + provider_id, 'Aol', provider_id)
elif player_name == 'Dailymotion':
entry = self.url_result(
'https://www.dailymotion.com/video/' + provider_id,
'Dailymotion', provider_id)
elif player_name == 'YouTube':
entry = self.url_result(
provider_id, 'Youtube', provider_id)
elif player_name == 'NBCSports':
entry = self.url_result(
'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
'NBCSportsVPlayer', provider_id)
if entry:
entries.append(entry)
continue
video_id = video['uuid']
title = video['title']
page_type = json_data['type']
source_url = traverse_obj(json_data, ('sourceHref', {url_or_none}))
if page_type == 'video':
if traverse_obj(json_data, ('thirdPartyVideoPlayer', 'enabled')) and source_url:
return self.url_result(source_url)
formats = [] formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'format=m3u8-aapl' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False))
elif 'format=mpd-time-csf' in format_url:
formats.extend(self._extract_mpd_formats(
format_url, display_id, 'dash', fatal=False))
elif '.ism' in format_url:
if format_url.endswith('.ism'):
format_url += '/manifest'
formats.extend(self._extract_ism_formats(
format_url, display_id, 'mss', fatal=False))
else:
format_id = file_.get('formatCode')
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': format_id,
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
'quality': 1 if format_id == '1001' else None,
})
subtitles = {} subtitles = {}
for file_ in video.get('files', []): for file in traverse_obj(json_data, ('videoMetadata', 'externalVideoFiles', lambda _, v: url_or_none(v['url']))):
format_url = file_.get('url') file_url = file['url']
format_code = file_.get('formatCode') ext = determine_ext(file_url)
if not format_url or not format_code: if ext == 'm3u8':
continue fmts, subs = self._extract_m3u8_formats_and_subtitles(
if str(format_code) == '3100': file_url, page_id, 'mp4', m3u8_id='hls', fatal=False)
subtitles.setdefault(file_.get('culture', 'en'), []).append({ formats.extend(fmts)
'ext': determine_ext(format_url, 'ttml'), self._merge_subtitles(subs, target=subtitles)
'url': format_url, elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
file_url, page_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append(
traverse_obj(file, {
'url': 'url',
'format_id': ('format', {str}),
'filesize': ('fileSize', {int_or_none}),
'height': ('height', {int_or_none}),
'width': ('width', {int_or_none}),
}))
for caption in traverse_obj(json_data, ('videoMetadata', 'closedCaptions', lambda _, v: url_or_none(v['href']))):
lang = caption.get('locale') or 'en-us'
subtitles.setdefault(lang, []).append({
'url': caption['href'],
'ext': 'ttml',
}) })
entries.append({ return {
'id': video_id, 'id': page_id,
'display_id': display_id, 'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats, 'formats': formats,
}) 'subtitles': subtitles,
**common_metadata,
}
elif page_type == 'webcontent':
if not source_url:
raise ExtractorError('Could not find source URL')
return self.url_result(source_url)
elif page_type == 'article':
entries = []
for embed_url in traverse_obj(json_data, ('socialEmbeds', ..., 'postUrl', {url_or_none})):
entries.append(self.url_result(embed_url))
if not entries: return self.playlist_result(entries, page_id, **common_metadata)
error = unescapeHTML(self._search_regex(
r'data-error=(["\'])(?P<error>.+?)\1',
webpage, 'error', group='error'))
raise ExtractorError(f'{self.IE_NAME} said: {error}', expected=True)
return self.playlist_result(entries, page_id) raise ExtractorError(f'Unsupported page type: {page_type}')
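
Note on the MSN rewrite above: the new `_VALID_URL` captures a `locale` group in addition to `display_id` and `id`, and the extractor uses `locale` and `id` to build the Detail API request. The sketch below exercises only that URL handling with the standard library. The pattern and the API URL template are copied from the diff; the helper name `detail_api_url` is hypothetical.

```python
import re

# Pattern copied from the new MSNIE._VALID_URL in this diff.
VALID_URL = (
    r'https?://(?:(?:www|preview)\.)?msn\.com/(?P<locale>[a-z]{2}-[a-z]{2})/'
    r'(?:[^/?#]+/)+(?P<display_id>[^/?#]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
)


def detail_api_url(url):
    """Build the Detail API URL for a page URL; return None if it does not match."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    locale, page_id = mobj.group('locale', 'id')
    return f'https://assets.msn.com/content/view/v2/Detail/{locale}/{page_id}'


url = 'https://www.msn.com/en-gb/video/news/president-macron-interrupts-trump-over-ukraine-funding/vi-AA1zMcD7'
print(detail_api_url(url))
```

Because each path segment is matched with `[^/?#]+`, a trailing query string such as `?ocid=hpmsn` does not leak into the captured `id`.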


@ -4,7 +4,9 @@
from ..utils import ( from ..utils import (
extract_attributes, extract_attributes,
unified_timestamp, unified_timestamp,
url_or_none,
) )
from ..utils.traversal import traverse_obj
class N1InfoAssetIE(InfoExtractor): class N1InfoAssetIE(InfoExtractor):
@ -35,9 +37,9 @@ class N1InfoIIE(InfoExtractor):
IE_NAME = 'N1Info:article' IE_NAME = 'N1Info:article'
_VALID_URL = r'https?://(?:(?:\w+\.)?n1info\.\w+|nova\.rs)/(?:[^/?#]+/){1,2}(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:(?:\w+\.)?n1info\.\w+|nova\.rs)/(?:[^/?#]+/){1,2}(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# Youtube embedded # YouTube embedded
'url': 'https://rs.n1info.com/sport-klub/tenis/kako-je-djokovic-propustio-istorijsku-priliku-video/', 'url': 'https://rs.n1info.com/sport-klub/tenis/kako-je-djokovic-propustio-istorijsku-priliku-video/',
'md5': '01ddb6646d0fd9c4c7d990aa77fe1c5a', 'md5': '987ce6fd72acfecc453281e066b87973',
'info_dict': { 'info_dict': {
'id': 'L5Hd4hQVUpk', 'id': 'L5Hd4hQVUpk',
'ext': 'mp4', 'ext': 'mp4',
@ -45,7 +47,26 @@ class N1InfoIIE(InfoExtractor):
'title': 'Ozmo i USO21, ep. 13: Novak Đoković Danil Medvedev | Ključevi Poraza, Budućnost | SPORT KLUB TENIS', 'title': 'Ozmo i USO21, ep. 13: Novak Đoković Danil Medvedev | Ključevi Poraza, Budućnost | SPORT KLUB TENIS',
'description': 'md5:467f330af1effedd2e290f10dc31bb8e', 'description': 'md5:467f330af1effedd2e290f10dc31bb8e',
'uploader': 'Sport Klub', 'uploader': 'Sport Klub',
'uploader_id': 'sportklub', 'uploader_id': '@sportklub',
'uploader_url': 'https://www.youtube.com/@sportklub',
'channel': 'Sport Klub',
'channel_id': 'UChpzBje9Ro6CComXe3BgNaw',
'channel_url': 'https://www.youtube.com/channel/UChpzBje9Ro6CComXe3BgNaw',
'channel_is_verified': True,
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 1049,
'thumbnail': 'https://i.ytimg.com/vi/L5Hd4hQVUpk/maxresdefault.jpg',
'chapters': 'count:9',
'categories': ['Sports'],
'tags': 'count:10',
'timestamp': 1631522787,
'playable_in_embed': True,
'availability': 'public',
'live_status': 'not_live',
}, },
}, { }, {
'url': 'https://rs.n1info.com/vesti/djilas-los-plan-za-metro-nece-resiti-nijedan-saobracajni-problem/', 'url': 'https://rs.n1info.com/vesti/djilas-los-plan-za-metro-nece-resiti-nijedan-saobracajni-problem/',
@ -55,6 +76,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Đilas: Predlog izgradnje metroa besmislen; SNS odbacuje navode', 'title': 'Đilas: Predlog izgradnje metroa besmislen; SNS odbacuje navode',
'upload_date': '20210924', 'upload_date': '20210924',
'timestamp': 1632481347, 'timestamp': 1632481347,
'thumbnail': 'http://n1info.rs/wp-content/themes/ucnewsportal-n1/dist/assets/images/placeholder-image-video.jpg',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -67,6 +89,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Zadnji dnevi na kopališču Ilirija: “Ilirija ni umrla, ubili so jo”', 'title': 'Zadnji dnevi na kopališču Ilirija: “Ilirija ni umrla, ubili so jo”',
'timestamp': 1632567630, 'timestamp': 1632567630,
'upload_date': '20210925', 'upload_date': '20210925',
'thumbnail': 'https://n1info.si/wp-content/uploads/2021/09/06/1630945843-tomaz3.png',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -81,6 +104,14 @@ class N1InfoIIE(InfoExtractor):
'upload_date': '20210924', 'upload_date': '20210924',
'timestamp': 1632448649.0, 'timestamp': 1632448649.0,
'uploader': 'YouLotWhatDontStop', 'uploader': 'YouLotWhatDontStop',
'display_id': 'pu9wbx',
'channel_id': 'serbia',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 0,
'duration': 134,
'thumbnail': 'https://external-preview.redd.it/5nmmawSeGx60miQM3Iq-ueC9oyCLTLjjqX-qqY8uRsc.png?format=pjpg&auto=webp&s=2f973400b04d23f871b608b178e47fc01f9b8f1d',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -93,6 +124,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Žaklina Tatalović Ani Brnabić: Pričate laži (VIDEO)', 'title': 'Žaklina Tatalović Ani Brnabić: Pričate laži (VIDEO)',
'upload_date': '20211102', 'upload_date': '20211102',
'timestamp': 1635861677, 'timestamp': 1635861677,
'thumbnail': 'https://nova.rs/wp-content/uploads/2021/11/02/1635860298-TNJG_Ana_Brnabic_i_Zaklina_Tatalovic_100_dana_Vlade_GP.jpg',
}, },
}, { }, {
'url': 'https://n1info.rs/vesti/cuta-biti-u-kosovskoj-mitrovici-znaci-da-te-docekaju-eksplozivnim-napravama/', 'url': 'https://n1info.rs/vesti/cuta-biti-u-kosovskoj-mitrovici-znaci-da-te-docekaju-eksplozivnim-napravama/',
@ -104,6 +136,16 @@ class N1InfoIIE(InfoExtractor):
'timestamp': 1687290536, 'timestamp': 1687290536,
'thumbnail': 'https://cdn.brid.tv/live/partners/26827/snapshot/1332368_th_6492013a8356f_1687290170.jpg', 'thumbnail': 'https://cdn.brid.tv/live/partners/26827/snapshot/1332368_th_6492013a8356f_1687290170.jpg',
}, },
}, {
'url': 'https://n1info.rs/vesti/vuciceva-turneja-po-srbiji-najavljuje-kontrarevoluciju-preti-svom-narodu-vredja-novinare/',
'info_dict': {
'id': '2025974',
'ext': 'mp4',
'title': 'Vučićeva turneja po Srbiji: Najavljuje kontrarevoluciju, preti svom narodu, vređa novinare',
'thumbnail': 'https://cdn-uc.brid.tv/live/partners/26827/snapshot/2025974_fhd_67c4a23280a81_1740939826.jpg',
'timestamp': 1740939936,
'upload_date': '20250302',
},
}, { }, {
'url': 'https://hr.n1info.com/vijesti/pravobraniteljica-o-ubojstvu-u-zagrebu-radi-se-o-doista-nezapamcenoj-situaciji/', 'url': 'https://hr.n1info.com/vijesti/pravobraniteljica-o-ubojstvu-u-zagrebu-radi-se-o-doista-nezapamcenoj-situaciji/',
'only_matching': True, 'only_matching': True,
@ -115,11 +157,11 @@ def _real_extract(self, url):
title = self._html_search_regex(r'<h1[^>]+>(.+?)</h1>', webpage, 'title') title = self._html_search_regex(r'<h1[^>]+>(.+?)</h1>', webpage, 'title')
timestamp = unified_timestamp(self._html_search_meta('article:published_time', webpage)) timestamp = unified_timestamp(self._html_search_meta('article:published_time', webpage))
plugin_data = self._html_search_meta('BridPlugin', webpage) plugin_data = re.findall(r'\$bp\("(?:Brid|TargetVideo)_\d+",\s(.+)\);', webpage)
entries = [] entries = []
if plugin_data: if plugin_data:
site_id = self._html_search_regex(r'site:(\d+)', webpage, 'site id') site_id = self._html_search_regex(r'site:(\d+)', webpage, 'site id')
for video_data in re.findall(r'\$bp\("Brid_\d+", (.+)\);', webpage): for video_data in plugin_data:
video_id = self._parse_json(video_data, title)['video'] video_id = self._parse_json(video_data, title)['video']
entries.append({ entries.append({
'id': video_id, 'id': video_id,
@ -140,7 +182,7 @@ def _real_extract(self, url):
'url': video_data.get('data-url'), 'url': video_data.get('data-url'),
'id': video_data.get('id'), 'id': video_data.get('id'),
'title': title, 'title': title,
'thumbnail': video_data.get('data-thumbnail'), 'thumbnail': traverse_obj(video_data, (('data-thumbnail', 'data-default_thumbnail'), {url_or_none}, any)),
'timestamp': timestamp, 'timestamp': timestamp,
'ie_key': 'N1InfoAsset', 'ie_key': 'N1InfoAsset',
}) })
@ -152,7 +194,7 @@ def _real_extract(self, url):
if url.startswith('https://www.youtube.com'): if url.startswith('https://www.youtube.com'):
entries.append(self.url_result(url, ie='Youtube')) entries.append(self.url_result(url, ie='Youtube'))
elif url.startswith('https://www.redditmedia.com'): elif url.startswith('https://www.redditmedia.com'):
entries.append(self.url_result(url, ie='RedditR')) entries.append(self.url_result(url, ie='Reddit'))
return { return {
'_type': 'playlist', '_type': 'playlist',
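
Note on the N1Info change above: player detection now relies on a single regular expression that accepts both `Brid_` and `TargetVideo_` player names, instead of first checking a `BridPlugin` meta tag. The sketch below compares the old and new patterns on a made-up page fragment. Both patterns are copied from the diff, while the `webpage` string and its numbers are hypothetical.

```python
import json
import re

# Patterns copied from the removed and added lines in this diff.
OLD_PLAYER_RE = r'\$bp\("Brid_\d+", (.+)\);'
NEW_PLAYER_RE = r'\$bp\("(?:Brid|TargetVideo)_\d+",\s(.+)\);'

# Hypothetical page fragment; real pages carry more player options.
webpage = '''
<script>$bp("Brid_1687290170", {"id":"26827","video":"1332368"});</script>
<script>$bp("TargetVideo_1740939826", {"id":"26827","video":"2025974"});</script>
'''


def video_ids(pattern, page):
    """Return the 'video' value of every player call matched by the pattern."""
    return [json.loads(data)['video'] for data in re.findall(pattern, page)]


print(video_ids(OLD_PLAYER_RE, webpage))
print(video_ids(NEW_PLAYER_RE, webpage))
```

On this fragment the old pattern finds only the `Brid_` call, and the new one finds both.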


@ -736,7 +736,7 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
nbc_data = self._search_json( nbc_data = self._search_json(
r'<script>\s*var\s+nbc\s*=', webpage, 'NBC JSON data', video_id) r'(?:<script>\s*var\s+nbc\s*=|Object\.assign\(nbc,)', webpage, 'NBC JSON data', video_id)
pdk_acct = nbc_data.get('pdkAcct') or 'Yh1nAC' pdk_acct = nbc_data.get('pdkAcct') or 'Yh1nAC'
fw_ssid = traverse_obj(nbc_data, ('video', 'fwSSID')) fw_ssid = traverse_obj(nbc_data, ('video', 'fwSSID'))
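
Note on the NBC change above: the start pattern passed to `_search_json` now also accepts pages that populate the data through `Object.assign(nbc, ...)`. The sketch below imitates that lookup with the standard library only. It is not how `_search_json` is implemented; the helper name `find_nbc_data` and both page fragments are hypothetical.

```python
import json
import re

# Start pattern copied from the new line in this diff.
START_RE = r'(?:<script>\s*var\s+nbc\s*=|Object\.assign\(nbc,)'


def find_nbc_data(webpage):
    """Decode the first JSON object after the start pattern, or return None."""
    mobj = re.search(START_RE, webpage)
    if not mobj:
        return None
    # raw_decode does not skip leading whitespace, so locate the brace first.
    start = webpage.find('{', mobj.end())
    if start < 0:
        return None
    data, _ = json.JSONDecoder().raw_decode(webpage, start)
    return data


# Hypothetical fragments for the two page layouts.
old_page = '<script>var nbc = {"pdkAcct": "Yh1nAC"};</script>'
new_page = '<script>Object.assign(nbc, {"video": {"fwSSID": "example"}});</script>'

print(find_nbc_data(old_page))
print(find_nbc_data(new_page))
```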


@ -27,6 +27,7 @@
traverse_obj, traverse_obj,
try_get, try_get,
unescapeHTML, unescapeHTML,
unified_timestamp,
update_url_query, update_url_query,
url_basename, url_basename,
url_or_none, url_or_none,
@ -985,6 +986,7 @@ def _real_extract(self, url):
'quality': 'abr', 'quality': 'abr',
'protocol': 'hls+fmp4', 'protocol': 'hls+fmp4',
'latency': latency, 'latency': latency,
'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
}, },
'room': { 'room': {
@ -1005,6 +1007,7 @@ def _real_extract(self, url):
if data.get('type') == 'stream': if data.get('type') == 'stream':
m3u8_url = data['data']['uri'] m3u8_url = data['data']['uri']
qualities = data['data']['availableQualities'] qualities = data['data']['availableQualities']
cookies = data['data']['cookies']
break break
elif data.get('type') == 'disconnect': elif data.get('type') == 'disconnect':
self.write_debug(recv) self.write_debug(recv)
@ -1043,6 +1046,11 @@ def _real_extract(self, url):
**res, **res,
}) })
for cookie in cookies:
self._set_cookie(
cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie['expires']), path=cookie['path'], secure=cookie['secure'])
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True) formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt, q in zip(formats, reversed(qualities[1:])): for fmt, q in zip(formats, reversed(qualities[1:])):
fmt.update({ fmt.update({


@ -1,34 +1,46 @@
import json
import re
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
smuggle_url, parse_iso8601,
parse_resolution,
str_or_none, str_or_none,
try_get, url_or_none,
unified_strdate,
unified_timestamp,
) )
from ..utils.traversal import require, traverse_obj, value
class NineNowIE(InfoExtractor): class NineNowIE(InfoExtractor):
IE_NAME = '9now.com.au' IE_NAME = '9now.com.au'
_VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/?#]+/){2}(?P<id>(?P<type>clip|episode)-[^/?#]+)'
_GEO_COUNTRIES = ['AU'] _GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
# clip # clip
'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc', 'url': 'https://www.9now.com.au/today/season-2025/clip-cm8hw9h5z00080hquqa5hszq7',
'md5': '17cf47d63ec9323e562c9957a968b565',
'info_dict': { 'info_dict': {
'id': '16801', 'id': '6370295582112',
'ext': 'mp4', 'ext': 'mp4',
'title': 'St. Kilda\'s Joey Montagna on the potential for a player\'s strike', 'title': 'Would Karl Stefanovic be able to land a plane?',
'description': 'Is a boycott of the NAB Cup "on the table"?', 'description': 'The Today host\'s skills are put to the test with the latest simulation tech.',
'uploader_id': '4460760524001', 'uploader_id': '4460760524001',
'upload_date': '20160713', 'duration': 197.376,
'timestamp': 1468421266, 'tags': ['flights', 'technology', 'Karl Stefanovic'],
'season': 'Season 2025',
'season_number': 2025,
'series': 'TODAY',
'timestamp': 1742507988,
'upload_date': '20250320',
'release_timestamp': 1742507983,
'release_date': '20250320',
'thumbnail': r're:https?://.+/1920x0/.+\.jpg',
},
'params': {
'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
}, },
'skip': 'Only available in Australia',
}, { }, {
# episode # episode
'url': 'https://www.9now.com.au/afl-footy-show/2016/episode-19', 'url': 'https://www.9now.com.au/afl-footy-show/2016/episode-19',
@ -41,7 +53,7 @@ class NineNowIE(InfoExtractor):
# episode of series # episode of series
'url': 'https://www.9now.com.au/lego-masters/season-3/episode-3', 'url': 'https://www.9now.com.au/lego-masters/season-3/episode-3',
'info_dict': { 'info_dict': {
'id': '6249614030001', 'id': '6308830406112',
'title': 'Episode 3', 'title': 'Episode 3',
'ext': 'mp4', 'ext': 'mp4',
'season_number': 3, 'season_number': 3,
@ -50,72 +62,87 @@ class NineNowIE(InfoExtractor):
'uploader_id': '4460760524001', 'uploader_id': '4460760524001',
'timestamp': 1619002200, 'timestamp': 1619002200,
'upload_date': '20210421', 'upload_date': '20210421',
'duration': 3574.085,
'thumbnail': r're:https?://.+/1920x0/.+\.jpg',
'tags': ['episode'],
'series': 'Lego Masters',
'season': 'Season 3',
'episode': 'Episode 3',
'release_timestamp': 1619002200,
'release_date': '20210421',
}, },
'expected_warnings': ['Ignoring subtitle tracks'],
'params': { 'params': {
'skip_download': True, 'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
},
}, {
'url': 'https://www.9now.com.au/married-at-first-sight/season-12/episode-1',
'info_dict': {
'id': '6367798770112',
'ext': 'mp4',
'title': 'Episode 1',
'description': r're:The cultural sensation of Married At First Sight returns with our first weddings! .{90}$',
'uploader_id': '4460760524001',
'duration': 5415.079,
'thumbnail': r're:https?://.+/1920x0/.+\.png',
'tags': ['episode'],
'season': 'Season 12',
'season_number': 12,
'episode': 'Episode 1',
'episode_number': 1,
'series': 'Married at First Sight',
'timestamp': 1737973800,
'upload_date': '20250127',
'release_timestamp': 1737973800,
'release_date': '20250127',
},
'params': {
'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
}, },
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}'
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv and yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id, video_type = self._match_valid_url(url).group('id', 'type')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage,
'page data', default='{}'), display_id, fatal=False)
if not page_data:
page_data = self._parse_json(self._parse_json(self._search_regex(
r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;',
webpage, 'page data'), display_id), display_id)
for kind in ('episode', 'clip'): common_data = traverse_obj(
current_key = page_data.get(kind, {}).get( re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
f'current{kind.capitalize()}Key') (..., {json.loads}, ..., {self._find_json},
if not current_key: lambda _, v: v['payload'][video_type]['slug'] == display_id,
continue 'payload', any, {require('video data')}))
cache = page_data.get(kind, {}).get(f'{kind}Cache', {})
if not cache:
continue
common_data = {
'episode': (cache.get(current_key) or next(iter(cache.values())))[kind],
'season': (cache.get(current_key) or next(iter(cache.values()))).get('season', None),
}
break
else:
raise ExtractorError('Unable to find video data')
if not self.get_param('allow_unplayable_formats') and try_get(common_data, lambda x: x['episode']['video']['drm'], bool): if traverse_obj(common_data, (video_type, 'video', 'drm', {bool})):
self.report_drm(display_id) self.report_drm(display_id)
brightcove_id = try_get( brightcove_id = traverse_obj(common_data, (
common_data, lambda x: x['episode']['video']['brightcoveId'], str) or 'ref:{}'.format(common_data['episode']['video']['referenceId']) video_type, 'video', (
video_id = str_or_none(try_get(common_data, lambda x: x['episode']['video']['id'])) or brightcove_id ('brightcoveId', {str}),
('referenceId', {str}, {lambda x: f'ref:{x}' if x else None}),
title = try_get(common_data, lambda x: x['episode']['name'], str) ), any, {require('brightcove ID')}))
season_number = try_get(common_data, lambda x: x['season']['seasonNumber'], int)
episode_number = try_get(common_data, lambda x: x['episode']['episodeNumber'], int)
timestamp = unified_timestamp(try_get(common_data, lambda x: x['episode']['airDate'], str))
release_date = unified_strdate(try_get(common_data, lambda x: x['episode']['availability'], str))
thumbnails_data = try_get(common_data, lambda x: x['episode']['image']['sizes'], dict) or {}
thumbnails = [{
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_id[1:]),
} for thumbnail_id, thumbnail_url in thumbnails_data.items()]
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': smuggle_url( 'ie_key': BrightcoveNewIE.ie_key(),
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'url': self.BRIGHTCOVE_URL_TEMPLATE.format(brightcove_id),
{'geo_countries': self._GEO_COUNTRIES}), **traverse_obj(common_data, {
'id': video_id, 'id': (video_type, 'video', 'id', {int}, ({str_or_none}, {value(brightcove_id)}), any),
'title': title, 'title': (video_type, 'name', {str}),
'description': try_get(common_data, lambda x: x['episode']['description'], str), 'description': (video_type, 'description', {str}),
'duration': float_or_none(try_get(common_data, lambda x: x['episode']['video']['duration'], float), 1000), 'duration': (video_type, 'video', 'duration', {float_or_none(scale=1000)}),
'thumbnails': thumbnails, 'tags': (video_type, 'tags', ..., 'name', {str}, all, filter),
'ie_key': 'BrightcoveNew', 'series': ('tvSeries', 'name', {str}),
'season_number': season_number, 'season_number': ('season', 'seasonNumber', {int_or_none}),
'episode_number': episode_number, 'episode_number': ('episode', 'episodeNumber', {int_or_none}),
'timestamp': timestamp, 'timestamp': ('episode', 'airDate', {parse_iso8601}),
'release_date': release_date, 'release_timestamp': (video_type, 'availability', {parse_iso8601}),
'thumbnails': (video_type, 'image', 'sizes', {dict.items}, lambda _, v: url_or_none(v[1]), {
'id': 0,
'url': 1,
'width': (1, {parse_resolution}, 'width'),
}),
}),
} }
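The new NineNow code prefers an explicit `brightcoveId` and otherwise falls back to `ref:<referenceId>`. The sketch below approximates that selection in plain Python so it runs without yt-dlp. `pick_brightcove_id` and the sample reference ID are hypothetical names used only for illustration, and the real extractor does this through `traverse_obj` and `require`.

```python
# Template as it appears on the new side of the diff
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}'


def pick_brightcove_id(video):
    """Hypothetical helper: prefer 'brightcoveId', else 'ref:<referenceId>'."""
    brightcove_id = video.get('brightcoveId')
    if isinstance(brightcove_id, str):
        return brightcove_id
    reference_id = video.get('referenceId')
    if isinstance(reference_id, str) and reference_id:
        return f'ref:{reference_id}'
    # Stand-in for the fatal error the extractor raises when nothing is found
    raise ValueError('Unable to extract brightcove ID')


def brightcove_url(video):
    """Build the player URL that is handed to the Brightcove extractor."""
    return BRIGHTCOVE_URL_TEMPLATE.format(pick_brightcove_id(video))
```

Switching the template from `%s` to `{}` lets `str.format` build the URL.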

@ -11,12 +11,15 @@ class On24IE(InfoExtractor):
IE_NAME = 'on24' IE_NAME = 'on24'
IE_DESC = 'ON24' IE_DESC = 'ON24'
_VALID_URL = r'''(?x) _ID_RE = r'(?P<id>\d{7})'
https?://event\.on24\.com/(?: _KEY_RE = r'(?P<key>[0-9A-F]{32})'
wcc/r/(?P<id_1>\d{7})/(?P<key_1>[0-9A-F]{32})| _URL_BASE_RE = r'https?://event\.on24\.com'
eventRegistration/(?:console/EventConsoleApollo|EventLobbyServlet\?target=lobby30) _URL_QUERY_RE = rf'(?:[^#]*&)?eventid={_ID_RE}&(?:[^#]+&)?key={_KEY_RE}'
\.jsp\?(?:[^/#?]*&)?eventid=(?P<id_2>\d{7})[^/#?]*&key=(?P<key_2>[0-9A-F]{32}) _VALID_URL = [
)''' rf'{_URL_BASE_RE}/wcc/r/{_ID_RE}/{_KEY_RE}',
rf'{_URL_BASE_RE}/eventRegistration/console/(?:EventConsoleApollo\.jsp|apollox/mainEvent/?)\?{_URL_QUERY_RE}',
rf'{_URL_BASE_RE}/eventRegistration/EventLobbyServlet/?\?{_URL_QUERY_RE}',
]
_TESTS = [{ _TESTS = [{
'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?uimode=nextgeneration&eventid=2197467&sessionid=1&key=5DF57BE53237F36A43B478DD36277A84&contenttype=A&eventuserid=305999&playerwidth=1000&playerheight=650&caller=previewLobby&text_language_id=en&format=fhaudio&newConsole=false', 'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?uimode=nextgeneration&eventid=2197467&sessionid=1&key=5DF57BE53237F36A43B478DD36277A84&contenttype=A&eventuserid=305999&playerwidth=1000&playerheight=650&caller=previewLobby&text_language_id=en&format=fhaudio&newConsole=false',
@ -34,12 +37,16 @@ class On24IE(InfoExtractor):
}, { }, {
'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?&eventid=2639291&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=&flashsupportedmobiledevice=&helpcenter=&key=82829018E813065A122363877975752E&newConsole=true&nxChe=true&newTabCon=true&text_language_id=en&playerwidth=748&playerheight=526&eventuserid=338788762&contenttype=A&mediametricsessionid=384764716&mediametricid=3558192&usercd=369267058&mode=launch', 'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?&eventid=2639291&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=&flashsupportedmobiledevice=&helpcenter=&key=82829018E813065A122363877975752E&newConsole=true&nxChe=true&newTabCon=true&text_language_id=en&playerwidth=748&playerheight=526&eventuserid=338788762&contenttype=A&mediametricsessionid=384764716&mediametricid=3558192&usercd=369267058&mode=launch',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://event.on24.com/eventRegistration/EventLobbyServlet?target=reg20.jsp&eventid=3543176&key=BC0F6B968B67C34B50D461D40FDB3E18&groupId=3143628',
'only_matching': True,
}, {
'url': 'https://event.on24.com/eventRegistration/console/apollox/mainEvent?&eventid=4843671&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=&flashsupportedmobiledevice=&helpcenter=&key=4EAC9B5C564CC98FF29E619B06A2F743&newConsole=true&nxChe=true&newTabCon=true&consoleEarEventConsole=false&consoleEarCloudApi=false&text_language_id=en&playerwidth=748&playerheight=526&referrer=https%3A%2F%2Fevent.on24.com%2Finterface%2Fregistration%2Fautoreg%2Findex.html%3Fsessionid%3D1%26eventid%3D4843671%26key%3D4EAC9B5C564CC98FF29E619B06A2F743%26email%3D000a3e42-7952-4dd6-8f8a-34c38ea3cf02%2540platform%26firstname%3Ds%26lastname%3Ds%26deletecookie%3Dtrue%26event_email%3DN%26marketing_email%3DN%26std1%3D0642572014177%26std2%3D0642572014179%26std3%3D550165f7-a44e-4725-9fe6-716f89908c2b%26std4%3D0&eventuserid=745776448&contenttype=A&mediametricsessionid=640613707&mediametricid=6810717&usercd=745776448&mode=launch',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) event_id, event_key = self._match_valid_url(url).group('id', 'key')
event_id = mobj.group('id_1') or mobj.group('id_2')
event_key = mobj.group('key_1') or mobj.group('key_2')
event_data = self._download_json( event_data = self._download_json(
'https://event.on24.com/apic/utilApp/EventConsoleCachedServlet', 'https://event.on24.com/apic/utilApp/EventConsoleCachedServlet',
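The ON24 change replaces one verbose pattern with numbered groups (`id_1`, `id_2`, ...) by a list of patterns built from shared fragments, so every pattern exposes the same `id` and `key` groups. The sketch below reproduces those fragments with only the standard `re` module. `match_event` is a hypothetical stand-in for `_match_valid_url`, not the yt-dlp implementation.

```python
import re

# Fragments copied from the new side of the diff
_ID_RE = r'(?P<id>\d{7})'
_KEY_RE = r'(?P<key>[0-9A-F]{32})'
_URL_BASE_RE = r'https?://event\.on24\.com'
_URL_QUERY_RE = rf'(?:[^#]*&)?eventid={_ID_RE}&(?:[^#]+&)?key={_KEY_RE}'
VALID_URLS = [
    rf'{_URL_BASE_RE}/wcc/r/{_ID_RE}/{_KEY_RE}',
    rf'{_URL_BASE_RE}/eventRegistration/console/(?:EventConsoleApollo\.jsp|apollox/mainEvent/?)\?{_URL_QUERY_RE}',
    rf'{_URL_BASE_RE}/eventRegistration/EventLobbyServlet/?\?{_URL_QUERY_RE}',
]


def match_event(url):
    """Hypothetical helper: return (event_id, event_key) from the first matching pattern."""
    for pattern in VALID_URLS:
        mobj = re.match(pattern, url)
        if mobj:
            return mobj.group('id', 'key')
    return None
```

Because each pattern defines `id` and `key` exactly once, the caller no longer needs the `group('id_1') or group('id_2')` fallback.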

@ -67,7 +67,7 @@ def _extract_movie(self, webpage, video_id, name, is_live):
class OpenRecIE(OpenRecBaseIE): class OpenRecIE(OpenRecBaseIE):
IE_NAME = 'openrec' IE_NAME = 'openrec'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.openrec.tv/live/2p8v31qe4zy', 'url': 'https://www.openrec.tv/live/2p8v31qe4zy',
'only_matching': True, 'only_matching': True,
@ -85,7 +85,7 @@ def _real_extract(self, url):
class OpenRecCaptureIE(OpenRecBaseIE): class OpenRecCaptureIE(OpenRecBaseIE):
IE_NAME = 'openrec:capture' IE_NAME = 'openrec:capture'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.openrec.tv/capture/l9nk2x4gn14', 'url': 'https://www.openrec.tv/capture/l9nk2x4gn14',
'only_matching': True, 'only_matching': True,
@ -129,7 +129,7 @@ def _real_extract(self, url):
class OpenRecMovieIE(OpenRecBaseIE): class OpenRecMovieIE(OpenRecBaseIE):
IE_NAME = 'openrec:movie' IE_NAME = 'openrec:movie'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.openrec.tv/movie/nqz5xl5km8v', 'url': 'https://www.openrec.tv/movie/nqz5xl5km8v',
'info_dict': { 'info_dict': {
@ -141,6 +141,9 @@ class OpenRecMovieIE(OpenRecBaseIE):
'uploader_id': 'taiki_to_kazuhiro', 'uploader_id': 'taiki_to_kazuhiro',
'timestamp': 1638856800, 'timestamp': 1638856800,
}, },
}, {
'url': 'https://www.openrec.tv/movie/2p8vvex548y?playlist_id=98brq96vvsgn2nd',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
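The OpenRec patterns change the ID character class from `[^/]+` to `[^/?#]+`. The short check below, using only `re`, shows the likely motivation with the new `only_matching` test URL: the old class also captures the query string. The variable names are illustrative.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/]+)'
NEW_VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/?#]+)'

url = 'https://www.openrec.tv/movie/2p8vvex548y?playlist_id=98brq96vvsgn2nd'

# The old class stops only at '/', so the query string leaks into the ID
old_id = re.match(OLD_VALID_URL, url).group('id')
# The new class also stops at '?' and '#'
new_id = re.match(NEW_VALID_URL, url).group('id')
```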

101
yt_dlp/extractor/parti.py Normal file

@ -0,0 +1,101 @@
from .common import InfoExtractor
from ..utils import UserNotLive, int_or_none, parse_iso8601, url_or_none, urljoin
from ..utils.traversal import traverse_obj
class PartiBaseIE(InfoExtractor):
def _call_api(self, path, video_id, note=None):
return self._download_json(
f'https://api-backend.parti.com/parti_v2/profile/{path}', video_id, note)
class PartiVideoIE(PartiBaseIE):
IE_NAME = 'parti:video'
_VALID_URL = r'https?://(?:www\.)?parti\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'https://parti.com/video/66284',
'info_dict': {
'id': '66284',
'ext': 'mp4',
'title': 'NOW LIVE ',
'upload_date': '20250327',
'categories': ['Gaming'],
'thumbnail': 'https://assets.parti.com/351424_eb9e5250-2821-484a-9c5f-ca99aa666c87.png',
'channel': 'ItZTMGG',
'timestamp': 1743044379,
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._call_api(f'get_livestream_channel_info/recent/{video_id}', video_id)
return {
'id': video_id,
'formats': self._extract_m3u8_formats(
urljoin('https://watch.parti.com', data['livestream_recording']), video_id, 'mp4'),
**traverse_obj(data, {
'title': ('event_title', {str}),
'channel': ('user_name', {str}),
'thumbnail': ('event_file', {url_or_none}),
'categories': ('category_name', {str}, filter, all),
'timestamp': ('event_start_ts', {int_or_none}),
}),
}
class PartiLivestreamIE(PartiBaseIE):
IE_NAME = 'parti:livestream'
_VALID_URL = r'https?://(?:www\.)?parti\.com/creator/(?P<service>[\w]+)/(?P<id>[\w/-]+)'
_TESTS = [{
'url': 'https://parti.com/creator/parti/Capt_Robs_Adventures',
'info_dict': {
'id': 'Capt_Robs_Adventures',
'ext': 'mp4',
'title': r"re:I'm Live on Parti \d{4}-\d{2}-\d{2} \d{2}:\d{2}",
'view_count': int,
'thumbnail': r're:https://assets\.parti\.com/.+\.png',
'timestamp': 1743879776,
'upload_date': '20250405',
'live_status': 'is_live',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://parti.com/creator/discord/sazboxgaming/0',
'only_matching': True,
}]
def _real_extract(self, url):
service, creator_slug = self._match_valid_url(url).group('service', 'id')
encoded_creator_slug = creator_slug.replace('/', '%23')
creator_id = self._call_api(
f'get_user_by_social_media/{service}/{encoded_creator_slug}',
creator_slug, note='Fetching user ID')
data = self._call_api(
f'get_livestream_channel_info/{creator_id}', creator_id,
note='Fetching user profile feed')['channel_info']
if not traverse_obj(data, ('channel', 'is_live', {bool})):
raise UserNotLive(video_id=creator_id)
channel_info = data['channel']
return {
'id': creator_slug,
'formats': self._extract_m3u8_formats(
channel_info['playback_url'], creator_slug, live=True, query={
'token': channel_info['playback_auth_token'],
'player_version': '1.17.0',
}),
'is_live': True,
**traverse_obj(data, {
'title': ('livestream_event_info', 'event_name', {str}),
'description': ('livestream_event_info', 'event_description', {str}),
'thumbnail': ('livestream_event_info', 'livestream_preview_file', {url_or_none}),
'timestamp': ('stream', 'start_time', {parse_iso8601}),
'view_count': ('stream', 'viewer_count', {int_or_none}),
}),
}
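`PartiLivestreamIE` allows `/` inside the creator slug and rewrites it to `%23` before calling the profile API. The sketch below shows that URL construction with `re` alone. `user_lookup_url` is a hypothetical helper; the regex, API base and path come from the diff above.

```python
import re

API_BASE = 'https://api-backend.parti.com/parti_v2/profile'
CREATOR_URL_RE = r'https?://(?:www\.)?parti\.com/creator/(?P<service>[\w]+)/(?P<id>[\w/-]+)'


def user_lookup_url(url):
    """Hypothetical helper: build the 'get_user_by_social_media' endpoint for a creator page."""
    service, creator_slug = re.match(CREATOR_URL_RE, url).group('service', 'id')
    # A slash inside the slug is sent to the API as '%23'
    encoded_creator_slug = creator_slug.replace('/', '%23')
    return f'{API_BASE}/get_user_by_social_media/{service}/{encoded_creator_slug}'
```

In the extractor, the value returned by this endpoint is then used as the creator ID for the `get_livestream_channel_info` request.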

@ -23,9 +23,9 @@ class PinterestBaseIE(InfoExtractor):
def _call_api(self, resource, video_id, options): def _call_api(self, resource, video_id, options):
return self._download_json( return self._download_json(
f'https://www.pinterest.com/resource/{resource}Resource/get/', f'https://www.pinterest.com/resource/{resource}Resource/get/',
video_id, f'Download {resource} JSON metadata', query={ video_id, f'Download {resource} JSON metadata',
'data': json.dumps({'options': options}), query={'data': json.dumps({'options': options})},
})['resource_response'] headers={'X-Pinterest-PWS-Handler': 'www/[username].js'})['resource_response']
def _extract_video(self, data, extract_formats=True): def _extract_video(self, data, extract_formats=True):
video_id = data['id'] video_id = data['id']

@ -22,7 +22,7 @@
) )
class PolskieRadioBaseExtractor(InfoExtractor): class PolskieRadioBaseIE(InfoExtractor):
def _extract_webpage_player_entries(self, webpage, playlist_id, base_data): def _extract_webpage_player_entries(self, webpage, playlist_id, base_data):
media_urls = set() media_urls = set()
@ -47,7 +47,7 @@ def _extract_webpage_player_entries(self, webpage, playlist_id, base_data):
yield entry yield entry
class PolskieRadioLegacyIE(PolskieRadioBaseExtractor): class PolskieRadioLegacyIE(PolskieRadioBaseIE):
# legacy sites # legacy sites
IE_NAME = 'polskieradio:legacy' IE_NAME = 'polskieradio:legacy'
_VALID_URL = r'https?://(?:www\.)?polskieradio(?:24)?\.pl/\d+/\d+/[Aa]rtykul/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?polskieradio(?:24)?\.pl/\d+/\d+/[Aa]rtykul/(?P<id>\d+)'
@ -127,7 +127,7 @@ def _real_extract(self, url):
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
class PolskieRadioIE(PolskieRadioBaseExtractor): class PolskieRadioIE(PolskieRadioBaseIE):
# new next.js sites # new next.js sites
_VALID_URL = r'https?://(?:[^/]+\.)?(?:polskieradio(?:24)?|radiokierowcow)\.pl/artykul/(?P<id>\d+)' _VALID_URL = r'https?://(?:[^/]+\.)?(?:polskieradio(?:24)?|radiokierowcow)\.pl/artykul/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -519,7 +519,7 @@ def _real_extract(self, url):
} }
class PolskieRadioPodcastBaseExtractor(InfoExtractor): class PolskieRadioPodcastBaseIE(InfoExtractor):
_API_BASE = 'https://apipodcasts.polskieradio.pl/api' _API_BASE = 'https://apipodcasts.polskieradio.pl/api'
def _parse_episode(self, data): def _parse_episode(self, data):
@ -539,7 +539,7 @@ def _parse_episode(self, data):
} }
class PolskieRadioPodcastListIE(PolskieRadioPodcastBaseExtractor): class PolskieRadioPodcastListIE(PolskieRadioPodcastBaseIE):
IE_NAME = 'polskieradio:podcast:list' IE_NAME = 'polskieradio:podcast:list'
_VALID_URL = r'https?://podcasty\.polskieradio\.pl/podcast/(?P<id>\d+)' _VALID_URL = r'https?://podcasty\.polskieradio\.pl/podcast/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -578,7 +578,7 @@ def get_page(page_num):
} }
class PolskieRadioPodcastIE(PolskieRadioPodcastBaseExtractor): class PolskieRadioPodcastIE(PolskieRadioPodcastBaseIE):
IE_NAME = 'polskieradio:podcast' IE_NAME = 'polskieradio:podcast'
_VALID_URL = r'https?://podcasty\.polskieradio\.pl/track/(?P<id>[a-f\d]{8}(?:-[a-f\d]{4}){4}[a-f\d]{8})' _VALID_URL = r'https?://podcasty\.polskieradio\.pl/track/(?P<id>[a-f\d]{8}(?:-[a-f\d]{4}){4}[a-f\d]{8})'
_TESTS = [{ _TESTS = [{

@ -8,6 +8,7 @@
int_or_none, int_or_none,
parse_qs, parse_qs,
traverse_obj, traverse_obj,
truncate_string,
try_get, try_get,
unescapeHTML, unescapeHTML,
update_url_query, update_url_query,
@ -26,6 +27,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': '6rrwyj', 'display_id': '6rrwyj',
'title': 'That small heart attack.', 'title': 'That small heart attack.',
'alt_title': 'That small heart attack.',
'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:4', 'thumbnails': 'count:4',
'timestamp': 1501941939, 'timestamp': 1501941939,
@ -49,7 +51,8 @@ class RedditIE(InfoExtractor):
'id': 'gyh95hiqc0b11', 'id': 'gyh95hiqc0b11',
'ext': 'mp4', 'ext': 'mp4',
'display_id': '90bu6w', 'display_id': '90bu6w',
'title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead', 'title': 'Heat index was 110 degrees so we offered him a cold drink. He went fo...',
'alt_title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead',
'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:7', 'thumbnails': 'count:7',
'timestamp': 1532051078, 'timestamp': 1532051078,
@ -69,7 +72,8 @@ class RedditIE(InfoExtractor):
'id': 'zasobba6wp071', 'id': 'zasobba6wp071',
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'nip71r', 'display_id': 'nip71r',
'title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.', 'title': 'I plan to make more stickers and prints! Check them out on my Etsy! O...',
'alt_title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.',
'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:5', 'thumbnails': 'count:5',
'timestamp': 1621709093, 'timestamp': 1621709093,
@ -91,7 +95,17 @@ class RedditIE(InfoExtractor):
'playlist_count': 2, 'playlist_count': 2,
'info_dict': { 'info_dict': {
'id': 'wzqkxp', 'id': 'wzqkxp',
'title': 'md5:72d3d19402aa11eff5bd32fc96369b37', 'title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the ...',
'alt_title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the Day We Meet Again" Discussion',
'description': 'md5:5b7deb328062b164b15704c5fd67c335',
'uploader': 'TheTwelveYearOld',
'channel_id': 'KamenRider',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 0,
'timestamp': 1661676059.0,
'upload_date': '20220828',
}, },
}, { }, {
# crossposted reddit-hosted media # crossposted reddit-hosted media
@ -102,6 +116,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'zjjw82', 'display_id': 'zjjw82',
'title': 'Cringe', 'title': 'Cringe',
'alt_title': 'Cringe',
'uploader': 'Otaku-senpai69420', 'uploader': 'Otaku-senpai69420',
'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'upload_date': '20221212', 'upload_date': '20221212',
@ -122,6 +137,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': '124pp33', 'display_id': '124pp33',
'title': 'Harmless prank of some old friends', 'title': 'Harmless prank of some old friends',
'alt_title': 'Harmless prank of some old friends',
'uploader': 'Dudezila', 'uploader': 'Dudezila',
'channel_id': 'ContagiousLaughter', 'channel_id': 'ContagiousLaughter',
'duration': 17, 'duration': 17,
@ -142,6 +158,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': '12fujy3', 'display_id': '12fujy3',
'title': 'Based Hasan?', 'title': 'Based Hasan?',
'alt_title': 'Based Hasan?',
'uploader': 'KingNigelXLII', 'uploader': 'KingNigelXLII',
'channel_id': 'GenZedong', 'channel_id': 'GenZedong',
'duration': 16, 'duration': 16,
@ -161,6 +178,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': '1cl9h0u', 'display_id': '1cl9h0u',
'title': 'The insurance claim will be interesting', 'title': 'The insurance claim will be interesting',
'alt_title': 'The insurance claim will be interesting',
'uploader': 'darrenpauli', 'uploader': 'darrenpauli',
'channel_id': 'Unexpected', 'channel_id': 'Unexpected',
'duration': 53, 'duration': 53,
@ -183,6 +201,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': '1cxwzso', 'display_id': '1cxwzso',
'title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'', 'title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
'alt_title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
'uploader': 'Woodstovia', 'uploader': 'Woodstovia',
'channel_id': 'soccer', 'channel_id': 'soccer',
'duration': 30, 'duration': 30,
@ -206,6 +225,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'degtjo', 'display_id': 'degtjo',
'title': 'When the K hits', 'title': 'When the K hits',
'alt_title': 'When the K hits',
'uploader': '[deleted]', 'uploader': '[deleted]',
'channel_id': 'ketamine', 'channel_id': 'ketamine',
'comment_count': int, 'comment_count': int,
@ -304,14 +324,6 @@ def _real_extract(self, url):
data = data[0]['data']['children'][0]['data'] data = data[0]['data']['children'][0]['data']
video_url = data['url'] video_url = data['url']
over_18 = data.get('over_18')
if over_18 is True:
age_limit = 18
elif over_18 is False:
age_limit = 0
else:
age_limit = None
thumbnails = [] thumbnails = []
def add_thumbnail(src): def add_thumbnail(src):
@ -337,15 +349,19 @@ def add_thumbnail(src):
add_thumbnail(resolution) add_thumbnail(resolution)
info = { info = {
'title': data.get('title'),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'timestamp': float_or_none(data.get('created_utc')), 'age_limit': {True: 18, False: 0}.get(data.get('over_18')),
'uploader': data.get('author'), **traverse_obj(data, {
'channel_id': data.get('subreddit'), 'title': ('title', {truncate_string(left=72)}),
'like_count': int_or_none(data.get('ups')), 'alt_title': ('title', {str}),
'dislike_count': int_or_none(data.get('downs')), 'description': ('selftext', {str}, filter),
'comment_count': int_or_none(data.get('num_comments')), 'timestamp': ('created_utc', {float_or_none}),
'age_limit': age_limit, 'uploader': ('author', {str}),
'channel_id': ('subreddit', {str}),
'like_count': ('ups', {int_or_none}),
'dislike_count': ('downs', {int_or_none}),
'comment_count': ('num_comments', {int_or_none}),
}),
} }
parsed_url = urllib.parse.urlparse(video_url) parsed_url = urllib.parse.urlparse(video_url)
@ -371,7 +387,7 @@ def add_thumbnail(src):
**info, **info,
}) })
if entries: if entries:
return self.playlist_result(entries, video_id, info.get('title')) return self.playlist_result(entries, video_id, **info)
raise ExtractorError('No media found', expected=True) raise ExtractorError('No media found', expected=True)
# Check if media is hosted on reddit: # Check if media is hosted on reddit:
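The Reddit change keeps the full post title in `alt_title` and stores a version capped at 72 characters in `title`. The sketch below is a plain-Python approximation inferred from the updated test expectations. `truncate_title` and `age_limit_from` are hypothetical stand-ins, not yt-dlp's `truncate_string`.

```python
def truncate_title(title, limit=72):
    """Hypothetical stand-in: cap a title at `limit` characters, ending in '...'."""
    if title is None or len(title) <= limit:
        return title
    # Reserve three characters for the ellipsis so the result is exactly `limit` long
    return f'{title[:limit - 3]}...'


def age_limit_from(over_18):
    """Map Reddit's over_18 flag to an age limit; anything else yields None."""
    return {True: 18, False: 0}.get(over_18)
```

The dictionary lookup covers the same cases as the removed `if`/`elif`/`else` block.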

@ -12,7 +12,7 @@
) )
class RedGifsBaseInfoExtractor(InfoExtractor): class RedGifsBaseIE(InfoExtractor):
_FORMATS = { _FORMATS = {
'gif': 250, 'gif': 250,
'sd': 480, 'sd': 480,
@ -113,7 +113,7 @@ def _paged_entries(self, ep, item_id, query, fields):
return page_fetcher(page) if page else OnDemandPagedList(page_fetcher, self._PAGE_SIZE) return page_fetcher(page) if page else OnDemandPagedList(page_fetcher, self._PAGE_SIZE)
class RedGifsIE(RedGifsBaseInfoExtractor): class RedGifsIE(RedGifsBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)' _VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent', 'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent',
@ -172,7 +172,7 @@ def _real_extract(self, url):
return self._parse_gif_data(video_info['gif']) return self._parse_gif_data(video_info['gif'])
class RedGifsSearchIE(RedGifsBaseInfoExtractor): class RedGifsSearchIE(RedGifsBaseIE):
IE_DESC = 'Redgifs search' IE_DESC = 'Redgifs search'
_VALID_URL = r'https?://(?:www\.)?redgifs\.com/browse\?(?P<query>[^#]+)' _VALID_URL = r'https?://(?:www\.)?redgifs\.com/browse\?(?P<query>[^#]+)'
_PAGE_SIZE = 80 _PAGE_SIZE = 80
@ -226,7 +226,7 @@ def _real_extract(self, url):
entries, query_str, tags, f'RedGifs search for {tags}, ordered by {order}') entries, query_str, tags, f'RedGifs search for {tags}, ordered by {order}')
class RedGifsUserIE(RedGifsBaseInfoExtractor): class RedGifsUserIE(RedGifsBaseIE):
IE_DESC = 'Redgifs user' IE_DESC = 'Redgifs user'
_VALID_URL = r'https?://(?:www\.)?redgifs\.com/users/(?P<username>[^/?#]+)(?:\?(?P<query>[^#]+))?' _VALID_URL = r'https?://(?:www\.)?redgifs\.com/users/(?P<username>[^/?#]+)(?:\?(?P<query>[^#]+))?'
_PAGE_SIZE = 80 _PAGE_SIZE = 80

43
yt_dlp/extractor/roya.py Normal file

@ -0,0 +1,43 @@
from .common import InfoExtractor
from ..utils.traversal import traverse_obj
class RoyaLiveIE(InfoExtractor):
_VALID_URL = r'https?://roya\.tv/live-stream/(?P<id>\d+)'
_TESTS = [{
'url': 'https://roya.tv/live-stream/1',
'info_dict': {
'id': '1',
'title': r're:Roya TV \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'ext': 'mp4',
'live_status': 'is_live',
},
}, {
'url': 'https://roya.tv/live-stream/21',
'info_dict': {
'id': '21',
'title': r're:Roya News \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'ext': 'mp4',
'live_status': 'is_live',
},
}, {
'url': 'https://roya.tv/live-stream/10000',
'only_matching': True,
}]
def _real_extract(self, url):
media_id = self._match_id(url)
stream_url = self._download_json(
f'https://ticket.roya-tv.com/api/v5/fastchannel/{media_id}', media_id)['data']['secured_url']
title = traverse_obj(
self._download_json('https://backend.roya.tv/api/v01/channels/schedule-pagination', media_id, fatal=False),
('data', 0, 'channel', lambda _, v: str(v['id']) == media_id, 'title', {str}, any))
return {
'id': media_id,
'formats': self._extract_m3u8_formats(stream_url, media_id, 'mp4', m3u8_id='hls', live=True),
'title': title,
'is_live': True,
}
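`RoyaLiveIE` takes the title from a schedule response by selecting the channel whose `id` matches the ID in the URL. The sketch below does the same lookup in plain Python. The payload shape is an assumption inferred from the traversal path `('data', 0, 'channel', ...)`, and `find_channel_title` is a hypothetical helper.

```python
def find_channel_title(schedule, media_id):
    """Hypothetical helper: return the title of the channel whose id equals media_id."""
    try:
        channels = schedule['data'][0]['channel']
    except (KeyError, IndexError, TypeError):
        # Missing or malformed response: no title
        return None
    if not isinstance(channels, list):
        return None
    for channel in channels:
        if not isinstance(channel, dict):
            continue
        title = channel.get('title')
        # IDs may be numeric in the response, so compare as strings
        if str(channel.get('id')) == media_id and isinstance(title, str):
            return title
    return None
```

Returning `None` on a malformed response mirrors the non-fatal schedule download: the stream URL is still extracted when the title lookup fails.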

@ -3,12 +3,20 @@
import re import re
import urllib.parse import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor, Request
from ..utils import js_to_json from ..utils import (
determine_ext,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
url_or_none,
)
from ..utils.traversal import traverse_obj
class RTPIE(InfoExtractor): class RTPIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:(?:estudoemcasa|palco|zigzag)/)?p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:[^/#?]+/)?p(?P<program_id>\d+)/(?P<id>e\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas', 'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas',
'md5': 'e736ce0c665e459ddb818546220b4ef8', 'md5': 'e736ce0c665e459ddb818546220b4ef8',
@ -16,99 +24,173 @@ class RTPIE(InfoExtractor):
'id': 'e174042', 'id': 'e174042',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Paixões Cruzadas', 'title': 'Paixões Cruzadas',
'description': 'As paixões musicais de António Cartaxo e António Macedo', 'description': 'md5:af979e58ba0ab73f78435fc943fdb070',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'series': 'Paixões Cruzadas',
'duration': 2950.0,
'modified_timestamp': 1553693464,
'modified_date': '20190327',
'timestamp': 1417219200,
'upload_date': '20141129',
}, },
}, { }, {
'url': 'https://www.rtp.pt/play/zigzag/p13166/e757904/25-curiosidades-25-de-abril', 'url': 'https://www.rtp.pt/play/zigzag/p13166/e757904/25-curiosidades-25-de-abril',
'md5': '9a81ed53f2b2197cfa7ed455b12f8ade', 'md5': '5b4859940e3adef61247a77dfb76046a',
'info_dict': { 'info_dict': {
'id': 'e757904', 'id': 'e757904',
'ext': 'mp4', 'ext': 'mp4',
'title': '25 Curiosidades, 25 de Abril', 'title': 'Estudar ou não estudar',
'description': 'Estudar ou não estudar - Em cada um dos episódios descobrimos uma curiosidade acerca de como era viver em Portugal antes da revolução do 25 de abr', 'description': 'md5:3bfd7eb8bebfd5711a08df69c9c14c35',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1711958401,
'duration': 146.0,
'upload_date': '20240401',
'modified_timestamp': 1712242991,
'series': '25 Curiosidades, 25 de Abril',
'episode_number': 2,
'episode': 'Estudar ou não estudar',
'modified_date': '20240404',
}, },
}, { }, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas', # Episode not accessible through API
'only_matching': True, 'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/e500050/portugues-1-ano',
}, { 'md5': '57660c0b46db9f22118c52cbd65975e4',
'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/portugues-1-ano', 'info_dict': {
'only_matching': True, 'id': 'e500050',
}, { 'ext': 'mp4',
'url': 'https://www.rtp.pt/play/palco/p13785/l7nnon', 'title': 'Português - 1.º ano',
'only_matching': True, 'duration': 1669.0,
'description': 'md5:be68925c81269f8c6886589f25fe83ea',
'upload_date': '20201020',
'timestamp': 1603180799,
'thumbnail': 'https://cdn-images.rtp.pt/EPG/imagens/39482_59449_64850.png?v=3&w=860',
},
}] }]
_USER_AGENT = 'rtpplay/2.0.66 (pt.rtp.rtpplay; build:2066; iOS 15.8.3) Alamofire/5.9.1'
_AUTH_TOKEN = None
def _fetch_auth_token(self):
if self._AUTH_TOKEN:
return self._AUTH_TOKEN
self._AUTH_TOKEN = traverse_obj(self._download_json(Request(
'https://rtpplayapi.rtp.pt/play/api/2/token-manager',
headers={
'Accept': '*/*',
'rtp-play-auth': 'RTPPLAY_MOBILE_IOS',
'rtp-play-auth-hash': 'fac9c328b2f27e26e03d7f8942d66c05b3e59371e16c2a079f5c83cc801bd3ee',
'rtp-play-auth-timestamp': '2145973229682',
'User-Agent': self._USER_AGENT,
}, extensions={'keep_header_casing': True}), None,
note='Fetching guest auth token', errnote='Could not fetch guest auth token',
fatal=False), ('token', 'token', {str}))
return self._AUTH_TOKEN
@staticmethod
def _cleanup_media_url(url):
if urllib.parse.urlparse(url).netloc == 'streaming-ondemand.rtp.pt':
return None
return url.replace('/drm-fps/', '/hls/').replace('/drm-dash/', '/dash/')
def _extract_formats(self, media_urls, episode_id):
formats = []
subtitles = {}
for media_url in set(traverse_obj(media_urls, (..., {url_or_none}, {self._cleanup_media_url}))):
ext = determine_ext(media_url)
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
media_url, episode_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
media_url, episode_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({
'url': media_url,
'format_id': 'http',
})
return formats, subtitles
def _extract_from_api(self, program_id, episode_id):
auth_token = self._fetch_auth_token()
if not auth_token:
return
episode_data = traverse_obj(self._download_json(
f'https://www.rtp.pt/play/api/1/get-episode/{program_id}/{episode_id[1:]}', episode_id,
query={'include_assets': 'true', 'include_webparams': 'true'},
headers={
'Accept': '*/*',
'Authorization': f'Bearer {auth_token}',
'User-Agent': self._USER_AGENT,
}, fatal=False), 'result', {dict})
if not episode_data:
return
asset_urls = traverse_obj(episode_data, ('assets', 0, 'asset_url', {dict}))
media_urls = traverse_obj(asset_urls, (
((('hls', 'dash'), 'stream_url'), ('multibitrate', ('url_hls', 'url_dash'))),))
formats, subtitles = self._extract_formats(media_urls, episode_id)
for sub_data in traverse_obj(asset_urls, ('subtitles', 'vtt_list', lambda _, v: url_or_none(v['file']))):
subtitles.setdefault(sub_data.get('code') or 'pt', []).append({
'url': sub_data['file'],
'name': sub_data.get('language'),
})
return {
'id': episode_id,
'formats': formats,
'subtitles': subtitles,
'thumbnail': traverse_obj(episode_data, ('assets', 0, 'asset_thumbnail', {url_or_none})),
**traverse_obj(episode_data, ('episode', {
'title': (('episode_title', 'program_title'), {str}, filter, any),
'alt_title': ('episode_subtitle', {str}, filter),
'description': (('episode_description', 'episode_summary'), {str}, filter, any),
'timestamp': ('episode_air_date', {parse_iso8601(delimiter=' ')}),
'modified_timestamp': ('episode_lastchanged', {parse_iso8601(delimiter=' ')}),
'duration': ('episode_duration_complete', {parse_duration}),
'episode': ('episode_title', {str}, filter),
'episode_number': ('episode_number', {int_or_none}),
'season': ('program_season', {str}, filter),
'series': ('program_title', {str}, filter),
})),
}
_RX_OBFUSCATION = re.compile(r'''(?xs) _RX_OBFUSCATION = re.compile(r'''(?xs)
atob\s*\(\s*decodeURIComponent\s*\(\s* atob\s*\(\s*decodeURIComponent\s*\(\s*
(\[[0-9A-Za-z%,'"]*\]) (\[[0-9A-Za-z%,'"]*\])
\s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\) \s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
''') ''')
def __unobfuscate(self, data, *, video_id): def __unobfuscate(self, data):
if data.startswith('{'): return self._RX_OBFUSCATION.sub(
data = self._RX_OBFUSCATION.sub(
lambda m: json.dumps( lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote( base64.b64decode(urllib.parse.unquote(
''.join(self._parse_json(m.group(1), video_id)), ''.join(json.loads(m.group(1))),
)).decode('iso-8859-1')), )).decode('iso-8859-1')),
data) data)
return js_to_json(data)
def _real_extract(self, url): def _extract_from_html(self, url, episode_id):
video_id = self._match_id(url) webpage = self._download_webpage(url, episode_id)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
f, config = self._search_regex(
r'''(?sx)
(?:var\s+f\s*=\s*(?P<f>".*?"|{[^;]+?});\s*)?
var\s+player1\s+=\s+new\s+RTPPlayer\s*\((?P<config>{(?:(?!\*/).)+?})\);(?!\s*\*/)
''', webpage,
'player config', group=('f', 'config'))
config = self._parse_json(
config, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
f = config['file'] if not f else self._parse_json(
f, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
formats = [] formats = []
if isinstance(f, dict):
f_hls = f.get('hls')
if f_hls is not None:
formats.extend(self._extract_m3u8_formats(
f_hls, video_id, 'mp4', 'm3u8_native', m3u8_id='hls'))
f_dash = f.get('dash')
if f_dash is not None:
formats.extend(self._extract_mpd_formats(f_dash, video_id, mpd_id='dash'))
else:
formats.append({
'format_id': 'f',
'url': f,
'vcodec': 'none' if config.get('mediaType') == 'audio' else None,
})
subtitles = {} subtitles = {}
media_urls = traverse_obj(re.findall(r'(?:var\s+f\s*=|RTPPlayer\({[^}]+file:)\s*({[^}]+}|"[^"]+")', webpage), (
vtt = config.get('vtt') -1, (({self.__unobfuscate}, {js_to_json}, {json.loads}, {dict.values}, ...), {json.loads})))
if vtt is not None: formats, subtitles = self._extract_formats(media_urls, episode_id)
for lcode, lname, url in vtt:
subtitles.setdefault(lcode, []).append({
'name': lname,
'url': url,
})
return { return {
'id': video_id, 'id': episode_id,
'title': title,
'formats': formats, 'formats': formats,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
'subtitles': subtitles, 'subtitles': subtitles,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage, default=None),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image'], webpage, default=None),
**self._search_json_ld(webpage, episode_id, default={}),
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
} }
def _real_extract(self, url):
program_id, episode_id = self._match_valid_url(url).group('program_id', 'id')
return self._extract_from_api(program_id, episode_id) or self._extract_from_html(url, episode_id)
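
The decoding done by `__unobfuscate` above can be tried outside the extractor. This is a minimal standalone sketch, not the extractor's code: the function name `unobfuscate` and the sample URL are illustrative assumptions, and it only handles double-quoted array literals because it parses them with `json.loads`.

```python
import base64
import json
import re
import urllib.parse

# Same pattern as in the diff: atob(decodeURIComponent([...].join("")))
RX_OBFUSCATION = re.compile(r'''(?xs)
    atob\s*\(\s*decodeURIComponent\s*\(\s*
        (\[[0-9A-Za-z%,'"]*\])
    \s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
''')


def unobfuscate(data):
    """Replace each obfuscated call with the JSON string literal it decodes to."""
    def _decode(match):
        # The captured group is a JS array of string chunks; join them,
        # undo the percent-encoding, then undo the base64 encoding.
        joined = ''.join(json.loads(match.group(1)))
        decoded = base64.b64decode(urllib.parse.unquote(joined)).decode('iso-8859-1')
        return json.dumps(decoded)

    return RX_OBFUSCATION.sub(_decode, data)


if __name__ == '__main__':
    # Build a sample in the obfuscated shape (the URL is made up for the demo)
    encoded = urllib.parse.quote(
        base64.b64encode(b'https://example.invalid/a.m3u8').decode(), safe='')
    chunks = [encoded[i:i + 8] for i in range(0, len(encoded), 8)]
    sample = f'var f = atob(decodeURIComponent({json.dumps(chunks, separators=(",", ":"))}.join("")));'
    print(unobfuscate(sample))
```

Replacing the call with a JSON string literal leaves the surrounding JavaScript object syntax intact, so the result can still go through a JS-to-JSON conversion step.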


@ -9,7 +9,9 @@
class RTVSIE(InfoExtractor): class RTVSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtvs\.sk/(?:radio|televizia)/archiv(?:/\d+)?/(?P<id>\d+)/?(?:[#?]|$)' IE_NAME = 'stvr'
IE_DESC = 'Slovak Television and Radio (formerly RTVS)'
_VALID_URL = r'https?://(?:www\.)?(?:rtvs|stvr)\.sk/(?:radio|televizia)/archiv(?:/\d+)?/(?P<id>\d+)/?(?:[#?]|$)'
_TESTS = [{ _TESTS = [{
# radio archive # radio archive
'url': 'http://www.rtvs.sk/radio/archiv/11224/414872', 'url': 'http://www.rtvs.sk/radio/archiv/11224/414872',
@ -19,7 +21,7 @@ class RTVSIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'Ostrov pokladov 1 časť.mp3', 'title': 'Ostrov pokladov 1 časť.mp3',
'duration': 2854, 'duration': 2854,
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0000/b1R8.rtvs.jpg', 'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0000/rtvs-00009383.png',
'display_id': '135331', 'display_id': '135331',
}, },
}, { }, {
@ -30,7 +32,7 @@ class RTVSIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Amaro Džives - Náš deň', 'title': 'Amaro Džives - Náš deň',
'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.', 'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.',
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0031/L7Qm.amaro_dzives_png.jpg', 'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0031/L7Qm.amaro_dzives_png.jpg',
'timestamp': 1428555900, 'timestamp': 1428555900,
'upload_date': '20150409', 'upload_date': '20150409',
'duration': 4986, 'duration': 4986,
@ -47,8 +49,11 @@ class RTVSIE(InfoExtractor):
'display_id': '307655', 'display_id': '307655',
'duration': 831, 'duration': 831,
'upload_date': '20211111', 'upload_date': '20211111',
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0916/robin.jpg', 'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0916/robin.jpg',
}, },
}, {
'url': 'https://www.stvr.sk/radio/archiv/11224/414872',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -7,7 +7,6 @@
ExtractorError, ExtractorError,
UnsupportedError, UnsupportedError,
clean_html, clean_html,
determine_ext,
extract_attributes, extract_attributes,
format_field, format_field,
get_element_by_class, get_element_by_class,
@ -36,7 +35,7 @@ class RumbleEmbedIE(InfoExtractor):
'upload_date': '20191020', 'upload_date': '20191020',
'channel_url': 'https://rumble.com/c/WMAR', 'channel_url': 'https://rumble.com/c/WMAR',
'channel': 'WMAR', 'channel': 'WMAR',
'thumbnail': 'https://sp.rmbl.ws/s8/1/5/M/z/1/5Mz1a.qR4e-small-WMAR-2-News-Latest-Headline.jpg', 'thumbnail': r're:https://.+\.jpg',
'duration': 234, 'duration': 234,
'uploader': 'WMAR', 'uploader': 'WMAR',
'live_status': 'not_live', 'live_status': 'not_live',
@ -52,7 +51,7 @@ class RumbleEmbedIE(InfoExtractor):
'upload_date': '20220217', 'upload_date': '20220217',
'channel_url': 'https://rumble.com/c/CyberTechNews', 'channel_url': 'https://rumble.com/c/CyberTechNews',
'channel': 'CTNews', 'channel': 'CTNews',
'thumbnail': 'https://sp.rmbl.ws/s8/6/7/i/9/h/7i9hd.OvCc.jpg', 'thumbnail': r're:https://.+\.jpg',
'duration': 901, 'duration': 901,
'uploader': 'CTNews', 'uploader': 'CTNews',
'live_status': 'not_live', 'live_status': 'not_live',
@ -114,6 +113,22 @@ class RumbleEmbedIE(InfoExtractor):
'live_status': 'was_live', 'live_status': 'was_live',
}, },
'params': {'skip_download': True}, 'params': {'skip_download': True},
}, {
'url': 'https://rumble.com/embed/v6pezdb',
'info_dict': {
'id': 'v6pezdb',
'ext': 'mp4',
'title': '"Es war einmal ein Mädchen" Ein filmisches Zeitzeugnis aus Leningrad 1944',
'uploader': 'RT DE',
'channel': 'RT DE',
'channel_url': 'https://rumble.com/c/RTDE',
'duration': 309,
'thumbnail': 'https://1a-1791.com/video/fww1/dc/s8/1/n/z/2/y/nz2yy.qR4e-small-Es-war-einmal-ein-Mdchen-Ei.jpg',
'timestamp': 1743703500,
'upload_date': '20250403',
'live_status': 'not_live',
},
'params': {'skip_download': True},
}, { }, {
'url': 'https://rumble.com/embed/ufe9n.v5pv5f', 'url': 'https://rumble.com/embed/ufe9n.v5pv5f',
'only_matching': True, 'only_matching': True,
@ -168,40 +183,42 @@ def _real_extract(self, url):
live_status = None live_status = None
formats = [] formats = []
for ext, ext_info in (video.get('ua') or {}).items(): for format_type, format_info in (video.get('ua') or {}).items():
if isinstance(ext_info, dict): if isinstance(format_info, dict):
for height, video_info in ext_info.items(): for height, video_info in format_info.items():
if not traverse_obj(video_info, ('meta', 'h', {int_or_none})): if not traverse_obj(video_info, ('meta', 'h', {int_or_none})):
video_info.setdefault('meta', {})['h'] = height video_info.setdefault('meta', {})['h'] = height
ext_info = ext_info.values() format_info = format_info.values()
for video_info in ext_info: for video_info in format_info:
meta = video_info.get('meta') or {} meta = video_info.get('meta') or {}
if not video_info.get('url'): if not video_info.get('url'):
continue continue
if ext == 'hls': # With default query params returns m3u8 variants which are duplicates, without returns tar files
if format_type == 'tar':
continue
if format_type == 'hls':
if meta.get('live') is True and video.get('live') == 1: if meta.get('live') is True and video.get('live') == 1:
live_status = 'post_live' live_status = 'post_live'
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_info['url'], video_id, video_info['url'], video_id,
ext='mp4', m3u8_id='hls', fatal=False, live=live_status == 'is_live')) ext='mp4', m3u8_id='hls', fatal=False, live=live_status == 'is_live'))
continue continue
timeline = ext == 'timeline' is_timeline = format_type == 'timeline'
if timeline: is_audio = format_type == 'audio'
ext = determine_ext(video_info['url'])
formats.append({ formats.append({
'ext': ext, 'acodec': 'none' if is_timeline else None,
'acodec': 'none' if timeline else None, 'vcodec': 'none' if is_audio else None,
'url': video_info['url'], 'url': video_info['url'],
'format_id': join_nonempty(ext, format_field(meta, 'h', '%sp')), 'format_id': join_nonempty(format_type, format_field(meta, 'h', '%sp')),
'format_note': 'Timeline' if timeline else None, 'format_note': 'Timeline' if is_timeline else None,
'fps': None if timeline else video.get('fps'), 'fps': None if is_timeline or is_audio else video.get('fps'),
**traverse_obj(meta, { **traverse_obj(meta, {
'tbr': 'bitrate', 'tbr': ('bitrate', {int_or_none}),
'filesize': 'size', 'filesize': ('size', {int_or_none}),
'width': 'w', 'width': ('w', {int_or_none}),
'height': 'h', 'height': ('h', {int_or_none}),
}, expected_type=lambda x: int(x) or None), }),
}) })
subtitles = { subtitles = {


@ -122,6 +122,15 @@ def _real_extract(self, url):
if traverse_obj(media, ('partOfSeries', {dict})): if traverse_obj(media, ('partOfSeries', {dict})):
media['epName'] = traverse_obj(media, ('title', {str})) media['epName'] = traverse_obj(media, ('title', {str}))
# Need to set different language for forced subs or else they have priority over full subs
fixed_subtitles = {}
for lang, subs in subtitles.items():
for sub in subs:
fixed_lang = lang
if sub['url'].lower().endswith('_fe.vtt'):
fixed_lang += '-forced'
fixed_subtitles.setdefault(fixed_lang, []).append(sub)
return { return {
'id': video_id, 'id': video_id,
**traverse_obj(media, { **traverse_obj(media, {
@ -151,6 +160,6 @@ def _real_extract(self, url):
}), }),
}), }),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': fixed_subtitles,
'uploader': 'SBSC', 'uploader': 'SBSC',
} }
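
The forced-subtitle handling in this hunk can be sketched as a standalone function. This is only an illustration under assumptions: `split_forced_subtitles` is a hypothetical name, and the sample URLs are invented; the `_fe.vtt` suffix check is the one used in the diff.

```python
def split_forced_subtitles(subtitles):
    """Move forced-subtitle tracks to a separate '<lang>-forced' key.

    `subtitles` maps a language code to a list of dicts with a 'url' key.
    Tracks whose URL ends in '_fe.vtt' are treated as forced subtitles.
    """
    fixed = {}
    for lang, subs in subtitles.items():
        for sub in subs:
            fixed_lang = lang
            if sub['url'].lower().endswith('_fe.vtt'):
                fixed_lang += '-forced'
            fixed.setdefault(fixed_lang, []).append(sub)
    return fixed


if __name__ == '__main__':
    print(split_forced_subtitles({
        'en': [
            {'url': 'https://example.invalid/subs/en.vtt'},
            {'url': 'https://example.invalid/subs/en_FE.vtt'},
        ],
    }))
```

Keeping forced tracks under their own key means a request for `en` subtitles selects the full track rather than the forced one.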


@ -13,7 +13,7 @@
class SenateISVPIE(InfoExtractor): class SenateISVPIE(InfoExtractor):
_IE_NAME = 'senate.gov:isvp' IE_NAME = 'senate.gov:isvp'
_VALID_URL = r'https?://(?:www\.)?senate\.gov/isvp/?\?(?P<qs>.+)' _VALID_URL = r'https?://(?:www\.)?senate\.gov/isvp/?\?(?P<qs>.+)'
_EMBED_REGEX = [r"<iframe[^>]+src=['\"](?P<url>https?://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]"] _EMBED_REGEX = [r"<iframe[^>]+src=['\"](?P<url>https?://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]"]
@ -137,7 +137,7 @@ def _real_extract(self, url):
class SenateGovIE(InfoExtractor): class SenateGovIE(InfoExtractor):
_IE_NAME = 'senate.gov' IE_NAME = 'senate.gov'
_SUBDOMAIN_RE = '|'.join(map(re.escape, ( _SUBDOMAIN_RE = '|'.join(map(re.escape, (
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking', 'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help', 'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',


@ -2,16 +2,18 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
dict_get, dict_get,
int_or_none, int_or_none,
parse_duration, parse_duration,
unified_timestamp, unified_timestamp,
url_or_none,
urljoin,
) )
from ..utils.traversal import traverse_obj
class SkyItPlayerIE(InfoExtractor): class SkyItBaseIE(InfoExtractor):
IE_NAME = 'player.sky.it'
_VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
_GEO_BYPASS = False _GEO_BYPASS = False
_DOMAIN = 'sky' _DOMAIN = 'sky'
_PLAYER_TMPL = 'https://player.sky.it/player/external.html?id=%s&domain=%s' _PLAYER_TMPL = 'https://player.sky.it/player/external.html?id=%s&domain=%s'
@ -33,7 +35,6 @@ def _player_url_result(self, video_id):
SkyItPlayerIE.ie_key(), video_id) SkyItPlayerIE.ie_key(), video_id)
def _parse_video(self, video, video_id): def _parse_video(self, video, video_id):
title = video['title']
is_live = video.get('type') == 'live' is_live = video.get('type') == 'live'
hls_url = video.get(('streaming' if is_live else 'hls') + '_url') hls_url = video.get(('streaming' if is_live else 'hls') + '_url')
if not hls_url and video.get('geoblock' if is_live else 'geob'): if not hls_url and video.get('geoblock' if is_live else 'geob'):
@ -43,7 +44,7 @@ def _parse_video(self, video, video_id):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': video.get('title'),
'formats': formats, 'formats': formats,
'thumbnail': dict_get(video, ('video_still', 'video_still_medium', 'thumb')), 'thumbnail': dict_get(video, ('video_still', 'video_still_medium', 'thumb')),
'description': video.get('short_desc') or None, 'description': video.get('short_desc') or None,
@ -52,6 +53,11 @@ def _parse_video(self, video, video_id):
'is_live': is_live, 'is_live': is_live,
} }
class SkyItPlayerIE(SkyItBaseIE):
IE_NAME = 'player.sky.it'
_VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
domain = urllib.parse.parse_qs(urllib.parse.urlparse( domain = urllib.parse.parse_qs(urllib.parse.urlparse(
@ -67,7 +73,7 @@ def _real_extract(self, url):
return self._parse_video(video, video_id) return self._parse_video(video, video_id)
class SkyItVideoIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE class SkyItVideoIE(SkyItBaseIE):
IE_NAME = 'video.sky.it' IE_NAME = 'video.sky.it'
_VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)' _VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -96,7 +102,7 @@ def _real_extract(self, url):
return self._player_url_result(video_id) return self._player_url_result(video_id)
class SkyItVideoLiveIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE class SkyItVideoLiveIE(SkyItBaseIE):
IE_NAME = 'video.sky.it:live' IE_NAME = 'video.sky.it:live'
_VALID_URL = r'https?://video\.sky\.it/diretta/(?P<id>[^/?&#]+)' _VALID_URL = r'https?://video\.sky\.it/diretta/(?P<id>[^/?&#]+)'
_TEST = { _TEST = {
@ -124,7 +130,7 @@ def _real_extract(self, url):
return self._parse_video(livestream, asset_id) return self._parse_video(livestream, asset_id)
class SkyItIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE class SkyItIE(SkyItBaseIE):
IE_NAME = 'sky.it' IE_NAME = 'sky.it'
_VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)' _VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{ _TESTS = [{
@ -223,3 +229,80 @@ class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}] }]
_DOMAIN = 'mtv8' _DOMAIN = 'mtv8'
class TV8ItLiveIE(SkyItBaseIE):
IE_NAME = 'tv8.it:live'
IE_DESC = 'TV8 Live'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/streaming'
_TESTS = [{
'url': 'https://tv8.it/streaming',
'info_dict': {
'id': 'tv8',
'ext': 'mp4',
'title': str,
'description': str,
'is_live': True,
'live_status': 'is_live',
},
}]
def _real_extract(self, url):
video_id = 'tv8'
livestream = self._download_json(
'https://apid.sky.it/vdp/v1/getLivestream', video_id,
'Downloading manifest JSON', query={'id': '7'})
metadata = self._download_json('https://tv8.it/api/getStreaming', video_id, fatal=False)
return {
**self._parse_video(livestream, video_id),
**traverse_obj(metadata, ('info', {
'title': ('title', 'text', {str}),
'description': ('description', 'html', {clean_html}),
})),
}
class TV8ItPlaylistIE(InfoExtractor):
IE_NAME = 'tv8.it:playlist'
IE_DESC = 'TV8 Playlist'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?!video)[^/#?]+/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'https://tv8.it/intrattenimento/tv8-gialappas-night',
'playlist_mincount': 32,
'info_dict': {
'id': 'tv8-gialappas-night',
'title': 'Tv8 Gialappa\'s Night',
'description': 'md5:c876039d487d9cf40229b768872718ed',
'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
},
}, {
'url': 'https://tv8.it/sport/uefa-europa-league',
'playlist_mincount': 11,
'info_dict': {
'id': 'uefa-europa-league',
'title': 'UEFA Europa League',
'description': 'md5:9ab1832b7a8b1705b1f590e13a36bc6a',
'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
},
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
data = self._search_nextjs_data(webpage, playlist_id)['props']['pageProps']['data']
entries = [self.url_result(
urljoin('https://tv8.it', card['href']), ie=TV8ItIE,
**traverse_obj(card, {
'description': ('extraData', 'videoDesc', {str}),
'id': ('extraData', 'asset_id', {str}),
'thumbnail': ('image', 'src', {url_or_none}),
'title': ('title', 'typography', 'text', {str}),
}))
for card in traverse_obj(data, ('lastContent', 'cards', lambda _, v: v['href']))]
return self.playlist_result(entries, playlist_id, **traverse_obj(data, ('card', 'desktop', {
'description': ('description', 'html', {clean_html}),
'thumbnail': ('image', 'src', {url_or_none}),
'title': ('title', 'text', {str}),
})))

236
yt_dlp/extractor/streaks.py Normal file

@ -0,0 +1,236 @@
import json
import urllib.parse
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
filter_dict,
float_or_none,
join_nonempty,
mimetype2ext,
parse_iso8601,
unsmuggle_url,
update_url_query,
url_or_none,
)
from ..utils.traversal import traverse_obj
class StreaksBaseIE(InfoExtractor):
_API_URL_TEMPLATE = 'https://{}.api.streaks.jp/v1/projects/{}/medias/{}{}'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['JP']
def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False):
try:
response = self._download_json(
self._API_URL_TEMPLATE.format('playback', project_id, media_id, ''),
media_id, 'Downloading STREAKS playback API JSON', headers={
'Accept': 'application/json',
'Origin': 'https://players.streaks.jp',
**self.geo_verification_headers(),
**(headers or {}),
})
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status in {403, 404}:
error = self._parse_json(e.cause.response.read().decode(), media_id, fatal=False)
message = traverse_obj(error, ('message', {str}))
code = traverse_obj(error, ('code', {str}))
if code == 'REQUEST_FAILED':
self.raise_geo_restricted(message, countries=self._GEO_COUNTRIES)
elif code == 'MEDIA_NOT_FOUND':
raise ExtractorError(message, expected=True)
elif code or message:
raise ExtractorError(join_nonempty(code, message, delim=': '))
raise
streaks_id = response['id']
live_status = {
'clip': 'was_live',
'file': 'not_live',
'linear': 'is_live',
'live': 'is_live',
}.get(response.get('type'))
formats, subtitles = [], {}
drm_formats = False
for source in traverse_obj(response, ('sources', lambda _, v: v['src'])):
if source.get('key_systems'):
drm_formats = True
continue
src_url = source['src']
is_live = live_status == 'is_live'
ext = mimetype2ext(source.get('type'))
if ext != 'm3u8':
self.report_warning(f'Unsupported stream type: {ext}')
continue
if is_live and ssai:
session_params = traverse_obj(self._download_json(
self._API_URL_TEMPLATE.format('ssai', project_id, streaks_id, '/ssai/session'),
media_id, 'Downloading session parameters',
headers={'Content-Type': 'application/json', 'Accept': 'application/json'},
data=json.dumps({'id': source['id']}).encode(),
), (0, 'query', {urllib.parse.parse_qs}))
src_url = update_url_query(src_url, session_params)
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, media_id, 'mp4', m3u8_id='hls', fatal=False, live=is_live, query=query)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if not formats and drm_formats:
self.report_drm(media_id)
self._remove_duplicate_formats(formats)
for subs in traverse_obj(response, (
'tracks', lambda _, v: v['kind'] in ('captions', 'subtitles') and url_or_none(v['src']),
)):
lang = traverse_obj(subs, ('srclang', {str.lower})) or 'ja'
subtitles.setdefault(lang, []).append({'url': subs['src']})
return {
'id': streaks_id,
'display_id': media_id,
'formats': formats,
'live_status': live_status,
'subtitles': subtitles,
'uploader_id': project_id,
**traverse_obj(response, {
'title': ('name', {str}),
'description': ('description', {str}, filter),
'duration': ('duration', {float_or_none}),
'modified_timestamp': ('updated_at', {parse_iso8601}),
'tags': ('tags', ..., {str}),
'thumbnails': (('poster', 'thumbnail'), 'src', {'url': {url_or_none}}),
'timestamp': ('created_at', {parse_iso8601}),
}),
}
class StreaksIE(StreaksBaseIE):
_VALID_URL = [
r'https?://players\.streaks\.jp/(?P<project_id>[\w-]+)/[\da-f]+/index\.html\?(?:[^#]+&)?m=(?P<id>(?:ref:)?[\w-]+)',
r'https?://playback\.api\.streaks\.jp/v1/projects/(?P<project_id>[\w-]+)/medias/(?P<id>(?:ref:)?[\w-]+)',
]
_EMBED_REGEX = [rf'<iframe\s+[^>]*\bsrc\s*=\s*["\'](?P<url>{_VALID_URL[0]})']
_TESTS = [{
'url': 'https://players.streaks.jp/tipness/08155cd19dc14c12bebefb69b92eafcc/index.html?m=dbdf2df35b4d483ebaeeaeb38c594647',
'info_dict': {
'id': 'dbdf2df35b4d483ebaeeaeb38c594647',
'ext': 'mp4',
'title': '3shunenCM_edit.mp4',
'display_id': 'dbdf2df35b4d483ebaeeaeb38c594647',
'duration': 47.533,
'live_status': 'not_live',
'modified_date': '20230726',
'modified_timestamp': 1690356180,
'timestamp': 1690355996,
'upload_date': '20230726',
'uploader_id': 'tipness',
},
}, {
'url': 'https://players.streaks.jp/ktv-web/0298e8964c164ab384c07ef6e08c444b/index.html?m=ref:mycoffeetime_250317',
'info_dict': {
'id': 'dccdc079e3fd41f88b0c8435e2d453ab',
'ext': 'mp4',
'title': 'わたしの珈琲時間_250317',
'display_id': 'ref:mycoffeetime_250317',
'duration': 122.99,
'live_status': 'not_live',
'modified_date': '20250310',
'modified_timestamp': 1741586302,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1741585839,
'upload_date': '20250310',
'uploader_id': 'ktv-web',
},
}, {
'url': 'https://playback.api.streaks.jp/v1/projects/ktv-web/medias/b5411938e1e5435dac71edf829dd4813',
'info_dict': {
'id': 'b5411938e1e5435dac71edf829dd4813',
'ext': 'mp4',
'title': 'KANTELE_SYUSEi_0630',
'display_id': 'b5411938e1e5435dac71edf829dd4813',
'live_status': 'not_live',
'modified_date': '20250122',
'modified_timestamp': 1737522999,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1735205137,
'upload_date': '20241226',
'uploader_id': 'ktv-web',
},
}, {
# TVer Olympics: website already down, but api remains accessible
'url': 'https://playback.api.streaks.jp/v1/projects/tver-olympic/medias/ref:sp_240806_1748_dvr',
'info_dict': {
'id': 'c10f7345adb648cf804d7578ab93b2e3',
'ext': 'mp4',
'title': 'サッカー 男子 準決勝_dvr',
'display_id': 'ref:sp_240806_1748_dvr',
'duration': 12960.0,
'live_status': 'was_live',
'modified_date': '20240805',
'modified_timestamp': 1722896263,
'timestamp': 1722777618,
'upload_date': '20240804',
'uploader_id': 'tver-olympic',
},
}, {
# TBS FREE: 24-hour stream
'url': 'https://playback.api.streaks.jp/v1/projects/tbs/medias/ref:simul-02',
'info_dict': {
'id': 'c4e83a7b48f4409a96adacec674b4e22',
'ext': 'mp4',
'title': str,
'display_id': 'ref:simul-02',
'live_status': 'is_live',
'modified_date': '20241031',
'modified_timestamp': 1730339858,
'timestamp': 1705466840,
'upload_date': '20240117',
'uploader_id': 'tbs',
},
}, {
# DRM protected
'url': 'https://players.streaks.jp/sp-jbc/a12d7ee0f40c49d6a0a2bff520639677/index.html?m=5f89c62f37ee4a68be8e6e3b1396c7d8',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
'url': 'https://event.play.jp/playnext2023/',
'info_dict': {
'id': '2d975178293140dc8074a7fc536a7604',
'ext': 'mp4',
'title': 'PLAY NEXTキームービー本番',
'uploader_id': 'play',
'duration': 17.05,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1668387517,
'upload_date': '20221114',
'modified_timestamp': 1739411523,
'modified_date': '20250213',
'live_status': 'not_live',
},
}, {
'url': 'https://wowshop.jp/Page/special/cooking_goods/?bid=wowshop&srsltid=AfmBOor_phUNoPEE_UCPiGGSCMrJE5T2US397smvsbrSdLqUxwON0el4',
'playlist_mincount': 2,
'info_dict': {
'id': '?bid=wowshop&srsltid=AfmBOor_phUNoPEE_UCPiGGSCMrJE5T2US397smvsbrSdLqUxwON0el4',
'title': 'ワンランク上の料理道具でとびきりの“おいしい”を食卓へwowshop',
'description': 'md5:914b5cb8624fc69274c7fb7b2342958f',
'age_limit': 0,
'thumbnail': 'https://wowshop.jp/Page/special/cooking_goods/images/ogp.jpg',
},
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
project_id, media_id = self._match_valid_url(url).group('project_id', 'id')
return self._extract_from_streaks_api(
project_id, media_id, headers=filter_dict({
'X-Streaks-Api-Key': smuggled_data.get('api_key'),
}))
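
For live sources with SSAI enabled, `_extract_from_streaks_api` parses a query string from the session response and merges it into the manifest URL with `update_url_query`. A rough stdlib-only sketch of that merge follows; `merge_query` is a hypothetical helper, and the sample URL and query string are made up rather than taken from the Streaks API.

```python
import urllib.parse


def merge_query(url, extra_query):
    """Merge the parameters of a raw query string into a URL's existing query.

    Parameters present in both places take the value from `extra_query`.
    """
    parsed = urllib.parse.urlparse(url)
    query = urllib.parse.parse_qs(parsed.query)
    # parse_qs returns {name: [values]}; later values replace earlier ones
    query.update(urllib.parse.parse_qs(extra_query))
    new_query = urllib.parse.urlencode(query, doseq=True)
    return urllib.parse.urlunparse(parsed._replace(query=new_query))


if __name__ == '__main__':
    print(merge_query(
        'https://example.invalid/live/index.m3u8?a=1', 'session=abc&a=2'))
```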


@ -191,12 +191,12 @@ class TapTapAppIE(TapTapBaseIE):
}] }]
class TapTapIntlBase(TapTapBaseIE): class TapTapIntlBaseIE(TapTapBaseIE):
_X_UA = 'V=1&PN=WebAppIntl2&LANG=zh_TW&VN_CODE=115&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID={uuid}&CURR=&DT=PC&OS=Windows&OSV=NT%208.0.0' _X_UA = 'V=1&PN=WebAppIntl2&LANG=zh_TW&VN_CODE=115&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID={uuid}&CURR=&DT=PC&OS=Windows&OSV=NT%208.0.0'
_VIDEO_API = 'https://www.taptap.io/webapiv2/video-resource/v1/multi-get' _VIDEO_API = 'https://www.taptap.io/webapiv2/video-resource/v1/multi-get'
class TapTapAppIntlIE(TapTapIntlBase): class TapTapAppIntlIE(TapTapIntlBaseIE):
_VALID_URL = r'https?://www\.taptap\.io/app/(?P<id>\d+)' _VALID_URL = r'https?://www\.taptap\.io/app/(?P<id>\d+)'
_INFO_API = 'https://www.taptap.io/webapiv2/i/app/v5/detail' _INFO_API = 'https://www.taptap.io/webapiv2/i/app/v5/detail'
_DATA_PATH = 'app' _DATA_PATH = 'app'
@ -227,7 +227,7 @@ class TapTapAppIntlIE(TapTapIntlBase):
}] }]
class TapTapPostIntlIE(TapTapIntlBase): class TapTapPostIntlIE(TapTapIntlBaseIE):
_VALID_URL = r'https?://www\.taptap\.io/post/(?P<id>\d+)' _VALID_URL = r'https?://www\.taptap\.io/post/(?P<id>\d+)'
_INFO_API = 'https://www.taptap.io/webapiv2/creation/post/v1/detail' _INFO_API = 'https://www.taptap.io/webapiv2/creation/post/v1/detail'
_INFO_QUERY_KEY = 'id_str' _INFO_QUERY_KEY = 'id_str'


@ -46,7 +46,7 @@ def _parse_content(self, content, url):
error_code = traverse_obj( error_code = traverse_obj(
self._webpage_read_content(error.cause.response, caronte['cerbero'], video_id, fatal=False), self._webpage_read_content(error.cause.response, caronte['cerbero'], video_id, fatal=False),
({json.loads}, 'code', {int})) ({json.loads}, 'code', {int}))
if error_code == 4038: if error_code in (4038, 40313):
self.raise_geo_restricted(countries=['ES']) self.raise_geo_restricted(countries=['ES'])
raise raise


@ -26,6 +26,7 @@
srt_subtitles_timecode, srt_subtitles_timecode,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
truncate_string,
try_call, try_call,
try_get, try_get,
url_or_none, url_or_none,
@ -444,7 +445,7 @@ def extract_addr(addr, add_meta={}):
return { return {
'id': aweme_id, 'id': aweme_id,
**traverse_obj(aweme_detail, { **traverse_obj(aweme_detail, {
'title': ('desc', {str}), 'title': ('desc', {truncate_string(left=72)}),
'description': ('desc', {str}), 'description': ('desc', {str}),
'timestamp': ('create_time', {int_or_none}), 'timestamp': ('create_time', {int_or_none}),
}), }),
@ -595,7 +596,7 @@ def _parse_aweme_video_web(self, aweme_detail, webpage_url, video_id, extract_fl
'duration': ('duration', {int_or_none}), 'duration': ('duration', {int_or_none}),
})), })),
**traverse_obj(aweme_detail, { **traverse_obj(aweme_detail, {
'title': ('desc', {str}), 'title': ('desc', {truncate_string(left=72)}),
'description': ('desc', {str}), 'description': ('desc', {str}),
# audio-only slideshows have a video duration of 0 and an actual audio duration # audio-only slideshows have a video duration of 0 and an actual audio duration
'duration': ('video', 'duration', {int_or_none}, filter), 'duration': ('video', 'duration', {int_or_none}, filter),
@ -656,7 +657,7 @@ class TikTokIE(TikTokBaseIE):
'info_dict': { 'info_dict': {
'id': '6742501081818877190', 'id': '6742501081818877190',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:5e2a23877420bb85ce6521dbee39ba94', 'title': 'Tag 1 Friend reverse this Video and look what happens 🤩😱 @skyandtami ...',
'description': 'md5:5e2a23877420bb85ce6521dbee39ba94', 'description': 'md5:5e2a23877420bb85ce6521dbee39ba94',
'duration': 27, 'duration': 27,
'height': 1024, 'height': 1024,
@ -860,7 +861,7 @@ class TikTokIE(TikTokBaseIE):
'info_dict': { 'info_dict': {
'id': '7253412088251534594', 'id': '7253412088251534594',
'ext': 'm4a', 'ext': 'm4a',
'title': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации ', 'title': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #р...',
'description': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации ', 'description': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации ',
'uploader': 'hara_yoimiya', 'uploader': 'hara_yoimiya',
'uploader_id': '6582536342634676230', 'uploader_id': '6582536342634676230',
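
Note on the title change above: both TikTok parsing paths now cap `title` at 72 characters and leave `description` untouched, which is why the updated tests show titles ending in `...`. A minimal sketch of that behaviour, assuming the cut keeps the first `limit - 3` characters and appends an ellipsis; `truncate_title` is a hypothetical stand-in written for illustration, not yt-dlp's own `truncate_string`:

```python
def truncate_title(text, limit=72):
    """Return text unchanged if it fits, else cut it to exactly `limit` characters."""
    if text is None or len(text) <= limit:
        return text
    # Reserve three characters for the ellipsis so the result is `limit` long
    return text[:limit - 3] + '...'


# Hypothetical caption, only here to exercise the helper
caption = 'word ' * 30
title = truncate_title(caption)
description = caption  # the description keeps the full text
print(len(title), len(description))
```

Titles of 72 characters or fewer pass through unchanged, so only long captions differ between `title` and `description`.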


@ -1,31 +1,70 @@
from .common import InfoExtractor from .streaks import StreaksBaseIE
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none,
join_nonempty, join_nonempty,
make_archive_id,
smuggle_url, smuggle_url,
str_or_none, str_or_none,
strip_or_none, strip_or_none,
traverse_obj,
update_url_query, update_url_query,
) )
from ..utils.traversal import require, traverse_obj
class TVerIE(InfoExtractor): class TVerIE(StreaksBaseIE):
_VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature)/)+(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature)/)+(?P<id>[a-zA-Z0-9]+)'
_GEO_COUNTRIES = ['JP']
_GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
'skip': 'videos are only available for 7 days', # via Streaks backend
'url': 'https://tver.jp/episodes/ep83nf3w4p', 'url': 'https://tver.jp/episodes/epc1hdugbk',
'info_dict': { 'info_dict': {
'title': '家事ヤロウ!!! 売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着', 'id': 'epc1hdugbk',
'description': 'md5:dc2c06b6acc23f1e7c730c513737719b',
'series': '家事ヤロウ!!!',
'episode': '売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着',
'alt_title': '売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着',
'channel': 'テレビ朝日',
'id': 'ep83nf3w4p',
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'ref:baeebeac-a2a6-4dbf-9eb3-c40d59b40068',
'title': '神回だけ見せます! #2 壮烈!車大騎馬戦(木曜スペシャル)',
'alt_title': '神回だけ見せます! #2 壮烈!車大騎馬戦(木曜スペシャル) 日テレ',
'description': 'md5:2726f742d5e3886edeaf72fb6d740fef',
'uploader_id': 'tver-ntv',
'channel': '日テレ',
'duration': 1158.024,
'thumbnail': 'https://statics.tver.jp/images/content/thumbnail/episode/xlarge/epc1hdugbk.jpg?v=16',
'series': '神回だけ見せます!',
'episode': '#2 壮烈!車大騎馬戦(木曜スペシャル)',
'episode_number': 2,
'timestamp': 1736486036,
'upload_date': '20250110',
'modified_timestamp': 1736870264,
'modified_date': '20250114',
'live_status': 'not_live',
'release_timestamp': 1651453200,
'release_date': '20220502',
'_old_archive_ids': ['brightcovenew ref:baeebeac-a2a6-4dbf-9eb3-c40d59b40068'],
}, },
'add_ie': ['BrightcoveNew'], }, {
# via Brightcove backend (deprecated)
'url': 'https://tver.jp/episodes/epc1hdugbk',
'info_dict': {
'id': 'ref:baeebeac-a2a6-4dbf-9eb3-c40d59b40068',
'ext': 'mp4',
'title': '神回だけ見せます! #2 壮烈!車大騎馬戦(木曜スペシャル)',
'alt_title': '神回だけ見せます! #2 壮烈!車大騎馬戦(木曜スペシャル) 日テレ',
'description': 'md5:2726f742d5e3886edeaf72fb6d740fef',
'uploader_id': '4394098882001',
'channel': '日テレ',
'duration': 1158.101,
'thumbnail': 'https://statics.tver.jp/images/content/thumbnail/episode/xlarge/epc1hdugbk.jpg?v=16',
'tags': [],
'series': '神回だけ見せます!',
'episode': '#2 壮烈!車大騎馬戦(木曜スペシャル)',
'episode_number': 2,
'timestamp': 1651388531,
'upload_date': '20220501',
'release_timestamp': 1651453200,
'release_date': '20220502',
},
'params': {'extractor_args': {'tver': {'backend': ['brightcove']}}},
}, { }, {
'url': 'https://tver.jp/corner/f0103888', 'url': 'https://tver.jp/corner/f0103888',
'only_matching': True, 'only_matching': True,
@ -38,26 +77,7 @@ class TVerIE(InfoExtractor):
'id': 'srtxft431v', 'id': 'srtxft431v',
'title': '名探偵コナン', 'title': '名探偵コナン',
}, },
'playlist': [ 'playlist_mincount': 21,
{
'md5': '779ffd97493ed59b0a6277ea726b389e',
'info_dict': {
'id': 'ref:conan-1137-241005',
'ext': 'mp4',
'title': '名探偵コナン #1137「行列店、味変の秘密」',
'uploader_id': '5330942432001',
'tags': [],
'channel': '読売テレビ',
'series': '名探偵コナン',
'description': 'md5:601fccc1d2430d942a2c8068c4b33eb5',
'episode': '#1137「行列店、味変の秘密」',
'duration': 1469.077,
'timestamp': 1728030405,
'upload_date': '20241004',
'alt_title': '名探偵コナン #1137「行列店、味変の秘密」 読売テレビ 10月5日(土)放送分',
'thumbnail': r're:https://.+\.jpg',
},
}],
}, { }, {
'url': 'https://tver.jp/series/sru35hwdd2', 'url': 'https://tver.jp/series/sru35hwdd2',
'info_dict': { 'info_dict': {
@ -70,7 +90,11 @@ class TVerIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
_HEADERS = {'x-tver-platform-type': 'web'} _HEADERS = {
'x-tver-platform-type': 'web',
'Origin': 'https://tver.jp',
'Referer': 'https://tver.jp/',
}
_PLATFORM_QUERY = {} _PLATFORM_QUERY = {}
def _real_initialize(self): def _real_initialize(self):
@ -103,6 +127,9 @@ def _yield_episode_ids_for_series(self, series_id):
def _real_extract(self, url): def _real_extract(self, url):
video_id, video_type = self._match_valid_url(url).group('id', 'type') video_id, video_type = self._match_valid_url(url).group('id', 'type')
backend = self._configuration_arg('backend', ['streaks'])[0]
if backend not in ('brightcove', 'streaks'):
raise ExtractorError(f'Invalid backend value: {backend}', expected=True)
if video_type == 'series': if video_type == 'series':
series_info = self._call_platform_api( series_info = self._call_platform_api(
@ -129,12 +156,6 @@ def _real_extract(self, url):
video_info = self._download_json( video_info = self._download_json(
f'https://statics.tver.jp/content/episode/{video_id}.json', video_id, 'Downloading video info', f'https://statics.tver.jp/content/episode/{video_id}.json', video_id, 'Downloading video info',
query={'v': version}, headers={'Referer': 'https://tver.jp/'}) query={'v': version}, headers={'Referer': 'https://tver.jp/'})
p_id = video_info['video']['accountID']
r_id = traverse_obj(video_info, ('video', ('videoRefID', 'videoID')), get_all=False)
if not r_id:
raise ExtractorError('Failed to extract reference ID for Brightcove')
if not r_id.isdigit():
r_id = f'ref:{r_id}'
episode = strip_or_none(episode_content.get('title')) episode = strip_or_none(episode_content.get('title'))
series = str_or_none(episode_content.get('seriesTitle')) series = str_or_none(episode_content.get('seriesTitle'))
@ -161,17 +182,53 @@ def _real_extract(self, url):
] ]
] ]
return { metadata = {
'_type': 'url_transparent',
'title': title, 'title': title,
'series': series, 'series': series,
'episode': episode, 'episode': episode,
# another title, which some viewers consider the "full title" # another title, which some viewers consider the "full title"
'alt_title': join_nonempty(title, provider, onair_label, delim=' '), 'alt_title': join_nonempty(title, provider, onair_label, delim=' '),
'channel': provider, 'channel': provider,
'description': str_or_none(video_info.get('description')),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
**traverse_obj(video_info, {
'description': ('description', {str}),
'release_timestamp': ('viewStatus', 'startAt', {int_or_none}),
'episode_number': ('no', {int_or_none}),
}),
}
brightcove_id = traverse_obj(video_info, ('video', ('videoRefID', 'videoID'), {str}, any))
if brightcove_id and not brightcove_id.isdecimal():
brightcove_id = f'ref:{brightcove_id}'
streaks_id = traverse_obj(video_info, ('streaks', 'videoRefID', {str}))
if streaks_id and not streaks_id.startswith('ref:'):
streaks_id = f'ref:{streaks_id}'
# Deprecated Brightcove extraction reachable w/extractor-arg or fallback; errors are expected
if backend == 'brightcove' or not streaks_id:
if backend != 'brightcove':
self.report_warning(
'No STREAKS ID found; falling back to Brightcove extraction', video_id=video_id)
if not brightcove_id:
raise ExtractorError('Unable to extract brightcove reference ID', expected=True)
account_id = traverse_obj(video_info, (
'video', 'accountID', {str}, {require('brightcove account ID', expected=True)}))
return {
**metadata,
'_type': 'url_transparent',
'url': smuggle_url( 'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id), {'geo_countries': ['JP']}), self.BRIGHTCOVE_URL_TEMPLATE % (account_id, brightcove_id),
{'geo_countries': ['JP']}),
'ie_key': 'BrightcoveNew', 'ie_key': 'BrightcoveNew',
} }
return {
**self._extract_from_streaks_api(video_info['streaks']['projectID'], streaks_id, {
'Origin': 'https://tver.jp',
'Referer': 'https://tver.jp/',
}),
**metadata,
'id': video_id,
'_old_archive_ids': [make_archive_id('BrightcoveNew', brightcove_id)] if brightcove_id else None,
}
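
Note on the backend selection above: the new code defaults to the Streaks backend, accepts `brightcove` through the `backend` extractor argument, and falls back to Brightcove when no Streaks ID is present. A minimal sketch of that decision and of the `ref:` prefixing, under the assumption that plain exceptions stand in for `ExtractorError`; `normalize_ids` and `pick_backend` are hypothetical names used only for illustration:

```python
def normalize_ids(brightcove_id, streaks_id):
    """Apply the 'ref:' prefix rules used for the two backends."""
    # Brightcove: purely numeric IDs are video IDs, anything else is a reference ID
    if brightcove_id and not brightcove_id.isdecimal():
        brightcove_id = f'ref:{brightcove_id}'
    # Streaks: always a reference ID, prefixed at most once
    if streaks_id and not streaks_id.startswith('ref:'):
        streaks_id = f'ref:{streaks_id}'
    return brightcove_id, streaks_id


def pick_backend(requested, streaks_id, brightcove_id):
    """Return the backend to use, mirroring the fallback order in the diff."""
    if requested not in ('brightcove', 'streaks'):
        raise ValueError(f'Invalid backend value: {requested}')
    if requested == 'brightcove' or not streaks_id:
        # Reached either explicitly or as a fallback when no Streaks ID exists
        if not brightcove_id:
            raise ValueError('Unable to extract brightcove reference ID')
        return 'brightcove'
    return 'streaks'


bc_id, st_id = normalize_ids('baeebeac-a2a6', 'baeebeac-a2a6')
print(pick_backend('streaks', st_id, bc_id))
```

The second test case in the diff selects the deprecated path with `'extractor_args': {'tver': {'backend': ['brightcove']}}`.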

117
yt_dlp/extractor/tvw.py Normal file

@ -0,0 +1,117 @@
import json
from .common import InfoExtractor
from ..utils import clean_html, remove_end, unified_timestamp, url_or_none
from ..utils.traversal import traverse_obj
class TvwIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvw\.org/video/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://tvw.org/video/billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211/',
'md5': '9ceb94fe2bb7fd726f74f16356825703',
'info_dict': {
'id': '2024011211',
'ext': 'mp4',
'title': 'Billy Frank Jr. Statue Maquette Unveiling Ceremony',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:58a8150017d985b4f377e11ee8f6f36e',
'timestamp': 1704902400,
'upload_date': '20240110',
'location': 'Legislative Building',
'display_id': 'billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211',
'categories': ['General Interest'],
},
}, {
'url': 'https://tvw.org/video/ebeys-landing-state-park-2024081007/',
'md5': '71e87dae3deafd65d75ff3137b9a32fc',
'info_dict': {
'id': '2024081007',
'ext': 'mp4',
'title': 'Ebey\'s Landing State Park',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:50c5bd73bde32fa6286a008dbc853386',
'timestamp': 1724310900,
'upload_date': '20240822',
'location': 'Ebeys Landing State Park',
'display_id': 'ebeys-landing-state-park-2024081007',
'categories': ['Washington State Parks'],
},
}, {
'url': 'https://tvw.org/video/home-warranties-workgroup-2',
'md5': 'f678789bf94d07da89809f213cf37150',
'info_dict': {
'id': '1999121000',
'ext': 'mp4',
'title': 'Home Warranties Workgroup',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:861396cc523c9641d0dce690bc5c35f3',
'timestamp': 946389600,
'upload_date': '19991228',
'display_id': 'home-warranties-workgroup-2',
'categories': ['Legislative'],
},
}, {
'url': 'https://tvw.org/video/washington-to-washington-a-new-space-race-2022041111/?eventID=2022041111',
'md5': '6f5551090b351aba10c0d08a881b4f30',
'info_dict': {
'id': '2022041111',
'ext': 'mp4',
'title': 'Washington to Washington - A New Space Race',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:f65a24eec56107afbcebb3aa5cd26341',
'timestamp': 1650394800,
'upload_date': '20220419',
'location': 'Hayner Media Center',
'display_id': 'washington-to-washington-a-new-space-race-2022041111',
'categories': ['Washington to Washington', 'General Interest'],
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
client_id = self._html_search_meta('clientID', webpage, fatal=True)
video_id = self._html_search_meta('eventID', webpage, fatal=True)
video_data = self._download_json(
'https://api.v3.invintus.com/v2/Event/getDetailed', video_id,
headers={
'authorization': 'embedder',
'wsc-api-key': '7WhiEBzijpritypp8bqcU7pfU9uicDR',
},
data=json.dumps({
'clientID': client_id,
'eventID': video_id,
'showStreams': True,
}).encode())['data']
formats = []
subtitles = {}
for stream_url in traverse_obj(video_data, ('streamingURIs', ..., {url_or_none})):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if caption_url := traverse_obj(video_data, ('captionPath', {url_or_none})):
subtitles.setdefault('en', []).append({'url': caption_url, 'ext': 'vtt'})
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'title': remove_end(self._og_search_title(webpage, default=None), ' - TVW'),
'description': self._og_search_description(webpage, default=None),
**traverse_obj(video_data, {
'title': ('title', {str}),
'description': ('description', {clean_html}),
'categories': ('categories', ..., {str}),
'thumbnail': ('videoThumbnail', {url_or_none}),
'timestamp': ('startDateTime', {unified_timestamp}),
'location': ('locationName', {str}),
'is_live': ('eventStatus', {lambda x: x == 'live'}),
}),
}
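
Note on the returned dict above: `title` and `description` are first filled from the page's Open Graph tags and then overridden by the API fields spread in afterwards. Because a dict-shaped traversal leaves out keys whose value is missing, the page values survive as fallbacks. A minimal sketch of that precedence in plain Python; `strip_suffix` and `build_info` are hypothetical helpers, and the input shapes are assumptions:

```python
def strip_suffix(text, suffix):
    """Remove a trailing suffix if present (stand-in for a remove_end-style helper)."""
    if text is not None and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


def build_info(page_title, api_data):
    """Prefer API metadata, falling back to the page title when the API has none."""
    api_fields = {
        'title': api_data.get('title'),
        'location': api_data.get('locationName'),
    }
    # Drop missing values so they cannot overwrite the fallback below
    api_fields = {k: v for k, v in api_fields.items() if v is not None}
    return {'title': strip_suffix(page_title, ' - TVW'), **api_fields}


print(build_info('Home Warranties Workgroup - TVW', {}))
```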


@ -14,19 +14,20 @@
dict_get, dict_get,
float_or_none, float_or_none,
int_or_none, int_or_none,
join_nonempty,
make_archive_id, make_archive_id,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
parse_qs, parse_qs,
qualities, qualities,
str_or_none, str_or_none,
traverse_obj,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_or_none, url_or_none,
urljoin, urljoin,
) )
from ..utils.traversal import traverse_obj, value
class TwitchBaseIE(InfoExtractor): class TwitchBaseIE(InfoExtractor):
@ -42,10 +43,10 @@ class TwitchBaseIE(InfoExtractor):
'CollectionSideBar': '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14', 'CollectionSideBar': '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14',
'FilterableVideoTower_Videos': 'a937f1d22e269e39a03b509f65a7490f9fc247d7f83d6ac1421523e3b68042cb', 'FilterableVideoTower_Videos': 'a937f1d22e269e39a03b509f65a7490f9fc247d7f83d6ac1421523e3b68042cb',
'ClipsCards__User': 'b73ad2bfaecfd30a9e6c28fada15bd97032c83ec77a0440766a56fe0bd632777', 'ClipsCards__User': 'b73ad2bfaecfd30a9e6c28fada15bd97032c83ec77a0440766a56fe0bd632777',
'ShareClipRenderStatus': 'f130048a462a0ac86bb54d653c968c514e9ab9ca94db52368c1179e97b0f16eb',
'ChannelCollectionsContent': '447aec6a0cc1e8d0a8d7732d47eb0762c336a2294fdb009e9c9d854e49d484b9', 'ChannelCollectionsContent': '447aec6a0cc1e8d0a8d7732d47eb0762c336a2294fdb009e9c9d854e49d484b9',
'StreamMetadata': 'a647c2a13599e5991e175155f798ca7f1ecddde73f7f341f39009c14dbf59962', 'StreamMetadata': 'a647c2a13599e5991e175155f798ca7f1ecddde73f7f341f39009c14dbf59962',
'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01', 'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01',
'VideoAccessToken_Clip': '36b89d2507fce29e5ca551df756d27c1cfe079e2609642b4390aa4c35796eb11',
'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c', 'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c',
'VideoMetadata': '49b5b8f268cdeb259d75b58dcb0c1a748e3b575003448a2333dc5cdafd49adad', 'VideoMetadata': '49b5b8f268cdeb259d75b58dcb0c1a748e3b575003448a2333dc5cdafd49adad',
'VideoPlayer_ChapterSelectButtonVideo': '8d2793384aac3773beab5e59bd5d6f585aedb923d292800119e03d40cd0f9b41', 'VideoPlayer_ChapterSelectButtonVideo': '8d2793384aac3773beab5e59bd5d6f585aedb923d292800119e03d40cd0f9b41',
@ -1083,16 +1084,44 @@ class TwitchClipsIE(TwitchBaseIE):
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat', 'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
'md5': '761769e1eafce0ffebfb4089cb3847cd', 'md5': '761769e1eafce0ffebfb4089cb3847cd',
'info_dict': { 'info_dict': {
'id': '42850523', 'id': '396245304',
'display_id': 'FaintLightGullWholeWheat', 'display_id': 'FaintLightGullWholeWheat',
'ext': 'mp4', 'ext': 'mp4',
'title': 'EA Play 2016 Live from the Novo Theatre', 'title': 'EA Play 2016 Live from the Novo Theatre',
'duration': 32,
'view_count': int,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1465767393, 'timestamp': 1465767393,
'upload_date': '20160612', 'upload_date': '20160612',
'creator': 'EA', 'creators': ['EA'],
'uploader': 'stereotype_', 'channel': 'EA',
'uploader_id': '43566419', 'channel_id': '25163635',
'channel_is_verified': False,
'channel_follower_count': int,
'uploader': 'EA',
'uploader_id': '25163635',
},
}, {
'url': 'https://www.twitch.tv/xqc/clip/CulturedAmazingKuduDatSheffy-TiZ_-ixAGYR3y2Uy',
'md5': 'e90fe616b36e722a8cfa562547c543f0',
'info_dict': {
'id': '3207364882',
'display_id': 'CulturedAmazingKuduDatSheffy-TiZ_-ixAGYR3y2Uy',
'ext': 'mp4',
'title': 'A day in the life of xQc',
'duration': 60,
'view_count': int,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1742869615,
'upload_date': '20250325',
'creators': ['xQc'],
'channel': 'xQc',
'channel_id': '71092938',
'channel_is_verified': True,
'channel_follower_count': int,
'uploader': 'xQc',
'uploader_id': '71092938',
'categories': ['Just Chatting'],
}, },
}, { }, {
# multiple formats # multiple formats
@ -1116,16 +1145,14 @@ class TwitchClipsIE(TwitchBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) slug = self._match_id(url)
clip = self._download_gql( clip = self._download_gql(
video_id, [{ slug, [{
'operationName': 'VideoAccessToken_Clip', 'operationName': 'ShareClipRenderStatus',
'variables': { 'variables': {'slug': slug},
'slug': video_id,
},
}], }],
'Downloading clip access token GraphQL')[0]['data']['clip'] 'Downloading clip GraphQL')[0]['data']['clip']
if not clip: if not clip:
raise ExtractorError( raise ExtractorError(
@ -1135,81 +1162,71 @@ def _real_extract(self, url):
'sig': clip['playbackAccessToken']['signature'], 'sig': clip['playbackAccessToken']['signature'],
'token': clip['playbackAccessToken']['value'], 'token': clip['playbackAccessToken']['value'],
} }
asset_default = traverse_obj(clip, ('assets', 0, {dict})) or {}
data = self._download_base_gql( asset_portrait = traverse_obj(clip, ('assets', 1, {dict})) or {}
video_id, {
'query': '''{
clip(slug: "%s") {
broadcaster {
displayName
}
createdAt
curator {
displayName
id
}
durationSeconds
id
tiny: thumbnailURL(width: 86, height: 45)
small: thumbnailURL(width: 260, height: 147)
medium: thumbnailURL(width: 480, height: 272)
title
videoQualities {
frameRate
quality
sourceURL
}
viewCount
}
}''' % video_id}, 'Downloading clip GraphQL', fatal=False) # noqa: UP031
if data:
clip = try_get(data, lambda x: x['data']['clip'], dict) or clip
formats = [] formats = []
for option in clip.get('videoQualities', []): default_aspect_ratio = float_or_none(asset_default.get('aspectRatio'))
if not isinstance(option, dict): formats.extend(traverse_obj(asset_default, ('videoQualities', lambda _, v: url_or_none(v['sourceURL']), {
continue 'url': ('sourceURL', {update_url_query(query=access_query)}),
source = url_or_none(option.get('sourceURL')) 'format_id': ('quality', {str}),
if not source: 'height': ('quality', {int_or_none}),
continue 'fps': ('frameRate', {float_or_none}),
'aspect_ratio': {value(default_aspect_ratio)},
})))
portrait_aspect_ratio = float_or_none(asset_portrait.get('aspectRatio'))
for source in traverse_obj(asset_portrait, ('videoQualities', lambda _, v: url_or_none(v['sourceURL']))):
formats.append({ formats.append({
'url': update_url_query(source, access_query), 'url': update_url_query(source['sourceURL'], access_query),
'format_id': option.get('quality'), 'format_id': join_nonempty('portrait', source.get('quality')),
'height': int_or_none(option.get('quality')), 'height': int_or_none(source.get('quality')),
'fps': int_or_none(option.get('frameRate')), 'fps': float_or_none(source.get('frameRate')),
'aspect_ratio': portrait_aspect_ratio,
'quality': -2,
}) })
thumbnails = [] thumbnails = []
for thumbnail_id in ('tiny', 'small', 'medium'): thumb_asset_default_url = url_or_none(asset_default.get('thumbnailURL'))
thumbnail_url = clip.get(thumbnail_id) if thumb_asset_default_url:
if not thumbnail_url: thumbnails.append({
continue 'id': 'default',
thumb = { 'url': thumb_asset_default_url,
'id': thumbnail_id, 'preference': 0,
'url': thumbnail_url, })
} if thumb_asset_portrait_url := url_or_none(asset_portrait.get('thumbnailURL')):
mobj = re.search(r'-(\d+)x(\d+)\.', thumbnail_url) thumbnails.append({
if mobj: 'id': 'portrait',
thumb.update({ 'url': thumb_asset_portrait_url,
'height': int(mobj.group(2)), 'preference': -1,
'width': int(mobj.group(1)), })
thumb_default_url = url_or_none(clip.get('thumbnailURL'))
if thumb_default_url and thumb_default_url != thumb_asset_default_url:
thumbnails.append({
'id': 'small',
'url': thumb_default_url,
'preference': -2,
}) })
thumbnails.append(thumb)
old_id = self._search_regex(r'%7C(\d+)(?:-\d+)?\.mp4', formats[-1]['url'], 'old id', default=None) old_id = self._search_regex(r'%7C(\d+)(?:-\d+)?\.mp4', formats[-1]['url'], 'old id', default=None)
return { return {
'id': clip.get('id') or video_id, 'id': clip.get('id') or slug,
'_old_archive_ids': [make_archive_id(self, old_id)] if old_id else None, '_old_archive_ids': [make_archive_id(self, old_id)] if old_id else None,
'display_id': video_id, 'display_id': slug,
'title': clip.get('title'),
'formats': formats, 'formats': formats,
'duration': int_or_none(clip.get('durationSeconds')),
'view_count': int_or_none(clip.get('viewCount')),
'timestamp': unified_timestamp(clip.get('createdAt')),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'creator': try_get(clip, lambda x: x['broadcaster']['displayName'], str), **traverse_obj(clip, {
'uploader': try_get(clip, lambda x: x['curator']['displayName'], str), 'title': ('title', {str}),
'uploader_id': try_get(clip, lambda x: x['curator']['id'], str), 'duration': ('durationSeconds', {int_or_none}),
'view_count': ('viewCount', {int_or_none}),
'timestamp': ('createdAt', {parse_iso8601}),
'creators': ('broadcaster', 'displayName', {str}, filter, all),
'channel': ('broadcaster', 'displayName', {str}),
'channel_id': ('broadcaster', 'id', {str}),
'channel_follower_count': ('broadcaster', 'followers', 'totalCount', {int_or_none}),
'channel_is_verified': ('broadcaster', 'isPartner', {bool}),
'uploader': ('broadcaster', 'displayName', {str}),
'uploader_id': ('broadcaster', 'id', {str}),
'categories': ('game', 'displayName', {str}, filter, all, filter),
}),
} }
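
Note on the clip formats above: the first entry of `assets` is treated as the default rendition and the second as the portrait one, whose formats get a `portrait` prefix and a lower `quality` so they are not preferred. A minimal sketch of that assembly using only the standard library; `add_query` and `build_formats` are hypothetical helpers, and the sample response shape is an assumption based on the fields read in the diff:

```python
from urllib.parse import urlencode


def add_query(url, query):
    """Append query parameters to a URL (simplified stand-in for update_url_query)."""
    separator = '&' if '?' in url else '?'
    return f'{url}{separator}{urlencode(query)}'


def build_formats(assets, access_query):
    """Build format dicts from the default (index 0) and portrait (index 1) assets."""
    formats = []
    for index, asset in enumerate(assets[:2]):
        portrait = index == 1
        for option in asset.get('videoQualities') or []:
            source = option.get('sourceURL')
            if not source:
                continue  # skip qualities without a usable URL
            quality = option.get('quality')
            parts = ['portrait' if portrait else None, quality]
            fmt = {
                'url': add_query(source, access_query),
                'format_id': '-'.join(p for p in parts if p) or None,
                'height': int(quality) if quality and quality.isdigit() else None,
                'aspect_ratio': asset.get('aspectRatio'),
            }
            if portrait:
                fmt['quality'] = -2  # rank portrait renditions below the default ones
            formats.append(fmt)
    return formats


sample_assets = [
    {'aspectRatio': 1.78, 'videoQualities': [
        {'quality': '720', 'sourceURL': 'https://example.com/a.mp4'}]},
    {'aspectRatio': 0.56, 'videoQualities': [
        {'quality': '720', 'sourceURL': 'https://example.com/b.mp4'}]},
]
for fmt in build_formats(sample_assets, {'sig': 's', 'token': 't'}):
    print(fmt['format_id'], fmt['url'])
```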


@ -21,6 +21,7 @@
str_or_none, str_or_none,
strip_or_none, strip_or_none,
traverse_obj, traverse_obj,
truncate_string,
try_call, try_call,
try_get, try_get,
unified_timestamp, unified_timestamp,
@ -358,6 +359,7 @@ class TwitterCardIE(InfoExtractor):
'display_id': '560070183650213889', 'display_id': '560070183650213889',
'uploader_url': 'https://twitter.com/Twitter', 'uploader_url': 'https://twitter.com/Twitter',
}, },
'skip': 'This content is no longer available.',
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768', 'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
@ -365,7 +367,7 @@ class TwitterCardIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '623160978427936768', 'id': '623160978427936768',
'ext': 'mp4', 'ext': 'mp4',
'title': "NASA - Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video.", 'title': "NASA - Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASA...",
'description': "Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video. https://t.co/BJYgOjSeGA", 'description': "Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video. https://t.co/BJYgOjSeGA",
'uploader': 'NASA', 'uploader': 'NASA',
'uploader_id': 'NASA', 'uploader_id': 'NASA',
@ -377,12 +379,14 @@ class TwitterCardIE(InfoExtractor):
'like_count': int, 'like_count': int,
'repost_count': int, 'repost_count': int,
'tags': ['PlutoFlyby'], 'tags': ['PlutoFlyby'],
'channel_id': '11348282',
'_old_archive_ids': ['twitter 623160978427936768'],
}, },
'params': {'format': '[protocol=https]'}, 'params': {'format': '[protocol=https]'},
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977', 'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
'md5': 'b6d9683dd3f48e340ded81c0e917ad46', 'md5': 'fb08fbd69595cbd8818f0b2f2a94474d',
'info_dict': { 'info_dict': {
'id': 'dq4Oj5quskI', 'id': 'dq4Oj5quskI',
'ext': 'mp4', 'ext': 'mp4',
@ -390,12 +394,12 @@ class TwitterCardIE(InfoExtractor):
'description': 'md5:a831e97fa384863d6e26ce48d1c43376', 'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
'upload_date': '20111013', 'upload_date': '20111013',
'uploader': 'OMG! UBUNTU!', 'uploader': 'OMG! UBUNTU!',
'uploader_id': 'omgubuntu', 'uploader_id': '@omgubuntu',
'channel_url': 'https://www.youtube.com/channel/UCIiSwcm9xiFb3Y4wjzR41eQ', 'channel_url': 'https://www.youtube.com/channel/UCIiSwcm9xiFb3Y4wjzR41eQ',
'channel_id': 'UCIiSwcm9xiFb3Y4wjzR41eQ', 'channel_id': 'UCIiSwcm9xiFb3Y4wjzR41eQ',
'channel_follower_count': int, 'channel_follower_count': int,
'chapters': 'count:8', 'chapters': 'count:8',
'uploader_url': 'http://www.youtube.com/user/omgubuntu', 'uploader_url': 'https://www.youtube.com/@omgubuntu',
'duration': 138, 'duration': 138,
'categories': ['Film & Animation'], 'categories': ['Film & Animation'],
'age_limit': 0, 'age_limit': 0,
@ -407,6 +411,9 @@ class TwitterCardIE(InfoExtractor):
'tags': 'count:12', 'tags': 'count:12',
'channel': 'OMG! UBUNTU!', 'channel': 'OMG! UBUNTU!',
'playable_in_embed': True, 'playable_in_embed': True,
'heatmap': 'count:100',
'timestamp': 1318500227,
'live_status': 'not_live',
}, },
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
}, },
@ -548,13 +555,14 @@ class TwitterIE(TwitterBaseIE):
'age_limit': 0, 'age_limit': 0,
'_old_archive_ids': ['twitter 700207533655363584'], '_old_archive_ids': ['twitter 700207533655363584'],
}, },
'skip': 'Tweet has been deleted',
}, { }, {
'url': 'https://twitter.com/captainamerica/status/719944021058060289', 'url': 'https://twitter.com/captainamerica/status/719944021058060289',
'info_dict': { 'info_dict': {
'id': '717462543795523584', 'id': '717462543795523584',
'display_id': '719944021058060289', 'display_id': '719944021058060289',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.', 'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theat...',
'description': '@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI', 'description': '@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI',
'channel_id': '701615052', 'channel_id': '701615052',
'uploader_id': 'CaptainAmerica', 'uploader_id': 'CaptainAmerica',
@ -591,7 +599,7 @@ class TwitterIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '852077943283097602', 'id': '852077943283097602',
'ext': 'mp4', 'ext': 'mp4',
'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة', 'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعا...',
'description': 'كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة https://t.co/xg6OhpyKfN', 'description': 'كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة https://t.co/xg6OhpyKfN',
'channel_id': '2526757026', 'channel_id': '2526757026',
'uploader': 'عالم الأخبار', 'uploader': 'عالم الأخبار',
@ -615,7 +623,7 @@ class TwitterIE(TwitterBaseIE):
'id': '910030238373089285', 'id': '910030238373089285',
'display_id': '910031516746514432', 'display_id': '910031516746514432',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Préfet de Guadeloupe - [Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre.', 'title': 'Préfet de Guadeloupe - [Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terr...',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'description': '[Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre. https://t.co/mwx01Rs4lo', 'description': '[Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre. https://t.co/mwx01Rs4lo',
'channel_id': '2319432498', 'channel_id': '2319432498',
@ -707,7 +715,7 @@ class TwitterIE(TwitterBaseIE):
'id': '1349774757969989634', 'id': '1349774757969989634',
'display_id': '1349794411333394432', 'display_id': '1349794411333394432',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:d1c4941658e4caaa6cb579260d85dcba', 'title': "Brooklyn Nets - WATCH: Sean Marks' full media session after our acquisition of 8-time...",
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'description': 'md5:71ead15ec44cee55071547d6447c6a3e', 'description': 'md5:71ead15ec44cee55071547d6447c6a3e',
'channel_id': '18552281', 'channel_id': '18552281',
@ -733,7 +741,7 @@ class TwitterIE(TwitterBaseIE):
'id': '1577855447914409984', 'id': '1577855447914409984',
'display_id': '1577855540407197696', 'display_id': '1577855540407197696',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:466a3a8b049b5f5a13164ce915484b51', 'title': 'Oshtru - gm ✨️ now I can post image and video. nice update.',
'description': 'md5:b9c3699335447391d11753ab21c70a74', 'description': 'md5:b9c3699335447391d11753ab21c70a74',
'upload_date': '20221006', 'upload_date': '20221006',
'channel_id': '143077138', 'channel_id': '143077138',
@ -755,10 +763,10 @@ class TwitterIE(TwitterBaseIE):
'url': 'https://twitter.com/UltimaShadowX/status/1577719286659006464', 'url': 'https://twitter.com/UltimaShadowX/status/1577719286659006464',
'info_dict': { 'info_dict': {
'id': '1577719286659006464', 'id': '1577719286659006464',
'title': 'Ultima Reload - Test', 'title': 'Ultima - Test',
'description': 'Test https://t.co/Y3KEZD7Dad', 'description': 'Test https://t.co/Y3KEZD7Dad',
'channel_id': '168922496', 'channel_id': '168922496',
'uploader': 'Ultima Reload', 'uploader': 'Ultima',
'uploader_id': 'UltimaShadowX', 'uploader_id': 'UltimaShadowX',
'uploader_url': 'https://twitter.com/UltimaShadowX', 'uploader_url': 'https://twitter.com/UltimaShadowX',
'upload_date': '20221005', 'upload_date': '20221005',
@ -777,7 +785,7 @@ class TwitterIE(TwitterBaseIE):
'id': '1575559336759263233', 'id': '1575559336759263233',
'display_id': '1575560063510810624', 'display_id': '1575560063510810624',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:eec26382babd0f7c18f041db8ae1c9c9', 'title': 'Max Olson - Absolutely heartbreaking footage captured by our surge probe of catas...',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'description': 'md5:95aea692fda36a12081b9629b02daa92', 'description': 'md5:95aea692fda36a12081b9629b02daa92',
'channel_id': '1094109584', 'channel_id': '1094109584',
@ -901,18 +909,18 @@ class TwitterIE(TwitterBaseIE):
'playlist_mincount': 2, 'playlist_mincount': 2,
'info_dict': { 'info_dict': {
'id': '1600649710662213632', 'id': '1600649710662213632',
'title': 'md5:be05989b0722e114103ed3851a0ffae2', 'title': "Jocelyn Laidlaw - How Kirstie Alley's tragic death inspired me to share more about my c...",
'timestamp': 1670459604.0, 'timestamp': 1670459604.0,
'description': 'md5:591c19ce66fadc2359725d5cd0d1052c', 'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
'comment_count': int, 'comment_count': int,
'uploader_id': 'CTVJLaidlaw', 'uploader_id': 'JocelynVLaidlaw',
'channel_id': '80082014', 'channel_id': '80082014',
'repost_count': int, 'repost_count': int,
'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'], 'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
'upload_date': '20221208', 'upload_date': '20221208',
'age_limit': 0, 'age_limit': 0,
'uploader': 'Jocelyn Laidlaw', 'uploader': 'Jocelyn Laidlaw',
'uploader_url': 'https://twitter.com/CTVJLaidlaw', 'uploader_url': 'https://twitter.com/JocelynVLaidlaw',
'like_count': int, 'like_count': int,
}, },
}, { }, {
@ -921,17 +929,17 @@ class TwitterIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '1600649511827013632', 'id': '1600649511827013632',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:7662a0a27ce6faa3e5b160340f3cfab1', 'title': "Jocelyn Laidlaw - How Kirstie Alley's tragic death inspired me to share more about my c... #1",
'thumbnail': r're:^https?://.+\.jpg', 'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1670459604.0, 'timestamp': 1670459604.0,
'channel_id': '80082014', 'channel_id': '80082014',
'uploader_id': 'CTVJLaidlaw', 'uploader_id': 'JocelynVLaidlaw',
'uploader': 'Jocelyn Laidlaw', 'uploader': 'Jocelyn Laidlaw',
'repost_count': int, 'repost_count': int,
'comment_count': int, 'comment_count': int,
'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'], 'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
'duration': 102.226, 'duration': 102.226,
'uploader_url': 'https://twitter.com/CTVJLaidlaw', 'uploader_url': 'https://twitter.com/JocelynVLaidlaw',
'display_id': '1600649710662213632', 'display_id': '1600649710662213632',
'like_count': int, 'like_count': int,
'description': 'md5:591c19ce66fadc2359725d5cd0d1052c', 'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
@ -990,6 +998,7 @@ class TwitterIE(TwitterBaseIE):
'_old_archive_ids': ['twitter 1599108751385972737'], '_old_archive_ids': ['twitter 1599108751385972737'],
}, },
'params': {'noplaylist': True}, 'params': {'noplaylist': True},
'skip': 'Tweet is limited',
}, { }, {
'url': 'https://twitter.com/MunTheShinobi/status/1600009574919962625', 'url': 'https://twitter.com/MunTheShinobi/status/1600009574919962625',
'info_dict': { 'info_dict': {
@ -1001,10 +1010,10 @@ class TwitterIE(TwitterBaseIE):
'description': 'This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525 https://t.co/cNsA0MoOml', 'description': 'This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525 https://t.co/cNsA0MoOml',
'thumbnail': 'https://pbs.twimg.com/ext_tw_video_thumb/1600009362759733248/pu/img/XVhFQivj75H_YxxV.jpg?name=orig', 'thumbnail': 'https://pbs.twimg.com/ext_tw_video_thumb/1600009362759733248/pu/img/XVhFQivj75H_YxxV.jpg?name=orig',
'age_limit': 0, 'age_limit': 0,
'uploader': 'Mün', 'uploader': 'Boy Called Mün',
'repost_count': int, 'repost_count': int,
'upload_date': '20221206', 'upload_date': '20221206',
'title': 'Mün - This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525', 'title': 'Boy Called Mün - This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525',
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'tags': [], 'tags': [],
@ -1042,7 +1051,7 @@ class TwitterIE(TwitterBaseIE):
'id': '1694928337846538240', 'id': '1694928337846538240',
'ext': 'mp4', 'ext': 'mp4',
'display_id': '1695424220702888009', 'display_id': '1695424220702888009',
'title': 'md5:e8daa9527bc2b947121395494f786d9d', 'title': 'Benny Johnson - Donald Trump driving through the urban, poor neighborhoods of Atlanta...',
'description': 'md5:004f2d37fd58737724ec75bc7e679938', 'description': 'md5:004f2d37fd58737724ec75bc7e679938',
'channel_id': '15212187', 'channel_id': '15212187',
'uploader': 'Benny Johnson', 'uploader': 'Benny Johnson',
@ -1066,7 +1075,7 @@ class TwitterIE(TwitterBaseIE):
'id': '1694928337846538240', 'id': '1694928337846538240',
'ext': 'mp4', 'ext': 'mp4',
'display_id': '1695424220702888009', 'display_id': '1695424220702888009',
'title': 'md5:e8daa9527bc2b947121395494f786d9d', 'title': 'Benny Johnson - Donald Trump driving through the urban, poor neighborhoods of Atlanta...',
'description': 'md5:004f2d37fd58737724ec75bc7e679938', 'description': 'md5:004f2d37fd58737724ec75bc7e679938',
'channel_id': '15212187', 'channel_id': '15212187',
'uploader': 'Benny Johnson', 'uploader': 'Benny Johnson',
@ -1101,6 +1110,7 @@ class TwitterIE(TwitterBaseIE):
'view_count': int, 'view_count': int,
}, },
'add_ie': ['TwitterBroadcast'], 'add_ie': ['TwitterBroadcast'],
'skip': 'Broadcast no longer exists',
}, { }, {
# Animated gif and quote tweet video # Animated gif and quote tweet video
'url': 'https://twitter.com/BAKKOOONN/status/1696256659889565950', 'url': 'https://twitter.com/BAKKOOONN/status/1696256659889565950',
@ -1129,7 +1139,7 @@ class TwitterIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '1724883339285544960', 'id': '1724883339285544960',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:cc56716f9ed0b368de2ba54c478e493c', 'title': 'Robert F. Kennedy Jr - A beautifully crafted short film by Mikki Willis about my independent...',
'description': 'md5:9dc14f5b0f1311fc7caf591ae253a164', 'description': 'md5:9dc14f5b0f1311fc7caf591ae253a164',
'display_id': '1724884212803834154', 'display_id': '1724884212803834154',
'channel_id': '337808606', 'channel_id': '337808606',
@ -1150,7 +1160,7 @@ class TwitterIE(TwitterBaseIE):
}, { }, {
# x.com # x.com
'url': 'https://x.com/historyinmemes/status/1790637656616943991', 'url': 'https://x.com/historyinmemes/status/1790637656616943991',
'md5': 'daca3952ba0defe2cfafb1276d4c1ea5', 'md5': '4549eda363fecfe37439c455923cba2c',
'info_dict': { 'info_dict': {
'id': '1790637589910654976', 'id': '1790637589910654976',
'ext': 'mp4', 'ext': 'mp4',
@ -1334,7 +1344,7 @@ def _build_graphql_query(self, media_id):
def _generate_syndication_token(self, twid): def _generate_syndication_token(self, twid):
# ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '') # ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
translation = str.maketrans(dict.fromkeys('0.')) translation = str.maketrans(dict.fromkeys('0.'))
return js_number_to_string((int(twid) / 1e15) * math.PI, 36).translate(translation) return js_number_to_string((int(twid) / 1e15) * math.pi, 36).translate(translation)
def _call_syndication_api(self, twid): def _call_syndication_api(self, twid):
self.report_warning( self.report_warning(
@ -1390,7 +1400,7 @@ def _real_extract(self, url):
title = description = traverse_obj( title = description = traverse_obj(
status, (('full_text', 'text'), {lambda x: x.replace('\n', ' ')}), get_all=False) or '' status, (('full_text', 'text'), {lambda x: x.replace('\n', ' ')}), get_all=False) or ''
# strip 'https -_t.co_BJYgOjSeGA' junk from filenames # strip 'https -_t.co_BJYgOjSeGA' junk from filenames
title = re.sub(r'\s+(https?://[^ ]+)', '', title) title = truncate_string(re.sub(r'\s+(https?://[^ ]+)', '', title), left=72)
user = status.get('user') or {} user = status.get('user') or {}
uploader = user.get('name') uploader = user.get('name')
if uploader: if uploader:
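
The last two Twitter hunks above are small enough to check in isolation. The sketch below uses only the standard library. It shows why `math.PI` had to become `math.pi`, what the `str.maketrans(dict.fromkeys('0.'))` table does, and how the title is stripped of URLs before being cut to 72 characters. `truncate_left` is a hypothetical stand-in for yt-dlp's `truncate_string`; its behaviour is an assumption inferred from the updated test titles, not a copy of the real helper.

```python
import math
import re

# Python's math module only defines the lowercase constant; `math.PI`
# raises AttributeError, which is what the one-character fix addresses.
assert hasattr(math, 'pi') and not hasattr(math, 'PI')

# A translation table mapping '0' and '.' to None deletes both characters,
# mirroring the JS `.replace(/(0+|\.)/g, '')` quoted in the diff comment.
STRIP_ZEROS_AND_DOTS = str.maketrans(dict.fromkeys('0.'))


def strip_zeros_and_dots(token):
    return token.translate(STRIP_ZEROS_AND_DOTS)


def truncate_left(text, left):
    """Hypothetical helper: keep at most `left` characters, ending in '...'."""
    if len(text) <= left:
        return text
    return text[:left - 3] + '...'


def clean_title(text, limit=72):
    # Drop whitespace-prefixed URLs such as ' https://t.co/...', then truncate.
    return truncate_left(re.sub(r'\s+(https?://[^ ]+)', '', text), limit)
```

With these assumptions, a tweet text longer than 72 characters ends in `...`, which matches the shape of the new literal titles in the tests (uploader name, ` - `, then the truncated text).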


@ -51,6 +51,8 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'(?:watch|front)\.njpwworld\.com', r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai', r'qub\.ca/vrai',
r'(?:beta\.)?crunchyroll\.com', r'(?:beta\.)?crunchyroll\.com',
r'viki\.com',
r'deezer\.com',
) )
_TESTS = [{ _TESTS = [{
@ -160,6 +162,12 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, { }, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy', 'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1',
'only_matching': True,
}, {
'url': 'http://www.deezer.com/playlist/176747451',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -1,346 +0,0 @@
import hashlib
import hmac
import json
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
try_get,
)
class VikiBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?viki\.(?:com|net|mx|jp|fr)/'
_API_URL_TEMPLATE = 'https://api.viki.io%s'
_DEVICE_ID = '112395910d'
_APP = '100005a'
_APP_VERSION = '6.11.3'
_APP_SECRET = 'd96704b180208dbb2efa30fe44c48bd8690441af9f567ba8fd710a72badc85198f7472'
_GEO_BYPASS = False
_NETRC_MACHINE = 'viki'
_token = None
_ERRORS = {
'geo': 'Sorry, this content is not available in your region.',
'upcoming': 'Sorry, this content is not yet available.',
'paywall': 'Sorry, this content is only available to Viki Pass Plus subscribers',
}
def _stream_headers(self, timestamp, sig):
return {
'X-Viki-manufacturer': 'vivo',
'X-Viki-device-model': 'vivo 1606',
'X-Viki-device-os-ver': '6.0.1',
'X-Viki-connection-type': 'WIFI',
'X-Viki-carrier': '',
'X-Viki-as-id': '100005a-1625321982-3932',
'timestamp': str(timestamp),
'signature': str(sig),
'x-viki-app-ver': self._APP_VERSION,
}
def _api_query(self, path, version=4, **kwargs):
path += '?' if '?' not in path else '&'
query = f'/v{version}/{path}app={self._APP}'
if self._token:
query += f'&token={self._token}'
return query + ''.join(f'&{name}={val}' for name, val in kwargs.items())
def _sign_query(self, path):
timestamp = int(time.time())
query = self._api_query(path, version=5)
sig = hmac.new(
self._APP_SECRET.encode('ascii'), f'{query}&t={timestamp}'.encode('ascii'), hashlib.sha1).hexdigest()
return timestamp, sig, self._API_URL_TEMPLATE % query
def _call_api(
self, path, video_id, note='Downloading JSON metadata', data=None, query=None, fatal=True):
if query is None:
timestamp, sig, url = self._sign_query(path)
else:
url = self._API_URL_TEMPLATE % self._api_query(path, version=4)
resp = self._download_json(
url, video_id, note, fatal=fatal, query=query,
data=json.dumps(data).encode() if data else None,
headers=({'x-viki-app-ver': self._APP_VERSION} if data
else self._stream_headers(timestamp, sig) if query is None
else None), expected_status=400) or {}
self._raise_error(resp.get('error'), fatal)
return resp
def _raise_error(self, error, fatal=True):
if error is None:
return
msg = f'{self.IE_NAME} said: {error}'
if fatal:
raise ExtractorError(msg, expected=True)
else:
self.report_warning(msg)
def _check_errors(self, data):
for reason, status in (data.get('blocking') or {}).items():
if status and reason in self._ERRORS:
message = self._ERRORS[reason]
if reason == 'geo':
self.raise_geo_restricted(msg=message)
elif reason == 'paywall':
if try_get(data, lambda x: x['paywallable']['tvod']):
self._raise_error('This video is for rent only or TVOD (Transactional Video On demand)')
self.raise_login_required(message)
self._raise_error(message)
def _perform_login(self, username, password):
self._token = self._call_api(
'sessions.json', None, 'Logging in', fatal=False,
data={'username': username, 'password': password}).get('token')
if not self._token:
self.report_warning('Login Failed: Unable to get session token')
@staticmethod
def dict_selection(dict_obj, preferred_key):
if preferred_key in dict_obj:
return dict_obj[preferred_key]
return (list(filter(None, dict_obj.values())) or [None])[0]
class VikiIE(VikiBaseIE):
IE_NAME = 'viki'
_VALID_URL = rf'{VikiBaseIE._VALID_URL_BASE}(?:videos|player)/(?P<id>[0-9]+v)'
_TESTS = [{
'note': 'Free non-DRM video with storyboards in MPD',
'url': 'https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1',
'info_dict': {
'id': '1175236v',
'ext': 'mp4',
'title': 'Choosing Spouse by Lottery - Episode 1',
'timestamp': 1606463239,
'age_limit': 13,
'uploader': 'FCC',
'upload_date': '20201127',
},
}, {
'url': 'http://www.viki.com/videos/1023585v-heirs-episode-14',
'info_dict': {
'id': '1023585v',
'ext': 'mp4',
'title': 'Heirs - Episode 14',
'uploader': 'SBS Contents Hub',
'timestamp': 1385047627,
'upload_date': '20131121',
'age_limit': 13,
'duration': 3570,
'episode_number': 14,
},
'skip': 'Blocked in the US',
}, {
# clip
'url': 'http://www.viki.com/videos/1067139v-the-avengers-age-of-ultron-press-conference',
'md5': '86c0b5dbd4d83a6611a79987cc7a1989',
'info_dict': {
'id': '1067139v',
'ext': 'mp4',
'title': "'The Avengers: Age of Ultron' Press Conference",
'description': 'md5:d70b2f9428f5488321bfe1db10d612ea',
'duration': 352,
'timestamp': 1430380829,
'upload_date': '20150430',
'uploader': 'Arirang TV',
'like_count': int,
'age_limit': 0,
},
'skip': 'Sorry. There was an error loading this video',
}, {
'url': 'http://www.viki.com/videos/1048879v-ankhon-dekhi',
'info_dict': {
'id': '1048879v',
'ext': 'mp4',
'title': 'Ankhon Dekhi',
'duration': 6512,
'timestamp': 1408532356,
'upload_date': '20140820',
'uploader': 'Spuul',
'like_count': int,
'age_limit': 13,
},
'skip': 'Blocked in the US',
}, {
# episode
'url': 'http://www.viki.com/videos/44699v-boys-over-flowers-episode-1',
'md5': '0a53dc252e6e690feccd756861495a8c',
'info_dict': {
'id': '44699v',
'ext': 'mp4',
'title': 'Boys Over Flowers - Episode 1',
'description': 'md5:b89cf50038b480b88b5b3c93589a9076',
'duration': 4172,
'timestamp': 1270496524,
'upload_date': '20100405',
'uploader': 'group8',
'like_count': int,
'age_limit': 13,
'episode_number': 1,
},
}, {
# youtube external
'url': 'http://www.viki.com/videos/50562v-poor-nastya-complete-episode-1',
'md5': '63f8600c1da6f01b7640eee7eca4f1da',
'info_dict': {
'id': '50562v',
'ext': 'webm',
'title': 'Poor Nastya [COMPLETE] - Episode 1',
'description': '',
'duration': 606,
'timestamp': 1274949505,
'upload_date': '20101213',
'uploader': 'ad14065n',
'uploader_id': 'ad14065n',
'like_count': int,
'age_limit': 13,
},
'skip': 'Page not found!',
}, {
'url': 'http://www.viki.com/player/44699v',
'only_matching': True,
}, {
# non-English description
'url': 'http://www.viki.com/videos/158036v-love-in-magic',
'md5': '41faaba0de90483fb4848952af7c7d0d',
'info_dict': {
'id': '158036v',
'ext': 'mp4',
'uploader': 'I Planet Entertainment',
'upload_date': '20111122',
'timestamp': 1321985454,
'description': 'md5:44b1e46619df3a072294645c770cef36',
'title': 'Love In Magic',
'age_limit': 13,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._call_api(f'videos/{video_id}.json', video_id, 'Downloading video JSON', query={})
self._check_errors(video)
title = try_get(video, lambda x: x['titles']['en'], str)
episode_number = int_or_none(video.get('number'))
if not title:
title = f'Episode {episode_number}' if video.get('type') == 'episode' else video.get('id') or video_id
container_titles = try_get(video, lambda x: x['container']['titles'], dict) or {}
container_title = self.dict_selection(container_titles, 'en')
title = f'{container_title} - {title}'
thumbnails = [{
'id': thumbnail_id,
'url': thumbnail['url'],
} for thumbnail_id, thumbnail in (video.get('images') or {}).items() if thumbnail.get('url')]
resp = self._call_api(
f'playback_streams/{video_id}.json?drms=dt3&device_id={self._DEVICE_ID}',
video_id, 'Downloading video streams JSON')['main'][0]
stream_id = try_get(resp, lambda x: x['properties']['track']['stream_id'])
subtitles = dict((lang, [{
'ext': ext,
'url': self._API_URL_TEMPLATE % self._api_query(
f'videos/{video_id}/auth_subtitles/{lang}.{ext}', stream_id=stream_id),
} for ext in ('srt', 'vtt')]) for lang in (video.get('subtitle_completions') or {}))
mpd_url = resp['url']
# 720p is hidden in another MPD which can be found in the current manifest content
mpd_content = self._download_webpage(mpd_url, video_id, note='Downloading initial MPD manifest')
mpd_url = self._search_regex(
r'(?mi)<BaseURL>(http.+.mpd)', mpd_content, 'new manifest', default=mpd_url)
if 'mpdhd_high' not in mpd_url and 'sig=' not in mpd_url:
# Modify the URL to get 1080p
mpd_url = mpd_url.replace('mpdhd', 'mpdhd_high')
formats = self._extract_mpd_formats(mpd_url, video_id)
return {
'id': video_id,
'formats': formats,
'title': title,
'description': self.dict_selection(video.get('descriptions', {}), 'en'),
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('created_at')),
'uploader': video.get('author'),
'uploader_url': video.get('author_url'),
'like_count': int_or_none(try_get(video, lambda x: x['likes']['count'])),
'age_limit': parse_age_limit(video.get('rating')),
'thumbnails': thumbnails,
'subtitles': subtitles,
'episode_number': episode_number,
}
class VikiChannelIE(VikiBaseIE):
IE_NAME = 'viki:channel'
_VALID_URL = rf'{VikiBaseIE._VALID_URL_BASE}(?:tv|news|movies|artists)/(?P<id>[0-9]+c)'
_TESTS = [{
'url': 'http://www.viki.com/tv/50c-boys-over-flowers',
'info_dict': {
'id': '50c',
'title': 'Boys Over Flowers',
'description': 'md5:804ce6e7837e1fd527ad2f25420f4d59',
},
'playlist_mincount': 51,
}, {
'url': 'http://www.viki.com/tv/1354c-poor-nastya-complete',
'info_dict': {
'id': '1354c',
'title': 'Poor Nastya [COMPLETE]',
'description': 'md5:05bf5471385aa8b21c18ad450e350525',
},
'playlist_count': 127,
'skip': 'Page not found',
}, {
'url': 'http://www.viki.com/news/24569c-showbiz-korea',
'only_matching': True,
}, {
'url': 'http://www.viki.com/movies/22047c-pride-and-prejudice-2005',
'only_matching': True,
}, {
'url': 'http://www.viki.com/artists/2141c-shinee',
'only_matching': True,
}]
_video_types = ('episodes', 'movies', 'clips', 'trailers')
def _entries(self, channel_id):
params = {
'app': self._APP, 'token': self._token, 'only_ids': 'true',
'direction': 'asc', 'sort': 'number', 'per_page': 30,
}
video_types = self._configuration_arg('video_types') or self._video_types
for video_type in video_types:
if video_type not in self._video_types:
self.report_warning(f'Unknown video_type: {video_type}')
page_num = 0
while True:
page_num += 1
params['page'] = page_num
res = self._call_api(
f'containers/{channel_id}/{video_type}.json', channel_id, query=params, fatal=False,
note=f'Downloading {video_type.title()} JSON page {page_num}')
for video_id in res.get('response') or []:
yield self.url_result(f'https://www.viki.com/videos/{video_id}', VikiIE.ie_key(), video_id)
if not res.get('more'):
break
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._call_api(f'containers/{channel_id}.json', channel_id, 'Downloading channel JSON')
self._check_errors(channel)
return self.playlist_result(
self._entries(channel_id), channel_id,
self.dict_selection(channel['titles'], 'en'),
self.dict_selection(channel['descriptions'], 'en'))
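
For reference, the request signing that the removed `VikiBaseIE._sign_query` performed can be sketched as follows. This is a minimal standalone version, assuming the same scheme as the deleted code (HMAC-SHA1 over the query string plus `&t=<timestamp>`). The `app` and `secret` arguments are placeholders supplied by the caller, and the optional `timestamp` parameter is added here only to make the function deterministic for testing.

```python
import hashlib
import hmac
import time

API_URL_TEMPLATE = 'https://api.viki.io%s'


def build_query(path, app, version=5):
    # Append the app parameter with '?' or '&' depending on the path.
    separator = '&' if '?' in path else '?'
    return f'/v{version}/{path}{separator}app={app}'


def sign_query(path, app, secret, timestamp=None):
    """Return (timestamp, signature, url) for a signed request."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    query = build_query(path, app)
    # The signature covers the query plus the timestamp, but the
    # returned URL does not include the timestamp itself.
    signature = hmac.new(
        secret.encode('ascii'), f'{query}&t={timestamp}'.encode('ascii'),
        hashlib.sha1).hexdigest()
    return timestamp, signature, API_URL_TEMPLATE % query
```

In the deleted extractor the timestamp and signature were sent as request headers rather than as query parameters, which is why the URL returned here carries neither.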


@ -116,6 +116,7 @@ class VKIE(VKBaseIE):
'id': '-77521_162222515', 'id': '-77521_162222515',
'ext': 'mp4', 'ext': 'mp4',
'title': 'ProtivoGunz - Хуёвая песня', 'title': 'ProtivoGunz - Хуёвая песня',
'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*', 'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
'uploader_id': '39545378', 'uploader_id': '39545378',
'duration': 195, 'duration': 195,
@ -165,6 +166,7 @@ class VKIE(VKBaseIE):
'id': '-93049196_456239755', 'id': '-93049196_456239755',
'ext': 'mp4', 'ext': 'mp4',
'title': '8 серия (озвучка)', 'title': '8 серия (озвучка)',
'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
'duration': 8383, 'duration': 8383,
'comment_count': int, 'comment_count': int,
'uploader': 'Dizi2021', 'uploader': 'Dizi2021',
@ -240,6 +242,7 @@ class VKIE(VKBaseIE):
'upload_date': '20221005', 'upload_date': '20221005',
'uploader': 'Шальная Императрица', 'uploader': 'Шальная Императрица',
'uploader_id': '-74006511', 'uploader_id': '-74006511',
'description': 'md5:f9315f7786fa0e84e75e4f824a48b056',
}, },
}, },
{ {
@ -278,6 +281,25 @@ class VKIE(VKBaseIE):
}, },
'skip': 'No formats found', 'skip': 'No formats found',
}, },
{
'note': 'video has chapters',
'url': 'https://vkvideo.ru/video-18403220_456239696',
'info_dict': {
'id': '-18403220_456239696',
'ext': 'mp4',
'title': 'Трамп отменяет гранты // DeepSeek - Революция в ИИ // Илон Маск читер',
'description': 'md5:b112ea9de53683b6d03d29076f62eec2',
'uploader': 'Руслан Усачев',
'uploader_id': '-18403220',
'comment_count': int,
'like_count': int,
'duration': 1983,
'thumbnail': r're:https?://.+\.jpg',
'chapters': 'count:21',
'timestamp': 1738252883,
'upload_date': '20250130',
},
},
{ {
# live stream, hls and rtmp links, most likely already finished live # live stream, hls and rtmp links, most likely already finished live
# stream by the time you are reading this comment # stream by the time you are reading this comment
@ -449,7 +471,6 @@ def _real_extract(self, url):
return self.url_result(opts_url) return self.url_result(opts_url)
data = player['params'][0] data = player['params'][0]
title = unescapeHTML(data['md_title'])
# 2 = live # 2 = live
# 3 = post live (finished live) # 3 = post live (finished live)
@ -507,17 +528,29 @@ def _real_extract(self, url):
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': title, 'subtitles': subtitles,
'thumbnail': data.get('jpg'), **traverse_obj(mv_data, {
'uploader': data.get('md_author'), 'title': ('title', {unescapeHTML}),
'uploader_id': str_or_none(data.get('author_id') or mv_data.get('authorId')), 'description': ('desc', {clean_html}, filter),
'duration': int_or_none(data.get('duration') or mv_data.get('duration')), 'duration': ('duration', {int_or_none}),
'like_count': ('likes', {int_or_none}),
'comment_count': ('commcount', {int_or_none}),
}),
**traverse_obj(data, {
'title': ('md_title', {unescapeHTML}),
'description': ('description', {clean_html}, filter),
'thumbnail': ('jpg', {url_or_none}),
'uploader': ('md_author', {str}),
'uploader_id': (('author_id', 'authorId'), {str_or_none}, any),
'duration': ('duration', {int_or_none}),
'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), {
'title': ('text', {unescapeHTML}),
'start_time': 'time',
}),
}),
'timestamp': timestamp, 'timestamp': timestamp,
'view_count': view_count, 'view_count': view_count,
'like_count': int_or_none(mv_data.get('likes')),
'comment_count': int_or_none(mv_data.get('commcount')),
'is_live': is_live, 'is_live': is_live,
'subtitles': subtitles,
'_format_sort_fields': ('res', 'source'), '_format_sort_fields': ('res', 'source'),
} }
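
The new VK return value relies on two behaviours that may not be obvious from the diff: chapter entries are kept only when their `time` is an integer, and the second `**traverse_obj(data, ...)` block overrides the first only for fields it actually finds. The following is a plain-Python approximation of both, not the `traverse_obj` implementation itself. `extract_chapters` and `merge_metadata` are hypothetical names, and `html.unescape` stands in for yt-dlp's `unescapeHTML`.

```python
import html


def extract_chapters(time_codes):
    """Keep only entries whose 'time' is an int, as the lambda filter does."""
    chapters = []
    for entry in time_codes or []:
        if not isinstance(entry, dict) or not isinstance(entry.get('time'), int):
            continue
        chapter = {'start_time': entry['time']}
        if isinstance(entry.get('text'), str):
            chapter['title'] = html.unescape(entry['text'])
        chapters.append(chapter)
    return chapters


def merge_metadata(fallback, preferred):
    """Later sources win, but only for keys that actually have a value."""
    merged = {k: v for k, v in fallback.items() if v is not None}
    merged.update({k: v for k, v in preferred.items() if v is not None})
    return merged
```

Under this reading, a `duration` present only in `mv_data` survives the merge, while a `title` present in both sources takes the `md_title` value from `data`.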


@ -0,0 +1,185 @@
import itertools
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
clean_html,
extract_attributes,
parse_duration,
parse_qs,
)
from ..utils.traversal import (
find_element,
find_elements,
traverse_obj,
)
class VrSquareIE(InfoExtractor):
IE_NAME = 'vrsquare'
IE_DESC = 'VR SQUARE'
_BASE_URL = 'https://livr.jp'
_VALID_URL = r'https?://livr\.jp/contents/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://livr.jp/contents/P470896661',
'info_dict': {
'id': 'P470896661',
'ext': 'mp4',
'title': 'そこ曲がったら、櫻坂? 7年間お疲れ様!菅井友香の卒業を祝う会!前半 2022年11月6日放送分',
'description': 'md5:523726dc835aa8014dfe1e2b38d36cd1',
'duration': 1515.0,
'tags': 'count:2',
'thumbnail': r're:https?://media\.livr\.jp/vod/img/.+\.jpg',
},
}, {
'url': 'https://livr.jp/contents/P589523973',
'info_dict': {
'id': 'P589523973',
'ext': 'mp4',
'title': '薄闇に仰ぐ しだれ桜の妖艶',
'description': 'md5:a042f517b2cbb4ed6746707afec4d306',
'duration': 1084.0,
'tags': list,
'thumbnail': r're:https?://media\.livr\.jp/vod/img/.+\.jpg',
},
'skip': 'Paid video',
}, {
'url': 'https://livr.jp/contents/P316939908',
'info_dict': {
'id': 'P316939908',
'ext': 'mp4',
'title': '2024年5月16日 「今日は誰に恋をする?」公演 小栗有以 生誕祭',
'description': 'md5:2110bdcf947f28bd7d06ec420e51b619',
'duration': 8559.0,
'tags': list,
'thumbnail': r're:https?://media\.livr\.jp/vod/img/.+\.jpg',
},
'skip': 'Premium channel subscribers only',
}, {
# Accessible only in the VR SQUARE app
'url': 'https://livr.jp/contents/P126481458',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
status = self._download_json(
f'{self._BASE_URL}/webApi/contentsStatus/{video_id}',
video_id, 'Checking contents status', fatal=False)
if traverse_obj(status, 'result_code') == '40407':
self.raise_login_required('Unable to access this video')
try:
web_api = self._download_json(
f'{self._BASE_URL}/webApi/play/url/{video_id}', video_id)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 500:
raise ExtractorError('VR SQUARE app-only videos are not supported', expected=True)
raise
return {
'id': video_id,
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': self._html_search_meta('description', webpage),
'formats': self._extract_m3u8_formats(traverse_obj(web_api, (
'urls', ..., 'url', any)), video_id, 'mp4', fatal=False),
'thumbnail': self._html_search_meta('og:image', webpage),
**traverse_obj(webpage, {
'duration': ({find_element(cls='layout-product-data-time')}, {parse_duration}),
'tags': ({find_elements(cls='search-tag')}, ..., {clean_html}),
}),
}
class VrSquarePlaylistBaseIE(InfoExtractor):
_BASE_URL = 'https://livr.jp'
def _fetch_vids(self, source, keys=()):
for url_path in traverse_obj(source, (
*keys, {find_elements(cls='video', html=True)}, ...,
{extract_attributes}, 'data-url', {str}, filter),
):
yield self.url_result(
f'{self._BASE_URL}/contents/{url_path.removeprefix("/contents/")}', VrSquareIE)
def _entries(self, path, display_id, query=None):
for page in itertools.count(1):
ajax = self._download_json(
f'{self._BASE_URL}{path}', display_id,
f'Downloading playlist JSON page {page}',
query={'p': page, **(query or {})})
yield from self._fetch_vids(ajax, ('contents_render_list', ...))
if not traverse_obj(ajax, (('has_next', 'hasNext'), {bool}, any)):
break
class VrSquareChannelIE(VrSquarePlaylistBaseIE):
IE_NAME = 'vrsquare:channel'
_VALID_URL = r'https?://livr\.jp/channel/(?P<id>\w+)'
_TESTS = [{
'url': 'https://livr.jp/channel/H372648599',
'info_dict': {
'id': 'H372648599',
'title': 'AKB48チャンネル',
},
'playlist_mincount': 502,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
return self.playlist_result(
self._entries(f'/ajax/channel/{playlist_id}', playlist_id),
playlist_id, self._html_search_meta('og:title', webpage))
class VrSquareSearchIE(VrSquarePlaylistBaseIE):
IE_NAME = 'vrsquare:search'
_VALID_URL = r'https?://livr\.jp/web-search/?\?(?:[^#]+&)?w=[^#]+'
_TESTS = [{
'url': 'https://livr.jp/web-search?w=%23%E5%B0%8F%E6%A0%97%E6%9C%89%E4%BB%A5',
'info_dict': {
'id': '#小栗有以',
},
'playlist_mincount': 60,
}]
def _real_extract(self, url):
search_query = parse_qs(url)['w'][0]
return self.playlist_result(
self._entries('/ajax/web-search', search_query, {'w': search_query}), search_query)
class VrSquareSectionIE(VrSquarePlaylistBaseIE):
IE_NAME = 'vrsquare:section'
_VALID_URL = r'https?://livr\.jp/(?:category|headline)/(?P<id>\w+)'
_TESTS = [{
'url': 'https://livr.jp/category/C133936275',
'info_dict': {
'id': 'C133936275',
'title': 'そこ曲がったら、櫻坂VR',
},
'playlist_mincount': 308,
}, {
'url': 'https://livr.jp/headline/A296449604',
'info_dict': {
'id': 'A296449604',
'title': 'AKB48 アフターVR',
},
'playlist_mincount': 22,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
return self.playlist_result(
self._fetch_vids(webpage), playlist_id, self._html_search_meta('og:title', webpage))
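
The paging loop in `VrSquarePlaylistBaseIE._entries` can be illustrated without any network access. In the sketch below, `fetch_page` is a hypothetical callable that returns one decoded JSON page, and the `urls` key is a simplification: the real response carries HTML fragments under `contents_render_list`, from which the extractor reads `data-url` attributes. `str.removeprefix` requires Python 3.9 or newer.

```python
import itertools

BASE_URL = 'https://livr.jp'


def iter_playlist(fetch_page):
    """Yield content URLs page by page until the API reports no next page."""
    for page in itertools.count(1):
        ajax = fetch_page(page)
        for url_path in ajax.get('urls', []):
            # Normalise both '/contents/ID' and bare 'ID' to one canonical URL.
            yield f'{BASE_URL}/contents/{url_path.removeprefix("/contents/")}'
        # The extractor accepts either spelling of the flag.
        has_next = ajax.get('has_next', ajax.get('hasNext'))
        if not has_next:
            break
```

Because the function is a generator, later pages are requested only when the consumer asks for more entries, which is the same lazy behaviour `playlist_result` gets from `_entries`.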


@ -2,31 +2,33 @@
import time import time
import urllib.parse import urllib.parse
from .gigya import GigyaBaseIE from .common import InfoExtractor
from ..networking.exceptions import HTTPError from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html, clean_html,
extract_attributes, extract_attributes,
filter_dict,
float_or_none, float_or_none,
get_element_by_class, get_element_by_class,
get_element_html_by_class, get_element_html_by_class,
int_or_none, int_or_none,
join_nonempty, jwt_decode_hs256,
jwt_encode_hs256, jwt_encode_hs256,
make_archive_id, make_archive_id,
merge_dicts, merge_dicts,
parse_age_limit, parse_age_limit,
parse_duration,
parse_iso8601, parse_iso8601,
str_or_none, str_or_none,
strip_or_none, strip_or_none,
traverse_obj, traverse_obj,
try_call,
url_or_none, url_or_none,
urlencode_postdata,
) )
class VRTBaseIE(GigyaBaseIE): class VRTBaseIE(InfoExtractor):
_GEO_BYPASS = False _GEO_BYPASS = False
_PLAYER_INFO = { _PLAYER_INFO = {
'platform': 'desktop', 'platform': 'desktop',
@ -37,11 +39,11 @@ class VRTBaseIE(GigyaBaseIE):
'device': 'undefined (undefined)', 'device': 'undefined (undefined)',
'os': { 'os': {
'name': 'Windows', 'name': 'Windows',
'version': 'x86_64', 'version': '10',
}, },
'player': { 'player': {
'name': 'VRT web player', 'name': 'VRT web player',
'version': '2.7.4-prod-2023-04-19T06:05:45', 'version': '5.1.1-prod-2025-02-14T08:44:16"',
}, },
} }
# From https://player.vrt.be/vrtnws/js/main.js & https://player.vrt.be/ketnet/js/main.8cdb11341bcb79e4cd44.js # From https://player.vrt.be/vrtnws/js/main.js & https://player.vrt.be/ketnet/js/main.8cdb11341bcb79e4cd44.js
@ -90,20 +92,21 @@ def _extract_formats_and_subtitles(self, data, video_id):
def _call_api(self, video_id, client='null', id_token=None, version='v2'): def _call_api(self, video_id, client='null', id_token=None, version='v2'):
player_info = {'exp': (round(time.time(), 3) + 900), **self._PLAYER_INFO} player_info = {'exp': (round(time.time(), 3) + 900), **self._PLAYER_INFO}
player_token = self._download_json( player_token = self._download_json(
'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v2/tokens', f'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/{version}/tokens',
video_id, 'Downloading player token', headers={ video_id, 'Downloading player token', 'Failed to download player token', headers={
**self.geo_verification_headers(), **self.geo_verification_headers(),
'Content-Type': 'application/json', 'Content-Type': 'application/json',
}, data=json.dumps({ }, data=json.dumps({
'identityToken': id_token or {}, 'identityToken': id_token or '',
'playerInfo': jwt_encode_hs256(player_info, self._JWT_SIGNING_KEY, headers={ 'playerInfo': jwt_encode_hs256(player_info, self._JWT_SIGNING_KEY, headers={
'kid': self._JWT_KEY_ID, 'kid': self._JWT_KEY_ID,
}).decode(), }).decode(),
}, separators=(',', ':')).encode())['vrtPlayerToken'] }, separators=(',', ':')).encode())['vrtPlayerToken']
return self._download_json( return self._download_json(
f'https://media-services-public.vrt.be/media-aggregator/{version}/media-items/{video_id}', # The URL below redirects to https://media-services-public.vrt.be/media-aggregator/{version}/media-items/{video_id}
video_id, 'Downloading API JSON', query={ f'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/{version}/videos/{video_id}',
video_id, 'Downloading API JSON', 'Failed to download API JSON', query={
'vrtPlayerToken': player_token, 'vrtPlayerToken': player_token,
'client': client, 'client': client,
}, expected_status=400) }, expected_status=400)
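
The `playerInfo` value posted to the tokens endpoint is an HS256-signed JWT whose payload expires 900 seconds after creation. The sketch below is a generic HS256 encoder written against the JWT convention (unpadded base64url segments); it is not yt-dlp's `jwt_encode_hs256`, whose encoding details may differ, and the key and `kid` used with it should be treated as placeholders. The `now` parameter is added only to make `make_player_info` deterministic for testing.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    # JWT segments are base64url-encoded with the '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b'=')


def jwt_encode_hs256_sketch(payload, key, headers=None):
    """Return header.payload.signature as bytes, signed with HMAC-SHA256."""
    header = {'alg': 'HS256', 'typ': 'JWT', **(headers or {})}
    signing_input = b'.'.join(
        b64url(json.dumps(part, separators=(',', ':')).encode())
        for part in (header, payload))
    signature = hmac.new(key.encode(), signing_input, hashlib.sha256).digest()
    return signing_input + b'.' + b64url(signature)


def make_player_info(static_info, now=None, lifetime=900):
    # Mirrors `{'exp': round(time.time(), 3) + 900, **self._PLAYER_INFO}`.
    now = time.time() if now is None else now
    return {'exp': round(now, 3) + lifetime, **static_info}
```

The encoder returns bytes, which is consistent with the `.decode()` call applied to the token before it is placed in the JSON request body.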
@ -177,215 +180,286 @@ def _real_extract(self, url):
class VrtNUIE(VRTBaseIE): class VrtNUIE(VRTBaseIE):
IE_DESC = 'VRT MAX' IE_NAME = 'vrtmax'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/vrtnu/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)' IE_DESC = 'VRT MAX (formerly VRT NU)'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?:vrtnu|vrtmax)/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# CONTENT_IS_AGE_RESTRICTED 'url': 'https://www.vrt.be/vrtmax/a-z/ket---doc/trailer/ket---doc-trailer-s6/',
'url': 'https://www.vrt.be/vrtnu/a-z/de-ideale-wereld/2023-vj/de-ideale-wereld-d20230116/',
'info_dict': { 'info_dict': {
'id': 'pbs-pub-855b00a8-6ce2-4032-ac4f-1fcf3ae78524$vid-d2243aa1-ec46-4e34-a55b-92568459906f', 'id': 'pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Tom Waes', 'channel': 'ketnet',
'description': 'Satirisch actualiteitenmagazine met Ella Leyers. Tom Waes is te gast.', 'description': 'Neem een kijkje in de bijzondere wereld van deze Ketnetters.',
'timestamp': 1673905125, 'display_id': 'ket---doc-trailer-s6',
'release_timestamp': 1673905125, 'duration': 30.0,
'series': 'De ideale wereld', 'episode': 'Reeks 6 volledig vanaf 3 maart',
'season_id': '1672830988794', 'episode_id': '1739450401467',
'episode': 'Aflevering 1', 'season': 'Trailer',
'episode_number': 1, 'season_id': '1739450401467',
'episode_id': '1672830988861', 'series': 'Ket & Doc',
'display_id': 'de-ideale-wereld-d20230116', 'thumbnail': 'https://images.vrt.be/orig/2025/02/21/63f07122-5bbd-4ca1-b42e-8565c6cd95df.jpg',
'channel': 'VRT', 'timestamp': 1740373200,
'duration': 1939.0, 'title': 'Reeks 6 volledig vanaf 3 maart',
'thumbnail': 'https://images.vrt.be/orig/2023/01/10/1bb39cb3-9115-11ed-b07d-02b7b76bf47f.jpg', 'upload_date': '20250224',
'release_date': '20230116', '_old_archive_ids': [
'upload_date': '20230116', 'canvas pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
'age_limit': 12, 'ketnet pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
],
}, },
}, { }, {
'url': 'https://www.vrt.be/vrtnu/a-z/buurman--wat-doet-u-nu-/6/buurman--wat-doet-u-nu--s6-trailer/', 'url': 'https://www.vrt.be/vrtmax/a-z/meisjes/6/meisjes-s6a5/',
'info_dict': { 'info_dict': {
'id': 'pbs-pub-ad4050eb-d9e5-48c2-9ec8-b6c355032361$vid-0465537a-34a8-4617-8352-4d8d983b4eee', 'id': 'pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Trailer seizoen 6 \'Buurman, wat doet u nu?\'', 'channel': 'ketnet',
'description': 'md5:197424726c61384b4e5c519f16c0cf02', 'description': 'md5:713793f15cbf677f66200b36b7b1ec5a',
'timestamp': 1652940000, 'display_id': 'meisjes-s6a5',
'release_timestamp': 1652940000, 'duration': 1336.02,
'series': 'Buurman, wat doet u nu?', 'episode': 'Week 5',
'season': 'Seizoen 6', 'episode_id': '1684157692901',
'episode_number': 5,
'season': '6',
'season_id': '1684157692901',
'season_number': 6, 'season_number': 6,
'season_id': '1652344200907', 'series': 'Meisjes',
'episode': 'Aflevering 0', 'thumbnail': 'https://images.vrt.be/orig/2023/05/14/bf526ae0-f1d9-11ed-91d7-02b7b76bf47f.jpg',
'episode_number': 0, 'timestamp': 1685251800,
'episode_id': '1652951873524', 'title': 'Week 5',
'display_id': 'buurman--wat-doet-u-nu--s6-trailer', 'upload_date': '20230528',
'channel': 'VRT', '_old_archive_ids': [
'duration': 33.13, 'canvas pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
'thumbnail': 'https://images.vrt.be/orig/2022/05/23/3c234d21-da83-11ec-b07d-02b7b76bf47f.jpg', 'ketnet pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
'release_date': '20220519', ],
'upload_date': '20220519', },
}, {
'url': 'https://www.vrt.be/vrtnu/a-z/taboe/3/taboe-s3a4/',
'info_dict': {
'id': 'pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
'ext': 'mp4',
'channel': 'een',
'description': 'md5:bf61345a95eca9393a95de4a7a54b5c6',
'display_id': 'taboe-s3a4',
'duration': 2882.02,
'episode': 'Mensen met het syndroom van Gilles de la Tourette',
'episode_id': '1739055911734',
'episode_number': 4,
'season': '3',
'season_id': '1739055911734',
'season_number': 3,
'series': 'Taboe',
'thumbnail': 'https://images.vrt.be/orig/2025/02/19/8198496c-d1ae-4bca-9a48-761cf3ea3ff2.jpg',
'timestamp': 1740286800,
'title': 'Mensen met het syndroom van Gilles de la Tourette',
'upload_date': '20250223',
'_old_archive_ids': [
'canvas pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
'ketnet pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
],
}, },
'params': {'skip_download': 'm3u8'},
}] }]
_NETRC_MACHINE = 'vrtnu' _NETRC_MACHINE = 'vrtnu'
_authenticated = False
_TOKEN_COOKIE_DOMAIN = '.www.vrt.be'
_ACCESS_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_at'
_REFRESH_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_rt'
_VIDEO_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_vt'
_VIDEO_PAGE_QUERY = '''
query VideoPage($pageId: ID!) {
page(id: $pageId) {
... on EpisodePage {
episode {
ageRaw
description
durationRaw
episodeNumberRaw
id
name
onTimeRaw
program {
title
}
season {
id
titleRaw
}
title
brand
}
ldjson
player {
image {
templateUrl
}
modes {
streamId
}
}
}
}
}
'''
def _fetch_tokens(self):
has_credentials = self._get_login_info()[0]
access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
if (access_token and not self._is_jwt_token_expired(access_token)
and video_token and not self._is_jwt_token_expired(video_token)):
return access_token, video_token
if has_credentials:
access_token, video_token = self.cache.load(self._NETRC_MACHINE, 'token_data', default=(None, None))
if (access_token and not self._is_jwt_token_expired(access_token)
and video_token and not self._is_jwt_token_expired(video_token)):
self.write_debug('Restored tokens from cache')
self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._ACCESS_TOKEN_COOKIE_NAME, access_token)
self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._VIDEO_TOKEN_COOKIE_NAME, video_token)
return access_token, video_token
if not self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME):
return None, None
self._request_webpage(
'https://www.vrt.be/vrtmax/sso/refresh', None,
note='Refreshing tokens', errnote='Failed to refresh tokens', fatal=False)
access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
if not access_token or not video_token:
self.cache.store(self._NETRC_MACHINE, 'refresh_token', None)
self.cookiejar.clear(self._TOKEN_COOKIE_DOMAIN, '/vrtmax/sso', self._REFRESH_TOKEN_COOKIE_NAME)
msg = 'Refreshing of tokens failed'
if not has_credentials:
self.report_warning(msg)
return None, None
self.report_warning(f'{msg}. Re-logging in')
return self._perform_login(*self._get_login_info())
if has_credentials:
self.cache.store(self._NETRC_MACHINE, 'token_data', (access_token, video_token))
return access_token, video_token
def _get_vrt_cookie(self, cookie_name):
# Refresh token cookie is scoped to /vrtmax/sso, others are scoped to /
return try_call(lambda: self._get_cookies('https://www.vrt.be/vrtmax/sso')[cookie_name].value)
@staticmethod
def _is_jwt_token_expired(token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
def _perform_login(self, username, password): def _perform_login(self, username, password):
auth_info = self._gigya_login({ refresh_token = self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME)
'APIKey': '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy', if refresh_token and not self._is_jwt_token_expired(refresh_token):
'targetEnv': 'jssdk', self.write_debug('Using refresh token from logged-in cookies; skipping login with credentials')
return
refresh_token = self.cache.load(self._NETRC_MACHINE, 'refresh_token', default=None)
if refresh_token and not self._is_jwt_token_expired(refresh_token):
self.write_debug('Restored refresh token from cache')
self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._REFRESH_TOKEN_COOKIE_NAME, refresh_token, path='/vrtmax/sso')
return
self._request_webpage(
'https://www.vrt.be/vrtmax/sso/login', None,
note='Getting session cookies', errnote='Failed to get session cookies')
login_data = self._download_json(
'https://login.vrt.be/perform_login', None, data=json.dumps({
'clientId': 'vrtnu-site',
'loginID': username, 'loginID': username,
'password': password, 'password': password,
'authMode': 'cookie', }).encode(), headers={
}) 'Content-Type': 'application/json',
'Oidcxsrf': self._get_cookies('https://login.vrt.be')['OIDCXSRF'].value,
}, note='Logging in', errnote='Login failed', expected_status=403)
if login_data.get('errorCode'):
raise ExtractorError(f'Login failed: {login_data.get("errorMessage")}', expected=True)
if auth_info.get('errorDetails'): self._request_webpage(
raise ExtractorError(f'Unable to login. VrtNU said: {auth_info["errorDetails"]}', expected=True) login_data['redirectUrl'], None,
note='Getting access token', errnote='Failed to get access token')
access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
refresh_token = self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME)
if not all((access_token, video_token, refresh_token)):
raise ExtractorError('Unable to extract token cookie values')
self.cache.store(self._NETRC_MACHINE, 'token_data', (access_token, video_token))
self.cache.store(self._NETRC_MACHINE, 'refresh_token', refresh_token)
return access_token, video_token
def _real_extract(self, url):
display_id = self._match_id(url)
access_token, video_token = self._fetch_tokens()
metadata = self._download_json(
f'https://www.vrt.be/vrtnu-api/graphql{"" if access_token else "/public"}/v1',
display_id, 'Downloading asset JSON', 'Unable to download asset JSON',
data=json.dumps({
'operationName': 'VideoPage',
'query': self._VIDEO_PAGE_QUERY,
'variables': {'pageId': urllib.parse.urlparse(url).path},
}).encode(),
headers=filter_dict({
'Authorization': f'Bearer {access_token}' if access_token else None,
'Content-Type': 'application/json',
'x-vrt-client-name': 'WEB',
'x-vrt-client-version': '1.5.9',
'x-vrt-zone': 'default',
}))['data']['page']
video_id = metadata['player']['modes'][0]['streamId']
# Sometimes authentication fails for no good reason, retry
for retry in self.RetryManager():
if retry.attempt > 1:
self._sleep(1, None)
try: try:
self._request_webpage( streaming_info = self._call_api(video_id, 'vrtnu-web@PROD', id_token=video_token)
'https://token.vrt.be/vrtnuinitlogin', None, note='Requesting XSRF Token',
errnote='Could not get XSRF Token', query={
'provider': 'site',
'destination': 'https://www.vrt.be/vrtnu/',
})
self._request_webpage(
'https://login.vrt.be/perform_login', None,
note='Performing login', errnote='Login failed',
query={'client_id': 'vrtnu-site'}, data=urlencode_postdata({
'UID': auth_info['UID'],
'UIDSignature': auth_info['UIDSignature'],
'signatureTimestamp': auth_info['signatureTimestamp'],
'_csrf': self._get_cookies('https://login.vrt.be').get('OIDCXSRF').value,
}))
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401: if not video_token and isinstance(e.cause, HTTPError) and e.cause.status == 404:
retry.error = e self.raise_login_required()
continue
raise raise
self._authenticated = True formats, subtitles = self._extract_formats_and_subtitles(streaming_info, video_id)
def _real_extract(self, url): code = traverse_obj(streaming_info, ('code', {str}))
display_id = self._match_id(url) if not formats and code:
parsed_url = urllib.parse.urlparse(url) if code in ('CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS', 'CONTENT_AVAILABLE_ONLY_IN_BE', 'CONTENT_UNAVAILABLE_VIA_PROXY'):
details = self._download_json(
f'{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path.rstrip("/")}.model.json',
display_id, 'Downloading asset JSON', 'Unable to download asset JSON')['details']
watch_info = traverse_obj(details, (
'actions', lambda _, v: v['type'] == 'watch-episode', {dict}), get_all=False) or {}
video_id = join_nonempty(
'episodePublicationId', 'episodeVideoId', delim='$', from_dict=watch_info)
if '$' not in video_id:
raise ExtractorError('Unable to extract video ID')
vrtnutoken = self._download_json(
'https://token.vrt.be/refreshtoken', video_id, note='Retrieving vrtnutoken',
errnote='Token refresh failed')['vrtnutoken'] if self._authenticated else None
video_info = self._call_api(video_id, 'vrtnu-web@PROD', vrtnutoken)
if 'title' not in video_info:
code = video_info.get('code')
if code in ('AUTHENTICATION_REQUIRED', 'CONTENT_IS_AGE_RESTRICTED'):
self.raise_login_required(code, method='password')
elif code in ('INVALID_LOCATION', 'CONTENT_AVAILABLE_ONLY_IN_BE'):
self.raise_geo_restricted(countries=['BE']) self.raise_geo_restricted(countries=['BE'])
elif code == 'CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS_AND_EXPATS': elif code in ('CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS_AND_EXPATS', 'CONTENT_IS_AGE_RESTRICTED', 'CONTENT_REQUIRES_AUTHENTICATION'):
if not self._authenticated: self.raise_login_required()
self.raise_login_required(code, method='password') else:
self.raise_geo_restricted(countries=['BE']) self.raise_no_formats(f'Unable to extract formats: {code}')
raise ExtractorError(code, expected=True)
formats, subtitles = self._extract_formats_and_subtitles(video_info, video_id)
return { return {
**traverse_obj(details, { 'duration': float_or_none(streaming_info.get('duration'), 1000),
'title': 'title', 'thumbnail': url_or_none(streaming_info.get('posterImageUrl')),
'description': ('description', {clean_html}), **self._json_ld(traverse_obj(metadata, ('ldjson', ..., {json.loads})), video_id, fatal=False),
'timestamp': ('data', 'episode', 'onTime', 'raw', {parse_iso8601}), **traverse_obj(metadata, ('episode', {
'release_timestamp': ('data', 'episode', 'onTime', 'raw', {parse_iso8601}), 'title': ('title', {str}),
'series': ('data', 'program', 'title'), 'description': ('description', {str}),
'season': ('data', 'season', 'title', 'value'), 'timestamp': ('onTimeRaw', {parse_iso8601}),
'season_number': ('data', 'season', 'title', 'raw', {int_or_none}), 'series': ('program', 'title', {str}),
'season_id': ('data', 'season', 'id', {str_or_none}), 'season': ('season', 'titleRaw', {str}),
'episode': ('data', 'episode', 'number', 'value', {str_or_none}), 'season_number': ('season', 'titleRaw', {int_or_none}),
'episode_number': ('data', 'episode', 'number', 'raw', {int_or_none}), 'season_id': ('id', {str_or_none}),
'episode_id': ('data', 'episode', 'id', {str_or_none}), 'episode': ('title', {str}),
'age_limit': ('data', 'episode', 'age', 'raw', {parse_age_limit}), 'episode_number': ('episodeNumberRaw', {int_or_none}),
}), 'episode_id': ('id', {str_or_none}),
'age_limit': ('ageRaw', {parse_age_limit}),
'channel': ('brand', {str}),
'duration': ('durationRaw', {parse_duration}),
})),
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'channel': 'VRT',
'formats': formats,
'duration': float_or_none(video_info.get('duration'), 1000),
'thumbnail': url_or_none(video_info.get('posterImageUrl')),
'subtitles': subtitles,
'_old_archive_ids': [make_archive_id('Canvas', video_id)],
}
class KetnetIE(VRTBaseIE):
_VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ketnet.be/kijken/m/meisjes/6/meisjes-s6a5',
'info_dict': {
'id': 'pbs-pub-39f8351c-a0a0-43e6-8394-205d597d6162$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
'ext': 'mp4',
'title': 'Meisjes',
'episode': 'Reeks 6: Week 5',
'season': 'Reeks 6',
'series': 'Meisjes',
'timestamp': 1685251800,
'upload_date': '20230528',
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_json(
'https://senior-bff.ketnet.be/graphql', display_id, query={
'query': '''{
video(id: "content/ketnet/nl/%s.model.json") {
description
episodeNr
imageUrl
mediaReference
programTitle
publicationDate
seasonTitle
subtitleVideodetail
titleVideodetail
}
}''' % display_id, # noqa: UP031
})['data']['video']
video_id = urllib.parse.unquote(video['mediaReference'])
data = self._call_api(video_id, 'ketnet@PROD', version='v1')
formats, subtitles = self._extract_formats_and_subtitles(data, video_id)
return {
'id': video_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'_old_archive_ids': [make_archive_id('Canvas', video_id)], '_old_archive_ids': [make_archive_id('Canvas', video_id),
**traverse_obj(video, { make_archive_id('Ketnet', video_id)],
'title': ('titleVideodetail', {str}),
'description': ('description', {str}),
'thumbnail': ('thumbnail', {url_or_none}),
'timestamp': ('publicationDate', {parse_iso8601}),
'series': ('programTitle', {str}),
'season': ('seasonTitle', {str}),
'episode': ('subtitleVideodetail', {str}),
'episode_number': ('episodeNr', {int_or_none}),
}),
} }
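
The new `_is_jwt_token_expired` helper in the hunk above treats a token as expired once it has fewer than 300 seconds left. A minimal standalone sketch of that check follows, assuming the token is a standard three-part JWT whose payload carries an `exp` claim. The names `decode_jwt_payload`, `is_jwt_expired` and `make_unsigned_jwt` are hypothetical stand-ins for illustration; the diff itself relies on yt-dlp's `jwt_decode_hs256`.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the payload of a JWT as a dict, without verifying the signature."""
    payload_b64 = token.split('.')[1]
    # base64url segments in a JWT omit padding; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_jwt_expired(token, margin=300, now=None):
    """True if the token expires less than `margin` seconds after `now`."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)['exp'] - now < margin


def make_unsigned_jwt(payload):
    """Build a throwaway token with a dummy signature, for demonstration only."""
    def b64(obj):
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b'=').decode()
    return f'{b64({"alg": "HS256", "typ": "JWT"})}.{b64(payload)}.sig'
```

The safety margin means a token is refreshed slightly before it actually lapses, so a request started with it is unlikely to fail midway.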

View File

@ -109,7 +109,7 @@ def _parse_video_info(self, video_info):
**traverse_obj(video_info, { **traverse_obj(video_info, {
'display_id': ('mblogid', {str_or_none}), 'display_id': ('mblogid', {str_or_none}),
'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), 'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'),
{lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}, filter), {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}, filter),
'alt_title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter), 'alt_title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter),
'description': ('text_raw', {str}), 'description': ('text_raw', {str}),
'duration': ('page_info', 'media_info', 'duration', {int_or_none}), 'duration': ('page_info', 'media_info', 'duration', {int_or_none}),
@ -213,6 +213,7 @@ class WeiboVideoIE(WeiboBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'LEZDodaiW', 'display_id': 'LEZDodaiW',
'title': '稍微了解了一下靡烟miya感觉这东西也太二了', 'title': '稍微了解了一下靡烟miya感觉这东西也太二了',
'alt_title': '稍微了解了一下靡烟miya感觉这东西也太二了',
'description': '稍微了解了一下靡烟miya感觉这东西也太二了 http://t.cn/A6aerGsM \u200b\u200b\u200b', 'description': '稍微了解了一下靡烟miya感觉这东西也太二了 http://t.cn/A6aerGsM \u200b\u200b\u200b',
'duration': 76, 'duration': 76,
'timestamp': 1659344278, 'timestamp': 1659344278,
@ -224,6 +225,7 @@ class WeiboVideoIE(WeiboBaseIE):
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'repost_count': int, 'repost_count': int,
'_old_archive_ids': ['weibomobile 4797700463137878'],
}, },
}] }]

View File

@ -11,7 +11,7 @@
) )
class WykopBaseExtractor(InfoExtractor): class WykopBaseIE(InfoExtractor):
def _get_token(self, force_refresh=False): def _get_token(self, force_refresh=False):
if not force_refresh: if not force_refresh:
maybe_cached = self.cache.load('wykop', 'bearer') maybe_cached = self.cache.load('wykop', 'bearer')
@ -72,7 +72,7 @@ def _common_data_extract(self, data):
} }
class WykopDigIE(WykopBaseExtractor): class WykopDigIE(WykopBaseIE):
IE_NAME = 'wykop:dig' IE_NAME = 'wykop:dig'
_VALID_URL = r'https?://(?:www\.)?wykop\.pl/link/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?wykop\.pl/link/(?P<id>\d+)'
@ -128,7 +128,7 @@ def _real_extract(self, url):
} }
class WykopDigCommentIE(WykopBaseExtractor): class WykopDigCommentIE(WykopBaseIE):
IE_NAME = 'wykop:dig:comment' IE_NAME = 'wykop:dig:comment'
_VALID_URL = r'https?://(?:www\.)?wykop\.pl/link/(?P<dig_id>\d+)/[^/]+/komentarz/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?wykop\.pl/link/(?P<dig_id>\d+)/[^/]+/komentarz/(?P<id>\d+)'
@ -177,7 +177,7 @@ def _real_extract(self, url):
} }
class WykopPostIE(WykopBaseExtractor): class WykopPostIE(WykopBaseIE):
IE_NAME = 'wykop:post' IE_NAME = 'wykop:post'
_VALID_URL = r'https?://(?:www\.)?wykop\.pl/wpis/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?wykop\.pl/wpis/(?P<id>\d+)'
@ -228,7 +228,7 @@ def _real_extract(self, url):
} }
class WykopPostCommentIE(WykopBaseExtractor): class WykopPostCommentIE(WykopBaseIE):
IE_NAME = 'wykop:post:comment' IE_NAME = 'wykop:post:comment'
_VALID_URL = r'https?://(?:www\.)?wykop\.pl/wpis/(?P<post_id>\d+)/[^/#]+#(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?wykop\.pl/wpis/(?P<post_id>\d+)/[^/#]+#(?P<id>\d+)'

View File

@ -2,15 +2,17 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
bug_reports_message,
determine_ext, determine_ext,
extract_attributes,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
parse_qs, parse_qs,
traverse_obj, qualities,
try_get, try_get,
update_url_query,
url_or_none, url_or_none,
) )
from ..utils.traversal import traverse_obj
class YandexVideoIE(InfoExtractor): class YandexVideoIE(InfoExtractor):
@ -186,7 +188,22 @@ def _real_extract(self, url):
return self.url_result(data_json['video']['url']) return self.url_result(data_json['video']['url'])
class ZenYandexIE(InfoExtractor): class ZenYandexBaseIE(InfoExtractor):
def _fetch_ssr_data(self, url, video_id):
webpage = self._download_webpage(url, video_id)
redirect = self._search_json(
r'(?:var|let|const)\s+it\s*=', webpage, 'redirect', video_id, default={}).get('retpath')
if redirect:
video_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, video_id, note='Redirecting')
return video_id, self._search_json(
r'(?:var|let|const)\s+_params\s*=\s*\(', webpage, 'metadata', video_id,
contains_pattern=r'{["\']ssrData.+}')['ssrData']
class ZenYandexIE(ZenYandexBaseIE):
IE_NAME = 'dzen.ru'
IE_DESC = 'Дзен (dzen) formerly Яндекс.Дзен (Yandex Zen)'
_VALID_URL = r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)' _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7', 'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
@ -216,6 +233,7 @@ class ZenYandexIE(InfoExtractor):
'timestamp': 1573465585, 'timestamp': 1573465585,
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'skip': 'The page does not exist',
}, { }, {
'url': 'https://zen.yandex.ru/video/watch/6002240ff8b1af50bb2da5e3', 'url': 'https://zen.yandex.ru/video/watch/6002240ff8b1af50bb2da5e3',
'info_dict': { 'info_dict': {
@ -227,6 +245,9 @@ class ZenYandexIE(InfoExtractor):
'uploader': 'TechInsider', 'uploader': 'TechInsider',
'timestamp': 1611378221, 'timestamp': 1611378221,
'upload_date': '20210123', 'upload_date': '20210123',
'view_count': int,
'duration': 243,
'tags': ['опыт', 'эксперимент', 'огонь'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
@ -240,6 +261,9 @@ class ZenYandexIE(InfoExtractor):
'uploader': 'TechInsider', 'uploader': 'TechInsider',
'upload_date': '20210123', 'upload_date': '20210123',
'timestamp': 1611378221, 'timestamp': 1611378221,
'view_count': int,
'duration': 243,
'tags': ['опыт', 'эксперимент', 'огонь'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
@ -252,44 +276,56 @@ class ZenYandexIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_id, ssr_data = self._fetch_ssr_data(url, video_id)
redirect = self._search_json(r'var it\s*=', webpage, 'redirect', id, default={}).get('retpath') video_data = ssr_data['videoMetaResponse']
if redirect:
video_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, video_id, note='Redirecting')
data_json = self._search_json(
r'("data"\s*:|data\s*=)', webpage, 'metadata', video_id, contains_pattern=r'{["\']_*serverState_*video.+}')
serverstate = self._search_regex(r'(_+serverState_+video-site_[^_]+_+)', webpage, 'server state')
uploader = self._search_regex(r'(<a\s*class=["\']card-channel-link[^"\']+["\'][^>]+>)',
webpage, 'uploader', default='<a>')
uploader_name = extract_attributes(uploader).get('aria-label')
item_id = traverse_obj(data_json, (serverstate, 'videoViewer', 'openedItemId', {str}))
video_json = traverse_obj(data_json, (serverstate, 'videoViewer', 'items', item_id, {dict})) or {}
formats, subtitles = [], {} formats, subtitles = [], {}
for s_url in traverse_obj(video_json, ('video', 'streams', ..., {url_or_none})): quality = qualities(('4', '0', '1', '2', '3', '5', '6', '7'))
# Deduplicate stream URLs. The "dzen_dash" query parameter is present in some URLs but can be omitted
stream_urls = set(traverse_obj(video_data, (
'video', ('id', ('streams', ...), ('mp4Streams', ..., 'url'), ('oneVideoStreams', ..., 'url')),
{url_or_none}, {update_url_query(query={'dzen_dash': []})})))
for s_url in stream_urls:
ext = determine_ext(s_url) ext = determine_ext(s_url)
if ext == 'mpd': content_type = traverse_obj(parse_qs(s_url), ('ct', 0))
fmts, subs = self._extract_mpd_formats_and_subtitles(s_url, video_id, mpd_id='dash') if ext == 'mpd' or content_type == '6':
elif ext == 'm3u8': fmts, subs = self._extract_mpd_formats_and_subtitles(s_url, video_id, mpd_id='dash', fatal=False)
fmts, subs = self._extract_m3u8_formats_and_subtitles(s_url, video_id, 'mp4') elif ext == 'm3u8' or content_type == '8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(s_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
elif content_type == '0':
format_type = traverse_obj(parse_qs(s_url), ('type', 0))
formats.append({
'url': s_url,
'format_id': format_type,
'ext': 'mp4',
'quality': quality(format_type),
})
continue
else:
self.report_warning(f'Unsupported stream URL: {s_url}{bug_reports_message()}')
continue
formats.extend(fmts) formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs) self._merge_subtitles(subs, target=subtitles)
return { return {
'id': video_id, 'id': video_id,
'title': video_json.get('title') or self._og_search_title(webpage),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'duration': int_or_none(video_json.get('duration')), **traverse_obj(video_data, {
'view_count': int_or_none(video_json.get('views')), 'title': ('title', {str}),
'timestamp': int_or_none(video_json.get('publicationDate')), 'description': ('description', {str}),
'uploader': uploader_name or data_json.get('authorName') or try_get(data_json, lambda x: x['publisher']['name']), 'thumbnail': ('image', {url_or_none}),
'description': video_json.get('description') or self._og_search_description(webpage), 'duration': ('video', 'duration', {int_or_none}),
'thumbnail': self._og_search_thumbnail(webpage) or try_get(data_json, lambda x: x['og']['imageUrl']), 'view_count': ('video', 'views', {int_or_none}),
'timestamp': ('publicationDate', {int_or_none}),
'tags': ('tags', ..., {str}),
'uploader': ('source', 'title', {str}),
}),
} }
class ZenYandexChannelIE(InfoExtractor): class ZenYandexChannelIE(ZenYandexBaseIE):
IE_NAME = 'dzen.ru:channel'
_VALID_URL = r'https?://(zen\.yandex|dzen)\.ru/(?!media|video)(?:id/)?(?P<id>[a-z0-9-_]+)' _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru/(?!media|video)(?:id/)?(?P<id>[a-z0-9-_]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://zen.yandex.ru/tok_media', 'url': 'https://zen.yandex.ru/tok_media',
@ -323,8 +359,8 @@ class ZenYandexChannelIE(InfoExtractor):
'url': 'https://zen.yandex.ru/jony_me', 'url': 'https://zen.yandex.ru/jony_me',
'info_dict': { 'info_dict': {
'id': 'jony_me', 'id': 'jony_me',
'description': 'md5:ce0a5cad2752ab58701b5497835b2cc5', 'description': 'md5:7c30d11dc005faba8826feae99da3113',
'title': 'JONY ', 'title': 'JONY',
}, },
'playlist_count': 18, 'playlist_count': 18,
}, { }, {
@ -333,9 +369,8 @@ class ZenYandexChannelIE(InfoExtractor):
'url': 'https://zen.yandex.ru/tatyanareva', 'url': 'https://zen.yandex.ru/tatyanareva',
'info_dict': { 'info_dict': {
'id': 'tatyanareva', 'id': 'tatyanareva',
'description': 'md5:40a1e51f174369ec3ba9d657734ac31f', 'description': 'md5:92e56fa730a932ca2483ba5c2186ad96',
'title': 'Татьяна Рева', 'title': 'Татьяна Рева',
'entries': 'maxcount:200',
}, },
'playlist_mincount': 46, 'playlist_mincount': 46,
}, { }, {
@ -348,43 +383,31 @@ class ZenYandexChannelIE(InfoExtractor):
'playlist_mincount': 657, 'playlist_mincount': 657,
}] }]
def _entries(self, item_id, server_state_json, server_settings_json): def _entries(self, feed_data, channel_id):
items = (traverse_obj(server_state_json, ('feed', 'items', ...))
or traverse_obj(server_settings_json, ('exportData', 'items', ...)))
more = (traverse_obj(server_state_json, ('links', 'more'))
or traverse_obj(server_settings_json, ('exportData', 'more', 'link')))
next_page_id = None next_page_id = None
for page in itertools.count(1): for page in itertools.count(1):
for item in items or []: for item in traverse_obj(feed_data, (
if item.get('type') != 'gif': (None, ('items', lambda _, v: v['tab'] in ('shorts', 'longs'))),
continue 'items', lambda _, v: url_or_none(v['link']),
video_id = traverse_obj(item, 'publication_id', 'publicationId') or '' )):
yield self.url_result(item['link'], ZenYandexIE, video_id.split(':')[-1]) yield self.url_result(item['link'], ZenYandexIE, item.get('id'), title=item.get('title'))
more = traverse_obj(feed_data, ('more', 'link', {url_or_none}))
current_page_id = next_page_id current_page_id = next_page_id
next_page_id = traverse_obj(parse_qs(more), ('next_page_id', -1)) next_page_id = traverse_obj(parse_qs(more), ('next_page_id', -1))
if not all((more, items, next_page_id, next_page_id != current_page_id)): if not all((more, next_page_id, next_page_id != current_page_id)):
break break
data = self._download_json(more, item_id, note=f'Downloading Page {page}') feed_data = self._download_json(more, channel_id, note=f'Downloading Page {page}')
items, more = data.get('items'), traverse_obj(data, ('more', 'link'))
def _real_extract(self, url): def _real_extract(self, url):
item_id = self._match_id(url) channel_id = self._match_id(url)
webpage = self._download_webpage(url, item_id) channel_id, ssr_data = self._fetch_ssr_data(url, channel_id)
redirect = self._search_json( channel_data = ssr_data['exportResponse']
r'var it\s*=', webpage, 'redirect', item_id, default={}).get('retpath')
if redirect:
item_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, item_id, note='Redirecting')
data = self._search_json(
r'("data"\s*:|data\s*=)', webpage, 'channel data', item_id, contains_pattern=r'{\"__serverState__.+}')
server_state_json = traverse_obj(data, lambda k, _: k.startswith('__serverState__'), get_all=False)
server_settings_json = traverse_obj(data, lambda k, _: k.startswith('__serverSettings__'), get_all=False)
return self.playlist_result( return self.playlist_result(
self._entries(item_id, server_state_json, server_settings_json), self._entries(channel_data['feedData'], channel_id),
item_id, traverse_obj(server_state_json, ('channel', 'source', 'title')), channel_id, **traverse_obj(channel_data, ('channel', 'source', {
traverse_obj(server_state_json, ('channel', 'source', 'description'))) 'title': ('title', {str}),
'description': ('description', {str}),
})))
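
The rewritten `ZenYandexIE._real_extract` above collects stream URLs into a set after dropping the optional `dzen_dash` query parameter, and ranks progressive formats with `qualities(...)`. A rough standard-library sketch of both ideas follows. The helper names `drop_query_param`, `dedupe_streams` and `rank` are hypothetical, and the example URLs are made up; the diff uses yt-dlp's `update_url_query` and `qualities` instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return `url` with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query.pop(name, None)
    # doseq=True writes list values back as repeated key=value pairs
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def dedupe_streams(urls, ignored_param='dzen_dash'):
    """Collapse URLs that differ only in `ignored_param` into one entry."""
    return {drop_query_param(u, ignored_param) for u in urls}


def rank(preference, value):
    """Index of `value` in `preference` (higher is better), or -1 if unknown."""
    try:
        return preference.index(value)
    except ValueError:
        return -1
```

Note that this sketch re-encodes the remaining query string, so percent-encoding of other parameters may be normalised along the way.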

View File

@ -227,7 +227,7 @@ def extract_tag_box(regex, title):
return result return result
class YouPornListBase(InfoExtractor): class YouPornListBaseIE(InfoExtractor):
def _get_next_url(self, url, pl_id, html): def _get_next_url(self, url, pl_id, html):
return urljoin(url, self._search_regex( return urljoin(url, self._search_regex(
r'''<a [^>]*?\bhref\s*=\s*("|')(?P<url>(?:(?!\1)[^>])+)\1''', r'''<a [^>]*?\bhref\s*=\s*("|')(?P<url>(?:(?!\1)[^>])+)\1''',
@ -284,7 +284,7 @@ def _real_extract(self, url, html=None):
playlist_id=pl_id, playlist_title=title) playlist_id=pl_id, playlist_title=title)
class YouPornCategoryIE(YouPornListBase): class YouPornCategoryIE(YouPornListBaseIE):
IE_DESC = 'YouPorn category, with sorting, filtering and pagination' IE_DESC = 'YouPorn category, with sorting, filtering and pagination'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?youporn\.com/ https?://(?:www\.)?youporn\.com/
@@ -319,7 +319,7 @@ class YouPornCategoryIE(YouPornListBase):
     }]


-class YouPornChannelIE(YouPornListBase):
+class YouPornChannelIE(YouPornListBaseIE):
     IE_DESC = 'YouPorn channel, with sorting and pagination'
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?youporn\.com/
@@ -349,7 +349,7 @@ def _get_title_from_slug(title_slug):
         return re.sub(r'_', ' ', title_slug).title()


-class YouPornCollectionIE(YouPornListBase):
+class YouPornCollectionIE(YouPornListBaseIE):
     IE_DESC = 'YouPorn collection (user playlist), with sorting and pagination'
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?youporn\.com/
@@ -394,7 +394,7 @@ def _real_extract(self, url):
         return playlist


-class YouPornTagIE(YouPornListBase):
+class YouPornTagIE(YouPornListBaseIE):
     IE_DESC = 'YouPorn tag (porntags), with sorting, filtering and pagination'
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?youporn\.com/
@@ -442,7 +442,7 @@ def _real_extract(self, url):
         return super()._real_extract(url)


-class YouPornStarIE(YouPornListBase):
+class YouPornStarIE(YouPornListBaseIE):
     IE_DESC = 'YouPorn Pornstar, with description, sorting and pagination'
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?youporn\.com/
@@ -493,7 +493,7 @@ def _real_extract(self, url):
         }


-class YouPornVideosIE(YouPornListBase):
+class YouPornVideosIE(YouPornListBaseIE):
     IE_DESC = 'YouPorn video (browse) playlists, with sorting, filtering and pagination'
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?youporn\.com/


@@ -0,0 +1,50 @@
# flake8: noqa: F401
from ._base import YoutubeBaseInfoExtractor
from ._clip import YoutubeClipIE
from ._mistakes import YoutubeTruncatedIDIE, YoutubeTruncatedURLIE
from ._notifications import YoutubeNotificationsIE
from ._redirect import (
    YoutubeConsentRedirectIE,
    YoutubeFavouritesIE,
    YoutubeFeedsInfoExtractor,
    YoutubeHistoryIE,
    YoutubeLivestreamEmbedIE,
    YoutubeRecommendedIE,
    YoutubeShortsAudioPivotIE,
    YoutubeSubscriptionsIE,
    YoutubeWatchLaterIE,
    YoutubeYtBeIE,
    YoutubeYtUserIE,
)
from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE

# Hack to allow plugin overrides work
for _cls in [
    YoutubeBaseInfoExtractor,
    YoutubeClipIE,
    YoutubeTruncatedIDIE,
    YoutubeTruncatedURLIE,
    YoutubeNotificationsIE,
    YoutubeConsentRedirectIE,
    YoutubeFavouritesIE,
    YoutubeFeedsInfoExtractor,
    YoutubeHistoryIE,
    YoutubeLivestreamEmbedIE,
    YoutubeRecommendedIE,
    YoutubeShortsAudioPivotIE,
    YoutubeSubscriptionsIE,
    YoutubeWatchLaterIE,
    YoutubeYtBeIE,
    YoutubeYtUserIE,
    YoutubeMusicSearchURLIE,
    YoutubeSearchDateIE,
    YoutubeSearchIE,
    YoutubeSearchURLIE,
    YoutubePlaylistIE,
    YoutubeTabBaseInfoExtractor,
    YoutubeTabIE,
    YoutubeIE,
]:
    _cls.__module__ = 'yt_dlp.extractor.youtube'
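
Note on the loop above: a rough sketch of what reassigning `__module__` does. Classes defined in private submodules end up reporting the public package path instead. The class names and the `PUBLIC_MODULE` value are invented for this illustration, and the diff itself does not show how the plugin override machinery consumes the attribute, so treat this as an assumption about intent rather than a description of yt-dlp internals.

```python
# Hypothetical stand-ins for extractor classes defined in private submodules.
class FirstIE:
    pass


class SecondIE:
    pass


# Assumed public package path; not a real module in this sketch.
PUBLIC_MODULE = 'example_pkg.extractor.youtube'


def qualified_name(cls):
    """Return the dotted path that tools relying on __module__ would see."""
    return f'{cls.__module__}.{cls.__qualname__}'


# Same shape as the loop in the diff: overwrite __module__ on every class.
for _cls in [FirstIE, SecondIE]:
    _cls.__module__ = PUBLIC_MODULE

names_after = [qualified_name(c) for c in (FirstIE, SecondIE)]
print(names_after)
```

Only the reported module string changes; the classes still live wherever they were defined.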

File diff suppressed because it is too large


@@ -0,0 +1,66 @@
from ._tab import YoutubeTabBaseInfoExtractor
from ._video import YoutubeIE
from ...utils import ExtractorError, traverse_obj


class YoutubeClipIE(YoutubeTabBaseInfoExtractor):
    IE_NAME = 'youtube:clip'
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/clip/(?P<id>[^/?#]+)'
    _TESTS = [{
        # FIXME: Other metadata should be extracted from the clip, not from the base video
        'url': 'https://www.youtube.com/clip/UgytZKpehg-hEMBSn3F4AaABCQ',
        'info_dict': {
            'id': 'UgytZKpehg-hEMBSn3F4AaABCQ',
            'ext': 'mp4',
            'section_start': 29.0,
            'section_end': 39.7,
            'duration': 10.7,
            'age_limit': 0,
            'availability': 'public',
            'categories': ['Gaming'],
            'channel': 'Scott The Woz',
            'channel_id': 'UC4rqhyiTs7XyuODcECvuiiQ',
            'channel_url': 'https://www.youtube.com/channel/UC4rqhyiTs7XyuODcECvuiiQ',
            'description': 'md5:7a4517a17ea9b4bd98996399d8bb36e7',
            'like_count': int,
            'playable_in_embed': True,
            'tags': 'count:17',
            'thumbnail': 'https://i.ytimg.com/vi_webp/ScPX26pdQik/maxresdefault.webp',
            'title': 'Mobile Games on Console - Scott The Woz',
            'upload_date': '20210920',
            'uploader': 'Scott The Woz',
            'uploader_id': '@ScottTheWoz',
            'uploader_url': 'https://www.youtube.com/@ScottTheWoz',
            'view_count': int,
            'live_status': 'not_live',
            'channel_follower_count': int,
            'chapters': 'count:20',
            'comment_count': int,
            'heatmap': 'count:100',
        },
    }]

    def _real_extract(self, url):
        clip_id = self._match_id(url)
        _, data = self._extract_webpage(url, clip_id)

        video_id = traverse_obj(data, ('currentVideoEndpoint', 'watchEndpoint', 'videoId'))
        if not video_id:
            raise ExtractorError('Unable to find video ID')

        clip_data = traverse_obj(data, (
            'engagementPanels', ..., 'engagementPanelSectionListRenderer', 'content', 'clipSectionRenderer',
            'contents', ..., 'clipAttributionRenderer', 'onScrubExit', 'commandExecutorCommand', 'commands', ...,
            'openPopupAction', 'popup', 'notificationActionRenderer', 'actionButton', 'buttonRenderer', 'command',
            'commandExecutorCommand', 'commands', ..., 'loopCommand'), get_all=False)

        return {
            '_type': 'url_transparent',
            'url': f'https://www.youtube.com/watch?v={video_id}',
            'ie_key': YoutubeIE.ie_key(),
            'id': clip_id,
            'section_start': int(clip_data['startTimeMs']) / 1000,
            'section_end': int(clip_data['endTimeMs']) / 1000,
            '_format_sort_fields': (  # https protocol is prioritized for ffmpeg compatibility
                'proto:https', 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang'),
        }


@@ -0,0 +1,69 @@
from ._base import YoutubeBaseInfoExtractor
from ...utils import ExtractorError


class YoutubeTruncatedURLIE(YoutubeBaseInfoExtractor):
    IE_NAME = 'youtube:truncated_url'
    IE_DESC = False  # Do not list
    _VALID_URL = r'''(?x)
        (?:https?://)?
        (?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/
        (?:watch\?(?:
            feature=[a-z_]+|
            annotation_id=annotation_[^&]+|
            x-yt-cl=[0-9]+|
            hl=[^&]*|
            t=[0-9]+
        )?
        |
            attribution_link\?a=[^&]+
        )
        $
    '''

    _TESTS = [{
        'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
        'only_matching': True,
    }, {
        'url': 'https://www.youtube.com/watch?',
        'only_matching': True,
    }, {
        'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
        'only_matching': True,
    }, {
        'url': 'https://www.youtube.com/watch?feature=foo',
        'only_matching': True,
    }, {
        'url': 'https://www.youtube.com/watch?hl=en-GB',
        'only_matching': True,
    }, {
        'url': 'https://www.youtube.com/watch?t=2372',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        raise ExtractorError(
            'Did you forget to quote the URL? Remember that & is a meta '
            'character in most shells, so you want to put the URL in quotes, '
            'like yt-dlp '
            '"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
            ' or simply yt-dlp BaW_jenozKc .',
            expected=True)


class YoutubeTruncatedIDIE(YoutubeBaseInfoExtractor):
    IE_NAME = 'youtube:truncated_id'
    IE_DESC = False  # Do not list
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[0-9A-Za-z_-]{1,10})$'

    _TESTS = [{
        'url': 'https://www.youtube.com/watch?v=N_708QY7Ob',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        raise ExtractorError(
            f'Incomplete YouTube ID {video_id}. URL {url} looks truncated.',
            expected=True)


@@ -0,0 +1,98 @@
import itertools
import re

from ._tab import YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
from ...utils import traverse_obj


class YoutubeNotificationsIE(YoutubeTabBaseInfoExtractor):
    IE_NAME = 'youtube:notif'
    IE_DESC = 'YouTube notifications; ":ytnotif" keyword (requires cookies)'
    _VALID_URL = r':ytnotif(?:ication)?s?'
    _LOGIN_REQUIRED = True
    _TESTS = [{
        'url': ':ytnotif',
        'only_matching': True,
    }, {
        'url': ':ytnotifications',
        'only_matching': True,
    }]

    def _extract_notification_menu(self, response, continuation_list):
        notification_list = traverse_obj(
            response,
            ('actions', 0, 'openPopupAction', 'popup', 'multiPageMenuRenderer', 'sections', 0, 'multiPageMenuNotificationSectionRenderer', 'items'),
            ('actions', 0, 'appendContinuationItemsAction', 'continuationItems'),
            expected_type=list) or []
        continuation_list[0] = None
        for item in notification_list:
            entry = self._extract_notification_renderer(item.get('notificationRenderer'))
            if entry:
                yield entry
            continuation = item.get('continuationItemRenderer')
            if continuation:
                continuation_list[0] = continuation

    def _extract_notification_renderer(self, notification):
        video_id = traverse_obj(
            notification, ('navigationEndpoint', 'watchEndpoint', 'videoId'), expected_type=str)
        url = f'https://www.youtube.com/watch?v={video_id}'
        channel_id = None
        if not video_id:
            browse_ep = traverse_obj(
                notification, ('navigationEndpoint', 'browseEndpoint'), expected_type=dict)
            channel_id = self.ucid_or_none(traverse_obj(browse_ep, 'browseId', expected_type=str))
            post_id = self._search_regex(
                r'/post/(.+)', traverse_obj(browse_ep, 'canonicalBaseUrl', expected_type=str),
                'post id', default=None)
            if not channel_id or not post_id:
                return
            # The direct /post url redirects to this in the browser
            url = f'https://www.youtube.com/channel/{channel_id}/community?lb={post_id}'

        channel = traverse_obj(
            notification, ('contextualMenu', 'menuRenderer', 'items', 1, 'menuServiceItemRenderer', 'text', 'runs', 1, 'text'),
            expected_type=str)
        notification_title = self._get_text(notification, 'shortMessage')
        if notification_title:
            notification_title = notification_title.replace('\xad', '')  # remove soft hyphens
        # TODO: handle recommended videos
        title = self._search_regex(
            rf'{re.escape(channel or "")}[^:]+: (.+)', notification_title,
            'video title', default=None)
        timestamp = (self._parse_time_text(self._get_text(notification, 'sentTimeText'))
                     if self._configuration_arg('approximate_date', ie_key=YoutubeTabIE)
                     else None)
        return {
            '_type': 'url',
            'url': url,
            'ie_key': (YoutubeIE if video_id else YoutubeTabIE).ie_key(),
            'video_id': video_id,
            'title': title,
            'channel_id': channel_id,
            'channel': channel,
            'uploader': channel,
            'thumbnails': self._extract_thumbnails(notification, 'videoThumbnail'),
            'timestamp': timestamp,
        }

    def _notification_menu_entries(self, ytcfg):
        continuation_list = [None]
        response = None
        for page in itertools.count(1):
            ctoken = traverse_obj(
                continuation_list, (0, 'continuationEndpoint', 'getNotificationMenuEndpoint', 'ctoken'), expected_type=str)
            response = self._extract_response(
                item_id=f'page {page}', query={'ctoken': ctoken} if ctoken else {}, ytcfg=ytcfg,
                ep='notification/get_notification_menu', check_get_keys='actions',
                headers=self.generate_api_headers(ytcfg=ytcfg, visitor_data=self._extract_visitor_data(response)))
            yield from self._extract_notification_menu(response, continuation_list)
            if not continuation_list[0]:
                break

    def _real_extract(self, url):
        display_id = 'notifications'
        ytcfg = self._download_ytcfg('web', display_id) if not self.skip_webpage else {}
        self._report_playlist_authcheck(ytcfg)
        return self.playlist_result(self._notification_menu_entries(ytcfg), display_id, display_id)
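
Note on the pagination in `_notification_menu_entries` above: the page loop and the per-page generator share state through a one-element list, which the generator overwrites with the next continuation. The following is a minimal standalone sketch of that pattern. `FAKE_PAGES`, `extract_items` and `iter_entries` are hypothetical names, and the dictionary stands in for the real API call.

```python
import itertools

# Hypothetical paged backend: continuation token -> (items, next token).
FAKE_PAGES = {
    None: (['a', 'b'], 'tok1'),
    'tok1': (['c'], 'tok2'),
    'tok2': (['d', 'e'], None),
}


def extract_items(response, continuation_box):
    """Yield a page's items and record the next token in the shared box."""
    items, next_token = response
    continuation_box[0] = None  # reset first, as the extractor does
    yield from items
    if next_token:
        continuation_box[0] = next_token


def iter_entries(fetch=FAKE_PAGES.get):
    continuation_box = [None]  # mutable cell shared with extract_items
    for _page in itertools.count(1):
        response = fetch(continuation_box[0])
        yield from extract_items(response, continuation_box)
        if not continuation_box[0]:
            break  # no continuation recorded: last page reached


entries = list(iter_entries())
print(entries)
```

Because `yield from` exhausts the inner generator before the check runs, the box always reflects the page that was just consumed.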


@@ -0,0 +1,247 @@
import base64
import urllib.parse

from ._base import YoutubeBaseInfoExtractor
from ._tab import YoutubeTabIE
from ...utils import ExtractorError, classproperty, parse_qs, update_url_query, url_or_none


class YoutubeYtBeIE(YoutubeBaseInfoExtractor):
    IE_DESC = 'youtu.be'
    _VALID_URL = rf'https?://youtu\.be/(?P<id>[0-9A-Za-z_-]{{11}})/*?.*?\blist=(?P<playlist_id>{YoutubeBaseInfoExtractor._PLAYLIST_ID_RE})'
    _TESTS = [{
        'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
        'info_dict': {
            'id': 'yeWKywCrFtk',
            'ext': 'mp4',
            'title': 'Small Scale Baler and Braiding Rugs',
            'uploader': 'Backus-Page House Museum',
            'uploader_id': '@backuspagemuseum',
            'uploader_url': r're:https?://(?:www\.)?youtube\.com/@backuspagemuseum',
            'upload_date': '20161008',
            'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
            'categories': ['Nonprofits & Activism'],
            'tags': list,
            'like_count': int,
            'age_limit': 0,
            'playable_in_embed': True,
            'thumbnail': r're:^https?://.*\.webp',
            'channel': 'Backus-Page House Museum',
            'channel_id': 'UCEfMCQ9bs3tjvjy1s451zaw',
            'live_status': 'not_live',
            'view_count': int,
            'channel_url': 'https://www.youtube.com/channel/UCEfMCQ9bs3tjvjy1s451zaw',
            'availability': 'public',
            'duration': 59,
            'comment_count': int,
            'channel_follower_count': int,
        },
        'params': {
            'noplaylist': True,
            'skip_download': True,
        },
    }, {
        'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        mobj = self._match_valid_url(url)
        video_id = mobj.group('id')
        playlist_id = mobj.group('playlist_id')
        return self.url_result(
            update_url_query('https://www.youtube.com/watch', {
                'v': video_id,
                'list': playlist_id,
                'feature': 'youtu.be',
            }), ie=YoutubeTabIE.ie_key(), video_id=playlist_id)


class YoutubeLivestreamEmbedIE(YoutubeBaseInfoExtractor):
    IE_DESC = 'YouTube livestream embeds'
    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/embed/live_stream/?\?(?:[^#]+&)?channel=(?P<id>[^&#]+)'
    _TESTS = [{
        'url': 'https://www.youtube.com/embed/live_stream?channel=UC2_KI6RB__jGdlnK6dvFEZA',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        channel_id = self._match_id(url)
        return self.url_result(
            f'https://www.youtube.com/channel/{channel_id}/live',
            ie=YoutubeTabIE.ie_key(), video_id=channel_id)


class YoutubeYtUserIE(YoutubeBaseInfoExtractor):
    IE_DESC = 'YouTube user videos; "ytuser:" prefix'
    IE_NAME = 'youtube:user'
    _VALID_URL = r'ytuser:(?P<id>.+)'
    _TESTS = [{
        'url': 'ytuser:phihag',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        user_id = self._match_id(url)
        return self.url_result(f'https://www.youtube.com/user/{user_id}', YoutubeTabIE, user_id)


class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
    IE_NAME = 'youtube:favorites'
    IE_DESC = 'YouTube liked videos; ":ytfav" keyword (requires cookies)'
    _VALID_URL = r':ytfav(?:ou?rite)?s?'
    _LOGIN_REQUIRED = True
    _TESTS = [{
        'url': ':ytfav',
        'only_matching': True,
    }, {
        'url': ':ytfavorites',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        return self.url_result(
            'https://www.youtube.com/playlist?list=LL',
            ie=YoutubeTabIE.ie_key())


class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
    """
    Base class for feed extractors
    Subclasses must re-define the _FEED_NAME property.
    """
    _LOGIN_REQUIRED = True
    _FEED_NAME = 'feeds'

    @classproperty
    def IE_NAME(cls):
        return f'youtube:{cls._FEED_NAME}'

    def _real_extract(self, url):
        return self.url_result(
            f'https://www.youtube.com/feed/{self._FEED_NAME}', ie=YoutubeTabIE.ie_key())


class YoutubeWatchLaterIE(YoutubeBaseInfoExtractor):
    IE_NAME = 'youtube:watchlater'
    IE_DESC = 'Youtube watch later list; ":ytwatchlater" keyword (requires cookies)'
    _VALID_URL = r':ytwatchlater'
    _TESTS = [{
        'url': ':ytwatchlater',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        return self.url_result(
            'https://www.youtube.com/playlist?list=WL', ie=YoutubeTabIE.ie_key())


class YoutubeRecommendedIE(YoutubeFeedsInfoExtractor):
    IE_DESC = 'YouTube recommended videos; ":ytrec" keyword'
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/?(?:[?#]|$)|:ytrec(?:ommended)?'
    _FEED_NAME = 'recommended'
    _LOGIN_REQUIRED = False
    _TESTS = [{
        'url': ':ytrec',
        'only_matching': True,
    }, {
        'url': ':ytrecommended',
        'only_matching': True,
    }, {
        'url': 'https://youtube.com',
        'only_matching': True,
    }]


class YoutubeSubscriptionsIE(YoutubeFeedsInfoExtractor):
    IE_DESC = 'YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)'
    _VALID_URL = r':ytsub(?:scription)?s?'
    _FEED_NAME = 'subscriptions'
    _TESTS = [{
        'url': ':ytsubs',
        'only_matching': True,
    }, {
        'url': ':ytsubscriptions',
        'only_matching': True,
    }]


class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
    IE_DESC = 'Youtube watch history; ":ythis" keyword (requires cookies)'
    _VALID_URL = r':ythis(?:tory)?'
    _FEED_NAME = 'history'
    _TESTS = [{
        'url': ':ythistory',
        'only_matching': True,
    }]


class YoutubeShortsAudioPivotIE(YoutubeBaseInfoExtractor):
    IE_DESC = 'YouTube Shorts audio pivot (Shorts using audio of a given video)'
    IE_NAME = 'youtube:shorts:pivot:audio'
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/source/(?P<id>[\w-]{11})/shorts'
    _TESTS = [{
        'url': 'https://www.youtube.com/source/Lyj-MZSAA9o/shorts',
        'only_matching': True,
    }]

    @staticmethod
    def _generate_audio_pivot_params(video_id):
        """
        Generates sfv_audio_pivot browse params for this video id
        """
        pb_params = b'\xf2\x05+\n)\x12\'\n\x0b%b\x12\x0b%b\x1a\x0b%b' % ((video_id.encode(),) * 3)
        return urllib.parse.quote(base64.b64encode(pb_params).decode())

    def _real_extract(self, url):
        video_id = self._match_id(url)
        return self.url_result(
            f'https://www.youtube.com/feed/sfv_audio_pivot?bp={self._generate_audio_pivot_params(video_id)}',
            ie=YoutubeTabIE)


class YoutubeConsentRedirectIE(YoutubeBaseInfoExtractor):
    IE_NAME = 'youtube:consent'
    IE_DESC = False  # Do not list
    _VALID_URL = r'https?://consent\.youtube\.com/m\?'
    _TESTS = [{
        'url': 'https://consent.youtube.com/m?continue=https%3A%2F%2Fwww.youtube.com%2Flive%2FqVv6vCqciTM%3Fcbrd%3D1&gl=NL&m=0&pc=yt&hl=en&src=1',
        'info_dict': {
            'id': 'qVv6vCqciTM',
            'ext': 'mp4',
            'age_limit': 0,
            'uploader_id': '@sana_natori',
            'comment_count': int,
            'chapters': 'count:13',
            'upload_date': '20221223',
            'thumbnail': 'https://i.ytimg.com/vi/qVv6vCqciTM/maxresdefault.jpg',
            'channel_url': 'https://www.youtube.com/channel/UCIdEIHpS0TdkqRkHL5OkLtA',
            'uploader_url': 'https://www.youtube.com/@sana_natori',
            'like_count': int,
            'release_date': '20221223',
'tags': ['Vtuber', '月ノ美兎', '名取さな', 'にじさんじ', 'クリスマス', '3D配信'],
'title': '【 #インターネット女クリスマス 】3Dで歌ってはしゃぐインターネットの女たち【月美兎/名取さな】',
            'view_count': int,
            'playable_in_embed': True,
            'duration': 4438,
            'availability': 'public',
            'channel_follower_count': int,
            'channel_id': 'UCIdEIHpS0TdkqRkHL5OkLtA',
            'categories': ['Entertainment'],
            'live_status': 'was_live',
            'release_timestamp': 1671793345,
            'channel': 'さなちゃんねる',
            'description': 'md5:6aebf95cc4a1d731aebc01ad6cc9806d',
            'uploader': 'さなちゃんねる',
            'channel_is_verified': True,
            'heatmap': 'count:100',
        },
        'add_ie': ['Youtube'],
        'params': {'skip_download': 'Youtube'},
    }]

    def _real_extract(self, url):
        redirect_url = url_or_none(parse_qs(url).get('continue', [None])[-1])
        if not redirect_url:
            raise ExtractorError('Invalid cookie consent redirect URL', expected=True)
        return self.url_result(redirect_url)
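
Note on `_generate_audio_pivot_params` in `YoutubeShortsAudioPivotIE` above: the helper splices the video ID three times into a fixed byte template using bytes `%b` formatting, then base64-encodes and URL-quotes the result. The sketch below copies that expression into a free function and decodes the output again to inspect it. The template looks like a hand-assembled length-prefixed message (each `\x0b` is 11, the length of a video ID), but that reading is an inference and is not stated in the diff.

```python
import base64
import urllib.parse


def generate_audio_pivot_params(video_id):
    """Standalone copy of the expression used in the diff."""
    pb_params = b'\xf2\x05+\n)\x12\'\n\x0b%b\x12\x0b%b\x1a\x0b%b' % ((video_id.encode(),) * 3)
    return urllib.parse.quote(base64.b64encode(pb_params).decode())


video_id = 'Lyj-MZSAA9o'  # ID taken from the extractor's test URL
bp = generate_audio_pivot_params(video_id)

# Reverse the quoting and base64 steps to look at the raw bytes.
raw = base64.b64decode(urllib.parse.unquote(bp))
print(len(raw), raw.count(video_id.encode()))
```

Since the length bytes are hard-coded, the template only makes sense for 11-character IDs, which is what `_VALID_URL` enforces with `[\w-]{11}`.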


@@ -0,0 +1,167 @@
import urllib.parse

from ._tab import YoutubeTabBaseInfoExtractor
from ..common import SearchInfoExtractor
from ...utils import join_nonempty, parse_qs


class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
    IE_DESC = 'YouTube search'
    IE_NAME = 'youtube:search'
    _SEARCH_KEY = 'ytsearch'
    _SEARCH_PARAMS = 'EgIQAfABAQ=='  # Videos only
    _TESTS = [{
        'url': 'ytsearch5:youtube-dl test video',
        'playlist_count': 5,
        'info_dict': {
            'id': 'youtube-dl test video',
            'title': 'youtube-dl test video',
        },
    }, {
        'note': 'Suicide/self-harm search warning',
        'url': 'ytsearch1:i hate myself and i wanna die',
        'playlist_count': 1,
        'info_dict': {
            'id': 'i hate myself and i wanna die',
            'title': 'i hate myself and i wanna die',
        },
    }]


class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
    IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
    _SEARCH_KEY = 'ytsearchdate'
    IE_DESC = 'YouTube search, newest videos first'
    _SEARCH_PARAMS = 'CAISAhAB8AEB'  # Videos only, sorted by date
    _TESTS = [{
        'url': 'ytsearchdate5:youtube-dl test video',
        'playlist_count': 5,
        'info_dict': {
            'id': 'youtube-dl test video',
            'title': 'youtube-dl test video',
        },
    }]


class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
    IE_DESC = 'YouTube search URLs with sorting and filter support'
    IE_NAME = YoutubeSearchIE.IE_NAME + '_url'
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/(?:results|search)\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
    _TESTS = [{
        'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
        'playlist_mincount': 5,
        'info_dict': {
            'id': 'youtube-dl test video',
            'title': 'youtube-dl test video',
        },
    }, {
        'url': 'https://www.youtube.com/results?search_query=python&sp=EgIQAg%253D%253D',
        'playlist_mincount': 5,
        'info_dict': {
            'id': 'python',
            'title': 'python',
        },
    }, {
        'url': 'https://www.youtube.com/results?search_query=%23cats',
        'playlist_mincount': 1,
        'info_dict': {
            'id': '#cats',
            'title': '#cats',
            # The test suite does not have support for nested playlists
            # 'entries': [{
            #     'url': r're:https://(www\.)?youtube\.com/hashtag/cats',
            #     'title': '#cats',
            # }],
        },
    }, {
        # Channel results
        'url': 'https://www.youtube.com/results?search_query=kurzgesagt&sp=EgIQAg%253D%253D',
        'info_dict': {
            'id': 'kurzgesagt',
            'title': 'kurzgesagt',
        },
        'playlist': [{
            'info_dict': {
                '_type': 'url',
                'id': 'UCsXVk37bltHxD1rDPwtNM8Q',
                'url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
                'ie_key': 'YoutubeTab',
                'channel': 'Kurzgesagt In a Nutshell',
                'description': 'md5:4ae48dfa9505ffc307dad26342d06bfc',
                'title': 'Kurzgesagt In a Nutshell',
                'channel_id': 'UCsXVk37bltHxD1rDPwtNM8Q',
                # No longer available for search as it is set to the handle.
                # 'playlist_count': int,
                'channel_url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
                'thumbnails': list,
                'uploader_id': '@kurzgesagt',
                'uploader_url': 'https://www.youtube.com/@kurzgesagt',
                'uploader': 'Kurzgesagt In a Nutshell',
                'channel_is_verified': True,
                'channel_follower_count': int,
            },
        }],
        'params': {'extract_flat': True, 'playlist_items': '1'},
        'playlist_mincount': 1,
    }, {
        'url': 'https://www.youtube.com/results?q=test&sp=EgQIBBgB',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        qs = parse_qs(url)
        query = (qs.get('search_query') or qs.get('q'))[0]
        return self.playlist_result(self._search_results(query, qs.get('sp', (None,))[0]), query, query)


class YoutubeMusicSearchURLIE(YoutubeTabBaseInfoExtractor):
    IE_DESC = 'YouTube music search URLs with selectable sections, e.g. #songs'
    IE_NAME = 'youtube:music:search_url'
    _VALID_URL = r'https?://music\.youtube\.com/search\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
    _TESTS = [{
        'url': 'https://music.youtube.com/search?q=royalty+free+music',
        'playlist_count': 16,
        'info_dict': {
            'id': 'royalty free music',
            'title': 'royalty free music',
        },
    }, {
        'url': 'https://music.youtube.com/search?q=royalty+free+music&sp=EgWKAQIIAWoKEAoQAxAEEAkQBQ%3D%3D',
        'playlist_mincount': 30,
        'info_dict': {
            'id': 'royalty free music - songs',
            'title': 'royalty free music - songs',
        },
        'params': {'extract_flat': 'in_playlist'},
    }, {
        'url': 'https://music.youtube.com/search?q=royalty+free+music#community+playlists',
        'playlist_mincount': 30,
        'info_dict': {
            'id': 'royalty free music - community playlists',
            'title': 'royalty free music - community playlists',
        },
        'params': {'extract_flat': 'in_playlist'},
    }]

    _SECTIONS = {
        'albums': 'EgWKAQIYAWoKEAoQAxAEEAkQBQ==',
        'artists': 'EgWKAQIgAWoKEAoQAxAEEAkQBQ==',
        'community playlists': 'EgeKAQQoAEABagoQChADEAQQCRAF',
        'featured playlists': 'EgeKAQQoADgBagwQAxAJEAQQDhAKEAU==',
        'songs': 'EgWKAQIIAWoKEAoQAxAEEAkQBQ==',
        'videos': 'EgWKAQIQAWoKEAoQAxAEEAkQBQ==',
    }

    def _real_extract(self, url):
        qs = parse_qs(url)
        query = (qs.get('search_query') or qs.get('q'))[0]
        params = qs.get('sp', (None,))[0]
        if params:
            section = next((k for k, v in self._SECTIONS.items() if v == params), params)
        else:
            section = urllib.parse.unquote_plus(([*url.split('#'), ''])[1]).lower()
            params = self._SECTIONS.get(section)
            if not params:
                section = None
        title = join_nonempty(query, section, delim=' - ')
        return self.playlist_result(self._search_results(query, params, default_client='web_music'), title, title)
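
Note on `YoutubeMusicSearchURLIE._real_extract` above: the section can come from an explicit `sp` parameter or from the URL fragment. The sketch below re-creates that resolution with the standard library only. `resolve_section` is a hypothetical helper, `urllib.parse.parse_qs` stands in for the yt-dlp utility of the same name, and only two of the `_SECTIONS` entries are copied over.

```python
import urllib.parse

# Subset of the _SECTIONS table from the diff.
SECTIONS = {
    'community playlists': 'EgeKAQQoAEABagoQChADEAQQCRAF',
    'songs': 'EgWKAQIIAWoKEAoQAxAEEAkQBQ==',
}


def resolve_section(url):
    """Return (title, params) following the branch structure in the diff."""
    qs = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    query = (qs.get('search_query') or qs.get('q'))[0]
    params = qs.get('sp', (None,))[0]
    if params:
        # Known sp value -> section name; unknown values are used verbatim
        section = next((k for k, v in SECTIONS.items() if v == params), params)
    else:
        # No sp: fall back to the fragment, e.g. "#community+playlists"
        section = urllib.parse.unquote_plus(([*url.split('#'), ''])[1]).lower()
        params = SECTIONS.get(section)
        if not params:
            section = None
    title = ' - '.join(filter(None, (query, section)))
    return title, params


base = 'https://music.youtube.com/search?q=royalty+free+music'
by_fragment = resolve_section(base + '#community+playlists')
by_sp = resolve_section(base + '&sp=EgWKAQIIAWoKEAoQAxAEEAkQBQ%3D%3D')
plain = resolve_section(base)
print(by_fragment, by_sp, plain)
```

The `[*url.split('#'), '']` idiom guarantees index 1 exists even when the URL has no fragment.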

File diff suppressed because it is too large


@@ -188,6 +188,7 @@ def js_number_to_string(val: float, radix: int = 10):
 _NAME_RE = r'[a-zA-Z_$][\w$]*'
 _MATCHING_PARENS = dict(zip(*zip('()', '{}', '[]')))
 _QUOTES = '\'"/'
+_NESTED_BRACKETS = r'[^[\]]+(?:\[[^[\]]+(?:\[[^\]]+\])?\])?'


 class JS_Undefined:
@@ -301,7 +302,7 @@ def _separate(expr, delim=',', max_split=None):
         OP_CHARS = '+-*/%&|^=<>!,;{}:['
         if not expr:
             return
-        counters = {k: 0 for k in _MATCHING_PARENS.values()}
+        counters = dict.fromkeys(_MATCHING_PARENS.values(), 0)
         start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
         in_quote, escaping, after_op, in_regex_char_group = None, False, True, False
         for idx, char in enumerate(expr):
@@ -606,15 +607,18 @@ def dict_item(key, val):

         m = re.match(fr'''(?x)
             (?P<assign>
-                (?P<out>{_NAME_RE})(?:\[(?P<index>[^\]]+?)\])?\s*
+                (?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*
                 (?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
                 =(?!=)(?P<expr>.*)$
             )|(?P<return>
                 (?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
+            )|(?P<attribute>
+                (?P<var>{_NAME_RE})(?:
+                    (?P<nullish>\?)?\.(?P<member>[^(]+)|
+                    \[(?P<member2>{_NESTED_BRACKETS})\]
+                )\s*
             )|(?P<indexing>
                 (?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
-            )|(?P<attribute>
-                (?P<var>{_NAME_RE})(?:(?P<nullish>\?)?\.(?P<member>[^(]+)|\[(?P<member2>[^\]]+)\])\s*
             )|(?P<function>
                 (?P<fname>{_NAME_RE})\((?P<args>.*)\)$
             )''', expr)
@@ -707,7 +711,7 @@ def eval_method():
                 if obj is NO_DEFAULT:
                     if variable not in self._objects:
                         try:
-                            self._objects[variable] = self.extract_object(variable)
+                            self._objects[variable] = self.extract_object(variable, local_vars)
                         except self.Exception:
                             if not nullish:
                                 raise
@@ -847,7 +851,7 @@ def interpret_expression(self, expr, local_vars, allow_recursion):
             raise self.Exception('Cannot return from an expression', expr)
         return ret

-    def extract_object(self, objname):
+    def extract_object(self, objname, *global_stack):
         _FUNC_NAME_RE = r'''(?:[a-zA-Z$0-9]+|"[a-zA-Z$0-9]+"|'[a-zA-Z$0-9]+')'''
         obj = {}
         obj_m = re.search(
@@ -869,7 +873,8 @@ def extract_object(self, objname):
         for f in fields_m:
             argnames = f.group('args').split(',')
             name = remove_quotes(f.group('key'))
-            obj[name] = function_with_repr(self.build_function(argnames, f.group('code')), f'F<{name}>')
+            obj[name] = function_with_repr(
+                self.build_function(argnames, f.group('code'), *global_stack), f'F<{name}>')

         return obj

@@ -890,9 +895,9 @@ def extract_function_code(self, funcname):
         code, _ = self._separate_at_paren(func_m.group('code'))
         return [x.strip() for x in func_m.group('args').split(',')], code

-    def extract_function(self, funcname):
+    def extract_function(self, funcname, *global_stack):
         return function_with_repr(
-            self.extract_function_from_code(*self.extract_function_code(funcname)),
+            self.extract_function_from_code(*self.extract_function_code(funcname), *global_stack),
             f'F<{funcname}>')

     def extract_function_from_code(self, argnames, code, *global_stack):
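
Note on `_NESTED_BRACKETS` in the hunks above: the new pattern lets the index of an assignment target contain its own bracketed sub-expression, which the earlier `[^\]]+?` could not match. The sketch below compares the two on a simplified assignment pattern. `OLD_ASSIGN` and `NEW_ASSIGN` are cut-down illustrations that omit the operator group and the other alternatives, so they are not the interpreter's full regex.

```python
import re

_NAME_RE = r'[a-zA-Z_$][\w$]*'
_NESTED_BRACKETS = r'[^[\]]+(?:\[[^[\]]+(?:\[[^\]]+\])?\])?'

# Simplified "assign" alternatives: name, optional [index], '=', expression.
OLD_ASSIGN = re.compile(
    rf'(?P<out>{_NAME_RE})(?:\[(?P<index>[^\]]+?)\])?\s*=(?!=)(?P<expr>.*)$')
NEW_ASSIGN = re.compile(
    rf'(?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*=(?!=)(?P<expr>.*)$')

statement = 'x[y[0]]=1'

old_match = OLD_ASSIGN.match(statement)   # index stops at the first ']'
new_match = NEW_ASSIGN.match(statement)   # index may contain 'y[0]'

print(old_match)
print(new_match.group('out', 'index', 'expr'))
```

The pattern is a fixed, finite expansion rather than true recursion, so it only covers a limited nesting depth.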


@ -1,6 +1,7 @@
from __future__ import annotations from __future__ import annotations
import io import io
import itertools
import math import math
import re import re
import urllib.parse import urllib.parse
@ -31,9 +32,9 @@
curl_cffi_version = tuple(map(int, re.split(r'[^\d]+', curl_cffi.__version__)[:3])) curl_cffi_version = tuple(map(int, re.split(r'[^\d]+', curl_cffi.__version__)[:3]))
if curl_cffi_version != (0, 5, 10) and not ((0, 7, 0) <= curl_cffi_version < (0, 7, 2)): if curl_cffi_version != (0, 5, 10) and not (0, 10) <= curl_cffi_version:
curl_cffi._yt_dlp__version = f'{curl_cffi.__version__} (unsupported)' curl_cffi._yt_dlp__version = f'{curl_cffi.__version__} (unsupported)'
raise ImportError('Only curl_cffi versions 0.5.10, 0.7.0 and 0.7.1 are supported') raise ImportError('Only curl_cffi versions 0.5.10 and 0.10.x are supported')
import curl_cffi.requests import curl_cffi.requests
from curl_cffi.const import CurlECode, CurlOpt from curl_cffi.const import CurlECode, CurlOpt
@ -97,7 +98,7 @@ def read(self, amt=None):
return self.fp.read(amt) return self.fp.read(amt)
except curl_cffi.requests.errors.RequestsError as e: except curl_cffi.requests.errors.RequestsError as e:
if e.code == CurlECode.PARTIAL_FILE: if e.code == CurlECode.PARTIAL_FILE:
content_length = int_or_none(e.response.headers.get('Content-Length')) content_length = e.response and int_or_none(e.response.headers.get('Content-Length'))
raise IncompleteRead( raise IncompleteRead(
partial=self.fp.bytes_read, partial=self.fp.bytes_read,
expected=content_length - self.fp.bytes_read if content_length is not None else None, expected=content_length - self.fp.bytes_read if content_length is not None else None,
@ -105,6 +106,51 @@ def read(self, amt=None):
raise TransportError(cause=e) from e raise TransportError(cause=e) from e
# See: https://github.com/lexiforest/curl_cffi?tab=readme-ov-file#supported-impersonate-browsers
# https://github.com/lexiforest/curl-impersonate?tab=readme-ov-file#supported-browsers
BROWSER_TARGETS: dict[tuple[int, ...], dict[str, ImpersonateTarget]] = {
(0, 5): {
'chrome99': ImpersonateTarget('chrome', '99', 'windows', '10'),
'chrome99_android': ImpersonateTarget('chrome', '99', 'android', '12'),
'chrome100': ImpersonateTarget('chrome', '100', 'windows', '10'),
'chrome101': ImpersonateTarget('chrome', '101', 'windows', '10'),
'chrome104': ImpersonateTarget('chrome', '104', 'windows', '10'),
'chrome107': ImpersonateTarget('chrome', '107', 'windows', '10'),
'chrome110': ImpersonateTarget('chrome', '110', 'windows', '10'),
'edge99': ImpersonateTarget('edge', '99', 'windows', '10'),
'edge101': ImpersonateTarget('edge', '101', 'windows', '10'),
'safari15_3': ImpersonateTarget('safari', '15.3', 'macos', '11'),
'safari15_5': ImpersonateTarget('safari', '15.5', 'macos', '12'),
},
(0, 7): {
'chrome116': ImpersonateTarget('chrome', '116', 'windows', '10'),
'chrome119': ImpersonateTarget('chrome', '119', 'macos', '14'),
'chrome120': ImpersonateTarget('chrome', '120', 'macos', '14'),
'chrome123': ImpersonateTarget('chrome', '123', 'macos', '14'),
'chrome124': ImpersonateTarget('chrome', '124', 'macos', '14'),
'safari17_0': ImpersonateTarget('safari', '17.0', 'macos', '14'),
'safari17_2_ios': ImpersonateTarget('safari', '17.2', 'ios', '17.2'),
},
(0, 9): {
'safari15_3': ImpersonateTarget('safari', '15.3', 'macos', '14'),
'safari15_5': ImpersonateTarget('safari', '15.5', 'macos', '14'),
'chrome119': ImpersonateTarget('chrome', '119', 'macos', '14'),
'chrome120': ImpersonateTarget('chrome', '120', 'macos', '14'),
'chrome123': ImpersonateTarget('chrome', '123', 'macos', '14'),
'chrome124': ImpersonateTarget('chrome', '124', 'macos', '14'),
'chrome131': ImpersonateTarget('chrome', '131', 'macos', '14'),
'chrome131_android': ImpersonateTarget('chrome', '131', 'android', '14'),
'chrome133a': ImpersonateTarget('chrome', '133', 'macos', '15'),
'firefox133': ImpersonateTarget('firefox', '133', 'macos', '14'),
'safari18_0': ImpersonateTarget('safari', '18.0', 'macos', '15'),
'safari18_0_ios': ImpersonateTarget('safari', '18.0', 'ios', '18.0'),
},
(0, 10): {
'firefox135': ImpersonateTarget('firefox', '135', 'macos', '14'),
},
}
@register_rh @register_rh
class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin): class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
RH_NAME = 'curl_cffi' RH_NAME = 'curl_cffi'
@@ -112,30 +158,21 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
     _SUPPORTED_FEATURES = (Features.NO_PROXY, Features.ALL_PROXY)
     _SUPPORTED_PROXY_SCHEMES = ('http', 'https', 'socks4', 'socks4a', 'socks5', 'socks5h')
     _SUPPORTED_IMPERSONATE_TARGET_MAP = {
-        **({
-            ImpersonateTarget('chrome', '124', 'macos', '14'): curl_cffi.requests.BrowserType.chrome124,
-            ImpersonateTarget('chrome', '123', 'macos', '14'): curl_cffi.requests.BrowserType.chrome123,
-            ImpersonateTarget('chrome', '120', 'macos', '14'): curl_cffi.requests.BrowserType.chrome120,
-            ImpersonateTarget('chrome', '119', 'macos', '14'): curl_cffi.requests.BrowserType.chrome119,
-            ImpersonateTarget('chrome', '116', 'windows', '10'): curl_cffi.requests.BrowserType.chrome116,
-        } if curl_cffi_version >= (0, 7, 0) else {}),
-        ImpersonateTarget('chrome', '110', 'windows', '10'): curl_cffi.requests.BrowserType.chrome110,
-        ImpersonateTarget('chrome', '107', 'windows', '10'): curl_cffi.requests.BrowserType.chrome107,
-        ImpersonateTarget('chrome', '104', 'windows', '10'): curl_cffi.requests.BrowserType.chrome104,
-        ImpersonateTarget('chrome', '101', 'windows', '10'): curl_cffi.requests.BrowserType.chrome101,
-        ImpersonateTarget('chrome', '100', 'windows', '10'): curl_cffi.requests.BrowserType.chrome100,
-        ImpersonateTarget('chrome', '99', 'windows', '10'): curl_cffi.requests.BrowserType.chrome99,
-        ImpersonateTarget('edge', '101', 'windows', '10'): curl_cffi.requests.BrowserType.edge101,
-        ImpersonateTarget('edge', '99', 'windows', '10'): curl_cffi.requests.BrowserType.edge99,
-        **({
-            ImpersonateTarget('safari', '17.0', 'macos', '14'): curl_cffi.requests.BrowserType.safari17_0,
-        } if curl_cffi_version >= (0, 7, 0) else {}),
-        ImpersonateTarget('safari', '15.5', 'macos', '12'): curl_cffi.requests.BrowserType.safari15_5,
-        ImpersonateTarget('safari', '15.3', 'macos', '11'): curl_cffi.requests.BrowserType.safari15_3,
-        ImpersonateTarget('chrome', '99', 'android', '12'): curl_cffi.requests.BrowserType.chrome99_android,
-        **({
-            ImpersonateTarget('safari', '17.2', 'ios', '17.2'): curl_cffi.requests.BrowserType.safari17_2_ios,
-        } if curl_cffi_version >= (0, 7, 0) else {}),
+        target: name if curl_cffi_version >= (0, 9) else curl_cffi.requests.BrowserType[name]
+        for name, target in dict(sorted(itertools.chain.from_iterable(
+            targets.items()
+            for version, targets in BROWSER_TARGETS.items()
+            if curl_cffi_version >= version
+        ), key=lambda x: (
+            # deprioritize mobile targets since they give very different behavior
+            x[1].os not in ('ios', 'android'),
+            # prioritize edge < firefox < safari < chrome
+            ('edge', 'firefox', 'safari', 'chrome').index(x[1].client),
+            # prioritize newest version
+            float(x[1].version) if x[1].version else 0,
+            # group by os name
+            x[1].os,
+        ), reverse=True)).items()
     }
     def _create_instance(self, cookiejar=None):
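
Reviewer note: the new comprehension above collects every target whose minimum curl_cffi version is satisfied, then sorts with a tuple key in reverse. A minimal, self-contained sketch of that ordering follows. `Target`, `SAMPLE`, `priority` and `order_targets` are hypothetical stand-ins written for illustration; they are not part of yt-dlp or curl_cffi, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical stand-in for ImpersonateTarget: only the fields the sort key reads
    client: str
    version: str
    os: str


CLIENT_ORDER = ('edge', 'firefox', 'safari', 'chrome')


def priority(item):
    """Sort key for (name, target) pairs; larger tuples win because we sort in reverse."""
    _name, target = item
    return (
        target.os not in ('ios', 'android'),             # desktop targets outrank mobile ones
        CLIENT_ORDER.index(target.client),               # chrome has the highest index
        float(target.version) if target.version else 0,  # newer versions first
        target.os,                                       # group equal entries by OS name
    )


def order_targets(targets_by_version, installed_version):
    """Return target names available for `installed_version`, best candidate first."""
    available = {}
    for minimum_version, targets in targets_by_version.items():
        if installed_version >= minimum_version:  # plain tuple comparison
            available.update(targets)
    return [name for name, _ in sorted(available.items(), key=priority, reverse=True)]


# Made-up sample data keyed by the minimum library version that provides each target
SAMPLE = {
    (0, 5): {
        'chrome99_android': Target('chrome', '99', 'android'),
        'edge101': Target('edge', '101', 'windows'),
        'chrome110': Target('chrome', '110', 'windows'),
    },
    (0, 9): {'safari18_0': Target('safari', '18.0', 'macos')},
    (0, 10): {'firefox135': Target('firefox', '135', 'macos')},
}

ORDERED = order_targets(SAMPLE, (0, 9))
print(ORDERED)  # ['chrome110', 'safari18_0', 'edge101', 'chrome99_android']
```

With an installed version of `(0, 9)` the `firefox135` entry is skipped because it is registered under `(0, 10)`, and the Android target sorts last despite being a Chrome target.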


@@ -21,9 +21,11 @@
 urllib3_version = tuple(int_or_none(x, default=0) for x in urllib3.__version__.split('.'))
 if urllib3_version < (1, 26, 17):
+    urllib3._yt_dlp__version = f'{urllib3.__version__} (unsupported)'
     raise ImportError('Only urllib3 >= 1.26.17 is supported')
 if requests.__build__ < 0x023202:
+    requests._yt_dlp__version = f'{requests.__version__} (unsupported)'
     raise ImportError('Only requests >= 2.32.2 is supported')
 import requests.adapters
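
Reviewer note: the two added lines tag the module with a version label before the `ImportError` is raised, presumably so the version that was found can still be reported afterwards. A small sketch of that pattern, under stated assumptions: `parse_version` and `require_minimum` are hypothetical helpers, and `fake_urllib3` is a stand-in object rather than the real library.

```python
from types import SimpleNamespace


def parse_version(text):
    """Turn '1.26.5' into (1, 26, 5); non-numeric parts become 0."""
    return tuple(int(part) if part.isdigit() else 0 for part in text.split('.'))


def require_minimum(module, minimum):
    """Raise ImportError for an outdated module, leaving a label on it first."""
    if parse_version(module.__version__) < minimum:
        # Record what was found before bailing out, as the diff does
        module._yt_dlp__version = f'{module.__version__} (unsupported)'
        wanted = '.'.join(map(str, minimum))
        raise ImportError(f'Only {module.__name__} >= {wanted} is supported')


# Stand-in for an installed but outdated dependency
fake_urllib3 = SimpleNamespace(__name__='urllib3', __version__='1.26.5')

try:
    require_minimum(fake_urllib3, (1, 26, 17))
except ImportError as exc:
    MESSAGE = str(exc)

print(MESSAGE)                        # Only urllib3 >= 1.26.17 is supported
print(fake_urllib3._yt_dlp__version)  # 1.26.5 (unsupported)
```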
@@ -296,6 +298,7 @@ def _check_extensions(self, extensions):
         extensions.pop('cookiejar', None)
         extensions.pop('timeout', None)
         extensions.pop('legacy_ssl', None)
+        extensions.pop('keep_header_casing', None)
     def _create_instance(self, cookiejar, legacy_ssl_support=None):
         session = RequestsSession()
@@ -312,11 +315,12 @@ def _create_instance(self, cookiejar, legacy_ssl_support=None):
         session.trust_env = False  # no need, we already load proxies from env
         return session
-    def _send(self, request):
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, _, headers):
         add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)
+    def _send(self, request):
+        headers = self._get_headers(request)
         max_redirects_exceeded = False
         session = self._get_instance(


@@ -379,13 +379,15 @@ def _create_instance(self, proxies, cookiejar, legacy_ssl_support=None):
         opener.addheaders = []
         return opener
-    def _send(self, request):
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, _, headers):
         add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)
+    def _send(self, request):
+        headers = self._get_headers(request)
         urllib_req = urllib.request.Request(
             url=request.url,
             data=request.data,
-            headers=dict(headers),
+            headers=headers,
             method=request.method,
         )


@@ -34,6 +34,7 @@
 websockets_version = tuple(map(int_or_none, websockets.version.version.split('.')))
 if websockets_version < (13, 0):
+    websockets._yt_dlp__version = f'{websockets.version.version} (unsupported)'
     raise ImportError('Only websockets>=13.0 is supported')
 import websockets.sync.client
@@ -116,6 +117,7 @@ def _check_extensions(self, extensions):
         extensions.pop('timeout', None)
         extensions.pop('cookiejar', None)
         extensions.pop('legacy_ssl', None)
+        extensions.pop('keep_header_casing', None)
     def close(self):
         # Remove the logging handler that contains a reference to our logger
@@ -123,15 +125,16 @@ def close(self):
         for name, handler in self.__logging_handlers.items():
             logging.getLogger(name).removeHandler(handler)
-    def _send(self, request):
-        timeout = self._calculate_timeout(request)
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, request, headers):
         if 'cookie' not in headers:
             cookiejar = self._get_cookiejar(request)
             cookie_header = cookiejar.get_cookie_header(request.url)
             if cookie_header:
                 headers['cookie'] = cookie_header
+    def _send(self, request):
+        timeout = self._calculate_timeout(request)
+        headers = self._get_headers(request)
         wsuri = parse_uri(request.url)
         create_conn_kwargs = {
             'source_address': (self.source_address, 0) if self.source_address else None,


@@ -206,6 +206,7 @@ class RequestHandler(abc.ABC):
     - `cookiejar`: Cookiejar to use for this request.
     - `timeout`: socket timeout to use for this request.
     - `legacy_ssl`: Enable legacy SSL options for this request. See legacy_ssl_support.
+    - `keep_header_casing`: Keep the casing of headers when sending the request.
     To enable these, add extensions.pop('<extension>', None) to _check_extensions
     Apart from the url protocol, proxies dict may contain the following keys:
@@ -259,6 +260,23 @@ def _make_sslcontext(self, legacy_ssl_support=None):
     def _merge_headers(self, request_headers):
         return HTTPHeaderDict(self.headers, request_headers)
+    def _prepare_headers(self, request: Request, headers: HTTPHeaderDict) -> None:  # noqa: B027
+        """Additional operations to prepare headers before building. To be extended by subclasses.
+        @param request: Request object
+        @param headers: Merged headers to prepare
+        """
+    def _get_headers(self, request: Request) -> dict[str, str]:
+        """
+        Get headers for external use.
+        Subclasses may define a _prepare_headers method to modify headers after merge but before building.
+        """
+        headers = self._merge_headers(request.headers)
+        self._prepare_headers(request, headers)
+        if request.extensions.get('keep_header_casing'):
+            return headers.sensitive()
+        return dict(headers)
     def _calculate_timeout(self, request):
         return float(request.extensions.get('timeout') or self.timeout)
@@ -317,6 +335,7 @@ def _check_extensions(self, extensions):
         assert isinstance(extensions.get('cookiejar'), (YoutubeDLCookieJar, NoneType))
         assert isinstance(extensions.get('timeout'), (float, int, NoneType))
         assert isinstance(extensions.get('legacy_ssl'), (bool, NoneType))
+        assert isinstance(extensions.get('keep_header_casing'), (bool, NoneType))
     def _validate(self, request):
         self._check_url_scheme(request)


@@ -5,11 +5,11 @@
 from dataclasses import dataclass
 from typing import Any
-from .common import RequestHandler, register_preference
+from .common import RequestHandler, register_preference, Request
 from .exceptions import UnsupportedRequest
 from ..compat.types import NoneType
 from ..utils import classproperty, join_nonempty
-from ..utils.networking import std_headers
+from ..utils.networking import std_headers, HTTPHeaderDict
 @dataclass(order=True, frozen=True)
@@ -123,7 +123,17 @@ def _get_request_target(self, request):
         """Get the requested target for the request"""
         return self._resolve_target(request.extensions.get('impersonate') or self.impersonate)
-    def _get_impersonate_headers(self, request):
+    def _prepare_impersonate_headers(self, request: Request, headers: HTTPHeaderDict) -> None:  # noqa: B027
+        """Additional operations to prepare headers before building. To be extended by subclasses.
+        @param request: Request object
+        @param headers: Merged headers to prepare
+        """
+    def _get_impersonate_headers(self, request: Request) -> dict[str, str]:
+        """
+        Get headers for external impersonation use.
+        Subclasses may define a _prepare_impersonate_headers method to modify headers after merge but before building.
+        """
         headers = self._merge_headers(request.headers)
         if self._get_request_target(request) is not None:
             # remove all headers present in std_headers
@@ -131,7 +141,11 @@ def _get_impersonate_headers(self, request):
             for k, v in std_headers.items():
                 if headers.get(k) == v:
                     headers.pop(k)
-        return headers
+        self._prepare_impersonate_headers(request, headers)
+        if request.extensions.get('keep_header_casing'):
+            return headers.sensitive()
+        return dict(headers)
 @register_preference(ImpersonateRequestHandler)
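
Reviewer note: the context lines of this hunk drop any header whose value still equals the library default whenever an impersonation target is active, presumably so the impersonated client can supply its own values. A minimal sketch of that filtering step, assuming plain dicts; `STD_HEADERS` here holds made-up values and `strip_default_headers` is a hypothetical helper, not a yt-dlp function.

```python
# Made-up defaults standing in for std_headers
STD_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (default)',
    'Accept': '*/*',
}


def strip_default_headers(headers, defaults, target):
    """Return a copy of `headers` without entries that still carry the default value.

    Nothing is removed when no impersonation target is set.
    """
    result = dict(headers)
    if target is not None:
        for name, value in defaults.items():
            if result.get(name) == value:
                result.pop(name)
    return result


merged = {
    'User-Agent': 'Mozilla/5.0 (default)',  # untouched default: removed when impersonating
    'Accept': 'text/html',                  # overridden by the caller: kept
    'Referer': 'https://example.com/',
}

IMPERSONATED = strip_default_headers(merged, STD_HEADERS, target='chrome')
UNCHANGED = strip_default_headers(merged, STD_HEADERS, target=None)
print(IMPERSONATED)  # {'Accept': 'text/html', 'Referer': 'https://example.com/'}
```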


@@ -500,7 +500,8 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
                 'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'],
                 '2021': ['2022', 'no-certifi', 'filename-sanitization'],
                 '2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter', 'prefer-legacy-http-handler', 'manifest-filesize-approx'],
-                '2023': ['prefer-vp9-sort'],
+                '2023': ['2024', 'prefer-vp9-sort'],
+                '2024': [],
             },
         }, help=(
             'Options that can help keep compatibility with youtube-dl or youtube-dlc '
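
Reviewer note: with this change each year alias chains to the next one, so `2023` now pulls in `2024`, which is currently empty. The sketch below shows how such a chain could be flattened. It is a simplification under stated assumptions: `expand` is a hypothetical helper, only the year aliases from the hunk are included, and the `all` alias and `-`-prefixed removals seen in the `youtube-dlc` entry are deliberately not handled.

```python
ALIASES = {
    '2021': ['2022', 'no-certifi', 'filename-sanitization'],
    '2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter',
             'prefer-legacy-http-handler', 'manifest-filesize-approx'],
    '2023': ['2024', 'prefer-vp9-sort'],
    '2024': [],
}


def expand(name, aliases):
    """Recursively replace alias names with the concrete options they stand for."""
    if name not in aliases:
        return {name}  # a concrete option, not an alias
    options = set()
    for item in aliases[name]:
        options |= expand(item, aliases)
    return options


print(sorted(expand('2023', ALIASES)))  # ['prefer-vp9-sort']
print(sorted(expand('2024', ALIASES)))  # []
```

Under this reading, adding a future compatibility option to `2024` would automatically apply to every earlier year alias as well.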

Some files were not shown because too many files have changed in this diff.