mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-20 13:51:20 +00:00

Compare commits


58 Commits

Author SHA1 Message Date
github-actions[bot]
b4488a9e12 Release 2025.03.21
Created by: bashonly

:ci skip all
2025-03-21 23:49:09 +00:00
Simon Sawicki
f36e4b6e65 [cleanup] Misc (#12526)
Authored by: Grub4K, seproDev, gamer191, dirkf

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-21 23:41:56 +00:00
D Trombett
983095485c [ie/loco] Add extractor (#12667)
Closes #12496
Authored by: DTrombett
2025-03-21 23:24:13 +00:00
Michaël De Boey
bbada3ec07 [ie/ketnet] Remove extractor (#12628)
Authored by: MichaelDeBoey
2025-03-21 23:19:36 +00:00
Michiel Sikma
8305df0001 [ie/soop] Fix timestamp extraction (#12609)
Closes #12606
Authored by: msikma
2025-03-21 23:16:30 +00:00
bashonly
7223d29569 [ie/mitele] Fix extractor (#12689)
Closes #12655
Authored by: bashonly
2025-03-21 23:14:46 +00:00
bashonly
f5fb2229e6 [ie/BilibiliPlaylist] Fix extractor (#12690)
Closes #12651
Authored by: bashonly
2025-03-21 23:04:58 +00:00
JChris246
89a68c4857 [ie/jamendo] Fix thumbnail extraction (#12622)
Closes #11779
Authored by: JChris246, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-21 23:04:34 +00:00
sepro
9b868518a1 [ie/youtube] Fix nsig and signature extraction for player 643afba4 (#12684)
Closes #12677, Closes #12682
Authored by: seproDev, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-21 20:58:10 +00:00
D Trombett
2ee3a0aff9 [ie/tv8.it] Add live and playlist extractors (#12569)
Closes #12542
Authored by: DTrombett
2025-03-16 23:10:16 +01:00
Arc8ne
01a8be4c23 [ie/Canalsurmas] Add extractor (#12497)
Closes #5516
Authored by: Arc8ne
2025-03-16 23:03:10 +01:00
Refael Ackermann
ebac65aa9e [ie/NBCStations] Fix extractor (#12534)
Authored by: refack
2025-03-16 21:41:32 +00:00
thedenv
4815dac131 [ie/msn] Rework extractor (#12513)
Closes #3225
Authored by: thedenv, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-16 19:54:46 +01:00
Simon Sawicki
95f8df2f79 [networking] Always add unsupported suffix on version mismatch (#12626)
Authored by: Grub4K
2025-03-16 12:45:44 +01:00
coletdjnz
e67d786c7c [ie/youtube] Warn on DRM formats (#12593)
Authored by: coletdjnz
2025-03-16 10:28:16 +13:00
sepro
d9a53cc1e6 [ie/reddit] Truncate title (#12567)
Authored by: seproDev
2025-03-15 22:16:00 +01:00
sepro
83b119dadb [ie/tiktok] Truncate title (#12566)
Authored by: seproDev
2025-03-15 22:15:29 +01:00
sepro
06f6de78db [ie/twitter] Truncate title (#12560)
Authored by: seproDev
2025-03-15 22:15:03 +01:00
sepro
3380febe99 [ie/youtube] Player client maintenance (#12603)
Authored by: seproDev
2025-03-15 21:57:56 +01:00
rysson
be0d819e11 [ie/cda] Fix login support (#12552)
Closes #10306
Authored by: rysson
2025-03-15 21:47:50 +01:00
Michaël De Boey
df9ebeec00 [ie/vrtmax] Rework extractor (#12479)
Closes #7997, Closes #8174, Closes #9375
Authored by: MichaelDeBoey, bergoid, seproDev

Co-authored-by: bergoid <bergoid@users.noreply.github.com>
Co-authored-by: sepro <sepro@sepr0.com>
2025-03-15 21:29:22 +01:00
fireattack
17504f2535 [ie/openrec] Fix _VALID_URL (#12608)
Authored by: fireattack
2025-03-15 17:14:01 +01:00
coletdjnz
4432a9390c [ie/youtube] Split into package (#12557)
Authored by: coletdjnz
2025-03-13 17:37:33 +13:00
sepro
05c8023a27 [ie/vk] Improve metadata extraction (#12510)
Closes #12509
Authored by: seproDev
2025-03-07 22:14:38 +01:00
bashonly
bd0a668169 [ie/pinterest] Fix extractor (#12538)
Closes #12529
Authored by: mikf

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-03-05 06:38:23 +00:00
bashonly
b8b4754704 [ie/twitter] Fix syndication token generation (#12537)
Fix 14cd7f3443

Authored by: bashonly
2025-03-05 06:22:52 +00:00
u-spec-png
9d70abe4de [ie/N1] Fix extraction of newer articles (#12514)
Authored by: u-spec-png
2025-03-04 01:51:23 +01:00
sepro
8eb9c1bf3b [ie/RTP] Rework extractor (#11638)
Closes #4661, Closes #10393, Closes #11244
Authored by: seproDev, vallovic, red-acid, pferreir, somini

Co-authored-by: vallovic <vallovic@gmail.com>
Co-authored-by: red-acid <161967284+red-acid@users.noreply.github.com>
Co-authored-by: Pedro Ferreira <pedro@dete.st>
Co-authored-by: somini <dev@somini.xyz>
2025-03-04 00:46:18 +01:00
fries1234
42b7440963 [ie/tvw] Add extractor (#12271)
Authored by: fries1234
2025-03-03 23:25:30 +01:00
sepro
172d5fcd77 [ie/MagellanTV] Fix extractor (#12505)
Closes #12498
Authored by: seproDev
2025-03-03 22:55:03 +01:00
Simon Sawicki
7d18fed8f1 [networking] Add keep_header_casing extension (#11652)
Authored by: coletdjnz, Grub4K

Co-authored-by: coletdjnz <coletdjnz@protonmail.com>
2025-03-03 00:10:01 +01:00
coletdjnz
79ec2fdff7 [ie/youtube] Warn on missing formats due to SSAP (#12483)
See https://github.com/yt-dlp/yt-dlp/issues/12482

Authored by: coletdjnz
2025-02-28 19:33:31 +13:00
sepro
3042afb5fe [ie/CultureUnplugged] Extend _VALID_URL (#12486)
Closes #12477
Authored by: seproDev
2025-02-26 19:39:50 +01:00
sepro
ad60137c14 [ie/Dailymotion] Improve embed detection (#12464)
Closes #12453
Authored by: seproDev
2025-02-26 19:36:33 +01:00
4ft35t
0bb3978862 [ie/weibo] Support playlists (#12284)
Closes #12283
Authored by: 4ft35t
2025-02-23 19:16:06 +00:00
XPA
7508e34f20 [ie/niconico] Fix format sorting (#12442)
Authored by: xpadev-net
2025-02-23 19:07:08 +00:00
bashonly
9807181cfb [ie/lbry] Make m3u8 format extraction non-fatal (#12463)
Closes #12459
Authored by: bashonly
2025-02-23 18:24:48 +00:00
bashonly
7126b47260 [ie/lbry] Raise appropriate error for non-media files (#12462)
Closes #12182
Authored by: bashonly
2025-02-23 17:59:22 +00:00
bashonly
eb1417786a [ie/gem.cbc.ca] Fix login support (#12414)
Closes #12406
Authored by: bashonly
2025-02-23 09:56:47 +00:00
bashonly
6933f5670c [ie/playsuisse] Fix login support (#12444)
Closes #12425
Authored by: bashonly
2025-02-23 09:22:51 +00:00
Alexander Seiler
26a502fc72 [ie/azmedien] Fix extractor (#12375)
Authored by: goggle
2025-02-23 09:14:35 +00:00
Ben Faerber
652827d5a0 [ie/softwhiteunderbelly] Add extractor (#12281)
Authored by: benfaerber
2025-02-23 09:11:58 +00:00
Pedro Belo
0e1697232f [ie/globo] Fix subtitles extraction (#12270)
Authored by: pedro
2025-02-23 08:57:27 +00:00
Kenshin9977
9f77e04c76 Fix external downloader availability when using --ffmpeg-location (#12318)
This fix is only applicable to the CLI option

Authored by: Kenshin9977
2025-02-23 08:50:43 +00:00
Simon Sawicki
c034d65548 Fix lazy extractor state (Fix 4445f37a7a) (#12452)
Authored by: coletdjnz, Grub4K, pukkandan
2025-02-23 09:44:27 +01:00
bashonly
480125560a [ie/instagram] Improve error handling (#12410)
Closes #5967, Closes #6294, Closes #7328, Closes #8452
Authored by: bashonly
2025-02-23 08:35:22 +00:00
bashonly
a59abe0636 [ie/instagram] Fix extraction of older private posts (#12451)
Authored by: bashonly
2025-02-23 08:31:00 +00:00
Chris Ellsworth
a90641c836 [ie/instagram] Add app_id extractor-arg (#12359)
Authored by: chrisellsworth
2025-02-23 08:16:04 +00:00
fireattack
65c3c58c0a [ie/instagram:story] Support --no-playlist (#12397)
Closes #12395
Authored by: fireattack
2025-02-23 07:24:21 +00:00
bashonly
99ea297875 [ie/tiktok] Improve error handling (#12445)
Closes #8678
Authored by: bashonly
2025-02-23 06:53:13 +00:00
bashonly
6deeda5c11 [ie/soundcloud] Fix thumbnail extraction (#12447)
Closes #11835, Closes #12435
Authored by: bashonly
2025-02-23 06:20:53 +00:00
Refael Ackermann
7f3006eb0c [ie/wsj] Support opinion URLs and impersonation (#12431)
Authored by: refack
2025-02-23 00:40:53 +00:00
coletdjnz
4445f37a7a [core] Load plugins on demand (#11305)
- Adds `--no-plugin-dirs` to disable plugin loading
- `--plugin-dirs` now supports post-processors

Authored by: coletdjnz, Grub4K, pukkandan
2025-02-23 11:00:46 +13:00
sepro
3a1583ca75 [ie/BunnyCdn] Add extractor (#11586)
Also adds BunnyCdnFD

Authored by: seproDev, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2025-02-21 22:39:41 +01:00
Simon Sawicki
a3e0c7d3b2 [test] Show all differences for expect_value and expect_dict (#12334)
Authored by: Grub4K
2025-02-21 21:29:07 +01:00
Simon Sawicki
f7a1f2d813 [core] Support emitting ConEmu progress codes (#10649)
Authored by: Grub4K
2025-02-20 20:33:31 +01:00
bashonly
9deed13d7c [ie/soundcloud] Extract tags (#12420)
Authored by: bashonly
2025-02-20 15:51:08 +00:00
bashonly
c2e6e1d5f7 [ie/niconico:live] Fix thumbnail extraction (#12419)
Closes #12417
Authored by: bashonly
2025-02-20 15:39:06 +00:00
101 changed files with 7265 additions and 5245 deletions

View File

@@ -742,3 +742,18 @@ lfavole
 mp3butcher
 slipinthedove
 YoshiTabletopGamer
+Arc8ne
+benfaerber
+chrisellsworth
+fries1234
+Kenshin9977
+MichaelDeBoey
+msikma
+pedro
+pferreir
+red-acid
+refack
+rysson
+somini
+thedenv
+vallovic

View File

@@ -4,6 +4,79 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.03.21
#### Core changes
- [Fix external downloader availability when using `--ffmpeg-location`](https://github.com/yt-dlp/yt-dlp/commit/9f77e04c76e36e1cbbf49bc9eb385fa6ef804b67) ([#12318](https://github.com/yt-dlp/yt-dlp/issues/12318)) by [Kenshin9977](https://github.com/Kenshin9977)
- [Load plugins on demand](https://github.com/yt-dlp/yt-dlp/commit/4445f37a7a66b248dbd8376c43137e6e441f138e) ([#11305](https://github.com/yt-dlp/yt-dlp/issues/11305)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan) (With fixes in [c034d65](https://github.com/yt-dlp/yt-dlp/commit/c034d655487be668222ef9476a16f374584e49a7))
- [Support emitting ConEmu progress codes](https://github.com/yt-dlp/yt-dlp/commit/f7a1f2d8132967a62b0f6d5665c6d2dde2d42c09) ([#10649](https://github.com/yt-dlp/yt-dlp/issues/10649)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **azmedien**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/26a502fc727d0e91b2db6bf4a112823bcc672e85) ([#12375](https://github.com/yt-dlp/yt-dlp/issues/12375)) by [goggle](https://github.com/goggle)
- **bilibiliplaylist**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f5fb2229e66cf59d5bf16065bc041b42a28354a0) ([#12690](https://github.com/yt-dlp/yt-dlp/issues/12690)) by [bashonly](https://github.com/bashonly)
- **bunnycdn**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3a1583ca75fb523cbad0e5e174387ea7b477d175) ([#11586](https://github.com/yt-dlp/yt-dlp/issues/11586)) by [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **canalsurmas**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/01a8be4c23f186329d85f9c78db34a55f3294ac5) ([#12497](https://github.com/yt-dlp/yt-dlp/issues/12497)) by [Arc8ne](https://github.com/Arc8ne)
- **cda**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/be0d819e1103195043f6743650781f0d4d343f6d) ([#12552](https://github.com/yt-dlp/yt-dlp/issues/12552)) by [rysson](https://github.com/rysson)
- **cultureunplugged**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/3042afb5fe342d3a00de76704cd7de611acc350e) ([#12486](https://github.com/yt-dlp/yt-dlp/issues/12486)) by [seproDev](https://github.com/seproDev)
- **dailymotion**: [Improve embed detection](https://github.com/yt-dlp/yt-dlp/commit/ad60137c141efa5023fbc0ac8579eaefe8b3d8cc) ([#12464](https://github.com/yt-dlp/yt-dlp/issues/12464)) by [seproDev](https://github.com/seproDev)
- **gem.cbc.ca**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/eb1417786a3027b1e7290ec37ef6aaece50ebed0) ([#12414](https://github.com/yt-dlp/yt-dlp/issues/12414)) by [bashonly](https://github.com/bashonly)
- **globo**: [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/0e1697232fcbba7551f983fd1ba93bb445cbb08b) ([#12270](https://github.com/yt-dlp/yt-dlp/issues/12270)) by [pedro](https://github.com/pedro)
- **instagram**
- [Add `app_id` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/a90641c8363fa0c10800b36eb6b01ee22d3a9409) ([#12359](https://github.com/yt-dlp/yt-dlp/issues/12359)) by [chrisellsworth](https://github.com/chrisellsworth)
- [Fix extraction of older private posts](https://github.com/yt-dlp/yt-dlp/commit/a59abe0636dc49b22a67246afe35613571b86f05) ([#12451](https://github.com/yt-dlp/yt-dlp/issues/12451)) by [bashonly](https://github.com/bashonly)
- [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/480125560a3b9972d29ae0da850aba8109e6bd41) ([#12410](https://github.com/yt-dlp/yt-dlp/issues/12410)) by [bashonly](https://github.com/bashonly)
- story: [Support `--no-playlist`](https://github.com/yt-dlp/yt-dlp/commit/65c3c58c0a67463a150920203cec929045c95a24) ([#12397](https://github.com/yt-dlp/yt-dlp/issues/12397)) by [fireattack](https://github.com/fireattack)
- **jamendo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/89a68c4857ddbaf937ff22f12648baaf6b5af840) ([#12622](https://github.com/yt-dlp/yt-dlp/issues/12622)) by [bashonly](https://github.com/bashonly), [JChris246](https://github.com/JChris246)
- **ketnet**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/bbada3ec0779422cde34f1ce3dcf595da463b493) ([#12628](https://github.com/yt-dlp/yt-dlp/issues/12628)) by [MichaelDeBoey](https://github.com/MichaelDeBoey)
- **lbry**
- [Make m3u8 format extraction non-fatal](https://github.com/yt-dlp/yt-dlp/commit/9807181cfbf87bfa732f415c30412bdbd77cbf81) ([#12463](https://github.com/yt-dlp/yt-dlp/issues/12463)) by [bashonly](https://github.com/bashonly)
- [Raise appropriate error for non-media files](https://github.com/yt-dlp/yt-dlp/commit/7126b472601814b7fd8c9de02069e8fff1764891) ([#12462](https://github.com/yt-dlp/yt-dlp/issues/12462)) by [bashonly](https://github.com/bashonly)
- **loco**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/983095485c731240aae27c950cb8c24a50827b56) ([#12667](https://github.com/yt-dlp/yt-dlp/issues/12667)) by [DTrombett](https://github.com/DTrombett)
- **magellantv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/172d5fcd778bf2605db7647ebc56b29ed18d24ac) ([#12505](https://github.com/yt-dlp/yt-dlp/issues/12505)) by [seproDev](https://github.com/seproDev)
- **mitele**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7223d29569a48a35ad132a508c115973866838d3) ([#12689](https://github.com/yt-dlp/yt-dlp/issues/12689)) by [bashonly](https://github.com/bashonly)
- **msn**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/4815dac131d42c51e12c1d05232db0bbbf607329) ([#12513](https://github.com/yt-dlp/yt-dlp/issues/12513)) by [seproDev](https://github.com/seproDev), [thedenv](https://github.com/thedenv)
- **n1**: [Fix extraction of newer articles](https://github.com/yt-dlp/yt-dlp/commit/9d70abe4de401175cbbaaa36017806f16b2df9af) ([#12514](https://github.com/yt-dlp/yt-dlp/issues/12514)) by [u-spec-png](https://github.com/u-spec-png)
- **nbcstations**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ebac65aa9e0bf9a97c24d00f7977900d2577364b) ([#12534](https://github.com/yt-dlp/yt-dlp/issues/12534)) by [refack](https://github.com/refack)
- **niconico**
- [Fix format sorting](https://github.com/yt-dlp/yt-dlp/commit/7508e34f203e97389f1d04db92140b13401dd724) ([#12442](https://github.com/yt-dlp/yt-dlp/issues/12442)) by [xpadev-net](https://github.com/xpadev-net)
- live: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/c2e6e1d5f77f3b720a6266f2869eb750d20e5dc1) ([#12419](https://github.com/yt-dlp/yt-dlp/issues/12419)) by [bashonly](https://github.com/bashonly)
- **openrec**: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/17504f253564cfad86244de2b6346d07d2300ca5) ([#12608](https://github.com/yt-dlp/yt-dlp/issues/12608)) by [fireattack](https://github.com/fireattack)
- **pinterest**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bd0a66816934de70312eea1e71c59c13b401dc3a) ([#12538](https://github.com/yt-dlp/yt-dlp/issues/12538)) by [mikf](https://github.com/mikf)
- **playsuisse**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/6933f5670cea9c3e2fb16c1caa1eda54d13122c5) ([#12444](https://github.com/yt-dlp/yt-dlp/issues/12444)) by [bashonly](https://github.com/bashonly)
- **reddit**: [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/d9a53cc1e6fd912daf500ca4f19e9ca88994dbf9) ([#12567](https://github.com/yt-dlp/yt-dlp/issues/12567)) by [seproDev](https://github.com/seproDev)
- **rtp**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/8eb9c1bf3b9908cca22ef043602aa24fb9f352c6) ([#11638](https://github.com/yt-dlp/yt-dlp/issues/11638)) by [pferreir](https://github.com/pferreir), [red-acid](https://github.com/red-acid), [seproDev](https://github.com/seproDev), [somini](https://github.com/somini), [vallovic](https://github.com/vallovic)
- **softwhiteunderbelly**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/652827d5a076c9483c36654ad2cf3fe46219baf4) ([#12281](https://github.com/yt-dlp/yt-dlp/issues/12281)) by [benfaerber](https://github.com/benfaerber)
- **soop**: [Fix timestamp extraction](https://github.com/yt-dlp/yt-dlp/commit/8305df00012ff8138a6ff95279d06b54ac607f63) ([#12609](https://github.com/yt-dlp/yt-dlp/issues/12609)) by [msikma](https://github.com/msikma)
- **soundcloud**
- [Extract tags](https://github.com/yt-dlp/yt-dlp/commit/9deed13d7cce6d3647379e50589c92de89227509) ([#12420](https://github.com/yt-dlp/yt-dlp/issues/12420)) by [bashonly](https://github.com/bashonly)
- [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/6deeda5c11f34f613724fa0627879f0d607ba1b4) ([#12447](https://github.com/yt-dlp/yt-dlp/issues/12447)) by [bashonly](https://github.com/bashonly)
- **tiktok**
- [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/99ea2978757a431eeb2a265b3395ccbe4ce202cf) ([#12445](https://github.com/yt-dlp/yt-dlp/issues/12445)) by [bashonly](https://github.com/bashonly)
- [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/83b119dadb0f267f1fb66bf7ed74c097349de79e) ([#12566](https://github.com/yt-dlp/yt-dlp/issues/12566)) by [seproDev](https://github.com/seproDev)
- **tv8.it**: [Add live and playlist extractors](https://github.com/yt-dlp/yt-dlp/commit/2ee3a0aff9be2be3bea60640d3d8a0febaf0acb6) ([#12569](https://github.com/yt-dlp/yt-dlp/issues/12569)) by [DTrombett](https://github.com/DTrombett)
- **tvw**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/42b7440963866e31ff84a5b89030d1c596fa2e6e) ([#12271](https://github.com/yt-dlp/yt-dlp/issues/12271)) by [fries1234](https://github.com/fries1234)
- **twitter**
- [Fix syndication token generation](https://github.com/yt-dlp/yt-dlp/commit/b8b47547049f5ebc3dd680fc7de70ed0ca9c0d70) ([#12537](https://github.com/yt-dlp/yt-dlp/issues/12537)) by [bashonly](https://github.com/bashonly)
- [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/06f6de78db2eceeabd062ab1a3023e0ff9d4df53) ([#12560](https://github.com/yt-dlp/yt-dlp/issues/12560)) by [seproDev](https://github.com/seproDev)
- **vk**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/05c8023a27dd37c49163c0498bf98e3e3c1cb4b9) ([#12510](https://github.com/yt-dlp/yt-dlp/issues/12510)) by [seproDev](https://github.com/seproDev)
- **vrtmax**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/df9ebeec00d658693252978d1ffb885e67aa6ab6) ([#12479](https://github.com/yt-dlp/yt-dlp/issues/12479)) by [bergoid](https://github.com/bergoid), [MichaelDeBoey](https://github.com/MichaelDeBoey), [seproDev](https://github.com/seproDev)
- **weibo**: [Support playlists](https://github.com/yt-dlp/yt-dlp/commit/0bb39788626002a8a67e925580227952c563c8b9) ([#12284](https://github.com/yt-dlp/yt-dlp/issues/12284)) by [4ft35t](https://github.com/4ft35t)
- **wsj**: [Support opinion URLs and impersonation](https://github.com/yt-dlp/yt-dlp/commit/7f3006eb0c0659982bb956d71b0bc806bcb0a5f2) ([#12431](https://github.com/yt-dlp/yt-dlp/issues/12431)) by [refack](https://github.com/refack)
- **youtube**
- [Fix nsig and signature extraction for player `643afba4`](https://github.com/yt-dlp/yt-dlp/commit/9b868518a15599f3d7ef5a1c730dda164c30da9b) ([#12684](https://github.com/yt-dlp/yt-dlp/issues/12684)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/3380febe9984c21c79c3147c1d390a4cf339bc4c) ([#12603](https://github.com/yt-dlp/yt-dlp/issues/12603)) by [seproDev](https://github.com/seproDev)
- [Split into package](https://github.com/yt-dlp/yt-dlp/commit/4432a9390c79253ac830702b226d2e558b636725) ([#12557](https://github.com/yt-dlp/yt-dlp/issues/12557)) by [coletdjnz](https://github.com/coletdjnz)
- [Warn on DRM formats](https://github.com/yt-dlp/yt-dlp/commit/e67d786c7cc87bd449d22e0ddef08306891c1173) ([#12593](https://github.com/yt-dlp/yt-dlp/issues/12593)) by [coletdjnz](https://github.com/coletdjnz)
- [Warn on missing formats due to SSAP](https://github.com/yt-dlp/yt-dlp/commit/79ec2fdff75c8c1bb89b550266849ad4dec48dd3) ([#12483](https://github.com/yt-dlp/yt-dlp/issues/12483)) by [coletdjnz](https://github.com/coletdjnz)
#### Networking changes
- [Add `keep_header_casing` extension](https://github.com/yt-dlp/yt-dlp/commit/7d18fed8f1983fe6de4ddc810dfb2761ba5744ac) ([#11652](https://github.com/yt-dlp/yt-dlp/issues/11652)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K)
- [Always add unsupported suffix on version mismatch](https://github.com/yt-dlp/yt-dlp/commit/95f8df2f796d0048119615200758199aedcd7cf4) ([#12626](https://github.com/yt-dlp/yt-dlp/issues/12626)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [f36e4b6](https://github.com/yt-dlp/yt-dlp/commit/f36e4b6e65cb8403791aae2f520697115cb88dec) by [dirkf](https://github.com/dirkf), [gamer191](https://github.com/gamer191), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **test**: [Show all differences for `expect_value` and `expect_dict`](https://github.com/yt-dlp/yt-dlp/commit/a3e0c7d3b267abdf3933b709704a28d43bb46503) ([#12334](https://github.com/yt-dlp/yt-dlp/issues/12334)) by [Grub4K](https://github.com/Grub4K)
### 2025.02.19
#### Core changes

View File

@@ -337,10 +337,11 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
     --plugin-dirs PATH              Path to an additional directory to search
                                     for plugins. This option can be used
                                     multiple times to add multiple directories.
-                                    Note that this currently only works for
-                                    extractor plugins; postprocessor plugins can
-                                    only be loaded from the default plugin
-                                    directories
+                                    Use "default" to search the default plugin
+                                    directories (default)
+    --no-plugin-dirs                Clear plugin directories to search,
+                                    including defaults and those provided by
+                                    previous --plugin-dirs
     --flat-playlist                 Do not extract a playlist's URL result
                                     entries; some entry metadata may be missing
                                     and downloading may be bypassed
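The new flags are also reachable from Python; a minimal sketch, assuming the `plugin_dirs` global and `load_all_plugins` helper that the test changes below exercise (the extra directory path is hypothetical):

```python
from yt_dlp.globals import plugin_dirs
from yt_dlp.plugins import load_all_plugins

# Keep the default search locations and add one extra directory
# ('/path/to/my_plugins' is a hypothetical path)
plugin_dirs.value = ['default', '/path/to/my_plugins']
load_all_plugins()  # now also loads postprocessor plugins, not just extractors
```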
@@ -1768,7 +1769,7 @@ The following extractors use this feature:
 #### youtube
 * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
 * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
-* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
+* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
 * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
 * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
 * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@@ -1811,6 +1812,9 @@ The following extractors use this feature:
 * `vcodec`: vcodec to ignore - one or more of `h264`, `h265`, `dvh265`
 * `dr`: dynamic range to ignore - one or more of `sdr`, `hdr10`, `dv`
+#### instagram
+* `app_id`: The value of the `X-IG-App-ID` header used for API requests. Default is the web app ID, `936619743392459`
 #### niconicochannelplus
 * `max_comments`: Maximum number of comments to extract - default is `120`
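A minimal sketch of the new `app_id` extractor-arg from the Python API; the CLI equivalent would be `--extractor-args "instagram:app_id=936619743392459"`, and the post URL below is a placeholder:

```python
import yt_dlp

# Same structure as the CLI's --extractor-args: {extractor: {arg: [values]}}
opts = {'extractor_args': {'instagram': {'app_id': ['936619743392459']}}}
with yt_dlp.YoutubeDL(opts) as ydl:
    # Placeholder URL; any Instagram post URL would do
    info = ydl.extract_info('https://www.instagram.com/p/XXXXXXXXXXX/', download=False)
```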

View File

@@ -10,6 +10,9 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from inspect import getsource
 from devscripts.utils import get_filename_args, read_file, write_file
+from yt_dlp.extractor import import_extractors
+from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
+from yt_dlp.globals import extractors

 NO_ATTR = object()
 STATIC_CLASS_PROPERTIES = [
@@ -38,8 +41,7 @@ def main():
     lazy_extractors_filename = get_filename_args(default_outfile='yt_dlp/extractor/lazy_extractors.py')

-    from yt_dlp.extractor.extractors import _ALL_CLASSES
-    from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
+    import_extractors()

     DummyInfoExtractor = type('InfoExtractor', (InfoExtractor,), {'IE_NAME': NO_ATTR})
     module_src = '\n'.join((
@@ -47,7 +49,7 @@ def main():
         '    _module = None',
         *extra_ie_code(DummyInfoExtractor),
         '\nclass LazyLoadSearchExtractor(LazyLoadExtractor):\n    pass\n',
-        *build_ies(_ALL_CLASSES, (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
+        *build_ies(list(extractors.value.values()), (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
     ))
     write_file(lazy_extractors_filename, f'{module_src}\n')
@@ -73,7 +75,7 @@ def build_ies(ies, bases, attr_base):
         if ie in ies:
             names.append(ie.__name__)

-    yield f'\n_ALL_CLASSES = [{", ".join(names)}]'
+    yield '\n_CLASS_LOOKUP = {%s}' % ', '.join(f'{name!r}: {name}' for name in names)


 def sort_ies(ies, ignored_bases):

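For illustration, with hypothetical extractor names, the tail of the generated lazy_extractors module changes shape roughly as follows:

```python
# Old output of make_lazy_extractors.py (names are hypothetical):
_ALL_CLASSES = [YoutubeIE, GenericIE]

# New output: a name-to-class mapping, so a single extractor can be
# resolved lazily by name without iterating the whole list
_CLASS_LOOKUP = {'YoutubeIE': YoutubeIE, 'GenericIE': GenericIE}
```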
View File

@@ -76,7 +76,7 @@ dev = [
 ]
 static-analysis = [
     "autopep8~=2.0",
-    "ruff~=0.9.0",
+    "ruff~=0.11.0",
 ]
 test = [
     "pytest~=8.1",
@@ -384,9 +384,14 @@ select = [
"W391", "W391",
"W504", "W504",
] ]
exclude = "*/extractor/lazy_extractors.py,*venv*,*/test/testdata/sigs/player-*.js,.idea,.vscode"
[tool.pytest.ini_options] [tool.pytest.ini_options]
addopts = "-ra -v --strict-markers" addopts = [
"-ra", # summary: all except passed
"--verbose",
"--strict-markers",
]
markers = [ markers = [
"download", "download",
] ]

View File

@@ -224,6 +224,7 @@ The only reliable way to check if a site is supported is to try it.
 - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
 - **Bundesliga**
 - **Bundestag**
+- **BunnyCdn**
 - **BusinessInsider**
 - **BuzzFeed**
 - **BYUtv**: (**Currently broken**)
@@ -242,6 +243,7 @@ The only reliable way to check if a site is supported is to try it.
 - **CanalAlpha**
 - **canalc2.tv**
 - **Canalplus**: mycanal.fr and piwiplus.fr
+- **Canalsurmas**
 - **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
 - **CartoonNetwork**
 - **cbc.ca**
@@ -609,10 +611,10 @@ The only reliable way to check if a site is supported is to try it.
 - **Inc**
 - **IndavideoEmbed**
 - **InfoQ**
-- **Instagram**: [*instagram*](## "netrc machine")
-- **instagram:story**: [*instagram*](## "netrc machine")
-- **instagram:tag**: [*instagram*](## "netrc machine") Instagram hashtag search URLs
-- **instagram:user**: [*instagram*](## "netrc machine") Instagram user profile (**Currently broken**)
+- **Instagram**
+- **instagram:story**
+- **instagram:tag**: Instagram hashtag search URLs
+- **instagram:user**: Instagram user profile (**Currently broken**)
 - **InstagramIOS**: IOS instagram:// URL
 - **Internazionale**
 - **InternetVideoArchive**
@@ -661,7 +663,6 @@ The only reliable way to check if a site is supported is to try it.
 - **KelbyOne**: (**Currently broken**)
 - **Kenh14Playlist**
 - **Kenh14Video**
-- **Ketnet**
 - **khanacademy**
 - **khanacademy:unit**
 - **kick:clips**
@@ -733,6 +734,7 @@ The only reliable way to check if a site is supported is to try it.
 - **Livestreamfails**
 - **Lnk**
 - **loc**: Library of Congress
+- **Loco**
 - **loom**
 - **loom:folder**
 - **LoveHomePorn**
@@ -831,7 +833,7 @@ The only reliable way to check if a site is supported is to try it.
 - **MoviewPlay**
 - **Moviezine**
 - **MovingImage**
-- **MSN**: (**Currently broken**)
+- **MSN**
 - **mtg**: MTG services
 - **mtv**
 - **mtv.de**: (**Currently broken**)
@@ -1342,6 +1344,7 @@ The only reliable way to check if a site is supported is to try it.
 - **Smotrim**
 - **SnapchatSpotlight**
 - **Snotr**
+- **SoftWhiteUnderbelly**: [*softwhiteunderbelly*](## "netrc machine")
 - **Sohu**
 - **SohuV**
 - **SonyLIV**: [*sonyliv*](## "netrc machine")
@@ -1536,6 +1539,8 @@ The only reliable way to check if a site is supported is to try it.
 - **tv5unis**
 - **tv5unis:video**
 - **tv8.it**
+- **tv8.it:live**: TV8 Live
+- **tv8.it:playlist**: TV8 Playlist
 - **TVANouvelles**
 - **TVANouvellesArticle**
 - **tvaplus**: TVA+
@@ -1556,6 +1561,7 @@ The only reliable way to check if a site is supported is to try it.
 - **tvp:vod:series**
 - **TVPlayer**
 - **TVPlayHome**
+- **Tvw**
 - **Tweakers**
 - **TwitCasting**
 - **TwitCastingLive**
@@ -1677,7 +1683,7 @@ The only reliable way to check if a site is supported is to try it.
 - **vqq:series**
 - **vqq:video**
 - **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
-- **VrtNU**: [*vrtnu*](## "netrc machine") VRT MAX
+- **vrtmax**: [*vrtnu*](## "netrc machine") VRT MAX (formerly VRT NU)
 - **VTM**: (**Currently broken**)
 - **VTV**
 - **VTVGo**

View File

@@ -101,87 +101,109 @@ def getwebpagetestcases():
 md5 = lambda s: hashlib.md5(s.encode()).hexdigest()


-def expect_value(self, got, expected, field):
-    if isinstance(expected, str) and expected.startswith('re:'):
-        match_str = expected[len('re:'):]
-        match_rex = re.compile(match_str)
-
-        self.assertTrue(
-            isinstance(got, str),
-            f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
-        self.assertTrue(
-            match_rex.match(got),
-            f'field {field} (value: {got!r}) should match {match_str!r}')
-    elif isinstance(expected, str) and expected.startswith('startswith:'):
-        start_str = expected[len('startswith:'):]
-        self.assertTrue(
-            isinstance(got, str),
-            f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
-        self.assertTrue(
-            got.startswith(start_str),
-            f'field {field} (value: {got!r}) should start with {start_str!r}')
-    elif isinstance(expected, str) and expected.startswith('contains:'):
-        contains_str = expected[len('contains:'):]
-        self.assertTrue(
-            isinstance(got, str),
-            f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
-        self.assertTrue(
-            contains_str in got,
-            f'field {field} (value: {got!r}) should contain {contains_str!r}')
-    elif isinstance(expected, type):
-        self.assertTrue(
-            isinstance(got, expected),
-            f'Expected type {expected!r} for field {field}, but got value {got!r} of type {type(got)!r}')
-    elif isinstance(expected, dict) and isinstance(got, dict):
-        expect_dict(self, got, expected)
-    elif isinstance(expected, list) and isinstance(got, list):
-        self.assertEqual(
-            len(expected), len(got),
-            f'Expect a list of length {len(expected)}, but got a list of length {len(got)} for field {field}')
-        for index, (item_got, item_expected) in enumerate(zip(got, expected)):
-            type_got = type(item_got)
-            type_expected = type(item_expected)
-            self.assertEqual(
-                type_expected, type_got,
-                f'Type mismatch for list item at index {index} for field {field}, '
-                f'expected {type_expected!r}, got {type_got!r}')
-            expect_value(self, item_got, item_expected, field)
-    else:
-        if isinstance(expected, str) and expected.startswith('md5:'):
-            self.assertTrue(
-                isinstance(got, str),
-                f'Expected field {field} to be a unicode object, but got value {got!r} of type {type(got)!r}')
-            got = 'md5:' + md5(got)
-        elif isinstance(expected, str) and re.match(r'^(?:min|max)?count:\d+', expected):
-            self.assertTrue(
-                isinstance(got, (list, dict)),
-                f'Expected field {field} to be a list or a dict, but it is of type {type(got).__name__}')
-            op, _, expected_num = expected.partition(':')
-            expected_num = int(expected_num)
-            if op == 'mincount':
-                assert_func = assertGreaterEqual
-                msg_tmpl = 'Expected %d items in field %s, but only got %d'
-            elif op == 'maxcount':
-                assert_func = assertLessEqual
-                msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
-            elif op == 'count':
-                assert_func = assertEqual
-                msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
-            else:
-                assert False
-            assert_func(
-                self, len(got), expected_num,
-                msg_tmpl % (expected_num, field, len(got)))
-            return
-        self.assertEqual(
-            expected, got,
-            f'Invalid value for field {field}, expected {expected!r}, got {got!r}')
+def _iter_differences(got, expected, field):
+    if isinstance(expected, str):
+        op, _, val = expected.partition(':')
+        if op in ('mincount', 'maxcount', 'count'):
+            if not isinstance(got, (list, dict)):
+                yield field, f'expected either {list.__name__} or {dict.__name__}, got {type(got).__name__}'
+                return
+
+            expected_num = int(val)
+            got_num = len(got)
+            if op == 'mincount':
+                if got_num < expected_num:
+                    yield field, f'expected at least {val} items, got {got_num}'
+                return
+
+            if op == 'maxcount':
+                if got_num > expected_num:
+                    yield field, f'expected at most {val} items, got {got_num}'
+                return
+
+            assert op == 'count'
+            if got_num != expected_num:
+                yield field, f'expected exactly {val} items, got {got_num}'
+            return
+
+        if not isinstance(got, str):
+            yield field, f'expected {str.__name__}, got {type(got).__name__}'
+            return
+
+        if op == 're':
+            if not re.match(val, got):
+                yield field, f'should match {val!r}, got {got!r}'
+            return
+
+        if op == 'startswith':
+            if not got.startswith(val):
+                yield field, f'should start with {val!r}, got {got!r}'
+            return
+
+        if op == 'contains':
+            if val not in got:
+                yield field, f'should contain {val!r}, got {got!r}'
+            return
+
+        if op == 'md5':
+            hash_val = md5(got)
+            if hash_val != val:
+                yield field, f'expected hash {val}, got {hash_val}'
+            return
+
+        if got != expected:
+            yield field, f'expected {expected!r}, got {got!r}'
+        return
+
+    if isinstance(expected, dict) and isinstance(got, dict):
+        for key, expected_val in expected.items():
+            if key not in got:
+                yield field, f'missing key: {key!r}'
+                continue
+
+            field_name = key if field is None else f'{field}.{key}'
+            yield from _iter_differences(got[key], expected_val, field_name)
+        return
+
+    if isinstance(expected, type):
+        if not isinstance(got, expected):
+            yield field, f'expected {expected.__name__}, got {type(got).__name__}'
+        return
+
+    if isinstance(expected, list) and isinstance(got, list):
+        # TODO: clever diffing algorithm lmao
+        if len(expected) != len(got):
+            yield field, f'expected length of {len(expected)}, got {len(got)}'
+            return
+
+        for index, (got_val, expected_val) in enumerate(zip(got, expected)):
+            field_name = str(index) if field is None else f'{field}.{index}'
+            yield from _iter_differences(got_val, expected_val, field_name)
+        return
+
+    if got != expected:
+        yield field, f'expected {expected!r}, got {got!r}'
+
+
+def _expect_value(message, got, expected, field):
+    mismatches = list(_iter_differences(got, expected, field))
+    if not mismatches:
+        return
+
+    fields = [field for field, _ in mismatches if field is not None]
+    return ''.join((
+        message, f' ({", ".join(fields)})' if fields else '',
+        *(f'\n\t{field}: {message}' for field, message in mismatches)))
+
+
+def expect_value(self, got, expected, field):
+    if message := _expect_value('values differ', got, expected, field):
+        self.fail(message)


 def expect_dict(self, got_dict, expected_dict):
-    for info_field, expected in expected_dict.items():
-        got = got_dict.get(info_field)
-        expect_value(self, got, expected, info_field)
+    if message := _expect_value('dictionaries differ', got_dict, expected_dict, None):
+        self.fail(message)


 def sanitize_got_info_dict(got_dict):

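To illustrate the reworked helper with hypothetical values: rather than stopping at the first failing assertion, `_iter_differences` yields every mismatch, and the `expect_*` wrappers report them in a single failure message:

```python
# Hypothetical values; both mismatches are reported in one failure message
list(_iter_differences(
    {'title': 'abc', 'tags': ['a']},
    {'title': 're:^A', 'tags': 'mincount:2'},
    None,
))
# -> [('title', "should match '^A', got 'abc'"),
#     ('tags', 'expected at least 2 items, got 1')]
```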
View File

@@ -6,6 +6,8 @@ import sys
 import unittest
 from unittest.mock import patch

+from yt_dlp.globals import all_plugins_loaded
+
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

@@ -1427,6 +1429,12 @@ class TestYoutubeDL(unittest.TestCase):
         self.assertFalse(result.get('cookies'), msg='Cookies set in cookies field for wrong domain')
         self.assertFalse(ydl.cookiejar.get_cookie_header(fmt['url']), msg='Cookies set in cookiejar for wrong domain')

+    def test_load_plugins_compat(self):
+        # Should try to reload plugins if they haven't already been loaded
+        all_plugins_loaded.value = False
+        FakeYDL().close()
+        assert all_plugins_loaded.value
+

 if __name__ == '__main__':
     unittest.main()

View File

@@ -331,10 +331,6 @@ class TestHTTPConnectProxy:
         assert proxy_info['proxy'] == server_address
         assert 'Proxy-Authorization' in proxy_info['headers']

-    @pytest.mark.skip_handler(
-        'Requests',
-        'bug in urllib3 causes unclosed socket: https://github.com/urllib3/urllib3/issues/3374',
-    )
     def test_http_connect_bad_auth(self, handler, ctx):
         with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address:
             with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh:

View File

@@ -384,7 +384,7 @@ class TestJSInterpreter(unittest.TestCase):
     @unittest.skip('Not implemented')
     def test_packed(self):
         jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
-        self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 
9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
+        self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 
9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))  # noqa: SIM905

     def test_join(self):
         test_input = list('test')
@@ -462,6 +462,16 @@ class TestJSInterpreter(unittest.TestCase):
         ]:
             assert js_number_to_string(test, radix) == expected

+    def test_extract_function(self):
+        jsi = JSInterpreter('function a(b) { return b + 1; }')
+        func = jsi.extract_function('a')
+        self.assertEqual(func([2]), 3)
+
+    def test_extract_function_with_global_stack(self):
+        jsi = JSInterpreter('function c(d) { return d + e + f + g; }')
+        func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
+        self.assertEqual(func([1]), 1111)
+

 if __name__ == '__main__':
     unittest.main()

View File

@@ -720,6 +720,15 @@ class TestHTTPRequestHandler(TestRequestHandlerBase):
                 rh, Request(
                     f'http://127.0.0.1:{self.http_port}/headers', proxies={'all': 'http://10.255.255.255'})).close()

+    @pytest.mark.skip_handlers_if(lambda _, handler: handler not in ['Urllib', 'CurlCFFI'], 'handler does not support keep_header_casing')
+    def test_keep_header_casing(self, handler):
+        with handler() as rh:
+            res = validate_and_send(
+                rh, Request(
+                    f'http://127.0.0.1:{self.http_port}/headers', headers={'X-test-heaDer': 'test'}, extensions={'keep_header_casing': True})).read().decode()
+
+        assert 'X-test-heaDer: test' in res
+

 @pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
 class TestClientCertificate:
@@ -1289,6 +1298,7 @@ class TestRequestHandlerValidation:
             ({'legacy_ssl': False}, False),
             ({'legacy_ssl': True}, False),
             ({'legacy_ssl': 'notabool'}, AssertionError),
+            ({'keep_header_casing': True}, UnsupportedRequest),
         ]),
         ('Requests', 'http', [
             ({'cookiejar': 'notacookiejar'}, AssertionError),
@@ -1299,6 +1309,9 @@ class TestRequestHandlerValidation:
             ({'legacy_ssl': False}, False),
             ({'legacy_ssl': True}, False),
             ({'legacy_ssl': 'notabool'}, AssertionError),
+            ({'keep_header_casing': False}, False),
+            ({'keep_header_casing': True}, False),
+            ({'keep_header_casing': 'notabool'}, AssertionError),
         ]),
         ('CurlCFFI', 'http', [
             ({'cookiejar': 'notacookiejar'}, AssertionError),

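A minimal sketch of the new extension from the caller's side, mirroring the test above (the URL is a placeholder for any endpoint that echoes request headers):

```python
from yt_dlp.networking import Request

# Placeholder endpoint; any server that echoes request headers shows the effect
req = Request(
    'http://127.0.0.1:8080/headers',
    headers={'X-test-heaDer': 'test'},
    extensions={'keep_header_casing': True},  # send the header name byte-for-byte
)
```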
View File

@@ -10,22 +10,71 @@ TEST_DATA_DIR = Path(os.path.dirname(os.path.abspath(__file__)), 'testdata')
 sys.path.append(str(TEST_DATA_DIR))
 importlib.invalidate_caches()

-from yt_dlp.utils import Config
-from yt_dlp.plugins import PACKAGE_NAME, directories, load_plugins
+from yt_dlp.plugins import (
+    PACKAGE_NAME,
+    PluginSpec,
+    directories,
+    load_plugins,
+    load_all_plugins,
+    register_plugin_spec,
+)
+from yt_dlp.globals import (
+    extractors,
+    postprocessors,
+    plugin_dirs,
+    plugin_ies,
+    plugin_pps,
+    all_plugins_loaded,
+    plugin_specs,
+)
+
+
+EXTRACTOR_PLUGIN_SPEC = PluginSpec(
+    module_name='extractor',
+    suffix='IE',
+    destination=extractors,
+    plugin_destination=plugin_ies,
+)
+
+POSTPROCESSOR_PLUGIN_SPEC = PluginSpec(
+    module_name='postprocessor',
+    suffix='PP',
+    destination=postprocessors,
+    plugin_destination=plugin_pps,
+)
+
+
+def reset_plugins():
+    plugin_ies.value = {}
+    plugin_pps.value = {}
+    plugin_dirs.value = ['default']
+    plugin_specs.value = {}
+    all_plugins_loaded.value = False
+    # Clearing override plugins is probably difficult
+    for module_name in tuple(sys.modules):
+        for plugin_type in ('extractor', 'postprocessor'):
+            if module_name.startswith(f'{PACKAGE_NAME}.{plugin_type}.'):
+                del sys.modules[module_name]
+
+    importlib.invalidate_caches()


 class TestPlugins(unittest.TestCase):
     TEST_PLUGIN_DIR = TEST_DATA_DIR / PACKAGE_NAME

+    def setUp(self):
+        reset_plugins()
+
+    def tearDown(self):
+        reset_plugins()
+
     def test_directories_containing_plugins(self):
         self.assertIn(self.TEST_PLUGIN_DIR, map(Path, directories()))

     def test_extractor_classes(self):
-        for module_name in tuple(sys.modules):
-            if module_name.startswith(f'{PACKAGE_NAME}.extractor'):
-                del sys.modules[module_name]
-        plugins_ie = load_plugins('extractor', 'IE')
+        plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)

         self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
         self.assertIn('NormalPluginIE', plugins_ie.keys())
@@ -35,17 +84,29 @@ class TestPlugins(unittest.TestCase):
f'{PACKAGE_NAME}.extractor._ignore' in sys.modules, f'{PACKAGE_NAME}.extractor._ignore' in sys.modules,
'loaded module beginning with underscore') 'loaded module beginning with underscore')
self.assertNotIn('IgnorePluginIE', plugins_ie.keys()) self.assertNotIn('IgnorePluginIE', plugins_ie.keys())
self.assertNotIn('IgnorePluginIE', plugin_ies.value)
# Don't load extractors with underscore prefix # Don't load extractors with underscore prefix
self.assertNotIn('_IgnoreUnderscorePluginIE', plugins_ie.keys()) self.assertNotIn('_IgnoreUnderscorePluginIE', plugins_ie.keys())
self.assertNotIn('_IgnoreUnderscorePluginIE', plugin_ies.value)
# Don't load extractors not specified in __all__ (if supplied) # Don't load extractors not specified in __all__ (if supplied)
self.assertNotIn('IgnoreNotInAllPluginIE', plugins_ie.keys()) self.assertNotIn('IgnoreNotInAllPluginIE', plugins_ie.keys())
self.assertNotIn('IgnoreNotInAllPluginIE', plugin_ies.value)
self.assertIn('InAllPluginIE', plugins_ie.keys()) self.assertIn('InAllPluginIE', plugins_ie.keys())
self.assertIn('InAllPluginIE', plugin_ies.value)
# Don't load override extractors
self.assertNotIn('OverrideGenericIE', plugins_ie.keys())
self.assertNotIn('OverrideGenericIE', plugin_ies.value)
self.assertNotIn('_UnderscoreOverrideGenericIE', plugins_ie.keys())
self.assertNotIn('_UnderscoreOverrideGenericIE', plugin_ies.value)
def test_postprocessor_classes(self): def test_postprocessor_classes(self):
plugins_pp = load_plugins('postprocessor', 'PP') plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys()) self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
self.assertIn('NormalPluginPP', plugin_pps.value)
def test_importing_zipped_module(self): def test_importing_zipped_module(self):
zip_path = TEST_DATA_DIR / 'zipped_plugins.zip' zip_path = TEST_DATA_DIR / 'zipped_plugins.zip'
@@ -58,10 +119,10 @@ class TestPlugins(unittest.TestCase):
package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}') package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(zip_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__)) self.assertIn(zip_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE') plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginIE', plugins_ie.keys()) self.assertIn('ZippedPluginIE', plugins_ie.keys())
plugins_pp = load_plugins('postprocessor', 'PP') plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginPP', plugins_pp.keys()) self.assertIn('ZippedPluginPP', plugins_pp.keys())
finally: finally:
@@ -69,23 +130,116 @@ class TestPlugins(unittest.TestCase):
os.remove(zip_path) os.remove(zip_path)
importlib.invalidate_caches() # reset the import caches importlib.invalidate_caches() # reset the import caches
def test_plugin_dirs(self): def test_reloading_plugins(self):
# Internal plugin dirs hack for CLI --plugin-dirs reload_plugins_path = TEST_DATA_DIR / 'reload_plugins'
# To be replaced with proper system later load_plugins(EXTRACTOR_PLUGIN_SPEC)
custom_plugin_dir = TEST_DATA_DIR / 'plugin_packages' load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
Config._plugin_dirs = [str(custom_plugin_dir)]
importlib.invalidate_caches() # reset the import caches
# Remove default folder and add reload_plugin path
sys.path.remove(str(TEST_DATA_DIR))
sys.path.append(str(reload_plugins_path))
importlib.invalidate_caches()
try: try:
package = importlib.import_module(f'{PACKAGE_NAME}.extractor') for plugin_type in ('extractor', 'postprocessor'):
self.assertIn(custom_plugin_dir / 'testpackage' / PACKAGE_NAME / 'extractor', map(Path, package.__path__)) package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(reload_plugins_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE') plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('PackagePluginIE', plugins_ie.keys()) self.assertIn('NormalPluginIE', plugins_ie.keys())
self.assertTrue(
plugins_ie['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin')
self.assertTrue(
extractors.value['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin globally')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertTrue(plugins_pp['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin')
self.assertTrue(
postprocessors.value['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin globally')
finally: finally:
Config._plugin_dirs = [] sys.path.remove(str(reload_plugins_path))
importlib.invalidate_caches() # reset the import caches sys.path.append(str(TEST_DATA_DIR))
importlib.invalidate_caches()
def test_extractor_override_plugin(self):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.TEST_FIELD, 'override')
self.assertEqual(GenericIE.SECONDARY_TEST_FIELD, 'underscore-override')
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
importlib.invalidate_caches()
# test that loading a second time doesn't wrap a second time
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
def test_load_all_plugin_types(self):
# no plugin specs registered
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
load_all_plugins()
self.assertTrue(all_plugins_loaded.value)
self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_no_plugin_dirs(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
plugin_dirs.value = []
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_set_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
plugin_dirs.value = [custom_plugin_dir]
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_invalid_plugin_dir(self):
plugin_dirs.value = ['invalid_dir']
with self.assertRaises(ValueError):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
def test_append_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
self.assertEqual(plugin_dirs.value, ['default'])
plugin_dirs.value.append(custom_plugin_dir)
self.assertEqual(plugin_dirs.value, ['default', custom_plugin_dir])
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_get_plugin_spec(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('extractor'), EXTRACTOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('postprocessor'), POSTPROCESSOR_PLUGIN_SPEC)
self.assertIsNone(plugin_specs.value.get('invalid'))
if __name__ == '__main__': if __name__ == '__main__':
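
Taken together, these tests sketch the reworked plugin surface: a `PluginSpec` describes where a plugin type lives and where its classes land, `register_plugin_spec()` makes it known globally, and `load_plugins()`/`load_all_plugins()` populate the registries. A minimal sketch using only names from the diff above:

from yt_dlp.globals import extractors, plugin_dirs, plugin_ies
from yt_dlp.plugins import PluginSpec, load_plugins, register_plugin_spec

spec = PluginSpec(
    module_name='extractor',         # searches yt_dlp_plugins.extractor.*
    suffix='IE',                     # collect classes whose names end in IE
    destination=extractors,          # global name -> class registry
    plugin_destination=plugin_ies,   # plugin-only subset of that registry
)
register_plugin_spec(spec)

plugin_dirs.value = ['default']      # 'default' enables the standard search paths
loaded = load_plugins(spec)          # returns the name -> class mapping it loaded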

View File

@@ -3,19 +3,20 @@
 # Allow direct execution
 import os
 import sys
-import unittest
-import unittest.mock
-import warnings
-import datetime as dt
 
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
 import contextlib
+import datetime as dt
 import io
 import itertools
 import json
+import pickle
 import subprocess
+import unittest
+import unittest.mock
+import warnings
 import xml.etree.ElementTree
 
 from yt_dlp.compat import (
@@ -218,11 +219,8 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
         self.assertEqual(sanitize_filename('N0Y__7-UOdI', is_id=True), 'N0Y__7-UOdI')
 
+    @unittest.mock.patch('sys.platform', 'win32')
     def test_sanitize_path(self):
-        with unittest.mock.patch('sys.platform', 'win32'):
-            self._test_sanitize_path()
-
-    def _test_sanitize_path(self):
         self.assertEqual(sanitize_path('abc'), 'abc')
         self.assertEqual(sanitize_path('abc/def'), 'abc\\def')
         self.assertEqual(sanitize_path('abc\\def'), 'abc\\def')
@@ -253,10 +251,8 @@ class TestUtil(unittest.TestCase):
         # Check with nt._path_normpath if available
         try:
-            import nt
-
-            nt_path_normpath = getattr(nt, '_path_normpath', None)
-        except ImportError:
+            from nt import _path_normpath as nt_path_normpath
+        except Exception:
             nt_path_normpath = None
 
         for test, expected in [
@@ -2087,21 +2083,26 @@ Line 1
         headers = HTTPHeaderDict()
         headers['ytdl-test'] = b'0'
         self.assertEqual(list(headers.items()), [('Ytdl-Test', '0')])
+        self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '0')])
         headers['ytdl-test'] = 1
         self.assertEqual(list(headers.items()), [('Ytdl-Test', '1')])
+        self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '1')])
         headers['Ytdl-test'] = '2'
         self.assertEqual(list(headers.items()), [('Ytdl-Test', '2')])
+        self.assertEqual(list(headers.sensitive().items()), [('Ytdl-test', '2')])
         self.assertTrue('ytDl-Test' in headers)
         self.assertEqual(str(headers), str(dict(headers)))
         self.assertEqual(repr(headers), str(dict(headers)))
 
         headers.update({'X-dlp': 'data'})
         self.assertEqual(set(headers.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data')})
+        self.assertEqual(set(headers.sensitive().items()), {('Ytdl-test', '2'), ('X-dlp', 'data')})
         self.assertEqual(dict(headers), {'Ytdl-Test': '2', 'X-Dlp': 'data'})
         self.assertEqual(len(headers), 2)
         self.assertEqual(headers.copy(), headers)
-        headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, **headers, **{'X-dlp': 'data2'})
+        headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, headers, **{'X-dlP': 'data2'})
         self.assertEqual(set(headers2.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data2')})
+        self.assertEqual(set(headers2.sensitive().items()), {('Ytdl-test', '2'), ('X-dlP', 'data2')})
         self.assertEqual(len(headers2), 2)
         headers2.clear()
         self.assertEqual(len(headers2), 0)
@@ -2109,16 +2110,23 @@ Line 1
         # ensure we prefer latter headers
         headers3 = HTTPHeaderDict({'Ytdl-TeSt': 1}, {'Ytdl-test': 2})
         self.assertEqual(set(headers3.items()), {('Ytdl-Test', '2')})
+        self.assertEqual(set(headers3.sensitive().items()), {('Ytdl-test', '2')})
         del headers3['ytdl-tesT']
         self.assertEqual(dict(headers3), {})
 
         headers4 = HTTPHeaderDict({'ytdl-test': 'data;'})
         self.assertEqual(set(headers4.items()), {('Ytdl-Test', 'data;')})
+        self.assertEqual(set(headers4.sensitive().items()), {('ytdl-test', 'data;')})
 
         # common mistake: strip whitespace from values
         # https://github.com/yt-dlp/yt-dlp/issues/8729
         headers5 = HTTPHeaderDict({'ytdl-test': ' data; '})
         self.assertEqual(set(headers5.items()), {('Ytdl-Test', 'data;')})
+        self.assertEqual(set(headers5.sensitive().items()), {('ytdl-test', 'data;')})
+
+        # test if picklable
+        headers6 = HTTPHeaderDict(a=1, b=2)
+        self.assertEqual(pickle.loads(pickle.dumps(headers6)), headers6)
 
     def test_extract_basic_auth(self):
         assert extract_basic_auth('http://:foo.bar') == ('http://:foo.bar', None)
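
The new assertions pin down two behaviours: `HTTPHeaderDict.sensitive()` exposes the most recently supplied casing of each header name alongside the normalized view, and the mapping now survives pickling. In isolation:

import pickle

from yt_dlp.utils.networking import HTTPHeaderDict

headers = HTTPHeaderDict({'X-dlp': 'data'})
assert list(headers.items()) == [('X-Dlp', 'data')]              # normalized casing
assert list(headers.sensitive().items()) == [('X-dlp', 'data')]  # original casing

assert pickle.loads(pickle.dumps(headers)) == headers            # newly picklable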

View File

@@ -44,7 +44,7 @@ def websocket_handler(websocket):
             return websocket.send('2')
     elif isinstance(message, str):
         if message == 'headers':
-            return websocket.send(json.dumps(dict(websocket.request.headers)))
+            return websocket.send(json.dumps(dict(websocket.request.headers.raw_items())))
         elif message == 'path':
             return websocket.send(websocket.request.path)
         elif message == 'source_address':
@@ -266,18 +266,18 @@ class TestWebsSocketRequestHandlerConformance:
         with handler(cookiejar=cookiejar) as rh:
             ws = ws_validate_and_send(rh, Request(self.ws_base_url))
             ws.send('headers')
-            assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
+            assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
             ws.close()
 
         with handler() as rh:
             ws = ws_validate_and_send(rh, Request(self.ws_base_url))
             ws.send('headers')
-            assert 'cookie' not in json.loads(ws.recv())
+            assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
             ws.close()
 
             ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
             ws.send('headers')
-            assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
+            assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
             ws.close()
 
     @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@@ -287,7 +287,7 @@ class TestWebsSocketRequestHandlerConformance:
         ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
         ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()}))
         ws.send('headers')
-        assert 'cookie' not in json.loads(ws.recv())
+        assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
         ws.close()
 
     @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@@ -298,12 +298,12 @@ class TestWebsSocketRequestHandlerConformance:
         ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie'))
         ws = ws_validate_and_send(rh, Request(self.ws_base_url))
         ws.send('headers')
-        assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
+        assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
        ws.close()
 
         cookiejar.clear_session_cookies()
         ws = ws_validate_and_send(rh, Request(self.ws_base_url))
         ws.send('headers')
-        assert 'cookie' not in json.loads(ws.recv())
+        assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
         ws.close()
 
     def test_source_address(self, handler):
@@ -341,6 +341,14 @@ class TestWebsSocketRequestHandlerConformance:
             assert headers['test3'] == 'test3'
             ws.close()
 
+    def test_keep_header_casing(self, handler):
+        with handler(headers=HTTPHeaderDict({'x-TeSt1': 'test'})) as rh:
+            ws = ws_validate_and_send(rh, Request(self.ws_base_url, headers={'x-TeSt2': 'test'}, extensions={'keep_header_casing': True}))
+            ws.send('headers')
+            headers = json.loads(ws.recv())
+            assert 'x-TeSt1' in headers
+            assert 'x-TeSt2' in headers
+
     @pytest.mark.parametrize('client_cert', (
         {'client_certificate': os.path.join(MTLS_CERT_DIR, 'clientwithkey.crt')},
         {

View File

@@ -78,6 +78,11 @@ _SIG_TESTS = [
         '2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
         '0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
     ),
+    (
+        'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
+        '2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
+        'AAOAOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7vgpDL0QwbdV06sCIEzpWqMGkFR20CFOS21Tp-7vj_EMu-m37KtXJoOy1',
+    ),
 ]
 
 _NSIG_TESTS = [
@@ -205,6 +210,30 @@ _NSIG_TESTS = [
         'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js',
         'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg',
     ),
+    (
+        'https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js',
+        'Sy4aDGc0VpYRR9ew_', '5UPOT1VhoZxNLQ',
+    ),
+    (
+        'https://www.youtube.com/s/player/d50f54ef/player_ias_tce.vflset/en_US/base.js',
+        'Ha7507LzRmH3Utygtj', 'XFTb2HoeOE5MHg',
+    ),
+    (
+        'https://www.youtube.com/s/player/074a8365/player_ias_tce.vflset/en_US/base.js',
+        'Ha7507LzRmH3Utygtj', 'ufTsrE0IVYrkl8v',
+    ),
+    (
+        'https://www.youtube.com/s/player/643afba4/player_ias.vflset/en_US/base.js',
+        'N5uAlLqm0eg1GyHO', 'dCBQOejdq5s-ww',
+    ),
+    (
+        'https://www.youtube.com/s/player/69f581a5/tv-player-ias.vflset/tv-player-ias.js',
+        '-qIP447rVlTTwaZjY', 'KNcGOksBAvwqQg',
+    ),
+    (
+        'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
+        'ir9-V6cdbCiyKxhr', '2PL7ZDYAALMfmA',
+    ),
 ]
@@ -218,6 +247,8 @@ class TestPlayerInfo(unittest.TestCase):
             ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),
             ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-de_DE.vflset/base.js', '64dddad9'),
             ('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-tablet-en_US.vflset/base.js', '64dddad9'),
+            ('https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js', 'e7567ecf'),
+            ('https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js', '643afba4'),
             # obsolete
             ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
             ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
@@ -250,7 +281,7 @@ def t_factory(name, sig_func, url_pattern):
     def make_tfunc(url, sig_input, expected_sig):
         m = url_pattern.match(url)
         assert m, f'{url!r} should follow URL format'
-        test_id = m.group('id')
+        test_id = re.sub(r'[/.-]', '_', m.group('id') or m.group('compat_id'))
 
         def test_func(self):
             basename = f'player-{name}-{test_id}.js'
@@ -279,17 +310,22 @@ def n_sig(jscode, sig_input):
     ie = YoutubeIE(FakeYDL())
     funcname = ie._extract_n_function_name(jscode)
     jsi = JSInterpreter(jscode)
-    func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname)))
+    func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname), jscode))
     return func([sig_input])
 
 make_sig_test = t_factory(
-    'signature', signature, re.compile(r'.*(?:-|/player/)(?P<id>[a-zA-Z0-9_-]+)(?:/.+\.js|(?:/watch_as3|/html5player)?\.[a-z]+)$'))
+    'signature', signature,
+    re.compile(r'''(?x)
+        .+(?:
+            /player/(?P<id>[a-zA-Z0-9_/.-]+)|
+            /html5player-(?:en_US-)?(?P<compat_id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?
+        )\.js$'''))
 for test_spec in _SIG_TESTS:
     make_sig_test(*test_spec)
 
 make_nsig_test = t_factory(
-    'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_-]+)/.+.js$'))
+    'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_/.-]+)\.js$'))
for test_spec in _NSIG_TESTS:
     make_nsig_test(*test_spec)
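
With TV player URLs such as `.../643afba4/tv-player-ias.vflset/tv-player-ias.js` now matching, the captured `id` group can contain `/`, `.` and `-`, so the `re.sub` above flattens it into a safe test (and cached player file) name. For example:

import re

player_id = '643afba4/tv-player-ias.vflset/tv-player-ias'
test_id = re.sub(r'[/.-]', '_', player_id)
assert test_id == '643afba4_tv_player_ias_vflset_tv_player_ias'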

View File

@@ -2,4 +2,5 @@ from yt_dlp.extractor.common import InfoExtractor
 class PackagePluginIE(InfoExtractor):
+    _VALID_URL = 'package'
     pass

View File

@@ -0,0 +1,10 @@
+from yt_dlp.extractor.common import InfoExtractor
+
+
+class NormalPluginIE(InfoExtractor):
+    _VALID_URL = 'normal'
+    REPLACED = True
+
+
+class _IgnoreUnderscorePluginIE(InfoExtractor):
+    pass

View File

@@ -0,0 +1,5 @@
+from yt_dlp.postprocessor.common import PostProcessor
+
+
+class NormalPluginPP(PostProcessor):
+    REPLACED = True

View File

@@ -6,6 +6,7 @@ class IgnoreNotInAllPluginIE(InfoExtractor):
 class InAllPluginIE(InfoExtractor):
+    _VALID_URL = 'inallpluginie'
     pass

View File

@@ -2,8 +2,10 @@ from yt_dlp.extractor.common import InfoExtractor
 class NormalPluginIE(InfoExtractor):
-    pass
+    _VALID_URL = 'normalpluginie'
+    REPLACED = False
 
 
 class _IgnoreUnderscorePluginIE(InfoExtractor):
+    _VALID_URL = 'ignoreunderscorepluginie'
     pass

View File

@@ -0,0 +1,5 @@
+from yt_dlp.extractor.generic import GenericIE
+
+
+class OverrideGenericIE(GenericIE, plugin_name='override'):
+    TEST_FIELD = 'override'

View File

@@ -0,0 +1,5 @@
+from yt_dlp.extractor.generic import GenericIE
+
+
+class _UnderscoreOverrideGenericIE(GenericIE, plugin_name='underscore-override'):
+    SECONDARY_TEST_FIELD = 'underscore-override'

View File

@@ -2,4 +2,4 @@ from yt_dlp.postprocessor.common import PostProcessor
 class NormalPluginPP(PostProcessor):
-    pass
+    REPLACED = False

View File

@@ -2,4 +2,5 @@ from yt_dlp.extractor.common import InfoExtractor
 class ZippedPluginIE(InfoExtractor):
+    _VALID_URL = 'zippedpluginie'
     pass

View File

@@ -30,9 +30,18 @@ from .compat import urllib_req_to_req
 from .cookies import CookieLoadError, LenientSimpleCookie, load_cookies
 from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
 from .downloader.rtmp import rtmpdump_version
-from .extractor import gen_extractor_classes, get_info_extractor
+from .extractor import gen_extractor_classes, get_info_extractor, import_extractors
 from .extractor.common import UnsupportedURLIE
 from .extractor.openload import PhantomJSwrapper
+from .globals import (
+    IN_CLI,
+    LAZY_EXTRACTORS,
+    plugin_ies,
+    plugin_ies_overrides,
+    plugin_pps,
+    all_plugins_loaded,
+    plugin_dirs,
+)
 from .minicurses import format_text
 from .networking import HEADRequest, Request, RequestDirector
 from .networking.common import _REQUEST_HANDLERS, _RH_PREFERENCES
@@ -44,8 +53,7 @@ from .networking.exceptions import (
     network_exceptions,
 )
 from .networking.impersonate import ImpersonateRequestHandler
-from .plugins import directories as plugin_directories
-from .postprocessor import _PLUGIN_CLASSES as plugin_pps
+from .plugins import directories as plugin_directories, load_all_plugins
 from .postprocessor import (
     EmbedThumbnailPP,
     FFmpegFixupDuplicateMoovPP,
@@ -157,7 +165,7 @@ from .utils import (
     write_json_file,
     write_string,
 )
-from .utils._utils import _UnsafeExtensionError, _YDLLogger
+from .utils._utils import _UnsafeExtensionError, _YDLLogger, _ProgressState
 from .utils.networking import (
     HTTPHeaderDict,
     clean_headers,
@@ -642,20 +650,23 @@ class YoutubeDL:
         self.cache = Cache(self)
         self.__header_cookies = []
 
-        stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
-        self._out_files = Namespace(
-            out=stdout,
-            error=sys.stderr,
-            screen=sys.stderr if self.params.get('quiet') else stdout,
-            console=None if os.name == 'nt' else next(
-                filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
-        )
+        # compat for API: load plugins if they have not already
+        if not all_plugins_loaded.value:
+            load_all_plugins()
 
         try:
             windows_enable_vt_mode()
         except Exception as e:
             self.write_debug(f'Failed to enable VT mode: {e}')
 
+        stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
+        self._out_files = Namespace(
+            out=stdout,
+            error=sys.stderr,
+            screen=sys.stderr if self.params.get('quiet') else stdout,
+            console=next(filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
+        )
+
         if self.params.get('no_color'):
             if self.params.get('color') is not None:
                 self.params.setdefault('_warnings', []).append(
@@ -956,21 +967,22 @@ class YoutubeDL:
         self._write_string(f'{self._bidi_workaround(message)}\n', self._out_files.error, only_once=only_once)
 
     def _send_console_code(self, code):
-        if os.name == 'nt' or not self._out_files.console:
-            return
+        if not supports_terminal_sequences(self._out_files.console):
+            return False
         self._write_string(code, self._out_files.console)
+        return True
 
-    def to_console_title(self, message):
-        if not self.params.get('consoletitle', False):
+    def to_console_title(self, message=None, progress_state=None, percent=None):
+        if not self.params.get('consoletitle'):
             return
-        message = remove_terminal_sequences(message)
-        if os.name == 'nt':
-            if ctypes.windll.kernel32.GetConsoleWindow():
-                # c_wchar_p() might not be necessary if `message` is
-                # already of type unicode()
-                ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
-        else:
-            self._send_console_code(f'\033]0;{message}\007')
+
+        if message:
+            success = self._send_console_code(f'\033]0;{remove_terminal_sequences(message)}\007')
+            if not success and os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
+                ctypes.windll.kernel32.SetConsoleTitleW(message)
+
+        if isinstance(progress_state, _ProgressState):
+            self._send_console_code(progress_state.get_ansi_escape(percent))
 
     def save_console_title(self):
         if not self.params.get('consoletitle') or self.params.get('simulate'):
@@ -984,6 +996,7 @@ class YoutubeDL:
 
     def __enter__(self):
         self.save_console_title()
+        self.to_console_title(progress_state=_ProgressState.INDETERMINATE)
         return self
 
     def save_cookies(self):
@@ -992,6 +1005,7 @@ class YoutubeDL:
 
     def __exit__(self, *args):
         self.restore_console_title()
+        self.to_console_title(progress_state=_ProgressState.HIDDEN)
         self.close()
 
     def close(self):
@@ -3993,15 +4007,6 @@ class YoutubeDL:
         if not self.params.get('verbose'):
             return
 
-        from . import _IN_CLI  # Must be delayed import
-
-        # These imports can be slow. So import them only as needed
-        from .extractor.extractors import _LAZY_LOADER
-        from .extractor.extractors import (
-            _PLUGIN_CLASSES as plugin_ies,
-            _PLUGIN_OVERRIDES as plugin_ie_overrides,
-        )
-
         def get_encoding(stream):
             ret = str(getattr(stream, 'encoding', f'missing ({type(stream).__name__})'))
             additional_info = []
@@ -4040,17 +4045,18 @@ class YoutubeDL:
             _make_label(ORIGIN, CHANNEL.partition('@')[2] or __version__, __version__),
             f'[{RELEASE_GIT_HEAD[:9]}]' if RELEASE_GIT_HEAD else '',
             '' if source == 'unknown' else f'({source})',
-            '' if _IN_CLI else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
+            '' if IN_CLI.value else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
             delim=' '))
 
-        if not _IN_CLI:
+        if not IN_CLI.value:
             write_debug(f'params: {self.params}')
 
-        if not _LAZY_LOADER:
-            if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
-                write_debug('Lazy loading extractors is forcibly disabled')
-            else:
-                write_debug('Lazy loading extractors is disabled')
+        import_extractors()
+        lazy_extractors = LAZY_EXTRACTORS.value
+        if lazy_extractors is None:
+            write_debug('Lazy loading extractors is disabled')
+        elif not lazy_extractors:
+            write_debug('Lazy loading extractors is forcibly disabled')
 
         if self.params['compat_opts']:
            write_debug('Compatibility options: {}'.format(', '.join(self.params['compat_opts'])))
@@ -4079,24 +4085,27 @@ class YoutubeDL:
         write_debug(f'Proxy map: {self.proxies}')
         write_debug(f'Request Handlers: {", ".join(rh.RH_NAME for rh in self._request_director.handlers.values())}')
 
-        if os.environ.get('YTDLP_NO_PLUGINS'):
-            write_debug('Plugins are forcibly disabled')
-            return
-
-        for plugin_type, plugins in {'Extractor': plugin_ies, 'Post-Processor': plugin_pps}.items():
-            display_list = ['{}{}'.format(
-                klass.__name__, '' if klass.__name__ == name else f' as {name}')
-                for name, klass in plugins.items()]
+        for plugin_type, plugins in (('Extractor', plugin_ies), ('Post-Processor', plugin_pps)):
+            display_list = [
+                klass.__name__ if klass.__name__ == name else f'{klass.__name__} as {name}'
+                for name, klass in plugins.value.items()]
             if plugin_type == 'Extractor':
                 display_list.extend(f'{plugins[-1].IE_NAME.partition("+")[2]} ({parent.__name__})'
-                                    for parent, plugins in plugin_ie_overrides.items())
+                                    for parent, plugins in plugin_ies_overrides.value.items())
             if not display_list:
                 continue
             write_debug(f'{plugin_type} Plugins: {", ".join(sorted(display_list))}')
 
-        plugin_dirs = plugin_directories()
-        if plugin_dirs:
-            write_debug(f'Plugin directories: {plugin_dirs}')
+        plugin_dirs_msg = 'none'
+        if not plugin_dirs.value:
+            plugin_dirs_msg = 'none (disabled)'
+        else:
+            found_plugin_directories = plugin_directories()
+            if found_plugin_directories:
+                plugin_dirs_msg = ', '.join(found_plugin_directories)
+
+        write_debug(f'Plugin directories: {plugin_dirs_msg}')
 
     @functools.cached_property
     def proxies(self):
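
The reworked `to_console_title()` now prefers the portable OSC 0 escape and only falls back to `SetConsoleTitleW` on Windows consoles without escape support, while `_ProgressState` (internal, not reproduced here) emits a separate progress escape. A minimal sketch of the OSC 0 half only, independent of yt-dlp:

import sys

def set_terminal_title(title):
    # OSC 0 sets the window/icon title on ANSI-capable terminals;
    # this is the same '\033]0;...\007' escape used above.
    sys.stdout.write(f'\033]0;{title}\007')
    sys.stdout.flush()

set_terminal_title('yt-dlp 42.0% of ~120.00MiB')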

View File

@@ -19,7 +19,9 @@ from .downloader.external import get_external_downloader
 from .extractor import list_extractor_classes
 from .extractor.adobepass import MSO_INFO
 from .networking.impersonate import ImpersonateTarget
+from .globals import IN_CLI, plugin_dirs
 from .options import parseOpts
+from .plugins import load_all_plugins as _load_all_plugins
 from .postprocessor import (
     FFmpegExtractAudioPP,
     FFmpegMergerPP,
@@ -33,7 +35,6 @@ from .postprocessor import (
 )
 from .update import Updater
 from .utils import (
-    Config,
     NO_DEFAULT,
     POSTPROCESS_WHEN,
     DateRange,
@@ -66,8 +67,6 @@ from .utils.networking import std_headers
 from .utils._utils import _UnsafeExtensionError
 from .YoutubeDL import YoutubeDL
 
-_IN_CLI = False
-
 
 def _exit(status=0, *args):
     for msg in args:
@@ -433,6 +432,10 @@ def validate_options(opts):
     }
 
     # Other options
+    opts.plugin_dirs = opts.plugin_dirs
+    if opts.plugin_dirs is None:
+        opts.plugin_dirs = ['default']
+
     if opts.playlist_items is not None:
         try:
             tuple(PlaylistEntries.parse_playlist_items(opts.playlist_items))
@@ -973,11 +976,6 @@ def _real_main(argv=None):
 
     parser, opts, all_urls, ydl_opts = parse_options(argv)
 
-    # HACK: Set the plugin dirs early on
-    # TODO(coletdjnz): remove when plugin globals system is implemented
-    if opts.plugin_dirs is not None:
-        Config._plugin_dirs = list(map(expand_path, opts.plugin_dirs))
-
     # Dump user agent
     if opts.dump_user_agent:
         ua = traverse_obj(opts.headers, 'User-Agent', casesense=False, default=std_headers['User-Agent'])
@@ -992,6 +990,11 @@ def _real_main(argv=None):
     if opts.ffmpeg_location:
         FFmpegPostProcessor._ffmpeg_location.set(opts.ffmpeg_location)
 
+    # load all plugins into the global lookup
+    plugin_dirs.value = opts.plugin_dirs
+    if plugin_dirs.value:
+        _load_all_plugins()
+
     with YoutubeDL(ydl_opts) as ydl:
         pre_process = opts.update_self or opts.rm_cachedir
         actual_use = all_urls or opts.load_info_filename
@@ -1091,8 +1094,7 @@ def _real_main(argv=None):
 
 def main(argv=None):
-    global _IN_CLI
-    _IN_CLI = True
+    IN_CLI.value = True
     try:
         _exit(*variadic(_real_main(argv)))
     except (CookieLoadError, DownloadError):

View File

@@ -83,7 +83,7 @@ def aes_ecb_encrypt(data, key, iv=None):
     @returns {int[]}           encrypted data
     """
     expanded_key = key_expansion(key)
-    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
 
     encrypted_data = []
     for i in range(block_count):
@@ -103,7 +103,7 @@ def aes_ecb_decrypt(data, key, iv=None):
     @returns {int[]}           decrypted data
     """
     expanded_key = key_expansion(key)
-    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
 
     encrypted_data = []
     for i in range(block_count):
@@ -134,7 +134,7 @@ def aes_ctr_encrypt(data, key, iv):
     @returns {int[]}           encrypted data
     """
     expanded_key = key_expansion(key)
-    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
     counter = iter_vector(iv)
 
     encrypted_data = []
@@ -158,7 +158,7 @@ def aes_cbc_decrypt(data, key, iv):
     @returns {int[]}           decrypted data
     """
     expanded_key = key_expansion(key)
-    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
 
     decrypted_data = []
     previous_cipher_block = iv
@@ -183,7 +183,7 @@ def aes_cbc_encrypt(data, key, iv, *, padding_mode='pkcs7'):
     @returns {int[]}           encrypted data
     """
     expanded_key = key_expansion(key)
-    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
 
     encrypted_data = []
     previous_cipher_block = iv
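
Since `len(data)` and `BLOCK_SIZE_BYTES` are plain ints, `ceil()` on the true-division result already yields an int in Python 3, so the old `int(ceil(float(...)))` wrapping was redundant. The equivalence, including the pure-integer idiom:

from math import ceil

BLOCK_SIZE_BYTES = 16
n = 100  # arbitrary payload length
assert ceil(n / BLOCK_SIZE_BYTES) == (n + BLOCK_SIZE_BYTES - 1) // BLOCK_SIZE_BYTES == 7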

View File

@@ -35,6 +35,7 @@ from .rtmp import RtmpFD
 from .rtsp import RtspFD
 from .websocket import WebSocketFragmentFD
 from .youtube_live_chat import YoutubeLiveChatFD
+from .bunnycdn import BunnyCdnFD
 
 PROTOCOL_MAP = {
     'rtmp': RtmpFD,
@@ -55,6 +56,7 @@ PROTOCOL_MAP = {
     'websocket_frag': WebSocketFragmentFD,
     'youtube_live_chat': YoutubeLiveChatFD,
     'youtube_live_chat_replay': YoutubeLiveChatFD,
+    'bunnycdn': BunnyCdnFD,
 }

View File

@@ -0,0 +1,50 @@
+import hashlib
+import random
+import threading
+
+from .common import FileDownloader
+from . import HlsFD
+from ..networking import Request
+from ..networking.exceptions import network_exceptions
+
+
+class BunnyCdnFD(FileDownloader):
+    """
+    Downloads from BunnyCDN with required pings
+    Note, this is not a part of public API, and will be removed without notice.
+    DO NOT USE
+    """
+
+    def real_download(self, filename, info_dict):
+        self.to_screen(f'[{self.FD_NAME}] Downloading from BunnyCDN')
+
+        fd = HlsFD(self.ydl, self.params)
+
+        stop_event = threading.Event()
+        ping_thread = threading.Thread(target=self.ping_thread, args=(stop_event,), kwargs=info_dict['_bunnycdn_ping_data'])
+        ping_thread.start()
+
+        try:
+            return fd.real_download(filename, info_dict)
+        finally:
+            stop_event.set()
+
+    def ping_thread(self, stop_event, url, headers, secret, context_id):
+        # Site sends ping every 4 seconds, but this throttles the download. Pinging every 2 seconds seems to work.
+        ping_interval = 2
+        # Hard coded resolution as it doesn't seem to matter
+        res = 1080
+        paused = 'false'
+        current_time = 0
+
+        while not stop_event.wait(ping_interval):
+            current_time += ping_interval
+
+            time = current_time + round(random.random(), 6)
+            md5_hash = hashlib.md5(f'{secret}_{context_id}_{time}_{paused}_{res}'.encode()).hexdigest()
+            ping_url = f'{url}?hash={md5_hash}&time={time}&paused={paused}&resolution={res}'
+
+            try:
+                self.ydl.urlopen(Request(ping_url, headers=headers)).read()
+            except network_exceptions as e:
+                self.to_screen(f'[{self.FD_NAME}] Ping failed: {e}')
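
The ping protocol is fully visible in `ping_thread()`: an MD5 over the underscore-joined secret, context id, timestamp, paused flag and resolution, passed back as query parameters. As a standalone sketch (all values are placeholders):

import hashlib

def bunnycdn_ping_url(url, secret, context_id, time, paused='false', res=1080):
    # Same underscore-joined digest as ping_thread above
    md5_hash = hashlib.md5(f'{secret}_{context_id}_{time}_{paused}_{res}'.encode()).hexdigest()
    return f'{url}?hash={md5_hash}&time={time}&paused={paused}&resolution={res}'

print(bunnycdn_ping_url('https://example.com/.drm/ping', 's3cret', 'ctx-id', 2.123456))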

View File

@@ -31,6 +31,7 @@ from ..utils import (
     timetuple_from_msec,
     try_call,
 )
+from ..utils._utils import _ProgressState
 
 
 class FileDownloader:
@@ -333,7 +334,7 @@ class FileDownloader:
                 progress_dict), s.get('progress_idx') or 0)
             self.to_console_title(self.ydl.evaluate_outtmpl(
                 progress_template.get('download-title') or 'yt-dlp %(progress._default_template)s',
-                progress_dict))
+                progress_dict), _ProgressState.from_dict(s), s.get('_percent'))
 
     def _format_progress(self, *args, **kwargs):
         return self.ydl._format_text(
@@ -357,6 +358,7 @@ class FileDownloader:
                 '_speed_str': self.format_speed(speed).strip(),
                 '_total_bytes_str': _format_bytes('total_bytes'),
                 '_elapsed_str': self.format_seconds(s.get('elapsed')),
+                '_percent': 100.0,
                 '_percent_str': self.format_percent(100),
             })
             self._report_progress_status(s, join_nonempty(
@@ -375,13 +377,15 @@ class FileDownloader:
             return
         self._progress_delta_time += update_delta
 
+        progress = try_call(
+            lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
+            lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
+            lambda: s['downloaded_bytes'] == 0 and 0)
         s.update({
             '_eta_str': self.format_eta(s.get('eta')).strip(),
             '_speed_str': self.format_speed(s.get('speed')),
-            '_percent_str': self.format_percent(try_call(
-                lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
-                lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
-                lambda: s['downloaded_bytes'] == 0 and 0)),
+            '_percent': progress,
+            '_percent_str': self.format_percent(progress),
             '_total_bytes_str': _format_bytes('total_bytes'),
             '_total_bytes_estimate_str': _format_bytes('total_bytes_estimate'),
             '_downloaded_bytes_str': _format_bytes('downloaded_bytes'),
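
Hoisting the percentage into `progress` lets `_percent` (consumed by `_ProgressState.from_dict()` above) and `_percent_str` share one computation. `try_call` returns the first lambda that completes without a common error such as `KeyError` or `ZeroDivisionError`, so missing or zero totals fall through to the next estimate:

from yt_dlp.utils import try_call

s = {'downloaded_bytes': 512, 'total_bytes_estimate': 2048}  # no 'total_bytes'
progress = try_call(
    lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],           # KeyError -> skipped
    lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],  # used: 25.0
    lambda: s['downloaded_bytes'] == 0 and 0)
assert progress == 25.0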

View File

@@ -457,8 +457,6 @@ class FFmpegFD(ExternalFD):
     @classmethod
     def available(cls, path=None):
-        # TODO: Fix path for ffmpeg
-        # Fixme: This may be wrong when --ffmpeg-location is used
         return FFmpegPostProcessor().available
 
     def on_process_started(self, proc, stdin):

View File

@@ -1,16 +1,25 @@
 from ..compat.compat_utils import passthrough_module
+from ..globals import extractors as _extractors_context
+from ..globals import plugin_ies as _plugin_ies_context
+from ..plugins import PluginSpec, register_plugin_spec
 
 passthrough_module(__name__, '.extractors')
 del passthrough_module
 
+register_plugin_spec(PluginSpec(
+    module_name='extractor',
+    suffix='IE',
+    destination=_extractors_context,
+    plugin_destination=_plugin_ies_context,
+))
+
 
 def gen_extractor_classes():
     """ Return a list of supported extractors.
     The order does matter; the first extractor matched is the one handling the URL.
     """
-    from .extractors import _ALL_CLASSES
-
-    return _ALL_CLASSES
+    import_extractors()
+    return list(_extractors_context.value.values())
 
 
 def gen_extractors():
@@ -37,6 +46,9 @@ def list_extractors(age_limit=None):
 
 def get_info_extractor(ie_name):
     """Returns the info extractor class with the given ie_name"""
-    from . import extractors
+    import_extractors()
+    return _extractors_context.value[f'{ie_name}IE']
 
-    return getattr(extractors, f'{ie_name}IE')
+
+def import_extractors():
+    from . import extractors  # noqa: F401
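
The public helpers keep their signatures; they now consult the `extractors` global after `import_extractors()` instead of reading module attributes. Usage is unchanged:

from yt_dlp.extractor import gen_extractor_classes, get_info_extractor

YoutubeIE = get_info_extractor('Youtube')  # registry lookup of 'YoutubeIE'
print(len(gen_extractor_classes()))        # every registered extractor class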

View File

@@ -312,6 +312,7 @@ from .brilliantpala import (
 )
 from .bundesliga import BundesligaIE
 from .bundestag import BundestagIE
+from .bunnycdn import BunnyCdnIE
 from .businessinsider import BusinessInsiderIE
 from .buzzfeed import BuzzFeedIE
 from .byutv import BYUtvIE
@@ -335,6 +336,7 @@ from .canal1 import Canal1IE
 from .canalalpha import CanalAlphaIE
 from .canalc2 import Canalc2IE
 from .canalplus import CanalplusIE
+from .canalsurmas import CanalsurmasIE
 from .caracoltv import CaracolTvPlayIE
 from .cartoonnetwork import CartoonNetworkIE
 from .cbc import (
@@ -1053,6 +1055,7 @@ from .livestream import (
 )
 from .livestreamfails import LivestreamfailsIE
 from .lnk import LnkIE
+from .loco import LocoIE
 from .loom import (
     LoomFolderIE,
     LoomIE,
@@ -1881,6 +1884,8 @@ from .skyit import (
     SkyItVideoIE,
     SkyItVideoLiveIE,
     TV8ItIE,
+    TV8ItLiveIE,
+    TV8ItPlaylistIE,
 )
 from .skylinewebcams import SkylineWebcamsIE
 from .skynewsarabia import (
@@ -1894,6 +1899,7 @@ from .slutload import SlutloadIE
 from .smotrim import SmotrimIE
 from .snapchat import SnapchatSpotlightIE
 from .snotr import SnotrIE
+from .softwhiteunderbelly import SoftWhiteUnderbellyIE
 from .sohu import (
     SohuIE,
     SohuVIE,
@@ -2222,6 +2228,7 @@ from .tvplay import (
     TVPlayIE,
 )
 from .tvplayer import TVPlayerIE
+from .tvw import TvwIE
 from .tweakers import TweakersIE
 from .twentymin import TwentyMinutenIE
 from .twentythreevideo import TwentyThreeVideoIE
@@ -2396,7 +2403,6 @@ from .voxmedia import (
 from .vrt import (
     VRTIE,
     DagelijkseKostIE,
-    KetnetIE,
     Radio1BeIE,
     VrtNUIE,
 )

View File

@@ -1,3 +1,4 @@
+import datetime as dt
 import functools
 
 from .common import InfoExtractor
@@ -10,7 +11,7 @@ from ..utils import (
     filter_dict,
     int_or_none,
     orderedSet,
-    unified_timestamp,
+    parse_iso8601,
     url_or_none,
     urlencode_postdata,
     urljoin,
@@ -87,9 +88,9 @@ class AfreecaTVIE(AfreecaTVBaseIE):
             'uploader_id': 'rlantnghks',
             'uploader': '페이즈으',
             'duration': 10840,
-            'thumbnail': r're:https?://videoimg\.sooplive\.co\.kr/.+',
+            'thumbnail': r're:https?://videoimg\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
             'upload_date': '20230108',
-            'timestamp': 1673218805,
+            'timestamp': 1673186405,
             'title': '젠지 페이즈',
         },
         'params': {
@@ -102,7 +103,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
             'id': '20170411_BE689A0E_190960999_1_2_h',
             'ext': 'mp4',
             'title': '혼자사는여자집',
-            'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+',
+            'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
             'uploader': '♥이슬이',
             'uploader_id': 'dasl8121',
             'upload_date': '20170411',
@@ -119,7 +120,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
             'id': '20180327_27901457_202289533_1',
             'ext': 'mp4',
             'title': '[생]빨개요♥ (part 1)',
-            'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+',
+            'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
             'uploader': '[SA]서아',
             'uploader_id': 'bjdyrksu',
             'upload_date': '20180327',
@@ -187,7 +188,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
                 'formats': formats,
                 **traverse_obj(file_element, {
                     'duration': ('duration', {int_or_none(scale=1000)}),
-                    'timestamp': ('file_start', {unified_timestamp}),
+                    'timestamp': ('file_start', {parse_iso8601(delimiter=' ', timezone=dt.timedelta(hours=9))}),
                 }),
             })
@@ -370,7 +371,7 @@ class AfreecaTVLiveIE(AfreecaTVBaseIE):
             'title': channel_info.get('TITLE') or station_info.get('station_title'),
             'uploader': channel_info.get('BJNICK') or station_info.get('station_name'),
             'uploader_id': broadcaster_id,
-            'timestamp': unified_timestamp(station_info.get('broad_start')),
+            'timestamp': parse_iso8601(station_info.get('broad_start'), delimiter=' ', timezone=dt.timedelta(hours=9)),
             'formats': formats,
             'is_live': True,
             'http_headers': {'Referer': url},
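
`unified_timestamp` has to guess the timezone, but the SOOP API returns naive KST strings (e.g. a `file_start` like '2023-01-08 23:00:05', an assumption inferred from the updated test value); pinning `delimiter=' '` and `timezone=dt.timedelta(hours=9)` makes the conversion deterministic, which is why the expected `timestamp` above shifted by exactly nine hours (32400s):

import datetime as dt

from yt_dlp.utils import parse_iso8601

ts = parse_iso8601('2023-01-08 23:00:05', delimiter=' ', timezone=dt.timedelta(hours=9))
assert ts == 1673186405  # 2023-01-08T14:00:05Z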

View File

@@ -1,7 +1,6 @@
-import json
-
 from .common import InfoExtractor
 from .kaltura import KalturaIE
+from ..utils.traversal import require, traverse_obj
 
 
 class AZMedienIE(InfoExtractor):
@@ -9,15 +8,15 @@ class AZMedienIE(InfoExtractor):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:www\.|tv\.)?
-                        (?P<host>
+                        (?:
                             telezueri\.ch|
                             telebaern\.tv|
                             telem1\.ch|
                             tvo-online\.ch
                         )/
-                        [^/]+/
+                        [^/?#]+/
                         (?P<id>
-                            [^/]+-(?P<article_id>\d+)
+                            [^/?#]+-\d+
                         )
                         (?:
                             \#video=
@@ -47,19 +46,17 @@ class AZMedienIE(InfoExtractor):
         'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
         'only_matching': True,
     }]
-    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
     _PARTNER_ID = '1719221'
 
     def _real_extract(self, url):
-        host, display_id, article_id, entry_id = self._match_valid_url(url).groups()
+        display_id, entry_id = self._match_valid_url(url).groups()
 
         if not entry_id:
-            entry_id = self._download_json(
-                self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
-                    'variables': json.dumps({
-                        'contextId': 'NewsArticle:' + article_id,
-                    }),
-                })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
+            webpage = self._download_webpage(url, display_id)
+            data = self._search_json(
+                r'window\.__APOLLO_STATE__\s*=', webpage, 'video data', display_id)
+            entry_id = traverse_obj(data, (
+                lambda _, v: v['__typename'] == 'KalturaData', 'kalturaId', any, {require('kaltura id')}))
 
         return self.url_result(
             f'kaltura:{self._PARTNER_ID}:{entry_id}',

View File

@@ -86,7 +86,7 @@ class BandlabBaseIE(InfoExtractor):
                 'webpage_url': (
                     'id', ({value(url)}, {format_field(template='https://www.bandlab.com/post/%s')}), filter, any),
                 'url': ('video', 'url', {url_or_none}),
-                'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}),
+                'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
                 'description': ('caption', {str}),
                 'thumbnail': ('video', 'picture', 'url', {url_or_none}),
                 'view_count': ('video', 'counters', 'plays', {int_or_none}),
@@ -120,7 +120,7 @@ class BandlabIE(BandlabBaseIE):
             'duration': 54.629999999999995,
             'title': 'sweet black',
             'upload_date': '20231210',
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
             'genres': ['Lofi'],
             'uploader': 'ender milze',
             'comment_count': int,
@@ -142,7 +142,7 @@ class BandlabIE(BandlabBaseIE):
             'duration': 54.629999999999995,
             'title': 'sweet black',
             'upload_date': '20231210',
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
             'genres': ['Lofi'],
             'uploader': 'ender milze',
             'comment_count': int,
@@ -158,7 +158,7 @@ class BandlabIE(BandlabBaseIE):
             'comment_count': int,
             'genres': ['Other'],
             'uploader_id': 'user8353034818103753',
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/',
             'timestamp': 1709625771,
             'track': 'PodcastMaerchen4b',
             'duration': 468.14,
@@ -178,7 +178,7 @@ class BandlabIE(BandlabBaseIE):
             'id': '110343fc-148b-ea11-96d2-0003ffd1fc09',
             'ext': 'm4a',
             'timestamp': 1588273294,
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/',
             'description': 'Final Revision.',
             'title': 'Replay ( Instrumental)',
             'uploader': 'David R Sparks',
@@ -200,7 +200,7 @@ class BandlabIE(BandlabBaseIE):
             'id': '5cdf9036-3857-ef11-991a-6045bd36e0d9',
             'ext': 'mp4',
             'duration': 44.705,
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/',
             'comment_count': int,
             'title': 'backing vocals',
             'uploader_id': 'marliashya',
@@ -224,7 +224,7 @@ class BandlabIE(BandlabBaseIE):
             'view_count': int,
             'track': 'Positronic Meltdown',
             'duration': 318.55,
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/',
             'description': 'Checkout my tracks at AOMX http://aomxsounds.com/',
             'uploader_id': 'microfreaks',
             'title': 'Positronic Meltdown',
@@ -246,7 +246,7 @@ class BandlabIE(BandlabBaseIE):
             'comment_count': int,
             'uploader': 'Sorakime',
             'uploader_id': 'sorakime',
-            'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/',
+            'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/',
             'timestamp': 1691162128,
             'upload_date': '20230804',
             'media_type': 'track',


@@ -1596,16 +1596,16 @@ class BilibiliPlaylistIE(BilibiliSpaceListBaseIE):
         webpage = self._download_webpage(url, list_id)
         initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', list_id)
-        if traverse_obj(initial_state, ('error', 'code', {int_or_none})) != 200:
-            error_code = traverse_obj(initial_state, ('error', 'trueCode', {int_or_none}))
-            error_message = traverse_obj(initial_state, ('error', 'message', {str_or_none}))
+        error = traverse_obj(initial_state, (('error', 'listError'), all, lambda _, v: v['code'], any))
+        if error and error['code'] != 200:
+            error_code = error.get('trueCode')
             if error_code == -400 and list_id == 'watchlater':
                 self.raise_login_required('You need to login to access your watchlater playlist')
             elif error_code == -403:
                 self.raise_login_required('This is a private playlist. You need to login as its owner')
             elif error_code == 11010:
                 raise ExtractorError('Playlist is no longer available', expected=True)
-            raise ExtractorError(f'Could not access playlist: {error_code} {error_message}')
+            raise ExtractorError(f'Could not access playlist: {error_code} {error.get("message")}')
 
         query = {
             'ps': 20,
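A toy illustration of the branched error lookup above, with hypothetical state: both the 'error' and 'listError' keys are collected, and the first entry that actually carries a 'code' wins:

    initial_state = {'error': {}, 'listError': {'code': -403, 'message': 'private'}}
    error = traverse_obj(initial_state, (('error', 'listError'), all, lambda _, v: v['code'], any))
    assert error == {'code': -403, 'message': 'private'}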


@@ -53,7 +53,7 @@ class BlueskyIE(InfoExtractor):
             'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
             'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
             'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
-            'title': 'Bluesky now has video! Update your app to versi...',
+            'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
             'alt_title': 'Bluesky video feature announcement',
             'description': r're:(?s)Bluesky now has video! .{239}',
             'upload_date': '20240911',
@@ -172,7 +172,7 @@ class BlueskyIE(InfoExtractor):
             'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
             'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
             'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
-            'title': 'Bluesky now has video! Update your app to versi...',
+            'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
             'alt_title': 'Bluesky video feature announcement',
             'description': r're:(?s)Bluesky now has video! .{239}',
             'upload_date': '20240911',
@@ -191,7 +191,7 @@ class BlueskyIE(InfoExtractor):
         'info_dict': {
             'id': '3l7rdfxhyds2f',
             'ext': 'mp4',
-            'uploader': 'cinnamon',
+            'uploader': 'cinnamon 🐇 🏳️‍⚧️',
             'uploader_id': 'cinny.bun.how',
             'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
             'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
@@ -255,7 +255,7 @@ class BlueskyIE(InfoExtractor):
         'info_dict': {
             'id': '3l77u64l7le2e',
             'ext': 'mp4',
-            'title': 'hearing people on twitter say that bluesky isn\'...',
+            'title': "hearing people on twitter say that bluesky isn't funny yet so post t...",
             'like_count': int,
             'uploader_id': 'thafnine.net',
             'uploader_url': 'https://bsky.app/profile/thafnine.net',
@@ -387,7 +387,7 @@ class BlueskyIE(InfoExtractor):
                 'age_limit': (
                     'labels', ..., 'val', {lambda x: 18 if x in ('sexual', 'porn', 'graphic-media') else None}, any),
                 'description': (*record_path, 'text', {str}, filter),
-                'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}),
+                'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
             }),
         })
         return entries
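The title changes above come from widening truncate_string from 50 to 72 characters. Roughly, assuming the current helper semantics (the ellipsis counts toward the limit):

    from yt_dlp.utils import truncate_string
    text = 'Bluesky now has video! Update your app to version 1.91 or refresh on desktop to use it.'
    assert truncate_string(text, left=72) == f'{text[:69]}...'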


@@ -0,0 +1,178 @@
import json
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
extract_attributes,
int_or_none,
parse_qs,
smuggle_url,
unsmuggle_url,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import find_element, traverse_obj
class BunnyCdnIE(InfoExtractor):
_VALID_URL = r'https?://(?:iframe\.mediadelivery\.net|video\.bunnycdn\.com)/(?:embed|play)/(?P<library_id>\d+)/(?P<id>[\da-f-]+)'
_EMBED_REGEX = [rf'<iframe[^>]+src=[\'"](?P<url>{_VALID_URL}[^\'"]*)[\'"]']
_TESTS = [{
'url': 'https://iframe.mediadelivery.net/embed/113933/e73edec1-e381-4c8b-ae73-717a140e0924',
'info_dict': {
'id': 'e73edec1-e381-4c8b-ae73-717a140e0924',
'ext': 'mp4',
'title': 'mistress morgana (3).mp4',
'description': '',
'timestamp': 1693251673,
'thumbnail': r're:^https?://.*\.b-cdn\.net/e73edec1-e381-4c8b-ae73-717a140e0924/thumbnail\.jpg',
'duration': 7.0,
'upload_date': '20230828',
},
'params': {'skip_download': True},
}, {
'url': 'https://iframe.mediadelivery.net/play/136145/32e34c4b-0d72-437c-9abb-05e67657da34',
'info_dict': {
'id': '32e34c4b-0d72-437c-9abb-05e67657da34',
'ext': 'mp4',
'timestamp': 1691145748,
'thumbnail': r're:^https?://.*\.b-cdn\.net/32e34c4b-0d72-437c-9abb-05e67657da34/thumbnail_9172dc16\.jpg',
'duration': 106.0,
'description': 'md5:981a3e899a5c78352b21ed8b2f1efd81',
'upload_date': '20230804',
'title': 'Sanela ist Teil der #arbeitsmarktkraft',
},
'params': {'skip_download': True},
}, {
# Stream requires activation and pings
'url': 'https://iframe.mediadelivery.net/embed/200867/2e8545ec-509d-4571-b855-4cf0235ccd75',
'info_dict': {
'id': '2e8545ec-509d-4571-b855-4cf0235ccd75',
'ext': 'mp4',
'timestamp': 1708497752,
'title': 'netflix part 1',
'duration': 3959.0,
'description': '',
'upload_date': '20240221',
'thumbnail': r're:^https?://.*\.b-cdn\.net/2e8545ec-509d-4571-b855-4cf0235ccd75/thumbnail\.jpg',
},
'params': {'skip_download': True},
}]
_WEBPAGE_TESTS = [{
# Stream requires Referer
'url': 'https://conword.io/',
'info_dict': {
'id': '3a5d863e-9cd6-447e-b6ef-e289af50b349',
'ext': 'mp4',
'title': 'Conword bei der Stadt Köln und Stadt Dortmund',
'description': '',
'upload_date': '20231031',
'duration': 31.0,
'thumbnail': 'https://video.watchuh.com/3a5d863e-9cd6-447e-b6ef-e289af50b349/thumbnail.jpg',
'timestamp': 1698783879,
},
'params': {'skip_download': True},
}, {
# URL requires token and expires
'url': 'https://www.stockphotos.com/video/moscow-subway-the-train-is-arriving-at-the-park-kultury-station-10017830',
'info_dict': {
'id': '0b02fa20-4e8c-4140-8f87-f64d820a3386',
'ext': 'mp4',
'thumbnail': r're:^https?://.*\.b-cdn\.net/0b02fa20-4e8c-4140-8f87-f64d820a3386/thumbnail\.jpg',
'title': 'Moscow subway. The train is arriving at the Park Kultury station.',
'upload_date': '20240531',
'duration': 18.0,
'timestamp': 1717152269,
'description': '',
},
'params': {'skip_download': True},
}]
@classmethod
def _extract_embed_urls(cls, url, webpage):
for embed_url in super()._extract_embed_urls(url, webpage):
yield smuggle_url(embed_url, {'Referer': url})
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id, library_id = self._match_valid_url(url).group('id', 'library_id')
webpage = self._download_webpage(
f'https://iframe.mediadelivery.net/embed/{library_id}/{video_id}', video_id,
headers=traverse_obj(smuggled_data, {'Referer': 'Referer'}),
query=traverse_obj(parse_qs(url), {'token': 'token', 'expires': 'expires'}))
        if (html_title := self._html_extract_title(webpage, default=None)) == '403':
raise ExtractorError(
'This video is inaccessible. Setting a Referer header '
'might be required to access the video', expected=True)
elif html_title == '404':
raise ExtractorError('This video does not exist', expected=True)
headers = {'Referer': url}
info = traverse_obj(self._parse_html5_media_entries(url, webpage, video_id, _headers=headers), 0) or {}
formats = info.get('formats') or []
subtitles = info.get('subtitles') or {}
original_url = self._search_regex(
r'(?:var|const|let)\s+originalUrl\s*=\s*["\']([^"\']+)["\']', webpage, 'original url', default=None)
if url_or_none(original_url):
urlh = self._request_webpage(
HEADRequest(original_url), video_id=video_id, note='Checking original',
headers=headers, fatal=False, expected_status=(403, 404))
if urlh and urlh.status == 200:
formats.append({
'url': original_url,
'format_id': 'source',
'quality': 1,
'http_headers': headers,
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
})
# MediaCage Streams require activation and pings
src_url = self._search_regex(
r'\.setAttribute\([\'"]src[\'"],\s*[\'"]([^\'"]+)[\'"]\)', webpage, 'src url', default=None)
activation_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/activate)[\'"]', webpage, 'activation url', default=None)
ping_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/ping)[\'"]', webpage, 'ping url', default=None)
secret = traverse_obj(parse_qs(src_url), ('secret', 0))
context_id = traverse_obj(parse_qs(src_url), ('contextId', 0))
ping_data = {}
if src_url and activation_url and ping_url and secret and context_id:
self._download_webpage(
activation_url, video_id, headers=headers, note='Downloading activation data')
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, video_id, 'mp4', headers=headers, m3u8_id='hls', fatal=False)
for fmt in fmts:
fmt.update({
'protocol': 'bunnycdn',
'http_headers': headers,
})
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
ping_data = {
'_bunnycdn_ping_data': {
'url': ping_url,
'headers': headers,
'secret': secret,
'context_id': context_id,
},
}
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(webpage, ({find_element(id='main-video', html=True)}, {extract_attributes}, {
'title': ('data-plyr-config', {json.loads}, 'title', {str}),
'thumbnail': ('data-poster', {url_or_none}),
})),
**ping_data,
**self._search_json_ld(webpage, video_id, fatal=False),
}
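The Referer hand-off above relies on yt-dlp's URL smuggling: _extract_embed_urls stows the embedding page's URL in the iframe URL, and _real_extract recovers it. Sketch:

    from yt_dlp.utils import smuggle_url, unsmuggle_url
    embed = smuggle_url('https://iframe.mediadelivery.net/embed/1/abc', {'Referer': 'https://example.com/'})
    url, data = unsmuggle_url(embed, {})
    assert data == {'Referer': 'https://example.com/'}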


@@ -0,0 +1,84 @@
import json
import time
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
jwt_decode_hs256,
parse_iso8601,
url_or_none,
variadic,
)
from ..utils.traversal import traverse_obj
class CanalsurmasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canalsurmas\.es/videos/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.canalsurmas.es/videos/44006-el-gran-queo-1-lora-del-rio-sevilla-20072014',
'md5': '861f86fdc1221175e15523047d0087ef',
'info_dict': {
'id': '44006',
'ext': 'mp4',
'title': 'Lora del Río (Sevilla)',
'description': 'md5:3d9ee40a9b1b26ed8259e6b71ed27b8b',
'thumbnail': 'https://cdn2.rtva.interactvty.com/content_cards/00f3e8f67b0a4f3b90a4a14618a48b0d.jpg',
'timestamp': 1648123182,
'upload_date': '20220324',
},
}]
_API_BASE = 'https://api-rtva.interactvty.com'
_access_token = None
@staticmethod
def _is_jwt_expired(token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
def _call_api(self, endpoint, video_id, fields=None):
if not self._access_token or self._is_jwt_expired(self._access_token):
self._access_token = self._download_json(
f'{self._API_BASE}/jwt/token/', None,
'Downloading access token', 'Failed to download access token',
headers={'Content-Type': 'application/json'},
data=json.dumps({
'username': 'canalsur_demo',
'password': 'dsUBXUcI',
}).encode())['access']
return self._download_json(
f'{self._API_BASE}/api/2.0/contents/{endpoint}/{video_id}/', video_id,
f'Downloading {endpoint} API JSON', f'Failed to download {endpoint} API JSON',
headers={'Authorization': f'jwtok {self._access_token}'},
query={'optional_fields': ','.join(variadic(fields))} if fields else None)
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._call_api('content', video_id, fields=[
'description', 'image', 'duration', 'created_at', 'tags',
])
stream_info = self._call_api('content_resources', video_id, 'media_url')
formats, subtitles = [], {}
for stream_url in traverse_obj(stream_info, ('results', ..., 'media_url', {url_or_none})):
if determine_ext(stream_url) == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream_url, video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({'url': stream_url})
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_info, {
'title': ('name', {str.strip}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'duration': ('duration', {float_or_none}),
'timestamp': ('created_at', {parse_iso8601}),
'tags': ('tags', ..., {str}),
}),
}
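The expiry guard above never verifies the signature; jwt_decode_hs256 only base64-decodes the payload so 'exp' can be checked locally, with a 5-minute safety margin. An illustrative sketch (the fake token here is unsigned, which is fine for decoding only):

    import base64, json, time
    from yt_dlp.utils import jwt_decode_hs256

    def fake_jwt(payload):  # illustrative helper, not part of the extractor
        body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip('=')
        return f'e30.{body}.x'

    token = fake_jwt({'exp': time.time() + 600})
    assert jwt_decode_hs256(token)['exp'] - time.time() >= 300  # not treated as expired yet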


@@ -1,17 +1,17 @@
-import base64
 import functools
-import json
 import re
 import time
 import urllib.parse
 
 from .common import InfoExtractor
 from ..networking import HEADRequest
+from ..networking.exceptions import HTTPError
 from ..utils import (
     ExtractorError,
     float_or_none,
     int_or_none,
     js_to_json,
+    jwt_decode_hs256,
     mimetype2ext,
     orderedSet,
     parse_age_limit,
@@ -24,6 +24,7 @@ from ..utils import (
     update_url,
     url_basename,
     url_or_none,
+    urlencode_postdata,
 )
 from ..utils.traversal import require, traverse_obj, trim_str
@@ -608,66 +609,82 @@ class CBCGemIE(CBCGemBaseIE):
         'only_matching': True,
     }]
-    _TOKEN_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
+    _CLIENT_ID = 'fc05b0ee-3865-4400-a3cc-3da82c330c23'
+    _refresh_token = None
+    _access_token = None
     _claims_token = None
 
-    def _new_claims_token(self, email, password):
-        data = json.dumps({
-            'email': email,
-            'password': password,
-        }).encode()
-        headers = {'content-type': 'application/json'}
-        query = {'apikey': self._TOKEN_API_KEY}
-        resp = self._download_json('https://api.loginradius.com/identity/v2/auth/login',
-                                   None, data=data, headers=headers, query=query)
-        access_token = resp['access_token']
+    @functools.cached_property
+    def _ropc_settings(self):
+        return self._download_json(
+            'https://services.radio-canada.ca/ott/catalog/v1/gem/settings', None,
+            'Downloading site settings', query={'device': 'web'})['identityManagement']['ropc']
 
-        query = {
-            'access_token': access_token,
-            'apikey': self._TOKEN_API_KEY,
-            'jwtapp': 'jwt',
-        }
-        resp = self._download_json('https://cloud-api.loginradius.com/sso/jwt/api/token',
-                                   None, headers=headers, query=query)
-        sig = resp['signature']
+    def _is_jwt_expired(self, token):
+        return jwt_decode_hs256(token)['exp'] - time.time() < 300
 
-        data = json.dumps({'jwt': sig}).encode()
-        headers = {'content-type': 'application/json', 'ott-device-type': 'web'}
-        resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/token',
-                                   None, data=data, headers=headers, expected_status=426)
-        cbc_access_token = resp['accessToken']
+    def _call_oauth_api(self, oauth_data, note='Refreshing access token'):
+        response = self._download_json(
+            self._ropc_settings['url'], None, note, data=urlencode_postdata({
+                'client_id': self._CLIENT_ID,
+                **oauth_data,
+                'scope': self._ropc_settings['scopes'],
+            }))
+        self._refresh_token = response['refresh_token']
+        self._access_token = response['access_token']
+        self.cache.store(self._NETRC_MACHINE, 'token_data', [self._refresh_token, self._access_token])
 
-        headers = {'content-type': 'application/json', 'ott-device-type': 'web', 'ott-access-token': cbc_access_token}
-        resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/profile',
-                                   None, headers=headers, expected_status=426)
-        return resp['claimsToken']
+    def _perform_login(self, username, password):
+        if not self._refresh_token:
+            self._refresh_token, self._access_token = self.cache.load(
+                self._NETRC_MACHINE, 'token_data', default=[None, None])
 
-    def _get_claims_token_expiry(self):
-        # Token is a JWT
-        # JWT is decoded here and 'exp' field is extracted
-        # It is a Unix timestamp for when the token expires
-        b64_data = self._claims_token.split('.')[1]
-        data = base64.urlsafe_b64decode(b64_data + '==')
-        return json.loads(data)['exp']
+        if self._refresh_token and self._access_token:
+            self.write_debug('Using cached refresh token')
+            if not self._claims_token:
+                self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
+            return
 
-    def claims_token_expired(self):
-        exp = self._get_claims_token_expiry()
-        # It will expire in less than 10 seconds, or has already expired
-        return exp - time.time() < 10
+        try:
+            self._call_oauth_api({
+                'grant_type': 'password',
+                'username': username,
+                'password': password,
+            }, note='Logging in')
+        except ExtractorError as e:
+            if isinstance(e.cause, HTTPError) and e.cause.status == 400:
+                raise ExtractorError('Invalid username and/or password', expected=True)
+            raise
 
-    def claims_token_valid(self):
-        return self._claims_token is not None and not self.claims_token_expired()
+    def _fetch_access_token(self):
+        if self._is_jwt_expired(self._access_token):
+            try:
+                self._call_oauth_api({
+                    'grant_type': 'refresh_token',
+                    'refresh_token': self._refresh_token,
+                })
+            except ExtractorError:
+                self._refresh_token, self._access_token = None, None
+                self.cache.store(self._NETRC_MACHINE, 'token_data', [None, None])
+                self.report_warning('Refresh token has been invalidated; retrying with credentials')
+                self._perform_login(*self._get_login_info())
+        return self._access_token
 
-    def _get_claims_token(self, email, password):
-        if not self.claims_token_valid():
-            self._claims_token = self._new_claims_token(email, password)
-            self.cache.store(self._NETRC_MACHINE, 'claims_token', self._claims_token)
-        return self._claims_token
-
-    def _real_initialize(self):
-        if self.claims_token_valid():
-            return
-        self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
+    def _fetch_claims_token(self):
+        if not self._get_login_info()[0]:
+            return None
+        if not self._claims_token or self._is_jwt_expired(self._claims_token):
+            self._claims_token = self._download_json(
+                'https://services.radio-canada.ca/ott/subscription/v2/gem/Subscriber/profile',
+                None, 'Downloading claims token', query={'device': 'web'},
+                headers={'Authorization': f'Bearer {self._fetch_access_token()}'})['claimsToken']
+            self.cache.store(self._NETRC_MACHINE, 'claims_token', self._claims_token)
+        else:
+            self.write_debug('Using cached claims token')
+        return self._claims_token
 
     def _real_extract(self, url):
         video_id, season_number = self._match_valid_url(url).group('id', 'season')
@@ -675,14 +692,10 @@ class CBCGemIE(CBCGemBaseIE):
         item_info = traverse_obj(video_info, (
             'content', ..., 'lineups', ..., 'items',
             lambda _, v: v['url'] == video_id, any, {require('item info')}))
-        media_id = item_info['idMedia']
 
-        email, password = self._get_login_info()
-        if email and password:
-            claims_token = self._get_claims_token(email, password)
-            headers = {'x-claims-token': claims_token}
-        else:
-            headers = {}
+        headers = {}
+        if claims_token := self._fetch_claims_token():
+            headers['x-claims-token'] = claims_token
 
         m3u8_info = self._download_json(
             'https://services.radio-canada.ca/media/validation/v2/',
@@ -695,7 +708,7 @@ class CBCGemIE(CBCGemBaseIE):
             'tech': 'hls',
             'manifestVersion': '2',
             'manifestType': 'desktop',
-            'idMedia': media_id,
+            'idMedia': item_info['idMedia'],
         })
 
         if m3u8_info.get('errorCode') == 1:
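The rework swaps the LoginRadius hand-off for a standard OAuth 2.0 resource-owner-password-credentials (ROPC) exchange against an endpoint advertised in the site's settings JSON. The two grants used above have the usual shape; an illustrative request body (placeholder values):

    login_body = {
        'client_id': '<client id>', 'scope': '<scopes from settings>',
        'grant_type': 'password', 'username': 'user@example.com', 'password': 'hunter2',
    }
    refresh_body = {
        'client_id': '<client id>', 'scope': '<scopes from settings>',
        'grant_type': 'refresh_token', 'refresh_token': '<cached token>',
    }
    # both responses contain 'access_token' and 'refresh_token'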


@@ -121,10 +121,7 @@ class CDAIE(InfoExtractor):
         }, **kwargs)
 
     def _perform_login(self, username, password):
-        app_version = random.choice((
-            '1.2.88 build 15306',
-            '1.2.174 build 18469',
-        ))
+        app_version = '1.2.255 build 21541'
         android_version = random.randrange(8, 14)
         phone_model = random.choice((
             # x-kom.pl top selling Android smartphones, as of 2022-12-26
@@ -190,7 +187,7 @@ class CDAIE(InfoExtractor):
         meta = self._download_json(
             f'{self._BASE_API_URL}/video/{video_id}', video_id, headers=self._API_HEADERS)['video']
 
-        uploader = traverse_obj(meta, 'author', 'login')
+        uploader = traverse_obj(meta, ('author', 'login', {str}))
 
         formats = [{
             'url': quality['file'],
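Why the uploader lookup changed (toy data): with two positional arguments, traverse_obj treats 'author' and 'login' as alternative top-level keys and returns the whole author dict; a single tuple is a nested path:

    meta = {'author': {'login': 'someuser'}}
    assert traverse_obj(meta, 'author', 'login') == {'login': 'someuser'}  # old behaviour
    assert traverse_obj(meta, ('author', 'login', {str})) == 'someuser'    # new path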


@@ -29,6 +29,7 @@ from ..compat import (
 from ..cookies import LenientSimpleCookie
 from ..downloader.f4m import get_base_url, remove_encrypted_media
 from ..downloader.hls import HlsFD
+from ..globals import plugin_ies_overrides
 from ..networking import HEADRequest, Request
 from ..networking.exceptions import (
     HTTPError,
@@ -2934,8 +2935,7 @@ class InfoExtractor:
             segment_duration = None
             if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
                 segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
-                representation_ms_info['total_number'] = int(math.ceil(
-                    float_or_none(period_duration, segment_duration, default=0)))
+                representation_ms_info['total_number'] = math.ceil(float_or_none(period_duration, segment_duration, default=0))
             representation_ms_info['fragments'] = [{
                 media_location_key: media_template % {
                     'Number': segment_number,
@@ -3954,14 +3954,18 @@ class InfoExtractor:
     def __init_subclass__(cls, *, plugin_name=None, **kwargs):
         if plugin_name:
             mro = inspect.getmro(cls)
-            super_class = cls.__wrapped__ = mro[mro.index(cls) + 1]
-            cls.PLUGIN_NAME, cls.ie_key = plugin_name, super_class.ie_key
-            cls.IE_NAME = f'{super_class.IE_NAME}+{plugin_name}'
+            next_mro_class = super_class = mro[mro.index(cls) + 1]
+
             while getattr(super_class, '__wrapped__', None):
                 super_class = super_class.__wrapped__
-            setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
-            _PLUGIN_OVERRIDES[super_class].append(cls)
+
+            if not any(override.PLUGIN_NAME == plugin_name for override in plugin_ies_overrides.value[super_class]):
+                cls.__wrapped__ = next_mro_class
+                cls.PLUGIN_NAME, cls.ie_key = plugin_name, next_mro_class.ie_key
+                cls.IE_NAME = f'{next_mro_class.IE_NAME}+{plugin_name}'
+
+                setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
+                plugin_ies_overrides.value[super_class].append(cls)
 
         return super().__init_subclass__(**kwargs)
@@ -4017,6 +4021,3 @@ class UnsupportedURLIE(InfoExtractor):
     def _real_extract(self, url):
         raise UnsupportedError(url)
-
-
-_PLUGIN_OVERRIDES = collections.defaultdict(list)
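For context, a plugin override is declared by subclassing an extractor with plugin_name; with the guard above, importing the same plugin twice no longer stacks a second wrapper. A minimal sketch with hypothetical names:

    class MySiteIE_Plugin(MySiteIE, plugin_name='example'):  # hypothetical plugin class
        def _real_extract(self, url):
            self.to_screen('plugin override active')
            return super()._real_extract(url)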


@@ -3,7 +3,7 @@ from ..utils import int_or_none
 
 class CultureUnpluggedIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
+    _VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/(?:documentary/watch-online/)?play/(?P<id>\d+)(?:/(?P<display_id>[^/#?]+))?'
     _TESTS = [{
         'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
         'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
@@ -12,12 +12,25 @@ class CultureUnpluggedIE(InfoExtractor):
             'display_id': 'The-Next--Best-West',
             'ext': 'mp4',
             'title': 'The Next, Best West',
-            'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
+            'description': 'md5:770033a3b7c2946a3bcfb7f1c6fb7045',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'creator': 'Coldstream Creative',
+            'creators': ['Coldstream Creative'],
             'duration': 2203,
             'view_count': int,
         },
+    }, {
+        'url': 'https://www.cultureunplugged.com/play/2833/Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
+        'md5': 'dc2014bc470dfccba389a1c934fa29fa',
+        'info_dict': {
+            'id': '2833',
+            'display_id': 'Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
+            'ext': 'mp4',
+            'title': 'Koi Sunta Hai: Journeys with Kumar & Kabir (Someone is Listening)',
+            'description': 'md5:fa94ac934927c98660362b8285b2cda5',
+            'view_count': int,
+            'thumbnail': 'https://s3.amazonaws.com/cdn.cultureunplugged.com/thumbnails_16_9/lg/2833.jpg',
+            'creators': ['Srishti'],
+        },
     }, {
         'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
         'only_matching': True,


@@ -100,7 +100,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
 
 class DailymotionIE(DailymotionBaseInfoExtractor):
     _VALID_URL = r'''(?ix)
-                https?://
+                (?:https?:)?//
                     (?:
                         dai\.ly/|
                         (?:
@@ -116,7 +116,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                     (?P<id>[^/?_&#]+)(?:[\w-]*\?playlist=(?P<playlist_id>x[0-9a-z]+))?
                 '''
     IE_NAME = 'dailymotion'
-    _EMBED_REGEX = [r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1']
+    _EMBED_REGEX = [rf'(?ix)<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)["\'](?P<url>{_VALID_URL[5:]})']
     _TESTS = [{
         'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
         'md5': '074b95bdee76b9e3654137aee9c79dfe',
@@ -308,6 +308,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
             'description': 'Que lindura',
             'tags': [],
         },
+    }, {
+        # //geo.dailymotion.com/player/xysxq.html?video=k2Y4Mjp7krAF9iCuINM
+        'url': 'https://lcp.fr/programmes/avant-la-catastrophe-la-naissance-de-la-dictature-nazie-1933-1936-346819',
+        'info_dict': {
+            'id': 'k2Y4Mjp7krAF9iCuINM',
+            'ext': 'mp4',
+            'title': 'Avant la catastrophe la naissance de la dictature nazie 1933 -1936',
+            'description': 'md5:7b620d5e26edbe45f27bbddc1c0257c1',
+            'uploader': 'LCP Assemblée nationale',
+            'uploader_id': 'xbz33d',
+            'view_count': int,
+            'like_count': int,
+            'age_limit': 0,
+            'duration': 3220,
+            'thumbnail': 'https://s1.dmcdn.net/v/Xvumk1djJBUZfjj2a/x1080',
+            'tags': [],
+            'timestamp': 1739919947,
+            'upload_date': '20250218',
+        },
     }]
     _GEO_BYPASS = False
     _COMMON_MEDIA_FIELDS = '''description
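The new _EMBED_REGEX reuses _VALID_URL: the slice _VALID_URL[5:] drops the leading '(?ix)' inline-flag prefix (exactly 5 characters) so the flags can be declared once at the start of the combined embed pattern. A sketch of the invariant, assuming the pattern is interpolated verbatim:

    from yt_dlp.extractor.dailymotion import DailymotionIE
    assert DailymotionIE._VALID_URL.startswith('(?ix)')
    assert DailymotionIE._VALID_URL[5:] in DailymotionIE._EMBED_REGEX[0]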


@@ -1,28 +1,37 @@
-import contextlib
+import inspect
 import os
 
-from ..plugins import load_plugins
+from ..globals import LAZY_EXTRACTORS
+from ..globals import extractors as _extractors_context
 
-# NB: Must be before other imports so that plugins can be correctly injected
-_PLUGIN_CLASSES = load_plugins('extractor', 'IE')
-
-_LAZY_LOADER = False
-if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
-    with contextlib.suppress(ImportError):
-        from .lazy_extractors import *  # noqa: F403
-        from .lazy_extractors import _ALL_CLASSES
-        _LAZY_LOADER = True
+_CLASS_LOOKUP = None
+if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
+    LAZY_EXTRACTORS.value = False
+else:
+    try:
+        from .lazy_extractors import _CLASS_LOOKUP
+        LAZY_EXTRACTORS.value = True
+    except ImportError:
+        LAZY_EXTRACTORS.value = None
 
-if not _LAZY_LOADER:
-    from ._extractors import *  # noqa: F403
-    _ALL_CLASSES = [  # noqa: F811
-        klass
-        for name, klass in globals().items()
+if not _CLASS_LOOKUP:
+    from . import _extractors
+
+    _CLASS_LOOKUP = {
+        name: value
+        for name, value in inspect.getmembers(_extractors)
         if name.endswith('IE') and name != 'GenericIE'
-    ]
-    _ALL_CLASSES.append(GenericIE)  # noqa: F405
+    }
+    _CLASS_LOOKUP['GenericIE'] = _extractors.GenericIE
 
-globals().update(_PLUGIN_CLASSES)
-_ALL_CLASSES[:0] = _PLUGIN_CLASSES.values()
+# We want to append to the main lookup
+_current = _extractors_context.value
+for name, ie in _CLASS_LOOKUP.items():
+    _current.setdefault(name, ie)
 
-from .common import _PLUGIN_OVERRIDES  # noqa: F401
+
+def __getattr__(name):
+    value = _CLASS_LOOKUP.get(name)
+    if not value:
+        raise AttributeError(f'module {__name__} has no attribute {name}')
+    return value
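With the module-level __getattr__ (PEP 562), attribute access on yt_dlp.extractor is served from the single _CLASS_LOOKUP table instead of star-imports, so, for example:

    from yt_dlp import extractor
    ie = extractor.GenericIE  # resolved via __getattr__ -> _CLASS_LOOKUP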


@@ -1,19 +0,0 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
)
class GigyaBaseIE(InfoExtractor):
def _gigya_login(self, auth_data):
auth_info = self._download_json(
'https://accounts.eu1.gigya.com/accounts.login', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(auth_data))
error_message = auth_info.get('errorDetails') or auth_info.get('errorMessage')
if error_message:
raise ExtractorError(
f'Unable to login: {error_message}', expected=True)
return auth_info


@@ -69,8 +69,13 @@ class GloboIE(InfoExtractor):
         'info_dict': {
             'id': '8013907',
             'ext': 'mp4',
-            'title': 'Capítulo de 14081989',
+            'title': 'Capítulo de 14/08/1989',
+            'episode': 'Episode 1',
             'episode_number': 1,
+            'uploader': 'Tieta',
+            'uploader_id': '11895',
+            'duration': 2858.389,
+            'subtitles': 'count:1',
         },
         'params': {
             'skip_download': True,
@@ -82,7 +87,12 @@ class GloboIE(InfoExtractor):
             'id': '12824146',
             'ext': 'mp4',
             'title': 'Acordo de damas',
+            'episode': 'Episode 1',
             'episode_number': 1,
+            'uploader': 'Rensga Hits!',
+            'uploader_id': '20481',
+            'duration': 1953.994,
+            'season': 'Season 2',
             'season_number': 2,
         },
         'params': {
@@ -136,9 +146,10 @@ class GloboIE(InfoExtractor):
         else:
             formats, subtitles = self._extract_m3u8_formats_and_subtitles(
                 main_source['url'], video_id, 'mp4', m3u8_id='hls')
-        self._merge_subtitles(traverse_obj(main_source, ('text', ..., {
-            'url': ('subtitle', 'srt', 'url', {url_or_none}),
-        }, all, {subs_list_to_dict(lang='en')})), target=subtitles)
+
+        self._merge_subtitles(traverse_obj(main_source, ('text', ..., ('caption', 'subtitle'), {
+            'url': ('srt', 'url', {url_or_none}),
+        }, all, {subs_list_to_dict(lang='pt-BR')})), target=subtitles)
 
         return {
             'id': video_id,


@@ -2,12 +2,12 @@ import hashlib
 import itertools
 import json
 import re
-import time
 
 from .common import InfoExtractor
 from ..networking.exceptions import HTTPError
 from ..utils import (
     ExtractorError,
+    bug_reports_message,
     decode_base_n,
     encode_base_n,
     filter_dict,
@@ -15,12 +15,12 @@ from ..utils import (
     format_field,
     get_element_by_attribute,
     int_or_none,
+    join_nonempty,
     lowercase_escape,
     str_or_none,
     str_to_int,
     traverse_obj,
     url_or_none,
-    urlencode_postdata,
 )
 
 _ENCODING_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
@@ -28,63 +28,30 @@ _ENCODING_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz012345678
 
 def _pk_to_id(media_id):
     """Source: https://stackoverflow.com/questions/24437823/getting-instagram-post-url-from-media-id"""
-    return encode_base_n(int(media_id.split('_')[0]), table=_ENCODING_CHARS)
+    pk = int(str(media_id).split('_')[0])
+    return encode_base_n(pk, table=_ENCODING_CHARS)
 
 
 def _id_to_pk(shortcode):
-    """Covert a shortcode to a numeric value"""
-    return decode_base_n(shortcode[:11], table=_ENCODING_CHARS)
+    """Convert a shortcode to a numeric value"""
+    if len(shortcode) > 28:
+        shortcode = shortcode[:-28]
+    return decode_base_n(shortcode, table=_ENCODING_CHARS)
 
 
 class InstagramBaseIE(InfoExtractor):
-    _NETRC_MACHINE = 'instagram'
-    _IS_LOGGED_IN = False
-
     _API_BASE_URL = 'https://i.instagram.com/api/v1'
     _LOGIN_URL = 'https://www.instagram.com/accounts/login'
-    _API_HEADERS = {
-        'X-IG-App-ID': '936619743392459',
-        'X-ASBD-ID': '198387',
-        'X-IG-WWW-Claim': '0',
-        'Origin': 'https://www.instagram.com',
-        'Accept': '*/*',
-    }
 
-    def _perform_login(self, username, password):
-        if self._IS_LOGGED_IN:
-            return
-
-        login_webpage = self._download_webpage(
-            self._LOGIN_URL, None, note='Downloading login webpage', errnote='Failed to download login webpage')
-
-        shared_data = self._parse_json(self._search_regex(
-            r'window\._sharedData\s*=\s*({.+?});', login_webpage, 'shared data', default='{}'), None)
-
-        login = self._download_json(
-            f'{self._LOGIN_URL}/ajax/', None, note='Logging in', headers={
-                **self._API_HEADERS,
-                'X-Requested-With': 'XMLHttpRequest',
-                'X-CSRFToken': shared_data['config']['csrf_token'],
-                'X-Instagram-AJAX': shared_data['rollout_hash'],
-                'Referer': 'https://www.instagram.com/',
-            }, data=urlencode_postdata({
-                'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{int(time.time())}:{password}',
-                'username': username,
-                'queryParams': '{}',
-                'optIntoOneTap': 'false',
-                'stopDeletionNonce': '',
-                'trustedDeviceRecords': '{}',
-            }))
-
-        if not login.get('authenticated'):
-            if login.get('message'):
-                raise ExtractorError(f'Unable to login: {login["message"]}')
-            elif login.get('user'):
-                raise ExtractorError('Unable to login: Sorry, your password was incorrect. Please double-check your password.', expected=True)
-            elif login.get('user') is False:
-                raise ExtractorError('Unable to login: The username you entered doesn\'t belong to an account. Please check your username and try again.', expected=True)
-            raise ExtractorError('Unable to login')
-        InstagramBaseIE._IS_LOGGED_IN = True
+    @property
+    def _api_headers(self):
+        return {
+            'X-IG-App-ID': self._configuration_arg('app_id', ['936619743392459'], ie_key=InstagramIE)[0],
+            'X-ASBD-ID': '198387',
+            'X-IG-WWW-Claim': '0',
+            'Origin': 'https://www.instagram.com',
+            'Accept': '*/*',
+        }
 
     def _get_count(self, media, kind, *keys):
         return traverse_obj(
@@ -209,7 +176,7 @@ class InstagramBaseIE(InfoExtractor):
     def _get_comments(self, video_id):
         comments_info = self._download_json(
             f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/comments/?can_support_threading=true&permalink_enabled=false', video_id,
-            fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._API_HEADERS) or {}
+            fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._api_headers) or {}
 
         comment_data = traverse_obj(comments_info, ('edge_media_to_parent_comment', 'edges'), 'comments')
         for comment_dict in comment_data or []:
@@ -402,14 +369,14 @@ class InstagramIE(InstagramBaseIE):
         info = traverse_obj(self._download_json(
             f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/info/', video_id,
             fatal=False, errnote='Video info extraction failed',
-            note='Downloading video info', headers=self._API_HEADERS), ('items', 0))
+            note='Downloading video info', headers=self._api_headers), ('items', 0))
         if info:
             media.update(info)
             return self._extract_product(media)
 
         api_check = self._download_json(
             f'{self._API_BASE_URL}/web/get_ruling_for_content/?content_type=MEDIA&target_id={_id_to_pk(video_id)}',
-            video_id, headers=self._API_HEADERS, fatal=False, note='Setting up session', errnote=False) or {}
+            video_id, headers=self._api_headers, fatal=False, note='Setting up session', errnote=False) or {}
         csrf_token = self._get_cookies('https://www.instagram.com').get('csrftoken')
 
         if not csrf_token:
@@ -429,7 +396,7 @@ class InstagramIE(InstagramBaseIE):
         general_info = self._download_json(
             'https://www.instagram.com/graphql/query/', video_id, fatal=False, errnote=False,
             headers={
-                **self._API_HEADERS,
+                **self._api_headers,
                 'X-CSRFToken': csrf_token or '',
                 'X-Requested-With': 'XMLHttpRequest',
                 'Referer': url,
@@ -437,7 +404,6 @@ class InstagramIE(InstagramBaseIE):
                 'doc_id': '8845758582119845',
                 'variables': json.dumps(variables, separators=(',', ':')),
             })
-        media.update(traverse_obj(general_info, ('data', 'xdt_shortcode_media')) or {})
 
         if not general_info:
             self.report_warning('General metadata extraction failed (some metadata might be missing).', video_id)
@@ -466,6 +432,26 @@ class InstagramIE(InstagramBaseIE):
             media.update(traverse_obj(
                 additional_data, ('graphql', 'shortcode_media'), 'shortcode_media', expected_type=dict) or {})
+        else:
+            xdt_shortcode_media = traverse_obj(general_info, ('data', 'xdt_shortcode_media', {dict})) or {}
+            if not xdt_shortcode_media:
+                error = join_nonempty('title', 'description', delim=': ', from_dict=api_check)
+                if 'Restricted Video' in error:
+                    self.raise_login_required(error)
+                elif error:
+                    raise ExtractorError(error, expected=True)
+                elif len(video_id) > 28:
+                    # It's a private post (video_id == shortcode + 28 extra characters)
+                    # Only raise after getting empty response; sometimes "long"-shortcode posts are public
+                    self.raise_login_required(
+                        'This content is only available for registered users who follow this account')
+                raise ExtractorError(
+                    'Instagram sent an empty media response. Check if this post is accessible in your '
+                    f'browser without being logged-in. If it is not, then u{self._login_hint()[1:]}. '
+                    'Otherwise, if the post is accessible in browser without being logged-in'
+                    f'{bug_reports_message(before=",")}', expected=True)
+            media.update(xdt_shortcode_media)
 
         username = traverse_obj(media, ('owner', 'username')) or self._search_regex(
             r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"', webpage, 'username', fatal=False)
@@ -485,8 +471,7 @@ class InstagramIE(InstagramBaseIE):
             return self.playlist_result(
                 self._extract_nodes(nodes, True), video_id,
                 format_field(username, None, 'Post by %s'), description)
+        raise ExtractorError('There is no video in this post', expected=True)
 
-        video_url = self._og_search_video_url(webpage, secure=False)
-
         formats = [{
             'url': video_url,
@@ -689,7 +674,7 @@ class InstagramTagIE(InstagramPlaylistBaseIE):
 
 class InstagramStoryIE(InstagramBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/]+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/?#]+)(?:/(?P<id>\d+))?'
     IE_NAME = 'instagram:story'
 
     _TESTS = [{
@@ -699,25 +684,38 @@ class InstagramStoryIE(InstagramBaseIE):
             'title': 'Rare',
         },
         'playlist_mincount': 50,
+    }, {
+        'url': 'https://www.instagram.com/stories/fruits_zipper/3570766765028588805/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.instagram.com/stories/fruits_zipper',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        username, story_id = self._match_valid_url(url).groups()
-        story_info = self._download_webpage(url, story_id)
-        user_info = self._search_json(r'"user":', story_info, 'user info', story_id, fatal=False)
+        username, story_id = self._match_valid_url(url).group('user', 'id')
+        if username == 'highlights' and not story_id:  # story id is only mandatory for highlights
+            raise ExtractorError('Input URL is missing a highlight ID', expected=True)
+        display_id = story_id or username
+        story_info = self._download_webpage(url, display_id)
+        user_info = self._search_json(r'"user":', story_info, 'user info', display_id, fatal=False)
         if not user_info:
             self.raise_login_required('This content is unreachable')
 
         user_id = traverse_obj(user_info, 'pk', 'id', expected_type=str)
-        story_info_url = user_id if username != 'highlights' else f'highlight:{story_id}'
-        if not story_info_url:  # user id is only mandatory for non-highlights
-            raise ExtractorError('Unable to extract user id')
+        if username == 'highlights':
+            story_info_url = f'highlight:{story_id}'
+        else:
+            if not user_id:  # user id is only mandatory for non-highlights
+                raise ExtractorError('Unable to extract user id')
+            story_info_url = user_id
 
         videos = traverse_obj(self._download_json(
             f'{self._API_BASE_URL}/feed/reels_media/?reel_ids={story_info_url}',
-            story_id, errnote=False, fatal=False, headers=self._API_HEADERS), 'reels')
+            display_id, errnote=False, fatal=False, headers=self._api_headers), 'reels')
         if not videos:
             self.raise_login_required('You need to log in to access this content')
 
+        user_info = traverse_obj(videos, (user_id, 'user', {dict})) or {}
         full_name = traverse_obj(videos, (f'highlight:{story_id}', 'user', 'full_name'), (user_id, 'user', 'full_name'))
         story_title = traverse_obj(videos, (f'highlight:{story_id}', 'title'))
@@ -727,6 +725,7 @@ class InstagramStoryIE(InstagramBaseIE):
         highlights = traverse_obj(videos, (f'highlight:{story_id}', 'items'), (user_id, 'items'))
         info_data = []
         for highlight in highlights:
+            highlight.setdefault('user', {}).update(user_info)
             highlight_data = self._extract_product(highlight)
             if highlight_data.get('formats'):
                 info_data.append({
@@ -734,4 +733,7 @@ class InstagramStoryIE(InstagramBaseIE):
                     'uploader_id': user_id,
                     **filter_dict(highlight_data),
                 })
+        if username != 'highlights' and story_id and not self._yes_playlist(username, story_id):
+            return traverse_obj(info_data, (lambda _, v: v['id'] == _pk_to_id(story_id), any))
+
         return self.playlist_result(info_data, playlist_id=story_id, playlist_title=story_title)
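The two helpers above are inverses over the base-64url alphabet; a private-post shortcode is the public one plus 28 trailing characters, which _id_to_pk now strips before decoding. Illustrative round-trip with the module's own helpers:

    pk = 3099114211557154759  # hypothetical media pk
    shortcode = _pk_to_id(pk)
    assert _id_to_pk(shortcode) == pk
    assert _id_to_pk(shortcode + 'x' * 28) == pk  # private-post suffix is ignored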


@@ -2,10 +2,12 @@ import hashlib
 import random
 
 from .common import InfoExtractor
+from ..networking import HEADRequest
 from ..utils import (
     clean_html,
     int_or_none,
     try_get,
+    urlhandle_detect_ext,
 )
@@ -27,7 +29,7 @@ class JamendoIE(InfoExtractor):
             'ext': 'flac',
             # 'title': 'Maya Filipič - Stories from Emona I',
             'title': 'Stories from Emona I',
-            'artist': 'Maya Filipič',
+            'artists': ['Maya Filipič'],
             'album': 'Between two worlds',
             'track': 'Stories from Emona I',
             'duration': 210,
@@ -93,9 +95,15 @@ class JamendoIE(InfoExtractor):
             if not cover_url or cover_url in urls:
                 continue
             urls.append(cover_url)
+            urlh = self._request_webpage(
+                HEADRequest(cover_url), track_id, 'Checking thumbnail extension',
+                errnote=False, fatal=False)
+            if not urlh:
+                continue
             size = int_or_none(cover_id.lstrip('size'))
             thumbnails.append({
                 'id': cover_id,
+                'ext': urlhandle_detect_ext(urlh, default='jpg'),
                 'url': cover_url,
                 'width': size,
                 'height': size,


@@ -26,6 +26,7 @@ class LBRYBaseIE(InfoExtractor):
     _CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
     _OPT_CLAIM_ID = f'[^$@:/?#&]+(?:[:#]{_CLAIM_ID_REGEX})?'
     _SUPPORTED_STREAM_TYPES = ['video', 'audio']
+    _UNSUPPORTED_STREAM_TYPES = ['binary']
     _PAGE_SIZE = 50
 
     def _call_api_proxy(self, method, display_id, params, resource):
@@ -336,12 +337,15 @@ class LBRYIE(LBRYBaseIE):
                     'vcodec': 'none' if stream_type == 'audio' else None,
                 })
 
+            final_url = None
             # HEAD request returns redirect response to m3u8 URL if available
-            final_url = self._request_webpage(
+            urlh = self._request_webpage(
                 HEADRequest(streaming_url), display_id, headers=headers,
-                note='Downloading streaming redirect url info').url
+                note='Downloading streaming redirect url info', fatal=False)
+            if urlh:
+                final_url = urlh.url
 
-        elif result.get('value_type') == 'stream':
+        elif result.get('value_type') == 'stream' and stream_type not in self._UNSUPPORTED_STREAM_TYPES:
             claim_id, is_live = result['signing_channel']['claim_id'], True
             live_data = self._download_json(
                 'https://api.odysee.live/livestream/is_live', claim_id,

yt_dlp/extractor/loco.py

@@ -0,0 +1,87 @@
from .common import InfoExtractor
from ..utils import int_or_none, url_or_none
from ..utils.traversal import require, traverse_obj
class LocoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?loco\.com/(?P<type>streamers|stream)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://loco.com/streamers/teuzinfps',
'info_dict': {
'id': 'teuzinfps',
'ext': 'mp4',
'title': r're:MS BOLADAO, RESENHA & GAMEPLAY ALTO NIVEL',
'description': 'bom e novo',
'uploader_id': 'RLUVE3S9JU',
'channel': 'teuzinfps',
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'concurrent_view_count': int,
'like_count': int,
'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/743701a9-98ca-41ae-9a8b-70bd5da070ad.jpg',
'tags': ['MMORPG', 'Gameplay'],
'series': 'Tibia',
'timestamp': int,
'modified_timestamp': int,
'live_status': 'is_live',
'upload_date': str,
'modified_date': str,
},
'params': {
'skip_download': 'Livestream',
},
}, {
'url': 'https://loco.com/stream/c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
'md5': '45ebc8a47ee1c2240178757caf8881b5',
'info_dict': {
'id': 'c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
'ext': 'mp4',
'title': 'PAULINHO LOKO NA LOCO!',
'description': 'live on na loco',
'uploader_id': '2MDO7Z1DPM',
'channel': 'paulinholokobr',
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'concurrent_view_count': int,
'like_count': int,
'duration': 14491,
'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/59b5970b-23c1-4518-9e96-17ce341299fe.jpg',
'tags': ['Gameplay'],
'series': 'GTA 5',
'timestamp': 1740612872,
'modified_timestamp': 1740613037,
'upload_date': '20250226',
'modified_date': '20250226',
},
}]
def _real_extract(self, url):
video_type, video_id = self._match_valid_url(url).group('type', 'id')
webpage = self._download_webpage(url, video_id)
stream = traverse_obj(self._search_nextjs_data(webpage, video_id), (
'props', 'pageProps', ('liveStreamData', 'stream'), {dict}, any, {require('stream info')}))
return {
'formats': self._extract_m3u8_formats(stream['conf']['hls'], video_id),
'id': video_id,
'is_live': video_type == 'streamers',
**traverse_obj(stream, {
'title': ('title', {str}),
'series': ('game_name', {str}),
'uploader_id': ('user_uid', {str}),
'channel': ('alias', {str}),
'description': ('description', {str}),
'concurrent_view_count': ('viewersCurrent', {int_or_none}),
'view_count': ('total_views', {int_or_none}),
'thumbnail': ('thumbnail_url_small', {url_or_none}),
'like_count': ('likes', {int_or_none}),
'tags': ('tags', ..., {str}),
'timestamp': ('started_at', {int_or_none(scale=1000)}),
'modified_timestamp': ('updated_at', {int_or_none(scale=1000)}),
'comment_count': ('comments_count', {int_or_none}),
'channel_follower_count': ('followers_count', {int_or_none}),
'duration': ('duration', {int_or_none}),
}),
}
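The started_at/updated_at fields above arrive in milliseconds; int_or_none's scale argument divides them down to Unix seconds, e.g.:

    from yt_dlp.utils import int_or_none
    assert int_or_none(1740612872000, scale=1000) == 1740612872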


@@ -1,35 +1,36 @@
 from .common import InfoExtractor
-from ..utils import parse_age_limit, parse_duration, traverse_obj
+from ..utils import parse_age_limit, parse_duration, url_or_none
+from ..utils.traversal import traverse_obj
 
 
 class MagellanTVIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?magellantv\.com/(?:watch|video)/(?P<id>[\w-]+)'
     _TESTS = [{
-        'url': 'https://www.magellantv.com/watch/my-dads-on-death-row?type=v',
+        'url': 'https://www.magellantv.com/watch/incas-the-new-story?type=v',
         'info_dict': {
-            'id': 'my-dads-on-death-row',
+            'id': 'incas-the-new-story',
             'ext': 'mp4',
-            'title': 'My Dad\'s On Death Row',
-            'description': 'md5:33ba23b9f0651fc4537ed19b1d5b0d7a',
-            'duration': 3780.0,
+            'title': 'Incas: The New Story',
+            'description': 'md5:936c7f6d711c02dfb9db22a067b586fe',
             'age_limit': 14,
-            'tags': ['Justice', 'Reality', 'United States', 'True Crime'],
+            'duration': 3060.0,
+            'tags': ['Ancient History', 'Archaeology', 'Anthropology'],
         },
         'params': {'skip_download': 'm3u8'},
     }, {
-        'url': 'https://www.magellantv.com/video/james-bulger-the-new-revelations',
+        'url': 'https://www.magellantv.com/video/tortured-to-death-murdering-the-nanny',
         'info_dict': {
-            'id': 'james-bulger-the-new-revelations',
+            'id': 'tortured-to-death-murdering-the-nanny',
             'ext': 'mp4',
-            'title': 'James Bulger: The New Revelations',
-            'description': 'md5:7b97922038bad1d0fe8d0470d8a189f2',
+            'title': 'Tortured to Death: Murdering the Nanny',
+            'description': 'md5:d87033594fa218af2b1a8b49f52511e5',
+            'age_limit': 14,
             'duration': 2640.0,
-            'age_limit': 0,
-            'tags': ['Investigation', 'True Crime', 'Justice', 'Europe'],
+            'tags': ['True Crime', 'Murder'],
         },
         'params': {'skip_download': 'm3u8'},
     }, {
-        'url': 'https://www.magellantv.com/watch/celebration-nation',
+        'url': 'https://www.magellantv.com/watch/celebration-nation?type=s',
         'info_dict': {
             'id': 'celebration-nation',
             'ext': 'mp4',
@@ -43,10 +44,19 @@ class MagellanTVIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        data = traverse_obj(self._search_nextjs_data(webpage, video_id), (
-            'props', 'pageProps', 'reactContext',
-            (('video', 'detail'), ('series', 'currentEpisode')), {dict}), get_all=False)
-        formats, subtitles = self._extract_m3u8_formats_and_subtitles(data['jwpVideoUrl'], video_id)
+        context = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['reactContext']
+        data = traverse_obj(context, ((('video', 'detail'), ('series', 'currentEpisode')), {dict}, any))
+
+        formats, subtitles = [], {}
+        for m3u8_url in set(traverse_obj(data, ((('manifests', ..., 'hls'), 'jwp_video_url'), {url_or_none}))):
+            fmts, subs = self._extract_m3u8_formats_and_subtitles(
+                m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
+            formats.extend(fmts)
+            self._merge_subtitles(subs, target=subtitles)
+        if not formats and (error := traverse_obj(context, ('errorDetailPage', 'errorMessage', {str}))):
+            if 'available in your country' in error:
+                self.raise_geo_restricted(msg=error)
+            self.raise_no_formats(f'{self.IE_NAME} said: {error}', expected=True)
 
         return {
             'id': video_id,
View File

@@ -102,11 +102,10 @@ class MedalTVIE(InfoExtractor):
             item_id = item_id or '%dp' % height
             if item_id not in item_url:
                 return
-            width = int(round(aspect_ratio * height))
             container.append({
                 'url': item_url,
                 id_key: item_id,
-                'width': width,
+                'width': round(aspect_ratio * height),
                 'height': height,
             })
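
The removed int() wrapper was redundant: in Python 3, round() on a float already returns an int, so inlining round(aspect_ratio * height) preserves behavior. For example:

# round() already yields an int in Python 3, so int(round(...)) was redundant
aspect_ratio, height = 16 / 9, 720
assert round(aspect_ratio * height) == int(round(aspect_ratio * height)) == 1280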

View File

@@ -1,5 +1,7 @@
 from .telecinco import TelecincoBaseIE
+from ..networking.exceptions import HTTPError
 from ..utils import (
+    ExtractorError,
     int_or_none,
     parse_iso8601,
 )
@@ -79,7 +81,17 @@ class MiTeleIE(TelecincoBaseIE):

     def _real_extract(self, url):
         display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
+
+        try:  # yt-dlp's default user-agents are too old and blocked by akamai
+            webpage = self._download_webpage(url, display_id, headers={
+                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
+            })
+        except ExtractorError as e:
+            if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
+                raise
+            # Retry with impersonation if hardcoded UA is insufficient to bypass akamai
+            webpage = self._download_webpage(url, display_id, impersonate=True)
+
         pre_player = self._search_json(
             r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
             webpage, 'Pre Player', display_id)['prePlayer']

View File

@@ -1,167 +1,215 @@
-import re
-
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
+    clean_html,
     determine_ext,
     int_or_none,
-    unescapeHTML,
+    parse_iso8601,
+    url_or_none,
 )
+from ..utils.traversal import traverse_obj


 class MSNIE(InfoExtractor):
-    _WORKING = False
-    _VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
+    _VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/?#]+/)+(?P<display_id>[^/?#]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
     _TESTS = [{
-        'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d',
-        'md5': '087548191d273c5c55d05028f8d2cbcd',
+        'url': 'https://www.msn.com/en-gb/video/news/president-macron-interrupts-trump-over-ukraine-funding/vi-AA1zMcD7',
         'info_dict': {
-            'id': 'BBPxU6d',
-            'display_id': '7-ways-to-get-rid-of-chest-congestion',
+            'id': 'AA1zMcD7',
             'ext': 'mp4',
-            'title': 'Seven ways to get rid of chest congestion',
-            'description': '7 Ways to Get Rid of Chest Congestion',
-            'duration': 88,
-            'uploader': 'Health',
-            'uploader_id': 'BBPrMqa',
+            'display_id': 'president-macron-interrupts-trump-over-ukraine-funding',
+            'title': 'President Macron interrupts Trump over Ukraine funding',
+            'description': 'md5:5fd3857ac25849e7a56cb25fbe1a2a8b',
+            'uploader': 'k! News UK',
+            'uploader_id': 'BB1hz5Rj',
+            'duration': 59,
+            'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zMagX.img',
+            'tags': 'count:14',
+            'timestamp': 1740510914,
+            'upload_date': '20250225',
+            'release_timestamp': 1740513600,
+            'release_date': '20250225',
+            'modified_timestamp': 1741413241,
+            'modified_date': '20250308',
         },
     }, {
-        # Article, multiple Dailymotion Embeds
-        'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
+        'url': 'https://www.msn.com/en-gb/video/watch/films-success-saved-adam-pearsons-acting-career/vi-AA1znZGE?ocid=hpmsn',
         'info_dict': {
-            'id': 'BBpc7Nl',
+            'id': 'AA1znZGE',
+            'ext': 'mp4',
+            'display_id': 'films-success-saved-adam-pearsons-acting-career',
+            'title': "Films' success saved Adam Pearson's acting career",
+            'description': 'md5:98c05f7bd9ab4f9c423400f62f2d3da5',
+            'uploader': 'Sky News',
+            'uploader_id': 'AA2eki',
+            'duration': 52,
+            'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zo7nU.img',
+            'timestamp': 1739993965,
+            'upload_date': '20250219',
+            'release_timestamp': 1739977753,
+            'release_date': '20250219',
+            'modified_timestamp': 1742076259,
+            'modified_date': '20250315',
         },
-        'playlist_mincount': 4,
     }, {
-        'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH',
-        'only_matching': True,
-    }, {
-        # geo restricted
-        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-raped-woman-comment/vi-AAhvzW6',
-        'only_matching': True,
-    }, {
-        # Vidible(AOL) Embed
-        'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR',
-        'only_matching': True,
+        'url': 'https://www.msn.com/en-us/entertainment/news/rock-frontman-replacements-you-might-not-know-happened/vi-AA1yLVcD',
+        'info_dict': {
+            'id': 'AA1yLVcD',
+            'ext': 'mp4',
+            'display_id': 'rock-frontman-replacements-you-might-not-know-happened',
+            'title': 'Rock Frontman Replacements You Might Not Know Happened',
+            'description': 'md5:451a125496ff0c9f6816055bb1808da9',
+            'uploader': 'Grunge (Video)',
+            'uploader_id': 'BB1oveoV',
+            'duration': 596,
+            'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1yM4OJ.img',
+            'timestamp': 1739223456,
+            'upload_date': '20250210',
+            'release_timestamp': 1739219731,
+            'release_date': '20250210',
+            'modified_timestamp': 1741427272,
+            'modified_date': '20250308',
+        },
     }, {
         # Dailymotion Embed
-        'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L',
-        'only_matching': True,
+        'url': 'https://www.msn.com/de-de/nachrichten/other/the-first-descendant-gameplay-trailer-zu-serena-der-neuen-gefl%C3%BCgelten-nachfahrin/vi-AA1B1d06',
+        'info_dict': {
+            'id': 'x9g6oli',
+            'ext': 'mp4',
+            'title': 'The First Descendant: Gameplay-Trailer zu Serena, der neuen geflügelten Nachfahrin',
+            'description': '',
+            'uploader': 'MeinMMO',
+            'uploader_id': 'x2mvqi4',
+            'view_count': int,
+            'like_count': int,
+            'age_limit': 0,
+            'duration': 60,
+            'thumbnail': 'https://s1.dmcdn.net/v/Y3fO61drj56vPB9SS/x1080',
+            'tags': ['MeinMMO', 'The First Descendant'],
+            'timestamp': 1742124877,
+            'upload_date': '20250316',
+        },
     }, {
-        # YouTube Embed
-        'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v',
-        'only_matching': True,
+        # Youtube Embed
+        'url': 'https://www.msn.com/en-gb/video/webcontent/web-content/vi-AA1ybFaJ',
+        'info_dict': {
+            'id': 'kQSChWu95nE',
+            'ext': 'mp4',
+            'title': '7 Daily Habits to Nurture Your Personal Growth',
+            'description': 'md5:6f233c68341b74dee30c8c121924e827',
+            'uploader': 'TopThink',
+            'uploader_id': '@TopThink',
+            'uploader_url': 'https://www.youtube.com/@TopThink',
+            'channel': 'TopThink',
+            'channel_id': 'UCMlGmHokrQRp-RaNO7aq4Uw',
+            'channel_url': 'https://www.youtube.com/channel/UCMlGmHokrQRp-RaNO7aq4Uw',
+            'channel_is_verified': True,
+            'channel_follower_count': int,
+            'comment_count': int,
+            'view_count': int,
+            'like_count': int,
+            'age_limit': 0,
+            'duration': 705,
+            'thumbnail': 'https://i.ytimg.com/vi/kQSChWu95nE/maxresdefault.jpg',
+            'categories': ['Howto & Style'],
+            'tags': ['topthink', 'top think', 'personal growth'],
+            'timestamp': 1722711620,
+            'upload_date': '20240803',
+            'playable_in_embed': True,
+            'availability': 'public',
+            'live_status': 'not_live',
+        },
     }, {
-        # NBCSports Embed
-        'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb',
-        'only_matching': True,
+        # Article with social embed
+        'url': 'https://www.msn.com/en-in/news/techandscience/watch-earth-sets-and-rises-behind-moon-in-breathtaking-blue-ghost-video/ar-AA1zKoAc',
+        'info_dict': {
+            'id': 'AA1zKoAc',
+            'title': 'Watch: Earth sets and rises behind Moon in breathtaking Blue Ghost video',
+            'description': 'md5:0ad51cfa77e42e7f0c46cf98a619dbbf',
+            'uploader': 'India Today',
+            'uploader_id': 'AAyFWG',
+            'tags': 'count:11',
+            'timestamp': 1740485034,
+            'upload_date': '20250225',
+            'release_timestamp': 1740484875,
+            'release_date': '20250225',
+            'modified_timestamp': 1740488561,
+            'modified_date': '20250225',
+        },
+        'playlist_count': 1,
     }]

     def _real_extract(self, url):
-        display_id, page_id = self._match_valid_url(url).groups()
+        locale, display_id, page_id = self._match_valid_url(url).group('locale', 'display_id', 'id')

-        webpage = self._download_webpage(url, display_id)
+        json_data = self._download_json(
+            f'https://assets.msn.com/content/view/v2/Detail/{locale}/{page_id}', page_id)

-        entries = []
-        for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage):
-            video = self._parse_json(unescapeHTML(metadata), display_id)
-
-            provider_id = video.get('providerId')
-            player_name = video.get('playerName')
-            if player_name and provider_id:
-                entry = None
-                if player_name == 'AOL':
-                    if provider_id.startswith('http'):
-                        provider_id = self._search_regex(
-                            r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})',
-                            provider_id, 'vidible id')
-                    entry = self.url_result(
-                        'aol-video:' + provider_id, 'Aol', provider_id)
-                elif player_name == 'Dailymotion':
-                    entry = self.url_result(
-                        'https://www.dailymotion.com/video/' + provider_id,
-                        'Dailymotion', provider_id)
-                elif player_name == 'YouTube':
-                    entry = self.url_result(
-                        provider_id, 'Youtube', provider_id)
-                elif player_name == 'NBCSports':
-                    entry = self.url_result(
-                        'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
-                        'NBCSportsVPlayer', provider_id)
-                if entry:
-                    entries.append(entry)
-                    continue
-
-            video_id = video['uuid']
-            title = video['title']
+        common_metadata = traverse_obj(json_data, {
+            'title': ('title', {str}),
+            'description': (('abstract', ('body', {clean_html})), {str}, filter, any),
+            'timestamp': ('createdDateTime', {parse_iso8601}),
+            'release_timestamp': ('publishedDateTime', {parse_iso8601}),
+            'modified_timestamp': ('updatedDateTime', {parse_iso8601}),
+            'thumbnail': ('thumbnail', 'image', 'url', {url_or_none}),
+            'duration': ('videoMetadata', 'playTime', {int_or_none}),
+            'tags': ('keywords', ..., {str}),
+            'uploader': ('provider', 'name', {str}),
+            'uploader_id': ('provider', 'id', {str}),
+        })

+        page_type = json_data['type']
+        source_url = traverse_obj(json_data, ('sourceHref', {url_or_none}))
+        if page_type == 'video':
+            if traverse_obj(json_data, ('thirdPartyVideoPlayer', 'enabled')) and source_url:
+                return self.url_result(source_url)

             formats = []
-            for file_ in video.get('videoFiles', []):
-                format_url = file_.get('url')
-                if not format_url:
-                    continue
-                if 'format=m3u8-aapl' in format_url:
-                    # m3u8_native should not be used here until
-                    # https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, display_id, 'mp4',
-                        m3u8_id='hls', fatal=False))
-                elif 'format=mpd-time-csf' in format_url:
-                    formats.extend(self._extract_mpd_formats(
-                        format_url, display_id, 'dash', fatal=False))
-                elif '.ism' in format_url:
-                    if format_url.endswith('.ism'):
-                        format_url += '/manifest'
-                    formats.extend(self._extract_ism_formats(
-                        format_url, display_id, 'mss', fatal=False))
-                else:
-                    format_id = file_.get('formatCode')
-                    formats.append({
-                        'url': format_url,
-                        'ext': 'mp4',
-                        'format_id': format_id,
-                        'width': int_or_none(file_.get('width')),
-                        'height': int_or_none(file_.get('height')),
-                        'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
-                        'quality': 1 if format_id == '1001' else None,
-                    })
-
             subtitles = {}
-            for file_ in video.get('files', []):
-                format_url = file_.get('url')
-                format_code = file_.get('formatCode')
-                if not format_url or not format_code:
-                    continue
-                if str(format_code) == '3100':
-                    subtitles.setdefault(file_.get('culture', 'en'), []).append({
-                        'ext': determine_ext(format_url, 'ttml'),
-                        'url': format_url,
-                    })
+            for file in traverse_obj(json_data, ('videoMetadata', 'externalVideoFiles', lambda _, v: url_or_none(v['url']))):
+                file_url = file['url']
+                ext = determine_ext(file_url)
+                if ext == 'm3u8':
+                    fmts, subs = self._extract_m3u8_formats_and_subtitles(
+                        file_url, page_id, 'mp4', m3u8_id='hls', fatal=False)
+                    formats.extend(fmts)
+                    self._merge_subtitles(subs, target=subtitles)
+                elif ext == 'mpd':
+                    fmts, subs = self._extract_mpd_formats_and_subtitles(
+                        file_url, page_id, mpd_id='dash', fatal=False)
+                    formats.extend(fmts)
+                    self._merge_subtitles(subs, target=subtitles)
+                else:
+                    formats.append(
+                        traverse_obj(file, {
+                            'url': 'url',
+                            'format_id': ('format', {str}),
+                            'filesize': ('fileSize', {int_or_none}),
+                            'height': ('height', {int_or_none}),
+                            'width': ('width', {int_or_none}),
+                        }))
+            for caption in traverse_obj(json_data, ('videoMetadata', 'closedCaptions', lambda _, v: url_or_none(v['href']))):
+                lang = caption.get('locale') or 'en-us'
+                subtitles.setdefault(lang, []).append({
+                    'url': caption['href'],
+                    'ext': 'ttml',
+                })

-            entries.append({
-                'id': video_id,
+            return {
+                'id': page_id,
                 'display_id': display_id,
-                'title': title,
-                'description': video.get('description'),
-                'thumbnail': video.get('headlineImage', {}).get('url'),
-                'duration': int_or_none(video.get('durationSecs')),
-                'uploader': video.get('sourceFriendly'),
-                'uploader_id': video.get('providerId'),
-                'creator': video.get('creator'),
-                'subtitles': subtitles,
                 'formats': formats,
-            })
+                'subtitles': subtitles,
+                **common_metadata,
+            }
+        elif page_type == 'webcontent':
+            if not source_url:
+                raise ExtractorError('Could not find source URL')
+            return self.url_result(source_url)
+        elif page_type == 'article':
+            entries = []
+            for embed_url in traverse_obj(json_data, ('socialEmbeds', ..., 'postUrl', {url_or_none})):
+                entries.append(self.url_result(embed_url))

-        if not entries:
-            error = unescapeHTML(self._search_regex(
-                r'data-error=(["\'])(?P<error>.+?)\1',
-                webpage, 'error', group='error'))
-            raise ExtractorError(f'{self.IE_NAME} said: {error}', expected=True)
+            return self.playlist_result(entries, page_id, **common_metadata)

-        return self.playlist_result(entries, page_id)
+        raise ExtractorError(f'Unsupported page type: {page_type}')
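
For context on the rework above: the extractor now reads one JSON document per page instead of scraping data-metadata attributes. A standalone sketch (outside yt-dlp, using only the endpoint and field names visible in the diff; the sample locale/ID come from the tests and may stop resolving) of fetching that document and listing its direct media URLs:

import json
import urllib.request

def fetch_msn_detail(locale, page_id):
    # Same Detail endpoint the reworked extractor queries
    url = f'https://assets.msn.com/content/view/v2/Detail/{locale}/{page_id}'
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

doc = fetch_msn_detail('en-gb', 'AA1zMcD7')
print(doc['type'], doc.get('title'))
for file in doc.get('videoMetadata', {}).get('externalVideoFiles', []):
    print(file.get('format'), file.get('url'))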

View File

@@ -4,7 +4,9 @@ from .common import InfoExtractor
 from ..utils import (
     extract_attributes,
     unified_timestamp,
+    url_or_none,
 )
+from ..utils.traversal import traverse_obj


 class N1InfoAssetIE(InfoExtractor):
@@ -35,9 +37,9 @@ class N1InfoIIE(InfoExtractor):
     IE_NAME = 'N1Info:article'
     _VALID_URL = r'https?://(?:(?:\w+\.)?n1info\.\w+|nova\.rs)/(?:[^/?#]+/){1,2}(?P<id>[^/?#]+)'
     _TESTS = [{
-        # Youtube embedded
+        # YouTube embedded
         'url': 'https://rs.n1info.com/sport-klub/tenis/kako-je-djokovic-propustio-istorijsku-priliku-video/',
-        'md5': '01ddb6646d0fd9c4c7d990aa77fe1c5a',
+        'md5': '987ce6fd72acfecc453281e066b87973',
         'info_dict': {
             'id': 'L5Hd4hQVUpk',
             'ext': 'mp4',
@@ -45,7 +47,26 @@ class N1InfoIIE(InfoExtractor):
             'title': 'Ozmo i USO21, ep. 13: Novak Đoković Danil Medvedev | Ključevi Poraza, Budućnost | SPORT KLUB TENIS',
             'description': 'md5:467f330af1effedd2e290f10dc31bb8e',
             'uploader': 'Sport Klub',
-            'uploader_id': 'sportklub',
+            'uploader_id': '@sportklub',
+            'uploader_url': 'https://www.youtube.com/@sportklub',
+            'channel': 'Sport Klub',
+            'channel_id': 'UChpzBje9Ro6CComXe3BgNaw',
+            'channel_url': 'https://www.youtube.com/channel/UChpzBje9Ro6CComXe3BgNaw',
+            'channel_is_verified': True,
+            'channel_follower_count': int,
+            'comment_count': int,
+            'view_count': int,
+            'like_count': int,
+            'age_limit': 0,
+            'duration': 1049,
+            'thumbnail': 'https://i.ytimg.com/vi/L5Hd4hQVUpk/maxresdefault.jpg',
+            'chapters': 'count:9',
+            'categories': ['Sports'],
+            'tags': 'count:10',
+            'timestamp': 1631522787,
+            'playable_in_embed': True,
+            'availability': 'public',
+            'live_status': 'not_live',
         },
     }, {
         'url': 'https://rs.n1info.com/vesti/djilas-los-plan-za-metro-nece-resiti-nijedan-saobracajni-problem/',
@@ -55,6 +76,7 @@ class N1InfoIIE(InfoExtractor):
             'title': 'Đilas: Predlog izgradnje metroa besmislen; SNS odbacuje navode',
             'upload_date': '20210924',
             'timestamp': 1632481347,
+            'thumbnail': 'http://n1info.rs/wp-content/themes/ucnewsportal-n1/dist/assets/images/placeholder-image-video.jpg',
         },
         'params': {
             'skip_download': True,
@@ -67,6 +89,7 @@ class N1InfoIIE(InfoExtractor):
             'title': 'Zadnji dnevi na kopališču Ilirija: “Ilirija ni umrla, ubili so jo”',
             'timestamp': 1632567630,
             'upload_date': '20210925',
+            'thumbnail': 'https://n1info.si/wp-content/uploads/2021/09/06/1630945843-tomaz3.png',
         },
         'params': {
             'skip_download': True,
@@ -81,6 +104,14 @@ class N1InfoIIE(InfoExtractor):
             'upload_date': '20210924',
             'timestamp': 1632448649.0,
             'uploader': 'YouLotWhatDontStop',
+            'display_id': 'pu9wbx',
+            'channel_id': 'serbia',
+            'comment_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'age_limit': 0,
+            'duration': 134,
+            'thumbnail': 'https://external-preview.redd.it/5nmmawSeGx60miQM3Iq-ueC9oyCLTLjjqX-qqY8uRsc.png?format=pjpg&auto=webp&s=2f973400b04d23f871b608b178e47fc01f9b8f1d',
         },
         'params': {
             'skip_download': True,
@@ -93,6 +124,7 @@ class N1InfoIIE(InfoExtractor):
             'title': 'Žaklina Tatalović Ani Brnabić: Pričate laži (VIDEO)',
             'upload_date': '20211102',
             'timestamp': 1635861677,
+            'thumbnail': 'https://nova.rs/wp-content/uploads/2021/11/02/1635860298-TNJG_Ana_Brnabic_i_Zaklina_Tatalovic_100_dana_Vlade_GP.jpg',
         },
     }, {
         'url': 'https://n1info.rs/vesti/cuta-biti-u-kosovskoj-mitrovici-znaci-da-te-docekaju-eksplozivnim-napravama/',
@@ -104,6 +136,16 @@ class N1InfoIIE(InfoExtractor):
             'timestamp': 1687290536,
             'thumbnail': 'https://cdn.brid.tv/live/partners/26827/snapshot/1332368_th_6492013a8356f_1687290170.jpg',
         },
+    }, {
+        'url': 'https://n1info.rs/vesti/vuciceva-turneja-po-srbiji-najavljuje-kontrarevoluciju-preti-svom-narodu-vredja-novinare/',
+        'info_dict': {
+            'id': '2025974',
+            'ext': 'mp4',
+            'title': 'Vučićeva turneja po Srbiji: Najavljuje kontrarevoluciju, preti svom narodu, vređa novinare',
+            'thumbnail': 'https://cdn-uc.brid.tv/live/partners/26827/snapshot/2025974_fhd_67c4a23280a81_1740939826.jpg',
+            'timestamp': 1740939936,
+            'upload_date': '20250302',
+        },
     }, {
         'url': 'https://hr.n1info.com/vijesti/pravobraniteljica-o-ubojstvu-u-zagrebu-radi-se-o-doista-nezapamcenoj-situaciji/',
         'only_matching': True,
@@ -115,11 +157,11 @@ class N1InfoIIE(InfoExtractor):
         title = self._html_search_regex(r'<h1[^>]+>(.+?)</h1>', webpage, 'title')
         timestamp = unified_timestamp(self._html_search_meta('article:published_time', webpage))
-        plugin_data = self._html_search_meta('BridPlugin', webpage)
+        plugin_data = re.findall(r'\$bp\("(?:Brid|TargetVideo)_\d+",\s(.+)\);', webpage)
         entries = []
         if plugin_data:
             site_id = self._html_search_regex(r'site:(\d+)', webpage, 'site id')
-            for video_data in re.findall(r'\$bp\("Brid_\d+", (.+)\);', webpage):
+            for video_data in plugin_data:
                 video_id = self._parse_json(video_data, title)['video']
                 entries.append({
                     'id': video_id,
@@ -140,7 +182,7 @@ class N1InfoIIE(InfoExtractor):
                     'url': video_data.get('data-url'),
                     'id': video_data.get('id'),
                     'title': title,
-                    'thumbnail': video_data.get('data-thumbnail'),
+                    'thumbnail': traverse_obj(video_data, (('data-thumbnail', 'data-default_thumbnail'), {url_or_none}, any)),
                     'timestamp': timestamp,
                     'ie_key': 'N1InfoAsset',
                 })
@@ -152,7 +194,7 @@ class N1InfoIIE(InfoExtractor):
             if url.startswith('https://www.youtube.com'):
                 entries.append(self.url_result(url, ie='Youtube'))
             elif url.startswith('https://www.redditmedia.com'):
-                entries.append(self.url_result(url, ie='RedditR'))
+                entries.append(self.url_result(url, ie='Reddit'))

         return {
             '_type': 'playlist',

View File

@@ -736,7 +736,7 @@ class NBCStationsIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)

         nbc_data = self._search_json(
-            r'<script>\s*var\s+nbc\s*=', webpage, 'NBC JSON data', video_id)
+            r'(?:<script>\s*var\s+nbc\s*=|Object\.assign\(nbc,)', webpage, 'NBC JSON data', video_id)
         pdk_acct = nbc_data.get('pdkAcct') or 'Yh1nAC'
         fw_ssid = traverse_obj(nbc_data, ('video', 'fwSSID'))

View File

@@ -13,11 +13,13 @@ from ..utils import (
     ExtractorError,
     OnDemandPagedList,
     clean_html,
+    determine_ext,
     float_or_none,
     int_or_none,
     join_nonempty,
     parse_duration,
     parse_iso8601,
+    parse_qs,
     parse_resolution,
     qualities,
     remove_start,
@@ -26,6 +28,7 @@ from ..utils import (
     try_get,
     unescapeHTML,
     update_url_query,
+    url_basename,
     url_or_none,
     urlencode_postdata,
     urljoin,
@@ -430,6 +433,7 @@ class NiconicoIE(InfoExtractor):
                 'format_id': ('id', {str}),
                 'abr': ('bitRate', {float_or_none(scale=1000)}),
                 'asr': ('samplingRate', {int_or_none}),
+                'quality': ('qualityLevel', {int_or_none}),
             }), get_all=False),
             'acodec': 'aac',
         }
@@ -441,7 +445,9 @@ class NiconicoIE(InfoExtractor):
         min_abr = min(traverse_obj(audios, (..., 'bitRate', {float_or_none})), default=0) / 1000
         for video_fmt in video_fmts:
             video_fmt['tbr'] -= min_abr
-            video_fmt['format_id'] = f'video-{video_fmt["tbr"]:.0f}'
+            video_fmt['format_id'] = url_basename(video_fmt['url']).rpartition('.')[0]
+            video_fmt['quality'] = traverse_obj(videos, (
+                lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1
             yield video_fmt

     def _real_extract(self, url):
@@ -1033,6 +1039,7 @@ class NiconicoLiveIE(InfoExtractor):
                 thumbnails.append({
                     'id': f'{name}_{width}x{height}',
                     'url': img_url,
+                    'ext': traverse_obj(parse_qs(img_url), ('image', 0, {determine_ext(default_ext='jpg')})),
                     **res,
                 })

View File

@@ -67,7 +67,7 @@ class OpenRecBaseIE(InfoExtractor):

 class OpenRecIE(OpenRecBaseIE):
     IE_NAME = 'openrec'
-    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.openrec.tv/live/2p8v31qe4zy',
         'only_matching': True,
@@ -85,7 +85,7 @@ class OpenRecIE(OpenRecBaseIE):

 class OpenRecCaptureIE(OpenRecBaseIE):
     IE_NAME = 'openrec:capture'
-    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.openrec.tv/capture/l9nk2x4gn14',
         'only_matching': True,
@@ -129,7 +129,7 @@ class OpenRecCaptureIE(OpenRecBaseIE):

 class OpenRecMovieIE(OpenRecBaseIE):
     IE_NAME = 'openrec:movie'
-    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.openrec.tv/movie/nqz5xl5km8v',
         'info_dict': {
@@ -141,6 +141,9 @@ class OpenRecMovieIE(OpenRecBaseIE):
             'uploader_id': 'taiki_to_kazuhiro',
             'timestamp': 1638856800,
         },
+    }, {
+        'url': 'https://www.openrec.tv/movie/2p8vvex548y?playlist_id=98brq96vvsgn2nd',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

View File

@@ -23,9 +23,9 @@ class PinterestBaseIE(InfoExtractor):
     def _call_api(self, resource, video_id, options):
         return self._download_json(
             f'https://www.pinterest.com/resource/{resource}Resource/get/',
-            video_id, f'Download {resource} JSON metadata', query={
-                'data': json.dumps({'options': options}),
-            })['resource_response']
+            video_id, f'Download {resource} JSON metadata',
+            query={'data': json.dumps({'options': options})},
+            headers={'X-Pinterest-PWS-Handler': 'www/[username].js'})['resource_response']

     def _extract_video(self, data, extract_formats=True):
         video_id = data['id']
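
The functional change above is adding the X-Pinterest-PWS-Handler header while keeping the JSON options in the `data` query parameter. A standalone sketch of the same call shape (the resource name and options below are illustrative, not a definitive list of what Pinterest accepts):

import json
import urllib.parse
import urllib.request

options = {'id': '1234567890'}  # illustrative pin options
url = 'https://www.pinterest.com/resource/PinResource/get/?' + urllib.parse.urlencode(
    {'data': json.dumps({'options': options})})
req = urllib.request.Request(url, headers={'X-Pinterest-PWS-Handler': 'www/[username].js'})
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)['resource_response'])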

View File

@@ -1,4 +1,7 @@
+import base64
+import hashlib
 import json
+import uuid

 from .common import InfoExtractor
 from ..utils import (
@@ -142,39 +145,73 @@ class PlaySuisseIE(InfoExtractor):
             id
             url
         }'''
-    _LOGIN_BASE_URL = 'https://login.srgssr.ch/srgssrlogin.onmicrosoft.com'
-    _LOGIN_PATH = 'B2C_1A__SignInV2'
+    _CLIENT_ID = '1e33f1bf-8bf3-45e4-bbd9-c9ad934b5fca'
+    _LOGIN_BASE = 'https://account.srgssr.ch'
     _ID_TOKEN = None

     def _perform_login(self, username, password):
-        login_page = self._download_webpage(
-            'https://www.playsuisse.ch/api/sso/login', None, note='Downloading login page',
-            query={'x': 'x', 'locale': 'de', 'redirectUrl': 'https://www.playsuisse.ch/'})
-        settings = self._search_json(r'var\s+SETTINGS\s*=', login_page, 'settings', None)
-        csrf_token = settings['csrf']
-        query = {'tx': settings['transId'], 'p': self._LOGIN_PATH}
+        code_verifier = uuid.uuid4().hex + uuid.uuid4().hex + uuid.uuid4().hex
+        code_challenge = base64.urlsafe_b64encode(
+            hashlib.sha256(code_verifier.encode()).digest()).decode().rstrip('=')

-        status = traverse_obj(self._download_json(
-            f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/SelfAsserted', None, 'Logging in',
-            query=query, headers={'X-CSRF-TOKEN': csrf_token}, data=urlencode_postdata({
-                'request_type': 'RESPONSE',
-                'signInName': username,
-                'password': password,
-            }), expected_status=400), ('status', {int_or_none}))
-        if status == 400:
-            raise ExtractorError('Invalid username or password', expected=True)
+        request_id = parse_qs(self._request_webpage(
+            f'{self._LOGIN_BASE}/authz-srv/authz', None, 'Requesting session ID', query={
+                'client_id': self._CLIENT_ID,
+                'redirect_uri': 'https://www.playsuisse.ch/auth',
+                'scope': 'email profile openid offline_access',
+                'response_type': 'code',
+                'code_challenge': code_challenge,
+                'code_challenge_method': 'S256',
+                'view_type': 'login',
+            }).url)['requestId'][0]

-        urlh = self._request_webpage(
-            f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/api/CombinedSigninAndSignup/confirmed',
-            None, 'Downloading ID token', query={
-                'rememberMe': 'false',
-                'csrf_token': csrf_token,
-                **query,
-                'diags': '',
-            })
+        try:
+            exchange_id = self._download_json(
+                f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/initiate/password', None,
+                'Submitting username', headers={'content-type': 'application/json'}, data=json.dumps({
+                    'usage_type': 'INITIAL_AUTHENTICATION',
+                    'request_id': request_id,
+                    'medium_id': 'PASSWORD',
+                    'type': 'password',
+                    'identifier': username,
+                }).encode())['data']['exchange_id']['exchange_id']
+        except ExtractorError:
+            raise ExtractorError('Invalid username', expected=True)
+
+        try:
+            login_data = self._download_json(
+                f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/authenticate/password', None,
+                'Submitting password', headers={'content-type': 'application/json'}, data=json.dumps({
+                    'requestId': request_id,
+                    'exchange_id': exchange_id,
+                    'type': 'password',
+                    'password': password,
+                }).encode())['data']
+        except ExtractorError:
+            raise ExtractorError('Invalid password', expected=True)
+
+        authorization_code = parse_qs(self._request_webpage(
+            f'{self._LOGIN_BASE}/login-srv/verification/login', None, 'Logging in',
+            data=urlencode_postdata({
+                'requestId': request_id,
+                'exchange_id': login_data['exchange_id']['exchange_id'],
+                'verificationType': 'password',
+                'sub': login_data['sub'],
+                'status_id': login_data['status_id'],
+                'rememberMe': True,
+                'lat': '',
+                'lon': '',
+            })).url)['code'][0]
+
+        self._ID_TOKEN = self._download_json(
+            f'{self._LOGIN_BASE}/proxy/token', None, 'Downloading token', data=b'', query={
+                'client_id': self._CLIENT_ID,
+                'redirect_uri': 'https://www.playsuisse.ch/auth',
+                'code': authorization_code,
+                'code_verifier': code_verifier,
+                'grant_type': 'authorization_code',
+            })['id_token']

-        self._ID_TOKEN = traverse_obj(parse_qs(urlh.url), ('id_token', 0))
         if not self._ID_TOKEN:
             raise ExtractorError('Login failed')
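
The rewritten login is a standard OAuth 2.0 authorization-code flow with PKCE. The verifier/challenge pair at its core can be reproduced in isolation; this mirrors the exact transform used in the diff:

import base64
import hashlib
import uuid

# Random high-entropy verifier, as in the diff (three concatenated UUID hex strings)
code_verifier = uuid.uuid4().hex + uuid.uuid4().hex + uuid.uuid4().hex
# S256 challenge: BASE64URL(SHA256(verifier)) with '=' padding stripped
code_challenge = base64.urlsafe_b64encode(
    hashlib.sha256(code_verifier.encode()).digest()).decode().rstrip('=')

# Only the challenge is sent with the initial /authz-srv/authz request; the
# verifier is presented later at /proxy/token so the server can check
# SHA256(verifier) against the stored challenge.
print(code_challenge)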

View File

@@ -8,6 +8,7 @@ from ..utils import (
     int_or_none,
     parse_qs,
     traverse_obj,
+    truncate_string,
     try_get,
     unescapeHTML,
     update_url_query,
@@ -26,6 +27,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': '6rrwyj',
             'title': 'That small heart attack.',
+            'alt_title': 'That small heart attack.',
             'thumbnail': r're:^https?://.*\.(?:jpg|png)',
             'thumbnails': 'count:4',
             'timestamp': 1501941939,
@@ -49,7 +51,8 @@ class RedditIE(InfoExtractor):
             'id': 'gyh95hiqc0b11',
             'ext': 'mp4',
             'display_id': '90bu6w',
-            'title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead',
+            'title': 'Heat index was 110 degrees so we offered him a cold drink. He went fo...',
+            'alt_title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead',
             'thumbnail': r're:^https?://.*\.(?:jpg|png)',
             'thumbnails': 'count:7',
             'timestamp': 1532051078,
@@ -69,7 +72,8 @@ class RedditIE(InfoExtractor):
             'id': 'zasobba6wp071',
             'ext': 'mp4',
             'display_id': 'nip71r',
-            'title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.',
+            'title': 'I plan to make more stickers and prints! Check them out on my Etsy! O...',
+            'alt_title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.',
             'thumbnail': r're:^https?://.*\.(?:jpg|png)',
             'thumbnails': 'count:5',
             'timestamp': 1621709093,
@@ -91,7 +95,17 @@ class RedditIE(InfoExtractor):
         'playlist_count': 2,
         'info_dict': {
             'id': 'wzqkxp',
-            'title': 'md5:72d3d19402aa11eff5bd32fc96369b37',
+            'title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the ...',
+            'alt_title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the Day We Meet Again" Discussion',
+            'description': 'md5:5b7deb328062b164b15704c5fd67c335',
+            'uploader': 'TheTwelveYearOld',
+            'channel_id': 'KamenRider',
+            'comment_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'age_limit': 0,
+            'timestamp': 1661676059.0,
+            'upload_date': '20220828',
         },
     }, {
         # crossposted reddit-hosted media
@@ -102,6 +116,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': 'zjjw82',
             'title': 'Cringe',
+            'alt_title': 'Cringe',
             'uploader': 'Otaku-senpai69420',
             'thumbnail': r're:^https?://.*\.(?:jpg|png)',
             'upload_date': '20221212',
@@ -122,6 +137,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': '124pp33',
             'title': 'Harmless prank of some old friends',
+            'alt_title': 'Harmless prank of some old friends',
             'uploader': 'Dudezila',
             'channel_id': 'ContagiousLaughter',
             'duration': 17,
@@ -142,6 +158,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': '12fujy3',
             'title': 'Based Hasan?',
+            'alt_title': 'Based Hasan?',
             'uploader': 'KingNigelXLII',
             'channel_id': 'GenZedong',
             'duration': 16,
@@ -161,6 +178,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': '1cl9h0u',
             'title': 'The insurance claim will be interesting',
+            'alt_title': 'The insurance claim will be interesting',
             'uploader': 'darrenpauli',
             'channel_id': 'Unexpected',
             'duration': 53,
@@ -183,6 +201,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': '1cxwzso',
             'title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
+            'alt_title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
             'uploader': 'Woodstovia',
             'channel_id': 'soccer',
             'duration': 30,
@@ -206,6 +225,7 @@ class RedditIE(InfoExtractor):
             'ext': 'mp4',
             'display_id': 'degtjo',
             'title': 'When the K hits',
+            'alt_title': 'When the K hits',
             'uploader': '[deleted]',
             'channel_id': 'ketamine',
             'comment_count': int,
@@ -304,14 +324,6 @@ class RedditIE(InfoExtractor):
             data = data[0]['data']['children'][0]['data']

         video_url = data['url']

-        over_18 = data.get('over_18')
-        if over_18 is True:
-            age_limit = 18
-        elif over_18 is False:
-            age_limit = 0
-        else:
-            age_limit = None
-
         thumbnails = []

         def add_thumbnail(src):
@@ -337,15 +349,19 @@ class RedditIE(InfoExtractor):
                     add_thumbnail(resolution)

         info = {
-            'title': data.get('title'),
             'thumbnails': thumbnails,
-            'timestamp': float_or_none(data.get('created_utc')),
-            'uploader': data.get('author'),
-            'channel_id': data.get('subreddit'),
-            'like_count': int_or_none(data.get('ups')),
-            'dislike_count': int_or_none(data.get('downs')),
-            'comment_count': int_or_none(data.get('num_comments')),
-            'age_limit': age_limit,
+            'age_limit': {True: 18, False: 0}.get(data.get('over_18')),
+            **traverse_obj(data, {
+                'title': ('title', {truncate_string(left=72)}),
+                'alt_title': ('title', {str}),
+                'description': ('selftext', {str}, filter),
+                'timestamp': ('created_utc', {float_or_none}),
+                'uploader': ('author', {str}),
+                'channel_id': ('subreddit', {str}),
+                'like_count': ('ups', {int_or_none}),
+                'dislike_count': ('downs', {int_or_none}),
+                'comment_count': ('num_comments', {int_or_none}),
+            }),
         }

         parsed_url = urllib.parse.urlparse(video_url)
@@ -371,7 +387,7 @@ class RedditIE(InfoExtractor):
                 **info,
             })
         if entries:
-            return self.playlist_result(entries, video_id, info.get('title'))
+            return self.playlist_result(entries, video_id, **info)
         raise ExtractorError('No media found', expected=True)

         # Check if media is hosted on reddit:
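
On the title changes in the tests above: titles now pass through truncate_string(left=72), which caps them at 72 characters including a trailing '...'. A behavior-equivalent stand-in (not yt-dlp's actual implementation) that reproduces the truncated test titles:

def truncate(text, left=72):
    # Keep left-3 characters and append '...' when the text is too long
    return text if len(text) <= left else text[:left - 3] + '...'

title = 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead'
assert truncate(title) == 'Heat index was 110 degrees so we offered him a cold drink. He went fo...'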

View File

@@ -3,12 +3,20 @@ import json
 import re
 import urllib.parse

-from .common import InfoExtractor
-from ..utils import js_to_json
+from .common import InfoExtractor, Request
+from ..utils import (
+    determine_ext,
+    int_or_none,
+    js_to_json,
+    parse_duration,
+    parse_iso8601,
+    url_or_none,
+)
+from ..utils.traversal import traverse_obj


 class RTPIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:(?:estudoemcasa|palco|zigzag)/)?p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:[^/#?]+/)?p(?P<program_id>\d+)/(?P<id>e\d+)'
     _TESTS = [{
         'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas',
         'md5': 'e736ce0c665e459ddb818546220b4ef8',
@@ -16,99 +24,173 @@ class RTPIE(InfoExtractor):
             'id': 'e174042',
             'ext': 'mp3',
             'title': 'Paixões Cruzadas',
-            'description': 'As paixões musicais de António Cartaxo e António Macedo',
+            'description': 'md5:af979e58ba0ab73f78435fc943fdb070',
             'thumbnail': r're:^https?://.*\.jpg',
+            'series': 'Paixões Cruzadas',
+            'duration': 2950.0,
+            'modified_timestamp': 1553693464,
+            'modified_date': '20190327',
+            'timestamp': 1417219200,
+            'upload_date': '20141129',
         },
     }, {
         'url': 'https://www.rtp.pt/play/zigzag/p13166/e757904/25-curiosidades-25-de-abril',
-        'md5': '9a81ed53f2b2197cfa7ed455b12f8ade',
+        'md5': '5b4859940e3adef61247a77dfb76046a',
         'info_dict': {
             'id': 'e757904',
             'ext': 'mp4',
-            'title': '25 Curiosidades, 25 de Abril',
-            'description': 'Estudar ou não estudar - Em cada um dos episódios descobrimos uma curiosidade acerca de como era viver em Portugal antes da revolução do 25 de abr',
+            'title': 'Estudar ou não estudar',
+            'description': 'md5:3bfd7eb8bebfd5711a08df69c9c14c35',
             'thumbnail': r're:^https?://.*\.jpg',
+            'timestamp': 1711958401,
+            'duration': 146.0,
+            'upload_date': '20240401',
+            'modified_timestamp': 1712242991,
+            'series': '25 Curiosidades, 25 de Abril',
+            'episode_number': 2,
+            'episode': 'Estudar ou não estudar',
+            'modified_date': '20240404',
         },
     }, {
-        'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/portugues-1-ano',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.rtp.pt/play/palco/p13785/l7nnon',
-        'only_matching': True,
+        # Episode not accessible through API
+        'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/e500050/portugues-1-ano',
+        'md5': '57660c0b46db9f22118c52cbd65975e4',
+        'info_dict': {
+            'id': 'e500050',
+            'ext': 'mp4',
+            'title': 'Português - 1.º ano',
+            'duration': 1669.0,
+            'description': 'md5:be68925c81269f8c6886589f25fe83ea',
+            'upload_date': '20201020',
+            'timestamp': 1603180799,
+            'thumbnail': 'https://cdn-images.rtp.pt/EPG/imagens/39482_59449_64850.png?v=3&w=860',
+        },
     }]

+    _USER_AGENT = 'rtpplay/2.0.66 (pt.rtp.rtpplay; build:2066; iOS 15.8.3) Alamofire/5.9.1'
+    _AUTH_TOKEN = None
+
+    def _fetch_auth_token(self):
+        if self._AUTH_TOKEN:
+            return self._AUTH_TOKEN
+        self._AUTH_TOKEN = traverse_obj(self._download_json(Request(
+            'https://rtpplayapi.rtp.pt/play/api/2/token-manager',
+            headers={
+                'Accept': '*/*',
+                'rtp-play-auth': 'RTPPLAY_MOBILE_IOS',
+                'rtp-play-auth-hash': 'fac9c328b2f27e26e03d7f8942d66c05b3e59371e16c2a079f5c83cc801bd3ee',
+                'rtp-play-auth-timestamp': '2145973229682',
+                'User-Agent': self._USER_AGENT,
+            }, extensions={'keep_header_casing': True}), None,
+            note='Fetching guest auth token', errnote='Could not fetch guest auth token',
+            fatal=False), ('token', 'token', {str}))
+        return self._AUTH_TOKEN
+
+    @staticmethod
+    def _cleanup_media_url(url):
+        if urllib.parse.urlparse(url).netloc == 'streaming-ondemand.rtp.pt':
+            return None
+        return url.replace('/drm-fps/', '/hls/').replace('/drm-dash/', '/dash/')
+
+    def _extract_formats(self, media_urls, episode_id):
+        formats = []
+        subtitles = {}
+        for media_url in set(traverse_obj(media_urls, (..., {url_or_none}, {self._cleanup_media_url}))):
+            ext = determine_ext(media_url)
+            if ext == 'm3u8':
+                fmts, subs = self._extract_m3u8_formats_and_subtitles(
+                    media_url, episode_id, m3u8_id='hls', fatal=False)
+                formats.extend(fmts)
+                self._merge_subtitles(subs, target=subtitles)
+            elif ext == 'mpd':
+                fmts, subs = self._extract_mpd_formats_and_subtitles(
+                    media_url, episode_id, mpd_id='dash', fatal=False)
+                formats.extend(fmts)
+                self._merge_subtitles(subs, target=subtitles)
+            else:
+                formats.append({
+                    'url': media_url,
+                    'format_id': 'http',
+                })
+        return formats, subtitles
+
+    def _extract_from_api(self, program_id, episode_id):
+        auth_token = self._fetch_auth_token()
+        if not auth_token:
+            return
+        episode_data = traverse_obj(self._download_json(
+            f'https://www.rtp.pt/play/api/1/get-episode/{program_id}/{episode_id[1:]}', episode_id,
+            query={'include_assets': 'true', 'include_webparams': 'true'},
+            headers={
+                'Accept': '*/*',
+                'Authorization': f'Bearer {auth_token}',
+                'User-Agent': self._USER_AGENT,
+            }, fatal=False), 'result', {dict})
+        if not episode_data:
+            return
+
+        asset_urls = traverse_obj(episode_data, ('assets', 0, 'asset_url', {dict}))
+        media_urls = traverse_obj(asset_urls, (
+            ((('hls', 'dash'), 'stream_url'), ('multibitrate', ('url_hls', 'url_dash'))),))
+        formats, subtitles = self._extract_formats(media_urls, episode_id)
+
+        for sub_data in traverse_obj(asset_urls, ('subtitles', 'vtt_list', lambda _, v: url_or_none(v['file']))):
+            subtitles.setdefault(sub_data.get('code') or 'pt', []).append({
+                'url': sub_data['file'],
+                'name': sub_data.get('language'),
+            })
+
+        return {
+            'id': episode_id,
+            'formats': formats,
+            'subtitles': subtitles,
+            'thumbnail': traverse_obj(episode_data, ('assets', 0, 'asset_thumbnail', {url_or_none})),
+            **traverse_obj(episode_data, ('episode', {
+                'title': (('episode_title', 'program_title'), {str}, filter, any),
+                'alt_title': ('episode_subtitle', {str}, filter),
+                'description': (('episode_description', 'episode_summary'), {str}, filter, any),
+                'timestamp': ('episode_air_date', {parse_iso8601(delimiter=' ')}),
+                'modified_timestamp': ('episode_lastchanged', {parse_iso8601(delimiter=' ')}),
+                'duration': ('episode_duration_complete', {parse_duration}),
+                'episode': ('episode_title', {str}, filter),
+                'episode_number': ('episode_number', {int_or_none}),
+                'season': ('program_season', {str}, filter),
+                'series': ('program_title', {str}, filter),
+            })),
+        }
+
     _RX_OBFUSCATION = re.compile(r'''(?xs)
         atob\s*\(\s*decodeURIComponent\s*\(\s*
             (\[[0-9A-Za-z%,'"]*\])
         \s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
     ''')

-    def __unobfuscate(self, data, *, video_id):
-        if data.startswith('{'):
-            data = self._RX_OBFUSCATION.sub(
-                lambda m: json.dumps(
-                    base64.b64decode(urllib.parse.unquote(
-                        ''.join(self._parse_json(m.group(1), video_id)),
-                    )).decode('iso-8859-1')),
-                data)
-        return js_to_json(data)
+    def __unobfuscate(self, data):
+        return self._RX_OBFUSCATION.sub(
+            lambda m: json.dumps(
+                base64.b64decode(urllib.parse.unquote(
+                    ''.join(json.loads(m.group(1))),
+                )).decode('iso-8859-1')),
+            data)

-    def _real_extract(self, url):
-        video_id = self._match_id(url)
+    def _extract_from_html(self, url, episode_id):
+        webpage = self._download_webpage(url, episode_id)

-        webpage = self._download_webpage(url, video_id)
-        title = self._html_search_meta(
-            'twitter:title', webpage, display_name='title', fatal=True)
-
-        f, config = self._search_regex(
-            r'''(?sx)
-                (?:var\s+f\s*=\s*(?P<f>".*?"|{[^;]+?});\s*)?
-                var\s+player1\s+=\s+new\s+RTPPlayer\s*\((?P<config>{(?:(?!\*/).)+?})\);(?!\s*\*/)
-            ''', webpage,
-            'player config', group=('f', 'config'))
-        config = self._parse_json(
-            config, video_id,
-            lambda data: self.__unobfuscate(data, video_id=video_id))
-        f = config['file'] if not f else self._parse_json(
-            f, video_id,
-            lambda data: self.__unobfuscate(data, video_id=video_id))
-
         formats = []
-        if isinstance(f, dict):
-            f_hls = f.get('hls')
-            if f_hls is not None:
-                formats.extend(self._extract_m3u8_formats(
-                    f_hls, video_id, 'mp4', 'm3u8_native', m3u8_id='hls'))
-
-            f_dash = f.get('dash')
-            if f_dash is not None:
-                formats.extend(self._extract_mpd_formats(f_dash, video_id, mpd_id='dash'))
-        else:
-            formats.append({
-                'format_id': 'f',
-                'url': f,
-                'vcodec': 'none' if config.get('mediaType') == 'audio' else None,
-            })
-
         subtitles = {}
-
-        vtt = config.get('vtt')
-        if vtt is not None:
-            for lcode, lname, url in vtt:
-                subtitles.setdefault(lcode, []).append({
-                    'name': lname,
-                    'url': url,
-                })
+        media_urls = traverse_obj(re.findall(r'(?:var\s+f\s*=|RTPPlayer\({[^}]+file:)\s*({[^}]+}|"[^"]+")', webpage), (
+            -1, (({self.__unobfuscate}, {js_to_json}, {json.loads}, {dict.values}, ...), {json.loads})))
+        formats, subtitles = self._extract_formats(media_urls, episode_id)

         return {
-            'id': video_id,
-            'title': title,
+            'id': episode_id,
             'formats': formats,
-            'description': self._html_search_meta(['description', 'twitter:description'], webpage),
-            'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
             'subtitles': subtitles,
+            'description': self._html_search_meta(['og:description', 'twitter:description'], webpage, default=None),
+            'thumbnail': self._html_search_meta(['og:image', 'twitter:image'], webpage, default=None),
+            **self._search_json_ld(webpage, episode_id, default={}),
+            'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
         }
+
+    def _real_extract(self, url):
+        program_id, episode_id = self._match_valid_url(url).group('program_id', 'id')
+        return self._extract_from_api(program_id, episode_id) or self._extract_from_html(url, episode_id)
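
The HTML fallback above still has to undo the page-side obfuscation handled by __unobfuscate: the player config is embedded as atob(decodeURIComponent([...chunks...].join(""))), so decoding is join, percent-decode, base64-decode, then latin-1 text. A standalone sketch with made-up chunks (not taken from a real RTP page):

import base64
import json
import urllib.parse

chunks = ['eyJobHMi', 'OiJ4In0%3D']  # URI-encoded base64 pieces, illustrative
decoded = base64.b64decode(
    urllib.parse.unquote(''.join(chunks))).decode('iso-8859-1')
print(json.loads(decoded))  # -> {'hls': 'x'}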

View File

@@ -2,16 +2,18 @@ import urllib.parse

 from .common import InfoExtractor
 from ..utils import (
+    clean_html,
     dict_get,
     int_or_none,
     parse_duration,
     unified_timestamp,
+    url_or_none,
+    urljoin,
 )
+from ..utils.traversal import traverse_obj


-class SkyItPlayerIE(InfoExtractor):
-    IE_NAME = 'player.sky.it'
-    _VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
+class SkyItBaseIE(InfoExtractor):
     _GEO_BYPASS = False
     _DOMAIN = 'sky'
     _PLAYER_TMPL = 'https://player.sky.it/player/external.html?id=%s&domain=%s'
@@ -33,7 +35,6 @@ class SkyItPlayerIE(InfoExtractor):
             SkyItPlayerIE.ie_key(), video_id)

     def _parse_video(self, video, video_id):
-        title = video['title']
         is_live = video.get('type') == 'live'
         hls_url = video.get(('streaming' if is_live else 'hls') + '_url')
         if not hls_url and video.get('geoblock' if is_live else 'geob'):
@@ -43,7 +44,7 @@ class SkyItPlayerIE(InfoExtractor):

         return {
             'id': video_id,
-            'title': title,
+            'title': video.get('title'),
             'formats': formats,
             'thumbnail': dict_get(video, ('video_still', 'video_still_medium', 'thumb')),
             'description': video.get('short_desc') or None,
@@ -52,6 +53,11 @@ class SkyItPlayerIE(InfoExtractor):
             'is_live': is_live,
         }

+
+class SkyItPlayerIE(SkyItBaseIE):
+    IE_NAME = 'player.sky.it'
+    _VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
         domain = urllib.parse.parse_qs(urllib.parse.urlparse(
@@ -67,7 +73,7 @@ class SkyItPlayerIE(InfoExtractor):
         return self._parse_video(video, video_id)


-class SkyItVideoIE(SkyItPlayerIE):  # XXX: Do not subclass from concrete IE
+class SkyItVideoIE(SkyItBaseIE):
     IE_NAME = 'video.sky.it'
     _VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)'
     _TESTS = [{
@@ -96,7 +102,7 @@ class SkyItVideoIE(SkyItPlayerIE):  # XXX: Do not subclass from concrete IE
         return self._player_url_result(video_id)


-class SkyItVideoLiveIE(SkyItPlayerIE):  # XXX: Do not subclass from concrete IE
+class SkyItVideoLiveIE(SkyItBaseIE):
     IE_NAME = 'video.sky.it:live'
     _VALID_URL = r'https?://video\.sky\.it/diretta/(?P<id>[^/?&#]+)'
     _TEST = {
@@ -124,7 +130,7 @@ class SkyItVideoLiveIE(SkyItPlayerIE):  # XXX: Do not subclass from concrete IE
         return self._parse_video(livestream, asset_id)


-class SkyItIE(SkyItPlayerIE):  # XXX: Do not subclass from concrete IE
+class SkyItIE(SkyItBaseIE):
     IE_NAME = 'sky.it'
     _VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
     _TESTS = [{
@@ -223,3 +229,80 @@ class TV8ItIE(SkyItVideoIE):  # XXX: Do not subclass from concrete IE
         'params': {'skip_download': 'm3u8'},
     }]
     _DOMAIN = 'mtv8'
+
+
+class TV8ItLiveIE(SkyItBaseIE):
+    IE_NAME = 'tv8.it:live'
+    IE_DESC = 'TV8 Live'
+    _VALID_URL = r'https?://(?:www\.)?tv8\.it/streaming'
+    _TESTS = [{
+        'url': 'https://tv8.it/streaming',
+        'info_dict': {
+            'id': 'tv8',
+            'ext': 'mp4',
+            'title': str,
+            'description': str,
+            'is_live': True,
+            'live_status': 'is_live',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = 'tv8'
+        livestream = self._download_json(
+            'https://apid.sky.it/vdp/v1/getLivestream', video_id,
+            'Downloading manifest JSON', query={'id': '7'})
+        metadata = self._download_json('https://tv8.it/api/getStreaming', video_id, fatal=False)
+
+        return {
+            **self._parse_video(livestream, video_id),
+            **traverse_obj(metadata, ('info', {
+                'title': ('title', 'text', {str}),
+                'description': ('description', 'html', {clean_html}),
+            })),
+        }
+
+
+class TV8ItPlaylistIE(InfoExtractor):
+    IE_NAME = 'tv8.it:playlist'
+    IE_DESC = 'TV8 Playlist'
+    _VALID_URL = r'https?://(?:www\.)?tv8\.it/(?!video)[^/#?]+/(?P<id>[^/#?]+)'
+    _TESTS = [{
+        'url': 'https://tv8.it/intrattenimento/tv8-gialappas-night',
+        'playlist_mincount': 32,
+        'info_dict': {
+            'id': 'tv8-gialappas-night',
+            'title': 'Tv8 Gialappa\'s Night',
+            'description': 'md5:c876039d487d9cf40229b768872718ed',
+            'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
+        },
+    }, {
+        'url': 'https://tv8.it/sport/uefa-europa-league',
+        'playlist_mincount': 11,
+        'info_dict': {
+            'id': 'uefa-europa-league',
+            'title': 'UEFA Europa League',
+            'description': 'md5:9ab1832b7a8b1705b1f590e13a36bc6a',
+            'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
+        },
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        webpage = self._download_webpage(url, playlist_id)
+        data = self._search_nextjs_data(webpage, playlist_id)['props']['pageProps']['data']
+        entries = [self.url_result(
+            urljoin('https://tv8.it', card['href']), ie=TV8ItIE,
+            **traverse_obj(card, {
+                'description': ('extraData', 'videoDesc', {str}),
+                'id': ('extraData', 'asset_id', {str}),
+                'thumbnail': ('image', 'src', {url_or_none}),
+                'title': ('title', 'typography', 'text', {str}),
+            }))
+            for card in traverse_obj(data, ('lastContent', 'cards', lambda _, v: v['href']))]
+
+        return self.playlist_result(entries, playlist_id, **traverse_obj(data, ('card', 'desktop', {
+            'description': ('description', 'html', {clean_html}),
+            'thumbnail': ('image', 'src', {url_or_none}),
+            'title': ('title', 'text', {str}),
+        })))
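
The new playlist extractor leans on _search_nextjs_data to read the page's __NEXT_DATA__ payload. A simplified stand-in showing what that helper does (yt-dlp's real implementation is more robust):

import json
import re

def search_nextjs_data(webpage):
    # Next.js pages embed their server-side props as JSON in this script tag
    m = re.search(
        r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(.+?)</script>',
        webpage, re.DOTALL)
    return json.loads(m.group(1)) if m else None

page = '<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"data":{"card":{}}}}}</script>'
print(search_nextjs_data(page)['props']['pageProps']['data'])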

View File

@@ -0,0 +1,87 @@
from .common import InfoExtractor
from .vimeo import VHXEmbedIE
from ..utils import (
    ExtractorError,
    clean_html,
    update_url,
    urlencode_postdata,
)
from ..utils.traversal import find_element, traverse_obj


class SoftWhiteUnderbellyIE(InfoExtractor):
    _LOGIN_URL = 'https://www.softwhiteunderbelly.com/login'
    _NETRC_MACHINE = 'softwhiteunderbelly'
    _VALID_URL = r'https?://(?:www\.)?softwhiteunderbelly\.com/videos/(?P<id>[\w-]+)'
    _TESTS = [{
        'url': 'https://www.softwhiteunderbelly.com/videos/kenneth-final1',
        'note': 'A single Soft White Underbelly Episode',
        'md5': '8e79f29ec1f1bda6da2e0b998fcbebb8',
        'info_dict': {
            'id': '3201266',
            'ext': 'mp4',
            'display_id': 'kenneth-final1',
            'title': 'Appalachian Man interview-Kenneth',
            'description': 'Soft White Underbelly interview and portrait of Kenneth, an Appalachian man in Clay County, Kentucky.',
            'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/249f6db0-2b39-49a4-979b-f8dad4681825.jpg',
            'uploader_url': 'https://vimeo.com/user80538407',
            'uploader': 'OTT Videos',
            'uploader_id': 'user80538407',
            'duration': 512,
        },
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }, {
        'url': 'https://www.softwhiteunderbelly.com/videos/tj-2-final-2160p',
        'note': 'A single Soft White Underbelly Episode',
        'md5': '286bd8851b4824c62afb369e6f307036',
        'info_dict': {
            'id': '3506029',
            'ext': 'mp4',
            'display_id': 'tj-2-final-2160p',
            'title': 'Fentanyl Addict interview-TJ (follow up)',
            'description': 'Soft White Underbelly follow up interview and portrait of TJ, a fentanyl addict on Skid Row.',
            'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/c883d531-5da0-4faf-a2e2-8eba97e5adfc.jpg',
            'duration': 817,
            'uploader': 'OTT Videos',
            'uploader_url': 'https://vimeo.com/user80538407',
            'uploader_id': 'user80538407',
        },
        'expected_warnings': ['Failed to parse XML: not well-formed'],
    }]

    def _perform_login(self, username, password):
        signin_page = self._download_webpage(self._LOGIN_URL, None, 'Fetching authenticity token')
        self._download_webpage(
            self._LOGIN_URL, None, 'Logging in',
            data=urlencode_postdata({
                'email': username,
                'password': password,
                'authenticity_token': self._html_search_regex(
                    r'name=["\']authenticity_token["\']\s+value=["\']([^"\']+)', signin_page, 'authenticity_token'),
                'utf8': True,
            }),
        )

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        if '<div id="watch-unauthorized"' in webpage:
            if self._get_cookies('https://www.softwhiteunderbelly.com').get('_session'):
                raise ExtractorError('This account is not subscribed to this content', expected=True)
            self.raise_login_required()

        embed_url, embed_id = self._html_search_regex(
            r'embed_url:\s*["\'](?P<url>https?://embed\.vhx\.tv/videos/(?P<id>\d+)[^"\']*)',
            webpage, 'embed url', group=('url', 'id'))

        return {
            '_type': 'url_transparent',
            'ie_key': VHXEmbedIE.ie_key(),
            'url': VHXEmbedIE._smuggle_referrer(embed_url, 'https://www.softwhiteunderbelly.com'),
            'id': embed_id,
            'display_id': display_id,
            'title': traverse_obj(webpage, ({find_element(id='watch-info')}, {find_element(cls='video-title')}, {clean_html})),
            'description': self._html_search_meta('description', webpage, default=None),
            'thumbnail': update_url(self._og_search_thumbnail(webpage) or '', query=None) or None,
        }

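A stdlib-only sketch of the login flow above, assuming nothing beyond what the diff shows: the page embeds an authenticity_token that must be posted back with the credentials, and a cookie jar keeps the resulting _session cookie across requests.

import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hedged sketch of the authenticity-token dance in _perform_login above.
# The extractor uses yt-dlp's own HTTP stack; this approximates it with urllib.
LOGIN_URL = 'https://www.softwhiteunderbelly.com/login'
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))

def login(email, password):
    page = opener.open(LOGIN_URL).read().decode()
    token = re.search(
        r'name=["\']authenticity_token["\']\s+value=["\']([^"\']+)', page).group(1)
    form = urllib.parse.urlencode({
        'email': email,
        'password': password,
        'authenticity_token': token,
        'utf8': True,
    }).encode()
    return opener.open(LOGIN_URL, data=form)  # session cookie lands in the jar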
yt_dlp/extractor/soundcloud.py

@@ -52,7 +52,8 @@ class SoundcloudBaseIE(InfoExtractor):
    _API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
    _HEADERS = {}

-    _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
+    _IMAGE_REPL_RE = r'-[0-9a-z]+\.(?P<ext>jpg|png)'
+    _TAGS_RE = re.compile(r'"([^"]+)"|([^ ]+)')

    _ARTWORK_MAP = {
        'mini': 16,
@@ -331,12 +332,14 @@ class SoundcloudBaseIE(InfoExtractor):
        thumbnails = []
        artwork_url = info.get('artwork_url')
        thumbnail = artwork_url or user.get('avatar_url')
-        if isinstance(thumbnail, str):
-            if re.search(self._IMAGE_REPL_RE, thumbnail):
+        if url_or_none(thumbnail):
+            if mobj := re.search(self._IMAGE_REPL_RE, thumbnail):
                for image_id, size in self._ARTWORK_MAP.items():
+                    # Soundcloud serves JPEG regardless of URL's ext *except* for "original" thumb
+                    ext = mobj.group('ext') if image_id == 'original' else 'jpg'
                    i = {
                        'id': image_id,
-                        'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.jpg', thumbnail),
+                        'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.{ext}', thumbnail),
                    }
                    if image_id == 'tiny' and not artwork_url:
                        size = 18
@@ -372,6 +375,7 @@ class SoundcloudBaseIE(InfoExtractor):
            'comment_count': extract_count('comment'),
            'repost_count': extract_count('reposts'),
            'genres': traverse_obj(info, ('genre', {str}, filter, all, filter)),
+            'tags': traverse_obj(info, ('tag_list', {self._TAGS_RE.findall}, ..., ..., filter)),
            'artists': traverse_obj(info, ('publisher_metadata', 'artist', {str}, filter, all, filter)),
            'formats': formats if not extract_flat else None,
        }
@@ -425,6 +429,7 @@ class SoundcloudIE(SoundcloudBaseIE):
                'repost_count': int,
                'thumbnail': 'https://i1.sndcdn.com/artworks-000031955188-rwb18x-original.jpg',
                'uploader_url': 'https://soundcloud.com/ethmusic',
+                'tags': 'count:14',
            },
        },
        # geo-restricted
@@ -440,7 +445,7 @@ class SoundcloudIE(SoundcloudBaseIE):
                'uploader_id': '9615865',
                'timestamp': 1337635207,
                'upload_date': '20120521',
-                'duration': 227.155,
+                'duration': 227.103,
                'license': 'all-rights-reserved',
                'view_count': int,
                'like_count': int,
@@ -450,6 +455,7 @@ class SoundcloudIE(SoundcloudBaseIE):
                'thumbnail': 'https://i1.sndcdn.com/artworks-v8bFHhXm7Au6-0-original.jpg',
                'genres': ['Alternative'],
                'artists': ['The Royal Concept'],
+                'tags': [],
            },
        },
        # private link
@@ -475,6 +481,7 @@ class SoundcloudIE(SoundcloudBaseIE):
                'uploader_url': 'https://soundcloud.com/jaimemf',
                'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
                'genres': ['youtubedl'],
+                'tags': [],
            },
        },
        # private link (alt format)
@@ -500,15 +507,16 @@ class SoundcloudIE(SoundcloudBaseIE):
                'uploader_url': 'https://soundcloud.com/jaimemf',
                'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
                'genres': ['youtubedl'],
+                'tags': [],
            },
        },
        # downloadable song
        {
            'url': 'https://soundcloud.com/the80m/the-following',
-            'md5': '9ffcddb08c87d74fb5808a3c183a1d04',
+            'md5': 'ecb87d7705d5f53e6c02a63760573c75',  # wav: '9ffcddb08c87d74fb5808a3c183a1d04'
            'info_dict': {
                'id': '343609555',
-                'ext': 'wav',
+                'ext': 'opus',  # wav original available with auth
                'title': 'The Following',
                'track': 'The Following',
                'description': '',
@@ -526,15 +534,18 @@ class SoundcloudIE(SoundcloudBaseIE):
                'view_count': int,
                'genres': ['Dance & EDM'],
                'artists': ['80M'],
+                'tags': ['80M', 'EDM', 'Dance', 'Music'],
            },
+            'expected_warnings': ['Original download format is only available for registered users'],
        },
        # private link, downloadable format
+        # tags with spaces (e.g. "Uplifting Trance", "Ori Uplift")
        {
            'url': 'https://soundcloud.com/oriuplift/uponly-238-no-talking-wav/s-AyZUd',
-            'md5': '64a60b16e617d41d0bef032b7f55441e',
+            'md5': '2e1530d0e9986a833a67cb34fc90ece0',  # wav: '64a60b16e617d41d0bef032b7f55441e'
            'info_dict': {
                'id': '340344461',
-                'ext': 'wav',
+                'ext': 'opus',  # wav original available with auth
                'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
                'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
                'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
@@ -552,7 +563,9 @@ class SoundcloudIE(SoundcloudBaseIE):
                'uploader_url': 'https://soundcloud.com/oriuplift',
                'genres': ['Trance'],
                'artists': ['Ori Uplift'],
+                'tags': ['Orchestral', 'Emotional', 'Uplifting Trance', 'Trance', 'Ori Uplift', 'UpOnly'],
            },
+            'expected_warnings': ['Original download format is only available for registered users'],
        },
        # no album art, use avatar pic for thumbnail
        {
@@ -577,6 +590,7 @@ class SoundcloudIE(SoundcloudBaseIE):
                'repost_count': int,
                'uploader_url': 'https://soundcloud.com/garyvee',
                'artists': ['MadReal'],
+                'tags': [],
            },
            'params': {
                'skip_download': True,
@@ -604,8 +618,47 @@ class SoundcloudIE(SoundcloudBaseIE):
                'repost_count': int,
                'genres': ['Piano'],
                'uploader_url': 'https://soundcloud.com/giovannisarani',
+                'tags': 'count:10',
            },
        },
+        # .png "original" artwork, 160kbps m4a HLS format
+        {
+            'url': 'https://soundcloud.com/skorxh/audio-dealer',
+            'info_dict': {
+                'id': '2011421339',
+                'ext': 'm4a',
+                'title': 'audio dealer',
+                'description': '',
+                'uploader': '$KORCH',
+                'uploader_id': '150292288',
+                'uploader_url': 'https://soundcloud.com/skorxh',
+                'comment_count': int,
+                'view_count': int,
+                'like_count': int,
+                'repost_count': int,
+                'duration': 213.469,
+                'tags': [],
+                'artists': ['$KORXH'],
+                'track': 'audio dealer',
+                'timestamp': 1737143201,
+                'upload_date': '20250117',
+                'license': 'all-rights-reserved',
+                'thumbnail': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png',
+                'thumbnails': [
+                    {'id': 'mini', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-mini.jpg'},
+                    {'id': 'tiny', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-tiny.jpg'},
+                    {'id': 'small', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-small.jpg'},
+                    {'id': 'badge', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-badge.jpg'},
+                    {'id': 't67x67', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t67x67.jpg'},
+                    {'id': 'large', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-large.jpg'},
+                    {'id': 't300x300', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t300x300.jpg'},
+                    {'id': 'crop', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-crop.jpg'},
+                    {'id': 't500x500', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t500x500.jpg'},
+                    {'id': 'original', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png'},
+                ],
+            },
+            'params': {'skip_download': 'm3u8', 'format': 'hls_aac_160k'},
+        },
        {
            # AAC HQ format available (account with active subscription needed)
            'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',

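The new _TAGS_RE handles SoundCloud's space-separated tag_list, where multi-word tags are quoted. Because findall on an alternation of two capture groups yields (quoted, bare) tuples, flattening is a matter of picking the non-empty side, which is what the `..., ..., filter` steps in the traversal accomplish. A standalone sketch:

import re

# Tag lists on SoundCloud are space-separated, with multi-word tags quoted,
# e.g. 'Orchestral "Uplifting Trance" UpOnly'.
TAGS_RE = re.compile(r'"([^"]+)"|([^ ]+)')

def parse_tags(tag_list):
    # findall yields (quoted, bare) tuples; exactly one side is non-empty
    return [quoted or bare for quoted, bare in TAGS_RE.findall(tag_list)]

print(parse_tags('Orchestral Emotional "Uplifting Trance" Trance "Ori Uplift" UpOnly'))
# ['Orchestral', 'Emotional', 'Uplifting Trance', 'Trance', 'Ori Uplift', 'UpOnly']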
yt_dlp/extractor/sovietscloset.py

@@ -1,5 +1,6 @@
+from .bunnycdn import BunnyCdnIE
from .common import InfoExtractor
-from ..utils import try_get, unified_timestamp
+from ..utils import make_archive_id, try_get, unified_timestamp


class SovietsClosetBaseIE(InfoExtractor):
@@ -43,7 +44,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
            'url': 'https://sovietscloset.com/video/1337',
            'md5': 'bd012b04b261725510ca5383074cdd55',
            'info_dict': {
-                'id': '1337',
+                'id': '2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67',
                'ext': 'mp4',
                'title': 'The Witcher #13',
                'thumbnail': r're:^https?://.*\.b-cdn\.net/2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67/thumbnail\.jpg$',
@@ -55,20 +56,23 @@ class SovietsClosetIE(SovietsClosetBaseIE):
                'upload_date': '20170413',
                'uploader_id': 'SovietWomble',
                'uploader_url': 'https://www.twitch.tv/SovietWomble',
-                'duration': 7007,
+                'duration': 7008,
                'was_live': True,
                'availability': 'public',
                'series': 'The Witcher',
                'season': 'Misc',
                'episode_number': 13,
                'episode': 'Episode 13',
+                'creators': ['SovietWomble'],
+                'description': '',
+                '_old_archive_ids': ['sovietscloset 1337'],
            },
        },
        {
            'url': 'https://sovietscloset.com/video/1105',
            'md5': '89fa928f183893cb65a0b7be846d8a90',
            'info_dict': {
-                'id': '1105',
+                'id': 'c0e5e76f-3a93-40b4-bf01-12343c2eec5d',
                'ext': 'mp4',
                'title': 'Arma 3 - Zeus Games #5',
                'uploader': 'SovietWomble',
@@ -80,39 +84,20 @@ class SovietsClosetIE(SovietsClosetBaseIE):
                'upload_date': '20160420',
                'uploader_id': 'SovietWomble',
                'uploader_url': 'https://www.twitch.tv/SovietWomble',
-                'duration': 8804,
+                'duration': 8805,
                'was_live': True,
                'availability': 'public',
                'series': 'Arma 3',
                'season': 'Zeus Games',
                'episode_number': 5,
                'episode': 'Episode 5',
+                'creators': ['SovietWomble'],
+                'description': '',
+                '_old_archive_ids': ['sovietscloset 1105'],
            },
        },
    ]

-    def _extract_bunnycdn_iframe(self, video_id, bunnycdn_id):
-        iframe = self._download_webpage(
-            f'https://iframe.mediadelivery.net/embed/5105/{bunnycdn_id}',
-            video_id, note='Downloading BunnyCDN iframe', headers=self.MEDIADELIVERY_REFERER)
-
-        m3u8_url = self._search_regex(r'(https?://.*?\.m3u8)', iframe, 'm3u8 url')
-        thumbnail_url = self._search_regex(r'(https?://.*?thumbnail\.jpg)', iframe, 'thumbnail url')
-
-        m3u8_formats = self._extract_m3u8_formats(m3u8_url, video_id, headers=self.MEDIADELIVERY_REFERER)
-
-        if not m3u8_formats:
-            duration = None
-        else:
-            duration = self._extract_m3u8_vod_duration(
-                m3u8_formats[0]['url'], video_id, headers=self.MEDIADELIVERY_REFERER)
-
-        return {
-            'formats': m3u8_formats,
-            'thumbnail': thumbnail_url,
-            'duration': duration,
-        }
-
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
@@ -122,13 +107,13 @@ class SovietsClosetIE(SovietsClosetBaseIE):

        stream = self.parse_nuxt_jsonp(f'{static_assets_base}/video/{video_id}/payload.js', video_id, 'video')['stream']

-        return {
+        return self.url_result(
+            f'https://iframe.mediadelivery.net/embed/5105/{stream["bunnyId"]}', ie=BunnyCdnIE, url_transparent=True,
            **self.video_meta(
                video_id=video_id, game_name=stream['game']['name'],
                category_name=try_get(stream, lambda x: x['subcategory']['name'], str),
                episode_number=stream.get('number'), stream_date=stream.get('date')),
-            **self._extract_bunnycdn_iframe(video_id, stream['bunnyId']),
-        }
+            _old_archive_ids=[make_archive_id(self, video_id)])


class SovietsClosetPlaylistIE(SovietsClosetBaseIE):

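Delegating to BunnyCdnIE changes the reported video ID from the site's numeric ID to a BunnyCDN GUID, which would break existing --download-archive files; recording the old ID via make_archive_id keeps them valid. A sketch of what that helper produces, assuming the usual "lowercased IE key + space + id" format visible in the tests above:

def make_archive_id(ie_key, video_id):
    # yt-dlp archive entries are '<extractor key, lowercased> <video id>'
    return f'{ie_key.lower()} {video_id}'

print(make_archive_id('SovietsCloset', '1337'))  # 'sovietscloset 1337'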
yt_dlp/extractor/telecinco.py

@@ -46,7 +46,7 @@ class TelecincoBaseIE(InfoExtractor):
            error_code = traverse_obj(
                self._webpage_read_content(error.cause.response, caronte['cerbero'], video_id, fatal=False),
                ({json.loads}, 'code', {int}))
-            if error_code == 4038:
+            if error_code in (4038, 40313):
                self.raise_geo_restricted(countries=['ES'])
            raise

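For reference, a tiny sketch of the classification this hunk extends: the Cerbero error body is JSON whose code field distinguishes geo-blocking (4038, and now also 40313) from other failures. Function and variable names below are invented:

import json

GEO_RESTRICTED_CODES = {4038, 40313}

def classify_error(body):
    # body is the raw error-response text; anything unparsable is 'unknown'
    try:
        code = json.loads(body).get('code')
    except (TypeError, ValueError):
        return 'unknown'
    return 'geo_restricted (ES)' if code in GEO_RESTRICTED_CODES else f'error {code}'

print(classify_error('{"code": 40313}'))  # geo_restricted (ES)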
yt_dlp/extractor/tiktok.py

@@ -26,6 +26,7 @@ from ..utils import (
    srt_subtitles_timecode,
    str_or_none,
    traverse_obj,
+    truncate_string,
    try_call,
    try_get,
    url_or_none,
@@ -249,6 +250,12 @@ class TikTokBaseIE(InfoExtractor):
        elif fatal:
            raise ExtractorError('Unable to extract webpage video data')

+        if not traverse_obj(video_data, ('video', {dict})) and traverse_obj(video_data, ('isContentClassified', {bool})):
+            message = 'This post may not be comfortable for some audiences. Log in for access'
+            if fatal:
+                self.raise_login_required(message)
+            self.report_warning(f'{message}. {self._login_hint()}', video_id=video_id)
+
        return video_data, status

    def _get_subtitles(self, aweme_detail, aweme_id, user_name):
@@ -438,7 +445,7 @@ class TikTokBaseIE(InfoExtractor):
        return {
            'id': aweme_id,
            **traverse_obj(aweme_detail, {
-                'title': ('desc', {str}),
+                'title': ('desc', {truncate_string(left=72)}),
                'description': ('desc', {str}),
                'timestamp': ('create_time', {int_or_none}),
            }),
@@ -589,7 +596,7 @@ class TikTokBaseIE(InfoExtractor):
                'duration': ('duration', {int_or_none}),
            })),
            **traverse_obj(aweme_detail, {
-                'title': ('desc', {str}),
+                'title': ('desc', {truncate_string(left=72)}),
                'description': ('desc', {str}),
                # audio-only slideshows have a video duration of 0 and an actual audio duration
                'duration': ('video', 'duration', {int_or_none}, filter),
@@ -650,7 +657,7 @@ class TikTokIE(TikTokBaseIE):
        'info_dict': {
            'id': '6742501081818877190',
            'ext': 'mp4',
-            'title': 'md5:5e2a23877420bb85ce6521dbee39ba94',
+            'title': 'Tag 1 Friend reverse this Video and look what happens 🤩😱 @skyandtami ...',
            'description': 'md5:5e2a23877420bb85ce6521dbee39ba94',
            'duration': 27,
            'height': 1024,
@@ -854,7 +861,7 @@ class TikTokIE(TikTokBaseIE):
        'info_dict': {
            'id': '7253412088251534594',
            'ext': 'm4a',
-            'title': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации ',
+            'title': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #р...',
            'description': 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации ',
            'uploader': 'hara_yoimiya',
            'uploader_id': '6582536342634676230',
@@ -895,8 +902,12 @@ class TikTokIE(TikTokBaseIE):

        if video_data and status == 0:
            return self._parse_aweme_video_web(video_data, url, video_id)
-        elif status == 10216:
-            raise ExtractorError('This video is private', expected=True)
+        elif status in (10216, 10222):
+            # 10216: private post; 10222: private account
+            self.raise_login_required(
+                'You do not have permission to view this post. Log into an account that has access')
+        elif status == 10204:
+            raise ExtractorError('Your IP address is blocked from accessing this post', expected=True)
        raise ExtractorError(f'Video not available, status code {status}', video_id=video_id)

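Both title fields now run through truncate_string(left=72). The diff calls it with only left= inside a traversal, which suggests yt-dlp's version supports partial application (the string is supplied later by the traversal); the sketch below is a plain function showing just the truncation rule, which reproduces the updated test titles:

def truncate_string(s, left, right=0):
    # minimal sketch: keep `left` characters total (ellipsis included),
    # optionally also keep `right` characters from the end
    if s is None or len(s) <= left + right:
        return s
    return f"{s[:left - 3]}...{s[-right:] if right else ''}"

title = 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #рекомендации '
print(truncate_string(title, left=72))
# 'я ред флаг простите #переписка #щитпост #тревожныйтиппривязанности #р...'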
yt_dlp/extractor/tvw.py (new file, 117 lines)

@@ -0,0 +1,117 @@
import json

from .common import InfoExtractor
from ..utils import clean_html, remove_end, unified_timestamp, url_or_none
from ..utils.traversal import traverse_obj


class TvwIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?tvw\.org/video/(?P<id>[^/?#]+)'

    _TESTS = [{
        'url': 'https://tvw.org/video/billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211/',
        'md5': '9ceb94fe2bb7fd726f74f16356825703',
        'info_dict': {
            'id': '2024011211',
            'ext': 'mp4',
            'title': 'Billy Frank Jr. Statue Maquette Unveiling Ceremony',
            'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
            'description': 'md5:58a8150017d985b4f377e11ee8f6f36e',
            'timestamp': 1704902400,
            'upload_date': '20240110',
            'location': 'Legislative Building',
            'display_id': 'billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211',
            'categories': ['General Interest'],
        },
    }, {
        'url': 'https://tvw.org/video/ebeys-landing-state-park-2024081007/',
        'md5': '71e87dae3deafd65d75ff3137b9a32fc',
        'info_dict': {
            'id': '2024081007',
            'ext': 'mp4',
            'title': 'Ebey\'s Landing State Park',
            'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
            'description': 'md5:50c5bd73bde32fa6286a008dbc853386',
            'timestamp': 1724310900,
            'upload_date': '20240822',
            'location': 'Ebeys Landing State Park',
            'display_id': 'ebeys-landing-state-park-2024081007',
            'categories': ['Washington State Parks'],
        },
    }, {
        'url': 'https://tvw.org/video/home-warranties-workgroup-2',
        'md5': 'f678789bf94d07da89809f213cf37150',
        'info_dict': {
            'id': '1999121000',
            'ext': 'mp4',
            'title': 'Home Warranties Workgroup',
            'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
            'description': 'md5:861396cc523c9641d0dce690bc5c35f3',
            'timestamp': 946389600,
            'upload_date': '19991228',
            'display_id': 'home-warranties-workgroup-2',
            'categories': ['Legislative'],
        },
    }, {
        'url': 'https://tvw.org/video/washington-to-washington-a-new-space-race-2022041111/?eventID=2022041111',
        'md5': '6f5551090b351aba10c0d08a881b4f30',
        'info_dict': {
            'id': '2022041111',
            'ext': 'mp4',
            'title': 'Washington to Washington - A New Space Race',
            'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
            'description': 'md5:f65a24eec56107afbcebb3aa5cd26341',
            'timestamp': 1650394800,
            'upload_date': '20220419',
            'location': 'Hayner Media Center',
            'display_id': 'washington-to-washington-a-new-space-race-2022041111',
            'categories': ['Washington to Washington', 'General Interest'],
        },
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)

        client_id = self._html_search_meta('clientID', webpage, fatal=True)
        video_id = self._html_search_meta('eventID', webpage, fatal=True)

        video_data = self._download_json(
            'https://api.v3.invintus.com/v2/Event/getDetailed', video_id,
            headers={
                'authorization': 'embedder',
                'wsc-api-key': '7WhiEBzijpritypp8bqcU7pfU9uicDR',
            },
            data=json.dumps({
                'clientID': client_id,
                'eventID': video_id,
                'showStreams': True,
            }).encode())['data']

        formats = []
        subtitles = {}
        for stream_url in traverse_obj(video_data, ('streamingURIs', ..., {url_or_none})):
            fmts, subs = self._extract_m3u8_formats_and_subtitles(
                stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
            formats.extend(fmts)
            self._merge_subtitles(subs, target=subtitles)
        if caption_url := traverse_obj(video_data, ('captionPath', {url_or_none})):
            subtitles.setdefault('en', []).append({'url': caption_url, 'ext': 'vtt'})

        return {
            'id': video_id,
            'display_id': display_id,
            'formats': formats,
            'subtitles': subtitles,
            'title': remove_end(self._og_search_title(webpage, default=None), ' - TVW'),
            'description': self._og_search_description(webpage, default=None),
            **traverse_obj(video_data, {
                'title': ('title', {str}),
                'description': ('description', {clean_html}),
                'categories': ('categories', ..., {str}),
                'thumbnail': ('videoThumbnail', {url_or_none}),
                'timestamp': ('startDateTime', {unified_timestamp}),
                'location': ('locationName', {str}),
                'is_live': ('eventStatus', {lambda x: x == 'live'}),
            }),
        }

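The Invintus call above is a plain JSON POST authenticated by a static embedder key; both the endpoint and key values appear in the extractor. A stdlib-only sketch of the same request, outside of yt-dlp's HTTP stack:

import json
import urllib.request

def fetch_event(client_id, event_id):
    # clientID and eventID come from <meta> tags on the tvw.org event page
    req = urllib.request.Request(
        'https://api.v3.invintus.com/v2/Event/getDetailed',
        data=json.dumps({
            'clientID': client_id,
            'eventID': event_id,
            'showStreams': True,
        }).encode(),
        headers={
            'Content-Type': 'application/json',
            'authorization': 'embedder',
            'wsc-api-key': '7WhiEBzijpritypp8bqcU7pfU9uicDR',
        })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)['data']  # streamingURIs, captionPath, etc.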
yt_dlp/extractor/twitter.py

@@ -21,6 +21,7 @@ from ..utils import (
    str_or_none,
    strip_or_none,
    traverse_obj,
+    truncate_string,
    try_call,
    try_get,
    unified_timestamp,
@@ -358,6 +359,7 @@ class TwitterCardIE(InfoExtractor):
                'display_id': '560070183650213889',
                'uploader_url': 'https://twitter.com/Twitter',
            },
+            'skip': 'This content is no longer available.',
        },
        {
            'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
@@ -365,7 +367,7 @@ class TwitterCardIE(InfoExtractor):
            'info_dict': {
                'id': '623160978427936768',
                'ext': 'mp4',
-                'title': "NASA - Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video.",
+                'title': "NASA - Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASA...",
                'description': "Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video. https://t.co/BJYgOjSeGA",
                'uploader': 'NASA',
                'uploader_id': 'NASA',
@@ -377,12 +379,14 @@ class TwitterCardIE(InfoExtractor):
                'like_count': int,
                'repost_count': int,
                'tags': ['PlutoFlyby'],
+                'channel_id': '11348282',
+                '_old_archive_ids': ['twitter 623160978427936768'],
            },
            'params': {'format': '[protocol=https]'},
        },
        {
            'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
-            'md5': 'b6d9683dd3f48e340ded81c0e917ad46',
+            'md5': 'fb08fbd69595cbd8818f0b2f2a94474d',
            'info_dict': {
                'id': 'dq4Oj5quskI',
                'ext': 'mp4',
@@ -390,12 +394,12 @@ class TwitterCardIE(InfoExtractor):
                'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
                'upload_date': '20111013',
                'uploader': 'OMG! UBUNTU!',
-                'uploader_id': 'omgubuntu',
+                'uploader_id': '@omgubuntu',
                'channel_url': 'https://www.youtube.com/channel/UCIiSwcm9xiFb3Y4wjzR41eQ',
                'channel_id': 'UCIiSwcm9xiFb3Y4wjzR41eQ',
                'channel_follower_count': int,
                'chapters': 'count:8',
-                'uploader_url': 'http://www.youtube.com/user/omgubuntu',
+                'uploader_url': 'https://www.youtube.com/@omgubuntu',
                'duration': 138,
                'categories': ['Film & Animation'],
                'age_limit': 0,
@@ -407,6 +411,9 @@ class TwitterCardIE(InfoExtractor):
                'tags': 'count:12',
                'channel': 'OMG! UBUNTU!',
                'playable_in_embed': True,
+                'heatmap': 'count:100',
+                'timestamp': 1318500227,
+                'live_status': 'not_live',
            },
            'add_ie': ['Youtube'],
        },
@@ -548,13 +555,14 @@ class TwitterIE(TwitterBaseIE):
            'age_limit': 0,
            '_old_archive_ids': ['twitter 700207533655363584'],
        },
+        'skip': 'Tweet has been deleted',
    }, {
        'url': 'https://twitter.com/captainamerica/status/719944021058060289',
        'info_dict': {
            'id': '717462543795523584',
            'display_id': '719944021058060289',
            'ext': 'mp4',
-            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
+            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theat...',
            'description': '@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI',
            'channel_id': '701615052',
            'uploader_id': 'CaptainAmerica',
@@ -591,7 +599,7 @@ class TwitterIE(TwitterBaseIE):
        'info_dict': {
            'id': '852077943283097602',
            'ext': 'mp4',
-            'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة',
+            'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعا...',
            'description': 'كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة https://t.co/xg6OhpyKfN',
            'channel_id': '2526757026',
            'uploader': 'عالم الأخبار',
@@ -615,7 +623,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '910030238373089285',
            'display_id': '910031516746514432',
            'ext': 'mp4',
-            'title': 'Préfet de Guadeloupe - [Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre.',
+            'title': 'Préfet de Guadeloupe - [Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terr...',
            'thumbnail': r're:^https?://.*\.jpg',
            'description': '[Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre. https://t.co/mwx01Rs4lo',
            'channel_id': '2319432498',
@@ -707,7 +715,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '1349774757969989634',
            'display_id': '1349794411333394432',
            'ext': 'mp4',
-            'title': 'md5:d1c4941658e4caaa6cb579260d85dcba',
+            'title': "Brooklyn Nets - WATCH: Sean Marks' full media session after our acquisition of 8-time...",
            'thumbnail': r're:^https?://.*\.jpg',
            'description': 'md5:71ead15ec44cee55071547d6447c6a3e',
            'channel_id': '18552281',
@@ -733,7 +741,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '1577855447914409984',
            'display_id': '1577855540407197696',
            'ext': 'mp4',
-            'title': 'md5:466a3a8b049b5f5a13164ce915484b51',
+            'title': 'Oshtru - gm ✨️ now I can post image and video. nice update.',
            'description': 'md5:b9c3699335447391d11753ab21c70a74',
            'upload_date': '20221006',
            'channel_id': '143077138',
@@ -755,10 +763,10 @@ class TwitterIE(TwitterBaseIE):
        'url': 'https://twitter.com/UltimaShadowX/status/1577719286659006464',
        'info_dict': {
            'id': '1577719286659006464',
-            'title': 'Ultima Reload - Test',
+            'title': 'Ultima - Test',
            'description': 'Test https://t.co/Y3KEZD7Dad',
            'channel_id': '168922496',
-            'uploader': 'Ultima Reload',
+            'uploader': 'Ultima',
            'uploader_id': 'UltimaShadowX',
            'uploader_url': 'https://twitter.com/UltimaShadowX',
            'upload_date': '20221005',
@@ -777,7 +785,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '1575559336759263233',
            'display_id': '1575560063510810624',
            'ext': 'mp4',
-            'title': 'md5:eec26382babd0f7c18f041db8ae1c9c9',
+            'title': 'Max Olson - Absolutely heartbreaking footage captured by our surge probe of catas...',
            'thumbnail': r're:^https?://.*\.jpg',
            'description': 'md5:95aea692fda36a12081b9629b02daa92',
            'channel_id': '1094109584',
@@ -901,18 +909,18 @@ class TwitterIE(TwitterBaseIE):
        'playlist_mincount': 2,
        'info_dict': {
            'id': '1600649710662213632',
-            'title': 'md5:be05989b0722e114103ed3851a0ffae2',
+            'title': "Jocelyn Laidlaw - How Kirstie Alley's tragic death inspired me to share more about my c...",
            'timestamp': 1670459604.0,
            'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
            'comment_count': int,
-            'uploader_id': 'CTVJLaidlaw',
+            'uploader_id': 'JocelynVLaidlaw',
            'channel_id': '80082014',
            'repost_count': int,
            'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
            'upload_date': '20221208',
            'age_limit': 0,
            'uploader': 'Jocelyn Laidlaw',
-            'uploader_url': 'https://twitter.com/CTVJLaidlaw',
+            'uploader_url': 'https://twitter.com/JocelynVLaidlaw',
            'like_count': int,
        },
    }, {
@@ -921,17 +929,17 @@ class TwitterIE(TwitterBaseIE):
        'info_dict': {
            'id': '1600649511827013632',
            'ext': 'mp4',
-            'title': 'md5:7662a0a27ce6faa3e5b160340f3cfab1',
+            'title': "Jocelyn Laidlaw - How Kirstie Alley's tragic death inspired me to share more about my c... #1",
            'thumbnail': r're:^https?://.+\.jpg',
            'timestamp': 1670459604.0,
            'channel_id': '80082014',
-            'uploader_id': 'CTVJLaidlaw',
+            'uploader_id': 'JocelynVLaidlaw',
            'uploader': 'Jocelyn Laidlaw',
            'repost_count': int,
            'comment_count': int,
            'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
            'duration': 102.226,
-            'uploader_url': 'https://twitter.com/CTVJLaidlaw',
+            'uploader_url': 'https://twitter.com/JocelynVLaidlaw',
            'display_id': '1600649710662213632',
            'like_count': int,
            'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
@@ -990,6 +998,7 @@ class TwitterIE(TwitterBaseIE):
            '_old_archive_ids': ['twitter 1599108751385972737'],
        },
        'params': {'noplaylist': True},
+        'skip': 'Tweet is limited',
    }, {
        'url': 'https://twitter.com/MunTheShinobi/status/1600009574919962625',
        'info_dict': {
@@ -1001,10 +1010,10 @@ class TwitterIE(TwitterBaseIE):
            'description': 'This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525 https://t.co/cNsA0MoOml',
            'thumbnail': 'https://pbs.twimg.com/ext_tw_video_thumb/1600009362759733248/pu/img/XVhFQivj75H_YxxV.jpg?name=orig',
            'age_limit': 0,
-            'uploader': 'Mün',
+            'uploader': 'Boy Called Mün',
            'repost_count': int,
            'upload_date': '20221206',
-            'title': 'Mün - This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525',
+            'title': 'Boy Called Mün - This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525',
            'comment_count': int,
            'like_count': int,
            'tags': [],
@@ -1042,7 +1051,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '1694928337846538240',
            'ext': 'mp4',
            'display_id': '1695424220702888009',
-            'title': 'md5:e8daa9527bc2b947121395494f786d9d',
+            'title': 'Benny Johnson - Donald Trump driving through the urban, poor neighborhoods of Atlanta...',
            'description': 'md5:004f2d37fd58737724ec75bc7e679938',
            'channel_id': '15212187',
            'uploader': 'Benny Johnson',
@@ -1066,7 +1075,7 @@ class TwitterIE(TwitterBaseIE):
            'id': '1694928337846538240',
            'ext': 'mp4',
            'display_id': '1695424220702888009',
-            'title': 'md5:e8daa9527bc2b947121395494f786d9d',
+            'title': 'Benny Johnson - Donald Trump driving through the urban, poor neighborhoods of Atlanta...',
            'description': 'md5:004f2d37fd58737724ec75bc7e679938',
            'channel_id': '15212187',
            'uploader': 'Benny Johnson',
@@ -1101,6 +1110,7 @@ class TwitterIE(TwitterBaseIE):
            'view_count': int,
        },
        'add_ie': ['TwitterBroadcast'],
+        'skip': 'Broadcast no longer exists',
    }, {
        # Animated gif and quote tweet video
        'url': 'https://twitter.com/BAKKOOONN/status/1696256659889565950',
@@ -1129,7 +1139,7 @@ class TwitterIE(TwitterBaseIE):
        'info_dict': {
            'id': '1724883339285544960',
            'ext': 'mp4',
-            'title': 'md5:cc56716f9ed0b368de2ba54c478e493c',
+            'title': 'Robert F. Kennedy Jr - A beautifully crafted short film by Mikki Willis about my independent...',
            'description': 'md5:9dc14f5b0f1311fc7caf591ae253a164',
            'display_id': '1724884212803834154',
            'channel_id': '337808606',
@@ -1150,7 +1160,7 @@ class TwitterIE(TwitterBaseIE):
    }, {
        # x.com
        'url': 'https://x.com/historyinmemes/status/1790637656616943991',
-        'md5': 'daca3952ba0defe2cfafb1276d4c1ea5',
+        'md5': '4549eda363fecfe37439c455923cba2c',
        'info_dict': {
            'id': '1790637589910654976',
            'ext': 'mp4',
@@ -1334,7 +1344,7 @@ class TwitterIE(TwitterBaseIE):
    def _generate_syndication_token(self, twid):
        # ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
        translation = str.maketrans(dict.fromkeys('0.'))
-        return js_number_to_string((int(twid) / 1e15) * math.PI, 36).translate(translation)
+        return js_number_to_string((int(twid) / 1e15) * math.pi, 36).translate(translation)

    def _call_syndication_api(self, twid):
        self.report_warning(
@@ -1390,7 +1400,7 @@ class TwitterIE(TwitterBaseIE):
        title = description = traverse_obj(
            status, (('full_text', 'text'), {lambda x: x.replace('\n', ' ')}), get_all=False) or ''
        # strip 'https -_t.co_BJYgOjSeGA' junk from filenames
-        title = re.sub(r'\s+(https?://[^ ]+)', '', title)
+        title = truncate_string(re.sub(r'\s+(https?://[^ ]+)', '', title), left=72)
        user = status.get('user') or {}
        uploader = user.get('name')
        if uploader:

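The syndication-token hunk fixes a real bug: Python's math module defines math.pi, not math.PI, so the old line would raise AttributeError. The token itself mimics JavaScript's Number.prototype.toString(36) with zeros and the dot stripped. A rough, self-contained approximation of that conversion (yt-dlp's js_number_to_string is more faithful to JS float formatting; the fractional-digit count below is an assumption):

import math

DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def js_number_to_string(val, base=36, frac_digits=12):
    # integer part in base-36, then a fixed number of fractional digits
    int_part, frac = divmod(abs(val), 1)
    int_part, out = int(int_part), []
    while True:
        int_part, rem = divmod(int_part, base)
        out.append(DIGITS[rem])
        if not int_part:
            break
    s = ''.join(reversed(out))
    if frac:
        digits = []
        for _ in range(frac_digits):
            frac *= base
            d, frac = divmod(frac, 1)
            digits.append(DIGITS[int(d)])
        s += '.' + ''.join(digits)
    return s

def syndication_token(twid):
    # ((twid / 1e15) * pi) in base 36, with '0' and '.' removed
    return js_number_to_string((int(twid) / 1e15) * math.pi).translate(
        str.maketrans(dict.fromkeys('0.')))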
yt_dlp/extractor/vk.py

@@ -116,6 +116,7 @@ class VKIE(VKBaseIE):
                'id': '-77521_162222515',
                'ext': 'mp4',
                'title': 'ProtivoGunz - Хуёвая песня',
+                'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
                'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
                'uploader_id': '39545378',
                'duration': 195,
@@ -165,6 +166,7 @@ class VKIE(VKBaseIE):
                'id': '-93049196_456239755',
                'ext': 'mp4',
                'title': '8 серия (озвучка)',
+                'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
                'duration': 8383,
                'comment_count': int,
                'uploader': 'Dizi2021',
@@ -240,6 +242,7 @@ class VKIE(VKBaseIE):
                'upload_date': '20221005',
                'uploader': 'Шальная Императрица',
                'uploader_id': '-74006511',
+                'description': 'md5:f9315f7786fa0e84e75e4f824a48b056',
            },
        },
        {
@@ -278,6 +281,25 @@ class VKIE(VKBaseIE):
            },
            'skip': 'No formats found',
        },
+        {
+            'note': 'video has chapters',
+            'url': 'https://vkvideo.ru/video-18403220_456239696',
+            'info_dict': {
+                'id': '-18403220_456239696',
+                'ext': 'mp4',
+                'title': 'Трамп отменяет гранты // DeepSeek - Революция в ИИ // Илон Маск читер',
+                'description': 'md5:b112ea9de53683b6d03d29076f62eec2',
+                'uploader': 'Руслан Усачев',
+                'uploader_id': '-18403220',
+                'comment_count': int,
+                'like_count': int,
+                'duration': 1983,
+                'thumbnail': r're:https?://.+\.jpg',
+                'chapters': 'count:21',
+                'timestamp': 1738252883,
+                'upload_date': '20250130',
+            },
+        },
        {
            # live stream, hls and rtmp links, most likely already finished live
            # stream by the time you are reading this comment
@@ -449,7 +471,6 @@ class VKIE(VKBaseIE):
            return self.url_result(opts_url)

        data = player['params'][0]
-        title = unescapeHTML(data['md_title'])

        # 2 = live
        # 3 = post live (finished live)
@@ -507,17 +528,29 @@ class VKIE(VKBaseIE):
        return {
            'id': video_id,
            'formats': formats,
-            'title': title,
-            'thumbnail': data.get('jpg'),
-            'uploader': data.get('md_author'),
-            'uploader_id': str_or_none(data.get('author_id') or mv_data.get('authorId')),
-            'duration': int_or_none(data.get('duration') or mv_data.get('duration')),
+            'subtitles': subtitles,
+            **traverse_obj(mv_data, {
+                'title': ('title', {unescapeHTML}),
+                'description': ('desc', {clean_html}, filter),
+                'duration': ('duration', {int_or_none}),
+                'like_count': ('likes', {int_or_none}),
+                'comment_count': ('commcount', {int_or_none}),
+            }),
+            **traverse_obj(data, {
+                'title': ('md_title', {unescapeHTML}),
+                'description': ('description', {clean_html}, filter),
+                'thumbnail': ('jpg', {url_or_none}),
+                'uploader': ('md_author', {str}),
+                'uploader_id': (('author_id', 'authorId'), {str_or_none}, any),
+                'duration': ('duration', {int_or_none}),
+                'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), {
+                    'title': ('text', {str}),
+                    'start_time': 'time',
+                }),
+            }),
            'timestamp': timestamp,
            'view_count': view_count,
-            'like_count': int_or_none(mv_data.get('likes')),
-            'comment_count': int_or_none(mv_data.get('commcount')),
            'is_live': is_live,
-            'subtitles': subtitles,
            '_format_sort_fields': ('res', 'source'),
        }

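The new chapters traversal keeps only time_codes entries whose time is an int, then maps text to title and time to start_time. The same filter in plain Python, with invented sample data:

# Hedged sketch of the chapters extraction above; field names follow the
# traversal paths in the diff, the payload itself is made up.
time_codes = [
    {'time': 0, 'text': 'Интро'},
    {'time': 95, 'text': 'DeepSeek'},
    {'time': 'bad', 'text': 'dropped'},  # non-int entries are filtered out
]
chapters = [
    {'start_time': tc['time'], 'title': tc['text']}
    for tc in time_codes
    if isinstance(tc.get('time'), int)
]
print(chapters)  # two chapters; the malformed entry is skipped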
yt_dlp/extractor/vrt.py

@@ -2,31 +2,33 @@ import json
import time
import urllib.parse

-from .gigya import GigyaBaseIE
+from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
    ExtractorError,
    clean_html,
    extract_attributes,
+    filter_dict,
    float_or_none,
    get_element_by_class,
    get_element_html_by_class,
    int_or_none,
-    join_nonempty,
+    jwt_decode_hs256,
    jwt_encode_hs256,
    make_archive_id,
    merge_dicts,
    parse_age_limit,
+    parse_duration,
    parse_iso8601,
    str_or_none,
    strip_or_none,
    traverse_obj,
+    try_call,
    url_or_none,
-    urlencode_postdata,
)


-class VRTBaseIE(GigyaBaseIE):
+class VRTBaseIE(InfoExtractor):
    _GEO_BYPASS = False
    _PLAYER_INFO = {
        'platform': 'desktop',
@@ -37,11 +39,11 @@ class VRTBaseIE(InfoExtractor):
        'device': 'undefined (undefined)',
        'os': {
            'name': 'Windows',
-            'version': 'x86_64',
+            'version': '10',
        },
        'player': {
            'name': 'VRT web player',
-            'version': '2.7.4-prod-2023-04-19T06:05:45',
+            'version': '5.1.1-prod-2025-02-14T08:44:16"',
        },
    }
    # From https://player.vrt.be/vrtnws/js/main.js & https://player.vrt.be/ketnet/js/main.8cdb11341bcb79e4cd44.js
@@ -90,20 +92,21 @@ class VRTBaseIE(InfoExtractor):
    def _call_api(self, video_id, client='null', id_token=None, version='v2'):
        player_info = {'exp': (round(time.time(), 3) + 900), **self._PLAYER_INFO}
        player_token = self._download_json(
-            'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v2/tokens',
-            video_id, 'Downloading player token', headers={
+            f'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/{version}/tokens',
+            video_id, 'Downloading player token', 'Failed to download player token', headers={
                **self.geo_verification_headers(),
                'Content-Type': 'application/json',
            }, data=json.dumps({
-                'identityToken': id_token or {},
+                'identityToken': id_token or '',
                'playerInfo': jwt_encode_hs256(player_info, self._JWT_SIGNING_KEY, headers={
                    'kid': self._JWT_KEY_ID,
                }).decode(),
            }, separators=(',', ':')).encode())['vrtPlayerToken']

        return self._download_json(
-            f'https://media-services-public.vrt.be/media-aggregator/{version}/media-items/{video_id}',
-            video_id, 'Downloading API JSON', query={
+            # The URL below redirects to https://media-services-public.vrt.be/media-aggregator/{version}/media-items/{video_id}
+            f'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/{version}/videos/{video_id}',
+            video_id, 'Downloading API JSON', 'Failed to download API JSON', query={
                'vrtPlayerToken': player_token,
                'client': client,
            }, expected_status=400)
@@ -177,215 +180,286 @@ class VRTIE(VRTBaseIE):
class VrtNUIE(VRTBaseIE):
-    IE_DESC = 'VRT MAX'
-    _VALID_URL = r'https?://(?:www\.)?vrt\.be/vrtnu/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
+    IE_NAME = 'vrtmax'
+    IE_DESC = 'VRT MAX (formerly VRT NU)'
+    _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?:vrtnu|vrtmax)/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
    _TESTS = [{
-        # CONTENT_IS_AGE_RESTRICTED
-        'url': 'https://www.vrt.be/vrtnu/a-z/de-ideale-wereld/2023-vj/de-ideale-wereld-d20230116/',
+        'url': 'https://www.vrt.be/vrtmax/a-z/ket---doc/trailer/ket---doc-trailer-s6/',
        'info_dict': {
-            'id': 'pbs-pub-855b00a8-6ce2-4032-ac4f-1fcf3ae78524$vid-d2243aa1-ec46-4e34-a55b-92568459906f',
+            'id': 'pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
            'ext': 'mp4',
-            'title': 'Tom Waes',
-            'description': 'Satirisch actualiteitenmagazine met Ella Leyers. Tom Waes is te gast.',
-            'timestamp': 1673905125,
-            'release_timestamp': 1673905125,
-            'series': 'De ideale wereld',
-            'season_id': '1672830988794',
-            'episode': 'Aflevering 1',
-            'episode_number': 1,
-            'episode_id': '1672830988861',
-            'display_id': 'de-ideale-wereld-d20230116',
-            'channel': 'VRT',
-            'duration': 1939.0,
-            'thumbnail': 'https://images.vrt.be/orig/2023/01/10/1bb39cb3-9115-11ed-b07d-02b7b76bf47f.jpg',
-            'release_date': '20230116',
-            'upload_date': '20230116',
-            'age_limit': 12,
+            'channel': 'ketnet',
+            'description': 'Neem een kijkje in de bijzondere wereld van deze Ketnetters.',
+            'display_id': 'ket---doc-trailer-s6',
+            'duration': 30.0,
+            'episode': 'Reeks 6 volledig vanaf 3 maart',
+            'episode_id': '1739450401467',
+            'season': 'Trailer',
+            'season_id': '1739450401467',
+            'series': 'Ket & Doc',
+            'thumbnail': 'https://images.vrt.be/orig/2025/02/21/63f07122-5bbd-4ca1-b42e-8565c6cd95df.jpg',
+            'timestamp': 1740373200,
+            'title': 'Reeks 6 volledig vanaf 3 maart',
+            'upload_date': '20250224',
+            '_old_archive_ids': [
+                'canvas pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
+                'ketnet pbs-pub-c8a78645-5d3e-468a-89ec-6f3ed5534bd5$vid-242ddfe9-18f5-4e16-ab45-09b122a19251',
+            ],
        },
    }, {
-        'url': 'https://www.vrt.be/vrtnu/a-z/buurman--wat-doet-u-nu-/6/buurman--wat-doet-u-nu--s6-trailer/',
+        'url': 'https://www.vrt.be/vrtmax/a-z/meisjes/6/meisjes-s6a5/',
        'info_dict': {
-            'id': 'pbs-pub-ad4050eb-d9e5-48c2-9ec8-b6c355032361$vid-0465537a-34a8-4617-8352-4d8d983b4eee',
+            'id': 'pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
            'ext': 'mp4',
-            'title': 'Trailer seizoen 6 \'Buurman, wat doet u nu?\'',
-            'description': 'md5:197424726c61384b4e5c519f16c0cf02',
-            'timestamp': 1652940000,
-            'release_timestamp': 1652940000,
-            'series': 'Buurman, wat doet u nu?',
-            'season': 'Seizoen 6',
-            'season_number': 6,
-            'season_id': '1652344200907',
-            'episode': 'Aflevering 0',
-            'episode_number': 0,
-            'episode_id': '1652951873524',
-            'display_id': 'buurman--wat-doet-u-nu--s6-trailer',
-            'channel': 'VRT',
-            'duration': 33.13,
-            'thumbnail': 'https://images.vrt.be/orig/2022/05/23/3c234d21-da83-11ec-b07d-02b7b76bf47f.jpg',
-            'release_date': '20220519',
-            'upload_date': '20220519',
+            'channel': 'ketnet',
+            'description': 'md5:713793f15cbf677f66200b36b7b1ec5a',
+            'display_id': 'meisjes-s6a5',
+            'duration': 1336.02,
+            'episode': 'Week 5',
+            'episode_id': '1684157692901',
+            'episode_number': 5,
+            'season': '6',
+            'season_id': '1684157692901',
+            'season_number': 6,
+            'series': 'Meisjes',
+            'thumbnail': 'https://images.vrt.be/orig/2023/05/14/bf526ae0-f1d9-11ed-91d7-02b7b76bf47f.jpg',
+            'timestamp': 1685251800,
+            'title': 'Week 5',
+            'upload_date': '20230528',
+            '_old_archive_ids': [
+                'canvas pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
+                'ketnet pbs-pub-97b541ab-e05c-43b9-9a40-445702ef7189$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
+            ],
        },
+    }, {
+        'url': 'https://www.vrt.be/vrtnu/a-z/taboe/3/taboe-s3a4/',
+        'info_dict': {
+            'id': 'pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
+            'ext': 'mp4',
+            'channel': 'een',
+            'description': 'md5:bf61345a95eca9393a95de4a7a54b5c6',
+            'display_id': 'taboe-s3a4',
+            'duration': 2882.02,
+            'episode': 'Mensen met het syndroom van Gilles de la Tourette',
+            'episode_id': '1739055911734',
+            'episode_number': 4,
+            'season': '3',
+            'season_id': '1739055911734',
+            'season_number': 3,
+            'series': 'Taboe',
+            'thumbnail': 'https://images.vrt.be/orig/2025/02/19/8198496c-d1ae-4bca-9a48-761cf3ea3ff2.jpg',
+            'timestamp': 1740286800,
+            'title': 'Mensen met het syndroom van Gilles de la Tourette',
+            'upload_date': '20250223',
+            '_old_archive_ids': [
+                'canvas pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
+                'ketnet pbs-pub-f50faa3a-1778-46b6-9117-4ba85f197703$vid-547507fe-1c8b-4394-b361-21e627cbd0fd',
+            ],
+        },
+        'params': {'skip_download': 'm3u8'},
    }]
    _NETRC_MACHINE = 'vrtnu'
-    _authenticated = False
+
+    _TOKEN_COOKIE_DOMAIN = '.www.vrt.be'
+    _ACCESS_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_at'
+    _REFRESH_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_rt'
+    _VIDEO_TOKEN_COOKIE_NAME = 'vrtnu-site_profile_vt'
+    _VIDEO_PAGE_QUERY = '''
+    query VideoPage($pageId: ID!) {
+        page(id: $pageId) {
+            ... on EpisodePage {
+                episode {
+                    ageRaw
+                    description
+                    durationRaw
+                    episodeNumberRaw
+                    id
+                    name
+                    onTimeRaw
+                    program {
+                        title
+                    }
+                    season {
+                        id
+                        titleRaw
+                    }
+                    title
+                    brand
+                }
+                ldjson
+                player {
+                    image {
+                        templateUrl
+                    }
+                    modes {
+                        streamId
+                    }
+                }
+            }
+        }
+    }
+    '''
+
+    def _fetch_tokens(self):
+        has_credentials = self._get_login_info()[0]
+        access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
+        video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
+
+        if (access_token and not self._is_jwt_token_expired(access_token)
+                and video_token and not self._is_jwt_token_expired(video_token)):
+            return access_token, video_token
+
+        if has_credentials:
+            access_token, video_token = self.cache.load(self._NETRC_MACHINE, 'token_data', default=(None, None))
+
+            if (access_token and not self._is_jwt_token_expired(access_token)
+                    and video_token and not self._is_jwt_token_expired(video_token)):
+                self.write_debug('Restored tokens from cache')
+                self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._ACCESS_TOKEN_COOKIE_NAME, access_token)
+                self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._VIDEO_TOKEN_COOKIE_NAME, video_token)
+                return access_token, video_token
+
+        if not self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME):
+            return None, None
+
+        self._request_webpage(
+            'https://www.vrt.be/vrtmax/sso/refresh', None,
+            note='Refreshing tokens', errnote='Failed to refresh tokens', fatal=False)
+
+        access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
+        video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
+
+        if not access_token or not video_token:
+            self.cache.store(self._NETRC_MACHINE, 'refresh_token', None)
+            self.cookiejar.clear(self._TOKEN_COOKIE_DOMAIN, '/vrtmax/sso', self._REFRESH_TOKEN_COOKIE_NAME)
+            msg = 'Refreshing of tokens failed'
+            if not has_credentials:
+                self.report_warning(msg)
+                return None, None
+            self.report_warning(f'{msg}. Re-logging in')
+            return self._perform_login(*self._get_login_info())
+
+        if has_credentials:
+            self.cache.store(self._NETRC_MACHINE, 'token_data', (access_token, video_token))
+
+        return access_token, video_token
+
+    def _get_vrt_cookie(self, cookie_name):
+        # Refresh token cookie is scoped to /vrtmax/sso, others are scoped to /
+        return try_call(lambda: self._get_cookies('https://www.vrt.be/vrtmax/sso')[cookie_name].value)
+
+    @staticmethod
+    def _is_jwt_token_expired(token):
+        return jwt_decode_hs256(token)['exp'] - time.time() < 300
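_is_jwt_token_expired treats a token as expired five minutes early, so the extractor never presents a nearly-dead token. A self-contained sketch of the decode-and-check, assuming standard base64url-encoded JWT payloads (the signature is not verified, which is fine for reading your own token's exp claim):

import base64
import json
import time

def jwt_decode_payload(token):
    # decode the middle (payload) segment of a JWT without verifying it
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def is_jwt_token_expired(token, leeway=300):
    # tokens within `leeway` seconds of expiry count as already expired
    return jwt_decode_payload(token)['exp'] - time.time() < leeway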
    def _perform_login(self, username, password):
-        auth_info = self._gigya_login({
-            'APIKey': '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy',
-            'targetEnv': 'jssdk',
-            'loginID': username,
-            'password': password,
-            'authMode': 'cookie',
-        })
-
-        if auth_info.get('errorDetails'):
-            raise ExtractorError(f'Unable to login. VrtNU said: {auth_info["errorDetails"]}', expected=True)
-
-        # Sometimes authentication fails for no good reason, retry
-        for retry in self.RetryManager():
-            if retry.attempt > 1:
-                self._sleep(1, None)
-            try:
-                self._request_webpage(
-                    'https://token.vrt.be/vrtnuinitlogin', None, note='Requesting XSRF Token',
-                    errnote='Could not get XSRF Token', query={
-                        'provider': 'site',
-                        'destination': 'https://www.vrt.be/vrtnu/',
-                    })
-                self._request_webpage(
-                    'https://login.vrt.be/perform_login', None,
-                    note='Performing login', errnote='Login failed',
-                    query={'client_id': 'vrtnu-site'}, data=urlencode_postdata({
-                        'UID': auth_info['UID'],
-                        'UIDSignature': auth_info['UIDSignature'],
-                        'signatureTimestamp': auth_info['signatureTimestamp'],
-                        '_csrf': self._get_cookies('https://login.vrt.be').get('OIDCXSRF').value,
-                    }))
-            except ExtractorError as e:
-                if isinstance(e.cause, HTTPError) and e.cause.status == 401:
-                    retry.error = e
-                    continue
-                raise
-
-        self._authenticated = True
+        refresh_token = self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME)
+        if refresh_token and not self._is_jwt_token_expired(refresh_token):
+            self.write_debug('Using refresh token from logged-in cookies; skipping login with credentials')
+            return
+
+        refresh_token = self.cache.load(self._NETRC_MACHINE, 'refresh_token', default=None)
+        if refresh_token and not self._is_jwt_token_expired(refresh_token):
+            self.write_debug('Restored refresh token from cache')
+            self._set_cookie(self._TOKEN_COOKIE_DOMAIN, self._REFRESH_TOKEN_COOKIE_NAME, refresh_token, path='/vrtmax/sso')
+            return
+
+        self._request_webpage(
+            'https://www.vrt.be/vrtmax/sso/login', None,
+            note='Getting session cookies', errnote='Failed to get session cookies')
+
+        login_data = self._download_json(
+            'https://login.vrt.be/perform_login', None, data=json.dumps({
+                'clientId': 'vrtnu-site',
+                'loginID': username,
+                'password': password,
+            }).encode(), headers={
+                'Content-Type': 'application/json',
+                'Oidcxsrf': self._get_cookies('https://login.vrt.be')['OIDCXSRF'].value,
+            }, note='Logging in', errnote='Login failed', expected_status=403)
+        if login_data.get('errorCode'):
+            raise ExtractorError(f'Login failed: {login_data.get("errorMessage")}', expected=True)
+
+        self._request_webpage(
+            login_data['redirectUrl'], None,
+            note='Getting access token', errnote='Failed to get access token')
+
+        access_token = self._get_vrt_cookie(self._ACCESS_TOKEN_COOKIE_NAME)
+        video_token = self._get_vrt_cookie(self._VIDEO_TOKEN_COOKIE_NAME)
+        refresh_token = self._get_vrt_cookie(self._REFRESH_TOKEN_COOKIE_NAME)
+        if not all((access_token, video_token, refresh_token)):
+            raise ExtractorError('Unable to extract token cookie values')
+
+        self.cache.store(self._NETRC_MACHINE, 'token_data', (access_token, video_token))
+        self.cache.store(self._NETRC_MACHINE, 'refresh_token', refresh_token)
+
+        return access_token, video_token
    def _real_extract(self, url):
        display_id = self._match_id(url)
-
-        parsed_url = urllib.parse.urlparse(url)
-        details = self._download_json(
-            f'{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path.rstrip("/")}.model.json',
-            display_id, 'Downloading asset JSON', 'Unable to download asset JSON')['details']
-
-        watch_info = traverse_obj(details, (
-            'actions', lambda _, v: v['type'] == 'watch-episode', {dict}), get_all=False) or {}
-        video_id = join_nonempty(
-            'episodePublicationId', 'episodeVideoId', delim='$', from_dict=watch_info)
-        if '$' not in video_id:
-            raise ExtractorError('Unable to extract video ID')
-
-        vrtnutoken = self._download_json(
-            'https://token.vrt.be/refreshtoken', video_id, note='Retrieving vrtnutoken',
-            errnote='Token refresh failed')['vrtnutoken'] if self._authenticated else None
-
-        video_info = self._call_api(video_id, 'vrtnu-web@PROD', vrtnutoken)
-
-        if 'title' not in video_info:
-            code = video_info.get('code')
-            if code in ('AUTHENTICATION_REQUIRED', 'CONTENT_IS_AGE_RESTRICTED'):
-                self.raise_login_required(code, method='password')
-            elif code in ('INVALID_LOCATION', 'CONTENT_AVAILABLE_ONLY_IN_BE'):
-                self.raise_geo_restricted(countries=['BE'])
-            elif code == 'CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS_AND_EXPATS':
-                if not self._authenticated:
-                    self.raise_login_required(code, method='password')
-                else:
-                    self.raise_geo_restricted(countries=['BE'])
-            raise ExtractorError(code, expected=True)
-
-        formats, subtitles = self._extract_formats_and_subtitles(video_info, video_id)
-
-        return {
-            **traverse_obj(details, {
-                'title': 'title',
-                'description': ('description', {clean_html}),
-                'timestamp': ('data', 'episode', 'onTime', 'raw', {parse_iso8601}),
-                'release_timestamp': ('data', 'episode', 'onTime', 'raw', {parse_iso8601}),
-                'series': ('data', 'program', 'title'),
-                'season': ('data', 'season', 'title', 'value'),
-                'season_number': ('data', 'season', 'title', 'raw', {int_or_none}),
-                'season_id': ('data', 'season', 'id', {str_or_none}),
-                'episode': ('data', 'episode', 'number', 'value', {str_or_none}),
-                'episode_number': ('data', 'episode', 'number', 'raw', {int_or_none}),
-                'episode_id': ('data', 'episode', 'id', {str_or_none}),
-                'age_limit': ('data', 'episode', 'age', 'raw', {parse_age_limit}),
-            }),
-            'id': video_id,
-            'display_id': display_id,
-            'channel': 'VRT',
-            'formats': formats,
-            'duration': float_or_none(video_info.get('duration'), 1000),
-            'thumbnail': url_or_none(video_info.get('posterImageUrl')),
-            'subtitles': subtitles,
-            '_old_archive_ids': [make_archive_id('Canvas', video_id)],
-        }
+        access_token, video_token = self._fetch_tokens()
+
+        metadata = self._download_json(
+            f'https://www.vrt.be/vrtnu-api/graphql{"" if access_token else "/public"}/v1',
+            display_id, 'Downloading asset JSON', 'Unable to download asset JSON',
+            data=json.dumps({
+                'operationName': 'VideoPage',
+                'query': self._VIDEO_PAGE_QUERY,
+                'variables': {'pageId': urllib.parse.urlparse(url).path},
+            }).encode(),
+            headers=filter_dict({
+                'Authorization': f'Bearer {access_token}' if access_token else None,
+                'Content-Type': 'application/json',
+                'x-vrt-client-name': 'WEB',
+                'x-vrt-client-version': '1.5.9',
+                'x-vrt-zone': 'default',
+            }))['data']['page']
+
+        video_id = metadata['player']['modes'][0]['streamId']
+
+        try:
+            streaming_info = self._call_api(video_id, 'vrtnu-web@PROD', id_token=video_token)
+        except ExtractorError as e:
+            if not video_token and isinstance(e.cause, HTTPError) and e.cause.status == 404:
+                self.raise_login_required()
+            raise
+
+        formats, subtitles = self._extract_formats_and_subtitles(streaming_info, video_id)
+
+        code = traverse_obj(streaming_info, ('code', {str}))
+        if not formats and code:
+            if code in ('CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS', 'CONTENT_AVAILABLE_ONLY_IN_BE', 'CONTENT_UNAVAILABLE_VIA_PROXY'):
+                self.raise_geo_restricted(countries=['BE'])
+            elif code in ('CONTENT_AVAILABLE_ONLY_FOR_BE_RESIDENTS_AND_EXPATS', 'CONTENT_IS_AGE_RESTRICTED', 'CONTENT_REQUIRES_AUTHENTICATION'):
+                self.raise_login_required()
+            else:
+                self.raise_no_formats(f'Unable to extract formats: {code}')
+
+        return {
+            'duration': float_or_none(streaming_info.get('duration'), 1000),
+            'thumbnail': url_or_none(streaming_info.get('posterImageUrl')),
+            **self._json_ld(traverse_obj(metadata, ('ldjson', ..., {json.loads})), video_id, fatal=False),
+            **traverse_obj(metadata, ('episode', {
+                'title': ('title', {str}),
+                'description': ('description', {str}),
+                'timestamp': ('onTimeRaw', {parse_iso8601}),
+                'series': ('program', 'title', {str}),
+                'season': ('season', 'titleRaw', {str}),
+                'season_number': ('season', 'titleRaw', {int_or_none}),
+                'season_id': ('id', {str_or_none}),
+                'episode': ('title', {str}),
+                'episode_number': ('episodeNumberRaw', {int_or_none}),
+                'episode_id': ('id', {str_or_none}),
+                'age_limit': ('ageRaw', {parse_age_limit}),
+                'channel': ('brand', {str}),
+                'duration': ('durationRaw', {parse_duration}),
+            })),
+            'id': video_id,
+            'display_id': display_id,
+            'formats': formats,
+            'subtitles': subtitles,
+            '_old_archive_ids': [make_archive_id('Canvas', video_id),
+                                 make_archive_id('Ketnet', video_id)],
+        }

-
-class KetnetIE(VRTBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?P<id>(?:[^/]+/)*[^/?#&]+)'
-    _TESTS = [{
-        'url': 'https://www.ketnet.be/kijken/m/meisjes/6/meisjes-s6a5',
-        'info_dict': {
-            'id': 'pbs-pub-39f8351c-a0a0-43e6-8394-205d597d6162$vid-5e306921-a9aa-4fa9-9f39-5b82c8f1028e',
-            'ext': 'mp4',
-            'title': 'Meisjes',
-            'episode': 'Reeks 6: Week 5',
-            'season': 'Reeks 6',
-            'series': 'Meisjes',
-            'timestamp': 1685251800,
-            'upload_date': '20230528',
-        },
-        'params': {'skip_download': 'm3u8'},
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        video = self._download_json(
-            'https://senior-bff.ketnet.be/graphql', display_id, query={
-                'query': '''{
-  video(id: "content/ketnet/nl/%s.model.json") {
-    description
-    episodeNr
-    imageUrl
-    mediaReference
-    programTitle
-    publicationDate
-    seasonTitle
-    subtitleVideodetail
-    titleVideodetail
-  }
-}''' % display_id,  # noqa: UP031
-            })['data']['video']
-
-        video_id = urllib.parse.unquote(video['mediaReference'])
-        data = self._call_api(video_id, 'ketnet@PROD', version='v1')
-        formats, subtitles = self._extract_formats_and_subtitles(data, video_id)
-
-        return {
-            'id': video_id,
-            'formats': formats,
-            'subtitles': subtitles,
-            '_old_archive_ids': [make_archive_id('Canvas', video_id)],
-            **traverse_obj(video, {
-                'title': ('titleVideodetail', {str}),
-                'description': ('description', {str}),
-                'thumbnail': ('thumbnail', {url_or_none}),
-                'timestamp': ('publicationDate', {parse_iso8601}),
-                'series': ('programTitle', {str}),
-                'season': ('seasonTitle', {str}),
-                'episode': ('subtitleVideodetail', {str}),
-                'episode_number': ('episodeNr', {int_or_none}),
-            }),
-        }
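The rewritten metadata mapping above leans on yt-dlp's `traverse_obj` dict
templates: each output key names a path into the GraphQL response, a tuple of
keys descends one level per element, a `{func}` step applies a transform (a
bare type such as `{str}` acts as a type filter), and `...` fans out over all
items of a list. A minimal self-contained sketch of the pattern, using toy
data rather than VRT's actual response:

    from yt_dlp.utils import int_or_none, traverse_obj

    episode = {'titleRaw': '3', 'episodeNumberRaw': 4}
    info = traverse_obj(episode, {
        'season': ('titleRaw', {str}),                 # kept as-is: already a str
        'season_number': ('titleRaw', {int_or_none}),  # transformed -> 3
        'episode_number': ('episodeNumberRaw', {int_or_none}),
    })
    assert info == {'season': '3', 'season_number': 3, 'episode_number': 4}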


@@ -12,6 +12,7 @@ from ..utils import (
     str_or_none,
     strip_jsonp,
     traverse_obj,
+    truncate_string,
     url_or_none,
     urlencode_postdata,
     urljoin,
@@ -96,7 +97,8 @@ class WeiboBaseIE(InfoExtractor):
        })
        return formats

-    def _parse_video_info(self, video_info, video_id=None):
+    def _parse_video_info(self, video_info):
+        video_id = traverse_obj(video_info, (('id', 'id_str', 'mid'), {str_or_none}, any))
        return {
            'id': video_id,
            'extractor_key': WeiboIE.ie_key(),
@@ -105,9 +107,10 @@ class WeiboBaseIE(InfoExtractor):
            'http_headers': {'Referer': 'https://weibo.com/'},
            '_old_archive_ids': [make_archive_id('WeiboMobile', video_id)],
            **traverse_obj(video_info, {
-                'id': (('id', 'id_str', 'mid'), {str_or_none}),
                'display_id': ('mblogid', {str_or_none}),
-                'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter),
+                'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'),
+                          {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}, filter),
+                'alt_title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter),
                'description': ('text_raw', {str}),
                'duration': ('page_info', 'media_info', 'duration', {int_or_none}),
                'timestamp': ('page_info', 'media_info', 'video_publish_time', {int_or_none}),
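# Note on the title transform above: truncate_string(left=72) is used as a
# curried step inside traverse_obj (yt-dlp's utils support this partial
# application), so overlong status text is shortened for `title` while the
# untruncated text is kept in `alt_title`. Illustrative, assuming the helper's
# ellipsis behavior:
#
#   truncate_string('x' * 100, left=72)  # -> 72-char string ending in '...'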
@@ -129,9 +132,11 @@ class WeiboIE(WeiboBaseIE):
        'url': 'https://weibo.com/7827771738/N4xlMvjhI',
        'info_dict': {
            'id': '4910815147462302',
+            '_old_archive_ids': ['weibomobile 4910815147462302'],
            'ext': 'mp4',
            'display_id': 'N4xlMvjhI',
            'title': '【睡前消息暑假版第一期:拉泰国一把 对中国有好处】',
+            'alt_title': '【睡前消息暑假版第一期:拉泰国一把 对中国有好处】',
            'description': 'md5:e2637a7673980d68694ea7c43cf12a5f',
            'duration': 918,
            'timestamp': 1686312819,
@@ -149,9 +154,11 @@ class WeiboIE(WeiboBaseIE):
        'url': 'https://m.weibo.cn/status/4189191225395228',
        'info_dict': {
            'id': '4189191225395228',
+            '_old_archive_ids': ['weibomobile 4189191225395228'],
            'ext': 'mp4',
            'display_id': 'FBqgOmDxO',
            'title': '柴犬柴犬的秒拍视频',
+            'alt_title': '柴犬柴犬的秒拍视频',
            'description': 'md5:80f461ab5cdae6bbdb70efbf5a1db24f',
            'duration': 53,
            'timestamp': 1514264429,
@@ -166,34 +173,35 @@ class WeiboIE(WeiboBaseIE):
        },
    }, {
        'url': 'https://m.weibo.cn/detail/4189191225395228',
-        'info_dict': {
-            'id': '4189191225395228',
-            'ext': 'mp4',
-            'display_id': 'FBqgOmDxO',
-            'title': '柴犬柴犬的秒拍视频',
-            'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagramshibainu.gaku http://t.cn/RHbmjzW ',
-            'duration': 53,
-            'timestamp': 1514264429,
-            'upload_date': '20171226',
-            'thumbnail': r're:https://.*\.jpg',
-            'uploader': '柴犬柴犬',
-            'uploader_id': '5926682210',
-            'uploader_url': 'https://weibo.com/u/5926682210',
-            'view_count': int,
-            'like_count': int,
-            'repost_count': int,
-        },
+        'only_matching': True,
    }, {
        'url': 'https://weibo.com/0/4224132150961381',
        'note': 'no playback_list example',
        'only_matching': True,
+    }, {
+        'url': 'https://m.weibo.cn/detail/5120561132606436',
+        'info_dict': {
+            'id': '5120561132606436',
+        },
+        'playlist_count': 9,
    }]
    def _real_extract(self, url):
        video_id = self._match_id(url)
-        return self._parse_video_info(self._weibo_download_json(
-            f'https://weibo.com/ajax/statuses/show?id={video_id}', video_id))
+        meta = self._weibo_download_json(f'https://weibo.com/ajax/statuses/show?id={video_id}', video_id)
+        mix_media_info = traverse_obj(meta, ('mix_media_info', 'items', ...))
+        if not mix_media_info:
+            return self._parse_video_info(meta)
+
+        return self.playlist_result(self._entries(mix_media_info), video_id)
+
+    def _entries(self, mix_media_info):
+        for media_info in traverse_obj(mix_media_info, lambda _, v: v['type'] != 'pic'):
+            yield self._parse_video_info(traverse_obj(media_info, {
+                'id': ('data', 'object_id'),
+                'page_info': {'media_info': ('data', 'media_info', {dict})},
+            }))
class WeiboVideoIE(WeiboBaseIE):
@@ -205,6 +213,7 @@ class WeiboVideoIE(WeiboBaseIE):
            'ext': 'mp4',
            'display_id': 'LEZDodaiW',
            'title': '稍微了解了一下靡烟miya感觉这东西也太二了',
+            'alt_title': '稍微了解了一下靡烟miya感觉这东西也太二了',
            'description': '稍微了解了一下靡烟miya感觉这东西也太二了 http://t.cn/A6aerGsM \u200b\u200b\u200b',
            'duration': 76,
            'timestamp': 1659344278,
@@ -216,6 +225,7 @@ class WeiboVideoIE(WeiboBaseIE):
            'view_count': int,
            'like_count': int,
            'repost_count': int,
+            '_old_archive_ids': ['weibomobile 4797700463137878'],
        },
    }]


@@ -100,8 +100,8 @@ class WSJIE(InfoExtractor):
class WSJArticleIE(InfoExtractor):
-    _VALID_URL = r'(?i)https?://(?:www\.)?wsj\.com/articles/(?P<id>[^/?#&]+)'
-    _TEST = {
+    _VALID_URL = r'(?i)https?://(?:www\.)?wsj\.com/(?:articles|opinion)/(?P<id>[^/?#&]+)'
+    _TESTS = [{
        'url': 'https://www.wsj.com/articles/dont-like-china-no-pandas-for-you-1490366939?',
        'info_dict': {
            'id': '4B13FA62-1D8C-45DB-8EA1-4105CB20B362',
@@ -110,11 +110,20 @@ class WSJArticleIE(InfoExtractor):
            'uploader_id': 'ralcaraz',
            'title': 'Bao Bao the Panda Leaves for China',
        },
-    }
+    }, {
+        'url': 'https://www.wsj.com/opinion/hamas-hostages-caskets-bibas-family-israel-gaza-29da083b',
+        'info_dict': {
+            'id': 'CE68D629-8DB8-4CD3-B30A-92112C102054',
+            'ext': 'mp4',
+            'upload_date': '20241007',
+            'uploader_id': 'Tinnes, David',
+            'title': 'WSJ Opinion: "Get the Jew": The Crown Heights Riot Revisited',
+        },
+    }]

    def _real_extract(self, url):
        article_id = self._match_id(url)
-        webpage = self._download_webpage(url, article_id)
+        webpage = self._download_webpage(url, article_id, impersonate=True)
        video_id = self._search_regex(
            r'(?:id=["\']video|video-|iframe\.html\?guid=|data-src=["\'])([a-fA-F0-9-]{36})',
            webpage, 'video id')


@@ -0,0 +1,50 @@
# flake8: noqa: F401
from ._base import YoutubeBaseInfoExtractor
from ._clip import YoutubeClipIE
from ._mistakes import YoutubeTruncatedIDIE, YoutubeTruncatedURLIE
from ._notifications import YoutubeNotificationsIE
from ._redirect import (
YoutubeConsentRedirectIE,
YoutubeFavouritesIE,
YoutubeFeedsInfoExtractor,
YoutubeHistoryIE,
YoutubeLivestreamEmbedIE,
YoutubeRecommendedIE,
YoutubeShortsAudioPivotIE,
YoutubeSubscriptionsIE,
YoutubeWatchLaterIE,
YoutubeYtBeIE,
YoutubeYtUserIE,
)
from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
# Hack to allow plugin overrides to work
for _cls in [
YoutubeBaseInfoExtractor,
YoutubeClipIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeNotificationsIE,
YoutubeConsentRedirectIE,
YoutubeFavouritesIE,
YoutubeFeedsInfoExtractor,
YoutubeHistoryIE,
YoutubeLivestreamEmbedIE,
YoutubeRecommendedIE,
YoutubeShortsAudioPivotIE,
YoutubeSubscriptionsIE,
YoutubeWatchLaterIE,
YoutubeYtBeIE,
YoutubeYtUserIE,
YoutubeMusicSearchURLIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubePlaylistIE,
YoutubeTabBaseInfoExtractor,
YoutubeTabIE,
YoutubeIE,
]:
_cls.__module__ = 'yt_dlp.extractor.youtube'

(File diff suppressed because it is too large)


@@ -0,0 +1,66 @@
from ._tab import YoutubeTabBaseInfoExtractor
from ._video import YoutubeIE
from ...utils import ExtractorError, traverse_obj
class YoutubeClipIE(YoutubeTabBaseInfoExtractor):
IE_NAME = 'youtube:clip'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/clip/(?P<id>[^/?#]+)'
_TESTS = [{
# FIXME: Other metadata should be extracted from the clip, not from the base video
'url': 'https://www.youtube.com/clip/UgytZKpehg-hEMBSn3F4AaABCQ',
'info_dict': {
'id': 'UgytZKpehg-hEMBSn3F4AaABCQ',
'ext': 'mp4',
'section_start': 29.0,
'section_end': 39.7,
'duration': 10.7,
'age_limit': 0,
'availability': 'public',
'categories': ['Gaming'],
'channel': 'Scott The Woz',
'channel_id': 'UC4rqhyiTs7XyuODcECvuiiQ',
'channel_url': 'https://www.youtube.com/channel/UC4rqhyiTs7XyuODcECvuiiQ',
'description': 'md5:7a4517a17ea9b4bd98996399d8bb36e7',
'like_count': int,
'playable_in_embed': True,
'tags': 'count:17',
'thumbnail': 'https://i.ytimg.com/vi_webp/ScPX26pdQik/maxresdefault.webp',
'title': 'Mobile Games on Console - Scott The Woz',
'upload_date': '20210920',
'uploader': 'Scott The Woz',
'uploader_id': '@ScottTheWoz',
'uploader_url': 'https://www.youtube.com/@ScottTheWoz',
'view_count': int,
'live_status': 'not_live',
'channel_follower_count': int,
'chapters': 'count:20',
'comment_count': int,
'heatmap': 'count:100',
},
}]
def _real_extract(self, url):
clip_id = self._match_id(url)
_, data = self._extract_webpage(url, clip_id)
video_id = traverse_obj(data, ('currentVideoEndpoint', 'watchEndpoint', 'videoId'))
if not video_id:
raise ExtractorError('Unable to find video ID')
clip_data = traverse_obj(data, (
'engagementPanels', ..., 'engagementPanelSectionListRenderer', 'content', 'clipSectionRenderer',
'contents', ..., 'clipAttributionRenderer', 'onScrubExit', 'commandExecutorCommand', 'commands', ...,
'openPopupAction', 'popup', 'notificationActionRenderer', 'actionButton', 'buttonRenderer', 'command',
'commandExecutorCommand', 'commands', ..., 'loopCommand'), get_all=False)
return {
'_type': 'url_transparent',
'url': f'https://www.youtube.com/watch?v={video_id}',
'ie_key': YoutubeIE.ie_key(),
'id': clip_id,
'section_start': int(clip_data['startTimeMs']) / 1000,
'section_end': int(clip_data['endTimeMs']) / 1000,
'_format_sort_fields': ( # https protocol is prioritized for ffmpeg compatibility
'proto:https', 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang'),
}


@@ -0,0 +1,69 @@
from ._base import YoutubeBaseInfoExtractor
from ...utils import ExtractorError
class YoutubeTruncatedURLIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:truncated_url'
IE_DESC = False # Do not list
_VALID_URL = r'''(?x)
(?:https?://)?
(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/
(?:watch\?(?:
feature=[a-z_]+|
annotation_id=annotation_[^&]+|
x-yt-cl=[0-9]+|
hl=[^&]*|
t=[0-9]+
)?
|
attribution_link\?a=[^&]+
)
$
'''
_TESTS = [{
'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?feature=foo',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?hl=en-GB',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?t=2372',
'only_matching': True,
}]
def _real_extract(self, url):
raise ExtractorError(
'Did you forget to quote the URL? Remember that & is a meta '
'character in most shells, so you want to put the URL in quotes, '
'like yt-dlp '
'"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
' or simply yt-dlp BaW_jenozKc .',
expected=True)
class YoutubeTruncatedIDIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:truncated_id'
IE_DESC = False # Do not list
_VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[0-9A-Za-z_-]{1,10})$'
_TESTS = [{
'url': 'https://www.youtube.com/watch?v=N_708QY7Ob',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
raise ExtractorError(
f'Incomplete YouTube ID {video_id}. URL {url} looks truncated.',
expected=True)


@@ -0,0 +1,98 @@
import itertools
import re
from ._tab import YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
from ...utils import traverse_obj
class YoutubeNotificationsIE(YoutubeTabBaseInfoExtractor):
IE_NAME = 'youtube:notif'
IE_DESC = 'YouTube notifications; ":ytnotif" keyword (requires cookies)'
_VALID_URL = r':ytnotif(?:ication)?s?'
_LOGIN_REQUIRED = True
_TESTS = [{
'url': ':ytnotif',
'only_matching': True,
}, {
'url': ':ytnotifications',
'only_matching': True,
}]
def _extract_notification_menu(self, response, continuation_list):
notification_list = traverse_obj(
response,
('actions', 0, 'openPopupAction', 'popup', 'multiPageMenuRenderer', 'sections', 0, 'multiPageMenuNotificationSectionRenderer', 'items'),
('actions', 0, 'appendContinuationItemsAction', 'continuationItems'),
expected_type=list) or []
continuation_list[0] = None
for item in notification_list:
entry = self._extract_notification_renderer(item.get('notificationRenderer'))
if entry:
yield entry
continuation = item.get('continuationItemRenderer')
if continuation:
continuation_list[0] = continuation
def _extract_notification_renderer(self, notification):
video_id = traverse_obj(
notification, ('navigationEndpoint', 'watchEndpoint', 'videoId'), expected_type=str)
url = f'https://www.youtube.com/watch?v={video_id}'
channel_id = None
if not video_id:
browse_ep = traverse_obj(
notification, ('navigationEndpoint', 'browseEndpoint'), expected_type=dict)
channel_id = self.ucid_or_none(traverse_obj(browse_ep, 'browseId', expected_type=str))
post_id = self._search_regex(
r'/post/(.+)', traverse_obj(browse_ep, 'canonicalBaseUrl', expected_type=str),
'post id', default=None)
if not channel_id or not post_id:
return
# The direct /post url redirects to this in the browser
url = f'https://www.youtube.com/channel/{channel_id}/community?lb={post_id}'
channel = traverse_obj(
notification, ('contextualMenu', 'menuRenderer', 'items', 1, 'menuServiceItemRenderer', 'text', 'runs', 1, 'text'),
expected_type=str)
notification_title = self._get_text(notification, 'shortMessage')
if notification_title:
notification_title = notification_title.replace('\xad', '') # remove soft hyphens
# TODO: handle recommended videos
title = self._search_regex(
rf'{re.escape(channel or "")}[^:]+: (.+)', notification_title,
'video title', default=None)
timestamp = (self._parse_time_text(self._get_text(notification, 'sentTimeText'))
if self._configuration_arg('approximate_date', ie_key=YoutubeTabIE)
else None)
return {
'_type': 'url',
'url': url,
'ie_key': (YoutubeIE if video_id else YoutubeTabIE).ie_key(),
'video_id': video_id,
'title': title,
'channel_id': channel_id,
'channel': channel,
'uploader': channel,
'thumbnails': self._extract_thumbnails(notification, 'videoThumbnail'),
'timestamp': timestamp,
}
def _notification_menu_entries(self, ytcfg):
continuation_list = [None]
response = None
for page in itertools.count(1):
ctoken = traverse_obj(
continuation_list, (0, 'continuationEndpoint', 'getNotificationMenuEndpoint', 'ctoken'), expected_type=str)
response = self._extract_response(
item_id=f'page {page}', query={'ctoken': ctoken} if ctoken else {}, ytcfg=ytcfg,
ep='notification/get_notification_menu', check_get_keys='actions',
headers=self.generate_api_headers(ytcfg=ytcfg, visitor_data=self._extract_visitor_data(response)))
yield from self._extract_notification_menu(response, continuation_list)
if not continuation_list[0]:
break
def _real_extract(self, url):
display_id = 'notifications'
ytcfg = self._download_ytcfg('web', display_id) if not self.skip_webpage else {}
self._report_playlist_authcheck(ytcfg)
return self.playlist_result(self._notification_menu_entries(ytcfg), display_id, display_id)


@@ -0,0 +1,247 @@
import base64
import urllib.parse
from ._base import YoutubeBaseInfoExtractor
from ._tab import YoutubeTabIE
from ...utils import ExtractorError, classproperty, parse_qs, update_url_query, url_or_none
class YoutubeYtBeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'youtu.be'
_VALID_URL = rf'https?://youtu\.be/(?P<id>[0-9A-Za-z_-]{{11}})/*?.*?\blist=(?P<playlist_id>{YoutubeBaseInfoExtractor._PLAYLIST_ID_RE})'
_TESTS = [{
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
'info_dict': {
'id': 'yeWKywCrFtk',
'ext': 'mp4',
'title': 'Small Scale Baler and Braiding Rugs',
'uploader': 'Backus-Page House Museum',
'uploader_id': '@backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@backuspagemuseum',
'upload_date': '20161008',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'],
'tags': list,
'like_count': int,
'age_limit': 0,
'playable_in_embed': True,
'thumbnail': r're:^https?://.*\.webp',
'channel': 'Backus-Page House Museum',
'channel_id': 'UCEfMCQ9bs3tjvjy1s451zaw',
'live_status': 'not_live',
'view_count': int,
'channel_url': 'https://www.youtube.com/channel/UCEfMCQ9bs3tjvjy1s451zaw',
'availability': 'public',
'duration': 59,
'comment_count': int,
'channel_follower_count': int,
},
'params': {
'noplaylist': True,
'skip_download': True,
},
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
playlist_id = mobj.group('playlist_id')
return self.url_result(
update_url_query('https://www.youtube.com/watch', {
'v': video_id,
'list': playlist_id,
'feature': 'youtu.be',
}), ie=YoutubeTabIE.ie_key(), video_id=playlist_id)
class YoutubeLivestreamEmbedIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube livestream embeds'
_VALID_URL = r'https?://(?:\w+\.)?youtube\.com/embed/live_stream/?\?(?:[^#]+&)?channel=(?P<id>[^&#]+)'
_TESTS = [{
'url': 'https://www.youtube.com/embed/live_stream?channel=UC2_KI6RB__jGdlnK6dvFEZA',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id = self._match_id(url)
return self.url_result(
f'https://www.youtube.com/channel/{channel_id}/live',
ie=YoutubeTabIE.ie_key(), video_id=channel_id)
class YoutubeYtUserIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube user videos; "ytuser:" prefix'
IE_NAME = 'youtube:user'
_VALID_URL = r'ytuser:(?P<id>.+)'
_TESTS = [{
'url': 'ytuser:phihag',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
return self.url_result(f'https://www.youtube.com/user/{user_id}', YoutubeTabIE, user_id)
class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:favorites'
IE_DESC = 'YouTube liked videos; ":ytfav" keyword (requires cookies)'
_VALID_URL = r':ytfav(?:ou?rite)?s?'
_LOGIN_REQUIRED = True
_TESTS = [{
'url': ':ytfav',
'only_matching': True,
}, {
'url': ':ytfavorites',
'only_matching': True,
}]
def _real_extract(self, url):
return self.url_result(
'https://www.youtube.com/playlist?list=LL',
ie=YoutubeTabIE.ie_key())
class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
"""
Base class for feed extractors
Subclasses must re-define the _FEED_NAME property.
"""
_LOGIN_REQUIRED = True
_FEED_NAME = 'feeds'
@classproperty
def IE_NAME(cls):
return f'youtube:{cls._FEED_NAME}'
def _real_extract(self, url):
return self.url_result(
f'https://www.youtube.com/feed/{self._FEED_NAME}', ie=YoutubeTabIE.ie_key())
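# A subclass only needs to set _FEED_NAME; IE_NAME and the feed URL follow
# from it. For instance (hypothetical feed, mirroring the real subclasses
# below):
#
#   class YoutubeExampleFeedIE(YoutubeFeedsInfoExtractor):
#       _FEED_NAME = 'example'  # -> IE_NAME 'youtube:example',
#                               #    resolving https://www.youtube.com/feed/example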
class YoutubeWatchLaterIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:watchlater'
IE_DESC = 'Youtube watch later list; ":ytwatchlater" keyword (requires cookies)'
_VALID_URL = r':ytwatchlater'
_TESTS = [{
'url': ':ytwatchlater',
'only_matching': True,
}]
def _real_extract(self, url):
return self.url_result(
'https://www.youtube.com/playlist?list=WL', ie=YoutubeTabIE.ie_key())
class YoutubeRecommendedIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'YouTube recommended videos; ":ytrec" keyword'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/?(?:[?#]|$)|:ytrec(?:ommended)?'
_FEED_NAME = 'recommended'
_LOGIN_REQUIRED = False
_TESTS = [{
'url': ':ytrec',
'only_matching': True,
}, {
'url': ':ytrecommended',
'only_matching': True,
}, {
'url': 'https://youtube.com',
'only_matching': True,
}]
class YoutubeSubscriptionsIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)'
_VALID_URL = r':ytsub(?:scription)?s?'
_FEED_NAME = 'subscriptions'
_TESTS = [{
'url': ':ytsubs',
'only_matching': True,
}, {
'url': ':ytsubscriptions',
'only_matching': True,
}]
class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'Youtube watch history; ":ythis" keyword (requires cookies)'
_VALID_URL = r':ythis(?:tory)?'
_FEED_NAME = 'history'
_TESTS = [{
'url': ':ythistory',
'only_matching': True,
}]
class YoutubeShortsAudioPivotIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube Shorts audio pivot (Shorts using audio of a given video)'
IE_NAME = 'youtube:shorts:pivot:audio'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/source/(?P<id>[\w-]{11})/shorts'
_TESTS = [{
'url': 'https://www.youtube.com/source/Lyj-MZSAA9o/shorts',
'only_matching': True,
}]
@staticmethod
def _generate_audio_pivot_params(video_id):
"""
Generates sfv_audio_pivot browse params for this video id
"""
pb_params = b'\xf2\x05+\n)\x12\'\n\x0b%b\x12\x0b%b\x1a\x0b%b' % ((video_id.encode(),) * 3)
return urllib.parse.quote(base64.b64encode(pb_params).decode())
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
f'https://www.youtube.com/feed/sfv_audio_pivot?bp={self._generate_audio_pivot_params(video_id)}',
ie=YoutubeTabIE)
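# Illustrative use of the pivot above (hypothetical call; the bp value is the
# percent-quoted base64 of the protobuf blob built from the video id):
#
#   params = YoutubeShortsAudioPivotIE._generate_audio_pivot_params('Lyj-MZSAA9o')
#   # -> feed URL of the form https://www.youtube.com/feed/sfv_audio_pivot?bp=<params>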
class YoutubeConsentRedirectIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:consent'
IE_DESC = False # Do not list
_VALID_URL = r'https?://consent\.youtube\.com/m\?'
_TESTS = [{
'url': 'https://consent.youtube.com/m?continue=https%3A%2F%2Fwww.youtube.com%2Flive%2FqVv6vCqciTM%3Fcbrd%3D1&gl=NL&m=0&pc=yt&hl=en&src=1',
'info_dict': {
'id': 'qVv6vCqciTM',
'ext': 'mp4',
'age_limit': 0,
'uploader_id': '@sana_natori',
'comment_count': int,
'chapters': 'count:13',
'upload_date': '20221223',
'thumbnail': 'https://i.ytimg.com/vi/qVv6vCqciTM/maxresdefault.jpg',
'channel_url': 'https://www.youtube.com/channel/UCIdEIHpS0TdkqRkHL5OkLtA',
'uploader_url': 'https://www.youtube.com/@sana_natori',
'like_count': int,
'release_date': '20221223',
'tags': ['Vtuber', '月ノ美兎', '名取さな', 'にじさんじ', 'クリスマス', '3D配信'],
'title': '【 #インターネット女クリスマス 】3Dで歌ってはしゃぐインターネットの女たち【月美兎/名取さな】',
'view_count': int,
'playable_in_embed': True,
'duration': 4438,
'availability': 'public',
'channel_follower_count': int,
'channel_id': 'UCIdEIHpS0TdkqRkHL5OkLtA',
'categories': ['Entertainment'],
'live_status': 'was_live',
'release_timestamp': 1671793345,
'channel': 'さなちゃんねる',
'description': 'md5:6aebf95cc4a1d731aebc01ad6cc9806d',
'uploader': 'さなちゃんねる',
'channel_is_verified': True,
'heatmap': 'count:100',
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
redirect_url = url_or_none(parse_qs(url).get('continue', [None])[-1])
if not redirect_url:
raise ExtractorError('Invalid cookie consent redirect URL', expected=True)
return self.url_result(redirect_url)


@@ -0,0 +1,167 @@
import urllib.parse
from ._tab import YoutubeTabBaseInfoExtractor
from ..common import SearchInfoExtractor
from ...utils import join_nonempty, parse_qs
class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
IE_DESC = 'YouTube search'
IE_NAME = 'youtube:search'
_SEARCH_KEY = 'ytsearch'
_SEARCH_PARAMS = 'EgIQAfABAQ==' # Videos only
_TESTS = [{
'url': 'ytsearch5:youtube-dl test video',
'playlist_count': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}, {
'note': 'Suicide/self-harm search warning',
'url': 'ytsearch1:i hate myself and i wanna die',
'playlist_count': 1,
'info_dict': {
'id': 'i hate myself and i wanna die',
'title': 'i hate myself and i wanna die',
},
}]
class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
_SEARCH_KEY = 'ytsearchdate'
IE_DESC = 'YouTube search, newest videos first'
_SEARCH_PARAMS = 'CAISAhAB8AEB' # Videos only, sorted by date
_TESTS = [{
'url': 'ytsearchdate5:youtube-dl test video',
'playlist_count': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}]
class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube search URLs with sorting and filter support'
IE_NAME = YoutubeSearchIE.IE_NAME + '_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/(?:results|search)\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}, {
'url': 'https://www.youtube.com/results?search_query=python&sp=EgIQAg%253D%253D',
'playlist_mincount': 5,
'info_dict': {
'id': 'python',
'title': 'python',
},
}, {
'url': 'https://www.youtube.com/results?search_query=%23cats',
'playlist_mincount': 1,
'info_dict': {
'id': '#cats',
'title': '#cats',
# The test suite does not have support for nested playlists
# 'entries': [{
# 'url': r're:https://(www\.)?youtube\.com/hashtag/cats',
# 'title': '#cats',
# }],
},
}, {
# Channel results
'url': 'https://www.youtube.com/results?search_query=kurzgesagt&sp=EgIQAg%253D%253D',
'info_dict': {
'id': 'kurzgesagt',
'title': 'kurzgesagt',
},
'playlist': [{
'info_dict': {
'_type': 'url',
'id': 'UCsXVk37bltHxD1rDPwtNM8Q',
'url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
'ie_key': 'YoutubeTab',
'channel': 'Kurzgesagt In a Nutshell',
'description': 'md5:4ae48dfa9505ffc307dad26342d06bfc',
'title': 'Kurzgesagt In a Nutshell',
'channel_id': 'UCsXVk37bltHxD1rDPwtNM8Q',
# No longer available for search as it is set to the handle.
# 'playlist_count': int,
'channel_url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
'thumbnails': list,
'uploader_id': '@kurzgesagt',
'uploader_url': 'https://www.youtube.com/@kurzgesagt',
'uploader': 'Kurzgesagt In a Nutshell',
'channel_is_verified': True,
'channel_follower_count': int,
},
}],
'params': {'extract_flat': True, 'playlist_items': '1'},
'playlist_mincount': 1,
}, {
'url': 'https://www.youtube.com/results?q=test&sp=EgQIBBgB',
'only_matching': True,
}]
def _real_extract(self, url):
qs = parse_qs(url)
query = (qs.get('search_query') or qs.get('q'))[0]
return self.playlist_result(self._search_results(query, qs.get('sp', (None,))[0]), query, query)
class YoutubeMusicSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube music search URLs with selectable sections, e.g. #songs'
IE_NAME = 'youtube:music:search_url'
_VALID_URL = r'https?://music\.youtube\.com/search\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
_TESTS = [{
'url': 'https://music.youtube.com/search?q=royalty+free+music',
'playlist_count': 16,
'info_dict': {
'id': 'royalty free music',
'title': 'royalty free music',
},
}, {
'url': 'https://music.youtube.com/search?q=royalty+free+music&sp=EgWKAQIIAWoKEAoQAxAEEAkQBQ%3D%3D',
'playlist_mincount': 30,
'info_dict': {
'id': 'royalty free music - songs',
'title': 'royalty free music - songs',
},
'params': {'extract_flat': 'in_playlist'},
}, {
'url': 'https://music.youtube.com/search?q=royalty+free+music#community+playlists',
'playlist_mincount': 30,
'info_dict': {
'id': 'royalty free music - community playlists',
'title': 'royalty free music - community playlists',
},
'params': {'extract_flat': 'in_playlist'},
}]
_SECTIONS = {
'albums': 'EgWKAQIYAWoKEAoQAxAEEAkQBQ==',
'artists': 'EgWKAQIgAWoKEAoQAxAEEAkQBQ==',
'community playlists': 'EgeKAQQoAEABagoQChADEAQQCRAF',
'featured playlists': 'EgeKAQQoADgBagwQAxAJEAQQDhAKEAU==',
'songs': 'EgWKAQIIAWoKEAoQAxAEEAkQBQ==',
'videos': 'EgWKAQIQAWoKEAoQAxAEEAkQBQ==',
}
def _real_extract(self, url):
qs = parse_qs(url)
query = (qs.get('search_query') or qs.get('q'))[0]
params = qs.get('sp', (None,))[0]
if params:
section = next((k for k, v in self._SECTIONS.items() if v == params), params)
else:
section = urllib.parse.unquote_plus(([*url.split('#'), ''])[1]).lower()
params = self._SECTIONS.get(section)
if not params:
section = None
title = join_nonempty(query, section, delim=' - ')
return self.playlist_result(self._search_results(query, params, default_client='web_music'), title, title)
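# Resolution order in _real_extract above: an explicit sp= parameter wins and
# is mapped back to a section name when it matches _SECTIONS; otherwise the
# URL fragment picks the section. Illustrative:
#
#   https://music.youtube.com/search?q=foo#songs
#   -> query 'foo', params _SECTIONS['songs'], playlist title 'foo - songs'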

(File diff suppressed because it is too large)

yt_dlp/globals.py (new file, 30 lines)

@@ -0,0 +1,30 @@
from collections import defaultdict
# Please Note: Due to necessary changes and the complex nature involved in the plugin/globals system,
# no backwards compatibility is guaranteed for the plugin system API.
# However, we will still try our best.
class Indirect:
def __init__(self, initial, /):
self.value = initial
def __repr__(self, /):
return f'{type(self).__name__}({self.value!r})'
postprocessors = Indirect({})
extractors = Indirect({})
# Plugins
all_plugins_loaded = Indirect(False)
plugin_specs = Indirect({})
plugin_dirs = Indirect(['default'])
plugin_ies = Indirect({})
plugin_pps = Indirect({})
plugin_ies_overrides = Indirect(defaultdict(list))
# Misc
IN_CLI = Indirect(False)
LAZY_EXTRACTORS = Indirect(None) # `False`=force, `None`=disabled, `True`=enabled
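# `Indirect` is a shared mutable cell: importers hold the same holder object,
# so rebinding `.value` is seen everywhere without re-importing. Illustrative
# use (hypothetical path manipulation, mirroring what --plugin-dirs does):
#
#   from yt_dlp.globals import plugin_dirs
#   plugin_dirs.value = [*plugin_dirs.value, '/opt/ytdlp-plugins']
#   # every consumer reading plugin_dirs.value now sees the extra directory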


@@ -301,7 +301,7 @@ class JSInterpreter:
        OP_CHARS = '+-*/%&|^=<>!,;{}:['
        if not expr:
            return
-        counters = {k: 0 for k in _MATCHING_PARENS.values()}
+        counters = dict.fromkeys(_MATCHING_PARENS.values(), 0)
        start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
        in_quote, escaping, after_op, in_regex_char_group = None, False, True, False
        for idx, char in enumerate(expr):
@@ -890,9 +890,9 @@ class JSInterpreter:
        code, _ = self._separate_at_paren(func_m.group('code'))
        return [x.strip() for x in func_m.group('args').split(',')], code

-    def extract_function(self, funcname):
+    def extract_function(self, funcname, *global_stack):
        return function_with_repr(
-            self.extract_function_from_code(*self.extract_function_code(funcname)),
+            self.extract_function_from_code(*self.extract_function_code(funcname), *global_stack),
            f'F<{funcname}>')

    def extract_function_from_code(self, argnames, code, *global_stack):


@@ -21,9 +21,11 @@ if urllib3 is None:
urllib3_version = tuple(int_or_none(x, default=0) for x in urllib3.__version__.split('.'))

if urllib3_version < (1, 26, 17):
+    urllib3._yt_dlp__version = f'{urllib3.__version__} (unsupported)'
    raise ImportError('Only urllib3 >= 1.26.17 is supported')

if requests.__build__ < 0x023202:
+    requests._yt_dlp__version = f'{requests.__version__} (unsupported)'
    raise ImportError('Only requests >= 2.32.2 is supported')

import requests.adapters
@@ -296,6 +298,7 @@ class RequestsRH(RequestHandler, InstanceStoreMixin):
        extensions.pop('cookiejar', None)
        extensions.pop('timeout', None)
        extensions.pop('legacy_ssl', None)
+        extensions.pop('keep_header_casing', None)

    def _create_instance(self, cookiejar, legacy_ssl_support=None):
        session = RequestsSession()
@@ -312,11 +315,12 @@ class RequestsRH(RequestHandler, InstanceStoreMixin):
        session.trust_env = False  # no need, we already load proxies from env
        return session

-    def _send(self, request):
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, _, headers):
        add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)

+    def _send(self, request):
+        headers = self._get_headers(request)
+
        max_redirects_exceeded = False

        session = self._get_instance(


@@ -379,13 +379,15 @@ class UrllibRH(RequestHandler, InstanceStoreMixin):
        opener.addheaders = []
        return opener

-    def _send(self, request):
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, _, headers):
        add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)

+    def _send(self, request):
+        headers = self._get_headers(request)
+
        urllib_req = urllib.request.Request(
            url=request.url,
            data=request.data,
-            headers=dict(headers),
+            headers=headers,
            method=request.method,
        )


@@ -34,6 +34,7 @@ import websockets.version
websockets_version = tuple(map(int_or_none, websockets.version.version.split('.')))
if websockets_version < (13, 0):
+    websockets._yt_dlp__version = f'{websockets.version.version} (unsupported)'
    raise ImportError('Only websockets>=13.0 is supported')

import websockets.sync.client
@@ -116,6 +117,7 @@ class WebsocketsRH(WebSocketRequestHandler):
        extensions.pop('timeout', None)
        extensions.pop('cookiejar', None)
        extensions.pop('legacy_ssl', None)
+        extensions.pop('keep_header_casing', None)

    def close(self):
        # Remove the logging handler that contains a reference to our logger
@@ -123,15 +125,16 @@ class WebsocketsRH(WebSocketRequestHandler):
        for name, handler in self.__logging_handlers.items():
            logging.getLogger(name).removeHandler(handler)

-    def _send(self, request):
-        timeout = self._calculate_timeout(request)
-        headers = self._merge_headers(request.headers)
+    def _prepare_headers(self, request, headers):
        if 'cookie' not in headers:
            cookiejar = self._get_cookiejar(request)
            cookie_header = cookiejar.get_cookie_header(request.url)
            if cookie_header:
                headers['cookie'] = cookie_header

+    def _send(self, request):
+        timeout = self._calculate_timeout(request)
+        headers = self._get_headers(request)
        wsuri = parse_uri(request.url)
        create_conn_kwargs = {
            'source_address': (self.source_address, 0) if self.source_address else None,


@@ -206,6 +206,7 @@ class RequestHandler(abc.ABC):
    - `cookiejar`: Cookiejar to use for this request.
    - `timeout`: socket timeout to use for this request.
    - `legacy_ssl`: Enable legacy SSL options for this request. See legacy_ssl_support.
+    - `keep_header_casing`: Keep the casing of headers when sending the request.
    To enable these, add extensions.pop('<extension>', None) to _check_extensions

    Apart from the url protocol, proxies dict may contain the following keys:
@@ -259,6 +260,23 @@ class RequestHandler(abc.ABC):
    def _merge_headers(self, request_headers):
        return HTTPHeaderDict(self.headers, request_headers)

+    def _prepare_headers(self, request: Request, headers: HTTPHeaderDict) -> None:  # noqa: B027
+        """Additional operations to prepare headers before building. To be extended by subclasses.
+        @param request: Request object
+        @param headers: Merged headers to prepare
+        """
+
+    def _get_headers(self, request: Request) -> dict[str, str]:
+        """
+        Get headers for external use.
+        Subclasses may define a _prepare_headers method to modify headers after merge but before building.
+        """
+        headers = self._merge_headers(request.headers)
+        self._prepare_headers(request, headers)
+        if request.extensions.get('keep_header_casing'):
+            return headers.sensitive()
+        return dict(headers)
+
    def _calculate_timeout(self, request):
        return float(request.extensions.get('timeout') or self.timeout)
@@ -317,6 +335,7 @@ class RequestHandler(abc.ABC):
        assert isinstance(extensions.get('cookiejar'), (YoutubeDLCookieJar, NoneType))
        assert isinstance(extensions.get('timeout'), (float, int, NoneType))
        assert isinstance(extensions.get('legacy_ssl'), (bool, NoneType))
+        assert isinstance(extensions.get('keep_header_casing'), (bool, NoneType))

    def _validate(self, request):
        self._check_url_scheme(request)
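The split above gives handler subclasses a single hook: `_get_headers` merges
and finalizes, `_prepare_headers` applies per-handler tweaks in place, and the
`keep_header_casing` extension decides whether the caller receives the
case-preserving view. A minimal sketch of a conforming subclass (hypothetical
class, real hook names from this diff):

    from yt_dlp.networking.common import RequestHandler

    class ExampleRH(RequestHandler):
        _SUPPORTED_URL_SCHEMES = ('http', 'https')

        def _prepare_headers(self, request, headers):
            # Mutate the merged HTTPHeaderDict in place; nothing is returned
            if 'accept-encoding' not in headers:
                headers['accept-encoding'] = 'identity'

        def _send(self, request):
            # Plain dict; original casing is kept when the request sets
            # extensions={'keep_header_casing': True}
            headers = self._get_headers(request)
            raise NotImplementedError('transport layer elided in this sketch')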


@@ -5,11 +5,11 @@ from abc import ABC
from dataclasses import dataclass
from typing import Any

-from .common import RequestHandler, register_preference
+from .common import RequestHandler, register_preference, Request
from .exceptions import UnsupportedRequest
from ..compat.types import NoneType
from ..utils import classproperty, join_nonempty
-from ..utils.networking import std_headers
+from ..utils.networking import std_headers, HTTPHeaderDict


@dataclass(order=True, frozen=True)
@@ -123,7 +123,17 @@ class ImpersonateRequestHandler(RequestHandler, ABC):
"""Get the requested target for the request""" """Get the requested target for the request"""
return self._resolve_target(request.extensions.get('impersonate') or self.impersonate) return self._resolve_target(request.extensions.get('impersonate') or self.impersonate)
def _get_impersonate_headers(self, request): def _prepare_impersonate_headers(self, request: Request, headers: HTTPHeaderDict) -> None: # noqa: B027
"""Additional operations to prepare headers before building. To be extended by subclasses.
@param request: Request object
@param headers: Merged headers to prepare
"""
def _get_impersonate_headers(self, request: Request) -> dict[str, str]:
"""
Get headers for external impersonation use.
Subclasses may define a _prepare_impersonate_headers method to modify headers after merge but before building.
"""
headers = self._merge_headers(request.headers) headers = self._merge_headers(request.headers)
if self._get_request_target(request) is not None: if self._get_request_target(request) is not None:
# remove all headers present in std_headers # remove all headers present in std_headers
@@ -131,7 +141,11 @@ class ImpersonateRequestHandler(RequestHandler, ABC):
            for k, v in std_headers.items():
                if headers.get(k) == v:
                    headers.pop(k)
-        return headers
+
+        self._prepare_impersonate_headers(request, headers)
+        if request.extensions.get('keep_header_casing'):
+            return headers.sensitive()
+        return dict(headers)


@register_preference(ImpersonateRequestHandler)


@@ -398,7 +398,7 @@ def create_parser():
            '(Alias: --no-config)'))
    general.add_option(
        '--no-config-locations',
-        action='store_const', dest='config_locations', const=[],
+        action='store_const', dest='config_locations', const=None,
        help=(
            'Do not load any custom configuration files (default). When given inside a '
            'configuration file, ignore all previous --config-locations defined in the current file'))
@@ -410,12 +410,21 @@ def create_parser():
'("-" for stdin). Can be used multiple times and inside other configuration files')) '("-" for stdin). Can be used multiple times and inside other configuration files'))
general.add_option( general.add_option(
'--plugin-dirs', '--plugin-dirs',
dest='plugin_dirs', metavar='PATH', action='append', metavar='PATH',
dest='plugin_dirs',
action='callback',
callback=_list_from_options_callback,
type='str',
callback_kwargs={'delim': None},
default=['default'],
help=( help=(
'Path to an additional directory to search for plugins. ' 'Path to an additional directory to search for plugins. '
'This option can be used multiple times to add multiple directories. ' 'This option can be used multiple times to add multiple directories. '
'Note that this currently only works for extractor plugins; ' 'Use "default" to search the default plugin directories (default)'))
'postprocessor plugins can only be loaded from the default plugin directories')) general.add_option(
'--no-plugin-dirs',
dest='plugin_dirs', action='store_const', const=[],
help='Clear plugin directories to search, including defaults and those provided by previous --plugin-dirs')
general.add_option( general.add_option(
'--flat-playlist', '--flat-playlist',
action='store_const', dest='extract_flat', const='in_playlist', default=False, action='store_const', dest='extract_flat', const='in_playlist', default=False,
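# Net effect of the two options above (illustrative):
#   --plugin-dirs PATH      appends PATH to plugin_dirs (delim=None disables
#                           splitting); the default ['default'] keeps the
#                           builtin search locations
#   --plugin-dirs default   re-adds the builtin locations explicitly
#   --no-plugin-dirs        stores [], which load_plugins() treats as
#                           "plugin loading disabled"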


@@ -1,4 +1,5 @@
import contextlib
+import dataclasses
import functools
import importlib
import importlib.abc
@@ -14,17 +15,48 @@ import zipimport
from pathlib import Path
from zipfile import ZipFile

+from .globals import (
+    Indirect,
+    plugin_dirs,
+    all_plugins_loaded,
+    plugin_specs,
+)
+
from .utils import (
-    Config,
    get_executable_path,
    get_system_config_dirs,
    get_user_config_dirs,
+    merge_dicts,
    orderedSet,
    write_string,
)

PACKAGE_NAME = 'yt_dlp_plugins'
COMPAT_PACKAGE_NAME = 'ytdlp_plugins'
_BASE_PACKAGE_PATH = Path(__file__).parent
# Please Note: Due to necessary changes and the complex nature involved,
# no backwards compatibility is guaranteed for the plugin system API.
# However, we will still try our best.
__all__ = [
'COMPAT_PACKAGE_NAME',
'PACKAGE_NAME',
'PluginSpec',
'directories',
'load_all_plugins',
'load_plugins',
'register_plugin_spec',
]
@dataclasses.dataclass
class PluginSpec:
module_name: str
suffix: str
destination: Indirect
plugin_destination: Indirect
class PluginLoader(importlib.abc.Loader): class PluginLoader(importlib.abc.Loader):
@@ -44,7 +76,42 @@ def dirs_in_zip(archive):
        pass
    except Exception as e:
        write_string(f'WARNING: Could not read zip file {archive}: {e}\n')
-    return set()
+    return ()
def default_plugin_paths():
def _get_package_paths(*root_paths, containing_folder):
for config_dir in orderedSet(map(Path, root_paths), lazy=True):
# We need to filter the base path added when running __main__.py directly
if config_dir == _BASE_PACKAGE_PATH:
continue
with contextlib.suppress(OSError):
yield from (config_dir / containing_folder).iterdir()
# Load from yt-dlp config folders
yield from _get_package_paths(
*get_user_config_dirs('yt-dlp'),
*get_system_config_dirs('yt-dlp'),
containing_folder='plugins',
)
# Load from yt-dlp-plugins folders
yield from _get_package_paths(
get_executable_path(),
*get_user_config_dirs(''),
*get_system_config_dirs(''),
containing_folder='yt-dlp-plugins',
)
# Load from PYTHONPATH directories
yield from (path for path in map(Path, sys.path) if path != _BASE_PACKAGE_PATH)
def candidate_plugin_paths(candidate):
candidate_path = Path(candidate)
if not candidate_path.is_dir():
raise ValueError(f'Invalid plugin directory: {candidate_path}')
yield from candidate_path.iterdir()
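# With these two generators, PluginFinder.search_locations() below reduces to a
# single chain: each entry of plugin_dirs.value is either the literal string
# 'default' (expanded to the config/PYTHONPATH locations above) or a concrete
# directory whose children are searched. E.g. a plugin_dirs.value of
# ['default', '/opt/ytdlp-plugins'] walks every default location plus the
# children of /opt/ytdlp-plugins.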
class PluginFinder(importlib.abc.MetaPathFinder):
@@ -56,40 +123,16 @@ class PluginFinder(importlib.abc.MetaPathFinder):
    def __init__(self, *packages):
        self._zip_content_cache = {}
-        self.packages = set(itertools.chain.from_iterable(
-            itertools.accumulate(name.split('.'), lambda a, b: '.'.join((a, b)))
-            for name in packages))
+        self.packages = set(
+            itertools.chain.from_iterable(
+                itertools.accumulate(name.split('.'), lambda a, b: '.'.join((a, b)))
+                for name in packages))

    def search_locations(self, fullname):
-        candidate_locations = []
-
-        def _get_package_paths(*root_paths, containing_folder='plugins'):
-            for config_dir in orderedSet(map(Path, root_paths), lazy=True):
-                with contextlib.suppress(OSError):
-                    yield from (config_dir / containing_folder).iterdir()
-
-        # Load from yt-dlp config folders
-        candidate_locations.extend(_get_package_paths(
-            *get_user_config_dirs('yt-dlp'),
-            *get_system_config_dirs('yt-dlp'),
-            containing_folder='plugins'))
-
-        # Load from yt-dlp-plugins folders
-        candidate_locations.extend(_get_package_paths(
-            get_executable_path(),
-            *get_user_config_dirs(''),
-            *get_system_config_dirs(''),
-            containing_folder='yt-dlp-plugins'))
-
-        candidate_locations.extend(map(Path, sys.path))  # PYTHONPATH
-        with contextlib.suppress(ValueError):  # Added when running __main__.py directly
-            candidate_locations.remove(Path(__file__).parent)
-
-        # TODO(coletdjnz): remove when plugin globals system is implemented
-        if Config._plugin_dirs:
-            candidate_locations.extend(_get_package_paths(
-                *Config._plugin_dirs,
-                containing_folder=''))
+        candidate_locations = itertools.chain.from_iterable(
+            default_plugin_paths() if candidate == 'default' else candidate_plugin_paths(candidate)
+            for candidate in plugin_dirs.value
+        )

        parts = Path(*fullname.split('.'))
        for path in orderedSet(candidate_locations, lazy=True):
@@ -109,7 +152,8 @@ class PluginFinder(importlib.abc.MetaPathFinder):
search_locations = list(map(str, self.search_locations(fullname))) search_locations = list(map(str, self.search_locations(fullname)))
if not search_locations: if not search_locations:
return None # Prevent using built-in meta finders for searching plugins.
raise ModuleNotFoundError(fullname)
spec = importlib.machinery.ModuleSpec(fullname, PluginLoader(), is_package=True) spec = importlib.machinery.ModuleSpec(fullname, PluginLoader(), is_package=True)
spec.submodule_search_locations = search_locations spec.submodule_search_locations = search_locations
@@ -123,8 +167,10 @@ class PluginFinder(importlib.abc.MetaPathFinder):
def directories(): def directories():
spec = importlib.util.find_spec(PACKAGE_NAME) with contextlib.suppress(ModuleNotFoundError):
return spec.submodule_search_locations if spec else [] if spec := importlib.util.find_spec(PACKAGE_NAME):
return list(spec.submodule_search_locations)
return []
def iter_modules(subpackage): def iter_modules(subpackage):
@@ -134,19 +180,23 @@ def iter_modules(subpackage):
yield from pkgutil.iter_modules(path=pkg.__path__, prefix=f'{fullname}.') yield from pkgutil.iter_modules(path=pkg.__path__, prefix=f'{fullname}.')
def load_module(module, module_name, suffix): def get_regular_classes(module, module_name, suffix):
# Find standard public plugin classes (not overrides)
return inspect.getmembers(module, lambda obj: ( return inspect.getmembers(module, lambda obj: (
inspect.isclass(obj) inspect.isclass(obj)
and obj.__name__.endswith(suffix) and obj.__name__.endswith(suffix)
and obj.__module__.startswith(module_name) and obj.__module__.startswith(module_name)
and not obj.__name__.startswith('_') and not obj.__name__.startswith('_')
and obj.__name__ in getattr(module, '__all__', [obj.__name__]))) and obj.__name__ in getattr(module, '__all__', [obj.__name__])
and getattr(obj, 'PLUGIN_NAME', None) is None
))
def load_plugins(name, suffix): def load_plugins(plugin_spec: PluginSpec):
classes = {} name, suffix = plugin_spec.module_name, plugin_spec.suffix
if os.environ.get('YTDLP_NO_PLUGINS'): regular_classes = {}
return classes if os.environ.get('YTDLP_NO_PLUGINS') or not plugin_dirs.value:
return regular_classes
for finder, module_name, _ in iter_modules(name): for finder, module_name, _ in iter_modules(name):
if any(x.startswith('_') for x in module_name.split('.')): if any(x.startswith('_') for x in module_name.split('.')):
@@ -163,24 +213,42 @@ def load_plugins(name, suffix):
sys.modules[module_name] = module sys.modules[module_name] = module
spec.loader.exec_module(module) spec.loader.exec_module(module)
except Exception: except Exception:
write_string(f'Error while importing module {module_name!r}\n{traceback.format_exc(limit=-1)}') write_string(
f'Error while importing module {module_name!r}\n{traceback.format_exc(limit=-1)}',
)
continue continue
classes.update(load_module(module, module_name, suffix)) regular_classes.update(get_regular_classes(module, module_name, suffix))
# Compat: old plugin system using __init__.py # Compat: old plugin system using __init__.py
# Note: plugins imported this way do not show up in directories() # Note: plugins imported this way do not show up in directories()
# nor are considered part of the yt_dlp_plugins namespace package # nor are considered part of the yt_dlp_plugins namespace package
with contextlib.suppress(FileNotFoundError): if 'default' in plugin_dirs.value:
spec = importlib.util.spec_from_file_location( with contextlib.suppress(FileNotFoundError):
name, Path(get_executable_path(), COMPAT_PACKAGE_NAME, name, '__init__.py')) spec = importlib.util.spec_from_file_location(
plugins = importlib.util.module_from_spec(spec) name,
sys.modules[spec.name] = plugins Path(get_executable_path(), COMPAT_PACKAGE_NAME, name, '__init__.py'),
spec.loader.exec_module(plugins) )
classes.update(load_module(plugins, spec.name, suffix)) plugins = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = plugins
spec.loader.exec_module(plugins)
regular_classes.update(get_regular_classes(plugins, spec.name, suffix))
return classes # Add the classes into the global plugin lookup for that type
plugin_spec.plugin_destination.value = regular_classes
# We want to prepend to the main lookup for that type
plugin_spec.destination.value = merge_dicts(regular_classes, plugin_spec.destination.value)
return regular_classes
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.extractor', f'{PACKAGE_NAME}.postprocessor')) def load_all_plugins():
for plugin_spec in plugin_specs.value.values():
load_plugins(plugin_spec)
all_plugins_loaded.value = True
__all__ = ['COMPAT_PACKAGE_NAME', 'PACKAGE_NAME', 'directories', 'load_plugins']
def register_plugin_spec(plugin_spec: PluginSpec):
# If the plugin spec for a module is already registered, it will not be added again
if plugin_spec.module_name not in plugin_specs.value:
plugin_specs.value[plugin_spec.module_name] = plugin_spec
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.{plugin_spec.module_name}'))
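
Taken together, this rework moves registration from one hard-coded `sys.meta_path.insert(...)` call to per-component `PluginSpec`s. A minimal sketch of how a component opts in under the new API, assuming `extractors` and `plugin_ies` are `Indirect` lookups provided by `yt_dlp.globals` (they are not shown in this diff):

    from yt_dlp.globals import extractors, plugin_ies  # assumed Indirect containers
    from yt_dlp.plugins import PluginSpec, load_all_plugins, register_plugin_spec

    # Registering a spec installs a PluginFinder for yt_dlp_plugins.extractor.*
    register_plugin_spec(PluginSpec(
        module_name='extractor',        # searched as yt_dlp_plugins.extractor.*
        suffix='IE',                    # only public classes named *IE are collected
        destination=extractors,         # plugin classes are prepended to the main lookup
        plugin_destination=plugin_ies,  # plugin-only lookup for this type
    ))

    load_all_plugins()  # resolves every registered spec and sets all_plugins_loaded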


@@ -33,15 +33,38 @@ from .movefilesafterdownload import MoveFilesAfterDownloadPP
 from .sponskrub import SponSkrubPP
 from .sponsorblock import SponsorBlockPP
 from .xattrpp import XAttrMetadataPP
-from ..plugins import load_plugins
+from ..globals import plugin_pps, postprocessors
+from ..plugins import PACKAGE_NAME, register_plugin_spec, PluginSpec
+from ..utils import deprecation_warning

-_PLUGIN_CLASSES = load_plugins('postprocessor', 'PP')
+
+def __getattr__(name):
+    lookup = plugin_pps.value
+    if name in lookup:
+        deprecation_warning(
+            f'Importing a plugin Post-Processor from {__name__} is deprecated. '
+            f'Please import {PACKAGE_NAME}.postprocessor.{name} instead.')
+        return lookup[name]
+    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')


 def get_postprocessor(key):
-    return globals()[key + 'PP']
+    return postprocessors.value[key + 'PP']


-globals().update(_PLUGIN_CLASSES)
-__all__ = [name for name in globals() if name.endswith('PP')]
-__all__.extend(('FFmpegPostProcessor', 'PostProcessor'))
+register_plugin_spec(PluginSpec(
+    module_name='postprocessor',
+    suffix='PP',
+    destination=postprocessors,
+    plugin_destination=plugin_pps,
+))
+
+_default_pps = {
+    name: value
+    for name, value in globals().items()
+    if name.endswith('PP') or name in ('FFmpegPostProcessor', 'PostProcessor')
+}
+postprocessors.value.update(_default_pps)
+
+__all__ = list(_default_pps.values())


@@ -10,6 +10,7 @@ from ..utils import (
     _configuration_args,
     deprecation_warning,
 )
+from ..utils._utils import _ProgressState


 class PostProcessorMetaClass(type):

@@ -189,7 +190,7 @@ class PostProcessor(metaclass=PostProcessorMetaClass):
         self._downloader.to_console_title(self._downloader.evaluate_outtmpl(
             progress_template.get('postprocess-title') or 'yt-dlp %(progress._default_template)s',
-            progress_dict))
+            progress_dict), _ProgressState.from_dict(s), s.get('_percent'))

     def _retry_download(self, err, count, retries):
         # While this is not an extractor, it behaves similar to one and


@@ -202,7 +202,7 @@ class FFmpegPostProcessor(PostProcessor):
     @property
     def available(self):
-        return self.basename is not None
+        return bool(self._ffmpeg_location.get()) or self.basename is not None

     @property
     def executable(self):

@@ -743,7 +743,7 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
             if value not in ('', None):
                 value = ', '.join(map(str, variadic(value)))
                 value = value.replace('\0', '')  # nul character cannot be passed in command line
-                metadata['common'].update({meta_f: value for meta_f in variadic(meta_list)})
+                metadata['common'].update(dict.fromkeys(variadic(meta_list), value))

         # Info on media metadata/metadata supported by ffmpeg:
         # https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
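
This hunk, like the update-related hunk below, replaces a dict comprehension with the equivalent `dict.fromkeys` call. A standalone illustration with made-up values:

    meta_list = ('artist', 'creator')
    value = 'Example Artist'

    # With an explicit value, fromkeys matches the old comprehension exactly
    assert dict.fromkeys(meta_list, value) == {meta_f: value for meta_f in meta_list}

    # Without a value, every key defaults to None (the _NON_UPDATEABLE_REASONS case)
    assert dict.fromkeys(meta_list) == {variant: None for variant in meta_list}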


@@ -117,7 +117,7 @@ _FILE_SUFFIXES = {
 }

 _NON_UPDATEABLE_REASONS = {
-    **{variant: None for variant in _FILE_SUFFIXES},  # Updatable
+    **dict.fromkeys(_FILE_SUFFIXES),  # Updatable
     **{variant: f'Auto-update is not supported for unpackaged {name} executable; Re-download the latest release'
        for variant, name in {'win32_dir': 'Windows', 'darwin_dir': 'MacOS', 'linux_dir': 'Linux'}.items()},
     'py2exe': 'py2exe is no longer supported by yt-dlp; This executable cannot be updated',


@@ -8,6 +8,7 @@ import contextlib
 import datetime as dt
 import email.header
 import email.utils
+import enum
 import errno
 import functools
 import hashlib

@@ -51,6 +52,7 @@ from ..compat import (
     compat_HTMLParseError,
 )
 from ..dependencies import xattr
+from ..globals import IN_CLI

 __name__ = __name__.rsplit('.', 1)[0]  # noqa: A001: Pretend to be the parent module

@@ -1486,8 +1488,7 @@
 # TODO: Use global logger
 def deprecation_warning(msg, *, printer=None, stacklevel=0, **kwargs):
-    from .. import _IN_CLI
-    if _IN_CLI:
+    if IN_CLI.value:
         if msg in deprecation_warning._cache:
             return
         deprecation_warning._cache.add(msg)

@@ -3246,7 +3247,7 @@ def _match_one(filter_part, dct, incomplete):
         op = lambda attr, value: not unnegated_op(attr, value)
     else:
         op = unnegated_op
-    comparison_value = m['quotedstrval'] or m['strval'] or m['intval']
+    comparison_value = m['quotedstrval'] or m['strval']
     if m['quote']:
         comparison_value = comparison_value.replace(r'\{}'.format(m['quote']), m['quote'])
     actual_value = dct.get(m['key'])

@@ -4890,10 +4891,6 @@ class Config:
     filename = None
     __initialized = False

-    # Internal only, do not use! Hack to enable --plugin-dirs
-    # TODO(coletdjnz): remove when plugin globals system is implemented
-    _plugin_dirs = None
-
     def __init__(self, parser, label=None):
         self.parser, self.label = parser, label
         self._loaded_paths, self.configs = set(), []

@@ -5677,3 +5674,32 @@ class _YDLLogger:
     def stderr(self, message):
         if self._ydl:
             self._ydl.to_stderr(message)
+
+
+class _ProgressState(enum.Enum):
+    """
+    Represents a state for a progress bar.
+
+    See: https://conemu.github.io/en/AnsiEscapeCodes.html#ConEmu_specific_OSC
+    """
+
+    HIDDEN = 0
+    INDETERMINATE = 3
+    VISIBLE = 1
+    WARNING = 4
+    ERROR = 2
+
+    @classmethod
+    def from_dict(cls, s, /):
+        if s['status'] == 'finished':
+            return cls.INDETERMINATE
+
+        # Not currently used
+        if s['status'] == 'error':
+            return cls.ERROR
+
+        return cls.INDETERMINATE if s.get('_percent') is None else cls.VISIBLE
+
+    def get_ansi_escape(self, /, percent=None):
+        percent = 0 if percent is None else int(percent)
+        return f'\033]9;4;{self.value};{percent}\007'
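
`_ProgressState` wraps the ConEmu `OSC 9;4` console-progress sequence referenced in its docstring. A short sketch of how the two methods combine for a typical progress dict:

    s = {'status': 'downloading', '_percent': 42.0}
    state = _ProgressState.from_dict(s)
    assert state is _ProgressState.VISIBLE

    # ESC ] 9 ; 4 ; <state> ; <percent> BEL -- here a determinate bar at 42%
    assert state.get_ansi_escape(s['_percent']) == '\033]9;4;1;42\007'

    # 'finished' (and any percent-less dict) maps to the indeterminate state
    assert _ProgressState.from_dict({'status': 'finished'}) is _ProgressState.INDETERMINATE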


@@ -1,9 +1,16 @@
+from __future__ import annotations
+
 import collections
+import collections.abc
 import random
+import typing
 import urllib.parse
 import urllib.request

-from ._utils import remove_start
+if typing.TYPE_CHECKING:
+    T = typing.TypeVar('T')
+
+from ._utils import NO_DEFAULT, remove_start


 def random_user_agent():

@@ -51,32 +58,141 @@ def random_user_agent():
     return _USER_AGENT_TPL % random.choice(_CHROME_VERSIONS)


-class HTTPHeaderDict(collections.UserDict, dict):
+class HTTPHeaderDict(dict):
     """
     Store and access keys case-insensitively.
     The constructor can take multiple dicts, in which keys in the latter are prioritised.
+
+    Retains a case sensitive mapping of the headers, which can be accessed via `.sensitive()`.
     """

-    def __init__(self, *args, **kwargs):
+    def __new__(cls, *args: typing.Any, **kwargs: typing.Any) -> typing.Self:
+        obj = dict.__new__(cls, *args, **kwargs)
+        obj.__sensitive_map = {}
+        return obj
+
+    def __init__(self, /, *args, **kwargs):
         super().__init__()
-        for dct in args:
-            if dct is not None:
-                self.update(dct)
-        self.update(kwargs)
+        self.__sensitive_map = {}
+
+        for dct in filter(None, args):
+            self.update(dct)
+        if kwargs:
+            self.update(kwargs)

-    def __setitem__(self, key, value):
-        if isinstance(value, bytes):
-            value = value.decode('latin-1')
-        super().__setitem__(key.title(), str(value).strip())
+    def sensitive(self, /) -> dict[str, str]:
+        return {
+            self.__sensitive_map[key]: value
+            for key, value in self.items()
+        }

-    def __getitem__(self, key):
+    def __contains__(self, key: str, /) -> bool:
+        return super().__contains__(key.title() if isinstance(key, str) else key)
+
+    def __delitem__(self, key: str, /) -> None:
+        key = key.title()
+        del self.__sensitive_map[key]
+        super().__delitem__(key)
+
+    def __getitem__(self, key, /) -> str:
         return super().__getitem__(key.title())

-    def __delitem__(self, key):
-        super().__delitem__(key.title())
+    def __ior__(self, other, /):
+        if isinstance(other, type(self)):
+            other = other.sensitive()
+        if isinstance(other, dict):
+            self.update(other)
+            return
+        return NotImplemented

-    def __contains__(self, key):
-        return super().__contains__(key.title() if isinstance(key, str) else key)
+    def __or__(self, other, /) -> typing.Self:
+        if isinstance(other, type(self)):
+            other = other.sensitive()
+        if isinstance(other, dict):
+            return type(self)(self.sensitive(), other)
+        return NotImplemented
+
+    def __ror__(self, other, /) -> typing.Self:
+        if isinstance(other, type(self)):
+            other = other.sensitive()
+        if isinstance(other, dict):
+            return type(self)(other, self.sensitive())
+        return NotImplemented
+
+    def __setitem__(self, key: str, value, /) -> None:
+        if isinstance(value, bytes):
+            value = value.decode('latin-1')
+        key_title = key.title()
+        self.__sensitive_map[key_title] = key
+        super().__setitem__(key_title, str(value).strip())
+
+    def clear(self, /) -> None:
+        self.__sensitive_map.clear()
+        super().clear()
+
+    def copy(self, /) -> typing.Self:
+        return type(self)(self.sensitive())
+
+    @typing.overload
+    def get(self, key: str, /) -> str | None: ...
+
+    @typing.overload
+    def get(self, key: str, /, default: T) -> str | T: ...
+
+    def get(self, key, /, default=NO_DEFAULT):
+        key = key.title()
+        if default is NO_DEFAULT:
+            return super().get(key)
+        return super().get(key, default)
+
+    @typing.overload
+    def pop(self, key: str, /) -> str: ...
+
+    @typing.overload
+    def pop(self, key: str, /, default: T) -> str | T: ...
+
+    def pop(self, key, /, default=NO_DEFAULT):
+        key = key.title()
+        if default is NO_DEFAULT:
+            self.__sensitive_map.pop(key)
+            return super().pop(key)
+        self.__sensitive_map.pop(key, default)
+        return super().pop(key, default)
+
+    def popitem(self) -> tuple[str, str]:
+        self.__sensitive_map.popitem()
+        return super().popitem()
+
+    @typing.overload
+    def setdefault(self, key: str, /) -> str: ...
+
+    @typing.overload
+    def setdefault(self, key: str, /, default) -> str: ...
+
+    def setdefault(self, key, /, default=None) -> str:
+        key = key.title()
+        if key in self.__sensitive_map:
+            return super().__getitem__(key)
+
+        self[key] = default or ''
+        return self[key]
+
+    def update(self, other, /, **kwargs) -> None:
+        if isinstance(other, type(self)):
+            other = other.sensitive()
+        if isinstance(other, collections.abc.Mapping):
+            for key, value in other.items():
+                self[key] = value
+        elif hasattr(other, 'keys'):
+            for key in other.keys():  # noqa: SIM118
+                self[key] = other[key]
+        else:
+            for key, value in other:
+                self[key] = value
+        for key, value in kwargs.items():
+            self[key] = value


 std_headers = HTTPHeaderDict({
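
The rewritten `HTTPHeaderDict` drops the `collections.UserDict` base and keeps a parallel case-sensitive map, so the original header spelling survives round-trips. Expected behaviour per the methods above (a sketch, not part of the diff):

    headers = HTTPHeaderDict({'x-custom-header': 'value'}, {'User-Agent': 'yt-dlp'})

    # Lookup and membership are case-insensitive; keys are stored title-cased
    assert headers['X-CUSTOM-HEADER'] == 'value'
    assert 'user-agent' in headers

    # The original casing is retained and recoverable
    assert headers.sensitive() == {'x-custom-header': 'value', 'User-Agent': 'yt-dlp'}

    # Merging with | goes through .sensitive(), preserving casing in the result
    merged = headers | {'Accept': '*/*'}
    assert merged.get('accept') == '*/*'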

Some files were not shown because too many files have changed in this diff.