mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-13 10:21:30 +00:00

Compare commits

...

324 Commits

Author SHA1 Message Date
github-actions
b761428226 [version] update
Created by: pukkandan

:ci skip all
2022-02-04 06:39:10 +00:00
pukkandan
c1653e9efb Release 2022.02.04 2022-02-04 12:03:56 +05:30
pukkandan
84bbc54599 [youtube:search] Add tests 2022-02-04 12:02:01 +05:30
pukkandan
1e5d87beee [websocket] Make syntax error in websockets module non-fatal
Closes #2633
2022-02-04 12:02:01 +05:30
nixxo
22219f2d1f [mediaset] Fix extractor (#2158)
Closes #2149
Authored by: nixxo
2022-02-04 11:29:38 +05:30
Lesmiscore (Naoya Ozaki)
5a13fdd225 [twitcasting] Enforce UTF-8 for POST payload (#2521)
Authored by: Lesmiscore
2022-02-04 11:24:33 +05:30
coletdjnz
af5c1c553e [youtube] Fix search extractor
Regression introduced in 16aa9ea41d. Closes #2628
Authored-by: coletdjnz
2022-02-04 10:32:56 +13:00
github-actions
3cea9ec2eb [version] update
Created by: pukkandan

:ci skip all
2022-02-03 17:51:54 +00:00
pukkandan
28469edd7d Release 2022.02.03 2022-02-03 23:14:46 +05:30
pukkandan
d5a398988b Update to ytdl-commit-78ce962
[youtube] Support channel search
78ce962f4f
2022-02-03 22:23:24 +05:30
pukkandan
455a15e2dc [cleanup,docs] Minor fixes
Closes #2541, #2484
2022-02-03 21:00:39 +05:30
pukkandan
460a1c08b9 [FFmpegConcat] Abort on --skip-download and download errors
Closes #2470
2022-02-03 21:00:38 +05:30
pukkandan
4918522735 [utils] Strip double spaces in clean_html
Closes #2497
Authored by: dirkf
2022-02-03 21:00:37 +05:30
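A minimal sketch of the effect (the input string is illustrative, not from the commit):

    from yt_dlp.utils import clean_html

    # tags are stripped and runs of spaces now collapse to a single space
    clean_html('<p>foo  <b>bar</b></p>')  # -> 'foo bar'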
pukkandan
65662dffb1 Make nested --config-locations relative to parent file
* and allow environment variables in it so that you can use `$PWD`/`%cd%`
to specify paths relative to current directory
2022-02-03 21:00:36 +05:30
pukkandan
5e51f4a8ad [glomex] Simplify embed detection (#2600)
Closes #2512
2022-02-03 13:16:47 +05:30
pukkandan
54bb39065c [bilibili] Fix extractor
Closes #2599, Closes #2562
Fixes https://github.com/yt-dlp/yt-dlp/pull/1716#issuecomment-980512982
2022-02-02 18:06:47 +05:30
pukkandan
c5332d7fbb [instagram] Fix bug in 013322a95e
Closes #2552
2022-02-02 08:55:43 +05:30
pukkandan
35cd4c4d88 [cctv] De-prioritize sample format
Closes #2479
2022-02-02 07:44:04 +05:30
pukkandan
67fb99f193 [doodstream] Fix extractor
Closes #2584
2022-02-02 07:33:28 +05:30
pukkandan
85553414ae [generic] Allow further processing of json_ld URL
Closes #2578
2022-02-02 07:33:16 +05:30
pukkandan
d16df59db5 Fix --compat-options list-formats
Closes #2481
2022-02-02 06:09:10 +05:30
Bricio
63c3ee4f63 [globo] Fix extractor (#2589)
Closes #2524
Authored by: Bricio
2022-02-02 06:00:09 +05:30
pukkandan
182bda88e8 [youtube, cleanup] Misc fixes and cleanup 2022-02-02 02:15:53 +05:30
pukkandan
16aa9ea41d [youtube] Add extractor YoutubeMusicSearchURLIE
Closes #2568
2022-02-02 00:11:38 +05:30
Lesmiscore (Naoya Ozaki)
d6bc443bde [fc2] Fix extraction (#2572)
Closes #2566
Authored by: Lesmiscore
2022-02-01 18:52:18 +05:30
MinePlayersPE
046cab3915 [TikTok] Iterate through app versions (#2449)
Closes #2476
Authored by: MinePlayersPE
2022-02-01 13:56:23 +05:30
Sipherdrakon
7df07a3b55 [dplay] Add extractors for site changes (#2401)
Closes #2438
Authored by: Sipherdrakon
2022-02-01 13:17:23 +05:30
Zenon Mousmoulas
2d49720f89 [ertgr] Add new extractors (#2338)
Authored-by: zmousm, dirkf
2022-02-01 13:02:13 +05:30
pukkandan
48416bc4a8 [youtube] Fix n-sig for player e06dea74 2022-02-01 08:10:41 +05:30
pukkandan
6a0546e313 [outtmpl] Handle hard-coded file extension better
When we know that the user-provided extension is the correct final one,
replace it with intermediate extension during download
2022-02-01 06:26:01 +05:30
pukkandan
dbcea0585f [outtmpl] Handle -o "" better
Since the specific type of file is not downloaded when giving `-o "<type>:"`,
now `-o ""` acts as an alias to `--skip-download`
2022-02-01 06:21:36 +05:30
KiberInfinity
f7d4854131 [Pladform] Fix redirection to external player (#2550)
Authored by: KiberInfinity
2022-02-01 05:23:54 +05:30
foghawk
403be2eefb [tumblr] Fix 403 errors and handle vimeo embeds (#2542)
Fixes https://github.com/ytdl-org/youtube-dl/issues/29585
Authored by: foghawk
2022-02-01 02:31:21 +05:30
lazypete365
63bac931c2 [mildom] Fix extractor (#2533)
Closes #2519
Authored by: lazypete365
2022-02-01 02:24:49 +05:30
Jeff Huffman
7c74a01584 [crunchyroll] Fix login (#2530)
Closes #1424
Authored by: tejing1
2022-02-01 01:31:17 +05:30
pukkandan
1d3586d0d5 [aes] Add unpad_pkcs7 2022-02-01 00:29:36 +05:30
pukkandan
c533c89ce1 [GoogleSearch] Fix extractor 2022-02-01 00:12:34 +05:30
KiberInfinity
b8b3f4562a [Odnoklassniki] Improve embedded players extraction (#2549)
Authored by: KiberInfinity
2022-02-01 00:07:07 +05:30
nyuszika7h
1c6f480160 [viki] Fix "Bad request" for manifest (#2540)
Closes #2499
Authored by: nyuszika7h
2022-01-31 20:22:42 +05:30
u-spec-png
f8580bf02f [Bilibili] Add 8k support (#1964)
Closes #1898, #1819
Authored by: u-spec-png
2022-01-31 00:21:22 +05:30
Zenon Mousmoulas
19afd9ea51 [GlomexEmbed] Avoid large match objects
Closes #2512
Authored by: zmousm
2022-01-30 19:05:39 +05:30
trasssh
b72270d27e [MySpass] Fix video url processing (#2510)
Closes #2507
Authored by: trassshhub
2022-01-30 02:24:48 +05:30
Jeff Huffman
706dfe441b [crunchyroll:beta] Add cookies support (#2506)
* Extract directly from the beta API when cookies are passed. If login cookie is absent, the extraction is delegated to `CrunchyrollIE`. This causes different metadata to be extracted (including formats and video id) and therefore results in a different archive entry. For now, this issue is unavoidable since the browser also redirects to the old site when not logged in.

* Adds extractor-args `format` and `hardsub` to control the source and subtitles of the extracted formats

Closes #1911
Authored by: tejing1
2022-01-29 06:03:51 +05:30
YuenSzeHong
c4da5ff971 [Fujitv] Extract metadata and support premium (#2505)
Authored by: YuenSzeHong
2022-01-28 18:28:03 +05:30
KiberInfinity
e26f9cc1e5 [YandexVideoPreview] Add extractor (#2500)
Closes #1794
Authored by: KiberInfinity
2022-01-28 01:21:40 +05:30
pukkandan
fa8fd95118 [cookies] Fix keyring selection for unsupported desktops
Closes #2450
2022-01-24 22:52:35 +05:30
MinePlayersPE
05b23b4156 [iq.com] Add VIP support (#2444)
Authored by: MinePlayersPE
2022-01-24 22:49:34 +05:30
Aleri Kaisattera
8f028b5f40 [Vimm] add recording extractor (#2441)
Authored by: alerikaisattera
2022-01-24 22:37:04 +05:30
MinePlayersPE
013322a95e [Instagram] Fix extraction when logged in (#2439)
Closes #2435
Authored by: MinePlayersPE
2022-01-24 22:34:34 +05:30
Ashish Gupta
fb62afd6f0 [Musicdex] Add extractors (#2421)
Closes #2204
Authored by: Ashish0804
2022-01-24 22:11:21 +05:30
Ashish Gupta
50600e833d [ThisOldHouse] Improve Premium URL check (#2445)
Closes #2443
Authored by: Ashish0804
2022-01-24 21:54:28 +05:30
pukkandan
fc08bdd6ab [extractor] Allow non-fatal title extraction 2022-01-24 21:04:38 +05:30
pukkandan
2568d41f70 [bilibili] Make anthology title non-fatal 2022-01-24 20:58:32 +05:30
pukkandan
88f23a18e0 [docs,cleanup] Fix linter and misc cleanup
Closes #2419
2022-01-24 03:24:23 +05:30
pukkandan
bb66c24797 Add option --print-to-file
Closes #2372
2022-01-24 03:24:15 +05:30
pukkandan
2edb38e8ca [extractor] Extract video inside Article json_ld
Closes #2448
2022-01-24 03:24:07 +05:30
pukkandan
af6793f804 [downloader/ffmpeg] Handle unknown formats better 2022-01-24 01:15:54 +05:30
pukkandan
b695e3f9bd [orf:tvthek] Lazy playlist extraction and obey --no-playlist
Closes #2411
2022-01-24 01:11:35 +05:30
pukkandan
6a5a30f9e2 Ensure _type is present in info.json
Closes #2447
2022-01-24 01:07:15 +05:30
pukkandan
d37707bda4 Fix/improve InAdvancePagedList 2022-01-24 01:07:14 +05:30
pukkandan
f40ee5e9a0 [extractor] Add convenience function _yes_playlist 2022-01-24 01:07:14 +05:30
pukkandan
1f13021eca [web.archive:youtube] Add ytarchive: prefix
and misc cleanup
2022-01-23 18:42:37 +05:30
pukkandan
e612f66c7c [archive.org] Ignore unnecessary files
Closes #2452
2022-01-23 18:37:43 +05:30
coletdjnz
87e8e8a7d0 [youtube:api] Do not use seek when reading HTTPError response
Authored-by: coletdjnz
2022-01-23 19:11:32 +13:00
Aleri Kaisattera
e600a5c908 [CAM4] Add thumbnail extraction (#2425)
Authored by: alerikaisattera
2022-01-22 03:02:25 +05:30
github-actions
50ce204cc2 [version] update
Created by: pukkandan

:ci skip all
2022-01-21 17:41:33 +05:30
pukkandan
144a3588b4 Release 2022.01.22 2022-01-21 17:41:33 +05:30
pukkandan
ed40877833 Fix 426764371f for Py3.6 2022-01-21 16:08:25 +05:30
sian1468
935f5a4209 [line] Remove tv.line.me (#2420)
Service is discontinued
Authored by: sian1468
2022-01-21 15:12:03 +05:30
pukkandan
6970b6005e [cleanup] Minor fixes
Closes #2334
2022-01-21 13:27:44 +05:30
pukkandan
fc5fa964c7 [docs] Improvements 2022-01-21 13:27:44 +05:30
pukkandan
e0ddbd02bd [cleanup] Use format_field where applicable 2022-01-21 13:27:40 +05:30
pukkandan
0bfc53d05c List playlist thumbnails in --list-thumbnails 2022-01-21 12:52:15 +05:30
Ashish Gupta
78ab4f447c [Newsy] Add extractor (#2416)
Closes #2346
Authored by: Ashish0804
2022-01-21 12:31:25 +05:30
coletdjnz
85fee22152 [PRX] Add Extractors (#2245)
Closes #2144, https://github.com/ytdl-org/youtube-dl/issues/15948

Authored by: coletdjnz
2022-01-21 12:30:29 +05:30
Felix S
ad9158d5f4 [ard] Extract subtitles (#2409)
Fixes https://github.com/ytdl-org/youtube-dl/issues/30543, related: https://github.com/ytdl-org/youtube-dl/pull/17766
Authored by: fstirlitz
2022-01-21 12:28:22 +05:30
xtkoba
f81c62a6a4 Add option --legacy-server-connect (#778)
to allow HTTPS connection to servers that do not support RFC 5746 secure renegotiation

Authored by: xtkoba
2022-01-21 11:42:30 +05:30
coletdjnz
6c73052c0a [youtube] Extract channel subscriber count (#2399)
Closes #2350
* Adds `channel_follower_count` field
Authored-by: coletdjnz
2022-01-21 06:04:36 +00:00
Ashish Gupta
593e43c030 [LnkIE] Add extractor (#2408)
Closes: #2268 
Authored by: Ashish0804
2022-01-21 11:32:31 +05:30
Ashish Gupta
8fe514d382 [CrowdBunker] Add extractors (#2407)
Closes: #2356 
Authored by: Ashish0804
2022-01-21 11:25:55 +05:30
pukkandan
b1156c1e59 Fix d14cbdd92d 2022-01-21 07:49:03 +05:30
pukkandan
311b6615d8 [extractor] Improve url_result and related 2022-01-20 21:14:40 +05:30
coletdjnz
396a76f7bf [youtube] Enforce UTC (#2402)
and [utils] use `utcnow` in `datetime_from_str`

Related: #2223 
Authored by: coletdjnz
2022-01-20 20:32:01 +05:30
coletdjnz
301d07fc4b [youtube:tab] Extract channel banner (#2400)
Closes #2237
Authored by: coletdjnz
2022-01-20 20:29:09 +05:30
pukkandan
d14cbdd92d [utils] Add Sec-Fetch-Mode to std_headers
Closes #2187
2022-01-20 20:21:54 +05:30
pukkandan
19b4c74d40 Revert d6579d532b
Closes #2396, Reopens #2187
2022-01-20 20:00:40 +05:30
pukkandan
135dfa2c7e [extractor,cleanup] Use _search_nextjs_data 2022-01-20 04:38:24 +05:30
MinePlayersPE
e0585e6562 [TikTok] Extract captions (#2185)
Closes #2184
Authored by: MinePlayersPE
2022-01-20 04:05:27 +05:30
MinePlayersPE
426764371f [iq.com] Add extractors (#2354)
Closes #704
Authored by: MinePlayersPE
2022-01-20 03:53:55 +05:30
krichbanana
64f36541c9 [youtube:tab] Raise error on tab redirect (#2318)
Closes #2306
Authored by: krichbanana, coletdjnz
2022-01-20 03:01:57 +05:30
Aleri Kaisattera
0ff1e0fba3 [Theta] Fix valid URL (#2323)
Authored by: alerikaisattera
2022-01-20 02:14:13 +05:30
Zenon Mousmoulas
1a20d29552 [tvopengr] Add extractors (#2297)
Authored by: zmousm
2022-01-20 02:13:02 +05:30
nyuszika7h
f7085283e1 [instagram] Fix username extraction for stories and highlights (#2348)
Authored by: nyuszika7h
2022-01-20 01:06:40 +05:30
Ashish Gupta
e25ca9b017 [RTNews] Add extractor (#2377)
Closes #2371
Authored by: Ashish0804
2022-01-19 22:07:32 +05:30
trasssh
4259402c56 [Ted] Rewrite extractor (#2359)
Closes #2343
Authored by: pukkandan, trassshhub
2022-01-19 21:34:20 +05:30
Ashish Gupta
dfb7f2a25d [CTVNewsIE] Add fallback for video search (#2378)
Closes #2370
Authored by: Ashish0804
2022-01-19 19:18:57 +05:30
Lesmiscore (Naoya Ozaki)
42c5458a02 [TVer] Extract message for unaired live (#2375)
Closes #2365
Authored by: Lesmiscore
2022-01-19 19:17:07 +05:30
Lesmiscore (Naoya Ozaki)
ba1c671d2e [mixch] Add MixchArchiveIE (#2373)
Closes #2363
Authored by: Lesmiscore
2022-01-19 19:15:35 +05:30
Zenon Mousmoulas
b143e83ec9 [megatvcom] Add embed test (#2362)
Authored by: zmousm
2022-01-19 19:13:51 +05:30
k3ns1n
4a77fb1d6b [daftsex] Add extractors (#2379)
Authored by: k3ns1n
2022-01-19 19:12:02 +05:30
pukkandan
66f7c6a3e0 [youtube] Do not return upload_date for playlists
Closes #2349
Partially reverts #1018
Re-opens #1883
2022-01-19 19:07:40 +05:30
pukkandan
baf599effa [pbs] de-prioritize AD formats
Related: #2335
2022-01-19 19:00:45 +05:30
pukkandan
8bd1c00bf3 [utils] Handle ss:xxx in parse_duration
Closes #2388
2022-01-19 18:57:29 +05:30
pukkandan
596379e260 [youtube] Make invalid storyboard URL non-fatal
Closes #2382
2022-01-19 18:57:29 +05:30
pukkandan
b6ce9bb038 [youtube] Detect live-stream embeds
Closes #2380
2022-01-19 18:56:55 +05:30
Ashish Gupta
eea1b0358e [ThisOldHouseIE] Add support for premium videos (#2358)
Authored by: Ashish0804
Closes: #2341
2022-01-18 13:10:55 +05:30
Zenon Mousmoulas
32b95bb643 [megatvcom] Add extractors (#1980)
Authored by: zmousm
2022-01-17 00:11:31 +05:30
Zenon Mousmoulas
fdf80059d9 [glomex] Minor fixes (#2357)
Authored by: zmousm
2022-01-16 18:08:31 +05:30
Ashish Gupta
aa062713c1 [PokerGo] Add extractors (#2331)
Authored by: Ashish0804
Closes: #2316
2022-01-14 18:12:58 +05:30
Zenon Mousmoulas
71738b1451 [glomex] Add new extractors (#1979)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/30212
Authored by: zmousm
2022-01-14 02:39:52 +05:30
pukkandan
0bb5ac1ac4 [dplay] Re-structure DiscoveryPlus extractors 2022-01-13 23:17:50 +05:30
Timendum
77b28f000a [dplay] Migrate DiscoveryPlusItaly to DiscoveryPlus (#2315)
Partially fixes #2138
Authored by: timendum
2022-01-13 22:41:26 +05:30
pukkandan
d57576b9d9 [httpie] Fix available method
Closes #2330
2022-01-13 22:20:59 +05:30
trasssh
11c861702d [generic] Improve KVS player extraction (#2328)
Closes #2281
Authored by: trassshhub
2022-01-13 22:21:00 +05:30
Lesmiscore (The Hatsune Daishi)
a4a426023d [twitcasting] Refactor extractor (#2310)
Co-authored-by: Ashish Gupta <39122144+Ashish0804@users.noreply.github.com>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

Authored by: Lesmiscore
2022-01-13 20:47:33 +09:00
pukkandan
3b603dbdf1 Add option --concat-playlist
Closes #1855, related: #382
2022-01-13 16:32:23 +05:30
pukkandan
5df1ac92bd [ffmpeg] Ignore unknown streams
Closes #2307
2022-01-13 16:32:22 +05:30
pukkandan
b2db8102dc [facebook] Fix extraction from groups
Closes #2264, related: #2320
2022-01-13 16:32:21 +05:30
Hirokuni Yano
e9a6a65a55 [yahoo:gyao] Improved playlist handling (#1975)
Authored by: hyano
2022-01-12 22:11:35 +05:30
pukkandan
ed8d87f911 [cleanup, docs] Minor fixes
Closes #2230
2022-01-12 09:00:21 +05:30
pukkandan
397235c52b [ffmpeg] Standardize use of -map 0
Closes #2182
2022-01-12 08:52:09 +05:30
pukkandan
4636548463 [CeskaTelevize] Use http for manifests
Workaround for #2043
2022-01-12 07:18:10 +05:30
pukkandan
cb3c5682ae [kakao] Detect geo-restriction
Code from: d8085580f6
2022-01-11 22:41:21 +05:30
Petr Vaněk
7d449fff53 [streamcz] Fix extractor (#1616)
Closes #1329, closes #1731
Authored by: arkamar, pukkandan
2022-01-11 22:26:18 +05:30
pukkandan
80fa6e5327 [facebook] Improve title and uploader extraction
Closes #1943, closes #795
2022-01-11 22:17:03 +05:30
Lesmiscore (The Hatsune Daishi)
fabb27fcea [twitcasting] Throw proper error for login-only streams (#2290)
Closes #2289

Authored by: Lesmiscore
2022-01-12 00:07:51 +09:00
pukkandan
e04938ab88 Check for existing thumbnail/subtitle in final directory
Closes #2275
2022-01-11 14:51:39 +05:30
teridon
8bcd404818 [digitalconcerthall] Add extractor (#1931)
Authored by: teridon
2022-01-11 03:06:05 +05:30
nixxo
0df11dafdd [rai] Add Raiplaysound extractors (#1955)
Closes #1951
Authored by: nixxo, pukkandan
2022-01-11 02:46:01 +05:30
pukkandan
dc5f409cdc Fix typo in ed5835b451 2022-01-11 00:31:19 +05:30
pukkandan
99d6f9461d [aparat] Fix extractor
Closes #2285
2022-01-11 00:28:02 +05:30
pukkandan
8130779db6 Allow listing formats, thumbnails, subtitles using --print (#2238)
Closes #2083
Authored by: pukkandan, Zirro
2022-01-11 00:28:01 +05:30
pukkandan
ed5835b451 Allow --print to be run at any post-processing stage 2022-01-11 00:28:01 +05:30
trasssh
e88e1febd8 [noodlemagazine] Add extractor (#2293)
Authored by: trassshhub
2022-01-10 22:43:09 +05:30
trasssh
faca674510 [Rule34video] Add extractor (#2279)
Authored by: trassshhub
2022-01-10 22:14:04 +05:30
MinePlayersPE
0931ba94ab [Nexx] Extract more metadata (#2273)
Authored by: MinePlayersPE
2022-01-10 21:02:15 +05:30
pukkandan
b31874334d [tiktok] Extract user thumbnail
Closes #2186
Authored by: pukkandan, MinePlayersPE
2022-01-10 19:24:10 +05:30
pukkandan
f1150b9e1e [twitter] Fix video in quoted tweets
Closes #2254
2022-01-10 18:50:28 +05:30
pukkandan
d6579d532b [utils] Partially revert d76d15a669
Closes #2187
2022-01-10 15:02:27 +05:30
pukkandan
2be56f2242 [funk] Support origin URLs
Closes #2270
2022-01-10 15:02:26 +05:30
pukkandan
f95a7b93e6 [test] Fix TestVerboseOutput
Closes #2269
2022-01-10 15:02:26 +05:30
foghawk
62c955efc9 [veoh] Improve extractor (#2251)
* [veoh] Remove old _extract_video
* [veoh] Extend _VALID_URL to accept '/videos/'
* [veoh] Prefer high quality
* [veoh] Extract more metadata

Authored by: foghawk
2022-01-09 23:50:26 +05:30
Zenon Mousmoulas
0254f16274 [utils] Improve get_elements_text_and_html_by_attribute regex (#2280)
Authored by: zmousm, pukkandan
2022-01-09 23:44:56 +05:30
Ashish Gupta
a70b71e85a [vk] Fix VKUserVideosIE (#2248)
Authored by: Ashish0804
Closes #2196
2022-01-09 21:01:34 +05:30
Unit 193
4c968755fc [PornHub,YouTube] Refresh onion addresses (#2272)
Authored by: unit193
2022-01-09 20:08:34 +05:30
MinePlayersPE
be1f331f21 [TikTok] Misc fixes (#2271)
Closes #2265
Authored by: MinePlayersPE
2022-01-09 13:51:56 +05:30
Ashish Gupta
3cf5429a21 Add EuropeanTourIE (#2247)
Closes #2208 
Authored by: Ashish0804
2022-01-08 14:09:46 +05:30
ischmidt20
bfa0e270cf [NBCSports] Fix extraction of platform URLs (#2244)
Authored by: ischmidt20
2022-01-08 10:24:52 +05:30
Luc Ritchie
f76ca2dd56 [afreecatv] Add support for livestreams (#2097)
Authored by: wlritchi
2022-01-07 17:42:29 +05:30
MinePlayersPE
5f969a78b0 [Nexx] Support 3q CDN (#2213)
Closes #1637
Authored by: MinePlayersPE
2022-01-07 17:41:10 +05:30
ischmidt20
443f8de820 [fox] Extract m3u8 from preview (#2235)
Authored by: ischmidt20
2022-01-07 17:39:24 +05:30
Moises Lima
768145d48a [Pornez] Add extractor (#2236)
Authored by: mozlima
2022-01-07 17:36:09 +05:30
pukkandan
976ae3eabb [youtube] Update tests 2022-01-07 17:25:58 +05:30
coletdjnz
f0d785d3ed [youtube:tab] Extract more playlist metadata (#2069)
* Add fields modified_date, modified_timestamp
* Add field playlist_count
* [youtube:tab] Extract view_count, playlist_count, modified_date

Authored by: coletdjnz, pukkandan
2022-01-07 16:33:02 +05:30
foghawk
97a6b117d9 [callin] Add extractor (#2000)
Authored by: foghawk
2022-01-07 15:49:15 +05:30
Zenon Mousmoulas
6f32a0b5b7 [utils] Improve parsing for nested HTML elements (#2129)
and add functions to return the HTML of elements

Authored by: zmousm
2022-01-06 00:07:49 +05:30
Aleri Kaisattera
e8736539f3 [Vimm] Add extractor (#2231)
Authored by: alerikaisattera
2022-01-05 15:25:23 +05:30
coletdjnz
9c634ef857 [MainStreaming] Add extractor (#2180)
Closes #1183, https://github.com/ytdl-org/youtube-dl/issues/29615

Authored by: coletdjnz
2022-01-05 14:18:17 +05:30
coletdjnz
9f517bb1f3 [gfycat] Support embeds (#2229)
Closes #2214
Authored by: coletdjnz
2022-01-05 14:09:24 +05:30
Lesmiscore (The Hatsune Daishi)
b8eeced286 [openrec] Add movie extractor (#2222)
Closes #2218
Authored by: Lesmiscore
2022-01-04 15:45:30 +05:30
Alexander Simon
db47787024 [hrfernsehen] Fix ardloader extraction (#2217)
Authored by: CreaValix
2022-01-04 13:43:20 +05:30
pukkandan
fdeab99eab [zee5] Add geo-bypass 2022-01-04 13:15:09 +05:30
pukkandan
9e907ebddf [cleanup] Misc cleanup 2022-01-04 01:25:10 +05:30
pukkandan
21df2117e4 [vk] Capture clip URLs 2022-01-04 00:45:30 +05:30
pukkandan
06e57990f7 Allow multiple and nested configuration files 2022-01-04 00:26:58 +05:30
pukkandan
b62fa6d75f Fix -s --ignore-no-formats --force-write-archive
Bug in a13e684813
2022-01-03 23:52:56 +05:30
pukkandan
be72c62480 Fix recursion error in f46e2f9d92
Closes #2216
2022-01-03 23:52:45 +05:30
pukkandan
61e9d9268c Fix bug in 8896899216
Closes #2215
2022-01-03 20:45:55 +05:30
pukkandan
a13e684813 Write download_archive only after all formats are downloaded
Closes #1470
2022-01-03 19:41:11 +05:30
pukkandan
f46e2f9d92 Add key requested_downloads in the root info_dict 2022-01-03 19:41:08 +05:30
pukkandan
9c906919ae Add field video_autonumber
Closes #662
2022-01-03 19:40:11 +05:30
pukkandan
6020e05d23 Raise error if subtitle download fails
Closes #2212
2022-01-03 19:40:06 +05:30
pukkandan
ebed8b3732 Add more post-processing stages
playlist = After entire playlist
after_video = After downloading all formats of a video
2022-01-03 19:40:05 +05:30
pukkandan
1e43a6f733 Allow --exec to be run at any post-processing stage
Deprecates `--exec-before-download`
2022-01-03 19:40:02 +05:30
pukkandan
ca30f449a1 Add --print playlist: to print fields per playlist 2022-01-03 19:39:59 +05:30
k3ns1n
af3cbd8782 [vk] Improve _VALID_URL (#2207)
Authored by: k3ns1n
2022-01-02 22:44:20 +05:30
zenerdi0de
7141ced57d [Dropbox] Support password protected files and more formats (#2201)
Authored by: zenerdi0de
2022-01-02 16:14:10 +05:30
coletdjnz
18c7683d27 [youtube:api] Update Innertube clients (#2163)
* Updated iOS clients to support 60fps formats (see: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680#issuecomment-1002724558)
* General update of versions and keys of other clients
 Authored-by: coletdjnz
2022-01-02 05:22:31 +00:00
chris
f5c2c2c9b0 [zdf] Add chapter extraction (#2198)
Authored by: iw0nderhow
2022-01-02 03:37:31 +05:30
pukkandan
8896899216 [FfmpegMetadata] Allow setting metadata of individual streams
Closes #877
2022-01-02 03:33:15 +05:30
pukkandan
1797b073ed [utils] Use key None in traverse_obj to return as-is 2022-01-02 03:07:24 +05:30
pukkandan
4c922dd3fc Fix live title for multiple formats 2022-01-02 03:03:26 +05:30
pukkandan
b8e976a445 [facebook] Parse dash manifests 2022-01-02 02:41:47 +05:30
Ashish Gupta
a9f5f5d6eb [RedBullTV] Parse subtitles from manifest (#2200)
Closes #2151
Authored by: Ashish0804
2022-01-02 02:38:36 +05:30
chris
f522573787 [extractor] Extract chapters from JSON-LD (#2031)
Authored by: iw0nderhow, pukkandan
2022-01-02 02:37:00 +05:30
nixxo
7592749cbe [extractor] Extract thumbnails from JSON-LD (#2195)
Authored by: nixxo
2022-01-02 01:20:27 +05:30
pukkandan
767f999b53 [build] Reduce dependency on third party workflows
Closes #2194
2022-01-01 14:46:10 +05:30
MinePlayersPE
8efffafa53 [XVideos] Check HLS formats (#2193)
Closes #1823
Authored by: MinePlayersPE
2022-01-01 11:42:33 +05:30
Ashish Gupta
26f2aa3db9 [hotstar] Add extractor args to ignore tags (#2116)
Authored by: Ashish0804
2022-01-01 02:32:23 +05:30
pgaig
3464a2727b [VrtNU] Handle empty title (#2147)
Closes #2146
Authored by: pgaig
2022-01-01 02:28:23 +05:30
Ashish Gupta
497d77e1aa [KelbyOne] Add extractor (#2181)
Closes #2170
Authored by: Ashish0804
2022-01-01 02:10:38 +05:30
LE
9040e2d6e3 [mixcloud] Detect restrictions (#2169)
Authored by: llacb47
2022-01-01 01:41:35 +05:30
MinePlayersPE
6134fbeb65 [TikTok] Pass cookies to formats (#2171)
Closes #2166
Authored by: MinePlayersPE
2022-01-01 01:40:46 +05:30
MinePlayersPE
cfcf60ea99 [BiliIntl] Add login (#2172)
and misc improvements

Authored by: MinePlayersPE
2022-01-01 01:39:30 +05:30
Felix S
4afa3ec4b6 [extractor] Detect more subtitle codecs in MPD manifests (#2174)
Authored by: fstirlitz
2022-01-01 01:36:45 +05:30
MinePlayersPE
11aa91a12f [TikTok] Fix extraction for sigi-based webpages (#2164)
Fixes: #2133
Authored by: MinePlayersPE
2021-12-30 09:50:17 +05:30
pukkandan
abbeeebc4c [outtmpl] Alternate form for D and fix suffix's case
Fixes: https://github.com/yt-dlp/yt-dlp/issues/2085#issuecomment-1002247689, https://github.com/yt-dlp/yt-dlp/pull/2132/files#r775729811
2021-12-30 08:44:18 +05:30
pukkandan
2c539d493a [cookies] Fix bug when keyring is unspecified
Closes #2167
2021-12-30 08:44:17 +05:30
pukkandan
042931a507 Allow escaped , in --extractor-args
Closes #2152
2021-12-30 08:44:16 +05:30
MinePlayersPE
96f13f01a6 [TikTok] Change app version (#2161)
Closes #2133, #2135
Authored by: MinePlayersPE, llacb47
2021-12-30 03:30:44 +05:30
u-spec-png
4b9353239e [Drooble] Add extractor (#1547)
Closes #1527
Authored by: u-spec-png
2021-12-29 02:12:14 +05:30
u-spec-png
dd5e60b15d [Instagram] Add story/highlight extractor (#2006)
Fixes https://github.com/ytdl-org/youtube-dl/issues/25575
Authored by: u-spec-png
2021-12-29 00:28:06 +05:30
MinePlayersPE
e540c56f39 [TikTok] Fallback to feed API endpoint (#2142)
Authored by: MinePlayersPE
Workaround for #2133
2021-12-28 08:08:23 +05:30
pukkandan
45d86abeb4 Allow unicode characters in info.json
Closes #2139
2021-12-28 04:21:13 +05:30
Pierre Mdawar
f02d24d8d2 [utils] Fix format_bytes output for Bytes (#2132)
Authored by: pukkandan, mdawar
2021-12-28 03:42:19 +05:30
pukkandan
ceb98323f2 Don't treat empty containers as None in sanitize_info 2021-12-28 02:54:11 +05:30
pukkandan
7537e35b64 [gfycat] Fix uploader 2021-12-28 02:54:11 +05:30
github-actions
1e5c83b26b [version] update
Created by: pukkandan

:ci skip all
2021-12-27 02:30:03 +00:00
pukkandan
6223f67a8c Release 2021.12.27 2021-12-27 07:36:23 +05:30
pukkandan
6a34813a0d [docs] Add examples for using TYPES: in -P/-o 2021-12-27 07:26:39 +05:30
Matt Broadway
f59f5ef8b6 [cookies] Support other keyrings (#2032)
Authored by: mbway
2021-12-27 06:58:44 +05:30
pukkandan
f44afb54ef [aria2c] Don't show progress when --no-progress 2021-12-27 04:27:34 +05:30
pukkandan
77cee0f188 [EmbedThumbnail] Prefer AtomicParsley over ffmpeg if available 2021-12-27 03:49:43 +05:30
pukkandan
6a17677577 [ThumbnailsConvertor] Fix for when there are no thumbnails
Closes #2125
2021-12-27 03:18:31 +05:30
Ashish Gupta
ee7b9bdf5d [Zee5] Fix VALID_URL for tv-shows 2021-12-26 20:01:43 +05:30
pukkandan
185bf31070 [youtube] End live-from-start properly when stream ends with 403
Closes #2089
2021-12-26 16:14:00 +05:30
pukkandan
0b77924a38 [tiktok] Fix extractor_key used in archive 2021-12-26 15:10:09 +05:30
MinePlayersPE
8126298c1b [TikTok] Add music, sticker and tag IEs (#2119)
Closes #1752
Authored by: MinePlayersPE
2021-12-26 14:23:19 +05:30
pukkandan
6da22e7d4f Avoid recursion error when re-extracting info 2021-12-26 04:20:16 +05:30
MinePlayersPE
c62ecf0d90 [BiliIntl] Fix extractor (#2077)
Closes #1744
Authored by: MinePlayersPE
2021-12-26 04:11:38 +05:30
The Hatsune Daishi
3774f4f427 [PixivSketch] Add extractors (#2104)
Authored by: nao20010128nao
2021-12-26 01:46:24 +05:30
git-anony-mouse
9980d3d213 [generic] Fix HTTP KVS Player (#2111)
Authored by: git-anony-mouse
2021-12-25 08:48:19 +05:30
pukkandan
8eb4b1bb8e [ffmpeg] Fix position of --ppa
Bug in ca5db158ae
Closes #2112
2021-12-25 08:42:08 +05:30
pukkandan
332da56f52 [CBC] Fix URL regex
Closes #2110
2021-12-25 07:53:38 +05:30
github-actions
459aea84c3 [version] update
Created by: pukkandan

:ci skip all
2021-12-25 00:34:16 +00:00
pukkandan
87e0499624 Release 2021.12.25 2021-12-25 06:01:54 +05:30
pukkandan
0f86a1cd59 [dplay] Temporary fix for discoveryplus.com/it
Closes #2073
2021-12-25 05:09:07 +05:30
pukkandan
d80d98e7d4 [docs] Minor fixes 2021-12-25 04:11:12 +05:30
pukkandan
352d5da812 [utils] Improve parse_count 2021-12-25 04:07:19 +05:30
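A hedged illustration of the kind of input parse_count handles (examples assumed, not taken from the commit):

    from yt_dlp.utils import parse_count

    parse_count('1.2M')   # -> 1200000
    parse_count('1,000')  # -> 1000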
MinePlayersPE
d43de6821c [GameJolt] Add extractors (#2036)
Authored by: MinePlayersPE
2021-12-25 03:58:57 +05:30
u-spec-png
070f6a85ea [Steam] Fix extractor (#2029)
Closes #1992
Authored by: u-spec-png
2021-12-25 03:55:44 +05:30
Benedikt Wildenhain
4b4b7f746c [OpenCast] Add extractors (#1905)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/26934
Authored by: bwildenhain, C0D3D3V
2021-12-25 03:35:23 +05:30
Sonic
e9efb99f66 [dropout] Add extractor (#2045)
Authored-by: TwoThousandHedgehogs, pukkandan
2021-12-24 17:19:33 +05:30
coletdjnz
a709d87335 [youtube:tab] Extract video thumbnails from playlist (#2096)
closes #1184
Co-Authored-by: coletdjnz, pukkandan
2021-12-24 03:42:02 +00:00
siddharth
774a46c53d [npr] Make SMIL extraction non-fatal (#2099)
Closes #1934
Authored by: r5d
2021-12-24 07:45:48 +05:30
MinePlayersPE
c8b80b9643 [RCTIPlusSeries] Lazy extraction and video type selection (#2050)
Authored by: MinePlayersPE
2021-12-24 05:05:40 +05:30
MinePlayersPE
4e260d1a56 [Instagram] Try bypassing login wall with embed page (#2095)
Authored by: MinePlayersPE
2021-12-24 03:43:10 +05:30
Luc Ritchie
4f3fa23e5a [utils] Fix parsing YYYYMMDD dates in Nov/Dec (#2094)
The date format `%Y%m%d%H%M` will successfully match against
one-digit month, day, hour, and minute strings, even though %m et al.
are documented as being zero-padded. So dates without time in
Nov/Dec may be wrongly parsed as dates in January with time.

This commit adds a format string of `%Y%m%d` to our supported date
format strings directly below (higher priority) its problematic relatives.

Closes #2076
Authored by: wlritchi
2021-12-24 02:04:01 +05:30
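The mismatch can be reproduced directly with the standard library:

    from datetime import datetime

    # '%m', '%d', '%H' and '%M' also accept one-digit values, so a bare
    # YYYYMMDD date in Nov/Dec matches '%Y%m%d%H%M' as well:
    datetime.strptime('20211224', '%Y%m%d%H%M')
    # -> datetime(2021, 1, 2, 2, 4): Jan 2 02:04 instead of Dec 24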
pukkandan
b28bac93ab Fix bug in 1cefca9e44
Fixes https://github.com/ytdl-patched/ytdl-patched/issues/11
2021-12-23 09:15:05 +05:30
pukkandan
37893bb0c9 [outtmpl] Change filename sanitization type to S
`F` is already used for float!
Bug in e0fd95737d
2021-12-23 09:15:05 +05:30
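A hedged example of the resulting template syntax (field names illustrative, not from the commit):

    # 'S' sanitizes a field for use in a filename, e.g. in an output template:
    ydl_opts = {'outtmpl': '%(title)S.%(ext)s'}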
Ashish Gupta
c25de59cf7 [LBRY] Support livestreams (#2062)
Closes #2054 
Authored by: Ashish0804, pukkandan
2021-12-23 08:48:02 +05:30
Emanuel Hoogeveen
205a0654c0 Add option --file-access-retries (#2066)
Closes #517
Authored by: ehoogeveen-medweb
2021-12-23 07:59:03 +05:30
aarubui
663949f825 [NJPWWorld] Extract formats from m3u8 (#2075)
Authored by: aarubui
2021-12-23 07:33:30 +05:30
pukkandan
b69fd25c25 [cleanup] Misc cleanup
Closes #1942 #1976 #2020 #2058 #1984
2021-12-23 07:12:46 +05:30
pukkandan
e0fd95737d [outtmpl] Add alternate forms F, D
and improve `id` detection

F = sanitize as filename (# = restricted)
D = add Decimal suffixes

Closes #2085, #2081
2021-12-23 06:49:16 +05:30
pukkandan
4ac5b94807 [dash] Fix --test
Bug in adbc4ec4bb
2021-12-23 03:34:18 +05:30
pukkandan
4273cc776d [dash] Fix aria2c dash downloads
Bug in adbc4ec4bb
2021-12-21 21:40:04 +05:30
pukkandan
fa9f30b802 Add interactive format selection with -f -
Closes #2065
2021-12-21 21:40:04 +05:30
pukkandan
1cefca9e44 Add warning when using -f best 2021-12-21 21:40:03 +05:30
kebianizao
5edb8dfec2 [rtve] Add RTVEAudioIE (#1657)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/29023
Authored by: kebianizao
2021-12-21 11:05:34 +05:30
pukkandan
0fcba15d57 [docs] Fix bug in regex escape in python 3.6
Bug in ec2e44fc57
Closes #2060
2021-12-20 18:48:43 +05:30
The Hatsune Daishi
adbc4ec4bb [dash,youtube] Download live from start to end (#888)
* Add option `--live-from-start` to enable downloading live videos from start
* Add key `is_from_start` in formats to identify formats (of live videos) that downloads from start
* [dash] Create protocol `http_dash_segments_generator` that allows a function to be passed instead of fragments
* [fragment] Allow multiple live dash formats to download simultaneously
* [youtube] Implement fragment re-fetching for the live dash formats
* [youtube] Re-extract dash manifest every 5 hours (manifest expires in 6hrs)
* [postprocessor/ffmpeg] Add `FFmpegFixupDuplicateMoovPP` to fixup duplicated moov atoms

Known issue: Ctrl+C doesn't work on Windows when downloading multiple formats

Closes #1521
Authored by: nao20010128nao, pukkandan
2021-12-20 11:36:46 +05:30
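A minimal usage sketch through the Python API, assuming the new flag maps to a live_from_start parameter as the CLI option suggests (the URL is a placeholder):

    import yt_dlp

    # download a live stream from its beginning rather than the live edge
    with yt_dlp.YoutubeDL({'live_from_start': True}) as ydl:
        ydl.download(['<live stream URL>'])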
Julien Hadley Jack
c031b0414c [ondemandkorea] Update jw_config regex (#2056)
Authored by: julien-hadleyjack
2021-12-20 10:32:48 +05:30
coletdjnz
f3aa3c3f98 [youtube:tab] Extract more metadata from feeds/channels/playlists (#1018)
Parse relative time text, extract live, upcoming status, availability and channel id from feeds/channels/playlists (where applicable). 
Closes #1883
Authored-by: coletdjnz
2021-12-20 04:47:53 +00:00
cypheron
ae43a4b986 [hse] Add extractors (#1906)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/27060
Authored by: cypheron, pukkandan
2021-12-20 09:44:01 +05:30
pukkandan
ca5db158ae [postprocessor/ffmpeg] Always add faststart
Closes #1491
2021-12-20 08:52:34 +05:30
pukkandan
5f549d4959 [Facebook] Handle redirect URLs
Closes #1035
2021-12-20 08:52:34 +05:30
Paul Wise
6839d02cb6 [ABC:iview] Add show extractor (#1630)
Authored by: pabs3
2021-12-20 08:18:41 +05:30
Abdullah Ibn Fulan
2aae2c91ff [audiomack] Update album and song VALID_URL (#1203)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/29810
Closes #1352, https://github.com/ytdl-org/youtube-dl/issues/29800
Authored by: abdullah-if, dirkf
2021-12-20 07:23:42 +05:30
Lapinot
c2dedf12e8 [soundcloud] Add related tracks extractor (#1000)
Authored by: Lapin0t
2021-12-20 06:44:19 +05:30
Unit 193
e75bb0d6c3 [cleanup] Fix some typos (#2033)
Authored by: unit193
2021-12-19 20:48:06 +05:30
pukkandan
dd0228ce1f Remove known invalid thumbnails from info_dict
Related: https://github.com/yt-dlp/yt-dlp/issues/980#issuecomment-997396821
2021-12-19 20:25:01 +05:30
pukkandan
37e57a9fd4 [youtube:tab] Ignore query when redirecting channel to playlist
and cleanup of related code
Closes #2046
2021-12-19 09:05:59 +05:30
pukkandan
940a67a3e2 [docs] Change all examples to use double quotes
to be platform-agnostic
2021-12-19 09:05:58 +05:30
pukkandan
e6ae51c123 [generic] Extract m3u8 formats from JSON-LD 2021-12-19 09:05:50 +05:30
pukkandan
75ad33572b [test/download] Split sanitize_got_info_dict into a separate function
so that it can be used by third party scripts
2021-12-19 09:05:40 +05:30
pukkandan
aab41cdd33 [PlutoTV] Expand _VALID_URL
Closes #2007
2021-12-19 02:06:05 +05:30
pukkandan
b3a5115ff1 [zee5] Support /episodes in URL
Closes #2016
2021-12-19 02:05:55 +05:30
Felix S
d76d15a669 [utils] Update std_headers (#2023)
* Update our chrome versions used for `User-Agent`s
* Drop the `Accept-Charset` header that no browser emits any more

Authored by: kikuyan, fstirlitz
2021-12-18 05:34:24 +05:30
PilzAdam
e978789f0f [outtmpl] Add operator & for replacement text (#2012)
Authored by: PilzAdam
2021-12-18 02:05:48 +05:30
chris
ec2e44fc57 [docs] Improve manpage format (#2003)
Closes #1448
Authored by: iw0nderhow, pukkandan
2021-12-17 06:53:04 +05:30
Sematre
375d9360bf [gronkh] Support new URL pattern (#2019)
Authored by: Sematre
2021-12-17 02:30:03 +05:30
Zenon Mousmoulas
d5c3254889 [extractor] Support default implicit graph in JSON-LD (#1983)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/30229

Per W3C JSON-LD v1.1 §4.9 (non-normative ref):

    When a JSON-LD document's top-level structure is a map that contains
    no other keys than @graph and optionally @context (properties that
    are not mapped to an IRI or a keyword are ignored), @graph is
    considered to express the otherwise implicit default graph.

Authored by: zmousm
2021-12-17 02:16:30 +05:30
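Concretely, per the quoted wording, a JSON-LD document shaped like the following hand-written example now has its @graph contents treated as the top-level data:

    doc = {
        '@context': 'https://schema.org',
        '@graph': [{'@type': 'VideoObject', 'name': 'Example video'}],
    }
    # the VideoObject inside '@graph' is now found even though the
    # top-level map contains no other keys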
std-move
fed1309651 [test/download] Ignore field webpage_url_domain (#2014)
Authored by: std-move
2021-12-16 15:40:16 +05:30
std-move
fe69f52e5c [NovaEmbed] update player regex (#2008)
Authored by: std-move
2021-12-16 01:25:11 +05:30
pukkandan
3116be32b4 [brightcove] Fix 487c5b3389 2021-12-15 22:20:30 +05:30
pukkandan
a8549f19e7 [tiktok] Fix 53dad39e30 2021-12-15 22:18:01 +05:30
pukkandan
39ca3b5c7f [extractor] Standardize _live_title 2021-12-15 22:09:07 +05:30
coletdjnz
46383212b3 [youtube:comments] Add more options for limiting number of comments extracted (#1626)
Extends `max_comments` extractor arg to support `max-parents,max-replies,max-replies-per-thread`.
Authored-by: coletdjnz
2021-12-15 04:29:48 +00:00
pukkandan
0bb322b9c0 Add field webpage_url_domain
Closes #1311
2021-12-15 04:51:52 +05:30
pukkandan
ff9f925b63 [test/download] Add more fields 2021-12-15 03:38:12 +05:30
pukkandan
5bfc8bee5a Fix PostProcessor hooks not registered for some PPs
Closes #1993
2021-12-15 02:14:14 +05:30
pukkandan
19188702ef [FormatSort] Prevent incorrect deprecation warning
Closes #1981
2021-12-15 01:33:15 +05:30
The Hatsune Daishi
d984a98def [ok.ru] add mobile fallback (#1972)
Authored by: nao20010128nao
2021-12-14 23:39:57 +05:30
u-spec-png
069c6ccf02 [olympics] Add uploader and cleanup (#1990)
Authored by: u-spec-png
2021-12-14 22:45:50 +05:30
MinePlayersPE
53dad39e30 [TikTok] Pass cookies to mobile API (#1994)
Authored by: MinePlayersPE
2021-12-14 22:40:13 +05:30
Ashish Gupta
db77c49c84 [SonyLiv] Add OTP login support (#1959)
Closes #1945
Authored by: Ashish0804
2021-12-14 22:39:11 +05:30
Ashish Gupta
abc07b554c [NateTV] Add NateIE and NateProgramIE (#1950)
Authored by: Ashish0804, Hyeeji
2021-12-14 22:29:17 +05:30
Ashish Gupta
86f3d52f8c [DiscoveryPlusShowBaseIE] yield actual video id 2021-12-13 18:48:31 +05:30
u-spec-png
8b688881ba [instagram] Expand valid URL (#1977)
Closes #1925

Authored by: u-spec-png
2021-12-12 23:31:00 +05:30
Ashish Gupta
13debc86e7 [Rutube] Add RutubeChannelIE (#1970)
Closes #1966 
Authored by: Ashish0804
2021-12-12 21:26:36 +05:30
nyuszika7h
b5f94e4fa1 [toggo] Add extractor (#1961)
Authored by: nyuszika7h
2021-12-11 20:23:42 +05:30
YuenSzeHong
61882afdc5 [fujitv] Extract 1080p from tv_android m3u8 (#1928)
Authored by: YuenSzeHong
2021-12-11 19:14:08 +05:30
coletdjnz
aa4b054512 [web.archive:youtube] Improve metadata extraction (#1785)
Authored-by: coletdjnz
2021-12-09 23:43:15 +00:00
YuenSzeHong
487c5b3389 [TVer] Extract better thumbnails (#1929)
Authored by: YuenSzeHong
2021-12-09 18:49:00 +05:30
Nil Admirari
8157a09d22 [SponsorBlock] Add Filler and Highlight categories (#1664)
Authored by: nihil-admirari, pukkandan
2021-12-09 18:10:31 +05:30
Jertzukka
b1aaf1c07f [gofile] Add extractor (#1850)
Closes #1831 
Authored by: Jertzukka, Ashish0804
2021-12-09 17:55:30 +05:30
chris
5f9aaac8c2 [zdf] Support videos with different ptmd location (#1893)
Authored by: iw0nderhow
2021-12-09 17:24:31 +05:30
David Skrundz
54c2521ca6 [CBC Gem] Extract 1080p formats (#1913)
Authored by: DavidSkrundz
2021-12-09 17:17:56 +05:30
The Hatsune Daishi
2814f12ba4 [skeb] Add extractor (#1916)
Fixes: https://github.com/ytdl-org/youtube-dl/issues/30287
Authored by: nao20010128nao
2021-12-09 17:10:52 +05:30
raleeper
1619836cb7 [crackle] Look for non-DRM formats (#1938)
Authored by: raleeper
2021-12-09 17:09:51 +05:30
pukkandan
e3c7d49571 [compat] Suppress errors in enabling VT mode
Closes #1932
2021-12-08 19:58:50 +05:30
The Hatsune Daishi
ddd24c9949 [ntvcojp] Extract NUXT data (#1915)
Fixes: https://github.com/ytdl-org/youtube-dl/issues/30309
Authored by: nao20010128nao
2021-12-07 22:33:48 +05:30
Michal Kubeček
443b21dc4e [ceskatelevize] Fetch iframe from nextJS data (#1904)
Closes #1899
Authored by: mkubecek
2021-12-07 22:14:43 +05:30
The Hatsune Daishi
66f4c04e50 [extractor] Add _search_nuxt_data (#1921)
Authored by: nao20010128nao
2021-12-07 22:08:50 +05:30
nixxo
93864403ea [redtube] Handle formats delivered inside a JSON (#1877)
Closes #1663
Authored by: dirkf, nixxo
2021-12-07 19:29:54 +05:30
pukkandan
b5475f1145 Pre-process when using --flat-playlist 2021-12-07 02:07:48 +05:30
pukkandan
38d79fd16c Use parse_duration for --wait-for-video
and some minor fix
2021-12-06 23:30:33 +05:30
pukkandan
acc0d6a411 Allow --no-write-thumbnail to override --write-all-thumbnail
Closes #1900
2021-12-06 23:27:35 +05:30
pukkandan
146cc4114a Bugfix for 63ccf4ff1a 2021-12-06 23:24:42 +05:30
pukkandan
818faa3a86 [vimeo] Extract chapters
Closes #1892
2021-12-05 20:00:59 +05:30
MinePlayersPE
aa5ecf082c [TrueID] Add extractor (#1847)
Authored by: MinePlayersPE
2021-12-05 00:53:05 +05:30
pukkandan
d2b2fca53f [extractor] Ignore errors in comment extraction when -i is given
Closes #1787
2021-12-03 03:46:04 +05:30
pukkandan
63ccf4ff1a [lazy_extractors] Fix bug in 2c4aaaddc9
SearchIEs must not inherit from extractors that have a _VALID_URL defined
2021-12-03 03:22:26 +05:30
pukkandan
43b2290658 Fix --throttled-rate 2021-12-03 02:52:53 +05:30
nixxo
99148c6a33 [RaiNews] Fix extractor (#1864)
Closes #1862
Authored by: nixxo
2021-12-03 01:09:08 +05:30
pukkandan
9bdd99cf39 [EmbedSubtitle] Disable duration check temporarily
Closes #1870, #1385
2021-12-02 19:54:01 +05:30
pukkandan
2c4aaaddc9 [lazy_extractors] Fix for search IEs
Closes #1851
2021-12-01 23:23:59 +05:30
pukkandan
5f7cb91ae9 [youtube] Fix ytsearchdate
Related: #1851
2021-12-01 23:23:45 +05:30
pukkandan
3efb96a6d1 Fix control characters being printed to --console-title
Closes #1859
2021-12-01 22:39:57 +05:30
pukkandan
3262f8abf2 [trovo] Fix inheritance of TrovoChannelBaseIE
Closes #1849
2021-12-01 21:44:33 +05:30
Christian Paul
bdbafb3913 [Jamendo] Fix use of _VALID_URL_RE (#1858)
Closes #1857
Authored by: jaller94
2021-12-01 21:40:10 +05:30
pukkandan
a804f6d89c [cleanup] Fix some typos
* `MetadataFromFieldPP` is not deprecated!
* Wrong args to `MetadataFromFieldPP`
* Some mistakes in change log
* Typo in build.yml causing release tag to be placed on wrong commit
2021-12-01 10:15:01 +05:30
github-actions
814dfb7e25 [version] update
Created by: pukkandan

:ci skip all
2021-12-01 00:23:24 +00:00
275 changed files with 14902 additions and 5840 deletions

View File

@@ -1,6 +1,6 @@
-name: Broken site support
+name: Broken site
description: Report broken or misfunctioning site
-labels: [triage, extractor-bug]
+labels: [triage, site-bug]
body:
- type: checkboxes
id: checklist
@@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a broken site
required: true
-- label: I've verified that I'm running yt-dlp version **2021.11.10.1**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
+- label: I've verified that I'm running yt-dlp version **2022.02.04**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've checked that all provided URLs are alive and playable in a browser
required: true
@@ -44,19 +44,19 @@ body:
label: Verbose log
description: |
Provide the complete verbose output of yt-dlp **that clearly demonstrates the problem**.
-Add the `-Uv` flag to your command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
-[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
-[debug] yt-dlp version 2021.11.10.1 (exe)
+[debug] yt-dlp version 2022.02.04 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
-yt-dlp is up to date (2021.11.10.1)
+yt-dlp is up to date (2022.02.04)
<more lines>
render: shell
validations:

View File

@@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a new site support request
required: true
-- label: I've verified that I'm running yt-dlp version **2021.11.10.1**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
+- label: I've verified that I'm running yt-dlp version **2022.02.04**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've checked that all provided URLs are alive and playable in a browser
required: true
@@ -34,7 +34,7 @@ body:
label: Example URLs
description: |
Provide all kinds of example URLs for which support should be added
-value: |
+placeholder: |
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
@@ -55,19 +55,19 @@ body:
label: Verbose log
description: |
Provide the complete verbose output **using one of the example URLs provided above**.
-Add the `-Uv` flag to your command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
-[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
-[debug] yt-dlp version 2021.11.10.1 (exe)
+[debug] yt-dlp version 2022.02.04 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
-yt-dlp is up to date (2021.11.10.1)
+yt-dlp is up to date (2022.02.04)
<more lines>
render: shell
validations:

View File

@@ -1,5 +1,5 @@
name: Site feature request
-description: Request a new functionality for a site
+description: Request a new functionality for a supported site
labels: [triage, site-enhancement]
body:
- type: checkboxes
@@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a site feature request
required: true
-- label: I've verified that I'm running yt-dlp version **2021.11.10.1**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
+- label: I've verified that I'm running yt-dlp version **2022.02.04**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've checked that all provided URLs are alive and playable in a browser
required: true
@@ -32,7 +32,7 @@ body:
label: Example URLs
description: |
Example URLs that can be used to demonstrate the requested feature
-value: |
+placeholder: |
https://www.youtube.com/watch?v=BaW_jenozKc
validations:
required: true
@@ -47,3 +47,26 @@ body:
placeholder: WRITE DESCRIPTION HERE
validations:
required: true
+- type: textarea
+id: log
+attributes:
+label: Verbose log
+description: |
+Provide the complete verbose output of yt-dlp that demonstrates the need for the enhancement.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
+It should look similar to this:
+placeholder: |
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Portable config file: yt-dlp.conf
+[debug] Portable config: ['-i']
+[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
+[debug] yt-dlp version 2022.02.04 (exe)
+[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
+[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
+[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
+[debug] Proxy map: {}
+yt-dlp is up to date (2022.02.04)
+<more lines>
+render: shell
+validations:
+required: true

View File

@@ -1,6 +1,6 @@
name: Bug report
description: Report a bug unrelated to any particular site or extractor
-labels: [triage,bug]
+labels: [triage, bug]
body:
- type: checkboxes
id: checklist
@@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a bug unrelated to a specific site
required: true
-- label: I've verified that I'm running yt-dlp version **2021.11.10.1**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
+- label: I've verified that I'm running yt-dlp version **2022.02.04**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've checked that all provided URLs are alive and playable in a browser
required: true
@@ -38,19 +38,19 @@ body:
label: Verbose log
description: |
Provide the complete verbose output of yt-dlp **that clearly demonstrates the problem**.
-Add the `-Uv` flag to **your** command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to **your** command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
-[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
-[debug] yt-dlp version 2021.11.10.1 (exe)
+[debug] yt-dlp version 2022.02.04 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
-yt-dlp is up to date (2021.11.10.1)
+yt-dlp is up to date (2022.02.04)
<more lines>
render: shell
validations:

View File

@@ -1,4 +1,4 @@
-name: Feature request request
+name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
@@ -11,7 +11,9 @@ body:
options:
- label: I'm reporting a feature request
required: true
-- label: I've verified that I'm running yt-dlp version **2021.11.10.1**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
+- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
+required: true
+- label: I've verified that I'm running yt-dlp version **2022.02.04**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues including closed ones. DO NOT post duplicates
required: true

View File

@@ -9,7 +9,7 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
-- label: I'm asking a question and not reporting a bug/feature request
+- label: I'm asking a question and **not** reporting a bug/feature request
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
@@ -24,7 +24,30 @@ body:
description: |
Ask your question in an arbitrary form.
Please make sure it's worded well enough to be understood, see [is-the-description-of-the-issue-itself-sufficient](https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient).
-Provide any additional information and as much context and examples as possible
+Provide any additional information and as much context and examples as possible.
+If your question contains "isn't working" or "can you add", this is most likely the wrong template.
+If you are in doubt if this is the right template, use another template!
placeholder: WRITE QUESTION HERE
validations:
required: true
+- type: textarea
+id: log
+attributes:
+label: Verbose log
+description: |
+If your question involes a yt-dlp command, provide the complete verbose output of that command.
+Add the `-vU` flag to **your** command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
+It should look similar to this:
+placeholder: |
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Portable config file: yt-dlp.conf
+[debug] Portable config: ['-i']
+[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
+[debug] yt-dlp version 2021.12.01 (exe)
+[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
+[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
+[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
+[debug] Proxy map: {}
+yt-dlp is up to date (2021.12.01)
+<more lines>
+render: shell

View File

@@ -2,4 +2,7 @@ blank_issues_enabled: false
contact_links:
- name: Get help from the community on Discord
url: https://discord.gg/H5MNcFW63r
-about: Join the yt-dlp Discord for community-powered support!
+about: Join the yt-dlp Discord for community-powered support!
+- name: Matrix Bridge to the Discord server
+url: https://matrix.to/#/#yt-dlp:matrix.org
+about: For those who do not want to use Discord

View File

@@ -1,6 +1,6 @@
-name: Broken site support
+name: Broken site
description: Report broken or misfunctioning site
-labels: [triage, extractor-bug]
+labels: [triage, site-bug]
body:
- type: checkboxes
id: checklist
@@ -44,10 +44,10 @@ body:
label: Verbose log
description: |
Provide the complete verbose output of yt-dlp **that clearly demonstrates the problem**.
-Add the `-Uv` flag to your command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
-[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252

View File

@@ -34,7 +34,7 @@ body:
label: Example URLs
description: |
Provide all kinds of example URLs for which support should be added
-value: |
+placeholder: |
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
@@ -55,10 +55,10 @@ body:
label: Verbose log
description: |
Provide the complete verbose output **using one of the example URLs provided above**.
-Add the `-Uv` flag to your command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
-[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252

View File

@@ -1,5 +1,5 @@
name: Site feature request
-description: Request a new functionality for a site
+description: Request a new functionality for a supported site
labels: [triage, site-enhancement]
body:
- type: checkboxes
@@ -32,7 +32,7 @@ body:
label: Example URLs
description: |
Example URLs that can be used to demonstrate the requested feature
-value: |
+placeholder: |
https://www.youtube.com/watch?v=BaW_jenozKc
validations:
required: true
@@ -47,3 +47,26 @@ body:
placeholder: WRITE DESCRIPTION HERE
validations:
required: true
+- type: textarea
+id: log
+attributes:
+label: Verbose log
+description: |
+Provide the complete verbose output of yt-dlp that demonstrates the need for the enhancement.
+Add the `-vU` flag to your command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
+It should look similar to this:
+placeholder: |
+[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
+[debug] Portable config file: yt-dlp.conf
+[debug] Portable config: ['-i']
+[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
+[debug] yt-dlp version %(version)s (exe)
+[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
+[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
+[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
+[debug] Proxy map: {}
+yt-dlp is up to date (%(version)s)
+<more lines>
+render: shell
+validations:
+required: true

View File

@@ -1,6 +1,6 @@
name: Bug report
description: Report a bug unrelated to any particular site or extractor
-labels: [triage,bug]
+labels: [triage, bug]
body:
- type: checkboxes
id: checklist
@@ -38,10 +38,10 @@ body:
label: Verbose log
description: |
Provide the complete verbose output of yt-dlp **that clearly demonstrates the problem**.
-Add the `-Uv` flag to **your** command line you run yt-dlp with (`yt-dlp -Uv <your command line>`), copy the WHOLE output and insert it below.
+Add the `-vU` flag to **your** command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
[debug] Command-line config: ['-Uv', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252


@@ -1,4 +1,4 @@
name: Feature request request
name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
@@ -11,6 +11,8 @@ body:
options:
- label: I'm reporting a feature request
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've verified that I'm running yt-dlp version **%(version)s**. ([update instructions](https://github.com/yt-dlp/yt-dlp#update))
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues including closed ones. DO NOT post duplicates


@@ -9,7 +9,7 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm asking a question and not reporting a bug/feature request
- label: I'm asking a question and **not** reporting a bug/feature request
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
@@ -24,7 +24,30 @@ body:
description: |
Ask your question in an arbitrary form.
Please make sure it's worded well enough to be understood, see [is-the-description-of-the-issue-itself-sufficient](https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient).
Provide any additional information and as much context and examples as possible
Provide any additional information and as much context and examples as possible.
If your question contains "isn't working" or "can you add", this is most likely the wrong template.
If you are in doubt if this is the right template, use another template!
placeholder: WRITE QUESTION HERE
validations:
required: true
- type: textarea
id: log
attributes:
label: Verbose log
description: |
If your question involves a yt-dlp command, provide the complete verbose output of that command.
Add the `-vU` flag to **your** command line you run yt-dlp with (`yt-dlp -vU <your command line>`), copy the WHOLE output and insert it below.
It should look similar to this:
placeholder: |
[debug] Command-line config: ['-vU', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
[debug] yt-dlp version 2021.12.01 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
yt-dlp is up to date (2021.12.01)
<more lines>
render: shell


@@ -96,7 +96,7 @@ jobs:
env:
BREW_TOKEN: ${{ secrets.BREW_TOKEN }}
if: "env.BREW_TOKEN != ''"
uses: webfactory/ssh-agent@v0.5.3
uses: yt-dlp/ssh-agent@v0.5.3
with:
ssh-private-key: ${{ env.BREW_TOKEN }}
- name: Update Homebrew Formulae
@@ -119,7 +119,7 @@ jobs:
with:
tag_name: ${{ steps.bump_version.outputs.ytdlp_version }}
release_name: yt-dlp ${{ steps.bump_version.outputs.ytdlp_version }}
commitish: ${{ steps.push_update.outputs.head_sha }}
commitish: ${{ steps.push_release.outputs.head_sha }}
body: |
#### [A description of the various files](https://github.com/yt-dlp/yt-dlp#release-files) is in the README
@@ -165,7 +165,7 @@ jobs:
- name: Install Requirements
run: |
brew install coreutils
/usr/bin/python3 -m pip install -U --user pip Pyinstaller==4.5.1 mutagen pycryptodomex websockets
/usr/bin/python3 -m pip install -U --user pip Pyinstaller==4.5.1 -r requirements.txt
- name: Bump version
id: bump_version
run: /usr/bin/python3 devscripts/update-version.py
@@ -192,11 +192,9 @@ jobs:
run: echo "::set-output name=sha512_macos::$(sha512sum dist/yt-dlp_macos | awk '{print $1}')"
- name: Run PyInstaller Script with --onedir
run: /usr/bin/python3 pyinst.py --target-architecture universal2 --onedir
- uses: papeloto/action-zip@v1
with:
files: ./dist/yt-dlp_macos
dest: ./dist/yt-dlp_macos.zip
run: |
/usr/bin/python3 pyinst.py --target-architecture universal2 --onedir
zip ./dist/yt-dlp_macos.zip ./dist/yt-dlp_macos
- name: Upload yt-dlp MacOS onedir
id: upload-release-macos-zip
uses: actions/upload-release-asset@v1
@@ -210,7 +208,7 @@ jobs:
- name: Get SHA2-256SUMS for yt-dlp_macos.zip
id: sha256_macos_zip
run: echo "::set-output name=sha256_macos_zip::$(sha256sum dist/yt-dlp_macos.zip | awk '{print $1}')"
- name: Get SHA2-512SUMS for yt-dlp_macos
- name: Get SHA2-512SUMS for yt-dlp_macos.zip
id: sha512_macos_zip
run: echo "::set-output name=sha512_macos_zip::$(sha512sum dist/yt-dlp_macos.zip | awk '{print $1}')"
@@ -236,7 +234,7 @@ jobs:
# Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
run: |
python -m pip install --upgrade pip setuptools wheel py2exe
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-4.5.1-py3-none-any.whl" mutagen pycryptodomex websockets
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-4.5.1-py3-none-any.whl" -r requirements.txt
- name: Bump version
id: bump_version
env:
@@ -265,11 +263,9 @@ jobs:
run: echo "::set-output name=sha512_win::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA512).Hash.ToLower())"
- name: Run PyInstaller Script with --onedir
run: python pyinst.py --onedir
- uses: papeloto/action-zip@v1
with:
files: ./dist/yt-dlp
dest: ./dist/yt-dlp_win.zip
run: |
python pyinst.py --onedir
Compress-Archive -LiteralPath ./dist/yt-dlp -DestinationPath ./dist/yt-dlp_win.zip
- name: Upload yt-dlp Windows onedir
id: upload-release-windows-zip
uses: actions/upload-release-asset@v1
@@ -325,7 +321,7 @@ jobs:
- name: Install Requirements
run: |
python -m pip install --upgrade pip setuptools wheel
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/i686/pyinstaller-4.5.1-py3-none-any.whl" mutagen pycryptodomex websockets
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/i686/pyinstaller-4.5.1-py3-none-any.whl" -r requirements.txt
- name: Bump version
id: bump_version
env:

.gitignore

@@ -1,27 +1,35 @@
# Config
*.conf
*.spec
cookies
*cookies.txt
.netrc
# Downloaded
*.3gp
*.annotations.xml
*.ape
*.aria2
*.avi
*.description
*.desktop
*.dump
*.flac
*.flv
*.frag
*.frag.aria2
*.frag.urls
*.info.json
*.live_chat.json
*.meta
*.part*
*.tmp
*.temp
*.unknown_video
*.ytdl
.cache/
*.3gp
*.ape
*.avi
*.desktop
*.flac
*.flv
*.jpeg
*.jpg
*.live_chat.json
*.m4a
*.m4v
*.mhtml
@@ -31,23 +39,18 @@ cookies
*.mp4
*.ogg
*.opus
*.part
*.part-*
*.png
*.sbv
*.srt
*.swf
*.swp
*.ttml
*.unknown_video
*.url
*.vtt
*.wav
*.webloc
*.webm
*.webp
*.ytdl
.cache/
# Allow config/media files in testdata
!test/**
@@ -86,11 +89,10 @@ README.txt
*.1
*.bash-completion
*.fish
*.exe
*.tar.gz
*.zsh
*.spec
test/testdata/player-*.js
test/testdata/sigs/player-*.js
# Binary
/youtube-dl


@@ -10,6 +10,7 @@
- [Does the issue involve one problem, and one problem only?](#does-the-issue-involve-one-problem-and-one-problem-only)
- [Is anyone going to need the feature?](#is-anyone-going-to-need-the-feature)
- [Is your question about yt-dlp?](#is-your-question-about-yt-dlp)
- [Are you willing to share account details if needed?](#are-you-willing-to-share-account-details-if-needed)
- [DEVELOPER INSTRUCTIONS](#developer-instructions)
- [Adding new feature or making overarching changes](#adding-new-feature-or-making-overarching-changes)
- [Adding support for a new site](#adding-support-for-a-new-site)
@@ -18,6 +19,7 @@
- [Provide fallbacks](#provide-fallbacks)
- [Regular expressions](#regular-expressions)
- [Long lines policy](#long-lines-policy)
- [Quotes](#quotes)
- [Inline values](#inline-values)
- [Collapse fallbacks](#collapse-fallbacks)
- [Trailing parentheses](#trailing-parentheses)
@@ -30,9 +32,9 @@
Bugs and suggestions should be reported at: [yt-dlp/yt-dlp/issues](https://github.com/yt-dlp/yt-dlp/issues). Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in our [discord server](https://discord.gg/H5MNcFW63r).
**Please include the full output of yt-dlp when run with `-Uv`**, i.e. **add** `-Uv` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
**Please include the full output of yt-dlp when run with `-vU`**, i.e. **add** `-vU` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ yt-dlp -Uv <your command line>
$ yt-dlp -vU <your command line>
[debug] Command-line config: ['-v', 'demo.com']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] yt-dlp version 2021.09.25 (zip)
@@ -63,7 +65,7 @@ So please elaborate on what feature you are requesting, or what bug you want to
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. We often get frustrated by these issues, since the only possible way for us to move forward on them is to ask for clarification over and over.
For bug reports, this means that your report should contain the **complete** output of yt-dlp when called with the `-Uv` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
For bug reports, this means that your report should contain the **complete** output of yt-dlp when called with the `-vU` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--write-pages` and upload the `.dump` files you get [somewhere](https://gist.github.com).
@@ -111,7 +113,7 @@ If the issue is with `youtube-dl` (the upstream fork of yt-dlp) and not with yt-
### Are you willing to share account details if needed?
The maintainers and potential contributors of the project often do not have an account for the website you are asking support for. So any developer interested in solving your issue may ask you for account details. It is your personal discression whether you are willing to share the account in order for the developer to try and solve your issue. However, if you are unwilling or unable to provide details, they obviously cannot work on the issue and it cannot be solved unless some developer who both has an account and is willing/able to contribute decides to solve it.
The maintainers and potential contributors of the project often do not have an account for the website you are asking support for. So any developer interested in solving your issue may ask you for account details. It is your personal discretion whether you are willing to share the account in order for the developer to try and solve your issue. However, if you are unwilling or unable to provide details, they obviously cannot work on the issue and it cannot be solved unless some developer who both has an account and is willing/able to contribute decides to solve it.
By sharing an account with anyone, you agree to bear all risks associated with it. The maintainers and yt-dlp can't be held responsible for any misuse of the credentials.
@@ -215,7 +217,7 @@ After you have ensured this site is distributing its content legally, you can fo
$ flake8 yt_dlp/extractor/yourextractor.py
1. Make sure your code works under all [Python](https://www.python.org/) versions supported by yt-dlp, namely CPython and PyPy for Python 3.6 and above. Backward compatability is not required for even older versions of Python.
1. Make sure your code works under all [Python](https://www.python.org/) versions supported by yt-dlp, namely CPython and PyPy for Python 3.6 and above. Backward compatibility is not required for even older versions of Python.
1. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files, [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add yt_dlp/extractor/extractors.py
@@ -227,6 +229,13 @@ After you have ensured this site is distributing its content legally, you can fo
In any case, thank you very much for your contributions!
**Tip:** To test extractors that require login information, create a file `test/local_parameters.json` and add `"usenetrc": true` or your username and password in it:
```json
{
"username": "your user name",
"password": "your password"
}
```
## yt-dlp coding conventions
@@ -243,7 +252,11 @@ For extraction to work yt-dlp relies on metadata your extractor extracts and pro
- `title` (media title)
- `url` (media download URL) or `formats`
The aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken. While, in fact, only `id` is technically mandatory, due to compatability reasons, yt-dlp also treats `title` as mandatory. The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract usefull information with `--ignore-no-formats-error` - Eg: when the video is a live stream that has not started yet.
The aforementioned metafields are the critical data that the extraction does not make any sense without, and if any of them fail to be extracted, the extractor is considered completely broken. While all extractors must return a `title`, they must also allow its extraction to be non-fatal.
For pornographic sites, appropriate `age_limit` must also be returned.
The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract useful information with `--ignore-no-formats-error` - e.g. when the video is a live stream that has not started yet.
[Any field](yt_dlp/extractor/common.py#219-L426) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
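As an illustration, here is a minimal sketch of what `_real_extract` could return; the regexes and field values are hypothetical, not a prescribed implementation:
```python
def _real_extract(self, url):
    video_id = self._match_id(url)
    webpage = self._download_webpage(url, video_id)
    return {
        'id': video_id,  # mandatory
        # `title` is also treated as mandatory, but its extraction must be non-fatal
        'title': self._html_search_regex(r'<h1[^>]*>([^<]+)</h1>', webpage, 'title', fatal=False),
        # `url` or `formats` is required in the general case
        'url': self._og_search_video_url(webpage),
        # everything below is optional and must be extracted tolerantly
        'description': self._og_search_description(webpage, default=None),
        'age_limit': 18,  # must be set appropriately for pornographic sites
    }
```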
@@ -444,10 +457,14 @@ Here the presence or absence of other attributes including `style` is irrelevant
### Long lines policy
There is a soft limit to keep lines of code under 100 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse. Sometimes, it may be reasonable to go up to 120 characters and sometimes even 80 can be unreadable. Keep in mind that this is not a hard limit and is just one of many tools to make the code more readable
There is a soft limit to keep lines of code under 100 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse. Sometimes, it may be reasonable to go up to 120 characters and sometimes even 80 can be unreadable. Keep in mind that this is not a hard limit and is just one of many tools to make the code more readable.
For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
Conversely, don't unnecessarily split small lines further. As a rule of thumb, if removing the line split keeps the code under 80 characters, it should be a single line.
##### Examples
Correct:
```python
@@ -461,6 +478,47 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
Correct:
```python
uploader = traverse_obj(info, ('uploader', 'name'), ('author', 'fullname'))
```
Incorrect:
```python
uploader = traverse_obj(
info,
('uploader', 'name'),
('author', 'fullname'))
```
Correct:
```python
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Downloading HD m3u8 information', errnote='Unable to download HD m3u8 information')
```
Incorrect:
```python
formats = self._extract_m3u8_formats(m3u8_url,
video_id,
'mp4',
'm3u8_native',
m3u8_id='hls',
note='Downloading HD m3u8 information',
errnote='Unable to download HD m3u8 information')
```
### Quotes
Always use single quotes for strings (even if the string has `'`) and double quotes for docstrings. Use `'''` only for multi-line strings. An exception can be made if a string has multiple single quotes in it and escaping makes it significantly harder to read. For f-strings, you can use double quotes on the inside, but avoid f-strings that have too many quotes inside.
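A short illustration of these conventions (the function and fields are hypothetical):
```python
def format_uploader(info):
    """Return a printable uploader string."""  # double quotes for the docstring
    suffix = ' (verified)' if info.get('verified') else ''  # single quotes for ordinary strings
    return f'{info.get("uploader") or "unknown"}{suffix}'  # double quotes only inside the f-string
```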
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
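For example (a sketch; the regex and variable names are hypothetical):
Correct:
```python
title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<h1>([^<]+)</h1>'
# ... many lines separating the pattern from its single use ...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```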
@@ -510,27 +568,68 @@ Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`,
### Trailing parentheses
Always move trailing parentheses after the last argument.
Always move trailing parentheses used for grouping/functions after the last argument. On the other hand, a literal list/tuple/dict/set should be closed in a new line. Generators and list/dict comprehensions may use either style.
Note that this *does not* apply to braces `}` or square brackets `]`, both of which should be closed in a new line.
#### Example
#### Examples
Correct:
```python
url = try_get(
info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Correct:
```python
url = try_get(info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
url = try_get(
info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
Correct:
```python
f = {
'url': url,
'format_id': format_id,
}
```
Incorrect:
```python
f = {'url': url,
'format_id': format_id}
```
Correct:
```python
formats = [process_formats(f) for f in format_data
if f.get('type') in ('hls', 'dash', 'direct') and f.get('downloadable')]
```
Correct:
```python
formats = [
process_formats(f) for f in format_data
if f.get('type') in ('hls', 'dash', 'direct') and f.get('downloadable')
]
```
### Use convenience conversion and parsing functions


@@ -2,6 +2,7 @@ pukkandan (owner)
shirt-dev (collaborator)
coletdjnz/colethedj (collaborator)
Ashish0804 (collaborator)
nao20010128nao/Lesmiscore (collaborator)
h-h-h-h
pauldubois98
nixxo
@@ -19,7 +20,6 @@ samiksome
alxnull
FelixFrog
Zocker1999NET
nao20010128nao
kurumigi
bbepis
animelover1984/horahoradev
@@ -155,3 +155,42 @@ staubichsauger
xenova
Yakabuff
zulaport
ehoogeveen-medweb
PilzAdam
zmousm
iw0nderhow
unit193
TwoThousandHedgehogs
Jertzukka
cypheron
Hyeeji
bwildenhain
C0D3D3V
kebianizao
Lapin0t
abdullah-if
DavidSkrundz
mkubecek
raleeper
YuenSzeHong
Sematre
jaller94
r5d
julien-hadleyjack
git-anony-mouse
mdawar
trassshhub
foghawk
k3ns1n
teridon
mozlima
timendum
ischmidt20
CreaValix
sian1468
arkamar
hyano
KiberInfinity
tejing1
Bricio
lazypete365


@@ -10,6 +10,311 @@
* Dispatch the workflow https://github.com/yt-dlp/yt-dlp/actions/workflows/build.yml on master
-->
### 2022.02.04
* [youtube:search] Fix extractor by [coletdjnz](https://github.com/coletdjnz)
* [youtube:search] Add tests
* [twitcasting] Enforce UTF-8 for POST payload by [Lesmiscore](https://github.com/Lesmiscore)
* [mediaset] Fix extractor by [nixxo](https://github.com/nixxo)
* [websocket] Make syntax error in `websockets` module non-fatal
### 2022.02.03
* Merge youtube-dl: Upto [commit/78ce962](https://github.com/ytdl-org/youtube-dl/commit/78ce962f4fe020994c216dd2671546fbe58a5c67)
* Add option `--print-to-file`
* Make nested --config-locations relative to parent file
* Ensure `_type` is present in `info.json`
* Fix `--compat-options list-formats`
* Fix/improve `InAdvancePagedList`
* [downloader/ffmpeg] Handle unknown formats better
* [outtmpl] Handle `-o ""` better
* [outtmpl] Handle hard-coded file extension better
* [extractor] Add convenience function `_yes_playlist`
* [extractor] Allow non-fatal `title` extraction
* [extractor] Extract video inside `Article` json_ld
* [generic] Allow further processing of json_ld URL
* [cookies] Fix keyring selection for unsupported desktops
* [utils] Strip double spaces in `clean_html` by [dirkf](https://github.com/dirkf)
* [aes] Add `unpad_pkcs7`
* [test] Fix `test_youtube_playlist_noplaylist`
* [docs,cleanup] Misc cleanup
* [dplay] Add extractors for site changes by [Sipherdrakon](https://github.com/Sipherdrakon)
* [ertgr] Add extractors by [zmousm](https://github.com/zmousm), [dirkf](https://github.com/dirkf)
* [Musicdex] Add extractors by [Ashish0804](https://github.com/Ashish0804)
* [YandexVideoPreview] Add extractor by [KiberInfinity](https://github.com/KiberInfinity)
* [youtube] Add extractor `YoutubeMusicSearchURLIE`
* [archive.org] Ignore unnecessary files
* [Bilibili] Add 8k support by [u-spec-png](https://github.com/u-spec-png)
* [bilibili] Fix extractor, make anthology title non-fatal
* [CAM4] Add thumbnail extraction by [alerikaisattera](https://github.com/alerikaisattera)
* [cctv] De-prioritize sample format
* [crunchyroll:beta] Add cookies support by [tejing1](https://github.com/tejing1)
* [crunchyroll] Fix login by [tejing1](https://github.com/tejing1)
* [doodstream] Fix extractor
* [fc2] Fix extraction by [Lesmiscore](https://github.com/Lesmiscore)
* [FFmpegConcat] Abort on --skip-download and download errors
* [Fujitv] Extract metadata and support premium by [YuenSzeHong](https://github.com/YuenSzeHong)
* [globo] Fix extractor by [Bricio](https://github.com/Bricio)
* [glomex] Simplify embed detection
* [GoogleSearch] Fix extractor
* [Instagram] Fix extraction when logged in by [MinePlayersPE](https://github.com/MinePlayersPE)
* [iq.com] Add VIP support by [MinePlayersPE](https://github.com/MinePlayersPE)
* [mildom] Fix extractor by [lazypete365](https://github.com/lazypete365)
* [MySpass] Fix video url processing by [trassshhub](https://github.com/trassshhub)
* [Odnoklassniki] Improve embedded players extraction by [KiberInfinity](https://github.com/KiberInfinity)
* [orf:tvthek] Lazy playlist extraction and obey --no-playlist
* [Pladform] Fix redirection to external player by [KiberInfinity](https://github.com/KiberInfinity)
* [ThisOldHouse] Improve Premium URL check by [Ashish0804](https://github.com/Ashish0804)
* [TikTok] Iterate through app versions by [MinePlayersPE](https://github.com/MinePlayersPE)
* [tumblr] Fix 403 errors and handle vimeo embeds by [foghawk](https://github.com/foghawk)
* [viki] Fix "Bad request" for manifest by [nyuszika7h](https://github.com/nyuszika7h)
* [Vimm] add recording extractor by [alerikaisattera](https://github.com/alerikaisattera)
* [web.archive:youtube] Add `ytarchive:` prefix and misc cleanup
* [youtube:api] Do not use seek when reading HTTPError response by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Fix n-sig for player e06dea74
* [youtube, cleanup] Misc fixes and cleanup
### 2022.01.21
* Add option `--concat-playlist` to **concat videos in a playlist**
* Allow **multiple and nested configuration files**
* Add more post-processing stages (`after_video`, `playlist`)
* Allow `--exec` to be run at any post-processing stage (Deprecates `--exec-before-download`)
* Allow `--print` to be run at any post-processing stage
* Allow listing formats, thumbnails, subtitles using `--print` by [pukkandan](https://github.com/pukkandan), [Zirro](https://github.com/Zirro)
* Add fields `video_autonumber`, `modified_date`, `modified_timestamp`, `playlist_count`, `channel_follower_count`
* Add key `requested_downloads` in the root `info_dict`
* Write `download_archive` only after all formats are downloaded
* [FfmpegMetadata] Allow setting metadata of individual streams using `meta<n>_` prefix
* Add option `--legacy-server-connect` by [xtkoba](https://github.com/xtkoba)
* Allow escaped `,` in `--extractor-args`
* Allow unicode characters in `info.json`
* Check for existing thumbnail/subtitle in final directory
* Don't treat empty containers as `None` in `sanitize_info`
* Fix `-s --ignore-no-formats --force-write-archive`
* Fix live title for multiple formats
* List playlist thumbnails in `--list-thumbnails`
* Raise error if subtitle download fails
* [cookies] Fix bug when keyring is unspecified
* [ffmpeg] Ignore unknown streams, standardize use of `-map 0`
* [outtmpl] Alternate form for `D` and fix suffix's case
* [utils] Add `Sec-Fetch-Mode` to `std_headers`
* [utils] Fix `format_bytes` output for Bytes by [pukkandan](https://github.com/pukkandan), [mdawar](https://github.com/mdawar)
* [utils] Handle `ss:xxx` in `parse_duration`
* [utils] Improve parsing for nested HTML elements by [zmousm](https://github.com/zmousm), [pukkandan](https://github.com/pukkandan)
* [utils] Use key `None` in `traverse_obj` to return as-is
* [extractor] Detect more subtitle codecs in MPD manifests by [fstirlitz](https://github.com/fstirlitz)
* [extractor] Extract chapters from JSON-LD by [iw0nderhow](https://github.com/iw0nderhow), [pukkandan](https://github.com/pukkandan)
* [extractor] Extract thumbnails from JSON-LD by [nixxo](https://github.com/nixxo)
* [extractor] Improve `url_result` and related
* [generic] Improve KVS player extraction by [trassshhub](https://github.com/trassshhub)
* [build] Reduce dependency on third party workflows
* [extractor,cleanup] Use `_search_nextjs_data`, `format_field`
* [cleanup] Minor fixes and cleanup
* [docs] Improvements
* [test] Fix TestVerboseOutput
* [afreecatv] Add livestreams extractor by [wlritchi](https://github.com/wlritchi)
* [callin] Add extractor by [foghawk](https://github.com/foghawk)
* [CrowdBunker] Add extractors by [Ashish0804](https://github.com/Ashish0804)
* [daftsex] Add extractors by [k3ns1n](https://github.com/k3ns1n)
* [digitalconcerthall] Add extractor by [teridon](https://github.com/teridon)
* [Drooble] Add extractor by [u-spec-png](https://github.com/u-spec-png)
* [EuropeanTour] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [iq.com] Add extractors by [MinePlayersPE](https://github.com/MinePlayersPE)
* [KelbyOne] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [LnkIE] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [MainStreaming] Add extractor by [coletdjnz](https://github.com/coletdjnz)
* [megatvcom] Add extractors by [zmousm](https://github.com/zmousm)
* [Newsy] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [noodlemagazine] Add extractor by [trassshhub](https://github.com/trassshhub)
* [PokerGo] Add extractors by [Ashish0804](https://github.com/Ashish0804)
* [Pornez] Add extractor by [mozlima](https://github.com/mozlima)
* [PRX] Add Extractors by [coletdjnz](https://github.com/coletdjnz)
* [RTNews] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [Rule34video] Add extractor by [trassshhub](https://github.com/trassshhub)
* [tvopengr] Add extractors by [zmousm](https://github.com/zmousm)
* [Vimm] Add extractor by [alerikaisattera](https://github.com/alerikaisattera)
* [glomex] Add extractors by [zmousm](https://github.com/zmousm)
* [instagram] Add story/highlight extractor by [u-spec-png](https://github.com/u-spec-png)
* [openrec] Add movie extractor by [Lesmiscore](https://github.com/Lesmiscore)
* [rai] Add Raiplaysound extractors by [nixxo](https://github.com/nixxo), [pukkandan](https://github.com/pukkandan)
* [aparat] Fix extractor
* [ard] Extract subtitles by [fstirlitz](https://github.com/fstirlitz)
* [BiliIntl] Add login by [MinePlayersPE](https://github.com/MinePlayersPE)
* [CeskaTelevize] Use `http` for manifests
* [CTVNewsIE] Add fallback for video search by [Ashish0804](https://github.com/Ashish0804)
* [dplay] Migrate DiscoveryPlusItaly to DiscoveryPlus by [timendum](https://github.com/timendum)
* [dplay] Re-structure DiscoveryPlus extractors
* [Dropbox] Support password protected files and more formats by [zenerdi0de](https://github.com/zenerdi0de)
* [facebook] Fix extraction from groups
* [facebook] Improve title and uploader extraction
* [facebook] Parse dash manifests
* [fox] Extract m3u8 from preview by [ischmidt20](https://github.com/ischmidt20)
* [funk] Support origin URLs
* [gfycat] Fix `uploader`
* [gfycat] Support embeds by [coletdjnz](https://github.com/coletdjnz)
* [hotstar] Add extractor args to ignore tags by [Ashish0804](https://github.com/Ashish0804)
* [hrfernsehen] Fix ardloader extraction by [CreaValix](https://github.com/CreaValix)
* [instagram] Fix username extraction for stories and highlights by [nyuszika7h](https://github.com/nyuszika7h)
* [kakao] Detect geo-restriction
* [line] Remove `tv.line.me` by [sian1468](https://github.com/sian1468)
* [mixch] Add `MixchArchiveIE` by [Lesmiscore](https://github.com/Lesmiscore)
* [mixcloud] Detect restrictions by [llacb47](https://github.com/llacb47)
* [NBCSports] Fix extraction of platform URLs by [ischmidt20](https://github.com/ischmidt20)
* [Nexx] Extract more metadata by [MinePlayersPE](https://github.com/MinePlayersPE)
* [Nexx] Support 3q CDN by [MinePlayersPE](https://github.com/MinePlayersPE)
* [pbs] de-prioritize AD formats
* [PornHub,YouTube] Refresh onion addresses by [unit193](https://github.com/unit193)
* [RedBullTV] Parse subtitles from manifest by [Ashish0804](https://github.com/Ashish0804)
* [streamcz] Fix extractor by [arkamar](https://github.com/arkamar), [pukkandan](https://github.com/pukkandan)
* [Ted] Rewrite extractor by [pukkandan](https://github.com/pukkandan), [trassshhub](https://github.com/trassshhub)
* [Theta] Fix valid URL by [alerikaisattera](https://github.com/alerikaisattera)
* [ThisOldHouseIE] Add support for premium videos by [Ashish0804](https://github.com/Ashish0804)
* [TikTok] Fix extraction for sigi-based webpages, add API fallback by [MinePlayersPE](https://github.com/MinePlayersPE)
* [TikTok] Pass cookies to formats, and misc fixes by [MinePlayersPE](https://github.com/MinePlayersPE)
* [TikTok] Extract captions, user thumbnail by [MinePlayersPE](https://github.com/MinePlayersPE)
* [TikTok] Change app version by [MinePlayersPE](https://github.com/MinePlayersPE), [llacb47](https://github.com/llacb47)
* [TVer] Extract message for unaired live by [Lesmiscore](https://github.com/Lesmiscore)
* [twitcasting] Refactor extractor by [Lesmiscore](https://github.com/Lesmiscore)
* [twitter] Fix video in quoted tweets
* [veoh] Improve extractor by [foghawk](https://github.com/foghawk)
* [vk] Capture `clip` URLs
* [vk] Fix VKUserVideosIE by [Ashish0804](https://github.com/Ashish0804)
* [vk] Improve `_VALID_URL` by [k3ns1n](https://github.com/k3ns1n)
* [VrtNU] Handle empty title by [pgaig](https://github.com/pgaig)
* [XVideos] Check HLS formats by [MinePlayersPE](https://github.com/MinePlayersPE)
* [yahoo:gyao] Improved playlist handling by [hyano](https://github.com/hyano)
* [youtube:tab] Extract more playlist metadata by [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [youtube:tab] Raise error on tab redirect by [krichbanana](https://github.com/krichbanana), [coletdjnz](https://github.com/coletdjnz)
* [youtube] Update Innertube clients by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Detect live-stream embeds
* [youtube] Do not return `upload_date` for playlists
* [youtube] Extract channel subscriber count by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Make invalid storyboard URL non-fatal
* [youtube] Enforce UTC, update innertube clients and tests by [coletdjnz](https://github.com/coletdjnz)
* [zdf] Add chapter extraction by [iw0nderhow](https://github.com/iw0nderhow)
* [zee5] Add geo-bypass
### 2021.12.27
* Avoid recursion error when re-extracting info
* [ffmpeg] Fix position of `--ppa`
* [aria2c] Don't show progress when `--no-progress`
* [cookies] Support other keyrings by [mbway](https://github.com/mbway)
* [EmbedThumbnail] Prefer AtomicParsley over ffmpeg if available
* [generic] Fix HTTP KVS Player by [git-anony-mouse](https://github.com/git-anony-mouse)
* [ThumbnailsConvertor] Fix for when there are no thumbnails
* [docs] Add examples for using `TYPES:` in `-P`/`-o`
* [PixivSketch] Add extractors by [nao20010128nao](https://github.com/nao20010128nao)
* [tiktok] Add music, sticker and tag IEs by [MinePlayersPE](https://github.com/MinePlayersPE)
* [BiliIntl] Fix extractor by [MinePlayersPE](https://github.com/MinePlayersPE)
* [CBC] Fix URL regex
* [tiktok] Fix `extractor_key` used in archive
* [youtube] **End `live-from-start` properly when stream ends with 403**
* [Zee5] Fix VALID_URL for tv-shows by [Ashish0804](https://github.com/Ashish0804)
### 2021.12.25
* [dash,youtube] **Download live from start to end** by [nao20010128nao](https://github.com/nao20010128nao), [pukkandan](https://github.com/pukkandan)
* Add option `--live-from-start` to enable downloading live videos from start
* Add key `is_from_start` in formats to identify formats (of live videos) that downloads from start
* [dash] Create protocol `http_dash_segments_generator` that allows a function to be passed instead of fragments
* [fragment] Allow multiple live dash formats to download simultaneously
* [youtube] Implement fragment re-fetching for the live dash formats
* [youtube] Re-extract dash manifest every 5 hours (manifest expires in 6hrs)
* [postprocessor/ffmpeg] Add `FFmpegFixupDuplicateMoovPP` to fixup duplicated moov atoms
* Known issues:
* Ctrl+C doesn't work on Windows when downloading multiple formats
* If video becomes private, download hangs
* [SponsorBlock] Add `Filler` and `Highlight` categories by [nihil-admirari](https://github.com/nihil-admirari), [pukkandan](https://github.com/pukkandan)
* Change `--sponsorblock-cut all` to `--sponsorblock-cut default` if you do not want filler sections to be removed
* Add field `webpage_url_domain`
* Add interactive format selection with `-f -`
* Add option `--file-access-retries` by [ehoogeveen-medweb](https://github.com/ehoogeveen-medweb)
* [outtmpl] Add alternate forms `S`, `D` and improve `id` detection
* [outtmpl] Add operator `&` for replacement text by [PilzAdam](https://github.com/PilzAdam)
* [EmbedSubtitle] Disable duration check temporarily
* [extractor] Add `_search_nuxt_data` by [nao20010128nao](https://github.com/nao20010128nao)
* [extractor] Ignore errors in comment extraction when `-i` is given
* [extractor] Standardize `_live_title`
* [FormatSort] Prevent incorrect deprecation warning
* [generic] Extract m3u8 formats from JSON-LD
* [postprocessor/ffmpeg] Always add `faststart`
* [utils] Fix parsing `YYYYMMDD` dates in Nov/Dec by [wlritchi](https://github.com/wlritchi)
* [utils] Improve `parse_count`
* [utils] Update `std_headers` by [kikuyan](https://github.com/kikuyan), [fstirlitz](https://github.com/fstirlitz)
* [lazy_extractors] Fix for search IEs
* [extractor] Support default implicit graph in JSON-LD by [zmousm](https://github.com/zmousm)
* Allow `--no-write-thumbnail` to override `--write-all-thumbnail`
* Fix `--throttled-rate`
* Fix control characters being printed to `--console-title`
* Fix PostProcessor hooks not registered for some PPs
* Pre-process when using `--flat-playlist`
* Remove known invalid thumbnails from `info_dict`
* Add warning when using `-f best`
* Use `parse_duration` for `--wait-for-video` and some minor fix
* [test/download] Add more fields
* [test/download] Ignore field `webpage_url_domain` by [std-move](https://github.com/std-move)
* [compat] Suppress errors in enabling VT mode
* [docs] Improve manpage format by [iw0nderhow](https://github.com/iw0nderhow), [pukkandan](https://github.com/pukkandan)
* [docs,cleanup] Minor fixes and cleanup
* [cleanup] Fix some typos by [unit193](https://github.com/unit193)
* [ABC:iview] Add show extractor by [pabs3](https://github.com/pabs3)
* [dropout] Add extractor by [TwoThousandHedgehogs](https://github.com/TwoThousandHedgehogs), [pukkandan](https://github.com/pukkandan)
* [GameJolt] Add extractors by [MinePlayersPE](https://github.com/MinePlayersPE)
* [gofile] Add extractor by [Jertzukka](https://github.com/Jertzukka), [Ashish0804](https://github.com/Ashish0804)
* [hse] Add extractors by [cypheron](https://github.com/cypheron), [pukkandan](https://github.com/pukkandan)
* [NateTV] Add NateIE and NateProgramIE by [Ashish0804](https://github.com/Ashish0804), [Hyeeji](https://github.com/Hyeeji)
* [OpenCast] Add extractors by [bwildenhain](https://github.com/bwildenhain), [C0D3D3V](https://github.com/C0D3D3V)
* [rtve] Add `RTVEAudioIE` by [kebianizao](https://github.com/kebianizao)
* [Rutube] Add RutubeChannelIE by [Ashish0804](https://github.com/Ashish0804)
* [skeb] Add extractor by [nao20010128nao](https://github.com/nao20010128nao)
* [soundcloud] Add related tracks extractor by [Lapin0t](https://github.com/Lapin0t)
* [toggo] Add extractor by [nyuszika7h](https://github.com/nyuszika7h)
* [TrueID] Add extractor by [MinePlayersPE](https://github.com/MinePlayersPE)
* [audiomack] Update album and song VALID_URL by [abdullah-if](https://github.com/abdullah-if), [dirkf](https://github.com/dirkf)
* [CBC Gem] Extract 1080p formats by [DavidSkrundz](https://github.com/DavidSkrundz)
* [ceskatelevize] Fetch iframe from nextJS data by [mkubecek](https://github.com/mkubecek)
* [crackle] Look for non-DRM formats by [raleeper](https://github.com/raleeper)
* [dplay] Temporary fix for `discoveryplus.com/it`
* [DiscoveryPlusShowBaseIE] yield actual video id by [Ashish0804](https://github.com/Ashish0804)
* [Facebook] Handle redirect URLs
* [fujitv] Extract 1080p from `tv_android` m3u8 by [YuenSzeHong](https://github.com/YuenSzeHong)
* [gronkh] Support new URL pattern by [Sematre](https://github.com/Sematre)
* [instagram] Expand valid URL by [u-spec-png](https://github.com/u-spec-png)
* [Instagram] Try bypassing login wall with embed page by [MinePlayersPE](https://github.com/MinePlayersPE)
* [Jamendo] Fix use of `_VALID_URL_RE` by [jaller94](https://github.com/jaller94)
* [LBRY] Support livestreams by [Ashish0804](https://github.com/Ashish0804), [pukkandan](https://github.com/pukkandan)
* [NJPWWorld] Extract formats from m3u8 by [aarubui](https://github.com/aarubui)
* [NovaEmbed] update player regex by [std-move](https://github.com/std-move)
* [npr] Make SMIL extraction non-fatal by [r5d](https://github.com/r5d)
* [ntvcojp] Extract NUXT data by [nao20010128nao](https://github.com/nao20010128nao)
* [ok.ru] add mobile fallback by [nao20010128nao](https://github.com/nao20010128nao)
* [olympics] Add uploader and cleanup by [u-spec-png](https://github.com/u-spec-png)
* [ondemandkorea] Update `jw_config` regex by [julien-hadleyjack](https://github.com/julien-hadleyjack)
* [PlutoTV] Expand `_VALID_URL`
* [RaiNews] Fix extractor by [nixxo](https://github.com/nixxo)
* [RCTIPlusSeries] Lazy extraction and video type selection by [MinePlayersPE](https://github.com/MinePlayersPE)
* [redtube] Handle formats delivered inside a JSON by [dirkf](https://github.com/dirkf), [nixxo](https://github.com/nixxo)
* [SonyLiv] Add OTP login support by [Ashish0804](https://github.com/Ashish0804)
* [Steam] Fix extractor by [u-spec-png](https://github.com/u-spec-png)
* [TikTok] Pass cookies to mobile API by [MinePlayersPE](https://github.com/MinePlayersPE)
* [trovo] Fix inheritance of `TrovoChannelBaseIE`
* [TVer] Extract better thumbnails by [YuenSzeHong](https://github.com/YuenSzeHong)
* [vimeo] Extract chapters
* [web.archive:youtube] Improve metadata extraction by [coletdjnz](https://github.com/coletdjnz)
* [youtube:comments] Add more options for limiting number of comments extracted by [coletdjnz](https://github.com/coletdjnz)
* [youtube:tab] Extract more metadata from feeds/channels/playlists by [coletdjnz](https://github.com/coletdjnz)
* [youtube:tab] Extract video thumbnails from playlist by [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [youtube:tab] Ignore query when redirecting channel to playlist and cleanup of related code
* [youtube] Fix `ytsearchdate`
* [zdf] Support videos with different ptmd location by [iw0nderhow](https://github.com/iw0nderhow)
* [zee5] Support /episodes in URL
### 2021.12.01
* **Add option `--wait-for-video` to wait for scheduled streams**
@@ -38,7 +343,7 @@
* Ensure directory exists when checking formats
* Ensure path for link files exists by [Zirro](https://github.com/Zirro)
* Ensure same config file is not loaded multiple times
* Fix 'postprocessor_hooks`
* Fix `postprocessor_hooks`
* Fix `--break-on-archive` when pre-checking
* Fix `--check-formats` for `mhtml`
* Fix `--load-info-json` of playlists with failed entries
@@ -110,10 +415,9 @@
* [youtube] Decrypt n-sig for URLs with `ratebypass`
* [youtube] Minor improvement to format sorting
* [cleanup] Add deprecation warnings
* [cleanup] Minor cleanup
* [cleanup] Misc cleanup
* [cleanup] Refactor `JSInterpreter._seperate`
* [Cleanup] Remove some unnecessary groups in regexes by [Ashish0804](https://github.com/Ashish0804)
* [cleanup] Misc cleanup
### 2021.11.10.1
@@ -1476,6 +1780,7 @@
**Note**: All uncredited changes above this point are authored by [pukkandan](https://github.com/pukkandan)
### Unreleased changes in [blackjack4494/yt-dlc](https://github.com/blackjack4494/yt-dlc)
* Updated to youtube-dl release 2020.11.26 by [pukkandan](https://github.com/pukkandan)
* Youtube improvements by [pukkandan](https://github.com/pukkandan)
* Implemented all Youtube Feeds (ytfav, ytwatchlater, ytsubs, ythistory, ytrec) and SearchURL


@@ -28,6 +28,7 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
[![gh-sponsor](https://img.shields.io/badge/_-Sponsor-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/coletdjnz)
* YouTube improvements including: age-gate bypass, private playlists, multiple-clients (to avoid throttling) and a lot of under-the-hood improvements
* Added support for downloading YoutubeWebArchive videos
@@ -35,5 +36,15 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/ashish0804)
* Added support for new websites Zee5, MXPlayer, DiscoveryPlusIndia, ShemarooMe, Utreon etc
* Added playlist/series downloads for TubiTv, SonyLIV, Voot, HotStar etc
* Added support for new websites BiliIntl, DiscoveryPlusIndia, OlympicsReplay, PlanetMarathi, ShemarooMe, Utreon, Zee5 etc
* Added playlist/series downloads for Hotstar, ParamountPlus, Rumble, SonyLIV, Trovo, TubiTv, Voot etc
* Improved/fixed support for HiDive, HotStar, Hungama, LBRY, LinkedInLearning, Mxplayer, SonyLiv, TV2, Vimeo, VLive etc
## [Lesmiscore](https://github.com/Lesmiscore) (nao20010128nao)
**Bitcoin**: bc1qfd02r007cutfdjwjmyy9w23rjvtls6ncve7r3s
**Monacoin**: mona1q3tf7dzvshrhfe3md379xtvt2n22duhglv5dskr
* Download live from start to end for YouTube
* Added support for new websites mildom, PixivSketch, skeb, radiko, voicy, mirrativ, openrec, whowatch, damtomo, 17.live, mixch etc


@@ -1,5 +1,6 @@
all: lazy-extractors yt-dlp doc pypi-files
clean: clean-test clean-dist clean-cache
clean: clean-test clean-dist
clean-all: clean clean-cache
completions: completion-bash completion-fish completion-zsh
doc: README.md CONTRIBUTING.md issuetemplates supportedsites
ot: offlinetest
@@ -13,15 +14,15 @@ pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites com
.PHONY: all clean install test tar pypi-files completions ot offlinetest codetest supportedsites
clean-test:
rm -rf *.3gp *.annotations.xml *.ape *.avi *.description *.dump *.flac *.flv *.frag *.frag.aria2 *.frag.urls \
*.info.json *.jpeg *.jpg *.live_chat.json *.m4a *.m4v *.mkv *.mp3 *.mp4 *.ogg *.opus *.part* *.png *.sbv *.srt \
*.swf *.swp *.ttml *.vtt *.wav *.webm *.webp *.mhtml *.mov *.unknown_video *.desktop *.url *.webloc *.ytdl \
test/testdata/player-*.js tmp/
rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
*.3gp *.ape *.avi *.desktop *.flac *.flv *.jpeg *.jpg *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 \
*.mp4 *.ogg *.opus *.png *.sbv *.srt *.swf *.swp *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
clean-dist:
rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \
yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS .mailmap
clean-cache:
find . -name "*.pyc" -o -name "*.class" -delete
find . \( -name "*.pyc" -o -name "*.class" \) -delete
completion-bash: completions/bash/yt-dlp
completion-fish: completions/fish/yt-dlp.fish

README.md: File diff suppressed because it is too large


@@ -39,12 +39,6 @@ class {name}({bases}):
_module = '{module}'
'''
make_valid_template = '''
@classmethod
def _make_valid_url(cls):
return {valid_url!r}
'''
def get_base_name(base):
if base is InfoExtractor:
@@ -61,15 +55,14 @@ def build_lazy_ie(ie, name):
bases=', '.join(map(get_base_name, ie.__bases__)),
module=ie.__module__)
valid_url = getattr(ie, '_VALID_URL', None)
if not valid_url and hasattr(ie, '_make_valid_url'):
valid_url = ie._make_valid_url()
if valid_url:
s += f' _VALID_URL = {valid_url!r}\n'
if not ie._WORKING:
s += ' _WORKING = False\n'
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += f'\n{getsource(ie.suitable)}'
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(valid_url=ie._make_valid_url())
return s
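With this change, the stub generated for a search extractor would look roughly like the following (the class name, module and pattern here are hypothetical):
```python
class YoutubeSearchIE(LazyLoadSearchExtractor):
    _module = 'yt_dlp.extractor.youtube'

    @classmethod
    def _make_valid_url(cls):
        return r'ytsearch(?P<prefix>\d*|all):(?P<query>[\s\S]+)'
```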


@@ -13,12 +13,14 @@ PREFIX = r'''%yt-dlp(1)
# NAME
youtube\-dl \- download videos from youtube.com or other video platforms
yt\-dlp \- A youtube-dl fork with additional features and patches
# SYNOPSIS
**yt-dlp** \[OPTIONS\] URL [URL...]
# DESCRIPTION
'''
@@ -33,47 +35,63 @@ def main():
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
readme = re.sub(r'\s+yt-dlp \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
readme = PREFIX + readme
readme = filter_excluded_sections(readme)
readme = move_sections(readme)
readme = filter_options(readme)
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(readme)
outf.write(PREFIX + readme)
def filter_excluded_sections(readme):
EXCLUDED_SECTION_BEGIN_STRING = re.escape('<!-- MANPAGE: BEGIN EXCLUDED SECTION -->')
EXCLUDED_SECTION_END_STRING = re.escape('<!-- MANPAGE: END EXCLUDED SECTION -->')
return re.sub(
rf'(?s){EXCLUDED_SECTION_BEGIN_STRING}.+?{EXCLUDED_SECTION_END_STRING}\n',
'', readme)
def move_sections(readme):
MOVE_TAG_TEMPLATE = '<!-- MANPAGE: MOVE "%s" SECTION HERE -->'
sections = re.findall(r'(?m)^%s$' % (
re.escape(MOVE_TAG_TEMPLATE).replace(r'\%', '%') % '(.+)'), readme)
for section_name in sections:
move_tag = MOVE_TAG_TEMPLATE % section_name
if readme.count(move_tag) > 1:
raise Exception(f'There is more than one occurrence of "{move_tag}". This is unexpected')
sections = re.findall(rf'(?sm)(^# {re.escape(section_name)}.+?)(?=^# )', readme)
if len(sections) < 1:
raise Exception(f'The section {section_name} does not exist')
elif len(sections) > 1:
raise Exception(f'There are multiple occurrences of section {section_name}, this is unhandled')
readme = readme.replace(sections[0], '', 1).replace(move_tag, sections[0], 1)
return readme
def filter_options(readme):
ret = ''
in_options = False
for line in readme.split('\n'):
if line.startswith('# '):
if line[2:].startswith('OPTIONS'):
in_options = True
else:
in_options = False
section = re.search(r'(?sm)^# USAGE AND OPTIONS\n.+?(?=^# )', readme).group(0)
options = '# OPTIONS\n'
for line in section.split('\n')[1:]:
if line.lstrip().startswith('-'):
split = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if in_options:
if line.lstrip().startswith('-'):
split = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + [f'*{split_option[-1]}*'])
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html
options += f'\n{option}\n: {description}\n'
continue
options += line.lstrip() + '\n'
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
continue
ret += line.lstrip() + '\n'
else:
ret += line + '\n'
return ret
return readme.replace(section, options, 1)
if __name__ == '__main__':


@@ -27,13 +27,13 @@ try:
except Exception:
GIT_HEAD = None
VERSION_FILE = f'''
VERSION_FILE = f'''\
# Autogenerated by devscripts/update-version.py
__version__ = {VERSION!r}
RELEASE_GIT_HEAD = {GIT_HEAD!r}
'''.lstrip()
'''
with open('yt_dlp/version.py', 'wt') as f:
f.write(VERSION_FILE)
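The trailing backslash after the opening quotes suppresses the leading newline, which is what made the `.lstrip()` call unnecessary; a quick sketch of the equivalence:
```python
with_backslash = '''\
line
'''
with_lstrip = '''
line
'''.lstrip()
assert with_backslash == with_lstrip == 'line\n'
```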

docs/Contributing.md (new file)

@@ -0,0 +1,5 @@
---
orphan: true
---
```{include} ../Contributing.md
```


@@ -40,7 +40,7 @@ def main():
'--icon=devscripts/logo.ico',
'--upx-exclude=vcruntime140.dll',
'--noconfirm',
*dependancy_options(),
*dependency_options(),
*opts,
'yt_dlp/__main__.py',
]
@@ -73,11 +73,11 @@ def version_to_list(version):
return list(map(int, version_list)) + [0] * (4 - len(version_list))
def dependancy_options():
dependancies = [pycryptodome_module(), 'mutagen'] + collect_submodules('websockets')
def dependency_options():
dependencies = [pycryptodome_module(), 'mutagen'] + collect_submodules('websockets')
excluded_modules = ['test', 'ytdlp_plugins', 'youtube-dl', 'youtube-dlc']
yield from (f'--hidden-import={module}' for module in dependancies)
yield from (f'--hidden-import={module}' for module in dependencies)
yield from (f'--exclude-module={module}' for module in excluded_modules)


@@ -21,6 +21,7 @@
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **abc.net.au:iview:showseries**
- **abcnews**
- **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
@@ -40,6 +41,7 @@
- **aenetworks:collection**
- **aenetworks:show**
- **afreecatv**: afreecatv.com
- **afreecatv:live**: afreecatv.com
- **AirMozilla**
- **AliExpressLive**
- **AlJazeera**
@@ -52,6 +54,7 @@
- **AMCNetworks**
- **AmericasTestKitchen**
- **AmericasTestKitchenSeason**
- **AmHistoryChannel**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimalPlanet**
- **AnimeLab**
@@ -161,6 +164,7 @@
- **BuzzFeed**
- **BYUtv**
- **CableAV**
- **Callin**
- **CAM4**
- **Camdemy**
- **CamdemyFolder**
@@ -224,6 +228,7 @@
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **CONtv**
- **CookingChannel**
- **Corus**
- **Coub**
- **CozyTV**
@@ -231,6 +236,8 @@
- **Cracked**
- **Crackle**
- **CrooksAndLiars**
- **CrowdBunker**
- **CrowdBunkerChannel**
- **crunchyroll**
- **crunchyroll:beta**
- **crunchyroll:playlist**
@@ -245,6 +252,7 @@
- **curiositystream:collections**
- **curiositystream:series**
- **CWTV**
- **Daftsex**
- **DagelijkseKost**: dagelijksekost.een.be
- **DailyMail**
- **dailymotion**
@@ -262,19 +270,20 @@
- **DeezerPlaylist**
- **defense.gouv.fr**
- **democracynow**
- **DestinationAmerica**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **Digg**
- **DigitalConcertHall**: DigitalConcertHall extractor
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **DiscoveryGo**
- **DiscoveryGoPlaylist**
- **DiscoveryLife**
- **DiscoveryNetworksDe**
- **DiscoveryPlus**
- **DiscoveryPlusIndia**
- **DiscoveryPlusIndiaShow**
- **DiscoveryPlusItaly**
- **DiscoveryPlusItalyShow**
- **DiscoveryVR**
- **Disney**
- **DIYNetwork**
- **dlive:stream**
@@ -286,7 +295,10 @@
- **DouyuTV**: 斗鱼
- **DPlay**
- **DRBonanza**
- **Drooble**
- **Dropbox**
- **Dropout**
- **DropoutSeason**
- **DrTuber**
- **drtv**
- **drtv:live**
@@ -320,12 +332,16 @@
- **Eporner**
- **EroProfile**
- **EroProfile:album**
- **ertflix**: ERTFLIX videos
- **ertflix:codename**: ERTFLIX videos by codename
- **ertwebtv:embed**: ert.gr webtv embedded videos
- **Escapist**
- **ESPN**
- **ESPNArticle**
- **ESPNCricInfo**
- **EsriVideo**
- **Europa**
- **EuropeanTour**
- **EUScreen**
- **EWETV**
- **ExpoTV**
@@ -348,6 +364,7 @@
- **FiveTV**
- **Flickr**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FoodNetwork**
- **FootyRoom**
- **Formula1**
- **FOX**
@@ -379,6 +396,12 @@
- **GabTV**
- **Gaia**
- **GameInformer**
- **GameJolt**
- **GameJoltCommunity**
- **GameJoltGame**
- **GameJoltGameSoundtrack**
- **GameJoltSearch**
- **GameJoltUser**
- **GameSpot**
- **GameStar**
- **Gaskrank**
@@ -397,8 +420,12 @@
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
- **GloboArticle**
- **glomex**: Glomex videos
- **glomex:embed**: Glomex embedded videos
- **Go**
- **GoDiscovery**
- **GodTube**
- **Gofile**
- **Golem**
- **google:podcasts**
- **google:podcasts:feed**
@@ -418,6 +445,7 @@
- **hetklokhuis**
- **hgtv.com:show**
- **HGTVDe**
- **HGTVUsa**
- **HiDive**
- **HistoricFilms**
- **history:player**
@@ -436,6 +464,8 @@
- **hrfernsehen**
- **HRTi**
- **HRTiPlaylist**
- **HSEProduct**
- **HSEShow**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post
- **Hungama**
@@ -457,13 +487,17 @@
- **IndavideoEmbed**
- **InfoQ**
- **Instagram**
- **instagram:story**
- **instagram:tag**: Instagram hashtag search URLs
- **instagram:user**: Instagram user profile
- **InstagramIOS**: IOS instagram:// URL
- **Internazionale**
- **InternetVideoArchive**
- **InvestigationDiscovery**
- **IPrima**
- **IPrimaCNN**
- **iq.com**: International version of iQiyi
- **iq.com:album**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ITTF**
@@ -487,6 +521,7 @@
- **KarriereVideos**
- **Katsomo**
- **KeezMovies**
- **KelbyOne**
- **Ketnet**
- **khanacademy**
- **khanacademy:unit**
@@ -532,7 +567,6 @@
- **limelight:channel_list**
- **LineLive**
- **LineLiveChannel**
- **LineTV**
- **LinkedIn**
- **linkedin:learning**
- **linkedin:learning:course**
@@ -541,6 +575,7 @@
- **LiveJournal**
- **livestream**
- **livestream:original**
- **Lnk**
- **LnkGo**
- **loc**: Library of Congress
- **LocalNews8**
@@ -553,6 +588,7 @@
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MainStreaming**: MainStreaming Player
- **MallTV**
- **mangomolo:live**
- **mangomolo:video**
@@ -579,6 +615,8 @@
- **MediasiteNamedCatalog**
- **Medici**
- **megaphone.fm**: megaphone.fm embedded players
- **megatvcom**: megatv.com videos
- **megatvcom:embed**: megatv.com embedded videos
- **Meipai**: 美拍
- **MelonVOD**
- **META**
@@ -602,6 +640,7 @@
- **mirrativ:user**
- **MiTele**: mitele.es
- **mixch**
- **mixch:archive**
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:user**
@@ -634,6 +673,10 @@
- **MTVUutisetArticle**
- **MuenchenTV**: münchen.tv
- **MuseScore**
- **MusicdexAlbum**
- **MusicdexArtist**
- **MusicdexPlaylist**
- **MusicdexSong**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
@@ -652,6 +695,8 @@
- **n-tv.de**
- **N1Info:article**
- **N1InfoAsset**
- **Nate**
- **NateProgram**
- **natgeo:video**
- **NationalGeographicTV**
- **Naver**
@@ -689,6 +734,7 @@
- **Newgrounds:playlist**
- **Newgrounds:user**
- **Newstube**
- **Newsy**
- **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **NextTV**: 壹電視
@@ -718,6 +764,7 @@
- **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize**
- **NonkTube**
- **NoodleMagazine**
- **Noovo**
- **Normalboots**
- **NosVideo**
@@ -766,8 +813,11 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **Opencast**
- **OpencastPlaylist**
- **openrec**
- **openrec:capture**
- **openrec:movie**
- **OraTV**
- **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4
@@ -819,6 +869,8 @@
- **Pinkbike**
- **Pinterest**
- **PinterestCollection**
- **pixiv:sketch**
- **pixiv:sketch:user**
- **Pladform**
- **PlanetMarathi**
- **Platzi**
@@ -837,6 +889,8 @@
- **podomatic**
- **Pokemon**
- **PokemonWatch**
- **PokerGo**
- **PokerGoCollection**
- **PolsatGo**
- **PolskieRadio**
- **polskieradio:kierowcow**
@@ -848,6 +902,7 @@
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **Pornez**
- **PornFlip**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
@@ -862,6 +917,11 @@
- **PressTV**
- **ProjectVeritas**
- **prosiebensat1**: ProSiebenSat.1 Digital
- **PRXAccount**
- **PRXSeries**
- **prxseries:search**: PRX Series Search; "prxseries:" prefix
- **prxstories:search**: PRX Stories Search; "prxstories:" prefix
- **PRXStory**
- **puhutv**
- **puhutv:serie**
- **Puls4**
@@ -895,8 +955,9 @@
- **RaiPlay**
- **RaiPlayLive**
- **RaiPlayPlaylist**
- **RaiPlayRadio**
- **RaiPlayRadioPlaylist**
- **RaiPlaySound**
- **RaiPlaySoundLive**
- **RaiPlaySoundPlaylist**
- **RayWenderlich**
- **RayWenderlichCourse**
- **RBMARadio**
@@ -931,30 +992,37 @@
- **Roxwel**
- **Rozhlas**
- **RTBF**
- **RTDocumentry**
- **RTDocumentryPlaylist**
- **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio
- **rtl.nl**: rtl.nl and rtlxl.nl
- **rtl2**
- **rtl2:you**
- **rtl2:you:series**
- **RTNews**
- **RTP**
- **RTRFM**
- **RTS**: RTS.ch
- **rtve.es:alacarta**: RTVE a la carta
- **rtve.es:audio**: RTVE audio
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH**
- **RTVS**
- **RUHD**
- **Rule34Video**
- **RumbleChannel**
- **RumbleEmbed**
- **Ruptly**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
- **rutube:channel**: Rutube channel
- **rutube:embed**: Rutube embedded videos
- **rutube:movie**: Rutube movies
- **rutube:person**: Rutube person videos
- **rutube:playlist**: Rutube playlists
- **rutube:tags**: Rutube tags
- **RUTV**: RUTV.RU
- **Ruutu**
- **Ruv**
@@ -994,6 +1062,7 @@
- **simplecast:episode**
- **simplecast:podcast**
- **Sina**
- **Skeb**
- **sky.it**
- **sky:news**
- **sky:news:story**
@@ -1013,6 +1082,7 @@
- **SonyLIVSeries**
- **soundcloud**
- **soundcloud:playlist**
- **soundcloud:related**
- **soundcloud:search**: Soundcloud search; "scsearch:" prefix
- **soundcloud:set**
- **soundcloud:trackstation**
@@ -1086,7 +1156,10 @@
- **TeamTreeHouse**
- **TechTalks**
- **techtv.mit.edu**
- **ted**
- **TedEmbed**
- **TedPlaylist**
- **TedSeries**
- **TedTalk**
- **Tele13**
- **Tele5**
- **TeleBruxelles**
@@ -1120,12 +1193,17 @@
- **ThreeSpeak**
- **ThreeSpeakUser**
- **TikTok**
- **tiktok:effect**
- **tiktok:sound**
- **tiktok:tag**
- **tiktok:user**
- **tinypic**: tinypic.com videos
- **TLC**
- **TMZ**
- **TNAFlix**
- **TNAFlixNetworkEmbed**
- **toggle**
- **toggo**
- **Tokentube**
- **Tokentube:channel**
- **ToonGoggles**
@@ -1133,11 +1211,13 @@
- **Toypics**: Toypics video
- **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken)
- **TravelChannel**
- **Trilulilu**
- **Trovo**
- **TrovoChannelClip**: All Clips of a trovo.live channel; "trovoclip:" prefix
- **TrovoChannelVod**: All VODs of a trovo.live channel; "trovovod:" prefix
- **TrovoVod**
- **TrueID**
- **TruNews**
- **TruTV**
- **Tube8**
@@ -1179,6 +1259,8 @@
- **TVNowNew**
- **TVNowSeason**
- **TVNowShow**
- **tvopengr:embed**: tvopen.gr embedded videos
- **tvopengr:watch**: tvopen.gr (and ethnos.gr) videos
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
@@ -1242,7 +1324,7 @@
- **Viddler**
- **Videa**
- **video.arnes.si**: Arnes Video
- **video.google:search**: Google Video search; "gvsearch:" prefix (Currently broken)
- **video.google:search**: Google Video search; "gvsearch:" prefix
- **video.sky.it**
- **video.sky.it:live**
- **VideoDetective**
@@ -1271,6 +1353,8 @@
- **vimeo:review**: Review pages on vimeo
- **vimeo:user**
- **vimeo:watchlater**: Vimeo watch later list, "vimeowatchlater" keyword (requires authentication)
- **Vimm:recording**
- **Vimm:stream**
- **Vimple**: Vimple - one-click video hosting
- **Vine**
- **vine:user**
@@ -1323,7 +1407,7 @@
- **wdr:mobile**
- **WDRElefant**
- **WDRPage**
- **web.archive:youtube**: web.archive.org saved youtube videos
- **web.archive:youtube**: web.archive.org saved youtube videos, "ytarchive:" prefix
- **Webcaster**
- **WebcasterFeed**
- **WebOfStories**
@@ -1374,6 +1458,7 @@
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo**
- **YandexVideoPreview**
- **YapFiles**
- **YesJapan**
- **yinyuetai:video**: 音悦Tai
@@ -1390,6 +1475,7 @@
- **youtube**: YouTube
- **youtube:favorites**: YouTube liked videos; ":ytfav" keyword (requires cookies)
- **youtube:history**: Youtube watch history; ":ythis" keyword (requires cookies)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections (Eg: #songs)
- **youtube:playlist**: YouTube playlists
- **youtube:recommended**: YouTube recommended videos; ":ytrec" keyword
- **youtube:search**: YouTube search; "ytsearch:" prefix
@@ -1397,9 +1483,10 @@
- **youtube:search_url**: YouTube search URLs with sorting and filter support
- **youtube:subscriptions**: YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)
- **youtube:tab**: YouTube Tabs
- **youtube:user**: YouTube user videos; "ytuser:" prefix
- **youtube:watchlater**: Youtube watch later list; ":ytwatchlater" keyword (requires cookies)
- **YoutubeLivestreamEmbed**: YouTube livestream embeds
- **YoutubeYtBe**: youtu.be
- **YoutubeYtUser**: YouTube user videos; "ytuser:" prefix
- **Zapiks**
- **Zattoo**
- **ZattooLive**
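
For reference, this list is generated from the registered extractor classes; a minimal sketch of reproducing it, assuming only the public gen_extractors() helper and the IE_NAME/working() attributes (descriptions after the colon come from IE_DESC and are omitted here):

import yt_dlp.extractor

# One line per extractor, in the same "- **Name**" style as the list above
for ie in yt_dlp.extractor.gen_extractors():
    suffix = '' if ie.working() else ' (Currently broken)'
    print(f'- **{ie.IE_NAME}**{suffix}')

The same information is available from the command line via yt-dlp --list-extractors.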

View File

@@ -194,6 +194,53 @@ def expect_dict(self, got_dict, expected_dict):
expect_value(self, got, expected, info_field)
def sanitize_got_info_dict(got_dict):
IGNORED_FIELDS = (
# Format keys
'url', 'manifest_url', 'format', 'format_id', 'format_note', 'width', 'height', 'resolution',
'dynamic_range', 'tbr', 'abr', 'acodec', 'asr', 'vbr', 'fps', 'vcodec', 'container', 'filesize',
'filesize_approx', 'player_url', 'protocol', 'fragment_base_url', 'fragments', 'preference',
'language', 'language_preference', 'quality', 'source_preference', 'http_headers',
'stretched_ratio', 'no_resume', 'has_drm', 'downloader_options',
# RTMP formats
'page_url', 'app', 'play_path', 'tc_url', 'flash_version', 'rtmp_live', 'rtmp_conn', 'rtmp_protocol', 'rtmp_real_time',
# Lists
'formats', 'thumbnails', 'subtitles', 'automatic_captions', 'comments', 'entries',
# Auto-generated
'autonumber', 'playlist', 'format_index', 'video_ext', 'audio_ext', 'duration_string', 'epoch',
'fulltitle', 'extractor', 'extractor_key', 'filepath', 'infojson_filename', 'original_url', 'n_entries',
# Only live_status needs to be checked
'is_live', 'was_live',
)
IGNORED_PREFIXES = ('', 'playlist', 'requested', 'webpage')
def sanitize(key, value):
if isinstance(value, str) and len(value) > 100 and key != 'thumbnail':
return f'md5:{md5(value)}'
elif isinstance(value, list) and len(value) > 10:
return f'count:{len(value)}'
elif key.endswith('_count') and isinstance(value, int):
return int
return value
test_info_dict = {
key: sanitize(key, value) for key, value in got_dict.items()
if value is not None and key not in IGNORED_FIELDS and not any(
key.startswith(f'{prefix}_') for prefix in IGNORED_PREFIXES)
}
# display_id may be generated from id
if test_info_dict.get('display_id') == test_info_dict.get('id'):
test_info_dict.pop('display_id')
return test_info_dict
def expect_info_dict(self, got_dict, expected_dict):
expect_dict(self, got_dict, expected_dict)
# Check for the presence of mandatory fields
@@ -207,15 +254,15 @@ def expect_info_dict(self, got_dict, expected_dict):
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(got_dict.get(key), 'Missing field: %s' % key)
# Are checkable fields missing from the test case definition?
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in got_dict.items()
if value and key in ('id', 'title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location', 'age_limit'))
test_info_dict = sanitize_got_info_dict(got_dict)
missing_keys = set(test_info_dict.keys()) - set(expected_dict.keys())
if missing_keys:
def _repr(v):
if isinstance(v, compat_str):
return "'%s'" % v.replace('\\', '\\\\').replace("'", "\\'").replace('\n', '\\n')
elif isinstance(v, type):
return v.__name__
else:
return repr(v)
info_dict_str = ''
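
The sanitize rules introduced above can be exercised standalone; a minimal sketch mirroring the hunk (this md5 is a stand-in for the test suite's helper, and the field names are illustrative):

import hashlib

def md5(s):
    # stand-in for test.helper.md5
    return hashlib.md5(s.encode('utf-8')).hexdigest()

def sanitize(key, value):
    # long strings collapse to a hash, long lists to a count,
    # and *_count fields only assert their type
    if isinstance(value, str) and len(value) > 100 and key != 'thumbnail':
        return f'md5:{md5(value)}'
    elif isinstance(value, list) and len(value) > 10:
        return f'count:{len(value)}'
    elif key.endswith('_count') and isinstance(value, int):
        return int
    return value

print(sanitize('description', 'x' * 200))    # md5:...
print(sanitize('view_count', 1120958))       # <class 'int'>
print(sanitize('tags', list(range(20))))     # count:20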

View File

@@ -99,10 +99,10 @@ class TestInfoExtractor(unittest.TestCase):
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_search_json_ld_realworld(self):
# https://github.com/ytdl-org/youtube-dl/issues/23306
expect_dict(
self,
self.ie._search_json_ld(r'''<script type="application/ld+json">
_TESTS = [
# https://github.com/ytdl-org/youtube-dl/issues/23306
(
r'''<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "VideoObject",
@@ -135,17 +135,171 @@ class TestInfoExtractor(unittest.TestCase):
"name": "Kleio Valentien",
"url": "https://www.eporner.com/pornstar/kleio-valentien/"
}]}
</script>''', None),
{
'title': '1 On 1 With Kleio',
'description': 'Kleio Valentien',
'url': 'https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4',
'timestamp': 1449347075,
'duration': 743.0,
'view_count': 1120958,
'width': 1920,
'height': 1080,
})
</script>''',
{
'title': '1 On 1 With Kleio',
'description': 'Kleio Valentien',
'url': 'https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4',
'timestamp': 1449347075,
'duration': 743.0,
'view_count': 1120958,
'width': 1920,
'height': 1080,
},
{},
),
(
r'''<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "NewsArticle",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://www.ant1news.gr/Society/article/620286/symmoria-anilikon-dikigoros-thymaton-ithelan-na-toys-apoteleiosoyn"
},
"headline": "Συμμορία ανηλίκων δικηγόρος θυμάτων: ήθελαν να τους αποτελειώσουν",
"name": "Συμμορία ανηλίκων δικηγόρος θυμάτων: ήθελαν να τους αποτελειώσουν",
"description": "Τα παιδιά δέχθηκαν την επίθεση επειδή αρνήθηκαν να γίνουν μέλη της συμμορίας, ανέφερε ο Γ. Ζαχαρόπουλος.",
"image": {
"@type": "ImageObject",
"url": "https://ant1media.azureedge.net/imgHandler/1100/a635c968-be71-447c-bf9c-80d843ece21e.jpg",
"width": 1100,
"height": 756 },
"datePublished": "2021-11-10T08:50:00+03:00",
"dateModified": "2021-11-10T08:52:53+03:00",
"author": {
"@type": "Person",
"@id": "https://www.ant1news.gr/",
"name": "Ant1news",
"image": "https://www.ant1news.gr/images/logo-e5d7e4b3e714c88e8d2eca96130142f6.png",
"url": "https://www.ant1news.gr/"
},
"publisher": {
"@type": "Organization",
"@id": "https://www.ant1news.gr#publisher",
"name": "Ant1news",
"url": "https://www.ant1news.gr",
"logo": {
"@type": "ImageObject",
"url": "https://www.ant1news.gr/images/logo-e5d7e4b3e714c88e8d2eca96130142f6.png",
"width": 400,
"height": 400 },
"sameAs": [
"https://www.facebook.com/Ant1news.gr",
"https://twitter.com/antennanews",
"https://www.youtube.com/channel/UC0smvAbfczoN75dP0Hw4Pzw",
"https://www.instagram.com/ant1news/"
]
},
"keywords": "μαχαίρωμα,συμμορία ανηλίκων,ΕΙΔΗΣΕΙΣ,ΕΙΔΗΣΕΙΣ ΣΗΜΕΡΑ,ΝΕΑ,Κοινωνία - Ant1news",
"articleSection": "Κοινωνία"
}
]
}
</script>''',
{
'timestamp': 1636523400,
'title': 'md5:91fe569e952e4d146485740ae927662b',
},
{'expected_type': 'NewsArticle'},
),
(
r'''<script type="application/ld+json">
{"url":"/vrtnu/a-z/het-journaal/2021/het-journaal-het-journaal-19u-20211231/",
"name":"Het journaal 19u",
"description":"Het journaal 19u van vrijdag 31 december 2021.",
"potentialAction":{"url":"https://vrtnu.page.link/pfVy6ihgCAJKgHqe8","@type":"ShareAction"},
"mainEntityOfPage":{"@id":"1640092242445","@type":"WebPage"},
"publication":[{
"startDate":"2021-12-31T19:00:00.000+01:00",
"endDate":"2022-01-30T23:55:00.000+01:00",
"publishedBy":{"name":"een","@type":"Organization"},
"publishedOn":{"url":"https://www.vrt.be/vrtnu/","name":"VRT NU","@type":"BroadcastService"},
"@id":"pbs-pub-3a7ec233-da95-4c1e-9b2b-cf5fdfebcbe8",
"@type":"BroadcastEvent"
}],
"video":{
"name":"Het journaal - Aflevering 365 (Seizoen 2021)",
"description":"Het journaal 19u van vrijdag 31 december 2021. Bekijk aflevering 365 van seizoen 2021 met VRT NU via de site of app.",
"thumbnailUrl":"//images.vrt.be/width1280/2021/12/31/80d5ed00-6a64-11ec-b07d-02b7b76bf47f.jpg",
"expires":"2022-01-30T23:55:00.000+01:00",
"hasPart":[
{"name":"Explosie Turnhout","startOffset":70,"@type":"Clip"},
{"name":"Jaarwisseling","startOffset":440,"@type":"Clip"},
{"name":"Natuurbranden Colorado","startOffset":1179,"@type":"Clip"},
{"name":"Klimaatverandering","startOffset":1263,"@type":"Clip"},
{"name":"Zacht weer","startOffset":1367,"@type":"Clip"},
{"name":"Financiële balans","startOffset":1383,"@type":"Clip"},
{"name":"Club Brugge","startOffset":1484,"@type":"Clip"},
{"name":"Mentale gezondheid bij topsporters","startOffset":1575,"@type":"Clip"},
{"name":"Olympische Winterspelen","startOffset":1728,"@type":"Clip"},
{"name":"Sober oudjaar in Nederland","startOffset":1873,"@type":"Clip"}
],
"duration":"PT34M39.23S",
"uploadDate":"2021-12-31T19:00:00.000+01:00",
"@id":"vid-9457d0c6-b8ac-4aba-b5e1-15aa3a3295b5",
"@type":"VideoObject"
},
"genre":["Nieuws en actua"],
"episodeNumber":365,
"partOfSeries":{"name":"Het journaal","@id":"222831405527","@type":"TVSeries"},
"partOfSeason":{"name":"Seizoen 2021","@id":"961809365527","@type":"TVSeason"},
"@context":"https://schema.org","@id":"961685295527","@type":"TVEpisode"}</script>
''',
{
'chapters': [
{"title": "Explosie Turnhout", "start_time": 70, "end_time": 440},
{"title": "Jaarwisseling", "start_time": 440, "end_time": 1179},
{"title": "Natuurbranden Colorado", "start_time": 1179, "end_time": 1263},
{"title": "Klimaatverandering", "start_time": 1263, "end_time": 1367},
{"title": "Zacht weer", "start_time": 1367, "end_time": 1383},
{"title": "Financiële balans", "start_time": 1383, "end_time": 1484},
{"title": "Club Brugge", "start_time": 1484, "end_time": 1575},
{"title": "Mentale gezondheid bij topsporters", "start_time": 1575, "end_time": 1728},
{"title": "Olympische Winterspelen", "start_time": 1728, "end_time": 1873},
{"title": "Sober oudjaar in Nederland", "start_time": 1873, "end_time": 2079.23}
],
'title': 'Het journaal - Aflevering 365 (Seizoen 2021)'
}, {}
),
(
# test multiple thumbnails in a list
r'''
<script type="application/ld+json">
{"@context":"https://schema.org",
"@type":"VideoObject",
"thumbnailUrl":["https://www.rainews.it/cropgd/640x360/dl/img/2021/12/30/1640886376927_GettyImages.jpg"]}
</script>''',
{
'thumbnails': [{'url': 'https://www.rainews.it/cropgd/640x360/dl/img/2021/12/30/1640886376927_GettyImages.jpg'}],
},
{},
),
(
# test single thumbnail
r'''
<script type="application/ld+json">
{"@context":"https://schema.org",
"@type":"VideoObject",
"thumbnailUrl":"https://www.rainews.it/cropgd/640x360/dl/img/2021/12/30/1640886376927_GettyImages.jpg"}
</script>''',
{
'thumbnails': [{'url': 'https://www.rainews.it/cropgd/640x360/dl/img/2021/12/30/1640886376927_GettyImages.jpg'}],
},
{},
)
]
for html, expected_dict, search_json_ld_kwargs in _TESTS:
expect_dict(
self,
self.ie._search_json_ld(html, None, **search_json_ld_kwargs),
expected_dict
)
def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
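
The VRT expectation above encodes how chapters are derived from JSON-LD hasPart clips: each startOffset opens a chapter, the next clip's startOffset closes it, and the last chapter ends at the video duration (PT34M39.23S, i.e. 2079.23 s). A minimal sketch of that logic, with illustrative names rather than yt-dlp internals:

def chapters_from_clips(clips, duration):
    # clips: [{'name': ..., 'startOffset': ...}, ...] sorted by startOffset
    return [{
        'title': clip['name'],
        'start_time': clip['startOffset'],
        'end_time': clips[i + 1]['startOffset'] if i + 1 < len(clips) else duration,
    } for i, clip in enumerate(clips)]

print(chapters_from_clips(
    [{'name': 'Explosie Turnhout', 'startOffset': 70},
     {'name': 'Jaarwisseling', 'startOffset': 440}],
    2079.23))
# [{'title': 'Explosie Turnhout', 'start_time': 70, 'end_time': 440},
#  {'title': 'Jaarwisseling', 'start_time': 440, 'end_time': 2079.23}]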

View File

@@ -30,6 +30,7 @@ class YDL(FakeYDL):
self.msgs = []
def process_info(self, info_dict):
info_dict = info_dict.copy()
info_dict.pop('__original_infodict', None)
self.downloaded_info_dicts.append(info_dict)
@@ -645,6 +646,7 @@ class TestYoutubeDL(unittest.TestCase):
'ext': 'mp4',
'width': None,
'height': 1080,
'filesize': 1024,
'title1': '$PATH',
'title2': '%PATH%',
'title3': 'foo/bar\\test',
@@ -717,6 +719,7 @@ class TestYoutubeDL(unittest.TestCase):
test('%(id)s', '.abcd', info={'id': '.abcd'})
test('%(id)s', 'ab__cd', info={'id': 'ab__cd'})
test('%(id)s', ('ab:cd', 'ab -cd'), info={'id': 'ab:cd'})
test('%(id.0)s', '-', info={'id': '--'})
# Invalid templates
self.assertTrue(isinstance(YoutubeDL.validate_outtmpl('%(title)'), ValueError))
@@ -777,6 +780,11 @@ class TestYoutubeDL(unittest.TestCase):
test('%(title5)#U', 'a\u0301e\u0301i\u0301 𝐀')
test('%(title5)+U', 'áéí A')
test('%(title5)+#U', 'a\u0301e\u0301i\u0301 A')
test('%(height)D', '1k')
test('%(filesize)#D', '1Ki')
test('%(height)5.2D', ' 1.08k')
test('%(title4)#S', 'foo_bar_test')
test('%(title4).10S', ('foo \'bar\' ', 'foo \'bar\'' + ('#' if compat_os_name == 'nt' else ' ')))
if compat_os_name == 'nt':
test('%(title4)q', ('"foo \\"bar\\" test"', "'foo _'bar_' test'"))
test('%(formats.:.id)#q', ('"id 1" "id 2" "id 3"', "'id 1' 'id 2' 'id 3'"))
@@ -808,6 +816,11 @@ class TestYoutubeDL(unittest.TestCase):
test('%(width-100,height+width|def)s', 'def')
test('%(timestamp-x>%H\\,%M\\,%S,timestamp>%H\\,%M\\,%S)s', '12,00,00')
# Replacement
test('%(id&foo)s.bar', 'foo.bar')
test('%(title&foo)s.bar', 'NA.bar')
test('%(title&foo|baz)s.bar', 'baz.bar')
# Laziness
def gen():
yield from range(5)
@@ -896,7 +909,7 @@ class TestYoutubeDL(unittest.TestCase):
def _match_entry(self, info_dict, incomplete=False):
res = super(FilterYDL, self)._match_entry(info_dict, incomplete)
if res is None:
self.downloaded_info_dicts.append(info_dict)
self.downloaded_info_dicts.append(info_dict.copy())
return res
first = {
@@ -1141,6 +1154,7 @@ class TestYoutubeDL(unittest.TestCase):
self.assertTrue(entries[1] is None)
self.assertEqual(len(ydl.downloaded_info_dicts), 1)
downloaded = ydl.downloaded_info_dicts[0]
entries[2].pop('requested_downloads', None)
self.assertEqual(entries[2], downloaded)
self.assertEqual(downloaded['url'], TEST_URL)
self.assertEqual(downloaded['title'], 'Video Transparent 2')
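
The new assertions above cover the D (decimal suffix), #D (binary suffix) and & (replacement) output-template conversions; a minimal usage sketch, assuming YoutubeDL.evaluate_outtmpl is the helper the test() wrapper calls, with an illustrative info dict:

from yt_dlp import YoutubeDL

ydl = YoutubeDL()
info = {'id': 'abcd', 'height': 1080, 'filesize': 1024}
print(ydl.evaluate_outtmpl('%(height)D', info))             # '1k'  (decimal suffix)
print(ydl.evaluate_outtmpl('%(filesize)#D', info))          # '1Ki' (binary suffix)
print(ydl.evaluate_outtmpl('%(id&foo)s.bar', info))         # 'foo.bar' (id is set)
print(ydl.evaluate_outtmpl('%(title&foo|baz)s.bar', info))  # 'baz.bar' (title missing)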

View File

@@ -8,6 +8,8 @@ from yt_dlp.cookies import (
WindowsChromeCookieDecryptor,
parse_safari_cookies,
pbkdf2_sha1,
_get_linux_desktop_environment,
_LinuxDesktopEnvironment,
)
@@ -42,6 +44,37 @@ class MonkeyPatch:
class TestCookies(unittest.TestCase):
def test_get_desktop_environment(self):
""" based on https://chromium.googlesource.com/chromium/src/+/refs/heads/main/base/nix/xdg_util_unittest.cc """
test_cases = [
({}, _LinuxDesktopEnvironment.OTHER),
({'DESKTOP_SESSION': 'gnome'}, _LinuxDesktopEnvironment.GNOME),
({'DESKTOP_SESSION': 'mate'}, _LinuxDesktopEnvironment.GNOME),
({'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE),
({'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE),
({'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),
({'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
({'KDE_FULL_SESSION': 1}, _LinuxDesktopEnvironment.KDE),
({'XDG_CURRENT_DESKTOP': 'X-Cinnamon'}, _LinuxDesktopEnvironment.CINNAMON),
({'XDG_CURRENT_DESKTOP': 'GNOME'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'GNOME:GNOME-Classic'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'GNOME : GNOME-Classic'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'Unity', 'DESKTOP_SESSION': 'gnome-fallback'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'KDE', 'KDE_SESSION_VERSION': '5'}, _LinuxDesktopEnvironment.KDE),
({'XDG_CURRENT_DESKTOP': 'KDE'}, _LinuxDesktopEnvironment.KDE),
({'XDG_CURRENT_DESKTOP': 'Pantheon'}, _LinuxDesktopEnvironment.PANTHEON),
({'XDG_CURRENT_DESKTOP': 'Unity'}, _LinuxDesktopEnvironment.UNITY),
({'XDG_CURRENT_DESKTOP': 'Unity:Unity7'}, _LinuxDesktopEnvironment.UNITY),
({'XDG_CURRENT_DESKTOP': 'Unity:Unity8'}, _LinuxDesktopEnvironment.UNITY),
]
for env, expected_desktop_environment in test_cases:
self.assertEqual(_get_linux_desktop_environment(env), expected_desktop_environment)
def test_chrome_cookie_decryptor_linux_derive_key(self):
key = LinuxChromeCookieDecryptor.derive_key(b'abc')
self.assertEqual(key, b'7\xa1\xec\xd4m\xfcA\xc7\xb19Z\xd0\x19\xdcM\x17')
@@ -58,8 +91,7 @@ class TestCookies(unittest.TestCase):
self.assertEqual(decryptor.decrypt(encrypted_value), value)
def test_chrome_cookie_decryptor_linux_v11(self):
with MonkeyPatch(cookies, {'_get_linux_keyring_password': lambda *args, **kwargs: b'',
'KEYRING_AVAILABLE': True}):
with MonkeyPatch(cookies, {'_get_linux_keyring_password': lambda *args, **kwargs: b''}):
encrypted_value = b'v11#\x81\x10>`w\x8f)\xc0\xb2\xc1\r\xf4\x1al\xdd\x93\xfd\xf8\xf8N\xf2\xa9\x83\xf1\xe9o\x0elVQd'
value = 'tz=Europe.London'
decryptor = LinuxChromeCookieDecryptor('Chrome', Logger())

View File

@@ -53,7 +53,7 @@ class YoutubeDL(yt_dlp.YoutubeDL):
raise ExtractorError(message)
def process_info(self, info_dict):
self.processed_info_dicts.append(info_dict)
self.processed_info_dicts.append(info_dict.copy())
return super(YoutubeDL, self).process_info(info_dict)

View File

@@ -1,26 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from yt_dlp.options import _hide_login_info
class TestOptions(unittest.TestCase):
def test_hide_login_info(self):
self.assertEqual(_hide_login_info(['-u', 'foo', '-p', 'bar']),
['-u', 'PRIVATE', '-p', 'PRIVATE'])
self.assertEqual(_hide_login_info(['-u']), ['-u'])
self.assertEqual(_hide_login_info(['-u', 'foo', '-u', 'bar']),
['-u', 'PRIVATE', '-u', 'PRIVATE'])
self.assertEqual(_hide_login_info(['--username=foo']),
['--username=PRIVATE'])
if __name__ == '__main__':
unittest.main()
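
This deleted test is superseded by the Config.hide_login_info tests added to test_utils.py further down; a minimal sketch of the relocated helper:

from yt_dlp.utils import Config

print(Config.hide_login_info(['-u', 'foo', '-p', 'bar']))
# ['-u', 'PRIVATE', '-p', 'PRIVATE']
print(Config.hide_login_info(['--username=foo']))
# ['--username=PRIVATE']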

View File

@@ -124,11 +124,11 @@ class TestModifyChaptersPP(unittest.TestCase):
chapters = self._chapters([70], ['c']) + [
self._sponsor_chapter(10, 20, 'sponsor'),
self._sponsor_chapter(30, 40, 'preview'),
self._sponsor_chapter(50, 60, 'sponsor')]
self._sponsor_chapter(50, 60, 'filler')]
expected = self._chapters(
[10, 20, 30, 40, 50, 60, 70],
['c', '[SponsorBlock]: Sponsor', 'c', '[SponsorBlock]: Preview/Recap',
'c', '[SponsorBlock]: Sponsor', 'c'])
'c', '[SponsorBlock]: Filler Tangent', 'c'])
self._remove_marked_arrange_sponsors_test_impl(chapters, expected, [])
def test_remove_marked_arrange_sponsors_UniqueNamesForOverlappingSponsors(self):

View File

@@ -13,7 +13,7 @@ from test.helper import FakeYDL, md5, is_download_test
from yt_dlp.extractor import (
YoutubeIE,
DailymotionIE,
TEDIE,
TedTalkIE,
VimeoIE,
WallaIE,
CeskaTelevizeIE,
@@ -141,7 +141,7 @@ class TestDailymotionSubtitles(BaseTestSubtitles):
@is_download_test
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
IE = TedTalkIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True

View File

@@ -23,6 +23,7 @@ from yt_dlp.utils import (
caesar,
clean_html,
clean_podcast_url,
Config,
date_from_str,
datetime_from_str,
DateRange,
@@ -37,11 +38,18 @@ from yt_dlp.utils import (
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
format_bytes,
float_or_none,
get_element_by_class,
get_element_by_attribute,
get_elements_by_class,
get_elements_by_attribute,
get_element_html_by_class,
get_element_html_by_attribute,
get_elements_html_by_class,
get_elements_html_by_attribute,
get_elements_text_and_html_by_attribute,
get_element_text_and_html_by_tag,
InAdvancePagedList,
int_or_none,
intlist_to_bytes,
@@ -116,6 +124,7 @@ from yt_dlp.compat import (
compat_chr,
compat_etree_fromstring,
compat_getenv,
compat_HTMLParseError,
compat_os_name,
compat_setenv,
)
@@ -634,6 +643,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
self.assertEqual(parse_duration('PT00H03M30SZ'), 210)
self.assertEqual(parse_duration('P0Y0M0DT0H4M20.880S'), 260.88)
self.assertEqual(parse_duration('01:02:03:050'), 3723.05)
self.assertEqual(parse_duration('103:050'), 103.05)
def test_fix_xml_ampersands(self):
self.assertEqual(
@@ -1122,7 +1133,7 @@ class TestUtil(unittest.TestCase):
def test_clean_html(self):
self.assertEqual(clean_html('a:\nb'), 'a: b')
self.assertEqual(clean_html('a:\n "b"'), 'a:  "b"')
self.assertEqual(clean_html('a:\n "b"'), 'a: "b"')
self.assertEqual(clean_html('a<br>\xa0b'), 'a\nb')
def test_intlist_to_bytes(self):
@@ -1156,9 +1167,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_count('1000'), 1000)
self.assertEqual(parse_count('1.000'), 1000)
self.assertEqual(parse_count('1.1k'), 1100)
self.assertEqual(parse_count('1.1 k'), 1100)
self.assertEqual(parse_count('1,1 k'), 1100)
self.assertEqual(parse_count('1.1kk'), 1100000)
self.assertEqual(parse_count('1.1kk '), 1100000)
self.assertEqual(parse_count('1,1kk'), 1100000)
self.assertEqual(parse_count('100 views'), 100)
self.assertEqual(parse_count('1,100 views'), 1100)
self.assertEqual(parse_count('1.1kk views'), 1100000)
self.assertEqual(parse_count('10M views'), 10000000)
self.assertEqual(parse_count('has 10M views'), 10000000)
def test_parse_resolution(self):
self.assertEqual(parse_resolution(None), {})
@@ -1566,46 +1584,116 @@ Line 1
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)
GET_ELEMENT_BY_CLASS_TEST_STRING = '''
<span class="foo bar">nice</span>
'''
def test_get_element_by_class(self):
html = '''
<span class="foo bar">nice</span>
'''
html = self.GET_ELEMENT_BY_CLASS_TEST_STRING
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
def test_get_element_html_by_class(self):
html = self.GET_ELEMENT_BY_CLASS_TEST_STRING
self.assertEqual(get_element_html_by_class('foo', html), html.strip())
self.assertEqual(get_element_by_class('no-such-class', html), None)
GET_ELEMENT_BY_ATTRIBUTE_TEST_STRING = '''
<div itemprop="author" itemscope>foo</div>
'''
def test_get_element_by_attribute(self):
html = '''
<span class="foo bar">nice</span>
'''
html = self.GET_ELEMENT_BY_CLASS_TEST_STRING
self.assertEqual(get_element_by_attribute('class', 'foo bar', html), 'nice')
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
html = '''
<div itemprop="author" itemscope>foo</div>
'''
html = self.GET_ELEMENT_BY_ATTRIBUTE_TEST_STRING
self.assertEqual(get_element_by_attribute('itemprop', 'author', html), 'foo')
def test_get_element_html_by_attribute(self):
html = self.GET_ELEMENT_BY_CLASS_TEST_STRING
self.assertEqual(get_element_html_by_attribute('class', 'foo bar', html), html.strip())
self.assertEqual(get_element_html_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_html_by_attribute('class', 'no-such-foo', html), None)
html = self.GET_ELEMENT_BY_ATTRIBUTE_TEST_STRING
self.assertEqual(get_element_html_by_attribute('itemprop', 'author', html), html.strip())
GET_ELEMENTS_BY_CLASS_TEST_STRING = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
GET_ELEMENTS_BY_CLASS_RES = ['<span class="foo bar">nice</span>', '<span class="foo bar">also nice</span>']
def test_get_elements_by_class(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
html = self.GET_ELEMENTS_BY_CLASS_TEST_STRING
self.assertEqual(get_elements_by_class('foo', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_class('no-such-class', html), [])
def test_get_elements_html_by_class(self):
html = self.GET_ELEMENTS_BY_CLASS_TEST_STRING
self.assertEqual(get_elements_html_by_class('foo', html), self.GET_ELEMENTS_BY_CLASS_RES)
self.assertEqual(get_elements_html_by_class('no-such-class', html), [])
def test_get_elements_by_attribute(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
html = self.GET_ELEMENTS_BY_CLASS_TEST_STRING
self.assertEqual(get_elements_by_attribute('class', 'foo bar', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])
def test_get_elements_html_by_attribute(self):
html = self.GET_ELEMENTS_BY_CLASS_TEST_STRING
self.assertEqual(get_elements_html_by_attribute('class', 'foo bar', html), self.GET_ELEMENTS_BY_CLASS_RES)
self.assertEqual(get_elements_html_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_html_by_attribute('class', 'no-such-foo', html), [])
def test_get_elements_text_and_html_by_attribute(self):
html = self.GET_ELEMENTS_BY_CLASS_TEST_STRING
self.assertEqual(
list(get_elements_text_and_html_by_attribute('class', 'foo bar', html)),
list(zip(['nice', 'also nice'], self.GET_ELEMENTS_BY_CLASS_RES)))
self.assertEqual(list(get_elements_text_and_html_by_attribute('class', 'foo', html)), [])
self.assertEqual(list(get_elements_text_and_html_by_attribute('class', 'no-such-foo', html)), [])
GET_ELEMENT_BY_TAG_TEST_STRING = '''
random text lorem ipsum</p>
<div>
this should be returned
<span>this should also be returned</span>
<div>
this should also be returned
</div>
closing tag above should not trick, so this should also be returned
</div>
but this text should not be returned
'''
GET_ELEMENT_BY_TAG_RES_OUTERDIV_HTML = GET_ELEMENT_BY_TAG_TEST_STRING.strip()[32:276]
GET_ELEMENT_BY_TAG_RES_OUTERDIV_TEXT = GET_ELEMENT_BY_TAG_RES_OUTERDIV_HTML[5:-6]
GET_ELEMENT_BY_TAG_RES_INNERSPAN_HTML = GET_ELEMENT_BY_TAG_TEST_STRING.strip()[78:119]
GET_ELEMENT_BY_TAG_RES_INNERSPAN_TEXT = GET_ELEMENT_BY_TAG_RES_INNERSPAN_HTML[6:-7]
def test_get_element_text_and_html_by_tag(self):
html = self.GET_ELEMENT_BY_TAG_TEST_STRING
self.assertEqual(
get_element_text_and_html_by_tag('div', html),
(self.GET_ELEMENT_BY_TAG_RES_OUTERDIV_TEXT, self.GET_ELEMENT_BY_TAG_RES_OUTERDIV_HTML))
self.assertEqual(
get_element_text_and_html_by_tag('span', html),
(self.GET_ELEMENT_BY_TAG_RES_INNERSPAN_TEXT, self.GET_ELEMENT_BY_TAG_RES_INNERSPAN_HTML))
self.assertRaises(compat_HTMLParseError, get_element_text_and_html_by_tag, 'article', html)
def test_iri_to_uri(self):
self.assertEqual(
iri_to_uri('https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&client=firefox-b'),
@@ -1681,6 +1769,27 @@ Line 1
ll = reversed(ll)
test(ll, -15, 14, range(15))
def test_format_bytes(self):
self.assertEqual(format_bytes(0), '0.00B')
self.assertEqual(format_bytes(1000), '1000.00B')
self.assertEqual(format_bytes(1024), '1.00KiB')
self.assertEqual(format_bytes(1024**2), '1.00MiB')
self.assertEqual(format_bytes(1024**3), '1.00GiB')
self.assertEqual(format_bytes(1024**4), '1.00TiB')
self.assertEqual(format_bytes(1024**5), '1.00PiB')
self.assertEqual(format_bytes(1024**6), '1.00EiB')
self.assertEqual(format_bytes(1024**7), '1.00ZiB')
self.assertEqual(format_bytes(1024**8), '1.00YiB')
def test_hide_login_info(self):
self.assertEqual(Config.hide_login_info(['-u', 'foo', '-p', 'bar']),
['-u', 'PRIVATE', '-p', 'PRIVATE'])
self.assertEqual(Config.hide_login_info(['-u']), ['-u'])
self.assertEqual(Config.hide_login_info(['-u', 'foo', '-u', 'bar']),
['-u', 'PRIVATE', '-u', 'PRIVATE'])
self.assertEqual(Config.hide_login_info(['--username=foo']),
['--username=PRIVATE'])
if __name__ == '__main__':
unittest.main()
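
The format_bytes and parse_count behaviour asserted above can be checked directly; a short sketch using the public utils:

from yt_dlp.utils import format_bytes, parse_count

print(format_bytes(1024 ** 2))        # '1.00MiB'
print(parse_count('1,1 k'))           # 1100
print(parse_count('1.1kk views'))     # 1100000
print(parse_count('has 10M views'))   # 10000000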

View File

@@ -19,52 +19,52 @@ class TestVerboseOutput(unittest.TestCase):
[
sys.executable, 'yt_dlp/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
'--password', 'my_secret_password',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
self.assertTrue(b'my_secret_password' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'yt_dlp/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
'-p', 'my_secret_password',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
self.assertTrue(b'my_secret_password' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'yt_dlp/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
'--password=my_secret_password',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
self.assertTrue(b'my_secret_password' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'yt_dlp/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
'-p=my_secret_password',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
self.assertTrue(b'my_secret_password' not in serr)
if __name__ == '__main__':

View File

@@ -9,11 +9,9 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, is_download_test
from yt_dlp.extractor import (
YoutubePlaylistIE,
YoutubeTabIE,
YoutubeIE,
YoutubeTabIE,
)
@@ -27,21 +25,10 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubeTabIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
result = ie.extract('https://www.youtube.com/watch?v=OmJ-4B-mS-Y&list=PLydZ2Hrp_gPRJViZjLFKaBMgCQOYEEkyp&index=2')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE.extract_id(result['url']), 'FXxLjLQi3Fg')
def test_youtube_course(self):
print('Skipping: Course URLs no longer exist')
return
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = list(result['entries'])
self.assertEqual(YoutubeIE.extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE.extract_id(entries[-1]['url']), 'rYefUsYuEp0')
self.assertEqual(result['ie_key'], YoutubeIE.ie_key())
self.assertEqual(YoutubeIE.extract_id(result['url']), 'OmJ-4B-mS-Y')
def test_youtube_mix(self):
dl = FakeYDL()
@@ -52,15 +39,6 @@ class TestYoutubeLists(unittest.TestCase):
original_video = entries[0]
self.assertEqual(original_video['id'], 'tyITL_exICo')
def test_youtube_toptracks(self):
print('Skipping: The playlist page gives error 500')
return
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=MCUS')
entries = result['entries']
self.assertEqual(len(entries), 100)
def test_youtube_flat_playlist_extraction(self):
dl = FakeYDL()
dl.params['extract_flat'] = True

View File

@@ -82,6 +82,14 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/f1ca6900/player_ias.vflset/en_US/base.js',
'cu3wyu6LQn2hse', 'jvxetvmlI9AN9Q',
),
(
'https://www.youtube.com/s/player/8040e515/player_ias.vflset/en_US/base.js',
'wvOFaY-yjgDuIEg5', 'HkfBFDHmgw4rsw',
),
(
'https://www.youtube.com/s/player/e06dea74/player_ias.vflset/en_US/base.js',
'AiuodmaDDYw8d3y4bf', 'ankd8eza2T6Qmw',
),
]
@@ -112,10 +120,17 @@ class TestPlayerInfo(unittest.TestCase):
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata/sigs')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def tearDown(self):
try:
for f in os.listdir(self.TESTDATA_DIR):
os.remove(os.path.join(self.TESTDATA_DIR, f))
except OSError:
pass
def t_factory(name, sig_func, url_pattern):
def make_tfunc(url, sig_input, expected_sig):

File diff suppressed because it is too large

View File

@@ -18,10 +18,11 @@ from .options import (
)
from .compat import (
compat_getpass,
compat_os_name,
compat_shlex_quote,
workaround_optparse_bug9161,
)
from .cookies import SUPPORTED_BROWSERS
from .cookies import SUPPORTED_BROWSERS, SUPPORTED_KEYRINGS
from .utils import (
DateRange,
decodeOption,
@@ -95,7 +96,8 @@ def _real_main(argv=None):
if opts.batchfile is not None:
try:
if opts.batchfile == '-':
write_string('Reading URLs from stdin:\n')
write_string('Reading URLs from stdin - EOF (%s) to end:\n' % (
'Ctrl+Z' if compat_os_name == 'nt' else 'Ctrl+D'))
batchfd = sys.stdin
else:
batchfd = io.open(
@@ -136,6 +138,13 @@ def _real_main(argv=None):
sys.exit(0)
# Conflicting, missing and erroneous options
if opts.format == 'best':
warnings.append('.\n '.join((
'"-f best" selects the best pre-merged format which is often not the best option',
'To let yt-dlp download and merge the best available formats, simply do not pass any format selection',
'If you know what you are doing and want only the best pre-merged format, use "-f b" instead to suppress this warning')))
if opts.exec_cmd.get('before_dl') and opts.exec_before_dl_cmd:
parser.error('using "--exec-before-download" conflicts with "--exec before_dl:"')
if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None:
@@ -197,12 +206,11 @@ def _real_main(argv=None):
if opts.concurrent_fragment_downloads <= 0:
parser.error('Concurrent fragments must be positive')
if opts.wait_for_video is not None:
mobj = re.match(r'(?P<min>\d+)(?:-(?P<max>\d+))?$', opts.wait_for_video)
if not mobj:
parser.error('Invalid time range to wait')
min_wait, max_wait = map(int_or_none, mobj.group('min', 'max'))
if max_wait is not None and max_wait < min_wait:
min_wait, max_wait, *_ = map(parse_duration, opts.wait_for_video.split('-', 1) + [None])
if min_wait is None or (max_wait is None and '-' in opts.wait_for_video):
parser.error('Invalid time range to wait')
elif max_wait is not None and max_wait < min_wait:
parser.error('Minimum time range to wait must not be longer than the maximum')
opts.wait_for_video = (min_wait, max_wait)
def parse_retries(retries, name=''):
@@ -216,6 +224,8 @@ def _real_main(argv=None):
return parsed_retries
if opts.retries is not None:
opts.retries = parse_retries(opts.retries)
if opts.file_access_retries is not None:
opts.file_access_retries = parse_retries(opts.file_access_retries, 'file access ')
if opts.fragment_retries is not None:
opts.fragment_retries = parse_retries(opts.fragment_retries, 'fragment ')
if opts.extractor_retries is not None:
@@ -258,10 +268,20 @@ def _real_main(argv=None):
if opts.convertthumbnails not in FFmpegThumbnailsConvertorPP.SUPPORTED_EXTS:
parser.error('invalid thumbnail format specified')
if opts.cookiesfrombrowser is not None:
opts.cookiesfrombrowser = [
part.strip() or None for part in opts.cookiesfrombrowser.split(':', 1)]
if opts.cookiesfrombrowser[0].lower() not in SUPPORTED_BROWSERS:
parser.error('unsupported browser specified for cookies')
mobj = re.match(r'(?P<name>[^+:]+)(\s*\+\s*(?P<keyring>[^:]+))?(\s*:(?P<profile>.+))?', opts.cookiesfrombrowser)
if mobj is None:
parser.error(f'invalid cookies from browser arguments: {opts.cookiesfrombrowser}')
browser_name, keyring, profile = mobj.group('name', 'keyring', 'profile')
browser_name = browser_name.lower()
if browser_name not in SUPPORTED_BROWSERS:
parser.error(f'unsupported browser specified for cookies: "{browser_name}". '
f'Supported browsers are: {", ".join(sorted(SUPPORTED_BROWSERS))}')
if keyring is not None:
keyring = keyring.upper()
if keyring not in SUPPORTED_KEYRINGS:
parser.error(f'unsupported keyring specified for cookies: "{keyring}". '
f'Supported keyrings are: {", ".join(sorted(SUPPORTED_KEYRINGS))}')
opts.cookiesfrombrowser = (browser_name, profile, keyring)
geo_bypass_code = opts.geo_bypass_ip_block or opts.geo_bypass_country
if geo_bypass_code is not None:
try:
@@ -315,6 +335,9 @@ def _real_main(argv=None):
if _video_multistreams_set is False and _audio_multistreams_set is False:
_unused_compat_opt('multistreams')
outtmpl_default = opts.outtmpl.get('default')
if outtmpl_default == '':
outtmpl_default, opts.skip_download = None, True
del opts.outtmpl['default']
if opts.useid:
if outtmpl_default is None:
outtmpl_default = opts.outtmpl['default'] = '%(id)s.%(ext)s'
@@ -333,9 +356,13 @@ def _real_main(argv=None):
for k, tmpl in opts.outtmpl.items():
validate_outtmpl(tmpl, f'{k} output template')
opts.forceprint = opts.forceprint or []
for tmpl in opts.forceprint or []:
validate_outtmpl(tmpl, 'print template')
for type_, tmpl_list in opts.forceprint.items():
for tmpl in tmpl_list:
validate_outtmpl(tmpl, f'{type_} print template')
for type_, tmpl_list in opts.print_to_file.items():
for tmpl, file in tmpl_list:
validate_outtmpl(tmpl, f'{type_} print-to-file template')
validate_outtmpl(file, f'{type_} print-to-file filename')
validate_outtmpl(opts.sponsorblock_chapter_title, 'SponsorBlock chapter title')
for k, tmpl in opts.progress_template.items():
k = f'{k[:-6]} console title' if '-title' in k else f'{k} progress'
@@ -377,7 +404,10 @@ def _real_main(argv=None):
opts.parse_metadata.append('title:%s' % opts.metafromtitle)
opts.parse_metadata = list(itertools.chain(*map(metadataparser_actions, opts.parse_metadata)))
any_getting = opts.forceprint or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_getting = (any(opts.forceprint.values()) or opts.dumpjson or opts.dump_single_json
or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail
or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration)
any_printing = opts.print_json
download_archive_fn = expand_path(opts.download_archive) if opts.download_archive is not None else opts.download_archive
@@ -468,13 +498,6 @@ def _real_main(argv=None):
# Run this before the actual video download
'when': 'before_dl'
})
# Must be after all other before_dl
if opts.exec_before_dl_cmd:
postprocessors.append({
'key': 'Exec',
'exec_cmd': opts.exec_before_dl_cmd,
'when': 'before_dl'
})
if opts.extractaudio:
postprocessors.append({
'key': 'FFmpegExtractAudio',
@@ -514,7 +537,7 @@ def _real_main(argv=None):
if len(dur) == 2 and all(t is not None for t in dur):
remove_ranges.append(tuple(dur))
continue
parser.error(f'invalid --remove-chapters time range {regex!r}. Must be of the form ?start-end')
parser.error(f'invalid --remove-chapters time range {regex!r}. Must be of the form *start-end')
try:
remove_chapters_patterns.append(re.compile(regex))
except re.error as err:
@@ -559,13 +582,12 @@ def _real_main(argv=None):
'_from_cli': True,
})
if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
'key': 'EmbedThumbnail',
# already_have_thumbnail = True prevents the file from being deleted after embedding
'already_have_thumbnail': already_have_thumbnail
'already_have_thumbnail': opts.writethumbnail
})
if not already_have_thumbnail:
if not opts.writethumbnail:
opts.writethumbnail = True
opts.outtmpl['pl_thumbnail'] = ''
if opts.split_chapters:
@@ -576,13 +598,21 @@ def _real_main(argv=None):
# XAttrMetadataPP should be run after post-processors that may change file contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Exec must be the last PP
if opts.exec_cmd:
if opts.concat_playlist != 'never':
postprocessors.append({
'key': 'FFmpegConcat',
'only_multi_video': opts.concat_playlist != 'always',
'when': 'playlist',
})
# Exec must be the last PP of each category
if opts.exec_before_dl_cmd:
opts.exec_cmd.setdefault('before_dl', opts.exec_before_dl_cmd)
for when, exec_cmd in opts.exec_cmd.items():
postprocessors.append({
'key': 'Exec',
'exec_cmd': opts.exec_cmd,
'exec_cmd': exec_cmd,
# Run this only after the files have been moved to their final locations
'when': 'after_move'
'when': when,
})
def report_args_compat(arg, name):
@@ -640,6 +670,7 @@ def _real_main(argv=None):
'forcefilename': opts.getfilename,
'forceformat': opts.getformat,
'forceprint': opts.forceprint,
'print_to_file': opts.print_to_file,
'forcejson': opts.dumpjson or opts.print_json,
'dump_single_json': opts.dump_single_json,
'force_write_download_archive': opts.force_write_download_archive,
@@ -668,6 +699,7 @@ def _real_main(argv=None):
'throttledratelimit': opts.throttledratelimit,
'overwrites': opts.overwrites,
'retries': opts.retries,
'file_access_retries': opts.file_access_retries,
'fragment_retries': opts.fragment_retries,
'extractor_retries': opts.extractor_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
@@ -695,8 +727,8 @@ def _real_main(argv=None):
'allow_playlist_files': opts.allow_playlist_files,
'clean_infojson': opts.clean_infojson,
'getcomments': opts.getcomments,
'writethumbnail': opts.writethumbnail,
'write_all_thumbnails': opts.write_all_thumbnails,
'writethumbnail': opts.writethumbnail is True,
'write_all_thumbnails': opts.writethumbnail == 'all',
'writelink': opts.writelink,
'writeurllink': opts.writeurllink,
'writewebloclink': opts.writewebloclink,
@@ -732,6 +764,7 @@ def _real_main(argv=None):
'skip_playlist_after_errors': opts.skip_playlist_after_errors,
'cookiefile': opts.cookiefile,
'cookiesfrombrowser': opts.cookiesfrombrowser,
'legacyserverconnect': opts.legacy_server_connect,
'nocheckcertificate': opts.no_check_certificate,
'prefer_insecure': opts.prefer_insecure,
'proxy': opts.proxy,
@@ -747,6 +780,7 @@ def _real_main(argv=None):
'youtube_include_hls_manifest': opts.youtube_include_hls_manifest,
'encoding': opts.encoding,
'extract_flat': opts.extract_flat,
'live_from_start': opts.live_from_start,
'wait_for_video': opts.wait_for_video,
'mark_watched': opts.mark_watched,
'merge_output_format': opts.merge_output_format,
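
The --cookies-from-browser syntax parsed above is BROWSER[+KEYRING][:PROFILE]; a quick illustration of the regex from this hunk (the keyring is then upper-cased and validated against SUPPORTED_KEYRINGS):

import re

pattern = r'(?P<name>[^+:]+)(\s*\+\s*(?P<keyring>[^:]+))?(\s*:(?P<profile>.+))?'
mobj = re.match(pattern, 'chrome+gnomekeyring:Profile 1')
print(mobj.group('name', 'keyring', 'profile'))
# ('chrome', 'gnomekeyring', 'Profile 1')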

View File

@@ -2,8 +2,15 @@ from __future__ import unicode_literals
from math import ceil
from .compat import compat_b64decode, compat_pycrypto_AES
from .utils import bytes_to_intlist, intlist_to_bytes
from .compat import (
compat_b64decode,
compat_ord,
compat_pycrypto_AES,
)
from .utils import (
bytes_to_intlist,
intlist_to_bytes,
)
if compat_pycrypto_AES:
@@ -25,6 +32,10 @@ else:
return intlist_to_bytes(aes_gcm_decrypt_and_verify(*map(bytes_to_intlist, (data, key, tag, nonce))))
def unpad_pkcs7(data):
return data[:-compat_ord(data[-1])]
BLOCK_SIZE_BYTES = 16
@@ -506,5 +517,6 @@ __all__ = [
'aes_encrypt',
'aes_gcm_decrypt_and_verify',
'aes_gcm_decrypt_and_verify_bytes',
'key_expansion'
'key_expansion',
'unpad_pkcs7',
]
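
unpad_pkcs7 simply drops as many trailing bytes as the value of the final byte, since PKCS#7 pads with N copies of N; a one-line sanity check:

from yt_dlp.aes import unpad_pkcs7

# 12 data bytes + 4 padding bytes of value 0x04 = one 16-byte block
print(unpad_pkcs7(b'datadatadata\x04\x04\x04\x04'))  # b'datadatadata'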

View File

@@ -2,6 +2,7 @@
import asyncio
import base64
import collections
import ctypes
import getpass
import html
@@ -160,26 +161,37 @@ except ImportError:
compat_pycrypto_AES = None
WINDOWS_VT_MODE = False if compat_os_name == 'nt' else None
def windows_enable_vt_mode(): # TODO: Do this the proper way https://bugs.python.org/issue30075
if compat_os_name != 'nt':
return
global WINDOWS_VT_MODE
startupinfo = subprocess.STARTUPINFO()
startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
subprocess.Popen('', shell=True, startupinfo=startupinfo)
try:
subprocess.Popen('', shell=True, startupinfo=startupinfo)
WINDOWS_VT_MODE = True
except Exception:
pass
# Deprecated
compat_basestring = str
compat_chr = chr
compat_filter = filter
compat_input = input
compat_integer_types = (int, )
compat_kwargs = lambda kwargs: kwargs
compat_map = map
compat_numeric_types = (int, float, complex)
compat_str = str
compat_xpath = lambda xpath: xpath
compat_zip = zip
compat_collections_abc = collections.abc
compat_HTMLParser = html.parser.HTMLParser
compat_HTTPError = urllib.error.HTTPError
compat_Struct = struct.Struct
@@ -226,6 +238,7 @@ compat_xml_parse_error = etree.ParseError
# Set public objects
__all__ = [
'WINDOWS_VT_MODE',
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
@@ -236,6 +249,7 @@ __all__ = [
'compat_b64decode',
'compat_basestring',
'compat_chr',
'compat_collections_abc',
'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies',
@@ -245,6 +259,7 @@ __all__ = [
'compat_etree_fromstring',
'compat_etree_register_namespace',
'compat_expanduser',
'compat_filter',
'compat_get_terminal_size',
'compat_getenv',
'compat_getpass',
@@ -256,6 +271,7 @@ __all__ = [
'compat_integer_types',
'compat_itertools_count',
'compat_kwargs',
'compat_map',
'compat_numeric_types',
'compat_ord',
'compat_os_name',

View File

@@ -1,3 +1,4 @@
import contextlib
import ctypes
import json
import os
@@ -7,15 +8,19 @@ import subprocess
import sys
import tempfile
from datetime import datetime, timedelta, timezone
from enum import Enum, auto
from hashlib import pbkdf2_hmac
from .aes import aes_cbc_decrypt_bytes, aes_gcm_decrypt_and_verify_bytes
from .aes import (
aes_cbc_decrypt_bytes,
aes_gcm_decrypt_and_verify_bytes,
unpad_pkcs7,
)
from .compat import (
compat_b64decode,
compat_cookiejar_Cookie,
)
from .utils import (
bug_reports_message,
expand_path,
Popen,
YoutubeDLCookieJar,
@@ -31,19 +36,16 @@ except ImportError:
try:
import keyring
KEYRING_AVAILABLE = True
KEYRING_UNAVAILABLE_REASON = f'due to unknown reasons{bug_reports_message()}'
import secretstorage
SECRETSTORAGE_AVAILABLE = True
except ImportError:
KEYRING_AVAILABLE = False
KEYRING_UNAVAILABLE_REASON = (
'as the `keyring` module is not installed. '
'Please install by running `python3 -m pip install keyring`. '
'Depending on your platform, additional packages may be required '
'to access the keyring; see https://pypi.org/project/keyring')
SECRETSTORAGE_AVAILABLE = False
SECRETSTORAGE_UNAVAILABLE_REASON = (
'as the `secretstorage` module is not installed. '
'Please install by running `python3 -m pip install secretstorage`.')
except Exception as _err:
KEYRING_AVAILABLE = False
KEYRING_UNAVAILABLE_REASON = 'as the `keyring` module could not be initialized: %s' % _err
SECRETSTORAGE_AVAILABLE = False
SECRETSTORAGE_UNAVAILABLE_REASON = f'as the `secretstorage` module could not be initialized. {_err}'
CHROMIUM_BASED_BROWSERS = {'brave', 'chrome', 'chromium', 'edge', 'opera', 'vivaldi'}
@@ -74,8 +76,8 @@ class YDLLogger:
def load_cookies(cookie_file, browser_specification, ydl):
cookie_jars = []
if browser_specification is not None:
browser_name, profile = _parse_browser_specification(*browser_specification)
cookie_jars.append(extract_cookies_from_browser(browser_name, profile, YDLLogger(ydl)))
browser_name, profile, keyring = _parse_browser_specification(*browser_specification)
cookie_jars.append(extract_cookies_from_browser(browser_name, profile, YDLLogger(ydl), keyring=keyring))
if cookie_file is not None:
cookie_file = expand_path(cookie_file)
@@ -87,13 +89,13 @@ def load_cookies(cookie_file, browser_specification, ydl):
return _merge_cookie_jars(cookie_jars)
def extract_cookies_from_browser(browser_name, profile=None, logger=YDLLogger()):
def extract_cookies_from_browser(browser_name, profile=None, logger=YDLLogger(), *, keyring=None):
if browser_name == 'firefox':
return _extract_firefox_cookies(profile, logger)
elif browser_name == 'safari':
return _extract_safari_cookies(profile, logger)
elif browser_name in CHROMIUM_BASED_BROWSERS:
return _extract_chrome_cookies(browser_name, profile, logger)
return _extract_chrome_cookies(browser_name, profile, keyring, logger)
else:
raise ValueError('unknown browser: {}'.format(browser_name))
@@ -207,7 +209,7 @@ def _get_chromium_based_browser_settings(browser_name):
}
def _extract_chrome_cookies(browser_name, profile, logger):
def _extract_chrome_cookies(browser_name, profile, keyring, logger):
logger.info('Extracting cookies from {}'.format(browser_name))
if not SQLITE_AVAILABLE:
@@ -234,7 +236,7 @@ def _extract_chrome_cookies(browser_name, profile, logger):
raise FileNotFoundError('could not find {} cookies database in "{}"'.format(browser_name, search_root))
logger.debug('Extracting cookies from: "{}"'.format(cookie_database_path))
decryptor = get_cookie_decryptor(config['browser_dir'], config['keyring_name'], logger)
decryptor = get_cookie_decryptor(config['browser_dir'], config['keyring_name'], logger, keyring=keyring)
with tempfile.TemporaryDirectory(prefix='yt_dlp') as tmpdir:
cursor = None
@@ -247,6 +249,7 @@ def _extract_chrome_cookies(browser_name, profile, logger):
'expires_utc, {} FROM cookies'.format(secure_column))
jar = YoutubeDLCookieJar()
failed_cookies = 0
unencrypted_cookies = 0
for host_key, name, value, encrypted_value, path, expires_utc, is_secure in cursor.fetchall():
host_key = host_key.decode('utf-8')
name = name.decode('utf-8')
@@ -258,6 +261,8 @@ def _extract_chrome_cookies(browser_name, profile, logger):
if value is None:
failed_cookies += 1
continue
else:
unencrypted_cookies += 1
cookie = compat_cookiejar_Cookie(
version=0, name=name, value=value, port=None, port_specified=False,
@@ -270,6 +275,9 @@ def _extract_chrome_cookies(browser_name, profile, logger):
else:
failed_message = ''
logger.info('Extracted {} cookies from {}{}'.format(len(jar), browser_name, failed_message))
counts = decryptor.cookie_counts.copy()
counts['unencrypted'] = unencrypted_cookies
logger.debug('cookie version breakdown: {}'.format(counts))
return jar
finally:
if cursor is not None:
@@ -305,10 +313,14 @@ class ChromeCookieDecryptor:
def decrypt(self, encrypted_value):
raise NotImplementedError
@property
def cookie_counts(self):
raise NotImplementedError
def get_cookie_decryptor(browser_root, browser_keyring_name, logger):
def get_cookie_decryptor(browser_root, browser_keyring_name, logger, *, keyring=None):
if sys.platform in ('linux', 'linux2'):
return LinuxChromeCookieDecryptor(browser_keyring_name, logger)
return LinuxChromeCookieDecryptor(browser_keyring_name, logger, keyring=keyring)
elif sys.platform == 'darwin':
return MacChromeCookieDecryptor(browser_keyring_name, logger)
elif sys.platform == 'win32':
@@ -319,13 +331,12 @@ def get_cookie_decryptor(browser_root, browser_keyring_name, logger):
class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
def __init__(self, browser_keyring_name, logger):
def __init__(self, browser_keyring_name, logger, *, keyring=None):
self._logger = logger
self._v10_key = self.derive_key(b'peanuts')
if KEYRING_AVAILABLE:
self._v11_key = self.derive_key(_get_linux_keyring_password(browser_keyring_name))
else:
self._v11_key = None
password = _get_linux_keyring_password(browser_keyring_name, keyring, logger)
self._v11_key = None if password is None else self.derive_key(password)
self._cookie_counts = {'v10': 0, 'v11': 0, 'other': 0}
@staticmethod
def derive_key(password):
@@ -333,20 +344,27 @@ class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_linux.cc
return pbkdf2_sha1(password, salt=b'saltysalt', iterations=1, key_length=16)
@property
def cookie_counts(self):
return self._cookie_counts
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
self._cookie_counts['v10'] += 1
return _decrypt_aes_cbc(ciphertext, self._v10_key, self._logger)
elif version == b'v11':
self._cookie_counts['v11'] += 1
if self._v11_key is None:
self._logger.warning(f'cannot decrypt cookie {KEYRING_UNAVAILABLE_REASON}', only_once=True)
self._logger.warning('cannot decrypt v11 cookies: no key found', only_once=True)
return None
return _decrypt_aes_cbc(ciphertext, self._v11_key, self._logger)
else:
self._cookie_counts['other'] += 1
return None
@@ -355,6 +373,7 @@ class MacChromeCookieDecryptor(ChromeCookieDecryptor):
self._logger = logger
password = _get_mac_keyring_password(browser_keyring_name, logger)
self._v10_key = None if password is None else self.derive_key(password)
self._cookie_counts = {'v10': 0, 'other': 0}
@staticmethod
def derive_key(password):
@@ -362,11 +381,16 @@ class MacChromeCookieDecryptor(ChromeCookieDecryptor):
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_mac.mm
return pbkdf2_sha1(password, salt=b'saltysalt', iterations=1003, key_length=16)
@property
def cookie_counts(self):
return self._cookie_counts
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
self._cookie_counts['v10'] += 1
if self._v10_key is None:
self._logger.warning('cannot decrypt v10 cookies: no key found', only_once=True)
return None
@@ -374,6 +398,7 @@ class MacChromeCookieDecryptor(ChromeCookieDecryptor):
return _decrypt_aes_cbc(ciphertext, self._v10_key, self._logger)
else:
self._cookie_counts['other'] += 1
# other prefixes are considered 'old data' which were stored as plaintext
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_mac.mm
return encrypted_value
@@ -383,12 +408,18 @@ class WindowsChromeCookieDecryptor(ChromeCookieDecryptor):
def __init__(self, browser_root, logger):
self._logger = logger
self._v10_key = _get_windows_v10_key(browser_root, logger)
self._cookie_counts = {'v10': 0, 'other': 0}
@property
def cookie_counts(self):
return self._cookie_counts
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
self._cookie_counts['v10'] += 1
if self._v10_key is None:
self._logger.warning('cannot decrypt v10 cookies: no key found', only_once=True)
return None
@@ -408,6 +439,7 @@ class WindowsChromeCookieDecryptor(ChromeCookieDecryptor):
return _decrypt_aes_gcm(ciphertext, self._v10_key, nonce, authentication_tag, self._logger)
else:
self._cookie_counts['other'] += 1
# any other prefix means the data is DPAPI encrypted
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_win.cc
return _decrypt_windows_dpapi(encrypted_value, self._logger).decode('utf-8')
@@ -577,42 +609,220 @@ def parse_safari_cookies(data, jar=None, logger=YDLLogger()):
return jar
def _get_linux_keyring_password(browser_keyring_name):
password = keyring.get_password('{} Keys'.format(browser_keyring_name),
'{} Safe Storage'.format(browser_keyring_name))
if password is None:
# this sometimes occurs in KDE because chrome does not check hasEntry and instead
# just tries to read the value (for which kwallet returns an empty string) whereas keyring checks hasEntry
# to verify this:
# dbus-monitor "interface='org.kde.KWallet'" "type=method_return"
# while starting chrome.
# this may be a bug as the intended behaviour is to generate a random password and store
# it, but that doesn't matter here.
password = ''
return password.encode('utf-8')
class _LinuxDesktopEnvironment(Enum):
"""
https://chromium.googlesource.com/chromium/src/+/refs/heads/main/base/nix/xdg_util.h
DesktopEnvironment
"""
OTHER = auto()
CINNAMON = auto()
GNOME = auto()
KDE = auto()
PANTHEON = auto()
UNITY = auto()
XFCE = auto()
class _LinuxKeyring(Enum):
"""
https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/key_storage_util_linux.h
SelectedLinuxBackend
"""
KWALLET = auto()
GNOMEKEYRING = auto()
BASICTEXT = auto()
SUPPORTED_KEYRINGS = _LinuxKeyring.__members__.keys()
def _get_linux_desktop_environment(env):
"""
https://chromium.googlesource.com/chromium/src/+/refs/heads/main/base/nix/xdg_util.cc
GetDesktopEnvironment
"""
xdg_current_desktop = env.get('XDG_CURRENT_DESKTOP', None)
desktop_session = env.get('DESKTOP_SESSION', None)
if xdg_current_desktop is not None:
xdg_current_desktop = xdg_current_desktop.split(':')[0].strip()
if xdg_current_desktop == 'Unity':
if desktop_session is not None and 'gnome-fallback' in desktop_session:
return _LinuxDesktopEnvironment.GNOME
else:
return _LinuxDesktopEnvironment.UNITY
elif xdg_current_desktop == 'GNOME':
return _LinuxDesktopEnvironment.GNOME
elif xdg_current_desktop == 'X-Cinnamon':
return _LinuxDesktopEnvironment.CINNAMON
elif xdg_current_desktop == 'KDE':
return _LinuxDesktopEnvironment.KDE
elif xdg_current_desktop == 'Pantheon':
return _LinuxDesktopEnvironment.PANTHEON
elif xdg_current_desktop == 'XFCE':
return _LinuxDesktopEnvironment.XFCE
elif desktop_session is not None:
if desktop_session in ('mate', 'gnome'):
return _LinuxDesktopEnvironment.GNOME
elif 'kde' in desktop_session:
return _LinuxDesktopEnvironment.KDE
elif 'xfce' in desktop_session:
return _LinuxDesktopEnvironment.XFCE
else:
if 'GNOME_DESKTOP_SESSION_ID' in env:
return _LinuxDesktopEnvironment.GNOME
elif 'KDE_FULL_SESSION' in env:
return _LinuxDesktopEnvironment.KDE
return _LinuxDesktopEnvironment.OTHER
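# Illustrative check of the precedence above, assuming the function and enum
# from this file are in scope: XDG_CURRENT_DESKTOP wins over DESKTOP_SESSION,
# and only its first ':'-separated component is considered.
fake_env = {'XDG_CURRENT_DESKTOP': 'KDE:GNOME', 'DESKTOP_SESSION': 'xfce'}
assert _get_linux_desktop_environment(fake_env) == _LinuxDesktopEnvironment.KDE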
def _choose_linux_keyring(logger):
"""
https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/key_storage_util_linux.cc
SelectBackend
"""
desktop_environment = _get_linux_desktop_environment(os.environ)
logger.debug('detected desktop environment: {}'.format(desktop_environment.name))
if desktop_environment == _LinuxDesktopEnvironment.KDE:
linux_keyring = _LinuxKeyring.KWALLET
elif desktop_environment == _LinuxDesktopEnvironment.OTHER:
linux_keyring = _LinuxKeyring.BASICTEXT
else:
linux_keyring = _LinuxKeyring.GNOMEKEYRING
return linux_keyring
def _get_kwallet_network_wallet(logger):
""" The name of the wallet used to store network passwords.
https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/kwallet_dbus.cc
KWalletDBus::NetworkWallet
which does a dbus call to the following function:
https://api.kde.org/frameworks/kwallet/html/classKWallet_1_1Wallet.html
Wallet::NetworkWallet
"""
default_wallet = 'kdewallet'
try:
proc = Popen([
'dbus-send', '--session', '--print-reply=literal',
'--dest=org.kde.kwalletd5',
'/modules/kwalletd5',
'org.kde.KWallet.networkWallet'
], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
stdout, stderr = proc.communicate_or_kill()
if proc.returncode != 0:
logger.warning('failed to read NetworkWallet')
return default_wallet
else:
network_wallet = stdout.decode('utf-8').strip()
logger.debug('NetworkWallet = "{}"'.format(network_wallet))
return network_wallet
except BaseException as e:
logger.warning('exception while obtaining NetworkWallet: {}'.format(e))
return default_wallet
def _get_kwallet_password(browser_keyring_name, logger):
logger.debug('using kwallet-query to obtain password from kwallet')
if shutil.which('kwallet-query') is None:
logger.error('kwallet-query command not found. KWallet and kwallet-query '
'must be installed to read from KWallet. kwallet-query should be '
'included in the kwallet package for your distribution')
return b''
network_wallet = _get_kwallet_network_wallet(logger)
try:
proc = Popen([
'kwallet-query',
'--read-password', '{} Safe Storage'.format(browser_keyring_name),
'--folder', '{} Keys'.format(browser_keyring_name),
network_wallet
], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
stdout, stderr = proc.communicate_or_kill()
if proc.returncode != 0:
logger.error('kwallet-query failed with return code {}. Please consult '
'the kwallet-query man page for details'.format(proc.returncode))
return b''
else:
if stdout.lower().startswith(b'failed to read'):
logger.debug('failed to read password from kwallet. Using empty string instead')
# this sometimes occurs in KDE because chrome does not check hasEntry and instead
# just tries to read the value (for which kwallet returns an empty string) whereas kwallet-query
# checks hasEntry. To verify this:
# dbus-monitor "interface='org.kde.KWallet'" "type=method_return"
# while starting chrome.
# this may be a bug as the intended behaviour is to generate a random password and store
# it, but that doesn't matter here.
return b''
else:
logger.debug('password found')
if stdout[-1:] == b'\n':
stdout = stdout[:-1]
return stdout
except BaseException as e:
logger.warning(f'exception running kwallet-query: {type(e).__name__}({e})')
return b''
def _get_gnome_keyring_password(browser_keyring_name, logger):
if not SECRETSTORAGE_AVAILABLE:
logger.error('secretstorage not available {}'.format(SECRETSTORAGE_UNAVAILABLE_REASON))
return b''
# The Gnome keyring does not seem to organise keys in the same way as KWallet.
# Using `dbus-monitor` during startup, it can be observed that chromium lists all keys
# and presumably searches for its key in the list, so we must do the same.
# https://github.com/jaraco/keyring/issues/556
with contextlib.closing(secretstorage.dbus_init()) as con:
col = secretstorage.get_default_collection(con)
for item in col.get_all_items():
if item.get_label() == '{} Safe Storage'.format(browser_keyring_name):
return item.get_secret()
else:
logger.error('failed to read from keyring')
return b''
def _get_linux_keyring_password(browser_keyring_name, keyring, logger):
# note: chrome/chromium can be run with the following flags to determine which keyring backend
# it has chosen to use
# chromium --enable-logging=stderr --v=1 2>&1 | grep key_storage_
# Chromium supports the flag --password-store=<basic|gnome|kwallet>, so automatic detection
# will not be sufficient in all cases.
keyring = _LinuxKeyring[keyring] if keyring else _choose_linux_keyring(logger)
logger.debug(f'Chosen keyring: {keyring.name}')
if keyring == _LinuxKeyring.KWALLET:
return _get_kwallet_password(browser_keyring_name, logger)
elif keyring == _LinuxKeyring.GNOMEKEYRING:
return _get_gnome_keyring_password(browser_keyring_name, logger)
elif keyring == _LinuxKeyring.BASICTEXT:
# when basic text is chosen, all cookies are stored as v10 (so no keyring password is required)
return None
assert False, f'Unknown keyring {keyring}'
def _get_mac_keyring_password(browser_keyring_name, logger):
if KEYRING_AVAILABLE:
logger.debug('using keyring to obtain password')
password = keyring.get_password('{} Safe Storage'.format(browser_keyring_name), browser_keyring_name)
return password.encode('utf-8')
else:
logger.debug('using find-generic-password to obtain password')
logger.debug('using find-generic-password to obtain password from OSX keychain')
try:
proc = Popen(
['security', 'find-generic-password',
'-w', # write password to stdout
'-a', browser_keyring_name, # match 'account'
'-s', '{} Safe Storage'.format(browser_keyring_name)], # match 'service'
stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
try:
stdout, stderr = proc.communicate_or_kill()
if stdout[-1:] == b'\n':
stdout = stdout[:-1]
return stdout
except BaseException as e:
logger.warning(f'exception running find-generic-password: {type(e).__name__}({e})')
return None
stdout, stderr = proc.communicate_or_kill()
if stdout[-1:] == b'\n':
stdout = stdout[:-1]
return stdout
except BaseException as e:
logger.warning(f'exception running find-generic-password: {type(e).__name__}({e})')
return None
def _get_windows_v10_key(browser_root, logger):
@@ -640,10 +850,9 @@ def pbkdf2_sha1(password, salt, iterations, key_length):
def _decrypt_aes_cbc(ciphertext, key, logger, initialization_vector=b' ' * 16):
plaintext = aes_cbc_decrypt_bytes(ciphertext, key, initialization_vector)
padding_length = plaintext[-1]
plaintext = unpad_pkcs7(aes_cbc_decrypt_bytes(ciphertext, key, initialization_vector))
try:
return plaintext[:-padding_length].decode('utf-8')
return plaintext.decode('utf-8')
except UnicodeDecodeError:
logger.warning('failed to decrypt cookie (AES-CBC) because UTF-8 decoding failed. Possibly the key is wrong?', only_once=True)
return None
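# A worked PKCS#7 example: unpad_pkcs7 (from yt_dlp.aes) strips data[-1]
# trailing bytes, so a 12-byte value padded to a 16-byte block carries four
# 0x04 bytes:
from yt_dlp.aes import unpad_pkcs7
assert unpad_pkcs7(b'cookie-value\x04\x04\x04\x04') == b'cookie-value'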
@@ -736,10 +945,11 @@ def _is_path(value):
return os.path.sep in value
def _parse_browser_specification(browser_name, profile=None):
browser_name = browser_name.lower()
def _parse_browser_specification(browser_name, profile=None, keyring=None):
if browser_name not in SUPPORTED_BROWSERS:
raise ValueError(f'unsupported browser: "{browser_name}"')
if keyring not in (None, *SUPPORTED_KEYRINGS):
raise ValueError(f'unsupported keyring: "{keyring}"')
if profile is not None and _is_path(profile):
profile = os.path.expanduser(profile)
return browser_name, profile
return browser_name, profile, keyring
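# This feeds --cookies-from-browser, whose spec after this change is
# BROWSER[+KEYRING][:PROFILE]; keyring names follow the _LinuxKeyring members.
# A minimal sketch, assuming yt_dlp is importable:
from yt_dlp.cookies import _parse_browser_specification
assert _parse_browser_specification('chromium', keyring='GNOMEKEYRING') == ('chromium', None, 'GNOMEKEYRING')
# on the command line (illustrative): yt-dlp --cookies-from-browser "chromium+gnomekeyring:Profile 1" URL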

View File

@@ -12,10 +12,15 @@ def get_suitable_downloader(info_dict, params={}, default=NO_DEFAULT, protocol=N
info_copy = info_dict.copy()
info_copy['to_stdout'] = to_stdout
downloaders = [_get_suitable_downloader(info_copy, proto, params, default)
for proto in (protocol or info_copy['protocol']).split('+')]
protocols = (protocol or info_copy['protocol']).split('+')
downloaders = [_get_suitable_downloader(info_copy, proto, params, default) for proto in protocols]
if set(downloaders) == {FFmpegFD} and FFmpegFD.can_merge_formats(info_copy, params):
return FFmpegFD
elif (set(downloaders) == {DashSegmentsFD}
and not (to_stdout and len(protocols) > 1)
and set(protocols) == {'http_dash_segments_generator'}):
return DashSegmentsFD
elif len(downloaders) == 1:
return downloaders[0]
return None
@@ -49,6 +54,7 @@ PROTOCOL_MAP = {
'rtsp': RtspFD,
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'http_dash_segments_generator': DashSegmentsFD,
'ism': IsmFD,
'mhtml': MhtmlFD,
'niconico_dmc': NiconicoDmcFD,
@@ -63,6 +69,7 @@ def shorten_protocol_name(proto, simplify=False):
'm3u8_native': 'm3u8_n',
'rtmp_ffmpeg': 'rtmp_f',
'http_dash_segments': 'dash',
'http_dash_segments_generator': 'dash_g',
'niconico_dmc': 'dmc',
'websocket_frag': 'WSfrag',
}
@@ -71,6 +78,7 @@ def shorten_protocol_name(proto, simplify=False):
'https': 'http',
'ftps': 'ftp',
'm3u8_native': 'm3u8',
'http_dash_segments_generator': 'dash',
'rtmp_ffmpeg': 'rtmp',
'm3u8_frag_urls': 'm3u8',
'dash_frag_urls': 'dash',
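# Quick check of the new protocol naming, assuming yt_dlp is importable:
from yt_dlp.downloader import shorten_protocol_name
assert shorten_protocol_name('http_dash_segments_generator') == 'dash_g'
assert shorten_protocol_name('http_dash_segments_generator', simplify=True) == 'dash'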

View File

@@ -4,12 +4,14 @@ import os
import re
import time
import random
import errno
from ..utils import (
decodeArgument,
encodeFilename,
error_to_compat_str,
format_bytes,
sanitize_open,
shell_quote,
timeconvert,
timetuple_from_msec,
@@ -39,6 +41,7 @@ class FileDownloader(object):
ratelimit: Download speed limit, in bytes/sec.
throttledratelimit: Assume the download is being throttled below this speed (bytes/sec)
retries: Number of times to retry for HTTP error 5xx
file_access_retries: Number of times to retry on file access error
buffersize: Size of download buffer in bytes.
noresizebuffer: Do not automatically resize the download buffer.
continuedl: Try to continue downloads if possible.
@@ -207,6 +210,21 @@ class FileDownloader(object):
def ytdl_filename(self, filename):
return filename + '.ytdl'
def sanitize_open(self, filename, open_mode):
file_access_retries = self.params.get('file_access_retries', 10)
retry = 0
while True:
try:
return sanitize_open(filename, open_mode)
except (IOError, OSError) as err:
retry = retry + 1
if retry > file_access_retries or err.errno not in (errno.EACCES,):
raise
self.to_screen(
'[download] Got file access error. Retrying (attempt %d of %s) ...'
% (retry, self.format_retries(file_access_retries)))
time.sleep(0.01)
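# The same retry-on-contention pattern as a hypothetical standalone helper:
# on Windows, antivirus or indexing services can briefly hold a file handle,
# making open() fail with EACCES even though a later attempt succeeds.
import errno
import time

def open_with_retries(path, mode, retries=10, delay=0.01):
    for attempt in range(retries + 1):
        try:
            return open(path, mode)
        except OSError as err:
            if attempt >= retries or err.errno != errno.EACCES:
                raise
            time.sleep(delay)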
def try_rename(self, old_filename, new_filename):
if old_filename == new_filename:
return
@@ -397,6 +415,7 @@ class FileDownloader(object):
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
}, info_dict)
self._finish_multiline_status()
return True, False
if subtitle is False:

View File

@@ -1,4 +1,5 @@
from __future__ import unicode_literals
import time
from ..downloader import get_suitable_downloader
from .fragment import FragmentFD
@@ -15,27 +16,53 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
if info_dict.get('is_live'):
if info_dict.get('is_live') and set(info_dict['protocol'].split('+')) != {'http_dash_segments_generator'}:
self.report_error('Live DASH videos are not supported')
fragment_base_url = info_dict.get('fragment_base_url')
fragments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
real_start = time.time()
real_downloader = get_suitable_downloader(
info_dict, self.params, None, protocol='dash_frag_urls', to_stdout=(filename == '-'))
ctx = {
'filename': filename,
'total_frags': len(fragments),
}
requested_formats = [{**info_dict, **fmt} for fmt in info_dict.get('requested_formats', [])]
args = []
for fmt in requested_formats or [info_dict]:
try:
fragment_count = 1 if self.params.get('test') else len(fmt['fragments'])
except TypeError:
fragment_count = None
ctx = {
'filename': fmt.get('filepath') or filename,
'live': 'is_from_start' if fmt.get('is_from_start') else fmt.get('is_live'),
'total_frags': fragment_count,
}
if real_downloader:
self._prepare_external_frag_download(ctx)
else:
self._prepare_and_start_frag_download(ctx, info_dict)
if real_downloader:
self._prepare_external_frag_download(ctx)
else:
self._prepare_and_start_frag_download(ctx, fmt)
ctx['start'] = real_start
fragments_to_download = self._get_fragments(fmt, ctx)
if real_downloader:
self.to_screen(
'[%s] Fragment downloads will be delegated to %s' % (self.FD_NAME, real_downloader.get_basename()))
info_dict['fragments'] = list(fragments_to_download)
fd = real_downloader(self.ydl, self.params)
return fd.real_download(filename, info_dict)
args.append([ctx, fragments_to_download, fmt])
return self.download_and_append_fragments_multiple(*args)
def _resolve_fragments(self, fragments, ctx):
fragments = fragments(ctx) if callable(fragments) else fragments
return [next(iter(fragments))] if self.params.get('test') else fragments
def _get_fragments(self, fmt, ctx):
fragment_base_url = fmt.get('fragment_base_url')
fragments = self._resolve_fragments(fmt['fragments'], ctx)
fragments_to_download = []
frag_index = 0
for i, fragment in enumerate(fragments):
frag_index += 1
@@ -46,17 +73,8 @@ class DashSegmentsFD(FragmentFD):
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
fragments_to_download.append({
yield {
'frag_index': frag_index,
'index': i,
'url': fragment_url,
})
if real_downloader:
self.to_screen(
'[%s] Fragment downloads will be delegated to %s' % (self.FD_NAME, real_downloader.get_basename()))
info_dict['fragments'] = fragments_to_download
fd = real_downloader(self.ydl, self.params)
return fd.real_download(filename, info_dict)
return self.download_and_append_fragments(ctx, fragments_to_download, info_dict)
}
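# The rewrite above turns fragment-list building into a generator, so live
# ("from start") DASH formats can keep yielding fragments while the download
# is already running. A toy version of the pattern (hypothetical names):
def fragment_urls(base_url, paths):
    for index, path in enumerate(paths, start=1):
        yield {'frag_index': index, 'index': index - 1, 'url': base_url + path}

# --test takes only the first fragment, mirroring _resolve_fragments:
first_only = [next(iter(fragment_urls('https://cdn.example/', ['a.m4s', 'b.m4s'])))]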

View File

@@ -17,12 +17,13 @@ from ..utils import (
cli_valueless_option,
cli_bool_option,
_configuration_args,
determine_ext,
encodeFilename,
encodeArgument,
handle_youtubedl_headers,
check_executable,
Popen,
sanitize_open,
remove_end,
)
@@ -144,11 +145,11 @@ class ExternalFD(FragmentFD):
return -1
decrypt_fragment = self.decrypter(info_dict)
dest, _ = sanitize_open(tmpfilename, 'wb')
dest, _ = self.sanitize_open(tmpfilename, 'wb')
for frag_index, fragment in enumerate(info_dict['fragments']):
fragment_filename = '%s-Frag%d' % (tmpfilename, frag_index)
try:
src, _ = sanitize_open(fragment_filename, 'rb')
src, _ = self.sanitize_open(fragment_filename, 'rb')
except IOError as err:
if skip_unavailable_fragments and frag_index > 1:
self.report_skip_fragment(frag_index, err)
@@ -266,6 +267,7 @@ class Aria2cFD(ExternalFD):
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += self._bool_option('--show-console-readout', 'noprogress', 'false', 'true', '=')
cmd += self._configuration_args()
# aria2c strips out spaces from the beginning/end of filenames and paths.
@@ -290,7 +292,7 @@ class Aria2cFD(ExternalFD):
for frag_index, fragment in enumerate(info_dict['fragments']):
fragment_filename = '%s-Frag%d' % (os.path.basename(tmpfilename), frag_index)
url_list.append('%s\n\tout=%s' % (fragment['url'], fragment_filename))
stream, _ = sanitize_open(url_list_file, 'wb')
stream, _ = self.sanitize_open(url_list_file, 'wb')
stream.write('\n'.join(url_list).encode('utf-8'))
stream.close()
cmd += ['-i', url_list_file]
@@ -304,7 +306,7 @@ class HttpieFD(ExternalFD):
@classmethod
def available(cls, path=None):
return ExternalFD.available(cls, path or 'http')
return super().available(path or 'http')
def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
@@ -463,6 +465,15 @@ class FFmpegFD(ExternalFD):
args += ['-f', 'flv']
elif ext == 'mp4' and tmpfilename == '-':
args += ['-f', 'mpegts']
elif ext == 'unknown_video':
ext = determine_ext(remove_end(tmpfilename, '.part'))
if ext == 'unknown_video':
self.report_warning(
'The video format is unknown and cannot be downloaded by ffmpeg. '
'Explicitly set the extension in the filename to attempt download in that format')
else:
self.report_warning(f'The video format is unknown. Trying to download as {ext} according to the filename')
args += ['-f', EXT_TO_OUT_FORMATS.get(ext, ext)]
else:
args += ['-f', EXT_TO_OUT_FORMATS.get(ext, ext)]
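# How the fallback recovers a container from the filename, assuming yt_dlp is
# importable: 'video.mp4.part' -> 'mp4', which is then mapped for ffmpeg's -f:
from yt_dlp.utils import determine_ext, remove_end
assert determine_ext(remove_end('video.mp4.part', '.part')) == 'mp4'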

View File

@@ -366,7 +366,7 @@ class F4mFD(FragmentFD):
ctx = {
'filename': filename,
'total_frags': total_frags,
'live': live,
'live': bool(live),
}
self._prepare_frag_download(ctx)

View File

@@ -1,9 +1,10 @@
from __future__ import division, unicode_literals
import http.client
import json
import math
import os
import time
import json
from math import ceil
try:
import concurrent.futures
@@ -13,8 +14,9 @@ except ImportError:
from .common import FileDownloader
from .http import HttpFD
from ..aes import aes_cbc_decrypt_bytes
from ..aes import aes_cbc_decrypt_bytes, unpad_pkcs7
from ..compat import (
compat_os_name,
compat_urllib_error,
compat_struct_pack,
)
@@ -22,7 +24,6 @@ from ..utils import (
DownloadError,
error_to_compat_str,
encodeFilename,
sanitize_open,
sanitized_Request,
)
@@ -90,11 +91,11 @@ class FragmentFD(FileDownloader):
self._start_frag_download(ctx, info_dict)
def __do_ytdl_file(self, ctx):
return not ctx['live'] and not ctx['tmpfilename'] == '-' and not self.params.get('_no_ytdl_file')
return ctx['live'] is not True and ctx['tmpfilename'] != '-' and not self.params.get('_no_ytdl_file')
def _read_ytdl_file(self, ctx):
assert 'ytdl_corrupt' not in ctx
stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
stream, _ = self.sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
try:
ytdl_data = json.loads(stream.read())
ctx['fragment_index'] = ytdl_data['downloader']['current_fragment']['index']
@@ -106,7 +107,7 @@ class FragmentFD(FileDownloader):
stream.close()
def _write_ytdl_file(self, ctx):
frag_index_stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
frag_index_stream, _ = self.sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
try:
downloader = {
'current_fragment': {
@@ -138,7 +139,7 @@ class FragmentFD(FileDownloader):
return True, self._read_fragment(ctx)
def _read_fragment(self, ctx):
down, frag_sanitized = sanitize_open(ctx['fragment_filename_sanitized'], 'rb')
down, frag_sanitized = self.sanitize_open(ctx['fragment_filename_sanitized'], 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
down.close()
@@ -214,7 +215,7 @@ class FragmentFD(FileDownloader):
self._write_ytdl_file(ctx)
assert ctx['fragment_index'] == 0
dest_stream, tmpfilename = sanitize_open(tmpfilename, open_mode)
dest_stream, tmpfilename = self.sanitize_open(tmpfilename, open_mode)
ctx.update({
'dl': dl,
@@ -365,8 +366,7 @@ class FragmentFD(FileDownloader):
# not what it decrypts to.
if self.params.get('test', False):
return frag_content
decrypted_data = aes_cbc_decrypt_bytes(frag_content, decrypt_info['KEY'], iv)
return decrypted_data[:-decrypted_data[-1]]
return unpad_pkcs7(aes_cbc_decrypt_bytes(frag_content, decrypt_info['KEY'], iv))
return decrypt_fragment
@@ -375,17 +375,20 @@ class FragmentFD(FileDownloader):
@params (ctx1, fragments1, info_dict1), (ctx2, fragments2, info_dict2), ...
all args must be either tuple or list
'''
interrupt_trigger = [True]
max_progress = len(args)
if max_progress == 1:
return self.download_and_append_fragments(*args[0], pack_func=pack_func, finish_func=finish_func)
max_workers = self.params.get('concurrent_fragment_downloads', max_progress)
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if max_progress > 1:
self._prepare_multiline_status(max_progress)
def thread_func(idx, ctx, fragments, info_dict, tpe):
ctx['max_progress'] = max_progress
ctx['progress_idx'] = idx
return self.download_and_append_fragments(ctx, fragments, info_dict, pack_func=pack_func, finish_func=finish_func, tpe=tpe)
return self.download_and_append_fragments(
ctx, fragments, info_dict, pack_func=pack_func, finish_func=finish_func,
tpe=tpe, interrupt_trigger=interrupt_trigger)
class FTPE(concurrent.futures.ThreadPoolExecutor):
# has to stop this or it's going to wait on the worker thread itself
@@ -393,8 +396,11 @@ class FragmentFD(FileDownloader):
pass
spins = []
if compat_os_name == 'nt':
self.report_warning('Ctrl+C does not work on Windows when used with parallel threads. '
'This is a known issue and patches are welcome')
for idx, (ctx, fragments, info_dict) in enumerate(args):
tpe = FTPE(ceil(max_workers / max_progress))
tpe = FTPE(math.ceil(max_workers / max_progress))
job = tpe.submit(thread_func, idx, ctx, fragments, info_dict, tpe)
spins.append((tpe, job))
@@ -402,18 +408,33 @@ class FragmentFD(FileDownloader):
for tpe, job in spins:
try:
result = result and job.result()
except KeyboardInterrupt:
interrupt_trigger[0] = False
finally:
tpe.shutdown(wait=True)
if not interrupt_trigger[0]:
raise KeyboardInterrupt()
return result
def download_and_append_fragments(self, ctx, fragments, info_dict, *, pack_func=None, finish_func=None, tpe=None):
def download_and_append_fragments(
self, ctx, fragments, info_dict, *, pack_func=None, finish_func=None,
tpe=None, interrupt_trigger=None):
if not interrupt_trigger:
interrupt_trigger = (True, )
fragment_retries = self.params.get('fragment_retries', 0)
is_fatal = (lambda idx: idx == 0) if self.params.get('skip_unavailable_fragments', True) else (lambda _: True)
is_fatal = (
((lambda _: False) if info_dict.get('is_live') else (lambda idx: idx == 0))
if self.params.get('skip_unavailable_fragments', True) else (lambda _: True))
if not pack_func:
pack_func = lambda frag_content, _: frag_content
def download_fragment(fragment, ctx):
frag_index = ctx['fragment_index'] = fragment['frag_index']
ctx['last_error'] = None
if not interrupt_trigger[0]:
return False, frag_index
headers = info_dict.get('http_headers', {}).copy()
byte_range = fragment.get('byte_range')
if byte_range:
@@ -428,12 +449,13 @@ class FragmentFD(FileDownloader):
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
except (compat_urllib_error.HTTPError, http.client.IncompleteRead) as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
ctx['last_error'] = err
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
@@ -466,7 +488,8 @@ class FragmentFD(FileDownloader):
decrypt_fragment = self.decrypter(info_dict)
max_workers = self.params.get('concurrent_fragment_downloads', 1)
max_workers = math.ceil(
self.params.get('concurrent_fragment_downloads', 1) / ctx.get('max_progress', 1))
if can_threaded_download and max_workers > 1:
def _download_fragment(fragment):
@@ -477,6 +500,8 @@ class FragmentFD(FileDownloader):
self.report_warning('The download speed shown is only of one thread. This is a known issue and patches are welcome')
with tpe or concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
for fragment, frag_content, frag_index, frag_filename in pool.map(_download_fragment, fragments):
if not interrupt_trigger[0]:
break
ctx['fragment_filename_sanitized'] = frag_filename
ctx['fragment_index'] = frag_index
result = append_fragment(decrypt_fragment(fragment, frag_content), frag_index, ctx)
@@ -484,6 +509,8 @@ class FragmentFD(FileDownloader):
return False
else:
for fragment in fragments:
if not interrupt_trigger[0]:
break
frag_content, frag_index = download_fragment(fragment, ctx)
result = append_fragment(decrypt_fragment(fragment, frag_content), frag_index, ctx)
if not result:
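# A sketch of the interrupt_trigger idea: one shared, mutable flag lets the
# main thread tell every worker pool to wind down after Ctrl+C (hypothetical
# demo, not the actual downloader code):
import concurrent.futures
import time

interrupt_trigger = [True]

def download_worker(fragments):
    done = []
    for frag in fragments:
        if not interrupt_trigger[0]:
            break  # stop promptly once the main thread flips the flag
        time.sleep(0.01)  # stand-in for the real fragment download
        done.append(frag)
    return done

with concurrent.futures.ThreadPoolExecutor(2) as pool:
    jobs = [pool.submit(download_worker, range(100)) for _ in range(2)]
    try:
        results = [job.result() for job in jobs]
    except KeyboardInterrupt:
        interrupt_trigger[0] = False
        raise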

View File

@@ -16,7 +16,6 @@ from ..utils import (
ContentTooShortError,
encodeFilename,
int_or_none,
sanitize_open,
sanitized_Request,
ThrottledDownload,
write_xattr,
@@ -263,7 +262,7 @@ class HttpFD(FileDownloader):
# Open destination file just in time
if ctx.stream is None:
try:
ctx.stream, ctx.tmpfilename = sanitize_open(
ctx.stream, ctx.tmpfilename = self.sanitize_open(
ctx.tmpfilename, ctx.open_mode)
assert ctx.stream is not None
ctx.filename = self.undo_temp_name(ctx.tmpfilename)

View File

@@ -5,9 +5,12 @@ import threading
try:
import websockets
has_websockets = True
except ImportError:
except (ImportError, SyntaxError):
# websockets 3.10 on python 3.6 causes SyntaxError
# See https://github.com/yt-dlp/yt-dlp/issues/2633
has_websockets = False
else:
has_websockets = True
from .common import FileDownloader
from .external import FFmpegFD

View File

@@ -8,6 +8,7 @@ import time
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
dict_get,
ExtractorError,
js_to_json,
int_or_none,
@@ -233,8 +234,6 @@ class ABCIViewIE(InfoExtractor):
}]
is_live = video_params.get('livestream') == '1'
if is_live:
title = self._live_title(title)
return {
'id': video_id,
@@ -255,3 +254,65 @@ class ABCIViewIE(InfoExtractor):
'subtitles': subtitles,
'is_live': is_live,
}
class ABCIViewShowSeriesIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview:showseries'
_VALID_URL = r'https?://iview\.abc\.net\.au/show/(?P<id>[^/]+)(?:/series/\d+)?$'
_GEO_COUNTRIES = ['AU']
_TESTS = [{
'url': 'https://iview.abc.net.au/show/upper-middle-bogan',
'info_dict': {
'id': '124870-1',
'title': 'Series 1',
'description': 'md5:93119346c24a7c322d446d8eece430ff',
'series': 'Upper Middle Bogan',
'season': 'Series 1',
'thumbnail': r're:^https?://cdn\.iview\.abc\.net\.au/thumbs/.*\.jpg$'
},
'playlist_count': 8,
}, {
'url': 'https://iview.abc.net.au/show/upper-middle-bogan',
'info_dict': {
'id': 'CO1108V001S00',
'ext': 'mp4',
'title': 'Series 1 Ep 1 I\'m A Swan',
'description': 'md5:7b676758c1de11a30b79b4d301e8da93',
'series': 'Upper Middle Bogan',
'uploader_id': 'abc1',
'upload_date': '20210630',
'timestamp': 1625036400,
},
'params': {
'noplaylist': True,
'skip_download': 'm3u8',
},
}]
def _real_extract(self, url):
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
webpage_data = self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*[\'"](.+?)[\'"]\s*;',
webpage, 'initial state')
video_data = self._parse_json(
unescapeHTML(webpage_data).encode('utf-8').decode('unicode_escape'), show_id)
video_data = video_data['route']['pageData']['_embedded']
highlight = try_get(video_data, lambda x: x['highlightVideo']['shareUrl'])
if not self._yes_playlist(show_id, bool(highlight), video_label='highlight video'):
return self.url_result(highlight, ie=ABCIViewIE.ie_key())
series = video_data['selectedSeries']
return {
'_type': 'playlist',
'entries': [self.url_result(episode['shareUrl'])
for episode in series['_embedded']['videoEpisodes']],
'id': series.get('id'),
'title': dict_get(series, ('title', 'displaySubtitle')),
'description': series.get('description'),
'series': dict_get(series, ('showTitle', 'displayTitle')),
'season': dict_get(series, ('title', 'displaySubtitle')),
'thumbnail': series.get('thumbnail'),
}

View File

@@ -8,11 +8,10 @@ import os
import random
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..aes import aes_cbc_decrypt_bytes, unpad_pkcs7
from ..compat import (
compat_HTTPError,
compat_b64decode,
compat_ord,
)
from ..utils import (
ass_subtitles_timecode,
@@ -84,14 +83,11 @@ class ADNIE(InfoExtractor):
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(binascii.unhexlify(self._K + 'ab9f52f5baae7c72')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
None, fatal=False)
dec_subtitles = unpad_pkcs7(aes_cbc_decrypt_bytes(
compat_b64decode(enc_subtitles[24:]),
binascii.unhexlify(self._K + 'ab9f52f5baae7c72'),
compat_b64decode(enc_subtitles[:24])))
subtitles_json = self._parse_json(dec_subtitles.decode(), None, fatal=False)
if not subtitles_json:
return None

View File

@@ -31,7 +31,7 @@ class AdobeConnectIE(InfoExtractor):
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'formats': formats,
'is_live': is_live,
}

View File

@@ -10,7 +10,11 @@ from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
qualities,
traverse_obj,
unified_strdate,
unified_timestamp,
update_url_query,
url_or_none,
urlencode_postdata,
xpath_text,
@@ -380,3 +384,96 @@ class AfreecaTVIE(InfoExtractor):
})
return info
class AfreecaTVLiveIE(AfreecaTVIE):
IE_NAME = 'afreecatv:live'
_VALID_URL = r'https?://play\.afreeca(?:tv)?\.com/(?P<id>[^/]+)(?:/(?P<bno>\d+))?'
_TESTS = [{
'url': 'https://play.afreecatv.com/pyh3646/237852185',
'info_dict': {
'id': '237852185',
'ext': 'mp4',
'title': '【 우루과이 오늘은 무슨일이? 】',
'uploader': '박진우[JINU]',
'uploader_id': 'pyh3646',
'timestamp': 1640661495,
'is_live': True,
},
'skip': 'Livestream has ended',
}, {
'url': 'http://play.afreeca.com/pyh3646/237852185',
'only_matching': True,
}, {
'url': 'http://play.afreeca.com/pyh3646',
'only_matching': True,
}]
_LIVE_API_URL = 'https://live.afreecatv.com/afreeca/player_live_api.php'
_QUALITIES = ('sd', 'hd', 'hd2k', 'original')
def _real_extract(self, url):
broadcaster_id, broadcast_no = self._match_valid_url(url).group('id', 'bno')
info = self._download_json(self._LIVE_API_URL, broadcaster_id, fatal=False,
data=urlencode_postdata({'bid': broadcaster_id})) or {}
channel_info = info.get('CHANNEL') or {}
broadcaster_id = channel_info.get('BJID') or broadcaster_id
broadcast_no = channel_info.get('BNO') or broadcast_no
if not broadcast_no:
raise ExtractorError(f'Unable to extract broadcast number ({broadcaster_id} may not be live)', expected=True)
formats = []
quality_key = qualities(self._QUALITIES)
for quality_str in self._QUALITIES:
aid_response = self._download_json(
self._LIVE_API_URL, broadcast_no, fatal=False,
data=urlencode_postdata({
'bno': broadcast_no,
'stream_type': 'common',
'type': 'aid',
'quality': quality_str,
}),
note=f'Downloading access token for {quality_str} stream',
errnote=f'Unable to download access token for {quality_str} stream')
aid = traverse_obj(aid_response, ('CHANNEL', 'AID'))
if not aid:
continue
stream_base_url = channel_info.get('RMD') or 'https://livestream-manager.afreecatv.com'
stream_info = self._download_json(
f'{stream_base_url}/broad_stream_assign.html', broadcast_no, fatal=False,
query={
'return_type': channel_info.get('CDN', 'gcp_cdn'),
'broad_key': f'{broadcast_no}-common-{quality_str}-hls',
},
note=f'Downloading metadata for {quality_str} stream',
errnote=f'Unable to download metadata for {quality_str} stream') or {}
if stream_info.get('view_url'):
formats.append({
'format_id': quality_str,
'url': update_url_query(stream_info['view_url'], {'aid': aid}),
'ext': 'mp4',
'protocol': 'm3u8',
'quality': quality_key(quality_str),
})
self._sort_formats(formats)
station_info = self._download_json(
'https://st.afreecatv.com/api/get_station_status.php', broadcast_no,
query={'szBjId': broadcaster_id}, fatal=False,
note='Downloading channel metadata', errnote='Unable to download channel metadata') or {}
return {
'id': broadcast_no,
'title': channel_info.get('TITLE') or station_info.get('station_title'),
'uploader': channel_info.get('BJNICK') or station_info.get('station_name'),
'uploader_id': broadcaster_id,
'timestamp': unified_timestamp(station_info.get('broad_start')),
'formats': formats,
'is_live': True,
}
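# The qualities() helper from yt_dlp.utils ranks format_ids by their position
# in the tuple, so 'original' sorts above the lower qualities:
from yt_dlp.utils import qualities
quality_key = qualities(('sd', 'hd', 'hd2k', 'original'))
assert quality_key('original') > quality_key('hd2k') > quality_key('sd')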

View File

@@ -33,19 +33,22 @@ class AparatIE(InfoExtractor):
'only_matching': True,
}]
def _parse_options(self, webpage, video_id, fatal=True):
return self._parse_json(self._search_regex(
r'options\s*=\s*({.+?})\s*;', webpage, 'options', default='{}'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
# Provides more metadata
# If available, provides more metadata
webpage = self._download_webpage(url, video_id, fatal=False)
options = self._parse_options(webpage, video_id, fatal=False)
if not webpage:
if not options:
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
options = self._parse_json(self._search_regex(
r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)
video_id, 'Downloading embed webpage')
options = self._parse_options(webpage, video_id)
formats = []
for sources in (options.get('multiSRC') or []):

View File

@@ -3,33 +3,37 @@ from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from .youtube import YoutubeIE
from .youtube import YoutubeIE, YoutubeBaseInfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_HTTPError
)
from ..utils import (
bug_reports_message,
clean_html,
determine_ext,
dict_get,
extract_attributes,
ExtractorError,
get_element_by_id,
HEADRequest,
int_or_none,
join_nonempty,
KNOWN_EXTENSIONS,
merge_dicts,
mimetype2ext,
orderedSet,
parse_duration,
parse_qs,
RegexNotFoundError,
str_to_int,
str_or_none,
traverse_obj,
try_get,
unified_strdate,
unified_timestamp,
urlhandle_detect_ext,
url_or_none
)
@@ -61,7 +65,7 @@ class ArchiveOrgIE(InfoExtractor):
'description': 'md5:43a603fd6c5b4b90d12a96b921212b9c',
'uploader': 'yorkmba99@hotmail.com',
'timestamp': 1387699629,
'upload_date': "20131222",
'upload_date': '20131222',
},
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
@@ -147,8 +151,7 @@ class ArchiveOrgIE(InfoExtractor):
# Archive.org metadata API doesn't clearly demarcate playlist entries
# or subtitle tracks, so we get them from the embeddable player.
embed_page = self._download_webpage(
'https://archive.org/embed/' + identifier, identifier)
embed_page = self._download_webpage(f'https://archive.org/embed/{identifier}', identifier)
playlist = self._playlist_data(embed_page)
entries = {}
@@ -163,17 +166,17 @@ class ArchiveOrgIE(InfoExtractor):
'thumbnails': [],
'artist': p.get('artist'),
'track': p.get('title'),
'subtitles': {}}
'subtitles': {},
}
for track in p.get('tracks', []):
if track['kind'] != 'subtitles':
continue
entries[p['orig']][track['label']] = {
'url': 'https://archive.org/' + track['file'].lstrip('/')}
'url': 'https://archive.org/' + track['file'].lstrip('/')
}
metadata = self._download_json(
'http://archive.org/metadata/' + identifier, identifier)
metadata = self._download_json('http://archive.org/metadata/' + identifier, identifier)
m = metadata['metadata']
identifier = m['identifier']
@@ -186,7 +189,7 @@ class ArchiveOrgIE(InfoExtractor):
'license': m.get('licenseurl'),
'release_date': unified_strdate(m.get('date')),
'timestamp': unified_timestamp(dict_get(m, ['publicdate', 'addeddate'])),
'webpage_url': 'https://archive.org/details/' + identifier,
'webpage_url': f'https://archive.org/details/{identifier}',
'location': m.get('venue'),
'release_year': int_or_none(m.get('year'))}
@@ -204,7 +207,7 @@ class ArchiveOrgIE(InfoExtractor):
'discnumber': int_or_none(f.get('disc')),
'release_year': int_or_none(f.get('year'))})
entry = entries[f['name']]
elif f.get('original') in entries:
elif traverse_obj(f, 'original', expected_type=str) in entries:
entry = entries[f['original']]
else:
continue
@@ -227,13 +230,12 @@ class ArchiveOrgIE(InfoExtractor):
'filesize': int_or_none(f.get('size')),
'protocol': 'https'})
# Sort available formats by filesize
for entry in entries.values():
entry['formats'] = list(sorted(entry['formats'], key=lambda x: x.get('filesize', -1)))
self._sort_formats(entry['formats'])
if len(entries) == 1:
# If there's only one item, use it as the main info dict
only_video = entries[list(entries.keys())[0]]
only_video = next(iter(entries.values()))
if entry_id:
info = merge_dicts(only_video, info)
else:
@@ -258,19 +260,19 @@ class ArchiveOrgIE(InfoExtractor):
class YoutubeWebArchiveIE(InfoExtractor):
IE_NAME = 'web.archive:youtube'
IE_DESC = 'web.archive.org saved youtube videos'
_VALID_URL = r"""(?x)^
(?:https?://)?web\.archive\.org/
(?:web/)?
(?:[0-9A-Za-z_*]+/)? # /web and the version index is optional
(?:https?(?::|%3[Aa])//)?
(?:
(?:\w+\.)?youtube\.com/watch(?:\?|%3[fF])(?:[^\#]+(?:&|%26))?v(?:=|%3[dD]) # Youtube URL
|(wayback-fakeurl\.archive\.org/yt/) # Or the internal fake url
)
(?P<id>[0-9A-Za-z_-]{11})(?:%26|\#|&|$)
"""
IE_DESC = 'web.archive.org saved youtube videos, "ytarchive:" prefix'
_VALID_URL = r'''(?x)(?:(?P<prefix>ytarchive:)|
(?:https?://)?web\.archive\.org/
(?:web/)?(?:(?P<date>[0-9]{14})?[0-9A-Za-z_*]*/)? # /web and the version index is optional
(?:https?(?::|%3[Aa])//)?(?:
(?:\w+\.)?youtube\.com(?::(?:80|443))?/watch(?:\.php)?(?:\?|%3[fF])(?:[^\#]+(?:&|%26))?v(?:=|%3[dD]) # Youtube URL
|(?:wayback-fakeurl\.archive\.org/yt/) # Or the internal fake url
)
)(?P<id>[0-9A-Za-z_-]{11})
(?(prefix)
(?::(?P<date2>[0-9]{14}))?$|
(?:%26|[#&]|$)
)'''
_TESTS = [
{
@@ -278,141 +280,394 @@ class YoutubeWebArchiveIE(InfoExtractor):
'info_dict': {
'id': 'aYAGB11YrSs',
'ext': 'webm',
'title': 'Team Fortress 2 - Sandviches!'
'title': 'Team Fortress 2 - Sandviches!',
'description': 'md5:4984c0f9a07f349fc5d8e82ab7af4eaf',
'upload_date': '20110926',
'uploader': 'Zeurel',
'channel_id': 'UCukCyHaD-bK3in_pKpfH9Eg',
'duration': 32,
'uploader_id': 'Zeurel',
'uploader_url': 'http://www.youtube.com/user/Zeurel'
}
},
{
}, {
# Internal link
'url': 'https://web.archive.org/web/2oe/http://wayback-fakeurl.archive.org/yt/97t7Xj_iBv0',
'info_dict': {
'id': '97t7Xj_iBv0',
'ext': 'mp4',
'title': 'How Flexible Machines Could Save The World'
'title': 'Why Machines That Bend Are Better',
'description': 'md5:00404df2c632d16a674ff8df1ecfbb6c',
'upload_date': '20190312',
'uploader': 'Veritasium',
'channel_id': 'UCHnyfMqiRRG1u-2MsSQLbXA',
'duration': 771,
'uploader_id': '1veritasium',
'uploader_url': 'http://www.youtube.com/user/1veritasium'
}
},
{
# Video from 2012, webm format itag 45.
}, {
# Video from 2012, webm format itag 45. Newest capture is deleted video, with an invalid description.
# Should use the date in the link. Title ends with '- Youtube'. Capture has description in eow-description
'url': 'https://web.archive.org/web/20120712231619/http://www.youtube.com/watch?v=AkhihxRKcrs&gl=US&hl=en',
'info_dict': {
'id': 'AkhihxRKcrs',
'ext': 'webm',
'title': 'Limited Run: Mondo\'s Modern Classic 1 of 3 (SDCC 2012)'
'title': 'Limited Run: Mondo\'s Modern Classic 1 of 3 (SDCC 2012)',
'upload_date': '20120712',
'duration': 398,
'description': 'md5:ff4de6a7980cb65d951c2f6966a4f2f3',
'uploader_id': 'machinima',
'uploader_url': 'http://www.youtube.com/user/machinima'
}
},
{
# Old flash-only video. Webpage title starts with "YouTube - ".
}, {
# FLV video. Video file URL does not provide itag information
'url': 'https://web.archive.org/web/20081211103536/http://www.youtube.com/watch?v=jNQXAC9IVRw',
'info_dict': {
'id': 'jNQXAC9IVRw',
'ext': 'unknown_video',
'title': 'Me at the zoo'
'ext': 'flv',
'title': 'Me at the zoo',
'upload_date': '20050423',
'channel_id': 'UC4QobU6STFB0P71PMvOGN5A',
'duration': 19,
'description': 'md5:10436b12e07ac43ff8df65287a56efb4',
'uploader_id': 'jawed',
'uploader_url': 'http://www.youtube.com/user/jawed'
}
},
{
# Flash video with .flv extension (itag 34). Title has prefix "YouTube -"
# Title has some weird unicode characters too.
}, {
'url': 'https://web.archive.org/web/20110712231407/http://www.youtube.com/watch?v=lTx3G6h2xyA',
'info_dict': {
'id': 'lTx3G6h2xyA',
'ext': 'flv',
'title': 'Madeon - Pop Culture (live mashup)'
'title': 'Madeon - Pop Culture (live mashup)',
'upload_date': '20110711',
'uploader': 'Madeon',
'channel_id': 'UCqMDNf3Pn5L7pcNkuSEeO3w',
'duration': 204,
'description': 'md5:f7535343b6eda34a314eff8b85444680',
'uploader_id': 'itsmadeon',
'uploader_url': 'http://www.youtube.com/user/itsmadeon'
}
},
{ # Some versions of Youtube have have "YouTube" as page title in html (and later rewritten by js).
}, {
# First capture is of dead video, second is the oldest from CDX response.
'url': 'https://web.archive.org/https://www.youtube.com/watch?v=1JYutPM8O6E',
'info_dict': {
'id': '1JYutPM8O6E',
'ext': 'mp4',
'title': 'Fake Teen Doctor Strikes AGAIN! - Weekly Weird News',
'upload_date': '20160218',
'channel_id': 'UCdIaNUarhzLSXGoItz7BHVA',
'duration': 1236,
'description': 'md5:21032bae736421e89c2edf36d1936947',
'uploader_id': 'MachinimaETC',
'uploader_url': 'http://www.youtube.com/user/MachinimaETC'
}
}, {
# First capture of dead video, capture date in link links to dead capture.
'url': 'https://web.archive.org/web/20180803221945/https://www.youtube.com/watch?v=6FPhZJGvf4E',
'info_dict': {
'id': '6FPhZJGvf4E',
'ext': 'mp4',
'title': 'WTF: Video Games Still Launch BROKEN?! - T.U.G.S.',
'upload_date': '20160219',
'channel_id': 'UCdIaNUarhzLSXGoItz7BHVA',
'duration': 798,
'description': 'md5:a1dbf12d9a3bd7cb4c5e33b27d77ffe7',
'uploader_id': 'MachinimaETC',
'uploader_url': 'http://www.youtube.com/user/MachinimaETC'
},
'expected_warnings': [
r'unable to download capture webpage \(it may not be archived\)'
]
}, { # Very old YouTube page, has - YouTube in title.
'url': 'http://web.archive.org/web/20070302011044/http://youtube.com/watch?v=-06-KB9XTzg',
'info_dict': {
'id': '-06-KB9XTzg',
'ext': 'flv',
'title': 'New Coin Hack!! 100% Safe!!'
}
}, {
'url': 'web.archive.org/https://www.youtube.com/watch?v=dWW7qP423y8',
'info_dict': {
'id': 'dWW7qP423y8',
'ext': 'mp4',
'title': 'It\'s Bootleg AirPods Time.',
'upload_date': '20211021',
'channel_id': 'UC7Jwj9fkrf1adN4fMmTkpug',
'channel_url': 'http://www.youtube.com/channel/UC7Jwj9fkrf1adN4fMmTkpug',
'duration': 810,
'description': 'md5:7b567f898d8237b256f36c1a07d6d7bc',
'uploader': 'DankPods',
'uploader_id': 'UC7Jwj9fkrf1adN4fMmTkpug',
'uploader_url': 'http://www.youtube.com/channel/UC7Jwj9fkrf1adN4fMmTkpug'
}
}, {
# player response contains '};' See: https://github.com/ytdl-org/youtube-dl/issues/27093
'url': 'https://web.archive.org/web/20200827003909if_/http://www.youtube.com/watch?v=6Dh-RL__uN4',
'info_dict': {
'id': '6Dh-RL__uN4',
'ext': 'mp4',
'title': 'bitch lasagna',
'upload_date': '20181005',
'channel_id': 'UC-lHJZR3Gqxm24_Vd_AJ5Yw',
'channel_url': 'http://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw',
'duration': 135,
'description': 'md5:2dbe4051feeff2dab5f41f82bb6d11d0',
'uploader': 'PewDiePie',
'uploader_id': 'PewDiePie',
'uploader_url': 'http://www.youtube.com/user/PewDiePie'
}
}, {
'url': 'https://web.archive.org/web/http://www.youtube.com/watch?v=kH-G_aIBlFw',
'info_dict': {
'id': 'kH-G_aIBlFw',
'ext': 'mp4',
'title': 'kH-G_aIBlFw'
},
'expected_warnings': [
'unable to extract title',
]
},
{
# First capture is a 302 redirect intermediary page.
'url': 'https://web.archive.org/web/20050214000000/http://www.youtube.com/watch?v=0altSZ96U4M',
'info_dict': {
'id': '0altSZ96U4M',
'ext': 'mp4',
'title': '0altSZ96U4M'
},
'expected_warnings': [
'unable to extract title',
]
},
{
'only_matching': True
}, {
'url': 'https://web.archive.org/web/20050214000000_if/http://www.youtube.com/watch?v=0altSZ96U4M',
'only_matching': True
}, {
# Video not archived, only capture is unavailable video page
'url': 'https://web.archive.org/web/20210530071008/https://www.youtube.com/watch?v=lHJTf93HL1s&spfreload=10',
'only_matching': True,
},
{ # Encoded url
'only_matching': True
}, { # Encoded url
'url': 'https://web.archive.org/web/20120712231619/http%3A//www.youtube.com/watch%3Fgl%3DUS%26v%3DAkhihxRKcrs%26hl%3Den',
'only_matching': True,
},
{
'only_matching': True
}, {
'url': 'https://web.archive.org/web/20120712231619/http%3A//www.youtube.com/watch%3Fv%3DAkhihxRKcrs%26gl%3DUS%26hl%3Den',
'only_matching': True,
}
'only_matching': True
}, {
'url': 'https://web.archive.org/web/20060527081937/http://www.youtube.com:80/watch.php?v=ELTFsLT73fA&amp;search=soccer',
'only_matching': True
}, {
'url': 'https://web.archive.org/http://www.youtube.com:80/watch?v=-05VVye-ffg',
'only_matching': True
}, {
'url': 'ytarchive:BaW_jenozKc:20050214000000',
'only_matching': True
}, {
'url': 'ytarchive:BaW_jenozKc',
'only_matching': True
},
]
_YT_INITIAL_DATA_RE = r'(?:(?:(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_DATA_RE
_YT_INITIAL_PLAYER_RESPONSE_RE = r'(?:(?:(?:window\s*\[\s*["\']ytInitialPlayerResponse["\']\s*\]|ytInitialPlayerResponse)\s*=[(\s]*({.+?})[)\s]*;)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_PLAYER_RESPONSE_RE
_YT_INITIAL_BOUNDARY_RE = r'(?:(?:var\s+meta|</script|\n)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_BOUNDARY_RE
_YT_DEFAULT_THUMB_SERVERS = ['i.ytimg.com'] # thumbnails most likely archived on these servers
_YT_ALL_THUMB_SERVERS = orderedSet(
_YT_DEFAULT_THUMB_SERVERS + ['img.youtube.com', *[f'{c}{n or ""}.ytimg.com' for c in ('i', 's') for n in (*range(0, 5), 9)]])
_WAYBACK_BASE_URL = 'https://web.archive.org/web/%sif_/'
_OLDEST_CAPTURE_DATE = 20050214000000
_NEWEST_CAPTURE_DATE = 20500101000000
def _call_cdx_api(self, item_id, url, filters: list = None, collapse: list = None, query: dict = None, note='Downloading CDX API JSON'):
# CDX docs: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md
query = {
'url': url,
'output': 'json',
'fl': 'original,mimetype,length,timestamp',
'limit': 500,
'filter': ['statuscode:200'] + (filters or []),
'collapse': collapse or [],
**(query or {})
}
res = self._download_json('https://web.archive.org/cdx/search/cdx', item_id, note, query=query)
if isinstance(res, list) and len(res) >= 2:
# format response to make it easier to use
return list(dict(zip(res[0], v)) for v in res[1:])
elif not isinstance(res, list) or len(res) != 0:
self.report_warning('Error while parsing CDX API response' + bug_reports_message())
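# The CDX server answers with a header row followed by data rows; the zip()
# above turns that into dicts. On a sample payload:
res = [
    ['original', 'mimetype', 'length', 'timestamp'],
    ['http://www.youtube.com/watch?v=aYAGB11YrSs', 'text/html', '12345', '20110927000000'],
]
rows = [dict(zip(res[0], v)) for v in res[1:]]
assert rows[0]['timestamp'] == '20110927000000'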
def _extract_yt_initial_variable(self, webpage, regex, video_id, name):
return self._parse_json(self._search_regex(
(r'%s\s*%s' % (regex, self._YT_INITIAL_BOUNDARY_RE),
regex), webpage, name, default='{}'), video_id, fatal=False)
def _extract_webpage_title(self, webpage):
page_title = self._html_search_regex(
r'<title>([^<]*)</title>', webpage, 'title', default='')
# YouTube video pages appear to always have either 'YouTube -' as prefix or '- YouTube' as suffix.
return self._html_search_regex(
r'(?:YouTube\s*-\s*(.*)$)|(?:(.*)\s*-\s*YouTube$)',
page_title, 'title', default='')
def _extract_metadata(self, video_id, webpage):
search_meta = ((lambda x: self._html_search_meta(x, webpage, default=None)) if webpage else (lambda x: None))
player_response = self._extract_yt_initial_variable(
webpage, self._YT_INITIAL_PLAYER_RESPONSE_RE, video_id, 'initial player response') or {}
initial_data = self._extract_yt_initial_variable(
webpage, self._YT_INITIAL_DATA_RE, video_id, 'initial data') or {}
initial_data_video = traverse_obj(
initial_data, ('contents', 'twoColumnWatchNextResults', 'results', 'results', 'contents', ..., 'videoPrimaryInfoRenderer'),
expected_type=dict, get_all=False, default={})
video_details = traverse_obj(
player_response, 'videoDetails', expected_type=dict, get_all=False, default={})
microformats = traverse_obj(
player_response, ('microformat', 'playerMicroformatRenderer'), expected_type=dict, get_all=False, default={})
video_title = (
video_details.get('title')
or YoutubeBaseInfoExtractor._get_text(microformats, 'title')
or YoutubeBaseInfoExtractor._get_text(initial_data_video, 'title')
or self._extract_webpage_title(webpage)
or search_meta(['og:title', 'twitter:title', 'title']))
channel_id = str_or_none(
video_details.get('channelId')
or microformats.get('externalChannelId')
or search_meta('channelId')
or self._search_regex(
r'data-channel-external-id=(["\'])(?P<id>(?:(?!\1).)+)\1', # @b45a9e6
webpage, 'channel id', default=None, group='id'))
channel_url = f'http://www.youtube.com/channel/{channel_id}' if channel_id else None
duration = int_or_none(
video_details.get('lengthSeconds')
or microformats.get('lengthSeconds')
or parse_duration(search_meta('duration')))
description = (
video_details.get('shortDescription')
or YoutubeBaseInfoExtractor._get_text(microformats, 'description')
or clean_html(get_element_by_id('eow-description', webpage)) # @9e6dd23
or search_meta(['description', 'og:description', 'twitter:description']))
uploader = video_details.get('author')
# Uploader ID and URL
uploader_mobj = re.search(
r'<link itemprop="url" href="(?P<uploader_url>https?://www\.youtube\.com/(?:user|channel)/(?P<uploader_id>[^"]+))">', # @fd05024
webpage)
if uploader_mobj is not None:
uploader_id, uploader_url = uploader_mobj.group('uploader_id'), uploader_mobj.group('uploader_url')
else:
# @a6211d2
uploader_url = url_or_none(microformats.get('ownerProfileUrl'))
uploader_id = self._search_regex(
r'(?:user|channel)/([^/]+)', uploader_url or '', 'uploader id', default=None)
upload_date = unified_strdate(
dict_get(microformats, ('uploadDate', 'publishDate'))
or search_meta(['uploadDate', 'datePublished'])
or self._search_regex(
[r'(?s)id="eow-date.*?>(.*?)</span>',
r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'], # @7998520
webpage, 'upload date', default=None))
return {
'title': video_title,
'description': description,
'upload_date': upload_date,
'uploader': uploader,
'channel_id': channel_id,
'channel_url': channel_url,
'duration': duration,
'uploader_url': uploader_url,
'uploader_id': uploader_id,
}
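# traverse_obj (yt_dlp.utils) walks nested dicts/lists safely: `...` branches
# over every element and get_all=False returns the first match. The
# videoPrimaryInfoRenderer path above, on a toy payload:
from yt_dlp.utils import traverse_obj
data = {'contents': {'twoColumnWatchNextResults': {'results': {'results': {
    'contents': [{'videoPrimaryInfoRenderer': {'title': {'simpleText': 'x'}}}]}}}}}
renderer = traverse_obj(
    data, ('contents', 'twoColumnWatchNextResults', 'results', 'results',
           'contents', ..., 'videoPrimaryInfoRenderer'),
    expected_type=dict, get_all=False, default={})
assert renderer['title']['simpleText'] == 'x'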
def _extract_thumbnails(self, video_id):
try_all = 'thumbnails' in self._configuration_arg('check_all')
thumbnail_base_urls = ['http://{server}/vi{webp}/{video_id}'.format(
webp='_webp' if ext == 'webp' else '', video_id=video_id, server=server)
for server in (self._YT_ALL_THUMB_SERVERS if try_all else self._YT_DEFAULT_THUMB_SERVERS) for ext in (('jpg', 'webp') if try_all else ('jpg',))]
thumbnails = []
for url in thumbnail_base_urls:
response = self._call_cdx_api(
video_id, url, filters=['mimetype:image/(?:webp|jpeg)'],
collapse=['urlkey'], query={'matchType': 'prefix'})
if not response:
continue
thumbnails.extend(
{
'url': (self._WAYBACK_BASE_URL % (int_or_none(thumbnail_dict.get('timestamp')) or self._OLDEST_CAPTURE_DATE)) + thumbnail_dict.get('original'),
'filesize': int_or_none(thumbnail_dict.get('length')),
'preference': int_or_none(thumbnail_dict.get('length'))
} for thumbnail_dict in response)
if not try_all:
break
self._remove_duplicate_formats(thumbnails)
return thumbnails
def _get_capture_dates(self, video_id, url_date):
capture_dates = []
# Note: CDX API will not find watch pages with extra params in the url.
response = self._call_cdx_api(
video_id, f'https://www.youtube.com/watch?v={video_id}',
filters=['mimetype:text/html'], collapse=['timestamp:6', 'digest'], query={'matchType': 'prefix'}) or []
all_captures = sorted([int_or_none(r['timestamp']) for r in response if int_or_none(r['timestamp']) is not None])
# Prefer the new polymer UI captures as we support extracting more metadata from them
# WBM captures seem to all switch to this layout ~July 2020
modern_captures = [x for x in all_captures if x >= 20200701000000]
if modern_captures:
capture_dates.append(modern_captures[0])
capture_dates.append(url_date)
if all_captures:
capture_dates.append(all_captures[0])
if 'captures' in self._configuration_arg('check_all'):
capture_dates.extend(modern_captures + all_captures)
# Fallbacks if any of the above fail
capture_dates.extend([self._OLDEST_CAPTURE_DATE, self._NEWEST_CAPTURE_DATE])
return orderedSet(filter(None, capture_dates))
def _real_extract(self, url):
video_id = self._match_id(url)
title = video_id # if we are not able to get a title
video_id, url_date, url_date_2 = self._match_valid_url(url).group('id', 'date', 'date2')
url_date = url_date or url_date_2
def _extract_title(webpage):
page_title = self._html_search_regex(
r'<title>([^<]*)</title>', webpage, 'title', fatal=False) or ''
# YouTube video pages appear to always have either 'YouTube -' as prefix or '- YouTube' as suffix.
try:
page_title = self._html_search_regex(
r'(?:YouTube\s*-\s*(.*)$)|(?:(.*)\s*-\s*YouTube$)',
page_title, 'title', default='')
except RegexNotFoundError:
page_title = None
if not page_title:
self.report_warning('unable to extract title', video_id=video_id)
return
return page_title
# If the video is no longer available, the oldest capture may be one before it was removed.
# Setting the capture date in the url to an early date seems to redirect to the earliest capture.
webpage = self._download_webpage(
'https://web.archive.org/web/20050214000000/http://www.youtube.com/watch?v=%s' % video_id,
video_id=video_id, fatal=False, errnote='unable to download video webpage (probably not archived).')
if webpage:
title = _extract_title(webpage) or title
# Use link translator mentioned in https://github.com/ytdl-org/youtube-dl/issues/13655
internal_fake_url = 'https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/%s' % video_id
urlh = None
try:
urlh = self._request_webpage(
HEADRequest(internal_fake_url), video_id,
note='Fetching archived video file url', expected_status=True)
except ExtractorError as e:
# HTTP Error 404 is expected if the video is not saved.
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
self.raise_no_formats(
'The requested video is not archived, indexed, or there is an issue with web.archive.org',
expected=True)
else:
raise
capture_dates = self._get_capture_dates(video_id, int_or_none(url_date))
self.write_debug('Captures to try: ' + join_nonempty(*capture_dates, delim=', '))
info = {'id': video_id}
for capture in capture_dates:
webpage = self._download_webpage(
(self._WAYBACK_BASE_URL + 'http://www.youtube.com/watch?v=%s') % (capture, video_id),
video_id=video_id, fatal=False, errnote='unable to download capture webpage (it may not be archived)',
note='Downloading capture webpage')
current_info = self._extract_metadata(video_id, webpage or '')
# Try to avoid getting deleted video metadata
if current_info.get('title'):
info = merge_dicts(info, current_info)
if 'captures' not in self._configuration_arg('check_all'):
break
info['thumbnails'] = self._extract_thumbnails(video_id)
if urlh:
url = compat_urllib_parse_unquote(urlh.geturl())
video_file_url_qs = parse_qs(url)
# Attempt to recover any ext & format info from playback url & response headers
format = {'url': url, 'filesize': int_or_none(urlh.headers.get('x-archive-orig-content-length'))}
itag = try_get(video_file_url_qs, lambda x: x['itag'][0])
if itag and itag in YoutubeIE._formats:
format.update(YoutubeIE._formats[itag])
format.update({'format_id': itag})
else:
mime = try_get(video_file_url_qs, lambda x: x['mime'][0])
ext = (mimetype2ext(mime)
or urlhandle_detect_ext(urlh)
or mimetype2ext(urlh.headers.get('x-archive-guessed-content-type')))
format.update({'ext': ext})
info['formats'] = [format]
if not info.get('duration'):
info['duration'] = str_to_int(try_get(video_file_url_qs, lambda x: x['dur'][0]))
if not info.get('title'):
info['title'] = video_id
return info
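A minimal sketch of the format-recovery idea used above, outside the extractor: the playback URL the Wayback Machine redirects to still carries itag, mime and dur query parameters, which is enough to label a format. The URL below is a made-up example; only urllib is assumed:

    from urllib.parse import parse_qs, unquote, urlparse

    def recover_format_info(playback_url):
        qs = parse_qs(urlparse(unquote(playback_url)).query)
        first = lambda key: qs.get(key, [None])[0]
        return {
            'itag': first('itag'),  # resolvable against a known itag -> format table
            'ext': (first('mime') or '').partition('/')[2] or None,
            'duration': float(first('dur')) if first('dur') else None,
        }

    print(recover_format_info(
        'https://r1.example.googlevideo.com/videoplayback?itag=22&mime=video%2Fmp4&dur=308.067'))
    # {'itag': '22', 'ext': 'mp4', 'duration': 308.067}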


@@ -158,7 +158,7 @@ class ArcPublishingIE(InfoExtractor):
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'title': title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,


@@ -280,7 +280,7 @@ class ARDMediathekIE(ARDMediathekBaseIE):
info.update({
'id': video_id,
'title': self._live_title(title) if info.get('is_live') else title,
'title': title,
'description': description,
'thumbnail': thumbnail,
})
@@ -376,9 +376,24 @@ class ARDIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
_SUB_FORMATS = (
('./dataTimedText', 'ttml'),
('./dataTimedTextNoOffset', 'ttml'),
('./dataTimedTextVtt', 'vtt'),
)
subtitles = {}
for subsel, subext in _SUB_FORMATS:
for node in video_node.findall(subsel):
subtitles.setdefault('de', []).append({
'url': node.attrib['url'],
'ext': subext,
})
return {
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
'title': video_node.find('./title').text,
'duration': parse_duration(video_node.find('./duration').text),


@@ -7,6 +7,7 @@ from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
format_field,
float_or_none,
int_or_none,
parse_iso8601,
@@ -92,7 +93,7 @@ class ArnesIE(InfoExtractor):
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'channel_url': format_field(channel_id, template=f'{self._BASE_URL}/?channel=%s'),
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
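The arnes change above swaps a manual `if channel_id else None` ternary for format_field. A toy version showing the behaviour relied on here (the real yt_dlp.utils helper takes more parameters):

    def format_field(value, template='%s', default=None):
        # apply the template only when the value is present,
        # otherwise return a default instead of formatting None
        return template % value if value not in (None, '') else default

    base = 'https://video.arnes.si'
    print(format_field('abc123', template=base + '/?channel=%s'))  # formatted URL
    print(format_field(None, template=base + '/?channel=%s'))      # None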


@@ -14,7 +14,7 @@ from ..utils import (
class AudiomackIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/song/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack'
_TESTS = [
# hosted on audiomack
@@ -39,15 +39,16 @@ class AudiomackIE(InfoExtractor):
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
},
'skip': 'Song has been removed from the site',
},
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/song/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/song/', '/')
# Request the extended version of the api for extra fields like artist and title
api_response = self._download_json(
@@ -73,13 +74,13 @@ class AudiomackIE(InfoExtractor):
class AudiomackAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/album/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:album/|(?=.+/album/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack:album'
_TESTS = [
# Standard album playlist
{
'url': 'http://www.audiomack.com/album/flytunezcom/tha-tour-part-2-mixtape',
'playlist_count': 15,
'playlist_count': 11,
'info_dict':
{
'id': '812251',
@@ -95,24 +96,27 @@ class AudiomackAlbumIE(InfoExtractor):
},
'playlist': [{
'info_dict': {
'title': 'PPP (Pistol P Project) - 9. Heaven or Hell (CHIMACA) ft Zuse (prod by DJ FU)',
'id': '837577',
'title': 'PPP (Pistol P Project) - 8. Real (prod by SYK SENSE )',
'id': '837576',
'ext': 'mp3',
'uploader': 'Lil Herb a.k.a. G Herbo',
}
}, {
'info_dict': {
'title': 'PPP (Pistol P Project) - 10. 4 Minutes Of Hell Part 4 (prod by DY OF 808 MAFIA)',
'id': '837580',
'ext': 'mp3',
'uploader': 'Lil Herb a.k.a. G Herbo',
}
}],
'params': {
'playliststart': 9,
'playlistend': 9,
}
}
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/album/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/album/', '/')
result = {'_type': 'playlist', 'entries': []}
# There is no one endpoint for album metadata - instead it is included/repeated in each song's metadata
# Therefore we don't know how many songs the album has and must loop indefinitely until failure
@@ -134,7 +138,7 @@ class AudiomackAlbumIE(InfoExtractor):
# Pull out the album metadata and add to result (if it exists)
for resultkey, apikey in [('id', 'album_id'), ('title', 'album_title')]:
if apikey in api_response and resultkey not in result:
result[resultkey] = api_response[apikey]
result[resultkey] = compat_str(api_response[apikey])
song_id = url_basename(api_response['url']).rpartition('.')[0]
result['entries'].append({
'id': compat_str(api_response.get('id', song_id)),
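Since there is no endpoint reporting the track count, the album extractor above keeps requesting successive tracks until the API stops answering. The shape of that loop, with fetch_song standing in for the per-track API request:

    import itertools

    def album_entries(fetch_song):
        for track_no in itertools.count():
            song = fetch_song(track_no)
            if not song:  # API failure marks the end of the album
                break
            yield song

    fake_api = {0: 'intro', 1: 'track a', 2: 'track b'}
    print(list(album_entries(lambda n: fake_api.get(n))))  # ['intro', 'track a', 'track b']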


@@ -9,6 +9,7 @@ from ..compat import (
compat_str,
)
from ..utils import (
format_field,
int_or_none,
parse_iso8601,
smuggle_url,
@@ -41,9 +42,9 @@ class AWAANBaseIE(InfoExtractor):
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
'thumbnail': format_field(img, template='http://admin.mangomolo.com/analytics/%s'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,


@@ -1,5 +1,6 @@
# coding: utf-8
import base64
import hashlib
import itertools
import functools
@@ -16,17 +17,18 @@ from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
mimetype2ext,
parse_iso8601,
traverse_obj,
try_get,
parse_count,
smuggle_url,
srt_subtitles_timecode,
str_or_none,
str_to_int,
strip_jsonp,
unified_timestamp,
unsmuggle_url,
urlencode_postdata,
url_or_none,
OnDemandPagedList
)
@@ -50,16 +52,14 @@ class BiliBiliIE(InfoExtractor):
'url': 'http://www.bilibili.com/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'info_dict': {
'id': '1074402',
'ext': 'flv',
'id': '1074402_part1',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'timestamp': 1398012678,
'upload_date': '20140420',
'thumbnail': r're:^https?://.+\.jpg',
'uploader': '菊子桑',
'uploader_id': '156160',
'uploader': '菊子桑',
'upload_date': '20140420',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'timestamp': 1398012678,
},
}, {
# Tested in BiliBiliBangumiIE
@@ -73,49 +73,27 @@ class BiliBiliIE(InfoExtractor):
'url': 'http://bangumi.bilibili.com/anime/5802/play#100643',
'md5': '3f721ad1e75030cc06faf73587cfec57',
'info_dict': {
'id': '100643',
'id': '100643_part1',
'ext': 'mp4',
'title': 'CHAOS;CHILD',
'description': '如果你是神明并且能够让妄想成为现实。那你会进行怎么样的妄想是淫靡的世界独裁社会毁灭性的制裁还是……2015年涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
},
'skip': 'Geo-restricted to China',
}, {
# Title with double quotes
'url': 'http://www.bilibili.com/video/av8903802/',
'info_dict': {
'id': '8903802',
'id': '8903802_part1',
'ext': 'mp4',
'title': '阿滴英文|英文歌分享#6 "Closer',
'upload_date': '20170301',
'description': '滴妹今天唱Closer給你聽! 有史以来,被推最多次也是最久的歌曲,其实歌词跟我原本想像差蛮多的,不过还是好听! 微博@阿滴英文',
'timestamp': 1488382634,
'uploader_id': '65880958',
'uploader': '阿滴英文',
},
'params': {
'skip_download': True,
},
'playlist': [{
'info_dict': {
'id': '8903802_part1',
'ext': 'flv',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382634,
'upload_date': '20170301',
},
'params': {
'skip_download': True,
},
}, {
'info_dict': {
'id': '8903802_part2',
'ext': 'flv',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382634,
'upload_date': '20170301',
},
'params': {
'skip_download': True,
},
}]
}, {
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
@@ -150,6 +128,7 @@ class BiliBiliIE(InfoExtractor):
av_id, bv_id = self._get_video_id_set(video_id, mobj.group('id_bv') is not None)
video_id = av_id
info = {}
anime_id = mobj.group('anime_id')
page_id = mobj.group('page')
webpage = self._download_webpage(url, video_id)
@@ -201,35 +180,48 @@ class BiliBiliIE(InfoExtractor):
}
headers.update(self.geo_verification_headers())
video_info = self._parse_json(
self._search_regex(r'window.__playinfo__\s*=\s*({.+?})</script>', webpage, 'video info', default=None) or '{}',
video_id, fatal=False)
video_info = video_info.get('data') or {}
durl = traverse_obj(video_info, ('dash', 'video'))
audios = traverse_obj(video_info, ('dash', 'audio')) or []
entries = []
RENDITIONS = ('qn=80&quality=80&type=', 'quality=2&type=mp4')
for num, rendition in enumerate(RENDITIONS, start=1):
payload = 'appkey=%s&cid=%s&otype=json&%s' % (self._APP_KEY, cid, rendition)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
video_info = self._download_json(
'http://interface.bilibili.com/v2/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page',
headers=headers, fatal=num == len(RENDITIONS))
if not video_info:
continue
video_info = self._download_json(
'http://interface.bilibili.com/v2/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page',
headers=headers, fatal=num == len(RENDITIONS))
if not video_info:
continue
if 'durl' not in video_info:
if not durl and 'durl' not in video_info:
if num < len(RENDITIONS):
continue
self._report_error(video_info)
for idx, durl in enumerate(video_info['durl']):
formats = [{
'url': durl['url'],
'filesize': int_or_none(durl['size']),
}]
for backup_url in durl.get('backup_url', []):
formats = []
for idx, durl in enumerate(durl or video_info['durl']):
formats.append({
'url': durl.get('baseUrl') or durl.get('base_url') or durl.get('url'),
'ext': mimetype2ext(durl.get('mimeType') or durl.get('mime_type')),
'fps': int_or_none(durl.get('frameRate') or durl.get('frame_rate')),
'width': int_or_none(durl.get('width')),
'height': int_or_none(durl.get('height')),
'vcodec': durl.get('codecs'),
'acodec': 'none' if audios else None,
'tbr': float_or_none(durl.get('bandwidth'), scale=1000),
'filesize': int_or_none(durl.get('size')),
})
for backup_url in traverse_obj(durl, 'backup_url', expected_type=list) or []:
formats.append({
'url': backup_url,
# backup URLs have lower priorities
'quality': -2 if 'hd.mp4' in backup_url else -3,
})
@@ -237,30 +229,47 @@ class BiliBiliIE(InfoExtractor):
a_format.setdefault('http_headers', {}).update({
'Referer': url,
})
self._sort_formats(formats)
entries.append({
'id': '%s_part%s' % (video_id, idx),
'duration': float_or_none(durl.get('length'), 1000),
'formats': formats,
for audio in audios:
formats.append({
'url': audio.get('baseUrl') or audio.get('base_url') or audio.get('url'),
'ext': mimetype2ext(audio.get('mimeType') or audio.get('mime_type')),
'fps': int_or_none(audio.get('frameRate') or audio.get('frame_rate')),
'width': int_or_none(audio.get('width')),
'height': int_or_none(audio.get('height')),
'acodec': audio.get('codecs'),
'vcodec': 'none',
'tbr': float_or_none(audio.get('bandwidth'), scale=1000),
'filesize': int_or_none(audio.get('size'))
})
for backup_url in traverse_obj(audio, 'backup_url', expected_type=list) or []:
formats.append({
'url': backup_url,
# backup URLs have lower priorities
'quality': -3,
})
info.update({
'id': video_id,
'duration': float_or_none(durl.get('length'), 1000),
'formats': formats,
})
break
title = self._html_search_regex(
(r'<h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
r'(?s)<h1[^>]*>(?P<title>.+?)</h1>'), webpage, 'title',
group='title')
self._sort_formats(formats)
title = self._html_search_regex((
r'<h1[^>]+title=(["\'])(?P<content>[^"\']+)',
r'(?s)<h1[^>]*>(?P<content>.+?)</h1>',
self._meta_regex('title')
), webpage, 'title', group='content', fatal=False)
# Get part title for anthologies
if page_id is not None:
# TODO: The json is already downloaded by _extract_anthology_entries. Don't redownload for each video
part_title = try_get(
self._download_json(
f'https://api.bilibili.com/x/player/pagelist?bvid={bv_id}&jsonp=jsonp',
video_id, note='Extracting videos in anthology'),
lambda x: x['data'][int(page_id) - 1]['part'])
title = part_title or title
# TODO: The json is already downloaded by _extract_anthology_entries. Don't redownload for each video.
part_info = traverse_obj(self._download_json(
f'https://api.bilibili.com/x/player/pagelist?bvid={bv_id}&jsonp=jsonp',
video_id, note='Extracting videos in anthology'), 'data', expected_type=list)
title = title if len(part_info) == 1 else traverse_obj(part_info, (int(page_id) - 1, 'part')) or title
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
@@ -270,15 +279,15 @@ class BiliBiliIE(InfoExtractor):
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
# TODO 'view_count' requires deobfuscating Javascript
info = {
'id': str(video_id) if page_id is None else '%s_part%s' % (video_id, page_id),
info.update({
'id': f'{video_id}_part{page_id or 1}',
'cid': cid,
'title': title,
'description': description,
'timestamp': timestamp,
'thumbnail': thumbnail,
'duration': float_or_none(video_info.get('timelength'), scale=1000),
}
})
uploader_mobj = re.search(
r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]*>\s*(?P<name>[^<]+?)\s*<',
@@ -299,7 +308,7 @@ class BiliBiliIE(InfoExtractor):
video_id, fatal=False, note='Downloading tags'), ('data', ..., 'tag_name')),
}
entries[0]['subtitles'] = {
info['subtitles'] = {
'danmaku': [{
'ext': 'xml',
'url': f'https://comment.bilibili.com/{cid}.xml',
@@ -334,12 +343,10 @@ class BiliBiliIE(InfoExtractor):
entry['id'] = '%s_part%d' % (video_id, (idx + 1))
return {
'_type': 'multi_video',
'id': str(video_id),
'bv_id': bv_id,
'title': title,
'description': description,
'entries': entries,
**info, **top_level_info
}
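The rewritten bilibili code starts from the site's DASH playinfo, where video and audio arrive as separate streams. Marking video-only entries with acodec 'none' (and vice versa) is what lets the downloader pair and merge them later. A reduced sketch of that mapping over a made-up subset of the fields:

    def dash_to_formats(dash):
        formats = []
        for v in dash.get('video') or []:
            formats.append({'url': v['baseUrl'], 'vcodec': v.get('codecs'),
                            'acodec': 'none' if dash.get('audio') else None})
        for a in dash.get('audio') or []:
            formats.append({'url': a['baseUrl'], 'acodec': a.get('codecs'),
                            'vcodec': 'none'})
        return formats

    print(dash_to_formats({
        'video': [{'baseUrl': 'https://example.com/v.m4s', 'codecs': 'avc1.64001F'}],
        'audio': [{'baseUrl': 'https://example.com/a.m4s', 'codecs': 'mp4a.40.2'}],
    }))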
@@ -480,9 +487,9 @@ class BilibiliChannelIE(InfoExtractor):
data = self._download_json(
self._API_URL % (list_id, page_num), list_id, note=f'Downloading page {page_num}')['data']
max_count = max_count or try_get(data, lambda x: x['page']['count'])
max_count = max_count or traverse_obj(data, ('page', 'count'))
entries = try_get(data, lambda x: x['list']['vlist'])
entries = traverse_obj(data, ('list', 'vlist'))
if not entries:
return
for entry in entries:
@@ -520,7 +527,7 @@ class BilibiliCategoryIE(InfoExtractor):
api_url, query, query={'Search_key': query, 'pn': page_num},
note='Extracting results from page %s of %s' % (page_num, num_pages))
video_list = try_get(parsed_json, lambda x: x['data']['archives'], list)
video_list = traverse_obj(parsed_json, ('data', 'archives'), expected_type=list)
if not video_list:
raise ExtractorError('Failed to retrieve video list for page %d' % page_num)
@@ -550,7 +557,7 @@ class BilibiliCategoryIE(InfoExtractor):
api_url = 'https://api.bilibili.com/x/web-interface/newlist?rid=%d&type=1&ps=20&jsonp=jsonp' % rid_value
page_json = self._download_json(api_url, query, query={'Search_key': query, 'pn': '1'})
page_data = try_get(page_json, lambda x: x['data']['page'], dict)
page_data = traverse_obj(page_json, ('data', 'page'), expected_type=dict)
count, size = int_or_none(page_data.get('count')), int_or_none(page_data.get('size'))
if count is None or not size:
raise ExtractorError('Failed to calculate either page count or size')
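Several hunks above replace try_get(x, lambda x: x['data']['page']) with traverse_obj(x, ('data', 'page')). A toy dict-path walker showing the behaviour these call sites rely on; the real utility supports much more (branching with ..., get_all, and so on):

    def traverse_path(obj, path, expected_type=None):
        for key in path:
            if not isinstance(obj, dict):
                return None
            obj = obj.get(key)
        if expected_type is not None and not isinstance(obj, expected_type):
            return None
        return obj

    data = {'data': {'page': {'count': 42}}}
    print(traverse_path(data, ('data', 'page', 'count')))     # 42
    print(traverse_path(data, ('data', 'missing', 'count')))  # None, not KeyError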
@@ -722,40 +729,57 @@ class BiliBiliPlayerIE(InfoExtractor):
class BiliIntlBaseIE(InfoExtractor):
_API_URL = 'https://api.bili{}/intl/gateway{}'
_API_URL = 'https://api.bilibili.tv/intl/gateway'
_NETRC_MACHINE = 'biliintl'
def _call_api(self, type, endpoint, id):
return self._download_json(self._API_URL.format(type, endpoint), id)['data']
def _call_api(self, endpoint, *args, **kwargs):
json = self._download_json(self._API_URL + endpoint, *args, **kwargs)
if json.get('code'):
if json['code'] in (10004004, 10004005, 10023006):
self.raise_login_required()
elif json['code'] == 10004001:
self.raise_geo_restricted()
else:
if json.get('message') and str(json['code']) != json['message']:
errmsg = f'{kwargs.get("errnote", "Unable to download JSON metadata")}: {self.IE_NAME} said: {json["message"]}'
else:
errmsg = kwargs.get('errnote', 'Unable to download JSON metadata')
if kwargs.get('fatal'):
raise ExtractorError(errmsg)
else:
self.report_warning(errmsg)
return json.get('data')
def json2srt(self, json):
data = '\n\n'.join(
f'{i + 1}\n{srt_subtitles_timecode(line["from"])} --> {srt_subtitles_timecode(line["to"])}\n{line["content"]}'
for i, line in enumerate(json['body']))
for i, line in enumerate(json['body']) if line.get('content'))
return data
def _get_subtitles(self, type, ep_id):
sub_json = self._call_api(type, f'/m/subtitle?ep_id={ep_id}&platform=web', ep_id)
def _get_subtitles(self, ep_id):
sub_json = self._call_api(f'/web/v2/subtitle?episode_id={ep_id}&platform=web', ep_id)
subtitles = {}
for sub in sub_json.get('subtitles', []):
for sub in sub_json.get('subtitles') or []:
sub_url = sub.get('url')
if not sub_url:
continue
sub_data = self._download_json(sub_url, ep_id, fatal=False)
sub_data = self._download_json(
sub_url, ep_id, errnote='Unable to download subtitles', fatal=False,
note='Downloading subtitles%s' % f' for {sub["lang"]}' if sub.get('lang') else '')
if not sub_data:
continue
subtitles.setdefault(sub.get('key', 'en'), []).append({
subtitles.setdefault(sub.get('lang_key', 'en'), []).append({
'ext': 'srt',
'data': self.json2srt(sub_data)
})
return subtitles
def _get_formats(self, type, ep_id):
video_json = self._call_api(type, f'/web/playurl?ep_id={ep_id}&platform=web', ep_id)
if not video_json:
self.raise_login_required(method='cookies')
def _get_formats(self, ep_id):
video_json = self._call_api(f'/web/playurl?ep_id={ep_id}&platform=web', ep_id,
note='Downloading video formats', errnote='Unable to download video formats')
video_json = video_json['playurl']
formats = []
for vid in video_json.get('video', []):
for vid in video_json.get('video') or []:
video_res = vid.get('video_resource') or {}
video_info = vid.get('stream_info') or {}
if not video_res.get('url'):
@@ -771,7 +795,7 @@ class BiliIntlBaseIE(InfoExtractor):
'vcodec': video_res.get('codecs'),
'filesize': video_res.get('size'),
})
for aud in video_json.get('audio_resource', []):
for aud in video_json.get('audio_resource') or []:
if not aud.get('url'):
continue
formats.append({
@@ -786,85 +810,144 @@ class BiliIntlBaseIE(InfoExtractor):
self._sort_formats(formats)
return formats
def _extract_ep_info(self, type, episode_data, ep_id):
def _extract_ep_info(self, episode_data, ep_id):
return {
'id': ep_id,
'title': episode_data.get('long_title') or episode_data['title'],
'title': episode_data.get('title_display') or episode_data['title'],
'thumbnail': episode_data.get('cover'),
'episode_number': str_to_int(episode_data.get('title')),
'formats': self._get_formats(type, ep_id),
'subtitles': self._get_subtitles(type, ep_id),
'episode_number': int_or_none(self._search_regex(
r'^E(\d+)(?:$| - )', episode_data.get('title_display'), 'episode number', default=None)),
'formats': self._get_formats(ep_id),
'subtitles': self._get_subtitles(ep_id),
'extractor_key': BiliIntlIE.ie_key(),
}
def _login(self):
username, password = self._get_login_info()
if username is None:
return
try:
from Cryptodome.PublicKey import RSA
from Cryptodome.Cipher import PKCS1_v1_5
except ImportError:
try:
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_v1_5
except ImportError:
raise ExtractorError('pycryptodomex not found. Please install', expected=True)
key_data = self._download_json(
'https://passport.bilibili.tv/x/intl/passport-login/web/key?lang=en-US', None,
note='Downloading login key', errnote='Unable to download login key')['data']
public_key = RSA.importKey(key_data['key'])
password_hash = PKCS1_v1_5.new(public_key).encrypt((key_data['hash'] + password).encode('utf-8'))
login_post = self._download_json(
'https://passport.bilibili.tv/x/intl/passport-login/web/login/password?lang=en-US', None, data=urlencode_postdata({
'username': username,
'password': base64.b64encode(password_hash).decode('ascii'),
'keep_me': 'true',
's_locale': 'en_US',
'isTrusted': 'true'
}), note='Logging in', errnote='Unable to log in')
if login_post.get('code'):
if login_post.get('message'):
raise ExtractorError(f'Unable to log in: {self.IE_NAME} said: {login_post["message"]}', expected=True)
else:
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()
class BiliIntlIE(BiliIntlBaseIE):
_VALID_URL = r'https?://(?:www\.)?bili(?P<type>bili\.tv|intl.com)/(?:[a-z]{2}/)?play/(?P<season_id>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?bili(?:bili\.tv|intl\.com)/(?:[a-z]{2}/)?play/(?P<season_id>\d+)/(?P<id>\d+)'
_TESTS = [{
# Bstation page
'url': 'https://www.bilibili.tv/en/play/34613/341736',
'info_dict': {
'id': '341736',
'ext': 'mp4',
'title': 'The First Night',
'thumbnail': 'https://i0.hdslb.com/bfs/intl/management/91e30e5521235d9b163339a26a0b030ebda54310.png',
'title': 'E2 - The First Night',
'thumbnail': r're:^https://pic\.bstarstatic\.com/ogv/.+\.png$',
'episode_number': 2,
}
}, {
# Non-Bstation page
'url': 'https://www.bilibili.tv/en/play/1033760/11005006',
'info_dict': {
'id': '11005006',
'ext': 'mp4',
'title': 'E3 - Who?',
'thumbnail': r're:^https://pic\.bstarstatic\.com/ogv/.+\.png$',
'episode_number': 3,
}
}, {
# Subtitle with empty content
'url': 'https://www.bilibili.tv/en/play/1005144/10131790',
'info_dict': {
'id': '10131790',
'ext': 'mp4',
'title': 'E140 - Two Heartbeats: Kabuto\'s Trap',
'thumbnail': r're:^https://pic\.bstarstatic\.com/ogv/.+\.png$',
'episode_number': 140,
},
'params': {
'format': 'bv',
},
'skip': 'According to the copyright owner\'s request, you may only watch the video after you log in.'
}, {
'url': 'https://www.biliintl.com/en/play/34613/341736',
'info_dict': {
'id': '341736',
'ext': 'mp4',
'title': 'The First Night',
'thumbnail': 'https://i0.hdslb.com/bfs/intl/management/91e30e5521235d9b163339a26a0b030ebda54310.png',
'episode_number': 2,
},
'params': {
'format': 'bv',
},
'only_matching': True,
}]
def _real_extract(self, url):
type, season_id, id = self._match_valid_url(url).groups()
data_json = self._call_api(type, f'/web/view/ogv_collection?season_id={season_id}', id)
episode_data = next(
episode for episode in data_json.get('episodes', [])
if str(episode.get('ep_id')) == id)
return self._extract_ep_info(type, episode_data, id)
season_id, video_id = self._match_valid_url(url).groups()
webpage = self._download_webpage(url, video_id)
# Bstation layout
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), video_id, fatal=False) or {}
episode_data = traverse_obj(initial_data, ('OgvVideo', 'epDetail'), expected_type=dict)
if not episode_data:
# Non-Bstation layout, read through episode list
season_json = self._call_api(f'/web/v2/ogv/play/episodes?season_id={season_id}&platform=web', video_id)
episode_data = next(
episode for episode in traverse_obj(season_json, ('sections', ..., 'episodes', ...), expected_type=dict)
if str(episode.get('episode_id')) == video_id)
return self._extract_ep_info(episode_data, video_id)
class BiliIntlSeriesIE(BiliIntlBaseIE):
_VALID_URL = r'https?://(?:www\.)?bili(?P<type>bili\.tv|intl.com)/(?:[a-z]{2}/)?play/(?P<id>\d+)$'
_VALID_URL = r'https?://(?:www\.)?bili(?:bili\.tv|intl\.com)/(?:[a-z]{2}/)?play/(?P<id>\d+)$'
_TESTS = [{
'url': 'https://www.bilibili.tv/en/play/34613',
'playlist_mincount': 15,
'info_dict': {
'id': '34613',
'title': 'Fly Me to the Moon',
'description': 'md5:a861ee1c4dc0acfad85f557cc42ac627',
'categories': ['Romance', 'Comedy', 'Slice of life'],
'thumbnail': r're:^https://pic\.bstarstatic\.com/ogv/.+\.png$',
'view_count': int,
},
'params': {
'skip_download': True,
'format': 'bv',
},
}, {
'url': 'https://www.biliintl.com/en/play/34613',
'playlist_mincount': 15,
'info_dict': {
'id': '34613',
},
'params': {
'skip_download': True,
'format': 'bv',
},
'only_matching': True,
}]
def _entries(self, id, type):
data_json = self._call_api(type, f'/web/view/ogv_collection?season_id={id}', id)
for episode in data_json.get('episodes', []):
episode_id = str(episode.get('ep_id'))
yield self._extract_ep_info(type, episode, episode_id)
def _entries(self, series_id):
series_json = self._call_api(f'/web/v2/ogv/play/episodes?season_id={series_id}&platform=web', series_id)
for episode in traverse_obj(series_json, ('sections', ..., 'episodes', ...), expected_type=dict, default=[]):
episode_id = str(episode.get('episode_id'))
yield self._extract_ep_info(episode, episode_id)
def _real_extract(self, url):
type, id = self._match_valid_url(url).groups()
return self.playlist_result(self._entries(id, type), playlist_id=id)
series_id = self._match_id(url)
series_info = self._call_api(f'/web/v2/ogv/play/season_info?season_id={series_id}&platform=web', series_id).get('season') or {}
return self.playlist_result(
self._entries(series_id), series_id, series_info.get('title'), series_info.get('description'),
categories=traverse_obj(series_info, ('styles', ..., 'title'), expected_type=str_or_none),
thumbnail=url_or_none(series_info.get('horizontal_cover')), view_count=parse_count(series_info.get('view')))
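json2srt above converts the site's JSON subtitle body to SRT, now skipping cues with empty content. A standalone sketch with its own timecode helper (the real code uses srt_subtitles_timecode from yt_dlp.utils); unlike the original, it renumbers the surviving cues consecutively:

    def srt_timecode(seconds):
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3600000)
        m, ms = divmod(ms, 60000)
        s, ms = divmod(ms, 1000)
        return f'{h:02d}:{m:02d}:{s:02d},{ms:03d}'  # 12.5 -> '00:00:12,500'

    def json2srt(body):
        return '\n\n'.join(
            f'{i + 1}\n{srt_timecode(line["from"])} --> {srt_timecode(line["to"])}\n{line["content"]}'
            for i, line in enumerate(l for l in body if l.get('content')))

    print(json2srt([{'from': 0.0, 'to': 2.5, 'content': 'Hello'},
                    {'from': 2.5, 'to': 4.0, 'content': ''}]))  # empty cue dropped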


@@ -51,7 +51,7 @@ class BitwaveStreamIE(InfoExtractor):
return {
'id': username,
'title': self._live_title(channel['data']['title']),
'title': channel['data']['title'],
'uploader': username,
'uploader_id': username,
'formats': formats,


@@ -49,7 +49,7 @@ class BongaCamsIE(InfoExtractor):
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'title': uploader or uploader_id,
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,


@@ -16,6 +16,7 @@ from ..compat import (
)
from ..utils import (
clean_html,
dict_get,
extract_attributes,
ExtractorError,
find_xpath_attr,
@@ -471,32 +472,22 @@ class BrightcoveNewIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
num_drm_sources = 0
formats, subtitles = [], {}
sources = json_data.get('sources') or []
for source in sources:
container = source.get('container')
ext = mimetype2ext(source.get('type'))
src = source.get('src')
skip_unplayable = not self.get_param('allow_unplayable_formats')
# https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if skip_unplayable and (container == 'WVM' or source.get('key_systems')):
num_drm_sources += 1
continue
elif ext == 'ism' and skip_unplayable:
continue
elif ext == 'm3u8' or container == 'M2TS':
if ext == 'm3u8' or container == 'M2TS':
if not src:
continue
f, subs = self._extract_m3u8_formats_and_subtitles(
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(f)
subtitles = self._merge_subtitles(subtitles, subs)
elif ext == 'mpd':
if not src:
continue
f, subs = self._extract_mpd_formats_and_subtitles(src, video_id, 'dash', fatal=False)
formats.extend(f)
fmts, subs = self._extract_mpd_formats_and_subtitles(src, video_id, 'dash', fatal=False)
subtitles = self._merge_subtitles(subtitles, subs)
else:
streaming_src = source.get('streaming_src')
@@ -543,7 +534,13 @@ class BrightcoveNewIE(AdobePassIE):
'play_path': stream_name,
'format_id': build_format_id('rtmp'),
})
formats.append(f)
fmts = [f]
# https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if container == 'WVM' or source.get('key_systems') or ext == 'ism':
for f in fmts:
f['has_drm'] = True
formats.extend(fmts)
if not formats:
errors = json_data.get('errors')
@@ -551,9 +548,6 @@ class BrightcoveNewIE(AdobePassIE):
error = errors[0]
self.raise_no_formats(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
elif (not self.get_param('allow_unplayable_formats')
and sources and num_drm_sources == len(sources)):
self.report_drm(video_id)
self._sort_formats(formats)
@@ -577,11 +571,19 @@ class BrightcoveNewIE(AdobePassIE):
if duration is not None and duration <= 0:
is_live = True
common_res = [(160, 90), (320, 180), (480, 270), (640, 360), (768, 432), (1024, 576), (1280, 720), (1366, 768), (1920, 1080)]
thumb_base_url = dict_get(json_data, ('poster', 'thumbnail'))
thumbnails = [{
'url': re.sub(r'\d+x\d+', f'{w}x{h}', thumb_base_url),
'width': w,
'height': h,
} for w, h in common_res] if thumb_base_url else None
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'thumbnails': thumbnails,
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': json_data.get('account_id'),

yt_dlp/extractor/callin.py (new file)

@@ -0,0 +1,114 @@
# coding: utf-8
from .common import InfoExtractor
from ..utils import (
traverse_obj,
float_or_none,
int_or_none
)
class CallinIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?callin\.com/(episode)/(?P<id>[-a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.callin.com/episode/the-title-ix-regime-and-the-long-march-through-EBfXYSrsjc',
'info_dict': {
'id': '218b979630a35ead12c6fd096f2996c56c37e4d0dc1f6dc0feada32dcf7b31cd',
'title': 'The Title IX Regime and the Long March Through and Beyond the Institutions',
'ext': 'ts',
'display_id': 'the-title-ix-regime-and-the-long-march-through-EBfXYSrsjc',
'thumbnail': 're:https://.+\\.png',
'description': 'First episode',
'uploader': 'Wesley Yang',
'timestamp': 1639404128.65,
'upload_date': '20211213',
'uploader_id': 'wesyang',
'uploader_url': 'http://wesleyyang.substack.com',
'channel': 'Conversations in Year Zero',
'channel_id': '436d1f82ddeb30cd2306ea9156044d8d2cfdc3f1f1552d245117a42173e78553',
'channel_url': 'https://callin.com/show/conversations-in-year-zero-oJNllRFSfx',
'duration': 9951.936,
'view_count': int,
'categories': ['News & Politics', 'History', 'Technology'],
'cast': ['Wesley Yang', 'KC Johnson', 'Gabi Abramovich'],
'series': 'Conversations in Year Zero',
'series_id': '436d1f82ddeb30cd2306ea9156044d8d2cfdc3f1f1552d245117a42173e78553',
'episode': 'The Title IX Regime and the Long March Through and Beyond the Institutions',
'episode_number': 1,
'episode_id': '218b979630a35ead12c6fd096f2996c56c37e4d0dc1f6dc0feada32dcf7b31cd'
}
}]
def try_get_user_name(self, d):
names = [d.get(n) for n in ('first', 'last')]
if None in names:
return next((n for n in names if n), None)
return ' '.join(names)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
next_data = self._search_nextjs_data(webpage, display_id)
episode = next_data['props']['pageProps']['episode']
id = episode['id']
title = (episode.get('title')
or self._og_search_title(webpage, fatal=False)
or self._html_search_regex('<title>(.*?)</title>', webpage, 'title'))
url = episode['m3u8']
formats = self._extract_m3u8_formats(url, display_id, ext='ts')
self._sort_formats(formats)
show = traverse_obj(episode, ('show', 'title'))
show_id = traverse_obj(episode, ('show', 'id'))
show_json = None
app_slug = (self._html_search_regex(
'<script\\s+src=["\']/_next/static/([-_a-zA-Z0-9]+)/_',
webpage, 'app slug', fatal=False) or next_data.get('buildId'))
show_slug = traverse_obj(episode, ('show', 'linkObj', 'resourceUrl'))
if app_slug and show_slug and '/' in show_slug:
show_slug = show_slug.rsplit('/', 1)[1]
show_json_url = f'https://www.callin.com/_next/data/{app_slug}/show/{show_slug}.json'
show_json = self._download_json(show_json_url, display_id, fatal=False)
host = (traverse_obj(show_json, ('pageProps', 'show', 'hosts', 0))
or traverse_obj(episode, ('speakers', 0)))
host_nick = traverse_obj(host, ('linkObj', 'resourceUrl'))
host_nick = host_nick.rsplit('/', 1)[1] if (host_nick and '/' in host_nick) else None
cast = list(filter(None, [
self.try_get_user_name(u) for u in
traverse_obj(episode, (('speakers', 'callerTags'), ...)) or []
]))
episode_list = traverse_obj(show_json, ('pageProps', 'show', 'episodes')) or []
episode_number = next(
(len(episode_list) - i for (i, e) in enumerate(episode_list) if e.get('id') == id),
None)
return {
'id': id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': traverse_obj(episode, ('show', 'photo')),
'description': episode.get('description'),
'uploader': self.try_get_user_name(host) if host else None,
'timestamp': episode.get('publishedAt'),
'uploader_id': host_nick,
'uploader_url': traverse_obj(show_json, ('pageProps', 'show', 'url')),
'channel': show,
'channel_id': show_id,
'channel_url': traverse_obj(episode, ('show', 'linkObj', 'resourceUrl')),
'duration': float_or_none(episode.get('runtime')),
'view_count': int_or_none(episode.get('plays')),
'categories': traverse_obj(episode, ('show', 'categorizations', ..., 'name')),
'cast': cast if cast else None,
'series': show,
'series_id': show_id,
'episode': title,
'episode_number': episode_number,
'episode_id': id
}
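The new callin extractor leans on _search_nextjs_data, which (per the common.py hunk further below) grabs the JSON blob Next.js embeds in every rendered page. The essence, assuming only the standard library:

    import json
    import re

    def search_nextjs_data(webpage):
        # Next.js serialises its page state into a <script id="__NEXT_DATA__"> tag,
        # so one regex plus json.loads recovers everything the page was built from
        m = re.search(
            r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
            webpage)
        return json.loads(m.group(1)) if m else None

    html = ('<script id="__NEXT_DATA__" type="application/json">'
            '{"props": {"pageProps": {"episode": {"id": "abc"}}}}</script>')
    print(search_nextjs_data(html)['props']['pageProps']['episode']['id'])  # abc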

View File

@@ -13,6 +13,8 @@ class CAM4IE(InfoExtractor):
'ext': 'mp4',
'title': 're:^foxynesss [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'age_limit': 18,
'live_status': 'is_live',
'thumbnail': 'https://snapshots.xcdnpro.com/thumbnails/foxynesss',
}
}
@@ -25,8 +27,9 @@ class CAM4IE(InfoExtractor):
return {
'id': channel_id,
'title': self._live_title(channel_id),
'title': channel_id,
'is_live': True,
'age_limit': 18,
'formats': formats,
'thumbnail': f'https://snapshots.xcdnpro.com/thumbnails/{channel_id}',
}


@@ -91,7 +91,7 @@ class CamModelsIE(InfoExtractor):
return {
'id': user_id,
'title': self._live_title(user_id),
'title': user_id,
'is_live': True,
'formats': formats,
'age_limit': 18


@@ -78,11 +78,11 @@ class CanalAlphaIE(InfoExtractor):
'height': try_get(video, lambda x: x['res']['height'], expected_type=int),
} for video in try_get(data_json, lambda x: x['video']['mp4'], expected_type=list) or [] if video.get('$url')]
if manifests.get('hls'):
m3u8_frmts, m3u8_subs = self._parse_m3u8_formats_and_subtitles(manifests['hls'], id)
m3u8_frmts, m3u8_subs = self._parse_m3u8_formats_and_subtitles(manifests['hls'], video_id=id)
formats.extend(m3u8_frmts)
subtitles = self._merge_subtitles(subtitles, m3u8_subs)
if manifests.get('dash'):
dash_frmts, dash_subs = self._parse_mpd_formats_and_subtitles(manifests['dash'], id)
dash_frmts, dash_subs = self._parse_mpd_formats_and_subtitles(manifests['dash'])
formats.extend(dash_frmts)
subtitles = self._merge_subtitles(subtitles, dash_subs)
self._sort_formats(formats)


@@ -76,7 +76,7 @@ class CanvasIE(InfoExtractor):
'vrtPlayerToken': vrtPlayerToken,
'client': 'null',
}, expected_status=400)
if not data.get('title'):
if 'title' not in data:
code = data.get('code')
if code == 'AUTHENTICATION_REQUIRED':
self.raise_login_required()
@@ -84,7 +84,8 @@ class CanvasIE(InfoExtractor):
self.raise_geo_restricted(countries=['BE'])
raise ExtractorError(data.get('message') or code, expected=True)
title = data['title']
# Note: The title may be an empty string
title = data['title'] or f'{site_id} {video_id}'
description = data.get('description')
formats = []


@@ -4,6 +4,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
format_field,
float_or_none,
int_or_none,
try_get,
@@ -43,7 +44,7 @@ class CarambaTVIE(InfoExtractor):
formats = [{
'url': base_url + f['fn'],
'height': int_or_none(f.get('height')),
'format_id': '%sp' % f['height'] if f.get('height') else None,
'format_id': format_field(f, 'height', '%sp'),
} for f in video['qualities'] if f.get('fn')]
self._sort_formats(formats)


@@ -11,11 +11,13 @@ from ..compat import (
compat_str,
)
from ..utils import (
int_or_none,
join_nonempty,
js_to_json,
smuggle_url,
try_get,
orderedSet,
smuggle_url,
strip_or_none,
try_get,
ExtractorError,
)
@@ -313,6 +315,38 @@ class CBCGemIE(InfoExtractor):
return
self._claims_token = self._downloader.cache.load(self._NETRC_MACHINE, 'claims_token')
def _find_secret_formats(self, formats, video_id):
""" Find a valid video url and convert it to the secret variant """
base_format = next((f for f in formats if f.get('vcodec') != 'none'), None)
if not base_format:
return
base_url = re.sub(r'(Manifest\(.*?),filter=[\w-]+(.*?\))', r'\1\2', base_format['url'])
url = re.sub(r'(Manifest\(.*?),format=[\w-]+(.*?\))', r'\1\2', base_url)
secret_xml = self._download_xml(url, video_id, note='Downloading secret XML', fatal=False)
if not secret_xml:
return
for child in secret_xml:
if child.attrib.get('Type') != 'video':
continue
for video_quality in child:
bitrate = int_or_none(video_quality.attrib.get('Bitrate'))
if not bitrate or 'Index' not in video_quality.attrib:
continue
height = int_or_none(video_quality.attrib.get('MaxHeight'))
yield {
**base_format,
'format_id': join_nonempty('sec', height),
# Note: \g<1> is necessary instead of \1 since bitrate is a number
'url': re.sub(r'(QualityLevels\()\d+(\))', fr'\g<1>{bitrate}\2', base_url),
'width': int_or_none(video_quality.attrib.get('MaxWidth')),
'tbr': bitrate / 1000.0,
'height': height,
}
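The \g<1> note above deserves a concrete illustration: with a numeric bitrate interpolated into the replacement, a plain \1 followed by digits would be parsed as a reference to a higher-numbered group. A demonstration with a made-up manifest URL:

    import re

    url = 'https://example.com/QualityLevels(128000)/Manifest(video)'
    bitrate = 900000
    # '\g<1>' is unambiguous; '\1900000' would be read as group 19 and fail
    print(re.sub(r'(QualityLevels\()\d+(\))', fr'\g<1>{bitrate}\2', url))
    # https://example.com/QualityLevels(900000)/Manifest(video)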
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/assets/' + video_id, video_id)
@@ -335,6 +369,7 @@ class CBCGemIE(InfoExtractor):
formats = self._extract_m3u8_formats(m3u8_url, video_id, m3u8_id='hls')
self._remove_duplicate_formats(formats)
formats.extend(self._find_secret_formats(formats, video_id))
for format in formats:
if format.get('vcodec') == 'none':


@@ -162,7 +162,8 @@ class CCTVIE(InfoExtractor):
'url': video_url,
'format_id': 'http',
'quality': quality,
'source_preference': -10
# Sample clip
'preference': -10
})
hls_url = try_get(data, lambda x: x['hls_url'], compat_str)


@@ -12,8 +12,7 @@ from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
unescapeHTML,
update_url_query,
traverse_obj,
urlencode_postdata,
USER_AGENTS,
)
@@ -99,11 +98,13 @@ class CeskaTelevizeIE(InfoExtractor):
playlist_description = playlist_description.replace('\xa0', ' ')
if parsed_url.path.startswith('/porady/'):
refer_url = update_url_query(unescapeHTML(self._search_regex(
(r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?ceskatelevize\.cz/ivysilani/embed/iFramePlayer\.php.*?)\1'),
webpage, 'iframe player url', group='url')), query={'autoStart': 'true'})
webpage = self._download_webpage(refer_url, playlist_id)
next_data = self._search_nextjs_data(webpage, playlist_id)
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'), get_all=False)
if not idec:
raise ExtractorError('Failed to find IDEC id')
iframe_hash = self._download_webpage('https://www.ceskatelevize.cz/v-api/iframe-hash/', playlist_id)
webpage = self._download_webpage('https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php', playlist_id,
query={'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', 'IDEC': idec})
NOT_AVAILABLE_STRING = 'This content is not available at your territory due to limited copyright.'
if '%s</p>' % NOT_AVAILABLE_STRING in webpage:
@@ -176,6 +177,7 @@ class CeskaTelevizeIE(InfoExtractor):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
stream_url = stream_url.replace('https://', 'http://')
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', 'm3u8_native',
@@ -211,8 +213,6 @@ class CeskaTelevizeIE(InfoExtractor):
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)


@@ -101,7 +101,7 @@ class ChaturbateIE(InfoExtractor):
return {
'id': video_id,
'title': self._live_title(video_id),
'title': video_id,
'thumbnail': 'https://roomimg.stream.highwebmedia.com/ri/%s.jpg' % video_id,
'age_limit': self._rta_search(webpage),
'is_live': True,


@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import base64
import collections
import datetime
import hashlib
import itertools
import json
@@ -46,6 +45,7 @@ from ..utils import (
determine_ext,
determine_protocol,
dict_get,
encode_data_uri,
error_to_compat_str,
extract_attributes,
ExtractorError,
@@ -164,9 +164,8 @@ class InfoExtractor(object):
* filesize_approx An estimate for the number of bytes
* player_url SWF Player URL (used for rtmpdump).
* protocol The protocol that will be used for the actual
download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmp_ffmpeg", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments".
download, lower-case. One of "http", "https" or
one of the protocols defined in downloader.PROTOCOL_MAP
* fragment_base_url
Base URL for fragments. Each fragment's path
value (if present) will be relative to
@@ -182,6 +181,8 @@ class InfoExtractor(object):
fragment_base_url
* "duration" (optional, int or float)
* "filesize" (optional, int)
* is_from_start Is a live format that can be downloaded
from the start. Boolean
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field, regardless of all other values.
@@ -243,11 +244,16 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video was uploaded
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
If not explicitly set, calculated from timestamp
release_timestamp: UNIX timestamp of the moment the video was released.
If it is not clear whether to use timestamp or this, use the former
release_date: The date (YYYYMMDD) when the video was released.
If not explicitly set, calculated from release_timestamp
modified_timestamp: UNIX timestamp of the moment the video was last modified.
modified_date: The date (YYYYMMDD) when the video was last modified.
If not explicitly set, calculated from modified_timestamp
uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader.
channel: Full name of the channel the video is uploaded on.
@@ -255,6 +261,7 @@ class InfoExtractor(object):
fields. This depends on a particular extractor.
channel_id: Id of the channel.
channel_url: Full URL to a channel webpage.
channel_follower_count: Number of followers of the channel.
location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
@@ -370,6 +377,7 @@ class InfoExtractor(object):
disc_number: Number of the disc or other physical medium the track belongs to,
as an integer.
release_year: Year (YYYY) when the album was released.
composer: Composer of the piece
Unless mentioned otherwise, the fields should be Unicode strings.
@@ -383,6 +391,11 @@ class InfoExtractor(object):
Additionally, playlists can have "id", "title", and any other relevent
attributes with the same semantics as videos (see above).
It can also have the following optional fields:
playlist_count: The total number of videos in a playlist. If not given,
YoutubeDL tries to calculate it from "entries"
_type "multi_video" indicates that there are multiple videos that
form a single show, for examples multiple acts of an opera or TV episode.
@@ -466,6 +479,8 @@ class InfoExtractor(object):
# we have cached the regexp for *this* class, whereas getattr would also
# match the superclass
if '_VALID_URL_RE' not in cls.__dict__:
if '_VALID_URL' not in cls.__dict__:
cls._VALID_URL = cls._make_valid_url()
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
return cls._VALID_URL_RE.match(url)
@@ -614,7 +629,7 @@ class InfoExtractor(object):
kwargs = {
'video_id': e.video_id or self.get_temp_id(url),
'ie': self.IE_NAME,
'tb': e.traceback,
'tb': e.traceback or sys.exc_info()[2],
'expected': e.expected,
'cause': e.cause
}
@@ -1106,39 +1121,39 @@ class InfoExtractor(object):
# Methods for following #608
@staticmethod
def url_result(url, ie=None, video_id=None, video_title=None, **kwargs):
def url_result(url, ie=None, video_id=None, video_title=None, *, url_transparent=False, **kwargs):
"""Returns a URL that points to a page that should be processed"""
# TODO: ie should be the class used for getting the info
video_info = {'_type': 'url',
'url': url,
'ie_key': ie}
video_info.update(kwargs)
if ie is not None:
kwargs['ie_key'] = ie if isinstance(ie, str) else ie.ie_key()
if video_id is not None:
video_info['id'] = video_id
kwargs['id'] = video_id
if video_title is not None:
video_info['title'] = video_title
return video_info
kwargs['title'] = video_title
return {
**kwargs,
'_type': 'url_transparent' if url_transparent else 'url',
'url': url,
}
def playlist_from_matches(self, matches, playlist_id=None, playlist_title=None, getter=None, ie=None):
urls = orderedSet(
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urls, playlist_id=playlist_id, playlist_title=playlist_title)
def playlist_from_matches(self, matches, playlist_id=None, playlist_title=None, getter=None, ie=None, **kwargs):
urls = (self.url_result(self._proto_relative_url(m), ie)
for m in orderedSet(map(getter, matches) if getter else matches))
return self.playlist_result(urls, playlist_id, playlist_title, **kwargs)
@staticmethod
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None, **kwargs):
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None, *, multi_video=False, **kwargs):
"""Returns a playlist"""
video_info = {'_type': 'playlist',
'entries': entries}
video_info.update(kwargs)
if playlist_id:
video_info['id'] = playlist_id
kwargs['id'] = playlist_id
if playlist_title:
video_info['title'] = playlist_title
kwargs['title'] = playlist_title
if playlist_description is not None:
video_info['description'] = playlist_description
return video_info
kwargs['description'] = playlist_description
return {
**kwargs,
'_type': 'multi_video' if multi_video else 'playlist',
'entries': entries,
}
def _search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
@@ -1276,6 +1291,7 @@ class InfoExtractor(object):
return self._og_search_property('description', html, fatal=False, **kargs)
def _og_search_title(self, html, **kargs):
kargs.setdefault('fatal', False)
return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
@@ -1427,6 +1443,23 @@ class InfoExtractor(object):
continue
info[count_key] = interaction_count
def extract_chapter_information(e):
chapters = [{
'title': part.get('name'),
'start_time': part.get('startOffset'),
'end_time': part.get('endOffset'),
} for part in variadic(e.get('hasPart') or []) if part.get('@type') == 'Clip']
for idx, (last_c, current_c, next_c) in enumerate(zip(
[{'end_time': 0}] + chapters, chapters, chapters[1:])):
current_c['end_time'] = current_c['end_time'] or next_c['start_time']
current_c['start_time'] = current_c['start_time'] or last_c['end_time']
if None in current_c.values():
self.report_warning(f'Chapter {idx} contains broken data. Not extracting chapters')
return
if chapters:
chapters[-1]['end_time'] = chapters[-1]['end_time'] or info['duration']
info['chapters'] = chapters
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
author = e.get('author')
@@ -1434,7 +1467,8 @@ class InfoExtractor(object):
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'thumbnails': [{'url': url_or_none(url)}
for url in variadic(traverse_obj(e, 'thumbnailUrl', 'thumbnailURL'))],
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
# author can be an instance of 'Organization' or 'Person' types.
@@ -1449,9 +1483,15 @@ class InfoExtractor(object):
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
extract_chapter_information(e)
for e in json_ld:
if '@context' in e:
def traverse_json_ld(json_ld, at_top_level=True):
for e in json_ld:
if at_top_level and '@context' not in e:
continue
if at_top_level and set(e.keys()) == {'@context', '@graph'}:
traverse_json_ld(variadic(e['@graph'], allowed_types=(dict,)), at_top_level=False)
break
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
continue
@@ -1487,8 +1527,10 @@ class InfoExtractor(object):
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
'description': unescapeHTML(e.get('articleBody') or e.get('description')),
})
if traverse_obj(e, ('video', 0, '@type')) == 'VideoObject':
extract_video_object(e['video'][0])
elif item_type == 'VideoObject':
extract_video_object(e)
if expected_type is None:
@@ -1502,14 +1544,34 @@ class InfoExtractor(object):
continue
else:
break
traverse_json_ld(json_ld)
return dict((k, v) for k, v in info.items() if v is not None)
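extract_chapter_information above fills missing chapter bounds from the neighbours: a missing end_time is borrowed from the next chapter's start, a missing start_time from the previous chapter's end, and the final end falls back to the video duration. A reduced sketch of that inference:

    def infer_chapter_bounds(chapters, duration):
        # pad with a virtual {'end_time': 0} chapter on the left, then
        # borrow whichever bound is missing from the adjacent chapter
        for last_c, current_c, next_c in zip(
                [{'end_time': 0}] + chapters, chapters, chapters[1:]):
            current_c['end_time'] = current_c['end_time'] or next_c['start_time']
            current_c['start_time'] = current_c['start_time'] or last_c['end_time']
        if chapters:
            chapters[-1]['end_time'] = chapters[-1]['end_time'] or duration
        return chapters

    print(infer_chapter_bounds([
        {'title': 'Intro', 'start_time': None, 'end_time': None},
        {'title': 'Main', 'start_time': 60, 'end_time': None},
    ], duration=300))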
def _search_nextjs_data(self, webpage, video_id, **kw):
def _search_nextjs_data(self, webpage, video_id, *, transform_source=None, fatal=True, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', **kw),
video_id, **kw)
webpage, 'next.js data', fatal=fatal, **kw),
video_id, transform_source=transform_source, fatal=fatal)
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__'):
''' Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function. '''
# not all websites do this, but it can be changed
# https://stackoverflow.com/questions/67463109/how-to-change-or-hide-nuxt-and-nuxt-keyword-in-page-source
rectx = re.escape(context_name)
js, arg_keys, arg_vals = self._search_regex(
(r'<script>window\.%s=\(function\((?P<arg_keys>.*?)\)\{return\s(?P<js>\{.*?\})\}\((?P<arg_vals>.+?)\)\);?</script>' % rectx,
r'%s\(.*?\(function\((?P<arg_keys>.*?)\)\{return\s(?P<js>\{.*?\})\}\((?P<arg_vals>.*?)\)' % rectx),
webpage, context_name, group=['js', 'arg_keys', 'arg_vals'])
args = dict(zip(arg_keys.split(','), arg_vals.split(',')))
for key, val in args.items():
if val in ('undefined', 'void 0'):
args[key] = 'null'
return self._parse_json(js_to_json(js, args), video_id)['data'][0]
@staticmethod
def _hidden_inputs(html):
@@ -1547,7 +1609,7 @@ class InfoExtractor(object):
'vcodec': {'type': 'ordered', 'regex': True,
'order': ['av0?1', 'vp0?9.2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'acodec': {'type': 'ordered', 'regex': True,
'order': ['opus', 'vorbis', 'aac', 'mp?4a?', 'mp3', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']},
'order': ['[af]lac', 'wav|aiff', 'opus', 'vorbis', 'aac', 'mp?4a?', 'mp3', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']},
'hdr': {'type': 'ordered', 'regex': True, 'field': 'dynamic_range',
'order': ['dv', '(hdr)?12', r'(hdr)?10\+', '(hdr)?10', 'hlg', '', 'sdr', None]},
'proto': {'type': 'ordered', 'regex': True, 'field': 'protocol',
@@ -1586,6 +1648,11 @@ class InfoExtractor(object):
'res': {'type': 'multiple', 'field': ('height', 'width'),
'function': lambda it: (lambda l: min(l) if l else 0)(tuple(filter(None, it)))},
# For compatibility with youtube-dl
'format_id': {'type': 'alias', 'field': 'id'},
'preference': {'type': 'alias', 'field': 'ie_pref'},
'language_preference': {'type': 'alias', 'field': 'lang'},
# Deprecated
'dimension': {'type': 'alias', 'field': 'res'},
'resolution': {'type': 'alias', 'field': 'res'},
@@ -1595,7 +1662,6 @@ class InfoExtractor(object):
'video_bitrate': {'type': 'alias', 'field': 'vbr'},
'audio_bitrate': {'type': 'alias', 'field': 'abr'},
'framerate': {'type': 'alias', 'field': 'fps'},
'language_preference': {'type': 'alias', 'field': 'lang'},
'protocol': {'type': 'alias', 'field': 'proto'},
'source_preference': {'type': 'alias', 'field': 'source'},
'filesize_approx': {'type': 'alias', 'field': 'fs_approx'},
@@ -1610,9 +1676,7 @@ class InfoExtractor(object):
'audio': {'type': 'alias', 'field': 'hasaud'},
'has_audio': {'type': 'alias', 'field': 'hasaud'},
'extractor': {'type': 'alias', 'field': 'ie_pref'},
'preference': {'type': 'alias', 'field': 'ie_pref'},
'extractor_preference': {'type': 'alias', 'field': 'ie_pref'},
'format_id': {'type': 'alias', 'field': 'id'},
}
def __init__(self, ie, field_preference):
@@ -1712,9 +1776,10 @@ class InfoExtractor(object):
continue
if self._get_field_setting(field, 'type') == 'alias':
alias, field = field, self._get_field_setting(field, 'field')
self.ydl.deprecation_warning(
f'Format sorting alias {alias} is deprecated '
f'and may be removed in a future version. Please use {field} instead')
if alias not in ('format_id', 'preference', 'language_preference'):
self.ydl.deprecation_warning(
f'Format sorting alias {alias} is deprecated '
f'and may be removed in a future version. Please use {field} instead')
reverse = match.group('reverse') is not None
closest = match.group('separator') == '~'
limit_text = match.group('limit')
@@ -2046,7 +2111,7 @@ class InfoExtractor(object):
headers=headers, query=query, video_id=video_id)
def _parse_m3u8_formats_and_subtitles(
self, m3u8_doc, m3u8_url, ext=None, entry_protocol='m3u8_native',
self, m3u8_doc, m3u8_url=None, ext=None, entry_protocol='m3u8_native',
preference=None, quality=None, m3u8_id=None, live=False, note=None,
errnote=None, fatal=True, data=None, headers={}, query={},
video_id=None):
@@ -2096,7 +2161,7 @@ class InfoExtractor(object):
formats = [{
'format_id': join_nonempty(m3u8_id, idx),
'format_index': idx,
'url': m3u8_url,
'url': m3u8_url or encode_data_uri(m3u8_doc.encode('utf-8'), 'application/x-mpegurl'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
@@ -2302,7 +2367,7 @@ class InfoExtractor(object):
if smil is False:
assert not fatal
return []
return [], {}
namespace = self._parse_smil_namespace(smil)
@@ -2682,11 +2747,15 @@ class InfoExtractor(object):
mime_type = representation_attrib['mimeType']
content_type = representation_attrib.get('contentType', mime_type.split('/')[0])
codecs = representation_attrib.get('codecs', '')
codecs = parse_codecs(representation_attrib.get('codecs', ''))
if content_type not in ('video', 'audio', 'text'):
if mime_type == 'image/jpeg':
content_type = mime_type
elif codecs.split('.')[0] == 'stpp':
elif codecs['vcodec'] != 'none':
content_type = 'video'
elif codecs['acodec'] != 'none':
content_type = 'audio'
elif codecs.get('tcodec', 'none') != 'none':
content_type = 'text'
elif mimetype2ext(mime_type) in ('tt', 'dfxp', 'ttml', 'xml', 'json'):
content_type = 'text'
@@ -2732,8 +2801,8 @@ class InfoExtractor(object):
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
'container': mimetype2ext(mime_type) + '_dash',
**codecs
}
f.update(parse_codecs(codecs))
elif content_type == 'text':
f = {
'ext': mimetype2ext(mime_type),
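For reference, a rough illustration of the parse_codecs output the branches above inspect (the exact key set is an assumption and may differ between versions):
from yt_dlp.utils import parse_codecs

parse_codecs('avc1.64001f, mp4a.40.2')
# -> {'vcodec': 'avc1.64001f', 'acodec': 'mp4a.40.2', ...}
# A text codec such as 'stpp' is assumed to yield a 'tcodec' entry, matching
# the defensive codecs.get('tcodec', 'none') lookup above.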
@@ -3433,15 +3502,11 @@ class InfoExtractor(object):
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
now_str = now.strftime('%Y-%m-%d %H:%M')
return name + ' ' + now_str
self._downloader.deprecation_warning('yt_dlp.InfoExtractor._live_title is deprecated and does not work as expected')
return name
def _int(self, v, name, fatal=False, **kwargs):
res = int_or_none(v, **kwargs)
if 'get_attr' in kwargs:
print(getattr(v, kwargs['get_attr']))
if res is None:
msg = 'Failed to extract %s: Could not parse value %r' % (name, v)
if fatal:
@@ -3546,14 +3611,18 @@ class InfoExtractor(object):
def extractor():
comments = []
interrupted = True
try:
while True:
comments.append(next(generator))
except KeyboardInterrupt:
interrupted = True
self.to_screen('Interrupted by user')
except StopIteration:
interrupted = False
except KeyboardInterrupt:
self.to_screen('Interrupted by user')
except Exception as e:
if self.get_param('ignoreerrors') is not True:
raise
self._downloader.report_error(e)
comment_count = len(comments)
self.to_screen(f'Extracted {comment_count} comments')
return {
@@ -3631,7 +3700,7 @@ class InfoExtractor(object):
else 'public' if all_known
else None)
def _configuration_arg(self, key, default=NO_DEFAULT, casesense=False):
def _configuration_arg(self, key, default=NO_DEFAULT, *, ie_key=None, casesense=False):
'''
@returns A list of values for the extractor argument given by "key"
or "default" if no such key is present
@@ -3639,11 +3708,27 @@ class InfoExtractor(object):
@param casesense When false, the values are converted to lower case
'''
val = traverse_obj(
self._downloader.params, ('extractor_args', self.ie_key().lower(), key))
self._downloader.params, ('extractor_args', (ie_key or self.ie_key()).lower(), key))
if val is None:
return [] if default is NO_DEFAULT else default
return list(val) if casesense else [x.lower() for x in val]
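A hedged sketch of the new ie_key parameter: it lets one extractor read another extractor's --extractor-args values (key names here are illustrative; 'player_client' is a YouTube extractor argument):
# Inside a hypothetical extractor:
fmt = self._configuration_arg('format', default=['adaptive_hls'])    # own args
client = self._configuration_arg('player_client', ie_key='Youtube')  # another IE's args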
def _yes_playlist(self, playlist_id, video_id, smuggled_data=None, *, playlist_label='playlist', video_label='video'):
if not playlist_id or not video_id:
return not video_id
no_playlist = (smuggled_data or {}).get('force_noplaylist')
if no_playlist is not None:
return not no_playlist
video_id = '' if video_id is True else f' {video_id}'
playlist_id = '' if playlist_id is True else f' {playlist_id}'
if self.get_param('noplaylist'):
self.to_screen(f'Downloading just the {video_label}{video_id} because of --no-playlist')
return False
self.to_screen(f'Downloading {playlist_label}{playlist_id} - add --no-playlist to download just the {video_label}{video_id}')
return True
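The Dailymotion and Daum hunks further down show the intended call pattern; condensed (mirroring the Dailymotion change below):
if playlist_id and self._yes_playlist(playlist_id, video_id):
    return self.url_result(
        'http://www.dailymotion.com/playlist/' + playlist_id,
        'DailymotionPlaylist', playlist_id)
# otherwise fall through and extract the single video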
class SearchInfoExtractor(InfoExtractor):
"""
@@ -3658,17 +3743,8 @@ class SearchInfoExtractor(InfoExtractor):
def _make_valid_url(cls):
return r'%s(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\s\S]+)' % cls._SEARCH_KEY
@classmethod
def suitable(cls, url):
return re.match(cls._make_valid_url(), url) is not None
def _real_extract(self, query):
mobj = re.match(self._make_valid_url(), query)
if mobj is None:
raise ExtractorError('Invalid search query "%s"' % query)
prefix = mobj.group('prefix')
query = mobj.group('query')
prefix, query = self._match_valid_url(query).group('prefix', 'query')
if prefix == '':
return self._get_n_results(query, 1)
elif prefix == 'all':

yt_dlp/extractor/crackle.py

@@ -23,32 +23,35 @@ from ..utils import (
class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?(?:sony)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TESTS = [{
# geo restricted to CA
'url': 'https://www.crackle.com/andromeda/2502343',
# Crackle is available in the United States and territories
'url': 'https://www.crackle.com/thanksgiving/2510064',
'info_dict': {
'id': '2502343',
'id': '2510064',
'ext': 'mp4',
'title': 'Under The Night',
'description': 'md5:d2b8ca816579ae8a7bf28bfff8cefc8a',
'duration': 2583,
'title': 'Touch Football',
'description': 'md5:cfbb513cf5de41e8b56d7ab756cff4df',
'duration': 1398,
'view_count': int,
'average_rating': 0,
'age_limit': 14,
'genre': 'Action, Sci-Fi',
'creator': 'Allan Kroeker',
'artist': 'Keith Hamilton Cobb, Kevin Sorbo, Lisa Ryder, Lexa Doig, Robert Hewitt Wolfe',
'release_year': 2000,
'series': 'Andromeda',
'episode': 'Under The Night',
'age_limit': 17,
'genre': 'Comedy',
'creator': 'Daniel Powell',
'artist': 'Chris Elliott, Amy Sedaris',
'release_year': 2016,
'series': 'Thanksgiving',
'episode': 'Touch Football',
'season_number': 1,
'episode_number': 1,
},
'params': {
# m3u8 download
'skip_download': True,
}
},
'expected_warnings': [
'Trying with a list of known countries'
],
}, {
'url': 'https://www.sonycrackle.com/andromeda/2502343',
'url': 'https://www.sonycrackle.com/thanksgiving/2510064',
'only_matching': True,
}]
@@ -129,7 +132,6 @@ class CrackleIE(InfoExtractor):
break
ignore_no_formats = self.get_param('ignore_no_formats_error')
allow_unplayable_formats = self.get_param('allow_unplayable_formats')
if not media or (not media.get('MediaURLs') and not ignore_no_formats):
raise ExtractorError(
@@ -143,9 +145,9 @@ class CrackleIE(InfoExtractor):
for e in media.get('MediaURLs') or []:
if e.get('UseDRM'):
has_drm = True
if not allow_unplayable_formats:
continue
format_url = url_or_none(e.get('Path'))
format_url = url_or_none(e.get('DRMPath'))
else:
format_url = url_or_none(e.get('Path'))
if not format_url:
continue
ext = determine_ext(format_url)

113
yt_dlp/extractor/crowdbunker.py Normal file

@@ -0,0 +1,113 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
unified_strdate,
)
class CrowdBunkerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?crowdbunker\.com/v/(?P<id>[^/?#$&]+)'
_TESTS = [{
'url': 'https://crowdbunker.com/v/0z4Kms8pi8I',
'info_dict': {
'id': '0z4Kms8pi8I',
'ext': 'mp4',
'title': '117) Pass vax et solutions',
'description': 'md5:86bcb422c29475dbd2b5dcfa6ec3749c',
'view_count': int,
'duration': 5386,
'uploader': 'Jérémie Mercier',
'uploader_id': 'UCeN_qQV829NYf0pvPJhW5dQ',
'like_count': int,
'upload_date': '20211218',
'thumbnail': 'https://scw.divulg.org/cb-medias4/images/0z4Kms8pi8I/maxres.jpg'
},
'params': {'skip_download': True}
}]
def _real_extract(self, url):
id = self._match_id(url)
data_json = self._download_json(f'https://api.divulg.org/post/{id}/details',
id, headers={'accept': 'application/json, text/plain, */*'})
video_json = data_json['video']
formats, subtitles = [], {}
for sub in video_json.get('captions') or []:
sub_url = try_get(sub, lambda x: x['file']['url'])
if not sub_url:
continue
subtitles.setdefault(sub.get('languageCode', 'fr'), []).append({
'url': sub_url,
})
mpd_url = try_get(video_json, lambda x: x['dashManifest']['url'])
if mpd_url:
fmts, subs = self._extract_mpd_formats_and_subtitles(mpd_url, id)
formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs)
m3u8_url = try_get(video_json, lambda x: x['hlsManifest']['url'])
if m3u8_url:
fmts, subs = self._extract_m3u8_formats_and_subtitles(m3u8_url, id)
formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs)
thumbnails = [{
'url': image['url'],
'height': int_or_none(image.get('height')),
'width': int_or_none(image.get('width')),
} for image in video_json.get('thumbnails') or [] if image.get('url')]
self._sort_formats(formats)
return {
'id': id,
'title': video_json.get('title'),
'description': video_json.get('description'),
'view_count': video_json.get('viewCount'),
'duration': video_json.get('duration'),
'uploader': try_get(data_json, lambda x: x['channel']['name']),
'uploader_id': try_get(data_json, lambda x: x['channel']['id']),
'like_count': data_json.get('likesCount'),
'upload_date': unified_strdate(video_json.get('publishedAt') or video_json.get('createdAt')),
'thumbnails': thumbnails,
'formats': formats,
'subtitles': subtitles,
}
class CrowdBunkerChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?crowdbunker\.com/@(?P<id>[^/?#$&]+)'
_TESTS = [{
'url': 'https://crowdbunker.com/@Milan_UHRIN',
'playlist_mincount': 14,
'info_dict': {
'id': 'Milan_UHRIN',
},
}]
def _entries(self, id):
last = None
for page in itertools.count():
channel_json = self._download_json(
f'https://api.divulg.org/organization/{id}/posts', id, headers={'accept': 'application/json, text/plain, */*'},
query={'after': last} if last else {}, note=f'Downloading Page {page}')
for item in channel_json.get('items') or []:
v_id = item.get('uid')
if not v_id:
continue
yield self.url_result(
'https://crowdbunker.com/v/%s' % v_id, ie=CrowdBunkerIE.ie_key(), video_id=v_id)
last = channel_json.get('last')
if not last:
break
def _real_extract(self, url):
id = self._match_id(url)
return self.playlist_result(self._entries(id), playlist_id=id)

yt_dlp/extractor/crunchyroll.py

@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
import json
import zlib
@@ -23,15 +24,17 @@ from ..utils import (
bytes_to_intlist,
extract_attributes,
float_or_none,
format_field,
intlist_to_bytes,
int_or_none,
join_nonempty,
lowercase_escape,
merge_dicts,
qualities,
remove_end,
sanitized_Request,
traverse_obj,
try_get,
urlencode_postdata,
xpath_text,
)
from ..aes import (
@@ -40,8 +43,8 @@ from ..aes import (
class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.crunchyroll.com/login'
_LOGIN_FORM = 'login_form'
_LOGIN_URL = 'https://www.crunchyroll.com/welcome/login'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
def _call_rpc_api(self, method, video_id, note=None, data=None):
@@ -58,50 +61,33 @@ class CrunchyrollBaseIE(InfoExtractor):
username, password = self._get_login_info()
if username is None:
return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
def is_logged(webpage):
return 'href="/logout"' in webpage
# Already logged in
if is_logged(login_page):
if self._get_cookies(self._LOGIN_URL).get('etp_rt'):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
upsell_response = self._download_json(
f'{self._API_BASE}/get_upsell_data.0.json', None, 'Getting session id',
query={
'sess_id': 1,
'device_id': 'whatvalueshouldbeforweb',
'device_type': 'com.crunchyroll.static',
'access_token': 'giKq5eY27ny3cqz',
'referer': self._LOGIN_URL
})
if upsell_response['code'] != 'ok':
raise ExtractorError('Could not get session id')
session_id = upsell_response['data']['session_id']
post_url = extract_attributes(login_form_str).get('action')
if not post_url:
post_url = self._LOGIN_URL
elif not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
login_form.update({
'login_form[name]': username,
'login_form[password]': password,
})
response = self._download_webpage(
post_url, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if is_logged(response):
return
error = self._html_search_regex(
'(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
login_response = self._download_json(
f'{self._API_BASE}/login.1.json', None, 'Logging in',
data=compat_urllib_parse_urlencode({
'account': username,
'password': password,
'session_id': session_id
}).encode('ascii'))
if login_response['code'] != 'ok':
raise ExtractorError('Login failed. Bad username or password?', expected=True)
if not self._get_cookies(self._LOGIN_URL).get('etp_rt'):
raise ExtractorError('Login succeeded but did not set etp_rt cookie')
def _real_initialize(self):
self._login()
@@ -733,13 +719,118 @@ class CrunchyrollBetaIE(CrunchyrollBaseIE):
def _real_extract(self, url):
lang, internal_id, display_id = self._match_valid_url(url).group('lang', 'internal_id', 'id')
webpage = self._download_webpage(url, display_id)
episode_data = self._parse_json(
self._search_regex(r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'episode data'),
display_id)['content']['byId'][internal_id]
video_id = episode_data['external_id'].split('.')[1]
series_id = episode_data['episode_metadata']['series_slug_title']
return self.url_result(f'https://www.crunchyroll.com/{lang}{series_id}/{display_id}-{video_id}',
CrunchyrollIE.ie_key(), video_id)
initial_state = self._parse_json(
self._search_regex(r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'initial state'),
display_id)
episode_data = initial_state['content']['byId'][internal_id]
if not self._get_cookies(url).get('etp_rt'):
video_id = episode_data['external_id'].split('.')[1]
series_id = episode_data['episode_metadata']['series_slug_title']
return self.url_result(f'https://www.crunchyroll.com/{lang}{series_id}/{display_id}-{video_id}',
CrunchyrollIE.ie_key(), video_id)
app_config = self._parse_json(
self._search_regex(r'__APP_CONFIG__\s*=\s*({.+?})\s*;', webpage, 'app config'),
display_id)
client_id = app_config['cxApiParams']['accountAuthClientId']
api_domain = app_config['cxApiParams']['apiDomain']
basic_token = str(base64.b64encode(('%s:' % client_id).encode('ascii')), 'ascii')
auth_response = self._download_json(
f'{api_domain}/auth/v1/token', display_id,
note='Authenticating with cookie',
headers={
'Authorization': 'Basic ' + basic_token
}, data='grant_type=etp_rt_cookie'.encode('ascii'))
policy_response = self._download_json(
f'{api_domain}/index/v2', display_id,
note='Retrieving signed policy',
headers={
'Authorization': auth_response['token_type'] + ' ' + auth_response['access_token']
})
bucket = policy_response['cms']['bucket']
params = {
'Policy': policy_response['cms']['policy'],
'Signature': policy_response['cms']['signature'],
'Key-Pair-Id': policy_response['cms']['key_pair_id']
}
locale = traverse_obj(initial_state, ('localization', 'locale'))
if locale:
params['locale'] = locale
episode_response = self._download_json(
f'{api_domain}/cms/v2{bucket}/episodes/{internal_id}', display_id,
note='Retrieving episode metadata',
query=params)
if episode_response.get('is_premium_only') and not episode_response.get('playback'):
raise ExtractorError('This video is for premium members only.', expected=True)
stream_response = self._download_json(
episode_response['playback'], display_id,
note='Retrieving stream info')
thumbnails = []
for thumbnails_data in traverse_obj(episode_response, ('images', 'thumbnail')) or []:
for thumbnail_data in thumbnails_data:
thumbnails.append({
'url': thumbnail_data.get('source'),
'width': thumbnail_data.get('width'),
'height': thumbnail_data.get('height'),
})
subtitles = {}
for lang, subtitle_data in (stream_response.get('subtitles') or {}).items():
subtitles[lang] = [{
'url': subtitle_data.get('url'),
'ext': subtitle_data.get('format')
}]
requested_hardsubs = [('' if val == 'none' else val) for val in (self._configuration_arg('hardsub') or ['none'])]
hardsub_preference = qualities(requested_hardsubs[::-1])
requested_formats = self._configuration_arg('format') or ['adaptive_hls']
formats = []
for stream_type, streams in stream_response.get('streams', {}).items():
if stream_type not in requested_formats:
continue
for stream in streams.values():
hardsub_lang = stream.get('hardsub_locale') or ''
if hardsub_lang.lower() not in requested_hardsubs:
continue
format_id = join_nonempty(
stream_type,
format_field(stream, 'hardsub_locale', 'hardsub-%s'))
if not stream.get('url'):
continue
if stream_type.split('_')[-1] == 'hls':
adaptive_formats = self._extract_m3u8_formats(
stream['url'], display_id, 'mp4', m3u8_id=format_id,
note='Downloading %s information' % format_id,
fatal=False)
elif stream_type.split('_')[-1] == 'dash':
adaptive_formats = self._extract_mpd_formats(
stream['url'], display_id, mpd_id=format_id,
note='Downloading %s information' % format_id,
fatal=False)
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = stream_response.get('audio_locale')
f['quality'] = hardsub_preference(hardsub_lang.lower())
formats.extend(adaptive_formats)
self._sort_formats(formats)
return {
'id': internal_id,
'title': '%s Episode %s %s' % (episode_response.get('season_title'), episode_response.get('episode'), episode_response.get('title')),
'description': (episode_response.get('description') or '').replace(r'\r\n', '\n'),
'duration': float_or_none(episode_response.get('duration_ms'), 1000),
'thumbnails': thumbnails,
'series': episode_response.get('series_title'),
'series_id': episode_response.get('series_id'),
'season': episode_response.get('season_title'),
'season_id': episode_response.get('season_id'),
'season_number': episode_response.get('season_number'),
'episode': episode_response.get('title'),
'episode_number': episode_response.get('sequence_number'),
'subtitles': subtitles,
'formats': formats
}
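Given the _configuration_arg calls above, hardsub language and stream type are selectable per run; a hedged Python-API sketch (the 'crunchyrollbeta' key is assumed from the class name, and the URL is a placeholder):
import yt_dlp

opts = {'extractor_args': {'crunchyrollbeta': {
    'hardsub': ['en-US'],         # matched against hardsub_locale above
    'format': ['adaptive_dash'],  # instead of the default adaptive_hls
}}}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(['https://beta.crunchyroll.com/watch/XXXXXXXX/placeholder'])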
class CrunchyrollBetaShowIE(CrunchyrollBaseIE):

yt_dlp/extractor/ctvnews.py

@@ -65,4 +65,9 @@ class CTVNewsIE(InfoExtractor):
})
entries = [ninecninemedia_url_result(clip_id) for clip_id in orderedSet(
re.findall(r'clip\.id\s*=\s*(\d+);', webpage))]
if not entries:
webpage = self._download_webpage(url, page_id)
if 'getAuthStates("' in webpage:
entries = [ninecninemedia_url_result(clip_id) for clip_id in
self._search_regex(r'getAuthStates\("([\d+,]+)"', webpage, 'clip ids').split(',')]
return self.playlist_result(entries, page_id)
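To make the fallback concrete, a hedged illustration of the getAuthStates regex against a hypothetical page snippet:
import re

webpage = 'initApp(); getAuthStates("2152085,2152086", callback);'  # hypothetical
re.search(r'getAuthStates\("([\d+,]+)"', webpage).group(1).split(',')
# -> ['2152085', '2152086']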

79
yt_dlp/extractor/daftsex.py Normal file

@@ -0,0 +1,79 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
get_elements_by_class,
int_or_none,
js_to_json,
parse_count,
parse_duration,
try_get,
)
class DaftsexIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?daftsex\.com/watch/(?P<id>-?\d+_\d+)'
_TESTS = [{
'url': 'https://daftsex.com/watch/-156601359_456242791',
'info_dict': {
'id': '-156601359_456242791',
'ext': 'mp4',
'title': 'Skye Blue - Dinner And A Show',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = get_elements_by_class('heading', webpage)[-1]
duration = parse_duration(self._search_regex(
r'Duration: ((?:[0-9]{2}:){0,2}[0-9]{2})',
webpage, 'duration', fatal=False))
views = parse_count(self._search_regex(
r'Views: ([0-9 ]+)',
webpage, 'views', fatal=False))
player_hash = self._search_regex(
r'DaxabPlayer\.Init\({[\s\S]*hash:\s*"([0-9a-zA-Z_\-]+)"[\s\S]*}',
webpage, 'player hash')
player_color = self._search_regex(
r'DaxabPlayer\.Init\({[\s\S]*color:\s*"([0-9a-z]+)"[\s\S]*}',
webpage, 'player color', fatal=False) or ''
embed_page = self._download_webpage(
'https://daxab.com/player/%s?color=%s' % (player_hash, player_color),
video_id, headers={'Referer': url})
video_params = self._parse_json(
self._search_regex(
r'window\.globParams\s*=\s*({[\S\s]+})\s*;\s*<\/script>',
embed_page, 'video parameters'),
video_id, transform_source=js_to_json)
server_domain = 'https://%s' % compat_b64decode(video_params['server'][::-1]).decode('utf-8')
formats = []
for format_id, format_data in video_params['video']['cdn_files'].items():
ext, height = format_id.split('_')
extra_quality_data = format_data.split('.')[-1]
url = f'{server_domain}/videos/{video_id.replace("_", "/")}/{height}.mp4?extra={extra_quality_data}'
formats.append({
'format_id': format_id,
'url': url,
'height': int_or_none(height),
'ext': ext,
})
self._sort_formats(formats)
thumbnail = try_get(video_params,
lambda vi: 'https:' + compat_b64decode(vi['video']['thumb']).decode('utf-8'))
return {
'id': video_id,
'title': title,
'formats': formats,
'duration': duration,
'thumbnail': thumbnail,
'view_count': views,
'age_limit': 18,
}

yt_dlp/extractor/dailymotion.py

@@ -207,12 +207,10 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
video_id, playlist_id = self._match_valid_url(url).groups()
if playlist_id:
if not self.get_param('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
if self._yes_playlist(playlist_id, video_id):
return self.url_result(
'http://www.dailymotion.com/playlist/' + playlist_id,
'DailymotionPlaylist', playlist_id)
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
password = self.get_param('videopassword')
media = self._call_api(
@@ -305,7 +303,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'description': clean_html(media.get('description')),
'thumbnails': thumbnails,
'duration': int_or_none(metadata.get('duration')) or None,

yt_dlp/extractor/daum.py

@@ -157,11 +157,8 @@ class DaumListIE(InfoExtractor):
query_dict = parse_qs(url)
if 'clipid' in query_dict:
clip_id = query_dict['clipid'][0]
if self.get_param('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % clip_id)
if not self._yes_playlist(list_id, clip_id):
return self.url_result(DaumClipIE._URL_TEMPLATE % clip_id, 'DaumClip')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % list_id)
class DaumPlaylistIE(DaumListIE):

143
yt_dlp/extractor/digitalconcerthall.py Normal file

@@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_resolution,
traverse_obj,
try_get,
urlencode_postdata,
)
class DigitalConcertHallIE(InfoExtractor):
IE_DESC = 'DigitalConcertHall extractor'
_VALID_URL = r'https?://(?:www\.)?digitalconcerthall\.com/(?P<language>[a-z]+)/concert/(?P<id>[0-9]+)'
_OAUTH_URL = 'https://api.digitalconcerthall.com/v2/oauth2/token'
_ACCESS_TOKEN = None
_NETRC_MACHINE = 'digitalconcerthall'
_TESTS = [{
'note': 'Playlist with only one video',
'url': 'https://www.digitalconcerthall.com/en/concert/53201',
'info_dict': {
'id': '53201-1',
'ext': 'mp4',
'composer': 'Kurt Weill',
'title': '[Magic Night]',
'thumbnail': r're:^https?://images.digitalconcerthall.com/cms/thumbnails.*\.jpg$',
'upload_date': '20210624',
'timestamp': 1624548600,
'duration': 2798,
'album_artist': 'Members of the Berliner Philharmoniker / Simon Rössler',
},
'params': {'skip_download': 'm3u8'},
}, {
'note': 'Concert with several works and an interview',
'url': 'https://www.digitalconcerthall.com/en/concert/53785',
'info_dict': {
'id': '53785',
'album_artist': 'Berliner Philharmoniker / Kirill Petrenko',
'title': 'Kirill Petrenko conducts Mendelssohn and Shostakovich',
},
'params': {'skip_download': 'm3u8'},
'playlist_count': 3,
}]
def _login(self):
username, password = self._get_login_info()
if not username:
self.raise_login_required()
token_response = self._download_json(
self._OAUTH_URL,
None, 'Obtaining token', errnote='Unable to obtain token', data=urlencode_postdata({
'affiliate': 'none',
'grant_type': 'device',
'device_vendor': 'unknown',
'app_id': 'dch.webapp',
'app_version': '1.0.0',
'client_secret': '2ySLN+2Fwb',
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
self._ACCESS_TOKEN = token_response['access_token']
try:
self._download_json(
self._OAUTH_URL,
None, note='Logging in', errnote='Unable to login', data=urlencode_postdata({
'grant_type': 'password',
'username': username,
'password': password,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': 'https://www.digitalconcerthall.com',
'Authorization': f'Bearer {self._ACCESS_TOKEN}'
})
except ExtractorError:
self.raise_login_required(msg='Login info incorrect')
def _real_initialize(self):
self._login()
def _entries(self, items, language, **kwargs):
for item in items:
video_id = item['id']
stream_info = self._download_json(
self._proto_relative_url(item['_links']['streams']['href']), video_id, headers={
'Accept': 'application/json',
'Authorization': f'Bearer {self._ACCESS_TOKEN}',
'Accept-Language': language
})
m3u8_url = traverse_obj(
stream_info, ('channel', lambda x: x.startswith('vod_mixed'), 'stream', 0, 'url'), get_all=False)
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False)
self._sort_formats(formats)
yield {
'id': video_id,
'title': item.get('title'),
'composer': item.get('name_composer'),
'url': m3u8_url,
'formats': formats,
'duration': item.get('duration_total'),
'timestamp': traverse_obj(item, ('date', 'published')),
'description': item.get('short_description') or stream_info.get('short_description'),
**kwargs,
'chapters': [{
'start_time': chapter.get('time'),
'end_time': try_get(chapter, lambda x: x['time'] + x['duration']),
'title': chapter.get('text'),
} for chapter in item['cuepoints']] if item.get('cuepoints') else None,
}
def _real_extract(self, url):
language, video_id = self._match_valid_url(url).group('language', 'id')
if not language:
language = 'en'
thumbnail_url = self._html_search_regex(
r'(https?://images\.digitalconcerthall\.com/cms/thumbnails/.*\.jpg)',
self._download_webpage(url, video_id), 'thumbnail')
thumbnails = [{
'url': thumbnail_url,
**parse_resolution(thumbnail_url)
}]
vid_info = self._download_json(
f'https://api.digitalconcerthall.com/v2/concert/{video_id}', video_id, headers={
'Accept': 'application/json',
'Accept-Language': language
})
album_artist = ' / '.join(traverse_obj(vid_info, ('_links', 'artist', ..., 'name')) or '')
return {
'_type': 'playlist',
'id': video_id,
'title': vid_info.get('title'),
'entries': self._entries(traverse_obj(vid_info, ('_embedded', ..., ...)), language,
thumbnails=thumbnails, album_artist=album_artist),
'thumbnails': thumbnails,
'album_artist': album_artist,
}

yt_dlp/extractor/dispeak.py

@@ -74,13 +74,11 @@ class DigitallySpeakingIE(InfoExtractor):
tbr = int_or_none(bitrate)
vbr = int_or_none(self._search_regex(
r'-(\d+)\.mp4', video_path, 'vbr', default=None))
abr = tbr - vbr if tbr and vbr else None
video_formats.append({
'format_id': bitrate,
'url': url,
'tbr': tbr,
'vbr': vbr,
'abr': abr,
})
return video_formats
@@ -121,6 +119,7 @@ class DigitallySpeakingIE(InfoExtractor):
video_formats = self._parse_mp4(metadata)
if video_formats is None:
video_formats = self._parse_flv(metadata)
self._sort_formats(video_formats)
return {
'id': video_id,

yt_dlp/extractor/dlive.py

@@ -84,7 +84,7 @@ class DLiveStreamIE(InfoExtractor):
self._sort_formats(formats)
return {
'id': display_name,
'title': self._live_title(title),
'title': title,
'uploader': display_name,
'uploader_id': username,
'formats': formats,

yt_dlp/extractor/doodstream.py

@@ -20,6 +20,16 @@ class DoodStreamIE(InfoExtractor):
'description': 'Kat Wonders - Monthly May 2020 | DoodStream.com',
'thumbnail': 'https://img.doodcdn.com/snaps/flyus84qgl2fsk4g.jpg',
}
}, {
'url': 'http://dood.watch/d/5s1wmbdacezb',
'md5': '4568b83b31e13242b3f1ff96c55f0595',
'info_dict': {
'id': '5s1wmbdacezb',
'ext': 'mp4',
'title': 'Kat Wonders - Monthly May 2020',
'description': 'Kat Wonders - Monthly May 2020 | DoodStream.com',
'thumbnail': 'https://img.doodcdn.com/snaps/flyus84qgl2fsk4g.jpg',
}
}, {
'url': 'https://dood.to/d/jzrxn12t2s7n',
'md5': '3207e199426eca7c2aa23c2872e6728a',
@@ -34,31 +44,26 @@ class DoodStreamIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = f'https://dood.to/e/{video_id}'
webpage = self._download_webpage(url, video_id)
if '/d/' in url:
url = "https://dood.to" + self._html_search_regex(
r'<iframe src="(/e/[a-z0-9]+)"', webpage, 'embed')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(['og:title', 'twitter:title'],
webpage, default=None)
thumb = self._html_search_meta(['og:image', 'twitter:image'],
webpage, default=None)
title = self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None)
thumb = self._html_search_meta(['og:image', 'twitter:image'], webpage, default=None)
token = self._html_search_regex(r'[?&]token=([a-z0-9]+)[&\']', webpage, 'token')
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, default=None)
auth_url = 'https://dood.to' + self._html_search_regex(
r'(/pass_md5.*?)\'', webpage, 'pass_md5')
['og:description', 'description', 'twitter:description'], webpage, default=None)
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/66.0',
'referer': url
}
webpage = self._download_webpage(auth_url, video_id, headers=headers)
final_url = webpage + ''.join([random.choice(string.ascii_letters + string.digits) for _ in range(10)]) + "?token=" + token + "&expiry=" + str(int(time.time() * 1000))
pass_md5 = self._html_search_regex(r'(/pass_md5.*?)\'', webpage, 'pass_md5')
final_url = ''.join((
self._download_webpage(f'https://dood.to{pass_md5}', video_id, headers=headers),
*(random.choice(string.ascii_letters + string.digits) for _ in range(10)),
f'?token={token}&expiry={int(time.time() * 1000)}',
))
return {
'id': video_id,

yt_dlp/extractor/douyutv.py

@@ -105,7 +105,7 @@ class DouyuTVIE(InfoExtractor):
'aid': 'pcclient'
})['data']['live_url']
title = self._live_title(unescapeHTML(room['room_name']))
title = unescapeHTML(room['room_name'])
description = room.get('show_details')
thumbnail = room.get('room_src')
uploader = room.get('nickname')

yt_dlp/extractor/dplay.py

@@ -347,8 +347,381 @@ class HGTVDeIE(DPlayBaseIE):
url, display_id, 'eu1-prod.disco-api.com', 'hgtv', 'de')
class DiscoveryPlusIE(DPlayBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?:\w{2}/)?video' + DPlayBaseIE._PATH_REGEX
class DiscoveryPlusBaseIE(DPlayBaseIE):
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': self._PRODUCT,
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
return self._get_disco_api_info(url, self._match_id(url), **self._DISCO_API_PARAMS)
class GoDiscoveryIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:go\.)?discovery\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://go.discovery.com/video/dirty-jobs-discovery-atve-us/rodbuster-galvanizer',
'info_dict': {
'id': '4164906',
'display_id': 'dirty-jobs-discovery-atve-us/rodbuster-galvanizer',
'ext': 'mp4',
'title': 'Rodbuster / Galvanizer',
'description': 'Mike installs rebar with a team of rodbusters, then he galvanizes steel.',
'season_number': 9,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://discovery.com/video/dirty-jobs-discovery-atve-us/rodbuster-galvanizer',
'only_matching': True,
}]
_PRODUCT = 'dsc'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.go.discovery.com',
'realm': 'go',
'country': 'us',
}
class TravelChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?travelchannel\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.travelchannel.com/video/ghost-adventures-travel-channel/ghost-train-of-ely',
'info_dict': {
'id': '2220256',
'display_id': 'ghost-adventures-travel-channel/ghost-train-of-ely',
'ext': 'mp4',
'title': 'Ghost Train of Ely',
'description': 'The crew investigates the dark history of the Nevada Northern Railway.',
'season_number': 24,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.travelchannel.com/video/ghost-adventures-travel-channel/ghost-train-of-ely',
'only_matching': True,
}]
_PRODUCT = 'trav'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.travelchannel.com',
'realm': 'go',
'country': 'us',
}
class CookingChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?cookingchanneltv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.cookingchanneltv.com/video/carnival-eats-cooking-channel/the-postman-always-brings-rice-2348634',
'info_dict': {
'id': '2348634',
'display_id': 'carnival-eats-cooking-channel/the-postman-always-brings-rice-2348634',
'ext': 'mp4',
'title': 'The Postman Always Brings Rice',
'description': 'Noah visits the Maui Fair and the Aurora Winter Festival in Vancouver.',
'season_number': 9,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.cookingchanneltv.com/video/carnival-eats-cooking-channel/the-postman-always-brings-rice-2348634',
'only_matching': True,
}]
_PRODUCT = 'cook'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.cookingchanneltv.com',
'realm': 'go',
'country': 'us',
}
class HGTVUsaIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?hgtv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.hgtv.com/video/home-inspector-joe-hgtv-atve-us/this-mold-house',
'info_dict': {
'id': '4289736',
'display_id': 'home-inspector-joe-hgtv-atve-us/this-mold-house',
'ext': 'mp4',
'title': 'This Mold House',
'description': 'Joe and Noel help take a family’s dream home from hazardous to fabulous.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.hgtv.com/video/home-inspector-joe-hgtv-atve-us/this-mold-house',
'only_matching': True,
}]
_PRODUCT = 'hgtv'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.hgtv.com',
'realm': 'go',
'country': 'us',
}
class FoodNetworkIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?foodnetwork\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.foodnetwork.com/video/kids-baking-championship-food-network/float-like-a-butterfly',
'info_dict': {
'id': '4116449',
'display_id': 'kids-baking-championship-food-network/float-like-a-butterfly',
'ext': 'mp4',
'title': 'Float Like a Butterfly',
'description': 'The 12 kid bakers create colorful carved butterfly cakes.',
'season_number': 10,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.foodnetwork.com/video/kids-baking-championship-food-network/float-like-a-butterfly',
'only_matching': True,
}]
_PRODUCT = 'food'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.foodnetwork.com',
'realm': 'go',
'country': 'us',
}
class DestinationAmericaIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?destinationamerica\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.destinationamerica.com/video/alaska-monsters-destination-america-atve-us/central-alaskas-bigfoot',
'info_dict': {
'id': '4210904',
'display_id': 'alaska-monsters-destination-america-atve-us/central-alaskas-bigfoot',
'ext': 'mp4',
'title': 'Central Alaska’s Bigfoot',
'description': 'A team heads to central Alaska to investigate an aggressive Bigfoot.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.destinationamerica.com/video/alaska-monsters-destination-america-atve-us/central-alaskas-bigfoot',
'only_matching': True,
}]
_PRODUCT = 'dam'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.destinationamerica.com',
'realm': 'go',
'country': 'us',
}
class InvestigationDiscoveryIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?investigationdiscovery\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.investigationdiscovery.com/video/unmasked-investigation-discovery/the-killer-clown',
'info_dict': {
'id': '2139409',
'display_id': 'unmasked-investigation-discovery/the-killer-clown',
'ext': 'mp4',
'title': 'The Killer Clown',
'description': 'A wealthy Florida woman is fatally shot in the face by a clown at her door.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.investigationdiscovery.com/video/unmasked-investigation-discovery/the-killer-clown',
'only_matching': True,
}]
_PRODUCT = 'ids'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.investigationdiscovery.com',
'realm': 'go',
'country': 'us',
}
class AmHistoryChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?ahctv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.ahctv.com/video/modern-sniper-ahc/army',
'info_dict': {
'id': '2309730',
'display_id': 'modern-sniper-ahc/army',
'ext': 'mp4',
'title': 'Army',
'description': 'Snipers today face challenges their predecessors could’ve only dreamed of.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.ahctv.com/video/modern-sniper-ahc/army',
'only_matching': True,
}]
_PRODUCT = 'ahc'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.ahctv.com',
'realm': 'go',
'country': 'us',
}
class ScienceChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?sciencechannel\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine',
'info_dict': {
'id': '2842849',
'display_id': 'strangest-things-science-atve-us/nazi-mystery-machine',
'ext': 'mp4',
'title': 'Nazi Mystery Machine',
'description': 'Experts investigate the secrets of a revolutionary encryption machine.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine',
'only_matching': True,
}]
_PRODUCT = 'sci'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.sciencechannel.com',
'realm': 'go',
'country': 'us',
}
class DIYNetworkIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?diynetwork\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas',
'info_dict': {
'id': '2309730',
'display_id': 'pool-kings-diy-network/bringing-beach-life-to-texas',
'ext': 'mp4',
'title': 'Bringing Beach Life to Texas',
'description': 'The Pool Kings give a family a day at the beach in their own backyard.',
'season_number': 10,
'episode_number': 2,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas',
'only_matching': True,
}]
_PRODUCT = 'diy'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.diynetwork.com',
'realm': 'go',
'country': 'us',
}
class DiscoveryLifeIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoverylife\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoverylife.com/video/surviving-death-discovery-life-atve-us/bodily-trauma',
'info_dict': {
'id': '2218238',
'display_id': 'surviving-death-discovery-life-atve-us/bodily-trauma',
'ext': 'mp4',
'title': 'Bodily Trauma',
'description': 'Meet three people who tested the limits of the human body.',
'season_number': 1,
'episode_number': 2,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.discoverylife.com/video/surviving-death-discovery-life-atve-us/bodily-trauma',
'only_matching': True,
}]
_PRODUCT = 'dlf'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.discoverylife.com',
'realm': 'go',
'country': 'us',
}
class AnimalPlanetIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?animalplanet\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.animalplanet.com/video/north-woods-law-animal-planet/squirrel-showdown',
'info_dict': {
'id': '3338923',
'display_id': 'north-woods-law-animal-planet/squirrel-showdown',
'ext': 'mp4',
'title': 'Squirrel Showdown',
'description': 'A woman is suspected of being in possession of flying squirrel kits.',
'season_number': 16,
'episode_number': 11,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.animalplanet.com/video/north-woods-law-animal-planet/squirrel-showdown',
'only_matching': True,
}]
_PRODUCT = 'apl'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.animalplanet.com',
'realm': 'go',
'country': 'us',
}
class TLCIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:go\.)?tlc\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://go.tlc.com/video/my-600-lb-life-tlc/melissas-story-part-1',
'info_dict': {
'id': '2206540',
'display_id': 'my-600-lb-life-tlc/melissas-story-part-1',
'ext': 'mp4',
'title': 'Melissa’s Story (Part 1)',
'description': 'At 650 lbs, Melissa is ready to begin her seven-year weight loss journey.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://go.tlc.com/video/my-600-lb-life-tlc/melissas-story-part-1',
'only_matching': True,
}]
_PRODUCT = 'tlc'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.tlc.com',
'realm': 'go',
'country': 'us',
}
class DiscoveryPlusIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:\w{2}/)?video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
@@ -372,92 +745,14 @@ class DiscoveryPlusIE(DPlayBaseIE):
}]
_PRODUCT = 'dplus_us'
_API_URL = 'us1-prod-direct.discoveryplus.com'
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': self._PRODUCT,
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, self._API_URL, 'go', 'us')
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.discoveryplus.com',
'realm': 'go',
'country': 'us',
}
class ScienceChannelIE(DiscoveryPlusIE):
_VALID_URL = r'https?://(?:www\.)?sciencechannel\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine',
'info_dict': {
'id': '2842849',
'display_id': 'strangest-things-science-atve-us/nazi-mystery-machine',
'ext': 'mp4',
'title': 'Nazi Mystery Machine',
'description': 'Experts investigate the secrets of a revolutionary encryption machine.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
_PRODUCT = 'sci'
_API_URL = 'us1-prod-direct.sciencechannel.com'
class DIYNetworkIE(DiscoveryPlusIE):
_VALID_URL = r'https?://(?:watch\.)?diynetwork\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas',
'info_dict': {
'id': '2309730',
'display_id': 'pool-kings-diy-network/bringing-beach-life-to-texas',
'ext': 'mp4',
'title': 'Bringing Beach Life to Texas',
'description': 'The Pool Kings give a family a day at the beach in their own backyard.',
'season_number': 10,
'episode_number': 2,
},
'skip': 'Available for Premium users',
}]
_PRODUCT = 'diy'
_API_URL = 'us1-prod-direct.watch.diynetwork.com'
class AnimalPlanetIE(DiscoveryPlusIE):
_VALID_URL = r'https?://(?:www\.)?animalplanet\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.animalplanet.com/video/north-woods-law-animal-planet/squirrel-showdown',
'info_dict': {
'id': '3338923',
'display_id': 'north-woods-law-animal-planet/squirrel-showdown',
'ext': 'mp4',
'title': 'Squirrel Showdown',
'description': 'A woman is suspected of being in possession of flying squirrel kits.',
'season_number': 16,
'episode_number': 11,
},
'skip': 'Available for Premium users',
}]
_PRODUCT = 'apl'
_API_URL = 'us1-prod-direct.animalplanet.com'
class DiscoveryPlusIndiaIE(DPlayBaseIE):
class DiscoveryPlusIndiaIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.in/videos?' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.in/videos/how-do-they-do-it/fugu-and-more?seasonId=8&type=EPISODE',
@@ -467,41 +762,38 @@ class DiscoveryPlusIndiaIE(DPlayBaseIE):
'display_id': 'how-do-they-do-it/fugu-and-more',
'title': 'Fugu and More',
'description': 'The Japanese catch, prepare and eat the deadliest fish on the planet.',
'duration': 1319,
'duration': 1319.32,
'timestamp': 1582309800,
'upload_date': '20200221',
'series': 'How Do They Do It?',
'season_number': 8,
'episode_number': 2,
'creator': 'Discovery Channel',
'thumbnail': r're:https://.+\.jpeg',
'episode': 'Episode 2',
'season': 'Season 8',
'tags': [],
},
'params': {
'skip_download': True,
}
}]
_PRODUCT = 'dplus-india'
_DISCO_API_PARAMS = {
'disco_host': 'ap2-prod-direct.discoveryplus.in',
'realm': 'dplusindia',
'country': 'in',
'domain': 'https://www.discoveryplus.in/',
}
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers.update({
'x-disco-params': 'realm=%s' % realm,
'x-disco-client': 'WEB:UNKNOWN:dplus-india:17.0.0',
'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:17.0.0',
'Authorization': self._get_auth(disco_base, display_id, realm),
})
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'ap2-prod-direct.discoveryplus.in', 'dplusindia', 'in', 'https://www.discoveryplus.in/')
class DiscoveryNetworksDeIE(DPlayBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show|sendungen)/(?P<programme>[^/]+)/(?:video/)?(?P<alternate_id>[^/]+)'
@@ -515,6 +807,16 @@ class DiscoveryNetworksDeIE(DPlayBaseIE):
'description': 'md5:61033c12b73286e409d99a41742ef608',
'timestamp': 1554069600,
'upload_date': '20190331',
'creator': 'TLC',
'season': 'Season 1',
'series': 'Breaking Amish',
'episode_number': 1,
'tags': ['new york', 'großstadt', 'amische', 'landleben', 'modern', 'infos', 'tradition', 'herausforderung'],
'display_id': 'breaking-amish/die-welt-da-drauen',
'episode': 'Episode 1',
'duration': 2625.024,
'season_number': 1,
'thumbnail': r're:https://.+\.jpg',
},
'params': {
'skip_download': True,
@@ -564,10 +866,10 @@ class DiscoveryPlusShowBaseIE(DPlayBaseIE):
total_pages = try_get(season_json, lambda x: x['meta']['totalPages'], int) or 1
episodes_json = season_json['data']
for episode in episodes_json:
video_id = episode['attributes']['path']
video_path = episode['attributes']['path']
yield self.url_result(
'%svideos/%s' % (self._DOMAIN, video_id),
ie=self._VIDEO_IE.ie_key(), video_id=video_id)
'%svideos/%s' % (self._DOMAIN, video_path),
ie=self._VIDEO_IE.ie_key(), video_id=episode.get('id') or video_path)
page_num += 1
def _real_extract(self, url):
@@ -575,6 +877,21 @@ class DiscoveryPlusShowBaseIE(DPlayBaseIE):
return self.playlist_result(self._entries(show_name), playlist_id=show_name)
class DiscoveryPlusItalyIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/it/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/it/video/i-signori-della-neve/stagione-2-episodio-1-i-preparativi',
'only_matching': True,
}]
_PRODUCT = 'dplus_us'
_DISCO_API_PARAMS = {
'disco_host': 'eu1-prod-direct.discoveryplus.com',
'realm': 'dplay',
'country': 'it',
}
class DiscoveryPlusItalyShowIE(DiscoveryPlusShowBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.it/programmi/(?P<show_name>[^/]+)/?(?:[?#]|$)'
_TESTS = [{

116
yt_dlp/extractor/drooble.py Normal file

@@ -0,0 +1,116 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
try_get,
)
class DroobleIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://drooble\.com/(?:
(?:(?P<user>[^/]+)/)?(?P<kind>song|videos|music/albums)/(?P<id>\d+)|
(?P<user_2>[^/]+)/(?P<kind_2>videos|music))
'''
_TESTS = [{
'url': 'https://drooble.com/song/2858030',
'md5': '5ffda90f61c7c318dc0c3df4179eb064',
'info_dict': {
'id': '2858030',
'ext': 'mp3',
'title': 'Skankocillin',
'upload_date': '20200801',
'timestamp': 1596241390,
'uploader_id': '95894',
'uploader': 'Bluebeat Shelter',
}
}, {
'url': 'https://drooble.com/karl340758/videos/2859183',
'info_dict': {
'id': 'J6QCQY_I5Tk',
'ext': 'mp4',
'title': 'Skankocillin',
'uploader_id': 'UCrSRoI5vVyeYihtWEYua7rg',
'description': 'md5:ffc0bd8ba383db5341a86a6cd7d9bcca',
'upload_date': '20200731',
'uploader': 'Bluebeat Shelter',
}
}, {
'url': 'https://drooble.com/karl340758/music/albums/2858031',
'info_dict': {
'id': '2858031',
},
'playlist_mincount': 8,
}, {
'url': 'https://drooble.com/karl340758/music',
'info_dict': {
'id': 'karl340758',
},
'playlist_mincount': 8,
}, {
'url': 'https://drooble.com/karl340758/videos',
'info_dict': {
'id': 'karl340758',
},
'playlist_mincount': 8,
}]
def _call_api(self, method, video_id, data=None):
response = self._download_json(
f'https://drooble.com/api/dt/{method}', video_id, data=json.dumps(data).encode())
if not response[0]:
raise ExtractorError('Unable to download JSON metadata')
return response[1]
def _real_extract(self, url):
mobj = self._match_valid_url(url)
user = mobj.group('user') or mobj.group('user_2')
kind = mobj.group('kind') or mobj.group('kind_2')
display_id = mobj.group('id') or user
if mobj.group('kind_2') == 'videos':
data = {'from_user': display_id, 'album': -1, 'limit': 18, 'offset': 0, 'order': 'new2old', 'type': 'video'}
elif kind in ('music/albums', 'music'):
data = {'user': user, 'public_only': True, 'individual_limit': {'singles': 1, 'albums': 1, 'playlists': 1}}
else:
data = {'url_slug': display_id, 'children': 10, 'order': 'old2new'}
method = 'getMusicOverview' if kind in ('music/albums', 'music') else 'getElements'
json_data = self._call_api(method, display_id, data=data)
if kind in ('music/albums', 'music'):
json_data = json_data['singles']['list']
entities = []
for media in json_data:
url = media.get('external_media_url') or media.get('link')
if url.startswith('https://www.youtube.com'):
entities.append({
'_type': 'url',
'url': url,
'ie_key': 'Youtube'
})
continue
is_audio = (media.get('type') or '').lower() == 'audio'
entities.append({
'url': url,
'id': media['id'],
'title': media['title'],
'duration': int_or_none(media.get('duration')),
'timestamp': int_or_none(media.get('timestamp')),
'album': try_get(media, lambda x: x['album']['title']),
'uploader': try_get(media, lambda x: x['creator']['display_name']),
'uploader_id': try_get(media, lambda x: x['creator']['id']),
'thumbnail': media.get('image_comment'),
'like_count': int_or_none(media.get('likes')),
'vcodec': 'none' if is_audio else None,
'ext': 'mp3' if is_audio else None,
})
if len(entities) > 1:
return self.playlist_result(entities, display_id)
return entities[0]

yt_dlp/extractor/dropbox.py

@@ -6,7 +6,12 @@ import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import url_basename
from ..utils import (
ExtractorError,
traverse_obj,
try_get,
url_basename,
)
class DropboxIE(InfoExtractor):
@@ -28,13 +33,44 @@ class DropboxIE(InfoExtractor):
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
fn = compat_urllib_parse_unquote(url_basename(url))
title = os.path.splitext(fn)[0]
video_url = re.sub(r'[?&]dl=0', '', url)
video_url += ('?' if '?' not in video_url else '&') + 'dl=1'
password = self.get_param('videopassword')
if (self._og_search_title(webpage) == 'Dropbox - Password Required'
or 'Enter the password for this link' in webpage):
if password:
content_id = self._search_regex(r'content_id=(.*?)["\']', webpage, 'content_id')
payload = f'is_xhr=true&t={self._get_cookies("https://www.dropbox.com").get("t").value}&content_id={content_id}&password={password}&url={url}'
response = self._download_json(
'https://www.dropbox.com/sm/auth', video_id, 'POSTing video password', data=payload.encode('UTF-8'),
headers={'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'})
if response.get('status') != 'authed':
raise ExtractorError('Authentication failed!', expected=True)
webpage = self._download_webpage(url, video_id)
elif self._get_cookies('https://dropbox.com').get('sm_auth'):
webpage = self._download_webpage(url, video_id)
else:
raise ExtractorError('Password protected video, use --video-password <password>', expected=True)
json_string = self._html_search_regex(r'InitReact\.mountComponent.+ "props":(.+), "elem_id"', webpage, 'Info JSON')
info_json = self._parse_json(json_string, video_id)
transcode_url = traverse_obj(info_json, ((None, 'preview'), 'file', 'preview', 'content', 'transcode_url'), get_all=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(transcode_url, video_id)
# If downloads are enabled, we can get the original file
if 'anonymous' in (try_get(info_json, lambda x: x['sharePermission']['canDownloadRoles']) or []):
video_url = re.sub(r'[?&]dl=0', '', url)
video_url += ('?' if '?' not in video_url else '&') + 'dl=1'
formats.append({'url': video_url, 'format_id': 'original', 'format_note': 'Original', 'quality': 1})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'subtitles': subtitles
}

212
yt_dlp/extractor/dropout.py Normal file

@@ -0,0 +1,212 @@
# coding: utf-8
from .common import InfoExtractor
from .vimeo import VHXEmbedIE
from ..utils import (
clean_html,
ExtractorError,
get_element_by_class,
get_element_by_id,
get_elements_by_class,
int_or_none,
join_nonempty,
unified_strdate,
urlencode_postdata,
)
class DropoutIE(InfoExtractor):
_LOGIN_URL = 'https://www.dropout.tv/login'
_NETRC_MACHINE = 'dropout'
_VALID_URL = r'https?://(?:www\.)?dropout\.tv/(?:[^/]+/)*videos/(?P<id>[^/]+)/?$'
_TESTS = [
{
'url': 'https://www.dropout.tv/game-changer/season:2/videos/yes-or-no',
'note': 'Episode in a series',
'md5': '5e000fdfd8d8fa46ff40456f1c2af04a',
'info_dict': {
'id': '738153',
'display_id': 'yes-or-no',
'ext': 'mp4',
'title': 'Yes or No',
'description': 'Ally, Brennan, and Zac are asked a simple question, but is there a correct answer?',
'release_date': '20200508',
'thumbnail': 'https://vhx.imgix.net/chuncensoredstaging/assets/351e3f24-c4a3-459a-8b79-dc80f1e5b7fd.jpg',
'series': 'Game Changer',
'season_number': 2,
'season': 'Season 2',
'episode_number': 6,
'episode': 'Yes or No',
'duration': 1180,
'uploader_id': 'user80538407',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader': 'OTT Videos'
},
'expected_warnings': ['Ignoring subtitle tracks found in the HLS manifest']
},
{
'url': 'https://www.dropout.tv/dimension-20-fantasy-high/season:1/videos/episode-1',
'note': 'Episode in a series (missing release_date)',
'md5': '712caf7c191f1c47c8f1879520c2fa5c',
'info_dict': {
'id': '320562',
'display_id': 'episode-1',
'ext': 'mp4',
'title': 'The Beginning Begins',
'description': 'The cast introduces their PCs, including a neurotic elf, a goblin PI, and a corn-worshipping cleric.',
'thumbnail': 'https://vhx.imgix.net/chuncensoredstaging/assets/4421ed0d-f630-4c88-9004-5251b2b8adfa.jpg',
'series': 'Dimension 20: Fantasy High',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'The Beginning Begins',
'duration': 6838,
'uploader_id': 'user80538407',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader': 'OTT Videos'
},
'expected_warnings': ['Ignoring subtitle tracks found in the HLS manifest']
},
{
'url': 'https://www.dropout.tv/videos/misfits-magic-holiday-special',
'note': 'Episode not in a series',
'md5': 'c30fa18999c5880d156339f13c953a26',
'info_dict': {
'id': '1915774',
'display_id': 'misfits-magic-holiday-special',
'ext': 'mp4',
'title': 'Misfits & Magic Holiday Special',
'description': 'The magical misfits spend Christmas break at Gowpenny, with an unwelcome visitor.',
'release_date': '20211215',
'thumbnail': 'https://vhx.imgix.net/chuncensoredstaging/assets/d91ea8a6-b250-42ed-907e-b30fb1c65176-8e24b8e5.jpg',
'duration': 11698,
'uploader_id': 'user80538407',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader': 'OTT Videos'
},
'expected_warnings': ['Ignoring subtitle tracks found in the HLS manifest']
}
]
def _get_authenticity_token(self, display_id):
signin_page = self._download_webpage(
self._LOGIN_URL, display_id, note='Getting authenticity token')
return self._html_search_regex(
r'name=["\']authenticity_token["\'] value=["\'](.+?)["\']',
signin_page, 'authenticity_token')
def _login(self, display_id):
username, password = self._get_login_info()
if not (username and password):
self.raise_login_required(method='password')
response = self._download_webpage(
self._LOGIN_URL, display_id, note='Logging in', data=urlencode_postdata({
'email': username,
'password': password,
'authenticity_token': self._get_authenticity_token(display_id),
'utf8': True
}))
user_has_subscription = self._search_regex(
r'user_has_subscription:\s*["\'](.+?)["\']', response, 'subscription status', default='none')
if user_has_subscription.lower() == 'true':
return response
elif user_has_subscription.lower() == 'false':
raise ExtractorError('Account is not subscribed')
else:
raise ExtractorError('Incorrect username/password')
def _real_extract(self, url):
display_id = self._match_id(url)
try:
self._login(display_id)
webpage = self._download_webpage(url, display_id, note='Downloading video webpage')
finally:
self._download_webpage('https://www.dropout.tv/logout', display_id, note='Logging out')
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
thumbnail = self._og_search_thumbnail(webpage)
watch_info = get_element_by_id('watch-info', webpage) or ''
title = clean_html(get_element_by_class('video-title', watch_info))
season_episode = get_element_by_class(
'site-font-secondary-color', get_element_by_class('text', watch_info))
episode_number = int_or_none(self._search_regex(
r'Episode (\d+)', season_episode or '', 'episode', default=None))
return {
'_type': 'url_transparent',
'ie_key': VHXEmbedIE.ie_key(),
'url': embed_url,
'id': self._search_regex(r'embed.vhx.tv/videos/(.+?)\?', embed_url, 'id'),
'display_id': display_id,
'title': title,
'description': self._html_search_meta('description', webpage, fatal=False),
'thumbnail': thumbnail.split('?')[0] if thumbnail else None, # Ignore crop/downscale
'series': clean_html(get_element_by_class('series-title', watch_info)),
'episode_number': episode_number,
'episode': title if episode_number else None,
'season_number': int_or_none(self._search_regex(
r'Season (\d+),', season_episode or '', 'season', default=None)),
'release_date': unified_strdate(self._search_regex(
r'data-meta-field-name=["\']release_dates["\'] data-meta-field-value=["\'](.+?)["\']',
watch_info, 'release date', default=None)),
}
class DropoutSeasonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dropout\.tv/(?P<id>[^\/$&?#]+)(?:/?$|/season:[0-9]+/?$)'
_TESTS = [
{
'url': 'https://www.dropout.tv/dimension-20-fantasy-high/season:1',
'note': 'Multi-season series with the season in the url',
'playlist_count': 17,
'info_dict': {
'id': 'dimension-20-fantasy-high-season-1',
'title': 'Dimension 20 Fantasy High - Season 1'
}
},
{
'url': 'https://www.dropout.tv/dimension-20-fantasy-high',
'note': 'Multi-season series with the season not in the url',
'playlist_count': 17,
'info_dict': {
'id': 'dimension-20-fantasy-high-season-1',
'title': 'Dimension 20 Fantasy High - Season 1'
}
},
{
'url': 'https://www.dropout.tv/dimension-20-shriek-week',
'note': 'Single-season series',
'playlist_count': 4,
'info_dict': {
'id': 'dimension-20-shriek-week-season-1',
'title': 'Dimension 20 Shriek Week - Season 1'
}
}
]
def _real_extract(self, url):
season_id = self._match_id(url)
season_title = season_id.replace('-', ' ').title()
webpage = self._download_webpage(url, season_id)
entries = [
self.url_result(
url=self._search_regex(r'<a href=["\'](.+?)["\'] class=["\']browse-item-link["\']',
item, 'item_url'),
ie=DropoutIE.ie_key()
) for item in get_elements_by_class('js-collection-item', webpage)
]
seasons = (get_element_by_class('select-dropdown-wrapper', webpage) or '').strip().replace('\n', '')
current_season = self._search_regex(r'<option[^>]+selected>([^<]+)</option>',
seasons, 'current_season', default='').strip()
return {
'_type': 'playlist',
'id': join_nonempty(season_id, current_season.lower().replace(' ', '-')),
'title': join_nonempty(season_title, current_season, delim=' - '),
'entries': entries
}
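
The login above is a plain CSRF-token dance: fetch the login page, scrape authenticity_token, POST it back with the credentials, then read the embedded user_has_subscription flag to tell a bad password from an unsubscribed account. A hedged re-expression with requests, for illustration only (the extractor itself goes through yt-dlp's _download_webpage helpers):

import re
import requests

LOGIN_URL = 'https://www.dropout.tv/login'

def login(email, password):
    session = requests.Session()
    page = session.get(LOGIN_URL).text
    token = re.search(
        r'name=["\']authenticity_token["\'] value=["\'](.+?)["\']', page).group(1)
    response = session.post(LOGIN_URL, data={
        'email': email,
        'password': password,
        'authenticity_token': token,
        'utf8': True,
    })
    # 'true' -> logged in and subscribed; 'false' -> valid login without a
    # subscription; anything else -> bad credentials
    status = re.search(r'user_has_subscription:\s*["\'](.+?)["\']', response.text)
    return session if status and status.group(1).lower() == 'true' else None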

yt_dlp/extractor/drtv.py

@@ -7,13 +7,11 @@ import re
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..aes import aes_cbc_decrypt_bytes, unpad_pkcs7
from ..compat import compat_urllib_parse_unquote
from ..utils import (
bytes_to_intlist,
ExtractorError,
int_or_none,
intlist_to_bytes,
float_or_none,
mimetype2ext,
str_or_none,
@@ -191,13 +189,11 @@ class DRTVIE(InfoExtractor):
def decrypt_uri(e):
n = int(e[2:10], 16)
a = e[10 + n:]
data = bytes_to_intlist(hex_to_bytes(e[10:10 + n]))
key = bytes_to_intlist(hashlib.sha256(
('%s:sRBzYNXBzkKgnjj8pGtkACch' % a).encode('utf-8')).digest())
iv = bytes_to_intlist(hex_to_bytes(a))
decrypted = aes_cbc_decrypt(data, key, iv)
return intlist_to_bytes(
decrypted[:-decrypted[-1]]).decode('utf-8').split('?')[0]
data = hex_to_bytes(e[10:10 + n])
key = hashlib.sha256(('%s:sRBzYNXBzkKgnjj8pGtkACch' % a).encode('utf-8')).digest()
iv = hex_to_bytes(a)
decrypted = unpad_pkcs7(aes_cbc_decrypt_bytes(data, key, iv))
return decrypted.decode('utf-8').split('?')[0]
for asset in assets:
kind = asset.get('Kind')
@@ -321,7 +317,7 @@ class DRTVLiveIE(InfoExtractor):
channel_data = self._download_json(
'https://www.dr.dk/mu-online/api/1.0/channel/' + channel_id,
channel_id)
title = self._live_title(channel_data['Title'])
title = channel_data['Title']
formats = []
for streaming_server in channel_data.get('StreamingServers', []):
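
The decrypt_uri hunk above drops the int-list round-trip (bytes_to_intlist/intlist_to_bytes) in favour of the bytes-native aes_cbc_decrypt_bytes plus unpad_pkcs7. The unpadding is the same arithmetic the old code spelled out as decrypted[:-decrypted[-1]]; a pure-Python sketch:

def unpad_pkcs7_sketch(data: bytes) -> bytes:
    # PKCS#7: the last byte encodes the pad length, so drop that many bytes
    return data[:-data[-1]]

assert unpad_pkcs7_sketch(b'secret-uri\x02\x02') == b'secret-uri'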

yt_dlp/extractor/ertgr.py (new file)

@@ -0,0 +1,316 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
ExtractorError,
dict_get,
int_or_none,
merge_dicts,
parse_qs,
parse_age_limit,
parse_iso8601,
str_or_none,
try_get,
unescapeHTML,
url_or_none,
variadic,
)
class ERTFlixBaseIE(InfoExtractor):
def _call_api(
self, video_id, method='Player/AcquireContent', api_version=1,
param_headers=None, data=None, headers=None, **params):
platform_codename = {'platformCodename': 'www'}
headers_as_param = {'X-Api-Date-Format': 'iso', 'X-Api-Camel-Case': False}
headers_as_param.update(param_headers or {})
headers = headers or {}
if data:
headers['Content-Type'] = headers_as_param['Content-Type'] = 'application/json;charset=utf-8'
data = json.dumps(merge_dicts(platform_codename, data)).encode('utf-8')
query = merge_dicts(
{} if data else platform_codename,
{'$headers': json.dumps(headers_as_param)},
params)
response = self._download_json(
'https://api.app.ertflix.gr/v%s/%s' % (str(api_version), method),
video_id, fatal=False, query=query, data=data, headers=headers)
if try_get(response, lambda x: x['Result']['Success']) is True:
return response
def _call_api_get_tiles(self, video_id, *tile_ids):
requested_tile_ids = [video_id] + list(tile_ids)
requested_tiles = [{'Id': tile_id} for tile_id in requested_tile_ids]
tiles_response = self._call_api(
video_id, method='Tile/GetTiles', api_version=2,
data={'RequestedTiles': requested_tiles})
tiles = try_get(tiles_response, lambda x: x['Tiles'], list) or []
if tile_ids:
if sorted([tile['Id'] for tile in tiles]) != sorted(requested_tile_ids):
raise ExtractorError('Requested tiles not found', video_id=video_id)
return tiles
try:
return next(tile for tile in tiles if tile['Id'] == video_id)
except StopIteration:
raise ExtractorError('No matching tile found', video_id=video_id)
class ERTFlixCodenameIE(ERTFlixBaseIE):
IE_NAME = 'ertflix:codename'
IE_DESC = 'ERTFLIX videos by codename'
_VALID_URL = r'ertflix:(?P<id>[\w-]+)'
_TESTS = [{
'url': 'ertflix:monogramma-praxitelis-tzanoylinos',
'md5': '5b9c2cd171f09126167e4082fc1dd0ef',
'info_dict': {
'id': 'monogramma-praxitelis-tzanoylinos',
'ext': 'mp4',
'title': 'md5:ef0b439902963d56c43ac83c3f41dd0e',
},
},
]
def _extract_formats_and_subs(self, video_id, allow_none=True):
media_info = self._call_api(video_id, codename=video_id)
formats, subs = [], {}
for media_file in try_get(media_info, lambda x: x['MediaFiles'], list) or []:
for media in try_get(media_file, lambda x: x['Formats'], list) or []:
fmt_url = url_or_none(try_get(media, lambda x: x['Url']))
if not fmt_url:
continue
ext = determine_ext(fmt_url)
if ext == 'm3u8':
formats_, subs_ = self._extract_m3u8_formats_and_subtitles(
fmt_url, video_id, m3u8_id='hls', ext='mp4', fatal=False)
elif ext == 'mpd':
formats_, subs_ = self._extract_mpd_formats_and_subtitles(
fmt_url, video_id, mpd_id='dash', fatal=False)
else:
formats.append({
'url': fmt_url,
'format_id': str_or_none(media.get('Id')),
})
continue
formats.extend(formats_)
self._merge_subtitles(subs_, target=subs)
if formats or not allow_none:
self._sort_formats(formats)
return formats, subs
def _real_extract(self, url):
video_id = self._match_id(url)
formats, subs = self._extract_formats_and_subs(video_id)
if formats:
return {
'id': video_id,
'formats': formats,
'subtitles': subs,
'title': self._generic_title(url),
}
class ERTFlixIE(ERTFlixBaseIE):
IE_NAME = 'ertflix'
IE_DESC = 'ERTFLIX videos'
_VALID_URL = r'https?://www\.ertflix\.gr/(?:series|vod)/(?P<id>[a-z]{3}\.\d+)'
_TESTS = [{
'url': 'https://www.ertflix.gr/vod/vod.173258-aoratoi-ergates',
'md5': '6479d5e60fd7e520b07ba5411dcdd6e7',
'info_dict': {
'id': 'aoratoi-ergates',
'ext': 'mp4',
'title': 'md5:c1433d598fbba0211b0069021517f8b4',
'description': 'md5:01a64d113c31957eb7eb07719ab18ff4',
'thumbnail': r're:https?://.+\.jpg',
'episode_id': 'vod.173258',
'timestamp': 1639648800,
'upload_date': '20211216',
'duration': 3166,
'age_limit': 8,
},
}, {
'url': 'https://www.ertflix.gr/series/ser.3448-monogramma',
'info_dict': {
'id': 'ser.3448',
'age_limit': 8,
'description': 'Η εκπομπή σαράντα ετών που σημάδεψε τον πολιτισμό μας.',
'title': 'Μονόγραμμα',
},
'playlist_mincount': 64,
}, {
'url': 'https://www.ertflix.gr/series/ser.3448-monogramma?season=1',
'info_dict': {
'id': 'ser.3448',
'age_limit': 8,
'description': 'Η εκπομπή σαράντα ετών που σημάδεψε τον πολιτισμό μας.',
'title': 'Μονόγραμμα',
},
'playlist_count': 22,
}, {
'url': 'https://www.ertflix.gr/series/ser.3448-monogramma?season=1&season=2021%20-%202022',
'info_dict': {
'id': 'ser.3448',
'age_limit': 8,
'description': 'Η εκπομπή σαράντα ετών που σημάδεψε τον πολιτισμό μας.',
'title': 'Μονόγραμμα',
},
'playlist_mincount': 36,
}, {
'url': 'https://www.ertflix.gr/series/ser.164991-to-diktuo-1?season=1-9',
'info_dict': {
'id': 'ser.164991',
'age_limit': 8,
'description': 'Η πρώτη ελληνική εκπομπή με θεματολογία αποκλειστικά γύρω από το ίντερνετ.',
'title': 'Το δίκτυο',
},
'playlist_mincount': 9,
}]
def _extract_episode(self, episode):
codename = try_get(episode, lambda x: x['Codename'], compat_str)
title = episode.get('Title')
description = clean_html(dict_get(episode, ('ShortDescription', 'TinyDescription', )))
if not codename or not title or not episode.get('HasPlayableStream', True):
return
thumbnail = next((
url_or_none(thumb.get('Url'))
for thumb in variadic(dict_get(episode, ('Images', 'Image')) or {})
if thumb.get('IsMain')),
None)
return {
'_type': 'url_transparent',
'thumbnail': thumbnail,
'id': codename,
'episode_id': episode.get('Id'),
'title': title,
'alt_title': episode.get('Subtitle'),
'description': description,
'timestamp': parse_iso8601(episode.get('PublishDate')),
'duration': episode.get('DurationSeconds'),
'age_limit': self._parse_age_rating(episode),
'url': 'ertflix:%s' % (codename, ),
}
@staticmethod
def _parse_age_rating(info_dict):
return parse_age_limit(
info_dict.get('AgeRating')
or (info_dict.get('IsAdultContent') and 18)
or (info_dict.get('IsKidsContent') and 0))
def _extract_series(self, video_id, season_titles=None, season_numbers=None):
media_info = self._call_api(video_id, method='Tile/GetSeriesDetails', id=video_id)
series = try_get(media_info, lambda x: x['Series'], dict) or {}
series_info = {
'age_limit': self._parse_age_rating(series),
'title': series.get('Title'),
'description': dict_get(series, ('ShortDescription', 'TinyDescription', )),
}
if season_numbers:
season_titles = season_titles or []
for season in try_get(series, lambda x: x['Seasons'], list) or []:
if season.get('SeasonNumber') in season_numbers and season.get('Title'):
season_titles.append(season['Title'])
def gen_episode(m_info, season_titles):
for episode_group in try_get(m_info, lambda x: x['EpisodeGroups'], list) or []:
if season_titles and episode_group.get('Title') not in season_titles:
continue
episodes = try_get(episode_group, lambda x: x['Episodes'], list)
if not episodes:
continue
season_info = {
'season': episode_group.get('Title'),
'season_number': int_or_none(episode_group.get('SeasonNumber')),
}
try:
episodes = [(int(ep['EpisodeNumber']), ep) for ep in episodes]
episodes.sort()
except (KeyError, ValueError):
episodes = enumerate(episodes, 1)
for n, episode in episodes:
info = self._extract_episode(episode)
if info is None:
continue
info['episode_number'] = n
info.update(season_info)
yield info
return self.playlist_result(
gen_episode(media_info, season_titles), playlist_id=video_id, **series_info)
def _real_extract(self, url):
video_id = self._match_id(url)
if video_id.startswith('ser.'):
param_season = parse_qs(url).get('season', [None])
param_season = [
(have_number, int_or_none(v) if have_number else str_or_none(v))
for have_number, v in
[(int_or_none(ps) is not None, ps) for ps in param_season]
if v is not None
]
season_kwargs = {
k: [v for is_num, v in param_season if is_num is c] or None
for k, c in
[('season_titles', False), ('season_numbers', True)]
}
return self._extract_series(video_id, **season_kwargs)
return self._extract_episode(self._call_api_get_tiles(video_id))
class ERTWebtvEmbedIE(InfoExtractor):
IE_NAME = 'ertwebtv:embed'
IE_DESC = 'ert.gr webtv embedded videos'
_BASE_PLAYER_URL_RE = re.escape('//www.ert.gr/webtv/live-uni/vod/dt-uni-vod.php')
_VALID_URL = rf'https?:{_BASE_PLAYER_URL_RE}\?([^#]+&)?f=(?P<id>[^#&]+)'
_TESTS = [{
'url': 'https://www.ert.gr/webtv/live-uni/vod/dt-uni-vod.php?f=trailers/E2251_TO_DIKTYO_E09_16-01_1900.mp4&bgimg=/photos/2022/1/to_diktio_ep09_i_istoria_tou_diadiktiou_stin_Ellada_1021x576.jpg',
'md5': 'f9e9900c25c26f4ecfbddbb4b6305854',
'info_dict': {
'id': 'trailers/E2251_TO_DIKTYO_E09_16-01_1900.mp4',
'title': 'md5:914f06a73cd8b62fbcd6fb90c636e497',
'ext': 'mp4',
'thumbnail': 'https://program.ert.gr/photos/2022/1/to_diktio_ep09_i_istoria_tou_diadiktiou_stin_Ellada_1021x576.jpg'
},
}]
@classmethod
def _extract_urls(cls, webpage):
EMBED_URL_RE = rf'(?:https?:)?{cls._BASE_PLAYER_URL_RE}\?(?:(?!(?P=_q1)).)+'
EMBED_RE = rf'<iframe[^>]+?src=(?P<_q1>["\'])(?P<url>{EMBED_URL_RE})(?P=_q1)'
for mobj in re.finditer(EMBED_RE, webpage):
url = unescapeHTML(mobj.group('url'))
if not cls.suitable(url):
continue
yield url
def _real_extract(self, url):
video_id = self._match_id(url)
formats, subs = self._extract_m3u8_formats_and_subtitles(
f'https://mediastream.ert.gr/vodedge/_definst_/mp4:dvrorigin/{video_id}/playlist.m3u8',
video_id, 'mp4')
self._sort_formats(formats)
thumbnail_id = parse_qs(url).get('bgimg', [None])[0]
if thumbnail_id and not thumbnail_id.startswith('http'):
thumbnail_id = f'https://program.ert.gr{thumbnail_id}'
return {
'id': video_id,
'title': f'VOD - {video_id}',
'thumbnail': thumbnail_id,
'formats': formats,
'subtitles': subs,
}
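
A quirk of the ERTFLIX API that _call_api encodes: per-request headers travel as JSON inside a $headers query parameter, next to the platformCodename. A minimal sketch of the resulting URL shape (build_api_url is a hypothetical helper, not part of the extractor):

import json
from urllib.parse import urlencode

def build_api_url(method, api_version=1, **params):
    query = {
        'platformCodename': 'www',
        '$headers': json.dumps({'X-Api-Date-Format': 'iso', 'X-Api-Camel-Case': False}),
        **params,
    }
    return 'https://api.app.ertflix.gr/v%s/%s?%s' % (api_version, method, urlencode(query))

print(build_api_url('Player/AcquireContent', codename='monogramma-praxitelis-tzanoylinos'))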

yt_dlp/extractor/europeantour.py (new file)

@@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class EuropeanTourIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?europeantour\.com/dpworld-tour/news/video/(?P<id>[^/&?#$]+)'
_TESTS = [{
'url': 'https://www.europeantour.com/dpworld-tour/news/video/the-best-shots-of-the-2021-seasons/',
'info_dict': {
'id': '6287788195001',
'ext': 'mp4',
'title': 'The best shots of the 2021 seasons',
'duration': 2416.512,
'timestamp': 1640010141,
'uploader_id': '5136026580001',
'tags': ['prod-imported'],
'thumbnail': 'md5:fdac52bc826548860edf8145ee74e71a',
'upload_date': '20211220'
},
'params': {'skip_download': True}
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
def _real_extract(self, url):
id = self._match_id(url)
webpage = self._download_webpage(url, id)
vid, aid = re.search(r'(?s)brightcove-player\s?video-id="([^"]+)".*"ACCOUNT_ID":"([^"]+)"', webpage).groups()
if not aid:
aid = '5136026580001'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (aid, vid), 'BrightcoveNew')
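
The hand-off simply fills the account and video ids into BRIGHTCOVE_URL_TEMPLATE and defers to the BrightcoveNew extractor; with the fallback account id and the test video id above:

print('http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
      % ('5136026580001', '6287788195001'))
# http://players.brightcove.net/5136026580001/default_default/index.html?videoId=6287788195001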

yt_dlp/extractor/extractors.py

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
from .abc import (
ABCIE,
ABCIViewIE,
ABCIViewShowSeriesIE,
)
from .abcnews import (
AbcNewsIE,
@@ -36,7 +37,10 @@ from .aenetworks import (
HistoryPlayerIE,
BiographyIE,
)
from .afreecatv import AfreecaTVIE
from .afreecatv import (
AfreecaTVIE,
AfreecaTVLiveIE,
)
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
@@ -189,6 +193,7 @@ from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .cableav import CableAVIE
from .callin import CallinIE
from .cam4 import CAM4IE
from .camdemy import (
CamdemyIE,
@@ -299,6 +304,10 @@ from .cozytv import CozyTVIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .crooksandliars import CrooksAndLiarsIE
from .crowdbunker import (
CrowdBunkerIE,
CrowdBunkerChannelIE,
)
from .crunchyroll import (
CrunchyrollIE,
CrunchyrollShowPlaylistIE,
@@ -316,6 +325,7 @@ from .curiositystream import (
CuriosityStreamSeriesIE,
)
from .cwtv import CWTVIE
from .daftsex import DaftsexIE
from .dailymail import DailyMailIE
from .dailymotion import (
DailymotionIE,
@@ -351,11 +361,22 @@ from .dplay import (
DPlayIE,
DiscoveryPlusIE,
HGTVDeIE,
GoDiscoveryIE,
TravelChannelIE,
CookingChannelIE,
HGTVUsaIE,
FoodNetworkIE,
InvestigationDiscoveryIE,
DestinationAmericaIE,
AmHistoryChannelIE,
ScienceChannelIE,
DIYNetworkIE,
DiscoveryLifeIE,
AnimalPlanetIE,
TLCIE,
DiscoveryPlusIndiaIE,
DiscoveryNetworksDeIE,
DiscoveryPlusItalyIE,
DiscoveryPlusItalyShowIE,
DiscoveryPlusIndiaShowIE,
)
@@ -374,16 +395,16 @@ from .duboku import (
)
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .digitalconcerthall import DigitalConcertHallIE
from .discovery import DiscoveryIE
from .discoverygo import (
DiscoveryGoIE,
DiscoveryGoPlaylistIE,
)
from .discoveryvr import DiscoveryVRIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
from .doodstream import DoodStreamIE
from .dropbox import DropboxIE
from .dropout import (
DropoutSeasonIE,
DropoutIE
)
from .dw import (
DWIE,
DWArticleIE,
@@ -417,6 +438,11 @@ from .eroprofile import (
EroProfileIE,
EroProfileAlbumIE,
)
from .ertgr import (
ERTFlixCodenameIE,
ERTFlixIE,
ERTWebtvEmbedIE,
)
from .escapist import EscapistIE
from .espn import (
ESPNIE,
@@ -426,6 +452,7 @@ from .espn import (
)
from .esri import EsriVideoIE
from .europa import EuropaIE
from .europeantour import EuropeanTourIE
from .euscreen import EUScreenIE
from .expotv import ExpoTVIE
from .expressen import ExpressenIE
@@ -434,6 +461,7 @@ from .eyedotv import EyedoTVIE
from .facebook import (
FacebookIE,
FacebookPluginsVideoIE,
FacebookRedirectURLIE,
)
from .fancode import (
FancodeVodIE,
@@ -505,6 +533,14 @@ from .gab import (
)
from .gaia import GaiaIE
from .gameinformer import GameInformerIE
from .gamejolt import (
GameJoltIE,
GameJoltUserIE,
GameJoltGameIE,
GameJoltGameSoundtrackIE,
GameJoltCommunityIE,
GameJoltSearchIE,
)
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
@@ -523,6 +559,7 @@ from .globo import (
)
from .go import GoIE
from .godtube import GodTubeIE
from .gofile import GofileIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googlepodcasts import (
@@ -562,6 +599,10 @@ from .hrti import (
HRTiIE,
HRTiPlaylistIE,
)
from .hse import (
HSEShowIE,
HSEProductIE,
)
from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE
from .hungama import (
@@ -601,6 +642,7 @@ from .instagram import (
InstagramIOSIE,
InstagramUserIE,
InstagramTagIE,
InstagramStoryIE,
)
from .internazionale import InternazionaleIE
from .internetvideoarchive import InternetVideoArchiveIE
@@ -608,7 +650,11 @@ from .iprima import (
IPrimaIE,
IPrimaCNNIE
)
from .iqiyi import IqiyiIE
from .iqiyi import (
IqiyiIE,
IqIE,
IqAlbumIE
)
from .ir90tv import Ir90TvIE
from .itv import (
ITVIE,
@@ -635,6 +681,7 @@ from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE
from .kelbyone import KelbyOneIE
from .ketnet import KetnetIE
from .khanacademy import (
KhanAcademyIE,
@@ -702,7 +749,6 @@ from .limelight import (
LimelightChannelListIE,
)
from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
@@ -719,7 +765,10 @@ from .livestream import (
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .lnkgo import (
LnkGoIE,
LnkIE,
)
from .localnews8 import LocalNews8IE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
@@ -734,6 +783,7 @@ from .mailru import (
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .mainstreaming import MainStreamingIE
from .malltv import MallTVIE
from .mangomolo import (
MangomoloVideoIE,
@@ -799,7 +849,10 @@ from .mirrativ import (
)
from .mit import TechTVMITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixch import MixchIE
from .mixch import (
MixchIE,
MixchArchiveIE,
)
from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
@@ -838,6 +891,12 @@ from .mtv import (
)
from .muenchentv import MuenchenTVIE
from .musescore import MuseScoreIE
from .musicdex import (
MusicdexSongIE,
MusicdexAlbumIE,
MusicdexArtistIE,
MusicdexPlaylistIE,
)
from .mwave import MwaveIE, MwaveMeetGreetIE
from .mxplayer import (
MxplayerIE,
@@ -856,6 +915,10 @@ from .n1 import (
N1InfoAssetIE,
N1InfoIIE,
)
from .nate import (
NateIE,
NateProgramIE,
)
from .nationalgeographic import (
NationalGeographicVideoIE,
NationalGeographicTVIE,
@@ -910,6 +973,7 @@ from .newgrounds import (
NewgroundsUserIE,
)
from .newstube import NewstubeIE
from .newsy import NewsyIE
from .nextmedia import (
NextMediaIE,
NextMediaActionNewsIE,
@@ -957,6 +1021,7 @@ from .nitter import NitterIE
from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE
from .nonktube import NonkTubeIE
from .noodlemagazine import NoodleMagazineIE
from .noovo import NoovoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
@@ -1025,9 +1090,14 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .opencast import (
OpencastIE,
OpencastPlaylistIE,
)
from .openrec import (
OpenRecIE,
OpenRecCaptureIE,
OpenRecMovieIE,
)
from .ora import OraTVIE
from .orf import (
@@ -1098,6 +1168,10 @@ from .pinterest import (
PinterestIE,
PinterestCollectionIE,
)
from .pixivsketch import (
PixivSketchIE,
PixivSketchUserIE,
)
from .pladform import PladformIE
from .planetmarathi import PlanetMarathiIE
from .platzi import (
@@ -1121,6 +1195,10 @@ from .pokemon import (
PokemonIE,
PokemonWatchIE,
)
from .pokergo import (
PokerGoIE,
PokerGoCollectionIE,
)
from .polsatgo import PolsatGoIE
from .polskieradio import (
PolskieRadioIE,
@@ -1146,6 +1224,7 @@ from .pornhub import (
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE
from .pornez import PornezIE
from .puhutv import (
PuhuTVIE,
PuhuTVSerieIE,
@@ -1153,6 +1232,13 @@ from .puhutv import (
from .presstv import PressTVIE
from .projectveritas import ProjectVeritasIE
from .prosiebensat1 import ProSiebenSat1IE
from .prx import (
PRXStoryIE,
PRXSeriesIE,
PRXAccountIE,
PRXStoriesSearchIE,
PRXSeriesSearchIE
)
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qqmusic import (
@@ -1189,9 +1275,10 @@ from .rai import (
RaiPlayIE,
RaiPlayLiveIE,
RaiPlayPlaylistIE,
RaiPlaySoundIE,
RaiPlaySoundLiveIE,
RaiPlaySoundPlaylistIE,
RaiIE,
RaiPlayRadioIE,
RaiPlayRadioPlaylistIE,
)
from .raywenderlich import (
RayWenderlichIE,
@@ -1246,13 +1333,26 @@ from .rtl2 import (
RTL2YouIE,
RTL2YouSeriesIE,
)
from .rtnews import (
RTNewsIE,
RTDocumentryIE,
RTDocumentryPlaylistIE,
RuptlyIE,
)
from .rtp import RTPIE
from .rtrfm import RTRFMIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtve import (
RTVEALaCartaIE,
RTVEAudioIE,
RTVELiveIE,
RTVEInfantilIE,
RTVETelevisionIE,
)
from .rtvnh import RTVNHIE
from .rtvs import RTVSIE
from .ruhd import RUHDIE
from .rule34video import Rule34VideoIE
from .rumble import (
RumbleEmbedIE,
RumbleChannelIE,
@@ -1264,6 +1364,15 @@ from .rutube import (
RutubeMovieIE,
RutubePersonIE,
RutubePlaylistIE,
RutubeTagsIE,
)
from .glomex import (
GlomexIE,
GlomexEmbedIE,
)
from .megatvcom import (
MegaTVComIE,
MegaTVComEmbedIE,
)
from .rutv import RUTVIE
from .ruutu import RuutuIE
@@ -1315,6 +1424,7 @@ from .simplecast import (
)
from .sina import SinaIE
from .sixplay import SixPlayIE
from .skeb import SkebIE
from .skyit import (
SkyItPlayerIE,
SkyItVideoIE,
@@ -1350,6 +1460,7 @@ from .soundcloud import (
SoundcloudEmbedIE,
SoundcloudIE,
SoundcloudSetIE,
SoundcloudRelatedIE,
SoundcloudUserIE,
SoundcloudTrackStationIE,
SoundcloudPlaylistIE,
@@ -1451,7 +1562,12 @@ from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .teamtreehouse import TeamTreeHouseIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .ted import (
TedEmbedIE,
TedPlaylistIE,
TedSeriesIE,
TedTalkIE,
)
from .tele5 import Tele5IE
from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE
@@ -1497,6 +1613,9 @@ from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
TikTokUserIE,
TikTokSoundIE,
TikTokEffectIE,
TikTokTagIE,
DouyinIE,
)
from .tinypic import TinyPicIE
@@ -1511,6 +1630,9 @@ from .toggle import (
ToggleIE,
MeWatchIE,
)
from .toggo import (
ToggoIE,
)
from .tokentube import (
TokentubeIE,
TokentubeChannelIE
@@ -1527,6 +1649,7 @@ from .trovo import (
TrovoChannelVodIE,
TrovoChannelClipIE,
)
from .trueid import TrueIDIE
from .trunews import TruNewsIE
from .trutv import TruTVIE
from .tube8 import Tube8IE
@@ -1590,6 +1713,10 @@ from .tvnow import (
TVNowAnnualIE,
TVNowShowIE,
)
from .tvopengr import (
TVOpenGrWatchIE,
TVOpenGrEmbedIE,
)
from .tvp import (
TVPEmbedIE,
TVPIE,
@@ -1643,6 +1770,7 @@ from .dlive import (
DLiveVODIE,
DLiveStreamIE,
)
from .drooble import DroobleIE
from .umg import UMGDeIE
from .unistra import UnistraIE
from .unity import UnityIE
@@ -1717,6 +1845,10 @@ from .vimeo import (
VimeoWatchLaterIE,
VHXEmbedIE,
)
from .vimm import (
VimmIE,
VimmRecordingIE,
)
from .vimple import VimpleIE
from .vine import (
VineIE,
@@ -1863,6 +1995,7 @@ from .yandexmusic import (
)
from .yandexvideo import (
YandexVideoIE,
YandexVideoPreviewIE,
ZenYandexIE,
ZenYandexChannelIE,
)
@@ -1889,11 +2022,13 @@ from .youtube import (
YoutubeFavouritesIE,
YoutubeHistoryIE,
YoutubeTabIE,
YoutubeLivestreamEmbedIE,
YoutubePlaylistIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubeMusicSearchURLIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,

yt_dlp/extractor/facebook.py

@@ -13,21 +13,25 @@ from ..compat import (
)
from ..utils import (
clean_html,
determine_ext,
error_to_compat_str,
ExtractorError,
float_or_none,
get_element_by_id,
int_or_none,
js_to_json,
limit_length,
merge_dicts,
network_exceptions,
parse_count,
parse_qs,
qualities,
sanitized_Request,
traverse_obj,
try_get,
url_or_none,
urlencode_postdata,
urljoin,
variadic,
)
@@ -161,7 +165,7 @@ class FacebookIE(InfoExtractor):
'info_dict': {
'id': '1417995061575415',
'ext': 'mp4',
'title': 'Yaroslav Korpan - Довгоочікуване відео',
'title': 'Ukrainian Scientists Worldwide | Довгоочікуване відео',
'description': 'Довгоочікуване відео',
'timestamp': 1486648771,
'upload_date': '20170209',
@@ -192,8 +196,8 @@ class FacebookIE(InfoExtractor):
'info_dict': {
'id': '202882990186699',
'ext': 'mp4',
'title': 'Elisabeth Ahtn - Hello? Yes your uber ride is here\n* Jukin...',
'description': 'Hello? Yes your uber ride is here\n* Jukin Media Verified *\nFind this video and others like it by visiting...',
'title': 'birb (O v O") | Hello? Yes your uber ride is here',
'description': 'Hello? Yes your uber ride is here * Jukin Media Verified * Find this video and others like it by visiting...',
'timestamp': 1486035513,
'upload_date': '20170202',
'uploader': 'Elisabeth Ahtn',
@@ -395,28 +399,31 @@ class FacebookIE(InfoExtractor):
url.replace('://m.facebook.com/', '://www.facebook.com/'), video_id)
def extract_metadata(webpage):
video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage,
'title', default=None)
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', default=None)
if not video_title:
video_title = self._html_search_meta(
['og:title', 'twitter:title', 'description'],
webpage, 'title', default=None)
if video_title:
video_title = limit_length(video_title, 80)
else:
video_title = 'Facebook video #%s' % video_id
description = self._html_search_meta(
post_data = [self._parse_json(j, video_id, fatal=False) for j in re.findall(
r'handleWithCustomApplyEach\(\s*ScheduledApplyEach\s*,\s*(\{.+?\})\s*\);', webpage)]
post = traverse_obj(post_data, (
..., 'require', ..., ..., ..., '__bbox', 'result', 'data'), expected_type=dict) or []
media = [m for m in traverse_obj(post, (..., 'attachments', ..., 'media'), expected_type=dict) or []
if str(m.get('id')) == video_id and m.get('__typename') == 'Video']
title = traverse_obj(media, (..., 'title', 'text'), get_all=False)
description = traverse_obj(media, (
..., 'creation_story', 'comet_sections', 'message', 'story', 'message', 'text'), get_all=False)
uploader_data = (traverse_obj(media, (..., 'owner'), get_all=False)
or traverse_obj(post, (..., 'node', 'actors', ...), get_all=False) or {})
page_title = title or self._html_search_regex((
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>(?P<content>[^<]*)</h2>',
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(?P<content>.*?)</span>',
self._meta_regex('og:title'), self._meta_regex('twitter:title'), r'<title>(?P<content>.+?)</title>'
), webpage, 'title', default=None, group='content')
description = description or self._html_search_meta(
['description', 'og:description', 'twitter:description'],
webpage, 'description', default=None)
uploader = clean_html(get_element_by_id(
'fbPhotoPageAuthorName', webpage)) or self._search_regex(
r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader',
default=None) or self._og_search_title(webpage, fatal=False)
uploader = uploader_data.get('name') or (
clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
or self._search_regex(
(r'ownerName\s*:\s*"([^"]+)"', *self._og_regexes('title')), webpage, 'uploader', fatal=False))
timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None))
@@ -431,17 +438,17 @@ class FacebookIE(InfoExtractor):
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',
default=None))
info_dict = {
'title': video_title,
'description': description,
'uploader': uploader,
'uploader_id': uploader_data.get('id'),
'timestamp': timestamp,
'thumbnail': thumbnail,
'view_count': view_count,
}
info_json_ld = self._search_json_ld(webpage, video_id, default={})
if info_json_ld.get('title'):
info_json_ld['title'] = limit_length(
re.sub(r'\s*\|\s*Facebook$', '', info_json_ld['title']), 80)
info_json_ld['title'] = (re.sub(r'\s*\|\s*Facebook$', '', title or info_json_ld.get('title') or page_title or '')
or (description or '').replace('\n', ' ') or f'Facebook video #{video_id}')
return merge_dicts(info_json_ld, info_dict)
video_data = None
@@ -508,15 +515,19 @@ class FacebookIE(InfoExtractor):
def parse_graphql_video(video):
formats = []
q = qualities(['sd', 'hd'])
for (suffix, format_id) in [('', 'sd'), ('_quality_hd', 'hd')]:
playable_url = video.get('playable_url' + suffix)
for key, format_id in (('playable_url', 'sd'), ('playable_url_quality_hd', 'hd'),
('playable_url_dash', '')):
playable_url = video.get(key)
if not playable_url:
continue
formats.append({
'format_id': format_id,
'quality': q(format_id),
'url': playable_url,
})
if determine_ext(playable_url) == 'mpd':
formats.extend(self._extract_mpd_formats(playable_url, video_id))
else:
formats.append({
'format_id': format_id,
'quality': q(format_id),
'url': playable_url,
})
extract_dash_manifest(video, formats)
process_formats(formats)
v_id = video.get('videoId') or video.get('id') or video_id
@@ -544,22 +555,15 @@ class FacebookIE(InfoExtractor):
if media.get('__typename') == 'Video':
return parse_graphql_video(media)
nodes = data.get('nodes') or []
node = data.get('node') or {}
if not nodes and node:
nodes.append(node)
for node in nodes:
story = try_get(node, lambda x: x['comet_sections']['content']['story'], dict) or {}
attachments = try_get(story, [
lambda x: x['attached_story']['attachments'],
lambda x: x['attachments']
], list) or []
for attachment in attachments:
attachment = try_get(attachment, lambda x: x['style_type_renderer']['attachment'], dict)
ns = try_get(attachment, lambda x: x['all_subattachments']['nodes'], list) or []
for n in ns:
parse_attachment(n)
parse_attachment(attachment)
nodes = variadic(traverse_obj(data, 'nodes', 'node') or [])
attachments = traverse_obj(nodes, (
..., 'comet_sections', 'content', 'story', (None, 'attached_story'), 'attachments',
..., ('styles', 'style_type_renderer'), 'attachment'), expected_type=dict) or []
for attachment in attachments:
ns = try_get(attachment, lambda x: x['all_subattachments']['nodes'], list) or []
for n in ns:
parse_attachment(n)
parse_attachment(attachment)
edges = try_get(data, lambda x: x['mediaset']['currMedia']['edges'], list) or []
for edge in edges:
@@ -728,6 +732,7 @@ class FacebookPluginsVideoIE(InfoExtractor):
'info_dict': {
'id': '10154383743583686',
'ext': 'mp4',
# TODO: Fix title, uploader
'title': 'What to do during the haze?',
'uploader': 'Gov.sg',
'upload_date': '20160826',
@@ -746,3 +751,42 @@ class FacebookPluginsVideoIE(InfoExtractor):
return self.url_result(
compat_urllib_parse_unquote(self._match_id(url)),
FacebookIE.ie_key())
class FacebookRedirectURLIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'https?://(?:[\w-]+\.)?facebook\.com/flx/warn[/?]'
_TESTS = [{
'url': 'https://www.facebook.com/flx/warn/?h=TAQHsoToz&u=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DpO8h3EaFRdo&s=1',
'info_dict': {
'id': 'pO8h3EaFRdo',
'ext': 'mp4',
'title': 'Tripeo Boiler Room x Dekmantel Festival DJ Set',
'description': 'md5:2d713ccbb45b686a1888397b2c77ca6b',
'channel_id': 'UCGBpxWJr9FNOcFYA5GkKrMg',
'playable_in_embed': True,
'categories': ['Music'],
'channel': 'Boiler Room',
'uploader_id': 'brtvofficial',
'uploader': 'Boiler Room',
'tags': 'count:11',
'duration': 3332,
'live_status': 'not_live',
'thumbnail': 'https://i.ytimg.com/vi/pO8h3EaFRdo/maxresdefault.jpg',
'channel_url': 'https://www.youtube.com/channel/UCGBpxWJr9FNOcFYA5GkKrMg',
'availability': 'public',
'uploader_url': 'http://www.youtube.com/user/brtvofficial',
'upload_date': '20150917',
'age_limit': 0,
'view_count': int,
'like_count': int,
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
redirect_url = url_or_none(parse_qs(url).get('u', [None])[-1])
if not redirect_url:
raise ExtractorError('Invalid facebook redirect URL', expected=True)
return self.url_result(redirect_url)
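
The rewritten attachment walk in the Facebook diff leans on traverse_obj branching: `...` fans out over every element, a tuple tries each alternative at that step, and None keeps the current object (so both the story and any attached_story are visited). A self-contained illustration with made-up sample data:

from yt_dlp.utils import traverse_obj

nodes = [{'comet_sections': {'content': {'story': {
    'attachments': [{'styles': {'attachment': {'media': {'id': '123'}}}}],
}}}}]

attachments = traverse_obj(nodes, (
    ..., 'comet_sections', 'content', 'story', (None, 'attached_story'),
    'attachments', ..., ('styles', 'style_type_renderer'), 'attachment'),
    expected_type=dict)
print(attachments)  # [{'media': {'id': '123'}}]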

yt_dlp/extractor/fancode.py

@@ -41,7 +41,7 @@ class FancodeVodIE(InfoExtractor):
_ACCESS_TOKEN = None
_NETRC_MACHINE = 'fancode'
_LOGIN_HINT = 'Use "--user refresh --password <refresh_token>" to login using a refresh token'
_LOGIN_HINT = 'Use "--username refresh --password <refresh_token>" to login using a refresh token'
headers = {
'content-type': 'application/json',

Some files were not shown because too many files have changed in this diff.