mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-11 17:31:31 +00:00

Compare commits


56 Commits

Author SHA1 Message Date
pukkandan
245524e6a3 Release 2021.07.21
and fix some typos
Closes #538
2021-07-22 02:33:28 +05:30
pukkandan
9c0d7f4951 [youtube] Make --extractor-retries work for more errors
Closes #507
2021-07-22 02:32:20 +05:30
pukkandan
e37d0efbd9 Fix bug where original_url was not propagated when _type=url 2021-07-22 02:32:19 +05:30
coletdjnz
c926c9541f [youtube] Add debug message for SAPISID cookie extraction (#540)
Authored by: colethedj
2021-07-21 20:45:05 +00:00
Matt Broadway
982ee69a74 Add option --cookies-from-browser to load cookies from a browser (#488)
* also adds `--no-cookies-from-browser`

Original PR: https://github.com/ytdl-org/youtube-dl/pull/29201
Authored by: mbway
2021-07-22 02:02:49 +05:30
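(A minimal sketch of driving the new option from the Python API; the tuple shape follows the `cookiesfrombrowser` docstring added to `YoutubeDL.py` further down this page, and the URL is the test video used in the issue templates.)

```python
# Hedged sketch: load cookies straight from a browser profile.
# `cookiesfrombrowser` is (browser_name,) or (browser_name, profile_name_or_path),
# e.g. ('chrome',) or ('vivaldi', 'default'), per the YoutubeDL docstring below.
import yt_dlp

ydl_opts = {'cookiesfrombrowser': ('chrome',)}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```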
pukkandan
7ea6541124 [youtube] Improve extraction of livestream metadata
Modified from and closes #441
Authored by: pukkandan, krichbanana
2021-07-21 20:50:59 +05:30
pukkandan
ae30b84072 Add field live_status 2021-07-21 20:50:58 +05:30
pukkandan
cc9d1493c6 bugfix for 50fed816dd 2021-07-21 20:50:49 +05:30
Philip Xu
f6755419d1 [douyin] Add extractor (#513)
Authored-by: pukkandan, pyx
2021-07-21 20:49:27 +05:30
Henrik Heimbuerger
145bd631c5 [nebula] Authentication via tokens from cookie jar (#537)
Closes #496
Co-authored-by: hheimbuerger, TpmKranz
2021-07-21 18:12:43 +05:30
pukkandan
b35496d825 Add only_once param for write_debug 2021-07-21 18:06:34 +05:30
pukkandan
352d63fdb5 [utils] Improve traverse_obj 2021-07-21 11:30:06 +05:30
pukkandan
11f9be0912 [youtube] Extract data from multiple clients (#536)
* `player_client` accepts multiple clients
* default `player_client` = `android,web`
* music clients can be specifically requested
* Add IOS `player_client`
* Hide live dash since they can't be downloaded

Closes #501

Authored-by: pukkandan, colethedj
2021-07-21 09:22:34 +05:30
pukkandan
c84aeac6b5 Add only_once param for report_warning
Related: https://github.com/yt-dlp/yt-dlp/pull/488#discussion_r667527297
2021-07-21 01:39:58 +05:30
pukkandan
50fed816dd Errors in playlist extraction should obey --ignore-errors
Related: https://github.com/yt-dlp/yt-dlp/issues/535#issuecomment-883277272, https://github.com/yt-dlp/yt-dlp/issues/518#issuecomment-881794754
2021-07-21 01:04:53 +05:30
coletdjnz
a1a7907bc0 [youtube] Fix controversial videos when requested via API (#533)
Closes: https://github.com/yt-dlp/yt-dlp/issues/511#issuecomment-883024350
Authored by: colethedj
2021-07-20 23:31:28 +05:30
pukkandan
d61fc64618 [youtube:tab] Fix channels tab 2021-07-20 23:22:34 +05:30
pukkandan
6586bca9b9 [utils] Fix LazyList for Falsey values 2021-07-20 23:22:26 +05:30
pukkandan
da503b7a52 [youtube] Make parse_time_text and _extract_chapters non-fatal
Related: #532, 7c365c2109
2021-07-20 07:22:26 +05:30
pukkandan
7c365c2109 [youtube] Sanity check chapters (and refactor related code)
Closes #520
2021-07-20 05:39:02 +05:30
pukkandan
3f698246b2 Rename NOTE in -F to MORE INFO
since it's often confused to be the same as `format_note`
2021-07-20 05:30:28 +05:30
pukkandan
cca80fe611 [youtube] Extract even more thumbnails and reduce testing
* Also fix bug where `_test_url` was being ignored

Ref: https://stackoverflow.com/a/20542029
Related: #340
2021-07-20 03:46:06 +05:30
pukkandan
c634ad2a3c [compat] Remove unnecessary code 2021-07-20 03:46:05 +05:30
pukkandan
8f3343809e [utils] Improve traverse_obj
* Allow skipping a level: `traverse_obj([{k:v1}, {k:v2}], (None, k))` => `[v1, v2]`
* Make keys variadic: `traverse_obj(obj, k1: str, k2: str)` => `traverse_obj(obj, (k1,), (k2,))`
* Fetch from multiple keys: `traverse_obj([{k1:[1], k2:[2], k3:[3]}], (0, (k1, k2), 0))` => `[1, 2]`

TODO: Add tests
2021-07-20 02:42:11 +05:30
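(The semantics above can be illustrated with a simplified stand-in; this is not yt-dlp's implementation, only a sketch of the three behaviours the commit message describes.)

```python
# Simplified stand-in (NOT yt-dlp's code): None skips a level, paths are
# variadic, and a tuple of keys fetches from several keys at once.
def traverse(obj, *paths):
    def _traverse(obj, path):
        for i, key in enumerate(path):
            if obj is None:
                return None
            if key is None:  # skip a level: map the rest of the path over all values
                values = obj.values() if isinstance(obj, dict) else obj
                results = [_traverse(v, path[i + 1:]) for v in values]
                return [r for r in results if r is not None]
            if isinstance(key, tuple):  # fetch from multiple keys
                results = [_traverse(obj, (k,) + path[i + 1:]) for k in key]
                return [r for r in results if r is not None]
            try:
                obj = obj[key]
            except (KeyError, IndexError, TypeError):
                return None
        return obj
    for path in paths:  # variadic: the first path that yields a value wins
        result = _traverse(obj, path if isinstance(path, tuple) else (path,))
        if result is not None:
            return result

assert traverse([{'k': 'v1'}, {'k': 'v2'}], (None, 'k')) == ['v1', 'v2']
assert traverse([{'k1': [1], 'k2': [2], 'k3': [3]}], (0, ('k1', 'k2'), 0)) == [1, 2]
assert traverse({'k2': 'v2'}, 'k1', 'k2') == 'v2'
```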
pukkandan
0ba692acc8 [youtube] Extract more thumbnails
* The thumbnail URLs are hard-coded and their actual existence is tested lazily
* Added option `--no-check-formats` to not test them

Closes #340, Related: #402, #337, https://github.com/ytdl-org/youtube-dl/issues/29049
2021-07-20 02:42:11 +05:30
pukkandan
d9488f69c1 [crunchyroll:playlist] Force http
Closes #495
2021-07-20 02:42:11 +05:30
pukkandan
dce8743677 [docs] fix default of multistreams 2021-07-19 23:47:57 +05:30
pukkandan
5520aa2dc9 Add option --exec-before-download
Closes #530
2021-07-19 23:47:45 +05:30
mzbaulhaque
8d9b902243 [pornflip] Add new extractor (#523)
Authored-by: mzbaulhaque
2021-07-19 23:46:21 +05:30
coletdjnz
fe93e2c4cf [youtube] misc cleanup and bug fixes (#505)
* Update some `_extract_response` calls to keep them consistent
* Cleanup continuation extraction related code using new API format
* Improve `_extract_account_syncid` to support multiple parameters
* Generalize `get_text` and related functions into one
* Update `INNERTUBE_CONTEXT_CLIENT_NAME` with integer values

Authored by: colethedj
2021-07-19 10:25:07 +05:30
coletdjnz
314ee30548 [youtube] Fix session index extraction and headers for non-web player clients (#526)
Fixes #522
2021-07-18 06:23:32 +00:00
coletdjnz
34917076ad [youtube] Fix authentication when using multiple accounts
`SESSION_INDEX` in `ytcfg` is the index of the active account and should be sent as `X-Goog-AuthUser` header

Closes #518
Authored by @colethedj
2021-07-17 11:50:05 +05:30
The Hatsune Daishi
ccc7795ca3 [yahoo:gyao:player] Relax _VALID_URL (#503)
Authored by: nao20010128nao
2021-07-16 20:06:53 +05:30
Felix S
da1c94ee45 [generic] Extract previously missed subtitles (#515)
* [generic] Extract subtitles in cases missed previously
* [common] Detect discarded subtitles in SMIL manifests
* [generic] Extract everything in the SMIL manifest

Authored by: fstirlitz
2021-07-16 19:52:56 +05:30
pukkandan
3b297919e0 Revert "Merge webm formats into mkv if thumbnails are to be embedded (#173)"
This reverts commit 4d971a16b8 by @damianoamatruda
Closes #500

This was wrongly checking for `write_thumbnail`
2021-07-15 23:34:52 +05:30
coletdjnz
47193e0298 [youtube:tab] Extract playlist availability (#504)
Authored by: colethedj
2021-07-15 02:42:30 +00:00
coletdjnz
49bd8c66d3 [youtube:comments] Improve comment vote count parsing (fixes #506) (#508)
Authored by: colethedj
2021-07-14 23:24:42 +00:00
Felix S
182b6ae8a6 [RTP] Fix extraction and add subtitles (#497)
Authored by: fstirlitz
2021-07-14 05:06:18 +05:30
felix
c843e68588 [utils] Improve js_to_json comment regex
Capture the newline character as part of a single-line comment

From #497, Authored by: fstirlitz
2021-07-14 05:02:43 +05:30
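(A hedged sketch of the behaviour: once the newline is consumed as part of a single-line comment, the test case added to the utils tests further down this page parses cleanly. The pattern here is simplified, not the actual yt-dlp regex.)

```python
# Simplified illustration: strip JS comments, capturing the trailing newline
# of a // comment so the surrounding JSON stays well-formed.
import json
import re

COMMENT_RE = r'/\*(?:(?!\*/).)*?\*/|//[^\n]*\n'

def strip_js_comments(code):
    return re.sub(COMMENT_RE, '\n', code, flags=re.DOTALL)

assert json.loads(strip_js_comments('[1,//{},\n2]')) == [1, 2]
```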
felix
198f7ea89e [extractor] Allow extracting multiple groups in _search_regex
From #497, Authored by: fstirlitz
2021-07-14 05:02:42 +05:30
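(Sketch with an assumed, simplified signature: passing a tuple of group indices returns a tuple of captures instead of a single string.)

```python
# Simplified stand-in for the multi-group behaviour described above.
import re

def search_regex(pattern, string, group):
    mobj = re.search(pattern, string)
    if mobj is None:
        return None
    if isinstance(group, (list, tuple)):
        return tuple(mobj.group(g) for g in group)
    return mobj.group(group)

assert search_regex(r'(\d+)x(\d+)', 'resolution: 640x360', (1, 2)) == ('640', '360')
```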
coletdjnz
c888ffb95a [youtube] Use android client as default and add age-gate bypass for it (#492)
Authored by: colethedj
2021-07-14 03:58:51 +05:30
coletdjnz
9752433221 [youtube:comments] Fix is_favorited (#491)
Authored by colethedj
2021-07-12 06:50:03 +05:30
pukkandan
f0ff9979c6 [vlive] Extract thumbnail directly in addition to the one from Naver
Closes #477
2021-07-12 06:07:23 +05:30
pukkandan
501dd1ad55 [metadatafromfield] Do not detect numbers as field names
Related: https://github.com/yt-dlp/yt-dlp/issues/486#issuecomment-877820394
2021-07-12 05:20:12 +05:30
pukkandan
75722b037d [webtt] Fix timestamps
Closes #474
2021-07-12 05:20:12 +05:30
coletdjnz
2d6659b9ea [youtube:comments] Move comment extraction to new API (#466)
Closes #438, #481, #485 

Authored by: colethedj
2021-07-12 04:48:40 +05:30
Kevin O'Connor
c5370857b3 [BravoTV] Improve metadata extraction (#483)
Authored by: kevinoconnor7
2021-07-11 16:36:26 +05:30
pukkandan
00034c146a [embedthumbnail] Fix _get_thumbnail_resolution 2021-07-11 04:46:53 +05:30
pukkandan
325ebc1703 Improve traverse_obj 2021-07-11 04:46:53 +05:30
pukkandan
7dde84f3c9 [FFmpegMetadata] Add language of each stream
and some refactoring
2021-07-11 04:46:52 +05:30
pukkandan
6606817a86 [utils] Add variadic 2021-07-11 04:46:51 +05:30
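(A sketch of what `variadic` plausibly does, based on how it is used to let callers always iterate over an argument; the exact signature is an assumption.)

```python
# Assumed form: wrap scalars in a tuple, pass real iterables through unchanged.
from collections.abc import Iterable

def variadic(x, allowed_types=(str, bytes)):
    return x if isinstance(x, Iterable) and not isinstance(x, allowed_types) else (x,)

assert variadic('spam') == ('spam',)
assert variadic(['spam', 'eggs']) == ['spam', 'eggs']
```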
zackmark29
73d829c144 [VIKI] Rewrite extractors (#475)
Closes #462
Also added extractor-arg `video_types` to `vikichannel`

Co-authored-by: zackmark29, pukkandan
2021-07-10 02:08:09 +05:30
pukkandan
60bdb7bd9e [youtube] Fix sorting of 3gp format 2021-07-08 22:33:33 +05:30
pukkandan
4bb6b02f93 Improve extractor_args parsing 2021-07-08 21:22:35 +05:30
pukkandan
b5ac45b197 Fix selectors all, mergeall and add tests
Bug from: 981052c9c6
2021-07-07 21:10:43 +05:30
pukkandan
38a40c9e16 [version] update
:ci skip all
2021-07-07 05:43:58 +05:30
42 changed files with 2874 additions and 4234 deletions


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've verified that I'm running yt-dlp version **2021.07.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.06.23
[debug] yt-dlp version 2021.07.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've verified that I'm running yt-dlp version **2021.07.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've verified that I'm running yt-dlp version **2021.07.07**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've verified that I'm running yt-dlp version **2021.07.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.06.23
[debug] yt-dlp version 2021.07.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've verified that I'm running yt-dlp version **2021.07.07**
- [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -11,8 +11,8 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install nose
run: pip install nose
- name: Install test requirements
run: pip install nose pycryptodome
- name: Run tests
env:
YTDL_TEST_SET: core

.gitignore

@@ -33,6 +33,7 @@ cookies.txt
*.info.json
*.live_chat.json
*.jpg
*.jpeg
*.png
*.webp
*.annotations.xml


@@ -58,3 +58,8 @@ krichbanana
ohmybahgosh
nyuszika7h
blackjack4494
pyx
TpmKranz
mzbaulhaque
zackmark29
mbway


@@ -18,13 +18,76 @@
-->
### 2021.07.21
* **Add option `--cookies-from-browser`** to load cookies from a browser by [mbway](https://github.com/mbway)
* Usage: `--cookies-from-browser BROWSER[:PROFILE_NAME_OR_PATH]`
* Also added `--no-cookies-from-browser`
* To decrypt chromium cookies, `keyring` is needed for UNIX and `pycryptodome` for Windows
* Add option `--exec-before-download`
* Add field `live_status`
* [FFmpegMetadata] Add language of each stream and some refactoring
* [douyin] Add extractor by [pukkandan](https://github.com/pukkandan), [pyx](https://github.com/pyx)
* [pornflip] Add extractor by [mzbaulhaque](https://github.com/mzbaulhaque)
* **[youtube] Extract data from multiple clients** by [pukkandan](https://github.com/pukkandan), [colethedj](https://github.com/colethedj)
* `player_client` now accepts multiple clients
* Default `player_client` = `android,web`
* This uses twice as many requests, but avoids throttling for most videos while also not losing any formats
* Music clients can be specifically requested and is enabled by default if `music.youtube.com`
* Added `player_client=ios` (Known issue: formats from ios are not sorted correctly)
* Add age-gate bypass for android and ios clients
* [youtube] Extract more thumbnails
* The thumbnail URLs are hard-coded and their actual existence is tested lazily
* Added option `--no-check-formats` to not test them
* [youtube] Misc fixes
* Improve extraction of livestream metadata by [pukkandan](https://github.com/pukkandan), [krichbanana](https://github.com/krichbanana)
* Hide live dash formats since they can't be downloaded anyway
* Fix authentication when using multiple accounts by [colethedj](https://github.com/colethedj)
* Fix controversial videos when requested via API by [colethedj](https://github.com/colethedj)
* Fix session index extraction and headers for non-web player clients by [colethedj](https://github.com/colethedj)
* Make `--extractor-retries` work for more errors
* Fix sorting of 3gp format
* Sanity check `chapters` (and refactor related code)
* Make `parse_time_text` and `_extract_chapters` non-fatal
* Misc cleanup and bug fixes by [colethedj](https://github.com/colethedj)
* [youtube:tab] Fix channels tab
* [youtube:tab] Extract playlist availability by [colethedj](https://github.com/colethedj)
* **[youtube:comments] Move comment extraction to new API** by [colethedj](https://github.com/colethedj)
* [youtube:comments] Fix `is_favorited`, improve `like_count` parsing by [colethedj](https://github.com/colethedj)
* [BravoTV] Improve metadata extraction by [kevinoconnor7](https://github.com/kevinoconnor7)
* [crunchyroll:playlist] Force http
* [yahoo:gyao:player] Relax `_VALID_URL` by [nao20010128nao](https://github.com/nao20010128nao)
* [nebula] Authentication via tokens from cookie jar by [hheimbuerger](https://github.com/hheimbuerger), [TpmKranz](https://github.com/TpmKranz)
* [RTP] Fix extraction and add subtitles by [fstirlitz](https://github.com/fstirlitz)
* [viki] Rewrite extractors and add extractor-arg `video_types` to `vikichannel` by [zackmark29](https://github.com/zackmark29), [pukkandan](https://github.com/pukkandan)
* [vlive] Extract thumbnail directly in addition to the one from Naver
* [generic] Extract previously missed subtitles by [fstirlitz](https://github.com/fstirlitz)
* [generic] Extract everything in the SMIL manifest and detect discarded subtitles by [fstirlitz](https://github.com/fstirlitz)
* [embedthumbnail] Fix `_get_thumbnail_resolution`
* [metadatafromfield] Do not detect numbers as field names
* Fix selectors `all`, `mergeall` and add tests
* Errors in playlist extraction should obey `--ignore-errors`
* Fix bug where `original_url` was not propagated when `_type`=`url`
* Revert "Merge webm formats into mkv if thumbnails are to be embedded (#173)"
* This was wrongly checking for `write_thumbnail`
* Improve `extractor_args` parsing
* Rename `NOTE` in `-F` to `MORE INFO` since it's often confused to be the same as `format_note`
* Add `only_once` param for `write_debug` and `report_warning`
* [extractor] Allow extracting multiple groups in `_search_regex` by [fstirlitz](https://github.com/fstirlitz)
* [utils] Improve `traverse_obj`
* [utils] Add `variadic`
* [utils] Improve `js_to_json` comment regex by [fstirlitz](https://github.com/fstirlitz)
* [webtt] Fix timestamps
* [compat] Remove unnecessary code
* [doc] fix default of multistreams
### 2021.07.07
* Merge youtube-dl: Upto [commit/a803582](https://github.com/ytdl-org/youtube-dl/commit/a8035827177d6b59aca03bd717acb6a9bdd75ada)
* Add `--extractor-args` to pass extractor-specific arguments
* Add `--extractor-args` to pass some extractor-specific arguments. See [readme](https://github.com/yt-dlp/yt-dlp#extractor-arguments)
* Add extractor option `skip` for `youtube`. Eg: `--extractor-args youtube:skip=hls,dash`
* Deprecates --youtube-skip-dash-manifest, --youtube-skip-hls-manifest, --youtube-include-dash-manifest, --youtube-include-hls-manifest
* Deprecates `--youtube-skip-dash-manifest`, `--youtube-skip-hls-manifest`, `--youtube-include-dash-manifest`, `--youtube-include-hls-manifest`
* Allow `--list...` options to work with `--print`, `--quiet` and other `--list...` options
* [youtube] Use `player` API for additional video extraction requests by [colethedj](https://github.com/colethedj)
* **Fixes youtube premium music** (format 141) extraction


@@ -75,19 +75,22 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
* All Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`) supports downloading multiple pages of content
* Search (`ytsearch:`, `ytsearchdate:`), search URLs and in-channel search works
* Mixes supports downloading multiple pages of content
* Partial workarounds for age-gate and throttling issues
* Redirect channel's home URL automatically to `/video` to preserve the old behaviour
* `255kbps` audio is extracted from youtube music if premium cookies are given
* Youtube music Albums, channels etc can be downloaded
* **Cookies from browser**: Cookies can be automatically extracted from all major web browsers using `--cookies-from-browser BROWSER[:PROFILE]`
* **Split video by chapters**: Videos can be split into multiple files based on chapters using `--split-chapters`
* **Multi-threaded fragment downloads**: Download multiple fragments of m3u8/mpd videos in parallel. Use `--concurrent-fragments` (`-N`) option to set the number of threads used
* **Aria2c with HLS/DASH**: You can use `aria2c` as the external downloader for DASH(mpd) and HLS(m3u8) formats
* **New extractors**: AnimeLab, Philo MSO, Spectrum MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv, niconico users, discoveryplus.in, mediathek, NFHSNetwork, nebula, ukcolumn, whowatch, MxplayerShow, parlview (au), YoutubeWebArchive, fancode, Saitosan, ShemarooMe, telemundo, VootSeries, SonyLIVSeries, HotstarSeries, VidioPremier, VidioLive, RCTIPlus, TBS Live
* **New extractors**: AnimeLab, Philo MSO, Spectrum MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv, niconico users, discoveryplus.in, mediathek, NFHSNetwork, nebula, ukcolumn, whowatch, MxplayerShow, parlview (au), YoutubeWebArchive, fancode, Saitosan, ShemarooMe, telemundo, VootSeries, SonyLIVSeries, HotstarSeries, VidioPremier, VidioLive, RCTIPlus, TBS Live, douyin, pornflip
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, akamai, ina, rumble, tennistv, amcnetworks, la7 podcasts, linuxacadamy, nitter, twitcasting, viu, crackle, curiositystream, mediasite, rmcdecouverte, sonyliv, tubi, tenplay, patreon, videa, yahoo
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, akamai, ina, rumble, tennistv, amcnetworks, la7 podcasts, linuxacadamy, nitter, twitcasting, viu, crackle, curiositystream, mediasite, rmcdecouverte, sonyliv, tubi, tenplay, patreon, videa, yahoo, BravoTV, crunchyroll playlist, RTP, viki
* **Subtitle extraction from manifests**: Subtitles can be extracted from streaming media manifests. See [commit/be6202f](https://github.com/yt-dlp/yt-dlp/commit/be6202f12b97858b9d716e608394b51065d0419f) for details
@@ -185,6 +188,7 @@ While all the other dependancies are optional, `ffmpeg` and `ffprobe` are highly
* [**mutagen**](https://github.com/quodlibet/mutagen) - For embedding thumbnail in certain formats. Licenced under [GPLv2+](https://github.com/quodlibet/mutagen/blob/master/COPYING)
* [**pycryptodome**](https://github.com/Legrandin/pycryptodome) - For decrypting various data. Licenced under [BSD2](https://github.com/Legrandin/pycryptodome/blob/master/LICENSE.rst)
* [**websockets**](https://github.com/aaugustin/websockets) - For downloading over websocket. Licenced under [BSD3](https://github.com/aaugustin/websockets/blob/main/LICENSE)
* [**keyring**](https://github.com/jaraco/keyring) - For decrypting cookies of chromium-based browsers on Linux. Licenced under [MIT](https://github.com/jaraco/keyring/blob/main/LICENSE)
* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen is not present. Licenced under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
* [**rtmpdump**](http://rtmpdump.mplayerhq.hu) - For downloading `rtmp` streams. ffmpeg will be used as a fallback. Licenced under [GPLv2+](http://rtmpdump.mplayerhq.hu)
* [**mplayer**](http://mplayerhq.hu/design7/info.html) or [**mpv**](https://mpv.io) - For downloading `rstp` streams. ffmpeg will be used as a fallback. Licenced under [GPLv2+](https://github.com/mpv-player/mpv/blob/master/Copyright)
@@ -520,7 +524,19 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
option)
--cookies FILE File to read cookies from and dump cookie
jar in
--no-cookies Do not read/dump cookies (default)
--no-cookies Do not read/dump cookies from/to file
(default)
--cookies-from-browser BROWSER[:PROFILE]
Load cookies from a user profile of the
given web browser. Currently supported
browsers are: brave|chrome|chromium|edge|fi
refox|opera|safari|vivaldi. You can specify
the user profile name or directory using
"BROWSER:PROFILE_NAME" or
"BROWSER:PROFILE_PATH". If no profile is
given, the most recently accessed one is
used
--no-cookies-from-browser Do not load cookies from browser (default)
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information (such
as client ids and signatures) permanently.
@@ -638,7 +654,9 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-prefer-free-formats Don't give any special preference to free
containers (default)
--check-formats Check that the formats selected are
actually downloadable (Experimental)
actually downloadable
--no-check-formats Do not check that the formats selected are
actually downloadable
-F, --list-formats List all available formats of requested
videos
--merge-output-format FORMAT If a merge is required (e.g.
@@ -773,6 +791,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
downloaded file is also available. If no
fields are passed, "%(filepath)s" is
appended to the end of the command
--exec-before-download CMD Execute a command before the actual
download. The syntax is the same as --exec
--convert-subs FORMAT Convert the subtitles to another format
(currently supported: srt|vtt|ass|lrc)
(Alias: --convert-subtitles)
@@ -822,7 +842,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--extractor-args KEY:ARGS Pass these arguments to the extractor. See
"EXTRACTOR ARGUMENTS" for details. You can
use this option multiple times to give
different arguments to different extractors
arguments for different extractors
# CONFIGURATION
@@ -937,6 +957,7 @@ The available fields are:
- `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video (For some extractors, comments are only downloaded at the end, and so this field cannot be used)
- `age_limit` (numeric): Age restriction for the video (years)
- `live_status` (string): One of 'is_live', 'was_live', 'upcoming', 'not_live'
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `was_live` (boolean): Whether this video was originally a live stream
- `playable_in_embed` (string): Whether this video is allowed to play in embedded players on other sites
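The interaction between `live_status` and the boolean flags is spelled out in the `YoutubeDL.py` hunk later on this page; distilled into a standalone helper:

```python
# Distilled from the YoutubeDL.py hunk below: derive `live_status` from
# `is_live`/`was_live` when absent, then back-fill the booleans from it.
def fill_live_status(info_dict):
    live_keys = ('is_live', 'was_live')
    live_status = info_dict.get('live_status')
    if live_status is None:
        for key in live_keys:
            if info_dict.get(key):
                live_status = key
                break
        if all(info_dict.get(key) is False for key in live_keys):
            live_status = 'not_live'
    if live_status:
        info_dict['live_status'] = live_status
    for key in live_keys:
        if info_dict.get(key) is None:
            info_dict[key] = (live_status == key)
    return info_dict

assert fill_live_status({'is_live': True})['live_status'] == 'is_live'
assert fill_live_status({'is_live': False, 'was_live': False})['live_status'] == 'not_live'
```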
@@ -1100,7 +1121,7 @@ If you want to download multiple videos and they don't have the same formats ava
If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can merge the video and audio of multiple formats into a single file using `-f <format1>+<format2>+...` (requires ffmpeg installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg. If `--no-video-multistreams` is used, all formats with a video stream except the first one are ignored. Similarly, if `--no-audio-multistreams` is used, all formats with an audio stream except the first one are ignored. For example, `-f bestvideo+best+bestaudio` will download and merge all 3 given formats. The resulting file will have 2 video streams and 2 audio streams. But `-f bestvideo+best+bestaudio --no-video-multistreams` will download and merge only `bestvideo` and `bestaudio`. `best` is ignored since another format containing a video stream (`bestvideo`) has already been selected. The order of the formats is therefore important. `-f best+bestaudio --no-audio-multistreams` will download and merge both formats while `-f bestaudio+best --no-audio-multistreams` will ignore `best` and download only `bestaudio`.
You can merge the video and audio of multiple formats into a single file using `-f <format1>+<format2>+...` (requires ffmpeg installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg. Unless `--video-multistreams` is used, all formats with a video stream except the first one are ignored. Similarly, unless `--audio-multistreams` is used, all formats with an audio stream except the first one are ignored. For example, `-f bestvideo+best+bestaudio --video-multistreams --audio-multistreams` will download and merge all 3 given formats. The resulting file will have 2 video streams and 2 audio streams. But `-f bestvideo+best+bestaudio --no-video-multistreams` will download and merge only `bestvideo` and `bestaudio`. `best` is ignored since another format containing a video stream (`bestvideo`) has already been selected. The order of the formats is therefore important. `-f best+bestaudio --no-audio-multistreams` will download and merge both formats while `-f bestaudio+best --no-audio-multistreams` will ignore `best` and download only `bestaudio`.
## Filtering Formats
@@ -1333,13 +1354,19 @@ Some extractors accept additional arguments which can be passed using `--extract
The following extractors use this feature:
* **youtube**
* `skip`: `hls` or `dash` (or both) to skip download of the respective manifests
* `player_client`: `web` (default) or `android` (force use the android client fallbacks for video extraction)
* `player_skip`: `configs` - skip requests if applicable for client configs and use defaults
* `player_client`: Clients to extract video data from - one or more of `web`, `android`, `ios`, `web_music`, `android_music`, `ios_music`. By default, `android,web` is used. If the URL is from `music.youtube.com`, `android,web,android_music,web_music` is used
* `player_skip`: `configs` - skip any requests for client configs and use defaults
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side).
* `max_comments`: maximum amount of comments to download (default all).
* `max_comment_depth`: maximum depth for nested comments. YouTube supports depths 1 or 2 (default).
* **funimation**
* `language`: Languages to extract. Eg: `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
* **vikiChannel**
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
NOTE: These options may be changed/removed in the future without concern for backward compatibility
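For example, the multi-client selection above can also be requested through the Python API; the dict shape below is an assumption mirroring how `--extractor-args "youtube:player_client=android,web"` would be parsed.

```python
# Hedged sketch: extractor-specific arguments via the Python API.
# Assumed shape: {ie_key: {arg_name: [values]}}.
import yt_dlp

ydl_opts = {
    'extractor_args': {'youtube': {'player_client': ['android', 'web']}},
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=BaW_jenozKc', download=False)
```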


@@ -259,6 +259,7 @@
- **dlive:vod**
- **DoodStream**
- **Dotsub**
- **Douyin**
- **DouyuShow**
- **DouyuTV**: 斗鱼
- **DPlay**
@@ -306,6 +307,7 @@
- **EyedoTV**
- **facebook**
- **FacebookPluginsVideo**
- **fancode:live**
- **fancode:vod**
- **faz.net**
- **fc2**
@@ -343,6 +345,8 @@
- **FrontendMastersLesson**
- **FujiTVFODPlus7**
- **Funimation**
- **funimation:page**
- **funimation:show**
- **Funk**
- **Fusion**
- **Fux**
@@ -766,6 +770,7 @@
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **PornFlip**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPagedVideoList**
@@ -808,6 +813,8 @@
- **RCS**
- **RCSEmbeds**
- **RCSVarious**
- **RCTIPlus**
- **RCTIPlusSeries**
- **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**


@@ -35,6 +35,9 @@ class YDL(FakeYDL):
def to_screen(self, msg):
self.msgs.append(msg)
def dl(self, *args, **kwargs):
assert False, 'Downloader must not be invoked for test_YoutubeDL'
def _make_result(formats, **kwargs):
res = {
@@ -117,35 +120,24 @@ class TestFormatSelection(unittest.TestCase):
]
info_dict = _make_result(formats)
ydl = YDL({'format': '20/47'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '47')
def test(inp, *expected, multi=False):
ydl = YDL({
'format': inp,
'allow_multiple_video_streams': multi,
'allow_multiple_audio_streams': multi,
})
ydl.process_ie_result(info_dict.copy())
downloaded = map(lambda x: x['format_id'], ydl.downloaded_info_dicts)
self.assertEqual(list(downloaded), list(expected))
ydl = YDL({'format': '20/71/worst'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '35')
ydl = YDL()
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '2')
ydl = YDL({'format': 'webm/mp4'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '47')
ydl = YDL({'format': '3gp/40/mp4'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '35')
ydl = YDL({'format': 'example-with-dashes'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'example-with-dashes')
test('20/47', '47')
test('20/71/worst', '35')
test(None, '2')
test('webm/mp4', '47')
test('3gp/40/mp4', '35')
test('example-with-dashes', 'example-with-dashes')
test('all', '35', 'example-with-dashes', '45', '47', '2') # Order doesn't actually matter for this
test('mergeall', '2+47+45+example-with-dashes+35', multi=True)
def test_format_selection_audio(self):
formats = [

test/test_cookies.py (new file)

@@ -0,0 +1,96 @@
import unittest
from datetime import datetime, timezone
from yt_dlp import cookies
from yt_dlp.cookies import (
CRYPTO_AVAILABLE,
LinuxChromeCookieDecryptor,
MacChromeCookieDecryptor,
WindowsChromeCookieDecryptor,
YDLLogger,
parse_safari_cookies,
pbkdf2_sha1,
)
class MonkeyPatch:
def __init__(self, module, temporary_values):
self._module = module
self._temporary_values = temporary_values
self._backup_values = {}
def __enter__(self):
for name, temp_value in self._temporary_values.items():
self._backup_values[name] = getattr(self._module, name)
setattr(self._module, name, temp_value)
def __exit__(self, exc_type, exc_val, exc_tb):
for name, backup_value in self._backup_values.items():
setattr(self._module, name, backup_value)
class TestCookies(unittest.TestCase):
def test_chrome_cookie_decryptor_linux_derive_key(self):
key = LinuxChromeCookieDecryptor.derive_key(b'abc')
self.assertEqual(key, b'7\xa1\xec\xd4m\xfcA\xc7\xb19Z\xd0\x19\xdcM\x17')
def test_chrome_cookie_decryptor_mac_derive_key(self):
key = MacChromeCookieDecryptor.derive_key(b'abc')
self.assertEqual(key, b'Y\xe2\xc0\xd0P\xf6\xf4\xe1l\xc1\x8cQ\xcb|\xcdY')
def test_chrome_cookie_decryptor_linux_v10(self):
with MonkeyPatch(cookies, {'_get_linux_keyring_password': lambda *args, **kwargs: b''}):
encrypted_value = b'v10\xccW%\xcd\xe6\xe6\x9fM" \xa7\xb0\xca\xe4\x07\xd6'
value = 'USD'
decryptor = LinuxChromeCookieDecryptor('Chrome', YDLLogger())
self.assertEqual(decryptor.decrypt(encrypted_value), value)
def test_chrome_cookie_decryptor_linux_v11(self):
with MonkeyPatch(cookies, {'_get_linux_keyring_password': lambda *args, **kwargs: b'',
'KEYRING_AVAILABLE': True}):
encrypted_value = b'v11#\x81\x10>`w\x8f)\xc0\xb2\xc1\r\xf4\x1al\xdd\x93\xfd\xf8\xf8N\xf2\xa9\x83\xf1\xe9o\x0elVQd'
value = 'tz=Europe.London'
decryptor = LinuxChromeCookieDecryptor('Chrome', YDLLogger())
self.assertEqual(decryptor.decrypt(encrypted_value), value)
@unittest.skipIf(not CRYPTO_AVAILABLE, 'cryptography library not available')
def test_chrome_cookie_decryptor_windows_v10(self):
with MonkeyPatch(cookies, {
'_get_windows_v10_key': lambda *args, **kwargs: b'Y\xef\xad\xad\xeerp\xf0Y\xe6\x9b\x12\xc2<z\x16]\n\xbb\xb8\xcb\xd7\x9bA\xc3\x14e\x99{\xd6\xf4&'
}):
encrypted_value = b'v10T\xb8\xf3\xb8\x01\xa7TtcV\xfc\x88\xb8\xb8\xef\x05\xb5\xfd\x18\xc90\x009\xab\xb1\x893\x85)\x87\xe1\xa9-\xa3\xad='
value = '32101439'
decryptor = WindowsChromeCookieDecryptor('', YDLLogger())
self.assertEqual(decryptor.decrypt(encrypted_value), value)
def test_chrome_cookie_decryptor_mac_v10(self):
with MonkeyPatch(cookies, {'_get_mac_keyring_password': lambda *args, **kwargs: b'6eIDUdtKAacvlHwBVwvg/Q=='}):
encrypted_value = b'v10\xb3\xbe\xad\xa1[\x9fC\xa1\x98\xe0\x9a\x01\xd9\xcf\xbfc'
value = '2021-06-01-22'
decryptor = MacChromeCookieDecryptor('', YDLLogger())
self.assertEqual(decryptor.decrypt(encrypted_value), value)
def test_safari_cookie_parsing(self):
cookies = \
b'cook\x00\x00\x00\x01\x00\x00\x00i\x00\x00\x01\x00\x01\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00Y' \
b'\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x008\x00\x00\x00B\x00\x00\x00F\x00\x00\x00H' \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x03\xa5>\xc3A\x00\x00\x80\xc3\x07:\xc3A' \
b'localhost\x00foo\x00/\x00test%20%3Bcookie\x00\x00\x00\x054\x07\x17 \x05\x00\x00\x00Kbplist00\xd1\x01' \
b'\x02_\x10\x18NSHTTPCookieAcceptPolicy\x10\x02\x08\x0b&\x00\x00\x00\x00\x00\x00\x01\x01\x00\x00\x00' \
b'\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00('
jar = parse_safari_cookies(cookies)
self.assertEqual(len(jar), 1)
cookie = list(jar)[0]
self.assertEqual(cookie.domain, 'localhost')
self.assertEqual(cookie.port, None)
self.assertEqual(cookie.path, '/')
self.assertEqual(cookie.name, 'foo')
self.assertEqual(cookie.value, 'test%20%3Bcookie')
self.assertFalse(cookie.secure)
expected_expiration = datetime(2021, 6, 18, 21, 39, 19, tzinfo=timezone.utc)
self.assertEqual(cookie.expires, int(expected_expiration.timestamp()))
def test_pbkdf2_sha1(self):
key = pbkdf2_sha1(b'peanuts', b' ' * 16, 1, 16)
self.assertEqual(key, b'g\xe1\x8e\x0fQ\x1c\x9b\xf3\xc9`!\xaa\x90\xd9\xd34')
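(Presumably, and consistent with the `pbkdf2_hmac` import in the new `yt_dlp/cookies.py` below, `pbkdf2_sha1` is a thin wrapper over the standard library. The Linux `v10` decryption exercised above follows Chromium's cookie-encryption scheme; the sketch below is illustrative, not yt-dlp's code.)

```python
# Hedged sketch of Chromium's 'v10' cookie scheme on Linux: AES-128-CBC with a
# key from PBKDF2-SHA1(password, b'saltysalt', 1 iteration) and an IV of
# sixteen spaces; the b'v10' version prefix is stripped first.
# Assumes pycryptodome is installed.
from hashlib import pbkdf2_hmac
from Crypto.Cipher import AES

def decrypt_v10_linux(encrypted_value, password=b'peanuts'):
    key = pbkdf2_hmac('sha1', password, b'saltysalt', 1, 16)
    cipher = AES.new(key, AES.MODE_CBC, iv=b' ' * 16)
    plaintext = cipher.decrypt(encrypted_value[len(b'v10'):])
    return plaintext[:-plaintext[-1]].decode()  # strip PKCS#7 padding
```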


@@ -1054,6 +1054,9 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{ "040": "040" }')
self.assertEqual(json.loads(on), {'040': '040'})
on = js_to_json('[1,//{},\n2]')
self.assertEqual(json.loads(on), [1, 2])
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')


@@ -31,7 +31,6 @@ from zipimport import zipimporter
from .compat import (
compat_basestring,
compat_cookiejar,
compat_get_terminal_size,
compat_kwargs,
compat_numeric_types,
@@ -42,6 +41,7 @@ from .compat import (
compat_urllib_request,
compat_urllib_request_DataHandler,
)
from .cookies import load_cookies
from .utils import (
age_restricted,
args_to_str,
@@ -110,7 +110,6 @@ from .utils import (
version_tuple,
write_json_file,
write_string,
YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
YoutubeDLRedirectHandler,
@@ -209,6 +208,9 @@ class YoutubeDL(object):
into a single file
allow_multiple_audio_streams: Allow multiple audio streams to be merged
into a single file
check_formats Whether to test if the formats are downloadable.
Can be True (check all), False (check none)
or None (check only if requested by extractor)
paths: Dictionary of output paths. The allowed keys are 'home'
'temp' and the keys of OUTTMPL_TYPES (in utils.py)
outtmpl: Dictionary of templates for output names. Allowed keys
@@ -253,7 +255,7 @@ class YoutubeDL(object):
writedesktoplink: Write a Linux internet shortcut file (.desktop)
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Deprecated - Use subtitlelangs = ['all']
allsubtitles: Deprecated - Use subtitleslangs = ['all']
Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
@@ -287,6 +289,9 @@ class YoutubeDL(object):
break_on_reject: Stop the download process when encountering a video that
has been filtered out.
cookiefile: File name where cookies should be read from and dumped to
cookiesfrombrowser: A tuple containing the name of the browser and the profile
name/path from where cookies are loaded.
Eg: ('chrome', ) or (vivaldi, 'default')
nocheckcertificate:Do not verify SSL certificates
prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube.
@@ -447,7 +452,7 @@ class YoutubeDL(object):
params = None
_ies = []
_pps = {'pre_process': [], 'before_dl': [], 'after_move': [], 'post_process': []}
__prepare_filename_warned = False
_printed_messages = set()
_first_webpage_request = True
_download_retcode = None
_num_downloads = None
@@ -462,7 +467,7 @@ class YoutubeDL(object):
self._ies = []
self._ies_instances = {}
self._pps = {'pre_process': [], 'before_dl': [], 'after_move': [], 'post_process': []}
self.__prepare_filename_warned = False
self._printed_messages = set()
self._first_webpage_request = True
self._post_hooks = []
self._progress_hooks = []
@@ -657,8 +662,12 @@ class YoutubeDL(object):
for _ in range(line_count))
return res[:-len('\n')]
def _write_string(self, s, out=None):
write_string(s, out=out, encoding=self.params.get('encoding'))
def _write_string(self, message, out=None, only_once=False):
if only_once:
if message in self._printed_messages:
return
self._printed_messages.add(message)
write_string(message, out=out, encoding=self.params.get('encoding'))
def to_stdout(self, message, skip_eol=False, quiet=False):
"""Print message to stdout"""
@@ -669,13 +678,13 @@ class YoutubeDL(object):
'%s%s' % (self._bidi_workaround(message), ('' if skip_eol else '\n')),
self._err_file if quiet else self._screen_file)
def to_stderr(self, message):
def to_stderr(self, message, only_once=False):
"""Print message to stderr"""
assert isinstance(message, compat_str)
if self.params.get('logger'):
self.params['logger'].error(message)
else:
self._write_string('%s\n' % self._bidi_workaround(message), self._err_file)
self._write_string('%s\n' % self._bidi_workaround(message), self._err_file, only_once=only_once)
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
@@ -752,7 +761,7 @@ class YoutubeDL(object):
self.to_stdout(
message, skip_eol, quiet=self.params.get('quiet', False))
def report_warning(self, message):
def report_warning(self, message, only_once=False):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
@@ -767,7 +776,7 @@ class YoutubeDL(object):
else:
_msg_header = 'WARNING:'
warning_message = '%s %s' % (_msg_header, message)
self.to_stderr(warning_message)
self.to_stderr(warning_message, only_once)
def report_error(self, message, tb=None):
'''
@@ -781,7 +790,7 @@ class YoutubeDL(object):
error_message = '%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
def write_debug(self, message):
def write_debug(self, message, only_once=False):
'''Log debug message or Print message to stderr'''
if not self.params.get('verbose', False):
return
@@ -789,7 +798,7 @@ class YoutubeDL(object):
if self.params.get('logger'):
self.params['logger'].debug(message)
else:
self._write_string('%s\n' % message)
self.to_stderr(message, only_once)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
@@ -1014,13 +1023,13 @@ class YoutubeDL(object):
filename = self._prepare_filename(info_dict, dir_type or 'default')
if warn and not self.__prepare_filename_warned:
if warn:
if not self.params.get('paths'):
pass
elif filename == '-':
self.report_warning('--paths is ignored when an outputting to stdout')
self.report_warning('--paths is ignored when an outputting to stdout', only_once=True)
elif os.path.isabs(filename):
self.report_warning('--paths is ignored since an absolute path is given in output template')
self.report_warning('--paths is ignored since an absolute path is given in output template', only_once=True)
self.__prepare_filename_warned = True
if filename == '-' or not filename:
return filename
@@ -1136,7 +1145,7 @@ class YoutubeDL(object):
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def __handle_extraction_exceptions(func):
def __handle_extraction_exceptions(func, handle_all_errors=True):
def wrapper(self, *args, **kwargs):
try:
return func(self, *args, **kwargs)
@@ -1156,7 +1165,7 @@ class YoutubeDL(object):
except (MaxDownloadsReached, ExistingVideoReached, RejectedVideoReached):
raise
except Exception as e:
if self.params.get('ignoreerrors', False):
if handle_all_errors and self.params.get('ignoreerrors', False):
self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
else:
raise
@@ -1173,6 +1182,8 @@ class YoutubeDL(object):
'_type': 'compat_list',
'entries': ie_result,
}
if extra_info.get('original_url'):
ie_result.setdefault('original_url', extra_info['original_url'])
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
@@ -1204,6 +1215,9 @@ class YoutubeDL(object):
if result_type in ('url', 'url_transparent'):
ie_result['url'] = sanitize_url(ie_result['url'])
if ie_result.get('original_url'):
extra_info.setdefault('original_url', ie_result['original_url'])
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info)
or extract_flat is True):
@@ -1360,13 +1374,19 @@ class YoutubeDL(object):
if not isinstance(ie_entries, (list, PagedList)):
ie_entries = LazyList(ie_entries)
def get_entry(i):
return YoutubeDL.__handle_extraction_exceptions(
lambda self, i: ie_entries[i - 1],
False
)(self, i)
entries = []
for i in playlistitems or itertools.count(playliststart):
if playlistitems is None and playlistend is not None and playlistend < i:
break
entry = None
try:
entry = ie_entries[i - 1]
entry = get_entry(i)
if entry is None:
raise EntryNotInPlaylist()
except (IndexError, EntryNotInPlaylist):
@@ -1758,6 +1778,7 @@ class YoutubeDL(object):
def _check_formats(formats):
if not check_formats:
yield from formats
return
for f in formats:
self.to_screen('[info] Testing format %s' % f['format_id'])
temp_file = tempfile.NamedTemporaryFile(
@@ -1943,15 +1964,27 @@ class YoutubeDL(object):
t.get('id') if t.get('id') is not None else '',
t.get('url')))
def test_thumbnail(t):
self.to_screen('[info] Testing thumbnail %s' % t['id'])
try:
self.urlopen(HEADRequest(t['url']))
except network_exceptions as err:
self.to_screen('[info] Unable to connect to thumbnail %s URL "%s" - %s. Skipping...' % (
t['id'], t['url'], error_to_compat_str(err)))
return False
return True
def thumbnail_tester():
if self.params.get('check_formats'):
test_all = True
to_screen = lambda msg: self.to_screen(f'[info] {msg}')
else:
test_all = False
to_screen = self.write_debug
def test_thumbnail(t):
if not test_all and not t.get('_test_url'):
return True
to_screen('Testing thumbnail %s' % t['id'])
try:
self.urlopen(HEADRequest(t['url']))
except network_exceptions as err:
to_screen('Unable to connect to thumbnail %s URL "%s" - %s. Skipping...' % (
t['id'], t['url'], error_to_compat_str(err)))
return False
return True
return test_thumbnail
for i, t in enumerate(thumbnails):
if t.get('id') is None:
@@ -1959,8 +1992,11 @@ class YoutubeDL(object):
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
t['url'] = sanitize_url(t['url'])
if self.params.get('check_formats'):
info_dict['thumbnails'] = LazyList(filter(test_thumbnail, thumbnails[::-1])).reverse()
if self.params.get('check_formats') is not False:
info_dict['thumbnails'] = LazyList(filter(thumbnail_tester(), thumbnails[::-1])).reverse()
else:
info_dict['thumbnails'] = thumbnails
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
@@ -2007,7 +2043,7 @@ class YoutubeDL(object):
elif thumbnails:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
if info_dict.get('display_id') is None and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
for ts_key, date_key in (
@@ -2023,6 +2059,23 @@ class YoutubeDL(object):
except (ValueError, OverflowError, OSError):
pass
live_keys = ('is_live', 'was_live')
live_status = info_dict.get('live_status')
if live_status is None:
for key in live_keys:
if info_dict.get(key) is False:
continue
if info_dict.get(key):
live_status = key
break
if all(info_dict.get(key) is False for key in live_keys):
live_status = 'not_live'
if live_status:
info_dict['live_status'] = live_status
for key in live_keys:
if info_dict.get(key) is None:
info_dict[key] = (live_status == key)
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
for field in ('chapter', 'season', 'episode'):
@@ -2589,17 +2642,10 @@ class YoutubeDL(object):
requested_formats = info_dict['requested_formats']
old_ext = info_dict['ext']
if self.params.get('merge_output_format') is None:
if not compatible_formats(requested_formats):
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
if (info_dict['ext'] == 'webm'
and self.params.get('writethumbnail', False)
and info_dict.get('thumbnails')):
info_dict['ext'] = 'mkv'
self.report_warning(
'webm doesn\'t support embedding a thumbnail, mkv will be used.')
if self.params.get('merge_output_format') is None and not compatible_formats(requested_formats):
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
def correct_ext(filename):
filename_real_ext = os.path.splitext(filename)[1][1:]
@@ -2995,17 +3041,6 @@ class YoutubeDL(object):
res += '~' + format_bytes(fdict['filesize_approx'])
return res
def _format_note_table(self, f):
def join_fields(*vargs):
return ', '.join((val for val in vargs if val != ''))
return join_fields(
'UNSUPPORTED' if f.get('ext') in ('f4f', 'f4m') else '',
format_field(f, 'language', '[%s]'),
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
format_field(f, 'asr', '%5dHz'))
def list_formats(self, info_dict):
formats = info_dict.get('formats', [info_dict])
new_format = (
@@ -3028,11 +3063,15 @@ class YoutubeDL(object):
format_field(f, 'acodec', default='unknown').replace('none', ''),
format_field(f, 'abr', '%3dk'),
format_field(f, 'asr', '%5dHz'),
self._format_note_table(f)]
for f in formats
if f.get('preference') is None or f['preference'] >= -1000]
', '.join(filter(None, (
'UNSUPPORTED' if f.get('ext') in ('f4f', 'f4m') else '',
format_field(f, 'language', '[%s]'),
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
format_field(f, 'asr', '%5dHz')))),
] for f in formats if f.get('preference') is None or f['preference'] >= -1000]
header_line = ['ID', 'EXT', 'RESOLUTION', 'FPS', '|', ' FILESIZE', ' TBR', 'PROTO',
'|', 'VCODEC', ' VBR', 'ACODEC', ' ABR', ' ASR', 'NOTE']
'|', 'VCODEC', ' VBR', 'ACODEC', ' ABR', ' ASR', 'MORE INFO']
else:
table = [
[
@@ -3179,16 +3218,11 @@ class YoutubeDL(object):
timeout_val = self.params.get('socket_timeout')
self._socket_timeout = 600 if timeout_val is None else float(timeout_val)
opts_cookiesfrombrowser = self.params.get('cookiesfrombrowser')
opts_cookiefile = self.params.get('cookiefile')
opts_proxy = self.params.get('proxy')
if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar()
else:
opts_cookiefile = expand_path(opts_cookiefile)
self.cookiejar = YoutubeDLCookieJar(opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK):
self.cookiejar.load(ignore_discard=True, ignore_expires=True)
self.cookiejar = load_cookies(opts_cookiefile, opts_cookiesfrombrowser, self)
cookie_processor = YoutubeDLCookieProcessor(self.cookiejar)
if opts_proxy is not None:


@@ -20,6 +20,7 @@ from .compat import (
compat_getpass,
workaround_optparse_bug9161,
)
from .cookies import SUPPORTED_BROWSERS
from .utils import (
DateRange,
decodeOption,
@@ -242,6 +243,12 @@ def _real_main(argv=None):
if opts.convertthumbnails not in FFmpegThumbnailsConvertorPP.SUPPORTED_EXTS:
parser.error('invalid thumbnail format specified')
if opts.cookiesfrombrowser is not None:
opts.cookiesfrombrowser = [
part.strip() or None for part in opts.cookiesfrombrowser.split(':', 1)]
if opts.cookiesfrombrowser[0] not in SUPPORTED_BROWSERS:
parser.error('unsupported browser specified for cookies')
if opts.date is not None:
date = DateRange.day(opts.date)
else:
@@ -415,6 +422,13 @@ def _real_main(argv=None):
# Run this before the actual video download
'when': 'before_dl'
})
# Must be after all other before_dl
if opts.exec_before_dl_cmd:
postprocessors.append({
'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_before_dl_cmd,
'when': 'before_dl'
})
if opts.extractaudio:
postprocessors.append({
'key': 'FFmpegExtractAudio',
@@ -621,6 +635,7 @@ def _real_main(argv=None):
'break_on_reject': opts.break_on_reject,
'skip_playlist_after_errors': opts.skip_playlist_after_errors,
'cookiefile': opts.cookiefile,
'cookiesfrombrowser': opts.cookiesfrombrowser,
'nocheckcertificate': opts.no_check_certificate,
'prefer_insecure': opts.prefer_insecure,
'proxy': opts.proxy,

File diff suppressed because it is too large

yt_dlp/cookies.py (new file)

@@ -0,0 +1,730 @@
import ctypes
import json
import os
import shutil
import sqlite3
import struct
import subprocess
import sys
import tempfile
from datetime import datetime, timedelta, timezone
from hashlib import pbkdf2_hmac
from yt_dlp.aes import aes_cbc_decrypt
from yt_dlp.compat import (
compat_b64decode,
compat_cookiejar_Cookie,
)
from yt_dlp.utils import (
bytes_to_intlist,
expand_path,
intlist_to_bytes,
process_communicate_or_kill,
YoutubeDLCookieJar,
)
try:
from Crypto.Cipher import AES
CRYPTO_AVAILABLE = True
except ImportError:
CRYPTO_AVAILABLE = False
try:
import keyring
KEYRING_AVAILABLE = True
except ImportError:
KEYRING_AVAILABLE = False
CHROMIUM_BASED_BROWSERS = {'brave', 'chrome', 'chromium', 'edge', 'opera', 'vivaldi'}
SUPPORTED_BROWSERS = CHROMIUM_BASED_BROWSERS | {'firefox', 'safari'}
class YDLLogger:
def __init__(self, ydl=None):
self._ydl = ydl
def debug(self, message):
if self._ydl:
self._ydl.write_debug(message)
def info(self, message):
if self._ydl:
self._ydl.to_screen(f'[Cookies] {message}')
def warning(self, message, only_once=False):
if self._ydl:
self._ydl.report_warning(message, only_once)
def error(self, message):
if self._ydl:
self._ydl.report_error(message)
def load_cookies(cookie_file, browser_specification, ydl):
cookie_jars = []
if browser_specification is not None:
browser_name, profile = _parse_browser_specification(*browser_specification)
cookie_jars.append(extract_cookies_from_browser(browser_name, profile, YDLLogger(ydl)))
if cookie_file is not None:
cookie_file = expand_path(cookie_file)
jar = YoutubeDLCookieJar(cookie_file)
if os.access(cookie_file, os.R_OK):
jar.load(ignore_discard=True, ignore_expires=True)
cookie_jars.append(jar)
return _merge_cookie_jars(cookie_jars)
def extract_cookies_from_browser(browser_name, profile=None, logger=YDLLogger()):
if browser_name == 'firefox':
return _extract_firefox_cookies(profile, logger)
elif browser_name == 'safari':
return _extract_safari_cookies(profile, logger)
elif browser_name in CHROMIUM_BASED_BROWSERS:
return _extract_chrome_cookies(browser_name, profile, logger)
else:
raise ValueError('unknown browser: {}'.format(browser_name))
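A hedged usage sketch of the dispatcher above when driven outside yt-dlp; the browser choice is an assumption:
    from yt_dlp.cookies import extract_cookies_from_browser
    jar = extract_cookies_from_browser('chrome')  # default profile, default YDLLogger
    print('%d cookies' % len(jar))  # jar is a YoutubeDLCookieJar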
def _extract_firefox_cookies(profile, logger):
logger.info('Extracting cookies from firefox')
if profile is None:
search_root = _firefox_browser_dir()
elif _is_path(profile):
search_root = profile
else:
search_root = os.path.join(_firefox_browser_dir(), profile)
cookie_database_path = _find_most_recently_used_file(search_root, 'cookies.sqlite')
if cookie_database_path is None:
raise FileNotFoundError('could not find firefox cookies database in {}'.format(search_root))
logger.debug('extracting from: "{}"'.format(cookie_database_path))
with tempfile.TemporaryDirectory(prefix='youtube_dl') as tmpdir:
cursor = None
try:
cursor = _open_database_copy(cookie_database_path, tmpdir)
cursor.execute('SELECT host, name, value, path, expiry, isSecure FROM moz_cookies')
jar = YoutubeDLCookieJar()
for host, name, value, path, expiry, is_secure in cursor.fetchall():
cookie = compat_cookiejar_Cookie(
version=0, name=name, value=value, port=None, port_specified=False,
domain=host, domain_specified=bool(host), domain_initial_dot=host.startswith('.'),
path=path, path_specified=bool(path), secure=is_secure, expires=expiry, discard=False,
comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)
logger.info('Extracted {} cookies from firefox'.format(len(jar)))
return jar
finally:
if cursor is not None:
cursor.connection.close()
def _firefox_browser_dir():
if sys.platform in ('linux', 'linux2'):
return os.path.expanduser('~/.mozilla/firefox')
elif sys.platform == 'win32':
return os.path.expandvars(r'%APPDATA%\Mozilla\Firefox\Profiles')
elif sys.platform == 'darwin':
return os.path.expanduser('~/Library/Application Support/Firefox')
else:
raise ValueError('unsupported platform: {}'.format(sys.platform))
def _get_chromium_based_browser_settings(browser_name):
# https://chromium.googlesource.com/chromium/src/+/HEAD/docs/user_data_dir.md
if sys.platform in ('linux', 'linux2'):
config = _config_home()
browser_dir = {
'brave': os.path.join(config, 'BraveSoftware/Brave-Browser'),
'chrome': os.path.join(config, 'google-chrome'),
'chromium': os.path.join(config, 'chromium'),
'edge': os.path.join(config, 'microsoft-edge'),
'opera': os.path.join(config, 'opera'),
'vivaldi': os.path.join(config, 'vivaldi'),
}[browser_name]
elif sys.platform == 'win32':
appdata_local = os.path.expandvars('%LOCALAPPDATA%')
appdata_roaming = os.path.expandvars('%APPDATA%')
browser_dir = {
'brave': os.path.join(appdata_local, r'BraveSoftware\Brave-Browser\User Data'),
'chrome': os.path.join(appdata_local, r'Google\Chrome\User Data'),
'chromium': os.path.join(appdata_local, r'Chromium\User Data'),
'edge': os.path.join(appdata_local, r'Microsoft\Edge\User Data'),
'opera': os.path.join(appdata_roaming, r'Opera Software\Opera Stable'),
'vivaldi': os.path.join(appdata_local, r'Vivaldi\User Data'),
}[browser_name]
elif sys.platform == 'darwin':
appdata = os.path.expanduser('~/Library/Application Support')
browser_dir = {
'brave': os.path.join(appdata, 'BraveSoftware/Brave-Browser'),
'chrome': os.path.join(appdata, 'Google/Chrome'),
'chromium': os.path.join(appdata, 'Chromium'),
'edge': os.path.join(appdata, 'Microsoft Edge'),
'opera': os.path.join(appdata, 'com.operasoftware.Opera'),
'vivaldi': os.path.join(appdata, 'Vivaldi'),
}[browser_name]
else:
raise ValueError('unsupported platform: {}'.format(sys.platform))
# Linux keyring names can be determined by snooping on dbus while opening the browser in KDE:
# dbus-monitor "interface='org.kde.KWallet'" "type=method_return"
keyring_name = {
'brave': 'Brave',
'chrome': 'Chrome',
'chromium': 'Chromium',
'edge': 'Microsoft Edge' if sys.platform == 'darwin' else 'Chromium',
'opera': 'Opera' if sys.platform == 'darwin' else 'Chromium',
'vivaldi': 'Vivaldi' if sys.platform == 'darwin' else 'Chrome',
}[browser_name]
browsers_without_profiles = {'opera'}
return {
'browser_dir': browser_dir,
'keyring_name': keyring_name,
'supports_profiles': browser_name not in browsers_without_profiles
}
def _extract_chrome_cookies(browser_name, profile, logger):
logger.info('Extracting cookies from {}'.format(browser_name))
config = _get_chromium_based_browser_settings(browser_name)
if profile is None:
search_root = config['browser_dir']
elif _is_path(profile):
search_root = profile
config['browser_dir'] = os.path.dirname(profile) if config['supports_profiles'] else profile
else:
if config['supports_profiles']:
search_root = os.path.join(config['browser_dir'], profile)
else:
logger.error('{} does not support profiles'.format(browser_name))
search_root = config['browser_dir']
cookie_database_path = _find_most_recently_used_file(search_root, 'Cookies')
if cookie_database_path is None:
raise FileNotFoundError('could not find {} cookies database in "{}"'.format(browser_name, search_root))
logger.debug('extracting from: "{}"'.format(cookie_database_path))
decryptor = get_cookie_decryptor(config['browser_dir'], config['keyring_name'], logger)
with tempfile.TemporaryDirectory(prefix='youtube_dl') as tmpdir:
cursor = None
try:
cursor = _open_database_copy(cookie_database_path, tmpdir)
cursor.connection.text_factory = bytes
column_names = _get_column_names(cursor, 'cookies')
secure_column = 'is_secure' if 'is_secure' in column_names else 'secure'
cursor.execute('SELECT host_key, name, value, encrypted_value, path, '
'expires_utc, {} FROM cookies'.format(secure_column))
jar = YoutubeDLCookieJar()
failed_cookies = 0
for host_key, name, value, encrypted_value, path, expires_utc, is_secure in cursor.fetchall():
host_key = host_key.decode('utf-8')
name = name.decode('utf-8')
value = value.decode('utf-8')
path = path.decode('utf-8')
if not value and encrypted_value:
value = decryptor.decrypt(encrypted_value)
if value is None:
failed_cookies += 1
continue
cookie = compat_cookiejar_Cookie(
version=0, name=name, value=value, port=None, port_specified=False,
domain=host_key, domain_specified=bool(host_key), domain_initial_dot=host_key.startswith('.'),
path=path, path_specified=bool(path), secure=is_secure, expires=expires_utc, discard=False,
comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)
if failed_cookies > 0:
failed_message = ' ({} could not be decrypted)'.format(failed_cookies)
else:
failed_message = ''
logger.info('Extracted {} cookies from {}{}'.format(len(jar), browser_name, failed_message))
return jar
finally:
if cursor is not None:
cursor.connection.close()
class ChromeCookieDecryptor:
"""
Overview:
Linux:
- cookies are either v10 or v11
- v10: AES-CBC encrypted with a fixed key
- v11: AES-CBC encrypted with an OS protected key (keyring)
- v11 keys can be stored in various places depending on the active desktop environment [2]
Mac:
- cookies are either v10 or not v10
- v10: AES-CBC encrypted with an OS protected key (keyring) and more key derivation iterations than linux
- not v10: 'old data' stored as plaintext
Windows:
- cookies are either v10 or not v10
- v10: AES-GCM encrypted with a key which is encrypted with DPAPI
- not v10: encrypted with DPAPI
Sources:
- [1] https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/
- [2] https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/key_storage_linux.cc
- KeyStorageLinux::CreateService
"""
def decrypt(self, encrypted_value):
raise NotImplementedError
def get_cookie_decryptor(browser_root, browser_keyring_name, logger):
if sys.platform in ('linux', 'linux2'):
return LinuxChromeCookieDecryptor(browser_keyring_name, logger)
elif sys.platform == 'darwin':
return MacChromeCookieDecryptor(browser_keyring_name, logger)
elif sys.platform == 'win32':
return WindowsChromeCookieDecryptor(browser_root, logger)
else:
raise NotImplementedError('Chrome cookie decryption is not supported '
'on this platform: {}'.format(sys.platform))
class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
def __init__(self, browser_keyring_name, logger):
self._logger = logger
self._v10_key = self.derive_key(b'peanuts')
if KEYRING_AVAILABLE:
self._v11_key = self.derive_key(_get_linux_keyring_password(browser_keyring_name))
else:
self._v11_key = None
@staticmethod
def derive_key(password):
# values from
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_linux.cc
return pbkdf2_sha1(password, salt=b'saltysalt', iterations=1, key_length=16)
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
return _decrypt_aes_cbc(ciphertext, self._v10_key, self._logger)
elif version == b'v11':
if self._v11_key is None:
self._logger.warning('cannot decrypt cookie as the `keyring` module is not installed. '
'Please install by running `python3 -m pip install keyring`. '
'Note that depending on your platform, additional packages may be required '
'to access the keyring, see https://pypi.org/project/keyring', only_once=True)
return None
return _decrypt_aes_cbc(ciphertext, self._v11_key, self._logger)
else:
return None
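To make the Linux v10 scheme from the class overview concrete, a self-contained round-trip sketch, assuming pycryptodome is installed (the cookie value is fabricated):
    from hashlib import pbkdf2_hmac
    from Crypto.Cipher import AES
    key = pbkdf2_hmac('sha1', b'peanuts', b'saltysalt', 1, 16)  # the fixed v10 key
    iv = b' ' * 16
    plaintext = b'session-token-value'
    pad = 16 - len(plaintext) % 16  # PKCS#7 padding, stripped by _decrypt_aes_cbc
    blob = b'v10' + AES.new(key, AES.MODE_CBC, iv).encrypt(plaintext + bytes([pad]) * pad)
    decrypted = AES.new(key, AES.MODE_CBC, iv).decrypt(blob[3:])
    assert decrypted[:-decrypted[-1]] == plaintext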
class MacChromeCookieDecryptor(ChromeCookieDecryptor):
def __init__(self, browser_keyring_name, logger):
self._logger = logger
password = _get_mac_keyring_password(browser_keyring_name)
self._v10_key = None if password is None else self.derive_key(password)
@staticmethod
def derive_key(password):
# values from
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_mac.mm
return pbkdf2_sha1(password, salt=b'saltysalt', iterations=1003, key_length=16)
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
if self._v10_key is None:
self._logger.warning('cannot decrypt v10 cookies: no key found', only_once=True)
return None
return _decrypt_aes_cbc(ciphertext, self._v10_key, self._logger)
else:
# other prefixes are considered 'old data' which were stored as plaintext
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_mac.mm
return encrypted_value
class WindowsChromeCookieDecryptor(ChromeCookieDecryptor):
def __init__(self, browser_root, logger):
self._logger = logger
self._v10_key = _get_windows_v10_key(browser_root, logger)
def decrypt(self, encrypted_value):
version = encrypted_value[:3]
ciphertext = encrypted_value[3:]
if version == b'v10':
if self._v10_key is None:
self._logger.warning('cannot decrypt v10 cookies: no key found', only_once=True)
return None
elif not CRYPTO_AVAILABLE:
self._logger.warning('cannot decrypt cookie as the `pycryptodome` module is not installed. '
'Please install by running `python3 -m pip install pycryptodome`',
only_once=True)
return None
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_win.cc
# kNonceLength
nonce_length = 96 // 8
# boringssl
# EVP_AEAD_AES_GCM_TAG_LEN
authentication_tag_length = 16
raw_ciphertext = ciphertext
nonce = raw_ciphertext[:nonce_length]
ciphertext = raw_ciphertext[nonce_length:-authentication_tag_length]
authentication_tag = raw_ciphertext[-authentication_tag_length:]
return _decrypt_aes_gcm(ciphertext, self._v10_key, nonce, authentication_tag, self._logger)
else:
# any other prefix means the data is DPAPI encrypted
# https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/os_crypt_win.cc
return _decrypt_windows_dpapi(encrypted_value, self._logger).decode('utf-8')
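A self-contained sketch of the v10 blob layout the branch above takes apart (12-byte nonce, ciphertext, 16-byte GCM tag), assuming pycryptodome; the key and nonce are random placeholders:
    import os
    from Crypto.Cipher import AES
    key, nonce = os.urandom(16), os.urandom(12)  # kNonceLength = 96 // 8
    ciphertext, tag = AES.new(key, AES.MODE_GCM, nonce).encrypt_and_digest(b'cookie-value')
    blob = b'v10' + nonce + ciphertext + tag
    raw = blob[3:]
    decrypted = AES.new(key, AES.MODE_GCM, raw[:12]).decrypt_and_verify(raw[12:-16], raw[-16:])
    assert decrypted == b'cookie-value'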
def _extract_safari_cookies(profile, logger):
if profile is not None:
logger.error('safari does not support profiles')
if sys.platform != 'darwin':
raise ValueError('unsupported platform: {}'.format(sys.platform))
cookies_path = os.path.expanduser('~/Library/Cookies/Cookies.binarycookies')
if not os.path.isfile(cookies_path):
raise FileNotFoundError('could not find safari cookies database')
with open(cookies_path, 'rb') as f:
cookies_data = f.read()
jar = parse_safari_cookies(cookies_data, logger=logger)
logger.info('Extracted {} cookies from safari'.format(len(jar)))
return jar
class ParserError(Exception):
pass
class DataParser:
def __init__(self, data, logger):
self._data = data
self.cursor = 0
self._logger = logger
def read_bytes(self, num_bytes):
if num_bytes < 0:
raise ParserError('invalid read of {} bytes'.format(num_bytes))
end = self.cursor + num_bytes
if end > len(self._data):
raise ParserError('reached end of input')
data = self._data[self.cursor:end]
self.cursor = end
return data
def expect_bytes(self, expected_value, message):
value = self.read_bytes(len(expected_value))
if value != expected_value:
raise ParserError('unexpected value: {} != {} ({})'.format(value, expected_value, message))
def read_uint(self, big_endian=False):
data_format = '>I' if big_endian else '<I'
return struct.unpack(data_format, self.read_bytes(4))[0]
def read_double(self, big_endian=False):
data_format = '>d' if big_endian else '<d'
return struct.unpack(data_format, self.read_bytes(8))[0]
def read_cstring(self):
buffer = []
while True:
c = self.read_bytes(1)
if c == b'\x00':
return b''.join(buffer).decode('utf-8')
else:
buffer.append(c)
def skip(self, num_bytes, description='unknown'):
if num_bytes > 0:
self._logger.debug('skipping {} bytes ({}): {}'.format(
num_bytes, description, self.read_bytes(num_bytes)))
elif num_bytes < 0:
raise ParserError('invalid skip of {} bytes'.format(num_bytes))
def skip_to(self, offset, description='unknown'):
self.skip(offset - self.cursor, description)
def skip_to_end(self, description='unknown'):
self.skip_to(len(self._data), description)
def _mac_absolute_time_to_posix(timestamp):
return int((datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc) + timedelta(seconds=timestamp)).timestamp())
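A quick sanity check of the conversion above: Mac absolute time counts seconds from 2001-01-01T00:00:00Z, which is 978307200 in POSIX time, so
    assert _mac_absolute_time_to_posix(0) == 978307200
    assert _mac_absolute_time_to_posix(60) == 978307200 + 60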
def _parse_safari_cookies_header(data, logger):
p = DataParser(data, logger)
p.expect_bytes(b'cook', 'database signature')
number_of_pages = p.read_uint(big_endian=True)
page_sizes = [p.read_uint(big_endian=True) for _ in range(number_of_pages)]
return page_sizes, p.cursor
def _parse_safari_cookies_page(data, jar, logger):
p = DataParser(data, logger)
p.expect_bytes(b'\x00\x00\x01\x00', 'page signature')
number_of_cookies = p.read_uint()
record_offsets = [p.read_uint() for _ in range(number_of_cookies)]
if number_of_cookies == 0:
logger.debug('a cookies page of size {} has no cookies'.format(len(data)))
return
p.skip_to(record_offsets[0], 'unknown page header field')
for record_offset in record_offsets:
p.skip_to(record_offset, 'space between records')
record_length = _parse_safari_cookies_record(data[record_offset:], jar, logger)
p.read_bytes(record_length)
p.skip_to_end('space in between pages')
def _parse_safari_cookies_record(data, jar, logger):
p = DataParser(data, logger)
record_size = p.read_uint()
p.skip(4, 'unknown record field 1')
flags = p.read_uint()
is_secure = bool(flags & 0x0001)
p.skip(4, 'unknown record field 2')
domain_offset = p.read_uint()
name_offset = p.read_uint()
path_offset = p.read_uint()
value_offset = p.read_uint()
p.skip(8, 'unknown record field 3')
expiration_date = _mac_absolute_time_to_posix(p.read_double())
_creation_date = _mac_absolute_time_to_posix(p.read_double()) # noqa: F841
try:
p.skip_to(domain_offset)
domain = p.read_cstring()
p.skip_to(name_offset)
name = p.read_cstring()
p.skip_to(path_offset)
path = p.read_cstring()
p.skip_to(value_offset)
value = p.read_cstring()
except UnicodeDecodeError:
logger.warning('failed to parse cookie because UTF-8 decoding failed')
return record_size
p.skip_to(record_size, 'space at the end of the record')
cookie = compat_cookiejar_Cookie(
version=0, name=name, value=value, port=None, port_specified=False,
domain=domain, domain_specified=bool(domain), domain_initial_dot=domain.startswith('.'),
path=path, path_specified=bool(path), secure=is_secure, expires=expiration_date, discard=False,
comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)
return record_size
def parse_safari_cookies(data, jar=None, logger=YDLLogger()):
"""
References:
- https://github.com/libyal/dtformats/blob/main/documentation/Safari%20Cookies.asciidoc
- this data appears to be out of date but the important parts of the database structure is the same
- there are a few bytes here and there which are skipped during parsing
"""
if jar is None:
jar = YoutubeDLCookieJar()
page_sizes, body_start = _parse_safari_cookies_header(data, logger)
p = DataParser(data[body_start:], logger)
for page_size in page_sizes:
_parse_safari_cookies_page(p.read_bytes(page_size), jar, logger)
p.skip_to_end('footer')
return jar
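A self-contained check of the header layout parsed above, using a fabricated two-page header:
    import struct
    header = b'cook' + struct.pack('>I', 2) + struct.pack('>II', 100, 200)
    page_sizes, body_start = _parse_safari_cookies_header(header, YDLLogger())
    assert page_sizes == [100, 200] and body_start == len(header)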
def _get_linux_keyring_password(browser_keyring_name):
password = keyring.get_password('{} Keys'.format(browser_keyring_name),
'{} Safe Storage'.format(browser_keyring_name))
if password is None:
# this sometimes occurs in KDE because chrome does not check hasEntry and instead
# just tries to read the value (for which kwallet returns an empty string), whereas
# keyring does check hasEntry. To verify this, run
# dbus-monitor "interface='org.kde.KWallet'" "type=method_return"
# while starting chrome.
# this may be a bug, as the intended behaviour is to generate a random password
# and store it, but that does not matter here.
password = ''
return password.encode('utf-8')
def _get_mac_keyring_password(browser_keyring_name):
if KEYRING_AVAILABLE:
password = keyring.get_password('{} Safe Storage'.format(browser_keyring_name), browser_keyring_name)
return password.encode('utf-8')
else:
proc = subprocess.Popen(['security', 'find-generic-password',
'-w', # write password to stdout
'-a', browser_keyring_name, # match 'account'
'-s', '{} Safe Storage'.format(browser_keyring_name)], # match 'service'
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL)
try:
stdout, stderr = process_communicate_or_kill(proc)
return stdout
except BaseException:
return None
def _get_windows_v10_key(browser_root, logger):
path = _find_most_recently_used_file(browser_root, 'Local State')
if path is None:
logger.error('could not find local state file')
return None
with open(path, 'r') as f:
data = json.load(f)
try:
base64_key = data['os_crypt']['encrypted_key']
except KeyError:
logger.error('no encrypted key in Local State')
return None
encrypted_key = compat_b64decode(base64_key)
prefix = b'DPAPI'
if not encrypted_key.startswith(prefix):
logger.error('invalid key')
return None
return _decrypt_windows_dpapi(encrypted_key[len(prefix):], logger)
def pbkdf2_sha1(password, salt, iterations, key_length):
return pbkdf2_hmac('sha1', password, salt, iterations, key_length)
def _decrypt_aes_cbc(ciphertext, key, logger, initialization_vector=b' ' * 16):
plaintext = aes_cbc_decrypt(bytes_to_intlist(ciphertext),
bytes_to_intlist(key),
bytes_to_intlist(initialization_vector))
padding_length = plaintext[-1]
try:
return intlist_to_bytes(plaintext[:-padding_length]).decode('utf-8')
except UnicodeDecodeError:
logger.warning('failed to decrypt cookie because UTF-8 decoding failed. Possibly the key is wrong?')
return None
def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag, logger):
cipher = AES.new(key, AES.MODE_GCM, nonce)
try:
plaintext = cipher.decrypt_and_verify(ciphertext, authentication_tag)
except ValueError:
logger.warning('failed to decrypt cookie because the MAC check failed. Possibly the key is wrong?')
return None
try:
return plaintext.decode('utf-8')
except UnicodeDecodeError:
logger.warning('failed to decrypt cookie because UTF-8 decoding failed. Possibly the key is wrong?')
return None
def _decrypt_windows_dpapi(ciphertext, logger):
"""
References:
- https://docs.microsoft.com/en-us/windows/win32/api/dpapi/nf-dpapi-cryptunprotectdata
"""
from ctypes.wintypes import DWORD
class DATA_BLOB(ctypes.Structure):
_fields_ = [('cbData', DWORD),
('pbData', ctypes.POINTER(ctypes.c_char))]
buffer = ctypes.create_string_buffer(ciphertext)
blob_in = DATA_BLOB(ctypes.sizeof(buffer), buffer)
blob_out = DATA_BLOB()
ret = ctypes.windll.crypt32.CryptUnprotectData(
ctypes.byref(blob_in), # pDataIn
None, # ppszDataDescr: human readable description of pDataIn
None, # pOptionalEntropy: salt?
None, # pvReserved: must be NULL
None, # pPromptStruct: information about prompts to display
0, # dwFlags
ctypes.byref(blob_out) # pDataOut
)
if not ret:
logger.warning('failed to decrypt with DPAPI')
return None
result = ctypes.string_at(blob_out.pbData, blob_out.cbData)
ctypes.windll.kernel32.LocalFree(blob_out.pbData)
return result
def _config_home():
return os.environ.get('XDG_CONFIG_HOME', os.path.expanduser('~/.config'))
def _open_database_copy(database_path, tmpdir):
# cannot open sqlite databases if they are already in use (e.g. by the browser)
database_copy_path = os.path.join(tmpdir, 'temporary.sqlite')
shutil.copy(database_path, database_copy_path)
conn = sqlite3.connect(database_copy_path)
return conn.cursor()
def _get_column_names(cursor, table_name):
table_info = cursor.execute('PRAGMA table_info({})'.format(table_name)).fetchall()
return [row[1].decode('utf-8') for row in table_info]
def _find_most_recently_used_file(root, filename):
# if there are multiple browser profiles, take the most recently used one
paths = []
for root, dirs, files in os.walk(root):
for file in files:
if file == filename:
paths.append(os.path.join(root, file))
return None if not paths else max(paths, key=lambda path: os.lstat(path).st_mtime)
def _merge_cookie_jars(jars):
output_jar = YoutubeDLCookieJar()
for jar in jars:
for cookie in jar:
output_jar.set_cookie(cookie)
if jar.filename is not None:
output_jar.filename = jar.filename
return output_jar
def _is_path(value):
return os.path.sep in value
def _parse_browser_specification(browser_name, profile=None):
if browser_name not in SUPPORTED_BROWSERS:
raise ValueError(f'unsupported browser: "{browser_name}"')
if profile is not None and _is_path(profile):
profile = os.path.expanduser(profile)
return browser_name, profile
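Putting the module together, a hedged sketch of stand-alone use to dump browser cookies into a Netscape-format file; the browser choice and output path are assumptions:
    from yt_dlp.cookies import extract_cookies_from_browser
    jar = extract_cookies_from_browser('firefox')  # or ('chrome', '/path/to/profile')
    jar.save('cookies.txt', ignore_discard=True, ignore_expires=True)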


@@ -116,7 +116,7 @@ class YoutubeLiveChatFD(FragmentFD):
if not success:
return False, None, None, None
try:
data = ie._extract_yt_initial_data(video_id, raw_fragment.decode('utf-8', 'replace'))
data = ie.extract_yt_initial_data(video_id, raw_fragment.decode('utf-8', 'replace'))
except RegexNotFoundError:
data = None
if not data:
@@ -146,7 +146,7 @@ class YoutubeLiveChatFD(FragmentFD):
if not success:
return False
try:
data = ie._extract_yt_initial_data(video_id, raw_fragment.decode('utf-8', 'replace'))
data = ie.extract_yt_initial_data(video_id, raw_fragment.decode('utf-8', 'replace'))
except RegexNotFoundError:
return False
continuation_id = try_get(
@@ -155,7 +155,7 @@ class YoutubeLiveChatFD(FragmentFD):
# no data yet but required to call _append_fragment
self._append_fragment(ctx, b'')
ytcfg = ie._extract_ytcfg(video_id, raw_fragment.decode('utf-8', 'replace'))
ytcfg = ie.extract_ytcfg(video_id, raw_fragment.decode('utf-8', 'replace'))
if not ytcfg:
return False
@@ -183,7 +183,7 @@ class YoutubeLiveChatFD(FragmentFD):
request_data['currentPlayerState'] = {'playerOffsetMs': str(max(offset - 5000, 0))}
if click_tracking_params:
request_data['context']['clickTracking'] = {'clickTrackingParams': click_tracking_params}
headers = ie._generate_api_headers(ytcfg, visitor_data=visitor_data)
headers = ie.generate_api_headers(ytcfg, visitor_data=visitor_data)
headers.update({'content-type': 'application/json'})
fragment_request_data = json.dumps(request_data, ensure_ascii=False).encode('utf-8') + b'\n'
success, continuation_id, offset, click_tracking_params = download_and_parse_fragment(


@@ -8,6 +8,9 @@ from ..utils import (
smuggle_url,
update_url_query,
int_or_none,
float_or_none,
try_get,
dict_get,
)
@@ -24,6 +27,11 @@ class BravoTVIE(AdobePassIE):
'uploader': 'NBCU-BRAV',
'upload_date': '20190314',
'timestamp': 1552591860,
'season_number': 16,
'episode_number': 15,
'series': 'Top Chef',
'episode': 'The Top Chef Season 16 Winner Is...',
'duration': 190.0,
}
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
@@ -79,12 +87,34 @@ class BravoTVIE(AdobePassIE):
'episode_number': int_or_none(metadata.get('episode_num')),
})
query['switch'] = 'progressive'
tp_url = 'http://link.theplatform.com/s/%s/%s' % (account_pid, tp_path)
tp_metadata = self._download_json(
update_url_query(tp_url, {'format': 'preview'}),
display_id, fatal=False)
if tp_metadata:
info.update({
'title': tp_metadata.get('title'),
'description': tp_metadata.get('description'),
'duration': float_or_none(tp_metadata.get('duration'), 1000),
'season_number': int_or_none(
dict_get(tp_metadata, ('pl1$seasonNumber', 'nbcu$seasonNumber'))),
'episode_number': int_or_none(
dict_get(tp_metadata, ('pl1$episodeNumber', 'nbcu$episodeNumber'))),
# For some reason the series is sometimes wrapped into a single element array.
'series': try_get(
dict_get(tp_metadata, ('pl1$show', 'nbcu$show')),
lambda x: x[0] if isinstance(x, list) else x,
expected_type=str),
'episode': dict_get(
tp_metadata, ('pl1$episodeName', 'nbcu$episodeName', 'title')),
})
info.update({
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, tp_path),
query), {'force_smil_url': True}),
'url': smuggle_url(update_url_query(tp_url, query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})
return info


@@ -19,7 +19,6 @@ from ..compat import (
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
compat_integer_types,
compat_http_client,
compat_os_name,
compat_str,
@@ -79,6 +78,7 @@ from ..utils import (
urljoin,
url_basename,
url_or_none,
variadic,
xpath_element,
xpath_text,
xpath_with_ns,
@@ -229,6 +229,7 @@ class InfoExtractor(object):
* "resolution" (optional, string "{width}x{height}",
deprecated)
* "filesize" (optional, int)
* "_test_url" (optional, bool) - If true, test the URL
thumbnail: Full URL to a video thumbnail image.
description: Full video description.
uploader: Full name of the video uploader.
@@ -296,6 +297,8 @@ class InfoExtractor(object):
live stream that goes on instead of a fixed-length video.
was_live: True, False, or None (=unknown). Whether this video was
originally a live stream.
live_status: 'is_live', 'upcoming', 'was_live', 'not_live' or None (=unknown)
If absent, automatically set from is_live, was_live
start_time: Time in seconds where the reproduction should start, as
specified in the URL.
end_time: Time in seconds where the reproduction should end, as
@@ -628,14 +631,10 @@ class InfoExtractor(object):
assert isinstance(err, compat_urllib_error.HTTPError)
if expected_status is None:
return False
if isinstance(expected_status, compat_integer_types):
return err.code == expected_status
elif isinstance(expected_status, (list, tuple)):
return err.code in expected_status
elif callable(expected_status):
return expected_status(err.code) is True
else:
assert False
return err.code in variadic(expected_status)
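variadic (from yt_dlp.utils) normalizes a scalar into a 1-tuple while letting iterables through, which is what lets the single membership test above replace the old type dispatch; a quick illustration:
    from yt_dlp.utils import variadic
    assert variadic(404) == (404,)              # scalar -> 1-tuple
    assert variadic([403, 404]) == [403, 404]   # iterables pass through unchanged
    assert variadic('foo') == ('foo',)          # str/bytes are treated as scalars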
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}, expected_status=None):
"""
@@ -1115,6 +1114,8 @@ class InfoExtractor(object):
if group is None:
# return the first matching group
return next(g for g in mobj.groups() if g is not None)
elif isinstance(group, (list, tuple)):
return tuple(mobj.group(g) for g in group)
else:
return mobj.group(group)
elif default is not NO_DEFAULT:
@@ -1207,8 +1208,7 @@ class InfoExtractor(object):
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs):
if not isinstance(prop, (list, tuple)):
prop = [prop]
prop = variadic(prop)
if name is None:
name = 'OpenGraph %s' % prop[0]
og_regexes = []
@@ -1238,8 +1238,7 @@ class InfoExtractor(object):
return self._og_search_property('url', html, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
if not isinstance(name, (list, tuple)):
name = [name]
name = variadic(name)
if display_name is None:
display_name = name[0]
return self._html_search_regex(
@@ -2210,7 +2209,7 @@ class InfoExtractor(object):
out.append('{%s}%s' % (namespace, c))
return '/'.join(out)
def _extract_smil_formats(self, smil_url, video_id, fatal=True, f4m_params=None, transform_source=None):
def _extract_smil_formats_and_subtitles(self, smil_url, video_id, fatal=True, f4m_params=None, transform_source=None):
smil = self._download_smil(smil_url, video_id, fatal=fatal, transform_source=transform_source)
if smil is False:
@@ -2219,8 +2218,21 @@ class InfoExtractor(object):
namespace = self._parse_smil_namespace(smil)
return self._parse_smil_formats(
fmts = self._parse_smil_formats(
smil, smil_url, video_id, namespace=namespace, f4m_params=f4m_params)
subs = self._parse_smil_subtitles(
smil, namespace=namespace)
return fmts, subs
def _extract_smil_formats(self, *args, **kwargs):
fmts, subs = self._extract_smil_formats_and_subtitles(*args, **kwargs)
if subs:
self.report_warning(bug_reports_message(
"Ignoring subtitle tracks found in the SMIL manifest; "
"if any subtitle tracks are missing,"
))
return fmts
def _extract_smil_info(self, smil_url, video_id, fatal=True, f4m_params=None):
smil = self._download_smil(smil_url, video_id, fatal=fatal)
@@ -3498,9 +3510,18 @@ class InfoExtractor(object):
else 'public' if all_known
else None)
def _configuration_arg(self, key):
return traverse_obj(
def _configuration_arg(self, key, default=NO_DEFAULT, casesense=False):
'''
@returns A list of values for the extractor argument given by "key"
or "default" if no such key is present
@param default The default value to return when the key is not present (default: [])
@param casesense When false, the values are converted to lower case
'''
val = traverse_obj(
self._downloader.params, ('extractor_args', self.ie_key().lower(), key))
if val is None:
return [] if default is NO_DEFAULT else default
return list(val) if casesense else [x.lower() for x in val]
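The lookup mirrors a plain traverse_obj call over the downloader params; a hedged illustration in which the extractor-arg values are made up:
    from yt_dlp.utils import traverse_obj
    params = {'extractor_args': {'youtube': {'player_client': ['Android', 'WEB']}}}
    val = traverse_obj(params, ('extractor_args', 'youtube', 'player_client'))
    assert [x.lower() for x in val] == ['android', 'web']  # casesense=False lower-cases
    # a missing key makes traverse_obj return None, so _configuration_arg
    # falls back to [] (or to an explicit `default`)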
class SearchInfoExtractor(InfoExtractor):


@@ -636,7 +636,7 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login|media-\d+))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{
'url': 'http://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
'url': 'https://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
'info_dict': {
'id': 'a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
'title': 'A Bridge to the Starry Skies - Hoshizora e Kakaru Hashi'
@@ -661,7 +661,8 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
show_id = self._match_id(url)
webpage = self._download_webpage(
self._add_skip_wall(url), show_id,
# https:// gives a 403, but http:// does not
self._add_skip_wall(url).replace('https://', 'http://'), show_id,
headers=self.geo_verification_headers())
title = self._html_search_meta('name', webpage, default=None)

yt_dlp/extractor/douyin.py (new file, 145 lines)

@@ -0,0 +1,145 @@
# coding: utf-8
from ..utils import (
int_or_none,
traverse_obj,
url_or_none,
)
from .common import (
InfoExtractor,
compat_urllib_parse_unquote,
)
class DouyinIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?douyin\.com/video/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://www.douyin.com/video/6961737553342991651',
'md5': '10523312c8b8100f353620ac9dc8f067',
'info_dict': {
'id': '6961737553342991651',
'ext': 'mp4',
'title': '#杨超越 小小水手带你去远航❤️',
'uploader': '杨超越',
'upload_date': '20210513',
'timestamp': 1620905839,
'uploader_id': '110403406559',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
}
}, {
'url': 'https://www.douyin.com/video/6982497745948921092',
'md5': 'd78408c984b9b5102904cf6b6bc2d712',
'info_dict': {
'id': '6982497745948921092',
'ext': 'mp4',
'title': '这个夏日和小羊@杨超越 一起遇见白色幻想',
'uploader': '杨超越工作室',
'upload_date': '20210708',
'timestamp': 1625739481,
'uploader_id': '408654318141572',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
}
}, {
'url': 'https://www.douyin.com/video/6953975910773099811',
'md5': '72e882e24f75064c218b76c8b713c185',
'info_dict': {
'id': '6953975910773099811',
'ext': 'mp4',
'title': '#一起看海 出现在你的夏日里',
'uploader': '杨超越',
'upload_date': '20210422',
'timestamp': 1619098692,
'uploader_id': '110403406559',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
}
}, {
'url': 'https://www.douyin.com/video/6950251282489675042',
'md5': 'b4db86aec367ef810ddd38b1737d2fed',
'info_dict': {
'id': '6950251282489675042',
'ext': 'mp4',
'title': '哈哈哈,成功了哈哈哈哈哈哈',
'uploader': '杨超越',
'upload_date': '20210412',
'timestamp': 1618231483,
'uploader_id': '110403406559',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
}
}, {
'url': 'https://www.douyin.com/video/6963263655114722595',
'md5': '1abe1c477d05ee62efb40bf2329957cf',
'info_dict': {
'id': '6963263655114722595',
'ext': 'mp4',
'title': '#哪个爱豆的105度最甜 换个角度看看我哈哈',
'uploader': '杨超越',
'upload_date': '20210517',
'timestamp': 1621261163,
'uploader_id': '110403406559',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
render_data = self._parse_json(
self._search_regex(
r'<script [^>]*\bid=[\'"]RENDER_DATA[\'"][^>]*>(%7B.+%7D)</script>',
webpage, 'render data'),
video_id, transform_source=compat_urllib_parse_unquote)
details = traverse_obj(render_data, (..., 'aweme', 'detail'), get_all=False)
thumbnails = [{'url': self._proto_relative_url(url)} for url in traverse_obj(
details, ('video', ('cover', 'dynamicCover', 'originCover')), expected_type=url_or_none, default=[])]
common = {
'width': traverse_obj(details, ('video', 'width'), expected_type=int),
'height': traverse_obj(details, ('video', 'height'), expected_type=int),
'ext': 'mp4',
}
formats = [{**common, 'url': self._proto_relative_url(url)} for url in traverse_obj(
details, ('video', 'playAddr', ..., 'src'), expected_type=url_or_none, default=[]) if url]
self._remove_duplicate_formats(formats)
download_url = traverse_obj(details, ('download', 'url'), expected_type=url_or_none)
if download_url:
formats.append({
**common,
'format_id': 'download',
'url': self._proto_relative_url(download_url),
'quality': 1,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': details.get('desc') or self._html_search_meta('title', webpage),
'formats': formats,
'thumbnails': thumbnails,
'uploader': traverse_obj(details, ('authorInfo', 'nickname'), expected_type=str),
'uploader_id': traverse_obj(details, ('authorInfo', 'uid'), expected_type=str),
'uploader_url': 'https://www.douyin.com/user/%s' % traverse_obj(
details, ('authorInfo', 'secUid'), expected_type=str),
'timestamp': int_or_none(details.get('createTime')),
'duration': traverse_obj(details, ('video', 'duration'), expected_type=int),
'view_count': traverse_obj(details, ('stats', 'playCount'), expected_type=int),
'like_count': traverse_obj(details, ('stats', 'diggCount'), expected_type=int),
'repost_count': traverse_obj(details, ('stats', 'shareCount'), expected_type=int),
'comment_count': traverse_obj(details, ('stats', 'commentCount'), expected_type=int),
}
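For readers unfamiliar with the traverse_obj paths used above, a hedged sketch with a made-up RENDER_DATA fragment:
    from yt_dlp.utils import traverse_obj
    details = {'video': {'cover': '//p3.douyinpic.com/a.jpeg', 'originCover': '//p3.douyinpic.com/b.jpeg'}}
    # a tuple inside the path branches over alternative keys and collects all hits:
    covers = traverse_obj(details, ('video', ('cover', 'dynamicCover', 'originCover')), default=[])
    assert covers == ['//p3.douyinpic.com/a.jpeg', '//p3.douyinpic.com/b.jpeg']
    # `...` branches over every element, and get_all=False keeps only the first hit:
    assert traverse_obj([{'aweme': {'detail': 42}}], (..., 'aweme', 'detail'), get_all=False) == 42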


@@ -321,6 +321,7 @@ from .discoveryplusindia import (
DiscoveryPlusIndiaShowIE,
)
from .dotsub import DotsubIE
from .douyin import DouyinIE
from .douyutv import (
DouyuShowIE,
DouyuTVIE,
@@ -1015,6 +1016,7 @@ from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornflip import PornFlipIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,


@@ -186,8 +186,8 @@ class FunimationIE(InfoExtractor):
for lang, version, fmt in self._get_experiences(episode):
experience_id = str(fmt['experienceId'])
if (only_initial_experience and experience_id != initial_experience_id
or requested_languages and lang not in requested_languages
or requested_versions and version not in requested_versions):
or requested_languages and lang.lower() not in requested_languages
or requested_versions and version.lower() not in requested_versions):
continue
thumbnails.append({'url': fmt.get('poster')})
duration = max(duration, fmt.get('duration', 0))


@@ -2462,7 +2462,7 @@ class GenericIE(InfoExtractor):
# Is it an M3U playlist?
if first_bytes.startswith(b'#EXTM3U'):
info_dict['formats'] = self._extract_m3u8_formats(url, video_id, 'mp4')
info_dict['formats'], info_dict['subtitles'] = self._extract_m3u8_formats_and_subtitles(url, video_id, 'mp4')
self._sort_formats(info_dict['formats'])
return info_dict
@@ -3410,6 +3410,7 @@ class GenericIE(InfoExtractor):
if not isinstance(sources, list):
sources = [sources]
formats = []
subtitles = {}
for source in sources:
src = source.get('src')
if not src or not isinstance(src, compat_str):
@@ -3422,12 +3423,16 @@ class GenericIE(InfoExtractor):
if src_type == 'video/youtube':
return self.url_result(src, YoutubeIE.ie_key())
if src_type == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
fmts, subs = self._extract_mpd_formats_and_subtitles(
src, video_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif src_type == 'application/x-mpegurl' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({
'url': src,
@@ -3437,9 +3442,10 @@ class GenericIE(InfoExtractor):
'Referer': full_response.geturl(),
},
})
if formats:
if formats or subtitles:
self._sort_formats(formats)
info_dict['formats'] = formats
info_dict['subtitles'] = subtitles
return info_dict
# Looking for http://schema.org/VideoObject
@@ -3574,13 +3580,13 @@ class GenericIE(InfoExtractor):
ext = determine_ext(video_url)
if ext == 'smil':
entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)
entry_info_dict = {**self._extract_smil_info(video_url, video_id), **entry_info_dict}
elif ext == 'xspf':
return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
elif ext == 'm3u8':
entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
entry_info_dict['formats'], entry_info_dict['subtitles'] = self._extract_m3u8_formats_and_subtitles(video_url, video_id, ext='mp4')
elif ext == 'mpd':
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
entry_info_dict['formats'], entry_info_dict['subtitles'] = self._extract_mpd_formats_and_subtitles(video_url, video_id)
elif ext == 'f4m':
entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
elif re.search(r'(?i)\.(?:ism|smil)/manifest', video_url) and video_url != url:


@@ -19,6 +19,7 @@ from ..utils import (
std_headers,
try_get,
url_or_none,
variadic,
)
@@ -188,9 +189,7 @@ class InstagramIE(InfoExtractor):
uploader_id = media.get('owner', {}).get('username')
def get_count(keys, kind):
if not isinstance(keys, (list, tuple)):
keys = [keys]
for key in keys:
for key in variadic(keys):
count = int_or_none(try_get(
media, (lambda x: x['edge_media_%s' % key]['count'],
lambda x: x['%ss' % kind]['count'])))


@@ -2,9 +2,11 @@
from __future__ import unicode_literals
import json
import time
from urllib.error import HTTPError
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_str, compat_urllib_parse_unquote, compat_urllib_parse_quote
from ..utils import (
ExtractorError,
parse_iso8601,
@@ -78,7 +80,9 @@ class NebulaIE(InfoExtractor):
]
_NETRC_MACHINE = 'watchnebula'
def _retrieve_nebula_auth(self, video_id):
_nebula_token = None
def _retrieve_nebula_auth(self):
"""
Log in to Nebula and return a Nebula API token
"""
@@ -91,7 +95,7 @@ class NebulaIE(InfoExtractor):
data = json.dumps({'email': username, 'password': password}).encode('utf8')
response = self._download_json(
'https://api.watchnebula.com/api/v1/auth/login/',
data=data, fatal=False, video_id=video_id,
data=data, fatal=False, video_id=None,
headers={
'content-type': 'application/json',
# Submitting the 'sessionid' cookie always causes a 403 on auth endpoint
@@ -101,6 +105,19 @@ class NebulaIE(InfoExtractor):
errnote='Authentication failed or rejected')
if not response or not response.get('key'):
self.raise_login_required()
# save nebula token as cookie
self._set_cookie(
'nebula.app', 'nebula-auth',
compat_urllib_parse_quote(
json.dumps({
"apiToken": response["key"],
"isLoggingIn": False,
"isLoggingOut": False,
}, separators=(",", ":"))),
expire_time=int(time.time()) + 86400 * 365,
)
return response['key']
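A round-trip sketch of the cookie value written above and read back in _real_initialize; the token is a placeholder:
    import json
    from urllib.parse import quote, unquote
    token = '<nebula-api-token>'
    cookie_value = quote(json.dumps(
        {'apiToken': token, 'isLoggingIn': False, 'isLoggingOut': False}, separators=(',', ':')))
    assert json.loads(unquote(cookie_value))['apiToken'] == token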
def _retrieve_zype_api_key(self, page_url, display_id):
@@ -139,8 +156,17 @@ class NebulaIE(InfoExtractor):
'Authorization': 'Token {access_token}'.format(access_token=access_token)
}, note=note)
def _fetch_zype_access_token(self, video_id, nebula_token):
user_object = self._call_nebula_api('/auth/user/', video_id, nebula_token, note='Retrieving Zype access token')
def _fetch_zype_access_token(self, video_id):
try:
user_object = self._call_nebula_api('/auth/user/', video_id, self._nebula_token, note='Retrieving Zype access token')
except ExtractorError as exc:
# if 401, attempt credential auth and retry
if exc.cause and isinstance(exc.cause, HTTPError) and exc.cause.code == 401:
self._nebula_token = self._retrieve_nebula_auth()
user_object = self._call_nebula_api('/auth/user/', video_id, self._nebula_token, note='Retrieving Zype access token')
else:
raise
access_token = try_get(user_object, lambda x: x['zype_auth_info']['access_token'], compat_str)
if not access_token:
if try_get(user_object, lambda x: x['is_subscribed'], bool):
@@ -162,9 +188,21 @@ class NebulaIE(InfoExtractor):
if category.get('value'):
return category['value'][0]
def _real_initialize(self):
# check cookie jar for valid token
nebula_cookies = self._get_cookies('https://nebula.app')
nebula_cookie = nebula_cookies.get('nebula-auth')
if nebula_cookie:
self.to_screen('Authenticating to Nebula with token from cookie jar')
nebula_cookie_value = compat_urllib_parse_unquote(nebula_cookie.value)
self._nebula_token = self._parse_json(nebula_cookie_value, None).get('apiToken')
# try to authenticate using credentials if no valid token has been found
if not self._nebula_token:
self._nebula_token = self._retrieve_nebula_auth()
def _real_extract(self, url):
display_id = self._match_id(url)
nebula_token = self._retrieve_nebula_auth(display_id)
api_key = self._retrieve_zype_api_key(url, display_id)
response = self._call_zype_api('/videos', {'friendly_title': display_id},
@@ -174,7 +212,7 @@ class NebulaIE(InfoExtractor):
video_meta = response['response'][0]
video_id = video_meta['_id']
zype_access_token = self._fetch_zype_access_token(display_id, nebula_token=nebula_token)
zype_access_token = self._fetch_zype_access_token(display_id)
channel_title = self._extract_channel_title(video_meta)
@@ -187,13 +225,12 @@ class NebulaIE(InfoExtractor):
'title': video_meta.get('title'),
'description': video_meta.get('description'),
'timestamp': parse_iso8601(video_meta.get('published_at')),
'thumbnails': [
{
'id': tn.get('name'), # this appears to be null
'url': tn['url'],
'width': tn.get('width'),
'height': tn.get('height'),
} for tn in video_meta.get('thumbnails', [])],
'thumbnails': [{
'id': tn.get('name'), # this appears to be null
'url': tn['url'],
'width': tn.get('width'),
'height': tn.get('height'),
} for tn in video_meta.get('thumbnails', [])],
'duration': video_meta.get('duration'),
'channel': channel_title,
'uploader': channel_title, # we chose uploader = channel name


@@ -0,0 +1,82 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601
)
class PornFlipIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:(embed|sv|v)/)?(?P<id>[^/]+)'
_TESTS = [
{
'url': 'https://www.pornflip.com/dzv9Mtw1qj2/sv/brazzers-double-dare-two-couples-fucked-jenna-reid-maya-bijou',
'info_dict': {
'id': 'dzv9Mtw1qj2',
'ext': 'mp4',
'title': 'Brazzers - Double Dare Two couples fucked Jenna Reid Maya Bijou',
'description': 'md5:d2b69e6cc743c5fd158e162aa7f05821',
'duration': 476,
'like_count': int,
'dislike_count': int,
'view_count': int,
'timestamp': 1617846819,
'upload_date': '20210408',
'uploader': 'Brazzers',
'age_limit': 18,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
},
{
'url': 'https://www.pornflip.com/v/IrJEC40i21L',
'only_matching': True,
},
{
'url': 'https://www.pornflip.com/Z3jzbChC5-P/sexintaxi-e-sereyna-gomez-czech-naked-couple',
'only_matching': True,
},
{
'url': 'https://www.pornflip.com/embed/bLcDFxnrZnU',
'only_matching': True,
},
]
_HOST = 'www.pornflip.com'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://{}/sv/{}'.format(self._HOST, video_id), video_id, headers={'host': self._HOST})
description = self._html_search_regex(r'&p\[summary\]=(.*?)\s*&p', webpage, 'description', fatal=False)
duration = self._search_regex(r'"duration":\s+"([^"]+)",', webpage, 'duration', fatal=False)
view_count = self._search_regex(r'"interactionCount":\s+"([^"]+)"', webpage, 'view_count', fatal=False)
title = self._html_search_regex(r'id="mediaPlayerTitleLink"[^>]*>(.+)</a>', webpage, 'title', fatal=False)
uploader = self._html_search_regex(r'class="title-chanel"[^>]*>[^<]*<a[^>]*>([^<]+)<', webpage, 'uploader', fatal=False)
upload_date = self._search_regex(r'"uploadDate":\s+"([^"]+)",', webpage, 'upload_date', fatal=False)
likes = self._html_search_regex(
r'class="btn btn-up-rating[^>]*>[^<]*<i[^>]*>[^<]*</i>[^>]*<span[^>]*>[^0-9]*([0-9]+)[^<0-9]*<', webpage, 'like_count', fatal=False)
dislikes = self._html_search_regex(
r'class="btn btn-down-rating[^>]*>[^<]*<i[^>]*>[^<]*</i>[^>]*<span[^>]*>[^0-9]*([0-9]+)[^<0-9]*<', webpage, 'dislike_count', fatal=False)
mpd_url = self._search_regex(r'"([^"]+userscontent.net/dash/[0-9]+/manifest.mpd[^"]*)"', webpage, 'mpd_url').replace('&amp;', '&')
formats = self._extract_mpd_formats(mpd_url, video_id, mpd_id='dash')
self._sort_formats(formats)
return {
'age_limit': 18,
'description': description,
'dislike_count': int_or_none(dislikes),
'duration': parse_duration(duration),
'formats': formats,
'id': video_id,
'like_count': int_or_none(likes),
'timestamp': parse_iso8601(upload_date),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'uploader': uploader,
'view_count': int_or_none(view_count),
}


@@ -2,10 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
)
from ..utils import js_to_json
import re
import json
import urllib.parse
import base64
class RTPIE(InfoExtractor):
@@ -25,6 +26,22 @@ class RTPIE(InfoExtractor):
'only_matching': True,
}]
_RX_OBFUSCATION = re.compile(r'''(?xs)
atob\s*\(\s*decodeURIComponent\s*\(\s*
(\[[0-9A-Za-z%,'"]*\])
\s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
''')
def __unobfuscate(self, data, *, video_id):
if data.startswith('{'):
data = self._RX_OBFUSCATION.sub(
lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote(
''.join(self._parse_json(m.group(1), video_id))
)).decode('iso-8859-1')),
data)
return js_to_json(data)
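A round-trip sketch of the obfuscation scheme the regex above targets (percent-encoded chunks of a base64 string that decode to JSON); the payload is fabricated:
    import base64, json
    from urllib.parse import quote, unquote
    payload = b'{"hls": "https://example.com/master.m3u8"}'
    chunks = [quote(base64.b64encode(payload).decode())]  # the page splits this across an array
    decoded = base64.b64decode(unquote(''.join(chunks))).decode('iso-8859-1')
    assert json.loads(decoded)['hls'].endswith('.m3u8')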
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -32,30 +49,46 @@ class RTPIE(InfoExtractor):
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
config = self._parse_json(self._search_regex(
r'(?s)RTPPlayer\(({.+?})\);', webpage,
'player config'), video_id, js_to_json)
file_url = config['file']
ext = determine_ext(file_url)
if ext == 'm3u8':
file_key = config.get('fileKey')
formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=file_key)
if file_key:
formats.append({
'url': 'https://cdn-ondemand.rtp.pt' + file_key,
'quality': 1,
})
self._sort_formats(formats)
f, config = self._search_regex(
r'''(?sx)
var\s+f\s*=\s*(?P<f>".*?"|{[^;]+?});\s*
var\s+player1\s+=\s+new\s+RTPPlayer\s*\((?P<config>{(?:(?!\*/).)+?})\);(?!\s*\*/)
''', webpage,
'player config', group=('f', 'config'))
f = self._parse_json(
f, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
config = self._parse_json(
config, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
formats = []
if isinstance(f, dict):
f_hls = f.get('hls')
if f_hls is not None:
formats.extend(self._extract_m3u8_formats(
f_hls, video_id, 'mp4', 'm3u8_native', m3u8_id='hls'))
f_dash = f.get('dash')
if f_dash is not None:
formats.extend(self._extract_mpd_formats(f_dash, video_id, mpd_id='dash'))
else:
formats = [{
'url': file_url,
'ext': ext,
}]
if config.get('mediaType') == 'audio':
for f in formats:
f['vcodec'] = 'none'
formats.append({
'format_id': 'f',
'url': f,
'vcodec': 'none' if config.get('mediaType') == 'audio' else None,
})
subtitles = {}
vtt = config.get('vtt')
if vtt is not None:
for lcode, lname, url in vtt:
subtitles.setdefault(lcode, []).append({
'name': lname,
'url': url,
})
return {
'id': video_id,
@@ -63,4 +96,5 @@ class RTPIE(InfoExtractor):
'formats': formats,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
'subtitles': subtitles,
}


@@ -1,39 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import hashlib
import hmac
import itertools
import json
import re
import time
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
HEADRequest,
parse_age_limit,
parse_iso8601,
sanitized_Request,
std_headers,
try_get,
)
class VikiBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?viki\.(?:com|net|mx|jp|fr)/'
_API_QUERY_TEMPLATE = '/v4/%sapp=%s&t=%s&site=www.viki.com'
_API_URL_TEMPLATE = 'https://api.viki.io%s&sig=%s'
_API_URL_TEMPLATE = 'https://api.viki.io%s'
_DEVICE_ID = '86085977d' # used for android api
_APP = '100005a'
_APP_VERSION = '6.0.0'
_APP_SECRET = 'MM_d*yP@`&1@]@!AVrXf_o-HVEnoTnm$O-ti4[G~$JDI/Dc-&piU&z&5.;:}95=Iad'
_APP_VERSION = '6.11.3'
_APP_SECRET = 'd96704b180208dbb2efa30fe44c48bd8690441af9f567ba8fd710a72badc85198f7472'
_GEO_BYPASS = False
_NETRC_MACHINE = 'viki'
@@ -46,53 +35,57 @@ class VikiBaseIE(InfoExtractor):
'paywall': 'Sorry, this content is only available to Viki Pass Plus subscribers',
}
def _prepare_call(self, path, timestamp=None, post_data=None):
def _stream_headers(self, timestamp, sig):
return {
'X-Viki-manufacturer': 'vivo',
'X-Viki-device-model': 'vivo 1606',
'X-Viki-device-os-ver': '6.0.1',
'X-Viki-connection-type': 'WIFI',
'X-Viki-carrier': '',
'X-Viki-as-id': '100005a-1625321982-3932',
'timestamp': str(timestamp),
'signature': str(sig),
'x-viki-app-ver': self._APP_VERSION
}
def _api_query(self, path, version=4, **kwargs):
path += '?' if '?' not in path else '&'
if not timestamp:
timestamp = int(time.time())
query = self._API_QUERY_TEMPLATE % (path, self._APP, timestamp)
query = f'/v{version}/{path}app={self._APP}'
if self._token:
query += '&token=%s' % self._token
return query + ''.join(f'&{name}={val}' for name, val in kwargs.items())
def _sign_query(self, path):
timestamp = int(time.time())
query = self._api_query(path, version=5)
sig = hmac.new(
self._APP_SECRET.encode('ascii'),
query.encode('ascii'),
hashlib.sha1
).hexdigest()
url = self._API_URL_TEMPLATE % (query, sig)
return sanitized_Request(
url, json.dumps(post_data).encode('utf-8')) if post_data else url
self._APP_SECRET.encode('ascii'), f'{query}&t={timestamp}'.encode('ascii'), hashlib.sha1).hexdigest()
return timestamp, sig, self._API_URL_TEMPLATE % query
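A hedged sketch of the signing step above; the video id and secret are placeholders (the real app values appear earlier in this file):
    import hashlib, hmac, time
    query = '/v5/videos/1175236v.json?app=100005a'  # shape produced by _api_query(..., version=5)
    timestamp = int(time.time())
    sig = hmac.new(b'<APP_SECRET>', f'{query}&t={timestamp}'.encode('ascii'), hashlib.sha1).hexdigest()
    request_url = 'https://api.viki.io' + query  # sent with the timestamp/signature headers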
def _call_api(self, path, video_id, note, timestamp=None, post_data=None):
def _call_api(
self, path, video_id, note='Downloading JSON metadata', data=None, query=None, fatal=True):
if query is None:
timestamp, sig, url = self._sign_query(path)
else:
url = self._API_URL_TEMPLATE % self._api_query(path, version=4)
resp = self._download_json(
self._prepare_call(path, timestamp, post_data),
video_id, note,
headers={
'x-client-user-agent': std_headers['User-Agent'],
'x-viki-as-id': self._APP,
'x-viki-app-ver': self._APP_VERSION,
})
error = resp.get('error')
if error:
if error == 'invalid timestamp':
resp = self._download_json(
self._prepare_call(path, int(resp['current_timestamp']), post_data),
video_id, '%s (retry)' % note,
headers={
'x-client-user-agent': std_headers['User-Agent'],
'x-viki-as-id': self._APP,
'x-viki-app-ver': self._APP_VERSION,
})
error = resp.get('error')
if error:
self._raise_error(resp['error'])
url, video_id, note, fatal=fatal, query=query,
data=json.dumps(data).encode('utf-8') if data else None,
headers=({'x-viki-app-ver': self._APP_VERSION} if data
else self._stream_headers(timestamp, sig) if query is None
else None)) or {}
self._raise_error(resp.get('error'), fatal)
return resp
def _raise_error(self, error):
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error),
expected=True)
def _raise_error(self, error, fatal=True):
if error is None:
return
msg = '%s said: %s' % (self.IE_NAME, error)
if fatal:
raise ExtractorError(msg, expected=True)
else:
self.report_warning(msg)
def _check_errors(self, data):
for reason, status in (data.get('blocking') or {}).items():
@@ -101,9 +94,10 @@ class VikiBaseIE(InfoExtractor):
if reason == 'geo':
self.raise_geo_restricted(msg=message)
elif reason == 'paywall':
if try_get(data, lambda x: x['paywallable']['tvod']):
self._raise_error('This video is for rent only or TVOD (Transactional Video On demand)')
self.raise_login_required(message)
raise ExtractorError('%s said: %s' % (
self.IE_NAME, message), expected=True)
self._raise_error(message)
def _real_initialize(self):
self._login()
@@ -113,29 +107,17 @@ class VikiBaseIE(InfoExtractor):
if username is None:
return
login_form = {
'login_id': username,
'password': password,
}
login = self._call_api(
'sessions.json', None,
'Logging in', post_data=login_form)
self._token = login.get('token')
self._token = self._call_api(
'sessions.json', None, 'Logging in', fatal=False,
data={'username': username, 'password': password}).get('token')
if not self._token:
self.report_warning('Unable to get session token, login has probably failed')
self.report_warning('Login Failed: Unable to get session token')
@staticmethod
def dict_selection(dict_obj, preferred_key, allow_fallback=True):
def dict_selection(dict_obj, preferred_key):
if preferred_key in dict_obj:
return dict_obj.get(preferred_key)
if not allow_fallback:
return
filtered_dict = list(filter(None, [dict_obj.get(k) for k in dict_obj.keys()]))
return filtered_dict[0] if filtered_dict else None
return dict_obj[preferred_key]
return (list(filter(None, dict_obj.values())) or [None])[0]
class VikiIE(VikiBaseIE):
@@ -266,18 +248,10 @@ class VikiIE(VikiBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
resp = self._download_json(
'https://www.viki.com/api/videos/' + video_id,
video_id, 'Downloading video JSON', headers={
'x-client-user-agent': std_headers['User-Agent'],
'x-viki-app-ver': '3.0.0',
})
video = resp['video']
video = self._call_api(f'videos/{video_id}.json', video_id, 'Downloading video JSON', query={})
self._check_errors(video)
title = self.dict_selection(video.get('titles', {}), 'en', allow_fallback=False)
title = try_get(video, lambda x: x['titles']['en'], str)
episode_number = int_or_none(video.get('number'))
if not title:
title = 'Episode %d' % episode_number if video.get('type') == 'episode' else video.get('id') or video_id
@@ -285,116 +259,46 @@ class VikiIE(VikiBaseIE):
container_title = self.dict_selection(container_titles, 'en')
title = '%s - %s' % (container_title, title)
        thumbnails = [{
            'id': thumbnail_id,
            'url': thumbnail['url'],
        } for thumbnail_id, thumbnail in (video.get('images') or {}).items() if thumbnail.get('url')]
        resp = self._call_api(
            'playback_streams/%s.json?drms=dt1,dt2&device_id=%s' % (video_id, self._DEVICE_ID),
            video_id, 'Downloading video streams JSON')['main'][0]
        stream_id = try_get(resp, lambda x: x['properties']['track']['stream_id'])
        subtitles = dict((lang, [{
            'ext': ext,
            'url': self._API_URL_TEMPLATE % self._api_query(
                f'videos/{video_id}/auth_subtitles/{lang}.{ext}', stream_id=stream_id)
        } for ext in ('srt', 'vtt')]) for lang in (video.get('subtitle_completions') or {}).keys())
mpd_url = resp['url']
# 1080p is hidden in another mpd which can be found in the current manifest content
mpd_content = self._download_webpage(mpd_url, video_id, note='Downloading initial MPD manifest')
mpd_url = self._search_regex(
r'(?mi)<BaseURL>(http.+.mpd)', mpd_content, 'new manifest', default=mpd_url)
formats = self._extract_mpd_formats(mpd_url, video_id)
self._sort_formats(formats)
        return {
'id': video_id,
'formats': formats,
'title': title,
            'description': self.dict_selection(video.get('descriptions', {}), 'en'),
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('created_at')),
'uploader': video.get('author'),
'uploader_url': video.get('author_url'),
            'like_count': int_or_none(try_get(video, lambda x: x['likes']['count'])),
'age_limit': parse_age_limit(video.get('rating')),
'thumbnails': thumbnails,
'subtitles': subtitles,
'episode_number': episode_number,
}
class VikiChannelIE(VikiBaseIE):
IE_NAME = 'viki:channel'
@@ -406,7 +310,7 @@ class VikiChannelIE(VikiBaseIE):
'title': 'Boys Over Flowers',
'description': 'md5:804ce6e7837e1fd527ad2f25420f4d59',
},
        'playlist_mincount': 51,
}, {
'url': 'http://www.viki.com/tv/1354c-poor-nastya-complete',
'info_dict': {
@@ -427,33 +331,35 @@ class VikiChannelIE(VikiBaseIE):
'only_matching': True,
}]
    _video_types = ('episodes', 'movies', 'clips', 'trailers')
def _entries(self, channel_id):
params = {
'app': self._APP, 'token': self._token, 'only_ids': 'true',
'direction': 'asc', 'sort': 'number', 'per_page': 30
}
video_types = self._configuration_arg('video_types') or self._video_types
for video_type in video_types:
if video_type not in self._video_types:
self.report_warning(f'Unknown video_type: {video_type}')
page_num = 0
while True:
page_num += 1
params['page'] = page_num
res = self._call_api(
f'containers/{channel_id}/{video_type}.json', channel_id, query=params, fatal=False,
note='Downloading %s JSON page %d' % (video_type.title(), page_num))
for video_id in res.get('response') or []:
yield self.url_result(f'https://www.viki.com/videos/{video_id}', VikiIE.ie_key(), video_id)
if not res.get('more'):
break
def _real_extract(self, url):
channel_id = self._match_id(url)
        channel = self._call_api('containers/%s.json' % channel_id, channel_id, 'Downloading channel JSON')
self._check_errors(channel)
        return self.playlist_result(
            self._entries(channel_id), channel_id,
            self.dict_selection(channel['titles'], 'en'),
            self.dict_selection(channel['descriptions'], 'en'))
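As a usage note, the video types fetched by _entries can be narrowed through --extractor-args; a hedged sketch (the vikichannel key and values follow the _configuration_arg call above):

    yt-dlp --extractor-args "vikichannel:video_types=movies,episodes" \
        http://www.viki.com/tv/1354c-poor-nastya-complete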

View File

@@ -178,9 +178,15 @@ class VLiveIE(VLiveBaseIE):
if video_type == 'VOD':
inkey = self._call_api('video/v1.0/vod/%s/inkey', video_id)['inkey']
vod_id = video['vodId']
            info_dict = merge_dicts(
get_common_fields(),
self._extract_video_info(video_id, vod_id, inkey))
thumbnail = video.get('thumb')
if thumbnail:
if not info_dict.get('thumbnails') and info_dict.get('thumbnail'):
info_dict['thumbnails'] = [{'url': info_dict.pop('thumbnail')}]
info_dict.setdefault('thumbnails', []).append({'url': thumbnail, 'preference': 1})
return info_dict
elif video_type == 'LIVE':
status = video.get('status')
if status == 'ON_AIR':

View File

@@ -362,7 +362,7 @@ class YahooSearchIE(SearchInfoExtractor):
class YahooGyaOPlayerIE(InfoExtractor):
IE_NAME = 'yahoo:gyao:player'
    _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:player|episode(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/c/y)/(?P<id>\d+/v\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://gyao.yahoo.co.jp/player/00998/v00818/v0000000000000008564/',
'info_dict': {
@@ -384,6 +384,9 @@ class YahooGyaOPlayerIE(InfoExtractor):
}, {
'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
'only_matching': True,
}, {
'url': 'https://gyao.yahoo.co.jp/episode/5fa1226c-ef8d-4e93-af7a-fd92f4e30597',
'only_matching': True,
}]
_GEO_BYPASS = False

File diff suppressed because it is too large

View File

@@ -19,6 +19,7 @@ from .utils import (
preferredencoding,
write_string,
)
from .cookies import SUPPORTED_BROWSERS
from .version import __version__
from .downloader.external import list_external_downloaders
@@ -137,14 +138,21 @@ def parseOpts(overrideArguments=None):
else:
raise optparse.OptionValueError(
'wrong %s formatting; it should be %s, not "%s"' % (opt_str, option.metavar, value))
        try:
            val = process(val) if process else val
except Exception as err:
raise optparse.OptionValueError(
'wrong %s formatting; %s' % (opt_str, err))
for key in keys:
out_dict[key] = val
# No need to wrap help messages if we're on a wide console
columns = compat_get_terminal_size().columns
max_width = columns if columns else 80
    # 47% is chosen because that is how README.md is currently formatted
    # and moving help text even further to the right is undesirable.
    # This can be reduced in the future to get a prettier output
    max_help_position = int(0.47 * max_width)
fmt = optparse.IndentedHelpFormatter(width=max_width, max_help_position=max_help_position)
fmt.format_option_strings = _format_option_string
@@ -520,8 +528,12 @@ def parseOpts(overrideArguments=None):
help="Don't give any special preference to free containers (default)")
video_format.add_option(
'--check-formats',
        action='store_true', dest='check_formats', default=None,
        help='Check that the formats selected are actually downloadable')
video_format.add_option(
'--no-check-formats',
action='store_false', dest='check_formats',
help='Do not check that the formats selected are actually downloadable')
video_format.add_option(
'-F', '--list-formats',
action='store_true', dest='listformats',
@@ -1079,7 +1091,21 @@ def parseOpts(overrideArguments=None):
filesystem.add_option(
'--no-cookies',
action='store_const', const=None, dest='cookiefile', metavar='FILE',
        help='Do not read/dump cookies from/to file (default)')
filesystem.add_option(
'--cookies-from-browser',
dest='cookiesfrombrowser', metavar='BROWSER[:PROFILE]',
help=(
'Load cookies from a user profile of the given web browser. '
'Currently supported browsers are: {}. '
'You can specify the user profile name or directory using '
'"BROWSER:PROFILE_NAME" or "BROWSER:PROFILE_PATH". '
'If no profile is given, the most recently accessed one is used'.format(
'|'.join(sorted(SUPPORTED_BROWSERS)))))
filesystem.add_option(
'--no-cookies-from-browser',
action='store_const', const=None, dest='cookiesfrombrowser',
help='Do not load cookies from browser (default)')
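A usage sketch of the new switch (any browser from SUPPORTED_BROWSERS works; the profile name here is hypothetical):

    yt-dlp --cookies-from-browser firefox URL
    yt-dlp --cookies-from-browser "chrome:Profile 1" URL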
filesystem.add_option(
'--cache-dir', dest='cachedir', default=None, metavar='DIR',
help='Location in the filesystem where youtube-dl can store some downloaded information (such as client ids and signatures) permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl')
@@ -1261,6 +1287,10 @@ def parseOpts(overrideArguments=None):
'Similar syntax to the output template can be used to pass any field as arguments to the command. '
'An additional field "filepath" that contains the final path of the downloaded file is also available. '
'If no fields are passed, "%(filepath)s" is appended to the end of the command'))
postproc.add_option(
'--exec-before-download',
metavar='CMD', dest='exec_before_dl_cmd',
help='Execute a command before the actual download. The syntax is the same as --exec')
postproc.add_option(
'--convert-subs', '--convert-sub', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,
@@ -1344,6 +1374,7 @@ def parseOpts(overrideArguments=None):
'--no-hls-split-discontinuity',
dest='hls_split_discontinuity', action='store_false',
help='Do not split HLS playlists to different formats at discontinuities such as ad breaks (default)')
_extractor_arg_parser = lambda key, vals='': (key.strip().lower(), [val.strip() for val in vals.split(',')])
extractor.add_option(
'--extractor-args',
metavar='KEY:ARGS', dest='extractor_args', default={}, type='str',
@@ -1351,11 +1382,11 @@ def parseOpts(overrideArguments=None):
callback_kwargs={
'multiple_keys': False,
'process': lambda val: dict(
                _extractor_arg_parser(*arg.split('=', 1)) for arg in val.split(';'))
},
help=(
'Pass these arguments to the extractor. See "EXTRACTOR ARGUMENTS" for details. '
            'You can use this option multiple times to give arguments for different extractors'))
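For reference, _extractor_arg_parser lower-cases the key and splits the value list on commas, so a single --extractor-args string decomposes as below (values are illustrative):

    _extractor_arg_parser('player_client', 'android,web')
    # -> ('player_client', ['android', 'web'])
    # i.e. --extractor-args "youtube:player_client=android,web" passes
    # {'player_client': ['android', 'web']} to the youtube extractor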
extractor.add_option(
'--youtube-include-dash-manifest', '--no-youtube-skip-dash-manifest',
action='store_true', dest='youtube_include_dash_manifest', default=True,

View File

@@ -51,7 +51,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
try:
size_regex = r',\s*(?P<w>\d+)x(?P<h>\d+)\s*[,\[]'
            size_result = self.run_ffmpeg(filename, None, ['-hide_banner'], expected_retcodes=(1,))
mobj = re.search(size_regex, size_result)
if mobj is None:
return guess()

View File

@@ -28,7 +28,8 @@ class ExecAfterDownloadPP(PostProcessor):
        # If no replacements are found, replace {} for backward compatibility
if '{}' not in cmd:
cmd += ' {}'
        return cmd.replace('{}', compat_shlex_quote(
            info.get('filepath') or info['_filename']))
def run(self, info):
cmd = self.parse_cmd(self.exec_cmd, info)

View File

@@ -1,6 +1,7 @@
from __future__ import unicode_literals
import io
import itertools
import os
import subprocess
import time
@@ -24,6 +25,7 @@ from ..utils import (
process_communicate_or_kill,
replace_extension,
traverse_obj,
variadic,
)
@@ -233,16 +235,16 @@ class FFmpegPostProcessor(PostProcessor):
None)
return num, len(streams)
    def run_ffmpeg_multiple_files(self, input_paths, out_path, opts, **kwargs):
return self.real_run_ffmpeg(
[(path, []) for path in input_paths],
            [(out_path, opts)], **kwargs)
    def real_run_ffmpeg(self, input_path_opts, output_path_opts, *, expected_retcodes=(0,)):
self.check_version()
oldest_mtime = min(
            os.stat(encodeFilename(path)).st_mtime for path, _ in input_path_opts if path)
cmd = [encodeFilename(self.executable, True), encodeArgument('-y')]
# avconv does not have repeat option
@@ -261,23 +263,25 @@ class FFmpegPostProcessor(PostProcessor):
+ [encodeFilename(self._ffmpeg_filename_argument(file), True)])
for arg_type, path_opts in (('i', input_path_opts), ('o', output_path_opts)):
            cmd += itertools.chain.from_iterable(
                make_args(path, list(opts), arg_type, i + 1)
                for i, (path, opts) in enumerate(path_opts) if path)
self.write_debug('ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
stdout, stderr = process_communicate_or_kill(p)
        if p.returncode not in variadic(expected_retcodes):
stderr = stderr.decode('utf-8', 'replace').strip()
if self.get_param('verbose', False):
self.report_error(stderr)
raise FFmpegPostProcessorError(stderr.split('\n')[-1])
for out_path, _ in output_path_opts:
            if out_path:
                self.try_utime(out_path, oldest_mtime, oldest_mtime)
return stderr.decode('utf-8', 'replace')
    def run_ffmpeg(self, path, out_path, opts, **kwargs):
        return self.run_ffmpeg_multiple_files([path], out_path, opts, **kwargs)
def _ffmpeg_filename_argument(self, fn):
# Always use 'file:' because the filename may contain ':' (ffmpeg
@@ -526,6 +530,15 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
class FFmpegMetadataPP(FFmpegPostProcessor):
@staticmethod
def _options(target_ext):
yield from ('-map', '0', '-dn')
if target_ext == 'm4a':
yield from ('-vn', '-acodec', 'copy')
else:
yield from ('-c', 'copy')
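The extracted _options helper always maps all streams and drops data streams, copying codecs; a quick sketch of its output:

    list(FFmpegMetadataPP._options('m4a'))  # -> ['-map', '0', '-dn', '-vn', '-acodec', 'copy']
    list(FFmpegMetadataPP._options('mkv'))  # -> ['-map', '0', '-dn', '-c', 'copy']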
@PostProcessor._restrict_to(images=False)
def run(self, info):
metadata = {}
@@ -533,15 +546,9 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
def add(meta_list, info_list=None):
if not meta_list:
return
            for info_f in variadic(info_list or meta_list):
if isinstance(info.get(info_f), (compat_str, compat_numeric_types)):
                    for meta_f in variadic(meta_list):
metadata[meta_f] = info[info_f]
break
@@ -570,22 +577,17 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
for key in filter(lambda k: k.startswith(prefix), info.keys()):
add(key[len(prefix):], key)
        filename, metadata_filename = info['filepath'], None
        options = [('-metadata', f'{name}={value}') for name, value in metadata.items()]
stream_idx = 0
for fmt in info.get('requested_formats') or []:
stream_count = 2 if 'none' not in (fmt.get('vcodec'), fmt.get('acodec')) else 1
if fmt.get('language'):
lang = ISO639Utils.short2long(fmt['language']) or fmt['language']
options.extend(('-metadata:s:%d' % (stream_idx + i), 'language=%s' % lang)
for i in range(stream_count))
stream_idx += stream_count
chapters = info.get('chapters', [])
if chapters:
@@ -603,24 +605,29 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
if chapter_title:
metadata_file_content += 'title=%s\n' % ffmpeg_escape(chapter_title)
f.write(metadata_file_content)
            options.append(('-map_metadata', '1'))
if ('no-attach-info-json' not in self.get_param('compat_opts', [])
and '__infojson_filename' in info and info['ext'] in ('mkv', 'mka')):
            old_stream, new_stream = self.get_stream_number(filename, ('tags', 'mimetype'), 'application/json')
if old_stream is not None:
                options.append(('-map', '-0:%d' % old_stream))
new_stream -= 1
            options.append((
'-attach', info['__infojson_filename'],
'-metadata:s:%d' % new_stream, 'mimetype=application/json'
            ))
if not options:
self.to_screen('There isn\'t any metadata to add')
return [], info
temp_filename = prepend_extension(filename, 'temp')
self.to_screen('Adding metadata to "%s"' % filename)
self.run_ffmpeg_multiple_files(
(filename, metadata_filename), temp_filename,
itertools.chain(self._options(info['ext']), *options))
if chapters:
os.remove(metadata_filename)
os.remove(encodeFilename(filename))

View File

@@ -27,7 +27,7 @@ class MetadataFromFieldPP(PostProcessor):
@staticmethod
def field_to_template(tmpl):
        if re.match(r'[a-zA-Z_]+$', tmpl):
return '%%(%s)s' % tmpl
return tmpl
@@ -63,7 +63,7 @@ class MetadataFromFieldPP(PostProcessor):
continue
for attribute, value in match.groupdict().items():
info[attribute] = value
self.to_screen('parsed %s from "%s": %s' % (attribute, dictn['in'], value if value is not None else 'NA'))
self.to_screen('parsed %s from "%s": %s' % (attribute, dictn['tmpl'], value if value is not None else 'NA'))
return [], info
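With the tightened regex, bare field names are still expanded into output templates while anything containing digits or template syntax passes through unchanged; illustrative calls:

    MetadataFromFieldPP.field_to_template('title')                   # -> '%(title)s'
    MetadataFromFieldPP.field_to_template('%(series)s S%(season)s')  # returned as-is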

View File

@@ -4289,9 +4289,7 @@ def dict_get(d, key_or_keys, default=None, skip_false_values=True):
def try_get(src, getter, expected_type=None):
    for get in variadic(getter):
try:
v = get(src)
except (AttributeError, KeyError, TypeError, IndexError):
@@ -4367,7 +4365,7 @@ def strip_jsonp(code):
def js_to_json(code, vars={}):
# vars is a dict of var, val pairs to substitute
    COMMENT_RE = r'/\*(?:(?!\*/).)*?\*/|//[^\n]*\n'
SKIP_RE = r'\s*(?:{comment})?\s*'.format(comment=COMMENT_RE)
INTEGER_TABLE = (
(r'(?s)^(0[xX][0-9a-fA-F]+){skip}:?$'.format(skip=SKIP_RE), 16),
@@ -4964,11 +4962,9 @@ def cli_configuration_args(argdict, keys, default=[], use_compat=True):
assert isinstance(keys, (list, tuple))
for key_list in keys:
        arg_list = list(filter(
            lambda x: x is not None,
            [argdict.get(key.lower()) for key in variadic(key_list)]))
if arg_list:
return [arg for args in arg_list for arg in args]
return default
@@ -6228,40 +6224,93 @@ def load_plugins(name, suffix, namespace):
return classes
def traverse_obj(
        obj, *path_list, default=None, expected_type=None, get_all=True,
        casesense=True, is_user_input=False, traverse_string=False):
''' Traverse nested list/dict/tuple
@param path_list A list of paths which are checked one by one.
Each path is a list of keys where each key is a string,
a tuple of strings or "...". When a tuple is given,
all the keys given in the tuple are traversed, and
"..." traverses all the keys in the object
@param default Default value to return
@param expected_type Only accept final value of this type (Can also be any callable)
@param get_all Return all the values obtained from a path or only the first one
@param casesense Whether to consider dictionary keys as case sensitive
@param is_user_input Whether the keys are generated from user input. If True,
strings are converted to int/slice if necessary
@param traverse_string Whether to traverse inside strings. If True, any
non-compatible object will also be converted into a string
# TODO: Write tests
'''
if not casesense:
_lower = lambda k: k.lower() if isinstance(k, str) else k
path_list = (map(_lower, variadic(path)) for path in path_list)
def _traverse_obj(obj, path, _current_depth=0):
nonlocal depth
path = tuple(variadic(path))
for i, key in enumerate(path):
if isinstance(key, (list, tuple)):
obj = [_traverse_obj(obj, sub_key, _current_depth) for sub_key in key]
key = ...
if key is ...:
obj = (obj.values() if isinstance(obj, dict)
else obj if isinstance(obj, (list, tuple, LazyList))
else str(obj) if traverse_string else [])
_current_depth += 1
depth = max(depth, _current_depth)
return [_traverse_obj(inner_obj, path[i + 1:], _current_depth) for inner_obj in obj]
elif isinstance(obj, dict):
obj = (obj.get(key) if casesense or (key in obj)
else next((v for k, v in obj.items() if _lower(k) == key), None))
else:
if is_user_input:
key = (int_or_none(key) if ':' not in key
else slice(*map(int_or_none, key.split(':'))))
if key == slice(None):
return _traverse_obj(obj, (..., *path[i + 1:]))
if not isinstance(key, (int, slice)):
return None
                if not isinstance(obj, (list, tuple, LazyList)):
                    if not traverse_string:
                        return None
                    obj = str(obj)
                assert isinstance(key, (int, slice))
                obj = try_get(obj, lambda x: x[key])
        return obj
if isinstance(expected_type, type):
type_test = lambda val: val if isinstance(val, expected_type) else None
elif expected_type is not None:
type_test = expected_type
else:
type_test = lambda val: val
for path in path_list:
depth = 0
val = _traverse_obj(obj, path)
if val is not None:
if depth:
for _ in range(depth - 1):
val = itertools.chain.from_iterable(v for v in val if v is not None)
val = [v for v in map(type_test, val) if v is not None]
if val:
return val if get_all else val[0]
else:
val = type_test(val)
if val is not None:
return val
return default
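A few hypothetical calls showing the reworked semantics (paths are tried in order, "..." fans out over all keys, and get_all picks between all matches and the first one):

    d = {'a': {'b': [{'c': 1}, {'c': 2}]}}
    traverse_obj(d, ('a', 'b', 0, 'c'))                     # -> 1
    traverse_obj(d, ('a', 'b', ..., 'c'))                   # -> [1, 2]
    traverse_obj(d, ('a', 'b', ..., 'c'), get_all=False)    # -> 1
    traverse_obj(d, ('a', 'x'), ('a', 'b', 0, 'c'))         # -> 1 (second path)
    traverse_obj(d, ('a', 'b', 0, 'c'), expected_type=str)  # -> None (fails type test)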
def traverse_dict(dictn, keys, casesense=True):
''' For backward compatibility. Do not use '''
return traverse_obj(dictn, keys, casesense=casesense,
is_user_input=True, traverse_string=True)
def variadic(x, allowed_types=(str, bytes)):
    return x if isinstance(x, collections.abc.Iterable) and not isinstance(x, allowed_types) else (x,)
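variadic wraps scalars (and strings/bytes) into a tuple while passing other iterables through, which is what lets the call sites above drop their manual isinstance checks:

    variadic('spam')  # -> ('spam',)
    variadic([1, 2])  # -> [1, 2]
    variadic(None)    # -> (None,)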

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.07.07'

View File

@@ -120,12 +120,11 @@ def _format_ts(ts):
Convert an MPEG PES timestamp into a WebVTT timestamp.
This will lose sub-millisecond precision.
"""
    msec = int((ts + 45) // 90)
    secs, msec = divmod(msec, 1000)
    mins, secs = divmod(secs, 60)
    hrs, mins = divmod(mins, 60)
    return '%02u:%02u:%02u.%03u' % (hrs, mins, secs, msec)
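A worked value for the renamed variables (90 kHz PES clock):

    _format_ts(90000)      # (90000 + 45) // 90 = 1000 ms -> '00:00:01.000'
    _format_ts(3 * 90000)  # -> '00:00:03.000'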
class Block(object):