mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-12 09:51:15 +00:00

Compare commits


68 Commits

Author SHA1 Message Date
pukkandan
a8bf9b4dc1 Release 2021.07.07 2021-07-07 05:35:20 +05:30
pukkandan
51f8a31d65 Update to ytdl-commit-a803582
[peertube] only call description endpoint if necessary
a803582717
2021-07-07 05:17:11 +05:30
Tom-Oliver Heidel
be05d5cff1 [soundcloud] Allow login using oauth token (#469)
Authored by: blackjack4494
2021-07-07 04:21:13 +05:30
zenerdi0de
30d569d2ac [fancode] Fix extraction, support live and allow login with refresh token (#471)
Authored-by: zenerdi0de
2021-07-07 04:02:56 +05:30
OhMyBahGosh
08625e4125 [AdobePass] Add Spectrum MSO (#470)
From: https://github.com/ytdl-org/youtube-dl/pull/26792

Co-authored by: kevinoconnor7, ohmybahgosh
2021-07-07 03:26:51 +05:30
pukkandan
3acf6d3856 [Funimation] Rewrite extractor (See desc) (#444)
* Support direct `/player/` URL
* Treat the different versions of an episode as different formats of a single video. So `experience_id` can no longer be used as the video `id` and the `episode_id` is used instead. This means that all existing archives will break
* Extractor options `language` and `version` to pre-select them
* Compat option `seperate-video-versions` to fall back to old behavior (including using the old video IDs)

Closes #428
2021-07-07 02:51:29 +05:30
pukkandan
46890374f7 [extractor] Minor improvements (See desc)
1. Allow removal of login hint - extractors can set their own login hint as part of `msg`
2. Cleanup `_merge_subtitles` signature
2021-07-07 02:27:53 +05:30
pukkandan
60755938b3 [extractor] Prevent unnecessary download of hls manifests
and refactor `hls_split_discontinuity` code
2021-07-07 02:24:58 +05:30
pukkandan
723d44b92b [fragment] Handle errors in threads correctly 2021-07-07 01:55:54 +05:30
pukkandan
bc97cdae67 [cleanup] Fix linter and some typos
Related: https://github.com/ytdl-org/youtube-dl/pull/29398
2021-07-04 03:04:25 +05:30
nyuszika7h
e010672ab5 [videa] Fix extraction (#463)
Authored by: nyuszika7h
2021-07-03 21:38:08 +05:30
pukkandan
169dbde946 Fixes for --list options (See desc)
1. Fix `--list-formats-old`
2. Allow listing with `--quiet`
3. Allow various listings to work together
4. Allow `--print` to work with listing
2021-07-03 01:16:19 +05:30
MinePlayersPE
17f0eb66b8 [RCTIPlus] Add extractor (#443)
Authored by: MinePlayersPE
2021-07-02 19:54:41 +05:30
pukkandan
981052c9c6 Some minor fixes and refactoring (see desc)
* [utils] Fix issues with reversal
* check_formats should catch `DownloadError`, not `ExtractorError`
* Simplify format selectors with `LazyList` and `yield from`
2021-07-02 08:17:37 +05:30
pukkandan
b1e60d1806 [facebook] Extract description and fix title
Partially fixes: #453
2021-07-02 08:17:37 +05:30
pukkandan
6b6c16ca6c [downloader/ffmpeg] Fix --ppa when using simultaneous download 2021-07-02 08:17:30 +05:30
krichbanana
f6745c4980 [Youtube] Choose correct Live chat API for upcoming streams (#460)
Authored by: krichbanana
2021-07-02 05:59:29 +05:30
coletdjnz
109dd3b237 [youtube] Use new API for additional video extraction requests (#328)
Co-authored-by: colethedj, pukkandan
Closes https://github.com/yt-dlp/yt-dlp/issues/427
Workarounds for https://github.com/ytdl-org/youtube-dl/issues/29326, https://github.com/yt-dlp/yt-dlp/issues/319, https://github.com/ytdl-org/youtube-dl/issues/29086
2021-06-29 22:07:49 +00:00
siikamiika
c2603313b1 [youtube_live_chat] use clickTrackingParams (#449)
Authored by: siikamiika
2021-06-27 04:52:32 +05:30
LE
1e79316e20 [TBS] Support livestreams (#448)
Authored by: llacb47
2021-06-26 17:14:43 +05:30
coletdjnz
45261e063b [youtube:comments] Fix error handling and add itct to params (#446)
Should close #439 (untested)

Authored by: colethedj
2021-06-25 23:31:10 +05:30
pukkandan
49c258e18d [youtube] Fix subtitle names for age-gated videos
Related: https://github.com/iv-org/invidious/pull/2205#issuecomment-868680486
2021-06-25 23:10:31 +05:30
pukkandan
d3f62c1967 Fix --throttled-rate when using --load-info-json 2021-06-25 22:57:17 +05:30
pukkandan
5d3a0e794b Add --extractor-args to pass extractor-specific arguments 2021-06-25 20:10:28 +05:30
Mevious
125728b038 [funimation] Add FunimationShowIE (#442)
Closes #436

Authored by: Mevious
2021-06-25 05:45:23 +05:30
pukkandan
15a4fd53d3 [thumbnailsconvertor] Treat jpeg as jpg 2021-06-25 05:36:35 +05:30
Adrik
4513a41a72 Process videos when using --ignore-no-formats-error (#441)
Authored by: krichbanana
2021-06-24 22:23:34 +05:30
pukkandan
6033d9808d Fix --flat-playlist when entry has no ie_key 2021-06-24 22:23:34 +05:30
pukkandan
bd4d1ea398 [cleanup] Minor refactoring of fragment 2021-06-24 22:23:33 +05:30
pukkandan
8e897ed283 [fragment] Return status of download correctly 2021-06-24 22:04:23 +05:30
LE
412cce82b0 [yahoo] Fix extraction (#435)
Fixes: https://github.com/ytdl-org/youtube-dl/issues/28290

Co-authored-by: llacb47, pukkandan
2021-06-24 21:27:48 +05:30
siikamiika
d534c4520b [youtube_live_chat] Fix download with cookies (#437)
Closes #417 

Authored by: siikamiika
2021-06-24 21:26:32 +05:30
pukkandan
2b18a8c590 [plutotv] Improve _VALID_URL
Closes #431
2021-06-23 07:49:09 +05:30
pukkandan
dac8b87b0c [version] update :ci skip all 2021-06-23 07:37:07 +05:30
pukkandan
6aecd87106 Release 2021.06.23 2021-06-23 07:34:55 +05:30
pukkandan
ed807c1837 Update to ytdl-commit-379f52a
[liveleak] Remove extractor
379f52a495
2021-06-23 07:34:55 +05:30
Mevious
29f63c9672 [funimation] Extract subtitles (#434)
Closes #420, https://github.com/ytdl-org/youtube-dl/issues/25645
Related: https://github.com/ytdl-org/youtube-dl/pull/24906

Authored by: Mevious
2021-06-23 07:27:53 +05:30
pukkandan
9fc0de5796 [hotstar] Use server time for authentication instead of local time
Closes #396
2021-06-23 06:04:42 +05:30
siikamiika
c60ee3a218 [youtube_live_chat] Support ongoing live chat (#422)
Authored by: siikamiika
2021-06-23 05:42:39 +05:30
pukkandan
8a77e5e6bc [cleanup] Revert unnecessary changes in 51d9739f80 2021-06-23 05:34:40 +05:30
pukkandan
51d9739f80 Add option --throttled-rate below which video data is re-extracted
Currently only for HTTP downloads

Closes #430, workaround for https://github.com/ytdl-org/youtube-dl/issues/29326
2021-06-23 05:29:58 +05:30
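
The sketch below shows one way to exercise this from the Python API; it is illustrative only. The CLI form is `yt-dlp --throttled-rate 100K URL`, and the parameter name `throttledratelimit` is taken from the `YoutubeDL.py` docstring diff further down; treating its value as bytes per second (like `ratelimit`) is an assumption.

```python
# Illustrative sketch: enable throttle detection via the Python API.
# 'throttledratelimit' appears in the downloader-params list in the
# YoutubeDL.py diff below; assumed to be bytes/sec, like 'ratelimit'.
import yt_dlp

ydl_opts = {'throttledratelimit': 100 * 1024}  # re-extract below ~100 KiB/s
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```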
pukkandan
4c7853de14 [fragment] Merge during download for -N, and refactor hls/dash (#364) 2021-06-22 00:29:50 +05:30
pukkandan
e6779b9400 [twitcasting] Websocket support (#399)
Closes #392
Authored by: nao20010128nao
2021-06-21 22:56:45 +05:30
pukkandan
e36d50c5dd [websockets] Add WebSocketFragmentFD (#399)
Necessary for #392

Co-authored by: nao20010128nao, pukkandan
2021-06-21 22:56:36 +05:30
pukkandan
ff0f78e1fe [aria2c] Lower --min-split-size for HTTP downloads
This makes downloading smaller files much faster
2021-06-20 19:28:54 +05:30
pukkandan
7e067091e8 [options] Rename --add-metadata to --embed-metadata
and clarify that it embeds chapter markers
2021-06-20 04:59:35 +05:30
pukkandan
f89b3e2d7a Skip fixup of existing files and add --fixup force to force it 2021-06-20 04:59:34 +05:30
pukkandan
fd7cfb6444 [cleanup] Refactor fixup 2021-06-20 04:26:11 +05:30
pukkandan
4e6767b5f2 [youtube] Temporary fix for age-gate
Related:
https://stackoverflow.com/a/67629882
https://github.com/yt-dlp/yt-dlp/issues/319
https://github.com/ytdl-org/youtube-dl/issues/29333
https://github.com/ytdl-org/youtube-dl/issues/29086
2021-06-18 20:32:52 +05:30
pukkandan
9fea350f0d Fix id sanitization in filenames
Closes #415
2021-06-17 02:32:24 +05:30
pukkandan
e858a9d6d3 [EmbedThumbnail] Add compat-option embed-thumbnail-atomicparsley
to force use of atomicparsley for embedding thumbnails in mp4

Related: #411
2021-06-16 22:33:32 +05:30
pukkandan
7e87e27c52 [postprocessor] Fix _restrict_to when a codec is not set 2021-06-14 14:09:22 +05:30
pukkandan
d0fb4bd16f [pornhub] Extract cast
Closes #406, https://github.com/ytdl-org/youtube-dl/pull/27384
2021-06-13 21:38:08 +05:30
felix
3fd4c2a543 [mediasite] Extract slides (#343)
Fixes:
https://github.com/ytdl-org/youtube-dl/issues/4974#issue-58006762
https://github.com/ytdl-org/youtube-dl/issues/4540#issuecomment-69574231
https://github.com/ytdl-org/youtube-dl/pull/11185#issuecomment-335554239

Authored by: fstirlitz
2021-06-13 20:36:40 +05:30
felix
cdb19aa4c2 [downloader/mhtml] Add new downloader (#343)
This downloader is intended to be used for streams that consist of a
timed sequence of stand-alone images, such as slideshows or thumbnail
streams

This can be used for implementing:

https://github.com/ytdl-org/youtube-dl/issues/4974#issue-58006762
https://github.com/ytdl-org/youtube-dl/issues/4540#issuecomment-69574231
https://github.com/ytdl-org/youtube-dl/pull/11185#issuecomment-335554239

https://github.com/ytdl-org/youtube-dl/issues/9868
https://github.com/ytdl-org/youtube-dl/pull/14951


Authored by: fstirlitz
2021-06-13 20:36:40 +05:30
pukkandan
4d85fbbdbb Fix bug in 8326b00aab 2021-06-13 14:36:13 +05:30
pukkandan
551f93885e Ignore images formats from merge 2021-06-13 04:16:42 +05:30
pukkandan
8326b00aab Allow images formats
Necessary for #343.

* They are identified by `vcodec=acodec='none'`
* These formats show as the worst in `-F`
* Any postprocessor that expects audio/video will be skipped
* `b*` and all related selectors will skip such formats
* This commit also does not add any selector for downloading such formats. They have to be explicitly requested by the `format_id`. Implementation of a selector is left for when #389 is resolved
2021-06-13 03:45:53 +05:30
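
A minimal sketch of the identification rule described above; the helper function and both format dicts are hypothetical, made up for illustration:

```python
# Hypothetical helper illustrating the rule above: a format counts as
# "images" when both codecs are 'none'. Both format dicts are made up.
def is_image_format(fmt):
    return fmt.get('vcodec') == 'none' and fmt.get('acodec') == 'none'

slides = {'format_id': 'slides', 'vcodec': 'none', 'acodec': 'none', 'ext': 'mhtml'}
video_only = {'format_id': '137', 'vcodec': 'avc1.640028', 'acodec': 'none', 'ext': 'mp4'}
assert is_image_format(slides)
assert not is_image_format(video_only)  # video-only formats keep a real vcodec
```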
pukkandan
b0249bcaf0 Expand --check-formats to thumbnails
Closes #402
2021-06-13 03:45:53 +05:30
pukkandan
21cd8fae49 Use NamedTemporaryFile for --check-formats 2021-06-13 03:45:53 +05:30
pukkandan
45db527fa6 [youtube] Login is not needed for :ytrec 2021-06-13 03:45:53 +05:30
pukkandan
28419ca2c8 [utils] Improve LazyList
* Add `repr` and `str` that mimics `list`
* Add `reversed`. Unlike `[::-1]`, reversed does not exhaust the iterable and modifies the `LazyList` in-place
* Add tests
2021-06-13 03:45:53 +05:30
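
A short sketch of the new `reverse()` behavior, mirroring the laziness tests added in this release (see the `test_LazyList` diff below):

```python
# Mirrors the LazyList laziness tests added in test_utils.py: reverse()
# flips the list in-place without exhausting the iterable, so it works
# even on an infinite iterator, where [::-1] could never terminate.
import itertools
from yt_dlp.utils import LazyList

ll = LazyList(itertools.count())
ll.reverse()        # in-place, returns the same LazyList; consumes nothing yet
print(ll[-15])      # -> 14; only the first 15 items are ever generated
```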
pukkandan
8ba8714880 [EmbedThumbnail] Fix for already downloaded thumbnail 2021-06-11 19:13:24 +05:30
pukkandan
187986a857 Better error handling of syntax errors in -f 2021-06-11 19:13:22 +05:30
coletdjnz
4ba001080f [youtube] Non-fatal alert reporting for unavailable videos page (#401)
Co-Authored by: colethedj, pukkandan
2021-06-10 21:12:56 +00:00
coletdjnz
1974e99f4b [youtube] Improve SAPISID cookie handling (closes #393) (#395)
Author: colethedj
2021-06-10 21:02:57 +00:00
pukkandan
0181adefc6 [build] Build Windows x86 version with py3.7
and remove redundant tests
Closes #390

:ci skip

Co-authored by: pukkandan, shirt-dev
2021-06-10 01:41:04 +05:30
pukkandan
fd3c633d26 [version] update
:ci skip all
2021-06-10 01:36:46 +05:30
75 changed files with 2852 additions and 1531 deletions

.gitattributes

@@ -1 +1,4 @@
* text=auto
Makefile* text whitespace=-tab-in-indent
*.sh text eol=lf


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.08. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running yt-dlp version **2021.06.08**
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -42,9 +42,9 @@ Provide the complete verbose output of yt-dlp that clearly demonstrates the prob
Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your command line>`), copy the WHOLE output and insert it below. It should look similar to this:
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.06.08
[debug] yt-dlp version 2021.06.23
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.08. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running yt-dlp version **2021.06.08**
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.08. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running yt-dlp version **2021.06.08**
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.08. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running yt-dlp version **2021.06.08**
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -44,9 +44,9 @@ Provide the complete verbose output of yt-dlp that clearly demonstrates the prob
Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your command line>`), copy the WHOLE output and insert it below. It should look similar to this:
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.06.08
[debug] yt-dlp version 2021.06.23
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.08. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.06.23. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running yt-dlp version **2021.06.08**
- [ ] I've verified that I'm running yt-dlp version **2021.06.23**
- [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -42,7 +42,7 @@ Provide the complete verbose output of yt-dlp that clearly demonstrates the prob
Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your command line>`), copy the WHOLE output and insert it below. It should look similar to this:
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version %(version)s
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2


@@ -44,7 +44,7 @@ Provide the complete verbose output of yt-dlp that clearly demonstrates the prob
Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your command line>`), copy the WHOLE output and insert it below. It should look similar to this:
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version %(version)s
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2


@@ -95,14 +95,15 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Set up Python
# 3.8 is used for Win7 support
- name: Set up Python 3.8
uses: actions/setup-python@v2
with:
python-version: '3.8'
- name: Upgrade pip and enable wheel support
run: python -m pip install --upgrade pip setuptools wheel
- name: Install Requirements
run: pip install pyinstaller mutagen pycryptodome
run: pip install pyinstaller mutagen pycryptodome websockets
- name: Bump version
id: bump_version
run: python devscripts/update-version.py
@@ -137,15 +138,16 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.4.4 32-Bit
# 3.7 is used for Vista support. See https://github.com/yt-dlp/yt-dlp/issues/390
- name: Set up Python 3.7 32-Bit
uses: actions/setup-python@v2
with:
python-version: '3.4.4'
python-version: '3.7'
architecture: 'x86'
- name: Upgrade pip and enable wheel support
run: python -m pip install pip==19.1.1 setuptools==43.0.0 wheel==0.33.6
- name: Install Requirements for 32 Bit
run: pip install pyinstaller==3.5 mutagen==1.42.0 pycryptodome==3.9.4 pefile==2019.4.18
run: python -m pip install --upgrade pip setuptools wheel
- name: Install Requirements
run: pip install pyinstaller mutagen pycryptodome websockets
- name: Bump version
id: bump_version
run: python devscripts/update-version.py


@@ -9,11 +9,13 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-18.04]
python-version: [3.6, 3.7, 3.8, 3.9, pypy-3.6, pypy-3.7]
# py3.9 is in quick-test
python-version: [3.7, 3.8, pypy-3.6, pypy-3.7]
run-tests-ext: [sh]
include:
# atleast one of the tests must be in windows
- os: windows-latest
python-version: 3.4 # Windows x86 build is still in 3.4
python-version: 3.6
run-tests-ext: bat
steps:
- uses: actions/checkout@v2


@@ -9,11 +9,11 @@ jobs:
fail-fast: true
matrix:
os: [ubuntu-18.04]
python-version: [3.6, 3.7, 3.8, 3.9, pypy-3.6, pypy-3.7]
python-version: [3.7, 3.8, 3.9, pypy-3.6, pypy-3.7]
run-tests-ext: [sh]
include:
- os: windows-latest
python-version: 3.4 # Windows x86 build is still in 3.4
python-version: 3.6
run-tests-ext: bat
steps:
- uses: actions/checkout@v2


@@ -3,7 +3,7 @@
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e


@@ -52,5 +52,9 @@ hhirtz
louie-github
MinePlayersPE
olifre
rhsmachine
rhsmachine/zenerdi0de
nihil-admirari
krichbanana
ohmybahgosh
nyuszika7h
blackjack4494


@@ -19,6 +19,91 @@
-->
### 2021.07.07
* Merge youtube-dl: Upto [commit/a803582](https://github.com/ytdl-org/youtube-dl/commit/a8035827177d6b59aca03bd717acb6a9bdd75ada)
* Add `--extractor-args` to pass extractor-specific arguments
* Add extractor option `skip` for `youtube`. Eg: `--extractor-args youtube:skip=hls,dash`
* Deprecates `--youtube-skip-dash-manifest`, `--youtube-skip-hls-manifest`, `--youtube-include-dash-manifest`, `--youtube-include-hls-manifest`
* Allow `--list...` options to work with `--print`, `--quiet` and other `--list...` options
* [youtube] Use `player` API for additional video extraction requests by [colethedj](https://github.com/colethedj)
* **Fixes youtube premium music** (format 141) extraction
* Adds extractor option `player_client` = `web`/`android`
* **`--extractor-args youtube:player_client=android` works around the throttling** for the time-being
* Adds extractor option `player_skip=config`
* Adds age-gate fallback using embedded client
* [youtube] Choose correct Live chat API for upcoming streams by [krichbanana](https://github.com/krichbanana)
* [youtube] Fix subtitle names for age-gated videos
* [youtube:comments] Fix error handling and add `itct` to params by [colethedj](https://github.com/colethedj)
* [youtube_live_chat] Fix download with cookies by [siikamiika](https://github.com/siikamiika)
* [youtube_live_chat] use `clickTrackingParams` by [siikamiika](https://github.com/siikamiika)
* [Funimation] Rewrite extractor
* Add `FunimationShowIE` by [Mevious](https://github.com/Mevious)
* **Treat the different versions of an episode as different formats of a single video**
* This changes the video `id` and will break existing archives
* Compat option `seperate-video-versions` to fall back to old behavior including using the old video ids
* Support direct `/player/` URL
* Extractor options `language` and `version` to pre-select them during extraction
* These options may be removed in the future if we can extract all formats without additional network requests
* Do not rely on these for format selection and use `-f` filters instead
* [AdobePass] Add Spectrum MSO by [kevinoconnor7](https://github.com/kevinoconnor7), [ohmybahgosh](https://github.com/ohmybahgosh)
* [facebook] Extract description and fix title
* [fancode] Fix extraction, support live and allow login with refresh token by [zenerdi0de](https://github.com/zenerdi0de)
* [plutotv] Improve `_VALID_URL`
* [RCTIPlus] Add extractor by [MinePlayersPE](https://github.com/MinePlayersPE)
* [Soundcloud] Allow login using oauth token by [blackjack4494](https://github.com/blackjack4494)
* [TBS] Support livestreams by [llacb47](https://github.com/llacb47)
* [videa] Fix extraction by [nyuszika7h](https://github.com/nyuszika7h)
* [yahoo] Fix extraction by [llacb47](https://github.com/llacb47), [pukkandan](https://github.com/pukkandan)
* Process videos when using `--ignore-no-formats-error` by [krichbanana](https://github.com/krichbanana)
* Fix `--throttled-rate` when using `--load-info-json`
* Fix `--flat-playlist` when entry has no `ie_key`
* Fix `check_formats` catching `ExtractorError` instead of `DownloadError`
* Fix deprecated option `--list-formats-old`
* [downloader/ffmpeg] Fix `--ppa` when using simultaneous download
* [extractor] Prevent unnecessary download of hls manifests and refactor `hls_split_discontinuity`
* [fragment] Handle status of download and errors in threads correctly; and minor refactoring
* [thumbnailsconvertor] Treat `jpeg` as `jpg`
* [utils] Fix issues with `LazyList` reversal
* [extractor] Allow extractors to set their own login hint
* [cleanup] Simplify format selector code with `LazyList` and `yield from`
* [cleanup] Clean `extractor.common._merge_subtitles` signature
* [cleanup] Fix some typos
### 2021.06.23
* Merge youtube-dl: Upto [commit/379f52a](https://github.com/ytdl-org/youtube-dl/commit/379f52a4954013767219d25099cce9e0f9401961)
* **Add option `--throttled-rate`** below which video data is re-extracted
* [fragment] **Merge during download for `-N`**, and refactor `hls`/`dash`
* [websockets] Add `WebSocketFragmentFD` by [nao20010128nao](https://github.com/nao20010128nao), [pukkandan](https://github.com/pukkandan)
* Allow `images` formats in addition to video/audio
* [downloader/mhtml] Add new downloader for slideshows/storyboards by [fstirlitz](https://github.com/fstirlitz)
* [youtube] Temporary **fix for age-gate**
* [youtube] Support ongoing live chat by [siikamiika](https://github.com/siikamiika)
* [youtube] Improve SAPISID cookie handling by [colethedj](https://github.com/colethedj)
* [youtube] Login is not needed for `:ytrec`
* [youtube] Non-fatal alert reporting for unavailable videos page by [colethedj](https://github.com/colethedj)
* [twitcasting] Websocket support by [nao20010128nao](https://github.com/nao20010128nao)
* [mediasite] Extract slides by [fstirlitz](https://github.com/fstirlitz)
* [funimation] Extract subtitles
* [pornhub] Extract `cast`
* [hotstar] Use server time for authentication instead of local time
* [EmbedThumbnail] Fix for already downloaded thumbnail
* [EmbedThumbnail] Add compat-option `embed-thumbnail-atomicparsley`
* Expand `--check-formats` to thumbnails
* Fix id sanitization in filenames
* Skip fixup of existing files and add `--fixup force` to force it
* Better error handling of syntax errors in `-f`
* Use `NamedTemporaryFile` for `--check-formats`
* [aria2c] Lower `--min-split-size` for HTTP downloads
* [options] Rename `--add-metadata` to `--embed-metadata`
* [utils] Improve `LazyList` and add tests
* [build] Build Windows x86 version with py3.7 and remove redundant tests by [pukkandan](https://github.com/pukkandan), [shirt](https://github.com/shirt-dev)
* [docs] Clarify that `--embed-metadata` embeds chapter markers
* [cleanup] Refactor fixup
### 2021.06.09
* Fix bug where `%(field)d` in filename template throws error
@@ -34,7 +119,7 @@
* [extractor] Fix FourCC fallback when parsing ISM by [fstirlitz](https://github.com/fstirlitz)
* [twitcasting] Add TwitCastingUserIE, TwitCastingLiveIE by [pukkandan](https://github.com/pukkandan), [nao20010128nao](https://github.com/nao20010128nao)
* [vidio] Add VidioPremierIE and VidioLiveIE by [MinePlayersPE](https://github.com/MinePlayersPE)
* [viki] Fix extraction from by [ytdl-org/youtube-dl@59e583f](https://github.com/ytdl-org/youtube-dl/commit/59e583f7e8530ca92776c866897d895c072e2a82)
* [viki] Fix extraction from [ytdl-org/youtube-dl@59e583f](https://github.com/ytdl-org/youtube-dl/commit/59e583f7e8530ca92776c866897d895c072e2a82)
* [youtube] Support shorts URL
* [zoom] Extract transcripts as subtitles
* Add field `original_url` with the user-inputted URL


@@ -53,6 +53,7 @@ yt-dlp is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on t
* [Format Selection examples](#format-selection-examples)
* [MODIFYING METADATA](#modifying-metadata)
* [Modifying metadata examples](#modifying-metadata-examples)
* [EXTRACTOR ARGUMENTS](#extractor-arguments)
* [PLUGINS](#plugins)
* [DEPRECATED OPTIONS](#deprecated-options)
* [MORE](#more)
@@ -66,7 +67,7 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
* **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will now be preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection than what is possible by simply using `--format` ([examples](#format-selection-examples))
* **Merged with youtube-dl [commit/c2350ca](https://github.com/ytdl-org/youtube-dl/commit/c2350cac243ba1ec1586fe85b0d62d1b700047a2)**: (v2021.06.06) You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with youtube-dl [commit/379f52a](https://github.com/ytdl-org/youtube-dl/commit/379f52a4954013767219d25099cce9e0f9401961)**: (v2021.06.06) You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. Note that the NicoNico improvements are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
@@ -84,9 +85,9 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
* **Aria2c with HLS/DASH**: You can use `aria2c` as the external downloader for DASH(mpd) and HLS(m3u8) formats
* **New extractors**: AnimeLab, Philo MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv, niconico users, discoveryplus.in, mediathek, NFHSNetwork, nebula, ukcolumn, whowatch, MxplayerShow, parlview (au), YoutubeWebArchive, fancode, Saitosan, ShemarooMe, telemundo, VootSeries, SonyLIVSeries, HotstarSeries, VidioPremier, VidioLive
* **New extractors**: AnimeLab, Philo MSO, Spectrum MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv, niconico users, discoveryplus.in, mediathek, NFHSNetwork, nebula, ukcolumn, whowatch, MxplayerShow, parlview (au), YoutubeWebArchive, fancode, Saitosan, ShemarooMe, telemundo, VootSeries, SonyLIVSeries, HotstarSeries, VidioPremier, VidioLive, RCTIPlus, TBS Live
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, akamai, ina, rumble, tennistv, amcnetworks, la7 podcasts, linuxacadamy, nitter, twitcasting, viu, crackle, curiositystream, mediasite, rmcdecouverte, sonyliv, tubi, tenplay, patreon
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, akamai, ina, rumble, tennistv, amcnetworks, la7 podcasts, linuxacadamy, nitter, twitcasting, viu, crackle, curiositystream, mediasite, rmcdecouverte, sonyliv, tubi, tenplay, patreon, videa, yahoo
* **Subtitle extraction from manifests**: Subtitles can be extracted from streaming media manifests. See [commit/be6202f](https://github.com/yt-dlp/yt-dlp/commit/be6202f12b97858b9d716e608394b51065d0419f) for details
@@ -127,10 +128,12 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* `--add-metadata` attaches the `infojson` to `mkv` files in addition to writing the metadata when used with `--write-infojson`. Use `--compat-options no-attach-info-json` to revert this
* `playlist_index` behaves differently when used with options like `--playlist-reverse` and `--playlist-items`. See [#302](https://github.com/yt-dlp/yt-dlp/issues/302) for details. You can use `--compat-options playlist-index` if you want to keep the earlier behavior
* The output of `-F` is listed in a new format. Use `--compat-options list-formats` to revert this
* All *experiences* of a funimation episode are considered as a single video. This behavior breaks existing archives. Use `--compat-options seperate-video-versions` to extract information from only the default player
* Youtube live chat (if available) is considered as a subtitle. Use `--sub-langs all,-live_chat` to download all subtitles except live chat. You can also use `--compat-options no-live-chat` to prevent live chat from downloading
* Youtube channel URLs are automatically redirected to `/video`. Append a `/featured` to the URL to download only the videos in the home page. If the channel does not have a videos tab, we try to download the equivalent `UU` playlist instead. Also, `/live` URLs raise an error if there are no live videos instead of silently downloading the entire channel. You may use `--compat-options no-youtube-channel-redirect` to revert all these redirections
* Unavailable videos are also listed for youtube playlists. Use `--compat-options no-youtube-unavailable-videos` to remove this
* If `ffmpeg` is used as the downloader, the downloading and merging of formats happen in a single step when possible. Use `--compat-options no-direct-merge` to revert this
* Thumbnail embedding in `mp4` is done with mutagen if possible. Use `--compat-options embed-thumbnail-atomicparsley` to force the use of AtomicParsley instead
For ease of use, a few more compat options are available:
* `--compat-options all`: Use all compat options
@@ -181,6 +184,7 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
* [**sponskrub**](https://github.com/faissaloo/SponSkrub) - For using the [sponskrub options](#sponskrub-sponsorblock-options). Licenced under [GPLv3+](https://github.com/faissaloo/SponSkrub/blob/master/LICENCE.md)
* [**mutagen**](https://github.com/quodlibet/mutagen) - For embedding thumbnail in certain formats. Licenced under [GPLv2+](https://github.com/quodlibet/mutagen/blob/master/COPYING)
* [**pycryptodome**](https://github.com/Legrandin/pycryptodome) - For decrypting various data. Licenced under [BSD2](https://github.com/Legrandin/pycryptodome/blob/master/LICENSE.rst)
* [**websockets**](https://github.com/aaugustin/websockets) - For downloading over websocket. Licenced under [BSD3](https://github.com/aaugustin/websockets/blob/main/LICENSE)
* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen is not present. Licenced under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
* [**rtmpdump**](http://rtmpdump.mplayerhq.hu) - For downloading `rtmp` streams. ffmpeg will be used as a fallback. Licenced under [GPLv2+](http://rtmpdump.mplayerhq.hu)
* [**mplayer**](http://mplayerhq.hu/design7/info.html) or [**mpv**](https://mpv.io) - For downloading `rstp` streams. ffmpeg will be used as a fallback. Licenced under [GPLv2+](https://github.com/mpv-player/mpv/blob/master/Copyright)
@@ -189,14 +193,14 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
To use or redistribute the dependencies, you must agree to their respective licensing terms.
Note that the windows releases are already built with the python interpreter, mutagen and pycryptodome included.
Note that the windows releases are already built with the python interpreter, mutagen, pycryptodome and websockets included.
### COMPILE
**For Windows**:
To build the Windows executable, you must have pyinstaller (and optionally mutagen and pycryptodome)
To build the Windows executable, you must have pyinstaller (and optionally mutagen, pycryptodome, websockets)
python3 -m pip install --upgrade pyinstaller mutagen pycryptodome
python3 -m pip install --upgrade pyinstaller mutagen pycryptodome websockets
Once you have all the necessary dependencies installed, just run `py pyinst.py`. The executable will be built for the same architecture (32/64 bit) as the python used to build it.
@@ -372,6 +376,9 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
(default is 1)
-r, --limit-rate RATE Maximum download rate in bytes per second
(e.g. 50K or 4.2M)
--throttled-rate RATE Minimum download rate in bytes per second
below which throttling is assumed and the
video data is re-extracted (e.g. 100K)
-R, --retries RETRIES Number of retries (default is 10), or
"infinite"
--fragment-retries RETRIES Number of retries for a fragment (default
@@ -428,7 +435,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--downloader-args NAME:ARGS Give these arguments to the external
downloader. Specify the downloader name and
the arguments separated by a colon ":". You
can use this option multiple times
can use this option multiple times to give
different arguments to different downloaders
(Alias: --external-downloader-args)
## Filesystem Options:
@@ -710,7 +718,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
Metadata, EmbedSubtitle, EmbedThumbnail,
SubtitlesConvertor, ThumbnailsConvertor,
VideoRemuxer, VideoConvertor, SponSkrub,
FixupStretched, FixupM4a and FixupM3u8. The
FixupStretched, FixupM4a, FixupM3u8,
FixupTimestamp and FixupDuration. The
supported executables are: AtomicParsley,
FFmpeg, FFprobe, and SponSkrub. You can
also specify "PP+EXE:ARGS" to give the
@@ -734,10 +743,13 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--embed-subs Embed subtitles in the video (only for mp4,
webm and mkv videos)
--no-embed-subs Do not embed subtitles (default)
--embed-thumbnail Embed thumbnail in the audio as cover art
--embed-thumbnail Embed thumbnail in the video as cover art
--no-embed-thumbnail Do not embed thumbnail (default)
--add-metadata Write metadata to the video file
--no-add-metadata Do not write metadata (default)
--embed-metadata Embed metadata including chapter markers
(if supported by the format) to the video
file (Alias: --add-metadata)
--no-embed-metadata Do not write metadata (default)
(Alias: --no-add-metadata)
--parse-metadata FROM:TO Parse additional metadata like title/artist
from other fields; see "MODIFYING METADATA"
for details
@@ -747,7 +759,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
file. One of never (do nothing), warn (only
emit a warning), detect_or_warn (the
default; fix file if we can, warn
otherwise)
otherwise), force (try fixing even if file
already exists)
--ffmpeg-location PATH Location of the ffmpeg binary; either the
path to the binary or its containing
directory
@@ -806,18 +819,10 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-hls-split-discontinuity Do not split HLS playlists to different
formats at discontinuities such as ad
breaks (default)
--youtube-include-dash-manifest Download the DASH manifests and related
data on YouTube videos (default)
(Alias: --no-youtube-skip-dash-manifest)
--youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos
(Alias: --no-youtube-include-dash-manifest)
--youtube-include-hls-manifest Download the HLS manifests and related data
on YouTube videos (default)
(Alias: --no-youtube-skip-hls-manifest)
--youtube-skip-hls-manifest Do not download the HLS manifests and
related data on YouTube videos
(Alias: --no-youtube-include-hls-manifest)
--extractor-args KEY:ARGS Pass these arguments to the extractor. See
"EXTRACTOR ARGUMENTS" for details. You can
use this option multiple times to give
different arguments to different extractors
# CONFIGURATION
@@ -1011,7 +1016,7 @@ Available only when used in `--print`:
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKcj`, this will result in a `yt-dlp test video-BaW_jenozKcj.mp4` file created in the current directory.
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKc`, this will result in a `yt-dlp test video-BaW_jenozKc.mp4` file created in the current directory.
For numeric sequences you can use numeric related formatting, for example, `%(view_count)05d` will result in a string with view count padded with zeros up to 5 characters, like in `00042`.
@@ -1140,7 +1145,7 @@ You can change the criteria for being considered the `best` by using `-S` (`--fo
- `lang`: Language preference as given by the extractor
- `quality`: The quality of the format as given by the extractor
- `source`: Preference of the source as given by the extractor
- `proto`: Protocol used for download (`https`/`ftps` > `http`/`ftp` > `m3u8_native` > `m3u8` > `http_dash_segments` > other > `mms`/`rtsp` > unknown > `f4f`/`f4m`)
- `proto`: Protocol used for download (`https`/`ftps` > `http`/`ftp` > `m3u8_native`/`m3u8` > `http_dash_segments` > `websocket_frag` > other > `mms`/`rtsp` > unknown > `f4f`/`f4m`)
- `vcodec`: Video Codec (`av01` > `vp9.2` > `vp9` > `h265` > `h264` > `vp8` > `h263` > `theora` > other > unknown)
- `acodec`: Audio Codec (`opus` > `vorbis` > `aac` > `mp4a` > `mp3` > `ac3` > `dts` > other > unknown)
- `codec`: Equivalent to `vcodec,acodec`
@@ -1321,6 +1326,23 @@ $ yt-dlp --parse-metadata 'description:(?s)(?P<meta_comment>.+)' --add-metadata
```
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. Eg: `--extractor-args "youtube:skip=dash,hls;player_client=android" --extractor-args "funimation:version=uncut"`
The following extractors use this feature:
* **youtube**
* `skip`: `hls` or `dash` (or both) to skip download of the respective manifests
* `player_client`: `web` (default) or `android` (force use the android client fallbacks for video extraction)
* `player_skip`: `configs` - skip requests if applicable for client configs and use defaults
* **funimation**
* `language`: Languages to extract. Eg: `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
NOTE: These options may be changed/removed in the future without concern for backward compatibility
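
For API users, the `YoutubeDL.py` diff below documents the equivalent `extractor_args` parameter (`Eg: {'youtube': {'skip': ['dash', 'hls']}}`). Below is a hedged sketch of the API counterpart of the CLI examples above; representing every value as a list follows that single docstring example and is an assumption:

```python
# Sketch: API counterpart of the CLI examples above, per the docstring
# added in the YoutubeDL.py diff (extractor_args is a dict of dicts).
import yt_dlp

ydl_opts = {
    'extractor_args': {
        'youtube': {'skip': ['dash', 'hls'], 'player_client': ['android']},
        'funimation': {'version': ['uncut']},
    },
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=BaW_jenozKc',
                            download=False)
```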
# PLUGINS
Plugins are loaded from `<root-dir>/ytdlp_plugins/<type>/__init__.py`. Currently only `extractor` plugins are supported. Support for `downloader` and `postprocessor` plugins may be added in the future. See [ytdlp_plugins](ytdlp_plugins) for example.
@@ -1352,6 +1374,10 @@ While these options still work, their use is not recommended since there are oth
--list-formats-old --compat-options list-formats (Alias: --no-list-formats-as-table)
--list-formats-as-table --compat-options -list-formats [Default] (Alias: --no-list-formats-old)
--sponskrub-args ARGS --ppa "sponskrub:ARGS"
--youtube-skip-dash-manifest --extractor-args "youtube:skip=dash" (Alias: --no-youtube-include-dash-manifest)
--youtube-skip-hls-manifest --extractor-args "youtube:skip=hls" (Alias: --no-youtube-include-hls-manifest)
--youtube-include-dash-manifest Default (Alias: --no-youtube-skip-dash-manifest)
--youtube-include-hls-manifest Default (Alias: --no-youtube-skip-hls-manifest)
--test Used by developers for testing extractors. Not intended for the end user
--youtube-print-sig-code Used for testing youtube signatures


@@ -6,6 +6,7 @@ import sys
# import os
import platform
from PyInstaller.utils.hooks import collect_submodules
from PyInstaller.utils.win32.versioninfo import (
VarStruct, VarFileInfo, StringStruct, StringTable,
StringFileInfo, FixedFileInfo, VSVersionInfo, SetVersion,
@@ -66,16 +67,15 @@ VERSION_FILE = VSVersionInfo(
]
)
dependancies = ['Crypto', 'mutagen'] + collect_submodules('websockets')
excluded_modules = ['test', 'ytdlp_plugins', 'youtube-dl', 'youtube-dlc']
PyInstaller.__main__.run([
'--name=yt-dlp%s' % _x86,
'--onefile',
'--icon=devscripts/cloud.ico',
'--exclude-module=youtube_dl',
'--exclude-module=youtube_dlc',
'--exclude-module=test',
'--exclude-module=ytdlp_plugins',
'--hidden-import=mutagen',
'--hidden-import=Crypto',
*[f'--exclude-module={module}' for module in excluded_modules],
*[f'--hidden-import={module}' for module in dependancies],
'--upx-exclude=vcruntime140.dll',
'yt_dlp/__main__.py',
])


@@ -1,2 +1,3 @@
mutagen
pycryptodome
websockets


@@ -19,7 +19,7 @@ LONG_DESCRIPTION = '\n\n'.join((
'**PS**: Some links in this document will not work since this is a copy of the README.md from Github',
open('README.md', 'r', encoding='utf-8').read()))
REQUIREMENTS = ['mutagen', 'pycryptodome']
REQUIREMENTS = ['mutagen', 'pycryptodome', 'websockets']
if sys.argv[1:2] == ['py2exe']:
raise NotImplementedError('py2exe is not currently supported; instead, use "pyinst.py" to build with pyinstaller')


@@ -225,8 +225,7 @@
- **Culturebox**
- **CultureUnplugged**
- **curiositystream**
- **curiositystream:collections**
- **curiositystream:series**
- **curiositystream:collection**
- **CWTV**
- **DagelijkseKost**: dagelijksekost.een.be
- **DailyMail**
@@ -497,8 +496,6 @@
- **LinuxAcademy**
- **LiTV**
- **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **livestream:original**
- **LnkGo**


@@ -461,14 +461,13 @@ class TestFormatSelection(unittest.TestCase):
def test_invalid_format_specs(self):
def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec})
info_dict = _make_result([{'format_id': 'foo', 'url': TEST_URL}])
self.assertRaises(SyntaxError, ydl.process_ie_result, info_dict)
self.assertRaises(SyntaxError, YDL, {'format': format_spec})
assert_syntax_error('bestvideo,,best')
assert_syntax_error('+bestaudio')
assert_syntax_error('bestvideo+')
assert_syntax_error('/')
assert_syntax_error('[720<height]')
def test_format_filtering(self):
formats = [
@@ -665,15 +664,15 @@ class TestYoutubeDL(unittest.TestCase):
}
def test_prepare_outtmpl_and_filename(self):
def test(tmpl, expected, **params):
def test(tmpl, expected, *, info=None, **params):
params['outtmpl'] = tmpl
ydl = YoutubeDL(params)
ydl._num_downloads = 1
self.assertEqual(ydl.validate_outtmpl(tmpl), None)
outtmpl, tmpl_dict = ydl.prepare_outtmpl(tmpl, self.outtmpl_info)
outtmpl, tmpl_dict = ydl.prepare_outtmpl(tmpl, info or self.outtmpl_info)
out = outtmpl % tmpl_dict
fname = ydl.prepare_filename(self.outtmpl_info)
fname = ydl.prepare_filename(info or self.outtmpl_info)
if callable(expected):
self.assertTrue(expected(out))
@@ -701,6 +700,15 @@ class TestYoutubeDL(unittest.TestCase):
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
# ID sanitization
test('%(id)s', '_abcd', info={'id': '_abcd'})
test('%(some_id)s', '_abcd', info={'some_id': '_abcd'})
test('%(formats.0.id)s', '_abcd', info={'formats': [{'id': '_abcd'}]})
test('%(id)s', '-abcd', info={'id': '-abcd'})
test('%(id)s', '.abcd', info={'id': '.abcd'})
test('%(id)s', 'ab__cd', info={'id': 'ab__cd'})
test('%(id)s', ('ab:cd', 'ab -cd'), info={'id': 'ab:cd'})
# Invalid templates
self.assertTrue(isinstance(YoutubeDL.validate_outtmpl('%'), ValueError))
self.assertTrue(isinstance(YoutubeDL.validate_outtmpl('%(title)'), ValueError))


@@ -12,6 +12,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import io
import itertools
import json
import xml.etree.ElementTree
@@ -108,6 +109,7 @@ from yt_dlp.utils import (
cli_bool_option,
parse_codecs,
iri_to_uri,
LazyList,
)
from yt_dlp.compat import (
compat_chr,
@@ -1525,6 +1527,47 @@ Line 1
self.assertEqual(clean_podcast_url('https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/traffic.megaphone.fm/HSW7835899191.mp3'), 'https://traffic.megaphone.fm/HSW7835899191.mp3')
self.assertEqual(clean_podcast_url('https://play.podtrac.com/npr-344098539/edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3'), 'https://edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3')
def test_LazyList(self):
it = list(range(10))
self.assertEqual(list(LazyList(it)), it)
self.assertEqual(LazyList(it).exhaust(), it)
self.assertEqual(LazyList(it)[5], it[5])
self.assertEqual(LazyList(it)[::2], it[::2])
self.assertEqual(LazyList(it)[1::2], it[1::2])
self.assertEqual(LazyList(it)[6:2:-2], it[6:2:-2])
self.assertEqual(LazyList(it)[::-1], it[::-1])
self.assertTrue(LazyList(it))
self.assertFalse(LazyList(range(0)))
self.assertEqual(len(LazyList(it)), len(it))
self.assertEqual(repr(LazyList(it)), repr(it))
self.assertEqual(str(LazyList(it)), str(it))
self.assertEqual(list(LazyList(it).reverse()), it[::-1])
self.assertEqual(list(LazyList(it).reverse()[1:3:7]), it[::-1][1:3:7])
def test_LazyList_laziness(self):
def test(ll, idx, val, cache):
self.assertEqual(ll[idx], val)
self.assertEqual(getattr(ll, '_LazyList__cache'), list(cache))
ll = LazyList(range(10))
test(ll, 0, 0, range(1))
test(ll, 5, 5, range(6))
test(ll, -3, 7, range(10))
ll = LazyList(range(10)).reverse()
test(ll, -1, 0, range(1))
test(ll, 3, 6, range(10))
ll = LazyList(itertools.count())
test(ll, 10, 10, range(11))
ll.reverse()
test(ll, -15, 14, range(15))
if __name__ == '__main__':
unittest.main()


@@ -20,6 +20,7 @@ import re
import shutil
import subprocess
import sys
import tempfile
import time
import tokenize
import traceback
@@ -67,6 +68,7 @@ from .utils import (
STR_FORMAT_RE,
formatSeconds,
GeoRestrictedError,
HEADRequest,
int_or_none,
iri_to_uri,
ISO3166Utils,
@@ -86,7 +88,6 @@ from .utils import (
preferredencoding,
prepend_extension,
process_communicate_or_kill,
random_uuidv4,
register_socks_protocols,
RejectedVideoReached,
render_table,
@@ -100,8 +101,10 @@ from .utils import (
str_or_none,
strftime_or_none,
subtitles_filename,
ThrottledDownload,
to_high_limit_path,
traverse_obj,
try_get,
UnavailableVideoError,
url_basename,
version_tuple,
@@ -126,13 +129,14 @@ from .downloader import (
)
from .downloader.rtmp import rtmpdump_version
from .postprocessor import (
get_postprocessor,
FFmpegFixupDurationPP,
FFmpegFixupM3u8PP,
FFmpegFixupM4aPP,
FFmpegFixupStretchedPP,
FFmpegFixupTimestampPP,
FFmpegMergerPP,
FFmpegPostProcessor,
# FFmpegSubtitlesConvertorPP,
get_postprocessor,
MoveFilesAfterDownloadPP,
)
from .version import __version__
@@ -388,17 +392,15 @@ class YoutubeDL(object):
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
compat_opts: Compatibility options. See "Differences in default behavior".
Note that only format-sort, format-spec, no-live-chat,
no-attach-info-json, playlist-index, list-formats,
no-direct-merge, no-youtube-channel-redirect,
and no-youtube-unavailable-videos works when used via the API
The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat,
no-playlist-metafiles. Refer __init__.py for their implementation
The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see yt_dlp/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args, hls_use_mpegts,
http_chunk_size.
nopart, updatetime, buffersize, ratelimit, throttledratelimit, min_filesize,
max_filesize, test, noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args, hls_use_mpegts, http_chunk_size.
The following options are used by the post processors:
prefer_ffmpeg: If False, use avconv instead of ffmpeg if both are available,
@@ -416,11 +418,16 @@ class YoutubeDL(object):
dynamic_mpd: Whether to process dynamic DASH manifests (default: True)
hls_split_discontinuity: Split HLS playlists to different formats at
discontinuities such as ad breaks (default: False)
youtube_include_dash_manifest: If True (default), DASH manifests and related
extractor_args: A dictionary of arguments to be passed to the extractors.
See "EXTRACTOR ARGUMENTS" for details.
Eg: {'youtube': {'skip': ['dash', 'hls']}}
youtube_include_dash_manifest: Deprecated - Use extractor_args instead.
If True (default), DASH manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
care about DASH. (only for youtube)
youtube_include_hls_manifest: If True (default), HLS manifests and related
youtube_include_hls_manifest: Deprecated - Use extractor_args instead.
If True (default), HLS manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
care about HLS. (only for youtube)
@@ -472,8 +479,7 @@ class YoutubeDL(object):
if sys.version_info < (3, 6):
self.report_warning(
'Support for Python version %d.%d have been deprecated and will break in future versions of yt-dlp! '
'Update to Python 3.6 or above' % sys.version_info[:2])
'Python version %d.%d is not supported! Please update to Python 3.6 or above' % sys.version_info[:2])
def check_deprecated(param, option, suggestion):
if self.params.get(param) is not None:
@@ -539,6 +545,11 @@ class YoutubeDL(object):
self.outtmpl_dict = self.parse_outtmpl()
# Creating format selector here allows us to catch syntax errors before the extraction
self.format_selector = (
None if self.params.get('format') is None
else self.build_format_selector(self.params['format']))
self._setup_opener()
"""Preload the archive, if any is specified"""
@@ -564,14 +575,9 @@ class YoutubeDL(object):
self.add_default_info_extractors()
for pp_def_raw in self.params.get('postprocessors', []):
pp_class = get_postprocessor(pp_def_raw['key'])
pp_def = dict(pp_def_raw)
del pp_def['key']
if 'when' in pp_def:
when = pp_def['when']
del pp_def['when']
else:
when = 'post_process'
when = pp_def.pop('when', 'post_process')
pp_class = get_postprocessor(pp_def.pop('key'))
pp = pp_class(self, **compat_kwargs(pp_def))
self.add_post_processor(pp, when=when)
@@ -813,6 +819,21 @@ class YoutubeDL(object):
'Put from __future__ import unicode_literals at the top of your code file or consider switching to Python 3.x.')
return outtmpl_dict
def get_output_path(self, dir_type='', filename=None):
paths = self.params.get('paths', {})
assert isinstance(paths, dict)
path = os.path.join(
expand_path(paths.get('home', '').strip()),
expand_path(paths.get(dir_type, '').strip()) if dir_type else '',
filename or '')
# Temporary fix for #4787
# 'Treat' all problem characters by passing filename through preferredencoding
# to workaround encoding issues with subprocess on python2 @ Windows
if sys.version_info < (3, 0) and sys.platform == 'win32':
path = encodeFilename(path, True).decode(preferredencoding())
return sanitize_path(path, force=self.params.get('windowsfilenames'))
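Roughly, get_output_path() nests a relative per-type path under 'home', while an absolute per-type path wins outright, because os.path.join discards everything before an absolute component. A sketch with illustrative paths:

import os

paths = {'home': '/data/videos', 'temp': 'tmp', 'thumbnail': '/mnt/thumbs'}
print(os.path.join(paths['home'], paths['temp'], 'clip.f137.part'))
# /data/videos/tmp/clip.f137.part  (relative 'temp' nests under 'home')
print(os.path.join(paths['home'], paths['thumbnail'], 'clip.jpg'))
# /mnt/thumbs/clip.jpg             (absolute path overrides 'home')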
@staticmethod
def validate_outtmpl(tmpl):
''' @return None or Exception object '''
@@ -913,7 +934,7 @@ class YoutubeDL(object):
fmt = outer_mobj.group('format')
mobj = re.match(INTERNAL_FORMAT_RE, key)
if mobj is None:
value, default = None, na
value, default, mobj = None, na, {'fields': ''}
else:
mobj = mobj.groupdict()
default = mobj['default'] if mobj['default'] is not None else na
@@ -923,7 +944,6 @@ class YoutubeDL(object):
fmt = '0{:d}d'.format(field_size_compat_map[key])
value = default if value is None else value
key += '\0%s' % fmt
if fmt == 'c':
value = compat_str(value)
@@ -941,7 +961,8 @@ class YoutubeDL(object):
# So we convert it to repr first
value, fmt = repr(value), '%ss' % fmt[:-1]
if fmt[-1] in 'csr':
value = sanitize(key, value)
value = sanitize(mobj['fields'].split('.')[-1], value)
key += '\0%s' % fmt
TMPL_DICT[key] = value
return '%({key}){fmt}'.format(key=key, fmt=fmt)
@@ -990,12 +1011,11 @@ class YoutubeDL(object):
def prepare_filename(self, info_dict, dir_type='', warn=False):
"""Generate the output filename."""
paths = self.params.get('paths', {})
assert isinstance(paths, dict)
filename = self._prepare_filename(info_dict, dir_type or 'default')
if warn and not self.__prepare_filename_warned:
if not paths:
if not self.params.get('paths'):
pass
elif filename == '-':
self.report_warning('--paths is ignored when outputting to stdout')
@@ -1005,18 +1025,7 @@ class YoutubeDL(object):
if filename == '-' or not filename:
return filename
homepath = expand_path(paths.get('home', '').strip())
assert isinstance(homepath, compat_str)
subdir = expand_path(paths.get(dir_type, '').strip()) if dir_type else ''
assert isinstance(subdir, compat_str)
path = os.path.join(homepath, subdir, filename)
# Temporary fix for #4787
# 'Treat' all problem characters by passing filename through preferredencoding
# to workaround encoding issues with subprocess on python2 @ Windows
if sys.version_info < (3, 0) and sys.platform == 'win32':
path = encodeFilename(path, True).decode(preferredencoding())
return sanitize_path(path, force=self.params.get('windowsfilenames'))
return self.get_output_path(dir_type, filename)
def _match_entry(self, info_dict, incomplete=False, silent=False):
""" Returns None if the file should be downloaded """
@@ -1140,6 +1149,10 @@ class YoutubeDL(object):
self.report_error(msg)
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
except ThrottledDownload:
self.to_stderr('\r')
self.report_warning('The download speed is below throttle limit. Re-extracting data')
return wrapper(self, *args, **kwargs)
except (MaxDownloadsReached, ExistingVideoReached, RejectedVideoReached):
raise
except Exception as e:
@@ -1167,13 +1180,17 @@ class YoutubeDL(object):
return ie_result
def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, {
'extractor': ie.IE_NAME,
'webpage_url': url,
'original_url': url,
'webpage_url_basename': url_basename(url),
'extractor_key': ie.ie_key(),
})
if url is not None:
self.add_extra_info(ie_result, {
'webpage_url': url,
'original_url': url,
'webpage_url_basename': url_basename(url),
})
if ie is not None:
self.add_extra_info(ie_result, {
'extractor': ie.IE_NAME,
'extractor_key': ie.ie_key(),
})
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
@@ -1192,8 +1209,8 @@ class YoutubeDL(object):
or extract_flat is True):
info_copy = ie_result.copy()
self.add_extra_info(info_copy, extra_info)
self.add_default_extra_info(
info_copy, self.get_info_extractor(ie_result.get('ie_key')), ie_result['url'])
ie = try_get(ie_result.get('ie_key'), self.get_info_extractor)
self.add_default_extra_info(info_copy, ie, ie_result['url'])
self.__forced_printings(info_copy, self.prepare_filename(info_copy), incomplete=True)
return ie_result
@@ -1488,12 +1505,11 @@ class YoutubeDL(object):
'!=': operator.ne,
}
operator_rex = re.compile(r'''(?x)\s*
(?P<key>width|height|tbr|abr|vbr|asr|filesize|filesize_approx|fps)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
$
(?P<key>width|height|tbr|abr|vbr|asr|filesize|filesize_approx|fps)\s*
(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)\s*
''' % '|'.join(map(re.escape, OPERATORS.keys())))
m = operator_rex.search(filter_spec)
m = operator_rex.fullmatch(filter_spec)
if m:
try:
comparison_value = int(m.group('value'))
@@ -1514,13 +1530,12 @@ class YoutubeDL(object):
'$=': lambda attr, value: attr.endswith(value),
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>[a-zA-Z0-9._-]+)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
str_operator_rex = re.compile(r'''(?x)\s*
(?P<key>[a-zA-Z0-9._-]+)\s*
(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[a-zA-Z0-9._-]+)\s*
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(filter_spec)
m = str_operator_rex.fullmatch(filter_spec)
if m:
comparison_value = m.group('value')
str_op = STR_OPERATORS[m.group('op')]
@@ -1530,7 +1545,7 @@ class YoutubeDL(object):
op = str_op
if not m:
raise ValueError('Invalid filter specification %r' % filter_spec)
raise SyntaxError('Invalid filter specification %r' % filter_spec)
def _filter(f):
actual_value = f.get(m.group('key'))
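The move from search() to fullmatch() tightens the parser: a spec must now consist of exactly one comparison, whereas the old pattern (anchored only at the end with '$') silently accepted a garbage prefix. A self-contained illustration using the regex above:

import operator
import re

OPERATORS = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
             '>=': operator.ge, '=': operator.eq, '!=': operator.ne}
operator_rex = re.compile(r'''(?x)\s*
    (?P<key>width|height|tbr|abr|vbr|asr|filesize|filesize_approx|fps)\s*
    (?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
    (?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)\s*
    ''' % '|'.join(map(re.escape, OPERATORS.keys())))

print(bool(operator_rex.fullmatch('height<=720')))   # True
print(bool(operator_rex.fullmatch('xheight<=720')))  # False; search() would still have matched from 'height' on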
@@ -1685,9 +1700,12 @@ class YoutubeDL(object):
formats_info.extend(format_2.get('requested_formats', (format_2,)))
if not allow_multiple_streams['video'] or not allow_multiple_streams['audio']:
get_no_more = {"video": False, "audio": False}
get_no_more = {'video': False, 'audio': False}
for (i, fmt_info) in enumerate(formats_info):
for aud_vid in ["audio", "video"]:
if fmt_info.get('acodec') == fmt_info.get('vcodec') == 'none':
formats_info.pop(i)
continue
for aud_vid in ['audio', 'video']:
if not allow_multiple_streams[aud_vid] and fmt_info.get(aud_vid[0] + 'codec') != 'none':
if get_no_more[aud_vid]:
formats_info.pop(i)
@@ -1738,21 +1756,25 @@ class YoutubeDL(object):
return new_dict
def _check_formats(formats):
if not check_formats:
yield from formats
return
for f in formats:
self.to_screen('[info] Testing format %s' % f['format_id'])
paths = self.params.get('paths', {})
temp_file = os.path.join(
expand_path(paths.get('home', '').strip()),
expand_path(paths.get('temp', '').strip()),
'ytdl.%s.f%s.check-format' % (random_uuidv4(), f['format_id']))
temp_file = tempfile.NamedTemporaryFile(
suffix='.tmp', delete=False,
dir=self.get_output_path('temp') or None)
temp_file.close()
try:
dl, _ = self.dl(temp_file, f, test=True)
except (ExtractorError, IOError, OSError, ValueError) + network_exceptions:
dl = False
success, _ = self.dl(temp_file.name, f, test=True)
except (DownloadError, IOError, OSError, ValueError) + network_exceptions:
success = False
finally:
if os.path.exists(temp_file):
os.remove(temp_file)
if dl:
if os.path.exists(temp_file.name):
try:
os.remove(temp_file.name)
except OSError:
self.report_warning('Unable to delete temporary file "%s"' % temp_file.name)
if success:
yield f
else:
self.to_screen('[info] Unable to download format %s. Skipping...' % f['format_id'])
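The same pre-testing is reachable from the API via the check_formats parameter (URL illustrative); formats whose test download fails are dropped from selection rather than aborting the run:

import yt_dlp

opts = {'check_formats': True, 'format': 'bv*[height<=720]+ba/b'}
with yt_dlp.YoutubeDL(opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=BaW_jenozKc', download=False)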
@@ -1763,8 +1785,7 @@ class YoutubeDL(object):
def selector_function(ctx):
for f in fs:
for format in f(ctx):
yield format
yield from f(ctx)
return selector_function
elif selector.type == GROUP: # ()
@@ -1780,17 +1801,21 @@ class YoutubeDL(object):
return picked_formats
return []
elif selector.type == MERGE: # +
selector_1, selector_2 = map(_build_selector_function, selector.selector)
def selector_function(ctx):
for pair in itertools.product(
selector_1(copy.deepcopy(ctx)), selector_2(copy.deepcopy(ctx))):
yield _merge(pair)
elif selector.type == SINGLE: # atom
format_spec = selector.selector or 'best'
# TODO: Add allvideo, allaudio etc by generalizing the code with best/worst selector
if format_spec == 'all':
def selector_function(ctx):
formats = list(ctx['formats'])
if check_formats:
formats = _check_formats(formats)
for f in formats:
yield f
yield from _check_formats(ctx['formats'])
elif format_spec == 'mergeall':
def selector_function(ctx):
formats = list(_check_formats(ctx['formats']))
@@ -1814,14 +1839,16 @@ class YoutubeDL(object):
format_modified = mobj.group('mod') is not None
format_fallback = not format_type and not format_modified # for b, w
filter_f = (
_filter_f = (
(lambda f: f.get('%scodec' % format_type) != 'none')
if format_type and format_modified # bv*, ba*, wv*, wa*
else (lambda f: f.get('%scodec' % not_format_type) == 'none')
if format_type # bv, ba, wv, wa
else (lambda f: f.get('vcodec') != 'none' and f.get('acodec') != 'none')
if not format_modified # b, w
else None) # b*, w*
else lambda f: True) # b*, w*
filter_f = lambda f: _filter_f(f) and (
f.get('vcodec') != 'none' or f.get('acodec') != 'none')
else:
filter_f = ((lambda f: f.get('ext') == format_spec)
if format_spec in ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav'] # extension
@@ -1829,29 +1856,17 @@ class YoutubeDL(object):
def selector_function(ctx):
formats = list(ctx['formats'])
if not formats:
return
matches = list(filter(filter_f, formats)) if filter_f is not None else formats
if format_fallback and ctx['incomplete_formats'] and not matches:
# for extractors with incomplete formats (audio only (soundcloud)
# or video only (imgur)) best/worst will fallback to
# best/worst {video,audio}-only format
matches = formats
if format_reverse:
matches = matches[::-1]
if check_formats:
matches = list(itertools.islice(_check_formats(matches), format_idx))
n = len(matches)
if -n <= format_idx - 1 < n:
matches = LazyList(_check_formats(matches[::-1 if format_reverse else 1]))
try:
yield matches[format_idx - 1]
except IndexError:
return
elif selector.type == MERGE: # +
selector_1, selector_2 = map(_build_selector_function, selector.selector)
def selector_function(ctx):
for pair in itertools.product(
selector_1(copy.deepcopy(ctx)), selector_2(copy.deepcopy(ctx))):
yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters]
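One behavioural consequence of the new _filter_f wrapper: 'b*'/'w*' no longer admit formats where both codecs are 'none' (e.g. storyboard images), while plain 'b' still requires both streams. A toy comparison with made-up formats:

formats = [
    {'format_id': 'sb0', 'vcodec': 'none', 'acodec': 'none'},       # storyboard
    {'format_id': '140', 'vcodec': 'none', 'acodec': 'mp4a.40.2'},  # audio only
    {'format_id': '22', 'vcodec': 'avc1', 'acodec': 'mp4a.40.2'},   # muxed
]
is_b = lambda f: f.get('vcodec') != 'none' and f.get('acodec') != 'none'
is_b_star = lambda f: f.get('vcodec') != 'none' or f.get('acodec') != 'none'
print([f['format_id'] for f in formats if is_b(f)])       # ['22']
print([f['format_id'] for f in formats if is_b_star(f)])  # ['140', '22']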
@@ -1914,8 +1929,7 @@ class YoutubeDL(object):
self.cookiejar.add_cookie_header(pr)
return pr.get_header('Cookie')
@staticmethod
def _sanitize_thumbnails(info_dict):
def _sanitize_thumbnails(self, info_dict):
thumbnails = info_dict.get('thumbnails')
if thumbnails is None:
thumbnail = info_dict.get('thumbnail')
@@ -1928,12 +1942,25 @@ class YoutubeDL(object):
t.get('height') if t.get('height') is not None else -1,
t.get('id') if t.get('id') is not None else '',
t.get('url')))
def test_thumbnail(t):
self.to_screen('[info] Testing thumbnail %s' % t['id'])
try:
self.urlopen(HEADRequest(t['url']))
except network_exceptions as err:
self.to_screen('[info] Unable to connect to thumbnail %s URL "%s" - %s. Skipping...' % (
t['id'], t['url'], error_to_compat_str(err)))
return False
return True
for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
t['id'] = '%d' % i
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
t['url'] = sanitize_url(t['url'])
if self.params.get('check_formats'):
info_dict['thumbnails'] = LazyList(filter(test_thumbnail, thumbnails[::-1])).reverse()
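The probe itself is a plain HEAD request; a stand-alone equivalent of test_thumbnail (function name hypothetical) could look like:

import urllib.request

def thumbnail_ok(url, timeout=10):
    # mirrors the HEADRequest probe above: keep the thumbnail iff the URL answers
    try:
        with urllib.request.urlopen(urllib.request.Request(url, method='HEAD'), timeout=timeout):
            return True
    except OSError:
        return False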
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
@@ -1973,10 +2000,6 @@ class YoutubeDL(object):
self._sanitize_thumbnails(info_dict)
if self.params.get('list_thumbnails'):
self.list_thumbnails(info_dict)
return
thumbnail = info_dict.get('thumbnail')
thumbnails = info_dict.get('thumbnails')
if thumbnail:
@@ -2019,13 +2042,6 @@ class YoutubeDL(object):
automatic_captions = info_dict.get('automatic_captions')
subtitles = info_dict.get('subtitles')
if self.params.get('listsubtitles', False):
if 'automatic_captions' in info_dict:
self.list_subtitles(
info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
return
info_dict['requested_subtitles'] = self.process_subtitles(
info_dict['id'], subtitles, automatic_captions)
@@ -2113,18 +2129,27 @@ class YoutubeDL(object):
info_dict, _ = self.pre_process(info_dict)
if self.params.get('listformats'):
if not info_dict.get('formats'):
raise ExtractorError('No video formats found', expected=True)
self.list_formats(info_dict)
list_only = self.params.get('list_thumbnails') or self.params.get('listformats') or self.params.get('listsubtitles')
if list_only:
self.__forced_printings(info_dict, self.prepare_filename(info_dict), incomplete=True)
if self.params.get('list_thumbnails'):
self.list_thumbnails(info_dict)
if self.params.get('listformats'):
if not info_dict.get('formats'):
raise ExtractorError('No video formats found', expected=True)
self.list_formats(info_dict)
if self.params.get('listsubtitles'):
if 'automatic_captions' in info_dict:
self.list_subtitles(
info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
return
req_format = self.params.get('format')
if req_format is None:
format_selector = self.format_selector
if format_selector is None:
req_format = self._default_format_spec(info_dict, download=download)
self.write_debug('Default format spec: %s' % req_format)
format_selector = self.build_format_selector(req_format)
format_selector = self.build_format_selector(req_format)
# While in format selection we may need to have an access to the original
# format set in order to calculate some metrics or do some processing.
@@ -2158,6 +2183,8 @@ class YoutubeDL(object):
raise ExtractorError('Requested format is not available', expected=True)
else:
self.report_warning('Requested format is not available')
# Process what we can, even without any available formats.
self.process_info(dict(info_dict))
elif download:
self.to_screen(
'[info] %s: Downloading %d format(s): %s' % (
@@ -2322,7 +2349,7 @@ class YoutubeDL(object):
# TODO: backward compatibility, to be removed
info_dict['fulltitle'] = info_dict['title']
if 'format' not in info_dict:
if 'format' not in info_dict and 'ext' in info_dict:
info_dict['format'] = info_dict['ext']
if self._match_entry(info_dict) is not None:
@@ -2337,7 +2364,7 @@ class YoutubeDL(object):
files_to_move = {}
# Forced printings
self.__forced_printings(info_dict, full_filename, incomplete=False)
self.__forced_printings(info_dict, full_filename, incomplete=('format' not in info_dict))
if self.params.get('simulate', False):
if self.params.get('force_write_download_archive', False):
@@ -2658,65 +2685,53 @@ class YoutubeDL(object):
return
if success and full_filename != '-':
# Fixup content
fixup_policy = self.params.get('fixup')
if fixup_policy is None:
fixup_policy = 'detect_or_warn'
INSTALL_FFMPEG_MESSAGE = 'Install ffmpeg to fix this automatically.'
def fixup():
do_fixup = True
fixup_policy = self.params.get('fixup')
vid = info_dict['id']
stretched_ratio = info_dict.get('stretched_ratio')
if stretched_ratio is not None and stretched_ratio != 1:
if fixup_policy == 'warn':
self.report_warning('%s: Non-uniform pixel ratio (%s)' % (
info_dict['id'], stretched_ratio))
elif fixup_policy == 'detect_or_warn':
stretched_pp = FFmpegFixupStretchedPP(self)
if stretched_pp.available:
info_dict['__postprocessors'].append(stretched_pp)
if fixup_policy in ('ignore', 'never'):
return
elif fixup_policy == 'warn':
do_fixup = False
elif fixup_policy != 'force':
assert fixup_policy in ('detect_or_warn', None)
if not info_dict.get('__real_download'):
do_fixup = False
def ffmpeg_fixup(cndn, msg, cls):
if not cndn:
return
if not do_fixup:
self.report_warning(f'{vid}: {msg}')
return
pp = cls(self)
if pp.available:
info_dict['__postprocessors'].append(pp)
else:
self.report_warning(
'%s: Non-uniform pixel ratio (%s). %s'
% (info_dict['id'], stretched_ratio, INSTALL_FFMPEG_MESSAGE))
else:
assert fixup_policy in ('ignore', 'never')
self.report_warning(f'{vid}: {msg}. Install ffmpeg to fix this automatically')
if (info_dict.get('requested_formats') is None
and info_dict.get('container') == 'm4a_dash'
and info_dict.get('ext') == 'm4a'):
if fixup_policy == 'warn':
self.report_warning(
'%s: writing DASH m4a. '
'Only some players support this container.'
% info_dict['id'])
elif fixup_policy == 'detect_or_warn':
fixup_pp = FFmpegFixupM4aPP(self)
if fixup_pp.available:
info_dict['__postprocessors'].append(fixup_pp)
else:
self.report_warning(
'%s: writing DASH m4a. '
'Only some players support this container. %s'
% (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
else:
assert fixup_policy in ('ignore', 'never')
stretched_ratio = info_dict.get('stretched_ratio')
ffmpeg_fixup(
stretched_ratio not in (1, None),
f'Non-uniform pixel ratio {stretched_ratio}',
FFmpegFixupStretchedPP)
if ('protocol' in info_dict
and get_suitable_downloader(info_dict, self.params).__name__ == 'HlsFD'):
if fixup_policy == 'warn':
self.report_warning('%s: malformed AAC bitstream detected.' % (
info_dict['id']))
elif fixup_policy == 'detect_or_warn':
fixup_pp = FFmpegFixupM3u8PP(self)
if fixup_pp.available:
info_dict['__postprocessors'].append(fixup_pp)
else:
self.report_warning(
'%s: malformed AAC bitstream detected. %s'
% (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
else:
assert fixup_policy in ('ignore', 'never')
ffmpeg_fixup(
(info_dict.get('requested_formats') is None
and info_dict.get('container') == 'm4a_dash'
and info_dict.get('ext') == 'm4a'),
'writing DASH m4a. Only some players support this container',
FFmpegFixupM4aPP)
downloader = (get_suitable_downloader(info_dict, self.params).__name__
if 'protocol' in info_dict else None)
ffmpeg_fixup(downloader == 'HlsFD', 'malformed AAC bitstream detected', FFmpegFixupM3u8PP)
ffmpeg_fixup(downloader == 'WebSocketFragmentFD', 'malformed timestamps detected', FFmpegFixupTimestampPP)
ffmpeg_fixup(downloader == 'WebSocketFragmentFD', 'malformed duration detected', FFmpegFixupDurationPP)
fixup()
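The ladder of branches collapses into a small policy table; as a sketch (assuming the policy string was already validated by the option parser):

FIXUP_SKIP, FIXUP_WARN, FIXUP_RUN = 'skip', 'warn', 'run'

def fixup_action(policy, really_downloaded):
    if policy in ('ignore', 'never'):
        return FIXUP_SKIP                       # say nothing, do nothing
    if policy == 'warn':
        return FIXUP_WARN                       # report, but leave the file alone
    if policy == 'force' or really_downloaded:  # 'detect_or_warn'/None is the default
        return FIXUP_RUN                        # attach the ffmpeg postprocessor
    return FIXUP_WARN                           # nothing was downloaded; just warn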
try:
info_dict = self.post_process(dl_filename, info_dict, files_to_move)
except PostProcessingError as err:
@@ -2776,7 +2791,7 @@ class YoutubeDL(object):
info = self.filter_requested_info(json.loads('\n'.join(f)), self.params.get('clean_infojson', True))
try:
self.process_ie_result(info, download=True)
except (DownloadError, EntryNotInPlaylist):
except (DownloadError, EntryNotInPlaylist, ThrottledDownload):
webpage_url = info.get('webpage_url')
if webpage_url is not None:
self.report_warning('The info failed to download, trying with "%s"' % webpage_url)
@@ -2798,7 +2813,7 @@ class YoutubeDL(object):
info_dict['epoch'] = int(time.time())
reject = lambda k, v: k in remove_keys
filter_fn = lambda obj: (
list(map(filter_fn, obj)) if isinstance(obj, (list, tuple, set))
list(map(filter_fn, obj)) if isinstance(obj, (LazyList, list, tuple, set))
else obj if not isinstance(obj, dict)
else dict((k, filter_fn(v)) for k, v in obj.items() if not reject(k, v)))
return filter_fn(info_dict)
@@ -2909,6 +2924,8 @@ class YoutubeDL(object):
@staticmethod
def format_resolution(format, default='unknown'):
if format.get('vcodec') == 'none':
if format.get('acodec') == 'none':
return 'images'
return 'audio only'
if format.get('resolution') is not None:
return format['resolution']
@@ -2993,7 +3010,7 @@ class YoutubeDL(object):
formats = info_dict.get('formats', [info_dict])
new_format = (
'list-formats' not in self.params.get('compat_opts', [])
and self.params.get('list_formats_as_table', True) is not False)
and self.params.get('listformats_table', True) is not False)
if new_format:
table = [
[
@@ -3028,22 +3045,19 @@ class YoutubeDL(object):
header_line = ['format code', 'extension', 'resolution', 'note']
self.to_screen(
'[info] Available formats for %s:\n%s' % (info_dict['id'], render_table(
header_line,
table,
delim=new_format,
extraGap=(0 if new_format else 1),
hideEmpty=new_format)))
'[info] Available formats for %s:' % info_dict['id'])
self.to_stdout(render_table(
header_line, table, delim=new_format, extraGap=(0 if new_format else 1), hideEmpty=new_format))
def list_thumbnails(self, info_dict):
thumbnails = info_dict.get('thumbnails')
thumbnails = list(info_dict.get('thumbnails'))
if not thumbnails:
self.to_screen('[info] No thumbnails present for %s' % info_dict['id'])
return
self.to_screen(
'[info] Thumbnails for %s:' % info_dict['id'])
self.to_screen(render_table(
self.to_stdout(render_table(
['ID', 'width', 'height', 'URL'],
[[t['id'], t.get('width', 'unknown'), t.get('height', 'unknown'), t['url']] for t in thumbnails]))
@@ -3055,12 +3069,12 @@ class YoutubeDL(object):
'Available %s for %s:' % (name, video_id))
def _row(lang, formats):
exts, names = zip(*((f['ext'], f.get('name', 'unknown')) for f in reversed(formats)))
exts, names = zip(*((f['ext'], f.get('name') or 'unknown') for f in reversed(formats)))
if len(set(names)) == 1:
names = [] if names[0] == 'unknown' else names[:1]
return [lang, ', '.join(names), ', '.join(exts)]
self.to_screen(render_table(
self.to_stdout(render_table(
['Language', 'Name', 'Formats'],
[_row(lang, formats) for lang, formats in subtitles.items()],
hideEmpty=True))
@@ -3238,7 +3252,7 @@ class YoutubeDL(object):
multiple = write_all and len(thumbnails) > 1
ret = []
for t in thumbnails[::1 if write_all else -1]:
for t in thumbnails[::-1]:
thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '%s.' % t['id'] if multiple else ''
thumb_display_id = '%s ' % t['id'] if multiple else ''
@@ -3246,6 +3260,7 @@ class YoutubeDL(object):
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(thumb_filename)):
ret.append(suffix + thumb_ext)
t['filepath'] = thumb_filename
self.to_screen('[%s] %s: Thumbnail %sis already present' %
(info_dict['extractor'], info_dict['id'], thumb_display_id))
else:

yt_dlp/__init__.py

@@ -151,6 +151,11 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid rate limit specified')
opts.ratelimit = numeric_limit
if opts.throttledratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.throttledratelimit)
if numeric_limit is None:
parser.error('invalid rate limit specified')
opts.throttledratelimit = numeric_limit
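The new value accepts the same human-readable byte suffixes as --limit-rate and surfaces on the command line as --throttled-rate:

from yt_dlp.downloader.common import FileDownloader

print(FileDownloader.parse_bytes('100K'))  # 102400
# CLI equivalent (re-extracts video data when the speed stays below the limit):
#   yt-dlp --throttled-rate 100K <URL>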
if opts.min_filesize is not None:
numeric_limit = FileDownloader.parse_bytes(opts.min_filesize)
if numeric_limit is None:
@@ -268,6 +273,7 @@ def _real_main(argv=None):
'filename', 'format-sort', 'abort-on-error', 'format-spec', 'no-playlist-metafiles',
'multistreams', 'no-live-chat', 'playlist-index', 'list-formats', 'no-direct-merge',
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-attach-info-json',
'embed-thumbnail-atomicparsley', 'seperate-video-versions',
]
compat_opts = parse_compat_opts()
@@ -551,6 +557,7 @@ def _real_main(argv=None):
'ignoreerrors': opts.ignoreerrors,
'force_generic_extractor': opts.force_generic_extractor,
'ratelimit': opts.ratelimit,
'throttledratelimit': opts.throttledratelimit,
'overwrites': opts.overwrites,
'retries': opts.retries,
'fragment_retries': opts.fragment_retries,
@@ -624,6 +631,7 @@ def _real_main(argv=None):
'include_ads': opts.include_ads,
'default_search': opts.default_search,
'dynamic_mpd': opts.dynamic_mpd,
'extractor_args': opts.extractor_args,
'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
'youtube_include_hls_manifest': opts.youtube_include_hls_manifest,
'encoding': opts.encoding,

yt_dlp/compat.py

@@ -3030,6 +3030,21 @@ except AttributeError:
compat_Match = type(re.compile('').match(''))
import asyncio
try:
compat_asyncio_run = asyncio.run
except AttributeError:
def compat_asyncio_run(coro):
try:
loop = asyncio.get_event_loop()
except RuntimeError:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(coro)
asyncio.run = compat_asyncio_run
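A usage sketch of the shim; note that, unlike the real asyncio.run, the 3.6 fallback above does not return the coroutine's result:

from yt_dlp.compat import compat_asyncio_run

async def probe():
    return 42

compat_asyncio_run(probe())  # 42 on Python 3.7+, None via the 3.6 fallback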
__all__ = [
'compat_HTMLParseError',
'compat_HTMLParser',
@@ -3037,6 +3052,7 @@ __all__ = [
'compat_Match',
'compat_Pattern',
'compat_Struct',
'compat_asyncio_run',
'compat_b64decode',
'compat_basestring',
'compat_chr',

yt_dlp/downloader/__init__.py

@@ -22,8 +22,10 @@ from .http import HttpFD
from .rtmp import RtmpFD
from .rtsp import RtspFD
from .ism import IsmFD
from .mhtml import MhtmlFD
from .niconico import NiconicoDmcFD
from .youtube_live_chat import YoutubeLiveChatReplayFD
from .websocket import WebSocketFragmentFD
from .youtube_live_chat import YoutubeLiveChatFD
from .external import (
get_external_downloader,
FFmpegFD,
@@ -39,8 +41,11 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'mhtml': MhtmlFD,
'niconico_dmc': NiconicoDmcFD,
'youtube_live_chat_replay': YoutubeLiveChatReplayFD,
'websocket_frag': WebSocketFragmentFD,
'youtube_live_chat': YoutubeLiveChatFD,
'youtube_live_chat_replay': YoutubeLiveChatFD,
}
@@ -50,6 +55,7 @@ def shorten_protocol_name(proto, simplify=False):
'rtmp_ffmpeg': 'rtmp_f',
'http_dash_segments': 'dash',
'niconico_dmc': 'dmc',
'websocket_frag': 'WSfrag',
}
if simplify:
short_protocol_names.update({

yt_dlp/downloader/common.py

@@ -32,6 +32,7 @@ class FileDownloader(object):
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.
ratelimit: Download speed limit, in bytes/sec.
throttledratelimit: Assume the download is being throttled if the speed falls below this value (bytes/sec)
retries: Number of times to retry for HTTP error 5xx
buffersize: Size of download buffer in bytes.
noresizebuffer: Do not automatically resize the download buffer.

yt_dlp/downloader/dash.py

@@ -1,21 +1,9 @@
from __future__ import unicode_literals
import errno
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
DownloadError,
sanitize_open,
urljoin,
)
from ..utils import urljoin
class DashSegmentsFD(FragmentFD):
@@ -43,9 +31,6 @@ class DashSegmentsFD(FragmentFD):
else:
self._prepare_and_start_frag_download(ctx)
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
fragments_to_download = []
frag_index = 0
for i, fragment in enumerate(fragments):
@@ -72,120 +57,6 @@ class DashSegmentsFD(FragmentFD):
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
# fd.add_progress_hook(ph)
success = fd.real_download(filename, info_copy)
if not success:
return False
else:
def download_fragment(fragment):
i = fragment['index']
frag_index = fragment['frag_index']
fragment_url = fragment['url']
return fd.real_download(filename, info_copy)
ctx['fragment_index'] = frag_index
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts
# are usually enough), thus allowing the whole file to be downloaded successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
break
raise
if count > fragment_retries:
if not fatal:
return False, frag_index
ctx['dest_stream'].close()
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return False, frag_index
return frag_content, frag_index
def append_fragment(frag_content, frag_index):
fatal = frag_index == 1 or not skip_unavailable_fragments
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
self._append_fragment(ctx, frag_content)
return True
except EnvironmentError as ose:
if ose.errno != errno.ENOENT:
raise
# FileNotFoundError
if not fatal:
self.report_skip_fragment(frag_index)
return True
else:
ctx['dest_stream'].close()
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if not fatal:
self.report_skip_fragment(frag_index)
return True
else:
ctx['dest_stream'].close()
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is only that of one thread. This is a known issue')
_download_fragment = lambda f: (f, download_fragment(f)[1])
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(_download_fragment, fragment) for fragment in fragments_to_download]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be none to cancel
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
for fragment, frag_index in map(lambda x: x.result(), futures):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
fragment['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
down.close()
result = append_fragment(frag_content, frag_index)
if not result:
return False
else:
for fragment in fragments_to_download:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
self._finish_frag_download(ctx)
return True
return self.download_and_append_fragments(ctx, fragments_to_download, info_dict)

yt_dlp/downloader/external.py

@@ -280,6 +280,8 @@ class Aria2cFD(ExternalFD):
'--file-allocation=none', '-x16', '-j16', '-s16']
if 'fragments' in info_dict:
cmd += ['--allow-overwrite=true', '--allow-piece-length-change=true']
else:
cmd += ['--min-split-size', '1M']
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
@@ -345,6 +347,10 @@ class FFmpegFD(ExternalFD):
# TODO: Fix path for ffmpeg
return FFmpegPostProcessor().available
def on_process_started(self, proc, stdin):
""" Override this in subclasses """
pass
def _call_downloader(self, tmpfilename, info_dict):
urls = [f['url'] for f in info_dict.get('requested_formats', [])] or [info_dict['url']]
ffpp = FFmpegPostProcessor(downloader=self)
@@ -371,8 +377,6 @@ class FFmpegFD(ExternalFD):
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
args += self._configuration_args()
# start_time = info_dict.get('start_time') or 0
# if start_time:
# args += ['-ss', compat_str(start_time)]
@@ -440,7 +444,8 @@ class FFmpegFD(ExternalFD):
for url in urls:
args += ['-i', url]
args += ['-c', 'copy']
args += self._configuration_args() + ['-c', 'copy']
if info_dict.get('requested_formats'):
for (i, fmt) in enumerate(info_dict['requested_formats']):
if fmt.get('acodec') != 'none':
@@ -472,6 +477,8 @@ class FFmpegFD(ExternalFD):
self._debug_cmd(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
if url in ('-', 'pipe:'):
self.on_process_started(proc, proc.stdin)
try:
retval = proc.wait()
except BaseException as e:
@@ -480,7 +487,7 @@ class FFmpegFD(ExternalFD):
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32':
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32' and url not in ('-', 'pipe:'):
process_communicate_or_kill(proc, b'q')
else:
proc.kill()
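Writing 'q' asks ffmpeg to finalise the output (write the trailer) instead of being killed mid-write; a minimal reproduction of the pattern outside yt-dlp (command is illustrative):

import subprocess

proc = subprocess.Popen(
    ['ffmpeg', '-i', 'https://example.invalid/live.m3u8', '-c', 'copy', 'out.mp4'],
    stdin=subprocess.PIPE)
try:
    proc.wait()
except KeyboardInterrupt:
    proc.communicate(b'q')  # graceful stop: ffmpeg flushes and closes the file
    raise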

yt_dlp/downloader/fragment.py

@@ -4,9 +4,26 @@ import os
import time
import json
try:
from Crypto.Cipher import AES
can_decrypt_frag = True
except ImportError:
can_decrypt_frag = False
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from .common import FileDownloader
from .http import HttpFD
from ..compat import (
compat_urllib_error,
compat_struct_pack,
)
from ..utils import (
DownloadError,
error_to_compat_str,
encodeFilename,
sanitize_open,
@@ -56,7 +73,7 @@ class FragmentFD(FileDownloader):
def report_retry_fragment(self, err, frag_index, count, retries):
self.to_screen(
'[download] Got server HTTP error: %s. Retrying fragment %d (attempt %d of %s) ...'
'\r[download] Got server HTTP error: %s. Retrying fragment %d (attempt %d of %s) ...'
% (error_to_compat_str(err), frag_index, count, self.format_retries(retries)))
def report_skip_fragment(self, frag_index):
@@ -112,11 +129,15 @@ class FragmentFD(FileDownloader):
return False, None
if fragment_info_dict.get('filetime'):
ctx['fragment_filetime'] = fragment_info_dict.get('filetime')
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = fragment_filename
return True, self._read_fragment(ctx)
def _read_fragment(self, ctx):
down, frag_sanitized = sanitize_open(ctx['fragment_filename_sanitized'], 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
down.close()
return True, frag_content
return frag_content
def _append_fragment(self, ctx, frag_content):
try:
@@ -304,3 +325,101 @@ class FragmentFD(FileDownloader):
'tmpfilename': tmpfilename,
'fragment_index': 0,
})
def download_and_append_fragments(self, ctx, fragments, info_dict, pack_func=None):
fragment_retries = self.params.get('fragment_retries', 0)
is_fatal = (lambda idx: idx == 0) if self.params.get('skip_unavailable_fragments', True) else (lambda _: True)
if not pack_func:
pack_func = lambda frag_content, _: frag_content
def download_fragment(fragment, ctx):
frag_index = ctx['fragment_index'] = fragment['frag_index']
headers = info_dict.get('http_headers', {})
byte_range = fragment.get('byte_range')
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
# Never skip the first fragment
fatal = is_fatal(fragment.get('index') or (frag_index - 1))
count, frag_content = 0, None
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment['url'], info_dict, headers)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165 and
# https://github.com/ytdl-org/youtube-dl/issues/10448.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
break
raise
if count > fragment_retries:
if not fatal:
return False, frag_index
ctx['dest_stream'].close()
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return False, frag_index
return frag_content, frag_index
def decrypt_fragment(fragment, frag_content):
decrypt_info = fragment.get('decrypt_info')
if not decrypt_info or decrypt_info['METHOD'] != 'AES-128':
return frag_content
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', fragment['media_sequence'])
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests, since the data is explicitly truncated and is not padded to a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data was downloaded,
# not what it decrypts to.
if self.params.get('test', False):
return frag_content
return AES.new(decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
def append_fragment(frag_content, frag_index, ctx):
if not frag_content:
if not is_fatal(frag_index - 1):
self.report_skip_fragment(frag_index)
return True
else:
ctx['dest_stream'].close()
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
self._append_fragment(ctx, pack_func(frag_content, frag_index))
return True
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
def _download_fragment(fragment):
ctx_copy = ctx.copy()
frag_content, frag_index = download_fragment(fragment, ctx_copy)
return fragment, frag_content, frag_index, ctx_copy.get('fragment_filename_sanitized')
self.report_warning('The download speed shown is only that of one thread. This is a known issue and patches are welcome')
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
for fragment, frag_content, frag_index, frag_filename in pool.map(_download_fragment, fragments):
ctx['fragment_filename_sanitized'] = frag_filename
ctx['fragment_index'] = frag_index
result = append_fragment(decrypt_fragment(fragment, frag_content), frag_index, ctx)
if not result:
return False
else:
for fragment in fragments:
frag_content, frag_index = download_fragment(fragment, ctx)
result = append_fragment(decrypt_fragment(fragment, frag_content), frag_index, ctx)
if not result:
return False
self._finish_frag_download(ctx)
return True
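For reference, a stand-alone sketch of the AES-128-CBC step performed by decrypt_fragment (requires pycryptodome; per RFC 8216 the default IV is the big-endian media sequence number):

import struct
from Crypto.Cipher import AES

def decrypt_hls_fragment(data, key, media_sequence, iv=None):
    # key: 16 bytes fetched from the playlist's EXT-X-KEY URI
    iv = iv or struct.pack('>8xq', media_sequence)
    return AES.new(key, AES.MODE_CBC, iv).decrypt(data)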

yt_dlp/downloader/hls.py

@@ -1,32 +1,18 @@
from __future__ import unicode_literals
import errno
import re
import io
import binascii
try:
from Crypto.Cipher import AES
can_decrypt_frag = True
except ImportError:
can_decrypt_frag = False
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader
from .fragment import FragmentFD
from .fragment import FragmentFD, can_decrypt_frag
from .external import FFmpegFD
from ..compat import (
compat_urllib_error,
compat_urlparse,
compat_struct_pack,
)
from ..utils import (
parse_m3u8_attributes,
sanitize_open,
update_url_query,
bug_reports_message,
)
@@ -151,10 +137,6 @@ class HlsFD(FragmentFD):
extra_state = ctx.setdefault('extra_state', {})
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
test = self.params.get('test', False)
format_index = info_dict.get('format_index')
extra_query = None
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
@@ -258,7 +240,7 @@ class HlsFD(FragmentFD):
media_sequence += 1
# We only download the first fragment during the test
if test:
if self.params.get('test', False):
fragments = [fragments[0] if fragments else None]
if real_downloader:
@@ -268,195 +250,75 @@ class HlsFD(FragmentFD):
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
# fd.add_progress_hook(ph)
success = fd.real_download(filename, info_copy)
if not success:
return False
else:
def decrypt_fragment(fragment, frag_content):
decrypt_info = fragment['decrypt_info']
if decrypt_info['METHOD'] != 'AES-128':
return frag_content
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', fragment['media_sequence'])
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if test:
return frag_content
return AES.new(decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
return fd.real_download(filename, info_copy)
def download_fragment(fragment):
frag_index = fragment['frag_index']
frag_url = fragment['url']
byte_range = fragment['byte_range']
if is_webvtt:
def pack_fragment(frag_content, frag_index):
output = io.StringIO()
adjust = 0
for block in webvtt.parse_fragment(frag_content):
if isinstance(block, webvtt.CueBlock):
block.start += adjust
block.end += adjust
ctx['fragment_index'] = frag_index
dedup_window = extra_state.setdefault('webvtt_dedup_window', [])
cue = block.as_json
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
ctx, frag_url, info_dict, headers)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165 and
# https://github.com/ytdl-org/youtube-dl/issues/10448.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
ctx['dest_stream'].close()
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return False, frag_index
return decrypt_fragment(fragment, frag_content), frag_index
pack_fragment = lambda frag_content, _: frag_content
if is_webvtt:
def pack_fragment(frag_content, frag_index):
output = io.StringIO()
adjust = 0
for block in webvtt.parse_fragment(frag_content):
if isinstance(block, webvtt.CueBlock):
block.start += adjust
block.end += adjust
dedup_window = extra_state.setdefault('webvtt_dedup_window', [])
cue = block.as_json
# skip the cue if an identical one appears
# in the window of potential duplicates
# and prune the window of unviable candidates
i = 0
skip = True
while i < len(dedup_window):
window_cue = dedup_window[i]
if window_cue == cue:
break
if window_cue['end'] >= cue['start']:
i += 1
continue
del dedup_window[i]
else:
skip = False
if skip:
# skip the cue if an identical one appears
# in the window of potential duplicates
# and prune the window of unviable candidates
i = 0
skip = True
while i < len(dedup_window):
window_cue = dedup_window[i]
if window_cue == cue:
break
if window_cue['end'] >= cue['start']:
i += 1
continue
# add the cue to the window
dedup_window.append(cue)
elif isinstance(block, webvtt.Magic):
# take care of MPEG PES timestamp overflow
if block.mpegts is None:
block.mpegts = 0
extra_state.setdefault('webvtt_mpegts_adjust', 0)
block.mpegts += extra_state['webvtt_mpegts_adjust'] << 33
if block.mpegts < extra_state.get('webvtt_mpegts_last', 0):
extra_state['webvtt_mpegts_adjust'] += 1
block.mpegts += 1 << 33
extra_state['webvtt_mpegts_last'] = block.mpegts
if frag_index == 1:
extra_state['webvtt_mpegts'] = block.mpegts or 0
extra_state['webvtt_local'] = block.local or 0
# XXX: block.local = block.mpegts = None ?
else:
if block.mpegts is not None and block.local is not None:
adjust = (
(block.mpegts - extra_state.get('webvtt_mpegts', 0))
- (block.local - extra_state.get('webvtt_local', 0))
)
continue
elif isinstance(block, webvtt.HeaderBlock):
if frag_index != 1:
# XXX: this should probably be silent as well
# or verify that all segments contain the same data
self.report_warning(bug_reports_message(
'Discarding a %s block found in the middle of the stream; '
'if the subtitles display incorrectly,'
% (type(block).__name__)))
continue
block.write_into(output)
return output.getvalue().encode('utf-8')
def append_fragment(frag_content, frag_index):
fatal = frag_index == 1 or not skip_unavailable_fragments
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
frag_content = pack_fragment(frag_content, frag_index)
self._append_fragment(ctx, frag_content)
return True
except EnvironmentError as ose:
if ose.errno != errno.ENOENT:
raise
# FileNotFoundError
if not fatal:
self.report_skip_fragment(frag_index)
return True
del dedup_window[i]
else:
ctx['dest_stream'].close()
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if not fatal:
self.report_skip_fragment(frag_index)
return True
else:
ctx['dest_stream'].close()
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
skip = False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is only that of one thread. This is a known issue')
_download_fragment = lambda f: (f, download_fragment(f)[1])
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(_download_fragment, fragment) for fragment in fragments]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be none to cancel
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
if skip:
continue
for fragment, frag_index in map(lambda x: x.result(), futures):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
fragment['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
down.close()
result = append_fragment(decrypt_fragment(fragment, frag_content), frag_index)
if not result:
return False
else:
for fragment in fragments:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
# add the cue to the window
dedup_window.append(cue)
elif isinstance(block, webvtt.Magic):
# take care of MPEG PES timestamp overflow
if block.mpegts is None:
block.mpegts = 0
extra_state.setdefault('webvtt_mpegts_adjust', 0)
block.mpegts += extra_state['webvtt_mpegts_adjust'] << 33
if block.mpegts < extra_state.get('webvtt_mpegts_last', 0):
extra_state['webvtt_mpegts_adjust'] += 1
block.mpegts += 1 << 33
extra_state['webvtt_mpegts_last'] = block.mpegts
self._finish_frag_download(ctx)
return True
if frag_index == 1:
extra_state['webvtt_mpegts'] = block.mpegts or 0
extra_state['webvtt_local'] = block.local or 0
# XXX: block.local = block.mpegts = None ?
else:
if block.mpegts is not None and block.local is not None:
adjust = (
(block.mpegts - extra_state.get('webvtt_mpegts', 0))
- (block.local - extra_state.get('webvtt_local', 0))
)
continue
elif isinstance(block, webvtt.HeaderBlock):
if frag_index != 1:
# XXX: this should probably be silent as well
# or verify that all segments contain the same data
self.report_warning(bug_reports_message(
'Discarding a %s block found in the middle of the stream; '
'if the subtitles display incorrectly,'
% (type(block).__name__)))
continue
block.write_into(output)
return output.getvalue().encode('utf-8')
else:
pack_fragment = None
return self.download_and_append_fragments(ctx, fragments, info_dict, pack_fragment)

yt_dlp/downloader/http.py

@@ -18,6 +18,7 @@ from ..utils import (
int_or_none,
sanitize_open,
sanitized_Request,
ThrottledDownload,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
@@ -223,6 +224,7 @@ class HttpFD(FileDownloader):
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
now = None # needed for slow_down() in the first loop run
before = start # start measuring
throttle_start = None
def retry(e):
to_stdout = ctx.tmpfilename == '-'
@@ -313,6 +315,18 @@ class HttpFD(FileDownloader):
if data_len is not None and byte_counter == data_len:
break
if speed and speed < (self.params.get('throttledratelimit') or 0):
# The speed must stay below the limit for 3 seconds.
# This prevents raising an error when the speed temporarily dips.
if throttle_start is None:
throttle_start = now
elif now - throttle_start > 3:
if ctx.stream is not None and ctx.tmpfilename != '-':
ctx.stream.close()
raise ThrottledDownload()
else:
throttle_start = None
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
ctx.resume_len = byte_counter
# ctx.block_size = block_size
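The grace-period bookkeeping can be factored out as a tiny helper (a sketch, not the actual code):

def check_throttle(now, speed, limit, throttle_start):
    """Return (new_throttle_start, should_abort) for one progress tick."""
    if not limit or not speed or speed >= limit:
        return None, False   # fast enough: reset the window
    if throttle_start is None:
        return now, False    # first slow tick: open the 3-second window
    return throttle_start, now - throttle_start > 3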

yt_dlp/downloader/mhtml.py (new file, 202 lines)

@@ -0,0 +1,202 @@
# coding: utf-8
from __future__ import unicode_literals
import io
import quopri
import re
import uuid
from .fragment import FragmentFD
from ..utils import (
escapeHTML,
formatSeconds,
srt_subtitles_timecode,
urljoin,
)
from ..version import __version__ as YT_DLP_VERSION
class MhtmlFD(FragmentFD):
FD_NAME = 'mhtml'
_STYLESHEET = """\
html, body {
margin: 0;
padding: 0;
height: 100vh;
}
html {
overflow-y: scroll;
scroll-snap-type: y mandatory;
}
body {
scroll-snap-type: y mandatory;
display: flex;
flex-flow: column;
}
body > figure {
max-width: 100vw;
max-height: 100vh;
scroll-snap-align: center;
}
body > figure > figcaption {
text-align: center;
height: 2.5em;
}
body > figure > img {
display: block;
margin: auto;
max-width: 100%;
max-height: calc(100vh - 5em);
}
"""
_STYLESHEET = re.sub(r'\s+', ' ', _STYLESHEET)
_STYLESHEET = re.sub(r'\B \B|(?<=[\w\-]) (?=[^\w\-])|(?<=[^\w\-]) (?=[\w\-])', '', _STYLESHEET)
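The two substitutions act as a crude CSS minifier: collapse all whitespace runs to single spaces, then delete any space that does not separate two word characters. For example:

import re

css = 'body > figure {\n    max-width: 100vw;\n}'
css = re.sub(r'\s+', ' ', css)
css = re.sub(r'\B \B|(?<=[\w\-]) (?=[^\w\-])|(?<=[^\w\-]) (?=[\w\-])', '', css)
print(css)  # body>figure{max-width:100vw;}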
@staticmethod
def _escape_mime(s):
return '=?utf-8?Q?' + (b''.join(
bytes((b,)) if b >= 0x20 else b'=%02X' % b
for b in quopri.encodestring(s.encode('utf-8'), header=True)
)).decode('us-ascii') + '?='
def _gen_cid(self, i, fragment, frag_boundary):
return '%u.%s@yt-dlp.github.io.invalid' % (i, frag_boundary)
def _gen_stub(self, *, fragments, frag_boundary, title):
output = io.StringIO()
output.write((
'<!DOCTYPE html>'
'<html>'
'<head>'
'' '<meta name="generator" content="yt-dlp {version}">'
'' '<title>{title}</title>'
'' '<style>{styles}</style>'
'<body>'
).format(
version=escapeHTML(YT_DLP_VERSION),
styles=self._STYLESHEET,
title=escapeHTML(title)
))
t0 = 0
for i, frag in enumerate(fragments):
output.write('<figure>')
try:
t1 = t0 + frag['duration']
output.write((
'<figcaption>Slide #{num}: {t0} {t1} (duration: {duration})</figcaption>'
).format(
num=i + 1,
t0=srt_subtitles_timecode(t0),
t1=srt_subtitles_timecode(t1),
duration=formatSeconds(frag['duration'], msec=True)
))
except (KeyError, ValueError, TypeError):
t1 = None
output.write((
'<figcaption>Slide #{num}</figcaption>'
).format(num=i + 1))
output.write('<img src="cid:{cid}">'.format(
cid=self._gen_cid(i, frag, frag_boundary)))
output.write('</figure>')
t0 = t1
return output.getvalue()
def real_download(self, filename, info_dict):
fragment_base_url = info_dict.get('fragment_base_url')
fragments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
title = info_dict['title']
origin = info_dict['webpage_url']
ctx = {
'filename': filename,
'total_frags': len(fragments),
}
self._prepare_and_start_frag_download(ctx)
extra_state = ctx.setdefault('extra_state', {
'header_written': False,
'mime_boundary': str(uuid.uuid4()).replace('-', ''),
})
frag_boundary = extra_state['mime_boundary']
if not extra_state['header_written']:
stub = self._gen_stub(
fragments=fragments,
frag_boundary=frag_boundary,
title=title
)
ctx['dest_stream'].write((
'MIME-Version: 1.0\r\n'
'From: <nowhere@yt-dlp.github.io.invalid>\r\n'
'To: <nowhere@yt-dlp.github.io.invalid>\r\n'
'Subject: {title}\r\n'
'Content-type: multipart/related; '
'' 'boundary="{boundary}"; '
'' 'type="text/html"\r\n'
'X.yt-dlp.Origin: {origin}\r\n'
'\r\n'
'--{boundary}\r\n'
'Content-Type: text/html; charset=utf-8\r\n'
'Content-Length: {length}\r\n'
'\r\n'
'{stub}\r\n'
).format(
origin=origin,
boundary=frag_boundary,
length=len(stub),
title=self._escape_mime(title),
stub=stub
).encode('utf-8'))
extra_state['header_written'] = True
for i, fragment in enumerate(fragments):
if (i + 1) <= ctx['fragment_index']:
continue
fragment_url = urljoin(fragment_base_url, fragment['path'])
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
continue
mime_type = b'image/jpeg'
if frag_content.startswith(b'\x89PNG\r\n\x1a\n'):
mime_type = b'image/png'
if frag_content.startswith((b'GIF87a', b'GIF89a')):
mime_type = b'image/gif'
if frag_content.startswith(b'RIFF') and frag_content[8:12] == b'WEBP':  # compare bytes to bytes
mime_type = b'image/webp'
frag_header = io.BytesIO()
frag_header.write(
b'--%b\r\n' % frag_boundary.encode('us-ascii'))
frag_header.write(
b'Content-ID: <%b>\r\n' % self._gen_cid(i, fragment, frag_boundary).encode('us-ascii'))
frag_header.write(
b'Content-type: %b\r\n' % mime_type)
frag_header.write(
b'Content-length: %u\r\n' % len(frag_content))
frag_header.write(
b'Content-location: %b\r\n' % fragment_url.encode('us-ascii'))
frag_header.write(
b'X.yt-dlp.Duration: %f\r\n' % fragment['duration'])
frag_header.write(b'\r\n')
self._append_fragment(
ctx, frag_header.getvalue() + frag_content + b'\r\n')
ctx['dest_stream'].write(
b'--%b--\r\n\r\n' % frag_boundary.encode('us-ascii'))
self._finish_frag_download(ctx)
return True

yt_dlp/downloader/websocket.py (new file)

@@ -0,0 +1,59 @@
import os
import signal
import asyncio
import threading
try:
import websockets
has_websockets = True
except ImportError:
has_websockets = False
from .common import FileDownloader
from .external import FFmpegFD
class FFmpegSinkFD(FileDownloader):
""" A sink to ffmpeg for downloading fragments in any form """
def real_download(self, filename, info_dict):
info_copy = info_dict.copy()
info_copy['url'] = '-'
async def call_conn(proc, stdin):
try:
await self.real_connection(stdin, info_dict)
except (BrokenPipeError, OSError):
pass
finally:
try:
stdin.flush()
stdin.close()
except OSError:
pass
os.kill(os.getpid(), signal.SIGINT)
class FFmpegStdinFD(FFmpegFD):
@classmethod
def get_basename(cls):
return FFmpegFD.get_basename()
def on_process_started(self, proc, stdin):
thread = threading.Thread(target=asyncio.run, daemon=True, args=(call_conn(proc, stdin), ))
thread.start()
return FFmpegStdinFD(self.ydl, self.params or {}).download(filename, info_copy)
async def real_connection(self, sink, info_dict):
""" Override this in subclasses """
raise NotImplementedError('This method must be implemented by subclasses')
class WebSocketFragmentFD(FFmpegSinkFD):
async def real_connection(self, sink, info_dict):
async with websockets.connect(info_dict['url'], extra_headers=info_dict.get('http_headers', {})) as ws:
while True:
recv = await ws.recv()
if isinstance(recv, str):
recv = recv.encode('utf8')
sink.write(recv)
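End to end: real_download() spawns ffmpeg reading from stdin via FFmpegStdinFD, while a daemon thread runs real_connection() to pump every websocket frame into it. A stripped-down, hypothetical version of the websocket side:

import asyncio
import sys

import websockets  # third-party; guarded by has_websockets above

async def pump(ws_url, sink=sys.stdout.buffer):
    # copy each frame into a byte sink (e.g. a pipe feeding ffmpeg)
    async with websockets.connect(ws_url) as ws:
        while True:
            frame = await ws.recv()
            sink.write(frame if isinstance(frame, bytes) else frame.encode('utf8'))

# asyncio.run(pump('wss://example.invalid/stream'))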

yt_dlp/downloader/youtube_live_chat.py

@@ -1,20 +1,23 @@
from __future__ import division, unicode_literals
import json
import time
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
try_get,
dict_get,
int_or_none,
RegexNotFoundError,
)
from ..extractor.youtube import YoutubeBaseInfoExtractor as YT_BaseIE
class YoutubeLiveChatReplayFD(FragmentFD):
""" Downloads YouTube live chat replays fragment by fragment """
class YoutubeLiveChatFD(FragmentFD):
""" Downloads YouTube live chats fragment by fragment """
FD_NAME = 'youtube_live_chat_replay'
FD_NAME = 'youtube_live_chat'
def real_download(self, filename, info_dict):
video_id = info_dict['video_id']
@@ -31,6 +34,8 @@ class YoutubeLiveChatReplayFD(FragmentFD):
ie = YT_BaseIE(self.ydl)
start_time = int(time.time() * 1000)
def dl_fragment(url, data=None, headers=None):
http_headers = info_dict.get('http_headers', {})
if headers:
@@ -38,13 +43,78 @@ class YoutubeLiveChatReplayFD(FragmentFD):
http_headers.update(headers)
return self._download_fragment(ctx, url, info_dict, http_headers, data)
def download_and_parse_fragment(url, frag_index, request_data):
def parse_actions_replay(live_chat_continuation):
offset = continuation_id = click_tracking_params = None
processed_fragment = bytearray()
for action in live_chat_continuation.get('actions', []):
if 'replayChatItemAction' in action:
replay_chat_item_action = action['replayChatItemAction']
offset = int(replay_chat_item_action['videoOffsetTimeMsec'])
processed_fragment.extend(
json.dumps(action, ensure_ascii=False).encode('utf-8') + b'\n')
if offset is not None:
continuation = try_get(
live_chat_continuation,
lambda x: x['continuations'][0]['liveChatReplayContinuationData'], dict)
if continuation:
continuation_id = continuation.get('continuation')
click_tracking_params = continuation.get('clickTrackingParams')
self._append_fragment(ctx, processed_fragment)
return continuation_id, offset, click_tracking_params
def try_refresh_replay_beginning(live_chat_continuation):
# choose the second option that contains the unfiltered live chat replay
refresh_continuation = try_get(
live_chat_continuation,
lambda x: x['header']['liveChatHeaderRenderer']['viewSelector']['sortFilterSubMenuRenderer']['subMenuItems'][1]['continuation']['reloadContinuationData'], dict)
if refresh_continuation:
# no data yet but required to call _append_fragment
self._append_fragment(ctx, b'')
refresh_continuation_id = refresh_continuation.get('continuation')
offset = 0
click_tracking_params = refresh_continuation.get('trackingParams')
return refresh_continuation_id, offset, click_tracking_params
return parse_actions_replay(live_chat_continuation)
live_offset = 0
def parse_actions_live(live_chat_continuation):
nonlocal live_offset
continuation_id = click_tracking_params = None
processed_fragment = bytearray()
for action in live_chat_continuation.get('actions', []):
timestamp = self.parse_live_timestamp(action)
if timestamp is not None:
live_offset = timestamp - start_time
# compatibility with replay format
pseudo_action = {
'replayChatItemAction': {'actions': [action]},
'videoOffsetTimeMsec': str(live_offset),
'isLive': True,
}
processed_fragment.extend(
json.dumps(pseudo_action, ensure_ascii=False).encode('utf-8') + b'\n')
continuation_data_getters = [
lambda x: x['continuations'][0]['invalidationContinuationData'],
lambda x: x['continuations'][0]['timedContinuationData'],
]
continuation_data = try_get(live_chat_continuation, continuation_data_getters, dict)
if continuation_data:
continuation_id = continuation_data.get('continuation')
click_tracking_params = continuation_data.get('clickTrackingParams')
timeout_ms = int_or_none(continuation_data.get('timeoutMs'))
if timeout_ms is not None:
time.sleep(timeout_ms / 1000)
self._append_fragment(ctx, processed_fragment)
return continuation_id, live_offset, click_tracking_params
def download_and_parse_fragment(url, frag_index, request_data=None, headers=None):
count = 0
while count <= fragment_retries:
try:
success, raw_fragment = dl_fragment(url, request_data, {'content-type': 'application/json'})
success, raw_fragment = dl_fragment(url, request_data, headers)
if not success:
return False, None, None
return False, None, None, None
try:
data = ie._extract_yt_initial_data(video_id, raw_fragment.decode('utf-8', 'replace'))
except RegexNotFoundError:
@@ -54,28 +124,21 @@ class YoutubeLiveChatReplayFD(FragmentFD):
live_chat_continuation = try_get(
data,
lambda x: x['continuationContents']['liveChatContinuation'], dict) or {}
offset = continuation_id = None
processed_fragment = bytearray()
for action in live_chat_continuation.get('actions', []):
if 'replayChatItemAction' in action:
replay_chat_item_action = action['replayChatItemAction']
offset = int(replay_chat_item_action['videoOffsetTimeMsec'])
processed_fragment.extend(
json.dumps(action, ensure_ascii=False).encode('utf-8') + b'\n')
if offset is not None:
continuation_id = try_get(
live_chat_continuation,
lambda x: x['continuations'][0]['liveChatReplayContinuationData']['continuation'])
self._append_fragment(ctx, processed_fragment)
return True, continuation_id, offset
if info_dict['protocol'] == 'youtube_live_chat_replay':
if frag_index == 1:
continuation_id, offset, click_tracking_params = try_refresh_replay_beginning(live_chat_continuation)
else:
continuation_id, offset, click_tracking_params = parse_actions_replay(live_chat_continuation)
elif info_dict['protocol'] == 'youtube_live_chat':
continuation_id, offset, click_tracking_params = parse_actions_live(live_chat_continuation)
return True, continuation_id, offset, click_tracking_params
except compat_urllib_error.HTTPError as err:
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False, None, None
return False, None, None, None
self._prepare_and_start_frag_download(ctx)
@@ -100,9 +163,16 @@ class YoutubeLiveChatReplayFD(FragmentFD):
innertube_context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'])
if not api_key or not innertube_context:
return False
url = 'https://www.youtube.com/youtubei/v1/live_chat/get_live_chat_replay?key=' + api_key
visitor_data = try_get(innertube_context, lambda x: x['client']['visitorData'], str)
if info_dict['protocol'] == 'youtube_live_chat_replay':
url = 'https://www.youtube.com/youtubei/v1/live_chat/get_live_chat_replay?key=' + api_key
chat_page_url = 'https://www.youtube.com/live_chat_replay?continuation=' + continuation_id
elif info_dict['protocol'] == 'youtube_live_chat':
url = 'https://www.youtube.com/youtubei/v1/live_chat/get_live_chat?key=' + api_key
chat_page_url = 'https://www.youtube.com/live_chat?continuation=' + continuation_id
frag_index = offset = 0
click_tracking_params = None
while continuation_id is not None:
frag_index += 1
request_data = {
@@ -111,8 +181,16 @@ class YoutubeLiveChatReplayFD(FragmentFD):
}
if frag_index > 1:
request_data['currentPlayerState'] = {'playerOffsetMs': str(max(offset - 5000, 0))}
success, continuation_id, offset = download_and_parse_fragment(
url, frag_index, json.dumps(request_data, ensure_ascii=False).encode('utf-8') + b'\n')
if click_tracking_params:
request_data['context']['clickTracking'] = {'clickTrackingParams': click_tracking_params}
headers = ie._generate_api_headers(ytcfg, visitor_data=visitor_data)
headers.update({'content-type': 'application/json'})
fragment_request_data = json.dumps(request_data, ensure_ascii=False).encode('utf-8') + b'\n'
success, continuation_id, offset, click_tracking_params = download_and_parse_fragment(
url, frag_index, fragment_request_data, headers)
else:
success, continuation_id, offset, click_tracking_params = download_and_parse_fragment(
chat_page_url, frag_index)
if not success:
return False
if test:
@@ -120,3 +198,39 @@ class YoutubeLiveChatReplayFD(FragmentFD):
self._finish_frag_download(ctx)
return True
@staticmethod
def parse_live_timestamp(action):
action_content = dict_get(
action,
['addChatItemAction', 'addLiveChatTickerItemAction', 'addBannerToLiveChatCommand'])
if not isinstance(action_content, dict):
return None
item = dict_get(action_content, ['item', 'bannerRenderer'])
if not isinstance(item, dict):
return None
renderer = dict_get(item, [
# text
'liveChatTextMessageRenderer', 'liveChatPaidMessageRenderer',
'liveChatMembershipItemRenderer', 'liveChatPaidStickerRenderer',
# ticker
'liveChatTickerPaidMessageItemRenderer',
'liveChatTickerSponsorItemRenderer',
# banner
'liveChatBannerRenderer',
])
if not isinstance(renderer, dict):
return None
parent_item_getters = [
lambda x: x['showItemEndpoint']['showLiveChatItemEndpoint']['renderer'],
lambda x: x['contents'],
]
parent_item = try_get(renderer, parent_item_getters, dict)
if parent_item:
renderer = dict_get(parent_item, [
'liveChatTextMessageRenderer', 'liveChatPaidMessageRenderer',
'liveChatMembershipItemRenderer', 'liveChatPaidStickerRenderer',
])
if not isinstance(renderer, dict):
return None
return int_or_none(renderer.get('timestampUsec'), 1000)
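
The net effect of parse_actions_live is that each live action is re-wrapped so every output line matches the replay NDJSON shape; a toy example of one emitted line (the action payload is fabricated):

    import json

    action = {'addChatItemAction': {'item': {}}}  # hypothetical raw live action
    offset_ms = 1234                              # timestamp - start_time, in ms

    pseudo_action = {
        'replayChatItemAction': {'actions': [action]},
        'videoOffsetTimeMsec': str(offset_ms),
        'isLive': True,
    }
    line = json.dumps(pseudo_action, ensure_ascii=False).encode('utf-8') + b'\n'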


@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
import time
import xml.etree.ElementTree as etree
@@ -61,6 +62,11 @@ MSO_INFO = {
'username_field': 'IDToken1',
'password_field': 'IDToken2',
},
'Spectrum': {
'name': 'Spectrum',
'username_field': 'IDToken1',
'password_field': 'IDToken2',
},
'Philo': {
'name': 'Philo',
'username_field': 'ident'
@@ -1524,6 +1530,41 @@ class AdobePassIE(InfoExtractor):
}), headers={
'Content-Type': 'application/x-www-form-urlencoded'
})
elif mso_id == 'Spectrum':
# Spectrum's login form is dynamically loaded via JS, so we need to hardcode the flow
# as a one-off implementation.
provider_redirect_page, urlh = provider_redirect_page_res
provider_login_page_res = post_form(
provider_redirect_page_res, self._DOWNLOADING_LOGIN_PAGE)
saml_login_page, urlh = provider_login_page_res
relay_state = self._search_regex(
r'RelayState\s*=\s*"(?P<relay>.+?)";',
saml_login_page, 'RelayState', group='relay')
saml_request = self._search_regex(
r'SAMLRequest\s*=\s*"(?P<saml_request>.+?)";',
saml_login_page, 'SAMLRequest', group='saml_request')
login_json = {
mso_info['username_field']: username,
mso_info['password_field']: password,
'RelayState': relay_state,
'SAMLRequest': saml_request,
}
saml_response_json = self._download_json(
'https://tveauthn.spectrum.net/tveauthentication/api/v1/manualAuth', video_id,
'Downloading SAML Response',
data=json.dumps(login_json).encode(),
headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
})
self._download_webpage(
saml_response_json['SAMLRedirectUri'], video_id,
'Confirming Login', data=urlencode_postdata({
'SAMLResponse': saml_response_json['SAMLResponse'],
'RelayState': relay_state,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded'
})
else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh
# based redirect that should be followed.
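
As an aside, the variable scraping used for Spectrum above boils down to pulling JS string assignments out of the page; a toy illustration with fabricated page content, assuming the same `Name = "...";` pattern the extractor targets:

    import re

    page = 'var RelayState = "relay-123"; var SAMLRequest = "PHNhbWxwOl...";'
    relay_state = re.search(r'RelayState\s*=\s*"(.+?)"', page).group(1)
    saml_request = re.search(r'SAMLRequest\s*=\s*"(.+?)"', page).group(1)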


@@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,


@@ -281,7 +281,7 @@ class BiliBiliIE(InfoExtractor):
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})


@@ -24,7 +24,7 @@ class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza|dako)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '68993eda72ef62386a15ea2cf3c93107',
'md5': '37b2b7bb9b3dcaa05b67058dc3a714a9',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
@@ -32,9 +32,9 @@ class CanvasIE(InfoExtractor):
'title': 'Nachtwacht: De Greystook',
'description': 'Nachtwacht: De Greystook',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.04,
'duration': 1468.02,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
'expected_warnings': ['is not a supported codec'],
}, {
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,


@@ -70,6 +70,7 @@ from ..utils import (
str_or_none,
str_to_int,
strip_or_none,
traverse_obj,
unescapeHTML,
unified_strdate,
unified_timestamp,
@@ -290,6 +291,7 @@ class InfoExtractor(object):
categories: A list of categories that the video falls in, for example
["Sports", "Berlin"]
tags: A list of tags assigned to the video, e.g. ["sweden", "pop music"]
cast: A list of the video cast
is_live: True, False, or None (=unknown). Whether this video is a
live stream that goes on instead of a fixed-length video.
was_live: True, False, or None (=unknown). Whether this video was
@@ -1036,7 +1038,9 @@ class InfoExtractor(object):
metadata_available=False, method='any'):
if metadata_available and self.get_param('ignore_no_formats_error'):
self.report_warning(msg)
raise ExtractorError('%s. %s' % (msg, self._LOGIN_HINTS[method]), expected=True)
if method is not None:
msg = '%s. %s' % (msg, self._LOGIN_HINTS[method])
raise ExtractorError(msg, expected=True)
def raise_geo_restricted(
self, msg='This video is not available from your location due to geo restriction',
@@ -1473,7 +1477,7 @@ class InfoExtractor(object):
class FormatSort:
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<separator>[~:])(?P<limit>.*?))?)? *$'
default = ('hidden', 'hasvid', 'ie_pref', 'lang', 'quality',
default = ('hidden', 'aud_or_vid', 'hasvid', 'ie_pref', 'lang', 'quality',
'res', 'fps', 'codec:vp9.2', 'size', 'br', 'asr',
'proto', 'ext', 'hasaud', 'source', 'format_id') # These must not be aliases
ytdl_default = ('hasaud', 'quality', 'tbr', 'filesize', 'vbr',
@@ -1486,7 +1490,7 @@ class InfoExtractor(object):
'acodec': {'type': 'ordered', 'regex': True,
'order': ['opus', 'vorbis', 'aac', 'mp?4a?', 'mp3', 'e?a?c-?3', 'dts', '', None, 'none']},
'proto': {'type': 'ordered', 'regex': True, 'field': 'protocol',
'order': ['(ht|f)tps', '(ht|f)tp$', 'm3u8.+', 'm3u8', '.*dash', '', 'mms|rtsp', 'none', 'f4']},
'order': ['(ht|f)tps', '(ht|f)tp$', 'm3u8.+', '.*dash', 'ws|websocket', '', 'mms|rtsp', 'none', 'f4']},
'vext': {'type': 'ordered', 'field': 'video_ext',
'order': ('mp4', 'webm', 'flv', '', 'none'),
'order_free': ('webm', 'mp4', 'flv', '', 'none')},
@@ -1494,6 +1498,9 @@ class InfoExtractor(object):
'order': ('m4a', 'aac', 'mp3', 'ogg', 'opus', 'webm', '', 'none'),
'order_free': ('opus', 'ogg', 'webm', 'm4a', 'mp3', 'aac', '', 'none')},
'hidden': {'visible': False, 'forced': True, 'type': 'extractor', 'max': -1000},
'aud_or_vid': {'visible': False, 'forced': True, 'type': 'multiple', 'default': 1,
'field': ('vcodec', 'acodec'),
'function': lambda it: int(any(v != 'none' for v in it))},
'ie_pref': {'priority': True, 'type': 'extractor'},
'hasvid': {'priority': True, 'field': 'vcodec', 'type': 'boolean', 'not_in_list': ('none',)},
'hasaud': {'field': 'acodec', 'type': 'boolean', 'not_in_list': ('none',)},
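
Roughly, a 'multiple'-type field collects the listed fields, drops None values and passes the tuple to the function, so `aud_or_vid` resolves to 1 for anything with at least one real stream; a sketch of that evaluation (not the actual sorter plumbing):

    fields = ('vcodec', 'acodec')
    func = lambda it: int(any(v != 'none' for v in it))

    fmt = {'vcodec': 'none', 'acodec': 'opus'}  # an audio-only format
    values = tuple(v for v in (fmt.get(f) for f in fields) if v is not None)
    print(func(values))  # 1; a storyboard with both codecs 'none' would give 0
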
@@ -1701,9 +1708,7 @@ class InfoExtractor(object):
def wrapped_function(values):
values = tuple(filter(lambda x: x is not None, values))
return (self._get_field_setting(field, 'function')(*values) if len(values) > 1
else values[0] if values
else None)
return self._get_field_setting(field, 'function')(values) if values else None
value = wrapped_function((get_value(f) for f in actual_fields))
else:
@@ -1719,7 +1724,7 @@ class InfoExtractor(object):
if not format.get('ext') and 'url' in format:
format['ext'] = determine_ext(format['url'])
if format.get('vcodec') == 'none':
format['audio_ext'] = format['ext']
format['audio_ext'] = format['ext'] if format.get('acodec') != 'none' else 'none'
format['video_ext'] = 'none'
else:
format['video_ext'] = format['ext']
@@ -1976,24 +1981,33 @@ class InfoExtractor(object):
preference=None, quality=None, m3u8_id=None, live=False, note=None,
errnote=None, fatal=True, data=None, headers={}, query={},
video_id=None):
formats, subtitles = [], {}
if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access
return [], {}
return formats, subtitles
if (not self.get_param('allow_unplayable_formats')
and re.search(r'#EXT-X-SESSION-KEY:.*?URI="skd://', m3u8_doc)): # Apple FairPlay
return [], {}
return formats, subtitles
formats = []
def format_url(url):
return url if re.match(r'^https?://', url) else compat_urlparse.urljoin(m3u8_url, url)
subtitles = {}
if self.get_param('hls_split_discontinuity', False):
def _extract_m3u8_playlist_indices(manifest_url=None, m3u8_doc=None):
if not m3u8_doc:
if not manifest_url:
return []
m3u8_doc = self._download_webpage(
manifest_url, video_id, fatal=fatal, data=data, headers=headers,
note=False, errnote='Failed to download m3u8 playlist information')
if m3u8_doc is False:
return []
return range(1 + sum(line.startswith('#EXT-X-DISCONTINUITY') for line in m3u8_doc.splitlines()))
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
split_discontinuity = self.get_param('hls_split_discontinuity', False)
else:
def _extract_m3u8_playlist_indices(*args, **kwargs):
return [None]
# References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
@@ -2011,68 +2025,16 @@ class InfoExtractor(object):
# media playlist and MUST NOT appear in master playlist thus we can
# clearly detect media playlist with this criterion.
def _extract_m3u8_playlist_formats(format_url=None, m3u8_doc=None, video_id=None,
fatal=True, data=None, headers={}):
if not m3u8_doc:
if not format_url:
return []
res = self._download_webpage_handle(
format_url, video_id,
note=False,
errnote='Failed to download m3u8 playlist information',
fatal=fatal, data=data, headers=headers)
if res is False:
return []
m3u8_doc, urlh = res
format_url = urlh.geturl()
playlist_formats = []
i = (
0
if split_discontinuity
else None)
format_info = {
'index': i,
'key_data': None,
'files': [],
}
for line in m3u8_doc.splitlines():
if not line.startswith('#'):
format_info['files'].append(line)
elif split_discontinuity and line.startswith('#EXT-X-DISCONTINUITY'):
i += 1
playlist_formats.append(format_info)
format_info = {
'index': i,
'url': format_url,
'files': [],
}
playlist_formats.append(format_info)
return playlist_formats
if '#EXT-X-TARGETDURATION' in m3u8_doc: # media playlist, return as is
playlist_formats = _extract_m3u8_playlist_formats(m3u8_doc=m3u8_doc)
for format in playlist_formats:
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
format_index = format.get('index')
if format_index:
format_id.append(str(format_index))
f = {
'format_id': '-'.join(format_id),
'format_index': format_index,
'url': m3u8_url,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
'quality': quality,
}
formats.append(f)
formats = [{
'format_id': '-'.join(map(str, filter(None, [m3u8_id, idx]))),
'format_index': idx,
'url': m3u8_url,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
'quality': quality,
} for idx in _extract_m3u8_playlist_indices(m3u8_doc=m3u8_doc)]
return formats, subtitles
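
In other words, `_extract_m3u8_playlist_indices` reduces to counting #EXT-X-DISCONTINUITY tags, and the format id is the dash-joined truthy parts (so index 0 is silently dropped); for instance, with an invented media playlist:

    m3u8_doc = '#EXTM3U\n#EXT-X-TARGETDURATION:10\ns0.ts\n#EXT-X-DISCONTINUITY\ns1.ts\n'
    indices = range(1 + sum(line.startswith('#EXT-X-DISCONTINUITY')
                            for line in m3u8_doc.splitlines()))
    print(list(indices))                                 # [0, 1]
    print('-'.join(map(str, filter(None, ['hls', 1]))))  # 'hls-1' ('hls' for index 0)
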
@@ -2112,31 +2074,19 @@ class InfoExtractor(object):
media_url = media.get('URI')
if media_url:
manifest_url = format_url(media_url)
format_id = []
playlist_formats = _extract_m3u8_playlist_formats(manifest_url, video_id=video_id,
fatal=fatal, data=data, headers=headers)
for format in playlist_formats:
format_index = format.get('index')
for v in (m3u8_id, group_id, name):
if v:
format_id.append(v)
if format_index:
format_id.append(str(format_index))
f = {
'format_id': '-'.join(format_id),
'format_index': format_index,
'url': manifest_url,
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
'quality': quality,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
formats.append(f)
formats.extend({
'format_id': '-'.join(map(str, filter(None, (m3u8_id, group_id, name, idx)))),
'format_note': name,
'format_index': idx,
'url': manifest_url,
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
'quality': quality,
'vcodec': 'none' if media_type == 'AUDIO' else None,
} for idx in _extract_m3u8_playlist_indices(manifest_url))
def build_stream_name():
# Despite specification does not mention NAME attribute for
@@ -2175,25 +2125,17 @@ class InfoExtractor(object):
or last_stream_inf.get('BANDWIDTH'), scale=1000)
manifest_url = format_url(line.strip())
playlist_formats = _extract_m3u8_playlist_formats(manifest_url, video_id=video_id,
fatal=fatal, data=data, headers=headers)
for frmt in playlist_formats:
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
format_index = frmt.get('index')
stream_name = build_stream_name()
for idx in _extract_m3u8_playlist_indices(manifest_url):
format_id = [m3u8_id, None, idx]
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
# format_id intact.
if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
if format_index:
format_id.append(str(format_index))
stream_name = build_stream_name()
format_id[1] = stream_name if stream_name else '%d' % (tbr if tbr else len(formats))
f = {
'format_id': '-'.join(format_id),
'format_index': format_index,
'format_id': '-'.join(map(str, filter(None, format_id))),
'format_index': idx,
'url': manifest_url,
'manifest_url': m3u8_url,
'tbr': tbr,
@@ -2636,7 +2578,7 @@ class InfoExtractor(object):
mime_type = representation_attrib['mimeType']
content_type = representation_attrib.get('contentType', mime_type.split('/')[0])
if content_type in ('video', 'audio', 'text'):
if content_type in ('video', 'audio', 'text') or mime_type == 'image/jpeg':
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
@@ -2653,9 +2595,15 @@ class InfoExtractor(object):
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
bandwidth = int_or_none(representation_attrib.get('bandwidth'))
if representation_id is not None:
format_id = representation_id
else:
format_id = content_type
if mpd_id:
format_id = mpd_id + '-' + format_id
if content_type in ('video', 'audio'):
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'format_id': format_id,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
@@ -2675,6 +2623,17 @@ class InfoExtractor(object):
'manifest_url': mpd_url,
'filesize': filesize,
}
elif mime_type == 'image/jpeg':
# See test case in VikiIE
# https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1
f = {
'format_id': format_id,
'ext': 'mhtml',
'manifest_url': mpd_url,
'format_note': 'DASH storyboards (jpeg)',
'acodec': 'none',
'vcodec': 'none',
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
def prepare_template(template_name, identifiers):
@@ -2693,7 +2652,8 @@ class InfoExtractor(object):
t += c
# Next, $...$ templates are translated to their
# %(...) counterparts to be used with % operator
t = t.replace('$RepresentationID$', representation_id)
if representation_id is not None:
t = t.replace('$RepresentationID$', representation_id)
t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
t = t.replace('$$', '$')
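
The template translation can be checked in isolation; with the usual `Number` identifier, the two substitutions turn the DASH placeholders into %-format fields:

    import re

    identifiers = ('Number', 'Bandwidth')
    t = 'seg-$RepresentationID$-$Number%05d$.m4s'.replace('$RepresentationID$', 'video1')
    t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
    t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
    print(t % {'Number': 42})  # seg-video1-00042.m4s
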
@@ -2810,7 +2770,7 @@ class InfoExtractor(object):
'url': mpd_url or base_url,
'fragment_base_url': base_url,
'fragments': [],
'protocol': 'http_dash_segments',
'protocol': 'http_dash_segments' if mime_type != 'image/jpeg' else 'mhtml',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url']
@@ -2821,7 +2781,7 @@ class InfoExtractor(object):
else:
# Assuming direct URL to unfragmented media.
f['url'] = base_url
if content_type in ('video', 'audio'):
if content_type in ('video', 'audio') or mime_type == 'image/jpeg':
formats.append(f)
elif content_type == 'text':
subtitles.setdefault(lang or 'und', []).append(f)
@@ -3484,16 +3444,8 @@ class InfoExtractor(object):
return ret
@classmethod
def _merge_subtitles(cls, *dicts, **kwargs):
def _merge_subtitles(cls, *dicts, target=None):
""" Merge subtitle dictionaries, language by language. """
target = (lambda target=None: target)(**kwargs)
# The above lambda extracts the keyword argument 'target' from kwargs
# while ensuring there are no stray ones. When Python 2 support
# is dropped, remove it and change the function signature to:
#
# def _merge_subtitles(cls, *dicts, target=None):
if target is None:
target = {}
for d in dicts:
@@ -3546,6 +3498,10 @@ class InfoExtractor(object):
else 'public' if all_known
else None)
def _configuration_arg(self, key):
return traverse_obj(
self._downloader.params, ('extractor_args', self.ie_key().lower(), key))
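
So, for example, an invocation like `--extractor-args "funimation:language=japanese"` ends up as a nested dict in the params, and `_configuration_arg('language')` on FunimationIE resolves it like this (a sketch of the traverse_obj lookup, assuming the CLI parses each key into a list of values):

    params = {'extractor_args': {'funimation': {'language': ['japanese']}}}
    value = params.get('extractor_args', {}).get('funimation', {}).get('language')
    print(value)  # ['japanese']
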
class SearchInfoExtractor(InfoExtractor):
"""


@@ -143,9 +143,9 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
}
class CuriosityStreamCollectionsIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collections'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/collections/(?P<id>\d+)'
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_API_BASE_URL = 'https://api.curiositystream.com/v2/collections/'
_TESTS = [{
'url': 'https://curiositystream.com/collections/86',
@@ -155,6 +155,20 @@ class CuriosityStreamCollectionsIE(CuriosityStreamBaseIE):
'description': 'Wondering where to start? Here are a few of our favorite series and films... from our couch to yours.',
},
'playlist_mincount': 7,
}, {
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
'id': '2',
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 16,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -163,25 +177,10 @@ class CuriosityStreamCollectionsIE(CuriosityStreamBaseIE):
entries = []
for media in collection.get('media', []):
media_id = compat_str(media.get('id'))
media_type, ie = ('series', CuriosityStreamSeriesIE) if media.get('is_collection') else ('video', CuriosityStreamIE)
media_type, ie = ('series', CuriosityStreamCollectionIE) if media.get('is_collection') else ('video', CuriosityStreamIE)
entries.append(self.url_result(
'https://curiositystream.com/%s/%s' % (media_type, media_id),
ie=ie.ie_key(), video_id=media_id))
return self.playlist_result(
entries, collection_id,
collection.get('title'), collection.get('description'))
class CuriosityStreamSeriesIE(CuriosityStreamCollectionsIE):
IE_NAME = 'curiositystream:series'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/series/(?P<id>\d+)'
_API_BASE_URL = 'https://api.curiositystream.com/v2/series/'
_TESTS = [{
'url': 'https://app.curiositystream.com/series/2',
'info_dict': {
'id': '2',
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 16,
}]


@@ -22,16 +22,19 @@ class EggheadBaseIE(InfoExtractor):
class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
'info_dict': {
'id': '72',
'id': '432655',
'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
},
}
}, {
'url': 'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@@ -65,7 +68,7 @@ class EggheadCourseIE(EggheadBaseIE):
class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
@@ -88,6 +91,9 @@ class EggheadLessonIE(EggheadBaseIE):
}, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True,
}, {
'url': 'https://app.egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -291,8 +291,7 @@ from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
CuriosityStreamIE,
CuriosityStreamCollectionsIE,
CuriosityStreamSeriesIE,
CuriosityStreamCollectionIE,
)
from .cwtv import CWTVIE
from .dailymail import DailyMailIE
@@ -399,7 +398,11 @@ from .facebook import (
FacebookIE,
FacebookPluginsVideoIE,
)
from .fancode import FancodeVodIE
from .fancode import (
FancodeVodIE,
FancodeLiveIE
)
from .faz import FazIE
from .fc2 import (
FC2IE,
@@ -456,7 +459,11 @@ from .frontendmasters import (
FrontendMastersCourseIE
)
from .fujitv import FujiTVFODPlus7IE
from .funimation import FunimationIE
from .funimation import (
FunimationIE,
FunimationPageIE,
FunimationShowIE,
)
from .funk import FunkIE
from .fusion import FusionIE
from .gaia import GaiaIE
@@ -655,10 +662,6 @@ from .linkedin import (
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
@@ -1065,6 +1068,10 @@ from .rcs import (
RCSEmbedsIE,
RCSVariousIE,
)
from .rcti import (
RCTIPlusIE,
RCTIPlusSeriesIE,
)
from .rds import RDSIE
from .redbulltv import (
RedBullTVIE,


@@ -629,16 +629,11 @@ class FacebookIE(InfoExtractor):
process_formats(formats)
description = self._html_search_meta('description', webpage, default=None)
video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage,
'title', default=None)
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', default=None)
if not video_title:
video_title = self._html_search_meta(
'description', webpage, 'title', default=None)
(r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>',
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>'),
webpage, 'title', default=None) or self._og_search_title(webpage, default=None) or description
if video_title:
video_title = limit_length(video_title, 80)
else:
@@ -662,6 +657,7 @@ class FacebookIE(InfoExtractor):
'formats': formats,
'uploader': uploader,
'timestamp': timestamp,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'subtitles': subtitles,


@@ -7,7 +7,8 @@ from ..compat import compat_str
from ..utils import (
parse_iso8601,
ExtractorError,
try_get
try_get,
mimetype2ext
)
@@ -38,16 +39,63 @@ class FancodeVodIE(InfoExtractor):
'only_matching': True,
}]
_ACCESS_TOKEN = None
_NETRC_MACHINE = 'fancode'
_LOGIN_HINT = 'Use "--user refresh --password <refresh_token>" to login using a refresh token'
headers = {
'content-type': 'application/json',
'origin': 'https://fancode.com',
'referer': 'https://fancode.com',
}
def _login(self):
# Access tokens are short-lived, so get them using the refresh token.
username, password = self._get_login_info()
if username == 'refresh' and password is not None:
self.report_login()
data = '''{
"query":"mutation RefreshToken($refreshToken: String\\u0021) { refreshToken(refreshToken: $refreshToken) { accessToken }}",
"variables":{
"refreshToken":"%s"
},
"operationName":"RefreshToken"
}''' % password
token_json = self.download_gql('refresh token', data, "Getting the Access token")
self._ACCESS_TOKEN = try_get(token_json, lambda x: x['data']['refreshToken']['accessToken'])
if self._ACCESS_TOKEN is None:
self.report_warning('Failed to get Access token')
else:
self.headers.update({'Authorization': 'Bearer %s' % self._ACCESS_TOKEN})
elif username is not None:
self.report_warning(f'Login using username and password is not currently supported. {self._LOGIN_HINT}')
def _real_initialize(self):
self._login()
def _check_login_required(self, is_available, is_premium):
msg = None
if is_premium and self._ACCESS_TOKEN is None:
msg = f'This video is only available for registered users. {self._LOGIN_HINT}'
elif not is_available and self._ACCESS_TOKEN is not None:
msg = 'This video isn\'t available to the currently logged-in account'
if msg:
self.raise_login_required(msg, metadata_available=True, method=None)
def download_gql(self, variable, data, note, fatal=False, headers=headers):
return self._download_json(
'https://www.fancode.com/graphql', variable,
data=data.encode(), note=note,
headers=headers, fatal=fatal)
def _real_extract(self, url):
BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/%s/default_default/index.html?videoId=%s'
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
brightcove_user_id = self._html_search_regex(
r'(?:https?://)?players\.brightcove\.net/(\d+)/default_default/index(?:\.min)?\.js',
webpage, 'user id')
brightcove_user_id = '6008340455001'
data = '''{
"query":"query Video($id: Int\\u0021, $filter: SegmentFilter) { media(id: $id, filter: $filter) { id contentId title contentId publishedTime totalViews totalUpvotes provider thumbnail { src } mediaSource {brightcove } duration isPremium isUserEntitled tags duration }}",
"variables":{
@@ -57,15 +105,9 @@ class FancodeVodIE(InfoExtractor):
}
},
"operationName":"Video"
}''' % video_id
}''' % video_id
metadata_json = self._download_json(
'https://www.fancode.com/graphql', video_id, data=data.encode(), note='Downloading metadata',
headers={
'content-type': 'application/json',
'origin': 'https://fancode.com',
'referer': url,
})
metadata_json = self.download_gql(video_id, data, note='Downloading metadata')
media = try_get(metadata_json, lambda x: x['data']['media'], dict) or {}
brightcove_video_id = try_get(media, lambda x: x['mediaSource']['brightcove'], compat_str)
@@ -74,8 +116,8 @@ class FancodeVodIE(InfoExtractor):
raise ExtractorError('Unable to extract brightcove Video ID')
is_premium = media.get('isPremium')
if is_premium:
self.report_warning('this video requires a premium account', video_id)
self._check_login_required(media.get('isUserEntitled'), is_premium)
return {
'_type': 'url_transparent',
@@ -89,3 +131,57 @@ class FancodeVodIE(InfoExtractor):
'release_timestamp': parse_iso8601(media.get('publishedTime')),
'availability': self._availability(needs_premium=is_premium),
}
class FancodeLiveIE(FancodeVodIE):
IE_NAME = 'fancode:live'
_VALID_URL = r'https?://(www\.)?fancode\.com/match/(?P<id>[0-9]+).+'
_TESTS = [{
'url': 'https://fancode.com/match/35328/cricket-fancode-ecs-hungary-2021-bub-vs-blb?slug=commentary',
'info_dict': {
'id': '35328',
'ext': 'mp4',
'title': 'BUB vs BLB',
"timestamp": 1624863600,
'is_live': True,
'upload_date': '20210628',
},
'skip': 'Ended'
}, {
'url': 'https://fancode.com/match/35328/',
'only_matching': True,
}, {
'url': 'https://fancode.com/match/35567?slug=scorecard',
'only_matching': True,
}]
def _real_extract(self, url):
id = self._match_id(url)
data = '''{
"query":"query MatchResponse($id: Int\\u0021, $isLoggedIn: Boolean\\u0021) { match: matchWithScores(id: $id) { id matchDesc mediaId videoStreamId videoStreamUrl { ...VideoSource } liveStreams { videoStreamId videoStreamUrl { ...VideoSource } contentId } name startTime streamingStatus isPremium isUserEntitled @include(if: $isLoggedIn) status metaTags bgImage { src } sport { name slug } tour { id name } squads { name shortName } liveStreams { contentId } mediaId }}fragment VideoSource on VideoSource { title description posterUrl url deliveryType playerType}",
"variables":{
"id":%s,
"isLoggedIn":true
},
"operationName":"MatchResponse"
}''' % id
info_json = self.download_gql(id, data, "Info json")
match_info = try_get(info_json, lambda x: x['data']['match'])
if match_info.get('status') != "LIVE":
raise ExtractorError('The stream can\'t be accessed', expected=True)
self._check_login_required(match_info.get('isUserEntitled'), True) # all live streams are premium only
return {
'id': id,
'title': match_info.get('name'),
'formats': self._extract_akamai_formats(try_get(match_info, lambda x: x['videoStreamUrl']['url']), id),
'ext': mimetype2ext(try_get(match_info, lambda x: x['videoStreamUrl']['deliveryType'])),
'is_live': True,
'release_timestamp': parse_iso8601(match_info.get('startTime'))
}
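
The refresh-token login above is a single GraphQL mutation; a bare-bones stdlib equivalent for reference (the token value is a placeholder, and the network call is left commented out):

    import json
    import urllib.request

    payload = {
        'query': 'mutation RefreshToken($refreshToken: String!) {'
                 ' refreshToken(refreshToken: $refreshToken) { accessToken }}',
        'variables': {'refreshToken': '<refresh_token>'},
        'operationName': 'RefreshToken',
    }
    req = urllib.request.Request(
        'https://www.fancode.com/graphql', data=json.dumps(payload).encode(),
        headers={'content-type': 'application/json',
                 'origin': 'https://fancode.com', 'referer': 'https://fancode.com'})
    # token = json.load(urllib.request.urlopen(req))['data']['refreshToken']['accessToken']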


@@ -2,59 +2,124 @@
from __future__ import unicode_literals
import random
import re
import string
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
dict_get,
int_or_none,
js_to_json,
str_or_none,
try_get,
urlencode_postdata,
ExtractorError,
urlencode_postdata
)
class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation'
_TOKEN = None
class FunimationPageIE(InfoExtractor):
IE_NAME = 'funimation:page'
_VALID_URL = r'(?P<origin>https?://(?:www\.)?funimation(?:\.com|now\.uk))/(?P<lang>[^/]+/)?(?P<path>shows/(?P<id>[^/]+/[^/?#&]+).*$)'
_TESTS = [{
'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'info_dict': {
'id': '91144',
'display_id': 'role-play',
'ext': 'mp4',
'title': '.hack//SIGN - Role Play',
'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
'thumbnail': r're:https?://.*\.jpg',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.funimation.com/shows/attack-on-titan-junior-high/broadcast-dub-preview/',
'info_dict': {
'id': '210051',
'display_id': 'broadcast-dub-preview',
'id': '210050',
'ext': 'mp4',
'title': 'Attack on Titan: Junior High - Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'title': 'Broadcast Dub Preview',
# Other metadata is tested in FunimationIE
},
'params': {
# m3u8 download
'skip_download': True,
'skip_download': 'm3u8',
},
'add_ie': ['Funimation'],
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
# Not available in US
'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id').replace('/', '_')
if not mobj.group('lang'):
url = '%s/en/%s' % (mobj.group('origin'), mobj.group('path'))
webpage = self._download_webpage(url, display_id)
title_data = self._parse_json(self._search_regex(
r'TITLE_DATA\s*=\s*({[^}]+})',
webpage, 'title data', default=''),
display_id, js_to_json, fatal=False) or {}
video_id = (
title_data.get('id')
or self._search_regex(
(r"KANE_customdimensions.videoID\s*=\s*'(\d+)';", r'<iframe[^>]+src="/player/(\d+)'),
webpage, 'video_id', default=None)
or self._search_regex(
r'/player/(\d+)',
self._html_search_meta(['al:web:url', 'og:video:url', 'og:video:secure_url'], webpage, fatal=True),
'video id'))
return self.url_result(f'https://www.funimation.com/player/{video_id}', FunimationIE.ie_key(), video_id)
class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation\.com/player/(?P<id>\d+)'
_NETRC_MACHINE = 'funimation'
_TOKEN = None
_TESTS = [{
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210050',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 154,
},
'params': {
'skip_download': 'm3u8',
},
}, {
'note': 'player_id should be extracted with the relevant compat-opt',
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210051',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 154,
},
'params': {
'skip_download': 'm3u8',
'compat_opts': ['seperate-video-versions'],
},
}]
def _login(self):
@@ -78,81 +143,184 @@ class FunimationIE(InfoExtractor):
def _real_initialize(self):
self._login()
@staticmethod
def _get_experiences(episode):
for lang, lang_data in episode.get('languages', {}).items():
for video_data in lang_data.values():
for version, f in video_data.items():
yield lang, version.title(), f
def _get_episode(self, webpage, experience_id=None, episode_id=None, fatal=True):
''' Extract the episode, season and show objects given either episode/experience id '''
show = self._parse_json(
self._search_regex(
r'show\s*=\s*({.+?})\s*;', webpage, 'show data', fatal=fatal),
experience_id, transform_source=js_to_json, fatal=fatal) or []
for season in show.get('seasons', []):
for episode in season.get('episodes', []):
if episode_id is not None:
if str(episode.get('episodePk')) == episode_id:
return episode, season, show
continue
for _, _, f in self._get_experiences(episode):
if f.get('experienceId') == experience_id:
return episode, season, show
if fatal:
raise ExtractorError('Unable to find episode information')
else:
self.report_warning('Unable to find episode information')
return {}, {}, {}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
initial_experience_id = self._match_id(url)
webpage = self._download_webpage(
url, initial_experience_id, note=f'Downloading player webpage for {initial_experience_id}')
episode, season, show = self._get_episode(webpage, experience_id=int(initial_experience_id))
episode_id = str(episode['episodePk'])
display_id = episode.get('slug') or episode_id
def _search_kane(name):
return self._search_regex(
r"KANE_customdimensions\.%s\s*=\s*'([^']+)';" % name,
webpage, name, default=None)
formats, subtitles, thumbnails, duration = [], {}, [], 0
requested_languages, requested_versions = self._configuration_arg('language'), self._configuration_arg('version')
only_initial_experience = 'seperate-video-versions' in self.get_param('compat_opts', [])
title_data = self._parse_json(self._search_regex(
r'TITLE_DATA\s*=\s*({[^}]+})',
webpage, 'title data', default=''),
display_id, js_to_json, fatal=False) or {}
for lang, version, fmt in self._get_experiences(episode):
experience_id = str(fmt['experienceId'])
if (only_initial_experience and experience_id != initial_experience_id
or requested_languages and lang not in requested_languages
or requested_versions and version not in requested_versions):
continue
thumbnails.append({'url': fmt.get('poster')})
duration = max(duration, fmt.get('duration', 0))
format_name = '%s %s (%s)' % (version, lang, experience_id)
self.extract_subtitles(
subtitles, experience_id, display_id=display_id, format_name=format_name,
episode=episode if experience_id == initial_experience_id else episode_id)
video_id = title_data.get('id') or self._search_regex([
r"KANE_customdimensions.videoID\s*=\s*'(\d+)';",
r'<iframe[^>]+src="/player/(\d+)',
], webpage, 'video_id', default=None)
if not video_id:
player_url = self._html_search_meta([
'al:web:url',
'og:video:url',
'og:video:secure_url',
], webpage, fatal=True)
video_id = self._search_regex(r'/player/(\d+)', player_url, 'video id')
title = episode = title_data.get('title') or _search_kane('videoTitle') or self._og_search_title(webpage)
series = _search_kane('showName')
if series:
title = '%s - %s' % (series, title)
description = self._html_search_meta(['description', 'og:description'], webpage, fatal=True)
try:
headers = {}
if self._TOKEN:
headers['Authorization'] = 'Token %s' % self._TOKEN
sources = self._download_json(
'https://www.funimation.com/api/showexperience/%s/' % video_id,
video_id, headers=headers, query={
page = self._download_json(
'https://www.funimation.com/api/showexperience/%s/' % experience_id,
display_id, headers=headers, expected_status=403, query={
'pinst_id': ''.join([random.choice(string.digits + string.ascii_letters) for _ in range(8)]),
})['items']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read(), video_id)['errors'][0]
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error.get('detail') or error.get('title')), expected=True)
raise
}, note=f'Downloading {format_name} JSON')
sources = page.get('items') or []
if not sources:
error = try_get(page, lambda x: x['errors'][0], dict)
if error:
self.report_warning('%s said: Error %s - %s' % (
self.IE_NAME, error.get('code'), error.get('detail') or error.get('title')))
else:
self.report_warning('No sources found for format')
formats = []
for source in sources:
source_url = source.get('src')
if not source_url:
continue
source_type = source.get('videoType') or determine_ext(source_url)
if source_type == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': source_type,
'url': source_url,
})
current_formats = []
for source in sources:
source_url = source.get('src')
source_type = source.get('videoType') or determine_ext(source_url)
if source_type == 'm3u8':
current_formats.extend(self._extract_m3u8_formats(
source_url, display_id, 'mp4', m3u8_id='%s-%s' % (experience_id, 'hls'), fatal=False,
note=f'Downloading {format_name} m3u8 information'))
else:
current_formats.append({
'format_id': '%s-%s' % (experience_id, source_type),
'url': source_url,
})
for f in current_formats:
# TODO: Convert language to code
f.update({'language': lang, 'format_note': version})
formats.extend(current_formats)
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
return {
'id': video_id,
'id': initial_experience_id if only_initial_experience else episode_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage),
'series': series,
'season_number': int_or_none(title_data.get('seasonNum') or _search_kane('season')),
'episode_number': int_or_none(title_data.get('episodeNum')),
'episode': episode,
'season_id': title_data.get('seriesId'),
'duration': duration,
'title': episode['episodeTitle'],
'description': episode.get('episodeSummary'),
'episode': episode.get('episodeTitle'),
'episode_number': int_or_none(episode.get('episodeId')),
'episode_id': episode_id,
'season': season.get('seasonTitle'),
'season_number': int_or_none(season.get('seasonId')),
'season_id': str_or_none(season.get('seasonPk')),
'series': show.get('showTitle'),
'formats': formats,
'thumbnails': thumbnails,
'subtitles': subtitles,
}
def _get_subtitles(self, subtitles, experience_id, episode, display_id, format_name):
if isinstance(episode, str):
webpage = self._download_webpage(
f'https://www.funimation.com/player/{experience_id}', display_id,
fatal=False, note=f'Downloading player webpage for {format_name}')
episode, _, _ = self._get_episode(webpage, episode_id=episode, fatal=False)
for _, version, f in self._get_experiences(episode):
for source in f.get('sources') or []:
for text_track in source.get('textTracks') or []:
if not text_track.get('src'):
continue
sub_type = (text_track.get('type') or '').upper()
sub_type = sub_type if sub_type != 'FULL' else None
current_sub = {
'url': text_track['src'],
'name': ' '.join(filter(None, (version, text_track.get('label'), sub_type)))
}
lang = '_'.join(filter(None, (
text_track.get('language', 'und'), version if version != 'Simulcast' else None, sub_type)))
if current_sub not in subtitles.get(lang, []):
subtitles.setdefault(lang, []).append(current_sub)
return subtitles
class FunimationShowIE(FunimationIE):
IE_NAME = 'funimation:show'
_VALID_URL = r'(?P<url>https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?P<locale>[^/]+)?/?shows/(?P<id>[^/?#&]+))/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.funimation.com/en/shows/sk8-the-infinity',
'info_dict': {
'id': 1315000,
'title': 'SK8 the Infinity'
},
'playlist_count': 13,
'params': {
'skip_download': True,
},
}, {
# without lang code
'url': 'https://www.funimation.com/shows/ouran-high-school-host-club/',
'info_dict': {
'id': 39643,
'title': 'Ouran High School Host Club'
},
'playlist_count': 26,
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
base_url, locale, display_id = re.match(self._VALID_URL, url).groups()
show_info = self._download_json(
'https://title-api.prd.funimationsvc.com/v2/shows/%s?region=US&deviceType=web&locale=%s'
% (display_id, locale or 'en'), display_id)
items = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/funimation/episodes/?limit=99999&title_id=%s'
% show_info.get('id'), display_id).get('items')
vod_items = map(lambda k: dict_get(k, ('mostRecentSvod', 'mostRecentAvod')).get('item'), items)
return {
'_type': 'playlist',
'id': show_info['id'],
'title': show_info['name'],
'entries': [
self.url_result(
'%s/%s' % (base_url, vod_item.get('episodeSlug')), FunimationPageIE.ie_key(),
vod_item.get('episodeId'), vod_item.get('episodeName'))
for vod_item in sorted(vod_items, key=lambda x: x.get('episodeOrder'))],
}
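
To visualize what `_get_experiences` flattens, here is a toy `languages` tree run through the same triple loop (the middle-level key is invented; only its values matter to the generator):

    episode = {'languages': {'english': {'_': {
        'simulcast': {'experienceId': 210050},
        'uncut': {'experienceId': 210051},
    }}}}

    def get_experiences(ep):
        for lang, lang_data in ep.get('languages', {}).items():
            for video_data in lang_data.values():
                for version, f in video_data.items():
                    yield lang, version.title(), f

    for lang, version, f in get_experiences(episode):
        print(lang, version, f['experienceId'])
    # english Simulcast 210050
    # english Uncut 210051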


@@ -84,7 +84,6 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .kaltura import KalturaIE
@@ -1632,31 +1631,6 @@ class GenericIE(InfoExtractor):
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
@@ -3204,11 +3178,6 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url:


@@ -27,8 +27,8 @@ from ..utils import (
class HotStarBaseIE(InfoExtractor):
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _call_api_impl(self, path, video_id, query):
st = int(time.time())
def _call_api_impl(self, path, video_id, query, st=None):
st = int_or_none(st) or int(time.time())
exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
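
The Akamai-style token built above is easy to reproduce with a stand-in key (the real key is the extractor's _AKAMAI_ENCRYPTION_KEY, and `st` is now taken from the x-origin-date response header when available):

    import hashlib
    import hmac

    key = b'\x00' * 16  # stand-in for the real encryption key
    st = 1625000000
    exp = st + 6000
    auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
    auth += '~hmac=' + hmac.new(key, auth.encode(), hashlib.sha256).hexdigest()
    print(auth)  # st=...~exp=...~acl=/*~hmac=<64 hex digits>
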
@@ -75,9 +75,9 @@ class HotStarBaseIE(InfoExtractor):
'tas': 10000,
})
def _call_api_v2(self, path, video_id):
def _call_api_v2(self, path, video_id, st=None):
return self._call_api_impl(
'%s/content/%s' % (path, video_id), video_id, {
'%s/content/%s' % (path, video_id), video_id, st=st, query={
'desired-config': 'audio_channel:stereo|dynamic_range:sdr|encryption:plain|ladder:tv|package:dash|resolution:hd|subs-tag:HotstarVIP|video_codec:vp9',
'device-id': compat_str(uuid.uuid4()),
'os-name': 'Windows',
@@ -131,7 +131,8 @@ class HotStarIE(HotStarBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage, urlh = self._download_webpage_handle(url, video_id)
st = urlh.headers.get('x-origin-date')
app_state = self._parse_json(self._search_regex(
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id)
@@ -155,7 +156,7 @@ class HotStarIE(HotStarBaseIE):
formats = []
geo_restricted = False
# change to v2 in the future
playback_sets = self._call_api_v2('play/v1/playback', video_id)['playBackSets']
playback_sets = self._call_api_v2('play/v1/playback', video_id, st=st)['playBackSets']
for playback_set in playback_sets:
if not isinstance(playback_set, dict):
continue


@@ -1,191 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?.*?\b[it]=(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Prochan embed
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
},
'skip': 'Video is dead',
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/5983
# Multiple resolutions
'url': 'http://www.liveleak.com/view?i=801_1409392012',
'md5': 'c3a449dbaca5c0d1825caecd52a57d7b',
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/10664#issuecomment-247439521
'url': 'http://m.liveleak.com/view?i=763_1473349649',
'add_ie': ['Youtube'],
'info_dict': {
'id': '763_1473349649',
'ext': 'mp4',
'title': 'Reporters and public officials ignore epidemic of black on asian violence in Sacramento | Colin Flaherty',
'description': 'Colin being the warrior he is and showing the injustice Asians in Sacramento are being subjected to.',
'uploader': 'Ziz',
'upload_date': '20160908',
'uploader_id': 'UCEbta5E_jqlZmEJsriTEtnw'
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.liveleak.com/view?i=677_1439397581',
'info_dict': {
'id': '677_1439397581',
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="((?:https?:)?//(?:www\.)?(?:prochan|youtube)\.com/embed[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
}
for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
formats.append(a_format)
# Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/ytdl-org/youtube-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'quality': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
info_dict['id'] = '%s_%s' % (video_id, idx + 1)
else:
info_dict['id'] = video_id
info_dict.update({
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
return self.playlist_result(entries, video_id, video_title)
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
'url': 'https://www.liveleak.com/ll_embed?i=874_1459135191',
'only_matching': True,
}, {
'url': 'https://www.liveleak.com/ll_embed?f=ab065df993c1',
'only_matching': True,
}]
def _real_extract(self, url):
kind, video_id = re.match(self._VALID_URL, url).groups()
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
else:
liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())
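The 'original' format above comes from a plain URL rewrite: dropping the transcode token after '.mp4' serves the same file without the LiveLeak logo overlay (see the linked youtube-dl PR). A minimal sketch of the derivation; the sample URL is hypothetical:

import re

def original_url(format_url):
    # Strip the transcode token after '.mp4'; per the extractor comment,
    # this yields the raw upload without the watermark.
    return re.sub(r'\.mp4\.[^.]+', '', format_url)

# Hypothetical example:
print(original_url('https://cdn.liveleak.com/abc/video_720p.mp4.h264_base.mp4'))
# -> https://cdn.liveleak.com/abc/video_720p.mp4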

View File

@@ -122,6 +122,52 @@ class MediasiteIE(InfoExtractor):
r'(?xi)<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:(?:https?:)?//[^/]+)?/Mediasite/Play/%s(?:\?.*?)?)\1' % _ID_RE,
webpage)]
def __extract_slides(self, *, stream_id, snum, Stream, duration, images):
slide_base_url = Stream['SlideBaseUrl']
fname_template = Stream['SlideImageFileNameTemplate']
if fname_template != 'slide_{0:D4}.jpg':
self.report_warning('Unusual slide file name template; report a bug if slide downloading fails')
fname_template = re.sub(r'\{0:D([0-9]+)\}', r'{0:0\1}', fname_template)
fragments = []
for i, slide in enumerate(Stream['Slides']):
if i == 0:
if slide['Time'] > 0:
default_slide = images.get('DefaultSlide')
if default_slide is None:
default_slide = images.get('DefaultStreamImage')
if default_slide is not None:
default_slide = default_slide['ImageFilename']
if default_slide is not None:
fragments.append({
'path': default_slide,
'duration': slide['Time'] / 1000,
})
next_time = try_get(None, [
lambda _: Stream['Slides'][i + 1]['Time'],
lambda _: duration,
lambda _: slide['Time'],
], expected_type=(int, float))
fragments.append({
'path': fname_template.format(slide.get('Number', i + 1)),
'duration': (next_time - slide['Time']) / 1000
})
return {
'format_id': '%s-%u.slides' % (stream_id, snum),
'ext': 'mhtml',
'url': slide_base_url,
'protocol': 'mhtml',
'acodec': 'none',
'vcodec': 'none',
'format_note': 'Slides',
'fragments': fragments,
'fragment_base_url': slide_base_url,
}
def _real_extract(self, url):
url, data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
@@ -198,10 +244,15 @@ class MediasiteIE(InfoExtractor):
'ext': mimetype2ext(VideoUrl.get('MimeType')),
})
# TODO: if Stream['HasSlideContent']:
# synthesise an MJPEG video stream '%s-%u.slides' % (stream_type, snum)
# from Stream['Slides']
# this will require writing a custom downloader...
if Stream.get('HasSlideContent', False):
images = player_options['PlayerLayoutOptions']['Images']
stream_formats.append(self.__extract_slides(
stream_id=stream_id,
snum=snum,
Stream=Stream,
duration=presentation.get('Duration'),
images=images,
))
# disprefer 'secondary' streams
if stream_type != 0:

View File

@@ -249,6 +249,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
if info:
entries.append(info)
# TODO: should be multi-video
return self.playlist_result(
entries, playlist_title=title, playlist_description=description)

View File

@@ -58,7 +58,7 @@ class NRKBaseIE(InfoExtractor):
def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None):
return self._download_json(
urljoin('http://psapi.nrk.no/', path),
urljoin('https://psapi.nrk.no/', path),
video_id, note or 'Downloading %s JSON' % item,
fatal=fatal, query=query,
headers={'Accept-Encoding': 'gzip, deflate, br'})

View File

@@ -98,6 +98,9 @@ class ORFTVthekIE(InfoExtractor):
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id=format_id, fatal=False))
else:
formats.append({
'format_id': format_id,

View File

@@ -569,15 +569,15 @@ class PeerTubeIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
full_description = self._call_api(
host, video_id, 'description', note='Downloading description JSON',
fatal=False)
description = video.get('description')
if len(description) >= 250:
# description is shortened
full_description = self._call_api(
host, video_id, 'description', note='Downloading description JSON',
fatal=False)
description = None
if isinstance(full_description, dict):
description = str_or_none(full_description.get('description'))
if not description:
description = video.get('description')
if isinstance(full_description, dict):
description = str_or_none(full_description.get('description')) or description
subtitles = self.extract_subtitles(host, video_id)
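The rewrite above saves a request: the description endpoint is only called when the embedded description looks truncated (PeerTube shortens it at 250 characters). A sketch of the pattern, with fetch_full_description standing in for the API call and an added guard for a missing description:

def resolve_description(video, fetch_full_description):
    description = video.get('description')
    if description and len(description) >= 250:
        # Looks shortened -- fetch the full text, keeping the embedded
        # one as fallback if the extra call fails.
        full = fetch_full_description() or {}
        description = full.get('description') or description
    return description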

View File

@@ -12,6 +12,10 @@ from ..utils import (
class PeriscopeBaseIE(InfoExtractor):
_M3U8_HEADERS = {
'Referer': 'https://www.periscope.tv/'
}
def _call_api(self, method, query, item_id):
return self._download_json(
'https://api.periscope.tv/api/v2/%s' % method,
@@ -54,9 +58,11 @@ class PeriscopeBaseIE(InfoExtractor):
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native'
if state in ('ended', 'timed_out') else 'm3u8',
m3u8_id=format_id, fatal=fatal)
m3u8_id=format_id, fatal=fatal, headers=self._M3U8_HEADERS)
if len(m3u8_formats) == 1:
self._add_width_and_height(m3u8_formats[0], width, height)
for f in m3u8_formats:
f.setdefault('http_headers', {}).update(self._M3U8_HEADERS)
return m3u8_formats
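Note the Referer is applied twice: once on the manifest request (headers=...) and once in each format's http_headers so the downloader also sends it for the media segments. A minimal sketch of the second half:

M3U8_HEADERS = {'Referer': 'https://www.periscope.tv/'}

def apply_referer(formats):
    # The manifest was already fetched with the Referer; this makes sure
    # every segment request carries it as well.
    for f in formats:
        f.setdefault('http_headers', {}).update(M3U8_HEADERS)
    return formats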

View File

@@ -19,7 +19,7 @@ from ..utils import (
class PlutoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pluto\.tv/on-demand/(?P<video_type>movies|series)/(?P<slug>.*)/?$'
_VALID_URL = r'https?://(?:www\.)?pluto\.tv(?:/en)?/on-demand/(?P<video_type>movies|series)/(?P<slug>.*)/?$'
_INFO_URL = 'https://service-vod.clusters.pluto.tv/v3/vod/slugs/'
_INFO_QUERY_PARAMS = {
'appName': 'web',
@@ -48,24 +48,21 @@ class PlutoTVIE(InfoExtractor):
'episode_number': 3,
'duration': 3600,
}
},
{
}, {
'url': 'https://pluto.tv/on-demand/series/i-love-money/season/1/',
'playlist_count': 11,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money - Season 1',
}
},
{
}, {
'url': 'https://pluto.tv/on-demand/series/i-love-money/',
'playlist_count': 26,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money',
}
},
{
}, {
'url': 'https://pluto.tv/on-demand/movies/arrival-2015-1-1',
'md5': '3cead001d317a018bf856a896dee1762',
'info_dict': {
@@ -75,7 +72,10 @@ class PlutoTVIE(InfoExtractor):
'description': 'When mysterious spacecraft touch down across the globe, an elite team - led by expert translator Louise Banks (Academy Award® nominee Amy Adams) races against time to decipher their intent.',
'duration': 9000,
}
},
}, {
'url': 'https://pluto.tv/en/on-demand/series/manhunters-fugitive-task-force/seasons/1/episode/third-times-the-charm-1-1',
'only_matching': True,
}
]
def _to_ad_free_formats(self, video_id, formats, subtitles):

View File

@@ -14,6 +14,7 @@ from ..compat import (
)
from .openload import PhantomJSwrapper
from ..utils import (
clean_html,
determine_ext,
ExtractorError,
int_or_none,
@@ -30,6 +31,7 @@ from ..utils import (
class PornHubBaseIE(InfoExtractor):
_NETRC_MACHINE = 'pornhub'
_PORNHUB_HOST_RE = r'(?:(?P<host>pornhub(?:premium)?\.(?:com|net|org))|pornhubthbh7ap3u\.onion)'
def _download_webpage_handle(self, *args, **kwargs):
def dl(*args, **kwargs):
@@ -122,11 +124,13 @@ class PornHubIE(PornHubBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:[^/]+\.)?
%s
/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/
)
(?P<id>[\da-z]+)
'''
''' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': 'a6391306d050e4547f62b3f485dd9ba9',
@@ -145,6 +149,7 @@ class PornHubIE(PornHubBaseIE):
'age_limit': 18,
'tags': list,
'categories': list,
'cast': list,
},
}, {
# non-ASCII title
@@ -236,6 +241,13 @@ class PornHubIE(PornHubBaseIE):
}, {
'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5f75b0f4b18e3',
'only_matching': True,
}, {
# geo restricted
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5a9813bfa7156',
'only_matching': True,
}, {
'url': 'http://pornhubthbh7ap3u.onion/view_video.php?viewkey=ph5a9813bfa7156',
'only_matching': True,
}]
@staticmethod
@@ -275,6 +287,11 @@ class PornHubIE(PornHubBaseIE):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
if any(re.search(p, webpage) for p in (
r'class=["\']geoBlocked["\']',
r'>\s*This content is unavailable in your country')):
self.raise_geo_restricted()
# video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
@@ -408,17 +425,14 @@ class PornHubIE(PornHubBaseIE):
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
return
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
if mobj:
if not height:
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
if not height:
height = int_or_none(self._search_regex(
r'(?P<height>\d+)[pP]?_\d+[kK]', format_url, 'height',
default=None))
formats.append({
'url': format_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
for video_url, height in video_urls:
@@ -440,7 +454,10 @@ class PornHubIE(PornHubBaseIE):
add_format(video_url, height)
continue
add_format(video_url)
self._sort_formats(formats)
# field_preference is unnecessary here, but kept for code-similarity with youtube-dl
self._sort_formats(
formats, field_preference=('height', 'width', 'fps', 'format_id'))
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
@@ -464,7 +481,7 @@ class PornHubIE(PornHubBaseIE):
r'(?s)<div[^>]+\bclass=["\'].*?\b%sWrapper[^>]*>(.+?)</div>'
% meta_key, webpage, meta_key, default=None)
if div:
return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)
return [clean_html(x).strip() for x in re.findall(r'(?s)<a[^>]+\bhref=[^>]+>.+?</a>', div)]
info = self._search_json_ld(webpage, video_id, default={})
# description provided in JSON-LD is irrelevant
@@ -485,6 +502,7 @@ class PornHubIE(PornHubBaseIE):
'age_limit': 18,
'tags': extract_list('tags'),
'categories': extract_list('categories'),
'cast': extract_list('pornstars'),
'subtitles': subtitles,
}, info)
@@ -513,7 +531,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?%s/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118,
@@ -542,6 +560,9 @@ class PornHubUserIE(PornHubPlaylistBaseIE):
# Same as before, multi page
'url': 'https://www.pornhubpremium.com/pornstar/lily-labeau',
'only_matching': True,
}, {
'url': 'https://pornhubthbh7ap3u.onion/model/zoe_ph',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -617,7 +638,7 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_VALID_URL = r'https?://(?:[^/]+\.)?%s/(?P<id>(?:[^/]+/)*[^/?#&]+)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True,
@@ -722,6 +743,9 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
}, {
'url': 'https://de.pornhub.com/playlist/4667351',
'only_matching': True,
}, {
'url': 'https://pornhubthbh7ap3u.onion/model/zoe_ph/videos',
'only_matching': True,
}]
@classmethod
@@ -732,7 +756,7 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?%s/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': {
@@ -742,4 +766,7 @@ class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
}, {
'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload',
'only_matching': True,
}, {
'url': 'http://pornhubthbh7ap3u.onion/pornstar/jenny-blighe/videos/upload',
'only_matching': True,
}]
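All the PornHub extractors above build their _VALID_URL from the shared _PORNHUB_HOST_RE, which is what lets the .onion mirror match everywhere. A simplified sketch of the composition (the URL pattern here is cut down to the viewkey form only):

import re

HOST_RE = r'(?:(?P<host>pornhub(?:premium)?\.(?:com|net|org))|pornhubthbh7ap3u\.onion)'
VALID_URL = r'https?://(?:[^/]+\.)?%s/view_video\.php\?viewkey=(?P<id>[\da-z]+)' % HOST_RE

for url in ('https://www.pornhub.com/view_video.php?viewkey=ph5a9813bfa7156',
            'http://pornhubthbh7ap3u.onion/view_video.php?viewkey=ph5a9813bfa7156'):
    print(bool(re.match(VALID_URL, url)))  # True for both hosts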

242 yt_dlp/extractor/rcti.py Normal file
View File

@@ -0,0 +1,242 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .openload import PhantomJSwrapper
from .common import InfoExtractor
from ..utils import (
ExtractorError,
RegexNotFoundError,
strip_or_none,
try_get
)
class RCTIPlusBaseIE(InfoExtractor):
def _real_initialize(self):
self._AUTH_KEY = self._download_json(
'https://api.rctiplus.com/api/v1/visitor?platform=web', # platform can be web, mweb, android, ios
None, 'Fetching authorization key')['data']['access_token']
def _call_api(self, url, video_id, note=None):
json = self._download_json(
url, video_id, note=note, headers={'Authorization': self._AUTH_KEY})
if json.get('status', {}).get('code', 0) != 0:
raise ExtractorError('%s said: %s' % (self.IE_NAME, json["status"]["message_client"]), cause=json)
return json.get('data'), json.get('meta')
class RCTIPlusIE(RCTIPlusBaseIE):
_VALID_URL = r'https://www\.rctiplus\.com/programs/\d+?/.*?/(?P<type>episode|clip|extra)/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.rctiplus.com/programs/1259/kiko-untuk-lola/episode/22124/untuk-lola',
'md5': '56ed45affad45fa18d5592a1bc199997',
'info_dict': {
'id': 'v_e22124',
'title': 'Untuk Lola',
'display_id': 'untuk-lola',
'description': 'md5:2b809075c0b1e071e228ad6d13e41deb',
'ext': 'mp4',
'duration': 1400,
'timestamp': 1615978800,
'upload_date': '20210317',
'series': 'Kiko : Untuk Lola',
'season_number': 1,
'episode_number': 1,
'channel': 'RCTI',
},
'params': {
'fixup': 'never',
},
}, { # Clip; Series title doesn't appear on metadata JSON
'url': 'https://www.rctiplus.com/programs/316/cahaya-terindah/clip/3921/make-a-wish',
'md5': 'd179b2ff356f0e91a53bcc6a4d8504f0',
'info_dict': {
'id': 'v_c3921',
'title': 'Make A Wish',
'display_id': 'make-a-wish',
'description': 'Make A Wish',
'ext': 'mp4',
'duration': 288,
'timestamp': 1571652600,
'upload_date': '20191021',
'series': 'Cahaya Terindah',
'channel': 'RCTI',
},
'params': {
'fixup': 'never',
},
}, { # Extra
'url': 'https://www.rctiplus.com/programs/616/inews-malam/extra/9438/diungkapkan-melalui-surat-terbuka-ceo-ruangguru-belva-devara-mundur-dari-staf-khusus-presiden',
'md5': 'c48106afdbce609749f5e0c007d9278a',
'info_dict': {
'id': 'v_ex9438',
'title': 'md5:2ede828c0f8bde249e0912be150314ca',
'display_id': 'md5:62b8d4e9ff096db527a1ad797e8a9933',
'description': 'md5:2ede828c0f8bde249e0912be150314ca',
'ext': 'mp4',
'duration': 93,
'timestamp': 1587561540,
'upload_date': '20200422',
'series': 'iNews Malam',
'channel': 'INews',
},
'params': {
'format': 'bestvideo',
},
}]
def _search_auth_key(self, webpage):
try:
self._AUTH_KEY = self._search_regex(
r'\'Authorization\':"(?P<auth>[^"]+)"', webpage, 'auth-key')
except RegexNotFoundError:
pass
def _real_extract(self, url):
video_type, video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
self._search_auth_key(webpage)
video_json = self._call_api(
'https://api.rctiplus.com/api/v1/%s/%s/url?appierid=.1' % (video_type, video_id), display_id, 'Downloading video URL JSON')[0]
video_url = video_json['url']
if 'akamaized' in video_url:
# Akamai's CDN requires a session to at least be made via Conviva's API
# TODO: Reverse-engineer Conviva's heartbeat code to avoid phantomJS
phantom = None
try:
phantom = PhantomJSwrapper(self)
phantom.get(url, webpage, display_id, note2='Initiating video session')
except ExtractorError:
self.report_warning('PhantomJS is highly recommended for this video, as it might load incredibly slowly otherwise. '
'You can also try opening the page in this device\'s browser first')
video_meta, meta_paths = self._call_api(
'https://api.rctiplus.com/api/v1/%s/%s' % (video_type, video_id), display_id, 'Downloading video metadata')
thumbnails, image_path = [], meta_paths.get('image_path', 'https://rstatic.akamaized.net/media/')
if video_meta.get('portrait_image'):
thumbnails.append({
'id': 'portrait_image',
'url': '%s%d%s' % (image_path, 2000, video_meta['portrait_image']) # 2000px seems to be the highest resolution that can be given
})
if video_meta.get('landscape_image'):
thumbnails.append({
'id': 'landscape_image',
'url': '%s%d%s' % (image_path, 2000, video_meta['landscape_image'])
})
formats = self._extract_m3u8_formats(video_url, display_id, 'mp4', headers={'Referer': 'https://www.rctiplus.com/'})
for f in formats:
if 'akamaized' in f['url']:
f.setdefault('http_headers', {})['Referer'] = 'https://www.rctiplus.com/' # Referer header is required for akamai CDNs
self._sort_formats(formats)
return {
'id': video_meta.get('product_id') or video_json.get('product_id'),
'title': video_meta.get('title') or video_json.get('content_name'),
'display_id': display_id,
'description': video_meta.get('summary'),
'timestamp': video_meta.get('release_date'),
'duration': video_meta.get('duration'),
'categories': [video_meta.get('genre')],
'average_rating': video_meta.get('star_rating'),
'series': video_meta.get('program_title') or video_json.get('program_title'),
'season_number': video_meta.get('season'),
'episode_number': video_meta.get('episode'),
'channel': video_json.get('tv_name'),
'channel_id': video_json.get('tv_id'),
'formats': formats,
'thumbnails': thumbnails
}
class RCTIPlusSeriesIE(RCTIPlusBaseIE):
_VALID_URL = r'https://www\.rctiplus\.com/programs/(?P<id>\d+)/(?P<display_id>[^/?#&]+)(?:\W)*$'
_TESTS = [{
'url': 'https://www.rctiplus.com/programs/540/upin-ipin',
'playlist_mincount': 417,
'info_dict': {
'id': '540',
'title': 'Upin & Ipin',
'description': 'md5:22cc912381f389664416844e1ec4f86b',
},
}, {
'url': 'https://www.rctiplus.com/programs/540/upin-ipin/#',
'only_matching': True,
}]
_AGE_RATINGS = { # Based off https://id.wikipedia.org/wiki/Sistem_rating_konten_televisi with additional ratings
'S-SU': 2,
'SU': 2,
'P': 2,
'A': 7,
'R': 13,
'R-R/1': 17, # Labelled as 17+ despite being R
'D': 18,
}
def _entries(self, url, display_id=None, note='Downloading entries JSON', metadata={}):
total_pages = 0
try:
total_pages = self._call_api(
'%s&length=20&page=0' % url,
display_id, note)[1]['pagination']['total_page']
except ExtractorError as e:
if 'not found' in str(e):
return []
raise e
if total_pages <= 0:
return []
for page_num in range(1, total_pages + 1):
episode_list = self._call_api(
'%s&length=20&page=%s' % (url, page_num),
display_id, '%s page %s' % (note, page_num))[0] or []
for video_json in episode_list:
link = video_json['share_link']
url_res = self.url_result(link, 'RCTIPlus', video_json.get('product_id'), video_json.get('title'))
url_res.update(metadata)
yield url_res
def _real_extract(self, url):
series_id, display_id = re.match(self._VALID_URL, url).groups()
series_meta, meta_paths = self._call_api(
'https://api.rctiplus.com/api/v1/program/%s/detail' % series_id, display_id, 'Downloading series metadata')
metadata = {
'age_limit': try_get(series_meta, lambda x: self._AGE_RATINGS[x['age_restriction'][0]['code']])
}
cast = []
for star in series_meta.get('starring', []):
cast.append(strip_or_none(star.get('name')))
for star in series_meta.get('creator', []):
cast.append(strip_or_none(star.get('name')))
for star in series_meta.get('writer', []):
cast.append(strip_or_none(star.get('name')))
metadata['cast'] = cast
tags = []
for tag in series_meta.get('tag', []):
tags.append(strip_or_none(tag.get('name')))
metadata['tag'] = tags
entries = []
seasons_list = self._call_api(
'https://api.rctiplus.com/api/v1/program/%s/season' % series_id, display_id, 'Downloading seasons list JSON')[0]
for season in seasons_list:
entries.append(self._entries('https://api.rctiplus.com/api/v2/program/%s/episode?season=%s' % (series_id, season['season']),
display_id, 'Downloading season %s episode entries' % season['season'], metadata))
entries.append(self._entries('https://api.rctiplus.com/api/v2/program/%s/clip?content_id=0' % series_id,
display_id, 'Downloading clip entries', metadata))
entries.append(self._entries('https://api.rctiplus.com/api/v2/program/%s/extra?content_id=0' % series_id,
display_id, 'Downloading extra entries', metadata))
return self.playlist_result(itertools.chain(*entries), series_id, series_meta.get('title'), series_meta.get('summary'), **metadata)
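The _entries generator above follows the usual paginated-API pattern: one probe request to learn the page count, then one request per page, yielded lazily. A stripped-down sketch with fetch_page standing in for _call_api (returning the same (data, meta) pair):

def paginated_entries(fetch_page):
    # Page 0 is only used for its pagination metadata.
    meta = fetch_page(0)[1]
    total_pages = meta['pagination']['total_page']
    for page_num in range(1, total_pages + 1):
        for item in fetch_page(page_num)[0] or []:
            yield item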

View File

@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import itertools
import re
import json
import random
# import random
from .common import (
InfoExtractor,
@@ -164,23 +164,11 @@ class SoundcloudIE(InfoExtractor):
},
# downloadable song
{
'url': 'https://soundcloud.com/oddsamples/bus-brakes',
'md5': '7624f2351f8a3b2e7cd51522496e7631',
'url': 'https://soundcloud.com/the80m/the-following',
'md5': '9ffcddb08c87d74fb5808a3c183a1d04',
'info_dict': {
'id': '128590877',
'ext': 'mp3',
'title': 'Bus Brakes',
'description': 'md5:0053ca6396e8d2fd7b7e1595ef12ab66',
'uploader': 'oddsamples',
'uploader_id': '73680509',
'timestamp': 1389232924,
'upload_date': '20140109',
'duration': 17.346,
'license': 'cc-by-sa',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
'id': '343609555',
'ext': 'wav',
},
},
# private link, downloadable format
@@ -317,12 +305,13 @@ class SoundcloudIE(InfoExtractor):
raise
def _real_initialize(self):
self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or "T5R4kgWS2PRf6lzLyIravUMnKlbIxQag" # 'EXLwg5lHTO2dslU5EePe3xkw0m1h86Cd' # 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'fXuVKzsVXlc6tzniWWS31etd7VHWFUuN' # persistent `client_id`
self._login()
_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
_API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
_API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_access_token = None
_HEADERS = {}
_NETRC_MACHINE = 'soundcloud'
@@ -332,6 +321,23 @@ class SoundcloudIE(InfoExtractor):
if username is None:
return
if username == 'oauth' and password is not None:
self._access_token = password
query = self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID
payload = {'session': {'access_token': self._access_token}}
token_verification = sanitized_Request(self._API_VERIFY_AUTH_TOKEN % query, json.dumps(payload).encode('utf-8'))
response = self._download_json(token_verification, None, note='Verifying login token...', fatal=False)
if response is not False:
self._HEADERS = {'Authorization': 'OAuth ' + self._access_token}
self.report_login()
else:
self.report_warning('Provided authorization token seems to be invalid. Continuing as guest')
elif username is not None:
self.report_warning(
'Login using username and password is not currently supported. '
'Use "--user oauth --password <oauth_token>" to login using an oauth token')
r'''
def genDevId():
def genNumBlock():
return ''.join([str(random.randrange(10)) for i in range(6)])
@@ -358,6 +364,7 @@ class SoundcloudIE(InfoExtractor):
self.report_warning('Unable to get access token, login may have failed')
else:
self._HEADERS = {'Authorization': 'OAuth ' + self._access_token}
'''
# signature generation
def sign(self, user, pw, clid):
@@ -370,9 +377,9 @@ class SoundcloudIE(InfoExtractor):
b = 37
k = 37
c = 5
n = "0763ed7314c69015fd4a0dc16bbf4b90" # _KEY
y = "8" # _REV
r = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36" # _USER_AGENT
n = '0763ed7314c69015fd4a0dc16bbf4b90' # _KEY
y = '8' # _REV
r = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36' # _USER_AGENT
e = user # _USERNAME
t = clid # _CLIENT_ID
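The new oauth login boils down to one verification call plus one header. A sketch of the flow, with the endpoint and payload taken from the code above and assuming the endpoint answers an invalid token with an HTTP error:

import json
import urllib.request

def soundcloud_oauth_headers(access_token, client_id):
    req = urllib.request.Request(
        'https://api-auth.soundcloud.com/connect/session?client_id=%s' % client_id,
        data=json.dumps({'session': {'access_token': access_token}}).encode('utf-8'),
        headers={'Content-Type': 'application/json'})
    urllib.request.urlopen(req)  # assumed to raise on an invalid token
    # On success, every later API request carries the token:
    return {'Authorization': 'OAuth ' + access_token}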

View File

@@ -16,7 +16,7 @@ from ..utils import (
class TBSIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com(?P<path>/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+))'
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com(?P<path>/(?:movies|watchtnt|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+))'
_TESTS = [{
'url': 'http://www.tntdrama.com/shows/the-alienist/clips/monster',
'info_dict': {
@@ -45,7 +45,8 @@ class TBSIE(TurnerBaseIE):
drupal_settings = self._parse_json(self._search_regex(
r'<script[^>]+?data-drupal-selector="drupal-settings-json"[^>]*?>({.+?})</script>',
webpage, 'drupal setting'), display_id)
video_data = next(v for v in drupal_settings['turner_playlist'] if v.get('url') == path)
isLive = 'watchtnt' in path
video_data = next(v for v in drupal_settings['turner_playlist'] if isLive or v.get('url') == path)
media_id = video_data['mediaID']
title = video_data['title']
@@ -56,7 +57,8 @@ class TBSIE(TurnerBaseIE):
media_id, tokenizer_query, {
'url': url,
'site_name': site[:3].upper(),
'auth_required': video_data.get('authRequired') == '1',
'auth_required': video_data.get('authRequired') == '1' or isLive,
'is_live': isLive
})
thumbnails = []
@@ -85,5 +87,6 @@ class TBSIE(TurnerBaseIE):
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'thumbnails': thumbnails,
'is_live': isLive
})
return info

View File

@@ -221,6 +221,7 @@ class TurnerBaseIE(AdobePassIE):
}
def _extract_ngtv_info(self, media_id, tokenizer_query, ap_data=None):
is_live = ap_data.get('is_live')
streams_data = self._download_json(
'http://medium.ngtv.io/media/%s/tv' % media_id,
media_id)['media']['tv']
@@ -237,11 +238,11 @@ class TurnerBaseIE(AdobePassIE):
'http://token.ngtv.io/token/token_spe',
m3u8_url, media_id, ap_data or {}, tokenizer_query)
formats.extend(self._extract_m3u8_formats(
m3u8_url, media_id, 'mp4', m3u8_id='hls', fatal=False))
m3u8_url, media_id, 'mp4', m3u8_id='hls', live=is_live, fatal=False))
duration = float_or_none(stream_data.get('totalRuntime'))
if not chapters:
if not chapters and not is_live:
for chapter in stream_data.get('contentSegments', []):
start_time = float_or_none(chapter.get('start'))
chapter_duration = float_or_none(chapter.get('duration'))

View File

@@ -5,12 +5,14 @@ import itertools
import re
from .common import InfoExtractor
from ..downloader.websocket import has_websockets
from ..utils import (
clean_html,
float_or_none,
get_element_by_class,
get_element_by_id,
parse_duration,
qualities,
str_to_int,
try_get,
unified_timestamp,
@@ -89,9 +91,24 @@ class TwitCastingIE(InfoExtractor):
video_js_data = video_js_data[0]
m3u8_url = try_get(video_js_data, lambda x: x['source']['url'])
stream_server_data = self._download_json(
'https://twitcasting.tv/streamserver.php?target=%s&mode=client' % uploader_id, video_id,
'Downloading live info', fatal=False)
is_live = 'data-status="online"' in webpage
formats = []
if is_live and not m3u8_url:
m3u8_url = 'https://twitcasting.tv/%s/metastream.m3u8' % uploader_id
if is_live and has_websockets and stream_server_data:
qq = qualities(['base', 'mobilesource', 'main'])
for mode, ws_url in stream_server_data['llfmp4']['streams'].items():
formats.append({
'url': ws_url,
'format_id': 'ws-%s' % mode,
'ext': 'mp4',
'quality': qq(mode),
'protocol': 'websocket_frag', # TwitCasting simply sends moof atom directly over WS
})
thumbnail = video_js_data.get('thumbnailUrl') or self._og_search_thumbnail(webpage)
description = clean_html(get_element_by_id(
@@ -106,10 +123,9 @@ class TwitCastingIE(InfoExtractor):
r'data-toggle="true"[^>]+datetime="([^"]+)"',
webpage, 'datetime', None))
formats = None
if m3u8_url:
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', live=is_live)
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', live=is_live))
self._sort_formats(formats)
return {

View File

@@ -28,7 +28,7 @@ class UMGDeIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'https://api.universal-music.de/graphql',
'https://graphql.universal-music.de/',
video_id, query={
'query': '''{
universalMusic(channel:16) {
@@ -56,11 +56,9 @@ class UMGDeIE(InfoExtractor):
formats = []
def add_m3u8_format(format_id):
m3u8_formats = self._extract_m3u8_formats(
formats.extend(self._extract_m3u8_formats(
hls_url_template % format_id, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal='False')
if m3u8_formats and m3u8_formats[0].get('height'):
formats.extend(m3u8_formats)
'm3u8_native', m3u8_id='hls', fatal=False))
for f in video_data.get('formats', []):
f_url = f.get('url')

View File

@@ -12,6 +12,7 @@ from ..utils import (
mimetype2ext,
parse_codecs,
update_url_query,
urljoin,
xpath_element,
xpath_text,
)
@@ -19,6 +20,7 @@ from ..compat import (
compat_b64decode,
compat_ord,
compat_struct_pack,
compat_urlparse,
)
@@ -95,9 +97,13 @@ class VideaIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
query = {'v': video_id}
player_page = self._download_webpage(
'https://videa.hu/player', video_id, query=query)
video_page = self._download_webpage(url, video_id)
player_url = self._search_regex(
r'<iframe.*?src="(/player\?[^"]+)"', video_page, 'player url')
player_url = urljoin(url, player_url)
player_page = self._download_webpage(player_url, video_id)
nonce = self._search_regex(
r'_xt\s*=\s*"([^"]+)"', player_page, 'nonce')
@@ -107,6 +113,7 @@ class VideaIE(InfoExtractor):
for i in range(0, 32):
result += s[i - (self._STATIC_SECRET.index(l[i]) - 31)]
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(player_url).query)
random_seed = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(8))
query['_s'] = random_seed
query['_t'] = result[:16]
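After decoding the _xt nonce, the player query is re-signed with a fresh 8-character alphanumeric seed plus the first half of the decoded string; the remainder of the decoded string appears to feed the response decryption elsewhere in the extractor. A sketch of just the signing step, with result standing in for the decoded 32-character string:

import random
import string

def sign_query(query, result):
    random_seed = ''.join(
        random.choice(string.ascii_letters + string.digits) for _ in range(8))
    query['_s'] = random_seed   # fresh per-request seed
    query['_t'] = result[:16]   # first half of the decoded nonce
    return query, random_seed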

View File

@@ -142,6 +142,7 @@ class VikiIE(VikiBaseIE):
IE_NAME = 'viki'
_VALID_URL = r'%s(?:videos|player)/(?P<id>[0-9]+v)' % VikiBaseIE._VALID_URL_BASE
_TESTS = [{
'note': 'Free non-DRM video with storyboards in MPD',
'url': 'https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1',
'info_dict': {
'id': '1175236v',
@@ -155,7 +156,6 @@ class VikiIE(VikiBaseIE):
'params': {
'format': 'bestvideo',
},
'expected_warnings': ['Unknown MIME type image/jpeg in DASH manifest'],
}, {
'url': 'http://www.viki.com/videos/1023585v-heirs-episode-14',
'info_dict': {
@@ -173,7 +173,6 @@ class VikiIE(VikiBaseIE):
'format': 'bestvideo',
},
'skip': 'Blocked in the US',
'expected_warnings': ['Unknown MIME type image/jpeg in DASH manifest'],
}, {
# clip
'url': 'http://www.viki.com/videos/1067139v-the-avengers-age-of-ultron-press-conference',
@@ -225,7 +224,6 @@ class VikiIE(VikiBaseIE):
'params': {
'format': 'bestvideo',
},
'expected_warnings': ['Unknown MIME type image/jpeg in DASH manifest'],
}, {
# youtube external
'url': 'http://www.viki.com/videos/50562v-poor-nastya-complete-episode-1',
@@ -264,7 +262,6 @@ class VikiIE(VikiBaseIE):
'params': {
'format': 'bestvideo',
},
'expected_warnings': ['Unknown MIME type image/jpeg in DASH manifest'],
}]
def _real_extract(self, url):

View File

@@ -22,6 +22,7 @@ from ..utils import (
)
from .brightcove import BrightcoveNewIE
from .youtube import YoutubeIE
class YahooIE(InfoExtractor):
@@ -38,6 +39,7 @@ class YahooIE(InfoExtractor):
'timestamp': 1369812016,
'upload_date': '20130529',
},
'skip': 'No longer exists',
}, {
'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
'md5': '7993e572fac98e044588d0b5260f4352',
@@ -50,6 +52,7 @@ class YahooIE(InfoExtractor):
'timestamp': 1406838636,
'upload_date': '20140731',
},
'skip': 'Unfortunately, this video is not available in your region',
}, {
'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
'md5': '71298482f7c64cbb7fa064e4553ff1c1',
@@ -61,7 +64,8 @@ class YahooIE(InfoExtractor):
'duration': 97,
'timestamp': 1414489862,
'upload_date': '20141028',
}
},
'skip': 'No longer exists',
}, {
'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
'md5': '88e209b417f173d86186bef6e4d1f160',
@@ -120,6 +124,7 @@ class YahooIE(InfoExtractor):
'season_number': 6,
'episode_number': 1,
},
'skip': 'No longer exists',
}, {
# ytwnews://cavideo/
'url': 'https://tw.video.yahoo.com/movie-tw/單車天使-中文版預-092316541.html',
@@ -156,7 +161,7 @@ class YahooIE(InfoExtractor):
'id': '352CFDOQrKg',
'ext': 'mp4',
'title': 'Kyndal Inskeep "Performs the Hell Out of" Sia\'s "Elastic Heart" - The Voice Knockouts 2019',
'description': 'md5:35b61e94c2ae214bc965ff4245f80d11',
'description': 'md5:7fe8e3d5806f96002e55f190d1d94479',
'uploader': 'The Voice',
'uploader_id': 'NBCTheVoice',
'upload_date': '20191029',
@@ -165,7 +170,7 @@ class YahooIE(InfoExtractor):
'params': {
'playlistend': 2,
},
'expected_warnings': ['HTTP Error 404'],
'expected_warnings': ['HTTP Error 404', 'Ignoring subtitle tracks'],
}, {
'url': 'https://malaysia.news.yahoo.com/video/bystanders-help-ontario-policeman-bust-190932818.html',
'only_matching': True,
@@ -280,12 +285,13 @@ class YahooIE(InfoExtractor):
else:
country = country.split('-')[0]
item = self._download_json(
items = self._download_json(
'https://%s.yahoo.com/caas/content/article' % country, display_id,
'Downloading content JSON metadata', query={
'url': url
})['items'][0]['data']['partnerData']
})['items'][0]
item = items['data']['partnerData']
if item.get('type') != 'video':
entries = []
@@ -299,9 +305,19 @@ class YahooIE(InfoExtractor):
for e in (item.get('body') or []):
if e.get('type') == 'videoIframe':
iframe_url = e.get('url')
if not iframe_url:
continue
if iframe_url:
entries.append(self.url_result(iframe_url))
if item.get('type') == 'storywithleadvideo':
iframe_url = try_get(item, lambda x: x['meta']['player']['url'])
if iframe_url:
entries.append(self.url_result(iframe_url))
else:
self.report_warning("Yahoo didn't provide an iframe url for this storywithleadvideo")
if items.get('markup'):
entries.extend(
self.url_result(yt_url) for yt_url in YoutubeIE._extract_urls(items['markup']))
return self.playlist_result(
entries, item.get('uuid'),

View File

@@ -3,6 +3,7 @@
from __future__ import unicode_literals
import calendar
import copy
import hashlib
import itertools
import json
@@ -294,39 +295,181 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if not self._login():
return
_YT_WEB_CLIENT_VERSION = '2.20210407.08.00'
_YT_INNERTUBE_API_KEY = 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8'
_YT_INITIAL_DATA_RE = r'(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;'
_YT_INITIAL_PLAYER_RESPONSE_RE = r'ytInitialPlayerResponse\s*=\s*({.+?})\s*;'
_YT_INITIAL_BOUNDARY_RE = r'(?:var\s+meta|</script|\n)'
def _generate_sapisidhash_header(self):
sapisid_cookie = self._get_cookies('https://www.youtube.com').get('SAPISID')
_YT_DEFAULT_YTCFGS = {
'WEB': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB',
'INNERTUBE_CLIENT_VERSION': '2.20210622.10.00',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20210622.10.00',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1
},
'WEB_REMIX': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB_REMIX',
'INNERTUBE_CLIENT_VERSION': '1.20210621.00.00',
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_REMIX',
'clientVersion': '1.20210621.00.00',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 67
},
'WEB_EMBEDDED_PLAYER': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB_EMBEDDED_PLAYER',
'INNERTUBE_CLIENT_VERSION': '1.20210620.0.1',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_EMBEDDED_PLAYER',
'clientVersion': '1.20210620.0.1',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 56
},
'ANDROID': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 'ANDROID'
},
'ANDROID_EMBEDDED_PLAYER': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID_EMBEDDED_PLAYER',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_EMBEDDED_PLAYER',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 'ANDROID_EMBEDDED_PLAYER'
},
'ANDROID_MUSIC': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID_MUSIC',
'INNERTUBE_CLIENT_VERSION': '4.32',
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_MUSIC',
'clientVersion': '4.32',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 'ANDROID_MUSIC'
}
}
_YT_DEFAULT_INNERTUBE_HOSTS = {
'DIRECT': 'youtubei.googleapis.com',
'WEB': 'www.youtube.com',
'WEB_REMIX': 'music.youtube.com',
'ANDROID_MUSIC': 'music.youtube.com'
}
def _get_default_ytcfg(self, client='WEB'):
if client in self._YT_DEFAULT_YTCFGS:
return copy.deepcopy(self._YT_DEFAULT_YTCFGS[client])
self.write_debug(f'INNERTUBE default client {client} does not exist - falling back to WEB client.')
return copy.deepcopy(self._YT_DEFAULT_YTCFGS['WEB'])
def _get_innertube_host(self, client='WEB'):
return dict_get(self._YT_DEFAULT_INNERTUBE_HOSTS, (client, 'WEB'))
def _ytcfg_get_safe(self, ytcfg, getter, expected_type=None, default_client='WEB'):
# try_get but with fallback to default ytcfg client values when present
_func = lambda y: try_get(y, getter, expected_type)
return _func(ytcfg) or _func(self._get_default_ytcfg(default_client))
def _extract_client_name(self, ytcfg, default_client='WEB'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CLIENT_NAME'], compat_str, default_client)
def _extract_client_version(self, ytcfg, default_client='WEB'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str, default_client)
def _extract_api_key(self, ytcfg=None, default_client='WEB'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_API_KEY'], compat_str, default_client)
def _extract_context(self, ytcfg=None, default_client='WEB'):
_get_context = lambda y: try_get(y, lambda x: x['INNERTUBE_CONTEXT'], dict)
context = _get_context(ytcfg)
if context:
return context
context = _get_context(self._get_default_ytcfg(default_client))
if not ytcfg:
return context
# Recreate the client context (required)
context['client'].update({
'clientVersion': self._extract_client_version(ytcfg, default_client),
'clientName': self._extract_client_name(ytcfg, default_client),
})
visitor_data = try_get(ytcfg, lambda x: x['VISITOR_DATA'], compat_str)
if visitor_data:
context['client']['visitorData'] = visitor_data
return context
def _generate_sapisidhash_header(self, origin='https://www.youtube.com'):
# Sometimes SAPISID cookie isn't present but __Secure-3PAPISID is.
# See: https://github.com/yt-dlp/yt-dlp/issues/393
yt_cookies = self._get_cookies('https://www.youtube.com')
sapisid_cookie = dict_get(
yt_cookies, ('__Secure-3PAPISID', 'SAPISID'))
if sapisid_cookie is None:
return
time_now = round(time.time())
sapisidhash = hashlib.sha1((str(time_now) + " " + sapisid_cookie.value + " " + "https://www.youtube.com").encode("utf-8")).hexdigest()
return "SAPISIDHASH %s_%s" % (time_now, sapisidhash)
# SAPISID cookie is required if not already present
if not yt_cookies.get('SAPISID'):
self._set_cookie(
'.youtube.com', 'SAPISID', sapisid_cookie.value, secure=True, expire_time=time_now + 3600)
# SAPISIDHASH algorithm from https://stackoverflow.com/a/32065323
sapisidhash = hashlib.sha1(
f'{time_now} {sapisid_cookie.value} {origin}'.encode('utf-8')).hexdigest()
return f'SAPISIDHASH {time_now}_{sapisidhash}'
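The SAPISIDHASH scheme is self-contained enough to sketch standalone: SHA-1 over 'timestamp SAPISID origin', sent as 'SAPISIDHASH <timestamp>_<digest>' (per the StackOverflow answer cited in the code):

import hashlib
import time

def sapisidhash(sapisid, origin='https://www.youtube.com'):
    now = round(time.time())
    digest = hashlib.sha1(f'{now} {sapisid} {origin}'.encode('utf-8')).hexdigest()
    return f'SAPISIDHASH {now}_{digest}'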
def _call_api(self, ep, query, video_id, fatal=True, headers=None,
note='Downloading API JSON', errnote='Unable to download API page',
context=None, api_key=None):
context=None, api_key=None, api_hostname=None, default_client='WEB'):
data = {'context': context} if context else {'context': self._extract_context()}
data = {'context': context} if context else {'context': self._extract_context(default_client=default_client)}
data.update(query)
real_headers = self._generate_api_headers()
real_headers = self._generate_api_headers(client=default_client)
real_headers.update({'content-type': 'application/json'})
if headers:
real_headers.update(headers)
return self._download_json(
'https://www.youtube.com/youtubei/v1/%s' % ep,
'https://%s/youtubei/v1/%s' % (api_hostname or self._get_innertube_host(default_client), ep),
video_id=video_id, fatal=fatal, note=note, errnote=errnote,
data=json.dumps(data).encode('utf8'), headers=real_headers,
query={'key': api_key or self._extract_api_key()})
def _extract_api_key(self, ytcfg=None):
return try_get(ytcfg, lambda x: x['INNERTUBE_API_KEY'], compat_str) or self._YT_INNERTUBE_API_KEY
def _extract_yt_initial_data(self, video_id, webpage):
return self._parse_json(
self._search_regex(
@@ -368,46 +511,118 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
default='{}'), video_id, fatal=False) or {}
def __extract_client_version(self, ytcfg):
return try_get(ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str) or self._YT_WEB_CLIENT_VERSION
def _extract_context(self, ytcfg=None):
context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'], dict)
if context:
return context
# Recreate the client context (required)
client_version = self.__extract_client_version(ytcfg)
client_name = try_get(ytcfg, lambda x: x['INNERTUBE_CLIENT_NAME'], compat_str) or 'WEB'
context = {
'client': {
'clientName': client_name,
'clientVersion': client_version,
}
}
visitor_data = try_get(ytcfg, lambda x: x['VISITOR_DATA'], compat_str)
if visitor_data:
context['client']['visitorData'] = visitor_data
return context
def _generate_api_headers(self, ytcfg=None, identity_token=None, account_syncid=None, visitor_data=None):
def _generate_api_headers(self, ytcfg=None, identity_token=None, account_syncid=None,
visitor_data=None, api_hostname=None, client='WEB'):
origin = 'https://' + (api_hostname if api_hostname else self._get_innertube_host(client))
headers = {
'X-YouTube-Client-Name': '1',
'X-YouTube-Client-Version': self.__extract_client_version(ytcfg),
'X-YouTube-Client-Name': compat_str(
self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CONTEXT_CLIENT_NAME'], default_client=client)),
'X-YouTube-Client-Version': self._extract_client_version(ytcfg, client),
'Origin': origin
}
if identity_token:
headers['x-youtube-identity-token'] = identity_token
headers['X-Youtube-Identity-Token'] = identity_token
if account_syncid:
headers['X-Goog-PageId'] = account_syncid
headers['X-Goog-AuthUser'] = 0
if visitor_data:
headers['x-goog-visitor-id'] = visitor_data
auth = self._generate_sapisidhash_header()
headers['X-Goog-Visitor-Id'] = visitor_data
auth = self._generate_sapisidhash_header(origin)
if auth is not None:
headers['Authorization'] = auth
headers['X-Origin'] = 'https://www.youtube.com'
headers['X-Origin'] = origin
return headers
@staticmethod
def _extract_alerts(data):
for alert_dict in try_get(data, lambda x: x['alerts'], list) or []:
if not isinstance(alert_dict, dict):
continue
for alert in alert_dict.values():
alert_type = alert.get('type')
if not alert_type:
continue
message = try_get(alert, lambda x: x['text']['simpleText'], compat_str) or ''
if message:
yield alert_type, message
for run in try_get(alert, lambda x: x['text']['runs'], list) or []:
message += try_get(run, lambda x: x['text'], compat_str)
if message:
yield alert_type, message
def _report_alerts(self, alerts, expected=True):
errors = []
warnings = []
for alert_type, alert_message in alerts:
if alert_type.lower() == 'error':
errors.append([alert_type, alert_message])
else:
warnings.append([alert_type, alert_message])
for alert_type, alert_message in (warnings + errors[:-1]):
self.report_warning('YouTube said: %s - %s' % (alert_type, alert_message))
if errors:
raise ExtractorError('YouTube said: %s' % errors[-1][1], expected=expected)
def _extract_and_report_alerts(self, data, *args, **kwargs):
return self._report_alerts(self._extract_alerts(data), *args, **kwargs)
def _extract_response(self, item_id, query, note='Downloading API JSON', headers=None,
ytcfg=None, check_get_keys=None, ep='browse', fatal=True, api_hostname=None,
default_client='WEB'):
response = None
last_error = None
count = -1
retries = self.get_param('extractor_retries', 3)
if check_get_keys is None:
check_get_keys = []
while count < retries:
count += 1
if last_error:
self.report_warning('%s. Retrying ...' % last_error)
try:
response = self._call_api(
ep=ep, fatal=True, headers=headers,
video_id=item_id, query=query,
context=self._extract_context(ytcfg, default_client),
api_key=self._extract_api_key(ytcfg, default_client),
api_hostname=api_hostname, default_client=default_client,
note='%s%s' % (note, ' (retry #%d)' % count if count else ''))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503, 404):
# Downloading page may result in intermittent 5xx HTTP error
# Sometimes a 404 is also received. See: https://github.com/ytdl-org/youtube-dl/issues/28289
last_error = 'HTTP Error %s' % e.cause.code
if count < retries:
continue
if fatal:
raise
else:
self.report_warning(error_to_compat_str(e))
return
else:
# Youtube may send alerts if there was an issue with the continuation page
try:
self._extract_and_report_alerts(response, expected=False)
except ExtractorError as e:
if fatal:
raise
self.report_warning(error_to_compat_str(e))
return
if not check_get_keys or dict_get(response, check_get_keys):
break
# Youtube sometimes sends incomplete data
# See: https://github.com/ytdl-org/youtube-dl/issues/28194
last_error = 'Incomplete data received'
if count >= retries:
if fatal:
raise ExtractorError(last_error)
else:
self.report_warning(last_error)
return
return response
@staticmethod
def is_music_url(url):
return re.match(r'https?://music\.youtube\.com/', url) is not None
@@ -454,20 +669,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Invidious instances taken from https://github.com/iv-org/documentation/blob/master/Invidious-Instances.md
r'(?:www\.)?invidious\.pussthecat\.org',
r'(?:www\.)?invidious\.zee\.li',
r'(?:(?:www|au)\.)?ytprivate\.com',
r'(?:www\.)?invidious\.namazso\.eu',
r'(?:www\.)?invidious\.ethibox\.fr',
r'(?:www\.)?w6ijuptxiku4xpnnaetxvnkc5vqcdu7mgns2u77qefoixi63vbvnpnqd\.onion',
r'(?:www\.)?kbjggqkzv65ivcqj6bumvp337z6264huv5kpkwuv6gu5yjiskvan7fad\.onion',
r'(?:www\.)?invidious\.3o7z6yfxhbw7n3za4rss6l434kmv55cgw2vuziwuigpwegswvwzqipyd\.onion',
r'(?:www\.)?grwp24hodrefzvjjuccrkw3mjq4tzhaaq32amf33dzpmuxe7ilepcmad\.onion',
# youtube-dl invidious instances list
r'(?:(?:www|no)\.)?invidiou\.sh',
r'(?:(?:www|fi)\.)?invidious\.snopyta\.org',
r'(?:www\.)?invidious\.kabi\.tk',
r'(?:www\.)?invidious\.mastodon\.host',
r'(?:www\.)?invidious\.zapashcanon\.fr',
r'(?:www\.)?invidious\.kavin\.rocks',
r'(?:www\.)?(?:invidious(?:-us)?|piped)\.kavin\.rocks',
r'(?:www\.)?invidious\.tinfoil-hat\.net',
r'(?:www\.)?invidious\.himiko\.cloud',
r'(?:www\.)?invidious\.reallyancient\.tech',
@@ -494,6 +704,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?:www\.)?invidious\.toot\.koeln',
r'(?:www\.)?invidious\.fdn\.fr',
r'(?:www\.)?watch\.nettohikari\.com',
r'(?:www\.)?invidious\.namazso\.eu',
r'(?:www\.)?invidious\.silkky\.cloud',
r'(?:www\.)?invidious\.exonip\.de',
r'(?:www\.)?invidious\.riverside\.rocks',
r'(?:www\.)?invidious\.blamefran\.net',
r'(?:www\.)?invidious\.moomoo\.de',
r'(?:www\.)?ytb\.trom\.tf',
r'(?:www\.)?yt\.cyberhost\.uk',
r'(?:www\.)?kgg2m7yk5aybusll\.onion',
r'(?:www\.)?qklhadlycap4cnod\.onion',
r'(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion',
@@ -502,6 +720,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion',
r'(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p',
r'(?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion',
r'(?:www\.)?w6ijuptxiku4xpnnaetxvnkc5vqcdu7mgns2u77qefoixi63vbvnpnqd\.onion',
r'(?:www\.)?kbjggqkzv65ivcqj6bumvp337z6264huv5kpkwuv6gu5yjiskvan7fad\.onion',
r'(?:www\.)?grwp24hodrefzvjjuccrkw3mjq4tzhaaq32amf33dzpmuxe7ilepcmad\.onion',
r'(?:www\.)?hpniueoejy4opn7bc4ftgazyqjoeqwlvh2uiku2xqku6zpoa4bf5ruid\.onion',
)
_VALID_URL = r"""(?x)^
(
@@ -650,6 +872,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
_AGE_GATE_REASONS = (
'Sign in to confirm your age',
'This video may be inappropriate for some users.',
'Sorry, this content is age-restricted.')
_GEO_BYPASS = False
IE_NAME = 'youtube'
@@ -1329,7 +1556,32 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# multiple subtitles with same lang_code
'url': 'https://www.youtube.com/watch?v=wsQiKKfKxug',
'only_matching': True,
}, {
# Force use android client fallback
'url': 'https://www.youtube.com/watch?v=YOelRv7fMxY',
'info_dict': {
'id': 'YOelRv7fMxY',
'title': 'Digging a Secret Tunnel from my Workshop',
'ext': '3gp',
'upload_date': '20210624',
'channel_id': 'UCp68_FLety0O-n9QU6phsgw',
'uploader': 'colinfurze',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCp68_FLety0O-n9QU6phsgw',
'description': 'md5:ecb672623246d98c6c562eed6ae798c3'
},
'params': {
'format': '17', # 3gp format available on android
'extractor_args': {'youtube': {'player_client': ['android']}},
},
},
{
# Skip download of additional client configs (remix client config in this case)
'url': 'https://music.youtube.com/watch?v=MgNrAu2pzNs',
'only_matching': True,
'params': {
'extractor_args': {'youtube': {'player_skip': ['configs']}},
},
}
]
@classmethod
@@ -1347,6 +1599,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self._code_cache = {}
self._player_cache = {}
def _extract_player_url(self, ytcfg=None, webpage=None):
player_url = try_get(ytcfg, (lambda x: x['PLAYER_JS_URL']), str)
if not player_url:
player_url = self._search_regex(
r'"(?:PLAYER_JS_URL|jsUrl)"\s*:\s*"([^"]+)"',
webpage, 'player URL', fatal=False)
if player_url.startswith('//'):
player_url = 'https:' + player_url
elif not re.match(r'https?://', player_url):
player_url = compat_urlparse.urljoin(
'https://www.youtube.com', player_url)
return player_url
def _signature_cache_id(self, example_sig):
""" Return a string representation of a signature """
return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
@@ -1361,6 +1626,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
raise ExtractorError('Cannot identify player %r' % player_url)
return id_m.group('id')
def _load_player(self, video_id, player_url, fatal=True) -> bool:
player_id = self._extract_player_info(player_url)
if player_id not in self._code_cache:
self._code_cache[player_id] = self._download_webpage(
player_url, video_id, fatal=fatal,
note='Downloading player ' + player_id,
errnote='Download of %s failed' % player_url)
return player_id in self._code_cache
def _extract_signature_function(self, video_id, player_url, example_sig):
player_id = self._extract_player_info(player_url)
@@ -1373,20 +1647,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if cache_spec is not None:
return lambda s: ''.join(s[i] for i in cache_spec)
if player_id not in self._code_cache:
self._code_cache[player_id] = self._download_webpage(
player_url, video_id,
note='Downloading player ' + player_id,
errnote='Download of %s failed' % player_url)
code = self._code_cache[player_id]
res = self._parse_sig_js(code)
if self._load_player(video_id, player_url):
code = self._code_cache[player_id]
res = self._parse_sig_js(code)
test_string = ''.join(map(compat_chr, range(len(example_sig))))
cache_res = res(test_string)
cache_spec = [ord(c) for c in cache_res]
test_string = ''.join(map(compat_chr, range(len(example_sig))))
cache_res = res(test_string)
cache_spec = [ord(c) for c in cache_res]
self._downloader.cache.store('youtube-sigfuncs', func_id, cache_spec)
return res
self._downloader.cache.store('youtube-sigfuncs', func_id, cache_spec)
return res
def _print_sig_code(self, func, example_sig):
def gen_sig_code(idxs):
@@ -1457,11 +1727,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if player_url is None:
raise ExtractorError('Cannot decrypt signature without player_url')
if player_url.startswith('//'):
player_url = 'https:' + player_url
elif not re.match(r'https?://', player_url):
player_url = compat_urlparse.urljoin(
'https://www.youtube.com', player_url)
try:
player_id = (player_url, self._signature_cache_id(s))
if player_id not in self._player_cache:
@@ -1478,6 +1743,31 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
raise ExtractorError(
'Signature extraction failed: ' + tb, cause=e)
def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False):
"""
Extract signatureTimestamp (sts)
Required to tell API what sig/player version is in use.
"""
sts = None
if isinstance(ytcfg, dict):
sts = int_or_none(ytcfg.get('STS'))
if not sts:
# Attempt to extract from player
if player_url is None:
error_msg = 'Cannot extract signature timestamp without player_url.'
if fatal:
raise ExtractorError(error_msg)
self.report_warning(error_msg)
return
if self._load_player(video_id, player_url, fatal=fatal):
player_id = self._extract_player_info(player_url)
code = self._code_cache[player_id]
sts = int_or_none(self._search_regex(
r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})', code,
'JS player signature timestamp', group='sts', fatal=fatal))
return sts
def _mark_watched(self, video_id, player_response):
playback_url = url_or_none(try_get(
player_response,
@@ -1714,6 +2004,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'pbj': 1,
'type': 'next',
}
if 'itct' in continuation:
query['itct'] = continuation['itct']
if parent:
query['action_get_comment_replies'] = 1
else:
@@ -1759,19 +2051,27 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
response = try_get(browse,
(lambda x: x['response'],
lambda x: x[1]['response'])) or {}
lambda x: x[1]['response']), dict) or {}
if response.get('continuationContents'):
break
# YouTube sometimes gives reload: now json if something went wrong (e.g. bad auth)
if browse.get('reload'):
raise ExtractorError('Invalid or missing params in continuation request', expected=False)
if isinstance(browse, dict):
if browse.get('reload'):
raise ExtractorError('Invalid or missing params in continuation request', expected=False)
# TODO: not tested, merged from old extractor
err_msg = browse.get('externalErrorMessage')
# TODO: not tested, merged from old extractor
err_msg = browse.get('externalErrorMessage')
if err_msg:
last_error = err_msg
continue
response_error = try_get(response, lambda x: x['responseContext']['errors']['error'][0], dict) or {}
err_msg = response_error.get('externalErrorMessage')
if err_msg:
raise ExtractorError('YouTube said: %s' % err_msg, expected=False)
last_error = err_msg
continue
# Youtube sometimes sends incomplete data
# See: https://github.com/ytdl-org/youtube-dl/issues/28194
@@ -1866,6 +2166,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'comment_count': len(comments),
}
@staticmethod
def _generate_player_context(sts=None):
context = {
'html5Preference': 'HTML5_PREF_WANTS',
}
if sts is not None:
context['signatureTimestamp'] = sts
return {
'playbackContext': {
'contentPlaybackContext': context
}
}
@staticmethod
def _get_video_info_params(video_id):
return {
'video_id': video_id,
'eurl': 'https://youtube.googleapis.com/v/' + video_id,
'html5': '1',
'c': 'TVHTML5',
'cver': '6.20180913',
}
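# For reference, the /player query these two helpers build looks as follows
# (video id and sts are placeholder values):
query = {'videoId': 'dQw4w9WgXcQ'}
query.update(YoutubeIE._generate_player_context(sts=18888))
# query == {
#     'videoId': 'dQw4w9WgXcQ',
#     'playbackContext': {'contentPlaybackContext': {
#         'html5Preference': 'HTML5_PREF_WANTS', 'signatureTimestamp': 18888}}}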
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
@@ -1877,6 +2200,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
webpage = self._download_webpage(
webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)
ytcfg = self._extract_ytcfg(video_id, webpage) or self._get_default_ytcfg()
identity_token = self._extract_identity_token(webpage, video_id)
syncid = self._extract_account_syncid(ytcfg)
headers = self._generate_api_headers(ytcfg, identity_token, syncid)
player_url = self._extract_player_url(ytcfg, webpage)
player_client = try_get(self._configuration_arg('player_client'), lambda x: x[0], str) or ''
if player_client.upper() not in ('WEB', 'ANDROID'):
player_client = 'WEB'
force_mobile_client = player_client.upper() == 'ANDROID'
player_skip = self._configuration_arg('player_skip') or []
def get_text(x):
if not x:
return
@@ -1890,50 +2226,112 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
ytm_streaming_data = {}
if is_music_url:
# We force parse_json here because format 141 only appeared in get_video_info.
# The el, c, cver and cplayer fields are required for the 141 (aac 256kbps) format;
# they may be parameters of the youtube music player.
ytm_player_response = self._parse_json(try_get(compat_parse_qs(
self._download_webpage(
base_url + 'get_video_info', video_id,
'Fetching youtube music info webpage',
'unable to download youtube music info webpage', query={
'video_id': video_id,
'eurl': 'https://youtube.googleapis.com/v/' + video_id,
'el': 'detailpage',
'c': 'WEB_REMIX',
'cver': '0.1',
'cplayer': 'UNIPLAYER',
'html5': '1',
}, fatal=False)),
lambda x: x['player_response'][0],
compat_str) or '{}', video_id)
ytm_streaming_data = ytm_player_response.get('streamingData') or {}
ytm_webpage = None
sts = self._extract_signature_timestamp(video_id, player_url, ytcfg, fatal=False)
if sts and not force_mobile_client and 'configs' not in player_skip:
ytm_webpage = self._download_webpage(
'https://music.youtube.com',
video_id, fatal=False, note="Downloading remix client config")
ytm_cfg = self._extract_ytcfg(video_id, ytm_webpage) or {}
ytm_client = 'WEB_REMIX'
if not sts or force_mobile_client:
# Android client already has signature descrambled
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/562
if not sts:
self.report_warning('Falling back to mobile remix client for player API.')
ytm_client = 'ANDROID_MUSIC'
ytm_cfg = {}
ytm_headers = self._generate_api_headers(
ytm_cfg, identity_token, syncid,
client=ytm_client)
ytm_query = {'videoId': video_id}
ytm_query.update(self._generate_player_context(sts))
ytm_player_response = self._extract_response(
item_id=video_id, ep='player', query=ytm_query,
ytcfg=ytm_cfg, headers=ytm_headers, fatal=False,
default_client=ytm_client,
note='Downloading %sremix player API JSON' % ('mobile ' if force_mobile_client else ''))
ytm_streaming_data = try_get(ytm_player_response, lambda x: x['streamingData']) or {}
player_response = None
if webpage:
player_response = self._extract_yt_initial_variable(
webpage, self._YT_INITIAL_PLAYER_RESPONSE_RE,
video_id, 'initial player response')
ytcfg = self._extract_ytcfg(video_id, webpage)
if not player_response:
player_response = self._call_api(
'player', {'videoId': video_id}, video_id, api_key=self._extract_api_key(ytcfg))
if not player_response or force_mobile_client:
sts = self._extract_signature_timestamp(video_id, player_url, ytcfg, fatal=False)
yt_client = 'WEB'
ytpcfg = ytcfg
ytp_headers = headers
if not sts or force_mobile_client:
# Android client already has signature descrambled
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/562
if not sts:
self.report_warning('Falling back to mobile client for player API.')
yt_client = 'ANDROID'
ytpcfg = {}
ytp_headers = self._generate_api_headers(ytpcfg, identity_token, syncid, yt_client)
yt_query = {'videoId': video_id}
yt_query.update(self._generate_player_context(sts))
player_response = self._extract_response(
item_id=video_id, ep='player', query=yt_query,
ytcfg=ytpcfg, headers=ytp_headers, fatal=False,
default_client=yt_client,
note='Downloading %splayer API JSON' % ('mobile ' if force_mobile_client else '')
)
# Age-gate workarounds
playability_status = player_response.get('playabilityStatus') or {}
if playability_status.get('reason') == 'Sign in to confirm your age':
if playability_status.get('reason') in self._AGE_GATE_REASONS:
pr = self._parse_json(try_get(compat_parse_qs(
self._download_webpage(
base_url + 'get_video_info', video_id,
'Refetching age-gated info webpage',
'unable to download video info webpage', query={
'video_id': video_id,
'eurl': 'https://youtube.googleapis.com/v/' + video_id,
'html5': '1',
}, fatal=False)),
'Refetching age-gated info webpage', 'unable to download video info webpage',
query=self._get_video_info_params(video_id), fatal=False)),
lambda x: x['player_response'][0],
compat_str) or '{}', video_id)
if not pr:
self.report_warning('Falling back to embedded-only age-gate workaround.')
embed_webpage = None
sts = self._extract_signature_timestamp(video_id, player_url, ytcfg, fatal=False)
if sts and not force_mobile_client and 'configs' not in player_skip:
embed_webpage = self._download_webpage(
'https://www.youtube.com/embed/%s?html5=1' % video_id,
video_id=video_id, note='Downloading age-gated embed config')
ytcfg_age = self._extract_ytcfg(video_id, embed_webpage) or {}
# If we extracted the embed webpage, it'll tell us if we can view the video
embedded_pr = self._parse_json(
try_get(ytcfg_age, lambda x: x['PLAYER_VARS']['embedded_player_response'], str) or '{}',
video_id=video_id)
embedded_ps_reason = try_get(embedded_pr, lambda x: x['playabilityStatus']['reason'], str) or ''
if embedded_ps_reason not in self._AGE_GATE_REASONS:
yt_client = 'WEB_EMBEDDED_PLAYER'
if not sts or force_mobile_client:
# Android client already has signature descrambled
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/562
if not sts:
self.report_warning(
'Falling back to mobile embedded client for player API (note: some formats may be missing).')
yt_client = 'ANDROID_EMBEDDED_PLAYER'
ytcfg_age = {}
ytage_headers = self._generate_api_headers(
ytcfg_age, identity_token, syncid, client=yt_client)
yt_age_query = {'videoId': video_id}
yt_age_query.update(self._generate_player_context(sts))
pr = self._extract_response(
item_id=video_id, ep='player', query=yt_age_query,
ytcfg=ytcfg_age, headers=ytage_headers, fatal=False,
default_client=yt_client,
note='Downloading %sage-gated player API JSON' % ('mobile ' if force_mobile_client else '')
) or {}
if pr:
player_response = pr
@@ -2005,7 +2403,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
formats, itags, stream_ids = [], [], []
itag_qualities = {}
player_url = None
q = qualities([
'tiny', 'audio_quality_low', 'audio_quality_medium', 'audio_quality_high', # Audio only formats
'small', 'medium', 'large', 'hd720', 'hd1080', 'hd1440', 'hd2160', 'hd2880', 'highres'
@@ -2045,12 +2442,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
encrypted_sig = try_get(sc, lambda x: x['s'][0])
if not (sc and fmt_url and encrypted_sig):
continue
if not player_url:
if not webpage:
continue
player_url = self._search_regex(
r'"(?:PLAYER_JS_URL|jsUrl)"\s*:\s*"([^"]+)"',
webpage, 'player URL', fatal=False)
if not player_url:
continue
signature = self._decrypt_signature(sc['s'][0], video_id, player_url)
@@ -2098,8 +2489,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
dct['container'] = dct['ext'] + '_dash'
formats.append(dct)
skip_manifests = self._configuration_arg('skip') or []
get_dash = 'dash' not in skip_manifests and self.get_param('youtube_include_dash_manifest', True)
get_hls = 'hls' not in skip_manifests and self.get_param('youtube_include_hls_manifest', True)
for sd in (streaming_data, ytm_streaming_data):
hls_manifest_url = sd.get('hlsManifestUrl')
hls_manifest_url = get_hls and sd.get('hlsManifestUrl')
if hls_manifest_url:
for f in self._extract_m3u8_formats(
hls_manifest_url, video_id, 'mp4', fatal=False):
@@ -2109,23 +2504,21 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
f['format_id'] = itag
formats.append(f)
if self.get_param('youtube_include_dash_manifest', True):
for sd in (streaming_data, ytm_streaming_data):
dash_manifest_url = sd.get('dashManifestUrl')
if dash_manifest_url:
for f in self._extract_mpd_formats(
dash_manifest_url, video_id, fatal=False):
itag = f['format_id']
if itag in itags:
continue
if itag in itag_qualities:
f['quality'] = q(itag_qualities[itag])
filesize = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url')
or f['url'], 'file size', default=None))
if filesize:
f['filesize'] = filesize
formats.append(f)
dash_manifest_url = get_dash and sd.get('dashManifestUrl')
if dash_manifest_url:
for f in self._extract_mpd_formats(
dash_manifest_url, video_id, fatal=False):
itag = f['format_id']
if itag in itags:
continue
if itag in itag_qualities:
f['quality'] = q(itag_qualities[itag])
filesize = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url')
or f['url'], 'file size', default=None))
if filesize:
f['filesize'] = filesize
formats.append(f)
if not formats:
if not self.get_param('allow_unplayable_formats') and streaming_data.get('licenseInfos'):
@@ -2211,6 +2604,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
or microformat.get('lengthSeconds')) \
or parse_duration(search_meta('duration'))
is_live = video_details.get('isLive')
is_upcoming = video_details.get('isUpcoming')
owner_profile_url = microformat.get('ownerProfileUrl')
info = {
@@ -2284,7 +2678,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
continue
process_language(
automatic_captions, base_url, translation_language_code,
try_get(translation_language, lambda x: x['languageName']['simpleText']),
try_get(translation_language, (
lambda x: x['languageName']['simpleText'],
lambda x: x['languageName']['runs'][0]['text'])),
{'tlang': translation_language_code})
info['automatic_captions'] = automatic_captions
info['subtitles'] = subtitles
@@ -2322,21 +2718,22 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
webpage, self._YT_INITIAL_DATA_RE, video_id,
'yt initial data')
if not initial_data:
initial_data = self._call_api(
'next', {'videoId': video_id}, video_id, fatal=False, api_key=self._extract_api_key(ytcfg))
initial_data = self._extract_response(
item_id=video_id, ep='next', fatal=False,
ytcfg=ytcfg, headers=headers, query={'videoId': video_id},
note='Downloading initial data API JSON')
if not is_live:
try:
# This will error if there is no livechat
initial_data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']
info['subtitles']['live_chat'] = [{
'url': 'https://www.youtube.com/watch?v=%s' % video_id, # url is needed to set cookies
'video_id': video_id,
'ext': 'json',
'protocol': 'youtube_live_chat_replay',
}]
except (KeyError, IndexError, TypeError):
pass
try:
# This will error if there is no livechat
initial_data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']
info['subtitles']['live_chat'] = [{
'url': 'https://www.youtube.com/watch?v=%s' % video_id, # url is needed to set cookies
'video_id': video_id,
'ext': 'json',
'protocol': 'youtube_live_chat' if is_live or is_upcoming else 'youtube_live_chat_replay',
}]
except (KeyError, IndexError, TypeError):
pass
if initial_data:
chapters = self._extract_chapters_from_json(
@@ -3480,40 +3877,6 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
self._extract_mix_playlist(playlist, playlist_id, data, webpage),
playlist_id=playlist_id, playlist_title=title)
@staticmethod
def _extract_alerts(data):
for alert_dict in try_get(data, lambda x: x['alerts'], list) or []:
if not isinstance(alert_dict, dict):
continue
for alert in alert_dict.values():
alert_type = alert.get('type')
if not alert_type:
continue
message = try_get(alert, lambda x: x['text']['simpleText'], compat_str) or ''
if message:
yield alert_type, message
for run in try_get(alert, lambda x: x['text']['runs'], list) or []:
message += try_get(run, lambda x: x['text'], compat_str)
if message:
yield alert_type, message
def _report_alerts(self, alerts, expected=True):
errors = []
warnings = []
for alert_type, alert_message in alerts:
if alert_type.lower() == 'error':
errors.append([alert_type, alert_message])
else:
warnings.append([alert_type, alert_message])
for alert_type, alert_message in (warnings + errors[:-1]):
self.report_warning('YouTube said: %s - %s' % (alert_type, alert_message))
if errors:
raise ExtractorError('YouTube said: %s' % errors[-1][1], expected=expected)
def _extract_and_report_alerts(self, data, *args, **kwargs):
return self._report_alerts(self._extract_alerts(data), *args, **kwargs)
def _reload_with_unavailable_videos(self, item_id, data, webpage):
"""
Get playlist with unavailable videos if the 'show unavailable videos' button exists.
@@ -3558,54 +3921,6 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
check_get_keys='contents', fatal=False,
note='Downloading API JSON with unavailable videos')
def _extract_response(self, item_id, query, note='Downloading API JSON', headers=None,
ytcfg=None, check_get_keys=None, ep='browse', fatal=True):
response = None
last_error = None
count = -1
retries = self.get_param('extractor_retries', 3)
if check_get_keys is None:
check_get_keys = []
while count < retries:
count += 1
if last_error:
self.report_warning('%s. Retrying ...' % last_error)
try:
response = self._call_api(
ep=ep, fatal=True, headers=headers,
video_id=item_id, query=query,
context=self._extract_context(ytcfg),
api_key=self._extract_api_key(ytcfg),
note='%s%s' % (note, ' (retry #%d)' % count if count else ''))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503, 404):
# Downloading page may result in intermittent 5xx HTTP error
# Sometimes a 404 is also received. See: https://github.com/ytdl-org/youtube-dl/issues/28289
last_error = 'HTTP Error %s' % e.cause.code
if count < retries:
continue
if fatal:
raise
else:
self.report_warning(error_to_compat_str(e))
return
else:
# Youtube may send alerts if there was an issue with the continuation page
self._extract_and_report_alerts(response, expected=False)
if not check_get_keys or dict_get(response, check_get_keys):
break
# Youtube sometimes sends incomplete data
# See: https://github.com/ytdl-org/youtube-dl/issues/28194
last_error = 'Incomplete data received'
if count >= retries:
if fatal:
raise ExtractorError(last_error)
else:
self.report_warning(last_error)
return
return response
def _extract_webpage(self, url, item_id):
retries = self.get_param('extractor_retries', 3)
count = -1
@@ -4062,6 +4377,7 @@ class YoutubeRecommendedIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'YouTube.com recommended videos, ":ytrec" for short (requires authentication)'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/?(?:[?#]|$)|:ytrec(?:ommended)?'
_FEED_NAME = 'recommended'
_LOGIN_REQUIRED = False
_TESTS = [{
'url': ':ytrec',
'only_matching': True,


@@ -599,6 +599,10 @@ def parseOpts(overrideArguments=None):
'-r', '--limit-rate', '--rate-limit',
dest='ratelimit', metavar='RATE',
help='Maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option(
'--throttled-rate',
dest='throttledratelimit', metavar='RATE',
help='Minimum download rate in bytes per second below which throttling is assumed and the video data is re-extracted (e.g. 100K)')
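# Sketch of using the new option from the embedding API, assuming the dest name
# maps straight to a YoutubeDL param as with --limit-rate/ratelimit:
import yt_dlp

params = {'throttledratelimit': 100 * 1024}  # re-extract below 100 KiB/s
with yt_dlp.YoutubeDL(params) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])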
downloader.add_option(
'-R', '--retries',
dest='retries', metavar='RETRIES', default=10,
@@ -712,7 +716,8 @@ def parseOpts(overrideArguments=None):
help=(
'Give these arguments to the external downloader. '
'Specify the downloader name and the arguments separated by a colon ":". '
'You can use this option multiple times (Alias: --external-downloader-args)'))
'You can use this option multiple times to give different arguments to different downloaders '
'(Alias: --external-downloader-args)'))
workarounds = optparse.OptionGroup(parser, 'Workarounds')
workarounds.add_option(
@@ -1165,7 +1170,7 @@ def parseOpts(overrideArguments=None):
'to give the argument to the specified postprocessor/executable. Supported PP are: '
'Merger, ExtractAudio, SplitChapters, Metadata, EmbedSubtitle, EmbedThumbnail, '
'SubtitlesConvertor, ThumbnailsConvertor, VideoRemuxer, VideoConvertor, '
'SponSkrub, FixupStretched, FixupM4a and FixupM3u8. '
'SponSkrub, FixupStretched, FixupM4a, FixupM3u8, FixupTimestamp and FixupDuration. '
'The supported executables are: AtomicParsley, FFmpeg, FFprobe, and SponSkrub. '
'You can also specify "PP+EXE:ARGS" to give the arguments to the specified executable '
'only when being used by the specified postprocessor. Additionally, for ffmpeg/ffprobe, '
@@ -1200,19 +1205,19 @@ def parseOpts(overrideArguments=None):
postproc.add_option(
'--embed-thumbnail',
action='store_true', dest='embedthumbnail', default=False,
help='Embed thumbnail in the audio as cover art')
help='Embed thumbnail in the video as cover art')
postproc.add_option(
'--no-embed-thumbnail',
action='store_false', dest='embedthumbnail',
help='Do not embed thumbnail (default)')
postproc.add_option(
'--add-metadata',
'--embed-metadata', '--add-metadata',
action='store_true', dest='addmetadata', default=False,
help='Write metadata to the video file')
help='Embed metadata including chapter markers (if supported by the format) to the video file (Alias: --add-metadata)')
postproc.add_option(
'--no-add-metadata',
'--no-embed-metadata', '--no-add-metadata',
action='store_false', dest='addmetadata',
help='Do not write metadata (default)')
help='Do not write metadata (default) (Alias: --no-add-metadata)')
postproc.add_option(
'--metadata-from-title',
metavar='FORMAT', dest='metafromtitle',
@@ -1230,10 +1235,12 @@ def parseOpts(overrideArguments=None):
postproc.add_option(
'--fixup',
metavar='POLICY', dest='fixup', default=None,
choices=('never', 'ignore', 'warn', 'detect_or_warn', 'force'),
help=(
'Automatically correct known faults of the file. '
'One of never (do nothing), warn (only emit a warning), '
'detect_or_warn (the default; fix file if we can, warn otherwise)'))
'detect_or_warn (the default; fix file if we can, warn otherwise), '
'force (try fixing even if file already exists)'))
postproc.add_option(
'--prefer-avconv', '--no-prefer-ffmpeg',
action='store_false', dest='prefer_ffmpeg',
@@ -1337,22 +1344,34 @@ def parseOpts(overrideArguments=None):
'--no-hls-split-discontinuity',
dest='hls_split_discontinuity', action='store_false',
help='Do not split HLS playlists to different formats at discontinuities such as ad breaks (default)')
extractor.add_option(
'--extractor-args',
metavar='KEY:ARGS', dest='extractor_args', default={}, type='str',
action='callback', callback=_dict_from_options_callback,
callback_kwargs={
'multiple_keys': False,
'process': lambda val: dict(
(lambda x: (x[0], x[1].split(',')))(arg.split('=', 1) + ['', '']) for arg in val.split(';'))
},
help=(
'Pass these arguments to the extractor. See "EXTRACTOR ARGUMENTS" for details. '
'You can use this option multiple times to give different arguments to different extractors'))
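# Worked example of the callback above: ARGS is ';'-separated 'arg=val1,val2'
# pairs, parsed into a dict of lists per extractor key:
val = 'player_client=android;player_skip=configs,webpage'
parsed = dict(
    (lambda x: (x[0], x[1].split(',')))(arg.split('=', 1) + ['', ''])
    for arg in val.split(';'))
assert parsed == {'player_client': ['android'], 'player_skip': ['configs', 'webpage']}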
extractor.add_option(
'--youtube-include-dash-manifest', '--no-youtube-skip-dash-manifest',
action='store_true', dest='youtube_include_dash_manifest', default=True,
help='Download the DASH manifests and related data on YouTube videos (default) (Alias: --no-youtube-skip-dash-manifest)')
help=optparse.SUPPRESS_HELP)
extractor.add_option(
'--youtube-skip-dash-manifest', '--no-youtube-include-dash-manifest',
action='store_false', dest='youtube_include_dash_manifest',
help='Do not download the DASH manifests and related data on YouTube videos (Alias: --no-youtube-include-dash-manifest)')
help=optparse.SUPPRESS_HELP)
extractor.add_option(
'--youtube-include-hls-manifest', '--no-youtube-skip-hls-manifest',
action='store_true', dest='youtube_include_hls_manifest', default=True,
help='Download the HLS manifests and related data on YouTube videos (default) (Alias: --no-youtube-skip-hls-manifest)')
help=optparse.SUPPRESS_HELP)
extractor.add_option(
'--youtube-skip-hls-manifest', '--no-youtube-include-hls-manifest',
action='store_false', dest='youtube_include_hls_manifest',
help='Do not download the HLS manifests and related data on YouTube videos (Alias: --no-youtube-include-hls-manifest)')
help=optparse.SUPPRESS_HELP)
parser.add_option_group(general)
parser.add_option_group(network)


@@ -5,7 +5,9 @@ from .ffmpeg import (
FFmpegPostProcessor,
FFmpegEmbedSubtitlePP,
FFmpegExtractAudioPP,
FFmpegFixupDurationPP,
FFmpegFixupStretchedPP,
FFmpegFixupTimestampPP,
FFmpegFixupM3u8PP,
FFmpegFixupM4aPP,
FFmpegMergerPP,
@@ -35,9 +37,11 @@ __all__ = [
'FFmpegEmbedSubtitlePP',
'FFmpegExtractAudioPP',
'FFmpegSplitChaptersPP',
'FFmpegFixupDurationPP',
'FFmpegFixupM3u8PP',
'FFmpegFixupM4aPP',
'FFmpegFixupStretchedPP',
'FFmpegFixupTimestampPP',
'FFmpegMergerPP',
'FFmpegMetadataPP',
'FFmpegSubtitlesConvertorPP',


@@ -1,5 +1,6 @@
from __future__ import unicode_literals
import functools
import os
from ..compat import compat_str
@@ -67,6 +68,25 @@ class PostProcessor(object):
"""Sets the downloader for this PP."""
self._downloader = downloader
@staticmethod
def _restrict_to(*, video=True, audio=True, images=True):
allowed = {'video': video, 'audio': audio, 'images': images}
def decorator(func):
@functools.wraps(func)
def wrapper(self, info):
format_type = (
'video' if info.get('vcodec') != 'none'
else 'audio' if info.get('acodec') != 'none'
else 'images')
if allowed[format_type]:
return func(self, info)
else:
self.to_screen('Skipping %s' % format_type)
return [], info
return wrapper
return decorator
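# Hypothetical postprocessor showing the decorator in use: run() executes for
# video files only and is skipped (input returned unchanged) for audio/images:
class VideoOnlyPP(PostProcessor):
    @PostProcessor._restrict_to(audio=False, images=False)
    def run(self, info):
        self.to_screen('Processing video file "%s"' % info['filepath'])
        return [], info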
def run(self, information):
"""Run the PostProcessor.


@@ -16,6 +16,7 @@ try:
except ImportError:
has_mutagen = False
from .common import PostProcessor
from .ffmpeg import (
FFmpegPostProcessor,
FFmpegThumbnailsConvertorPP,
@@ -62,6 +63,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
def _report_run(self, exe, filename):
self.to_screen('%s: Adding thumbnail to "%s"' % (exe, filename))
@PostProcessor._restrict_to(images=False)
def run(self, info):
filename = info['filepath']
temp_filename = prepend_extension(filename, 'temp')
@@ -90,7 +92,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
# format, there will be some additional data loss.
# PNG, on the other hand, is lossless.
thumbnail_ext = os.path.splitext(thumbnail_filename)[1][1:]
if thumbnail_ext not in ('jpg', 'png'):
if thumbnail_ext not in ('jpg', 'jpeg', 'png'):
thumbnail_filename = convertor.convert_thumbnail(thumbnail_filename, 'png')
thumbnail_ext = 'png'
@@ -123,8 +125,9 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
self.run_ffmpeg(filename, temp_filename, options)
elif info['ext'] in ['m4a', 'mp4', 'mov']:
prefer_atomicparsley = 'embed-thumbnail-atomicparsley' in self.get_param('compat_opts', [])
# Method 1: Use mutagen
if not has_mutagen:
if not has_mutagen or prefer_atomicparsley:
success = False
else:
try:
@@ -143,7 +146,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
success = False
# Method 2: Use ffmpeg+ffprobe
if not success:
if not success and not prefer_atomicparsley:
success = True
try:
options = ['-c', 'copy', '-map', '0', '-dn', '-map', '1']


@@ -310,6 +310,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
except FFmpegPostProcessorError as err:
raise AudioConversionError(err.msg)
@PostProcessor._restrict_to(images=False)
def run(self, information):
path = information['filepath']
orig_ext = information['ext']
@@ -419,6 +420,7 @@ class FFmpegVideoConvertorPP(FFmpegPostProcessor):
return ['-c:v', 'libxvid', '-vtag', 'XVID']
return []
@PostProcessor._restrict_to(images=False)
def run(self, information):
path, source_ext = information['filepath'], information['ext'].lower()
target_ext = self._target_ext(source_ext)
@@ -456,6 +458,7 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
super(FFmpegEmbedSubtitlePP, self).__init__(downloader)
self._already_have_subtitle = already_have_subtitle
@PostProcessor._restrict_to(images=False)
def run(self, information):
if information['ext'] not in ('mp4', 'webm', 'mkv'):
self.to_screen('Subtitles can only be embedded in mp4, webm or mkv files')
@@ -523,6 +526,7 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
class FFmpegMetadataPP(FFmpegPostProcessor):
@PostProcessor._restrict_to(images=False)
def run(self, info):
metadata = {}
@@ -625,6 +629,7 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
class FFmpegMergerPP(FFmpegPostProcessor):
@PostProcessor._restrict_to(images=False)
def run(self, info):
filename = info['filepath']
temp_filename = prepend_extension(filename, 'temp')
@@ -656,55 +661,71 @@ class FFmpegMergerPP(FFmpegPostProcessor):
return True
class FFmpegFixupStretchedPP(FFmpegPostProcessor):
class FFmpegFixupPostProcessor(FFmpegPostProcessor):
def _fixup(self, msg, filename, options):
temp_filename = prepend_extension(filename, 'temp')
self.to_screen(f'{msg} of "{filename}"')
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
class FFmpegFixupStretchedPP(FFmpegFixupPostProcessor):
@PostProcessor._restrict_to(images=False, audio=False)
def run(self, info):
stretched_ratio = info.get('stretched_ratio')
if stretched_ratio is None or stretched_ratio == 1:
return [], info
filename = info['filepath']
temp_filename = prepend_extension(filename, 'temp')
options = ['-c', 'copy', '-map', '0', '-dn', '-aspect', '%f' % stretched_ratio]
self.to_screen('Fixing aspect ratio in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
if stretched_ratio not in (None, 1):
self._fixup('Fixing aspect ratio', info['filepath'], [
'-c', 'copy', '-map', '0', '-dn', '-aspect', '%f' % stretched_ratio])
return [], info
class FFmpegFixupM4aPP(FFmpegPostProcessor):
class FFmpegFixupM4aPP(FFmpegFixupPostProcessor):
@PostProcessor._restrict_to(images=False, video=False)
def run(self, info):
if info.get('container') != 'm4a_dash':
return [], info
filename = info['filepath']
temp_filename = prepend_extension(filename, 'temp')
options = ['-c', 'copy', '-map', '0', '-dn', '-f', 'mp4']
self.to_screen('Correcting container in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
if info.get('container') == 'm4a_dash':
self._fixup('Correcting container', info['filepath'], [
'-c', 'copy', '-map', '0', '-dn', '-f', 'mp4'])
return [], info
class FFmpegFixupM3u8PP(FFmpegPostProcessor):
class FFmpegFixupM3u8PP(FFmpegFixupPostProcessor):
@PostProcessor._restrict_to(images=False)
def run(self, info):
filename = info['filepath']
if self.get_audio_codec(filename) == 'aac':
temp_filename = prepend_extension(filename, 'temp')
if self.get_audio_codec(info['filepath']) == 'aac':
self._fixup('Fixing malformed AAC bitstream', info['filepath'], [
'-c', 'copy', '-map', '0', '-dn', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc'])
return [], info
options = ['-c', 'copy', '-map', '0', '-dn', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
self.to_screen('Fixing malformed AAC bitstream in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
class FFmpegFixupTimestampPP(FFmpegFixupPostProcessor):
def __init__(self, downloader=None, trim=0.001):
# "trim" should be used when the video contains unintended packets
super(FFmpegFixupTimestampPP, self).__init__(downloader)
assert isinstance(trim, (int, float))
self.trim = str(trim)
@PostProcessor._restrict_to(images=False)
def run(self, info):
required_version = '4.4'
if is_outdated_version(self._versions[self.basename], required_version):
self.report_warning(
'A re-encode is needed to fix timestamps in older versions of ffmpeg. '
f'Please install ffmpeg {required_version} or later to fixup without re-encoding')
opts = ['-vf', 'setpts=PTS-STARTPTS']
else:
opts = ['-c', 'copy', '-bsf', 'setts=ts=TS-STARTPTS']
self._fixup('Fixing frame timestamp', info['filepath'], opts + ['-map', '0', '-dn', '-ss', self.trim])
return [], info
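# Roughly, the two branches above translate to these commands (illustrative
# file names; run_ffmpeg assembles the real invocation):
#   ffmpeg -i in.mp4 -c copy -bsf setts=ts=TS-STARTPTS -map 0 -dn -ss 0.001 out.mp4   (ffmpeg >= 4.4)
#   ffmpeg -i in.mp4 -vf setpts=PTS-STARTPTS -map 0 -dn -ss 0.001 out.mp4             (older ffmpeg, re-encodes)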
class FFmpegFixupDurationPP(FFmpegFixupPostProcessor):
@PostProcessor._restrict_to(images=False)
def run(self, info):
self._fixup('Fixing video duration', info['filepath'], ['-c', 'copy', '-map', '0', '-dn'])
return [], info
@@ -805,6 +826,7 @@ class FFmpegSplitChaptersPP(FFmpegPostProcessor):
['-ss', compat_str(chapter['start_time']),
'-t', compat_str(chapter['end_time'] - chapter['start_time'])])
@PostProcessor._restrict_to(images=False)
def run(self, info):
chapters = info.get('chapters') or []
if not chapters:
@@ -874,6 +896,8 @@ class FFmpegThumbnailsConvertorPP(FFmpegPostProcessor):
_, thumbnail_ext = os.path.splitext(original_thumbnail)
if thumbnail_ext:
thumbnail_ext = thumbnail_ext[1:].lower()
if thumbnail_ext == 'jpeg':
thumbnail_ext = 'jpg'
if thumbnail_ext == self.format:
self.to_screen('Thumbnail "%s" is already in the requested format' % original_thumbnail)
continue


@@ -41,6 +41,7 @@ class SponSkrubPP(PostProcessor):
return None
return path
@PostProcessor._restrict_to(images=False)
def run(self, information):
if self.path is None:
return [], information


@@ -89,13 +89,9 @@ def run_update(ydl):
err = None
if isinstance(globals().get('__loader__'), zipimporter):
# We only support python 3.6 or above
if sys.version_info < (3, 6):
err = 'This is the last release of yt-dlp for Python version %d.%d! Please update to Python 3.6 or above' % sys.version_info[:2]
pass
elif hasattr(sys, 'frozen'):
# Python 3.6 supports only vista and above
if sys.getwindowsversion()[0] < 6:
err = 'This is the last release of yt-dlp for your version of Windows. Please update to Windows Vista or above'
pass
else:
err = 'It looks like you installed yt-dlp with a package manager, pip, setup.py or a tarball. Please use that to update'
if err:


@@ -2244,6 +2244,17 @@ def unescapeHTML(s):
r'&([^&;]+;)', lambda m: _htmlentity_transform(m.group(1)), s)
def escapeHTML(text):
return (
text
.replace('&', '&amp;')
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace('"', '&quot;')
.replace("'", '&#39;')
)
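# Example: '&' is replaced first so the ampersands of entities inserted by the
# later replacements are not themselves re-escaped:
assert escapeHTML('a<b & "c"') == 'a&lt;b &amp; &quot;c&quot;'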
def process_communicate_or_kill(p, *args, **kwargs):
try:
return p.communicate(*args, **kwargs)
@@ -2323,13 +2334,14 @@ def decodeOption(optval):
return optval
def formatSeconds(secs, delim=':'):
def formatSeconds(secs, delim=':', msec=False):
if secs > 3600:
return '%d%s%02d%s%02d' % (secs // 3600, delim, (secs % 3600) // 60, delim, secs % 60)
ret = '%d%s%02d%s%02d' % (secs // 3600, delim, (secs % 3600) // 60, delim, secs % 60)
elif secs > 60:
return '%d%s%02d' % (secs // 60, delim, secs % 60)
ret = '%d%s%02d' % (secs // 60, delim, secs % 60)
else:
return '%d' % secs
ret = '%d' % secs
return '%s.%03d' % (ret, secs % 1 * 1000) if msec else ret
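# Worked examples of the branches above (msec output assumes the fractional
# part is scaled to milliseconds, per the fix applied here):
assert formatSeconds(75) == '1:15'
assert formatSeconds(3661.5, msec=True) == '1:01:01.500'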
def make_HTTPS_handler(params, **kwargs):
@@ -2492,6 +2504,11 @@ class RejectedVideoReached(YoutubeDLError):
pass
class ThrottledDownload(YoutubeDLError):
""" Download speed below --throttled-rate. """
pass
class MaxDownloadsReached(YoutubeDLError):
""" --max-downloads limit has been reached. """
pass
@@ -3954,40 +3971,57 @@ class LazyList(collections.Sequence):
def __init__(self, iterable):
self.__iterable = iter(iterable)
self.__cache = []
self.__reversed = False
def __iter__(self):
for item in self.__cache:
yield item
if self.__reversed:
# We need to consume the entire iterable to iterate in reverse
yield from self.exhaust()
return
yield from self.__cache
for item in self.__iterable:
self.__cache.append(item)
yield item
def __exhaust(self):
self.__cache.extend(self.__iterable)
return self.__cache
def exhaust(self):
''' Evaluate the entire iterable '''
self.__cache.extend(self.__iterable)
return self.__exhaust()[::-1 if self.__reversed else 1]
@staticmethod
def __reverse_index(x):
return -(x + 1)
def __getitem__(self, idx):
if isinstance(idx, slice):
step = idx.step or 1
start = idx.start if idx.start is not None else 1 if step > 0 else -1
start = idx.start if idx.start is not None else 0 if step > 0 else -1
stop = idx.stop if idx.stop is not None else -1 if step > 0 else 0
if self.__reversed:
(start, stop), step = map(self.__reverse_index, (start, stop)), -step
idx = slice(start, stop, step)
elif isinstance(idx, int):
if self.__reversed:
idx = self.__reverse_index(idx)
start = stop = idx
else:
raise TypeError('indices must be integers or slices')
if start < 0 or stop < 0:
# We need to consume the entire iterable to be able to slice from the end
# Obviously, never use this with infinite iterables
self.exhaust()
else:
n = max(start, stop) - len(self.__cache) + 1
if n > 0:
self.__cache.extend(itertools.islice(self.__iterable, n))
return self.__exhaust()[idx]
n = max(start, stop) - len(self.__cache) + 1
if n > 0:
self.__cache.extend(itertools.islice(self.__iterable, n))
return self.__cache[idx]
def __bool__(self):
try:
self[0]
self[-1] if self.__reversed else self[0]
except IndexError:
return False
return True
@@ -3996,6 +4030,17 @@ class LazyList(collections.Sequence):
self.exhaust()
return len(self.__cache)
def reverse(self):
self.__reversed = not self.__reversed
return self
def __repr__(self):
# repr and str should mimic a list. So we exhaust the iterable
return repr(self.exhaust())
def __str__(self):
return repr(self.exhaust())
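# Quick illustration of the new semantics, assuming LazyList is in scope:
import itertools

naturals = LazyList(itertools.count())
assert naturals[:3] == [0, 1, 2]  # non-negative slice: no full exhaustion
evens = LazyList(x * 2 for x in range(10)).reverse()
assert evens[0] == 18  # index 0 of the reversed view maps to the last item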
class PagedList(object):
def __len__(self):
@@ -6204,6 +6249,8 @@ def traverse_obj(obj, keys, *, casesense=True, is_user_input=False, traverse_str
if is_user_input:
key = (int_or_none(key) if ':' not in key
else slice(*map(int_or_none, key.split(':'))))
if key is None:
return None
if not isinstance(obj, (list, tuple)):
if traverse_string:
obj = compat_str(obj)


@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.06.08'
__version__ = '2021.06.23'