mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-22 14:51:23 +00:00

Compare commits


41 Commits

Author SHA1 Message Date
pukkandan
597c18665e Release 2021.03.15 2021-03-15 05:54:39 +05:30
pukkandan
10db0d2f57 Update to ytdl-commit-3be0980
3be098010f
2021-03-15 04:52:06 +05:30
pukkandan
7275535116 Split video by chapters (#158)
* New options `--split-chapters` and `--no-split-chapters`
* The output/path of the split files can be given using the key `chapter`
* Additional keys `section_title`, `section_number`, `section_start`, `section_end` are available in the output template
* Alias `--split-tracks` for parity with animelover/youtube-dl
* `--sponskrub-cut` and `--split-chapter` cannot work together

Closes:
https://github.com/blackjack4494/yt-dlc/issues/277
https://github.com/ytdl-org/youtube-dl/issues/28438
https://github.com/ytdl-org/youtube-dl/issues/12907
https://github.com/ytdl-org/youtube-dl/issues/6480
https://github.com/ytdl-org/youtube-dl/pull/25005

Rewritten from the implementation by: femaref and Wattux
https://github.com/Wattux/youtube-dl/tree/split-at-timestamps
https://github.com/ytdl-org/youtube-dl/pull/25005
https://github.com/femaref/youtube-dl/tree/split-track
2021-03-15 04:32:13 +05:30
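For reference, a minimal invocation of the new option; the `chapter:` prefix and `section_*` keys come from the commit above, while the exact template and URL are only illustrative:
```
# Download a video and additionally write one file per chapter,
# named by chapter number and title (template is only an example)
yt-dlp --split-chapters \
    -o "chapter:%(title)s/%(section_number)02d - %(section_title)s.%(ext)s" \
    "https://some/video"
```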
Matthew
a1c5d2ca64 [Youtube] Rewrite comment extraction (#167)
Closes #121

TODO:
* Add an option for the user to specify newest/popular and max number of comments
* Refactor the download code and generalize with TabIE
* Parse time_text to timestamp
2021-03-15 04:11:11 +05:30
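A rough usage sketch for the rewritten comment extraction: `--get-comments` is the flag documented in the README further down; storing the comments in the written `.info.json` is the assumed behaviour here:
```
# Extract comments along with the metadata, without downloading the media itself;
# the comments are assumed to end up in the written .info.json
yt-dlp --get-comments --write-info-json --skip-download "https://some/video"
```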
pukkandan
ca87974543 [embedthumbnail] Set mtime correctly
Related: https://github.com/yt-dlp/yt-dlp/issues/67
2021-03-14 21:56:04 +05:30
pukkandan
e92caff5d5 Refactor (See desc)
* Create `FFmpegPostProcessor.real_run_ffmpeg` that can accept multiple input/output files along with switches for each
* Rewrite `cli_configuration_args` and related functions
* Create `YoutubeDL._ensure_dir_exists` - this was previously defined in multiple places
2021-03-14 20:02:55 +05:30
CHJ85
ea3a012d2a [pluto.tv] Add extractor (#163)
https://github.com/ytdl-org/youtube-dl/pull/27621

Authored by: kevinoconnor7
2021-03-14 16:02:16 +05:30
pukkandan
5b8917fb52 [zee5] Support zee5originals 2021-03-14 15:22:29 +05:30
nixxo
8eec0120a2 [rai] fix drm check (#168)
Bug introduced by #150
Authored by: nixxo
2021-03-13 21:08:50 +05:30
shirt
4cf1e5d2f9 Native concurrent downloading of fragments (#166)
* Option `--concurrent-fragments` (`-N`) to set the number of threads

Related: #165

Known issues:
* When receiving Ctrl+C, the process will exit only after finishing the currently downloading fragments
* The download progress shows the speed of only one thread

Authored by shirt-dev
2021-03-13 10:16:58 +05:30
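A quick sketch of the new option (the thread count of 4 is arbitrary):
```
# Download HLS/DASH fragments with 4 threads instead of the default 1
yt-dlp -N 4 "https://some/video"
# equivalent long form
yt-dlp --concurrent-fragments 4 "https://some/video"
```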
pukkandan
0a473f2f0f More improvements to HLS/DASH external downloader code
* Fix error when there is no `protocol` in `info_dict`
* Move HLS byte range detection to `Aria2cFD` so that the download will fall back to the native downloader instead of ffmpeg
* Fix bug with getting no fragments in DASH
* Convert `check_results` in `can_download` to a generator
2021-03-11 22:07:42 +05:30
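For context, this is the code path exercised when an external downloader is requested for such formats; a minimal example (aria2c must be installed separately):
```
# Hand HLS/DASH fragment downloads to aria2c; unsupported cases fall back
# to the native downloader as described above
yt-dlp --external-downloader aria2c "https://some/video"
```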
nixxo
e4edeb6226 [wimtv] Add extractor (#161)
Added support for VODs, live and embeds

Authored by: nixxo
2021-03-11 13:28:51 +05:30
Ashish
d488e254d9 [Zee5] Add Show Extractor (#160)
Co-authored-by: Ashish <ashish@pop-os.localdomain>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
2021-03-11 13:18:09 +05:30
shirt-dev
d7009caa03 Improve HLS/DASH external downloader code (#162)
Authored by: shirt
2021-03-10 20:09:40 +05:30
pukkandan
54759df586 [zee5] Improve regex 2021-03-09 15:17:16 +05:30
nixxo
605b684c2d [mtv] Add mtv.it and extract series metadata (#156)
* New extractors: MTVItalia, MTVItaliaProgramma
* Extract fields: series, season_number, episode_number

Authored-by: nixxo
2021-03-08 19:10:27 +05:30
pukkandan
994443d24d [version] update :ci skip all 2021-03-08 00:16:25 +05:30
pukkandan
c5640c4508 Release 2021.03.07 2021-03-08 00:06:26 +05:30
teesid
1f52a09e2e [vimeo] Fix videos with password
https://github.com/ytdl-org/youtube-dl/pull/27992

Fixes: https://github.com/ytdl-org/youtube-dl/issues/28354

Authored by teesid
2021-03-07 23:47:53 +05:30
pukkandan
fc21af505c Fix some videos downloading with m3u8 extension 2021-03-07 23:22:12 +05:30
pukkandan
015f3b3120 [bilibili] Change Accept header (Closes #145)
This is a temporary fix. Ideally, we should find a more reasonable accept string than just "*/*"

Fixes: https://github.com/ytdl-org/youtube-dl/issues/28363 https://github.com/ytdl-org/youtube-dl/issues/28341

Thanks to animelover1984 for identifying the problem
2021-03-07 17:59:59 +05:30
Ashish
5ba4a0b69c [Documentation] Inclusion of two-line install script for Unix (#155)
Closes #83
Authored-by: Ashish <ashish@pop-os.localdomain>

ci skip all
2021-03-07 15:29:01 +05:30
nixxo
0852947fcc [rai] Check for DRM (#150)
Authored by: nixxo <nixxo@protonmail.com>
2021-03-07 13:01:59 +05:30
pukkandan
99594a11ce Remove "fixup is ignored" warning when fixup wasn't passed by user
Closes #151
2021-03-07 12:32:59 +05:30
pukkandan
2be71994c0 [youtube] Detect when Mixes end or wrap around 2021-03-07 11:04:57 +05:30
pukkandan
26fe8ffed0 [youtube] Fix community page continuation (Closes #152) 2021-03-07 11:04:55 +05:30
nixxo
feee67ae88 [gedi] Improvements from youtube-dl (#149)
Authored-by: nixxo <c.nixxo@gmail.com>
2021-03-06 23:40:32 +05:30
Ashish
1caaf92d47 [MXPlayer] Rewrite extractor with show support (#141)
Co-authored-by: Ashish <ashish@pop-os.localdomain>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
2021-03-06 01:11:02 +05:30
Matthew
d069eca7a3 [Youtube] Fix private feeds/playlists on multi-channel accounts (#143)
Authored by: colethedj
2021-03-05 19:29:14 +05:30
Matthew
f3eaa8dd1c [Youtube] Extract alerts from continuation (#144)
Related: #143

Authored by: colethedj
2021-03-05 15:37:32 +05:30
pukkandan
9e631877f8 [downloader] Fix bug for ffmpeg/httpie
Caused by: 7f7de7f94d
2021-03-05 04:22:37 +05:30
pukkandan
36147a63e3 [trovo] Pass origin header (Closes #139)
Fixes: https://github.com/ytdl-org/youtube-dl/issues/28346
2021-03-04 23:59:37 +05:30
pukkandan
57db6a87ef [lbry] Support lbry:// url
https://github.com/ytdl-org/youtube-dl/pull/28207

Fixes: https://github.com/ytdl-org/youtube-dl/issues/28084

Authored by: nixxo <nixxo@protonmail.com>
2021-03-04 23:45:28 +05:30
pukkandan
cd7c66cf01 [youtube] Fix history, trending and mix playlists (#136)
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Co-authored-by: Matthew <colethedj@protonmail.com>
2021-03-04 23:35:26 +05:30
shirt-dev
2c736b4f61 [cbs] Add support for ParamountPlus (#138)
Related: https://github.com/ytdl-org/youtube-dl/issues/28342

Authored-by: shirtjs <2660574+shirtjs@users.noreply.github.com>
2021-03-04 20:20:07 +05:30
pukkandan
c4a508ab31 [update] Fix updater removing the executable bit on some UNIX distros
Closes #133
2021-03-03 19:07:14 +05:30
pukkandan
7815e55572 [update] Fix current build hash for UNIX 2021-03-03 19:02:21 +05:30
pukkandan
162e6f0000 [version] update :ci skip all 2021-03-03 16:42:23 +05:30
pukkandan
a8278ababd Release 2021.03.03.2 2021-03-03 16:34:14 +05:30
pukkandan
bd9ed42387 [build] fix bug from da7f321e93 2021-03-03 16:31:27 +05:30
pukkandan
5f7514957f Release 2021.03.03 2021-03-03 16:27:55 +05:30
57 changed files with 2412 additions and 1165 deletions

View File

@@ -21,7 +21,7 @@ assignees: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
 - Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
+- [ ] I've verified that I'm running yt-dlp version **2021.03.07**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] yt-dlp version 2021.03.01
+[debug] yt-dlp version 2021.03.07
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -21,7 +21,7 @@ assignees: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
+- [ ] I've verified that I'm running yt-dlp version **2021.03.07**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -21,13 +21,13 @@ assignees: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
 - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
+- [ ] I've verified that I'm running yt-dlp version **2021.03.07**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -21,7 +21,7 @@ assignees: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
 - Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
+- [ ] I've verified that I'm running yt-dlp version **2021.03.07**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] yt-dlp version 2021.03.01
+[debug] yt-dlp version 2021.03.07
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -21,13 +21,13 @@ assignees: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.07. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
 - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
+- [ ] I've verified that I'm running yt-dlp version **2021.03.07**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -29,7 +29,7 @@ jobs:
       - name: Print version
         run: echo "${{ steps.bump_version.outputs.ytdlp_version }}"
       - name: Run Make
-        run: make yt-dlp
+        run: make
       - name: Create Release
         id: create_release
        uses: actions/create-release@v1

.gitignore vendored
View File

@@ -60,6 +60,7 @@ yt-dlp.zip
 *.mkv
 *.swf
 *.part
+*.part-*
 *.ytdl
 *.dump
 *.frag

View File

@@ -21,6 +21,7 @@ nao20010128nao
 kurumigi
 tsukumi
 bbepis
+animelover1984
 Pccode66
 Ashish
 RobinD42
@@ -28,3 +29,4 @@ hseg
 colethedj
 DennyDai
 codeasashu
+teesid

View File

@@ -17,16 +17,58 @@
 -->
+### 2021.03.15
+* **Split video by chapters**: using option `--split-chapters`
+* The output file of the split files can be set with `-o`/`-P` using the prefix `chapter:`
+* Additional keys `section_title`, `section_number`, `section_start`, `section_end` are available in the output template
+* **Parallel fragment downloads** by [shirt](https://github.com/shirt-dev)
+* Use option `--concurrent-fragments` (`-N`) to set the number of threads (default 1)
+* Merge youtube-dl: Upto [commit/3be0980](https://github.com/ytdl-org/youtube-dl/commit/3be098010f667b14075e3dfad1e74e5e2becc8ea)
+* [Zee5] Add Show Extractor by [Ashish](https://github.com/Ashish) and [pukkandan](https://github.com/pukkandan)
+* [rai] fix drm check [nixxo](https://github.com/nixxo)
+* [zee5] Support zee5originals
+* [wimtv] Add extractor by [nixxo](https://github.com/nixxo)
+* [mtv] Add mtv.it and extract series metadata by [nixxo](https://github.com/nixxo)
+* [pluto.tv] Add extractor by [kevinoconnor7](https://github.com/kevinoconnor7)
+* [Youtube] Rewrite comment extraction by [colethedj](https://github.com/colethedj)
+* [embedthumbnail] Set mtime correctly
+* Refactor some postprocessor/downloader code by [pukkandan](https://github.com/pukkandan) and [shirt](https://github.com/shirt-dev)
+### 2021.03.07
+* [youtube] Fix history, mixes, community pages and trending by [pukkandan](https://github.com/pukkandan) and [colethedj](https://github.com/colethedj)
+* [youtube] Fix private feeds/playlists on multi-channel accounts by [colethedj](https://github.com/colethedj)
+* [youtube] Extract alerts from continuation by [colethedj](https://github.com/colethedj)
+* [cbs] Add support for ParamountPlus by [shirt](https://github.com/shirt-dev)
+* [mxplayer] Rewrite extractor with show support by [pukkandan](https://github.com/pukkandan) and [Ashish](https://github.com/Ashish)
+* [gedi] Improvements from youtube-dl by [nixxo](https://github.com/nixxo)
+* [vimeo] Fix videos with password by [teesid](https://github.com/teesid)
+* [lbry] Support `lbry://` url by [nixxo](https://github.com/nixxo)
+* [bilibili] Change `Accept` header by [pukkandan](https://github.com/pukkandan) and [animelover1984](https://github.com/animelover1984)
+* [trovo] Pass origin header
+* [rai] Check for DRM by [nixxo](https://github.com/nixxo)
+* [downloader] Fix bug for `ffmpeg`/`httpie`
+* [update] Fix updater removing the executable bit on some UNIX distros
+* [update] Fix current build hash for UNIX
+* [documentation] Include wget/curl/aria2c install instructions for Unix by [Ashish](https://github.com/Ashish)
+* Fix some videos downloading with `m3u8` extension
+* Remove "fixup is ignored" warning when fixup wasn't passed by user
+### 2021.03.03.2
+* [build] Fix bug
 ### 2021.03.03
-* [youtube] Use new browse API for continuation page extraction by @colethedj and @pukkandan
+* [youtube] Use new browse API for continuation page extraction by [colethedj](https://github.com/colethedj) and [pukkandan](https://github.com/pukkandan)
-* Fix HLS playlist downloading by @shirt
+* Fix HLS playlist downloading by [shirt](https://github.com/shirt-dev)
-* **Merge youtube-dl:** Upto [2021.03.03](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.03.03)
+* Merge youtube-dl: Upto [2021.03.03](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.03.03)
 * [mtv] Fix extractor
-* [nick] Fix extractor by @DennyDai
+* [nick] Fix extractor by [DennyDai](https://github.com/DennyDai)
-* [mxplayer] Add new extractor by@codeasashu
+* [mxplayer] Add new extractor by [codeasashu](https://github.com/codeasashu)
 * [youtube] Throw error when `--extractor-retries` are exhausted
 * Reduce default of `--extractor-retries` to 3
-* Fix packaging bugs by @hseg
+* Fix packaging bugs by [hseg](https://github.com/hseg)
 ### 2021.03.01
@@ -55,7 +97,7 @@
 * Moved project to an organization [yt-dlp](https://github.com/yt-dlp)
 * **Completely changed project name to yt-dlp** by [Pccode66](https://github.com/Pccode66) and [pukkandan](https://github.com/pukkandan)
 * Also, `youtube-dlc` config files are no longer loaded
-* **Merge youtube-dl:** Upto [commit/4460329](https://github.com/ytdl-org/youtube-dl/commit/44603290e5002153f3ebad6230cc73aef42cc2cd) (except tmz, gedi)
+* Merge youtube-dl: Upto [commit/4460329](https://github.com/ytdl-org/youtube-dl/commit/44603290e5002153f3ebad6230cc73aef42cc2cd) (except tmz, gedi)
 * [Readthedocs](https://yt-dlp.readthedocs.io) support by [shirt](https://github.com/shirt-dev)
 * [youtube] Show if video was a live stream in info (`was_live`)
 * [Zee5] Add new extractor by [Ashish](https://github.com/Ashish) and [pukkandan](https://github.com/pukkandan)
@@ -73,7 +115,7 @@
 ### 2021.02.19
-* **Merge youtube-dl:** Upto [commit/cf2dbec](https://github.com/ytdl-org/youtube-dl/commit/cf2dbec6301177a1fddf72862de05fa912d9869d) (except kakao)
+* Merge youtube-dl: Upto [commit/cf2dbec](https://github.com/ytdl-org/youtube-dl/commit/cf2dbec6301177a1fddf72862de05fa912d9869d) (except kakao)
 * [viki] Fix extractor
 * [niconico] Extract `channel` and `channel_id` by [kurumigi](https://github.com/kurumigi)
 * [youtube] Multiple page support for hashtag URLs
@@ -98,7 +140,7 @@
 ### 2021.02.15
-* **Merge youtube-dl:** Upto [2021.02.10](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.10) (except archive.org)
+* Merge youtube-dl: Upto [2021.02.10](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.10) (except archive.org)
 * [niconico] Improved extraction and support encrypted/SMILE movies by [kurumigi](https://github.com/kurumigi), [tsukumi](https://github.com/tsukumi), [bbepis](https://github.com/bbepis), [pukkandan](https://github.com/pukkandan)
 * Fix HLS AES-128 with multiple keys in external downloaders by [shirt](https://github.com/shirt-dev)
 * [youtube_live_chat] Fix by using POST API by [siikamiika](https://github.com/siikamiika)
@@ -141,7 +183,7 @@
 ### 2021.02.04
-* **Merge youtube-dl:** Upto [2021.02.04.1](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.04.1)
+* Merge youtube-dl: Upto [2021.02.04.1](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.04.1)
 * **Date/time formatting in output template:**
 * You can use [`strftime`](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) to format date/time fields. Example: `%(upload_date>%Y-%m-%d)s`
 * **Multiple output templates:**
@@ -195,7 +237,7 @@
 ### 2021.01.24
-* **Merge youtube-dl:** Upto [2021.01.24](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
+* Merge youtube-dl: Upto [2021.01.24](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
 * Plugin support ([documentation](https://github.com/yt-dlp/yt-dlp#plugins))
 * **Multiple paths**: New option `-P`/`--paths` to give different paths for different types of files
 * The syntax is `-P "type:path" -P "type:path"` ([documentation](https://github.com/yt-dlp/yt-dlp#:~:text=-P,%20--paths%20TYPE:PATH))
@@ -224,7 +266,7 @@
 ### 2021.01.16
-* **Merge youtube-dl:** Upto [2021.01.16](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
+* Merge youtube-dl: Upto [2021.01.16](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
 * **Configuration files:**
 * Portable configuration file: `./yt-dlp.conf`
 * Allow the configuration files to be named `yt-dlp` instead of `youtube-dlc`. See [this](https://github.com/yt-dlp/yt-dlp#configuration) for details
@@ -270,8 +312,7 @@
 ### 2021.01.08
-* **Merge youtube-dl:** Upto [2021.01.08](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.08)
-* Extractor stitcher ([1](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc), [2](https://github.com/ytdl-org/youtube-dl/commit/a563c97c5cddf55f8989ed7ea8314ef78e30107f)) have not been merged
+* Merge youtube-dl: Upto [2021.01.08](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.08) except stitcher ([1](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc), [2](https://github.com/ytdl-org/youtube-dl/commit/a563c97c5cddf55f8989ed7ea8314ef78e30107f))
 * Moved changelog to seperate file
@@ -310,7 +351,7 @@
 * Changed video format sorting to show video only files and video+audio files together.
 * Added `--video-multistreams`, `--no-video-multistreams`, `--audio-multistreams`, `--no-audio-multistreams`
 * Added `b`,`w`,`v`,`a` as alias for `best`, `worst`, `video` and `audio` respectively
-* **Shortcut Options:** Added `--write-link`, `--write-url-link`, `--write-webloc-link`, `--write-desktop-link` by [h-h-h-h](https://github.com/h-h-h-h) - See [Internet Shortcut Options](README.md#internet-shortcut-options) for details
+* Shortcut Options: Added `--write-link`, `--write-url-link`, `--write-webloc-link`, `--write-desktop-link` by [h-h-h-h](https://github.com/h-h-h-h) - See [Internet Shortcut Options](README.md#internet-shortcut-options) for details
 * **Sponskrub integration:** Added `--sponskrub`, `--sponskrub-cut`, `--sponskrub-force`, `--sponskrub-location`, `--sponskrub-args` - See [SponSkrub Options](README.md#sponskrub-sponsorblock-options) for details
 * Added `--force-download-archive` (`--force-write-archive`) by [h-h-h-h](https://github.com/h-h-h-h)
 * Added `--list-formats-as-table`, `--list-formats-old`
@@ -320,36 +361,38 @@
 * Relaxed validation for format filters so that any arbitrary field can be used
 * Fix for embedding thumbnail in mp3 by [pauldubois98](https://github.com/pauldubois98) ([ytdl-org/youtube-dl#21569](https://github.com/ytdl-org/youtube-dl/pull/21569))
 * Make Twitch Video ID output from Playlist and VOD extractor same. This is only a temporary fix
-* **Merge youtube-dl:** Upto [2021.01.03](https://github.com/ytdl-org/youtube-dl/commit/8e953dcbb10a1a42f4e12e4e132657cb0100a1f8) - See [blackjack4494/yt-dlc#280](https://github.com/blackjack4494/yt-dlc/pull/280) for details
+* Merge youtube-dl: Upto [2021.01.03](https://github.com/ytdl-org/youtube-dl/commit/8e953dcbb10a1a42f4e12e4e132657cb0100a1f8) - See [blackjack4494/yt-dlc#280](https://github.com/blackjack4494/yt-dlc/pull/280) for details
 * Extractors [tiktok](https://github.com/ytdl-org/youtube-dl/commit/fb626c05867deab04425bad0c0b16b55473841a2) and [hotstar](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc) have not been merged
 * Cleaned up the fork for public use
+**PS**: All uncredited changes above this point are authored by [pukkandan](https://github.com/pukkandan)
 ### Unreleased changes in [blackjack4494/yt-dlc](https://github.com/blackjack4494/yt-dlc)
-* Updated to youtube-dl release 2020.11.26
+* Updated to youtube-dl release 2020.11.26 by [pukkandan](https://github.com/pukkandan)
-* [youtube]
+* Youtube improvements by [pukkandan](https://github.com/pukkandan)
 * Implemented all Youtube Feeds (ytfav, ytwatchlater, ytsubs, ythistory, ytrec) and SearchURL
-* Fix ytsearch not returning results sometimes due to promoted content
-* Temporary fix for automatic captions - disable json3
 * Fix some improper Youtube URLs
 * Redirect channel home to /video
 * Print youtube's warning message
-* Multiple pages are handled better for feeds
+* Handle Multiple pages for feeds better
+* [youtube] Fix ytsearch not returning results sometimes due to promoted content by [colethedj](https://github.com/colethedj)
+* [youtube] Temporary fix for automatic captions - disable json3 by [blackjack4494](https://github.com/blackjack4494)
 * Add --break-on-existing by [gergesh](https://github.com/gergesh)
-* Pre-check video IDs in the archive before downloading
+* Pre-check video IDs in the archive before downloading by [pukkandan](https://github.com/pukkandan)
-* [bitwave.tv] New extractor
+* [bitwave.tv] New extractor by [lorpus](https://github.com/lorpus)
-* [Gedi] Add extractor
+* [Gedi] Add extractor by [nixxo](https://github.com/nixxo)
-* [Rcs] Add new extractor
+* [Rcs] Add new extractor by [nixxo](https://github.com/nixxo)
-* [skyit] Add support for multiple Sky Italia website and removed old skyitalia extractor
+* [skyit] New skyitalia extractor by [nixxo](https://github.com/nixxo)
-* [france.tv] Fix thumbnail URL
+* [france.tv] Fix thumbnail URL by [renalid](https://github.com/renalid)
-* [ina] support mobile links
+* [ina] support mobile links by [B0pol](https://github.com/B0pol)
-* [instagram] Fix extractor
+* [instagram] Fix thumbnail extractor by [nao20010128nao](https://github.com/nao20010128nao)
-* [itv] BTCC new pages' URL update (articles instead of races)
-* [SouthparkDe] Support for English URLs
+* [SouthparkDe] Support for English URLs by [xypwn](https://github.com/xypwn)
-* [spreaker] fix SpreakerShowIE test URL
+* [spreaker] fix SpreakerShowIE test URL by [pukkandan](https://github.com/pukkandan)
-* [Vlive] Fix playlist handling when downloading a channel
+* [Vlive] Fix playlist handling when downloading a channel by [kyuyeunk](https://github.com/kyuyeunk)
+* [tmz] Fix extractor by [diegorodriguezv](https://github.com/diegorodriguezv)
-* [generic] Detect embedded bitchute videos
+* [generic] Detect embedded bitchute videos by [pukkandan](https://github.com/pukkandan)
-* [generic] Extract embedded youtube and twitter videos
+* [generic] Extract embedded youtube and twitter videos by [diegorodriguezv](https://github.com/diegorodriguezv)
-* [ffmpeg] Ensure all streams are copied
+* [ffmpeg] Ensure all streams are copied by [pukkandan](https://github.com/pukkandan)
-* Fix for os.rename error when embedding thumbnail to video in a different drive
+* [embedthumbnail] Fix for os.rename error by [pukkandan](https://github.com/pukkandan)
-* make_win.bat: don't use UPX to pack vcruntime140.dll
+* make_win.bat: don't use UPX to pack vcruntime140.dll by [jbruchon](https://github.com/jbruchon)

View File

@@ -1,4 +1,4 @@
-all: yt-dlp doc
+all: yt-dlp doc pypi-files
 clean: clean-test clean-dist clean-cache
 completions: completion-bash completion-fish completion-zsh
 doc: README.md CONTRIBUTING.md issuetemplates supportedsites

View File

@@ -57,7 +57,7 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
 * **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will be now preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection that what is possible by simply using `--format` ([examples](#format-selection-examples))
-* **Merged with youtube-dl v2021.03.03**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
+* **Merged with youtube-dl v2021.03.14**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
 * **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--get-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, Playlist infojson etc. Note that the NicoNico improvements are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
@@ -66,17 +66,19 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
 * Youtube search (`ytsearch:`, `ytsearchdate:`) along with Search URLs works correctly
 * Redirect channel's home URL automatically to `/video` to preserve the old behaviour
+* **Split video by chapters**: Videos can be split into multiple files based on chapters using `--split-chapters`
+* **Multithreaded fragment downloads**: Fragment downloads can be natively multi-threaded. Use `--concurrent-fragments` (`-N`) option to set the number of threads used
 * **Aria2c with HLS/DASH**: You can use aria2c as the external downloader for DASH(mpd) and HLS(m3u8) formats. No more slow ffmpeg/native downloads
-* **New extractors**: AnimeLab, Philo MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5
+* **New extractors**: AnimeLab, Philo MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv
 * **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, tiktok, akamai, ina, rumble, tennistv
 * **Plugin support**: Extractors can be loaded from an external file. See [plugins](#plugins) for details
-* **Multiple paths and output templates**: You can give different [output templates](#output-template) and download paths for different types of files. You can also set a temporary path where intermediary files are downloaded to. See [`--paths`](https://github.com/yt-dlp/yt-dlp/#:~:text=-P,%20--paths%20TYPE:PATH) for details
+* **Multiple paths and output templates**: You can give different [output templates](#output-template) and download paths for different types of files. You can also set a temporary path where intermediary files are downloaded to using `--paths` (`-P`)
+<!-- Relative link doesn't work for "#:~:text=" -->
 * **Portable Configuration**: Configuration files are automatically loaded from the home and root directories. See [configuration](#configuration) for details
@@ -103,6 +105,23 @@ You can install yt-dlp using one of the following methods:
 * Use pip+git: `python -m pip install --upgrade git+https://github.com/yt-dlp/yt-dlp.git@release`
 * Install master branch: `python -m pip install --upgrade git+https://github.com/yt-dlp/yt-dlp`
+UNIX users (Linux, macOS, BSD) can also install the [latest release](https://github.com/yt-dlp/yt-dlp/releases/latest) one of the following ways:
+```
+sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
+sudo chmod a+rx /usr/local/bin/yt-dlp
+```
+```
+sudo wget https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -O /usr/local/bin/yt-dlp
+sudo chmod a+rx /usr/local/bin/yt-dlp
+```
+```
+sudo aria2c https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
+sudo chmod a+rx /usr/local/bin/yt-dlp
+```
 ### UPDATE
 Starting from version `2021.02.09`, you can use `yt-dlp -U` to update if you are using the provided release.
 If you are using `pip`, simply re-run the same command that was used to install the program.
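A small sanity-check sequence to go with the install commands added above; nothing here goes beyond the flags already mentioned in this README excerpt:
```
# Confirm the binary on PATH, then self-update the standalone release
yt-dlp --version
yt-dlp -U    # may need sudo if it was installed to /usr/local/bin
```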
@@ -177,7 +196,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
                                      only list them
     --no-flat-playlist               Extract the videos of a playlist
     --mark-watched                   Mark videos watched (YouTube only)
-    --no-mark-watched                Do not mark videos watched
+    --no-mark-watched                Do not mark videos watched (default)
     --no-colors                      Do not emit color codes in output
 ## Network Options:
@@ -280,6 +299,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
     --no-include-ads                 Do not download advertisements (default)
 ## Download Options:
+    -N, --concurrent-fragments N     Number of fragments to download
+                                     concurrently (default is 1)
     -r, --limit-rate RATE            Maximum download rate in bytes per second
                                      (e.g. 50K or 4.2M)
     -R, --retries RETRIES            Number of retries (default is 10), or
@@ -624,11 +645,12 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
                                      arguments to the specified executable only
                                      when being used by the specified
                                      postprocessor. Additionally, for
-                                     ffmpeg/ffprobe, a number can be appended to
-                                     the exe name seperated by "_i" to pass the
-                                     argument before the specified input file.
-                                     Eg: --ppa "Merger+ffmpeg_i1:-v quiet". You
-                                     can use this option multiple times to give
+                                     ffmpeg/ffprobe, "_i"/"_o" can be appended
+                                     to the prefix optionally followed by a
+                                     number to pass the argument before the
+                                     specified input/output file. Eg: --ppa
+                                     "Merger+ffmpeg_i1:-v quiet". You can use
+                                     this option multiple times to give
                                      different arguments to different
                                      postprocessors. (Alias: --ppa)
     -k, --keep-video                 Keep the intermediate video file on disk
@@ -674,6 +696,13 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
                                      push {} /sdcard/Music/ && rm {}'
     --convert-subs FORMAT            Convert the subtitles to other format
                                      (currently supported: srt|ass|vtt|lrc)
+    --split-chapters                 Split video into multiple files based on
+                                     internal chapters. The "chapter:" prefix
+                                     can be used with "--paths" and "--output"
+                                     to set the output filename for the split
+                                     files. See "OUTPUT TEMPLATE" for details
+    --no-split-chapters              Do not split video based on chapters
+                                     (default)
 ## SponSkrub (SponsorBlock) Options:
 [SponSkrub](https://github.com/yt-dlp/SponSkrub) is a utility to
@@ -789,9 +818,9 @@ The `-o` option is used to indicate a template for the output file names while `
 **tl;dr:** [navigate me to examples](#output-template-examples).
-The basic usage of `-o` is not to set any template arguments when downloading a single file, like in `yt-dlp -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Date/time fields can also be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it inside the parantheses seperated from the field name using a `>`. For example, `%(duration>%H-%M-%S)s`.
+The basic usage of `-o` is not to set any template arguments when downloading a single file, like in `yt-dlp -o funny_video.flv "https://some/video"` (hard-coding file extension like this is not recommended). However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Date/time fields can also be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it inside the parantheses seperated from the field name using a `>`. For example, `%(duration>%H-%M-%S)s`.
-Additionally, you can set different output templates for the various metadata files seperately from the general output template by specifying the type of file followed by the template seperated by a colon ":". The different filetypes supported are `subtitle|thumbnail|description|annotation|infojson|pl_description|pl_infojson`. For example, `-o '%(title)s.%(ext)s' -o 'thumbnail:%(title)s\%(title)s.%(ext)s'` will put the thumbnails in a folder with the same name as the video.
+Additionally, you can set different output templates for the various metadata files seperately from the general output template by specifying the type of file followed by the template seperated by a colon ":". The different filetypes supported are `subtitle`, `thumbnail`, `description`, `annotation`, `infojson`, `pl_description`, `pl_infojson`, `chapter`. For example, `-o '%(title)s.%(ext)s' -o 'thumbnail:%(title)s\%(title)s.%(ext)s'` will put the thumbnails in a folder with the same name as the video.
 The available fields are:
@@ -800,6 +829,7 @@ The available fields are:
 - `url` (string): Video URL
 - `ext` (string): Video filename extension
 - `alt_title` (string): A secondary title of the video
+- `description` (string): The description of the video
 - `display_id` (string): An alternative identifier for the video
 - `uploader` (string): Full name of the video uploader
 - `license` (string): License name the video is licensed under
@@ -882,6 +912,13 @@ Available for the media that is a track or a part of a music album:
 - `disc_number` (numeric): Number of the disc or other physical medium the track belongs to
 - `release_year` (numeric): Year (YYYY) when the album was released
+Available when using `--split-chapters` for videos with internal chapters:
+- `section_title` (string): Title of the chapter
+- `section_number` (numeric): Number of the chapter within the file
+- `section_start` (numeric): Start time of the chapter in seconds
+- `section_end` (numeric): End time of the chapter in seconds
 Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
 For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKcj`, this will result in a `yt-dlp test video-BaW_jenozKcj.mp4` file created in the current directory.
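Putting the new `chapter:` template type and the `section_*` fields from the hunk above together, a hypothetical command could look like this (paths and template are illustrative only):
```
# Name the full download as usual, and name each chapter file
# produced by --split-chapters using the section fields
yt-dlp --split-chapters \
    -o "%(title)s-%(id)s.%(ext)s" \
    -o "chapter:%(title)s/%(section_number)s - %(section_title)s.%(ext)s" \
    "https://some/video"
```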

View File

@@ -347,8 +347,7 @@
 - **Gaskrank**
 - **Gazeta**
 - **GDCVault**
-- **Gedi**
-- **GediEmbeds**
+- **GediDigital**
 - **generic**: Generic downloader that works on some sites
 - **Gfycat**
 - **GiantBomb**
@@ -562,6 +561,8 @@
 - **mtg**: MTG services
 - **mtv**
 - **mtv.de**
+- **mtv.it**
+- **mtv.it:programma**
 - **mtv:video**
 - **mtvjapan**
 - **mtvservices:embedded**
@@ -735,6 +736,7 @@
 - **Playwire**
 - **pluralsight**
 - **pluralsight:course**
+- **PlutoTV**
 - **podomatic**
 - **Pokemon**
 - **PokemonWatch**
@@ -1172,6 +1174,7 @@
 - **Weibo**
 - **WeiboMobile**
 - **WeiqiTV**: WQTV
+- **WimTV**
 - **Wistia**
 - **WistiaPlaylist**
 - **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
@@ -1242,6 +1245,7 @@
 - **ZDF**
 - **ZDFChannel**
 - **Zee5**
+- **zee5:series**
 - **Zhihu**
 - **zingmp3**: mp3.zing.vn
 - **zoom**

View File

@@ -37,7 +37,6 @@ class TestAllURLsMatching(unittest.TestCase):
         assertPlaylist('PL63F0C78739B09958')
         assertTab('https://www.youtube.com/AsapSCIENCE')
         assertTab('https://www.youtube.com/embedded')
-        assertTab('https://www.youtube.com/feed')  # Own channel's home page
         assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
         assertTab('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
         assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')

View File

@@ -1171,6 +1171,9 @@ class YoutubeDL(object):
else: else:
raise Exception('Invalid result type: %s' % result_type) raise Exception('Invalid result type: %s' % result_type)
def _ensure_dir_exists(self, path):
return make_dir(path, self.report_error)
def __process_playlist(self, ie_result, download): def __process_playlist(self, ie_result, download):
# We process each entry in the playlist # We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id') playlist = ie_result.get('title') or ie_result.get('id')
@@ -1187,12 +1190,9 @@ class YoutubeDL(object):
} }
ie_copy.update(dict(ie_result)) ie_copy.update(dict(ie_result))
def ensure_dir_exists(path):
return make_dir(path, self.report_error)
if self.params.get('writeinfojson', False): if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(ie_copy, 'pl_infojson') infofn = self.prepare_filename(ie_copy, 'pl_infojson')
if not ensure_dir_exists(encodeFilename(infofn)): if not self._ensure_dir_exists(encodeFilename(infofn)):
return return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)): if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Playlist metadata is already present') self.to_screen('[info] Playlist metadata is already present')
@@ -1208,7 +1208,7 @@ class YoutubeDL(object):
if self.params.get('writedescription', False): if self.params.get('writedescription', False):
descfn = self.prepare_filename(ie_copy, 'pl_description') descfn = self.prepare_filename(ie_copy, 'pl_description')
if not ensure_dir_exists(encodeFilename(descfn)): if not self._ensure_dir_exists(encodeFilename(descfn)):
return return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)): if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Playlist description is already present') self.to_screen('[info] Playlist description is already present')
@@ -1794,14 +1794,18 @@ class YoutubeDL(object):
if 'display_id' not in info_dict and 'id' in info_dict: if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id'] info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing # Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series. # in order to always have clean titles. This is very common for TV series.
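For reference, a quick worked example of the conversion performed by the timestamp-to-date loop added above; 1396483200 is the release timestamp that also appears in the Bandcamp test further down, so the expected output can be checked against it:

    import datetime

    # UTC timestamp -> YYYYMMDD date string, as done for both upload and release dates
    print(datetime.datetime.utcfromtimestamp(1396483200).strftime('%Y%m%d'))  # 20140403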
@@ -2089,17 +2093,14 @@ class YoutubeDL(object):
if full_filename is None: if full_filename is None:
return return
def ensure_dir_exists(path):
return make_dir(path, self.report_error)
if not ensure_dir_exists(encodeFilename(full_filename)):
if not self._ensure_dir_exists(encodeFilename(full_filename)):
return return
if not ensure_dir_exists(encodeFilename(temp_filename)): if not self._ensure_dir_exists(encodeFilename(temp_filename)):
return return
if self.params.get('writedescription', False): if self.params.get('writedescription', False):
descfn = self.prepare_filename(info_dict, 'description') descfn = self.prepare_filename(info_dict, 'description')
if not ensure_dir_exists(encodeFilename(descfn)): if not self._ensure_dir_exists(encodeFilename(descfn)):
return return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)): if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Video description is already present') self.to_screen('[info] Video description is already present')
@@ -2116,7 +2117,7 @@ class YoutubeDL(object):
if self.params.get('writeannotations', False): if self.params.get('writeannotations', False):
annofn = self.prepare_filename(info_dict, 'annotation') annofn = self.prepare_filename(info_dict, 'annotation')
if not ensure_dir_exists(encodeFilename(annofn)): if not self._ensure_dir_exists(encodeFilename(annofn)):
return return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(annofn)): if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present') self.to_screen('[info] Video annotations are already present')
@@ -2204,7 +2205,7 @@ class YoutubeDL(object):
if self.params.get('writeinfojson', False): if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(info_dict, 'infojson') infofn = self.prepare_filename(info_dict, 'infojson')
if not ensure_dir_exists(encodeFilename(infofn)): if not self._ensure_dir_exists(encodeFilename(infofn)):
return return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)): if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Video metadata is already present') self.to_screen('[info] Video metadata is already present')
@@ -2360,7 +2361,7 @@ class YoutubeDL(object):
fname = prepend_extension( fname = prepend_extension(
self.prepare_filename(new_info, 'temp'), self.prepare_filename(new_info, 'temp'),
'f%s' % f['format_id'], new_info['ext']) 'f%s' % f['format_id'], new_info['ext'])
if not ensure_dir_exists(fname): if not self._ensure_dir_exists(fname):
return return
downloaded.append(fname) downloaded.append(fname)
partial_success, real_download = dl(fname, new_info) partial_success, real_download = dl(fname, new_info)
@@ -2437,9 +2438,8 @@ class YoutubeDL(object):
else: else:
assert fixup_policy in ('ignore', 'never') assert fixup_policy in ('ignore', 'never')
if (info_dict.get('protocol') == 'm3u8_native'
or info_dict.get('protocol') == 'm3u8'
and self.params.get('hls_prefer_native')):
if ('protocol' in info_dict
and get_suitable_downloader(info_dict, self.params).__name__ == 'HlsFD'):
if fixup_policy == 'warn': if fixup_policy == 'warn':
self.report_warning('%s: malformed AAC bitstream detected.' % ( self.report_warning('%s: malformed AAC bitstream detected.' % (
info_dict['id'])) info_dict['id']))


@@ -180,6 +180,8 @@ def _real_main(argv=None):
if opts.overwrites: if opts.overwrites:
# --yes-overwrites implies --no-continue # --yes-overwrites implies --no-continue
opts.continue_dl = False opts.continue_dl = False
if opts.concurrent_fragment_downloads <= 0:
raise ValueError('Concurrent fragments must be positive')
def parse_retries(retries, name=''): def parse_retries(retries, name=''):
if retries in ('inf', 'infinite'): if retries in ('inf', 'infinite'):
@@ -277,9 +279,14 @@ def _real_main(argv=None):
def report_conflict(arg1, arg2): def report_conflict(arg1, arg2):
write_string('WARNING: %s is ignored since %s was given\n' % (arg2, arg1), out=sys.stderr) write_string('WARNING: %s is ignored since %s was given\n' % (arg2, arg1), out=sys.stderr)
if opts.remuxvideo and opts.recodevideo: if opts.remuxvideo and opts.recodevideo:
report_conflict('--recode-video', '--remux-video') report_conflict('--recode-video', '--remux-video')
opts.remuxvideo = False opts.remuxvideo = False
if opts.sponskrub_cut and opts.split_chapters and opts.sponskrub is not False:
report_conflict('--split-chapter', '--sponskrub-cut')
opts.sponskrub_cut = False
if opts.allow_unplayable_formats: if opts.allow_unplayable_formats:
if opts.extractaudio: if opts.extractaudio:
report_conflict('--allow-unplayable-formats', '--extract-audio') report_conflict('--allow-unplayable-formats', '--extract-audio')
@@ -369,11 +376,7 @@ def _real_main(argv=None):
}) })
if not already_have_thumbnail: if not already_have_thumbnail:
opts.writethumbnail = True opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# This should be below all ffmpeg PP because it may cut parts out from the video
# This should be below most ffmpeg PP because it may cut parts out from the video
# If opts.sponskrub is None, sponskrub is used, but it silently fails if the executable can't be found # If opts.sponskrub is None, sponskrub is used, but it silently fails if the executable can't be found
if opts.sponskrub is not False: if opts.sponskrub is not False:
postprocessors.append({ postprocessors.append({
@@ -384,6 +387,11 @@ def _real_main(argv=None):
'force': opts.sponskrub_force, 'force': opts.sponskrub_force,
'ignoreerror': opts.sponskrub is None, 'ignoreerror': opts.sponskrub is None,
}) })
if opts.split_chapters:
postprocessors.append({'key': 'FFmpegSplitChapters'})
# XAttrMetadataPP should be run after post-processors that may change file contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# ExecAfterDownload must be the last PP # ExecAfterDownload must be the last PP
if opts.exec_cmd: if opts.exec_cmd:
postprocessors.append({ postprocessors.append({
@@ -463,6 +471,7 @@ def _real_main(argv=None):
'extractor_retries': opts.extractor_retries, 'extractor_retries': opts.extractor_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments, 'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'keep_fragments': opts.keep_fragments, 'keep_fragments': opts.keep_fragments,
'concurrent_fragment_downloads': opts.concurrent_fragment_downloads,
'buffersize': opts.buffersize, 'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer, 'noresizebuffer': opts.noresizebuffer,
'http_chunk_size': opts.http_chunk_size, 'http_chunk_size': opts.http_chunk_size,
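As a rough sketch of how the new setting is consumed, an embedding script could pass the same 'concurrent_fragment_downloads' key shown above directly to YoutubeDL; the URL here is only a placeholder:

    from yt_dlp import YoutubeDL

    ydl_opts = {
        'concurrent_fragment_downloads': 4,  # same key as wired up from the new option above
    }
    with YoutubeDL(ydl_opts) as ydl:
        ydl.download(['https://example.com/video'])  # placeholder URL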


@@ -326,6 +326,12 @@ class FileDownloader(object):
"""Report it was impossible to resume download.""" """Report it was impossible to resume download."""
self.to_screen('[download] Unable to resume') self.to_screen('[download] Unable to resume')
@staticmethod
def supports_manifest(manifest):
""" Whether the downloader can download the fragments from the manifest.
Redefine in subclasses if needed. """
pass
def download(self, filename, info_dict, subtitle=False): def download(self, filename, info_dict, subtitle=False):
"""Download to a filename using the info from info_dict """Download to a filename using the info from info_dict
Return True on success and False otherwise Return True on success and False otherwise
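A minimal sketch of how a downloader might redefine the new supports_manifest() hook (the Aria2cFD change further down does essentially this for #EXT-X-BYTERANGE); the class name and feature list here are illustrative only:

    import re

    class ByteRangeAwareFD:  # hypothetical stand-in for a FileDownloader subclass
        @staticmethod
        def supports_manifest(manifest):
            # Reject manifests that use features this downloader cannot handle
            unsupported = [r'#EXT-X-BYTERANGE']  # illustrative feature list
            return all(not re.search(feature, manifest) for feature in unsupported)

    print(ByteRangeAwareFD.supports_manifest('#EXTM3U\n#EXT-X-BYTERANGE:100@0\n'))  # False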


@@ -1,18 +1,26 @@
from __future__ import unicode_literals from __future__ import unicode_literals
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader from ..downloader import _get_real_downloader
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
from ..utils import ( from ..utils import (
DownloadError, DownloadError,
sanitize_open,
urljoin, urljoin,
) )
class DashSegmentsFD(FragmentFD): class DashSegmentsFD(FragmentFD):
""" """
Download segments in a DASH manifest
Download segments in a DASH manifest. External downloaders can take over
the fragment downloads by supporting the 'frag_urls' protocol
""" """
FD_NAME = 'dashsegments' FD_NAME = 'dashsegments'
@@ -37,7 +45,7 @@ class DashSegmentsFD(FragmentFD):
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True) skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
fragment_urls = [] fragments_to_download = []
frag_index = 0 frag_index = 0
for i, fragment in enumerate(fragments): for i, fragment in enumerate(fragments):
frag_index += 1 frag_index += 1
@@ -48,49 +56,15 @@ class DashSegmentsFD(FragmentFD):
assert fragment_base_url assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path']) fragment_url = urljoin(fragment_base_url, fragment['path'])
if real_downloader:
fragment_urls.append(fragment_url)
continue
fragments_to_download.append({
'frag_index': frag_index,
'index': i,
'url': fragment_url,
})
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False
self._append_fragment(ctx, frag_content)
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
self.report_skip_fragment(frag_index)
break
raise
if count > fragment_retries:
if not fatal:
self.report_skip_fragment(frag_index)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
if real_downloader: if real_downloader:
info_copy = info_dict.copy() info_copy = info_dict.copy()
info_copy['url_list'] = fragment_urls
info_copy['fragments'] = fragments_to_download
fd = real_downloader(self.ydl, self.params) fd = real_downloader(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice # TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks: # for ph in self._progress_hooks:
@@ -99,5 +73,104 @@ class DashSegmentsFD(FragmentFD):
if not success: if not success:
return False return False
else: else:
def download_fragment(fragment):
i = fragment['index']
frag_index = fragment['frag_index']
fragment_url = fragment['url']
ctx['fragment_index'] = frag_index
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
break
raise
if count > fragment_retries:
if not fatal:
return False, frag_index
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False, frag_index
return frag_content, frag_index
def append_fragment(frag_content, frag_index):
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
self._append_fragment(ctx, frag_content)
return True
except FileNotFoundError:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is only of one thread. This is a known issue')
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(download_fragment, fragment) for fragment in fragments_to_download]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be none to cancel
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
results = [future.result() for future in futures]
for frag_content, frag_index in results:
result = append_fragment(frag_content, frag_index)
if not result:
return False
else:
for fragment in fragments_to_download:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
return True return True
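Stripped of the downloader specifics, the interruptible wait used in the threaded branch above boils down to polling concurrent.futures.wait() with a short timeout; a self-contained sketch with toy tasks:

    import concurrent.futures
    import time

    def task(n):  # toy stand-in for download_fragment
        time.sleep(0.1)
        return n

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task, n) for n in range(16)]
        done, not_done = concurrent.futures.wait(futures, timeout=0)  # returns immediately
        try:
            while not_done:
                # re-check every second so a KeyboardInterrupt is noticed promptly
                freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
                done |= freshly_done
        except KeyboardInterrupt:
            for future in not_done:
                future.cancel()
            concurrent.futures.wait(not_done, timeout=None)
            raise
        results = [future.result() for future in futures]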


@@ -108,7 +108,8 @@ class ExternalFD(FileDownloader):
def _configuration_args(self, *args, **kwargs): def _configuration_args(self, *args, **kwargs):
return cli_configuration_args( return cli_configuration_args(
self.params.get('external_downloader_args'), self.params.get('external_downloader_args'),
self.get_basename(), *args, **kwargs)
[self.get_basename(), 'default'],
*args, **kwargs)
def _call_downloader(self, tmpfilename, info_dict): def _call_downloader(self, tmpfilename, info_dict):
""" Either overwrite this or implement _make_cmd """ """ Either overwrite this or implement _make_cmd """
@@ -122,18 +123,14 @@ class ExternalFD(FileDownloader):
if p.returncode != 0: if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace')) self.to_stderr(stderr.decode('utf-8', 'replace'))
if 'url_list' in info_dict:
if 'fragments' in info_dict:
file_list = [] file_list = []
for [i, url] in enumerate(info_dict['url_list']):
tmpsegmentname = '%s_%s.frag' % (tmpfilename, i)
file_list.append(tmpsegmentname)
key_list = info_dict.get('key_list')
decrypt_info = None
dest, _ = sanitize_open(tmpfilename, 'wb') dest, _ = sanitize_open(tmpfilename, 'wb')
for i, file in enumerate(file_list):
for i, fragment in enumerate(info_dict['fragments']):
file = '%s-Frag%d' % (tmpfilename, i)
decrypt_info = fragment.get('decrypt_info')
src, _ = sanitize_open(file, 'rb') src, _ = sanitize_open(file, 'rb')
if key_list:
decrypt_info = next((x for x in key_list if x['INDEX'] == i), decrypt_info)
if decrypt_info:
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') iv = decrypt_info.get('IV')
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen( decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
@@ -149,6 +146,7 @@ class ExternalFD(FileDownloader):
fragment_data = src.read() fragment_data = src.read()
dest.write(fragment_data) dest.write(fragment_data)
src.close() src.close()
file_list.append(file)
dest.close() dest.close()
if not self.params.get('keep_fragments', False): if not self.params.get('keep_fragments', False):
for file_path in file_list: for file_path in file_list:
@@ -245,10 +243,19 @@ class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v' AVAILABLE_OPT = '-v'
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'frag_urls') SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'frag_urls')
@staticmethod
def supports_manifest(manifest):
UNSUPPORTED_FEATURES = [
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [1]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
]
check_results = (not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
return all(check_results)
def _make_cmd(self, tmpfilename, info_dict): def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c'] cmd = [self.exe, '-c']
dn = os.path.dirname(tmpfilename) dn = os.path.dirname(tmpfilename)
if 'url_list' not in info_dict:
if 'fragments' not in info_dict:
cmd += ['--out', os.path.basename(tmpfilename)] cmd += ['--out', os.path.basename(tmpfilename)]
verbose_level_args = ['--console-log-level=warn', '--summary-interval=0'] verbose_level_args = ['--console-log-level=warn', '--summary-interval=0']
cmd += self._configuration_args(['--file-allocation=none', '-x16', '-j16', '-s16'] + verbose_level_args) cmd += self._configuration_args(['--file-allocation=none', '-x16', '-j16', '-s16'] + verbose_level_args)
@@ -262,14 +269,14 @@ class Aria2cFD(ExternalFD):
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=') cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=') cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--auto-file-renaming=false'] cmd += ['--auto-file-renaming=false']
if 'url_list' in info_dict:
if 'fragments' in info_dict:
cmd += verbose_level_args cmd += verbose_level_args
cmd += ['--uri-selector', 'inorder', '--download-result=hide'] cmd += ['--uri-selector', 'inorder', '--download-result=hide']
url_list_file = '%s.frag.urls' % tmpfilename url_list_file = '%s.frag.urls' % tmpfilename
url_list = [] url_list = []
for [i, url] in enumerate(info_dict['url_list']):
tmpsegmentname = '%s_%s.frag' % (os.path.basename(tmpfilename), i)
url_list.append('%s\n\tout=%s' % (url, tmpsegmentname))
for i, fragment in enumerate(info_dict['fragments']):
tmpsegmentname = '%s-Frag%d' % (os.path.basename(tmpfilename), i)
url_list.append('%s\n\tout=%s' % (fragment['url'], tmpsegmentname))
stream, _ = sanitize_open(url_list_file, 'wb') stream, _ = sanitize_open(url_list_file, 'wb')
stream.write('\n'.join(url_list).encode('utf-8')) stream.write('\n'.join(url_list).encode('utf-8'))
stream.close() stream.close()
@@ -282,8 +289,8 @@ class Aria2cFD(ExternalFD):
class HttpieFD(ExternalFD): class HttpieFD(ExternalFD):
@classmethod @classmethod
def available(cls):
return check_executable('http', ['--version'])
def available(cls, path=None):
return check_executable(path or 'http', ['--version'])
def _make_cmd(self, tmpfilename, info_dict): def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']] cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
@@ -298,7 +305,7 @@ class FFmpegFD(ExternalFD):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms') SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
@classmethod @classmethod
def available(cls): def available(cls, path=None): # path is ignored for ffmpeg
return FFmpegPostProcessor().available return FFmpegPostProcessor().available
def _call_downloader(self, tmpfilename, info_dict): def _call_downloader(self, tmpfilename, info_dict):


@@ -7,6 +7,11 @@ try:
can_decrypt_frag = True can_decrypt_frag = True
except ImportError: except ImportError:
can_decrypt_frag = False can_decrypt_frag = False
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader from ..downloader import _get_real_downloader
from .fragment import FragmentFD from .fragment import FragmentFD
@@ -19,12 +24,17 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
parse_m3u8_attributes, parse_m3u8_attributes,
sanitize_open,
update_url_query, update_url_query,
) )
class HlsFD(FragmentFD): class HlsFD(FragmentFD):
""" A limited implementation that does not require ffmpeg """ """
Download segments in a m3u8 manifest. External downloaders can take over
the fragment downloads by supporting the 'frag_urls' protocol and
re-defining 'supports_manifest' function
"""
FD_NAME = 'hlsnative' FD_NAME = 'hlsnative'
@@ -53,12 +63,15 @@ class HlsFD(FragmentFD):
UNSUPPORTED_FEATURES += [ UNSUPPORTED_FEATURES += [
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1] r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
] ]
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
check_results.append(with_crypto or not is_aes128_enc)
check_results.append(not (is_aes128_enc and r'#EXT-X-BYTERANGE' in manifest))
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def check_results():
yield not info_dict.get('is_live')
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
yield with_crypto or not is_aes128_enc
yield not (is_aes128_enc and r'#EXT-X-BYTERANGE' in manifest)
for feature in UNSUPPORTED_FEATURES:
yield not re.search(feature, manifest)
return all(check_results())
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
man_url = info_dict['url'] man_url = info_dict['url']
@@ -84,6 +97,8 @@ class HlsFD(FragmentFD):
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
real_downloader = _get_real_downloader(info_dict, 'frag_urls', self.params, None) real_downloader = _get_real_downloader(info_dict, 'frag_urls', self.params, None)
if real_downloader and not real_downloader.supports_manifest(s):
real_downloader = None
def is_ad_fragment_start(s): def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
@@ -93,7 +108,7 @@ class HlsFD(FragmentFD):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment')) or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
fragment_urls = [] fragments = []
media_frags = 0 media_frags = 0
ad_frags = 0 ad_frags = 0
@@ -136,14 +151,12 @@ class HlsFD(FragmentFD):
i = 0 i = 0
media_sequence = 0 media_sequence = 0
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
key_list = []
byte_range = {} byte_range = {}
discontinuity_count = 0 discontinuity_count = 0
frag_index = 0 frag_index = 0
ad_frag_next = False ad_frag_next = False
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
download_frag = False
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
if format_index and discontinuity_count != format_index: if format_index and discontinuity_count != format_index:
@@ -160,10 +173,13 @@ class HlsFD(FragmentFD):
if extra_query: if extra_query:
frag_url = update_url_query(frag_url, extra_query) frag_url = update_url_query(frag_url, extra_query)
if real_downloader:
fragment_urls.append(frag_url)
continue
download_frag = True
fragments.append({
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'media_sequence': media_sequence,
})
elif line.startswith('#EXT-X-MAP'): elif line.startswith('#EXT-X-MAP'):
if format_index and discontinuity_count != format_index: if format_index and discontinuity_count != format_index:
@@ -180,9 +196,14 @@ class HlsFD(FragmentFD):
else compat_urlparse.urljoin(man_url, map_info.get('URI'))) else compat_urlparse.urljoin(man_url, map_info.get('URI')))
if extra_query: if extra_query:
frag_url = update_url_query(frag_url, extra_query) frag_url = update_url_query(frag_url, extra_query)
if real_downloader:
fragment_urls.append(frag_url)
continue
fragments.append({
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'media_sequence': media_sequence
})
if map_info.get('BYTERANGE'): if map_info.get('BYTERANGE'):
splitted_byte_range = map_info.get('BYTERANGE').split('@') splitted_byte_range = map_info.get('BYTERANGE').split('@')
@@ -191,7 +212,6 @@ class HlsFD(FragmentFD):
'start': sub_range_start, 'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]), 'end': sub_range_start + int(splitted_byte_range[0]),
} }
download_frag = True
elif line.startswith('#EXT-X-KEY'): elif line.startswith('#EXT-X-KEY'):
decrypt_url = decrypt_info.get('URI') decrypt_url = decrypt_info.get('URI')
@@ -206,9 +226,6 @@ class HlsFD(FragmentFD):
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query) decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
if decrypt_url != decrypt_info['URI']: if decrypt_url != decrypt_info['URI']:
decrypt_info['KEY'] = None decrypt_info['KEY'] = None
key_data = decrypt_info.copy()
key_data['INDEX'] = frag_index
key_list.append(key_data)
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'): elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:]) media_sequence = int(line[22:])
@@ -225,58 +242,16 @@ class HlsFD(FragmentFD):
ad_frag_next = False ad_frag_next = False
elif line.startswith('#EXT-X-DISCONTINUITY'): elif line.startswith('#EXT-X-DISCONTINUITY'):
discontinuity_count += 1 discontinuity_count += 1
i += 1
media_sequence += 1
if download_frag:
count = 0
headers = info_dict.get('http_headers', {})
# We only download the first fragment during the test
if test:
fragments = [fragments[0] if fragments else None]
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
ctx, frag_url, info_dict, headers)
if not success:
return False
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_index)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)
# We only download the first fragment during the test
if test:
break
i += 1
media_sequence += 1
if real_downloader: if real_downloader:
info_copy = info_dict.copy() info_copy = info_dict.copy()
info_copy['url_list'] = fragment_urls
info_copy['fragments'] = fragments
info_copy['key_list'] = key_list
fd = real_downloader(self.ydl, self.params) fd = real_downloader(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice # TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks: # for ph in self._progress_hooks:
@@ -285,5 +260,106 @@ class HlsFD(FragmentFD):
if not success: if not success:
return False return False
else: else:
def download_fragment(fragment):
frag_index = fragment['frag_index']
frag_url = fragment['url']
decrypt_info = fragment['decrypt_info']
byte_range = fragment['byte_range']
media_sequence = fragment['media_sequence']
ctx['fragment_index'] = frag_index
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
ctx, frag_url, info_dict, headers)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
return False, frag_index
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
return frag_content, frag_index
def append_fragment(frag_content, frag_index):
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
self._append_fragment(ctx, frag_content)
return True
except FileNotFoundError:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is only of one thread. This is a known issue')
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(download_fragment, fragment) for fragment in fragments]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be none to cancel
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
results = [future.result() for future in futures]
for frag_content, frag_index in results:
result = append_fragment(frag_content, frag_index)
if not result:
return False
else:
for fragment in fragments:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
return True return True


@@ -42,6 +42,7 @@ class ApplePodcastsIE(InfoExtractor):
ember_data = self._parse_json(self._search_regex( ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<', r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id) webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes'] episode = ember_data['data']['attributes']
description = episode.get('description') or {} description = episode.get('description') or {}


@@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty', 'uploader': 'Ben Prunty',
'timestamp': 1396508491, 'timestamp': 1396508491,
'upload_date': '20140403', 'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403', 'release_date': '20140403',
'duration': 260.877, 'duration': 260.877,
'track': 'Lanius (Battle)', 'track': 'Lanius (Battle)',
@@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon', 'uploader': 'Mastodon',
'timestamp': 1322005399, 'timestamp': 1322005399,
'upload_date': '20111122', 'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207', 'release_date': '20040207',
'duration': 120.79, 'duration': 120.79,
'track': 'Hail to Fire', 'track': 'Hail to Fire',
@@ -197,7 +199,7 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': artist, 'uploader': artist,
'timestamp': timestamp, 'timestamp': timestamp,
'release_date': unified_strdate(tralbum.get('album_release_date')),
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'duration': duration, 'duration': duration,
'track': track, 'track': track,
'track_number': track_number, 'track_number': track_number,


@@ -170,6 +170,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid'] cid = js['result']['cid']
headers = { headers = {
'Accept': 'application/json',
'Referer': url 'Referer': url
} }
headers.update(self.geo_verification_headers()) headers.update(self.geo_verification_headers())


@@ -27,10 +27,10 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE): class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'https://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': { 'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_', 'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'ext': 'mp4', 'ext': 'mp4',
@@ -52,16 +52,19 @@ class CBSIE(CBSBaseIE):
}, { }, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}] }]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517): def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):
items_data = self._download_xml( items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php', 'https://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': site, 'contentId': content_id}) content_id, query={'partner': site, 'contentId': content_id})
video_data = xpath_element(items_data, './/item') video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title') or xpath_text(video_data, 'videotitle', 'title') title = xpath_text(video_data, 'videoTitle', 'title') or xpath_text(video_data, 'videotitle', 'title')
tp_path = 'dJ5BDC/media/guid/%d/%s' % (mpx_acc, content_id) tp_path = 'dJ5BDC/media/guid/%d/%s' % (mpx_acc, content_id)
tp_release_url = 'http://link.theplatform.com/s/' + tp_path tp_release_url = 'https://link.theplatform.com/s/' + tp_path
asset_types = [] asset_types = []
subtitles = {} subtitles = {}


@@ -231,8 +231,9 @@ class InfoExtractor(object):
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
license: License name the video is licensed under. license: License name the video is licensed under.
creator: The creator of the video. creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released. release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video was uploaded
upload_date: Video upload date (YYYYMMDD). upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp. If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader. uploader_id: Nickname or id of the video uploader.
@@ -264,6 +265,7 @@ class InfoExtractor(object):
properties (all but one of text or html optional): properties (all but one of text or html optional):
* "author" - human-readable name of the comment author * "author" - human-readable name of the comment author
* "author_id" - user ID of the comment author * "author_id" - user ID of the comment author
* "author_thumbnail" - The thumbnail of the comment author
* "id" - Comment ID * "id" - Comment ID
* "html" - Comment as HTML * "html" - Comment as HTML
* "text" - Plain text of the comment * "text" - Plain text of the comment
@@ -271,6 +273,12 @@ class InfoExtractor(object):
* "parent" - ID of the comment this one is replying to. * "parent" - ID of the comment this one is replying to.
Set to "root" to indicate that this is a Set to "root" to indicate that this is a
comment to the original video. comment to the original video.
* "like_count" - Number of positive ratings of the comment
* "dislike_count" - Number of negative ratings of the comment
* "is_favorited" - Whether the comment is marked as
favorite by the video uploader
* "author_is_uploader" - Whether the comment is made by
the video uploader
age_limit: Age restriction for the video, as an integer (years) age_limit: Age restriction for the video, as an integer (years)
webpage_url: The URL to the video webpage, if given to yt-dlp it webpage_url: The URL to the video webpage, if given to yt-dlp it
should allow to get the same result again. (It will be set should allow to get the same result again. (It will be set
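As a rough sketch, a single comment entry using the newly documented properties might look like this (all values invented for illustration):

    comment = {
        'author': 'Example User',
        'author_id': 'user-id-123',            # made-up id
        'author_thumbnail': 'https://example.com/avatar.jpg',
        'id': 'comment-id-456',
        'text': 'Great video!',
        'parent': 'root',                      # top-level comment
        'like_count': 3,
        'dislike_count': 0,
        'is_favorited': False,
        'author_is_uploader': False,
    }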
@@ -1849,8 +1857,9 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None, def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None, quality=None, entry_protocol='m3u8', preference=None, quality=None,
m3u8_id=None, live=False, note=None, errnote=None,
fatal=True, data=None, headers={}, query={}):
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False, data=None, headers={},
query={}):
res = self._download_webpage_handle( res = self._download_webpage_handle(
m3u8_url, video_id, m3u8_url, video_id,
note=note or 'Downloading m3u8 information', note=note or 'Downloading m3u8 information',
@@ -2050,11 +2059,11 @@ class InfoExtractor(object):
playlist_formats = _extract_m3u8_playlist_formats(manifest_url, video_id=video_id, playlist_formats = _extract_m3u8_playlist_formats(manifest_url, video_id=video_id,
fatal=fatal, data=data, headers=headers) fatal=fatal, data=data, headers=headers)
for format in playlist_formats: for frmt in playlist_formats:
format_id = [] format_id = []
if m3u8_id: if m3u8_id:
format_id.append(m3u8_id) format_id.append(m3u8_id)
format_index = format.get('index') format_index = frmt.get('index')
stream_name = build_stream_name() stream_name = build_stream_name()
# Bandwidth of live streams may differ over time thus making # Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided # format_id unpredictable. So it's better to keep provided
@@ -2109,6 +2118,8 @@ class InfoExtractor(object):
# TODO: update acodec for audio only formats with # TODO: update acodec for audio only formats with
# the same GROUP-ID # the same GROUP-ID
f['acodec'] = 'none' f['acodec'] = 'none'
if not f.get('ext'):
f['ext'] = 'm4a' if f.get('vcodec') == 'none' else 'mp4'
formats.append(f) formats.append(f)
# for DailyMotion # for DailyMotion


@@ -450,10 +450,7 @@ from .gamestar import GameStarIE
from .gaskrank import GaskrankIE from .gaskrank import GaskrankIE
from .gazeta import GazetaIE from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE from .gdcvault import GDCVaultIE
from .gedi import (
GediIE,
GediEmbedsIE,
)
from .gedidigital import GediDigitalIE
from .generic import GenericIE from .generic import GenericIE
from .gfycat import GfycatIE from .gfycat import GfycatIE
from .giantbomb import GiantBombIE from .giantbomb import GiantBombIE
@@ -735,6 +732,8 @@ from .mtv import (
MTVServicesEmbeddedIE, MTVServicesEmbeddedIE,
MTVDEIE, MTVDEIE,
MTVJapanIE, MTVJapanIE,
MTVItaliaIE,
MTVItaliaProgrammaIE,
) )
from .muenchentv import MuenchenTVIE from .muenchentv import MuenchenTVIE
from .mwave import MwaveIE, MwaveMeetGreetIE from .mwave import MwaveIE, MwaveMeetGreetIE
@@ -954,6 +953,7 @@ from .plays import PlaysTVIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
from .playvid import PlayvidIE from .playvid import PlayvidIE
from .playwire import PlaywireIE from .playwire import PlaywireIE
from .plutotv import PlutoTVIE
from .pluralsight import ( from .pluralsight import (
PluralsightIE, PluralsightIE,
PluralsightCourseIE, PluralsightCourseIE,
@@ -1560,6 +1560,7 @@ from .weibo import (
WeiboMobileIE WeiboMobileIE
) )
from .weiqitv import WeiqiTVIE from .weiqitv import WeiqiTVIE
from .wimtv import WimTVIE
from .wistia import ( from .wistia import (
WistiaIE, WistiaIE,
WistiaPlaylistIE, WistiaPlaylistIE,
@@ -1669,5 +1670,6 @@ from .zdf import ZDFIE, ZDFChannelIE
from .zhihu import ZhihuIE from .zhihu import ZhihuIE
from .zingmp3 import ZingMp3IE from .zingmp3 import ZingMp3IE
from .zee5 import Zee5IE from .zee5 import Zee5IE
from .zee5 import Zee5SeriesIE
from .zoom import ZoomIE from .zoom import ZoomIE
from .zype import ZypeIE from .zype import ZypeIE


@@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
for f in formats: for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr')) wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh: if wh:


@@ -1,266 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
base_url,
url_basename,
urljoin,
)
class GediBaseIE(InfoExtractor):
@staticmethod
def _clean_audio_fmts(formats):
unique_formats = []
for f in formats:
if 'acodec' in f:
unique_formats.append(f)
formats[:] = unique_formats
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_data = re.findall(
r'PlayerFactory\.setParam\(\'(?P<type>.+?)\',\s*\'(?P<name>.+?)\',\s*\'(?P<val>.+?)\'\);',
webpage)
formats = []
audio_fmts = []
hls_fmts = []
http_fmts = []
title = ''
thumb = ''
fmt_reg = r'(?P<t>video|audio)-(?P<p>rrtv|hls)-(?P<h>[\w\d]+)(?:-(?P<br>[\w\d]+))?$'
br_reg = r'video-rrtv-(?P<br>\d+)-'
for t, n, v in player_data:
if t == 'format':
m = re.match(fmt_reg, n)
if m:
# audio formats
if m.group('t') == 'audio':
if m.group('p') == 'hls':
audio_fmts.extend(self._extract_m3u8_formats(
v, video_id, 'm4a', m3u8_id='hls', fatal=False))
elif m.group('p') == 'rrtv':
audio_fmts.append({
'format_id': 'mp3',
'url': v,
'tbr': 128,
'ext': 'mp3',
'vcodec': 'none',
'acodec': 'mp3',
})
# video formats
elif m.group('t') == 'video':
# hls manifest video
if m.group('p') == 'hls':
hls_fmts.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', m3u8_id='hls', fatal=False))
# direct mp4 video
elif m.group('p') == 'rrtv':
if not m.group('br'):
mm = re.search(br_reg, v)
http_fmts.append({
'format_id': 'https-' + m.group('h'),
'protocol': 'https',
'url': v,
'tbr': int(m.group('br')) if m.group('br') else
(int(mm.group('br')) if mm.group('br') else 0),
'height': int(m.group('h'))
})
elif t == 'param':
if n == 'videotitle':
title = v
if n == 'image_full_play':
thumb = v
title = self._og_search_title(webpage) if title == '' else title
# clean weird char
title = compat_str(title).encode('utf8', 'replace').replace(b'\xc3\x82', b'').decode('utf8', 'replace')
if audio_fmts:
self._clean_audio_fmts(audio_fmts)
self._sort_formats(audio_fmts)
if hls_fmts:
self._sort_formats(hls_fmts)
if http_fmts:
self._sort_formats(http_fmts)
formats.extend(audio_fmts)
formats.extend(hls_fmts)
formats.extend(http_fmts)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta('twitter:description', webpage),
'thumbnail': thumb,
'formats': formats,
}
class GediIE(GediBaseIE):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/(?!embed/).+?/(?P<id>[\d/]+)(?:\?|\&|$)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121559/121683',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'md5': 'e763b94b7920799a0e0e23ffefa2d157',
'info_dict': {
'id': '367415/367963',
'ext': 'mp4',
'title': 'Record della pista a Spa Francorchamps, la Pagani Huayra Roadster BC stupisce',
'description': 'md5:5deb503cefe734a3eb3f07ed74303920',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'md5': 'e48108e97b1af137d22a8469f2019057',
'info_dict': {
'id': '66184/66267',
'ext': 'mp4',
'title': 'Cassani e i brividi azzurri ai Mondiali di Imola: \\"Qui mi sono innamorato del ciclismo da ragazzino, incredibile tornarci da ct\\"',
'description': 'md5:fc9c50894f70a2469bb9b54d3d0a3d3b',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'md5': 'a6e39f3bdc1842bbd92abbbbef230817',
'info_dict': {
'id': '141059/142723',
'ext': 'mp4',
'title': 'Dentro la notizia - Ferrari, cosa succede a Maranello',
'description': 'md5:9907d65b53765681fa3a0b3122617c1f',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}]
class GediEmbedsIE(GediBaseIE):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/embed/.+?/(?P<id>[\d/]+)(?:\?|\&|$)'''
_TESTS = [{
'url': 'https://video.huffingtonpost.it/embed/politica/cotticelli-non-so-cosa-mi-sia-successo-sto-cercando-di-capire-se-ho-avuto-un-malore/29312/29276?responsive=true&el=video971040871621586700',
'md5': 'f4ac23cadfea7fef89bea536583fa7ed',
'info_dict': {
'id': '29312/29276',
'ext': 'mp4',
'title': 'Cotticelli: \\"Non so cosa mi sia successo. Sto cercando di capire se ho avuto un malore\\"',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'md5': '0391c2c83c6506581003aaf0255889c0',
'info_dict': {
'id': '14772/14870',
'ext': 'mp4',
'title': 'Festival EMERGENCY, Villa: «La buona informazione aiuta la salute» (14772-14870)',
'description': 'md5:2bce954d278248f3c950be355b7c2226',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}]
@staticmethod
def _sanitize_urls(urls):
# add protocol if missing
for i, e in enumerate(urls):
if e.startswith('//'):
urls[i] = 'https:%s' % e
# clean iframes urls
for i, e in enumerate(urls):
urls[i] = urljoin(base_url(e), url_basename(e))
return urls
@staticmethod
def _extract_urls(webpage):
entries = [
mobj.group('url')
for mobj in re.finditer(r'''(?x)
(?:
data-frame-src=|
<iframe[^\n]+src=
)
(["'])
(?P<url>https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/embed/.+?)
\1''', webpage)]
return GediEmbedsIE._sanitize_urls(entries)
@staticmethod
def _extract_url(webpage):
urls = GediEmbedsIE._extract_urls(webpage)
return urls[0] if urls else None


@@ -0,0 +1,210 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
base_url,
determine_ext,
int_or_none,
url_basename,
urljoin,
)
class GediDigitalIE(InfoExtractor):
_VALID_URL = r'''(?x)(?P<url>(?:https?:)//video\.
(?:
(?:
(?:espresso\.)?repubblica
|lastampa
|ilsecoloxix
|huffingtonpost
)|
(?:
iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)\.gelocal
)\.it(?:/[^/]+){2,4}/(?P<id>\d+))(?:$|[?&].*)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121683',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-full-.+?\.jpg$',
'duration': 125,
},
}, {
'url': 'https://video.huffingtonpost.it/embed/politica/cotticelli-non-so-cosa-mi-sia-successo-sto-cercando-di-capire-se-ho-avuto-un-malore/29312/29276?responsive=true&el=video971040871621586700',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'only_matching': True,
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'only_matching': True,
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'only_matching': True,
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'only_matching': True,
}, {
'url': 'https://video.messaggeroveneto.gelocal.it/locale/maria-giovanna-elmi-covid-vaccino/138155/139268',
'only_matching': True,
}, {
'url': 'https://video.ilpiccolo.gelocal.it/dossier/big-john/dinosauro-big-john-al-via-le-visite-guidate-a-trieste/135226/135751',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimantova.gelocal.it/locale/dal-ponte-visconteo-di-valeggio-l-and-8217sos-dei-ristoratori-aprire-anche-a-cena/137310/137818',
'only_matching': True,
}, {
'url': 'https://video.mattinopadova.gelocal.it/dossier/coronavirus-in-veneto/covid-a-vo-un-anno-dopo-un-cuore-tricolore-per-non-dimenticare/138402/138964',
'only_matching': True,
}, {
'url': 'https://video.laprovinciapavese.gelocal.it/locale/mede-zona-rossa-via-alle-vaccinazioni-per-gli-over-80/137545/138120',
'only_matching': True,
}, {
'url': 'https://video.tribunatreviso.gelocal.it/dossier/coronavirus-in-veneto/ecco-le-prima-vaccinazioni-di-massa-nella-marca/134485/135024',
'only_matching': True,
}, {
'url': 'https://video.nuovavenezia.gelocal.it/locale/camion-troppo-alto-per-il-ponte-ferroviario-perde-il-carico/135734/136266',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimodena.gelocal.it/locale/modena-scoperta-la-proteina-che-predice-il-livello-di-gravita-del-covid/139109/139796',
'only_matching': True,
}, {
'url': 'https://video.lanuovaferrara.gelocal.it/locale/due-bombole-di-gpl-aperte-e-abbandonate-i-vigili-bruciano-il-gas/134391/134957',
'only_matching': True,
}, {
'url': 'https://video.corrierealpi.gelocal.it/dossier/cortina-2021-i-mondiali-di-sci-alpino/mondiali-di-sci-il-timelapse-sulla-splendida-olympia/133760/134331',
'only_matching': True,
}, {
'url': 'https://video.lasentinella.gelocal.it/locale/vestigne-centra-un-auto-e-si-ribalta/138931/139466',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/tutti-i-video/01-ted-villa/14772',
'only_matching': True,
}]
@staticmethod
def _sanitize_urls(urls):
# add protocol if missing
for i, e in enumerate(urls):
if e.startswith('//'):
urls[i] = 'https:%s' % e
# clean iframes urls
for i, e in enumerate(urls):
urls[i] = urljoin(base_url(e), url_basename(e))
return urls
@staticmethod
def _extract_urls(webpage):
entries = [
mobj.group('eurl')
for mobj in re.finditer(r'''(?x)
(?:
data-frame-src=|
<iframe[^\n]+src=
)
(["'])(?P<eurl>%s)\1''' % GediDigitalIE._VALID_URL, webpage)]
return GediDigitalIE._sanitize_urls(entries)
@staticmethod
def _extract_url(webpage):
urls = GediDigitalIE._extract_urls(webpage)
return urls[0] if urls else None
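# Keep one entry per URL and drop formats that should carry audio but expose no audio codec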
@staticmethod
def _clean_formats(formats):
format_urls = set()
clean_formats = []
for f in formats:
if f['url'] not in format_urls:
if f.get('audio_ext') != 'none' and not f.get('acodec'):
continue
format_urls.add(f['url'])
clean_formats.append(f)
formats[:] = clean_formats
def _real_extract(self, url):
video_id = self._match_id(url)
url = re.match(self._VALID_URL, url).group('url')
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
['twitter:title', 'og:title'], webpage, fatal=True)
player_data = re.findall(
r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
webpage)
formats = []
duration = thumb = None
for t, n, v in player_data:
if t == 'format':
if n in ('video-hds-vod-ec', 'video-hls-vod-ec', 'video-viralize', 'video-youtube-pfp'):
continue
elif n.endswith('-vod-ak'):
formats.extend(self._extract_akamai_formats(
v, video_id, {'http': 'media.gedidigital.it'}))
else:
ext = determine_ext(v)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', 'm3u8_native', m3u8_id=n, fatal=False))
continue
f = {
'format_id': n,
'url': v,
}
if ext == 'mp3':
abr = int_or_none(self._search_regex(
r'-mp3-audio-(\d+)', v, 'abr', default=None))
f.update({
'abr': abr,
'tbr': abr,
'acodec': ext,
'vcodec': 'none'
})
else:
mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', n)
if mobj:
f.update({
'height': int(mobj.group(1)),
'vbr': int_or_none(mobj.group(2)),
})
if not f.get('vbr'):
f['vbr'] = int_or_none(self._search_regex(
r'-video-rrtv-(\d+)', v, 'abr', default=None))
formats.append(f)
elif t == 'param':
if n in ['image_full', 'image']:
thumb = v
elif n == 'videoDuration':
duration = int_or_none(v)
self._clean_formats(formats)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
['twitter:description', 'og:description', 'description'], webpage),
'thumbnail': thumb or self._og_search_thumbnail(webpage),
'formats': formats,
'duration': duration,
}

yt_dlp/extractor/generic.py

@@ -127,13 +127,14 @@ from .expressen import ExpressenIE
from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE
from .kinja import KinjaEmbedIE
-from .gedi import GediEmbedsIE
+from .gedidigital import GediDigitalIE
from .rcs import RCSEmbedsIE
from .bitchute import BitChuteIE
from .rumble import RumbleEmbedIE
from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE
from .simplecast import SimplecastIE
+from .wimtv import WimTVIE
class GenericIE(InfoExtractor):
@@ -2250,6 +2251,15 @@ class GenericIE(InfoExtractor):
},
'playlist_mincount': 52,
},
{
# WimTv embed player
'url': 'http://www.msmotor.tv/wearefmi-pt-2-2021/',
'info_dict': {
'id': 'wearefmi-pt-2-2021',
'title': '#WEAREFMI PT.2 2021 MsMotorTV',
},
'playlist_count': 1,
},
]
def report_following_redirect(self, new_url):
@@ -3339,17 +3349,22 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
zype_urls, video_id, video_title, ie=ZypeIE.ie_key())
-# Look for RCS media group embeds
-gedi_urls = GediEmbedsIE._extract_urls(webpage)
+gedi_urls = GediDigitalIE._extract_urls(webpage)
if gedi_urls:
return self.playlist_from_matches(
-gedi_urls, video_id, video_title, ie=GediEmbedsIE.ie_key())
+gedi_urls, video_id, video_title, ie=GediDigitalIE.ie_key())
+# Look for RCS media group embeds
rcs_urls = RCSEmbedsIE._extract_urls(webpage)
if rcs_urls:
return self.playlist_from_matches(
rcs_urls, video_id, video_title, ie=RCSEmbedsIE.ie_key())
wimtv_urls = WimTVIE._extract_urls(webpage)
if wimtv_urls:
return self.playlist_from_matches(
wimtv_urls, video_id, video_title, ie=WimTVIE.ie_key())
bitchute_urls = BitChuteIE._extract_urls(webpage)
if bitchute_urls:
return self.playlist_from_matches(

yt_dlp/extractor/lbry.py

@@ -6,8 +6,10 @@ import json
from .common import InfoExtractor
from ..compat import (
+compat_parse_qs,
compat_str,
compat_urllib_parse_unquote,
+compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
@@ -21,9 +23,9 @@ from ..utils import (
class LBRYBaseIE(InfoExtractor):
-_BASE_URL_REGEX = r'https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/'
+_BASE_URL_REGEX = r'(?:https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/|lbry://)'
_CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
-_OPT_CLAIM_ID = '[^:/?#&]+(?::%s)?' % _CLAIM_ID_REGEX
+_OPT_CLAIM_ID = '[^:/?#&]+(?:[:#]%s)?' % _CLAIM_ID_REGEX
_SUPPORTED_STREAM_TYPES = ['video', 'audio']
def _call_api_proxy(self, method, display_id, params, resource):
@@ -41,7 +43,9 @@ class LBRYBaseIE(InfoExtractor):
'resolve', display_id, {'urls': url}, resource)[url]
def _permanent_url(self, url, claim_name, claim_id):
-return urljoin(url, '/%s:%s' % (claim_name, claim_id))
+return urljoin(
+url.replace('lbry://', 'https://lbry.tv/'),
+'/%s:%s' % (claim_name, claim_id))
def _parse_stream(self, stream, url):
stream_value = stream.get('value') or {}
@@ -60,6 +64,7 @@ class LBRYBaseIE(InfoExtractor):
'description': stream_value.get('description'),
'license': stream_value.get('license'),
'timestamp': int_or_none(stream.get('timestamp')),
+'release_timestamp': int_or_none(stream_value.get('release_time')),
'tags': stream_value.get('tags'),
'duration': int_or_none(media.get('duration')),
'channel': try_get(signing_channel, lambda x: x['value']['title']),
@@ -92,6 +97,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
'timestamp': 1595694354,
'upload_date': '20200725',
+'release_timestamp': 1595340697,
+'release_date': '20200721',
'width': 1280,
'height': 720,
}
@@ -106,6 +113,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:661ac4f1db09f31728931d7b88807a61',
'timestamp': 1591312601,
'upload_date': '20200604',
+'release_timestamp': 1591312421,
+'release_date': '20200604',
'tags': list,
'duration': 2570,
'channel': 'The LBRY Foundation',
@@ -137,6 +146,9 @@ class LBRYIE(LBRYBaseIE):
}, {
'url': 'https://lbry.tv/@lacajadepandora:a/TRUMP-EST%C3%81-BIEN-PUESTO-con-Pilar-Baselga,-Carlos-Senra,-Luis-Palacios-(720p_30fps_H264-192kbit_AAC):1',
'only_matching': True,
+}, {
+'url': 'lbry://@lbry#3f/odysee#7',
+'only_matching': True,
}]
def _real_extract(self, url):
@@ -166,7 +178,7 @@ class LBRYIE(LBRYBaseIE):
class LBRYChannelIE(LBRYBaseIE):
IE_NAME = 'lbry:channel'
-_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>@%s)/?(?:[?#&]|$)' % LBRYBaseIE._OPT_CLAIM_ID
+_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>@%s)/?(?:[?&]|$)' % LBRYBaseIE._OPT_CLAIM_ID
_TESTS = [{
'url': 'https://lbry.tv/@LBRYFoundation:0',
'info_dict': {
@@ -178,20 +190,24 @@ class LBRYChannelIE(LBRYBaseIE):
}, {
'url': 'https://lbry.tv/@LBRYFoundation',
'only_matching': True,
+}, {
+'url': 'lbry://@lbry#3f',
+'only_matching': True,
}]
_PAGE_SIZE = 50
-def _fetch_page(self, claim_id, url, page):
+def _fetch_page(self, claim_id, url, params, page):
page += 1
+page_params = {
+'channel_ids': [claim_id],
+'claim_type': 'stream',
+'no_totals': True,
+'page': page,
+'page_size': self._PAGE_SIZE,
+}
+page_params.update(params)
result = self._call_api_proxy(
-'claim_search', claim_id, {
-'channel_ids': [claim_id],
-'claim_type': 'stream',
-'no_totals': True,
-'page': page,
-'page_size': self._PAGE_SIZE,
-'stream_types': self._SUPPORTED_STREAM_TYPES,
-}, 'page %d' % page)
+'claim_search', claim_id, page_params, 'page %d' % page)
for item in (result.get('items') or []):
stream_claim_name = item.get('name')
stream_claim_id = item.get('claim_id')
@@ -212,8 +228,31 @@ class LBRYChannelIE(LBRYBaseIE):
result = self._resolve_url(
'lbry://' + display_id, display_id, 'channel')
claim_id = result['claim_id']
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
content = qs.get('content', [None])[0]
params = {
'fee_amount': qs.get('fee_amount', ['>=0'])[0],
'order_by': {
'new': ['release_time'],
'top': ['effective_amount'],
'trending': ['trending_group', 'trending_mixed'],
}[qs.get('order', ['new'])[0]],
'stream_types': [content] if content in ['audio', 'video'] else self._SUPPORTED_STREAM_TYPES,
}
duration = qs.get('duration', [None])[0]
if duration:
params['duration'] = {
'long': '>=1200',
'short': '<=240',
}[duration]
language = qs.get('language', ['all'])[0]
if language != 'all':
languages = [language]
if language == 'en':
languages.append('none')
params['any_languages'] = languages
entries = OnDemandPagedList(
-functools.partial(self._fetch_page, claim_id, url),
+functools.partial(self._fetch_page, claim_id, url, params),
self._PAGE_SIZE)
result_value = result.get('value') or {}
return self.playlist_result(

yt_dlp/extractor/mtv.py

@@ -14,6 +14,7 @@ from ..utils import (
fix_xml_ampersands,
float_or_none,
HEADRequest,
+int_or_none,
RegexNotFoundError,
sanitized_Request,
strip_or_none,
@@ -176,6 +177,22 @@ class MTVServicesInfoExtractor(InfoExtractor):
raise ExtractorError('Could not find video title')
title = title.strip()
series = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:franchise')
season = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:seasonN')
episode = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:episodeN')
series = series.text if series is not None else None
season = season.text if season is not None else None
episode = episode.text if episode is not None else None
if season and episode:
# episode number includes season, so remove it
episode = re.sub(r'^%s' % season, '', episode)
# This a short id that's used in the webpage urls
mtvn_id = None
mtvn_id_node = find_xpath_attr(itemdoc, './/{http://search.yahoo.com/mrss/}category',
@@ -201,6 +218,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
'description': description,
'duration': float_or_none(content_el.attrib.get('duration')),
'timestamp': timestamp,
+'series': series,
+'season_number': int_or_none(season),
+'episode_number': int_or_none(episode),
}
def _get_feed_query(self, uri):
@@ -483,3 +503,152 @@ class MTVDEIE(MTVServicesInfoExtractor):
'arcEp': 'mtv.de',
'mgid': uri,
}
class MTVItaliaIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv.it'
_VALID_URL = r'https?://(?:www\.)?mtv\.it/(?:episodi|video|musica)/(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.mtv.it/episodi/24bqab/mario-una-serie-di-maccio-capatonda-cavoli-amario-episodio-completo-S1-E1',
'info_dict': {
'id': '0f0fc78e-45fc-4cce-8f24-971c25477530',
'ext': 'mp4',
'title': 'Cavoli amario (episodio completo)',
'description': 'md5:4962bccea8fed5b7c03b295ae1340660',
'series': 'Mario - Una Serie Di Maccio Capatonda',
'season_number': 1,
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}]
_GEO_COUNTRIES = ['IT']
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
def _get_feed_query(self, uri):
return {
'arcEp': 'mtv.it',
'mgid': uri,
}
class MTVItaliaProgrammaIE(MTVItaliaIE):
IE_NAME = 'mtv.it:programma'
_VALID_URL = r'https?://(?:www\.)?mtv\.it/(?:programmi|playlist)/(?P<id>[0-9a-z]+)'
_TESTS = [{
# program page: general
'url': 'http://www.mtv.it/programmi/s2rppv/mario-una-serie-di-maccio-capatonda',
'info_dict': {
'id': 'a6f155bc-8220-4640-aa43-9b95f64ffa3d',
'title': 'Mario - Una Serie Di Maccio Capatonda',
'description': 'md5:72fbffe1f77ccf4e90757dd4e3216153',
},
'playlist_count': 2,
'params': {
'skip_download': True,
},
}, {
# program page: specific season
'url': 'http://www.mtv.it/programmi/d9ncjf/mario-una-serie-di-maccio-capatonda-S2',
'info_dict': {
'id': '4deeb5d8-f272-490c-bde2-ff8d261c6dd1',
'title': 'Mario - Una Serie Di Maccio Capatonda - Stagione 2',
},
'playlist_count': 34,
'params': {
'skip_download': True,
},
}, {
# playlist page + redirect
'url': 'http://www.mtv.it/playlist/sexy-videos/ilctal',
'info_dict': {
'id': 'dee8f9ee-756d-493b-bf37-16d1d2783359',
'title': 'Sexy Videos',
},
'playlist_mincount': 145,
'params': {
'skip_download': True,
},
}]
_GEO_COUNTRIES = ['IT']
_FEED_URL = 'http://www.mtv.it/feeds/triforce/manifest/v8'
def _get_entries(self, title, url):
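# Follow the feed's nextPageURL links, yielding the canonical URL of every item/season on each page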
while True:
pg = self._search_regex(r'/(\d+)$', url, 'entries', '1')
entries = self._download_json(url, title, 'page %s' % pg)
url = try_get(
entries, lambda x: x['result']['nextPageURL'], compat_str)
entries = try_get(
entries, (
lambda x: x['result']['data']['items'],
lambda x: x['result']['data']['seasons']),
list)
for entry in entries or []:
if entry.get('canonicalURL'):
yield self.url_result(entry['canonicalURL'])
if not url:
break
def _real_extract(self, url):
query = {'url': url}
info_url = update_url_query(self._FEED_URL, query)
video_id = self._match_id(url)
info = self._download_json(info_url, video_id).get('manifest')
redirect = try_get(
info, lambda x: x['newLocation']['url'], compat_str)
if redirect:
return self.url_result(redirect)
title = info.get('title')
video_id = try_get(
info, lambda x: x['reporting']['itemId'], compat_str)
parent_id = try_get(
info, lambda x: x['reporting']['parentId'], compat_str)
playlist_url = current_url = None
for z in (info.get('zones') or {}).values():
if z.get('moduleName') in ('INTL_M304', 'INTL_M209'):
info_url = z.get('feed')
if z.get('moduleName') in ('INTL_M308', 'INTL_M317'):
playlist_url = playlist_url or z.get('feed')
if z.get('moduleName') in ('INTL_M300',):
current_url = current_url or z.get('feed')
if not info_url:
raise ExtractorError('No info found')
if video_id == parent_id:
video_id = self._search_regex(
r'([^\/]+)/[^\/]+$', info_url, 'video_id')
info = self._download_json(info_url, video_id, 'Show infos')
info = try_get(info, lambda x: x['result']['data'], dict)
title = title or try_get(
info, (
lambda x: x['title'],
lambda x: x['headline']),
compat_str)
description = try_get(info, lambda x: x['content'], compat_str)
if current_url:
season = try_get(
self._download_json(playlist_url, video_id, 'Seasons info'),
lambda x: x['result']['data'], dict)
current = try_get(
season, lambda x: x['currentSeason'], compat_str)
seasons = try_get(
season, lambda x: x['seasons'], list) or []
if current in [s.get('eTitle') for s in seasons]:
playlist_url = current_url
title = re.sub(
r'[-|]\s*(?:mtv\s*italia|programma|playlist)',
'', title, flags=re.IGNORECASE).strip()
return self.playlist_result(
self._get_entries(title, playlist_url),
video_id, title, description)

yt_dlp/extractor/mxplayer.py

@@ -6,98 +6,122 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
js_to_json,
+qualities,
+try_get,
url_or_none,
urljoin,
)
-VALID_STREAMS = ('dash', )
class MxplayerIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?mxplayer\.in/movie/(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)'
+_VALID_URL = r'https?://(?:www\.)?mxplayer\.in/(?:show|movie)/(?:(?P<display_id>[-/a-z0-9]+)-)?(?P<id>[a-z0-9]+)'
-_TEST = {
+_TESTS = [{
'url': 'https://www.mxplayer.in/movie/watch-knock-knock-hindi-dubbed-movie-online-b9fa28df3bfb8758874735bbd7d2655a?watch=true',
'info_dict': {
'id': 'b9fa28df3bfb8758874735bbd7d2655a',
'ext': 'mp4',
-'title': 'Knock Knock Movie | Watch 2015 Knock Knock Full Movie Online- MX Player',
+'title': 'Knock Knock (Hindi Dubbed)',
'description': 'md5:b195ba93ff1987309cfa58e2839d2a5b'
},
'params': {
'skip_download': True,
'format': 'bestvideo'
}
-}
+}, {
'url': 'https://www.mxplayer.in/show/watch-shaitaan/season-1/the-infamous-taxi-gang-of-meerut-online-45055d5bcff169ad48f2ad7552a83d6c',
-def _get_best_stream_url(self, stream):
-best_stream = list(filter(None, [v for k, v in stream.items()]))
-return best_stream.pop(0) if len(best_stream) else None
+'info_dict': {
+'id': '45055d5bcff169ad48f2ad7552a83d6c',
+'ext': 'm3u8',
+'title': 'The infamous taxi gang of Meerut',
+'description': 'md5:033a0a7e3fd147be4fb7e07a01a3dc28',
+'season': 'Season 1',
+'series': 'Shaitaan'
+},
+'params': {
+'skip_download': True,
+}
+}, {
+'url': 'https://www.mxplayer.in/show/watch-aashram/chapter-1/duh-swapna-online-d445579792b0135598ba1bc9088a84cb',
+'info_dict': {
+'id': 'd445579792b0135598ba1bc9088a84cb',
+'ext': 'mp4',
+'title': 'Duh Swapna',
+'description': 'md5:35ff39c4bdac403c53be1e16a04192d8',
+'season': 'Chapter 1',
+'series': 'Aashram'
+},
+'expected_warnings': ['Unknown MIME type application/mp4 in DASH manifest'],
+'params': {
+'skip_download': True,
+'format': 'bestvideo'
+}
+}]
def _get_stream_urls(self, video_dict):
-stream_dict = video_dict.get('stream', {'provider': {}})
-stream_provider = stream_dict.get('provider')
-if not stream_dict[stream_provider]:
-message = 'No stream provider found'
-raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
-streams = []
-for stream_name, v in stream_dict[stream_provider].items():
-if stream_name in VALID_STREAMS:
-stream_url = self._get_best_stream_url(v)
-if stream_url is None:
-continue
-streams.append((stream_name, stream_url))
-return streams
+stream_provider_dict = try_get(
+video_dict,
+lambda x: x['stream'][x['stream']['provider']])
+if not stream_provider_dict:
+raise ExtractorError('No stream provider found', expected=True)
+for stream_name, stream in stream_provider_dict.items():
+if stream_name in ('hls', 'dash', 'hlsUrl', 'dashUrl'):
+stream_type = stream_name.replace('Url', '')
+if isinstance(stream, dict):
+for quality, stream_url in stream.items():
+if stream_url:
+yield stream_type, quality, stream_url
+else:
+yield stream_type, 'base', stream
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-video_slug = mobj.group('slug')
-video_id = video_slug.split('-')[-1]
+display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
-window_state_json = self._html_search_regex(
-r'(?s)<script>window\.state\s*[:=]\s(\{.+\})\n(\w+).*(</script>).*',
-webpage, 'WindowState')
-source = self._parse_json(js_to_json(window_state_json), video_id)
+source = self._parse_json(
+js_to_json(self._html_search_regex(
+r'(?s)<script>window\.state\s*[:=]\s(\{.+\})\n(\w+).*(</script>).*',
+webpage, 'WindowState')),
+video_id)
if not source:
raise ExtractorError('Cannot find source', expected=True)
config_dict = source['config']
video_dict = source['entities'][video_id]
-stream_urls = self._get_stream_urls(video_dict)
-title = self._og_search_title(webpage, fatal=True, default=video_dict['title'])
+thumbnails = []
+for i in video_dict.get('imageInfo') or []:
+thumbnails.append({
+'url': urljoin(config_dict['imageBaseUrl'], i['url']),
+'width': i['width'],
+'height': i['height'],
+})
formats = []
-headers = {'Referer': url}
-for stream_name, stream_url in stream_urls:
-if stream_name == 'dash':
-format_url = url_or_none(urljoin(config_dict['videoCdnBaseUrl'], stream_url))
-if not format_url:
-continue
-formats.extend(self._extract_mpd_formats(
-format_url, video_id, mpd_id='dash', headers=headers))
+get_quality = qualities(['main', 'base', 'high'])
+for stream_type, quality, stream_url in self._get_stream_urls(video_dict):
+format_url = url_or_none(urljoin(config_dict['videoCdnBaseUrl'], stream_url))
+if not format_url:
+continue
+if stream_type == 'dash':
+dash_formats = self._extract_mpd_formats(
+format_url, video_id, mpd_id='dash-%s' % quality, headers={'Referer': url})
+for frmt in dash_formats:
+frmt['quality'] = get_quality(quality)
+formats.extend(dash_formats)
+elif stream_type == 'hls':
+formats.extend(self._extract_m3u8_formats(
+format_url, video_id, fatal=False,
+m3u8_id='hls-%s' % quality, quality=get_quality(quality)))
self._sort_formats(formats)
-info = {
+return {
'id': video_id,
-'title': title,
+'display_id': display_id.replace('/', '-'),
+'title': video_dict['title'] or self._og_search_title(webpage),
+'formats': formats,
'description': video_dict.get('description'),
-'formats': formats
+'season': try_get(video_dict, lambda x: x['container']['title']),
+'series': try_get(video_dict, lambda x: x['container']['container']['title']),
+'thumbnails': thumbnails,
}
-if video_dict.get('imageInfo'):
-info['thumbnails'] = list(map(lambda i: dict(i, **{
-'url': urljoin(config_dict['imageBaseUrl'], i['url'])
-}), video_dict['imageInfo']))
-if video_dict.get('webUrl'):
-last_part = video_dict['webUrl'].split("/")[-1]
-info['display_id'] = last_part.replace(video_id, "").rstrip("-")
-return info

yt_dlp/extractor/peertube.py

@@ -599,11 +599,13 @@ class PeerTubeIE(InfoExtractor):
else:
age_limit = None
+webpage_url = 'https://%s/videos/watch/%s' % (host, video_id)
return {
'id': video_id,
'title': title,
'description': description,
-'thumbnail': urljoin(url, video.get('thumbnailPath')),
+'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName', compat_str),
'uploader_id': str_or_none(account_data('id', int)),
@@ -621,5 +623,6 @@ class PeerTubeIE(InfoExtractor):
'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories,
'formats': formats,
-'subtitles': subtitles
+'subtitles': subtitles,
+'webpage_url': webpage_url,
}

yt_dlp/extractor/pinterest.py

@@ -31,6 +31,7 @@ class PinterestBaseIE(InfoExtractor):
title = (data.get('title') or data.get('grid_title') or video_id).strip()
+urls = []
formats = []
duration = None
if extract_formats:
@@ -38,8 +39,9 @@ class PinterestBaseIE(InfoExtractor):
if not isinstance(format_dict, dict):
continue
format_url = url_or_none(format_dict.get('url'))
-if not format_url:
+if not format_url or format_url in urls:
continue
+urls.append(format_url)
duration = float_or_none(format_dict.get('duration'), scale=1000)
ext = determine_ext(format_url)
if 'hls' in format_id.lower() or ext == 'm3u8':

yt_dlp/extractor/plutotv.py

@@ -0,0 +1,164 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import uuid
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
try_get,
url_or_none,
)
class PlutoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pluto\.tv/on-demand/(?P<video_type>movies|series)/(?P<slug>.*)/?$'
_INFO_URL = 'https://service-vod.clusters.pluto.tv/v3/vod/slugs/'
_INFO_QUERY_PARAMS = {
'appName': 'web',
'appVersion': 'na',
'clientID': compat_str(uuid.uuid1()),
'clientModelNumber': 'na',
'serverSideAds': 'false',
'deviceMake': 'unknown',
'deviceModel': 'web',
'deviceType': 'web',
'deviceVersion': 'unknown',
'sid': compat_str(uuid.uuid1()),
}
_TESTS = [
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/season/2/episode/its-in-the-cards-2009-2-3',
'md5': 'ebcdd8ed89aaace9df37924f722fd9bd',
'info_dict': {
'id': '5de6c598e9379ae4912df0a8',
'ext': 'mp4',
'title': 'It\'s In The Cards',
'episode': 'It\'s In The Cards',
'description': 'The teams face off against each other in a 3-on-2 soccer showdown. Strategy comes into play, though, as each team gets to select their opposing teams two defenders.',
'series': 'I Love Money',
'season_number': 2,
'episode_number': 3,
'duration': 3600,
}
},
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/season/1/',
'playlist_count': 11,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money - Season 1',
}
},
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/',
'playlist_count': 26,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money',
}
},
{
'url': 'https://pluto.tv/on-demand/movies/arrival-2015-1-1',
'md5': '3cead001d317a018bf856a896dee1762',
'info_dict': {
'id': '5e83ac701fa6a9001bb9df24',
'ext': 'mp4',
'title': 'Arrival',
'description': 'When mysterious spacecraft touch down across the globe, an elite team - led by expert translator Louise Banks (Academy Award® nominee Amy Adams) races against time to decipher their intent.',
'duration': 9000,
}
},
]
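# Pluto serves ad-stitched HLS; rebuilding the '0-end/master.m3u8' URL from the first segment of each playlist yields ad-free formats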
def _to_ad_free_formats(self, video_id, formats):
ad_free_formats = []
m3u8_urls = set()
for format in formats:
res = self._download_webpage(
format.get('url'), video_id, note='Downloading m3u8 playlist',
fatal=False)
if not res:
continue
first_segment_url = re.search(
r'^(https?://.*/)0\-(end|[0-9]+)/[^/]+\.ts$', res,
re.MULTILINE)
if not first_segment_url:
continue
m3u8_urls.add(
compat_urlparse.urljoin(first_segment_url.group(1), '0-end/master.m3u8'))
for m3u8_url in m3u8_urls:
ad_free_formats.extend(
self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(ad_free_formats)
return ad_free_formats
def _get_video_info(self, video_json, slug, series_name=None):
video_id = video_json.get('_id', slug)
formats = []
for video_url in try_get(video_json, lambda x: x['stitched']['urls'], list) or []:
if video_url.get('type') != 'hls':
continue
url = url_or_none(video_url.get('url'))
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
info = {
'id': video_id,
'formats': self._to_ad_free_formats(video_id, formats),
'title': video_json.get('name'),
'description': video_json.get('description'),
'duration': float_or_none(video_json.get('duration'), scale=1000),
}
if series_name:
info.update({
'series': series_name,
'episode': video_json.get('name'),
'season_number': int_or_none(video_json.get('season')),
'episode_number': int_or_none(video_json.get('number')),
})
return info
def _real_extract(self, url):
path = compat_urlparse.urlparse(url).path
path_components = path.split('/')
video_type = path_components[2]
info_slug = path_components[3]
video_json = self._download_json(self._INFO_URL + info_slug, info_slug,
query=self._INFO_QUERY_PARAMS)
if video_type == 'series':
series_name = video_json.get('name', info_slug)
season_number = int_or_none(try_get(path_components, lambda x: x[5]))
episode_slug = try_get(path_components, lambda x: x[7])
videos = []
for season in video_json['seasons']:
if season_number is not None and season_number != int_or_none(season.get('number')):
continue
for episode in season['episodes']:
if episode_slug is not None and episode_slug != episode.get('slug'):
continue
videos.append(self._get_video_info(episode, episode_slug, series_name))
if not videos:
raise ExtractorError('Failed to find any videos to extract')
if episode_slug is not None and len(videos) == 1:
return videos[0]
playlist_title = series_name
if season_number is not None:
playlist_title += ' - Season %d' % season_number
return self.playlist_result(videos,
playlist_id=video_json.get('_id', info_slug),
playlist_title=playlist_title)
return self._get_video_info(video_json, info_slug)

yt_dlp/extractor/pornhub.py

@@ -167,6 +167,7 @@ class PornHubIE(PornHubBaseIE):
'params': {
'skip_download': True,
},
+'skip': 'Video has been flagged for verification in accordance with our trust and safety policy',
}, {
# subtitles
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7',
@@ -265,7 +266,8 @@ class PornHubIE(PornHubBaseIE):
webpage = dl_webpage('pc')
error_msg = self._html_search_regex(
-r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
+(r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
+r'(?s)<section[^>]+class=["\']noVideo["\'][^>]*>(?P<error>.+?)</section>'),
webpage, 'error message', default=None, group='error')
if error_msg:
error_msg = re.sub(r'\s+', ' ', error_msg)
@@ -394,6 +396,21 @@ class PornHubIE(PornHubBaseIE):
upload_date = None
formats = []
def add_format(format_url, height=None):
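# Height and bitrate are usually encoded in the URL itself (e.g. '720P_4000K'); parse them when present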
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
if mobj:
if not height:
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
formats.append({
'url': format_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
for video_url, height in video_urls:
if not upload_date:
upload_date = self._search_regex(
@@ -410,18 +427,19 @@ class PornHubIE(PornHubBaseIE):
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
-tbr = None
-mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
-if mobj:
-if not height:
-height = int(mobj.group('height'))
-tbr = int(mobj.group('tbr'))
-formats.append({
-'url': video_url,
-'format_id': '%dp' % height if height else None,
-'height': height,
-'tbr': tbr,
-})
+if '/video/get_media' in video_url:
+medias = self._download_json(video_url, video_id, fatal=False)
+if isinstance(medias, list):
+for media in medias:
+if not isinstance(media, dict):
+continue
+video_url = url_or_none(media.get('videoUrl'))
+if not video_url:
+continue
+height = int_or_none(media.get('quality'))
+add_format(video_url, height)
+continue
+add_format(video_url)
self._sort_formats(formats)
video_uploader = self._html_search_regex(

yt_dlp/extractor/raiplay.py

@@ -158,6 +158,10 @@ class RaiPlayIE(RaiBaseIE):
# subtitles at 'subtitlesArray' key (see #27698)
'url': 'https://www.raiplay.it/video/2020/12/Report---04-01-2021-2e90f1de-8eee-4de4-ac0e-78d21db5b600.html',
'only_matching': True,
+}, {
+# DRM protected
+'url': 'https://www.raiplay.it/video/2020/09/Lo-straordinario-mondo-di-Zoey-S1E1-Lo-straordinario-potere-di-Zoey-ed493918-1d32-44b7-8454-862e473d00ff.html',
+'only_matching': True,
}]
def _real_extract(self, url):
@@ -166,6 +170,14 @@ class RaiPlayIE(RaiBaseIE):
media = self._download_json(
base + '.json', video_id, 'Downloading video JSON')
if not self._downloader.params.get('allow_unplayable_formats'):
if try_get(
media,
(lambda x: x['rights_management']['rights']['drm'],
lambda x: x['program_info']['rights_management']['rights']['drm']),
dict):
raise ExtractorError('This video is DRM protected.', expected=True)
title = media['name']
video = media['video']

yt_dlp/extractor/rtve.py

@@ -2,8 +2,9 @@
from __future__ import unicode_literals
import base64
+import io
import re
-import time
+import sys
from .common import InfoExtractor
from ..compat import (
@@ -14,56 +15,13 @@ from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
+qualities,
remove_end,
remove_start,
-sanitized_Request,
std_headers,
)
+_bytes_to_chr = (lambda x: x) if sys.version_info[0] == 2 else (lambda x: map(chr, x))
def _decrypt_url(png):
encrypted_data = compat_b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8 + length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:]
if url_data[0] == 'H' and url_data[3] == '%':
# remove useless HQ%% at the start
url_data = url_data[4:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
class RTVEALaCartaIE(InfoExtractor):
@@ -79,28 +37,31 @@ class RTVEALaCartaIE(InfoExtractor):
'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
'duration': 5024.566,
+'series': 'Balonmano',
},
+'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, {
'note': 'Live stream',
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
'info_dict': {
'id': '1694255',
-'ext': 'flv',
-'title': 'TODO',
+'ext': 'mp4',
+'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+'is_live': True,
+},
+'params': {
+'skip_download': 'live stream',
},
-'skip': 'The f4m manifest can\'t be used yet',
}, {
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
-'md5': 'e55e162379ad587e9640eda4f7353c0f',
+'md5': 'd850f3c8731ea53952ebab489cf81cbf',
'info_dict': {
'id': '4236788',
'ext': 'mp4',
-'title': 'Servir y proteger - Capítulo 104 ',
+'title': 'Servir y proteger - Capítulo 104',
'duration': 3222.0,
},
-'params': {
-'skip_download': True, # requires ffmpeg
-},
+'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True,
@@ -111,58 +72,102 @@ class RTVEALaCartaIE(InfoExtractor):
def _real_initialize(self):
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
-manager_info = self._download_json(
+self._manager = self._download_json(
'http://www.rtve.es/odin/loki/' + user_agent_b64,
-None, 'Fetching manager info')
-self._manager = manager_info['manager']
+None, 'Fetching manager info')['manager']
@staticmethod
def _decrypt_url(png):
encrypted_data = io.BytesIO(compat_b64decode(png)[8:])
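# Walk the PNG chunks (the 8-byte signature is already skipped); each tEXt chunk carries '<alphabet>\0<quality>%%<obfuscated url>'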
while True:
length = compat_struct_unpack('!I', encrypted_data.read(4))[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
alphabet_data, text = data.split(b'\0')
quality, url_data = text.split(b'%%')
alphabet = []
e = 0
d = 0
for l in _bytes_to_chr(alphabet_data):
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in _bytes_to_chr(url_data):
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
yield quality.decode(), url
encrypted_data.read(4) # CRC
def _extract_png_formats(self, video_id):
png = self._download_webpage(
'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id),
video_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
self._sort_formats(formats)
return formats
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-video_id = mobj.group('id')
+video_id = self._match_id(url)
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
if info['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True)
-title = info['title']
+title = info['title'].strip()
-png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
+formats = self._extract_png_formats(video_id)
png_request = sanitized_Request(png_url)
png_request.add_header('Referer', url)
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
ext = determine_ext(video_url)
formats = []
if not video_url.endswith('.f4m') and ext != 'm3u8':
if '?' not in video_url:
video_url = video_url.replace('resources/', 'auth/resources/')
video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'url': video_url,
})
self._sort_formats(formats)
subtitles = None
-if info.get('sbtFile') is not None:
-subtitles = self.extract_subtitles(video_id, info['sbtFile'])
+sbt_file = info.get('sbtFile')
+if sbt_file:
+subtitles = self.extract_subtitles(video_id, sbt_file)
+is_live = info.get('live') is True
return {
'id': video_id,
-'title': title,
+'title': self._live_title(title) if is_live else title,
'formats': formats,
'thumbnail': info.get('image'),
-'page_url': url,
'subtitles': subtitles,
-'duration': float_or_none(info.get('duration'), scale=1000),
+'duration': float_or_none(info.get('duration'), 1000),
+'is_live': is_live,
+'series': info.get('programTitle'),
}
def _get_subtitles(self, video_id, sub_file):
@@ -174,48 +179,26 @@ class RTVEALaCartaIE(InfoExtractor):
for s in subs)
-class RTVEInfantilIE(InfoExtractor):
+class RTVEInfantilIE(RTVEALaCartaIE):
IE_NAME = 'rtve.es:infantil'
IE_DESC = 'RTVE infantil'
-_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/(?P<show>[^/]*)/video/(?P<short_title>[^/]*)/(?P<id>[0-9]+)/'
+_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'
_TESTS = [{
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
-'md5': '915319587b33720b8e0357caaa6617e6',
+'md5': '5747454717aedf9f9fdf212d1bcfc48d',
'info_dict': {
'id': '3040283',
'ext': 'mp4',
'title': 'Maneras de vivir',
-'thumbnail': 'http://www.rtve.es/resources/jpg/6/5/1426182947956.JPG',
+'thumbnail': r're:https?://.+/1426182947956\.JPG',
'duration': 357.958,
},
+'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}]
-def _real_extract(self, url):
-video_id = self._match_id(url)
-info = self._download_json(
-'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
-video_id)['page']['items'][0]
-webpage = self._download_webpage(url, video_id)
-vidplayer_id = self._search_regex(
-r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
-png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
-png = self._download_webpage(png_url, video_id, 'Downloading url information')
-video_url = _decrypt_url(png)
-return {
-'id': video_id,
-'ext': 'mp4',
-'title': info['title'],
-'url': video_url,
-'thumbnail': info.get('image'),
-'duration': float_or_none(info.get('duration'), scale=1000),
-}
-class RTVELiveIE(InfoExtractor):
+class RTVELiveIE(RTVEALaCartaIE):
IE_NAME = 'rtve.es:live'
IE_DESC = 'RTVE.es live streams'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
@@ -225,7 +208,7 @@ class RTVELiveIE(InfoExtractor):
'info_dict': {
'id': 'la-1',
'ext': 'mp4',
-'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
+'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
'skip_download': 'live stream',
@@ -234,29 +217,22 @@ class RTVELiveIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
-start_time = time.gmtime()
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
title = remove_start(title, 'Estoy viendo ')
-title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
vidplayer_id = self._search_regex(
(r'playerId=player([0-9]+)',
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
r'data-id=["\'](\d+)'),
webpage, 'internal video ID')
-png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
-png = self._download_webpage(png_url, video_id, 'Downloading url information')
-m3u8_url = _decrypt_url(png)
-formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
-self._sort_formats(formats)
return {
'id': video_id,
-'title': title,
+'title': self._live_title(title),
-'formats': formats,
+'formats': self._extract_png_formats(vidplayer_id),
'is_live': True,
}

yt_dlp/extractor/shahid.py

@@ -51,13 +51,16 @@ class ShahidIE(ShahidBaseIE):
_NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
_TESTS = [{
-'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AC%D9%84%D8%B3-%D8%A7%D9%84%D8%B4%D8%A8%D8%A7%D8%A8-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-275286',
+'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
'info_dict': {
-'id': '275286',
+'id': '816924',
'ext': 'mp4',
-'title': 'مجلس الشباب الموسم 1 كليب 1',
+'title': 'متحف الدحيح الموسم 1 كليب 1',
-'timestamp': 1506988800,
+'timestamp': 1602806400,
-'upload_date': '20171003',
+'upload_date': '20201016',
+'description': 'برومو',
+'duration': 22,
+'categories': ['كوميديا'],
},
'params': {
# m3u8 download
@@ -109,12 +112,15 @@ class ShahidIE(ShahidBaseIE):
page_type = 'episode'
playout = self._call_api(
-'playout/url/' + video_id, video_id)['playout']
+'playout/new/url/' + video_id, video_id)['playout']
if not self._downloader.params.get('allow_unplayable_formats') and playout.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
-formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
+formats = self._extract_m3u8_formats(re.sub(
+# https://docs.aws.amazon.com/mediapackage/latest/ug/manifest-filtering.html
+r'aws\.manifestfilter=[\w:;,-]+&?',
+'', playout['url']), video_id, 'mp4')
self._sort_formats(formats)
# video = self._call_api(

yt_dlp/extractor/southpark.py

@@ -6,9 +6,9 @@ from .mtv import MTVServicesInfoExtractor
class SouthParkIE(MTVServicesInfoExtractor): class SouthParkIE(MTVServicesInfoExtractor):
IE_NAME = 'southpark.cc.com' IE_NAME = 'southpark.cc.com'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))' _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark(?:\.cc|studios)\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss' _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
_TESTS = [{ _TESTS = [{
'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured', 'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
@@ -23,8 +23,20 @@ class SouthParkIE(MTVServicesInfoExtractor):
}, { }, {
'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1', 'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.southparkstudios.com/episodes/h4o269/south-park-stunning-and-brave-season-19-ep-1',
'only_matching': True,
}] }]
def _get_feed_query(self, uri):
return {
'accountOverride': 'intl.mtvi.com',
'arcEp': 'shared.southpark.global',
'ep': '90877963',
'imageEp': 'shared.southpark.global',
'mgid': uri,
}
class SouthParkEsIE(SouthParkIE):
IE_NAME = 'southpark.cc.com:español'

yt_dlp/extractor/sportdeutschland.py

@@ -1,82 +1,105 @@
# coding: utf-8
from __future__ import unicode_literals
-import re
from .common import InfoExtractor
+from ..compat import (
+compat_parse_qs,
+compat_urllib_parse_urlparse,
+)
from ..utils import (
+clean_html,
+float_or_none,
+int_or_none,
parse_iso8601,
-sanitized_Request,
+strip_or_none,
+try_get,
)
class SportDeutschlandIE(InfoExtractor):
-_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
+_VALID_URL = r'https?://sportdeutschland\.tv/(?P<id>(?:[^/]+/)?[^?#/&]+)'
_TESTS = [{
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
-'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
+'id': '5318cac0275701382770543d7edaf0a0',
'ext': 'mp4',
-'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
+'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals - Teil 1',
-'categories': ['Badminton-Deutschland'],
+'duration': 16106.36,
-'view_count': int,
-'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
-'timestamp': int,
-'upload_date': '20200201',
-'description': 're:.*', # meaningless description for THIS video
}, },
'params': {
'noplaylist': True,
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
'id': 'c6e2fdd01f63013854c47054d2ab776f',
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals',
'description': 'md5:5263ff4c31c04bb780c9f91130b48530',
'duration': 31397,
},
'playlist_count': 2,
}, {
'url': 'https://sportdeutschland.tv/freeride-world-tour-2021-fieberbrunn-oesterreich',
'only_matching': True,
}]
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-video_id = mobj.group('id')
-sport_id = mobj.group('sport')
+display_id = self._match_id(url)
+data = self._download_json(
+'https://backend.sportdeutschland.tv/api/permalinks/' + display_id,
+display_id, query={'access_token': 'true'})
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
'Referer': url,
})
data = self._download_json(req, video_id)
asset = data['asset']
-categories = [data['section']['title']]
-formats = []
-smil_url = asset['video']
-if '.smil' in smil_url:
-m3u8_url = smil_url.replace('.smil', '.m3u8')
-formats.extend(
+title = (asset.get('title') or asset['label']).strip()
+asset_id = asset.get('id') or asset.get('uuid')
+info = {
+'id': asset_id,
+'title': title,
+'description': clean_html(asset.get('body') or asset.get('description')) or asset.get('teaser'),
+'duration': int_or_none(asset.get('seconds')),
self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'))
smil_doc = self._download_xml(
smil_url, video_id, note='Downloading SMIL metadata')
base_url_el = smil_doc.find('./head/meta')
if base_url_el:
base_url = base_url_el.attrib['base']
formats.extend([{
'format_id': 'rmtp',
'url': base_url if base_url_el else n.attrib['src'],
'play_path': n.attrib['src'],
'ext': 'flv',
'preference': -100,
'format_note': 'Seems to fail at example stream',
} for n in smil_doc.findall('./body/video')])
else:
formats.append({'url': smil_url})
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': asset['title'],
'thumbnail': asset.get('image'),
'description': asset.get('teaser'),
'duration': asset.get('duration'),
'categories': categories,
'view_count': asset.get('views'),
'rtmp_live': asset.get('live'),
'timestamp': parse_iso8601(asset.get('date')),
} }
videos = asset.get('videos') or []
if len(videos) > 1:
playlist_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('playlistId', [None])[0]
if playlist_id:
if self._downloader.params.get('noplaylist'):
videos = [videos[int(playlist_id)]]
self.to_screen('Downloading just a single video because of --no-playlist')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % asset_id)
def entries():
for i, video in enumerate(videos, 1):
video_id = video.get('uuid')
video_url = video.get('url')
if not (video_id and video_url):
continue
formats = self._extract_m3u8_formats(
video_url.replace('.smil', '.m3u8'), video_id, 'mp4', fatal=False)
if not formats:
continue
yield {
'id': video_id,
'formats': formats,
'title': title + ' - ' + (video.get('label') or 'Teil %d' % i),
'duration': float_or_none(video.get('duration')),
}
info.update({
'_type': 'multi_video',
'entries': entries(),
})
else:
formats = self._extract_m3u8_formats(
videos[0]['url'].replace('.smil', '.m3u8'), asset_id, 'mp4')
section_title = strip_or_none(try_get(data, lambda x: x['section']['title']))
info.update({
'formats': formats,
'display_id': asset.get('permalink'),
'thumbnail': try_get(asset, lambda x: x['images'][0]),
'categories': [section_title] if section_title else None,
'view_count': int_or_none(asset.get('views')),
'is_live': asset.get('is_live') is True,
'timestamp': parse_iso8601(asset.get('date') or asset.get('published_at')),
})
return info

yt_dlp/extractor/trovo.py

@@ -14,6 +14,7 @@ from ..utils import (
class TrovoBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?trovo\.live/'
+_HEADERS = {'Origin': 'https://trovo.live'}
def _extract_streamer_info(self, data):
streamer_info = data.get('streamerInfo') or {}
@@ -68,6 +69,7 @@ class TrovoIE(TrovoBaseIE):
'format_id': format_id, 'format_id': format_id,
'height': int_or_none(format_id[:-1]) if format_id else None, 'height': int_or_none(format_id[:-1]) if format_id else None,
'url': play_url, 'url': play_url,
'http_headers': self._HEADERS,
}) })
self._sort_formats(formats) self._sort_formats(formats)
@@ -153,6 +155,7 @@ class TrovoVodIE(TrovoBaseIE):
'protocol': 'm3u8_native', 'protocol': 'm3u8_native',
'tbr': int_or_none(play_info.get('bitrate')), 'tbr': int_or_none(play_info.get('bitrate')),
'url': play_url, 'url': play_url,
'http_headers': self._HEADERS,
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@@ -9,6 +9,7 @@ from ..utils import (
    int_or_none,
    remove_start,
    smuggle_url,
    strip_or_none,
    try_get,
)
@@ -25,6 +26,10 @@ class TVerIE(InfoExtractor):
    }, {
        'url': 'https://tver.jp/episode/79622438',
        'only_matching': True,
    }, {
        # subtitle = ' '
        'url': 'https://tver.jp/corner/f0068870',
        'only_matching': True,
    }]
    _TOKEN = None
    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
@@ -47,8 +52,12 @@ class TVerIE(InfoExtractor):
        }
        if service == 'cx':
            title = main['title']
            subtitle = strip_or_none(main.get('subtitle'))
            if subtitle:
                title += ' - ' + subtitle
            info.update({
-               'title': main.get('subtitle') or main['title'],
                'title': title,
                'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
                'ie_key': 'FujiTVFODPlus7',
            })
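A quick sketch of why the "# subtitle = ' '" test above now passes: a whitespace-only subtitle strips down to an empty, falsy string, so no ' - ' suffix is appended. The snippet below is a standalone equivalent of that logic, not part of the diff:

    def build_title(title, subtitle):
        # mirrors strip_or_none(): strip strings, treat anything else as missing
        subtitle = subtitle.strip() if isinstance(subtitle, str) else None
        if subtitle:
            title += ' - ' + subtitle
        return title

    print(build_title('Programme', ' '))      # -> 'Programme'
    print(build_title('Programme', 'Ep. 1'))  # -> 'Programme - Ep. 1'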

View File

@@ -498,6 +498,24 @@ class VimeoIE(VimeoBaseInfoExtractor):
        'url': 'https://vimeo.com/album/2632481/video/79010983',
        'only_matching': True,
    },
{
'url': 'https://vimeo.com/showcase/3253534/video/119195465',
'note': 'A video in a password protected album (showcase)',
'info_dict': {
'id': '119195465',
'ext': 'mp4',
'title': 'youtube-dl test video \'ä"BaW_jenozKc',
'uploader': 'Philipp Hagemeister',
'uploader_id': 'user20132939',
'description': 'md5:fa7b6c6d8db0bdc353893df2f111855b',
'upload_date': '20150209',
'timestamp': 1423518307,
},
'params': {
'format': 'best[protocol=https]',
'videopassword': 'youtube-dl',
},
},
    {
        # source file returns 403: Forbidden
        'url': 'https://vimeo.com/7809605',
@@ -564,6 +582,44 @@ class VimeoIE(VimeoBaseInfoExtractor):
    def _real_initialize(self):
        self._login()
def _try_album_password(self, url):
album_id = self._search_regex(
r'vimeo\.com/(?:album|showcase)/([^/]+)', url, 'album id', default=None)
if not album_id:
return
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
if not viewer:
webpage = self._download_webpage(url, album_id)
viewer = self._parse_json(self._search_regex(
r'bootstrap_data\s*=\s*({.+?})</script>',
webpage, 'bootstrap data'), album_id)['viewer']
jwt = viewer['jwt']
album = self._download_json(
'https://api.vimeo.com/albums/' + album_id,
album_id, headers={'Authorization': 'jwt ' + jwt},
query={'fields': 'description,name,privacy'})
if try_get(album, lambda x: x['privacy']['view']) == 'password':
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This album is protected by a password, use the --video-password option',
expected=True)
self._set_vimeo_cookie('vuid', viewer['vuid'])
try:
self._download_json(
'https://vimeo.com/showcase/%s/auth' % album_id,
album_id, 'Verifying the password', data=urlencode_postdata({
'password': password,
'token': viewer['xsrft'],
}), headers={
'X-Requested-With': 'XMLHttpRequest',
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError('Wrong password', expected=True)
raise
    def _real_extract(self, url):
        url, data = unsmuggle_url(url, {})
        headers = std_headers.copy()
@@ -591,6 +647,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
        elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
            url = 'https://vimeo.com/' + video_id

        self._try_album_password(url)
        try:
            # Retrieve video webpage to extract further information
            webpage, urlh = self._download_webpage_handle(
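A minimal embedding sketch (not part of this diff) of how the album password would be supplied through the Python API; it assumes the 'videopassword' param used by the new test above is honoured the same way as the --video-password CLI option:

    from yt_dlp import YoutubeDL

    with YoutubeDL({'videopassword': 'youtube-dl'}) as ydl:
        # _try_album_password() runs before the webpage download, so the
        # showcase is unlocked first and the video page then loads normally
        info = ydl.extract_info(
            'https://vimeo.com/showcase/3253534/video/119195465', download=False)
        print(info['title'])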

View File

@@ -7,6 +7,8 @@ from ..compat import compat_urllib_parse_unquote
from ..utils import (
    ExtractorError,
    int_or_none,
    try_get,
    unified_timestamp,
)
@@ -19,14 +21,17 @@ class VoxMediaVolumeIE(OnceIE):
        setup = self._parse_json(self._search_regex(
            r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
-       video_data = setup.get('video') or {}
        player_setup = setup.get('player_setup') or setup
        video_data = player_setup.get('video') or {}
        formatted_metadata = video_data.get('formatted_metadata') or {}
        info = {
            'id': video_id,
-           'title': video_data.get('title_short'),
            'title': player_setup.get('title') or video_data.get('title_short'),
            'description': video_data.get('description_long') or video_data.get('description_short'),
-           'thumbnail': video_data.get('brightcove_thumbnail')
            'thumbnail': formatted_metadata.get('thumbnail') or video_data.get('brightcove_thumbnail'),
            'timestamp': unified_timestamp(formatted_metadata.get('video_publish_date')),
        }
-       asset = setup.get('asset') or setup.get('params') or {}
        asset = try_get(setup, lambda x: x['embed_assets']['chorus'], dict) or {}

        formats = []
        hls_url = asset.get('hls_url')
@@ -47,6 +52,7 @@ class VoxMediaVolumeIE(OnceIE):
        if formats:
            self._sort_formats(formats)
            info['formats'] = formats
            info['duration'] = int_or_none(asset.get('duration'))
            return info

        for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
@@ -84,7 +90,7 @@ class VoxMediaIE(InfoExtractor):
    }, {
        # Volume embed, Youtube
        'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
-       'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
        'md5': 'fd19aa0cf3a0eea515d4fd5c8c0e9d68',
        'info_dict': {
            'id': 'Gy8Md3Eky38',
            'ext': 'mp4',
@@ -93,6 +99,7 @@ class VoxMediaIE(InfoExtractor):
            'uploader_id': 'TheVerge',
            'upload_date': '20141021',
            'uploader': 'The Verge',
            'timestamp': 1413907200,
        },
        'add_ie': ['Youtube'],
        'skip': 'similar to the previous test',
@@ -100,13 +107,13 @@ class VoxMediaIE(InfoExtractor):
        # Volume embed, Youtube
        'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
        'info_dict': {
-           'id': 'YCjDnX-Xzhg',
            'id': '22986359b',
            'ext': 'mp4',
            'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
            'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
-           'uploader_id': 'voxdotcom',
            'upload_date': '20150915',
-           'uploader': 'Vox',
            'timestamp': 1442332800,
            'duration': 285,
        },
        'add_ie': ['Youtube'],
        'skip': 'similar to the previous test',
@@ -160,6 +167,9 @@ class VoxMediaIE(InfoExtractor):
            'ext': 'mp4',
            'title': 'Post-Post-PC CEO: The Full Code Conference Video of Microsoft\'s Satya Nadella',
            'description': 'The longtime veteran was chosen earlier this year as the software giant\'s third leader in its history.',
            'timestamp': 1402938000,
            'upload_date': '20140616',
            'duration': 4114,
        },
        'add_ie': ['VoxMediaVolume'],
    }]

yt_dlp/extractor/wimtv.py Normal file
View File

@@ -0,0 +1,163 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
urlencode_postdata,
ExtractorError,
)
class WimTVIE(InfoExtractor):
_player = None
_UUID_RE = r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
_VALID_URL = r'''(?x)
https?://platform.wim.tv/
(?:
(?:embed/)?\?
|\#/webtv/.+?/
)
(?P<type>vod|live|cast)[=/]
(?P<id>%s).*?''' % _UUID_RE
_TESTS = [{
# vod stream
'url': 'https://platform.wim.tv/embed/?vod=db29fb32-bade-47b6-a3a6-cb69fe80267a',
'md5': 'db29fb32-bade-47b6-a3a6-cb69fe80267a',
'info_dict': {
'id': 'db29fb32-bade-47b6-a3a6-cb69fe80267a',
'ext': 'mp4',
'title': 'AMA SUPERCROSS 2020 - R2 ST. LOUIS',
'duration': 6481,
'thumbnail': r're:https?://.+?/thumbnail/.+?/720$'
},
'params': {
'skip_download': True,
},
}, {
# live stream
'url': 'https://platform.wim.tv/embed/?live=28e22c22-49db-40f3-8c37-8cbb0ff44556&autostart=true',
'info_dict': {
'id': '28e22c22-49db-40f3-8c37-8cbb0ff44556',
'ext': 'mp4',
'title': 'Streaming MSmotorTV',
'is_live': True,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://platform.wim.tv/#/webtv/automotornews/vod/422492b6-539e-474d-9c6b-68c9d5893365',
'only_matching': True,
}, {
'url': 'https://platform.wim.tv/#/webtv/renzoarborechannel/cast/f47e0d15-5b45-455e-bf0d-dba8ffa96365',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe[^>]+src=["\'](?P<url>%s)' % WimTVIE._VALID_URL,
webpage)]
def _real_initialize(self):
if not self._player:
self._get_player_data()
def _get_player_data(self):
msg_id = 'Player data'
self._player = {}
datas = [{
'url': 'https://platform.wim.tv/common/libs/player/wimtv/wim-rest.js',
'vars': [{
'regex': r'appAuth = "(.+?)"',
'variable': 'app_auth',
}]
}, {
'url': 'https://platform.wim.tv/common/config/endpointconfig.js',
'vars': [{
'regex': r'PRODUCTION_HOSTNAME_THUMB = "(.+?)"',
'variable': 'thumb_server',
}, {
'regex': r'PRODUCTION_HOSTNAME_THUMB\s*\+\s*"(.+?)"',
'variable': 'thumb_server_path',
}]
}]
for data in datas:
temp = self._download_webpage(data['url'], msg_id)
for var in data['vars']:
val = self._search_regex(var['regex'], temp, msg_id)
if not val:
raise ExtractorError('%s not found' % var['variable'])
self._player[var['variable']] = val
def _generate_token(self):
json = self._download_json(
'https://platform.wim.tv/wimtv-server/oauth/token', 'Token generation',
headers={'Authorization': 'Basic %s' % self._player['app_auth']},
data=urlencode_postdata({'grant_type': 'client_credentials'}))
token = json.get('access_token')
if not token:
raise ExtractorError('access token not generated')
return token
def _generate_thumbnail(self, thumb_id, width='720'):
if not thumb_id or not self._player.get('thumb_server'):
return None
if not self._player.get('thumb_server_path'):
self._player['thumb_server_path'] = ''
return '%s%s/asset/thumbnail/%s/%s' % (
self._player['thumb_server'],
self._player['thumb_server_path'],
thumb_id, width)
def _real_extract(self, url):
urlc = re.match(self._VALID_URL, url).groupdict()
video_id = urlc['id']
stream_type = is_live = None
if urlc['type'] in {'live', 'cast'}:
stream_type = urlc['type'] + '/channel'
is_live = True
else:
stream_type = 'vod'
is_live = False
token = self._generate_token()
json = self._download_json(
'https://platform.wim.tv/wimtv-server/api/public/%s/%s/play' % (
stream_type, video_id), video_id,
headers={'Authorization': 'Bearer %s' % token,
'Content-Type': 'application/json'},
data=bytes('{}', 'utf-8'))
formats = []
for src in json.get('srcs') or []:
if src.get('mimeType') == 'application/x-mpegurl':
formats.extend(
self._extract_m3u8_formats(
src.get('uniqueStreamer'), video_id, 'mp4'))
if src.get('mimeType') == 'video/flash':
formats.append({
'format_id': 'rtmp',
'url': src.get('uniqueStreamer'),
'ext': determine_ext(src.get('uniqueStreamer'), 'flv'),
'rtmp_live': is_live,
})
json = json.get('resource')
thumb = self._generate_thumbnail(json.get('thumbnailId'))
self._sort_formats(formats)
return {
'id': video_id,
'title': json.get('title') or json.get('name'),
'duration': parse_duration(json.get('duration')),
'formats': formats,
'thumbnail': thumb,
'is_live': is_live,
}

View File

@@ -26,6 +26,7 @@ from ..compat import (
from ..jsinterp import JSInterpreter
from ..utils import (
    clean_html,
    dict_get,
    ExtractorError,
    format_field,
    float_or_none,
@@ -59,9 +60,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
    _TFA_URL = 'https://accounts.google.com/_/signin/challenge?hl=en&TL={0}'

    _RESERVED_NAMES = (
-       r'embed|e|watch_popup|channel|c|user|playlist|watch|w|v|movies|results|shared|hashtag|'
-       r'storefront|oops|index|account|reporthistory|t/terms|about|upload|signin|logout|'
-       r'feed/(?:watch_later|history|subscriptions|library|trending|recommended)')
        r'channel|c|user|playlist|watch|w|v|embed|e|watch_popup|'
        r'movies|results|shared|hashtag|trending|feed|feeds|'
        r'storefront|oops|index|account|reporthistory|t/terms|about|upload|signin|logout')

    _NETRC_MACHINE = 'youtube'
    # If True it will raise an error if no login info is provided
@@ -271,15 +272,21 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
        if not self._login():
            return

    _YT_WEB_CLIENT_VERSION = '2.20210301.08.00'
    _DEFAULT_API_DATA = {
        'context': {
            'client': {
                'clientName': 'WEB',
-               'clientVersion': '2.20210301.08.00',
                'clientVersion': _YT_WEB_CLIENT_VERSION,
            }
        },
    }

    _DEFAULT_BASIC_API_HEADERS = {
        'X-YouTube-Client-Name': '1',
        'X-YouTube-Client-Version': _YT_WEB_CLIENT_VERSION
    }

    _YT_INITIAL_DATA_RE = r'(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;'
    _YT_INITIAL_PLAYER_RESPONSE_RE = r'ytInitialPlayerResponse\s*=\s*({.+?})\s*;'
    _YT_INITIAL_BOUNDARY_RE = r'(?:var\s+meta|</script|\n)'
@@ -301,7 +308,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
        auth = self._generate_sapisidhash_header()
        if auth is not None:
            headers.update({'Authorization': auth, 'X-Origin': 'https://www.youtube.com'})
        return self._download_json(
            'https://www.youtube.com/youtubei/v1/%s' % ep,
            video_id=video_id, fatal=fatal, note=note, errnote=errnote,
@@ -315,6 +321,27 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
                self._YT_INITIAL_DATA_RE), webpage, 'yt initial data'),
            video_id)
def _extract_identity_token(self, webpage, item_id):
ytcfg = self._extract_ytcfg(item_id, webpage)
if ytcfg:
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
if token:
return token
return self._search_regex(
r'\bID_TOKEN["\']\s*:\s*["\'](.+?)["\']', webpage,
'identity token', default=None)
@staticmethod
def _extract_account_syncid(data):
"""Extract syncId required to download private playlists of secondary channels"""
sync_ids = (
try_get(data, lambda x: x['responseContext']['mainAppWebResponseContext']['datasyncId'], compat_str)
or '').split("||")
if len(sync_ids) >= 2 and sync_ids[1]:
# datasyncid is of the form "channel_syncid||user_syncid" for secondary channel
# and just "user_syncid||" for primary channel. We only want the channel_syncid
return sync_ids[0]
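As a quick illustration of the split described in the comments above (the IDs are invented; this snippet is not part of the diff):

    for datasync_id in ('UC-channel-syncid||MAA-user-syncid', 'MAA-user-syncid||'):
        sync_ids = datasync_id.split('||')
        channel_syncid = sync_ids[0] if len(sync_ids) >= 2 and sync_ids[1] else None
        print(datasync_id, '->', channel_syncid)  # secondary channel -> channel syncid, primary -> None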
    def _extract_ytcfg(self, video_id, webpage):
        return self._parse_json(
            self._search_regex(
@@ -1462,6 +1489,270 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                (r'%s\s*%s' % (regex, self._YT_INITIAL_BOUNDARY_RE),
                 regex), webpage, name, default='{}'), video_id, fatal=False)
@staticmethod
def _join_text_entries(runs):
text = None
for run in runs:
if not isinstance(run, dict):
continue
sub_text = try_get(run, lambda x: x['text'], compat_str)
if sub_text:
if not text:
text = sub_text
continue
text += sub_text
return text
def _extract_comment(self, comment_renderer, parent=None):
comment_id = comment_renderer.get('commentId')
if not comment_id:
return
comment_text_runs = try_get(comment_renderer, lambda x: x['contentText']['runs']) or []
text = self._join_text_entries(comment_text_runs) or ''
comment_time_text = try_get(comment_renderer, lambda x: x['publishedTimeText']['runs']) or []
time_text = self._join_text_entries(comment_time_text)
author = try_get(comment_renderer, lambda x: x['authorText']['simpleText'], compat_str)
author_id = try_get(comment_renderer,
lambda x: x['authorEndpoint']['browseEndpoint']['browseId'], compat_str)
votes = str_to_int(try_get(comment_renderer, (lambda x: x['voteCount']['simpleText'],
lambda x: x['likeCount']), compat_str)) or 0
author_thumbnail = try_get(comment_renderer,
lambda x: x['authorThumbnail']['thumbnails'][-1]['url'], compat_str)
author_is_uploader = try_get(comment_renderer, lambda x: x['authorIsChannelOwner'], bool)
is_liked = try_get(comment_renderer, lambda x: x['isLiked'], bool)
return {
'id': comment_id,
'text': text,
# TODO: This should be parsed to timestamp
'time_text': time_text,
'like_count': votes,
'is_favorited': is_liked,
'author': author,
'author_id': author_id,
'author_thumbnail': author_thumbnail,
'author_is_uploader': author_is_uploader,
'parent': parent or 'root'
}
def _comment_entries(self, root_continuation_data, identity_token, account_syncid,
session_token_list, parent=None, comment_counts=None):
def extract_thread(parent_renderer):
contents = try_get(parent_renderer, lambda x: x['contents'], list) or []
if not parent:
comment_counts[2] = 0
for content in contents:
comment_thread_renderer = try_get(content, lambda x: x['commentThreadRenderer'])
comment_renderer = try_get(
comment_thread_renderer, (lambda x: x['comment']['commentRenderer'], dict)) or try_get(
content, (lambda x: x['commentRenderer'], dict))
if not comment_renderer:
continue
comment = self._extract_comment(comment_renderer, parent)
if not comment:
continue
comment_counts[0] += 1
yield comment
# Attempt to get the replies
comment_replies_renderer = try_get(
comment_thread_renderer, lambda x: x['replies']['commentRepliesRenderer'], dict)
if comment_replies_renderer:
comment_counts[2] += 1
comment_entries_iter = self._comment_entries(
comment_replies_renderer, identity_token, account_syncid,
parent=comment.get('id'), session_token_list=session_token_list,
comment_counts=comment_counts)
for reply_comment in comment_entries_iter:
yield reply_comment
if not comment_counts:
# comment so far, est. total comments, current comment thread #
comment_counts = [0, 0, 0]
headers = self._DEFAULT_BASIC_API_HEADERS.copy()
# TODO: Generalize the download code with TabIE
if identity_token:
headers['x-youtube-identity-token'] = identity_token
if account_syncid:
headers['X-Goog-PageId'] = account_syncid
headers['X-Goog-AuthUser'] = 0
continuation = YoutubeTabIE._extract_continuation(root_continuation_data) # TODO
first_continuation = False
if parent is None:
first_continuation = True
for page_num in itertools.count(0):
if not continuation:
break
retries = self._downloader.params.get('extractor_retries', 3)
count = -1
last_error = None
while count < retries:
count += 1
if last_error:
self.report_warning('%s. Retrying ...' % last_error)
try:
query = {
'ctoken': continuation['ctoken'],
'pbj': 1,
'type': 'next',
}
if parent:
query['action_get_comment_replies'] = 1
else:
query['action_get_comments'] = 1
comment_prog_str = '(%d/%d)' % (comment_counts[0], comment_counts[1])
if page_num == 0:
if first_continuation:
note_prefix = "Downloading initial comment continuation page"
else:
note_prefix = " Downloading comment reply thread %d %s" % (comment_counts[2], comment_prog_str)
else:
note_prefix = "%sDownloading comment%s page %d %s" % (
" " if parent else "",
' replies' if parent else '',
page_num,
comment_prog_str)
browse = self._download_json(
'https://www.youtube.com/comment_service_ajax', None,
'%s %s' % (note_prefix, '(retry #%d)' % count if count else ''),
headers=headers, query=query,
data=urlencode_postdata({
'session_token': session_token_list[0]
}))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503, 404, 413):
if e.cause.code == 413:
self.report_warning("Assumed end of comments (received HTTP Error 413)")
return
# Downloading page may result in intermittent 5xx HTTP error
# Sometimes a 404 is also recieved. See: https://github.com/ytdl-org/youtube-dl/issues/28289
last_error = 'HTTP Error %s' % e.cause.code
if e.cause.code == 404:
last_error = last_error + " (this API is probably deprecated)"
if count < retries:
continue
raise
else:
session_token = try_get(browse, lambda x: x['xsrf_token'], compat_str)
if session_token:
session_token_list[0] = session_token
response = try_get(browse,
(lambda x: x['response'],
lambda x: x[1]['response'])) or {}
if response.get('continuationContents'):
break
# YouTube sometimes gives reload: now json if something went wrong (e.g. bad auth)
if browse.get('reload'):
raise ExtractorError("Invalid or missing params in continuation request", expected=False)
# TODO: not tested, merged from old extractor
err_msg = browse.get('externalErrorMessage')
if err_msg:
raise ExtractorError('YouTube said: %s' % err_msg, expected=False)
# Youtube sometimes sends incomplete data
# See: https://github.com/ytdl-org/youtube-dl/issues/28194
last_error = 'Incomplete data received'
if count >= retries:
self._downloader.report_error(last_error)
if not response:
break
known_continuation_renderers = {
'itemSectionContinuation': extract_thread,
'commentRepliesContinuation': extract_thread
}
# extract next root continuation from the results
continuation_contents = try_get(
response, lambda x: x['continuationContents'], dict) or {}
for key, value in continuation_contents.items():
if key not in known_continuation_renderers:
continue
continuation_renderer = value
if first_continuation:
first_continuation = False
expected_comment_count = try_get(
continuation_renderer,
(lambda x: x['header']['commentsHeaderRenderer']['countText']['runs'][0]['text'],
lambda x: x['header']['commentsHeaderRenderer']['commentsCount']['runs'][0]['text']),
compat_str)
if expected_comment_count:
comment_counts[1] = str_to_int(expected_comment_count)
self.to_screen("Downloading ~%d comments" % str_to_int(expected_comment_count))
yield comment_counts[1]
# TODO: cli arg.
# 1/True for newest, 0/False for popular (default)
comment_sort_index = int(True)
sort_continuation_renderer = try_get(
continuation_renderer,
lambda x: x['header']['commentsHeaderRenderer']['sortMenu']['sortFilterSubMenuRenderer']['subMenuItems']
[comment_sort_index]['continuation']['reloadContinuationData'], dict)
# If this fails, the initial continuation page
# starts off with popular anyways.
if sort_continuation_renderer:
continuation = YoutubeTabIE._build_continuation_query(
continuation=sort_continuation_renderer.get('continuation'),
ctp=sort_continuation_renderer.get('clickTrackingParams'))
self.to_screen("Sorting comments by %s" % ('popular' if comment_sort_index == 0 else 'newest'))
break
for entry in known_continuation_renderers[key](continuation_renderer):
yield entry
continuation = YoutubeTabIE._extract_continuation(continuation_renderer) # TODO
break
def _extract_comments(self, ytcfg, video_id, contents, webpage, xsrf_token):
"""Entry for comment extraction"""
comments = []
known_entry_comment_renderers = (
'itemSectionRenderer',
)
estimated_total = 0
for entry in contents:
for key, renderer in entry.items():
if key not in known_entry_comment_renderers:
continue
comment_iter = self._comment_entries(
renderer,
identity_token=self._extract_identity_token(webpage, item_id=video_id),
account_syncid=self._extract_account_syncid(ytcfg),
session_token_list=[xsrf_token])
for comment in comment_iter:
if isinstance(comment, int):
estimated_total = comment
continue
comments.append(comment)
break
self.to_screen("Downloaded %d/%d comments" % (len(comments), estimated_total))
return {
'comments': comments,
'comment_count': len(comments),
}
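A small sketch (not part of the diff) of the convention the loop above relies on: the comment generator yields the estimated total once as a plain int and every actual comment as a dict, and the consumer separates the two:

    def fake_comment_iter():
        yield 2                                                # estimated total, as _comment_entries yields it
        yield {'id': 'c1', 'text': 'first', 'parent': 'root'}
        yield {'id': 'c2', 'text': 'reply', 'parent': 'c1'}

    comments, estimated_total = [], 0
    for item in fake_comment_iter():
        if isinstance(item, int):
            estimated_total = item
            continue
        comments.append(item)
    print('Downloaded %d/%d comments' % (len(comments), estimated_total))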
    def _real_extract(self, url):
        url, smuggled_data = unsmuggle_url(url, {})
        video_id = self._match_id(url)
@@ -2024,156 +2315,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                errnote='Unable to download video annotations', fatal=False,
                data=urlencode_postdata({xsrf_field_name: xsrf_token}))
# Get comments
# TODO: Refactor and move to seperate function
def extract_comments():
expected_video_comment_count = 0
video_comments = []
comment_xsrf = xsrf_token
def find_value(html, key, num_chars=2, separator='"'):
pos_begin = html.find(key) + len(key) + num_chars
pos_end = html.find(separator, pos_begin)
return html[pos_begin: pos_end]
def search_dict(partial, key):
if isinstance(partial, dict):
for k, v in partial.items():
if k == key:
yield v
else:
for o in search_dict(v, key):
yield o
elif isinstance(partial, list):
for i in partial:
for o in search_dict(i, key):
yield o
continuations = []
if initial_data:
try:
ncd = next(search_dict(initial_data, 'nextContinuationData'))
continuations = [ncd['continuation']]
# Handle videos where comments have been disabled entirely
except StopIteration:
pass
def get_continuation(continuation, session_token, replies=False):
query = {
'pbj': 1,
'ctoken': continuation,
}
if replies:
query['action_get_comment_replies'] = 1
else:
query['action_get_comments'] = 1
while True:
content, handle = self._download_webpage_handle(
'https://www.youtube.com/comment_service_ajax',
video_id,
note=False,
expected_status=[413],
data=urlencode_postdata({
'session_token': session_token
}),
query=query,
headers={
'Accept': '*/*',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
'X-YouTube-Client-Name': '1',
'X-YouTube-Client-Version': '2.20201202.06.01'
}
)
response_code = handle.getcode()
if (response_code == 200):
return self._parse_json(content, video_id)
if (response_code == 413):
return None
raise ExtractorError('Unexpected HTTP error code: %s' % response_code)
first_continuation = True
chain_msg = ''
self.to_screen('Downloading comments')
while continuations:
continuation = continuations.pop()
comment_response = get_continuation(continuation, comment_xsrf)
if not comment_response:
continue
if list(search_dict(comment_response, 'externalErrorMessage')):
raise ExtractorError('Error returned from server: ' + next(search_dict(comment_response, 'externalErrorMessage')))
if 'continuationContents' not in comment_response['response']:
# Something is wrong here. Youtube won't accept this continuation token for some reason and responds with a user satisfaction dialog (error?)
continue
# not sure if this actually helps
if 'xsrf_token' in comment_response:
comment_xsrf = comment_response['xsrf_token']
item_section = comment_response['response']['continuationContents']['itemSectionContinuation']
if first_continuation:
expected_video_comment_count = int(item_section['header']['commentsHeaderRenderer']['countText']['runs'][0]['text'].replace(' Comments', '').replace('1 Comment', '1').replace(',', ''))
first_continuation = False
if 'contents' not in item_section:
# continuation returned no comments?
# set an empty array as to not break the for loop
item_section['contents'] = []
for meta_comment in item_section['contents']:
comment = meta_comment['commentThreadRenderer']['comment']['commentRenderer']
video_comments.append({
'id': comment['commentId'],
'text': ''.join([c['text'] for c in try_get(comment, lambda x: x['contentText']['runs'], list) or []]),
'time_text': ''.join([c['text'] for c in comment['publishedTimeText']['runs']]),
'author': comment.get('authorText', {}).get('simpleText', ''),
'votes': comment.get('voteCount', {}).get('simpleText', '0'),
'author_thumbnail': comment['authorThumbnail']['thumbnails'][-1]['url'],
'parent': 'root'
})
if 'replies' not in meta_comment['commentThreadRenderer']:
continue
reply_continuations = [rcn['nextContinuationData']['continuation'] for rcn in meta_comment['commentThreadRenderer']['replies']['commentRepliesRenderer']['continuations']]
while reply_continuations:
time.sleep(1)
continuation = reply_continuations.pop()
replies_data = get_continuation(continuation, comment_xsrf, True)
if not replies_data or 'continuationContents' not in replies_data[1]['response']:
continue
if self._downloader.params.get('verbose', False):
chain_msg = ' (chain %s)' % comment['commentId']
self.to_screen('Comments downloaded: %d of ~%d%s' % (len(video_comments), expected_video_comment_count, chain_msg))
reply_comment_meta = replies_data[1]['response']['continuationContents']['commentRepliesContinuation']
for reply_meta in reply_comment_meta.get('contents', {}):
reply_comment = reply_meta['commentRenderer']
video_comments.append({
'id': reply_comment['commentId'],
'text': ''.join([c['text'] for c in reply_comment['contentText']['runs']]),
'time_text': ''.join([c['text'] for c in reply_comment['publishedTimeText']['runs']]),
'author': reply_comment.get('authorText', {}).get('simpleText', ''),
'votes': reply_comment.get('voteCount', {}).get('simpleText', '0'),
'author_thumbnail': reply_comment['authorThumbnail']['thumbnails'][-1]['url'],
'parent': comment['commentId']
})
if 'continuations' not in reply_comment_meta or len(reply_comment_meta['continuations']) == 0:
continue
reply_continuations += [rcn['nextContinuationData']['continuation'] for rcn in reply_comment_meta['continuations']]
self.to_screen('Comments downloaded: %d of ~%d' % (len(video_comments), expected_video_comment_count))
if 'continuations' in item_section:
continuations += [ncd['nextContinuationData']['continuation'] for ncd in item_section['continuations']]
time.sleep(1)
self.to_screen('Total comments downloaded: %d of ~%d' % (len(video_comments), expected_video_comment_count))
return {
'comments': video_comments,
'comment_count': expected_video_comment_count
}
        if get_comments:
-           info['__post_extractor'] = extract_comments
            info['__post_extractor'] = lambda: self._extract_comments(ytcfg, video_id, contents, webpage, xsrf_token)

        self.mark_watched(video_id, player_response)
@@ -2520,17 +2663,22 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
            channel_url, 'channel id')

    @staticmethod
-   def _extract_grid_item_renderer(item):
-       for item_kind in ('Playlist', 'Video', 'Channel'):
-           renderer = item.get('grid%sRenderer' % item_kind)
-           if renderer:
-               return renderer
    def _extract_basic_item_renderer(item):
        # Modified from _extract_grid_item_renderer
        known_renderers = (
            'playlistRenderer', 'videoRenderer', 'channelRenderer'
            'gridPlaylistRenderer', 'gridVideoRenderer', 'gridChannelRenderer'
        )
        for key, renderer in item.items():
            if key not in known_renderers:
                continue
            return renderer

    def _grid_entries(self, grid_renderer):
        for item in grid_renderer['items']:
            if not isinstance(item, dict):
                continue
-           renderer = self._extract_grid_item_renderer(item)
            renderer = self._extract_basic_item_renderer(item)
            if not isinstance(renderer, dict):
                continue
            title = try_get(
@@ -2559,7 +2707,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
        content = shelf_renderer.get('content')
        if not isinstance(content, dict):
            return
-       renderer = content.get('gridRenderer')
        renderer = content.get('gridRenderer') or content.get('expandedShelfContentsRenderer')
        if renderer:
            # TODO: add support for nested playlists so each shelf is processed
            # as separate playlist
@@ -2601,20 +2749,6 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                continue
            yield self._extract_video(renderer)
r""" # Not needed in the new implementation
def _itemSection_entries(self, item_sect_renderer):
for content in item_sect_renderer['contents']:
if not isinstance(content, dict):
continue
renderer = content.get('videoRenderer', {})
if not isinstance(renderer, dict):
continue
video_id = renderer.get('videoId')
if not video_id:
continue
yield self._extract_video(renderer)
"""
    def _rich_entries(self, rich_grid_renderer):
        renderer = try_get(
            rich_grid_renderer, lambda x: x['content']['videoRenderer'], dict) or {}
@@ -2713,7 +2847,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
        ctp = continuation_ep.get('clickTrackingParams')
        return YoutubeTabIE._build_continuation_query(continuation, ctp)

-   def _entries(self, tab, identity_token, item_id):
    def _entries(self, tab, item_id, identity_token, account_syncid):

        def extract_entries(parent_renderer):  # this needs to called again for continuation to work with feeds
            contents = try_get(parent_renderer, lambda x: x['contents'], list) or []
@@ -2773,6 +2907,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
        if identity_token:
            headers['x-youtube-identity-token'] = identity_token

        if account_syncid:
            headers['X-Goog-PageId'] = account_syncid
            headers['X-Goog-AuthUser'] = 0

        for page_num in itertools.count(1):
            if not continuation:
                break
@@ -2803,9 +2941,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
            else:
                # Youtube sometimes sends incomplete data
                # See: https://github.com/ytdl-org/youtube-dl/issues/28194
-               if response.get('continuationContents') or response.get('onResponseReceivedActions'):
                if dict_get(response,
                            ('continuationContents', 'onResponseReceivedActions', 'onResponseReceivedEndpoints')):
                    break
-               last_error = 'Incomplete data recieved'

                # Youtube may send alerts if there was an issue with the continuation page
                self._extract_alerts(response, expected=False)

                last_error = 'Incomplete data received'
                if count >= retries:
                    self._downloader.report_error(last_error)
@@ -2837,11 +2980,13 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                'gridPlaylistRenderer': (self._grid_entries, 'items'),
                'gridVideoRenderer': (self._grid_entries, 'items'),
                'playlistVideoRenderer': (self._playlist_entries, 'contents'),
-               'itemSectionRenderer': (self._playlist_entries, 'contents'),
                'itemSectionRenderer': (extract_entries, 'contents'),  # for feeds
                'richItemRenderer': (extract_entries, 'contents'),  # for hashtag
                'backstagePostThreadRenderer': (self._post_thread_continuation_entries, 'contents')
            }
            continuation_items = try_get(
-               response, lambda x: x['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'], list)
                response,
                lambda x: dict_get(x, ('onResponseReceivedActions', 'onResponseReceivedEndpoints'))[0]['appendContinuationItemsAction']['continuationItems'], list)
            continuation_item = try_get(continuation_items, lambda x: x[0], dict) or {}
            video_items_renderer = None
            for key, value in continuation_item.items():
@@ -2888,7 +3033,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
            try_get(owner, lambda x: x['navigationEndpoint']['browseEndpoint']['canonicalBaseUrl'], compat_str))
        return {k: v for k, v in uploader.items() if v is not None}

-   def _extract_from_tabs(self, item_id, webpage, data, tabs, identity_token):
    def _extract_from_tabs(self, item_id, webpage, data, tabs):
        playlist_id = title = description = channel_url = channel_name = channel_id = None
        thumbnails_list = tags = []
@@ -2952,16 +3097,41 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
            'channel_id': metadata['uploader_id'],
            'channel_url': metadata['uploader_url']})
        return self.playlist_result(
-           self._entries(selected_tab, identity_token, playlist_id),
            self._entries(
                selected_tab, playlist_id,
                self._extract_identity_token(webpage, item_id),
                self._extract_account_syncid(data)),
            **metadata)
def _extract_mix_playlist(self, playlist, playlist_id):
first_id = last_id = None
for page_num in itertools.count(1):
videos = list(self._playlist_entries(playlist))
if not videos:
return
start = next((i for i, v in enumerate(videos) if v['id'] == last_id), -1) + 1
if start >= len(videos):
return
for video in videos[start:]:
if video['id'] == first_id:
self.to_screen('First video %s found again; Assuming end of Mix' % first_id)
return
yield video
first_id = first_id or videos[0]['id']
last_id = videos[-1]['id']
_, data = self._extract_webpage(
'https://www.youtube.com/watch?list=%s&v=%s' % (playlist_id, last_id),
'%s page %d' % (playlist_id, page_num))
playlist = try_get(
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
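The Mix handling above has two stopping conditions worth spelling out: it stops when a page yields nothing new past the last seen video, and when the first video comes around again (Mix playlists repeat indefinitely). A self-contained sketch of that dedup loop over a fake, repeating playlist (not part of the diff):

    import itertools

    fake_pages = itertools.cycle([['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'a', 'b']])

    def mix_entries():
        first_id = last_id = None
        for page in fake_pages:
            # skip everything up to and including the last video already yielded
            start = next((i for i, v in enumerate(page) if v == last_id), -1) + 1
            if start >= len(page):
                return
            for video_id in page[start:]:
                if video_id == first_id:
                    return  # first video found again; assume end of Mix
                yield video_id
                first_id = first_id or video_id
                last_id = video_id

    print(list(mix_entries()))  # -> ['a', 'b', 'c', 'd', 'e']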
    def _extract_from_playlist(self, item_id, url, data, playlist):
        title = playlist.get('title') or try_get(
            data, lambda x: x['titleText']['simpleText'], compat_str)
        playlist_id = playlist.get('playlistId') or item_id
-       # Inline playlist rendition continuation does not always work
-       # at Youtube side, so delegating regular tab-based playlist URL
-       # processing whenever possible.

        # Delegating everything except mix playlists to regular tab-based playlist URL
        playlist_url = urljoin(url, try_get(
            playlist, lambda x: x['endpoint']['commandMetadata']['webCommandMetadata']['url'],
            compat_str))
@@ -2969,67 +3139,42 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
            return self.url_result(
                playlist_url, ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
                video_title=title)
-       return self.playlist_result(
-           self._playlist_entries(playlist), playlist_id=playlist_id,
-           playlist_title=title)
-
-   @staticmethod
-   def _extract_alerts(data):
-       for alert_dict in try_get(data, lambda x: x['alerts'], list) or []:
-           if not isinstance(alert_dict, dict):
-               continue
-           for renderer in alert_dict:
-               alert = alert_dict[renderer]
-               alert_type = alert.get('type')
-               if not alert_type:
-                   continue
-               message = try_get(alert, lambda x: x['text']['simpleText'], compat_str)
-               if message:
-                   yield alert_type, message
-               for run in try_get(alert, lambda x: x['text']['runs'], list) or []:
-                   message = try_get(run, lambda x: x['text'], compat_str)
-                   if message:
-                       yield alert_type, message
-
-   def _extract_identity_token(self, webpage, item_id):
-       ytcfg = self._extract_ytcfg(item_id, webpage)
-       if ytcfg:
-           token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
-           if token:
-               return token
-       return self._search_regex(
-           r'\bID_TOKEN["\']\s*:\s*["\'](.+?)["\']', webpage,
-           'identity token', default=None)
-
-   def _real_extract(self, url):
-       item_id = self._match_id(url)
-       url = compat_urlparse.urlunparse(
-           compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
-       is_home = re.match(r'(?P<pre>%s)(?P<post>/?(?![^#?]).*$)' % self._VALID_URL, url)
-       if is_home is not None and is_home.group('not_channel') is None and item_id != 'feed':
-           self._downloader.report_warning(
-               'A channel/user page was given. All the channel\'s videos will be downloaded. '
-               'To download only the videos in the home page, add a "/featured" to the URL')
-           url = '%s/videos%s' % (is_home.group('pre'), is_home.group('post') or '')
-       # Handle both video/playlist URLs
-       qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-       video_id = qs.get('v', [None])[0]
-       playlist_id = qs.get('list', [None])[0]
-       if is_home is not None and is_home.group('not_channel') is not None and is_home.group('not_channel').startswith('watch') and not video_id:
-           if playlist_id:
-               self._downloader.report_warning('%s is not a valid Youtube URL. Trying to download playlist %s' % (url, playlist_id))
-               url = 'https://www.youtube.com/playlist?list=%s' % playlist_id
-               # return self.url_result(playlist_id, ie=YoutubePlaylistIE.ie_key())
-           else:
-               raise ExtractorError('Unable to recognize tab page')
-       if video_id and playlist_id:
-           if self._downloader.params.get('noplaylist'):
-               self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
-               return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
-           self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))

        return self.playlist_result(
            self._extract_mix_playlist(playlist, playlist_id),
            playlist_id=playlist_id, playlist_title=title)

    def _extract_alerts(self, data, expected=False):

        def _real_extract_alerts():
            for alert_dict in try_get(data, lambda x: x['alerts'], list) or []:
                if not isinstance(alert_dict, dict):
                    continue
                for alert in alert_dict.values():
                    alert_type = alert.get('type')
                    if not alert_type:
                        continue
                    message = try_get(alert, lambda x: x['text']['simpleText'], compat_str)
                    if message:
                        yield alert_type, message
                    for run in try_get(alert, lambda x: x['text']['runs'], list) or []:
                        message = try_get(run, lambda x: x['text'], compat_str)
                        if message:
                            yield alert_type, message

        err_msg = None
        for alert_type, alert_message in _real_extract_alerts():
            if alert_type.lower() == 'error':
                if err_msg:
                    self._downloader.report_warning('YouTube said: %s - %s' % ('ERROR', err_msg))
                err_msg = alert_message
            else:
                self._downloader.report_warning('YouTube said: %s - %s' % (alert_type, alert_message))

        if err_msg:
            raise ExtractorError('YouTube said: %s' % err_msg, expected=expected)

    def _extract_webpage(self, url, item_id):
        retries = self._downloader.params.get('extractor_retries', 3)
        count = -1
        last_error = 'Incomplete yt initial data recieved'
@@ -3041,40 +3186,67 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                self.report_warning('%s. Retrying ...' % last_error)
            webpage = self._download_webpage(
                url, item_id,
-               'Downloading webpage%s' % ' (retry #%d)' % count if count else '')
                'Downloading webpage%s' % (' (retry #%d)' % count if count else ''))
-           identity_token = self._extract_identity_token(webpage, item_id)
            data = self._extract_yt_initial_data(item_id, webpage)
-           err_msg = None
-           for alert_type, alert_message in self._extract_alerts(data):
-               if alert_type.lower() == 'error':
-                   if err_msg:
-                       self._downloader.report_warning('YouTube said: %s - %s' % ('ERROR', err_msg))
-                   err_msg = alert_message
-               else:
-                   self._downloader.report_warning('YouTube said: %s - %s' % (alert_type, alert_message))
-           if err_msg:
-               raise ExtractorError('YouTube said: %s' % err_msg, expected=True)
            self._extract_alerts(data, expected=True)
            if data.get('contents') or data.get('currentVideoEndpoint'):
                break
            if count >= retries:
                self._downloader.report_error(last_error)
        return webpage, data

    def _real_extract(self, url):
        item_id = self._match_id(url)
        url = compat_urlparse.urlunparse(
            compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
        # This is not matched in a channel page with a tab selected
        mobj = re.match(r'(?P<pre>%s)(?P<post>/?(?![^#?]).*$)' % self._VALID_URL, url)
        mobj = mobj.groupdict() if mobj else {}
        if mobj and not mobj.get('not_channel'):
            self._downloader.report_warning(
                'A channel/user page was given. All the channel\'s videos will be downloaded. '
                'To download only the videos in the home page, add a "/featured" to the URL')
            url = '%s/videos%s' % (mobj.get('pre'), mobj.get('post') or '')

        # Handle both video/playlist URLs
        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
        video_id = qs.get('v', [None])[0]
        playlist_id = qs.get('list', [None])[0]

        if not video_id and (mobj.get('not_channel') or '').startswith('watch'):
            if not playlist_id:
                # If there is neither video or playlist ids,
                # youtube redirects to home page, which is undesirable
                raise ExtractorError('Unable to recognize tab page')
            self._downloader.report_warning('A video URL was given without video ID. Trying to download playlist %s' % playlist_id)
            url = 'https://www.youtube.com/playlist?list=%s' % playlist_id

        if video_id and playlist_id:
            if self._downloader.params.get('noplaylist'):
                self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
                return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
            self.to_screen('Downloading playlist %s; add --no-playlist to just download video %s' % (playlist_id, video_id))

        webpage, data = self._extract_webpage(url, item_id)

        tabs = try_get(
            data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
        if tabs:
-           return self._extract_from_tabs(item_id, webpage, data, tabs, identity_token)
            return self._extract_from_tabs(item_id, webpage, data, tabs)

        playlist = try_get(
            data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
        if playlist:
            return self._extract_from_playlist(item_id, url, data, playlist)

-       # Fallback to video extraction if no playlist alike page is recognized.
-       # First check for the current video then try the v attribute of URL query.
        video_id = try_get(
            data, lambda x: x['currentVideoEndpoint']['watchEndpoint']['videoId'],
            compat_str) or video_id
        if video_id:
            self._downloader.report_warning('Unable to recognize playlist. Downloading just video %s' % video_id)
            return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)

-       # Failed to recognize
        raise ExtractorError('Unable to recognize tab page')
@@ -3338,7 +3510,6 @@ class YoutubeFeedsInfoExtractor(YoutubeTabIE):
    Subclasses must define the _FEED_NAME property.
    """
    _LOGIN_REQUIRED = True
-   # _MAX_PAGES = 5
    _TESTS = []

    @property

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    int_or_none,
    parse_age_limit,
@@ -16,24 +17,34 @@ from ..utils import (
class Zee5IE(InfoExtractor):
-   _VALID_URL = r'https?://(?:www\.)?zee5\.com/[^#?]*/(?P<display_id>[-\w]+)/(?P<id>[-\d]+)'
    _VALID_URL = r'''(?x)
                     (?:
                         zee5:|
                         (?:https?://)(?:www\.)?zee5\.com/(?:[^#?]+/)?
                         (?:
                             (?:tvshows|kids|zee5originals)(?:/[^#/?]+){3}
                             |movies/[^#/?]+
                         )/(?P<display_id>[^#/?]+)/
                     )
                     (?P<id>[^#/?]+)/?(?:$|[?#])
                     '''
    _TESTS = [{
        'url': 'https://www.zee5.com/movies/details/krishna-the-birth/0-0-63098',
        'info_dict': {
-           "id": "0-0-63098",
-           "ext": "m3u8",
-           "display_id": "krishna-the-birth",
-           "title": "Krishna - The Birth",
-           "duration": 4368,
-           "average_rating": 4,
-           "description": str,
-           "alt_title": "Krishna - The Birth",
-           "uploader": "Zee Entertainment Enterprises Ltd",
-           "release_date": "20060101",
-           "upload_date": "20060101",
-           "timestamp": 1136073600,
-           "thumbnail": "https://akamaividz.zee5.com/resources/0-0-63098/list/270x152/0063098_list_80888170.jpg",
-           "tags": list
            'id': '0-0-63098',
            'ext': 'mp4',
            'display_id': 'krishna-the-birth',
            'title': 'Krishna - The Birth',
            'duration': 4368,
            'average_rating': 4,
            'description': str,
            'alt_title': 'Krishna - The Birth',
            'uploader': 'Zee Entertainment Enterprises Ltd',
            'release_date': '20060101',
            'upload_date': '20060101',
            'timestamp': 1136073600,
            'thumbnail': 'https://akamaividz.zee5.com/resources/0-0-63098/list/270x152/0063098_list_80888170.jpg',
            'tags': list
        },
        'params': {
            'format': 'bv',
@@ -41,37 +52,43 @@ class Zee5IE(InfoExtractor):
    }, {
        'url': 'https://zee5.com/tvshows/details/krishna-balram/0-6-1871/episode-1-the-test-of-bramha/0-1-233402',
        'info_dict': {
-           "id": "0-1-233402",
-           'ext': 'm3u8',
-           "display_id": "episode-1-the-test-of-bramha",
-           "title": "Episode 1 - The Test Of Bramha",
-           "duration": 1336,
-           "average_rating": 4,
-           "description": str,
-           "alt_title": "Episode 1 - The Test Of Bramha",
-           "uploader": "Green Gold",
-           "release_date": "20090101",
-           "upload_date": "20090101",
-           "timestamp": 1230768000,
-           "thumbnail": "https://akamaividz.zee5.com/resources/0-1-233402/list/270x152/01233402_list.jpg",
-           "series": "Krishna Balram",
-           "season_number": 1,
-           "episode_number": 1,
-           "tags": list,
            'id': '0-1-233402',
            'ext': 'mp4',
            'display_id': 'episode-1-the-test-of-bramha',
            'title': 'Episode 1 - The Test Of Bramha',
            'duration': 1336,
            'average_rating': 4,
            'description': str,
            'alt_title': 'Episode 1 - The Test Of Bramha',
            'uploader': 'Green Gold',
            'release_date': '20090101',
            'upload_date': '20090101',
            'timestamp': 1230768000,
            'thumbnail': 'https://akamaividz.zee5.com/resources/0-1-233402/list/270x152/01233402_list.jpg',
            'series': 'Krishna Balram',
            'season_number': 1,
            'episode_number': 1,
            'tags': list,
        },
        'params': {
            'format': 'bv',
        },
    }, {
        'url': 'https://www.zee5.com/hi/tvshows/details/kundali-bhagya/0-6-366/kundali-bhagya-march-08-2021/0-1-manual_7g9jv1os7730?country=IN',
        'only_matching': True
    }, {
        'url': 'https://www.zee5.com/global/hi/tvshows/details/kundali-bhagya/0-6-366/kundali-bhagya-march-08-2021/0-1-manual_7g9jv1os7730',
        'only_matching': True
    }]

    def _real_extract(self, url):
        video_id, display_id = re.match(self._VALID_URL, url).group('id', 'display_id')
        access_token_request = self._download_json(
            'https://useraction.zee5.com/token/platform_tokens.php?platform_name=web_app',
-           video_id, note="Downloading access token")
            video_id, note='Downloading access token')
        token_request = self._download_json(
            'https://useraction.zee5.com/tokennd',
-           video_id, note="Downloading video token")
            video_id, note='Downloading video token')
        json_data = self._download_json(
            'https://gwapi.zee5.com/content/details/{}?translation=en&country=IN'.format(video_id),
            video_id, headers={'X-Access-Token': access_token_request['token']})
@@ -111,3 +128,78 @@ class Zee5IE(InfoExtractor):
            'episode_number': int_or_none(try_get(json_data, lambda x: x['index'])),
            'tags': try_get(json_data, lambda x: x['tags'], list)
        }
class Zee5SeriesIE(InfoExtractor):
IE_NAME = 'zee5:series'
_VALID_URL = r'''(?x)
(?:
zee5:series:|
(?:https?://)(?:www\.)?zee5\.com/(?:[^#?]+/)?
(?:tvshows|kids|zee5originals)(?:/[^#/?]+){2}/
)
(?P<id>[^#/?]+)/?(?:$|[?#])
'''
_TESTS = [{
'url': 'https://www.zee5.com/kids/kids-shows/krishna-balram/0-6-1871',
'playlist_mincount': 43,
'info_dict': {
'id': '0-6-1871',
},
}, {
'url': 'https://www.zee5.com/tvshows/details/bhabi-ji-ghar-par-hai/0-6-199',
'playlist_mincount': 1500,
'info_dict': {
'id': '0-6-199',
},
}, {
'url': 'https://www.zee5.com/tvshows/details/agent-raghav-crime-branch/0-6-965',
'playlist_mincount': 25,
'info_dict': {
'id': '0-6-965',
},
}, {
'url': 'https://www.zee5.com/ta/tvshows/details/nagabhairavi/0-6-3201',
'playlist_mincount': 3,
'info_dict': {
'id': '0-6-3201',
},
}, {
'url': 'https://www.zee5.com/global/hi/tvshows/details/khwaabon-ki-zamin-par/0-6-270',
'playlist_mincount': 150,
'info_dict': {
'id': '0-6-270',
},
}
]
def _entries(self, show_id):
access_token_request = self._download_json(
'https://useraction.zee5.com/token/platform_tokens.php?platform_name=web_app',
show_id, note='Downloading access token')
headers = {
'X-Access-Token': access_token_request['token'],
'Referer': 'https://www.zee5.com/',
}
show_url = 'https://gwapi.zee5.com/content/tvshow/{}?translation=en&country=IN'.format(show_id)
page_num = 0
show_json = self._download_json(show_url, video_id=show_id, headers=headers)
for season in show_json.get('seasons') or []:
season_id = try_get(season, lambda x: x['id'], compat_str)
next_url = 'https://gwapi.zee5.com/content/tvshow/?season_id={}&type=episode&translation=en&country=IN&on_air=false&asset_subtype=tvshow&page=1&limit=100'.format(season_id)
while next_url:
page_num += 1
episodes_json = self._download_json(
next_url, video_id=show_id, headers=headers,
note='Downloading JSON metadata page %d' % page_num)
for episode in try_get(episodes_json, lambda x: x['episode'], list) or []:
video_id = episode.get('id')
yield self.url_result(
'zee5:%s' % video_id,
ie=Zee5IE.ie_key(), video_id=video_id)
next_url = url_or_none(episodes_json.get('next_episode_api'))
def _real_extract(self, url):
show_id = self._match_id(url)
return self.playlist_result(self._entries(show_id), playlist_id=show_id)
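The series extractor above walks each season and keeps requesting `next_episode_api` until the API stops returning one. A minimal standalone sketch of that pagination pattern (the `fetch_json` helper and the headers are stand-ins for `_download_json` and the token handling, not part of the extractor):

```python
import json
from urllib.request import Request, urlopen

def fetch_json(url, headers):
    # stand-in for InfoExtractor._download_json
    return json.load(urlopen(Request(url, headers=headers)))

def iter_episode_ids(first_page_url, headers):
    next_url = first_page_url
    while next_url:
        page = fetch_json(next_url, headers)
        for episode in page.get('episode') or []:
            yield episode.get('id')
        # stop once the API no longer advertises a next page
        next_url = page.get('next_episode_api') or None
```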


@@ -214,12 +214,11 @@ def parseOpts(overrideArguments=None):
help='Mark videos watched (YouTube only)')
general.add_option(
'--no-mark-watched',
-action='store_false', dest='mark_watched', default=False,
+action='store_false', dest='mark_watched',
-help='Do not mark videos watched')
+help='Do not mark videos watched (default)')
general.add_option(
'--no-colors',
-action='store_true', dest='no_color',
-default=False,
+action='store_true', dest='no_color', default=False,
help='Do not emit color codes in output')
network = optparse.OptionGroup(parser, 'Network Options')
@@ -558,6 +557,10 @@ def parseOpts(overrideArguments=None):
help='Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags')
downloader = optparse.OptionGroup(parser, 'Download Options')
downloader.add_option(
'-N', '--concurrent-fragments',
dest='concurrent_fragment_downloads', metavar='N', default=1, type=int,
help='Number of fragments to download concurrently (default is %default)')
downloader.add_option(
'-r', '--limit-rate', '--rate-limit',
dest='ratelimit', metavar='RATE',
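The new `-N`/`--concurrent-fragments` option feeds `concurrent_fragment_downloads` to the fragment downloader. The general idea (not the downloader's actual code) is a worker pool over the fragment list, roughly:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    with urlopen(url) as resp:
        return resp.read()

# placeholder URLs; a real fragment list comes from the HLS/DASH manifest
fragment_urls = ['https://example.com/frag%d.ts' % i for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:  # -N 4
    fragments = list(pool.map(fetch, fragment_urls))  # preserves fragment order
```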
@@ -1087,8 +1090,8 @@ def parseOpts(overrideArguments=None):
'The supported executables are: SponSkrub, FFmpeg, FFprobe, and AtomicParsley. '
'You can also specify "PP+EXE:ARGS" to give the arguments to the specified executable '
'only when being used by the specified postprocessor. Additionally, for ffmpeg/ffprobe, '
-'a number can be appended to the exe name seperated by "_i" to pass the argument '
-'before the specified input file. Eg: --ppa "Merger+ffmpeg_i1:-v quiet". '
+'"_i"/"_o" can be appended to the prefix optionally followed by a number to pass the argument '
+'before the specified input/output file. Eg: --ppa "Merger+ffmpeg_i1:-v quiet". '
'You can use this option multiple times to give different arguments to different '
'postprocessors. (Alias: --ppa)'))
postproc.add_option(
@@ -1154,7 +1157,7 @@ def parseOpts(overrideArguments=None):
help='Write metadata to the video file\'s xattrs (using dublin core and xdg standards)')
postproc.add_option(
'--fixup',
-metavar='POLICY', dest='fixup', default='detect_or_warn',
+metavar='POLICY', dest='fixup', default=None,
help=(
'Automatically correct known faults of the file. '
'One of never (do nothing), warn (only emit a warning), '
@@ -1179,6 +1182,17 @@ def parseOpts(overrideArguments=None):
'--convert-subs', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,
help='Convert the subtitles to other format (currently supported: srt|ass|vtt|lrc)')
postproc.add_option(
'--split-chapters', '--split-tracks',
dest='split_chapters', action='store_true', default=False,
help=(
'Split video into multiple files based on internal chapters. '
'The "chapter:" prefix can be used with "--paths" and "--output" to '
'set the output filename for the split files. See "OUTPUT TEMPLATE" for details'))
postproc.add_option(
'--no-split-chapters', '--no-split-tracks',
dest='split_chapters', action='store_false',
help='Do not split video based on chapters (default)')
sponskrub = optparse.OptionGroup(parser, 'SponSkrub (SponsorBlock) Options', description=(
'SponSkrub (https://github.com/yt-dlp/SponSkrub) is a utility to mark/remove sponsor segments '
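If these options are used through the embedded API instead of the CLI, they map onto `YoutubeDL` params roughly as below (the exact keys are assumptions based on the `dest` names above and the new `chapter` output-template key; values are examples only):

```python
import yt_dlp

ydl_opts = {
    'concurrent_fragment_downloads': 4,                   # -N 4
    'postprocessors': [{'key': 'FFmpegSplitChapters'}],   # --split-chapters
    'outtmpl': {'chapter': '%(section_number)02d - %(section_title)s.%(ext)s'},
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```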


@@ -13,6 +13,7 @@ from .ffmpeg import (
FFmpegVideoConvertorPP,
FFmpegVideoRemuxerPP,
FFmpegSubtitlesConvertorPP,
FFmpegSplitChaptersPP,
)
from .xattrpp import XAttrMetadataPP
from .execafterdownload import ExecAfterDownloadPP
@@ -31,6 +32,7 @@ __all__ = [
'ExecAfterDownloadPP',
'FFmpegEmbedSubtitlePP',
'FFmpegExtractAudioPP',
'FFmpegSplitChaptersPP',
'FFmpegFixupM3u8PP',
'FFmpegFixupM4aPP',
'FFmpegFixupStretchedPP',


@@ -91,10 +91,18 @@ class PostProcessor(object):
except Exception:
self.report_warning(errnote)
-def _configuration_args(self, *args, **kwargs):
+def _configuration_args(self, exe, keys=None, default=[], use_compat=True):
pp_key = self.pp_key().lower()
exe = exe.lower()
root_key = exe if pp_key == exe else '%s+%s' % (pp_key, exe)
keys = ['%s%s' % (root_key, k) for k in (keys or [''])]
if root_key in keys:
keys += [root_key] + ([] if pp_key == exe else [(self.pp_key(), exe)]) + ['default']
else:
use_compat = False
return cli_configuration_args(
self._downloader.params.get('postprocessor_args'),
-self.pp_key().lower(), *args, **kwargs)
+keys, default, use_compat)
class AudioConversionError(PostProcessingError):
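The rewritten helpers resolve `--postprocessor-args` by trying a list of increasingly generic keys and falling back to `default`. A toy re-implementation of just that lookup order (not the actual yt-dlp helper, which can also merge several matching keys):

```python
def lookup_args(argdict, keys, default=[]):
    # return the args of the first key that is present in the dict
    for key in keys:
        args = argdict.get(key.lower())
        if args is not None:
            return args
    return default

post_args = {'merger+ffmpeg_i1': ['-v', 'quiet'], 'ffmpeg': ['-hide_banner']}
print(lookup_args(post_args, ['Merger+ffmpeg_i1', 'Merger+ffmpeg', 'ffmpeg', 'default']))
# ['-v', 'quiet']
```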


@@ -85,6 +85,8 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
thumbnail_filename = thumbnail_jpg_filename
thumbnail_ext = 'jpg'
mtime = os.stat(encodeFilename(filename)).st_mtime
success = True
if info['ext'] == 'mp3':
options = [
@@ -139,7 +141,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
encodeFilename(thumbnail_filename, True),
encodeArgument('-o'),
encodeFilename(temp_filename, True)]
-cmd += [encodeArgument(o) for o in self._configuration_args(exe='AtomicParsley')]
+cmd += [encodeArgument(o) for o in self._configuration_args('AtomicParsley')]
self.to_screen('Adding thumbnail to "%s"' % filename)
self.write_debug('AtomicParsley command line: %s' % shell_quote(cmd))
@@ -187,6 +189,8 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
self.try_utime(filename, mtime, mtime)
files_to_delete = [thumbnail_filename]
if self._already_have_thumbnail:
info['__files_to_move'][original_thumbnail] = replace_extension(
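The two added lines preserve the file's modification time across the thumbnail embedding: record `st_mtime` before the remux and restore it afterwards. The pattern in isolation (file names are placeholders):

```python
import os
import shutil

src = 'video.mp4'
open(src, 'ab').close()             # placeholder file so the sketch runs standalone
mtime = os.stat(src).st_mtime       # remember the original timestamp
shutil.copy(src, 'video.temp.mp4')  # stand-in for the ffmpeg/AtomicParsley step
os.replace('video.temp.mp4', src)
os.utime(src, (mtime, mtime))       # restore atime/mtime, like try_utime does
```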


@@ -10,6 +10,7 @@ import json
from .common import AudioConversionError, PostProcessor
from ..compat import compat_str
from ..utils import (
encodeArgument,
encodeFilename,
@@ -234,25 +235,35 @@ class FFmpegPostProcessor(PostProcessor):
return num, len(streams)
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
return self.real_run_ffmpeg(
[(path, []) for path in input_paths],
[(out_path, opts)])
def real_run_ffmpeg(self, input_path_opts, output_path_opts):
self.check_version()
oldest_mtime = min(
-os.stat(encodeFilename(path)).st_mtime for path in input_paths)
+os.stat(encodeFilename(path)).st_mtime for path, _ in input_path_opts)
cmd = [encodeFilename(self.executable, True), encodeArgument('-y')]
# avconv does not have repeat option
if self.basename == 'ffmpeg':
cmd += [encodeArgument('-loglevel'), encodeArgument('repeat+info')]
-def make_args(file, pre=[], post=[], *args, **kwargs):
-args = pre + self._configuration_args(*args, **kwargs) + post
+def make_args(file, args, name, number):
+keys = ['_%s%d' % (name, number), '_%s' % name]
+if name == 'o' and number == 1:
+keys.append('')
+args += self._configuration_args(self.basename, keys)
+if name == 'i':
+args.append('-i')
return (
-[encodeArgument(o) for o in args]
+[encodeArgument(arg) for arg in args]
+ [encodeFilename(self._ffmpeg_filename_argument(file), True)])
-for i, path in enumerate(input_paths):
-cmd += make_args(path, post=['-i'], exe='%s_i%d' % (self.basename, i + 1), use_default_arg=False)
-cmd += make_args(out_path, pre=opts, exe=self.basename)
+for arg_type, path_opts in (('i', input_path_opts), ('o', output_path_opts)):
+cmd += [arg for i, o in enumerate(path_opts)
+for arg in make_args(o[0], o[1], arg_type, i + 1)]
self.write_debug('ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
@@ -262,7 +273,8 @@ class FFmpegPostProcessor(PostProcessor):
if self.get_param('verbose', False):
self.report_error(stderr)
raise FFmpegPostProcessorError(stderr.split('\n')[-1])
-self.try_utime(out_path, oldest_mtime, oldest_mtime)
+for out_path, _ in output_path_opts:
+self.try_utime(out_path, oldest_mtime, oldest_mtime)
return stderr.decode('utf-8', 'replace')
def run_ffmpeg(self, path, out_path, opts):
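With the refactor, `real_run_ffmpeg` takes `(path, opts)` pairs for both inputs and outputs; per-file options are emitted before their file, and `-i` before each input. A plain sketch of how the pairs are meant to map onto a command line (not using the class itself):

```python
input_path_opts = [('in1.mp4', ['-ss', '10']), ('in2.m4a', [])]
output_path_opts = [('out.mkv', ['-c', 'copy'])]

cmd = ['ffmpeg', '-y']
for path, opts in input_path_opts:
    cmd += opts + ['-i', path]   # input options precede their -i <file>
for path, opts in output_path_opts:
    cmd += opts + [path]         # output options precede the output file
print(' '.join(cmd))
# ffmpeg -y -ss 10 -i in1.mp4 -i in2.m4a -c copy out.mkv
```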
@@ -758,3 +770,40 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
}
return sub_filenames, info
class FFmpegSplitChaptersPP(FFmpegPostProcessor):
def _prepare_filename(self, number, chapter, info):
info = info.copy()
info.update({
'section_number': number,
'section_title': chapter.get('title'),
'section_start': chapter.get('start_time'),
'section_end': chapter.get('end_time'),
})
return self._downloader.prepare_filename(info, 'chapter')
def _ffmpeg_args_for_chapter(self, number, chapter, info):
destination = self._prepare_filename(number, chapter, info)
if not self._downloader._ensure_dir_exists(encodeFilename(destination)):
return
chapter['_filename'] = destination
self.to_screen('Chapter %03d; Destination: %s' % (number, destination))
return (
destination,
['-ss', compat_str(chapter['start_time']),
'-to', compat_str(chapter['end_time'])])
def run(self, info):
chapters = info.get('chapters') or []
if not chapters:
self.report_warning('There are no tracks to extract')
return [], info
self.to_screen('Splitting video by chapters; %d chapters found' % len(chapters))
for idx, chapter in enumerate(chapters):
destination, opts = self._ffmpeg_args_for_chapter(idx + 1, chapter, info)
self.real_run_ffmpeg([(info['filepath'], opts)], [(destination, ['-c', 'copy'])])
return [], info
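Per chapter, the splitter above effectively runs a stream copy with `-ss`/`-to` given as input options. Roughly the resulting command for one chapter (illustrative paths, timestamps and output name; assumes an ffmpeg build that accepts `-to` before `-i`):

```python
import subprocess

chapter = {'start_time': 0, 'end_time': 30, 'title': 'Intro'}
cmd = [
    'ffmpeg', '-y',
    '-ss', str(chapter['start_time']), '-to', str(chapter['end_time']),
    '-i', 'input.mp4',
    '-c', 'copy', 'Intro - 001 [xyz].mp4',
]
subprocess.run(cmd, check=True)
```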


@@ -71,7 +71,7 @@ class SponSkrubPP(PostProcessor):
if not self.cutout:
cmd += ['-chapter']
cmd += compat_shlex_split(self.args) # For backward compatibility
-cmd += self._configuration_args(exe=self._exe_name, use_default_arg='no_compat')
+cmd += self._configuration_args(self._exe_name, use_compat=False)
cmd += ['--', information['id'], filename, temp_filename]
cmd = [encodeArgument(i) for i in cmd]


@@ -49,12 +49,16 @@ def update_self(to_screen, verbose, opener):
h.update(mv[:n])
return h.hexdigest()
-to_screen('Current Build Hash %s' % calc_sha256sum(sys.executable))
if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, 'frozen'):
to_screen('It looks like you installed yt-dlp with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
+# sys.executable is set to the full pathname of the exe-file for py2exe
+# though symlinks are not followed so that we need to do this manually
+# with help of realpath
+filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
+to_screen('Current Build Hash %s' % calc_sha256sum(filename))
# Download and check versions info
try:
version_info = opener.open(JSON_URL).read().decode('utf-8')
@@ -103,11 +107,6 @@ def update_self(to_screen, verbose, opener):
(i[1] for i in hashes if i[0] == 'yt-dlp%s' % label),
None)
-# sys.executable is set to the full pathname of the exe-file for py2exe
-# though symlinks are not followed so that we need to do this manually
-# with help of realpath
-filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
if not os.access(filename, os.W_OK):
to_screen('ERROR: no write permissions on %s' % filename)
return
@@ -198,28 +197,18 @@ def update_self(to_screen, verbose, opener):
to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
return
+expected_sum = get_sha256sum('zip', py_ver)
+if expected_sum and hashlib.sha256(newcontent).hexdigest() != expected_sum:
+to_screen('ERROR: unable to verify the new zip')
+to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
+return
try:
-with open(filename + '.new', 'wb') as outf:
+with open(filename, 'wb') as outf:
outf.write(newcontent)
except (IOError, OSError):
if verbose:
to_screen(encode_compat_str(traceback.format_exc()))
-to_screen('ERROR: unable to write the new version')
-return
-expected_sum = get_sha256sum('zip', py_ver)
-if expected_sum and calc_sha256sum(filename + '.new') != expected_sum:
-to_screen('ERROR: unable to verify the new zip')
-to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
-try:
-os.remove(filename + '.new')
-except OSError:
-to_screen('ERROR: unable to remove corrupt zip')
-return
-try:
-os.rename(filename + '.new', filename)
-except OSError:
to_screen('ERROR: unable to overwrite current version')
return
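The updater now hashes the downloaded bytes in memory and only then overwrites the binary in place, instead of writing a `.new` file, verifying it on disk and renaming it. The verify-before-write pattern in isolation:

```python
import hashlib

def write_if_valid(path, content, expected_sha256=None):
    # refuse to touch the existing file if the downloaded bytes do not verify
    if expected_sha256 and hashlib.sha256(content).hexdigest() != expected_sha256:
        raise ValueError('checksum mismatch, refusing to overwrite %s' % path)
    with open(path, 'wb') as f:
        f.write(content)
```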


@@ -4182,8 +4182,10 @@ def qualities(quality_ids):
DEFAULT_OUTTMPL = {
'default': '%(title)s [%(id)s].%(ext)s',
'chapter': '%(title)s - %(section_number)03d %(section_title)s [%(id)s].%(ext)s',
}
OUTTMPL_TYPES = {
'chapter': None,
'subtitle': None,
'thumbnail': None,
'description': 'description',
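The new `chapter` template is expanded with the `section_*` fields that `FFmpegSplitChaptersPP` injects into the info dict. A quick expansion with made-up values:

```python
tmpl = '%(title)s - %(section_number)03d %(section_title)s [%(id)s].%(ext)s'
fields = {'title': 'Lecture', 'id': 'abc123', 'ext': 'mp4',
          'section_number': 2, 'section_title': 'Sorting'}
print(tmpl % fields)  # Lecture - 002 Sorting [abc123].mp4
```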
@@ -4692,36 +4694,26 @@ def cli_valueless_option(params, command_option, param, expected_value=True):
return [command_option] if param == expected_value else []
-def cli_configuration_args(argdict, key, default=[], exe=None, use_default_arg=True):
-# use_default_arg can be True, False, or 'no_compat'
+def cli_configuration_args(argdict, keys, default=[], use_compat=True):
if isinstance(argdict, (list, tuple)): # for backward compatibility
-if use_default_arg is True:
+if use_compat:
return argdict
else:
argdict = None
if argdict is None:
return default
assert isinstance(argdict, dict)
-key = key.lower()
-args = exe_args = None
-if exe is not None:
-assert isinstance(exe, compat_str)
-exe = exe.lower()
-args = argdict.get('%s+%s' % (key, exe))
-if args is None:
-exe_args = argdict.get(exe)
-if args is None:
-args = argdict.get(key) if key != exe else None
-if args is None and exe_args is None:
-args = argdict.get('default', default) if use_default_arg else default
-args, exe_args = args or [], exe_args or []
-assert isinstance(args, (list, tuple))
-assert isinstance(exe_args, (list, tuple))
-return args + exe_args
+assert isinstance(keys, (list, tuple))
+for key_list in keys:
+if isinstance(key_list, compat_str):
+key_list = (key_list,)
+arg_list = list(filter(
+lambda x: x is not None,
+[argdict.get(key.lower()) for key in key_list]))
+if arg_list:
+return [arg for args in arg_list for arg in args]
+return default
class ISO639Utils(object):


@@ -1,3 +1,3 @@
from __future__ import unicode_literals
-__version__ = '2021.03.01'
+__version__ = '2021.03.07'