1
0
mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-12 09:51:15 +00:00

Compare commits

..

32 Commits

Author SHA1 Message Date
github-actions
9fd03a1696 [version] update
Created by: pukkandan

:ci skip all :ci run dl
2022-08-14 22:18:33 +00:00
pukkandan
55937202b7 Release 2022.08.14 2022-08-15 03:45:12 +05:30
pukkandan
1e4fca9a87 [cleanup] Misc 2022-08-15 03:41:45 +05:30
pukkandan
49b4ceaedf [jsinterp] Bring or-par with youtube-dl
Partially cherry-picked from: d231b56717

Authored by pukkandan, dirkf
2022-08-15 03:31:49 +05:30
pukkandan
d711839760 Update to ytdl-commit-e6a836d
[core] Make `--max-downloads ...` stop immediately on reaching the limit
e6a836d54c
2022-08-15 03:31:48 +05:30
pukkandan
48732becfe Fix bug in 1155ecef29 2022-08-15 03:31:48 +05:30
pukkandan
6440c45ff3 [update] Copy bitmask from old binary
Improves a6125983ab

Authored by: Lesmiscore
2022-08-15 03:31:47 +05:30
masta79
ef6342bd07 [extractor/toggo] Improve _VALID_URL (#4663)
Authored by: masta79
2022-08-15 03:31:41 +05:30
ischmidt20
e183bb8c9b [extractor/MLB] New extractor (#4586)
Authored by: ischmidt20
2022-08-15 01:47:18 +05:30
HobbyistDev
7695f5a0a7 [extractor/moview] Add extractor (#4607)
Authored by: HobbyistDev
2022-08-15 01:39:05 +05:30
Ben Welsh
cb7cc448c0 [extractor/truth] Add extractor (#4609)
Closes #3865
Authored by: palewire
2022-08-15 01:36:04 +05:30
bashonly
63be30e3e0 [extractor/facebook] Add reel support (#4660)
Closes #4039 
Authored by: bashonly
2022-08-15 01:33:24 +05:30
Ben Welsh
43cf982ac3 [extractor/parler] Add extractor (#4616)
Authored by: palewire
2022-08-15 01:31:16 +05:30
nixxo
7e82397441 [extractor/rai] Misc fixes (#4600)
Authored by: nixxo
2022-08-15 01:17:55 +05:30
Aldo Ridhoni
66c4afd828 [extractor/doodstream] Add wf domain (#4648)
Authored by: aldoridhoni
2022-08-15 01:13:03 +05:30
pukkandan
0e0ce898f6 [ThumbnailsConvertor] Fix conversion after fixup_webp
Closes #4565
2022-08-14 20:34:55 +05:30
pukkandan
a6125983ab [update] Set executable bit-mask
Closes #4621
2022-08-14 19:22:35 +05:30
pukkandan
8f84770acd [utils] Fix get_compatible_ext
Closes #4647
2022-08-14 19:22:34 +05:30
Lesmiscore
62b58c0936 [docs] Consistent use of e.g. (#4643)
Authored by: Lesmiscore
2022-08-14 17:34:13 +05:30
pukkandan
8f53dc44a0 [jsinterp] Handle new youtube signature functions
Closes #4635
2022-08-14 05:12:32 +05:30
Jacob Truman
1cddfdc52b [extractor/aenetworks] Add formats parameter (#4645)
Closes #4047
Authored by: jacobtruman
2022-08-13 22:56:41 +05:30
coletdjnz
cea4b857f0 [patreon] Ignore erroneous media attachments (#4638)
Fixes https://github.com/yt-dlp/yt-dlp/issues/4608
Authored by: coletdjnz
2022-08-13 00:25:20 +00:00
shirt
ffcd62c289 [extractor/tubitv] Extract additional formats (#4646)
Authored by: shirt-dev
2022-08-13 05:10:49 +05:30
pukkandan
a1c5bd82ec [jsinterp] Truncate error messages
Related: #4635
2022-08-12 19:15:16 +05:30
pukkandan
5da42f2b9b [extractor/crunchyroll] Improve _VALID_URLs
Closes #4633
2022-08-12 13:13:11 +05:30
pukkandan
1155ecef29 [extractor/zattoo] Fix resellers
Fixes #4630
2022-08-12 12:53:46 +05:30
pukkandan
96623ab5c6 [devscripts] Fix import
Closes #4603
2022-08-11 07:23:48 +05:30
pukkandan
7e798d725e [extractor] Fix format sorting of channels 2022-08-11 07:23:46 +05:30
pukkandan
8420a4d063 [ffmpeg] Smarter detection of ffprobe filename 2022-08-11 07:23:45 +05:30
pukkandan
b5e9a641f5 [postprocessor/embedthumbnail] Detect libatomicparsley.so 2022-08-11 07:23:36 +05:30
pukkandan
c220d9efc8 [ffmpeg] Disable avconv unless --prefer-avconv 2022-08-09 05:15:38 +05:30
pukkandan
81e0195998 [build] Fix changelog
Bug in c4b6c5c7c9
2022-08-09 03:58:29 +05:30
50 changed files with 1118 additions and 422 deletions

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting a broken site
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -62,7 +62,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -70,8 +70,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting a new site support request
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -74,7 +74,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -82,8 +82,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm requesting a site-specific feature
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -70,7 +70,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -78,8 +78,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting a bug unrelated to a specific site
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -55,7 +55,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -63,8 +63,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell
validations:

View File

@@ -20,7 +20,7 @@ body:
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
@@ -51,7 +51,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -59,7 +59,7 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell

View File

@@ -26,7 +26,7 @@ body:
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've verified that I'm running yt-dlp version **2022.08.08** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.08.14** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
@@ -57,7 +57,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.08.08 [9d339c4] (win32_exe)
[debug] yt-dlp version 2022.08.14 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -65,7 +65,7 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.08.08, Current version: 2022.08.08
yt-dlp is up to date (2022.08.08)
Latest version: 2022.08.14, Current version: 2022.08.14
yt-dlp is up to date (2022.08.14)
<more lines>
render: shell

View File

@@ -257,7 +257,7 @@ jobs:
- name: Get Changelog
run: |
changelog=$(grep -oPz '(?s)(?<=### ${{ steps.bump_version.outputs.ytdlp_version }}\n{2}).+?(?=\n{2,3}###)' Changelog.md) || true
changelog=$(grep -oPz '(?s)(?<=### ${{ needs.prepare.outputs.ytdlp_version }}\n{2}).+?(?=\n{2,3}###)' Changelog.md) || true
echo "changelog<<EOF" >> $GITHUB_ENV
echo "$changelog" >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV

View File

@@ -195,7 +195,7 @@ After you have ensured this site is distributing its content legally, you can fo
# * A value
# * MD5 checksum; start the string with md5:
# * A regular expression; start the string with re:
# * Any Python type (for example int or float)
# * Any Python type, e.g. int or float
}
}]
@@ -261,7 +261,7 @@ The aforementioned metafields are the critical data that the extraction does not
For pornographic sites, appropriate `age_limit` must also be returned.
The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract usefull information with `--ignore-no-formats-error` - Eg: when the video is a live stream that has not started yet.
The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract usefull information with `--ignore-no-formats-error` - e.g. when the video is a live stream that has not started yet.
[Any field](yt_dlp/extractor/common.py#219-L426) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.

View File

@@ -294,3 +294,8 @@ haobinliang
Mehavoid
winterbird-code
yashkc2025
aldoridhoni
bashonly
jacobtruman
masta79
palewire

View File

@@ -11,6 +11,37 @@
-->
### 2022.08.14
* Merge youtube-dl: Upto [commit/d231b56](https://github.com/ytdl-org/youtube-dl/commit/d231b56)
* [jsinterp] Handle **new youtube signature functions**
* [jsinterp] Truncate error messages
* [extractor] Fix format sorting of `channels`
* [ffmpeg] Disable avconv unless `--prefer-avconv`
* [ffmpeg] Smarter detection of ffprobe filename
* [patreon] Ignore erroneous media attachments by [coletdjnz](https://github.com/coletdjnz)
* [postprocessor/embedthumbnail] Detect `libatomicparsley.so`
* [ThumbnailsConvertor] Fix conversion after `fixup_webp`
* [utils] Fix `get_compatible_ext`
* [build] Fix changelog
* [update] Set executable bit-mask by [pukkandan](https://github.com/pukkandan), [Lesmiscore](https://github.com/Lesmiscore)
* [devscripts] Fix import
* [docs] Consistent use of `e.g.` by [Lesmiscore](https://github.com/Lesmiscore)
* [cleanup] Misc fixes and cleanup
* [extractor/moview] Add extractor by [HobbyistDev](https://github.com/HobbyistDev)
* [extractor/parler] Add extractor by [palewire](https://github.com/palewire)
* [extractor/truth] Add extractor by [palewire](https://github.com/palewire)
* [extractor/aenetworks] Add formats parameter by [jacobtruman](https://github.com/jacobtruman)
* [extractor/crunchyroll] Improve `_VALID_URL`s
* [extractor/doodstream] Add `wf` domain by [aldoridhoni](https://github.com/aldoridhoni)
* [extractor/facebook] Add reel support by [bashonly](https://github.com/bashonly)
* [extractor/MLB] New extractor by [ischmidt20](https://github.com/ischmidt20)
* [extractor/rai] Misc fixes by [nixxo](https://github.com/nixxo)
* [extractor/toggo] Improve `_VALID_URL` by [masta79](https://github.com/masta79)
* [extractor/tubitv] Extract additional formats by [shirt-dev](https://github.com/shirt-dev)
* [extractor/zattoo] Potential fix for resellers
### 2022.08.08
* **Remove Python 3.6 support**
@@ -20,10 +51,10 @@
* `--compat-option no-live-chat` should disable danmaku
* Fix misleading DRM message
* Import ctypes only when necessary
* Minor bugfixes by [pukkandan](https://github.com/pukkandan)
* Reject entire playlists faster with `--match-filter` by [pukkandan](https://github.com/pukkandan)
* Minor bugfixes
* Reject entire playlists faster with `--match-filter`
* Remove filtered entries from `-J`
* Standardize retry mechanism by [pukkandan](https://github.com/pukkandan)
* Standardize retry mechanism
* Validate `--merge-output-format`
* [downloader] Add average speed to final progress line
* [extractor] Add field `audio_channels`
@@ -31,7 +62,7 @@
* [ffmpeg] Set `ffmpeg_location` in a contextvar
* [FFmpegThumbnailsConvertor] Fix conversion from GIF
* [MetadataParser] Don't set `None` when the field didn't match
* [outtmpl] Smarter replacing of unsupported characters by [pukkandan](https://github.com/pukkandan)
* [outtmpl] Smarter replacing of unsupported characters
* [outtmpl] Treat empty values as None in filenames
* [utils] sanitize_open: Allow any IO stream as stdout
* [build, devscripts] Add devscript to set a build variant
@@ -64,7 +95,7 @@
* [extractor/bbc] Fix news articles by [ajj8](https://github.com/ajj8)
* [extractor/camtasia] Separate into own extractor by [coletdjnz](https://github.com/coletdjnz)
* [extractor/cloudflarestream] Fix video_id padding by [haobinliang](https://github.com/haobinliang)
* [extractor/crunchyroll] Fix conversion of thumbnail from GIF by [pukkandan](https://github.com/pukkandan)
* [extractor/crunchyroll] Fix conversion of thumbnail from GIF
* [extractor/crunchyroll] Handle missing metadata correctly by [Burve](https://github.com/Burve), [pukkandan](https://github.com/pukkandan)
* [extractor/crunchyroll:beta] Extract timestamp and fix tests by [tejing1](https://github.com/tejing1)
* [extractor/crunchyroll:beta] Use streams API by [tejing1](https://github.com/tejing1)
@@ -211,7 +242,7 @@
* [**Deprecate support for Python 3.6**](https://github.com/yt-dlp/yt-dlp/issues/3764#issuecomment-1154051119)
* **Add option `--download-sections` to download video partially**
* Chapter regex and time ranges are accepted (Eg: `--download-sections *1:10-2:20`)
* Chapter regex and time ranges are accepted, e.g. `--download-sections *1:10-2:20`
* Add option `--alias`
* Add option `--lazy-playlist` to process entries as they are received
* Add option `--retry-sleep`
@@ -1375,7 +1406,7 @@
* Add new option `--netrc-location`
* [outtmpl] Allow alternate fields using `,`
* [outtmpl] Add format type `B` to treat the value as bytes (eg: to limit the filename to a certain number of bytes)
* [outtmpl] Add format type `B` to treat the value as bytes, e.g. to limit the filename to a certain number of bytes
* Separate the options `--ignore-errors` and `--no-abort-on-error`
* Basic framework for simultaneous download of multiple formats by [nao20010128nao](https://github.com/nao20010128nao)
* [17live] Add 17.live extractor by [nao20010128nao](https://github.com/nao20010128nao)
@@ -1765,7 +1796,7 @@
* Merge youtube-dl: Upto [commit/a803582](https://github.com/ytdl-org/youtube-dl/commit/a8035827177d6b59aca03bd717acb6a9bdd75ada)
* Add `--extractor-args` to pass some extractor-specific arguments. See [readme](https://github.com/yt-dlp/yt-dlp#extractor-arguments)
* Add extractor option `skip` for `youtube`. Eg: `--extractor-args youtube:skip=hls,dash`
* Add extractor option `skip` for `youtube`, e.g. `--extractor-args youtube:skip=hls,dash`
* Deprecates `--youtube-skip-dash-manifest`, `--youtube-skip-hls-manifest`, `--youtube-include-dash-manifest`, `--youtube-include-hls-manifest`
* Allow `--list...` options to work with `--print`, `--quiet` and other `--list...` options
* [youtube] Use `player` API for additional video extraction requests by [coletdjnz](https://github.com/coletdjnz)

View File

@@ -28,12 +28,12 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
[![gh-sponsor](https://img.shields.io/badge/_-Sponsor-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/coletdjnz)
* YouTube improvements including: age-gate bypass, private playlists, multiple-clients (to avoid throttling) and a lot of under-the-hood improvements
* Added support for downloading YoutubeWebArchive videos
* Added support for new websites MainStreaming, PRX, nzherald, etc
* Added support for new websites YoutubeWebArchive, MainStreaming, PRX, nzherald, Mediaklikk, StarTV etc
* Improved/fixed support for Patreon, panopto, gfycat, itv, pbs, SouthParkDE etc
## [Ashish0804](https://github.com/Ashish0804)
## [Ashish0804](https://github.com/Ashish0804) <sub><sup>[Inactive]</sup></sub>
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/ashish0804)
@@ -48,4 +48,5 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
**Monacoin**: mona1q3tf7dzvshrhfe3md379xtvt2n22duhglv5dskr
* Download live from start to end for YouTube
* Added support for new websites mildom, PixivSketch, skeb, radiko, voicy, mirrativ, openrec, whowatch, damtomo, 17.live, mixch etc
* Added support for new websites AbemaTV, mildom, PixivSketch, skeb, radiko, voicy, mirrativ, openrec, whowatch, damtomo, 17.live, mixch etc
* Improved/fixed support for fc2, YahooJapanNews, tver, iwara etc

147
README.md
View File

@@ -71,7 +71,7 @@ yt-dlp is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on t
# NEW FEATURES
* Merged with **youtube-dl v2021.12.17+ [commit/adb5294](https://github.com/ytdl-org/youtube-dl/commit/adb5294177265ba35b45746dbb600965076ed150)**<!--([exceptions](https://github.com/yt-dlp/yt-dlp/issues/21))--> and **youtube-dlc v2020.11.11-3+ [commit/f9401f2](https://github.com/blackjack4494/yt-dlc/commit/f9401f2a91987068139c5f757b12fc711d4c0cee)**: You get all the features and patches of [youtube-dlc](https://github.com/blackjack4494/yt-dlc) in addition to the latest [youtube-dl](https://github.com/ytdl-org/youtube-dl)
* Merged with **youtube-dl v2021.12.17+ [commit/d231b56](https://github.com/ytdl-org/youtube-dl/commit/d231b56717c73ee597d2e077d11b69ed48a1b02d)**<!--([exceptions](https://github.com/yt-dlp/yt-dlp/issues/21))--> and **youtube-dlc v2020.11.11-3+ [commit/f9401f2](https://github.com/blackjack4494/yt-dlc/commit/f9401f2a91987068139c5f757b12fc711d4c0cee)**: You get all the features and patches of [youtube-dlc](https://github.com/blackjack4494/yt-dlc) in addition to the latest [youtube-dl](https://github.com/ytdl-org/youtube-dl)
* **[SponsorBlock Integration](#sponsorblock-options)**: You can mark/remove sponsor sections in youtube videos by utilizing the [SponsorBlock](https://sponsor.ajay.app) API
@@ -146,7 +146,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* Some private fields such as filenames are removed by default from the infojson. Use `--no-clean-infojson` or `--compat-options no-clean-infojson` to revert this
* When `--embed-subs` and `--write-subs` are used together, the subtitles are written to disk and also embedded in the media file. You can use just `--embed-subs` to embed the subs and automatically delete the separate file. See [#630 (comment)](https://github.com/yt-dlp/yt-dlp/issues/630#issuecomment-893659460) for more info. `--compat-options no-keep-subs` can be used to revert this
* `certifi` will be used for SSL root certificates, if installed. If you want to use system certificates (e.g. self-signed), use `--compat-options no-certifi`
* youtube-dl tries to remove some superfluous punctuations from filenames. While this can sometimes be helpful, it is often undesirable. So yt-dlp tries to keep the fields in the filenames as close to their original values as possible. You can use `--compat-options filename-sanitization` to revert to youtube-dl's behavior
* yt-dlp's sanitization of invalid characters in filenames is different/smarter than in youtube-dl. You can use `--compat-options filename-sanitization` to revert to youtube-dl's behavior
For ease of use, a few more compat options are available:
@@ -376,7 +376,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
--extractor-descriptions Output descriptions of all supported
extractors and exit
--force-generic-extractor Force extraction to use the generic extractor
--default-search PREFIX Use this prefix for unqualified URLs. Eg:
--default-search PREFIX Use this prefix for unqualified URLs. E.g.
"gvsearch2:python" downloads two videos from
google videos for the search term "python".
Use the value "auto" to let yt-dlp guess
@@ -425,7 +425,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
an alias starts with a dash "-", it is
prefixed with "--". Arguments are parsed
according to the Python string formatting
mini-language. Eg: --alias get-audio,-X
mini-language. E.g. --alias get-audio,-X
"-S=aext:{0},abr -x --audio-format {0}"
creates options "--get-audio" and "-X" that
takes an argument (ARG0) and expands to
@@ -439,10 +439,10 @@ You can also fork the project on github and run your fork's [build workflow](.gi
## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. To
enable SOCKS proxy, specify a proper scheme.
Eg: socks5://user:pass@127.0.0.1:1080/. Pass
in an empty string (--proxy "") for direct
connection
enable SOCKS proxy, specify a proper scheme,
e.g. socks5://user:pass@127.0.0.1:1080/.
Pass in an empty string (--proxy "") for
direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
@@ -471,17 +471,17 @@ You can also fork the project on github and run your fork's [build workflow](.gi
compatibility, START-STOP is also supported.
Use negative indices to count from the right
and negative STEP to download in reverse
order. Eg: "-I 1:3,7,-5::2" used on a
order. E.g. "-I 1:3,7,-5::2" used on a
playlist of size 15 will download the videos
at index 1,2,3,7,11,13,15
--min-filesize SIZE Do not download any videos smaller than SIZE
(e.g. 50k or 44.6m)
--max-filesize SIZE Do not download any videos larger than SIZE
(e.g. 50k or 44.6m)
--min-filesize SIZE Do not download any videos smaller than
SIZE, e.g. 50k or 44.6M
--max-filesize SIZE Do not download any videos larger than SIZE,
e.g. 50k or 44.6M
--date DATE Download only videos uploaded on this date.
The date can be "YYYYMMDD" or in the format
[now|today|yesterday][-N[day|week|month|year]].
Eg: --date today-2weeks
E.g. --date today-2weeks
--datebefore DATE Download only videos uploaded on or before
this date. The date formats accepted is the
same as --date
@@ -498,7 +498,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
conditions. Use a "\" to escape "&" or
quotes if needed. If used multiple times,
the filter matches if atleast one of the
conditions are met. Eg: --match-filter
conditions are met. E.g. --match-filter
!is_live --match-filter "like_count>?100 &
description~='(?i)\bcats \& dogs\b'" matches
only videos that are not live OR those that
@@ -536,11 +536,11 @@ You can also fork the project on github and run your fork's [build workflow](.gi
-N, --concurrent-fragments N Number of fragments of a dash/hlsnative
video that should be downloaded concurrently
(default is 1)
-r, --limit-rate RATE Maximum download rate in bytes per second
(e.g. 50K or 4.2M)
-r, --limit-rate RATE Maximum download rate in bytes per second,
e.g. 50K or 4.2M
--throttled-rate RATE Minimum download rate in bytes per second
below which throttling is assumed and the
video data is re-extracted (e.g. 100K)
video data is re-extracted, e.g. 100K
-R, --retries RETRIES Number of retries (default is 10), or
"infinite"
--file-access-retries RETRIES Number of times to retry on file access
@@ -554,7 +554,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
be a number, linear=START[:END[:STEP=1]] or
exp=START[:END[:BASE=2]]. This option can be
used multiple times to set the sleep for the
different retry types. Eg: --retry-sleep
different retry types, e.g. --retry-sleep
linear=1::2 --retry-sleep fragment:exp=1:20
--skip-unavailable-fragments Skip unavailable fragments for DASH,
hlsnative and ISM downloads (default)
@@ -566,14 +566,14 @@ You can also fork the project on github and run your fork's [build workflow](.gi
downloading is finished
--no-keep-fragments Delete downloaded fragments after
downloading is finished (default)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
--buffer-size SIZE Size of download buffer, e.g. 1024 or 16K
(default is 1024)
--resize-buffer The buffer size is automatically resized
from an initial value of --buffer-size
(default)
--no-resize-buffer Do not automatically adjust the buffer size
--http-chunk-size SIZE Size of a chunk for chunk-based HTTP
downloading (e.g. 10485760 or 10M) (default
downloading, e.g. 10485760 or 10M (default
is disabled). May be useful for bypassing
bandwidth throttling imposed by a webserver
(experimental)
@@ -598,10 +598,10 @@ You can also fork the project on github and run your fork's [build workflow](.gi
the given regular expression. Time ranges
prefixed by a "*" can also be used in place
of chapters to download the specified range.
Eg: --download-sections "*10:15-15:00"
--download-sections "intro". Needs ffmpeg.
This option can be used multiple times to
download multiple sections
Needs ffmpeg. This option can be used
multiple times to download multiple
sections, e.g. --download-sections
"*10:15-15:00" --download-sections "intro"
--downloader [PROTO:]NAME Name or path of the external downloader to
use (optionally) prefixed by the protocols
(http, ftp, m3u8, dash, rstp, rtmp, mms) to
@@ -609,7 +609,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
aria2c, avconv, axel, curl, ffmpeg, httpie,
wget. You can use this option multiple times
to set different downloaders for different
protocols. For example, --downloader aria2c
protocols. E.g. --downloader aria2c
--downloader "dash,m3u8:native" will use
aria2c for http/ftp downloads, and the
native downloader for dash/m3u8 downloads
@@ -791,7 +791,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
"postprocess:", or "postprocess-title:".
The video's fields are accessible under the
"info" key and the progress attributes are
accessible under "progress" key. E.g.:
accessible under "progress" key. E.g.
--console-title --progress-template
"download-title:%(info.id)s-%(progress.eta)s"
-v, --verbose Print various debugging information
@@ -860,7 +860,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
-F, --list-formats List available formats of each video.
Simulate unless --no-simulate is used
--merge-output-format FORMAT Containers that may be used when merging
formats, separated by "/" (Eg: "mp4/mkv").
formats, separated by "/", e.g. "mp4/mkv".
Ignored if no merge is required. (currently
supported: avi, flv, mkv, mov, mp4, webm)
@@ -874,13 +874,13 @@ You can also fork the project on github and run your fork's [build workflow](.gi
--list-subs List available subtitles of each video.
Simulate unless --no-simulate is used
--sub-format FORMAT Subtitle format; accepts formats preference,
Eg: "srt" or "ass/srt/best"
e.g. "srt" or "ass/srt/best"
--sub-langs LANGS Languages of the subtitles to download (can
be regex) or "all" separated by commas. (Eg:
--sub-langs "en.*,ja") You can prefix the
be regex) or "all" separated by commas, e.g.
--sub-langs "en.*,ja". You can prefix the
language code with a "-" to exclude it from
the requested languages. (Eg: --sub-langs
all,-live_chat) Use --list-subs for a list
the requested languages, e.g. --sub-langs
all,-live_chat. Use --list-subs for a list
of available language tags
## Authentication Options:
@@ -929,7 +929,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
m4a, mka, mp3, ogg, opus, vorbis, wav). If
target container does not support the
video/audio codec, remuxing will fail. You
can specify multiple rules; Eg.
can specify multiple rules; e.g.
"aac>m4a/mov>mp4/mkv" will remux aac to m4a,
mov to mp4 and anything else to mkv
--recode-video FORMAT Re-encode the video into another format if
@@ -954,7 +954,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
for ffmpeg/ffprobe, "_i"/"_o" can be
appended to the prefix optionally followed
by a number to pass the argument before the
specified input/output file. Eg: --ppa
specified input/output file, e.g. --ppa
"Merger+ffmpeg_i1:-v quiet". You can use
this option multiple times to give different
arguments to different postprocessors.
@@ -1081,7 +1081,7 @@ Make chapter entries for, or remove various segments (sponsor,
music_offtopic, poi_highlight, all and
default (=all). You can prefix the category
with a "-" to exclude it. See [1] for
description of the categories. Eg:
description of the categories. E.g.
--sponsorblock-mark all,-preview
[1] https://wiki.sponsor.ajay.app/w/Segment_Categories
--sponsorblock-remove CATS SponsorBlock categories to be removed from
@@ -1140,7 +1140,7 @@ You can configure yt-dlp by placing any supported command line option to a confi
1. **System Configuration**: `/etc/yt-dlp.conf`
For example, with the following configuration file yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
E.g. with the following configuration file yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
```
# Lines starting with # are comments
@@ -1178,7 +1178,7 @@ After that you can add credentials for an extractor in the following format, whe
```
machine <extractor> login <username> password <password>
```
For example:
E.g.
```
machine youtube login myaccount@gmail.com password my_youtube_password
machine twitch login my_twitch_account_name password my_twitch_password
@@ -1197,32 +1197,32 @@ The `-o` option is used to indicate a template for the output file names while `
The simplest usage of `-o` is not to set any template arguments when downloading a single file, like in `yt-dlp -o funny_video.flv "https://some/video"` (hard-coding file extension like this is _not_ recommended and could break some post-processing).
It may however also contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [Python string formatting operations](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations.
It may however also contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [Python string formatting operations](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting), e.g. `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations.
The field names themselves (the part inside the parenthesis) can also have some special formatting:
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a `.` (dot) separator. You can also do python slicing using `:`. Eg: `%(tags.0)s`, `%(subtitles.en.-1.ext)s`, `%(id.3:7:-1)s`, `%(formats.:.format_id)s`. `%()s` refers to the entire infodict. Note that all the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a `.` (dot) separator. You can also do python slicing using `:`. E.g. `%(tags.0)s`, `%(subtitles.en.-1.ext)s`, `%(id.3:7:-1)s`, `%(formats.:.format_id)s`. `%()s` refers to the entire infodict. Note that all the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Addition**: Addition and subtraction of numeric fields can be done using `+` and `-` respectively. Eg: `%(playlist_index+10)03d`, `%(n_entries+1-playlist_index)d`
1. **Addition**: Addition and subtraction of numeric fields can be done using `+` and `-` respectively. E.g. `%(playlist_index+10)03d`, `%(n_entries+1-playlist_index)d`
1. **Date/time Formatting**: Date/time fields can be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it separated from the field name using a `>`. Eg: `%(duration>%H-%M-%S)s`, `%(upload_date>%Y-%m-%d)s`, `%(epoch-3600>%H-%M-%S)s`
1. **Date/time Formatting**: Date/time fields can be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it separated from the field name using a `>`. E.g. `%(duration>%H-%M-%S)s`, `%(upload_date>%Y-%m-%d)s`, `%(epoch-3600>%H-%M-%S)s`
1. **Alternatives**: Alternate fields can be specified separated with a `,`. Eg: `%(release_date>%Y,upload_date>%Y|Unknown)s`
1. **Alternatives**: Alternate fields can be specified separated with a `,`. E.g. `%(release_date>%Y,upload_date>%Y|Unknown)s`
1. **Replacement**: A replacement value can specified using a `&` separator. If the field is *not* empty, this replacement value will be used instead of the actual field content. This is done after alternate fields are considered; thus the replacement is used if *any* of the alternative fields is *not* empty.
1. **Default**: A literal default value can be specified for when the field is empty using a `|` separator. This overrides `--output-na-template`. Eg: `%(uploader|Unknown)s`
1. **Default**: A literal default value can be specified for when the field is empty using a `|` separator. This overrides `--output-na-template`. E.g. `%(uploader|Unknown)s`
1. **More Conversions**: In addition to the normal format types `diouxXeEfFgGcrs`, yt-dlp additionally supports converting to `B` = **B**ytes, `j` = **j**son (flag `#` for pretty-printing), `h` = HTML escaping, `l` = a comma separated **l**ist (flag `#` for `\n` newline-separated), `q` = a string **q**uoted for the terminal (flag `#` to split a list into different arguments), `D` = add **D**ecimal suffixes (Eg: 10M) (flag `#` to use 1024 as factor), and `S` = **S**anitize as filename (flag `#` for restricted)
1. **More Conversions**: In addition to the normal format types `diouxXeEfFgGcrs`, yt-dlp additionally supports converting to `B` = **B**ytes, `j` = **j**son (flag `#` for pretty-printing), `h` = HTML escaping, `l` = a comma separated **l**ist (flag `#` for `\n` newline-separated), `q` = a string **q**uoted for the terminal (flag `#` to split a list into different arguments), `D` = add **D**ecimal suffixes (e.g. 10M) (flag `#` to use 1024 as factor), and `S` = **S**anitize as filename (flag `#` for restricted)
1. **Unicode normalization**: The format type `U` can be used for NFC [unicode normalization](https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize). The alternate form flag (`#`) changes the normalization to NFD and the conversion flag `+` can be used for NFKC/NFKD compatibility equivalence normalization. Eg: `%(title)+.100U` is NFKC
1. **Unicode normalization**: The format type `U` can be used for NFC [unicode normalization](https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize). The alternate form flag (`#`) changes the normalization to NFD and the conversion flag `+` can be used for NFKC/NFKD compatibility equivalence normalization. E.g. `%(title)+.100U` is NFKC
To summarize, the general syntax for a field is:
```
%(name[.keys][addition][>strf][,alternate][&replacement][|default])[flags][width][.precision][length]type
```
Additionally, you can set different output templates for the various metadata files separately from the general output template by specifying the type of file followed by the template separated by a colon `:`. The different file types supported are `subtitle`, `thumbnail`, `description`, `annotation` (deprecated), `infojson`, `link`, `pl_thumbnail`, `pl_description`, `pl_infojson`, `chapter`, `pl_video`. For example, `-o "%(title)s.%(ext)s" -o "thumbnail:%(title)s\%(title)s.%(ext)s"` will put the thumbnails in a folder with the same name as the video. If any of the templates is empty, that type of file will not be written. Eg: `--write-thumbnail -o "thumbnail:"` will write thumbnails only for playlists and not for video.
Additionally, you can set different output templates for the various metadata files separately from the general output template by specifying the type of file followed by the template separated by a colon `:`. The different file types supported are `subtitle`, `thumbnail`, `description`, `annotation` (deprecated), `infojson`, `link`, `pl_thumbnail`, `pl_description`, `pl_infojson`, `chapter`, `pl_video`. E.g. `-o "%(title)s.%(ext)s" -o "thumbnail:%(title)s\%(title)s.%(ext)s"` will put the thumbnails in a folder with the same name as the video. If any of the templates is empty, that type of file will not be written. E.g. `--write-thumbnail -o "thumbnail:"` will write thumbnails only for playlists and not for video.
The available fields are:
@@ -1358,13 +1358,13 @@ Available only in `--sponsorblock-chapter-title`:
- `category_names` (list): Friendly names of the categories
- `name` (string): Friendly name of the smallest category
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKc`, this will result in a `yt-dlp test video-BaW_jenozKc.mp4` file created in the current directory.
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. E.g. for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKc`, this will result in a `yt-dlp test video-BaW_jenozKc.mp4` file created in the current directory.
Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
**Tip**: Look at the `-j` output to identify which fields are available for the particular URL
For numeric sequences you can use [numeric related formatting](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting), for example, `%(view_count)05d` will result in a string with view count padded with zeros up to 5 characters, like in `00042`.
For numeric sequences you can use [numeric related formatting](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting); e.g. `%(view_count)05d` will result in a string with view count padded with zeros up to 5 characters, like in `00042`.
Output templates can also contain arbitrary hierarchical path, e.g. `-o "%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s"` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
@@ -1434,7 +1434,7 @@ The general syntax for format selection is `-f FORMAT` (or `--format FORMAT`) wh
**tl;dr:** [navigate me to examples](#format-selection-examples).
<!-- MANPAGE: END EXCLUDED SECTION -->
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
The simplest case is requesting a specific format; e.g. with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
@@ -1461,15 +1461,15 @@ For example, to download the worst quality video-only format you can use `-f wor
You can select the n'th best format of a type by using `best<type>.<n>`. For example, `best.2` will select the 2nd best combined format. Similarly, `bv*.3` will select the 3rd best format that contains a video stream.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that formats on the left hand side are preferred; e.g. `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can merge the video and audio of multiple formats into a single file using `-f <format1>+<format2>+...` (requires ffmpeg installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg.
You can merge the video and audio of multiple formats into a single file using `-f <format1>+<format2>+...` (requires ffmpeg installed); e.g. `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg.
**Deprecation warning**: Since the *below* described behavior is complex and counter-intuitive, this will be removed and multistreams will be enabled by default in the future. A new operator will be instead added to limit formats to single audio/video
Unless `--video-multistreams` is used, all formats with a video stream except the first one are ignored. Similarly, unless `--audio-multistreams` is used, all formats with an audio stream except the first one are ignored. For example, `-f bestvideo+best+bestaudio --video-multistreams --audio-multistreams` will download and merge all 3 given formats. The resulting file will have 2 video streams and 2 audio streams. But `-f bestvideo+best+bestaudio --no-video-multistreams` will download and merge only `bestvideo` and `bestaudio`. `best` is ignored since another format containing a video stream (`bestvideo`) has already been selected. The order of the formats is therefore important. `-f best+bestaudio --no-audio-multistreams` will download and merge both formats while `-f bestaudio+best --no-audio-multistreams` will ignore `best` and download only `bestaudio`.
Unless `--video-multistreams` is used, all formats with a video stream except the first one are ignored. Similarly, unless `--audio-multistreams` is used, all formats with an audio stream except the first one are ignored. E.g. `-f bestvideo+best+bestaudio --video-multistreams --audio-multistreams` will download and merge all 3 given formats. The resulting file will have 2 video streams and 2 audio streams. But `-f bestvideo+best+bestaudio --no-video-multistreams` will download and merge only `bestvideo` and `bestaudio`. `best` is ignored since another format containing a video stream (`bestvideo`) has already been selected. The order of the formats is therefore important. `-f best+bestaudio --no-audio-multistreams` will download and merge both formats while `-f bestaudio+best --no-audio-multistreams` will ignore `best` and download only `bestaudio`.
## Filtering Formats
@@ -1500,9 +1500,9 @@ Any string comparison may be prefixed with negation `!` in order to produce an o
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the website. Any other field made available by the extractor can also be used for filtering.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can also use the filters with `all` to download all formats that satisfy the filter. For example, `-f "all[vcodec=none]"` selects all audio-only formats.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
Format selectors can also be grouped using parentheses, for example if you want to download the best pre-merged mp4 and webm formats with a height lower than 480 you can use `-f "(mp4,webm)[height<480]"`.
Format selectors can also be grouped using parentheses; e.g. `-f "(mp4,webm)[height<480]"` will download the best pre-merged mp4 and webm formats with a height lower than 480.
## Sorting Formats
@@ -1540,11 +1540,11 @@ The available fields are:
**Deprecation warning**: Many of these fields have (currently undocumented) aliases, that may be removed in a future version. It is recommended to use only the documented field names.
All fields, unless specified otherwise, are sorted in descending order. To reverse this, prefix the field with a `+`. Eg: `+res` prefers format with the smallest resolution. Additionally, you can suffix a preferred value for the fields, separated by a `:`. Eg: `res:720` prefers larger videos, but no larger than 720p and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two preferred values, the first for video and the second for audio. Eg: `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp9.2` > `av01` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the nearest values to the provided by using `~` as the delimiter. Eg: `filesize~1G` prefers the format with filesize closest to 1 GiB.
All fields, unless specified otherwise, are sorted in descending order. To reverse this, prefix the field with a `+`. E.g. `+res` prefers format with the smallest resolution. Additionally, you can suffix a preferred value for the fields, separated by a `:`. E.g. `res:720` prefers larger videos, but no larger than 720p and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two preferred values, the first for video and the second for audio. E.g. `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp9.2` > `av01` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the nearest values to the provided by using `~` as the delimiter. E.g. `filesize~1G` prefers the format with filesize closest to 1 GiB.
The fields `hasvid` and `ie_pref` are always given highest priority in sorting, irrespective of the user-defined order. This behaviour can be changed by using `--format-sort-force`. Apart from these, the default order used is: `lang,quality,res,fps,hdr:12,codec:vp9.2,size,br,asr,proto,ext,hasaud,source,id`. The extractors may override this default order, but they cannot override the user-provided order.
The fields `hasvid` and `ie_pref` are always given highest priority in sorting, irrespective of the user-defined order. This behaviour can be changed by using `--format-sort-force`. Apart from these, the default order used is: `lang,quality,res,fps,hdr:12,vcodec:vp9.2,channels,acodec,size,br,asr,proto,ext,hasaud,source,id`. The extractors may override this default order, but they cannot override the user-provided order.
Note that the default has `codec:vp9.2`; i.e. `av1` is not preferred. Similarly, the default for hdr is `hdr:12`; i.e. dolby vision is not preferred. These choices are made since DV and AV1 formats are not yet fully compatible with most devices. This may be changed in the future as more devices become capable of smoothly playing back these formats.
Note that the default has `vcodec:vp9.2`; i.e. `av1` is not preferred. Similarly, the default for hdr is `hdr:12`; i.e. dolby vision is not preferred. These choices are made since DV and AV1 formats are not yet fully compatible with most devices. This may be changed in the future as more devices become capable of smoothly playing back these formats.
If your format selector is `worst`, the last item is selected after sorting. This means it will select the format that is worst in all respects. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps`.
@@ -1685,9 +1685,9 @@ Note that any field created by this can be used in the [output template](#output
This option also has a few special uses:
* You can download an additional URL based on the metadata of the currently downloaded video. To do this, set the field `additional_urls` to the URL that you want to download. Eg: `--parse-metadata "description:(?P<additional_urls>https?://www\.vimeo\.com/\d+)` will download the first vimeo video found in the description
* You can download an additional URL based on the metadata of the currently downloaded video. To do this, set the field `additional_urls` to the URL that you want to download. E.g. `--parse-metadata "description:(?P<additional_urls>https?://www\.vimeo\.com/\d+)` will download the first vimeo video found in the description
* You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file. For example, you can use this to set a different "description" and "synopsis". To modify the metadata of individual streams, use the `meta<n>_` prefix (Eg: `meta1_language`). Any value set to the `meta_` field will overwrite all default values.
* You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file - you can use this to set a different "description" and "synopsis". To modify the metadata of individual streams, use the `meta<n>_` prefix (e.g. `meta1_language`). Any value set to the `meta_` field will overwrite all default values.
**Note**: Metadata modification happens before format selection, post-extraction and other post-processing operations. Some fields may be added or changed during these steps, overriding your changes.
@@ -1746,21 +1746,19 @@ $ yt-dlp --replace-in-metadata "title,uploader" "[ _]" "-"
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. Eg: `--extractor-args "youtube:player-client=android_embedded,web;include_live_dash" --extractor-args "funimation:version=uncut"`
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=android_embedded,web;include_live_dash" --extractor-args "funimation:version=uncut"`
The following extractors use this feature:
#### youtube
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (Eg: `web_embedded`); and `mweb` and `tv_embedded` (agegate bypass) with no variants. By default, `android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients.
* `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb` and `tv_embedded` (agegate bypass) with no variants. By default, `android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `include_live_dash`: Include live dash formats even without `--live-from-start` (These formats don't download properly)
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
* `innertube_host`: Innertube API host to use for all API requests
* e.g. `studio.youtube.com`, `youtubei.googleapis.com`
* Note: Cookies exported from `www.youtube.com` will not work with hosts other than `*.youtube.com`
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests
#### youtubetab (YouTube playlists, channels, feeds, etc.)
@@ -1768,17 +1766,16 @@ The following extractors use this feature:
* `approximate_date`: Extract approximate `upload_date` in flat-playlist. This may cause date-based filters to be slightly off
#### funimation
* `language`: Languages to extract. Eg: `funimation:language=english,japanese`
* `language`: Languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyroll
* `language`: Languages to extract. Eg: `crunchyroll:language=jaJp`
* `hardsub`: Which hard-sub versions to extract. Eg: `crunchyroll:hardsub=None,enUS`
* `language`: Languages to extract, e.g. `crunchyroll:language=jaJp`
* `hardsub`: Which hard-sub versions to extract, e.g. `crunchyroll:hardsub=None,enUS`
#### crunchyrollbeta
* `format`: Which stream type(s) to extract. Default is `adaptive_hls` Eg: `crunchyrollbeta:format=vo_adaptive_hls`
* Potentially useful values include `adaptive_hls`, `adaptive_dash`, `vo_adaptive_hls`, `vo_adaptive_dash`, `download_hls`, `download_dash`, `multitrack_adaptive_hls_v2`
* `hardsub`: Preference order for which hardsub versions to extract. Default is `None` (no hardsubs). Eg: `crunchyrollbeta:hardsub=en-US,None`
* `format`: Which stream type(s) to extract (default: `adaptive_hls`). Potentially useful values include `adaptive_hls`, `adaptive_dash`, `vo_adaptive_hls`, `vo_adaptive_dash`, `download_hls`, `download_dash`, `multitrack_adaptive_hls_v2`
* `hardsub`: Preference order for which hardsub versions to extract (default: `None` = no hardsubs), e.g. `crunchyrollbeta:hardsub=en-US,None`
#### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
@@ -1798,11 +1795,11 @@ The following extractors use this feature:
* `dr`: dynamic range to ignore - one or more of `sdr`, `hdr10`, `dv`
#### tiktok
* `app_version`: App version to call mobile APIs with - should be set along with `manifest_app_version`. (e.g. `20.2.1`)
* `manifest_app_version`: Numeric app version to call mobile APIs with. (e.g. `221`)
* `app_version`: App version to call mobile APIs with - should be set along with `manifest_app_version`, e.g. `20.2.1`
* `manifest_app_version`: Numeric app version to call mobile APIs with, e.g. `221`
#### rokfinchannel
* `tab`: Which tab to download. One of `new`, `top`, `videos`, `podcasts`, `streams`, `stacks`. (E.g. `rokfinchannel:tab=streams`)
* `tab`: Which tab to download - one of `new`, `top`, `videos`, `podcasts`, `streams`, `stacks`
NOTE: These options may be changed/removed in the future without concern for backward compatibility
@@ -2066,7 +2063,7 @@ While these options still work, their use is not recommended since there are oth
--all-formats -f all
--all-subs --sub-langs all --write-subs
--print-json -j --no-simulate
--autonumber-size NUMBER Use string formatting. Eg: %(autonumber)03d
--autonumber-size NUMBER Use string formatting, e.g. %(autonumber)03d
--autonumber-start NUMBER Use internal field formatting like %(autonumber+NUMBER)s
--id -o "%(id)s.%(ext)s"
--metadata-from-title FORMAT --parse-metadata "%(title)s:FORMAT"

1
devscripts/__init__.py Normal file
View File

@@ -0,0 +1 @@
# Empty file needed to make devscripts.utils properly importable from outside

View File

@@ -380,6 +380,7 @@
- **ExtremeTube**
- **EyedoTV**
- **facebook**: [<abbr title="netrc machine"><em>facebook</em></abbr>]
- **facebook:reel**
- **FacebookPluginsVideo**
- **fancode:live**: [<abbr title="netrc machine"><em>fancode</em></abbr>]
- **fancode:vod**: [<abbr title="netrc machine"><em>fancode</em></abbr>]
@@ -709,6 +710,7 @@
- **mixcloud:playlist**
- **mixcloud:user**
- **MLB**
- **MLBTV**: [<abbr title="netrc machine"><em>mlb</em></abbr>]
- **MLBVideo**
- **MLSSoccer**
- **Mnet**
@@ -726,6 +728,7 @@
- **MovieClips**
- **MovieFap**
- **Moviepilot**
- **MoviewPlay**
- **Moviezine**
- **MovingImage**
- **MSN**
@@ -916,6 +919,7 @@
- **ParamountNetwork**
- **ParamountPlus**
- **ParamountPlusSeries**
- **Parler**: Posts on parler.com
- **parliamentlive.tv**: UK parliament videos
- **Parlview**
- **Patreon**
@@ -1314,6 +1318,7 @@
- **TrovoVod**
- **TrueID**
- **TruNews**
- **Truth**
- **TruTV**
- **Tube8**
- **TubeTuGraz**: [<abbr title="netrc machine"><em>tubetugraz</em></abbr>] tube.tugraz.at
@@ -1584,7 +1589,7 @@
- **youtube:clip**
- **youtube:favorites**: YouTube liked videos; ":ytfav" keyword (requires cookies)
- **youtube:history**: Youtube watch history; ":ythis" keyword (requires cookies)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections (Eg: #songs)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections, e.g. #songs
- **youtube:notif**: YouTube notifications; ":ytnotif" keyword (requires cookies)
- **youtube:playlist**: YouTube playlists
- **youtube:recommended**: YouTube recommended videos; ":ytrec" keyword

View File

@@ -105,11 +105,11 @@ def generator(test_case, tname):
info_dict = tc.get('info_dict', {})
params = tc.get('params', {})
if not info_dict.get('id'):
raise Exception('Test definition incorrect. \'id\' key is not present')
raise Exception(f'Test {tname} definition incorrect - "id" key is not present')
elif not info_dict.get('ext'):
if params.get('skip_download') and params.get('ignore_no_formats_error'):
continue
raise Exception('Test definition incorrect. The output file cannot be known. \'ext\' key is not present')
raise Exception(f'Test {tname} definition incorrect - "ext" key must be present to define the output file')
if 'skip' in test_case:
print_skipping(test_case['skip'])
@@ -161,7 +161,9 @@ def generator(test_case, tname):
force_generic_extractor=params.get('force_generic_extractor', False))
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (urllib.error.URLError, socket.timeout, UnavailableVideoError, http.client.BadStatusLine) or (err.exc_info[0] == urllib.error.HTTPError and err.exc_info[1].code == 503):
if (err.exc_info[0] not in (urllib.error.URLError, socket.timeout, UnavailableVideoError, http.client.BadStatusLine)
or (err.exc_info[0] == urllib.error.HTTPError and err.exc_info[1].code == 503)):
err.msg = f'{getattr(err, "msg", err)} ({tname})'
raise
if try_num == RETRIES:

View File

@@ -19,6 +19,9 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function x3(){return 42;}')
self.assertEqual(jsi.call_function('x3'), 42)
jsi = JSInterpreter('function x3(){42}')
self.assertEqual(jsi.call_function('x3'), None)
jsi = JSInterpreter('var x5 = function(){return 42;}')
self.assertEqual(jsi.call_function('x5'), 42)
@@ -45,14 +48,26 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function f(){return 1 << 5;}')
self.assertEqual(jsi.call_function('f'), 32)
jsi = JSInterpreter('function f(){return 2 ** 5}')
self.assertEqual(jsi.call_function('f'), 32)
jsi = JSInterpreter('function f(){return 19 & 21;}')
self.assertEqual(jsi.call_function('f'), 17)
jsi = JSInterpreter('function f(){return 11 >> 2;}')
self.assertEqual(jsi.call_function('f'), 2)
jsi = JSInterpreter('function f(){return []? 2+3: 4;}')
self.assertEqual(jsi.call_function('f'), 5)
jsi = JSInterpreter('function f(){return 1 == 2}')
self.assertEqual(jsi.call_function('f'), False)
jsi = JSInterpreter('function f(){return 0 && 1 || 2;}')
self.assertEqual(jsi.call_function('f'), 2)
def test_array_access(self):
jsi = JSInterpreter('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2] = 7; return x;}')
jsi = JSInterpreter('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}')
self.assertEqual(jsi.call_function('f'), [5, 2, 7])
def test_parens(self):
@@ -62,6 +77,10 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function f(){return (1 + 2) * 3;}')
self.assertEqual(jsi.call_function('f'), 9)
def test_quotes(self):
jsi = JSInterpreter(R'function f(){return "a\"\\("}')
self.assertEqual(jsi.call_function('f'), R'a"\(')
def test_assignments(self):
jsi = JSInterpreter('function f(){var x = 20; x = 30 + 1; return x;}')
self.assertEqual(jsi.call_function('f'), 31)
@@ -104,17 +123,28 @@ class TestJSInterpreter(unittest.TestCase):
}''')
self.assertEqual(jsi.call_function('x'), [20, 20, 30, 40, 50])
def test_builtins(self):
jsi = JSInterpreter('''
function x() { return new Date('Wednesday 31 December 1969 18:01:26 MDT') - 0; }
''')
self.assertEqual(jsi.call_function('x'), 86000)
jsi = JSInterpreter('''
function x(dt) { return new Date(dt) - 0; }
''')
self.assertEqual(jsi.call_function('x', 'Wednesday 31 December 1969 18:01:26 MDT'), 86000)
def test_call(self):
jsi = JSInterpreter('''
function x() { return 2; }
function y(a) { return x() + a; }
function y(a) { return x() + (a?a:0); }
function z() { return y(3); }
''')
self.assertEqual(jsi.call_function('z'), 5)
self.assertEqual(jsi.call_function('y'), 2)
def test_for_loop(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) {a++} a }
function x() { a=0; for (i=0; i-10; i++) {a++} return a }
''')
self.assertEqual(jsi.call_function('x'), 10)
@@ -155,19 +185,19 @@ class TestJSInterpreter(unittest.TestCase):
def test_for_loop_continue(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { continue; a++ } a }
function x() { a=0; for (i=0; i-10; i++) { continue; a++ } return a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_for_loop_break(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { break; a++ } a }
function x() { a=0; for (i=0; i-10; i++) { break; a++ } return a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_literal_list(self):
jsi = JSInterpreter('''
function x() { [1, 2, "asdf", [5, 6, 7]][3] }
function x() { return [1, 2, "asdf", [5, 6, 7]][3] }
''')
self.assertEqual(jsi.call_function('x'), [5, 6, 7])
@@ -177,6 +207,23 @@ class TestJSInterpreter(unittest.TestCase):
''')
self.assertEqual(jsi.call_function('x'), 7)
jsi = JSInterpreter('''
function x() { a=5; return (a -= 1, a+=3, a); }
''')
self.assertEqual(jsi.call_function('x'), 7)
def test_void(self):
jsi = JSInterpreter('''
function x() { return void 42; }
''')
self.assertEqual(jsi.call_function('x'), None)
def test_return_function(self):
jsi = JSInterpreter('''
function x() { return [1, function(){return 1}][1] }
''')
self.assertEqual(jsi.call_function('x')([]), 1)
if __name__ == '__main__':
unittest.main()

View File

@@ -413,6 +413,10 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
self.assertEqual(unified_timestamp('12/31/1969 20:01:18 EDT', False), 78)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)

View File

@@ -94,6 +94,14 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/5dd88d1d/player-plasma-ias-phone-en_US.vflset/base.js',
'kSxKFLeqzv_ZyHSAt', 'n8gS8oRlHOxPFA',
),
(
'https://www.youtube.com/s/player/324f67b9/player_ias.vflset/en_US/base.js',
'xdftNy7dh9QGnhW', '22qLGxrmX8F1rA',
),
(
'https://www.youtube.com/s/player/4c3f79c5/player_ias.vflset/en_US/base.js',
'TDCstCG66tEAO5pR9o', 'dbxNtZ14c-yWyw',
),
]
@@ -101,6 +109,7 @@ _NSIG_TESTS = [
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/4c3f79c5/player_ias.vflset/en_US/base.js', '4c3f79c5'),
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/fr_FR/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),

View File

@@ -272,7 +272,7 @@ class YoutubeDL:
subtitleslangs: List of languages of the subtitles to download (can be regex).
The list may contain "all" to refer to all the available
subtitles. The language can be prefixed with a "-" to
exclude it from the requested languages. Eg: ['all', '-live_chat']
exclude it from the requested languages, e.g. ['all', '-live_chat']
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
@@ -301,8 +301,8 @@ class YoutubeDL:
should act on each input URL as opposed to for the entire queue
cookiefile: File name or text stream from where cookies should be read and dumped to
cookiesfrombrowser: A tuple containing the name of the browser, the profile
name/pathfrom where cookies are loaded, and the name of the
keyring. Eg: ('chrome', ) or ('vivaldi', 'default', 'BASICTEXT')
name/path from where cookies are loaded, and the name of the
keyring, e.g. ('chrome', ) or ('vivaldi', 'default', 'BASICTEXT')
legacyserverconnect: Explicitly allow HTTPS connection to servers that do not
support RFC 5746 secure renegotiation
nocheckcertificate: Do not verify SSL certificates
@@ -470,7 +470,7 @@ class YoutubeDL:
discontinuities such as ad breaks (default: False)
extractor_args: A dictionary of arguments to be passed to the extractors.
See "EXTRACTOR ARGUMENTS" for details.
Eg: {'youtube': {'skip': ['dash', 'hls']}}
E.g. {'youtube': {'skip': ['dash', 'hls']}}
mark_watched: Mark videos watched (even with --simulate). Only for YouTube
The following options are deprecated and may be removed in the future:
@@ -1046,7 +1046,7 @@ class YoutubeDL:
# outtmpl should be expand_path'ed before template dict substitution
# because meta fields may contain env variables we don't want to
# be expanded. For example, for outtmpl "%(title)s.%(ext)s" and
# be expanded. E.g. for outtmpl "%(title)s.%(ext)s" and
# title "Hello $PATH", we don't want `$PATH` to be expanded.
return expand_path(outtmpl).replace(sep, '')
@@ -1977,8 +1977,8 @@ class YoutubeDL:
filter_parts.append(string)
def _remove_unused_ops(tokens):
# Remove operators that we don't use and join them with the surrounding strings
# for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
# Remove operators that we don't use and join them with the surrounding strings.
# E.g. 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
ALLOWED_OPS = ('/', '+', ',', '(', ')')
last_string, last_start, last_end, last_line = None, None, None, None
for type, string, start, end, line in tokens:

View File

@@ -184,7 +184,7 @@ def build_fragments_list(boot_info):
first_frag_number = fragment_run_entry_table[0]['first']
fragments_counter = itertools.count(first_frag_number)
for segment, fragments_count in segment_run_table['segment_run']:
# In some live HDS streams (for example Rai), `fragments_count` is
# In some live HDS streams (e.g. Rai), `fragments_count` is
# abnormal and causing out-of-memory errors. It's OK to change the
# number of fragments for live streams as they are updated periodically
if fragments_count == 4294967295 and boot_info['live']:

View File

@@ -500,6 +500,7 @@ from .facebook import (
FacebookIE,
FacebookPluginsVideoIE,
FacebookRedirectURLIE,
FacebookReelIE,
)
from .fancode import (
FancodeVodIE,
@@ -956,6 +957,7 @@ from .mixcloud import (
from .mlb import (
MLBIE,
MLBVideoIE,
MLBTVIE,
)
from .mlssoccer import MLSSoccerIE
from .mnet import MnetIE
@@ -974,6 +976,7 @@ from .motherless import (
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviepilot import MoviepilotIE
from .moview import MoviewPlayIE
from .moviezine import MoviezineIE
from .movingimage import MovingImageIE
from .msn import MSNIE
@@ -1236,6 +1239,7 @@ from .paramountplus import (
ParamountPlusIE,
ParamountPlusSeriesIE,
)
from .parler import ParlerIE
from .parlview import ParlviewIE
from .patreon import (
PatreonIE,
@@ -1792,6 +1796,7 @@ from .trovo import (
)
from .trueid import TrueIDIE
from .trunews import TruNewsIE
from .truth import TruthIE
from .trutv import TruTVIE
from .tube8 import Tube8IE
from .tubetugraz import TubeTuGrazIE, TubeTuGrazSeriesIE

View File

@@ -365,7 +365,7 @@ class AbemaTVIE(AbemaTVBaseIE):
# read breadcrumb on top of page
breadcrumb = self._extract_breadcrumb_list(webpage, video_id)
if breadcrumb:
# breadcrumb list translates to: (example is 1st test for this IE)
# breadcrumb list translates to: (e.g. 1st test for this IE)
# Home > Anime (genre) > Isekai Shokudo 2 (series name) > Episode 1 "Cheese cakes" "Morning again" (episode title)
# hence this works
info['series'] = breadcrumb[-2]

View File

@@ -28,14 +28,17 @@ class AENetworksBaseIE(ThePlatformIE):
}
def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'}
query = {
'mbr': 'true',
'formats': 'M3U+none,MPEG-DASH+none,MPEG4,MP3',
}
if auth:
query['auth'] = auth
TP_SMIL_QUERY = [{
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak'
'switch': 'hls_high_ak',
}, {
'assetTypes': 'high_video_s3'
'assetTypes': 'high_video_s3',
}, {
'assetTypes': 'high_video_s3',
'switch': 'hls_high_fastly',

View File

@@ -331,7 +331,7 @@ class InfoExtractor:
playable_in_embed: Whether this video is allowed to play in embedded
players on other sites. Can be True (=always allowed),
False (=never allowed), None (=unknown), or a string
specifying the criteria for embedability (Eg: 'whitelist')
specifying the criteria for embedability; e.g. 'whitelist'
availability: Under what condition the video is available. One of
'private', 'premium_only', 'subscriber_only', 'needs_auth',
'unlisted' or 'public'. Use 'InfoExtractor._availability'
@@ -452,8 +452,8 @@ class InfoExtractor:
_extract_from_webpage may raise self.StopExtraction() to stop further
processing of the webpage and obtain exclusive rights to it. This is useful
when the extractor cannot reliably be matched using just the URL.
Eg: invidious/peertube instances
when the extractor cannot reliably be matched using just the URL,
e.g. invidious/peertube instances
Embed-only extractors can be defined by setting _VALID_URL = False.
@@ -1669,8 +1669,8 @@ class InfoExtractor:
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<separator>[~:])(?P<limit>.*?))?)? *$'
default = ('hidden', 'aud_or_vid', 'hasvid', 'ie_pref', 'lang', 'quality',
'res', 'fps', 'hdr:12', 'channels', 'codec:vp9.2', 'size', 'br', 'asr',
'proto', 'ext', 'hasaud', 'source', 'id') # These must not be aliases
'res', 'fps', 'hdr:12', 'vcodec:vp9.2', 'channels', 'acodec',
'size', 'br', 'asr', 'proto', 'ext', 'hasaud', 'source', 'id') # These must not be aliases
ytdl_default = ('hasaud', 'lang', 'quality', 'tbr', 'filesize', 'vbr',
'height', 'width', 'proto', 'vext', 'abr', 'aext',
'fps', 'fs_approx', 'source', 'id')
@@ -2367,7 +2367,7 @@ class InfoExtractor:
audio_group_id = last_stream_inf.get('AUDIO')
# As per [1, 4.3.4.1.1] any EXT-X-STREAM-INF tag which
# references a rendition group MUST have a CODECS attribute.
# However, this is not always respected, for example, [2]
# However, this is not always respected. E.g. [2]
# contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite
# referencing an audio group it represents a complete
@@ -3003,8 +3003,8 @@ class InfoExtractor:
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# No media template,
# e.g. https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
segment_index = 0
@@ -3021,7 +3021,7 @@ class InfoExtractor:
representation_ms_info['fragments'] = fragments
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# E.g. https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
@@ -3249,8 +3249,8 @@ class InfoExtractor:
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
# https://github.com/ytdl-org/youtube-dl/issues/11979,
# e.g. http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>%s)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>' % _MEDIA_TAG_NAME_RE, webpage))
for media_tag, _, media_type, media_content in media_tags:
media_info = {
@@ -3706,7 +3706,7 @@ class InfoExtractor:
desc += f'; "{cls.SEARCH_KEY}:" prefix'
if search_examples:
_COUNTS = ('', '5', '10', 'all')
desc += f' (Example: "{cls.SEARCH_KEY}{random.choice(_COUNTS)}:{random.choice(search_examples)}")'
desc += f' (e.g. "{cls.SEARCH_KEY}{random.choice(_COUNTS)}:{random.choice(search_examples)}")'
if not cls.working():
desc += ' (**Currently broken**)' if markdown else ' (Currently broken)'

View File

@@ -114,7 +114,14 @@ class CrunchyrollBaseIE(InfoExtractor):
class CrunchyrollIE(CrunchyrollBaseIE, VRVBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?!series/|watch/)(?:[^/]+/){1,2}[^/?&]*?)(?P<id>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'''(?x)
https?://(?:(?P<prefix>www|m)\.)?(?P<url>
crunchyroll\.(?:com|fr)/(?:
media(?:-|/\?id=)|
(?!series/|watch/)(?:[^/]+/){1,2}[^/?&#]*?
)(?P<id>[0-9]+)
)(?:[/?&#]|$)'''
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': {
@@ -758,7 +765,11 @@ class CrunchyrollBetaBaseIE(CrunchyrollBaseIE):
class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
IE_NAME = 'crunchyroll:beta'
_VALID_URL = r'https?://beta\.crunchyroll\.com/(?P<lang>(?:\w{2}(?:-\w{2})?/)?)watch/(?P<id>\w+)/(?P<display_id>[\w\-]*)/?(?:\?|$)'
_VALID_URL = r'''(?x)
https?://beta\.crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<id>\w+)
(?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
_TESTS = [{
'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'info_dict': {
@@ -780,7 +791,7 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y/',
'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
@@ -867,7 +878,11 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
class CrunchyrollBetaShowIE(CrunchyrollBetaBaseIE):
IE_NAME = 'crunchyroll:playlist:beta'
_VALID_URL = r'https?://beta\.crunchyroll\.com/(?P<lang>(?:\w{2}(?:-\w{2})?/)?)series/(?P<id>\w+)/(?P<display_id>[\w\-]*)/?(?:\?|$)'
_VALID_URL = r'''(?x)
https?://beta\.crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
series/(?P<id>\w+)
(?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
_TESTS = [{
'url': 'https://beta.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'info_dict': {
@@ -876,7 +891,7 @@ class CrunchyrollBetaShowIE(CrunchyrollBetaBaseIE):
},
'playlist_mincount': 10,
}, {
'url': 'https://beta.crunchyroll.com/it/series/GY19NQ2QR/Girl-Friend-BETA',
'url': 'https://beta.crunchyroll.com/it/series/GY19NQ2QR',
'only_matching': True,
}]

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
class DoodStreamIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dood\.(?:to|watch|so|pm)/[ed]/(?P<id>[a-z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?dood\.(?:to|watch|so|pm|wf)/[ed]/(?P<id>[a-z0-9]+)'
_TESTS = [{
'url': 'http://dood.to/e/5s1wmbdacezb',
'md5': '4568b83b31e13242b3f1ff96c55f0595',

View File

@@ -772,3 +772,30 @@ class FacebookRedirectURLIE(InfoExtractor):
if not redirect_url:
raise ExtractorError('Invalid facebook redirect URL', expected=True)
return self.url_result(redirect_url)
class FacebookReelIE(InfoExtractor):
_VALID_URL = r'https?://(?:[\w-]+\.)?facebook\.com/reel/(?P<id>\d+)'
IE_NAME = 'facebook:reel'
_TESTS = [{
'url': 'https://www.facebook.com/reel/1195289147628387',
'md5': 'c4ff9a7182ff9ff7d6f7a83603bae831',
'info_dict': {
'id': '1195289147628387',
'ext': 'mp4',
'title': 'md5:9f5b142921b2dc57004fa13f76005f87',
'description': 'md5:24ea7ef062215d295bdde64e778f5474',
'uploader': 'Beast Camp Training',
'uploader_id': '1738535909799870',
'duration': 9.536,
'thumbnail': r're:^https?://.*',
'upload_date': '20211121',
'timestamp': 1637502604,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
f'https://m.facebook.com/watch/?v={video_id}&_rdr', FacebookIE, video_id)

View File

@@ -3035,7 +3035,7 @@ class GenericIE(InfoExtractor):
self.report_detected('Twitter card')
if not found:
# We look for Open Graph info:
# We have to match any number spaces between elements, some sites try to align them (eg.: statigr.am)
# We have to match any number spaces between elements, some sites try to align them, e.g.: statigr.am
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:

48
yt_dlp/extractor/jixie.py Normal file
View File

@@ -0,0 +1,48 @@
from .common import InfoExtractor
from ..utils import clean_html, float_or_none, traverse_obj, try_call
class JixieBaseIE(InfoExtractor):
"""
API Reference:
https://jixie.atlassian.net/servicedesk/customer/portal/2/article/1339654214?src=-1456335525,
https://scripts.jixie.media/jxvideo.3.1.min.js
"""
def _extract_data_from_jixie_id(self, display_id, video_id, webpage):
json_data = self._download_json(
'https://apidam.jixie.io/api/public/stream', display_id,
query={'metadata': 'full', 'video_id': video_id})['data']
formats, subtitles = [], {}
for stream in json_data['streams']:
if stream.get('type') == 'HLS':
fmt, sub = self._extract_m3u8_formats_and_subtitles(stream.get('url'), display_id, ext='mp4')
if json_data.get('drm'):
for f in fmt:
f['has_drm'] = True
formats.extend(fmt)
self._merge_subtitles(sub, target=subtitles)
else:
formats.append({
'url': stream.get('url'),
'width': stream.get('width'),
'height': stream.get('height'),
'ext': 'mp4',
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'title': json_data.get('title') or self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': (clean_html(traverse_obj(json_data, ('metadata', 'description')))
or self._html_search_meta(['description', 'og:description', 'twitter:description'], webpage)),
'thumbnails': traverse_obj(json_data, ('metadata', 'thumbnails')),
'duration': float_or_none(traverse_obj(json_data, ('metadata', 'duration'))),
'tags': try_call(lambda: (json_data['metadata']['keywords'] or None).split(',')),
'categories': try_call(lambda: (json_data['metadata']['categories'] or None).split(',')),
'uploader_id': json_data.get('owner_id'),
}

View File

@@ -1,17 +1,7 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
float_or_none,
traverse_obj,
try_call,
)
# Video from www.kompas.tv and video.kompas.com seems use jixie player
# see [1] https://jixie.atlassian.net/servicedesk/customer/portal/2/article/1339654214?src=-1456335525,
# [2] https://scripts.jixie.media/jxvideo.3.1.min.js for more info
from .jixie import JixieBaseIE
class KompasVideoIE(InfoExtractor):
class KompasVideoIE(JixieBaseIE):
_VALID_URL = r'https?://video\.kompas\.com/\w+/(?P<id>\d+)/(?P<slug>[\w-]+)'
_TESTS = [{
'url': 'https://video.kompas.com/watch/164474/kim-jong-un-siap-kirim-nuklir-lawan-as-dan-korsel',
@@ -33,36 +23,4 @@ class KompasVideoIE(InfoExtractor):
video_id, display_id = self._match_valid_url(url).group('id', 'slug')
webpage = self._download_webpage(url, display_id)
json_data = self._download_json(
'https://apidam.jixie.io/api/public/stream', display_id,
query={'metadata': 'full', 'video_id': video_id})['data']
formats, subtitles = [], {}
for stream in json_data['streams']:
if stream.get('type') == 'HLS':
fmt, sub = self._extract_m3u8_formats_and_subtitles(stream.get('url'), display_id, ext='mp4')
formats.extend(fmt)
self._merge_subtitles(sub, target=subtitles)
else:
formats.append({
'url': stream.get('url'),
'width': stream.get('width'),
'height': stream.get('height'),
'ext': 'mp4',
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'title': json_data.get('title') or self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': (clean_html(traverse_obj(json_data, ('metadata', 'description')))
or self._html_search_meta(['description', 'og:description', 'twitter:description'], webpage)),
'thumbnails': traverse_obj(json_data, ('metadata', 'thumbnails')),
'duration': float_or_none(traverse_obj(json_data, ('metadata', 'duration'))),
'tags': try_call(lambda: json_data['metadata']['keywords'].split(',')),
'categories': try_call(lambda: json_data['metadata']['categories'].split(',')),
'uploader_id': json_data.get('owner_id'),
}
return self._extract_data_from_jixie_id(display_id, video_id, webpage)

View File

@@ -1,11 +1,15 @@
import re
import urllib.parse
import uuid
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
join_nonempty,
parse_duration,
parse_iso8601,
traverse_obj,
try_get,
)
@@ -267,3 +271,79 @@ class MLBVideoIE(MLBBaseIE):
}
}''' % display_id,
})['data']['mediaPlayback'][0]
class MLBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/tv/g(?P<id>\d{6})'
_NETRC_MACHINE = 'mlb'
_TESTS = [{
'url': 'https://www.mlb.com/tv/g661581/vee2eff5f-a7df-4c20-bdb4-7b926fa12638',
'info_dict': {
'id': '661581',
'ext': 'mp4',
'title': '2022-07-02 - St. Louis Cardinals @ Philadelphia Phillies',
},
'params': {
'skip_download': True,
},
}]
_access_token = None
def _real_initialize(self):
if not self._access_token:
self.raise_login_required(
'All videos are only available to registered users', method='password')
def _perform_login(self, username, password):
data = f'grant_type=password&username={urllib.parse.quote(username)}&password={urllib.parse.quote(password)}&scope=openid offline_access&client_id=0oa3e1nutA1HLzAKG356'
access_token = self._download_json(
'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None,
headers={
'User-Agent': 'okhttp/3.12.1',
'Content-Type': 'application/x-www-form-urlencoded'
}, data=data.encode())['access_token']
entitlement = self._download_webpage(
f'https://media-entitlement.mlb.com/api/v3/jwt?os=Android&appname=AtBat&did={str(uuid.uuid4())}', None,
headers={
'User-Agent': 'okhttp/3.12.1',
'Authorization': f'Bearer {access_token}'
})
data = f'grant_type=urn:ietf:params:oauth:grant-type:token-exchange&subject_token={entitlement}&subject_token_type=urn:ietf:params:oauth:token-type:jwt&platform=android-tv'
self._access_token = self._download_json(
'https://us.edge.bamgrid.com/token', None,
headers={
'Accept': 'application/json',
'Authorization': 'Bearer bWxidHYmYW5kcm9pZCYxLjAuMA.6LZMbH2r--rbXcgEabaDdIslpo4RyZrlVfWZhsAgXIk',
'Content-Type': 'application/x-www-form-urlencoded'
}, data=data.encode())['access_token']
def _real_extract(self, url):
video_id = self._match_id(url)
airings = self._download_json(
f'https://search-api-mlbtv.mlb.com/svc/search/v2/graphql/persisted/query/core/Airings?variables=%7B%22partnerProgramIds%22%3A%5B%22{video_id}%22%5D%2C%22applyEsniMediaRightsLabels%22%3Atrue%7D',
video_id)['data']['Airings']
formats, subtitles = [], {}
for airing in airings:
m3u8_url = self._download_json(
airing['playbackUrls'][0]['href'].format(scenario='browser~csai'), video_id,
headers={
'Authorization': self._access_token,
'Accept': 'application/vnd.media-service+json; version=2'
})['stream']['complete']
f, s = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id=join_nonempty(airing.get('feedType'), airing.get('feedLanguage')))
formats.extend(f)
self._merge_subtitles(s, target=subtitles)
self._sort_formats(formats)
return {
'id': video_id,
'title': traverse_obj(airings, (..., 'titles', 0, 'episodeName'), get_all=False),
'formats': formats,
'subtitles': subtitles,
'http_headers': {'Authorization': f'Bearer {self._access_token}'},
}

View File

@@ -0,0 +1,43 @@
from .jixie import JixieBaseIE
class MoviewPlayIE(JixieBaseIE):
_VALID_URL = r'https?://www\.moview\.id/play/\d+/(?P<id>[\w-]+)'
_TESTS = [
{
# drm hls, only use direct link
'url': 'https://www.moview.id/play/174/Candy-Monster',
'info_dict': {
'id': '146182',
'ext': 'mp4',
'display_id': 'Candy-Monster',
'uploader_id': 'Mo165qXUUf',
'duration': 528.2,
'title': 'Candy Monster',
'description': 'Mengapa Candy Monster ingin mengambil permen Chloe?',
'thumbnail': 'https://video.jixie.media/1034/146182/146182_1280x720.jpg',
}
}, {
# non-drm hls
'url': 'https://www.moview.id/play/75/Paris-Van-Java-Episode-16',
'info_dict': {
'id': '28210',
'ext': 'mp4',
'duration': 2595.666667,
'display_id': 'Paris-Van-Java-Episode-16',
'uploader_id': 'Mo165qXUUf',
'thumbnail': 'https://video.jixie.media/1003/28210/28210_1280x720.jpg',
'description': 'md5:2a5e18d98eef9b39d7895029cac96c63',
'title': 'Paris Van Java Episode 16',
}
}
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'video_id\s*=\s*"(?P<video_id>[^"]+)', webpage, 'video_id')
return self._extract_data_from_jixie_id(display_id, video_id, webpage)

View File

@@ -169,7 +169,7 @@ class PhantomJSwrapper:
In most cases you don't need to add any `jscode`.
It is executed in `page.onLoadFinished`.
`saveAndExit();` is mandatory, use it instead of `phantom.exit()`
It is possible to wait for some element on the webpage, for example:
It is possible to wait for some element on the webpage, e.g.
var check = function() {
var elementFound = page.evaluate(function() {
return document.querySelector('#b.done') !== null;

111
yt_dlp/extractor/parler.py Normal file
View File

@@ -0,0 +1,111 @@
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
clean_html,
format_field,
int_or_none,
strip_or_none,
traverse_obj,
unified_timestamp,
urlencode_postdata,
)
class ParlerIE(InfoExtractor):
IE_DESC = 'Posts on parler.com'
_VALID_URL = r'https://parler\.com/feed/(?P<id>[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})'
_TESTS = [
{
'url': 'https://parler.com/feed/df79fdba-07cc-48fe-b085-3293897520d7',
'md5': '16e0f447bf186bb3cf64de5bbbf4d22d',
'info_dict': {
'id': 'df79fdba-07cc-48fe-b085-3293897520d7',
'ext': 'mp4',
'thumbnail': 'https://bl-images.parler.com/videos/6ce7cdf3-a27a-4d72-bf9c-d3e17ce39a66/thumbnail.jpeg',
'title': 'Parler video #df79fdba-07cc-48fe-b085-3293897520d7',
'description': 'md5:6f220bde2df4a97cbb89ac11f1fd8197',
'timestamp': 1659744000,
'upload_date': '20220806',
'uploader': 'Tulsi Gabbard',
'uploader_id': 'TulsiGabbard',
'uploader_url': 'https://parler.com/TulsiGabbard',
'view_count': int,
'comment_count': int,
'repost_count': int,
},
},
{
'url': 'https://parler.com/feed/a7406eb4-91e5-4793-b5e3-ade57a24e287',
'md5': '11687e2f5bb353682cee338d181422ed',
'info_dict': {
'id': 'a7406eb4-91e5-4793-b5e3-ade57a24e287',
'ext': 'mp4',
'thumbnail': 'https://bl-images.parler.com/videos/317827a8-1e48-4cbc-981f-7dd17d4c1183/thumbnail.jpeg',
'title': 'Parler video #a7406eb4-91e5-4793-b5e3-ade57a24e287',
'description': 'This man should run for office',
'timestamp': 1659657600,
'upload_date': '20220805',
'uploader': 'Benny Johnson',
'uploader_id': 'BennyJohnson',
'uploader_url': 'https://parler.com/BennyJohnson',
'view_count': int,
'comment_count': int,
'repost_count': int,
},
},
{
'url': 'https://parler.com/feed/f23b85c1-6558-470f-b9ff-02c145f28da5',
'md5': 'eaba1ff4a10fe281f5ce74e930ab2cb4',
'info_dict': {
'id': 'r5vkSaz8PxQ',
'ext': 'mp4',
'thumbnail': 'https://i.ytimg.com/vi_webp/r5vkSaz8PxQ/maxresdefault.webp',
'title': 'Tom MacDonald Names Reaction',
'description': 'md5:33c21f0d35ae6dc2edf3007d6696baea',
'upload_date': '20220716',
'duration': 1267,
'uploader': 'Mahesh Chookolingo',
'uploader_id': 'maheshchookolingo',
'uploader_url': 'http://www.youtube.com/user/maheshchookolingo',
'channel': 'Mahesh Chookolingo',
'channel_id': 'UCox6YeMSY1PQInbCtTaZj_w',
'channel_url': 'https://www.youtube.com/channel/UCox6YeMSY1PQInbCtTaZj_w',
'categories': ['Entertainment'],
'tags': list,
'availability': 'public',
'live_status': 'not_live',
'view_count': int,
'comment_count': int,
'like_count': int,
'channel_follower_count': int,
'age_limit': 0,
'playable_in_embed': True,
},
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://parler.com/open-api/ParleyDetailEndpoint.php', video_id,
data=urlencode_postdata({'uuid': video_id}))['data'][0]
primary = data['primary']
embed = self._parse_json(primary.get('V2LINKLONG') or '', video_id, fatal=False)
if embed:
return self.url_result(embed[0], YoutubeIE)
return {
'id': video_id,
'url': traverse_obj(primary, ('video_data', 'videoSrc')),
'thumbnail': traverse_obj(primary, ('video_data', 'thumbnailUrl')),
'title': '',
'description': strip_or_none(clean_html(primary.get('full_body'))) or None,
'timestamp': unified_timestamp(primary.get('date_created')),
'uploader': strip_or_none(primary.get('name')),
'uploader_id': strip_or_none(primary.get('username')),
'uploader_url': format_field(strip_or_none(primary.get('username')), None, 'https://parler.com/%s'),
'view_count': int_or_none(primary.get('view_count')),
'comment_count': int_or_none(traverse_obj(data, ('engagement', 'commentCount'))),
'repost_count': int_or_none(traverse_obj(data, ('engagement', 'echoCount'))),
}

View File

@@ -154,6 +154,28 @@ class PatreonIE(PatreonBaseIE):
'channel_url': 'https://www.patreon.com/loish',
'channel_follower_count': int,
}
}, {
# bad videos under media (if media is included). Real one is under post_file
'url': 'https://www.patreon.com/posts/premium-access-70282931',
'info_dict': {
'id': '70282931',
'ext': 'mp4',
'title': '[Premium Access + Uncut] The Office - 2x6 The Fight - Group Reaction',
'channel_url': 'https://www.patreon.com/thenormies',
'channel_id': '573397',
'uploader_id': '2929435',
'uploader': 'The Normies',
'description': 'md5:79c9fd8778e2cef84049a94c058a5e23',
'comment_count': int,
'upload_date': '20220809',
'thumbnail': r're:^https?://.*$',
'channel_follower_count': int,
'like_count': int,
'timestamp': 1660052820,
'tags': ['The Office', 'early access', 'uncut'],
'uploader_url': 'https://www.patreon.com/thenormies',
},
'skip': 'Patron-only content',
}]
def _real_extract(self, url):
@@ -166,7 +188,7 @@ class PatreonIE(PatreonBaseIE):
'fields[post_tag]': 'value',
'fields[campaign]': 'url,name,patron_count',
'json-api-use-default-includes': 'false',
'include': 'media,user,user_defined_tags,campaign',
'include': 'audio,user,user_defined_tags,campaign,attachments_media',
})
attributes = post['data']['attributes']
title = attributes['title'].strip()
@@ -190,11 +212,16 @@ class PatreonIE(PatreonBaseIE):
media_attributes = i.get('attributes') or {}
download_url = media_attributes.get('download_url')
ext = mimetype2ext(media_attributes.get('mimetype'))
if download_url and ext in KNOWN_EXTENSIONS:
# if size_bytes is None, this media file is likely unavailable
# See: https://github.com/yt-dlp/yt-dlp/issues/4608
size_bytes = int_or_none(media_attributes.get('size_bytes'))
if download_url and ext in KNOWN_EXTENSIONS and size_bytes is not None:
# XXX: what happens if there are multiple attachments?
return {
**info,
'ext': ext,
'filesize': int_or_none(media_attributes.get('size_bytes')),
'filesize': size_bytes,
'url': download_url,
}
elif i_type == 'user':

View File

@@ -51,6 +51,9 @@ class RaiBaseIE(InfoExtractor):
query={'output': 45, 'pl': platform},
headers=self.geo_verification_headers())
if xpath_text(relinker, './license_url', default='{}') != '{}':
self.report_drm(video_id)
if not geoprotection:
geoprotection = xpath_text(
relinker, './geoprotection', default=None) == 'Y'
@@ -251,6 +254,8 @@ class RaiPlayIE(RaiBaseIE):
},
'release_year': 2022,
'episode': 'Espresso nel caffè - 07/04/2014',
'timestamp': 1396919880,
'upload_date': '20140408',
},
'params': {
'skip_download': True,
@@ -274,6 +279,8 @@ class RaiPlayIE(RaiBaseIE):
'release_year': 2021,
'season_number': 1,
'episode': 'Senza occhi',
'timestamp': 1637318940,
'upload_date': '20211119',
},
}, {
'url': 'http://www.raiplay.it/video/2016/11/gazebotraindesi-efebe701-969c-4593-92f3-285f0d1ce750.html?',
@@ -284,7 +291,7 @@ class RaiPlayIE(RaiBaseIE):
'only_matching': True,
}, {
# DRM protected
'url': 'https://www.raiplay.it/video/2020/09/Lo-straordinario-mondo-di-Zoey-S1E1-Lo-straordinario-potere-di-Zoey-ed493918-1d32-44b7-8454-862e473d00ff.html',
'url': 'https://www.raiplay.it/video/2021/06/Lo-straordinario-mondo-di-Zoey-S2E1-Lo-straordinario-ritorno-di-Zoey-3ba992de-2332-41ad-9214-73e32ab209f4.html',
'only_matching': True,
}]
@@ -363,6 +370,8 @@ class RaiPlayLiveIE(RaiPlayIE):
'creator': 'Rai News 24',
'is_live': True,
'live_status': 'is_live',
'upload_date': '20090502',
'timestamp': 1241276220,
},
'params': {
'skip_download': True,
@@ -448,6 +457,8 @@ class RaiPlaySoundIE(RaiBaseIE):
'series': 'Il Ruggito del Coniglio',
'episode': 'Il Ruggito del Coniglio del 10/12/2021',
'creator': 'rai radio 2',
'timestamp': 1638346620,
'upload_date': '20211201',
},
'params': {
'skip_download': True,
@@ -707,7 +718,8 @@ class RaiIE(RaiBaseIE):
class RaiNewsIE(RaiIE):
_VALID_URL = rf'https?://(www\.)?rainews\.it/[^?#]+-(?P<id>{RaiBaseIE._UUID_RE})(?:-[^/?#]+)?\.html'
_VALID_URL = rf'https?://(www\.)?rainews\.it/(?!articoli)[^?#]+-(?P<id>{RaiBaseIE._UUID_RE})(?:-[^/?#]+)?\.html'
_EMBED_REGEX = [rf'<iframe[^>]+data-src="(?P<url>/iframe/[^?#]+?{RaiBaseIE._UUID_RE}\.html)']
_TESTS = [{
# new rainews player (#3911)
'url': 'https://www.rainews.it/rubriche/24mm/video/2022/05/24mm-del-29052022-12cf645d-1ffd-4220-b27c-07c226dbdecf.html',
@@ -732,6 +744,10 @@ class RaiNewsIE(RaiIE):
'upload_date': '20161103'
},
'expected_warnings': ['unable to extract player_data'],
}, {
# iframe + drm
'url': 'https://www.rainews.it/iframe/video/2022/07/euro2022-europei-calcio-femminile-italia-belgio-gol-0-1-video-4de06a69-de75-4e32-a657-02f0885f8118.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -755,6 +771,7 @@ class RaiNewsIE(RaiIE):
raise ExtractorError('Relinker URL not found', cause=e)
relinker_info = self._extract_relinker_info(urljoin(url, relinker_url), video_id)
self._sort_formats(relinker_info['formats'])
return {
@@ -769,13 +786,13 @@ class RaiNewsIE(RaiIE):
class RaiSudtirolIE(RaiBaseIE):
_VALID_URL = r'https?://raisudtirol\.rai\.it/.+?media=(?P<id>[TP]tv\d+)'
_TESTS = [{
'url': 'https://raisudtirol.rai.it/de/index.php?media=Ttv1656281400',
'url': 'https://raisudtirol.rai.it/la/index.php?media=Ptv1619729460',
'info_dict': {
'id': 'Ttv1656281400',
'id': 'Ptv1619729460',
'ext': 'mp4',
'title': 'Tagesschau + Sport am Sonntag - 31-07-2022 20:00',
'series': 'Tagesschau + Sport am Sonntag',
'upload_date': '20220731',
'title': 'Euro: trasmisciun d\'economia - 29-04-2021 20:51',
'series': 'Euro: trasmisciun d\'economia',
'upload_date': '20210429',
'thumbnail': r're:https://raisudtirol\.rai\.it/img/.+?\.jpg',
'uploader': 'raisudtirol',
}
@@ -796,6 +813,14 @@ class RaiSudtirolIE(RaiBaseIE):
'series': video_title,
'upload_date': unified_strdate(video_date),
'thumbnail': urljoin('https://raisudtirol.rai.it/', video_thumb),
'url': self._proto_relative_url(video_url),
'uploader': 'raisudtirol',
'formats': [{
'format_id': 'https-mp4',
'url': self._proto_relative_url(video_url),
'width': 1024,
'height': 576,
'fps': 25,
'vcodec': 'h264',
'acodec': 'aac',
}],
}

View File

@@ -4,7 +4,7 @@ from ..utils import int_or_none, parse_qs
class ToggoIE(InfoExtractor):
IE_NAME = 'toggo'
_VALID_URL = r'https?://(?:www\.)?toggo\.de/(?:toggolino/)?[^/?#]+/folge/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?toggo\.de/(?:toggolino/)?[^/?#]+/(?:folge|video)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.toggo.de/weihnachtsmann--co-kg/folge/ein-geschenk-fuer-zwei',
'info_dict': {
@@ -33,6 +33,9 @@ class ToggoIE(InfoExtractor):
}, {
'url': 'https://www.toggo.de/toggolino/paw-patrol/folge/der-wetter-zeppelin-der-chili-kochwettbewerb',
'only_matching': True,
}, {
'url': 'https://www.toggo.de/toggolino/paw-patrol/video/paw-patrol-rettung-im-anflug',
'only_matching': True,
}]
def _real_extract(self, url):

69
yt_dlp/extractor/truth.py Normal file
View File

@@ -0,0 +1,69 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
format_field,
int_or_none,
strip_or_none,
traverse_obj,
unified_timestamp,
)
class TruthIE(InfoExtractor):
_VALID_URL = r'https?://truthsocial\.com/@[^/]+/posts/(?P<id>\d+)'
_TESTS = [
{
'url': 'https://truthsocial.com/@realDonaldTrump/posts/108779000807761862',
'md5': '4a5fb1470c192e493d9efd6f19e514d3',
'info_dict': {
'id': '108779000807761862',
'ext': 'qt',
'title': 'Truth video #108779000807761862',
'description': None,
'timestamp': 1659835827,
'upload_date': '20220807',
'uploader': 'Donald J. Trump',
'uploader_id': 'realDonaldTrump',
'uploader_url': 'https://truthsocial.com/@realDonaldTrump',
'repost_count': int,
'comment_count': int,
'like_count': int,
},
},
{
'url': 'https://truthsocial.com/@ProjectVeritasAction/posts/108618228543962049',
'md5': 'fd47ba68933f9dce27accc52275be9c3',
'info_dict': {
'id': '108618228543962049',
'ext': 'mp4',
'title': 'md5:debde7186cf83f60ff7b44dbb9444e35',
'description': 'md5:de2fc49045bf92bb8dc97e56503b150f',
'timestamp': 1657382637,
'upload_date': '20220709',
'uploader': 'Project Veritas Action',
'uploader_id': 'ProjectVeritasAction',
'uploader_url': 'https://truthsocial.com/@ProjectVeritasAction',
'repost_count': int,
'comment_count': int,
'like_count': int,
},
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
status = self._download_json(f'https://truthsocial.com/api/v1/statuses/{video_id}', video_id)
uploader_id = strip_or_none(traverse_obj(status, ('account', 'username')))
return {
'id': video_id,
'url': status['media_attachments'][0]['url'],
'title': '',
'description': strip_or_none(clean_html(status.get('content'))) or None,
'timestamp': unified_timestamp(status.get('created_at')),
'uploader': strip_or_none(traverse_obj(status, ('account', 'display_name'))),
'uploader_id': uploader_id,
'uploader_url': format_field(uploader_id, None, 'https://truthsocial.com/@%s'),
'repost_count': int_or_none(status.get('reblogs_count')),
'like_count': int_or_none(status.get('favourites_count')),
'comment_count': int_or_none(status.get('replies_count')),
}

View File

@@ -70,16 +70,17 @@ class TubiTvIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
'https://tubitv.com/oz/videos/%s/content?video_resources=dash&video_resources=hlsv3&video_resources=hlsv6' % video_id, video_id)
title = video_data['title']
formats = []
url = video_data['url']
# URL can be sometimes empty. Does this only happen when there is DRM?
if url:
formats = self._extract_m3u8_formats(
self._proto_relative_url(url),
video_id, 'mp4', 'm3u8_native')
for resource in video_data['video_resources']:
if resource['type'] in ('dash', ):
formats += self._extract_mpd_formats(resource['manifest']['url'], video_id, mpd_id=resource['type'], fatal=False)
elif resource['type'] in ('hlsv3', 'hlsv6'):
formats += self._extract_m3u8_formats(resource['manifest']['url'], video_id, 'mp4', m3u8_id=resource['type'], fatal=False)
self._sort_formats(formats)
thumbnails = []

View File

@@ -1169,7 +1169,7 @@ class TwitchClipsIE(TwitchBaseIE):
'id': clip.get('id') or video_id,
'_old_archive_ids': [make_archive_id(self, old_id)] if old_id else None,
'display_id': video_id,
'title': clip.get('title') or video_id,
'title': clip.get('title'),
'formats': formats,
'duration': int_or_none(clip.get('durationSeconds')),
'view_count': int_or_none(clip.get('viewCount')),

View File

@@ -2653,7 +2653,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if self.get_param('youtube_print_sig_code'):
self.to_screen(f'Extracted nsig function from {player_id}:\n{func_code[1]}\n')
return lambda s: jsi.extract_function_from_code(*func_code)([s])
func = jsi.extract_function_from_code(*func_code)
return lambda s: func([s])
def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False):
"""
@@ -3246,9 +3247,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else -10 if 'descriptive' in (audio_track.get('displayName') or '').lower() and -10
else -1)
# Some formats may have much smaller duration than others (possibly damaged during encoding)
# Eg: 2-nOtRESiUc Ref: https://github.com/yt-dlp/yt-dlp/issues/2823
# E.g. 2-nOtRESiUc Ref: https://github.com/yt-dlp/yt-dlp/issues/2823
# Make sure to avoid false positives with small duration differences.
# Eg: __2ABJjxzNo, ySuUZEjARPY
# E.g. __2ABJjxzNo, ySuUZEjARPY
is_damaged = try_get(fmt, lambda x: float(x['approxDurationMs']) / duration < 500)
if is_damaged:
self.report_warning(
@@ -3588,7 +3589,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
formats.extend(self._extract_storyboard(player_responses, duration))
# source_preference is lower for throttled/potentially damaged formats
self._sort_formats(formats, ('quality', 'res', 'fps', 'hdr:12', 'channels', 'source', 'codec:vp9.2', 'lang', 'proto'))
self._sort_formats(formats, (
'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec:vp9.2', 'channels', 'acodec', 'lang', 'proto'))
info = {
'id': video_id,
@@ -5832,7 +5834,7 @@ class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
class YoutubeMusicSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube music search URLs with selectable sections (Eg: #songs)'
IE_DESC = 'YouTube music search URLs with selectable sections, e.g. #songs'
IE_NAME = 'youtube:music:search_url'
_VALID_URL = r'https?://music\.youtube\.com/search\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
_TESTS = [{

View File

@@ -2,10 +2,7 @@ import re
from uuid import uuid4
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..compat import compat_HTTPError, compat_str
from ..utils import (
ExtractorError,
int_or_none,
@@ -237,6 +234,10 @@ class ZattooPlatformBaseIE(InfoExtractor):
ondemand_termtoken=ondemand_termtoken, ondemand_type=ondemand_type)
return info_dict
def _real_extract(self, url):
video_id, record_id = self._match_valid_url(url).groups()
return self._extract_video(video_id, record_id)
def _make_valid_url(host):
return rf'https?://(?:www\.)?{re.escape(host)}/watch/[^/]+?/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'

View File

@@ -1,29 +1,72 @@
import collections
import contextlib
import itertools
import json
import math
import operator
import re
from .utils import ExtractorError, remove_quotes
from .utils import (
NO_DEFAULT,
ExtractorError,
js_to_json,
remove_quotes,
truncate_string,
unified_timestamp,
write_string,
)
_NAME_RE = r'[a-zA-Z_$][\w$]*'
_OPERATORS = {
# Ref: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence
_OPERATORS = { # None => Defined in JSInterpreter._operator
'?': None,
'||': None,
'&&': None,
'&': operator.and_,
'|': operator.or_,
'^': operator.xor,
'&': operator.and_,
'===': operator.is_,
'!==': operator.is_not,
'==': operator.eq,
'!=': operator.ne,
'<=': operator.le,
'>=': operator.ge,
'<': operator.lt,
'>': operator.gt,
'>>': operator.rshift,
'<<': operator.lshift,
'-': operator.sub,
'+': operator.add,
'%': operator.mod,
'/': operator.truediv,
'-': operator.sub,
'*': operator.mul,
'/': operator.truediv,
'%': operator.mod,
'**': operator.pow,
}
_COMP_OPERATORS = {'===', '!==', '==', '!=', '<=', '>=', '<', '>'}
_MATCHING_PARENS = dict(zip('({[', ')}]'))
_QUOTES = '\'"'
def _ternary(cndn, if_true=True, if_false=False):
"""Simulate JS's ternary operator (cndn?if_true:if_false)"""
if cndn in (False, None, 0, ''):
return if_false
with contextlib.suppress(TypeError):
if math.isnan(cndn): # NB: NaN cannot be checked by membership
return if_false
return if_true
class JS_Break(ExtractorError):
def __init__(self):
ExtractorError.__init__(self, 'Invalid break')
@@ -46,6 +89,27 @@ class LocalNameSpace(collections.ChainMap):
raise NotImplementedError('Deleting is not supported')
class Debugger:
import sys
ENABLED = False and 'pytest' in sys.modules
@staticmethod
def write(*args, level=100):
write_string(f'[debug] JS: {" " * (100 - level)}'
f'{" ".join(truncate_string(str(x), 50, 50) for x in args)}\n')
@classmethod
def wrap_interpreter(cls, f):
def interpret_statement(self, stmt, local_vars, allow_recursion, *args, **kwargs):
if cls.ENABLED and stmt.strip():
cls.write(stmt, level=allow_recursion)
ret, should_ret = f(self, stmt, local_vars, allow_recursion, *args, **kwargs)
if cls.ENABLED and stmt.strip():
cls.write(['->', '=>'][should_ret], repr(ret), '<-|', stmt, level=allow_recursion)
return ret, should_ret
return interpret_statement
class JSInterpreter:
__named_object_counter = 0
@@ -53,6 +117,12 @@ class JSInterpreter:
self.code, self._functions = code, {}
self._objects = {} if objects is None else objects
class Exception(ExtractorError):
def __init__(self, msg, expr=None, *args, **kwargs):
if expr is not None:
msg = f'{msg.rstrip()} in: {truncate_string(expr, 50, 50)}'
super().__init__(msg, *args, **kwargs)
def _named_object(self, namespace, obj):
self.__named_object_counter += 1
name = f'__yt_dlp_jsinterp_obj{self.__named_object_counter}'
@@ -67,9 +137,9 @@ class JSInterpreter:
start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
in_quote, escaping = None, False
for idx, char in enumerate(expr):
if char in _MATCHING_PARENS:
if not in_quote and char in _MATCHING_PARENS:
counters[_MATCHING_PARENS[char]] += 1
elif char in counters:
elif not in_quote and char in counters:
counters[char] -= 1
elif not escaping and char in _QUOTES and in_quote in (char, None):
in_quote = None if in_quote else char
@@ -92,53 +162,99 @@ class JSInterpreter:
def _separate_at_paren(cls, expr, delim):
separated = list(cls._separate(expr, delim, 1))
if len(separated) < 2:
raise ExtractorError(f'No terminating paren {delim} in {expr}')
raise cls.Exception(f'No terminating paren {delim}', expr)
return separated[0][1:].strip(), separated[1].strip()
def _operator(self, op, left_val, right_expr, expr, local_vars, allow_recursion):
if op in ('||', '&&'):
if (op == '&&') ^ _ternary(left_val):
return left_val # short circuiting
elif op == '?':
right_expr = _ternary(left_val, *self._separate(right_expr, ':', 1))
right_val = self.interpret_expression(right_expr, local_vars, allow_recursion)
if not _OPERATORS.get(op):
return right_val
try:
return _OPERATORS[op](left_val, right_val)
except Exception as e:
raise self.Exception(f'Failed to evaluate {left_val!r} {op} {right_val!r}', expr, cause=e)
def _index(self, obj, idx):
if idx == 'length':
return len(obj)
try:
return obj[int(idx)] if isinstance(obj, list) else obj[idx]
except Exception as e:
raise self.Exception(f'Cannot get index {idx}', repr(obj), cause=e)
def _dump(self, obj, namespace):
try:
return json.dumps(obj)
except TypeError:
return self._named_object(namespace, obj)
@Debugger.wrap_interpreter
def interpret_statement(self, stmt, local_vars, allow_recursion=100):
if allow_recursion < 0:
raise ExtractorError('Recursion limit reached')
raise self.Exception('Recursion limit reached')
allow_recursion -= 1
should_abort = False
should_return = False
sub_statements = list(self._separate(stmt, ';')) or ['']
stmt = sub_statements.pop().lstrip()
expr = stmt = sub_statements.pop().strip()
for sub_stmt in sub_statements:
ret, should_abort = self.interpret_statement(sub_stmt, local_vars, allow_recursion - 1)
if should_abort:
return ret, should_abort
ret, should_return = self.interpret_statement(sub_stmt, local_vars, allow_recursion)
if should_return:
return ret, should_return
m = re.match(r'(?P<var>var\s)|return(?:\s+|$)', stmt)
if not m: # Try interpreting it as an expression
expr = stmt
elif m.group('var'):
expr = stmt[len(m.group(0)):]
else:
expr = stmt[len(m.group(0)):]
should_abort = True
return self.interpret_expression(expr, local_vars, allow_recursion), should_abort
def interpret_expression(self, expr, local_vars, allow_recursion):
expr = expr.strip()
m = re.match(r'(?P<var>(?:var|const|let)\s)|return(?:\s+|$)', stmt)
if m:
expr = stmt[len(m.group(0)):].strip()
should_return = not m.group('var')
if not expr:
return None
return None, should_return
if expr[0] in _QUOTES:
inner, outer = self._separate(expr, expr[0], 1)
inner = json.loads(js_to_json(f'{inner}{expr[0]}', strict=True))
if not outer:
return inner, should_return
expr = self._named_object(local_vars, inner) + outer
if expr.startswith('new '):
obj = expr[4:]
if obj.startswith('Date('):
left, right = self._separate_at_paren(obj[4:], ')')
expr = unified_timestamp(
self.interpret_expression(left, local_vars, allow_recursion), False)
if not expr:
raise self.Exception(f'Failed to parse date {left!r}', expr)
expr = self._dump(int(expr * 1000), local_vars) + right
else:
raise self.Exception(f'Unsupported object {obj}', expr)
if expr.startswith('void '):
left = self.interpret_expression(expr[5:], local_vars, allow_recursion)
return None, should_return
if expr.startswith('{'):
inner, outer = self._separate_at_paren(expr, '}')
inner, should_abort = self.interpret_statement(inner, local_vars, allow_recursion - 1)
inner, should_abort = self.interpret_statement(inner, local_vars, allow_recursion)
if not outer or should_abort:
return inner
return inner, should_abort or should_return
else:
expr = json.dumps(inner) + outer
expr = self._dump(inner, local_vars) + outer
if expr.startswith('('):
inner, outer = self._separate_at_paren(expr, ')')
inner = self.interpret_expression(inner, local_vars, allow_recursion)
if not outer:
return inner
inner, should_abort = self.interpret_statement(inner, local_vars, allow_recursion)
if not outer or should_abort:
return inner, should_abort or should_return
else:
expr = json.dumps(inner) + outer
expr = self._dump(inner, local_vars) + outer
if expr.startswith('['):
inner, outer = self._separate_at_paren(expr, ']')
@@ -147,21 +263,23 @@ class JSInterpreter:
for item in self._separate(inner)])
expr = name + outer
m = re.match(r'(?P<try>try)\s*|(?:(?P<catch>catch)|(?P<for>for)|(?P<switch>switch))\s*\(', expr)
m = re.match(r'(?P<try>try|finally)\s*|(?:(?P<catch>catch)|(?P<for>for)|(?P<switch>switch))\s*\(', expr)
if m and m.group('try'):
if expr[m.end()] == '{':
try_expr, expr = self._separate_at_paren(expr[m.end():], '}')
else:
try_expr, expr = expr[m.end() - 1:], ''
ret, should_abort = self.interpret_statement(try_expr, local_vars, allow_recursion - 1)
ret, should_abort = self.interpret_statement(try_expr, local_vars, allow_recursion)
if should_abort:
return ret
return self.interpret_statement(expr, local_vars, allow_recursion - 1)[0]
return ret, True
ret, should_abort = self.interpret_statement(expr, local_vars, allow_recursion)
return ret, should_abort or should_return
elif m and m.group('catch'):
# We ignore the catch block
_, expr = self._separate_at_paren(expr, '}')
return self.interpret_statement(expr, local_vars, allow_recursion - 1)[0]
ret, should_abort = self.interpret_statement(expr, local_vars, allow_recursion)
return ret, should_abort or should_return
elif m and m.group('for'):
constructor, remaining = self._separate_at_paren(expr[m.end() - 1:], ')')
@@ -176,24 +294,21 @@ class JSInterpreter:
else:
body, expr = remaining, ''
start, cndn, increment = self._separate(constructor, ';')
if self.interpret_statement(start, local_vars, allow_recursion - 1)[1]:
raise ExtractorError(
f'Premature return in the initialization of a for loop in {constructor!r}')
self.interpret_expression(start, local_vars, allow_recursion)
while True:
if not self.interpret_expression(cndn, local_vars, allow_recursion):
if not _ternary(self.interpret_expression(cndn, local_vars, allow_recursion)):
break
try:
ret, should_abort = self.interpret_statement(body, local_vars, allow_recursion - 1)
ret, should_abort = self.interpret_statement(body, local_vars, allow_recursion)
if should_abort:
return ret
return ret, True
except JS_Break:
break
except JS_Continue:
pass
if self.interpret_statement(increment, local_vars, allow_recursion - 1)[1]:
raise ExtractorError(
f'Premature return in the initialization of a for loop in {constructor!r}')
return self.interpret_statement(expr, local_vars, allow_recursion - 1)[0]
self.interpret_expression(increment, local_vars, allow_recursion)
ret, should_abort = self.interpret_statement(expr, local_vars, allow_recursion)
return ret, should_abort or should_return
elif m and m.group('switch'):
switch_val, remaining = self._separate_at_paren(expr[m.end() - 1:], ')')
@@ -207,24 +322,28 @@ class JSInterpreter:
if default:
matched = matched or case == 'default'
elif not matched:
matched = case != 'default' and switch_val == self.interpret_expression(case, local_vars, allow_recursion)
matched = (case != 'default'
and switch_val == self.interpret_expression(case, local_vars, allow_recursion))
if not matched:
continue
try:
ret, should_abort = self.interpret_statement(stmt, local_vars, allow_recursion - 1)
ret, should_abort = self.interpret_statement(stmt, local_vars, allow_recursion)
if should_abort:
return ret
except JS_Break:
break
if matched:
break
return self.interpret_statement(expr, local_vars, allow_recursion - 1)[0]
ret, should_abort = self.interpret_statement(expr, local_vars, allow_recursion)
return ret, should_abort or should_return
# Comma separated statements
sub_expressions = list(self._separate(expr))
expr = sub_expressions.pop().strip() if sub_expressions else ''
for sub_expr in sub_expressions:
self.interpret_expression(sub_expr, local_vars, allow_recursion)
ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion)
if should_abort:
return ret, True
for m in re.finditer(rf'''(?x)
(?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
@@ -236,47 +355,45 @@ class JSInterpreter:
local_vars[var] += 1 if sign[0] == '+' else -1
if m.group('pre_sign'):
ret = local_vars[var]
expr = expr[:start] + json.dumps(ret) + expr[end:]
expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
if not expr:
return None
return None, should_return
m = re.match(fr'''(?x)
(?P<assign>
(?P<out>{_NAME_RE})(?:\[(?P<index>[^\]]+?)\])?\s*
(?P<op>{"|".join(map(re.escape, _OPERATORS))})?
(?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
=(?P<expr>.*)$
)|(?P<return>
(?!if|return|true|false|null)(?P<name>{_NAME_RE})$
(?!if|return|true|false|null|undefined)(?P<name>{_NAME_RE})$
)|(?P<indexing>
(?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
)|(?P<attribute>
(?P<var>{_NAME_RE})(?:\.(?P<member>[^(]+)|\[(?P<member2>[^\]]+)\])\s*
)|(?P<function>
(?P<fname>{_NAME_RE})\((?P<args>[\w$,]*)\)$
(?P<fname>{_NAME_RE})\((?P<args>.*)\)$
)''', expr)
if m and m.group('assign'):
if not m.group('op'):
opfunc = lambda curr, right: right
else:
opfunc = _OPERATORS[m.group('op')]
right_val = self.interpret_expression(m.group('expr'), local_vars, allow_recursion)
left_val = local_vars.get(m.group('out'))
if not m.group('index'):
local_vars[m.group('out')] = opfunc(left_val, right_val)
return local_vars[m.group('out')]
local_vars[m.group('out')] = self._operator(
m.group('op'), left_val, m.group('expr'), expr, local_vars, allow_recursion)
return local_vars[m.group('out')], should_return
elif left_val is None:
raise ExtractorError(f'Cannot index undefined variable: {m.group("out")}')
raise self.Exception(f'Cannot index undefined variable {m.group("out")}', expr)
idx = self.interpret_expression(m.group('index'), local_vars, allow_recursion)
if not isinstance(idx, int):
raise ExtractorError(f'List indices must be integers: {idx}')
left_val[idx] = opfunc(left_val[idx], right_val)
return left_val[idx]
if not isinstance(idx, (int, float)):
raise self.Exception(f'List index {idx} must be integer', expr)
idx = int(idx)
left_val[idx] = self._operator(
m.group('op'), left_val[idx], m.group('expr'), expr, local_vars, allow_recursion)
return left_val[idx], should_return
elif expr.isdigit():
return int(expr)
return int(expr), should_return
elif expr == 'break':
raise JS_Break()
@@ -284,35 +401,35 @@ class JSInterpreter:
raise JS_Continue()
elif m and m.group('return'):
return local_vars[m.group('name')]
return local_vars[m.group('name')], should_return
with contextlib.suppress(ValueError):
return json.loads(expr)
return json.loads(js_to_json(expr, strict=True)), should_return
if m and m.group('indexing'):
val = local_vars[m.group('in')]
idx = self.interpret_expression(m.group('idx'), local_vars, allow_recursion)
return val[idx]
return self._index(val, idx), should_return
for op, opfunc in _OPERATORS.items():
for op in _OPERATORS:
separated = list(self._separate(expr, op))
if len(separated) < 2:
right_expr = separated.pop()
while op in '<>*-' and len(separated) > 1 and not separated[-1].strip():
separated.pop()
right_expr = f'{op}{right_expr}'
if op != '-':
right_expr = f'{separated.pop()}{op}{right_expr}'
if not separated:
continue
right_val = separated.pop()
left_val = op.join(separated)
left_val, should_abort = self.interpret_statement(
left_val, local_vars, allow_recursion - 1)
if should_abort:
raise ExtractorError(f'Premature left-side return of {op} in {expr!r}')
right_val, should_abort = self.interpret_statement(
right_val, local_vars, allow_recursion - 1)
if should_abort:
raise ExtractorError(f'Premature right-side return of {op} in {expr!r}')
return opfunc(left_val or 0, right_val)
left_val = self.interpret_expression(op.join(separated), local_vars, allow_recursion)
return self._operator(op, 0 if left_val is None else left_val,
right_expr, expr, local_vars, allow_recursion), should_return
if m and m.group('attribute'):
variable = m.group('var')
member = remove_quotes(m.group('member') or m.group('member2'))
member = m.group('member')
if not member:
member = self.interpret_expression(m.group('member2'), local_vars, allow_recursion)
arg_str = expr[m.end():]
if arg_str.startswith('('):
arg_str, remaining = self._separate_at_paren(arg_str, ')')
@@ -322,23 +439,27 @@ class JSInterpreter:
def assertion(cndn, msg):
""" assert, but without risk of getting optimized out """
if not cndn:
raise ExtractorError(f'{member} {msg}: {expr}')
raise self.Exception(f'{member} {msg}', expr)
def eval_method():
if variable == 'String':
obj = str
elif variable in local_vars:
obj = local_vars[variable]
else:
if (variable, member) == ('console', 'debug'):
if Debugger.ENABLED:
Debugger.write(self.interpret_expression(f'[{arg_str}]', local_vars, allow_recursion))
return
types = {
'String': str,
'Math': float,
}
obj = local_vars.get(variable, types.get(variable, NO_DEFAULT))
if obj is NO_DEFAULT:
if variable not in self._objects:
self._objects[variable] = self.extract_object(variable)
obj = self._objects[variable]
# Member access
if arg_str is None:
if member == 'length':
return len(obj)
return obj[member]
return self._index(obj, member)
# Function call
argvals = [
@@ -349,12 +470,17 @@ class JSInterpreter:
if member == 'fromCharCode':
assertion(argvals, 'takes one or more arguments')
return ''.join(map(chr, argvals))
raise ExtractorError(f'Unsupported string method {member}')
raise self.Exception(f'Unsupported String method {member}', expr)
elif obj == float:
if member == 'pow':
assertion(len(argvals) == 2, 'takes two arguments')
return argvals[0] ** argvals[1]
raise self.Exception(f'Unsupported Math method {member}', expr)
if member == 'split':
assertion(argvals, 'takes one or more arguments')
assertion(argvals == [''], 'with arguments is not implemented')
return list(obj)
assertion(len(argvals) == 1, 'with limit argument is not implemented')
return obj.split(argvals[0]) if argvals[0] else list(obj)
elif member == 'join':
assertion(isinstance(obj, list), 'must be applied on a list')
assertion(len(argvals) == 1, 'takes exactly one argument')
@@ -400,7 +526,7 @@ class JSInterpreter:
assertion(argvals, 'takes one or more arguments')
assertion(len(argvals) <= 2, 'takes at-most 2 arguments')
f, this = (argvals + [''])[:2]
return [f((item, idx, obj), this=this) for idx, item in enumerate(obj)]
return [f((item, idx, obj), {'this': this}, allow_recursion) for idx, item in enumerate(obj)]
elif member == 'indexOf':
assertion(argvals, 'takes one or more arguments')
assertion(len(argvals) <= 2, 'takes at-most 2 arguments')
@@ -410,27 +536,35 @@ class JSInterpreter:
except ValueError:
return -1
return obj[int(member) if isinstance(obj, list) else member](argvals)
idx = int(member) if isinstance(obj, list) else member
return obj[idx](argvals, allow_recursion=allow_recursion)
if remaining:
return self.interpret_expression(
ret, should_abort = self.interpret_statement(
self._named_object(local_vars, eval_method()) + remaining,
local_vars, allow_recursion)
return ret, should_return or should_abort
else:
return eval_method()
return eval_method(), should_return
elif m and m.group('function'):
fname = m.group('fname')
argvals = tuple(
int(v) if v.isdigit() else local_vars[v]
for v in self._separate(m.group('args')))
argvals = [self.interpret_expression(v, local_vars, allow_recursion)
for v in self._separate(m.group('args'))]
if fname in local_vars:
return local_vars[fname](argvals)
return local_vars[fname](argvals, allow_recursion=allow_recursion), should_return
elif fname not in self._functions:
self._functions[fname] = self.extract_function(fname)
return self._functions[fname](argvals)
return self._functions[fname](argvals, allow_recursion=allow_recursion), should_return
raise ExtractorError(f'Unsupported JS expression {expr!r}')
raise self.Exception(
f'Unsupported JS expression {truncate_string(expr, 20, 20) if expr != stmt else ""}', stmt)
def interpret_expression(self, expr, local_vars, allow_recursion):
ret, should_return = self.interpret_statement(expr, local_vars, allow_recursion)
if should_return:
raise self.Exception('Cannot return from an expression', expr)
return ret
def extract_object(self, objname):
_FUNC_NAME_RE = r'''(?:[a-zA-Z$0-9]+|"[a-zA-Z$0-9]+"|'[a-zA-Z$0-9]+')'''
@@ -442,12 +576,14 @@ class JSInterpreter:
}\s*;
''' % (re.escape(objname), _FUNC_NAME_RE),
self.code)
if not obj_m:
raise self.Exception(f'Could not find object {objname}')
fields = obj_m.group('fields')
# Currently, it only supports function definitions
fields_m = re.finditer(
r'''(?x)
(?P<key>%s)\s*:\s*function\s*\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}
''' % _FUNC_NAME_RE,
(?P<key>%s)\s*:\s*function\s*\((?P<args>(?:%s|,)*)\){(?P<code>[^}]+)}
''' % (_FUNC_NAME_RE, _NAME_RE),
fields)
for f in fields_m:
argnames = f.group('args').split(',')
@@ -458,19 +594,19 @@ class JSInterpreter:
def extract_function_code(self, funcname):
""" @returns argnames, code """
func_m = re.search(
r'''(?x)
r'''(?xs)
(?:
function\s+%(name)s|
[{;,]\s*%(name)s\s*=\s*function|
var\s+%(name)s\s*=\s*function
(?:var|const|let)\s+%(name)s\s*=\s*function
)\s*
\((?P<args>[^)]*)\)\s*
(?P<code>{(?:(?!};)[^"]|"([^"]|\\")*")+})''' % {'name': re.escape(funcname)},
(?P<code>{.+})''' % {'name': re.escape(funcname)},
self.code)
code, _ = self._separate_at_paren(func_m.group('code'), '}') # refine the match
code, _ = self._separate_at_paren(func_m.group('code'), '}')
if func_m is None:
raise ExtractorError(f'Could not find JS function "{funcname}"')
return func_m.group('args').split(','), code
raise self.Exception(f'Could not find JS function "{funcname}"')
return [x.strip() for x in func_m.group('args').split(',')], code
def extract_function(self, funcname):
return self.extract_function_from_code(*self.extract_function_code(funcname))
@@ -494,16 +630,13 @@ class JSInterpreter:
def build_function(self, argnames, code, *global_stack):
global_stack = list(global_stack) or [{}]
argnames = tuple(argnames)
def resf(args, **kwargs):
global_stack[0].update({
**dict(zip(argnames, args)),
**kwargs
})
def resf(args, kwargs={}, allow_recursion=100):
global_stack[0].update(itertools.zip_longest(argnames, args, fillvalue=None))
global_stack[0].update(kwargs)
var_stack = LocalNameSpace(*global_stack)
for stmt in self._separate(code.replace('\n', ''), ';'):
ret, should_abort = self.interpret_statement(stmt, var_stack)
if should_abort:
break
return ret
ret, should_abort = self.interpret_statement(code.replace('\n', ''), var_stack, allow_recursion - 1)
if should_abort:
return ret
return resf

View File

@@ -34,7 +34,7 @@ def format_text(text, f):
'''
@param f String representation of formatting to apply in the form:
[style] [light] font_color [on [light] bg_color]
Eg: "red", "bold green on light blue"
E.g. "red", "bold green on light blue"
'''
f = f.upper()
tokens = f.strip().split()

View File

@@ -77,7 +77,7 @@ def parseOpts(overrideArguments=None, ignore_config_files='if_override'):
if root.parse_known_args()[0].ignoreconfig:
return False
# Multiple package names can be given here
# Eg: ('yt-dlp', 'youtube-dlc', 'youtube-dl') will look for
# E.g. ('yt-dlp', 'youtube-dlc', 'youtube-dl') will look for
# the configuration file of any of these three packages
for package in ('yt-dlp',):
if user:
@@ -374,7 +374,7 @@ def create_parser():
dest='default_search', metavar='PREFIX',
help=(
'Use this prefix for unqualified URLs. '
'Eg: "gvsearch2:python" downloads two videos from google videos for the search term "python". '
'E.g. "gvsearch2:python" downloads two videos from google videos for the search term "python". '
'Use the value "auto" to let yt-dlp guess ("auto_warning" to emit a warning when guessing). '
'"error" just throws an error. The default value "fixup_error" repairs broken URLs, '
'but emits an error if this is not possible instead of searching'))
@@ -459,7 +459,7 @@ def create_parser():
help=(
'Create aliases for an option string. Unless an alias starts with a dash "-", it is prefixed with "--". '
'Arguments are parsed according to the Python string formatting mini-language. '
'Eg: --alias get-audio,-X "-S=aext:{0},abr -x --audio-format {0}" creates options '
'E.g. --alias get-audio,-X "-S=aext:{0},abr -x --audio-format {0}" creates options '
'"--get-audio" and "-X" that takes an argument (ARG0) and expands to '
'"-S=aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
'Alias options can trigger more aliases; so be careful to avoid defining recursive options. '
@@ -471,8 +471,8 @@ def create_parser():
'--proxy', dest='proxy',
default=None, metavar='URL',
help=(
'Use the specified HTTP/HTTPS/SOCKS proxy. To enable SOCKS proxy, specify a proper scheme. '
'Eg: socks5://user:pass@127.0.0.1:1080/. Pass in an empty string (--proxy "") for direct connection'))
'Use the specified HTTP/HTTPS/SOCKS proxy. To enable SOCKS proxy, specify a proper scheme, '
'e.g. socks5://user:pass@127.0.0.1:1080/. Pass in an empty string (--proxy "") for direct connection'))
network.add_option(
'--socket-timeout',
dest='socket_timeout', type=float, default=None, metavar='SECONDS',
@@ -537,7 +537,7 @@ def create_parser():
'Comma separated playlist_index of the videos to download. '
'You can specify a range using "[START]:[STOP][:STEP]". For backward compatibility, START-STOP is also supported. '
'Use negative indices to count from the right and negative STEP to download in reverse order. '
'Eg: "-I 1:3,7,-5::2" used on a playlist of size 15 will download the videos at index 1,2,3,7,11,13,15'))
'E.g. "-I 1:3,7,-5::2" used on a playlist of size 15 will download the videos at index 1,2,3,7,11,13,15'))
selection.add_option(
'--match-title',
dest='matchtitle', metavar='REGEX',
@@ -549,17 +549,17 @@ def create_parser():
selection.add_option(
'--min-filesize',
metavar='SIZE', dest='min_filesize', default=None,
help='Do not download any videos smaller than SIZE (e.g. 50k or 44.6m)')
help='Do not download any videos smaller than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--max-filesize',
metavar='SIZE', dest='max_filesize', default=None,
help='Do not download any videos larger than SIZE (e.g. 50k or 44.6m)')
help='Do not download any videos larger than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--date',
metavar='DATE', dest='date', default=None,
help=(
'Download only videos uploaded on this date. The date can be "YYYYMMDD" or in the format '
'[now|today|yesterday][-N[day|week|month|year]]. Eg: --date today-2weeks'))
'[now|today|yesterday][-N[day|week|month|year]]. E.g. --date today-2weeks'))
selection.add_option(
'--datebefore',
metavar='DATE', dest='datebefore', default=None,
@@ -589,7 +589,7 @@ def create_parser():
'You can also simply specify a field to match if the field is present, '
'use "!field" to check if the field is not present, and "&" to check multiple conditions. '
'Use a "\\" to escape "&" or quotes if needed. If used multiple times, '
'the filter matches if atleast one of the conditions are met. Eg: --match-filter '
'the filter matches if atleast one of the conditions are met. E.g. --match-filter '
'!is_live --match-filter "like_count>?100 & description~=\'(?i)\\bcats \\& dogs\\b\'" '
'matches only videos that are not live OR those that have a like count more than 100 '
'(or the like field is not available) and also has a description '
@@ -785,7 +785,7 @@ def create_parser():
'--merge-output-format',
action='store', dest='merge_output_format', metavar='FORMAT', default=None,
help=(
'Containers that may be used when merging formats, separated by "/" (Eg: "mp4/mkv"). '
'Containers that may be used when merging formats, separated by "/", e.g. "mp4/mkv". '
'Ignored if no merge is required. '
f'(currently supported: {", ".join(sorted(FFmpegMergerPP.SUPPORTED_EXTS))})'))
video_format.add_option(
@@ -825,14 +825,14 @@ def create_parser():
subtitles.add_option(
'--sub-format',
action='store', dest='subtitlesformat', metavar='FORMAT', default='best',
help='Subtitle format; accepts formats preference, Eg: "srt" or "ass/srt/best"')
help='Subtitle format; accepts formats preference, e.g. "srt" or "ass/srt/best"')
subtitles.add_option(
'--sub-langs', '--srt-langs',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_list_from_options_callback,
help=(
'Languages of the subtitles to download (can be regex) or "all" separated by commas. (Eg: --sub-langs "en.*,ja") '
'You can prefix the language code with a "-" to exclude it from the requested languages. (Eg: --sub-langs all,-live_chat) '
'Languages of the subtitles to download (can be regex) or "all" separated by commas, e.g. --sub-langs "en.*,ja". '
'You can prefix the language code with a "-" to exclude it from the requested languages, e.g. --sub-langs all,-live_chat. '
'Use --list-subs for a list of available language tags'))
downloader = optparse.OptionGroup(parser, 'Download Options')
@@ -843,11 +843,11 @@ def create_parser():
downloader.add_option(
'-r', '--limit-rate', '--rate-limit',
dest='ratelimit', metavar='RATE',
help='Maximum download rate in bytes per second (e.g. 50K or 4.2M)')
help='Maximum download rate in bytes per second, e.g. 50K or 4.2M')
downloader.add_option(
'--throttled-rate',
dest='throttledratelimit', metavar='RATE',
help='Minimum download rate in bytes per second below which throttling is assumed and the video data is re-extracted (e.g. 100K)')
help='Minimum download rate in bytes per second below which throttling is assumed and the video data is re-extracted, e.g. 100K')
downloader.add_option(
'-R', '--retries',
dest='retries', metavar='RETRIES', default=10,
@@ -871,8 +871,8 @@ def create_parser():
'Time to sleep between retries in seconds (optionally) prefixed by the type of retry '
'(http (default), fragment, file_access, extractor) to apply the sleep to. '
'EXPR can be a number, linear=START[:END[:STEP=1]] or exp=START[:END[:BASE=2]]. '
'This option can be used multiple times to set the sleep for the different retry types. '
'Eg: --retry-sleep linear=1::2 --retry-sleep fragment:exp=1:20'))
'This option can be used multiple times to set the sleep for the different retry types, '
'e.g. --retry-sleep linear=1::2 --retry-sleep fragment:exp=1:20'))
downloader.add_option(
'--skip-unavailable-fragments', '--no-abort-on-unavailable-fragment',
action='store_true', dest='skip_unavailable_fragments', default=True,
@@ -892,7 +892,7 @@ def create_parser():
downloader.add_option(
'--buffer-size',
dest='buffersize', metavar='SIZE', default='1024',
help='Size of download buffer (e.g. 1024 or 16K) (default is %default)')
help='Size of download buffer, e.g. 1024 or 16K (default is %default)')
downloader.add_option(
'--resize-buffer',
action='store_false', dest='noresizebuffer',
@@ -905,7 +905,7 @@ def create_parser():
'--http-chunk-size',
dest='http_chunk_size', metavar='SIZE', default=None,
help=(
'Size of a chunk for chunk-based HTTP downloading (e.g. 10485760 or 10M) (default is disabled). '
'Size of a chunk for chunk-based HTTP downloading, e.g. 10485760 or 10M (default is disabled). '
'May be useful for bypassing bandwidth throttling imposed by a webserver (experimental)'))
downloader.add_option(
'--test',
@@ -963,8 +963,8 @@ def create_parser():
help=(
'Download only chapters whose title matches the given regular expression. '
'Time ranges prefixed by a "*" can also be used in place of chapters to download the specified range. '
'Eg: --download-sections "*10:15-15:00" --download-sections "intro". '
'Needs ffmpeg. This option can be used multiple times to download multiple sections'))
'Needs ffmpeg. This option can be used multiple times to download multiple sections, '
'e.g. --download-sections "*10:15-15:00" --download-sections "intro"'))
downloader.add_option(
'--downloader', '--external-downloader',
dest='external_downloader', metavar='[PROTO:]NAME', default={}, type='str',
@@ -978,7 +978,7 @@ def create_parser():
'the protocols (http, ftp, m3u8, dash, rstp, rtmp, mms) to use it for. '
f'Currently supports native, {", ".join(sorted(list_external_downloaders()))}. '
'You can use this option multiple times to set different downloaders for different protocols. '
'For example, --downloader aria2c --downloader "dash,m3u8:native" will use '
'E.g. --downloader aria2c --downloader "dash,m3u8:native" will use '
'aria2c for http/ftp downloads, and the native downloader for dash/m3u8 downloads '
'(Alias: --external-downloader)'))
downloader.add_option(
@@ -1188,7 +1188,7 @@ def create_parser():
'Template for progress outputs, optionally prefixed with one of "download:" (default), '
'"download-title:" (the console title), "postprocess:", or "postprocess-title:". '
'The video\'s fields are accessible under the "info" key and '
'the progress attributes are accessible under "progress" key. E.g.: '
'the progress attributes are accessible under "progress" key. E.g. '
# TODO: Document the fields inside "progress"
'--console-title --progress-template "download-title:%(info.id)s-%(progress.eta)s"'))
verbosity.add_option(
@@ -1488,7 +1488,7 @@ def create_parser():
'Remux the video into another container if necessary '
f'(currently supported: {", ".join(FFmpegVideoRemuxerPP.SUPPORTED_EXTS)}). '
'If target container does not support the video/audio codec, remuxing will fail. You can specify multiple rules; '
'Eg. "aac>m4a/mov>mp4/mkv" will remux aac to m4a, mov to mp4 and anything else to mkv'))
'e.g. "aac>m4a/mov>mp4/mkv" will remux aac to m4a, mov to mp4 and anything else to mkv'))
postproc.add_option(
'--recode-video',
metavar='FORMAT', dest='recodevideo', default=None,
@@ -1513,7 +1513,7 @@ def create_parser():
'You can also specify "PP+EXE:ARGS" to give the arguments to the specified executable '
'only when being used by the specified postprocessor. Additionally, for ffmpeg/ffprobe, '
'"_i"/"_o" can be appended to the prefix optionally followed by a number to pass the argument '
'before the specified input/output file. Eg: --ppa "Merger+ffmpeg_i1:-v quiet". '
'before the specified input/output file, e.g. --ppa "Merger+ffmpeg_i1:-v quiet". '
'You can use this option multiple times to give different arguments to different '
'postprocessors. (Alias: --ppa)'))
postproc.add_option(
@@ -1729,7 +1729,7 @@ def create_parser():
'SponsorBlock categories to create chapters for, separated by commas. '
f'Available categories are {", ".join(SponsorBlockPP.CATEGORIES.keys())}, all and default (=all). '
'You can prefix the category with a "-" to exclude it. See [1] for description of the categories. '
'Eg: --sponsorblock-mark all,-preview [1] https://wiki.sponsor.ajay.app/w/Segment_Categories'))
'E.g. --sponsorblock-mark all,-preview [1] https://wiki.sponsor.ajay.app/w/Segment_Categories'))
sponsorblock.add_option(
'--sponsorblock-remove', metavar='CATS',
dest='sponsorblock_remove', default=set(), action='callback', type='str',

View File

@@ -139,7 +139,8 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
if not success:
success = True
atomicparsley = next((
x for x in ['AtomicParsley', 'atomicparsley']
# libatomicparsley.so : See https://github.com/xibr/ytdlp-lazy/issues/1
x for x in ['AtomicParsley', 'atomicparsley', 'libatomicparsley.so']
if check_executable(x, ['-v'])), None)
if atomicparsley is None:
self.to_screen('Neither mutagen nor AtomicParsley was found. Falling back to ffmpeg')

View File

@@ -109,18 +109,24 @@ class FFmpegPostProcessor(PostProcessor):
return {p: p for p in programs}
if not os.path.exists(location):
self.report_warning(f'ffmpeg-location {location} does not exist! Continuing without ffmpeg')
self.report_warning(
f'ffmpeg-location {location} does not exist! Continuing without ffmpeg', only_once=True)
return {}
elif os.path.isdir(location):
dirname, basename = location, None
dirname, basename, filename = location, None, None
else:
basename = os.path.splitext(os.path.basename(location))[0]
basename = next((p for p in programs if basename.startswith(p)), 'ffmpeg')
filename = os.path.basename(location)
basename = next((p for p in programs if p in filename), 'ffmpeg')
dirname = os.path.dirname(os.path.abspath(location))
if basename in self._ffmpeg_to_avconv.keys():
self._prefer_ffmpeg = True
paths = {p: os.path.join(dirname, p) for p in programs}
if basename and basename in filename:
for p in programs:
path = os.path.join(dirname, filename.replace(basename, p))
if os.path.exists(path):
paths[p] = path
if basename:
paths[basename] = location
return paths
@@ -171,9 +177,9 @@ class FFmpegPostProcessor(PostProcessor):
return self.probe_basename
def _get_version(self, kind):
executables = (kind, self._ffmpeg_to_avconv[kind])
executables = (kind, )
if not self._prefer_ffmpeg:
executables = reversed(executables)
executables = (kind, self._ffmpeg_to_avconv[kind])
basename, version, features = next(filter(
lambda x: x[1], ((p, *self._get_ffmpeg_version(p)) for p in executables)), (None, None, {}))
if kind == 'ffmpeg':
@@ -1099,6 +1105,7 @@ class FFmpegThumbnailsConvertorPP(FFmpegPostProcessor):
continue
has_thumbnail = True
self.fixup_webp(info, idx)
original_thumbnail = thumbnail_dict['filepath'] # Path can change during fixup
thumbnail_ext = os.path.splitext(original_thumbnail)[1][1:].lower()
if thumbnail_ext == 'jpeg':
thumbnail_ext = 'jpg'

View File

@@ -9,7 +9,7 @@ import sys
from zipimport import zipimporter
from .compat import functools # isort: split
from .compat import compat_realpath
from .compat import compat_realpath, compat_shlex_quote
from .utils import (
Popen,
cached_method,
@@ -229,24 +229,33 @@ class Updater:
except OSError:
return self._report_permission_error(new_filename)
try:
if old_filename:
if old_filename:
mask = os.stat(self.filename).st_mode
try:
os.rename(self.filename, old_filename)
except OSError:
return self._report_error('Unable to move current version')
try:
if old_filename:
os.rename(new_filename, self.filename)
except OSError:
self._report_error('Unable to overwrite current version')
return os.rename(old_filename, self.filename)
except OSError:
return self._report_error('Unable to move current version')
if detect_variant() not in ('win32_exe', 'py2exe'):
if old_filename:
os.remove(old_filename)
else:
try:
os.rename(new_filename, self.filename)
except OSError:
self._report_error('Unable to overwrite current version')
return os.rename(old_filename, self.filename)
if detect_variant() in ('win32_exe', 'py2exe'):
atexit.register(Popen, f'ping 127.0.0.1 -n 5 -w 1000 & del /F "{old_filename}"',
shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
elif old_filename:
try:
os.remove(old_filename)
except OSError:
self._report_error('Unable to remove the old version')
try:
os.chmod(self.filename, mask)
except OSError:
return self._report_error(
f'Unable to set permissions. Run: sudo chmod a+rx {compat_shlex_quote(self.filename)}')
self.ydl.to_screen(f'Updated yt-dlp to version {self.new_version}')
return True

View File

@@ -150,6 +150,16 @@ MONTH_NAMES = {
'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
}
# From https://github.com/python/cpython/blob/3.11/Lib/email/_parseaddr.py#L36-L42
TIMEZONE_NAMES = {
'UT': 0, 'UTC': 0, 'GMT': 0, 'Z': 0,
'AST': -4, 'ADT': -3, # Atlantic (used in Canada)
'EST': -5, 'EDT': -4, # Eastern
'CST': -6, 'CDT': -5, # Central
'MST': -7, 'MDT': -6, # Mountain
'PST': -8, 'PDT': -7 # Pacific
}
# needed for sanitizing filenames in restricted mode
ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ',
itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOOO', ['OE'], 'UUUUUY', ['TH', 'ss'],
@@ -600,7 +610,7 @@ def sanitize_open(filename, open_mode):
if sys.platform == 'win32':
import msvcrt
# stdout may be any IO stream. Eg, when using contextlib.redirect_stdout
# stdout may be any IO stream, e.g. when using contextlib.redirect_stdout
with contextlib.suppress(io.UnsupportedOperation):
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
return (sys.stdout.buffer if hasattr(sys.stdout, 'buffer') else sys.stdout, filename)
@@ -776,8 +786,8 @@ def _htmlentity_transform(entity_with_semicolon):
if entity in html.entities.name2codepoint:
return chr(html.entities.name2codepoint[entity])
# TODO: HTML5 allows entities without a semicolon. For example,
# '&Eacuteric' should be decoded as 'Éric'.
# TODO: HTML5 allows entities without a semicolon.
# E.g. '&Eacuteric' should be decoded as 'Éric'.
if entity_with_semicolon in html.entities.html5:
return html.entities.html5[entity_with_semicolon]
@@ -1684,7 +1694,11 @@ def extract_timezone(date_str):
$)
''', date_str)
if not m:
timezone = datetime.timedelta()
m = re.search(r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$', date_str)
timezone = TIMEZONE_NAMES.get(m and m.group('tz').strip())
if timezone is not None:
date_str = date_str[:-len(m.group('tz'))]
timezone = datetime.timedelta(hours=timezone or 0)
else:
date_str = date_str[:-len(m.group('tz'))]
if not m.group('sign'):
@@ -1746,7 +1760,8 @@ def unified_timestamp(date_str, day_first=True):
if date_str is None:
return None
date_str = re.sub(r'[,|]', '', date_str)
date_str = re.sub(r'\s+', ' ', re.sub(
r'(?i)[,|]|(mon|tues?|wed(nes)?|thu(rs)?|fri|sat(ur)?)(day)?', '', date_str))
pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
timezone, date_str = extract_timezone(date_str)
@@ -1768,9 +1783,10 @@ def unified_timestamp(date_str, day_first=True):
with contextlib.suppress(ValueError):
dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
return calendar.timegm(dt.timetuple())
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
return calendar.timegm(timetuple) + pm_delta * 3600
return calendar.timegm(timetuple) + pm_delta * 3600 - timezone.total_seconds()
def determine_ext(url, default_ext='unknown_video'):
@@ -3199,7 +3215,7 @@ def strip_jsonp(code):
r'\g<callback_data>', code)
def js_to_json(code, vars={}):
def js_to_json(code, vars={}, *, strict=False):
# vars is a dict of var, val pairs to substitute
COMMENT_RE = r'/\*(?:(?!\*/).)*?\*/|//[^\n]*\n'
SKIP_RE = fr'\s*(?:{COMMENT_RE})?\s*'
@@ -3233,14 +3249,17 @@ def js_to_json(code, vars={}):
if v in vars:
return vars[v]
if strict:
raise ValueError(f'Unknown value: {v}')
return '"%s"' % v
def create_map(mobj):
return json.dumps(dict(json.loads(js_to_json(mobj.group(1) or '[]', vars=vars))))
code = re.sub(r'new Date\((".+")\)', r'\g<1>', code)
code = re.sub(r'new Map\((\[.*?\])?\)', create_map, code)
if not strict:
code = re.sub(r'new Date\((".+")\)', r'\g<1>', code)
return re.sub(r'''(?sx)
"(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
@@ -3482,8 +3501,8 @@ def get_compatible_ext(*, vcodecs, acodecs, vexts, aexts, preferences=None):
},
}
sanitize_codec = functools.partial(try_get, getter=lambda x: x.split('.')[0].replace('0', ''))
vcodec, acodec = sanitize_codec(vcodecs[0]), sanitize_codec(acodecs[0])
sanitize_codec = functools.partial(try_get, getter=lambda x: x[0].split('.')[0].replace('0', ''))
vcodec, acodec = sanitize_codec(vcodecs), sanitize_codec(acodecs)
for ext in preferences or COMPATIBLE_CODECS.keys():
codec_set = COMPATIBLE_CODECS.get(ext, set())
@@ -5759,6 +5778,13 @@ def make_archive_id(ie, video_id):
return f'{ie_key.lower()} {video_id}'
def truncate_string(s, left, right=0):
assert left > 3 and right >= 0
if s is None or len(s) <= left + right:
return s
return f'{s[:left-3]}...{s[-right:]}'
# Deprecated
has_certifi = bool(certifi)
has_websockets = bool(websockets)

View File

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2022.08.08'
__version__ = '2022.08.14'
RELEASE_GIT_HEAD = '3157158f7'
RELEASE_GIT_HEAD = '55937202b'
VARIANT = None