Mirror of https://github.com/yt-dlp/yt-dlp.git, synced 2025-07-10 15:28:33 +00:00
Merge branch 'yt-dlp:master' into onsen
commit 9582ee1522
@ -707,3 +707,11 @@ Sakura286
|
|||||||
SamDecrock
|
SamDecrock
|
||||||
stratus-ss
|
stratus-ss
|
||||||
subrat-lima
|
subrat-lima
|
||||||
|
gitninja1234
|
||||||
|
jkruse
|
||||||
|
xiaomac
|
||||||
|
wesson09
|
||||||
|
Crypto90
|
||||||
|
MutantPiggieGolem1
|
||||||
|
Sanceilaks
|
||||||
|
Strkmn
|
||||||
|
105
Changelog.md
@ -4,11 +4,114 @@ # Changelog
|
|||||||
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
|
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
|
||||||
-->
|
-->
|
||||||
|
|
||||||
|
### 2025.01.15
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- **youtube**: [Do not use `web_creator` as a default client](https://github.com/yt-dlp/yt-dlp/commit/c8541f8b13e743fcfa06667530d13fee8686e22a) ([#12087](https://github.com/yt-dlp/yt-dlp/issues/12087)) by [bashonly](https://github.com/bashonly)
|
||||||
|
|
||||||
|
### 2025.01.12
|
||||||
|
|
||||||
|
#### Core changes
|
||||||
|
- [Fix filename sanitization with `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/8346b549150003df988538e54c9d8bc4de568979) ([#11988](https://github.com/yt-dlp/yt-dlp/issues/11988)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Validate retries values are non-negative](https://github.com/yt-dlp/yt-dlp/commit/1f4e1e85a27c5b43e34d7706cfd88ffce1b56a4a) ([#11927](https://github.com/yt-dlp/yt-dlp/issues/11927)) by [Strkmn](https://github.com/Strkmn)
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- **drtalks**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1f489f4a45691cac3f9e787d22a3a8a086229ba6) ([#10831](https://github.com/yt-dlp/yt-dlp/issues/10831)) by [pzhlkj6612](https://github.com/pzhlkj6612), [seproDev](https://github.com/seproDev)
|
||||||
|
- **plvideo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3c14e9191f3035b9a729d1d87bc0381f42de57cf) ([#10657](https://github.com/yt-dlp/yt-dlp/issues/10657)) by [Sanceilaks](https://github.com/Sanceilaks), [seproDev](https://github.com/seproDev)
|
||||||
|
- **vine**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/e2ef4fece6c9742d1733e3bae408c4787765f78c) ([#11700](https://github.com/yt-dlp/yt-dlp/issues/11700)) by [allendema](https://github.com/allendema)
|
||||||
|
- **xiaohongshu**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/763ed06ee69f13949397897bd42ff2ec3dc3d384) ([#11806](https://github.com/yt-dlp/yt-dlp/issues/11806)) by [HobbyistDev](https://github.com/HobbyistDev)
|
||||||
|
- **youtube**
|
||||||
|
- [Fix DASH formats incorrectly skipped in some situations](https://github.com/yt-dlp/yt-dlp/commit/0b6b7742c2e7f2a1fcb0b54ef3dd484bab404b3f) ([#11910](https://github.com/yt-dlp/yt-dlp/issues/11910)) by [coletdjnz](https://github.com/coletdjnz)
|
||||||
|
- [Refactor cookie auth](https://github.com/yt-dlp/yt-dlp/commit/75079f4e3f7dce49b61ef01da7adcd9876a0ca3b) ([#11989](https://github.com/yt-dlp/yt-dlp/issues/11989)) by [coletdjnz](https://github.com/coletdjnz)
|
||||||
|
- [Use `tv` instead of `mweb` client by default](https://github.com/yt-dlp/yt-dlp/commit/712d2abb32f59b2d246be2901255f84f1a4c30b3) ([#12059](https://github.com/yt-dlp/yt-dlp/issues/12059)) by [coletdjnz](https://github.com/coletdjnz)
|
||||||
|
|
||||||
|
#### Misc. changes
|
||||||
|
- **cleanup**: Miscellaneous: [dade5e3](https://github.com/yt-dlp/yt-dlp/commit/dade5e35c89adaad04408bfef766820dbca06ebe) by [grqz](https://github.com/grqz), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
|
||||||
|
|
||||||
|
### 2024.12.23
|
||||||
|
|
||||||
|
#### Core changes
|
||||||
|
- [Don't sanitize filename on Unix when `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/6fc85f617a5850307fd5b258477070e6ee177796) ([#9591](https://github.com/yt-dlp/yt-dlp/issues/9591)) by [pukkandan](https://github.com/pukkandan)
|
||||||
|
- **update**
|
||||||
|
- [Check 64-bitness when upgrading ARM builds](https://github.com/yt-dlp/yt-dlp/commit/b91c3925c2059970daa801cb131c0c2f4f302e72) ([#11819](https://github.com/yt-dlp/yt-dlp/issues/11819)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Fix endless update loop for `linux_exe` builds](https://github.com/yt-dlp/yt-dlp/commit/3d3ee458c1fe49dd5ebd7651a092119d23eb7000) ([#11827](https://github.com/yt-dlp/yt-dlp/issues/11827)) by [bashonly](https://github.com/bashonly)
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- **soundcloud**: [Various fixes](https://github.com/yt-dlp/yt-dlp/commit/d298693b1b266d198e8eeecb90ea17c4a031268f) ([#11820](https://github.com/yt-dlp/yt-dlp/issues/11820)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **youtube**
|
||||||
|
- [Add age-gate workaround for some embeddable videos](https://github.com/yt-dlp/yt-dlp/commit/09a6c687126f04e243fcb105a828787efddd1030) ([#11821](https://github.com/yt-dlp/yt-dlp/issues/11821)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Fix `uploader_id` extraction](https://github.com/yt-dlp/yt-dlp/commit/1a8851b689763e5173b96f70f8a71df0e4a44b66) ([#11818](https://github.com/yt-dlp/yt-dlp/issues/11818)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/65cf46cddd873fd229dbb0fc0689bca4c201c6b6) ([#11893](https://github.com/yt-dlp/yt-dlp/issues/11893)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Skip iOS formats that require PO Token](https://github.com/yt-dlp/yt-dlp/commit/9f42e68a74f3f00b0253fe70763abd57cac4237b) ([#11890](https://github.com/yt-dlp/yt-dlp/issues/11890)) by [coletdjnz](https://github.com/coletdjnz)
|
||||||
|
|
||||||
|
### 2024.12.13
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- **patreon**: campaign: [Support /c/ URLs](https://github.com/yt-dlp/yt-dlp/commit/bc262bcad4d3683ceadf61a7eb87e233e72adef3) ([#11756](https://github.com/yt-dlp/yt-dlp/issues/11756)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **soundcloud**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/f4d3e9e6dc25077b79849a31a2f67f93fdc01e62) ([#11777](https://github.com/yt-dlp/yt-dlp/issues/11777)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **youtube**
|
||||||
|
- [Fix `release_date` extraction](https://github.com/yt-dlp/yt-dlp/commit/d5e2a379f2adcb28bc48c7d9e90716d7278f89d2) ([#11759](https://github.com/yt-dlp/yt-dlp/issues/11759)) by [MutantPiggieGolem1](https://github.com/MutantPiggieGolem1)
|
||||||
|
- [Fix signature function extraction for `2f1832d2`](https://github.com/yt-dlp/yt-dlp/commit/5460cd91891bf613a2065e2fc278d9903c37a127) ([#11801](https://github.com/yt-dlp/yt-dlp/issues/11801)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Prioritize original language over auto-dubbed audio](https://github.com/yt-dlp/yt-dlp/commit/dc3c4fddcc653989dae71fc563d82a308fc898cc) ([#11803](https://github.com/yt-dlp/yt-dlp/issues/11803)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- search_url: [Fix playlist searches](https://github.com/yt-dlp/yt-dlp/commit/f6c73aad5f1a67544bea137ebd9d1e22e0e56567) ([#11782](https://github.com/yt-dlp/yt-dlp/issues/11782)) by [Crypto90](https://github.com/Crypto90)
|
||||||
|
|
||||||
|
#### Misc. changes
|
||||||
|
- **cleanup**: [Make more playlist entries lazy](https://github.com/yt-dlp/yt-dlp/commit/54216696261bc07cacd9a837c501d9e0b7fed09e) ([#11763](https://github.com/yt-dlp/yt-dlp/issues/11763)) by [seproDev](https://github.com/seproDev)
|
||||||
|
|
||||||
|
### 2024.12.06
|
||||||
|
|
||||||
|
#### Core changes
|
||||||
|
- **cookies**: [Add `--cookies-from-browser` support for MS Store Firefox](https://github.com/yt-dlp/yt-dlp/commit/354cb4026cf2191e1a130ec2a627b95cabfbc60a) ([#11731](https://github.com/yt-dlp/yt-dlp/issues/11731)) by [wesson09](https://github.com/wesson09)
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- **bilibili**: [Fix HD formats extraction](https://github.com/yt-dlp/yt-dlp/commit/fca3eb5f8be08d5fab2e18b45b7281a12e566725) ([#11734](https://github.com/yt-dlp/yt-dlp/issues/11734)) by [grqz](https://github.com/grqz)
|
||||||
|
- **soundcloud**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/2feb28028ee48f2185d2d95076e62accb09b9e2e) ([#11742](https://github.com/yt-dlp/yt-dlp/issues/11742)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **youtube**
|
||||||
|
- [Fix `n` sig extraction for player `3bb1f723`](https://github.com/yt-dlp/yt-dlp/commit/a95ee6d8803fca9157adecf63732ab58bf87fd88) ([#11750](https://github.com/yt-dlp/yt-dlp/issues/11750)) by [bashonly](https://github.com/bashonly) (With fixes in [4bd2655](https://github.com/yt-dlp/yt-dlp/commit/4bd2655398aed450456197a6767639114a24eac2))
|
||||||
|
- [Fix signature function extraction](https://github.com/yt-dlp/yt-dlp/commit/4c85ccd1366c88cf93982f8350f58eed17355981) ([#11751](https://github.com/yt-dlp/yt-dlp/issues/11751)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/2e49c789d3eebc39af8910705d65a98bca0e4c4f) ([#11724](https://github.com/yt-dlp/yt-dlp/issues/11724)) by [bashonly](https://github.com/bashonly)
|
||||||
|
|
||||||
|
### 2024.12.03
|
||||||
|
|
||||||
|
#### Core changes
|
||||||
|
- [Add `playlist_webpage_url` field](https://github.com/yt-dlp/yt-dlp/commit/7d6c259a03bc4707a319e5e8c6eff0278707874b) ([#11613](https://github.com/yt-dlp/yt-dlp/issues/11613)) by [seproDev](https://github.com/seproDev)
|
||||||
|
|
||||||
|
#### Extractor changes
|
||||||
|
- [Handle fragmented formats in `_remove_duplicate_formats`](https://github.com/yt-dlp/yt-dlp/commit/e0500cbf796323551bbabe5b8ed8c75a511ba47a) ([#11637](https://github.com/yt-dlp/yt-dlp/issues/11637)) by [Grub4K](https://github.com/Grub4K)
|
||||||
|
- **bilibili**
|
||||||
|
- [Always try to extract HD formats](https://github.com/yt-dlp/yt-dlp/commit/dc1687648077c5bf64863b307ecc5ab7e029bd8d) ([#10559](https://github.com/yt-dlp/yt-dlp/issues/10559)) by [grqz](https://github.com/grqz)
|
||||||
|
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/239f5f36fe04603bec59c8b975f6a792f10246db) ([#11667](https://github.com/yt-dlp/yt-dlp/issues/11667)) by [grqz](https://github.com/grqz) (With fixes in [f05a1cd](https://github.com/yt-dlp/yt-dlp/commit/f05a1cd1492fc98dc8d80d2081d632a1879913d2) by [bashonly](https://github.com/bashonly), [grqz](https://github.com/grqz))
|
||||||
|
- [Fix subtitles and chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/a13a336aa6f906812701abec8101b73b73db8ff7) ([#11708](https://github.com/yt-dlp/yt-dlp/issues/11708)) by [xiaomac](https://github.com/xiaomac)
|
||||||
|
- **chaturbate**: [Fix support for non-public streams](https://github.com/yt-dlp/yt-dlp/commit/4b5eec0aaa7c02627f27a386591b735b90e681a8) ([#11624](https://github.com/yt-dlp/yt-dlp/issues/11624)) by [jkruse](https://github.com/jkruse)
|
||||||
|
- **dacast**: [Fix HLS AES formats extraction](https://github.com/yt-dlp/yt-dlp/commit/0a0d80800b9350d1a4c4b18d82cfb77ffbc3c507) ([#11644](https://github.com/yt-dlp/yt-dlp/issues/11644)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **dropbox**: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/00dcde728635633eee969ad4d498b9f233c4a94e) ([#11636](https://github.com/yt-dlp/yt-dlp/issues/11636)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **duoplay**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/62cba8a1bedbfc0ddde7267ae57b72bf5f7ea7b1) ([#11588](https://github.com/yt-dlp/yt-dlp/issues/11588)) by [bashonly](https://github.com/bashonly), [glensc](https://github.com/glensc)
|
||||||
|
- **facebook**: [Support more groups URLs](https://github.com/yt-dlp/yt-dlp/commit/e0f1ae813b36e783e2348ba2a1566e12f5cd8f6e) ([#11576](https://github.com/yt-dlp/yt-dlp/issues/11576)) by [grqz](https://github.com/grqz)
|
||||||
|
- **instagram**: [Support `share` URLs](https://github.com/yt-dlp/yt-dlp/commit/360aed810ad85db950df586282d256516c98cd2d) ([#11677](https://github.com/yt-dlp/yt-dlp/issues/11677)) by [grqz](https://github.com/grqz)
|
||||||
|
- **microsoftembed**: [Make format extraction non fatal](https://github.com/yt-dlp/yt-dlp/commit/2bea7936323ca4b6f3b9b1fdd892566223e30efa) ([#11654](https://github.com/yt-dlp/yt-dlp/issues/11654)) by [seproDev](https://github.com/seproDev)
|
||||||
|
- **mitele**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/cd0f934604587ed793e9177f6a127e5dcf99a7dd) ([#11683](https://github.com/yt-dlp/yt-dlp/issues/11683)) by [DarkZeros](https://github.com/DarkZeros)
|
||||||
|
- **stripchat**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/16336c51d0848a6868a4fa04e749fa03548b4913) ([#11596](https://github.com/yt-dlp/yt-dlp/issues/11596)) by [gitninja1234](https://github.com/gitninja1234)
|
||||||
|
- **tiktok**: [Deprioritize animated thumbnails](https://github.com/yt-dlp/yt-dlp/commit/910ecc422930bca14e2abe4986f5f92359e3cea8) ([#11645](https://github.com/yt-dlp/yt-dlp/issues/11645)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **vk**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/c038a7b187ba24360f14134842a7a2cf897c33b1) ([#11715](https://github.com/yt-dlp/yt-dlp/issues/11715)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- **youtube**
|
||||||
|
- [Adjust player clients for site changes](https://github.com/yt-dlp/yt-dlp/commit/0d146c1e36f467af30e87b7af651bdee67b73500) ([#11663](https://github.com/yt-dlp/yt-dlp/issues/11663)) by [bashonly](https://github.com/bashonly)
|
||||||
|
- tab: [Fix playlists tab extraction](https://github.com/yt-dlp/yt-dlp/commit/fe70f20aedf528fdee332131bc9b6710e54e6f10) ([#11615](https://github.com/yt-dlp/yt-dlp/issues/11615)) by [seproDev](https://github.com/seproDev)
|
||||||
|
|
||||||
|
#### Networking changes
|
||||||
|
- **Request Handler**: websockets: [Support websockets 14.0+](https://github.com/yt-dlp/yt-dlp/commit/c7316373c0a886f65a07a51e50ee147bb3294c85) ([#11616](https://github.com/yt-dlp/yt-dlp/issues/11616)) by [coletdjnz](https://github.com/coletdjnz)
|
||||||
|
|
||||||
|
#### Misc. changes
|
||||||
|
- **cleanup**
|
||||||
|
- [Bump ruff to 0.8.x](https://github.com/yt-dlp/yt-dlp/commit/d8fb3490863653182864d2a53522f350d67a9ff8) ([#11608](https://github.com/yt-dlp/yt-dlp/issues/11608)) by [seproDev](https://github.com/seproDev)
|
||||||
|
- Miscellaneous
|
||||||
|
- [ccf0a6b](https://github.com/yt-dlp/yt-dlp/commit/ccf0a6b86b7f68a75463804fe485ec240b8635f0) by [bashonly](https://github.com/bashonly), [pzhlkj6612](https://github.com/pzhlkj6612)
|
||||||
|
- [2b67ac3](https://github.com/yt-dlp/yt-dlp/commit/2b67ac300ac8b44368fb121637d1743cea8c5b6b) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
|
||||||
|
|
||||||
### 2024.11.18
|
### 2024.11.18
|
||||||
|
|
||||||
#### Important changes
|
#### Important changes
|
||||||
- **Login with OAuth is no longer supported for YouTube**
|
- **Login with OAuth is no longer supported for YouTube**
|
||||||
Due to a change made by the site, yt-dlp is longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)
|
Due to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)
|
||||||
|
|
||||||
#### Core changes
|
#### Core changes
|
||||||
- [Catch broken Cryptodome installations](https://github.com/yt-dlp/yt-dlp/commit/b83ca24eb72e1e558b0185bd73975586c0bc0546) ([#11486](https://github.com/yt-dlp/yt-dlp/issues/11486)) by [seproDev](https://github.com/seproDev)
|
- [Catch broken Cryptodome installations](https://github.com/yt-dlp/yt-dlp/commit/b83ca24eb72e1e558b0185bd73975586c0bc0546) ([#11486](https://github.com/yt-dlp/yt-dlp/issues/11486)) by [seproDev](https://github.com/seproDev)
|
||||||
|
12
README.md
@ -613,8 +613,7 @@ ## Filesystem Options:
|
|||||||
--no-restrict-filenames Allow Unicode characters, "&" and spaces in
|
--no-restrict-filenames Allow Unicode characters, "&" and spaces in
|
||||||
filenames (default)
|
filenames (default)
|
||||||
--windows-filenames Force filenames to be Windows-compatible
|
--windows-filenames Force filenames to be Windows-compatible
|
||||||
--no-windows-filenames Make filenames Windows-compatible only if
|
--no-windows-filenames Sanitize filenames only minimally
|
||||||
using Windows (default)
|
|
||||||
--trim-filenames LENGTH Limit the filename length (excluding
|
--trim-filenames LENGTH Limit the filename length (excluding
|
||||||
extension) to the specified number of
|
extension) to the specified number of
|
||||||
characters
|
characters
|
||||||
@ -1294,6 +1293,7 @@ # OUTPUT TEMPLATE
|
|||||||
- `playlist_uploader_id` (string): Nickname or id of the playlist uploader
|
- `playlist_uploader_id` (string): Nickname or id of the playlist uploader
|
||||||
- `playlist_channel` (string): Display name of the channel that uploaded the playlist
|
- `playlist_channel` (string): Display name of the channel that uploaded the playlist
|
||||||
- `playlist_channel_id` (string): Identifier of the channel that uploaded the playlist
|
- `playlist_channel_id` (string): Identifier of the channel that uploaded the playlist
|
||||||
|
- `playlist_webpage_url` (string): URL of the playlist webpage
|
||||||
- `webpage_url` (string): A URL to the video webpage which, if given to yt-dlp, should yield the same result again
|
- `webpage_url` (string): A URL to the video webpage which, if given to yt-dlp, should yield the same result again
|
||||||
- `webpage_url_basename` (string): The basename of the webpage URL
|
- `webpage_url_basename` (string): The basename of the webpage URL
|
||||||
- `webpage_url_domain` (string): The domain of the webpage URL
|
- `webpage_url_domain` (string): The domain of the webpage URL
|
||||||
@ -1760,7 +1760,7 @@ # Replace all spaces and "_" in title and uploader with a `-`
|
|||||||
|
|
||||||
# EXTRACTOR ARGUMENTS
|
# EXTRACTOR ARGUMENTS
|
||||||
|
|
||||||
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=mediaconnect,web;formats=incomplete" --extractor-args "funimation:version=uncut"`
|
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "funimation:version=uncut"`
|
||||||
|
|
||||||
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
|
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
|
||||||
|
|
||||||
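To make the `KEY:ARGS` syntax in the README text above concrete, here is a minimal Python sketch of how such a string could be split apart. The helper name `parse_extractor_args` is hypothetical and this is not yt-dlp's own parser. It assumes only the rules stated in the text: `;` separates arguments, `,` separates values, and `-` in an argument name is treated as `_`.

```python
def parse_extractor_args(spec):
    """Split 'KEY:ARG=VAL1,VAL2;ARG2=VAL' into (key, {arg: [values]}).

    Hypothetical helper for illustration only; not yt-dlp's implementation.
    """
    key, _, args = spec.partition(':')
    parsed = {}
    for item in filter(None, args.split(';')):
        name, _, values = item.partition('=')
        # On the CLI, `-` may be used in place of `_` in ARG names
        name = name.strip().replace('-', '_')
        parsed[name] = values.split(',') if values else []
    return key.strip(), parsed


key, args = parse_extractor_args('youtube:player-client=tv,mweb;formats=incomplete')
print(key, args)
```

An argument given without `=`, such as `orfon:prefer_segments_playlist`, comes back with an empty value list in this sketch.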
@ -1769,13 +1769,13 @@ # EXTRACTOR ARGUMENTS
|
|||||||
#### youtube
|
#### youtube
|
||||||
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
|
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
|
||||||
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
|
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
|
||||||
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `mediaconnect`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,mweb` is used, and `web_creator` is added as needed for age-gated videos when account age verification is required. Similarly, the `_music` variants are added for `music.youtube.com` URLs. Some clients, such as `web` and `android`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. You can use `all` to use all the clients, and `default` for the default clients. You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=all,-web`
|
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `_music` variants may be added for `music.youtube.com` URLs. Some clients, such as `web` and `android`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
|
||||||
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
|
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
|
||||||
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
|
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
|
||||||
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
|
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
|
||||||
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
|
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
|
||||||
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
|
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
|
||||||
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
|
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
|
||||||
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
|
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
|
||||||
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
|
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
|
||||||
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
|
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
|
||||||
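The `player_client` rules in the updated README text (the `default` and `all` keywords, and the `-` prefix for exclusion) can be sketched roughly as follows. The function `resolve_clients` and its parameters are hypothetical and do not mirror yt-dlp's internals. The two default tuples are taken from the new wording above, and the full client list is passed in by the caller rather than assumed.

```python
DEFAULT_CLIENTS = ('tv', 'ios', 'web')   # default per the updated README text
DEFAULT_AUTHENTICATED = ('tv', 'web')    # default when authenticating with cookies


def resolve_clients(requested, all_clients, authenticated=False):
    """Expand `default`/`all` and apply `-client` exclusions, preserving order.

    Hypothetical helper for illustration only.
    """
    defaults = DEFAULT_AUTHENTICATED if authenticated else DEFAULT_CLIENTS
    selected, excluded = [], set()
    for name in requested:
        if name == 'default':
            selected.extend(defaults)
        elif name == 'all':
            selected.extend(all_clients)
        elif name.startswith('-'):
            excluded.add(name[1:])
        else:
            selected.append(name)
    # dict.fromkeys de-duplicates while keeping first-seen order
    return [c for c in dict.fromkeys(selected) if c not in excluded]


# Mirrors the README example `youtube:player_client=default,-ios`
print(resolve_clients(['default', '-ios'], all_clients=()))
```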
@ -1859,7 +1859,7 @@ #### afreecatvlive
|
|||||||
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
|
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
|
||||||
|
|
||||||
#### soundcloud
|
#### soundcloud
|
||||||
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{extension}` (omitting the bitrate), e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known extensions include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
|
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{codec}`, e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known codecs include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
|
||||||
|
|
||||||
#### orfon (orf:on)
|
#### orfon (orf:on)
|
||||||
* `prefer_segments_playlist`: Prefer a playlist of program segments instead of a single complete video when available. If individual segments are desired, use `--concat-playlist never --extractor-args "orfon:prefer_segments_playlist"`
|
* `prefer_segments_playlist`: Prefer a playlist of program segments instead of a single complete video when available. If individual segments are desired, use `--concat-playlist never --extractor-args "orfon:prefer_segments_playlist"`
|
||||||
|
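The soundcloud `formats` argument described above uses `{protocol}_{codec}` names with `*` as a wildcard. A minimal sketch of that matching, using Python's standard `fnmatch` module, might look like this. The function `select_formats` and the sample `available` list are hypothetical, and returning results in request order is an assumption rather than documented behaviour.

```python
from fnmatch import fnmatchcase

# Default value quoted from the README text
DEFAULT_FORMATS = 'http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3'


def select_formats(available, requested=DEFAULT_FORMATS):
    """Return available '{protocol}_{codec}' names matching the request.

    Hypothetical helper; patterns are tried in the order they were requested.
    """
    chosen = []
    for pattern in requested.split(','):
        for name in available:
            if fnmatchcase(name, pattern) and name not in chosen:
                chosen.append(name)
    return chosen


available = ['hls_mp3', 'http_mp3', 'hls_opus', 'hls-aes_aac']  # made-up sample
print(select_formats(available, '*_mp3'))
print(select_formats(available, 'hls_opus,http_aac'))
```

Passing `*` by itself selects every entry in `available`, which matches the README's description of requesting all formats.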
@ -238,6 +238,6 @@
|
|||||||
{
|
{
|
||||||
"action": "add",
|
"action": "add",
|
||||||
"when": "52c0ffe40ad6e8404d93296f575007b05b04c686",
|
"when": "52c0ffe40ad6e8404d93296f575007b05b04c686",
|
||||||
"short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)"
|
"short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
@ -52,7 +52,7 @@ default = [
|
|||||||
"pycryptodomex",
|
"pycryptodomex",
|
||||||
"requests>=2.32.2,<3",
|
"requests>=2.32.2,<3",
|
||||||
"urllib3>=1.26.17,<3",
|
"urllib3>=1.26.17,<3",
|
||||||
"websockets>=13.0,<14",
|
"websockets>=13.0",
|
||||||
]
|
]
|
||||||
curl-cffi = [
|
curl-cffi = [
|
||||||
"curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'",
|
"curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'",
|
||||||
@ -76,7 +76,7 @@ dev = [
|
|||||||
]
|
]
|
||||||
static-analysis = [
|
static-analysis = [
|
||||||
"autopep8~=2.0",
|
"autopep8~=2.0",
|
||||||
"ruff~=0.7.0",
|
"ruff~=0.9.0",
|
||||||
]
|
]
|
||||||
test = [
|
test = [
|
||||||
"pytest~=8.1",
|
"pytest~=8.1",
|
||||||
@ -186,6 +186,7 @@ ignore = [
|
|||||||
"E501", # line-too-long
|
"E501", # line-too-long
|
||||||
"E731", # lambda-assignment
|
"E731", # lambda-assignment
|
||||||
"E741", # ambiguous-variable-name
|
"E741", # ambiguous-variable-name
|
||||||
|
"UP031", # printf-string-formatting
|
||||||
"UP036", # outdated-version-block
|
"UP036", # outdated-version-block
|
||||||
"B006", # mutable-argument-default
|
"B006", # mutable-argument-default
|
||||||
"B008", # function-call-in-default-argument
|
"B008", # function-call-in-default-argument
|
||||||
@ -194,6 +195,7 @@ ignore = [
|
|||||||
"B023", # function-uses-loop-variable (false positives)
|
"B023", # function-uses-loop-variable (false positives)
|
||||||
"B028", # no-explicit-stacklevel
|
"B028", # no-explicit-stacklevel
|
||||||
"B904", # raise-without-from-inside-except
|
"B904", # raise-without-from-inside-except
|
||||||
|
"A005", # stdlib-module-shadowing
|
||||||
"C401", # unnecessary-generator-set
|
"C401", # unnecessary-generator-set
|
||||||
"C402", # unnecessary-generator-dict
|
"C402", # unnecessary-generator-dict
|
||||||
"PIE790", # unnecessary-placeholder
|
"PIE790", # unnecessary-placeholder
|
||||||
@ -258,9 +260,6 @@ select = [
|
|||||||
"A002", # builtin-argument-shadowing
|
"A002", # builtin-argument-shadowing
|
||||||
"C408", # unnecessary-collection-call
|
"C408", # unnecessary-collection-call
|
||||||
]
|
]
|
||||||
"yt_dlp/jsinterp.py" = [
|
|
||||||
"UP031", # printf-string-formatting
|
|
||||||
]
|
|
||||||
|
|
||||||
[tool.ruff.lint.isort]
|
[tool.ruff.lint.isort]
|
||||||
known-first-party = [
|
known-first-party = [
|
||||||
|
@ -374,6 +374,7 @@ # Supported sites
|
|||||||
- **Dropbox**
|
- **Dropbox**
|
||||||
- **Dropout**: [*dropout*](## "netrc machine")
|
- **Dropout**: [*dropout*](## "netrc machine")
|
||||||
- **DropoutSeason**
|
- **DropoutSeason**
|
||||||
|
- **DrTalks**
|
||||||
- **DrTuber**
|
- **DrTuber**
|
||||||
- **drtv**
|
- **drtv**
|
||||||
- **drtv:live**
|
- **drtv:live**
|
||||||
@ -1086,6 +1087,7 @@ # Supported sites
|
|||||||
- **pluralsight**: [*pluralsight*](## "netrc machine")
|
- **pluralsight**: [*pluralsight*](## "netrc machine")
|
||||||
- **pluralsight:course**
|
- **pluralsight:course**
|
||||||
- **PlutoTV**: (**Currently broken**)
|
- **PlutoTV**: (**Currently broken**)
|
||||||
|
- **PlVideo**: Платформа
|
||||||
- **PodbayFM**
|
- **PodbayFM**
|
||||||
- **PodbayFMChannel**
|
- **PodbayFMChannel**
|
||||||
- **Podchaser**
|
- **Podchaser**
|
||||||
@ -1641,8 +1643,6 @@ # Supported sites
|
|||||||
- **Vimm:stream**
|
- **Vimm:stream**
|
||||||
- **ViMP**
|
- **ViMP**
|
||||||
- **ViMP:Playlist**
|
- **ViMP:Playlist**
|
||||||
- **Vine**
|
|
||||||
- **vine:user**
|
|
||||||
- **Viously**
|
- **Viously**
|
||||||
- **Viqeo**: (**Currently broken**)
|
- **Viqeo**: (**Currently broken**)
|
||||||
- **Viu**
|
- **Viu**
|
||||||
|
@ -761,6 +761,13 @@ def test(tmpl, expected, *, info=None, **params):
|
|||||||
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
|
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
|
||||||
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
|
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
|
||||||
|
|
||||||
|
# Sanitization options
|
||||||
|
test('%(title3)s', (None, 'foo⧸bar⧹test'))
|
||||||
|
test('%(title5)s', (None, 'aei_A'), restrictfilenames=True)
|
||||||
|
test('%(title3)s', (None, 'foo_bar_test'), windowsfilenames=False, restrictfilenames=True)
|
||||||
|
if sys.platform != 'win32':
|
||||||
|
test('%(title3)s', (None, 'foo⧸bar\\test'), windowsfilenames=False)
|
||||||
|
|
||||||
# ID sanitization
|
# ID sanitization
|
||||||
test('%(id)s', '_abcd', info={'id': '_abcd'})
|
test('%(id)s', '_abcd', info={'id': '_abcd'})
|
||||||
test('%(some_id)s', '_abcd', info={'some_id': '_abcd'})
|
test('%(some_id)s', '_abcd', info={'some_id': '_abcd'})
|
||||||
|
@ -216,7 +216,9 @@ def handle(self):
|
|||||||
protocol = websockets.ServerProtocol()
|
protocol = websockets.ServerProtocol()
|
||||||
connection = websockets.sync.server.ServerConnection(socket=self.request, protocol=protocol, close_timeout=0)
|
connection = websockets.sync.server.ServerConnection(socket=self.request, protocol=protocol, close_timeout=0)
|
||||||
connection.handshake()
|
connection.handshake()
|
||||||
connection.send(json.dumps(self.socks_info))
|
for message in connection:
|
||||||
|
if message == 'socks_info':
|
||||||
|
connection.send(json.dumps(self.socks_info))
|
||||||
connection.close()
|
connection.close()
|
||||||
|
|
||||||
|
|
||||||
|
@ -68,6 +68,16 @@
|
|||||||
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
|
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
|
||||||
'AOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL2QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
|
'AOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL2QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
|
||||||
),
|
),
|
||||||
|
(
|
||||||
|
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
|
||||||
|
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
|
||||||
|
'MyOSJXtKI3m-uME_jv7-pT12gOFC02RFkGoqWpzE0Cs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
|
||||||
|
),
|
||||||
|
(
|
||||||
|
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
|
||||||
|
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
|
||||||
|
'0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
|
||||||
|
),
|
||||||
]
|
]
|
||||||
|
|
||||||
_NSIG_TESTS = [
|
_NSIG_TESTS = [
|
||||||
@ -183,6 +193,14 @@
|
|||||||
'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js',
|
'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js',
|
||||||
'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw',
|
'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw',
|
||||||
),
|
),
|
||||||
|
(
|
||||||
|
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
|
||||||
|
'gK15nzVyaXE9RsMP3z', 'ZFFWFLPWx9DEgQ',
|
||||||
|
),
|
||||||
|
(
|
||||||
|
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
|
||||||
|
'YWt1qdbe8SAfkoPHW5d', 'RrRjWQOJmBiP',
|
||||||
|
),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
@ -254,8 +272,11 @@ def signature(jscode, sig_input):
|
|||||||
|
|
||||||
|
|
||||||
def n_sig(jscode, sig_input):
|
def n_sig(jscode, sig_input):
|
||||||
funcname = YoutubeIE(FakeYDL())._extract_n_function_name(jscode)
|
ie = YoutubeIE(FakeYDL())
|
||||||
return JSInterpreter(jscode).call_function(funcname, sig_input)
|
funcname = ie._extract_n_function_name(jscode)
|
||||||
|
jsi = JSInterpreter(jscode)
|
||||||
|
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname)))
|
||||||
|
return func([sig_input])
|
||||||
|
|
||||||
|
|
||||||
make_sig_test = t_factory(
|
make_sig_test = t_factory(
|
||||||
|
@ -266,7 +266,9 @@ class YoutubeDL:
|
|||||||
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
|
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
|
||||||
restrictfilenames: Do not allow "&" and spaces in file names
|
restrictfilenames: Do not allow "&" and spaces in file names
|
||||||
trim_file_name: Limit length of filename (extension excluded)
|
trim_file_name: Limit length of filename (extension excluded)
|
||||||
windowsfilenames: Force the filenames to be windows compatible
|
windowsfilenames: True: Force filenames to be Windows compatible
|
||||||
|
False: Sanitize filenames only minimally
|
||||||
|
This option has no effect when running on Windows
|
||||||
ignoreerrors: Do not stop on download/postprocessing errors.
|
ignoreerrors: Do not stop on download/postprocessing errors.
|
||||||
Can be 'only_download' to ignore only download errors.
|
Can be 'only_download' to ignore only download errors.
|
||||||
Default is 'only_download' for CLI, but False for API
|
Default is 'only_download' for CLI, but False for API
|
||||||
@ -281,7 +283,10 @@ class YoutubeDL:
|
|||||||
lazy_playlist: Process playlist entries as they are received.
|
lazy_playlist: Process playlist entries as they are received.
|
||||||
matchtitle: Download only matching titles.
|
matchtitle: Download only matching titles.
|
||||||
rejecttitle: Reject downloads for matching titles.
|
rejecttitle: Reject downloads for matching titles.
|
||||||
logger: Log messages to a logging.Logger instance.
|
logger: A class having a `debug`, `warning` and `error` function where
|
||||||
|
each has a single string parameter, the message to be logged.
|
||||||
|
For compatibility reasons, both debug and info messages are passed to `debug`.
|
||||||
|
A debug message will have a prefix of `[debug] ` to discern it from info messages.
|
||||||
logtostderr: Print everything to stderr instead of stdout.
|
logtostderr: Print everything to stderr instead of stdout.
|
||||||
consoletitle: Display progress in the console window's titlebar.
|
consoletitle: Display progress in the console window's titlebar.
|
||||||
writedescription: Write the video description to a .description file
|
writedescription: Write the video description to a .description file
|
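The updated `logger` description in this hunk can be illustrated with a small class. This is a minimal sketch, assuming only what the docstring states: the object needs `debug`, `warning` and `error` methods that each take one string, and debug messages carry a `[debug] ` prefix. The class name `ListLogger` and its attributes are hypothetical and not part of yt-dlp.

```python
class ListLogger:
    """Hypothetical logger that collects messages in lists instead of printing them."""

    def __init__(self):
        self.debug_messages = []
        self.info_messages = []
        self.warnings = []
        self.errors = []

    def debug(self, msg):
        # Both debug and info messages arrive here; tell them apart by the prefix
        if msg.startswith('[debug] '):
            self.debug_messages.append(msg)
        else:
            self.info_messages.append(msg)

    def warning(self, msg):
        self.warnings.append(msg)

    def error(self, msg):
        self.errors.append(msg)
```

An instance of such a class would be passed as the `logger` entry of the params dict.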
||||||
@ -1116,7 +1121,7 @@ def report_file_delete(self, file_name):
|
|||||||
def raise_no_formats(self, info, forced=False, *, msg=None):
|
def raise_no_formats(self, info, forced=False, *, msg=None):
|
||||||
has_drm = info.get('_has_drm')
|
has_drm = info.get('_has_drm')
|
||||||
ignored, expected = self.params.get('ignore_no_formats_error'), bool(msg)
|
ignored, expected = self.params.get('ignore_no_formats_error'), bool(msg)
|
||||||
msg = msg or has_drm and 'This video is DRM protected' or 'No video formats found!'
|
msg = msg or (has_drm and 'This video is DRM protected') or 'No video formats found!'
|
||||||
if forced or not ignored:
|
if forced or not ignored:
|
||||||
raise ExtractorError(msg, video_id=info['id'], ie=info['extractor'],
|
raise ExtractorError(msg, video_id=info['id'], ie=info['extractor'],
|
||||||
expected=has_drm or ignored or expected)
|
expected=has_drm or ignored or expected)
|
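The parentheses added in this hunk do not change behavior, because `and` binds tighter than `or` in Python; they only make the grouping explicit. A minimal sketch, using two hypothetical helper functions that stand in for the expression in `raise_no_formats`:

```python
def pick_message_implicit(msg, has_drm):
    # Old side of the diff: relies on `and` binding tighter than `or`
    return msg or has_drm and 'This video is DRM protected' or 'No video formats found!'


def pick_message_explicit(msg, has_drm):
    # New side of the diff: same grouping, spelled out with parentheses
    return msg or (has_drm and 'This video is DRM protected') or 'No video formats found!'


# Both forms agree for every combination of inputs tried here
for msg in (None, '', 'custom'):
    for has_drm in (None, False, True):
        assert pick_message_implicit(msg, has_drm) == pick_message_explicit(msg, has_drm)
```

The same reasoning applies to the other hunks in this commit that only wrap an `and` clause in parentheses.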
||||||
@ -1192,8 +1197,7 @@ def _copy_infodict(info_dict):
|
|||||||
|
|
||||||
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=False):
|
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=False):
|
||||||
""" Make the outtmpl and info_dict suitable for substitution: ydl.escape_outtmpl(outtmpl) % info_dict
|
""" Make the outtmpl and info_dict suitable for substitution: ydl.escape_outtmpl(outtmpl) % info_dict
|
||||||
@param sanitize Whether to sanitize the output as a filename.
|
@param sanitize Whether to sanitize the output as a filename
|
||||||
For backward compatibility, a function can also be passed
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
info_dict.setdefault('epoch', int(time.time())) # keep epoch consistent once set
|
info_dict.setdefault('epoch', int(time.time())) # keep epoch consistent once set
|
||||||
@ -1309,14 +1313,23 @@ def get_value(mdict):
|
|||||||
|
|
||||||
na = self.params.get('outtmpl_na_placeholder', 'NA')
|
na = self.params.get('outtmpl_na_placeholder', 'NA')
|
||||||
|
|
||||||
def filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames')):
|
def filename_sanitizer(key, value, restricted):
|
||||||
return sanitize_filename(str(value), restricted=restricted, is_id=(
|
return sanitize_filename(str(value), restricted=restricted, is_id=(
|
||||||
bool(re.search(r'(^|[_.])id(\.|$)', key))
|
bool(re.search(r'(^|[_.])id(\.|$)', key))
|
||||||
if 'filename-sanitization' in self.params['compat_opts']
|
if 'filename-sanitization' in self.params['compat_opts']
|
||||||
else NO_DEFAULT))
|
else NO_DEFAULT))
|
||||||
|
|
||||||
sanitizer = sanitize if callable(sanitize) else filename_sanitizer
|
if callable(sanitize):
|
||||||
sanitize = bool(sanitize)
|
self.deprecation_warning('Passing a callable "sanitize" to YoutubeDL.prepare_outtmpl is deprecated')
|
||||||
|
elif not sanitize:
|
||||||
|
pass
|
||||||
|
elif (sys.platform != 'win32' and not self.params.get('restrictfilenames')
|
||||||
|
and self.params.get('windowsfilenames') is False):
|
||||||
|
def sanitize(key, value):
|
||||||
|
return str(value).replace('/', '\u29F8').replace('\0', '')
|
||||||
|
else:
|
||||||
|
def sanitize(key, value):
|
||||||
|
return filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames'))
|
||||||
|
|
||||||
def _dumpjson_default(obj):
|
def _dumpjson_default(obj):
|
||||||
if isinstance(obj, (set, LazyList)):
|
if isinstance(obj, (set, LazyList)):
|
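The minimal sanitization branch introduced in this hunk can be tried in isolation. This is a sketch under stated assumptions: `minimal_sanitize` copies the new-side expression, while `uses_minimal_sanitizer` and its `platform` argument are hypothetical names that restate the branch condition so it can be exercised without a `YoutubeDL` instance.

```python
import sys


def minimal_sanitize(value):
    # Same expression as the new-side sanitizer: replace '/' with U+29F8
    # and drop NUL characters; every other character is left untouched.
    return str(value).replace('/', '\u29F8').replace('\0', '')


def uses_minimal_sanitizer(params, platform=sys.platform):
    # Hypothetical helper restating the condition from the hunk:
    # not on Windows, no restrictfilenames, and windowsfilenames explicitly False
    return (platform != 'win32' and not params.get('restrictfilenames')
            and params.get('windowsfilenames') is False)
```

Note that the check is `is False`, so an unset `windowsfilenames` (that is, `None`) still goes through the regular `filename_sanitizer` path.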
||||||
@ -1399,13 +1412,13 @@ def create_key(outer_mobj):
|
|||||||
|
|
||||||
if sanitize:
|
if sanitize:
|
||||||
# If value is an object, sanitize might convert it to a string
|
# If value is an object, sanitize might convert it to a string
|
||||||
# So we convert it to repr first
|
# So we manually convert it before sanitizing
|
||||||
if fmt[-1] == 'r':
|
if fmt[-1] == 'r':
|
||||||
value, fmt = repr(value), str_fmt
|
value, fmt = repr(value), str_fmt
|
||||||
elif fmt[-1] == 'a':
|
elif fmt[-1] == 'a':
|
||||||
value, fmt = ascii(value), str_fmt
|
value, fmt = ascii(value), str_fmt
|
||||||
if fmt[-1] in 'csra':
|
if fmt[-1] in 'csra':
|
||||||
value = sanitizer(last_field, value)
|
value = sanitize(last_field, value)
|
||||||
|
|
||||||
key = '{}\0{}'.format(key.replace('%', '%\0'), outer_mobj.group('format'))
|
key = '{}\0{}'.format(key.replace('%', '%\0'), outer_mobj.group('format'))
|
||||||
TMPL_DICT[key] = value
|
TMPL_DICT[key] = value
|
||||||
@ -1947,6 +1960,7 @@ def _playlist_infodict(ie_result, strict=False, **kwargs):
|
|||||||
'playlist_uploader_id': ie_result.get('uploader_id'),
|
'playlist_uploader_id': ie_result.get('uploader_id'),
|
||||||
'playlist_channel': ie_result.get('channel'),
|
'playlist_channel': ie_result.get('channel'),
|
||||||
'playlist_channel_id': ie_result.get('channel_id'),
|
'playlist_channel_id': ie_result.get('channel_id'),
|
||||||
|
'playlist_webpage_url': ie_result.get('webpage_url'),
|
||||||
**kwargs,
|
**kwargs,
|
||||||
}
|
}
|
||||||
if strict:
|
if strict:
|
||||||
@ -2195,7 +2209,7 @@ def _select_formats(self, formats, selector):
|
|||||||
def _default_format_spec(self, info_dict):
|
def _default_format_spec(self, info_dict):
|
||||||
prefer_best = (
|
prefer_best = (
|
||||||
self.params['outtmpl']['default'] == '-'
|
self.params['outtmpl']['default'] == '-'
|
||||||
or info_dict.get('is_live') and not self.params.get('live_from_start'))
|
or (info_dict.get('is_live') and not self.params.get('live_from_start')))
|
||||||
|
|
||||||
def can_merge():
|
def can_merge():
|
||||||
merger = FFmpegMergerPP(self)
|
merger = FFmpegMergerPP(self)
|
||||||
@ -2364,7 +2378,7 @@ def _merge(formats_pair):
|
|||||||
vexts=[f['ext'] for f in video_fmts],
|
vexts=[f['ext'] for f in video_fmts],
|
||||||
aexts=[f['ext'] for f in audio_fmts],
|
aexts=[f['ext'] for f in audio_fmts],
|
||||||
preferences=(try_call(lambda: self.params['merge_output_format'].split('/'))
|
preferences=(try_call(lambda: self.params['merge_output_format'].split('/'))
|
||||||
or self.params.get('prefer_free_formats') and ('webm', 'mkv')))
|
or (self.params.get('prefer_free_formats') and ('webm', 'mkv'))))
|
||||||
|
|
||||||
filtered = lambda *keys: filter(None, (traverse_obj(fmt, *keys) for fmt in formats_info))
|
filtered = lambda *keys: filter(None, (traverse_obj(fmt, *keys) for fmt in formats_info))
|
||||||
|
|
||||||
@ -3540,8 +3554,8 @@ def ffmpeg_fixup(cndn, msg, cls):
|
|||||||
and info_dict.get('container') == 'm4a_dash',
|
and info_dict.get('container') == 'm4a_dash',
|
||||||
'writing DASH m4a. Only some players support this container',
|
'writing DASH m4a. Only some players support this container',
|
||||||
FFmpegFixupM4aPP)
|
FFmpegFixupM4aPP)
|
||||||
ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
|
ffmpeg_fixup((downloader == 'hlsnative' and not self.params.get('hls_use_mpegts'))
|
||||||
or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
|
or (info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None),
|
||||||
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
|
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
|
||||||
FFmpegFixupM3u8PP)
|
FFmpegFixupM3u8PP)
|
||||||
ffmpeg_fixup(downloader == 'dashsegments'
|
ffmpeg_fixup(downloader == 'dashsegments'
|
||||||
|
@ -261,9 +261,11 @@ def parse_retries(name, value):
|
|||||||
elif value in ('inf', 'infinite'):
|
elif value in ('inf', 'infinite'):
|
||||||
return float('inf')
|
return float('inf')
|
||||||
try:
|
try:
|
||||||
return int(value)
|
int_value = int(value)
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
validate(False, f'{name} retry count', value)
|
validate(False, f'{name} retry count', value)
|
||||||
|
validate_positive(f'{name} retry count', int_value)
|
||||||
|
return int_value
|
||||||
|
|
||||||
opts.retries = parse_retries('download', opts.retries)
|
opts.retries = parse_retries('download', opts.retries)
|
||||||
opts.fragment_retries = parse_retries('fragment', opts.fragment_retries)
|
opts.fragment_retries = parse_retries('fragment', opts.fragment_retries)
|
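The effect of the `parse_retries` change can be sketched as a standalone function. This is a simplified stand-in, not the project's code: it raises `ValueError` instead of calling the `validate` helpers (which are not shown in this hunk), the `value is None` branch is an assumption about the part of the function above the hunk, and the error wording is made up for illustration.

```python
def parse_retries(name, value):
    # Assumed first branch; only the `elif` is visible in the hunk
    if value is None:
        return None
    if value in ('inf', 'infinite'):
        return float('inf')
    try:
        int_value = int(value)
    except (TypeError, ValueError):
        raise ValueError(f'invalid {name} retry count "{value}" given') from None
    # New in this change: negative counts are rejected instead of returned
    if int_value < 0:
        raise ValueError(f'{name} retry count "{value}" must be positive or 0')
    return int_value
```

Before the change, a value such as `-1` was returned as-is because `int(value)` succeeded.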
||||||
@ -1062,7 +1064,7 @@ def make_row(target, handler):
|
|||||||
# If we only have a single process attached, then the executable was double clicked
|
# If we only have a single process attached, then the executable was double clicked
|
||||||
# When using `pyinstaller` with `--onefile`, two processes get attached
|
# When using `pyinstaller` with `--onefile`, two processes get attached
|
||||||
is_onefile = hasattr(sys, '_MEIPASS') and os.path.basename(sys._MEIPASS).startswith('_MEI')
|
is_onefile = hasattr(sys, '_MEIPASS') and os.path.basename(sys._MEIPASS).startswith('_MEI')
|
||||||
if attached_processes == 1 or is_onefile and attached_processes == 2:
|
if attached_processes == 1 or (is_onefile and attached_processes == 2):
|
||||||
print(parser._generate_error_message(
|
print(parser._generate_error_message(
|
||||||
'Do not double-click the executable, instead call it from a command line.\n'
|
'Do not double-click the executable, instead call it from a command line.\n'
|
||||||
'Please read the README for further information on how to use yt-dlp: '
|
'Please read the README for further information on how to use yt-dlp: '
|
||||||
@ -1109,9 +1111,9 @@ def main(argv=None):
|
|||||||
from .extractor import gen_extractors, list_extractors
|
from .extractor import gen_extractors, list_extractors
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
'main',
|
|
||||||
'YoutubeDL',
|
'YoutubeDL',
|
||||||
'parse_options',
|
|
||||||
'gen_extractors',
|
'gen_extractors',
|
||||||
'list_extractors',
|
'list_extractors',
|
||||||
|
'main',
|
||||||
|
'parse_options',
|
||||||
]
|
]
|
||||||
|
@ -534,19 +534,17 @@ def ghash(subkey, data):
|
|||||||
__all__ = [
|
__all__ = [
|
||||||
'aes_cbc_decrypt',
|
'aes_cbc_decrypt',
|
||||||
'aes_cbc_decrypt_bytes',
|
'aes_cbc_decrypt_bytes',
|
||||||
'aes_ctr_decrypt',
|
|
||||||
'aes_decrypt_text',
|
|
||||||
'aes_decrypt',
|
|
||||||
'aes_ecb_decrypt',
|
|
||||||
'aes_gcm_decrypt_and_verify',
|
|
||||||
'aes_gcm_decrypt_and_verify_bytes',
|
|
||||||
|
|
||||||
'aes_cbc_encrypt',
|
'aes_cbc_encrypt',
|
||||||
'aes_cbc_encrypt_bytes',
|
'aes_cbc_encrypt_bytes',
|
||||||
|
'aes_ctr_decrypt',
|
||||||
'aes_ctr_encrypt',
|
'aes_ctr_encrypt',
|
||||||
|
'aes_decrypt',
|
||||||
|
'aes_decrypt_text',
|
||||||
|
'aes_ecb_decrypt',
|
||||||
'aes_ecb_encrypt',
|
'aes_ecb_encrypt',
|
||||||
'aes_encrypt',
|
'aes_encrypt',
|
||||||
|
'aes_gcm_decrypt_and_verify',
|
||||||
|
'aes_gcm_decrypt_and_verify_bytes',
|
||||||
'key_expansion',
|
'key_expansion',
|
||||||
'pad_block',
|
'pad_block',
|
||||||
'pkcs7_padding',
|
'pkcs7_padding',
|
||||||
|
@ -195,7 +195,10 @@ def _extract_firefox_cookies(profile, container, logger):
|
|||||||
|
|
||||||
def _firefox_browser_dirs():
|
def _firefox_browser_dirs():
|
||||||
if sys.platform in ('cygwin', 'win32'):
|
if sys.platform in ('cygwin', 'win32'):
|
||||||
yield os.path.expandvars(R'%APPDATA%\Mozilla\Firefox\Profiles')
|
yield from map(os.path.expandvars, (
|
||||||
|
R'%APPDATA%\Mozilla\Firefox\Profiles',
|
||||||
|
R'%LOCALAPPDATA%\Packages\Mozilla.Firefox_n80bbvh6b1yt2\LocalCache\Roaming\Mozilla\Firefox\Profiles',
|
||||||
|
))
|
||||||
|
|
||||||
elif sys.platform == 'darwin':
|
elif sys.platform == 'darwin':
|
||||||
yield os.path.expanduser('~/Library/Application Support/Firefox/Profiles')
|
yield os.path.expanduser('~/Library/Application Support/Firefox/Profiles')
|
||||||
@ -1276,8 +1279,8 @@ def open(self, file, *, write=False):
|
|||||||
def _really_save(self, f, ignore_discard, ignore_expires):
|
def _really_save(self, f, ignore_discard, ignore_expires):
|
||||||
now = time.time()
|
now = time.time()
|
||||||
for cookie in self:
|
for cookie in self:
|
||||||
if (not ignore_discard and cookie.discard
|
if ((not ignore_discard and cookie.discard)
|
||||||
or not ignore_expires and cookie.is_expired(now)):
|
or (not ignore_expires and cookie.is_expired(now))):
|
||||||
continue
|
continue
|
||||||
name, value = cookie.name, cookie.value
|
name, value = cookie.name, cookie.value
|
||||||
if value is None:
|
if value is None:
|
||||||
|
@ -119,12 +119,12 @@ def real_download(self, filename, info_dict):
|
|||||||
self.to_screen(f'[{self.FD_NAME}] Fragment downloads will be delegated to {real_downloader.get_basename()}')
|
self.to_screen(f'[{self.FD_NAME}] Fragment downloads will be delegated to {real_downloader.get_basename()}')
|
||||||
|
|
||||||
def is_ad_fragment_start(s):
|
def is_ad_fragment_start(s):
|
||||||
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
|
return ((s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s)
|
||||||
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
|
or (s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad')))
|
||||||
|
|
||||||
def is_ad_fragment_end(s):
|
def is_ad_fragment_end(s):
|
||||||
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s
|
return ((s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s)
|
||||||
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
|
or (s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment')))
|
||||||
|
|
||||||
fragments = []
|
fragments = []
|
||||||
|
|
||||||
|
@ -123,8 +123,8 @@ def download_and_parse_fragment(url, frag_index, request_data=None, headers=None
|
|||||||
data,
|
data,
|
||||||
lambda x: x['continuationContents']['liveChatContinuation'], dict) or {}
|
lambda x: x['continuationContents']['liveChatContinuation'], dict) or {}
|
||||||
|
|
||||||
func = (info_dict['protocol'] == 'youtube_live_chat' and parse_actions_live
|
func = ((info_dict['protocol'] == 'youtube_live_chat' and parse_actions_live)
|
||||||
or frag_index == 1 and try_refresh_replay_beginning
|
or (frag_index == 1 and try_refresh_replay_beginning)
|
||||||
or parse_actions_replay)
|
or parse_actions_replay)
|
||||||
return (True, *func(live_chat_continuation))
|
return (True, *func(live_chat_continuation))
|
||||||
except HTTPError as err:
|
except HTTPError as err:
|
||||||
|
@ -256,6 +256,7 @@
|
|||||||
BilibiliCheeseIE,
|
BilibiliCheeseIE,
|
||||||
BilibiliCheeseSeasonIE,
|
BilibiliCheeseSeasonIE,
|
||||||
BilibiliCollectionListIE,
|
BilibiliCollectionListIE,
|
||||||
|
BiliBiliDynamicIE,
|
||||||
BilibiliFavoritesListIE,
|
BilibiliFavoritesListIE,
|
||||||
BiliBiliIE,
|
BiliBiliIE,
|
||||||
BiliBiliPlayerIE,
|
BiliBiliPlayerIE,
|
||||||
@ -555,6 +556,7 @@
|
|||||||
DropoutIE,
|
DropoutIE,
|
||||||
DropoutSeasonIE,
|
DropoutSeasonIE,
|
||||||
)
|
)
|
||||||
|
from .drtalks import DrTalksIE
|
||||||
from .drtuber import DrTuberIE
|
from .drtuber import DrTuberIE
|
||||||
from .drtv import (
|
from .drtv import (
|
||||||
DRTVIE,
|
DRTVIE,
|
||||||
@ -584,6 +586,10 @@
|
|||||||
EggheadCourseIE,
|
EggheadCourseIE,
|
||||||
EggheadLessonIE,
|
EggheadLessonIE,
|
||||||
)
|
)
|
||||||
|
from .eggs import (
|
||||||
|
EggsArtistIE,
|
||||||
|
EggsIE,
|
||||||
|
)
|
||||||
from .eighttracks import EightTracksIE
|
from .eighttracks import EightTracksIE
|
||||||
from .eitb import EitbIE
|
from .eitb import EitbIE
|
||||||
from .elementorembed import ElementorEmbedIE
|
from .elementorembed import ElementorEmbedIE
|
||||||
@ -1278,6 +1284,10 @@
|
|||||||
)
|
)
|
||||||
from .nekohacker import NekoHackerIE
|
from .nekohacker import NekoHackerIE
|
||||||
from .nerdcubed import NerdCubedFeedIE
|
from .nerdcubed import NerdCubedFeedIE
|
||||||
|
from .nest import (
|
||||||
|
NestClipIE,
|
||||||
|
NestIE,
|
||||||
|
)
|
||||||
from .neteasemusic import (
|
from .neteasemusic import (
|
||||||
NetEaseMusicAlbumIE,
|
NetEaseMusicAlbumIE,
|
||||||
NetEaseMusicDjRadioIE,
|
NetEaseMusicDjRadioIE,
|
||||||
@ -1533,6 +1543,10 @@
|
|||||||
PinterestCollectionIE,
|
PinterestCollectionIE,
|
||||||
PinterestIE,
|
PinterestIE,
|
||||||
)
|
)
|
||||||
|
from .piramidetv import (
|
||||||
|
PiramideTVChannelIE,
|
||||||
|
PiramideTVIE,
|
||||||
|
)
|
||||||
from .pixivsketch import (
|
from .pixivsketch import (
|
||||||
PixivSketchIE,
|
PixivSketchIE,
|
||||||
PixivSketchUserIE,
|
PixivSketchUserIE,
|
||||||
@ -1552,6 +1566,7 @@
|
|||||||
PluralsightIE,
|
PluralsightIE,
|
||||||
)
|
)
|
||||||
from .plutotv import PlutoTVIE
|
from .plutotv import PlutoTVIE
|
||||||
|
from .plvideo import PlVideoIE
|
||||||
from .podbayfm import (
|
from .podbayfm import (
|
||||||
PodbayFMChannelIE,
|
PodbayFMChannelIE,
|
||||||
PodbayFMIE,
|
PodbayFMIE,
|
||||||
@ -2355,10 +2370,6 @@
|
|||||||
VimmIE,
|
VimmIE,
|
||||||
VimmRecordingIE,
|
VimmRecordingIE,
|
||||||
)
|
)
|
||||||
from .vine import (
|
|
||||||
VineIE,
|
|
||||||
VineUserIE,
|
|
||||||
)
|
|
||||||
from .viously import ViouslyIE
|
from .viously import ViouslyIE
|
||||||
from .viqeo import ViqeoIE
|
from .viqeo import ViqeoIE
|
||||||
from .viu import (
|
from .viu import (
|
||||||
|
@ -232,7 +232,7 @@ def _real_extract(self, url):
|
|||||||
|
|
||||||
error = self._parse_json(e.cause.response.read(), video_id)
|
error = self._parse_json(e.cause.response.read(), video_id)
|
||||||
message = error.get('message')
|
message = error.get('message')
|
||||||
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
|
if e.cause.status == 403 and error.get('code') == 'player-bad-geolocation-country':
|
||||||
self.raise_geo_restricted(msg=message)
|
self.raise_geo_restricted(msg=message)
|
||||||
raise ExtractorError(message)
|
raise ExtractorError(message)
|
||||||
else:
|
else:
|
||||||
|
@ -18,7 +18,6 @@
|
|||||||
InAdvancePagedList,
|
InAdvancePagedList,
|
||||||
OnDemandPagedList,
|
OnDemandPagedList,
|
||||||
bool_or_none,
|
bool_or_none,
|
||||||
clean_html,
|
|
||||||
determine_ext,
|
determine_ext,
|
||||||
filter_dict,
|
filter_dict,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
@ -33,6 +32,7 @@
|
|||||||
parse_qs,
|
parse_qs,
|
||||||
parse_resolution,
|
parse_resolution,
|
||||||
qualities,
|
qualities,
|
||||||
|
sanitize_url,
|
||||||
smuggle_url,
|
smuggle_url,
|
||||||
srt_subtitles_timecode,
|
srt_subtitles_timecode,
|
||||||
str_or_none,
|
str_or_none,
|
||||||
@ -63,7 +63,7 @@ def _check_missing_formats(self, play_info, formats):
|
|||||||
'support_formats', lambda _, v: v['quality'] not in parsed_qualities))], delim=', ')
|
'support_formats', lambda _, v: v['quality'] not in parsed_qualities))], delim=', ')
|
||||||
if missing_formats:
|
if missing_formats:
|
||||||
self.to_screen(
|
self.to_screen(
|
||||||
f'Format(s) {missing_formats} are missing; you have to login or '
|
f'Format(s) {missing_formats} are missing; you have to '
|
||||||
f'become a premium member to download them. {self._login_hint()}')
|
f'become a premium member to download them. {self._login_hint()}')
|
||||||
|
|
||||||
def extract_formats(self, play_info):
|
def extract_formats(self, play_info):
|
||||||
@ -165,14 +165,18 @@ def _sign_wbi(self, params, video_id):
|
|||||||
params['w_rid'] = hashlib.md5(f'{query}{self._get_wbi_key(video_id)}'.encode()).hexdigest()
|
params['w_rid'] = hashlib.md5(f'{query}{self._get_wbi_key(video_id)}'.encode()).hexdigest()
|
||||||
return params
|
return params
|
||||||
|
|
||||||
def _download_playinfo(self, bvid, cid, headers=None, qn=None):
|
def _download_playinfo(self, bvid, cid, headers=None, query=None):
|
||||||
params = {'bvid': bvid, 'cid': cid, 'fnval': 4048}
|
params = {'bvid': bvid, 'cid': cid, 'fnval': 4048, **(query or {})}
|
||||||
if qn:
|
if self.is_logged_in:
|
||||||
params['qn'] = qn
|
params.pop('try_look', None)
|
||||||
|
if qn := params.get('qn'):
|
||||||
|
note = f'Downloading video format {qn} for cid {cid}'
|
||||||
|
else:
|
||||||
|
note = f'Downloading video formats for cid {cid}'
|
||||||
|
|
||||||
return self._download_json(
|
return self._download_json(
|
||||||
'https://api.bilibili.com/x/player/wbi/playurl', bvid,
|
'https://api.bilibili.com/x/player/wbi/playurl', bvid,
|
||||||
query=self._sign_wbi(params, bvid), headers=headers,
|
query=self._sign_wbi(params, bvid), headers=headers, note=note)['data']
|
||||||
note=f'Downloading video formats for cid {cid} {qn or ""}')['data']
|
|
||||||
|
|
||||||
def json2srt(self, json_data):
|
def json2srt(self, json_data):
|
||||||
srt_data = ''
|
srt_data = ''
|
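The parameter handling in the new `_download_playinfo` can be illustrated without any network access. This is a minimal sketch: `build_playinfo_params` and its `logged_in` argument are hypothetical stand-ins for the method body and `self.is_logged_in`, and the function returns the params and progress note instead of performing the request.

```python
def build_playinfo_params(bvid, cid, query=None, logged_in=False):
    # Extra query entries (e.g. 'qn' or 'try_look') are merged over the defaults
    params = {'bvid': bvid, 'cid': cid, 'fnval': 4048, **(query or {})}
    if logged_in:
        # 'try_look' is dropped when a login session is available
        params.pop('try_look', None)
    if qn := params.get('qn'):
        note = f'Downloading video format {qn} for cid {cid}'
    else:
        note = f'Downloading video formats for cid {cid}'
    return params, note
```

Compared with the old `qn=None` keyword, passing a `query` dict lets callers add `try_look` without widening the signature further.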
||||||
@ -191,7 +195,7 @@ def _get_subtitles(self, video_id, cid, aid=None):
|
|||||||
}
|
}
|
||||||
|
|
||||||
video_info = self._download_json(
|
video_info = self._download_json(
|
||||||
'https://api.bilibili.com/x/player/v2', video_id,
|
'https://api.bilibili.com/x/player/wbi/v2', video_id,
|
||||||
query={'aid': aid, 'cid': cid} if aid else {'bvid': video_id, 'cid': cid},
|
query={'aid': aid, 'cid': cid} if aid else {'bvid': video_id, 'cid': cid},
|
||||||
note=f'Extracting subtitle info {cid}', headers=self._HEADERS)
|
note=f'Extracting subtitle info {cid}', headers=self._HEADERS)
|
||||||
if traverse_obj(video_info, ('data', 'need_login_subtitle')):
|
if traverse_obj(video_info, ('data', 'need_login_subtitle')):
|
||||||
@ -207,7 +211,7 @@ def _get_subtitles(self, video_id, cid, aid=None):
|
|||||||
|
|
||||||
def _get_chapters(self, aid, cid):
|
def _get_chapters(self, aid, cid):
|
||||||
chapters = aid and cid and self._download_json(
|
chapters = aid and cid and self._download_json(
|
||||||
'https://api.bilibili.com/x/player/v2', aid, query={'aid': aid, 'cid': cid},
|
'https://api.bilibili.com/x/player/wbi/v2', aid, query={'aid': aid, 'cid': cid},
|
||||||
note='Extracting chapters', fatal=False, headers=self._HEADERS)
|
note='Extracting chapters', fatal=False, headers=self._HEADERS)
|
||||||
return traverse_obj(chapters, ('data', 'view_points', ..., {
|
return traverse_obj(chapters, ('data', 'view_points', ..., {
|
||||||
'title': 'content',
|
'title': 'content',
|
||||||
@ -286,7 +290,7 @@ def _get_interactive_entries(self, video_id, cid, metainfo, headers=None):
|
|||||||
('data', 'interaction', 'graph_version', {int_or_none}))
|
('data', 'interaction', 'graph_version', {int_or_none}))
|
||||||
cid_edges = self._get_divisions(video_id, graph_version, {1: {'cid': cid}}, 1)
|
cid_edges = self._get_divisions(video_id, graph_version, {1: {'cid': cid}}, 1)
|
||||||
for cid, edges in cid_edges.items():
|
for cid, edges in cid_edges.items():
|
||||||
play_info = self._download_playinfo(video_id, cid, headers=headers)
|
play_info = self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1})
|
||||||
yield {
|
yield {
|
||||||
**metainfo,
|
**metainfo,
|
||||||
'id': f'{video_id}_{cid}',
|
'id': f'{video_id}_{cid}',
|
||||||
@ -639,40 +643,29 @@ def _real_extract(self, url):
|
|||||||
headers['Referer'] = url
|
headers['Referer'] = url
|
||||||
|
|
||||||
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)
|
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)
|
||||||
|
|
||||||
|
if traverse_obj(initial_state, ('error', 'trueCode')) == -403:
|
||||||
|
self.raise_login_required()
|
||||||
|
if traverse_obj(initial_state, ('error', 'trueCode')) == -404:
|
||||||
|
raise ExtractorError(
|
||||||
|
'This video may be deleted or geo-restricted. '
|
||||||
|
'You might want to try a VPN or a proxy server (with --proxy)', expected=True)
|
||||||
|
|
||||||
is_festival = 'videoData' not in initial_state
|
is_festival = 'videoData' not in initial_state
|
||||||
if is_festival:
|
if is_festival:
|
||||||
video_data = initial_state['videoInfo']
|
video_data = initial_state['videoInfo']
|
||||||
else:
|
else:
|
||||||
play_info_obj = self._search_json(
|
|
||||||
r'window\.__playinfo__\s*=', webpage, 'play info', video_id, fatal=False)
|
|
||||||
if not play_info_obj:
|
|
||||||
if traverse_obj(initial_state, ('error', 'trueCode')) == -403:
|
|
||||||
self.raise_login_required()
|
|
||||||
if traverse_obj(initial_state, ('error', 'trueCode')) == -404:
|
|
||||||
raise ExtractorError(
|
|
||||||
'This video may be deleted or geo-restricted. '
|
|
||||||
'You might want to try a VPN or a proxy server (with --proxy)', expected=True)
|
|
||||||
play_info = traverse_obj(play_info_obj, ('data', {dict}))
|
|
||||||
if not play_info:
|
|
||||||
if traverse_obj(play_info_obj, 'code') == 87007:
|
|
||||||
toast = get_element_by_class('tips-toast', webpage) or ''
|
|
||||||
msg = clean_html(
|
|
||||||
f'{get_element_by_class("belongs-to", toast) or ""},'
|
|
||||||
+ (get_element_by_class('level', toast) or ''))
|
|
||||||
raise ExtractorError(
|
|
||||||
f'This is a supporter-only video: {msg}. {self._login_hint()}', expected=True)
|
|
||||||
raise ExtractorError('Failed to extract play info')
|
|
||||||
video_data = initial_state['videoData']
|
video_data = initial_state['videoData']
|
||||||
|
|
||||||
video_id, title = video_data['bvid'], video_data.get('title')
|
video_id, title = video_data['bvid'], video_data.get('title')
|
||||||
|
|
||||||
# Bilibili anthologies are similar to playlists but all videos share the same video ID as the anthology itself.
|
# Bilibili anthologies are similar to playlists but all videos share the same video ID as the anthology itself.
|
||||||
page_list_json = not is_festival and traverse_obj(
|
page_list_json = (not is_festival and traverse_obj(
|
||||||
self._download_json(
|
self._download_json(
|
||||||
'https://api.bilibili.com/x/player/pagelist', video_id,
|
'https://api.bilibili.com/x/player/pagelist', video_id,
|
||||||
fatal=False, query={'bvid': video_id, 'jsonp': 'jsonp'},
|
fatal=False, query={'bvid': video_id, 'jsonp': 'jsonp'},
|
||||||
note='Extracting videos in anthology', headers=headers),
|
note='Extracting videos in anthology', headers=headers),
|
||||||
'data', expected_type=list) or []
|
'data', expected_type=list)) or []
|
||||||
is_anthology = len(page_list_json) > 1
|
is_anthology = len(page_list_json) > 1
|
||||||
|
|
||||||
part_id = int_or_none(parse_qs(url).get('p', [None])[-1])
|
part_id = int_or_none(parse_qs(url).get('p', [None])[-1])
|
||||||
@ -691,8 +684,6 @@ def _real_extract(self, url):
|
|||||||
|
|
||||||
festival_info = {}
|
festival_info = {}
|
||||||
if is_festival:
|
if is_festival:
|
||||||
play_info = self._download_playinfo(video_id, cid, headers=headers)
|
|
||||||
|
|
||||||
festival_info = traverse_obj(initial_state, {
|
festival_info = traverse_obj(initial_state, {
|
||||||
'uploader': ('videoInfo', 'upName'),
|
'uploader': ('videoInfo', 'upName'),
|
||||||
'uploader_id': ('videoInfo', 'upMid', {str_or_none}),
|
'uploader_id': ('videoInfo', 'upMid', {str_or_none}),
|
||||||
@ -727,62 +718,79 @@ def _real_extract(self, url):
|
|||||||
self._get_interactive_entries(video_id, cid, metainfo, headers=headers), **metainfo,
|
self._get_interactive_entries(video_id, cid, metainfo, headers=headers), **metainfo,
|
||||||
duration=traverse_obj(initial_state, ('videoData', 'duration', {int_or_none})),
|
duration=traverse_obj(initial_state, ('videoData', 'duration', {int_or_none})),
|
||||||
__post_extractor=self.extract_comments(aid))
|
__post_extractor=self.extract_comments(aid))
|
||||||
else:
|
|
||||||
formats = self.extract_formats(play_info)
|
|
||||||
|
|
||||||
if not traverse_obj(play_info, ('dash')):
|
play_info = None
|
||||||
# we only have legacy formats and need additional work
|
if self.is_logged_in:
|
||||||
has_qn = lambda x: x in traverse_obj(formats, (..., 'quality'))
|
play_info = traverse_obj(
|
||||||
for qn in traverse_obj(play_info, ('accept_quality', lambda _, v: not has_qn(v), {int})):
|
self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id, default=None),
|
||||||
formats.extend(traverse_obj(
|
('data', {dict}))
|
||||||
self.extract_formats(self._download_playinfo(video_id, cid, headers=headers, qn=qn)),
|
if not play_info:
|
||||||
lambda _, v: not has_qn(v['quality'])))
|
play_info = self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1})
|
||||||
self._check_missing_formats(play_info, formats)
|
formats = self.extract_formats(play_info)
|
||||||
flv_formats = traverse_obj(formats, lambda _, v: v['fragments'])
|
|
||||||
if flv_formats and len(flv_formats) < len(formats):
|
|
||||||
# Flv and mp4 are incompatible due to `multi_video` workaround, so drop one
|
|
||||||
if not self._configuration_arg('prefer_multi_flv'):
|
|
||||||
dropped_fmts = ', '.join(
|
|
||||||
f'{f.get("format_note")} ({f.get("format_id")})' for f in flv_formats)
|
|
||||||
formats = traverse_obj(formats, lambda _, v: not v.get('fragments'))
|
|
||||||
if dropped_fmts:
|
|
||||||
self.to_screen(
|
|
||||||
f'Dropping incompatible flv format(s) {dropped_fmts} since mp4 is available. '
|
|
||||||
'To extract flv, pass --extractor-args "bilibili:prefer_multi_flv"')
|
|
||||||
else:
|
|
||||||
formats = traverse_obj(
|
|
||||||
# XXX: Filtering by extractor-arg is for testing purposes
|
|
||||||
formats, lambda _, v: v['quality'] == int(self._configuration_arg('prefer_multi_flv')[0]),
|
|
||||||
) or [max(flv_formats, key=lambda x: x['quality'])]
|
|
||||||
|
|
||||||
if traverse_obj(formats, (0, 'fragments')):
|
if video_data.get('is_upower_exclusive'):
|
||||||
# We have flv formats, which are individual short videos with their own timestamps and metainfo
|
high_level = traverse_obj(initial_state, ('elecFullInfo', 'show_info', 'high_level', {dict})) or {}
|
||||||
# Binary concatenation corrupts their timestamps, so we need a `multi_video` workaround
|
msg = f'{join_nonempty("title", "sub_title", from_dict=high_level, delim=",")}. {self._login_hint()}'
|
||||||
return {
|
if not formats:
|
||||||
**metainfo,
|
raise ExtractorError(f'This is a supporter-only video: {msg}', expected=True)
|
||||||
'_type': 'multi_video',
|
if '试看' in traverse_obj(play_info, ('accept_description', ..., {str})):
|
||||||
'entries': [{
|
self.report_warning(
|
||||||
'id': f'{metainfo["id"]}_{idx}',
|
f'This is a supporter-only video, only the preview will be extracted: {msg}',
|
||||||
'title': metainfo['title'],
|
video_id=video_id)
|
||||||
'http_headers': metainfo['http_headers'],
|
|
||||||
'formats': [{
|
if not traverse_obj(play_info, 'dash'):
|
||||||
**fragment,
|
# we only have legacy formats and need additional work
|
||||||
'format_id': formats[0].get('format_id'),
|
has_qn = lambda x: x in traverse_obj(formats, (..., 'quality'))
|
||||||
}],
|
for qn in traverse_obj(play_info, ('accept_quality', lambda _, v: not has_qn(v), {int})):
|
||||||
'subtitles': self.extract_subtitles(video_id, cid) if idx == 0 else None,
|
formats.extend(traverse_obj(
|
||||||
'__post_extractor': self.extract_comments(aid) if idx == 0 else None,
|
self.extract_formats(self._download_playinfo(video_id, cid, headers=headers, query={'qn': qn})),
|
||||||
} for idx, fragment in enumerate(formats[0]['fragments'])],
|
lambda _, v: not has_qn(v['quality'])))
|
||||||
'duration': float_or_none(play_info.get('timelength'), scale=1000),
|
self._check_missing_formats(play_info, formats)
|
||||||
}
|
flv_formats = traverse_obj(formats, lambda _, v: v['fragments'])
|
||||||
else:
|
if flv_formats and len(flv_formats) < len(formats):
|
||||||
return {
|
# Flv and mp4 are incompatible due to `multi_video` workaround, so drop one
|
||||||
**metainfo,
|
if not self._configuration_arg('prefer_multi_flv'):
|
||||||
'formats': formats,
|
dropped_fmts = ', '.join(
|
||||||
'duration': float_or_none(play_info.get('timelength'), scale=1000),
|
f'{f.get("format_note")} ({f.get("format_id")})' for f in flv_formats)
|
||||||
'chapters': self._get_chapters(aid, cid),
|
formats = traverse_obj(formats, lambda _, v: not v.get('fragments'))
|
||||||
'subtitles': self.extract_subtitles(video_id, cid),
|
if dropped_fmts:
|
||||||
'__post_extractor': self.extract_comments(aid),
|
self.to_screen(
|
||||||
}
|
f'Dropping incompatible flv format(s) {dropped_fmts} since mp4 is available. '
|
||||||
|
'To extract flv, pass --extractor-args "bilibili:prefer_multi_flv"')
|
||||||
|
else:
|
||||||
|
formats = traverse_obj(
|
||||||
|
# XXX: Filtering by extractor-arg is for testing purposes
|
||||||
|
formats, lambda _, v: v['quality'] == int(self._configuration_arg('prefer_multi_flv')[0]),
|
||||||
|
) or [max(flv_formats, key=lambda x: x['quality'])]
|
||||||
|
|
||||||
|
if traverse_obj(formats, (0, 'fragments')):
|
||||||
|
# We have flv formats, which are individual short videos with their own timestamps and metainfo
|
||||||
|
# Binary concatenation corrupts their timestamps, so we need a `multi_video` workaround
|
||||||
|
return {
|
||||||
|
**metainfo,
|
||||||
|
'_type': 'multi_video',
|
||||||
|
'entries': [{
|
||||||
|
'id': f'{metainfo["id"]}_{idx}',
|
||||||
|
'title': metainfo['title'],
|
||||||
|
'http_headers': metainfo['http_headers'],
|
||||||
|
'formats': [{
|
||||||
|
**fragment,
|
||||||
|
'format_id': formats[0].get('format_id'),
|
||||||
|
}],
|
||||||
|
'subtitles': self.extract_subtitles(video_id, cid) if idx == 0 else None,
|
||||||
|
'__post_extractor': self.extract_comments(aid) if idx == 0 else None,
|
||||||
|
} for idx, fragment in enumerate(formats[0]['fragments'])],
|
||||||
|
'duration': float_or_none(play_info.get('timelength'), scale=1000),
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
**metainfo,
|
||||||
|
'formats': formats,
|
||||||
|
'duration': float_or_none(play_info.get('timelength'), scale=1000),
|
||||||
|
'chapters': self._get_chapters(aid, cid),
|
||||||
|
'subtitles': self.extract_subtitles(video_id, cid),
|
||||||
|
'__post_extractor': self.extract_comments(aid),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
class BiliBiliBangumiIE(BilibiliBaseIE):
|
class BiliBiliBangumiIE(BilibiliBaseIE):
|
||||||
@ -860,10 +868,16 @@ def _real_extract(self, url):
|
|||||||
self.raise_login_required('This video is for premium members only')
|
self.raise_login_required('This video is for premium members only')
|
||||||
|
|
||||||
headers['Referer'] = url
|
headers['Referer'] = url
|
||||||
play_info = self._download_json(
|
|
||||||
'https://api.bilibili.com/pgc/player/web/v2/playurl', episode_id,
|
play_info = (
|
||||||
'Extracting episode', query={'fnval': '4048', 'ep_id': episode_id},
|
self._search_json(
|
||||||
headers=headers)
|
r'playurlSSRData\s*=', webpage, 'embedded page info', episode_id,
|
||||||
|
end_pattern='\n', default=None)
|
||||||
|
or self._download_json(
|
||||||
|
'https://api.bilibili.com/pgc/player/web/v2/playurl', episode_id,
|
||||||
|
'Extracting episode', query={'fnval': 12240, 'ep_id': episode_id},
|
||||||
|
headers=headers))
|
||||||
|
|
||||||
premium_only = play_info.get('code') == -10403
|
premium_only = play_info.get('code') == -10403
|
||||||
play_info = traverse_obj(play_info, ('result', 'video_info', {dict})) or {}
|
play_info = traverse_obj(play_info, ('result', 'video_info', {dict})) or {}
|
||||||
|
|
||||||
@ -1848,6 +1862,47 @@ def _real_extract(self, url):
|
|||||||
ie=BiliBiliIE.ie_key(), video_id=video_id)
|
ie=BiliBiliIE.ie_key(), video_id=video_id)
|
||||||
|
|
||||||
|
|
||||||
|
class BiliBiliDynamicIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://(?:t\.bilibili\.com|(?:www\.)?bilibili\.com/opus)/(?P<id>\d+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://t.bilibili.com/998134289197432852',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'BV1TAmBYVEJr',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'uploader_id': '1192648858',
|
||||||
|
'comment_count': int,
|
||||||
|
'_old_archive_ids': ['bilibili 113457567568273_part1'],
|
||||||
|
'thumbnail': 'http://i2.hdslb.com/bfs/archive/50091efd965d9f13ff6814f7ad374f90ab21e77d.jpg',
|
||||||
|
'duration': 929.238,
|
||||||
|
'upload_date': '20241110',
|
||||||
|
'uploader': '何同学工作室',
|
||||||
|
'like_count': int,
|
||||||
|
'view_count': int,
|
||||||
|
'title': '美国小朋友就玩这个?!何同学工作室11月开箱',
|
||||||
|
'description': '本期产品信息:\n机器狗\n气味模拟器\nCloudboom Strike LS\n无弦吉他\n蓝牙磁带音箱\n神奇画板',
|
||||||
|
'timestamp': 1731232800,
|
||||||
|
'tags': list,
|
||||||
|
'chapters': list,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
post_id = self._match_id(url)
|
||||||
|
# Without the newer chrome UA, the API will return an error (-352)
|
||||||
|
post_data = self._download_json(
|
||||||
|
'https://api.bilibili.com/x/polymer/web-dynamic/v1/detail', post_id,
|
||||||
|
query={'id': post_id}, headers={
|
||||||
|
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
|
||||||
|
})
|
||||||
|
video_url = traverse_obj(post_data, (
|
||||||
|
'data', 'item', (None, 'orig'), 'modules', 'module_dynamic',
|
||||||
|
(('major', ('archive', 'pgc')), ('additional', ('reserve', 'common'))),
|
||||||
|
'jump_url', {url_or_none}, any, {sanitize_url}))
|
||||||
|
if not video_url or (self.suitable(video_url) and post_id == self._match_id(video_url)):
|
||||||
|
raise ExtractorError('No valid video URL found', expected=True)
|
||||||
|
return self.url_result(video_url)
|
||||||
|
|
||||||
|
|
||||||
class BiliIntlBaseIE(InfoExtractor):
|
class BiliIntlBaseIE(InfoExtractor):
|
||||||
_API_URL = 'https://api.bilibili.tv/intl/gateway'
|
_API_URL = 'https://api.bilibili.tv/intl/gateway'
|
||||||
_NETRC_MACHINE = 'biliintl'
|
_NETRC_MACHINE = 'biliintl'
|
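Reviewer note: the new `BiliBiliDynamicIE` accepts both `t.bilibili.com/<id>` and `bilibili.com/opus/<id>` URLs. The sketch below exercises the `_VALID_URL` pattern from the hunk above with the standard `re` module. The helper name `match_post_id` is hypothetical, and anchoring with `re.match` is an assumption about how the pattern is applied.

```python
import re

# Pattern copied from BiliBiliDynamicIE._VALID_URL in the hunk above.
VALID_URL = r'https?://(?:t\.bilibili\.com|(?:www\.)?bilibili\.com/opus)/(?P<id>\d+)'


def match_post_id(url):
    """Return the numeric post ID, or None if the URL does not match.

    Hypothetical helper for illustration; not part of the commit.
    """
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

Regular video pages such as `https://www.bilibili.com/video/...` do not match this pattern, so they stay with the other Bilibili extractors.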
||||||
|
@ -88,7 +88,7 @@ class BlueskyIE(InfoExtractor):
|
|||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e',
|
'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e',
|
||||||
'md5': '1af9c7fda061cf7593bbffca89e43d1c',
|
'md5': 'cc0110ed1f6b0247caac8234cc1e861d',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '3l3w4tnezek2e',
|
'id': '3l3w4tnezek2e',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
@ -133,6 +133,8 @@ class BlueskyIE(InfoExtractor):
|
|||||||
'channel_follower_count': int,
|
'channel_follower_count': int,
|
||||||
'categories': ['Entertainment'],
|
'categories': ['Entertainment'],
|
||||||
'tags': [],
|
'tags': [],
|
||||||
|
'chapters': list,
|
||||||
|
'heatmap': 'count:100',
|
||||||
},
|
},
|
||||||
'add_ie': ['Youtube'],
|
'add_ie': ['Youtube'],
|
||||||
}, {
|
}, {
|
||||||
@ -184,14 +186,14 @@ class BlueskyIE(InfoExtractor):
|
|||||||
},
|
},
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://bsky.app/profile/alt.bun.how/post/3l7rdfxhyds2f',
|
'url': 'https://bsky.app/profile/cinny.bun.how/post/3l7rdfxhyds2f',
|
||||||
'md5': '8775118b235cf9fa6b5ad30f95cda75c',
|
'md5': '8775118b235cf9fa6b5ad30f95cda75c',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '3l7rdfxhyds2f',
|
'id': '3l7rdfxhyds2f',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'uploader': 'cinnamon',
|
'uploader': 'cinnamon',
|
||||||
'uploader_id': 'alt.bun.how',
|
'uploader_id': 'cinny.bun.how',
|
||||||
'uploader_url': 'https://bsky.app/profile/alt.bun.how',
|
'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
|
||||||
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
|
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
|
||||||
'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide',
|
'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide',
|
||||||
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
|
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
|
||||||
@ -341,6 +343,7 @@ def _extract_videos(self, root, video_id, embed_path='embed', record_path='recor
|
|||||||
|
|
||||||
formats.append({
|
formats.append({
|
||||||
'format_id': 'blob',
|
'format_id': 'blob',
|
||||||
|
'quality': 1,
|
||||||
'url': update_url_query(
|
'url': update_url_query(
|
||||||
self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}),
|
self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}),
|
||||||
**traverse_obj(root, (*embed_path, 'aspectRatio', {
|
**traverse_obj(root, (*embed_path, 'aspectRatio', {
|
||||||
|
@ -31,6 +31,7 @@
|
|||||||
update_url_query,
|
update_url_query,
|
||||||
url_or_none,
|
url_or_none,
|
||||||
)
|
)
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
class BrightcoveLegacyIE(InfoExtractor):
|
class BrightcoveLegacyIE(InfoExtractor):
|
||||||
@ -935,8 +936,8 @@ def extract_policy_key():
|
|||||||
|
|
||||||
if content_type == 'playlist':
|
if content_type == 'playlist':
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
[self._parse_brightcove_metadata(vid, vid.get('id'), headers)
|
(self._parse_brightcove_metadata(vid, vid['id'], headers)
|
||||||
for vid in json_data.get('videos', []) if vid.get('id')],
|
for vid in traverse_obj(json_data, ('videos', lambda _, v: v['id']))),
|
||||||
json_data.get('id'), json_data.get('name'),
|
json_data.get('id'), json_data.get('name'),
|
||||||
json_data.get('description'))
|
json_data.get('description'))
|
||||||
|
|
||||||
|
@ -59,16 +59,15 @@ def _extract_from_api(self, video_id, tld):
|
|||||||
'Accept': 'application/json',
|
'Accept': 'application/json',
|
||||||
}, fatal=False, impersonate=True) or {}
|
}, fatal=False, impersonate=True) or {}
|
||||||
|
|
||||||
status = response.get('room_status')
|
|
||||||
if status != 'public':
|
|
||||||
if error := self._ERROR_MAP.get(status):
|
|
||||||
raise ExtractorError(error, expected=True)
|
|
||||||
self.report_warning('Falling back to webpage extraction')
|
|
||||||
return None
|
|
||||||
|
|
||||||
m3u8_url = response.get('url')
|
m3u8_url = response.get('url')
|
||||||
if not m3u8_url:
|
if not m3u8_url:
|
||||||
self.raise_geo_restricted()
|
status = response.get('room_status')
|
||||||
|
if error := self._ERROR_MAP.get(status):
|
||||||
|
raise ExtractorError(error, expected=True)
|
||||||
|
if status == 'public':
|
||||||
|
self.raise_geo_restricted()
|
||||||
|
self.report_warning(f'Got status "{status}" from API; falling back to webpage extraction')
|
||||||
|
return None
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
|
@ -1854,12 +1854,26 @@ def _check_formats(self, formats, video_id):
|
|||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _remove_duplicate_formats(formats):
|
def _remove_duplicate_formats(formats):
|
||||||
format_urls = set()
|
seen_urls = set()
|
||||||
|
seen_fragment_urls = set()
|
||||||
unique_formats = []
|
unique_formats = []
|
||||||
for f in formats:
|
for f in formats:
|
||||||
if f['url'] not in format_urls:
|
fragments = f.get('fragments')
|
||||||
format_urls.add(f['url'])
|
if callable(fragments):
|
||||||
unique_formats.append(f)
|
unique_formats.append(f)
|
||||||
|
|
||||||
|
elif fragments:
|
||||||
|
fragment_urls = frozenset(
|
||||||
|
fragment.get('url') or urljoin(f['fragment_base_url'], fragment['path'])
|
||||||
|
for fragment in fragments)
|
||||||
|
if fragment_urls not in seen_fragment_urls:
|
||||||
|
seen_fragment_urls.add(fragment_urls)
|
||||||
|
unique_formats.append(f)
|
||||||
|
|
||||||
|
elif f['url'] not in seen_urls:
|
||||||
|
seen_urls.add(f['url'])
|
||||||
|
unique_formats.append(f)
|
||||||
|
|
||||||
formats[:] = unique_formats
|
formats[:] = unique_formats
|
||||||
|
|
||||||
def _is_valid_url(self, url, video_id, item='video', headers={}):
|
def _is_valid_url(self, url, video_id, item='video', headers={}):
|
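Reviewer note: the rewritten `_remove_duplicate_formats` no longer deduplicates fragmented formats by their top-level `url`; it compares the set of fragment URLs instead. The following is a minimal standalone sketch of that logic. It assumes `urllib.parse.urljoin` as a stand-in for the `urljoin` helper used in the diff, which may behave differently on edge cases, and the function name without the leading underscore is only for illustration.

```python
from urllib.parse import urljoin  # stand-in for the helper used in the diff


def remove_duplicate_formats(formats):
    """Deduplicate formats in place, mirroring the logic of the hunk above."""
    seen_urls = set()
    seen_fragment_urls = set()
    unique_formats = []
    for f in formats:
        fragments = f.get('fragments')
        if callable(fragments):
            # Lazily generated fragments cannot be compared, so keep the format
            unique_formats.append(f)
        elif fragments:
            # Two fragmented formats are duplicates only if their fragment URL sets match
            fragment_urls = frozenset(
                fragment.get('url') or urljoin(f['fragment_base_url'], fragment['path'])
                for fragment in fragments)
            if fragment_urls not in seen_fragment_urls:
                seen_fragment_urls.add(fragment_urls)
                unique_formats.append(f)
        elif f['url'] not in seen_urls:
            seen_urls.add(f['url'])
            unique_formats.append(f)
    formats[:] = unique_formats
```

Under the old code, two fragmented formats sharing one manifest URL would have collapsed into a single entry even when their fragments differed.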
||||||
@ -3789,7 +3803,7 @@ def _cookies_passed(self):
|
|||||||
def mark_watched(self, *args, **kwargs):
|
def mark_watched(self, *args, **kwargs):
|
||||||
if not self.get_param('mark_watched', False):
|
if not self.get_param('mark_watched', False):
|
||||||
return
|
return
|
||||||
if self.supports_login() and self._get_login_info()[0] is not None or self._cookies_passed:
|
if (self.supports_login() and self._get_login_info()[0] is not None) or self._cookies_passed:
|
||||||
self._mark_watched(*args, **kwargs)
|
self._mark_watched(*args, **kwargs)
|
||||||
|
|
||||||
def _mark_watched(self, *args, **kwargs):
|
def _mark_watched(self, *args, **kwargs):
|
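Reviewer note: the parentheses added to the `mark_watched` condition appear to be for readability only, since Python already binds `and` more tightly than `or`. The sketch below checks this for every combination of boolean inputs; both function names are hypothetical.

```python
import itertools


def unparenthesized(a, b, c):
    # Shape of the old condition: `a and b or c`
    return a and b or c


def parenthesized(a, b, c):
    # Shape of the new condition: `(a and b) or c`
    return (a and b) or c


def conditions_agree():
    """Return True if both forms give the same result for all boolean inputs."""
    return all(
        unparenthesized(a, b, c) == parenthesized(a, b, c)
        for a, b, c in itertools.product([False, True], repeat=3))
```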
||||||
|
@ -1,7 +1,4 @@
|
|||||||
import time
|
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..networking import HEADRequest
|
|
||||||
from ..utils import int_or_none
|
from ..utils import int_or_none
|
||||||
|
|
||||||
|
|
||||||
@ -31,9 +28,6 @@ def _real_extract(self, url):
|
|||||||
video_id = mobj.group('id')
|
video_id = mobj.group('id')
|
||||||
display_id = mobj.group('display_id') or video_id
|
display_id = mobj.group('display_id') or video_id
|
||||||
|
|
||||||
# request setClientTimezone.php to get PHPSESSID cookie which is need to get valid json data in the next request
|
|
||||||
self._request_webpage(HEADRequest(
|
|
||||||
'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
|
|
||||||
movie_data = self._download_json(
|
movie_data = self._download_json(
|
||||||
f'http://www.cultureunplugged.com/movie-data/cu-{video_id}.json', display_id)
|
f'http://www.cultureunplugged.com/movie-data/cu-{video_id}.json', display_id)
|
||||||
|
|
||||||
|
@ -1,3 +1,4 @@
|
|||||||
|
import functools
|
||||||
import hashlib
|
import hashlib
|
||||||
import re
|
import re
|
||||||
import time
|
import time
|
||||||
@ -51,6 +52,15 @@ class DacastVODIE(DacastBaseIE):
|
|||||||
'thumbnail': 'https://universe-files.dacast.com/26137208-5858-65c1-5e9a-9d6b6bd2b6c2',
|
'thumbnail': 'https://universe-files.dacast.com/26137208-5858-65c1-5e9a-9d6b6bd2b6c2',
|
||||||
},
|
},
|
||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
|
}, { # /uspaes/ in hls_url
|
||||||
|
'url': 'https://iframe.dacast.com/vod/f9823fc6-faba-b98f-0d00-4a7b50a58c5b/348c5c84-b6af-4859-bb9d-1d01009c795b',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '348c5c84-b6af-4859-bb9d-1d01009c795b',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'pl1-edyta-rubas-211124.mp4',
|
||||||
|
'uploader_id': 'f9823fc6-faba-b98f-0d00-4a7b50a58c5b',
|
||||||
|
'thumbnail': 'https://universe-files.dacast.com/4d0bd042-a536-752d-fc34-ad2fa44bbcbb.png',
|
||||||
|
},
|
||||||
}]
|
}]
|
||||||
_WEBPAGE_TESTS = [{
|
_WEBPAGE_TESTS = [{
|
||||||
'url': 'https://www.dacast.com/support/knowledgebase/how-can-i-embed-a-video-on-my-website/',
|
'url': 'https://www.dacast.com/support/knowledgebase/how-can-i-embed-a-video-on-my-website/',
|
||||||
@ -74,6 +84,15 @@ class DacastVODIE(DacastBaseIE):
|
|||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
}]
|
}]
|
||||||
|
|
||||||
|
@functools.cached_property
|
||||||
|
def _usp_signing_secret(self):
|
||||||
|
player_js = self._download_webpage(
|
||||||
|
'https://player.dacast.com/js/player.js', None, 'Downloading player JS')
|
||||||
|
# Rotates every so often, but hardcode a fallback in case of JS change/breakage before rotation
|
||||||
|
return self._search_regex(
|
||||||
|
r'\bUSP_SIGNING_SECRET\s*=\s*(["\'])(?P<secret>(?:(?!\1).)+)', player_js,
|
||||||
|
'usp signing secret', group='secret', fatal=False) or 'odnInCGqhvtyRTtIiddxtuRtawYYICZP'
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
|
user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
|
||||||
query = {'contentId': f'{user_id}-vod-{video_id}', 'provider': 'universe'}
|
query = {'contentId': f'{user_id}-vod-{video_id}', 'provider': 'universe'}
|
||||||
@ -94,10 +113,10 @@ def _real_extract(self, url):
|
|||||||
if 'DRM_EXT' in hls_url:
|
if 'DRM_EXT' in hls_url:
|
||||||
self.report_drm(video_id)
|
self.report_drm(video_id)
|
||||||
elif '/uspaes/' in hls_url:
|
elif '/uspaes/' in hls_url:
|
||||||
# From https://player.dacast.com/js/player.js
|
# Ref: https://player.dacast.com/js/player.js
|
||||||
ts = int(time.time())
|
ts = int(time.time())
|
||||||
signature = hashlib.sha1(
|
signature = hashlib.sha1(
|
||||||
f'{10413792000 - ts}{ts}YfaKtquEEpDeusCKbvYszIEZnWmBcSvw').digest().hex()
|
f'{10413792000 - ts}{ts}{self._usp_signing_secret}'.encode()).digest().hex()
|
||||||
hls_aes['uri'] = f'https://keys.dacast.com/uspaes/{video_id}.key?s={signature}&ts={ts}'
|
hls_aes['uri'] = f'https://keys.dacast.com/uspaes/{video_id}.key?s={signature}&ts={ts}'
|
||||||
|
|
||||||
for retry in self.RetryManager():
|
for retry in self.RetryManager():
|
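Reviewer note: besides fetching the signing secret from the player JS, this hunk adds `.encode()` before hashing, because `hashlib.sha1` accepts bytes rather than `str`. Below is a minimal sketch of the signature computation as it appears in the new lines. The function name `usp_key_uri` and the `ts` parameter are hypothetical conveniences for testing, and the secret passed in is a placeholder, not a real value.

```python
import hashlib
import time


def usp_key_uri(video_id, secret, ts=None):
    """Build the AES key URI the way the new code in the hunk above does.

    `ts` may be supplied for reproducibility; otherwise the current time is used.
    """
    ts = int(time.time()) if ts is None else ts
    payload = f'{10413792000 - ts}{ts}{secret}'
    # hashlib requires bytes; passing a str raises TypeError
    signature = hashlib.sha1(payload.encode()).digest().hex()
    return f'https://keys.dacast.com/uspaes/{video_id}.key?s={signature}&ts={ts}'
```

Passing the unencoded f-string to `hashlib.sha1`, as the old line did, raises `TypeError`.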
||||||
|
@ -261,6 +261,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
|||||||
'tags': [],
|
'tags': [],
|
||||||
'view_count': int,
|
'view_count': int,
|
||||||
'like_count': int,
|
'like_count': int,
|
||||||
|
'thumbnail': r're:https://\w+.dmcdn.net/v/WnEY61cmvMxt2Fi6d/x1080',
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
# https://geo.dailymotion.com/player/xf7zn.html?playlist=x7wdsj
|
# https://geo.dailymotion.com/player/xf7zn.html?playlist=x7wdsj
|
||||||
@ -288,6 +289,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
|||||||
'description': 'À bord du « véloto », l’alternative à la voiture pour la campagne',
|
'description': 'À bord du « véloto », l’alternative à la voiture pour la campagne',
|
||||||
'tags': ['biclou', 'vélo', 'véloto', 'campagne', 'voiture', 'environnement', 'véhicules intermédiaires'],
|
'tags': ['biclou', 'vélo', 'véloto', 'campagne', 'voiture', 'environnement', 'véhicules intermédiaires'],
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
# https://geo.dailymotion.com/player/xry80.html?video=x8vu47w
|
||||||
|
'url': 'https://www.metatube.com/en/videos/546765/This-frogs-decorates-Christmas-tree/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'x8vu47w',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'like_count': int,
|
||||||
|
'uploader': 'Metatube',
|
||||||
|
'thumbnail': r're:https://\w+.dmcdn.net/v/W1G_S1coGSFTfkTeR/x1080',
|
||||||
|
'upload_date': '20240326',
|
||||||
|
'view_count': int,
|
||||||
|
'timestamp': 1711496732,
|
||||||
|
'age_limit': 0,
|
||||||
|
'uploader_id': 'x2xpy74',
|
||||||
|
'title': 'Está lindas ranitas ponen su arbolito',
|
||||||
|
'duration': 28,
|
||||||
|
'description': 'Que lindura',
|
||||||
|
'tags': [],
|
||||||
|
},
|
||||||
}]
|
}]
|
||||||
_GEO_BYPASS = False
|
_GEO_BYPASS = False
|
||||||
_COMMON_MEDIA_FIELDS = '''description
|
_COMMON_MEDIA_FIELDS = '''description
|
||||||
@ -302,7 +322,7 @@ def _extract_embed_urls(cls, url, webpage):
|
|||||||
yield from super()._extract_embed_urls(url, webpage)
|
yield from super()._extract_embed_urls(url, webpage)
|
||||||
for mobj in re.finditer(
|
for mobj in re.finditer(
|
||||||
r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
|
r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
|
||||||
yield from 'https://www.dailymotion.com/embed/video/' + mobj.group('id')
|
yield 'https://www.dailymotion.com/embed/video/' + mobj.group('id')
|
||||||
for mobj in re.finditer(
|
for mobj in re.finditer(
|
||||||
r'(?s)<script [^>]*\bsrc=(["\'])(?:https?:)?//[\w-]+\.dailymotion\.com/player/(?:(?!\1).)+\1[^>]*>', webpage):
|
r'(?s)<script [^>]*\bsrc=(["\'])(?:https?:)?//[\w-]+\.dailymotion\.com/player/(?:(?!\1).)+\1[^>]*>', webpage):
|
||||||
attrs = extract_attributes(mobj.group(0))
|
attrs = extract_attributes(mobj.group(0))
|
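Reviewer note: the one-word change from `yield from` to `yield` matters because `yield from` iterates over its operand, and iterating over a string produces single characters. The sketch below contrasts the two behaviours; both generator names are hypothetical.

```python
def embed_urls_buggy(video_ids):
    for video_id in video_ids:
        # `yield from` iterates the str, so each character is emitted separately
        yield from 'https://www.dailymotion.com/embed/video/' + video_id


def embed_urls_fixed(video_ids):
    for video_id in video_ids:
        # `yield` emits the whole URL as one item
        yield 'https://www.dailymotion.com/embed/video/' + video_id
```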
||||||
|
@ -48,32 +48,30 @@ def _real_extract(self, url):
|
|||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(url, video_id)
|
||||||
fn = urllib.parse.unquote(url_basename(url))
|
fn = urllib.parse.unquote(url_basename(url))
|
||||||
title = os.path.splitext(fn)[0]
|
title = os.path.splitext(fn)[0]
|
||||||
password = self.get_param('videopassword')
|
content_id = None
|
||||||
|
|
||||||
for part in self._yield_decoded_parts(webpage):
|
for part in self._yield_decoded_parts(webpage):
|
||||||
if '/sm/password' in part:
|
if '/sm/password' in part:
|
||||||
webpage = self._download_webpage(
|
content_id = self._search_regex(r'content_id=([\w.+=/-]+)', part, 'content ID')
|
||||||
update_url('https://www.dropbox.com/sm/password', query=part.partition('?')[2]), video_id)
|
|
||||||
break
|
break
|
||||||
|
|
||||||
if (self._og_search_title(webpage, default=None) == 'Dropbox - Password Required'
|
if content_id:
|
||||||
or 'Enter the password for this link' in webpage):
|
password = self.get_param('videopassword')
|
||||||
if password:
|
if not password:
|
||||||
response = self._download_json(
|
|
||||||
'https://www.dropbox.com/sm/auth', video_id, 'POSTing video password',
|
|
||||||
headers={'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'},
|
|
||||||
data=urlencode_postdata({
|
|
||||||
'is_xhr': 'true',
|
|
||||||
't': self._get_cookies('https://www.dropbox.com')['t'].value,
|
|
||||||
'content_id': self._search_regex(r'content_id=([\w.+=/-]+)["\']', webpage, 'content id'),
|
|
||||||
'password': password,
|
|
||||||
'url': url,
|
|
||||||
}))
|
|
||||||
|
|
||||||
if response.get('status') != 'authed':
|
|
||||||
raise ExtractorError('Invalid password', expected=True)
|
|
||||||
elif not self._get_cookies('https://dropbox.com').get('sm_auth'):
|
|
||||||
raise ExtractorError('Password protected video, use --video-password <password>', expected=True)
|
raise ExtractorError('Password protected video, use --video-password <password>', expected=True)
|
||||||
|
|
||||||
|
response = self._download_json(
|
||||||
|
'https://www.dropbox.com/sm/auth', video_id, 'POSTing video password',
|
||||||
|
data=urlencode_postdata({
|
||||||
|
'is_xhr': 'true',
|
||||||
|
't': self._get_cookies('https://www.dropbox.com')['t'].value,
|
||||||
|
'content_id': content_id,
|
||||||
|
'password': password,
|
||||||
|
'url': update_url(url, scheme='', netloc=''),
|
||||||
|
}))
|
||||||
|
if response.get('status') != 'authed':
|
||||||
|
raise ExtractorError('Invalid password', expected=True)
|
||||||
|
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
|
||||||
formats, subtitles = [], {}
|
formats, subtitles = [], {}
|
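Reviewer note: the new Dropbox code posts `update_url(url, scheme='', netloc='')` instead of the full URL, which suggests the endpoint expects only the path and query. The sketch below reproduces that transformation with the standard library; it assumes that blanking the scheme and host is all the helper does in this call, and `strip_origin` is a hypothetical name.

```python
from urllib.parse import urlparse, urlunparse


def strip_origin(url):
    """Drop the scheme and host from a URL, keeping path, query and fragment."""
    return urlunparse(urlparse(url)._replace(scheme='', netloc=''))
```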
||||||
|
@ -135,7 +135,7 @@ def _real_extract(self, url):
|
|||||||
self.raise_login_required(method='any')
|
self.raise_login_required(method='any')
|
||||||
raise ExtractorError(login_err, expected=True)
|
raise ExtractorError(login_err, expected=True)
|
||||||
|
|
||||||
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
|
embed_url = self._html_search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
|
||||||
thumbnail = self._og_search_thumbnail(webpage)
|
thumbnail = self._og_search_thumbnail(webpage)
|
||||||
watch_info = get_element_by_id('watch-info', webpage) or ''
|
watch_info = get_element_by_id('watch-info', webpage) or ''
|
||||||
|
|
||||||
|
51
yt_dlp/extractor/drtalks.py
Normal file
@ -0,0 +1,51 @@
|
|||||||
|
from .brightcove import BrightcoveNewIE
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import url_or_none
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
|
class DrTalksIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?drtalks\.com/videos/(?P<id>[\w-]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://drtalks.com/videos/six-pillars-of-resilience-tools-for-managing-stress-and-flourishing/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '6366193757112',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'uploader_id': '6314452011001',
|
||||||
|
'tags': ['resilience'],
|
||||||
|
'description': 'md5:9c6805aee237ee6de8052461855b9dda',
|
||||||
|
'timestamp': 1734546659,
|
||||||
|
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/12/Episode-82-Eva-Selhub-DrTalks-Thumbs.jpg',
|
||||||
|
'title': 'Six Pillars of Resilience: Tools for Managing Stress and Flourishing',
|
||||||
|
'duration': 2800.682,
|
||||||
|
'upload_date': '20241218',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://drtalks.com/videos/the-pcos-puzzle-mastering-metabolic-health-with-marcelle-pick/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '6364699891112',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'The PCOS Puzzle: Mastering Metabolic Health with Marcelle Pick',
|
||||||
|
'description': 'md5:e87cbe00ca50135d5702787fc4043aaa',
|
||||||
|
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/11/Episode-34-Marcelle-Pick-OBGYN-NP-DrTalks.jpg',
|
||||||
|
'duration': 3515.2,
|
||||||
|
'tags': ['pcos'],
|
||||||
|
'upload_date': '20241114',
|
||||||
|
'timestamp': 1731592119,
|
||||||
|
'uploader_id': '6314452011001',
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
next_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['data']['video']
|
||||||
|
|
||||||
|
return self.url_result(
|
||||||
|
next_data['videos']['brightcoveVideoLink'], BrightcoveNewIE, video_id,
|
||||||
|
url_transparent=True,
|
||||||
|
**traverse_obj(next_data, {
|
||||||
|
'title': ('title', {str}),
|
||||||
|
'description': ('videos', 'summury', {str}),
|
||||||
|
'thumbnail': ('featuredImage', 'node', 'sourceUrl', {url_or_none}),
|
||||||
|
}))
|
@ -5,15 +5,16 @@
|
|||||||
get_element_text_and_html_by_tag,
|
get_element_text_and_html_by_tag,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
join_nonempty,
|
join_nonempty,
|
||||||
|
parse_qs,
|
||||||
str_or_none,
|
str_or_none,
|
||||||
try_call,
|
try_call,
|
||||||
unified_timestamp,
|
unified_timestamp,
|
||||||
)
|
)
|
||||||
from ..utils.traversal import traverse_obj
|
from ..utils.traversal import traverse_obj, value
|
||||||
|
|
||||||
|
|
||||||
class DuoplayIE(InfoExtractor):
|
class DuoplayIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)/[\w-]+/?(?:\?(?:[^#]+&)?ep=(?P<ep>\d+))?'
|
_VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)(?:[/?#]|$)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'note': 'Siberi võmm S02E12',
|
'note': 'Siberi võmm S02E12',
|
||||||
'url': 'https://duoplay.ee/4312/siberi-vomm?ep=24',
|
'url': 'https://duoplay.ee/4312/siberi-vomm?ep=24',
|
||||||
@ -34,15 +35,16 @@ class DuoplayIE(InfoExtractor):
|
|||||||
'episode_number': 12,
|
'episode_number': 12,
|
||||||
'episode_id': '24',
|
'episode_id': '24',
|
||||||
},
|
},
|
||||||
|
'skip': 'No video found',
|
||||||
}, {
|
}, {
|
||||||
'note': 'Empty title',
|
'note': 'Empty title',
|
||||||
'url': 'https://duoplay.ee/17/uhikarotid?ep=14',
|
'url': 'https://duoplay.ee/17/uhikarotid?ep=14',
|
||||||
'md5': '6aca68be71112314738dd17cced7f8bf',
|
'md5': 'cba9f5dabf2582b224d80ac44fb80e47',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '17_14',
|
'id': '17_14',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Ühikarotid',
|
'title': 'Episode 14',
|
||||||
'thumbnail': r're:https://.+\.jpg(?:\?c=\d+)?$',
|
'thumbnail': r're:https?://.+\.jpg',
|
||||||
'description': 'md5:4719b418e058c209def41d48b601276e',
|
'description': 'md5:4719b418e058c209def41d48b601276e',
|
||||||
'upload_date': '20100916',
|
'upload_date': '20100916',
|
||||||
'timestamp': 1284661800,
|
'timestamp': 1284661800,
|
||||||
@ -52,6 +54,8 @@ class DuoplayIE(InfoExtractor):
|
|||||||
'season_number': 2,
|
'season_number': 2,
|
||||||
'episode_id': '14',
|
'episode_id': '14',
|
||||||
'release_year': 2010,
|
'release_year': 2010,
|
||||||
|
'episode': 'Episode 14',
|
||||||
|
'episode_number': 14,
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'note': 'Movie without expiry',
|
'note': 'Movie without expiry',
|
||||||
@ -68,10 +72,32 @@ class DuoplayIE(InfoExtractor):
|
|||||||
'timestamp': 1671054000,
|
'timestamp': 1671054000,
|
||||||
'release_year': 2018,
|
'release_year': 2018,
|
||||||
},
|
},
|
||||||
|
'skip': 'No video found',
|
||||||
|
}, {
|
||||||
|
'note': 'Episode url without show name',
|
||||||
|
'url': 'https://duoplay.ee/9644?ep=185',
|
||||||
|
'md5': '63f324b4fe2dbd8194dca16a6d52184a',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '9644_185',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Episode 185',
|
||||||
|
'thumbnail': r're:https?://.+\.jpg',
|
||||||
|
'description': 'md5:ed25ba4e9e5d54bc291a4a0cdd241467',
|
||||||
|
'upload_date': '20241120',
|
||||||
|
'timestamp': 1732077000,
|
||||||
|
'episode': 'Episode 63',
|
||||||
|
'episode_id': '185',
|
||||||
|
'episode_number': 63,
|
||||||
|
'season': 'Season 2',
|
||||||
|
'season_number': 2,
|
||||||
|
'series': 'Telehommik',
|
||||||
|
'series_id': '9644',
|
||||||
|
},
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
telecast_id, episode = self._match_valid_url(url).group('id', 'ep')
|
telecast_id = self._match_id(url)
|
||||||
|
episode = traverse_obj(parse_qs(url), ('ep', 0, {int_or_none}, {str_or_none}))
|
||||||
video_id = join_nonempty(telecast_id, episode, delim='_')
|
video_id = join_nonempty(telecast_id, episode, delim='_')
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(url, video_id)
|
||||||
video_player = try_call(lambda: extract_attributes(
|
video_player = try_call(lambda: extract_attributes(
|
||||||
@ -79,25 +105,33 @@ def _real_extract(self, url):
|
|||||||
if not video_player or not video_player.get('manifest-url'):
|
if not video_player or not video_player.get('manifest-url'):
|
||||||
raise ExtractorError('No video found', expected=True)
|
raise ExtractorError('No video found', expected=True)
|
||||||
|
|
||||||
|
manifest_url = video_player['manifest-url']
|
||||||
|
session_token = self._download_json(
|
||||||
|
'https://sts.postimees.ee/session/register', video_id, 'Registering session',
|
||||||
|
'Unable to register session', headers={
|
||||||
|
'Accept': 'application/json',
|
||||||
|
'X-Original-URI': manifest_url,
|
||||||
|
})['session']
|
||||||
|
|
||||||
episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
|
episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
|
'formats': self._extract_m3u8_formats(manifest_url, video_id, 'mp4', query={'s': session_token}),
|
||||||
**traverse_obj(episode_attr, {
|
**traverse_obj(episode_attr, {
|
||||||
'title': 'title',
|
'title': ('title', {str}),
|
||||||
'description': 'synopsis',
|
'description': ('synopsis', {str}),
|
||||||
'thumbnail': ('images', 'original'),
|
'thumbnail': ('images', 'original'),
|
||||||
'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
|
'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
|
||||||
'cast': ('cast', {lambda x: x.split(', ')}),
|
'cast': ('cast', filter, {lambda x: x.split(', ')}),
|
||||||
'release_year': ('year', {int_or_none}),
|
'release_year': ('year', {int_or_none}),
|
||||||
}),
|
}),
|
||||||
**(traverse_obj(episode_attr, {
|
**(traverse_obj(episode_attr, {
|
||||||
'title': (None, ('subtitle', ('episode_nr', {lambda x: f'Episode {x}' if x else None}))),
|
'title': (None, (('subtitle', {str}, filter), {value(f'Episode {episode}' if episode else None)})),
|
||||||
'series': 'title',
|
'series': ('title', {str}),
|
||||||
'series_id': ('telecast_id', {str_or_none}),
|
'series_id': ('telecast_id', {str_or_none}),
|
||||||
'season_number': ('season_id', {int_or_none}),
|
'season_number': ('season_id', {int_or_none}),
|
||||||
'episode': 'subtitle',
|
'episode': ('subtitle', {str}, filter),
|
||||||
'episode_number': ('episode_nr', {int_or_none}),
|
'episode_number': ('episode_nr', {int_or_none}),
|
||||||
'episode_id': ('episode_id', {str_or_none}),
|
'episode_id': ('episode_id', {str_or_none}),
|
||||||
}, get_all=False) if episode_attr.get('category') != 'movies' else {}),
|
}, get_all=False) if episode_attr.get('category') != 'movies' else {}),
|
||||||
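The Duoplay change above stops capturing the episode in `_VALID_URL` and reads `ep` from the query string instead, which is what lets URLs without a show name (such as `https://duoplay.ee/9644?ep=185`) match. A minimal standard-library sketch of that idea follows. The helper names `episode_from_url` and `make_video_id` are hypothetical, and `urllib.parse` stands in for yt-dlp's own `parse_qs`, `int_or_none`, `str_or_none` and `join_nonempty`, so this is an approximation rather than the extractor's code.

```python
from urllib.parse import parse_qs, urlparse


def episode_from_url(url):
    """Return the `ep` query value as a normalized digit string, or None."""
    values = parse_qs(urlparse(url).query).get('ep')
    if not values:
        return None
    try:
        # Round-trip through int, roughly like int_or_none followed by str_or_none
        return str(int(values[0]))
    except ValueError:
        return None


def make_video_id(telecast_id, episode):
    # Join only the non-empty parts with '_', as join_nonempty(..., delim='_') is used above
    return '_'.join(part for part in (telecast_id, episode) if part)
```

With this shape, a URL that lacks `ep` simply produces an id made of the telecast id alone.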
|
@ -162,7 +162,7 @@ def _real_extract(self, url):
|
|||||||
items = re.findall(r'(?s)playlist\.push\(({.+?})\);', webpage)
|
items = re.findall(r'(?s)playlist\.push\(({.+?})\);', webpage)
|
||||||
if items:
|
if items:
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
[self._parse_video_metadata(i, video_id, timestamp) for i in items],
|
(self._parse_video_metadata(i, video_id, timestamp) for i in items),
|
||||||
video_id, self._html_search_meta('twitter:title', webpage))
|
video_id, self._html_search_meta('twitter:title', webpage))
|
||||||
|
|
||||||
item = self._search_regex(
|
item = self._search_regex(
|
||||||
|
155
yt_dlp/extractor/eggs.py
Normal file
@ -0,0 +1,155 @@
|
|||||||
|
import secrets
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from .youtube import YoutubeIE
|
||||||
|
from ..utils import (
|
||||||
|
int_or_none,
|
||||||
|
parse_iso8601,
|
||||||
|
str_or_none,
|
||||||
|
url_or_none,
|
||||||
|
)
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
|
class EggsBaseIE(InfoExtractor):
|
||||||
|
_API_HEADERS = {
|
||||||
|
'Accept': '*/*',
|
||||||
|
'apVersion': '8.2.00',
|
||||||
|
'deviceName': 'Android',
|
||||||
|
}
|
||||||
|
|
||||||
|
def _real_initialize(self):
|
||||||
|
self._API_HEADERS['deviceId'] = secrets.token_hex(8)
|
||||||
|
|
||||||
|
def _call_api(self, endpoint, video_id):
|
||||||
|
return self._download_json(
|
||||||
|
f'https://app-front-api.eggs.mu/v1/{endpoint}', video_id,
|
||||||
|
headers=self._API_HEADERS)
|
||||||
|
|
||||||
|
def _extract_music_info(self, data):
|
||||||
|
if yt_url := traverse_obj(data, ('youtubeUrl', {url_or_none})):
|
||||||
|
return self.url_result(yt_url, ie=YoutubeIE)
|
||||||
|
|
||||||
|
artist_name = traverse_obj(data, ('artist', 'artistName', {str_or_none}))
|
||||||
|
music_id = traverse_obj(data, ('musicId', {str_or_none}))
|
||||||
|
webpage_url = None
|
||||||
|
if artist_name and music_id:
|
||||||
|
webpage_url = f'https://eggs.mu/artist/{artist_name}/song/{music_id}'
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': music_id,
|
||||||
|
'vcodec': 'none',
|
||||||
|
'webpage_url': webpage_url,
|
||||||
|
'extractor_key': EggsIE.ie_key(),
|
||||||
|
'extractor': EggsIE.IE_NAME,
|
||||||
|
**traverse_obj(data, {
|
||||||
|
'title': ('musicTitle', {str}),
|
||||||
|
'url': ('musicDataPath', {url_or_none}),
|
||||||
|
'uploader': ('artist', 'displayName', {str}),
|
||||||
|
'uploader_id': ('artist', 'artistId', {str_or_none}),
|
||||||
|
'thumbnail': ('imageDataPath', {url_or_none}),
|
||||||
|
'view_count': ('numberOfMusicPlays', {int_or_none}),
|
||||||
|
'like_count': ('numberOfLikes', {int_or_none}),
|
||||||
|
'comment_count': ('numberOfComments', {int_or_none}),
|
||||||
|
'composers': ('composer', {str}, all),
|
||||||
|
'tags': ('tags', ..., {str}),
|
||||||
|
'timestamp': ('releaseDate', {parse_iso8601}),
|
||||||
|
'artist': ('artist', 'displayName', {str}),
|
||||||
|
})}
|
||||||
|
|
||||||
|
|
||||||
|
class EggsIE(EggsBaseIE):
|
||||||
|
IE_NAME = 'eggs:single'
|
||||||
|
_VALID_URL = r'https?://eggs\.mu/artist/[^/?#]+/song/(?P<id>[\da-f-]+)'
|
||||||
|
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://eggs.mu/artist/32_sunny_girl/song/0e95fd1d-4d61-4d5b-8b18-6092c551da90',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '0e95fd1d-4d61-4d5b-8b18-6092c551da90',
|
||||||
|
'ext': 'm4a',
|
||||||
|
'title': 'シネマと信号',
|
||||||
|
'uploader': 'Sunny Girl',
|
||||||
|
'thumbnail': r're:https?://.*\.jpg(?:\?.*)?$',
|
||||||
|
'uploader_id': '1607',
|
||||||
|
'like_count': int,
|
||||||
|
'timestamp': 1731327327,
|
||||||
|
'composers': ['橘高連太郎'],
|
||||||
|
'view_count': int,
|
||||||
|
'comment_count': int,
|
||||||
|
'artists': ['Sunny Girl'],
|
||||||
|
'upload_date': '20241111',
|
||||||
|
'tags': ['SunnyGirl', 'シネマと信号'],
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://eggs.mu/artist/KAMO_3pband/song/1d4bc45f-1af6-47a9-8b30-a70cae350b4f',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '80cLKA2wnoA',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'KAMO「いい女だから」Audio',
|
||||||
|
'uploader': 'KAMO',
|
||||||
|
'live_status': 'not_live',
|
||||||
|
'channel_id': 'UCsHLBw2__5Q9y55skXPotOg',
|
||||||
|
'channel_follower_count': int,
|
||||||
|
'description': 'md5:d260da711ecbec3e720293dc11401b87',
|
||||||
|
'availability': 'public',
|
||||||
|
'uploader_id': '@KAMO_band',
|
||||||
|
'upload_date': '20240925',
|
||||||
|
'thumbnail': 'https://i.ytimg.com/vi/80cLKA2wnoA/maxresdefault.jpg',
|
||||||
|
'comment_count': int,
|
||||||
|
'channel_url': 'https://www.youtube.com/channel/UCsHLBw2__5Q9y55skXPotOg',
|
||||||
|
'view_count': int,
|
||||||
|
'duration': 151,
|
||||||
|
'like_count': int,
|
||||||
|
'channel': 'KAMO',
|
||||||
|
'playable_in_embed': True,
|
||||||
|
'uploader_url': 'https://www.youtube.com/@KAMO_band',
|
||||||
|
'tags': [],
|
||||||
|
'timestamp': 1727271121,
|
||||||
|
'age_limit': 0,
|
||||||
|
'categories': ['People & Blogs'],
|
||||||
|
},
|
||||||
|
'add_ie': ['Youtube'],
|
||||||
|
'params': {'skip_download': 'Youtube'},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
song_id = self._match_id(url)
|
||||||
|
json_data = self._call_api(f'musics/{song_id}', song_id)
|
||||||
|
return self._extract_music_info(json_data)
|
||||||
|
|
||||||
|
|
||||||
|
class EggsArtistIE(EggsBaseIE):
|
||||||
|
IE_NAME = 'eggs:artist'
|
||||||
|
_VALID_URL = r'https?://eggs\.mu/artist/(?P<id>\w+)/?(?:[?#&]|$)'
|
||||||
|
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://eggs.mu/artist/32_sunny_girl',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '32_sunny_girl',
|
||||||
|
'thumbnail': 'https://image-pro.eggs.mu/profile/1607.jpeg?updated_at=2024-04-03T20%3A06%3A00%2B09%3A00',
|
||||||
|
'description': 'Muddy Mine / 東京高田馬場CLUB PHASE / Gt.Vo 橘高 連太郎 / Ba.Cho 小野 ゆうき / Dr 大森 りゅうひこ',
|
||||||
|
'title': 'Sunny Girl',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 18,
|
||||||
|
}, {
|
||||||
|
'url': 'https://eggs.mu/artist/KAMO_3pband',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'KAMO_3pband',
|
||||||
|
'description': '川崎発3ピースバンド',
|
||||||
|
'thumbnail': 'https://image-pro.eggs.mu/profile/35217.jpeg?updated_at=2024-11-27T16%3A31%3A50%2B09%3A00',
|
||||||
|
'title': 'KAMO',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 2,
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
artist_id = self._match_id(url)
|
||||||
|
artist_data = self._call_api(f'artists/{artist_id}', artist_id)
|
||||||
|
song_data = self._call_api(f'artists/{artist_id}/musics', artist_id)
|
||||||
|
return self.playlist_result(
|
||||||
|
traverse_obj(song_data, ('data', ..., {dict}, {self._extract_music_info})),
|
||||||
|
playlist_id=artist_id, **traverse_obj(artist_data, {
|
||||||
|
'title': ('displayName', {str}),
|
||||||
|
'description': ('profile', {str}),
|
||||||
|
'thumbnail': ('imageDataPath', {url_or_none}),
|
||||||
|
}))
|
@ -50,7 +50,7 @@ class FacebookIE(InfoExtractor):
|
|||||||
[^/]+/videos/(?:[^/]+/)?|
|
[^/]+/videos/(?:[^/]+/)?|
|
||||||
[^/]+/posts/|
|
[^/]+/posts/|
|
||||||
events/(?:[^/]+/)?|
|
events/(?:[^/]+/)?|
|
||||||
groups/[^/]+/(?:permalink|posts)/|
|
groups/[^/]+/(?:permalink|posts)/(?:[\da-f]+/)?|
|
||||||
watchparty/
|
watchparty/
|
||||||
)|
|
)|
|
||||||
facebook:
|
facebook:
|
||||||
@ -410,6 +410,9 @@ class FacebookIE(InfoExtractor):
|
|||||||
'uploader': 'Comitato Liberi Pensatori',
|
'uploader': 'Comitato Liberi Pensatori',
|
||||||
'uploader_id': '100065709540881',
|
'uploader_id': '100065709540881',
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.facebook.com/groups/1513990329015294/posts/d41d8cd9/2013209885760000/?app=fbl',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
_SUPPORTED_PAGLETS_REGEX = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'
|
_SUPPORTED_PAGLETS_REGEX = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'
|
||||||
_api_config = {
|
_api_config = {
|
||||||
|
@ -193,9 +193,9 @@ def _real_extract(self, url):
|
|||||||
|
|
||||||
for lang, version, fmt in self._get_experiences(episode):
|
for lang, version, fmt in self._get_experiences(episode):
|
||||||
experience_id = str(fmt['experienceId'])
|
experience_id = str(fmt['experienceId'])
|
||||||
if (only_initial_experience and experience_id != initial_experience_id
|
if ((only_initial_experience and experience_id != initial_experience_id)
|
||||||
or requested_languages and lang.lower() not in requested_languages
|
or (requested_languages and lang.lower() not in requested_languages)
|
||||||
or requested_versions and version.lower() not in requested_versions):
|
or (requested_versions and version.lower() not in requested_versions)):
|
||||||
continue
|
continue
|
||||||
thumbnails.append({'url': fmt.get('poster')})
|
thumbnails.append({'url': fmt.get('poster')})
|
||||||
duration = max(duration, fmt.get('duration', 0))
|
duration = max(duration, fmt.get('duration', 0))
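The parentheses added to the condition in this hunk should not change behavior, since Python binds `and` more tightly than `or`; they only make the grouping explicit. The check below is a small sketch that brute-forces all truth assignments to confirm the two spellings agree. The function names and the six boolean parameters are hypothetical stand-ins for the sub-expressions in the diff.

```python
from itertools import product


def skip_plain(a, b, c, d, e, f):
    # Original spelling: relies on `and` binding tighter than `or`
    return a and b or c and d or e and f


def skip_parenthesized(a, b, c, d, e, f):
    # New spelling: same grouping, written out explicitly
    return (a and b) or (c and d) or (e and f)


# Collect every truth assignment on which the two spellings disagree
mismatches = [
    flags for flags in product([False, True], repeat=6)
    if skip_plain(*flags) != skip_parenthesized(*flags)
]
```

An empty `mismatches` list means the rewrite is purely cosmetic for boolean inputs.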
|
||||||
|
@ -254,7 +254,7 @@ def _real_extract(self, url):
|
|||||||
|
|
||||||
|
|
||||||
class InstagramIE(InstagramBaseIE):
|
class InstagramIE(InstagramBaseIE):
|
||||||
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com(?:/[^/]+)?/(?:p|tv|reels?(?!/audio/))/(?P<id>[^/?#&]+))'
|
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com(?:/(?!share/)[^/?#]+)?/(?:p|tv|reels?(?!/audio/))/(?P<id>[^/?#&]+))'
|
||||||
_EMBED_REGEX = [r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1']
|
_EMBED_REGEX = [r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1']
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
|
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
|
||||||
|
@ -310,7 +310,13 @@ def _real_extract(self, url):
|
|||||||
if stream_type in self._SUPPORTED_STREAM_TYPES:
|
if stream_type in self._SUPPORTED_STREAM_TYPES:
|
||||||
claim_id, is_live = result['claim_id'], False
|
claim_id, is_live = result['claim_id'], False
|
||||||
streaming_url = self._call_api_proxy(
|
streaming_url = self._call_api_proxy(
|
||||||
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
|
'get', claim_id, {
|
||||||
|
'uri': uri,
|
||||||
|
**traverse_obj(parse_qs(url), {
|
||||||
|
'signature': ('signature', 0),
|
||||||
|
'signature_ts': ('signature_ts', 0),
|
||||||
|
}),
|
||||||
|
}, 'streaming url')['streaming_url']
|
||||||
|
|
||||||
# GET request to v3 API returns original video/audio file if available
|
# GET request to v3 API returns original video/audio file if available
|
||||||
direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url)
|
direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url)
|
||||||
|
@ -26,6 +26,7 @@ class MicrosoftEmbedIE(InfoExtractor):
|
|||||||
'timestamp': 1631658316,
|
'timestamp': 1631658316,
|
||||||
'upload_date': '20210914',
|
'upload_date': '20210914',
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Failed to parse XML: syntax error: line 1, column 0'],
|
||||||
}]
|
}]
|
||||||
_API_URL = 'https://prod-video-cms-rt-microsoft-com.akamaized.net/vhs/api/videos/'
|
_API_URL = 'https://prod-video-cms-rt-microsoft-com.akamaized.net/vhs/api/videos/'
|
||||||
|
|
||||||
@ -36,11 +37,11 @@ def _real_extract(self, url):
|
|||||||
formats = []
|
formats = []
|
||||||
for source_type, source in metadata['streams'].items():
|
for source_type, source in metadata['streams'].items():
|
||||||
if source_type == 'smooth_Streaming':
|
if source_type == 'smooth_Streaming':
|
||||||
formats.extend(self._extract_ism_formats(source['url'], video_id, 'mss'))
|
formats.extend(self._extract_ism_formats(source['url'], video_id, 'mss', fatal=False))
|
||||||
elif source_type == 'apple_HTTP_Live_Streaming':
|
elif source_type == 'apple_HTTP_Live_Streaming':
|
||||||
formats.extend(self._extract_m3u8_formats(source['url'], video_id, 'mp4'))
|
formats.extend(self._extract_m3u8_formats(source['url'], video_id, 'mp4', fatal=False))
|
||||||
elif source_type == 'mPEG_DASH':
|
elif source_type == 'mPEG_DASH':
|
||||||
formats.extend(self._extract_mpd_formats(source['url'], video_id))
|
formats.extend(self._extract_mpd_formats(source['url'], video_id, fatal=False))
|
||||||
else:
|
else:
|
||||||
formats.append({
|
formats.append({
|
||||||
'format_id': source_type,
|
'format_id': source_type,
|
||||||
|
@ -80,9 +80,9 @@ class MiTeleIE(TelecincoBaseIE):
|
|||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
display_id = self._match_id(url)
|
display_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, display_id)
|
webpage = self._download_webpage(url, display_id)
|
||||||
pre_player = self._parse_json(self._search_regex(
|
pre_player = self._search_json(
|
||||||
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
|
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
|
||||||
webpage, 'Pre Player'), display_id)['prePlayer']
|
webpage, 'Pre Player', display_id)['prePlayer']
|
||||||
title = pre_player['title']
|
title = pre_player['title']
|
||||||
video_info = self._parse_content(pre_player['video'], url)
|
video_info = self._parse_content(pre_player['video'], url)
|
||||||
content = pre_player.get('content') or {}
|
content = pre_player.get('content') or {}
|
||||||
|
117
yt_dlp/extractor/nest.py
Normal file
@ -0,0 +1,117 @@
|
|||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import ExtractorError, float_or_none, update_url_query, url_or_none
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
|
class NestIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?live/(?P<id>\w+)'
|
||||||
|
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://video.nest.com/embedded/live/4fvYdSo8AX?autoplay=0',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '4fvYdSo8AX',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'startswith:Outside ',
|
||||||
|
'alt_title': 'Outside',
|
||||||
|
'description': '<null>',
|
||||||
|
'location': 'Los Angeles',
|
||||||
|
'availability': 'public',
|
||||||
|
'thumbnail': r're:https?://',
|
||||||
|
'live_status': 'is_live',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
# m3u8 download
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.nest.com/live/4fvYdSo8AX',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
_WEBPAGE_TESTS = [{
|
||||||
|
'url': 'https://www.pacificblue.biz/noyo-harbor-webcam/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '4fvYdSo8AX',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'startswith:Outside ',
|
||||||
|
'alt_title': 'Outside',
|
||||||
|
'description': '<null>',
|
||||||
|
'location': 'Los Angeles',
|
||||||
|
'availability': 'public',
|
||||||
|
'thumbnail': r're:https?://',
|
||||||
|
'live_status': 'is_live',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
# m3u8 download
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
item = self._download_json(
|
||||||
|
'https://video.nest.com/api/dropcam/cameras.get_by_public_token',
|
||||||
|
video_id, query={'token': video_id})['items'][0]
|
||||||
|
uuid = item.get('uuid')
|
||||||
|
stream_domain = item.get('live_stream_host')
|
||||||
|
if not stream_domain or not uuid:
|
||||||
|
raise ExtractorError('Unable to construct playlist URL')
|
||||||
|
|
||||||
|
thumb_domain = item.get('nexus_api_nest_domain_host')
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
**traverse_obj(item, {
|
||||||
|
'description': ('description', {str}),
|
||||||
|
'title': (('title', 'name', 'where'), {str}, filter, any),
|
||||||
|
'alt_title': ('name', {str}),
|
||||||
|
'location': ((('timezone', {lambda x: x.split('/')[1].replace('_', ' ')}), 'where'), {str}, filter, any),
|
||||||
|
}),
|
||||||
|
'thumbnail': update_url_query(
|
||||||
|
f'https://{thumb_domain}/get_image',
|
||||||
|
{'uuid': uuid, 'public': video_id}) if thumb_domain else None,
|
||||||
|
'availability': self._availability(is_private=item.get('is_public') is False),
|
||||||
|
'formats': self._extract_m3u8_formats(
|
||||||
|
f'https://{stream_domain}/nexus_aac/{uuid}/playlist.m3u8',
|
||||||
|
video_id, 'mp4', live=True, query={'public': video_id}),
|
||||||
|
'is_live': True,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class NestClipIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?clip/(?P<id>\w+)'
|
||||||
|
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://video.nest.com/clip/f34c9dd237a44eca9a0001af685e3dff',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'f34c9dd237a44eca9a0001af685e3dff',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'NestClip video #f34c9dd237a44eca9a0001af685e3dff',
|
||||||
|
'thumbnail': 'https://clips.dropcam.com/f34c9dd237a44eca9a0001af685e3dff.jpg',
|
||||||
|
'timestamp': 1735413474.468,
|
||||||
|
'upload_date': '20241228',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.nest.com/embedded/clip/34e0432adc3c46a98529443d8ad5aa76',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '34e0432adc3c46a98529443d8ad5aa76',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Shootout at Veterans Boulevard at Fleur De Lis Drive',
|
||||||
|
'thumbnail': 'https://clips.dropcam.com/34e0432adc3c46a98529443d8ad5aa76.jpg',
|
||||||
|
'upload_date': '20230817',
|
||||||
|
'timestamp': 1692262897.191,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
data = self._download_json(
|
||||||
|
'https://video.nest.com/api/dropcam/videos.get_by_filename', video_id,
|
||||||
|
query={'filename': f'{video_id}.mp4'})
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
**traverse_obj(data, ('items', 0, {
|
||||||
|
'title': ('title', {str}),
|
||||||
|
'thumbnail': ('thumbnail_url', {url_or_none}),
|
||||||
|
'url': ('download_url', {url_or_none}),
|
||||||
|
'timestamp': ('start_time', {float_or_none}),
|
||||||
|
})),
|
||||||
|
}
|
@ -12,6 +12,7 @@
|
|||||||
parse_iso8601,
|
parse_iso8601,
|
||||||
str_or_none,
|
str_or_none,
|
||||||
try_get,
|
try_get,
|
||||||
|
update_url_query,
|
||||||
url_or_none,
|
url_or_none,
|
||||||
urljoin,
|
urljoin,
|
||||||
)
|
)
|
||||||
@ -171,6 +172,8 @@ def call_playback_api(item, query=None):
|
|||||||
format_url = url_or_none(asset.get('url'))
|
format_url = url_or_none(asset.get('url'))
|
||||||
if not format_url:
|
if not format_url:
|
||||||
continue
|
continue
|
||||||
|
# Remove the 'adap' query parameter
|
||||||
|
format_url = update_url_query(format_url, {'adap': []})
|
||||||
asset_format = (asset.get('format') or '').lower()
|
asset_format = (asset.get('format') or '').lower()
|
||||||
if asset_format == 'hls' or determine_ext(format_url) == 'm3u8':
|
if asset_format == 'hls' or determine_ext(format_url) == 'm3u8':
|
||||||
formats.extend(self._extract_nrk_formats(format_url, video_id))
|
formats.extend(self._extract_nrk_formats(format_url, video_id))
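The hunk above strips the `adap` parameter by passing an empty list to `update_url_query`. A plausible reading, stated here as an assumption rather than taken from the diff, is that the query is re-encoded in sequence mode, where a key with an empty list contributes nothing to the output. The sketch below reproduces that effect with the standard library only; `drop_query_param` is a hypothetical helper, not part of yt-dlp.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def drop_query_param(url, name):
    """Return `url` with every occurrence of the query parameter `name` removed."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    # An empty list encodes to nothing when doseq=True, which removes the key
    query[name] = []
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

Other parameters keep their values and relative order, because `parse_qs` returns an insertion-ordered dict.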
|
||||||
|
@ -343,7 +343,7 @@ def _real_extract(self, url):
|
|||||||
if media_ids:
|
if media_ids:
|
||||||
media_ids.append(lead_video_id)
|
media_ids.append(lead_video_id)
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
[self._extract_video(media_id) for media_id in media_ids], page_id, title, description)
|
map(self._extract_video, media_ids), page_id, title, description)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
**self._extract_video(lead_video_id),
|
**self._extract_video(lead_video_id),
|
||||||
|
@ -457,7 +457,7 @@ class PatreonCampaignIE(PatreonBaseIE):
|
|||||||
_VALID_URL = r'''(?x)
|
_VALID_URL = r'''(?x)
|
||||||
https?://(?:www\.)?patreon\.com/(?:
|
https?://(?:www\.)?patreon\.com/(?:
|
||||||
(?:m|api/campaigns)/(?P<campaign_id>\d+)|
|
(?:m|api/campaigns)/(?P<campaign_id>\d+)|
|
||||||
(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
|
(?:c/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
|
||||||
)(?:/posts)?/?(?:$|[?#])'''
|
)(?:/posts)?/?(?:$|[?#])'''
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.patreon.com/dissonancepod/',
|
'url': 'https://www.patreon.com/dissonancepod/',
|
||||||
@ -509,6 +509,26 @@ class PatreonCampaignIE(PatreonBaseIE):
|
|||||||
'thumbnail': r're:^https?://.*$',
|
'thumbnail': r're:^https?://.*$',
|
||||||
},
|
},
|
||||||
'playlist_mincount': 201,
|
'playlist_mincount': 201,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.patreon.com/c/OgSog',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '8504388',
|
||||||
|
'title': 'OGSoG',
|
||||||
|
'description': r're:(?s)Hello and welcome to our Patreon page. We are Mari, Lasercorn, .+',
|
||||||
|
'channel': 'OGSoG',
|
||||||
|
'channel_id': '8504388',
|
||||||
|
'channel_url': 'https://www.patreon.com/OgSog',
|
||||||
|
'uploader_url': 'https://www.patreon.com/OgSog',
|
||||||
|
'uploader_id': '72323575',
|
||||||
|
'uploader': 'David Moss',
|
||||||
|
'thumbnail': r're:https?://.+/.+',
|
||||||
|
'channel_follower_count': int,
|
||||||
|
'age_limit': 0,
|
||||||
|
},
|
||||||
|
'playlist_mincount': 331,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.patreon.com/c/OgSog/posts',
|
||||||
|
'only_matching': True,
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://www.patreon.com/dissonancepod/posts',
|
'url': 'https://www.patreon.com/dissonancepod/posts',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
99
yt_dlp/extractor/piramidetv.py
Normal file
@ -0,0 +1,99 @@
|
|||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import parse_iso8601, smuggle_url, unsmuggle_url, url_or_none
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
|
class PiramideTVIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://piramide\.tv/video/(?P<id>[\w-]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://piramide.tv/video/wWtBAORdJUTh',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'wWtBAORdJUTh',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'md5:79f9c8183ea6a35c836923142cf0abcc',
|
||||||
|
'description': '',
|
||||||
|
'thumbnail': 'https://cdn.jwplayer.com/v2/media/W86PgQDn/thumbnails/B9gpIxkH.jpg',
|
||||||
|
'channel': 'León Picarón',
|
||||||
|
'channel_id': 'leonpicaron',
|
||||||
|
'timestamp': 1696460362,
|
||||||
|
'upload_date': '20231004',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://piramide.tv/video/wcYn6li79NgN',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'wcYn6li79NgN',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'ACEPTO TENER UN BEBE CON MI NOVIA\u2026? | Parte 1',
|
||||||
|
'description': '',
|
||||||
|
'channel': 'ARTA GAME',
|
||||||
|
'channel_id': 'arta_game',
|
||||||
|
'thumbnail': 'https://cdn.jwplayer.com/v2/media/cnEdGp5X/thumbnails/rHAaWfP7.jpg',
|
||||||
|
'timestamp': 1703434976,
|
||||||
|
'upload_date': '20231224',
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _extract_video(self, video_id):
|
||||||
|
video_data = self._download_json(
|
||||||
|
f'https://hermes.piramide.tv/video/data/{video_id}', video_id, fatal=False)
|
||||||
|
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
|
||||||
|
f'https://cdn.piramide.tv/video/{video_id}/manifest.m3u8', video_id, fatal=False)
|
||||||
|
next_video = traverse_obj(video_data, ('video', 'next_video', 'id', {str}))
|
||||||
|
return next_video, {
|
||||||
|
'id': video_id,
|
||||||
|
'formats': formats,
|
||||||
|
'subtitles': subtitles,
|
||||||
|
**traverse_obj(video_data, ('video', {
|
||||||
|
'id': ('id', {str}),
|
||||||
|
+                'title': ('title', {str}),
+                'description': ('description', {str}),
+                'thumbnail': ('media', 'thumbnail', {url_or_none}),
+                'channel': ('channel', 'name', {str}),
+                'channel_id': ('channel', 'id', {str}),
+                'timestamp': ('date', {parse_iso8601}),
+            })),
+        }
+
+    def _entries(self, video_id):
+        visited = set()
+        while True:
+            visited.add(video_id)
+            next_video, info = self._extract_video(video_id)
+            yield info
+            if not next_video or next_video in visited:
+                break
+            video_id = next_video
+
+    def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        video_id = self._match_id(url)
+        if self._yes_playlist(video_id, video_id, smuggled_data):
+            return self.playlist_result(self._entries(video_id), video_id)
+        return self._extract_video(video_id)[1]
+
+
+class PiramideTVChannelIE(InfoExtractor):
+    _VALID_URL = r'https?://piramide\.tv/channel/(?P<id>[\w-]+)'
+    _TESTS = [{
+        'url': 'https://piramide.tv/channel/thekalo',
+        'playlist_mincount': 10,
+        'info_dict': {
+            'id': 'thekalo',
+        },
+    }]
+
+    def _entries(self, channel_name):
+        videos = self._download_json(
+            f'https://hermes.piramide.tv/channel/list/{channel_name}/date/100000', channel_name)
+        for video in traverse_obj(videos, ('videos', lambda _, v: v['id'])):
+            yield self.url_result(smuggle_url(
+                f'https://piramide.tv/video/{video["id"]}', {'force_noplaylist': True}),
+                **traverse_obj(video, {
+                    'id': ('id', {str}),
+                    'title': ('title', {str}),
+                    'description': ('description', {str}),
+                }))
+
+    def _real_extract(self, url):
+        channel_name = self._match_id(url)
+        return self.playlist_result(self._entries(channel_name), channel_name)
@@ -1,4 +1,5 @@
 from .common import InfoExtractor
+from ..networking.exceptions import HTTPError
 from ..utils import (
     ExtractorError,
     traverse_obj,
@@ -110,8 +111,8 @@ def _real_extract(self, url):
         if not traverse_obj(data, 'is_broadcasting'):
             try:
                 self._call_api(user_id, 'users/current.json', url, 'Investigating reason for request failure')
-            except ExtractorError as ex:
-                if ex.cause and ex.cause.code == 401:
+            except ExtractorError as e:
+                if isinstance(e.cause, HTTPError) and e.cause.status == 401:
                     self.raise_login_required(f'Please log in, or use direct link like https://sketch.pixiv.net/@{user_id}/1234567890', method='cookies')
             raise ExtractorError('This user is offline', expected=True)
 
130
yt_dlp/extractor/plvideo.py
Normal file
@@ -0,0 +1,130 @@
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    int_or_none,
+    parse_iso8601,
+    parse_resolution,
+    url_or_none,
+)
+from ..utils.traversal import traverse_obj
+
+
+class PlVideoIE(InfoExtractor):
+    IE_DESC = 'Платформа'
+    _VALID_URL = r'https?://(?:www\.)?plvideo\.ru/(?:watch\?(?:[^#]+&)?v=|shorts/)(?P<id>[\w-]+)'
+    _TESTS = [{
+        'url': 'https://plvideo.ru/watch?v=Y5JzUzkcQTMK',
+        'md5': 'fe8e18aca892b3b31f3bf492169f8a26',
+        'info_dict': {
+            'id': 'Y5JzUzkcQTMK',
+            'ext': 'mp4',
+            'thumbnail': 'https://img.plvideo.ru/images/fp-2024-images/v/cover/37/dd/37dd00a4c96c77436ab737e85947abd7/original663a4a3bb713e5.33151959.jpg',
+            'title': 'Presidente de Cuba llega a Moscú en una visita de trabajo',
+            'channel': 'RT en Español',
+            'channel_id': 'ZH4EKqunVDvo',
+            'media_type': 'video',
+            'comment_count': int,
+            'tags': ['rusia', 'cuba', 'russia', 'miguel díaz-canel'],
+            'description': 'md5:a1a395d900d77a86542a91ee0826c115',
+            'released_timestamp': 1715096124,
+            'channel_is_verified': True,
+            'like_count': int,
+            'timestamp': 1715095911,
+            'duration': 44320,
+            'view_count': int,
+            'dislike_count': int,
+            'upload_date': '20240507',
+            'modified_date': '20240701',
+            'channel_follower_count': int,
+            'modified_timestamp': 1719824073,
+        },
+    }, {
+        'url': 'https://plvideo.ru/shorts/S3Uo9c-VLwFX',
+        'md5': '7d8fa2279406c69d2fd2a6fc548a9805',
+        'info_dict': {
+            'id': 'S3Uo9c-VLwFX',
+            'ext': 'mp4',
+            'channel': 'Romaatom',
+            'tags': 'count:22',
+            'dislike_count': int,
+            'upload_date': '20241130',
+            'description': 'md5:452e6de219bf2f32bb95806c51c3b364',
+            'duration': 58433,
+            'modified_date': '20241130',
+            'thumbnail': 'https://img.plvideo.ru/images/fp-2024-11-cover/S3Uo9c-VLwFX/f9318999-a941-482b-b700-2102a7049366.jpg',
+            'media_type': 'shorts',
+            'like_count': int,
+            'modified_timestamp': 1732961458,
+            'channel_is_verified': True,
+            'channel_id': 'erJyyTIbmUd1',
+            'timestamp': 1732961355,
+            'comment_count': int,
+            'title': 'Белоусов отменил приказы о кадровом резерве на гражданской службе',
+            'channel_follower_count': int,
+            'view_count': int,
+            'released_timestamp': 1732961458,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video_data = self._download_json(
+            f'https://api.g1.plvideo.ru/v1/videos/{video_id}?Aud=18', video_id)
+
+        is_live = False
+        formats = []
+        subtitles = {}
+        automatic_captions = {}
+        for quality, data in traverse_obj(video_data, ('item', 'profiles', {dict.items}, lambda _, v: url_or_none(v[1]['hls']))):
+            formats.append({
+                'format_id': quality,
+                'ext': 'mp4',
+                'protocol': 'm3u8_native',
+                **traverse_obj(data, {
+                    'url': 'hls',
+                    'fps': ('fps', {float_or_none}),
+                    'aspect_ratio': ('aspectRatio', {float_or_none}),
+                }),
+                **parse_resolution(quality),
+            })
+        if livestream_url := traverse_obj(video_data, ('item', 'livestream', 'url', {url_or_none})):
+            is_live = True
+            formats.extend(self._extract_m3u8_formats(livestream_url, video_id, 'mp4', live=True))
+        for lang, url in traverse_obj(video_data, ('item', 'subtitles', {dict.items}, lambda _, v: url_or_none(v[1]))):
+            if lang.endswith('-auto'):
+                automatic_captions.setdefault(lang[:-5], []).append({
+                    'url': url,
+                })
+            else:
+                subtitles.setdefault(lang, []).append({
+                    'url': url,
+                })
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'subtitles': subtitles,
+            'automatic_captions': automatic_captions,
+            'is_live': is_live,
+            **traverse_obj(video_data, ('item', {
+                'id': ('id', {str}),
+                'title': ('title', {str}),
+                'description': ('description', {str}),
+                'thumbnail': ('cover', 'paths', 'original', 'src', {url_or_none}),
+                'duration': ('uploadFile', 'videoDuration', {int_or_none}),
+                'channel': ('channel', 'name', {str}),
+                'channel_id': ('channel', 'id', {str}),
+                'channel_follower_count': ('channel', 'stats', 'subscribers', {int_or_none}),
+                'channel_is_verified': ('channel', 'verified', {bool}),
+                'tags': ('tags', ..., {str}),
+                'timestamp': ('createdAt', {parse_iso8601}),
+                'released_timestamp': ('publishedAt', {parse_iso8601}),
+                'modified_timestamp': ('updatedAt', {parse_iso8601}),
+                'view_count': ('stats', 'viewTotalCount', {int_or_none}),
+                'like_count': ('stats', 'likeCount', {int_or_none}),
+                'dislike_count': ('stats', 'dislikeCount', {int_or_none}),
+                'comment_count': ('stats', 'commentCount', {int_or_none}),
+                'media_type': ('type', {str}),
+            })),
+        }
@@ -176,6 +176,8 @@ class RTVSLOShowIE(InfoExtractor):
         'info_dict': {
             'id': '173250997',
             'title': 'Ekipa Bled',
+            'description': 'md5:c88471e27a1268c448747a5325319ab7',
+            'thumbnail': 'https://img.rtvcdn.si/_up/ava/ava_misc/show_logos/173250997/logo_wide1.jpg',
         },
         'playlist_count': 18,
     }]
@@ -187,4 +189,7 @@ def _real_extract(self, url):
         return self.playlist_from_matches(
             re.findall(r'<a [^>]*\bhref="(/arhiv/[^"]+)"', webpage),
             playlist_id, self._html_extract_title(webpage),
-            getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE)
+            getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE,
+            description=self._og_search_description(webpage),
+            thumbnail=self._og_search_thumbnail(webpage),
+        )
@ -4,43 +4,12 @@
|
|||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
parse_qs,
|
UnsupportedError,
|
||||||
unsmuggle_url,
|
make_archive_id,
|
||||||
|
remove_end,
|
||||||
|
url_or_none,
|
||||||
)
|
)
|
||||||
|
from ..utils.traversal import traverse_obj
|
||||||
_COMMITTEES = {
|
|
||||||
'ag': ('76440', 'http://ag-f.akamaihd.net'),
|
|
||||||
'aging': ('76442', 'http://aging-f.akamaihd.net'),
|
|
||||||
'approps': ('76441', 'http://approps-f.akamaihd.net'),
|
|
||||||
'arch': ('', 'http://ussenate-f.akamaihd.net'),
|
|
||||||
'armed': ('76445', 'http://armed-f.akamaihd.net'),
|
|
||||||
'banking': ('76446', 'http://banking-f.akamaihd.net'),
|
|
||||||
'budget': ('76447', 'http://budget-f.akamaihd.net'),
|
|
||||||
'cecc': ('76486', 'http://srs-f.akamaihd.net'),
|
|
||||||
'commerce': ('80177', 'http://commerce1-f.akamaihd.net'),
|
|
||||||
'csce': ('75229', 'http://srs-f.akamaihd.net'),
|
|
||||||
'dpc': ('76590', 'http://dpc-f.akamaihd.net'),
|
|
||||||
'energy': ('76448', 'http://energy-f.akamaihd.net'),
|
|
||||||
'epw': ('76478', 'http://epw-f.akamaihd.net'),
|
|
||||||
'ethics': ('76449', 'http://ethics-f.akamaihd.net'),
|
|
||||||
'finance': ('76450', 'http://finance-f.akamaihd.net'),
|
|
||||||
'foreign': ('76451', 'http://foreign-f.akamaihd.net'),
|
|
||||||
'govtaff': ('76453', 'http://govtaff-f.akamaihd.net'),
|
|
||||||
'help': ('76452', 'http://help-f.akamaihd.net'),
|
|
||||||
'indian': ('76455', 'http://indian-f.akamaihd.net'),
|
|
||||||
'intel': ('76456', 'http://intel-f.akamaihd.net'),
|
|
||||||
'intlnarc': ('76457', 'http://intlnarc-f.akamaihd.net'),
|
|
||||||
'jccic': ('85180', 'http://jccic-f.akamaihd.net'),
|
|
||||||
'jec': ('76458', 'http://jec-f.akamaihd.net'),
|
|
||||||
'judiciary': ('76459', 'http://judiciary-f.akamaihd.net'),
|
|
||||||
'rpc': ('76591', 'http://rpc-f.akamaihd.net'),
|
|
||||||
'rules': ('76460', 'http://rules-f.akamaihd.net'),
|
|
||||||
'saa': ('76489', 'http://srs-f.akamaihd.net'),
|
|
||||||
'smbiz': ('76461', 'http://smbiz-f.akamaihd.net'),
|
|
||||||
'srs': ('75229', 'http://srs-f.akamaihd.net'),
|
|
||||||
'uscc': ('76487', 'http://srs-f.akamaihd.net'),
|
|
||||||
'vetaff': ('76462', 'http://vetaff-f.akamaihd.net'),
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class SenateISVPIE(InfoExtractor):
|
class SenateISVPIE(InfoExtractor):
|
||||||
@ -53,31 +22,46 @@ class SenateISVPIE(InfoExtractor):
|
|||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'judiciary031715',
|
'id': 'judiciary031715',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Integrated Senate Video Player',
|
'title': 'ISVP',
|
||||||
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
|
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
|
||||||
|
'_old_archive_ids': ['senategov judiciary031715'],
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Failed to download m3u8 information'],
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
|
'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'commerce011514',
|
'id': 'commerce011514',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Integrated Senate Video Player',
|
'title': 'Integrated Senate Video Player',
|
||||||
|
'_old_archive_ids': ['senategov commerce011514'],
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'skip': 'This video is not available.',
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
|
'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
|
||||||
# checksum differs each time
|
# checksum differs each time
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'intel090613',
|
'id': 'intel090613',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Integrated Senate Video Player',
|
'title': 'ISVP',
|
||||||
|
'_old_archive_ids': ['senategov intel090613'],
|
||||||
|
},
|
||||||
|
'expected_warnings': ['Failed to download m3u8 information'],
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.senate.gov/isvp/?auto_play=false&comm=help&filename=help090920&poster=https://www.help.senate.gov/assets/images/video-poster.png&stt=950',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'help090920',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'ISVP',
|
||||||
|
'thumbnail': 'https://www.help.senate.gov/assets/images/video-poster.png',
|
||||||
|
'_old_archive_ids': ['senategov help090920'],
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
# From http://www.c-span.org/video/?96791-1
|
# From http://www.c-span.org/video/?96791-1
|
||||||
@ -85,60 +69,81 @@ class SenateISVPIE(InfoExtractor):
|
|||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
|
_COMMITTEES = {
|
||||||
|
'ag': ('76440', 'https://ag-f.akamaihd.net', '2036803', 'agriculture'),
|
||||||
|
'aging': ('76442', 'https://aging-f.akamaihd.net', '2036801', 'aging'),
|
||||||
|
'approps': ('76441', 'https://approps-f.akamaihd.net', '2036802', 'appropriations'),
|
||||||
|
'arch': ('', 'https://ussenate-f.akamaihd.net', '', 'arch'),
|
||||||
|
'armed': ('76445', 'https://armed-f.akamaihd.net', '2036800', 'armedservices'),
|
||||||
|
'banking': ('76446', 'https://banking-f.akamaihd.net', '2036799', 'banking'),
|
||||||
|
'budget': ('76447', 'https://budget-f.akamaihd.net', '2036798', 'budget'),
|
||||||
|
'cecc': ('76486', 'https://srs-f.akamaihd.net', '2036782', 'srs_cecc'),
|
||||||
|
'commerce': ('80177', 'https://commerce1-f.akamaihd.net', '2036779', 'commerce'),
|
||||||
|
'csce': ('75229', 'https://srs-f.akamaihd.net', '2036777', 'srs_srs'),
|
||||||
|
'dpc': ('76590', 'https://dpc-f.akamaihd.net', '', 'dpc'),
|
||||||
|
'energy': ('76448', 'https://energy-f.akamaihd.net', '2036797', 'energy'),
|
||||||
|
'epw': ('76478', 'https://epw-f.akamaihd.net', '2036783', 'environment'),
|
||||||
|
'ethics': ('76449', 'https://ethics-f.akamaihd.net', '2036796', 'ethics'),
|
||||||
|
'finance': ('76450', 'https://finance-f.akamaihd.net', '2036795', 'finance_finance'),
|
||||||
|
'foreign': ('76451', 'https://foreign-f.akamaihd.net', '2036794', 'foreignrelations'),
|
||||||
|
'govtaff': ('76453', 'https://govtaff-f.akamaihd.net', '2036792', 'hsgac'),
|
||||||
|
'help': ('76452', 'https://help-f.akamaihd.net', '2036793', 'help'),
|
||||||
|
'indian': ('76455', 'https://indian-f.akamaihd.net', '2036791', 'indianaffairs'),
|
||||||
|
'intel': ('76456', 'https://intel-f.akamaihd.net', '2036790', 'intelligence'),
|
||||||
|
'intlnarc': ('76457', 'https://intlnarc-f.akamaihd.net', '', 'internationalnarcoticscaucus'),
|
||||||
|
'jccic': ('85180', 'https://jccic-f.akamaihd.net', '2036778', 'jccic'),
|
||||||
|
'jec': ('76458', 'https://jec-f.akamaihd.net', '2036789', 'jointeconomic'),
|
||||||
|
'judiciary': ('76459', 'https://judiciary-f.akamaihd.net', '2036788', 'judiciary'),
|
||||||
|
'rpc': ('76591', 'https://rpc-f.akamaihd.net', '', 'rpc'),
|
||||||
|
'rules': ('76460', 'https://rules-f.akamaihd.net', '2036787', 'rules'),
|
||||||
|
'saa': ('76489', 'https://srs-f.akamaihd.net', '2036780', 'srs_saa'),
|
||||||
|
'smbiz': ('76461', 'https://smbiz-f.akamaihd.net', '2036786', 'smallbusiness'),
|
||||||
|
'srs': ('75229', 'https://srs-f.akamaihd.net', '2031966', 'srs_srs'),
|
||||||
|
'uscc': ('76487', 'https://srs-f.akamaihd.net', '2036781', 'srs_uscc'),
|
||||||
|
'vetaff': ('76462', 'https://vetaff-f.akamaihd.net', '2036785', 'veteransaffairs'),
|
||||||
|
}
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
url, smuggled_data = unsmuggle_url(url, {})
|
|
||||||
|
|
||||||
qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs'))
|
qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs'))
|
||||||
if not qs.get('filename') or not qs.get('type') or not qs.get('comm'):
|
if not qs.get('filename') or not qs.get('comm'):
|
||||||
raise ExtractorError('Invalid URL', expected=True)
|
raise ExtractorError('Invalid URL', expected=True)
|
||||||
|
filename = qs['filename'][0]
|
||||||
video_id = re.sub(r'.mp4$', '', qs['filename'][0])
|
video_id = remove_end(filename, '.mp4')
|
||||||
|
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
committee = qs['comm'][0]
|
||||||
|
|
||||||
if smuggled_data.get('force_title'):
|
stream_num, stream_domain, stream_id, msl3 = self._COMMITTEES[committee]
|
||||||
title = smuggled_data['force_title']
|
|
||||||
else:
|
|
||||||
title = self._html_extract_title(webpage)
|
|
||||||
poster = qs.get('poster')
|
|
||||||
thumbnail = poster[0] if poster else None
|
|
||||||
|
|
||||||
video_type = qs['type'][0]
|
|
||||||
committee = video_type if video_type == 'arch' else qs['comm'][0]
|
|
||||||
|
|
||||||
stream_num, domain = _COMMITTEES[committee]
|
|
||||||
|
|
||||||
|
urls_alternatives = [f'https://www-senate-gov-media-srs.akamaized.net/hls/live/{stream_id}/{committee}/{filename}/master.m3u8',
|
||||||
|
f'https://www-senate-gov-msl3archive.akamaized.net/{msl3}/{filename}_1/master.m3u8',
|
||||||
|
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
|
||||||
|
f'{stream_domain}/i/{filename}.mp4/master.m3u8']
|
||||||
formats = []
|
formats = []
|
||||||
if video_type == 'arch':
|
subtitles = {}
|
||||||
filename = video_id if '.' in video_id else video_id + '.mp4'
|
for video_url in urls_alternatives:
|
||||||
m3u8_url = urllib.parse.urljoin(domain, 'i/' + filename + '/master.m3u8')
|
formats, subtitles = self._extract_m3u8_formats_and_subtitles(video_url, video_id, ext='mp4', fatal=False)
|
||||||
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8')
|
if formats:
|
||||||
else:
|
break
|
||||||
hdcore_sign = 'hdcore=3.1.0'
|
|
||||||
url_params = (domain, video_id, stream_num)
|
|
||||||
f4m_url = f'%s/z/%s_1@%s/manifest.f4m?{hdcore_sign}' % url_params
|
|
||||||
m3u8_url = '{}/i/{}_1@{}/master.m3u8'.format(*url_params)
|
|
||||||
for entry in self._extract_f4m_formats(f4m_url, video_id, f4m_id='f4m'):
|
|
||||||
# URLs without the extra param induce an 404 error
|
|
||||||
entry.update({'extra_param_to_segment_url': hdcore_sign})
|
|
||||||
formats.append(entry)
|
|
||||||
for entry in self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8'):
|
|
||||||
mobj = re.search(r'(?P<tag>(?:-p|-b)).m3u8', entry['url'])
|
|
||||||
if mobj:
|
|
||||||
entry['format_id'] += mobj.group('tag')
|
|
||||||
formats.append(entry)
|
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': title,
|
'title': self._html_extract_title(webpage),
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'thumbnail': thumbnail,
|
'subtitles': subtitles,
|
||||||
|
'thumbnail': traverse_obj(qs, ('poster', 0, {url_or_none})),
|
||||||
|
'_old_archive_ids': [make_archive_id(SenateGovIE, video_id)],
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
class SenateGovIE(InfoExtractor):
|
class SenateGovIE(InfoExtractor):
|
||||||
_IE_NAME = 'senate.gov'
|
_IE_NAME = 'senate.gov'
|
||||||
_VALID_URL = r'https?:\/\/(?:www\.)?(help|appropriations|judiciary|banking|armed-services|finance)\.senate\.gov'
|
_SUBDOMAIN_RE = '|'.join(map(re.escape, (
|
||||||
|
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
|
||||||
|
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',
|
||||||
|
'intelligence', 'inaugural', 'judiciary', 'rules', 'sbc', 'veterans',
|
||||||
|
)))
|
||||||
|
_VALID_URL = rf'https?://(?:www\.)?(?:{_SUBDOMAIN_RE})\.senate\.gov'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health',
|
'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -147,6 +152,9 @@ class SenateGovIE(InfoExtractor):
|
|||||||
'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health',
|
'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health',
|
||||||
'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions',
|
'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
|
'age_limit': 0,
|
||||||
|
'thumbnail': 'https://www.help.senate.gov/assets/images/sharelogo.jpg',
|
||||||
|
'_old_archive_ids': ['senategov help090920'],
|
||||||
},
|
},
|
||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
}, {
|
}, {
|
||||||
@ -156,8 +164,12 @@ class SenateGovIE(InfoExtractor):
|
|||||||
'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD',
|
'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD',
|
||||||
'title': 'Review of the FY2019 Budget Request for the U.S. Army',
|
'title': 'Review of the FY2019 Budget Request for the U.S. Army',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
|
'age_limit': 0,
|
||||||
|
'thumbnail': 'https://www.appropriations.senate.gov/themes/appropriations/images/video-poster-flash-fit.png',
|
||||||
|
'_old_archive_ids': ['senategov appropsA051518'],
|
||||||
},
|
},
|
||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
|
'expected_warnings': ['Failed to download m3u8 information'],
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization',
|
'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -166,32 +178,65 @@ class SenateGovIE(InfoExtractor):
|
|||||||
'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization',
|
'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization',
|
||||||
'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs',
|
'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
|
'thumbnail': 'https://www.banking.senate.gov/themes/banking/images/sharelogo.jpg',
|
||||||
|
'age_limit': 0,
|
||||||
|
'_old_archive_ids': ['senategov banking041521'],
|
||||||
},
|
},
|
||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.agriculture.senate.gov/hearings/hemp-production-and-the-2018-farm-bill',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.aging.senate.gov/hearings/the-older-americans-act-the-local-impact-of-the-law-and-the-upcoming-reauthorization',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.budget.senate.gov/hearings/improving-care-lowering-costs-achieving-health-care-efficiency',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.commerce.senate.gov/2024/12/communications-networks-safety-and-security',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.energy.senate.gov/hearings/2024/2/full-committee-hearing-to-examine',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.epw.senate.gov/public/index.cfm/hearings?ID=F63083EA-2C13-498C-B548-341BED68C209',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.foreign.senate.gov/hearings/american-diplomacy-and-global-leadership-review-of-the-fy25-state-department-budget-request',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.intelligence.senate.gov/hearings/foreign-threats-elections-2024-%E2%80%93-roles-and-responsibilities-us-tech-providers',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.inaugural.senate.gov/52nd-inaugural-ceremonies/',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.rules.senate.gov/hearings/02/07/2023/business-meeting',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.sbc.senate.gov/public/index.cfm/hearings?ID=5B13AA6B-8279-45AF-B54B-94156DC7A2AB',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.veterans.senate.gov/2024/5/frontier-health-care-ensuring-veterans-access-no-matter-where-they-live',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
display_id = self._generic_id(url)
|
display_id = self._generic_id(url)
|
||||||
webpage = self._download_webpage(url, display_id)
|
webpage = self._download_webpage(url, display_id)
|
||||||
parse_info = parse_qs(self._search_regex(
|
url_info = next(SenateISVPIE.extract_from_webpage(self._downloader, url, webpage), None)
|
||||||
r'<iframe class="[^>"]*streaminghearing[^>"]*"\s[^>]*\bsrc="([^">]*)', webpage, 'hearing URL'))
|
if not url_info:
|
||||||
|
raise UnsupportedError(url)
|
||||||
stream_num, stream_domain = _COMMITTEES[parse_info['comm'][-1]]
|
|
||||||
filename = parse_info['filename'][-1]
|
|
||||||
|
|
||||||
formats = self._extract_m3u8_formats(
|
|
||||||
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
|
|
||||||
display_id, ext='mp4')
|
|
||||||
|
|
||||||
title = self._html_search_regex(
|
title = self._html_search_regex(
|
||||||
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title')
|
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title', fatal=False)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': re.sub(r'.mp4$', '', filename),
|
**url_info,
|
||||||
|
'_type': 'url_transparent',
|
||||||
'display_id': display_id,
|
'display_id': display_id,
|
||||||
'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(),
|
'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(),
|
||||||
'description': self._og_search_description(webpage, default=None),
|
'description': self._og_search_description(webpage, default=None),
|
||||||
'thumbnail': self._og_search_thumbnail(webpage, default=None),
|
'thumbnail': self._og_search_thumbnail(webpage, default=None),
|
||||||
'age_limit': self._rta_search(webpage),
|
'age_limit': self._rta_search(webpage),
|
||||||
'formats': formats,
|
|
||||||
}
|
}
|
||||||
|
@ -7,7 +7,6 @@
|
|||||||
from ..networking import HEADRequest
|
from ..networking import HEADRequest
|
||||||
from ..networking.exceptions import HTTPError
|
from ..networking.exceptions import HTTPError
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
KNOWN_EXTENSIONS,
|
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
@ -211,6 +210,7 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, extract_f
|
|||||||
|
|
||||||
format_urls = set()
|
format_urls = set()
|
||||||
formats = []
|
formats = []
|
||||||
|
has_drm = False
|
||||||
query = {'client_id': self._CLIENT_ID}
|
query = {'client_id': self._CLIENT_ID}
|
||||||
if secret_token:
|
if secret_token:
|
||||||
query['secret_token'] = secret_token
|
query['secret_token'] = secret_token
|
||||||
@ -246,55 +246,24 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, extract_f
|
|||||||
'url': format_url,
|
'url': format_url,
|
||||||
'quality': 10,
|
'quality': 10,
|
||||||
'format_note': 'Original',
|
'format_note': 'Original',
|
||||||
|
'vcodec': 'none',
|
||||||
})
|
})
|
||||||
|
|
||||||
def invalid_url(url):
|
def invalid_url(url):
|
||||||
return not url or url in format_urls
|
return not url or url in format_urls
|
||||||
|
|
||||||
def add_format(f, protocol, is_preview=False):
|
|
||||||
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
|
|
||||||
if mobj:
|
|
||||||
for k, v in mobj.groupdict().items():
|
|
||||||
if not f.get(k):
|
|
||||||
f[k] = v
|
|
||||||
format_id_list = []
|
|
||||||
if protocol:
|
|
||||||
format_id_list.append(protocol)
|
|
||||||
ext = f.get('ext')
|
|
||||||
if ext == 'aac':
|
|
||||||
f.update({
|
|
||||||
'abr': 256,
|
|
||||||
'quality': 5,
|
|
||||||
'format_note': 'Premium',
|
|
||||||
})
|
|
||||||
for k in ('ext', 'abr'):
|
|
||||||
v = str_or_none(f.get(k))
|
|
||||||
if v:
|
|
||||||
format_id_list.append(v)
|
|
||||||
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
|
|
||||||
if preview:
|
|
||||||
format_id_list.append('preview')
|
|
||||||
abr = f.get('abr')
|
|
||||||
if abr:
|
|
||||||
f['abr'] = int(abr)
|
|
||||||
if protocol in ('hls', 'hls-aes'):
|
|
||||||
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
|
|
||||||
else:
|
|
||||||
protocol = 'http'
|
|
||||||
f.update({
|
|
||||||
'format_id': '_'.join(format_id_list),
|
|
||||||
'protocol': protocol,
|
|
||||||
'preference': -10 if preview else None,
|
|
||||||
})
|
|
||||||
formats.append(f)
|
|
||||||
|
|
||||||
# New API
|
# New API
|
||||||
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']))):
|
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']) and v['preset'])):
|
||||||
if extract_flat:
|
if extract_flat:
|
||||||
break
|
break
|
||||||
format_url = t['url']
|
format_url = t['url']
|
||||||
|
preset = t['preset']
|
||||||
|
preset_base = preset.partition('_')[0]
|
||||||
|
|
||||||
protocol = traverse_obj(t, ('format', 'protocol', {str}))
|
protocol = traverse_obj(t, ('format', 'protocol', {str})) or 'http'
|
||||||
|
if protocol.startswith(('ctr-', 'cbc-')):
|
||||||
|
has_drm = True
|
||||||
|
continue
|
||||||
if protocol == 'progressive':
|
if protocol == 'progressive':
|
||||||
protocol = 'http'
|
protocol = 'http'
|
||||||
if protocol != 'hls' and '/hls' in format_url:
|
if protocol != 'hls' and '/hls' in format_url:
|
||||||
@ -302,35 +271,60 @@ def add_format(f, protocol, is_preview=False):
|
|||||||
if protocol == 'encrypted-hls' or '/encrypted-hls' in format_url:
|
if protocol == 'encrypted-hls' or '/encrypted-hls' in format_url:
|
||||||
protocol = 'hls-aes'
|
protocol = 'hls-aes'
|
||||||
|
|
||||||
ext = None
|
short_identifier = f'{protocol}_{preset_base}'
|
||||||
if preset := traverse_obj(t, ('preset', {str_or_none})):
|
if preset_base == 'abr':
|
||||||
ext = preset.split('_')[0]
|
self.write_debug(f'Skipping broken "{short_identifier}" format')
|
||||||
if ext not in KNOWN_EXTENSIONS:
|
continue
|
||||||
ext = mimetype2ext(traverse_obj(t, ('format', 'mime_type', {str})))
|
if not self._is_requested(short_identifier):
|
||||||
|
self.write_debug(f'"{short_identifier}" is not a requested format, skipping')
|
||||||
identifier = join_nonempty(protocol, ext, delim='_')
|
|
||||||
if not self._is_requested(identifier):
|
|
||||||
self.write_debug(f'"{identifier}" is not a requested format, skipping')
|
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called
|
# XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called
|
||||||
stream_url = traverse_obj(self._call_api(
|
stream_url = traverse_obj(self._call_api(
|
||||||
format_url, track_id, f'Downloading {identifier} format info JSON',
|
format_url, track_id, f'Downloading {short_identifier} format info JSON',
|
||||||
query=query, headers=self._HEADERS), ('url', {url_or_none}))
|
query=query, headers=self._HEADERS), ('url', {url_or_none}))
|
||||||
|
|
||||||
if invalid_url(stream_url):
|
if invalid_url(stream_url):
|
||||||
continue
|
continue
|
||||||
format_urls.add(stream_url)
|
format_urls.add(stream_url)
|
||||||
add_format({
|
|
||||||
|
mime_type = traverse_obj(t, ('format', 'mime_type', {str}))
|
||||||
|
codec = self._search_regex(r'codecs="([^"]+)"', mime_type, 'codec', default=None)
|
||||||
|
ext = {
|
||||||
|
'mp4a': 'm4a',
|
||||||
|
'opus': 'opus',
|
||||||
|
}.get(codec[:4] if codec else None) or mimetype2ext(mime_type, default=None)
|
||||||
|
if not ext or ext == 'm3u8':
|
||||||
|
ext = preset_base
|
||||||
|
|
||||||
|
is_premium = t.get('quality') == 'hq'
|
||||||
|
abr = int_or_none(
|
||||||
|
self._search_regex(r'(\d+)k$', preset, 'abr', default=None)
|
||||||
|
or self._search_regex(r'\.(\d+)\.(?:opus|mp3)[/?]', stream_url, 'abr', default=None)
|
||||||
|
or (256 if (is_premium and 'aac' in preset) else None))
|
||||||
|
|
||||||
|
is_preview = (t.get('snipped')
|
||||||
|
or '/preview/' in format_url
|
||||||
|
or re.search(r'/(?:preview|playlist)/0/30/', stream_url))
|
||||||
|
|
||||||
|
formats.append({
|
||||||
|
'format_id': join_nonempty(protocol, preset, is_preview and 'preview', delim='_'),
|
||||||
'url': stream_url,
|
'url': stream_url,
|
||||||
'ext': ext,
|
'ext': ext,
|
||||||
}, protocol, t.get('snipped') or '/preview/' in format_url)
|
'acodec': codec,
|
||||||
|
'vcodec': 'none',
|
||||||
|
'abr': abr,
|
||||||
|
'protocol': 'm3u8_native' if protocol in ('hls', 'hls-aes') else 'http',
|
||||||
|
'container': 'm4a_dash' if ext == 'm4a' else None,
|
||||||
|
'quality': 5 if is_premium else 0 if (abr and abr >= 160) else -1,
|
||||||
|
'format_note': 'Premium' if is_premium else None,
|
||||||
|
'preference': -10 if is_preview else None,
|
||||||
|
})
|
||||||
|
|
||||||
for f in formats:
|
if not formats:
|
||||||
f['vcodec'] = 'none'
|
if has_drm:
|
||||||
|
self.report_drm(track_id)
|
||||||
if not formats and info.get('policy') == 'BLOCK':
|
if info.get('policy') == 'BLOCK':
|
||||||
self.raise_geo_restricted(metadata_available=True)
|
self.raise_geo_restricted(metadata_available=True)
|
||||||
|
|
||||||
user = info.get('user') or {}
|
user = info.get('user') or {}
|
||||||
|
|
||||||
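The new SoundCloud hunk above derives the container extension from the codec in the MIME type, falls back to the preset name when that yields nothing useful (or only `m3u8`), and reads the bitrate from the preset or the stream URL. A minimal standalone sketch of that logic follows. It assumes a tiny hypothetical MIME table (`_MIME_TO_EXT`) in place of yt-dlp's `mimetype2ext`, and the function name `guess_ext_and_abr` is illustrative only, not part of the extractor.

```python
import re

# Hypothetical stand-in for yt-dlp's mimetype2ext; covers only a few audio types.
_MIME_TO_EXT = {
    'audio/mpeg': 'mp3',
    'audio/ogg': 'ogg',
    'audio/mp4': 'm4a',
    'audio/mpegurl': 'm3u8',
    'audio/x-mpegurl': 'm3u8',
}


def guess_ext_and_abr(preset, mime_type, stream_url='', is_premium=False):
    """Return (ext, codec, abr) for one transcoding, mirroring the hunk's order of checks."""
    preset_base = preset.partition('_')[0]

    # Codec comes from the codecs="..." parameter of the MIME type, if present.
    codec_match = re.search(r'codecs="([^"]+)"', mime_type or '')
    codec = codec_match.group(1) if codec_match else None

    # Prefer the codec-based mapping, then the bare MIME type.
    bare_mime = (mime_type or '').split(';')[0].strip().lower()
    ext = ({'mp4a': 'm4a', 'opus': 'opus'}.get(codec[:4] if codec else None)
           or _MIME_TO_EXT.get(bare_mime))
    # A playlist extension says nothing about the audio, so use the preset name.
    if not ext or ext == 'm3u8':
        ext = preset_base

    # Bitrate: preset suffix first, then the stream URL, then the premium AAC default.
    abr = None
    preset_match = re.search(r'(\d+)k$', preset)
    url_match = re.search(r'\.(\d+)\.(?:opus|mp3)[/?]', stream_url)
    if preset_match:
        abr = int(preset_match.group(1))
    elif url_match:
        abr = int(url_match.group(1))
    elif is_premium and 'aac' in preset:
        abr = 256
    return ext, codec, abr
```

For instance, a preset of `aac_160k` with an `audio/mp4; codecs="mp4a.40.2"` MIME type resolves to the `m4a` extension at 160 kbps under these assumptions.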
|
@ -28,24 +28,21 @@ class StripchatIE(InfoExtractor):
|
|||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, video_id, headers=self.geo_verification_headers())
|
webpage = self._download_webpage(url, video_id, headers=self.geo_verification_headers())
|
||||||
|
data = self._search_json(
|
||||||
|
r'<script\b[^>]*>\s*window\.__PRELOADED_STATE__\s*=',
|
||||||
|
webpage, 'data', video_id, transform_source=lowercase_escape)
|
||||||
|
|
||||||
data = self._parse_json(
|
if traverse_obj(data, ('viewCam', 'show', {dict})):
|
||||||
self._search_regex(
|
raise ExtractorError('Model is in a private show', expected=True)
|
||||||
r'<script\b[^>]*>\s*window\.__PRELOADED_STATE__\s*=(?P<value>.*?)<\/script>',
|
if not traverse_obj(data, ('viewCam', 'model', 'isLive', {bool})):
|
||||||
webpage, 'data', default='{}', group='value'),
|
|
||||||
video_id, transform_source=lowercase_escape, fatal=False)
|
|
||||||
if not data:
|
|
||||||
raise ExtractorError('Unable to find configuration for stream.')
|
|
||||||
|
|
||||||
if traverse_obj(data, ('viewCam', 'show'), expected_type=dict):
|
|
||||||
raise ExtractorError('Model is in private show', expected=True)
|
|
||||||
elif not traverse_obj(data, ('viewCam', 'model', 'isLive'), expected_type=bool):
|
|
||||||
raise UserNotLive(video_id=video_id)
|
raise UserNotLive(video_id=video_id)
|
||||||
|
|
||||||
model_id = traverse_obj(data, ('viewCam', 'model', 'id'), expected_type=int)
|
model_id = data['viewCam']['model']['id']
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
for host in traverse_obj(data, ('config', 'data', (
|
# HLS hosts are currently found in .configV3.static.features.hlsFallback.fallbackDomains[]
|
||||||
|
# The rest of the path is for backwards compatibility and to guard against A/B testing
|
||||||
|
for host in traverse_obj(data, ((('config', 'data'), ('configV3', 'static')), (
|
||||||
(('features', 'featuresV2'), 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost'))):
|
(('features', 'featuresV2'), 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost'))):
|
||||||
formats = self._extract_m3u8_formats(
|
formats = self._extract_m3u8_formats(
|
||||||
f'https://edge-hls.{host}/hls/{model_id}/master/{model_id}_auto.m3u8',
|
f'https://edge-hls.{host}/hls/{model_id}/master/{model_id}_auto.m3u8',
|
||||||
@ -53,7 +50,7 @@ def _real_extract(self, url):
|
|||||||
if formats:
|
if formats:
|
||||||
break
|
break
|
||||||
if not formats:
|
if not formats:
|
||||||
self.raise_no_formats('No active streams found', expected=True)
|
self.raise_no_formats('Unable to extract stream host', video_id=video_id)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
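The Stripchat change above looks for HLS hosts under both `config.data` and `configV3.static`, trying the `hlsFallback.fallbackDomains` lists before the single `hlsStreamHost` value. The sketch below reproduces that lookup order with plain dictionaries instead of `traverse_obj`. The helper names `_dig` and `iter_hls_hosts` are hypothetical, and the sample payload in the usage note is invented for illustration.

```python
def _dig(obj, *keys):
    """Follow a chain of dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def iter_hls_hosts(data):
    """Yield candidate HLS hosts in the order the extractor would try them."""
    for root in (('config', 'data'), ('configV3', 'static')):
        node = _dig(data, *root)
        if not isinstance(node, dict):
            continue
        # Fallback domain lists may live under either feature key.
        for features_key in ('features', 'featuresV2'):
            domains = _dig(node, features_key, 'hlsFallback', 'fallbackDomains')
            if isinstance(domains, list):
                yield from (domain for domain in domains if domain)
        # Older layout: a single host string.
        host = node.get('hlsStreamHost')
        if host:
            yield host
```

A caller would iterate over `iter_hls_hosts(data)` and stop at the first host that returns usable formats, which is what the `break` in the hunk does.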
|
@ -413,15 +413,6 @@ def extract_addr(addr, add_meta={}):
|
|||||||
for f in formats:
|
for f in formats:
|
||||||
self._set_cookie(urllib.parse.urlparse(f['url']).hostname, 'sid_tt', auth_cookie.value)
|
self._set_cookie(urllib.parse.urlparse(f['url']).hostname, 'sid_tt', auth_cookie.value)
|
||||||
|
|
||||||
thumbnails = []
|
|
||||||
for cover_id in ('cover', 'ai_dynamic_cover', 'animated_cover', 'ai_dynamic_cover_bak',
|
|
||||||
'origin_cover', 'dynamic_cover'):
|
|
||||||
for cover_url in traverse_obj(video_info, (cover_id, 'url_list', ...)):
|
|
||||||
thumbnails.append({
|
|
||||||
'id': cover_id,
|
|
||||||
'url': cover_url,
|
|
||||||
})
|
|
||||||
|
|
||||||
stats_info = aweme_detail.get('statistics') or {}
|
stats_info = aweme_detail.get('statistics') or {}
|
||||||
music_info = aweme_detail.get('music') or {}
|
music_info = aweme_detail.get('music') or {}
|
||||||
labels = traverse_obj(aweme_detail, ('hybrid_label', ..., 'text'), expected_type=str)
|
labels = traverse_obj(aweme_detail, ('hybrid_label', ..., 'text'), expected_type=str)
|
||||||
@ -467,7 +458,17 @@ def extract_addr(addr, add_meta={}):
|
|||||||
'formats': formats,
|
'formats': formats,
|
||||||
'subtitles': self.extract_subtitles(
|
'subtitles': self.extract_subtitles(
|
||||||
aweme_detail, aweme_id, traverse_obj(author_info, 'uploader', 'uploader_id', 'channel_id')),
|
aweme_detail, aweme_id, traverse_obj(author_info, 'uploader', 'uploader_id', 'channel_id')),
|
||||||
'thumbnails': thumbnails,
|
'thumbnails': [
|
||||||
|
{
|
||||||
|
'id': cover_id,
|
||||||
|
'url': cover_url,
|
||||||
|
'preference': -1 if cover_id in ('cover', 'origin_cover') else -2,
|
||||||
|
}
|
||||||
|
for cover_id in (
|
||||||
|
'cover', 'ai_dynamic_cover', 'animated_cover',
|
||||||
|
'ai_dynamic_cover_bak', 'origin_cover', 'dynamic_cover')
|
||||||
|
for cover_url in traverse_obj(video_info, (cover_id, 'url_list', ...))
|
||||||
|
],
|
||||||
'duration': (traverse_obj(video_info, (
|
'duration': (traverse_obj(video_info, (
|
||||||
(None, 'download_addr'), 'duration', {int_or_none(scale=1000)}, any))
|
(None, 'download_addr'), 'duration', {int_or_none(scale=1000)}, any))
|
||||||
or traverse_obj(music_info, ('duration', {int_or_none}))),
|
or traverse_obj(music_info, ('duration', {int_or_none}))),
|
||||||
@ -600,11 +601,15 @@ def _parse_aweme_video_web(self, aweme_detail, webpage_url, video_id, extract_fl
|
|||||||
'repost_count': 'shareCount',
|
'repost_count': 'shareCount',
|
||||||
'comment_count': 'commentCount',
|
'comment_count': 'commentCount',
|
||||||
}), expected_type=int_or_none),
|
}), expected_type=int_or_none),
|
||||||
'thumbnails': traverse_obj(aweme_detail, (
|
'thumbnails': [
|
||||||
(None, 'video'), ('thumbnail', 'cover', 'dynamicCover', 'originCover'), {
|
{
|
||||||
'url': ({url_or_none}, {self._proto_relative_url}),
|
'id': cover_id,
|
||||||
},
|
'url': self._proto_relative_url(cover_url),
|
||||||
)),
|
'preference': -2 if cover_id == 'dynamicCover' else -1,
|
||||||
|
}
|
||||||
|
for cover_id in ('thumbnail', 'cover', 'dynamicCover', 'originCover')
|
||||||
|
for cover_url in traverse_obj(aweme_detail, ((None, 'video'), cover_id, {url_or_none}))
|
||||||
|
],
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
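Both TikTok hunks above replace a plain thumbnail list with entries that carry a `preference`, so static covers rank above animated ones. A small sketch of the app-API variant follows, assuming `video_info` is a dict whose cover entries are dicts with a `url_list`. The function name `build_thumbnails` is hypothetical and the nested `.get()` calls stand in for `traverse_obj`.

```python
COVER_IDS = (
    'cover', 'ai_dynamic_cover', 'animated_cover',
    'ai_dynamic_cover_bak', 'origin_cover', 'dynamic_cover')


def build_thumbnails(video_info):
    """Build thumbnail dicts, ranking static covers (-1) above dynamic ones (-2)."""
    return [
        {
            'id': cover_id,
            'url': cover_url,
            'preference': -1 if cover_id in ('cover', 'origin_cover') else -2,
        }
        for cover_id in COVER_IDS
        # Missing covers or missing url_list simply contribute nothing.
        for cover_url in ((video_info.get(cover_id) or {}).get('url_list') or [])
        if cover_url
    ]
```

With a higher `preference` on `cover` and `origin_cover`, the default thumbnail selection would favour a still image over an animated preview.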
|
@ -189,26 +189,6 @@ class TumblrIE(InfoExtractor):
|
|||||||
'release_date': '20140227',
|
'release_date': '20140227',
|
||||||
},
|
},
|
||||||
'add_ie': ['Vimeo'],
|
'add_ie': ['Vimeo'],
|
||||||
}, {
|
|
||||||
'url': 'http://sutiblr.tumblr.com/post/139638707273',
|
|
||||||
'md5': '2dd184b3669e049ba40563a7d423f95c',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'ir7qBEIKqvq',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'Vine by sutiblr',
|
|
||||||
'alt_title': 'Vine by sutiblr',
|
|
||||||
'uploader': 'sutiblr',
|
|
||||||
'uploader_id': '1198993975374495744',
|
|
||||||
'upload_date': '20160220',
|
|
||||||
'like_count': int,
|
|
||||||
'comment_count': int,
|
|
||||||
'repost_count': int,
|
|
||||||
'thumbnail': r're:^https?://.*\.jpg',
|
|
||||||
'timestamp': 1455940159,
|
|
||||||
'view_count': int,
|
|
||||||
},
|
|
||||||
'add_ie': ['Vine'],
|
|
||||||
'skip': 'Vine is unavailable',
|
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://silami.tumblr.com/post/84250043974/my-bad-river-flows-in-you-impression-on-maschine',
|
'url': 'https://silami.tumblr.com/post/84250043974/my-bad-river-flows-in-you-impression-on-maschine',
|
||||||
'md5': '3c92d7c3d867f14ccbeefa2119022277',
|
'md5': '3c92d7c3d867f14ccbeefa2119022277',
|
||||||
@ -366,7 +346,6 @@ class TumblrIE(InfoExtractor):
|
|||||||
_providers = {
|
_providers = {
|
||||||
'instagram': 'Instagram',
|
'instagram': 'Instagram',
|
||||||
'vimeo': 'Vimeo',
|
'vimeo': 'Vimeo',
|
||||||
'vine': 'Vine',
|
|
||||||
'youtube': 'Youtube',
|
'youtube': 'Youtube',
|
||||||
'dailymotion': 'Dailymotion',
|
'dailymotion': 'Dailymotion',
|
||||||
'tiktok': 'TikTok',
|
'tiktok': 'TikTok',
|
||||||
|
@ -409,26 +409,6 @@ class TwitterCardIE(InfoExtractor):
|
|||||||
},
|
},
|
||||||
'add_ie': ['Youtube'],
|
'add_ie': ['Youtube'],
|
||||||
},
|
},
|
||||||
{
|
|
||||||
'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'iBb2x00UVlv',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'upload_date': '20151113',
|
|
||||||
'uploader_id': '1189339351084113920',
|
|
||||||
'uploader': 'ArsenalTerje',
|
|
||||||
'title': 'Vine by ArsenalTerje',
|
|
||||||
'timestamp': 1447451307,
|
|
||||||
'alt_title': 'Vine by ArsenalTerje',
|
|
||||||
'comment_count': int,
|
|
||||||
'like_count': int,
|
|
||||||
'thumbnail': r're:^https?://[^?#]+\.jpg',
|
|
||||||
'view_count': int,
|
|
||||||
'repost_count': int,
|
|
||||||
},
|
|
||||||
'add_ie': ['Vine'],
|
|
||||||
'params': {'skip_download': 'm3u8'},
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
|
'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
|
||||||
'md5': '884812a2adc8aaf6fe52b15ccbfa3b88',
|
'md5': '884812a2adc8aaf6fe52b15ccbfa3b88',
|
||||||
@ -567,25 +547,6 @@ class TwitterIE(TwitterBaseIE):
|
|||||||
'age_limit': 0,
|
'age_limit': 0,
|
||||||
'_old_archive_ids': ['twitter 700207533655363584'],
|
'_old_archive_ids': ['twitter 700207533655363584'],
|
||||||
},
|
},
|
||||||
}, {
|
|
||||||
'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
|
|
||||||
'md5': '89a15ed345d13b86e9a5a5e051fa308a',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'MIOxnrUteUd',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
|
|
||||||
'uploader': 'TAKUMA',
|
|
||||||
'uploader_id': '1004126642786242560',
|
|
||||||
'timestamp': 1402826626,
|
|
||||||
'upload_date': '20140615',
|
|
||||||
'thumbnail': r're:^https?://.*\.jpg',
|
|
||||||
'alt_title': 'Vine by TAKUMA',
|
|
||||||
'comment_count': int,
|
|
||||||
'repost_count': int,
|
|
||||||
'like_count': int,
|
|
||||||
'view_count': int,
|
|
||||||
},
|
|
||||||
'add_ie': ['Vine'],
|
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://twitter.com/captainamerica/status/719944021058060289',
|
'url': 'https://twitter.com/captainamerica/status/719944021058060289',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
|
@ -421,5 +421,5 @@ def _real_extract(self, url):
|
|||||||
return self._process_video_json(video_json['chapters'][0], video_id)
|
return self._process_video_json(video_json['chapters'][0], video_id)
|
||||||
|
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
[self._process_video_json(chapter, video_id) for chapter in video_json['chapters']],
|
(self._process_video_json(chapter, video_id) for chapter in video_json['chapters']),
|
||||||
str(video_json['playerUuid']), video_json.get('name'))
|
str(video_json['playerUuid']), video_json.get('name'))
|
||||||
|
@ -1,150 +0,0 @@
|
|||||||
from .common import InfoExtractor
|
|
||||||
from ..utils import (
|
|
||||||
determine_ext,
|
|
||||||
format_field,
|
|
||||||
int_or_none,
|
|
||||||
unified_timestamp,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class VineIE(InfoExtractor):
|
|
||||||
_VALID_URL = r'https?://(?:www\.)?vine\.co/(?:v|oembed)/(?P<id>\w+)'
|
|
||||||
_EMBED_REGEX = [r'<iframe[^>]+src=[\'"](?P<url>(?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))']
|
|
||||||
_TESTS = [{
|
|
||||||
'url': 'https://vine.co/v/b9KOOWX7HUx',
|
|
||||||
'md5': '2f36fed6235b16da96ce9b4dc890940d',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'b9KOOWX7HUx',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'Chicken.',
|
|
||||||
'alt_title': 'Vine by Jack',
|
|
||||||
'timestamp': 1368997951,
|
|
||||||
'upload_date': '20130519',
|
|
||||||
'uploader': 'Jack',
|
|
||||||
'uploader_id': '76',
|
|
||||||
'view_count': int,
|
|
||||||
'like_count': int,
|
|
||||||
'comment_count': int,
|
|
||||||
'repost_count': int,
|
|
||||||
},
|
|
||||||
}, {
|
|
||||||
'url': 'https://vine.co/v/e192BnZnZ9V',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'e192BnZnZ9V',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'ยิ้ม~ เขิน~ อาย~ น่าร้ากอ้ะ >//< @n_whitewo @orlameena #lovesicktheseries #lovesickseason2',
|
|
||||||
'alt_title': 'Vine by Pimry_zaa',
|
|
||||||
'timestamp': 1436057405,
|
|
||||||
'upload_date': '20150705',
|
|
||||||
'uploader': 'Pimry_zaa',
|
|
||||||
'uploader_id': '1135760698325307392',
|
|
||||||
'view_count': int,
|
|
||||||
'like_count': int,
|
|
||||||
'comment_count': int,
|
|
||||||
'repost_count': int,
|
|
||||||
},
|
|
||||||
'params': {
|
|
||||||
'skip_download': True,
|
|
||||||
},
|
|
||||||
}, {
|
|
||||||
'url': 'https://vine.co/v/MYxVapFvz2z',
|
|
||||||
'only_matching': True,
|
|
||||||
}, {
|
|
||||||
'url': 'https://vine.co/v/bxVjBbZlPUH',
|
|
||||||
'only_matching': True,
|
|
||||||
}, {
|
|
||||||
'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
|
|
||||||
'only_matching': True,
|
|
||||||
}]
|
|
||||||
|
|
||||||
def _real_extract(self, url):
|
|
||||||
video_id = self._match_id(url)
|
|
||||||
|
|
||||||
data = self._download_json(
|
|
||||||
f'https://archive.vine.co/posts/{video_id}.json', video_id)
|
|
||||||
|
|
||||||
def video_url(kind):
|
|
||||||
for url_suffix in ('Url', 'URL'):
|
|
||||||
format_url = data.get(f'video{kind}{url_suffix}')
|
|
||||||
if format_url:
|
|
||||||
return format_url
|
|
||||||
|
|
||||||
formats = []
|
|
||||||
for quality, format_id in enumerate(('low', '', 'dash')):
|
|
||||||
format_url = video_url(format_id.capitalize())
|
|
||||||
if not format_url:
|
|
||||||
continue
|
|
||||||
# DASH link returns plain mp4
|
|
||||||
if format_id == 'dash' and determine_ext(format_url) == 'mpd':
|
|
||||||
formats.extend(self._extract_mpd_formats(
|
|
||||||
format_url, video_id, mpd_id='dash', fatal=False))
|
|
||||||
else:
|
|
||||||
formats.append({
|
|
||||||
'url': format_url,
|
|
||||||
'format_id': format_id or 'standard',
|
|
||||||
'quality': quality,
|
|
||||||
})
|
|
||||||
self._check_formats(formats, video_id)
|
|
||||||
|
|
||||||
username = data.get('username')
|
|
||||||
|
|
||||||
alt_title = format_field(username, None, 'Vine by %s')
|
|
||||||
|
|
||||||
return {
|
|
||||||
'id': video_id,
|
|
||||||
'title': data.get('description') or alt_title or 'Vine video',
|
|
||||||
'alt_title': alt_title,
|
|
||||||
'thumbnail': data.get('thumbnailUrl'),
|
|
||||||
'timestamp': unified_timestamp(data.get('created')),
|
|
||||||
'uploader': username,
|
|
||||||
'uploader_id': data.get('userIdStr'),
|
|
||||||
'view_count': int_or_none(data.get('loops')),
|
|
||||||
'like_count': int_or_none(data.get('likes')),
|
|
||||||
'comment_count': int_or_none(data.get('comments')),
|
|
||||||
'repost_count': int_or_none(data.get('reposts')),
|
|
||||||
'formats': formats,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class VineUserIE(InfoExtractor):
|
|
||||||
IE_NAME = 'vine:user'
|
|
||||||
_VALID_URL = r'https?://vine\.co/(?P<u>u/)?(?P<user>[^/]+)'
|
|
||||||
_VINE_BASE_URL = 'https://vine.co/'
|
|
||||||
_TESTS = [{
|
|
||||||
'url': 'https://vine.co/itsruthb',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'itsruthb',
|
|
||||||
'title': 'Ruth B',
|
|
||||||
'description': '| Instagram/Twitter: itsruthb | still a lost boy from neverland',
|
|
||||||
},
|
|
||||||
'playlist_mincount': 611,
|
|
||||||
}, {
|
|
||||||
'url': 'https://vine.co/u/942914934646415360',
|
|
||||||
'only_matching': True,
|
|
||||||
}]
|
|
||||||
|
|
||||||
@classmethod
|
|
||||||
def suitable(cls, url):
|
|
||||||
return False if VineIE.suitable(url) else super().suitable(url)
|
|
||||||
|
|
||||||
def _real_extract(self, url):
|
|
||||||
mobj = self._match_valid_url(url)
|
|
||||||
user = mobj.group('user')
|
|
||||||
u = mobj.group('u')
|
|
||||||
|
|
||||||
profile_url = '{}api/users/profiles/{}{}'.format(
|
|
||||||
self._VINE_BASE_URL, 'vanity/' if not u else '', user)
|
|
||||||
profile_data = self._download_json(
|
|
||||||
profile_url, user, note='Downloading user profile data')
|
|
||||||
|
|
||||||
data = profile_data['data']
|
|
||||||
user_id = data.get('userId') or data['userIdStr']
|
|
||||||
profile = self._download_json(
|
|
||||||
f'https://archive.vine.co/profiles/{user_id}.json', user_id)
|
|
||||||
entries = [
|
|
||||||
self.url_result(
|
|
||||||
f'https://vine.co/v/{post_id}', ie='Vine', video_id=post_id)
|
|
||||||
for post_id in profile['posts']
|
|
||||||
if post_id and isinstance(post_id, str)]
|
|
||||||
return self.playlist_result(
|
|
||||||
entries, user, profile.get('username'), profile.get('description'))
|
|
@ -17,10 +17,10 @@
|
|||||||
get_element_html_by_id,
|
get_element_html_by_id,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
join_nonempty,
|
join_nonempty,
|
||||||
|
parse_qs,
|
||||||
parse_resolution,
|
parse_resolution,
|
||||||
str_or_none,
|
str_or_none,
|
||||||
str_to_int,
|
str_to_int,
|
||||||
traverse_obj,
|
|
||||||
try_call,
|
try_call,
|
||||||
unescapeHTML,
|
unescapeHTML,
|
||||||
unified_timestamp,
|
unified_timestamp,
|
||||||
@ -29,6 +29,7 @@
|
|||||||
urlencode_postdata,
|
urlencode_postdata,
|
||||||
urljoin,
|
urljoin,
|
||||||
)
|
)
|
||||||
|
from ..utils.traversal import require, traverse_obj
|
||||||
|
|
||||||
|
|
||||||
class VKBaseIE(InfoExtractor):
|
class VKBaseIE(InfoExtractor):
|
||||||
@ -91,17 +92,17 @@ def _download_payload(self, path, video_id, data, fatal=True):
|
|||||||
class VKIE(VKBaseIE):
|
class VKIE(VKBaseIE):
|
||||||
IE_NAME = 'vk'
|
IE_NAME = 'vk'
|
||||||
IE_DESC = 'VK'
|
IE_DESC = 'VK'
|
||||||
_EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1']
|
_EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk(?:(?:video)?\.ru|\.com)/video_ext\.php.+?)\1']
|
||||||
_VALID_URL = r'''(?x)
|
_VALID_URL = r'''(?x)
|
||||||
https?://
|
https?://
|
||||||
(?:
|
(?:
|
||||||
(?:
|
(?:
|
||||||
(?:(?:m|new)\.)?vk\.com/video_|
|
(?:(?:m|new)\.)?vk(?:(?:video)?\.ru|\.com)/video_|
|
||||||
(?:www\.)?daxab\.com/
|
(?:www\.)?daxab\.com/
|
||||||
)
|
)
|
||||||
ext\.php\?(?P<embed_query>.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+).*)|
|
ext\.php\?(?P<embed_query>.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+).*)|
|
||||||
(?:
|
(?:
|
||||||
(?:(?:m|new)\.)?vk\.com/(?:.+?\?.*?z=)?(?:video|clip)|
|
(?:(?:m|new)\.)?vk(?:(?:video)?\.ru|\.com)/(?:.+?\?.*?z=)?(?:video|clip)|
|
||||||
(?:www\.)?daxab\.com/embed/
|
(?:www\.)?daxab\.com/embed/
|
||||||
)
|
)
|
||||||
(?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>([\da-f]+)|(ln-[\da-zA-Z]+)))?
|
(?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>([\da-f]+)|(ln-[\da-zA-Z]+)))?
|
||||||
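The host pattern introduced in this hunk, `vk(?:(?:video)?\.ru|\.com)`, accepts `vk.com`, `vk.ru` and `vkvideo.ru` but not `vkvideo.com`. The check below uses a deliberately simplified pattern (`VK_VIDEO_RE` is an illustrative name and omits the `video_ext.php`, `daxab.com` and `z=` branches of the real `_VALID_URL`) to show which hosts pass.

```python
import re

# Host alternation copied from the hunk; the rest of the pattern is simplified.
VK_HOST_RE = r'vk(?:(?:video)?\.ru|\.com)'
VK_VIDEO_RE = re.compile(
    rf'https?://(?:(?:m|new)\.)?{VK_HOST_RE}/(?:video|clip)(?P<videoid>-?\d+_\d+)')


def match_video_id(url):
    """Return the owner_id/video_id pair from a VK video URL, or None if it does not match."""
    mobj = VK_VIDEO_RE.match(url)
    return mobj.group('videoid') if mobj else None
```

The two `only_matching` test URLs added later in this diff (`vkvideo.ru` and `vk.ru`) both match this simplified pattern.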
@ -110,7 +111,7 @@ class VKIE(VKBaseIE):
|
|||||||
|
|
||||||
_TESTS = [
|
_TESTS = [
|
||||||
{
|
{
|
||||||
'url': 'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
|
'url': 'https://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '-77521_162222515',
|
'id': '-77521_162222515',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
@ -127,7 +128,7 @@ class VKIE(VKBaseIE):
|
|||||||
'params': {'skip_download': 'm3u8'},
|
'params': {'skip_download': 'm3u8'},
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'http://vk.com/video205387401_165548505',
|
'url': 'https://vk.com/video205387401_165548505',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '205387401_165548505',
|
'id': '205387401_165548505',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
@ -182,10 +183,10 @@ class VKIE(VKBaseIE):
|
|||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
|
'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
|
||||||
'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
|
'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
|
||||||
'duration': 178,
|
'duration': 179,
|
||||||
'upload_date': '20130117',
|
'upload_date': '20130117',
|
||||||
'uploader': "Children's Joy Foundation Inc.",
|
'uploader': "Children's Joy Foundation Inc.",
|
||||||
'uploader_id': 'thecjf',
|
'uploader_id': '@CJFIofficial',
|
||||||
'view_count': int,
|
'view_count': int,
|
||||||
'channel_id': 'UCgzCNQ11TmR9V97ECnhi3gw',
|
'channel_id': 'UCgzCNQ11TmR9V97ECnhi3gw',
|
||||||
'availability': 'public',
|
'availability': 'public',
|
||||||
@ -193,7 +194,7 @@ class VKIE(VKBaseIE):
|
|||||||
'live_status': 'not_live',
|
'live_status': 'not_live',
|
||||||
'playable_in_embed': True,
|
'playable_in_embed': True,
|
||||||
'channel': 'Children\'s Joy Foundation Inc.',
|
'channel': 'Children\'s Joy Foundation Inc.',
|
||||||
'uploader_url': 'http://www.youtube.com/user/thecjf',
|
'uploader_url': 'https://www.youtube.com/@CJFIofficial',
|
||||||
'thumbnail': r're:https?://.+\.jpg$',
|
'thumbnail': r're:https?://.+\.jpg$',
|
||||||
'tags': 'count:27',
|
'tags': 'count:27',
|
||||||
'start_time': 0.0,
|
'start_time': 0.0,
|
||||||
@ -201,6 +202,7 @@ class VKIE(VKBaseIE):
|
|||||||
'channel_url': 'https://www.youtube.com/channel/UCgzCNQ11TmR9V97ECnhi3gw',
|
'channel_url': 'https://www.youtube.com/channel/UCgzCNQ11TmR9V97ECnhi3gw',
|
||||||
'channel_follower_count': int,
|
'channel_follower_count': int,
|
||||||
'age_limit': 0,
|
'age_limit': 0,
|
||||||
|
'timestamp': 1358394935,
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@ -222,6 +224,7 @@ class VKIE(VKBaseIE):
|
|||||||
'thumbnail': r're:https?://.+x1080$',
|
'thumbnail': r're:https?://.+x1080$',
|
||||||
'tags': list,
|
'tags': list,
|
||||||
},
|
},
|
||||||
|
'skip': 'This video has been deleted and is no longer available.',
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'https://vk.com/clips-74006511?z=clip-74006511_456247211',
|
'url': 'https://vk.com/clips-74006511?z=clip-74006511_456247211',
|
||||||
@ -235,13 +238,13 @@ class VKIE(VKBaseIE):
|
|||||||
'timestamp': 1664995597,
|
'timestamp': 1664995597,
|
||||||
'title': 'Clip by @madempress',
|
'title': 'Clip by @madempress',
|
||||||
'upload_date': '20221005',
|
'upload_date': '20221005',
|
||||||
'uploader': 'Шальная императрица',
|
'uploader': 'Шальная Императрица',
|
||||||
'uploader_id': '-74006511',
|
'uploader_id': '-74006511',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
# video key is extra_data not url\d+
|
# video key is extra_data not url\d+
|
||||||
'url': 'http://vk.com/video-110305615_171782105',
|
'url': 'https://vk.com/video-110305615_171782105',
|
||||||
'md5': 'e13fcda136f99764872e739d13fac1d1',
|
'md5': 'e13fcda136f99764872e739d13fac1d1',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '-110305615_171782105',
|
'id': '-110305615_171782105',
|
||||||
@ -273,6 +276,7 @@ class VKIE(VKBaseIE):
|
|||||||
'params': {
|
'params': {
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'skip': 'No formats found',
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
# live stream, hls and rtmp links, most likely already finished live
|
# live stream, hls and rtmp links, most likely already finished live
|
||||||
@ -312,7 +316,16 @@ class VKIE(VKBaseIE):
|
|||||||
{
|
{
|
||||||
'url': 'https://vk.com/clip30014565_456240946',
|
'url': 'https://vk.com/clip30014565_456240946',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}]
|
},
|
||||||
|
{
|
||||||
|
'url': 'https://vkvideo.ru/video-127553155_456242961',
|
||||||
|
'only_matching': True,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
'url': 'https://vk.ru/video-220754053_456242564',
|
||||||
|
'only_matching': True,
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = self._match_valid_url(url)
|
mobj = self._match_valid_url(url)
|
||||||
@ -338,7 +351,7 @@ def _real_extract(self, url):
|
|||||||
video_id = '{}_{}'.format(mobj.group('oid'), mobj.group('id'))
|
video_id = '{}_{}'.format(mobj.group('oid'), mobj.group('id'))
|
||||||
|
|
||||||
info_page = self._download_webpage(
|
info_page = self._download_webpage(
|
||||||
'http://vk.com/video_ext.php?' + mobj.group('embed_query'), video_id)
|
'https://vk.com/video_ext.php?' + mobj.group('embed_query'), video_id)
|
||||||
|
|
||||||
error_message = self._html_search_regex(
|
error_message = self._html_search_regex(
|
||||||
[r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
|
[r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
|
||||||
@ -432,7 +445,7 @@ def _real_extract(self, url):
|
|||||||
if m_opts_url:
|
if m_opts_url:
|
||||||
opts_url = m_opts_url.group(1)
|
opts_url = m_opts_url.group(1)
|
||||||
if opts_url.startswith('//'):
|
if opts_url.startswith('//'):
|
||||||
opts_url = 'http:' + opts_url
|
opts_url = 'https:' + opts_url
|
||||||
return self.url_result(opts_url)
|
return self.url_result(opts_url)
|
||||||
|
|
||||||
data = player['params'][0]
|
data = player['params'][0]
|
||||||
@ -512,8 +525,11 @@ def _real_extract(self, url):
|
|||||||
class VKUserVideosIE(VKBaseIE):
|
class VKUserVideosIE(VKBaseIE):
|
||||||
IE_NAME = 'vk:uservideos'
|
IE_NAME = 'vk:uservideos'
|
||||||
IE_DESC = "VK - User's Videos"
|
IE_DESC = "VK - User's Videos"
|
||||||
_VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/video/(?:playlist/)?(?P<id>[^?$#/&]+)(?!\?.*\bz=video)(?:[/?#&](?:.*?\bsection=(?P<section>\w+))?|$)'
|
_BASE_URL_RE = r'https?://(?:(?:m|new)\.)?vk(?:video\.ru|\.com/video)'
|
||||||
_TEMPLATE_URL = 'https://vk.com/videos'
|
_VALID_URL = [
|
||||||
|
rf'{_BASE_URL_RE}/playlist/(?P<id>-?\d+_\d+)',
|
||||||
|
rf'{_BASE_URL_RE}/(?P<id>@[^/?#]+)(?:/all)?/?(?!\?.*\bz=video)(?:[?#]|$)',
|
||||||
|
]
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://vk.com/video/@mobidevices',
|
'url': 'https://vk.com/video/@mobidevices',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -527,12 +543,20 @@ class VKUserVideosIE(VKBaseIE):
|
|||||||
},
|
},
|
||||||
'playlist_mincount': 182,
|
'playlist_mincount': 182,
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://vk.com/video/playlist/-174476437_2',
|
'url': 'https://vkvideo.ru/playlist/-204353299_426',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '-174476437_playlist_2',
|
'id': '-204353299_playlist_426',
|
||||||
'title': 'Анонсы',
|
|
||||||
},
|
},
|
||||||
'playlist_mincount': 108,
|
'playlist_mincount': 33,
|
||||||
|
}, {
|
||||||
|
'url': 'https://vk.com/video/@gorkyfilmstudio/all',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://vkvideo.ru/@mobidevices',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://vk.com/video/playlist/-174476437_2',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
_VIDEO = collections.namedtuple('Video', ['owner_id', 'id'])
|
_VIDEO = collections.namedtuple('Video', ['owner_id', 'id'])
|
||||||
|
|
||||||
@ -552,7 +576,7 @@ def _entries(self, page_id, section):
|
|||||||
v = self._VIDEO._make(video[:2])
|
v = self._VIDEO._make(video[:2])
|
||||||
video_id = '%d_%d' % (v.owner_id, v.id)
|
video_id = '%d_%d' % (v.owner_id, v.id)
|
||||||
yield self.url_result(
|
yield self.url_result(
|
||||||
'http://vk.com/video' + video_id, VKIE.ie_key(), video_id)
|
'https://vk.com/video' + video_id, VKIE.ie_key(), video_id)
|
||||||
if count >= total:
|
if count >= total:
|
||||||
break
|
break
|
||||||
video_list_json = self._download_payload('al_video', page_id, {
|
video_list_json = self._download_payload('al_video', page_id, {
|
||||||
@ -561,23 +585,25 @@ def _entries(self, page_id, section):
|
|||||||
'oid': page_id,
|
'oid': page_id,
|
||||||
'section': section,
|
'section': section,
|
||||||
})[0][section]
|
})[0][section]
|
||||||
count += video_list_json['count']
|
new_count = video_list_json['count']
|
||||||
|
if not new_count:
|
||||||
|
self.to_screen(f'{page_id}: Skipping {total - count} unavailable videos')
|
||||||
|
break
|
||||||
|
count += new_count
|
||||||
video_list = video_list_json['list']
|
video_list = video_list_json['list']
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
u_id, section = self._match_valid_url(url).groups()
|
u_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, u_id)
|
webpage = self._download_webpage(url, u_id)
|
||||||
|
|
||||||
if u_id.startswith('@'):
|
if u_id.startswith('@'):
|
||||||
page_id = self._search_regex(r'data-owner-id\s?=\s?"([^"]+)"', webpage, 'page_id')
|
page_id = traverse_obj(
|
||||||
elif '_' in u_id:
|
self._search_json(r'\bvar newCur\s*=', webpage, 'cursor data', u_id),
|
||||||
page_id, section = u_id.split('_', 1)
|
('oid', {int}, {str_or_none}, {require('page id')}))
|
||||||
section = f'playlist_{section}'
|
section = traverse_obj(parse_qs(url), ('section', 0)) or 'all'
|
||||||
else:
|
else:
|
||||||
raise ExtractorError('Invalid URL', expected=True)
|
page_id, _, section = u_id.partition('_')
|
||||||
|
section = f'playlist_{section}'
|
||||||
if not section:
|
|
||||||
section = 'all'
|
|
||||||
|
|
||||||
playlist_title = clean_html(get_element_by_class('VideoInfoPanel__title', webpage))
|
playlist_title = clean_html(get_element_by_class('VideoInfoPanel__title', webpage))
|
||||||
return self.playlist_result(self._entries(page_id, section), f'{page_id}_{section}', playlist_title)
|
return self.playlist_result(self._entries(page_id, section), f'{page_id}_{section}', playlist_title)
|
||||||
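The `_entries` change above stops paging when a request returns a `count` of zero, which would otherwise leave `count` stuck below `total` and loop indefinitely. The sketch below models that guard in isolation. The surrounding loop shape is an assumption based on the visible context lines, and `collect_video_ids` and `fetch_page` are hypothetical names, not extractor methods.

```python
def collect_video_ids(first_page, fetch_page, total):
    """Page through results until `total` is reached or a page comes back empty.

    `first_page` and the return value of `fetch_page(offset)` are assumed to be
    dicts of the form {'count': int, 'list': [...]}.
    Returns (ids, skipped), where `skipped` is the number of unavailable items.
    """
    ids = []
    count = first_page['count']
    video_list = first_page['list']
    skipped = 0
    while True:
        ids.extend(video_list)
        if count >= total:
            break
        page = fetch_page(count)
        new_count = page['count']
        if not new_count:
            # Nothing more is served even though total says otherwise: stop here.
            skipped = total - count
            break
        count += new_count
        video_list = page['list']
    return ids, skipped
```

In the extractor the skipped count is only reported via `to_screen`; returning it here just makes the behaviour easy to check.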
@ -717,7 +743,7 @@ def _real_extract(self, url):
|
|||||||
|
|
||||||
|
|
||||||
class VKPlayBaseIE(InfoExtractor):
|
class VKPlayBaseIE(InfoExtractor):
|
||||||
_BASE_URL_RE = r'https?://(?:vkplay\.live|live\.vkplay\.ru)/'
|
_BASE_URL_RE = r'https?://(?:vkplay\.live|live\.vk(?:play|video)\.ru)/'
|
||||||
_RESOLUTIONS = {
|
_RESOLUTIONS = {
|
||||||
'tiny': '256x144',
|
'tiny': '256x144',
|
||||||
'lowest': '426x240',
|
'lowest': '426x240',
|
||||||
@ -797,6 +823,9 @@ class VKPlayIE(VKPlayBaseIE):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://live.vkplay.ru/lebwa/record/33a4e4ce-e3ef-49db-bb14-f006cc6fabc9/records',
|
'url': 'https://live.vkplay.ru/lebwa/record/33a4e4ce-e3ef-49db-bb14-f006cc6fabc9/records',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://live.vkvideo.ru/lebwa/record/33a4e4ce-e3ef-49db-bb14-f006cc6fabc9/records',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
@ -839,6 +868,9 @@ class VKPlayLiveIE(VKPlayBaseIE):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://live.vkplay.ru/lebwa',
|
'url': 'https://live.vkplay.ru/lebwa',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://live.vkvideo.ru/panterka',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
|
@ -124,7 +124,7 @@ def _parse_video_info(self, video_info, video_id=None):
|
|||||||
|
|
||||||
|
|
||||||
class WeiboIE(WeiboBaseIE):
|
class WeiboIE(WeiboBaseIE):
|
||||||
_VALID_URL = r'https?://(?:m\.weibo\.cn/status|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
|
_VALID_URL = r'https?://(?:m\.weibo\.cn/(?:status|detail)|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://weibo.com/7827771738/N4xlMvjhI',
|
'url': 'https://weibo.com/7827771738/N4xlMvjhI',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -164,6 +164,25 @@ class WeiboIE(WeiboBaseIE):
|
|||||||
'like_count': int,
|
'like_count': int,
|
||||||
'repost_count': int,
|
'repost_count': int,
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://m.weibo.cn/detail/4189191225395228',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '4189191225395228',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'display_id': 'FBqgOmDxO',
|
||||||
|
'title': '柴犬柴犬的秒拍视频',
|
||||||
|
'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagram:shibainu.gaku http://t.cn/RHbmjzW ',
|
||||||
|
'duration': 53,
|
||||||
|
'timestamp': 1514264429,
|
||||||
|
'upload_date': '20171226',
|
||||||
|
'thumbnail': r're:https://.*\.jpg',
|
||||||
|
'uploader': '柴犬柴犬',
|
||||||
|
'uploader_id': '5926682210',
|
||||||
|
'uploader_url': 'https://weibo.com/u/5926682210',
|
||||||
|
'view_count': int,
|
||||||
|
'like_count': int,
|
||||||
|
'repost_count': int,
|
||||||
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://weibo.com/0/4224132150961381',
|
'url': 'https://weibo.com/0/4224132150961381',
|
||||||
'note': 'no playback_list example',
|
'note': 'no playback_list example',
|
||||||
|
@ -5,12 +5,13 @@
|
|||||||
int_or_none,
|
int_or_none,
|
||||||
js_to_json,
|
js_to_json,
|
||||||
url_or_none,
|
url_or_none,
|
||||||
|
urlhandle_detect_ext,
|
||||||
)
|
)
|
||||||
from ..utils.traversal import traverse_obj
|
from ..utils.traversal import traverse_obj
|
||||||
|
|
||||||
|
|
||||||
class XiaoHongShuIE(InfoExtractor):
|
class XiaoHongShuIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://www\.xiaohongshu\.com/explore/(?P<id>[\da-f]+)'
|
_VALID_URL = r'https?://www\.xiaohongshu\.com/(?:explore|discovery/item)/(?P<id>[\da-f]+)'
|
||||||
IE_DESC = '小红书'
|
IE_DESC = '小红书'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.xiaohongshu.com/explore/6411cf99000000001300b6d9',
|
'url': 'https://www.xiaohongshu.com/explore/6411cf99000000001300b6d9',
|
||||||
@ -25,6 +26,18 @@ class XiaoHongShuIE(InfoExtractor):
|
|||||||
'duration': 101.726,
|
'duration': 101.726,
|
||||||
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[a-z0-9]+/[\w]+',
|
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[a-z0-9]+/[\w]+',
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.xiaohongshu.com/discovery/item/674051740000000007027a15?xsec_token=CBgeL8Dxd1ZWBhwqRd568gAZ_iwG-9JIf9tnApNmteU2E=',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '674051740000000007027a15',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '相互喜欢就可以了',
|
||||||
|
'uploader_id': '63439913000000001901f49a',
|
||||||
|
'duration': 28.073,
|
||||||
|
'description': '#广州[话题]# #深圳[话题]# #香港[话题]# #街头采访[话题]# #是你喜欢的类型[话题]#',
|
||||||
|
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[\da-f]+/[^/]+',
|
||||||
|
'tags': ['广州', '深圳', '香港', '街头采访', '是你喜欢的类型'],
|
||||||
|
},
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
@ -34,7 +47,7 @@ def _real_extract(self, url):
|
|||||||
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json)
|
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json)
|
||||||
|
|
||||||
note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note'))
|
note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note'))
|
||||||
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ('h264', 'av1', 'h265'), ...))
|
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ..., ...))
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
for info in video_info:
|
for info in video_info:
|
||||||
@ -44,18 +57,32 @@ def _real_extract(self, url):
                 'height': ('height', {int_or_none}),
                 'vcodec': ('videoCodec', {str}),
                 'acodec': ('audioCodec', {str}),
-                'abr': ('audioBitrate', {int_or_none}),
-                'vbr': ('videoBitrate', {int_or_none}),
+                'abr': ('audioBitrate', {int_or_none(scale=1000)}),
+                'vbr': ('videoBitrate', {int_or_none(scale=1000)}),
                 'audio_channels': ('audioChannels', {int_or_none}),
-                'tbr': ('avgBitrate', {int_or_none}),
+                'tbr': ('avgBitrate', {int_or_none(scale=1000)}),
                 'format': ('qualityType', {str}),
                 'filesize': ('size', {int_or_none}),
                 'duration': ('duration', {float_or_none(scale=1000)}),
             })

-            formats.extend(traverse_obj(info, (('mediaUrl', ('backupUrls', ...)), {
+            formats.extend(traverse_obj(info, (('masterUrl', ('backupUrls', ...)), {
                 lambda u: url_or_none(u) and {'url': u, **format_info}})))

+        if origin_key := traverse_obj(note_info, ('video', 'consumer', 'originVideoKey', {str})):
+            # Not using a head request because of false negatives
+            urlh = self._request_webpage(
+                f'https://sns-video-bd.xhscdn.com/{origin_key}', display_id,
+                'Checking original video availability', 'Original video is not available', fatal=False)
+            if urlh:
+                formats.append({
+                    'format_id': 'direct',
+                    'ext': urlhandle_detect_ext(urlh, default='mp4'),
+                    'filesize': int_or_none(urlh.get_header('Content-Length')),
+                    'url': urlh.url,
+                    'quality': 1,
+                })
+
         thumbnails = []
         for image_info in traverse_obj(note_info, ('imageList', ...)):
             thumbnail_info = traverse_obj(image_info, {
|
@ -32,7 +32,6 @@
|
|||||||
classproperty,
|
classproperty,
|
||||||
clean_html,
|
clean_html,
|
||||||
datetime_from_str,
|
datetime_from_str,
|
||||||
dict_get,
|
|
||||||
filesize_from_tbr,
|
filesize_from_tbr,
|
||||||
filter_dict,
|
filter_dict,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
@ -78,53 +77,60 @@
|
|||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'WEB',
|
'clientName': 'WEB',
|
||||||
'clientVersion': '2.20240726.00.00',
|
'clientVersion': '2.20241126.01.00',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
|
||||||
'REQUIRE_PO_TOKEN': True,
|
'REQUIRE_PO_TOKEN': True,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
|
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
|
||||||
'web_safari': {
|
'web_safari': {
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'WEB',
|
'clientName': 'WEB',
|
||||||
'clientVersion': '2.20240726.00.00',
|
'clientVersion': '2.20241126.01.00',
|
||||||
'userAgent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15,gzip(gfe)',
|
'userAgent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15,gzip(gfe)',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
|
||||||
'REQUIRE_PO_TOKEN': True,
|
'REQUIRE_PO_TOKEN': True,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
'web_embedded': {
|
'web_embedded': {
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'WEB_EMBEDDED_PLAYER',
|
'clientName': 'WEB_EMBEDDED_PLAYER',
|
||||||
'clientVersion': '1.20240723.01.00',
|
'clientVersion': '1.20241201.00.00',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 56,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 56,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
'web_music': {
|
'web_music': {
|
||||||
'INNERTUBE_HOST': 'music.youtube.com',
|
'INNERTUBE_HOST': 'music.youtube.com',
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'WEB_REMIX',
|
'clientName': 'WEB_REMIX',
|
||||||
'clientVersion': '1.20240724.00.00',
|
'clientVersion': '1.20241127.01.00',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
# This client now requires sign-in for every video
|
# This client now requires sign-in for every video
|
||||||
'web_creator': {
|
'web_creator': {
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'WEB_CREATOR',
|
'clientName': 'WEB_CREATOR',
|
||||||
'clientVersion': '1.20240723.03.00',
|
'clientVersion': '1.20241203.01.00',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
'REQUIRE_AUTH': True,
|
'REQUIRE_AUTH': True,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
'android': {
|
'android': {
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
@ -208,6 +214,7 @@
|
|||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
|
||||||
'REQUIRE_JS_PLAYER': False,
|
'REQUIRE_JS_PLAYER': False,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
},
|
},
|
||||||
# This client now requires sign-in for every video
|
# This client now requires sign-in for every video
|
||||||
'ios_music': {
|
'ios_music': {
|
||||||
@ -224,6 +231,7 @@
|
|||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
|
||||||
'REQUIRE_JS_PLAYER': False,
|
'REQUIRE_JS_PLAYER': False,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
'REQUIRE_AUTH': True,
|
'REQUIRE_AUTH': True,
|
||||||
},
|
},
|
||||||
# This client now requires sign-in for every video
|
# This client now requires sign-in for every video
|
||||||
@ -241,6 +249,7 @@
|
|||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 15,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 15,
|
||||||
'REQUIRE_JS_PLAYER': False,
|
'REQUIRE_JS_PLAYER': False,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
'REQUIRE_AUTH': True,
|
'REQUIRE_AUTH': True,
|
||||||
},
|
},
|
||||||
# mweb has 'ultralow' formats
|
# mweb has 'ultralow' formats
|
||||||
@ -249,19 +258,24 @@
|
|||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'MWEB',
|
'clientName': 'MWEB',
|
||||||
'clientVersion': '2.20240726.01.00',
|
'clientVersion': '2.20241202.07.00',
|
||||||
|
# mweb previously did not require PO Token with this UA
|
||||||
|
'userAgent': 'Mozilla/5.0 (iPad; CPU OS 16_7_10 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1,gzip(gfe)',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
|
||||||
|
'REQUIRE_PO_TOKEN': True,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
'tv': {
|
'tv': {
|
||||||
'INNERTUBE_CONTEXT': {
|
'INNERTUBE_CONTEXT': {
|
||||||
'client': {
|
'client': {
|
||||||
'clientName': 'TVHTML5',
|
'clientName': 'TVHTML5',
|
||||||
'clientVersion': '7.20240724.13.00',
|
'clientVersion': '7.20241201.18.00',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
|
'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
|
||||||
|
'SUPPORTS_COOKIES': True,
|
||||||
},
|
},
|
||||||
# This client now requires sign-in for every video
|
# This client now requires sign-in for every video
|
||||||
# It was previously an age-gate workaround for videos that were `playable_in_embed`
|
# It was previously an age-gate workaround for videos that were `playable_in_embed`
|
||||||
@ -275,19 +289,7 @@
         },
         'INNERTUBE_CONTEXT_CLIENT_NAME': 85,
         'REQUIRE_AUTH': True,
-    },
-    # This client now requires sign-in for every video
-    # It may be able to receive pre-merged video+audio 720p/1080p streams
-    'mediaconnect': {
-        'INNERTUBE_CONTEXT': {
-            'client': {
-                'clientName': 'MEDIA_CONNECT_FRONTEND',
-                'clientVersion': '0.1',
-            },
-        },
-        'INNERTUBE_CONTEXT_CLIENT_NAME': 95,
-        'REQUIRE_JS_PLAYER': False,
-        'REQUIRE_AUTH': True,
+        'SUPPORTS_COOKIES': True,
     },
 }

@ -317,6 +319,7 @@ def build_innertube_clients():
|
|||||||
ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
|
ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
|
||||||
ytcfg.setdefault('REQUIRE_PO_TOKEN', False)
|
ytcfg.setdefault('REQUIRE_PO_TOKEN', False)
|
||||||
ytcfg.setdefault('REQUIRE_AUTH', False)
|
ytcfg.setdefault('REQUIRE_AUTH', False)
|
||||||
|
ytcfg.setdefault('SUPPORTS_COOKIES', False)
|
||||||
ytcfg.setdefault('PLAYER_PARAMS', None)
|
ytcfg.setdefault('PLAYER_PARAMS', None)
|
||||||
ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en')
|
ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en')
|
||||||
|
|
||||||
@ -518,11 +521,12 @@ def ucid_or_none(self, ucid):
|
|||||||
return self._search_regex(rf'^({self._YT_CHANNEL_UCID_RE})$', ucid, 'UC-id', default=None)
|
return self._search_regex(rf'^({self._YT_CHANNEL_UCID_RE})$', ucid, 'UC-id', default=None)
|
||||||
|
|
||||||
def handle_or_none(self, handle):
|
def handle_or_none(self, handle):
|
||||||
return self._search_regex(rf'^({self._YT_HANDLE_RE})$', handle, '@-handle', default=None)
|
return self._search_regex(rf'^({self._YT_HANDLE_RE})$', urllib.parse.unquote(handle or ''),
|
||||||
|
'@-handle', default=None)
|
||||||
|
|
||||||
def handle_from_url(self, url):
|
def handle_from_url(self, url):
|
||||||
return self._search_regex(rf'^(?:https?://(?:www\.)?youtube\.com)?/({self._YT_HANDLE_RE})',
|
return self._search_regex(rf'^(?:https?://(?:www\.)?youtube\.com)?/({self._YT_HANDLE_RE})',
|
||||||
url, 'channel handle', default=None)
|
urllib.parse.unquote(url or ''), 'channel handle', default=None)
|
||||||
|
|
||||||
def ucid_from_url(self, url):
|
def ucid_from_url(self, url):
|
||||||
return self._search_regex(rf'^(?:https?://(?:www\.)?youtube\.com)?/({self._YT_CHANNEL_UCID_RE})',
|
return self._search_regex(rf'^(?:https?://(?:www\.)?youtube\.com)?/({self._YT_CHANNEL_UCID_RE})',
|
||||||
@ -567,9 +571,15 @@ def _initialize_pref(self):
|
|||||||
pref.update({'hl': self._preferred_lang or 'en', 'tz': 'UTC'})
|
pref.update({'hl': self._preferred_lang or 'en', 'tz': 'UTC'})
|
||||||
self._set_cookie('.youtube.com', name='PREF', value=urllib.parse.urlencode(pref))
|
self._set_cookie('.youtube.com', name='PREF', value=urllib.parse.urlencode(pref))
|
||||||
|
|
||||||
|
def _initialize_cookie_auth(self):
|
||||||
|
yt_sapisid, yt_1psapisid, yt_3psapisid = self._get_sid_cookies()
|
||||||
|
if yt_sapisid or yt_1psapisid or yt_3psapisid:
|
||||||
|
self.write_debug('Found YouTube account cookies')
|
||||||
|
|
||||||
def _real_initialize(self):
|
def _real_initialize(self):
|
||||||
self._initialize_pref()
|
self._initialize_pref()
|
||||||
self._initialize_consent()
|
self._initialize_consent()
|
||||||
|
self._initialize_cookie_auth()
|
||||||
self._check_login_required()
|
self._check_login_required()
|
||||||
|
|
||||||
def _perform_login(self, username, password):
|
def _perform_login(self, username, password):
|
||||||
@ -627,32 +637,63 @@ def _extract_context(self, ytcfg=None, default_client='web'):
         client_context.update({'hl': self._preferred_lang or 'en', 'timeZone': 'UTC', 'utcOffsetMinutes': 0})
         return context

-    _SAPISID = None
+    @staticmethod
+    def _make_sid_authorization(scheme, sid, origin, additional_parts):
+        timestamp = str(round(time.time()))

-    def _generate_sapisidhash_header(self, origin='https://www.youtube.com'):
-        time_now = round(time.time())
-        if self._SAPISID is None:
-            yt_cookies = self._get_cookies('https://www.youtube.com')
-            # Sometimes SAPISID cookie isn't present but __Secure-3PAPISID is.
-            # See: https://github.com/yt-dlp/yt-dlp/issues/393
-            sapisid_cookie = dict_get(
-                yt_cookies, ('__Secure-3PAPISID', 'SAPISID'))
-            if sapisid_cookie and sapisid_cookie.value:
-                self._SAPISID = sapisid_cookie.value
-                self.write_debug('Extracted SAPISID cookie')
-                # SAPISID cookie is required if not already present
-                if not yt_cookies.get('SAPISID'):
-                    self.write_debug('Copying __Secure-3PAPISID cookie to SAPISID cookie')
-                    self._set_cookie(
-                        '.youtube.com', 'SAPISID', self._SAPISID, secure=True, expire_time=time_now + 3600)
-            else:
-                self._SAPISID = False
-        if not self._SAPISID:
+        hash_parts = []
+        if additional_parts:
+            hash_parts.append(':'.join(additional_parts.values()))
+        hash_parts.extend([timestamp, sid, origin])
+        sidhash = hashlib.sha1(' '.join(hash_parts).encode()).hexdigest()
+
+        parts = [timestamp, sidhash]
+        if additional_parts:
+            parts.append(''.join(additional_parts))
+
+        return f'{scheme} {"_".join(parts)}'
+
+    def _get_sid_cookies(self):
+        """
+        Get SAPISID, 1PSAPISID, 3PSAPISID cookie values
+        @returns sapisid, 1psapisid, 3psapisid
+        """
+        yt_cookies = self._get_cookies('https://www.youtube.com')
+        yt_sapisid = try_call(lambda: yt_cookies['SAPISID'].value)
+        yt_3papisid = try_call(lambda: yt_cookies['__Secure-3PAPISID'].value)
+        yt_1papisid = try_call(lambda: yt_cookies['__Secure-1PAPISID'].value)
+
+        # Sometimes SAPISID cookie isn't present but __Secure-3PAPISID is.
+        # YouTube also falls back to __Secure-3PAPISID if SAPISID is missing.
+        # See: https://github.com/yt-dlp/yt-dlp/issues/393
+
+        return yt_sapisid or yt_3papisid, yt_1papisid, yt_3papisid
+
+    def _get_sid_authorization_header(self, origin='https://www.youtube.com', user_session_id=None):
+        """
+        Generate API Session ID Authorization for Innertube requests. Assumes all requests are secure (https).
+        @param origin: Origin URL
+        @param user_session_id: Optional User Session ID
+        @return: Authorization header value
+        """
+
+        authorizations = []
+        additional_parts = {}
+        if user_session_id:
+            additional_parts['u'] = user_session_id
+
+        yt_sapisid, yt_1psapisid, yt_3psapisid = self._get_sid_cookies()
+
+        for scheme, sid in (('SAPISIDHASH', yt_sapisid),
+                            ('SAPISID1PHASH', yt_1psapisid),
+                            ('SAPISID3PHASH', yt_3psapisid)):
+            if sid:
+                authorizations.append(self._make_sid_authorization(scheme, sid, origin, additional_parts))
+
+        if not authorizations:
             return None
-        # SAPISIDHASH algorithm from https://stackoverflow.com/a/32065323
-        sapisidhash = hashlib.sha1(
-            f'{time_now} {self._SAPISID} {origin}'.encode()).hexdigest()
-        return f'SAPISIDHASH {time_now}_{sapisidhash}'
+
+        return ' '.join(authorizations)

     def _call_api(self, ep, query, video_id, fatal=True, headers=None,
                   note='Downloading API JSON', errnote='Unable to download API page',
@ -688,26 +729,48 @@ def _extract_session_index(*data):
             if session_index is not None:
                 return session_index

-    def _data_sync_id_to_delegated_session_id(self, data_sync_id):
-        if not data_sync_id:
-            return
-        # datasyncid is of the form "channel_syncid||user_syncid" for secondary channel
-        # and just "user_syncid||" for primary channel. We only want the channel_syncid
-        channel_syncid, _, user_syncid = data_sync_id.partition('||')
-        if user_syncid:
-            return channel_syncid
-
-    def _extract_account_syncid(self, *args):
+    @staticmethod
+    def _parse_data_sync_id(data_sync_id):
         """
-        Extract current session ID required to download private playlists of secondary channels
+        Parse data_sync_id into delegated_session_id and user_session_id.
+
+        data_sync_id is of the form "delegated_session_id||user_session_id" for secondary channel
+        and just "user_session_id||" for primary channel.
+
+        @param data_sync_id: data_sync_id string
+        @return: Tuple of (delegated_session_id, user_session_id)
+        """
+        if not data_sync_id:
+            return None, None
+        first, _, second = data_sync_id.partition('||')
+        if second:
+            return first, second
+        return None, first
+
+    def _extract_delegated_session_id(self, *args):
+        """
+        Extract current delegated session ID required to download private playlists of secondary channels
         @params response and/or ytcfg
+        @return: delegated session ID
         """
         # ytcfg includes channel_syncid if on secondary channel
         if delegated_sid := traverse_obj(args, (..., 'DELEGATED_SESSION_ID', {str}, any)):
             return delegated_sid

         data_sync_id = self._extract_data_sync_id(*args)
-        return self._data_sync_id_to_delegated_session_id(data_sync_id)
+        return self._parse_data_sync_id(data_sync_id)[0]

+    def _extract_user_session_id(self, *args):
+        """
+        Extract current user session ID
+        @params response and/or ytcfg
+        @return: user session ID
+        """
+        if user_sid := traverse_obj(args, (..., 'USER_SESSION_ID', {str}, any)):
+            return user_sid
+
+        data_sync_id = self._extract_data_sync_id(*args)
+        return self._parse_data_sync_id(data_sync_id)[1]
+
     def _extract_data_sync_id(self, *args):
         """
@ -734,7 +797,7 @@ def _extract_visitor_data(self, *args):
|
|||||||
|
|
||||||
@functools.cached_property
|
@functools.cached_property
|
||||||
def is_authenticated(self):
|
def is_authenticated(self):
|
||||||
return bool(self._generate_sapisidhash_header())
|
return bool(self._get_sid_authorization_header())
|
||||||
|
|
||||||
def extract_ytcfg(self, video_id, webpage):
|
def extract_ytcfg(self, video_id, webpage):
|
||||||
if not webpage:
|
if not webpage:
|
||||||
@ -744,25 +807,28 @@ def extract_ytcfg(self, video_id, webpage):
                 r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
                 default='{}'), video_id, fatal=False) or {}

-    def _generate_cookie_auth_headers(self, *, ytcfg=None, account_syncid=None, session_index=None, origin=None, **kwargs):
+    def _generate_cookie_auth_headers(self, *, ytcfg=None, delegated_session_id=None, user_session_id=None, session_index=None, origin=None, **kwargs):
         headers = {}
-        account_syncid = account_syncid or self._extract_account_syncid(ytcfg)
-        if account_syncid:
-            headers['X-Goog-PageId'] = account_syncid
+        delegated_session_id = delegated_session_id or self._extract_delegated_session_id(ytcfg)
+        if delegated_session_id:
+            headers['X-Goog-PageId'] = delegated_session_id
         if session_index is None:
             session_index = self._extract_session_index(ytcfg)
-        if account_syncid or session_index is not None:
+        if delegated_session_id or session_index is not None:
             headers['X-Goog-AuthUser'] = session_index if session_index is not None else 0

-        auth = self._generate_sapisidhash_header(origin)
+        auth = self._get_sid_authorization_header(origin, user_session_id=user_session_id or self._extract_user_session_id(ytcfg))
         if auth is not None:
             headers['Authorization'] = auth
             headers['X-Origin'] = origin

+        if traverse_obj(ytcfg, 'LOGGED_IN', expected_type=bool):
+            headers['X-Youtube-Bootstrap-Logged-In'] = 'true'
+
         return headers

     def generate_api_headers(
-            self, *, ytcfg=None, account_syncid=None, session_index=None,
+            self, *, ytcfg=None, delegated_session_id=None, user_session_id=None, session_index=None,
             visitor_data=None, api_hostname=None, default_client='web', **kwargs):

         origin = 'https://' + (self._select_api_hostname(api_hostname, default_client))
@ -773,7 +839,12 @@ def generate_api_headers(
             'Origin': origin,
             'X-Goog-Visitor-Id': visitor_data or self._extract_visitor_data(ytcfg),
             'User-Agent': self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CONTEXT']['client']['userAgent'], default_client=default_client),
-            **self._generate_cookie_auth_headers(ytcfg=ytcfg, account_syncid=account_syncid, session_index=session_index, origin=origin),
+            **self._generate_cookie_auth_headers(
+                ytcfg=ytcfg,
+                delegated_session_id=delegated_session_id,
+                user_session_id=user_session_id,
+                session_index=session_index,
+                origin=origin),
         }
         return filter_dict(headers)

@ -1356,7 +1427,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
|
'401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
|
||||||
}
|
}
|
||||||
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
|
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
|
||||||
_DEFAULT_CLIENTS = ('ios', 'mweb')
|
_DEFAULT_CLIENTS = ('tv', 'ios', 'web')
|
||||||
|
_DEFAULT_AUTHED_CLIENTS = ('tv', 'web')
|
||||||
|
|
||||||
_GEO_BYPASS = False
|
_GEO_BYPASS = False
|
||||||
|
|
||||||
@ -1494,7 +1566,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
},
|
},
|
||||||
# Age-gate videos. See https://github.com/yt-dlp/yt-dlp/pull/575#issuecomment-888837000
|
# Age-gate videos. See https://github.com/yt-dlp/yt-dlp/pull/575#issuecomment-888837000
|
||||||
{
|
{
|
||||||
'note': 'Embed allowed age-gate video',
|
'note': 'Embed allowed age-gate video; works with web_embedded',
|
||||||
'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
|
'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'HtVdAasjOgU',
|
'id': 'HtVdAasjOgU',
|
||||||
@ -1524,7 +1596,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'heatmap': 'count:100',
|
'heatmap': 'count:100',
|
||||||
'timestamp': 1401991663,
|
'timestamp': 1401991663,
|
||||||
},
|
},
|
||||||
'skip': 'Age-restricted; requires authentication',
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'note': 'Age-gate video with embed allowed in public site',
|
'note': 'Age-gate video with embed allowed in public site',
|
||||||
@ -2800,6 +2871,35 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'extractor_args': {'youtube': {'player_client': ['ios'], 'player_skip': ['webpage']}},
|
'extractor_args': {'youtube': {'player_client': ['ios'], 'player_skip': ['webpage']}},
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
# uploader_id has non-ASCII characters that are percent-encoded in YT's JSON
|
||||||
|
'url': 'https://www.youtube.com/shorts/18NGQq7p3LY',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '18NGQq7p3LY',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '아이브 이서 장원영 리즈 삐끼삐끼 챌린지',
|
||||||
|
'description': '',
|
||||||
|
'uploader': 'ㅇㅇ',
|
||||||
|
'uploader_id': '@으아-v1k',
|
||||||
|
'uploader_url': 'https://www.youtube.com/@으아-v1k',
|
||||||
|
'channel': 'ㅇㅇ',
|
||||||
|
'channel_id': 'UCC25oTm2J7ZVoi5TngOHg9g',
|
||||||
|
'channel_url': 'https://www.youtube.com/channel/UCC25oTm2J7ZVoi5TngOHg9g',
|
||||||
|
'thumbnail': r're:https?://.+/.+\.jpg',
|
||||||
|
'playable_in_embed': True,
|
||||||
|
'age_limit': 0,
|
||||||
|
'duration': 3,
|
||||||
|
'timestamp': 1724306170,
|
||||||
|
'upload_date': '20240822',
|
||||||
|
'availability': 'public',
|
||||||
|
'live_status': 'not_live',
|
||||||
|
'view_count': int,
|
||||||
|
'like_count': int,
|
||||||
|
'channel_follower_count': int,
|
||||||
|
'categories': ['People & Blogs'],
|
||||||
|
'tags': [],
|
||||||
|
},
|
||||||
|
},
|
||||||
]
|
]
|
||||||
|
|
||||||
_WEBPAGE_TESTS = [
|
_WEBPAGE_TESTS = [
|
||||||
@ -2925,7 +3025,7 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
|
|||||||
# Obtain from MPD's maximum seq value
|
# Obtain from MPD's maximum seq value
|
||||||
old_mpd_url = mpd_url
|
old_mpd_url = mpd_url
|
||||||
last_error = ctx.pop('last_error', None)
|
last_error = ctx.pop('last_error', None)
|
||||||
expire_fast = immediate or last_error and isinstance(last_error, HTTPError) and last_error.status == 403
|
expire_fast = immediate or (last_error and isinstance(last_error, HTTPError) and last_error.status == 403)
|
||||||
mpd_url, stream_number, is_live = (mpd_feed(format_id, 5 if expire_fast else 18000)
|
mpd_url, stream_number, is_live = (mpd_feed(format_id, 5 if expire_fast else 18000)
|
||||||
or (mpd_url, stream_number, False))
|
or (mpd_url, stream_number, False))
|
||||||
if not refresh_sequence:
|
if not refresh_sequence:
|
||||||
@ -3118,19 +3218,26 @@ def _genslice(start, end, step):
|
|||||||
self.to_screen('Extracted signature function:\n' + code)
|
self.to_screen('Extracted signature function:\n' + code)
|
||||||
|
|
||||||
def _parse_sig_js(self, jscode):
|
def _parse_sig_js(self, jscode):
|
||||||
|
# Examples where `sig` is funcname:
|
||||||
|
# sig=function(a){a=a.split(""); ... ;return a.join("")};
|
||||||
|
# ;c&&(c=sig(decodeURIComponent(c)),a.set(b,encodeURIComponent(c)));return a};
|
||||||
|
# {var l=f,m=h.sp,n=sig(decodeURIComponent(h.s));l.set(m,encodeURIComponent(n))}
|
||||||
|
# sig=function(J){J=J.split(""); ... ;return J.join("")};
|
||||||
|
# ;N&&(N=sig(decodeURIComponent(N)),J.set(R,encodeURIComponent(N)));return J};
|
||||||
|
# {var H=u,k=f.sp,v=sig(decodeURIComponent(f.s));H.set(k,encodeURIComponent(v))}
|
||||||
funcname = self._search_regex(
|
funcname = self._search_regex(
|
||||||
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
(r'\b(?P<var>[a-zA-Z0-9_$]+)&&\((?P=var)=(?P<sig>[a-zA-Z0-9_$]{2,})\(decodeURIComponent\((?P=var)\)\)',
|
||||||
|
r'(?P<sig>[a-zA-Z0-9_$]+)\s*=\s*function\(\s*(?P<arg>[a-zA-Z0-9_$]+)\s*\)\s*{\s*(?P=arg)\s*=\s*(?P=arg)\.split\(\s*""\s*\)\s*;\s*[^}]+;\s*return\s+(?P=arg)\.join\(\s*""\s*\)',
|
||||||
|
r'(?:\b|[^a-zA-Z0-9_$])(?P<sig>[a-zA-Z0-9_$]{2,})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)(?:;[a-zA-Z0-9_$]{2}\.[a-zA-Z0-9_$]{2}\(a,\d+\))?',
|
||||||
|
# Old patterns
|
||||||
|
r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'\bm=(?P<sig>[a-zA-Z0-9$]{2,})\(decodeURIComponent\(h\.s\)\)',
|
r'\bm=(?P<sig>[a-zA-Z0-9$]{2,})\(decodeURIComponent\(h\.s\)\)',
|
||||||
r'\bc&&\(c=(?P<sig>[a-zA-Z0-9$]{2,})\(decodeURIComponent\(c\)\)',
|
|
||||||
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2,})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)(?:;[a-zA-Z0-9$]{2}\.[a-zA-Z0-9$]{2}\(a,\d+\))?',
|
|
||||||
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
|
|
||||||
# Obsolete patterns
|
# Obsolete patterns
|
||||||
r'("|\')signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
r'("|\')signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
|
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
||||||
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
|
|
||||||
r'\bc\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
|
r'\bc\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
|
||||||
jscode, 'Initial JS player signature function name', group='sig')
|
jscode, 'Initial JS player signature function name', group='sig')
|
||||||
|
|
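Review note: the first two patterns added to `_parse_sig_js` can be tried in isolation. The sketch below copies them verbatim from the hunk above and runs them against made-up snippets shaped like the example comments. The helper `find_sig_name` and the identifiers `Xy` and `Zq.rv` are placeholders for illustration, not names from the player JS, and the loop only approximates how `_search_regex` is assumed to try a tuple of patterns in order.

```python
import re

# First two patterns of the updated tuple, copied from the diff above.
SIG_PATTERNS = (
    r'\b(?P<var>[a-zA-Z0-9_$]+)&&\((?P=var)=(?P<sig>[a-zA-Z0-9_$]{2,})\(decodeURIComponent\((?P=var)\)\)',
    r'(?P<sig>[a-zA-Z0-9_$]+)\s*=\s*function\(\s*(?P<arg>[a-zA-Z0-9_$]+)\s*\)\s*{\s*(?P=arg)\s*=\s*(?P=arg)\.split\(\s*""\s*\)\s*;\s*[^}]+;\s*return\s+(?P=arg)\.join\(\s*""\s*\)',
)


def find_sig_name(jscode):
    """Return the `sig` group of the first pattern that matches, or None."""
    for pattern in SIG_PATTERNS:
        match = re.search(pattern, jscode)
        if match:
            return match.group('sig')
    return None


# Hypothetical inputs: a call site using the new variable names (N, J, R),
# a call site using the old ones (c, a, b), and a function definition.
new_call_site = ';N&&(N=Xy(decodeURIComponent(N)),J.set(R,encodeURIComponent(N)));return J};'
old_call_site = ';c&&(c=Xy(decodeURIComponent(c)),a.set(b,encodeURIComponent(c)));return a};'
definition = 'Xy=function(J){J=J.split("");Zq.rv(J,3);return J.join("")};'

for snippet in (new_call_site, old_call_site, definition):
    print(find_sig_name(snippet))
```

Because the variable name is captured as `var` and reused through `(?P=var)`, the first pattern no longer depends on the minifier choosing `c` for the argument, which appears to be the motivation for the change.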
||||||
@ -3204,6 +3311,7 @@ def _extract_n_function_name(self, jscode, player_url=None):
|
|||||||
# * a.D&&(b="nn"[+a.D],c=a.get(b))&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
|
# * a.D&&(b="nn"[+a.D],c=a.get(b))&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
|
||||||
# * a.D&&(PL(a),b=a.j.n||null)&&(b=narray[0](b),a.set("n",b),narray.length||nfunc("")
|
# * a.D&&(PL(a),b=a.j.n||null)&&(b=narray[0](b),a.set("n",b),narray.length||nfunc("")
|
||||||
# * a.D&&(b="nn"[+a.D],vL(a),c=a.j[b]||null)&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
|
# * a.D&&(b="nn"[+a.D],vL(a),c=a.j[b]||null)&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
|
||||||
|
# * J.J="";J.url="";J.Z&&(R="nn"[+J.Z],mW(J),N=J.K[R]||null)&&(N=narray[idx](N),J.set(R,N))}};
|
||||||
funcname, idx = self._search_regex(
|
funcname, idx = self._search_regex(
|
||||||
r'''(?x)
|
r'''(?x)
|
||||||
(?:
|
(?:
|
||||||
@ -3220,7 +3328,7 @@ def _extract_n_function_name(self, jscode, player_url=None):
|
|||||||
)\)&&\(c=|
|
)\)&&\(c=|
|
||||||
\b(?P<var>[a-zA-Z0-9_$]+)=
|
\b(?P<var>[a-zA-Z0-9_$]+)=
|
||||||
)(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
|
)(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
|
||||||
(?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)''',
|
(?(var),[a-zA-Z0-9_$]+\.set\((?:"n+"|[a-zA-Z0-9_$]+)\,(?P=var)\))''',
|
||||||
jscode, 'n function name', group=('nfunc', 'idx'), default=(None, None))
|
jscode, 'n function name', group=('nfunc', 'idx'), default=(None, None))
|
||||||
if not funcname:
|
if not funcname:
|
||||||
self.report_warning(join_nonempty(
|
self.report_warning(join_nonempty(
|
||||||
@ -3229,7 +3337,7 @@ def _extract_n_function_name(self, jscode, player_url=None):
|
|||||||
return self._search_regex(
|
return self._search_regex(
|
||||||
r'''(?xs)
|
r'''(?xs)
|
||||||
;\s*(?P<name>[a-zA-Z0-9_$]+)\s*=\s*function\([a-zA-Z0-9_$]+\)
|
;\s*(?P<name>[a-zA-Z0-9_$]+)\s*=\s*function\([a-zA-Z0-9_$]+\)
|
||||||
\s*\{(?:(?!};).)+?["']enhanced_except_''',
|
\s*\{(?:(?!};).)+?return\s*(?P<q>["'])[\w-]+_w8_(?P=q)\s*\+\s*[a-zA-Z0-9_$]+''',
|
||||||
jscode, 'Initial JS player n function name', group='name')
|
jscode, 'Initial JS player n function name', group='name')
|
||||||
elif not idx:
|
elif not idx:
|
||||||
return funcname
|
return funcname
|
||||||
@ -3238,6 +3346,11 @@ def _extract_n_function_name(self, jscode, player_url=None):
|
|||||||
rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode,
|
rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode,
|
||||||
f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)]
|
f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)]
|
||||||
|
|
||||||
|
def _fixup_n_function_code(self, argnames, code):
|
||||||
|
return argnames, re.sub(
|
||||||
|
rf';\s*if\s*\(\s*typeof\s+[a-zA-Z0-9_$]+\s*===?\s*(["\'])undefined\1\s*\)\s*return\s+{argnames[0]};',
|
||||||
|
';', code)
|
||||||
|
|
||||||
def _extract_n_function_code(self, video_id, player_url):
|
def _extract_n_function_code(self, video_id, player_url):
|
||||||
player_id = self._extract_player_info(player_url)
|
player_id = self._extract_player_info(player_url)
|
||||||
func_code = self.cache.load('youtube-nsig', player_id, min_ver='2024.07.09')
|
func_code = self.cache.load('youtube-nsig', player_id, min_ver='2024.07.09')
|
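Review note: the effect of `_fixup_n_function_code` is easier to see on a concrete string. The sketch below lifts the same `re.sub` call into a standalone function and applies it to an invented function body. The body text and the identifier `XY` are assumptions made for illustration. As in the diff, the argument name is interpolated into the pattern without escaping.

```python
import re


def fixup_n_function_code(argnames, code):
    """Strip an early `if (typeof X === "undefined") return <arg>;` guard."""
    return argnames, re.sub(
        rf';\s*if\s*\(\s*typeof\s+[a-zA-Z0-9_$]+\s*===?\s*(["\'])undefined\1\s*\)\s*return\s+{argnames[0]};',
        ';', code)


# Hypothetical n-function body containing the guard.
sample_args = ['a']
sample_code = 'var b=a.split("");if(typeof XY==="undefined")return a;b.reverse();return b.join("")'

fixed_args, fixed_code = fixup_n_function_code(sample_args, sample_code)
print(fixed_code)
```

Only the guard statement is replaced by a single `;`, so the rest of the body is passed to the interpreter unchanged.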
||||||
@ -3249,7 +3362,8 @@ def _extract_n_function_code(self, video_id, player_url):
|
|||||||
|
|
||||||
func_name = self._extract_n_function_name(jscode, player_url=player_url)
|
func_name = self._extract_n_function_name(jscode, player_url=player_url)
|
||||||
|
|
||||||
func_code = jsi.extract_function_code(func_name)
|
# XXX: Workaround for the `typeof` gotcha
|
||||||
|
func_code = self._fixup_n_function_code(*jsi.extract_function_code(func_name))
|
||||||
|
|
||||||
self.cache.store('youtube-nsig', player_id, func_code)
|
self.cache.store('youtube-nsig', player_id, func_code)
|
||||||
return jsi, player_id, func_code
|
return jsi, player_id, func_code
|
||||||
@ -3265,7 +3379,7 @@ def extract_nsig(s):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
raise JSInterpreter.Exception(traceback.format_exc(), cause=e)
|
raise JSInterpreter.Exception(traceback.format_exc(), cause=e)
|
||||||
|
|
||||||
if ret.startswith('enhanced_except_'):
|
if ret.startswith('enhanced_except_') or ret.endswith(s):
|
||||||
raise JSInterpreter.Exception('Signature function returned an exception')
|
raise JSInterpreter.Exception('Signature function returned an exception')
|
||||||
return ret
|
return ret
|
||||||
|
|
||||||
@ -3793,9 +3907,13 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
|
|||||||
default_client=client,
|
default_client=client,
|
||||||
visitor_data=visitor_data,
|
visitor_data=visitor_data,
|
||||||
session_index=self._extract_session_index(master_ytcfg, player_ytcfg),
|
session_index=self._extract_session_index(master_ytcfg, player_ytcfg),
|
||||||
account_syncid=(
|
delegated_session_id=(
|
||||||
self._data_sync_id_to_delegated_session_id(data_sync_id)
|
self._parse_data_sync_id(data_sync_id)[0]
|
||||||
or self._extract_account_syncid(master_ytcfg, initial_pr, player_ytcfg)
|
or self._extract_delegated_session_id(master_ytcfg, initial_pr, player_ytcfg)
|
||||||
|
),
|
||||||
|
user_session_id=(
|
||||||
|
self._parse_data_sync_id(data_sync_id)[1]
|
||||||
|
or self._extract_user_session_id(master_ytcfg, initial_pr, player_ytcfg)
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
@ -3823,12 +3941,13 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
|
|||||||
def _get_requested_clients(self, url, smuggled_data):
|
def _get_requested_clients(self, url, smuggled_data):
|
||||||
requested_clients = []
|
requested_clients = []
|
||||||
excluded_clients = []
|
excluded_clients = []
|
||||||
|
default_clients = self._DEFAULT_AUTHED_CLIENTS if self.is_authenticated else self._DEFAULT_CLIENTS
|
||||||
allowed_clients = sorted(
|
allowed_clients = sorted(
|
||||||
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
|
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
|
||||||
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
|
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
|
||||||
for client in self._configuration_arg('player_client'):
|
for client in self._configuration_arg('player_client'):
|
||||||
if client == 'default':
|
if client == 'default':
|
||||||
requested_clients.extend(self._DEFAULT_CLIENTS)
|
requested_clients.extend(default_clients)
|
||||||
elif client == 'all':
|
elif client == 'all':
|
||||||
requested_clients.extend(allowed_clients)
|
requested_clients.extend(allowed_clients)
|
||||||
elif client.startswith('-'):
|
elif client.startswith('-'):
|
||||||
@ -3838,20 +3957,20 @@ def _get_requested_clients(self, url, smuggled_data):
|
|||||||
else:
|
else:
|
||||||
requested_clients.append(client)
|
requested_clients.append(client)
|
||||||
if not requested_clients:
|
if not requested_clients:
|
||||||
requested_clients.extend(self._DEFAULT_CLIENTS)
|
requested_clients.extend(default_clients)
|
||||||
for excluded_client in excluded_clients:
|
for excluded_client in excluded_clients:
|
||||||
if excluded_client in requested_clients:
|
if excluded_client in requested_clients:
|
||||||
requested_clients.remove(excluded_client)
|
requested_clients.remove(excluded_client)
|
||||||
if not requested_clients:
|
if not requested_clients:
|
||||||
raise ExtractorError('No player clients have been requested', expected=True)
|
raise ExtractorError('No player clients have been requested', expected=True)
|
||||||
|
|
||||||
if smuggled_data.get('is_music_url') or self.is_music_url(url):
|
if self.is_authenticated:
|
||||||
for requested_client in requested_clients:
|
unsupported_clients = [
|
||||||
_, base_client, variant = _split_innertube_client(requested_client)
|
client for client in requested_clients if not INNERTUBE_CLIENTS[client]['SUPPORTS_COOKIES']
|
||||||
music_client = f'{base_client}_music' if base_client != 'mweb' else 'web_music'
|
]
|
||||||
if variant != 'music' and music_client in INNERTUBE_CLIENTS:
|
for client in unsupported_clients:
|
||||||
if not INNERTUBE_CLIENTS[music_client]['REQUIRE_AUTH'] or self.is_authenticated:
|
self.report_warning(f'Skipping client "{client}" since it does not support cookies', only_once=True)
|
||||||
requested_clients.append(music_client)
|
requested_clients.remove(client)
|
||||||
|
|
||||||
return orderedSet(requested_clients)
|
return orderedSet(requested_clients)
|
||||||
|
|
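Review note: a simplified model of the new selection logic in `_get_requested_clients`, for reasoning about the hunk above. The client table, both default tuples, and the function name `select_clients` are hypothetical; the real values live in `INNERTUBE_CLIENTS`, `_DEFAULT_CLIENTS` and `_DEFAULT_AUTHED_CLIENTS`. The sketch leaves out the `all` keyword, the warnings, and the `ExtractorError` raised when nothing remains, and it drops every occurrence of an excluded client rather than only the first.

```python
# Hypothetical client table; only the flag used by the new code path is modelled.
CLIENTS = {
    'web': {'SUPPORTS_COOKIES': True},
    'ios': {'SUPPORTS_COOKIES': False},
    'tv': {'SUPPORTS_COOKIES': True},
}
DEFAULT_CLIENTS = ('ios', 'tv')         # assumed unauthenticated defaults
DEFAULT_AUTHED_CLIENTS = ('tv', 'web')  # assumed authenticated defaults


def select_clients(requested, is_authenticated):
    """Resolve requested client names into an ordered, de-duplicated list."""
    defaults = DEFAULT_AUTHED_CLIENTS if is_authenticated else DEFAULT_CLIENTS
    selected, excluded = [], []
    for client in requested:
        if client == 'default':
            selected.extend(defaults)
        elif client.startswith('-'):
            excluded.append(client[1:])
        else:
            selected.append(client)
    if not selected:
        selected.extend(defaults)
    selected = [c for c in selected if c not in excluded]
    if is_authenticated:
        # Mirrors the new branch: skip clients that cannot use cookies.
        selected = [c for c in selected if CLIENTS[c]['SUPPORTS_COOKIES']]
    # dict.fromkeys keeps first-seen order, similar in spirit to orderedSet.
    return list(dict.fromkeys(selected))


print(select_clients([], is_authenticated=False))
print(select_clients(['default', 'ios'], is_authenticated=True))
```

Under these assumptions, an authenticated session that explicitly asks for a client without cookie support silently loses it in this sketch, whereas the diff additionally reports a warning for each skipped client.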
||||||
@ -3919,13 +4038,10 @@ def append_client(*client_names):
|
|||||||
)
|
)
|
||||||
|
|
||||||
require_po_token = self._get_default_ytcfg(client).get('REQUIRE_PO_TOKEN')
|
require_po_token = self._get_default_ytcfg(client).get('REQUIRE_PO_TOKEN')
|
||||||
if not po_token and require_po_token:
|
if not po_token and require_po_token and 'missing_pot' in self._configuration_arg('formats'):
|
||||||
self.report_warning(
|
self.report_warning(
|
||||||
f'No PO Token provided for {client} client, '
|
f'No PO Token provided for {client} client, '
|
||||||
f'which is required for working {client} formats. '
|
f'which may be required for working {client} formats. This client will be deprioritized', only_once=True)
|
||||||
f'You can manually pass a PO Token for this client with '
|
|
||||||
f'--extractor-args "youtube:po_token={client}+XXX"',
|
|
||||||
only_once=True)
|
|
||||||
deprioritize_pr = True
|
deprioritize_pr = True
|
||||||
|
|
||||||
pr = initial_pr if client == 'web' else None
|
pr = initial_pr if client == 'web' else None
|
||||||
@ -3958,17 +4074,6 @@ def append_client(*client_names):
|
|||||||
else:
|
else:
|
||||||
prs.append(pr)
|
prs.append(pr)
|
||||||
|
|
||||||
# EU countries require age-verification for accounts to access age-restricted videos
|
|
||||||
# If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
|
|
||||||
if self.is_authenticated and self._is_agegated(pr):
|
|
||||||
self.to_screen(
|
|
||||||
f'{video_id}: This video is age-restricted and YouTube is requiring '
|
|
||||||
'account age-verification; some formats may be missing', only_once=True)
|
|
||||||
# web_creator can work around the age-verification requirement
|
|
||||||
# android_vr and mediaconnect may also be able to work around age-verification
|
|
||||||
# tv_embedded may(?) still work around age-verification if the video is embeddable
|
|
||||||
append_client('web_creator')
|
|
||||||
|
|
||||||
prs.extend(deprioritized_prs)
|
prs.extend(deprioritized_prs)
|
||||||
|
|
||||||
if skipped_clients:
|
if skipped_clients:
|
||||||
@ -3983,10 +4088,25 @@ def append_client(*client_names):
|
|||||||
return prs, player_url
|
return prs, player_url
|
||||||
|
|
||||||
def _needs_live_processing(self, live_status, duration):
|
def _needs_live_processing(self, live_status, duration):
|
||||||
if (live_status == 'is_live' and self.get_param('live_from_start')
|
if ((live_status == 'is_live' and self.get_param('live_from_start'))
|
||||||
or live_status == 'post_live' and (duration or 0) > 2 * 3600):
|
or (live_status == 'post_live' and (duration or 0) > 2 * 3600)):
|
||||||
return live_status
|
return live_status
|
||||||
|
|
||||||
|
def _report_pot_format_skipped(self, video_id, client_name, proto):
|
||||||
|
msg = (
|
||||||
|
f'{video_id}: {client_name} client {proto} formats require a PO Token which was not provided. '
|
||||||
|
'They will be skipped as they may yield HTTP Error 403. '
|
||||||
|
f'You can manually pass a PO Token for this client with --extractor-args "youtube:po_token={client_name}+XXX". '
|
||||||
|
'For more information, refer to https://github.com/yt-dlp/yt-dlp/wiki/Extractors#po-token-guide . '
|
||||||
|
'To enable these broken formats anyway, pass --extractor-args "youtube:formats=missing_pot"')
|
||||||
|
|
||||||
|
# Only raise a warning for non-default clients, to not confuse users.
|
||||||
|
# iOS HLS formats still work without PO Token, so we don't need to warn about them.
|
||||||
|
if client_name in (*self._DEFAULT_CLIENTS, *self._DEFAULT_AUTHED_CLIENTS):
|
||||||
|
self.write_debug(msg, only_once=True)
|
||||||
|
else:
|
||||||
|
self.report_warning(msg, only_once=True)
|
||||||
|
|
||||||
def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
|
def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
|
||||||
CHUNK_SIZE = 10 << 20
|
CHUNK_SIZE = 10 << 20
|
||||||
PREFERRED_LANG_VALUE = 10
|
PREFERRED_LANG_VALUE = 10
|
||||||
@ -4040,10 +4160,12 @@ def build_fragments(f):
|
|||||||
if height:
|
if height:
|
||||||
res_qualities[height] = quality
|
res_qualities[height] = quality
|
||||||
|
|
||||||
|
display_name = audio_track.get('displayName') or ''
|
||||||
|
is_original = 'original' in display_name.lower()
|
||||||
|
is_descriptive = 'descriptive' in display_name.lower()
|
||||||
is_default = audio_track.get('audioIsDefault')
|
is_default = audio_track.get('audioIsDefault')
|
||||||
is_descriptive = 'descriptive' in (audio_track.get('displayName') or '').lower()
|
|
||||||
language_code = audio_track.get('id', '').split('.')[0]
|
language_code = audio_track.get('id', '').split('.')[0]
|
||||||
if language_code and is_default:
|
if language_code and (is_original or (is_default and not original_language)):
|
||||||
original_language = language_code
|
original_language = language_code
|
||||||
|
|
||||||
# FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
|
# FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
|
||||||
@ -4111,11 +4233,10 @@ def build_fragments(f):
|
|||||||
fmt_url = update_url_query(fmt_url, {'pot': po_token})
|
fmt_url = update_url_query(fmt_url, {'pot': po_token})
|
||||||
|
|
||||||
# Clients that require PO Token return videoplayback URLs that may return 403
|
# Clients that require PO Token return videoplayback URLs that may return 403
|
||||||
is_broken = (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN'))
|
require_po_token = (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN'))
|
||||||
if is_broken:
|
if require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
|
||||||
self.report_warning(
|
self._report_pot_format_skipped(video_id, client_name, 'https')
|
||||||
f'{video_id}: {client_name} client formats require a PO Token which was not provided. '
|
continue
|
||||||
'They will be deprioritized as they may yield HTTP Error 403', only_once=True)
|
|
||||||
|
|
||||||
name = fmt.get('qualityLabel') or quality.replace('audio_quality_', '') or ''
|
name = fmt.get('qualityLabel') or quality.replace('audio_quality_', '') or ''
|
||||||
fps = int_or_none(fmt.get('fps')) or 0
|
fps = int_or_none(fmt.get('fps')) or 0
|
||||||
@ -4124,11 +4245,11 @@ def build_fragments(f):
|
|||||||
'filesize': int_or_none(fmt.get('contentLength')),
|
'filesize': int_or_none(fmt.get('contentLength')),
|
||||||
'format_id': f'{itag}{"-drc" if fmt.get("isDrc") else ""}',
|
'format_id': f'{itag}{"-drc" if fmt.get("isDrc") else ""}',
|
||||||
'format_note': join_nonempty(
|
'format_note': join_nonempty(
|
||||||
join_nonempty(audio_track.get('displayName'), is_default and ' (default)', delim=''),
|
join_nonempty(display_name, is_default and ' (default)', delim=''),
|
||||||
name, fmt.get('isDrc') and 'DRC',
|
name, fmt.get('isDrc') and 'DRC',
|
||||||
try_get(fmt, lambda x: x['projectionType'].replace('RECTANGULAR', '').lower()),
|
try_get(fmt, lambda x: x['projectionType'].replace('RECTANGULAR', '').lower()),
|
||||||
try_get(fmt, lambda x: x['spatialAudioType'].replace('SPATIAL_AUDIO_TYPE_', '').lower()),
|
try_get(fmt, lambda x: x['spatialAudioType'].replace('SPATIAL_AUDIO_TYPE_', '').lower()),
|
||||||
is_damaged and 'DAMAGED', is_broken and 'BROKEN',
|
is_damaged and 'DAMAGED', require_po_token and 'MISSING POT',
|
||||||
(self.get_param('verbose') or all_formats) and short_client_name(client_name),
|
(self.get_param('verbose') or all_formats) and short_client_name(client_name),
|
||||||
delim=', '),
|
delim=', '),
|
||||||
# Format 22 is likely to be damaged. See https://github.com/yt-dlp/yt-dlp/issues/3372
|
# Format 22 is likely to be damaged. See https://github.com/yt-dlp/yt-dlp/issues/3372
|
||||||
@ -4143,9 +4264,9 @@ def build_fragments(f):
|
|||||||
'url': fmt_url,
|
'url': fmt_url,
|
||||||
'width': int_or_none(fmt.get('width')),
|
'width': int_or_none(fmt.get('width')),
|
||||||
'language': join_nonempty(language_code, 'desc' if is_descriptive else '') or None,
|
'language': join_nonempty(language_code, 'desc' if is_descriptive else '') or None,
|
||||||
'language_preference': PREFERRED_LANG_VALUE if is_default else -10 if is_descriptive else -1,
|
'language_preference': PREFERRED_LANG_VALUE if is_original else 5 if is_default else -10 if is_descriptive else -1,
|
||||||
# Strictly de-prioritize broken, damaged and 3gp formats
|
# Strictly de-prioritize broken, damaged and 3gp formats
|
||||||
'preference': -20 if is_broken else -10 if is_damaged else -2 if itag == '17' else None,
|
'preference': -20 if require_po_token else -10 if is_damaged else -2 if itag == '17' else None,
|
||||||
}
|
}
|
||||||
mime_mobj = re.match(
|
mime_mobj = re.match(
|
||||||
r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', fmt.get('mimeType') or '')
|
r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', fmt.get('mimeType') or '')
|
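Review note: the new `language_preference` expression is a chained conditional, and its ordering matters because a track can be both original and default. The sketch below restates it as a standalone function. The function name and the sample display names are assumptions; the keyword checks and the numeric values are taken from the diff.

```python
PREFERRED_LANG_VALUE = 10  # same constant as in _extract_formats_and_subtitles


def language_preference(display_name, audio_is_default):
    """Rank an audio track: original > default > other > descriptive."""
    name = (display_name or '').lower()
    is_original = 'original' in name
    is_descriptive = 'descriptive' in name
    # Same chain as the new line in the diff, evaluated left to right.
    return (PREFERRED_LANG_VALUE if is_original
            else 5 if audio_is_default
            else -10 if is_descriptive
            else -1)


# Hypothetical display names for illustration.
for name, default in (('English original', False), ('French', True),
                      ('English descriptive', False), (None, False)):
    print(name, default, language_preference(name, default))
```

With the previous expression a default track always received the top value, so a dubbed default track could outrank the original audio; the new chain checks `is_original` first.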
||||||
@ -4180,7 +4301,7 @@ def build_fragments(f):
|
|||||||
skip_manifests = set(self._configuration_arg('skip'))
|
skip_manifests = set(self._configuration_arg('skip'))
|
||||||
if (not self.get_param('youtube_include_hls_manifest', True)
|
if (not self.get_param('youtube_include_hls_manifest', True)
|
||||||
or needs_live_processing == 'is_live' # These will be filtered out by YoutubeDL anyway
|
or needs_live_processing == 'is_live' # These will be filtered out by YoutubeDL anyway
|
||||||
or needs_live_processing and skip_bad_formats):
|
or (needs_live_processing and skip_bad_formats)):
|
||||||
skip_manifests.add('hls')
|
skip_manifests.add('hls')
|
||||||
|
|
||||||
if not self.get_param('youtube_include_dash_manifest', True):
|
if not self.get_param('youtube_include_dash_manifest', True):
|
||||||
@ -4195,7 +4316,6 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
|
|||||||
key = (proto, f.get('language'))
|
key = (proto, f.get('language'))
|
||||||
if not all_formats and key in itags[itag]:
|
if not all_formats and key in itags[itag]:
|
||||||
return False
|
return False
|
||||||
itags[itag].add(key)
|
|
||||||
|
|
||||||
if f.get('source_preference') is None:
|
if f.get('source_preference') is None:
|
||||||
f['source_preference'] = -1
|
f['source_preference'] = -1
|
||||||
@ -4203,12 +4323,14 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
|
|||||||
# Clients that require PO Token return videoplayback URLs that may return 403
|
# Clients that require PO Token return videoplayback URLs that may return 403
|
||||||
# hls does not currently require PO Token
|
# hls does not currently require PO Token
|
||||||
if (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN')) and proto != 'hls':
|
if (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN')) and proto != 'hls':
|
||||||
self.report_warning(
|
if 'missing_pot' not in self._configuration_arg('formats'):
|
||||||
f'{video_id}: {client_name} client {proto} formats require a PO Token which was not provided. '
|
self._report_pot_format_skipped(video_id, client_name, proto)
|
||||||
'They will be deprioritized as they may yield HTTP Error 403', only_once=True)
|
return False
|
||||||
f['format_note'] = join_nonempty(f.get('format_note'), 'BROKEN', delim=' ')
|
f['format_note'] = join_nonempty(f.get('format_note'), 'MISSING POT', delim=' ')
|
||||||
f['source_preference'] -= 20
|
f['source_preference'] -= 20
|
||||||
|
|
||||||
|
itags[itag].add(key)
|
||||||
|
|
||||||
if itag and all_formats:
|
if itag and all_formats:
|
||||||
f['format_id'] = f'{itag}-{proto}'
|
f['format_id'] = f'{itag}-{proto}'
|
||||||
elif any(p != proto for p, _ in itags[itag]):
|
elif any(p != proto for p, _ in itags[itag]):
|
||||||
@ -4378,14 +4500,14 @@ def _real_extract(self, url):
|
|||||||
expected_type=dict)
|
expected_type=dict)
|
||||||
|
|
||||||
translated_title = self._get_text(microformats, (..., 'title'))
|
translated_title = self._get_text(microformats, (..., 'title'))
|
||||||
video_title = (self._preferred_lang and translated_title
|
video_title = ((self._preferred_lang and translated_title)
|
||||||
or get_first(video_details, 'title') # primary
|
or get_first(video_details, 'title') # primary
|
||||||
or translated_title
|
or translated_title
|
||||||
or search_meta(['og:title', 'twitter:title', 'title']))
|
or search_meta(['og:title', 'twitter:title', 'title']))
|
||||||
translated_description = self._get_text(microformats, (..., 'description'))
|
translated_description = self._get_text(microformats, (..., 'description'))
|
||||||
original_description = get_first(video_details, 'shortDescription')
|
original_description = get_first(video_details, 'shortDescription')
|
||||||
video_description = (
|
video_description = (
|
||||||
self._preferred_lang and translated_description
|
(self._preferred_lang and translated_description)
|
||||||
# If original description is blank, it will be an empty string.
|
# If original description is blank, it will be an empty string.
|
||||||
# Do not prefer translated description in this case.
|
# Do not prefer translated description in this case.
|
||||||
or original_description if original_description is not None else translated_description)
|
or original_description if original_description is not None else translated_description)
|
||||||
@ -4662,7 +4784,7 @@ def process_language(container, base_url, lang_code, sub_name, query):
|
|||||||
(?=(?P<artist>[^\n]+))(?P=artist)\n+
|
(?=(?P<artist>[^\n]+))(?P=artist)\n+
|
||||||
(?=(?P<album>[^\n]+))(?P=album)\n
|
(?=(?P<album>[^\n]+))(?P=album)\n
|
||||||
(?:.+?℗\s*(?P<release_year>\d{4})(?!\d))?
|
(?:.+?℗\s*(?P<release_year>\d{4})(?!\d))?
|
||||||
(?:.+?Released on\s*:\s*(?P<release_date>\d{4}-\d{2}-\d{2}))?
|
(?:.+?Released\ on\s*:\s*(?P<release_date>\d{4}-\d{2}-\d{2}))?
|
||||||
(.+?\nArtist\s*:\s*
|
(.+?\nArtist\s*:\s*
|
||||||
(?=(?P<clean_artist>[^\n]+))(?P=clean_artist)\n
|
(?=(?P<clean_artist>[^\n]+))(?P=clean_artist)\n
|
||||||
)?.+\nAuto-generated\ by\ YouTube\.\s*$
|
)?.+\nAuto-generated\ by\ YouTube\.\s*$
|
||||||
@ -4986,6 +5108,10 @@ def _grid_entries(self, grid_renderer):
|
|||||||
for item in grid_renderer['items']:
|
for item in grid_renderer['items']:
|
||||||
if not isinstance(item, dict):
|
if not isinstance(item, dict):
|
||||||
continue
|
continue
|
||||||
|
if lockup_view_model := traverse_obj(item, ('lockupViewModel', {dict})):
|
||||||
|
if entry := self._extract_lockup_view_model(lockup_view_model):
|
||||||
|
yield entry
|
||||||
|
continue
|
||||||
renderer = self._extract_basic_item_renderer(item)
|
renderer = self._extract_basic_item_renderer(item)
|
||||||
if not isinstance(renderer, dict):
|
if not isinstance(renderer, dict):
|
||||||
continue
|
continue
|
||||||
@ -5084,10 +5210,30 @@ def _playlist_entries(self, video_list_renderer):
|
|||||||
continue
|
continue
|
||||||
yield self._extract_video(renderer)
|
yield self._extract_video(renderer)
|
||||||
|
|
||||||
|
def _extract_lockup_view_model(self, view_model):
|
||||||
|
content_id = view_model.get('contentId')
|
||||||
|
if not content_id:
|
||||||
|
return
|
||||||
|
content_type = view_model.get('contentType')
|
||||||
|
if content_type not in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'):
|
||||||
|
self.report_warning(
|
||||||
|
f'Unsupported lockup view model content type "{content_type}"{bug_reports_message()}', only_once=True)
|
||||||
|
return
|
||||||
|
return self.url_result(
|
||||||
|
f'https://www.youtube.com/playlist?list={content_id}', ie=YoutubeTabIE, video_id=content_id,
|
||||||
|
title=traverse_obj(view_model, (
|
||||||
|
'metadata', 'lockupMetadataViewModel', 'title', 'content', {str})),
|
||||||
|
thumbnails=self._extract_thumbnails(view_model, (
|
||||||
|
'contentImage', 'collectionThumbnailViewModel', 'primaryThumbnail', 'thumbnailViewModel', 'image'), final_key='sources'))
|
||||||
|
|
||||||
def _rich_entries(self, rich_grid_renderer):
|
def _rich_entries(self, rich_grid_renderer):
|
||||||
|
if lockup_view_model := traverse_obj(rich_grid_renderer, ('content', 'lockupViewModel', {dict})):
|
||||||
|
if entry := self._extract_lockup_view_model(lockup_view_model):
|
||||||
|
yield entry
|
||||||
|
return
|
||||||
renderer = traverse_obj(
|
renderer = traverse_obj(
|
||||||
rich_grid_renderer,
|
rich_grid_renderer,
|
||||||
('content', ('videoRenderer', 'reelItemRenderer', 'playlistRenderer', 'shortsLockupViewModel', 'lockupViewModel'), any)) or {}
|
('content', ('videoRenderer', 'reelItemRenderer', 'playlistRenderer', 'shortsLockupViewModel'), any)) or {}
|
||||||
video_id = renderer.get('videoId')
|
video_id = renderer.get('videoId')
|
||||||
if video_id:
|
if video_id:
|
||||||
yield self._extract_video(renderer)
|
yield self._extract_video(renderer)
|
||||||
@ -5114,18 +5260,6 @@ def _rich_entries(self, rich_grid_renderer):
|
|||||||
})),
|
})),
|
||||||
thumbnails=self._extract_thumbnails(renderer, 'thumbnail', final_key='sources'))
|
thumbnails=self._extract_thumbnails(renderer, 'thumbnail', final_key='sources'))
|
||||||
return
|
return
|
||||||
# lockupViewModel extraction
|
|
||||||
content_id = renderer.get('contentId')
|
|
||||||
if content_id and renderer.get('contentType') == 'LOCKUP_CONTENT_TYPE_PODCAST':
|
|
||||||
yield self.url_result(
|
|
||||||
f'https://www.youtube.com/playlist?list={content_id}',
|
|
||||||
ie=YoutubeTabIE, video_id=content_id,
|
|
||||||
**traverse_obj(renderer, {
|
|
||||||
'title': ('metadata', 'lockupMetadataViewModel', 'title', 'content', {str}),
|
|
||||||
}),
|
|
||||||
thumbnails=self._extract_thumbnails(renderer, (
|
|
||||||
'contentImage', 'collectionThumbnailViewModel', 'primaryThumbnail', 'thumbnailViewModel', 'image'), final_key='sources'))
|
|
||||||
return
|
|
||||||
|
|
||||||
def _video_entry(self, video_renderer):
|
def _video_entry(self, video_renderer):
|
||||||
video_id = video_renderer.get('videoId')
|
video_id = video_renderer.get('videoId')
|
||||||
@ -5243,6 +5377,7 @@ def _extract_entries(self, parent_renderer, continuation_list):
|
|||||||
'channelRenderer': lambda x: self._grid_entries({'items': [{'channelRenderer': x}]}),
|
'channelRenderer': lambda x: self._grid_entries({'items': [{'channelRenderer': x}]}),
|
||||||
'hashtagTileRenderer': lambda x: [self._hashtag_tile_entry(x)],
|
'hashtagTileRenderer': lambda x: [self._hashtag_tile_entry(x)],
|
||||||
'richGridRenderer': lambda x: self._extract_entries(x, continuation_list),
|
'richGridRenderer': lambda x: self._extract_entries(x, continuation_list),
|
||||||
|
'lockupViewModel': lambda x: [self._extract_lockup_view_model(x)],
|
||||||
}
|
}
|
||||||
for key, renderer in isr_content.items():
|
for key, renderer in isr_content.items():
|
||||||
if key not in known_renderers:
|
if key not in known_renderers:
|
||||||
@ -5259,7 +5394,7 @@ def _extract_entries(self, parent_renderer, continuation_list):
|
|||||||
if not continuation_list[0]:
|
if not continuation_list[0]:
|
||||||
continuation_list[0] = self._extract_continuation(parent_renderer)
|
continuation_list[0] = self._extract_continuation(parent_renderer)
|
||||||
|
|
||||||
def _entries(self, tab, item_id, ytcfg, account_syncid, visitor_data):
|
def _entries(self, tab, item_id, ytcfg, delegated_session_id, visitor_data):
|
||||||
continuation_list = [None]
|
continuation_list = [None]
|
||||||
extract_entries = lambda x: self._extract_entries(x, continuation_list)
|
extract_entries = lambda x: self._extract_entries(x, continuation_list)
|
||||||
tab_content = try_get(tab, lambda x: x['content'], dict)
|
tab_content = try_get(tab, lambda x: x['content'], dict)
|
||||||
@ -5280,7 +5415,7 @@ def _entries(self, tab, item_id, ytcfg, account_syncid, visitor_data):
|
|||||||
break
|
break
|
||||||
seen_continuations.add(continuation_token)
|
seen_continuations.add(continuation_token)
|
||||||
headers = self.generate_api_headers(
|
headers = self.generate_api_headers(
|
||||||
ytcfg=ytcfg, account_syncid=account_syncid, visitor_data=visitor_data)
|
ytcfg=ytcfg, delegated_session_id=delegated_session_id, visitor_data=visitor_data)
|
||||||
response = self._extract_response(
|
response = self._extract_response(
|
||||||
item_id=f'{item_id} page {page_num}',
|
item_id=f'{item_id} page {page_num}',
|
||||||
query=continuation, headers=headers, ytcfg=ytcfg,
|
query=continuation, headers=headers, ytcfg=ytcfg,
|
||||||
@ -5350,7 +5485,7 @@ def _extract_from_tabs(self, item_id, ytcfg, data, tabs):
|
|||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
self._entries(
|
self._entries(
|
||||||
selected_tab, metadata['id'], ytcfg,
|
selected_tab, metadata['id'], ytcfg,
|
||||||
self._extract_account_syncid(ytcfg, data),
|
self._extract_delegated_session_id(ytcfg, data),
|
||||||
self._extract_visitor_data(data, ytcfg)),
|
self._extract_visitor_data(data, ytcfg)),
|
||||||
**metadata)
|
**metadata)
|
||||||
|
|
||||||
@ -5502,7 +5637,7 @@ def _extract_inline_playlist(self, playlist, playlist_id, data, ytcfg):
|
|||||||
watch_endpoint = try_get(
|
watch_endpoint = try_get(
|
||||||
playlist, lambda x: x['contents'][-1]['playlistPanelVideoRenderer']['navigationEndpoint']['watchEndpoint'])
|
playlist, lambda x: x['contents'][-1]['playlistPanelVideoRenderer']['navigationEndpoint']['watchEndpoint'])
|
||||||
headers = self.generate_api_headers(
|
headers = self.generate_api_headers(
|
||||||
ytcfg=ytcfg, account_syncid=self._extract_account_syncid(ytcfg, data),
|
ytcfg=ytcfg, delegated_session_id=self._extract_delegated_session_id(ytcfg, data),
|
||||||
visitor_data=self._extract_visitor_data(response, data, ytcfg))
|
visitor_data=self._extract_visitor_data(response, data, ytcfg))
|
||||||
query = {
|
query = {
|
||||||
'playlistId': playlist_id,
|
'playlistId': playlist_id,
|
||||||
@ -5600,7 +5735,7 @@ def _reload_with_unavailable_videos(self, item_id, data, ytcfg):
|
|||||||
if not is_playlist:
|
if not is_playlist:
|
||||||
return
|
return
|
||||||
headers = self.generate_api_headers(
|
headers = self.generate_api_headers(
|
||||||
ytcfg=ytcfg, account_syncid=self._extract_account_syncid(ytcfg, data),
|
ytcfg=ytcfg, delegated_session_id=self._extract_delegated_session_id(ytcfg, data),
|
||||||
visitor_data=self._extract_visitor_data(data, ytcfg))
|
visitor_data=self._extract_visitor_data(data, ytcfg))
|
||||||
query = {
|
query = {
|
||||||
'params': 'wgYCCAA=',
|
'params': 'wgYCCAA=',
|
||||||
@ -5794,7 +5929,7 @@ class YoutubeTabIE(YoutubeTabBaseInfoExtractor):
|
|||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'UCYO_jab_esuFRV4b17AJtAw',
|
'id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||||
'title': '3Blue1Brown - Playlists',
|
'title': '3Blue1Brown - Playlists',
|
||||||
'description': 'md5:4d1da95432004b7ba840ebc895b6b4c9',
|
'description': 'md5:602e3789e6a0cb7d9d352186b720e395',
|
||||||
'channel_url': 'https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw',
|
'channel_url': 'https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw',
|
||||||
'channel': '3Blue1Brown',
|
'channel': '3Blue1Brown',
|
||||||
'channel_id': 'UCYO_jab_esuFRV4b17AJtAw',
|
'channel_id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||||
@ -6813,7 +6948,7 @@ def _extract_tab_id_and_name(self, tab, base_url='https://www.youtube.com'):
|
|||||||
tab_url = urljoin(base_url, traverse_obj(
|
tab_url = urljoin(base_url, traverse_obj(
|
||||||
tab, ('endpoint', 'commandMetadata', 'webCommandMetadata', 'url')))
|
tab, ('endpoint', 'commandMetadata', 'webCommandMetadata', 'url')))
|
||||||
|
|
||||||
tab_id = (tab_url and self._get_url_mobj(tab_url)['tab'][1:]
|
tab_id = ((tab_url and self._get_url_mobj(tab_url)['tab'][1:])
|
||||||
or traverse_obj(tab, 'tabIdentifier', expected_type=str))
|
or traverse_obj(tab, 'tabIdentifier', expected_type=str))
|
||||||
if tab_id:
|
if tab_id:
|
||||||
return {
|
return {
|
||||||
|
@ -1370,12 +1370,12 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
|
|||||||
help='Allow Unicode characters, "&" and spaces in filenames (default)')
|
help='Allow Unicode characters, "&" and spaces in filenames (default)')
|
||||||
filesystem.add_option(
|
filesystem.add_option(
|
||||||
'--windows-filenames',
|
'--windows-filenames',
|
||||||
action='store_true', dest='windowsfilenames', default=False,
|
action='store_true', dest='windowsfilenames', default=None,
|
||||||
help='Force filenames to be Windows-compatible')
|
help='Force filenames to be Windows-compatible')
|
||||||
filesystem.add_option(
|
filesystem.add_option(
|
||||||
'--no-windows-filenames',
|
'--no-windows-filenames',
|
||||||
action='store_false', dest='windowsfilenames',
|
action='store_false', dest='windowsfilenames',
|
||||||
help='Make filenames Windows-compatible only if using Windows (default)')
|
help='Sanitize filenames only minimally')
|
||||||
filesystem.add_option(
|
filesystem.add_option(
|
||||||
'--trim-filenames', '--trim-file-names', metavar='LENGTH',
|
'--trim-filenames', '--trim-file-names', metavar='LENGTH',
|
||||||
dest='trim_file_name', default=0, type=int,
|
dest='trim_file_name', default=0, type=int,
|
||||||
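Note on the `--windows-filenames` hunk above: changing the default from `False` to `None` lets the two flags share one `dest` while still distinguishing "not specified" from an explicit `--no-windows-filenames`. A minimal sketch using only the standard `optparse` module follows; `build_parser` and `parse_windowsfilenames` are hypothetical helper names, and how yt-dlp itself interprets `None` is not shown in this hunk.

```python
import optparse


def build_parser():
    """Hypothetical stand-alone parser mirroring the two options in the hunk."""
    parser = optparse.OptionParser()
    parser.add_option(
        '--windows-filenames',
        action='store_true', dest='windowsfilenames', default=None)
    parser.add_option(
        '--no-windows-filenames',
        action='store_false', dest='windowsfilenames')
    return parser


def parse_windowsfilenames(argv):
    """Return the parsed value of `windowsfilenames` for the given arguments."""
    opts, _ = build_parser().parse_args(argv)
    return opts.windowsfilenames


if __name__ == '__main__':
    # Three distinguishable states: unset, forced on, forced off
    for argv in ([], ['--windows-filenames'], ['--no-windows-filenames']):
        print(argv, parse_windowsfilenames(argv))
```

With `default=False`, the first and third cases would be indistinguishable to the caller.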
|
@ -183,4 +183,4 @@ def load_plugins(name, suffix):
|
|||||||
|
|
||||||
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.extractor', f'{PACKAGE_NAME}.postprocessor'))
|
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.extractor', f'{PACKAGE_NAME}.postprocessor'))
|
||||||
|
|
||||||
__all__ = ['directories', 'load_plugins', 'PACKAGE_NAME', 'COMPAT_PACKAGE_NAME']
|
__all__ = ['COMPAT_PACKAGE_NAME', 'PACKAGE_NAME', 'directories', 'load_plugins']
|
||||||
|
@ -44,4 +44,4 @@ def get_postprocessor(key):
|
|||||||
|
|
||||||
globals().update(_PLUGIN_CLASSES)
|
globals().update(_PLUGIN_CLASSES)
|
||||||
__all__ = [name for name in globals() if name.endswith('PP')]
|
__all__ = [name for name in globals() if name.endswith('PP')]
|
||||||
__all__.extend(('PostProcessor', 'FFmpegPostProcessor'))
|
__all__.extend(('FFmpegPostProcessor', 'PostProcessor'))
|
||||||
|
@ -626,7 +626,7 @@ def run(self, info):
|
|||||||
sub_ext = sub_info['ext']
|
sub_ext = sub_info['ext']
|
||||||
if sub_ext == 'json':
|
if sub_ext == 'json':
|
||||||
self.report_warning('JSON subtitles cannot be embedded')
|
self.report_warning('JSON subtitles cannot be embedded')
|
||||||
elif ext != 'webm' or ext == 'webm' and sub_ext == 'vtt':
|
elif ext != 'webm' or (ext == 'webm' and sub_ext == 'vtt'):
|
||||||
sub_langs.append(lang)
|
sub_langs.append(lang)
|
||||||
sub_names.append(sub_info.get('name'))
|
sub_names.append(sub_info.get('name'))
|
||||||
sub_filenames.append(sub_info['filepath'])
|
sub_filenames.append(sub_info['filepath'])
|
||||||
|
@ -65,9 +65,14 @@ def _get_variant_and_executable_path():
|
|||||||
machine = '_legacy' if version_tuple(platform.mac_ver()[0]) < (10, 15) else ''
|
machine = '_legacy' if version_tuple(platform.mac_ver()[0]) < (10, 15) else ''
|
||||||
else:
|
else:
|
||||||
machine = f'_{platform.machine().lower()}'
|
machine = f'_{platform.machine().lower()}'
|
||||||
|
is_64bits = sys.maxsize > 2**32
|
||||||
# Ref: https://en.wikipedia.org/wiki/Uname#Examples
|
# Ref: https://en.wikipedia.org/wiki/Uname#Examples
|
||||||
if machine[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
|
if machine[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
|
||||||
machine = '_x86' if platform.architecture()[0][:2] == '32' else ''
|
machine = '_x86' if not is_64bits else ''
|
||||||
|
# platform.machine() on 32-bit raspbian OS may return 'aarch64', so check "64-bitness"
|
||||||
|
# See: https://github.com/yt-dlp/yt-dlp/issues/11813
|
||||||
|
elif machine[1:] == 'aarch64' and not is_64bits:
|
||||||
|
machine = '_armv7l'
|
||||||
# sys.executable returns a /tmp/ path for staticx builds (linux_static)
|
# sys.executable returns a /tmp/ path for staticx builds (linux_static)
|
||||||
# Ref: https://staticx.readthedocs.io/en/latest/usage.html#run-time-information
|
# Ref: https://staticx.readthedocs.io/en/latest/usage.html#run-time-information
|
||||||
if static_exe_path := os.getenv('STATICX_PROG_PATH'):
|
if static_exe_path := os.getenv('STATICX_PROG_PATH'):
|
||||||
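Note on the `is_64bits` hunk above: the check `sys.maxsize > 2**32` reflects the bitness of the running interpreter rather than what the kernel reports through `platform.machine()`. The sketch below restates the branch as a pure function so it can be exercised with arbitrary inputs; `resolve_machine_suffix` and `is_64bit_interpreter` are hypothetical names that do not appear in the diff.

```python
import sys


def is_64bit_interpreter(maxsize=sys.maxsize):
    """True when the interpreter's native integer size exceeds 32 bits."""
    return maxsize > 2**32


def resolve_machine_suffix(machine, is_64bits):
    """Hypothetical helper mirroring the machine-suffix branch in the hunk."""
    suffix = f'_{machine.lower()}'
    if suffix[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
        # 64-bit x86 builds carry no suffix; 32-bit builds use '_x86'
        return '' if is_64bits else '_x86'
    if suffix[1:] == 'aarch64' and not is_64bits:
        # A 32-bit userland on a 64-bit ARM kernel is treated as armv7l
        return '_armv7l'
    return suffix


if __name__ == '__main__':
    print(resolve_machine_suffix('aarch64', is_64bit_interpreter()))
```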
@ -525,11 +530,16 @@ def filename(self):
|
|||||||
@functools.cached_property
|
@functools.cached_property
|
||||||
def cmd(self):
|
def cmd(self):
|
||||||
"""The command-line to run the executable, if known"""
|
"""The command-line to run the executable, if known"""
|
||||||
|
argv = None
|
||||||
# There is no sys.orig_argv in py < 3.10. Also, it can be [] when frozen
|
# There is no sys.orig_argv in py < 3.10. Also, it can be [] when frozen
|
||||||
if getattr(sys, 'orig_argv', None):
|
if getattr(sys, 'orig_argv', None):
|
||||||
return sys.orig_argv
|
argv = sys.orig_argv
|
||||||
elif getattr(sys, 'frozen', False):
|
elif getattr(sys, 'frozen', False):
|
||||||
return sys.argv
|
argv = sys.argv
|
||||||
|
# linux_static exe's argv[0] will be /tmp/staticx-NNNN/yt-dlp_linux if we don't fixup here
|
||||||
|
if argv and os.getenv('STATICX_PROG_PATH'):
|
||||||
|
argv = [self.filename, *argv[1:]]
|
||||||
|
return argv
|
||||||
|
|
||||||
def restart(self):
|
def restart(self):
|
||||||
"""Restart the executable"""
|
"""Restart the executable"""
|
||||||
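Note on the `cmd` hunk above: instead of returning early, the new code collects `argv` first so that, when `STATICX_PROG_PATH` is set, the temporary extraction path in `argv[0]` can be replaced with the real executable path. A minimal sketch of that fixup in isolation follows; `fixup_argv` is a hypothetical name, and the environment is passed in as a mapping only to make the function easy to test.

```python
import os


def fixup_argv(argv, filename, environ=os.environ):
    """Replace argv[0] with `filename` when running from a staticx bundle."""
    if argv and environ.get('STATICX_PROG_PATH'):
        return [filename, *argv[1:]]
    # Unknown command line (None or empty) or a non-staticx build: leave as-is
    return argv


if __name__ == '__main__':
    # Illustrative paths only
    env = {'STATICX_PROG_PATH': '/usr/local/bin/yt-dlp_linux'}
    print(fixup_argv(['/tmp/staticx-abc/yt-dlp_linux', '-U'],
                     '/usr/local/bin/yt-dlp_linux', env))
```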
|
@ -2683,8 +2683,8 @@ def merge_dicts(*dicts):
|
|||||||
merged = {}
|
merged = {}
|
||||||
for a_dict in dicts:
|
for a_dict in dicts:
|
||||||
for k, v in a_dict.items():
|
for k, v in a_dict.items():
|
||||||
if (v is not None and k not in merged
|
if ((v is not None and k not in merged)
|
||||||
or isinstance(v, str) and merged[k] == ''):
|
or (isinstance(v, str) and merged[k] == '')):
|
||||||
merged[k] = v
|
merged[k] = v
|
||||||
return merged
|
return merged
|
||||||
|
|
||||||
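Note on the `merge_dicts` hunk above: the added parentheses do not change behavior, because `and` already binds more tightly than `or` in Python; they only make the grouping explicit. The sketch below copies the new version of the function from the hunk and adds a small, hypothetical `precedence_is_equivalent` check over all boolean combinations.

```python
import itertools


def merge_dicts(*dicts):
    """Merge dicts, keeping the first non-None value per key.

    An empty-string value may later be replaced by any string value.
    """
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            if ((v is not None and k not in merged)
                    or (isinstance(v, str) and merged[k] == '')):
                merged[k] = v
    return merged


def precedence_is_equivalent():
    """Check that `a and b or c and d` equals `(a and b) or (c and d)`."""
    return all(
        (a and b or c and d) == ((a and b) or (c and d))
        for a, b, c, d in itertools.product((False, True), repeat=4))


if __name__ == '__main__':
    print(merge_dicts({'a': 1, 'b': ''}, {'a': 2, 'b': 'x', 'c': None}))
    print(precedence_is_equivalent())
```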
|
@ -1,8 +1,8 @@
|
|||||||
# Autogenerated by devscripts/update-version.py
|
# Autogenerated by devscripts/update-version.py
|
||||||
|
|
||||||
__version__ = '2024.11.18'
|
__version__ = '2025.01.15'
|
||||||
|
|
||||||
RELEASE_GIT_HEAD = '7ea2787920cccc6b8ea30791993d114fbd564434'
|
RELEASE_GIT_HEAD = 'c8541f8b13e743fcfa06667530d13fee8686e22a'
|
||||||
|
|
||||||
VARIANT = None
|
VARIANT = None
|
||||||
|
|
||||||
@ -12,4 +12,4 @@
|
|||||||
|
|
||||||
ORIGIN = 'yt-dlp/yt-dlp'
|
ORIGIN = 'yt-dlp/yt-dlp'
|
||||||
|
|
||||||
_pkg_version = '2024.11.18'
|
_pkg_version = '2025.01.15'
|
||||||
|