1
0
mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-20 22:01:21 +00:00

Compare commits

..

89 Commits

Author SHA1 Message Date
github-actions[bot]
035b1ece8f Release 2025.07.21
Created by: bashonly

:ci skip all
2025-07-21 23:47:12 +00:00
sepro
9951fdd0d0 [cleanup] Misc (#13595)
Closes #10853, Closes #12436, Closes #13314, Closes #13609
Authored by: seproDev, InvalidUsernameException, doe1080, hseg, bashonly, adamralph

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
Co-authored-by: InvalidUsernameException <InvalidUsernameException@users.noreply.github.com>
Co-authored-by: gesh <gesh@gesh.uni.cx>
Co-authored-by: Adam Ralph <adam@adamralph.com>
Co-authored-by: doe1080 <98906116+doe1080@users.noreply.github.com>
2025-07-21 23:43:30 +00:00
Simon Sawicki
959ac99e98 Fix --exec placeholder expansion on Windows
See https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-45hg-7f49-5h56 for more details

Authored by: Grub4K
2025-07-21 18:19:46 -05:00
bashonly
d88b304d44 [ie/patreon:campaign] Fix extractor (#13712)
Closes #13622
Authored by: bashonly
2025-07-21 23:15:31 +00:00
bashonly
b15aa8d772 [ie/BiliBiliBangumi] Fix extractor (#13800)
Closes #13795
Authored by: bashonly
2025-07-21 23:11:58 +00:00
c-basalt
d3edc5d52a [ie/bilibili] Pass newer user-agent with API requests (#13736)
Closes #12887
Authored by: c-basalt
2025-07-21 23:04:43 +00:00
doe1080
060c6a4501 [ie/skeb] Rework extractor (#13593)
Closes #7440
Authored by: doe1080
2025-07-21 22:32:10 +00:00
doe1080
6be26626f7 [utils] unified_timestamp: Return int values (#13796)
Authored by: doe1080
2025-07-21 21:59:13 +00:00
bashonly
ef103b2d11 [ie/hotstar] Fix error handling (#13793)
Fix 7e0af2b1f0

Closes #13790
Authored by: bashonly
2025-07-21 19:09:52 +00:00
bashonly
3e49bc8a1b Make extractor-designated impersonation override --impersonate (#13792)
Fix 32809eb2da

Authored by: bashonly
2025-07-21 18:42:21 +00:00
bashonly
2ac3eb9837 Fix ImpersonateTarget sanitization (#13791)
Fix 32809eb2da

Authored by: bashonly
2025-07-21 18:41:00 +00:00
bashonly
8820101aa3 [ie/youtube] Use impersonation for downloading subtitles (#13786)
Closes #13770
Authored by: bashonly
2025-07-20 23:22:04 +00:00
bashonly
a4561c7a66 [rh:requests] Refactor default headers (#13785)
Authored by: bashonly
2025-07-20 23:20:58 +00:00
bashonly
32809eb2da Allow extractors to designate formats/subtitles for impersonation (#13778)
Authored by: bashonly
2025-07-20 23:05:43 +00:00
WouterGordts
f9dff95cb1 [ie/bandcamp] Extract tags (#13480)
Authored by: WouterGordts
2025-07-20 20:12:40 +00:00
Tim
790c286ce3 [ie/10play] Support new site domain (#13611)
Closes #13577
Authored by: Georift
2025-07-20 20:00:44 +00:00
bashonly
87e3dc8c7f [ie/mlbtv] Make formats downloadable with ffmpeg (#13761)
Authored by: bashonly
2025-07-20 19:57:20 +00:00
R0hanW
1a8474c3ca [ie/PlayerFm] Add extractor (#13016)
Closes #4518
Authored by: R0hanW
2025-07-19 01:38:52 +02:00
bashonly
09982bc33e [ie/dangalplay] Support other login regions (#13768)
Authored by: bashonly
2025-07-18 23:24:52 +00:00
Víctor Schmidt
c8329fc572 [ie/rai] Fix formats extraction (#13572)
Closes #13548
Authored by: moonshinerd, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-18 22:43:04 +00:00
bashonly
1f27a9f8ba [core] Warn when skipping formats (#13090)
Authored by: bashonly
2025-07-18 21:59:50 +00:00
bashonly
4919051e44 [core] Don't let format testing alter the return code (#13767)
Closes #13750
Authored by: bashonly
2025-07-18 21:55:02 +00:00
bashonly
5f951ce929 [ie/aenetworks] Support new URL formats (#13747)
Closes #13745
Authored by: bashonly
2025-07-18 20:06:02 +00:00
bashonly
28bf46b7da [utils] urlhandle_detect_ext: Use x-amz-meta-file-type headers (#13749)
Authored by: bashonly
2025-07-18 19:46:06 +00:00
bashonly
b8abd255e4 [utils] mimetype2ext: Always parse flac from audio/flac (#13748)
Authored by: bashonly
2025-07-18 19:43:40 +00:00
bashonly
c1ac543c81 [ie/soundcloud] Always extract original format extension (#13746)
Closes #13743
Authored by: bashonly
2025-07-16 23:19:58 +00:00
flanter21
dcc4cba39e [ie/blackboardcollaborate] Support subtitles and authwalled videos (#12473)
Authored by: flanter21
2025-07-16 23:17:48 +00:00
Nikolay Fedorov
3a84be9d16 [ie/TheHighWire] Add extractor (#13505)
Closes #13364
Authored by: swayll
2025-07-14 19:01:53 +00:00
rdamas
d42a6ff0c4 [ie/archive.org] Fix extractor (#13706)
Closes #13704
Authored by: rdamas
2025-07-14 18:55:52 +00:00
bashonly
ade876efb3 [ie/francetv] Improve error handling (#13726)
Closes #13324
Authored by: bashonly
2025-07-14 17:25:45 +00:00
bashonly
7e0af2b1f0 [ie/hotstar] Improve error handling (#13727)
Authored by: bashonly
2025-07-14 17:24:52 +00:00
doe1080
d57a0b5aa7 [ie/noovo] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
doe1080
6fb3947c0d [ie/bellmedia] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
doe1080
9f54ea3898 [ie/ctv] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
chauhantirth
07d1d85f63 [ie/hotstar] Fix support for free accounts (#13700)
Fixes b5bd057fe8

Closes #13600
Authored by: chauhantirth
2025-07-13 22:35:26 +00:00
doe1080
5d693446e8 [ie/limelight] Remove extractors (#13267)
Authored by: doe1080
2025-07-14 00:10:59 +02:00
doe1080
23e9389f93 [ie/bandaichannel] Remove extractor (#13152)
Closes #8829
Authored by: doe1080
2025-07-13 23:53:47 +02:00
doe1080
6d39c420f7 [ie/JoqrAg] Remove extractor (#13152)
Authored by: doe1080
2025-07-13 23:53:47 +02:00
barsnick
85c3fa1925 [ie/RaiSudtirol] Support alternative domain (#13718)
Authored by: barsnick
2025-07-13 23:35:10 +02:00
Povilas Balzaravičius
b4b4486eff [ie/LRTRadio] Fix extractor (#13717)
Authored by: Pawka
2025-07-13 21:24:37 +00:00
Frank Cai
630f3389c3 [ie/UnitedNationsWebTv] Add extractor (#13538)
Closes #2675
Authored by: averageFOSSenjoyer
2025-07-13 23:16:01 +02:00
bashonly
a6db1d297a [ie/vimeo] Handle age-restricted videos (#13719)
Closes #13716
Authored by: bashonly
2025-07-13 21:09:39 +00:00
ShockedPlot7560
0f33950c77 [ie/mixlr] Add extractors (#13561)
Authored by: ShockedPlot7560, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-13 01:35:51 +02:00
bashonly
b5fea53f20 [ie] Rework _search_nextjs_v13_data helper (#13711)
Fix 5245231e4a

Authored by: bashonly
2025-07-12 23:12:05 +00:00
bashonly
5245231e4a [ie] Add _search_nextjs_v13_data helper (#13398)
* Fixes FranceTVSiteIE livestream extraction
* Fixes GoPlayIE metadata extraction

Authored by: bashonly
2025-07-12 22:12:46 +00:00
Lyuben Ivanov
3ae61e0f31 [ie/BTVPlus] Add extractor (#13541)
Authored by: bubo
2025-07-12 21:56:11 +02:00
bashonly
a5d697f62d [ie/vimeo] Fix extractor (#13692)
Closes #13180, Closes #13689
Authored by: bashonly
2025-07-12 19:23:22 +00:00
coletdjnz
6e5bee418b [ie/youtube] Ensure context params are consistent for web clients (#13701)
Authored by: coletdjnz
2025-07-12 13:44:27 +12:00
coletdjnz
5b57b72c1a [ie/youtube] Do not require PO Token for premium accounts (#13640)
Authored by: coletdjnz
2025-07-11 18:54:01 +12:00
doe1080
2aaf1aa71d [ie/newspicks] Fix extractor (#13612)
Closes #10472
Authored by: doe1080
2025-07-09 22:21:47 +00:00
Nikolay Fedorov
7b4c96e089 [ie/mir24.tv] Add extractor (#13651)
Closes #13365
Authored by: swayll
2025-07-09 22:16:33 +00:00
bashonly
0b359b184d [ie/9gag] Support browser impersonation (#13678)
Closes #10837
Authored by: bashonly
2025-07-09 21:58:19 +00:00
bashonly
805519bfaa [jsinterp] Fix undefined variable name caching (#13677)
Fix b342d27f3f

Authored by: bashonly
2025-07-09 20:45:47 +00:00
coletdjnz
aa9f1f4d57 [ie/youtube] Log bad playability statuses of player responses (#13647)
Authored by: coletdjnz
2025-07-09 18:29:54 +12:00
InvalidUsernameException
fd36b8f31b [test:download] Support playlist_maxcount (#13433)
Authored by: InvalidUsernameException
2025-07-08 04:19:03 +00:00
barsnick
99093e96fd [devscripts] Fix filename/directory Bash completions (#13620)
Closes #13619
Authored by: barsnick
2025-07-08 04:18:15 +00:00
garret1317
7c49a93788 [ie/NhkRadiru] Fix metadata extraction (#12708)
Authored by: garret1317
2025-07-08 03:55:19 +00:00
bashonly
884f35d54a [ie/BiliBiliBangumi] Fix geo-block detection (#13667)
Closes #13634
Authored by: bashonly
2025-07-08 03:54:27 +00:00
bashonly
c23d837b65 [ie/youtube:tab] Fix subscriptions feed extraction (#13665)
Adds support for LOCKUP_CONTENT_TYPE_VIDEO view models

Closes #13658
Authored by: bashonly
2025-07-07 20:25:34 +00:00
bashonly
a7113722ec [fd/hls] Do not fall back to ffmpeg when native is required (#13655)
Authored by: bashonly
2025-07-06 22:14:22 +00:00
bashonly
0e68332bcb [ie/youtube] Fix subtitles extraction (#13659)
Fixes regression introduced in 2ba5391cd6

Closes #13654
Authored by: bashonly
2025-07-06 22:07:21 +00:00
bashonly
422cc8cb2f [ie/twitch] Improve error handling (#13618)
Authored by: bashonly
2025-07-06 22:03:34 +00:00
bashonly
fca94ac5d6 [ie/youtube] Extract global nsig helper functions (#13639)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-05 18:23:15 -05:00
bashonly
b342d27f3f [jsinterp] Cache undefined variable names (#13639)
Authored by: bashonly
2025-07-05 18:23:15 -05:00
bashonly
b6328ca050 [jsinterp] Fix variable scoping (#13639)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-05 18:23:15 -05:00
bashonly
0b41746964 [ie/sproutvideo] Fix extractor (#13610)
Closes #13606
Authored by: bashonly
2025-07-02 13:21:06 +00:00
Simon Sawicki
c316416b97 [rh:requests] Do not allocate 2GB on read (#13603)
Fixes c2ff2dbaec

Authored by: Grub4K
2025-07-02 01:42:00 +02:00
Simon Sawicki
e99c0b838a [ie] Detect invalid m3u8 playlist data (#13601)
Authored by: Grub4K
2025-07-02 00:32:32 +02:00
Simon Sawicki
c2ff2dbaec [rh:requests] Work around partial read dropping data (#13599)
Authored by: Grub4K
2025-07-02 00:12:43 +02:00
sepro
ca5cce5b07 [cleanup] Bump ruff to 0.12.x (#13596)
Authored by: seproDev
2025-07-01 21:17:11 +02:00
sepro
f3008bc5f8 No longer enable --mtime by default (#12781)
Closes #12780
Authored by: seproDev
2025-07-01 13:23:53 +02:00
github-actions[bot]
30fa54280b Release 2025.06.30
Created by: bashonly

:ci skip all
2025-06-30 23:47:20 +00:00
bashonly
b018784498 [cleanup] Misc (#13590)
Authored by: bashonly
2025-06-30 23:44:42 +00:00
bashonly
11b9416e10 [ie/sproutvideo] Support browser impersonation (#13589)
Closes #13576
Authored by: bashonly
2025-06-30 23:37:56 +00:00
Clark
35fc33fbc5 [ie/sauceplus] Add extractor (#13567)
Authored by: ceandreasen, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-06-30 23:25:28 +00:00
helpimnotdrowning
b16722ede8 [ie/kick] Support subscriber-only content (#13550)
Closes #13442
Authored by: helpimnotdrowning
2025-06-30 23:24:04 +00:00
bashonly
500761e41a [ie] Fix m3u8 playlist data corruption (#13588)
Revert 7b81634fb1

Closes #13581
Authored by: bashonly
2025-06-30 23:06:22 +00:00
bashonly
2ba5391cd6 [ie/youtube] Fix premium formats extraction (#13586)
Fix ff6f94041a

Closes #13545
Authored by: bashonly
2025-06-30 23:02:59 +00:00
bashonly
e9f157669e [ie/hotstar] Fix formats extraction (#13585)
Fix b5bd057fe8

Authored by: bashonly
2025-06-30 19:19:43 +00:00
sepro
958153a226 [jsinterp] Fix extract_object (#13580)
Fixes sig extraction for YouTube player `e12fbea4`

Authored by: seproDev
2025-06-30 15:50:33 +02:00
bashonly
1b88384634 [ci] Add signature tests (#13582)
Authored by: bashonly
2025-06-30 13:05:52 +00:00
Simon Sawicki
7b81634fb1 [ie] Detect invalid m3u8 playlist data (#13563)
Authored by: Grub4K
2025-06-29 18:49:27 +02:00
bashonly
7e2504f941 [ie/jiocinema] Remove extractors (#13565)
Closes #10123, Closes #10144, Closes #10225, Closes #10240, Closes #10508
Authored by: bashonly
2025-06-28 23:32:21 +00:00
bashonly
4bd9a7ade7 [ie/hotstar:series] Fix extractor (#13564)
* Removes HotStarSeasonIE and HotStarPlaylistIE

Authored by: bashonly
2025-06-28 23:30:51 +00:00
chauhantirth
b5bd057fe8 [ie/hotstar] Fix extractor (#13530)
Closes #11195
Authored by: chauhantirth, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-06-28 02:29:43 +00:00
bashonly
5e292baad6 [ie/hotstar] Raise for login required (#10405)
Closes #10366
Authored by: bashonly
2025-06-27 22:31:06 +00:00
bashonly
0a6b104489 [ie/hotstar] Fix metadata extraction (#13560)
Closes #7946
Authored by: bashonly
2025-06-27 22:29:37 +00:00
doe1080
06c1a8cdff [ie/niconico:live] Fix extractor and downloader (#13158)
Authored by: doe1080
2025-06-26 17:45:03 +00:00
c-basalt
99b85ac102 [ie/BilibiliSpaceVideo] Extract hidden-mode collections as playlists (#13533)
Closes #13435
Authored by: c-basalt
2025-06-26 17:42:41 +00:00
79 changed files with 2970 additions and 2282 deletions

41
.github/workflows/signature-tests.yml vendored Normal file
View File

@@ -0,0 +1,41 @@
name: Signature Tests
on:
push:
paths:
- .github/workflows/signature-tests.yml
- test/test_youtube_signature.py
- yt_dlp/jsinterp.py
pull_request:
paths:
- .github/workflows/signature-tests.yml
- test/test_youtube_signature.py
- yt_dlp/jsinterp.py
permissions:
contents: read
concurrency:
group: signature-tests-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
tests:
name: Signature Tests
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
python-version: ['3.9', '3.10', '3.11', '3.12', '3.13', pypy-3.10, pypy-3.11]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install test requirements
run: python3 ./devscripts/install_deps.py --only-optional --include test
- name: Run tests
timeout-minutes: 15
run: |
python3 -m yt_dlp -v || true # Print debug head
python3 ./devscripts/run_tests.py test/test_youtube_signature.py

View File

@@ -126,7 +126,7 @@ By sharing an account with anyone, you agree to bear all risks associated with i
While these steps won't necessarily ensure that no misuse of the account takes place, these are still some good practices to follow. While these steps won't necessarily ensure that no misuse of the account takes place, these are still some good practices to follow.
- Look for people with `Member` (maintainers of the project) or `Contributor` (people who have previously contributed code) tag on their messages. - Look for people with `Member` (maintainers of the project) or `Contributor` (people who have previously contributed code) tag on their messages.
- Change the password before sharing the account to something random (use [this](https://passwordsgenerator.net/) if you don't have a random password generator). - Change the password before sharing the account to something random.
- Change the password after receiving the account back. - Change the password after receiving the account back.
### Is the website primarily used for piracy? ### Is the website primarily used for piracy?

View File

@@ -781,3 +781,15 @@ maxbin123
nullpos nullpos
anlar anlar
eason1478 eason1478
ceandreasen
chauhantirth
helpimnotdrowning
adamralph
averageFOSSenjoyer
bubo
flanter21
Georift
moonshinerd
R0hanW
ShockedPlot7560
swayll

View File

@@ -4,6 +4,120 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
--> -->
### 2025.07.21
#### Important changes
- **Default behaviour changed from `--mtime` to `--no-mtime`**
yt-dlp no longer applies the server modified time to downloaded files by default. [Read more](https://github.com/yt-dlp/yt-dlp/issues/12780)
- Security: [[CVE-2025-54072](https://nvd.nist.gov/vuln/detail/CVE-2025-54072)] [Fix `--exec` placeholder expansion on Windows](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-45hg-7f49-5h56)
- When `--exec` is used on Windows, the filepath expanded from `{}` (or the default placeholder) is now properly escaped
#### Core changes
- [Allow extractors to designate formats/subtitles for impersonation](https://github.com/yt-dlp/yt-dlp/commit/32809eb2da92c649e540a5b714f6235036026161) ([#13778](https://github.com/yt-dlp/yt-dlp/issues/13778)) by [bashonly](https://github.com/bashonly) (With fixes in [3e49bc8](https://github.com/yt-dlp/yt-dlp/commit/3e49bc8a1bdb4109b857f2c361c358e86fa63405), [2ac3eb9](https://github.com/yt-dlp/yt-dlp/commit/2ac3eb98373d1c31341c5e918c83872c7ff409c6))
- [Don't let format testing alter the return code](https://github.com/yt-dlp/yt-dlp/commit/4919051e447c7f8ae9df8ba5c4208b6b5c04915a) ([#13767](https://github.com/yt-dlp/yt-dlp/issues/13767)) by [bashonly](https://github.com/bashonly)
- [Fix `--exec` placeholder expansion on Windows](https://github.com/yt-dlp/yt-dlp/commit/959ac99e98c3215437e573c22d64be42d361e863) by [Grub4K](https://github.com/Grub4K)
- [No longer enable `--mtime` by default](https://github.com/yt-dlp/yt-dlp/commit/f3008bc5f89d2691f2f8dfc51b406ef4e25281c3) ([#12781](https://github.com/yt-dlp/yt-dlp/issues/12781)) by [seproDev](https://github.com/seproDev)
- [Warn when skipping formats](https://github.com/yt-dlp/yt-dlp/commit/1f27a9f8baccb9105f2476154557540efe09a937) ([#13090](https://github.com/yt-dlp/yt-dlp/issues/13090)) by [bashonly](https://github.com/bashonly)
- **jsinterp**
- [Cache undefined variable names](https://github.com/yt-dlp/yt-dlp/commit/b342d27f3f82d913976509ddf5bff539ad8567ec) ([#13639](https://github.com/yt-dlp/yt-dlp/issues/13639)) by [bashonly](https://github.com/bashonly) (With fixes in [805519b](https://github.com/yt-dlp/yt-dlp/commit/805519bfaa7cb5443912dfe45ac774834ba65a16))
- [Fix variable scoping](https://github.com/yt-dlp/yt-dlp/commit/b6328ca05030d815222b25d208cc59a964623bf9) ([#13639](https://github.com/yt-dlp/yt-dlp/issues/13639)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- **utils**
- `mimetype2ext`: [Always parse `flac` from `audio/flac`](https://github.com/yt-dlp/yt-dlp/commit/b8abd255e454acbe0023cdb946f9eb461ced7eeb) ([#13748](https://github.com/yt-dlp/yt-dlp/issues/13748)) by [bashonly](https://github.com/bashonly)
- `unified_timestamp`: [Return `int` values](https://github.com/yt-dlp/yt-dlp/commit/6be26626f7cfa71d28e0fac2861eb04758810c5d) ([#13796](https://github.com/yt-dlp/yt-dlp/issues/13796)) by [doe1080](https://github.com/doe1080)
- `urlhandle_detect_ext`: [Use `x-amz-meta-file-type` headers](https://github.com/yt-dlp/yt-dlp/commit/28bf46b7dafe2e241137763bf570a2f91ba8a53a) ([#13749](https://github.com/yt-dlp/yt-dlp/issues/13749)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- [Add `_search_nextjs_v13_data` helper](https://github.com/yt-dlp/yt-dlp/commit/5245231e4a39ecd5595d4337d46d85e150e2430a) ([#13398](https://github.com/yt-dlp/yt-dlp/issues/13398)) by [bashonly](https://github.com/bashonly) (With fixes in [b5fea53](https://github.com/yt-dlp/yt-dlp/commit/b5fea53f2099bed41ba1b17ab0ac87c8dba5a5ec))
- [Detect invalid m3u8 playlist data](https://github.com/yt-dlp/yt-dlp/commit/e99c0b838a9c5feb40c0dcd291bd7b8620b8d36d) ([#13601](https://github.com/yt-dlp/yt-dlp/issues/13601)) by [Grub4K](https://github.com/Grub4K)
- **10play**: [Support new site domain](https://github.com/yt-dlp/yt-dlp/commit/790c286ce3e0b534ca2d8f6648ced220d888f139) ([#13611](https://github.com/yt-dlp/yt-dlp/issues/13611)) by [Georift](https://github.com/Georift)
- **9gag**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/0b359b184dee0c7052be482857bf562de67e4928) ([#13678](https://github.com/yt-dlp/yt-dlp/issues/13678)) by [bashonly](https://github.com/bashonly)
- **aenetworks**: [Support new URL formats](https://github.com/yt-dlp/yt-dlp/commit/5f951ce929b56a822514f1a02cc06af030855ec7) ([#13747](https://github.com/yt-dlp/yt-dlp/issues/13747)) by [bashonly](https://github.com/bashonly)
- **archive.org**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/d42a6ff0c4ca8893d722ff4e0c109aecbf4cc7cf) ([#13706](https://github.com/yt-dlp/yt-dlp/issues/13706)) by [rdamas](https://github.com/rdamas)
- **bandaichannel**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/23e9389f936ec5236a87815b8576e5ce567b2f77) ([#13152](https://github.com/yt-dlp/yt-dlp/issues/13152)) by [doe1080](https://github.com/doe1080)
- **bandcamp**: [Extract tags](https://github.com/yt-dlp/yt-dlp/commit/f9dff95cb1c138913011417b3bba020c0a691bba) ([#13480](https://github.com/yt-dlp/yt-dlp/issues/13480)) by [WouterGordts](https://github.com/WouterGordts)
- **bellmedia**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/6fb3947c0dc6d0e3eab5077c5bada8402f47a277) ([#13429](https://github.com/yt-dlp/yt-dlp/issues/13429)) by [doe1080](https://github.com/doe1080)
- **bilibili**: [Pass newer user-agent with API requests](https://github.com/yt-dlp/yt-dlp/commit/d3edc5d52a7159eda2331dbc7e14bf40a6585c81) ([#13736](https://github.com/yt-dlp/yt-dlp/issues/13736)) by [c-basalt](https://github.com/c-basalt)
- **bilibilibangumi**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/b15aa8d77257b86fa44c9a42a615dfe47ac5b3b7) ([#13800](https://github.com/yt-dlp/yt-dlp/issues/13800)) by [bashonly](https://github.com/bashonly)
- [Fix geo-block detection](https://github.com/yt-dlp/yt-dlp/commit/884f35d54a64f1e6e7be49459842f573fc3a2701) ([#13667](https://github.com/yt-dlp/yt-dlp/issues/13667)) by [bashonly](https://github.com/bashonly)
- **blackboardcollaborate**: [Support subtitles and authwalled videos](https://github.com/yt-dlp/yt-dlp/commit/dcc4cba39e2a79d3efce16afa28dbe245468489f) ([#12473](https://github.com/yt-dlp/yt-dlp/issues/12473)) by [flanter21](https://github.com/flanter21)
- **btvplus**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3ae61e0f313dd03a09060abc7a212775c3717818) ([#13541](https://github.com/yt-dlp/yt-dlp/issues/13541)) by [bubo](https://github.com/bubo)
- **ctv**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/9f54ea38984788811773ca2ceaca73864acf0e8a) ([#13429](https://github.com/yt-dlp/yt-dlp/issues/13429)) by [doe1080](https://github.com/doe1080)
- **dangalplay**: [Support other login regions](https://github.com/yt-dlp/yt-dlp/commit/09982bc33e2f1f9a1ff66e6738df44f15b36f6a6) ([#13768](https://github.com/yt-dlp/yt-dlp/issues/13768)) by [bashonly](https://github.com/bashonly)
- **francetv**: [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/ade876efb31d55d3394185ffc56942fdc8d325cc) ([#13726](https://github.com/yt-dlp/yt-dlp/issues/13726)) by [bashonly](https://github.com/bashonly)
- **hotstar**
- [Fix support for free accounts](https://github.com/yt-dlp/yt-dlp/commit/07d1d85f6387e4bdb107096f0131c7054f078bb9) ([#13700](https://github.com/yt-dlp/yt-dlp/issues/13700)) by [chauhantirth](https://github.com/chauhantirth)
- [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/7e0af2b1f0c3edb688603b022f3a9ca0bfdf75e9) ([#13727](https://github.com/yt-dlp/yt-dlp/issues/13727)) by [bashonly](https://github.com/bashonly) (With fixes in [ef103b2](https://github.com/yt-dlp/yt-dlp/commit/ef103b2d115bd0e880f9cfd2f7dd705f48e4b40d))
- **joqrag**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/6d39c420f7774562a106d90253e2ed5b75036321) ([#13152](https://github.com/yt-dlp/yt-dlp/issues/13152)) by [doe1080](https://github.com/doe1080)
- **limelight**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/5d693446e882931618c40c99bb593f0b87b30eb9) ([#13267](https://github.com/yt-dlp/yt-dlp/issues/13267)) by [doe1080](https://github.com/doe1080)
- **lrtradio**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/b4b4486effdcb96bb6b8148171a49ff579b69a4a) ([#13717](https://github.com/yt-dlp/yt-dlp/issues/13717)) by [Pawka](https://github.com/Pawka)
- **mir24.tv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/7b4c96e0898db048259ef5fdf12ed14e3605dce3) ([#13651](https://github.com/yt-dlp/yt-dlp/issues/13651)) by [swayll](https://github.com/swayll)
- **mixlr**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/0f33950c778331bf4803c76e8b0ba1862df93431) ([#13561](https://github.com/yt-dlp/yt-dlp/issues/13561)) by [seproDev](https://github.com/seproDev), [ShockedPlot7560](https://github.com/ShockedPlot7560)
- **mlbtv**: [Make formats downloadable with ffmpeg](https://github.com/yt-dlp/yt-dlp/commit/87e3dc8c7f78929d2ef4f4a44e6a567e04cd8226) ([#13761](https://github.com/yt-dlp/yt-dlp/issues/13761)) by [bashonly](https://github.com/bashonly)
- **newspicks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2aaf1aa71d174700859c9ec1a81109b78e34961c) ([#13612](https://github.com/yt-dlp/yt-dlp/issues/13612)) by [doe1080](https://github.com/doe1080)
- **nhkradiru**: [Fix metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/7c49a937887756efcfa162abdcf17e48c244cb0c) ([#12708](https://github.com/yt-dlp/yt-dlp/issues/12708)) by [garret1317](https://github.com/garret1317)
- **noovo**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/d57a0b5aa78d59324b037d37492fe86aa4fbf58a) ([#13429](https://github.com/yt-dlp/yt-dlp/issues/13429)) by [doe1080](https://github.com/doe1080)
- **patreon**: campaign: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/d88b304d44c599d81acfa4231502270c8b9fe2f8) ([#13712](https://github.com/yt-dlp/yt-dlp/issues/13712)) by [bashonly](https://github.com/bashonly)
- **playerfm**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1a8474c3ca6dbe51bb153b2b8eef7b9a61fa7dc3) ([#13016](https://github.com/yt-dlp/yt-dlp/issues/13016)) by [R0hanW](https://github.com/R0hanW)
- **rai**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/c8329fc572903eeed7edad1642773b2268b71a62) ([#13572](https://github.com/yt-dlp/yt-dlp/issues/13572)) by [moonshinerd](https://github.com/moonshinerd), [seproDev](https://github.com/seproDev)
- **raisudtirol**: [Support alternative domain](https://github.com/yt-dlp/yt-dlp/commit/85c3fa1925a9057ef4ae8af682686d5b3eb8e568) ([#13718](https://github.com/yt-dlp/yt-dlp/issues/13718)) by [barsnick](https://github.com/barsnick)
- **skeb**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/060c6a4501a0b8a92f1b9c12788f556d902c83c6) ([#13593](https://github.com/yt-dlp/yt-dlp/issues/13593)) by [doe1080](https://github.com/doe1080)
- **soundcloud**: [Always extract original format extension](https://github.com/yt-dlp/yt-dlp/commit/c1ac543c8166ff031d62e340b3244ca8556e3fb9) ([#13746](https://github.com/yt-dlp/yt-dlp/issues/13746)) by [bashonly](https://github.com/bashonly)
- **sproutvideo**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/0b41746964e1d0470ac286ce09408940a3a51147) ([#13610](https://github.com/yt-dlp/yt-dlp/issues/13610)) by [bashonly](https://github.com/bashonly)
- **thehighwire**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3a84be9d1660ef798ea28f929a20391bef6afda4) ([#13505](https://github.com/yt-dlp/yt-dlp/issues/13505)) by [swayll](https://github.com/swayll)
- **twitch**: [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/422cc8cb2ff2bd3b4c2bc64e23507b7e6f522c35) ([#13618](https://github.com/yt-dlp/yt-dlp/issues/13618)) by [bashonly](https://github.com/bashonly)
- **unitednationswebtv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/630f3389c33f0f7f6ec97e8917d20aeb4e4078da) ([#13538](https://github.com/yt-dlp/yt-dlp/issues/13538)) by [averageFOSSenjoyer](https://github.com/averageFOSSenjoyer)
- **vimeo**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/a5d697f62d8be78ffd472acb2f52c8bc32833003) ([#13692](https://github.com/yt-dlp/yt-dlp/issues/13692)) by [bashonly](https://github.com/bashonly)
- [Handle age-restricted videos](https://github.com/yt-dlp/yt-dlp/commit/a6db1d297ab40cc346de24aacbeab93112b2f4e1) ([#13719](https://github.com/yt-dlp/yt-dlp/issues/13719)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Do not require PO Token for premium accounts](https://github.com/yt-dlp/yt-dlp/commit/5b57b72c1a7c6bd249ffcebdf5630761ec664c10) ([#13640](https://github.com/yt-dlp/yt-dlp/issues/13640)) by [coletdjnz](https://github.com/coletdjnz)
- [Ensure context params are consistent for web clients](https://github.com/yt-dlp/yt-dlp/commit/6e5bee418bc108565108153fd745c8e7a59f16dd) ([#13701](https://github.com/yt-dlp/yt-dlp/issues/13701)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract global nsig helper functions](https://github.com/yt-dlp/yt-dlp/commit/fca94ac5d63ed6578b5cd9c8129d97a8a713c39a) ([#13639](https://github.com/yt-dlp/yt-dlp/issues/13639)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/0e68332bcb9fba87c42805b7a051eeb2bed36206) ([#13659](https://github.com/yt-dlp/yt-dlp/issues/13659)) by [bashonly](https://github.com/bashonly)
- [Log bad playability statuses of player responses](https://github.com/yt-dlp/yt-dlp/commit/aa9f1f4d577e99897ac16cd19d4e217d688ea75d) ([#13647](https://github.com/yt-dlp/yt-dlp/issues/13647)) by [coletdjnz](https://github.com/coletdjnz)
- [Use impersonation for downloading subtitles](https://github.com/yt-dlp/yt-dlp/commit/8820101aa3152e5f4811541c645f8b5de231ba8c) ([#13786](https://github.com/yt-dlp/yt-dlp/issues/13786)) by [bashonly](https://github.com/bashonly)
- tab: [Fix subscriptions feed extraction](https://github.com/yt-dlp/yt-dlp/commit/c23d837b6524d1e7a4595948871ba1708cba4dfa) ([#13665](https://github.com/yt-dlp/yt-dlp/issues/13665)) by [bashonly](https://github.com/bashonly)
#### Downloader changes
- **hls**: [Do not fall back to ffmpeg when native is required](https://github.com/yt-dlp/yt-dlp/commit/a7113722ec33f30fc898caee9242af2b82188a53) ([#13655](https://github.com/yt-dlp/yt-dlp/issues/13655)) by [bashonly](https://github.com/bashonly)
#### Networking changes
- **Request Handler**
- requests
- [Refactor default headers](https://github.com/yt-dlp/yt-dlp/commit/a4561c7a66c39d88efe7ae51e7fa1986faf093fb) ([#13785](https://github.com/yt-dlp/yt-dlp/issues/13785)) by [bashonly](https://github.com/bashonly)
- [Work around partial read dropping data](https://github.com/yt-dlp/yt-dlp/commit/c2ff2dbaec7929015373fe002e9bd4849931a4ce) ([#13599](https://github.com/yt-dlp/yt-dlp/issues/13599)) by [Grub4K](https://github.com/Grub4K) (With fixes in [c316416](https://github.com/yt-dlp/yt-dlp/commit/c316416b972d1b05e58fbcc21e80428b900ce102))
#### Misc. changes
- **cleanup**
- [Bump ruff to 0.12.x](https://github.com/yt-dlp/yt-dlp/commit/ca5cce5b07d51efe7310b449cdefeca8d873e9df) ([#13596](https://github.com/yt-dlp/yt-dlp/issues/13596)) by [seproDev](https://github.com/seproDev)
- Miscellaneous: [9951fdd](https://github.com/yt-dlp/yt-dlp/commit/9951fdd0d08b655cb1af8cd7f32a3fb7e2b1324e) by [adamralph](https://github.com/adamralph), [bashonly](https://github.com/bashonly), [doe1080](https://github.com/doe1080), [hseg](https://github.com/hseg), [InvalidUsernameException](https://github.com/InvalidUsernameException), [seproDev](https://github.com/seproDev)
- **devscripts**: [Fix filename/directory Bash completions](https://github.com/yt-dlp/yt-dlp/commit/99093e96fd6a26dea9d6e4bd1e4b16283b6ad1ee) ([#13620](https://github.com/yt-dlp/yt-dlp/issues/13620)) by [barsnick](https://github.com/barsnick)
- **test**: download: [Support `playlist_maxcount`](https://github.com/yt-dlp/yt-dlp/commit/fd36b8f31bafbd8096bdb92a446a0c9c6081209c) ([#13433](https://github.com/yt-dlp/yt-dlp/issues/13433)) by [InvalidUsernameException](https://github.com/InvalidUsernameException)
### 2025.06.30
#### Core changes
- **jsinterp**: [Fix `extract_object`](https://github.com/yt-dlp/yt-dlp/commit/958153a226214c86879e36211ac191bf78289578) ([#13580](https://github.com/yt-dlp/yt-dlp/issues/13580)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **bilibilispacevideo**: [Extract hidden-mode collections as playlists](https://github.com/yt-dlp/yt-dlp/commit/99b85ac102047446e6adf5b62bfc3c8d80b53778) ([#13533](https://github.com/yt-dlp/yt-dlp/issues/13533)) by [c-basalt](https://github.com/c-basalt)
- **hotstar**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/b5bd057fe86550f3aa67f2fc8790d1c6a251c57b) ([#13530](https://github.com/yt-dlp/yt-dlp/issues/13530)) by [bashonly](https://github.com/bashonly), [chauhantirth](https://github.com/chauhantirth) (With fixes in [e9f1576](https://github.com/yt-dlp/yt-dlp/commit/e9f157669e24953a88d15ce22053649db7a8e81e) by [bashonly](https://github.com/bashonly))
- [Fix metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/0a6b1044899f452cd10b6c7a6b00fa985a9a8b97) ([#13560](https://github.com/yt-dlp/yt-dlp/issues/13560)) by [bashonly](https://github.com/bashonly)
- [Raise for login required](https://github.com/yt-dlp/yt-dlp/commit/5e292baad62c749b6c340621ab2d0f904165ddfb) ([#10405](https://github.com/yt-dlp/yt-dlp/issues/10405)) by [bashonly](https://github.com/bashonly)
- series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4bd9a7ade7e0508b9795b3e72a69eeb40788b62b) ([#13564](https://github.com/yt-dlp/yt-dlp/issues/13564)) by [bashonly](https://github.com/bashonly)
- **jiocinema**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/7e2504f941a11ea2b0dba00de3f0295cdc253e79) ([#13565](https://github.com/yt-dlp/yt-dlp/issues/13565)) by [bashonly](https://github.com/bashonly)
- **kick**: [Support subscriber-only content](https://github.com/yt-dlp/yt-dlp/commit/b16722ede83377f77ea8352dcd0a6ca8e83b8f0f) ([#13550](https://github.com/yt-dlp/yt-dlp/issues/13550)) by [helpimnotdrowning](https://github.com/helpimnotdrowning)
- **niconico**: live: [Fix extractor and downloader](https://github.com/yt-dlp/yt-dlp/commit/06c1a8cdffe14050206683253726875144192ef5) ([#13158](https://github.com/yt-dlp/yt-dlp/issues/13158)) by [doe1080](https://github.com/doe1080)
- **sauceplus**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/35fc33fbc51c7f5392fb2300f65abf6cf107ef90) ([#13567](https://github.com/yt-dlp/yt-dlp/issues/13567)) by [bashonly](https://github.com/bashonly), [ceandreasen](https://github.com/ceandreasen)
- **sproutvideo**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/11b9416e10cff7513167d76d6c47774fcdd3e26a) ([#13589](https://github.com/yt-dlp/yt-dlp/issues/13589)) by [bashonly](https://github.com/bashonly)
- **youtube**: [Fix premium formats extraction](https://github.com/yt-dlp/yt-dlp/commit/2ba5391cd68ed4f2415c827d2cecbcbc75ace10b) ([#13586](https://github.com/yt-dlp/yt-dlp/issues/13586)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **ci**: [Add signature tests](https://github.com/yt-dlp/yt-dlp/commit/1b883846347addeab12663fd74317fd544341a1c) ([#13582](https://github.com/yt-dlp/yt-dlp/issues/13582)) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [b018784](https://github.com/yt-dlp/yt-dlp/commit/b0187844988e557c7e1e6bb1aabd4c1176768d86) by [bashonly](https://github.com/bashonly)
### 2025.06.25 ### 2025.06.25
#### Extractor changes #### Extractor changes

View File

@@ -277,7 +277,7 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
<!-- MANPAGE: BEGIN EXCLUDED SECTION --> <!-- MANPAGE: BEGIN EXCLUDED SECTION -->
yt-dlp [OPTIONS] [--] URL [URL...] yt-dlp [OPTIONS] [--] URL [URL...]
`Ctrl+F` is your friend :D Tip: Use `CTRL`+`F` (or `Command`+`F`) to search by keywords
<!-- MANPAGE: END EXCLUDED SECTION --> <!-- MANPAGE: END EXCLUDED SECTION -->
<!-- Auto generated --> <!-- Auto generated -->
@@ -639,9 +639,9 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
--no-part Do not use .part files - write directly into --no-part Do not use .part files - write directly into
output file output file
--mtime Use the Last-modified header to set the file --mtime Use the Last-modified header to set the file
modification time (default) modification time
--no-mtime Do not use the Last-modified header to set --no-mtime Do not use the Last-modified header to set
the file modification time the file modification time (default)
--write-description Write video description to a .description file --write-description Write video description to a .description file
--no-write-description Do not write video description (default) --no-write-description Do not write video description (default)
--write-info-json Write video metadata to a .info.json file --write-info-json Write video metadata to a .info.json file
@@ -1156,15 +1156,15 @@ You can configure yt-dlp by placing any supported command line option in a confi
* `/etc/yt-dlp/config` * `/etc/yt-dlp/config`
* `/etc/yt-dlp/config.txt` * `/etc/yt-dlp/config.txt`
E.g. with the following configuration file, yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory: E.g. with the following configuration file, yt-dlp will always extract the audio, copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
``` ```
# Lines starting with # are comments # Lines starting with # are comments
# Always extract audio # Always extract audio
-x -x
# Do not copy the mtime # Copy the mtime
--no-mtime --mtime
# Use this proxy # Use this proxy
--proxy 127.0.0.1:3128 --proxy 127.0.0.1:3128
@@ -1799,6 +1799,7 @@ The following extractors use this feature:
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv`, `tv_simply` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios` * `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv`, `tv_simply` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
* `webpage_skip`: Skip extraction of embedded webpage data. One or both of `player_response`, `initial_data`. These options are for testing purposes and don't skip any network requests
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp. * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual` * `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side) * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@@ -1900,6 +1901,10 @@ The following extractors use this feature:
#### tver #### tver
* `backend`: Backend API to use for extraction - one of `streaks` (default) or `brightcove` (deprecated) * `backend`: Backend API to use for extraction - one of `streaks` (default) or `brightcove` (deprecated)
#### vimeo
* `client`: Client to extract video data from. The currently available clients are `android`, `ios`, and `web`. Only one client can be used. The `android` client is used by default. If account cookies or credentials are used for authentication, then the `web` client is used by default. The `web` client only works with authentication. The `ios` client only works with previously cached OAuth tokens
* `original_format_policy`: Policy for when to try extracting original formats. One of `always`, `never`, or `auto`. The default `auto` policy tries to avoid exceeding the web client's API rate-limit by only making an extra request when Vimeo publicizes the video's downloadability
**Note**: These options may be changed/removed in the future without concern for backward compatibility **Note**: These options may be changed/removed in the future without concern for backward compatibility
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE --> <!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@@ -2262,6 +2267,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests. * yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests.
* The sub-modules `swfinterp`, `casefold` are removed. * The sub-modules `swfinterp`, `casefold` are removed.
* Passing `--simulate` (or calling `extract_info` with `download=False`) no longer alters the default format selection. See [#9843](https://github.com/yt-dlp/yt-dlp/issues/9843) for details. * Passing `--simulate` (or calling `extract_info` with `download=False`) no longer alters the default format selection. See [#9843](https://github.com/yt-dlp/yt-dlp/issues/9843) for details.
* yt-dlp no longer applies the server modified time to downloaded files by default. Use `--mtime` or `--compat-options mtime-by-default` to revert this.
For ease of use, a few more compat options are available: For ease of use, a few more compat options are available:
@@ -2271,7 +2277,7 @@ For ease of use, a few more compat options are available:
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization` * `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization`
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx` * `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Same as `--compat-options 2024,prefer-vp9-sort` * `--compat-options 2023`: Same as `--compat-options 2024,prefer-vp9-sort`
* `--compat-options 2024`: Currently does nothing. Use this to enable all future compat options * `--compat-options 2024`: Same as `--compat-options mtime-by-default`. Use this to enable all future compat options
The following compat options restore vulnerable behavior from before security patches: The following compat options restore vulnerable behavior from before security patches:

View File

@@ -10,9 +10,13 @@ __yt_dlp()
diropts="--cache-dir" diropts="--cache-dir"
if [[ ${prev} =~ ${fileopts} ]]; then if [[ ${prev} =~ ${fileopts} ]]; then
local IFS=$'\n'
type compopt &>/dev/null && compopt -o filenames
COMPREPLY=( $(compgen -f -- ${cur}) ) COMPREPLY=( $(compgen -f -- ${cur}) )
return 0 return 0
elif [[ ${prev} =~ ${diropts} ]]; then elif [[ ${prev} =~ ${diropts} ]]; then
local IFS=$'\n'
type compopt &>/dev/null && compopt -o dirnames
COMPREPLY=( $(compgen -d -- ${cur}) ) COMPREPLY=( $(compgen -d -- ${cur}) )
return 0 return 0
fi fi

View File

@@ -254,5 +254,23 @@
{ {
"action": "remove", "action": "remove",
"when": "d596824c2f8428362c072518856065070616e348" "when": "d596824c2f8428362c072518856065070616e348"
},
{
"action": "remove",
"when": "7b81634fb1d15999757e7a9883daa6ef09ea785b"
},
{
"action": "remove",
"when": "500761e41acb96953a5064e951d41d190c287e46"
},
{
"action": "add",
"when": "f3008bc5f89d2691f2f8dfc51b406ef4e25281c3",
"short": "[priority] **Default behaviour changed from `--mtime` to `--no-mtime`**\nyt-dlp no longer applies the server modified time to downloaded files by default. [Read more](https://github.com/yt-dlp/yt-dlp/issues/12780)"
},
{
"action": "add",
"when": "959ac99e98c3215437e573c22d64be42d361e863",
"short": "[priority] Security: [[CVE-2025-54072](https://nvd.nist.gov/vuln/detail/CVE-2025-54072)] [Fix `--exec` placeholder expansion on Windows](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-45hg-7f49-5h56)\n - When `--exec` is used on Windows, the filepath expanded from `{}` (or the default placeholder) is now properly escaped"
} }
] ]

View File

@@ -75,7 +75,7 @@ dev = [
] ]
static-analysis = [ static-analysis = [
"autopep8~=2.0", "autopep8~=2.0",
"ruff~=0.11.0", "ruff~=0.12.0",
] ]
test = [ test = [
"pytest~=8.1", "pytest~=8.1",
@@ -210,10 +210,12 @@ ignore = [
"TD001", # invalid-todo-tag "TD001", # invalid-todo-tag
"TD002", # missing-todo-author "TD002", # missing-todo-author
"TD003", # missing-todo-link "TD003", # missing-todo-link
"PLC0415", # import-outside-top-level
"PLE0604", # invalid-all-object (false positives) "PLE0604", # invalid-all-object (false positives)
"PLE0643", # potential-index-error (false positives) "PLE0643", # potential-index-error (false positives)
"PLW0603", # global-statement "PLW0603", # global-statement
"PLW1510", # subprocess-run-without-check "PLW1510", # subprocess-run-without-check
"PLW1641", # eq-without-hash
"PLW2901", # redefined-loop-name "PLW2901", # redefined-loop-name
"RUF001", # ambiguous-unicode-character-string "RUF001", # ambiguous-unicode-character-string
"RUF012", # mutable-class-default "RUF012", # mutable-class-default

View File

@@ -133,7 +133,6 @@ The only reliable way to check if a site is supported is to try it.
- **BaiduVideo**: 百度视频 - **BaiduVideo**: 百度视频
- **BanBye** - **BanBye**
- **BanByeChannel** - **BanByeChannel**
- **bandaichannel**
- **Bandcamp** - **Bandcamp**
- **Bandcamp:album** - **Bandcamp:album**
- **Bandcamp:user** - **Bandcamp:user**
@@ -157,7 +156,6 @@ The only reliable way to check if a site is supported is to try it.
- **Beeg** - **Beeg**
- **BehindKink**: (**Currently broken**) - **BehindKink**: (**Currently broken**)
- **Bellator** - **Bellator**
- **BellMedia**
- **BerufeTV** - **BerufeTV**
- **Bet**: (**Currently broken**) - **Bet**: (**Currently broken**)
- **bfi:player**: (**Currently broken**) - **bfi:player**: (**Currently broken**)
@@ -197,6 +195,7 @@ The only reliable way to check if a site is supported is to try it.
- **BitChute** - **BitChute**
- **BitChuteChannel** - **BitChuteChannel**
- **BlackboardCollaborate** - **BlackboardCollaborate**
- **BlackboardCollaborateLaunch**
- **BleacherReport**: (**Currently broken**) - **BleacherReport**: (**Currently broken**)
- **BleacherReportCMS**: (**Currently broken**) - **BleacherReportCMS**: (**Currently broken**)
- **blerp** - **blerp**
@@ -225,6 +224,7 @@ The only reliable way to check if a site is supported is to try it.
- **Brilliantpala:Elearn**: [*brilliantpala*](## "netrc machine") VoD on elearn.brilliantpala.org - **Brilliantpala:Elearn**: [*brilliantpala*](## "netrc machine") VoD on elearn.brilliantpala.org
- **bt:article**: Bergens Tidende Articles - **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BTVPlus**
- **Bundesliga** - **Bundesliga**
- **Bundestag** - **Bundestag**
- **BunnyCdn** - **BunnyCdn**
@@ -317,7 +317,6 @@ The only reliable way to check if a site is supported is to try it.
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CSpanCongress** - **CSpanCongress**
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTV**
- **CTVNews** - **CTVNews**
- **cu.ntv.co.jp**: 日テレ無料TADA! - **cu.ntv.co.jp**: 日テレ無料TADA!
- **CultureUnplugged** - **CultureUnplugged**
@@ -575,9 +574,7 @@ The only reliable way to check if a site is supported is to try it.
- **HollywoodReporterPlaylist** - **HollywoodReporterPlaylist**
- **Holodex** - **Holodex**
- **HotNewHipHop**: (**Currently broken**) - **HotNewHipHop**: (**Currently broken**)
- **hotstar** - **hotstar**: JioHotstar
- **hotstar:playlist**
- **hotstar:season**
- **hotstar:series** - **hotstar:series**
- **hrfernsehen** - **hrfernsehen**
- **HRTi**: [*hrti*](## "netrc machine") - **HRTi**: [*hrti*](## "netrc machine")
@@ -647,8 +644,6 @@ The only reliable way to check if a site is supported is to try it.
- **Jamendo** - **Jamendo**
- **JamendoAlbum** - **JamendoAlbum**
- **JeuxVideo**: (**Currently broken**) - **JeuxVideo**: (**Currently broken**)
- **jiocinema**: [*jiocinema*](## "netrc machine")
- **jiocinema:series**: [*jiocinema*](## "netrc machine")
- **jiosaavn:album** - **jiosaavn:album**
- **jiosaavn:artist** - **jiosaavn:artist**
- **jiosaavn:playlist** - **jiosaavn:playlist**
@@ -656,7 +651,6 @@ The only reliable way to check if a site is supported is to try it.
- **jiosaavn:show:playlist** - **jiosaavn:show:playlist**
- **jiosaavn:song** - **jiosaavn:song**
- **Joj** - **Joj**
- **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)
- **Jove** - **Jove**
- **JStream** - **JStream**
- **JTBC**: jtbc.co.kr - **JTBC**: jtbc.co.kr
@@ -727,9 +721,6 @@ The only reliable way to check if a site is supported is to try it.
- **life:embed** - **life:embed**
- **likee** - **likee**
- **likee:user** - **likee:user**
- **limelight**
- **limelight:channel**
- **limelight:channel_list**
- **LinkedIn**: [*linkedin*](## "netrc machine") - **LinkedIn**: [*linkedin*](## "netrc machine")
- **linkedin:events**: [*linkedin*](## "netrc machine") - **linkedin:events**: [*linkedin*](## "netrc machine")
- **linkedin:learning**: [*linkedin*](## "netrc machine") - **linkedin:learning**: [*linkedin*](## "netrc machine")
@@ -811,6 +802,7 @@ The only reliable way to check if a site is supported is to try it.
- **minds:channel** - **minds:channel**
- **minds:group** - **minds:group**
- **Minoto** - **Minoto**
- **mir24.tv**
- **mirrativ** - **mirrativ**
- **mirrativ:user** - **mirrativ:user**
- **MirrorCoUK** - **MirrorCoUK**
@@ -821,6 +813,8 @@ The only reliable way to check if a site is supported is to try it.
- **mixcloud** - **mixcloud**
- **mixcloud:playlist** - **mixcloud:playlist**
- **mixcloud:user** - **mixcloud:user**
- **Mixlr**
- **MixlrRecoring**
- **MLB** - **MLB**
- **MLBArticle** - **MLBArticle**
- **MLBTV**: [*mlb*](## "netrc machine") - **MLBTV**: [*mlb*](## "netrc machine")
@@ -977,7 +971,6 @@ The only reliable way to check if a site is supported is to try it.
- **NoicePodcast** - **NoicePodcast**
- **NonkTube** - **NonkTube**
- **NoodleMagazine** - **NoodleMagazine**
- **Noovo**
- **NOSNLArticle** - **NOSNLArticle**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz - **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- **NovaEmbed** - **NovaEmbed**
@@ -1101,6 +1094,7 @@ The only reliable way to check if a site is supported is to try it.
- **Platzi**: [*platzi*](## "netrc machine") - **Platzi**: [*platzi*](## "netrc machine")
- **PlatziCourse**: [*platzi*](## "netrc machine") - **PlatziCourse**: [*platzi*](## "netrc machine")
- **player.sky.it** - **player.sky.it**
- **PlayerFm**
- **playeur** - **playeur**
- **PlayPlusTV**: [*playplustv*](## "netrc machine") - **PlayPlusTV**: [*playplustv*](## "netrc machine")
- **PlaySuisse**: [*playsuisse*](## "netrc machine") - **PlaySuisse**: [*playsuisse*](## "netrc machine")
@@ -1299,6 +1293,7 @@ The only reliable way to check if a site is supported is to try it.
- **SampleFocus** - **SampleFocus**
- **Sangiin**: 参議院インターネット審議中継 (archive) - **Sangiin**: 参議院インターネット審議中継 (archive)
- **Sapo**: SAPO Vídeos - **Sapo**: SAPO Vídeos
- **SaucePlus**: Sauce+
- **SBS**: sbs.com.au - **SBS**: sbs.com.au
- **sbs.co.kr** - **sbs.co.kr**
- **sbs.co.kr:allvod_program** - **sbs.co.kr:allvod_program**
@@ -1475,11 +1470,12 @@ The only reliable way to check if a site is supported is to try it.
- **Tempo** - **Tempo**
- **TennisTV**: [*tennistv*](## "netrc machine") - **TennisTV**: [*tennistv*](## "netrc machine")
- **TF1** - **TF1**
- **TFO** - **TFO**: (**Currently broken**)
- **theatercomplextown:ppv**: [*theatercomplextown*](## "netrc machine") - **theatercomplextown:ppv**: [*theatercomplextown*](## "netrc machine")
- **theatercomplextown:vod**: [*theatercomplextown*](## "netrc machine") - **theatercomplextown:vod**: [*theatercomplextown*](## "netrc machine")
- **TheGuardianPodcast** - **TheGuardianPodcast**
- **TheGuardianPodcastPlaylist** - **TheGuardianPodcastPlaylist**
- **TheHighWire**
- **TheHoleTv** - **TheHoleTv**
- **TheIntercept** - **TheIntercept**
- **ThePlatform** - **ThePlatform**
@@ -1547,8 +1543,8 @@ The only reliable way to check if a site is supported is to try it.
- **tv2playseries.hu** - **tv2playseries.hu**
- **TV4**: tv4.se and tv4play.se - **TV4**: tv4.se and tv4play.se
- **TV5MONDE** - **TV5MONDE**
- **tv5unis** - **tv5unis**: (**Currently broken**)
- **tv5unis:video** - **tv5unis:video**: (**Currently broken**)
- **tv8.it** - **tv8.it**
- **tv8.it:live**: TV8 Live - **tv8.it:live**: TV8 Live
- **tv8.it:playlist**: TV8 Playlist - **tv8.it:playlist**: TV8 Playlist
@@ -1603,6 +1599,7 @@ The only reliable way to check if a site is supported is to try it.
- **UlizaPortal**: ulizaportal.jp - **UlizaPortal**: ulizaportal.jp
- **umg:de**: Universal Music Deutschland - **umg:de**: Universal Music Deutschland
- **Unistra** - **Unistra**
- **UnitedNationsWebTv**
- **Unity**: (**Currently broken**) - **Unity**: (**Currently broken**)
- **uol.com.br** - **uol.com.br**
- **uplynk** - **uplynk**

View File

@@ -36,6 +36,18 @@ class InfoExtractorTestRequestHandler(http.server.BaseHTTPRequestHandler):
self.send_header('Content-Type', 'text/html; charset=utf-8') self.send_header('Content-Type', 'text/html; charset=utf-8')
self.end_headers() self.end_headers()
self.wfile.write(TEAPOT_RESPONSE_BODY.encode()) self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
elif self.path == '/fake.m3u8':
self.send_response(200)
self.send_header('Content-Length', '1024')
self.end_headers()
self.wfile.write(1024 * b'\x00')
elif self.path == '/bipbop.m3u8':
with open('test/testdata/m3u8/bipbop_16x9.m3u8', 'rb') as f:
data = f.read()
self.send_response(200)
self.send_header('Content-Length', str(len(data)))
self.end_headers()
self.wfile.write(data)
else: else:
assert False assert False
@@ -1947,6 +1959,37 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
with self.assertWarns(DeprecationWarning): with self.assertWarns(DeprecationWarning):
self.assertEqual(self.ie._search_nextjs_data('', None, default='{}'), {}) self.assertEqual(self.ie._search_nextjs_data('', None, default='{}'), {})
def test_search_nextjs_v13_data(self):
HTML = R'''
<script>(self.__next_f=self.__next_f||[]).push([0])</script>
<script>self.__next_f.push([2,"0:[\"$\",\"$L0\",null,{\"do_not_add_this\":\"fail\"}]\n"])</script>
<script>self.__next_f.push([1,"1:I[46975,[],\"HTTPAccessFallbackBoundary\"]\n2:I[32630,[\"8183\",\"static/chunks/8183-768193f6a9e33cdd.js\"]]\n"])</script>
<script nonce="abc123">self.__next_f.push([1,"e:[false,[\"$\",\"div\",null,{\"children\":[\"$\",\"$L18\",null,{\"foo\":\"bar\"}]}],false]\n "])</script>
<script>self.__next_f.push([1,"2a:[[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[]}],[\"$\",\"$L16\",null,{\"meta\":{\"dateCreated\":1730489700,\"uuid\":\"40cac41d-8d29-4ef5-aa11-75047b9f0907\"}}]]\n"])</script>
<script>self.__next_f.push([1,"df:[\"$undefined\",[\"$\",\"div\",null,{\"children\":[\"$\",\"$L17\",null,{}],\"do_not_include_this_field\":\"fail\"}],[\"$\",\"div\",null,{\"children\":[[\"$\",\"$L19\",null,{\"duplicated_field_name\":{\"x\":1}}],[\"$\",\"$L20\",null,{\"duplicated_field_name\":{\"y\":2}}]]}],\"$undefined\"]\n"])</script>
<script>self.__next_f.push([3,"MzM6WyIkIiwiJEwzMiIsbnVsbCx7ImRlY29kZWQiOiJzdWNjZXNzIn1d"])</script>
'''
EXPECTED = {
'18': {
'foo': 'bar',
},
'16': {
'meta': {
'dateCreated': 1730489700,
'uuid': '40cac41d-8d29-4ef5-aa11-75047b9f0907',
},
},
'19': {
'duplicated_field_name': {'x': 1},
},
'20': {
'duplicated_field_name': {'y': 2},
},
}
self.assertEqual(self.ie._search_nextjs_v13_data(HTML, None), EXPECTED)
self.assertEqual(self.ie._search_nextjs_v13_data('', None, fatal=False), {})
self.assertEqual(self.ie._search_nextjs_v13_data(None, None, fatal=False), {})
def test_search_nuxt_json(self): def test_search_nuxt_json(self):
HTML_TMPL = '<script data-ssr="true" id="__NUXT_DATA__" type="application/json">[{}]</script>' HTML_TMPL = '<script data-ssr="true" id="__NUXT_DATA__" type="application/json">[{}]</script>'
VALID_DATA = ''' VALID_DATA = '''
@@ -2079,5 +2122,45 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
self.ie._search_nuxt_json(HTML_TMPL.format(data), None, default=DEFAULT), DEFAULT) self.ie._search_nuxt_json(HTML_TMPL.format(data), None, default=DEFAULT), DEFAULT)
class TestInfoExtractorNetwork(unittest.TestCase):
def setUp(self, /):
self.httpd = http.server.HTTPServer(
('127.0.0.1', 0), InfoExtractorTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
self.called = False
def require_warning(*args, **kwargs):
self.called = True
self.ydl = FakeYDL()
self.ydl.report_warning = require_warning
self.ie = DummyIE(self.ydl)
def tearDown(self, /):
self.ydl.close()
self.httpd.shutdown()
self.httpd.server_close()
self.server_thread.join(1)
def test_extract_m3u8_formats(self):
formats, subtitles = self.ie._extract_m3u8_formats_and_subtitles(
f'http://127.0.0.1:{self.port}/bipbop.m3u8', None, fatal=False)
self.assertFalse(self.called)
self.assertTrue(formats)
self.assertTrue(subtitles)
def test_extract_m3u8_formats_warning(self):
formats, subtitles = self.ie._extract_m3u8_formats_and_subtitles(
f'http://127.0.0.1:{self.port}/fake.m3u8', None, fatal=False)
self.assertTrue(self.called, 'Warning was not issued for binary m3u8 file')
self.assertFalse(formats)
self.assertFalse(subtitles)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@@ -14,6 +14,7 @@ import json
from test.helper import ( from test.helper import (
assertGreaterEqual, assertGreaterEqual,
assertLessEqual,
expect_info_dict, expect_info_dict,
expect_warnings, expect_warnings,
get_params, get_params,
@@ -65,10 +66,6 @@ tests_counter = collections.defaultdict(collections.Counter)
@is_download_test @is_download_test
class TestDownload(unittest.TestCase): class TestDownload(unittest.TestCase):
# Parallel testing in nosetests. See
# http://nose.readthedocs.org/en/latest/doc_tests/test_multiprocess/multiprocess.html
_multiprocess_shared_ = True
maxDiff = None maxDiff = None
COMPLETED_TESTS = {} COMPLETED_TESTS = {}
@@ -121,10 +118,13 @@ def generator(test_case, tname):
params = get_params(test_case.get('params', {})) params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl'] params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case: if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist') params.setdefault('playlistend', max(
params.setdefault('playlistend', test_case.get( test_case.get('playlist_mincount', -1),
'playlist_mincount', test_case.get('playlist_count', -2) + 1)) test_case.get('playlist_count', -2) + 1,
test_case.get('playlist_maxcount', -2) + 1))
params.setdefault('skip_download', True) params.setdefault('skip_download', True)
if 'playlist_duration_sum' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
ydl = YoutubeDL(params, auto_init=False) ydl = YoutubeDL(params, auto_init=False)
ydl.add_default_info_extractors() ydl.add_default_info_extractors()
@@ -159,6 +159,7 @@ def generator(test_case, tname):
try_rm(os.path.splitext(tc_filename)[0] + '.info.json') try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files() try_rm_tcs_files()
try: try:
test_url = test_case['url']
try_num = 1 try_num = 1
while True: while True:
try: try:
@@ -166,7 +167,7 @@ def generator(test_case, tname):
# for outside error handling, and returns the exit code # for outside error handling, and returns the exit code
# instead of the result dict. # instead of the result dict.
res_dict = ydl.extract_info( res_dict = ydl.extract_info(
test_case['url'], test_url,
force_generic_extractor=params.get('force_generic_extractor', False)) force_generic_extractor=params.get('force_generic_extractor', False))
except (DownloadError, ExtractorError) as err: except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one # Check if the exception is not a network related one
@@ -194,23 +195,23 @@ def generator(test_case, tname):
self.assertTrue('entries' in res_dict) self.assertTrue('entries' in res_dict)
expect_info_dict(self, res_dict, test_case.get('info_dict', {})) expect_info_dict(self, res_dict, test_case.get('info_dict', {}))
num_entries = len(res_dict.get('entries', []))
if 'playlist_mincount' in test_case: if 'playlist_mincount' in test_case:
mincount = test_case['playlist_mincount']
assertGreaterEqual( assertGreaterEqual(
self, self, num_entries, mincount,
len(res_dict['entries']), f'Expected at least {mincount} entries in playlist {test_url}, but got only {num_entries}')
test_case['playlist_mincount'],
'Expected at least %d in playlist %s, but got only %d' % (
test_case['playlist_mincount'], test_case['url'],
len(res_dict['entries'])))
if 'playlist_count' in test_case: if 'playlist_count' in test_case:
count = test_case['playlist_count']
got = num_entries if num_entries <= count else 'more'
self.assertEqual( self.assertEqual(
len(res_dict['entries']), num_entries, count,
test_case['playlist_count'], f'Expected exactly {count} entries in playlist {test_url}, but got {got}')
'Expected %d entries in playlist %s, but got %d.' % ( if 'playlist_maxcount' in test_case:
test_case['playlist_count'], maxcount = test_case['playlist_maxcount']
test_case['url'], assertLessEqual(
len(res_dict['entries']), self, num_entries, maxcount,
)) f'Expected at most {maxcount} entries in playlist {test_url}, but got more')
if 'playlist_duration_sum' in test_case: if 'playlist_duration_sum' in test_case:
got_duration = sum(e['duration'] for e in res_dict['entries']) got_duration = sum(e['duration'] for e in res_dict['entries'])
self.assertEqual( self.assertEqual(

View File

@@ -478,6 +478,10 @@ class TestJSInterpreter(unittest.TestCase):
func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000}) func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
self.assertEqual(func([1]), 1111) self.assertEqual(func([1]), 1111)
def test_extract_object(self):
jsi = JSInterpreter('var a={};a.xy={};var xy;var zxy={};xy={z:function(){return "abc"}};')
self.assertTrue('z' in jsi.extract_object('xy', None))
def test_increment_decrement(self): def test_increment_decrement(self):
self._test('function f() { var x = 1; return ++x; }', 2) self._test('function f() { var x = 1; return ++x; }', 2)
self._test('function f() { var x = 1; return x++; }', 1) self._test('function f() { var x = 1; return x++; }', 1)
@@ -486,6 +490,57 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f() { var a = "test--"; return a; }', 'test--') self._test('function f() { var a = "test--"; return a; }', 'test--')
self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--') self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--')
def test_nested_function_scoping(self):
self._test(R'''
function f() {
var g = function() {
var P = 2;
return P;
};
var P = 1;
g();
return P;
}
''', 1)
self._test(R'''
function f() {
var x = function() {
for (var w = 1, M = []; w < 2; w++) switch (w) {
case 1:
M.push("a");
case 2:
M.push("b");
}
return M
};
var w = "c";
var M = "d";
var y = x();
y.push(w);
y.push(M);
return y;
}
''', ['a', 'b', 'c', 'd'])
self._test(R'''
function f() {
var P, Q;
var z = 100;
var g = function() {
var P, Q; P = 2; Q = 15;
z = 0;
return P+Q;
};
P = 1; Q = 10;
var x = g(), y = 3;
return P+Q+x+y+z;
}
''', 31)
def test_undefined_varnames(self):
jsi = JSInterpreter('function f(){ var a; return [a, b]; }')
self._test(jsi, [JS_Undefined, JS_Undefined])
self.assertEqual(jsi._undefined_varnames, {'b'})
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@@ -22,7 +22,6 @@ import ssl
import tempfile import tempfile
import threading import threading
import time import time
import urllib.error
import urllib.request import urllib.request
import warnings import warnings
import zlib import zlib
@@ -223,10 +222,7 @@ class HTTPTestRequestHandler(http.server.BaseHTTPRequestHandler):
if encoding == 'br' and brotli: if encoding == 'br' and brotli:
payload = brotli.compress(payload) payload = brotli.compress(payload)
elif encoding == 'gzip': elif encoding == 'gzip':
buf = io.BytesIO() payload = gzip.compress(payload, mtime=0)
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
f.write(payload)
payload = buf.getvalue()
elif encoding == 'deflate': elif encoding == 'deflate':
payload = zlib.compress(payload) payload = zlib.compress(payload)
elif encoding == 'unsupported': elif encoding == 'unsupported':
@@ -729,6 +725,17 @@ class TestHTTPRequestHandler(TestRequestHandlerBase):
assert 'X-test-heaDer: test' in res assert 'X-test-heaDer: test' in res
def test_partial_read_then_full_read(self, handler):
with handler() as rh:
for encoding in ('', 'gzip', 'deflate'):
res = validate_and_send(rh, Request(
f'http://127.0.0.1:{self.http_port}/content-encoding',
headers={'ytdl-encoding': encoding}))
assert res.headers.get('Content-Encoding') == encoding
assert res.read(6) == b'<html>'
assert res.read(0) == b''
assert res.read() == b'<video src="/vid.mp4" /></html>'
@pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True) @pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
class TestClientCertificate: class TestClientCertificate:

View File

@@ -133,6 +133,11 @@ _SIG_TESTS = [
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA', '2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0', 'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
), ),
(
'https://www.youtube.com/s/player/e12fbea4/player_ias.vflset/en_US/base.js',
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt',
'JC2JfQdSswRAIgGBCxZyAfKyi0cjXCb3DqEctUw-NYdNmOEvaepit0zJAtIEsgOV2SXZjhSHMNy0NXNG_1kOyBf6HPuAuCduh-a',
),
] ]
_NSIG_TESTS = [ _NSIG_TESTS = [
@@ -328,6 +333,50 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/fc2a56a5/tv-player-ias.vflset/tv-player-ias.js', 'https://www.youtube.com/s/player/fc2a56a5/tv-player-ias.vflset/tv-player-ias.js',
'qTKWg_Il804jd2kAC', 'OtUAm2W6gyzJjB9u', 'qTKWg_Il804jd2kAC', 'OtUAm2W6gyzJjB9u',
), ),
(
'https://www.youtube.com/s/player/a74bf670/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'hQP7k1hA22OrNTnq',
),
(
'https://www.youtube.com/s/player/6275f73c/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '-I03XF0iyf6I_X0A',
),
(
'https://www.youtube.com/s/player/20c72c18/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '-I03XF0iyf6I_X0A',
),
(
'https://www.youtube.com/s/player/9fe2e06e/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '6r5ekNIiEMPutZy',
),
(
'https://www.youtube.com/s/player/680f8c75/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '0ml9caTwpa55Jf',
),
(
'https://www.youtube.com/s/player/14397202/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'ozZFAN21okDdJTa',
),
(
'https://www.youtube.com/s/player/5dcb2c1f/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'p7iTbRZDYAF',
),
(
'https://www.youtube.com/s/player/a10d7fcc/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '9Zue7DDHJSD',
),
(
'https://www.youtube.com/s/player/8e20cb06/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '5-4tTneTROTpMzba',
),
(
'https://www.youtube.com/s/player/e12fbea4/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'XkeRfXIPOkSwfg',
),
(
'https://www.youtube.com/s/player/ef259203/player_ias_tce.vflset/en_US/base.js',
'rPqBC01nJpqhhi2iA2U', 'hY7dbiKFT51UIA',
),
] ]

View File

@@ -52,7 +52,7 @@ from .networking.exceptions import (
SSLError, SSLError,
network_exceptions, network_exceptions,
) )
from .networking.impersonate import ImpersonateRequestHandler from .networking.impersonate import ImpersonateRequestHandler, ImpersonateTarget
from .plugins import directories as plugin_directories, load_all_plugins from .plugins import directories as plugin_directories, load_all_plugins
from .postprocessor import ( from .postprocessor import (
EmbedThumbnailPP, EmbedThumbnailPP,
@@ -482,7 +482,8 @@ class YoutubeDL:
The following options do not work when used through the API: The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat, filename, abort-on-error, multistreams, no-live-chat,
format-sort, no-clean-infojson, no-playlist-metafiles, format-sort, no-clean-infojson, no-playlist-metafiles,
no-keep-subs, no-attach-info-json, allow-unsafe-ext, prefer-vp9-sort. no-keep-subs, no-attach-info-json, allow-unsafe-ext, prefer-vp9-sort,
mtime-by-default.
Refer __init__.py for their implementation Refer __init__.py for their implementation
progress_template: Dictionary of templates for progress outputs. progress_template: Dictionary of templates for progress outputs.
Allowed keys are 'download', 'postprocess', Allowed keys are 'download', 'postprocess',
@@ -528,6 +529,7 @@ class YoutubeDL:
discontinuities such as ad breaks (default: False) discontinuities such as ad breaks (default: False)
extractor_args: A dictionary of arguments to be passed to the extractors. extractor_args: A dictionary of arguments to be passed to the extractors.
See "EXTRACTOR ARGUMENTS" for details. See "EXTRACTOR ARGUMENTS" for details.
Argument values must always be a list of string(s).
E.g. {'youtube': {'skip': ['dash', 'hls']}} E.g. {'youtube': {'skip': ['dash', 'hls']}}
mark_watched: Mark videos watched (even with --simulate). Only for YouTube mark_watched: Mark videos watched (even with --simulate). Only for YouTube
@@ -2194,7 +2196,7 @@ class YoutubeDL:
return op(actual_value, comparison_value) return op(actual_value, comparison_value)
return _filter return _filter
def _check_formats(self, formats): def _check_formats(self, formats, warning=True):
for f in formats: for f in formats:
working = f.get('__working') working = f.get('__working')
if working is not None: if working is not None:
@@ -2207,6 +2209,9 @@ class YoutubeDL:
continue continue
temp_file = tempfile.NamedTemporaryFile(suffix='.tmp', delete=False, dir=path or None) temp_file = tempfile.NamedTemporaryFile(suffix='.tmp', delete=False, dir=path or None)
temp_file.close() temp_file.close()
# If FragmentFD fails when testing a fragment, it will wrongly set a non-zero return code.
# Save the actual return code for later. See https://github.com/yt-dlp/yt-dlp/issues/13750
original_retcode = self._download_retcode
try: try:
success, _ = self.dl(temp_file.name, f, test=True) success, _ = self.dl(temp_file.name, f, test=True)
except (DownloadError, OSError, ValueError, *network_exceptions): except (DownloadError, OSError, ValueError, *network_exceptions):
@@ -2217,12 +2222,18 @@ class YoutubeDL:
os.remove(temp_file.name) os.remove(temp_file.name)
except OSError: except OSError:
self.report_warning(f'Unable to delete temporary file "{temp_file.name}"') self.report_warning(f'Unable to delete temporary file "{temp_file.name}"')
# Restore the actual return code
self._download_retcode = original_retcode
f['__working'] = success f['__working'] = success
if success: if success:
f.pop('__needs_testing', None) f.pop('__needs_testing', None)
yield f yield f
else: else:
self.to_screen('[info] Unable to download format {}. Skipping...'.format(f['format_id'])) msg = f'Unable to download format {f["format_id"]}. Skipping...'
if warning:
self.report_warning(msg)
else:
self.to_screen(f'[info] {msg}')
def _select_formats(self, formats, selector): def _select_formats(self, formats, selector):
return list(selector({ return list(selector({
@@ -2948,7 +2959,7 @@ class YoutubeDL:
) )
if self.params.get('check_formats') is True: if self.params.get('check_formats') is True:
formats = LazyList(self._check_formats(formats[::-1]), reverse=True) formats = LazyList(self._check_formats(formats[::-1], warning=False), reverse=True)
if not formats or formats[0] is not info_dict: if not formats or formats[0] is not info_dict:
# only set the 'formats' fields if the original info_dict list them # only set the 'formats' fields if the original info_dict list them
@@ -3221,6 +3232,7 @@ class YoutubeDL:
} }
else: else:
params = self.params params = self.params
fd = get_suitable_downloader(info, params, to_stdout=(name == '-'))(self, params) fd = get_suitable_downloader(info, params, to_stdout=(name == '-'))(self, params)
if not test: if not test:
for ph in self._progress_hooks: for ph in self._progress_hooks:
@@ -3696,6 +3708,8 @@ class YoutubeDL:
return {k: filter_fn(v) for k, v in obj.items() if not reject(k, v)} return {k: filter_fn(v) for k, v in obj.items() if not reject(k, v)}
elif isinstance(obj, (list, tuple, set, LazyList)): elif isinstance(obj, (list, tuple, set, LazyList)):
return list(map(filter_fn, obj)) return list(map(filter_fn, obj))
elif isinstance(obj, ImpersonateTarget):
return str(obj)
elif obj is None or isinstance(obj, (str, int, float, bool)): elif obj is None or isinstance(obj, (str, int, float, bool)):
return obj return obj
else: else:
@@ -4173,6 +4187,31 @@ class YoutubeDL:
for rh in self._request_director.handlers.values() for rh in self._request_director.handlers.values()
if isinstance(rh, ImpersonateRequestHandler)) if isinstance(rh, ImpersonateRequestHandler))
def _parse_impersonate_targets(self, impersonate):
if impersonate in (True, ''):
impersonate = ImpersonateTarget()
requested_targets = [
t if isinstance(t, ImpersonateTarget) else ImpersonateTarget.from_str(t)
for t in variadic(impersonate)
] if impersonate else []
available_target = next(filter(self._impersonate_target_available, requested_targets), None)
return available_target, requested_targets
@staticmethod
def _unavailable_targets_message(requested_targets, note=None, is_error=False):
note = note or 'The extractor specified to use impersonation for this download'
specific_targets = ', '.join(filter(None, map(str, requested_targets)))
message = (
'no impersonate target is available' if not specific_targets
else f'none of these impersonate targets are available: {specific_targets}')
return (
f'{note}, but {message}. {"See" if is_error else "If you encounter errors, then see"}'
f' https://github.com/yt-dlp/yt-dlp#impersonation '
f'for information on installing the required dependencies')
def urlopen(self, req): def urlopen(self, req):
""" Start an HTTP download """ """ Start an HTTP download """
if isinstance(req, str): if isinstance(req, str):

View File

@@ -159,6 +159,12 @@ def set_compat_opts(opts):
elif 'prefer-vp9-sort' in opts.compat_opts: elif 'prefer-vp9-sort' in opts.compat_opts:
opts.format_sort.extend(FormatSorter._prefer_vp9_sort) opts.format_sort.extend(FormatSorter._prefer_vp9_sort)
if 'mtime-by-default' in opts.compat_opts:
if opts.updatetime is None:
opts.updatetime = True
else:
_unused_compat_opt('mtime-by-default')
_video_multistreams_set = set_default_compat('multistreams', 'allow_multiple_video_streams', False, remove_compat=False) _video_multistreams_set = set_default_compat('multistreams', 'allow_multiple_video_streams', False, remove_compat=False)
_audio_multistreams_set = set_default_compat('multistreams', 'allow_multiple_audio_streams', False, remove_compat=False) _audio_multistreams_set = set_default_compat('multistreams', 'allow_multiple_audio_streams', False, remove_compat=False)
if _video_multistreams_set is False and _audio_multistreams_set is False: if _video_multistreams_set is False and _audio_multistreams_set is False:

View File

@@ -435,7 +435,7 @@ def sub_bytes_inv(data):
def rotate(data): def rotate(data):
return data[1:] + [data[0]] return [*data[1:], data[0]]
def key_schedule_core(data, rcon_iteration): def key_schedule_core(data, rcon_iteration):

View File

@@ -99,7 +99,7 @@ def _get_suitable_downloader(info_dict, protocol, params, default):
if external_downloader is None: if external_downloader is None:
if info_dict['to_stdout'] and FFmpegFD.can_merge_formats(info_dict, params): if info_dict['to_stdout'] and FFmpegFD.can_merge_formats(info_dict, params):
return FFmpegFD return FFmpegFD
elif external_downloader.lower() != 'native': elif external_downloader.lower() != 'native' and info_dict.get('impersonate') is None:
ed = get_external_downloader(external_downloader) ed = get_external_downloader(external_downloader)
if ed.can_download(info_dict, external_downloader): if ed.can_download(info_dict, external_downloader):
return ed return ed

View File

@@ -495,3 +495,14 @@ class FileDownloader:
exe = os.path.basename(args[0]) exe = os.path.basename(args[0])
self.write_debug(f'{exe} command line: {shell_quote(args)}') self.write_debug(f'{exe} command line: {shell_quote(args)}')
def _get_impersonate_target(self, info_dict):
impersonate = info_dict.get('impersonate')
if impersonate is None:
return None
available_target, requested_targets = self.ydl._parse_impersonate_targets(impersonate)
if available_target:
return available_target
elif requested_targets:
self.report_warning(self.ydl._unavailable_targets_message(requested_targets))
return None

View File

@@ -302,7 +302,7 @@ class FragmentFD(FileDownloader):
elif to_file: elif to_file:
self.try_rename(ctx['tmpfilename'], ctx['filename']) self.try_rename(ctx['tmpfilename'], ctx['filename'])
filetime = ctx.get('fragment_filetime') filetime = ctx.get('fragment_filetime')
if self.params.get('updatetime', True) and filetime: if self.params.get('updatetime') and filetime:
with contextlib.suppress(Exception): with contextlib.suppress(Exception):
os.utime(ctx['filename'], (time.time(), filetime)) os.utime(ctx['filename'], (time.time(), filetime))

View File

@@ -94,12 +94,19 @@ class HlsFD(FragmentFD):
can_download, message = self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')), None can_download, message = self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')), None
if can_download: if can_download:
has_ffmpeg = FFmpegFD.available() has_ffmpeg = FFmpegFD.available()
no_crypto = not Cryptodome.AES and '#EXT-X-KEY:METHOD=AES-128' in s if not Cryptodome.AES and '#EXT-X-KEY:METHOD=AES-128' in s:
if no_crypto and has_ffmpeg: # Even if pycryptodomex isn't available, force HlsFD for m3u8s that won't work with ffmpeg
can_download, message = False, 'The stream has AES-128 encryption and pycryptodomex is not available' ffmpeg_can_dl = not traverse_obj(info_dict, ((
elif no_crypto: 'extra_param_to_segment_url', 'extra_param_to_key_url',
message = ('The stream has AES-128 encryption and neither ffmpeg nor pycryptodomex are available; ' 'hls_media_playlist_data', ('hls_aes', ('uri', 'key', 'iv')),
'Decryption will be performed natively, but will be extremely slow') ), any))
message = 'The stream has AES-128 encryption and {} available'.format(
'neither ffmpeg nor pycryptodomex are' if ffmpeg_can_dl and not has_ffmpeg else
'pycryptodomex is not')
if has_ffmpeg and ffmpeg_can_dl:
can_download = False
else:
message += '; decryption will be performed natively, but will be extremely slow'
elif info_dict.get('extractor_key') == 'Generic' and re.search(r'(?m)#EXT-X-MEDIA-SEQUENCE:(?!0$)', s): elif info_dict.get('extractor_key') == 'Generic' and re.search(r'(?m)#EXT-X-MEDIA-SEQUENCE:(?!0$)', s):
install_ffmpeg = '' if has_ffmpeg else 'install ffmpeg and ' install_ffmpeg = '' if has_ffmpeg else 'install ffmpeg and '
message = ('Live HLS streams are not supported by the native downloader. If this is a livestream, ' message = ('Live HLS streams are not supported by the native downloader. If this is a livestream, '

View File

@@ -27,6 +27,10 @@ class HttpFD(FileDownloader):
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
url = info_dict['url'] url = info_dict['url']
request_data = info_dict.get('request_data', None) request_data = info_dict.get('request_data', None)
request_extensions = {}
impersonate_target = self._get_impersonate_target(info_dict)
if impersonate_target is not None:
request_extensions['impersonate'] = impersonate_target
class DownloadContext(dict): class DownloadContext(dict):
__getattr__ = dict.get __getattr__ = dict.get
@@ -109,7 +113,7 @@ class HttpFD(FileDownloader):
if try_call(lambda: range_end >= ctx.content_len): if try_call(lambda: range_end >= ctx.content_len):
range_end = ctx.content_len - 1 range_end = ctx.content_len - 1
request = Request(url, request_data, headers) request = Request(url, request_data, headers, extensions=request_extensions)
has_range = range_start is not None has_range = range_start is not None
if has_range: if has_range:
request.headers['Range'] = f'bytes={int(range_start)}-{int_or_none(range_end) or ""}' request.headers['Range'] = f'bytes={int(range_start)}-{int_or_none(range_end) or ""}'
@@ -348,7 +352,7 @@ class HttpFD(FileDownloader):
self.try_rename(ctx.tmpfilename, ctx.filename) self.try_rename(ctx.tmpfilename, ctx.filename)
# Update file modification time # Update file modification time
if self.params.get('updatetime', True): if self.params.get('updatetime'):
info_dict['filetime'] = self.try_utime(ctx.filename, ctx.data.headers.get('last-modified', None)) info_dict['filetime'] = self.try_utime(ctx.filename, ctx.data.headers.get('last-modified', None))
self._hook_progress({ self._hook_progress({

View File

@@ -5,47 +5,46 @@ import time
from .common import FileDownloader from .common import FileDownloader
from .external import FFmpegFD from .external import FFmpegFD
from ..networking import Request from ..networking import Request
from ..utils import DownloadError, str_or_none, try_get from ..networking.websocket import WebSocketResponse
from ..utils import DownloadError, str_or_none, truncate_string
from ..utils.traversal import traverse_obj
class NiconicoLiveFD(FileDownloader): class NiconicoLiveFD(FileDownloader):
""" Downloads niconico live without being stopped """ """ Downloads niconico live without being stopped """
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
video_id = info_dict['video_id'] video_id = info_dict['id']
ws_url = info_dict['url'] opts = info_dict['downloader_options']
ws_extractor = info_dict['ws'] quality, ws_extractor, ws_url = opts['max_quality'], opts['ws'], opts['ws_url']
ws_origin_host = info_dict['origin']
live_quality = info_dict.get('live_quality', 'high')
live_latency = info_dict.get('live_latency', 'high')
dl = FFmpegFD(self.ydl, self.params or {}) dl = FFmpegFD(self.ydl, self.params or {})
new_info_dict = info_dict.copy() new_info_dict = info_dict.copy()
new_info_dict.update({ new_info_dict['protocol'] = 'm3u8'
'protocol': 'm3u8',
})
def communicate_ws(reconnect): def communicate_ws(reconnect):
if reconnect: # Support --load-info-json as if it is a reconnect attempt
ws = self.ydl.urlopen(Request(ws_url, headers={'Origin': f'https://{ws_origin_host}'})) if reconnect or not isinstance(ws_extractor, WebSocketResponse):
ws = self.ydl.urlopen(Request(
ws_url, headers={'Origin': 'https://live.nicovideo.jp'}))
if self.ydl.params.get('verbose', False): if self.ydl.params.get('verbose', False):
self.to_screen('[debug] Sending startWatching request') self.write_debug('Sending startWatching request')
ws.send(json.dumps({ ws.send(json.dumps({
'type': 'startWatching',
'data': { 'data': {
'reconnect': True,
'room': {
'commentable': True,
'protocol': 'webSocket',
},
'stream': { 'stream': {
'quality': live_quality,
'protocol': 'hls+fmp4',
'latency': live_latency,
'accessRightMethod': 'single_cookie', 'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
'latency': 'high',
'protocol': 'hls',
'quality': quality,
}, },
'room': {
'protocol': 'webSocket',
'commentable': True,
},
'reconnect': True,
}, },
'type': 'startWatching',
})) }))
else: else:
ws = ws_extractor ws = ws_extractor
@@ -58,7 +57,6 @@ class NiconicoLiveFD(FileDownloader):
if not data or not isinstance(data, dict): if not data or not isinstance(data, dict):
continue continue
if data.get('type') == 'ping': if data.get('type') == 'ping':
# pong back
ws.send(r'{"type":"pong"}') ws.send(r'{"type":"pong"}')
ws.send(r'{"type":"keepSeat"}') ws.send(r'{"type":"keepSeat"}')
elif data.get('type') == 'disconnect': elif data.get('type') == 'disconnect':
@@ -66,12 +64,10 @@ class NiconicoLiveFD(FileDownloader):
return True return True
elif data.get('type') == 'error': elif data.get('type') == 'error':
self.write_debug(data) self.write_debug(data)
message = try_get(data, lambda x: x['body']['code'], str) or recv message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
return DownloadError(message) return DownloadError(message)
elif self.ydl.params.get('verbose', False): elif self.ydl.params.get('verbose', False):
if len(recv) > 100: self.write_debug(f'Server response: {truncate_string(recv, 100)}')
recv = recv[:100] + '...'
self.to_screen(f'[debug] Server said: {recv}')
def ws_main(): def ws_main():
reconnect = False reconnect = False
@@ -81,7 +77,8 @@ class NiconicoLiveFD(FileDownloader):
if ret is True: if ret is True:
return return
except BaseException as e: except BaseException as e:
self.to_screen('[{}] {}: Connection error occured, reconnecting after 10 seconds: {}'.format('niconico:live', video_id, str_or_none(e))) self.to_screen(
f'[niconico:live] {video_id}: Connection error occured, reconnecting after 10 seconds: {e}')
time.sleep(10) time.sleep(10)
continue continue
finally: finally:

View File

@@ -201,7 +201,6 @@ from .banbye import (
BanByeChannelIE, BanByeChannelIE,
BanByeIE, BanByeIE,
) )
from .bandaichannel import BandaiChannelIE
from .bandcamp import ( from .bandcamp import (
BandcampAlbumIE, BandcampAlbumIE,
BandcampIE, BandcampIE,
@@ -229,7 +228,6 @@ from .beatbump import (
from .beatport import BeatportIE from .beatport import BeatportIE
from .beeg import BeegIE from .beeg import BeegIE
from .behindkink import BehindKinkIE from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
from .berufetv import BerufeTVIE from .berufetv import BerufeTVIE
from .bet import BetIE from .bet import BetIE
from .bfi import BFIPlayerIE from .bfi import BFIPlayerIE
@@ -275,7 +273,10 @@ from .bitchute import (
BitChuteChannelIE, BitChuteChannelIE,
BitChuteIE, BitChuteIE,
) )
from .blackboardcollaborate import BlackboardCollaborateIE from .blackboardcollaborate import (
BlackboardCollaborateIE,
BlackboardCollaborateLaunchIE,
)
from .bleacherreport import ( from .bleacherreport import (
BleacherReportCMSIE, BleacherReportCMSIE,
BleacherReportIE, BleacherReportIE,
@@ -309,6 +310,7 @@ from .brilliantpala import (
BrilliantpalaClassesIE, BrilliantpalaClassesIE,
BrilliantpalaElearnIE, BrilliantpalaElearnIE,
) )
from .btvplus import BTVPlusIE
from .bundesliga import BundesligaIE from .bundesliga import BundesligaIE
from .bundestag import BundestagIE from .bundestag import BundestagIE
from .bunnycdn import BunnyCdnIE from .bunnycdn import BunnyCdnIE
@@ -446,7 +448,6 @@ from .cspan import (
CSpanIE, CSpanIE,
) )
from .ctsnews import CtsNewsIE from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import ( from .curiositystream import (
@@ -805,9 +806,7 @@ from .holodex import HolodexIE
from .hotnewhiphop import HotNewHipHopIE from .hotnewhiphop import HotNewHipHopIE
from .hotstar import ( from .hotstar import (
HotStarIE, HotStarIE,
HotStarPlaylistIE,
HotStarPrefixIE, HotStarPrefixIE,
HotStarSeasonIE,
HotStarSeriesIE, HotStarSeriesIE,
) )
from .hrefli import HrefLiRedirectIE from .hrefli import HrefLiRedirectIE
@@ -921,10 +920,6 @@ from .japandiet import (
ShugiinItvVodIE, ShugiinItvVodIE,
) )
from .jeuxvideo import JeuxVideoIE from .jeuxvideo import JeuxVideoIE
from .jiocinema import (
JioCinemaIE,
JioCinemaSeriesIE,
)
from .jiosaavn import ( from .jiosaavn import (
JioSaavnAlbumIE, JioSaavnAlbumIE,
JioSaavnArtistIE, JioSaavnArtistIE,
@@ -934,7 +929,6 @@ from .jiosaavn import (
JioSaavnSongIE, JioSaavnSongIE,
) )
from .joj import JojIE from .joj import JojIE
from .joqrag import JoqrAgIE
from .jove import JoveIE from .jove import JoveIE
from .jstream import JStreamIE from .jstream import JStreamIE
from .jtbc import ( from .jtbc import (
@@ -1037,11 +1031,6 @@ from .likee import (
LikeeIE, LikeeIE,
LikeeUserIE, LikeeUserIE,
) )
from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
LimelightMediaIE,
)
from .linkedin import ( from .linkedin import (
LinkedInEventsIE, LinkedInEventsIE,
LinkedInIE, LinkedInIE,
@@ -1153,6 +1142,7 @@ from .minds import (
MindsIE, MindsIE,
) )
from .minoto import MinotoIE from .minoto import MinotoIE
from .mir24tv import Mir24TvIE
from .mirrativ import ( from .mirrativ import (
MirrativIE, MirrativIE,
MirrativUserIE, MirrativUserIE,
@@ -1173,6 +1163,10 @@ from .mixcloud import (
MixcloudPlaylistIE, MixcloudPlaylistIE,
MixcloudUserIE, MixcloudUserIE,
) )
from .mixlr import (
MixlrIE,
MixlrRecoringIE,
)
from .mlb import ( from .mlb import (
MLBIE, MLBIE,
MLBTVIE, MLBTVIE,
@@ -1383,7 +1377,6 @@ from .nobelprize import NobelPrizeIE
from .noice import NoicePodcastIE from .noice import NoicePodcastIE
from .nonktube import NonkTubeIE from .nonktube import NonkTubeIE
from .noodlemagazine import NoodleMagazineIE from .noodlemagazine import NoodleMagazineIE
from .noovo import NoovoIE
from .nosnl import NOSNLArticleIE from .nosnl import NOSNLArticleIE
from .nova import ( from .nova import (
NovaEmbedIE, NovaEmbedIE,
@@ -1564,6 +1557,7 @@ from .platzi import (
PlatziCourseIE, PlatziCourseIE,
PlatziIE, PlatziIE,
) )
from .playerfm import PlayerFmIE
from .playplustv import PlayPlusTVIE from .playplustv import PlayPlusTVIE
from .playsuisse import PlaySuisseIE from .playsuisse import PlaySuisseIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
@@ -1830,6 +1824,7 @@ from .safari import (
from .saitosan import SaitosanIE from .saitosan import SaitosanIE
from .samplefocus import SampleFocusIE from .samplefocus import SampleFocusIE
from .sapo import SapoIE from .sapo import SapoIE
from .sauceplus import SaucePlusIE
from .sbs import SBSIE from .sbs import SBSIE
from .sbscokr import ( from .sbscokr import (
SBSCoKrAllvodProgramIE, SBSCoKrAllvodProgramIE,
@@ -2101,6 +2096,7 @@ from .theguardian import (
TheGuardianPodcastIE, TheGuardianPodcastIE,
TheGuardianPodcastPlaylistIE, TheGuardianPodcastPlaylistIE,
) )
from .thehighwire import TheHighWireIE
from .theholetv import TheHoleTvIE from .theholetv import TheHoleTvIE
from .theintercept import TheInterceptIE from .theintercept import TheInterceptIE
from .theplatform import ( from .theplatform import (
@@ -2289,6 +2285,7 @@ from .uliza import (
) )
from .umg import UMGDeIE from .umg import UMGDeIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .unitednations import UnitedNationsWebTvIE
from .unity import UnityIE from .unity import UnityIE
from .unsupported import ( from .unsupported import (
KnownDRMIE, KnownDRMIE,

View File

@@ -111,11 +111,9 @@ class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id> _VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
shows/[^/]+/season-\d+/episode-\d+| shows/[^/?#]+/season-\d+/episode-\d+|
(?: (?P<type>movie|special)s/[^/?#]+(?P<extra>/[^/?#]+)?|
(?:movie|special)s/[^/]+| (?:shows/[^/?#]+/)?videos/[^/?#]+
(?:shows/[^/]+/)?videos
)/[^/?#&]+
)''' )'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
@@ -128,7 +126,7 @@ class AENetworksIE(AENetworksBaseIE):
'upload_date': '20120529', 'upload_date': '20120529',
'uploader': 'AENE-NEW', 'uploader': 'AENE-NEW',
'duration': 2592.0, 'duration': 2592.0,
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:5', 'chapters': 'count:5',
'tags': 'count:14', 'tags': 'count:14',
'categories': ['Mountain Men'], 'categories': ['Mountain Men'],
@@ -139,10 +137,7 @@ class AENetworksIE(AENetworksBaseIE):
'series': 'Mountain Men', 'series': 'Mountain Men',
'age_limit': 0, 'age_limit': 0,
}, },
'params': { 'params': {'skip_download': 'm3u8'},
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'skip': 'Geo-restricted - This content is not available in your location.', 'skip': 'Geo-restricted - This content is not available in your location.',
}, { }, {
@@ -156,7 +151,7 @@ class AENetworksIE(AENetworksBaseIE):
'upload_date': '20160112', 'upload_date': '20160112',
'uploader': 'AENE-NEW', 'uploader': 'AENE-NEW',
'duration': 1277.695, 'duration': 1277.695,
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:4', 'chapters': 'count:4',
'tags': 'count:23', 'tags': 'count:23',
'episode': 'Inlawful Entry', 'episode': 'Inlawful Entry',
@@ -166,10 +161,53 @@ class AENetworksIE(AENetworksBaseIE):
'series': 'Duck Dynasty', 'series': 'Duck Dynasty',
'age_limit': 0, 'age_limit': 0,
}, },
'params': { 'params': {'skip_download': 'm3u8'},
# m3u8 download 'add_ie': ['ThePlatform'],
'skip_download': True, }, {
'url': 'https://play.mylifetime.com/movies/v-c-andrews-web-of-dreams',
'info_dict': {
'id': '1590627395981',
'ext': 'mp4',
'title': 'VC Andrews\' Web of Dreams',
'description': 'md5:2a8ba13ae64271c79eb65c0577d312ce',
'uploader': 'AENE-NEW',
'age_limit': 14,
'duration': 5253.665,
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:8',
'tags': ['lifetime', 'mylifetime', 'lifetime channel', "VC Andrews' Web of Dreams"],
'series': '',
'season': 'Season 0',
'season_number': 0,
'episode': 'VC Andrews\' Web of Dreams',
'episode_number': 0,
'timestamp': 1566489703.0,
'upload_date': '20190822',
}, },
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
}, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story',
'info_dict': {
'id': '1488235587551',
'ext': 'mp4',
'title': 'Hunting JonBenet\'s Killer: The Untold Story',
'description': 'md5:209869425ee392d74fe29201821e48b4',
'uploader': 'AENE-NEW',
'age_limit': 14,
'duration': 5003.903,
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:10',
'tags': 'count:11',
'series': '',
'season': 'Season 0',
'season_number': 0,
'episode': 'Hunting JonBenet\'s Killer: The Untold Story',
'episode_number': 0,
'timestamp': 1554987697.0,
'upload_date': '20190411',
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
@@ -198,7 +236,9 @@ class AENetworksIE(AENetworksBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
domain, canonical = self._match_valid_url(url).groups() domain, canonical, url_type, extra = self._match_valid_url(url).group('domain', 'id', 'type', 'extra')
if url_type in ('movie', 'special') and not extra:
canonical += f'/full-{url_type}'
return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url) return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)

View File

@@ -16,6 +16,7 @@ from ..utils import (
dict_get, dict_get,
extract_attributes, extract_attributes,
get_element_by_id, get_element_by_id,
get_element_text_and_html_by_tag,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
js_to_json, js_to_json,
@@ -72,6 +73,7 @@ class ArchiveOrgIE(InfoExtractor):
'display_id': 'Cops-v2.mp4', 'display_id': 'Cops-v2.mp4',
'thumbnail': r're:https://archive\.org/download/.*\.jpg', 'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'duration': 1091.96, 'duration': 1091.96,
'track': 'Cops-v2',
}, },
}, { }, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect', 'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
@@ -86,6 +88,7 @@ class ArchiveOrgIE(InfoExtractor):
'thumbnail': r're:https://archive\.org/download/.*\.jpg', 'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'duration': 59.77, 'duration': 59.77,
'display_id': 'Commercial-JFK1960ElectionAdCampaignJingle.mpg', 'display_id': 'Commercial-JFK1960ElectionAdCampaignJingle.mpg',
'track': 'Commercial-JFK1960ElectionAdCampaignJingle',
}, },
}, { }, {
'url': 'https://archive.org/details/Election_Ads/Commercial-Nixon1960ElectionAdToughonDefense.mpg', 'url': 'https://archive.org/details/Election_Ads/Commercial-Nixon1960ElectionAdToughonDefense.mpg',
@@ -102,6 +105,7 @@ class ArchiveOrgIE(InfoExtractor):
'duration': 59.51, 'duration': 59.51,
'license': 'http://creativecommons.org/licenses/publicdomain/', 'license': 'http://creativecommons.org/licenses/publicdomain/',
'thumbnail': r're:https://archive\.org/download/.*\.jpg', 'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'track': 'Commercial-Nixon1960ElectionAdToughonDefense',
}, },
}, { }, {
'url': 'https://archive.org/details/gd1977-05-08.shure57.stevenson.29303.flac16', 'url': 'https://archive.org/details/gd1977-05-08.shure57.stevenson.29303.flac16',
@@ -182,6 +186,7 @@ class ArchiveOrgIE(InfoExtractor):
'duration': 130.46, 'duration': 130.46,
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_01_000117.jpg', 'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_01_000117.jpg',
'display_id': 'irelandthemakingofarepublicreel1_01.mov', 'display_id': 'irelandthemakingofarepublicreel1_01.mov',
'track': 'irelandthemakingofarepublicreel1 01',
}, },
}, { }, {
'md5': '67335ee3b23a0da930841981c1e79b02', 'md5': '67335ee3b23a0da930841981c1e79b02',
@@ -192,6 +197,7 @@ class ArchiveOrgIE(InfoExtractor):
'title': 'irelandthemakingofarepublicreel1_02.mov', 'title': 'irelandthemakingofarepublicreel1_02.mov',
'display_id': 'irelandthemakingofarepublicreel1_02.mov', 'display_id': 'irelandthemakingofarepublicreel1_02.mov',
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_02_001374.jpg', 'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_02_001374.jpg',
'track': 'irelandthemakingofarepublicreel1 02',
}, },
}, { }, {
'md5': 'e470e86787893603f4a341a16c281eb5', 'md5': 'e470e86787893603f4a341a16c281eb5',
@@ -202,6 +208,7 @@ class ArchiveOrgIE(InfoExtractor):
'title': 'irelandthemakingofarepublicreel2.mov', 'title': 'irelandthemakingofarepublicreel2.mov',
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel2_001554.jpg', 'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel2_001554.jpg',
'display_id': 'irelandthemakingofarepublicreel2.mov', 'display_id': 'irelandthemakingofarepublicreel2.mov',
'track': 'irelandthemakingofarepublicreel2',
}, },
}, },
], ],
@@ -229,15 +236,8 @@ class ArchiveOrgIE(InfoExtractor):
@staticmethod @staticmethod
def _playlist_data(webpage): def _playlist_data(webpage):
element = re.findall(r'''(?xs) element = get_element_text_and_html_by_tag('play-av', webpage)[1]
<input return json.loads(extract_attributes(element)['playlist'])
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s+class=['"]?js-play8-playlist['"]?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s*/>
''', webpage)[0]
return json.loads(extract_attributes(element)['value'])
def _real_extract(self, url): def _real_extract(self, url):
video_id = urllib.parse.unquote_plus(self._match_id(url)) video_id = urllib.parse.unquote_plus(self._match_id(url))

View File

@@ -1,33 +0,0 @@
from .brightcove import BrightcoveNewBaseIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewBaseIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])

View File

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
KNOWN_EXTENSIONS, KNOWN_EXTENSIONS,
ExtractorError, ExtractorError,
clean_html,
extract_attributes, extract_attributes,
float_or_none, float_or_none,
int_or_none, int_or_none,
@@ -19,7 +20,7 @@ from ..utils import (
url_or_none, url_or_none,
urljoin, urljoin,
) )
from ..utils.traversal import find_element, traverse_obj from ..utils.traversal import find_element, find_elements, traverse_obj
class BandcampIE(InfoExtractor): class BandcampIE(InfoExtractor):
@@ -70,6 +71,9 @@ class BandcampIE(InfoExtractor):
'album': 'FTL: Advanced Edition Soundtrack', 'album': 'FTL: Advanced Edition Soundtrack',
'uploader_url': 'https://benprunty.bandcamp.com', 'uploader_url': 'https://benprunty.bandcamp.com',
'uploader_id': 'benprunty', 'uploader_id': 'benprunty',
'tags': ['soundtrack', 'chiptunes', 'cinematic', 'electronic', 'video game music', 'California'],
'artists': ['Ben Prunty'],
'album_artists': ['Ben Prunty'],
}, },
}, { }, {
# no free download, mp3 128 # no free download, mp3 128
@@ -94,6 +98,9 @@ class BandcampIE(InfoExtractor):
'album': 'Call of the Mastodon', 'album': 'Call of the Mastodon',
'uploader_url': 'https://relapsealumni.bandcamp.com', 'uploader_url': 'https://relapsealumni.bandcamp.com',
'uploader_id': 'relapsealumni', 'uploader_id': 'relapsealumni',
'tags': ['Philadelphia'],
'artists': ['Mastodon'],
'album_artists': ['Mastodon'],
}, },
}, { }, {
# track from compilation album (artist/album_artist difference) # track from compilation album (artist/album_artist difference)
@@ -118,6 +125,9 @@ class BandcampIE(InfoExtractor):
'album': 'DSK F/W 2016-2017 Free Compilation', 'album': 'DSK F/W 2016-2017 Free Compilation',
'uploader_url': 'https://diskotopia.bandcamp.com', 'uploader_url': 'https://diskotopia.bandcamp.com',
'uploader_id': 'diskotopia', 'uploader_id': 'diskotopia',
'tags': ['Japan'],
'artists': ['submerse'],
'album_artists': ['Diskotopia'],
}, },
}] }]
@@ -252,6 +262,7 @@ class BandcampIE(InfoExtractor):
'album': embed.get('album_title'), 'album': embed.get('album_title'),
'album_artist': album_artist, 'album_artist': album_artist,
'formats': formats, 'formats': formats,
'tags': traverse_obj(webpage, ({find_elements(cls='tag')}, ..., {clean_html})),
} }

View File

@@ -1,91 +0,0 @@
from .common import InfoExtractor
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn(?:bloomberg)?|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space|
etalk|
marilyn
)\.ca|
(?:much|cp24)\.com
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '3e5b8e38370741d5089da79161646635',
'info_dict': {
'id': '1403070',
'ext': 'flv',
'title': 'David Cockfield\'s Top Picks',
'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
'upload_date': '20180525',
'timestamp': 1527288600,
'season_id': '73997',
'season': '2018',
'thumbnail': 'http://images2.9c9media.com/image_asset/2018_5_25_baf30cbd-b28d-4a18-9903-4bb8713b00f5_PNG_956x536.jpg',
'tags': [],
'categories': ['ETFs'],
'season_number': 8,
'duration': 272.038,
'series': 'Market Call Tonight',
},
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
}, {
'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
'only_matching': True,
}, {
'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
'etalk': 'ctv',
'bnnbloomberg': 'bnn',
'marilyn': 'ctv_marilyn',
}
def _real_extract(self, url):
domain, video_id = self._match_valid_url(url).groups()
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': f'9c9media:{self._DOMAINS.get(domain, domain)}_web:{video_id}',
'ie_key': 'NineCNineMedia',
}

View File

@@ -175,6 +175,13 @@ class BilibiliBaseIE(InfoExtractor):
else: else:
note = f'Downloading video formats for cid {cid}' note = f'Downloading video formats for cid {cid}'
# TODO: remove this patch once utils.networking.random_user_agent() is updated, see #13735
# playurl requests carrying old UA will be rejected
headers = {
'User-Agent': f'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{random.randint(118,138)}.0.0.0 Safari/537.36',
**(headers or {}),
}
return self._download_json( return self._download_json(
'https://api.bilibili.com/x/player/wbi/playurl', bvid, 'https://api.bilibili.com/x/player/wbi/playurl', bvid,
query=self._sign_wbi(params, bvid), headers=headers, note=note)['data'] query=self._sign_wbi(params, bvid), headers=headers, note=note)['data']
@@ -353,7 +360,7 @@ class BiliBiliIE(BilibiliBaseIE):
'id': 'BV1bK411W797', 'id': 'BV1bK411W797',
'title': '物语中的人物是如何吐槽自己的OP的', 'title': '物语中的人物是如何吐槽自己的OP的',
}, },
'playlist_count': 18, 'playlist_count': 23,
'playlist': [{ 'playlist': [{
'info_dict': { 'info_dict': {
'id': 'BV1bK411W797_p1', 'id': 'BV1bK411W797_p1',
@@ -373,6 +380,7 @@ class BiliBiliIE(BilibiliBaseIE):
'_old_archive_ids': ['bilibili 498159642_part1'], '_old_archive_ids': ['bilibili 498159642_part1'],
}, },
}], }],
'params': {'playlist_items': '2'},
}, { }, {
'note': 'Specific page of Anthology', 'note': 'Specific page of Anthology',
'url': 'https://www.bilibili.com/video/BV1bK411W797?p=1', 'url': 'https://www.bilibili.com/video/BV1bK411W797?p=1',
@@ -899,11 +907,26 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
'Extracting episode', query={'fnval': 12240, 'ep_id': episode_id}, 'Extracting episode', query={'fnval': 12240, 'ep_id': episode_id},
headers=headers)) headers=headers))
geo_blocked = traverse_obj(play_info, ( # play_info can be structured in at least three different ways, e.g.:
'raw', 'data', 'plugins', lambda _, v: v['name'] == 'AreaLimitPanel', 'config', 'is_block', {bool}, any)) # 1.) play_info['result']['video_info'] and play_info['code']
premium_only = play_info.get('code') == -10403 # 2.) play_info['raw']['data']['video_info'] and play_info['code']
# 3.) play_info['data']['result']['video_info'] and play_info['data']['code']
# So we need to transform any of the above into a common structure
status_code = play_info.get('code')
if 'raw' in play_info:
play_info = play_info['raw']
if 'data' in play_info:
play_info = play_info['data']
if status_code is None:
status_code = play_info.get('code')
if 'result' in play_info:
play_info = play_info['result']
video_info = traverse_obj(play_info, (('result', ('raw', 'data')), 'video_info', {dict}, any)) or {} geo_blocked = traverse_obj(play_info, (
'plugins', lambda _, v: v['name'] == 'AreaLimitPanel', 'config', 'is_block', {bool}, any))
premium_only = status_code == -10403
video_info = traverse_obj(play_info, ('video_info', {dict})) or {}
formats = self.extract_formats(video_info) formats = self.extract_formats(video_info)
if not formats: if not formats:
@@ -913,8 +936,8 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
self.raise_login_required('This video is for premium members only') self.raise_login_required('This video is for premium members only')
if traverse_obj(play_info, (( if traverse_obj(play_info, ((
('result', 'play_check', 'play_detail'), # 'PLAY_PREVIEW' vs 'PLAY_WHOLE' ('play_check', 'play_detail'), # 'PLAY_PREVIEW' vs 'PLAY_WHOLE' vs 'PLAY_NONE'
('raw', 'data', 'play_video_type'), # 'preview' vs 'whole' 'play_video_type', # 'preview' vs 'whole' vs 'none'
), any, {lambda x: x in ('PLAY_PREVIEW', 'preview')})): ), any, {lambda x: x in ('PLAY_PREVIEW', 'preview')})):
self.report_warning( self.report_warning(
'Only preview format is available, ' 'Only preview format is available, '
@@ -1000,6 +1023,7 @@ class BiliBiliBangumiMediaIE(BilibiliBaseIE):
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$', 'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
}, },
}], }],
'params': {'playlist_items': '2'},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -1055,6 +1079,7 @@ class BiliBiliBangumiSeasonIE(BilibiliBaseIE):
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$', 'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
}, },
}], }],
'params': {'playlist_items': '2'},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -1226,6 +1251,26 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
'id': '313580179', 'id': '313580179',
}, },
'playlist_mincount': 92, 'playlist_mincount': 92,
}, {
# Hidden-mode collection
'url': 'https://space.bilibili.com/3669403/video',
'info_dict': {
'id': '3669403',
},
'playlist': [{
'info_dict': {
'_type': 'playlist',
'id': '3669403_3958082',
'title': '合集·直播回放',
'description': '',
'uploader': '月路Yuel',
'uploader_id': '3669403',
'timestamp': int,
'upload_date': str,
'thumbnail': str,
},
}],
'params': {'playlist_items': '7'},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -1282,8 +1327,14 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
} }
def get_entries(page_data): def get_entries(page_data):
for entry in traverse_obj(page_data, ('list', 'vlist')) or []: for entry in traverse_obj(page_data, ('list', 'vlist', ..., {dict})):
yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid']) if traverse_obj(entry, ('meta', 'attribute')) == 156:
# hidden-mode collection doesn't show its videos in uploads; extract as playlist instead
yield self.url_result(
f'https://space.bilibili.com/{entry["mid"]}/lists/{entry["meta"]["id"]}?type=season',
BilibiliCollectionListIE, f'{entry["mid"]}_{entry["meta"]["id"]}')
else:
yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid'])
metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries) metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
return self.playlist_result(paged_list, playlist_id) return self.playlist_result(paged_list, playlist_id)
@@ -1819,7 +1870,7 @@ class BilibiliAudioIE(BilibiliAudioBaseIE):
'thumbnail': r're:^https?://.+\.jpg', 'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1564836614, 'timestamp': 1564836614,
'upload_date': '20190803', 'upload_date': '20190803',
'uploader': 'tsukimi-つきみぐ', 'uploader': '十六夜tsukimiつきみぐ',
'view_count': int, 'view_count': int,
}, },
} }
@@ -1874,10 +1925,10 @@ class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
'url': 'https://www.bilibili.com/audio/am10624', 'url': 'https://www.bilibili.com/audio/am10624',
'info_dict': { 'info_dict': {
'id': '10624', 'id': '10624',
'title': '每日新曲推荐每日11:00更新', 'title': '新曲推荐',
'description': '每天11:00更新为你推送最新音乐', 'description': '每天11:00更新为你推送最新音乐',
}, },
'playlist_count': 19, 'playlist_count': 16,
} }
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -1,16 +1,27 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_iso8601 from ..utils import (
UnsupportedError,
float_or_none,
int_or_none,
join_nonempty,
jwt_decode_hs256,
mimetype2ext,
parse_iso8601,
parse_qs,
url_or_none,
)
from ..utils.traversal import traverse_obj
class BlackboardCollaborateIE(InfoExtractor): class BlackboardCollaborateIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?P<region>[a-z-]+)\.bbcollab\.com/ (?P<region>[a-z]+)(?:-lti)?\.bbcollab\.com/
(?: (?:
collab/ui/session/playback/load| collab/ui/session/playback/load|
recording recording
)/ )/
(?P<id>[^/]+)''' (?P<id>[^/?#]+)'''
_TESTS = [ _TESTS = [
{ {
'url': 'https://us-lti.bbcollab.com/collab/ui/session/playback/load/0a633b6a88824deb8c918f470b22b256', 'url': 'https://us-lti.bbcollab.com/collab/ui/session/playback/load/0a633b6a88824deb8c918f470b22b256',
@@ -19,9 +30,55 @@ class BlackboardCollaborateIE(InfoExtractor):
'id': '0a633b6a88824deb8c918f470b22b256', 'id': '0a633b6a88824deb8c918f470b22b256',
'title': 'HESI A2 Information Session - Thursday, May 6, 2021 - recording_1', 'title': 'HESI A2 Information Session - Thursday, May 6, 2021 - recording_1',
'ext': 'mp4', 'ext': 'mp4',
'duration': 1896000, 'duration': 1896,
'timestamp': 1620331399, 'timestamp': 1620333295,
'upload_date': '20210506', 'upload_date': '20210506',
'subtitles': {
'live_chat': 'mincount:1',
},
},
},
{
'url': 'https://eu.bbcollab.com/collab/ui/session/playback/load/4bde2dee104f40289a10f8e554270600',
'md5': '108db6a8f83dcb0c2a07793649581865',
'info_dict': {
'id': '4bde2dee104f40289a10f8e554270600',
'title': 'Meeting - Azerbaycanca erize formasi',
'ext': 'mp4',
'duration': 880,
'timestamp': 1671176868,
'upload_date': '20221216',
},
},
{
'url': 'https://eu.bbcollab.com/recording/f83be390ecff46c0bf7dccb9dddcf5f6',
'md5': 'e3b0b88ddf7847eae4b4c0e2d40b83a5',
'info_dict': {
'id': 'f83be390ecff46c0bf7dccb9dddcf5f6',
'title': 'Keynote lecture by Laura Carvalho - recording_1',
'ext': 'mp4',
'duration': 5506,
'timestamp': 1662721705,
'upload_date': '20220909',
'subtitles': {
'live_chat': 'mincount:1',
},
},
},
{
'url': 'https://eu.bbcollab.com/recording/c3e1e7c9e83d4cd9981c93c74888d496',
'md5': 'fdb2d8c43d66fbc0b0b74ef5e604eb1f',
'info_dict': {
'id': 'c3e1e7c9e83d4cd9981c93c74888d496',
'title': 'International Ally User Group - recording_18',
'ext': 'mp4',
'duration': 3479,
'timestamp': 1721919621,
'upload_date': '20240725',
'subtitles': {
'en': 'mincount:1',
'live_chat': 'mincount:1',
},
}, },
}, },
{ {
@@ -42,22 +99,81 @@ class BlackboardCollaborateIE(InfoExtractor):
}, },
] ]
def _call_api(self, region, video_id, path=None, token=None, note=None, fatal=False):
# Ref: https://github.com/blackboard/BBDN-Collab-Postman-REST
return self._download_json(
join_nonempty(f'https://{region}.bbcollab.com/collab/api/csa/recordings', video_id, path, delim='/'),
video_id, note or 'Downloading JSON metadata', fatal=fatal,
headers={'Authorization': f'Bearer {token}'} if token else None)
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) mobj = self._match_valid_url(url)
region = mobj.group('region') region = mobj.group('region')
video_id = mobj.group('id') video_id = mobj.group('id')
info = self._download_json( token = parse_qs(url).get('authToken', [None])[-1]
f'https://{region}.bbcollab.com/collab/api/csa/recordings/{video_id}/data', video_id)
duration = info.get('duration') video_info = self._call_api(region, video_id, path='data/secure', token=token, note='Trying auth token')
title = info['name'] if video_info:
upload_date = info.get('created') video_extra = self._call_api(region, video_id, token=token, note='Retrieving extra attributes')
streams = info['streams'] else:
formats = [{'format_id': k, 'url': url} for k, url in streams.items()] video_info = self._call_api(region, video_id, path='data', note='Trying fallback', fatal=True)
video_extra = {}
formats = traverse_obj(video_info, ('extStreams', lambda _, v: url_or_none(v['streamUrl']), {
'url': 'streamUrl',
'ext': ('contentType', {mimetype2ext}),
'aspect_ratio': ('aspectRatio', {float_or_none}),
}))
if filesize := traverse_obj(video_extra, ('storageSize', {int_or_none})):
for fmt in formats:
fmt['filesize'] = filesize
subtitles = {}
for subs in traverse_obj(video_info, ('subtitles', lambda _, v: url_or_none(v['url']))):
subtitles.setdefault(subs.get('lang') or 'und', []).append({
'name': traverse_obj(subs, ('label', {str})),
'url': subs['url'],
})
for live_chat_url in traverse_obj(video_info, ('chats', ..., 'url', {url_or_none})):
subtitles.setdefault('live_chat', []).append({'url': live_chat_url})
return { return {
'duration': duration, **traverse_obj(video_info, {
'title': ('name', {str}),
'timestamp': ('created', {parse_iso8601}),
'duration': ('duration', {int_or_none(scale=1000)}),
}),
'formats': formats, 'formats': formats,
'id': video_id, 'id': video_id,
'timestamp': parse_iso8601(upload_date), 'subtitles': subtitles,
'title': title,
} }
class BlackboardCollaborateLaunchIE(InfoExtractor):
_VALID_URL = r'https?://[a-z]+\.bbcollab\.com/launch/(?P<id>[^/?#]+)'
_TESTS = [
{
'url': 'https://au.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNzQwNDE2NDgzLCJpYXQiOjE3NDA0MTYxODMsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI3MzI4YzRjZTNmM2U0ZTcwYmY3MTY3N2RkZTgzMzk2NSIsImNvbnN1bWVySWQiOiJhM2Q3NGM0Y2QyZGU0MGJmODFkMjFlODNlMmEzNzM5MCIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.xuELw4EafEwUMoYcCHidGn4Tw9O1QCbYHzYGJUl0kKk',
'only_matching': True,
},
{
'url': 'https://us.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNjk0NDgxOTc3LCJpYXQiOjE2OTQ0ODE2NzcsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI3YWU0MTFhNTU3NjU0OWFiOTZlYjVmMTM1YmY3MWU5MCIsImNvbnN1bWVySWQiOiJBRUU2MEI4MDI2QzM3ODU2RjMwMzNEN0ZEOTQzMTFFNSIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.yOhRZNaIjXYoMYMpcTzgjZJCnIFaYf2cAzbco8OAxlY',
'only_matching': True,
},
{
'url': 'https://eu.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNzUyNjgyODYwLCJpYXQiOjE3NTI2ODI1NjAsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI4MjQzYjFiODg2Nzk0NTZkYjkwN2NmNDZmZmE1MmFhZiIsImNvbnN1bWVySWQiOiI5ZTY4NzYwZWJiNzM0MzRiYWY3NTQyZjA1YmJkOTMzMCIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.Xj4ymojYLwZ1vKPKZ-KxjpqQvFXoJekjRaG0npngwWs',
'only_matching': True,
},
]
def _real_extract(self, url):
token = self._match_id(url)
video_id = jwt_decode_hs256(token)['resourceAccessTicket']['resourceId']
redirect_url = self._request_webpage(url, video_id).url
if self.suitable(redirect_url):
raise UnsupportedError(redirect_url)
return self.url_result(redirect_url, BlackboardCollaborateIE, video_id)

View File

@@ -0,0 +1,73 @@
from .common import InfoExtractor
from ..utils import (
bug_reports_message,
clean_html,
get_element_by_class,
js_to_json,
mimetype2ext,
strip_or_none,
url_or_none,
urljoin,
)
from ..utils.traversal import traverse_obj
class BTVPlusIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?btvplus\.bg/produkt/(?:predavaniya|seriali|novini)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://btvplus.bg/produkt/predavaniya/67271/btv-reporterite/btv-reporterite-12-07-2025-g',
'info_dict': {
'ext': 'mp4',
'id': '67271',
'title': 'bTV Репортерите - 12.07.2025 г.',
'thumbnail': 'https://cdn.btv.bg/media/images/940x529/Jul2025/2113606319.jpg',
},
}, {
'url': 'https://btvplus.bg/produkt/seriali/66942/sezon-2/plen-sezon-2-epizod-55',
'info_dict': {
'ext': 'mp4',
'id': '66942',
'title': 'Плен - сезон 2, епизод 55',
'thumbnail': 'https://cdn.btv.bg/media/images/940x529/Jun2025/2113595104.jpg',
},
}, {
'url': 'https://btvplus.bg/produkt/novini/67270/btv-novinite-centralna-emisija-12-07-2025',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_url = self._search_regex(
r'var\s+videoUrl\s*=\s*[\'"]([^\'"]+)[\'"]',
webpage, 'player URL')
player_config = self._download_json(
urljoin('https://btvplus.bg', player_url), video_id)['config']
videojs_data = self._search_json(
r'videojs\(["\'][^"\']+["\'],', player_config, 'videojs data',
video_id, transform_source=js_to_json)
formats = []
subtitles = {}
for src in traverse_obj(videojs_data, ('sources', lambda _, v: url_or_none(v['src']))):
ext = mimetype2ext(src.get('type'))
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src['src'], video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
self.report_warning(f'Unknown format type {ext}{bug_reports_message()}')
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'title': (
strip_or_none(self._og_search_title(webpage, default=None))
or clean_html(get_element_by_class('product-title', webpage))),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
}

View File

@@ -1,5 +1,6 @@
import base64 import base64
import collections import collections
import contextlib
import functools import functools
import getpass import getpass
import http.client import http.client
@@ -37,7 +38,6 @@ from ..networking.exceptions import (
TransportError, TransportError,
network_exceptions, network_exceptions,
) )
from ..networking.impersonate import ImpersonateTarget
from ..utils import ( from ..utils import (
IDENTITY, IDENTITY,
JSON_LD_RE, JSON_LD_RE,
@@ -258,11 +258,19 @@ class InfoExtractor:
* key The key (as hex) used to decrypt fragments. * key The key (as hex) used to decrypt fragments.
If `key` is given, any key URI will be ignored If `key` is given, any key URI will be ignored
* iv The IV (as hex) used to decrypt fragments * iv The IV (as hex) used to decrypt fragments
* impersonate Impersonate target(s). Can be any of the following entities:
* an instance of yt_dlp.networking.impersonate.ImpersonateTarget
* a string in the format of CLIENT[:OS]
* a list or a tuple of CLIENT[:OS] strings or ImpersonateTarget instances
* a boolean value; True means any impersonate target is sufficient
* downloader_options A dictionary of downloader options * downloader_options A dictionary of downloader options
(For internal use only) (For internal use only)
* http_chunk_size Chunk size for HTTP downloads * http_chunk_size Chunk size for HTTP downloads
* ffmpeg_args Extra arguments for ffmpeg downloader (input) * ffmpeg_args Extra arguments for ffmpeg downloader (input)
* ffmpeg_args_out Extra arguments for ffmpeg downloader (output) * ffmpeg_args_out Extra arguments for ffmpeg downloader (output)
* ws (NiconicoLiveFD only) WebSocketResponse
* ws_url (NiconicoLiveFD only) Websockets URL
* max_quality (NiconicoLiveFD only) Max stream quality string
* is_dash_periods Whether the format is a result of merging * is_dash_periods Whether the format is a result of merging
multiple DASH periods. multiple DASH periods.
RTMP formats can also have the additional fields: page_url, RTMP formats can also have the additional fields: page_url,
@@ -332,6 +340,7 @@ class InfoExtractor:
* "name": Name or description of the subtitles * "name": Name or description of the subtitles
* "http_headers": A dictionary of additional HTTP headers * "http_headers": A dictionary of additional HTTP headers
to add to the request. to add to the request.
* "impersonate": Impersonate target(s); same as the "formats" field
"ext" will be calculated from URL if missing "ext" will be calculated from URL if missing
automatic_captions: Like 'subtitles'; contains automatically generated automatic_captions: Like 'subtitles'; contains automatically generated
captions instead of normal subtitles captions instead of normal subtitles
@@ -388,6 +397,8 @@ class InfoExtractor:
chapters: A list of dictionaries, with the following entries: chapters: A list of dictionaries, with the following entries:
* "start_time" - The start time of the chapter in seconds * "start_time" - The start time of the chapter in seconds
* "end_time" - The end time of the chapter in seconds * "end_time" - The end time of the chapter in seconds
(optional: core code can determine this value from
the next chapter's start_time or the video's duration)
* "title" (optional, string) * "title" (optional, string)
heatmap: A list of dictionaries, with the following entries: heatmap: A list of dictionaries, with the following entries:
* "start_time" - The start time of the data point in seconds * "start_time" - The start time of the data point in seconds
@@ -402,7 +413,8 @@ class InfoExtractor:
'unlisted' or 'public'. Use 'InfoExtractor._availability' 'unlisted' or 'public'. Use 'InfoExtractor._availability'
to set it to set it
media_type: The type of media as classified by the site, e.g. "episode", "clip", "trailer" media_type: The type of media as classified by the site, e.g. "episode", "clip", "trailer"
_old_archive_ids: A list of old archive ids needed for backward compatibility _old_archive_ids: A list of old archive ids needed for backward
compatibility. Use yt_dlp.utils.make_archive_id to generate ids
_format_sort_fields: A list of fields to use for sorting formats _format_sort_fields: A list of fields to use for sorting formats
__post_extractor: A function to be called just before the metadata is __post_extractor: A function to be called just before the metadata is
written to either disk, logger or console. The function written to either disk, logger or console. The function
@@ -880,26 +892,17 @@ class InfoExtractor:
extensions = {} extensions = {}
if impersonate in (True, ''): available_target, requested_targets = self._downloader._parse_impersonate_targets(impersonate)
impersonate = ImpersonateTarget()
requested_targets = [
t if isinstance(t, ImpersonateTarget) else ImpersonateTarget.from_str(t)
for t in variadic(impersonate)
] if impersonate else []
available_target = next(filter(self._downloader._impersonate_target_available, requested_targets), None)
if available_target: if available_target:
extensions['impersonate'] = available_target extensions['impersonate'] = available_target
elif requested_targets: elif requested_targets:
message = 'The extractor is attempting impersonation, but ' msg = 'The extractor is attempting impersonation'
message += (
'no impersonate target is available' if not str(impersonate)
else f'none of these impersonate targets are available: "{", ".join(map(str, requested_targets))}"')
info_msg = ('see https://github.com/yt-dlp/yt-dlp#impersonation '
'for information on installing the required dependencies')
if require_impersonation: if require_impersonation:
raise ExtractorError(f'{message}; {info_msg}', expected=True) raise ExtractorError(
self.report_warning(f'{message}; if you encounter errors, then {info_msg}', only_once=True) self._downloader._unavailable_targets_message(requested_targets, note=msg, is_error=True),
expected=True)
self.report_warning(
self._downloader._unavailable_targets_message(requested_targets, note=msg), only_once=True)
try: try:
return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query, extensions)) return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query, extensions))
@@ -1779,6 +1782,59 @@ class InfoExtractor:
r'<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>', webpage, 'next.js data', r'<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>', webpage, 'next.js data',
video_id, end_pattern='</script>', fatal=fatal, default=default, **kw) video_id, end_pattern='</script>', fatal=fatal, default=default, **kw)
def _search_nextjs_v13_data(self, webpage, video_id, fatal=True):
"""Parses Next.js app router flight data that was introduced in Next.js v13"""
nextjs_data = {}
if not fatal and not isinstance(webpage, str):
return nextjs_data
def flatten(flight_data):
if not isinstance(flight_data, list):
return
if len(flight_data) == 4 and flight_data[0] == '$':
_, name, _, data = flight_data
if not isinstance(data, dict):
return
children = data.pop('children', None)
if data and isinstance(name, str) and re.fullmatch(r'\$L[0-9a-f]+', name):
# It is useful hydration JSON data
nextjs_data[name[2:]] = data
flatten(children)
return
for f in flight_data:
flatten(f)
flight_text = ''
# The pattern for the surrounding JS/tag should be strict as it's a hardcoded string in the next.js source
# Ref: https://github.com/vercel/next.js/blob/5a4a08fdc/packages/next/src/server/app-render/use-flight-response.tsx#L189
for flight_segment in re.findall(r'<script\b[^>]*>self\.__next_f\.push\((\[.+?\])\)</script>', webpage):
segment = self._parse_json(flight_segment, video_id, fatal=fatal, errnote=None if fatal else False)
# Some earlier versions of next.js "optimized" away this array structure; this is unsupported
# Ref: https://github.com/vercel/next.js/commit/0123a9d5c9a9a77a86f135b7ae30b46ca986d761
if not isinstance(segment, list) or len(segment) != 2:
self.write_debug(
f'{video_id}: Unsupported next.js flight data structure detected', only_once=True)
continue
# Only use the relevant payload type (1 == data)
# Ref: https://github.com/vercel/next.js/blob/5a4a08fdc/packages/next/src/server/app-render/use-flight-response.tsx#L11-L14
payload_type, chunk = segment
if payload_type == 1:
flight_text += chunk
for f in flight_text.splitlines():
prefix, _, body = f.lstrip().partition(':')
if not re.fullmatch(r'[0-9a-f]+', prefix):
continue
# The body still isn't guaranteed to be valid JSON, so parsing should always be non-fatal
if body.startswith('[') and body.endswith(']'):
flatten(self._parse_json(body, video_id, fatal=False, errnote=False))
elif body.startswith('{') and body.endswith('}'):
data = self._parse_json(body, video_id, fatal=False, errnote=False)
if data is not None:
nextjs_data[prefix] = data
return nextjs_data
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal=True, traverse=('data', 0)): def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal=True, traverse=('data', 0)):
"""Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function""" """Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function"""
rectx = re.escape(context_name) rectx = re.escape(context_name)
@@ -2126,21 +2182,33 @@ class InfoExtractor:
raise ExtractorError(errnote, video_id=video_id) raise ExtractorError(errnote, video_id=video_id)
self.report_warning(f'{errnote}{bug_reports_message()}') self.report_warning(f'{errnote}{bug_reports_message()}')
return [], {} return [], {}
if note is None:
res = self._download_webpage_handle( note = 'Downloading m3u8 information'
m3u8_url, video_id, if errnote is None:
note='Downloading m3u8 information' if note is None else note, errnote = 'Failed to download m3u8 information'
errnote='Failed to download m3u8 information' if errnote is None else errnote, response = self._request_webpage(
m3u8_url, video_id, note=note, errnote=errnote,
fatal=fatal, data=data, headers=headers, query=query) fatal=fatal, data=data, headers=headers, query=query)
if response is False:
if res is False:
return [], {} return [], {}
m3u8_doc, urlh = res with contextlib.closing(response):
m3u8_url = urlh.url prefix = response.read(512)
if not prefix.startswith(b'#EXTM3U'):
msg = 'Response data has no m3u header'
if fatal:
raise ExtractorError(msg, video_id=video_id)
self.report_warning(f'{msg}{bug_reports_message()}', video_id=video_id)
return [], {}
content = self._webpage_read_content(
response, m3u8_url, video_id, note=note, errnote=errnote,
fatal=fatal, prefix=prefix, data=data)
if content is False:
return [], {}
return self._parse_m3u8_formats_and_subtitles( return self._parse_m3u8_formats_and_subtitles(
m3u8_doc, m3u8_url, ext=ext, entry_protocol=entry_protocol, content, response.url, ext=ext, entry_protocol=entry_protocol,
preference=preference, quality=quality, m3u8_id=m3u8_id, preference=preference, quality=quality, m3u8_id=m3u8_id,
note=note, errnote=errnote, fatal=fatal, live=live, data=data, note=note, errnote=errnote, fatal=fatal, live=live, data=data,
headers=headers, query=query, video_id=video_id) headers=headers, query=query, video_id=video_id)

View File

@@ -1,49 +0,0 @@
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id, # noqa: UP031
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:{}:{}'.format(content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)

View File

@@ -11,8 +11,14 @@ from ..utils.traversal import traverse_obj
class DangalPlayBaseIE(InfoExtractor): class DangalPlayBaseIE(InfoExtractor):
_NETRC_MACHINE = 'dangalplay' _NETRC_MACHINE = 'dangalplay'
_REGION = 'IN'
_OTV_USER_ID = None _OTV_USER_ID = None
_LOGIN_HINT = 'Pass credentials as -u "token" -p "USER_ID" where USER_ID is the `otv_user_id` in browser local storage' _LOGIN_HINT = (
'Pass credentials as -u "token" -p "USER_ID" '
'(where USER_ID is the value of "otv_user_id" in your browser local storage). '
'Your login region can be optionally suffixed to the username as @REGION '
'(where REGION is the two-letter "region" code found in your browser local storage), '
'e.g.: -u "token@IN" -p "USER_ID"')
_API_BASE = 'https://ottapi.dangalplay.com' _API_BASE = 'https://ottapi.dangalplay.com'
_AUTH_TOKEN = 'jqeGWxRKK7FK5zEk3xCM' # from https://www.dangalplay.com/main.48ad19e24eb46acccef3.js _AUTH_TOKEN = 'jqeGWxRKK7FK5zEk3xCM' # from https://www.dangalplay.com/main.48ad19e24eb46acccef3.js
_SECRET_KEY = 'f53d31a4377e4ef31fa0' # same as above _SECRET_KEY = 'f53d31a4377e4ef31fa0' # same as above
@@ -20,8 +26,12 @@ class DangalPlayBaseIE(InfoExtractor):
def _perform_login(self, username, password): def _perform_login(self, username, password):
if self._OTV_USER_ID: if self._OTV_USER_ID:
return return
if username != 'token' or not re.fullmatch(r'[\da-f]{32}', password): mobj = re.fullmatch(r'token(?:@(?P<region>[A-Z]{2}))?', username)
if not mobj or not re.fullmatch(r'[\da-f]{32}', password):
raise ExtractorError(self._LOGIN_HINT, expected=True) raise ExtractorError(self._LOGIN_HINT, expected=True)
if region := mobj.group('region'):
self._REGION = region
self.write_debug(f'Setting login region to "{self._REGION}"')
self._OTV_USER_ID = password self._OTV_USER_ID = password
def _real_initialize(self): def _real_initialize(self):
@@ -52,7 +62,7 @@ class DangalPlayBaseIE(InfoExtractor):
f'{self._API_BASE}/{path}', display_id, note, fatal=fatal, f'{self._API_BASE}/{path}', display_id, note, fatal=fatal,
headers={'Accept': 'application/json'}, query={ headers={'Accept': 'application/json'}, query={
'auth_token': self._AUTH_TOKEN, 'auth_token': self._AUTH_TOKEN,
'region': 'IN', 'region': self._REGION,
**query, **query,
}) })
@@ -106,7 +116,7 @@ class DangalPlayIE(DangalPlayBaseIE):
'catalog_id': catalog_id, 'catalog_id': catalog_id,
'content_id': content_id, 'content_id': content_id,
'category': '', 'category': '',
'region': 'IN', 'region': self._REGION,
'auth_token': self._AUTH_TOKEN, 'auth_token': self._AUTH_TOKEN,
'id': self._OTV_USER_ID, 'id': self._OTV_USER_ID,
'md5': hashlib.md5(unhashed.encode()).hexdigest(), 'md5': hashlib.md5(unhashed.encode()).hexdigest(),
@@ -129,11 +139,14 @@ class DangalPlayIE(DangalPlayBaseIE):
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 422: if isinstance(e.cause, HTTPError) and e.cause.status == 422:
error_info = traverse_obj(e.cause.response.read().decode(), ({json.loads}, 'error', {dict})) or {} error_info = traverse_obj(e.cause.response.read().decode(), ({json.loads}, 'error', {dict})) or {}
if error_info.get('code') == '1016': error_code = error_info.get('code')
if error_code == '1016':
self.raise_login_required( self.raise_login_required(
f'Your token has expired or is invalid. {self._LOGIN_HINT}', method=None) f'Your token has expired or is invalid. {self._LOGIN_HINT}', method=None)
elif msg := error_info.get('message'): elif error_code == '4028':
raise ExtractorError(msg) self.raise_login_required(
f'Your login region is unspecified or incorrect. {self._LOGIN_HINT}', method=None)
raise ExtractorError(join_nonempty(error_code, error_info.get('message'), delim=': '))
raise raise
m3u8_url = traverse_obj(details, ( m3u8_url = traverse_obj(details, (

View File

@@ -17,8 +17,140 @@ from ..utils import (
from ..utils.traversal import traverse_obj from ..utils.traversal import traverse_obj
class FloatplaneIE(InfoExtractor): class FloatplaneBaseIE(InfoExtractor):
def _real_extract(self, url):
post_id = self._match_id(url)
post_data = self._download_json(
f'{self._BASE_URL}/api/v3/content/post', post_id, query={'id': post_id},
note='Downloading post data', errnote='Unable to download post data',
impersonate=self._IMPERSONATE_TARGET)
if not any(traverse_obj(post_data, ('metadata', ('hasVideo', 'hasAudio')))):
raise ExtractorError('Post does not contain a video or audio track', expected=True)
uploader_url = format_field(
post_data, [('creator', 'urlname')], f'{self._BASE_URL}/channel/%s/home') or None
common_info = {
'uploader_url': uploader_url,
'channel_url': urljoin(f'{uploader_url}/', traverse_obj(post_data, ('channel', 'urlname'))),
'availability': self._availability(needs_subscription=True),
**traverse_obj(post_data, {
'uploader': ('creator', 'title', {str}),
'uploader_id': ('creator', 'id', {str}),
'channel': ('channel', 'title', {str}),
'channel_id': ('channel', 'id', {str}),
'release_timestamp': ('releaseDate', {parse_iso8601}),
}),
}
items = []
for media in traverse_obj(post_data, (('videoAttachments', 'audioAttachments'), ...)):
media_id = media['id']
media_typ = media.get('type') or 'video'
metadata = self._download_json(
f'{self._BASE_URL}/api/v3/content/{media_typ}', media_id, query={'id': media_id},
note=f'Downloading {media_typ} metadata', impersonate=self._IMPERSONATE_TARGET)
stream = self._download_json(
f'{self._BASE_URL}/api/v2/cdn/delivery', media_id, query={
'type': 'vod' if media_typ == 'video' else 'aod',
'guid': metadata['guid'],
}, note=f'Downloading {media_typ} stream data',
impersonate=self._IMPERSONATE_TARGET)
path_template = traverse_obj(stream, ('resource', 'uri', {str}))
def format_path(params):
path = path_template
for i, val in (params or {}).items():
path = path.replace(f'{{qualityLevelParams.{i}}}', val)
return path
formats = []
for quality in traverse_obj(stream, ('resource', 'data', 'qualityLevels', ...)):
url = urljoin(stream['cdn'], format_path(traverse_obj(
stream, ('resource', 'data', 'qualityLevelParams', quality['name'], {dict}))))
format_id = traverse_obj(quality, ('name', {str}))
hls_aes = {}
m3u8_data = None
# If we need impersonation for the API, then we need it for HLS keys too: extract in advance
if self._IMPERSONATE_TARGET is not None:
m3u8_data = self._download_webpage(
url, media_id, fatal=False, impersonate=self._IMPERSONATE_TARGET, headers=self._HEADERS,
note=join_nonempty('Downloading', format_id, 'm3u8 information', delim=' '),
errnote=join_nonempty('Failed to download', format_id, 'm3u8 information', delim=' '))
if not m3u8_data:
continue
key_url = self._search_regex(
r'#EXT-X-KEY:METHOD=AES-128,URI="(https?://[^"]+)"',
m3u8_data, 'HLS AES key URI', default=None)
if key_url:
urlh = self._request_webpage(
key_url, media_id, fatal=False, impersonate=self._IMPERSONATE_TARGET, headers=self._HEADERS,
note=join_nonempty('Downloading', format_id, 'HLS AES key', delim=' '),
errnote=join_nonempty('Failed to download', format_id, 'HLS AES key', delim=' '))
if urlh:
hls_aes['key'] = urlh.read().hex()
formats.append({
**traverse_obj(quality, {
'format_note': ('label', {str}),
'width': ('width', {int}),
'height': ('height', {int}),
}),
**parse_codecs(quality.get('codecs')),
'url': url,
'ext': determine_ext(url.partition('/chunk.m3u8')[0], 'mp4'),
'format_id': format_id,
'hls_media_playlist_data': m3u8_data,
'hls_aes': hls_aes or None,
})
items.append({
**common_info,
'id': media_id,
**traverse_obj(metadata, {
'title': ('title', {str}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'formats': formats,
})
post_info = {
**common_info,
'id': post_id,
'display_id': post_id,
**traverse_obj(post_data, {
'title': ('title', {str}),
'description': ('text', {clean_html}),
'like_count': ('likes', {int_or_none}),
'dislike_count': ('dislikes', {int_or_none}),
'comment_count': ('comments', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'http_headers': self._HEADERS,
}
if len(items) > 1:
return self.playlist_result(items, **post_info)
post_info.update(items[0])
return post_info
class FloatplaneIE(FloatplaneBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/post/(?P<id>\w+)' _VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/post/(?P<id>\w+)'
_BASE_URL = 'https://www.floatplane.com'
_IMPERSONATE_TARGET = None
_HEADERS = {
'Origin': _BASE_URL,
'Referer': f'{_BASE_URL}/',
}
_TESTS = [{ _TESTS = [{
'url': 'https://www.floatplane.com/post/2Yf3UedF7C', 'url': 'https://www.floatplane.com/post/2Yf3UedF7C',
'info_dict': { 'info_dict': {
@@ -170,105 +302,9 @@ class FloatplaneIE(InfoExtractor):
}] }]
def _real_initialize(self): def _real_initialize(self):
if not self._get_cookies('https://www.floatplane.com').get('sails.sid'): if not self._get_cookies(self._BASE_URL).get('sails.sid'):
self.raise_login_required() self.raise_login_required()
def _real_extract(self, url):
post_id = self._match_id(url)
post_data = self._download_json(
'https://www.floatplane.com/api/v3/content/post', post_id, query={'id': post_id},
note='Downloading post data', errnote='Unable to download post data')
if not any(traverse_obj(post_data, ('metadata', ('hasVideo', 'hasAudio')))):
raise ExtractorError('Post does not contain a video or audio track', expected=True)
uploader_url = format_field(
post_data, [('creator', 'urlname')], 'https://www.floatplane.com/channel/%s/home') or None
common_info = {
'uploader_url': uploader_url,
'channel_url': urljoin(f'{uploader_url}/', traverse_obj(post_data, ('channel', 'urlname'))),
'availability': self._availability(needs_subscription=True),
**traverse_obj(post_data, {
'uploader': ('creator', 'title', {str}),
'uploader_id': ('creator', 'id', {str}),
'channel': ('channel', 'title', {str}),
'channel_id': ('channel', 'id', {str}),
'release_timestamp': ('releaseDate', {parse_iso8601}),
}),
}
items = []
for media in traverse_obj(post_data, (('videoAttachments', 'audioAttachments'), ...)):
media_id = media['id']
media_typ = media.get('type') or 'video'
metadata = self._download_json(
f'https://www.floatplane.com/api/v3/content/{media_typ}', media_id, query={'id': media_id},
note=f'Downloading {media_typ} metadata')
stream = self._download_json(
'https://www.floatplane.com/api/v2/cdn/delivery', media_id, query={
'type': 'vod' if media_typ == 'video' else 'aod',
'guid': metadata['guid'],
}, note=f'Downloading {media_typ} stream data')
path_template = traverse_obj(stream, ('resource', 'uri', {str}))
def format_path(params):
path = path_template
for i, val in (params or {}).items():
path = path.replace(f'{{qualityLevelParams.{i}}}', val)
return path
formats = []
for quality in traverse_obj(stream, ('resource', 'data', 'qualityLevels', ...)):
url = urljoin(stream['cdn'], format_path(traverse_obj(
stream, ('resource', 'data', 'qualityLevelParams', quality['name'], {dict}))))
formats.append({
**traverse_obj(quality, {
'format_id': ('name', {str}),
'format_note': ('label', {str}),
'width': ('width', {int}),
'height': ('height', {int}),
}),
**parse_codecs(quality.get('codecs')),
'url': url,
'ext': determine_ext(url.partition('/chunk.m3u8')[0], 'mp4'),
})
items.append({
**common_info,
'id': media_id,
**traverse_obj(metadata, {
'title': ('title', {str}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'formats': formats,
})
post_info = {
**common_info,
'id': post_id,
'display_id': post_id,
**traverse_obj(post_data, {
'title': ('title', {str}),
'description': ('text', {clean_html}),
'like_count': ('likes', {int_or_none}),
'dislike_count': ('dislikes', {int_or_none}),
'comment_count': ('comments', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
}
if len(items) > 1:
return self.playlist_result(items, **post_info)
post_info.update(items[0])
return post_info
class FloatplaneChannelIE(InfoExtractor): class FloatplaneChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?' _VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'

View File

@@ -1,4 +1,3 @@
import json
import re import re
import urllib.parse import urllib.parse
@@ -19,7 +18,11 @@ from ..utils import (
unsmuggle_url, unsmuggle_url,
url_or_none, url_or_none,
) )
from ..utils.traversal import find_element, traverse_obj from ..utils.traversal import (
find_element,
get_first,
traverse_obj,
)
class FranceTVBaseInfoExtractor(InfoExtractor): class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -121,9 +124,10 @@ class FranceTVIE(InfoExtractor):
elif code := traverse_obj(dinfo, ('code', {int})): elif code := traverse_obj(dinfo, ('code', {int})):
if code == 2009: if code == 2009:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES) self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
elif code in (2015, 2017): elif code in (2015, 2017, 2019):
# 2015: L'accès à cette vidéo est impossible. (DRM-only) # 2015: L'accès à cette vidéo est impossible. (DRM-only)
# 2017: Cette vidéo n'est pas disponible depuis le site web mobile (b/c DRM) # 2017: Cette vidéo n'est pas disponible depuis le site web mobile (b/c DRM)
# 2019: L'accès à cette vidéo est incompatible avec votre configuration. (DRM-only)
drm_formats = True drm_formats = True
continue continue
self.report_warning( self.report_warning(
@@ -258,7 +262,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html', 'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
'info_dict': { 'info_dict': {
'id': 'ec217ecc-0733-48cf-ac06-af1347b849d1', # old: c5bda21d-2c6f-4470-8849-3d8327adb2ba' 'id': 'b2cf9fd8-e971-4757-8651-848f2772df61', # old: ec217ecc-0733-48cf-ac06-af1347b849d1
'ext': 'mp4', 'ext': 'mp4',
'title': '13h15, le dimanche... - Les mystères de Jésus', 'title': '13h15, le dimanche... - Les mystères de Jésus',
'timestamp': 1502623500, 'timestamp': 1502623500,
@@ -269,7 +273,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [FranceTVIE.ie_key()], 'skip': 'Unfortunately, this video is no longer available',
}, { }, {
# geo-restricted # geo-restricted
'url': 'https://www.france.tv/enfants/six-huit-ans/foot2rue/saison-1/3066387-duel-au-vieux-port.html', 'url': 'https://www.france.tv/enfants/six-huit-ans/foot2rue/saison-1/3066387-duel-au-vieux-port.html',
@@ -287,7 +291,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1441, 'duration': 1441,
}, },
'skip': 'No longer available', 'skip': 'Unfortunately, this video is no longer available',
}, { }, {
# geo-restricted livestream (workflow == 'token-akamai') # geo-restricted livestream (workflow == 'token-akamai')
'url': 'https://www.france.tv/france-4/direct.html', 'url': 'https://www.france.tv/france-4/direct.html',
@@ -308,6 +312,19 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'live_status': 'is_live', 'live_status': 'is_live',
}, },
'params': {'skip_download': 'livestream'}, 'params': {'skip_download': 'livestream'},
}, {
# Not geo-restricted
'url': 'https://www.france.tv/france-2/la-maison-des-maternelles/5574051-nous-sommes-amis-et-nous-avons-fait-un-enfant-ensemble.html',
'info_dict': {
'id': 'b448bfe4-9fe7-11ee-97d8-2ba3426fa3df',
'ext': 'mp4',
'title': 'Nous sommes amis et nous avons fait un enfant ensemble - Émission du jeudi 21 décembre 2023',
'duration': 1065,
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1703147921,
'upload_date': '20231221',
},
'params': {'skip_download': 'm3u8'},
}, { }, {
# france3 # france3
'url': 'https://www.france.tv/france-3/des-chiffres-et-des-lettres/139063-emission-du-mardi-9-mai-2017.html', 'url': 'https://www.france.tv/france-3/des-chiffres-et-des-lettres/139063-emission-du-mardi-9-mai-2017.html',
@@ -342,30 +359,16 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
nextjs_data = self._search_nextjs_v13_data(webpage, display_id)
nextjs_data = traverse_obj( if get_first(nextjs_data, ('isLive', {bool})):
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json}, ..., 'children', ..., ..., 'children', ..., ..., 'children'))
if traverse_obj(nextjs_data, (..., ..., 'children', ..., 'isLive', {bool}, any)):
# For livestreams we need the id of the stream instead of the currently airing episode id # For livestreams we need the id of the stream instead of the currently airing episode id
video_id = traverse_obj(nextjs_data, ( video_id = get_first(nextjs_data, ('options', 'id', {str}))
..., ..., 'children', ..., 'children', ..., 'children', ..., 'children', ..., ...,
'children', ..., ..., 'children', ..., ..., 'children', (..., (..., ...)),
'options', 'id', {str}, any))
else: else:
video_id = traverse_obj(nextjs_data, ( video_id = get_first(nextjs_data, ('video', ('playerReplayId', 'siId'), {str}))
..., ..., ..., 'children',
lambda _, v: v['video']['url'] == urllib.parse.urlparse(url).path,
'video', ('playerReplayId', 'siId'), {str}, any))
if not video_id: if not video_id:
raise ExtractorError('Unable to extract video ID') raise ExtractorError('Unable to extract video ID')

View File

@@ -1481,30 +1481,6 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': ['SenateISVP'], 'add_ie': ['SenateISVP'],
}, },
{
# Limelight embeds (1 channel embed + 4 media embeds)
'url': 'http://www.sedona.com/FacilitatorTraining2017',
'info_dict': {
'id': 'FacilitatorTraining2017',
'title': 'Facilitator Training 2017',
},
'playlist_mincount': 5,
},
{
# Limelight embed (LimelightPlayerUtil.embed)
'url': 'https://tv5.ca/videos?v=xuu8qowr291ri',
'info_dict': {
'id': '95d035dc5c8a401588e9c0e6bd1e9c92',
'ext': 'mp4',
'title': '07448641',
'timestamp': 1499890639,
'upload_date': '20170712',
},
'params': {
'skip_download': True,
},
'add_ie': ['LimelightMedia'],
},
{ {
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/', 'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
'info_dict': { 'info_dict': {

View File

@@ -5,16 +5,11 @@ import hashlib
import hmac import hmac
import json import json
import os import os
import re
import urllib.parse import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import ExtractorError, int_or_none
ExtractorError, from ..utils.traversal import get_first, traverse_obj
int_or_none,
remove_end,
traverse_obj,
)
class GoPlayIE(InfoExtractor): class GoPlayIE(InfoExtractor):
@@ -27,10 +22,10 @@ class GoPlayIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '2baa4560-87a0-421b-bffc-359914e3c387', 'id': '2baa4560-87a0-421b-bffc-359914e3c387',
'ext': 'mp4', 'ext': 'mp4',
'title': 'S22 - Aflevering 1', 'title': 'De Slimste Mens ter Wereld - S22 - Aflevering 1',
'description': r're:In aflevering 1 nemen Daan Alferink, Tess Elst en Xander De Rycke .{66}', 'description': r're:In aflevering 1 nemen Daan Alferink, Tess Elst en Xander De Rycke .{66}',
'series': 'De Slimste Mens ter Wereld', 'series': 'De Slimste Mens ter Wereld',
'episode': 'Episode 1', 'episode': 'Wordt aangekondigd',
'season_number': 22, 'season_number': 22,
'episode_number': 1, 'episode_number': 1,
'season': 'Season 22', 'season': 'Season 22',
@@ -52,7 +47,7 @@ class GoPlayIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'ecb79672-92b9-4cd9-a0d7-e2f0250681ee', 'id': 'ecb79672-92b9-4cd9-a0d7-e2f0250681ee',
'ext': 'mp4', 'ext': 'mp4',
'title': 'S11 - Aflevering 1', 'title': 'De Mol - S11 - Aflevering 1',
'description': r're:Tien kandidaten beginnen aan hun verovering van Amerika en ontmoeten .{102}', 'description': r're:Tien kandidaten beginnen aan hun verovering van Amerika en ontmoeten .{102}',
'episode': 'Episode 1', 'episode': 'Episode 1',
'series': 'De Mol', 'series': 'De Mol',
@@ -75,21 +70,13 @@ class GoPlayIE(InfoExtractor):
if not self._id_token: if not self._id_token:
raise self.raise_login_required(method='password') raise self.raise_login_required(method='password')
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
nextjs_data = traverse_obj( nextjs_data = self._search_nextjs_v13_data(webpage, display_id)
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage), meta = get_first(nextjs_data, (
(..., {json.loads}, ..., {self._find_json}, ...)) lambda k, v: k in ('video', 'meta') and v['path'] == urllib.parse.urlparse(url).path))
meta = traverse_obj(nextjs_data, (
..., ..., 'children', ..., ..., 'children',
lambda _, v: v['video']['path'] == urllib.parse.urlparse(url).path, 'video', any))
video_id = meta['uuid'] video_id = meta['uuid']
info_dict = traverse_obj(meta, { info_dict = traverse_obj(meta, {
@@ -98,19 +85,18 @@ class GoPlayIE(InfoExtractor):
}) })
if traverse_obj(meta, ('program', 'subtype')) != 'movie': if traverse_obj(meta, ('program', 'subtype')) != 'movie':
for season_data in traverse_obj(nextjs_data, (..., 'children', ..., 'playlists', ...)): for season_data in traverse_obj(nextjs_data, (..., 'playlists', ..., {dict})):
episode_data = traverse_obj( episode_data = traverse_obj(season_data, ('videos', lambda _, v: v['videoId'] == video_id, any))
season_data, ('videos', lambda _, v: v['videoId'] == video_id, any))
if not episode_data: if not episode_data:
continue continue
episode_title = traverse_obj( season_number = traverse_obj(season_data, ('season', {int_or_none}))
episode_data, 'contextualTitle', 'episodeTitle', expected_type=str)
info_dict.update({ info_dict.update({
'title': episode_title or info_dict.get('title'), 'episode': traverse_obj(episode_data, ('episodeTitle', {str})),
'series': remove_end(info_dict.get('title'), f' - {episode_title}'),
'season_number': traverse_obj(season_data, ('season', {int_or_none})),
'episode_number': traverse_obj(episode_data, ('episodeNumber', {int_or_none})), 'episode_number': traverse_obj(episode_data, ('episodeNumber', {int_or_none})),
'season_number': season_number,
'series': self._search_regex(
fr'^(.+)? - S{season_number} - ', info_dict.get('title'), 'series', default=None),
}) })
break break

View File

@@ -1,3 +1,4 @@
import functools
import hashlib import hashlib
import hmac import hmac
import json import json
@@ -9,77 +10,126 @@ from .common import InfoExtractor
from ..networking.exceptions import HTTPError from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList,
determine_ext, determine_ext,
filter_dict,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
jwt_decode_hs256,
parse_iso8601,
str_or_none, str_or_none,
traverse_obj,
url_or_none, url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class HotStarBaseIE(InfoExtractor): class HotStarBaseIE(InfoExtractor):
_TOKEN_NAME = 'userUP'
_BASE_URL = 'https://www.hotstar.com' _BASE_URL = 'https://www.hotstar.com'
_API_URL = 'https://api.hotstar.com' _API_URL = 'https://api.hotstar.com'
_API_URL_V2 = 'https://apix.hotstar.com/v2'
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee' _AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
_FREE_HEADERS = {
'user-agent': 'Hotstar;in.startv.hotstar/25.06.30.0.11580 (Android/12)',
'x-hs-client': 'platform:android;app_id:in.startv.hotstar;app_version:25.06.30.0;os:Android;os_version:12;schema_version:0.0.1523',
'x-hs-platform': 'android',
}
_SUB_HEADERS = {
'user-agent': 'Disney+;in.startv.hotstar.dplus.tv/23.08.14.4.2915 (Android/13)',
'x-hs-client': 'platform:androidtv;app_id:in.startv.hotstar.dplus.tv;app_version:23.08.14.4;os:Android;os_version:13;schema_version:0.0.970',
'x-hs-platform': 'androidtv',
}
def _has_active_subscription(self, cookies, server_time):
server_time = int_or_none(server_time) or int(time.time())
expiry = traverse_obj(cookies, (
self._TOKEN_NAME, 'value', {jwt_decode_hs256}, 'sub', {json.loads},
'subscriptions', 'in', ..., 'expiry', {parse_iso8601}, all, {max})) or 0
return expiry > server_time
def _call_api_v1(self, path, *args, **kwargs): def _call_api_v1(self, path, *args, **kwargs):
return self._download_json( return self._download_json(
f'{self._API_URL}/o/v1/{path}', *args, **kwargs, f'{self._API_URL}/o/v1/{path}', *args, **kwargs,
headers={'x-country-code': 'IN', 'x-platform-code': 'PCTV'}) headers={'x-country-code': 'IN', 'x-platform-code': 'PCTV'})
def _call_api_impl(self, path, video_id, query, st=None, cookies=None): def _call_api_impl(self, path, video_id, query, cookies=None, st=None):
st = int_or_none(st) or int(time.time()) st = int_or_none(st) or int(time.time())
exp = st + 6000 exp = st + 6000
auth = f'st={st}~exp={exp}~acl=/*' auth = f'st={st}~exp={exp}~acl=/*'
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest() auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
if cookies and cookies.get('userUP'):
token = cookies.get('userUP').value
else:
token = self._download_json(
f'{self._API_URL}/um/v3/users',
video_id, note='Downloading token',
data=json.dumps({'device_ids': [{'id': str(uuid.uuid4()), 'type': 'device_id'}]}).encode(),
headers={
'hotstarauth': auth,
'x-hs-platform': 'PCTV', # or 'web'
'Content-Type': 'application/json',
})['user_identity']
response = self._download_json( response = self._download_json(
f'{self._API_URL}/{path}', video_id, query=query, f'{self._API_URL_V2}/{path}', video_id, query=query,
headers={ headers=filter_dict({
**(self._SUB_HEADERS if self._has_active_subscription(cookies, st) else self._FREE_HEADERS),
'hotstarauth': auth, 'hotstarauth': auth,
'x-hs-appversion': '6.72.2', 'x-hs-usertoken': traverse_obj(cookies, (self._TOKEN_NAME, 'value')),
'x-hs-platform': 'web', 'x-hs-device-id': traverse_obj(cookies, ('deviceId', 'value')) or str(uuid.uuid4()),
'x-hs-usertoken': token, 'content-type': 'application/json',
}) }))
if response['message'] != "Playback URL's fetched successfully": if not traverse_obj(response, ('success', {dict})):
raise ExtractorError( raise ExtractorError('API call was unsuccessful')
response['message'], expected=True) return response['success']
return response['data']
def _call_api_v2(self, path, video_id, st=None, cookies=None): def _call_api_v2(self, path, video_id, content_type, cookies=None, st=None):
return self._call_api_impl( return self._call_api_impl(f'{path}', video_id, query={
f'{path}/content/{video_id}', video_id, st=st, cookies=cookies, query={ 'content_id': video_id,
'desired-config': 'audio_channel:stereo|container:fmp4|dynamic_range:hdr|encryption:plain|ladder:tv|package:dash|resolution:fhd|subs-tag:HotstarVIP|video_codec:h265', 'filters': f'content_type={content_type}',
'device-id': cookies.get('device_id').value if cookies.get('device_id') else str(uuid.uuid4()), 'client_capabilities': json.dumps({
'os-name': 'Windows', 'package': ['dash', 'hls'],
'os-version': '10', 'container': ['fmp4', 'fmp4br', 'ts'],
}) 'ads': ['non_ssai', 'ssai'],
'audio_channel': ['stereo', 'dolby51', 'atmos'],
'encryption': ['plain', 'widevine'], # wv only so we can raise appropriate error
'video_codec': ['h264', 'h265'],
'video_codec_non_secure': ['h264', 'h265', 'vp9'],
'ladder': ['phone', 'tv', 'full'],
'resolution': ['hd', '4k'],
'true_resolution': ['hd', '4k'],
'dynamic_range': ['sdr', 'hdr'],
}, separators=(',', ':')),
'drm_parameters': json.dumps({
'widevine_security_level': ['SW_SECURE_DECODE', 'SW_SECURE_CRYPTO'],
'hdcp_version': ['HDCP_V2_2', 'HDCP_V2_1', 'HDCP_V2', 'HDCP_V1'],
}, separators=(',', ':')),
}, cookies=cookies, st=st)
def _playlist_entries(self, path, item_id, root=None, **kwargs): @staticmethod
results = self._call_api_v1(path, item_id, **kwargs)['body']['results'] def _parse_metadata_v1(video_data):
for video in traverse_obj(results, (('assets', None), 'items', ...)): return traverse_obj(video_data, {
if video.get('contentId'): 'id': ('contentId', {str}),
yield self.url_result( 'title': ('title', {str}),
HotStarIE._video_url(video['contentId'], root=root), HotStarIE, video['contentId']) 'description': ('description', {str}),
'duration': ('duration', {int_or_none}),
'timestamp': (('broadcastDate', 'startDate'), {int_or_none}, any),
'release_year': ('year', {int_or_none}),
'channel': ('channelName', {str}),
'channel_id': ('channelId', {int}, {str_or_none}),
'series': ('showName', {str}),
'season': ('seasonName', {str}),
'season_number': ('seasonNo', {int_or_none}),
'season_id': ('seasonId', {int}, {str_or_none}),
'episode': ('title', {str}),
'episode_number': ('episodeNo', {int_or_none}),
})
def _fetch_page(self, path, item_id, name, query, root, page):
results = self._call_api_v1(
path, item_id, note=f'Downloading {name} page {page + 1} JSON', query={
**query,
'tao': page * self._PAGE_SIZE,
'tas': self._PAGE_SIZE,
})['body']['results']
for video in traverse_obj(results, (('assets', None), 'items', lambda _, v: v['contentId'])):
yield self.url_result(
HotStarIE._video_url(video['contentId'], root=root), HotStarIE, **self._parse_metadata_v1(video))
class HotStarIE(HotStarBaseIE): class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar' IE_NAME = 'hotstar'
IE_DESC = 'JioHotstar'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?hotstar\.com(?:/in)?/(?!in/) https?://(?:www\.)?hotstar\.com(?:/in)?/(?!in/)
(?: (?:
@@ -114,15 +164,16 @@ class HotStarIE(HotStarBaseIE):
'upload_date': '20190501', 'upload_date': '20190501',
'duration': 1219, 'duration': 1219,
'channel': 'StarPlus', 'channel': 'StarPlus',
'channel_id': '3', 'channel_id': '821',
'series': 'Ek Bhram - Sarvagun Sampanna', 'series': 'Ek Bhram - Sarvagun Sampanna',
'season': 'Chapter 1', 'season': 'Chapter 1',
'season_number': 1, 'season_number': 1,
'season_id': '6771', 'season_id': '1260004607',
'episode': 'Janhvi Targets Suman', 'episode': 'Janhvi Targets Suman',
'episode_number': 8, 'episode_number': 8,
}, },
}, { 'params': {'skip_download': 'm3u8'},
}, { # Metadata call gets HTTP Error 504 with tas=10000
'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/anupama-anuj-share-a-moment/1000282843', 'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/anupama-anuj-share-a-moment/1000282843',
'info_dict': { 'info_dict': {
'id': '1000282843', 'id': '1000282843',
@@ -134,14 +185,14 @@ class HotStarIE(HotStarBaseIE):
'channel': 'StarPlus', 'channel': 'StarPlus',
'series': 'Anupama', 'series': 'Anupama',
'season_number': 1, 'season_number': 1,
'season_id': '7399', 'season_id': '1260022018',
'upload_date': '20230307', 'upload_date': '20230307',
'episode': 'Anupama, Anuj Share a Moment', 'episode': 'Anupama, Anuj Share a Moment',
'episode_number': 853, 'episode_number': 853,
'duration': 1272, 'duration': 1266,
'channel_id': '3', 'channel_id': '821',
}, },
'skip': 'HTTP Error 504: Gateway Time-out', # XXX: Investigate 504 errors on some episodes 'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://www.hotstar.com/in/shows/kana-kaanum-kaalangal/1260097087/back-to-school/1260097320', 'url': 'https://www.hotstar.com/in/shows/kana-kaanum-kaalangal/1260097087/back-to-school/1260097320',
'info_dict': { 'info_dict': {
@@ -154,14 +205,15 @@ class HotStarIE(HotStarBaseIE):
'channel': 'Hotstar Specials', 'channel': 'Hotstar Specials',
'series': 'Kana Kaanum Kaalangal', 'series': 'Kana Kaanum Kaalangal',
'season_number': 1, 'season_number': 1,
'season_id': '9441', 'season_id': '1260097089',
'upload_date': '20220421', 'upload_date': '20220421',
'episode': 'Back To School', 'episode': 'Back To School',
'episode_number': 1, 'episode_number': 1,
'duration': 1810, 'duration': 1810,
'channel_id': '54', 'channel_id': '1260003991',
}, },
}, { 'params': {'skip_download': 'm3u8'},
}, { # Metadata call gets HTTP Error 504 with tas=10000
'url': 'https://www.hotstar.com/in/clips/e3-sairat-kahani-pyaar-ki/1000262286', 'url': 'https://www.hotstar.com/in/clips/e3-sairat-kahani-pyaar-ki/1000262286',
'info_dict': { 'info_dict': {
'id': '1000262286', 'id': '1000262286',
@@ -173,6 +225,7 @@ class HotStarIE(HotStarBaseIE):
'timestamp': 1622943900, 'timestamp': 1622943900,
'duration': 5395, 'duration': 5395,
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://www.hotstar.com/in/movies/premam/1000091195', 'url': 'https://www.hotstar.com/in/movies/premam/1000091195',
'info_dict': { 'info_dict': {
@@ -180,12 +233,13 @@ class HotStarIE(HotStarBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Premam', 'title': 'Premam',
'release_year': 2015, 'release_year': 2015,
'description': 'md5:d833c654e4187b5e34757eafb5b72d7f', 'description': 'md5:096cd8aaae8dab56524823dc19dfa9f7',
'timestamp': 1462149000, 'timestamp': 1462149000,
'upload_date': '20160502', 'upload_date': '20160502',
'episode': 'Premam', 'episode': 'Premam',
'duration': 8994, 'duration': 8994,
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157', 'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'only_matching': True, 'only_matching': True,
@@ -208,6 +262,13 @@ class HotStarIE(HotStarBaseIE):
None: 'content', None: 'content',
} }
_CONTENT_TYPE = {
'movie': 'MOVIE',
'episode': 'EPISODE',
'match': 'SPORT',
'content': 'CLIPS',
}
_IGNORE_MAP = { _IGNORE_MAP = {
'res': 'resolution', 'res': 'resolution',
'vcodec': 'video_codec', 'vcodec': 'video_codec',
@@ -229,38 +290,50 @@ class HotStarIE(HotStarBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id, video_type = self._match_valid_url(url).group('id', 'type') video_id, video_type = self._match_valid_url(url).group('id', 'type')
video_type = self._TYPE.get(video_type, video_type) video_type = self._TYPE[video_type]
cookies = self._get_cookies(url) # Cookies before any request cookies = self._get_cookies(url) # Cookies before any request
if not cookies or not cookies.get(self._TOKEN_NAME):
self.raise_login_required()
video_data = traverse_obj( video_data = traverse_obj(
self._call_api_v1( self._call_api_v1(f'{video_type}/detail', video_id, fatal=False, query={
f'{video_type}/detail', video_id, fatal=False, query={'tas': 10000, 'contentId': video_id}), 'tas': 5, # See https://github.com/yt-dlp/yt-dlp/issues/7946
('body', 'results', 'item', {dict})) or {} 'contentId': video_id,
if not self.get_param('allow_unplayable_formats') and video_data.get('drmProtected'): }), ('body', 'results', 'item', {dict})) or {}
if video_data.get('drmProtected'):
self.report_drm(video_id) self.report_drm(video_id)
# See https://github.com/yt-dlp/yt-dlp/issues/396
st = self._download_webpage_handle(f'{self._BASE_URL}/in', video_id)[1].headers.get('x-origin-date')
geo_restricted = False geo_restricted = False
formats, subs = [], {} formats, subs, has_drm = [], {}, False
headers = {'Referer': f'{self._BASE_URL}/in'} headers = {'Referer': f'{self._BASE_URL}/in'}
content_type = traverse_obj(video_data, ('contentType', {str})) or self._CONTENT_TYPE[video_type]
# change to v2 in the future # See https://github.com/yt-dlp/yt-dlp/issues/396
playback_sets = self._call_api_v2('play/v1/playback', video_id, st=st, cookies=cookies)['playBackSets'] st = self._request_webpage(
for playback_set in playback_sets: f'{self._BASE_URL}/in', video_id, 'Fetching server time').get_header('x-origin-date')
if not isinstance(playback_set, dict): watch = self._call_api_v2('pages/watch', video_id, content_type, cookies, st)
continue player_config = traverse_obj(watch, (
tags = str_or_none(playback_set.get('tagsCombination')) or '' 'page', 'spaces', 'player', 'widget_wrappers', lambda _, v: v['template'] == 'PlayerWidget',
'widget', 'data', 'player_config', {dict}, any, {require('player config')}))
for playback_set in traverse_obj(player_config, (
('media_asset', 'media_asset_v2'),
('primary', 'fallback'),
all, lambda _, v: url_or_none(v['content_url']),
)):
tags = str_or_none(playback_set.get('playback_tags')) or ''
if any(f'{prefix}:{ignore}' in tags if any(f'{prefix}:{ignore}' in tags
for key, prefix in self._IGNORE_MAP.items() for key, prefix in self._IGNORE_MAP.items()
for ignore in self._configuration_arg(key)): for ignore in self._configuration_arg(key)):
continue continue
format_url = url_or_none(playback_set.get('playbackUrl')) tag_dict = dict((*t.split(':', 1), None)[:2] for t in tags.split(';'))
if not format_url: if tag_dict.get('encryption') not in ('plain', None):
has_drm = True
continue continue
format_url = re.sub(r'(?<=//staragvod)(\d)', r'web\1', format_url)
format_url = re.sub(r'(?<=//staragvod)(\d)', r'web\1', playback_set['content_url'])
ext = determine_ext(format_url) ext = determine_ext(format_url)
current_formats, current_subs = [], {} current_formats, current_subs = [], {}
@@ -280,14 +353,12 @@ class HotStarIE(HotStarBaseIE):
'height': int_or_none(playback_set.get('height')), 'height': int_or_none(playback_set.get('height')),
}] }]
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403: if isinstance(e.cause, HTTPError) and e.cause.status in (403, 474):
geo_restricted = True geo_restricted = True
else:
self.write_debug(e)
continue continue
tag_dict = dict((*t.split(':', 1), None)[:2] for t in tags.split(';'))
if tag_dict.get('encryption') not in ('plain', None):
for f in current_formats:
f['has_drm'] = True
for f in current_formats: for f in current_formats:
for k, v in self._TAG_FIELDS.items(): for k, v in self._TAG_FIELDS.items():
if not f.get(k): if not f.get(k):
@@ -299,6 +370,11 @@ class HotStarIE(HotStarBaseIE):
'stereo': 2, 'stereo': 2,
'dolby51': 6, 'dolby51': 6,
}.get(tag_dict.get('audio_channel')) }.get(tag_dict.get('audio_channel'))
if (
'Audio_Description' in f['format_id']
or 'Audio Description' in (f.get('format_note') or '')
):
f['source_preference'] = -99 + (f.get('source_preference') or -1)
f['format_note'] = join_nonempty( f['format_note'] = join_nonempty(
tag_dict.get('ladder'), tag_dict.get('ladder'),
tag_dict.get('audio_channel') if f.get('acodec') != 'none' else None, tag_dict.get('audio_channel') if f.get('acodec') != 'none' else None,
@@ -308,29 +384,22 @@ class HotStarIE(HotStarBaseIE):
formats.extend(current_formats) formats.extend(current_formats)
subs = self._merge_subtitles(subs, current_subs) subs = self._merge_subtitles(subs, current_subs)
if not formats and geo_restricted: if not formats:
self.raise_geo_restricted(countries=['IN'], metadata_available=True) if geo_restricted:
self.raise_geo_restricted(countries=['IN'], metadata_available=True)
elif has_drm:
self.report_drm(video_id)
elif not self._has_active_subscription(cookies, st):
self.raise_no_formats('Your account does not have access to this content', expected=True)
self._remove_duplicate_formats(formats) self._remove_duplicate_formats(formats)
for f in formats: for f in formats:
f.setdefault('http_headers', {}).update(headers) f.setdefault('http_headers', {}).update(headers)
return { return {
**self._parse_metadata_v1(video_data),
'id': video_id, 'id': video_id,
'title': video_data.get('title'),
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(traverse_obj(video_data, 'broadcastDate', 'startDate')),
'release_year': int_or_none(video_data.get('year')),
'formats': formats, 'formats': formats,
'subtitles': subs, 'subtitles': subs,
'channel': video_data.get('channelName'),
'channel_id': str_or_none(video_data.get('channelId')),
'series': video_data.get('showName'),
'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': str_or_none(video_data.get('seasonId')),
'episode': video_data.get('title'),
'episode_number': int_or_none(video_data.get('episodeNo')),
} }
@@ -371,64 +440,6 @@ class HotStarPrefixIE(InfoExtractor):
return self.url_result(HotStarIE._video_url(video_id, video_type), HotStarIE, video_id) return self.url_result(HotStarIE._video_url(video_id, video_type), HotStarIE, video_id)
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)(?:/[^/]+){2}/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': {
'id': '3_2_26',
},
'playlist_mincount': 20,
}, {
'url': 'https://www.hotstar.com/shows/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'only_matching': True,
}, {
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True,
}, {
'url': 'https://www.hotstar.com/in/tv/karthika-deepam/15457/list/popular-clips/t-3_2_1272',
'only_matching': True,
}]
def _real_extract(self, url):
id_ = self._match_id(url)
return self.playlist_result(
self._playlist_entries('tray/find', id_, query={'tas': 10000, 'uqId': id_}), id_)
class HotStarSeasonIE(HotStarBaseIE):
IE_NAME = 'hotstar:season'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)/[^/]+/\w+)/seasons/[^/]+/ss-(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.hotstar.com/tv/radhakrishn/1260000646/seasons/season-2/ss-8028',
'info_dict': {
'id': '8028',
},
'playlist_mincount': 35,
}, {
'url': 'https://www.hotstar.com/in/tv/ishqbaaz/9567/seasons/season-2/ss-4357',
'info_dict': {
'id': '4357',
},
'playlist_mincount': 30,
}, {
'url': 'https://www.hotstar.com/in/tv/bigg-boss/14714/seasons/season-4/ss-8208/',
'info_dict': {
'id': '8208',
},
'playlist_mincount': 19,
}, {
'url': 'https://www.hotstar.com/in/shows/bigg-boss/14714/seasons/season-4/ss-8208/',
'only_matching': True,
}]
def _real_extract(self, url):
url, season_id = self._match_valid_url(url).groups()
return self.playlist_result(self._playlist_entries(
'season/asset', season_id, url, query={'tao': 0, 'tas': 0, 'size': 10000, 'id': season_id}), season_id)
class HotStarSeriesIE(HotStarBaseIE): class HotStarSeriesIE(HotStarBaseIE):
IE_NAME = 'hotstar:series' IE_NAME = 'hotstar:series'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)/[^/]+/(?P<id>\d+))/?(?:[#?]|$)' _VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)/[^/]+/(?P<id>\d+))/?(?:[#?]|$)'
@@ -443,25 +454,29 @@ class HotStarSeriesIE(HotStarBaseIE):
'info_dict': { 'info_dict': {
'id': '1260050431', 'id': '1260050431',
}, },
'playlist_mincount': 43, 'playlist_mincount': 42,
}, { }, {
'url': 'https://www.hotstar.com/in/tv/mahabharat/435/', 'url': 'https://www.hotstar.com/in/tv/mahabharat/435/',
'info_dict': { 'info_dict': {
'id': '435', 'id': '435',
}, },
'playlist_mincount': 267, 'playlist_mincount': 267,
}, { }, { # HTTP Error 504 with tas=10000 (possibly because total size is over 1000 items?)
'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/', 'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/',
'info_dict': { 'info_dict': {
'id': '1260022017', 'id': '1260022017',
}, },
'playlist_mincount': 940, 'playlist_mincount': 1601,
}] }]
_PAGE_SIZE = 100
def _real_extract(self, url): def _real_extract(self, url):
url, series_id = self._match_valid_url(url).groups() url, series_id = self._match_valid_url(url).group('url', 'id')
id_ = self._call_api_v1( eid = self._call_api_v1(
'show/detail', series_id, query={'contentId': series_id})['body']['results']['item']['id'] 'show/detail', series_id, query={'contentId': series_id})['body']['results']['item']['id']
return self.playlist_result(self._playlist_entries( entries = OnDemandPagedList(functools.partial(
'tray/g/1/items', series_id, url, query={'tao': 0, 'tas': 10000, 'etid': 0, 'eid': id_}), series_id) self._fetch_page, 'tray/g/1/items', series_id,
'series', {'etid': 0, 'eid': eid}, url), self._PAGE_SIZE)
return self.playlist_result(entries, series_id)

View File

@@ -1,408 +0,0 @@
import base64
import itertools
import json
import random
import re
import string
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
jwt_decode_hs256,
parse_age_limit,
try_call,
url_or_none,
)
from ..utils.traversal import traverse_obj
class JioCinemaBaseIE(InfoExtractor):
_NETRC_MACHINE = 'jiocinema'
_GEO_BYPASS = False
_ACCESS_TOKEN = None
_REFRESH_TOKEN = None
_GUEST_TOKEN = None
_USER_ID = None
_DEVICE_ID = None
_API_HEADERS = {'Origin': 'https://www.jiocinema.com', 'Referer': 'https://www.jiocinema.com/'}
_APP_NAME = {'appName': 'RJIL_JioCinema'}
_APP_VERSION = {'appVersion': '5.0.0'}
_API_SIGNATURES = 'o668nxgzwff'
_METADATA_API_BASE = 'https://content-jiovoot.voot.com/psapi'
_ACCESS_HINT = 'the `accessToken` from your browser local storage'
_LOGIN_HINT = (
'Log in with "-u phone -p <PHONE_NUMBER>" to authenticate with OTP, '
f'or use "-u token -p <ACCESS_TOKEN>" to log in with {_ACCESS_HINT}. '
'If you have previously logged in with yt-dlp and your session '
'has been cached, you can use "-u device -p <DEVICE_ID>"')
def _cache_token(self, token_type):
assert token_type in ('access', 'refresh', 'all')
if token_type in ('access', 'all'):
self.cache.store(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-access', JioCinemaBaseIE._ACCESS_TOKEN)
if token_type in ('refresh', 'all'):
self.cache.store(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-refresh', JioCinemaBaseIE._REFRESH_TOKEN)
def _call_api(self, url, video_id, note='Downloading API JSON', headers={}, data={}):
return self._download_json(
url, video_id, note, data=json.dumps(data, separators=(',', ':')).encode(), headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
**self._API_HEADERS,
**headers,
}, expected_status=(400, 403, 474))
def _call_auth_api(self, service, endpoint, note, headers={}, data={}):
return self._call_api(
f'https://auth-jiocinema.voot.com/{service}service/apis/v4/{endpoint}',
None, note=note, headers=headers, data=data)
def _refresh_token(self):
if not JioCinemaBaseIE._REFRESH_TOKEN or not JioCinemaBaseIE._DEVICE_ID:
raise ExtractorError('User token has expired', expected=True)
response = self._call_auth_api(
'token', 'refreshtoken', 'Refreshing token',
headers={'accesstoken': self._ACCESS_TOKEN}, data={
**self._APP_NAME,
'deviceId': self._DEVICE_ID,
'refreshToken': self._REFRESH_TOKEN,
**self._APP_VERSION,
})
refresh_token = response.get('refreshTokenId')
if refresh_token and refresh_token != JioCinemaBaseIE._REFRESH_TOKEN:
JioCinemaBaseIE._REFRESH_TOKEN = refresh_token
self._cache_token('refresh')
JioCinemaBaseIE._ACCESS_TOKEN = response['authToken']
self._cache_token('access')
def _fetch_guest_token(self):
JioCinemaBaseIE._DEVICE_ID = ''.join(random.choices(string.digits, k=10))
guest_token = self._call_auth_api(
'token', 'guest', 'Downloading guest token', data={
**self._APP_NAME,
'deviceType': 'phone',
'os': 'ios',
'deviceId': self._DEVICE_ID,
'freshLaunch': False,
'adId': self._DEVICE_ID,
**self._APP_VERSION,
})
self._GUEST_TOKEN = guest_token['authToken']
self._USER_ID = guest_token['userId']
def _call_login_api(self, endpoint, guest_token, data, note):
return self._call_auth_api(
'user', f'loginotp/{endpoint}', note, headers={
**self.geo_verification_headers(),
'accesstoken': self._GUEST_TOKEN,
**self._APP_NAME,
**traverse_obj(guest_token, 'data', {
'deviceType': ('deviceType', {str}),
'os': ('os', {str}),
})}, data=data)
def _is_token_expired(self, token):
return (try_call(lambda: jwt_decode_hs256(token)['exp']) or 0) <= int(time.time() - 180)
def _perform_login(self, username, password):
if self._ACCESS_TOKEN and not self._is_token_expired(self._ACCESS_TOKEN):
return
UUID_RE = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
if username.lower() == 'token':
if try_call(lambda: jwt_decode_hs256(password)):
JioCinemaBaseIE._ACCESS_TOKEN = password
refresh_hint = 'the `refreshToken` UUID from your browser local storage'
refresh_token = self._configuration_arg('refresh_token', [''], ie_key=JioCinemaIE)[0]
if not refresh_token:
self.to_screen(
'To extend the life of your login session, in addition to your access token, '
'you can pass --extractor-args "jiocinema:refresh_token=REFRESH_TOKEN" '
f'where REFRESH_TOKEN is {refresh_hint}')
elif re.fullmatch(UUID_RE, refresh_token):
JioCinemaBaseIE._REFRESH_TOKEN = refresh_token
else:
self.report_warning(f'Invalid refresh_token value. Use {refresh_hint}')
else:
raise ExtractorError(
f'The password given could not be decoded as a token; use {self._ACCESS_HINT}', expected=True)
elif username.lower() == 'device' and re.fullmatch(rf'(?:{UUID_RE}|\d+)', password):
JioCinemaBaseIE._REFRESH_TOKEN = self.cache.load(JioCinemaBaseIE._NETRC_MACHINE, f'{password}-refresh')
JioCinemaBaseIE._ACCESS_TOKEN = self.cache.load(JioCinemaBaseIE._NETRC_MACHINE, f'{password}-access')
if not JioCinemaBaseIE._REFRESH_TOKEN or not JioCinemaBaseIE._ACCESS_TOKEN:
raise ExtractorError(f'Failed to load cached tokens for device ID "{password}"', expected=True)
elif username.lower() == 'phone' and re.fullmatch(r'\+?\d+', password):
self._fetch_guest_token()
guest_token = jwt_decode_hs256(self._GUEST_TOKEN)
initial_data = {
'number': base64.b64encode(password.encode()).decode(),
**self._APP_VERSION,
}
response = self._call_login_api('send', guest_token, initial_data, 'Requesting OTP')
if not traverse_obj(response, ('OTPInfo', {dict})):
raise ExtractorError('There was a problem with the phone number login attempt')
is_iphone = guest_token.get('os') == 'ios'
response = self._call_login_api('verify', guest_token, {
'deviceInfo': {
'consumptionDeviceName': 'iPhone' if is_iphone else 'Android',
'info': {
'platform': {'name': 'iPhone OS' if is_iphone else 'Android'},
'androidId': self._DEVICE_ID,
'type': 'iOS' if is_iphone else 'Android',
},
},
**initial_data,
'otp': self._get_tfa_info('the one-time password sent to your phone'),
}, 'Submitting OTP')
if traverse_obj(response, 'code') == 1043:
raise ExtractorError('Wrong OTP', expected=True)
JioCinemaBaseIE._REFRESH_TOKEN = response['refreshToken']
JioCinemaBaseIE._ACCESS_TOKEN = response['authToken']
else:
raise ExtractorError(self._LOGIN_HINT, expected=True)
user_token = jwt_decode_hs256(JioCinemaBaseIE._ACCESS_TOKEN)['data']
JioCinemaBaseIE._USER_ID = user_token['userId']
JioCinemaBaseIE._DEVICE_ID = user_token['deviceId']
if JioCinemaBaseIE._REFRESH_TOKEN and username != 'device':
self._cache_token('all')
if self.get_param('cachedir') is not False:
self.to_screen(
f'NOTE: For subsequent logins you can use "-u device -p {JioCinemaBaseIE._DEVICE_ID}"')
elif not JioCinemaBaseIE._REFRESH_TOKEN:
JioCinemaBaseIE._REFRESH_TOKEN = self.cache.load(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-refresh')
if JioCinemaBaseIE._REFRESH_TOKEN:
self._cache_token('access')
self.to_screen(f'Logging in as device ID "{JioCinemaBaseIE._DEVICE_ID}"')
if self._is_token_expired(JioCinemaBaseIE._ACCESS_TOKEN):
self._refresh_token()
class JioCinemaIE(JioCinemaBaseIE):
IE_NAME = 'jiocinema'
_VALID_URL = r'https?://(?:www\.)?jiocinema\.com/?(?:movies?/[^/?#]+/|tv-shows/(?:[^/?#]+/){3})(?P<id>\d{3,})'
_TESTS = [{
'url': 'https://www.jiocinema.com/tv-shows/agnisakshi-ek-samjhauta/1/pradeep-to-stop-the-wedding/3759931',
'info_dict': {
'id': '3759931',
'ext': 'mp4',
'title': 'Pradeep to stop the wedding?',
'description': 'md5:75f72d1d1a66976633345a3de6d672b1',
'episode': 'Pradeep to stop the wedding?',
'episode_number': 89,
'season': 'Agnisakshi…Ek Samjhauta-S1',
'season_number': 1,
'series': 'Agnisakshi Ek Samjhauta',
'duration': 1238.0,
'thumbnail': r're:https?://.+\.jpg',
'age_limit': 13,
'season_id': '3698031',
'upload_date': '20230606',
'timestamp': 1686009600,
'release_date': '20230607',
'genres': ['Drama'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.jiocinema.com/movies/bhediya/3754021/watch',
'info_dict': {
'id': '3754021',
'ext': 'mp4',
'title': 'Bhediya',
'description': 'md5:a6bf2900371ac2fc3f1447401a9f7bb0',
'episode': 'Bhediya',
'duration': 8500.0,
'thumbnail': r're:https?://.+\.jpg',
'age_limit': 13,
'upload_date': '20230525',
'timestamp': 1685026200,
'release_date': '20230524',
'genres': ['Comedy'],
},
'params': {'skip_download': 'm3u8'},
}]
def _extract_formats_and_subtitles(self, playback, video_id):
m3u8_url = traverse_obj(playback, (
'data', 'playbackUrls', lambda _, v: v['streamtype'] == 'hls', 'url', {url_or_none}, any))
if not m3u8_url: # DRM-only content only serves dash urls
self.report_drm(video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, m3u8_id='hls')
self._remove_duplicate_formats(formats)
return {
# '/_definst_/smil:vod/' m3u8 manifests claim to have 720p+ formats but max out at 480p
'formats': traverse_obj(formats, (
lambda _, v: '/_definst_/smil:vod/' not in v['url'] or v['height'] <= 480)),
'subtitles': subtitles,
}
def _real_extract(self, url):
video_id = self._match_id(url)
if not self._ACCESS_TOKEN and self._is_token_expired(self._GUEST_TOKEN):
self._fetch_guest_token()
elif self._ACCESS_TOKEN and self._is_token_expired(self._ACCESS_TOKEN):
self._refresh_token()
playback = self._call_api(
f'https://apis-jiovoot.voot.com/playbackjv/v3/{video_id}', video_id,
'Downloading playback JSON', headers={
**self.geo_verification_headers(),
'accesstoken': self._ACCESS_TOKEN or self._GUEST_TOKEN,
**self._APP_NAME,
'deviceid': self._DEVICE_ID,
'uniqueid': self._USER_ID,
'x-apisignatures': self._API_SIGNATURES,
'x-platform': 'androidweb',
'x-platform-token': 'web',
}, data={
'4k': False,
'ageGroup': '18+',
'appVersion': '3.4.0',
'bitrateProfile': 'xhdpi',
'capability': {
'drmCapability': {
'aesSupport': 'yes',
'fairPlayDrmSupport': 'none',
'playreadyDrmSupport': 'none',
'widevineDRMSupport': 'none',
},
'frameRateCapability': [{
'frameRateSupport': '30fps',
'videoQuality': '1440p',
}],
},
'continueWatchingRequired': False,
'dolby': False,
'downloadRequest': False,
'hevc': False,
'kidsSafe': False,
'manufacturer': 'Windows',
'model': 'Windows',
'multiAudioRequired': True,
'osVersion': '10',
'parentalPinValid': True,
'x-apisignatures': self._API_SIGNATURES,
})
status_code = traverse_obj(playback, ('code', {int}))
if status_code == 474:
self.raise_geo_restricted(countries=['IN'])
elif status_code == 1008:
error_msg = 'This content is only available for premium users'
if self._ACCESS_TOKEN:
raise ExtractorError(error_msg, expected=True)
self.raise_login_required(f'{error_msg}. {self._LOGIN_HINT}', method=None)
elif status_code == 400:
raise ExtractorError('The requested content is not available', expected=True)
elif status_code is not None and status_code != 200:
raise ExtractorError(
f'JioCinema says: {traverse_obj(playback, ("message", {str})) or status_code}')
metadata = self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/content/query/asset-details',
video_id, fatal=False, query={
'ids': f'include:{video_id}',
'responseType': 'common',
'devicePlatformType': 'desktop',
})
return {
'id': video_id,
'http_headers': self._API_HEADERS,
**self._extract_formats_and_subtitles(playback, video_id),
**traverse_obj(playback, ('data', {
# fallback metadata
'title': ('name', {str}),
'description': ('fullSynopsis', {str}),
'series': ('show', 'name', {str}, filter),
'season': ('tournamentName', {str}, {lambda x: x if x != 'Season 0' else None}),
'season_number': ('episode', 'season', {int_or_none}, filter),
'episode': ('fullTitle', {str}),
'episode_number': ('episode', 'episodeNo', {int_or_none}, filter),
'age_limit': ('ageNemonic', {parse_age_limit}),
'duration': ('totalDuration', {float_or_none}),
'thumbnail': ('images', {url_or_none}),
})),
**traverse_obj(metadata, ('result', 0, {
'title': ('fullTitle', {str}),
'description': ('fullSynopsis', {str}),
'series': ('showName', {str}, filter),
'season': ('seasonName', {str}, filter),
'season_number': ('season', {int_or_none}),
'season_id': ('seasonId', {str}, filter),
'episode': ('fullTitle', {str}),
'episode_number': ('episode', {int_or_none}),
'timestamp': ('uploadTime', {int_or_none}),
'release_date': ('telecastDate', {str}),
'age_limit': ('ageNemonic', {parse_age_limit}),
'duration': ('duration', {float_or_none}),
'genres': ('genres', ..., {str}),
'thumbnail': ('seo', 'ogImage', {url_or_none}),
})),
}
class JioCinemaSeriesIE(JioCinemaBaseIE):
IE_NAME = 'jiocinema:series'
_VALID_URL = r'https?://(?:www\.)?jiocinema\.com/tv-shows/(?P<slug>[\w-]+)/(?P<id>\d{3,})'
_TESTS = [{
'url': 'https://www.jiocinema.com/tv-shows/naagin/3499917',
'info_dict': {
'id': '3499917',
'title': 'naagin',
},
'playlist_mincount': 120,
}, {
'url': 'https://www.jiocinema.com/tv-shows/mtv-splitsvilla-x5/3499820',
'info_dict': {
'id': '3499820',
'title': 'mtv-splitsvilla-x5',
},
'playlist_mincount': 310,
}]
def _entries(self, series_id):
seasons = traverse_obj(self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/view/show/{series_id}', series_id,
'Downloading series metadata JSON', query={'responseType': 'common'}), (
'trays', lambda _, v: v['trayId'] == 'season-by-show-multifilter',
'trayTabs', lambda _, v: v['id']))
for season_num, season in enumerate(seasons, start=1):
season_id = season['id']
label = season.get('label') or season_num
for page_num in itertools.count(1):
episodes = traverse_obj(self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/content/generic/series-wise-episode',
season_id, f'Downloading season {label} page {page_num} JSON', query={
'sort': 'episode:asc',
'id': season_id,
'responseType': 'common',
'page': page_num,
}), ('result', lambda _, v: v['id'] and url_or_none(v['slug'])))
if not episodes:
break
for episode in episodes:
yield self.url_result(
episode['slug'], JioCinemaIE, **traverse_obj(episode, {
'video_id': 'id',
'video_title': ('fullTitle', {str}),
'season_number': ('season', {int_or_none}),
'episode_number': ('episode', {int_or_none}),
}))
def _real_extract(self, url):
slug, series_id = self._match_valid_url(url).group('slug', 'id')
return self.playlist_result(self._entries(series_id), series_id, slug)

View File

@@ -1,112 +0,0 @@
import datetime as dt
import urllib.parse
from .common import InfoExtractor
from ..utils import (
clean_html,
datetime_from_str,
unified_timestamp,
urljoin,
)
class JoqrAgIE(InfoExtractor):
IE_DESC = '超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)'
_VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php',
r'https?://(?:www\.)?joqr\.co\.jp/ag/',
r'https?://(?:www\.)?joqr\.co\.jp/qr/ag(?:daily|regular)program/?(?:$|[#?])']
_TESTS = [{
'url': 'https://www.uniqueradio.jp/agplayer5/player.php',
'info_dict': {
'id': 'live',
'title': str,
'channel': '超!A&G+',
'description': str,
'live_status': 'is_live',
'release_timestamp': int,
},
'params': {
'skip_download': True,
'ignore_no_formats_error': True,
},
}, {
'url': 'https://www.uniqueradio.jp/agplayer5/inc-player-hls.php',
'only_matching': True,
}, {
'url': 'https://www.joqr.co.jp/ag/article/103760/',
'only_matching': True,
}, {
'url': 'http://www.joqr.co.jp/qr/agdailyprogram/',
'only_matching': True,
}, {
'url': 'http://www.joqr.co.jp/qr/agregularprogram/',
'only_matching': True,
}]
def _extract_metadata(self, variable, html):
return clean_html(urllib.parse.unquote_plus(self._search_regex(
rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
html, 'metadata', group='value', default=''))) or None
def _extract_start_timestamp(self, video_id, is_live):
def extract_start_time_from(date_str):
dt_ = datetime_from_str(date_str) + dt.timedelta(hours=9)
date = dt_.strftime('%Y%m%d')
start_time = self._search_regex(
r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+\s*(\d{1,2}:\d{1,2})',
self._download_webpage(
f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
note=f'Downloading program list of {date}', fatal=False,
errnote=f'Failed to download program list of {date}') or '',
'start time', default=None)
if start_time:
return unified_timestamp(f'{dt_.strftime("%Y/%m/%d")} {start_time} +09:00')
return None
start_timestamp = extract_start_time_from('today')
if not start_timestamp:
return None
if not is_live or start_timestamp < datetime_from_str('now').timestamp():
return start_timestamp
else:
return extract_start_time_from('yesterday')
def _real_extract(self, url):
video_id = 'live'
metadata = self._download_webpage(
'https://www.uniqueradio.jp/aandg', video_id,
note='Downloading metadata', errnote='Failed to download metadata')
title = self._extract_metadata('Program_name', metadata)
if not title or title == '放送休止':
formats = []
live_status = 'is_upcoming'
release_timestamp = self._extract_start_timestamp(video_id, False)
msg = 'This stream is not currently live'
if release_timestamp:
msg += (' and will start at '
+ dt.datetime.fromtimestamp(release_timestamp).strftime('%Y-%m-%d %H:%M:%S'))
self.raise_no_formats(msg, expected=True)
else:
m3u8_path = self._search_regex(
r'<source\s[^>]*\bsrc="([^"]+)"',
self._download_webpage(
'https://www.uniqueradio.jp/agplayer5/inc-player-hls.php', video_id,
note='Downloading player data', errnote='Failed to download player data'),
'm3u8 url')
formats = self._extract_m3u8_formats(
urljoin('https://www.uniqueradio.jp/', m3u8_path), video_id)
live_status = 'is_live'
release_timestamp = self._extract_start_timestamp(video_id, True)
return {
'id': video_id,
'title': title,
'channel': '超!A&G+',
'description': self._extract_metadata('Program_text', metadata),
'formats': formats,
'live_status': live_status,
'release_timestamp': release_timestamp,
}

View File

@@ -1,12 +1,12 @@
import functools
import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import ( from ..utils import (
UserNotLive, UserNotLive,
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts,
parse_iso8601, parse_iso8601,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
@@ -16,21 +16,17 @@ from ..utils import (
class KickBaseIE(InfoExtractor): class KickBaseIE(InfoExtractor):
def _real_initialize(self): @functools.cached_property
self._request_webpage( def _api_headers(self):
HEADRequest('https://kick.com/'), None, 'Setting up session', fatal=False, impersonate=True) token = traverse_obj(
xsrf_token = self._get_cookies('https://kick.com/').get('XSRF-TOKEN') self._get_cookies('https://kick.com/'),
if not xsrf_token: ('session_token', 'value', {urllib.parse.unquote}))
self.write_debug('kick.com did not set XSRF-TOKEN cookie') return {'Authorization': f'Bearer {token}'} if token else {}
KickBaseIE._API_HEADERS = {
'Authorization': f'Bearer {xsrf_token.value}',
'X-XSRF-TOKEN': xsrf_token.value,
} if xsrf_token else {}
def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs): def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs):
return self._download_json( return self._download_json(
f'https://kick.com/api/{path}', display_id, note=note, f'https://kick.com/api/{path}', display_id, note=note,
headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs) headers={**self._api_headers, **headers}, impersonate=True, **kwargs)
class KickIE(KickBaseIE): class KickIE(KickBaseIE):

View File

@@ -1,358 +0,0 @@
import re
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
float_or_none,
int_or_none,
smuggle_url,
try_get,
unsmuggle_url,
)
class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
@classmethod
def _extract_embed_urls(cls, url, webpage):
lm = {
'Media': 'media',
'Channel': 'channel',
'ChannelList': 'channel_list',
}
def smuggle(url):
return smuggle_url(url, {'source_url': url})
entries = []
for kind, video_id in re.findall(
r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle(f'limelight:{lm[kind]}:{video_id}'),
f'Limelight{kind}', video_id))
for mobj in re.finditer(
# As per [1] class attribute should be exactly equal to
# LimelightEmbeddedPlayerFlash but numerous examples seen
# that don't exactly match it (e.g. [2]).
# 1. http://support.3playmedia.com/hc/en-us/articles/227732408-Limelight-Embedding-the-Captions-Plugin-with-the-Limelight-Player-on-Your-Webpage
# 2. http://www.sedona.com/FacilitatorTraining2017
r'''(?sx)
<object[^>]+class=(["\'])(?:(?!\1).)*\bLimelightEmbeddedPlayerFlash\b(?:(?!\1).)*\1[^>]*>.*?
<param[^>]+
name=(["\'])flashVars\2[^>]+
value=(["\'])(?:(?!\3).)*(?P<kind>media|channel(?:List)?)Id=(?P<id>[a-z0-9]{32})
''', webpage):
kind, video_id = mobj.group('kind'), mobj.group('id')
entries.append(cls.url_result(
smuggle(f'limelight:{kind}:{video_id}'),
f'Limelight{kind.capitalize()}', video_id))
# http://support.3playmedia.com/hc/en-us/articles/115009517327-Limelight-Embedding-the-Audio-Description-Plugin-with-the-Limelight-Player-on-Your-Web-Page)
for video_id in re.findall(
r'(?s)LimelightPlayerUtil\.embed\s*\(\s*{.*?\bmediaId["\']\s*:\s*["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle(f'limelight:media:{video_id}'),
LimelightMediaIE.ie_key(), video_id))
return entries
def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, f'Downloading PlaylistService {method} JSON',
fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403:
error = self._parse_json(e.cause.response.read().decode(), item_id)['detail']['contentAccessPermission']
if error == 'CountryDisabled':
self.raise_geo_restricted()
raise ExtractorError(error, expected=True)
raise
def _extract(self, item_id, pc_method, mobile_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer)
mobile = self._call_playlist_service(
item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile
def _extract_info(self, pc, mobile, i, referer):
get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
pc_item = get_item(pc, 'playlistItems')
mobile_item = get_item(mobile, 'mediaList')
video_id = pc_item.get('mediaId') or mobile_item['mediaId']
title = pc_item.get('title') or mobile_item['title']
formats = []
urls = []
for stream in pc_item.get('streams', []):
stream_url = stream.get('url')
if not stream_url or stream_url in urls:
continue
if not self.get_param('allow_unplayable_formats') and stream.get('drmProtected'):
continue
urls.append(stream_url)
ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id='hds', fatal=False))
else:
fmt = {
'url': stream_url,
'abr': float_or_none(stream.get('audioBitRate')),
'fps': float_or_none(stream.get('videoFrameRate')),
'ext': ext,
}
width = int_or_none(stream.get('videoWidthInPixels'))
height = int_or_none(stream.get('videoHeightInPixels'))
vbr = float_or_none(stream.get('videoBitRate'))
if width or height or vbr:
fmt.update({
'width': width,
'height': height,
'vbr': vbr,
})
else:
fmt['vcodec'] = 'none'
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', stream_url)
if rtmp:
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_format_id = format_id.replace('rtmp', 'http')
CDN_HOSTS = (
('delvenetworks.com', 'cpl.delvenetworks.com'),
('video.llnw.net', 's2.content.video.llnw.net'),
)
for cdn_host, http_host in CDN_HOSTS:
if cdn_host not in rtmp.group('host').lower():
continue
http_url = 'http://{}/{}'.format(http_host, rtmp.group('playpath')[4:])
urls.append(http_url)
if self._is_valid_url(http_url, video_id, http_format_id):
http_fmt = fmt.copy()
http_fmt.update({
'url': http_url,
'format_id': http_format_id,
})
formats.append(http_fmt)
break
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': format_id,
})
formats.append(fmt)
for mobile_url in mobile_item.get('mobileUrls', []):
media_url = mobile_url.get('mobileUrl')
format_id = mobile_url.get('targetMediaPlatform')
if not media_url or media_url in urls:
continue
if (format_id in ('Widevine', 'SmoothStreaming')
and not self.get_param('allow_unplayable_formats', False)):
continue
urls.append(media_url)
ext = determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': media_url,
'format_id': format_id,
'quality': -10,
'ext': ext,
})
subtitles = {}
for flag in mobile_item.get('flags'):
if flag == 'ClosedCaptions':
closed_captions = self._call_playlist_service(
video_id, 'getClosedCaptionsDetailsByMediaId',
False, referer) or []
for cc in closed_captions:
cc_url = cc.get('webvttFileUrl')
if not cc_url:
continue
lang = cc.get('languageCode') or self._search_regex(r'/([a-z]{2})\.vtt', cc_url, 'lang', default='en')
subtitles.setdefault(lang, []).append({
'url': cc_url,
})
break
get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
return {
'id': video_id,
'title': title,
'description': get_meta('description'),
'formats': formats,
'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
'subtitles': subtitles,
}
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
_VALID_URL = r'''(?x)
(?:
limelight:media:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bmediaId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': {
'id': '3ffd040b522b4485b6d84effc750cd86',
'ext': 'mp4',
'title': 'HaP and the HB Prince Trailer',
'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 144.23,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# video with subtitles
'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
'info_dict': {
'id': 'a3e00274d4564ec4a9b29b9466432335',
'ext': 'mp4',
'title': '3Play Media Overview Video',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 78.101,
# TODO: extract all languages that were accessible via API
# 'subtitles': 'mincount:9',
'subtitles': 'mincount:1',
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'media'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
pc, mobile = self._extract(
video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', source_url)
return self._extract_info(pc, mobile, 0, source_url)
class LimelightChannelIE(LimelightBaseIE):
IE_NAME = 'limelight:channel'
_VALID_URL = r'''(?x)
(?:
limelight:channel:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code',
'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
},
'playlist_mincount': 3,
}, {
'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
pc, mobile = self._extract(
channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
source_url)
entries = [
self._extract_info(pc, mobile, i, source_url)
for i in range(len(pc['playlistItems']))]
return self.playlist_result(
entries, channel_id, pc.get('title'), mobile.get('description'))
class LimelightChannelListIE(LimelightBaseIE):
IE_NAME = 'limelight:channel_list'
_VALID_URL = r'''(?x)
(?:
limelight:channel_list:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelListId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
'info_dict': {
'id': '301b117890c4465c8179ede21fd92e2b',
'title': 'Website - Hero Player',
},
'playlist_mincount': 2,
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel_list'
def _real_extract(self, url):
channel_list_id = self._match_id(url)
channel_list = self._call_playlist_service(
channel_list_id, 'getMobileChannelListById')
entries = [
self.url_result('limelight:channel:{}'.format(channel['id']), 'LimelightChannel')
for channel in channel_list['channelList']]
return self.playlist_result(
entries, channel_list_id, channel_list['title'])

View File

@@ -134,7 +134,7 @@ class LRTRadioIE(LRTBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id, path = self._match_valid_url(url).group('id', 'path') video_id, path = self._match_valid_url(url).group('id', 'path')
media = self._download_json( media = self._download_json(
'https://www.lrt.lt/radioteka/api/media', video_id, 'https://www.lrt.lt/rest-api/media', video_id,
query={'url': f'/mediateka/irasas/{video_id}/{path}'}) query={'url': f'/mediateka/irasas/{video_id}/{path}'})
return { return {

View File

@@ -0,0 +1,37 @@
from .common import InfoExtractor
from ..utils import parse_qs, url_or_none
from ..utils.traversal import require, traverse_obj
class Mir24TvIE(InfoExtractor):
IE_NAME = 'mir24.tv'
_VALID_URL = r'https?://(?:www\.)?mir24\.tv/news/(?P<id>[0-9]+)/[^/?#]+'
_TESTS = [{
'url': 'https://mir24.tv/news/16635210/dni-kultury-rossii-otkrylis-v-uzbekistane.-na-prazdnichnom-koncerte-vystupili-zvezdy-rossijskoj-estrada',
'info_dict': {
'id': '16635210',
'title': 'Дни культуры России открылись в Узбекистане. На праздничном концерте выступили звезды российской эстрады',
'ext': 'mp4',
'thumbnail': r're:https://images\.mir24\.tv/.+\.jpg',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, impersonate=True)
iframe_url = self._search_regex(
r'<iframe\b[^>]+\bsrc=["\'](https?://mir24\.tv/players/[^"\']+)',
webpage, 'iframe URL')
m3u8_url = traverse_obj(iframe_url, (
{parse_qs}, 'source', -1, {self._proto_relative_url}, {url_or_none}, {require('m3u8 URL')}))
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
return {
'id': video_id,
'title': self._og_search_title(webpage, default=None) or self._html_extract_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -18,7 +18,7 @@ class MirrativIE(MirrativBaseIE):
IE_NAME = 'mirrativ' IE_NAME = 'mirrativ'
_VALID_URL = r'https?://(?:www\.)?mirrativ\.com/live/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?mirrativ\.com/live/(?P<id>[^/?#&]+)'
TESTS = [{ _TESTS = [{
'url': 'https://mirrativ.com/live/UQomuS7EMgHoxRHjEhNiHw', 'url': 'https://mirrativ.com/live/UQomuS7EMgHoxRHjEhNiHw',
'info_dict': { 'info_dict': {
'id': 'UQomuS7EMgHoxRHjEhNiHw', 'id': 'UQomuS7EMgHoxRHjEhNiHw',

134
yt_dlp/extractor/mixlr.py Normal file
View File

@@ -0,0 +1,134 @@
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import int_or_none, parse_iso8601, url_or_none, urlhandle_detect_ext
from ..utils.traversal import traverse_obj
class MixlrIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<username>[\w-]+)\.mixlr\.com/events/(?P<id>\d+)'
_TESTS = [{
'url': 'https://suncity-104-9fm.mixlr.com/events/4387115',
'info_dict': {
'id': '4387115',
'ext': 'mp3',
'title': r're:SUNCITY 104.9FM\'s live audio \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'uploader': 'suncity-104-9fm',
'like_count': int,
'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/cd5b34d05fa2cee72d80477724a2f02e.png',
'timestamp': 1751943773,
'upload_date': '20250708',
'release_timestamp': 1751943764,
'release_date': '20250708',
'live_status': 'is_live',
},
}, {
'url': 'https://brcountdown.mixlr.com/events/4395480',
'info_dict': {
'id': '4395480',
'ext': 'aac',
'title': r're:Beats Revolution Countdown Episodio 461 \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'description': 'md5:5cacd089723f7add3f266bd588315bb3',
'uploader': 'brcountdown',
'like_count': int,
'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/c48727a59f690b87a55d47d123ba0d6d.jpg',
'timestamp': 1752354007,
'upload_date': '20250712',
'release_timestamp': 1752354000,
'release_date': '20250712',
'live_status': 'is_live',
},
}, {
'url': 'https://www.brcountdown.mixlr.com/events/4395480',
'only_matching': True,
}]
def _real_extract(self, url):
username, event_id = self._match_valid_url(url).group('username', 'id')
broadcast_info = self._download_json(
f'https://api.mixlr.com/v3/channels/{username}/events/{event_id}', event_id)
formats = []
format_url = traverse_obj(
broadcast_info, ('included', 0, 'attributes', 'progressive_stream_url', {url_or_none}))
if format_url:
urlh = self._request_webpage(
HEADRequest(format_url), event_id, fatal=False, note='Checking stream')
if urlh and urlh.status == 200:
ext = urlhandle_detect_ext(urlh)
if ext == 'octet-stream':
self.report_warning(
'The server did not return a valid file extension for the stream URL. '
'Assuming an mp3 stream; postprocessing may fail if this is incorrect')
ext = 'mp3'
formats.append({
'url': format_url,
'ext': ext,
'vcodec': 'none',
})
release_timestamp = traverse_obj(
broadcast_info, ('data', 'attributes', 'starts_at', {str}))
if not formats and release_timestamp:
self.raise_no_formats(f'This event will start at {release_timestamp}', expected=True)
return {
'id': event_id,
'uploader': username,
'formats': formats,
'release_timestamp': parse_iso8601(release_timestamp),
**traverse_obj(broadcast_info, ('included', 0, 'attributes', {
'title': ('title', {str}),
'timestamp': ('started_at', {parse_iso8601}),
'concurrent_view_count': ('concurrent_view_count', {int_or_none}),
'like_count': ('heart_count', {int_or_none}),
'is_live': ('live', {bool}),
})),
**traverse_obj(broadcast_info, ('data', 'attributes', {
'title': ('title', {str}),
'description': ('description', {str}),
'timestamp': ('started_at', {parse_iso8601}),
'concurrent_view_count': ('concurrent_view_count', {int_or_none}),
'like_count': ('heart_count', {int_or_none}),
'thumbnail': ('artwork_url', {url_or_none}),
'uploader_id': ('broadcaster_id', {str}),
})),
}
class MixlrRecoringIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<username>[\w-]+)\.mixlr\.com/recordings/(?P<id>\d+)'
_TESTS = [{
'url': 'https://biblewayng.mixlr.com/recordings/2375193',
'info_dict': {
'id': '2375193',
'ext': 'mp3',
'title': "God's Jewels and Their Resting Place Bro. Adeniji",
'description': 'Preached February 21, 2024 in the evening',
'uploader_id': '8659190',
'duration': 10968,
'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/ceca120ef707f642abeea6e29cd74238.jpg',
'timestamp': 1708544542,
'upload_date': '20240221',
},
}]
def _real_extract(self, url):
username, recording_id = self._match_valid_url(url).group('username', 'id')
recording_info = self._download_json(
f'https://api.mixlr.com/v3/channels/{username}/recordings/{recording_id}', recording_id)
return {
'id': recording_id,
**traverse_obj(recording_info, ('data', 'attributes', {
'ext': ('file_format', {str}),
'url': ('url', {url_or_none}),
'title': ('title', {str}),
'description': ('description', {str}),
'timestamp': ('created_at', {parse_iso8601}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('artwork_url', {url_or_none}),
'uploader_id': ('user_id', {str}),
})),
}

View File

@@ -457,12 +457,9 @@ mutation initPlaybackSession(
self.report_warning(f'No formats available for {format_id} broadcast; skipping') self.report_warning(f'No formats available for {format_id} broadcast; skipping')
return [], {} return [], {}
cdn_headers = {'x-cdn-token': token}
fmts, subs = self._extract_m3u8_formats_and_subtitles( fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url.replace(f'/{token}/', '/'), video_id, 'mp4', m3u8_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)
m3u8_id=format_id, fatal=False, headers=cdn_headers)
for fmt in fmts: for fmt in fmts:
fmt['http_headers'] = cdn_headers
fmt.setdefault('format_note', join_nonempty(feed, medium, delim=' ')) fmt.setdefault('format_note', join_nonempty(feed, medium, delim=' '))
fmt.setdefault('language', language) fmt.setdefault('language', language)
if fmt.get('vcodec') == 'none' and fmt['language'] == 'en': if fmt.get('vcodec') == 'none' and fmt['language'] == 'en':

View File

@@ -1,53 +1,70 @@
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import (
clean_html,
parse_iso8601,
parse_qs,
url_or_none,
)
from ..utils.traversal import require, traverse_obj
class NewsPicksIE(InfoExtractor): class NewsPicksIE(InfoExtractor):
_VALID_URL = r'https?://newspicks\.com/movie-series/(?P<channel_id>\d+)\?movieId=(?P<id>\d+)' _VALID_URL = r'https?://newspicks\.com/movie-series/(?P<id>[^?/#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://newspicks.com/movie-series/11?movieId=1813', 'url': 'https://newspicks.com/movie-series/11/?movieId=1813',
'info_dict': { 'info_dict': {
'id': '1813', 'id': '1813',
'title': '日本の課題を破壊せよ【ゲスト:成田悠輔】',
'description': 'md5:09397aad46d6ded6487ff13f138acadf',
'channel': 'HORIE ONE',
'channel_id': '11',
'release_date': '20220117',
'thumbnail': r're:https://.+jpg',
'ext': 'mp4', 'ext': 'mp4',
'title': '日本の課題を破壊せよ【ゲスト:成田悠輔】',
'cast': 'count:4',
'description': 'md5:09397aad46d6ded6487ff13f138acadf',
'release_date': '20220117',
'release_timestamp': 1642424400,
'series': 'HORIE ONE',
'series_id': '11',
'thumbnail': r're:https?://resources\.newspicks\.com/.+\.(?:jpe?g|png)',
'timestamp': 1642424420,
'upload_date': '20220117',
},
}, {
'url': 'https://newspicks.com/movie-series/158/?movieId=3932',
'info_dict': {
'id': '3932',
'ext': 'mp4',
'title': '【検証】専門家は、KADOKAWAをどう見るか',
'cast': 'count:3',
'description': 'md5:2c2d4bf77484a4333ec995d676f9a91d',
'release_date': '20240622',
'release_timestamp': 1719088080,
'series': 'NPレポート',
'series_id': '158',
'thumbnail': r're:https?://resources\.newspicks\.com/.+\.(?:jpe?g|png)',
'timestamp': 1719086400,
'upload_date': '20240622',
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id, channel_id = self._match_valid_url(url).group('id', 'channel_id') series_id = self._match_id(url)
video_id = traverse_obj(parse_qs(url), ('movieId', -1, {str}, {require('movie ID')}))
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
entries = self._parse_html5_media_entries(
url, webpage.replace('movie-for-pc', 'movie'), video_id, 'hls')
if not entries:
raise ExtractorError('No HTML5 media elements found')
info = entries[0]
title = self._html_search_meta('og:title', webpage, fatal=False) fragment = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['fragment']
description = self._html_search_meta( m3u8_url = traverse_obj(fragment, ('movie', 'movieUrl', {url_or_none}, {require('m3u8 URL')}))
('og:description', 'twitter:title'), webpage, fatal=False) formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4')
channel = self._html_search_regex(
r'value="11".+?<div\s+class="title">(.+?)</div', webpage, 'channel name', fatal=False)
if not title or not channel:
title, channel = re.split(r'\s*|\s*', self._html_extract_title(webpage))
release_date = self._search_regex( return {
r'<span\s+class="on-air-date">\s*(\d+)年(\d+)月(\d+)日\s*</span>',
webpage, 'release date', fatal=False, group=(1, 2, 3))
info.update({
'id': video_id, 'id': video_id,
'title': title, 'formats': formats,
'description': description, 'series': traverse_obj(fragment, ('series', 'title', {str})),
'channel': channel, 'series_id': series_id,
'channel_id': channel_id, 'subtitles': subtitles,
'release_date': ('%04d%02d%02d' % tuple(map(int, release_date))) if release_date else None, **traverse_obj(fragment, ('movie', {
}) 'title': ('title', {str}),
return info 'cast': ('relatedUsers', ..., 'displayName', {str}, filter, all, filter),
'description': ('explanation', {clean_html}),
'release_timestamp': ('onAirStartDate', {parse_iso8601}),
'thumbnail': (('image', 'coverImageUrl'), {url_or_none}, any),
'timestamp': ('published', {parse_iso8601}),
})),
}

View File

@@ -8,6 +8,8 @@ from ..utils import (
get_element_by_class, get_element_by_class,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
make_archive_id,
orderedSet,
parse_duration, parse_duration,
remove_end, remove_end,
traverse_obj, traverse_obj,
@@ -16,6 +18,7 @@ from ..utils import (
unified_timestamp, unified_timestamp,
url_or_none, url_or_none,
urljoin, urljoin,
variadic,
) )
@@ -495,7 +498,7 @@ class NhkForSchoolBangumiIE(InfoExtractor):
chapters = None chapters = None
if chapter_durations and chapter_titles and len(chapter_durations) == len(chapter_titles): if chapter_durations and chapter_titles and len(chapter_durations) == len(chapter_titles):
start_time = chapter_durations start_time = chapter_durations
end_time = chapter_durations[1:] + [duration] end_time = [*chapter_durations[1:], duration]
chapters = [{ chapters = [{
'start_time': s, 'start_time': s,
'end_time': e, 'end_time': e,
@@ -591,102 +594,179 @@ class NhkRadiruIE(InfoExtractor):
IE_DESC = 'NHK らじる (Radiru/Rajiru)' IE_DESC = 'NHK らじる (Radiru/Rajiru)'
_VALID_URL = r'https?://www\.nhk\.or\.jp/radio/(?:player/ondemand|ondemand/detail)\.html\?p=(?P<site>[\da-zA-Z]+)_(?P<corner>[\da-zA-Z]+)(?:_(?P<headline>[\da-zA-Z]+))?' _VALID_URL = r'https?://www\.nhk\.or\.jp/radio/(?:player/ondemand|ondemand/detail)\.html\?p=(?P<site>[\da-zA-Z]+)_(?P<corner>[\da-zA-Z]+)(?:_(?P<headline>[\da-zA-Z]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=0449_01_4003239', 'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=LG96ZW5KZ4_01_4251382',
'skip': 'Episode expired on 2024-06-09', 'skip': 'Episode expires on 2025-07-14',
'info_dict': { 'info_dict': {
'title': 'ジャズ・トゥナイト ジャズ「Night and Day」特集', 'title': 'クラシックの庭\u3000特集「ドボルザークを聴く」(1)交響曲を中心に',
'id': '0449_01_4003239', 'id': 'LG96ZW5KZ4_01_4251382',
'ext': 'm4a', 'ext': 'm4a',
'uploader': 'NHK FM 東京', 'description': 'md5:652d3c38a25b77959c716421eba1617a',
'description': 'md5:ad05f3c3f3f6e99b2e69f9b5e49551dc', 'uploader': 'NHK FM・東京',
'series': 'ジャズ・トゥナイト', 'channel': 'NHK FM・東京',
'channel': 'NHK FM 東京', 'duration': 6597.0,
'thumbnail': 'https://www.nhk.or.jp/prog/img/449/g449.jpg', 'thumbnail': 'https://www.nhk.jp/static/assets/images/radioseries/rs/LG96ZW5KZ4/LG96ZW5KZ4-eyecatch_a67c6e949325016c0724f2ed3eec8a2f.jpg',
'upload_date': '20240601', 'categories': ['音楽', 'クラシック・オペラ'],
'series_id': '0449_01', 'cast': ['田添菜穂子'],
'release_date': '20240601', 'series': 'クラシックの庭',
'timestamp': 1717257600, 'series_id': 'LG96ZW5KZ4',
'release_timestamp': 1717250400, 'episode': '特集「ドボルザークを聴く」(1)交響曲を中心に',
'episode_id': 'QP1Q2ZXZY3',
'timestamp': 1751871000,
'upload_date': '20250707',
'release_timestamp': 1751864403,
'release_date': '20250707',
}, },
}, { }, {
# playlist, airs every weekday so it should _hopefully_ be okay forever # playlist, airs every weekday so it should _hopefully_ be okay forever
'url': 'https://www.nhk.or.jp/radio/ondemand/detail.html?p=0458_01', 'url': 'https://www.nhk.or.jp/radio/ondemand/detail.html?p=Z9L1V2M24L_01',
'info_dict': { 'info_dict': {
'id': '0458_01', 'id': 'Z9L1V2M24L_01',
'title': 'ベストオブクラシック', 'title': 'ベストオブクラシック',
'description': '世界中の上質な演奏会をじっくり堪能する本格派クラシック番組。', 'description': '世界中の上質な演奏会をじっくり堪能する本格派クラシック番組。',
'thumbnail': 'https://www.nhk.or.jp/prog/img/458/g458.jpg', 'thumbnail': 'https://www.nhk.jp/static/assets/images/radioseries/rs/Z9L1V2M24L/Z9L1V2M24L-eyecatch_83ed28b4782907998875965fee60a351.jpg',
'series_id': '0458_01', 'series_id': 'Z9L1V2M24L_01',
'uploader': 'NHK FM', 'uploader': 'NHK FM',
'channel': 'NHK FM', 'channel': 'NHK FM',
'series': 'ベストオブクラシック', 'series': 'ベストオブクラシック',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
}, {
# one with letters in the id
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=F683_01_3910688',
'note': 'Expires on 2025-03-31',
'info_dict': {
'id': 'F683_01_3910688',
'ext': 'm4a',
'title': '夏目漱石「文鳥」第1回',
'series': '【らじる文庫】夏目漱石「文鳥」全4回',
'series_id': 'F683_01',
'description': '朗読:浅井理アナウンサー',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F683/img/roudoku_05_rod_640.jpg',
'upload_date': '20240106',
'release_date': '20240106',
'uploader': 'NHK R1',
'release_timestamp': 1704511800,
'channel': 'NHK R1',
'timestamp': 1704512700,
},
'expected_warnings': ['Unable to download JSON metadata',
'Failed to get extended metadata. API returned Error 1: Invalid parameters'],
}, { }, {
# news # news
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=F261_01_4012173', 'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=18439M2W42_02_4251212',
'skip': 'Expires on 2025-07-15',
'info_dict': { 'info_dict': {
'id': 'F261_01_4012173', 'id': '18439M2W42_02_4251212',
'ext': 'm4a', 'ext': 'm4a',
'channel': 'NHKラジオ第1', 'title': 'マイあさ! 午前5時のNHKニュース 2025年7月8日',
'uploader': 'NHKラジオ第1', 'uploader': 'NHKラジオ第1',
'channel': 'NHKラジオ第1',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/18439M2W42/img/series_945_thumbnail.jpg',
'series': 'NHKラジオニュース', 'series': 'NHKラジオニュース',
'title': '午前時のNHKニュース', 'timestamp': 1751919420,
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F261/img/RADIONEWS_640.jpg', 'upload_date': '20250707',
'release_timestamp': 1718290800, 'release_timestamp': 1751918400,
'release_date': '20240613', 'release_date': '20250707',
'timestamp': 1718291400,
'upload_date': '20240613',
}, },
}, { }, {
# fallback when extended metadata fails # fallback when extended metadata fails
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=2834_01_4009298', 'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=J8792PY43V_20_4253945',
'skip': 'Expires on 2024-06-07', 'skip': 'Expires on 2025-09-01',
'info_dict': { 'info_dict': {
'id': '2834_01_4009298', 'id': 'J8792PY43V_20_4253945',
'title': 'まち☆キラ!開成町特集',
'ext': 'm4a', 'ext': 'm4a',
'release_date': '20240531', 'title': '「後絶たない筋肉増強剤の使用」ワールドリポート',
'upload_date': '20240531', 'description': '大濱 敦(ソウル支局)',
'series': 'はま☆キラ!', 'uploader': 'NHK R1',
'thumbnail': 'https://www.nhk.or.jp/prog/img/2834/g2834.jpg', 'channel': 'NHK R1',
'channel': 'NHK R1,FM', 'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/J8792PY43V/img/corner/box_31_thumbnail.jpg',
'description': '', 'series': 'マイあさ! ワールドリポート',
'timestamp': 1717123800, 'series_id': 'J8792PY43V_20',
'uploader': 'NHK R1,FM', 'timestamp': 1751837100,
'release_timestamp': 1717120800, 'upload_date': '20250706',
'series_id': '2834_01', 'release_timestamp': 1751835600,
'release_date': '20250706',
}, },
'expected_warnings': ['Failed to get extended metadata. API returned empty list.'], 'expected_warnings': ['Failed to download extended metadata: HTTP Error 404: Not Found'],
}] }]
_API_URL_TMPL = None _API_URL_TMPL = None
# The `_format_*` and `_make_*` functions are ported from: https://www.nhk.or.jp/radio/assets/js/timetable_detail_new.js
def _format_act_list(self, act_list):
role_groups = {}
for act in traverse_obj(act_list, (..., {dict})):
role = act.get('role')
if role not in role_groups:
role_groups[role] = []
role_groups[role].append(act)
formatted_roles = []
for role, acts in role_groups.items():
for i, act in enumerate(acts):
res = f'{role}' if i == 0 and role is not None else ''
if title := act.get('title'):
res += f'{title}'
formatted_roles.append(join_nonempty(res, act.get('name'), delim=''))
return join_nonempty(*formatted_roles, delim='')
def _make_artists(self, track, key):
artists = []
for artist in traverse_obj(track, (key, ..., {dict})):
if res := join_nonempty(*traverse_obj(artist, ((
('role', filter, {'{}'.format}),
('part', filter, {'{}'.format}),
('name', filter),
), {str})), delim=''):
artists.append(res)
return ''.join(artists) or None
def _make_duration(self, track, key):
d = traverse_obj(track, (key, {parse_duration}))
if d is None:
return None
hours, remainder = divmod(d, 3600)
minutes, seconds = divmod(remainder, 60)
res = ''
if hours > 0:
res += f'{int(hours)}時間'
if minutes > 0:
res += f'{int(minutes)}'
res += f'{int(seconds):02}秒)'
return res
def _format_music_list(self, music_list):
tracks = []
for track in traverse_obj(music_list, (..., {dict})):
track_details = traverse_obj(track, ((
('name', filter, {'{}'.format}),
('lyricist', filter, {'{}:作詞'.format}),
('composer', filter, {'{}:作曲'.format}),
('arranger', filter, {'{}:編曲'.format}),
), {str}))
track_details.append(self._make_artists(track, 'byArtist'))
track_details.append(self._make_duration(track, 'duration'))
if label := join_nonempty('label', 'code', delim=' ', from_dict=track):
track_details.append(f'{label}')
if location := traverse_obj(track, ('location', {str})):
track_details.append(f'{location}')
tracks.append(join_nonempty(*track_details, delim='\n'))
return '\n\n'.join(tracks)
def _format_description(self, response):
detailed_description = traverse_obj(response, ('detailedDescription', {dict})) or {}
return join_nonempty(
join_nonempty('epg80', 'epg200', delim='\n\n', from_dict=detailed_description),
traverse_obj(response, ('misc', 'actList', {self._format_act_list})),
traverse_obj(response, ('misc', 'musicList', {self._format_music_list})),
delim='\n\n')
def _get_thumbnails(self, data, keys, name=None, preference=-1):
thumbnails = []
for size, thumb in traverse_obj(data, (
*variadic(keys, (str, bytes, dict, set)), {dict.items},
lambda _, v: v[0] != 'copyright' and url_or_none(v[1]['url']),
)):
thumbnails.append({
'url': thumb['url'],
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
'preference': preference,
'id': join_nonempty(name, size),
})
preference -= 1
return thumbnails
def _extract_extended_metadata(self, episode_id, aa_vinfo): def _extract_extended_metadata(self, episode_id, aa_vinfo):
service, _, area = traverse_obj(aa_vinfo, (2, {str}, {lambda x: (x or '').partition(',')})) service, _, area = traverse_obj(aa_vinfo, (2, {str}, {lambda x: (x or '').partition(',')}))
date_id = aa_vinfo[3]
detail_url = try_call( detail_url = try_call(
lambda: self._API_URL_TMPL.format(area=area, service=service, dateid=aa_vinfo[3])) lambda: self._API_URL_TMPL.format(broadcastEventId=join_nonempty(service, area, date_id)))
if not detail_url: if not detail_url:
return {} return {}
@@ -699,36 +779,37 @@ class NhkRadiruIE(InfoExtractor):
if error := traverse_obj(response, ('error', {dict})): if error := traverse_obj(response, ('error', {dict})):
self.report_warning( self.report_warning(
'Failed to get extended metadata. API returned ' 'Failed to get extended metadata. API returned '
f'Error {join_nonempty("code", "message", from_dict=error, delim=": ")}') f'Error {join_nonempty("statuscode", "message", from_dict=error, delim=": ")}')
return {} return {}
full_meta = traverse_obj(response, ('list', service, 0, {dict})) station = traverse_obj(response, ('publishedOn', 'broadcastDisplayName', {str}))
if not full_meta:
self.report_warning('Failed to get extended metadata. API returned empty list.')
return {}
station = ' '.join(traverse_obj(full_meta, (('service', 'area'), 'name', {str}))) or None thumbnails = []
thumbnails = [{ thumbnails.extend(self._get_thumbnails(response, ('about', 'eyecatch')))
'id': str(id_), for num, dct in enumerate(traverse_obj(response, ('about', 'eyecatchList', ...))):
'preference': 1 if id_.startswith('thumbnail') else -2 if id_.startswith('logo') else -1, thumbnails.extend(self._get_thumbnails(dct, None, join_nonempty('list', num), -2))
**traverse_obj(thumb, { thumbnails.extend(
'url': 'url', self._get_thumbnails(response, ('about', 'partOfSeries', 'eyecatch'), 'series', -3))
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
} for id_, thumb in traverse_obj(full_meta, ('images', {dict.items}, lambda _, v: v[1]['url']))]
return filter_dict({ return filter_dict({
'description': self._format_description(response),
'cast': traverse_obj(response, ('misc', 'actList', ..., 'name', {str})),
'thumbnails': thumbnails,
**traverse_obj(response, {
'title': ('name', {str}),
'timestamp': ('endDate', {unified_timestamp}),
'release_timestamp': ('startDate', {unified_timestamp}),
'duration': ('duration', {parse_duration}),
}),
**traverse_obj(response, ('identifierGroup', {
'series': ('radioSeriesName', {str}),
'series_id': ('radioSeriesId', {str}),
'episode': ('radioEpisodeName', {str}),
'episode_id': ('radioEpisodeId', {str}),
'categories': ('genre', ..., ['name1', 'name2'], {str}, all, {orderedSet}),
})),
'channel': station, 'channel': station,
'uploader': station, 'uploader': station,
'description': join_nonempty(
'subtitle', 'content', 'act', 'music', delim='\n\n', from_dict=full_meta),
'thumbnails': thumbnails,
**traverse_obj(full_meta, {
'title': ('title', {str}),
'timestamp': ('end_time', {unified_timestamp}),
'release_timestamp': ('start_time', {unified_timestamp}),
}),
}) })
def _extract_episode_info(self, episode, programme_id, series_meta): def _extract_episode_info(self, episode, programme_id, series_meta):
@@ -782,7 +863,9 @@ class NhkRadiruIE(InfoExtractor):
site_id, corner_id, headline_id = self._match_valid_url(url).group('site', 'corner', 'headline') site_id, corner_id, headline_id = self._match_valid_url(url).group('site', 'corner', 'headline')
programme_id = f'{site_id}_{corner_id}' programme_id = f'{site_id}_{corner_id}'
if site_id == 'F261': # XXX: News programmes use old API (for now?) # XXX: News programmes use the old API
# Can't move this to NhkRadioNewsPageIE because news items still use the normal URL format
if site_id == '18439M2W42':
meta = self._download_json( meta = self._download_json(
'https://www.nhk.or.jp/s-media/news/news-site/list/v1/all.json', programme_id)['main'] 'https://www.nhk.or.jp/s-media/news/news-site/list/v1/all.json', programme_id)['main']
series_meta = traverse_obj(meta, { series_meta = traverse_obj(meta, {
@@ -843,8 +926,8 @@ class NhkRadioNewsPageIE(InfoExtractor):
'url': 'https://www.nhk.or.jp/radionews/', 'url': 'https://www.nhk.or.jp/radionews/',
'playlist_mincount': 5, 'playlist_mincount': 5,
'info_dict': { 'info_dict': {
'id': 'F261_01', 'id': '18439M2W42_01',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F261/img/RADIONEWS_640.jpg', 'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/18439M2W42/img/series_945_thumbnail.jpg',
'description': 'md5:bf2c5b397e44bc7eb26de98d8f15d79d', 'description': 'md5:bf2c5b397e44bc7eb26de98d8f15d79d',
'channel': 'NHKラジオ第1', 'channel': 'NHKラジオ第1',
'uploader': 'NHKラジオ第1', 'uploader': 'NHKラジオ第1',
@@ -853,7 +936,7 @@ class NhkRadioNewsPageIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
return self.url_result('https://www.nhk.or.jp/radio/ondemand/detail.html?p=F261_01', NhkRadiruIE) return self.url_result('https://www.nhk.or.jp/radio/ondemand/detail.html?p=18439M2W42_01', NhkRadiruIE)
class NhkRadiruLiveIE(InfoExtractor): class NhkRadiruLiveIE(InfoExtractor):
@@ -863,11 +946,12 @@ class NhkRadiruLiveIE(InfoExtractor):
# radio 1, no area specified # radio 1, no area specified
'url': 'https://www.nhk.or.jp/radio/player/?ch=r1', 'url': 'https://www.nhk.or.jp/radio/player/?ch=r1',
'info_dict': { 'info_dict': {
'id': 'r1-tokyo', 'id': 'bs-r1-130',
'title': 're:^NHKネットラジオ第1 東京.+$', 'title': 're:^NHKラジオ第1東京.+$',
'ext': 'm4a', 'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/r1-200x200.png', 'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r1/r1-logo.svg',
'live_status': 'is_live', 'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive r1-tokyo'],
}, },
}, { }, {
# radio 2, area specified # radio 2, area specified
@@ -875,26 +959,28 @@ class NhkRadiruLiveIE(InfoExtractor):
'url': 'https://www.nhk.or.jp/radio/player/?ch=r2', 'url': 'https://www.nhk.or.jp/radio/player/?ch=r2',
'params': {'extractor_args': {'nhkradirulive': {'area': ['fukuoka']}}}, 'params': {'extractor_args': {'nhkradirulive': {'area': ['fukuoka']}}},
'info_dict': { 'info_dict': {
'id': 'r2-fukuoka', 'id': 'bs-r2-400',
'title': 're:^NHKネットラジオ第2 福岡.+$', 'title': 're:^NHKラジオ第2.+$',
'ext': 'm4a', 'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/r2-200x200.png', 'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r2/r2-logo.svg',
'live_status': 'is_live', 'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive r2-fukuoka'],
}, },
}, { }, {
# fm, area specified # fm, area specified
'url': 'https://www.nhk.or.jp/radio/player/?ch=fm', 'url': 'https://www.nhk.or.jp/radio/player/?ch=fm',
'params': {'extractor_args': {'nhkradirulive': {'area': ['sapporo']}}}, 'params': {'extractor_args': {'nhkradirulive': {'area': ['sapporo']}}},
'info_dict': { 'info_dict': {
'id': 'fm-sapporo', 'id': 'bs-r3-010',
'title': 're:^NHKネットラジオFM 札幌.+$', 'title': 're:^NHK FM・札幌.+$',
'ext': 'm4a', 'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/fm-200x200.png', 'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r3/r3-logo.svg',
'live_status': 'is_live', 'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive fm-sapporo'],
}, },
}] }]
_NOA_STATION_IDS = {'r1': 'n1', 'r2': 'n2', 'fm': 'n3'} _NOA_STATION_IDS = {'r1': 'r1', 'r2': 'r2', 'fm': 'r3'}
def _real_extract(self, url): def _real_extract(self, url):
station = self._match_id(url) station = self._match_id(url)
@@ -911,12 +997,15 @@ class NhkRadiruLiveIE(InfoExtractor):
noa_info = self._download_json( noa_info = self._download_json(
f'https:{config.find(".//url_program_noa").text}'.format(area=data.find('areakey').text), f'https:{config.find(".//url_program_noa").text}'.format(area=data.find('areakey').text),
station, note=f'Downloading {area} station metadata', fatal=False) station, note=f'Downloading {area} station metadata', fatal=False)
present_info = traverse_obj(noa_info, ('nowonair_list', self._NOA_STATION_IDS.get(station), 'present')) broadcast_service = traverse_obj(noa_info, (self._NOA_STATION_IDS.get(station), 'publishedOn'))
return { return {
'title': ' '.join(traverse_obj(present_info, (('service', 'area'), 'name', {str}))), **traverse_obj(broadcast_service, {
'id': join_nonempty(station, area), 'title': ('broadcastDisplayName', {str}),
'thumbnails': traverse_obj(present_info, ('service', 'images', ..., { 'id': ('id', {str}),
}),
'_old_archive_ids': [make_archive_id(self, join_nonempty(station, area))],
'thumbnails': traverse_obj(broadcast_service, ('logo', ..., {
'url': 'url', 'url': 'url',
'width': ('width', {int_or_none}), 'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}), 'height': ('height', {int_or_none}),

View File

@@ -4,16 +4,15 @@ import itertools
import json import json
import re import re
import time import time
import urllib.parse
from .common import InfoExtractor, SearchInfoExtractor from .common import InfoExtractor, SearchInfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList, OnDemandPagedList,
clean_html, clean_html,
determine_ext, determine_ext,
extract_attributes,
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_bitrate, parse_bitrate,
@@ -22,9 +21,8 @@ from ..utils import (
parse_qs, parse_qs,
parse_resolution, parse_resolution,
qualities, qualities,
remove_start,
str_or_none, str_or_none,
unescapeHTML, truncate_string,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_basename, url_basename,
@@ -32,7 +30,11 @@ from ..utils import (
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import find_element, require, traverse_obj from ..utils.traversal import (
find_element,
require,
traverse_obj,
)
class NiconicoBaseIE(InfoExtractor): class NiconicoBaseIE(InfoExtractor):
@@ -806,41 +808,39 @@ class NiconicoLiveIE(NiconicoBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(f'https://live.nicovideo.jp/watch/{video_id}', video_id) webpage = self._download_webpage(url, video_id, expected_status=404)
if err_msg := traverse_obj(webpage, ({find_element(cls='message')}, {clean_html})):
raise ExtractorError(err_msg, expected=True)
embedded_data = self._parse_json(unescapeHTML(self._search_regex( embedded_data = traverse_obj(webpage, (
r'<script\s+id="embedded-data"\s*data-props="(.+?)"', webpage, 'embedded data')), video_id) {find_element(tag='script', id='embedded-data', html=True)},
{extract_attributes}, 'data-props', {json.loads}))
ws_url = traverse_obj(embedded_data, ('site', 'relive', 'webSocketUrl')) frontend_id = traverse_obj(embedded_data, ('site', 'frontendId', {str_or_none}), default='9')
if not ws_url:
raise ExtractorError('The live hasn\'t started yet or already ended.', expected=True)
ws_url = update_url_query(ws_url, {
'frontend_id': traverse_obj(embedded_data, ('site', 'frontendId')) or '9',
})
hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.')
ws_url = traverse_obj(embedded_data, (
'site', 'relive', 'webSocketUrl', {url_or_none}, {require('websocket URL')}))
ws_url = update_url_query(ws_url, {'frontend_id': frontend_id})
ws = self._request_webpage( ws = self._request_webpage(
Request(ws_url, headers={'Origin': f'https://{hostname}'}), ws_url, video_id, 'Connecting to WebSocket server',
video_id=video_id, note='Connecting to WebSocket server') headers={'Origin': 'https://live.nicovideo.jp'})
self.write_debug('Sending HLS server request') self.write_debug('Sending HLS server request')
ws.send(json.dumps({ ws.send(json.dumps({
'type': 'startWatching',
'data': { 'data': {
'reconnect': False,
'room': {
'commentable': True,
'protocol': 'webSocket',
},
'stream': { 'stream': {
'quality': 'abr',
'protocol': 'hls',
'latency': 'high',
'accessRightMethod': 'single_cookie', 'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
'latency': 'high',
'protocol': 'hls',
'quality': 'abr',
}, },
'room': {
'protocol': 'webSocket',
'commentable': True,
},
'reconnect': False,
}, },
'type': 'startWatching',
})) }))
while True: while True:
@@ -860,17 +860,15 @@ class NiconicoLiveIE(NiconicoBaseIE):
raise ExtractorError('Disconnected at middle of extraction') raise ExtractorError('Disconnected at middle of extraction')
elif data.get('type') == 'error': elif data.get('type') == 'error':
self.write_debug(recv) self.write_debug(recv)
message = traverse_obj(data, ('body', 'code')) or recv message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
raise ExtractorError(message) raise ExtractorError(message)
elif self.get_param('verbose', False): elif self.get_param('verbose', False):
if len(recv) > 100: self.write_debug(f'Server response: {truncate_string(recv, 100)}')
recv = recv[:100] + '...'
self.write_debug(f'Server said: {recv}')
title = traverse_obj(embedded_data, ('program', 'title')) or self._html_search_meta( title = traverse_obj(embedded_data, ('program', 'title')) or self._html_search_meta(
('og:title', 'twitter:title'), webpage, 'live title', fatal=False) ('og:title', 'twitter:title'), webpage, 'live title', fatal=False)
raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail')) or {} raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail', {dict})) or {}
thumbnails = [] thumbnails = []
for name, value in raw_thumbs.items(): for name, value in raw_thumbs.items():
if not isinstance(value, dict): if not isinstance(value, dict):
@@ -897,31 +895,30 @@ class NiconicoLiveIE(NiconicoBaseIE):
cookie['domain'], cookie['name'], cookie['value'], cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure']) expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure'])
fmt_common = {
'live_latency': 'high',
'origin': hostname,
'protocol': 'niconico_live',
'video_id': video_id,
'ws': ws,
}
q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr' q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr'
a_map = {96: 'audio_low', 192: 'audio_high'} a_map = {96: 'audio_low', 192: 'audio_high'}
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True) formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt in formats: for fmt in formats:
fmt['protocol'] = 'niconico_live'
if fmt.get('acodec') == 'none': if fmt.get('acodec') == 'none':
fmt['format_id'] = next(q_iter, fmt['format_id']) fmt['format_id'] = next(q_iter, fmt['format_id'])
elif fmt.get('vcodec') == 'none': elif fmt.get('vcodec') == 'none':
abr = parse_bitrate(fmt['url'].lower()) abr = parse_bitrate(fmt['url'].lower())
fmt.update({ fmt.update({
'abr': abr, 'abr': abr,
'acodec': 'mp4a.40.2',
'format_id': a_map.get(abr, fmt['format_id']), 'format_id': a_map.get(abr, fmt['format_id']),
}) })
fmt.update(fmt_common)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'downloader_options': {
'max_quality': traverse_obj(embedded_data, ('program', 'stream', 'maxQuality', {str})) or 'normal',
'ws': ws,
'ws_url': ws_url,
},
**traverse_obj(embedded_data, { **traverse_obj(embedded_data, {
'view_count': ('program', 'statistics', 'watchCount'), 'view_count': ('program', 'statistics', 'watchCount'),
'comment_count': ('program', 'statistics', 'commentCount'), 'comment_count': ('program', 'statistics', 'commentCount'),

View File

@@ -1,6 +1,5 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError,
determine_ext, determine_ext,
int_or_none, int_or_none,
traverse_obj, traverse_obj,
@@ -61,10 +60,10 @@ class NineGagIE(InfoExtractor):
post = self._download_json( post = self._download_json(
'https://9gag.com/v1/post', post_id, query={ 'https://9gag.com/v1/post', post_id, query={
'id': post_id, 'id': post_id,
})['data']['post'] }, impersonate=True)['data']['post']
if post.get('type') != 'Animated': if post.get('type') != 'Animated':
raise ExtractorError( self.raise_no_formats(
'The given url does not contain a video', 'The given url does not contain a video',
expected=True) expected=True)

View File

@@ -1,6 +1,3 @@
import json
import re
from .brightcove import BrightcoveNewIE from .brightcove import BrightcoveNewIE
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
@@ -11,7 +8,12 @@ from ..utils import (
str_or_none, str_or_none,
url_or_none, url_or_none,
) )
from ..utils.traversal import require, traverse_obj, value from ..utils.traversal import (
get_first,
require,
traverse_obj,
value,
)
class NineNowIE(InfoExtractor): class NineNowIE(InfoExtractor):
@@ -101,20 +103,11 @@ class NineNowIE(InfoExtractor):
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}'
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv and yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url): def _real_extract(self, url):
display_id, video_type = self._match_valid_url(url).group('id', 'type') display_id, video_type = self._match_valid_url(url).group('id', 'type')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
common_data = traverse_obj( common_data = get_first(self._search_nextjs_v13_data(webpage, display_id), ('payload', {dict}))
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json},
lambda _, v: v['payload'][video_type]['slug'] == display_id,
'payload', any, {require('video data')}))
if traverse_obj(common_data, (video_type, 'video', 'drm', {bool})): if traverse_obj(common_data, (video_type, 'video', 'drm', {bool})):
self.report_drm(display_id) self.report_drm(display_id)

View File

@@ -1,100 +0,0 @@
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
smuggle_url,
try_get,
)
class NoovoIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?noovo\.ca/videos/(?P<id>[^/]+/[^/?#&]+)'
_TESTS = [{
# clip
'url': 'http://noovo.ca/videos/rpm-plus/chrysler-imperial',
'info_dict': {
'id': '5386045029001',
'ext': 'mp4',
'title': 'Chrysler Imperial',
'description': 'md5:de3c898d1eb810f3e6243e08c8b4a056',
'timestamp': 1491399228,
'upload_date': '20170405',
'uploader_id': '618566855001',
'series': 'RPM+',
},
'params': {
'skip_download': True,
},
}, {
# episode
'url': 'http://noovo.ca/videos/l-amour-est-dans-le-pre/episode-13-8',
'info_dict': {
'id': '5395865725001',
'title': 'Épisode 13 : Les retrouvailles',
'description': 'md5:888c3330f0c1b4476c5bc99a1c040473',
'ext': 'mp4',
'timestamp': 1492019320,
'upload_date': '20170412',
'uploader_id': '618566855001',
'series': "L'amour est dans le pré",
'season_number': 5,
'episode': 'Épisode 13',
'episode_number': 13,
},
'params': {
'skip_download': True,
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/618566855001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
brightcove_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'brightcove id')
data = self._parse_json(
self._search_regex(
r'(?s)dataLayer\.push\(\s*({.+?})\s*\);', webpage, 'data',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
title = try_get(
data, lambda x: x['video']['nom'],
str) or self._html_search_meta(
'dcterms.Title', webpage, 'title', fatal=True)
description = self._html_search_meta(
('dcterms.Description', 'description'), webpage, 'description')
series = try_get(
data, lambda x: x['emission']['nom']) or self._search_regex(
r'<div[^>]+class="banner-card__subtitle h4"[^>]*>([^<]+)',
webpage, 'series', default=None)
season_el = try_get(data, lambda x: x['emission']['saison'], dict) or {}
season = try_get(season_el, lambda x: x['nom'], str)
season_number = int_or_none(try_get(season_el, lambda x: x['numero']))
episode_el = try_get(season_el, lambda x: x['episode'], dict) or {}
episode = try_get(episode_el, lambda x: x['nom'], str)
episode_number = int_or_none(try_get(episode_el, lambda x: x['numero']))
return {
'_type': 'url_transparent',
'ie_key': BrightcoveNewIE.ie_key(),
'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': ['CA']}),
'id': brightcove_id,
'title': title,
'description': description,
'series': series,
'season': season,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
}

View File

@@ -19,7 +19,7 @@ from ..utils import (
url_or_none, url_or_none,
urljoin, urljoin,
) )
from ..utils.traversal import traverse_obj, value from ..utils.traversal import require, traverse_obj, value
class PatreonBaseIE(InfoExtractor): class PatreonBaseIE(InfoExtractor):
@@ -462,7 +462,7 @@ class PatreonCampaignIE(PatreonBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?patreon\.com/(?: https?://(?:www\.)?patreon\.com/(?:
(?:m|api/campaigns)/(?P<campaign_id>\d+)| (?:m|api/campaigns)/(?P<campaign_id>\d+)|
(?:c/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+) (?:cw?/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
)(?:/posts)?/?(?:$|[?#])''' )(?:/posts)?/?(?:$|[?#])'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.patreon.com/dissonancepod/', 'url': 'https://www.patreon.com/dissonancepod/',
@@ -531,6 +531,28 @@ class PatreonCampaignIE(PatreonBaseIE):
'age_limit': 0, 'age_limit': 0,
}, },
'playlist_mincount': 331, 'playlist_mincount': 331,
'skip': 'Channel removed',
}, {
# next.js v13 data, see https://github.com/yt-dlp/yt-dlp/issues/13622
'url': 'https://www.patreon.com/c/anythingelse/posts',
'info_dict': {
'id': '9631148',
'title': 'Anything Else?',
'description': 'md5:2ee1db4aed2f9460c2b295825a24aa08',
'uploader': 'dan ',
'uploader_id': '13852412',
'uploader_url': 'https://www.patreon.com/anythingelse',
'channel': 'Anything Else?',
'channel_id': '9631148',
'channel_url': 'https://www.patreon.com/anythingelse',
'channel_follower_count': int,
'age_limit': 0,
'thumbnail': r're:https?://.+/.+',
},
'playlist_mincount': 151,
}, {
'url': 'https://www.patreon.com/cw/anythingelse',
'only_matching': True,
}, { }, {
'url': 'https://www.patreon.com/c/OgSog/posts', 'url': 'https://www.patreon.com/c/OgSog/posts',
'only_matching': True, 'only_matching': True,
@@ -572,8 +594,11 @@ class PatreonCampaignIE(PatreonBaseIE):
campaign_id, vanity = self._match_valid_url(url).group('campaign_id', 'vanity') campaign_id, vanity = self._match_valid_url(url).group('campaign_id', 'vanity')
if campaign_id is None: if campaign_id is None:
webpage = self._download_webpage(url, vanity, headers={'User-Agent': self.patreon_user_agent}) webpage = self._download_webpage(url, vanity, headers={'User-Agent': self.patreon_user_agent})
campaign_id = self._search_nextjs_data( campaign_id = traverse_obj(self._search_nextjs_data(webpage, vanity, default=None), (
webpage, vanity)['props']['pageProps']['bootstrapEnvelope']['pageBootstrap']['campaign']['data']['id'] 'props', 'pageProps', 'bootstrapEnvelope', 'pageBootstrap', 'campaign', 'data', 'id', {str}))
if not campaign_id:
campaign_id = traverse_obj(self._search_nextjs_v13_data(webpage, vanity), (
lambda _, v: v['type'] == 'campaign', 'id', {str}, any, {require('campaign ID')}))
params = { params = {
'json-api-use-default-includes': 'false', 'json-api-use-default-includes': 'false',

View File

@@ -0,0 +1,70 @@
from .common import InfoExtractor
from ..utils import clean_html, clean_podcast_url, int_or_none, str_or_none, url_or_none
from ..utils.traversal import traverse_obj
class PlayerFmIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?player\.fm/(?:series/)?[\w-]+/(?P<id>[\w-]+))'
_TESTS = [{
'url': 'https://player.fm/series/chapo-trap-house/movie-mindset-33-casino-feat-felix',
'info_dict': {
'ext': 'mp3',
'id': '478606546',
'display_id': 'movie-mindset-33-casino-feat-felix',
'thumbnail': r're:^https://.*\.(jpg|png)',
'title': 'Movie Mindset 33 - Casino feat. Felix',
'creators': ['Chapo Trap House'],
'description': r're:The first episode of this season of Movie Mindset is free .+ we feel about it\.',
'duration': 6830,
'timestamp': 1745406000,
'upload_date': '20250423',
},
}, {
'url': 'https://player.fm/series/nbc-nightly-news-with-tom-llamas/thursday-april-17-2025',
'info_dict': {
'ext': 'mp3',
'id': '477635490',
'display_id': 'thursday-april-17-2025',
'title': 'Thursday, April 17, 2025',
'thumbnail': r're:^https://.*\.(jpg|png)',
'duration': 1143,
'description': 'md5:4890b8cf9a55a787561cd5d59dfcda82',
'creators': ['NBC News'],
'timestamp': 1744941374,
'upload_date': '20250418',
},
}, {
'url': 'https://player.fm/series/soccer-101/ep-109-its-kicking-off-how-have-the-rules-for-kickoff-changed-what-are-the-best-approaches-to-getting-the-game-underway-and-how-could-we-improve-on-the-present-system-ack3NzL3yibvs4pf',
'info_dict': {
'ext': 'mp3',
'id': '481418710',
'thumbnail': r're:^https://.*\.(jpg|png)',
'title': r're:#109 It\'s kicking off! How have the rules for kickoff changed, .+ the present system\?',
'creators': ['TSS'],
'duration': 1510,
'display_id': 'md5:b52ecacaefab891b59db69721bfd9b13',
'description': 'md5:52a39e36d08d8919527454f152ad3c25',
'timestamp': 1659102055,
'upload_date': '20220729',
},
}]
def _real_extract(self, url):
display_id, url = self._match_valid_url(url).group('id', 'url')
data = self._download_json(f'{url}.json', display_id)
return {
'display_id': display_id,
'vcodec': 'none',
**traverse_obj(data, {
'id': ('id', {int}, {str_or_none}),
'url': ('url', {clean_podcast_url}),
'title': ('title', {str}),
'description': ('description', {clean_html}),
'duration': ('duration', {int_or_none}),
'thumbnail': (('image', ('series', 'image')), 'url', {url_or_none}, any),
'filesize': ('size', {int_or_none}),
'timestamp': ('publishedAt', {int_or_none}),
'creators': ('series', 'author', {str}, filter, all, filter),
}),
}

View File

@@ -81,7 +81,7 @@ class RaiBaseIE(InfoExtractor):
# geo flag is a bit unreliable and not properly set all the time # geo flag is a bit unreliable and not properly set all the time
geoprotection = xpath_text(relinker, './geoprotection', default='N') == 'Y' geoprotection = xpath_text(relinker, './geoprotection', default='N') == 'Y'
ext = determine_ext(media_url) ext = determine_ext(media_url).lower()
formats = [] formats = []
if ext == 'mp3': if ext == 'mp3':
@@ -108,7 +108,7 @@ class RaiBaseIE(InfoExtractor):
'format_id': join_nonempty('https', bitrate, delim='-'), 'format_id': join_nonempty('https', bitrate, delim='-'),
}) })
else: else:
raise ExtractorError('Unrecognized media file found') raise ExtractorError(f'Unrecognized media extension "{ext}"')
if (not formats and geoprotection is True) or '/video_no_available.mp4' in media_url: if (not formats and geoprotection is True) or '/video_no_available.mp4' in media_url:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES, metadata_available=True) self.raise_geo_restricted(countries=self._GEO_COUNTRIES, metadata_available=True)
@@ -503,6 +503,28 @@ class RaiPlaySoundIE(RaiBaseIE):
'upload_date': '20211201', 'upload_date': '20211201',
}, },
'params': {'skip_download': True}, 'params': {'skip_download': True},
}, {
# case-sensitivity test for uppercase extension
'url': 'https://www.raiplaysound.it/audio/2020/05/Storia--Lunita-dItalia-e-lunificazione-della-Germania-b4c16390-7f3f-4282-b353-d94897dacb7c.html',
'md5': 'c69ebd69282f0effd7ef67b7e2f6c7d8',
'info_dict': {
'id': 'b4c16390-7f3f-4282-b353-d94897dacb7c',
'ext': 'mp3',
'title': "Storia | 01 L'unità d'Italia e l'unificazione della Germania",
'alt_title': 'md5:ed4ed82585c52057b71b43994a59b705',
'description': 'md5:92818b6f31b2c150567d56b75db2ea7f',
'uploader': 'rai radio 3',
'duration': 2439.0,
'thumbnail': 'https://www.raiplaysound.it/dl/img/2023/09/07/1694084898279_Maturadio-LOGO-2048x1152.jpg',
'creators': ['rai radio 3'],
'series': 'Maturadio',
'season': 'Season 9',
'season_number': 9,
'episode': "01. L'unità d'Italia e l'unificazione della Germania",
'episode_number': 1,
'timestamp': 1590400740,
'upload_date': '20200525',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -765,7 +787,7 @@ class RaiCulturaIE(RaiNewsIE): # XXX: Do not subclass from concrete IE
class RaiSudtirolIE(RaiBaseIE): class RaiSudtirolIE(RaiBaseIE):
_VALID_URL = r'https?://raisudtirol\.rai\.it/.+media=(?P<id>\w+)' _VALID_URL = r'https?://rai(?:bz|sudtirol)\.rai\.it/.+media=(?P<id>\w+)'
_TESTS = [{ _TESTS = [{
# mp4 file # mp4 file
'url': 'https://raisudtirol.rai.it/la/index.php?media=Ptv1619729460', 'url': 'https://raisudtirol.rai.it/la/index.php?media=Ptv1619729460',
@@ -791,6 +813,9 @@ class RaiSudtirolIE(RaiBaseIE):
'formats': 'count:6', 'formats': 'count:6',
}, },
'params': {'skip_download': True}, 'params': {'skip_download': True},
}, {
'url': 'https://raibz.rai.it/de/index.php?media=Ptv1751660400',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -0,0 +1,41 @@
from .floatplane import FloatplaneBaseIE
class SaucePlusIE(FloatplaneBaseIE):
IE_DESC = 'Sauce+'
_VALID_URL = r'https?://(?:(?:www|beta)\.)?sauceplus\.com/post/(?P<id>\w+)'
_BASE_URL = 'https://www.sauceplus.com'
_HEADERS = {
'Origin': _BASE_URL,
'Referer': f'{_BASE_URL}/',
}
_IMPERSONATE_TARGET = True
_TESTS = [{
'url': 'https://www.sauceplus.com/post/YbBwIa2A5g',
'info_dict': {
'id': 'eit4Ugu5TL',
'ext': 'mp4',
'display_id': 'YbBwIa2A5g',
'title': 'Scare the Coyote - Episode 3',
'description': '',
'thumbnail': r're:^https?://.*\.jpe?g$',
'duration': 2975,
'comment_count': int,
'like_count': int,
'dislike_count': int,
'release_date': '20250627',
'release_timestamp': 1750993500,
'uploader': 'Scare The Coyote',
'uploader_id': '683e0a3269688656a5a49a44',
'uploader_url': 'https://www.sauceplus.com/channel/ScareTheCoyote/home',
'channel': 'Scare The Coyote',
'channel_id': '683e0a326968866ceba49a45',
'channel_url': 'https://www.sauceplus.com/channel/ScareTheCoyote/home/main',
'availability': 'subscriber_only',
},
'params': {'skip_download': 'm3u8'},
}]
def _real_initialize(self):
if not self._get_cookies(self._BASE_URL).get('__Host-sp-sess'):
self.raise_login_required()

View File

@@ -1,140 +1,118 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError, determine_ext, parse_qs, traverse_obj from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
str_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class SkebIE(InfoExtractor): class SkebIE(InfoExtractor):
_VALID_URL = r'https?://skeb\.jp/@[^/]+/works/(?P<id>\d+)' _VALID_URL = r'https?://skeb\.jp/@(?P<uploader_id>[^/?#]+)/works/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://skeb.jp/@riiru_wm/works/10', 'url': 'https://skeb.jp/@riiru_wm/works/10',
'info_dict': { 'info_dict': {
'id': '466853', 'id': '466853',
'title': '内容はおまかせします! by 姫ノ森りぃる@一周年',
'description': 'md5:1ec50901efc3437cfbfe3790468d532d',
'uploader': '姫ノ森りぃる@一周年',
'uploader_id': 'riiru_wm',
'age_limit': 0,
'tags': [],
'url': r're:https://skeb.+',
'thumbnail': r're:https://skeb.+',
'subtitles': {
'jpn': [{
'url': r're:https://skeb.+',
'ext': 'vtt',
}],
},
'width': 720,
'height': 405,
'duration': 313,
'fps': 30,
'ext': 'mp4', 'ext': 'mp4',
'title': '10-1',
'description': 'md5:1ec50901efc3437cfbfe3790468d532d',
'duration': 313,
'genres': ['video'],
'thumbnail': r're:https?://.+',
'uploader': '姫ノ森りぃる@ひとづま',
'uploader_id': 'riiru_wm',
}, },
}, { }, {
'url': 'https://skeb.jp/@furukawa_nob/works/3', 'url': 'https://skeb.jp/@furukawa_nob/works/3',
'info_dict': { 'info_dict': {
'id': '489408', 'id': '489408',
'title': 'いつもお世話になってお... by 古川ノブ@音楽とVlo...',
'description': 'md5:5adc2e41d06d33b558bf7b1faeb7b9c2',
'uploader': '古川ノブ@音楽とVlogのVtuber',
'uploader_id': 'furukawa_nob',
'age_limit': 0,
'tags': [
'よろしく', '大丈夫', 'お願い', 'でした',
'是非', 'O', 'バー', '遊び', 'おはよう',
'オーバ', 'ボイス',
],
'url': r're:https://skeb.+',
'thumbnail': r're:https://skeb.+',
'subtitles': {
'jpn': [{
'url': r're:https://skeb.+',
'ext': 'vtt',
}],
},
'duration': 98,
'ext': 'mp3', 'ext': 'mp3',
'vcodec': 'none', 'title': '3-1',
'abr': 128, 'description': 'md5:6de1f8f876426a6ac321c123848176a8',
'duration': 98,
'genres': ['voice'],
'tags': 'count:11',
'thumbnail': r're:https?://.+',
'uploader': '古川ノブ@宮城の動画勢Vtuber',
'uploader_id': 'furukawa_nob',
}, },
}, { }, {
'url': 'https://skeb.jp/@mollowmollow/works/6', 'url': 'https://skeb.jp/@Rizu_panda_cube/works/626',
'info_dict': { 'info_dict': {
'id': '6', 'id': '626',
'title': 'ヒロ。\n\n私のキャラク... by 諸々', 'description': 'md5:834557b39ca56960c5f77dd6ddabe775',
'description': 'md5:aa6cbf2ba320b50bce219632de195f07', 'uploader': 'りづ100億%',
'_type': 'playlist', 'uploader_id': 'Rizu_panda_cube',
'entries': [{ 'tags': 'count:57',
'id': '486430', 'genres': ['video'],
'title': 'ヒロ。\n\n私のキャラク... by 諸々',
'description': 'md5:aa6cbf2ba320b50bce219632de195f07',
}, {
'id': '486431',
'title': 'ヒロ。\n\n私のキャラク... by 諸々',
}],
}, },
'playlist_count': 2,
'expected_warnings': ['Skipping unsupported extension'],
}] }]
def _real_extract(self, url): def _call_api(self, uploader_id, work_id):
video_id = self._match_id(url) return self._download_json(
nuxt_data = self._search_nuxt_data(self._download_webpage(url, video_id), video_id) f'https://skeb.jp/api/users/{uploader_id}/works/{work_id}', work_id, headers={
'Accept': 'application/json',
'Authorization': 'Bearer null',
})
parent = { def _real_extract(self, url):
'id': video_id, uploader_id, work_id = self._match_valid_url(url).group('uploader_id', 'id')
'title': nuxt_data.get('title'), try:
'description': nuxt_data.get('description'), works = self._call_api(uploader_id, work_id)
'uploader': traverse_obj(nuxt_data, ('creator', 'name')), except ExtractorError as e:
'uploader_id': traverse_obj(nuxt_data, ('creator', 'screen_name')), if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
'age_limit': 18 if nuxt_data.get('nsfw') else 0, raise
'tags': nuxt_data.get('tag_list'), webpage = e.cause.response.read().decode()
value = self._search_regex(
r'document\.cookie\s*=\s*["\']request_key=([^;"\']+)', webpage, 'request key')
self._set_cookie('skeb.jp', 'request_key', value)
works = self._call_api(uploader_id, work_id)
info = {
'uploader_id': uploader_id,
**traverse_obj(works, {
'age_limit': ('nsfw', {bool}, {lambda x: 18 if x else None}),
'description': (('source_body', 'body'), {clean_html}, filter, any),
'genres': ('genre', {str}, filter, all, filter),
'tags': ('tag_list', ..., {str}, filter, all, filter),
'uploader': ('creator', 'name', {str}),
}),
} }
entries = [] entries = []
for item in nuxt_data.get('previews') or []: for idx, preview in enumerate(traverse_obj(works, ('previews', lambda _, v: url_or_none(v['url']))), 1):
vid_url = item.get('url') ext = traverse_obj(preview, ('information', 'extension', {str}))
given_ext = traverse_obj(item, ('information', 'extension')) if ext not in ('mp3', 'mp4'):
preview_ext = determine_ext(vid_url, default_ext=None) self.report_warning(f'Skipping unsupported extension "{ext}"')
if not preview_ext:
content_disposition = parse_qs(vid_url)['response-content-disposition'][0]
preview_ext = self._search_regex(
r'filename="[^"]+\.([^\.]+?)"', content_disposition,
'preview file extension', fatal=False, group=1)
if preview_ext not in ('mp4', 'mp3'):
continue continue
if not vid_url or not item.get('id'):
continue
width, height = traverse_obj(item, ('information', 'width')), traverse_obj(item, ('information', 'height'))
if width is not None and height is not None:
# the longest side is at most 720px for non-client viewers
max_size = max(width, height)
width, height = (x * 720 // max_size for x in (width, height))
entries.append({ entries.append({
**parent, 'ext': ext,
'id': str(item['id']), 'title': f'{work_id}-{idx}',
'url': vid_url,
'thumbnail': item.get('poster_url'),
'subtitles': { 'subtitles': {
'jpn': [{ 'ja': [{
'url': item.get('vtt_url'),
'ext': 'vtt', 'ext': 'vtt',
'url': preview['vtt_url'],
}], }],
} if item.get('vtt_url') else None, } if url_or_none(preview.get('vtt_url')) else None,
'width': width, 'vcodec': 'none' if ext == 'mp3' else None,
'height': height, **info,
'duration': traverse_obj(item, ('information', 'duration')), **traverse_obj(preview, {
'fps': traverse_obj(item, ('information', 'frame_rate')), 'id': ('id', {str_or_none}),
'ext': preview_ext or given_ext, 'thumbnail': ('poster_url', {url_or_none}),
'vcodec': 'none' if preview_ext == 'mp3' else None, 'url': ('url', {url_or_none}),
# you'll always get 128kbps MP3 for non-client viewers }),
'abr': 128 if preview_ext == 'mp3' else None, **traverse_obj(preview, ('information', {
'duration': ('duration', {int_or_none}),
'fps': ('frame_rate', {int_or_none}),
'height': ('height', {int_or_none}),
'width': ('width', {int_or_none}),
})),
}) })
if not entries: return self.playlist_result(entries, work_id, **info)
raise ExtractorError('No video/audio attachment found in this commission.', expected=True)
elif len(entries) == 1:
return entries[0]
else:
parent.update({
'_type': 'playlist',
'entries': entries,
})
return parent

View File

@@ -242,7 +242,7 @@ class SoundcloudBaseIE(InfoExtractor):
format_urls.add(format_url) format_urls.add(format_url)
formats.append({ formats.append({
'format_id': 'download', 'format_id': 'download',
'ext': urlhandle_detect_ext(urlh, default='mp3'), 'ext': urlhandle_detect_ext(urlh),
'filesize': int_or_none(urlh.headers.get('Content-Length')), 'filesize': int_or_none(urlh.headers.get('Content-Length')),
'url': format_url, 'url': format_url,
'quality': 10, 'quality': 10,

View File

@@ -98,13 +98,10 @@ class SproutVideoIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, headers={ webpage = self._download_webpage(
**traverse_obj(smuggled_data, {'Referer': 'referer'}), url, video_id, headers=traverse_obj(smuggled_data, {'Referer': 'referer'}), impersonate=True)
# yt-dlp's default Chrome user-agents are too old
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:140.0) Gecko/20100101 Firefox/140.0',
})
data = self._search_json( data = self._search_json(
r'var\s+(?:dat|playerInfo)\s*=\s*["\']', webpage, 'player info', video_id, r'(?:var|const|let)\s+(?:dat|playerInfo)\s*=\s*["\']', webpage, 'player info', video_id,
contains_pattern=r'[A-Za-z0-9+/=]+', end_pattern=r'["\'];', contains_pattern=r'[A-Za-z0-9+/=]+', end_pattern=r'["\'];',
transform_source=lambda x: base64.b64decode(x).decode()) transform_source=lambda x: base64.b64decode(x).decode())

View File

@@ -7,11 +7,11 @@ from ..utils import int_or_none, traverse_obj, url_or_none, urljoin
class TenPlayIE(InfoExtractor): class TenPlayIE(InfoExtractor):
IE_NAME = '10play' IE_NAME = '10play'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/?#]+/)+(?P<id>tpv\d{6}[a-z]{5})' _VALID_URL = r'https?://(?:www\.)?10(?:play)?\.com\.au/(?:[^/?#]+/)+(?P<id>tpv\d{6}[a-z]{5})'
_NETRC_MACHINE = '10play' _NETRC_MACHINE = '10play'
_TESTS = [{ _TESTS = [{
# Geo-restricted to Australia # Geo-restricted to Australia
'url': 'https://10play.com.au/australian-survivor/web-extras/season-10-brains-v-brawn-ii/myless-journey/tpv250414jdmtf', 'url': 'https://10.com.au/australian-survivor/web-extras/season-10-brains-v-brawn-ii/myless-journey/tpv250414jdmtf',
'info_dict': { 'info_dict': {
'id': '7440980000013868', 'id': '7440980000013868',
'ext': 'mp4', 'ext': 'mp4',
@@ -32,7 +32,7 @@ class TenPlayIE(InfoExtractor):
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
# Geo-restricted to Australia # Geo-restricted to Australia
'url': 'https://10play.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp', 'url': 'https://10.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp',
'info_dict': { 'info_dict': {
'id': '9000000000091177', 'id': '9000000000091177',
'ext': 'mp4', 'ext': 'mp4',
@@ -55,7 +55,7 @@ class TenPlayIE(InfoExtractor):
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
# Geo-restricted to Australia; upgrading the m3u8 quality fails and we need the fallback # Geo-restricted to Australia; upgrading the m3u8 quality fails and we need the fallback
'url': 'https://10play.com.au/tiny-chef-show/episodes/season-1/episode-2/tpv240228pofvt', 'url': 'https://10.com.au/tiny-chef-show/episodes/season-1/episode-2/tpv240228pofvt',
'info_dict': { 'info_dict': {
'id': '9000000000084116', 'id': '9000000000084116',
'ext': 'mp4', 'ext': 'mp4',
@@ -77,6 +77,7 @@ class TenPlayIE(InfoExtractor):
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information: HTTP Error 502'], 'expected_warnings': ['Failed to download m3u8 information: HTTP Error 502'],
'skip': 'video unavailable',
}, { }, {
'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc', 'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
'only_matching': True, 'only_matching': True,
@@ -96,7 +97,7 @@ class TenPlayIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
content_id = self._match_id(url) content_id = self._match_id(url)
data = self._download_json( data = self._download_json(
'https://10play.com.au/api/v1/videos/' + content_id, content_id) 'https://10.com.au/api/v1/videos/' + content_id, content_id)
video_data = self._download_json( video_data = self._download_json(
f'https://vod.ten.com.au/api/videos/bcquery?command=find_videos_by_id&video_id={data["altId"]}', f'https://vod.ten.com.au/api/videos/bcquery?command=find_videos_by_id&video_id={data["altId"]}',
@@ -137,21 +138,24 @@ class TenPlayIE(InfoExtractor):
class TenPlaySeasonIE(InfoExtractor): class TenPlaySeasonIE(InfoExtractor):
IE_NAME = '10play:season' IE_NAME = '10play:season'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])' _VALID_URL = r'https?://(?:www\.)?10(?:play)?\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])'
_TESTS = [{ _TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-15', 'url': 'https://10.com.au/masterchef/episodes/season-15',
'info_dict': { 'info_dict': {
'title': 'Season 15', 'title': 'Season 15',
'id': 'MTQ2NjMxOQ==', 'id': 'MTQ2NjMxOQ==',
}, },
'playlist_mincount': 50, 'playlist_mincount': 50,
}, { }, {
'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2024', 'url': 'https://10.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2024',
'info_dict': { 'info_dict': {
'title': 'Season 2024', 'title': 'Season 2024',
'id': 'Mjc0OTIw', 'id': 'Mjc0OTIw',
}, },
'playlist_mincount': 159, 'playlist_mincount': 159,
}, {
'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2024',
'only_matching': True,
}] }]
def _entries(self, load_more_url, display_id=None): def _entries(self, load_more_url, display_id=None):
@@ -172,7 +176,7 @@ class TenPlaySeasonIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
show, season = self._match_valid_url(url).group('show', 'season') show, season = self._match_valid_url(url).group('show', 'season')
season_info = self._download_json( season_info = self._download_json(
f'https://10play.com.au/api/shows/{show}/episodes/{season}', f'{show}/{season}') f'https://10.com.au/api/shows/{show}/episodes/{season}', f'{show}/{season}')
episodes_carousel = traverse_obj(season_info, ( episodes_carousel = traverse_obj(season_info, (
'content', 0, 'components', ( 'content', 0, 'components', (

View File

@@ -6,6 +6,7 @@ from ..utils import ExtractorError, clean_html, int_or_none
class TFOIE(InfoExtractor): class TFOIE(InfoExtractor):
_WORKING = False
_GEO_COUNTRIES = ['CA'] _GEO_COUNTRIES = ['CA']
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = { _TEST = {

View File

@@ -0,0 +1,43 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
url_or_none,
)
from ..utils.traversal import (
find_element,
require,
traverse_obj,
)
class TheHighWireIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thehighwire\.com/ark-videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://thehighwire.com/ark-videos/the-deposition-of-stanley-plotkin/',
'info_dict': {
'id': 'the-deposition-of-stanley-plotkin',
'ext': 'mp4',
'title': 'THE DEPOSITION OF STANLEY PLOTKIN',
'description': 'md5:6d0be4f1181daaa10430fd8b945a5e54',
'thumbnail': r're:https?://static\.arkengine\.com/video/.+\.jpg',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = traverse_obj(webpage, (
{find_element(cls='ark-video-embed', html=True)},
{extract_attributes}, 'src', {url_or_none}, {require('embed URL')}))
embed_page = self._download_webpage(embed_url, display_id)
return {
'id': display_id,
**traverse_obj(webpage, {
'title': ({find_element(cls='section-header')}, {clean_html}),
'description': ({find_element(cls='episode-description__copy')}, {clean_html}),
}),
**self._parse_html5_media_entries(embed_url, embed_page, display_id, m3u8_id='hls')[0],
}

View File

@@ -51,6 +51,7 @@ class TV5UnisBaseIE(InfoExtractor):
class TV5UnisVideoIE(TV5UnisBaseIE): class TV5UnisVideoIE(TV5UnisBaseIE):
_WORKING = False
IE_NAME = 'tv5unis:video' IE_NAME = 'tv5unis:video'
_VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/[^/]+/(?P<id>\d+)'
_TEST = { _TEST = {
@@ -71,6 +72,7 @@ class TV5UnisVideoIE(TV5UnisBaseIE):
class TV5UnisIE(TV5UnisBaseIE): class TV5UnisIE(TV5UnisBaseIE):
_WORKING = False
IE_NAME = 'tv5unis' IE_NAME = 'tv5unis'
_VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/(?P<id>[^/]+)(?:/saisons/(?P<season_number>\d+)/episodes/(?P<episode_number>\d+))?/?(?:[?#&]|$)' _VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/(?P<id>[^/]+)(?:/saisons/(?P<season_number>\d+)/episodes/(?P<episode_number>\d+))?/?(?:[?#&]|$)'
_TESTS = [{ _TESTS = [{

View File

@@ -6,6 +6,7 @@ import re
import urllib.parse import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
UserNotLive, UserNotLive,
@@ -188,19 +189,39 @@ class TwitchBaseIE(InfoExtractor):
}] if thumbnail else None }] if thumbnail else None
def _extract_twitch_m3u8_formats(self, path, video_id, token, signature, live_from_start=False): def _extract_twitch_m3u8_formats(self, path, video_id, token, signature, live_from_start=False):
formats = self._extract_m3u8_formats( try:
f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={ formats = self._extract_m3u8_formats(
'allow_source': 'true', f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={
'allow_audio_only': 'true', 'allow_source': 'true',
'allow_spectre': 'true', 'allow_audio_only': 'true',
'p': random.randint(1000000, 10000000), 'allow_spectre': 'true',
'platform': 'web', 'p': random.randint(1000000, 10000000),
'player': 'twitchweb', 'platform': 'web',
'supported_codecs': 'av1,h265,h264', 'player': 'twitchweb',
'playlist_include_framerate': 'true', 'supported_codecs': 'av1,h265,h264',
'sig': signature, 'playlist_include_framerate': 'true',
'token': token, 'sig': signature,
}) 'token': token,
})
except ExtractorError as e:
if (
not isinstance(e.cause, HTTPError)
or e.cause.status != 403
or e.cause.response.get_header('content-type') != 'application/json'
):
raise
error_info = traverse_obj(e.cause.response.read(), ({json.loads}, 0, {dict})) or {}
if error_info.get('error_code') in ('vod_manifest_restricted', 'unauthorized_entitlements'):
common_msg = 'access to this subscriber-only content'
if self._get_cookies('https://gql.twitch.tv').get('auth-token'):
raise ExtractorError(f'Your account does not have {common_msg}', expected=True)
self.raise_login_required(f'You must be logged into an account that has {common_msg}')
if error_msg := join_nonempty('error_code', 'error', from_dict=error_info, delim=': '):
raise ExtractorError(error_msg, expected=True)
raise
for fmt in formats: for fmt in formats:
if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'): if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'):
# mpegts does not yet have proper support for av1 # mpegts does not yet have proper support for av1

View File

@@ -0,0 +1,32 @@
from .common import InfoExtractor
from .kaltura import KalturaIE
class UnitedNationsWebTvIE(InfoExtractor):
_VALID_URL = r'https?://webtv\.un\.org/(?:ar|zh|en|fr|ru|es)/asset/\w+/(?P<id>\w+)'
_TESTS = [{
'url': 'https://webtv.un.org/en/asset/k1o/k1o7stmi6p',
'md5': 'b2f8b3030063298ae841b4b7ddc01477',
'info_dict': {
'id': '1_o7stmi6p',
'ext': 'mp4',
'title': 'António Guterres (Secretary-General) on Israel and Iran - Security Council, 9939th meeting',
'thumbnail': 'http://cfvod.kaltura.com/p/2503451/sp/250345100/thumbnail/entry_id/1_o7stmi6p/version/100021',
'uploader_id': 'evgeniia.alisova@un.org',
'upload_date': '20250620',
'timestamp': 1750430976,
'duration': 234,
'view_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._html_search_regex(
r'partnerId:\s*(\d+)', webpage, 'partner_id')
entry_id = self._html_search_regex(
r'const\s+kentryID\s*=\s*["\'](\w+)["\']', webpage, 'kentry_id')
return self.url_result(f'kaltura:{partner_id}:{entry_id}', KalturaIE)

View File

@@ -53,6 +53,10 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'(?:beta\.)?crunchyroll\.com', r'(?:beta\.)?crunchyroll\.com',
r'viki\.com', r'viki\.com',
r'deezer\.com', r'deezer\.com',
r'b-ch\.com',
r'ctv\.ca',
r'noovo\.ca',
r'tsn\.ca',
) )
_TESTS = [{ _TESTS = [{
@@ -168,6 +172,18 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, { }, {
'url': 'http://www.deezer.com/playlist/176747451', 'url': 'http://www.deezer.com/playlist/176747451',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.b-ch.com/titles/8203/001',
'only_matching': True,
}, {
'url': 'https://www.ctv.ca/shows/masterchef-53506/the-audition-battles-s15e1',
'only_matching': True,
}, {
'url': 'https://www.noovo.ca/emissions/lamour-est-dans-le-pre/prets-pour-lamour-s10e1',
'only_matching': True,
}, {
'url': 'https://www.tsn.ca/video/relaxed-oilers-look-to-put-emotional-game-2-loss-in-the-rearview%7E3148747',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -21,6 +21,7 @@ from ..utils import (
js_to_json, js_to_json,
jwt_decode_hs256, jwt_decode_hs256,
merge_dicts, merge_dicts,
mimetype2ext,
parse_filesize, parse_filesize,
parse_iso8601, parse_iso8601,
parse_qs, parse_qs,
@@ -28,9 +29,11 @@ from ..utils import (
smuggle_url, smuggle_url,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
try_call,
try_get, try_get,
unified_timestamp, unified_timestamp,
unsmuggle_url, unsmuggle_url,
url_basename,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urlhandle_detect_ext, urlhandle_detect_ext,
@@ -45,14 +48,57 @@ class VimeoBaseInfoExtractor(InfoExtractor):
_REFERER_HINT = ( _REFERER_HINT = (
'Cannot download embed-only video without embedding URL. Please call yt-dlp ' 'Cannot download embed-only video without embedding URL. Please call yt-dlp '
'with the URL of the page that embeds this video.') 'with the URL of the page that embeds this video.')
_IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw=='
_IOS_CLIENT_HEADERS = { _DEFAULT_CLIENT = 'android'
_DEFAULT_AUTHED_CLIENT = 'web'
_CLIENT_HEADERS = {
'Accept': 'application/vnd.vimeo.*+json; version=3.4.10', 'Accept': 'application/vnd.vimeo.*+json; version=3.4.10',
'Accept-Language': 'en', 'Accept-Language': 'en',
'User-Agent': 'Vimeo/11.10.0 (com.vimeo; build:250424.164813.0; iOS 18.4.1) Alamofire/5.9.0 VimeoNetworking/5.0.0',
} }
_IOS_OAUTH_CACHE_KEY = 'oauth-token-ios' _CLIENT_CONFIGS = {
_ios_oauth_token = None 'android': {
'CACHE_KEY': 'oauth-token-android',
'CACHE_ONLY': False,
'VIEWER_JWT': False,
'REQUIRES_AUTH': False,
'AUTH': 'NzRmYTg5YjgxMWExY2JiNzUwZDg1MjhkMTYzZjQ4YWYyOGEyZGJlMTp4OGx2NFd3QnNvY1lkamI2UVZsdjdDYlNwSDUrdm50YzdNNThvWDcwN1JrenJGZC9tR1lReUNlRjRSVklZeWhYZVpRS0tBcU9YYzRoTGY2Z1dlVkJFYkdJc0dMRHpoZWFZbU0reDRqZ1dkZ1diZmdIdGUrNUM5RVBySlM0VG1qcw==',
'USER_AGENT': 'com.vimeo.android.videoapp (OnePlus, ONEPLUS A6003, OnePlus, Android 14/34 Version 11.8.1) Kotlin VimeoNetworking/3.12.0',
'VIDEOS_FIELDS': (
'uri', 'name', 'description', 'type', 'link', 'player_embed_url', 'duration', 'width',
'language', 'height', 'embed', 'created_time', 'modified_time', 'release_time', 'content_rating',
'content_rating_class', 'rating_mod_locked', 'license', 'privacy', 'pictures', 'tags', 'stats',
'categories', 'uploader', 'metadata', 'user', 'files', 'download', 'app', 'play', 'status',
'resource_key', 'badge', 'upload', 'transcode', 'is_playable', 'has_audio',
),
},
'ios': {
'CACHE_KEY': 'oauth-token-ios',
'CACHE_ONLY': True,
'VIEWER_JWT': False,
'REQUIRES_AUTH': False,
'AUTH': 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw==',
'USER_AGENT': 'Vimeo/11.10.0 (com.vimeo; build:250424.164813.0; iOS 18.4.1) Alamofire/5.9.0 VimeoNetworking/5.0.0',
'VIDEOS_FIELDS': (
'uri', 'name', 'description', 'type', 'link', 'player_embed_url', 'duration',
'width', 'language', 'height', 'embed', 'created_time', 'modified_time', 'release_time',
'content_rating', 'content_rating_class', 'rating_mod_locked', 'license', 'config_url',
'embed_player_config_url', 'privacy', 'pictures', 'tags', 'stats', 'categories', 'uploader',
'metadata', 'user', 'files', 'download', 'app', 'play', 'status', 'resource_key', 'badge',
'upload', 'transcode', 'is_playable', 'has_audio',
),
},
'web': {
'VIEWER_JWT': True,
'REQUIRES_AUTH': True,
'USER_AGENT': None,
'VIDEOS_FIELDS': (
'config_url', 'created_time', 'description', 'license',
'metadata.connections.comments.total', 'metadata.connections.likes.total',
'release_time', 'stats.plays',
),
},
}
_oauth_tokens = {}
_viewer_info = None _viewer_info = None
@staticmethod @staticmethod
@@ -80,7 +126,14 @@ class VimeoBaseInfoExtractor(InfoExtractor):
return self._viewer_info return self._viewer_info
@property
def _is_logged_in(self):
return 'vimeo' in self._get_cookies('https://vimeo.com')
def _perform_login(self, username, password): def _perform_login(self, username, password):
if self._is_logged_in:
return
viewer = self._fetch_viewer_info() viewer = self._fetch_viewer_info()
data = { data = {
'action': 'login', 'action': 'login',
@@ -105,8 +158,8 @@ class VimeoBaseInfoExtractor(InfoExtractor):
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _real_initialize(self): def _real_initialize(self):
if self._LOGIN_REQUIRED and not self._get_cookies('https://vimeo.com').get('vuid'): if self._LOGIN_REQUIRED and not self._is_logged_in:
self._raise_login_required() self.raise_login_required()
def _get_video_password(self): def _get_video_password(self):
password = self.get_param('videopassword') password = self.get_param('videopassword')
@@ -277,52 +330,95 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'), '_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'),
} }
def _fetch_oauth_token(self): def _fetch_oauth_token(self, client):
if not self._ios_oauth_token: client_config = self._CLIENT_CONFIGS[client]
self._ios_oauth_token = self.cache.load(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY)
if not self._ios_oauth_token: if client_config['VIEWER_JWT']:
self._ios_oauth_token = self._download_json( return f'jwt {self._fetch_viewer_info()["jwt"]}'
cache_key = client_config['CACHE_KEY']
if not self._oauth_tokens.get(cache_key):
self._oauth_tokens[cache_key] = self.cache.load(self._NETRC_MACHINE, cache_key)
if not self._oauth_tokens.get(cache_key):
if client_config['CACHE_ONLY']:
raise ExtractorError(
f'The {client} client is unable to fetch new OAuth tokens '
f'and is only intended for use with previously cached tokens', expected=True)
self._oauth_tokens[cache_key] = self._download_json(
'https://api.vimeo.com/oauth/authorize/client', None, 'https://api.vimeo.com/oauth/authorize/client', None,
'Fetching OAuth token', 'Failed to fetch OAuth token', f'Fetching {client} OAuth token', f'Failed to fetch {client} OAuth token',
headers={ headers={
'Authorization': f'Basic {self._IOS_CLIENT_AUTH}', 'Authorization': f'Basic {client_config["AUTH"]}',
**self._IOS_CLIENT_HEADERS, 'User-Agent': client_config['USER_AGENT'],
**self._CLIENT_HEADERS,
}, data=urlencode_postdata({ }, data=urlencode_postdata({
'grant_type': 'client_credentials', 'grant_type': 'client_credentials',
'scope': 'private public create edit delete interact upload purchased stats', 'scope': 'private public create edit delete interact upload purchased stats video_files',
}, quote_via=urllib.parse.quote))['access_token'] }, quote_via=urllib.parse.quote))['access_token']
self.cache.store(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY, self._ios_oauth_token) self.cache.store(self._NETRC_MACHINE, cache_key, self._oauth_tokens[cache_key])
return self._ios_oauth_token return f'Bearer {self._oauth_tokens[cache_key]}'
def _get_requested_client(self):
default_client = self._DEFAULT_AUTHED_CLIENT if self._is_logged_in else self._DEFAULT_CLIENT
client = self._configuration_arg('client', [default_client], ie_key=VimeoIE)[0]
if client not in self._CLIENT_CONFIGS:
raise ExtractorError(
f'Unsupported API client "{client}" requested. '
f'Supported clients are: {", ".join(self._CLIENT_CONFIGS)}', expected=True)
return client
def _call_videos_api(self, video_id, unlisted_hash=None, path=None, *, force_client=None, query=None, **kwargs):
client = force_client or self._get_requested_client()
client_config = self._CLIENT_CONFIGS[client]
if client_config['REQUIRES_AUTH'] and not self._is_logged_in:
self.raise_login_required(f'The {client} client requires authentication')
def _call_videos_api(self, video_id, unlisted_hash=None, **kwargs):
return self._download_json( return self._download_json(
join_nonempty(f'https://api.vimeo.com/videos/{video_id}', unlisted_hash, delim=':'), join_nonempty(
video_id, 'Downloading API JSON', headers={ 'https://api.vimeo.com/videos',
'Authorization': f'Bearer {self._fetch_oauth_token()}', join_nonempty(video_id, unlisted_hash, delim=':'),
**self._IOS_CLIENT_HEADERS, path, delim='/'),
}, query={ video_id, f'Downloading {client} API JSON', f'Unable to download {client} API JSON',
'fields': ','.join(( headers=filter_dict({
'config_url', 'embed_player_config_url', 'player_embed_url', 'download', 'play', 'Authorization': self._fetch_oauth_token(client),
'files', 'description', 'license', 'release_time', 'created_time', 'stats.plays', 'User-Agent': client_config['USER_AGENT'],
'metadata.connections.comments.total', 'metadata.connections.likes.total')), **self._CLIENT_HEADERS,
}), query={
'fields': ','.join(client_config['VIDEOS_FIELDS']),
**(query or {}),
}, **kwargs) }, **kwargs)
def _extract_original_format(self, url, video_id, unlisted_hash=None, api_data=None): def _extract_original_format(self, url, video_id, unlisted_hash=None):
# Original/source formats are only available when logged in # Original/source formats are only available when logged in
if not self._get_cookies('https://vimeo.com/').get('vimeo'): if not self._is_logged_in:
return return None
query = {'action': 'load_download_config'} policy = self._configuration_arg('original_format_policy', ['auto'], ie_key=VimeoIE)[0]
if unlisted_hash: if policy == 'never':
query['unlisted_hash'] = unlisted_hash return None
download_data = self._download_json(
url, video_id, 'Loading download config JSON', fatal=False, try:
query=query, headers={'X-Requested-With': 'XMLHttpRequest'}, download_data = self._download_json(
expected_status=(403, 404)) or {} url, video_id, 'Loading download config JSON', query=filter_dict({
source_file = download_data.get('source_file') 'action': 'load_download_config',
download_url = try_get(source_file, lambda x: x['download_url']) 'unlisted_hash': unlisted_hash,
}), headers={
'Accept': 'application/json',
'X-Requested-With': 'XMLHttpRequest',
})
except ExtractorError as error:
self.write_debug(f'Unable to load download config JSON: {error.cause}')
download_data = None
source_file = traverse_obj(download_data, ('source_file', {dict})) or {}
download_url = traverse_obj(source_file, ('download_url', {url_or_none}))
if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'): if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
source_name = source_file.get('public_name', 'Original') source_name = source_file.get('public_name', 'Original')
if self._is_valid_url(download_url, video_id, f'{source_name} video'): if self._is_valid_url(download_url, video_id, f'{source_name} video'):
@@ -340,8 +436,27 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'quality': 1, 'quality': 1,
} }
original_response = api_data or self._call_videos_api( # Most web client API requests are subject to rate-limiting (429) when logged-in.
video_id, unlisted_hash, fatal=False, expected_status=(403, 404)) # Requesting only the 'privacy' field is NOT rate-limited,
# so first we should check if video even has 'download' formats available
try:
privacy_info = self._call_videos_api(
video_id, unlisted_hash, force_client='web', query={'fields': 'privacy'})
except ExtractorError as error:
self.write_debug(f'Unable to download privacy info: {error.cause}')
return None
if not traverse_obj(privacy_info, ('privacy', 'download', {bool})):
msg = f'{video_id}: Vimeo says this video is not downloadable'
if policy != 'always':
self.write_debug(
f'{msg}, so yt-dlp is not attempting to extract the original/source format. '
f'To try anyways, use --extractor-args "vimeo:original_format_policy=always"')
return None
self.write_debug(f'{msg}; attempting to extract original/source format anyways')
original_response = self._call_videos_api(
video_id, unlisted_hash, force_client='web', query={'fields': 'download'}, fatal=False)
for download_data in traverse_obj(original_response, ('download', ..., {dict})): for download_data in traverse_obj(original_response, ('download', ..., {dict})):
download_url = download_data.get('link') download_url = download_data.get('link')
if not download_url or download_data.get('quality') != 'source': if not download_url or download_data.get('quality') != 'source':
@@ -919,25 +1034,125 @@ class VimeoIE(VimeoBaseInfoExtractor):
raise ExtractorError('Wrong video password', expected=True) raise ExtractorError('Wrong video password', expected=True)
return checked return checked
def _get_subtitles(self, video_id, unlisted_hash):
subs = {}
text_tracks = self._call_videos_api(
video_id, unlisted_hash, path='texttracks', query={
'include_transcript': 'true',
'fields': ','.join((
'active', 'display_language', 'id', 'language', 'link', 'name', 'type', 'uri',
)),
}, fatal=False)
for tt in traverse_obj(text_tracks, ('data', lambda _, v: url_or_none(v['link']))):
subs.setdefault(tt.get('language'), []).append({
'url': tt['link'],
'ext': 'vtt',
'name': tt.get('display_language'),
})
return subs
def _parse_api_response(self, video, video_id, unlisted_hash=None):
formats, subtitles = [], {}
seen_urls = set()
duration = traverse_obj(video, ('duration', {int_or_none}))
for file in traverse_obj(video, (
(('play', (None, 'progressive')), 'files', 'download'), lambda _, v: url_or_none(v['link']),
)):
format_url = file['link']
if format_url in seen_urls:
continue
seen_urls.add(format_url)
quality = file.get('quality')
ext = determine_ext(format_url)
if quality == 'hls' or ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
elif quality == 'dash' or ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
format_url, video_id, mpd_id='dash', fatal=False)
for fmt in fmts:
fmt['format_id'] = join_nonempty(
*fmt['format_id'].split('-', 2)[:2], int_or_none(fmt.get('tbr')))
else:
fmt = traverse_obj(file, {
'ext': ('type', {mimetype2ext(default='mp4')}),
'vcodec': ('codec', {str.lower}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
'filesize': ('size', {int_or_none}),
'fps': ('fps', {int_or_none}),
})
fmt.update({
'url': format_url,
'format_id': join_nonempty(
'http', traverse_obj(file, 'public_name', 'rendition'), quality),
'tbr': try_call(lambda: fmt['filesize'] * 8 / duration / 1024),
})
formats.append(fmt)
continue
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if traverse_obj(video, ('metadata', 'connections', 'texttracks', 'total', {int})):
self._merge_subtitles(self.extract_subtitles(video_id, unlisted_hash), target=subtitles)
return {
**traverse_obj(video, {
'title': ('name', {str}),
'uploader': ('user', 'name', {str}),
'uploader_id': ('user', 'link', {url_basename}),
'uploader_url': ('user', 'link', {url_or_none}),
'release_timestamp': ('live', 'scheduled_start_time', {int_or_none}),
'thumbnails': ('pictures', 'sizes', lambda _, v: url_or_none(v['link']), {
'url': 'link',
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
'id': video_id,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'live_status': {
'streaming': 'is_live',
'done': 'was_live',
}.get(traverse_obj(video, ('live', 'status', {str}))),
}
def _extract_from_api(self, video_id, unlisted_hash=None): def _extract_from_api(self, video_id, unlisted_hash=None):
for retry in (False, True): for retry in (False, True):
try: try:
video = self._call_videos_api(video_id, unlisted_hash) video = self._call_videos_api(video_id, unlisted_hash)
break break
except ExtractorError as e: except ExtractorError as e:
if (not retry and isinstance(e.cause, HTTPError) and e.cause.status == 400 if not isinstance(e.cause, HTTPError):
and 'password' in traverse_obj( raise
self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False), response = traverse_obj(
({json.loads}, 'invalid_parameters', ..., 'field'), self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False),
)): ({json.loads}, {dict})) or {}
if (
not retry and e.cause.status == 400
and 'password' in traverse_obj(response, ('invalid_parameters', ..., 'field'))
):
self._verify_video_password(video_id) self._verify_video_password(video_id)
continue elif e.cause.status == 404 and response.get('error_code') == 5460:
raise self.raise_login_required(join_nonempty(
traverse_obj(response, ('error', {str.strip})),
'Authentication may be needed due to your location.',
'If your IP address is located in Europe you could try using a VPN/proxy,',
f'or else u{self._login_hint()[1:]}',
delim=' '), method=None)
else:
raise
if config_url := traverse_obj(video, ('config_url', {url_or_none})):
info = self._parse_config(self._download_json(config_url, video_id), video_id)
else:
info = self._parse_api_response(video, video_id, unlisted_hash)
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(
f'https://vimeo.com/{video_id}', video_id, unlisted_hash, api_data=video) f'https://vimeo.com/{video_id}', video_id, unlisted_hash)
if source_format: if source_format:
info['formats'].append(source_format) info['formats'].append(source_format)

View File

@@ -1,5 +1,6 @@
import calendar import calendar
import copy import copy
import dataclasses
import datetime as dt import datetime as dt
import enum import enum
import functools import functools
@@ -38,6 +39,60 @@ class _PoTokenContext(enum.Enum):
SUBS = 'subs' SUBS = 'subs'
class StreamingProtocol(enum.Enum):
HTTPS = 'https'
DASH = 'dash'
HLS = 'hls'
@dataclasses.dataclass
class BasePoTokenPolicy:
required: bool = False
# Try to fetch a PO Token even if it is not required.
recommended: bool = False
not_required_for_premium: bool = False
@dataclasses.dataclass
class GvsPoTokenPolicy(BasePoTokenPolicy):
not_required_with_player_token: bool = False
@dataclasses.dataclass
class PlayerPoTokenPolicy(BasePoTokenPolicy):
pass
@dataclasses.dataclass
class SubsPoTokenPolicy(BasePoTokenPolicy):
pass
WEB_PO_TOKEN_POLICIES = {
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False),
# In rollout, currently detected via experiment
# Premium users DO require a PO Token for subtitles
'SUBS_PO_TOKEN_POLICY': SubsPoTokenPolicy(required=False),
}
# any clients starting with _ cannot be explicitly requested by the user # any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = { INNERTUBE_CLIENTS = {
'web': { 'web': {
@@ -48,8 +103,9 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 1, 'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
**WEB_PO_TOKEN_POLICIES,
'PLAYER_PARAMS': '8AEB',
}, },
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats # Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
'web_safari': { 'web_safari': {
@@ -61,8 +117,9 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 1, 'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
**WEB_PO_TOKEN_POLICIES,
'PLAYER_PARAMS': '8AEB',
}, },
'web_embedded': { 'web_embedded': {
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
@@ -83,7 +140,24 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 67, 'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS], 'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
@@ -95,7 +169,24 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 62, 'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS], 'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
@@ -112,7 +203,24 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 3, 'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS], 'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
not_required_with_player_token=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False, recommended=True),
}, },
# YouTube Kids videos aren't returned on this client for some reason # YouTube Kids videos aren't returned on this client for some reason
'android_vr': { 'android_vr': {
@@ -146,7 +254,21 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 5, 'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS], 'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
# HLS Livestreams require POT 30 seconds in
# TODO: Rolling out
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
not_required_with_player_token=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False, recommended=True),
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
# mweb has 'ultralow' formats # mweb has 'ultralow' formats
@@ -161,7 +283,24 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 2, 'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS], 'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
'tv': { 'tv': {
@@ -174,6 +313,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 7, 'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
'PLAYER_PARAMS': '8AEB',
}, },
'tv_simply': { 'tv_simply': {
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
@@ -224,7 +364,11 @@ def build_innertube_clients():
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()): for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com') ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg.setdefault('REQUIRE_JS_PLAYER', True) ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
ytcfg.setdefault('PO_TOKEN_REQUIRED_CONTEXTS', []) ytcfg.setdefault('GVS_PO_TOKEN_POLICY', {})
for protocol in StreamingProtocol:
ytcfg['GVS_PO_TOKEN_POLICY'].setdefault(protocol, GvsPoTokenPolicy())
ytcfg.setdefault('PLAYER_PO_TOKEN_POLICY', PlayerPoTokenPolicy())
ytcfg.setdefault('SUBS_PO_TOKEN_POLICY', SubsPoTokenPolicy())
ytcfg.setdefault('REQUIRE_AUTH', False) ytcfg.setdefault('REQUIRE_AUTH', False)
ytcfg.setdefault('SUPPORTS_COOKIES', False) ytcfg.setdefault('SUPPORTS_COOKIES', False)
ytcfg.setdefault('PLAYER_PARAMS', None) ytcfg.setdefault('PLAYER_PARAMS', None)

View File

@@ -317,17 +317,31 @@ class YoutubeTabBaseInfoExtractor(YoutubeBaseInfoExtractor):
content_id = view_model.get('contentId') content_id = view_model.get('contentId')
if not content_id: if not content_id:
return return
content_type = view_model.get('contentType') content_type = view_model.get('contentType')
if content_type not in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'): if content_type == 'LOCKUP_CONTENT_TYPE_VIDEO':
ie = YoutubeIE
url = f'https://www.youtube.com/watch?v={content_id}'
thumb_keys = (None,)
elif content_type in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'):
ie = YoutubeTabIE
url = f'https://www.youtube.com/playlist?list={content_id}'
thumb_keys = ('collectionThumbnailViewModel', 'primaryThumbnail')
else:
self.report_warning( self.report_warning(
f'Unsupported lockup view model content type "{content_type}"{bug_reports_message()}', only_once=True) f'Unsupported lockup view model content type "{content_type}"{bug_reports_message()}',
only_once=True)
return return
return self.url_result( return self.url_result(
f'https://www.youtube.com/playlist?list={content_id}', ie=YoutubeTabIE, video_id=content_id, url, ie, content_id,
title=traverse_obj(view_model, ( title=traverse_obj(view_model, (
'metadata', 'lockupMetadataViewModel', 'title', 'content', {str})), 'metadata', 'lockupMetadataViewModel', 'title', 'content', {str})),
thumbnails=self._extract_thumbnails(view_model, ( thumbnails=self._extract_thumbnails(view_model, (
'contentImage', 'collectionThumbnailViewModel', 'primaryThumbnail', 'thumbnailViewModel', 'image'), final_key='sources')) 'contentImage', *thumb_keys, 'thumbnailViewModel', 'image'), final_key='sources'),
duration=traverse_obj(view_model, (
'contentImage', 'thumbnailViewModel', 'overlays', ..., 'thumbnailOverlayBadgeViewModel',
'thumbnailBadges', ..., 'thumbnailBadgeViewModel', 'text', {parse_duration}, any)))
def _rich_entries(self, rich_grid_renderer): def _rich_entries(self, rich_grid_renderer):
if lockup_view_model := traverse_obj(rich_grid_renderer, ('content', 'lockupViewModel', {dict})): if lockup_view_model := traverse_obj(rich_grid_renderer, ('content', 'lockupViewModel', {dict})):

View File

@@ -18,6 +18,9 @@ import urllib.parse
from ._base import ( from ._base import (
INNERTUBE_CLIENTS, INNERTUBE_CLIENTS,
BadgeType, BadgeType,
GvsPoTokenPolicy,
PlayerPoTokenPolicy,
StreamingProtocol,
YoutubeBaseInfoExtractor, YoutubeBaseInfoExtractor,
_PoTokenContext, _PoTokenContext,
_split_innertube_client, _split_innertube_client,
@@ -26,7 +29,7 @@ from ._base import (
from .pot._director import initialize_pot_director from .pot._director import initialize_pot_director
from .pot.provider import PoTokenContext, PoTokenRequest from .pot.provider import PoTokenContext, PoTokenRequest
from ..openload import PhantomJSwrapper from ..openload import PhantomJSwrapper
from ...jsinterp import JSInterpreter from ...jsinterp import JSInterpreter, LocalNameSpace
from ...networking.exceptions import HTTPError from ...networking.exceptions import HTTPError
from ...utils import ( from ...utils import (
NO_DEFAULT, NO_DEFAULT,
@@ -71,9 +74,11 @@ from ...utils import (
from ...utils.networking import clean_headers, clean_proxies, select_proxy from ...utils.networking import clean_headers, clean_proxies, select_proxy
STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client' STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token' STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token'
STREAMING_DATA_FETCH_GVS_PO_TOKEN = '__yt_dlp_fetch_gvs_po_token'
STREAMING_DATA_PLAYER_TOKEN_PROVIDED = '__yt_dlp_player_token_provided'
STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context' STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context'
STREAMING_DATA_IS_PREMIUM_SUBSCRIBER = '__yt_dlp_is_premium_subscriber'
PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide' PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'
@@ -253,6 +258,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt') _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
_DEFAULT_CLIENTS = ('tv', 'ios', 'web') _DEFAULT_CLIENTS = ('tv', 'ios', 'web')
_DEFAULT_AUTHED_CLIENTS = ('tv', 'web') _DEFAULT_AUTHED_CLIENTS = ('tv', 'web')
# Premium does not require POT (except for subtitles)
_DEFAULT_PREMIUM_CLIENTS = ('tv', 'web')
_GEO_BYPASS = False _GEO_BYPASS = False
@@ -1801,6 +1808,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js', 'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js',
} }
_INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()} _INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()}
_NSIG_FUNC_CACHE_ID = 'nsig func'
_DUMMY_STRING = 'dlp_wins'
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
@@ -1831,7 +1840,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if time.time() <= start_time + delay: if time.time() <= start_time + delay:
return return
_, _, prs, player_url = self._download_player_responses(url, smuggled_data, video_id, webpage_url) _, _, _, _, prs, player_url = self._initial_extract(
url, smuggled_data, webpage_url, 'web', video_id)
video_details = traverse_obj(prs, (..., 'videoDetails'), expected_type=dict) video_details = traverse_obj(prs, (..., 'videoDetails'), expected_type=dict)
microformats = traverse_obj( microformats = traverse_obj(
prs, (..., 'microformat', 'playerMicroformatRenderer'), prs, (..., 'microformat', 'playerMicroformatRenderer'),
@@ -2066,7 +2076,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
assert os.path.basename(func_id) == func_id assert os.path.basename(func_id) == func_id
self.write_debug(f'Extracting signature function {func_id}') self.write_debug(f'Extracting signature function {func_id}')
cache_spec, code = self.cache.load('youtube-sigfuncs', func_id, min_ver='2025.03.31'), None cache_spec, code = self.cache.load('youtube-sigfuncs', func_id, min_ver='2025.07.21'), None
if not cache_spec: if not cache_spec:
code = self._load_player(video_id, player_url) code = self._load_player(video_id, player_url)
@@ -2170,7 +2180,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if data := self._player_cache.get(cache_id): if data := self._player_cache.get(cache_id):
return data return data
data = self.cache.load(*cache_id, min_ver='2025.03.31') data = self.cache.load(*cache_id, min_ver='2025.07.21')
if data: if data:
self._player_cache[cache_id] = data self._player_cache[cache_id] = data
@@ -2204,7 +2214,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.to_screen(f'Extracted nsig function from {player_id}:\n{func_code[1]}\n') self.to_screen(f'Extracted nsig function from {player_id}:\n{func_code[1]}\n')
try: try:
extract_nsig = self._cached(self._extract_n_function_from_code, 'nsig func', player_url) extract_nsig = self._cached(self._extract_n_function_from_code, self._NSIG_FUNC_CACHE_ID, player_url)
ret = extract_nsig(jsi, func_code)(s) ret = extract_nsig(jsi, func_code)(s)
except JSInterpreter.Exception as e: except JSInterpreter.Exception as e:
try: try:
@@ -2312,16 +2322,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
jsi = JSInterpreter(varcode) jsi = JSInterpreter(varcode)
interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url) interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url)
return varname, interpret_global_var(varvalue, {}, allow_recursion=10) return varname, interpret_global_var(varvalue, LocalNameSpace(), allow_recursion=10)
def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url): def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url):
# Fixup global array
varname, global_list = self._interpret_player_js_global_var(jscode, player_url) varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
if varname and global_list: if varname and global_list:
nsig_code = f'var {varname}={json.dumps(global_list)}; {nsig_code}' nsig_code = f'var {varname}={json.dumps(global_list)}; {nsig_code}'
else: else:
varname = 'dlp_wins' varname = self._DUMMY_STRING
global_list = [] global_list = []
# Fixup typeof check
undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+' undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+'
fixed_code = re.sub( fixed_code = re.sub(
fr'''(?x) fr'''(?x)
@@ -2334,6 +2346,32 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.write_debug(join_nonempty( self.write_debug(join_nonempty(
'No typeof statement found in nsig function code', 'No typeof statement found in nsig function code',
player_url and f' player = {player_url}', delim='\n'), only_once=True) player_url and f' player = {player_url}', delim='\n'), only_once=True)
# Fixup global funcs
jsi = JSInterpreter(fixed_code)
cache_id = (self._NSIG_FUNC_CACHE_ID, player_url)
try:
self._cached(
self._extract_n_function_from_code, *cache_id)(jsi, (argnames, fixed_code))(self._DUMMY_STRING)
except JSInterpreter.Exception:
self._player_cache.pop(cache_id, None)
global_funcnames = jsi._undefined_varnames
debug_names = []
jsi = JSInterpreter(jscode)
for func_name in global_funcnames:
try:
func_args, func_code = jsi.extract_function_code(func_name)
fixed_code = f'var {func_name} = function({", ".join(func_args)}) {{ {func_code} }}; {fixed_code}'
debug_names.append(func_name)
except Exception:
self.report_warning(join_nonempty(
f'Unable to extract global nsig function {func_name} from player JS',
player_url and f' player = {player_url}', delim='\n'), only_once=True)
if debug_names:
self.write_debug(f'Extracted global nsig functions: {", ".join(debug_names)}')
return argnames, fixed_code return argnames, fixed_code
def _extract_n_function_code(self, video_id, player_url): def _extract_n_function_code(self, video_id, player_url):
@@ -2347,7 +2385,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
func_name = self._extract_n_function_name(jscode, player_url=player_url) func_name = self._extract_n_function_name(jscode, player_url=player_url)
# XXX: Workaround for the global array variable and lack of `typeof` implementation # XXX: Work around (a) global array variable, (b) `typeof` short-circuit, (c) global functions
func_code = self._fixup_n_function_code(*jsi.extract_function_code(func_name), jscode, player_url) func_code = self._fixup_n_function_code(*jsi.extract_function_code(func_name), jscode, player_url)
return jsi, player_id, func_code return jsi, player_id, func_code
@@ -2820,10 +2858,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
context['signatureTimestamp'] = sts context['signatureTimestamp'] = sts
return { return {
'playbackContext': { 'playbackContext': {
'adPlaybackContext': {
'pyv': True,
'adType': 'AD_TYPE_INSTREAM',
},
'contentPlaybackContext': context, 'contentPlaybackContext': context,
}, },
**cls._get_checkok_params(), **cls._get_checkok_params(),
@@ -2865,7 +2899,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
only_once=True) only_once=True)
continue continue
def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None, def fetch_po_token(self, client='web', context: _PoTokenContext = _PoTokenContext.GVS, ytcfg=None, visitor_data=None,
data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None, data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None,
required=False, **kwargs): required=False, **kwargs):
""" """
@@ -2950,7 +2984,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
fetch_pot_policy == 'never' fetch_pot_policy == 'never'
or ( or (
fetch_pot_policy == 'auto' fetch_pot_policy == 'auto'
and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
and not kwargs.get('required', False) and not kwargs.get('required', False)
) )
): ):
@@ -3009,19 +3042,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _is_unplayable(player_response): def _is_unplayable(player_response):
return traverse_obj(player_response, ('playabilityStatus', 'status')) == 'UNPLAYABLE' return traverse_obj(player_response, ('playabilityStatus', 'status')) == 'UNPLAYABLE'
def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg, player_url, initial_pr, visitor_data, data_sync_id, po_token): def _extract_player_response(self, client, video_id, webpage_ytcfg, player_ytcfg, player_url, initial_pr, visitor_data, data_sync_id, po_token):
headers = self.generate_api_headers( headers = self.generate_api_headers(
ytcfg=player_ytcfg, ytcfg=player_ytcfg,
default_client=client, default_client=client,
visitor_data=visitor_data, visitor_data=visitor_data,
session_index=self._extract_session_index(master_ytcfg, player_ytcfg), session_index=self._extract_session_index(webpage_ytcfg, player_ytcfg),
delegated_session_id=( delegated_session_id=(
self._parse_data_sync_id(data_sync_id)[0] self._parse_data_sync_id(data_sync_id)[0]
or self._extract_delegated_session_id(master_ytcfg, initial_pr, player_ytcfg) or self._extract_delegated_session_id(webpage_ytcfg, initial_pr, player_ytcfg)
), ),
user_session_id=( user_session_id=(
self._parse_data_sync_id(data_sync_id)[1] self._parse_data_sync_id(data_sync_id)[1]
or self._extract_user_session_id(master_ytcfg, initial_pr, player_ytcfg) or self._extract_user_session_id(webpage_ytcfg, initial_pr, player_ytcfg)
), ),
) )
@@ -3037,7 +3070,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if po_token: if po_token:
yt_query['serviceIntegrityDimensions'] = {'poToken': po_token} yt_query['serviceIntegrityDimensions'] = {'poToken': po_token}
sts = self._extract_signature_timestamp(video_id, player_url, master_ytcfg, fatal=False) if player_url else None sts = self._extract_signature_timestamp(video_id, player_url, webpage_ytcfg, fatal=False) if player_url else None
yt_query.update(self._generate_player_context(sts)) yt_query.update(self._generate_player_context(sts))
return self._extract_response( return self._extract_response(
item_id=video_id, ep='player', query=yt_query, item_id=video_id, ep='player', query=yt_query,
@@ -3046,10 +3079,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
note='Downloading {} player API JSON'.format(client.replace('_', ' ').strip()), note='Downloading {} player API JSON'.format(client.replace('_', ' ').strip()),
) or None ) or None
def _get_requested_clients(self, url, smuggled_data): def _get_requested_clients(self, url, smuggled_data, is_premium_subscriber):
requested_clients = [] requested_clients = []
excluded_clients = [] excluded_clients = []
default_clients = self._DEFAULT_AUTHED_CLIENTS if self.is_authenticated else self._DEFAULT_CLIENTS default_clients = (
self._DEFAULT_PREMIUM_CLIENTS if is_premium_subscriber
else self._DEFAULT_AUTHED_CLIENTS if self.is_authenticated
else self._DEFAULT_CLIENTS
)
allowed_clients = sorted( allowed_clients = sorted(
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'), (client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True) key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
@@ -3091,11 +3128,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if (pr_id := traverse_obj(pr, ('videoDetails', 'videoId'))) != video_id: if (pr_id := traverse_obj(pr, ('videoDetails', 'videoId'))) != video_id:
return pr_id return pr_id
def _extract_player_responses(self, clients, video_id, webpage, master_ytcfg, smuggled_data): def _extract_player_responses(self, clients, video_id, webpage, webpage_client, webpage_ytcfg, is_premium_subscriber):
initial_pr = None initial_pr = None
if webpage: if webpage:
initial_pr = self._search_json( initial_pr = self._search_json(
self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage, 'initial player response', video_id, fatal=False) self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage,
f'{webpage_client} client initial player response', video_id, fatal=False)
prs = [] prs = []
deprioritized_prs = [] deprioritized_prs = []
@@ -3126,11 +3164,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
while clients: while clients:
deprioritize_pr = False deprioritize_pr = False
client, base_client, variant = _split_innertube_client(clients.pop()) client, base_client, variant = _split_innertube_client(clients.pop())
player_ytcfg = master_ytcfg if client == 'web' else {} player_ytcfg = webpage_ytcfg if client == webpage_client else {}
if 'configs' not in self._configuration_arg('player_skip') and client != 'web': if 'configs' not in self._configuration_arg('player_skip') and client != webpage_client:
player_ytcfg = self._download_ytcfg(client, video_id) or player_ytcfg player_ytcfg = self._download_ytcfg(client, video_id) or player_ytcfg
player_url = player_url or self._extract_player_url(master_ytcfg, player_ytcfg, webpage=webpage) player_url = player_url or self._extract_player_url(webpage_ytcfg, player_ytcfg, webpage=webpage)
require_js_player = self._get_default_ytcfg(client).get('REQUIRE_JS_PLAYER') require_js_player = self._get_default_ytcfg(client).get('REQUIRE_JS_PLAYER')
if 'js' in self._configuration_arg('player_skip'): if 'js' in self._configuration_arg('player_skip'):
require_js_player = False require_js_player = False
@@ -3140,10 +3178,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
player_url = self._download_player_url(video_id) player_url = self._download_player_url(video_id)
tried_iframe_fallback = True tried_iframe_fallback = True
pr = initial_pr if client == 'web' else None pr = None
if client == webpage_client and 'player_response' not in self._configuration_arg('webpage_skip'):
pr = initial_pr
visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg) visitor_data = visitor_data or self._extract_visitor_data(webpage_ytcfg, initial_pr, player_ytcfg)
data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg) data_sync_id = data_sync_id or self._extract_data_sync_id(webpage_ytcfg, initial_pr, player_ytcfg)
fetch_po_token_args = { fetch_po_token_args = {
'client': client, 'client': client,
@@ -3152,53 +3192,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'data_sync_id': data_sync_id if self.is_authenticated else None, 'data_sync_id': data_sync_id if self.is_authenticated else None,
'player_url': player_url if require_js_player else None, 'player_url': player_url if require_js_player else None,
'webpage': webpage, 'webpage': webpage,
'session_index': self._extract_session_index(master_ytcfg, player_ytcfg), 'session_index': self._extract_session_index(webpage_ytcfg, player_ytcfg),
'ytcfg': player_ytcfg or self._get_default_ytcfg(client), 'ytcfg': player_ytcfg or self._get_default_ytcfg(client),
} }
# Don't need a player PO token for WEB if using player response from webpage # Don't need a player PO token for WEB if using player response from webpage
player_pot_policy: PlayerPoTokenPolicy = self._get_default_ytcfg(client)['PLAYER_PO_TOKEN_POLICY']
player_po_token = None if pr else self.fetch_po_token( player_po_token = None if pr else self.fetch_po_token(
context=_PoTokenContext.PLAYER, **fetch_po_token_args) context=_PoTokenContext.PLAYER, **fetch_po_token_args,
required=player_pot_policy.required or player_pot_policy.recommended)
gvs_po_token = self.fetch_po_token( fetch_gvs_po_token_func = functools.partial(
context=_PoTokenContext.GVS, **fetch_po_token_args) self.fetch_po_token, context=_PoTokenContext.GVS, **fetch_po_token_args)
fetch_subs_po_token_func = functools.partial( fetch_subs_po_token_func = functools.partial(
self.fetch_po_token, self.fetch_po_token, context=_PoTokenContext.SUBS, **fetch_po_token_args)
context=_PoTokenContext.SUBS,
**fetch_po_token_args,
)
required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
if (
not player_po_token
and _PoTokenContext.PLAYER in required_pot_contexts
):
# TODO: may need to skip player response request. Unsure yet..
self.report_warning(
f'No Player PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized'
f'You can manually pass a Player PO Token for this client with --extractor-args "youtube:po_token={client}.player+XXX". '
f'For more information, refer to {PO_TOKEN_GUIDE_URL} .', only_once=True)
deprioritize_pr = True
if (
not gvs_po_token
and _PoTokenContext.GVS in required_pot_contexts
and 'missing_pot' in self._configuration_arg('formats')
):
# note: warning with help message is provided later during format processing
self.report_warning(
f'No GVS PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized',
only_once=True)
deprioritize_pr = True
try: try:
pr = pr or self._extract_player_response( pr = pr or self._extract_player_response(
client, video_id, client, video_id,
master_ytcfg=player_ytcfg or master_ytcfg, webpage_ytcfg=player_ytcfg or webpage_ytcfg,
player_ytcfg=player_ytcfg, player_ytcfg=player_ytcfg,
player_url=player_url, player_url=player_url,
initial_pr=initial_pr, initial_pr=initial_pr,
@@ -3216,12 +3229,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT') innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT')
sd = pr.setdefault('streamingData', {}) sd = pr.setdefault('streamingData', {})
sd[STREAMING_DATA_CLIENT_NAME] = client sd[STREAMING_DATA_CLIENT_NAME] = client
sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token sd[STREAMING_DATA_FETCH_GVS_PO_TOKEN] = fetch_gvs_po_token_func
sd[STREAMING_DATA_PLAYER_TOKEN_PROVIDED] = bool(player_po_token)
sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context
sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func
sd[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER] = is_premium_subscriber
for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})): for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
f[STREAMING_DATA_CLIENT_NAME] = client f[STREAMING_DATA_CLIENT_NAME] = client
f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token f[STREAMING_DATA_FETCH_GVS_PO_TOKEN] = fetch_gvs_po_token_func
f[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER] = is_premium_subscriber
f[STREAMING_DATA_PLAYER_TOKEN_PROVIDED] = bool(player_po_token)
if deprioritize_pr: if deprioritize_pr:
deprioritized_prs.append(pr) deprioritized_prs.append(pr)
else: else:
@@ -3247,6 +3264,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# web_creator may work around age-verification for all videos but requires PO token # web_creator may work around age-verification for all videos but requires PO token
append_client('tv_embedded', 'web_creator') append_client('tv_embedded', 'web_creator')
status = traverse_obj(pr, ('playabilityStatus', 'status', {str}))
if status not in ('OK', 'LIVE_STREAM_OFFLINE', 'AGE_CHECK_REQUIRED', 'AGE_VERIFICATION_REQUIRED'):
self.write_debug(f'{video_id}: {client} player response playability status: {status}')
prs.extend(deprioritized_prs) prs.extend(deprioritized_prs)
if skipped_clients: if skipped_clients:
@@ -3327,6 +3348,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}), }),
} for range_start in range(0, f['filesize'], CHUNK_SIZE)) } for range_start in range(0, f['filesize'], CHUNK_SIZE))
def gvs_pot_required(policy, is_premium_subscriber, has_player_token):
return (
policy.required
and not (policy.not_required_with_player_token and has_player_token)
and not (policy.not_required_for_premium and is_premium_subscriber))
# save pots per client to avoid fetching again
gvs_pots = {}
for fmt in streaming_formats: for fmt in streaming_formats:
client_name = fmt[STREAMING_DATA_CLIENT_NAME] client_name = fmt[STREAMING_DATA_CLIENT_NAME]
if fmt.get('targetDurationSec'): if fmt.get('targetDurationSec'):
@@ -3386,7 +3416,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
encrypted_sig = try_get(sc, lambda x: x['s'][0]) encrypted_sig = try_get(sc, lambda x: x['s'][0])
if not all((sc, fmt_url, player_url, encrypted_sig)): if not all((sc, fmt_url, player_url, encrypted_sig)):
msg = f'Some {client_name} client https formats have been skipped as they are missing a url. ' msg = f'Some {client_name} client https formats have been skipped as they are missing a url. '
if client_name == 'web': if client_name in ('web', 'web_safari'):
msg += 'YouTube is forcing SABR streaming for this client. ' msg += 'YouTube is forcing SABR streaming for this client. '
else: else:
msg += ( msg += (
@@ -3446,18 +3476,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.report_warning( self.report_warning(
'Some formats are possibly damaged. They will be deprioritized', video_id, only_once=True) 'Some formats are possibly damaged. They will be deprioritized', video_id, only_once=True)
po_token = fmt.get(STREAMING_DATA_INITIAL_PO_TOKEN) fetch_po_token_func = fmt[STREAMING_DATA_FETCH_GVS_PO_TOKEN]
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.HTTPS]
require_po_token = (
itag not in ['18']
and gvs_pot_required(
pot_policy, fmt[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER],
fmt[STREAMING_DATA_PLAYER_TOKEN_PROVIDED]))
po_token = (
gvs_pots.get(client_name)
or fetch_po_token_func(required=require_po_token or pot_policy.recommended))
if po_token: if po_token:
fmt_url = update_url_query(fmt_url, {'pot': po_token}) fmt_url = update_url_query(fmt_url, {'pot': po_token})
if client_name not in gvs_pots:
gvs_pots[client_name] = po_token
# Clients that require PO Token return videoplayback URLs that may return 403 if not po_token and require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
require_po_token = (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and itag not in ['18']) # these formats do not require PO Token
if require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, 'https') self._report_pot_format_skipped(video_id, client_name, 'https')
continue continue
@@ -3472,7 +3509,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
name, fmt.get('isDrc') and 'DRC', name, fmt.get('isDrc') and 'DRC',
try_get(fmt, lambda x: x['projectionType'].replace('RECTANGULAR', '').lower()), try_get(fmt, lambda x: x['projectionType'].replace('RECTANGULAR', '').lower()),
try_get(fmt, lambda x: x['spatialAudioType'].replace('SPATIAL_AUDIO_TYPE_', '').lower()), try_get(fmt, lambda x: x['spatialAudioType'].replace('SPATIAL_AUDIO_TYPE_', '').lower()),
is_damaged and 'DAMAGED', require_po_token and 'MISSING POT', is_damaged and 'DAMAGED', require_po_token and not po_token and 'MISSING POT',
(self.get_param('verbose') or all_formats) and short_client_name(client_name), (self.get_param('verbose') or all_formats) and short_client_name(client_name),
delim=', '), delim=', '),
# Format 22 is likely to be damaged. See https://github.com/yt-dlp/yt-dlp/issues/3372 # Format 22 is likely to be damaged. See https://github.com/yt-dlp/yt-dlp/issues/3372
@@ -3535,7 +3572,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
elif skip_bad_formats and live_status == 'is_live' and needs_live_processing != 'is_live': elif skip_bad_formats and live_status == 'is_live' and needs_live_processing != 'is_live':
skip_manifests.add('dash') skip_manifests.add('dash')
def process_manifest_format(f, proto, client_name, itag, po_token): def process_manifest_format(f, proto, client_name, itag, missing_pot):
key = (proto, f.get('language')) key = (proto, f.get('language'))
if not all_formats and key in itags[itag]: if not all_formats and key in itags[itag]:
return False return False
@@ -3543,20 +3580,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if f.get('source_preference') is None: if f.get('source_preference') is None:
f['source_preference'] = -1 f['source_preference'] = -1
# Clients that require PO Token return videoplayback URLs that may return 403 if missing_pot:
# hls does not currently require PO Token
if (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and proto != 'hls'
):
if 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, proto)
return False
f['format_note'] = join_nonempty(f.get('format_note'), 'MISSING POT', delim=' ') f['format_note'] = join_nonempty(f.get('format_note'), 'MISSING POT', delim=' ')
f['source_preference'] -= 20 f['source_preference'] -= 20
# XXX: Check if IOS HLS formats are affected by player PO token enforcement; temporary # XXX: Check if IOS HLS formats are affected by PO token enforcement; temporary
# See https://github.com/yt-dlp/yt-dlp/issues/13511 # See https://github.com/yt-dlp/yt-dlp/issues/13511
if proto == 'hls' and client_name == 'ios': if proto == 'hls' and client_name == 'ios':
f['__needs_testing'] = True f['__needs_testing'] = True
@@ -3595,39 +3623,62 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
subtitles = {} subtitles = {}
for sd in streaming_data: for sd in streaming_data:
client_name = sd[STREAMING_DATA_CLIENT_NAME] client_name = sd[STREAMING_DATA_CLIENT_NAME]
po_token = sd.get(STREAMING_DATA_INITIAL_PO_TOKEN) fetch_pot_func = sd[STREAMING_DATA_FETCH_GVS_PO_TOKEN]
is_premium_subscriber = sd[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER]
has_player_token = sd[STREAMING_DATA_PLAYER_TOKEN_PROVIDED]
hls_manifest_url = 'hls' not in skip_manifests and sd.get('hlsManifestUrl') hls_manifest_url = 'hls' not in skip_manifests and sd.get('hlsManifestUrl')
if hls_manifest_url: if hls_manifest_url:
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(
client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.HLS]
require_po_token = gvs_pot_required(pot_policy, is_premium_subscriber, has_player_token)
po_token = gvs_pots.get(client_name, fetch_pot_func(required=require_po_token or pot_policy.recommended))
if po_token: if po_token:
hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}' hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}'
fmts, subs = self._extract_m3u8_formats_and_subtitles( if client_name not in gvs_pots:
hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live') gvs_pots[client_name] = po_token
for sub in traverse_obj(subs, (..., ..., {dict})): if require_po_token and not po_token and 'missing_pot' not in self._configuration_arg('formats'):
# HLS subs (m3u8) do not need a PO token; save client name for debugging self._report_pot_format_skipped(video_id, client_name, 'hls')
sub[STREAMING_DATA_CLIENT_NAME] = client_name else:
subtitles = self._merge_subtitles(subs, subtitles) fmts, subs = self._extract_m3u8_formats_and_subtitles(
for f in fmts: hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
if process_manifest_format(f, 'hls', client_name, self._search_regex( for sub in traverse_obj(subs, (..., ..., {dict})):
r'/itag/(\d+)', f['url'], 'itag', default=None), po_token): # TODO: If HLS video requires a PO Token, do the subs also require pot?
yield f # Save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles)
for f in fmts:
if process_manifest_format(f, 'hls', client_name, self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None), require_po_token and not po_token):
yield f
dash_manifest_url = 'dash' not in skip_manifests and sd.get('dashManifestUrl') dash_manifest_url = 'dash' not in skip_manifests and sd.get('dashManifestUrl')
if dash_manifest_url: if dash_manifest_url:
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(
client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.DASH]
require_po_token = gvs_pot_required(pot_policy, is_premium_subscriber, has_player_token)
po_token = gvs_pots.get(client_name, fetch_pot_func(required=require_po_token or pot_policy.recommended))
if po_token: if po_token:
dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}' dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}'
formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False) if client_name not in gvs_pots:
for sub in traverse_obj(subs, (..., ..., {dict})): gvs_pots[client_name] = po_token
# TODO: Investigate if DASH subs ever need a PO token; save client name for debugging if require_po_token and not po_token and 'missing_pot' not in self._configuration_arg('formats'):
sub[STREAMING_DATA_CLIENT_NAME] = client_name self._report_pot_format_skipped(video_id, client_name, 'dash')
subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH else:
for f in formats: formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token): for sub in traverse_obj(subs, (..., ..., {dict})):
f['filesize'] = int_or_none(self._search_regex( # TODO: If DASH video requires a PO Token, do the subs also require pot?
r'/clen/(\d+)', f.get('fragment_base_url') or f['url'], 'file size', default=None)) # Save client name for debugging
if needs_live_processing: sub[STREAMING_DATA_CLIENT_NAME] = client_name
f['is_from_start'] = True subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH
for f in formats:
if process_manifest_format(f, 'dash', client_name, f['format_id'], require_po_token and not po_token):
f['filesize'] = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url') or f['url'], 'file size', default=None))
if needs_live_processing:
f['is_from_start'] = True
yield f yield f
yield subtitles yield subtitles
def _extract_storyboard(self, player_responses, duration): def _extract_storyboard(self, player_responses, duration):
@@ -3668,22 +3719,22 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
} for j in range(math.ceil(fragment_count))], } for j in range(math.ceil(fragment_count))],
} }
def _download_player_responses(self, url, smuggled_data, video_id, webpage_url): def _download_initial_webpage(self, webpage_url, webpage_client, video_id):
webpage = None webpage = None
if 'webpage' not in self._configuration_arg('player_skip'): if webpage_url and 'webpage' not in self._configuration_arg('player_skip'):
query = {'bpctr': '9999999999', 'has_verified': '1'} query = {'bpctr': '9999999999', 'has_verified': '1'}
pp = self._configuration_arg('player_params', [None], casesense=True)[0] pp = (
self._configuration_arg('player_params', [None], casesense=True)[0]
or traverse_obj(INNERTUBE_CLIENTS, (webpage_client, 'PLAYER_PARAMS', {str}))
)
if pp: if pp:
query['pp'] = pp query['pp'] = pp
webpage = self._download_webpage_with_retries(webpage_url, video_id, query=query) webpage = self._download_webpage_with_retries(
webpage_url, video_id, query=query,
master_ytcfg = self.extract_ytcfg(video_id, webpage) or self._get_default_ytcfg() headers=traverse_obj(self._get_default_ytcfg(webpage_client), {
'User-Agent': ('INNERTUBE_CONTEXT', 'client', 'userAgent', {str}),
player_responses, player_url = self._extract_player_responses( }))
self._get_requested_clients(url, smuggled_data), return webpage
video_id, webpage, master_ytcfg, smuggled_data)
return webpage, master_ytcfg, player_responses, player_url
def _list_formats(self, video_id, microformats, video_details, player_responses, player_url, duration=None): def _list_formats(self, video_id, microformats, video_details, player_responses, player_url, duration=None):
live_broadcast_details = traverse_obj(microformats, (..., 'liveBroadcastDetails')) live_broadcast_details = traverse_obj(microformats, (..., 'liveBroadcastDetails'))
@@ -3708,14 +3759,60 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return live_broadcast_details, live_status, streaming_data, formats, subtitles return live_broadcast_details, live_status, streaming_data, formats, subtitles
def _download_initial_data(self, video_id, webpage, webpage_client, webpage_ytcfg):
initial_data = None
if webpage and 'initial_data' not in self._configuration_arg('webpage_skip'):
initial_data = self.extract_yt_initial_data(video_id, webpage, fatal=False)
if not traverse_obj(initial_data, 'contents'):
self.report_warning('Incomplete data received in embedded initial data; re-fetching using API.')
initial_data = None
if not initial_data and 'initial_data' not in self._configuration_arg('player_skip'):
query = {'videoId': video_id}
query.update(self._get_checkok_params())
initial_data = self._extract_response(
item_id=video_id, ep='next', fatal=False,
ytcfg=webpage_ytcfg, query=query, check_get_keys='contents',
note='Downloading initial data API JSON', default_client=webpage_client)
return initial_data
def _is_premium_subscriber(self, initial_data):
if not self.is_authenticated or not initial_data:
return False
tlr = traverse_obj(
initial_data, ('topbar', 'desktopTopbarRenderer', 'logo', 'topbarLogoRenderer'))
return (
traverse_obj(tlr, ('iconImage', 'iconType')) == 'YOUTUBE_PREMIUM_LOGO'
or 'premium' in (self._get_text(tlr, 'tooltipText') or '').lower()
)
def _initial_extract(self, url, smuggled_data, webpage_url, webpage_client, video_id):
# This function is also used by live-from-start refresh
webpage = self._download_initial_webpage(webpage_url, webpage_client, video_id)
webpage_ytcfg = self.extract_ytcfg(video_id, webpage) or self._get_default_ytcfg(webpage_client)
initial_data = self._download_initial_data(video_id, webpage, webpage_client, webpage_ytcfg)
is_premium_subscriber = self._is_premium_subscriber(initial_data)
if is_premium_subscriber:
self.write_debug('Detected YouTube Premium subscription')
player_responses, player_url = self._extract_player_responses(
self._get_requested_clients(url, smuggled_data, is_premium_subscriber),
video_id, webpage, webpage_client, webpage_ytcfg, is_premium_subscriber)
return webpage, webpage_ytcfg, initial_data, is_premium_subscriber, player_responses, player_url
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
base_url = self.http_scheme() + '//www.youtube.com/' base_url = self.http_scheme() + '//www.youtube.com/'
webpage_url = base_url + 'watch?v=' + video_id webpage_url = base_url + 'watch?v=' + video_id
webpage_client = 'web'
webpage, master_ytcfg, player_responses, player_url = self._download_player_responses(url, smuggled_data, video_id, webpage_url) webpage, webpage_ytcfg, initial_data, is_premium_subscriber, player_responses, player_url = self._initial_extract(
url, smuggled_data, webpage_url, webpage_client, video_id)
playability_statuses = traverse_obj( playability_statuses = traverse_obj(
player_responses, (..., 'playabilityStatus'), expected_type=dict) player_responses, (..., 'playabilityStatus'), expected_type=dict)
@@ -3952,11 +4049,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def process_language(container, base_url, lang_code, sub_name, client_name, query): def process_language(container, base_url, lang_code, sub_name, client_name, query):
lang_subs = container.setdefault(lang_code, []) lang_subs = container.setdefault(lang_code, [])
for fmt in self._SUBTITLE_FORMATS: for fmt in self._SUBTITLE_FORMATS:
query = {**query, 'fmt': fmt} # xosf=1 results in undesirable text position data for vtt, json3 & srv* subtitles
# See: https://github.com/yt-dlp/yt-dlp/issues/13654
query = {**query, 'fmt': fmt, 'xosf': []}
lang_subs.append({ lang_subs.append({
'ext': fmt, 'ext': fmt,
'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)), 'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
'name': sub_name, 'name': sub_name,
'impersonate': True,
STREAMING_DATA_CLIENT_NAME: client_name, STREAMING_DATA_CLIENT_NAME: client_name,
}) })
@@ -3988,7 +4088,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
pctr = pr['captions']['playerCaptionsTracklistRenderer'] pctr = pr['captions']['playerCaptionsTracklistRenderer']
client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME] client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME]
innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName'] innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName']
required_contexts = self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS'] pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(client_name)['SUBS_PO_TOKEN_POLICY']
fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN] fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN]
pot_params = {} pot_params = {}
@@ -4001,11 +4101,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
requires_pot = ( requires_pot = (
# We can detect the experiment for now # We can detect the experiment for now
any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv')) any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv'))
or _PoTokenContext.SUBS in required_contexts) or (pot_policy.required and not (pot_policy.not_required_for_premium and is_premium_subscriber)))
if not already_fetched_pot: if not already_fetched_pot:
already_fetched_pot = True already_fetched_pot = True
if subs_po_token := fetch_subs_po_token_func(required=requires_pot): if subs_po_token := fetch_subs_po_token_func(required=requires_pot or pot_policy.recommended):
pot_params.update({ pot_params.update({
'pot': subs_po_token, 'pot': subs_po_token,
'potc': '1', 'potc': '1',
@@ -4108,21 +4208,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'release_year': int_or_none(release_year), 'release_year': int_or_none(release_year),
}) })
initial_data = None
if webpage:
initial_data = self.extract_yt_initial_data(video_id, webpage, fatal=False)
if not traverse_obj(initial_data, 'contents'):
self.report_warning('Incomplete data received in embedded initial data; re-fetching using API.')
initial_data = None
if not initial_data and 'initial_data' not in self._configuration_arg('player_skip'):
query = {'videoId': video_id}
query.update(self._get_checkok_params())
initial_data = self._extract_response(
item_id=video_id, ep='next', fatal=False,
ytcfg=master_ytcfg, query=query, check_get_keys='contents',
headers=self.generate_api_headers(ytcfg=master_ytcfg),
note='Downloading initial data API JSON')
COMMENTS_SECTION_IDS = ('comment-item-section', 'engagement-panel-comments-section') COMMENTS_SECTION_IDS = ('comment-item-section', 'engagement-panel-comments-section')
info['comment_count'] = traverse_obj(initial_data, ( info['comment_count'] = traverse_obj(initial_data, (
'contents', 'twoColumnWatchNextResults', 'results', 'results', 'contents', ..., 'itemSectionRenderer', 'contents', 'twoColumnWatchNextResults', 'results', 'results', 'contents', ..., 'itemSectionRenderer',
@@ -4321,7 +4406,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self._has_badge(badges, BadgeType.AVAILABILITY_UNLISTED) self._has_badge(badges, BadgeType.AVAILABILITY_UNLISTED)
or get_first(microformats, 'isUnlisted', expected_type=bool)))) or get_first(microformats, 'isUnlisted', expected_type=bool))))
info['__post_extractor'] = self.extract_comments(master_ytcfg, video_id, contents, webpage) info['__post_extractor'] = self.extract_comments(webpage_ytcfg, video_id, contents, webpage)
self.mark_watched(video_id, player_responses) self.mark_watched(video_id, player_responses)

View File

@@ -222,6 +222,14 @@ class LocalNameSpace(collections.ChainMap):
def __delitem__(self, key): def __delitem__(self, key):
raise NotImplementedError('Deleting is not supported') raise NotImplementedError('Deleting is not supported')
def set_local(self, key, value):
self.maps[0][key] = value
def get_local(self, key):
if key in self.maps[0]:
return self.maps[0][key]
return JS_Undefined
class Debugger: class Debugger:
import sys import sys
@@ -271,6 +279,7 @@ class JSInterpreter:
def __init__(self, code, objects=None): def __init__(self, code, objects=None):
self.code, self._functions = code, {} self.code, self._functions = code, {}
self._objects = {} if objects is None else objects self._objects = {} if objects is None else objects
self._undefined_varnames = set()
class Exception(ExtractorError): # noqa: A001 class Exception(ExtractorError): # noqa: A001
def __init__(self, msg, expr=None, *args, **kwargs): def __init__(self, msg, expr=None, *args, **kwargs):
@@ -381,7 +390,7 @@ class JSInterpreter:
return self._named_object(namespace, obj) return self._named_object(namespace, obj)
@Debugger.wrap_interpreter @Debugger.wrap_interpreter
def interpret_statement(self, stmt, local_vars, allow_recursion=100): def interpret_statement(self, stmt, local_vars, allow_recursion=100, _is_var_declaration=False):
if allow_recursion < 0: if allow_recursion < 0:
raise self.Exception('Recursion limit reached') raise self.Exception('Recursion limit reached')
allow_recursion -= 1 allow_recursion -= 1
@@ -401,6 +410,7 @@ class JSInterpreter:
if m.group('throw'): if m.group('throw'):
raise JS_Throw(self.interpret_expression(expr, local_vars, allow_recursion)) raise JS_Throw(self.interpret_expression(expr, local_vars, allow_recursion))
should_return = not m.group('var') should_return = not m.group('var')
_is_var_declaration = _is_var_declaration or bool(m.group('var'))
if not expr: if not expr:
return None, should_return return None, should_return
@@ -585,7 +595,8 @@ class JSInterpreter:
sub_expressions = list(self._separate(expr)) sub_expressions = list(self._separate(expr))
if len(sub_expressions) > 1: if len(sub_expressions) > 1:
for sub_expr in sub_expressions: for sub_expr in sub_expressions:
ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion) ret, should_abort = self.interpret_statement(
sub_expr, local_vars, allow_recursion, _is_var_declaration=_is_var_declaration)
if should_abort: if should_abort:
return ret, True return ret, True
return ret, False return ret, False
@@ -599,8 +610,12 @@ class JSInterpreter:
left_val = local_vars.get(m.group('out')) left_val = local_vars.get(m.group('out'))
if not m.group('index'): if not m.group('index'):
local_vars[m.group('out')] = self._operator( eval_result = self._operator(
m.group('op'), left_val, m.group('expr'), expr, local_vars, allow_recursion) m.group('op'), left_val, m.group('expr'), expr, local_vars, allow_recursion)
if _is_var_declaration:
local_vars.set_local(m.group('out'), eval_result)
else:
local_vars[m.group('out')] = eval_result
return local_vars[m.group('out')], should_return return local_vars[m.group('out')], should_return
elif left_val in (None, JS_Undefined): elif left_val in (None, JS_Undefined):
raise self.Exception(f'Cannot index undefined variable {m.group("out")}', expr) raise self.Exception(f'Cannot index undefined variable {m.group("out")}', expr)
@@ -654,7 +669,19 @@ class JSInterpreter:
return float('NaN'), should_return return float('NaN'), should_return
elif m and m.group('return'): elif m and m.group('return'):
return local_vars.get(m.group('name'), JS_Undefined), should_return var = m.group('name')
# Declared variables
if _is_var_declaration:
ret = local_vars.get_local(var)
# Register varname in local namespace
# Set value as JS_Undefined or its pre-existing value
local_vars.set_local(var, ret)
else:
ret = local_vars.get(var, NO_DEFAULT)
if ret is NO_DEFAULT:
ret = JS_Undefined
self._undefined_varnames.add(var)
return ret, should_return
with contextlib.suppress(ValueError): with contextlib.suppress(ValueError):
return json.loads(js_to_json(expr, strict=True)), should_return return json.loads(js_to_json(expr, strict=True)), should_return
@@ -857,7 +884,7 @@ class JSInterpreter:
obj = {} obj = {}
obj_m = re.search( obj_m = re.search(
r'''(?x) r'''(?x)
(?<!\.)%s\s*=\s*{\s* (?<![a-zA-Z$0-9.])%s\s*=\s*{\s*
(?P<fields>(%s\s*:\s*function\s*\(.*?\)\s*{.*?}(?:,\s*)?)*) (?P<fields>(%s\s*:\s*function\s*\(.*?\)\s*{.*?}(?:,\s*)?)*)
}\s*; }\s*;
''' % (re.escape(objname), _FUNC_NAME_RE), ''' % (re.escape(objname), _FUNC_NAME_RE),

View File

@@ -140,6 +140,12 @@ class RequestsResponseAdapter(Response):
def read(self, amt: int | None = None): def read(self, amt: int | None = None):
try: try:
# Work around issue with `.read(amt)` then `.read()`
# See: https://github.com/urllib3/urllib3/issues/3636
if amt is None:
# Python 3.9 preallocates the whole read buffer, read in chunks
read_chunk = functools.partial(self.fp.read, 1 << 20, decode_content=True)
return b''.join(iter(read_chunk, b''))
# Interact with urllib3 response directly. # Interact with urllib3 response directly.
return self.fp.read(amt, decode_content=True) return self.fp.read(amt, decode_content=True)
@@ -307,7 +313,7 @@ class RequestsRH(RequestHandler, InstanceStoreMixin):
max_retries=urllib3.util.retry.Retry(False), max_retries=urllib3.util.retry.Retry(False),
) )
session.adapters.clear() session.adapters.clear()
session.headers = requests.models.CaseInsensitiveDict({'Connection': 'keep-alive'}) session.headers = requests.models.CaseInsensitiveDict()
session.mount('https://', http_adapter) session.mount('https://', http_adapter)
session.mount('http://', http_adapter) session.mount('http://', http_adapter)
session.cookies = cookiejar session.cookies = cookiejar
@@ -316,6 +322,7 @@ class RequestsRH(RequestHandler, InstanceStoreMixin):
def _prepare_headers(self, _, headers): def _prepare_headers(self, _, headers):
add_accept_encoding_header(headers, SUPPORTED_ENCODINGS) add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)
headers.setdefault('Connection', 'keep-alive')
def _send(self, request): def _send(self, request):

View File

@@ -529,14 +529,14 @@ def create_parser():
'no-attach-info-json', 'embed-thumbnail-atomicparsley', 'no-external-downloader-progress', 'no-attach-info-json', 'embed-thumbnail-atomicparsley', 'no-external-downloader-progress',
'embed-metadata', 'seperate-video-versions', 'no-clean-infojson', 'no-keep-subs', 'no-certifi', 'embed-metadata', 'seperate-video-versions', 'no-clean-infojson', 'no-keep-subs', 'no-certifi',
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-youtube-prefer-utc-upload-date', 'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-youtube-prefer-utc-upload-date',
'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext', 'prefer-vp9-sort', 'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext', 'prefer-vp9-sort', 'mtime-by-default',
}, 'aliases': { }, 'aliases': {
'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'], 'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'],
'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'], 'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'],
'2021': ['2022', 'no-certifi', 'filename-sanitization'], '2021': ['2022', 'no-certifi', 'filename-sanitization'],
'2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter', 'prefer-legacy-http-handler', 'manifest-filesize-approx'], '2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter', 'prefer-legacy-http-handler', 'manifest-filesize-approx'],
'2023': ['2024', 'prefer-vp9-sort'], '2023': ['2024', 'prefer-vp9-sort'],
'2024': [], '2024': ['mtime-by-default'],
}, },
}, help=( }, help=(
'Options that can help keep compatibility with youtube-dl or youtube-dlc ' 'Options that can help keep compatibility with youtube-dl or youtube-dlc '
@@ -1466,12 +1466,12 @@ def create_parser():
help='Do not use .part files - write directly into output file') help='Do not use .part files - write directly into output file')
filesystem.add_option( filesystem.add_option(
'--mtime', '--mtime',
action='store_true', dest='updatetime', default=True, action='store_true', dest='updatetime', default=None,
help='Use the Last-modified header to set the file modification time (default)') help='Use the Last-modified header to set the file modification time')
filesystem.add_option( filesystem.add_option(
'--no-mtime', '--no-mtime',
action='store_false', dest='updatetime', action='store_false', dest='updatetime',
help='Do not use the Last-modified header to set the file modification time') help='Do not use the Last-modified header to set the file modification time (default)')
filesystem.add_option( filesystem.add_option(
'--write-description', '--write-description',
action='store_true', dest='writedescription', default=False, action='store_true', dest='writedescription', default=False,

View File

@@ -18,7 +18,7 @@ class ExecPP(PostProcessor):
if filepath: if filepath:
if '{}' not in cmd: if '{}' not in cmd:
cmd += ' {}' cmd += ' {}'
cmd = cmd.replace('{}', shell_quote(filepath)) cmd = cmd.replace('{}', shell_quote(filepath, shell=True))
return cmd return cmd
def run(self, info): def run(self, info):

View File

@@ -1285,7 +1285,7 @@ def unified_timestamp(date_str, day_first=True):
timetuple = email.utils.parsedate_tz(date_str) timetuple = email.utils.parsedate_tz(date_str)
if timetuple: if timetuple:
return calendar.timegm(timetuple) + pm_delta * 3600 - timezone.total_seconds() return calendar.timegm(timetuple) + pm_delta * 3600 - int(timezone.total_seconds())
@partial_application @partial_application
@@ -2961,6 +2961,7 @@ def mimetype2ext(mt, default=NO_DEFAULT):
'audio/x-matroska': 'mka', 'audio/x-matroska': 'mka',
'audio/x-mpegurl': 'm3u', 'audio/x-mpegurl': 'm3u',
'aacp': 'aac', 'aacp': 'aac',
'flac': 'flac',
'midi': 'mid', 'midi': 'mid',
'ogg': 'ogg', 'ogg': 'ogg',
'wav': 'wav', 'wav': 'wav',
@@ -3105,21 +3106,15 @@ def get_compatible_ext(*, vcodecs, acodecs, vexts, aexts, preferences=None):
def urlhandle_detect_ext(url_handle, default=NO_DEFAULT): def urlhandle_detect_ext(url_handle, default=NO_DEFAULT):
getheader = url_handle.headers.get getheader = url_handle.headers.get
cd = getheader('Content-Disposition') if cd := getheader('Content-Disposition'):
if cd: if m := re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd):
m = re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd) if ext := determine_ext(m.group('filename'), default_ext=None):
if m: return ext
e = determine_ext(m.group('filename'), default_ext=None)
if e:
return e
meta_ext = getheader('x-amz-meta-name') return (
if meta_ext: determine_ext(getheader('x-amz-meta-name'), default_ext=None)
e = meta_ext.rpartition('.')[2] or getheader('x-amz-meta-file-type')
if e: or mimetype2ext(getheader('Content-Type'), default=default))
return e
return mimetype2ext(getheader('Content-Type'), default=default)
def encode_data_uri(data, mime_type): def encode_data_uri(data, mime_type):

View File

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2025.06.25' __version__ = '2025.07.21'
RELEASE_GIT_HEAD = '1838a1ce5d4ade80770ba9162eaffc9a1607dc70' RELEASE_GIT_HEAD = '9951fdd0d08b655cb1af8cd7f32a3fb7e2b1324e'
VARIANT = None VARIANT = None
@@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2025.06.25' _pkg_version = '2025.07.21'