
Merge branch 'master' into niconico_error

commit eb80d73363
doe1080, 2025-06-27 06:30:06 +09:00, committed by GitHub
87 changed files with 3197 additions and 1534 deletions

.github/workflows/build.yml

@@ -256,7 +256,7 @@ jobs:
       with:
         path: |
           ~/yt-dlp-build-venv
-        key: cache-reqs-${{ github.job }}
+        key: cache-reqs-${{ github.job }}-${{ github.ref }}
     - name: Install Requirements
       run: |
@@ -331,19 +331,16 @@ jobs:
       if: steps.restore-cache.outputs.cache-hit == 'true'
       env:
         GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        cache_key: cache-reqs-${{ github.job }}
-        repository: ${{ github.repository }}
-        branch: ${{ github.ref }}
+        cache_key: cache-reqs-${{ github.job }}-${{ github.ref }}
       run: |
-        gh extension install actions/gh-actions-cache
-        gh actions-cache delete "${cache_key}" -R "${repository}" -B "${branch}" --confirm
+        gh cache delete "${cache_key}"
     - name: Cache requirements
       uses: actions/cache/save@v4
       with:
         path: |
           ~/yt-dlp-build-venv
-        key: cache-reqs-${{ github.job }}
+        key: cache-reqs-${{ github.job }}-${{ github.ref }}

   macos_legacy:
     needs: process
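The cleanup step above switches from the archived `actions/gh-actions-cache` extension to the `gh cache` subcommand built into recent GitHub CLI releases. A hypothetical manual equivalent (the job name and ref in the key are illustrative):

    gh cache delete "cache-reqs-macos-refs/heads/master" --repo yt-dlp/yt-dlp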

.gitignore

@@ -105,6 +105,8 @@ README.txt
 *.zsh
 *.spec
 test/testdata/sigs/player-*.js
+test/testdata/thumbnails/empty.webp
+test/testdata/thumbnails/foo\ %d\ bar/foo_%d.*

 # Binary
 /youtube-dl

CONTRIBUTORS

@@ -770,3 +770,14 @@ NeonMan
 pj47x
 troex
 WouterGordts
+baierjan
+GeoffreyFrogeye
+Pawka
+v3DJG6GL
+yozel
+brian6932
+iednod55
+maxbin123
+nullpos
+anlar
+eason1478

Changelog.md

@@ -4,6 +4,126 @@ # Changelog
 # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
 -->
### 2025.06.25
#### Extractor changes
- [Add `_search_nuxt_json` helper](https://github.com/yt-dlp/yt-dlp/commit/51887484e46ab6015c041cb1ab626a55f25a03bd) ([#13386](https://github.com/yt-dlp/yt-dlp/issues/13386)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- **brightcove**: new: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/e6bd4a3da295b760ab20b39c18ce8934d312c2bf) ([#13461](https://github.com/yt-dlp/yt-dlp/issues/13461)) by [doe1080](https://github.com/doe1080)
- **huya**: live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2600849badb0d08c55b58dcc77a13af6ba423da6) ([#13520](https://github.com/yt-dlp/yt-dlp/issues/13520)) by [doe1080](https://github.com/doe1080)
- **hypergryph**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/1722c55400ff30bb5aee5dd7a262f0b7e9ce2f0e) ([#13415](https://github.com/yt-dlp/yt-dlp/issues/13415)) by [doe1080](https://github.com/doe1080), [eason1478](https://github.com/eason1478)
- **lsm**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/c57412d1f9cf0124adc972a47858ac42b740c61d) ([#13126](https://github.com/yt-dlp/yt-dlp/issues/13126)) by [Caesim404](https://github.com/Caesim404)
- **mave**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1838a1ce5d4ade80770ba9162eaffc9a1607dc70) ([#13380](https://github.com/yt-dlp/yt-dlp/issues/13380)) by [anlar](https://github.com/anlar)
- **sportdeutschland**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/a4ce4327c9836691d3b6b00e44a90b6741601ed8) ([#13519](https://github.com/yt-dlp/yt-dlp/issues/13519)) by [DTrombett](https://github.com/DTrombett)
- **sproutvideo**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/5b559d0072b7164daf06bacdc41c6f11283452c8) ([#13544](https://github.com/yt-dlp/yt-dlp/issues/13544)) by [bashonly](https://github.com/bashonly)
- **tv8.it**: [Support slugless URLs](https://github.com/yt-dlp/yt-dlp/commit/3bd30291601c47fa4a257983473884103ecab0c7) ([#13478](https://github.com/yt-dlp/yt-dlp/issues/13478)) by [DTrombett](https://github.com/DTrombett)
- **youtube**
- [Check any `ios` m3u8 formats prior to download](https://github.com/yt-dlp/yt-dlp/commit/8f94b76cbf7bbd9dfd8762c63cdea04f90f1297f) ([#13524](https://github.com/yt-dlp/yt-dlp/issues/13524)) by [bashonly](https://github.com/bashonly)
- [Improve player context payloads](https://github.com/yt-dlp/yt-dlp/commit/ff6f94041aeee19c5559e1c1cd693960a1c1dd14) ([#13539](https://github.com/yt-dlp/yt-dlp/issues/13539)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **test**: `traversal`: [Fix morsel tests for Python 3.14](https://github.com/yt-dlp/yt-dlp/commit/73bf10211668e4a59ccafd790e06ee82d9fea9ea) ([#13471](https://github.com/yt-dlp/yt-dlp/issues/13471)) by [Grub4K](https://github.com/Grub4K)
### 2025.06.09
#### Extractor changes
- [Improve JSON LD thumbnails extraction](https://github.com/yt-dlp/yt-dlp/commit/85c8a405e3651dc041b758f4744d4fb3c4c55e01) ([#13368](https://github.com/yt-dlp/yt-dlp/issues/13368)) by [bashonly](https://github.com/bashonly), [doe1080](https://github.com/doe1080)
- **10play**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6d265388c6e943419ac99e9151cf75a3265f980f) ([#13349](https://github.com/yt-dlp/yt-dlp/issues/13349)) by [bashonly](https://github.com/bashonly)
- **adobepass**
- [Add Fubo MSO](https://github.com/yt-dlp/yt-dlp/commit/eee90acc47d7f8de24afaa8b0271ccaefdf6e88c) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [maxbin123](https://github.com/maxbin123)
- [Always add newer user-agent when required](https://github.com/yt-dlp/yt-dlp/commit/0ee1102268cf31b07f8a8318a47424c66b2f7378) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- [Fix Philo MSO authentication](https://github.com/yt-dlp/yt-dlp/commit/943083edcd3df45aaa597a6967bc6c95b720f54c) ([#13335](https://github.com/yt-dlp/yt-dlp/issues/13335)) by [Sipherdrakon](https://github.com/Sipherdrakon)
- [Rework to require software statement](https://github.com/yt-dlp/yt-dlp/commit/711c5d5d098fee2992a1a624b1c4b30364b91426) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly), [maxbin123](https://github.com/maxbin123)
- [Validate login URL before sending credentials](https://github.com/yt-dlp/yt-dlp/commit/89c1b349ad81318d9d3bea76c01c891696e58d38) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **aenetworks**
- [Fix playlist extractors](https://github.com/yt-dlp/yt-dlp/commit/f37d599a697e82fe68b423865897d55bae34f373) ([#13408](https://github.com/yt-dlp/yt-dlp/issues/13408)) by [Sipherdrakon](https://github.com/Sipherdrakon)
- [Fix provider-locked content extraction](https://github.com/yt-dlp/yt-dlp/commit/6693d6603358ae6beca834dbd822a7917498b813) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [maxbin123](https://github.com/maxbin123)
- **bilibilibangumi**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/13e55162719528d42d2133e16b65ff59a667a6e4) ([#13416](https://github.com/yt-dlp/yt-dlp/issues/13416)) by [c-basalt](https://github.com/c-basalt)
- **brightcove**: new: [Adapt to new AdobePass requirement](https://github.com/yt-dlp/yt-dlp/commit/98f8eec956e3b16cb66a3d49cc71af3807db795e) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **cu.ntv.co.jp**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/aa863ddab9b1d104678e9cf39bb76f5b14fca660) ([#13302](https://github.com/yt-dlp/yt-dlp/issues/13302)) by [doe1080](https://github.com/doe1080), [nullpos](https://github.com/nullpos)
- **go**: [Fix provider-locked content extraction](https://github.com/yt-dlp/yt-dlp/commit/2e5bf002dad16f5ce35aa2023d392c9e518fcd8f) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly), [maxbin123](https://github.com/maxbin123)
- **nbc**: [Rework and adapt extractors to new AdobePass flow](https://github.com/yt-dlp/yt-dlp/commit/2d7949d5642bc37d1e71bf00c9a55260e5505d58) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **nobelprize**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/97ddfefeb4faba6e61cd80996c16952b8eab16f3) ([#13205](https://github.com/yt-dlp/yt-dlp/issues/13205)) by [doe1080](https://github.com/doe1080)
- **odnoklassniki**: [Detect and raise when login is required](https://github.com/yt-dlp/yt-dlp/commit/148a1eb4c59e127965396c7a6e6acf1979de459e) ([#13361](https://github.com/yt-dlp/yt-dlp/issues/13361)) by [bashonly](https://github.com/bashonly)
- **patreon**: [Fix m3u8 formats extraction](https://github.com/yt-dlp/yt-dlp/commit/e0d6c0822930f6e63f574d46d946a58b73ecd10c) ([#13266](https://github.com/yt-dlp/yt-dlp/issues/13266)) by [bashonly](https://github.com/bashonly) (With fixes in [1a8a03e](https://github.com/yt-dlp/yt-dlp/commit/1a8a03ea8d827107319a18076ee3505090667c5a))
- **podchaser**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/538eb305673c26bff6a2b12f1c96375fe02ce41a) ([#13271](https://github.com/yt-dlp/yt-dlp/issues/13271)) by [bashonly](https://github.com/bashonly)
- **sr**: mediathek: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/e3c605a61f4cc2de9059f37434fa108c3c20f58e) ([#13294](https://github.com/yt-dlp/yt-dlp/issues/13294)) by [doe1080](https://github.com/doe1080)
- **stacommu**: [Avoid partial stream formats](https://github.com/yt-dlp/yt-dlp/commit/5d96527be80dc1ed1702d9cd548ff86de570ad70) ([#13412](https://github.com/yt-dlp/yt-dlp/issues/13412)) by [bashonly](https://github.com/bashonly)
- **startrek**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/a8bf0011bde92b3f1324a98bfbd38932fd3ebe18) ([#13188](https://github.com/yt-dlp/yt-dlp/issues/13188)) by [doe1080](https://github.com/doe1080)
- **svt**: play: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e1b6062f8c4a3fa33c65269d48d09ec78de765a2) ([#13329](https://github.com/yt-dlp/yt-dlp/issues/13329)) by [barsnick](https://github.com/barsnick), [bashonly](https://github.com/bashonly)
- **telecinco**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/03dba2012d9bd3f402fa8c2f122afba89bbd22a4) ([#13379](https://github.com/yt-dlp/yt-dlp/issues/13379)) by [bashonly](https://github.com/bashonly)
- **theplatform**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/ed108b3ea481c6a4b5215a9302ba92d74baa2425) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **toutiao**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/f8051e3a61686c5db1de5f5746366ecfbc3ad20c) ([#13246](https://github.com/yt-dlp/yt-dlp/issues/13246)) by [doe1080](https://github.com/doe1080)
- **turner**: [Adapt extractors to new AdobePass flow](https://github.com/yt-dlp/yt-dlp/commit/0daddc780d3ac5bebc3a3ec5b884d9243cbc0745) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **twitcasting**: [Fix password-protected livestream support](https://github.com/yt-dlp/yt-dlp/commit/52f9729c9a92ad4656d746ff0b1acecb87b3e96d) ([#13097](https://github.com/yt-dlp/yt-dlp/issues/13097)) by [bashonly](https://github.com/bashonly)
- **twitter**: broadcast: [Support events URLs](https://github.com/yt-dlp/yt-dlp/commit/7794374de8afb20499b023107e2abfd4e6b93ee4) ([#13248](https://github.com/yt-dlp/yt-dlp/issues/13248)) by [doe1080](https://github.com/doe1080)
- **umg**: de: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/4e7c1ea346b510280218b47e8653dbbca3a69870) ([#13373](https://github.com/yt-dlp/yt-dlp/issues/13373)) by [doe1080](https://github.com/doe1080)
- **vice**: [Mark extractors as broken](https://github.com/yt-dlp/yt-dlp/commit/6121559e027a04574690799c1776bc42bb51af31) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Extract subtitles from player subdomain](https://github.com/yt-dlp/yt-dlp/commit/c723c4e5e78263df178dbe69844a3d05f3ef9e35) ([#13350](https://github.com/yt-dlp/yt-dlp/issues/13350)) by [bashonly](https://github.com/bashonly)
- **watchespn**: [Fix provider-locked content extraction](https://github.com/yt-dlp/yt-dlp/commit/b094747e93cfb0a2c53007120e37d0d84d41f030) ([#13131](https://github.com/yt-dlp/yt-dlp/issues/13131)) by [maxbin123](https://github.com/maxbin123)
- **weverse**: [Support login with oauth refresh tokens](https://github.com/yt-dlp/yt-dlp/commit/3fe72e9eea38d9a58211cde42cfaa577ce020e2c) ([#13284](https://github.com/yt-dlp/yt-dlp/issues/13284)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Add `tv_simply` player client](https://github.com/yt-dlp/yt-dlp/commit/1fd0e88b67db53ad163393d6965f68e908fa70e3) ([#13389](https://github.com/yt-dlp/yt-dlp/issues/13389)) by [gamer191](https://github.com/gamer191)
- [Extract srt subtitles](https://github.com/yt-dlp/yt-dlp/commit/231349786e8c42089c2e079ec94c0ea866c37999) ([#13411](https://github.com/yt-dlp/yt-dlp/issues/13411)) by [gamer191](https://github.com/gamer191)
- [Fix `--mark-watched` support](https://github.com/yt-dlp/yt-dlp/commit/b5be29fa58ec98226e11621fd9c58585bcff6879) ([#13222](https://github.com/yt-dlp/yt-dlp/issues/13222)) by [brian6932](https://github.com/brian6932), [iednod55](https://github.com/iednod55)
- [Fix automatic captions for some client combinations](https://github.com/yt-dlp/yt-dlp/commit/53ea743a9c158f8ca2d75a09ca44ba68606042d8) ([#13268](https://github.com/yt-dlp/yt-dlp/issues/13268)) by [bashonly](https://github.com/bashonly)
- [Improve signature extraction debug output](https://github.com/yt-dlp/yt-dlp/commit/d30a49742cfa22e61c47df4ac0e7334d648fb85d) ([#13327](https://github.com/yt-dlp/yt-dlp/issues/13327)) by [bashonly](https://github.com/bashonly)
- [Rework nsig function name extraction](https://github.com/yt-dlp/yt-dlp/commit/9e38b273b7ac942e7e9fc05a651ed810ab7d30ba) ([#13403](https://github.com/yt-dlp/yt-dlp/issues/13403)) by [Grub4K](https://github.com/Grub4K)
- [nsig code improvements and cleanup](https://github.com/yt-dlp/yt-dlp/commit/f7bbf5a617f9ab54ef51eaef99be36e175b5e9c3) ([#13280](https://github.com/yt-dlp/yt-dlp/issues/13280)) by [bashonly](https://github.com/bashonly)
- **zdf**: [Fix language extraction and format sorting](https://github.com/yt-dlp/yt-dlp/commit/db162b76f6bdece50babe2e0cacfe56888c2e125) ([#13313](https://github.com/yt-dlp/yt-dlp/issues/13313)) by [InvalidUsernameException](https://github.com/InvalidUsernameException)
#### Misc. changes
- **build**
- [Exclude `pkg_resources` from being collected](https://github.com/yt-dlp/yt-dlp/commit/cc749a8a3b8b6e5c05318868c72a403f376a1b38) ([#13320](https://github.com/yt-dlp/yt-dlp/issues/13320)) by [bashonly](https://github.com/bashonly)
- [Fix macOS requirements caching](https://github.com/yt-dlp/yt-dlp/commit/201812100f315c6727a4418698d5b4e8a79863d4) ([#13328](https://github.com/yt-dlp/yt-dlp/issues/13328)) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [339614a](https://github.com/yt-dlp/yt-dlp/commit/339614a173c74b42d63e858c446a9cae262a13af) by [bashonly](https://github.com/bashonly)
- **test**: postprocessors: [Remove binary thumbnail test data](https://github.com/yt-dlp/yt-dlp/commit/a9b370069838e84d44ac7ad095d657003665885a) ([#13341](https://github.com/yt-dlp/yt-dlp/issues/13341)) by [bashonly](https://github.com/bashonly)
### 2025.05.22
#### Core changes
- **cookies**: [Fix Linux desktop environment detection](https://github.com/yt-dlp/yt-dlp/commit/e491fd4d090db3af52a82863fb0553dd5e17fb85) ([#13197](https://github.com/yt-dlp/yt-dlp/issues/13197)) by [mbway](https://github.com/mbway)
- **jsinterp**: [Fix increment/decrement evaluation](https://github.com/yt-dlp/yt-dlp/commit/167d7a9f0ffd1b4fe600193441bdb7358db2740b) ([#13238](https://github.com/yt-dlp/yt-dlp/issues/13238)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
#### Extractor changes
- **1tv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/41c0a1fb89628696f8bb88e2b9f3a68f355b8c26) ([#13168](https://github.com/yt-dlp/yt-dlp/issues/13168)) by [bashonly](https://github.com/bashonly)
- **amcnetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/464c84fedf78eef822a431361155f108b5df96d7) ([#13147](https://github.com/yt-dlp/yt-dlp/issues/13147)) by [bashonly](https://github.com/bashonly)
- **bitchute**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1d0f6539c47e5d5c68c3c47cdb7075339e2885ac) ([#13081](https://github.com/yt-dlp/yt-dlp/issues/13081)) by [bashonly](https://github.com/bashonly)
- **cartoonnetwork**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/7dbb47f84f0ee1266a3a01f58c9bc4c76d76794a) ([#13148](https://github.com/yt-dlp/yt-dlp/issues/13148)) by [bashonly](https://github.com/bashonly)
- **iprima**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/a7d9a5eb79ceeecb851389f3f2c88597871ca3f2) ([#12937](https://github.com/yt-dlp/yt-dlp/issues/12937)) by [baierjan](https://github.com/baierjan)
- **jiosaavn**
- artist: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/586b557b124f954d3f625360ebe970989022ad97) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- playlist, show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/317f4b8006c2c0f0f64f095b1485163ad97c9053) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/6839276496d8814cf16f58b637e45663467928e6) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- **lrtradio**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/abf58dcd6a09e14eec4ea82ae12f79a0337cb383) ([#13200](https://github.com/yt-dlp/yt-dlp/issues/13200)) by [Pawka](https://github.com/Pawka)
- **nebula**: [Support `--mark-watched`](https://github.com/yt-dlp/yt-dlp/commit/20f288bdc2173c7cc58d709d25ca193c1f6001e7) ([#13120](https://github.com/yt-dlp/yt-dlp/issues/13120)) by [GeoffreyFrogeye](https://github.com/GeoffreyFrogeye)
- **niconico**
- [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/f569be4602c2a857087e495d5d7ed6060cd97abe) ([#13236](https://github.com/yt-dlp/yt-dlp/issues/13236)) by [bashonly](https://github.com/bashonly)
- live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7a7b85c9014d96421e18aa7ea5f4c1bee5ceece0) ([#13045](https://github.com/yt-dlp/yt-dlp/issues/13045)) by [doe1080](https://github.com/doe1080)
- **nytimesarticle**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/b26bc32579c00ef579d75a835807ccc87d20ee0a) ([#13104](https://github.com/yt-dlp/yt-dlp/issues/13104)) by [bashonly](https://github.com/bashonly)
- **once**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/f475e8b529d18efdad603ffda02a56e707fe0e2c) ([#13164](https://github.com/yt-dlp/yt-dlp/issues/13164)) by [bashonly](https://github.com/bashonly)
- **picarto**: vod: [Support `/profile/` video URLs](https://github.com/yt-dlp/yt-dlp/commit/31e090cb787f3504ec25485adff9a2a51d056734) ([#13227](https://github.com/yt-dlp/yt-dlp/issues/13227)) by [subrat-lima](https://github.com/subrat-lima)
- **playsuisse**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/d880e060803ae8ed5a047e578cca01e1f0e630ce) ([#12466](https://github.com/yt-dlp/yt-dlp/issues/12466)) by [v3DJG6GL](https://github.com/v3DJG6GL)
- **sprout**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/cbcfe6378dde33a650e3852ab17ad4503b8e008d) ([#13149](https://github.com/yt-dlp/yt-dlp/issues/13149)) by [bashonly](https://github.com/bashonly)
- **svtpage**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ea8498ed534642dd7e925961b97b934987142fd3) ([#12957](https://github.com/yt-dlp/yt-dlp/issues/12957)) by [diman8](https://github.com/diman8)
- **twitch**: [Support `--live-from-start`](https://github.com/yt-dlp/yt-dlp/commit/00b1bec55249cf2ad6271d36492c51b34b6459d1) ([#13202](https://github.com/yt-dlp/yt-dlp/issues/13202)) by [bashonly](https://github.com/bashonly)
- **vimeo**: event: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/545c1a5b6f2fe88722b41aef0e7485bf3be3f3f9) ([#13216](https://github.com/yt-dlp/yt-dlp/issues/13216)) by [bashonly](https://github.com/bashonly)
- **wat.tv**: [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/f123cc83b3aea45053f5fa1d9141048b01fc2774) ([#13111](https://github.com/yt-dlp/yt-dlp/issues/13111)) by [bashonly](https://github.com/bashonly)
- **weverse**: [Fix live extraction](https://github.com/yt-dlp/yt-dlp/commit/5328eda8820cc5f21dcf917684d23fbdca41831d) ([#13084](https://github.com/yt-dlp/yt-dlp/issues/13084)) by [bashonly](https://github.com/bashonly)
- **xinpianchang**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/83fabf352489d52843f67e6e9cc752db86d27e6e) ([#13245](https://github.com/yt-dlp/yt-dlp/issues/13245)) by [garret1317](https://github.com/garret1317)
- **youtube**
- [Add PO token support for subtitles](https://github.com/yt-dlp/yt-dlp/commit/32ed5f107c6c641958d1cd2752e130de4db55a13) ([#13234](https://github.com/yt-dlp/yt-dlp/issues/13234)) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz)
- [Add `web_embedded` client for age-restricted videos](https://github.com/yt-dlp/yt-dlp/commit/0feec6dc131f488428bf881519e7c69766fbb9ae) ([#13089](https://github.com/yt-dlp/yt-dlp/issues/13089)) by [bashonly](https://github.com/bashonly)
- [Add a PO Token Provider Framework](https://github.com/yt-dlp/yt-dlp/commit/2685654a37141cca63eda3a92da0e2706e23ccfd) ([#12840](https://github.com/yt-dlp/yt-dlp/issues/12840)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract `media_type` for all videos](https://github.com/yt-dlp/yt-dlp/commit/ded11ebc9afba6ba33923375103e9be2d7c804e7) ([#13136](https://github.com/yt-dlp/yt-dlp/issues/13136)) by [bashonly](https://github.com/bashonly)
- [Fix `--live-from-start` support for premieres](https://github.com/yt-dlp/yt-dlp/commit/8f303afb43395be360cafd7ad4ce2b6e2eedfb8a) ([#13079](https://github.com/yt-dlp/yt-dlp/issues/13079)) by [arabcoders](https://github.com/arabcoders)
- [Fix geo-restriction error handling](https://github.com/yt-dlp/yt-dlp/commit/c7e575e31608c19c5b26c10a4229db89db5fc9a8) ([#13217](https://github.com/yt-dlp/yt-dlp/issues/13217)) by [yozel](https://github.com/yozel)
#### Misc. changes
- **build**
- [Bump PyInstaller to v6.13.0](https://github.com/yt-dlp/yt-dlp/commit/17cf9088d0d535e4a7feffbf02bd49cd9dae5ab9) ([#13082](https://github.com/yt-dlp/yt-dlp/issues/13082)) by [bashonly](https://github.com/bashonly)
- [Bump run-on-arch-action to v3](https://github.com/yt-dlp/yt-dlp/commit/9064d2482d1fe722bbb4a49731fe0711c410d1c8) ([#13088](https://github.com/yt-dlp/yt-dlp/issues/13088)) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [7977b32](https://github.com/yt-dlp/yt-dlp/commit/7977b329ed97b216e37bd402f4935f28c00eac9e) by [bashonly](https://github.com/bashonly)
 ### 2025.04.30
 #### Important changes

Makefile

@@ -18,10 +18,11 @@ pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites \
 	tar pypi-files lazy-extractors install uninstall
 clean-test:
-	rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
+	rm -rf tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
 	*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
 	*.3gp *.ape *.ass *.avi *.desktop *.f4v *.flac *.flv *.gif *.jpeg *.jpg *.lrc *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 *.mp4 \
-	*.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
+	*.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp \
+	test/testdata/sigs/player-*.js test/testdata/thumbnails/empty.webp "test/testdata/thumbnails/foo %d bar/foo_%d."*
 clean-dist:
 	rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \
 	yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS

README.md

@@ -44,6 +44,7 @@
 * [Post-processing Options](#post-processing-options)
 * [SponsorBlock Options](#sponsorblock-options)
 * [Extractor Options](#extractor-options)
+* [Preset Aliases](#preset-aliases)
 * [CONFIGURATION](#configuration)
 * [Configuration file encoding](#configuration-file-encoding)
 * [Authentication with netrc](#authentication-with-netrc)
@@ -348,8 +349,8 @@ ## General Options:
     --no-flat-playlist              Fully extract the videos of a playlist
                                     (default)
     --live-from-start               Download livestreams from the start.
-                                    Currently only supported for YouTube
-                                    (Experimental)
+                                    Currently experimental and only supported
+                                    for YouTube and Twitch
     --no-live-from-start            Download livestreams from the current time
                                     (default)
     --wait-for-video MIN[-MAX]      Wait for scheduled streams to become
@@ -375,12 +376,12 @@ ## General Options:
                                     an alias starts with a dash "-", it is
                                     prefixed with "--". Arguments are parsed
                                     according to the Python string formatting
-                                    mini-language. E.g. --alias get-audio,-X
-                                    "-S=aext:{0},abr -x --audio-format {0}"
-                                    creates options "--get-audio" and "-X" that
-                                    takes an argument (ARG0) and expands to
-                                    "-S=aext:ARG0,abr -x --audio-format ARG0".
-                                    All defined aliases are listed in the --help
+                                    mini-language. E.g. --alias get-audio,-X "-S
+                                    aext:{0},abr -x --audio-format {0}" creates
+                                    options "--get-audio" and "-X" that takes an
+                                    argument (ARG0) and expands to "-S
+                                    aext:ARG0,abr -x --audio-format ARG0". All
+                                    defined aliases are listed in the --help
                                     output. Alias options can trigger more
                                     aliases; so be careful to avoid defining
                                     recursive options. As a safety measure, each
@@ -1105,6 +1106,10 @@ ## Extractor Options:
                                     arguments for different extractors

 ## Preset Aliases:
+    Predefined aliases for convenience and ease of use. Note that future
+    versions of yt-dlp may add or adjust presets, but the existing preset
+    names will not be changed or removed
+
     -t mp3                          -f 'ba[acodec^=mp3]/ba/b' -x --audio-format
                                     mp3
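Given the expansion documented above, the following two invocations should be equivalent (the URL is a placeholder):

    yt-dlp -t mp3 'https://example.com/video'
    yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 'https://example.com/video'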
@@ -1790,9 +1795,9 @@ # EXTRACTOR ARGUMENTS
 The following extractors use this feature:

 #### youtube
-* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
+* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube/_base.py](https://github.com/yt-dlp/yt-dlp/blob/415b4c9f955b1a0391204bd24a7132590e7b3bdb/yt_dlp/extractor/youtube/_base.py#L402-L409) for the list of supported content language codes
 * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
-* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
+* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv`, `tv_simply` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
 * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
 * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
 * `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
@@ -1805,7 +1810,7 @@ #### youtube
 * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
 * `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
 * `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
-* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
+* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be any of `gvs` (Google Video Server URLs), `player` (Innertube player request) or `subs` (Subtitles)
 * `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default)
 * `fetch_pot`: Policy to use for fetching a PO Token from providers. One of `always` (always try fetch a PO Token regardless if the client requires one for the given context), `never` (never fetch a PO Token), or `auto` (default; only fetch a PO Token if the client requires one for the given context)
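A hypothetical invocation combining several of the documented arguments (the token and URL are placeholders):

    yt-dlp --extractor-args "youtube:player_client=default,-ios;po_token=web.gvs+XXX" 'https://www.youtube.com/watch?v=...'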

[shell build script]

@@ -2,6 +2,7 @@
 set -e

 source ~/.local/share/pipx/venvs/pyinstaller/bin/activate
+python -m devscripts.install_deps -o --include build
 python -m devscripts.install_deps --include secretstorage --include curl-cffi
 python -m devscripts.make_lazy_extractors
 python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}"

bundle/pyinstaller.py

@@ -36,6 +36,9 @@ def main():
         f'--name={name}',
         '--icon=devscripts/logo.ico',
         '--upx-exclude=vcruntime140.dll',
+        # Ref: https://github.com/yt-dlp/yt-dlp/issues/13311
+        # https://github.com/pyinstaller/pyinstaller/issues/9149
+        '--exclude-module=pkg_resources',
         '--noconfirm',
         '--additional-hooks-dir=yt_dlp/__pyinstaller',
         *opts,

pyproject.toml

@@ -65,7 +65,7 @@ build = [
     "build",
     "hatchling",
     "pip",
-    "setuptools>=71.0.2",  # 71.0.0 broke pyinstaller
+    "setuptools>=71.0.2,<81",  # See https://github.com/pyinstaller/pyinstaller/issues/9149
     "wheel",
 ]
 dev = [

supportedsites.md

@@ -5,6 +5,8 @@ # Supported sites
 Not all sites listed here are guaranteed to work; websites are constantly changing and sometimes this breaks yt-dlp's support for them.
 The only reliable way to check if a site is supported is to try it.

+ - **10play**: [*10play*](## "netrc machine")
+ - **10play:season**
  - **17live**
  - **17live:clip**
  - **17live:vod**
@@ -246,7 +248,6 @@ # Supported sites
  - **Canalplus**: mycanal.fr and piwiplus.fr
  - **Canalsurmas**
  - **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
- - **CartoonNetwork**
  - **cbc.ca**
  - **cbc.ca:player**
  - **cbc.ca:player:playlist**
@@ -296,7 +297,7 @@ # Supported sites
  - **CNNIndonesia**
  - **ComedyCentral**
  - **ComedyCentralTV**
- - **ConanClassic**
+ - **ConanClassic**: (**Currently broken**)
  - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
  - **CONtv**
  - **CookingChannel**
@@ -318,7 +319,7 @@ # Supported sites
  - **CtsNews**: 華視新聞
  - **CTV**
  - **CTVNews**
- - **cu.ntv.co.jp**: Nippon Television Network
+ - **cu.ntv.co.jp**: 日テレ無料TADA!
  - **CultureUnplugged**
  - **curiositystream**: [*curiositystream*](## "netrc machine")
  - **curiositystream:collections**: [*curiositystream*](## "netrc machine")
@@ -589,7 +590,7 @@ # Supported sites
  - **Hungama**
  - **HungamaAlbumPlaylist**
  - **HungamaSong**
- - **huya:live**: huya.com
+ - **huya:live**: 虎牙直播
  - **huya:video**: 虎牙视频
  - **Hypem**
  - **Hytale**
@@ -649,7 +650,10 @@ # Supported sites
  - **jiocinema**: [*jiocinema*](## "netrc machine")
  - **jiocinema:series**: [*jiocinema*](## "netrc machine")
  - **jiosaavn:album**
+ - **jiosaavn:artist**
  - **jiosaavn:playlist**
+ - **jiosaavn:show**
+ - **jiosaavn:show:playlist**
  - **jiosaavn:song**
  - **Joj**
  - **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)
@@ -772,6 +776,7 @@ # Supported sites
  - **massengeschmack.tv**
  - **Masters**
  - **MatchTV**
+ - **Mave**
  - **MBN**: mbn.co.kr (매일방송)
  - **MDR**: MDR.DE
  - **MedalTV**
@@ -828,7 +833,7 @@ # Supported sites
  - **Mojevideo**: mojevideo.sk
  - **Mojvideo**
  - **Monstercat**
- - **MonsterSirenHypergryphMusic**
+ - **monstersiren**: 塞壬唱片
  - **Motherless**
  - **MotherlessGallery**
  - **MotherlessGroup**
@@ -880,19 +885,19 @@ # Supported sites
  - **Naver**
  - **Naver:live**
  - **navernow**
- - **nba**
- - **nba:channel**
- - **nba:embed**
- - **nba:watch**
- - **nba:watch:collection**
- - **nba:watch:embed**
+ - **nba**: (**Currently broken**)
+ - **nba:channel**: (**Currently broken**)
+ - **nba:embed**: (**Currently broken**)
+ - **nba:watch**: (**Currently broken**)
+ - **nba:watch:collection**: (**Currently broken**)
+ - **nba:watch:embed**: (**Currently broken**)
  - **NBC**
  - **NBCNews**
  - **nbcolympics**
- - **nbcolympics:stream**
- - **NBCSports**
- - **NBCSportsStream**
- - **NBCSportsVPlayer**
+ - **nbcolympics:stream**: (**Currently broken**)
+ - **NBCSports**: (**Currently broken**)
+ - **NBCSportsStream**: (**Currently broken**)
+ - **NBCSportsVPlayer**: (**Currently broken**)
  - **NBCStations**
  - **ndr**: NDR.de - Norddeutscher Rundfunk
  - **ndr:embed**
@@ -968,7 +973,7 @@ # Supported sites
  - **Nitter**
  - **njoy**: N-JOY
  - **njoy:embed**
- - **NobelPrize**: (**Currently broken**)
+ - **NobelPrize**
  - **NoicePodcast**
  - **NonkTube**
  - **NoodleMagazine**
@@ -1081,8 +1086,8 @@ # Supported sites
  - **Photobucket**
  - **PiaLive**
  - **Piapro**: [*piapro*](## "netrc machine")
- - **Picarto**
- - **PicartoVod**
+ - **picarto**
+ - **picarto:vod**
  - **Piksel**
  - **Pinkbike**
  - **Pinterest**
@@ -1390,16 +1395,15 @@ # Supported sites
  - **Spreaker**
  - **SpreakerShow**
  - **SpringboardPlatform**
- - **Sprout**
  - **SproutVideo**
- - **sr:mediathek**: Saarländischer Rundfunk (**Currently broken**)
+ - **sr:mediathek**: Saarländischer Rundfunk
  - **SRGSSR**
  - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
  - **StacommuLive**: [*stacommu*](## "netrc machine")
  - **StacommuVOD**: [*stacommu*](## "netrc machine")
  - **StagePlusVODConcert**: [*stageplus*](## "netrc machine")
  - **stanfordoc**: Stanford Open ClassRoom
- - **StarTrek**: (**Currently broken**)
+ - **startrek**: STAR TREK
  - **startv**
  - **Steam**
  - **SteamCommunityBroadcast**
@@ -1422,12 +1426,11 @@ # Supported sites
  - **SunPorno**
  - **sverigesradio:episode**
  - **sverigesradio:publication**
- - **SVT**
- - **SVTPage**
- - **SVTPlay**: SVT Play and Öppet arkiv
- - **SVTSeries**
+ - **svt:page**
+ - **svt:play**: SVT Play and Öppet arkiv
+ - **svt:play:series**
  - **SwearnetEpisode**
- - **Syfy**: (**Currently broken**)
+ - **Syfy**
  - **SYVDK**
  - **SztvHu**
  - **t-online.de**: (**Currently broken**)
@@ -1471,8 +1474,6 @@ # Supported sites
  - **Telewebion**: (**Currently broken**)
  - **Tempo**
  - **TennisTV**: [*tennistv*](## "netrc machine")
- - **TenPlay**: [*10play*](## "netrc machine")
- - **TenPlaySeason**
  - **TF1**
  - **TFO**
  - **theatercomplextown:ppv**: [*theatercomplextown*](## "netrc machine")
@@ -1510,6 +1511,7 @@ # Supported sites
  - **tokfm:podcast**
  - **ToonGoggles**
  - **tou.tv**: [*toutv*](## "netrc machine")
+ - **toutiao**: 今日头条
  - **Toypics**: Toypics video (**Currently broken**)
  - **ToypicsUser**: Toypics user profile (**Currently broken**)
  - **TrailerAddict**: (**Currently broken**)
@@ -1599,7 +1601,7 @@ # Supported sites
  - **UKTVPlay**
  - **UlizaPlayer**
  - **UlizaPortal**: ulizaportal.jp
- - **umg:de**: Universal Music Deutschland (**Currently broken**)
+ - **umg:de**: Universal Music Deutschland
  - **Unistra**
  - **Unity**: (**Currently broken**)
  - **uol.com.br**
@@ -1622,9 +1624,9 @@ # Supported sites
  - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
  - **vh1.com**
  - **vhx:embed**: [*vimeo*](## "netrc machine")
- - **vice**
- - **vice:article**
- - **vice:show**
+ - **vice**: (**Currently broken**)
+ - **vice:article**: (**Currently broken**)
+ - **vice:show**: (**Currently broken**)
  - **Viddler**
  - **Videa**
  - **video.arnes.si**: Arnes Video
@@ -1656,6 +1658,7 @@ # Supported sites
  - **vimeo**: [*vimeo*](## "netrc machine")
  - **vimeo:album**: [*vimeo*](## "netrc machine")
  - **vimeo:channel**: [*vimeo*](## "netrc machine")
+ - **vimeo:event**: [*vimeo*](## "netrc machine")
  - **vimeo:group**: [*vimeo*](## "netrc machine")
  - **vimeo:likes**: [*vimeo*](## "netrc machine") Vimeo user likes
  - **vimeo:ondemand**: [*vimeo*](## "netrc machine")

test/test_InfoExtractor.py

@@ -314,6 +314,20 @@ def test_search_json_ld_realworld(self):
                 },
                 {},
             ),
+            (
+                # test thumbnail_url key without URL scheme
+                r'''
+                <script type="application/ld+json">
+                {
+                    "@context": "https://schema.org",
+                    "@type": "VideoObject",
+                    "thumbnail_url": "//www.nobelprize.org/images/12693-landscape-medium-gallery.jpg"
+                }</script>''',
+                {
+                    'thumbnails': [{'url': 'https://www.nobelprize.org/images/12693-landscape-medium-gallery.jpg'}],
+                },
+                {},
+            ),
         ]
         for html, expected_dict, search_json_ld_kwargs in _TESTS:
             expect_dict(
@@ -1933,6 +1947,137 @@ def test_search_nextjs_data(self):
         with self.assertWarns(DeprecationWarning):
             self.assertEqual(self.ie._search_nextjs_data('', None, default='{}'), {})

+    def test_search_nuxt_json(self):
+        HTML_TMPL = '<script data-ssr="true" id="__NUXT_DATA__" type="application/json">[{}]</script>'
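+        # the __NUXT_DATA__ payload is devalue-encoded: a flat JSON list whose
+        # containers reference other slots by index (cf. test/test_devalue.py)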
+        VALID_DATA = '''
+            ["ShallowReactive",1],
+            {"data":2,"state":21,"once":25,"_errors":28,"_server_errors":30},
+            ["ShallowReactive",3],
+            {"$abcdef123456":4},
+            {"podcast":5,"activeEpisodeData":7},
+            {"podcast":6,"seasons":14},
+            {"title":10,"id":11},
+            ["Reactive",8],
+            {"episode":9,"creators":18,"empty_list":20},
+            {"title":12,"id":13,"refs":34,"empty_refs":35},
+            "Series Title",
+            "podcast-id-01",
+            "Episode Title",
+            "episode-id-99",
+            [15,16,17],
+            1,
+            2,
+            3,
+            [19],
+            "Podcast Creator",
+            [],
+            {"$ssite-config":22},
+            {"env":23,"name":24,"map":26,"numbers":14},
+            "production",
+            "podcast-website",
+            ["Set"],
+            ["Reactive",27],
+            ["Map"],
+            ["ShallowReactive",29],
+            {},
+            ["NuxtError",31],
+            {"status":32,"message":33},
+            503,
+            "Service Unavailable",
+            [36,37],
+            [38,39],
+            ["Ref",40],
+            ["ShallowRef",41],
+            ["EmptyRef",42],
+            ["EmptyShallowRef",43],
+            "ref",
+            "shallow_ref",
+            "{\\"ref\\":1}",
+            "{\\"shallow_ref\\":2}"
+        '''
+        PAYLOAD = {
+            'data': {
+                '$abcdef123456': {
+                    'podcast': {
+                        'podcast': {
+                            'title': 'Series Title',
+                            'id': 'podcast-id-01',
+                        },
+                        'seasons': [1, 2, 3],
+                    },
+                    'activeEpisodeData': {
+                        'episode': {
+                            'title': 'Episode Title',
+                            'id': 'episode-id-99',
+                            'refs': ['ref', 'shallow_ref'],
+                            'empty_refs': [{'ref': 1}, {'shallow_ref': 2}],
+                        },
+                        'creators': ['Podcast Creator'],
+                        'empty_list': [],
+                    },
+                },
+            },
+            'state': {
+                '$ssite-config': {
+                    'env': 'production',
+                    'name': 'podcast-website',
+                    'map': [],
+                    'numbers': [1, 2, 3],
+                },
+            },
+            'once': [],
+            '_errors': {},
+            '_server_errors': {
+                'status': 503,
+                'message': 'Service Unavailable',
+            },
+        }
+        PARTIALLY_INVALID = [(
+            '''
+            {"data":1},
+            {"invalid_raw_list":2},
+            [15,16,17]
+            ''',
+            {'data': {'invalid_raw_list': [None, None, None]}},
+        ), (
+            '''
+            {"data":1},
+            ["EmptyRef",2],
+            "not valid JSON"
+            ''',
+            {'data': None},
+        ), (
+            '''
+            {"data":1},
+            ["EmptyShallowRef",2],
+            "not valid JSON"
+            ''',
+            {'data': None},
+        )]
+        INVALID = [
+            '''
+            []
+            ''',
+            '''
+            ["unsupported",1],
+            {"data":2},
+            {}
+            ''',
+        ]
+        DEFAULT = object()
+
+        self.assertEqual(self.ie._search_nuxt_json(HTML_TMPL.format(VALID_DATA), None), PAYLOAD)
+        self.assertEqual(self.ie._search_nuxt_json('', None, fatal=False), {})
+        self.assertIs(self.ie._search_nuxt_json('', None, default=DEFAULT), DEFAULT)
+
+        for data, expected in PARTIALLY_INVALID:
+            self.assertEqual(
+                self.ie._search_nuxt_json(HTML_TMPL.format(data), None, fatal=False), expected)
+        for data in INVALID:
+            self.assertIs(
+                self.ie._search_nuxt_json(HTML_TMPL.format(data), None, default=DEFAULT), DEFAULT)


 if __name__ == '__main__':
     unittest.main()

test/test_cookies.py

@@ -58,6 +58,14 @@ def test_get_desktop_environment(self):
             ({'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
             ({'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),

+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'gnome'}, _LinuxDesktopEnvironment.GNOME),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'mate'}, _LinuxDesktopEnvironment.GNOME),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),
+
+            ({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'my_custom_de', 'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
+
             ({'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
             ({'KDE_FULL_SESSION': 1}, _LinuxDesktopEnvironment.KDE3),
             ({'KDE_FULL_SESSION': 1, 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),

test/test_devalue.py (new file, 235 lines)

@@ -0,0 +1,235 @@
#!/usr/bin/env python3
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import datetime as dt
import json
import math
import re
import unittest
from yt_dlp.utils.jslib import devalue
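
# devalue (https://github.com/Rich-Harris/devalue) flattens a value graph into a
# flat JSON list: containers reference other slots by index, and negative integers
# encode special constants (-1 undefined, -2 sparse-array hole, -3 NaN, -4 Infinity,
# -5 -Infinity, -6 negative zero), as exercised by the cases below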
TEST_CASES_EQUALS = [{
'name': 'int',
'unparsed': [-42],
'parsed': -42,
}, {
'name': 'str',
'unparsed': ['woo!!!'],
'parsed': 'woo!!!',
}, {
'name': 'Number',
'unparsed': [['Object', 42]],
'parsed': 42,
}, {
'name': 'String',
'unparsed': [['Object', 'yar']],
'parsed': 'yar',
}, {
'name': 'Infinity',
'unparsed': -4,
'parsed': math.inf,
}, {
'name': 'negative Infinity',
'unparsed': -5,
'parsed': -math.inf,
}, {
'name': 'negative zero',
'unparsed': -6,
'parsed': -0.0,
}, {
'name': 'RegExp',
'unparsed': [['RegExp', 'regexp', 'gim']], # XXX: flags are ignored
'parsed': re.compile('regexp'),
}, {
'name': 'Date',
'unparsed': [['Date', '2001-09-09T01:46:40.000Z']],
'parsed': dt.datetime.fromtimestamp(1e9, tz=dt.timezone.utc),
}, {
'name': 'Array',
    'unparsed': [[1, 2, 3], 'a', 'b', 'c'],
    'parsed': ['a', 'b', 'c'],
}, {
    'name': 'Array (empty)',
    'unparsed': [[]],
    'parsed': [],
}, {
    'name': 'Array (sparse)',
    'unparsed': [[-2, 1, -2], 'b'],
    'parsed': [None, 'b', None],
}, {
    'name': 'Object',
    'unparsed': [{'foo': 1, 'x-y': 2}, 'bar', 'z'],
    'parsed': {'foo': 'bar', 'x-y': 'z'},
}, {
    'name': 'Set',
    'unparsed': [['Set', 1, 2, 3], 1, 2, 3],
    'parsed': [1, 2, 3],
}, {
    'name': 'Map',
    'unparsed': [['Map', 1, 2], 'a', 'b'],
    'parsed': [['a', 'b']],
}, {
    'name': 'BigInt',
    'unparsed': [['BigInt', '1']],
    'parsed': 1,
}, {
    'name': 'Uint8Array',
    'unparsed': [['Uint8Array', 'AQID']],
    'parsed': [1, 2, 3],
}, {
    'name': 'ArrayBuffer',
    'unparsed': [['ArrayBuffer', 'AQID']],
    'parsed': [1, 2, 3],
}, {
    'name': 'str (repetition)',
    'unparsed': [[1, 1], 'a string'],
    'parsed': ['a string', 'a string'],
}, {
    'name': 'None (repetition)',
    'unparsed': [[1, 1], None],
    'parsed': [None, None],
}, {
    'name': 'dict (repetition)',
    'unparsed': [[1, 1], {}],
    'parsed': [{}, {}],
}, {
    'name': 'Object without prototype',
    'unparsed': [['null']],
    'parsed': {},
}, {
    'name': 'cross-realm POJO',
    'unparsed': [{}],
    'parsed': {},
}]

TEST_CASES_IS = [{
    'name': 'bool',
    'unparsed': [True],
    'parsed': True,
}, {
    'name': 'Boolean',
    'unparsed': [['Object', False]],
    'parsed': False,
}, {
    'name': 'undefined',
    'unparsed': -1,
    'parsed': None,
}, {
    'name': 'null',
    'unparsed': [None],
    'parsed': None,
}, {
    'name': 'NaN',
    'unparsed': -3,
    'parsed': math.nan,
}]

TEST_CASES_INVALID = [{
    'name': 'empty string',
    'unparsed': '',
    'error': ValueError,
    'pattern': r'expected int or list as input',
}, {
    'name': 'hole',
    'unparsed': -2,
    'error': ValueError,
    'pattern': r'invalid integer input',
}, {
    'name': 'string',
    'unparsed': 'hello',
    'error': ValueError,
    'pattern': r'expected int or list as input',
}, {
    'name': 'number',
    'unparsed': 42,
    'error': ValueError,
    'pattern': r'invalid integer input',
}, {
    'name': 'boolean',
    'unparsed': True,
    'error': ValueError,
    'pattern': r'expected int or list as input',
}, {
    'name': 'null',
    'unparsed': None,
    'error': ValueError,
    'pattern': r'expected int or list as input',
}, {
    'name': 'object',
    'unparsed': {},
    'error': ValueError,
    'pattern': r'expected int or list as input',
}, {
    'name': 'empty array',
    'unparsed': [],
    'error': ValueError,
    'pattern': r'expected a non-empty list as input',
}, {
    'name': 'Python negative indexing',
    'unparsed': [[1, 2, 3, 4, 5, 6, 7, -7], 1, 2, 3, 4, 5, 6, 7],
    'error': IndexError,
    'pattern': r'invalid index: -7',
}]


class TestDevalue(unittest.TestCase):
    def test_devalue_parse_equals(self):
        for tc in TEST_CASES_EQUALS:
            self.assertEqual(devalue.parse(tc['unparsed']), tc['parsed'], tc['name'])

    def test_devalue_parse_is(self):
        for tc in TEST_CASES_IS:
            self.assertIs(devalue.parse(tc['unparsed']), tc['parsed'], tc['name'])

    def test_devalue_parse_invalid(self):
        for tc in TEST_CASES_INVALID:
            with self.assertRaisesRegex(tc['error'], tc['pattern'], msg=tc['name']):
                devalue.parse(tc['unparsed'])

    def test_devalue_parse_cyclical(self):
        name = 'Map (cyclical)'
        result = devalue.parse([['Map', 1, 0], 'self'])
        self.assertEqual(result[0][0], 'self', name)
        self.assertIs(result, result[0][1], name)

        name = 'Set (cyclical)'
        result = devalue.parse([['Set', 0, 1], 42])
        self.assertEqual(result[1], 42, name)
        self.assertIs(result, result[0], name)

        result = devalue.parse([[0]])
        self.assertIs(result, result[0], 'Array (cyclical)')

        name = 'Object (cyclical)'
        result = devalue.parse([{'self': 0}])
        self.assertIs(result, result['self'], name)

        name = 'Object with null prototype (cyclical)'
        result = devalue.parse([['null', 'self', 0]])
        self.assertIs(result, result['self'], name)

        name = 'Objects (cyclical)'
        result = devalue.parse([[1, 2], {'second': 2}, {'first': 1}])
        self.assertIs(result[0], result[1]['first'], name)
        self.assertIs(result[1], result[0]['second'], name)

    def test_devalue_parse_revivers(self):
        self.assertEqual(
            devalue.parse([['indirect', 1], {'a': 2}, 'b'], revivers={'indirect': lambda x: x}),
            {'a': 'b'}, 'revivers (indirect)')

        self.assertEqual(
            devalue.parse([['parse', 1], '{"a":0}'], revivers={'parse': lambda x: json.loads(x)}),
            {'a': 0}, 'revivers (parse)')


if __name__ == '__main__':
    unittest.main()
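
For orientation, the format under test is a flat list whose element 0 is the root value; positive integers inside containers are indices back into that same list, which is how repetition and cycles survive parsing. A minimal sketch, assuming the same devalue module and json import used by this test file:

shared = devalue.parse([[1, 2], {'second': 2}, {'first': 1}])
# both dicts resolve through the index table, so the references are shared
assert shared[0] is shared[1]['first']
assert shared[1] is shared[0]['second']

# revivers post-process values tagged with a custom type name
assert devalue.parse([['parse', 1], '{"a":0}'], revivers={'parse': json.loads}) == {'a': 0}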

@@ -478,6 +478,14 @@ def test_extract_function_with_global_stack(self):
         func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
         self.assertEqual(func([1]), 1111)
 
+    def test_increment_decrement(self):
+        self._test('function f() { var x = 1; return ++x; }', 2)
+        self._test('function f() { var x = 1; return x++; }', 1)
+        self._test('function f() { var x = 1; x--; return x }', 0)
+        self._test('function f() { var y; var x = 1; x++, --x, x--, x--, y="z", "abc", x++; return --x }', -1)
+        self._test('function f() { var a = "test--"; return a; }', 'test--')
+        self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--')
+
 
 if __name__ == '__main__':
     unittest.main()

@@ -8,6 +8,8 @@
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
+import subprocess
+
 from yt_dlp import YoutubeDL
 from yt_dlp.utils import shell_quote
 from yt_dlp.postprocessor import (
@@ -47,7 +49,18 @@ def test_escaping(self):
             print('Skipping: ffmpeg not found')
             return
 
-        file = 'test/testdata/thumbnails/foo %d bar/foo_%d.{}'
+        test_data_dir = 'test/testdata/thumbnails'
+        generated_file = f'{test_data_dir}/empty.webp'
+
+        subprocess.check_call([
+            pp.executable, '-y', '-f', 'lavfi', '-i', 'color=c=black:s=320x320',
+            '-c:v', 'libwebp', '-pix_fmt', 'yuv420p', '-vframes', '1', generated_file,
+        ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+
+        file = test_data_dir + '/foo %d bar/foo_%d.{}'
+        initial_file = file.format('webp')
+        os.replace(generated_file, initial_file)
+
         tests = (('webp', 'png'), ('png', 'jpg'))
 
         for inp, out in tests:
@@ -55,11 +68,13 @@ def test_escaping(self):
             if os.path.exists(out_file):
                 os.remove(out_file)
             pp.convert_thumbnail(file.format(inp), out)
-            assert os.path.exists(out_file)
+            self.assertTrue(os.path.exists(out_file))
 
         for _, out in tests:
             os.remove(file.format(out))
+        os.remove(initial_file)
 
 
 class TestExec(unittest.TestCase):
     def test_parse_cmd(self):
@@ -610,3 +625,7 @@ def test_quote_for_concat_QuotesAtEnd(self):
         self.assertEqual(
             r"'special '\'' characters '\'' galore'\'\'\'",
             self._pp._quote_for_ffmpeg("special ' characters ' galore'''"))
+
+
+if __name__ == '__main__':
+    unittest.main()

@@ -11,10 +11,11 @@ class TestGetWebPoContentBinding:
     @pytest.mark.parametrize('client_name, context, is_authenticated, expected', [
         *[(client, context, is_authenticated, expected) for client in [
-            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER']
+            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER', 'TVHTML5_SIMPLY']
             for context, is_authenticated, expected in [
                 (PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),
                 (PoTokenContext.PLAYER, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
+                (PoTokenContext.SUBS, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
                 (PoTokenContext.GVS, True, ('example-data-sync-id', ContentBindingType.DATASYNC_ID)),
             ]],
         ('WEB_REMIX', PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),

@@ -49,7 +49,7 @@ def test_not_supports(self, ie, logger, pot_request, client_name, context, is_au
     @pytest.mark.parametrize('client_name, context, is_authenticated, remote_host, source_address, request_proxy, expected', [
         *[(client, context, is_authenticated, remote_host, source_address, request_proxy, expected) for client in [
-            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER']
+            'WEB', 'MWEB', 'TVHTML5', 'WEB_EMBEDDED_PLAYER', 'WEB_CREATOR', 'TVHTML5_SIMPLY_EMBEDDED_PLAYER', 'TVHTML5_SIMPLY']
             for context, is_authenticated, remote_host, source_address, request_proxy, expected in [
                 (PoTokenContext.GVS, False, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': '123abcXYZ_-', 'cbt': 'visitor_id'}),
                 (PoTokenContext.PLAYER, False, 'example-remote-host', 'example-source-address', 'example-request-proxy', {'t': 'webpo', 'ip': 'example-remote-host', 'sa': 'example-source-address', 'px': 'example-request-proxy', 'cb': '123abcXYZ_-', 'cbt': 'video_id'}),

@@ -416,18 +416,8 @@ def test_traversal_unbranching(self):
             '`any` should allow further branching'
 
     def test_traversal_morsel(self):
-        values = {
-            'expires': 'a',
-            'path': 'b',
-            'comment': 'c',
-            'domain': 'd',
-            'max-age': 'e',
-            'secure': 'f',
-            'httponly': 'g',
-            'version': 'h',
-            'samesite': 'i',
-        }
         morsel = http.cookies.Morsel()
+        values = dict(zip(morsel, 'abcdefghijklmnop'))
         morsel.set('item_key', 'item_value', 'coded_value')
         morsel.update(values)
         values['key'] = 'item_key'

@@ -316,6 +316,18 @@
     (
         'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
         'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
     ),
+    (
+        'https://www.youtube.com/s/player/59b252b9/player_ias.vflset/en_US/base.js',
+        'D3XWVpYgwhLLKNK4AGX', 'aZrQ1qWJ5yv5h',
+    ),
+    (
+        'https://www.youtube.com/s/player/fc2a56a5/player_ias.vflset/en_US/base.js',
+        'qTKWg_Il804jd2kAC', 'OtUAm2W6gyzJjB9u',
+    ),
+    (
+        'https://www.youtube.com/s/player/fc2a56a5/tv-player-ias.vflset/tv-player-ias.js',
+        'qTKWg_Il804jd2kAC', 'OtUAm2W6gyzJjB9u',
+    ),
 ]

(binary file changed, contents not shown; 3.8 KiB before)

@@ -490,7 +490,7 @@ class YoutubeDL:
                        The template is mapped on a dictionary with keys 'progress' and 'info'
     retry_sleep_functions: Dictionary of functions that takes the number of attempts
                        as argument and returns the time to sleep in seconds.
-                       Allowed keys are 'http', 'fragment', 'file_access'
+                       Allowed keys are 'http', 'fragment', 'file_access', 'extractor'
     download_ranges:   A callback function that gets called for every video with
                        the signature (info_dict, ydl) -> Iterable[Section].
                        Only the returned sections will be downloaded.
@@ -2219,6 +2219,7 @@ def _check_formats(self, formats):
                     self.report_warning(f'Unable to delete temporary file "{temp_file.name}"')
                 f['__working'] = success
             if success:
+                f.pop('__needs_testing', None)
                 yield f
             else:
                 self.to_screen('[info] Unable to download format {}. Skipping...'.format(f['format_id']))
@@ -3963,6 +3964,7 @@ def simplified_codec(f, field):
                 self._format_out('UNSUPPORTED', self.Styles.BAD_FORMAT) if f.get('ext') in ('f4f', 'f4m') else None,
                 (self._format_out('Maybe DRM', self.Styles.WARNING) if f.get('has_drm') == 'maybe'
                     else self._format_out('DRM', self.Styles.BAD_FORMAT) if f.get('has_drm') else None),
+                self._format_out('Untested', self.Styles.WARNING) if f.get('__needs_testing') else None,
                 format_field(f, 'format_note'),
                 format_field(f, 'container', ignore=(None, f.get('ext'))),
                 delim=', '), delim=' '),

@@ -764,11 +764,11 @@ def _get_linux_desktop_environment(env, logger):
     GetDesktopEnvironment
     """
     xdg_current_desktop = env.get('XDG_CURRENT_DESKTOP', None)
-    desktop_session = env.get('DESKTOP_SESSION', None)
+    desktop_session = env.get('DESKTOP_SESSION', '')
    if xdg_current_desktop is not None:
         for part in map(str.strip, xdg_current_desktop.split(':')):
             if part == 'Unity':
-                if desktop_session is not None and 'gnome-fallback' in desktop_session:
+                if 'gnome-fallback' in desktop_session:
                     return _LinuxDesktopEnvironment.GNOME
                 else:
                     return _LinuxDesktopEnvironment.UNITY
@@ -797,35 +797,34 @@ def _get_linux_desktop_environment(env, logger):
                 return _LinuxDesktopEnvironment.UKUI
             elif part == 'LXQt':
                 return _LinuxDesktopEnvironment.LXQT
-        logger.info(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"')
-    elif desktop_session is not None:
-        if desktop_session == 'deepin':
-            return _LinuxDesktopEnvironment.DEEPIN
-        elif desktop_session in ('mate', 'gnome'):
-            return _LinuxDesktopEnvironment.GNOME
-        elif desktop_session in ('kde4', 'kde-plasma'):
-            return _LinuxDesktopEnvironment.KDE4
-        elif desktop_session == 'kde':
-            if 'KDE_SESSION_VERSION' in env:
-                return _LinuxDesktopEnvironment.KDE4
-            else:
-                return _LinuxDesktopEnvironment.KDE3
-        elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
-            return _LinuxDesktopEnvironment.XFCE
-        elif desktop_session == 'ukui':
-            return _LinuxDesktopEnvironment.UKUI
-        else:
-            logger.info(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"')
-    else:
-        if 'GNOME_DESKTOP_SESSION_ID' in env:
-            return _LinuxDesktopEnvironment.GNOME
-        elif 'KDE_FULL_SESSION' in env:
-            if 'KDE_SESSION_VERSION' in env:
-                return _LinuxDesktopEnvironment.KDE4
-            else:
-                return _LinuxDesktopEnvironment.KDE3
+        logger.debug(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"')
+
+    if desktop_session == 'deepin':
+        return _LinuxDesktopEnvironment.DEEPIN
+    elif desktop_session in ('mate', 'gnome'):
+        return _LinuxDesktopEnvironment.GNOME
+    elif desktop_session in ('kde4', 'kde-plasma'):
+        return _LinuxDesktopEnvironment.KDE4
+    elif desktop_session == 'kde':
+        if 'KDE_SESSION_VERSION' in env:
+            return _LinuxDesktopEnvironment.KDE4
+        else:
+            return _LinuxDesktopEnvironment.KDE3
+    elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
+        return _LinuxDesktopEnvironment.XFCE
+    elif desktop_session == 'ukui':
+        return _LinuxDesktopEnvironment.UKUI
+    else:
+        logger.debug(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"')
+
+    if 'GNOME_DESKTOP_SESSION_ID' in env:
+        return _LinuxDesktopEnvironment.GNOME
+    elif 'KDE_FULL_SESSION' in env:
+        if 'KDE_SESSION_VERSION' in env:
+            return _LinuxDesktopEnvironment.KDE4
+        else:
+            return _LinuxDesktopEnvironment.KDE3
+
     return _LinuxDesktopEnvironment.OTHER

@@ -5,47 +5,46 @@
 from .common import FileDownloader
 from .external import FFmpegFD
 from ..networking import Request
-from ..utils import DownloadError, str_or_none, try_get
+from ..networking.websocket import WebSocketResponse
+from ..utils import DownloadError, str_or_none, truncate_string
+from ..utils.traversal import traverse_obj
 
 
 class NiconicoLiveFD(FileDownloader):
     """ Downloads niconico live without being stopped """
 
     def real_download(self, filename, info_dict):
-        video_id = info_dict['video_id']
-        ws_url = info_dict['url']
-        ws_extractor = info_dict['ws']
-        ws_origin_host = info_dict['origin']
-        live_quality = info_dict.get('live_quality', 'high')
-        live_latency = info_dict.get('live_latency', 'high')
+        video_id = info_dict['id']
+        opts = info_dict['downloader_options']
+        quality, ws_extractor, ws_url = opts['max_quality'], opts['ws'], opts['ws_url']
         dl = FFmpegFD(self.ydl, self.params or {})
 
         new_info_dict = info_dict.copy()
-        new_info_dict.update({
-            'protocol': 'm3u8',
-        })
+        new_info_dict['protocol'] = 'm3u8'
 
         def communicate_ws(reconnect):
-            if reconnect:
-                ws = self.ydl.urlopen(Request(ws_url, headers={'Origin': f'https://{ws_origin_host}'}))
+            # Support --load-info-json as if it is a reconnect attempt
+            if reconnect or not isinstance(ws_extractor, WebSocketResponse):
+                ws = self.ydl.urlopen(Request(
+                    ws_url, headers={'Origin': 'https://live.nicovideo.jp'}))
                 if self.ydl.params.get('verbose', False):
-                    self.to_screen('[debug] Sending startWatching request')
+                    self.write_debug('Sending startWatching request')
                 ws.send(json.dumps({
-                    'type': 'startWatching',
                     'data': {
+                        'reconnect': True,
+                        'room': {
+                            'commentable': True,
+                            'protocol': 'webSocket',
+                        },
                         'stream': {
-                            'quality': live_quality,
-                            'protocol': 'hls+fmp4',
-                            'latency': live_latency,
                             'accessRightMethod': 'single_cookie',
                             'chasePlay': False,
+                            'latency': 'high',
+                            'protocol': 'hls',
+                            'quality': quality,
                         },
-                        'room': {
-                            'protocol': 'webSocket',
-                            'commentable': True,
-                        },
-                        'reconnect': True,
                     },
+                    'type': 'startWatching',
                 }))
             else:
                 ws = ws_extractor
@@ -58,7 +57,6 @@ def communicate_ws(reconnect):
                 if not data or not isinstance(data, dict):
                     continue
                 if data.get('type') == 'ping':
-                    # pong back
                     ws.send(r'{"type":"pong"}')
                     ws.send(r'{"type":"keepSeat"}')
                 elif data.get('type') == 'disconnect':
@@ -66,12 +64,10 @@ def communicate_ws(reconnect):
                     return True
                 elif data.get('type') == 'error':
                     self.write_debug(data)
-                    message = try_get(data, lambda x: x['body']['code'], str) or recv
+                    message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
                     return DownloadError(message)
                 elif self.ydl.params.get('verbose', False):
-                    if len(recv) > 100:
-                        recv = recv[:100] + '...'
-                    self.to_screen(f'[debug] Server said: {recv}')
+                    self.write_debug(f'Server response: {truncate_string(recv, 100)}')
 
         def ws_main():
             reconnect = False
@@ -81,7 +77,8 @@ def ws_main():
                     if ret is True:
                         return
                 except BaseException as e:
-                    self.to_screen('[{}] {}: Connection error occured, reconnecting after 10 seconds: {}'.format('niconico:live', video_id, str_or_none(e)))
+                    self.to_screen(
+                        f'[niconico:live] {video_id}: Connection error occured, reconnecting after 10 seconds: {e}')
                     time.sleep(10)
                     continue
                 finally:
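
For reference, the keepalive contract that the receive loop above maintains is small; a minimal sketch, assuming a WebSocketResponse-like object with send()/recv() (names here are illustrative only):

import json

def answer_ping(ws, recv):
    data = json.loads(recv)
    if data.get('type') == 'ping':
        # the server expects both a pong and a seat keepalive, as in the diff above
        ws.send('{"type":"pong"}')
        ws.send('{"type":"keepSeat"}')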

@@ -300,7 +300,6 @@
     BrainPOPIlIE,
     BrainPOPJrIE,
 )
-from .bravotv import BravoTVIE
 from .breitbart import BreitBartIE
 from .brightcove import (
     BrightcoveLegacyIE,
@@ -1108,6 +1107,7 @@
 from .massengeschmacktv import MassengeschmackTVIE
 from .masters import MastersIE
 from .matchtv import MatchTVIE
+from .mave import MaveIE
 from .mbn import MBNIE
 from .mdr import MDRIE
 from .medaltv import MedalTVIE
@@ -1262,6 +1262,7 @@
 )
 from .nbc import (
     NBCIE,
+    BravoTVIE,
     NBCNewsIE,
     NBCOlympicsIE,
     NBCOlympicsStreamIE,
@@ -1269,6 +1270,7 @@
     NBCSportsStreamIE,
     NBCSportsVPlayerIE,
     NBCStationsIE,
+    SyfyIE,
 )
 from .ndr import (
     NDRIE,
@@ -2016,13 +2018,11 @@
     SverigesRadioPublicationIE,
 )
 from .svt import (
-    SVTIE,
     SVTPageIE,
     SVTPlayIE,
     SVTSeriesIE,
 )
 from .swearnet import SwearnetEpisodeIE
-from .syfy import SyfyIE
 from .syvdk import SYVDKIE
 from .sztvhu import SztvHuIE
 from .tagesschau import TagesschauIE
@@ -2147,6 +2147,7 @@
 from .toggo import ToggoIE
 from .tonline import TOnlineIE
 from .toongoggles import ToonGogglesIE
+from .toutiao import ToutiaoIE
 from .toutv import TouTvIE
 from .toypics import (
     ToypicsIE,

@@ -3,6 +3,7 @@
 import re
 import time
 import urllib.parse
+import uuid
 import xml.etree.ElementTree as etree
 
 from .common import InfoExtractor
@@ -10,6 +11,7 @@
 from ..utils import (
     NO_DEFAULT,
     ExtractorError,
+    parse_qs,
     unescapeHTML,
     unified_timestamp,
     urlencode_postdata,
@@ -45,6 +47,8 @@
         'name': 'Comcast XFINITY',
         'username_field': 'user',
         'password_field': 'passwd',
+        'login_hostname': 'login.xfinity.com',
+        'needs_newer_ua': True,
     },
     'TWC': {
         'name': 'Time Warner Cable | Spectrum',
@@ -74,6 +78,12 @@
         'name': 'Verizon FiOS',
         'username_field': 'IDToken1',
         'password_field': 'IDToken2',
+        'login_hostname': 'ssoauth.verizon.com',
+    },
+    'Fubo': {
+        'name': 'Fubo',
+        'username_field': 'username',
+        'password_field': 'password',
     },
     'Cablevision': {
         'name': 'Optimum/Cablevision',
@@ -1338,6 +1348,7 @@
         'name': 'Sling TV',
         'username_field': 'username',
         'password_field': 'password',
+        'login_hostname': 'identity.sling.com',
     },
     'Suddenlink': {
         'name': 'Suddenlink',
@@ -1355,7 +1366,6 @@
 class AdobePassIE(InfoExtractor):  # XXX: Conventionally, base classes should end with BaseIE/InfoExtractor
     _SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
     _USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
-    _MODERN_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; rv:131.0) Gecko/20100101 Firefox/131.0'
     _MVPD_CACHE = 'ap-mvpd'
     _DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
@@ -1367,6 +1377,14 @@ def _download_webpage_handle(self, *args, **kwargs):
         return super()._download_webpage_handle(
             *args, **kwargs)
 
+    @staticmethod
+    def _get_mso_headers(mso_info):
+        # yt-dlp's default user-agent is usually too old for some MSO's like Comcast_SSO
+        # See: https://github.com/yt-dlp/yt-dlp/issues/10848
+        return {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:131.0) Gecko/20100101 Firefox/131.0',
+        } if mso_info.get('needs_newer_ua') else {}
+
     @staticmethod
     def _get_mvpd_resource(provider_id, title, guid, rating):
         channel = etree.Element('channel')
@@ -1382,7 +1400,13 @@ def _get_mvpd_resource(provider_id, title, guid, rating):
         resource_rating.text = rating
         return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>'
 
-    def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
+    def _extract_mvpd_auth(self, url, video_id, requestor_id, resource, software_statement):
+        mso_id = self.get_param('ap_mso')
+        if mso_id:
+            mso_info = MSO_INFO[mso_id]
+        else:
+            mso_info = {}
+
         def xml_text(xml_str, tag):
             return self._search_regex(
                 f'<{tag}>(.+?)</{tag}>', xml_str, tag)
@@ -1391,15 +1415,27 @@ def is_expired(token, date_ele):
             token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele)))
             return token_expires and token_expires <= int(time.time())
 
-        def post_form(form_page_res, note, data={}):
+        def post_form(form_page_res, note, data={}, validate_url=False):
             form_page, urlh = form_page_res
             post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
             if not re.match(r'https?://', post_url):
                 post_url = urllib.parse.urljoin(urlh.url, post_url)
+            if validate_url:
+                # This request is submitting credentials so we should validate it when possible
+                url_parsed = urllib.parse.urlparse(post_url)
+                expected_hostname = mso_info.get('login_hostname')
+                if expected_hostname and expected_hostname != url_parsed.hostname:
+                    raise ExtractorError(
+                        f'Unexpected login URL hostname; expected "{expected_hostname}" but got '
+                        f'"{url_parsed.hostname}". Aborting before submitting credentials')
+                if url_parsed.scheme != 'https':
+                    self.write_debug('Upgrading login URL scheme to https')
+                    post_url = urllib.parse.urlunparse(url_parsed._replace(scheme='https'))
             form_data = self._hidden_inputs(form_page)
             form_data.update(data)
             return self._download_webpage_handle(
                 post_url, video_id, note, data=urlencode_postdata(form_data), headers={
+                    **self._get_mso_headers(mso_info),
                     'Content-Type': 'application/x-www-form-urlencoded',
                 })
@@ -1432,40 +1468,72 @@ def extract_redirect_url(html, url=None, fatal=False):
         }
         guid = xml_text(resource, 'guid') if '<' in resource else resource
-        count = 0
-        while count < 2:
+        for _ in range(2):
             requestor_info = self.cache.load(self._MVPD_CACHE, requestor_id) or {}
             authn_token = requestor_info.get('authn_token')
             if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
                 authn_token = None
             if not authn_token:
-                mso_id = self.get_param('ap_mso')
-                if mso_id:
-                    username, password = self._get_login_info('ap_username', 'ap_password', mso_id)
-                    if not username or not password:
-                        raise_mvpd_required()
-                    mso_info = MSO_INFO[mso_id]
-
-                    provider_redirect_page_res = self._download_webpage_handle(
-                        self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
-                        'Downloading Provider Redirect Page', query={
-                            'noflash': 'true',
-                            'mso_id': mso_id,
-                            'requestor_id': requestor_id,
-                            'no_iframe': 'false',
-                            'domain_name': 'adobe.com',
-                            'redirect_url': url,
-                        }, headers={
-                            # yt-dlp's default user-agent is usually too old for Comcast_SSO
-                            # See: https://github.com/yt-dlp/yt-dlp/issues/10848
-                            'User-Agent': self._MODERN_USER_AGENT,
-                        } if mso_id == 'Comcast_SSO' else None)
-                elif not self._cookies_passed:
-                    raise_mvpd_required()
-
-                if not mso_id:
-                    pass
-                elif mso_id == 'Comcast_SSO':
+                if not mso_id:
+                    raise_mvpd_required()
+                username, password = self._get_login_info('ap_username', 'ap_password', mso_id)
+                if not username or not password:
+                    raise_mvpd_required()
+
+                device_info, urlh = self._download_json_handle(
+                    'https://sp.auth.adobe.com/indiv/devices',
+                    video_id, 'Registering device with Adobe',
+                    data=json.dumps({'fingerprint': uuid.uuid4().hex}).encode(),
+                    headers={'Content-Type': 'application/json; charset=UTF-8'})
+
+                device_id = device_info['deviceId']
+                mvpd_headers['pass_sfp'] = urlh.get_header('pass_sfp')
+                mvpd_headers['Ap_21'] = device_id
+
+                registration = self._download_json(
+                    'https://sp.auth.adobe.com/o/client/register',
+                    video_id, 'Registering client with Adobe',
+                    data=json.dumps({'software_statement': software_statement}).encode(),
+                    headers={'Content-Type': 'application/json; charset=UTF-8'})
+
+                access_token = self._download_json(
+                    'https://sp.auth.adobe.com/o/client/token', video_id,
+                    'Obtaining access token', data=urlencode_postdata({
+                        'grant_type': 'client_credentials',
+                        'client_id': registration['client_id'],
+                        'client_secret': registration['client_secret'],
+                    }),
+                    headers={
+                        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+                    })['access_token']
+                mvpd_headers['Authorization'] = f'Bearer {access_token}'
+
+                reg_code = self._download_json(
+                    f'https://sp.auth.adobe.com/reggie/v1/{requestor_id}/regcode',
+                    video_id, 'Obtaining registration code',
+                    data=urlencode_postdata({
+                        'requestor': requestor_id,
+                        'deviceId': device_id,
+                        'format': 'json',
+                    }),
+                    headers={
+                        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+                        'Authorization': f'Bearer {access_token}',
+                    })['code']
+
+                provider_redirect_page_res = self._download_webpage_handle(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
+                    'Downloading Provider Redirect Page', query={
+                        'noflash': 'true',
+                        'mso_id': mso_id,
+                        'requestor_id': requestor_id,
+                        'no_iframe': 'false',
+                        'domain_name': 'adobe.com',
+                        'redirect_url': url,
+                        'reg_code': reg_code,
+                    }, headers=self._get_mso_headers(mso_info))
+
+                if mso_id == 'Comcast_SSO':
                     # Comcast page flow varies by video site and whether you
                     # are on Comcast's network.
                     provider_redirect_page, urlh = provider_redirect_page_res
@@ -1489,8 +1557,8 @@ def extract_redirect_url(html, url=None, fatal=False):
                     oauth_redirect_url = extract_redirect_url(
                         provider_redirect_page, fatal=True)
                     provider_login_page_res = self._download_webpage_handle(
-                        oauth_redirect_url, video_id,
-                        self._DOWNLOADING_LOGIN_PAGE)
+                        oauth_redirect_url, video_id, self._DOWNLOADING_LOGIN_PAGE,
+                        headers=self._get_mso_headers(mso_info))
                 else:
                     provider_login_page_res = post_form(
                         provider_redirect_page_res,
@@ -1500,24 +1568,35 @@ def extract_redirect_url(html, url=None, fatal=False):
                         provider_login_page_res, 'Logging in', {
                             mso_info['username_field']: username,
                             mso_info['password_field']: password,
-                        })
+                        }, validate_url=True)
                     mvpd_confirm_page, urlh = mvpd_confirm_page_res
                     if '<button class="submit" value="Resume">Resume</button>' in mvpd_confirm_page:
                         post_form(mvpd_confirm_page_res, 'Confirming Login')
                 elif mso_id == 'Philo':
                     # Philo has very unique authentication method
-                    self._download_webpage(
-                        'https://idp.philo.com/auth/init/login_code', video_id, 'Requesting auth code', data=urlencode_postdata({
+                    self._request_webpage(
+                        'https://idp.philo.com/auth/init/login_code', video_id,
+                        'Requesting Philo auth code', data=json.dumps({
                             'ident': username,
                             'device': 'web',
                             'send_confirm_link': False,
                             'send_token': True,
-                        }))
+                            'device_ident': f'web-{uuid.uuid4().hex}',
+                            'include_login_link': True,
+                        }).encode(), headers={
+                            'Content-Type': 'application/json',
+                            'Accept': 'application/json',
+                        })
                     philo_code = getpass.getpass('Type auth code you have received [Return]: ')
-                    self._download_webpage(
-                        'https://idp.philo.com/auth/update/login_code', video_id, 'Submitting token', data=urlencode_postdata({
-                            'token': philo_code,
-                        }))
+                    self._request_webpage(
+                        'https://idp.philo.com/auth/update/login_code', video_id,
+                        'Submitting token', data=json.dumps({'token': philo_code}).encode(),
+                        headers={
+                            'Content-Type': 'application/json',
+                            'Accept': 'application/json',
+                        })
                     mvpd_confirm_page_res = self._download_webpage_handle('https://idp.philo.com/idp/submit', video_id, 'Confirming Philo Login')
                     post_form(mvpd_confirm_page_res, 'Confirming Login')
                 elif mso_id == 'Verizon':
@@ -1539,7 +1618,7 @@ def extract_redirect_url(html, url=None, fatal=False):
                         provider_redirect_page_res, 'Logging in', {
                             mso_info['username_field']: username,
                             mso_info['password_field']: password,
-                        })
+                        }, validate_url=True)
                     saml_login_page, urlh = saml_login_page_res
                     if 'Please try again.' in saml_login_page:
                         raise ExtractorError(
@@ -1560,7 +1639,7 @@ def extract_redirect_url(html, url=None, fatal=False):
                         [saml_login_page, saml_redirect_url], 'Logging in', {
                             mso_info['username_field']: username,
                             mso_info['password_field']: password,
-                        })
+                        }, validate_url=True)
                     if 'Please try again.' in saml_login_page:
                         raise ExtractorError(
                             'Failed to login, incorrect User ID or Password.')
@@ -1631,7 +1710,7 @@ def extract_redirect_url(html, url=None, fatal=False):
                         provider_login_page_res, 'Logging in', {
                             mso_info['username_field']: username,
                             mso_info['password_field']: password,
-                        })
+                        }, validate_url=True)
 
                     provider_refresh_redirect_url = extract_redirect_url(
                         provider_association_redirect, url=urlh.url)
@@ -1682,7 +1761,7 @@ def extract_redirect_url(html, url=None, fatal=False):
                         provider_login_page_res, 'Logging in', {
                             mso_info['username_field']: username,
                             mso_info['password_field']: password,
-                        })
+                        }, validate_url=True)
 
                     provider_refresh_redirect_url = extract_redirect_url(
                         provider_association_redirect, url=urlh.url)
@@ -1699,6 +1778,27 @@ def extract_redirect_url(html, url=None, fatal=False):
                         query=hidden_data)
 
                     post_form(mvpd_confirm_page_res, 'Confirming Login')
+                elif mso_id == 'Fubo':
+                    _, urlh = provider_redirect_page_res
+
+                    fubo_response = self._download_json(
+                        'https://api.fubo.tv/partners/tve/connect', video_id,
+                        'Authenticating with Fubo', 'Unable to authenticate with Fubo',
+                        query=parse_qs(urlh.url), data=json.dumps({
+                            'username': username,
+                            'password': password,
+                        }).encode(), headers={
+                            'Accept': 'application/json',
+                            'Content-Type': 'application/json',
+                        })
+
+                    self._request_webpage(
+                        'https://sp.auth.adobe.com/adobe-services/oauth2', video_id,
+                        'Authenticating with Adobe', 'Failed to authenticate with Adobe',
+                        query={
+                            'code': fubo_response['code'],
+                            'state': fubo_response['state'],
+                        })
                 else:
                     # Some providers (e.g. DIRECTV NOW) have another meta refresh
                     # based redirect that should be followed.
@@ -1717,7 +1817,8 @@ def extract_redirect_url(html, url=None, fatal=False):
                     }
                     if mso_id in ('Cablevision', 'AlticeOne'):
                         form_data['_eventId_proceed'] = ''
-                    mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', form_data)
+                    mvpd_confirm_page_res = post_form(
+                        provider_login_page_res, 'Logging in', form_data, validate_url=True)
                     if mso_id != 'Rogers':
                         post_form(mvpd_confirm_page_res, 'Confirming Login')
@@ -1727,6 +1828,7 @@ def extract_redirect_url(html, url=None, fatal=False):
                     'Retrieving Session', data=urlencode_postdata({
                         '_method': 'GET',
                         'requestor_id': requestor_id,
+                        'reg_code': reg_code,
                     }), headers=mvpd_headers)
             except ExtractorError as e:
                 if not mso_id and isinstance(e.cause, HTTPError) and e.cause.status == 401:
@@ -1734,7 +1836,6 @@ def extract_redirect_url(html, url=None, fatal=False):
                 raise
             if '<pendingLogout' in session:
                 self.cache.store(self._MVPD_CACHE, requestor_id, {})
-                count += 1
                 continue
             authn_token = unescapeHTML(xml_text(session, 'authnToken'))
             requestor_info['authn_token'] = authn_token
@@ -1755,7 +1856,6 @@ def extract_redirect_url(html, url=None, fatal=False):
             }), headers=mvpd_headers)
         if '<pendingLogout' in authorize:
             self.cache.store(self._MVPD_CACHE, requestor_id, {})
-            count += 1
             continue
         if '<error' in authorize:
             raise ExtractorError(xml_text(authorize, 'details'), expected=True)
@@ -1778,6 +1878,5 @@ def extract_redirect_url(html, url=None, fatal=False):
             }), headers=mvpd_headers)
         if '<pendingLogout' in short_authorize:
             self.cache.store(self._MVPD_CACHE, requestor_id, {})
-            count += 1
             continue
         return short_authorize
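
In outline, the reworked bootstrap registers an anonymous device, exchanges the per-site software_statement for OAuth client credentials, then obtains a registration code that is threaded through the SAML redirect and session retrieval above. A rough standalone sketch under those assumptions (standard library only; caching, headers, and error handling omitted):

import json
import urllib.parse
import urllib.request
import uuid

ADOBE = 'https://sp.auth.adobe.com'

def _post_json(url, payload):
    # POST a JSON body and decode the JSON response
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json; charset=UTF-8'})
    return json.load(urllib.request.urlopen(req))

def _post_form(url, fields, headers=None):
    # POST urlencoded fields and decode the JSON response
    req = urllib.request.Request(
        url, data=urllib.parse.urlencode(fields).encode(), headers=headers or {})
    return json.load(urllib.request.urlopen(req))

def bootstrap_reg_code(software_statement, requestor_id):
    device = _post_json(f'{ADOBE}/indiv/devices', {'fingerprint': uuid.uuid4().hex})
    client = _post_json(f'{ADOBE}/o/client/register', {'software_statement': software_statement})
    token = _post_form(f'{ADOBE}/o/client/token', {
        'grant_type': 'client_credentials',
        'client_id': client['client_id'],
        'client_secret': client['client_secret'],
    })['access_token']
    return _post_form(f'{ADOBE}/reggie/v1/{requestor_id}/regcode', {
        'requestor': requestor_id, 'deviceId': device['deviceId'], 'format': 'json',
    }, headers={'Authorization': f'Bearer {token}'})['code']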

@@ -84,6 +84,8 @@ class AdultSwimIE(TurnerBaseIE):
         'skip': '404 Not Found',
     }]
 
+    _SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwNjg5ZmU2My00OTc5LTQxZmQtYWYxNC1hYjVlNmJjNWVkZWIiLCJuYmYiOjE1MzcxOTA2NzQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwNjc0fQ.Xl3AEduM0s1TxDQ6-XssdKIiLm261hhsEv1C1yo_nitIajZThSI9rXILqtIzO0aujoHhdzUnu_dUCq9ffiSBzEG632tTa1la-5tegHtce80cMhewBN4n2t8n9O5tiaPx8MPY8ALdm5wS7QzWE6DO_LTJKgE8Bl7Yv-CWJT4q4SywtNiQWLVOuhBRnDyfsRezxRwptw8qTn9dv5ZzUrVJaby5fDZ_nOncMKvegOgaKd5KEuCAGQ-mg-PSuValMjGuf6FwDguGaK7IyI5Y2oOrzXmD4Dj7q4WBg8w9QoZhtLeAU56mcsGILolku2R5FHlVLO9xhjResyt-pfmegOkpSw'
+
     def _real_extract(self, url):
         show_path, episode_path = self._match_valid_url(url).groups()
         display_id = episode_path or show_path
@@ -152,7 +154,7 @@ def _real_extract(self, url):
                 # CDN_TOKEN_APP_ID from:
                 # https://d2gg02c3xr550i.cloudfront.net/assets/asvp.e9c8bef24322d060ef87.bundle.js
                 'appId': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhcHBJZCI6ImFzLXR2ZS1kZXNrdG9wLXB0enQ2bSIsInByb2R1Y3QiOiJ0dmUiLCJuZXR3b3JrIjoiYXMiLCJwbGF0Zm9ybSI6ImRlc2t0b3AiLCJpYXQiOjE1MzI3MDIyNzl9.BzSCk-WYOZ2GMCIaeVb8zWnzhlgnXuJTCu0jGp_VaZE',
-            }, {
+            }, self._SOFTWARE_STATEMENT, {
                 'url': url,
                 'site_name': 'AdultSwim',
                 'auth_required': auth,

@@ -1,3 +1,5 @@
+import json
+
 from .theplatform import ThePlatformIE
 from ..utils import (
     ExtractorError,
@@ -6,7 +8,6 @@
     remove_start,
     traverse_obj,
     update_url_query,
-    urlencode_postdata,
 )
@@ -20,13 +21,13 @@ class AENetworksBaseIE(ThePlatformIE):  # XXX: Do not subclass from concrete IE
     _THEPLATFORM_KEY = '43jXaGRQud'
     _THEPLATFORM_SECRET = 'S10BPXHMlb'
     _DOMAIN_MAP = {
-        'history.com': ('HISTORY', 'history'),
-        'aetv.com': ('AETV', 'aetv'),
-        'mylifetime.com': ('LIFETIME', 'lifetime'),
-        'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
-        'fyi.tv': ('FYI', 'fyi'),
-        'historyvault.com': (None, 'historyvault'),
-        'biography.com': (None, 'biography'),
+        'history.com': ('HISTORY', 'history', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1MzZlMTQ3ZS0zMzFhLTQxY2YtYTMwNC01MDA2NzNlOGYwYjYiLCJuYmYiOjE1Mzg2NjMzMDksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM4NjYzMzA5fQ.n24-FVHLGXJe2D4atIQZ700aiXKIajKh5PWFoHJ40Az4itjtwwSFHnvufnoal3T8lYkwNLxce7H-IEGxIykRkZEdwq09pMKMT-ft9ASzE4vQ8fAWbf5ZgDME86x4Jq_YaxkRc9Ne0eShGhl8fgTJHvk07sfWcol61HJ7kU7K8FzzcHR0ucFQgA5VNd8RyjoGWY7c6VxnXR214LOpXsywmit04-vGJC102b_WA2EQfqI93UzG6M6l0EeV4n0_ijP3s8_i8WMJZ_uwnTafCIY6G_731i01dKXDLSFzG1vYglAwDa8DTcdrAAuIFFDF6QNGItCCmwbhjufjmoeVb7R1Gg'),
+        'aetv.com': ('AETV', 'aetv', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI5Y2IwNjg2Yy03ODUxLTRiZDUtODcyMC00MjNlZTg1YTQ1NzMiLCJuYmYiOjE1Mzg2NjMyOTAsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM4NjYzMjkwfQ.T5Elf0X4TndO4NEgqBas1gDxNHGPVk_daO2Ha5FBzVO6xi3zM7eavdAKfYMCN7gpWYJx03iADaVPtczO_t_aGZczDjpwJHgTUzDgvcLZAVsVDqtDIAMy3S846rPgT6UDbVoxurA7B2VTPm9phjrSXhejvd0LBO8MQL4AZ3sy2VmiPJ2noT1ily5PuHCYlkrT1fheO064duR__Cd9DQ5VTMnKjzY3Cx345CEwKDkUk5gwgxhXM-aY0eblehrq8VD81_aRM_O3tvh7nbTydHOnUpV-k_iKVi49gqz7Sf8zb6Zh5z2Uftn3vYCfE5NQuesitoRMnsH17nW7o_D59hkRgg'),
+        'mylifetime.com': ('LIFETIME', 'lifetime', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJmODg0MDM1ZC1mZGRmLTRmYjgtYmRkMC05MzRhZDdiYTAwYTciLCJuYmYiOjE1NDkzOTI2NDQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTQ5MzkyNjQ0fQ.vkTIaCpheKdKQd__2-3ec4qkcpbAhyCTvwe5iTl922ItSQfVhpEJG4wseVSNmBTrpBi0hvLedcw6Hj1_UuzBMVuVcCqLprU-pI8recEwL0u7G-eVkylsxe1OTUm1o3V6OykXQ9KlA-QQLL1neUhdhR1n5B1LZ4cmtBmiEpfgf4rFwXD1ScFylIcaWKLBqHoRBNUmxyTmoXXvn_A-GGSj9eCizFzY8W5uBwUcsoiw2Cr1skx7PbB2RSP1I5DsoIJKG-8XV1KS7MWl-fNLjE-hVAsI9znqfEEFcPBiv3LhCP4Nf4OIs7xAselMn0M0c8igRUZhURWX_hdygUAxkbKFtQ'),
+        'fyi.tv': ('FYI', 'fyi', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxOGZiOWM3Ny1mYmMzLTQxYTktYmE1Yi1lMzM0ZmUzNzU4NjEiLCJuYmYiOjE1ODc1ODAzNzcsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTg3NTgwMzc3fQ.AYDuipKswmIfLBfOjHRsfc5fMV5NmJUmiJnkpiep4VEw9QiXkygFj4bN06Si5tFc5Mee5TDrGzDpV6iuKbVpLT5kuqXhAn-Wozf5zKPsg_IpdEKO7gsiCq4calt72ct44KTqtKD_hVcoxQU24_HaJsRgXzu3B-6Ff6UrmsXkyvYifYVC9v2DSkdCuA02_IrlllzVT2kRuefUXgL4vQRtTFf77uYa0RKSTG7uVkiQ_AU41eXevKlO2qgtc14Hk5cZ7-ZNrDyMCXYA5ngdIHP7Gs9PWaFXT36PFHI_rC4EfxUABPzjQFxjpP75aX5qn8SH__HbM9q3hoPWgaEaf76qIQ'),
+        'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc', None),
+        'historyvault.com': (None, 'historyvault', None),
+        'biography.com': (None, 'biography', None),
     }
 
     def _extract_aen_smil(self, smil_url, video_id, auth=None):
@@ -71,7 +72,7 @@ def _extract_aen_smil(self, smil_url, video_id, auth=None):
         }
 
     def _extract_aetn_info(self, domain, filter_key, filter_value, url):
-        requestor_id, brand = self._DOMAIN_MAP[domain]
+        requestor_id, brand, software_statement = self._DOMAIN_MAP[domain]
         result = self._download_json(
             f'https://feeds.video.aetnd.com/api/v2/{brand}/videos',
             filter_value, query={f'filter[{filter_key}]': filter_value})
@@ -95,7 +96,7 @@ def _extract_aetn_info(self, domain, filter_key, filter_value, url):
             theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
             traverse_obj(theplatform_metadata, ('ratings', 0, 'rating')))
         auth = self._extract_mvpd_auth(
-            url, video_id, requestor_id, resource)
+            url, video_id, requestor_id, resource, software_statement)
         info.update(self._extract_aen_smil(media_url, video_id, auth))
         info.update({
             'title': title,
@@ -132,10 +133,11 @@ class AENetworksIE(AENetworksBaseIE):
             'tags': 'count:14',
             'categories': ['Mountain Men'],
             'episode_number': 1,
-            'episode': 'Episode 1',
+            'episode': 'Winter Is Coming',
             'season': 'Season 1',
             'season_number': 1,
             'series': 'Mountain Men',
+            'age_limit': 0,
         },
         'params': {
             # m3u8 download
@@ -157,18 +159,18 @@ class AENetworksIE(AENetworksBaseIE):
             'thumbnail': r're:^https?://.*\.jpe?g$',
             'chapters': 'count:4',
             'tags': 'count:23',
-            'episode': 'Episode 1',
+            'episode': 'Inlawful Entry',
             'episode_number': 1,
             'season': 'Season 9',
             'season_number': 9,
             'series': 'Duck Dynasty',
+            'age_limit': 0,
         },
         'params': {
             # m3u8 download
             'skip_download': True,
         },
         'add_ie': ['ThePlatform'],
+        'skip': 'This video is only available for users of participating TV providers.',
     }, {
         'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
         'only_matching': True,
@@ -203,18 +205,19 @@ def _real_extract(self, url):
 class AENetworksListBaseIE(AENetworksBaseIE):
     def _call_api(self, resource, slug, brand, fields):
         return self._download_json(
-            'https://yoga.appsvcs.aetnd.com/graphql',
-            slug, query={'brand': brand}, data=urlencode_postdata({
+            'https://yoga.appsvcs.aetnd.com/graphql', slug,
+            query={'brand': brand}, headers={'Content-Type': 'application/json'},
+            data=json.dumps({
                 'query': '''{
   %s(slug: "%s") {
     %s
   }
 }''' % (resource, slug, fields),  # noqa: UP031
-            }))['data'][resource]
+            }).encode())['data'][resource]
 
     def _real_extract(self, url):
         domain, slug = self._match_valid_url(url).groups()
-        _, brand = self._DOMAIN_MAP[domain]
+        _, brand, _ = self._DOMAIN_MAP[domain]
         playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
         base_url = f'http://watch.{domain}'

@@ -816,6 +816,26 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
             'upload_date': '20111104',
             'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
         },
+    }, {
+        'note': 'new playurlSSRData scheme',
+        'url': 'https://www.bilibili.com/bangumi/play/ep678060',
+        'info_dict': {
+            'id': '678060',
+            'ext': 'mp4',
+            'series': '去你家吃饭好吗',
+            'series_id': '6198',
+            'season': '第二季',
+            'season_id': '42542',
+            'season_number': 2,
+            'episode': '吴老二:你家大公鸡养不熟,能煮熟吗…',
+            'episode_id': '678060',
+            'episode_number': 61,
+            'title': '一只小九九丫 吴老二:你家大公鸡养不熟,能煮熟吗…',
+            'duration': 266.123,
+            'timestamp': 1663315904,
+            'upload_date': '20220916',
+            'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
+        },
     }, {
         'url': 'https://www.bilibili.com/bangumi/play/ep267851',
         'info_dict': {
@@ -879,12 +899,26 @@ def _real_extract(self, url):
                 'Extracting episode', query={'fnval': 12240, 'ep_id': episode_id},
                 headers=headers))
 
+        geo_blocked = traverse_obj(play_info, (
+            'raw', 'data', 'plugins', lambda _, v: v['name'] == 'AreaLimitPanel', 'config', 'is_block', {bool}, any))
         premium_only = play_info.get('code') == -10403
-        play_info = traverse_obj(play_info, ('result', 'video_info', {dict})) or {}
 
-        formats = self.extract_formats(play_info)
-        if not formats and (premium_only or '成为大会员抢先看' in webpage or '开通大会员观看' in webpage):
-            self.raise_login_required('This video is for premium members only')
+        video_info = traverse_obj(play_info, (('result', ('raw', 'data')), 'video_info', {dict}, any)) or {}
+        formats = self.extract_formats(video_info)
+
+        if not formats:
+            if geo_blocked:
+                self.raise_geo_restricted()
+            elif premium_only or '成为大会员抢先看' in webpage or '开通大会员观看' in webpage:
+                self.raise_login_required('This video is for premium members only')
+
+        if traverse_obj(play_info, ((
+            ('result', 'play_check', 'play_detail'),  # 'PLAY_PREVIEW' vs 'PLAY_WHOLE'
+            ('raw', 'data', 'play_video_type'),  # 'preview' vs 'whole'
+        ), any, {lambda x: x in ('PLAY_PREVIEW', 'preview')})):
+            self.report_warning(
+                'Only preview format is available, '
+                f'you have to become a premium member to access full video. {self._login_hint()}')
 
         bangumi_info = self._download_json(
             'https://api.bilibili.com/pgc/view/web/season', episode_id, 'Get episode details',
@@ -922,7 +956,7 @@ def _real_extract(self, url):
             'season': str_or_none(season_title),
             'season_id': str_or_none(season_id),
             'season_number': season_number,
-            'duration': float_or_none(play_info.get('timelength'), scale=1000),
+            'duration': float_or_none(video_info.get('timelength'), scale=1000),
             'subtitles': self.extract_subtitles(episode_id, episode_info.get('cid'), aid=aid),
             '__post_extractor': self.extract_comments(aid),
             'http_headers': {'Referer': url},
@@ -1192,6 +1226,26 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
             'id': '313580179',
         },
         'playlist_mincount': 92,
+    }, {
+        # Hidden-mode collection
+        'url': 'https://space.bilibili.com/3669403/video',
+        'info_dict': {
+            'id': '3669403',
+        },
+        'playlist': [{
+            'info_dict': {
+                '_type': 'playlist',
+                'id': '3669403_3958082',
+                'title': '合集·直播回放',
+                'description': '',
+                'uploader': '月路Yuel',
+                'uploader_id': '3669403',
+                'timestamp': int,
+                'upload_date': str,
+                'thumbnail': str,
+            },
+        }],
+        'params': {'playlist_items': '7'},
     }]
 
     def _real_extract(self, url):
@@ -1248,8 +1302,14 @@ def get_metadata(page_data):
         }
 
         def get_entries(page_data):
-            for entry in traverse_obj(page_data, ('list', 'vlist')) or []:
-                yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid'])
+            for entry in traverse_obj(page_data, ('list', 'vlist', ..., {dict})):
+                if traverse_obj(entry, ('meta', 'attribute')) == 156:
+                    # hidden-mode collection doesn't show its videos in uploads; extract as playlist instead
+                    yield self.url_result(
+                        f'https://space.bilibili.com/{entry["mid"]}/lists/{entry["meta"]["id"]}?type=season',
+                        BilibiliCollectionListIE, f'{entry["mid"]}_{entry["meta"]["id"]}')
+                else:
+                    yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid'])
 
         metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
         return self.playlist_result(paged_list, playlist_id)
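
The dual-path lookups above lean on traverse_obj branching; a small sketch of the pattern, assuming yt_dlp.utils.traversal and made-up response dicts:

from yt_dlp.utils.traversal import traverse_obj

old_style = {'result': {'video_info': {'timelength': 180000}}}
new_style = {'raw': {'data': {'video_info': {'timelength': 266123}}}}
for play_info in (old_style, new_style):
    # try result.video_info first, then raw.data.video_info; `any` keeps the first hit
    video_info = traverse_obj(play_info, (('result', ('raw', 'data')), 'video_info', {dict}, any)) or {}
    assert 'timelength' in video_info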

@ -1,188 +0,0 @@
from .adobepass import AdobePassIE
from ..networking import HEADRequest
from ..utils import (
extract_attributes,
float_or_none,
get_element_html_by_class,
int_or_none,
merge_dicts,
parse_age_limit,
remove_end,
str_or_none,
traverse_obj,
unescapeHTML,
unified_timestamp,
update_url_query,
url_or_none,
)
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'info_dict': {
'id': '3923059',
'ext': 'mp4',
'title': 'The Top Chef Season 16 Winner Is...',
'description': 'Find out who takes the title of Top Chef!',
'upload_date': '20190314',
'timestamp': 1552591860,
'season_number': 16,
'episode_number': 15,
'series': 'Top Chef',
'episode': 'The Top Chef Season 16 Winner Is...',
'duration': 190.357,
'season': 'Season 16',
'thumbnail': r're:^https://.+\.jpg',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/top-chef/season-20/episode-1/london-calling',
'info_dict': {
'id': '9000234570',
'ext': 'mp4',
'title': 'London Calling',
'description': 'md5:5af95a8cbac1856bd10e7562f86bb759',
'upload_date': '20230310',
'timestamp': 1678410000,
'season_number': 20,
'episode_number': 1,
'series': 'Top Chef',
'episode': 'London Calling',
'duration': 3266.03,
'season': 'Season 20',
'chapters': 'count:7',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-1/closing-night',
'info_dict': {
'id': '3692045',
'ext': 'mp4',
'title': 'Closing Night',
'description': 'md5:3170065c5c2f19548d72a4cbc254af63',
'upload_date': '20180401',
'timestamp': 1522623600,
'season_number': 1,
'episode_number': 1,
'series': 'In Ice Cold Blood',
'episode': 'Closing Night',
'duration': 2629.051,
'season': 'Season 1',
'chapters': 'count:6',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'info_dict': {
'id': '3974019',
'ext': 'mp4',
'title': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'description': 'md5:f9d638dd6946a1c1c0533a9c6100eae5',
'upload_date': '20190617',
'timestamp': 1560790800,
'season_number': 2,
'episode_number': 16,
'series': 'In Ice Cold Blood',
'episode': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'duration': 68.235,
'season': 'Season 2',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url):
site, display_id = self._match_valid_url(url).group('site', 'id')
webpage = self._download_webpage(url, display_id)
settings = self._search_json(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>', webpage, 'settings', display_id)
tve = extract_attributes(get_element_html_by_class('tve-video-deck-app', webpage) or '')
query = {
'manifest': 'm3u',
'formats': 'm3u,mpeg4',
}
if tve:
account_pid = tve.get('data-mpx-media-account-pid') or 'HNK2IC'
account_id = tve['data-mpx-media-account-id']
metadata = self._parse_json(
tve.get('data-normalized-video', ''), display_id, fatal=False, transform_source=unescapeHTML)
video_id = tve.get('data-guid') or metadata['guid']
if tve.get('data-entitlement') == 'auth':
auth = traverse_obj(settings, ('tve_adobe_auth', {dict})) or {}
site = remove_end(site, 'tv')
release_pid = tve['data-release-pid']
resource = self._get_mvpd_resource(
tve.get('data-adobe-pass-resource-id') or auth.get('adobePassResourceId') or site,
tve['data-title'], release_pid, tve.get('data-rating'))
query.update({
'switch': 'HLSServiceSecure',
'auth': self._extract_mvpd_auth(
url, release_pid, auth.get('adobePassRequestorId') or site, resource),
})
else:
ls_playlist = traverse_obj(settings, ('ls_playlist', ..., {dict}), get_all=False) or {}
account_pid = ls_playlist.get('mpxMediaAccountPid') or 'PHSl-B'
account_id = ls_playlist['mpxMediaAccountId']
video_id = ls_playlist['defaultGuid']
metadata = traverse_obj(
ls_playlist, ('videos', lambda _, v: v['guid'] == video_id, {dict}), get_all=False)
tp_url = f'https://link.theplatform.com/s/{account_pid}/media/guid/{account_id}/{video_id}'
tp_metadata = self._download_json(
update_url_query(tp_url, {'format': 'preview'}), video_id, fatal=False)
chapters = traverse_obj(tp_metadata, ('chapters', ..., {
'start_time': ('startTime', {float_or_none(scale=1000)}),
'end_time': ('endTime', {float_or_none(scale=1000)}),
}))
# prune pointless single chapters that span the entire duration from short videos
if len(chapters) == 1 and not traverse_obj(chapters, (0, 'end_time')):
chapters = None
m3u8_url = self._request_webpage(HEADRequest(
update_url_query(f'{tp_url}/stream.m3u8', query)), video_id, 'Checking m3u8 URL').url
if 'mpeg_cenc' in m3u8_url:
self.report_drm(video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'chapters': chapters,
**merge_dicts(traverse_obj(tp_metadata, {
'title': 'title',
'description': 'description',
'duration': ('duration', {float_or_none(scale=1000)}),
'timestamp': ('pubDate', {float_or_none(scale=1000)}),
'season_number': (('pl1$seasonNumber', 'nbcu$seasonNumber'), {int_or_none}),
'episode_number': (('pl1$episodeNumber', 'nbcu$episodeNumber'), {int_or_none}),
'series': (('pl1$show', 'nbcu$show'), (None, ...), {str}),
'episode': (('title', 'pl1$episodeNumber', 'nbcu$episodeNumber'), {str_or_none}),
'age_limit': ('ratings', ..., 'rating', {parse_age_limit}),
}, get_all=False), traverse_obj(metadata, {
'title': 'title',
'description': 'description',
'duration': ('durationInSeconds', {int_or_none}),
'timestamp': ('airDate', {unified_timestamp}),
'thumbnail': ('thumbnailUrl', {url_or_none}),
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode': 'episodeTitle',
'series': 'show',
})),
}
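Note: ThePlatform's format=preview JSON reports duration, pubDate and chapter times in milliseconds, hence the float_or_none(scale=1000) transforms above. A minimal sketch of the chapter handling, using a hypothetical preview payload:

from yt_dlp.utils import float_or_none, traverse_obj

# Hypothetical preview payload; real responses carry many more fields
tp_metadata = {'chapters': [{'startTime': 0, 'endTime': 0}], 'duration': 190357}

chapters = traverse_obj(tp_metadata, ('chapters', ..., {
    'start_time': ('startTime', {float_or_none(scale=1000)}),
    'end_time': ('endTime', {float_or_none(scale=1000)}),
}))
# A lone chapter without an end time spans the entire video and carries no
# extra information, so it is pruned
if len(chapters) == 1 and not traverse_obj(chapters, (0, 'end_time')):
    chapters = None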


@ -495,8 +495,6 @@ def _real_extract(self, url):
class BrightcoveNewBaseIE(AdobePassIE): class BrightcoveNewBaseIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}): def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
formats, subtitles = [], {} formats, subtitles = [], {}
sources = json_data.get('sources') or [] sources = json_data.get('sources') or []
for source in sources: for source in sources:
@ -600,16 +598,18 @@ def build_format_id(kind):
return { return {
'id': video_id, 'id': video_id,
'title': title,
'description': clean_html(json_data.get('description')),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'duration': duration, 'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': json_data.get('account_id'),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live, 'is_live': is_live,
**traverse_obj(json_data, {
'title': ('name', {clean_html}),
'description': ('description', {clean_html}),
'tags': ('tags', ..., {str}, filter, all, filter),
'timestamp': ('published_at', {parse_iso8601}),
'uploader_id': ('account_id', {str}),
}),
} }
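Review note: the ('tags', ..., {str}, filter, all, filter) path does double duty: the first filter drops falsy items one by one, all collects the branch into a list, and the final filter removes the field entirely when that list is empty, which is why the 'tags': [] expectations disappear from the tests below. A small sketch of the semantics:

from yt_dlp.utils import traverse_obj

path = ('tags', ..., {str}, filter, all, filter)
print(traverse_obj({'tags': ['', None, 'news']}, {'tags': path}))  # {'tags': ['news']}
print(traverse_obj({'tags': ['', None]}, {'tags': path}))  # {} -- empty list filtered away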
@ -645,10 +645,7 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'uploader_id': '4036320279001', 'uploader_id': '4036320279001',
'formats': 'mincount:39', 'formats': 'mincount:39',
}, },
'params': { 'skip': '404 Not Found',
# m3u8 download
'skip_download': True,
},
}, { }, {
# playlist stream # playlist stream
'url': 'https://players.brightcove.net/1752604059001/S13cJdUBz_default/index.html?playlistId=5718313430001', 'url': 'https://players.brightcove.net/1752604059001/S13cJdUBz_default/index.html?playlistId=5718313430001',
@ -709,7 +706,6 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'TGD_01-032_5', 'title': 'TGD_01-032_5',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'tags': [],
'timestamp': 1646078943, 'timestamp': 1646078943,
'uploader_id': '1569565978001', 'uploader_id': '1569565978001',
'upload_date': '20220228', 'upload_date': '20220228',
@ -721,7 +717,6 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'TGD 01-087 (Airs 05.25.22)_Segment 5', 'title': 'TGD 01-087 (Airs 05.25.22)_Segment 5',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'tags': [],
'timestamp': 1651604591, 'timestamp': 1651604591,
'uploader_id': '1569565978001', 'uploader_id': '1569565978001',
'upload_date': '20220503', 'upload_date': '20220503',
@ -923,10 +918,18 @@ def extract_policy_key():
errors = json_data.get('errors') errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH': if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
custom_fields = json_data['custom_fields'] custom_fields = json_data['custom_fields']
missing_fields = ', '.join(
key for key in ('source_url', 'software_statement') if not smuggled_data.get(key))
if missing_fields:
raise ExtractorError(
f'Missing fields in smuggled data: {missing_fields}. '
f'This video can be only extracted from the webpage where it is embedded. '
f'Pass the URL of the embedding webpage instead of the Brightcove URL', expected=True)
tve_token = self._extract_mvpd_auth( tve_token = self._extract_mvpd_auth(
smuggled_data['source_url'], video_id, smuggled_data['source_url'], video_id,
custom_fields['bcadobepassrequestorid'], custom_fields['bcadobepassrequestorid'],
custom_fields['bcadobepassresourceid']) custom_fields['bcadobepassresourceid'],
smuggled_data['software_statement'])
json_data = self._download_json( json_data = self._download_json(
api_url, video_id, headers={ api_url, video_id, headers={
'Accept': f'application/json;pk={policy_key}', 'Accept': f'application/json;pk={policy_key}',
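The new guard spells out the failure mode: source_url and software_statement are only present when the Brightcove URL was smuggled from an embedding webpage (e.g. by the generic extractor), so a bare players.brightcove.net URL cannot complete TVE auth. Roughly how the smuggling works; the URL and field values below are illustrative:

from yt_dlp.utils import smuggle_url, unsmuggle_url

url = smuggle_url(
    'https://players.brightcove.net/1234/default_default/index.html?videoId=567',
    {'source_url': 'https://example.com/show/episode-1', 'software_statement': 'eyJ...'})

url, smuggled_data = unsmuggle_url(url, {})  # {} is the default when nothing was smuggled
missing_fields = ', '.join(
    key for key in ('source_url', 'software_statement') if not smuggled_data.get(key))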


@ -11,7 +11,7 @@
class CloudyCDNIE(InfoExtractor): class CloudyCDNIE(InfoExtractor):
_VALID_URL = r'(?:https?:)?//embed\.cloudycdn\.services/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)' _VALID_URL = r'(?:https?:)?//embed\.(?P<domain>cloudycdn\.services|backscreen\.com)/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)'
_EMBED_REGEX = [rf'<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})'] _EMBED_REGEX = [rf'<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{ _TESTS = [{
'url': 'https://embed.cloudycdn.services/ltv/media/46k_d23-6000-105?', 'url': 'https://embed.cloudycdn.services/ltv/media/46k_d23-6000-105?',
@ -23,7 +23,7 @@ class CloudyCDNIE(InfoExtractor):
'duration': 1442, 'duration': 1442,
'upload_date': '20231121', 'upload_date': '20231121',
'title': 'D23-6000-105_cetstud', 'title': 'D23-6000-105_cetstud',
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/660858/placeholder1700589200.jpg', 'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
}, },
}, { }, {
'url': 'https://embed.cloudycdn.services/izm/media/26e_lv-8-5-1', 'url': 'https://embed.cloudycdn.services/izm/media/26e_lv-8-5-1',
@ -33,7 +33,7 @@ class CloudyCDNIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'LV-8-5-1', 'title': 'LV-8-5-1',
'timestamp': 1669767167, 'timestamp': 1669767167,
'thumbnail': 'https://store.cloudycdn.services/tmsp00120/assets/media/488306/placeholder1679423604.jpg', 'thumbnail': 'https://store.bstrm.net/tmsp00120/assets/media/488306/placeholder1679423604.jpg',
'duration': 1205, 'duration': 1205,
'upload_date': '20221130', 'upload_date': '20221130',
}, },
@ -48,9 +48,21 @@ class CloudyCDNIE(InfoExtractor):
'duration': 1673, 'duration': 1673,
'title': 'D24-6000-074-cetstud', 'title': 'D24-6000-074-cetstud',
'timestamp': 1718902233, 'timestamp': 1718902233,
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/788392/placeholder1718903938.jpg', 'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/788392/placeholder1718903938.jpg',
}, },
'params': {'format': 'bv'}, 'params': {'format': 'bv'},
}, {
'url': 'https://embed.backscreen.com/ltv/media/32j_z25-0600-127?',
'md5': '9b6fa09ac1a4de53d4f42b94affc3b42',
'info_dict': {
'id': '32j_z25-0600-127',
'ext': 'mp4',
'title': 'Z25-0600-127-DZ',
'duration': 1906,
'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/977427/placeholder1746633646.jpg',
'timestamp': 1746632402,
'upload_date': '20250507',
},
}] }]
_WEBPAGE_TESTS = [{ _WEBPAGE_TESTS = [{
'url': 'https://www.tavaklase.lv/video/es-esmu-mina-um-2/', 'url': 'https://www.tavaklase.lv/video/es-esmu-mina-um-2/',
@ -60,17 +72,17 @@ class CloudyCDNIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20230223', 'upload_date': '20230223',
'duration': 629, 'duration': 629,
'thumbnail': 'https://store.cloudycdn.services/tmsp00120/assets/media/518407/placeholder1678748124.jpg', 'thumbnail': 'https://store.bstrm.net/tmsp00120/assets/media/518407/placeholder1678748124.jpg',
'timestamp': 1677181513, 'timestamp': 1677181513,
'title': 'LIB-2', 'title': 'LIB-2',
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
site_id, video_id = self._match_valid_url(url).group('site_id', 'id') domain, site_id, video_id = self._match_valid_url(url).group('domain', 'site_id', 'id')
data = self._download_json( data = self._download_json(
f'https://player.cloudycdn.services/player/{site_id}/media/{video_id}/', f'https://player.{domain}/player/{site_id}/media/{video_id}/',
video_id, data=urlencode_postdata({ video_id, data=urlencode_postdata({
'version': '6.4.0', 'version': '6.4.0',
'referer': url, 'referer': url,
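The widened _VALID_URL captures the mirror domain so a single extractor can address the matching player API on either host; assuming the pattern above, the routing reduces to:

import re

_VALID_URL = r'(?:https?:)?//embed\.(?P<domain>cloudycdn\.services|backscreen\.com)/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)'

mobj = re.match(_VALID_URL, 'https://embed.backscreen.com/ltv/media/32j_z25-0600-127')
domain, site_id, video_id = mobj.group('domain', 'site_id', 'id')
print(f'https://player.{domain}/player/{site_id}/media/{video_id}/')
# https://player.backscreen.com/player/ltv/media/32j_z25-0600-127/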


@ -101,6 +101,7 @@
xpath_with_ns, xpath_with_ns,
) )
from ..utils._utils import _request_dump_filename from ..utils._utils import _request_dump_filename
from ..utils.jslib import devalue
class InfoExtractor: class InfoExtractor:
@ -262,6 +263,9 @@ class InfoExtractor:
* http_chunk_size Chunk size for HTTP downloads * http_chunk_size Chunk size for HTTP downloads
* ffmpeg_args Extra arguments for ffmpeg downloader (input) * ffmpeg_args Extra arguments for ffmpeg downloader (input)
* ffmpeg_args_out Extra arguments for ffmpeg downloader (output) * ffmpeg_args_out Extra arguments for ffmpeg downloader (output)
* ws (NiconicoLiveFD only) WebSocketResponse
* ws_url (NiconicoLiveFD only) Websockets URL
* max_quality (NiconicoLiveFD only) Max stream quality string
* is_dash_periods Whether the format is a result of merging * is_dash_periods Whether the format is a result of merging
multiple DASH periods. multiple DASH periods.
RTMP formats can also have the additional fields: page_url, RTMP formats can also have the additional fields: page_url,
@ -1675,9 +1679,9 @@ def extract_video_object(e):
'ext': mimetype2ext(e.get('encodingFormat')), 'ext': mimetype2ext(e.get('encodingFormat')),
'title': unescapeHTML(e.get('name')), 'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')), 'description': unescapeHTML(e.get('description')),
'thumbnails': [{'url': unescapeHTML(url)} 'thumbnails': traverse_obj(e, (('thumbnailUrl', 'thumbnailURL', 'thumbnail_url'), (None, ...), {
for url in variadic(traverse_obj(e, 'thumbnailUrl', 'thumbnailURL')) 'url': ({str}, {unescapeHTML}, {self._proto_relative_url}, {url_or_none}),
if url_or_none(url)], })),
'duration': parse_duration(e.get('duration')), 'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')), 'timestamp': unified_timestamp(e.get('uploadDate')),
# author can be an instance of 'Organization' or 'Person' types. # author can be an instance of 'Organization' or 'Person' types.
@ -1795,6 +1799,63 @@ def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal
ret = self._parse_json(js, video_id, transform_source=functools.partial(js_to_json, vars=args), fatal=fatal) ret = self._parse_json(js, video_id, transform_source=functools.partial(js_to_json, vars=args), fatal=fatal)
return traverse_obj(ret, traverse) or {} return traverse_obj(ret, traverse) or {}
def _resolve_nuxt_array(self, array, video_id, *, fatal=True, default=NO_DEFAULT):
"""Resolves Nuxt rich JSON payload arrays"""
# Ref: https://github.com/nuxt/nuxt/commit/9e503be0f2a24f4df72a3ccab2db4d3e63511f57
# https://github.com/nuxt/nuxt/pull/19205
if default is not NO_DEFAULT:
fatal = False
if not isinstance(array, list) or not array:
error_msg = 'Unable to resolve Nuxt JSON data: invalid input'
if fatal:
raise ExtractorError(error_msg, video_id=video_id)
elif default is NO_DEFAULT:
self.report_warning(error_msg, video_id=video_id)
return {} if default is NO_DEFAULT else default
def indirect_reviver(data):
return data
def json_reviver(data):
return json.loads(data)
gen = devalue.parse_iter(array, revivers={
'NuxtError': indirect_reviver,
'EmptyShallowRef': json_reviver,
'EmptyRef': json_reviver,
'ShallowRef': indirect_reviver,
'ShallowReactive': indirect_reviver,
'Ref': indirect_reviver,
'Reactive': indirect_reviver,
})
while True:
try:
error_msg = f'Error resolving Nuxt JSON: {gen.send(None)}'
if fatal:
raise ExtractorError(error_msg, video_id=video_id)
elif default is NO_DEFAULT:
self.report_warning(error_msg, video_id=video_id, only_once=True)
else:
self.write_debug(f'{video_id}: {error_msg}', only_once=True)
except StopIteration as error:
return error.value or ({} if default is NO_DEFAULT else default)
def _search_nuxt_json(self, webpage, video_id, *, fatal=True, default=NO_DEFAULT):
"""Parses metadata from Nuxt rich JSON payloads embedded in HTML"""
passed_default = default is not NO_DEFAULT
array = self._search_json(
r'<script\b[^>]+\bid="__NUXT_DATA__"[^>]*>', webpage,
'Nuxt JSON data', video_id, contains_pattern=r'\[(?s:.+)\]',
fatal=fatal, default=NO_DEFAULT if not passed_default else None)
if not array:
return default if passed_default else {}
return self._resolve_nuxt_array(array, video_id, fatal=fatal, default=default)
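For context: a __NUXT_DATA__ payload is a devalue-serialized graph, i.e. a flat JSON array in which objects and lists reference other entries by index; that is what devalue.parse_iter walks, with revivers unwrapping Nuxt's Ref/ShallowRef containers. A deliberately simplified resolver that ignores revivers, cycles and devalue's special negative indices:

import json

payload = json.loads('[{"data":1},{"title":2,"tags":3},"Example",[4,5],"a","b"]')

def resolve(nodes, index=0):
    node = nodes[index]
    if isinstance(node, dict):
        return {key: resolve(nodes, idx) for key, idx in node.items()}
    if isinstance(node, list):
        return [resolve(nodes, idx) for idx in node]
    return node  # primitives are stored inline

print(resolve(payload))  # {'data': {'title': 'Example', 'tags': ['a', 'b']}}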
@staticmethod @staticmethod
def _hidden_inputs(html): def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html) html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)


@ -206,7 +206,7 @@ def _real_extract(self, url):
'is_live': True, 'is_live': True,
**traverse_obj(room, { **traverse_obj(room, {
'display_id': ('url', {str}, {lambda i: i[1:]}), 'display_id': ('url', {str}, {lambda i: i[1:]}),
'title': ('room_name', {unescapeHTML}), 'title': ('room_name', {str}, {unescapeHTML}),
'description': ('show_details', {str}), 'description': ('show_details', {str}),
'uploader': ('nickname', {str}), 'uploader': ('nickname', {str}),
'thumbnail': ('room_src', {url_or_none}), 'thumbnail': ('room_src', {url_or_none}),


@ -64,7 +64,7 @@ class DreiSatIE(ZDFBaseIE):
'title': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1', 'title': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1',
'description': 'md5:bae51bfc22f15563ce3acbf97d2e8844', 'description': 'md5:bae51bfc22f15563ce3acbf97d2e8844',
'duration': 5399.0, 'duration': 5399.0,
'thumbnail': 'https://www.3sat.de/assets/buchmesse-kerkeling-100~original?cb=1743329640903', 'thumbnail': 'https://www.3sat.de/assets/buchmesse-kerkeling-100~original?cb=1747256996338',
'chapters': 'count:24', 'chapters': 'count:24',
'episode': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1', 'episode': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1',
'episode_id': 'POS_1ef236cc-b390-401e-acd0-4fb4b04315fb', 'episode_id': 'POS_1ef236cc-b390-401e-acd0-4fb4b04315fb',


@ -329,6 +329,7 @@ class WatchESPNIE(AdobePassIE):
}] }]
_API_KEY = 'ZXNwbiZicm93c2VyJjEuMC4w.ptUt7QxsteaRruuPmGZFaJByOoqKvDP2a5YkInHrc7c' _API_KEY = 'ZXNwbiZicm93c2VyJjEuMC4w.ptUt7QxsteaRruuPmGZFaJByOoqKvDP2a5YkInHrc7c'
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIyZGJmZWM4My03OWE1LTQyNzEtYTVmZC04NTZjYTMxMjRjNjMiLCJuYmYiOjE1NDAyMTI3NjEsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTQwMjEyNzYxfQ.yaK3r4AI2uLVvsyN1GLzqzgzRlxMPtasSaiYYBV0wIstqih5tvjTmeoLmi8Xy9Kp_U7Md-bOffwiyK3srHkpUkhhwXLH2x6RPjmS1tPmhaG7-3LBcHTf2ySPvXhVf7cN4ngldawK4tdtLtsw6rF_JoZE2yaC6XbS2F51nXSFEDDnOQWIHEQRG3aYAj-38P2CLGf7g-Yfhbp5cKXeksHHQ90u3eOO4WH0EAjc9oO47h33U8KMEXxJbvjV5J8Va2G2fQSgLDZ013NBI3kQnE313qgqQh2feQILkyCENpB7g-TVBreAjOaH1fU471htSoGGYepcAXv-UDtpgitDiLy7CQ'
def _call_bamgrid_api(self, path, video_id, payload=None, headers={}): def _call_bamgrid_api(self, path, video_id, payload=None, headers={}):
if 'Authorization' not in headers: if 'Authorization' not in headers:
@ -405,8 +406,8 @@ def _real_extract(self, url):
# TV Provider required # TV Provider required
else: else:
resource = self._get_mvpd_resource('ESPN', video_data['name'], video_id, None) resource = self._get_mvpd_resource('espn1', video_data['name'], video_id, None)
auth = self._extract_mvpd_auth(url, video_id, 'ESPN', resource).encode() auth = self._extract_mvpd_auth(url, video_id, 'ESPN', resource, self._SOFTWARE_STATEMENT).encode()
asset = self._download_json( asset = self._download_json(
f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb', f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb',
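The _SOFTWARE_STATEMENT blobs being added across these extractors are JWTs, so their claims can be inspected with plain base64; a quick sketch:

import base64
import json

def jwt_payload(token):
    payload = token.split('.')[1]  # header.payload.signature
    return json.loads(base64.urlsafe_b64decode(payload + '=' * (-len(payload) % 4)))

# With the ESPN statement above this yields
# {'sub': '2dbf...', 'nbf': 1540212761, 'iss': 'auth.adobe.com', 'iat': 1540212761}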


@ -7,161 +7,157 @@
int_or_none, int_or_none,
join_nonempty, join_nonempty,
parse_age_limit, parse_age_limit,
remove_end,
remove_start,
traverse_obj,
try_get,
unified_timestamp, unified_timestamp,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import traverse_obj
class GoIE(AdobePassIE): class GoIE(AdobePassIE):
_SITE_INFO = { _SITE_INFO = {
'abc': { 'abc': {
'brand': '001', 'brand': '001',
'requestor_id': 'ABC', 'requestor_id': 'dtci',
'provider_id': 'ABC',
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI4OTcwMjlkYS0yYjM1LTQyOWUtYWQ0NS02ZjZiZjVkZTdhOTUiLCJuYmYiOjE2MjAxNzM5NjksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNjIwMTczOTY5fQ.SC69DVJWSL8sIe-vVUrP6xS_kzHKqwz9PdKYexs_y-f7Vin6mM-7S-W1TE_-K55O0pyf-TL4xYgvm6LIye8CckG-nZfVwNPV4huduov0jmIcxCQFeUwkHULG2IaA44wfBVUBdaHgkhPweZ2amjycO_IXtez-gBXOLbE3B7Gx9j_5ISCFtyVUblThKfoGyQv6KT6t8Vpmc4ZSKCCQp74KWFFypydb9ucego1taW_nQD06Cdf4yByLd6NaTBceMcIKbug9b9gxFm3XBgJ5q3z7KGo1Kr6XalAV5j4m-fQ91wczlTilX8FM4AljMupyRM9mA_aEADILQ4hS79q4SM0w6w',
}, },
'freeform': { 'freeform': {
'brand': '002', 'brand': '002',
'requestor_id': 'ABCFamily', 'requestor_id': 'ABCFamily',
}, 'provider_id': 'ABCFamily',
'watchdisneychannel': { 'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZWM2MGYyNC0xYzRjLTQ1NzQtYjc0Zi03ZmM4N2E5YWMzMzgiLCJuYmYiOjE1ODc2NjU5MjMsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTg3NjY1OTIzfQ.flCn3dhvmvPnWmV0JV8Fm0YFyj07yPez9-n1GFEwVIm_S2wQVWbWyJhqsAyLZVFrhOMZYTqmPS3OHxGwTwXkEYn6PD7o_vIVG3oqi-Xn1m5jRt_Gazw5qEtpat6VE7bvKGSD3ZhcidOrsCk8NcYyq75u61NHDvSl81pcedJjVRVUpsqrEwmo0aVbA0C8PX3ri0mEbGvkMKvHn8E60xp-PSE-VK8SDT0plwPu_TwUszkZ6-_I8_2xcv_WBqcXFkAVg7Q-iNJXgQvmNsrpcrYuLvi6hEH4ZLtoDcXU6MhwTQAJTiHSo8x9aHX1_qFP09CzlNOFQbC2ZEJdP9SvA53SLQ',
'brand': '004',
'resource_id': 'Disney',
},
'watchdisneyjunior': {
'brand': '008',
'resource_id': 'DisneyJunior',
},
'watchdisneyxd': {
'brand': '009',
'resource_id': 'DisneyXD',
}, },
'disneynow': { 'disneynow': {
'brand': '011', 'brand': '011', # also: '004', '008', '009'
'requestor_id': 'DisneyChannels',
'provider_id': 'DisneyChannels',
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1MzAzNTRiOS04NDNiLTRkNjAtYTQ3ZS0yNzk1MzlkOTIyNTciLCJuYmYiOjE1NTg5ODc0NDksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTU4OTg3NDQ5fQ.Jud6YS6-J2h0h6po0oMheDym0qRTJQGj4kzacrz4DFuEwhcBkkykW6pF5pKuAUJy9HCZ40oDAHe2KcTlDJjCZF5tDaUEfdihakZ9cC_rG7MU-QoRne8qaB_dPDKwGuk-ZyWD8eV3zwTJmbGo8hDxYTEU81YNCxwhyc_BPDr5TYiubbmpP3_pTnXmSpuL58isJ2peSKWlX9BacuXtBY25c_QnPFKk-_EETm7IHkTpDazde1QfHWGu4s4yJpKGk8RVVujVG6h6ELlL-ZeYLilBm7iS7h1TYG1u7fJhyZRL7isaom6NvAzsvN3ngss1fLwt8decP8wzdFHrbYTdTjW8qw',
'resource_id': 'Disney', 'resource_id': 'Disney',
}, },
'fxnow.fxnetworks': { 'fxnetworks': {
'brand': '025', 'brand': '025', # also: '020'
'requestor_id': 'dtci', 'requestor_id': 'dtci',
'provider_id': 'fx', # also 'fxx', 'fxm'
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIzYWRhYWZiNC02OTAxLTRlYzktOTdmNy1lYWZkZTJkODJkN2EiLCJuYmYiOjE1NjIwMjQwNzYsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTYyMDI0MDc2fQ.dhKMpZK50AObbZYrMiYPSfWtzXHUaeMP3jrIY4Cgfvh0GaEgk0Mns_zp78jypFeZgRtPVleQMQDNq2YEloRLcAGqP1aa6WVDglnK77ZWUm4IKai14Rwf3A6YBhSRoO2_lMmUGkuTf6gZY-kMIPqBYKqzTQiQl4HbniPFodIzFRiuI9QJVrkoyTGrJL4oqiX08PoFI3Z-TOti1Heu3EbFC-GveQHhlinYrzU7rbiAqLEz7FImtfBDsnXX1Y3uJDLYM3Bq4Oh0nrzTv1Fd62wNsCNErHHIbELidh1zZF0ujvt7ReuZUwAitm0UhEJ7OxNOUbEQWtae6pVNscvdvTFMpg',
},
'nationalgeographic': {
'brand': '026', # also '023'
'requestor_id': 'dtci',
'provider_id': 'ngc', # also 'ngw'
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxMzE4YTM1Ni05Mjc4LTQ4NjEtYTFmNi1jMTIzMzg1ZWMzYzMiLCJuYmYiOjE1NjIwMjM4MjgsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTYyMDIzODI4fQ.Le-2OzF9-jrhJ7ZfWtLWk5iSHGVZoxeU1w0_fO--Heli0OwRZsRq2slSmx-oZTzxuWmAgDEiBkWSDcDK6sM25DrCLsdsJa3MBuZ-slBRtH8aq3HpNoqqLkU-vg6gRUEKMtwBUtwCu_9aKUCayYtndWv4b1DjVQeSrteOW5NNudWVYleAe0kxeNJQHo5If9SCzDudKVJktFUjhNks4QPOC_uONPkRRlL9D0fNvtOY-LRFckfcHhf5z9l1iZjeukV0YhdKnuw1wyiaWrQXBUDiBfbkCRd2DM-KnelqPxfiXCaTjGKDURRBO3pz33ebge3IFXSiU5vl4qHQ8xvunzGpFw',
}, },
} }
_VALID_URL = r'''(?x) _URL_PATH_RE = r'(?:video|episode|movies-and-specials)/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})'
https?:// _VALID_URL = [
(?P<sub_domain> fr'https?://(?:www\.)?(?P<site>abc)\.com/{_URL_PATH_RE}',
(?:{}\.)?go|fxnow\.fxnetworks| fr'https?://(?:www\.)?(?P<site>freeform)\.com/{_URL_PATH_RE}',
(?:www\.)?(?:abc|freeform|disneynow) fr'https?://(?:www\.)?(?P<site>disneynow)\.com/{_URL_PATH_RE}',
)\.com/ fr'https?://fxnow\.(?P<site>fxnetworks)\.com/{_URL_PATH_RE}',
(?: fr'https?://(?:www\.)?(?P<site>nationalgeographic)\.com/tv/{_URL_PATH_RE}',
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)| ]
(?:[^/]+/)*(?P<display_id>[^/?\#]+)
)
'''.format(r'\.|'.join(list(_SITE_INFO.keys())))
_TESTS = [{ _TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643', 'url': 'https://abc.com/episode/4192c0e6-26e5-47a8-817b-ce8272b9e440/playlist/PL551127435',
'info_dict': { 'info_dict': {
'id': 'VDKA3807643', 'id': 'VDKA10805898',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Traitor in the White House', 'title': 'Switch the Flip',
'description': 'md5:05b009d2d145a1e85d25111bd37222e8', 'description': 'To help get Brians life in order, Stewie and Brian swap bodies using a machine that Stewie invents.',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'This content is no longer available.',
}, {
'url': 'https://disneynow.com/shows/big-hero-6-the-series',
'info_dict': {
'title': 'Doraemon',
'id': 'SH55574025',
},
'playlist_mincount': 51,
}, {
'url': 'http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood',
'info_dict': {
'id': 'VDKA3609139',
'title': 'This Guilty Blood',
'description': 'md5:f18e79ad1c613798d95fdabfe96cd292',
'age_limit': 14, 'age_limit': 14,
'duration': 1297,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'Family Guy',
'season': 'Season 16',
'season_number': 16,
'episode': 'Episode 17',
'episode_number': 17,
'timestamp': 1746082800.0,
'upload_date': '20250501',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://disneynow.com/episode/21029660-ba06-4406-adb0-a9a78f6e265e/playlist/PL553044961',
'info_dict': {
'id': 'VDKA39546942',
'ext': 'mp4',
'title': 'Zero Friends Again',
'description': 'Relationships fray under the pressures of a difficult journey.',
'age_limit': 0,
'duration': 1721,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'Star Wars: Skeleton Crew',
'season': 'Season 1',
'season_number': 1,
'episode': 'Episode 6',
'episode_number': 6,
'timestamp': 1746946800.0,
'upload_date': '20250511',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://fxnow.fxnetworks.com/episode/09f4fa6f-c293-469e-aebe-32c9ca5842a7/playlist/PL554408064',
'info_dict': {
'id': 'VDKA38112033',
'ext': 'mp4',
'title': 'The Return of Jerry',
'description': 'The vampires long-lost fifth roommate returns. Written by Paul Simms; directed by Kyle Newacheck.',
'age_limit': 17,
'duration': 1493,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'What We Do in the Shadows',
'season': 'Season 6',
'season_number': 6,
'episode': 'Episode 1', 'episode': 'Episode 1',
'upload_date': '20170102',
'season': 'Season 2',
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/abcf/Shadowhunters/video/201/ae5f75608d86bf88aa4f9f4aa76ab1b7/579x325-Q100_ae5f75608d86bf88aa4f9f4aa76ab1b7.jpg',
'duration': 2544,
'season_number': 2,
'series': 'Shadowhunters',
'episode_number': 1, 'episode_number': 1,
'timestamp': 1483387200, 'timestamp': 1729573200.0,
'ext': 'mp4', 'upload_date': '20241022',
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
}, },
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, { }, {
'url': 'https://abc.com/shows/the-rookie/episode-guide/season-04/12-the-knock', 'url': 'https://www.freeform.com/episode/bda0eaf7-761a-4838-aa44-96f794000844/playlist/PL553044961',
'info_dict': { 'info_dict': {
'id': 'VDKA26050359', 'id': 'VDKA39007340',
'title': 'The Knock',
'description': 'md5:0c2947e3ada4c31f28296db7db14aa64',
'age_limit': 14,
'ext': 'mp4', 'ext': 'mp4',
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/abc/TheRookie/video/412/daf830d06e83b11eaf5c0a299d993ae3/1556x876-Q75_daf830d06e83b11eaf5c0a299d993ae3.jpg', 'title': 'Angel\'s Landing',
'episode': 'Episode 12', 'description': 'md5:91bf084e785c968fab16734df7313446',
'season_number': 4, 'age_limit': 14,
'season': 'Season 4', 'duration': 2523,
'timestamp': 1642975200, 'thumbnail': r're:https?://.+/.+\.jpg',
'episode_number': 12, 'series': 'How I Escaped My Cult',
'upload_date': '20220123', 'season': 'Season 1',
'series': 'The Rookie', 'season_number': 1,
'duration': 2572, 'episode': 'Episode 2',
}, 'episode_number': 2,
'params': { 'timestamp': 1740038400.0,
'geo_bypass_ip_block': '3.244.239.0/24', 'upload_date': '20250220',
# m3u8 download
'skip_download': True,
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841', 'url': 'https://www.nationalgeographic.com/tv/episode/ca694661-1186-41ae-8089-82f64d69b16d/playlist/PL554408064',
'info_dict': { 'info_dict': {
'id': 'VDKA12782841', 'id': 'VDKA39492078',
'title': 'First Look: Better Things - Season 2',
'description': 'md5:fa73584a95761c605d9d54904e35b407',
'ext': 'mp4', 'ext': 'mp4',
'age_limit': 14, 'title': 'Heart of the Emperors',
'upload_date': '20170825', 'description': 'md5:4fc50a2878f030bb3a7eac9124dca677',
'duration': 161, 'age_limit': 0,
'series': 'Better Things', 'duration': 2775,
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/fx/BetterThings/video/12782841/b6b05e58264121cc2c98811318e6d507/1556x876-Q75_b6b05e58264121cc2c98811318e6d507.jpg', 'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1503661074, 'series': 'Secrets of the Penguins',
}, 'season': 'Season 1',
'params': { 'season_number': 1,
'geo_bypass_ip_block': '3.244.239.0/24', 'episode': 'Episode 1',
# m3u8 download 'episode_number': 1,
'skip_download': True, 'timestamp': 1745204400.0,
'upload_date': '20250421',
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'https://www.freeform.com/movies-and-specials/c38281fc-9f8f-47c7-8220-22394f9df2e1',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://abc.go.com/shows/world-news-tonight/episode-guide/2017-02/17-021717-intense-stand-off-between-man-with-rifle-and-police-in-oakland', 'url': 'https://abc.com/video/219a454a-172c-41bf-878a-d169e6bc0bdc/playlist/PL5523098420',
'only_matching': True,
}, {
# brand 004
'url': 'http://disneynow.go.com/shows/big-hero-6-the-series/season-01/episode-10-mr-sparkles-loses-his-sparkle/vdka4637915',
'only_matching': True,
}, {
# brand 008
'url': 'http://disneynow.go.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://disneynow.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://www.freeform.com/shows/cruel-summer/episode-guide/season-01/01-happy-birthday-jeanette-turner',
'only_matching': True, 'only_matching': True,
}] }]
@ -171,58 +167,29 @@ def _extract_videos(self, brand, video_id='-1', show_id='-1'):
f'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/{brand}/001/-1/{show_id}/-1/{video_id}/-1/-1.json', f'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/{brand}/001/-1/{show_id}/-1/{video_id}/-1/-1.json',
display_id)['video'] display_id)['video']
def _extract_global_var(self, name, webpage, video_id):
return self._search_json(
fr'window\[["\']{re.escape(name)}["\']\]\s*=',
webpage, f'{name.strip("_")} JSON', video_id)
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) site, display_id = self._match_valid_url(url).group('site', 'id')
sub_domain = remove_start(remove_end(mobj.group('sub_domain') or '', '.go'), 'www.') webpage = self._download_webpage(url, display_id)
video_id, display_id = mobj.group('id', 'display_id') config = self._extract_global_var('__CONFIG__', webpage, display_id)
site_info = self._SITE_INFO.get(sub_domain, {}) data = self._extract_global_var(config['globalVar'], webpage, display_id)
brand = site_info.get('brand') video_id = traverse_obj(data, (
if not video_id or not site_info: 'page', 'content', 'video', 'layout', (('video', 'id'), 'videoid'), {str}, any))
webpage = self._download_webpage(url, display_id or video_id) if not video_id:
data = self._parse_json( video_id = self._search_regex([
self._search_regex( # data-track-video_id="VDKA39492078"
r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage, # data-track-video_id_code="vdka39492078"
'data', default='{}'), # data-video-id="'VDKA3609139'"
display_id or video_id, fatal=False) r'data-(?:track-)?video[_-]id(?:_code)?=["\']*((?:vdka|VDKA)\d+)',
# https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot # page.analytics.videoIdCode
layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict) r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\d+)'], webpage, 'video ID')
video_id = None
if layout: site_info = self._SITE_INFO[site]
video_id = try_get( brand = site_info['brand']
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)',
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',
r'data-page-brand=\s*["\']\s*(\d+)'), webpage, 'brand',
default='004')
site_info = next(
si for _, si in self._SITE_INFO.items()
if si.get('brand') == brand)
if not video_id:
# show extraction works for Disney, DisneyJunior and DisneyXD
# ABC and Freeform has different layout
show_id = self._search_regex(r'data-show-id=["\']*(SH\d+)', webpage, 'show id')
videos = self._extract_videos(brand, show_id=show_id)
show_title = self._search_regex(r'data-show-title="([^"]+)"', webpage, 'show title', fatal=False)
entries = []
for video in videos:
entries.append(self.url_result(
video['url'], 'Go', video.get('id'), video.get('title')))
entries.reverse()
return self.playlist_result(entries, show_id, show_title)
video_data = self._extract_videos(brand, video_id)[0] video_data = self._extract_videos(brand, video_id)[0]
video_id = video_data['id'] video_id = video_data['id']
title = video_data['title'] title = video_data['title']
@ -238,26 +205,31 @@ def _real_extract(self, url):
if ext == 'm3u8': if ext == 'm3u8':
video_type = video_data.get('type') video_type = video_data.get('type')
data = { data = {
'video_id': video_data['id'], 'video_id': video_id,
'video_type': video_type, 'video_type': video_type,
'brand': brand, 'brand': brand,
'device': '001', 'device': '001',
'app_name': 'webplayer-abc',
} }
if video_data.get('accesslevel') == '1': if video_data.get('accesslevel') == '1':
requestor_id = site_info.get('requestor_id', 'DisneyChannels') provider_id = site_info['provider_id']
software_statement = traverse_obj(data, ('app', 'config', (
('features', 'auth', 'softwareStatement'),
('tvAuth', 'SOFTWARE_STATEMENTS', 'PRODUCTION'),
), {str}, any)) or site_info['software_statement']
resource = site_info.get('resource_id') or self._get_mvpd_resource( resource = site_info.get('resource_id') or self._get_mvpd_resource(
requestor_id, title, video_id, None) provider_id, title, video_id, None)
auth = self._extract_mvpd_auth( auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, site_info['requestor_id'], resource, software_statement)
data.update({ data.update({
'token': auth, 'token': auth,
'token_type': 'ap', 'token_type': 'ap',
'adobe_requestor_id': requestor_id, 'adobe_requestor_id': provider_id,
}) })
else: else:
self._initialize_geo_bypass({'countries': ['US']}) self._initialize_geo_bypass({'countries': ['US']})
entitlement = self._download_json( entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json', 'https://prod.gatekeeper.us-abc.symphony.edgedatg.go.com/vp2/ws-secure/entitlement/2020/playmanifest_secure.json',
video_id, data=urlencode_postdata(data)) video_id, data=urlencode_postdata(data))
errors = entitlement.get('errors', {}).get('errors', []) errors = entitlement.get('errors', {}).get('errors', [])
if errors: if errors:
@ -267,7 +239,7 @@ def _real_extract(self, url):
error['message'], countries=['US']) error['message'], countries=['US'])
error_message = ', '.join([error['message'] for error in errors]) error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError(f'{self.IE_NAME} said: {error_message}', expected=True) raise ExtractorError(f'{self.IE_NAME} said: {error_message}', expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey'] asset_url += '?' + entitlement['entitlement']['uplynkData']['sessionKey']
fmts, subs = self._extract_m3u8_formats_and_subtitles( fmts, subs = self._extract_m3u8_formats_and_subtitles(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False) asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False)
formats.extend(fmts) formats.extend(fmts)
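The new _extract_global_var helper replaces the grab-bag of per-layout regexes with a single lookup of window['__CONFIG__']-style globals. A toy version of the idea (the real helper delegates to _search_json, which balances braces properly instead of relying on a lazy match):

import json
import re

webpage = '<script>window["__CONFIG__"] = {"globalVar": "__abc_com__"};</script>'

def extract_global_var(name, webpage):
    mobj = re.search(rf'window\[["\']{re.escape(name)}["\']\]\s*=\s*(\{{.*?\}})', webpage)
    return json.loads(mobj.group(1))

print(extract_global_var('__CONFIG__', webpage)['globalVar'])  # __abc_com__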


@ -7,12 +7,13 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html,
int_or_none, int_or_none,
parse_duration, parse_duration,
str_or_none, str_or_none,
try_get, try_get,
unescapeHTML, unescapeHTML,
unified_strdate, update_url,
update_url_query, update_url_query,
url_or_none, url_or_none,
) )
@ -22,8 +23,8 @@
class HuyaLiveIE(InfoExtractor): class HuyaLiveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|m\.)?huya\.com/(?!(?:video/play/))(?P<id>[^/#?&]+)(?:\D|$)' _VALID_URL = r'https?://(?:www\.|m\.)?huya\.com/(?!(?:video/play/))(?P<id>[^/#?&]+)(?:\D|$)'
IE_NAME = 'huya:live' IE_NAME = 'huya:live'
IE_DESC = 'huya.com' IE_DESC = '虎牙直播'
TESTS = [{ _TESTS = [{
'url': 'https://www.huya.com/572329', 'url': 'https://www.huya.com/572329',
'info_dict': { 'info_dict': {
'id': '572329', 'id': '572329',
@ -149,63 +150,94 @@ class HuyaVideoIE(InfoExtractor):
'id': '1002412640', 'id': '1002412640',
'ext': 'mp4', 'ext': 'mp4',
'title': '8月3日', 'title': '8月3日',
'thumbnail': r're:https?://.*\.jpg', 'categories': ['主机游戏'],
'duration': 14, 'duration': 14.0,
'uploader': '虎牙-ATS欧卡车队青木', 'uploader': '虎牙-ATS欧卡车队青木',
'uploader_id': '1564376151', 'uploader_id': '1564376151',
'upload_date': '20240803', 'upload_date': '20240803',
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1722675433,
}, },
}, }, {
{
'url': 'https://www.huya.com/video/play/556054543.html', 'url': 'https://www.huya.com/video/play/556054543.html',
'info_dict': { 'info_dict': {
'id': '556054543', 'id': '556054543',
'ext': 'mp4', 'ext': 'mp4',
'title': '我不挑事 也不怕事', 'title': '我不挑事 也不怕事',
'thumbnail': r're:https?://.*\.jpg', 'categories': ['英雄联盟'],
'duration': 1864, 'description': 'md5:58184869687d18ce62dc7b4b2ad21201',
'duration': 1864.0,
'uploader': '卡尔', 'uploader': '卡尔',
'uploader_id': '367138632', 'uploader_id': '367138632',
'upload_date': '20210811', 'upload_date': '20210811',
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'tags': 'count:4',
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1628675950,
},
}, {
# Only m3u8 available
'url': 'https://www.huya.com/video/play/1063345618.html',
'info_dict': {
'id': '1063345618',
'ext': 'mp4',
'title': '峡谷第一中黑铁上钻石顶级教学对抗elo',
'categories': ['英雄联盟'],
'comment_count': int,
'duration': 21603.0,
'like_count': int,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1749668803,
'upload_date': '20250611',
'uploader': '北枫CC',
'uploader_id': '2183525275',
'view_count': int,
}, },
}] }]
def _real_extract(self, url: str): def _real_extract(self, url: str):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( moment = self._download_json(
'https://liveapi.huya.com/moment/getMomentContent', video_id, 'https://liveapi.huya.com/moment/getMomentContent',
query={'videoId': video_id})['data']['moment']['videoInfo'] video_id, query={'videoId': video_id})['data']['moment']
formats = [] formats = []
for definition in traverse_obj(video_data, ('definitions', lambda _, v: url_or_none(v['url']))): for definition in traverse_obj(moment, (
formats.append({ 'videoInfo', 'definitions', lambda _, v: url_or_none(v['m3u8']),
'url': definition['url'], )):
**traverse_obj(definition, { fmts = self._extract_m3u8_formats(definition['m3u8'], video_id, 'mp4', fatal=False)
'format_id': ('defName', {str}), for fmt in fmts:
'width': ('width', {int_or_none}), fmt.update(**traverse_obj(definition, {
'height': ('height', {int_or_none}),
'filesize': ('size', {int_or_none}), 'filesize': ('size', {int_or_none}),
}), 'format_id': ('defName', {str}),
}) 'height': ('height', {int_or_none}),
'quality': ('definition', {int_or_none}),
'width': ('width', {int_or_none}),
}))
formats.extend(fmts)
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
**traverse_obj(video_data, { **traverse_obj(moment, {
'comment_count': ('commentCount', {int_or_none}),
'description': ('content', {clean_html}, filter),
'like_count': ('favorCount', {int_or_none}),
'timestamp': ('cTime', {int_or_none}),
}),
**traverse_obj(moment, ('videoInfo', {
'title': ('videoTitle', {str}), 'title': ('videoTitle', {str}),
'thumbnail': ('videoCover', {url_or_none}), 'categories': ('category', {str}, filter, all, filter),
'duration': ('videoDuration', {parse_duration}), 'duration': ('videoDuration', {parse_duration}),
'tags': ('tags', ..., {str}, filter, all, filter),
'thumbnail': (('videoBigCover', 'videoCover'), {url_or_none}, {update_url(query=None)}, any),
'uploader': ('nickName', {str}), 'uploader': ('nickName', {str}),
'uploader_id': ('uid', {str_or_none}), 'uploader_id': ('uid', {str_or_none}),
'upload_date': ('videoUploadTime', {unified_strdate}),
'view_count': ('videoPlayNum', {int_or_none}), 'view_count': ('videoPlayNum', {int_or_none}),
'comment_count': ('videoCommentNum', {int_or_none}), })),
'like_count': ('favorCount', {int_or_none}),
}),
} }
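Note the thumbnail path above: it prefers videoBigCover over videoCover, validates the value, and uses update_url's partial application to drop the (likely signed, expiring) query string:

from yt_dlp.utils import update_url

strip_query = update_url(query=None)  # update_url supports partial application
print(strip_query('https://example.com/cover.jpg?sign=abc&t=123'))  # hypothetical URL
# https://example.com/cover.jpg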


@ -1,32 +1,66 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json, traverse_obj from ..utils import (
ExtractorError,
clean_html,
url_or_none,
)
from ..utils.traversal import subs_list_to_dict, traverse_obj
class MonsterSirenHypergryphMusicIE(InfoExtractor): class MonsterSirenHypergryphMusicIE(InfoExtractor):
IE_NAME = 'monstersiren'
IE_DESC = '塞壬唱片'
_API_BASE = 'https://monster-siren.hypergryph.com/api'
_VALID_URL = r'https?://monster-siren\.hypergryph\.com/music/(?P<id>\d+)' _VALID_URL = r'https?://monster-siren\.hypergryph\.com/music/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://monster-siren.hypergryph.com/music/514562', 'url': 'https://monster-siren.hypergryph.com/music/514562',
'info_dict': { 'info_dict': {
'id': '514562', 'id': '514562',
'ext': 'wav', 'ext': 'wav',
'artists': ['塞壬唱片-MSR'],
'album': 'Flame Shadow',
'title': 'Flame Shadow', 'title': 'Flame Shadow',
'album': 'Flame Shadow',
'artists': ['塞壬唱片-MSR'],
'description': 'md5:19e2acfcd1b65b41b29e8079ab948053',
'thumbnail': r're:https?://web\.hycdn\.cn/siren/pic/.+\.jpg',
},
}, {
'url': 'https://monster-siren.hypergryph.com/music/514518',
'info_dict': {
'id': '514518',
'ext': 'wav',
'title': 'Heavenly Me (Instrumental)',
'album': 'Heavenly Me',
'artists': ['塞壬唱片-MSR', 'AIYUE blessed : 理名'],
'description': 'md5:ce790b41c932d1ad72eb791d1d8ae598',
'thumbnail': r're:https?://web\.hycdn\.cn/siren/pic/.+\.jpg',
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
audio_id = self._match_id(url) audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id) song = self._download_json(f'{self._API_BASE}/song/{audio_id}', audio_id)
json_data = self._search_json( if traverse_obj(song, 'code') != 0:
r'window\.g_initialProps\s*=', webpage, 'data', audio_id, transform_source=js_to_json) msg = traverse_obj(song, ('msg', {str}, filter))
raise ExtractorError(
msg or 'API returned an error response', expected=bool(msg))
album = None
if album_id := traverse_obj(song, ('data', 'albumCid', {str})):
album = self._download_json(
f'{self._API_BASE}/album/{album_id}/detail', album_id, fatal=False)
return { return {
'id': audio_id, 'id': audio_id,
'title': traverse_obj(json_data, ('player', 'songDetail', 'name')),
'url': traverse_obj(json_data, ('player', 'songDetail', 'sourceUrl')),
'ext': 'wav',
'vcodec': 'none', 'vcodec': 'none',
'artists': traverse_obj(json_data, ('player', 'songDetail', 'artists', ...)), **traverse_obj(song, ('data', {
'album': traverse_obj(json_data, ('musicPlay', 'albumDetail', 'name')), 'title': ('name', {str}),
'artists': ('artists', ..., {str}),
'subtitles': ({'url': 'lyricUrl'}, all, {subs_list_to_dict(lang='en')}),
'url': ('sourceUrl', {url_or_none}),
})),
**traverse_obj(album, ('data', {
'album': ('name', {str}),
'description': ('intro', {clean_html}),
'thumbnail': ('coverUrl', {url_or_none}),
})),
} }
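The lyrics mapping builds a one-entry subtitle track out of lyricUrl via subs_list_to_dict; the 'en' language tag is the extractor's choice, not API metadata. Roughly, assuming a hypothetical lyricUrl value:

from yt_dlp.utils.traversal import subs_list_to_dict, traverse_obj

data = {'lyricUrl': 'https://web.hycdn.cn/siren/lyric/example.lrc'}
print(traverse_obj(data, ({'url': 'lyricUrl'}, all, {subs_list_to_dict(lang='en')})))
# roughly {'en': [{'url': 'https://web.hycdn.cn/siren/lyric/example.lrc'}]}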


@ -167,11 +167,11 @@ class LSMLTVEmbedIE(InfoExtractor):
'duration': 1442, 'duration': 1442,
'upload_date': '20231121', 'upload_date': '20231121',
'title': 'D23-6000-105_cetstud', 'title': 'D23-6000-105_cetstud',
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/660858/placeholder1700589200.jpg', 'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
}, },
}, { }, {
'url': 'https://ltv.lsm.lv/embed?enablesdkjs=1&c=eyJpdiI6IncwVzZmUFk2MU12enVWK1I3SUcwQ1E9PSIsInZhbHVlIjoid3FhV29vamc3T2sxL1RaRmJ5Rm1GTXozU0o2dVczdUtLK0cwZEZJMDQ2a3ZIRG5DK2pneGlnbktBQy9uazVleHN6VXhxdWIweWNvcHRDSnlISlNYOHlVZ1lpcTUrcWZSTUZPQW14TVdkMW9aOUtRWVNDcFF4eWpHNGcrT0VZbUNFQStKQk91cGpndW9FVjJIa0lpbkh3PT0iLCJtYWMiOiIyZGI1NDJlMWRlM2QyMGNhOGEwYTM2MmNlN2JlOGRhY2QyYjdkMmEzN2RlOTEzYTVkNzI1ODlhZDlhZjU4MjQ2IiwidGFnIjoiIn0=', 'url': 'https://ltv.lsm.lv/embed?enablesdkjs=1&c=eyJpdiI6IncwVzZmUFk2MU12enVWK1I3SUcwQ1E9PSIsInZhbHVlIjoid3FhV29vamc3T2sxL1RaRmJ5Rm1GTXozU0o2dVczdUtLK0cwZEZJMDQ2a3ZIRG5DK2pneGlnbktBQy9uazVleHN6VXhxdWIweWNvcHRDSnlISlNYOHlVZ1lpcTUrcWZSTUZPQW14TVdkMW9aOUtRWVNDcFF4eWpHNGcrT0VZbUNFQStKQk91cGpndW9FVjJIa0lpbkh3PT0iLCJtYWMiOiIyZGI1NDJlMWRlM2QyMGNhOGEwYTM2MmNlN2JlOGRhY2QyYjdkMmEzN2RlOTEzYTVkNzI1ODlhZDlhZjU4MjQ2IiwidGFnIjoiIn0=',
'md5': 'a1711e190fe680fdb68fd8413b378e87', 'md5': 'f236cef2fd5953612754e4e66be51e7a',
'info_dict': { 'info_dict': {
'id': 'wUnFArIPDSY', 'id': 'wUnFArIPDSY',
'ext': 'mp4', 'ext': 'mp4',
@ -198,6 +198,8 @@ class LSMLTVEmbedIE(InfoExtractor):
'uploader_url': 'https://www.youtube.com/@LTV16plus', 'uploader_url': 'https://www.youtube.com/@LTV16plus',
'like_count': int, 'like_count': int,
'description': 'md5:7ff0c42ba971e3c13e4b8a2ff03b70b5', 'description': 'md5:7ff0c42ba971e3c13e4b8a2ff03b70b5',
'media_type': 'livestream',
'timestamp': 1652550741,
}, },
}] }]
@ -208,7 +210,7 @@ def _real_extract(self, url):
r'window\.ltvEmbedPayload\s*=', webpage, 'embed json', video_id) r'window\.ltvEmbedPayload\s*=', webpage, 'embed json', video_id)
embed_type = traverse_obj(data, ('source', 'name', {str})) embed_type = traverse_obj(data, ('source', 'name', {str}))
if embed_type == 'telia': if embed_type in ('backscreen', 'telia'): # 'telia' only for backwards compat
ie_key = 'CloudyCDN' ie_key = 'CloudyCDN'
embed_url = traverse_obj(data, ('source', 'embed_url', {url_or_none})) embed_url = traverse_obj(data, ('source', 'embed_url', {url_or_none}))
elif embed_type == 'youtube': elif embed_type == 'youtube':
@ -226,9 +228,9 @@ def _real_extract(self, url):
class LSMReplayIE(InfoExtractor): class LSMReplayIE(InfoExtractor):
_VALID_URL = r'https?://replay\.lsm\.lv/[^/?#]+/(?:ieraksts|statja)/[^/?#]+/(?P<id>\d+)' _VALID_URL = r'https?://replay\.lsm\.lv/[^/?#]+/(?:skaties/|klausies/)?(?:ieraksts|statja)/[^/?#]+/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://replay.lsm.lv/lv/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija', 'url': 'https://replay.lsm.lv/lv/skaties/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'md5': '64f72a360ca530d5ed89c77646c9eee5', 'md5': '64f72a360ca530d5ed89c77646c9eee5',
'info_dict': { 'info_dict': {
'id': '46k_d23-6000-105', 'id': '46k_d23-6000-105',
@ -241,20 +243,23 @@ class LSMReplayIE(InfoExtractor):
'thumbnail': 'https://ltv.lsm.lv/storage/media/8/7/large/5/1f9604e1.jpg', 'thumbnail': 'https://ltv.lsm.lv/storage/media/8/7/large/5/1f9604e1.jpg',
}, },
}, { }, {
'url': 'https://replay.lsm.lv/lv/ieraksts/lr/183522/138-nepilniga-kompensejamo-zalu-sistema-pat-menesiem-dzena-pacientus-pa-aptiekam', 'url': 'https://replay.lsm.lv/lv/klausies/ieraksts/lr/183522/138-nepilniga-kompensejamo-zalu-sistema-pat-menesiem-dzena-pacientus-pa-aptiekam',
'md5': '719b33875cd1429846eeeaeec6df2830', 'md5': '84feb80fd7e6ec07744726a9f01cda4d',
'info_dict': { 'info_dict': {
'id': 'a342781', 'id': '183522',
'ext': 'mp3', 'ext': 'm4a',
'duration': 1823, 'duration': 1823,
'title': '#138 Nepilnīgā kompensējamo zāļu sistēma pat mēnešiem dzenā pacientus pa aptiekām', 'title': '#138 Nepilnīgā kompensējamo zāļu sistēma pat mēnešiem dzenā pacientus pa aptiekām',
'thumbnail': 'https://pic.latvijasradio.lv/public/assets/media/9/d/large_fd4675ac.jpg', 'thumbnail': 'https://pic.latvijasradio.lv/public/assets/media/9/d/large_fd4675ac.jpg',
'upload_date': '20231102', 'upload_date': '20231102',
'timestamp': 1698921060, 'timestamp': 1698913860,
'description': 'md5:7bac3b2dd41e44325032943251c357b1', 'description': 'md5:7bac3b2dd41e44325032943251c357b1',
}, },
}, { }, {
'url': 'https://replay.lsm.lv/ru/statja/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija', 'url': 'https://replay.lsm.lv/ru/skaties/statja/ltv/355067/v-kengaragse-nacalas-ukladka-relsov',
'only_matching': True,
}, {
'url': 'https://replay.lsm.lv/lv/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'only_matching': True, 'only_matching': True,
}] }]
@ -267,12 +272,24 @@ def _real_extract(self, url):
data = self._search_nuxt_data( data = self._search_nuxt_data(
self._fix_nuxt_data(webpage), video_id, context_name='__REPLAY__') self._fix_nuxt_data(webpage), video_id, context_name='__REPLAY__')
playback_type = data['playback']['type']
if playback_type == 'playable_audio_lr':
playback_data = {
'formats': self._extract_m3u8_formats(data['playback']['service']['hls_url'], video_id),
}
elif playback_type == 'embed':
playback_data = {
'_type': 'url_transparent',
'url': data['playback']['service']['url'],
}
else:
raise ExtractorError(f'Unsupported playback type "{playback_type}"')
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
**playback_data,
**traverse_obj(data, { **traverse_obj(data, {
'url': ('playback', 'service', 'url', {url_or_none}),
'title': ('mediaItem', 'title'), 'title': ('mediaItem', 'title'),
'description': ('mediaItem', ('lead', 'body')), 'description': ('mediaItem', ('lead', 'body')),
'duration': ('mediaItem', 'duration', {int_or_none}), 'duration': ('mediaItem', 'duration', {int_or_none}),

yt_dlp/extractor/mave.py (new file)

@ -0,0 +1,107 @@
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
parse_iso8601,
urljoin,
)
from ..utils.traversal import require, traverse_obj
class MaveIE(InfoExtractor):
_VALID_URL = r'https?://(?P<channel>[\w-]+)\.mave\.digital/(?P<id>ep-\d+)'
_TESTS = [{
'url': 'https://ochenlichnoe.mave.digital/ep-25',
'md5': 'aa3e513ef588b4366df1520657cbc10c',
'info_dict': {
'id': '4035f587-914b-44b6-aa5a-d76685ad9bc2',
'ext': 'mp3',
'display_id': 'ochenlichnoe-ep-25',
'title': 'Между мной и миром: психология самооценки',
'description': 'md5:4b7463baaccb6982f326bce5c700382a',
'uploader': 'Самарский университет',
'channel': 'Очень личное',
'channel_id': 'ochenlichnoe',
'channel_url': 'https://ochenlichnoe.mave.digital/',
'view_count': int,
'like_count': int,
'dislike_count': int,
'duration': 3744,
'thumbnail': r're:https://.+/storage/podcasts/.+\.jpg',
'series': 'Очень личное',
'series_id': '2e0c3749-6df2-4946-82f4-50691419c065',
'season': 'Season 3',
'season_number': 3,
'episode': 'Episode 3',
'episode_number': 3,
'timestamp': 1747817300,
'upload_date': '20250521',
},
}, {
'url': 'https://budem.mave.digital/ep-12',
'md5': 'e1ce2780fcdb6f17821aa3ca3e8c919f',
'info_dict': {
'id': '41898bb5-ff57-4797-9236-37a8e537aa21',
'ext': 'mp3',
'display_id': 'budem-ep-12',
'title': 'Екатерина Михайлова: "Горе от ума" не про женщин написана',
'description': 'md5:fa3bdd59ee829dfaf16e3efcb13f1d19',
'uploader': 'Полина Цветкова+Евгения Акопова',
'channel': 'Все там будем',
'channel_id': 'budem',
'channel_url': 'https://budem.mave.digital/',
'view_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 18,
'duration': 3664,
'thumbnail': r're:https://.+/storage/podcasts/.+\.jpg',
'series': 'Все там будем',
'series_id': 'fe9347bf-c009-4ebd-87e8-b06f2f324746',
'season': 'Season 2',
'season_number': 2,
'episode': 'Episode 5',
'episode_number': 5,
'timestamp': 1735538400,
'upload_date': '20241230',
},
}]
_API_BASE_URL = 'https://api.mave.digital/'
def _real_extract(self, url):
channel_id, slug = self._match_valid_url(url).group('channel', 'id')
display_id = f'{channel_id}-{slug}'
webpage = self._download_webpage(url, display_id)
data = traverse_obj(
self._search_nuxt_json(webpage, display_id),
('data', lambda _, v: v['activeEpisodeData'], any, {require('podcast data')}))
return {
'display_id': display_id,
'channel_id': channel_id,
'channel_url': f'https://{channel_id}.mave.digital/',
'vcodec': 'none',
'thumbnail': re.sub(r'_\d+(?=\.(?:jpg|png))', '', self._og_search_thumbnail(webpage, default='')) or None,
**traverse_obj(data, ('activeEpisodeData', {
'url': ('audio', {urljoin(self._API_BASE_URL)}),
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {clean_html}),
'duration': ('duration', {int_or_none}),
'season_number': ('season', {int_or_none}),
'episode_number': ('number', {int_or_none}),
'view_count': ('listenings', {int_or_none}),
'like_count': ('reactions', lambda _, v: v['type'] == 'like', 'count', {int_or_none}, any),
'dislike_count': ('reactions', lambda _, v: v['type'] == 'dislike', 'count', {int_or_none}, any),
'age_limit': ('is_explicit', {bool}, {lambda x: 18 if x else None}),
'timestamp': ('publish_date', {parse_iso8601}),
})),
**traverse_obj(data, ('podcast', 'podcast', {
'series_id': ('id', {str}),
'series': ('title', {str}),
'channel': ('title', {str}),
'uploader': ('author', {str}),
})),
}
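The thumbnail line strips what appears to be a size suffix from the og:image URL to recover the full-size asset:

import re

thumb = 'https://api.mave.digital/storage/podcasts/abc/cover_250.jpg'  # hypothetical URL
print(re.sub(r'_\d+(?=\.(?:jpg|png))', '', thumb))
# https://api.mave.digital/storage/podcasts/abc/cover.jpg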


@ -1,7 +1,5 @@
from .telecinco import TelecincoBaseIE from .telecinco import TelecincoBaseIE
from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
) )
@ -81,17 +79,7 @@ class MiTeleIE(TelecincoBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_akamai_webpage(url, display_id)
try: # yt-dlp's default user-agents are too old and blocked by akamai
webpage = self._download_webpage(url, display_id, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
})
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
raise
# Retry with impersonation if hardcoded UA is insufficient to bypass akamai
webpage = self._download_webpage(url, display_id, impersonate=True)
pre_player = self._search_json( pre_player = self._search_json(
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=', r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
webpage, 'Pre Player', display_id)['prePlayer'] webpage, 'Pre Player', display_id)['prePlayer']


@ -19,7 +19,8 @@
class NBACVPBaseIE(TurnerBaseIE): class NBACVPBaseIE(TurnerBaseIE):
def _extract_nba_cvp_info(self, path, video_id, fatal=False): def _extract_nba_cvp_info(self, path, video_id, fatal=False):
return self._extract_cvp_info( return self._extract_cvp_info(
f'http://secure.nba.com/{path}', video_id, { # XXX: The 3rd argument (None) needs to be the AdobePass software_statement
f'http://secure.nba.com/{path}', video_id, None, {
'default': { 'default': {
'media_src': 'http://nba.cdn.turner.com/nba/big', 'media_src': 'http://nba.cdn.turner.com/nba/big',
}, },
@ -94,6 +95,7 @@ def _extract_video(self, filter_key, filter_value):
class NBAWatchEmbedIE(NBAWatchBaseIE): class NBAWatchEmbedIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch:embed' IE_NAME = 'nba:watch:embed'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'embed\?.*?\bid=(?P<id>\d+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'embed\?.*?\bid=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -115,6 +117,7 @@ def _real_extract(self, url):
class NBAWatchIE(NBAWatchBaseIE): class NBAWatchIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch' IE_NAME = 'nba:watch'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'(?:nba/)?video/(?P<id>.+?(?=/index\.html)|(?:[^/]+/)*[^/?#&]+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'(?:nba/)?video/(?P<id>.+?(?=/index\.html)|(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -167,6 +170,7 @@ def _real_extract(self, url):
class NBAWatchCollectionIE(NBAWatchBaseIE): class NBAWatchCollectionIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch:collection' IE_NAME = 'nba:watch:collection'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'list/collection/(?P<id>[^/?#&]+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'list/collection/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -336,6 +340,7 @@ def _real_extract(self, url):
class NBAEmbedIE(NBABaseIE): class NBAEmbedIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba:embed' IE_NAME = 'nba:embed'
_VALID_URL = r'https?://secure\.nba\.com/assets/amp/include/video/(?:topI|i)frame\.html\?.*?\bcontentId=(?P<id>[^?#&]+)' _VALID_URL = r'https?://secure\.nba\.com/assets/amp/include/video/(?:topI|i)frame\.html\?.*?\bcontentId=(?P<id>[^?#&]+)'
_TESTS = [{ _TESTS = [{
@ -358,6 +363,7 @@ def _real_extract(self, url):
class NBAIE(NBABaseIE): class NBAIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba' IE_NAME = 'nba'
_VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?!{NBABaseIE._CHANNEL_PATH_REGEX})video/(?P<id>(?:[^/]+/)*[^/?#&]+)' _VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?!{NBABaseIE._CHANNEL_PATH_REGEX})video/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -385,6 +391,7 @@ def _extract_url_results(self, team, content_id):
class NBAChannelIE(NBABaseIE): class NBAChannelIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba:channel' IE_NAME = 'nba:channel'
_VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?:{NBABaseIE._CHANNEL_PATH_REGEX})/(?P<id>[^/?#&]+)' _VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?:{NBABaseIE._CHANNEL_PATH_REGEX})/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{


@ -6,7 +6,7 @@
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE, default_ns from .theplatform import ThePlatformBaseIE, ThePlatformIE, default_ns
from ..networking import HEADRequest from ..networking import HEADRequest
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -14,26 +14,130 @@
UserNotLive, UserNotLive,
clean_html, clean_html,
determine_ext, determine_ext,
extract_attributes,
float_or_none, float_or_none,
get_element_html_by_class,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
make_archive_id,
mimetype2ext, mimetype2ext,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
parse_iso8601,
remove_end, remove_end,
smuggle_url,
traverse_obj,
try_get, try_get,
unescapeHTML, unescapeHTML,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_basename, url_basename,
url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE class NBCUniversalBaseIE(ThePlatformBaseIE):
_VALID_URL = r'https?(?P<permalink>://(?:www\.)?nbc\.com/(?:classic-tv/)?[^/]+/video/[^/]+/(?P<id>(?:NBCE|n)?\d+))' _GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_M3U8_RE = r'https?://[^/?#]+/prod/[\w-]+/(?P<folders>[^?#]+/)cmaf/mpeg_(?:cbcs|cenc)\w*/master_cmaf\w*\.m3u8'
def _download_nbcu_smil_and_extract_m3u8_url(self, tp_path, video_id, query):
smil = self._download_xml(
f'https://link.theplatform.com/s/{tp_path}', video_id,
'Downloading SMIL manifest', 'Failed to download SMIL manifest', query={
**query,
'format': 'SMIL', # XXX: Do not confuse "format" with "formats"
'manifest': 'm3u',
'switch': 'HLSServiceSecure', # Or else we get broken mp4 http URLs instead of HLS
}, headers=self.geo_verification_headers())
ns = f'//{{{default_ns}}}'
if url := traverse_obj(smil, (f'{ns}video/@src', lambda _, v: determine_ext(v) == 'm3u8', any)):
return url
exc = traverse_obj(smil, (f'{ns}param', lambda _, v: v.get('name') == 'exception', '@value', any))
if exc == 'GeoLocationBlocked':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(traverse_obj(smil, (f'{ns}ref/@abstract', ..., any)), expected=exc == 'Expired')
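
A note on the namespace handling above: SMIL manifests are XML-namespaced, and ElementTree-style paths address such elements as {namespace-URI}tag, which is exactly what the f'//{{{default_ns}}}' prefix builds. A minimal standalone sketch, assuming default_ns is the SMIL 2.1 namespace URI defined in theplatform.py:

import xml.etree.ElementTree as ET

default_ns = 'http://www.w3.org/2005/SMIL21/Language'  # assumed value of default_ns

smil = ET.fromstring(
    '<smil xmlns="http://www.w3.org/2005/SMIL21/Language">'
    '<body><video src="https://example.com/master.m3u8"/></body></smil>')

# ElementTree resolves '{uri}tag' path segments directly; no prefix registration needed
video = smil.find(f'.//{{{default_ns}}}video')
print(video.get('src'))  # -> https://example.com/master.m3u8
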
def _extract_nbcu_formats_and_subtitles(self, tp_path, video_id, query):
# formats='mpeg4' will return either a working m3u8 URL or an m3u8 template for non-DRM HLS
# formats='m3u+none,mpeg4' may return DRM HLS but w/the "folders" needed for non-DRM template
query['formats'] = 'm3u+none,mpeg4'
m3u8_url = self._download_nbcu_smil_and_extract_m3u8_url(tp_path, video_id, query)
if mobj := re.fullmatch(self._M3U8_RE, m3u8_url):
query['formats'] = 'mpeg4'
m3u8_tmpl = self._download_nbcu_smil_and_extract_m3u8_url(tp_path, video_id, query)
# Example: https://vod-lf-oneapp-prd.akamaized.net/prod/video/{folders}master_hls.m3u8
if '{folders}' in m3u8_tmpl:
self.write_debug('Found m3u8 URL template, formatting URL path')
m3u8_url = m3u8_tmpl.format(folders=mobj.group('folders'))
if '/mpeg_cenc' in m3u8_url or '/mpeg_cbcs' in m3u8_url:
self.report_drm(video_id)
return self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
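
The two-pass logic above is subtle: the first SMIL request (formats='m3u+none,mpeg4') may come back with a DRM-only URL whose path still carries the per-video folder segments, while the second (formats='mpeg4') can return a non-DRM URL containing a literal {folders} placeholder. A standalone sketch of the substitution, with made-up CDN paths:

import re

# Same shape as NBCUniversalBaseIE._M3U8_RE above
M3U8_RE = r'https?://[^/?#]+/prod/[\w-]+/(?P<folders>[^?#]+/)cmaf/mpeg_(?:cbcs|cenc)\w*/master_cmaf\w*\.m3u8'

drm_url = 'https://vod.example.net/prod/video/abc/123/cmaf/mpeg_cenc/master_cmaf.m3u8'  # hypothetical
m3u8_tmpl = 'https://vod.example.net/prod/video/{folders}master_hls.m3u8'               # hypothetical

mobj = re.fullmatch(M3U8_RE, drm_url)
if mobj and '{folders}' in m3u8_tmpl:
    # Reuse the per-video path segment from the DRM URL in the non-DRM template
    print(m3u8_tmpl.format(folders=mobj.group('folders')))
    # -> https://vod.example.net/prod/video/abc/123/master_hls.m3u8
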
def _extract_nbcu_video(self, url, display_id, old_ie_key=None):
webpage = self._download_webpage(url, display_id)
settings = self._search_json(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>',
webpage, 'settings', display_id)
query = {}
tve = extract_attributes(get_element_html_by_class('tve-video-deck-app', webpage) or '')
if tve:
account_pid = tve.get('data-mpx-media-account-pid') or tve['data-mpx-account-pid']
account_id = tve['data-mpx-media-account-id']
metadata = self._parse_json(
tve.get('data-normalized-video') or '', display_id, fatal=False, transform_source=unescapeHTML)
video_id = tve.get('data-guid') or metadata['guid']
if tve.get('data-entitlement') == 'auth':
auth = settings['tve_adobe_auth']
release_pid = tve['data-release-pid']
resource = self._get_mvpd_resource(
tve.get('data-adobe-pass-resource-id') or auth['adobePassResourceId'],
tve['data-title'], release_pid, tve.get('data-rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, auth['adobePassRequestorId'],
resource, auth['adobePassSoftwareStatement'])
else:
ls_playlist = traverse_obj(settings, (
'ls_playlist', lambda _, v: v['defaultGuid'], any, {require('LS playlist')}))
video_id = ls_playlist['defaultGuid']
account_pid = ls_playlist.get('mpxMediaAccountPid') or ls_playlist['mpxAccountPid']
account_id = ls_playlist['mpxMediaAccountId']
metadata = traverse_obj(ls_playlist, ('videos', lambda _, v: v['guid'] == video_id, any)) or {}
tp_path = f'{account_pid}/media/guid/{account_id}/{video_id}'
formats, subtitles = self._extract_nbcu_formats_and_subtitles(tp_path, video_id, query)
tp_metadata = self._download_theplatform_metadata(tp_path, video_id, fatal=False)
parsed_info = self._parse_theplatform_metadata(tp_metadata)
self._merge_subtitles(parsed_info['subtitles'], target=subtitles)
return {
**parsed_info,
**traverse_obj(metadata, {
'title': ('title', {str}),
'description': ('description', {str}),
'duration': ('durationInSeconds', {int_or_none}),
'timestamp': ('airDate', {parse_iso8601}),
'thumbnail': ('thumbnailUrl', {url_or_none}),
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode': ('episodeTitle', {str}),
'series': ('show', {str}),
}),
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'_old_archive_ids': [make_archive_id(old_ie_key, video_id)] if old_ie_key else None,
}
class NBCIE(NBCUniversalBaseIE):
_VALID_URL = r'https?(?P<permalink>://(?:www\.)?nbc\.com/(?:classic-tv/)?[^/?#]+/video/[^/?#]+/(?P<id>\w+))'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.nbc.com/the-tonight-show/video/jimmy-fallon-surprises-fans-at-ben-jerrys/2848237', 'url': 'http://www.nbc.com/the-tonight-show/video/jimmy-fallon-surprises-fans-at-ben-jerrys/2848237',
@ -49,47 +153,20 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'episode_number': 86, 'episode_number': 86,
'season': 'Season 2', 'season': 'Season 2',
'season_number': 2, 'season_number': 2,
'series': 'Tonight Show: Jimmy Fallon', 'series': 'Tonight',
'duration': 237.0, 'duration': 236.504,
'chapters': 'count:1', 'tags': 'count:2',
'tags': 'count:4',
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'categories': ['Series/The Tonight Show Starring Jimmy Fallon'], 'categories': ['Series/The Tonight Show Starring Jimmy Fallon'],
'media_type': 'Full Episode', 'media_type': 'Full Episode',
'age_limit': 14,
'_old_archive_ids': ['theplatform 2848237'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{ {
'url': 'http://www.nbc.com/saturday-night-live/video/star-wars-teaser/2832821',
'info_dict': {
'id': '2832821',
'ext': 'mp4',
'title': 'Star Wars Teaser',
'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
'timestamp': 1417852800,
'upload_date': '20141206',
'uploader': 'NBCU-COM',
},
'skip': 'page not found',
},
{
# HLS streams requires the 'hdnea3' cookie
'url': 'http://www.nbc.com/Kings/video/goliath/n1806',
'info_dict': {
'id': '101528f5a9e8127b107e98c5e6ce4638',
'ext': 'mp4',
'title': 'Goliath',
'description': 'When an unknown soldier saves the life of the King\'s son in battle, he\'s thrust into the limelight and politics of the kingdom.',
'timestamp': 1237100400,
'upload_date': '20090315',
'uploader': 'NBCU-COM',
},
'skip': 'page not found',
},
{
# manifest url does not have extension
'url': 'https://www.nbc.com/the-golden-globe-awards/video/oprah-winfrey-receives-cecil-b-de-mille-award-at-the-2018-golden-globes/3646439', 'url': 'https://www.nbc.com/the-golden-globe-awards/video/oprah-winfrey-receives-cecil-b-de-mille-award-at-the-2018-golden-globes/3646439',
'info_dict': { 'info_dict': {
'id': '3646439', 'id': '3646439',
@ -99,48 +176,47 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'episode_number': 1, 'episode_number': 1,
'season': 'Season 75', 'season': 'Season 75',
'season_number': 75, 'season_number': 75,
'series': 'The Golden Globe Awards', 'series': 'Golden Globes',
'description': 'Oprah Winfrey receives the Cecil B. de Mille Award at the 75th Annual Golden Globe Awards.', 'description': 'Oprah Winfrey receives the Cecil B. de Mille Award at the 75th Annual Golden Globe Awards.',
'uploader': 'NBCU-COM', 'uploader': 'NBCU-COM',
'upload_date': '20180107', 'upload_date': '20180107',
'timestamp': 1515312000, 'timestamp': 1515312000,
'duration': 570.0, 'duration': 569.703,
'tags': 'count:8', 'tags': 'count:8',
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'chapters': 'count:1', 'media_type': 'Highlight',
'age_limit': 0,
'categories': ['Series/The Golden Globe Awards'],
'_old_archive_ids': ['theplatform 3646439'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{ {
# new video_id format # Needs to be extracted from webpage instead of GraphQL
'url': 'https://www.nbc.com/quantum-leap/video/bens-first-leap-nbcs-quantum-leap/NBCE125189978', 'url': 'https://www.nbc.com/paris2024/video/ali-truwit-found-purpose-pool-after-her-life-changed/para24_sww_alitruwittodayshow_240823',
'info_dict': { 'info_dict': {
'id': 'NBCE125189978', 'id': 'para24_sww_alitruwittodayshow_240823',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ben\'s First Leap | NBC\'s Quantum Leap', 'title': 'Ali Truwit found purpose in the pool after her life changed',
'description': 'md5:a82762449b7ec4bb83291a7b355ebf8e', 'description': 'md5:c16d7489e1516593de1cc5d3f39b9bdb',
'uploader': 'NBCU-COM', 'uploader': 'NBCU-SPORTS',
'series': 'Quantum Leap', 'duration': 311.077,
'season': 'Season 1',
'season_number': 1,
'episode': 'Ben\'s First Leap | NBC\'s Quantum Leap',
'episode_number': 1,
'duration': 170.171,
'chapters': [],
'timestamp': 1663956155,
'upload_date': '20220923',
'tags': 'count:10',
'age_limit': 0,
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'categories': ['Series/Quantum Leap 2022'], 'episode': 'Ali Truwit found purpose in the pool after her life changed',
'media_type': 'Highlight', 'timestamp': 1724435902.0,
'upload_date': '20240823',
'_old_archive_ids': ['theplatform para24_sww_alitruwittodayshow_240823'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{
'url': 'https://www.nbc.com/quantum-leap/video/bens-first-leap-nbcs-quantum-leap/NBCE125189978',
'only_matching': True,
},
{ {
'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310', 'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310',
'only_matching': True, 'only_matching': True,
@ -151,6 +227,7 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'only_matching': True, 'only_matching': True,
}, },
] ]
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1Yzg2YjdkYy04NDI3LTRjNDUtOGQwZi1iNDkzYmE3MmQwYjQiLCJuYmYiOjE1Nzg3MDM2MzEsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTc4NzAzNjMxfQ.QQKIsBhAjGQTMdAqRTqhcz2Cddr4Y2hEjnSiOeKKki4nLrkDOsjQMmqeTR0hSRarraxH54wBgLvsxI7LHwKMvr7G8QpynNAxylHlQD3yhN9tFhxt4KR5wW3as02B-W2TznK9bhNWPKIyHND95Uo2Mi6rEQoq8tM9O09WPWaanE5BX_-r6Llr6dPq5F0Lpx2QOn2xYRb1T4nFxdFTNoss8GBds8OvChTiKpXMLHegLTc1OS4H_1a8tO_37jDwSdJuZ8iTyRLV4kZ2cpL6OL5JPMObD4-HQiec_dfcYgMKPiIfP9ZqdXpec2SVaCLsWEk86ZYvD97hLIQrK5rrKd1y-A'
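
The software statement registered above is an ordinary JWT (header.payload.signature), so its payload is inspectable without the signing key. A quick sketch, assuming _SOFTWARE_STATEMENT is the constant defined just above:

import base64
import json

payload = _SOFTWARE_STATEMENT.split('.')[1]
payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
print(json.loads(base64.urlsafe_b64decode(payload)))
# -> {'sub': '5c86b7dc-8427-4c45-8d0f-b493ba72d0b4', 'nbf': 1578703631,
#     'iss': 'auth.adobe.com', 'iat': 1578703631}
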
def _real_extract(self, url): def _real_extract(self, url):
permalink, video_id = self._match_valid_url(url).groups() permalink, video_id = self._match_valid_url(url).groups()
@ -196,62 +273,50 @@ def _real_extract(self, url):
'userId': '0', 'userId': '0',
}), }),
})['data']['bonanzaPage']['metadata'] })['data']['bonanzaPage']['metadata']
query = {
'mbr': 'true', if not video_data:
'manifest': 'm3u', # Some videos are not available via GraphQL API
'switch': 'HLSServiceSecure', webpage = self._download_webpage(url, video_id)
} video_data = self._search_json(
r'<script>\s*PRELOAD\s*=', webpage, 'video data',
video_id)['pages'][urllib.parse.urlparse(url).path]['base']['metadata']
video_id = video_data['mpxGuid'] video_id = video_data['mpxGuid']
tp_path = 'NnzsPC/media/guid/{}/{}'.format(video_data.get('mpxAccountId') or '2410887629', video_id) tp_path = f'NnzsPC/media/guid/{video_data["mpxAccountId"]}/{video_id}'
tpm = self._download_theplatform_metadata(tp_path, video_id) tpm = self._download_theplatform_metadata(tp_path, video_id, fatal=False)
title = tpm.get('title') or video_data.get('secondaryTitle') title = traverse_obj(tpm, ('title', {str})) or video_data.get('secondaryTitle')
query = {}
if video_data.get('locked'): if video_data.get('locked'):
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
video_data.get('resourceId') or 'nbcentertainment', video_data['resourceId'], title, video_id, video_data.get('rating'))
title, video_id, video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource) url, video_id, 'nbcentertainment', resource, self._SOFTWARE_STATEMENT)
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/{}/{}'.format(video_data.get('mpxAccountId') or '2410887629', video_id),
query), {'force_smil_url': True})
# Empty string or 0 can be valid values for these. So the check must be `is None` formats, subtitles = self._extract_nbcu_formats_and_subtitles(tp_path, video_id, query)
description = video_data.get('description') parsed_info = self._parse_theplatform_metadata(tpm)
if description is None: self._merge_subtitles(parsed_info['subtitles'], target=subtitles)
description = tpm.get('description')
episode_number = int_or_none(video_data.get('episodeNumber'))
if episode_number is None:
episode_number = int_or_none(tpm.get('nbcu$airOrder'))
rating = video_data.get('rating')
if rating is None:
try_get(tpm, lambda x: x['ratings'][0]['rating'])
season_number = int_or_none(video_data.get('seasonNumber'))
if season_number is None:
season_number = int_or_none(tpm.get('nbcu$seasonNumber'))
series = video_data.get('seriesShortTitle')
if series is None:
series = tpm.get('nbcu$seriesShortTitle')
tags = video_data.get('keywords')
if tags is None or len(tags) == 0:
tags = tpm.get('keywords')
return { return {
'_type': 'url_transparent', **traverse_obj(video_data, {
'age_limit': parse_age_limit(rating), 'description': ('description', {str}, filter),
'description': description, 'episode': ('secondaryTitle', {str}, filter),
'episode': title, 'episode_number': ('episodeNumber', {int_or_none}),
'episode_number': episode_number, 'season_number': ('seasonNumber', {int_or_none}),
'age_limit': ('rating', {parse_age_limit}),
'tags': ('keywords', ..., {str}, filter, all, filter),
'series': ('seriesShortTitle', {str}),
}),
**parsed_info,
'id': video_id, 'id': video_id,
'ie_key': 'ThePlatform',
'season_number': season_number,
'series': series,
'tags': tags,
'title': title, 'title': title,
'url': theplatform_url, 'formats': formats,
'subtitles': subtitles,
'_old_archive_ids': [make_archive_id('ThePlatform', video_id)],
} }
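
The webpage fallback above keys the PRELOAD blob by the URL's path component. A sketch with a hypothetical, heavily trimmed payload shape (the real blob is far larger):

import urllib.parse

preload = {
    'pages': {
        '/paris2024/video/some-highlight': {
            'base': {'metadata': {'mpxGuid': '12345', 'mpxAccountId': '2410887629'}},
        },
    },
}

url = 'https://www.nbc.com/paris2024/video/some-highlight'
path = urllib.parse.urlparse(url).path  # -> '/paris2024/video/some-highlight'
print(preload['pages'][path]['base']['metadata']['mpxGuid'])  # -> 12345
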
class NBCSportsVPlayerIE(InfoExtractor): class NBCSportsVPlayerIE(InfoExtractor):
_WORKING = False
_VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/' _VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'
_VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)' _VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)'
_EMBED_REGEX = [rf'(?:iframe[^>]+|var video|div[^>]+data-(?:mpx-)?)[sS]rc\s?=\s?"(?P<url>{_VALID_URL_BASE}[^\"]+)'] _EMBED_REGEX = [rf'(?:iframe[^>]+|var video|div[^>]+data-(?:mpx-)?)[sS]rc\s?=\s?"(?P<url>{_VALID_URL_BASE}[^\"]+)']
@ -286,6 +351,7 @@ def _real_extract(self, url):
class NBCSportsIE(InfoExtractor): class NBCSportsIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TESTS = [{ _TESTS = [{
@ -321,6 +387,7 @@ def _real_extract(self, url):
class NBCSportsStreamIE(AdobePassIE): class NBCSportsStreamIE(AdobePassIE):
_WORKING = False
_VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)' _VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559', 'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559',
@ -354,7 +421,7 @@ def _real_extract(self, url):
source_url = video_source['ottStreamUrl'] source_url = video_source['ottStreamUrl']
is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live' is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live'
resource = self._get_mvpd_resource('nbcsports', title, video_id, '') resource = self._get_mvpd_resource('nbcsports', title, video_id, '')
token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource) token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource, None) # XXX: None arg needs to be software_statement
tokenized_url = self._download_json( tokenized_url = self._download_json(
'https://token.playmakerservices.com/cdn', 'https://token.playmakerservices.com/cdn',
video_id, data=json.dumps({ video_id, data=json.dumps({
@ -534,22 +601,26 @@ class NBCOlympicsIE(InfoExtractor):
IE_NAME = 'nbcolympics' IE_NAME = 'nbcolympics'
_VALID_URL = r'https?://www\.nbcolympics\.com/videos?/(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://www\.nbcolympics\.com/videos?/(?P<id>[0-9a-z-]+)'
_TEST = { _TESTS = [{
# Geo-restricted to US # Geo-restricted to US
'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold', 'url': 'https://www.nbcolympics.com/videos/watch-final-minutes-team-usas-mens-basketball-gold',
'md5': '54fecf846d05429fbaa18af557ee523a',
'info_dict': { 'info_dict': {
'id': 'WjTBzDXx5AUq', 'id': 'SAwGfPlQ1q01',
'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Rose\'s son Leo was in tears after his dad won gold', 'display_id': 'watch-final-minutes-team-usas-mens-basketball-gold',
'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.', 'title': 'Watch the final minutes of Team USA\'s men\'s basketball gold',
'timestamp': 1471274964, 'description': 'md5:f704f591217305c9559b23b877aa8d31',
'upload_date': '20160815',
'uploader': 'NBCU-SPORTS', 'uploader': 'NBCU-SPORTS',
'duration': 387.053,
'thumbnail': r're:https://.+/.+\.jpg',
'chapters': [],
'timestamp': 1723346984,
'upload_date': '20240811',
}, },
'skip': '404 Not Found', }, {
} 'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -578,6 +649,7 @@ def _real_extract(self, url):
class NBCOlympicsStreamIE(AdobePassIE): class NBCOlympicsStreamIE(AdobePassIE):
_WORKING = False
IE_NAME = 'nbcolympics:stream' IE_NAME = 'nbcolympics:stream'
_VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)'
_TESTS = [ _TESTS = [
@ -630,7 +702,8 @@ def _real_extract(self, url):
event_config.get('resourceId', 'NBCOlympics'), event_config.get('resourceId', 'NBCOlympics'),
re.sub(r'[^\w\d ]+', '', event_config['eventTitle']), pid, re.sub(r'[^\w\d ]+', '', event_config['eventTitle']), pid,
event_config.get('ratingId', 'NO VALUE')) event_config.get('ratingId', 'NO VALUE'))
media_token = self._extract_mvpd_auth(url, pid, event_config.get('requestorId', 'NBCOlympics'), ap_resource) # XXX: The None arg below needs to be the software_statement for this requestor
media_token = self._extract_mvpd_auth(url, pid, event_config.get('requestorId', 'NBCOlympics'), ap_resource, None)
source_url = self._download_json( source_url = self._download_json(
'https://tokens.playmakerservices.com/', pid, 'Retrieving tokenized URL', 'https://tokens.playmakerservices.com/', pid, 'Retrieving tokenized URL',
@ -848,3 +921,178 @@ def _real_extract(self, url):
'is_live': is_live, 'is_live': is_live,
**info, **info,
} }
class BravoTVIE(NBCUniversalBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:bravotv|oxygen)\.com/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'info_dict': {
'id': '3923059',
'ext': 'mp4',
'title': 'The Top Chef Season 16 Winner Is...',
'display_id': 'the-top-chef-season-16-winner-is',
'description': 'Find out who takes the title of Top Chef!',
'upload_date': '20190315',
'timestamp': 1552618860,
'season_number': 16,
'episode_number': 15,
'series': 'Top Chef',
'episode': 'Finale',
'duration': 190,
'season': 'Season 16',
'thumbnail': r're:^https://.+\.jpg',
'uploader': 'NBCU-BRAV',
'categories': ['Series', 'Series/Top Chef'],
'tags': 'count:10',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/top-chef/season-20/episode-1/london-calling',
'info_dict': {
'id': '9000234570',
'ext': 'mp4',
'title': 'London Calling',
'display_id': 'london-calling',
'description': 'md5:5af95a8cbac1856bd10e7562f86bb759',
'upload_date': '20230310',
'timestamp': 1678418100,
'season_number': 20,
'episode_number': 1,
'series': 'Top Chef',
'episode': 'London Calling',
'duration': 3266,
'season': 'Season 20',
'chapters': 'count:7',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Top Chef'],
'tags': 'count:10',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-1/closing-night',
'info_dict': {
'id': '3692045',
'ext': 'mp4',
'title': 'Closing Night',
'display_id': 'closing-night',
'description': 'md5:c8a5bb523c8ef381f3328c6d9f1e4632',
'upload_date': '20230126',
'timestamp': 1674709200,
'season_number': 1,
'episode_number': 1,
'series': 'In Ice Cold Blood',
'episode': 'Closing Night',
'duration': 2629,
'season': 'Season 1',
'chapters': 'count:6',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/In Ice Cold Blood'],
'tags': ['ice-t', 'in ice cold blood', 'law and order', 'oxygen', 'true crime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'info_dict': {
'id': '3974019',
'ext': 'mp4',
'title': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'display_id': 'handling-the-horwitz-house-after-the-murder-season-2',
'description': 'md5:f9d638dd6946a1c1c0533a9c6100eae5',
'upload_date': '20190618',
'timestamp': 1560819600,
'season_number': 2,
'episode_number': 16,
'series': 'In Ice Cold Blood',
'episode': 'Mother Vs Son',
'duration': 68,
'season': 'Season 2',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'uploader': 'NBCU-OXY',
'categories': ['Series/In Ice Cold Blood'],
'tags': ['in ice cold blood', 'ice-t', 'law and order', 'true crime', 'oxygen'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_nbcu_video(url, display_id)
class SyfyIE(NBCUniversalBaseIE):
_VALID_URL = r'https?://(?:www\.)?syfy\.com/[^/?#]+/(?:season-\d+/episode-\d+/(?:videos/)?|videos/)(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.syfy.com/face-off/season-13/episode-10/videos/keyed-up',
'info_dict': {
'id': '3774403',
'ext': 'mp4',
'display_id': 'keyed-up',
'title': 'Keyed Up',
'description': 'md5:feafd15bee449f212dcd3065bbe9a755',
'age_limit': 14,
'duration': 169,
'thumbnail': r're:https://www\.syfy\.com/.+/.+\.jpg',
'series': 'Face Off',
'season': 'Season 13',
'season_number': 13,
'episode': 'Through the Looking Glass Part 2',
'episode_number': 10,
'timestamp': 1533711618,
'upload_date': '20180808',
'media_type': 'Excerpt',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Face Off'],
'tags': 'count:15',
'_old_archive_ids': ['theplatform 3774403'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.syfy.com/face-off/season-13/episode-10/through-the-looking-glass-part-2',
'info_dict': {
'id': '3772391',
'ext': 'mp4',
'display_id': 'through-the-looking-glass-part-2',
'title': 'Through the Looking Glass Pt.2',
'description': 'md5:90bd5dcbf1059fe3296c263599af41d2',
'age_limit': 0,
'duration': 2599,
'thumbnail': r're:https://www\.syfy\.com/.+/.+\.jpg',
'chapters': [{'start_time': 0.0, 'end_time': 679.0, 'title': '<Untitled Chapter 1>'},
{'start_time': 679.0, 'end_time': 1040.967, 'title': '<Untitled Chapter 2>'},
{'start_time': 1040.967, 'end_time': 1403.0, 'title': '<Untitled Chapter 3>'},
{'start_time': 1403.0, 'end_time': 1870.0, 'title': '<Untitled Chapter 4>'},
{'start_time': 1870.0, 'end_time': 2496.967, 'title': '<Untitled Chapter 5>'},
{'start_time': 2496.967, 'end_time': 2599, 'title': '<Untitled Chapter 6>'}],
'series': 'Face Off',
'season': 'Season 13',
'season_number': 13,
'episode': 'Through the Looking Glass Part 2',
'episode_number': 10,
'timestamp': 1672570800,
'upload_date': '20230101',
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Face Off'],
'tags': 'count:15',
'_old_archive_ids': ['theplatform 3772391'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_nbcu_video(url, display_id, old_ie_key='ThePlatform')
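
The old_ie_key='ThePlatform' plumbing feeds _old_archive_ids so entries recorded in a download archive under the previous extractor are still recognized. If make_archive_id behaves as I recall (lowercased IE key joined with the video ID):

from yt_dlp.utils import make_archive_id

print(make_archive_id('ThePlatform', '3774403'))  # -> 'theplatform 3774403' (assumed format)
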

yt_dlp/extractor/niconico.py

@ -6,13 +6,13 @@
import urllib.parse import urllib.parse
from .common import InfoExtractor, SearchInfoExtractor from .common import InfoExtractor, SearchInfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList, OnDemandPagedList,
clean_html, clean_html,
determine_ext, determine_ext,
extract_attributes,
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_bitrate, parse_bitrate,
@ -20,17 +20,20 @@
parse_qs, parse_qs,
parse_resolution, parse_resolution,
qualities, qualities,
remove_start,
str_or_none, str_or_none,
time_seconds, time_seconds,
unescapeHTML, truncate_string,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_basename, url_basename,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import find_element, traverse_obj from ..utils.traversal import (
find_element,
require,
traverse_obj,
)
class NiconicoBaseIE(InfoExtractor): class NiconicoBaseIE(InfoExtractor):
@ -820,41 +823,39 @@ class NiconicoLiveIE(NiconicoBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(f'https://live.nicovideo.jp/watch/{video_id}', video_id) webpage = self._download_webpage(url, video_id, expected_status=404)
if err_msg := traverse_obj(webpage, ({find_element(cls='message')}, {clean_html})):
raise ExtractorError(err_msg, expected=True)
embedded_data = self._parse_json(unescapeHTML(self._search_regex( embedded_data = traverse_obj(webpage, (
r'<script\s+id="embedded-data"\s*data-props="(.+?)"', webpage, 'embedded data')), video_id) {find_element(tag='script', id='embedded-data', html=True)},
{extract_attributes}, 'data-props', {json.loads}))
ws_url = traverse_obj(embedded_data, ('site', 'relive', 'webSocketUrl')) frontend_id = traverse_obj(embedded_data, ('site', 'frontendId', {str_or_none}), default='9')
if not ws_url:
raise ExtractorError('The live hasn\'t started yet or already ended.', expected=True)
ws_url = update_url_query(ws_url, {
'frontend_id': traverse_obj(embedded_data, ('site', 'frontendId')) or '9',
})
hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.')
ws_url = traverse_obj(embedded_data, (
'site', 'relive', 'webSocketUrl', {url_or_none}, {require('websocket URL')}))
ws_url = update_url_query(ws_url, {'frontend_id': frontend_id})
ws = self._request_webpage( ws = self._request_webpage(
Request(ws_url, headers={'Origin': f'https://{hostname}'}), ws_url, video_id, 'Connecting to WebSocket server',
video_id=video_id, note='Connecting to WebSocket server') headers={'Origin': 'https://live.nicovideo.jp'})
self.write_debug('Sending HLS server request') self.write_debug('Sending HLS server request')
ws.send(json.dumps({ ws.send(json.dumps({
'type': 'startWatching',
'data': { 'data': {
'reconnect': False,
'room': {
'commentable': True,
'protocol': 'webSocket',
},
'stream': { 'stream': {
'quality': 'abr',
'protocol': 'hls',
'latency': 'high',
'accessRightMethod': 'single_cookie', 'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
'latency': 'high',
'protocol': 'hls',
'quality': 'abr',
}, },
'room': {
'protocol': 'webSocket',
'commentable': True,
},
'reconnect': False,
}, },
'type': 'startWatching',
})) }))
while True: while True:
@ -874,17 +875,15 @@ def _real_extract(self, url):
raise ExtractorError('Disconnected at middle of extraction') raise ExtractorError('Disconnected at middle of extraction')
elif data.get('type') == 'error': elif data.get('type') == 'error':
self.write_debug(recv) self.write_debug(recv)
message = traverse_obj(data, ('body', 'code')) or recv message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
raise ExtractorError(message) raise ExtractorError(message)
elif self.get_param('verbose', False): elif self.get_param('verbose', False):
if len(recv) > 100: self.write_debug(f'Server response: {truncate_string(recv, 100)}')
recv = recv[:100] + '...'
self.write_debug(f'Server said: {recv}')
title = traverse_obj(embedded_data, ('program', 'title')) or self._html_search_meta( title = traverse_obj(embedded_data, ('program', 'title')) or self._html_search_meta(
('og:title', 'twitter:title'), webpage, 'live title', fatal=False) ('og:title', 'twitter:title'), webpage, 'live title', fatal=False)
raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail')) or {} raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail', {dict})) or {}
thumbnails = [] thumbnails = []
for name, value in raw_thumbs.items(): for name, value in raw_thumbs.items():
if not isinstance(value, dict): if not isinstance(value, dict):
@ -911,31 +910,30 @@ def _real_extract(self, url):
cookie['domain'], cookie['name'], cookie['value'], cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure']) expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure'])
fmt_common = {
'live_latency': 'high',
'origin': hostname,
'protocol': 'niconico_live',
'video_id': video_id,
'ws': ws,
}
q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr' q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr'
a_map = {96: 'audio_low', 192: 'audio_high'} a_map = {96: 'audio_low', 192: 'audio_high'}
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True) formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt in formats: for fmt in formats:
fmt['protocol'] = 'niconico_live'
if fmt.get('acodec') == 'none': if fmt.get('acodec') == 'none':
fmt['format_id'] = next(q_iter, fmt['format_id']) fmt['format_id'] = next(q_iter, fmt['format_id'])
elif fmt.get('vcodec') == 'none': elif fmt.get('vcodec') == 'none':
abr = parse_bitrate(fmt['url'].lower()) abr = parse_bitrate(fmt['url'].lower())
fmt.update({ fmt.update({
'abr': abr, 'abr': abr,
'acodec': 'mp4a.40.2',
'format_id': a_map.get(abr, fmt['format_id']), 'format_id': a_map.get(abr, fmt['format_id']),
}) })
fmt.update(fmt_common)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'downloader_options': {
'max_quality': traverse_obj(embedded_data, ('program', 'stream', 'maxQuality', {str})) or 'normal',
'ws': ws,
'ws_url': ws_url,
},
**traverse_obj(embedded_data, { **traverse_obj(embedded_data, {
'view_count': ('program', 'statistics', 'watchCount'), 'view_count': ('program', 'statistics', 'watchCount'),
'comment_count': ('program', 'statistics', 'commentCount'), 'comment_count': ('program', 'statistics', 'commentCount'),
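
For context on the a_map lookup above: the audio renditions encode their bitrate in the playlist URL, which parse_bitrate recovers. A sketch with a hypothetical URL (real mux.com paths differ):

from yt_dlp.utils import parse_bitrate

a_map = {96: 'audio_low', 192: 'audio_high'}

url = 'https://example.com/hls/192kbps/playlist.m3u8'
abr = parse_bitrate(url.lower())  # -> 192
print(a_map.get(abr, 'audio'))    # -> 'audio_high'
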

yt_dlp/extractor/nobelprize.py

@ -1,59 +1,57 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, UnsupportedError,
get_element_by_attribute, clean_html,
int_or_none, int_or_none,
js_to_json, parse_duration,
mimetype2ext, parse_qs,
update_url_query, str_or_none,
update_url,
) )
from ..utils.traversal import find_element, traverse_obj
class NobelPrizeIE(InfoExtractor): class NobelPrizeIE(InfoExtractor):
_WORKING = False _VALID_URL = r'https?://(?:(?:mediaplayer|www)\.)?nobelprize\.org/mediaplayer/'
_VALID_URL = r'https?://(?:www\.)?nobelprize\.org/mediaplayer.*?\bid=(?P<id>\d+)' _TESTS = [{
_TEST = { 'url': 'https://www.nobelprize.org/mediaplayer/?id=2636',
'url': 'http://www.nobelprize.org/mediaplayer/?id=2636',
'md5': '04c81e5714bb36cc4e2232fee1d8157f',
'info_dict': { 'info_dict': {
'id': '2636', 'id': '2636',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Announcement of the 2016 Nobel Prize in Physics', 'title': 'Announcement of the 2016 Nobel Prize in Physics',
'description': 'md5:05beba57f4f5a4bbd4cf2ef28fcff739', 'description': 'md5:1a2d8a6ca80c88fb3b9a326e0b0e8e43',
'duration': 1560.0,
'thumbnail': r're:https?://www\.nobelprize\.org/images/.+\.jpg',
'timestamp': 1504883793,
'upload_date': '20170908',
}, },
} }, {
'url': 'https://mediaplayer.nobelprize.org/mediaplayer/?qid=12693',
'info_dict': {
'id': '12693',
'ext': 'mp4',
'title': 'Nobel Lecture by Peter Higgs',
'description': 'md5:9b12e275dbe3a8138484e70e00673a05',
'duration': 1800.0,
'thumbnail': r're:https?://www\.nobelprize\.org/images/.+\.jpg',
'timestamp': 1504883793,
'upload_date': '20170908',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = traverse_obj(parse_qs(url), (
webpage = self._download_webpage(url, video_id) ('id', 'qid'), -1, {int_or_none}, {str_or_none}, any))
media = self._parse_json(self._search_regex( if not video_id:
r'(?s)var\s*config\s*=\s*({.+?});', webpage, raise UnsupportedError(url)
'config'), video_id, js_to_json)['media'] webpage = self._download_webpage(
title = media['title'] update_url(url, netloc='mediaplayer.nobelprize.org'), video_id)
formats = []
for source in media.get('source', []):
source_src = source.get('src')
if not source_src:
continue
ext = mimetype2ext(source.get('type')) or determine_ext(source_src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_src, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(source_src, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'url': source_src,
})
return { return {
**self._search_json_ld(webpage, video_id),
'id': video_id, 'id': video_id,
'title': title, 'title': self._html_search_meta('caption', webpage),
'description': get_element_by_attribute('itemprop', 'description', webpage), 'description': traverse_obj(webpage, (
'duration': int_or_none(media.get('duration')), {find_element(tag='span', attr='itemprop', value='description')}, {clean_html})),
'formats': formats, 'duration': parse_duration(self._html_search_meta('duration', webpage)),
} }
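
The query-string handling above accepts both ?id= and ?qid=, and the int_or_none → str_or_none round-trip rejects non-numeric values while normalizing the result back to a string:

from yt_dlp.utils import int_or_none, parse_qs, str_or_none, traverse_obj

url = 'https://mediaplayer.nobelprize.org/mediaplayer/?qid=12693'
video_id = traverse_obj(parse_qs(url), (
    ('id', 'qid'), -1, {int_or_none}, {str_or_none}, any))
print(video_id)  # -> '12693'
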

yt_dlp/extractor/ntvcojp.py

@ -1,55 +1,82 @@
from .common import InfoExtractor from .streaks import StreaksBaseIE
from ..utils import ( from ..utils import (
ExtractorError, int_or_none,
smuggle_url, parse_iso8601,
traverse_obj, str_or_none,
url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class NTVCoJpCUIE(InfoExtractor): class NTVCoJpCUIE(StreaksBaseIE):
IE_NAME = 'cu.ntv.co.jp' IE_NAME = 'cu.ntv.co.jp'
IE_DESC = 'Nippon Television Network' IE_DESC = '日テレ無料TADA!'
_VALID_URL = r'https?://cu\.ntv\.co\.jp/(?!program)(?P<id>[^/?&#]+)' _VALID_URL = r'https?://cu\.ntv\.co\.jp/(?!program-list|search)(?P<id>[\w-]+)/?(?:[?#]|$)'
_TEST = { _TESTS = [{
'url': 'https://cu.ntv.co.jp/televiva-chill-gohan_181031/', 'url': 'https://cu.ntv.co.jp/gaki_20250525/',
'info_dict': { 'info_dict': {
'id': '5978891207001', 'id': 'gaki_20250525',
'ext': 'mp4', 'ext': 'mp4',
'title': '桜エビと炒り卵がポイント! 「中華風 エビチリおにぎり」──『美虎』五十嵐美幸', 'title': '放送開始36年!方正ココリコが選ぶ神回&地獄回!',
'upload_date': '20181213', 'cast': 'count:2',
'description': 'md5:1985b51a9abc285df0104d982a325f2a', 'description': 'md5:1e1db556224d627d4d2f74370c650927',
'uploader_id': '3855502814001', 'display_id': 'ref:gaki_20250525',
'timestamp': 1544669941, 'duration': 1450,
'episode': '放送開始36年!方正ココリコが選ぶ神回&地獄回!',
'episode_id': '000000010172808',
'episode_number': 255,
'genres': ['variety'],
'live_status': 'not_live',
'modified_date': '20250525',
'modified_timestamp': 1748145537,
'release_date': '20250525',
'release_timestamp': 1748145539,
'series': 'ダウンタウンのガキの使いやあらへんで!',
'series_id': 'gaki',
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1748145197,
'upload_date': '20250525',
'uploader': '日本テレビ放送網',
'uploader_id': '0x7FE2',
}, },
'params': { }]
# m3u8 download
'skip_download': True,
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
player_config = self._search_nuxt_data(webpage, display_id)
video_id = traverse_obj(player_config, ('movie', 'video_id')) info = self._search_json(
if not video_id: r'window\.app\s*=', webpage, 'video info',
raise ExtractorError('Failed to extract video ID for Brightcove') display_id)['falcorCache']['catalog']['episode'][display_id]['value']
account_id = traverse_obj(player_config, ('player', 'account')) or '3855502814001' media_id = traverse_obj(info, (
title = traverse_obj(player_config, ('movie', 'name')) 'streaks_data', 'mediaid', {str_or_none}, {require('Streaks media ID')}))
if not title: non_phonetic = (lambda _, v: v['is_phonetic'] is False, 'value', {str})
og_title = self._og_search_title(webpage, fatal=False) or traverse_obj(player_config, ('player', 'title'))
if og_title:
title = og_title.split('(', 1)[0].strip()
description = (traverse_obj(player_config, ('movie', 'description'))
or self._html_search_meta(['description', 'og:description'], webpage))
return { return {
'_type': 'url_transparent', **self._extract_from_streaks_api('ntv-tada', media_id, headers={
'id': video_id, 'X-Streaks-Api-Key': 'df497719056b44059a0483b8faad1f4a',
'display_id': display_id, }),
'title': title, **traverse_obj(info, {
'description': description, 'id': ('content_id', {str_or_none}),
'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % (account_id, video_id), {'geo_countries': ['JP']}), 'title': ('title', *non_phonetic, any),
'ie_key': 'BrightcoveNew', 'age_limit': ('is_adult_only_content', {lambda x: 18 if x else None}),
'cast': ('credit', ..., 'name', *non_phonetic),
'genres': ('genre', ..., {str}),
'release_timestamp': ('pub_date', {parse_iso8601}),
'tags': ('tags', ..., {str}),
'thumbnail': ('artwork', ..., 'url', any, {url_or_none}),
}),
**traverse_obj(info, ('tv_episode_info', {
'duration': ('duration', {int_or_none}),
'episode_number': ('episode_number', {int}),
'series': ('parent_show_title', *non_phonetic, any),
'series_id': ('show_content_id', {str}),
})),
**traverse_obj(info, ('custom_data', {
'description': ('program_detail', {str}),
'episode': ('episode_title', {str}),
'episode_id': ('episode_id', {str_or_none}),
'uploader': ('network_name', {str}),
'uploader_id': ('network_id', {str}),
})),
} }
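
The non_phonetic tuple above is a reusable traverse_obj path suffix: the metadata stores each display string as a list of phonetic/non-phonetic variants, and the lambda keeps only the latter. A sketch over a hypothetical payload shape:

from yt_dlp.utils import traverse_obj

non_phonetic = (lambda _, v: v['is_phonetic'] is False, 'value', {str})

info = {'title': [
    {'is_phonetic': True, 'value': 'ほうそうかいし36ねん!'},
    {'is_phonetic': False, 'value': '放送開始36年!方正ココリコが選ぶ神回&地獄回!'},
]}
# Branch over the variants, keep non-phonetic ones, take the first match
print(traverse_obj(info, ('title', *non_phonetic, any)))
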

yt_dlp/extractor/odnoklassniki.py

@ -273,6 +273,8 @@ def _extract_desktop(self, url):
return self._extract_desktop(smuggle_url(url, {'referrer': 'https://boosty.to'})) return self._extract_desktop(smuggle_url(url, {'referrer': 'https://boosty.to'}))
elif error: elif error:
raise ExtractorError(error, expected=True) raise ExtractorError(error, expected=True)
elif '>Access to this video is restricted</div>' in webpage:
self.raise_login_required()
player = self._parse_json( player = self._parse_json(
unescapeHTML(self._search_regex( unescapeHTML(self._search_regex(
@ -429,7 +431,7 @@ def _extract_mobile(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( webpage = self._download_webpage(
f'http://m.ok.ru/video/{video_id}', video_id, f'https://m.ok.ru/video/{video_id}', video_id,
note='Downloading mobile webpage') note='Downloading mobile webpage')
error = self._search_regex( error = self._search_regex(

yt_dlp/extractor/patreon.py

@ -340,8 +340,9 @@ def _real_extract(self, url):
'channel_follower_count': ('attributes', 'patron_count', {int_or_none}), 'channel_follower_count': ('attributes', 'patron_count', {int_or_none}),
})) }))
# all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, Vimeo # Must be all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, and Vimeo.
headers = {'referer': 'https://patreon.com/'} # patreon.com URLs redirect to www.patreon.com; this matters when requesting mux.com m3u8s
headers = {'referer': 'https://www.patreon.com/'}
# handle Vimeo embeds # handle Vimeo embeds
if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo': if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo':
@ -352,7 +353,7 @@ def _real_extract(self, url):
v_url, video_id, 'Checking Vimeo embed URL', headers=headers, v_url, video_id, 'Checking Vimeo embed URL', headers=headers,
fatal=False, errnote=False, expected_status=429): # 429 is TLS fingerprint rejection fatal=False, errnote=False, expected_status=429): # 429 is TLS fingerprint rejection
entries.append(self.url_result( entries.append(self.url_result(
VimeoIE._smuggle_referrer(v_url, 'https://patreon.com/'), VimeoIE._smuggle_referrer(v_url, headers['referer']),
VimeoIE, url_transparent=True)) VimeoIE, url_transparent=True))
embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none})) embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none}))
@ -379,11 +380,13 @@ def _real_extract(self, url):
'url': post_file['url'], 'url': post_file['url'],
}) })
elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8': elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8':
formats, subtitles = self._extract_m3u8_formats_and_subtitles(post_file['url'], video_id) formats, subtitles = self._extract_m3u8_formats_and_subtitles(
post_file['url'], video_id, headers=headers)
entries.append({ entries.append({
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'http_headers': headers,
}) })
can_view_post = traverse_obj(attributes, 'current_user_can_view') can_view_post = traverse_obj(attributes, 'current_user_can_view')

yt_dlp/extractor/playsuisse.py

@ -9,11 +9,10 @@
int_or_none, int_or_none,
join_nonempty, join_nonempty,
parse_qs, parse_qs,
traverse_obj,
update_url_query, update_url_query,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import unpack from ..utils.traversal import traverse_obj, unpack
class PlaySuisseIE(InfoExtractor): class PlaySuisseIE(InfoExtractor):

yt_dlp/extractor/podchaser.py

@ -5,11 +5,13 @@
from ..utils import ( from ..utils import (
OnDemandPagedList, OnDemandPagedList,
float_or_none, float_or_none,
int_or_none,
orderedSet,
str_or_none, str_or_none,
str_to_int,
traverse_obj,
unified_timestamp, unified_timestamp,
url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class PodchaserIE(InfoExtractor): class PodchaserIE(InfoExtractor):
@ -21,24 +23,25 @@ class PodchaserIE(InfoExtractor):
'id': '104365585', 'id': '104365585',
'title': 'Ep. 285 freeze me off', 'title': 'Ep. 285 freeze me off',
'description': 'cam ahn', 'description': 'cam ahn',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'ext': 'mp3', 'ext': 'mp3',
'categories': ['Comedy'], 'categories': ['Comedy', 'News', 'Politics', 'Arts'],
'tags': ['comedy', 'dark humor'], 'tags': ['comedy', 'dark humor'],
'series': 'Cum Town', 'series': 'The Adam Friedland Show Podcast',
'duration': 3708, 'duration': 3708,
'timestamp': 1636531259, 'timestamp': 1636531259,
'upload_date': '20211110', 'upload_date': '20211110',
'average_rating': 4.0, 'average_rating': 4.0,
'series_id': '36924',
}, },
}, { }, {
'url': 'https://www.podchaser.com/podcasts/the-bone-zone-28853', 'url': 'https://www.podchaser.com/podcasts/the-bone-zone-28853',
'info_dict': { 'info_dict': {
'id': '28853', 'id': '28853',
'title': 'The Bone Zone', 'title': 'The Bone Zone',
'description': 'Podcast by The Bone Zone', 'description': r're:The official home of the Bone Zone podcast.+',
}, },
'playlist_count': 275, 'playlist_mincount': 275,
}, { }, {
'url': 'https://www.podchaser.com/podcasts/sean-carrolls-mindscape-scienc-699349/episodes', 'url': 'https://www.podchaser.com/podcasts/sean-carrolls-mindscape-scienc-699349/episodes',
'info_dict': { 'info_dict': {
@ -51,19 +54,33 @@ class PodchaserIE(InfoExtractor):
@staticmethod @staticmethod
def _parse_episode(episode, podcast): def _parse_episode(episode, podcast):
return { info = traverse_obj(episode, {
'id': str(episode.get('id')), 'id': ('id', {int}, {str_or_none}, {require('episode ID')}),
'title': episode.get('title'), 'title': ('title', {str}),
'description': episode.get('description'), 'description': ('description', {str}),
'url': episode.get('audio_url'), 'url': ('audio_url', {url_or_none}),
'thumbnail': episode.get('image_url'), 'thumbnail': ('image_url', {url_or_none}),
'duration': str_to_int(episode.get('length')), 'duration': ('length', {int_or_none}),
'timestamp': unified_timestamp(episode.get('air_date')), 'timestamp': ('air_date', {unified_timestamp}),
'average_rating': float_or_none(episode.get('rating')), 'average_rating': ('rating', {float_or_none}),
'categories': list(set(traverse_obj(podcast, (('summary', None), 'categories', ..., 'text')))), })
'tags': traverse_obj(podcast, ('tags', ..., 'text')), info.update(traverse_obj(podcast, {
'series': podcast.get('title'), 'series': ('title', {str}),
} 'series_id': ('id', {int}, {str_or_none}),
'categories': (('summary', None), 'categories', ..., 'text', {str}, filter, all, {orderedSet}),
'tags': ('tags', ..., 'text', {str}),
}))
info['vcodec'] = 'none'
if info.get('series_id'):
podcast_slug = traverse_obj(podcast, ('slug', {str})) or 'podcast'
episode_slug = traverse_obj(episode, ('slug', {str})) or 'episode'
info['webpage_url'] = '/'.join((
'https://www.podchaser.com/podcasts',
'-'.join((podcast_slug[:30].rstrip('-'), info['series_id'])),
'-'.join((episode_slug[:30].rstrip('-'), info['id']))))
return info
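
The webpage_url reconstruction above mirrors Podchaser's canonical URL scheme: each slug is truncated to 30 characters (trailing dashes stripped) and re-joined with the numeric ID. With hypothetical slugs:

podcast_slug, series_id = 'the-adam-friedland-show-podcast-extra', '36924'
episode_slug, episode_id = 'ep-285-freeze-me-off', '104365585'

webpage_url = '/'.join((
    'https://www.podchaser.com/podcasts',
    '-'.join((podcast_slug[:30].rstrip('-'), series_id)),
    '-'.join((episode_slug[:30].rstrip('-'), episode_id))))
print(webpage_url)
# -> https://www.podchaser.com/podcasts/the-adam-friedland-show-podcas-36924/ep-285-freeze-me-off-104365585
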
def _call_api(self, path, *args, **kwargs): def _call_api(self, path, *args, **kwargs):
return self._download_json(f'https://api.podchaser.com/{path}', *args, **kwargs) return self._download_json(f'https://api.podchaser.com/{path}', *args, **kwargs)
@ -93,5 +110,5 @@ def _real_extract(self, url):
OnDemandPagedList(functools.partial(self._fetch_page, podcast_id, podcast), self._PAGE_SIZE), OnDemandPagedList(functools.partial(self._fetch_page, podcast_id, podcast), self._PAGE_SIZE),
str_or_none(podcast.get('id')), podcast.get('title'), podcast.get('description')) str_or_none(podcast.get('id')), podcast.get('title'), podcast.get('description'))
episode = self._call_api(f'episodes/{episode_id}', episode_id) episode = self._call_api(f'podcasts/{podcast_id}/episodes/{episode_id}/player_ids', episode_id)
return self._parse_episode(episode, podcast) return self._parse_episode(episode, podcast)

yt_dlp/extractor/qqmusic.py

@ -15,7 +15,6 @@
str_or_none, str_or_none,
strip_jsonp, strip_jsonp,
traverse_obj, traverse_obj,
unescapeHTML,
url_or_none, url_or_none,
urljoin, urljoin,
) )
@ -425,7 +424,7 @@ def _real_extract(self, url):
return self.playlist_result(entries, list_id, **traverse_obj(list_json, ('cdlist', 0, { return self.playlist_result(entries, list_id, **traverse_obj(list_json, ('cdlist', 0, {
'title': ('dissname', {str}), 'title': ('dissname', {str}),
'description': ('desc', {unescapeHTML}, {clean_html}), 'description': ('desc', {clean_html}),
}))) })))

yt_dlp/extractor/skyit.py

@ -213,7 +213,7 @@ class CieloTVItIE(SkyItIE): # XXX: Do not subclass from concrete IE
class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
IE_NAME = 'tv8.it' IE_NAME = 'tv8.it'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/[0-9a-z-]+-(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/(?:[0-9a-z-]+-)?(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529', 'url': 'https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529',
'md5': '9ab906a3f75ea342ed928442f9dabd21', 'md5': '9ab906a3f75ea342ed928442f9dabd21',
@ -227,6 +227,19 @@ class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
'thumbnail': 'https://videoplatform.sky.it/still/2020/11/18/1605717753954_ogni-mattina-ucciso-asino-di-andrea-lo-cicero_videostill_1.jpg', 'thumbnail': 'https://videoplatform.sky.it/still/2020/11/18/1605717753954_ogni-mattina-ucciso-asino-di-andrea-lo-cicero_videostill_1.jpg',
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.tv8.it/video/964361',
'md5': '1e58e807154658a16edc29e45be38107',
'info_dict': {
'id': '964361',
'ext': 'mp4',
'title': 'GialappaShow - S.4 Ep.2',
'description': 'md5:60bb4ff5af18bbeeaedabc1de5f9e1e2',
'duration': 8030,
'thumbnail': 'https://videoplatform.sky.it/captures/494/2024/11/06/964361/964361_1730888412914_thumb_494.jpg',
'timestamp': 1730821499,
'upload_date': '20241105',
},
}] }]
_DOMAIN = 'mtv8' _DOMAIN = 'mtv8'
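
The relaxed _VALID_URL now also matches slug-less URLs like the new test's /video/964361. A quick check of both shapes:

import re

_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/(?:[0-9a-z-]+-)?(?P<id>\d+)'
for url in ('https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529',
            'https://www.tv8.it/video/964361'):
    print(re.match(_VALID_URL, url).group('id'))
# -> 630529, then 964361
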

yt_dlp/extractor/soundcloud.py

@ -697,7 +697,7 @@ def _real_extract(self, url):
try: try:
return self._extract_info_dict(info, full_title, token) return self._extract_info_dict(info, full_title, token)
except ExtractorError as e: except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or not e.cause.status == 429: if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
raise raise
self.report_warning( self.report_warning(
'You have reached the API rate limit, which is ~600 requests per ' 'You have reached the API rate limit, which is ~600 requests per '

yt_dlp/extractor/sportdeutschland.py

@ -25,6 +25,7 @@ class SportDeutschlandIE(InfoExtractor):
'upload_date': '20230114', 'upload_date': '20230114',
'timestamp': 1673733618, 'timestamp': 1673733618,
}, },
'skip': 'not found',
}, { }, {
'url': 'https://sportdeutschland.tv/deutscherbadmintonverband/bwf-tour-1-runde-feld-1-yonex-gainward-german-open-2022-0', 'url': 'https://sportdeutschland.tv/deutscherbadmintonverband/bwf-tour-1-runde-feld-1-yonex-gainward-german-open-2022-0',
'info_dict': { 'info_dict': {
@ -41,6 +42,7 @@ class SportDeutschlandIE(InfoExtractor):
'upload_date': '20220309', 'upload_date': '20220309',
'timestamp': 1646860727.0, 'timestamp': 1646860727.0,
}, },
'skip': 'not found',
}, { }, {
'url': 'https://sportdeutschland.tv/ggcbremen/formationswochenende-latein-2023', 'url': 'https://sportdeutschland.tv/ggcbremen/formationswochenende-latein-2023',
'info_dict': { 'info_dict': {
@ -68,6 +70,7 @@ class SportDeutschlandIE(InfoExtractor):
'live_status': 'was_live', 'live_status': 'was_live',
}, },
}], }],
'skip': 'not found',
}, { }, {
'url': 'https://sportdeutschland.tv/dtb/gymnastik-international-tag-1', 'url': 'https://sportdeutschland.tv/dtb/gymnastik-international-tag-1',
'info_dict': { 'info_dict': {
@ -82,13 +85,30 @@ class SportDeutschlandIE(InfoExtractor):
'live_status': 'is_live', 'live_status': 'is_live',
}, },
'skip': 'live', 'skip': 'live',
}, {
'url': 'https://sportdeutschland.tv/rostock-griffins/gfl2-rostock-griffins-vs-elmshorn-fighting-pirates',
'md5': '35c11a19395c938cdd076b93bda54cde',
'info_dict': {
'id': '9f27a97d-1544-4d0b-aa03-48d92d17a03a',
'ext': 'mp4',
'title': 'GFL2: Rostock Griffins vs. Elmshorn Fighting Pirates',
'display_id': 'rostock-griffins/gfl2-rostock-griffins-vs-elmshorn-fighting-pirates',
'channel': 'Rostock Griffins',
'channel_url': 'https://sportdeutschland.tv/rostock-griffins',
'live_status': 'was_live',
'description': 'md5:60cb00067e55dafa27b0933a43d72862',
'channel_id': '9635f21c-3f67-4584-9ce4-796e9a47276b',
'timestamp': 1749913117,
'upload_date': '20250614',
},
}] }]
def _process_video(self, asset_id, video): def _process_video(self, asset_id, video):
is_live = video['type'] == 'mux_live' is_live = video['type'] == 'mux_live'
token = self._download_json( token = self._download_json(
f'https://api.sportdeutschland.tv/api/frontend/asset-token/{asset_id}', f'https://api.sportdeutschland.tv/api/web/personal/asset-token/{asset_id}',
video['id'], query={'type': video['type'], 'playback_id': video['src']})['token'] video['id'], query={'type': video['type'], 'playback_id': video['src']},
headers={'Referer': 'https://sportdeutschland.tv/'})['token']
formats, subtitles = self._extract_m3u8_formats_and_subtitles( formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://stream.mux.com/{video["src"]}.m3u8?token={token}', video['id'], live=is_live) f'https://stream.mux.com/{video["src"]}.m3u8?token={token}', video['id'], live=is_live)
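
For context, the token minted by the new asset-token endpoint is simply appended to the mux.com playback URL; sketched with placeholder values:

playback_id = 'aBc123XyZ'  # video['src'] above (placeholder)
token = 'TOKEN'            # from /api/web/personal/asset-token/<asset_id> (placeholder)
print(f'https://stream.mux.com/{playback_id}.m3u8?token={token}')
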

yt_dlp/extractor/sproutvideo.py

@ -41,6 +41,7 @@ class SproutVideoIE(InfoExtractor):
'duration': 703, 'duration': 703,
'thumbnail': r're:https?://images\.sproutvideo\.com/.+\.jpg', 'thumbnail': r're:https?://images\.sproutvideo\.com/.+\.jpg',
}, },
'skip': 'Account Disabled',
}, { }, {
# http formats 'sd' and 'hd' are available # http formats 'sd' and 'hd' are available
'url': 'https://videos.sproutvideo.com/embed/119cd6bc1a18e6cd98/30751a1761ae5b90', 'url': 'https://videos.sproutvideo.com/embed/119cd6bc1a18e6cd98/30751a1761ae5b90',
@ -97,11 +98,21 @@ def _extract_embed_urls(cls, url, webpage):
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( webpage = self._download_webpage(url, video_id, headers={
url, video_id, headers=traverse_obj(smuggled_data, {'Referer': 'referer'})) **traverse_obj(smuggled_data, {'Referer': 'referer'}),
# yt-dlp's default Chrome user-agents are too old
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:140.0) Gecko/20100101 Firefox/140.0',
})
data = self._search_json( data = self._search_json(
r'var\s+dat\s*=\s*["\']', webpage, 'data', video_id, contains_pattern=r'[A-Za-z0-9+/=]+', r'var\s+(?:dat|playerInfo)\s*=\s*["\']', webpage, 'player info', video_id,
end_pattern=r'["\'];', transform_source=lambda x: base64.b64decode(x).decode()) contains_pattern=r'[A-Za-z0-9+/=]+', end_pattern=r'["\'];',
transform_source=lambda x: base64.b64decode(x).decode())
# SproutVideo may send player info for 'SMPTE Color Monitor Test' [a791d7b71b12ecc52e]
# e.g. if the user-agent we used with the webpage request is too old
video_uid = data['videoUid']
if video_id != video_uid:
raise ExtractorError(f'{self.IE_NAME} sent the wrong video data ({video_uid})')
formats, subtitles = [], {} formats, subtitles = [], {}
headers = { headers = {

yt_dlp/extractor/srmediathek.py

@ -1,57 +1,102 @@
from .ard import ARDMediathekBaseIE from .ard import ARDMediathekBaseIE
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
get_element_by_attribute, clean_html,
extract_attributes,
parse_duration,
parse_qs,
unified_strdate,
)
from ..utils.traversal import (
find_element,
require,
traverse_obj,
) )
class SRMediathekIE(ARDMediathekBaseIE): class SRMediathekIE(ARDMediathekBaseIE):
_WORKING = False
IE_NAME = 'sr:mediathek' IE_NAME = 'sr:mediathek'
IE_DESC = 'Saarländischer Rundfunk' IE_DESC = 'Saarländischer Rundfunk'
_VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
_CLS_COMMON = 'teaser__image__caption__text teaser__image__caption__text--'
_VALID_URL = r'https?://(?:www\.)?sr-mediathek\.de/index\.php\?.*?&id=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455', 'url': 'https://www.sr-mediathek.de/index.php?seite=7&id=141317',
'info_dict': { 'info_dict': {
'id': '28455', 'id': '141317',
'ext': 'mp4', 'ext': 'mp4',
'title': 'sportarena (26.10.2014)', 'title': 'Kärnten, da will ich hin!',
'description': 'Ringen: KSV Köllerbach gegen Aachen-Walheim; Frauen-Fußball: 1. FC Saarbrücken gegen Sindelfingen; Motorsport: Rallye in Losheim; dazu: Interview mit Timo Bernhard; Turnen: TG Saar; Reitsport: Deutscher Voltigier-Pokal; Badminton: Interview mit Michael Fuchs ', 'channel': 'SR Fernsehen',
'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:7732e71e803379a499732864a572a456',
}, 'duration': 1788.0,
'skip': 'no longer available', 'release_date': '20250525',
}, { 'series': 'da will ich hin!',
'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=37682', 'series_id': 'DWIH',
'info_dict': { 'thumbnail': r're:https?://.+\.jpg',
'id': '37682',
'ext': 'mp4',
'title': 'Love, Cakes and Rock\'n\'Roll',
'description': 'md5:18bf9763631c7d326c22603681e1123d',
},
'params': {
# m3u8 download
'skip_download': True,
}, },
}, { }, {
'url': 'http://sr-mediathek.de/index.php?seite=7&id=7480', 'url': 'https://www.sr-mediathek.de/index.php?seite=7&id=153853',
'only_matching': True, 'info_dict': {
'id': '153853',
'ext': 'mp3',
'title': 'Kappes, Klöße, Kokosmilch: Bruschetta mit Nduja',
'channel': 'SR 3',
'description': 'md5:3935798de3562b10c4070b408a15e225',
'duration': 139.0,
'release_date': '20250523',
'series': 'Kappes, Klöße, Kokosmilch',
'series_id': 'SR3_KKK_A',
'thumbnail': r're:https?://.+\.jpg',
},
}, {
'url': 'https://www.sr-mediathek.de/index.php?seite=7&id=31406&pnr=&tbl=pf',
'info_dict': {
'id': '31406',
'ext': 'mp3',
'title': 'Das Leben schwer nehmen, ist einfach zu anstrengend',
'channel': 'SR 1',
'description': 'md5:3e03fd556af831ad984d0add7175fb0c',
'duration': 1769.0,
'release_date': '20230717',
'series': 'Abendrot',
'series_id': 'SR1_AB_P',
'thumbnail': r're:https?://.+\.jpg',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
description = self._og_search_description(webpage)
if '>Der gew&uuml;nschte Beitrag ist leider nicht mehr verf&uuml;gbar.<' in webpage: if description == 'Der gewünschte Beitrag ist leider nicht mehr vorhanden.':
raise ExtractorError(f'Video {video_id} is no longer available', expected=True) raise ExtractorError(f'Video {video_id} is no longer available', expected=True)
media_collection_url = self._search_regex( player_url = traverse_obj(webpage, (
r'data-mediacollection-ardplayer="([^"]+)"', webpage, 'media collection url') {find_element(tag='div', id=f'player{video_id}', html=True)},
info = self._extract_media_info(media_collection_url, webpage, video_id) {extract_attributes}, 'data-mediacollection-ardplayer',
info.update({ {self._proto_relative_url}, {require('player URL')}))
article = traverse_obj(webpage, (
{find_element(cls='article__content')},
{find_element(tag='p')}, {clean_html}))
return {
**self._extract_media_info(player_url, webpage, video_id),
'id': video_id, 'id': video_id,
'title': get_element_by_attribute('class', 'ardplayer-title', webpage), 'title': traverse_obj(webpage, (
'description': self._og_search_description(webpage), {find_element(cls='ardplayer-title')}, {clean_html})),
'channel': traverse_obj(webpage, (
{find_element(cls=f'{self._CLS_COMMON}subheadline')},
{lambda x: x.split('|')[0]}, {clean_html})),
'description': description,
'duration': parse_duration(self._search_regex(
r'(\d{2}:\d{2}:\d{2})', article, 'duration')),
'release_date': unified_strdate(self._search_regex(
r'(\d{2}\.\d{2}\.\d{4})', article, 'release_date')),
'series': traverse_obj(webpage, (
{find_element(cls=f'{self._CLS_COMMON}headline')}, {clean_html})),
'series_id': traverse_obj(webpage, (
{find_element(cls='teaser__link', html=True)},
{extract_attributes}, 'href', {parse_qs}, 'sen', ..., {str}, any)),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
}) }
return info
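A rough stdlib-only equivalent of the find_element/extract_attributes chain used above, shown only to illustrate the lookup; real pages may order attributes differently, so this regex sketch is weaker than yt-dlp's proper attribute parsing.

import re

def player_url_from_page(webpage, video_id):
    div = re.search(
        rf'<div[^>]+\bid=["\']player{video_id}["\'][^>]*>', webpage).group(0)
    url = re.search(r'data-mediacollection-ardplayer=["\']([^"\']+)', div).group(1)
    return 'https:' + url if url.startswith('//') else url  # _proto_relative_url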


@ -4,6 +4,7 @@
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
traverse_obj, traverse_obj,
url_basename,
url_or_none, url_or_none,
) )
@ -65,9 +66,19 @@ def _extract_ppv(self, url):
hls_info, decrypt = self._call_encrypted_api( hls_info, decrypt = self._call_encrypted_api(
video_id, ':watchArchive', 'stream information', data={'method': 1}) video_id, ':watchArchive', 'stream information', data={'method': 1})
formats = self._get_formats(hls_info, ('hls', 'urls', ..., {url_or_none}), video_id)
for f in formats:
# bitrates are exaggerated in PPV playlists, so avoid wrong/huge filesize_approx values
if f.get('tbr'):
f['tbr'] = int(f['tbr'] / 2.5)
# prefer variants with the same basename as the master playlist to avoid partial streams
f['format_id'] = url_basename(f['url']).partition('.')[0]
if not f['format_id'].startswith(url_basename(f['manifest_url']).partition('.')[0]):
f['preference'] = -10
return { return {
'id': video_id, 'id': video_id,
'formats': self._get_formats(hls_info, ('hls', 'urls', ..., {url_or_none}), video_id), 'formats': formats,
'hls_aes': self._extract_hls_key(hls_info, 'hls', decrypt), 'hls_aes': self._extract_hls_key(hls_info, 'hls', decrypt),
**traverse_obj(video_info, { **traverse_obj(video_info, {
'title': ('displayName', {str}), 'title': ('displayName', {str}),
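The PPV format post-processing above, as a hedged standalone sketch under the same assumptions: playlist bitrates are roughly 2.5x inflated, and variants whose basename does not share the master playlist's stem may be partial streams. The dicts mirror yt-dlp format dicts with 'url', 'manifest_url' and 'tbr'.

from urllib.parse import urlparse

def stem(url):
    # basename without extension, e.g. '.../abc_1080p.m3u8' -> 'abc_1080p'
    return urlparse(url).path.rsplit('/', 1)[-1].partition('.')[0]

def fix_ppv_formats(formats):
    for f in formats:
        if f.get('tbr'):
            f['tbr'] = int(f['tbr'] / 2.5)  # deflate the exaggerated bitrate
        f['format_id'] = stem(f['url'])
        if not f['format_id'].startswith(stem(f['manifest_url'])):
            f['preference'] = -10  # likely a partial stream
    return formats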


@ -1,76 +1,76 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none, urljoin from .youtube import YoutubeIE
from ..utils import (
clean_html,
parse_iso8601,
update_url,
url_or_none,
)
from ..utils.traversal import subs_list_to_dict, traverse_obj
class StarTrekIE(InfoExtractor): class StarTrekIE(InfoExtractor):
_WORKING = False IE_NAME = 'startrek'
_VALID_URL = r'(?P<base>https?://(?:intl|www)\.startrek\.com)/videos/(?P<id>[^/]+)' IE_DESC = 'STAR TREK'
_VALID_URL = r'https?://(?:www\.)?startrek\.com(?:/en-(?:ca|un))?/videos/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://intl.startrek.com/videos/watch-welcoming-jess-bush-to-the-ready-room', 'url': 'https://www.startrek.com/en-un/videos/official-trailer-star-trek-lower-decks-season-4',
'md5': '491df5035c9d4dc7f63c79caaf9c839e',
'info_dict': { 'info_dict': {
'id': 'watch-welcoming-jess-bush-to-the-ready-room', 'id': 'official-trailer-star-trek-lower-decks-season-4',
'ext': 'mp4', 'ext': 'mp4',
'title': 'WATCH: Welcoming Jess Bush to The Ready Room', 'title': 'Official Trailer | Star Trek: Lower Decks - Season 4',
'duration': 1888, 'alt_title': 'md5:dd7e3191aaaf9e95db16fc3abd5ef68b',
'timestamp': 1655388000, 'categories': ['TRAILERS'],
'upload_date': '20220616', 'description': 'md5:563d7856ddab99bee7a5e50f45531757',
'description': 'md5:1ffee884e3920afbdd6dd04e926a1221', 'release_date': '20230722',
'thumbnail': r're:https://(?:intl|www)\.startrek\.com/sites/default/files/styles/video_1920x1080/public/images/2022-06/pp_14794_rr_thumb_107_yt_16x9\.jpg(?:\?.+)?', 'release_timestamp': 1690033200,
'subtitles': {'en-US': [{ 'series': 'Star Trek: Lower Decks',
'url': r're:https://(?:intl|www)\.startrek\.com/sites/default/files/video/captions/2022-06/TRR_SNW_107_v4\.vtt', 'series_id': 'star-trek-lower-decks',
}, { 'thumbnail': r're:https?://.+\.(?:jpg|png)',
'url': 'https://media.startrek.com/2022/06/16/2043801155561/1069981_hls/trr_snw_107_v4-c4bfc25d/stream_vtt.m3u8',
}]},
}, },
}, { }, {
'url': 'https://www.startrek.com/videos/watch-ethan-peck-and-gia-sandhu-beam-down-to-the-ready-room', 'url': 'https://www.startrek.com/en-ca/videos/my-first-contact-senator-cory-booker',
'md5': 'f5ad74fbb86e91e0882fc0a333178d1d',
'info_dict': { 'info_dict': {
'id': 'watch-ethan-peck-and-gia-sandhu-beam-down-to-the-ready-room', 'id': 'my-first-contact-senator-cory-booker',
'ext': 'mp4', 'ext': 'mp4',
'title': 'WATCH: Ethan Peck and Gia Sandhu Beam Down to The Ready Room', 'title': 'My First Contact: Senator Cory Booker',
'duration': 1986, 'alt_title': 'md5:fe74a8bdb0afab421c6e159a7680db4d',
'timestamp': 1654221600, 'categories': ['MY FIRST CONTACT'],
'upload_date': '20220603', 'description': 'md5:a3992ab3b3e0395925d71156bbc018ce',
'description': 'md5:b3aa0edacfe119386567362dec8ed51b', 'release_date': '20250401',
'thumbnail': r're:https://www\.startrek\.com/sites/default/files/styles/video_1920x1080/public/images/2022-06/pp_14792_rr_thumb_105_yt_16x9_1.jpg(?:\?.+)?', 'release_timestamp': 1743512400,
'subtitles': {'en-US': [{ 'series': 'Star Trek: The Original Series',
'url': r're:https://(?:intl|www)\.startrek\.com/sites/default/files/video/captions/2022-06/TRR_SNW_105_v5\.vtt', 'series_id': 'star-trek-the-original-series',
}]}, 'thumbnail': r're:https?://.+\.(?:jpg|png)',
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
urlbase, video_id = self._match_valid_url(url).group('base', 'id') video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
player = self._search_regex( page_props = self._search_nextjs_data(webpage, video_id)['props']['pageProps']
r'(<\s*div\s+id\s*=\s*"cvp-player-[^<]+<\s*/div\s*>)', webpage, 'player') video_data = page_props['video']['data']
if youtube_id := video_data.get('youtube_video_id'):
return self.url_result(youtube_id, YoutubeIE)
hls = self._html_search_regex(r'\bdata-hls\s*=\s*"([^"]+)"', player, 'HLS URL') series_id = traverse_obj(video_data, (
formats, subtitles = self._extract_m3u8_formats_and_subtitles(hls, video_id, 'mp4') 'series_and_movies', ..., 'series_or_movie', 'slug', {str}, any))
captions = self._html_search_regex(
r'\bdata-captions-url\s*=\s*"([^"]+)"', player, 'captions URL', fatal=False)
if captions:
subtitles.setdefault('en-US', [])[:0] = [{'url': urljoin(urlbase, captions)}]
# NB: Most of the data in the json_ld is undesirable
json_ld = self._search_json_ld(webpage, video_id, fatal=False)
return { return {
'id': video_id, 'id': video_id,
'title': self._html_search_regex( 'series': traverse_obj(page_props, (
r'\bdata-title\s*=\s*"([^"]+)"', player, 'title', json_ld.get('title')), 'queried', 'header', 'tab3', 'slices', ..., 'items',
'description': self._html_search_regex( lambda _, v: v['link']['slug'] == series_id, 'link_copy', {str}, any)),
r'(?s)<\s*div\s+class\s*=\s*"header-body"\s*>(.+?)<\s*/div\s*>', 'series_id': series_id,
webpage, 'description', fatal=False), **traverse_obj(video_data, {
'duration': int_or_none(self._html_search_regex( 'title': ('title', ..., 'text', {clean_html}, any),
r'\bdata-duration\s*=\s*"(\d+)"', player, 'duration', fatal=False)), 'alt_title': ('subhead', ..., 'text', {clean_html}, any),
'formats': formats, 'categories': ('category', 'data', 'category_name', {str.upper}, filter, all),
'subtitles': subtitles, 'description': ('slices', ..., 'primary', 'content', ..., 'text', {clean_html}, any),
'thumbnail': urljoin(urlbase, self._html_search_regex( 'release_timestamp': ('published', {parse_iso8601}),
r'\bdata-poster-url\s*=\s*"([^"]+)"', player, 'thumbnail', fatal=False)), 'subtitles': ({'url': 'legacy_subtitle_file'}, all, {subs_list_to_dict(lang='en')}),
'timestamp': json_ld.get('timestamp'), 'thumbnail': ('poster_frame', 'url', {url_or_none}, {update_url(query=None)}),
'url': ('legacy_video_url', {url_or_none}),
}),
} }
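A minimal stand-in for the _search_nextjs_data call above: Next.js pages embed their props as JSON in a script tag with id __NEXT_DATA__. The regex is a simplification of yt-dlp's more robust search, and the key path follows this diff.

import json
import re

def nextjs_page_props(webpage):
    raw = re.search(
        r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(.+?)</script>',
        webpage, re.DOTALL).group(1)
    return json.loads(raw)['props']['pageProps']

# e.g. video_data = nextjs_page_props(webpage)['video']['data']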


@ -6,10 +6,13 @@
determine_ext, determine_ext,
dict_get, dict_get,
int_or_none, int_or_none,
traverse_obj,
try_get, try_get,
unified_timestamp, unified_timestamp,
) )
from ..utils.traversal import (
require,
traverse_obj,
)
class SVTBaseIE(InfoExtractor): class SVTBaseIE(InfoExtractor):
@ -97,40 +100,8 @@ def _extract_video(self, video_info, video_id):
} }
class SVTIE(SVTBaseIE): class SVTPlayIE(SVTBaseIE):
_VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)' IE_NAME = 'svt:play'
_EMBED_REGEX = [rf'(?:<iframe src|href)="(?P<url>{_VALID_URL}[^"]*)"']
_TEST = {
'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
'md5': '33e9a5d8f646523ce0868ecfb0eed77d',
'info_dict': {
'id': '2900353',
'ext': 'mp4',
'title': 'Stjärnorna skojar till det - under SVT-intervjun',
'duration': 27,
'age_limit': 0,
},
}
def _real_extract(self, url):
mobj = self._match_valid_url(url)
widget_id = mobj.group('widget_id')
article_id = mobj.group('id')
info = self._download_json(
f'http://www.svt.se/wd?widgetId={widget_id}&articleId={article_id}&format=json&type=embed&output=json',
article_id)
info_dict = self._extract_video(info['video'], article_id)
info_dict['title'] = info['context']['title']
return info_dict
class SVTPlayBaseIE(SVTBaseIE):
_SVTPLAY_RE = r'root\s*\[\s*(["\'])_*svtplay\1\s*\]\s*=\s*(?P<json>{.+?})\s*;\s*\n'
class SVTPlayIE(SVTPlayBaseIE):
IE_DESC = 'SVT Play and Öppet arkiv' IE_DESC = 'SVT Play and Öppet arkiv'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?: (?:
@ -173,6 +144,7 @@ class SVTPlayIE(SVTPlayBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': '1. Farlig kryssning', 'title': '1. Farlig kryssning',
'timestamp': 1491019200, 'timestamp': 1491019200,
'description': 'md5:8f350bc605677a5ead36a19a62fd9a34',
'upload_date': '20170401', 'upload_date': '20170401',
'duration': 2566, 'duration': 2566,
'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$', 'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
@ -186,19 +158,21 @@ class SVTPlayIE(SVTPlayBaseIE):
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
'expected_warnings': [r'Failed to download (?:MPD|m3u8)'],
}, { }, {
'url': 'https://www.svtplay.se/video/jz2rYz7/anders-hansen-moter/james-fallon?info=visa', 'url': 'https://www.svtplay.se/video/jz2rYz7/anders-hansen-moter/james-fallon?info=visa',
'info_dict': { 'info_dict': {
'id': 'jvXAGVb', 'id': 'jvXAGVb',
'ext': 'mp4', 'ext': 'mp4',
'title': 'James Fallon', 'title': 'James Fallon',
'timestamp': 1673917200, 'description': r're:James Fallon är hjärnforskaren .{532} att upptäcka psykopati tidigt\?$',
'upload_date': '20230117', 'timestamp': 1743379200,
'upload_date': '20250331',
'duration': 1081, 'duration': 1081,
'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$', 'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
'age_limit': 0, 'age_limit': 0,
'episode': 'James Fallon', 'episode': 'James Fallon',
'series': 'Anders Hansen möter...', 'series': 'Anders Hansen möter',
}, },
'params': { 'params': {
'skip_download': 'dash', 'skip_download': 'dash',
@ -233,96 +207,75 @@ class SVTPlayIE(SVTPlayBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
def _extract_by_video_id(self, video_id, webpage=None): def _extract_by_video_id(self, video_id):
data = self._download_json( data = self._download_json(
f'https://api.svt.se/videoplayer-api/video/{video_id}', f'https://api.svt.se/videoplayer-api/video/{video_id}',
video_id, headers=self.geo_verification_headers()) video_id, headers=self.geo_verification_headers())
info_dict = self._extract_video(data, video_id) info_dict = self._extract_video(data, video_id)
if not info_dict.get('title'): if not info_dict.get('title'):
title = dict_get(info_dict, ('episode', 'series')) info_dict['title'] = traverse_obj(info_dict, 'episode', 'series')
if not title and webpage:
title = re.sub(
r'\s*\|\s*.+?$', '', self._og_search_title(webpage))
if not title:
title = video_id
info_dict['title'] = title
return info_dict return info_dict
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) mobj = self._match_valid_url(url)
video_id = mobj.group('id') video_id = mobj.group('id')
svt_id = mobj.group('svt_id') or mobj.group('modal_id') svt_id = mobj.group('svt_id') or mobj.group('modal_id')
if svt_id: if svt_id:
return self._extract_by_video_id(svt_id) return self._extract_by_video_id(svt_id)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
data = self._parse_json( data = traverse_obj(self._search_nextjs_data(webpage, video_id), (
self._search_regex( 'props', 'urqlState', ..., 'data', {json.loads},
self._SVTPLAY_RE, webpage, 'embedded data', default='{}', 'detailsPageByPath', {dict}, any, {require('video data')}))
group='json'), details = traverse_obj(data, (
video_id, fatal=False) 'modules', lambda _, v: v['details']['smartStart']['item']['videos'], 'details', any))
svt_id = traverse_obj(details, (
thumbnail = self._og_search_thumbnail(webpage) 'smartStart', 'item', 'videos',
# There can be 'AudioDescribed' and 'SignInterpreted' variants; try 'Default' or else get first
if data: (lambda _, v: v['accessibility'] == 'Default', 0),
video_info = try_get( 'svtId', {str}, any))
data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
dict)
if video_info:
info_dict = self._extract_video(video_info, video_id)
info_dict.update({
'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
'thumbnail': thumbnail,
})
return info_dict
svt_id = try_get(
data, lambda x: x['statistics']['dataLake']['content']['id'],
str)
if not svt_id: if not svt_id:
nextjs_data = self._search_nextjs_data(webpage, video_id, fatal=False) svt_id = traverse_obj(data, ('video', 'svtId', {str}, {require('SVT ID')}))
svt_id = traverse_obj(nextjs_data, (
'props', 'urqlState', ..., 'data', {json.loads}, 'detailsPageByPath',
'video', 'svtId', {str}), get_all=False)
if not svt_id: info_dict = self._extract_by_video_id(svt_id)
svt_id = self._search_regex(
(r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/[\w-]+/[^"\']*\b(?:modalId|id)=([\w-]+)'),
webpage, 'video id')
info_dict = self._extract_by_video_id(svt_id, webpage) if not info_dict.get('title'):
info_dict['thumbnail'] = thumbnail info_dict['title'] = re.sub(r'\s*\|\s*.+?$', '', self._og_search_title(webpage))
if not info_dict.get('thumbnail'):
info_dict['thumbnail'] = self._og_search_thumbnail(webpage)
if not info_dict.get('description'):
info_dict['description'] = traverse_obj(details, ('description', {str}))
return info_dict return info_dict
class SVTSeriesIE(SVTPlayBaseIE): class SVTSeriesIE(SVTBaseIE):
IE_NAME = 'svt:play:series'
_VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?' _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://www.svtplay.se/rederiet', 'url': 'https://www.svtplay.se/rederiet',
'info_dict': { 'info_dict': {
'id': '14445680', 'id': 'jpmQYgn',
'title': 'Rederiet', 'title': 'Rederiet',
'description': 'md5:d9fdfff17f5d8f73468176ecd2836039', 'description': 'md5:f71122f7cf2e52b643e75915e04cb83d',
}, },
'playlist_mincount': 318, 'playlist_mincount': 318,
}, { }, {
'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680', 'url': 'https://www.svtplay.se/rederiet?tab=season-2-jpmQYgn',
'info_dict': { 'info_dict': {
'id': 'season-2-14445680', 'id': 'season-2-jpmQYgn',
'title': 'Rederiet - Säsong 2', 'title': 'Rederiet - Säsong 2',
'description': 'md5:d9fdfff17f5d8f73468176ecd2836039', 'description': 'md5:f71122f7cf2e52b643e75915e04cb83d',
}, },
'playlist_mincount': 12, 'playlist_mincount': 12,
}] }]
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super().suitable(url) return False if SVTPlayIE.suitable(url) else super().suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
series_slug, season_id = self._match_valid_url(url).groups() series_slug, season_id = self._match_valid_url(url).groups()
@ -386,6 +339,7 @@ def _real_extract(self, url):
class SVTPageIE(SVTBaseIE): class SVTPageIE(SVTBaseIE):
IE_NAME = 'svt:page'
_VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/?#]+/)*(?P<id>[^/?&#]+)' _VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/?#]+/)*(?P<id>[^/?&#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.svt.se/nyheter/lokalt/skane/viktor-18-forlorade-armar-och-ben-i-sepsis-vill-ateruppta-karaten-och-bli-svetsare', 'url': 'https://www.svt.se/nyheter/lokalt/skane/viktor-18-forlorade-armar-och-ben-i-sepsis-vill-ateruppta-karaten-och-bli-svetsare',
@ -463,7 +417,7 @@ class SVTPageIE(SVTBaseIE):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super().suitable(url) return False if SVTPlayIE.suitable(url) else super().suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
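The accessibility-variant selection described in the comment above, in isolation; the dict shapes mirror the urqlState data the extractor traverses, and pick_svt_id is a hypothetical helper.

def pick_svt_id(videos):
    # videos: the 'videos' list from smartStart/item in the urqlState data
    chosen = next((v for v in videos if v.get('accessibility') == 'Default'),
                  videos[0] if videos else None)
    return chosen.get('svtId') if chosen else None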


@ -1,58 +0,0 @@
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class SyfyIE(AdobePassIE):
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
'info_dict': {
'id': '2968097',
'ext': 'mp4',
'title': 'The Internet Ruined My Life: Season 1 Trailer',
'description': 'One tweet, one post, one click, can destroy everything.',
'uploader': 'NBCU-MPAT',
'upload_date': '20170113',
'timestamp': 1484345640,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'Redirects to main page',
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
syfy_mpx = next(iter(self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
display_id)['syfy']['syfy_mpx'].values()))
video_id = syfy_mpx['mpxGUID']
title = syfy_mpx['episodeTitle']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
if syfy_mpx.get('entitlement') == 'auth':
resource = self._get_mvpd_resource(
'syfy', title, video_id,
syfy_mpx.get('mpxRating', 'TV-14'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'syfy', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
self._proto_relative_url(syfy_mpx['releaseURL']), query),
{'force_smil_url': True}),
'title': title,
'id': video_id,
'display_id': display_id,
}


@ -32,6 +32,10 @@ class TBSIE(TurnerBaseIE):
'url': 'http://www.tntdrama.com/movies/star-wars-a-new-hope', 'url': 'http://www.tntdrama.com/movies/star-wars-a-new-hope',
'only_matching': True, 'only_matching': True,
}] }]
_SOFTWARE_STATEMENT_MAP = {
'tbs': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJkZTA0NTYxZS1iMTFhLTRlYTgtYTg5NC01NjI3MGM1NmM2MWIiLCJuYmYiOjE1MzcxODkzOTAsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTg5MzkwfQ.Z7ny66kaqNDdCHf9Y9KsV12LrBxrLkGGxlYe2XGm6qsw2T-k1OCKC1TMzeqiZP735292MMRAQkcJDKrMIzNbAuf9nCdIcv4kE1E2nqUnjPMBduC1bHffZp8zlllyrN2ElDwM8Vhwv_5nElLRwWGEt0Kaq6KJAMZA__WDxKWC18T-wVtsOZWXQpDqO7nByhfj2t-Z8c3TUNVsA_wHgNXlkzJCZ16F2b7yGLT5ZhLPupOScd3MXC5iPh19HSVIok22h8_F_noTmGzmMnIRQi6bWYWK2zC7TQ_MsYHfv7V6EaG5m1RKZTV6JAwwoJQF_9ByzarLV1DGwZxD9-eQdqswvg',
'tntdrama': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwOTMxYTU4OS1jZjEzLTRmNjMtYTJmYy03MzhjMjE1NWU5NjEiLCJuYmYiOjE1MzcxOTA4MjcsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwODI3fQ.AucKvtws7oekTXi80_zX4-BlgJD9GLvlOI9FlBCjdlx7Pa3eJ0AqbogynKMiatMbnLOTMHGjd7tTiq422unmZjBz70dhePAe9BbW0dIo7oQ57vZ-VBYw_tWYRPmON61MwAbLVlqROD3n_zURs85S8TlkQx9aNx9x_riGGELjd8l05CVa_pOluNhYvuIFn6wmrASOKI1hNEblBDWh468UWP571-fe4zzi0rlYeeHd-cjvtWvOB3bQsWrUVbK4pRmqvzEH59j0vNF-ihJF9HncmUicYONe47Mib3elfMok23v4dB1_UAlQY_oawfNcynmEnJQCcqFmbHdEwTW6gMiYsA',
}
def _real_extract(self, url): def _real_extract(self, url):
site, path, display_id = self._match_valid_url(url).groups() site, path, display_id = self._match_valid_url(url).groups()
@ -48,7 +52,7 @@ def _real_extract(self, url):
drupal_settings['ngtv_token_url']).query) drupal_settings['ngtv_token_url']).query)
info = self._extract_ngtv_info( info = self._extract_ngtv_info(
media_id, tokenizer_query, { media_id, tokenizer_query, self._SOFTWARE_STATEMENT_MAP[site], {
'url': url, 'url': url,
'site_name': site[:3].upper(), 'site_name': site[:3].upper(),
'auth_required': video_data.get('authRequired') == '1' or is_live, 'auth_required': video_data.get('authRequired') == '1' or is_live,
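Incidentally, these _SOFTWARE_STATEMENT values are ordinary JWTs, so their payload (subject, issuer, issued-at) can be inspected with a few lines of stdlib code; a sketch for debugging, not part of the extractor.

import base64
import json

def jwt_payload(token):
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))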


@ -156,6 +156,7 @@ def _real_extract(self, url):
class ConanClassicIE(TeamcocoBaseIE): class ConanClassicIE(TeamcocoBaseIE):
_WORKING = False
_VALID_URL = r'https?://(?:(?:www\.)?conanclassic|conan25\.teamcoco)\.com/(?P<id>([^/]+/)*[^/?#]+)' _VALID_URL = r'https?://(?:(?:www\.)?conanclassic|conan25\.teamcoco)\.com/(?P<id>([^/]+/)*[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://conanclassic.com/video/ice-cube-kevin-hart-conan-share-lyft', 'url': 'https://conanclassic.com/video/ice-cube-kevin-hart-conan-share-lyft',
@ -263,7 +264,7 @@ def _real_extract(self, url):
info.update(self._extract_ngtv_info(media_id, { info.update(self._extract_ngtv_info(media_id, {
'accessToken': token, 'accessToken': token,
'accessTokenType': 'jws', 'accessTokenType': 'jws',
})) }, None)) # TODO: the None arg needs to be the AdobePass software_statement
else: else:
formats, subtitles = self._get_formats_and_subtitles( formats, subtitles = self._get_formats_and_subtitles(
traverse_obj(response, ('data', 'findRecordVideoMetadata')), video_id) traverse_obj(response, ('data', 'findRecordVideoMetadata')), video_id)


@ -63,6 +63,17 @@ def _parse_content(self, content, url):
'http_headers': headers, 'http_headers': headers,
} }
def _download_akamai_webpage(self, url, display_id):
try: # yt-dlp's default user-agents are too old and blocked by akamai
return self._download_webpage(url, display_id, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
})
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
raise
# Retry with impersonation if hardcoded UA is insufficient to bypass akamai
return self._download_webpage(url, display_id, impersonate=True)
class TelecincoIE(TelecincoBaseIE): class TelecincoIE(TelecincoBaseIE):
IE_DESC = 'telecinco.es, cuatro.com and mediaset.es' IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
@ -140,7 +151,7 @@ class TelecincoIE(TelecincoBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_akamai_webpage(url, display_id)
article = self._search_json( article = self._search_json(
r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=', r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=',
webpage, 'article', display_id)['article'] webpage, 'article', display_id)['article']


@ -6,32 +6,32 @@
class TenPlayIE(InfoExtractor): class TenPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})' IE_NAME = '10play'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/?#]+/)+(?P<id>tpv\d{6}[a-z]{5})'
_NETRC_MACHINE = '10play' _NETRC_MACHINE = '10play'
_TESTS = [{ _TESTS = [{
'url': 'https://10play.com.au/neighbours/web-extras/season-41/heres-a-first-look-at-mischa-bartons-neighbours-debut/tpv230911hyxnz', # Geo-restricted to Australia
'url': 'https://10play.com.au/australian-survivor/web-extras/season-10-brains-v-brawn-ii/myless-journey/tpv250414jdmtf',
'info_dict': { 'info_dict': {
'id': '6336940246112', 'id': '7440980000013868',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Here\'s A First Look At Mischa Barton\'s Neighbours Debut', 'title': 'Myles\'s Journey',
'alt_title': 'Here\'s A First Look At Mischa Barton\'s Neighbours Debut', 'alt_title': 'Myles\'s Journey',
'description': 'Neighbours Premieres Monday, September 18 At 4:30pm On 10 And 10 Play And 6:30pm On 10 Peach', 'description': 'Relive Myles\'s epic Brains V Brawn II journey to reach the game\'s final two',
'duration': 74,
'season': 'Season 41',
'season_number': 41,
'series': 'Neighbours',
'thumbnail': r're:https://.*\.jpg',
'uploader': 'Channel 10', 'uploader': 'Channel 10',
'age_limit': 15,
'timestamp': 1694386800,
'upload_date': '20230910',
'uploader_id': '2199827728001', 'uploader_id': '2199827728001',
'age_limit': 15,
'duration': 249,
'thumbnail': r're:https://.+/.+\.jpg',
'series': 'Australian Survivor',
'season': 'Season 10',
'season_number': 10,
'timestamp': 1744629420,
'upload_date': '20250414',
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': True,
},
'skip': 'Only available in Australia',
}, { }, {
# Geo-restricted to Australia
'url': 'https://10play.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp', 'url': 'https://10play.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp',
'info_dict': { 'info_dict': {
'id': '9000000000091177', 'id': '9000000000091177',
@ -45,17 +45,38 @@ class TenPlayIE(InfoExtractor):
'season': 'Season 42', 'season': 'Season 42',
'season_number': 42, 'season_number': 42,
'series': 'Neighbours', 'series': 'Neighbours',
'thumbnail': r're:https://.*\.jpg', 'thumbnail': r're:https://.+/.+\.jpg',
'age_limit': 15, 'age_limit': 15,
'timestamp': 1725517860, 'timestamp': 1725517860,
'upload_date': '20240905', 'upload_date': '20240905',
'uploader': 'Channel 10', 'uploader': 'Channel 10',
'uploader_id': '2199827728001', 'uploader_id': '2199827728001',
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': True, }, {
# Geo-restricted to Australia; upgrading the m3u8 quality fails and we need the fallback
'url': 'https://10play.com.au/tiny-chef-show/episodes/season-1/episode-2/tpv240228pofvt',
'info_dict': {
'id': '9000000000084116',
'ext': 'mp4',
'uploader': 'Channel 10',
'uploader_id': '2199827728001',
'duration': 1297,
'title': 'The Tiny Chef Show - S1 Ep. 2',
'alt_title': 'S1 Ep. 2 - Popcorn/banana',
'description': 'md5:d4758b52b5375dfaa67a78261dcb5763',
'age_limit': 0,
'series': 'The Tiny Chef Show',
'season_number': 1,
'episode_number': 2,
'timestamp': 1747957740,
'thumbnail': r're:https://.+/.+\.jpg',
'upload_date': '20250522',
'season': 'Season 1',
'episode': 'Episode 2',
}, },
'skip': 'Only available in Australia', 'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information: HTTP Error 502'],
}, { }, {
'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc', 'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
'only_matching': True, 'only_matching': True,
@ -86,8 +107,11 @@ def _real_extract(self, url):
if '10play-not-in-oz' in m3u8_url: if '10play-not-in-oz' in m3u8_url:
self.raise_geo_restricted(countries=['AU']) self.raise_geo_restricted(countries=['AU'])
# Attempt to get a higher quality stream # Attempt to get a higher quality stream
m3u8_url = m3u8_url.replace(',150,75,55,0000', ',300,150,75,55,0000') formats = self._extract_m3u8_formats(
formats = self._extract_m3u8_formats(m3u8_url, content_id, 'mp4') m3u8_url.replace(',150,75,55,0000', ',300,150,75,55,0000'),
content_id, 'mp4', fatal=False)
if not formats:
formats = self._extract_m3u8_formats(m3u8_url, content_id, 'mp4')
return { return {
'id': content_id, 'id': content_id,
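The quality-upgrade-with-fallback above, sketched standalone: rewrite the bitrate list embedded in the m3u8 URL to request a higher tier, and fall back to the original URL when the upgraded playlist errors (e.g. the HTTP 502 noted in the test). Plain urllib stands in for _extract_m3u8_formats here.

import urllib.error
import urllib.request

def fetch_best_m3u8(m3u8_url):
    upgraded = m3u8_url.replace(',150,75,55,0000', ',300,150,75,55,0000')
    for candidate in (upgraded, m3u8_url):
        try:
            with urllib.request.urlopen(candidate) as resp:
                return candidate, resp.read().decode()
        except urllib.error.HTTPError:
            continue  # the upgraded variant may not exist; retry the original
    raise RuntimeError('no playable m3u8 playlist found')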
@ -112,21 +136,22 @@ def _real_extract(self, url):
class TenPlaySeasonIE(InfoExtractor): class TenPlaySeasonIE(InfoExtractor):
IE_NAME = '10play:season'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])' _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])'
_TESTS = [{ _TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-14', 'url': 'https://10play.com.au/masterchef/episodes/season-15',
'info_dict': { 'info_dict': {
'title': 'Season 14', 'title': 'Season 15',
'id': 'MjMyOTIy', 'id': 'MTQ2NjMxOQ==',
}, },
'playlist_mincount': 64, 'playlist_mincount': 50,
}, { }, {
'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2022', 'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2024',
'info_dict': { 'info_dict': {
'title': 'Season 2022', 'title': 'Season 2024',
'id': 'Mjc0OTIw', 'id': 'Mjc0OTIw',
}, },
'playlist_mincount': 256, 'playlist_mincount': 159,
}] }]
def _entries(self, load_more_url, display_id=None): def _entries(self, load_more_url, display_id=None):


@ -12,11 +12,13 @@
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
parse_age_limit,
parse_qs, parse_qs,
traverse_obj, traverse_obj,
unsmuggle_url, unsmuggle_url,
update_url, update_url,
update_url_query, update_url_query,
url_or_none,
urlhandle_detect_ext, urlhandle_detect_ext,
xpath_with_ns, xpath_with_ns,
) )
@ -63,62 +65,53 @@ def _extract_theplatform_smil(self, smil_url, video_id, note='Downloading SMIL d
return formats, subtitles return formats, subtitles
def _download_theplatform_metadata(self, path, video_id): def _download_theplatform_metadata(self, path, video_id, fatal=True):
info_url = f'http://link.theplatform.{self._TP_TLD}/s/{path}?format=preview' return self._download_json(
return self._download_json(info_url, video_id) f'https://link.theplatform.{self._TP_TLD}/s/{path}', video_id,
fatal=fatal, query={'format': 'preview'}) or {}
def _parse_theplatform_metadata(self, info): @staticmethod
subtitles = {} def _parse_theplatform_metadata(tp_metadata):
captions = info.get('captions') def site_specific_filter(*fields):
if isinstance(captions, list): return lambda k, v: v and k.endswith(tuple(f'${f}' for f in fields))
for caption in captions:
lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
subtitles.setdefault(lang, []).append({
'ext': mimetype2ext(mime),
'url': src,
})
duration = info.get('duration') info = traverse_obj(tp_metadata, {
tp_chapters = info.get('chapters', []) 'title': ('title', {str}),
chapters = [] 'episode': ('title', {str}),
if tp_chapters: 'description': ('description', {str}),
def _add_chapter(start_time, end_time): 'thumbnail': ('defaultThumbnailUrl', {url_or_none}),
start_time = float_or_none(start_time, 1000) 'duration': ('duration', {float_or_none(scale=1000)}),
end_time = float_or_none(end_time, 1000) 'timestamp': ('pubDate', {float_or_none(scale=1000)}),
if start_time is None or end_time is None: 'uploader': ('billingCode', {str}),
return 'creators': ('author', {str}, filter, all, filter),
chapters.append({ 'categories': (
'start_time': start_time, 'categories', lambda _, v: v.get('label') in ['category', None],
'end_time': end_time, 'name', {str}, filter, all, filter),
}) 'tags': ('keywords', {str}, filter, {lambda x: re.split(r'[;,]\s?', x)}, filter),
'age_limit': ('ratings', ..., 'rating', {parse_age_limit}, any),
'season_number': (site_specific_filter('seasonNumber'), {int_or_none}, any),
'episode_number': (site_specific_filter('episodeNumber', 'airOrder'), {int_or_none}, any),
'series': (site_specific_filter('show', 'seriesTitle', 'seriesShortTitle'), (None, ...), {str}, any),
'location': (site_specific_filter('region'), {str}, any),
'media_type': (site_specific_filter('programmingType', 'type'), {str}, any),
})
for chapter in tp_chapters[:-1]: chapters = traverse_obj(tp_metadata, ('chapters', ..., {
_add_chapter(chapter.get('startTime'), chapter.get('endTime')) 'start_time': ('startTime', {float_or_none(scale=1000)}),
_add_chapter(tp_chapters[-1].get('startTime'), tp_chapters[-1].get('endTime') or duration) 'end_time': ('endTime', {float_or_none(scale=1000)}),
}))
# Ignore pointless single chapters from short videos that span the entire video's duration
if len(chapters) > 1 or traverse_obj(chapters, (0, 'end_time')):
info['chapters'] = chapters
def extract_site_specific_field(field): info['subtitles'] = {}
# A number of sites have custom-prefixed keys, e.g. 'cbc$seasonNumber' for caption in traverse_obj(tp_metadata, ('captions', lambda _, v: url_or_none(v['src']))):
return traverse_obj(info, lambda k, v: v and k.endswith(f'${field}'), get_all=False) info['subtitles'].setdefault(caption.get('lang') or 'en', []).append({
'url': caption['src'],
'ext': mimetype2ext(caption.get('type')),
})
return { return info
'title': info['title'],
'subtitles': subtitles,
'description': info['description'],
'thumbnail': info['defaultThumbnailUrl'],
'duration': float_or_none(duration, 1000),
'timestamp': int_or_none(info.get('pubDate'), 1000) or None,
'uploader': info.get('billingCode'),
'chapters': chapters,
'creator': traverse_obj(info, ('author', {str})) or None,
'categories': traverse_obj(info, (
'categories', lambda _, v: v.get('label') in ('category', None), 'name', {str})) or None,
'tags': traverse_obj(info, ('keywords', {lambda x: re.split(r'[;,]\s?', x) if x else None})),
'location': extract_site_specific_field('region'),
'series': extract_site_specific_field('show') or extract_site_specific_field('seriesTitle'),
'season_number': int_or_none(extract_site_specific_field('seasonNumber')),
'episode_number': int_or_none(extract_site_specific_field('episodeNumber')),
'media_type': extract_site_specific_field('programmingType') or extract_site_specific_field('type'),
}
def _extract_theplatform_metadata(self, path, video_id): def _extract_theplatform_metadata(self, path, video_id):
info = self._download_theplatform_metadata(path, video_id) info = self._download_theplatform_metadata(path, video_id)
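The site_specific_filter idea above, as a standalone helper: ThePlatform feeds namespace their custom fields per site (e.g. 'cbc$seasonNumber'), so matching on the '$field' suffix works without knowing the prefix. The input mirrors the preview-format metadata JSON.

def site_specific(tp_metadata, *fields):
    suffixes = tuple(f'${f}' for f in fields)
    for key, value in tp_metadata.items():
        if value and key.endswith(suffixes):
            return value
    return None

# e.g. site_specific(meta, 'seasonNumber') would find meta['cbc$seasonNumber']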

yt_dlp/extractor/toutiao.py (new file)

@ -0,0 +1,121 @@
import json
import urllib.parse
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
str_or_none,
try_call,
url_or_none,
)
from ..utils.traversal import find_element, traverse_obj
class ToutiaoIE(InfoExtractor):
IE_NAME = 'toutiao'
IE_DESC = '今日头条'
_VALID_URL = r'https?://www\.toutiao\.com/video/(?P<id>\d+)/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.toutiao.com/video/7505382061495176511/',
'info_dict': {
'id': '7505382061495176511',
'ext': 'mp4',
'title': '新疆多地现不明飞行物,目击者称和月亮一样亮,几秒内突然加速消失,气象部门回应',
'comment_count': int,
'duration': 9.753,
'like_count': int,
'release_date': '20250517',
'release_timestamp': 1747483344,
'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
'uploader': '极目新闻',
'uploader_id': 'MS4wLjABAAAAeateBb9Su8I3MJOZozmvyzWktmba5LMlliRDz1KffnM',
'view_count': int,
},
}, {
'url': 'https://www.toutiao.com/video/7479446610359878153/',
'info_dict': {
'id': '7479446610359878153',
'ext': 'mp4',
'title': '小伙竟然利用两块磁铁制作成磁力减震器,简直太有创意了!',
'comment_count': int,
'duration': 118.374,
'like_count': int,
'release_date': '20250308',
'release_timestamp': 1741444368,
'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
'uploader': '小莉创意发明',
'uploader_id': 'MS4wLjABAAAA4f7d4mwtApALtHIiq-QM20dwXqe32NUz0DeWF7wbHKw',
'view_count': int,
},
}]
def _real_initialize(self):
if self._get_cookies('https://www.toutiao.com').get('ttwid'):
return
urlh = self._request_webpage(
'https://ttwid.bytedance.com/ttwid/union/register/', None,
'Fetching ttwid', 'Unable to fetch ttwid', headers={
'Content-Type': 'application/json',
}, data=json.dumps({
'aid': 24,
'needFid': False,
'region': 'cn',
'service': 'www.toutiao.com',
'union': True,
}).encode(),
)
if ttwid := try_call(lambda: self._get_cookies(urlh.url)['ttwid'].value):
self._set_cookie('.toutiao.com', 'ttwid', ttwid)
return
self.raise_login_required()
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = traverse_obj(webpage, (
{find_element(tag='script', id='RENDER_DATA')},
{urllib.parse.unquote}, {json.loads}, 'data', 'initialVideo',
))
formats = []
for video in traverse_obj(video_data, (
'videoPlayInfo', 'video_list', lambda _, v: v['main_url'],
)):
formats.append({
'url': video['main_url'],
**traverse_obj(video, ('video_meta', {
'acodec': ('audio_profile', {str}),
'asr': ('audio_sample_rate', {int_or_none}),
'audio_channels': ('audio_channels', {float_or_none}, {int_or_none}),
'ext': ('vtype', {str}),
'filesize': ('size', {int_or_none}),
'format_id': ('definition', {str}),
'fps': ('fps', {int_or_none}),
'height': ('vheight', {int_or_none}),
'tbr': ('real_bitrate', {float_or_none(scale=1000)}),
'vcodec': ('codec_type', {str}),
'width': ('vwidth', {int_or_none}),
})),
})
return {
'id': video_id,
'formats': formats,
**traverse_obj(video_data, {
'comment_count': ('commentCount', {int_or_none}),
'duration': ('videoPlayInfo', 'video_duration', {float_or_none}),
'like_count': ('repinCount', {int_or_none}),
'release_timestamp': ('publishTime', {int_or_none}),
'thumbnail': (('poster', 'coverUrl'), {url_or_none}, any),
'title': ('title', {str}),
'uploader': ('userInfo', 'name', {str}),
'uploader_id': ('userInfo', 'userId', {str_or_none}),
'view_count': ('playCount', {int_or_none}),
'webpage_url': ('detailUrl', {url_or_none}),
}),
}
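The ttwid bootstrap in _real_initialize, reduced to a hedged sketch: an anonymous registration against the bytedance union endpoint yields the cookie toutiao.com requires. The payload fields follow the diff exactly; the Set-Cookie handling is simplified.

import json
import urllib.request

def fetch_ttwid():
    req = urllib.request.Request(
        'https://ttwid.bytedance.com/ttwid/union/register/',
        data=json.dumps({
            'aid': 24, 'needFid': False, 'region': 'cn',
            'service': 'www.toutiao.com', 'union': True,
        }).encode(),
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        # simplified: a robust client would use http.cookiejar instead
        set_cookie = resp.headers.get('Set-Cookie') or ''
    if 'ttwid=' not in set_cookie:
        return None
    return set_cookie.split('ttwid=', 1)[1].split(';', 1)[0]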


@ -20,6 +20,7 @@ class TruTVIE(TurnerBaseIE):
'skip_download': True, 'skip_download': True,
}, },
} }
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhYzQyOTkwMi0xMDYzLTQyNTQtYWJlYS1iZTY2ODM4MTVmZGIiLCJuYmYiOjE1MzcxOTA4NjgsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwODY4fQ.ewXl5LDMDvvx3nDXV4jCdSwUq_sOluKoOVsIjznAo6Zo4zrGe9rjlZ9DOmQKW66g6VRMexJsJ5vM1EkY8TC5-YcQw_BclK1FPGO1rH3Wf7tX_l0b1BVbSJQKIj9UgqDp_QbGcBXz24kN4So3U22mhs6di9PYyyfG68ccKL2iRprcVKWCslIHwUF-T7FaEqb0K57auilxeW1PONG2m-lIAcZ62DUwqXDWvw0CRoWI08aVVqkkhnXaSsQfLs5Ph1Pfh9Oq3g_epUm9Ss45mq6XM7gbOb5omTcKLADRKK-PJVB_JXnZnlsXbG0ttKE1cTKJ738qu7j4aipYTf-W0nKF5Q'
def _real_extract(self, url): def _real_extract(self, url):
series_slug, clip_slug, video_id = self._match_valid_url(url).groups() series_slug, clip_slug, video_id = self._match_valid_url(url).groups()
@ -39,7 +40,7 @@ def _real_extract(self, url):
title = video_data['title'].strip() title = video_data['title'].strip()
info = self._extract_ngtv_info( info = self._extract_ngtv_info(
media_id, {}, { media_id, {}, self._SOFTWARE_STATEMENT, {
'url': url, 'url': url,
'site_name': 'truTV', 'site_name': 'truTV',
'auth_required': video_data.get('isAuthRequired'), 'auth_required': video_data.get('isAuthRequired'),


@ -22,7 +22,7 @@ class TurnerBaseIE(AdobePassIE):
def _extract_timestamp(self, video_data): def _extract_timestamp(self, video_data):
return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts')) return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, custom_tokenizer_query=None): def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, software_statement, custom_tokenizer_query=None):
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*' secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
token = self._AKAMAI_SPE_TOKEN_CACHE.get(secure_path) token = self._AKAMAI_SPE_TOKEN_CACHE.get(secure_path)
if not token: if not token:
@ -34,7 +34,8 @@ def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, c
else: else:
query['videoId'] = content_id query['videoId'] = content_id
if ap_data.get('auth_required'): if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], content_id, ap_data['site_name'], ap_data['site_name']) query['accessToken'] = self._extract_mvpd_auth(
ap_data['url'], content_id, ap_data['site_name'], ap_data['site_name'], software_statement)
auth = self._download_xml( auth = self._download_xml(
tokenizer_src, content_id, query=query) tokenizer_src, content_id, query=query)
error_msg = xpath_text(auth, 'error/msg') error_msg = xpath_text(auth, 'error/msg')
@ -46,7 +47,7 @@ def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, c
self._AKAMAI_SPE_TOKEN_CACHE[secure_path] = token self._AKAMAI_SPE_TOKEN_CACHE[secure_path] = token
return video_url + '?hdnea=' + token return video_url + '?hdnea=' + token
def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}, fatal=False): def _extract_cvp_info(self, data_src, video_id, software_statement, path_data={}, ap_data={}, fatal=False):
video_data = self._download_xml( video_data = self._download_xml(
data_src, video_id, data_src, video_id,
transform_source=lambda s: fix_xml_ampersands(s).strip(), transform_source=lambda s: fix_xml_ampersands(s).strip(),
@ -101,7 +102,7 @@ def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}, fatal=
video_url = self._add_akamai_spe_token( video_url = self._add_akamai_spe_token(
secure_path_data['tokenizer_src'], secure_path_data['tokenizer_src'],
secure_path_data['media_src'] + video_url, secure_path_data['media_src'] + video_url,
content_id, ap_data) content_id, ap_data, software_statement)
elif not re.match('https?://', video_url): elif not re.match('https?://', video_url):
base_path_data = path_data.get(ext, path_data.get('default', {})) base_path_data = path_data.get(ext, path_data.get('default', {}))
media_src = base_path_data.get('media_src') media_src = base_path_data.get('media_src')
@ -215,10 +216,12 @@ def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}, fatal=
'is_live': is_live, 'is_live': is_live,
} }
def _extract_ngtv_info(self, media_id, tokenizer_query, ap_data=None): def _extract_ngtv_info(self, media_id, tokenizer_query, software_statement, ap_data=None):
if not isinstance(ap_data, dict):
ap_data = {}
is_live = ap_data.get('is_live') is_live = ap_data.get('is_live')
streams_data = self._download_json( streams_data = self._download_json(
f'http://medium.ngtv.io/media/{media_id}/tv', f'https://medium.ngtv.io/media/{media_id}/tv',
media_id)['media']['tv'] media_id)['media']['tv']
duration = None duration = None
chapters = [] chapters = []
@ -230,8 +233,8 @@ def _extract_ngtv_info(self, media_id, tokenizer_query, ap_data=None):
continue continue
if stream_data.get('playlistProtection') == 'spe': if stream_data.get('playlistProtection') == 'spe':
m3u8_url = self._add_akamai_spe_token( m3u8_url = self._add_akamai_spe_token(
'http://token.ngtv.io/token/token_spe', 'https://token.ngtv.io/token/token_spe',
m3u8_url, media_id, ap_data or {}, tokenizer_query) m3u8_url, media_id, ap_data, software_statement, tokenizer_query)
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, media_id, 'mp4', m3u8_id='hls', live=is_live, fatal=False)) m3u8_url, media_id, 'mp4', m3u8_id='hls', live=is_live, fatal=False))
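The Akamai SPE tokenization these hunks thread software_statement through, in outline: tokens are scoped to the wildcarded 'secure path', cached per path, and appended as ?hdnea=. The tokenizer call is stubbed out here since it needs Turner's auth parameters.

import re

_TOKEN_CACHE = {}  # secure path -> token, as in _AKAMAI_SPE_TOKEN_CACHE

def add_spe_token(video_url, request_token):
    secure_path = re.search(r'https?://[^/]+(.+/)', video_url).group(1) + '*'
    token = _TOKEN_CACHE.get(secure_path)
    if not token:
        # request_token would call https://token.ngtv.io/token/token_spe with
        # the secure path plus the AdobePass accessToken when auth is required
        token = request_token(secure_path)
        _TOKEN_CACHE[secure_path] = token
    return f'{video_url}?hdnea={token}'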


@ -1,4 +1,5 @@
import base64 import base64
import hashlib
import itertools import itertools
import re import re
@ -16,6 +17,7 @@
str_to_int, str_to_int,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
@ -171,6 +173,10 @@ def find_dmu(x):
'player': 'pc_web', 'player': 'pc_web',
}) })
password_params = {
'word': hashlib.md5(video_password.encode()).hexdigest(),
} if video_password else None
formats = [] formats = []
# low: 640x360, medium: 1280x720, high: 1920x1080 # low: 640x360, medium: 1280x720, high: 1920x1080
qq = qualities(['low', 'medium', 'high']) qq = qualities(['low', 'medium', 'high'])
@ -178,7 +184,7 @@ def find_dmu(x):
'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]), 'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
)): )):
formats.append({ formats.append({
'url': m3u8_url, 'url': update_url_query(m3u8_url, password_params),
'format_id': f'hls-{quality}', 'format_id': f'hls-{quality}',
'ext': 'mp4', 'ext': 'mp4',
'quality': qq(quality), 'quality': qq(quality),
@ -192,7 +198,7 @@ def find_dmu(x):
'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]), 'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
)): )):
formats.append({ formats.append({
'url': ws_url, 'url': update_url_query(ws_url, password_params),
'format_id': f'ws-{mode}', 'format_id': f'ws-{mode}',
'ext': 'mp4', 'ext': 'mp4',
'quality': qq(mode), 'quality': qq(mode),
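The password gate above in isolation: protected streams expect the MD5 hex digest of the viewer-supplied password as a `word` query parameter on every stream URL, HLS and websocket-FMP4 alike. add_password is a hypothetical helper.

import hashlib
from urllib.parse import urlencode, urlparse, urlunparse

def add_password(stream_url, video_password):
    if not video_password:
        return stream_url
    extra = urlencode({'word': hashlib.md5(video_password.encode()).hexdigest()})
    parsed = urlparse(stream_url)
    query = f'{parsed.query}&{extra}' if parsed.query else extra
    return urlunparse(parsed._replace(query=query))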


@ -20,7 +20,6 @@
remove_end, remove_end,
str_or_none, str_or_none,
strip_or_none, strip_or_none,
traverse_obj,
truncate_string, truncate_string,
try_call, try_call,
try_get, try_get,
@ -29,6 +28,7 @@
url_or_none, url_or_none,
xpath_text, xpath_text,
) )
from ..utils.traversal import require, traverse_obj
class TwitterBaseIE(InfoExtractor): class TwitterBaseIE(InfoExtractor):
@ -1342,7 +1342,7 @@ def _extract_status(self, twid):
'tweet_mode': 'extended', 'tweet_mode': 'extended',
}) })
except ExtractorError as e: except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or not e.cause.status == 429: if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
raise raise
self.report_warning('Rate-limit exceeded; falling back to syndication endpoint') self.report_warning('Rate-limit exceeded; falling back to syndication endpoint')
status = self._call_syndication_api(twid) status = self._call_syndication_api(twid)
@ -1596,8 +1596,8 @@ def _find_dimension(target):
class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE): class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
IE_NAME = 'twitter:broadcast' IE_NAME = 'twitter:broadcast'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/(?P<type>broadcasts|events)/(?P<id>\w+)'
_TESTS = [{ _TESTS = [{
# untitled Periscope video # untitled Periscope video
'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj', 'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
@ -1605,6 +1605,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1yNGaQLWpejGj', 'id': '1yNGaQLWpejGj',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Andrea May Sahouri - Periscope Broadcast', 'title': 'Andrea May Sahouri - Periscope Broadcast',
'display_id': '1yNGaQLWpejGj',
'uploader': 'Andrea May Sahouri', 'uploader': 'Andrea May Sahouri',
'uploader_id': 'andreamsahouri', 'uploader_id': 'andreamsahouri',
'uploader_url': 'https://twitter.com/andreamsahouri', 'uploader_url': 'https://twitter.com/andreamsahouri',
@ -1612,6 +1613,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20200601', 'upload_date': '20200601',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
}, },
}, { }, {
'url': 'https://twitter.com/i/broadcasts/1ZkKzeyrPbaxv', 'url': 'https://twitter.com/i/broadcasts/1ZkKzeyrPbaxv',
@ -1619,6 +1622,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1ZkKzeyrPbaxv', 'id': '1ZkKzeyrPbaxv',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starship | SN10 | High-Altitude Flight Test', 'title': 'Starship | SN10 | High-Altitude Flight Test',
'display_id': '1ZkKzeyrPbaxv',
'uploader': 'SpaceX', 'uploader': 'SpaceX',
'uploader_id': 'SpaceX', 'uploader_id': 'SpaceX',
'uploader_url': 'https://twitter.com/SpaceX', 'uploader_url': 'https://twitter.com/SpaceX',
@ -1626,6 +1630,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20210303', 'upload_date': '20210303',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
}, },
}, { }, {
'url': 'https://twitter.com/i/broadcasts/1OyKAVQrgzwGb', 'url': 'https://twitter.com/i/broadcasts/1OyKAVQrgzwGb',
@ -1633,6 +1639,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1OyKAVQrgzwGb', 'id': '1OyKAVQrgzwGb',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starship Flight Test', 'title': 'Starship Flight Test',
'display_id': '1OyKAVQrgzwGb',
'uploader': 'SpaceX', 'uploader': 'SpaceX',
'uploader_id': 'SpaceX', 'uploader_id': 'SpaceX',
'uploader_url': 'https://twitter.com/SpaceX', 'uploader_url': 'https://twitter.com/SpaceX',
@ -1640,21 +1647,58 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20230420', 'upload_date': '20230420',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
},
}, {
'url': 'https://x.com/i/events/1910629646300762112',
'info_dict': {
'id': '1LyxBWDRNqyKN',
'ext': 'mp4',
'title': '#ガンニバル ウォッチパーティー',
'concurrent_view_count': int,
'display_id': '1910629646300762112',
'live_status': 'was_live',
'release_date': '20250423',
'release_timestamp': 1745409000,
'tags': ['ガンニバル'],
'thumbnail': r're:https?://[^?#]+\.jpg\?token=',
'timestamp': 1745403328,
'upload_date': '20250423',
'uploader': 'ディズニープラス公式',
'uploader_id': 'DisneyPlusJP',
'uploader_url': 'https://twitter.com/DisneyPlusJP',
'view_count': int,
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
broadcast_id = self._match_id(url) broadcast_type, display_id = self._match_valid_url(url).group('type', 'id')
if broadcast_type == 'events':
timeline = self._call_api(
f'live_event/1/{display_id}/timeline.json', display_id)
broadcast_id = traverse_obj(timeline, (
'twitter_objects', 'broadcasts', ..., ('id', 'broadcast_id'),
{str}, any, {require('broadcast ID')}))
else:
broadcast_id = display_id
broadcast = self._call_api( broadcast = self._call_api(
'broadcasts/show.json', broadcast_id, 'broadcasts/show.json', broadcast_id,
{'ids': broadcast_id})['broadcasts'][broadcast_id] {'ids': broadcast_id})['broadcasts'][broadcast_id]
if not broadcast: if not broadcast:
raise ExtractorError('Broadcast no longer exists', expected=True) raise ExtractorError('Broadcast no longer exists', expected=True)
info = self._parse_broadcast_data(broadcast, broadcast_id) info = self._parse_broadcast_data(broadcast, broadcast_id)
info['title'] = broadcast.get('status') or info.get('title') info.update({
info['uploader_id'] = broadcast.get('twitter_username') or info.get('uploader_id') 'display_id': display_id,
info['uploader_url'] = format_field(broadcast, 'twitter_username', 'https://twitter.com/%s', default=None) 'title': broadcast.get('status') or info.get('title'),
'uploader_id': broadcast.get('twitter_username') or info.get('uploader_id'),
'uploader_url': format_field(
broadcast, 'twitter_username', 'https://twitter.com/%s', default=None),
})
if info['live_status'] == 'is_upcoming': if info['live_status'] == 'is_upcoming':
self.raise_no_formats('This live broadcast has not yet started', expected=True)
return info return info
media_key = broadcast['media_key'] media_key = broadcast['media_key']
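The new events handling above, sketched: /i/events/<id> pages do not map 1:1 to a broadcast, so the live_event timeline is fetched first and a broadcast ID pulled from its twitter_objects. The JSON shape follows this diff; broadcasts may be keyed by ID.

def broadcast_id_from_timeline(timeline):
    broadcasts = (timeline.get('twitter_objects') or {}).get('broadcasts') or {}
    items = broadcasts.values() if isinstance(broadcasts, dict) else broadcasts
    for item in items:
        for key in ('id', 'broadcast_id'):
            if isinstance(item.get(key), str):
                return item[key]
    raise LookupError('no broadcast ID in event timeline')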


@@ -1,98 +1,53 @@
 from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_filesize,
-    parse_iso8601,
-)
+from ..utils import clean_html
+from ..utils.traversal import find_element, traverse_obj


 class UMGDeIE(InfoExtractor):
-    _WORKING = False
     IE_NAME = 'umg:de'
     IE_DESC = 'Universal Music Deutschland'
-    _VALID_URL = r'https?://(?:www\.)?universal-music\.de/[^/]+/videos/[^/?#]+-(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?universal-music\.de/[^/?#]+/videos/(?P<slug>[^/?#]+-(?P<id>\d+))'
+    _TESTS = [{
         'url': 'https://www.universal-music.de/sido/videos/jedes-wort-ist-gold-wert-457803',
-        'md5': 'ebd90f48c80dcc82f77251eb1902634f',
         'info_dict': {
             'id': '457803',
             'ext': 'mp4',
             'title': 'Jedes Wort ist Gold wert',
+            'artists': ['Sido'],
+            'description': 'md5:df2dbffcff1a74e0a7c9bef4b497aeec',
+            'display_id': 'jedes-wort-ist-gold-wert-457803',
+            'duration': 210.0,
+            'thumbnail': r're:https?://images\.universal-music\.de/img/assets/.+\.jpg',
             'timestamp': 1513591800,
             'upload_date': '20171218',
-            'view_count': int,
         },
-    }
+    }, {
+        'url': 'https://www.universal-music.de/alexander-eder/videos/der-doktor-hat-gesagt-609533',
+        'info_dict': {
+            'id': '609533',
+            'ext': 'mp4',
+            'title': 'Der Doktor hat gesagt',
+            'artists': ['Alexander Eder'],
+            'display_id': 'der-doktor-hat-gesagt-609533',
+            'duration': 146.0,
+            'thumbnail': r're:https?://images\.universal-music\.de/img/assets/.+\.jpg',
+            'timestamp': 1742982100,
+            'upload_date': '20250326',
+        },
+    }]

     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        video_data = self._download_json(
-            'https://graphql.universal-music.de/',
-            video_id, query={
-                'query': '''{
-  universalMusic(channel:16) {
-    video(id:%s) {
-      headline
-      formats {
-        formatId
-        url
-        type
-        width
-        height
-        mimeType
-        fileSize
-      }
-      duration
-      createdDate
-    }
-  }
-}''' % video_id})['data']['universalMusic']['video']  # noqa: UP031
-        title = video_data['headline']
-        hls_url_template = 'http://mediadelivery.universal-music-services.de/vod/mp4:autofill/storage/' + '/'.join(list(video_id)) + '/content/%s/file/playlist.m3u8'
-
-        thumbnails = []
-        formats = []
-
-        def add_m3u8_format(format_id):
-            formats.extend(self._extract_m3u8_formats(
-                hls_url_template % format_id, video_id, 'mp4',
-                'm3u8_native', m3u8_id='hls', fatal=False))
-
-        for f in video_data.get('formats', []):
-            f_url = f.get('url')
-            mime_type = f.get('mimeType')
-            if not f_url or mime_type == 'application/mxf':
-                continue
-            fmt = {
-                'url': f_url,
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
-                'filesize': parse_filesize(f.get('fileSize')),
-            }
-            f_type = f.get('type')
-            if f_type == 'Image':
-                thumbnails.append(fmt)
-            elif f_type == 'Video':
-                format_id = f.get('formatId')
-                if format_id:
-                    fmt['format_id'] = format_id
-                    if mime_type == 'video/mp4':
-                        add_m3u8_format(format_id)
-                    urlh = self._request_webpage(f_url, video_id, fatal=False)
-                    if urlh:
-                        first_byte = urlh.read(1)
-                        if first_byte not in (b'F', b'\x00'):
-                            continue
-                formats.append(fmt)
-        if not formats:
-            for format_id in (867, 836, 940):
-                add_m3u8_format(format_id)
+        display_id, video_id = self._match_valid_url(url).group('slug', 'id')
+        webpage = self._download_webpage(url, display_id)

         return {
+            **self._search_json_ld(webpage, display_id),
             'id': video_id,
-            'title': title,
-            'duration': int_or_none(video_data.get('duration')),
-            'timestamp': parse_iso8601(video_data.get('createdDate'), ' '),
-            'thumbnails': thumbnails,
-            'formats': formats,
+            'artists': traverse_obj(self._html_search_meta('umg-artist-screenname', webpage), (filter, all)),
+            # The JSON LD description duplicates the title
+            'description': traverse_obj(webpage, ({find_element(cls='_3Y0Lj')}, {clean_html})),
+            'display_id': display_id,
+            'formats': self._extract_m3u8_formats(
+                'https://hls.universal-music.de/get', display_id, 'mp4', query={'id': video_id}),
        }
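
The rewritten extractor leans on yt-dlp's declarative traversal helpers. As a point of reference, here is a minimal, self-contained sketch of the description lookup above, assuming `find_element`/`traverse_obj` from yt_dlp.utils.traversal and `clean_html` from yt_dlp.utils; the CSS class name is whatever the site template currently emits and is illustrative only:

    from yt_dlp.utils import clean_html
    from yt_dlp.utils.traversal import find_element, traverse_obj

    webpage = '<div class="_3Y0Lj"><p>Ein <b>Beispiel</b>text</p></div>'
    # find_element(cls=...) yields a callable returning the matching element's
    # inner HTML; clean_html then strips the remaining markup
    print(traverse_obj(webpage, ({find_element(cls='_3Y0Lj')}, {clean_html})))
    # -> 'Ein Beispieltext'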

View File

@@ -32,6 +32,7 @@ def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''

 class ViceIE(ViceBaseIE, AdobePassIE):
+    _WORKING = False
     IE_NAME = 'vice'
     _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
     _EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})']
@@ -99,6 +100,7 @@ class ViceIE(ViceBaseIE, AdobePassIE):
         'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1',
         'only_matching': True,
     }]
+    _SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwMTVjODBlZC04ZDcxLTQ4ZGEtOTZkZi00NzU5NjIwNzJlYTQiLCJuYmYiOjE2NjgwMTM0ODQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNjY4MDEzNDg0fQ.CjhUnTrlh-bmYnEFHyC2Y4it5Y_Zfza1x66O4-ki5gBR7JT6aUunYI_YflXomQPACriMpObkITFz4grVaDwdd8Xp9hrQ2R0SwRBdaklkdy1_j68RqSP5PnexJIa0q_ThtOwfRBd5uGcb33nMJ9Qs92W4kVXuca0Ta-i7SJyWgXUaPDlRDdgyCL3hKj5wuM7qUIwrd9A5CMm-j3dMIBCDgw7X6TwRK65eUQe6gTWqcvL2yONHHTpmIfeOTUxGwwKFr29COOTBowm0VJ6HE08xjXCShP08Neusu-JsgkjzhkEbiDE2531EKgfAki_7WCd2JUZVsAsCusv4a1maokk6NA'

     def _real_extract(self, url):
         locale, video_id = self._match_valid_url(url).groups()
@@ -116,7 +118,7 @@ def _real_extract(self, url):
             resource = self._get_mvpd_resource(
                 'VICELAND', title, video_id, rating)
             query['tvetoken'] = self._extract_mvpd_auth(
-                url, video_id, 'VICELAND', resource)
+                url, video_id, 'VICELAND', resource, self._SOFTWARE_STATEMENT)

         # signature generation algorithm is reverse engineered from signatureGenerator in
         # webpack:///../shared/~/vice-player/dist/js/vice-player.js in
@@ -181,6 +183,7 @@ def _real_extract(self, url):

 class ViceShowIE(ViceBaseIE):
+    _WORKING = False
     IE_NAME = 'vice:show'
     _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
     _PAGE_SIZE = 25
@@ -221,6 +224,7 @@ def _real_extract(self, url):

 class ViceArticleIE(ViceBaseIE):
+    _WORKING = False
     IE_NAME = 'vice:article'
     _VALID_URL = r'https?://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'

View File

@@ -236,7 +236,7 @@ def _parse_config(self, config, video_id):
         for tt in (request.get('text_tracks') or []):
             subtitles.setdefault(tt['lang'], []).append({
                 'ext': 'vtt',
-                'url': urljoin('https://vimeo.com', tt['url']),
+                'url': urljoin('https://player.vimeo.com/', tt['url']),
             })

         thumbnails = []
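
The one-line change above is all about the base URL that relative track paths resolve against. A quick illustration with the standard library (yt-dlp's `urljoin` behaves the same for absolute-path inputs); the track path itself is made up:

    from urllib.parse import urljoin

    track = '/texttrack/12345.vtt?token=abc'
    print(urljoin('https://vimeo.com', track))
    # -> 'https://vimeo.com/texttrack/12345.vtt?token=abc' (old, wrong host)
    print(urljoin('https://player.vimeo.com/', track))
    # -> 'https://player.vimeo.com/texttrack/12345.vtt?token=abc'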

View File

@@ -548,21 +548,21 @@ def _real_extract(self, url):
             'formats': formats,
             'subtitles': subtitles,
             **traverse_obj(mv_data, {
-                'title': ('title', {unescapeHTML}),
+                'title': ('title', {str}, {unescapeHTML}),
                 'description': ('desc', {clean_html}, filter),
                 'duration': ('duration', {int_or_none}),
                 'like_count': ('likes', {int_or_none}),
                 'comment_count': ('commcount', {int_or_none}),
             }),
             **traverse_obj(data, {
-                'title': ('md_title', {unescapeHTML}),
+                'title': ('md_title', {str}, {unescapeHTML}),
                 'description': ('description', {clean_html}, filter),
                 'thumbnail': ('jpg', {url_or_none}),
-                'uploader': ('md_author', {unescapeHTML}),
+                'uploader': ('md_author', {str}, {unescapeHTML}),
                 'uploader_id': (('author_id', 'authorId'), {str_or_none}, any),
                 'duration': ('duration', {int_or_none}),
                 'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), {
-                    'title': ('text', {unescapeHTML}),
+                    'title': ('text', {str}, {unescapeHTML}),
                     'start_time': 'time',
                 }),
             }),
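
The inserted `{str}` steps are type guards: `unescapeHTML` only makes sense on strings, and these fields occasionally arrive as numbers or null. A minimal sketch of the effect, with made-up data:

    from yt_dlp.utils import traverse_obj, unescapeHTML

    data = {'md_title': 'a &amp; b', 'md_author': 12345}
    print(traverse_obj(data, ('md_title', {str}, {unescapeHTML})))   # 'a & b'
    # the non-string value is filtered out instead of reaching unescapeHTML
    print(traverse_obj(data, ('md_author', {str}, {unescapeHTML})))  # None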

View File

@@ -1,4 +1,5 @@
 import base64
+import functools
 import hashlib
 import hmac
 import itertools
@@ -17,99 +18,227 @@
     UserNotLive,
     float_or_none,
     int_or_none,
+    join_nonempty,
+    jwt_decode_hs256,
     str_or_none,
-    traverse_obj,
     try_call,
     update_url_query,
     url_or_none,
 )
+from ..utils.traversal import require, traverse_obj


 class WeverseBaseIE(InfoExtractor):
     _NETRC_MACHINE = 'weverse'
-    _ACCOUNT_API_BASE = 'https://accountapi.weverse.io/web/api'
+    _ACCOUNT_API_BASE = 'https://accountapi.weverse.io'
+    _CLIENT_PLATFORM = 'WEB'
+    _SIGNING_KEY = b'1b9cb6378d959b45714bec49971ade22e6e24e42'
+    _ACCESS_TOKEN_KEY = 'we2_access_token'
+    _REFRESH_TOKEN_KEY = 'we2_refresh_token'
+    _DEVICE_ID_KEY = 'we2_device_id'
     _API_HEADERS = {
         'Accept': 'application/json',
+        'Origin': 'https://weverse.io',
         'Referer': 'https://weverse.io/',
-        'WEV-device-Id': str(uuid.uuid4()),
     }
+    _LOGIN_HINT_TMPL = (
+        'You can log in using your refresh token with --username "{}" --password "REFRESH_TOKEN" '
+        '(replace REFRESH_TOKEN with the actual value of the "{}" cookie found in your web browser). '
+        'You can add an optional username suffix, e.g. --username "{}" , '
+        'if you need to manage multiple accounts. ')
+    _LOGIN_ERRORS_MAP = {
+        'login_required': 'This content is only available for logged-in users. ',
+        'invalid_username': '"{}" is not valid login username for this extractor. ',
+        'invalid_password': (
+            'Your password is not a valid refresh token. Make sure that '
+            'you are passing the refresh token, and NOT the access token. '),
+        'no_refresh_token': (
+            'Your access token has expired and there is no refresh token available. '
+            'Refresh your session/cookies in the web browser and try again. '),
+        'expired_refresh_token': (
+            'Your refresh token has expired. Log in to the site again using '
+            'your web browser to get a new refresh token or export fresh cookies. '),
+    }
+    _OAUTH_PREFIX = 'oauth'
+    _oauth_tokens = {}
+    _device_id = None

-    def _perform_login(self, username, password):
-        if self._API_HEADERS.get('Authorization'):
-            return
-
-        headers = {
-            'x-acc-app-secret': '5419526f1c624b38b10787e5c10b2a7a',
-            'x-acc-app-version': '3.3.6',
-            'x-acc-language': 'en',
-            'x-acc-service-id': 'weverse',
-            'x-acc-trace-id': str(uuid.uuid4()),
-            'x-clog-user-device-id': str(uuid.uuid4()),
-        }
-        valid_username = traverse_obj(self._download_json(
-            f'{self._ACCOUNT_API_BASE}/v2/signup/email/status', None, note='Checking username',
-            query={'email': username}, headers=headers, expected_status=(400, 404)), 'hasPassword')
-        if not valid_username:
-            raise ExtractorError('Invalid username provided', expected=True)
-
-        headers['content-type'] = 'application/json'
-        try:
-            auth = self._download_json(
-                f'{self._ACCOUNT_API_BASE}/v3/auth/token/by-credentials', None, data=json.dumps({
-                    'email': username,
-                    'otpSessionId': 'BY_PASS',
-                    'password': password,
-                }, separators=(',', ':')).encode(), headers=headers, note='Logging in')
-        except ExtractorError as e:
-            if isinstance(e.cause, HTTPError) and e.cause.status == 401:
-                raise ExtractorError('Invalid password provided', expected=True)
-            raise
-
-        WeverseBaseIE._API_HEADERS['Authorization'] = f'Bearer {auth["accessToken"]}'
-
-    def _real_initialize(self):
-        if self._API_HEADERS.get('Authorization'):
-            return
-
-        token = try_call(lambda: self._get_cookies('https://weverse.io/')['we2_access_token'].value)
-        if token:
-            WeverseBaseIE._API_HEADERS['Authorization'] = f'Bearer {token}'
+    @property
+    def _oauth_headers(self):
+        return {
+            **self._API_HEADERS,
+            'X-ACC-APP-SECRET': '5419526f1c624b38b10787e5c10b2a7a',
+            'X-ACC-SERVICE-ID': 'weverse',
+            'X-ACC-TRACE-ID': str(uuid.uuid4()),
+        }
+
+    @functools.cached_property
+    def _oauth_cache_key(self):
+        username = self._get_login_info()[0]
+        if not username:
+            return 'cookies'
+        return join_nonempty(self._OAUTH_PREFIX, username.partition('+')[2])
+
+    @property
+    def _is_logged_in(self):
+        return bool(self._oauth_tokens.get(self._ACCESS_TOKEN_KEY))
+
+    def _access_token_is_valid(self):
+        response = self._download_json(
+            f'{self._ACCOUNT_API_BASE}/api/v1/token/validate', None,
+            'Validating access token', 'Unable to valid access token',
+            expected_status=401, headers={
+                **self._oauth_headers,
+                'Authorization': f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}',
+            })
+        return traverse_obj(response, ('expiresIn', {int}), default=0) > 60
+
+    def _token_is_expired(self, key):
+        is_expired = jwt_decode_hs256(self._oauth_tokens[key])['exp'] - time.time() < 3600
+        if key == self._REFRESH_TOKEN_KEY or not is_expired:
+            return is_expired
+        return not self._access_token_is_valid()
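
`_token_is_expired` can do its cheap check offline because JWTs carry their own `exp` claim. A self-contained illustration of that check (the token here is hand-built and unsigned, purely for demonstration; real Weverse tokens are HS256-signed and decoded with `jwt_decode_hs256`):

    import base64
    import json
    import time

    def exp_within(token, seconds):
        # decode the payload (second dot-separated segment) without
        # verifying the signature, then compare its `exp` claim to now
        payload = token.split('.')[1]
        payload += '=' * (-len(payload) % 4)  # restore base64 padding
        exp = json.loads(base64.urlsafe_b64decode(payload))['exp']
        return exp - time.time() < seconds

    claims = base64.urlsafe_b64encode(
        json.dumps({'exp': int(time.time()) + 7200}).encode()).decode().rstrip('=')
    print(exp_within(f'h.{claims}.sig', 3600))  # False: more than an hour left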
+    def _refresh_access_token(self):
+        if not self._oauth_tokens.get(self._REFRESH_TOKEN_KEY):
+            self._report_login_error('no_refresh_token')
+        if self._token_is_expired(self._REFRESH_TOKEN_KEY):
+            self._report_login_error('expired_refresh_token')
+
+        headers = {'Content-Type': 'application/json'}
+        if self._is_logged_in:
+            headers['Authorization'] = f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}'
+
+        try:
+            response = self._download_json(
+                f'{self._ACCOUNT_API_BASE}/api/v1/token/refresh', None,
+                'Refreshing access token', 'Unable to refresh access token',
+                headers={**self._oauth_headers, **headers},
+                data=json.dumps({
+                    'refreshToken': self._oauth_tokens[self._REFRESH_TOKEN_KEY],
+                }, separators=(',', ':')).encode())
+        except ExtractorError as e:
+            if isinstance(e.cause, HTTPError) and e.cause.status == 401:
+                self._oauth_tokens.clear()
+                if self._oauth_cache_key == 'cookies':
+                    self.cookiejar.clear(domain='.weverse.io', path='/', name=self._ACCESS_TOKEN_KEY)
+                    self.cookiejar.clear(domain='.weverse.io', path='/', name=self._REFRESH_TOKEN_KEY)
+                else:
+                    self.cache.store(self._NETRC_MACHINE, self._oauth_cache_key, self._oauth_tokens)
+                self._report_login_error('expired_refresh_token')
+            raise
+
+        self._oauth_tokens.update(traverse_obj(response, {
+            self._ACCESS_TOKEN_KEY: ('accessToken', {str}, {require('access token')}),
+            self._REFRESH_TOKEN_KEY: ('refreshToken', {str}, {require('refresh token')}),
+        }))
+        if self._oauth_cache_key == 'cookies':
+            self._set_cookie('.weverse.io', self._ACCESS_TOKEN_KEY, self._oauth_tokens[self._ACCESS_TOKEN_KEY])
+            self._set_cookie('.weverse.io', self._REFRESH_TOKEN_KEY, self._oauth_tokens[self._REFRESH_TOKEN_KEY])
+        else:
+            self.cache.store(self._NETRC_MACHINE, self._oauth_cache_key, self._oauth_tokens)
+
+    def _get_authorization_header(self):
+        if not self._is_logged_in:
+            return {}
+        if self._token_is_expired(self._ACCESS_TOKEN_KEY):
+            self._refresh_access_token()
+        return {'Authorization': f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}'}
+
+    def _report_login_error(self, error_id):
+        error_msg = self._LOGIN_ERRORS_MAP[error_id]
+        username = self._get_login_info()[0]
+
+        if error_id == 'invalid_username':
+            error_msg = error_msg.format(username)
+            username = f'{self._OAUTH_PREFIX}+{username}'
+        elif not username:
+            username = f'{self._OAUTH_PREFIX}+USERNAME'
+
+        raise ExtractorError(join_nonempty(
+            error_msg, self._LOGIN_HINT_TMPL.format(self._OAUTH_PREFIX, self._REFRESH_TOKEN_KEY, username),
+            'Or else you can u', self._login_hint(method='session_cookies')[1:], delim=''), expected=True)
+
+    def _perform_login(self, username, password):
+        if self._is_logged_in:
+            return
+
+        if username.partition('+')[0] != self._OAUTH_PREFIX:
+            self._report_login_error('invalid_username')
+
+        self._oauth_tokens.update(self.cache.load(self._NETRC_MACHINE, self._oauth_cache_key, default={}))
+        if self._is_logged_in and self._access_token_is_valid():
+            return
+
+        rt_key = self._REFRESH_TOKEN_KEY
+        if not self._oauth_tokens.get(rt_key) or self._token_is_expired(rt_key):
+            if try_call(lambda: jwt_decode_hs256(password)['scope']) != 'refresh':
+                self._report_login_error('invalid_password')
+            self._oauth_tokens[rt_key] = password
+
+        self._refresh_access_token()
+
+    def _real_initialize(self):
+        cookies = self._get_cookies('https://weverse.io/')
+
+        if not self._device_id:
+            self._device_id = traverse_obj(cookies, (self._DEVICE_ID_KEY, 'value')) or str(uuid.uuid4())
+
+        if self._is_logged_in:
+            return
+
+        self._oauth_tokens.update(traverse_obj(cookies, {
+            self._ACCESS_TOKEN_KEY: (self._ACCESS_TOKEN_KEY, 'value'),
+            self._REFRESH_TOKEN_KEY: (self._REFRESH_TOKEN_KEY, 'value'),
+        }))
+        if self._is_logged_in and not self._access_token_is_valid():
+            self._refresh_access_token()

     def _call_api(self, ep, video_id, data=None, note='Downloading API JSON'):
         # Ref: https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/2488.a09b41ff.chunk.js
         # From https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/main.e206f7c1.js:
-        key = b'1b9cb6378d959b45714bec49971ade22e6e24e42'
         api_path = update_url_query(ep, {
             # 'gcc': 'US',
             'appId': 'be4d79eb8fc7bd008ee82c8ec4ff6fd4',
             'language': 'en',
-            'os': 'WEB',
-            'platform': 'WEB',
+            'os': self._CLIENT_PLATFORM,
+            'platform': self._CLIENT_PLATFORM,
             'wpf': 'pc',
         })
-        wmsgpad = int(time.time() * 1000)
-        wmd = base64.b64encode(hmac.HMAC(
-            key, f'{api_path[:255]}{wmsgpad}'.encode(), digestmod=hashlib.sha1).digest()).decode()
-        headers = {'Content-Type': 'application/json'} if data else {}
-        try:
-            return self._download_json(
-                f'https://global.apis.naver.com/weverse/wevweb{api_path}', video_id, note=note,
-                data=data, headers={**self._API_HEADERS, **headers}, query={
-                    'wmsgpad': wmsgpad,
-                    'wmd': wmd,
-                })
-        except ExtractorError as e:
-            if isinstance(e.cause, HTTPError) and e.cause.status == 401:
-                self.raise_login_required(
-                    'Session token has expired. Log in again or refresh cookies in browser')
-            elif isinstance(e.cause, HTTPError) and e.cause.status == 403:
-                if 'Authorization' in self._API_HEADERS:
-                    raise ExtractorError('Your account does not have access to this content', expected=True)
-                self.raise_login_required()
-            raise
+        for is_retry in (False, True):
+            wmsgpad = int(time.time() * 1000)
+            wmd = base64.b64encode(hmac.HMAC(
+                self._SIGNING_KEY, f'{api_path[:255]}{wmsgpad}'.encode(),
+                digestmod=hashlib.sha1).digest()).decode()
+
+            try:
+                return self._download_json(
+                    f'https://global.apis.naver.com/weverse/wevweb{api_path}', video_id, note=note,
+                    data=data, headers={
+                        **self._API_HEADERS,
+                        **self._get_authorization_header(),
+                        **({'Content-Type': 'application/json'} if data else {}),
+                        'WEV-device-Id': self._device_id,
+                    }, query={
+                        'wmsgpad': wmsgpad,
+                        'wmd': wmd,
+                    })
+            except ExtractorError as e:
+                if is_retry or not isinstance(e.cause, HTTPError):
+                    raise
+                elif self._is_logged_in and e.cause.status == 401:
+                    self._refresh_access_token()
+                    continue
+                elif e.cause.status == 403:
+                    if self._is_logged_in:
+                        raise ExtractorError(
+                            'Your account does not have access to this content', expected=True)
+                    self._report_login_error('login_required')
+                raise

     def _call_post_api(self, video_id):
-        path = '' if 'Authorization' in self._API_HEADERS else '/preview'
+        path = '' if self._is_logged_in else '/preview'
         return self._call_api(f'/post/v1.0/post-{video_id}{path}?fieldSet=postV1', video_id)

     def _get_community_id(self, channel):
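
The request signing that `_call_api` keeps (per the referenced web-bundle comment) is small enough to reproduce standalone: the first 255 characters of the path-plus-query are concatenated with a millisecond timestamp, HMAC-SHA1'd with the static key, and base64-encoded. A minimal sketch using the same key as `_SIGNING_KEY` above:

    import base64
    import hashlib
    import hmac
    import time

    SIGNING_KEY = b'1b9cb6378d959b45714bec49971ade22e6e24e42'

    def sign_api_path(api_path):
        wmsgpad = int(time.time() * 1000)  # millisecond timestamp
        wmd = base64.b64encode(hmac.HMAC(
            SIGNING_KEY, f'{api_path[:255]}{wmsgpad}'.encode(),
            digestmod=hashlib.sha1).digest()).decode()
        return {'wmsgpad': wmsgpad, 'wmd': wmd}

    print(sign_api_path('/post/v1.0/post-1-123?fieldSet=postV1'))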

View File

@@ -45,7 +45,7 @@ class XinpianchangIE(InfoExtractor):

     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id=video_id)
+        webpage = self._download_webpage(url, video_id=video_id, headers={'Referer': url})
         video_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['detail']['video']

         data = self._download_json(

View File

@@ -35,6 +35,7 @@
 class _PoTokenContext(enum.Enum):
     PLAYER = 'player'
     GVS = 'gvs'
+    SUBS = 'subs'


 # any clients starting with _ cannot be explicitly requested by the user
@@ -174,6 +175,15 @@ class _PoTokenContext(enum.Enum):
         'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
         'SUPPORTS_COOKIES': True,
     },
+    'tv_simply': {
+        'INNERTUBE_CONTEXT': {
+            'client': {
+                'clientName': 'TVHTML5_SIMPLY',
+                'clientVersion': '1.0',
+            },
+        },
+        'INNERTUBE_CONTEXT_CLIENT_NAME': 75,
+    },
     # This client now requires sign-in for every video
     # It was previously an age-gate workaround for videos that were `playable_in_embed`
     # It may still be useful if signed into an EU account that is not age-verified
@@ -787,6 +797,7 @@ def _download_webpage_with_retries(self, *args, retry_fatal=False, retry_on_stat
     def _download_ytcfg(self, client, video_id):
         url = {
+            'mweb': 'https://m.youtube.com',
             'web': 'https://www.youtube.com',
             'web_music': 'https://music.youtube.com',
             'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1',

View File

@@ -72,6 +72,9 @@

 STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
 STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
+STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token'
+STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context'

 PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'
@@ -247,7 +250,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         '400': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
         '401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
     }
-    _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
+    _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')

     _DEFAULT_CLIENTS = ('tv', 'ios', 'web')
     _DEFAULT_AUTHED_CLIENTS = ('tv', 'web')
@@ -2225,21 +2228,21 @@ def _decrypt_nsig(self, s, video_id, player_url):

     def _extract_n_function_name(self, jscode, player_url=None):
         varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
-        if debug_str := traverse_obj(global_list, (lambda _, v: v.endswith('_w8_'), any)):
-            funcname = self._search_regex(
-                r'''(?xs)
-                    [;\n](?:
-                        (?P<f>function\s+)|
-                        (?:var\s+)?
-                    )(?P<funcname>[a-zA-Z0-9_$]+)\s*(?(f)|=\s*function\s*)
-                    \((?P<argname>[a-zA-Z0-9_$]+)\)\s*\{
-                    (?:(?!\}[;\n]).)+
-                    \}\s*catch\(\s*[a-zA-Z0-9_$]+\s*\)\s*
-                    \{\s*return\s+%s\[%d\]\s*\+\s*(?P=argname)\s*\}\s*return\s+[^}]+\}[;\n]
-                    ''' % (re.escape(varname), global_list.index(debug_str)),
-                jscode, 'nsig function name', group='funcname', default=None)
-            if funcname:
-                return funcname
+        if debug_str := traverse_obj(global_list, (lambda _, v: v.endswith('-_w8_'), any)):
+            pattern = r'''(?x)
+                \{\s*return\s+%s\[%d\]\s*\+\s*(?P<argname>[a-zA-Z0-9_$]+)\s*\}
+            ''' % (re.escape(varname), global_list.index(debug_str))
+            if match := re.search(pattern, jscode):
+                pattern = r'''(?x)
+                    \{\s*\)%s\(\s*
+                    (?:
+                        (?P<funcname_a>[a-zA-Z0-9_$]+)\s*noitcnuf\s*
+                        |noitcnuf\s*=\s*(?P<funcname_b>[a-zA-Z0-9_$]+)(?:\s+rav)?
+                    )[;\n]
+                ''' % re.escape(match.group('argname')[::-1])
+                if match := re.search(pattern, jscode[match.start()::-1]):
+                    a, b = match.group('funcname_a', 'funcname_b')
+                    return (a or b)[::-1]

         self.write_debug(join_nonempty(
             'Initial search was unable to find nsig function name',
             player_url and f'         player = {player_url}', delim='\n'), only_once=True)
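
The replacement search runs backwards: slicing `jscode[match.start()::-1]` reverses the source up to the first hit, so `function` reads `noitcnuf` and the identifier to capture sits immediately left of the (reversed) argument list. A toy version of the trick on a made-up player snippet:

    import re

    jscode = 'var Xy2=function(abc){try{return abc}catch(e){}};'
    m = re.search(
        r'\)(?P<arg>[a-zA-Z0-9_$]+)\(noitcnuf\s*=\s*(?P<name>[a-zA-Z0-9_$]+)',
        jscode[::-1])  # search the reversed source
    print(m.group('name')[::-1], m.group('arg')[::-1])  # Xy2 abc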
@@ -2286,8 +2289,8 @@ def _extract_n_function_name(self, jscode, player_url=None):
             rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode,
             f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)]

-    def _extract_player_js_global_var(self, jscode, player_url):
-        """Returns tuple of strings: variable assignment code, variable name, variable value code"""
+    def _interpret_player_js_global_var(self, jscode, player_url):
+        """Returns tuple of: variable name string, variable value list"""
         extract_global_var = self._cached(self._search_regex, 'js global array', player_url)
         varcode, varname, varvalue = extract_global_var(
             r'''(?x)
@@ -2305,27 +2308,23 @@ def _extract_player_js_global_var(self, jscode, player_url):
             self.write_debug(join_nonempty(
                 'No global array variable found in player JS',
                 player_url and f'         player = {player_url}', delim='\n'), only_once=True)
-            return varcode, varname, varvalue
+            return None, None

-    def _interpret_player_js_global_var(self, jscode, player_url):
-        """Returns tuple of: variable name string, variable value list"""
-        _, varname, array_code = self._extract_player_js_global_var(jscode, player_url)
-        jsi = JSInterpreter(array_code)
+        jsi = JSInterpreter(varcode)
         interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url)
-        return varname, interpret_global_var(array_code, {}, allow_recursion=10)
+        return varname, interpret_global_var(varvalue, {}, allow_recursion=10)

     def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url):
-        varcode, varname, _ = self._extract_player_js_global_var(jscode, player_url)
-        if varcode and varname:
-            nsig_code = varcode + '; ' + nsig_code
-            _, global_list = self._interpret_player_js_global_var(jscode, player_url)
+        varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
+        if varname and global_list:
+            nsig_code = f'var {varname}={json.dumps(global_list)}; {nsig_code}'
         else:
             varname = 'dlp_wins'
             global_list = []

         undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+'
         fixed_code = re.sub(
-            rf'''(?x)
+            fr'''(?x)
                 ;\s*if\s*\(\s*typeof\s+[a-zA-Z0-9_$]+\s*===?\s*(?:
                     (["\'])undefined\1|
                     {re.escape(varname)}\[{undefined_idx}\]
@@ -2399,6 +2398,11 @@ def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=F
         return sts

     def _mark_watched(self, video_id, player_responses):
+        # cpn generation algorithm is reverse engineered from base.js.
+        # In fact it works even with dummy cpn.
+        CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
+        cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
+
         for is_full, key in enumerate(('videostatsPlaybackUrl', 'videostatsWatchtimeUrl')):
             label = 'fully ' if is_full else ''
             url = get_first(player_responses, ('playbackTracking', key, 'baseUrl'),
@@ -2409,11 +2413,6 @@ def _mark_watched(self, video_id, player_responses):
             parsed_url = urllib.parse.urlparse(url)
             qs = urllib.parse.parse_qs(parsed_url.query)

-            # cpn generation algorithm is reverse engineered from base.js.
-            # In fact it works even with dummy cpn.
-            CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
-            cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
-
             # # more consistent results setting it to right before the end
             video_length = [str(float((qs.get('len') or ['1.5'])[0]) - 1)]
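
The relocated CPN block deserves a note: the alphabet is exactly 64 characters, so masking the random value with `& 63` turns it into a valid index, and 16 such characters form a 96-bit client playback nonce. Standalone:

    import random

    CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
    assert len(CPN_ALPHABET) == 64

    cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
    print(cpn)  # 16 characters drawn from the 64-symbol alphabet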
@@ -2821,6 +2820,10 @@ def _generate_player_context(cls, sts=None):
             context['signatureTimestamp'] = sts
         return {
             'playbackContext': {
+                'adPlaybackContext': {
+                    'pyv': True,
+                    'adType': 'AD_TYPE_INSTREAM',
+                },
                 'contentPlaybackContext': context,
             },
             **cls._get_checkok_params(),
@@ -2863,7 +2866,8 @@ def _get_config_po_token(self, client: str, context: _PoTokenContext):
             continue

     def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None,
-                       data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None, **kwargs):
+                       data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None,
+                       required=False, **kwargs):
         """
         Fetch a PO Token for a given client and context. This function will validate required parameters for a given context and client.
@@ -2878,6 +2882,7 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
         @param player_url: player URL.
         @param video_id: video ID.
         @param webpage: video webpage.
+        @param required: Whether the PO Token is required (i.e. try to fetch unless policy is "never").
         @param kwargs: Additional arguments to pass down. May be more added in the future.
         @return: The fetched PO Token. None if it could not be fetched.
         """
@@ -2926,6 +2931,7 @@ def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None,
             player_url=player_url,
             video_id=video_id,
             video_webpage=webpage,
+            required=required,
             **kwargs,
         )
@@ -2945,6 +2951,7 @@ def _fetch_po_token(self, client, **kwargs):
             or (
                 fetch_pot_policy == 'auto'
                 and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
+                and not kwargs.get('required', False)
             )
         ):
             return None
@@ -3133,6 +3140,8 @@ def append_client(*client_names):
                     player_url = self._download_player_url(video_id)
                     tried_iframe_fallback = True

+            pr = initial_pr if client == 'web' else None
+
             visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg)
             data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg)

@@ -3147,12 +3156,19 @@ def append_client(*client_names):
                 'ytcfg': player_ytcfg or self._get_default_ytcfg(client),
             }

-            player_po_token = self.fetch_po_token(
+            # Don't need a player PO token for WEB if using player response from webpage
+            player_po_token = None if pr else self.fetch_po_token(
                 context=_PoTokenContext.PLAYER, **fetch_po_token_args)

             gvs_po_token = self.fetch_po_token(
                 context=_PoTokenContext.GVS, **fetch_po_token_args)

+            fetch_subs_po_token_func = functools.partial(
+                self.fetch_po_token,
+                context=_PoTokenContext.SUBS,
+                **fetch_po_token_args,
+            )
+
             required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']

             if (
@@ -3179,7 +3195,6 @@ def append_client(*client_names):
                     only_once=True)
                 deprioritize_pr = True

-            pr = initial_pr if client == 'web' else None
             try:
                 pr = pr or self._extract_player_response(
                     client, video_id,
@@ -3197,10 +3212,13 @@ def append_client(*client_names):
             if pr_id := self._invalid_player_response(pr, video_id):
                 skipped_clients[client] = pr_id
             elif pr:
-                # Save client name for introspection later
-                sd = traverse_obj(pr, ('streamingData', {dict})) or {}
+                # Save client details for introspection later
+                innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT')
+                sd = pr.setdefault('streamingData', {})
                 sd[STREAMING_DATA_CLIENT_NAME] = client
                 sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
+                sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context
+                sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func
                 for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
                     f[STREAMING_DATA_CLIENT_NAME] = client
                     f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
@@ -3262,6 +3280,25 @@ def _report_pot_format_skipped(self, video_id, client_name, proto):
         else:
             self.report_warning(msg, only_once=True)

+    def _report_pot_subtitles_skipped(self, video_id, client_name, msg=None):
+        msg = msg or (
+            f'{video_id}: Some {client_name} client subtitles require a PO Token which was not provided. '
+            'They will be discarded since they are not downloadable as-is. '
+            f'You can manually pass a Subtitles PO Token for this client with '
+            f'--extractor-args "youtube:po_token={client_name}.subs+XXX" . '
+            f'For more information, refer to {PO_TOKEN_GUIDE_URL}')
+
+        subs_wanted = any((
+            self.get_param('writesubtitles'),
+            self.get_param('writeautomaticsub'),
+            self.get_param('listsubtitles')))
+
+        # Only raise a warning for non-default clients, to not confuse users.
+        if not subs_wanted or client_name in (*self._DEFAULT_CLIENTS, *self._DEFAULT_AUTHED_CLIENTS):
+            self.write_debug(msg, only_once=True)
+        else:
+            self.report_warning(msg, only_once=True)
+
     def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
         CHUNK_SIZE = 10 << 20
         PREFERRED_LANG_VALUE = 10
@@ -3365,8 +3402,15 @@ def build_fragments(f):
                         self._decrypt_signature(encrypted_sig, video_id, player_url),
                     )
                 except ExtractorError as e:
-                    self.report_warning('Signature extraction failed: Some formats may be missing',
-                                        video_id=video_id, only_once=True)
+                    self.report_warning(
+                        f'Signature extraction failed: Some formats may be missing\n'
+                        f'         player = {player_url}\n'
+                        f'         {bug_reports_message(before="")}',
+                        video_id=video_id, only_once=True)
+                    self.write_debug(
+                        f'{video_id}: Signature extraction failure info:\n'
+                        f'         encrypted sig = {encrypted_sig}\n'
+                        f'         player = {player_url}')
                     self.write_debug(e, only_once=True)
                     continue
@@ -3512,6 +3556,11 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
                 f['format_note'] = join_nonempty(f.get('format_note'), 'MISSING POT', delim=' ')
                 f['source_preference'] -= 20

+            # XXX: Check if IOS HLS formats are affected by player PO token enforcement; temporary
+            # See https://github.com/yt-dlp/yt-dlp/issues/13511
+            if proto == 'hls' and client_name == 'ios':
+                f['__needs_testing'] = True
+
             itags[itag].add(key)

             if itag and all_formats:
@@ -3553,6 +3602,9 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
                     hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}'
                 fmts, subs = self._extract_m3u8_formats_and_subtitles(
                     hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
+                for sub in traverse_obj(subs, (..., ..., {dict})):
+                    # HLS subs (m3u8) do not need a PO token; save client name for debugging
+                    sub[STREAMING_DATA_CLIENT_NAME] = client_name
                 subtitles = self._merge_subtitles(subs, subtitles)
                 for f in fmts:
                     if process_manifest_format(f, 'hls', client_name, self._search_regex(
@@ -3564,6 +3616,9 @@ def process_manifest_format(f, proto, client_name, itag, po_token):
                 if po_token:
                     dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}'
                 formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
+                for sub in traverse_obj(subs, (..., ..., {dict})):
+                    # TODO: Investigate if DASH subs ever need a PO token; save client name for debugging
+                    sub[STREAMING_DATA_CLIENT_NAME] = client_name
                 subtitles = self._merge_subtitles(subs, subtitles)  # Prioritize HLS subs over DASH
                 for f in formats:
                     if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token):
if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token): if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token):
@@ -3890,47 +3945,85 @@ def is_bad_format(fmt):
                 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'),
         }

-        subtitles = {}
-        pctr = traverse_obj(player_responses, (..., 'captions', 'playerCaptionsTracklistRenderer'), expected_type=dict)
-        if pctr:
-            def get_lang_code(track):
-                return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
-                        or track.get('languageCode'))
-
-            # Converted into dicts to remove duplicates
-            captions = {
-                get_lang_code(sub): sub
-                for sub in traverse_obj(pctr, (..., 'captionTracks', ...))}
-            translation_languages = {
-                lang.get('languageCode'): self._get_text(lang.get('languageName'), max_runs=1)
-                for lang in traverse_obj(pctr, (..., 'translationLanguages', ...))}
-
-            def process_language(container, base_url, lang_code, sub_name, query):
-                lang_subs = container.setdefault(lang_code, [])
-                for fmt in self._SUBTITLE_FORMATS:
-                    query.update({
-                        'fmt': fmt,
-                    })
-                    lang_subs.append({
-                        'ext': fmt,
-                        'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
-                        'name': sub_name,
-                    })
-
-            # NB: Constructing the full subtitle dictionary is slow
-            get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and (
-                self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles'))
-            for lang_code, caption_track in captions.items():
-                base_url = caption_track.get('baseUrl')
-                orig_lang = parse_qs(base_url).get('lang', [None])[-1]
-                if not base_url:
-                    continue
+        def get_lang_code(track):
+            return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
+                    or track.get('languageCode'))
+
+        def process_language(container, base_url, lang_code, sub_name, client_name, query):
+            lang_subs = container.setdefault(lang_code, [])
+            for fmt in self._SUBTITLE_FORMATS:
+                query = {**query, 'fmt': fmt}
+                lang_subs.append({
+                    'ext': fmt,
+                    'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
+                    'name': sub_name,
+                    STREAMING_DATA_CLIENT_NAME: client_name,
+                })
+
+        subtitles = {}
+        skipped_subs_clients = set()
+
+        # Only web/mweb clients provide translationLanguages, so include initial_pr in the traversal
+        translation_languages = {
+            lang['languageCode']: self._get_text(lang['languageName'], max_runs=1)
+            for lang in traverse_obj(player_responses, (
+                ..., 'captions', 'playerCaptionsTracklistRenderer', 'translationLanguages',
+                lambda _, v: v['languageCode'] and v['languageName']))
+        }
+
+        # NB: Constructing the full subtitle dictionary is slow
+        get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and (
+            self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles'))
+
+        # Filter out initial_pr which does not have streamingData (smuggled client context)
+        prs = traverse_obj(player_responses, (
+            lambda _, v: v['streamingData'] and v['captions']['playerCaptionsTracklistRenderer']))
+        all_captions = traverse_obj(prs, (
+            ..., 'captions', 'playerCaptionsTracklistRenderer', 'captionTracks', ..., {dict}))
+        need_subs_langs = {get_lang_code(sub) for sub in all_captions if sub.get('kind') != 'asr'}
+        need_caps_langs = {
+            remove_start(get_lang_code(sub), 'a-')
+            for sub in all_captions if sub.get('kind') == 'asr'}
+
+        for pr in prs:
+            pctr = pr['captions']['playerCaptionsTracklistRenderer']
+            client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME]
+            innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName']
+            required_contexts = self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
+            fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN]
+
+            pot_params = {}
+            already_fetched_pot = False
+
+            for caption_track in traverse_obj(pctr, ('captionTracks', lambda _, v: v['baseUrl'])):
+                base_url = caption_track['baseUrl']
+                qs = parse_qs(base_url)
+                lang_code = get_lang_code(caption_track)
+                requires_pot = (
+                    # We can detect the experiment for now
+                    any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv'))
+                    or _PoTokenContext.SUBS in required_contexts)
+
+                if not already_fetched_pot:
+                    already_fetched_pot = True
+                    if subs_po_token := fetch_subs_po_token_func(required=requires_pot):
+                        pot_params.update({
+                            'pot': subs_po_token,
+                            'potc': '1',
+                            'c': innertube_client_name,
+                        })
+
+                if not pot_params and requires_pot:
+                    skipped_subs_clients.add(client_name)
+                    self._report_pot_subtitles_skipped(video_id, client_name)
+                    break
+
+                orig_lang = qs.get('lang', [None])[-1]
                 lang_name = self._get_text(caption_track, 'name', max_runs=1)
                 if caption_track.get('kind') != 'asr':
                     if not lang_code:
                         continue
                     process_language(
-                        subtitles, base_url, lang_code, lang_name, {})
+                        subtitles, base_url, lang_code, lang_name, client_name, pot_params)
                     if not caption_track.get('isTranslatable'):
                         continue
                     for trans_code, trans_name in translation_languages.items():
@@ -3950,10 +4043,25 @@ def process_language(container, base_url, lang_code, sub_name, query):
                         # Add an "-orig" label to the original language so that it can be distinguished.
                         # The subs are returned without "-orig" as well for compatibility
                         process_language(
-                            automatic_captions, base_url, f'{trans_code}-orig', f'{trans_name} (Original)', {})
+                            automatic_captions, base_url, f'{trans_code}-orig',
+                            f'{trans_name} (Original)', client_name, pot_params)
                         # Setting tlang=lang returns damaged subtitles.
-                        process_language(automatic_captions, base_url, trans_code, trans_name,
-                                         {} if orig_lang == orig_trans_code else {'tlang': trans_code})
+                        process_language(
+                            automatic_captions, base_url, trans_code, trans_name, client_name,
+                            pot_params if orig_lang == orig_trans_code else {'tlang': trans_code, **pot_params})
+
+            # Avoid duplication if we've already got everything we need
+            need_subs_langs.difference_update(subtitles)
+            need_caps_langs.difference_update(automatic_captions)
+            if not (need_subs_langs or need_caps_langs):
+                break
+
+        if skipped_subs_clients and (need_subs_langs or need_caps_langs):
+            self._report_pot_subtitles_skipped(video_id, True, msg=join_nonempty(
+                f'{video_id}: There are missing subtitles languages because a PO token was not provided.',
+                need_subs_langs and f'Subtitles for these languages are missing: {", ".join(need_subs_langs)}.',
+                need_caps_langs and f'Automatic captions for {len(need_caps_langs)} languages are missing.',
+                delim=' '))

         info['automatic_captions'] = automatic_captions
         info['subtitles'] = subtitles
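
`process_language` fans a single caption track out into one entry per subtitle format, with the PO-token parameters shared across all of them. Sketching just the URL construction, assuming `update_url_query` from yt_dlp.utils and an illustrative timedtext path:

    from urllib.parse import urljoin
    from yt_dlp.utils import update_url_query

    SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
    base_url = '/api/timedtext?v=xxxxxxxxxxx&lang=en'
    pot_params = {'pot': 'XXX', 'potc': '1', 'c': 'WEB'}

    for fmt in SUBTITLE_FORMATS:
        # same track, one URL per requested format
        print(urljoin('https://www.youtube.com',
                      update_url_query(base_url, {**pot_params, 'fmt': fmt})))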
@@ -4181,6 +4289,7 @@ def process_language(container, base_url, lang_code, sub_name, query):

         if upload_date and live_status not in ('is_live', 'post_live', 'is_upcoming'):
             # Newly uploaded videos' HLS formats are potentially problematic and need to be checked
+            # XXX: This is redundant for as long as we are already checking all IOS HLS formats
             upload_datetime = datetime_from_str(upload_date).replace(tzinfo=dt.timezone.utc)
             if upload_datetime >= datetime_from_str('today-2days'):
                 for fmt in info['formats']:

View File

@@ -39,6 +39,7 @@
 class PoTokenContext(enum.Enum):
     GVS = 'gvs'
     PLAYER = 'player'
+    SUBS = 'subs'


 @dataclasses.dataclass

View File

@@ -20,6 +20,7 @@
     'WEB_EMBEDDED_PLAYER',
     'WEB_CREATOR',
     'WEB_REMIX',
+    'TVHTML5_SIMPLY',
     'TVHTML5_SIMPLY_EMBEDDED_PLAYER',
 )
@@ -51,7 +52,7 @@ def get_webpo_content_binding(
             return visitor_id, ContentBindingType.VISITOR_ID
         return request.visitor_data, ContentBindingType.VISITOR_DATA

-    elif request.context == PoTokenContext.PLAYER or client_name != 'WEB_REMIX':
+    elif request.context in (PoTokenContext.PLAYER, PoTokenContext.SUBS):
         return request.video_id, ContentBindingType.VIDEO_ID

     return None, None

View File

@@ -6,6 +6,7 @@
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
+    ISO639Utils,
     determine_ext,
     filter_dict,
     float_or_none,
@@ -118,10 +119,7 @@ def _extract_ptmd(self, ptmd_urls, video_id, api_token=None, aspect_ratio=None):
                 if ext == 'm3u8':
                     fmts = self._extract_m3u8_formats(
                         format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
-                elif ext == 'mpd':
-                    fmts = self._extract_mpd_formats(
-                        format_url, video_id, mpd_id='dash', fatal=False)
-                else:
+                elif ext in ('mp4', 'webm'):
                     height = int_or_none(quality.get('highestVerticalResolution'))
                     width = round(aspect_ratio * height) if aspect_ratio and height else None
                     fmts = [{
@@ -132,16 +130,31 @@ def _extract_ptmd(self, ptmd_urls, video_id, api_token=None, aspect_ratio=None):
                         'format_id': join_nonempty('http', stream.get('type')),
                         'tbr': int_or_none(self._search_regex(r'_(\d+)k_', format_url, 'tbr', default=None)),
                     }]
+                else:
+                    self.report_warning(f'Skipping unsupported extension "{ext}"', video_id=video_id)
+                    fmts = []

                 f_class = variant.get('class')
                 for f in fmts:
+                    f_lang = ISO639Utils.short2long(
+                        (f.get('language') or variant.get('language') or '').lower())
+                    is_audio_only = f.get('vcodec') == 'none'
                     formats.append({
                         **f,
-                        'format_id': join_nonempty(f.get('format_id'), is_dgs and 'dgs'),
+                        'format_id': join_nonempty(f['format_id'], is_dgs and 'dgs'),
                         'format_note': join_nonempty(
-                            f_class, is_dgs and 'German Sign Language', f.get('format_note'), delim=', '),
-                        'language': variant.get('language') or f.get('language'),
+                            not is_audio_only and f_class,
+                            is_dgs and 'German Sign Language',
+                            f.get('format_note'), delim=', '),
                         'preference': -2 if is_dgs else -1,
-                        'language_preference': 10 if f_class == 'main' else -10 if f_class == 'ad' else -1,
+                        'language': f_lang,
+                        'language_preference': (
+                            -10 if ((is_audio_only and f.get('format_note') == 'Audiodeskription')
+                                    or (not is_audio_only and f_class == 'ad'))
+                            else 10 if f_lang == 'deu' and f_class == 'main'
+                            else 5 if f_lang == 'deu'
+                            else 1 if f_class == 'main'
+                            else -1),
                     })

         return {
@@ -333,12 +346,13 @@ class ZDFIE(ZDFBaseIE):
             'title': 'Dobrindt schließt Steuererhöhungen aus',
             'description': 'md5:9a117646d7b8df6bc902eb543a9c9023',
             'duration': 325,
-            'thumbnail': 'https://www.zdf.de/assets/dobrindt-csu-berlin-direkt-100~1920x1080?cb=1743357653736',
+            'thumbnail': 'https://www.zdfheute.de/assets/dobrindt-csu-berlin-direkt-100~1920x1080?cb=1743357653736',
             'timestamp': 1743374520,
             'upload_date': '20250330',
             '_old_archive_ids': ['zdf 250330_clip_2_bdi'],
         },
     }, {
+        # FUNK video (hosted on a different CDN, has atypical PTMD and HLS files)
         'url': 'https://www.zdf.de/funk/druck-11790/funk-alles-ist-verzaubert-102.html',
         'md5': '57af4423db0455a3975d2dc4578536bc',
         'info_dict': {
@@ -651,6 +665,7 @@ class ZDFChannelIE(ZDFBaseIE):
             'description': 'md5:6edad39189abf8431795d3d6d7f986b3',
         },
         'playlist_count': 242,
+        'skip': 'Video count changes daily, needs support for playlist_maxcount',
     }]

     _PAGE_SIZE = 24
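
The new `language_preference` ladder in `_extract_ptmd` is easier to audit in isolation. A re-statement of the same expression as a plain function, plus the `ISO639Utils` conversion it depends on:

    from yt_dlp.utils import ISO639Utils

    def language_preference(f_lang, f_class, is_audio_only, format_note=None):
        if ((is_audio_only and format_note == 'Audiodeskription')
                or (not is_audio_only and f_class == 'ad')):
            return -10  # audio description sorts last
        if f_lang == 'deu':
            return 10 if f_class == 'main' else 5
        return 1 if f_class == 'main' else -1

    assert ISO639Utils.short2long('de') == 'deu'
    print(language_preference('deu', 'main', False))  # 10
    print(language_preference('eng', 'ad', False))    # -10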

View File

@@ -590,39 +590,12 @@ def dict_item(key, val):
                 return ret, True
             return ret, False

-        for m in re.finditer(rf'''(?x)
-                (?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
-                (?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
-            var = m.group('var1') or m.group('var2')
-            start, end = m.span()
-            sign = m.group('pre_sign') or m.group('post_sign')
-            ret = local_vars[var]
-            local_vars[var] += 1 if sign[0] == '+' else -1
-            if m.group('pre_sign'):
-                ret = local_vars[var]
-            expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
-
-        if not expr:
-            return None, should_return
-
         m = re.match(fr'''(?x)
-            (?P<assign>
-                (?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*
-                (?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
-                =(?!=)(?P<expr>.*)$
-            )|(?P<return>
-                (?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
-            )|(?P<attribute>
-                (?P<var>{_NAME_RE})(?:
-                    (?P<nullish>\?)?\.(?P<member>[^(]+)|
-                    \[(?P<member2>{_NESTED_BRACKETS})\]
-                )\s*
-            )|(?P<indexing>
-                (?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
-            )|(?P<function>
-                (?P<fname>{_NAME_RE})\((?P<args>.*)\)$
-            )''', expr)
-        if m and m.group('assign'):
+            (?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*
+            (?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
+            =(?!=)(?P<expr>.*)$
+            ''', expr)
+        if m:  # We are assigning a value to a variable
             left_val = local_vars.get(m.group('out'))

             if not m.group('index'):
@@ -640,7 +613,35 @@ def dict_item(key, val):
                 m.group('op'), self._index(left_val, idx), m.group('expr'), expr, local_vars, allow_recursion)
             return left_val[idx], should_return

-        elif expr.isdigit():
+        for m in re.finditer(rf'''(?x)
+                (?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
+                (?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
+            var = m.group('var1') or m.group('var2')
+            start, end = m.span()
+            sign = m.group('pre_sign') or m.group('post_sign')
+            ret = local_vars[var]
+            local_vars[var] += 1 if sign[0] == '+' else -1
+            if m.group('pre_sign'):
+                ret = local_vars[var]
+            expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
+
+        if not expr:
+            return None, should_return
+
+        m = re.match(fr'''(?x)
+            (?P<return>
+                (?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
+            )|(?P<attribute>
+                (?P<var>{_NAME_RE})(?:
+                    (?P<nullish>\?)?\.(?P<member>[^(]+)|
+                    \[(?P<member2>{_NESTED_BRACKETS})\]
+                )\s*
+            )|(?P<indexing>
+                (?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
+            )|(?P<function>
+                (?P<fname>{_NAME_RE})\((?P<args>.*)\)$
+            )''', expr)
+
+        if expr.isdigit():
             return int(expr), should_return

         elif expr == 'break':
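
The reordering matters because an assignment like `b = a++` must be matched as an assignment first; only afterwards are the `++`/`--` operators folded into their current values. Observable behavior, assuming `JSInterpreter.call_function` as the entry point:

    from yt_dlp.jsinterp import JSInterpreter

    jsi = JSInterpreter('function f(){var a = 1; var b = a++; var c = ++a; return [a, b, c];}')
    # a++ yields the old value (1), then a becomes 2; ++a bumps a to 3 and yields 3
    print(jsi.call_function('f'))  # [3, 1, 3]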

View File

@@ -230,6 +230,9 @@ def format_option_help(self, formatter=None):
         formatter.indent()
         heading = formatter.format_heading('Preset Aliases')
         formatter.indent()
+        description = formatter.format_description(
+            'Predefined aliases for convenience and ease of use. Note that future versions of yt-dlp '
+            'may add or adjust presets, but the existing preset names will not be changed or removed')
         result = []
         for name, args in _PRESET_ALIASES.items():
             option = optparse.Option('-t', help=shlex.join(args))
@@ -238,7 +241,7 @@ def format_option_help(self, formatter=None):
         formatter.dedent()
         formatter.dedent()
         help_lines = '\n'.join(result)
-        return f'{formatted_help}\n{heading}{help_lines}'
+        return f'{formatted_help}\n{heading}{description}\n{help_lines}'


 def create_parser():
@@ -470,7 +473,7 @@ def _preset_alias_callback(option, opt_str, value, parser):
     general.add_option(
         '--live-from-start',
         action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently only supported for YouTube (experimental) and Twitch')
+        help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
     general.add_option(
         '--no-live-from-start',
         action='store_false', dest='live_from_start',
@@ -545,9 +548,9 @@ def _preset_alias_callback(option, opt_str, value, parser):
         help=(
             'Create aliases for an option string. Unless an alias starts with a dash "-", it is prefixed with "--". '
             'Arguments are parsed according to the Python string formatting mini-language. '
-            'E.g. --alias get-audio,-X "-S=aext:{0},abr -x --audio-format {0}" creates options '
+            'E.g. --alias get-audio,-X "-S aext:{0},abr -x --audio-format {0}" creates options '
             '"--get-audio" and "-X" that takes an argument (ARG0) and expands to '
-            '"-S=aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
+            '"-S aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
             'Alias options can trigger more aliases; so be careful to avoid defining recursive options. '
             f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. '
             'This option can be used multiple times'))

View File

@@ -0,0 +1 @@
# Utility functions for handling web input based on commonly used JavaScript libraries

View File

@ -0,0 +1,167 @@
from __future__ import annotations
import array
import base64
import datetime as dt
import math
import re
from .._utils import parse_iso8601
TYPE_CHECKING = False
if TYPE_CHECKING:
import collections.abc
import typing
T = typing.TypeVar('T')
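# maps devalue's JS TypedArray/ArrayBuffer tags to array.array typecodes of
# matching signedness; typed-array payloads arrive base64-encoded (see below)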
_ARRAY_TYPE_LOOKUP = {
'Int8Array': 'b',
'Uint8Array': 'B',
'Uint8ClampedArray': 'B',
'Int16Array': 'h',
'Uint16Array': 'H',
'Int32Array': 'i',
'Uint32Array': 'I',
'Float32Array': 'f',
'Float64Array': 'd',
'BigInt64Array': 'l',
'BigUint64Array': 'L',
'ArrayBuffer': 'B',
}
def parse_iter(parsed: typing.Any, /, *, revivers: dict[str, collections.abc.Callable[[list], typing.Any]] | None = None):
# based on https://github.com/Rich-Harris/devalue/blob/f3fd2aa93d79f21746555671f955a897335edb1b/src/parse.js
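# negative integers are devalue sentinels: -1 undefined, -2 hole, -3 NaN,
# -4 Infinity, -5 -Infinity, -6 negative zero (undefined/hole both map to None)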
resolved = {
-1: None,
-2: None,
-3: math.nan,
-4: math.inf,
-5: -math.inf,
-6: -0.0,
}
if isinstance(parsed, int) and not isinstance(parsed, bool):
if parsed not in resolved or parsed == -2:
raise ValueError('invalid integer input')
return resolved[parsed]
elif not isinstance(parsed, list):
raise ValueError('expected int or list as input')
elif not parsed:
raise ValueError('expected a non-empty list as input')
if revivers is None:
revivers = {}
return_value = [None]
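# each stack entry is (target_container, key_or_index, source); source is an
# index into `parsed`, or a (tag, index, reviver) triple while a reviver is pending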
stack: list[tuple] = [(return_value, 0, 0)]
while stack:
target, index, source = stack.pop()
if isinstance(source, tuple):
name, source, reviver = source
try:
resolved[source] = target[index] = reviver(target[index])
except Exception as error:
yield TypeError(f'failed to parse {source} as {name!r}: {error}')
resolved[source] = target[index] = None
continue
if source in resolved:
target[index] = resolved[source]
continue
# guard against Python negative indexing
if source < 0:
yield IndexError(f'invalid index: {source!r}')
continue
try:
value = parsed[source]
except IndexError as error:
yield error
continue
if isinstance(value, list):
if value and isinstance(value[0], str):
# TODO: use zip(..., strict=True) for the pairwise iterations below once the minimum Python version allows
if reviver := revivers.get(value[0]):
if value[1] == source:
# XXX: avoid infinite loop
yield IndexError(f'{value[0]!r} cannot point to itself (index: {source})')
continue
# inverse order: resolve index, revive value
stack.append((target, index, (value[0], value[1], reviver)))
stack.append((target, index, value[1]))
continue
elif value[0] == 'Date':
try:
result = dt.datetime.fromtimestamp(parse_iso8601(value[1]), tz=dt.timezone.utc)
except Exception:
yield ValueError(f'invalid date: {value[1]!r}')
result = None
elif value[0] == 'Set':
result = [None] * (len(value) - 1)
for offset, new_source in enumerate(value[1:]):
stack.append((result, offset, new_source))
elif value[0] == 'Map':
result = []
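# value[1:] is a flat [key_idx, value_idx, ...] sequence; zipping a shared
# iterator with itself pairs it up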
for key, new_source in zip(*(iter(value[1:]),) * 2):
pair = [None, None]
stack.append((pair, 0, key))
stack.append((pair, 1, new_source))
result.append(pair)
elif value[0] == 'RegExp':
# XXX: use jsinterp to translate regex flags
# currently ignores `value[2]`
result = re.compile(value[1])
elif value[0] == 'Object':
result = value[1]
elif value[0] == 'BigInt':
result = int(value[1])
elif value[0] == 'null':
result = {}
for key, new_source in zip(*(iter(value[1:]),) * 2):
stack.append((result, key, new_source))
elif value[0] in _ARRAY_TYPE_LOOKUP:
typecode = _ARRAY_TYPE_LOOKUP[value[0]]
data = base64.b64decode(value[1])
result = array.array(typecode, data).tolist()
else:
yield TypeError(f'invalid type at {source}: {value[0]!r}')
result = None
else:
result = len(value) * [None]
for offset, new_source in enumerate(value):
stack.append((result, offset, new_source))
elif isinstance(value, dict):
result = {}
for key, new_source in value.items():
stack.append((result, key, new_source))
else:
result = value
target[index] = resolved[source] = result
return return_value[0]
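# parse() drains parse_iter and raises the first error it yields; call
# parse_iter directly instead to log and skip malformed nodes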
def parse(parsed: typing.Any, /, *, revivers: dict[str, collections.abc.Callable[[typing.Any], typing.Any]] | None = None):
generator = parse_iter(parsed, revivers=revivers)
while True:
try:
raise generator.send(None)
except StopIteration as error:
return error.value
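
For orientation, a minimal sketch of how this module might be driven. The import path follows the new file layout; the payload literals, field names, and the `Duration` reviver are illustrative assumptions, not taken from this commit:

import json

from yt_dlp.utils.jslib import devalue

# index 0 is the root; other entries are referenced by index
payload = json.loads('[{"title": 1, "released": 2}, "some video", ["Date", "2025-06-25T00:00:00.000Z"]]')
info = devalue.parse(payload)
# info == {'title': 'some video',
#          'released': dt.datetime(2025, 6, 25, tzinfo=dt.timezone.utc)}

# custom tags are resolved through revivers; each reviver receives the
# already-parsed inner value (hypothetical 'Duration' tag for illustration)
payload = json.loads('[["Duration", 1], 90]')
duration = devalue.parse(payload, revivers={'Duration': lambda secs: float(secs)})
# duration == 90.0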

View File

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2025.04.30' __version__ = '2025.06.25'
RELEASE_GIT_HEAD = '505b400795af557bdcfd9d4fa7e9133b26ef431c' RELEASE_GIT_HEAD = '1838a1ce5d4ade80770ba9162eaffc9a1607dc70'
VARIANT = None VARIANT = None
@ -12,4 +12,4 @@
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2025.04.30' _pkg_version = '2025.06.25'