1
0
mirror of https://github.com/yt-dlp/yt-dlp.git synced 2025-07-09 23:08:32 +00:00

Merge remote-tracking branch 'upstream/master' into wait-retries

This commit is contained in:
Paul Storkman 2025-03-14 23:30:13 +01:00
commit 65e217714d
106 changed files with 7106 additions and 5172 deletions

View File

@ -2,13 +2,11 @@ name: Broken site support
description: Report issue with yt-dlp on a supported site
labels: [triage, site-bug]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -24,9 +22,7 @@ body:
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input
@ -47,6 +43,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -78,11 +76,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -2,13 +2,11 @@ name: Site support request
description: Request support for a new site
labels: [triage, site-request]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -24,9 +22,7 @@ body:
required: true
- label: I've checked that none of provided URLs [violate any copyrights](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy) or contain any [DRM](https://en.wikipedia.org/wiki/Digital_rights_management) to the best of my knowledge
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and am willing to share it if required
- type: input
@ -59,6 +55,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -90,11 +88,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -1,14 +1,12 @@
name: Site feature request
description: Request a new functionality for a supported site
description: Request new functionality for a site supported by yt-dlp
labels: [triage, site-enhancement]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -22,9 +20,7 @@ body:
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input
@ -55,6 +51,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -86,11 +84,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -2,13 +2,11 @@ name: Core bug report
description: Report a bug unrelated to any particular site or extractor
labels: [triage, bug]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -20,13 +18,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description
@ -40,6 +32,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -71,11 +65,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -1,14 +1,12 @@
name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
description: Request a new feature unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -22,9 +20,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description
@ -38,6 +34,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
- label: "If using API, add `'verbose': True` to `YoutubeDL` params instead"
@ -65,11 +63,3 @@ body:
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
<more lines>
render: shell
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -1,14 +1,12 @@
name: Ask question
description: Ask yt-dlp related question
description: Ask a question about using yt-dlp
labels: [question]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: markdown
attributes:
value: |
@ -28,9 +26,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: question
@ -44,6 +40,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
- label: "If using API, add `'verbose': True` to `YoutubeDL` params instead"
@ -71,11 +69,3 @@ body:
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
<more lines>
render: shell
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View File

@ -1,8 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: Get help from the community on Discord
- name: Get help on Discord
url: https://discord.gg/H5MNcFW63r
about: Join the yt-dlp Discord for community-powered support!
- name: Matrix Bridge to the Discord server
url: https://matrix.to/#/#yt-dlp:matrix.org
about: For those who do not want to use Discord
about: Join the yt-dlp Discord server for support and discussion

View File

@ -18,9 +18,7 @@ body:
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input

View File

@ -18,9 +18,7 @@ body:
required: true
- label: I've checked that none of provided URLs [violate any copyrights](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy) or contain any [DRM](https://en.wikipedia.org/wiki/Digital_rights_management) to the best of my knowledge
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and am willing to share it if required
- type: input

View File

@ -1,5 +1,5 @@
name: Site feature request
description: Request a new functionality for a supported site
description: Request new functionality for a site supported by yt-dlp
labels: [triage, site-enhancement]
body:
%(no_skip)s
@ -16,9 +16,7 @@ body:
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input

View File

@ -14,13 +14,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description

View File

@ -1,5 +1,5 @@
name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
description: Request a new feature unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
%(no_skip)s
@ -16,9 +16,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description

View File

@ -1,5 +1,5 @@
name: Ask question
description: Ask yt-dlp related question
description: Ask a question about using yt-dlp
labels: [question]
body:
%(no_skip)s
@ -22,9 +22,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: question

View File

@ -1,14 +1,17 @@
**IMPORTANT**: PRs without the template will be CLOSED
<!--
**IMPORTANT**: PRs without the template will be CLOSED
Due to the high volume of pull requests, it may be a while before your PR is reviewed.
Please try to keep your pull request focused on a single bugfix or new feature.
Pull requests with a vast scope and/or very large diff will take much longer to review.
It is recommended for new contributors to stick to smaller pull requests, so you can receive much more immediate feedback as you familiarize yourself with the codebase.
PLEASE AVOID FORCE-PUSHING after opening a PR, as it makes reviewing more difficult.
-->
### Description of your *pull request* and other information
<!--
Explanation of your *pull request* in arbitrary form goes here. Please **make sure the description explains the purpose and effect** of your *pull request* and is worded well enough to be understood. Provide as much **context and examples** as possible
-->
ADD DESCRIPTION HERE
ADD DETAILED DESCRIPTION HERE
Fixes #
@ -16,24 +19,22 @@ ### Description of your *pull request* and other information
<details open><summary>Template</summary> <!-- OPEN is intentional -->
<!--
# PLEASE FOLLOW THE GUIDE BELOW
# PLEASE FOLLOW THE GUIDE BELOW
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes `[ ]` relevant to your *pull request* (like [x])
- Use *Preview* tab to see how your *pull request* will actually look like
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes `[ ]` relevant to your *pull request* (like [x])
- Use *Preview* tab to see what your *pull request* will actually look like
-->
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [contributing guidelines](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#developer-instructions) including [yt-dlp coding conventions](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#yt-dlp-coding-conventions)
- [ ] [Searched](https://github.com/yt-dlp/yt-dlp/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check all of the following options that apply:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check those that apply and remove the others:
- [ ] I am the original author of the code in this PR, and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of the code in this PR, but it is in the public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
### What is the purpose of your *pull request*? Check those that apply and remove the others:
- [ ] Fix or improvement to an extractor (Make sure to add/update tests)
- [ ] New extractor ([Piracy websites will not be accepted](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy))
- [ ] Core bug fix/improvement

View File

@ -33,7 +33,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@ -47,7 +47,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3
# Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
@ -60,6 +60,6 @@ jobs:
# ./location_of_script_within_repo/buildscript.sh
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
with:
category: "/language:${{matrix.language}}"

View File

@ -736,3 +736,9 @@ NecroRomnt
pjrobertson
subsense
test20140
arantius
entourage8
lfavole
mp3butcher
slipinthedove
YoshiTabletopGamer

View File

@ -4,6 +4,49 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.02.19
#### Core changes
- **jsinterp**
- [Add `js_number_to_string`](https://github.com/yt-dlp/yt-dlp/commit/0d9f061d38c3a4da61972e2adad317079f2f1c84) ([#12110](https://github.com/yt-dlp/yt-dlp/issues/12110)) by [Grub4K](https://github.com/Grub4K)
- [Improve zeroise](https://github.com/yt-dlp/yt-dlp/commit/4ca8c44a073d5aa3a3e3112c35b2b23d6ce25ac6) ([#12313](https://github.com/yt-dlp/yt-dlp/issues/12313)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **acast**: [Support shows.acast.com URLs](https://github.com/yt-dlp/yt-dlp/commit/57c717fee4bfbc9309845bbb48901b72e4b69304) ([#12223](https://github.com/yt-dlp/yt-dlp/issues/12223)) by [barsnick](https://github.com/barsnick)
- **cwtv**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/18a28514e306e822eab4f3a79c76d515bf076406) ([#12207](https://github.com/yt-dlp/yt-dlp/issues/12207)) by [arantius](https://github.com/arantius)
- movie: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/03c3d705778c07739e0034b51490877cffdc0983) ([#12227](https://github.com/yt-dlp/yt-dlp/issues/12227)) by [bashonly](https://github.com/bashonly)
- **digiview**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/f53553087d3fde9dcd61d6e9f98caf09db1d8ef2) ([#9902](https://github.com/yt-dlp/yt-dlp/issues/9902)) by [lfavole](https://github.com/lfavole)
- **dropbox**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/861aeec449c8f3c062d962945b234ff0341f61f3) ([#12228](https://github.com/yt-dlp/yt-dlp/issues/12228)) by [bashonly](https://github.com/bashonly)
- **francetv**
- site
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/817483ccc68aed6049ed9c4a2ffae44ca82d2b1c) ([#12236](https://github.com/yt-dlp/yt-dlp/issues/12236)) by [bashonly](https://github.com/bashonly)
- [Fix livestream extraction](https://github.com/yt-dlp/yt-dlp/commit/1295bbedd45fa8d9bc3f7a194864ae280297848e) ([#12316](https://github.com/yt-dlp/yt-dlp/issues/12316)) by [bashonly](https://github.com/bashonly)
- **francetvinfo.fr**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/5c4c2ddfaa47988b4d50c1ad4988badc0b4f30c2) ([#12402](https://github.com/yt-dlp/yt-dlp/issues/12402)) by [bashonly](https://github.com/bashonly)
- **gem.cbc.ca**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/5271ef48c6f61c145e03e18e960995d2e651d205) ([#12404](https://github.com/yt-dlp/yt-dlp/issues/12404)) by [bashonly](https://github.com/bashonly), [dirkf](https://github.com/dirkf)
- **generic**: [Extract `live_status` for DASH manifest URLs](https://github.com/yt-dlp/yt-dlp/commit/19edaa44fcd375f54e63d6227b092f5252d3e889) ([#12256](https://github.com/yt-dlp/yt-dlp/issues/12256)) by [mp3butcher](https://github.com/mp3butcher)
- **globo**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f8d0161455f00add65585ca1a476a7b5d56f5f96) ([#11795](https://github.com/yt-dlp/yt-dlp/issues/11795)) by [slipinthedove](https://github.com/slipinthedove), [YoshiTabletopGamer](https://github.com/YoshiTabletopGamer)
- **goplay**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/d59f14a0a7a8b55e6bf468237def62b73ab4a517) ([#12237](https://github.com/yt-dlp/yt-dlp/issues/12237)) by [alard](https://github.com/alard)
- **pbs**: [Support www.thirteen.org URLs](https://github.com/yt-dlp/yt-dlp/commit/9fb8ab2ff67fb699f60cce09163a580976e90c0e) ([#11191](https://github.com/yt-dlp/yt-dlp/issues/11191)) by [rohieb](https://github.com/rohieb)
- **reddit**: [Bypass gated subreddit warning](https://github.com/yt-dlp/yt-dlp/commit/6ca23ffaa4663cb552f937f0b1e9769b66db11bd) ([#12335](https://github.com/yt-dlp/yt-dlp/issues/12335)) by [bashonly](https://github.com/bashonly)
- **twitter**: [Fix syndication token generation](https://github.com/yt-dlp/yt-dlp/commit/14cd7f3443c6da4d49edaefcc12da9dee86e243e) ([#12107](https://github.com/yt-dlp/yt-dlp/issues/12107)) by [Grub4K](https://github.com/Grub4K), [pjrobertson](https://github.com/pjrobertson)
- **youtube**
- [Retry on more critical requests](https://github.com/yt-dlp/yt-dlp/commit/d48e612609d012abbea3785be4d26d78a014abb2) ([#12339](https://github.com/yt-dlp/yt-dlp/issues/12339)) by [coletdjnz](https://github.com/coletdjnz)
- [nsig workaround for `tce` player JS](https://github.com/yt-dlp/yt-dlp/commit/ec17fb16e8d69d4e3e10fb73bf3221be8570dfee) ([#12401](https://github.com/yt-dlp/yt-dlp/issues/12401)) by [bashonly](https://github.com/bashonly)
- **zdf**: [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/241ace4f104d50fdf7638f9203927aefcf57a1f7) ([#9565](https://github.com/yt-dlp/yt-dlp/issues/9565)) by [StefanLobbenmeier](https://github.com/StefanLobbenmeier) (With fixes in [e7882b6](https://github.com/yt-dlp/yt-dlp/commit/e7882b682b959e476d8454911655b3e9b14c79b2) by [bashonly](https://github.com/bashonly))
#### Downloader changes
- **hls**
- [Fix `BYTERANGE` logic](https://github.com/yt-dlp/yt-dlp/commit/10b7ff68e98f17655e31952f6e17120b2d7dda96) ([#11972](https://github.com/yt-dlp/yt-dlp/issues/11972)) by [entourage8](https://github.com/entourage8)
- [Support `--write-pages` for m3u8 media playlists](https://github.com/yt-dlp/yt-dlp/commit/be69468752ff598cacee57bb80533deab2367a5d) ([#12333](https://github.com/yt-dlp/yt-dlp/issues/12333)) by [bashonly](https://github.com/bashonly)
- [Support `hls_media_playlist_data` format field](https://github.com/yt-dlp/yt-dlp/commit/c987be0acb6872c6561f28aa28171e803393d851) ([#12322](https://github.com/yt-dlp/yt-dlp/issues/12322)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- [Improve Issue/PR templates](https://github.com/yt-dlp/yt-dlp/commit/517ddf3c3f12560ab93e3d36244dc82db9f97818) ([#11499](https://github.com/yt-dlp/yt-dlp/issues/11499)) by [seproDev](https://github.com/seproDev) (With fixes in [4ecb833](https://github.com/yt-dlp/yt-dlp/commit/4ecb833472c90e078567b561fb7c089f1aa9587b) by [bashonly](https://github.com/bashonly))
- **cleanup**: Miscellaneous: [4985a40](https://github.com/yt-dlp/yt-dlp/commit/4985a4041770eaa0016271809a1fd950dc809a55) by [dirkf](https://github.com/dirkf), [Grub4K](https://github.com/Grub4K), [StefanLobbenmeier](https://github.com/StefanLobbenmeier)
- **docs**: [Add note to `supportedsites.md`](https://github.com/yt-dlp/yt-dlp/commit/01a63629a21781458dcbd38779898e117678f5ff) ([#12382](https://github.com/yt-dlp/yt-dlp/issues/12382)) by [seproDev](https://github.com/seproDev)
- **test**: download: [Validate and sort info dict fields](https://github.com/yt-dlp/yt-dlp/commit/208163447408c78673b08c172beafe5c310fb167) ([#12299](https://github.com/yt-dlp/yt-dlp/issues/12299)) by [bashonly](https://github.com/bashonly), [pzhlkj6612](https://github.com/pzhlkj6612)
### 2025.01.26
#### Core changes

View File

@ -6,7 +6,6 @@
[![Release version](https://img.shields.io/github/v/release/yt-dlp/yt-dlp?color=brightgreen&label=Download&style=for-the-badge)](#installation "Installation")
[![PyPI](https://img.shields.io/badge/-PyPI-blue.svg?logo=pypi&labelColor=555555&style=for-the-badge)](https://pypi.org/project/yt-dlp "PyPI")
[![Donate](https://img.shields.io/badge/_-Donate-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](Collaborators.md#collaborators "Donate")
[![Matrix](https://img.shields.io/matrix/yt-dlp:matrix.org?color=brightgreen&labelColor=555555&label=&logo=element&style=for-the-badge)](https://matrix.to/#/#yt-dlp:matrix.org "Matrix")
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&labelColor=555555&label=&logo=discord&style=for-the-badge)](https://discord.gg/H5MNcFW63r "Discord")
[![Supported Sites](https://img.shields.io/badge/-Supported_Sites-brightgreen.svg?style=for-the-badge)](supportedsites.md "Supported Sites")
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE "License")
@ -338,10 +337,11 @@ ## General Options:
--plugin-dirs PATH Path to an additional directory to search
for plugins. This option can be used
multiple times to add multiple directories.
Note that this currently only works for
extractor plugins; postprocessor plugins can
only be loaded from the default plugin
directories
Use "default" to search the default plugin
directories (default)
--no-plugin-dirs Clear plugin directories to search,
including defaults and those provided by
previous --plugin-dirs
--flat-playlist Do not extract a playlist's URL result
entries; some entry metadata may be missing
and downloading may be bypassed
@ -1530,7 +1530,7 @@ ## Sorting Formats
- `hasvid`: Gives priority to formats that have a video stream
- `hasaud`: Gives priority to formats that have an audio stream
- `ie_pref`: The format preference
- `lang`: The language preference
- `lang`: The language preference as determined by the extractor (e.g. original language preferred over audio description)
- `quality`: The quality of the format
- `source`: The preference of the source
- `proto`: Protocol used for download (`https`/`ftps` > `http`/`ftp` > `m3u8_native`/`m3u8` > `http_dash_segments`> `websocket_frag` > `mms`/`rtsp` > `f4f`/`f4m`)
@ -1816,6 +1816,9 @@ #### hotstar
* `vcodec`: vcodec to ignore - one or more of `h264`, `h265`, `dvh265`
* `dr`: dynamic range to ignore - one or more of `sdr`, `hdr10`, `dv`
#### instagram
* `app_id`: The value of the `X-IG-App-ID` header used for API requests. Default is the web app ID, `936619743392459`
#### niconicochannelplus
* `max_comments`: Maximum number of comments to extract - default is `120`

View File

@ -11,11 +11,13 @@
from devscripts.utils import get_filename_args, read_file, write_file
VERBOSE_TMPL = '''
VERBOSE = '''
- type: checkboxes
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -47,31 +49,23 @@
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
'''.strip()
NO_SKIP = '''
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
'''.strip()
def main():
fields = {'no_skip': NO_SKIP}
fields['verbose'] = VERBOSE_TMPL % fields
fields['verbose_optional'] = re.sub(r'(\n\s+validations:)?\n\s+required: true', '', fields['verbose'])
fields = {
'no_skip': NO_SKIP,
'verbose': VERBOSE,
'verbose_optional': re.sub(r'(\n\s+validations:)?\n\s+required: true', '', VERBOSE),
}
infile, outfile = get_filename_args(has_infile=True)
write_file(outfile, read_file(infile) % fields)

View File

@ -10,6 +10,9 @@
from inspect import getsource
from devscripts.utils import get_filename_args, read_file, write_file
from yt_dlp.extractor import import_extractors
from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
from yt_dlp.globals import extractors
NO_ATTR = object()
STATIC_CLASS_PROPERTIES = [
@ -38,8 +41,7 @@ def main():
lazy_extractors_filename = get_filename_args(default_outfile='yt_dlp/extractor/lazy_extractors.py')
from yt_dlp.extractor.extractors import _ALL_CLASSES
from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
import_extractors()
DummyInfoExtractor = type('InfoExtractor', (InfoExtractor,), {'IE_NAME': NO_ATTR})
module_src = '\n'.join((
@ -47,7 +49,7 @@ def main():
' _module = None',
*extra_ie_code(DummyInfoExtractor),
'\nclass LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n',
*build_ies(_ALL_CLASSES, (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
*build_ies(list(extractors.value.values()), (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
))
write_file(lazy_extractors_filename, f'{module_src}\n')
@ -73,7 +75,7 @@ def build_ies(ies, bases, attr_base):
if ie in ies:
names.append(ie.__name__)
yield f'\n_ALL_CLASSES = [{", ".join(names)}]'
yield '\n_CLASS_LOOKUP = {%s}' % ', '.join(f'{name!r}: {name}' for name in names)
def sort_ies(ies, ignored_bases):

View File

@ -10,10 +10,21 @@
from devscripts.utils import get_filename_args, write_file
from yt_dlp.extractor import list_extractor_classes
TEMPLATE = '''\
# Supported sites
Below is a list of all extractors that are currently included with yt-dlp.
If a site is not listed here, it might still be supported by yt-dlp's embed extraction or generic extractor.
Not all sites listed here are guaranteed to work; websites are constantly changing and sometimes this breaks yt-dlp's support for them.
The only reliable way to check if a site is supported is to try it.
{ie_list}
'''
def main():
out = '\n'.join(ie.description() for ie in list_extractor_classes() if ie.IE_DESC is not False)
write_file(get_filename_args(), f'# Supported sites\n{out}\n')
write_file(get_filename_args(), TEMPLATE.format(ie_list=out))
if __name__ == '__main__':

View File

@ -25,7 +25,8 @@ def parse_args():
def run_tests(*tests, pattern=None, ci=False):
run_core = 'core' in tests or (not pattern and not tests)
# XXX: hatch uses `tests` if no arguments are passed
run_core = 'core' in tests or 'tests' in tests or (not pattern and not tests)
run_download = 'download' in tests
pytest_args = args.pytest_args or os.getenv('HATCH_TEST_ARGS', '')

View File

@ -384,6 +384,7 @@ select = [
"W391",
"W504",
]
exclude = "*/extractor/lazy_extractors.py,*venv*,*/test/testdata/sigs/player-*.js,.idea,.vscode"
[tool.pytest.ini_options]
addopts = "-ra -v --strict-markers"

View File

@ -1,4 +1,10 @@
# Supported sites
Below is a list of all extractors that are currently included with yt-dlp.
If a site is not listed here, it might still be supported by yt-dlp's embed extraction or generic extractor.
Not all sites listed here are guaranteed to work; websites are constantly changing and sometimes this breaks yt-dlp's support for them.
The only reliable way to check if a site is supported is to try it.
- **17live**
- **17live:clip**
- **1News**: 1news.co.nz article videos
@ -314,7 +320,8 @@ # Supported sites
- **curiositystream**: [*curiositystream*](## "netrc machine")
- **curiositystream:collections**: [*curiositystream*](## "netrc machine")
- **curiositystream:series**: [*curiositystream*](## "netrc machine")
- **CWTV**
- **cwtv**
- **cwtv:movie**
- **Cybrary**: [*cybrary*](## "netrc machine")
- **CybraryCourse**: [*cybrary*](## "netrc machine")
- **DacastPlaylist**
@ -349,6 +356,7 @@ # Supported sites
- **DigitalConcertHall**: [*digitalconcerthall*](## "netrc machine") DigitalConcertHall extractor
- **DigitallySpeaking**
- **Digiteka**
- **Digiview**
- **DiscogsReleasePlaylist**
- **DiscoveryLife**
- **DiscoveryNetworksDe**
@ -465,9 +473,9 @@ # Supported sites
- **fptplay**: fptplay.vn
- **FranceCulture**
- **FranceInter**
- **FranceTV**
- **francetv**
- **francetv:site**
- **francetvinfo.fr**
- **FranceTVSite**
- **Freesound**
- **freespeech.org**
- **freetv:series**
@ -499,7 +507,7 @@ # Supported sites
- **GediDigital**
- **gem.cbc.ca**: [*cbcgem*](## "netrc machine")
- **gem.cbc.ca:live**
- **gem.cbc.ca:playlist**
- **gem.cbc.ca:playlist**: [*cbcgem*](## "netrc machine")
- **Genius**
- **GeniusLyrics**
- **Germanupa**: germanupa.de

View File

@ -101,87 +101,109 @@ def getwebpagetestcases():
md5 = lambda s: hashlib.md5(s.encode()).hexdigest()
def expect_value(self, got, expected, field):
if isinstance(expected, str) and expected.startswith('re:'):
match_str = expected[len('re:'):]
match_rex = re.compile(match_str)
def _iter_differences(got, expected, field):
if isinstance(expected, str):
op, _, val = expected.partition(':')
if op in ('mincount', 'maxcount', 'count'):
if not isinstance(got, (list, dict)):
yield field, f'expected either {list.__name__} or {dict.__name__}, got {type(got).__name__}'
return
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
match_rex.match(got),
f'field {field} (value: {got!r}) should match {match_str!r}')
elif isinstance(expected, str) and expected.startswith('startswith:'):
start_str = expected[len('startswith:'):]
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
got.startswith(start_str),
f'field {field} (value: {got!r}) should start with {start_str!r}')
elif isinstance(expected, str) and expected.startswith('contains:'):
contains_str = expected[len('contains:'):]
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
contains_str in got,
f'field {field} (value: {got!r}) should contain {contains_str!r}')
elif isinstance(expected, type):
self.assertTrue(
isinstance(got, expected),
f'Expected type {expected!r} for field {field}, but got value {got!r} of type {type(got)!r}')
elif isinstance(expected, dict) and isinstance(got, dict):
expect_dict(self, got, expected)
elif isinstance(expected, list) and isinstance(got, list):
self.assertEqual(
len(expected), len(got),
f'Expect a list of length {len(expected)}, but got a list of length {len(got)} for field {field}')
for index, (item_got, item_expected) in enumerate(zip(got, expected)):
type_got = type(item_got)
type_expected = type(item_expected)
self.assertEqual(
type_expected, type_got,
f'Type mismatch for list item at index {index} for field {field}, '
f'expected {type_expected!r}, got {type_got!r}')
expect_value(self, item_got, item_expected, field)
else:
if isinstance(expected, str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, str),
f'Expected field {field} to be a unicode object, but got value {got!r} of type {type(got)!r}')
got = 'md5:' + md5(got)
elif isinstance(expected, str) and re.match(r'^(?:min|max)?count:\d+', expected):
self.assertTrue(
isinstance(got, (list, dict)),
f'Expected field {field} to be a list or a dict, but it is of type {type(got).__name__}')
op, _, expected_num = expected.partition(':')
expected_num = int(expected_num)
expected_num = int(val)
got_num = len(got)
if op == 'mincount':
assert_func = assertGreaterEqual
msg_tmpl = 'Expected %d items in field %s, but only got %d'
elif op == 'maxcount':
assert_func = assertLessEqual
msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
elif op == 'count':
assert_func = assertEqual
msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
else:
assert False
assert_func(
self, len(got), expected_num,
msg_tmpl % (expected_num, field, len(got)))
if got_num < expected_num:
yield field, f'expected at least {val} items, got {got_num}'
return
if op == 'maxcount':
if got_num > expected_num:
yield field, f'expected at most {val} items, got {got_num}'
return
assert op == 'count'
if got_num != expected_num:
yield field, f'expected exactly {val} items, got {got_num}'
return
self.assertEqual(
expected, got,
f'Invalid value for field {field}, expected {expected!r}, got {got!r}')
if not isinstance(got, str):
yield field, f'expected {str.__name__}, got {type(got).__name__}'
return
if op == 're':
if not re.match(val, got):
yield field, f'should match {val!r}, got {got!r}'
return
if op == 'startswith':
if not val.startswith(got):
yield field, f'should start with {val!r}, got {got!r}'
return
if op == 'contains':
if not val.startswith(got):
yield field, f'should contain {val!r}, got {got!r}'
return
if op == 'md5':
hash_val = md5(got)
if hash_val != val:
yield field, f'expected hash {val}, got {hash_val}'
return
if got != expected:
yield field, f'expected {expected!r}, got {got!r}'
return
if isinstance(expected, dict) and isinstance(got, dict):
for key, expected_val in expected.items():
if key not in got:
yield field, f'missing key: {key!r}'
continue
field_name = key if field is None else f'{field}.{key}'
yield from _iter_differences(got[key], expected_val, field_name)
return
if isinstance(expected, type):
if not isinstance(got, expected):
yield field, f'expected {expected.__name__}, got {type(got).__name__}'
return
if isinstance(expected, list) and isinstance(got, list):
# TODO: clever diffing algorithm lmao
if len(expected) != len(got):
yield field, f'expected length of {len(expected)}, got {len(got)}'
return
for index, (got_val, expected_val) in enumerate(zip(got, expected)):
field_name = str(index) if field is None else f'{field}.{index}'
yield from _iter_differences(got_val, expected_val, field_name)
return
if got != expected:
yield field, f'expected {expected!r}, got {got!r}'
def _expect_value(message, got, expected, field):
mismatches = list(_iter_differences(got, expected, field))
if not mismatches:
return
fields = [field for field, _ in mismatches if field is not None]
return ''.join((
message, f' ({", ".join(fields)})' if fields else '',
*(f'\n\t{field}: {message}' for field, message in mismatches)))
def expect_value(self, got, expected, field):
if message := _expect_value('values differ', got, expected, field):
self.fail(message)
def expect_dict(self, got_dict, expected_dict):
for info_field, expected in expected_dict.items():
got = got_dict.get(info_field)
expect_value(self, got, expected, info_field)
if message := _expect_value('dictionaries differ', got_dict, expected_dict, None):
self.fail(message)
def sanitize_got_info_dict(got_dict):
@ -237,6 +259,20 @@ def sanitize(key, value):
def expect_info_dict(self, got_dict, expected_dict):
ALLOWED_KEYS_SORT_ORDER = (
# NB: Keep in sync with the docstring of extractor/common.py
'id', 'ext', 'direct', 'display_id', 'title', 'alt_title', 'description', 'media_type',
'uploader', 'uploader_id', 'uploader_url', 'channel', 'channel_id', 'channel_url', 'channel_is_verified',
'channel_follower_count', 'comment_count', 'view_count', 'concurrent_view_count',
'like_count', 'dislike_count', 'repost_count', 'average_rating', 'age_limit', 'duration', 'thumbnail', 'heatmap',
'chapters', 'chapter', 'chapter_number', 'chapter_id', 'start_time', 'end_time', 'section_start', 'section_end',
'categories', 'tags', 'cast', 'composers', 'artists', 'album_artists', 'creators', 'genres',
'track', 'track_number', 'track_id', 'album', 'album_type', 'disc_number',
'series', 'series_id', 'season', 'season_number', 'season_id', 'episode', 'episode_number', 'episode_id',
'timestamp', 'upload_date', 'release_timestamp', 'release_date', 'release_year', 'modified_timestamp', 'modified_date',
'playable_in_embed', 'availability', 'live_status', 'location', 'license', '_old_archive_ids',
)
expect_dict(self, got_dict, expected_dict)
# Check for the presence of mandatory fields
if got_dict.get('_type') not in ('playlist', 'multi_video'):
@ -252,7 +288,13 @@ def expect_info_dict(self, got_dict, expected_dict):
test_info_dict = sanitize_got_info_dict(got_dict)
missing_keys = set(test_info_dict.keys()) - set(expected_dict.keys())
# Check for invalid/misspelled field names being returned by the extractor
invalid_keys = sorted(test_info_dict.keys() - ALLOWED_KEYS_SORT_ORDER)
self.assertFalse(invalid_keys, f'Invalid fields returned by the extractor: {", ".join(invalid_keys)}')
missing_keys = sorted(
test_info_dict.keys() - expected_dict.keys(),
key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x))
if missing_keys:
def _repr(v):
if isinstance(v, str):

View File

@ -6,6 +6,8 @@
import unittest
from unittest.mock import patch
from yt_dlp.globals import all_plugins_loaded
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@ -1427,6 +1429,12 @@ def check_for_cookie_header(result):
self.assertFalse(result.get('cookies'), msg='Cookies set in cookies field for wrong domain')
self.assertFalse(ydl.cookiejar.get_cookie_header(fmt['url']), msg='Cookies set in cookiejar for wrong domain')
def test_load_plugins_compat(self):
# Should try to reload plugins if they haven't already been loaded
all_plugins_loaded.value = False
FakeYDL().close()
assert all_plugins_loaded.value
if __name__ == '__main__':
unittest.main()

View File

@ -9,7 +9,7 @@
import math
from yt_dlp.jsinterp import JS_Undefined, JSInterpreter
from yt_dlp.jsinterp import JS_Undefined, JSInterpreter, js_number_to_string
class NaN:
@ -93,6 +93,16 @@ def test_operators(self):
self._test('function f(){return 0 ?? 42;}', 0)
self._test('function f(){return "life, the universe and everything" < 42;}', False)
self._test('function f(){return 0 - 7 * - 6;}', 42)
self._test('function f(){return true << "5";}', 32)
self._test('function f(){return true << true;}', 2)
self._test('function f(){return "19" & "21.9";}', 17)
self._test('function f(){return "19" & false;}', 0)
self._test('function f(){return "11.0" >> "2.1";}', 2)
self._test('function f(){return 5 ^ 9;}', 12)
self._test('function f(){return 0.0 << NaN}', 0)
self._test('function f(){return null << undefined}', 0)
# TODO: Does not work due to number too large
# self._test('function f(){return 21 << 4294967297}', 42)
def test_array_access(self):
self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])
@ -431,6 +441,27 @@ def test_slice(self):
self._test('function f(){return "012345678".slice(-1, 1)}', '')
self._test('function f(){return "012345678".slice(-3, -1)}', '67')
def test_js_number_to_string(self):
for test, radix, expected in [
(0, None, '0'),
(-0, None, '0'),
(0.0, None, '0'),
(-0.0, None, '0'),
(math.nan, None, 'NaN'),
(-math.nan, None, 'NaN'),
(math.inf, None, 'Infinity'),
(-math.inf, None, '-Infinity'),
(10 ** 21.5, 8, '526665530627250154000000'),
(6, 2, '110'),
(254, 16, 'fe'),
(-10, 2, '-1010'),
(-0xff, 2, '-11111111'),
(0.1 + 0.2, 16, '0.4cccccccccccd'),
(1234.1234, 10, '1234.1234'),
# (1000000000000000128, 10, '1000000000000000100')
]:
assert js_number_to_string(test, radix) == expected
if __name__ == '__main__':
unittest.main()

View File

@ -720,6 +720,15 @@ def test_allproxy(self, handler):
rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', proxies={'all': 'http://10.255.255.255'})).close()
@pytest.mark.skip_handlers_if(lambda _, handler: handler not in ['Urllib', 'CurlCFFI'], 'handler does not support keep_header_casing')
def test_keep_header_casing(self, handler):
with handler() as rh:
res = validate_and_send(
rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', headers={'X-test-heaDer': 'test'}, extensions={'keep_header_casing': True})).read().decode()
assert 'X-test-heaDer: test' in res
@pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
class TestClientCertificate:
@ -1289,6 +1298,7 @@ class HTTPSupportedRH(ValidationRH):
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': True}, UnsupportedRequest),
]),
('Requests', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),
@ -1299,6 +1309,9 @@ class HTTPSupportedRH(ValidationRH):
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': False}, False),
({'keep_header_casing': True}, False),
({'keep_header_casing': 'notabool'}, AssertionError),
]),
('CurlCFFI', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),

View File

@ -10,22 +10,71 @@
sys.path.append(str(TEST_DATA_DIR))
importlib.invalidate_caches()
from yt_dlp.utils import Config
from yt_dlp.plugins import PACKAGE_NAME, directories, load_plugins
from yt_dlp.plugins import (
PACKAGE_NAME,
PluginSpec,
directories,
load_plugins,
load_all_plugins,
register_plugin_spec,
)
from yt_dlp.globals import (
extractors,
postprocessors,
plugin_dirs,
plugin_ies,
plugin_pps,
all_plugins_loaded,
plugin_specs,
)
EXTRACTOR_PLUGIN_SPEC = PluginSpec(
module_name='extractor',
suffix='IE',
destination=extractors,
plugin_destination=plugin_ies,
)
POSTPROCESSOR_PLUGIN_SPEC = PluginSpec(
module_name='postprocessor',
suffix='PP',
destination=postprocessors,
plugin_destination=plugin_pps,
)
def reset_plugins():
plugin_ies.value = {}
plugin_pps.value = {}
plugin_dirs.value = ['default']
plugin_specs.value = {}
all_plugins_loaded.value = False
# Clearing override plugins is probably difficult
for module_name in tuple(sys.modules):
for plugin_type in ('extractor', 'postprocessor'):
if module_name.startswith(f'{PACKAGE_NAME}.{plugin_type}.'):
del sys.modules[module_name]
importlib.invalidate_caches()
class TestPlugins(unittest.TestCase):
TEST_PLUGIN_DIR = TEST_DATA_DIR / PACKAGE_NAME
def setUp(self):
reset_plugins()
def tearDown(self):
reset_plugins()
def test_directories_containing_plugins(self):
self.assertIn(self.TEST_PLUGIN_DIR, map(Path, directories()))
def test_extractor_classes(self):
for module_name in tuple(sys.modules):
if module_name.startswith(f'{PACKAGE_NAME}.extractor'):
del sys.modules[module_name]
plugins_ie = load_plugins('extractor', 'IE')
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertIn('NormalPluginIE', plugins_ie.keys())
@ -35,17 +84,29 @@ def test_extractor_classes(self):
f'{PACKAGE_NAME}.extractor._ignore' in sys.modules,
'loaded module beginning with underscore')
self.assertNotIn('IgnorePluginIE', plugins_ie.keys())
self.assertNotIn('IgnorePluginIE', plugin_ies.value)
# Don't load extractors with underscore prefix
self.assertNotIn('_IgnoreUnderscorePluginIE', plugins_ie.keys())
self.assertNotIn('_IgnoreUnderscorePluginIE', plugin_ies.value)
# Don't load extractors not specified in __all__ (if supplied)
self.assertNotIn('IgnoreNotInAllPluginIE', plugins_ie.keys())
self.assertNotIn('IgnoreNotInAllPluginIE', plugin_ies.value)
self.assertIn('InAllPluginIE', plugins_ie.keys())
self.assertIn('InAllPluginIE', plugin_ies.value)
# Don't load override extractors
self.assertNotIn('OverrideGenericIE', plugins_ie.keys())
self.assertNotIn('OverrideGenericIE', plugin_ies.value)
self.assertNotIn('_UnderscoreOverrideGenericIE', plugins_ie.keys())
self.assertNotIn('_UnderscoreOverrideGenericIE', plugin_ies.value)
def test_postprocessor_classes(self):
plugins_pp = load_plugins('postprocessor', 'PP')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
self.assertIn('NormalPluginPP', plugin_pps.value)
def test_importing_zipped_module(self):
zip_path = TEST_DATA_DIR / 'zipped_plugins.zip'
@ -58,10 +119,10 @@ def test_importing_zipped_module(self):
package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(zip_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE')
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginIE', plugins_ie.keys())
plugins_pp = load_plugins('postprocessor', 'PP')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginPP', plugins_pp.keys())
finally:
@ -69,23 +130,116 @@ def test_importing_zipped_module(self):
os.remove(zip_path)
importlib.invalidate_caches() # reset the import caches
def test_plugin_dirs(self):
# Internal plugin dirs hack for CLI --plugin-dirs
# To be replaced with proper system later
custom_plugin_dir = TEST_DATA_DIR / 'plugin_packages'
Config._plugin_dirs = [str(custom_plugin_dir)]
importlib.invalidate_caches() # reset the import caches
def test_reloading_plugins(self):
reload_plugins_path = TEST_DATA_DIR / 'reload_plugins'
load_plugins(EXTRACTOR_PLUGIN_SPEC)
load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
# Remove default folder and add reload_plugin path
sys.path.remove(str(TEST_DATA_DIR))
sys.path.append(str(reload_plugins_path))
importlib.invalidate_caches()
try:
package = importlib.import_module(f'{PACKAGE_NAME}.extractor')
self.assertIn(custom_plugin_dir / 'testpackage' / PACKAGE_NAME / 'extractor', map(Path, package.__path__))
for plugin_type in ('extractor', 'postprocessor'):
package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(reload_plugins_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE')
self.assertIn('PackagePluginIE', plugins_ie.keys())
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('NormalPluginIE', plugins_ie.keys())
self.assertTrue(
plugins_ie['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin')
self.assertTrue(
extractors.value['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin globally')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertTrue(plugins_pp['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin')
self.assertTrue(
postprocessors.value['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin globally')
finally:
Config._plugin_dirs = []
importlib.invalidate_caches() # reset the import caches
sys.path.remove(str(reload_plugins_path))
sys.path.append(str(TEST_DATA_DIR))
importlib.invalidate_caches()
def test_extractor_override_plugin(self):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.TEST_FIELD, 'override')
self.assertEqual(GenericIE.SECONDARY_TEST_FIELD, 'underscore-override')
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
importlib.invalidate_caches()
# test that loading a second time doesn't wrap a second time
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
def test_load_all_plugin_types(self):
# no plugin specs registered
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
load_all_plugins()
self.assertTrue(all_plugins_loaded.value)
self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_no_plugin_dirs(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
plugin_dirs.value = []
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_set_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
plugin_dirs.value = [custom_plugin_dir]
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_invalid_plugin_dir(self):
plugin_dirs.value = ['invalid_dir']
with self.assertRaises(ValueError):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
def test_append_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
self.assertEqual(plugin_dirs.value, ['default'])
plugin_dirs.value.append(custom_plugin_dir)
self.assertEqual(plugin_dirs.value, ['default', custom_plugin_dir])
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_get_plugin_spec(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('extractor'), EXTRACTOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('postprocessor'), POSTPROCESSOR_PLUGIN_SPEC)
self.assertIsNone(plugin_specs.value.get('invalid'))
if __name__ == '__main__':

View File

@ -3,19 +3,20 @@
# Allow direct execution
import os
import sys
import unittest
import unittest.mock
import warnings
import datetime as dt
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import contextlib
import datetime as dt
import io
import itertools
import json
import pickle
import subprocess
import unittest
import unittest.mock
import warnings
import xml.etree.ElementTree
from yt_dlp.compat import (
@ -2087,21 +2088,26 @@ def test_http_header_dict(self):
headers = HTTPHeaderDict()
headers['ytdl-test'] = b'0'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '0')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '0')])
headers['ytdl-test'] = 1
self.assertEqual(list(headers.items()), [('Ytdl-Test', '1')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '1')])
headers['Ytdl-test'] = '2'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '2')])
self.assertEqual(list(headers.sensitive().items()), [('Ytdl-test', '2')])
self.assertTrue('ytDl-Test' in headers)
self.assertEqual(str(headers), str(dict(headers)))
self.assertEqual(repr(headers), str(dict(headers)))
headers.update({'X-dlp': 'data'})
self.assertEqual(set(headers.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data')})
self.assertEqual(set(headers.sensitive().items()), {('Ytdl-test', '2'), ('X-dlp', 'data')})
self.assertEqual(dict(headers), {'Ytdl-Test': '2', 'X-Dlp': 'data'})
self.assertEqual(len(headers), 2)
self.assertEqual(headers.copy(), headers)
headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, **headers, **{'X-dlp': 'data2'})
headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, headers, **{'X-dlP': 'data2'})
self.assertEqual(set(headers2.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data2')})
self.assertEqual(set(headers2.sensitive().items()), {('Ytdl-test', '2'), ('X-dlP', 'data2')})
self.assertEqual(len(headers2), 2)
headers2.clear()
self.assertEqual(len(headers2), 0)
@ -2109,16 +2115,23 @@ def test_http_header_dict(self):
# ensure we prefer latter headers
headers3 = HTTPHeaderDict({'Ytdl-TeSt': 1}, {'Ytdl-test': 2})
self.assertEqual(set(headers3.items()), {('Ytdl-Test', '2')})
self.assertEqual(set(headers3.sensitive().items()), {('Ytdl-test', '2')})
del headers3['ytdl-tesT']
self.assertEqual(dict(headers3), {})
headers4 = HTTPHeaderDict({'ytdl-test': 'data;'})
self.assertEqual(set(headers4.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers4.sensitive().items()), {('ytdl-test', 'data;')})
# common mistake: strip whitespace from values
# https://github.com/yt-dlp/yt-dlp/issues/8729
headers5 = HTTPHeaderDict({'ytdl-test': ' data; '})
self.assertEqual(set(headers5.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers5.sensitive().items()), {('ytdl-test', 'data;')})
# test if picklable
headers6 = HTTPHeaderDict(a=1, b=2)
self.assertEqual(pickle.loads(pickle.dumps(headers6)), headers6)
def test_extract_basic_auth(self):
assert extract_basic_auth('http://:foo.bar') == ('http://:foo.bar', None)

View File

@ -44,7 +44,7 @@ def websocket_handler(websocket):
return websocket.send('2')
elif isinstance(message, str):
if message == 'headers':
return websocket.send(json.dumps(dict(websocket.request.headers)))
return websocket.send(json.dumps(dict(websocket.request.headers.raw_items())))
elif message == 'path':
return websocket.send(websocket.request.path)
elif message == 'source_address':
@ -266,18 +266,18 @@ def test_cookies(self, handler):
with handler(cookiejar=cookiejar) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
with handler() as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -287,7 +287,7 @@ def test_cookie_sync_only_cookiejar(self, handler):
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()}))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -298,12 +298,12 @@ def test_cookie_sync_delete_cookie(self, handler):
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie'))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
cookiejar.clear_session_cookies()
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
def test_source_address(self, handler):
@ -341,6 +341,14 @@ def test_request_headers(self, handler):
assert headers['test3'] == 'test3'
ws.close()
def test_keep_header_casing(self, handler):
with handler(headers=HTTPHeaderDict({'x-TeSt1': 'test'})) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url, headers={'x-TeSt2': 'test'}, extensions={'keep_header_casing': True}))
ws.send('headers')
headers = json.loads(ws.recv())
assert 'x-TeSt1' in headers
assert 'x-TeSt2' in headers
@pytest.mark.parametrize('client_cert', (
{'client_certificate': os.path.join(MTLS_CERT_DIR, 'clientwithkey.crt')},
{

View File

@ -201,6 +201,10 @@
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
'YWt1qdbe8SAfkoPHW5d', 'RrRjWQOJmBiP',
),
(
'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js',
'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg',
),
]

View File

@ -2,4 +2,5 @@
class PackagePluginIE(InfoExtractor):
_VALID_URL = 'package'
pass

View File

@ -0,0 +1,10 @@
from yt_dlp.extractor.common import InfoExtractor
class NormalPluginIE(InfoExtractor):
_VALID_URL = 'normal'
REPLACED = True
class _IgnoreUnderscorePluginIE(InfoExtractor):
pass

View File

@ -0,0 +1,5 @@
from yt_dlp.postprocessor.common import PostProcessor
class NormalPluginPP(PostProcessor):
REPLACED = True

View File

@ -6,6 +6,7 @@ class IgnoreNotInAllPluginIE(InfoExtractor):
class InAllPluginIE(InfoExtractor):
_VALID_URL = 'inallpluginie'
pass

View File

@ -2,8 +2,10 @@
class NormalPluginIE(InfoExtractor):
pass
_VALID_URL = 'normalpluginie'
REPLACED = False
class _IgnoreUnderscorePluginIE(InfoExtractor):
_VALID_URL = 'ignoreunderscorepluginie'
pass

View File

@ -0,0 +1,5 @@
from yt_dlp.extractor.generic import GenericIE
class OverrideGenericIE(GenericIE, plugin_name='override'):
TEST_FIELD = 'override'

View File

@ -0,0 +1,5 @@
from yt_dlp.extractor.generic import GenericIE
class _UnderscoreOverrideGenericIE(GenericIE, plugin_name='underscore-override'):
SECONDARY_TEST_FIELD = 'underscore-override'

View File

@ -2,4 +2,4 @@
class NormalPluginPP(PostProcessor):
pass
REPLACED = False

View File

@ -2,4 +2,5 @@
class ZippedPluginIE(InfoExtractor):
_VALID_URL = 'zippedpluginie'
pass

View File

@ -30,9 +30,18 @@
from .cookies import CookieLoadError, LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader.rtmp import rtmpdump_version
from .extractor import gen_extractor_classes, get_info_extractor
from .extractor import gen_extractor_classes, get_info_extractor, import_extractors
from .extractor.common import UnsupportedURLIE
from .extractor.openload import PhantomJSwrapper
from .globals import (
IN_CLI,
LAZY_EXTRACTORS,
plugin_ies,
plugin_ies_overrides,
plugin_pps,
all_plugins_loaded,
plugin_dirs,
)
from .minicurses import format_text
from .networking import HEADRequest, Request, RequestDirector
from .networking.common import _REQUEST_HANDLERS, _RH_PREFERENCES
@ -44,8 +53,7 @@
network_exceptions,
)
from .networking.impersonate import ImpersonateRequestHandler
from .plugins import directories as plugin_directories
from .postprocessor import _PLUGIN_CLASSES as plugin_pps
from .plugins import directories as plugin_directories, load_all_plugins
from .postprocessor import (
EmbedThumbnailPP,
FFmpegFixupDuplicateMoovPP,
@ -158,7 +166,7 @@
write_json_file,
write_string,
)
from .utils._utils import _UnsafeExtensionError, _YDLLogger
from .utils._utils import _UnsafeExtensionError, _YDLLogger, _ProgressState
from .utils.networking import (
HTTPHeaderDict,
clean_headers,
@ -599,7 +607,7 @@ class YoutubeDL:
# NB: Keep in sync with the docstring of extractor/common.py
'url', 'manifest_url', 'manifest_stream_number', 'ext', 'format', 'format_id', 'format_note',
'width', 'height', 'aspect_ratio', 'resolution', 'dynamic_range', 'tbr', 'abr', 'acodec', 'asr', 'audio_channels',
'vbr', 'fps', 'vcodec', 'container', 'filesize', 'filesize_approx', 'rows', 'columns',
'vbr', 'fps', 'vcodec', 'container', 'filesize', 'filesize_approx', 'rows', 'columns', 'hls_media_playlist_data',
'player_url', 'protocol', 'fragment_base_url', 'fragments', 'is_from_start', 'is_dash_periods', 'request_data',
'preference', 'language', 'language_preference', 'quality', 'source_preference', 'cookies',
'http_headers', 'stretched_ratio', 'no_resume', 'has_drm', 'extra_param_to_segment_url', 'extra_param_to_key_url',
@ -643,20 +651,23 @@ def __init__(self, params=None, auto_init=True):
self.cache = Cache(self)
self.__header_cookies = []
stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
self._out_files = Namespace(
out=stdout,
error=sys.stderr,
screen=sys.stderr if self.params.get('quiet') else stdout,
console=None if os.name == 'nt' else next(
filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
)
# compat for API: load plugins if they have not already
if not all_plugins_loaded.value:
load_all_plugins()
try:
windows_enable_vt_mode()
except Exception as e:
self.write_debug(f'Failed to enable VT mode: {e}')
stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
self._out_files = Namespace(
out=stdout,
error=sys.stderr,
screen=sys.stderr if self.params.get('quiet') else stdout,
console=next(filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
)
if self.params.get('no_color'):
if self.params.get('color') is not None:
self.params.setdefault('_warnings', []).append(
@ -957,21 +968,22 @@ def to_stderr(self, message, only_once=False):
self._write_string(f'{self._bidi_workaround(message)}\n', self._out_files.error, only_once=only_once)
def _send_console_code(self, code):
if os.name == 'nt' or not self._out_files.console:
return
if not supports_terminal_sequences(self._out_files.console):
return False
self._write_string(code, self._out_files.console)
return True
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
def to_console_title(self, message=None, progress_state=None, percent=None):
if not self.params.get('consoletitle'):
return
message = remove_terminal_sequences(message)
if os.name == 'nt':
if ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
else:
self._send_console_code(f'\033]0;{message}\007')
if message:
success = self._send_console_code(f'\033]0;{remove_terminal_sequences(message)}\007')
if not success and os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
ctypes.windll.kernel32.SetConsoleTitleW(message)
if isinstance(progress_state, _ProgressState):
self._send_console_code(progress_state.get_ansi_escape(percent))
def save_console_title(self):
if not self.params.get('consoletitle') or self.params.get('simulate'):
@ -985,6 +997,7 @@ def restore_console_title(self):
def __enter__(self):
self.save_console_title()
self.to_console_title(progress_state=_ProgressState.INDETERMINATE)
return self
def save_cookies(self):
@ -993,6 +1006,7 @@ def save_cookies(self):
def __exit__(self, *args):
self.restore_console_title()
self.to_console_title(progress_state=_ProgressState.HIDDEN)
self.close()
def close(self):
@ -4012,15 +4026,6 @@ def print_debug_header(self):
if not self.params.get('verbose'):
return
from . import _IN_CLI # Must be delayed import
# These imports can be slow. So import them only as needed
from .extractor.extractors import _LAZY_LOADER
from .extractor.extractors import (
_PLUGIN_CLASSES as plugin_ies,
_PLUGIN_OVERRIDES as plugin_ie_overrides,
)
def get_encoding(stream):
ret = str(getattr(stream, 'encoding', f'missing ({type(stream).__name__})'))
additional_info = []
@ -4059,17 +4064,18 @@ def get_encoding(stream):
_make_label(ORIGIN, CHANNEL.partition('@')[2] or __version__, __version__),
f'[{RELEASE_GIT_HEAD[:9]}]' if RELEASE_GIT_HEAD else '',
'' if source == 'unknown' else f'({source})',
'' if _IN_CLI else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
'' if IN_CLI.value else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
delim=' '))
if not _IN_CLI:
if not IN_CLI.value:
write_debug(f'params: {self.params}')
if not _LAZY_LOADER:
if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
write_debug('Lazy loading extractors is forcibly disabled')
else:
write_debug('Lazy loading extractors is disabled')
import_extractors()
lazy_extractors = LAZY_EXTRACTORS.value
if lazy_extractors is None:
write_debug('Lazy loading extractors is disabled')
elif not lazy_extractors:
write_debug('Lazy loading extractors is forcibly disabled')
if self.params['compat_opts']:
write_debug('Compatibility options: {}'.format(', '.join(self.params['compat_opts'])))
@ -4098,24 +4104,27 @@ def get_encoding(stream):
write_debug(f'Proxy map: {self.proxies}')
write_debug(f'Request Handlers: {", ".join(rh.RH_NAME for rh in self._request_director.handlers.values())}')
if os.environ.get('YTDLP_NO_PLUGINS'):
write_debug('Plugins are forcibly disabled')
return
for plugin_type, plugins in {'Extractor': plugin_ies, 'Post-Processor': plugin_pps}.items():
display_list = ['{}{}'.format(
klass.__name__, '' if klass.__name__ == name else f' as {name}')
for name, klass in plugins.items()]
for plugin_type, plugins in (('Extractor', plugin_ies), ('Post-Processor', plugin_pps)):
display_list = [
klass.__name__ if klass.__name__ == name else f'{klass.__name__} as {name}'
for name, klass in plugins.value.items()]
if plugin_type == 'Extractor':
display_list.extend(f'{plugins[-1].IE_NAME.partition("+")[2]} ({parent.__name__})'
for parent, plugins in plugin_ie_overrides.items())
for parent, plugins in plugin_ies_overrides.value.items())
if not display_list:
continue
write_debug(f'{plugin_type} Plugins: {", ".join(sorted(display_list))}')
plugin_dirs = plugin_directories()
if plugin_dirs:
write_debug(f'Plugin directories: {plugin_dirs}')
plugin_dirs_msg = 'none'
if not plugin_dirs.value:
plugin_dirs_msg = 'none (disabled)'
else:
found_plugin_directories = plugin_directories()
if found_plugin_directories:
plugin_dirs_msg = ', '.join(found_plugin_directories)
write_debug(f'Plugin directories: {plugin_dirs_msg}')
@functools.cached_property
def proxies(self):

View File

@ -19,7 +19,9 @@
from .extractor import list_extractor_classes
from .extractor.adobepass import MSO_INFO
from .networking.impersonate import ImpersonateTarget
from .globals import IN_CLI, plugin_dirs
from .options import parseOpts
from .plugins import load_all_plugins as _load_all_plugins
from .postprocessor import (
FFmpegExtractAudioPP,
FFmpegMergerPP,
@ -33,7 +35,6 @@
)
from .update import Updater
from .utils import (
Config,
NO_DEFAULT,
POSTPROCESS_WHEN,
DateRange,
@ -66,8 +67,6 @@
from .utils._utils import _UnsafeExtensionError
from .YoutubeDL import YoutubeDL
_IN_CLI = False
def _exit(status=0, *args):
for msg in args:
@ -308,18 +307,20 @@ def parse_sleep_func(expr):
raise ValueError(f'invalid {key} retry sleep expression {expr!r}')
# Bytes
def validate_bytes(name, value):
def validate_bytes(name, value, strict_positive=False):
if value is None:
return None
numeric_limit = parse_bytes(value)
validate(numeric_limit is not None, 'rate limit', value)
validate(numeric_limit is not None, name, value)
if strict_positive:
validate_positive(name, numeric_limit, True)
return numeric_limit
opts.ratelimit = validate_bytes('rate limit', opts.ratelimit)
opts.ratelimit = validate_bytes('rate limit', opts.ratelimit, True)
opts.throttledratelimit = validate_bytes('throttled rate limit', opts.throttledratelimit)
opts.min_filesize = validate_bytes('min filesize', opts.min_filesize)
opts.max_filesize = validate_bytes('max filesize', opts.max_filesize)
opts.buffersize = validate_bytes('buffer size', opts.buffersize)
opts.buffersize = validate_bytes('buffer size', opts.buffersize, True)
opts.http_chunk_size = validate_bytes('http chunk size', opts.http_chunk_size)
# Output templates
@ -444,6 +445,10 @@ def metadataparser_actions(f):
}
# Other options
opts.plugin_dirs = opts.plugin_dirs
if opts.plugin_dirs is None:
opts.plugin_dirs = ['default']
if opts.playlist_items is not None:
try:
tuple(PlaylistEntries.parse_playlist_items(opts.playlist_items))
@ -984,11 +989,6 @@ def _real_main(argv=None):
parser, opts, all_urls, ydl_opts = parse_options(argv)
# HACK: Set the plugin dirs early on
# TODO(coletdjnz): remove when plugin globals system is implemented
if opts.plugin_dirs is not None:
Config._plugin_dirs = list(map(expand_path, opts.plugin_dirs))
# Dump user agent
if opts.dump_user_agent:
ua = traverse_obj(opts.headers, 'User-Agent', casesense=False, default=std_headers['User-Agent'])
@ -1003,6 +1003,11 @@ def _real_main(argv=None):
if opts.ffmpeg_location:
FFmpegPostProcessor._ffmpeg_location.set(opts.ffmpeg_location)
# load all plugins into the global lookup
plugin_dirs.value = opts.plugin_dirs
if plugin_dirs.value:
_load_all_plugins()
with YoutubeDL(ydl_opts) as ydl:
pre_process = opts.update_self or opts.rm_cachedir
actual_use = all_urls or opts.load_info_filename
@ -1102,8 +1107,7 @@ def make_row(target, handler):
def main(argv=None):
global _IN_CLI
_IN_CLI = True
IN_CLI.value = True
try:
_exit(*variadic(_real_main(argv)))
except (CookieLoadError, DownloadError):

View File

@ -35,6 +35,7 @@ def get_suitable_downloader(info_dict, params={}, default=NO_DEFAULT, protocol=N
from .rtsp import RtspFD
from .websocket import WebSocketFragmentFD
from .youtube_live_chat import YoutubeLiveChatFD
from .bunnycdn import BunnyCdnFD
PROTOCOL_MAP = {
'rtmp': RtmpFD,
@ -55,6 +56,7 @@ def get_suitable_downloader(info_dict, params={}, default=NO_DEFAULT, protocol=N
'websocket_frag': WebSocketFragmentFD,
'youtube_live_chat': YoutubeLiveChatFD,
'youtube_live_chat_replay': YoutubeLiveChatFD,
'bunnycdn': BunnyCdnFD,
}

View File

@ -0,0 +1,50 @@
import hashlib
import random
import threading
from .common import FileDownloader
from . import HlsFD
from ..networking import Request
from ..networking.exceptions import network_exceptions
class BunnyCdnFD(FileDownloader):
"""
Downloads from BunnyCDN with required pings
Note, this is not a part of public API, and will be removed without notice.
DO NOT USE
"""
def real_download(self, filename, info_dict):
self.to_screen(f'[{self.FD_NAME}] Downloading from BunnyCDN')
fd = HlsFD(self.ydl, self.params)
stop_event = threading.Event()
ping_thread = threading.Thread(target=self.ping_thread, args=(stop_event,), kwargs=info_dict['_bunnycdn_ping_data'])
ping_thread.start()
try:
return fd.real_download(filename, info_dict)
finally:
stop_event.set()
def ping_thread(self, stop_event, url, headers, secret, context_id):
# Site sends ping every 4 seconds, but this throttles the download. Pinging every 2 seconds seems to work.
ping_interval = 2
# Hard coded resolution as it doesn't seem to matter
res = 1080
paused = 'false'
current_time = 0
while not stop_event.wait(ping_interval):
current_time += ping_interval
time = current_time + round(random.random(), 6)
md5_hash = hashlib.md5(f'{secret}_{context_id}_{time}_{paused}_{res}'.encode()).hexdigest()
ping_url = f'{url}?hash={md5_hash}&time={time}&paused={paused}&resolution={res}'
try:
self.ydl.urlopen(Request(ping_url, headers=headers)).read()
except network_exceptions as e:
self.to_screen(f'[{self.FD_NAME}] Ping failed: {e}')

View File

@ -31,6 +31,7 @@
timetuple_from_msec,
try_call,
)
from ..utils._utils import _ProgressState
class FileDownloader:
@ -333,7 +334,7 @@ def _report_progress_status(self, s, default_template):
progress_dict), s.get('progress_idx') or 0)
self.to_console_title(self.ydl.evaluate_outtmpl(
progress_template.get('download-title') or 'yt-dlp %(progress._default_template)s',
progress_dict))
progress_dict), _ProgressState.from_dict(s), s.get('_percent'))
def _format_progress(self, *args, **kwargs):
return self.ydl._format_text(
@ -357,6 +358,7 @@ def with_fields(*tups, default=''):
'_speed_str': self.format_speed(speed).strip(),
'_total_bytes_str': _format_bytes('total_bytes'),
'_elapsed_str': self.format_seconds(s.get('elapsed')),
'_percent': 100.0,
'_percent_str': self.format_percent(100),
})
self._report_progress_status(s, join_nonempty(
@ -375,13 +377,15 @@ def with_fields(*tups, default=''):
return
self._progress_delta_time += update_delta
progress = try_call(
lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
lambda: s['downloaded_bytes'] == 0 and 0)
s.update({
'_eta_str': self.format_eta(s.get('eta')).strip(),
'_speed_str': self.format_speed(s.get('speed')),
'_percent_str': self.format_percent(try_call(
lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
lambda: s['downloaded_bytes'] == 0 and 0)),
'_percent': progress,
'_percent_str': self.format_percent(progress),
'_total_bytes_str': _format_bytes('total_bytes'),
'_total_bytes_estimate_str': _format_bytes('total_bytes_estimate'),
'_downloaded_bytes_str': _format_bytes('downloaded_bytes'),

View File

@ -457,8 +457,6 @@ class FFmpegFD(ExternalFD):
@classmethod
def available(cls, path=None):
# TODO: Fix path for ffmpeg
# Fixme: This may be wrong when --ffmpeg-location is used
return FFmpegPostProcessor().available
def on_process_started(self, proc, stdin):

View File

@ -16,6 +16,7 @@
update_url_query,
urljoin,
)
from ..utils._utils import _request_dump_filename
class HlsFD(FragmentFD):
@ -72,11 +73,23 @@ def check_results():
def real_download(self, filename, info_dict):
man_url = info_dict['url']
self.to_screen(f'[{self.FD_NAME}] Downloading m3u8 manifest')
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.url
s = urlh.read().decode('utf-8', 'ignore')
s = info_dict.get('hls_media_playlist_data')
if s:
self.to_screen(f'[{self.FD_NAME}] Using m3u8 manifest from extracted info')
else:
self.to_screen(f'[{self.FD_NAME}] Downloading m3u8 manifest')
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.url
s_bytes = urlh.read()
if self.params.get('write_pages'):
dump_filename = _request_dump_filename(
man_url, info_dict['id'], None,
trim_length=self.params.get('trim_file_name'))
self.to_screen(f'[{self.FD_NAME}] Saving request to {dump_filename}')
with open(dump_filename, 'wb') as outf:
outf.write(s_bytes)
s = s_bytes.decode('utf-8', 'ignore')
can_download, message = self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')), None
if can_download:
@ -177,6 +190,7 @@ def is_ad_fragment_end(s):
if external_aes_iv:
external_aes_iv = binascii.unhexlify(remove_start(external_aes_iv, '0x').zfill(32))
byte_range = {}
byte_range_offset = 0
discontinuity_count = 0
frag_index = 0
ad_frag_next = False
@ -204,6 +218,11 @@ def is_ad_fragment_end(s):
})
media_sequence += 1
# If the byte_range is truthy, reset it after appending a fragment that uses it
if byte_range:
byte_range_offset = byte_range['end']
byte_range = {}
elif line.startswith('#EXT-X-MAP'):
if format_index and discontinuity_count != format_index:
continue
@ -217,10 +236,12 @@ def is_ad_fragment_end(s):
if extra_segment_query:
frag_url = update_url_query(frag_url, extra_segment_query)
map_byte_range = {}
if map_info.get('BYTERANGE'):
splitted_byte_range = map_info.get('BYTERANGE').split('@')
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range['end']
byte_range = {
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else 0
map_byte_range = {
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
@ -229,7 +250,7 @@ def is_ad_fragment_end(s):
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'byte_range': map_byte_range,
'media_sequence': media_sequence,
})
media_sequence += 1
@ -257,7 +278,7 @@ def is_ad_fragment_end(s):
media_sequence = int(line[22:])
elif line.startswith('#EXT-X-BYTERANGE'):
splitted_byte_range = line[17:].split('@')
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range['end']
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range_offset
byte_range = {
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),

View File

@ -1,16 +1,25 @@
from ..compat.compat_utils import passthrough_module
from ..globals import extractors as _extractors_context
from ..globals import plugin_ies as _plugin_ies_context
from ..plugins import PluginSpec, register_plugin_spec
passthrough_module(__name__, '.extractors')
del passthrough_module
register_plugin_spec(PluginSpec(
module_name='extractor',
suffix='IE',
destination=_extractors_context,
plugin_destination=_plugin_ies_context,
))
def gen_extractor_classes():
""" Return a list of supported extractors.
The order does matter; the first extractor matched is the one handling the URL.
"""
from .extractors import _ALL_CLASSES
return _ALL_CLASSES
import_extractors()
return list(_extractors_context.value.values())
def gen_extractors():
@ -37,6 +46,9 @@ def list_extractors(age_limit=None):
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
from . import extractors
import_extractors()
return _extractors_context.value[f'{ie_name}IE']
return getattr(extractors, f'{ie_name}IE')
def import_extractors():
from . import extractors # noqa: F401

View File

@ -312,6 +312,7 @@
)
from .bundesliga import BundesligaIE
from .bundestag import BundestagIE
from .bunnycdn import BunnyCdnIE
from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
@ -508,6 +509,7 @@
from .dhm import DHMIE
from .digitalconcerthall import DigitalConcertHallIE
from .digiteka import DigitekaIE
from .digiview import DigiviewIE
from .discogs import DiscogsReleasePlaylistIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
@ -1893,6 +1895,7 @@
from .smotrim import SmotrimIE
from .snapchat import SnapchatSpotlightIE
from .snotr import SnotrIE
from .softwhiteunderbelly import SoftWhiteUnderbellyIE
from .sohu import (
SohuIE,
SohuVIE,
@ -2221,6 +2224,7 @@
TVPlayIE,
)
from .tvplayer import TVPlayerIE
from .tvw import TvwIE
from .tweakers import TweakersIE
from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE

View File

@ -1,7 +1,6 @@
import json
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils.traversal import require, traverse_obj
class AZMedienIE(InfoExtractor):
@ -9,15 +8,15 @@ class AZMedienIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.|tv\.)?
(?P<host>
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch|
tvo-online\.ch
)/
[^/]+/
[^/?#]+/
(?P<id>
[^/]+-(?P<article_id>\d+)
[^/?#]+-\d+
)
(?:
\#video=
@ -47,19 +46,17 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True,
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
_PARTNER_ID = '1719221'
def _real_extract(self, url):
host, display_id, article_id, entry_id = self._match_valid_url(url).groups()
display_id, entry_id = self._match_valid_url(url).groups()
if not entry_id:
entry_id = self._download_json(
self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
'variables': json.dumps({
'contextId': 'NewsArticle:' + article_id,
}),
})['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
webpage = self._download_webpage(url, display_id)
data = self._search_json(
r'window\.__APOLLO_STATE__\s*=', webpage, 'video data', display_id)
entry_id = traverse_obj(data, (
lambda _, v: v['__typename'] == 'KalturaData', 'kalturaId', any, {require('kaltura id')}))
return self.url_result(
f'kaltura:{self._PARTNER_ID}:{entry_id}',

View File

@ -0,0 +1,178 @@
import json
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
extract_attributes,
int_or_none,
parse_qs,
smuggle_url,
unsmuggle_url,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import find_element, traverse_obj
class BunnyCdnIE(InfoExtractor):
_VALID_URL = r'https?://(?:iframe\.mediadelivery\.net|video\.bunnycdn\.com)/(?:embed|play)/(?P<library_id>\d+)/(?P<id>[\da-f-]+)'
_EMBED_REGEX = [rf'<iframe[^>]+src=[\'"](?P<url>{_VALID_URL}[^\'"]*)[\'"]']
_TESTS = [{
'url': 'https://iframe.mediadelivery.net/embed/113933/e73edec1-e381-4c8b-ae73-717a140e0924',
'info_dict': {
'id': 'e73edec1-e381-4c8b-ae73-717a140e0924',
'ext': 'mp4',
'title': 'mistress morgana (3).mp4',
'description': '',
'timestamp': 1693251673,
'thumbnail': r're:^https?://.*\.b-cdn\.net/e73edec1-e381-4c8b-ae73-717a140e0924/thumbnail\.jpg',
'duration': 7.0,
'upload_date': '20230828',
},
'params': {'skip_download': True},
}, {
'url': 'https://iframe.mediadelivery.net/play/136145/32e34c4b-0d72-437c-9abb-05e67657da34',
'info_dict': {
'id': '32e34c4b-0d72-437c-9abb-05e67657da34',
'ext': 'mp4',
'timestamp': 1691145748,
'thumbnail': r're:^https?://.*\.b-cdn\.net/32e34c4b-0d72-437c-9abb-05e67657da34/thumbnail_9172dc16\.jpg',
'duration': 106.0,
'description': 'md5:981a3e899a5c78352b21ed8b2f1efd81',
'upload_date': '20230804',
'title': 'Sanela ist Teil der #arbeitsmarktkraft',
},
'params': {'skip_download': True},
}, {
# Stream requires activation and pings
'url': 'https://iframe.mediadelivery.net/embed/200867/2e8545ec-509d-4571-b855-4cf0235ccd75',
'info_dict': {
'id': '2e8545ec-509d-4571-b855-4cf0235ccd75',
'ext': 'mp4',
'timestamp': 1708497752,
'title': 'netflix part 1',
'duration': 3959.0,
'description': '',
'upload_date': '20240221',
'thumbnail': r're:^https?://.*\.b-cdn\.net/2e8545ec-509d-4571-b855-4cf0235ccd75/thumbnail\.jpg',
},
'params': {'skip_download': True},
}]
_WEBPAGE_TESTS = [{
# Stream requires Referer
'url': 'https://conword.io/',
'info_dict': {
'id': '3a5d863e-9cd6-447e-b6ef-e289af50b349',
'ext': 'mp4',
'title': 'Conword bei der Stadt Köln und Stadt Dortmund',
'description': '',
'upload_date': '20231031',
'duration': 31.0,
'thumbnail': 'https://video.watchuh.com/3a5d863e-9cd6-447e-b6ef-e289af50b349/thumbnail.jpg',
'timestamp': 1698783879,
},
'params': {'skip_download': True},
}, {
# URL requires token and expires
'url': 'https://www.stockphotos.com/video/moscow-subway-the-train-is-arriving-at-the-park-kultury-station-10017830',
'info_dict': {
'id': '0b02fa20-4e8c-4140-8f87-f64d820a3386',
'ext': 'mp4',
'thumbnail': r're:^https?://.*\.b-cdn\.net/0b02fa20-4e8c-4140-8f87-f64d820a3386/thumbnail\.jpg',
'title': 'Moscow subway. The train is arriving at the Park Kultury station.',
'upload_date': '20240531',
'duration': 18.0,
'timestamp': 1717152269,
'description': '',
},
'params': {'skip_download': True},
}]
@classmethod
def _extract_embed_urls(cls, url, webpage):
for embed_url in super()._extract_embed_urls(url, webpage):
yield smuggle_url(embed_url, {'Referer': url})
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id, library_id = self._match_valid_url(url).group('id', 'library_id')
webpage = self._download_webpage(
f'https://iframe.mediadelivery.net/embed/{library_id}/{video_id}', video_id,
headers=traverse_obj(smuggled_data, {'Referer': 'Referer'}),
query=traverse_obj(parse_qs(url), {'token': 'token', 'expires': 'expires'}))
if html_title := self._html_extract_title(webpage, default=None) == '403':
raise ExtractorError(
'This video is inaccessible. Setting a Referer header '
'might be required to access the video', expected=True)
elif html_title == '404':
raise ExtractorError('This video does not exist', expected=True)
headers = {'Referer': url}
info = traverse_obj(self._parse_html5_media_entries(url, webpage, video_id, _headers=headers), 0) or {}
formats = info.get('formats') or []
subtitles = info.get('subtitles') or {}
original_url = self._search_regex(
r'(?:var|const|let)\s+originalUrl\s*=\s*["\']([^"\']+)["\']', webpage, 'original url', default=None)
if url_or_none(original_url):
urlh = self._request_webpage(
HEADRequest(original_url), video_id=video_id, note='Checking original',
headers=headers, fatal=False, expected_status=(403, 404))
if urlh and urlh.status == 200:
formats.append({
'url': original_url,
'format_id': 'source',
'quality': 1,
'http_headers': headers,
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
})
# MediaCage Streams require activation and pings
src_url = self._search_regex(
r'\.setAttribute\([\'"]src[\'"],\s*[\'"]([^\'"]+)[\'"]\)', webpage, 'src url', default=None)
activation_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/activate)[\'"]', webpage, 'activation url', default=None)
ping_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/ping)[\'"]', webpage, 'ping url', default=None)
secret = traverse_obj(parse_qs(src_url), ('secret', 0))
context_id = traverse_obj(parse_qs(src_url), ('contextId', 0))
ping_data = {}
if src_url and activation_url and ping_url and secret and context_id:
self._download_webpage(
activation_url, video_id, headers=headers, note='Downloading activation data')
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, video_id, 'mp4', headers=headers, m3u8_id='hls', fatal=False)
for fmt in fmts:
fmt.update({
'protocol': 'bunnycdn',
'http_headers': headers,
})
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
ping_data = {
'_bunnycdn_ping_data': {
'url': ping_url,
'headers': headers,
'secret': secret,
'context_id': context_id,
},
}
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(webpage, ({find_element(id='main-video', html=True)}, {extract_attributes}, {
'title': ('data-plyr-config', {json.loads}, 'title', {str}),
'thumbnail': ('data-poster', {url_or_none}),
})),
**ping_data,
**self._search_json_ld(webpage, video_id, fatal=False),
}

View File

@ -1,29 +1,32 @@
import base64
import functools
import json
import re
import time
import urllib.parse
from .common import InfoExtractor
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
js_to_json,
jwt_decode_hs256,
mimetype2ext,
orderedSet,
parse_age_limit,
parse_iso8601,
replace_extension,
smuggle_url,
strip_or_none,
traverse_obj,
try_get,
unified_timestamp,
update_url,
url_basename,
url_or_none,
urlencode_postdata,
)
from ..utils.traversal import require, traverse_obj, trim_str
class CBCIE(InfoExtractor):
@ -516,9 +519,43 @@ def entries():
return self.playlist_result(entries(), playlist_id)
class CBCGemIE(InfoExtractor):
class CBCGemBaseIE(InfoExtractor):
_NETRC_MACHINE = 'cbcgem'
_GEO_COUNTRIES = ['CA']
def _call_show_api(self, item_id, display_id=None):
return self._download_json(
f'https://services.radio-canada.ca/ott/catalog/v2/gem/show/{item_id}',
display_id or item_id, query={'device': 'web'})
def _extract_item_info(self, item_info):
episode_number = None
title = traverse_obj(item_info, ('title', {str}))
if title and (mobj := re.match(r'(?P<episode>\d+)\. (?P<title>.+)', title)):
episode_number = int_or_none(mobj.group('episode'))
title = mobj.group('title')
return {
'episode_number': episode_number,
**traverse_obj(item_info, {
'id': ('url', {str}),
'episode_id': ('url', {str}),
'description': ('description', {str}),
'thumbnail': ('images', 'card', 'url', {url_or_none}, {update_url(query=None)}),
'episode_number': ('episodeNumber', {int_or_none}),
'duration': ('metadata', 'duration', {int_or_none}),
'release_timestamp': ('metadata', 'airDate', {unified_timestamp}),
'timestamp': ('metadata', 'availabilityDate', {unified_timestamp}),
'age_limit': ('metadata', 'rating', {trim_str(start='C')}, {parse_age_limit}),
}),
'episode': title,
'title': title,
}
class CBCGemIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s[0-9]+[a-z][0-9]+)'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]+)'
_TESTS = [{
# This is a normal, public, TV show video
'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@ -529,7 +566,7 @@ class CBCGemIE(InfoExtractor):
'description': 'md5:929868d20021c924020641769eb3e7f1',
'thumbnail': r're:https://images\.radio-canada\.ca/[^#?]+/cbc_schitts_creek_season_06e01_thumbnail_v01\.jpg',
'duration': 1324,
'categories': ['comedy'],
'genres': ['Comédie et humour'],
'series': 'Schitt\'s Creek',
'season': 'Season 6',
'season_number': 6,
@ -537,9 +574,10 @@ class CBCGemIE(InfoExtractor):
'episode_number': 1,
'episode_id': 'schitts-creek/s06e01',
'upload_date': '20210618',
'timestamp': 1623988800,
'timestamp': 1623974400,
'release_date': '20200107',
'release_timestamp': 1578427200,
'release_timestamp': 1578355200,
'age_limit': 14,
},
'params': {'format': 'bv'},
}, {
@ -557,12 +595,13 @@ class CBCGemIE(InfoExtractor):
'episode_number': 1,
'episode': 'The Cup Runneth Over',
'episode_id': 'schitts-creek/s01e01',
'duration': 1309,
'categories': ['comedy'],
'duration': 1308,
'genres': ['Comédie et humour'],
'upload_date': '20210617',
'timestamp': 1623902400,
'release_date': '20151124',
'release_timestamp': 1448323200,
'timestamp': 1623888000,
'release_date': '20151123',
'release_timestamp': 1448236800,
'age_limit': 14,
},
'params': {'format': 'bv'},
}, {
@ -570,82 +609,107 @@ class CBCGemIE(InfoExtractor):
'only_matching': True,
}]
_GEO_COUNTRIES = ['CA']
_TOKEN_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcgem'
_CLIENT_ID = 'fc05b0ee-3865-4400-a3cc-3da82c330c23'
_refresh_token = None
_access_token = None
_claims_token = None
def _new_claims_token(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._TOKEN_API_KEY}
resp = self._download_json('https://api.loginradius.com/identity/v2/auth/login',
None, data=data, headers=headers, query=query)
access_token = resp['access_token']
@functools.cached_property
def _ropc_settings(self):
return self._download_json(
'https://services.radio-canada.ca/ott/catalog/v1/gem/settings', None,
'Downloading site settings', query={'device': 'web'})['identityManagement']['ropc']
query = {
'access_token': access_token,
'apikey': self._TOKEN_API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json('https://cloud-api.loginradius.com/sso/jwt/api/token',
None, headers=headers, query=query)
sig = resp['signature']
def _is_jwt_expired(self, token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
data = json.dumps({'jwt': sig}).encode()
headers = {'content-type': 'application/json', 'ott-device-type': 'web'}
resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/token',
None, data=data, headers=headers, expected_status=426)
cbc_access_token = resp['accessToken']
def _call_oauth_api(self, oauth_data, note='Refreshing access token'):
response = self._download_json(
self._ropc_settings['url'], None, note, data=urlencode_postdata({
'client_id': self._CLIENT_ID,
**oauth_data,
'scope': self._ropc_settings['scopes'],
}))
self._refresh_token = response['refresh_token']
self._access_token = response['access_token']
self.cache.store(self._NETRC_MACHINE, 'token_data', [self._refresh_token, self._access_token])
headers = {'content-type': 'application/json', 'ott-device-type': 'web', 'ott-access-token': cbc_access_token}
resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/profile',
None, headers=headers, expected_status=426)
return resp['claimsToken']
def _perform_login(self, username, password):
if not self._refresh_token:
self._refresh_token, self._access_token = self.cache.load(
self._NETRC_MACHINE, 'token_data', default=[None, None])
def _get_claims_token_expiry(self):
# Token is a JWT
# JWT is decoded here and 'exp' field is extracted
# It is a Unix timestamp for when the token expires
b64_data = self._claims_token.split('.')[1]
data = base64.urlsafe_b64decode(b64_data + '==')
return json.loads(data)['exp']
if self._refresh_token and self._access_token:
self.write_debug('Using cached refresh token')
if not self._claims_token:
self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
return
def claims_token_expired(self):
exp = self._get_claims_token_expiry()
# It will expire in less than 10 seconds, or has already expired
return exp - time.time() < 10
try:
self._call_oauth_api({
'grant_type': 'password',
'username': username,
'password': password,
}, note='Logging in')
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 400:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
def claims_token_valid(self):
return self._claims_token is not None and not self.claims_token_expired()
def _fetch_access_token(self):
if self._is_jwt_expired(self._access_token):
try:
self._call_oauth_api({
'grant_type': 'refresh_token',
'refresh_token': self._refresh_token,
})
except ExtractorError:
self._refresh_token, self._access_token = None, None
self.cache.store(self._NETRC_MACHINE, 'token_data', [None, None])
self.report_warning('Refresh token has been invalidated; retrying with credentials')
self._perform_login(*self._get_login_info())
def _get_claims_token(self, email, password):
if not self.claims_token_valid():
self._claims_token = self._new_claims_token(email, password)
return self._access_token
def _fetch_claims_token(self):
if not self._get_login_info()[0]:
return None
if not self._claims_token or self._is_jwt_expired(self._claims_token):
self._claims_token = self._download_json(
'https://services.radio-canada.ca/ott/subscription/v2/gem/Subscriber/profile',
None, 'Downloading claims token', query={'device': 'web'},
headers={'Authorization': f'Bearer {self._fetch_access_token()}'})['claimsToken']
self.cache.store(self._NETRC_MACHINE, 'claims_token', self._claims_token)
else:
self.write_debug('Using cached claims token')
return self._claims_token
def _real_initialize(self):
if self.claims_token_valid():
return
self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._download_json(
f'https://services.radio-canada.ca/ott/cbc-api/v2/assets/{video_id}',
video_id, expected_status=426)
video_id, season_number = self._match_valid_url(url).group('id', 'season')
video_info = self._call_show_api(video_id)
item_info = traverse_obj(video_info, (
'content', ..., 'lineups', ..., 'items',
lambda _, v: v['url'] == video_id, any, {require('item info')}))
email, password = self._get_login_info()
if email and password:
claims_token = self._get_claims_token(email, password)
headers = {'x-claims-token': claims_token}
else:
headers = {}
m3u8_info = self._download_json(video_info['playSession']['url'], video_id, headers=headers)
headers = {}
if claims_token := self._fetch_claims_token():
headers['x-claims-token'] = claims_token
m3u8_info = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/',
video_id, headers=headers, query={
'appCode': 'gem',
'connectionType': 'hd',
'deviceType': 'ipad',
'multibitrate': 'true',
'output': 'json',
'tech': 'hls',
'manifestVersion': '2',
'manifestType': 'desktop',
'idMedia': item_info['idMedia'],
})
if m3u8_info.get('errorCode') == 1:
self.raise_geo_restricted(countries=['CA'])
@ -671,26 +735,20 @@ def _real_extract(self, url):
fmt['preference'] = -2
return {
'season_number': int_or_none(season_number),
**traverse_obj(video_info, {
'series': ('title', {str}),
'season_number': ('structuredMetadata', 'partofSeason', 'seasonNumber', {int_or_none}),
'genres': ('structuredMetadata', 'genre', ..., {str}),
}),
**self._extract_item_info(item_info),
'id': video_id,
'episode_id': video_id,
'formats': formats,
**traverse_obj(video_info, {
'title': ('title', {str}),
'episode': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'series': ('series', {str}),
'season_number': ('season', {int_or_none}),
'episode_number': ('episode', {int_or_none}),
'duration': ('duration', {int_or_none}),
'categories': ('category', {str}, all),
'release_timestamp': ('airDate', {int_or_none(scale=1000)}),
'timestamp': ('availableDate', {int_or_none(scale=1000)}),
}),
}
class CBCGemPlaylistIE(InfoExtractor):
class CBCGemPlaylistIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:playlist'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>(?P<show>[0-9a-z-]+)/s(?P<season>[0-9]+))/?(?:[?#]|$)'
_TESTS = [{
@ -700,70 +758,35 @@ class CBCGemPlaylistIE(InfoExtractor):
'info_dict': {
'id': 'schitts-creek/s06',
'title': 'Season 6',
'description': 'md5:6a92104a56cbeb5818cc47884d4326a2',
'series': 'Schitt\'s Creek',
'season_number': 6,
'season': 'Season 6',
'thumbnail': 'https://images.radio-canada.ca/v1/synps-cbc/season/perso/cbc_schitts_creek_season_06_carousel_v03.jpg?impolicy=ott&im=Resize=(_Size_)&quality=75',
},
}, {
'url': 'https://gem.cbc.ca/schitts-creek/s06',
'only_matching': True,
}]
_API_BASE = 'https://services.radio-canada.ca/ott/cbc-api/v2/shows/'
def _entries(self, season_info):
for episode in traverse_obj(season_info, ('items', lambda _, v: v['url'])):
yield self.url_result(
f'https://gem.cbc.ca/media/{episode["url"]}', CBCGemIE,
**self._extract_item_info(episode))
def _real_extract(self, url):
match = self._match_valid_url(url)
season_id = match.group('id')
show = match.group('show')
show_info = self._download_json(self._API_BASE + show, season_id, expected_status=426)
season = int(match.group('season'))
season_id, show, season = self._match_valid_url(url).group('id', 'show', 'season')
show_info = self._call_show_api(show, display_id=season_id)
season_info = traverse_obj(show_info, (
'content', ..., 'lineups',
lambda _, v: v['seasonNumber'] == int(season), any, {require('season info')}))
season_info = next((s for s in show_info['seasons'] if s.get('season') == season), None)
if season_info is None:
raise ExtractorError(f'Couldn\'t find season {season} of {show}')
episodes = []
for episode in season_info['assets']:
episodes.append({
'_type': 'url_transparent',
'ie_key': 'CBCGem',
'url': 'https://gem.cbc.ca/media/' + episode['id'],
'id': episode['id'],
'title': episode.get('title'),
'description': episode.get('description'),
'thumbnail': episode.get('image'),
'series': episode.get('series'),
'season_number': episode.get('season'),
'season': season_info['title'],
'season_id': season_info.get('id'),
'episode_number': episode.get('episode'),
'episode': episode.get('title'),
'episode_id': episode['id'],
'duration': episode.get('duration'),
'categories': [episode.get('category')],
})
thumbnail = None
tn_uri = season_info.get('image')
# the-national was observed to use a "data:image/png;base64"
# URI for their 'image' value. The image was 1x1, and is
# probably just a placeholder, so it is ignored.
if tn_uri is not None and not tn_uri.startswith('data:'):
thumbnail = tn_uri
return {
'_type': 'playlist',
'entries': episodes,
'id': season_id,
'title': season_info['title'],
'description': season_info.get('description'),
'thumbnail': thumbnail,
'series': show_info.get('title'),
'season_number': season_info.get('season'),
'season': season_info['title'],
}
return self.playlist_result(
self._entries(season_info), season_id,
**traverse_obj(season_info, {
'title': ('title', {str}),
'season': ('title', {str}),
'season_number': ('seasonNumber', {int_or_none}),
}), series=traverse_obj(show_info, ('title', {str})))
class CBCGemLiveIE(InfoExtractor):

View File

@ -2,7 +2,6 @@
import collections
import functools
import getpass
import hashlib
import http.client
import http.cookiejar
import http.cookies
@ -30,6 +29,7 @@
from ..cookies import LenientSimpleCookie
from ..downloader.f4m import get_base_url, remove_encrypted_media
from ..downloader.hls import HlsFD
from ..globals import plugin_ies_overrides
from ..networking import HEADRequest, Request
from ..networking.exceptions import (
HTTPError,
@ -78,7 +78,6 @@
parse_iso8601,
parse_m3u8_attributes,
parse_resolution,
sanitize_filename,
sanitize_url,
smuggle_url,
str_or_none,
@ -100,6 +99,7 @@
xpath_text,
xpath_with_ns,
)
from ..utils._utils import _request_dump_filename
class InfoExtractor:
@ -201,6 +201,11 @@ class InfoExtractor:
fragment_base_url
* "duration" (optional, int or float)
* "filesize" (optional, int)
* hls_media_playlist_data
The M3U8 media playlist data as a string.
Only use if the data must be modified during extraction and
the native HLS downloader should bypass requesting the URL.
Does not apply if ffmpeg is used as external downloader
* is_from_start Is a live format that can be downloaded
from the start. Boolean
* preference Order number of this format. If this field is
@ -1017,23 +1022,6 @@ def __check_blocked(self, content):
'Visit http://blocklist.rkn.gov.ru/ for a block reason.',
expected=True)
def _request_dump_filename(self, url, video_id, data=None):
if data is not None:
data = hashlib.md5(data).hexdigest()
basen = join_nonempty(video_id, data, url, delim='_')
trim_length = self.get_param('trim_file_name') or 240
if len(basen) > trim_length:
h = '___' + hashlib.md5(basen.encode()).hexdigest()
basen = basen[:trim_length - len(h)] + h
filename = sanitize_filename(f'{basen}.dump', restricted=True)
# Working around MAX_PATH limitation on Windows (see
# http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
if os.name == 'nt':
absfilepath = os.path.abspath(filename)
if len(absfilepath) > 259:
filename = fR'\\?\{absfilepath}'
return filename
def __decode_webpage(self, webpage_bytes, encoding, headers):
if not encoding:
encoding = self._guess_encoding_from_content(headers.get('Content-Type', ''), webpage_bytes)
@ -1062,7 +1050,9 @@ def _webpage_read_content(self, urlh, url_or_request, video_id, note=None, errno
if self.get_param('write_pages'):
if isinstance(url_or_request, Request):
data = self._create_request(url_or_request, data).data
filename = self._request_dump_filename(urlh.url, video_id, data)
filename = _request_dump_filename(
urlh.url, video_id, data,
trim_length=self.get_param('trim_file_name'))
self.to_screen(f'Saving request to {filename}')
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
@ -1123,7 +1113,9 @@ def download_content(self, url_or_request, video_id, note=note, errnote=errnote,
impersonate=None, require_impersonation=False):
if self.get_param('load_pages'):
url_or_request = self._create_request(url_or_request, data, headers, query)
filename = self._request_dump_filename(url_or_request.url, video_id, url_or_request.data)
filename = _request_dump_filename(
url_or_request.url, video_id, url_or_request.data,
trim_length=self.get_param('trim_file_name'))
self.to_screen(f'Loading request from {filename}')
try:
with open(filename, 'rb') as dumpf:
@ -3963,14 +3955,18 @@ def _extract_url(cls, webpage): # TODO: Remove
def __init_subclass__(cls, *, plugin_name=None, **kwargs):
if plugin_name:
mro = inspect.getmro(cls)
super_class = cls.__wrapped__ = mro[mro.index(cls) + 1]
cls.PLUGIN_NAME, cls.ie_key = plugin_name, super_class.ie_key
cls.IE_NAME = f'{super_class.IE_NAME}+{plugin_name}'
next_mro_class = super_class = mro[mro.index(cls) + 1]
while getattr(super_class, '__wrapped__', None):
super_class = super_class.__wrapped__
setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
_PLUGIN_OVERRIDES[super_class].append(cls)
if not any(override.PLUGIN_NAME == plugin_name for override in plugin_ies_overrides.value[super_class]):
cls.__wrapped__ = next_mro_class
cls.PLUGIN_NAME, cls.ie_key = plugin_name, next_mro_class.ie_key
cls.IE_NAME = f'{next_mro_class.IE_NAME}+{plugin_name}'
setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
plugin_ies_overrides.value[super_class].append(cls)
return super().__init_subclass__(**kwargs)
@ -4026,6 +4022,3 @@ class UnsupportedURLIE(InfoExtractor):
def _real_extract(self, url):
raise UnsupportedError(url)
_PLUGIN_OVERRIDES = collections.defaultdict(list)

View File

@ -3,7 +3,7 @@
class CultureUnpluggedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
_VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/(?:documentary/watch-online/)?play/(?P<id>\d+)(?:/(?P<display_id>[^/#?]+))?'
_TESTS = [{
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
@ -12,12 +12,25 @@ class CultureUnpluggedIE(InfoExtractor):
'display_id': 'The-Next--Best-West',
'ext': 'mp4',
'title': 'The Next, Best West',
'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
'description': 'md5:770033a3b7c2946a3bcfb7f1c6fb7045',
'thumbnail': r're:^https?://.*\.jpg$',
'creator': 'Coldstream Creative',
'creators': ['Coldstream Creative'],
'duration': 2203,
'view_count': int,
},
}, {
'url': 'https://www.cultureunplugged.com/play/2833/Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
'md5': 'dc2014bc470dfccba389a1c934fa29fa',
'info_dict': {
'id': '2833',
'display_id': 'Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
'ext': 'mp4',
'title': 'Koi Sunta Hai: Journeys with Kumar & Kabir (Someone is Listening)',
'description': 'md5:fa94ac934927c98660362b8285b2cda5',
'view_count': int,
'thumbnail': 'https://s3.amazonaws.com/cdn.cultureunplugged.com/thumbnails_16_9/lg/2833.jpg',
'creators': ['Srishti'],
},
}, {
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
'only_matching': True,

View File

@ -100,7 +100,7 @@ def _call_api(self, object_type, xid, object_fields, note, filter_extra=None):
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'''(?ix)
https?://
(?:https?:)?//
(?:
dai\.ly/|
(?:
@ -116,7 +116,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
(?P<id>[^/?_&#]+)(?:[\w-]*\?playlist=(?P<playlist_id>x[0-9a-z]+))?
'''
IE_NAME = 'dailymotion'
_EMBED_REGEX = [r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1']
_EMBED_REGEX = [rf'(?ix)<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)["\'](?P<url>{_VALID_URL[5:]})']
_TESTS = [{
'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
'md5': '074b95bdee76b9e3654137aee9c79dfe',
@ -308,6 +308,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'description': 'Que lindura',
'tags': [],
},
}, {
# //geo.dailymotion.com/player/xysxq.html?video=k2Y4Mjp7krAF9iCuINM
'url': 'https://lcp.fr/programmes/avant-la-catastrophe-la-naissance-de-la-dictature-nazie-1933-1936-346819',
'info_dict': {
'id': 'k2Y4Mjp7krAF9iCuINM',
'ext': 'mp4',
'title': 'Avant la catastrophe la naissance de la dictature nazie 1933 -1936',
'description': 'md5:7b620d5e26edbe45f27bbddc1c0257c1',
'uploader': 'LCP Assemblée nationale',
'uploader_id': 'xbz33d',
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 3220,
'thumbnail': 'https://s1.dmcdn.net/v/Xvumk1djJBUZfjj2a/x1080',
'tags': [],
'timestamp': 1739919947,
'upload_date': '20250218',
},
}]
_GEO_BYPASS = False
_COMMON_MEDIA_FIELDS = '''description

View File

@ -0,0 +1,130 @@
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import clean_html, int_or_none, traverse_obj, url_or_none, urlencode_postdata
class DigiviewIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ladigitale\.dev/digiview/#/v/(?P<id>[0-9a-f]+)'
_TESTS = [{
# normal video
'url': 'https://ladigitale.dev/digiview/#/v/67a8e50aee2ec',
'info_dict': {
'id': '67a8e50aee2ec',
'ext': 'mp4',
'title': 'Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 635,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 635,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 0,
'availability': 'public',
},
}, {
# cut video
'url': 'https://ladigitale.dev/digiview/#/v/67a8e51d0dd58',
'info_dict': {
'id': '67a8e51d0dd58',
'ext': 'mp4',
'title': 'Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 5,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 10,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 5,
'availability': 'public',
},
}, {
# changed title
'url': 'https://ladigitale.dev/digiview/#/v/67a8ea5644d7a',
'info_dict': {
'id': '67a8ea5644d7a',
'ext': 'mp4',
'title': 'Big Buck Bunny (with title changed)',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 5,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 15,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 10,
'availability': 'public',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'https://ladigitale.dev/digiview/inc/recuperer_video.php', video_id,
data=urlencode_postdata({'id': video_id}))
clip_id = video_data['videoId']
return self.url_result(
f'https://www.youtube.com/watch?v={clip_id}',
YoutubeIE, video_id, url_transparent=True,
**traverse_obj(video_data, {
'section_start': ('debut', {int_or_none}),
'section_end': ('fin', {int_or_none}),
'description': ('description', {clean_html}, filter),
'title': ('titre', {str}),
'thumbnail': ('vignette', {url_or_none}),
'view_count': ('vues', {int_or_none}),
}),
)

View File

@ -1,10 +1,24 @@
from .zdf import ZDFIE
from .zdf import ZDFBaseIE
class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
class DreiSatIE(ZDFBaseIE):
IE_NAME = '3sat'
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
'url': 'https://www.3sat.de/dokumentation/reise/traumziele-suedostasiens-die-philippinen-und-vietnam-102.html',
'info_dict': {
'id': '231124_traumziele_philippinen_und_vietnam_dokreise',
'ext': 'mp4',
'title': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'description': 'md5:26329ce5197775b596773b939354079d',
'duration': 2625.0,
'thumbnail': 'https://www.3sat.de/assets/traumziele-suedostasiens-die-philippinen-und-vietnam-100~2400x1350?cb=1699870351148',
'episode': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'episode_id': 'POS_cc7ff51c-98cf-4d12-b99d-f7a551de1c95',
'timestamp': 1738593000,
'upload_date': '20250203',
},
}, {
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
@ -17,6 +31,7 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'timestamp': 1608604200,
'upload_date': '20201222',
},
'skip': '410 Gone',
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
@ -30,6 +45,7 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'params': {
'skip_download': True,
},
'skip': '404 Not Found',
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
@ -39,3 +55,14 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, fatal=False)
if webpage:
player = self._extract_player(webpage, url, fatal=False)
if player:
return self._extract_regular(url, player, video_id)
return self._extract_mobile(video_id)

View File

@ -1,28 +1,37 @@
import contextlib
import inspect
import os
from ..plugins import load_plugins
from ..globals import LAZY_EXTRACTORS
from ..globals import extractors as _extractors_context
# NB: Must be before other imports so that plugins can be correctly injected
_PLUGIN_CLASSES = load_plugins('extractor', 'IE')
_CLASS_LOOKUP = None
if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
LAZY_EXTRACTORS.value = False
else:
try:
from .lazy_extractors import _CLASS_LOOKUP
LAZY_EXTRACTORS.value = True
except ImportError:
LAZY_EXTRACTORS.value = None
_LAZY_LOADER = False
if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
with contextlib.suppress(ImportError):
from .lazy_extractors import * # noqa: F403
from .lazy_extractors import _ALL_CLASSES
_LAZY_LOADER = True
if not _CLASS_LOOKUP:
from . import _extractors
if not _LAZY_LOADER:
from ._extractors import * # noqa: F403
_ALL_CLASSES = [ # noqa: F811
klass
for name, klass in globals().items()
_CLASS_LOOKUP = {
name: value
for name, value in inspect.getmembers(_extractors)
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE) # noqa: F405
}
_CLASS_LOOKUP['GenericIE'] = _extractors.GenericIE
globals().update(_PLUGIN_CLASSES)
_ALL_CLASSES[:0] = _PLUGIN_CLASSES.values()
# We want to append to the main lookup
_current = _extractors_context.value
for name, ie in _CLASS_LOOKUP.items():
_current.setdefault(name, ie)
from .common import _PLUGIN_OVERRIDES # noqa: F401
def __getattr__(name):
value = _CLASS_LOOKUP.get(name)
if not value:
raise AttributeError(f'module {__name__} has no attribute {name}')
return value

View File

@ -9,6 +9,7 @@
ExtractorError,
clean_html,
determine_ext,
extract_attributes,
filter_dict,
format_field,
int_or_none,
@ -18,7 +19,7 @@
unsmuggle_url,
url_or_none,
)
from ..utils.traversal import traverse_obj
from ..utils.traversal import find_element, traverse_obj
class FranceTVBaseInfoExtractor(InfoExtractor):
@ -358,7 +359,8 @@ def _real_extract(self, url):
# For livestreams we need the id of the stream instead of the currently airing episode id
video_id = traverse_obj(nextjs_data, (
..., ..., 'children', ..., 'children', ..., 'children', ..., 'children', ..., ...,
'children', ..., ..., 'children', ..., ..., 'children', ..., 'options', 'id', {str}, any))
'children', ..., ..., 'children', ..., ..., 'children', (..., (..., ...)),
'options', 'id', {str}, any))
else:
video_id = traverse_obj(nextjs_data, (
..., ..., ..., 'children',
@ -459,11 +461,16 @@ def _real_extract(self, url):
self.url_result(dailymotion_url, DailymotionIE.ie_key())
for dailymotion_url in dailymotion_urls])
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
video_id = (
traverse_obj(webpage, (
{find_element(tag='button', attr='data-cy', value='francetv-player-wrapper', html=True)},
{extract_attributes}, 'id'))
or self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
)
return self._make_url_result(video_id, url=url)

View File

@ -293,6 +293,19 @@ class GenericIE(InfoExtractor):
'timestamp': 1378272859.0,
},
},
# Live DASH MPD
{
'url': 'https://livesim2.dashif.org/livesim2/ato_10/testpic_2s/Manifest.mpd',
'info_dict': {
'id': 'Manifest',
'ext': 'mp4',
'title': r're:Manifest \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'live_status': 'is_live',
},
'params': {
'skip_download': 'livestream',
},
},
# m3u8 served with Content-Type: audio/x-mpegURL; charset=utf-8
{
'url': 'http://once.unicornmedia.com/now/master/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/content.m3u8',
@ -2436,10 +2449,9 @@ def _real_extract(self, url):
subtitles = {}
if format_id.endswith('mpegurl') or ext == 'm3u8':
formats, subtitles = self._extract_m3u8_formats_and_subtitles(url, video_id, 'mp4', headers=headers)
elif format_id.endswith(('mpd', 'dash+xml')) or ext == 'mpd':
formats, subtitles = self._extract_mpd_formats_and_subtitles(url, video_id, headers=headers)
elif format_id == 'f4m' or ext == 'f4m':
formats = self._extract_f4m_formats(url, video_id, headers=headers)
# Don't check for DASH/mpd here, do it later w/ first_bytes. Same number of requests either way
else:
formats = [{
'format_id': format_id,
@ -2521,6 +2533,7 @@ def _real_extract(self, url):
doc,
mpd_base_url=full_response.url.rpartition('/')[0],
mpd_url=url)
info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None
self._extra_manifest_info(info_dict, url)
self.report_detected('DASH manifest')
return info_dict

View File

@ -69,8 +69,13 @@ class GloboIE(InfoExtractor):
'info_dict': {
'id': '8013907',
'ext': 'mp4',
'title': 'Capítulo de 14081989',
'title': 'Capítulo de 14/08/1989',
'episode': 'Episode 1',
'episode_number': 1,
'uploader': 'Tieta',
'uploader_id': '11895',
'duration': 2858.389,
'subtitles': 'count:1',
},
'params': {
'skip_download': True,
@ -82,7 +87,12 @@ class GloboIE(InfoExtractor):
'id': '12824146',
'ext': 'mp4',
'title': 'Acordo de damas',
'episode': 'Episode 1',
'episode_number': 1,
'uploader': 'Rensga Hits!',
'uploader_id': '20481',
'duration': 1953.994,
'season': 'Season 2',
'season_number': 2,
},
'params': {
@ -136,9 +146,10 @@ def _real_extract(self, url):
else:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
main_source['url'], video_id, 'mp4', m3u8_id='hls')
self._merge_subtitles(traverse_obj(main_source, ('text', ..., {
'url': ('subtitle', 'srt', 'url', {url_or_none}),
}, all, {subs_list_to_dict(lang='en')})), target=subtitles)
self._merge_subtitles(traverse_obj(main_source, ('text', ..., ('caption', 'subtitle'), {
'url': ('srt', 'url', {url_or_none}),
}, all, {subs_list_to_dict(lang='pt-BR')})), target=subtitles)
return {
'id': video_id,

View File

@ -2,12 +2,12 @@
import itertools
import json
import re
import time
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
bug_reports_message,
decode_base_n,
encode_base_n,
filter_dict,
@ -15,12 +15,12 @@
format_field,
get_element_by_attribute,
int_or_none,
join_nonempty,
lowercase_escape,
str_or_none,
str_to_int,
traverse_obj,
url_or_none,
urlencode_postdata,
)
_ENCODING_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
@ -28,63 +28,30 @@
def _pk_to_id(media_id):
"""Source: https://stackoverflow.com/questions/24437823/getting-instagram-post-url-from-media-id"""
return encode_base_n(int(media_id.split('_')[0]), table=_ENCODING_CHARS)
pk = int(str(media_id).split('_')[0])
return encode_base_n(pk, table=_ENCODING_CHARS)
def _id_to_pk(shortcode):
"""Covert a shortcode to a numeric value"""
return decode_base_n(shortcode[:11], table=_ENCODING_CHARS)
"""Convert a shortcode to a numeric value"""
if len(shortcode) > 28:
shortcode = shortcode[:-28]
return decode_base_n(shortcode, table=_ENCODING_CHARS)
class InstagramBaseIE(InfoExtractor):
_NETRC_MACHINE = 'instagram'
_IS_LOGGED_IN = False
_API_BASE_URL = 'https://i.instagram.com/api/v1'
_LOGIN_URL = 'https://www.instagram.com/accounts/login'
_API_HEADERS = {
'X-IG-App-ID': '936619743392459',
'X-ASBD-ID': '198387',
'X-IG-WWW-Claim': '0',
'Origin': 'https://www.instagram.com',
'Accept': '*/*',
}
def _perform_login(self, username, password):
if self._IS_LOGGED_IN:
return
login_webpage = self._download_webpage(
self._LOGIN_URL, None, note='Downloading login webpage', errnote='Failed to download login webpage')
shared_data = self._parse_json(self._search_regex(
r'window\._sharedData\s*=\s*({.+?});', login_webpage, 'shared data', default='{}'), None)
login = self._download_json(
f'{self._LOGIN_URL}/ajax/', None, note='Logging in', headers={
**self._API_HEADERS,
'X-Requested-With': 'XMLHttpRequest',
'X-CSRFToken': shared_data['config']['csrf_token'],
'X-Instagram-AJAX': shared_data['rollout_hash'],
'Referer': 'https://www.instagram.com/',
}, data=urlencode_postdata({
'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{int(time.time())}:{password}',
'username': username,
'queryParams': '{}',
'optIntoOneTap': 'false',
'stopDeletionNonce': '',
'trustedDeviceRecords': '{}',
}))
if not login.get('authenticated'):
if login.get('message'):
raise ExtractorError(f'Unable to login: {login["message"]}')
elif login.get('user'):
raise ExtractorError('Unable to login: Sorry, your password was incorrect. Please double-check your password.', expected=True)
elif login.get('user') is False:
raise ExtractorError('Unable to login: The username you entered doesn\'t belong to an account. Please check your username and try again.', expected=True)
raise ExtractorError('Unable to login')
InstagramBaseIE._IS_LOGGED_IN = True
@property
def _api_headers(self):
return {
'X-IG-App-ID': self._configuration_arg('app_id', ['936619743392459'], ie_key=InstagramIE)[0],
'X-ASBD-ID': '198387',
'X-IG-WWW-Claim': '0',
'Origin': 'https://www.instagram.com',
'Accept': '*/*',
}
def _get_count(self, media, kind, *keys):
return traverse_obj(
@ -209,7 +176,7 @@ def _extract_product(self, product_info):
def _get_comments(self, video_id):
comments_info = self._download_json(
f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/comments/?can_support_threading=true&permalink_enabled=false', video_id,
fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._API_HEADERS) or {}
fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._api_headers) or {}
comment_data = traverse_obj(comments_info, ('edge_media_to_parent_comment', 'edges'), 'comments')
for comment_dict in comment_data or []:
@ -402,14 +369,14 @@ def _real_extract(self, url):
info = traverse_obj(self._download_json(
f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/info/', video_id,
fatal=False, errnote='Video info extraction failed',
note='Downloading video info', headers=self._API_HEADERS), ('items', 0))
note='Downloading video info', headers=self._api_headers), ('items', 0))
if info:
media.update(info)
return self._extract_product(media)
api_check = self._download_json(
f'{self._API_BASE_URL}/web/get_ruling_for_content/?content_type=MEDIA&target_id={_id_to_pk(video_id)}',
video_id, headers=self._API_HEADERS, fatal=False, note='Setting up session', errnote=False) or {}
video_id, headers=self._api_headers, fatal=False, note='Setting up session', errnote=False) or {}
csrf_token = self._get_cookies('https://www.instagram.com').get('csrftoken')
if not csrf_token:
@ -429,7 +396,7 @@ def _real_extract(self, url):
general_info = self._download_json(
'https://www.instagram.com/graphql/query/', video_id, fatal=False, errnote=False,
headers={
**self._API_HEADERS,
**self._api_headers,
'X-CSRFToken': csrf_token or '',
'X-Requested-With': 'XMLHttpRequest',
'Referer': url,
@ -437,7 +404,6 @@ def _real_extract(self, url):
'doc_id': '8845758582119845',
'variables': json.dumps(variables, separators=(',', ':')),
})
media.update(traverse_obj(general_info, ('data', 'xdt_shortcode_media')) or {})
if not general_info:
self.report_warning('General metadata extraction failed (some metadata might be missing).', video_id)
@ -466,6 +432,26 @@ def _real_extract(self, url):
media.update(traverse_obj(
additional_data, ('graphql', 'shortcode_media'), 'shortcode_media', expected_type=dict) or {})
else:
xdt_shortcode_media = traverse_obj(general_info, ('data', 'xdt_shortcode_media', {dict})) or {}
if not xdt_shortcode_media:
error = join_nonempty('title', 'description', delim=': ', from_dict=api_check)
if 'Restricted Video' in error:
self.raise_login_required(error)
elif error:
raise ExtractorError(error, expected=True)
elif len(video_id) > 28:
# It's a private post (video_id == shortcode + 28 extra characters)
# Only raise after getting empty response; sometimes "long"-shortcode posts are public
self.raise_login_required(
'This content is only available for registered users who follow this account')
raise ExtractorError(
'Instagram sent an empty media response. Check if this post is accessible in your '
f'browser without being logged-in. If it is not, then u{self._login_hint()[1:]}. '
'Otherwise, if the post is accessible in browser without being logged-in'
f'{bug_reports_message(before=",")}', expected=True)
media.update(xdt_shortcode_media)
username = traverse_obj(media, ('owner', 'username')) or self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"', webpage, 'username', fatal=False)
@ -485,8 +471,7 @@ def _real_extract(self, url):
return self.playlist_result(
self._extract_nodes(nodes, True), video_id,
format_field(username, None, 'Post by %s'), description)
video_url = self._og_search_video_url(webpage, secure=False)
raise ExtractorError('There is no video in this post', expected=True)
formats = [{
'url': video_url,
@ -689,7 +674,7 @@ def _query_vars_for(data):
class InstagramStoryIE(InstagramBaseIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/?#]+)(?:/(?P<id>\d+))?'
IE_NAME = 'instagram:story'
_TESTS = [{
@ -699,25 +684,38 @@ class InstagramStoryIE(InstagramBaseIE):
'title': 'Rare',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.instagram.com/stories/fruits_zipper/3570766765028588805/',
'only_matching': True,
}, {
'url': 'https://www.instagram.com/stories/fruits_zipper',
'only_matching': True,
}]
def _real_extract(self, url):
username, story_id = self._match_valid_url(url).groups()
story_info = self._download_webpage(url, story_id)
user_info = self._search_json(r'"user":', story_info, 'user info', story_id, fatal=False)
username, story_id = self._match_valid_url(url).group('user', 'id')
if username == 'highlights' and not story_id: # story id is only mandatory for highlights
raise ExtractorError('Input URL is missing a highlight ID', expected=True)
display_id = story_id or username
story_info = self._download_webpage(url, display_id)
user_info = self._search_json(r'"user":', story_info, 'user info', display_id, fatal=False)
if not user_info:
self.raise_login_required('This content is unreachable')
user_id = traverse_obj(user_info, 'pk', 'id', expected_type=str)
story_info_url = user_id if username != 'highlights' else f'highlight:{story_id}'
if not story_info_url: # user id is only mandatory for non-highlights
raise ExtractorError('Unable to extract user id')
if username == 'highlights':
story_info_url = f'highlight:{story_id}'
else:
if not user_id: # user id is only mandatory for non-highlights
raise ExtractorError('Unable to extract user id')
story_info_url = user_id
videos = traverse_obj(self._download_json(
f'{self._API_BASE_URL}/feed/reels_media/?reel_ids={story_info_url}',
story_id, errnote=False, fatal=False, headers=self._API_HEADERS), 'reels')
display_id, errnote=False, fatal=False, headers=self._api_headers), 'reels')
if not videos:
self.raise_login_required('You need to log in to access this content')
user_info = traverse_obj(videos, (user_id, 'user', {dict})) or {}
full_name = traverse_obj(videos, (f'highlight:{story_id}', 'user', 'full_name'), (user_id, 'user', 'full_name'))
story_title = traverse_obj(videos, (f'highlight:{story_id}', 'title'))
@ -727,6 +725,7 @@ def _real_extract(self, url):
highlights = traverse_obj(videos, (f'highlight:{story_id}', 'items'), (user_id, 'items'))
info_data = []
for highlight in highlights:
highlight.setdefault('user', {}).update(user_info)
highlight_data = self._extract_product(highlight)
if highlight_data.get('formats'):
info_data.append({
@ -734,4 +733,7 @@ def _real_extract(self, url):
'uploader_id': user_id,
**filter_dict(highlight_data),
})
if username != 'highlights' and story_id and not self._yes_playlist(username, story_id):
return traverse_obj(info_data, (lambda _, v: v['id'] == _pk_to_id(story_id), any))
return self.playlist_result(info_data, playlist_id=story_id, playlist_title=story_title)

View File

@ -26,6 +26,7 @@ class LBRYBaseIE(InfoExtractor):
_CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
_OPT_CLAIM_ID = f'[^$@:/?#&]+(?:[:#]{_CLAIM_ID_REGEX})?'
_SUPPORTED_STREAM_TYPES = ['video', 'audio']
_UNSUPPORTED_STREAM_TYPES = ['binary']
_PAGE_SIZE = 50
def _call_api_proxy(self, method, display_id, params, resource):
@ -336,12 +337,15 @@ def _real_extract(self, url):
'vcodec': 'none' if stream_type == 'audio' else None,
})
final_url = None
# HEAD request returns redirect response to m3u8 URL if available
final_url = self._request_webpage(
urlh = self._request_webpage(
HEADRequest(streaming_url), display_id, headers=headers,
note='Downloading streaming redirect url info').url
note='Downloading streaming redirect url info', fatal=False)
if urlh:
final_url = urlh.url
elif result.get('value_type') == 'stream':
elif result.get('value_type') == 'stream' and stream_type not in self._UNSUPPORTED_STREAM_TYPES:
claim_id, is_live = result['signing_channel']['claim_id'], True
live_data = self._download_json(
'https://api.odysee.live/livestream/is_live', claim_id,

View File

@ -1,35 +1,36 @@
from .common import InfoExtractor
from ..utils import parse_age_limit, parse_duration, traverse_obj
from ..utils import parse_age_limit, parse_duration, url_or_none
from ..utils.traversal import traverse_obj
class MagellanTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magellantv\.com/(?:watch|video)/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.magellantv.com/watch/my-dads-on-death-row?type=v',
'url': 'https://www.magellantv.com/watch/incas-the-new-story?type=v',
'info_dict': {
'id': 'my-dads-on-death-row',
'id': 'incas-the-new-story',
'ext': 'mp4',
'title': 'My Dad\'s On Death Row',
'description': 'md5:33ba23b9f0651fc4537ed19b1d5b0d7a',
'duration': 3780.0,
'title': 'Incas: The New Story',
'description': 'md5:936c7f6d711c02dfb9db22a067b586fe',
'age_limit': 14,
'tags': ['Justice', 'Reality', 'United States', 'True Crime'],
'duration': 3060.0,
'tags': ['Ancient History', 'Archaeology', 'Anthropology'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.magellantv.com/video/james-bulger-the-new-revelations',
'url': 'https://www.magellantv.com/video/tortured-to-death-murdering-the-nanny',
'info_dict': {
'id': 'james-bulger-the-new-revelations',
'id': 'tortured-to-death-murdering-the-nanny',
'ext': 'mp4',
'title': 'James Bulger: The New Revelations',
'description': 'md5:7b97922038bad1d0fe8d0470d8a189f2',
'title': 'Tortured to Death: Murdering the Nanny',
'description': 'md5:d87033594fa218af2b1a8b49f52511e5',
'age_limit': 14,
'duration': 2640.0,
'age_limit': 0,
'tags': ['Investigation', 'True Crime', 'Justice', 'Europe'],
'tags': ['True Crime', 'Murder'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.magellantv.com/watch/celebration-nation',
'url': 'https://www.magellantv.com/watch/celebration-nation?type=s',
'info_dict': {
'id': 'celebration-nation',
'ext': 'mp4',
@ -43,10 +44,19 @@ class MagellanTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = traverse_obj(self._search_nextjs_data(webpage, video_id), (
'props', 'pageProps', 'reactContext',
(('video', 'detail'), ('series', 'currentEpisode')), {dict}), get_all=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(data['jwpVideoUrl'], video_id)
context = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['reactContext']
data = traverse_obj(context, ((('video', 'detail'), ('series', 'currentEpisode')), {dict}, any))
formats, subtitles = [], {}
for m3u8_url in set(traverse_obj(data, ((('manifests', ..., 'hls'), 'jwp_video_url'), {url_or_none}))):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if not formats and (error := traverse_obj(context, ('errorDetailPage', 'errorMessage', {str}))):
if 'available in your country' in error:
self.raise_geo_restricted(msg=error)
self.raise_no_formats(f'{self.IE_NAME} said: {error}', expected=True)
return {
'id': video_id,

View File

@ -4,7 +4,9 @@
from ..utils import (
extract_attributes,
unified_timestamp,
url_or_none,
)
from ..utils.traversal import traverse_obj
class N1InfoAssetIE(InfoExtractor):
@ -35,9 +37,9 @@ class N1InfoIIE(InfoExtractor):
IE_NAME = 'N1Info:article'
_VALID_URL = r'https?://(?:(?:\w+\.)?n1info\.\w+|nova\.rs)/(?:[^/?#]+/){1,2}(?P<id>[^/?#]+)'
_TESTS = [{
# Youtube embedded
# YouTube embedded
'url': 'https://rs.n1info.com/sport-klub/tenis/kako-je-djokovic-propustio-istorijsku-priliku-video/',
'md5': '01ddb6646d0fd9c4c7d990aa77fe1c5a',
'md5': '987ce6fd72acfecc453281e066b87973',
'info_dict': {
'id': 'L5Hd4hQVUpk',
'ext': 'mp4',
@ -45,7 +47,26 @@ class N1InfoIIE(InfoExtractor):
'title': 'Ozmo i USO21, ep. 13: Novak Đoković Danil Medvedev | Ključevi Poraza, Budućnost | SPORT KLUB TENIS',
'description': 'md5:467f330af1effedd2e290f10dc31bb8e',
'uploader': 'Sport Klub',
'uploader_id': 'sportklub',
'uploader_id': '@sportklub',
'uploader_url': 'https://www.youtube.com/@sportklub',
'channel': 'Sport Klub',
'channel_id': 'UChpzBje9Ro6CComXe3BgNaw',
'channel_url': 'https://www.youtube.com/channel/UChpzBje9Ro6CComXe3BgNaw',
'channel_is_verified': True,
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 1049,
'thumbnail': 'https://i.ytimg.com/vi/L5Hd4hQVUpk/maxresdefault.jpg',
'chapters': 'count:9',
'categories': ['Sports'],
'tags': 'count:10',
'timestamp': 1631522787,
'playable_in_embed': True,
'availability': 'public',
'live_status': 'not_live',
},
}, {
'url': 'https://rs.n1info.com/vesti/djilas-los-plan-za-metro-nece-resiti-nijedan-saobracajni-problem/',
@ -55,6 +76,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Đilas: Predlog izgradnje metroa besmislen; SNS odbacuje navode',
'upload_date': '20210924',
'timestamp': 1632481347,
'thumbnail': 'http://n1info.rs/wp-content/themes/ucnewsportal-n1/dist/assets/images/placeholder-image-video.jpg',
},
'params': {
'skip_download': True,
@ -67,6 +89,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Zadnji dnevi na kopališču Ilirija: “Ilirija ni umrla, ubili so jo”',
'timestamp': 1632567630,
'upload_date': '20210925',
'thumbnail': 'https://n1info.si/wp-content/uploads/2021/09/06/1630945843-tomaz3.png',
},
'params': {
'skip_download': True,
@ -81,6 +104,14 @@ class N1InfoIIE(InfoExtractor):
'upload_date': '20210924',
'timestamp': 1632448649.0,
'uploader': 'YouLotWhatDontStop',
'display_id': 'pu9wbx',
'channel_id': 'serbia',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 0,
'duration': 134,
'thumbnail': 'https://external-preview.redd.it/5nmmawSeGx60miQM3Iq-ueC9oyCLTLjjqX-qqY8uRsc.png?format=pjpg&auto=webp&s=2f973400b04d23f871b608b178e47fc01f9b8f1d',
},
'params': {
'skip_download': True,
@ -93,6 +124,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Žaklina Tatalović Ani Brnabić: Pričate laži (VIDEO)',
'upload_date': '20211102',
'timestamp': 1635861677,
'thumbnail': 'https://nova.rs/wp-content/uploads/2021/11/02/1635860298-TNJG_Ana_Brnabic_i_Zaklina_Tatalovic_100_dana_Vlade_GP.jpg',
},
}, {
'url': 'https://n1info.rs/vesti/cuta-biti-u-kosovskoj-mitrovici-znaci-da-te-docekaju-eksplozivnim-napravama/',
@ -104,6 +136,16 @@ class N1InfoIIE(InfoExtractor):
'timestamp': 1687290536,
'thumbnail': 'https://cdn.brid.tv/live/partners/26827/snapshot/1332368_th_6492013a8356f_1687290170.jpg',
},
}, {
'url': 'https://n1info.rs/vesti/vuciceva-turneja-po-srbiji-najavljuje-kontrarevoluciju-preti-svom-narodu-vredja-novinare/',
'info_dict': {
'id': '2025974',
'ext': 'mp4',
'title': 'Vučićeva turneja po Srbiji: Najavljuje kontrarevoluciju, preti svom narodu, vređa novinare',
'thumbnail': 'https://cdn-uc.brid.tv/live/partners/26827/snapshot/2025974_fhd_67c4a23280a81_1740939826.jpg',
'timestamp': 1740939936,
'upload_date': '20250302',
},
}, {
'url': 'https://hr.n1info.com/vijesti/pravobraniteljica-o-ubojstvu-u-zagrebu-radi-se-o-doista-nezapamcenoj-situaciji/',
'only_matching': True,
@ -115,11 +157,11 @@ def _real_extract(self, url):
title = self._html_search_regex(r'<h1[^>]+>(.+?)</h1>', webpage, 'title')
timestamp = unified_timestamp(self._html_search_meta('article:published_time', webpage))
plugin_data = self._html_search_meta('BridPlugin', webpage)
plugin_data = re.findall(r'\$bp\("(?:Brid|TargetVideo)_\d+",\s(.+)\);', webpage)
entries = []
if plugin_data:
site_id = self._html_search_regex(r'site:(\d+)', webpage, 'site id')
for video_data in re.findall(r'\$bp\("Brid_\d+", (.+)\);', webpage):
for video_data in plugin_data:
video_id = self._parse_json(video_data, title)['video']
entries.append({
'id': video_id,
@ -140,7 +182,7 @@ def _real_extract(self, url):
'url': video_data.get('data-url'),
'id': video_data.get('id'),
'title': title,
'thumbnail': video_data.get('data-thumbnail'),
'thumbnail': traverse_obj(video_data, (('data-thumbnail', 'data-default_thumbnail'), {url_or_none}, any)),
'timestamp': timestamp,
'ie_key': 'N1InfoAsset',
})
@ -152,7 +194,7 @@ def _real_extract(self, url):
if url.startswith('https://www.youtube.com'):
entries.append(self.url_result(url, ie='Youtube'))
elif url.startswith('https://www.redditmedia.com'):
entries.append(self.url_result(url, ie='RedditR'))
entries.append(self.url_result(url, ie='Reddit'))
return {
'_type': 'playlist',

View File

@ -13,11 +13,13 @@
ExtractorError,
OnDemandPagedList,
clean_html,
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
parse_duration,
parse_iso8601,
parse_qs,
parse_resolution,
qualities,
remove_start,
@ -26,6 +28,7 @@
try_get,
unescapeHTML,
update_url_query,
url_basename,
url_or_none,
urlencode_postdata,
urljoin,
@ -430,6 +433,7 @@ def _yield_dms_formats(self, api_data, video_id):
'format_id': ('id', {str}),
'abr': ('bitRate', {float_or_none(scale=1000)}),
'asr': ('samplingRate', {int_or_none}),
'quality': ('qualityLevel', {int_or_none}),
}), get_all=False),
'acodec': 'aac',
}
@ -441,7 +445,9 @@ def _yield_dms_formats(self, api_data, video_id):
min_abr = min(traverse_obj(audios, (..., 'bitRate', {float_or_none})), default=0) / 1000
for video_fmt in video_fmts:
video_fmt['tbr'] -= min_abr
video_fmt['format_id'] = f'video-{video_fmt["tbr"]:.0f}'
video_fmt['format_id'] = url_basename(video_fmt['url']).rpartition('.')[0]
video_fmt['quality'] = traverse_obj(videos, (
lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1
yield video_fmt
def _real_extract(self, url):
@ -1033,6 +1039,7 @@ def _real_extract(self, url):
thumbnails.append({
'id': f'{name}_{width}x{height}',
'url': img_url,
'ext': traverse_obj(parse_qs(img_url), ('image', 0, {determine_ext(default_ext='jpg')})),
**res,
})

View File

@ -501,7 +501,7 @@ def _extract_webpage(self, url):
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
r'class="coveplayerid">([^<]+)<', # coveplayer
r'<section[^>]+data-coveid="(\d+)"', # coveplayer from http://www.pbs.org/wgbh/frontline/film/real-csi/
r'\bclass="passportcoveplayer"[^>]+\bdata-media="(\d+)', # https://www.thirteen.org/programs/the-woodwrights-shop/who-wrote-the-book-of-sloyd-fggvvq/
r'\sclass="passportcoveplayer"[^>]*\sdata-media="(\d+)', # https://www.thirteen.org/programs/the-woodwrights-shop/who-wrote-the-book-of-sloyd-fggvvq/
r'<input type="hidden" id="pbs_video_id_[0-9]+" value="([0-9]+)"/>', # jwplayer
r"(?s)window\.PBS\.playerConfig\s*=\s*{.*?id\s*:\s*'([0-9]+)',",
r'<div[^>]+\bdata-cove-id=["\'](\d+)"', # http://www.pbs.org/wgbh/roadshow/watch/episode/2105-indianapolis-hour-2/

View File

@ -23,9 +23,9 @@ class PinterestBaseIE(InfoExtractor):
def _call_api(self, resource, video_id, options):
return self._download_json(
f'https://www.pinterest.com/resource/{resource}Resource/get/',
video_id, f'Download {resource} JSON metadata', query={
'data': json.dumps({'options': options}),
})['resource_response']
video_id, f'Download {resource} JSON metadata',
query={'data': json.dumps({'options': options})},
headers={'X-Pinterest-PWS-Handler': 'www/[username].js'})['resource_response']
def _extract_video(self, data, extract_formats=True):
video_id = data['id']

View File

@ -1,4 +1,7 @@
import base64
import hashlib
import json
import uuid
from .common import InfoExtractor
from ..utils import (
@ -142,39 +145,73 @@ class PlaySuisseIE(InfoExtractor):
id
url
}'''
_LOGIN_BASE_URL = 'https://login.srgssr.ch/srgssrlogin.onmicrosoft.com'
_LOGIN_PATH = 'B2C_1A__SignInV2'
_CLIENT_ID = '1e33f1bf-8bf3-45e4-bbd9-c9ad934b5fca'
_LOGIN_BASE = 'https://account.srgssr.ch'
_ID_TOKEN = None
def _perform_login(self, username, password):
login_page = self._download_webpage(
'https://www.playsuisse.ch/api/sso/login', None, note='Downloading login page',
query={'x': 'x', 'locale': 'de', 'redirectUrl': 'https://www.playsuisse.ch/'})
settings = self._search_json(r'var\s+SETTINGS\s*=', login_page, 'settings', None)
code_verifier = uuid.uuid4().hex + uuid.uuid4().hex + uuid.uuid4().hex
code_challenge = base64.urlsafe_b64encode(
hashlib.sha256(code_verifier.encode()).digest()).decode().rstrip('=')
csrf_token = settings['csrf']
query = {'tx': settings['transId'], 'p': self._LOGIN_PATH}
request_id = parse_qs(self._request_webpage(
f'{self._LOGIN_BASE}/authz-srv/authz', None, 'Requesting session ID', query={
'client_id': self._CLIENT_ID,
'redirect_uri': 'https://www.playsuisse.ch/auth',
'scope': 'email profile openid offline_access',
'response_type': 'code',
'code_challenge': code_challenge,
'code_challenge_method': 'S256',
'view_type': 'login',
}).url)['requestId'][0]
status = traverse_obj(self._download_json(
f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/SelfAsserted', None, 'Logging in',
query=query, headers={'X-CSRF-TOKEN': csrf_token}, data=urlencode_postdata({
'request_type': 'RESPONSE',
'signInName': username,
'password': password,
}), expected_status=400), ('status', {int_or_none}))
if status == 400:
raise ExtractorError('Invalid username or password', expected=True)
try:
exchange_id = self._download_json(
f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/initiate/password', None,
'Submitting username', headers={'content-type': 'application/json'}, data=json.dumps({
'usage_type': 'INITIAL_AUTHENTICATION',
'request_id': request_id,
'medium_id': 'PASSWORD',
'type': 'password',
'identifier': username,
}).encode())['data']['exchange_id']['exchange_id']
except ExtractorError:
raise ExtractorError('Invalid username', expected=True)
urlh = self._request_webpage(
f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/api/CombinedSigninAndSignup/confirmed',
None, 'Downloading ID token', query={
'rememberMe': 'false',
'csrf_token': csrf_token,
**query,
'diags': '',
})
try:
login_data = self._download_json(
f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/authenticate/password', None,
'Submitting password', headers={'content-type': 'application/json'}, data=json.dumps({
'requestId': request_id,
'exchange_id': exchange_id,
'type': 'password',
'password': password,
}).encode())['data']
except ExtractorError:
raise ExtractorError('Invalid password', expected=True)
authorization_code = parse_qs(self._request_webpage(
f'{self._LOGIN_BASE}/login-srv/verification/login', None, 'Logging in',
data=urlencode_postdata({
'requestId': request_id,
'exchange_id': login_data['exchange_id']['exchange_id'],
'verificationType': 'password',
'sub': login_data['sub'],
'status_id': login_data['status_id'],
'rememberMe': True,
'lat': '',
'lon': '',
})).url)['code'][0]
self._ID_TOKEN = self._download_json(
f'{self._LOGIN_BASE}/proxy/token', None, 'Downloading token', data=b'', query={
'client_id': self._CLIENT_ID,
'redirect_uri': 'https://www.playsuisse.ch/auth',
'code': authorization_code,
'code_verifier': code_verifier,
'grant_type': 'authorization_code',
})['id_token']
self._ID_TOKEN = traverse_obj(parse_qs(urlh.url), ('id_token', 0))
if not self._ID_TOKEN:
raise ExtractorError('Login failed')

View File

@ -198,6 +198,25 @@ class RedditIE(InfoExtractor):
'skip_download': True,
'writesubtitles': True,
},
}, {
# "gated" subreddit post
'url': 'https://old.reddit.com/r/ketamine/comments/degtjo/when_the_k_hits/',
'info_dict': {
'id': 'gqsbxts133r31',
'ext': 'mp4',
'display_id': 'degtjo',
'title': 'When the K hits',
'uploader': '[deleted]',
'channel_id': 'ketamine',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 18,
'duration': 34,
'thumbnail': r're:https?://.+/.+\.(?:jpg|png)',
'timestamp': 1570438713.0,
'upload_date': '20191007',
},
}, {
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj',
'only_matching': True,
@ -245,6 +264,15 @@ def _perform_login(self, username, password):
elif not traverse_obj(login, ('json', 'data', 'cookie', {str})):
raise ExtractorError('Unable to login, no cookie was returned')
def _real_initialize(self):
# Set cookie to opt-in to age-restricted subreddits
self._set_cookie('reddit.com', 'over18', '1')
# Set cookie to opt-in to "gated" subreddits
options = traverse_obj(self._get_cookies('https://www.reddit.com/'), (
'_options', 'value', {urllib.parse.unquote}, {json.loads}, {dict})) or {}
options['pref_gated_sr_optin'] = True
self._set_cookie('reddit.com', '_options', urllib.parse.quote(json.dumps(options)))
def _get_subtitles(self, video_id):
# Fallback if there were no subtitles provided by DASH or HLS manifests
caption_url = f'https://v.redd.it/{video_id}/wh_ben_en.vtt'

View File

@ -3,12 +3,20 @@
import re
import urllib.parse
from .common import InfoExtractor
from ..utils import js_to_json
from .common import InfoExtractor, Request
from ..utils import (
determine_ext,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
url_or_none,
)
from ..utils.traversal import traverse_obj
class RTPIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:(?:estudoemcasa|palco|zigzag)/)?p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:[^/#?]+/)?p(?P<program_id>\d+)/(?P<id>e\d+)'
_TESTS = [{
'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas',
'md5': 'e736ce0c665e459ddb818546220b4ef8',
@ -16,99 +24,173 @@ class RTPIE(InfoExtractor):
'id': 'e174042',
'ext': 'mp3',
'title': 'Paixões Cruzadas',
'description': 'As paixões musicais de António Cartaxo e António Macedo',
'description': 'md5:af979e58ba0ab73f78435fc943fdb070',
'thumbnail': r're:^https?://.*\.jpg',
'series': 'Paixões Cruzadas',
'duration': 2950.0,
'modified_timestamp': 1553693464,
'modified_date': '20190327',
'timestamp': 1417219200,
'upload_date': '20141129',
},
}, {
'url': 'https://www.rtp.pt/play/zigzag/p13166/e757904/25-curiosidades-25-de-abril',
'md5': '9a81ed53f2b2197cfa7ed455b12f8ade',
'md5': '5b4859940e3adef61247a77dfb76046a',
'info_dict': {
'id': 'e757904',
'ext': 'mp4',
'title': '25 Curiosidades, 25 de Abril',
'description': 'Estudar ou não estudar - Em cada um dos episódios descobrimos uma curiosidade acerca de como era viver em Portugal antes da revolução do 25 de abr',
'title': 'Estudar ou não estudar',
'description': 'md5:3bfd7eb8bebfd5711a08df69c9c14c35',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1711958401,
'duration': 146.0,
'upload_date': '20240401',
'modified_timestamp': 1712242991,
'series': '25 Curiosidades, 25 de Abril',
'episode_number': 2,
'episode': 'Estudar ou não estudar',
'modified_date': '20240404',
},
}, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
'only_matching': True,
}, {
'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/portugues-1-ano',
'only_matching': True,
}, {
'url': 'https://www.rtp.pt/play/palco/p13785/l7nnon',
'only_matching': True,
# Episode not accessible through API
'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/e500050/portugues-1-ano',
'md5': '57660c0b46db9f22118c52cbd65975e4',
'info_dict': {
'id': 'e500050',
'ext': 'mp4',
'title': 'Português - 1.º ano',
'duration': 1669.0,
'description': 'md5:be68925c81269f8c6886589f25fe83ea',
'upload_date': '20201020',
'timestamp': 1603180799,
'thumbnail': 'https://cdn-images.rtp.pt/EPG/imagens/39482_59449_64850.png?v=3&w=860',
},
}]
_USER_AGENT = 'rtpplay/2.0.66 (pt.rtp.rtpplay; build:2066; iOS 15.8.3) Alamofire/5.9.1'
_AUTH_TOKEN = None
def _fetch_auth_token(self):
if self._AUTH_TOKEN:
return self._AUTH_TOKEN
self._AUTH_TOKEN = traverse_obj(self._download_json(Request(
'https://rtpplayapi.rtp.pt/play/api/2/token-manager',
headers={
'Accept': '*/*',
'rtp-play-auth': 'RTPPLAY_MOBILE_IOS',
'rtp-play-auth-hash': 'fac9c328b2f27e26e03d7f8942d66c05b3e59371e16c2a079f5c83cc801bd3ee',
'rtp-play-auth-timestamp': '2145973229682',
'User-Agent': self._USER_AGENT,
}, extensions={'keep_header_casing': True}), None,
note='Fetching guest auth token', errnote='Could not fetch guest auth token',
fatal=False), ('token', 'token', {str}))
return self._AUTH_TOKEN
@staticmethod
def _cleanup_media_url(url):
if urllib.parse.urlparse(url).netloc == 'streaming-ondemand.rtp.pt':
return None
return url.replace('/drm-fps/', '/hls/').replace('/drm-dash/', '/dash/')
def _extract_formats(self, media_urls, episode_id):
formats = []
subtitles = {}
for media_url in set(traverse_obj(media_urls, (..., {url_or_none}, {self._cleanup_media_url}))):
ext = determine_ext(media_url)
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
media_url, episode_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
media_url, episode_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({
'url': media_url,
'format_id': 'http',
})
return formats, subtitles
def _extract_from_api(self, program_id, episode_id):
auth_token = self._fetch_auth_token()
if not auth_token:
return
episode_data = traverse_obj(self._download_json(
f'https://www.rtp.pt/play/api/1/get-episode/{program_id}/{episode_id[1:]}', episode_id,
query={'include_assets': 'true', 'include_webparams': 'true'},
headers={
'Accept': '*/*',
'Authorization': f'Bearer {auth_token}',
'User-Agent': self._USER_AGENT,
}, fatal=False), 'result', {dict})
if not episode_data:
return
asset_urls = traverse_obj(episode_data, ('assets', 0, 'asset_url', {dict}))
media_urls = traverse_obj(asset_urls, (
((('hls', 'dash'), 'stream_url'), ('multibitrate', ('url_hls', 'url_dash'))),))
formats, subtitles = self._extract_formats(media_urls, episode_id)
for sub_data in traverse_obj(asset_urls, ('subtitles', 'vtt_list', lambda _, v: url_or_none(v['file']))):
subtitles.setdefault(sub_data.get('code') or 'pt', []).append({
'url': sub_data['file'],
'name': sub_data.get('language'),
})
return {
'id': episode_id,
'formats': formats,
'subtitles': subtitles,
'thumbnail': traverse_obj(episode_data, ('assets', 0, 'asset_thumbnail', {url_or_none})),
**traverse_obj(episode_data, ('episode', {
'title': (('episode_title', 'program_title'), {str}, filter, any),
'alt_title': ('episode_subtitle', {str}, filter),
'description': (('episode_description', 'episode_summary'), {str}, filter, any),
'timestamp': ('episode_air_date', {parse_iso8601(delimiter=' ')}),
'modified_timestamp': ('episode_lastchanged', {parse_iso8601(delimiter=' ')}),
'duration': ('episode_duration_complete', {parse_duration}),
'episode': ('episode_title', {str}, filter),
'episode_number': ('episode_number', {int_or_none}),
'season': ('program_season', {str}, filter),
'series': ('program_title', {str}, filter),
})),
}
_RX_OBFUSCATION = re.compile(r'''(?xs)
atob\s*\(\s*decodeURIComponent\s*\(\s*
(\[[0-9A-Za-z%,'"]*\])
\s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
''')
def __unobfuscate(self, data, *, video_id):
if data.startswith('{'):
data = self._RX_OBFUSCATION.sub(
lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote(
''.join(self._parse_json(m.group(1), video_id)),
)).decode('iso-8859-1')),
data)
return js_to_json(data)
def __unobfuscate(self, data):
return self._RX_OBFUSCATION.sub(
lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote(
''.join(json.loads(m.group(1))),
)).decode('iso-8859-1')),
data)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
f, config = self._search_regex(
r'''(?sx)
(?:var\s+f\s*=\s*(?P<f>".*?"|{[^;]+?});\s*)?
var\s+player1\s+=\s+new\s+RTPPlayer\s*\((?P<config>{(?:(?!\*/).)+?})\);(?!\s*\*/)
''', webpage,
'player config', group=('f', 'config'))
config = self._parse_json(
config, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
f = config['file'] if not f else self._parse_json(
f, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
def _extract_from_html(self, url, episode_id):
webpage = self._download_webpage(url, episode_id)
formats = []
if isinstance(f, dict):
f_hls = f.get('hls')
if f_hls is not None:
formats.extend(self._extract_m3u8_formats(
f_hls, video_id, 'mp4', 'm3u8_native', m3u8_id='hls'))
f_dash = f.get('dash')
if f_dash is not None:
formats.extend(self._extract_mpd_formats(f_dash, video_id, mpd_id='dash'))
else:
formats.append({
'format_id': 'f',
'url': f,
'vcodec': 'none' if config.get('mediaType') == 'audio' else None,
})
subtitles = {}
vtt = config.get('vtt')
if vtt is not None:
for lcode, lname, url in vtt:
subtitles.setdefault(lcode, []).append({
'name': lname,
'url': url,
})
media_urls = traverse_obj(re.findall(r'(?:var\s+f\s*=|RTPPlayer\({[^}]+file:)\s*({[^}]+}|"[^"]+")', webpage), (
-1, (({self.__unobfuscate}, {js_to_json}, {json.loads}, {dict.values}, ...), {json.loads})))
formats, subtitles = self._extract_formats(media_urls, episode_id)
return {
'id': video_id,
'title': title,
'id': episode_id,
'formats': formats,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
'subtitles': subtitles,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage, default=None),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image'], webpage, default=None),
**self._search_json_ld(webpage, episode_id, default={}),
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
}
def _real_extract(self, url):
program_id, episode_id = self._match_valid_url(url).group('program_id', 'id')
return self._extract_from_api(program_id, episode_id) or self._extract_from_html(url, episode_id)

View File

@ -0,0 +1,87 @@
from .common import InfoExtractor
from .vimeo import VHXEmbedIE
from ..utils import (
ExtractorError,
clean_html,
update_url,
urlencode_postdata,
)
from ..utils.traversal import find_element, traverse_obj
class SoftWhiteUnderbellyIE(InfoExtractor):
_LOGIN_URL = 'https://www.softwhiteunderbelly.com/login'
_NETRC_MACHINE = 'softwhiteunderbelly'
_VALID_URL = r'https?://(?:www\.)?softwhiteunderbelly\.com/videos/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.softwhiteunderbelly.com/videos/kenneth-final1',
'note': 'A single Soft White Underbelly Episode',
'md5': '8e79f29ec1f1bda6da2e0b998fcbebb8',
'info_dict': {
'id': '3201266',
'ext': 'mp4',
'display_id': 'kenneth-final1',
'title': 'Appalachian Man interview-Kenneth',
'description': 'Soft White Underbelly interview and portrait of Kenneth, an Appalachian man in Clay County, Kentucky.',
'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/249f6db0-2b39-49a4-979b-f8dad4681825.jpg',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader': 'OTT Videos',
'uploader_id': 'user80538407',
'duration': 512,
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
'url': 'https://www.softwhiteunderbelly.com/videos/tj-2-final-2160p',
'note': 'A single Soft White Underbelly Episode',
'md5': '286bd8851b4824c62afb369e6f307036',
'info_dict': {
'id': '3506029',
'ext': 'mp4',
'display_id': 'tj-2-final-2160p',
'title': 'Fentanyl Addict interview-TJ (follow up)',
'description': 'Soft White Underbelly follow up interview and portrait of TJ, a fentanyl addict on Skid Row.',
'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/c883d531-5da0-4faf-a2e2-8eba97e5adfc.jpg',
'duration': 817,
'uploader': 'OTT Videos',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader_id': 'user80538407',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}]
def _perform_login(self, username, password):
signin_page = self._download_webpage(self._LOGIN_URL, None, 'Fetching authenticity token')
self._download_webpage(
self._LOGIN_URL, None, 'Logging in',
data=urlencode_postdata({
'email': username,
'password': password,
'authenticity_token': self._html_search_regex(
r'name=["\']authenticity_token["\']\s+value=["\']([^"\']+)', signin_page, 'authenticity_token'),
'utf8': True,
}),
)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if '<div id="watch-unauthorized"' in webpage:
if self._get_cookies('https://www.softwhiteunderbelly.com').get('_session'):
raise ExtractorError('This account is not subscribed to this content', expected=True)
self.raise_login_required()
embed_url, embed_id = self._html_search_regex(
r'embed_url:\s*["\'](?P<url>https?://embed\.vhx\.tv/videos/(?P<id>\d+)[^"\']*)',
webpage, 'embed url', group=('url', 'id'))
return {
'_type': 'url_transparent',
'ie_key': VHXEmbedIE.ie_key(),
'url': VHXEmbedIE._smuggle_referrer(embed_url, 'https://www.softwhiteunderbelly.com'),
'id': embed_id,
'display_id': display_id,
'title': traverse_obj(webpage, ({find_element(id='watch-info')}, {find_element(cls='video-title')}, {clean_html})),
'description': self._html_search_meta('description', webpage, default=None),
'thumbnail': update_url(self._og_search_thumbnail(webpage) or '', query=None) or None,
}

View File

@ -52,7 +52,8 @@ class SoundcloudBaseIE(InfoExtractor):
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_HEADERS = {}
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
_IMAGE_REPL_RE = r'-[0-9a-z]+\.(?P<ext>jpg|png)'
_TAGS_RE = re.compile(r'"([^"]+)"|([^ ]+)')
_ARTWORK_MAP = {
'mini': 16,
@ -331,12 +332,14 @@ def invalid_url(url):
thumbnails = []
artwork_url = info.get('artwork_url')
thumbnail = artwork_url or user.get('avatar_url')
if isinstance(thumbnail, str):
if re.search(self._IMAGE_REPL_RE, thumbnail):
if url_or_none(thumbnail):
if mobj := re.search(self._IMAGE_REPL_RE, thumbnail):
for image_id, size in self._ARTWORK_MAP.items():
# Soundcloud serves JPEG regardless of URL's ext *except* for "original" thumb
ext = mobj.group('ext') if image_id == 'original' else 'jpg'
i = {
'id': image_id,
'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.jpg', thumbnail),
'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.{ext}', thumbnail),
}
if image_id == 'tiny' and not artwork_url:
size = 18
@ -372,6 +375,7 @@ def extract_count(key):
'comment_count': extract_count('comment'),
'repost_count': extract_count('reposts'),
'genres': traverse_obj(info, ('genre', {str}, filter, all, filter)),
'tags': traverse_obj(info, ('tag_list', {self._TAGS_RE.findall}, ..., ..., filter)),
'artists': traverse_obj(info, ('publisher_metadata', 'artist', {str}, filter, all, filter)),
'formats': formats if not extract_flat else None,
}
@ -425,6 +429,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'thumbnail': 'https://i1.sndcdn.com/artworks-000031955188-rwb18x-original.jpg',
'uploader_url': 'https://soundcloud.com/ethmusic',
'tags': 'count:14',
},
},
# geo-restricted
@ -440,7 +445,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_id': '9615865',
'timestamp': 1337635207,
'upload_date': '20120521',
'duration': 227.155,
'duration': 227.103,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@ -450,6 +455,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'thumbnail': 'https://i1.sndcdn.com/artworks-v8bFHhXm7Au6-0-original.jpg',
'genres': ['Alternative'],
'artists': ['The Royal Concept'],
'tags': [],
},
},
# private link
@ -475,6 +481,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# private link (alt format)
@ -500,15 +507,16 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# downloadable song
{
'url': 'https://soundcloud.com/the80m/the-following',
'md5': '9ffcddb08c87d74fb5808a3c183a1d04',
'md5': 'ecb87d7705d5f53e6c02a63760573c75', # wav: '9ffcddb08c87d74fb5808a3c183a1d04'
'info_dict': {
'id': '343609555',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'The Following',
'track': 'The Following',
'description': '',
@ -526,15 +534,18 @@ class SoundcloudIE(SoundcloudBaseIE):
'view_count': int,
'genres': ['Dance & EDM'],
'artists': ['80M'],
'tags': ['80M', 'EDM', 'Dance', 'Music'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# private link, downloadable format
# tags with spaces (e.g. "Uplifting Trance", "Ori Uplift")
{
'url': 'https://soundcloud.com/oriuplift/uponly-238-no-talking-wav/s-AyZUd',
'md5': '64a60b16e617d41d0bef032b7f55441e',
'md5': '2e1530d0e9986a833a67cb34fc90ece0', # wav: '64a60b16e617d41d0bef032b7f55441e'
'info_dict': {
'id': '340344461',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
@ -552,7 +563,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/oriuplift',
'genres': ['Trance'],
'artists': ['Ori Uplift'],
'tags': ['Orchestral', 'Emotional', 'Uplifting Trance', 'Trance', 'Ori Uplift', 'UpOnly'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# no album art, use avatar pic for thumbnail
{
@ -577,6 +590,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'uploader_url': 'https://soundcloud.com/garyvee',
'artists': ['MadReal'],
'tags': [],
},
'params': {
'skip_download': True,
@ -604,8 +618,47 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'genres': ['Piano'],
'uploader_url': 'https://soundcloud.com/giovannisarani',
'tags': 'count:10',
},
},
# .png "original" artwork, 160kbps m4a HLS format
{
'url': 'https://soundcloud.com/skorxh/audio-dealer',
'info_dict': {
'id': '2011421339',
'ext': 'm4a',
'title': 'audio dealer',
'description': '',
'uploader': '$KORCH',
'uploader_id': '150292288',
'uploader_url': 'https://soundcloud.com/skorxh',
'comment_count': int,
'view_count': int,
'like_count': int,
'repost_count': int,
'duration': 213.469,
'tags': [],
'artists': ['$KORXH'],
'track': 'audio dealer',
'timestamp': 1737143201,
'upload_date': '20250117',
'license': 'all-rights-reserved',
'thumbnail': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png',
'thumbnails': [
{'id': 'mini', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-mini.jpg'},
{'id': 'tiny', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-tiny.jpg'},
{'id': 'small', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-small.jpg'},
{'id': 'badge', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-badge.jpg'},
{'id': 't67x67', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t67x67.jpg'},
{'id': 'large', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-large.jpg'},
{'id': 't300x300', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t300x300.jpg'},
{'id': 'crop', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-crop.jpg'},
{'id': 't500x500', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t500x500.jpg'},
{'id': 'original', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png'},
],
},
'params': {'skip_download': 'm3u8', 'format': 'hls_aac_160k'},
},
{
# AAC HQ format available (account with active subscription needed)
'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',

View File

@ -1,5 +1,6 @@
from .bunnycdn import BunnyCdnIE
from .common import InfoExtractor
from ..utils import try_get, unified_timestamp
from ..utils import make_archive_id, try_get, unified_timestamp
class SovietsClosetBaseIE(InfoExtractor):
@ -43,7 +44,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'url': 'https://sovietscloset.com/video/1337',
'md5': 'bd012b04b261725510ca5383074cdd55',
'info_dict': {
'id': '1337',
'id': '2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67',
'ext': 'mp4',
'title': 'The Witcher #13',
'thumbnail': r're:^https?://.*\.b-cdn\.net/2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67/thumbnail\.jpg$',
@ -55,20 +56,23 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'upload_date': '20170413',
'uploader_id': 'SovietWomble',
'uploader_url': 'https://www.twitch.tv/SovietWomble',
'duration': 7007,
'duration': 7008,
'was_live': True,
'availability': 'public',
'series': 'The Witcher',
'season': 'Misc',
'episode_number': 13,
'episode': 'Episode 13',
'creators': ['SovietWomble'],
'description': '',
'_old_archive_ids': ['sovietscloset 1337'],
},
},
{
'url': 'https://sovietscloset.com/video/1105',
'md5': '89fa928f183893cb65a0b7be846d8a90',
'info_dict': {
'id': '1105',
'id': 'c0e5e76f-3a93-40b4-bf01-12343c2eec5d',
'ext': 'mp4',
'title': 'Arma 3 - Zeus Games #5',
'uploader': 'SovietWomble',
@ -80,39 +84,20 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'upload_date': '20160420',
'uploader_id': 'SovietWomble',
'uploader_url': 'https://www.twitch.tv/SovietWomble',
'duration': 8804,
'duration': 8805,
'was_live': True,
'availability': 'public',
'series': 'Arma 3',
'season': 'Zeus Games',
'episode_number': 5,
'episode': 'Episode 5',
'creators': ['SovietWomble'],
'description': '',
'_old_archive_ids': ['sovietscloset 1105'],
},
},
]
def _extract_bunnycdn_iframe(self, video_id, bunnycdn_id):
iframe = self._download_webpage(
f'https://iframe.mediadelivery.net/embed/5105/{bunnycdn_id}',
video_id, note='Downloading BunnyCDN iframe', headers=self.MEDIADELIVERY_REFERER)
m3u8_url = self._search_regex(r'(https?://.*?\.m3u8)', iframe, 'm3u8 url')
thumbnail_url = self._search_regex(r'(https?://.*?thumbnail\.jpg)', iframe, 'thumbnail url')
m3u8_formats = self._extract_m3u8_formats(m3u8_url, video_id, headers=self.MEDIADELIVERY_REFERER)
if not m3u8_formats:
duration = None
else:
duration = self._extract_m3u8_vod_duration(
m3u8_formats[0]['url'], video_id, headers=self.MEDIADELIVERY_REFERER)
return {
'formats': m3u8_formats,
'thumbnail': thumbnail_url,
'duration': duration,
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@ -122,13 +107,13 @@ def _real_extract(self, url):
stream = self.parse_nuxt_jsonp(f'{static_assets_base}/video/{video_id}/payload.js', video_id, 'video')['stream']
return {
return self.url_result(
f'https://iframe.mediadelivery.net/embed/5105/{stream["bunnyId"]}', ie=BunnyCdnIE, url_transparent=True,
**self.video_meta(
video_id=video_id, game_name=stream['game']['name'],
category_name=try_get(stream, lambda x: x['subcategory']['name'], str),
episode_number=stream.get('number'), stream_date=stream.get('date')),
**self._extract_bunnycdn_iframe(video_id, stream['bunnyId']),
}
_old_archive_ids=[make_archive_id(self, video_id)])
class SovietsClosetPlaylistIE(SovietsClosetBaseIE):

View File

@ -249,6 +249,12 @@ def _extract_web_data_and_status(self, url, video_id, fatal=True):
elif fatal:
raise ExtractorError('Unable to extract webpage video data')
if not traverse_obj(video_data, ('video', {dict})) and traverse_obj(video_data, ('isContentClassified', {bool})):
message = 'This post may not be comfortable for some audiences. Log in for access'
if fatal:
self.raise_login_required(message)
self.report_warning(f'{message}. {self._login_hint()}', video_id=video_id)
return video_data, status
def _get_subtitles(self, aweme_detail, aweme_id, user_name):
@ -895,8 +901,12 @@ def _real_extract(self, url):
if video_data and status == 0:
return self._parse_aweme_video_web(video_data, url, video_id)
elif status == 10216:
raise ExtractorError('This video is private', expected=True)
elif status in (10216, 10222):
# 10216: private post; 10222: private account
self.raise_login_required(
'You do not have permission to view this post. Log into an account that has access')
elif status == 10204:
raise ExtractorError('Your IP address is blocked from accessing this post', expected=True)
raise ExtractorError(f'Video not available, status code {status}', video_id=video_id)

117
yt_dlp/extractor/tvw.py Normal file
View File

@ -0,0 +1,117 @@
import json
from .common import InfoExtractor
from ..utils import clean_html, remove_end, unified_timestamp, url_or_none
from ..utils.traversal import traverse_obj
class TvwIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvw\.org/video/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://tvw.org/video/billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211/',
'md5': '9ceb94fe2bb7fd726f74f16356825703',
'info_dict': {
'id': '2024011211',
'ext': 'mp4',
'title': 'Billy Frank Jr. Statue Maquette Unveiling Ceremony',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:58a8150017d985b4f377e11ee8f6f36e',
'timestamp': 1704902400,
'upload_date': '20240110',
'location': 'Legislative Building',
'display_id': 'billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211',
'categories': ['General Interest'],
},
}, {
'url': 'https://tvw.org/video/ebeys-landing-state-park-2024081007/',
'md5': '71e87dae3deafd65d75ff3137b9a32fc',
'info_dict': {
'id': '2024081007',
'ext': 'mp4',
'title': 'Ebey\'s Landing State Park',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:50c5bd73bde32fa6286a008dbc853386',
'timestamp': 1724310900,
'upload_date': '20240822',
'location': 'Ebeys Landing State Park',
'display_id': 'ebeys-landing-state-park-2024081007',
'categories': ['Washington State Parks'],
},
}, {
'url': 'https://tvw.org/video/home-warranties-workgroup-2',
'md5': 'f678789bf94d07da89809f213cf37150',
'info_dict': {
'id': '1999121000',
'ext': 'mp4',
'title': 'Home Warranties Workgroup',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:861396cc523c9641d0dce690bc5c35f3',
'timestamp': 946389600,
'upload_date': '19991228',
'display_id': 'home-warranties-workgroup-2',
'categories': ['Legislative'],
},
}, {
'url': 'https://tvw.org/video/washington-to-washington-a-new-space-race-2022041111/?eventID=2022041111',
'md5': '6f5551090b351aba10c0d08a881b4f30',
'info_dict': {
'id': '2022041111',
'ext': 'mp4',
'title': 'Washington to Washington - A New Space Race',
'thumbnail': r're:^https?://.*\.(?:jpe?g|png)$',
'description': 'md5:f65a24eec56107afbcebb3aa5cd26341',
'timestamp': 1650394800,
'upload_date': '20220419',
'location': 'Hayner Media Center',
'display_id': 'washington-to-washington-a-new-space-race-2022041111',
'categories': ['Washington to Washington', 'General Interest'],
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
client_id = self._html_search_meta('clientID', webpage, fatal=True)
video_id = self._html_search_meta('eventID', webpage, fatal=True)
video_data = self._download_json(
'https://api.v3.invintus.com/v2/Event/getDetailed', video_id,
headers={
'authorization': 'embedder',
'wsc-api-key': '7WhiEBzijpritypp8bqcU7pfU9uicDR',
},
data=json.dumps({
'clientID': client_id,
'eventID': video_id,
'showStreams': True,
}).encode())['data']
formats = []
subtitles = {}
for stream_url in traverse_obj(video_data, ('streamingURIs', ..., {url_or_none})):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if caption_url := traverse_obj(video_data, ('captionPath', {url_or_none})):
subtitles.setdefault('en', []).append({'url': caption_url, 'ext': 'vtt'})
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'title': remove_end(self._og_search_title(webpage, default=None), ' - TVW'),
'description': self._og_search_description(webpage, default=None),
**traverse_obj(video_data, {
'title': ('title', {str}),
'description': ('description', {clean_html}),
'categories': ('categories', ..., {str}),
'thumbnail': ('videoThumbnail', {url_or_none}),
'timestamp': ('startDateTime', {unified_timestamp}),
'location': ('locationName', {str}),
'is_live': ('eventStatus', {lambda x: x == 'live'}),
}),
}

View File

@ -1,11 +1,12 @@
import functools
import json
import random
import math
import re
import urllib.parse
from .common import InfoExtractor
from .periscope import PeriscopeBaseIE, PeriscopeIE
from ..jsinterp import js_number_to_string
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
@ -1330,6 +1331,11 @@ def _build_graphql_query(self, media_id):
},
}
def _generate_syndication_token(self, twid):
# ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
translation = str.maketrans(dict.fromkeys('0.'))
return js_number_to_string((int(twid) / 1e15) * math.pi, 36).translate(translation)
def _call_syndication_api(self, twid):
self.report_warning(
'Not all metadata or media is available via syndication endpoint', twid, only_once=True)
@ -1337,8 +1343,7 @@ def _call_syndication_api(self, twid):
'https://cdn.syndication.twimg.com/tweet-result', twid, 'Downloading syndication JSON',
headers={'User-Agent': 'Googlebot'}, query={
'id': twid,
# TODO: token = ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
'token': ''.join(random.choices('123456789abcdefghijklmnopqrstuvwxyz', k=10)),
'token': self._generate_syndication_token(twid),
})
if not status:
raise ExtractorError('Syndication endpoint returned empty JSON response')

View File

@ -116,6 +116,7 @@ class VKIE(VKBaseIE):
'id': '-77521_162222515',
'ext': 'mp4',
'title': 'ProtivoGunz - Хуёвая песня',
'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
'uploader_id': '39545378',
'duration': 195,
@ -165,6 +166,7 @@ class VKIE(VKBaseIE):
'id': '-93049196_456239755',
'ext': 'mp4',
'title': '8 серия (озвучка)',
'description': 'Видео из официальной группы Noize MC\nhttp://vk.com/noizemc',
'duration': 8383,
'comment_count': int,
'uploader': 'Dizi2021',
@ -240,6 +242,7 @@ class VKIE(VKBaseIE):
'upload_date': '20221005',
'uploader': 'Шальная Императрица',
'uploader_id': '-74006511',
'description': 'md5:f9315f7786fa0e84e75e4f824a48b056',
},
},
{
@ -278,6 +281,25 @@ class VKIE(VKBaseIE):
},
'skip': 'No formats found',
},
{
'note': 'video has chapters',
'url': 'https://vkvideo.ru/video-18403220_456239696',
'info_dict': {
'id': '-18403220_456239696',
'ext': 'mp4',
'title': 'Трамп отменяет гранты // DeepSeek - Революция в ИИ // Илон Маск читер',
'description': 'md5:b112ea9de53683b6d03d29076f62eec2',
'uploader': 'Руслан Усачев',
'uploader_id': '-18403220',
'comment_count': int,
'like_count': int,
'duration': 1983,
'thumbnail': r're:https?://.+\.jpg',
'chapters': 'count:21',
'timestamp': 1738252883,
'upload_date': '20250130',
},
},
{
# live stream, hls and rtmp links, most likely already finished live
# stream by the time you are reading this comment
@ -449,7 +471,6 @@ def _real_extract(self, url):
return self.url_result(opts_url)
data = player['params'][0]
title = unescapeHTML(data['md_title'])
# 2 = live
# 3 = post live (finished live)
@ -507,17 +528,29 @@ def _real_extract(self, url):
return {
'id': video_id,
'formats': formats,
'title': title,
'thumbnail': data.get('jpg'),
'uploader': data.get('md_author'),
'uploader_id': str_or_none(data.get('author_id') or mv_data.get('authorId')),
'duration': int_or_none(data.get('duration') or mv_data.get('duration')),
'subtitles': subtitles,
**traverse_obj(mv_data, {
'title': ('title', {unescapeHTML}),
'description': ('desc', {clean_html}, filter),
'duration': ('duration', {int_or_none}),
'like_count': ('likes', {int_or_none}),
'comment_count': ('commcount', {int_or_none}),
}),
**traverse_obj(data, {
'title': ('md_title', {unescapeHTML}),
'description': ('description', {clean_html}, filter),
'thumbnail': ('jpg', {url_or_none}),
'uploader': ('md_author', {str}),
'uploader_id': (('author_id', 'authorId'), {str_or_none}, any),
'duration': ('duration', {int_or_none}),
'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), {
'title': ('text', {str}),
'start_time': 'time',
}),
}),
'timestamp': timestamp,
'view_count': view_count,
'like_count': int_or_none(mv_data.get('likes')),
'comment_count': int_or_none(mv_data.get('commcount')),
'is_live': is_live,
'subtitles': subtitles,
'_format_sort_fields': ('res', 'source'),
}

View File

@ -12,6 +12,7 @@
str_or_none,
strip_jsonp,
traverse_obj,
truncate_string,
url_or_none,
urlencode_postdata,
urljoin,
@ -96,7 +97,8 @@ def _extract_formats(self, video_info):
})
return formats
def _parse_video_info(self, video_info, video_id=None):
def _parse_video_info(self, video_info):
video_id = traverse_obj(video_info, (('id', 'id_str', 'mid'), {str_or_none}, any))
return {
'id': video_id,
'extractor_key': WeiboIE.ie_key(),
@ -105,9 +107,10 @@ def _parse_video_info(self, video_info, video_id=None):
'http_headers': {'Referer': 'https://weibo.com/'},
'_old_archive_ids': [make_archive_id('WeiboMobile', video_id)],
**traverse_obj(video_info, {
'id': (('id', 'id_str', 'mid'), {str_or_none}),
'display_id': ('mblogid', {str_or_none}),
'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter),
'title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'),
{lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}, filter),
'alt_title': ('page_info', 'media_info', ('video_title', 'kol_title', 'name'), {str}, filter),
'description': ('text_raw', {str}),
'duration': ('page_info', 'media_info', 'duration', {int_or_none}),
'timestamp': ('page_info', 'media_info', 'video_publish_time', {int_or_none}),
@ -129,9 +132,11 @@ class WeiboIE(WeiboBaseIE):
'url': 'https://weibo.com/7827771738/N4xlMvjhI',
'info_dict': {
'id': '4910815147462302',
'_old_archive_ids': ['weibomobile 4910815147462302'],
'ext': 'mp4',
'display_id': 'N4xlMvjhI',
'title': '【睡前消息暑假版第一期:拉泰国一把 对中国有好处】',
'alt_title': '【睡前消息暑假版第一期:拉泰国一把 对中国有好处】',
'description': 'md5:e2637a7673980d68694ea7c43cf12a5f',
'duration': 918,
'timestamp': 1686312819,
@ -149,9 +154,11 @@ class WeiboIE(WeiboBaseIE):
'url': 'https://m.weibo.cn/status/4189191225395228',
'info_dict': {
'id': '4189191225395228',
'_old_archive_ids': ['weibomobile 4189191225395228'],
'ext': 'mp4',
'display_id': 'FBqgOmDxO',
'title': '柴犬柴犬的秒拍视频',
'alt_title': '柴犬柴犬的秒拍视频',
'description': 'md5:80f461ab5cdae6bbdb70efbf5a1db24f',
'duration': 53,
'timestamp': 1514264429,
@ -166,34 +173,35 @@ class WeiboIE(WeiboBaseIE):
},
}, {
'url': 'https://m.weibo.cn/detail/4189191225395228',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'display_id': 'FBqgOmDxO',
'title': '柴犬柴犬的秒拍视频',
'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagramshibainu.gaku http://t.cn/RHbmjzW ',
'duration': 53,
'timestamp': 1514264429,
'upload_date': '20171226',
'thumbnail': r're:https://.*\.jpg',
'uploader': '柴犬柴犬',
'uploader_id': '5926682210',
'uploader_url': 'https://weibo.com/u/5926682210',
'view_count': int,
'like_count': int,
'repost_count': int,
},
'only_matching': True,
}, {
'url': 'https://weibo.com/0/4224132150961381',
'note': 'no playback_list example',
'only_matching': True,
}, {
'url': 'https://m.weibo.cn/detail/5120561132606436',
'info_dict': {
'id': '5120561132606436',
},
'playlist_count': 9,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._parse_video_info(self._weibo_download_json(
f'https://weibo.com/ajax/statuses/show?id={video_id}', video_id))
meta = self._weibo_download_json(f'https://weibo.com/ajax/statuses/show?id={video_id}', video_id)
mix_media_info = traverse_obj(meta, ('mix_media_info', 'items', ...))
if not mix_media_info:
return self._parse_video_info(meta)
return self.playlist_result(self._entries(mix_media_info), video_id)
def _entries(self, mix_media_info):
for media_info in traverse_obj(mix_media_info, lambda _, v: v['type'] != 'pic'):
yield self._parse_video_info(traverse_obj(media_info, {
'id': ('data', 'object_id'),
'page_info': {'media_info': ('data', 'media_info', {dict})},
}))
class WeiboVideoIE(WeiboBaseIE):

View File

@ -100,8 +100,8 @@ def _real_extract(self, url):
class WSJArticleIE(InfoExtractor):
_VALID_URL = r'(?i)https?://(?:www\.)?wsj\.com/articles/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'(?i)https?://(?:www\.)?wsj\.com/(?:articles|opinion)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.wsj.com/articles/dont-like-china-no-pandas-for-you-1490366939?',
'info_dict': {
'id': '4B13FA62-1D8C-45DB-8EA1-4105CB20B362',
@ -110,11 +110,20 @@ class WSJArticleIE(InfoExtractor):
'uploader_id': 'ralcaraz',
'title': 'Bao Bao the Panda Leaves for China',
},
}
}, {
'url': 'https://www.wsj.com/opinion/hamas-hostages-caskets-bibas-family-israel-gaza-29da083b',
'info_dict': {
'id': 'CE68D629-8DB8-4CD3-B30A-92112C102054',
'ext': 'mp4',
'upload_date': '20241007',
'uploader_id': 'Tinnes, David',
'title': 'WSJ Opinion: "Get the Jew": The Crown Heights Riot Revisited',
},
}]
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
webpage = self._download_webpage(url, article_id, impersonate=True)
video_id = self._search_regex(
r'(?:id=["\']video|video-|iframe\.html\?guid=|data-src=["\'])([a-fA-F0-9-]{36})',
webpage, 'video id')

View File

@ -0,0 +1,50 @@
# flake8: noqa: F401
from ._base import YoutubeBaseInfoExtractor
from ._clip import YoutubeClipIE
from ._mistakes import YoutubeTruncatedIDIE, YoutubeTruncatedURLIE
from ._notifications import YoutubeNotificationsIE
from ._redirect import (
YoutubeConsentRedirectIE,
YoutubeFavouritesIE,
YoutubeFeedsInfoExtractor,
YoutubeHistoryIE,
YoutubeLivestreamEmbedIE,
YoutubeRecommendedIE,
YoutubeShortsAudioPivotIE,
YoutubeSubscriptionsIE,
YoutubeWatchLaterIE,
YoutubeYtBeIE,
YoutubeYtUserIE,
)
from ._search import YoutubeMusicSearchURLIE, YoutubeSearchDateIE, YoutubeSearchIE, YoutubeSearchURLIE
from ._tab import YoutubePlaylistIE, YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
# Hack to allow plugin overrides work
for _cls in [
YoutubeBaseInfoExtractor,
YoutubeClipIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeNotificationsIE,
YoutubeConsentRedirectIE,
YoutubeFavouritesIE,
YoutubeFeedsInfoExtractor,
YoutubeHistoryIE,
YoutubeLivestreamEmbedIE,
YoutubeRecommendedIE,
YoutubeShortsAudioPivotIE,
YoutubeSubscriptionsIE,
YoutubeWatchLaterIE,
YoutubeYtBeIE,
YoutubeYtUserIE,
YoutubeMusicSearchURLIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubePlaylistIE,
YoutubeTabBaseInfoExtractor,
YoutubeTabIE,
YoutubeIE,
]:
_cls.__module__ = 'yt_dlp.extractor.youtube'

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,66 @@
from ._tab import YoutubeTabBaseInfoExtractor
from ._video import YoutubeIE
from ...utils import ExtractorError, traverse_obj
class YoutubeClipIE(YoutubeTabBaseInfoExtractor):
IE_NAME = 'youtube:clip'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/clip/(?P<id>[^/?#]+)'
_TESTS = [{
# FIXME: Other metadata should be extracted from the clip, not from the base video
'url': 'https://www.youtube.com/clip/UgytZKpehg-hEMBSn3F4AaABCQ',
'info_dict': {
'id': 'UgytZKpehg-hEMBSn3F4AaABCQ',
'ext': 'mp4',
'section_start': 29.0,
'section_end': 39.7,
'duration': 10.7,
'age_limit': 0,
'availability': 'public',
'categories': ['Gaming'],
'channel': 'Scott The Woz',
'channel_id': 'UC4rqhyiTs7XyuODcECvuiiQ',
'channel_url': 'https://www.youtube.com/channel/UC4rqhyiTs7XyuODcECvuiiQ',
'description': 'md5:7a4517a17ea9b4bd98996399d8bb36e7',
'like_count': int,
'playable_in_embed': True,
'tags': 'count:17',
'thumbnail': 'https://i.ytimg.com/vi_webp/ScPX26pdQik/maxresdefault.webp',
'title': 'Mobile Games on Console - Scott The Woz',
'upload_date': '20210920',
'uploader': 'Scott The Woz',
'uploader_id': '@ScottTheWoz',
'uploader_url': 'https://www.youtube.com/@ScottTheWoz',
'view_count': int,
'live_status': 'not_live',
'channel_follower_count': int,
'chapters': 'count:20',
'comment_count': int,
'heatmap': 'count:100',
},
}]
def _real_extract(self, url):
clip_id = self._match_id(url)
_, data = self._extract_webpage(url, clip_id)
video_id = traverse_obj(data, ('currentVideoEndpoint', 'watchEndpoint', 'videoId'))
if not video_id:
raise ExtractorError('Unable to find video ID')
clip_data = traverse_obj(data, (
'engagementPanels', ..., 'engagementPanelSectionListRenderer', 'content', 'clipSectionRenderer',
'contents', ..., 'clipAttributionRenderer', 'onScrubExit', 'commandExecutorCommand', 'commands', ...,
'openPopupAction', 'popup', 'notificationActionRenderer', 'actionButton', 'buttonRenderer', 'command',
'commandExecutorCommand', 'commands', ..., 'loopCommand'), get_all=False)
return {
'_type': 'url_transparent',
'url': f'https://www.youtube.com/watch?v={video_id}',
'ie_key': YoutubeIE.ie_key(),
'id': clip_id,
'section_start': int(clip_data['startTimeMs']) / 1000,
'section_end': int(clip_data['endTimeMs']) / 1000,
'_format_sort_fields': ( # https protocol is prioritized for ffmpeg compatibility
'proto:https', 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang'),
}

View File

@ -0,0 +1,69 @@
from ._base import YoutubeBaseInfoExtractor
from ...utils import ExtractorError
class YoutubeTruncatedURLIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:truncated_url'
IE_DESC = False # Do not list
_VALID_URL = r'''(?x)
(?:https?://)?
(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/
(?:watch\?(?:
feature=[a-z_]+|
annotation_id=annotation_[^&]+|
x-yt-cl=[0-9]+|
hl=[^&]*|
t=[0-9]+
)?
|
attribution_link\?a=[^&]+
)
$
'''
_TESTS = [{
'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?feature=foo',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?hl=en-GB',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?t=2372',
'only_matching': True,
}]
def _real_extract(self, url):
raise ExtractorError(
'Did you forget to quote the URL? Remember that & is a meta '
'character in most shells, so you want to put the URL in quotes, '
'like yt-dlp '
'"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
' or simply yt-dlp BaW_jenozKc .',
expected=True)
class YoutubeTruncatedIDIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:truncated_id'
IE_DESC = False # Do not list
_VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[0-9A-Za-z_-]{1,10})$'
_TESTS = [{
'url': 'https://www.youtube.com/watch?v=N_708QY7Ob',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
raise ExtractorError(
f'Incomplete YouTube ID {video_id}. URL {url} looks truncated.',
expected=True)

View File

@ -0,0 +1,98 @@
import itertools
import re
from ._tab import YoutubeTabBaseInfoExtractor, YoutubeTabIE
from ._video import YoutubeIE
from ...utils import traverse_obj
class YoutubeNotificationsIE(YoutubeTabBaseInfoExtractor):
IE_NAME = 'youtube:notif'
IE_DESC = 'YouTube notifications; ":ytnotif" keyword (requires cookies)'
_VALID_URL = r':ytnotif(?:ication)?s?'
_LOGIN_REQUIRED = True
_TESTS = [{
'url': ':ytnotif',
'only_matching': True,
}, {
'url': ':ytnotifications',
'only_matching': True,
}]
def _extract_notification_menu(self, response, continuation_list):
notification_list = traverse_obj(
response,
('actions', 0, 'openPopupAction', 'popup', 'multiPageMenuRenderer', 'sections', 0, 'multiPageMenuNotificationSectionRenderer', 'items'),
('actions', 0, 'appendContinuationItemsAction', 'continuationItems'),
expected_type=list) or []
continuation_list[0] = None
for item in notification_list:
entry = self._extract_notification_renderer(item.get('notificationRenderer'))
if entry:
yield entry
continuation = item.get('continuationItemRenderer')
if continuation:
continuation_list[0] = continuation
def _extract_notification_renderer(self, notification):
video_id = traverse_obj(
notification, ('navigationEndpoint', 'watchEndpoint', 'videoId'), expected_type=str)
url = f'https://www.youtube.com/watch?v={video_id}'
channel_id = None
if not video_id:
browse_ep = traverse_obj(
notification, ('navigationEndpoint', 'browseEndpoint'), expected_type=dict)
channel_id = self.ucid_or_none(traverse_obj(browse_ep, 'browseId', expected_type=str))
post_id = self._search_regex(
r'/post/(.+)', traverse_obj(browse_ep, 'canonicalBaseUrl', expected_type=str),
'post id', default=None)
if not channel_id or not post_id:
return
# The direct /post url redirects to this in the browser
url = f'https://www.youtube.com/channel/{channel_id}/community?lb={post_id}'
channel = traverse_obj(
notification, ('contextualMenu', 'menuRenderer', 'items', 1, 'menuServiceItemRenderer', 'text', 'runs', 1, 'text'),
expected_type=str)
notification_title = self._get_text(notification, 'shortMessage')
if notification_title:
notification_title = notification_title.replace('\xad', '') # remove soft hyphens
# TODO: handle recommended videos
title = self._search_regex(
rf'{re.escape(channel or "")}[^:]+: (.+)', notification_title,
'video title', default=None)
timestamp = (self._parse_time_text(self._get_text(notification, 'sentTimeText'))
if self._configuration_arg('approximate_date', ie_key=YoutubeTabIE)
else None)
return {
'_type': 'url',
'url': url,
'ie_key': (YoutubeIE if video_id else YoutubeTabIE).ie_key(),
'video_id': video_id,
'title': title,
'channel_id': channel_id,
'channel': channel,
'uploader': channel,
'thumbnails': self._extract_thumbnails(notification, 'videoThumbnail'),
'timestamp': timestamp,
}
def _notification_menu_entries(self, ytcfg):
continuation_list = [None]
response = None
for page in itertools.count(1):
ctoken = traverse_obj(
continuation_list, (0, 'continuationEndpoint', 'getNotificationMenuEndpoint', 'ctoken'), expected_type=str)
response = self._extract_response(
item_id=f'page {page}', query={'ctoken': ctoken} if ctoken else {}, ytcfg=ytcfg,
ep='notification/get_notification_menu', check_get_keys='actions',
headers=self.generate_api_headers(ytcfg=ytcfg, visitor_data=self._extract_visitor_data(response)))
yield from self._extract_notification_menu(response, continuation_list)
if not continuation_list[0]:
break
def _real_extract(self, url):
display_id = 'notifications'
ytcfg = self._download_ytcfg('web', display_id) if not self.skip_webpage else {}
self._report_playlist_authcheck(ytcfg)
return self.playlist_result(self._notification_menu_entries(ytcfg), display_id, display_id)

View File

@ -0,0 +1,247 @@
import base64
import urllib.parse
from ._base import YoutubeBaseInfoExtractor
from ._tab import YoutubeTabIE
from ...utils import ExtractorError, classproperty, parse_qs, update_url_query, url_or_none
class YoutubeYtBeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'youtu.be'
_VALID_URL = rf'https?://youtu\.be/(?P<id>[0-9A-Za-z_-]{{11}})/*?.*?\blist=(?P<playlist_id>{YoutubeBaseInfoExtractor._PLAYLIST_ID_RE})'
_TESTS = [{
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
'info_dict': {
'id': 'yeWKywCrFtk',
'ext': 'mp4',
'title': 'Small Scale Baler and Braiding Rugs',
'uploader': 'Backus-Page House Museum',
'uploader_id': '@backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@backuspagemuseum',
'upload_date': '20161008',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'],
'tags': list,
'like_count': int,
'age_limit': 0,
'playable_in_embed': True,
'thumbnail': r're:^https?://.*\.webp',
'channel': 'Backus-Page House Museum',
'channel_id': 'UCEfMCQ9bs3tjvjy1s451zaw',
'live_status': 'not_live',
'view_count': int,
'channel_url': 'https://www.youtube.com/channel/UCEfMCQ9bs3tjvjy1s451zaw',
'availability': 'public',
'duration': 59,
'comment_count': int,
'channel_follower_count': int,
},
'params': {
'noplaylist': True,
'skip_download': True,
},
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
playlist_id = mobj.group('playlist_id')
return self.url_result(
update_url_query('https://www.youtube.com/watch', {
'v': video_id,
'list': playlist_id,
'feature': 'youtu.be',
}), ie=YoutubeTabIE.ie_key(), video_id=playlist_id)
class YoutubeLivestreamEmbedIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube livestream embeds'
_VALID_URL = r'https?://(?:\w+\.)?youtube\.com/embed/live_stream/?\?(?:[^#]+&)?channel=(?P<id>[^&#]+)'
_TESTS = [{
'url': 'https://www.youtube.com/embed/live_stream?channel=UC2_KI6RB__jGdlnK6dvFEZA',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id = self._match_id(url)
return self.url_result(
f'https://www.youtube.com/channel/{channel_id}/live',
ie=YoutubeTabIE.ie_key(), video_id=channel_id)
class YoutubeYtUserIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube user videos; "ytuser:" prefix'
IE_NAME = 'youtube:user'
_VALID_URL = r'ytuser:(?P<id>.+)'
_TESTS = [{
'url': 'ytuser:phihag',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
return self.url_result(f'https://www.youtube.com/user/{user_id}', YoutubeTabIE, user_id)
class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:favorites'
IE_DESC = 'YouTube liked videos; ":ytfav" keyword (requires cookies)'
_VALID_URL = r':ytfav(?:ou?rite)?s?'
_LOGIN_REQUIRED = True
_TESTS = [{
'url': ':ytfav',
'only_matching': True,
}, {
'url': ':ytfavorites',
'only_matching': True,
}]
def _real_extract(self, url):
return self.url_result(
'https://www.youtube.com/playlist?list=LL',
ie=YoutubeTabIE.ie_key())
class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
"""
Base class for feed extractors
Subclasses must re-define the _FEED_NAME property.
"""
_LOGIN_REQUIRED = True
_FEED_NAME = 'feeds'
@classproperty
def IE_NAME(cls):
return f'youtube:{cls._FEED_NAME}'
def _real_extract(self, url):
return self.url_result(
f'https://www.youtube.com/feed/{self._FEED_NAME}', ie=YoutubeTabIE.ie_key())
class YoutubeWatchLaterIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:watchlater'
IE_DESC = 'Youtube watch later list; ":ytwatchlater" keyword (requires cookies)'
_VALID_URL = r':ytwatchlater'
_TESTS = [{
'url': ':ytwatchlater',
'only_matching': True,
}]
def _real_extract(self, url):
return self.url_result(
'https://www.youtube.com/playlist?list=WL', ie=YoutubeTabIE.ie_key())
class YoutubeRecommendedIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'YouTube recommended videos; ":ytrec" keyword'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/?(?:[?#]|$)|:ytrec(?:ommended)?'
_FEED_NAME = 'recommended'
_LOGIN_REQUIRED = False
_TESTS = [{
'url': ':ytrec',
'only_matching': True,
}, {
'url': ':ytrecommended',
'only_matching': True,
}, {
'url': 'https://youtube.com',
'only_matching': True,
}]
class YoutubeSubscriptionsIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)'
_VALID_URL = r':ytsub(?:scription)?s?'
_FEED_NAME = 'subscriptions'
_TESTS = [{
'url': ':ytsubs',
'only_matching': True,
}, {
'url': ':ytsubscriptions',
'only_matching': True,
}]
class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'Youtube watch history; ":ythis" keyword (requires cookies)'
_VALID_URL = r':ythis(?:tory)?'
_FEED_NAME = 'history'
_TESTS = [{
'url': ':ythistory',
'only_matching': True,
}]
class YoutubeShortsAudioPivotIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube Shorts audio pivot (Shorts using audio of a given video)'
IE_NAME = 'youtube:shorts:pivot:audio'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/source/(?P<id>[\w-]{11})/shorts'
_TESTS = [{
'url': 'https://www.youtube.com/source/Lyj-MZSAA9o/shorts',
'only_matching': True,
}]
@staticmethod
def _generate_audio_pivot_params(video_id):
"""
Generates sfv_audio_pivot browse params for this video id
"""
pb_params = b'\xf2\x05+\n)\x12\'\n\x0b%b\x12\x0b%b\x1a\x0b%b' % ((video_id.encode(),) * 3)
return urllib.parse.quote(base64.b64encode(pb_params).decode())
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
f'https://www.youtube.com/feed/sfv_audio_pivot?bp={self._generate_audio_pivot_params(video_id)}',
ie=YoutubeTabIE)
class YoutubeConsentRedirectIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:consent'
IE_DESC = False # Do not list
_VALID_URL = r'https?://consent\.youtube\.com/m\?'
_TESTS = [{
'url': 'https://consent.youtube.com/m?continue=https%3A%2F%2Fwww.youtube.com%2Flive%2FqVv6vCqciTM%3Fcbrd%3D1&gl=NL&m=0&pc=yt&hl=en&src=1',
'info_dict': {
'id': 'qVv6vCqciTM',
'ext': 'mp4',
'age_limit': 0,
'uploader_id': '@sana_natori',
'comment_count': int,
'chapters': 'count:13',
'upload_date': '20221223',
'thumbnail': 'https://i.ytimg.com/vi/qVv6vCqciTM/maxresdefault.jpg',
'channel_url': 'https://www.youtube.com/channel/UCIdEIHpS0TdkqRkHL5OkLtA',
'uploader_url': 'https://www.youtube.com/@sana_natori',
'like_count': int,
'release_date': '20221223',
'tags': ['Vtuber', '月ノ美兎', '名取さな', 'にじさんじ', 'クリスマス', '3D配信'],
'title': '【 #インターネット女クリスマス 】3Dで歌ってはしゃぐインターネットの女たち【月美兎/名取さな】',
'view_count': int,
'playable_in_embed': True,
'duration': 4438,
'availability': 'public',
'channel_follower_count': int,
'channel_id': 'UCIdEIHpS0TdkqRkHL5OkLtA',
'categories': ['Entertainment'],
'live_status': 'was_live',
'release_timestamp': 1671793345,
'channel': 'さなちゃんねる',
'description': 'md5:6aebf95cc4a1d731aebc01ad6cc9806d',
'uploader': 'さなちゃんねる',
'channel_is_verified': True,
'heatmap': 'count:100',
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
redirect_url = url_or_none(parse_qs(url).get('continue', [None])[-1])
if not redirect_url:
raise ExtractorError('Invalid cookie consent redirect URL', expected=True)
return self.url_result(redirect_url)

View File

@ -0,0 +1,167 @@
import urllib.parse
from ._tab import YoutubeTabBaseInfoExtractor
from ..common import SearchInfoExtractor
from ...utils import join_nonempty, parse_qs
class YoutubeSearchIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
IE_DESC = 'YouTube search'
IE_NAME = 'youtube:search'
_SEARCH_KEY = 'ytsearch'
_SEARCH_PARAMS = 'EgIQAfABAQ==' # Videos only
_TESTS = [{
'url': 'ytsearch5:youtube-dl test video',
'playlist_count': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}, {
'note': 'Suicide/self-harm search warning',
'url': 'ytsearch1:i hate myself and i wanna die',
'playlist_count': 1,
'info_dict': {
'id': 'i hate myself and i wanna die',
'title': 'i hate myself and i wanna die',
},
}]
class YoutubeSearchDateIE(YoutubeTabBaseInfoExtractor, SearchInfoExtractor):
IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
_SEARCH_KEY = 'ytsearchdate'
IE_DESC = 'YouTube search, newest videos first'
_SEARCH_PARAMS = 'CAISAhAB8AEB' # Videos only, sorted by date
_TESTS = [{
'url': 'ytsearchdate5:youtube-dl test video',
'playlist_count': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}]
class YoutubeSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube search URLs with sorting and filter support'
IE_NAME = YoutubeSearchIE.IE_NAME + '_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/(?:results|search)\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,
'info_dict': {
'id': 'youtube-dl test video',
'title': 'youtube-dl test video',
},
}, {
'url': 'https://www.youtube.com/results?search_query=python&sp=EgIQAg%253D%253D',
'playlist_mincount': 5,
'info_dict': {
'id': 'python',
'title': 'python',
},
}, {
'url': 'https://www.youtube.com/results?search_query=%23cats',
'playlist_mincount': 1,
'info_dict': {
'id': '#cats',
'title': '#cats',
# The test suite does not have support for nested playlists
# 'entries': [{
# 'url': r're:https://(www\.)?youtube\.com/hashtag/cats',
# 'title': '#cats',
# }],
},
}, {
# Channel results
'url': 'https://www.youtube.com/results?search_query=kurzgesagt&sp=EgIQAg%253D%253D',
'info_dict': {
'id': 'kurzgesagt',
'title': 'kurzgesagt',
},
'playlist': [{
'info_dict': {
'_type': 'url',
'id': 'UCsXVk37bltHxD1rDPwtNM8Q',
'url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
'ie_key': 'YoutubeTab',
'channel': 'Kurzgesagt In a Nutshell',
'description': 'md5:4ae48dfa9505ffc307dad26342d06bfc',
'title': 'Kurzgesagt In a Nutshell',
'channel_id': 'UCsXVk37bltHxD1rDPwtNM8Q',
# No longer available for search as it is set to the handle.
# 'playlist_count': int,
'channel_url': 'https://www.youtube.com/channel/UCsXVk37bltHxD1rDPwtNM8Q',
'thumbnails': list,
'uploader_id': '@kurzgesagt',
'uploader_url': 'https://www.youtube.com/@kurzgesagt',
'uploader': 'Kurzgesagt In a Nutshell',
'channel_is_verified': True,
'channel_follower_count': int,
},
}],
'params': {'extract_flat': True, 'playlist_items': '1'},
'playlist_mincount': 1,
}, {
'url': 'https://www.youtube.com/results?q=test&sp=EgQIBBgB',
'only_matching': True,
}]
def _real_extract(self, url):
qs = parse_qs(url)
query = (qs.get('search_query') or qs.get('q'))[0]
return self.playlist_result(self._search_results(query, qs.get('sp', (None,))[0]), query, query)
class YoutubeMusicSearchURLIE(YoutubeTabBaseInfoExtractor):
IE_DESC = 'YouTube music search URLs with selectable sections, e.g. #songs'
IE_NAME = 'youtube:music:search_url'
_VALID_URL = r'https?://music\.youtube\.com/search\?([^#]+&)?(?:search_query|q)=(?:[^&]+)(?:[&#]|$)'
_TESTS = [{
'url': 'https://music.youtube.com/search?q=royalty+free+music',
'playlist_count': 16,
'info_dict': {
'id': 'royalty free music',
'title': 'royalty free music',
},
}, {
'url': 'https://music.youtube.com/search?q=royalty+free+music&sp=EgWKAQIIAWoKEAoQAxAEEAkQBQ%3D%3D',
'playlist_mincount': 30,
'info_dict': {
'id': 'royalty free music - songs',
'title': 'royalty free music - songs',
},
'params': {'extract_flat': 'in_playlist'},
}, {
'url': 'https://music.youtube.com/search?q=royalty+free+music#community+playlists',
'playlist_mincount': 30,
'info_dict': {
'id': 'royalty free music - community playlists',
'title': 'royalty free music - community playlists',
},
'params': {'extract_flat': 'in_playlist'},
}]
_SECTIONS = {
'albums': 'EgWKAQIYAWoKEAoQAxAEEAkQBQ==',
'artists': 'EgWKAQIgAWoKEAoQAxAEEAkQBQ==',
'community playlists': 'EgeKAQQoAEABagoQChADEAQQCRAF',
'featured playlists': 'EgeKAQQoADgBagwQAxAJEAQQDhAKEAU==',
'songs': 'EgWKAQIIAWoKEAoQAxAEEAkQBQ==',
'videos': 'EgWKAQIQAWoKEAoQAxAEEAkQBQ==',
}
def _real_extract(self, url):
qs = parse_qs(url)
query = (qs.get('search_query') or qs.get('q'))[0]
params = qs.get('sp', (None,))[0]
if params:
section = next((k for k, v in self._SECTIONS.items() if v == params), params)
else:
section = urllib.parse.unquote_plus(([*url.split('#'), ''])[1]).lower()
params = self._SECTIONS.get(section)
if not params:
section = None
title = join_nonempty(query, section, delim=' - ')
return self.playlist_result(self._search_results(query, params, default_client='web_music'), title, title)

File diff suppressed because it is too large Load Diff

View File

@ -137,6 +137,116 @@ def _extract_player(self, webpage, video_id, fatal=True):
group='json'),
video_id)
def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'),
), get_all=False)
if not ptmd_path:
raise ExtractorError('Could not extract ptmd_path')
info = self._extract_ptmd(
urljoin(url, ptmd_path.replace('{playerId}', 'android_native_5')), video_id, player['apiToken'], url)
thumbnails = []
layouts = try_get(
content, lambda x: x['teaserImageRef']['layouts'], dict)
if layouts:
for layout_key, layout_url in layouts.items():
layout_url = url_or_none(layout_url)
if not layout_url:
continue
thumbnail = {
'url': layout_url,
'format_id': layout_key,
}
mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
if mobj:
thumbnail.update({
'width': int(mobj.group('width')),
'height': int(mobj.group('height')),
})
thumbnails.append(thumbnail)
chapter_marks = t.get('streamAnchorTag') or []
chapter_marks.append({'anchorOffset': int_or_none(t.get('duration'))})
chapters = [{
'start_time': chap.get('anchorOffset'),
'end_time': next_chap.get('anchorOffset'),
'title': chap.get('anchorLabel'),
} for chap, next_chap in zip(chapter_marks, chapter_marks[1:])]
return merge_dicts(info, {
'title': title,
'description': content.get('leadParagraph') or content.get('teasertext'),
'duration': int_or_none(t.get('duration')),
'timestamp': unified_timestamp(content.get('editorialDate')),
'thumbnails': thumbnails,
'chapters': chapters or None,
'episode': title,
**traverse_obj(content, ('programmeItem', 0, 'http://zdf.de/rels/target', {
'series_id': ('http://zdf.de/rels/cmdm/series', 'seriesUuid', {str}),
'series': ('http://zdf.de/rels/cmdm/series', 'seriesTitle', {str}),
'season': ('http://zdf.de/rels/cmdm/season', 'seasonTitle', {str}),
'season_number': ('http://zdf.de/rels/cmdm/season', 'seasonNumber', {int_or_none}),
'season_id': ('http://zdf.de/rels/cmdm/season', 'seasonUuid', {str}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode_id': ('contentId', {str}),
})),
})
def _extract_regular(self, url, player, video_id, query=None):
player_url = player['content']
content = self._call_api(
update_url_query(player_url, query),
video_id, 'content', player['apiToken'], url)
return self._extract_entry(player_url, player, content, video_id)
def _extract_mobile(self, video_id):
video = self._download_v2_doc(video_id)
formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']
format_urls = set()
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
thumbnails = []
teaser_bild = document.get('teaserBild')
if isinstance(teaser_bild, dict):
for thumbnail_key, thumbnail in teaser_bild.items():
thumbnail_url = try_get(
thumbnail, lambda x: x['url'], str)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'id': thumbnail_key,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': content_id,
'title': title,
'description': document.get('beschreibung'),
'duration': int_or_none(document.get('length')),
'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
try_get(video, lambda x: x['meta']['editorialDate'], str)),
'thumbnails': thumbnails,
'subtitles': self._extract_subtitles(document),
'formats': formats,
}
class ZDFIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
@ -187,12 +297,20 @@ class ZDFIE(ZDFBaseIE):
'info_dict': {
'id': '151025_magie_farben2_tex',
'ext': 'mp4',
'duration': 2615.0,
'title': 'Die Magie der Farben (2/2)',
'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
'duration': 2615,
'timestamp': 1465021200,
'upload_date': '20160604',
'thumbnail': 'https://www.zdf.de/assets/mauve-im-labor-100~768x432?cb=1464909117806',
'upload_date': '20160604',
'episode': 'Die Magie der Farben (2/2)',
'episode_id': 'POS_954f4170-36a5-4a41-a6cf-78f1f3b1f127',
'season': 'Staffel 1',
'series': 'Die Magie der Farben',
'season_number': 1,
'series_id': 'a39900dd-cdbd-4a6a-a413-44e8c6ae18bc',
'season_id': '5a92e619-8a0f-4410-a3d5-19c76fbebb37',
'episode_number': 2,
},
}, {
'url': 'https://www.zdf.de/funk/druck-11790/funk-alles-ist-verzaubert-102.html',
@ -200,12 +318,13 @@ class ZDFIE(ZDFBaseIE):
'info_dict': {
'ext': 'mp4',
'id': 'video_funk_1770473',
'duration': 1278,
'description': 'Die Neue an der Schule verdreht Ismail den Kopf.',
'duration': 1278.0,
'title': 'Alles ist verzaubert',
'description': 'Die Neue an der Schule verdreht Ismail den Kopf.',
'timestamp': 1635520560,
'upload_date': '20211029',
'thumbnail': 'https://www.zdf.de/assets/teaser-funk-alles-ist-verzaubert-102~1920x1080?cb=1663848412907',
'upload_date': '20211029',
'episode': 'Alles ist verzaubert',
},
}, {
# Same as https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche
@ -248,121 +367,55 @@ class ZDFIE(ZDFBaseIE):
'title': 'Das Geld anderer Leute',
'description': 'md5:cb6f660850dc5eb7d1ab776ea094959d',
'duration': 2581.0,
'timestamp': 1675160100,
'upload_date': '20230131',
'timestamp': 1728983700,
'upload_date': '20241015',
'thumbnail': 'https://epg-image.zdf.de/fotobase-webdelivery/images/e2d7e55a-09f0-424e-ac73-6cac4dd65f35?layout=2400x1350',
'series': 'SOKO Stuttgart',
'series_id': 'f862ce9a-6dd1-4388-a698-22b36ac4c9e9',
'season': 'Staffel 11',
'season_number': 11,
'season_id': 'ae1b4990-6d87-4970-a571-caccf1ba2879',
'episode': 'Das Geld anderer Leute',
'episode_number': 10,
'episode_id': 'POS_7f367934-f2f0-45cb-9081-736781ff2d23',
},
}, {
'url': 'https://www.zdf.de/dokumentation/terra-x/unser-gruener-planet-wuesten-doku-100.html',
'info_dict': {
'id': '220605_dk_gruener_planet_wuesten_tex',
'id': '220525_green_planet_makingof_1_tropen_tex',
'ext': 'mp4',
'title': 'Unser grüner Planet - Wüsten',
'description': 'md5:4fc647b6f9c3796eea66f4a0baea2862',
'duration': 2613.0,
'timestamp': 1654450200,
'upload_date': '20220605',
'format_note': 'uhd, main',
'thumbnail': 'https://www.zdf.de/assets/saguaro-kakteen-102~3840x2160?cb=1655910690796',
'title': 'Making-of Unser grüner Planet - Tropen',
'description': 'md5:d7c6949dc7c75c73c4ad51c785fb0b79',
'duration': 435.0,
'timestamp': 1653811200,
'upload_date': '20220529',
'format_note': 'hd, main',
'thumbnail': 'https://www.zdf.de/assets/unser-gruener-planet-making-of-1-tropen-100~3840x2160?cb=1653493335577',
'episode': 'Making-of Unser grüner Planet - Tropen',
},
'skip': 'No longer available: "Leider kein Video verfügbar"',
}, {
'url': 'https://www.zdf.de/serien/northern-lights/begegnung-auf-der-bruecke-100.html',
'info_dict': {
'id': '240319_2310_sendung_not',
'ext': 'mp4',
'title': 'Begegnung auf der Brücke',
'description': 'md5:e53a555da87447f7f1207f10353f8e45',
'thumbnail': 'https://epg-image.zdf.de/fotobase-webdelivery/images/c5ff1d1f-f5c8-4468-86ac-1b2f1dbecc76?layout=2400x1350',
'upload_date': '20250203',
'duration': 3083.0,
'timestamp': 1738546500,
'series_id': '1d7a1879-01ee-4468-8237-c6b4ecd633c7',
'series': 'Northern Lights',
'season': 'Staffel 1',
'season_number': 1,
'season_id': '22ac26a2-4ea2-4055-ac0b-98b755cdf718',
'episode': 'Begegnung auf der Brücke',
'episode_number': 1,
'episode_id': 'POS_71049438-024b-471f-b472-4fe2e490d1fb',
},
}]
def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'),
), get_all=False)
if not ptmd_path:
raise ExtractorError('Could not extract ptmd_path')
info = self._extract_ptmd(
urljoin(url, ptmd_path.replace('{playerId}', 'android_native_5')), video_id, player['apiToken'], url)
thumbnails = []
layouts = try_get(
content, lambda x: x['teaserImageRef']['layouts'], dict)
if layouts:
for layout_key, layout_url in layouts.items():
layout_url = url_or_none(layout_url)
if not layout_url:
continue
thumbnail = {
'url': layout_url,
'format_id': layout_key,
}
mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
if mobj:
thumbnail.update({
'width': int(mobj.group('width')),
'height': int(mobj.group('height')),
})
thumbnails.append(thumbnail)
chapter_marks = t.get('streamAnchorTag') or []
chapter_marks.append({'anchorOffset': int_or_none(t.get('duration'))})
chapters = [{
'start_time': chap.get('anchorOffset'),
'end_time': next_chap.get('anchorOffset'),
'title': chap.get('anchorLabel'),
} for chap, next_chap in zip(chapter_marks, chapter_marks[1:])]
return merge_dicts(info, {
'title': title,
'description': content.get('leadParagraph') or content.get('teasertext'),
'duration': int_or_none(t.get('duration')),
'timestamp': unified_timestamp(content.get('editorialDate')),
'thumbnails': thumbnails,
'chapters': chapters or None,
})
def _extract_regular(self, url, player, video_id):
content = self._call_api(
player['content'], video_id, 'content', player['apiToken'], url)
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id):
video = self._download_v2_doc(video_id)
formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']
format_urls = set()
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
thumbnails = []
teaser_bild = document.get('teaserBild')
if isinstance(teaser_bild, dict):
for thumbnail_key, thumbnail in teaser_bild.items():
thumbnail_url = try_get(
thumbnail, lambda x: x['url'], str)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'id': thumbnail_key,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': content_id,
'title': title,
'description': document.get('beschreibung'),
'duration': int_or_none(document.get('length')),
'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
try_get(video, lambda x: x['meta']['editorialDate'], str)),
'thumbnails': thumbnails,
'subtitles': self._extract_subtitles(document),
'formats': formats,
}
def _real_extract(self, url):
video_id = self._match_id(url)
@ -370,7 +423,7 @@ def _real_extract(self, url):
if webpage:
player = self._extract_player(webpage, url, fatal=False)
if player:
return self._extract_regular(url, player, video_id)
return self._extract_regular(url, player, video_id, query={'profile': 'player-3'})
return self._extract_mobile(video_id)
@ -416,7 +469,8 @@ def _extract_entry(self, entry):
'title': ('titel', {str}),
'description': ('beschreibung', {str}),
'duration': ('length', {float_or_none}),
# TODO: seasonNumber and episodeNumber can be extracted but need to also be in ZDFIE
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
}))
def _entries(self, data, document_id):

30
yt_dlp/globals.py Normal file
View File

@ -0,0 +1,30 @@
from collections import defaultdict
# Please Note: Due to necessary changes and the complex nature involved in the plugin/globals system,
# no backwards compatibility is guaranteed for the plugin system API.
# However, we will still try our best.
class Indirect:
def __init__(self, initial, /):
self.value = initial
def __repr__(self, /):
return f'{type(self).__name__}({self.value!r})'
postprocessors = Indirect({})
extractors = Indirect({})
# Plugins
all_plugins_loaded = Indirect(False)
plugin_specs = Indirect({})
plugin_dirs = Indirect(['default'])
plugin_ies = Indirect({})
plugin_pps = Indirect({})
plugin_ies_overrides = Indirect(defaultdict(list))
# Misc
IN_CLI = Indirect(False)
LAZY_EXTRACTORS = Indirect(None) # `False`=force, `None`=disabled, `True`=enabled

View File

@ -25,7 +25,7 @@ def zeroise(x):
with contextlib.suppress(TypeError):
if math.isnan(x): # NB: NaN cannot be checked by membership
return 0
return x
return int(float(x))
def wrapped(a, b):
return op(zeroise(a), zeroise(b)) & 0xffffffff
@ -95,6 +95,61 @@ def _js_ternary(cndn, if_true=True, if_false=False):
return if_true
# Ref: https://es5.github.io/#x9.8.1
def js_number_to_string(val: float, radix: int = 10):
if radix in (JS_Undefined, None):
radix = 10
assert radix in range(2, 37), 'radix must be an integer at least 2 and no greater than 36'
if math.isnan(val):
return 'NaN'
if val == 0:
return '0'
if math.isinf(val):
return '-Infinity' if val < 0 else 'Infinity'
if radix == 10:
# TODO: implement special cases
...
ALPHABET = b'0123456789abcdefghijklmnopqrstuvwxyz.-'
result = collections.deque()
sign = val < 0
val = abs(val)
fraction, integer = math.modf(val)
delta = max(math.nextafter(.0, math.inf), math.ulp(val) / 2)
if fraction >= delta:
result.append(-2) # `.`
while fraction >= delta:
delta *= radix
fraction, digit = math.modf(fraction * radix)
result.append(int(digit))
# if we need to round, propagate potential carry through fractional part
needs_rounding = fraction > 0.5 or (fraction == 0.5 and int(digit) & 1)
if needs_rounding and fraction + delta > 1:
for index in reversed(range(1, len(result))):
if result[index] + 1 < radix:
result[index] += 1
break
result.pop()
else:
integer += 1
break
integer, digit = divmod(int(integer), radix)
result.appendleft(digit)
while integer > 0:
integer, digit = divmod(integer, radix)
result.appendleft(digit)
if sign:
result.appendleft(-1) # `-`
return bytes(ALPHABET[digit] for digit in result).decode('ascii')
# Ref: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence
_OPERATORS = { # None => Defined in JSInterpreter._operator
'?': None,

View File

@ -296,6 +296,7 @@ def _check_extensions(self, extensions):
extensions.pop('cookiejar', None)
extensions.pop('timeout', None)
extensions.pop('legacy_ssl', None)
extensions.pop('keep_header_casing', None)
def _create_instance(self, cookiejar, legacy_ssl_support=None):
session = RequestsSession()
@ -312,11 +313,12 @@ def _create_instance(self, cookiejar, legacy_ssl_support=None):
session.trust_env = False # no need, we already load proxies from env
return session
def _send(self, request):
headers = self._merge_headers(request.headers)
def _prepare_headers(self, _, headers):
add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)
def _send(self, request):
headers = self._get_headers(request)
max_redirects_exceeded = False
session = self._get_instance(

View File

@ -379,13 +379,15 @@ def _create_instance(self, proxies, cookiejar, legacy_ssl_support=None):
opener.addheaders = []
return opener
def _send(self, request):
headers = self._merge_headers(request.headers)
def _prepare_headers(self, _, headers):
add_accept_encoding_header(headers, SUPPORTED_ENCODINGS)
def _send(self, request):
headers = self._get_headers(request)
urllib_req = urllib.request.Request(
url=request.url,
data=request.data,
headers=dict(headers),
headers=headers,
method=request.method,
)

View File

@ -116,6 +116,7 @@ def _check_extensions(self, extensions):
extensions.pop('timeout', None)
extensions.pop('cookiejar', None)
extensions.pop('legacy_ssl', None)
extensions.pop('keep_header_casing', None)
def close(self):
# Remove the logging handler that contains a reference to our logger
@ -123,15 +124,16 @@ def close(self):
for name, handler in self.__logging_handlers.items():
logging.getLogger(name).removeHandler(handler)
def _send(self, request):
timeout = self._calculate_timeout(request)
headers = self._merge_headers(request.headers)
def _prepare_headers(self, request, headers):
if 'cookie' not in headers:
cookiejar = self._get_cookiejar(request)
cookie_header = cookiejar.get_cookie_header(request.url)
if cookie_header:
headers['cookie'] = cookie_header
def _send(self, request):
timeout = self._calculate_timeout(request)
headers = self._get_headers(request)
wsuri = parse_uri(request.url)
create_conn_kwargs = {
'source_address': (self.source_address, 0) if self.source_address else None,

View File

@ -206,6 +206,7 @@ class RequestHandler(abc.ABC):
- `cookiejar`: Cookiejar to use for this request.
- `timeout`: socket timeout to use for this request.
- `legacy_ssl`: Enable legacy SSL options for this request. See legacy_ssl_support.
- `keep_header_casing`: Keep the casing of headers when sending the request.
To enable these, add extensions.pop('<extension>', None) to _check_extensions
Apart from the url protocol, proxies dict may contain the following keys:
@ -259,6 +260,23 @@ def _make_sslcontext(self, legacy_ssl_support=None):
def _merge_headers(self, request_headers):
return HTTPHeaderDict(self.headers, request_headers)
def _prepare_headers(self, request: Request, headers: HTTPHeaderDict) -> None: # noqa: B027
"""Additional operations to prepare headers before building. To be extended by subclasses.
@param request: Request object
@param headers: Merged headers to prepare
"""
def _get_headers(self, request: Request) -> dict[str, str]:
"""
Get headers for external use.
Subclasses may define a _prepare_headers method to modify headers after merge but before building.
"""
headers = self._merge_headers(request.headers)
self._prepare_headers(request, headers)
if request.extensions.get('keep_header_casing'):
return headers.sensitive()
return dict(headers)
def _calculate_timeout(self, request):
return float(request.extensions.get('timeout') or self.timeout)
@ -317,6 +335,7 @@ def _check_extensions(self, extensions):
assert isinstance(extensions.get('cookiejar'), (YoutubeDLCookieJar, NoneType))
assert isinstance(extensions.get('timeout'), (float, int, NoneType))
assert isinstance(extensions.get('legacy_ssl'), (bool, NoneType))
assert isinstance(extensions.get('keep_header_casing'), (bool, NoneType))
def _validate(self, request):
self._check_url_scheme(request)

View File

@ -5,11 +5,11 @@
from dataclasses import dataclass
from typing import Any
from .common import RequestHandler, register_preference
from .common import RequestHandler, register_preference, Request
from .exceptions import UnsupportedRequest
from ..compat.types import NoneType
from ..utils import classproperty, join_nonempty
from ..utils.networking import std_headers
from ..utils.networking import std_headers, HTTPHeaderDict
@dataclass(order=True, frozen=True)
@ -123,7 +123,17 @@ def _get_request_target(self, request):
"""Get the requested target for the request"""
return self._resolve_target(request.extensions.get('impersonate') or self.impersonate)
def _get_impersonate_headers(self, request):
def _prepare_impersonate_headers(self, request: Request, headers: HTTPHeaderDict) -> None: # noqa: B027
"""Additional operations to prepare headers before building. To be extended by subclasses.
@param request: Request object
@param headers: Merged headers to prepare
"""
def _get_impersonate_headers(self, request: Request) -> dict[str, str]:
"""
Get headers for external impersonation use.
Subclasses may define a _prepare_impersonate_headers method to modify headers after merge but before building.
"""
headers = self._merge_headers(request.headers)
if self._get_request_target(request) is not None:
# remove all headers present in std_headers
@ -131,7 +141,11 @@ def _get_impersonate_headers(self, request):
for k, v in std_headers.items():
if headers.get(k) == v:
headers.pop(k)
return headers
self._prepare_impersonate_headers(request, headers)
if request.extensions.get('keep_header_casing'):
return headers.sensitive()
return dict(headers)
@register_preference(ImpersonateRequestHandler)

View File

@ -398,7 +398,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'(Alias: --no-config)'))
general.add_option(
'--no-config-locations',
action='store_const', dest='config_locations', const=[],
action='store_const', dest='config_locations', const=None,
help=(
'Do not load any custom configuration files (default). When given inside a '
'configuration file, ignore all previous --config-locations defined in the current file'))
@ -410,12 +410,21 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'("-" for stdin). Can be used multiple times and inside other configuration files'))
general.add_option(
'--plugin-dirs',
dest='plugin_dirs', metavar='PATH', action='append',
metavar='PATH',
dest='plugin_dirs',
action='callback',
callback=_list_from_options_callback,
type='str',
callback_kwargs={'delim': None},
default=['default'],
help=(
'Path to an additional directory to search for plugins. '
'This option can be used multiple times to add multiple directories. '
'Note that this currently only works for extractor plugins; '
'postprocessor plugins can only be loaded from the default plugin directories'))
'Use "default" to search the default plugin directories (default)'))
general.add_option(
'--no-plugin-dirs',
dest='plugin_dirs', action='store_const', const=[],
help='Clear plugin directories to search, including defaults and those provided by previous --plugin-dirs')
general.add_option(
'--flat-playlist',
action='store_const', dest='extract_flat', const='in_playlist', default=False,

View File

@ -1,4 +1,5 @@
import contextlib
import dataclasses
import functools
import importlib
import importlib.abc
@ -14,17 +15,48 @@
from pathlib import Path
from zipfile import ZipFile
from .globals import (
Indirect,
plugin_dirs,
all_plugins_loaded,
plugin_specs,
)
from .utils import (
Config,
get_executable_path,
get_system_config_dirs,
get_user_config_dirs,
merge_dicts,
orderedSet,
write_string,
)
PACKAGE_NAME = 'yt_dlp_plugins'
COMPAT_PACKAGE_NAME = 'ytdlp_plugins'
_BASE_PACKAGE_PATH = Path(__file__).parent
# Please Note: Due to necessary changes and the complex nature involved,
# no backwards compatibility is guaranteed for the plugin system API.
# However, we will still try our best.
__all__ = [
'COMPAT_PACKAGE_NAME',
'PACKAGE_NAME',
'PluginSpec',
'directories',
'load_all_plugins',
'load_plugins',
'register_plugin_spec',
]
@dataclasses.dataclass
class PluginSpec:
module_name: str
suffix: str
destination: Indirect
plugin_destination: Indirect
class PluginLoader(importlib.abc.Loader):
@ -44,7 +76,42 @@ def dirs_in_zip(archive):
pass
except Exception as e:
write_string(f'WARNING: Could not read zip file {archive}: {e}\n')
return set()
return ()
def default_plugin_paths():
def _get_package_paths(*root_paths, containing_folder):
for config_dir in orderedSet(map(Path, root_paths), lazy=True):
# We need to filter the base path added when running __main__.py directly
if config_dir == _BASE_PACKAGE_PATH:
continue
with contextlib.suppress(OSError):
yield from (config_dir / containing_folder).iterdir()
# Load from yt-dlp config folders
yield from _get_package_paths(
*get_user_config_dirs('yt-dlp'),
*get_system_config_dirs('yt-dlp'),
containing_folder='plugins',
)
# Load from yt-dlp-plugins folders
yield from _get_package_paths(
get_executable_path(),
*get_user_config_dirs(''),
*get_system_config_dirs(''),
containing_folder='yt-dlp-plugins',
)
# Load from PYTHONPATH directories
yield from (path for path in map(Path, sys.path) if path != _BASE_PACKAGE_PATH)
def candidate_plugin_paths(candidate):
candidate_path = Path(candidate)
if not candidate_path.is_dir():
raise ValueError(f'Invalid plugin directory: {candidate_path}')
yield from candidate_path.iterdir()
class PluginFinder(importlib.abc.MetaPathFinder):
@ -56,40 +123,16 @@ class PluginFinder(importlib.abc.MetaPathFinder):
def __init__(self, *packages):
self._zip_content_cache = {}
self.packages = set(itertools.chain.from_iterable(
itertools.accumulate(name.split('.'), lambda a, b: '.'.join((a, b)))
for name in packages))
self.packages = set(
itertools.chain.from_iterable(
itertools.accumulate(name.split('.'), lambda a, b: '.'.join((a, b)))
for name in packages))
def search_locations(self, fullname):
candidate_locations = []
def _get_package_paths(*root_paths, containing_folder='plugins'):
for config_dir in orderedSet(map(Path, root_paths), lazy=True):
with contextlib.suppress(OSError):
yield from (config_dir / containing_folder).iterdir()
# Load from yt-dlp config folders
candidate_locations.extend(_get_package_paths(
*get_user_config_dirs('yt-dlp'),
*get_system_config_dirs('yt-dlp'),
containing_folder='plugins'))
# Load from yt-dlp-plugins folders
candidate_locations.extend(_get_package_paths(
get_executable_path(),
*get_user_config_dirs(''),
*get_system_config_dirs(''),
containing_folder='yt-dlp-plugins'))
candidate_locations.extend(map(Path, sys.path)) # PYTHONPATH
with contextlib.suppress(ValueError): # Added when running __main__.py directly
candidate_locations.remove(Path(__file__).parent)
# TODO(coletdjnz): remove when plugin globals system is implemented
if Config._plugin_dirs:
candidate_locations.extend(_get_package_paths(
*Config._plugin_dirs,
containing_folder=''))
candidate_locations = itertools.chain.from_iterable(
default_plugin_paths() if candidate == 'default' else candidate_plugin_paths(candidate)
for candidate in plugin_dirs.value
)
parts = Path(*fullname.split('.'))
for path in orderedSet(candidate_locations, lazy=True):
@ -109,7 +152,8 @@ def find_spec(self, fullname, path=None, target=None):
search_locations = list(map(str, self.search_locations(fullname)))
if not search_locations:
return None
# Prevent using built-in meta finders for searching plugins.
raise ModuleNotFoundError(fullname)
spec = importlib.machinery.ModuleSpec(fullname, PluginLoader(), is_package=True)
spec.submodule_search_locations = search_locations
@ -123,8 +167,10 @@ def invalidate_caches(self):
def directories():
spec = importlib.util.find_spec(PACKAGE_NAME)
return spec.submodule_search_locations if spec else []
with contextlib.suppress(ModuleNotFoundError):
if spec := importlib.util.find_spec(PACKAGE_NAME):
return list(spec.submodule_search_locations)
return []
def iter_modules(subpackage):
@ -134,19 +180,23 @@ def iter_modules(subpackage):
yield from pkgutil.iter_modules(path=pkg.__path__, prefix=f'{fullname}.')
def load_module(module, module_name, suffix):
def get_regular_classes(module, module_name, suffix):
# Find standard public plugin classes (not overrides)
return inspect.getmembers(module, lambda obj: (
inspect.isclass(obj)
and obj.__name__.endswith(suffix)
and obj.__module__.startswith(module_name)
and not obj.__name__.startswith('_')
and obj.__name__ in getattr(module, '__all__', [obj.__name__])))
and obj.__name__ in getattr(module, '__all__', [obj.__name__])
and getattr(obj, 'PLUGIN_NAME', None) is None
))
def load_plugins(name, suffix):
classes = {}
if os.environ.get('YTDLP_NO_PLUGINS'):
return classes
def load_plugins(plugin_spec: PluginSpec):
name, suffix = plugin_spec.module_name, plugin_spec.suffix
regular_classes = {}
if os.environ.get('YTDLP_NO_PLUGINS') or not plugin_dirs.value:
return regular_classes
for finder, module_name, _ in iter_modules(name):
if any(x.startswith('_') for x in module_name.split('.')):
@ -163,24 +213,42 @@ def load_plugins(name, suffix):
sys.modules[module_name] = module
spec.loader.exec_module(module)
except Exception:
write_string(f'Error while importing module {module_name!r}\n{traceback.format_exc(limit=-1)}')
write_string(
f'Error while importing module {module_name!r}\n{traceback.format_exc(limit=-1)}',
)
continue
classes.update(load_module(module, module_name, suffix))
regular_classes.update(get_regular_classes(module, module_name, suffix))
# Compat: old plugin system using __init__.py
# Note: plugins imported this way do not show up in directories()
# nor are considered part of the yt_dlp_plugins namespace package
with contextlib.suppress(FileNotFoundError):
spec = importlib.util.spec_from_file_location(
name, Path(get_executable_path(), COMPAT_PACKAGE_NAME, name, '__init__.py'))
plugins = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = plugins
spec.loader.exec_module(plugins)
classes.update(load_module(plugins, spec.name, suffix))
if 'default' in plugin_dirs.value:
with contextlib.suppress(FileNotFoundError):
spec = importlib.util.spec_from_file_location(
name,
Path(get_executable_path(), COMPAT_PACKAGE_NAME, name, '__init__.py'),
)
plugins = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = plugins
spec.loader.exec_module(plugins)
regular_classes.update(get_regular_classes(plugins, spec.name, suffix))
return classes
# Add the classes into the global plugin lookup for that type
plugin_spec.plugin_destination.value = regular_classes
# We want to prepend to the main lookup for that type
plugin_spec.destination.value = merge_dicts(regular_classes, plugin_spec.destination.value)
return regular_classes
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.extractor', f'{PACKAGE_NAME}.postprocessor'))
def load_all_plugins():
for plugin_spec in plugin_specs.value.values():
load_plugins(plugin_spec)
all_plugins_loaded.value = True
__all__ = ['COMPAT_PACKAGE_NAME', 'PACKAGE_NAME', 'directories', 'load_plugins']
def register_plugin_spec(plugin_spec: PluginSpec):
# If the plugin spec for a module is already registered, it will not be added again
if plugin_spec.module_name not in plugin_specs.value:
plugin_specs.value[plugin_spec.module_name] = plugin_spec
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.{plugin_spec.module_name}'))

Some files were not shown because too many files have changed in this diff Show More