mirror of https://github.com/yt-dlp/yt-dlp.git synced 2026-01-12 18:01:22 +00:00


9 Commits

Author SHA1 Message Date
pukkandan
65156eba45 Release 2021.01.10 2021-01-11 04:09:08 +05:30
pukkandan
ba3c9477ee [Animelab] Added (https://github.com/ytdl-org/youtube-dl/pull/13600)
Authored by mariuszskon
2021-01-11 03:10:53 +05:30
pukkandan
a3e26449cd [archive.org] Fix extractor and add support for audio and playlists (https://github.com/ytdl-org/youtube-dl/pull/27156)
Coauthored by wporr
2021-01-11 03:10:53 +05:30
pukkandan
7267acd1ed [youtube:search] fix view_count (https://github.com/ytdl-org/youtube-dl/pull/27588/)
Authored by ohnonot
2021-01-11 02:59:44 +05:30
pukkandan
f446cc6667 Create to_screen and similar functions in postprocessor/common
`to_screen`, `report_warning`, `report_error`, `write_debug`, `get_param`

This is a first step in standardizing these functions. The same has to be done eventually for extractors and downloaders too
2021-01-10 22:22:24 +05:30
pukkandan
ebdd9275c3 Enable test_youtube_search_matching
I forgot to enable this when the search url extractor was reinstated
2021-01-10 22:20:32 +05:30
pukkandan
b2f70ae74e Update version badge automatically in README
Uses: https://github.com/Schneegans/dynamic-badges-action
2021-01-09 22:58:23 +05:30
pukkandan
5ac2324460 [youtube] Show if video is embeddable in info
Closes https://github.com/ytdl-org/youtube-dl/issues/27730
2021-01-09 21:29:58 +05:30
pukkandan
4084f235eb [version] update 2021-01-09 18:44:32 +05:30
24 changed files with 623 additions and 5382 deletions
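Commit f446cc6667 adds `to_screen`, `report_warning`, `report_error`, `write_debug` and `get_param` to postprocessor/common. Judging by the `to_screen` hunk in that file, the helpers forward to the attached downloader and prefix output with `PP_NAME` (the class name minus its trailing `PP`). A minimal sketch of that delegation pattern for two of the helpers follows; `FakeDownloader`, `PostProcessorSketch` and `ExamplePP` are hypothetical names, not yt-dlc classes.

```python
class FakeDownloader(object):
    """Hypothetical stand-in for the downloader a postprocessor is attached to."""

    def __init__(self, params=None):
        self.params = params or {}
        self.lines = []

    def to_screen(self, message):
        # Record instead of printing so the output can be inspected
        self.lines.append(message)


class PostProcessorSketch(object):
    def __init__(self, downloader=None):
        self._downloader = downloader
        # Class names end in "PP", e.g. "ExamplePP" -> "Example"
        self.PP_NAME = self.__class__.__name__[:-2]

    def to_screen(self, text):
        # Do nothing when no downloader is attached
        if self._downloader:
            self._downloader.to_screen('[%s] %s' % (self.PP_NAME, text))

    def get_param(self, name, default=None):
        # Read an option from the downloader's params, if there is a downloader
        if self._downloader:
            return self._downloader.params.get(name, default)
        return default


class ExamplePP(PostProcessorSketch):
    pass


downloader = FakeDownloader({'verbose': True})
pp = ExamplePP(downloader)
pp.to_screen('Processing file')
print(downloader.lines)  # ['[Example] Processing file']
```

Guarding on `self._downloader` lets a postprocessor be constructed and exercised on its own (for example in tests) without crashing on output calls.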


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.08. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.09. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dlc version **2021.01.08**
- [ ] I've verified that I'm running youtube-dlc version **2021.01.09**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ Add the `-v` flag to your command line you run youtube-dlc with (`youtube-dlc -v
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dlc version 2021.01.08
[debug] youtube-dlc version 2021.01.09
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
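The templates ask reporters to compare the output of `youtube-dlc --version` with the current release. Because these versions are dates in `YYYY.MM.DD` form, comparing them as integer tuples is enough to tell whether an install is outdated. The helpers `parse_version` and `is_outdated` below are hypothetical, not part of youtube-dlc, and assume purely numeric, dot-separated version strings.

```python
def parse_version(version):
    """Split a date-style version such as '2021.01.09' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split('.'))


def is_outdated(installed, required):
    # Tuples compare element by element, so (2021, 1, 8) < (2021, 1, 9)
    return parse_version(installed) < parse_version(required)


print(is_outdated('2021.01.08', '2021.01.09'))  # True
print(is_outdated('2021.01.09', '2021.01.09'))  # False
```

Converting to integers rather than comparing raw strings keeps the check correct even if a version were written without zero padding.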


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.08. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.09. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that the site you are requesting is not dedicated to copyright infringement, see https://github.com/pukkandan/yt-dlc. youtube-dlc does not support such sites. In order for a site support request to be accepted, all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dlc version **2021.01.08**
- [ ] I've verified that I'm running youtube-dlc version **2021.01.09**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of the provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.08. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.09. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Don't forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dlc version **2021.01.08**
- [ ] I've verified that I'm running youtube-dlc version **2021.01.09**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.08. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.09. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dlc version **2021.01.08**
- [ ] I've verified that I'm running youtube-dlc version **2021.01.09**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ Add the `-v` flag to your command line you run youtube-dlc with (`youtube-dlc -v
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dlc version 2021.01.08
[debug] youtube-dlc version 2021.01.09
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.08. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- First off, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.09. If it's not, see https://github.com/pukkandan/yt-dlc for how to update. Issues with outdated versions will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Don't forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dlc version **2021.01.08**
- [ ] I've verified that I'm running youtube-dlc version **2021.01.09**
- [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -161,3 +161,19 @@ jobs:
asset_path: ./SHA2-256SUMS
asset_name: SHA2-256SUMS
asset_content_type: text/plain
  update_version_badge:
    runs-on: ubuntu-latest
    needs: build_unix
    steps:
    - name: Create Version Badge
      uses: schneegans/dynamic-badges-action@v1.0.0
      with:
        auth: ${{ secrets.GIST_TOKEN }}
        gistID: c69cb23c3c5b3316248e52022790aa57
        filename: version.json
        label: Version
        message: ${{ needs.build_unix.outputs.ytdlc_version }}
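Shields.io "endpoint" badges fetch a small JSON document and render it; that schema requires `schemaVersion` (always 1), `label` and `message`. The job above asks dynamic-badges-action to write such a file, `version.json`, into a gist. The sketch below builds an equivalent payload by hand. It assumes the action emits these keys, and `2021.01.10` is only a sample value for `ytdlc_version`; `badge_payload` is a hypothetical helper.

```python
import json


def badge_payload(label, message, color=None):
    # Minimal Shields.io endpoint document; color is optional because the
    # README can also pass it as a query parameter (&color=brightgreen)
    payload = {'schemaVersion': 1, 'label': label, 'message': message}
    if color:
        payload['color'] = color
    return json.dumps(payload)


doc = badge_payload('Version', '2021.01.10')
print(doc)  # {"schemaVersion": 1, "label": "Version", "message": "2021.01.10"}
```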


@@ -7,10 +7,9 @@ jobs:
strategy:
fail-fast: true
matrix:
os: [ubuntu-latest]
os: [ubuntu-18.04]
# TODO: python 2.6
# 3.3, 3.4 are not running
python-version: [2.7, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
python-version: [2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
python-impl: [cpython]
ytdl-test-set: [core, download]
run-tests-ext: [sh]
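GitHub Actions starts one job per combination of the `matrix` values, so the length of the `python-version` list directly multiplies the number of test runs. The sketch below counts the combinations using the `python-version` line that includes 3.3 and 3.4, assuming it is paired with `ubuntu-18.04`; any `include`/`exclude` keys outside the hunk shown are ignored.

```python
from itertools import product

# Values copied from the matrix hunk; include/exclude entries are not modelled
matrix = {
    'os': ['ubuntu-18.04'],
    'python-version': ['2.7', '3.3', '3.4', '3.5', '3.6', '3.7', '3.8', '3.9',
                       'pypy-2.7', 'pypy-3.6', 'pypy-3.7'],
    'python-impl': ['cpython'],
    'ytdl-test-set': ['core', 'download'],
    'run-tests-ext': ['sh'],
}

keys = list(matrix)
# Cartesian product of all axes, one dict per job
jobs = [dict(zip(keys, combo)) for combo in product(*matrix.values())]
print(len(jobs))  # 22
```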


@@ -5,4 +5,10 @@ nixxo
GreyAlien502
kyuyeunk
siikamiika
jbruchon
jbruchon
alexmerkel
glenn-slayden
Unrud
wporr
mariuszskon
ohnonot

5294
ChangeLog

File diff suppressed because it is too large


@@ -1,5 +1,34 @@
# Changelog
<!--
# Instructions for creating a release
* Run `make doc`
* Update Changelog.md and Authors-Fork
* Commit to master as `Release <version>`
* Push to origin/release - build task will now run
* Update version.py and run `make issuetemplates`
* Commit to master as `[version] update`
* Push to origin/master
-->
### 2021.01.10
* [archive.org] Fix extractor and add support for audio and playlists by @wporr
* [Animelab] Added by @mariuszskon
* [youtube:search] Fix view_count by @ohnonot
* [youtube] Show if video is embeddable in info
* Update version badge automatically in README
* Enable `test_youtube_search_matching`
* Create `to_screen` and similar functions in postprocessor/common
### 2021.01.09
* [youtube] Fix bug in automatic caption extraction
* Add `post_hooks` to YoutubeDL by @alexmerkel
* Batch file enumeration improvements by @glenn-slayden
* Stop immediately when reaching `--max-downloads` by @glenn-slayden
* Fix incorrect ANSI sequence for restoring console-window title by @glenn-slayden
* Kill child processes when yt-dlc is killed by @Unrud
### 2021.01.08
* **Merge youtube-dl:** Up to [2021.01.08](https://github.com/ytdl-org/youtube-dl/commit/bf6a74c620bd4d5726503c5302906bb36b009026)


@@ -10,7 +10,8 @@ PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin
MANDIR ?= $(PREFIX)/man
SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
# make_supportedsites.py does not work correctly in Python 2
PYTHON ?= /usr/bin/env python3
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
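The `SYSCONFDIR` line chooses `/etc` when `PREFIX` is one of the two system-wide prefixes and `$(PREFIX)/etc` otherwise. A Python translation of that shell test, for illustration only (`sysconfdir` is a hypothetical name), makes the rule explicit:

```python
def sysconfdir(prefix):
    # Same exact string comparison as `[ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]`
    if prefix in ('/usr', '/usr/local'):
        return '/etc'
    return prefix + '/etc'


for prefix in ('/usr', '/usr/local', '/opt/yt-dlc'):
    print(prefix, '->', sysconfdir(prefix))
```

Like the shell version, this compares strings literally, so a prefix written with a trailing slash such as `/usr/` does not count as a system prefix.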


@@ -1,4 +1,5 @@
[![Release Version](https://img.shields.io/badge/Release-2021.01.09-brightgreen)](https://github.com/pukkandan/yt-dlc/releases/latest)
<!-- See: https://github.com/marketplace/actions/dynamic-badges -->
[![Release Version](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/pukkandan/c69cb23c3c5b3316248e52022790aa57/raw/version.json&color=brightgreen)](https://github.com/pukkandan/yt-dlc/releases/latest)
[![License: Unlicense](https://img.shields.io/badge/License-Unlicense-blue.svg)](https://github.com/pukkandan/yt-dlc/blob/master/LICENSE)
[![Core Status](https://github.com/pukkandan/yt-dlc/workflows/Core%20Test/badge.svg?branch=master)](https://github.com/pukkandan/yt-dlc/actions?query=workflow%3ACore)
[![CI Status](https://github.com/pukkandan/yt-dlc/workflows/Full%20Test/badge.svg?branch=master)](https://github.com/pukkandan/yt-dlc/actions?query=workflow%3AFull)
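The new README badge points Shields.io at the raw gist file written by the workflow, instead of hard-coding the version in a static badge. The sketch below assembles such an endpoint URL from its parts; `endpoint_badge_url` is a hypothetical helper, and unlike the README it percent-encodes the `url` parameter, which decodes back to the same address.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def endpoint_badge_url(user, gist_id, filename, color=None):
    # Raw URL of the JSON file stored in the gist
    raw = 'https://gist.githubusercontent.com/%s/%s/raw/%s' % (user, gist_id, filename)
    params = {'url': raw}
    if color:
        params['color'] = color
    return 'https://img.shields.io/endpoint?' + urlencode(params)


url = endpoint_badge_url('pukkandan', 'c69cb23c3c5b3316248e52022790aa57',
                         'version.json', color='brightgreen')
print(url)
# Decoding the query recovers the raw gist URL
print(parse_qs(urlparse(url).query)['url'][0])
```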


@@ -48,6 +48,8 @@
- **AMCNetworks**
- **AmericasTestKitchen**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeLab**
- **AnimeLabShows**
- **AnimeOnDemand**
- **Anvato**
- **aol.com**
@@ -58,7 +60,7 @@
- **ApplePodcasts**
- **appletrailers**
- **appletrailers:section**
- **archive.org**: archive.org videos
- **archive.org**: archive.org video and audio
- **ArcPublishing**
- **ARD**
- **ARD:mediathek**


@@ -69,9 +69,9 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:tab'])
# def test_youtube_search_matching(self):
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
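The re-enabled `test_youtube_search_matching` checks that both results-page URLs, including one where `search_query` is not the first parameter, are routed to `youtube:search_url`. The sketch below expresses that matching rule with `urllib.parse`. It is a hypothetical alternative for illustration, not the extractor's actual `_VALID_URL` regex.

```python
from urllib.parse import urlparse, parse_qs


def looks_like_search_url(url):
    # A youtube.com /results page that carries a search_query parameter
    # anywhere in its query string
    parsed = urlparse(url)
    return (parsed.netloc.lower() in ('youtube.com', 'www.youtube.com')
            and parsed.path == '/results'
            and 'search_query' in parse_qs(parsed.query))


urls = [
    'http://www.youtube.com/results?search_query=making+mustard',
    'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
    'https://www.youtube.com/feed/subscriptions',
]
for u in urls:
    print(looks_like_search_url(u), u)
```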


@@ -0,0 +1,285 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
int_or_none,
str_or_none,
determine_ext,
)
from ..compat import compat_HTTPError
class AnimeLabBaseIE(InfoExtractor):
_LOGIN_REQUIRED = True
_LOGIN_URL = 'https://www.animelab.com/login'
_NETRC_MACHINE = 'animelab'
def _login(self):
def is_logged_in(login_webpage):
return 'Sign In' not in login_webpage
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
# Check if already logged in
if is_logged_in(login_page):
return
(username, password) = self._get_login_info()
if username is None and self._LOGIN_REQUIRED:
self.raise_login_required('Login is required to access any AnimeLab content')
login_form = {
'email': username,
'password': password,
}
try:
response = self._download_webpage(
self._LOGIN_URL, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError('Unable to log in (wrong credentials?)', expected=True)
else:
raise
# if login was successful
if is_logged_in(response):
return
raise ExtractorError('Unable to login (cannot verify if logged in)')
def _real_initialize(self):
self._login()
class AnimeLabIE(AnimeLabBaseIE):
_VALID_URL = r'https?://(?:www\.)?animelab\.com/player/(?P<id>[^/]+)'
# the following tests require authentication, but a free account will suffice
# just set 'usenetrc' to true in test/local_parameters.json if you use a .netrc file
# or you can set 'username' and 'password' there
# the tests also select a specific format so that the same video is downloaded
# regardless of whether the user is premium or not (needs testing on a premium account)
_TEST = {
'url': 'https://www.animelab.com/player/fullmetal-alchemist-brotherhood-episode-42',
'md5': '05bde4b91a5d1ff46ef5b94df05b0f7f',
'info_dict': {
'id': '383',
'ext': 'mp4',
'display_id': 'fullmetal-alchemist-brotherhood-episode-42',
'title': 'Fullmetal Alchemist: Brotherhood - Episode 42 - Signs of a Counteroffensive',
'description': 'md5:103eb61dd0a56d3dfc5dbf748e5e83f4',
'series': 'Fullmetal Alchemist: Brotherhood',
'episode': 'Signs of a Counteroffensive',
'episode_number': 42,
'duration': 1469,
'season': 'Season 1',
'season_number': 1,
'season_id': '38',
},
'params': {
'format': '[format_id=21711_yeshardsubbed_ja-JP][height=480]',
},
'skip': 'All AnimeLab content requires authentication',
}
def _real_extract(self, url):
display_id = self._match_id(url)
# unfortunately we can get different URLs for the same formats
# e.g. if we are using a "free" account so no dubs available
# (so _remove_duplicate_formats is not effective)
# so we use a dictionary as a workaround
formats = {}
for language_option_url in ('https://www.animelab.com/player/%s/subtitles',
'https://www.animelab.com/player/%s/dubbed'):
actual_url = language_option_url % display_id
webpage = self._download_webpage(actual_url, display_id, 'Downloading URL ' + actual_url)
video_collection = self._parse_json(self._search_regex(r'new\s+?AnimeLabApp\.VideoCollection\s*?\((.*?)\);', webpage, 'AnimeLab VideoCollection'), display_id)
position = int_or_none(self._search_regex(r'playlistPosition\s*?=\s*?(\d+)', webpage, 'Playlist Position'))
raw_data = video_collection[position]['videoEntry']
video_id = str_or_none(raw_data['id'])
# create a title from many sources (while grabbing other info)
# TODO use more fallback sources to get some of these
series = raw_data.get('showTitle')
video_type = raw_data.get('videoEntryType', {}).get('name')
episode_number = raw_data.get('episodeNumber')
episode_name = raw_data.get('name')
title_parts = (series, video_type, episode_number, episode_name)
if None not in title_parts:
title = '%s - %s %s - %s' % title_parts
else:
title = episode_name
description = raw_data.get('synopsis') or self._og_search_description(webpage, default=None)
duration = int_or_none(raw_data.get('duration'))
thumbnail_data = raw_data.get('images', [])
thumbnails = []
for thumbnail in thumbnail_data:
for instance in thumbnail['imageInstances']:
image_data = instance.get('imageInfo', {})
thumbnails.append({
'id': str_or_none(image_data.get('id')),
'url': image_data.get('fullPath'),
'width': image_data.get('width'),
'height': image_data.get('height'),
})
season_data = raw_data.get('season', {}) or {}
season = str_or_none(season_data.get('name'))
season_number = int_or_none(season_data.get('seasonNumber'))
season_id = str_or_none(season_data.get('id'))
for video_data in raw_data['videoList']:
current_video_list = {}
current_video_list['language'] = video_data.get('language', {}).get('languageCode')
is_hardsubbed = video_data.get('hardSubbed')
for video_instance in video_data['videoInstances']:
httpurl = video_instance.get('httpUrl')
url = httpurl if httpurl else video_instance.get('rtmpUrl')
if url is None:
# this video format is unavailable to the user (not premium etc.)
continue
current_format = current_video_list.copy()
format_id_parts = []
format_id_parts.append(str_or_none(video_instance.get('id')))
if is_hardsubbed is not None:
if is_hardsubbed:
format_id_parts.append('yeshardsubbed')
else:
format_id_parts.append('nothardsubbed')
format_id_parts.append(current_format['language'])
format_id = '_'.join([x for x in format_id_parts if x is not None])
ext = determine_ext(url)
if ext == 'm3u8':
for format_ in self._extract_m3u8_formats(
url, video_id, m3u8_id=format_id, fatal=False):
formats[format_['format_id']] = format_
continue
elif ext == 'mpd':
for format_ in self._extract_mpd_formats(
url, video_id, mpd_id=format_id, fatal=False):
formats[format_['format_id']] = format_
continue
current_format['url'] = url
quality_data = video_instance.get('videoQuality')
if quality_data:
quality = quality_data.get('name') or quality_data.get('description')
else:
quality = None
height = None
if quality:
height = int_or_none(self._search_regex(r'(\d+)p?$', quality, 'Video format height', default=None))
if height is None:
self.report_warning('Could not get height of video')
else:
current_format['height'] = height
current_format['format_id'] = format_id
formats[current_format['format_id']] = current_format
formats = list(formats.values())
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'series': series,
'episode': episode_name,
'episode_number': int_or_none(episode_number),
'thumbnails': thumbnails,
'duration': duration,
'formats': formats,
'season': season,
'season_number': season_number,
'season_id': season_id,
}
class AnimeLabShowsIE(AnimeLabBaseIE):
_VALID_URL = r'https?://(?:www\.)?animelab\.com/shows/(?P<id>[^/]+)'
_TEST = {
'url': 'https://www.animelab.com/shows/attack-on-titan',
'info_dict': {
'id': '45',
'title': 'Attack on Titan',
'description': 'md5:989d95a2677e9309368d5cf39ba91469',
},
'playlist_count': 59,
'skip': 'All AnimeLab content requires authentication',
}
def _real_extract(self, url):
_BASE_URL = 'http://www.animelab.com'
_SHOWS_API_URL = '/api/videoentries/show/videos/'
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id, 'Downloading requested URL')
show_data_str = self._search_regex(r'({"id":.*}),\svideoEntry', webpage, 'AnimeLab show data')
show_data = self._parse_json(show_data_str, display_id)
show_id = str_or_none(show_data.get('id'))
title = show_data.get('name')
description = show_data.get('shortSynopsis') or show_data.get('longSynopsis')
entries = []
for season in show_data['seasons']:
season_id = season['id']
get_data = urlencode_postdata({
'seasonId': season_id,
'limit': 1000,
})
# despite using urlencode_postdata, we are sending a GET request
target_url = _BASE_URL + _SHOWS_API_URL + show_id + "?" + get_data.decode('utf-8')
response = self._download_webpage(
target_url,
None, 'Season id %s' % season_id)
season_data = self._parse_json(response, display_id)
for video_data in season_data['list']:
entries.append(self.url_result(
_BASE_URL + '/player/' + video_data['slug'], 'AnimeLab',
str_or_none(video_data.get('id')), video_data.get('name')
))
return {
'_type': 'playlist',
'id': show_id,
'title': title,
'description': description,
'entries': entries,
}
# TODO implement myqueue
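Two details of the AnimeLab extractor are easy to miss: format IDs are built as `<instance id>_<yeshardsubbed|nothardsubbed>_<language>` with missing parts skipped, and formats from the subtitled and dubbed pages are merged in a dict keyed by `format_id` so duplicates collapse. The sketch below reproduces both as standalone functions, plus a season query URL built with `urlencode` directly rather than via `urlencode_postdata(...).decode()`. The function names are hypothetical, and the merge keeps the last duplicate seen, as the extractor's dict assignment does.

```python
from urllib.parse import urlencode


def build_format_id(instance_id, is_hardsubbed, language):
    # <id>_<yeshardsubbed|nothardsubbed>_<language>, dropping missing parts
    parts = [instance_id]
    if is_hardsubbed is not None:
        parts.append('yeshardsubbed' if is_hardsubbed else 'nothardsubbed')
    parts.append(language)
    return '_'.join(p for p in parts if p is not None)


def merge_formats(*format_lists):
    # Later entries with the same format_id overwrite earlier ones
    formats = {}
    for format_list in format_lists:
        for f in format_list:
            formats[f['format_id']] = f
    return list(formats.values())


def season_url(base, show_id, season_id, limit=1000):
    # Plain GET query string for the show/season listing
    query = urlencode({'seasonId': season_id, 'limit': limit})
    return '%s/api/videoentries/show/videos/%s?%s' % (base, show_id, query)


subtitled = [{'format_id': build_format_id('21711', True, 'ja-JP'), 'url': 'a'}]
dubbed = [{'format_id': build_format_id('21711', True, 'ja-JP'), 'url': 'b'},
          {'format_id': build_format_id('21712', False, 'en-US'), 'url': 'c'}]
print(merge_formats(subtitled, dubbed))
print(season_url('http://www.animelab.com', '45', 38))
```

The first format ID produced here, `21711_yeshardsubbed_ja-JP`, has the same shape as the one the extractor's test selects.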


@@ -1,27 +1,43 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
KNOWN_EXTENSIONS,
extract_attributes,
unified_strdate,
unified_timestamp,
clean_html,
dict_get,
parse_duration,
int_or_none,
str_or_none,
merge_dicts,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
IE_DESC = 'archive.org video and audio'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^?#]+)(?:[?].*)?$'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
'info_dict': {
'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'ext': 'ogg',
'ext': 'ogv',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
}
'release_date': '19681210',
'timestamp': 1268695290,
'upload_date': '20100315',
'creator': 'SRI International',
'uploader': 'laura@archive.org',
},
}, {
'url': 'https://archive.org/details/Cops1922',
'md5': '0869000b4ce265e8ca62738b336b268a',
@@ -29,37 +45,199 @@ class ArchiveOrgIE(InfoExtractor):
'id': 'Cops1922',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:89e7c77bf5d965dd5c0372cfb49470f6',
}
'description': 'md5:43a603fd6c5b4b90d12a96b921212b9c',
'uploader': 'yorkmba99@hotmail.com',
'timestamp': 1387699629,
'upload_date': "20131222",
},
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}, {
'url': 'https://archive.org/details/Election_Ads',
'md5': '284180e857160cf866358700bab668a3',
'info_dict': {
'id': 'Election_Ads/Commercial-JFK1960ElectionAdCampaignJingle.mpg',
'title': 'Commercial-JFK1960ElectionAdCampaignJingle.mpg',
'ext': 'mp4',
},
}, {
'url': 'https://archive.org/details/Election_Ads/Commercial-Nixon1960ElectionAdToughonDefense.mpg',
'md5': '7915213ef02559b5501fe630e1a53f59',
'info_dict': {
'id': 'Election_Ads/Commercial-Nixon1960ElectionAdToughonDefense.mpg',
'title': 'Commercial-Nixon1960ElectionAdToughonDefense.mpg',
'ext': 'mp4',
'timestamp': 1205588045,
'uploader': 'mikedavisstripmaster@yahoo.com',
'description': '1960 Presidential Campaign Election Commercials John F Kennedy, Richard M Nixon',
'upload_date': '20080315',
},
}, {
'url': 'https://archive.org/details/gd1977-05-08.shure57.stevenson.29303.flac16',
'md5': '7d07ffb42aba6537c28e053efa4b54c9',
'info_dict': {
'id': 'gd1977-05-08.shure57.stevenson.29303.flac16/gd1977-05-08d01t01.flac',
'title': 'Turning',
'ext': 'flac',
},
}, {
'url': 'https://archive.org/details/gd1977-05-08.shure57.stevenson.29303.flac16/gd1977-05-08d01t07.flac',
'md5': 'a07cd8c6ab4ee1560f8a0021717130f3',
'info_dict': {
'id': 'gd1977-05-08.shure57.stevenson.29303.flac16/gd1977-05-08d01t07.flac',
'title': 'Deal',
'ext': 'flac',
'timestamp': 1205895624,
'uploader': 'mvernon54@yahoo.com',
'description': 'md5:6a31f1996db0aa0fc9da6d6e708a1bb0',
'upload_date': '20080319',
'location': 'Barton Hall - Cornell University',
},
}, {
'url': 'https://archive.org/details/lp_the-music-of-russia_various-artists-a-askaryan-alexander-melik',
'md5': '7cb019baa9b332e82ea7c10403acd180',
'info_dict': {
'id': 'lp_the-music-of-russia_various-artists-a-askaryan-alexander-melik/disc1/01.01. Bells Of Rostov.mp3',
'title': 'Bells Of Rostov',
'ext': 'mp3',
},
}, {
'url': 'https://archive.org/details/lp_the-music-of-russia_various-artists-a-askaryan-alexander-melik/disc1/02.02.+Song+And+Chorus+In+The+Polovetsian+Camp+From+%22Prince+Igor%22+(Act+2%2C+Scene+1).mp3',
'md5': '1d0aabe03edca83ca58d9ed3b493a3c3',
'info_dict': {
'id': 'lp_the-music-of-russia_various-artists-a-askaryan-alexander-melik/disc1/02.02. Song And Chorus In The Polovetsian Camp From "Prince Igor" (Act 2, Scene 1).mp3',
'title': 'Song And Chorus In The Polovetsian Camp From "Prince Igor" (Act 2, Scene 1)',
'ext': 'mp3',
'timestamp': 1569662587,
'uploader': 'associate-joygen-odiongan@archive.org',
'description': 'md5:012b2d668ae753be36896f343d12a236',
'upload_date': '20190928',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
@staticmethod
def _playlist_data(webpage):
element = re.findall(r'''(?xs)
<input
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s+class=['"]?js-play8-playlist['"]?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s*/>
''', webpage)[0]
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
return json.loads(extract_attributes(element)['value'])
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote_plus(self._match_id(url))
identifier, entry_id = (video_id.split('/', 1) + [None])[:2]
# Archive.org metadata API doesn't clearly demarcate playlist entries
# or subtitle tracks, so we get them from the embeddable player.
embed_page = self._download_webpage(
'https://archive.org/embed/' + identifier, identifier)
playlist = self._playlist_data(embed_page)
entries = {}
for p in playlist:
# If the user specified a playlist entry in the URL, ignore the
# rest of the playlist.
if entry_id and p['orig'] != entry_id:
continue
entries[p['orig']] = {
'formats': [],
'thumbnails': [],
'artist': p.get('artist'),
'track': p.get('title'),
'subtitles': {}}
for track in p.get('tracks', []):
if track['kind'] != 'subtitles':
continue
entries[p['orig']][track['label']] = {
'url': 'https://archive.org/' + track['file'].lstrip('/')}
metadata = self._download_json(
'http://archive.org/details/' + video_id, video_id, query={
'output': 'json',
})['metadata']
info.update({
'title': get_optional(metadata, 'title') or info.get('title'),
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
})
'http://archive.org/metadata/' + identifier, identifier)
m = metadata['metadata']
identifier = m['identifier']
info = {
'id': identifier,
'title': m['title'],
'description': clean_html(m.get('description')),
'uploader': dict_get(m, ['uploader', 'adder']),
'creator': m.get('creator'),
'license': m.get('licenseurl'),
'release_date': unified_strdate(m.get('date')),
'timestamp': unified_timestamp(dict_get(m, ['publicdate', 'addeddate'])),
'webpage_url': 'https://archive.org/details/' + identifier,
'location': m.get('venue'),
'release_year': int_or_none(m.get('year'))}
for f in metadata['files']:
if f['name'] in entries:
entries[f['name']] = merge_dicts(entries[f['name']], {
'id': identifier + '/' + f['name'],
'title': f.get('title') or f['name'],
'display_id': f['name'],
'description': clean_html(f.get('description')),
'creator': f.get('creator'),
'duration': parse_duration(f.get('length')),
'track_number': int_or_none(f.get('track')),
'album': f.get('album'),
'discnumber': int_or_none(f.get('disc')),
'release_year': int_or_none(f.get('year'))})
entry = entries[f['name']]
elif f.get('original') in entries:
entry = entries[f['original']]
else:
continue
if f.get('format') == 'Thumbnail':
entry['thumbnails'].append({
'id': f['name'],
'url': 'https://archive.org/download/' + identifier + '/' + f['name'],
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('width')),
'filesize': int_or_none(f.get('size'))})
extension = (f['name'].rsplit('.', 1) + [None])[1]
if extension in KNOWN_EXTENSIONS:
entry['formats'].append({
'url': 'https://archive.org/download/' + identifier + '/' + f['name'],
'format': f.get('format'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'filesize': int_or_none(f.get('size')),
'protocol': 'https'})
# Sort available formats by filesize
for entry in entries.values():
entry['formats'] = list(sorted(entry['formats'], key=lambda x: x.get('filesize', -1)))
if len(entries) == 1:
# If there's only one item, use it as the main info dict
only_video = entries[list(entries.keys())[0]]
if entry_id:
info = merge_dicts(only_video, info)
else:
info = merge_dicts(info, only_video)
else:
# Otherwise, we have a playlist.
info['_type'] = 'playlist'
info['entries'] = list(entries.values())
if metadata.get('reviews'):
info['comments'] = []
for review in metadata['reviews']:
info['comments'].append({
'id': review.get('review_id'),
'author': review.get('reviewer'),
'text': str_or_none(review.get('reviewtitle'), '') + '\n\n' + (review.get('reviewbody') or ''),
'timestamp': unified_timestamp(review.get('createdate')),
'parent': 'root'})
return info
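A sort key of the form `x.get('filesize', -1)` only falls back to `-1` when the key is absent. Because the format dicts here store `int_or_none(f.get('size'))`, the key can be present with the value `None`, and Python 3 raises `TypeError` when it compares `None` with an integer. A minimal standalone sketch of a None-safe key, using made-up format dicts shaped like the ones built above:

```python
# Made-up format dicts: 'filesize' comes from int_or_none(), so the key
# can be present with an explicit None value.
formats = [
    {'url': 'a.mp3', 'filesize': 3000},
    {'url': 'b.ogg', 'filesize': None},
    {'url': 'c.flac', 'filesize': 1000},
]


def sort_by_filesize(formats):
    # x.get('filesize', -1) would return None (not -1) for 'b.ogg', and
    # comparing None with an int raises TypeError on Python 3.
    # `or -1` maps both a missing key and an explicit None to -1.
    return sorted(formats, key=lambda x: x.get('filesize') or -1)


print([f['url'] for f in sort_by_filesize(formats)])  # ['b.ogg', 'c.flac', 'a.mp3']
```

`or -1` also maps a genuine size of `0` to `-1`; both still sort ahead of every real size, so the resulting order is the same.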

@@ -46,6 +46,10 @@ from .alura import (
AluraCourseIE
)
from .amcnetworks import AMCNetworksIE
from .animelab import (
AnimeLabIE,
AnimeLabShowsIE,
)
from .americastestkitchen import AmericasTestKitchenIE
from .animeondemand import AnimeOnDemandIE
from .anvato import AnvatoIE

@@ -1817,6 +1817,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if not isinstance(video_info, dict):
video_info = {}
playable_in_embed = try_get(
player_response, lambda x: x['playabilityStatus']['playableInEmbed'])
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
@@ -2538,6 +2541,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'release_date': release_date,
'release_year': release_year,
'subscriber_count': subscriber_count,
'playable_in_embed': playable_in_embed,
}
@@ -3620,8 +3624,8 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubeBaseInfoExtractor):
description = try_get(video, lambda x: x['descriptionSnippet']['runs'][0]['text'], compat_str)
duration = parse_duration(try_get(video, lambda x: x['lengthText']['simpleText'], compat_str))
view_count_text = try_get(video, lambda x: x['viewCountText']['simpleText'], compat_str) or ''
view_count = int_or_none(self._search_regex(
r'^(\d+)', re.sub(r'\s', '', view_count_text),
view_count = str_to_int(self._search_regex(
r'^([\d,]+)', re.sub(r'\s', '', view_count_text),
'view count', default=None))
uploader = try_get(video, lambda x: x['ownerText']['runs'][0]['text'], compat_str)
total += 1
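The view-count change matters for strings that contain thousands separators. On text such as `1,234,567 views` (an illustrative value), `^(\d+)` stops at the first comma and yields 1, whereas `^([\d,]+)` captures the whole number before it is converted. The sketch below compares the two patterns; `str_to_int_sketch` is a hypothetical, simplified stand-in for the `str_to_int` helper used in the diff, not its actual implementation:

```python
import re


def str_to_int_sketch(int_str):
    # Hypothetical simplified stand-in for str_to_int: drop ',', '.' and '+'
    # separators, then convert. Passes None through unchanged.
    if int_str is None:
        return None
    return int(re.sub(r'[,.+]', '', int_str))


def parse_view_count(view_count_text):
    # Mirror the extractor: remove all whitespace before matching.
    compact = re.sub(r'\s', '', view_count_text)
    old = re.search(r'^(\d+)', compact)      # pattern before the fix
    new = re.search(r'^([\d,]+)', compact)   # pattern after the fix
    old_count = int(old.group(1)) if old else None
    new_count = str_to_int_sketch(new.group(1)) if new else None
    return old_count, new_count


print(parse_view_count('1,234,567 views'))  # (1, 1234567)
```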

@@ -37,7 +37,25 @@ class PostProcessor(object):
self.PP_NAME = self.__class__.__name__[:-2]
def to_screen(self, text, *args, **kwargs):
return self._downloader.to_screen('[%s] %s' % (self.PP_NAME, text), *args, **kwargs)
if self._downloader:
return self._downloader.to_screen('[%s] %s' % (self.PP_NAME, text), *args, **kwargs)
def report_warning(self, text, *args, **kwargs):
if self._downloader:
return self._downloader.report_warning(text, *args, **kwargs)
def report_error(self, text, *args, **kwargs):
if self._downloader:
return self._downloader.report_error(text, *args, **kwargs)
def write_debug(self, text, *args, **kwargs):
if self.get_param('verbose', False):
return self._downloader.to_screen('[debug] %s' % text, *args, **kwargs)
def get_param(self, name, default=None, *args, **kwargs):
if self._downloader:
return self._downloader.params.get(name, default, *args, **kwargs)
return default
def set_downloader(self, downloader):
"""Sets the downloader for this PP."""
@@ -64,10 +82,10 @@ class PostProcessor(object):
try:
os.utime(encodeFilename(path), (atime, mtime))
except Exception:
self._downloader.report_warning(errnote)
self.report_warning(errnote)
def _configuration_args(self, default=[]):
args = self._downloader.params.get('postprocessor_args', {})
args = self.get_param('postprocessor_args', {})
if isinstance(args, list): # for backward compatibility
args = {'default': args, 'sponskrub': []}
return cli_configuration_args(args, self.PP_NAME.lower(), args.get('default', []))
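The new `PostProcessor` helpers share one pattern: check `self._downloader` before delegating, so a post-processor constructed without a downloader skips output instead of raising `AttributeError`. A minimal sketch of that pattern, using a hypothetical `FakeDownloader` in place of the real `YoutubeDL` object and a stripped-down `SketchPP` class (neither is part of the codebase):

```python
class FakeDownloader(object):
    # Hypothetical stand-in for YoutubeDL: records messages instead of printing.
    def __init__(self, params=None):
        self.params = params or {}
        self.messages = []

    def to_screen(self, text):
        self.messages.append(text)

    def report_warning(self, text):
        self.messages.append('WARNING: ' + text)


class SketchPP(object):
    # Stripped-down illustration of the guarded helpers, not the real class.
    PP_NAME = 'Sketch'

    def __init__(self, downloader=None):
        self._downloader = downloader

    def get_param(self, name, default=None):
        if self._downloader:
            return self._downloader.params.get(name, default)
        return default

    def to_screen(self, text):
        if self._downloader:
            return self._downloader.to_screen('[%s] %s' % (self.PP_NAME, text))

    def report_warning(self, text):
        if self._downloader:
            return self._downloader.report_warning(text)

    def write_debug(self, text):
        # get_param() returns the default (False) without a downloader,
        # so the attribute access below is only reached when one exists.
        if self.get_param('verbose', False):
            return self._downloader.to_screen('[debug] %s' % text)


dl = FakeDownloader({'verbose': True})
pp = SketchPP(dl)
pp.to_screen('hello')
pp.write_debug('cmd')
pp.report_warning('careful')
print(dl.messages)  # ['[Sketch] hello', '[debug] cmd', 'WARNING: careful']
SketchPP().to_screen('ignored')  # no downloader: does nothing
```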

@@ -41,8 +41,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
thumbnail_filename = info['thumbnails'][-1]['filename']
if not os.path.exists(encodeFilename(thumbnail_filename)):
self._downloader.report_warning(
'Skipping embedding the thumbnail because the file is missing.')
self.report_warning('Skipping embedding the thumbnail because the file is missing.')
return [], info
def is_webp(path):
@@ -125,8 +124,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
self.to_screen('Adding thumbnail to "%s"' % filename)
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] AtomicParsley command line: %s' % shell_quote(cmd))
self.write_debug('AtomicParsley command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process_communicate_or_kill(p)
@@ -140,7 +138,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
# for formats that don't support thumbnails (like 3gp) AtomicParsley
# won't create the temporary file
if b'No changes' in stdout:
self._downloader.report_warning('The file format doesn\'t support embedding a thumbnail')
self.report_warning('The file format doesn\'t support embedding a thumbnail')
else:
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))

@@ -68,8 +68,7 @@ class FFmpegPostProcessor(PostProcessor):
self._versions[self.basename], required_version):
warning = 'Your copy of %s is outdated, update %s to version %s or newer if you encounter any errors.' % (
self.basename, self.basename, required_version)
if self._downloader:
self._downloader.report_warning(warning)
self.report_warning(warning)
@staticmethod
def get_versions(downloader=None):
@@ -99,11 +98,11 @@ class FFmpegPostProcessor(PostProcessor):
self._paths = None
self._versions = None
if self._downloader:
prefer_ffmpeg = self._downloader.params.get('prefer_ffmpeg', True)
location = self._downloader.params.get('ffmpeg_location')
prefer_ffmpeg = self.get_param('prefer_ffmpeg', True)
location = self.get_param('ffmpeg_location')
if location is not None:
if not os.path.exists(location):
self._downloader.report_warning(
self.report_warning(
'ffmpeg-location %s does not exist! '
'Continuing without avconv/ffmpeg.' % (location))
self._versions = {}
@@ -111,7 +110,7 @@ class FFmpegPostProcessor(PostProcessor):
elif not os.path.isdir(location):
basename = os.path.splitext(os.path.basename(location))[0]
if basename not in programs:
self._downloader.report_warning(
self.report_warning(
'Cannot identify executable %s, its basename should be one of %s. '
'Continuing without avconv/ffmpeg.' %
(location, ', '.join(programs)))
@@ -177,9 +176,7 @@ class FFmpegPostProcessor(PostProcessor):
encodeFilename(self.executable, True),
encodeArgument('-i')]
cmd.append(encodeFilename(self._ffmpeg_filename_argument(path), True))
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(
'[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
self.write_debug('%s command line: %s' % (self.basename, shell_quote(cmd)))
handle = subprocess.Popen(
cmd, stderr=subprocess.PIPE,
stdout=subprocess.PIPE, stdin=subprocess.PIPE)
@@ -228,8 +225,7 @@ class FFmpegPostProcessor(PostProcessor):
+ [encodeArgument(o) for o in opts]
+ [encodeFilename(self._ffmpeg_filename_argument(out_path), True)])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] ffmpeg command line: %s' % shell_quote(cmd))
self.write_debug('ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
stdout, stderr = process_communicate_or_kill(p)
if p.returncode != 0:
@@ -566,8 +562,7 @@ class FFmpegMergerPP(FFmpegPostProcessor):
'youtube-dlc will download single file media. '
'Update %s to version %s or newer to fix this.') % (
self.basename, self.basename, required_version)
if self._downloader:
self._downloader.report_warning(warning)
self.report_warning(warning)
return False
return True
@@ -656,7 +651,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
new_file = subtitles_filename(filename, lang, new_ext, info.get('ext'))
if ext in ('dfxp', 'ttml', 'tt'):
self._downloader.report_warning(
self.report_warning(
'You have requested to convert dfxp (TTML) subtitles into another format, '
'which results in style information loss')

@@ -46,16 +46,16 @@ class SponSkrubPP(PostProcessor):
self.to_screen('Skipping sponskrub since it is not a YouTube video')
return [], information
if self.cutout and not self.force and not information.get('__real_download', False):
self._downloader.to_screen(
'[sponskrub] Skipping sponskrub since the video was already downloaded. '
self.report_warning(
'Skipping sponskrub since the video was already downloaded. '
'Use --sponskrub-force to run sponskrub anyway')
return [], information
self.to_screen('Trying to %s sponsor sections' % ('remove' if self.cutout else 'mark'))
if self.cutout:
self._downloader.to_screen('WARNING: Cutting out sponsor segments will cause the subtitles to go out of sync.')
self.report_warning('Cutting out sponsor segments will cause the subtitles to go out of sync.')
if not information.get('__real_download', False):
self._downloader.to_screen('WARNING: If sponskrub is run multiple times, unintended parts of the video could be cut out.')
self.report_warning('If sponskrub is run multiple times, unintended parts of the video could be cut out.')
filename = information['filepath']
temp_filename = filename + '.' + self._temp_ext + os.path.splitext(filename)[1]
@@ -68,8 +68,7 @@ class SponSkrubPP(PostProcessor):
cmd += ['--', information['id'], filename, temp_filename]
cmd = [encodeArgument(i) for i in cmd]
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] sponskrub command line: %s' % shell_quote(cmd))
self.write_debug('sponskrub command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
stdout, stderr = p.communicate()

@@ -57,16 +57,16 @@ class XAttrMetadataPP(PostProcessor):
return [], info
except XAttrUnavailableError as e:
self._downloader.report_error(str(e))
self.report_error(str(e))
return [], info
except XAttrMetadataError as e:
if e.reason == 'NO_SPACE':
self._downloader.report_warning(
self.report_warning(
'There\'s no disk space left, disk quota exceeded or filesystem xattr limit exceeded. '
+ (('Some ' if num_written else '') + 'extended attributes are not written.').capitalize())
elif e.reason == 'VALUE_TOO_LONG':
self._downloader.report_warning(
self.report_warning(
'Unable to write extended attributes due to too long values.')
else:
msg = 'This filesystem doesn\'t support extended attributes. '
@@ -74,5 +74,5 @@ class XAttrMetadataPP(PostProcessor):
msg += 'You need to use NTFS.'
else:
msg += '(You may have to enable them in your /etc/fstab)'
self._downloader.report_error(msg)
self.report_error(msg)
return [], info

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.01.08'
__version__ = '2021.01.09'