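Let's get straight to downloading UC videos. The process is not complicated: if you only want the result, skip to the code at the bottom. If you want to learn how the analysis is done, read the whole walkthrough, and leave a comment below if anything is unclear.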
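Prerequisites:
- Basic programming experience
- Able to install a Python environment and its dependencies, and to read and run Python code
- Able to debug pages with Chrome DevTools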
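Step 1: Find the link:
Open UC Browser on your phone and find a video you are interested in. For example, I really like Li Ziqi (李子柒)'s videos, so I go to her profile page.
Then tap the share button in the top-right corner and choose "Copy link" at the bottom right. The link looks like this:
http://a.mp.uc.cn/media.html?mid=5ebdd3d3aada4f3fb5b5a33b3669a0c6&client=ucweb&uc_param_str=frdnsnpfvecpntnwprdsssnikt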
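The `mid` value in this link identifies the author and is reused by the later requests. As a small sketch using only the standard library, it can be pulled out of the query string like this. The helper name `extract_query_param` is made up for illustration and is not part of the original script:

```python
from urllib.parse import urlparse, parse_qs


def extract_query_param(url, name):
    """Return the first value of query parameter `name`, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


share_url = (
    "http://a.mp.uc.cn/media.html?mid=5ebdd3d3aada4f3fb5b5a33b3669a0c6"
    "&client=ucweb&uc_param_str=frdnsnpfvecpntnwprdsssnikt"
)
mid = extract_query_param(share_url, "mid")
print(mid)
```

Returning None for a missing parameter, rather than raising KeyError, makes it easier to report a clear error when a pasted link has an unexpected shape.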

Step 2: Open the video in a desktop browser:
Then open this link in Chrome on your computer; the author's page will load.

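Press F12 to open DevTools, click the first video, and watch which requests are sent and what they return. A great many requests go out, so the next task is to work out what each API does. One of them must return the video title, the cover image, and the video source; with some patience you will find the real video address.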

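Step 3: Find the real video address:
The real video link is easy to find. Open it in the browser to confirm that it is indeed this video. Of course, you could simply right-click the video, choose "Save video as", and be done, but what we want to cover next is how to fetch this video with Python. (The link may expire; a live one can be found in the DevTools network panel.)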

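Step 4: Work backwards through the requests:
Now that we have this link, work backwards to see which requests produced it.
The request needs a token, a ums_id, a wm_cid, a wm_id, and a resolution parameter. The token is the one held by your current browser session; the other parameters are obtained through the following steps.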

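wmAid (the wm_aid query parameter) and wm_id can both be read from the video page URL. Next, use wmAid to call the API below. Its response contains both wm_cid and ums_id; the content_id field is the wm_cid.
https://ff.dayu.com/contents/origin/<wmAid>?biz_id=1002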
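To make the shape of that response concrete, here is a hedged sketch that builds the request URL and reads the two IDs out of a parsed response. The field paths (`data.content_id`, `data.body.videos[0].ums_id`) follow the script later in this article; the function names and the sample values are invented for illustration:

```python
from urllib.parse import quote


def build_origin_api(wm_aid):
    """Build the content-detail URL for one video (wm_aid comes from the page URL)."""
    return "https://ff.dayu.com/contents/origin/" + quote(wm_aid, safe="") + "?biz_id=1002"


def extract_video_ids(video_info):
    """Return (wm_cid, ums_id) from the parsed JSON of the origin API."""
    data = video_info["data"]
    videos = data["body"]["videos"]
    if not videos:
        raise ValueError("response contains no videos")
    # content_id is what the later request calls wm_cid.
    return data["content_id"], videos[0]["ums_id"]


# Made-up sample mirroring only the fields this article relies on.
sample_response = {
    "data": {
        "content_id": "example-content-id",
        "title": "example title",
        "body": {"videos": [{"ums_id": "example-ums-id"}]},
    }
}
print(build_origin_api("ce1b8ba53f9e4ffa8b609cd7955e5cad"))
print(extract_video_ids(sample_response))
```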


Now all the parameters are known. Finally, assemble and request the following URL:
https://mparticle.uc.cn/api/vps?token=<token>&ums_id=<ums_id>&wm_cid=<cid>&wm_id=<wmId>&resolution=high
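Joining the pieces by hand works, but a value containing `&` or `=` would corrupt the query string. One possible alternative, shown here only as a sketch, is to let `urllib.parse.urlencode` do the escaping. The function name `build_vps_url` is an assumption for illustration; the parameter names are the ones listed above:

```python
from urllib.parse import urlencode


def build_vps_url(token, ums_id, wm_cid, wm_id, resolution="high"):
    """Assemble the video-source URL from the five parameters."""
    params = {
        "token": token,
        "ums_id": ums_id,
        "wm_cid": wm_cid,
        "wm_id": wm_id,
        "resolution": resolution,
    }
    # urlencode percent-escapes each value and joins the pairs with '&'.
    return "https://mparticle.uc.cn/api/vps?" + urlencode(params)


print(build_vps_url("TOKEN", "UMS", "CID", "WMID"))
```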
Below is the complete code that obtains the video address from the video page URL:
import json
import os
import urllib.request
from urllib.parse import urlparse, parse_qs

import requests

headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'Cookie':'MUID=0DA4D4BD498D6CFB3AA9D9BE4D8D6820; _SS=SID=00; videoCookiesLastCategory=en-ca=animals; _cb_ls=1; _cb=DsAPZCJmzJ0BiCB2c; _chartbeat2=.1550504320201.1550509083431.11.uAsG96ZMUeDgWcawC2JWWNCmna0Z.1; ANON=A=E43533FD3C93526D33F4F5C4FFFFFFFF&E=164d&W=1; NAP=V=1.9&E=15f3&C=gBm5NGQ6hq9_2JLke_9M_uUX-nhzUVJp3UwliRTchLXC4pE05Iv2DA&W=1; vidvol=10; adoptout={"msaOptOut":0,"adIdOptOut":0}; videoerrorcount=0; trg=0%7C0%7C0; ecasession=v2_9a22cfec7b49fc3893239e7b074a63fd_ba74fcd1-eff2-48d8-b6af-082423d58358-tuct35eea8e_1550666026_1550666984_CNawjgYQqLw-GP6e37rd9J7zBiACKAYwMDjK_QdA_qAQSI7OHlCJxAlYAGAC'
}

def downloadVideo(url, path):
    # Download url to path unless the file already exists.
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)


if __name__ == "__main__":
    initURL = "https://mparticle.uc.cn/video.html?uc_param_str=frdnsnpfvecpntnwprdssskt&wm_aid=ce1b8ba53f9e4ffa8b609cd7955e5cad&wm_id=5ebdd3d3aada4f3fb5b5a33b3669a0c6"

    # Load the video page; the token is read from the vpstoken cookie of the response.
    res = requests.get(initURL, headers=headers)
    res.raise_for_status()
    token = res.cookies.get('vpstoken')
    if token is None:
        raise SystemExit("vpstoken cookie not found in the response")
    print(token)

    # wm_aid and wm_id come from the query string of the page URL.
    query = parse_qs(urlparse(initURL).query)
    wmAid = query['wm_aid'][0]
    wmId = query['wm_id'][0]
    print(wmAid)
    print(wmId)

    # This API returns content_id (used as wm_cid), the title, the cover and ums_id.
    umsIDAPI = "https://ff.dayu.com/contents/origin/" + wmAid + "?biz_id=1002"
    res = requests.get(umsIDAPI, headers=headers)
    res.raise_for_status()
    res.encoding = 'utf8'
    videoInfo = json.loads(res.text)
    cid = videoInfo['data']['content_id']
    title = videoInfo['data']['title']
    coverURL = videoInfo['data']['cover_url']
    ums_id = videoInfo['data']['body']['videos'][0]['ums_id']
    print(title)
    print(coverURL)
    print(ums_id)

    # Ask for the real video address.
    videoAPI = 'https://mparticle.uc.cn/api/vps?token=' + token + '&ums_id=' + ums_id + '&wm_cid=' + cid + '&wm_id=' + wmId + '&resolution=high'
    print(videoAPI)
    res = requests.get(videoAPI, headers=headers)
    res.raise_for_status()
    res.encoding = 'utf8'
    videoURLInfo = json.loads(res.text)
    print(videoURLInfo['data']['url'])
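The script above prints the video address but stops short of saving the file. As a minimal sketch, and not necessarily how the author would do it, the download could be streamed to disk with the standard library so that a large video never has to fit in memory. The name `download_video`, the `.part` temporary suffix, and the chunk size are all assumptions for illustration:

```python
import os
import shutil
import urllib.request


def download_video(url, path, chunk_size=1024 * 1024):
    """Stream `url` into `path`. Returns False if `path` already exists."""
    if os.path.exists(path):
        return False
    tmp_path = path + ".part"
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated file under the final name.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    os.replace(tmp_path, path)
    return True
```

With the variables from the script, a call would look like `download_video(videoURLInfo['data']['url'], title + ".mp4")`.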
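How to batch-download all of an author's videos:
What if you want the links to all of Li Ziqi's videos? That is easy too: request the following URL with the _size and _page parameters and you can fetch her video list page by page.
https://ff.dayu.com/contents/author/5ebdd3d3aada4f3fb5b5a33b3669a0c6?biz_id=1002&_size=8&_page=1&_order_type=published_at&status=1&_fetch=1&uc_param_str=frdnsnpfvecpntnwprdsssnikt&_=1596342684957
Each entry in the list includes its content_id and ums_id, so you can loop over the list and download every video.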
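A rough sketch of that loop, under the assumption that the list response has the shape the script in this article relies on (a `data` array whose items carry `title`, `content_id` and `body.videos[0].ums_id`). The function names and the sample entries are invented for illustration:

```python
import time
from urllib.parse import urlencode


def build_list_api(author_id, page, size, uc_param_str, timestamp_ms=None):
    """Build the author list URL for one page of results."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # the trailing "_" parameter
    params = {
        "biz_id": 1002,
        "_size": size,
        "_page": page,
        "_order_type": "published_at",
        "status": 1,
        "_fetch": 1,
        "uc_param_str": uc_param_str,
        "_": timestamp_ms,
    }
    return "https://ff.dayu.com/contents/author/" + author_id + "?" + urlencode(params)


def iter_video_ids(list_data):
    """Yield (title, content_id, ums_id) for every entry that has a video."""
    for item in list_data["data"]:
        videos = item.get("body", {}).get("videos") or []
        if not videos:
            continue  # skip entries without a video attached
        yield item["title"], item["content_id"], videos[0]["ums_id"]


# Made-up sample: one entry with a video, one without.
sample_list = {
    "data": [
        {"title": "video one", "content_id": "c1", "body": {"videos": [{"ums_id": "u1"}]}},
        {"title": "no video", "content_id": "c2", "body": {"videos": []}},
    ]
}
for row in iter_video_ids(sample_list):
    print(row)
```

To walk through every page, one option is to increase `page` until the returned `data` list comes back empty; whether the API signals the last page that way is an assumption to verify in DevTools.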

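The complete code is below. It downloads Li Ziqi's videos into the "D:\\李子柒" directory; as written, it fetches only the eight videos on the first page.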
import json
import os
import time
import urllib.request
from urllib.parse import urlparse, parse_qs

import requests

headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'Cookie':'MUID=0DA4D4BD498D6CFB3AA9D9BE4D8D6820; _SS=SID=00; videoCookiesLastCategory=en-ca=animals; _cb_ls=1; _cb=DsAPZCJmzJ0BiCB2c; _chartbeat2=.1550504320201.1550509083431.11.uAsG96ZMUeDgWcawC2JWWNCmna0Z.1; ANON=A=E43533FD3C93526D33F4F5C4FFFFFFFF&E=164d&W=1; NAP=V=1.9&E=15f3&C=gBm5NGQ6hq9_2JLke_9M_uUX-nhzUVJp3UwliRTchLXC4pE05Iv2DA&W=1; vidvol=10; adoptout={"msaOptOut":0,"adIdOptOut":0}; videoerrorcount=0; trg=0%7C0%7C0; ecasession=v2_9a22cfec7b49fc3893239e7b074a63fd_ba74fcd1-eff2-48d8-b6af-082423d58358-tuct35eea8e_1550666026_1550666984_CNawjgYQqLw-GP6e37rd9J7zBiACKAYwMDjK_QdA_qAQSI7OHlCJxAlYAGAC'
}

def downloadVideo(url, path):
    # Download url to path unless the file already exists.
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)


def analysizeAndDownload(url, page, size, directory):
    # Load the author page; the token is read from the vpstoken cookie of the response.
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    token = res.cookies.get('vpstoken')
    if token is None:
        raise SystemExit("vpstoken cookie not found in the response")
    print(token)

    # The mid parameter of the share link is the author id, sent later as wm_id.
    query = parse_qs(urlparse(url).query)
    wmId = query['mid'][0]
    uc_param_str = query['uc_param_str'][0]

    # Fetch one page of the author's video list.
    listAPI = 'https://ff.dayu.com/contents/author/' + wmId + '?biz_id=1002&_size=' + str(size) + '&_page=' + str(page) + '&_order_type=published_at&status=1&_fetch=1&uc_param_str=' + uc_param_str + '&_=' + str(int(time.time() * 1000))
    res = requests.get(listAPI, headers=headers)
    res.raise_for_status()
    res.encoding = 'utf8'
    listData = json.loads(res.text)

    for one in listData['data']:
        cid = one['content_id']
        title = one['title']
        ums_id = one['body']['videos'][0]['ums_id']
        videoAPI = 'https://mparticle.uc.cn/api/vps?token=' + token + '&ums_id=' + ums_id + '&wm_cid=' + cid + '&wm_id=' + wmId + '&resolution=high'
        res = requests.get(videoAPI, headers=headers)
        if res.status_code != 200:
            continue  # skip this video and carry on with the next one
        res.encoding = 'utf8'
        videoURLInfo = json.loads(res.text)
        videoURL = videoURLInfo['data']['url']
        print("downloading: " + videoURL)
        downloadVideo(videoURL, os.path.join(directory, title + ".mp4"))
    print("Finished!")


if __name__ == "__main__":
    initURL = "http://a.mp.uc.cn/media.html?mid=5ebdd3d3aada4f3fb5b5a33b3669a0c6&client=ucweb&uc_param_str=frdnsnpfvecpntnwprdsssnikt"
    directory = "D:\\李子柒"
    if not os.path.exists(directory):
        os.makedirs(directory)
    # Download the eight videos on page 1; change the arguments in a loop to fetch pages 2, 3, and so on.
    analysizeAndDownload(initURL, 1, 8, directory)
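One practical caveat: the file name is built directly from the video title, and titles can contain characters that Windows does not allow in file names, such as `?`, `:`, `/` or `|`, in which case the save would fail. A minimal sketch of a sanitizing helper follows; `safe_filename` is a hypothetical name and is not part of the original script:

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    # Forbidden on Windows: \ / : * ? " < > | and ASCII control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.strip(" .")
    return cleaned or "untitled"


print(safe_filename("a/b:c?d"))
```

In the script, this would mean passing `safe_filename(title) + ".mp4"` instead of `title + ".mp4"` when building the path.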

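This article is for learning purposes only. If you use it for commercial purposes, you bear the consequences yourself!
Please credit the source when reposting: http://52sbl.cn/article/8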

