2020-12-04
Python: Scraping and Downloading Baidu Wenku Documents (Free Documents Only)
import requests
import re
import json
import os

session = requests.session()


def fetch_url(url):
    return session.get(url).content.decode('gbk')


def get_doc_id(url):
    return re.findall('view/(.*).html', url)[0]


def parse_type(content):
    return re.findall(r"docType.*?\:.*?\'(.*?)\'\,", content)[0]


def parse_title(content):
    return re.findall(r"title.*?\:.*?\'(.*?)\'\,", content)[0]


de...
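As a rough illustration of what the parsing helpers in the excerpt extract, the sketch below runs the same regular expressions against a made-up URL and a made-up page fragment, with no network access. `SAMPLE_URL` and `SAMPLE_PAGE` are hypothetical values invented for this example; the markup of a real Baidu Wenku page may differ, and the patterns may need adjusting accordingly.

```python
import re

# Hypothetical inputs, invented for illustration only.
SAMPLE_URL = "https://wenku.baidu.com/view/abc123.html"
SAMPLE_PAGE = "'docType': 'txt', 'title': 'Sample notes', "


def get_doc_id(url):
    # Capture the text between "view/" and the trailing "html" suffix.
    return re.findall('view/(.*).html', url)[0]


def parse_type(content):
    # Capture the single-quoted value that follows the docType key.
    return re.findall(r"docType.*?\:.*?\'(.*?)\'\,", content)[0]


def parse_title(content):
    # Capture the single-quoted value that follows the title key.
    return re.findall(r"title.*?\:.*?\'(.*?)\'\,", content)[0]


if __name__ == "__main__":
    print(get_doc_id(SAMPLE_URL))    # abc123
    print(parse_type(SAMPLE_PAGE))   # txt
    print(parse_title(SAMPLE_PAGE))  # Sample notes
```

Each helper indexes `[0]` on the result of `re.findall`, so it raises `IndexError` when the pattern finds nothing. Checking for an empty list first may be worthwhile if the page layout is not guaranteed.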