2020
09-29
09-29
Example: Reading PDF Content with PyPDF2 and Saving It to a Local TXT File
Without further ado, let's go straight to the code!
from PyPDF2.pdf import PdfFileReader
import pandas as pd
def Pdf_to_txt(pdf):
    for i in range(0, pdf.getNumPages()):
        title = []
        lin1, lin2, lin3, lin4, lin5, lin6, lin7, lin8 = [], [], [], [], [], [], [], []
        extractedText = pdf.getPage(i).extractText()
        text = extractedText.split('\n')
        num = 0
        for lin in text:
            if num == 0:
                title.ap...
Without further ado, let's look at the code!
import PyPDF2
import re
pdf_file = open('xxx.pdf', mode='rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
# Get the total number of pages in the PDF
number_of_pages = read_pdf.getNumPages()
# print('total_page:', number_of_pages)
line_list = []
# Loop over every page
for i in range(0, number_of_pages):
    # Read the content of each page
    page = read_pdf.getPage(i)
    page_content = page.extractText()...
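Both excerpts are cut off before the text is written to disk, so here is a minimal sketch of how the remaining step might look. It is not the original author's code. The function name `pages_to_txt`, the output path, and the decision to drop blank lines are assumptions made for illustration. The function only relies on the three legacy calls that appear in the excerpts (`getNumPages()`, `getPage(i)`, `extractText()`), so it works with any reader object that provides them.

```python
def pages_to_txt(reader, txt_path):
    """Write the text of every page of `reader` to `txt_path`.

    `reader` is assumed to expose the legacy PyPDF2 methods used in the
    excerpts: getNumPages(), getPage(i) and page.extractText().
    Returns the number of pages processed.
    """
    number_of_pages = reader.getNumPages()
    with open(txt_path, "w", encoding="utf-8") as out:
        # Loop over every page, as in the excerpt
        for i in range(number_of_pages):
            page_content = reader.getPage(i).extractText()
            # Split the page into lines and skip empty ones (an assumption;
            # the original post may have kept or processed them differently)
            for line in page_content.split("\n"):
                line = line.strip()
                if line:
                    out.write(line + "\n")
    return number_of_pages


# Hypothetical usage with the legacy API (PyPDF2 releases before 3.0):
#   import PyPDF2
#   with open('xxx.pdf', mode='rb') as pdf_file:
#       pages_to_txt(PyPDF2.PdfFileReader(pdf_file), 'xxx.txt')
```

Note that `PdfFileReader`, `getNumPages`, `getPage` and `extractText` were removed in PyPDF2 3.0. On newer releases the equivalents are `PdfReader`, `len(reader.pages)`, `reader.pages[i]` and `page.extract_text()`, so the sketch would need those names swapped in.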