2021
07-17
A Python text-processing approach (jieba word segmentation with symbol removal)
Just look at the code:

import re
import jieba.analyse
import codecs
import pandas as pd

def simplification_text(xianbingshi):
    """Extract text"""
    xianbingshi_simplification = []
    with codecs.open(xianbingshi, 'r', 'utf8') as f:
        for line in f:
            line = line.strip()
            # Raw string avoids invalid-escape warnings; captures text between <b> and <e>
            line_write = re.findall(r'(?<=\<b\>).*?(?=\<e\>)', line)
            for line in line_write:
                xianbin...
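The excerpt is cut off before the segmentation step, so the following is only a minimal sketch of what the title describes, not the author's code. It assumes the fragments of interest are wrapped as `<b>...<e>`, as the regex above suggests, and it treats any token without a letter, digit, or CJK character as a symbol. The names `extract_tagged`, `drop_symbols`, and `segment_clean` are hypothetical; `jieba.lcut` is the library's list-returning segmentation call.

```python
import re

# Assumption: text of interest sits between <b> and <e> markers.
TAGGED = re.compile(r'(?<=<b>).*?(?=<e>)')
# A token counts as a word if it has at least one letter, digit, or CJK character.
HAS_WORD_CHAR = re.compile(r'[^\W_]')


def extract_tagged(line):
    """Return every fragment found between <b> and <e> in one line."""
    return TAGGED.findall(line.strip())


def drop_symbols(tokens):
    """Remove tokens made up only of punctuation, symbols, or whitespace."""
    return [tok for tok in tokens if HAS_WORD_CHAR.search(tok)]


def segment_clean(text):
    """Segment text with jieba, then discard symbol-only tokens."""
    # Imported here so the two helpers above work even without jieba installed.
    import jieba
    return drop_symbols(jieba.lcut(text))
```

A typical use would be to call `extract_tagged` on each line of the input file and pass every fragment to `segment_clean`. Filtering after segmentation, rather than stripping punctuation first, lets jieba see the original sentence boundaries.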