python - Encoding problem when saving scraped content to a text file

Views: 102  Date: 2022-06-29 09:03:36

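Problem description

I am testing a very simple crawler that saves the text content of a minimally styled web page to my local computer. It ends with this error: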

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-35-ead5570b2e15> in <module>()
      7     filename=str(i)+'.txt'
      8     with open(filename,'w')as f:
----> 9         f.write(content)
     10         print('當前小說第{}章已經下載完成'.format(i))
     11         f.close()

UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7: illegal multibyte sequence
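The traceback suggests that `open(filename, 'w')` fell back to the platform's default encoding (GBK here), which has no mapping for U+00A0, the non-breaking space that HTML pages commonly produce from `&nbsp;`. A minimal sketch that reproduces the same class of error without any network access follows; the `sample` string is a made-up stand-in for the scraped text, not the real page content.

```python
# Minimal reproduction of the error, independent of the crawler.
# `sample` is a hypothetical stand-in for the scraped chapter text.
sample = "chapter\xa0one"  # \xa0 is a non-breaking space (HTML &nbsp;)

try:
    sample.encode("gbk")   # GBK has no mapping for U+00A0
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print(exc)             # reports the 'gbk' codec failing on '\xa0'

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = sample.encode("utf-8")
```

If this reasoning holds, the problem lies in the encoding used for the output file rather than in the download or parsing steps.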

The code is as follows:

import requests
from bs4 import BeautifulSoup

re=requests.get('http://www.qu.la/book/168/')
html=re.text
soup=BeautifulSoup(html,'html.parser')
list=soup.find(id='list')
link_list=list.find_all('a')

mylist=[]
for link in link_list:
    mylist.append('http://www.qu.la'+link.get('href'))

# Iterate over each link and download its text content to a local text file
# (mylist1 is as posted; it is not defined in the cells shown, presumably mylist)
i=0
for url in mylist1:
    re1=requests.get(url)
    html2=re1.text
    soup=BeautifulSoup(html2,'html.parser')
    content=soup.find(id='content').text.replace('chaptererror();', '')
    filename=str(i)+'.txt'
    with open(filename,'w')as f:
        f.write(content)
        print('當前小說第{}章已經下載完成'.format(i))
        f.close()
    i=i+1

Answers

Answer 1:

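Encode the text to UTF-8 bytes yourself. On Python 3 the file must then be opened in binary mode, because writing bytes to a text-mode file raises TypeError:

with open(filename, 'wb') as f:
    f.write(content.encode('utf-8'))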

Or

import codecs

with codecs.open(filename, 'w', 'utf-8') as f:
    f.write(content)
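On Python 3, the built-in `open` also accepts an `encoding` argument, which may be the simplest way to apply the same idea. The sketch below assumes Python 3; `save_text` is a hypothetical helper name, and the sample `content` and temporary path are made up for illustration only.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8 instead of the platform default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Hypothetical chapter text containing the character that GBK rejected.
content = "第1章\xa0\xa0正文"
path = os.path.join(tempfile.mkdtemp(), "0.txt")
save_text(path, content)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

In the loop from the question, this would amount to replacing `open(filename,'w')` with `open(filename, 'w', encoding='utf-8')`. The `with` statement closes the file on exit, so the explicit `f.close()` is not needed.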
