
Fix: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33

Python | Jason zhou

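This post covers the following error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33: Body ('推荐') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

When sending a POST request with the third-party Python library requests, the request sometimes fails because of how the data carried in the body is encoded. In that case the body has to be encoded explicitly.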

1. Reproducing the problem

I wanted to send a POST request with the third-party Python library requests. The body of the request has this content type:

Content-Type:text/plain;charset=UTF-8

The body is a string, and data is:

    
data = '{"filter":"all","auto":1,"tab":"推荐","direction":"homebutton","c_types":[1,3,2,8,7,9,11],"sdk_ver":{"tt":"1.9.6.3","tx":"4.19.574","tt_aid":"5004095","tx_aid":"1107850635"},"ad_wakeup":1,"h_ua":"Mozilla\/5.0 (Linux; Android 7.1.2; MI 5X Build\/N2G47H; wv) AppleWebKit\/537.36 (KHTML, like Gecko) Version\/4.0 Chrome\/67.0.3396.87 Mobile Safari\/537.36","h_av":"4.7.3","h_dt":0,"h_os":25,"h_app":"zuiyou","h_model":"MI 5X","h_did":"866655030396869_02:00:00","h_nt":1,"h_m":116456192,"h_ch":"xiaomi","h_ts":1543834422778,"token":"TfKbNCRqAec6tUN7wn3-JSGqoTcO1QytGiEBG2E1jQvCYBqj-TcCLYxVzUKtxgpDii503","android_id":"57b9b8465c2e440b"}'

At first I wrote the call like this:

    
r = requests.post(url, headers=headers, data=data)

This raised the following error:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33: Body ('推荐') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
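The failure can be reproduced without sending anything over the network. The sketch below assumes that a str body ends up being encoded as Latin-1 before it is sent, which is what the error message suggests; `try_latin1` is a hypothetical helper name, and `body` is only a shortened prefix of the data above.

```python
# Shortened prefix of the request body from this post (illustrative only).
body = '{"filter":"all","auto":1,"tab":"推荐","direction":"homebutton"}'


def try_latin1(text):
    """Return None if text can be encoded as Latin-1, else the UnicodeEncodeError."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = try_latin1(body)
# Index of the first and last character that Latin-1 cannot represent.
print(err.start, err.end - 1)
# The offending characters themselves.
print(body[err.start:err.end])
# UTF-8 can represent them, at the cost of more than one byte per character.
print(len(body), len(body.encode("utf-8")))
```

The positions reported here line up with "position 32-33" in the error message, because the two Chinese characters sit at those indexes in the string.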

2. Solving the problem

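This is an encoding problem, but the encoding cannot be picked arbitrarily. So I checked the official requests documentation to see which types the data parameter of post accepts.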

requests.post(url, data=None, json=None, **kwargs)
Sends a POST request.

Parameters:	
url – URL for the new Request object.
data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
json – (optional) json data to send in the body of the Request.
**kwargs – Optional arguments that request takes.
Returns:	
Response object

Return type:	requests.Response

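data may be of type bytes. A str body that contains characters outside Latin-1, such as 推荐, cannot be sent as-is, which is exactly what the error message says. So it is enough to encode the str data as UTF-8 bytes and send those bytes as the request body; the server then receives the data correctly. The modified code: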

import requests

url = 'http://api.izuiyou.com/index/recommend?sign=818350617283d1f719b3873545b36965'
data = '{"filter":"all","auto":1,"tab":"推荐","direction":"homebutton","c_types":[1,3,2,8,7,9,11],"sdk_ver":{"tt":"1.9.6.3","tx":"4.19.574","tt_aid":"5004095","tx_aid":"1107850635"},"ad_wakeup":1,"h_ua":"Mozilla\/5.0 (Linux; Android 7.1.2; MI 5X Build\/N2G47H; wv) AppleWebKit\/537.36 (KHTML, like Gecko) Version\/4.0 Chrome\/67.0.3396.87 Mobile Safari\/537.36","h_av":"4.7.3","h_dt":0,"h_os":25,"h_app":"zuiyou","h_model":"MI 5X","h_did":"866655030396869_02:00:00","h_nt":1,"h_m":116456192,"h_ch":"xiaomi","h_ts":1543834422778,"token":"TfKbNCRqAec6tUN7wn3-JSGqoTcO1QytGiEBG2E1jQvCYBqj-TcCLYxVzUKtxgpDii503","android_id":"57b9b8465c2e440b"}'
# Encode data as UTF-8 bytes
byte_data = data.encode('utf-8')
headers = {"User-Agent": "okhttp/3.11.0 Zuiyou/4.7.1"}
r = requests.post(url, headers=headers, data=byte_data)
print(r.json())

No error this time. The output:

{'data': {'tips': '', 'list': [{'id': 81886568, 'mid': 28189997, 'vd_stat': 1, 'god_rids': {'741612920': {'audited': 1, 'ct': 1543416446}}, 'content': '一句话证明你看过悲伤逆流成河❤️', 'ut': 1543840276, 'type': 0, 'up': 5676, 'status': 2, 'ct': 1543306019, 'c_type': 1, '_id': 81886568, 'god_reviews': [{'id': 741612920, 'source': 'user', 'mid': 34064058, 'ut': 1543840464, 'likes': 2726, 'up': 2730, 'status': 3, 'avatar': 461569561, 'ct': 1543383085, 'svut': 1543837023, 'pid': 81886568, 'subreviewcnt': 14, '_id': 741612920, 'disp': 117834, 'mname': '此时路过一位沙雕', (partial output)
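When the body is built from a dict rather than written by hand, the same idea applies. The following is a minimal sketch using only the standard json module; `payload` is a hypothetical, shortened stand-in for the real request data. It shows two ways to get a body that avoids the Latin-1 error.

```python
import json

# Hypothetical, shortened payload; the real request carries many more fields.
payload = {"filter": "all", "auto": 1, "tab": "推荐"}

# Option A: keep the Chinese text as-is in the JSON string, then encode to UTF-8 bytes.
utf8_body = json.dumps(payload, ensure_ascii=False, separators=(",", ":")).encode("utf-8")

# Option B: json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the resulting str contains only ASCII and is also valid Latin-1.
ascii_body = json.dumps(payload, separators=(",", ":"))

print(utf8_body)
print(ascii_body)
# Both forms parse back to the same object.
print(json.loads(utf8_body) == json.loads(ascii_body))
```

Whether a given server accepts the escaped form from Option B is an assumption to verify. A JSON parser treats both forms as equivalent, but an API that signs requests (note the sign parameter in the URL above) may depend on the exact bytes, in which case Option A is closer to what the original client sends.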

3. Summary

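When a Python crawler runs into an error like this, first try encoding the request body as UTF-8. Also confirm the request format (Content-Type) while capturing traffic; if the format is wrong, the request may return no data.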
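Both points from the summary can be kept together in one small helper. This is only a sketch: `prepare_text_body` is a hypothetical name, and the default content type simply mirrors the Content-Type observed in the captured request.

```python
def prepare_text_body(text, content_type="text/plain", charset="utf-8"):
    """Encode text and return (body_bytes, headers) with a matching Content-Type."""
    body = text.encode(charset)
    headers = {"Content-Type": f"{content_type};charset={charset.upper()}"}
    return body, headers


# Shortened example body; the real one is much longer.
body, headers = prepare_text_body('{"tab":"推荐"}')
print(headers["Content-Type"])
print(type(body).__name__, len(body))
```

The returned pair can then be passed on as requests.post(url, headers=headers, data=body), merged with any other headers the request needs, such as User-Agent.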

When reposting, please credit: 志颖博客 » Fix: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33
