Solved: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33: Body ('推荐') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
When sending a POST request with the third-party Python library requests, the request sometimes fails because of how the POST body is encoded. In that case we need to change the encoding.
I. Reproducing the problem
I wanted to use requests to send a POST request whose body is declared as:
Content-Type: text/plain;charset=UTF-8
In other words, the body is a plain string. The data is:
data = '{"filter":"all","auto":1,"tab":"推荐","direction":"homebutton","c_types":[1,3,2,8,7,9,11],"sdk_ver":{"tt":"1.9.6.3","tx":"4.19.574","tt_aid":"5004095","tx_aid":"1107850635"},"ad_wakeup":1,"h_ua":"Mozilla\/5.0 (Linux; Android 7.1.2; MI 5X Build\/N2G47H; wv) AppleWebKit\/537.36 (KHTML, like Gecko) Version\/4.0 Chrome\/67.0.3396.87 Mobile Safari\/537.36","h_av":"4.7.3","h_dt":0,"h_os":25,"h_app":"zuiyou","h_model":"MI 5X","h_did":"866655030396869_02:00:00","h_nt":1,"h_m":116456192,"h_ch":"xiaomi","h_ts":1543834422778,"token":"TfKbNCRqAec6tUN7wn3-JSGqoTcO1QytGiEBG2E1jQvCYBqj-TcCLYxVzUKtxgpDii503","android_id":"57b9b8465c2e440b"}'
My first attempt looked like this:
r = requests.post(url, headers=headers, data=data)
The call failed with this error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33: Body ('推荐') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
Positions 32-33 are exactly the two characters 推荐 in the data string.
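The failure can be reproduced offline, without any request. For a str body, requests hands the string down to Python's built-in http.client, which encodes it as Latin-1 (ISO-8859-1) before sending, and 推荐 has no Latin-1 representation. The sketch below mimics only that encoding step on a shortened version of the payload (same prefix, so the positions match); `latin1_failure` is a hypothetical helper written for illustration.

```python
def latin1_failure(text):
    """Return (first, last, chars) for the first run Latin-1 cannot encode, or None.

    Hypothetical helper: it only imitates the Latin-1 step http.client applies to str bodies.
    """
    try:
        text.encode('latin-1')
    except UnicodeEncodeError as exc:
        # exc.start is inclusive, exc.end is exclusive
        return exc.start, exc.end - 1, text[exc.start:exc.end]
    return None


# Shortened payload with the same prefix as the real data string
data = '{"filter":"all","auto":1,"tab":"推荐"}'
print(latin1_failure(data))  # (32, 33, '推荐')
```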
II. Solving the problem
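This is an encoding problem, but we can't just pick an encoding at random. So I checked the requests documentation to see which types the data parameter of post accepts: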
requests.post(url, data=None, json=None, **kwargs)
Sends a POST request.
Parameters:
url – URL for the new Request object.
data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
json – (optional) json data to send in the body of the Request.
**kwargs – Optional arguments that request takes.
Returns: Response object
Return type: requests.Response
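The data parameter accepts bytes. A str body, on the other hand, is passed by requests down to Python's built-in http.client, which encodes it as Latin-1 before sending, and that is where 推荐 fails. So the fix is to encode the str data as UTF-8 ourselves and send the resulting bytes as the request body. The modified code: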
import requests

url = 'http://api.izuiyou.com/index/recommend?sign=818350617283d1f719b3873545b36965'
data = '{"filter":"all","auto":1,"tab":"推荐","direction":"homebutton","c_types":[1,3,2,8,7,9,11],"sdk_ver":{"tt":"1.9.6.3","tx":"4.19.574","tt_aid":"5004095","tx_aid":"1107850635"},"ad_wakeup":1,"h_ua":"Mozilla\/5.0 (Linux; Android 7.1.2; MI 5X Build\/N2G47H; wv) AppleWebKit\/537.36 (KHTML, like Gecko) Version\/4.0 Chrome\/67.0.3396.87 Mobile Safari\/537.36","h_av":"4.7.3","h_dt":0,"h_os":25,"h_app":"zuiyou","h_model":"MI 5X","h_did":"866655030396869_02:00:00","h_nt":1,"h_m":116456192,"h_ch":"xiaomi","h_ts":1543834422778,"token":"TfKbNCRqAec6tUN7wn3-JSGqoTcO1QytGiEBG2E1jQvCYBqj-TcCLYxVzUKtxgpDii503","android_id":"57b9b8465c2e440b"}'
# Encode the str payload as UTF-8 bytes so it is not encoded as Latin-1 on the way out
byte_data = data.encode('utf-8')
headers = {"User-Agent": "okhttp/3.11.0 Zuiyou/4.7.1"}
r = requests.post(url, headers=headers, data=byte_data)
print(r.json())
This time there is no error. Output:
{'data': {'tips': '', 'list': [{'id': 81886568, 'mid': 28189997, 'vd_stat': 1, 'god_rids': {'741612920': {'audited': 1, 'ct': 1543416446}}, 'content': '一句话证明你看过悲伤逆流成河❤️', 'ut': 1543840276, 'type': 0, 'up': 5676, 'status': 2, 'ct': 1543306019, 'c_type': 1, '_id': 81886568, 'god_reviews': [{'id': 741612920, 'source': 'user', 'mid': 34064058, 'ut': 1543840464, 'likes': 2726, 'up': 2730, 'status': 3, 'avatar': 461569561, 'ct': 1543383085, 'svut': 1543837023, 'pid': 81886568, 'subreviewcnt': 14, '_id': 741612920, 'disp': 117834, 'mname': '此时路过一位沙雕', ... (output truncated)
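Why the fix works can also be checked offline: requests leaves a bytes body unchanged, so the server receives exactly the UTF-8 bytes we produced instead of a Latin-1 attempt. A minimal sketch on a shortened payload (illustrative only, not the full data string):

```python
# Shortened payload with the same prefix as the real data string
data = '{"filter":"all","auto":1,"tab":"推荐"}'

# Every character has a UTF-8 representation, so this cannot fail
byte_data = data.encode('utf-8')

# Each CJK character is 1 code point in the str but 3 bytes in UTF-8
print(len(data), len(byte_data))  # 36 40
print('推荐'.encode('utf-8'))      # b'\xe6\x8e\xa8\xe8\x8d\x90'

# The server recovers the original text by decoding with the same charset
assert byte_data.decode('utf-8') == data
```

This is also why the charset matters: the receiving side has to decode with the same encoding the client used.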
III. Summary
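When a Python scraper runs into a similar error, first try encoding the body as UTF-8 yourself. Also, when capturing the app's traffic, check the request's format (Content-Type); sending the body in the wrong format can mean the server returns no data.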
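One way to follow both pieces of advice at once is to encode the body with whatever charset the captured Content-Type declares, falling back to UTF-8 when none is given. The sketch below uses the standard library's `email.message.Message` only to parse the header parameters; `encode_for_content_type` is a hypothetical helper, and the commented `requests.post` line assumes `url` from the code above.

```python
from email.message import Message


def encode_for_content_type(text, content_type, default='utf-8'):
    """Encode text with the charset declared in a captured Content-Type header.

    Hypothetical helper: falls back to `default` when no charset parameter is present.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent
    charset = msg.get_content_charset() or default
    return text.encode(charset)


captured = 'text/plain;charset=UTF-8'  # header seen while capturing the traffic
data = '{"tab":"推荐"}'                 # shortened payload for illustration
body = encode_for_content_type(data, captured)
headers = {'Content-Type': captured}   # send the same Content-Type the app sends
# r = requests.post(url, headers=headers, data=body)
```

Sending the captured Content-Type alongside the encoded body keeps the declared charset and the actual bytes in agreement.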
When reposting, please credit: 志颖博客 » Solved: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 32-33