Getting Started with Python Web Scraping: The requests Library

  • Getting Started with Python Web Scraping: The requests Library
    • Basic usage of requests
      • Getting started
      • Supported methods
    • GET requests
      • Basic example without parameters
      • Requests with parameters
        • Parameters directly in the URL
        • Parameters passed via params=data
      • Parsing the response with json()
      • Downloading a page's ico icon
      • Adding headers
    • POST requests
    • File upload
    • Cookies
      • Getting cookies
      • Setting cookies
    • Maintaining a session
      • Separate requests are not the same session
      • Maintaining a session with Session

Getting Started with Python Web Scraping: The requests Library

Basic usage of requests

Getting started

import requests
r = requests.get('https://www.baidu.com/')
print(type(r))	 		#<class 'requests.models.Response'>
print(r.status_code) 	#200
print(r.text) 			#html
print(r.cookies) 		#<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
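
Besides status_code, text, and cookies, the Response object exposes a few other attributes that come up constantly; a small sketch (same Baidu URL as above, purely illustrative):

import requests
r = requests.get('https://www.baidu.com/')
print(r.url)        # final URL after any redirects
print(r.encoding)   # encoding requests guessed for decoding r.text
print(r.headers)    # response headers, a case-insensitive dict
print(r.history)    # list of redirect responses, if any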

Supported methods

  • get()
  • post()
  • put()
  • delete()
  • head()
  • options()
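
Each of these maps to a module-level helper with the same calling convention. As a quick, hedged sanity check, the sketch below exercises a few of them against httpbin.org (assuming the service is reachable); every call returns a Response object.

import requests
print(requests.post('http://httpbin.org/post').status_code)     # 200
print(requests.put('http://httpbin.org/put').status_code)       # 200
print(requests.delete('http://httpbin.org/delete').status_code) # 200
print(requests.head('http://httpbin.org/get').status_code)      # 200, headers only, no body
print(requests.options('http://httpbin.org/get').status_code)   # 200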

GET requests

Basic example without parameters

import requests
r = requests.get('http://httpbin.org/get')
print(r.text)
"""
Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "123.121.82.132, 123.121.82.132",
  "url": "https://httpbin/get"
}
"""

Requests with parameters

Parameters directly in the URL

import requests
r = requests.get('http://httpbin.org/get?name=germey&age=22')
print(r.text)

Parameters passed via params=data

import requests
data = {
	'name':'germey',
	'age':22
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)

Parsing the response with json()

import requests
r = requests.get('http://httpbin.org/get')
print(type(r.text))
print(r.json())
print(type(r.json()))
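
r.json() only succeeds when the body really is JSON; on anything else it raises an exception that is a subclass of ValueError. A minimal sketch of guarding against that, using httpbin.org's HTML endpoint as the non-JSON example:

import requests
r = requests.get('http://httpbin.org/html')    # returns HTML, not JSON
try:
	data = r.json()
except ValueError:                             # also covers json.JSONDecodeError
	data = None
	print('response body is not JSON')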

Downloading a page's ico icon

import requests
r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico','wb') as f:
	f.write(r.content)
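
r.content holds the raw bytes of the response, which is why the file is opened in 'wb' mode. For larger binary files, a streamed download avoids holding everything in memory at once; a sketch using stream=True and iter_content (same favicon URL, purely illustrative):

import requests
r = requests.get('https://github.com/favicon.ico', stream=True)
with open('favicon.ico', 'wb') as f:
	for chunk in r.iter_content(chunk_size=1024):   # write the body 1 KB at a time
		if chunk:
			f.write(chunk)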

Adding headers

import requests
headers = {
	'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
r = requests.get('https://www.zhihu.com/explore',headers=headers)
print(r.text)

POST requests

import requests
data = {'name':'germey','age':'22'}
r = requests.post('http://httpbin.org/post',data=data)
print(r.text)
"""
Result:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin",
    "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "123.121.82.132, 123.121.82.132",
  "url": "https://httpbin/post"
}
"""

File upload

import requests
files = {'file':open('favicon.ico','rb')}
r = requests.post('http://httpbin.org/post',files=files)
print(r.text)
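
Note that the snippet above never closes the file handle. Wrapping the upload in a with block is a slightly safer variant:

import requests
with open('favicon.ico', 'rb') as f:
	r = requests.post('http://httpbin.org/post', files={'file': f})
print(r.text)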

Cookies

Getting cookies

import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
for key, value in r.cookies.items():
	print(key + '=' + value)

Setting cookies

import requests
headers = {
	'Cookie':'your cookie string here',
	'Host':'www.zhihu.com',
	'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
r = requests.get('https://www.zhihu.com',headers=headers)
print(r.text)
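
Putting the whole cookie string into the headers works, but requests also accepts a cookies= argument that takes a plain dict, which saves you from assembling the header by hand. A sketch (the cookie names and values below are placeholders):

import requests
cookies = {'session_id': 'your-session-id', 'token': 'your-token'}   # placeholder values
headers = {
	'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
r = requests.get('https://www.zhihu.com', cookies=cookies, headers=headers)
print(r.status_code)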

Maintaining a session

Separate requests are not the same session

Separate requests are independent of one another, much like cookies in two different browsers that do not interfere with each other. You could attach a Cookie header to every single request to carry state, but that quickly becomes tedious.
Note: http://httpbin.org/cookies/set/number/123456789 sets a cookie named number to 123456789,
and http://httpbin.org/cookies returns the cookies sent with the request.

import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
"""
Result:
{
  "cookies": {}
}
"""

Maintaining a session with Session

import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
"""
Result:
{
  "cookies": {
    "number": "123456789"
  }
}
"""
