I had long heard how powerful the requests library is, but had never tried it until today. Having finally done so, I realize how clumsy it was to work with urllib, urllib2 and friends...
Here are some notes on basic usage, as a record for myself.
1. Download
Official project page: https://pypi.python.org/pypi/requests/#downloads
You can download it directly from there (or install it with pip install requests).
2. Sending a GET request without parameters
>>> import requests
>>> r = requests.get('http://httpbin.org/get')
>>> print r.text
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.3.0 CPython/2.6.6 Windows/7",
    "X-Request-Id": "8a28bbea-55cd-460b-bda3-f3427d66b700"
  },
  "origin": "124.192.129.84",
  "url": "http://httpbin.org/get"
}
3. Sending a GET request with parameters: put the keys and values in a dict and pass it via the params argument. This does the job urllib.urlencode used to do.
>>> import requests
>>> payload = {'q': '杨彦星'}
>>> r = requests.get('http://www.so.com/s', params=payload)
>>> r.url
u'http://www.so.com/s?q=%E6%9D%A8%E5%BD%A6%E6%98%9F'
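The params dict is percent-encoded for you, much the way urllib.urlencode does it. A minimal sketch of that encoding using only the standard library (Python 3 module names shown):

```python
from urllib.parse import urlencode, parse_qs

payload = {'q': '杨彦星'}
query = urlencode(payload)       # percent-encodes the value, UTF-8 by default
print(query)                     # q=%E6%9D%A8%E5%BD%A6%E6%98%9F
print(parse_qs(query)['q'][0])   # 杨彦星 -- round-trips back to the original
```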
4. Sending a POST request: pass the form data via the data argument.
>>> payload = {'a': '杨', 'b': 'hello'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print r.text
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "a": "\u6768",
    "b": "hello"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "19",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.3.0 CPython/2.6.6 Windows/7",
    "X-Request-Id": "c81cb937-04b8-4a2d-ba32-04b5c0b3ba98"
  },
  "json": null,
  "origin": "124.192.129.84",
  "url": "http://httpbin.org/post"
}
As you can see, the POST parameters ended up in form. data accepts not only dicts but also strings, such as a JSON-encoded body:
>>> payload = {'a': '杨', 'b': 'hello'}
>>> import json
>>> r = requests.post('http://httpbin.org/post', data=json.dumps(payload))
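The difference matters: a dict gets form-encoded into key/value pairs, while json.dumps(payload) sends one JSON string as the raw request body. A quick standard-library sketch of what that string looks like (Python 3 syntax):

```python
import json

payload = {'a': '杨', 'b': 'hello'}
body = json.dumps(payload)
print(body)                         # {"a": "\u6768", "b": "hello"} -- one raw string body
print(json.loads(body) == payload)  # True: the receiver can decode it back
```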
5. POSTing files: this is how you upload an image, a document and so on; use the files argument.
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('touxiang.png', 'rb')}
>>> r = requests.post(url, files=files)
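files can also take a (filename, fileobj, content_type) tuple when you want to control the filename and content type explicitly. A sketch using an in-memory stand-in for the file and a prepared request, so nothing is actually sent (touxiang.png and the bytes are just example values):

```python
import io
import requests

# an in-memory stand-in for open('touxiang.png', 'rb')
fake_png = io.BytesIO(b'\x89PNG fake bytes')
files = {'file': ('touxiang.png', fake_png, 'image/png')}

# build the request without sending it, to inspect what would go on the wire
req = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(req.headers['Content-Type'].startswith('multipart/form-data'))  # True
print(b'filename="touxiang.png"' in req.body)                         # True
```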
To customize headers, pass them via the headers argument:
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
6. Response content
Response status code:
r = requests.get('http://httpbin.org/get')
print r.status_code
Response headers:
>>> print r.headers
{'content-length': '519', 'server': 'gunicorn/18.0', 'connection': 'keep-alive', 'date': 'Sun, 15 Jun 2014 14:19:52 GMT', 'access-control-allow-origin': '*', 'content-type': 'application/json'}
You can also read an individual header to base decisions on; the key lookup is case-insensitive:
r.headers['Content-Type']
r.headers.get('Content-Type')
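Under the hood requests stores headers in a case-insensitive mapping. The idea can be sketched with a toy dict subclass (this is an illustration, not requests' actual implementation; Python 3 syntax):

```python
class LowerKeyDict(dict):
    """Toy case-insensitive dict: normalize every key to lowercase."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)
    def __getitem__(self, key):
        return super().__getitem__(key.lower())
    def get(self, key, default=None):
        return super().get(key.lower(), default)

headers = LowerKeyDict()
headers['Content-Type'] = 'application/json'
print(headers['content-type'])      # application/json
print(headers.get('CONTENT-TYPE'))  # application/json
```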
The response body, which we have already been using above:
r.text     # body decoded to unicode text
r.content  # raw bytes
7. Getting cookies from the response
>>> r = requests.get('http://www.baidu.com')
>>> r.cookies['BAIDUID']
'D5810267346AEFB0F25CB0D6D0E043E6:FG=1'
You can also define your own cookies to send with the request:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = {'cookies_are': 'working'}
>>> r = requests.get(url, cookies=cookies)
>>> print r.text
{
  "cookies": {
    "cookies_are": "working"
  }
}
There is a lot more to cookies; since I don't know much about them yet, I'll expand this section later.
8. Setting a timeout with the timeout parameter
>>> requests.get('http://github.com', timeout=1)
<Response [200]>
If you set the timeout to a very tiny value, such as requests.get('http://github.com', timeout=0.001), and no response arrives within that time, a Timeout exception is raised.
9. Using a session
First initialize a Session object:

s = requests.Session()
Then use that session object to make requests: r = s.post(url, data=user)
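Besides sending requests, a Session keeps state between them: cookies set once are sent on every subsequent request, and default headers can be registered once. A minimal no-network sketch (the header and cookie names here are invented for illustration):

```python
import requests

s = requests.Session()
s.headers.update({'X-Demo': 'hello'})  # sent with every request made through s
s.cookies.set('sessionid', 'abc123')   # persisted across requests

print(s.headers['X-Demo'])         # hello
print(s.cookies.get('sessionid'))  # abc123
```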
The example below logs in to renren.com, reads the recent visitors from the home page, and then fetches http://www.renren.com/myfoot.do with the same session to read more of them.
#coding:utf-8
import requests
import re

url = r'http://www.renren.com/ajaxLogin'
user = {'email': 'email', 'password': 'pass'}
s = requests.Session()
r = s.post(url, data=user)
html = r.text

visit = []
first = re.compile(r'</span><span class="time-tip first-tip"><span class="tip-content">(.*?)</span>')
second = re.compile(r'</span><span class="time-tip"><span class="tip-content">(.*?)</span>')
third = re.compile(r'</span><span class="time-tip last-second-tip"><span class="tip-content">(.*?)</span>')
last = re.compile(r'</span><span class="time-tip last-tip"><span class="tip-content">(.*?)</span>')
visit.extend(first.findall(html))
visit.extend(second.findall(html))
visit.extend(third.findall(html))
visit.extend(last.findall(html))
for i in visit:
    print i

print 'More recent visitors:'
vm = s.get('http://www.renren.com/myfoot.do')
fm = re.compile(r'"name":"(.*?)"')
visitmore = fm.findall(vm.text)
for i in visitmore:
    print i
10. requests and cookies
Cookies store items as key-value pairs, much like a dict: our username, password, login state and so on can all be saved, so that when the page is loaded again the information can be found in the cookies and we are spared entering it again.
When sending a get (or other) request with requests, you can likewise supply cookie information. For example, cookies captured from a page load in the browser can be handed to requests for reuse.
Pass cookies={key: value} to the request to send cookies.
import requests

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
r.text
# '{"cookies": {"cookies_are": "working"}}'
Reading the cookies of a response is just as simple: use the cookies attribute, the same way you read headers:
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']
# 'example_cookie_value'
The following function breaks a cookie string obtained from the browser into a dict, which helps when mimicking a browser request with requests.
def browsercookiesdict(s):
    '''Convert a cookies string from the browser to a dict'''
    ss = s.split(';')
    outdict = {}
    for item in ss:
        i1 = item.split('=', 1)[0].strip()
        i2 = item.split('=', 1)[1].strip()
        outdict[i1] = i2
    return outdict
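For example, a cookie header copied from the browser's developer tools parses like this (the values below are made up); this compact variant of the function uses str.partition, which, like the maxsplit of 1 above, keeps '=' characters inside a value intact:

```python
def browsercookiesdict(s):
    '''Convert a cookies string from the browser to a dict'''
    outdict = {}
    for item in s.split(';'):
        key, _, value = item.partition('=')  # split on the first '=' only
        outdict[key.strip()] = value.strip()
    return outdict

raw = 'BAIDUID=D5810267346AEFB0; FG=1; token=abc=='  # made-up example values
print(browsercookiesdict(raw))
# {'BAIDUID': 'D5810267346AEFB0', 'FG': '1', 'token': 'abc=='}
```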