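Preface
Anyone who has written a web crawler knows that the hyperlinks it collects usually need post-processing: relative links have to be converted to absolute paths so that every URL is in one uniform form. This article builds a regular expression for processing the collected links. Without further ado, let's go through the details.
A crawl will typically turn up links like the following: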
<!-- Empty hyperlink -->
<a href=""></a>
<!-- Whitespace only -->
<a href=" " rel="external nofollow"> </a>
<!-- a tag with other attributes -->
<a href="index.html" rel="external nofollow" alt="超鏈接"> index.html </a>
<a href="/" rel="external nofollow" target="_blank"> / target="_blank" </a>
<a target="_blank" href="/" rel="external nofollow" alt="超鏈接"> target="_blank" / </a>
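To make the goal concrete, here is a minimal sketch of how links like these might be extracted and normalized to absolute URLs. It is an illustration under stated assumptions, not necessarily the expression this article arrives at: it assumes Python, uses the standard library's `re` and `urllib.parse.urljoin`, and the pattern `HREF_RE`, the helper `absolute_links`, and the base URL `https://example.com/docs/` are all hypothetical names chosen for the example.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: captures the quoted href value of an <a> tag.
# It allows other attributes before href and spaces around "=".
HREF_RE = re.compile(
    r'<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

def absolute_links(html, base_url):
    """Return an absolute URL for every non-empty href found in html."""
    links = []
    for match in HREF_RE.finditer(html):
        href = match.group(2).strip()
        if not href:
            # Skip href="" and whitespace-only values.
            continue
        # urljoin resolves relative references against the base URL.
        links.append(urljoin(base_url, href))
    return links

sample = '''
<a href=""></a>
<a href=" "> </a>
<a href="index.html" alt="超鏈接"> index.html </a>
<a target="_blank" href="/" alt="超鏈接"> / </a>
'''

# Assumed base URL of the page the links were found on.
print(absolute_links(sample, "https://example.com/docs/"))
# ['https://example.com/docs/index.html', 'https://example.com/']
```

The empty and whitespace-only links are dropped, and the remaining two resolve against the assumed page URL. A regular expression like this only handles quoted `href` values in reasonably well-formed markup; unquoted attributes or heavily malformed HTML would need a more permissive pattern or a real HTML parser.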