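This morning I noticed an IP address that seemed to be crawling my website, except that it requested the same page many times within a few minutes. Then I noticed that it was using a different user agent for each of those requests.
I decided to find out what was going on by analyzing the Apache httpd logs: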
cut -d' ' -f1 /var/log/apache2/*access.log | # Extract all IP addresses from the server logs
sort -u | # List every IP address only once
while read -r ip; do # Cycle through the list of IP addresses
  printf '%s\t' "$ip" # Print the IP address (never use data as the printf format string)
  awk -v ip="$ip" '$1 == ip' /var/log/apache2/*access.log | # Select log entries for exactly this IP address (no prefix or regex matches)
  sed 's/^.*\("[^"]*"\)$/\1/' | # Extract the user agent (the last quoted field)
  sort -u | # Create a list of unique user agents
  wc -l # Count the unique user agents
done |
tee >( cat >&2; echo '=== SORTED ===' >&2 ) | # Suspense is killing me, I want to see the progress while the script runs...
sort -nk2 | # Sort list by number of different user agents
cat -n # Add line numbers
This resulted in a long list:
line IP-address number of different user-agents used.
...
1285 176.213.0.34 15
1286 176.213.0.59 15
1287 5.158.236.154 15
1288 5.158.238.157 15
1289 5.166.204.48 15
1290 5.166.212.42 15
1291 176.213.28.54 16
1292 5.166.212.10 16
1293 176.213.28.32 17
1294 5.164.236.40 17
1295 5.158.238.6 18
1296 5.158.239.1 18
1297 5.166.208.39 18
1298 176.213.20.0 19
1299 5.164.220.43 19
1300 5.166.208.35 19
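As an aside, the loop above rereads every log file once per IP address, which gets slow with more than a thousand addresses. A single pass over the logs could probably produce the same counts. The sketch below is only an illustration: the function name count_agents is made up here, and it assumes the client IP is the first space-separated field and the user agent is the last quoted field on each line, as in the "combined" log format. Like the sed expression above, it does not handle user agents that contain escaped quotes.

```shell
# Count distinct user agents per client IP in one pass over the logs.
# Assumption: the IP is field 1 and the user agent is the last quoted field.
count_agents() {
  awk '
    {
      ip = $1
      # Take the last double-quoted field on the line as the user agent.
      if (match($0, /"[^"]*"$/)) {
        ua = substr($0, RSTART, RLENGTH)
        key = ip SUBSEP ua
        if (!(key in seen)) {   # first time this IP is seen with this user agent
          seen[key] = 1
          count[ip]++
        }
      }
    }
    END {
      for (ip in count) printf "%s\t%d\n", ip, count[ip]
    }
  ' "$@" | sort -t "$(printf '\t')" -k2,2n -k1,1   # sort by count, then by IP
}
```

It would be called as `count_agents /var/log/apache2/*access.log | cat -n`. There is no progress output in this variant, since nothing is printed until all logs have been read.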
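So there are dozens of IP addresses that keep changing their user agent within a matter of minutes. I checked the top 50 IP addresses against my small private log of known bots, but found no matches.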
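A check like that can be sketched with grep. This is only an illustration under assumptions: the function name match_known_bots and both file layouts (one IP address per line) are hypothetical, and the private log mentioned above may well be organized differently.

```shell
# Print the IPs from a candidate list that also appear in a known-bot list.
# Assumption: both files hold exactly one IP address per line.
match_known_bots() {
  candidates=$1   # e.g. the top 50 addresses from the sorted list
  known=$2        # e.g. a private list of known bot addresses
  # -F: fixed strings (dots are literal), -x: whole-line match,
  # -f: read the patterns from the known-bot file
  grep -Fxf "$known" "$candidates"
}
```

For example, `match_known_bots top50.txt known_bots.txt` (both file names are placeholders) prints the overlapping addresses and prints nothing when there is no overlap. Matching whole lines with `-x` avoids counting 5.158.238.1 as a hit for 5.158.238.157.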
This is what the access log looks like for a single IP address (truncated both vertically and horizontally for readability):
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
Has anyone else seen this? Does anyone know what is going on here?