Analysis: disabling the NIC on a keepalived VM leaves no node holding the VIP

Source: original article on this site. Category: network technology.

Test environment
keepalived-1.2.13-8.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
Linux shtslb01 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

keepalived is running on both nodes in master/backup mode.
The VIP answers ping normally.

1. Bring down the master's NIC
ifdown ens33

Ping the keepalived master -------------
[c:\~]$ ping 172.16.9.67

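Pinging 172.16.9.67 with 32 bytes of data:
Request timed out.

Ping the VIP -------------
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Request timed out.
Request timed out.
Request timed out.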

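Symptom
The keepalived processes are still running on both nodes, but neither the master nor the backup has added the VIP.

The cause is that the master did not send a priority-0 advertisement to the backup, so the backup, following the VRRP mechanism, never added the VIP.

Update 09-15: the exact cause has been identified. In the VM environment, running ifdown ens160 brings the VM's interface down, yet on another machine

tcpdump -i ens160 -nn vrrp shows that this VM keeps sending VRRP advertisements. The backup's priority of 10 is lower than the advertised 100, so it cannot take over. This is odd.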
10:11:18.356859 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:11:19.357352 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

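Shutting down the VM, suspending it, or killing the keepalived process all trigger a failover to the backup; only bringing down the NIC leaves the pair without a VIP.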
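To make a capture like the one above easier to read, the advertisements can be grouped by sender. This is only a sketch: the helper name vrrp_senders is made up for illustration, and it assumes the tcpdump text format shown in this article.

```shell
#!/bin/bash
# vrrp_senders (hypothetical helper): read tcpdump text output on stdin and
# count VRRP advertisements per source address and advertised priority.
vrrp_senders() {
    awk '/VRRPv2, Advertisement/ {
        prio = ""
        for (i = 1; i <= NF; i++)
            if ($i == "prio") { prio = $(i + 1); sub(/,/, "", prio) }
        count[$3 " prio " prio]++          # $3 is the source IP
    }
    END { for (k in count) print k, "x" count[k] }' | sort
}

# Sample input: the two lines captured above.
# On a live system the input could come from:
#   tcpdump -l -i ens160 -nn -c 10 vrrp | vrrp_senders
vrrp_senders <<'EOF'
10:11:18.356859 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:11:19.357352 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
EOF
```

If only one sender with prio 100 shows up while its interface is supposedly down, the backup is still hearing the master and will stay in BACKUP state.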

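Test 1: suspend the VM
Suspend the VM. The capture shows the backup taking over about 4 seconds after the master's last advertisement.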
10:18:16.859199 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:18:17.860395 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:18:18.861724 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
-------------- failover begins --------------
10:18:22.823343 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
-------------- failover ends --------------
10:18:23.825414 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
10:18:24.826114 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
10:18:25.826719 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
10:18:26.827314 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
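The roughly 4-second gap in the capture matches the VRRPv2 timer from RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256. A small sketch of that arithmetic follows; the function name is invented for illustration, and the inputs are advert_int 1 and priority 10 from this setup.

```shell
#!/bin/bash
# master_down_interval (hypothetical helper):
#   $1 = advert_int in seconds, $2 = the backup's own priority
# Formula per RFC 3768: 3 * advert_int + (256 - priority) / 256
master_down_interval() {
    awk -v adv="$1" -v prio="$2" \
        'BEGIN { printf "%.3f\n", 3 * adv + (256 - prio) / 256 }'
}

master_down_interval 1 10    # prints 3.961
```

In the capture, the master's last advertisement is at 10:18:18.861724 and the backup's first is at 10:18:22.823343, a gap of about 3.96 seconds.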

Test 2: resume the master

The VIP has returned to the master
10:21:15.930309 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
------------- failover begins -------------
10:21:15.930829 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:21:15.930965 IP 172.16.9.68 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 10, authtype simple, intvl 1s, length 20
------------- failover ends -------------
10:21:15.931328 IP 172.16.9.67 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20

Related test: pinging the VIP during the switchover loses one packet.
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Request timed out.
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61
Reply from 172.16.9.69: bytes=32 time<1ms TTL=61

Interface addresses on the master
[[email protected] ~]#
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:9d:5d:16 brd ff:ff:ff:ff:ff:ff
    inet 172.16.9.67/24 brd 172.16.9.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 172.16.9.69/32 scope global ens160    ------ VIP address ------
       valid_lft forever preferred_lft forever

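---------- Failure analysis follows ----------
----- With this version of keepalived, neither machine brings up the VIP; that is the problem this case needs to solve -----

Master log: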
Sep 14 14:40:44 mysql01 NetworkManager[627]: <info>  [1505371244.4998] device (ens33): state change: activated -> deactivating (reason 'user-requested') [100 110 39]
Sep 14 14:40:44 mysql01 NetworkManager[627]: <info>  [1505371244.5006] manager: NetworkManager state is now DISCONNECTING  --- network is being shut down
Sep 14 14:40:44 mysql01 dbus[617]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Sep 14 14:40:44 mysql01 dbus-daemon: dbus[617]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Sep 14 14:40:44 mysql01 systemd: Starting Network Manager Script Dispatcher Service...
Sep 14 14:40:44 mysql01 NetworkManager[627]: <info>  [1505371244.6169] audit: op="device-disconnect" interface="ens33" ifindex=2 pid=126800 uid=0 result="success"
Sep 14 14:40:44 mysql01 NetworkManager[627]: <info>  [1505371244.6178] device (ens33): state change: deactivating -> disconnected (reason 'user-requested') [110 30 39]
Sep 14 14:40:44 mysql01 Keepalived_vrrp[126656]: Netlink reflector reports IP fe80::20c:29ff:fe5c:8574 removed
Sep 14 14:40:44 mysql01 Keepalived_healthcheckers[126655]: Netlink reflector reports IP fe80::20c:29ff:fe5c:8574 removed
Sep 14 14:40:44 mysql01 dbus[617]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Sep 14 14:40:44 mysql01 systemd: Started Network Manager Script Dispatcher Service.
Sep 14 14:40:44 mysql01 dbus-daemon: dbus[617]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Sep 14 14:40:44 mysql01 Keepalived_vrrp[126656]: Netlink reflector reports IP 192.168.142.138 removed  --- address removed
Sep 14 14:40:44 mysql01 Keepalived_vrrp[126656]: Netlink reflector reports IP 192.168.142.188 removed
Sep 14 14:40:44 mysql01 Keepalived_healthcheckers[126655]: Netlink reflector reports IP 192.168.142.138 removed
Sep 14 14:40:44 mysql01 Keepalived_healthcheckers[126655]: Netlink reflector reports IP 192.168.142.188 removed
Sep 14 14:40:44 mysql01 nm-dispatcher: req:1 'connectivity-change': new request (4 scripts)
Sep 14 14:40:44 mysql01 nm-dispatcher: req:1 'connectivity-change': start running ordered scripts...
Sep 14 14:40:44 mysql01 NetworkManager[627]: <info>  [1505371244.6525] manager: NetworkManager state is now DISCONNECTED
Sep 14 14:40:44 mysql01 nm-dispatcher: req:2 'down' [ens33]: new request (4 scripts)
Sep 14 14:40:44 mysql01 nm-dispatcher: req:2 'down' [ens33]: start running ordered scripts...
Sep 14 14:40:44 mysql01 chronyd[647]: Source 172.30.100.139 offline
Sep 14 14:40:44 mysql01 chronyd[647]: Can't synchronise: no selectable sources
Sep 14 14:40:50 mysql01 Keepalived_healthcheckers[126655]: TCP socket bind failed. Rescheduling.
Sep 14 14:40:56 mysql01 Keepalived_healthcheckers[126655]: TCP socket bind failed. Rescheduling.
Sep 14 14:41:02 mysql01 Keepalived_healthcheckers[126655]: TCP socket bind failed. Rescheduling.

A normal failover log looks like this:
[[email protected] ~]# !tail
tail -250 /var/log/messages|less
[[email protected] ~]# tail -f /var/log/messages
Sep 14 14:59:11 mysql01 chronyd[647]: System clock wrong by -3.886966 seconds, adjustment started
Sep 14 15:01:01 mysql01 systemd: Started Session 322 of user root.
Sep 14 15:01:01 mysql01 systemd: Starting Session 322 of user root.
Sep 14 15:32:51 mysql01 Keepalived[127003]: Stopping Keepalived v1.2.13 (11/05,2016)  --- normal stop
Sep 14 15:32:51 mysql01 Keepalived_vrrp[127005]: VRRP_Instance(VI_1) sending 0 priority  --- priority 0 is sent
Sep 14 15:32:51 mysql01 Keepalived_vrrp[127005]: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 14 15:32:51 mysql01 Keepalived_healthcheckers[127004]: Netlink reflector reports IP 192.168.142.188 removed
Sep 14 15:32:51 mysql01 Keepalived_healthcheckers[127004]: Removing service [192.168.142.138]:3310 from VS [192.168.142.188]:3310
Sep 14 15:32:51 mysql01 Keepalived_healthcheckers[127004]: IPVS: No such destination
Sep 14 15:32:51 mysql01 Keepalived_healthcheckers[127004]: IPVS: No such file or directory

Log on the backup:
Sep 14 15:32:51 mysql02 Keepalived_vrrp[124728]: VRRP_Instance(VI_1) Transition to MASTER STATE---
Sep 14 15:32:52 mysql02 Keepalived_vrrp[124728]: VRRP_Instance(VI_1) Entering MASTER STATE---
Sep 14 15:32:52 mysql02 Keepalived_vrrp[124728]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 14 15:32:52 mysql02 Keepalived_vrrp[124728]: VRRP_Instance(VI_1) Sending gratuitous ARPs on ens33 for 192.168.142.188--
Sep 14 15:32:52 mysql02 Keepalived_healthcheckers[124727]: Netlink reflector reports IP 192.168.142.188 added
Sep 14 15:32:57 mysql02 Keepalived_vrrp[124728]: VRRP_Instance(VI_1) Sending gratuitous ARPs on ens33 for 192.168.142.188

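2. Solution
Add a manual arbitration mechanism: ping the other side (the script below targets 172.16.9.69), and if it is unreachable, restart the keepalived process.

Arbitration script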
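#!/bin/bash
#------------------jeff v2------------------
IP=172.16.9.69
date="$(date '+%Y-%m-%d %H:%M:%S')"    # current timestamp
# Extract the packet-loss percentage from ping, to compare against 0 and 100
lost=$(ping -c 3 -w 3 "$IP" | grep 'packet loss' \
| awk -F'packet loss' '{ print $1 }' \
| awk '{ print $NF }' | sed 's/%//g')
lost=${lost:-100}    # no summary line (e.g. network unreachable) counts as 100% loss

if [ "$lost" -eq 0 ]   # no loss: log success
then
echo "$date ping is ok" >>/var/log/keepalived.log
elif [ "$lost" -lt 100 ]  # partial loss: log a warning
then
echo "$date ping is error" >>/var/log/keepalived_error.log
else                     # 100% loss: restart the service
systemctl restart keepalived
fi
#------------------jeff v2------------------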

Additions to the keepalived configuration
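#-----------add begin------------
vrrp_script cl {
    script "/opt/script/keepalive_check.sh"
    interval 5
    weight 120    # author's note: raise priority by 120 when the check detects a problem (keepalived applies a positive weight while the script exits 0)
}
#------------add end---------
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    virtual_router_id 51
    priority 10
    advert_int 1
#------------add check begin-----
   track_script {
    cl
   }
#-----------add check end-------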
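How the weight interacts with the configured priority can be sketched as plain arithmetic. This is an illustration of keepalived's documented rule (a positive weight is added while the tracked script exits 0, a negative weight is applied while it fails), not code taken from keepalived; the function name is hypothetical and the numbers are the ones from the configuration above.

```shell
#!/bin/bash
# effective_priority (hypothetical helper):
#   $1 = configured priority, $2 = vrrp_script weight, $3 = script exit status
effective_priority() {
    local prio=$1 weight=$2 status=$3
    if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        prio=$((prio + weight))    # positive weight: added while the check succeeds
    elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        prio=$((prio + weight))    # negative weight: subtracted while the check fails
    fi
    echo "$prio"
}

effective_priority 10 120 0    # prints 130, above the master's advertised 100
effective_priority 10 120 1    # prints 10
```

With priority 10 and weight 120, the backup only outranks a peer advertising 100 while its check script is returning 0, so the exit status of the check script matters as much as what it logs.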

Final version of the check script

#!/bin/bash
#------------------jeff v2-0914-----------------
IP=172.16.9.69
date="$(date '+%Y-%m-%d %H:%M:%S')"
lost=$(ping -c 3 -w 3 "$IP" | grep 'packet loss' \
| awk -F'packet loss' '{ print $1 }' \
| awk '{ print $NF }' | sed 's/%//g')
lost=${lost:-100}    # no summary line (e.g. network unreachable) counts as 100% loss

if [ "$lost" -eq 0 ]
then
echo "$date ping $IP  is ok" >>/var/log/keepalived.log
elif [ "$lost" -lt 100 ]
then
echo "$date ping $IP is error" >>/var/log/keepalived_error.log
else
echo "$date ping $IP is error" >>/var/log/keepalived_error.log
#systemctl restart keepalived
#pkill keepalived
fi
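The packet-loss extraction is the fragile part of the script, so it helps to isolate it in a function that can be tried against saved ping output. The following is a sketch only: parse_loss is a made-up name, and it assumes the Linux iputils summary line ("N% packet loss").

```shell
#!/bin/bash
# parse_loss (hypothetical helper): read ping output on stdin and print the
# integer packet-loss percentage. Prints 100 when there is no summary line,
# for example when ping fails with "Network is unreachable".
parse_loss() {
    local pct
    pct=$(sed -n 's/.* \([0-9][0-9]*\)\(\.[0-9]*\)\{0,1\}% packet loss.*/\1/p')
    echo "${pct:-100}"
}

echo "3 packets transmitted, 3 received, 0% packet loss, time 2001ms" | parse_loss
echo "connect: Network is unreachable" | parse_loss
# Live use: lost=$(ping -c 3 -w 3 "$IP" 2>&1 | parse_loss)
```

Treating missing output as 100% loss makes the "interface is down" case explicit instead of relying on a failed numeric comparison to fall through to the else branch.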

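3. Recovery procedure after the script has run

1 Restore the master server's network
2 On the master, run systemctl restart network
       systemctl restart keepalived
     On the backup, stop the process: pkill keepalived
3 ps aux |grep keepalive_check
4 Check the addresses with ip a
5 Confirm that the master has recovered
6 Start keepalived on the backup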
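Steps 4 and 5 can be turned into a small check that reports whether a node currently holds the VIP. This is a sketch under assumptions: has_vip is an invented helper, and it expects the one-line-per-address output of ip -o -4 addr show.

```shell
#!/bin/bash
# has_vip (hypothetical helper): $1 = VIP to look for.
# Reads `ip -o -4 addr show` output on stdin; succeeds if the VIP is present.
has_vip() {
    grep -qF "inet $1/"
}

# Sample input modelled on the master's interface listing shown earlier.
# Live use: ip -o -4 addr show dev ens160 | has_vip 172.16.9.69
if has_vip 172.16.9.69 <<'EOF'
2: ens160    inet 172.16.9.67/24 brd 172.16.9.255 scope global ens160
2: ens160    inet 172.16.9.69/32 scope global ens160
EOF
then
    echo "VIP present"
else
    echo "VIP missing"
fi
```

After recovery the check should report the VIP on the master and not on the backup.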

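Source: CCIE那點事 http://www.qdxgqk.live/ All rights reserved. Unless otherwise noted, articles on this site are the author's original work and may be quoted freely with attribution. Reproduction in full is prohibited.
Permalink: http://www.qdxgqk.live/?p=3562 (when reposting, please credit CCIE那點事)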