LVS outage caused by closing port 443 on the Nginx machines

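Today, while taking port 443 offline on Nginx, I found that the healthy service on port 80 went down as well. The log on the LVS in front of Nginx showed the following: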

Dec 20 16:44:04 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.54]:443.
Dec 20 16:44:04 lb16 Keepalived_healthcheckers: Removing service [10.0.11.54]:443 from VS [125.39.216.11_443]:0
Dec 20 16:44:04 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.71]:443.
Dec 20 16:44:04 lb16 Keepalived_healthcheckers: Removing service [10.0.11.71]:443 from VS [123.150.178.140_443]:0
Dec 20 16:44:04 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.71]:443.
Dec 20 16:44:04 lb16 Keepalived_healthcheckers: Removing service [10.0.11.71]:443 from VS [125.39.216.11_443]:0
Dec 20 16:44:09 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.54]:443.
Dec 20 16:44:09 lb16 Keepalived_healthcheckers: Removing service [10.0.11.54]:443 from VS [123.150.178.140_443]:0
Dec 20 16:44:09 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.56]:443.
Dec 20 16:44:09 lb16 Keepalived_healthcheckers: Removing service [10.0.11.56]:443 from VS [123.150.178.140_443]:0
Dec 20 16:44:13 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.56]:443.
Dec 20 16:44:13 lb16 Keepalived_healthcheckers: Removing service [10.0.11.56]:443 from VS [125.39.216.11_443]:0
Dec 20 16:44:23 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.25]:443.
Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Removing service [10.0.11.25]:443 from VS [123.150.178.140_443]:0
Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Lost quorum 1-0=1 > 0 for VS [123.150.178.140_443]:0
Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Executing [ip addr del 123.150.178.140/32 dev lo] for VS [123.150.178.140_443]:0
Dec 20 16:44:24 lb16 Keepalived_healthcheckers: SSL handshake/communication error connecting to server (openssl errno: 1) [10.0.11.25]:443.
Dec 20 16:44:24 lb16 Keepalived_healthcheckers: Removing service [10.0.11.25]:443 from VS [125.39.216.11_443]:0
Dec 20 16:44:24 lb16 Keepalived_healthcheckers: Lost quorum 1-0=1 > 0 for VS [125.39.216.11_443]:0
Dec 20 16:44:24 lb16 Keepalived_healthcheckers: Executing [ip addr del 125.39.216.11/32 dev lo] for VS [125.39.216.11_443]:0

 

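It turned out that LVS (through the Keepalived health checker seen in the log) checks port 443 on the backends, and once every one of those checks had failed it deleted the corresponding VIP. After reloading LVS, everything worked again.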
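To see that sequence at a glance, the log can be condensed per virtual server. Below is a minimal sketch in Python; the regular expressions are written only against the line formats quoted above, and the function name `summarize` and the `SAMPLE_LOG` list are my own illustration, not part of Keepalived.

```python
import re

# Patterns for the three Keepalived_healthcheckers message types quoted in the log.
REMOVED = re.compile(r"Removing service \[([\d.]+)\]:(\d+) from VS \[([^\]]+)\]")
LOST = re.compile(r"Lost quorum .* for VS \[([^\]]+)\]")
EXECUTED = re.compile(r"Executing \[(.+?)\] for VS \[([^\]]+)\]")


def summarize(lines):
    """Group removed backends, quorum loss, and executed commands by VS name."""
    summary = {}

    def entry(vs):
        return summary.setdefault(
            vs, {"removed": [], "lost_quorum": False, "executed": []}
        )

    for line in lines:
        m = REMOVED.search(line)
        if m:
            ip, port, vs = m.groups()
            entry(vs)["removed"].append(ip + ":" + port)
            continue
        m = LOST.search(line)
        if m:
            entry(m.group(1))["lost_quorum"] = True
            continue
        m = EXECUTED.search(line)
        if m:
            command, vs = m.groups()
            entry(vs)["executed"].append(command)
    return summary


# Four lines copied from the log above.
SAMPLE_LOG = [
    "Dec 20 16:44:04 lb16 Keepalived_healthcheckers: Removing service [10.0.11.54]:443 from VS [125.39.216.11_443]:0",
    "Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Removing service [10.0.11.25]:443 from VS [123.150.178.140_443]:0",
    "Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Lost quorum 1-0=1 > 0 for VS [123.150.178.140_443]:0",
    "Dec 20 16:44:23 lb16 Keepalived_healthcheckers: Executing [ip addr del 123.150.178.140/32 dev lo] for VS [123.150.178.140_443]:0",
]

if __name__ == "__main__":
    for vs, info in summarize(SAMPLE_LOG).items():
        print(vs, info)
```

Run against the full log, this shows each VS losing its backends one at a time, then losing quorum, then running the `ip addr del` command that takes the VIP away.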

This is a serious pitfall. It comes from LVS's automatic failover mechanism, which works like this:

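1. An LVS VIP can serve multiple ports. When the health checks for one port all fail (that is, that port is down on every backend), LVS deletes the VIP, and access to the other ports fails too.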

2. If a given port is already down on every backend at the time LVS is reloaded, the VIP is not deleted, and the services on the other ports are unaffected.
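The two behaviors can be pictured with a toy model. This is only a sketch of my reading of what I observed, not Keepalived's actual code: the class `VirtualServerModel`, the function `run`, and above all the rule that the VIP is removed only on a transition from "has quorum" to "lost quorum" are assumptions chosen to match the two points above. The quorum value of 1 and the `ip addr del` command are taken from the log.

```python
class VirtualServerModel:
    """Toy model of one health-checked VIP:port pair. Not Keepalived's real logic."""

    def __init__(self, vip, backends_up, quorum=1):
        self.vip = vip
        self.up = set(backends_up)
        self.quorum = quorum
        # Assumption: the state right after a (re)load reflects whichever
        # backends are up at that moment, and no action is run for it.
        self.has_quorum = len(self.up) >= quorum
        self.actions = []

    def mark_down(self, backend):
        """Record a failed health check for one backend."""
        self.up.discard(backend)
        # Assumption: the VIP is deleted only when quorum is lost,
        # i.e. on the transition from has_quorum=True to False.
        if self.has_quorum and len(self.up) < self.quorum:
            self.has_quorum = False
            self.actions.append("ip addr del %s/32 dev lo" % self.vip)


BACKENDS = ["10.0.11.54", "10.0.11.71", "10.0.11.56", "10.0.11.25"]


def run(backends_up_at_load):
    """Fail the port-443 check on every backend and return the commands run."""
    vs = VirtualServerModel("125.39.216.11", backends_up_at_load)
    for backend in BACKENDS:
        vs.mark_down(backend)
    return vs.actions


if __name__ == "__main__":
    # Behavior 1: port 443 is closed while LVS is running, so the VIP is deleted.
    print(run(BACKENDS))
    # Behavior 2: port 443 was already closed everywhere at reload, so nothing runs.
    print(run([]))
```

Because the deleted address is the whole VIP rather than a single port, behavior 1 takes port 80 down with it even though the port 80 checks never failed.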

So removing a port from the Nginx configuration must be done carefully: check LVS first, or it can turn into a major incident.
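One way to do that check before touching Nginx is to list every backend that LVS still health-checks on the port in question. The sketch below scans Keepalived configuration text for `real_server <ip> <port>` lines. It is a simple line-based scan rather than a full parser, and `SAMPLE_CONF` is a hypothetical fragment that I put together from addresses in the log (the port 80 group is invented for contrast), not my real configuration.

```python
import re

# Opening line of a virtual_server block, e.g. "virtual_server group NAME {".
VS_LINE = re.compile(r"^\s*virtual_server\s+(.+?)\s*\{")
# Opening line of a real_server block, e.g. "real_server 10.0.11.54 443 {".
RS_LINE = re.compile(r"^\s*real_server\s+(\S+)\s+(\d+)")


def real_servers_on_port(conf_text, port):
    """Return (virtual_server header, backend IP) pairs checked on the given port."""
    hits = []
    current_vs = None
    for line in conf_text.splitlines():
        m = VS_LINE.match(line)
        if m:
            current_vs = m.group(1)
            continue
        m = RS_LINE.match(line)
        if m and int(m.group(2)) == port:
            hits.append((current_vs, m.group(1)))
    return hits


# Hypothetical configuration fragment, for illustration only.
SAMPLE_CONF = """
virtual_server group 125.39.216.11_443 {
    real_server 10.0.11.54 443 {
    }
    real_server 10.0.11.71 443 {
    }
}
virtual_server group 125.39.216.11_80 {
    real_server 10.0.11.54 80 {
    }
}
"""

if __name__ == "__main__":
    for vs, ip in real_servers_on_port(SAMPLE_CONF, 443):
        print("still checked on 443:", vs, ip)
```

If the list is not empty, the corresponding entries should be taken out of LVS (followed by a reload) before the port is closed on Nginx.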

 
