2023-05-29
PostgreSQL高可用工具-repmgr-自动故障转移
集群安装配置:http://www.linuxmysql.com/14/2023/1194.htm
通过在主,备库上定制failover脚本文件,利用repmgrd守护进程实现自动切换
修改主备库上的repmgr.conf文件,添加故障自动切换参数
node13:
[root@node13 12]# cat repmgr.conf
node_id=1
node_name='node13'
conninfo='host=node13 port=5432 dbname=repmgr user=repmgr password=123456'
priority=3
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'
#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
node14:
[root@node14 12]# cat repmgr.conf
node_id=2
node_name='node14'
conninfo='host=node14 port=5432 dbname=repmgr user=repmgr password=123456'
priority=2
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'
#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
node15:
[root@node15 12]# cat repmgr.conf
node_id=3
node_name='node15'
conninfo='host=node15 port=5432 dbname=repmgr user=repmgr password=123456'
priority=1
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'
#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
在所有节点上重新启动repmgr服务:
su - postgres
repmgr node service --action=restart
所有节点重启repmgrd进程:
-bash-4.2$ kill -9 `ps -ef | grep repmgrd | awk '{print $2}'| head -n 1`
-bash-4.2$ repmgrd -d
[2023-05-28 17:19:21] [NOTICE] redirecting logging output to "/var/lib/pgsql/12/replogs/repmgr.log"
查看集群的主节点,现在在node14上面
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1 | node13 | standby | running | node14 | default | 3 | 2 | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2 | node14 | primary | * running | | default | 2 | 2 | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3 | node15 | standby | running | node14 | default | 1 | 2 | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4 | node16 | witness | * running | node14 | default | 0 | n/a | host=node16 port=5432 dbname=repmgr user=repmgr password=123456
在node14上关闭主库:
su - postgres
pg_ctl stop -m fast
过一会儿检查集群状态:
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1 | node13 | primary | * running | | default | 3 | 3 | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2 | node14 | primary | - failed | ? | default | 2 | | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3 | node15 | standby | running | node13 | default | 1 | 2 | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4 | node16 | witness | * running | node13 | default | 0 | n/a | host=node16 port=5432 dbname=repmgr user=repmgr password=123456
WARNING: following issues were detected
- unable to connect to node "node14" (ID: 2)
HINT: execute with --verbose option to see connection error messages
此时node13提升为主库,目前节点node14为不可用状态
如果再启动node14:
-bash-4.2$ pg_ctl -D $PGDATA start
则会出现两个主库的情况,但会提示node14节点不可用
此时需要关闭node14上的repmgr服务,并重新加入到集群中:
-bash-4.2$ kill -9 `ps -ef | grep repmgrd | awk '{print $2}'| head -n 1`
停止数据库:
pg_ctl stop -m fast
重新加入到集群:
-bash-4.2$ repmgr -D$PGDATA -d repmgr -p5432 -Urepmgr standby clone node13 -F
启动数据库:
pg_ctl start
-bash-4.2$ repmgr standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/70020A0, follow target lsn is 0/70020A0
NOTICE: setting node 2's upstream to node 1
NOTICE: stopping server using "/usr/bin/pg_ctl -D '/var/lib/pgsql/12/data' -w -m fast stop"
NOTICE: starting server using "/usr/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' start"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node13" (ID: 1)
-bash-4.2$ repmgrd -d
[2023-05-28 17:33:38] [NOTICE] redirecting logging output to "/var/lib/pgsql/12/replogs/repmgr.log"
查看集群状态:集群恢复正常
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1 | node13 | primary | * running | | default | 3 | 3 | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2 | node14 | standby | running | node13 | default | 2 | 3 | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3 | node15 | standby | running | node13 | default | 1 | 3 | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4 | node16 | witness | * running | node13 | default | 0 | n/a | host=node16 port=5432 dbname=repmgr user=repmgr password=123456
评论
发表评论
姓 名: