PostgreSQL高可用工具-repmgr-自动故障转移


集群安装配置:http://www.linuxmysql.com/14/2023/1194.htm

通过在主,备库上定制failover脚本文件,利用repmgrd守护进程实现自动切换

修改主备库上的repmgr.conf文件,添加故障自动切换参数
node13:
[root@node13 12]# cat repmgr.conf
node_id=1
node_name='node13'
conninfo='host=node13 port=5432 dbname=repmgr user=repmgr password=123456'
priority=3
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'


#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'


node14:
[root@node14 12]# cat repmgr.conf
node_id=2
node_name='node14'
conninfo='host=node14 port=5432 dbname=repmgr user=repmgr password=123456'
priority=2
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'
#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'


node15:
[root@node15 12]# cat repmgr.conf
node_id=3
node_name='node15'
conninfo='host=node15 port=5432 dbname=repmgr user=repmgr password=123456'
priority=1
reconnect_attempts=6
reconnect_interval=10
data_directory='/var/lib/pgsql/12/data'
pg_bindir='/usr/bin'
config_directory='/var/lib/pgsql/12/data'
log_level=INFO
log_facility=STDERR
log_file='/var/lib/pgsql/12/replogs/repmgr.log'
#以下为新添加的参数
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'


在所有节点上重新启动repmgr服务:
su - postgres
repmgr node service --action=restart


所有节点重启repmgrd进程:
-bash-4.2$ kill -9 `ps -ef | grep repmgrd | awk '{print $2}'| head -n 1`
-bash-4.2$ repmgrd -d
[2023-05-28 17:19:21] [NOTICE] redirecting logging output to "/var/lib/pgsql/12/replogs/repmgr.log"




查看集群的主节点,现在在node14上面
-bash-4.2$ repmgr cluster show
ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                             
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1  | node13 | standby |   running | node14   | default  | 3        | 2        | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2  | node14 | primary | * running |          | default  | 2        | 2        | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3  | node15 | standby |   running | node14   | default  | 1        | 2        | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4  | node16 | witness | * running | node14   | default  | 0        | n/a      | host=node16 port=5432 dbname=repmgr user=repmgr password=123456


在node14上关闭主库:
su - postgres
pg_ctl stop -m fast


过一会儿检查集群状态:
-bash-4.2$ repmgr cluster show
ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                             
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1  | node13 | primary | * running |          | default  | 3        | 3        | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2  | node14 | primary | - failed  | ?        | default  | 2        |          | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3  | node15 | standby |   running | node13   | default  | 1        | 2        | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4  | node16 | witness | * running | node13   | default  | 0        | n/a      | host=node16 port=5432 dbname=repmgr user=repmgr password=123456


WARNING: following issues were detected
  - unable to connect to node "node14" (ID: 2)


HINT: execute with --verbose option to see connection error messages


此时node13提升为主库,目前节点node14为不可用状态




如果再启动node14:
-bash-4.2$ pg_ctl -D $PGDATA start


则会出现两个主库的情况,但会提示node14节点不可用




此时需要关闭node14上的repmgr服务,并重新加入到集群中:
-bash-4.2$ kill -9 `ps -ef | grep repmgrd | awk '{print $2}'| head -n 1`


停止数据库:
pg_ctl stop -m fast


重新加入到集群:
-bash-4.2$ repmgr -D$PGDATA -d repmgr -p5432 -Urepmgr standby clone node13  -F


启动数据库:
pg_ctl start


-bash-4.2$ repmgr standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/70020A0, follow target lsn is 0/70020A0
NOTICE: setting node 2's upstream to node 1
NOTICE: stopping server using "/usr/bin/pg_ctl  -D '/var/lib/pgsql/12/data' -w -m fast stop"
NOTICE: starting server using "/usr/bin/pg_ctl  -w -D '/var/lib/pgsql/12/data' start"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node13" (ID: 1)


-bash-4.2$ repmgrd -d
[2023-05-28 17:33:38] [NOTICE] redirecting logging output to "/var/lib/pgsql/12/replogs/repmgr.log"


查看集群状态:集群恢复正常
-bash-4.2$ repmgr cluster show
ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                             
----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1  | node13 | primary | * running |          | default  | 3        | 3        | host=node13 port=5432 dbname=repmgr user=repmgr password=123456
2  | node14 | standby |   running | node13   | default  | 2        | 3        | host=node14 port=5432 dbname=repmgr user=repmgr password=123456
3  | node15 | standby |   running | node13   | default  | 1        | 3        | host=node15 port=5432 dbname=repmgr user=repmgr password=123456
4  | node16 | witness | * running | node13   | default  | 0        | n/a      | host=node16 port=5432 dbname=repmgr user=repmgr password=123456


 


















分割线
感谢打赏
江西数库信息技术有限公司
YWSOS.COM 平台代运维解决方案
 评论
 发表评论
姓   名:

Powered by AKCMS