客户的一套生产环境采用的架构是Oracle ADG + Keepalived,近期需要进行切换演练,要求我这边保障。ADG本身切换倒没啥可说的,但引入keepalived软件,就需要提前研究下这个架构。其实看了下环境配置,整体思路也非常简单,说白了就是利用keepalived软件引入一个VIP,应用侧只需配置连接这个VIP即可。
依据当前生产环境架构模拟了一套自己的测试环境。
1.Keepalived相关配置
关于Keepalived软件的配置和编译安装,可以参考之前《MySQL主主+Keepalived架构安装部署》中Keepalived安装部署章节。
除了利用keepalived软件引入一个VIP,还有一些配置和脚本,脱敏如下:
--------------------------------------------------------
--节点1(192.168.1.124)keepalived.conf文件内容:
--------------------------------------------------------
[root@test04 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_script chk_dg_stats {
script "/etc/keepalived/check_dataguard.sh"
interval 2
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
interface eth0
mcast_src_ip 192.168.1.124
virtual_router_id 131
priority 101
inopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 888888
}
virtual_ipaddress {
192.168.1.131
}
track_script {
chk_dg_stats
}
}
--------------------------------------------------------
--节点2(192.168.1.125)keepalived.conf文件内容:
--------------------------------------------------------
[root@test05 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_script chk_dg_stats {
script "/etc/keepalived/check_dataguard.sh"
interval 2
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
mcast_src_ip 192.168.1.125
virtual_router_id 131
priority 99
inopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 888888
}
virtual_ipaddress {
192.168.1.131
}
track_script {
chk_dg_stats
}
}
--------------------------------------------------------
--所有节点配置脚本check_dataguard.sh,并确认具有x执行权限:
--------------------------------------------------------
# cat /etc/keepalived/check_dataguard.sh
#!/bin/bash
dbstats=`ps -ef | grep ora_smon | grep -v grep | wc -l`
dgstats=`ps -ef | grep ora_mrp | grep -v grep | wc -l`
if [ "${dbstats}" -eq 0 ]; then
/etc/init.d/keepalived stop
elif [[ "${dbstats}" -gt 0 ]] && [[ "${dgstats}" -gt 0 ]]; then
/etc/init.d/keepalived stop
fi
说明:脚本check_dataguard.sh主要通过对ora_smon和ora_mrp进程的监控,判断哪种场景下该关闭keepalived服务:
场景1:当不存在ora_smon进程时(数据库实例Crash);
场景2:存在ora_smon进程同时存在ora_mrp进程时(已启动mrp进程的备库)。
--添加x执行权限:
chmod u+x /etc/keepalived/check_dataguard.sh
[root@test04 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 282 Jul 14 22:35 /etc/keepalived/check_dataguard.sh
[root@test05 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 281 Jul 14 22:36 /etc/keepalived/check_dataguard.sh
2.ADG手工切换步骤
1)在switchover正式切换前先在主库上手工切换几次日志,确认DG备库同步正常:
--PRIMARY(主库192.168.1.124)切换几次日志:
SQL>
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
--Standby (备库192.168.1.125)需确认同步正常没有延迟:
SQL>
select * from v$dataguard_stats;
2)主库切换为备库
-- 在PRIMARY(主库192.168.1.124)查询,确认可切换为备库:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database;
-- 在PRIMARY(主库192.168.1.124)操作,切换为备库:
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
3)备库切换为主库
-- 在Standby(备库192.168.1.125)查询,确认可切换为主库:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database;
-- 在Standby(备库192.168.1.125)操作,切换为主库(根据SWITCHOVER_STATUS值确认用下面哪个命令):
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
4)新主库open,新备库启动并开启MRP,新主库启动keepalived服务
--NEW PRIMARY(新主库192.168.1.125)数据库从mount启动到open状态:
ALTER DATABASE OPEN;
--NEW STANDBY(新备库192.168.1.124)数据库startup启动,开启DG日志应用:
STARTUP
RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
--确认NEW STANDBY(新备库192.168.1.124)DG同步正常,没有延迟:
SQL>
select * from v$dataguard_stats;
5) 新主库启动keepalived服务
--NEW PRIMARY(新主库192.168.1.125)OS层root用户启动keepalived服务:
# /etc/init.d/keepalived start
注意:当演练结束后,若需要switchover主备再次切换,只需要按上面规范步骤重复操作即可(注意主备角色的转换)。
3.VIP和监听的关系
源于最早的一次面试,两个节点的RAC,节点1主机Crash,此时应用通过节点1的VIP是否可以连接到数据库?为什么?
我们都知道节点1主机Crash,其VIP会自动漂移节点2,ping这个IP也是通的,但是通过其连接数据库却不行!会报一个没有监听(ORA-12541: TNS:no listener)的错误。
具体可参考:RAC 某节点不可用时,对应VIP是否可用
那这里的环境,同样是VIP的设置,为何却可以通过VIP(192.168.1.131)连接呢?
[oracle@test03 ~]$ sqlplus sys/oracle@192.168.1.131/demo as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Jul 14 23:45:23 2020
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
实际验证,是因为这里主备库的监听配置统一都是主机名:
[oracle@test04 admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = test04)(PORT = 1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
[oracle@test05 admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = test05)(PORT = 1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = jingyus)
(ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME = jingyu)
)
)
如果将主机名修改为具体的IP地址,则测试同样会报错(ORA-12541: TNS:no listener)。