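Background: This article is based on a real customer test requirement. In the customer's environment, the host running OGG has only a GI cluster, and the database is deployed elsewhere, so some details differ from a typical setup, but the core approach is the same. Everything here has been fully tested and is written up for reference.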
1. Preparation
RAC environment
DB: 19.12.0
GI: 19.12.0
OS: RHEL 7.6 or later, or Oracle Linux 7.7 or later
OGG software
Oracle GoldenGate 19.1.0.0.4 for Oracle on Linux x86-64
XAG software
Patch 31215432: XAG 10.2 BUG FIX MLR
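The latest RU at the time of writing is 19.12. Download the matching latest OPatch as well, then use it to apply the 19.12 patch.
After the 19.12 RU is applied successfully, check that the ACFS kernel modules are loaded correctly.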
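The module check can be sketched as follows. On Linux the ACFS/ADVM drivers normally show up in `lsmod` as `oracleoks`, `oracleadvm` and `oracleacfs`. The helper name `missing_acfs_modules` is hypothetical and exists only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsmod` output on stdin and print any of the
# three ACFS/ADVM kernel modules that are not loaded.
missing_acfs_modules() {
    local loaded mod
    # The first column of lsmod output is the module name.
    loaded=$(awk '{print $1}')
    for mod in oracleoks oracleadvm oracleacfs; do
        grep -qx "$mod" <<<"$loaded" || echo "$mod"
    done
}

# Typical use on a cluster node (no output means all three are loaded):
#   lsmod | missing_acfs_modules
#   $GRID_HOME/bin/acfsdriverstate loaded
```

If a module is reported missing after patching, `acfsdriverstate supported` and `acfsdriverstate installed` are reasonable next checks before looking at the kernel version.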
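2. Create the ACFS file system
Because this environment has only the GI grid user, the ACFS file system owner and group are set to grid and oinstall.
Launch the ASMCA GUI to create the ACFS file system; as long as the GUI displays properly, this step usually goes smoothly.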
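Where no GUI is available, the same result can usually be reached from the command line. The sketch below is not what was run in this test: the 10G size and the `-123` device suffix are placeholders, and the diskgroup, volume and mount point are inferred from the resource name `ora.data.oggsou.acfs` that appears later. The helper `volume_device` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `asmcmd volinfo` output on stdin and print the
# path shown on the "Volume Device:" line.
volume_device() {
    awk -F': *' '/Volume Device:/ {print $2; exit}'
}

# Assumed command-line flow (size and names are placeholders):
#   asmcmd volcreate -G DATA -s 10G oggsou                 # as grid
#   dev=$(asmcmd volinfo -G DATA oggsou | volume_device)   # e.g. /dev/asm/oggsou-123
#   /sbin/mkfs -t acfs "$dev"                              # as root
#   srvctl add filesystem -device "$dev" -path /oggsou -user grid   # as root
#   srvctl start filesystem -device "$dev"
```

Reading the device path from `volinfo` avoids hard-coding the numeric suffix, which ADVM assigns and which differs between clusters.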
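3. Install the GoldenGate software
For this installation, select Oracle GoldenGate for Oracle Database 19c.
Because everything is installed as the grid user, the database location in the installer must be changed manually to the GRID_HOME path. GRID_HOME then also serves as the Oracle client, so no separate client installation is needed.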
--ogg install
[grid@db193 media]$ unzip V983658-01.zip
[grid@db193 ~]$ cd /u01/media/fbo_ggs_Linux_x64_shiphome/Disk1/
[grid@db193 Disk1]$ ls
install response runInstaller stage
[grid@db193 Disk1]$ ./runInstaller
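The installation succeeded. Note in particular that the default ORACLE_HOME value in the GUI was changed manually!
This change was needed only because this customer's setup is unusual: there is no oracle user and no database software directory.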
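The same ORACLE_HOME override can be expressed in a silent installation by editing a copy of the response file template shipped in the `response` directory. This is a sketch only: the GUI was used in this test, the helper `set_rsp_key` is hypothetical, and the key names in the usage comments are the ones I recall from the shipped `oggcore.rsp`, so confirm them against your copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a response file, replacing an
# existing "KEY=" line or appending one if the key is absent.
set_rsp_key() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite through a temporary file so this works without `sed -i`.
        sed "s|^${key}=.*|${key}=${value}|" "$file" >"$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >>"$file"
    fi
}

# Assumed usage, run from the Disk1 directory as grid:
#   cp response/oggcore.rsp /tmp/oggcore.rsp
#   set_rsp_key /tmp/oggcore.rsp INSTALL_OPTION ORA19c
#   set_rsp_key /tmp/oggcore.rsp SOFTWARE_LOCATION /oggsou
#   set_rsp_key /tmp/oggcore.rsp DATABASE_LOCATION /u01/app/19.3.0/grid
#   ./runInstaller -silent -responseFile /tmp/oggcore.rsp
```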
4. Install the XAG software
Unzip the XAG media, create the XAG directory, install XAG, and set the environment variables:
[root@db193 media]# ls -lrth
total 531M
-rwxr-xr-x 1 root root 213K Sep 14 09:23 p31215432_190000_Generic.zip
-rw-r--r-- 1 root root 531M Sep 14 09:24 V983658-01.zip
For convenience, set the GRID_HOME variable for both the root and grid users:
export GRID_HOME=/u01/app/19.3.0/grid
For the XAG installation, first create the installation directory:
[root@db195 ~]# cd /u01/app
[root@db195 app]# mkdir xag
[root@db195 app]# chown grid:oinstall xag
Note: make sure the xag directory has been created on every node, with identical and correct ownership and permissions.
xagsetup.sh --install --directory <installdir> [{--nodes <node1,node2[,...]> | --all_nodes}]
xagsetup.sh --install --directory /u01/app/xag --all_nodes
[grid@db193 media]$ unzip p31215432_190000_Generic.zip
[grid@db193 xag]$ pwd
/u01/media/xag
[grid@db193 xag]$ ./xagsetup.sh --install --directory /u01/app/xag --all_nodes
Installing Oracle Grid Infrastructure Agents on: db193
Installing Oracle Grid Infrastructure Agents on: db195
Updating XAG resources.
Successfully updated XAG resources.
Set the environment variable:
export XAG_HOME=/u01/app/xag
Also add $XAG_HOME/bin to the PATH variable so the tools are easy to call.
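A minimal profile fragment for both variables might look like this. The paths are the ones used in this walkthrough; the `path_prepend` helper is hypothetical. `$XAG_HOME/bin` is deliberately placed ahead of `$GRID_HOME/bin`, on the assumption that Grid Infrastructure may ship its own bundled `agctl` that would otherwise be found first.

```shell
#!/usr/bin/env bash
# Values taken from this walkthrough; adjust to your own paths.
export GRID_HOME=/u01/app/19.3.0/grid
export XAG_HOME=/u01/app/xag

# Hypothetical helper: prepend a directory to PATH only if it is not
# already present, so re-sourcing the profile does not grow PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$GRID_HOME/bin"
path_prepend "$XAG_HOME/bin"   # added last, so it ends up first in PATH
export PATH
```

After sourcing the fragment, `command -v agctl` should resolve to the copy under `$XAG_HOME/bin`.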
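5. Add the OGG resource to the cluster
The procedure is the same on the source and target clusters. In this deployment the database to be configured is not in this cluster, which has only the GI software and the grid user:
5.1 Choose an unused IP address and add it as a VIP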
[grid@db193 admin]$ $GRID_HOME/bin/crsctl stat res -p |grep -ie .network -ie subnet |grep -ie name -ie subnet
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="asmnetwork" type="ResList">ora.asmnet1.asmnetwork</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="asmnetwork" type="ResList">ora.asmnet1.asmnetwork</Arg></xml>
SUBNET=10.10.1.0
REGISTRATION_INVITED_SUBNETS=
NAME=ora.asmnet1.asmnetwork(ora.asmgroup)
USR_ORA_SUBNET=10.10.1.0
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
NAME=ora.net1.network
USR_ORA_SUBNET=192.168.1.0
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
[root@db193 media]# $GRID_HOME/bin/appvipcfg create -network=1 -ip=192.168.1.198 -vipname=xag.gg_1-vip.vip -user=grid
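The long grep above is only used to find the public network and its subnet. A narrower query against the network resource itself gives the same answer with less noise. The helper `public_subnet` below is a hypothetical sketch that reads that output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crsctl stat res <network resource> -p` output
# on stdin and print the value of USR_ORA_SUBNET.
public_subnet() {
    awk -F= '$1 == "USR_ORA_SUBNET" {print $2; exit}'
}

# Assumed usage before creating the VIP:
#   $GRID_HOME/bin/crsctl stat res ora.net1.network -p | public_subnet
#   ping -c 2 -W 1 192.168.1.198   # should get no reply if the address is free
```

The ping is only a quick sanity check; an address that does not answer may still be reserved, so confirm with whoever manages the subnet.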
5.2 Grant the grid user permissions on the VIP resource
[root@db193 media]# $GRID_HOME/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
5.3 Start the VIP and check its status
Start the VIP resource:
[grid@db193 ~]$ $GRID_HOME/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'db193'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'db193' succeeded
Check the VIP resource status:
[grid@db193 ~]$ $GRID_HOME/bin/crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appviptypex2.type
TARGET=ONLINE
STATE=ONLINE on db193
5.4 Add the GoldenGate instance and check its status
[grid@db193 grid]$ $XAG_HOME/bin/agctl add goldengate gg_1 --gg_home /oggsou --instance_type source --nodes db193,db195 --vip_name xag.gg_1-vip.vip --filesystems ora.data.oggsou.acfs --oracle_home /u01/app/19.3.0/grid
Check the status:
[grid@db193 grid]$ $XAG_HOME/bin/agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
Start GoldenGate instance gg_1:
[grid@db193 grid]$ $XAG_HOME/bin/agctl start goldengate gg_1
5.5 Check resource status
[grid@db195 oggsou]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.OGGSOU.advm
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.chad
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.data.oggsou.acfs
ONLINE ONLINE db193 mounted on /oggsou,S
TABLE
ONLINE ONLINE db195 mounted on /oggsou,S
TABLE
ora.net1.network
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.ons
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.proxy_advm
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE db193 STABLE
2 ONLINE ONLINE db195 STABLE
ora.CRS.dg(ora.asmgroup)
1 ONLINE ONLINE db193 STABLE
2 ONLINE ONLINE db195 STABLE
ora.DATA.dg(ora.asmgroup)
1 ONLINE ONLINE db193 STABLE
2 ONLINE ONLINE db195 STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE db195 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE db193 Started,STABLE
2 ONLINE ONLINE db195 Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE db193 STABLE
2 ONLINE ONLINE db195 STABLE
ora.cvu
1 ONLINE ONLINE db195 STABLE
ora.db193.vip
1 ONLINE ONLINE db193 STABLE
ora.db195.vip
1 ONLINE ONLINE db195 STABLE
ora.jydb.cmdb1.svc
2 ONLINE ONLINE db195 STABLE
ora.jydb.db
1 ONLINE ONLINE db193 Open,HOME=/u01/app/o
racle/product/19.3.0
/db_1,STABLE
2 ONLINE ONLINE db195 Open,HOME=/u01/app/o
racle/product/19.3.0
/db_1,STABLE
ora.qosmserver
1 ONLINE ONLINE db195 STABLE
ora.scan1.vip
1 ONLINE ONLINE db195 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE db195 STABLE
xag.gg_1.goldengate
1 ONLINE ONLINE db195 STABLE
--------------------------------------------------------------------------------
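In a listing this long, the two lines that matter are the ones for the VIP and the GoldenGate resource. The hypothetical helper `resource_node` below sketches one way to pull the hosting node for a single-instance cluster resource out of the tabular output. It assumes the line after the resource name carries instance, target, state and server, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = resource name, stdin = `crsctl stat res -t` output.
# Prints the server hosting the resource, or nothing if it is OFFLINE.
# Intended for single-instance cluster resources such as xag.gg_1.goldengate.
resource_node() {
    awk -v res="$1" '
        $1 == res { found = 1; next }
        found     { if ($3 != "OFFLINE") print $4; exit }
    '
}

# Assumed usage:
#   crsctl stat res -t | resource_node xag.gg_1.goldengate
#   crsctl stat res -t | resource_node xag.gg_1-vip.vip
```

Both commands should print the same node, since the GoldenGate resource is tied to its VIP.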
5.6 Switchover test
Relocate from node db193 to node db195:
[grid@db193 grid]$ $XAG_HOME/bin/agctl relocate goldengate gg_1 --node db195
[grid@db193 grid]$ crsctl stat res -t
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1-vip.vip
1 ONLINE ONLINE db195 STABLE
xag.gg_1.goldengate
1 ONLINE ONLINE db195 STABLE
--------------------------------------------------------------------------------
Relocate from node db195 to node db193:
[grid@db193 grid]$ $XAG_HOME/bin/agctl relocate goldengate gg_1 --node db193
[grid@db193 grid]$ crsctl stat res -t
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1-vip.vip
1 ONLINE ONLINE db193 STABLE
xag.gg_1.goldengate
1 ONLINE ONLINE db193 STABLE
--------------------------------------------------------------------------------
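Both relocations work as expected.
Likewise, rebooting host db195 makes the OGG VIP and GoldenGate resource fail over to db193 automatically, and vice versa, which shows that GoldenGate high availability works.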
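The manual relocate test is easy to script for repeated runs. The sketch below wraps the two `agctl` calls used above in a hypothetical function, `relocate_and_verify`. It assumes `agctl` is on PATH and that the status line of a running instance contains "running on <node>"; check that wording against your own `agctl status` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: relocate a GoldenGate instance and confirm where it
# ended up. $1 = instance name, $2 = target node.
# Returns 0 only if agctl status reports the instance running on the target.
relocate_and_verify() {
    local inst=$1 target=$2
    agctl relocate goldengate "$inst" --node "$target" || return 1
    agctl status goldengate "$inst" | grep -qw "running on ${target}"
}

# Assumed usage for a round trip between the two nodes:
#   relocate_and_verify gg_1 db195 && relocate_and_verify gg_1 db193
```

A reboot test cannot be driven this way from the node being rebooted; run the status check from the surviving node instead.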
6. Starting and stopping OGG on RAC
6.1 Common commands for stopping OGG
1. Stop the GoldenGate resource
[grid@db195 oggsou]$ agctl stop goldengate gg_1
[grid@db195 oggsou]$ crsctl stat res xag.gg_1.goldengate -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1.goldengate
1 OFFLINE OFFLINE STABLE
--------------------------------------------------------------------------------
2. Stop the ACFS file system
[grid@db195 ~]$ srvctl stop filesystem -volume oggsou -diskgroup data
[grid@db195 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.data.oggsou.acfs
OFFLINE OFFLINE db193 admin unmounted /ogg
sou,STABLE
OFFLINE OFFLINE db195 admin unmounted /ogg
sou,STABLE
--------------------------------------------------------------------------------
3. Stop CRS
[root@db195 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'db195'
CRS-2673: Attempting to stop 'ora.crsd' on 'db195'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'db195'
<output omitted>
CRS-2677: Stop of 'ora.gipcd' on 'db195' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'db195' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'db195' has completed
CRS-4133: Oracle High Availability Services has been stopped.
4. Verify that CRS has stopped completely
[root@db195 ~]# crsctl stat res -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
6.2 Common commands for starting OGG
1. Start CRS
[root@db195 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
2. Start the ACFS file system
[grid@db195 ~]$ srvctl start filesystem -volume oggsou -diskgroup data
[grid@db195 ~]$ crsctl stat res ora.data.oggsou.acfs -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.data.oggsou.acfs
ONLINE ONLINE db193 mounted on /oggsou,S
TABLE
ONLINE ONLINE db195 mounted on /oggsou,S
TABLE
--------------------------------------------------------------------------------
3. Start the GoldenGate resource
[grid@db195 ~]$ agctl start goldengate gg_1
[grid@db195 ~]$ crsctl stat res xag.gg_1.goldengate -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1.goldengate
1 ONLINE ONLINE db193 STABLE
--------------------------------------------------------------------------------
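The ordering in 6.1 and 6.2 matters: GoldenGate stops before the file system it lives on, and starts after it. The grid-user part of both sequences can be sketched as a pair of functions. The names `run`, `ogg_stop` and `ogg_start` are hypothetical, and `crsctl stop has` / `crsctl start has` are left out because they must be run separately as root.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop order: GoldenGate resource first, then the ACFS file system under it.
ogg_stop() {
    run agctl stop goldengate gg_1 &&
        run srvctl stop filesystem -volume oggsou -diskgroup data
}

# Start order is the reverse: file system first, then GoldenGate.
ogg_start() {
    run srvctl start filesystem -volume oggsou -diskgroup data &&
        run agctl start goldengate gg_1
}
```

Setting `DRY_RUN=1` before calling either function prints the commands without touching the cluster, which is a cheap way to review the sequence before a maintenance window.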
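7. Additional notes
Individual OGG processes can also be placed under cluster monitoring, so that a failed process is clearly visible in the cluster resource status: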
[grid@db193 oggsou]$ agctl modify goldengate gg_1 --monitor_extracts extjy1
[grid@db193 oggsou]$ agctl config goldengate gg_1
Instance name: gg_1
Application GoldenGate location is: /oggsou
Goldengate MicroServices Architecture environment: no
GoldenGate instance type is: source
EXTRACT groups to monitor: extjy1
REPLICAT groups to monitor:
Critical EXTRACT groups:
Critical REPLICAT groups:
Autostart on DataGuard role transition to PRIMARY: no
Autostart JAgent: no
Configured to run on Nodes: db193 db195
ORACLE_HOME location is: /u01/app/19.3.0/grid
File System resources needed: ora.data.oggsou.acfs
VIP name: xag.gg_1-vip.vip
When a monitored process is not running, the status shows:
xag.gg_1-vip.vip
1 ONLINE ONLINE db195 STABLE
xag.gg_1.goldengate
1 ONLINE INTERMEDIATE db195 ER(s) not running :
EXTJY1,STABLE
--------------------------------------------------------------------------------
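Because the tabular view wraps the state details across lines, the KEY=VALUE form of the same information is easier to consume in a monitoring script. The hypothetical helper `gg_health` below sketches that idea. It assumes the verbose output of `crsctl stat res` contains `STATE=` and `STATE_DETAILS=` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crsctl stat res xag.gg_1.goldengate -v` output
# (KEY=VALUE lines) on stdin, print "STATE | STATE_DETAILS", and return
# non-zero unless the state starts with ONLINE.
gg_health() {
    awk -F= '
        $1 == "STATE"         { state = $2 }
        $1 == "STATE_DETAILS" { details = $2 }
        END {
            printf "%s | %s\n", state, details
            exit !(state ~ /^ONLINE/)
        }
    '
}

# Assumed usage, for example from cron:
#   crsctl stat res xag.gg_1.goldengate -v | gg_health || echo "GoldenGate needs attention"
```

With a monitored Extract down, the resource reports INTERMEDIATE rather than OFFLINE, so the check treats anything other than ONLINE as a problem.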
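The OGG mgr process can be configured to start the other processes automatically (AUTOSTART ER *). The OGG configuration used in testing is shown below for reference: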
GGSCI (db193) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXTJY1 00:00:02 00:00:00
GGSCI (db193) 2> view param mgr
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
PORT 7809
GGSCI (db193) 3> view param extjy1
EXTRACT extjy1
USERID ggs_admin@prod, PASSWORD ggs_admin
TRANLOGOPTIONS DBLOGREADER
EXTTRAIL ./dirdat/sa
TABLE JY.T_SECOND_P;
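In final testing, this environment handled every switchover scenario tried: manual relocate, automatic failover on a CRS cluster failure, and automatic failover on a direct host reboot.
In my experience, using XAG to configure OGG on RAC works very well and is worth adopting more widely. If you are interested, try it in a test environment of your own.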