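Background: I was helping a customer with a validation. The customer runs RHEL 7.6, while my environment is OEL 7.6. I assumed the difference would be minor, but I still ran into an ACFS compatibility problem, so I am recording it here.
Symptom: the asmca GUI shows nothing related to ACFS, so ACFS cannot be used.
At first I thought this would be a simple problem, since I had seen a similar symptom caused by a bug before. This time, however, even applying the latest RU did not help.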
[grid@db193 ~]$ lsmod|grep oracle
Still no output here, and retrying the installation fails with an error saying the current OS version is not supported:
[root@db193 bin]# pwd
/u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'
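This was very strange. The customer's RHEL 7.6 environment had run into some issues, but ACFS could at least be installed and used there. Is there some difference between the two?
Searching MOS for the OS platforms that ACFS supports: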
- ACFS Support On OS Platforms (Certification Matrix). (Doc ID 1369107.1)
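The list first pointed me to some known bugs, such as 27494830. However, this environment already has the latest RU applied, and when I checked those bugs, their fixes were all already installed: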
[grid@db193 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory |grep 27494830
22162072, 27494830, 27917085, 28064731, 28293236, 28321248, 28375150
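Checking bugs one grep at a time gets tedious when the note lists several. Below is a minimal POSIX sh sketch that reads `opatch lsinventory` output from stdin and prints any of the given bug numbers it cannot find. The function name `missing_bugs` is hypothetical, and a text match only shows that a fix is recorded in the inventory, not that it resolves this particular problem.

```shell
#!/bin/sh
# Hypothetical helper: print each bug number given as an argument that does
# NOT appear in "opatch lsinventory" output read from stdin.
missing_bugs() {
  inv=$(cat)                          # whole inventory text
  for bug in "$@"; do
    # -w: match the number as a whole word, not as part of a longer number
    if ! printf '%s\n' "$inv" | grep -qw -- "$bug"; then
      printf '%s\n' "$bug"
    fi
  done
}
```

Usage: `$ORACLE_HOME/OPatch/opatch lsinventory | missing_bugs 27494830`. Empty output means every listed bug number was found.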
Reading the MOS note again more carefully, I found that the supported OS versions actually differ from my environment:
All Updates, 4.14.35-1902 and later UEK 4.14.35 kernels
Looking it up, this actually corresponds to OEL 7.7, while mine is OEL 7.6 (UEK 4.14.35-1818), so it really is not supported:
[grid@db193 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'
ACFS-9201: Not Supported
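If this check is part of a larger validation script, the output can be reduced to one word. The sketch below relies only on the two message codes that appear in this post (ACFS-9200 and ACFS-9201); the function name `acfs_support_state` is hypothetical, and anything else is reported as "unknown" rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: classify "acfsdriverstate ... supported" output
# read from stdin, using only the codes seen in this post:
#   ACFS-9200: Supported    ACFS-9201: Not Supported
acfs_support_state() {
  out=$(cat)
  case $out in
    *ACFS-9201*) echo "not-supported" ;;
    *ACFS-9200*) echo "supported" ;;
    *)           echo "unknown" ;;
  esac
}
```

Usage: `acfsdriverstate -orahome $ORACLE_HOME supported | acfs_support_state`.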
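So why is the customer's RHEL 7.6 supported? What is the difference between the two?
As we know, OEL offers two kernels: UEK (Unbreakable Enterprise Kernel) and RHCK, the Red Hat Compatible Kernel. My environment boots UEK by default, and unfortunately the UEK kernel that ships with 7.6 does not support ACFS.
The test workload was heavy, so upgrading or reinstalling the OS was out of the question. Could I switch to the RHCK kernel instead? According to the list, RHEL 7.6 is a supported version: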
Update 6 3.10.0-957 and later 3.10.0 Red Hat Compatible kernels
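The two matrix rows quoted in this post can be compared against `uname -r` with a small sketch. It encodes only those two EL7 rows (UEK 4.14.35-1902 and later, RHCK 3.10.0-957 and later); the function name `acfs_kernel_hint` is hypothetical, any other kernel is reported as "unknown", and `acfsdriverstate supported` remains the authoritative check.

```shell
#!/bin/sh
# Hypothetical helper: compare a kernel release string with the two
# EL7 rows of Doc ID 1369107.1 quoted in this post:
#   UEK:  4.14.35-1902 and later 4.14.35 kernels
#   RHCK: 3.10.0-957 and later 3.10.0 kernels
acfs_kernel_hint() {
  rel=$1
  case $rel in
    4.14.35-*el7uek*) min=1902; rest=${rel#4.14.35-} ;;
    3.10.0-*.el7.*)   min=957;  rest=${rel#3.10.0-} ;;
    *) echo "unknown"; return ;;
  esac
  build=${rest%%.*}                 # e.g. "1818" from "1818.3.3.el7uek.x86_64"
  case $build in
    ''|*[!0-9]*) echo "unknown"; return ;;
  esac
  if [ "$build" -ge "$min" ]; then
    echo "listed"                   # inside the quoted matrix row
  else
    echo "below-minimum"
  fi
}
```

Usage: `acfs_kernel_hint "$(uname -r)"`. For the two kernels in this post it reports `below-minimum` for 4.14.35-1818.3.3.el7uek.x86_64 and `listed` for 3.10.0-957.el7.x86_64, matching what acfsdriverstate said.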
So I tried changing the kernel, following this MOS note:
- Change Booting Kernel From UEK to RHCK on OL 7.X IaaS Compute Instances (Doc ID 2248303.1)
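Some of its steps were unnecessary in my environment; in my testing only the following steps were needed:
-- Switch Oracle Linux from the UEK kernel to the RHCK kernel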
[root@db195 ~]# uname -a
Linux db195 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@db195 ~]# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : Oracle Linux Server (4.14.35-1818.3.3.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.6
1 : Oracle Linux Server (3.10.0-957.el7.x86_64 with Linux) 7.6
2 : Oracle Linux Server (0-rescue-06634a96d9af4acdaa83c9227d61a7f3 with Linux) 7.6
[root@db195 ~]# grub2-set-default 1
[root@db195 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.14.35-1818.3.3.el7uek.x86_64
Found initrd image: /boot/initramfs-4.14.35-1818.3.3.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-06634a96d9af4acdaa83c9227d61a7f3
Found initrd image: /boot/initramfs-0-rescue-06634a96d9af4acdaa83c9227d61a7f3.img
done
[root@db195 ~]# reboot
[root@db195 ~]# uname -a
Linux db195 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 1 00:13:43 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
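Above, index 1 was picked by reading the awk listing. A sketch of picking it automatically, reusing the same awk idea, is below. It assumes, as in the listing above, that UEK titles contain "Unbreakable Enterprise Kernel" and rescue titles contain "rescue"; the function name `rhck_entry_index` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the index of the first GRUB2 menuentry that is
# neither a UEK kernel nor a rescue image (the RHCK entry in the listing above).
# Optional argument: path to grub config (defaults to /etc/grub2.cfg).
rhck_entry_index() {
  awk -F"'" '
    BEGIN { i = 0 }
    $1 == "menuentry " {
      if ($2 !~ /Unbreakable Enterprise Kernel/ && $2 !~ /rescue/) { print i; exit }
      i++
    }' "${1:-/etc/grub2.cfg}"
}
```

Usage as root: `idx=$(rhck_entry_index) && [ -n "$idx" ] && grub2-set-default "$idx"`, followed by the same `grub2-mkconfig` and reboot as above. `grub2-editenv list` shows the resulting `saved_entry`; this assumes `GRUB_DEFAULT=saved` in /etc/default/grub, which I believe is the EL7 default.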
After switching to the RHCK kernel, check ACFS support again:
[root@db195 ~]# su - grid
上一次登录:二 9月 14 00:57:30 CST 2021
[grid@db195 ~]$ cd $ORACLE_HOME/bin
[grid@db195 bin]$ ./acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9200: Supported
Finally supported. Now check the ACFS modules again and retry the installation, which succeeds:
[root@db193 bin]# lsmod|grep oracle
[root@db193 bin]# cd /u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.
[root@db193 bin]# lsmod|grep oracle
oracleacfs 5184608 0
oracleadvm 1163390 0
oracleoks 757134 2 oracleacfs,oracleadvm
[root@db193 bin]#
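The empty-versus-populated `lsmod | grep oracle` check can be made explicit for all three drivers. This sketch (function name `acfs_modules_loaded` is hypothetical) only checks the kernel side; in this case the modules were loaded at this point and the file system start still failed later, so it is a necessary check, not a sufficient one.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if oracleoks, oracleadvm and oracleacfs
# all appear as module names in lsmod output read from stdin.
acfs_modules_loaded() {
  mods=$(awk '{print $1}')          # first column = module name
  for m in oracleoks oracleadvm oracleacfs; do
    printf '%s\n' "$mods" | grep -qx -- "$m" || { echo "missing: $m"; return 1; }
  done
  echo "all loaded"
}
```

Usage: `lsmod | acfs_modules_loaded`.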
After installing on all nodes, check the status:
[grid@db193 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE db193 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE db193 STABLE
ora.crf
1 ONLINE ONLINE db193 STABLE
ora.crsd
1 ONLINE ONLINE db193 STABLE
ora.cssd
1 ONLINE ONLINE db193 STABLE
ora.cssdmonitor
1 ONLINE ONLINE db193 STABLE
ora.ctssd
1 ONLINE ONLINE db193 ACTIVE:0,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE db193 STABLE
ora.gipcd
1 ONLINE ONLINE db193 STABLE
ora.gpnpd
1 ONLINE ONLINE db193 STABLE
ora.mdnsd
1 ONLINE ONLINE db193 STABLE
ora.storage
1 ONLINE ONLINE db193 STABLE
--------------------------------------------------------------------------------
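There is still no ACFS resource. I tried creating the file system with asmca anyway: the script run at the end failed, and starting the file system manually failed as well: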
[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85
PRCA-1138 : 无法启动一个或多个文件系统资源:
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db195' failed
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db193' failed
Next, try adding the ACFS drivers resource with acfsroot enable:
[root@db193 bin]# cd /u01/app/19.3.0/grid/bin/
[root@db193 bin]# ./acfsroot enable
ACFS-9376: Adding ADVM/ACFS drivers resource succeeded.
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'db193'
CRS-2676: Start of 'ora.drivers.acfs' on 'db193' succeeded
ACFS-9380: Starting ADVM/ACFS drivers resource succeeded.
Querying again now shows that ora.drivers.acfs exists.
Starting the file system again now succeeds:
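The presence check for ora.drivers.acfs in the `-init` resource list can be scripted. The sketch below (function name `acfs_drivers_resource_missing` is hypothetical) succeeds when the resource is absent, which was the state before `acfsroot enable`. It matches the resource name on its own line, as in the crsctl output shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if ora.drivers.acfs is ABSENT from
# "crsctl stat res -t -init" output read from stdin.
acfs_drivers_resource_missing() {
  ! grep -qE '^ora\.drivers\.acfs[[:space:]]*$'
}
```

Usage as root (path taken from this post): `crsctl stat res -t -init | acfs_drivers_resource_missing && /u01/app/19.3.0/grid/bin/acfsroot enable`.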
[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85
Checking the ACFS resources again shows they are now mounted normally:
[grid@db193 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.OGGSOU.advm
ONLINE ONLINE db193 STABLE
ONLINE ONLINE db195 STABLE
ora.data.oggsou.acfs
ONLINE ONLINE db193 mounted on /oggsou,S
TABLE
ONLINE ONLINE db195 mounted on /oggsou,S
TABLE
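To confirm the mount on every node without reading the table by eye, a sketch can count the "mounted on" lines under one resource. crsctl wraps long state details (the ",S" / "TABLE" split above), so the sketch matches only the "mounted on" fragment. The function name `acfs_mounted_count` is hypothetical, and it assumes resource names start a line with "ora.", as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: count nodes reporting "mounted on" for one resource
# in "crsctl stat res -t" output read from stdin.
acfs_mounted_count() {
  awk -v res="$1" '
    $1 == res             { inres = 1; next }  # start of the wanted block
    /^ora\./              { inres = 0 }        # next resource ends the block
    inres && /mounted on/ { n++ }
    END { print n + 0 }'
}
```

Usage: `crsctl stat res -t | acfs_mounted_count ora.data.oggsou.acfs`, which should print 2 for the two-node output above.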
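Finally, I rebooted both machines to verify that ACFS comes up automatically at boot, and it does. When I helped with the customer's issue earlier, I had added a service startup entry based on past experience. It turns out that when the steps are done the normal way, as above, no extra startup entry is needed, and checking confirms that none exists: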
[root@db193 system]# pwd
/etc/systemd/system
[root@db193 system]# ls -lrth
总用量 16K
drwxr-xr-x. 2 root root 44 7月 16 2019 system-update.target.wants
drwxr-xr-x. 2 root root 32 7月 16 2019 getty.target.wants
drwxr-xr-x. 2 root root 87 7月 16 2019 default.target.wants
drwxr-xr-x. 2 root root 35 7月 16 2019 local-fs.target.wants
drwxr-xr-x. 2 root root 38 7月 16 2019 dev-virtio\x2dports-org.qemu.guest_agent.0.device.wants
drwxr-xr-x. 2 root root 57 7月 16 2019 basic.target.wants
lrwxrwxrwx. 1 root root 37 7月 16 2019 default.target -> /lib/systemd/system/multi-user.target
drwxr-xr-x. 2 root root 51 7月 30 2019 sockets.target.wants
drwxr-xr-x. 2 root root 31 7月 30 2019 remote-fs.target.wants
drwxr-xr-x. 2 root root 4.0K 9月 9 2019 sysinit.target.wants
drwxr-xr-x 2 root root 34 9月 13 17:17 oracle-ohasd.service.d
-rw-r--r-- 1 root root 699 9月 13 17:17 oracle-ohasd.service
-rw-r--r-- 1 root root 452 9月 13 17:22 oracle-tfa.service
drwxr-xr-x. 2 root root 4.0K 9月 13 17:22 multi-user.target.wants
drwxr-xr-x 2 root root 60 9月 13 17:22 graphical.target.wants
[root@db193 system]#
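The "no extra startup entry" check can be repeated on other nodes with a name-based search. This is only a filename match (a unit could mount ACFS without "acfs" in its name), and the function name `acfs_units` is hypothetical; empty output corresponds to what the listing above shows.

```shell
#!/bin/sh
# Hypothetical helper: list entries below a systemd directory whose
# names contain "acfs" (case-insensitive). Empty output = none found.
acfs_units() {
  find "${1:-/etc/systemd/system}" -mindepth 1 -maxdepth 2 -iname '*acfs*'
}
```

Usage: `acfs_units` (defaults to /etc/systemd/system).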
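Quite a few lessons came out of doing this by hand. With newer releases, things still need to be verified in practice rather than judged from past experience alone. As the old saying goes: "What is learned from books is always shallow; to truly understand something, you must do it yourself."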