Switching Oracle Linux from the UEK Kernel to the RHCK Kernel to Fix an ACFS Compatibility Problem

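Background: I was helping a customer with a validation. The customer runs RHEL 7.6 and my test environment runs OEL 7.6. I assumed the difference would not matter, but I still ran into an ACFS compatibility problem, so I am recording it here.
Symptom: the asmca GUI shows nothing related to ACFS, so ACFS cannot be used.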

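At first I thought this was a simple problem. I had seen similar symptoms caused by a bug before, but this time applying the latest RU patch did not help.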

[grid@db193 ~]$ lsmod|grep oracle

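There is still no output here, and another install attempt still fails with an error saying the current OS version is not supported: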

[root@db193 bin]# pwd
/u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'

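This was very strange. The customer's RHEL 7.6 environment had hit a few anomalies, but ACFS could at least be installed and used there. Was there some difference?
I searched MOS for the OS platforms that ACFS supports: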

  • ACFS Support On OS Platforms (Certification Matrix). (Doc ID 1369107.1)

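At first the matrix pointed me to a few bugs, such as bug 27494830. But this environment already has the latest RU applied, and I checked that the fixes for these bugs are already installed: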

[grid@db193 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory |grep 27494830
     22162072, 27494830, 27917085, 28064731, 28293236, 28321248, 28375150

Reading the MOS note again more carefully, I found that the supported OS version actually differs from my environment:

All Updates, 4.14.35-1902 and later UEK 4.14.35 kernels 

I looked this up: it actually corresponds to OEL 7.7, whereas mine is OEL 7.6, so it really is not supported.

[grid@db193 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'
ACFS-9201: Not Supported
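
The comparison against the matrix can be sketched as a small shell check. This is only an illustration: `acfs_kernel_check` is a hypothetical helper name, it encodes nothing beyond the two matrix lines quoted in this post (the UEK 4.14.35 line and the Update 6 RHCK line), and `acfsdriverstate supported` remains the authoritative answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): classify a kernel release string
# using only the two support-matrix lines quoted in this post.
#   UEK : 4.14.35-1902 and later UEK 4.14.35 kernels
#   RHCK: 3.10.0-957 and later 3.10.0 kernels (the Update 6 line)
acfs_kernel_check() {
    local rel="$1" build
    case "$rel" in
        4.14.35-*uek*)
            build="${rel#4.14.35-}"   # e.g. 1818.3.3.el7uek.x86_64
            build="${build%%.*}"      # e.g. 1818
            if [ "$build" -ge 1902 ] 2>/dev/null; then
                echo "uek-ok"
            else
                echo "uek-too-old"
            fi
            ;;
        3.10.0-*)
            build="${rel#3.10.0-}"    # e.g. 957.el7.x86_64
            build="${build%%.*}"      # e.g. 957
            if [ "$build" -ge 957 ] 2>/dev/null; then
                echo "rhck-ok"
            else
                echo "rhck-too-old"
            fi
            ;;
        *)
            # Any other kernel series is outside what this sketch knows about.
            echo "unknown"
            ;;
    esac
}

# Report on the running kernel.
acfs_kernel_check "$(uname -r)"
```

A result of `uek-too-old` only says the build number is below the one named in the matrix line; the matrix itself also depends on the Grid Infrastructure release and patch level, which this sketch ignores.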

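Then why is the customer's RHEL 7.6 supported? Where is the difference?
Oracle Linux offers two kernels: the UEK kernel and the Red Hat Compatible Kernel (RHCK). My environment boots UEK by default, and unfortunately the UEK kernel that ships with 7.6 does not support ACFS.
The testing workload was heavy, though, and upgrading or reinstalling the OS was not an option. So I wondered whether I could switch to the RHCK kernel, because according to the matrix RHEL 7.6 is a supported version: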

Update 6 3.10.0-957 and later 3.10.0 Red Hat Compatible kernels 

So I tried changing the kernel, following this MOS note:

  • Change Booting Kernel From UEK to RHCK on OL 7.X IaaS Compute Instances (Doc ID 2248303.1)

Some of the steps are not needed in my environment. In my testing only the following steps were required:

-- Switch Oracle Linux from the UEK kernel to the RHCK kernel
[root@db195 ~]# uname -a
Linux db195 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@db195 ~]# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : Oracle Linux Server (4.14.35-1818.3.3.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.6
1 : Oracle Linux Server (3.10.0-957.el7.x86_64 with Linux) 7.6
2 : Oracle Linux Server (0-rescue-06634a96d9af4acdaa83c9227d61a7f3 with Linux) 7.6
[root@db195 ~]# grub2-set-default 1
[root@db195 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.14.35-1818.3.3.el7uek.x86_64
Found initrd image: /boot/initramfs-4.14.35-1818.3.3.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-06634a96d9af4acdaa83c9227d61a7f3
Found initrd image: /boot/initramfs-0-rescue-06634a96d9af4acdaa83c9227d61a7f3.img
done
[root@db195 ~]# reboot
[root@db195 ~]# uname -a
Linux db195 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 1 00:13:43 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
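
Picking the menu index by eye works for three entries. As a rough sketch, the same awk listing can be turned into a lookup that returns the index of the first entry that is neither UEK nor the rescue image. `rhck_menu_index` is a hypothetical helper name, and matching on the strings "Unbreakable Enterprise Kernel" and "rescue" is an assumption based on the menu titles shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the grub menu index of the first entry whose
# title mentions neither UEK nor the rescue image. Exits non-zero if none.
# $1 = path to the grub config (defaults to /etc/grub2.cfg as used above).
rhck_menu_index() {
    local cfg="${1:-/etc/grub2.cfg}"
    awk -F\' '
        BEGIN { i = 0 }
        $1 == "menuentry " {
            if ($2 !~ /Unbreakable Enterprise Kernel/ && $2 !~ /rescue/) {
                print i
                found = 1
                exit
            }
            i++
        }
        END { if (!found) exit 1 }
    ' "$cfg"
}

# Only print the index here; review it before passing it to
# grub2-set-default and regenerating the config as in the transcript above.
if [ -r /etc/grub2.cfg ]; then
    rhck_menu_index /etc/grub2.cfg || echo "no non-UEK entry found" >&2
fi
```

The sketch deliberately stops at printing the index, so that the change of default kernel stays a manual, reviewed step.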

After switching to the RHCK kernel, check again whether ACFS is supported:

[root@db195 ~]# su - grid
Last login: Tue Sep 14 00:57:30 CST 2021
[grid@db195 ~]$ cd $ORACLE_HOME/bin
[grid@db195 bin]$ ./acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9200: Supported

It is finally supported. Checking the ACFS modules again and retrying the install now succeeds:

[root@db193 bin]# lsmod|grep oracle
[root@db193 bin]# cd /u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.
[root@db193 bin]# lsmod|grep oracle
oracleacfs           5184608  0
oracleadvm           1163390  0
oracleoks             757134  2 oracleacfs,oracleadvm
[root@db193 bin]#
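
Since the install has to be repeated on every node, the `lsmod` check is worth scripting. The following is a minimal sketch that reads `lsmod`-style output and lists whichever of the three drivers named in the ACFS-9154 messages is missing; `missing_acfs_modules` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsmod`-style output on stdin and print each of
# the three ADVM/ACFS modules that does not appear in the first column.
missing_acfs_modules() {
    local loaded m
    loaded="$(awk '{ print $1 }')"
    for m in oracleoks oracleadvm oracleacfs; do
        printf '%s\n' "$loaded" | grep -qx "$m" || echo "$m"
    done
}

# Run against the live module list when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    missing="$(lsmod | missing_acfs_modules)"
    if [ -n "$missing" ]; then
        echo "missing ADVM/ACFS modules:" $missing
    else
        echo "all ADVM/ACFS modules loaded"
    fi
fi
```

Empty output from the function means all three modules are present, which matches the `lsmod` result shown above after a successful install.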

After installing on all nodes, check the status:

[grid@db193 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       db193                    Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       db193                    STABLE
ora.crf
      1        ONLINE  ONLINE       db193                    STABLE
ora.crsd
      1        ONLINE  ONLINE       db193                    STABLE
ora.cssd
      1        ONLINE  ONLINE       db193                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       db193                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       db193                    ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       db193                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       db193                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       db193                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       db193                    STABLE
ora.storage
      1        ONLINE  ONLINE       db193                    STABLE
--------------------------------------------------------------------------------

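There is still no ACFS resource at this point. I tried creating one with asmca, but the script it runs at the end failed, and starting the file system manually failed as well: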

[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85
PRCA-1138 : failed to start one or more file system resources:
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db195' failed
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db193' failed

Try adding the ACFS drivers resource with acfsroot enable:

[root@db193 bin]# cd /u01/app/19.3.0/grid/bin/
[root@db193 bin]# ./acfsroot enable
ACFS-9376: Adding ADVM/ACFS drivers resource succeeded.
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'db193'
CRS-2676: Start of 'ora.drivers.acfs' on 'db193' succeeded
ACFS-9380: Starting ADVM/ACFS drivers resource succeeded.
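
Whether the drivers resource is registered can also be checked from the `crsctl stat res -t -init` listing instead of reading it by eye. The sketch below assumes the tabular layout shown earlier, where each resource name sits alone on its own line; `has_init_resource` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the resource named in $1 appears as a
# stand-alone line in `crsctl stat res -t -init` output read from stdin.
has_init_resource() {
    awk -v name="$1" '
        NF == 1 && $1 == name { found = 1 }
        END { exit !found }
    '
}

# Example use on a cluster node, run as the grid user:
#   crsctl stat res -t -init | has_init_resource ora.drivers.acfs \
#       && echo "drivers resource present" \
#       || echo "drivers resource missing, consider acfsroot enable"
```

The example call is left as a comment because it only makes sense on a node where `crsctl` is on the path.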

Querying again shows that ora.drivers.acfs now exists.
Starting the filesystem again now succeeds:

[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85

Querying the ACFS resources again shows the file system mounted successfully:

[grid@db193 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.OGGSOU.advm
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.data.oggsou.acfs
               ONLINE  ONLINE       db193                    mounted on /oggsou,S
                                                             TABLE
               ONLINE  ONLINE       db195                    mounted on /oggsou,S
                                                             TABLE

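Finally I rebooted both machines to verify that ACFS starts automatically at boot, and it does. When I helped with a similar problem before, I had added a startup service entry based on past experience. It turns out that with the normal procedure above no startup entry needs to be configured, and none shows up when I look for one: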

[root@db193 system]# pwd
/etc/systemd/system
[root@db193 system]# ls -lrth
total 16K
drwxr-xr-x. 2 root root   44 Jul 16  2019 system-update.target.wants
drwxr-xr-x. 2 root root   32 Jul 16  2019 getty.target.wants
drwxr-xr-x. 2 root root   87 Jul 16  2019 default.target.wants
drwxr-xr-x. 2 root root   35 Jul 16  2019 local-fs.target.wants
drwxr-xr-x. 2 root root   38 Jul 16  2019 dev-virtio\x2dports-org.qemu.guest_agent.0.device.wants
drwxr-xr-x. 2 root root   57 Jul 16  2019 basic.target.wants
lrwxrwxrwx. 1 root root   37 Jul 16  2019 default.target -> /lib/systemd/system/multi-user.target
drwxr-xr-x. 2 root root   51 Jul 30  2019 sockets.target.wants
drwxr-xr-x. 2 root root   31 Jul 30  2019 remote-fs.target.wants
drwxr-xr-x. 2 root root 4.0K Sep  9  2019 sysinit.target.wants
drwxr-xr-x  2 root root   34 Sep 13 17:17 oracle-ohasd.service.d
-rw-r--r--  1 root root  699 Sep 13 17:17 oracle-ohasd.service
-rw-r--r--  1 root root  452 Sep 13 17:22 oracle-tfa.service
drwxr-xr-x. 2 root root 4.0K Sep 13 17:22 multi-user.target.wants
drwxr-xr-x  2 root root   60 Sep 13 17:22 graphical.target.wants
[root@db193 system]#

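I ran into quite a few things worth knowing in this exercise. Newer versions really do need hands-on verification rather than reliance on past experience alone. As the old saying goes: what is learned on paper always feels shallow; to truly understand something you must practice it yourself.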
