安装TFA用于快速收集RAC各类日志

TFA一般主要用于Oracle RAC环境一键收集需要的日志进行分析问题,解决传统人工收集集群、数据库等各类日志效率低下的问题。具体关于TFA的介绍,网上资料已经非常多,在此不再赘述。
TFA的安装也非常简单,从MOS(Autonomous Health Framework (AHF) – Including TFA and ORAchk/EXAChk (Doc ID 2550798.1) )下载最新的稳定版
TFA & ORAchk/EXAchk 20.1 for Linux,对应介质为:AHF-LINUX_v20.1.3.zip,解压后运行ahf_setup即可完成安装。

1.官方说明文档

安装包中README.txt内容如下:

Oracle Autonomous Health Framework (AHF) - with ORAchk, EXAchk & Trace File Analyzer
------------------------------------------------------------------------------------

Installation Instructions
-------------------------
This must be completed for each node of a cluster.

1) Copy the zip file to a target machine and unzip
2) Run the command: ahf_setup

For advanced installation options see the User Guide

User Guide:
---------------
The AHF User Guide can now be found online at https://docs.oracle.com/en/engineered-systems/health-diagnostics/autonomous-health-framework/ahfug/index.html


Enhancements and Bug Fixes
--------------------------
A full list of enhancements and bugs fixed in each release is available at:
https://support.oracle.com/epmos/main/downloadattachmentprocessor?attachid=2550798.1:AHF_VERSION_HISTORY

2.测试安装遇到问题

我在自己一套RHEL7.3 + Oracle 11.2.0.4 RAC的环境进行安装,结果过程中遇到错误:

Extracting AHF to /opt/oracle.ahf
Can't locate Data/Dumper.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . /opt/oracle.ahf/tfa/bin /opt/oracle.ahf/tfa/bin/common /opt/oracle.ahf/tfa/bin/modules /opt/oracle.ahf/tfa/bin/common/exceptions) at /opt/oracle.ahf/tfa/bin/common/tfactlshare.pm line 1350, <F1> line 1.
BEGIN failed--compilation aborted at /opt/oracle.ahf/tfa/bin/common/tfactlshare.pm line 1350, <F1> line 1.
Compilation failed in require at /opt/oracle.ahf/tfa/bin/tfasetup.pl line 107, <F1> line 1.
BEGIN failed--compilation aborted at /opt/oracle.ahf/tfa/bin/tfasetup.pl line 107, <F1> line 1.

Failed to start TFA Services

这导致最终TFA没有真正安装成功,去查看TFA状态或尝试启动TFA都会报相同错误:

[root@db01 bin]# tfactl start
TFA scheduler is not running
[root@db01 bin]# which tfactl
/usr/bin/tfactl
[root@db01 bin]# tfactl stop
TFA scheduler is not running

实际查询,是因为系统缺少包导致,需要yum install 'perl(Data::Dumper)'各节点安装,以节点1为例:

[root@db01 bin]# yum install 'perl(Data::Dumper)'
..省略部分输出..
Installed:
  perl-Data-Dumper.x86_64 0:2.145-3.el7                                                                                                                                                                       

Complete!

3.重新尝试安装TFA

再次尝试安装TFA,需要在各节点手工删除之前的安装路径,以节点1为例:

[root@db01 tmp]# rm -rf /opt/oracle.ahf/
[root@db01 tmp]# rm -rf /opt/app/grid/oracle.ahf/

注意:注意选择你自己安装TFA的路径。

安装TFA:

[root@db01 tmp]# ./ahf_setup 

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_9681_2020_06_25-17_50_19.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

Default AHF Location : /opt/oracle.ahf

Do you want to install AHF at [/opt/oracle.ahf] ? [Y]|N : Y

AHF Location : /opt/oracle.ahf

AHF Data Directory stores diagnostic collections and metadata.
AHF Data Directory requires at least 5GB (Recommended 10GB) of free space.

Choose Data Directory from below options : 

1. /opt/app/grid [Free Space : 27037 MB]
2. Enter a different Location

Choose Option [1 - 2] : 1

AHF Data Directory : /opt/app/grid/oracle.ahf/data

Do you want to add AHF Notification Email IDs ? [Y]|N : 

Enter Email IDs separated by space : xxdba@xx.com

AHF will also be installed/upgraded on these Cluster Nodes :

1. db02

The AHF Location and AHF Data Directory must exist on the above nodes
AHF Location : /opt/oracle.ahf
AHF Data Directory : /opt/app/grid/oracle.ahf/data

Do you want to install/upgrade AHF on Cluster Nodes ? [Y]|N : Y

Extracting AHF to /opt/oracle.ahf

Configuring TFA Services

Discovering Nodes and Oracle Resources

Not generating certificates as GI discovered

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.-------------------------------------------------------------------------.
| Host | Status of TFA | PID   | Port | Version    | Build ID             |
+------+---------------+-------+------+------------+----------------------+
| db01 | RUNNING       | 10850 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
'------+---------------+-------+------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.------------------------------------------------------------.
|                Summary of AHF Configuration                |
+-----------------+------------------------------------------+
| Parameter       | Value                                    |
+-----------------+------------------------------------------+
| AHF Location    | /opt/oracle.ahf                          |
| TFA Location    | /opt/oracle.ahf/tfa                      |
| Orachk Location | /opt/oracle.ahf/orachk                   |
| Data Directory  | /opt/app/grid/oracle.ahf/data            |
| Repository      | /opt/app/grid/oracle.ahf/data/repository |
| Diag Directory  | /opt/app/grid/oracle.ahf/data/db01/diag  |
'-----------------+------------------------------------------'


Starting orachk daemon from AHF ...

AHF install completed on db01

Installing AHF on Remote Nodes :

AHF will be installed on db02, Please wait.

Installing AHF on db02 :

[db02] Copying AHF Installer

[db02] Running AHF Installer

AHF binaries are available in /opt/oracle.ahf/bin

AHF is successfully installed

Moving /tmp/ahf_install_9681_2020_06_25-17_50_19.log to /opt/app/grid/oracle.ahf/data/db01/diag/ahf/

安装完成后可以查看到tfa的进程,另外值得一提的是,TFA中也集成了OSW,无需DBA再单独部署OSW:

[oracle@db01 ~]$ ps -ef|grep tfa
root     10732     1  0 Jun25 ?        00:00:02 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
oracle   12963 19247  0 01:44 pts/0    00:00:00 grep --color=auto tfa
root     16654     1  0 Jun25 ?        00:02:23 /opt/oracle.ahf/jre/bin/java -server -Xms32m -Xmx64m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:HeapDumpPath=/opt/app/grid/oracle.ahf/data/db01/diag/tfa oracle.rat.tfa.TFAMain /opt/oracle.ahf/tfa
[oracle@db01 ~]$ ps -ef|grep osw
oracle   12965 19247  0 01:44 pts/0    00:00:00 grep --color=auto osw
grid     16991     1  0 Jun25 ?        00:00:09 /bin/sh ./OSWatcher.sh 30 48 NONE /opt/app/grid/oracle.ahf/data/repository/suptools/db01/oswbb/grid/archive
grid     17259 16991  0 Jun25 ?        00:00:03 /bin/sh ./OSWatcherFM.sh 48 /opt/app/grid/oracle.ahf/data/repository/suptools/db01/oswbb/grid/archive
[oracle@db01 ~]$ 

4.使用TFA一键收集相关日志

最常用的比如说目前要收集最近5h内的所有相关日志,使用命令tfactl diagcollect -all -since 5h,这其中也包含了OSW的信息:

[oracle@db01 ~]$ /opt/oracle.ahf/tfa/bin/tfactl diagcollect -all -since 5h
The -all switch is being deprecated as collection of all components is the default behavior. TFA will continue to collect all components.
Collecting data for all nodes

Collection Id : 20200626012924db01

Detailed Logging at : /opt/app/grid/oracle.ahf/data/repository/collection_Fri_Jun_26_01_29_25_CST_2020_node_all/diagcollect_20200626012924_db01.log
2020/06/26 01:29:42 CST : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2020/06/26 01:29:42 CST : Collection Name : tfa_Fri_Jun_26_01_29_25_CST_2020.zip
2020/06/26 01:29:46 CST : Collecting diagnostics from hosts : [db01, db02]
2020/06/26 01:29:46 CST : Scanning of files for Collection in progress...
2020/06/26 01:29:46 CST : Collecting additional diagnostic information...
2020/06/26 01:30:06 CST : Getting list of files satisfying time range [06/25/2020 20:29:42 CST, 06/26/2020 01:30:06 CST]
2020/06/26 01:30:41 CST : Collecting ADR incident files...
2020/06/26 01:32:23 CST : Completed collection of additional diagnostic information...
2020/06/26 01:32:27 CST : Completed Local Collection
2020/06/26 01:32:28 CST : Remote Collection in Progress...
.---------------------------------.
|        Collection Summary       |
+------+-----------+-------+------+
| Host | Status    | Size  | Time |
+------+-----------+-------+------+
| db02 | Completed | 1.5MB | 126s |
| db01 | Completed | 2MB   | 161s |
'------+-----------+-------+------'

Logs are being collected to: /opt/app/grid/oracle.ahf/data/repository/collection_Fri_Jun_26_01_29_25_CST_2020_node_all
/opt/app/grid/oracle.ahf/data/repository/collection_Fri_Jun_26_01_29_25_CST_2020_node_all/db01.tfa_Fri_Jun_26_01_29_25_CST_2020.zip
/opt/app/grid/oracle.ahf/data/repository/collection_Fri_Jun_26_01_29_25_CST_2020_node_all/db02.tfa_Fri_Jun_26_01_29_25_CST_2020.zip

然后只需将/opt/app/grid/oracle.ahf/data/repository/collection_Fri_Jun_26_01_29_25_CST_2020_node_all目录下的文件:

[oracle@db01 collection_Fri_Jun_26_01_29_25_CST_2020_node_all]$ ls -lrth
total 3.6M
-rw-r--r-- 1 oracle oinstall  955 Jun 26 01:30 db01.tfa_Fri_Jun_26_01_29_25_CST_2020.zip.txt
-rw-r--r-- 1 oracle oinstall 1.6M Jun 26 01:32 db02.tfa_Fri_Jun_26_01_29_25_CST_2020.zip
-rw-r--r-- 1 oracle oinstall  921 Jun 26 01:32 db02.tfa_Fri_Jun_26_01_29_25_CST_2020.zip.txt
-rw-r--r-- 1 oracle oinstall 1.6K Jun 26 01:32 diagcollect_20200626012924_db02.log
-rw-r--r-- 1 oracle oinstall 2.1M Jun 26 01:32 db01.tfa_Fri_Jun_26_01_29_25_CST_2020.zip
-rw-r--r-- 1 oracle oinstall  993 Jun 26 01:32 diagcollect_console_20200626012924_db01.log
-rw-r--r-- 1 oracle oinstall 2.1K Jun 26 01:32 diagcollect_20200626012924_db01.log

按需下载下来即可。

Reference:

  • Autonomous Health Framework (AHF) – Including TFA and ORAchk/EXAChk (Doc ID 2550798.1)
  • SRDC – How to Collect Standard Information for a SQL Performance Problem Using TFA Collector (Recommended) or Manual Steps (Doc ID 2366043.1)
This entry was posted in 善事利器 and tagged . Bookmark the permalink.