Troubleshooting Oracle Clusterware

A cluster consists of two or more nodes, and almost any cluster-related issue writes a large amount of diagnostic output on all of the nodes. Diagnosing a clusterware issue means collecting logs, gathering configuration information and, if required, enabling event-level monitoring or tracing.

Let’s take a closer look at data collection.
** Oracle strongly recommends using TFA (Trace File Analyzer) for all clusterware issues, but this post doesn’t cover TFA.

Oracle version 11gR2 and above:
$ORACLE_HOME/bin/diagcollection.sh (diagcollection.pl in earlier versions)
It generates the following files:
- crsData_<hostname>.tar.gz
- ocrData_<hostname>.tar.gz
- oraData_<hostname>.tar.gz
- coreData_<hostname>.tar.gz (only when the --core option is specified)
- os_<hostname>.tar.gz
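
Before uploading anything to support, it is worth confirming that every expected archive was actually written. The sketch below checks for the file names listed above. check_diag_archives is a made-up helper name, and the directory and host name are passed in by hand because where the archives land depends on where the script was run:

```shell
#!/bin/sh
# Hypothetical helper: report which diagcollection archives exist in a directory.
# Usage: check_diag_archives <directory> <hostname>
check_diag_archives() {
    local dir="$1" host="$2" prefix missing=0
    # coreData is only produced when the core option is used, so it is not required here.
    for prefix in crsData ocrData oraData os; do
        if [ -f "$dir/${prefix}_${host}.tar.gz" ]; then
            echo "found   ${prefix}_${host}.tar.gz"
        else
            echo "missing ${prefix}_${host}.tar.gz"
            missing=1
        fi
    done
    # Exit status 0 only when all four archives are present.
    return "$missing"
}
```

For example, check_diag_archives /tmp/crsdiag node1 prints one line per archive and returns a non-zero status if any of them is missing.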

Using diagcollection.sh:
$GRID_HOME/bin/diagcollection.sh -collect -crs $GRID_HOME

./diagcollection.sh --help
--collect
[--crs] For collecting crs diag information
[--adr] For collecting diag information for ADR; specify ADR location
[--chmos] For collecting Cluster Health Monitor (OS) data
[--all] Default.For collecting all diag information. <<<>>>
[--core] Unix only. Package core files with CRS data
[--afterdate] Unix only. Collects archives from the specified date.
[--aftertime] Supported with -adr option. Collects archives after the specified
[--beforetime] Supported with -adr option. Collects archives before the specified
[--crshome] Argument that specifies the CRS Home location
[--incidenttime] Collects Cluster Health Monitor (OS) data from the specified
[--incidentduration] Collects Cluster Health Monitor (OS) data for the duration
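
Running the collection from a fresh directory per incident keeps archives from different runs apart. Here is a minimal sketch of such a wrapper. It reuses the -collect -crs invocation shown earlier; the function name collect_crs_diag, the crsdiag_ directory naming and the diagcollection.out log file are my own inventions, and it assumes the script writes its archives to the current working directory:

```shell
#!/bin/sh
# Hypothetical wrapper: run diagcollection from a new, timestamped directory.
# Usage: collect_crs_diag <output base directory>
collect_crs_diag() {
    local base="$1" workdir
    # Fail early if GRID_HOME is unset or the script is not where we expect it.
    [ -n "${GRID_HOME:-}" ] || { echo "GRID_HOME is not set" >&2; return 1; }
    [ -x "$GRID_HOME/bin/diagcollection.sh" ] || {
        echo "diagcollection.sh not found or not executable" >&2; return 1; }
    workdir="$base/crsdiag_$(uname -n)_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$workdir" || return 1
    # Assumption: archives are written to the current directory, so cd first.
    # The script's own output is kept in diagcollection.out next to the archives.
    ( cd "$workdir" && "$GRID_HOME/bin/diagcollection.sh" -collect -crs "$GRID_HOME" \
        > diagcollection.out 2>&1 ) || return 1
    # Print the directory so the caller knows where to look.
    echo "$workdir"
}
```

Called as collect_crs_diag /u01/diag_out (the path is only an example), it prints the directory that now holds the archives.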

Gathering configuration data:
The ORAchk utility gathers the required configuration data. It collects the following information:
1. OS kernel parameters
2. OS packages
3. Many other OS configuration settings
4. CRS/Grid Infrastructure
5. RDBMS
6. ASM
7. Database Parameters and Configuration Settings
8. Upgrade Readiness assessment for target versions 11.2.0.3 and above

When to use it:
After initial Oracle RAC deployment
Before planned system maintenance
After planned system maintenance
At least once every three months

Usage examples:
./orachk -localonly --> runs on the local node only.
./orachk -dball
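
The "at least once every three months" guideline is easy to forget, so a small check can flag when a run is overdue. This is only a sketch: orachk_due is a hypothetical helper, and it assumes that ORAchk output files or directories start with orachk_ and are kept in a directory you pass in.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when an ORAchk run is due, i.e. no
# report newer than <days> days exists under <report dir>.
# Assumption: report files and directories are named orachk_*.
# Usage: orachk_due <report dir> [days, default 90]
orachk_due() {
    local dir="$1" days="${2:-90}" recent
    # -mtime -N matches entries modified less than N days ago.
    recent=$(find "$dir" -name 'orachk_*' -mtime -"$days" 2>/dev/null | head -n 1)
    [ -z "$recent" ]
}
```

A cron job could then run something like: orachk_due /u01/orachk_reports && echo "ORAchk run is overdue" (the path is an example).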

OS-related data collection:
Use OSWatcher and analyze the OS statistics for the time frame of the issue:
ps -ef | grep osw --> gives you the log location.

Metalink doc: Note 301137.1 – OS Watcher User Guide
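
A plain grep osw also lists the grep command itself. A small variation avoids that; filter_osw is just an illustrative name, and the case-insensitive match is my choice so that both osw and OSWatcher are caught:

```shell
#!/bin/sh
# Filter a process listing down to OSWatcher-related lines.
# The bracket trick: the pattern '[o]sw' matches "osw" but not the literal
# text "[o]sw" that appears in grep's own command line.
# Usage: ps -ef | filter_osw
filter_osw() {
    grep -i '[o]sw'
}
```

The command line of the matching process is where the log location shows up.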

If root.sh or rootupgrade.sh fails, check the logs in the following locations:
$ORACLE_BASE/cfgtoollogs
$ORACLE_BASE/diag
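
Both directories can hold a lot of files, so it helps to narrow the search to what changed around the time of the failure. A minimal sketch, assuming ORACLE_BASE is set and that the files of interest end in .log (recent_root_logs is a made-up name, and -mmin needs GNU or BSD find):

```shell
#!/bin/sh
# Hypothetical helper: list log files changed in the last <minutes> minutes
# under the two locations named above.
# Usage: recent_root_logs [minutes, default 60]
recent_root_logs() {
    local minutes="${1:-60}" dir
    [ -n "${ORACLE_BASE:-}" ] || { echo "ORACLE_BASE is not set" >&2; return 1; }
    for dir in "$ORACLE_BASE/cfgtoollogs" "$ORACLE_BASE/diag"; do
        # Skip a location that does not exist on this node.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name '*.log' -mmin -"$minutes"
    done
}
```

Running recent_root_logs 30 right after a failed root.sh lists the logs touched in the last half hour.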

Systemstate dump (SSD) and hanganalyze for RAC:
The steps below help you gather critical process dumps during hangs or performance issues:
- SQL> oradebug setospid <ospid of diag process>
- SQL> oradebug unlimit
- SQL> oradebug -g all hanganalyze 3
## Wait about 2 minutes, then run the hanganalyze command again.

Use sqlplus -prelim "/ as sysdba" to log in to a hung system.
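
During a hang there is little time for typing, so the sequence can be prepared in advance. The sketch below only prints the oradebug commands from the steps above, with a pause between the two hanganalyze runs. hanganalyze_commands is a hypothetical helper, and using the SQL*Plus host command to sleep is my assumption about how to wait inside one session:

```shell
#!/bin/sh
# Hypothetical helper: print the oradebug sequence for a given OS process id,
# running hanganalyze twice with a pause in between.
# Usage: hanganalyze_commands <ospid of diag process> [pause in seconds, default 120]
hanganalyze_commands() {
    local ospid="$1" pause="${2:-120}"
    # Refuse anything that is not a plain number.
    case $ospid in
        ''|*[!0-9]*) echo "usage: hanganalyze_commands <ospid> [seconds]" >&2; return 1 ;;
    esac
    cat <<EOF
oradebug setospid $ospid
oradebug unlimit
oradebug -g all hanganalyze 3
host sleep $pause
oradebug -g all hanganalyze 3
exit
EOF
}
```

The output can be reviewed first and then piped into the preliminary connection, for example: hanganalyze_commands 12345 | sqlplus -prelim "/ as sysdba", where 12345 stands for the OS pid of the diag process (typically named ora_diag_<SID> in ps output).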

OracleTek Blog
