Monday, 14 May 2012

PRVF-5150: Path ORCL: is not a valid path while installing Clusterware 11.2

Today I hit the issue below during the prerequisite checks for ASM shared storage while installing Clusterware. If all the manual verification steps below pass, the error can be ignored and the installation can proceed.

Device Checks for ASM - This is a pre-check to verify if the specified devices meet the requirements for configuration through the Oracle Universal Storage Manager Configuration Assistant.

Verification result of failed node: racnode1
 List of errors:
  - PRVF-5150: Path ORCL:DISK1 is not a valid path on all nodes
Operation Failed on Nodes: [racnode1]
 List of errors:
  -  Could not get the type of storage
     - Cause: Cause of Problem Not Available
     - Action: User Action Not Available
After some research I found the MOS note "Device Checks for ASM Fails with PRVF-5150: Path ORCL: is not a valid path [ID 1210863.1]".
Solution:
At the time of writing, bug 10026970 is fixed in 11.2.0.3, which has not been released yet. If the ASM device passes the manual verification below, the warning can be ignored.

Manual Verification

To verify asmlib status:

$ /etc/init.d/oracleasm status
Checking if ASM is loaded:       yes
Checking if /dev/oracleasm is mounted:      yes

## Both should be [yes]
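When the check has to be repeated on every node, it can be scripted. The sketch below uses a hypothetical helper, asm_status_ok (not part of ASMLib), and assumes the status output has the "Checking if ...: yes" layout shown above; it fails if any answer is not yes or if no status lines are seen at all.

```shell
#!/usr/bin/env bash
# asm_status_ok: hypothetical helper, not shipped with ASMLib.
# Reads the output of "/etc/init.d/oracleasm status" on stdin and
# succeeds only if at least one "Checking if ..." line was seen and
# every such line ends with "yes".
asm_status_ok() {
  awk '
    /^Checking if/ {
      seen++
      if ($NF != "yes") bad++      # last field is the yes/no answer
    }
    END {
      if (seen > 0 && bad == 0) exit 0
      exit 1
    }'
}

# Example (run on each node):
#   /etc/init.d/oracleasm status | asm_status_ok && echo "ASMLib loaded and mounted"
```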

To verify user setting in asmlib:

For environment without job role separation:
========
id <grid user>
uid=1001(oracle) gid=1000(dba) groups=1000(dba)

/usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=dba
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""


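For environment with job role separation:
========
Run the same two commands. With job role separation the grid user is normally a dedicated owner (for example grid) and ASMLib is normally owned by the ASM administrator group (asmadmin), so the configure output should show, for example:
id <grid user>

/usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""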


## Note: Regardless of job role separation, both ORACLEASM_UID and ORACLEASM_GID should match the id output for the grid user.
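The comparison in the note above can be sketched as a small function. asm_owner_matches is a hypothetical helper; it assumes the configure output uses the plain KEY=value lines shown above, and it accepts any group the user belongs to (for example from id -Gn), which also covers asmadmin as a secondary group under job role separation.

```shell
#!/usr/bin/env bash
# asm_owner_matches: hypothetical helper.
#   $1 = output of "/usr/sbin/oracleasm configure"
#   $2 = grid user name
#   $3 = space-separated group names of that user (e.g. from "id -Gn <user>")
# Succeeds if ORACLEASM_UID is the user and ORACLEASM_GID is one of its groups.
asm_owner_matches() {
  local config="$1" user="$2" groups="$3" uid gid g
  uid=$(printf '%s\n' "$config" | sed -n 's/^ORACLEASM_UID=//p')
  gid=$(printf '%s\n' "$config" | sed -n 's/^ORACLEASM_GID=//p')
  if [ "$uid" != "$user" ]; then
    echo "ORACLEASM_UID=$uid, expected $user" >&2
    return 1
  fi
  for g in $groups; do              # word splitting on the group list is intended
    [ "$g" = "$gid" ] && return 0
  done
  echo "ORACLEASM_GID=$gid is not one of: $groups" >&2
  return 1
}

# Example:
#   asm_owner_matches "$(/usr/sbin/oracleasm configure)" oracle "$(id -Gn oracle)"
```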


To verify the disk:

/etc/init.d/oracleasm listdisks
DISK1

ls -l /dev/oracleasm/disks
..
brw-rw----    1 oracle dba       8,  33 Sep 16 09:41 DISK1

## Note: In a job role separation environment, the group will be asmadmin instead of dba
dd if=/dev/oracleasm/disks/DISK1 of=/dev/null bs=1024k count=1
1+0 records in
1+0 records out

## Note: The output above shows that DISK1 is available and readable.
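With several ASMLib disks, the same dd read test can be looped over every entry in /dev/oracleasm/disks. check_asm_disks_readable is a hypothetical helper; it assumes the default ASMLib disk directory and should be run as the grid user, since the point is to prove that user can read the devices.

```shell
#!/usr/bin/env bash
# check_asm_disks_readable: hypothetical helper.
#   $1 = disk directory (default /dev/oracleasm/disks)
# Reads the first 1 MB of every entry with dd and prints OK/FAIL per disk.
# Returns non-zero if any read fails or if the directory has no entries.
check_asm_disks_readable() {
  local dir="${1:-/dev/oracleasm/disks}" disk rc=0 found=0
  for disk in "$dir"/*; do
    [ -e "$disk" ] || continue          # glob did not match: directory is empty
    found=1
    if dd if="$disk" of=/dev/null bs=1024k count=1 2>/dev/null; then
      echo "OK   $(basename "$disk")"
    else
      echo "FAIL $(basename "$disk")"
      rc=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no disks found in $dir" >&2
    return 1
  fi
  return "$rc"
}

# Example (as the grid user):
#   check_asm_disks_readable
```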
Enjoy reading!

Friday, 13 April 2012

What is SCAN in Oracle RAC 11gR2? Tips and tricks to troubleshoot connectivity with the SCAN listener

Single Client Access Name (SCAN) is a new feature of Oracle Real Application Clusters (RAC) 11g Release 2 that provides a single name for clients to access an Oracle Database running in a cluster. The benefit of the SCAN listener is that the client's connection data does not need to change if you add a node to or delete a node from the cluster.
The Single Client Access Name is configured during the installation of Oracle Grid Infrastructure. Once configured, application tier connection descriptors just specify the SCAN name rather than all the [virtual] hosts in the cluster.
Without the Single Client Access Name, the descriptor for a two-node cluster would be:
TEST =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=tcp)(HOST=db1-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=tcp)(HOST=db2-vip)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=TEST)))  
With the Single Client Access Name, just the SCAN name needs to be specified:

TEST = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-scan)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=TEST))) 

The benefit is that the SCAN descriptor remains the same irrespective of the number of nodes in the cluster.
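To make this concrete, the hypothetical helper below builds the SCAN-style entry from just an alias, the SCAN name, the port and the service name (the example values above are db-scan, 1521 and TEST). No node or VIP names are inputs, so the generated entry does not change when nodes are added or removed.

```shell
#!/usr/bin/env bash
# scan_tns_entry: hypothetical helper that prints a one-line tnsnames.ora entry.
#   $1 = alias, $2 = SCAN name, $3 = port, $4 = service name
# No node or VIP host names appear anywhere in the output.
scan_tns_entry() {
  printf '%s = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)))\n' \
    "$1" "$2" "$3" "$4"
}

# Example:
#   scan_tns_entry TEST db-scan 1521 TEST >> "$TNS_ADMIN/tnsnames.ora"
```

The equivalent Easy Connect string would be //db-scan:1521/TEST, for example sqlplus system@//db-scan:1521/TEST.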

In short, SCAN is:
1) The address used by clients connecting to the cluster.

2) A fully qualified host name that resolves to three IP addresses, registered either in DNS or in the GNS subdomain.

3) A stable, highly available name for clients to use, independent of the nodes that make up the cluster.

Verify SCAN Listener Configuration on Server

After the Grid Infrastructure installation completes, you can verify the SCAN listener configuration on the server:
  • The $GRID_HOME/network/admin directory contains two listener-related files:
                 -rw-r--r-- 1 grid oinstall 887 Jul 13 09:33 listener.ora
                 -rw-r--r-- 1 grid oinstall 375 Jul 13 09:33 endpoints_listener.ora

Example listener.ora file showing the three SCAN listener entries:

LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
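A quick way to confirm that all three SCAN listener entries are present is to filter listener.ora by name. list_scan_listeners is a hypothetical helper; it assumes the one-entry-per-line NAME=(...) layout shown above, as written by the agent.

```shell
#!/usr/bin/env bash
# list_scan_listeners: hypothetical helper. Prints the SCAN listener names
# defined in a listener.ora file (default: $GRID_HOME/network/admin/listener.ora).
# Only lines of the form LISTENER_SCANn=... are reported; the node listener
# (LISTENER=...) and the ENABLE_GLOBAL_DYNAMIC_ENDPOINT_* lines are skipped.
list_scan_listeners() {
  local file="${1:-$GRID_HOME/network/admin/listener.ora}"
  sed -n 's/^\(LISTENER_SCAN[0-9][0-9]*\)=.*/\1/p' "$file" | sort
}

# Example:
#   list_scan_listeners        # expect LISTENER_SCAN1 .. LISTENER_SCAN3
```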


Check Status of SCAN IPs and SCAN Listener

The srvctl command can be used to check the status of the SCAN VIPs (srvctl status scan) and the SCAN listeners (srvctl status scan_listener):

  [grid@db1  admin]$ srvctl status scan
  SCAN VIP scan1 is enabled
  SCAN VIP scan1 is running on node db2
  SCAN VIP scan2 is enabled
  SCAN VIP scan2 is running on node db1
  SCAN VIP scan3 is enabled
  SCAN VIP scan3 is running on node db1

Note that two SCAN IPs are online on node db1 and one is online on db2.
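On larger clusters, counting SCAN VIPs per node by eye gets tedious. The sketch below is a hypothetical helper that summarizes srvctl status scan output; it assumes the "SCAN VIP scanN is running on node NODE" wording shown above.

```shell
#!/usr/bin/env bash
# scan_vips_per_node: hypothetical helper. Reads "srvctl status scan" output
# on stdin and prints "<node> <number of SCAN VIPs running there>", sorted.
scan_vips_per_node() {
  awk '/^ *SCAN VIP .* is running on node / { count[$NF]++ }
       END { for (n in count) print n, count[n] }' | sort
}

# Example:
#   srvctl status scan | scan_vips_per_node
```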




Monday, 9 April 2012

Using FNDCPASS to change the APPS, APPLSYSPUB and E-Business Suite base product schema passwords


Log in as the application tier OS user and run the commands below to change the passwords of APPS, APPLSYSPUB and the E-Business Suite base product schemas.

1) Source the environment file on the application tier.
2) Stop the application services.

3) Change the passwords:

a) APPS password

$ FNDCPASS apps/apps 0 Y system/oracle SYSTEM APPLSYS apps123

*Here apps123 is the new password. Changing APPLSYS in SYSTEM mode also changes APPS, because the two must share the same password.
b) APPLSYSPUB password
$ FNDCPASS apps/apps123 0 Y system/oracle ORACLE APPLSYSPUB PUB123

*PUB123 is the new password for the APPLSYSPUB user.

c) E-Business Suite base product schemas

$ FNDCPASS apps/apps123 0 Y system/oracle ALLORACLE oracle123

*oracle123 is the new password for all E-Business Suite base product schemas.

4) Run AutoConfig on the application tier:

$ cd $ADMIN_SCRIPTS_HOME
$ sh adautocfg.sh

5) Start the application services.
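Because step a) changes the APPS password, steps b) and c) must log in with the new one. The hypothetical helper below only prints the three FNDCPASS commands in that order (it does not run them), using the argument layout from this post. Passing passwords on the command line makes them visible in the process list, so treat it as an illustration of the ordering rather than a recommended script.

```shell
#!/usr/bin/env bash
# print_fndcpass_plan: hypothetical helper. Prints (does NOT execute) the three
# FNDCPASS commands from this post in the order they must run.
#   $1 = current APPS password   $2 = new APPS password
#   $3 = SYSTEM password         $4 = new APPLSYSPUB password
#   $5 = new password for all base product schemas
print_fndcpass_plan() {
  local old_apps="$1" new_apps="$2" system_pwd="$3" new_pub="$4" new_prod="$5"
  # a) SYSTEM mode for APPLSYS (APPS follows): still logs in with the OLD password
  echo "FNDCPASS apps/$old_apps 0 Y system/$system_pwd SYSTEM APPLSYS $new_apps"
  # b) ORACLE mode for APPLSYSPUB: from here on, log in with the NEW password
  echo "FNDCPASS apps/$new_apps 0 Y system/$system_pwd ORACLE APPLSYSPUB $new_pub"
  # c) ALLORACLE mode: all base product schemas in one call
  echo "FNDCPASS apps/$new_apps 0 Y system/$system_pwd ALLORACLE $new_prod"
}

# Example:
#   print_fndcpass_plan apps apps123 oracle PUB123 oracle123
```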

Hope it's helpful... enjoy reading :)


Sunday, 8 April 2012

How to configure EMCA With Oracle RAC 11gR2?

This article assumes the RAC database was created manually without DBCA, so EMCA has never been run, no DB Control repository exists in the RAC database, and the RAC database does not host a Grid Control repository. You can check this by running the following SQL statement as a DBA user:

SQL> select username from DBA_USERS where username = 'SYSMAN';
 
If the SQL statement returns 'SYSMAN', a DB Control or Grid Control repository is already present in the database. To drop the existing repository, see the end of this article.
 
Create EM repository:

Run emca in interactive mode on node 1 (db1):
$ emca -config dbcontrol db -repos create -cluster

Enter the following information:
1)Cluster Name
To find out the value of your CLUSTER_NAME from CRS (OCR), do the following from the CRS_HOME:
      $ cd $CRS_HOME/bin
      $ ./cemutlo -n
 Cluster Name:db-scan
2) Database unique name:TEST
If you're not sure of the values for Database unique name and service name, execute the following statement connected as a DBA user to any instance of the RAC database:      
 
SQL> show parameter db_unique_name
 TEST
3) Listener port:1521
4) SYS password:*******
5) DBSNMP password:******
6) SYSMAN password:*****
7) ASM ORACLE_HOME:/u01/app/oracle/db-home1
8) ASM SID:+ASM1
9) ASM ROLE:sysdba
10) ASM USERNAME:sys
11) ASM PORT:1521
 
On node 2 (db2), configure EM:

$ emca -reconfig dbcontrol -cluster -EM_NODE db2 -EM_NODE_LIST db2

*Here EM_NODE is the host name of the node on which you are running emca, and EM_NODE_LIST is the list of nodes where you want to configure EM; this can be any of the nodes participating in the cluster.
 
 
Drop EM Repository:
To drop an existing EM repository, perform the steps below:

1. SQL> alter user SYSMAN account lock;
2. SQL> drop user SYSMAN cascade;
3. SQL> alter user MGMT_VIEW account lock;
4. SQL> drop user MGMT_VIEW cascade;
5. $ emca -deconfig dbcontrol db -repos drop -cluster


Hope it's helpful :)

Saturday, 7 April 2012

What are Oracle’s HA features FAN, TAF and FCF?

The primary goal of this article is to clarify Oracle’s HA features FAN, FCF and TAF, and to make recommendations on when to use which, discussing in more detail how to use each mechanism. But first, let's take a look at Oracle's HA configurations.

Oracle Database HA Configurations

The Oracle Database furnishes the following High Availability configurations:
• Single Instance HA
• Cold Failover Cluster
• RAC, RAC One, RAC with Vendor Clusterware
• Data Guard Physical Standby (single Instance or RAC with/without Broker)
• Data Guard Logical Standby (Single Instance or RAC with/without Broker)

Detailed coverage of each of these configurations can be found in the Oracle Database 11g Release 2 High Availability Overview document at http://download.oracle.com/docs/cd/E14072_01/server.112/e10804.pdf

The Why and What of Application Failover

When a database outage occurs, two problems (the Evil Twins) confront applications: errors and hangs. Applications encounter errors because the work they were doing (queries, transactions) is interrupted. Even worse, those errors may take some time to arrive. Oracle’s HA features address these Evil Twins by helping to speed up the application's response to failure and, in some situations, by masking the error from the end user.

Fast Application Notification (FAN)

FAN addresses one of the Evil Twins: hangs. FAN is an Oracle high availability mechanism that emits events when database conditions change, e.g., when a managed service, instance or site goes up or down. The events are propagated either by Oracle Notification Service (ONS) to Java subscribers, or by Streams AQ to OCI (C, C++, PHP, Python) and .NET subscribers. The main benefits of FAN compared to TCP timeouts are fast detection of a condition change and fast notification. FAN is available through the following Oracle components: CMAN session pools; Oracle Call Interface (OCI) and a number of drivers or adapters that use OCI libraries (including OCCI, PHP, Python); Universal Connection Pool for Java; the JDBC SimpleFAN API; and ODP.NET connection pools.
 
How to use FAN events?
a) Non-programmatically, through Oracle integrated database clients (using Oracle Restart): Oracle JDBC, Universal Connection Pool for Java, Oracle Call Interface and ODP.NET. These clients can be configured to enable FAN and to connect automatically to a new primary database upon failover using Fast Connection Failover (FCF).

b) Programmatically: third-party drivers, containers or frameworks may use the Java FAN API (SimpleFAN) or OCI callbacks to handle FAN events themselves.

c) FAN callout scripts can be configured on the database tier to take server-side actions when conditions change.

Transparent Application Failover (TAF) Overview

Transparent Application Failover (TAF) helps to address the other Evil Twin: errors. TAF is an OCI feature providing connection recovery capabilities: connection failover, session state restoration, query failover, and graceful session migration for planned downtime. TAF operates at the session or connection level and is available to database clients that use the OCI driver, including OCI, OCCI, precompilers (Pro*), ODP.NET, JDBC-OCI (not JDBC thin), PHP OCI8, Ruby OCI8, Python cx_Oracle, etc.

TAF is particularly useful for read-only and read-mostly applications. When a failure occurs in the middle of a query, TAF re-executes the query and repositions the cursor, so the application can continue fetching after the failure. If the failure occurs during a transaction, the database rolls back the transaction and TAF notifies the client to clean up application state (by issuing ROLLBACK) before resuming normal operations on a new connection. TAF may be used with or without FAN:

a) In TAF-only environments (i.e., when not subscribing to FAN events), TAF executes the recovery and failover process after a condition change (e.g., a node going down) only once the TCP timeout expires; the client may experience a longer delay (which varies between systems) because, unlike with FAN, application threads may remain blocked until the TCP timeout expires.

b) When combined with FAN, in RAC and Data Guard environments, delays due to TCP timeouts are eliminated.

c) TAF callback functionality allows applications to extend the TAF recovery mechanism.
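On the client side, TAF is enabled by adding a FAILOVER_MODE clause inside CONNECT_DATA of the connect descriptor. The hypothetical helper below extends the SCAN-style entry used earlier in this blog (db-scan, port 1521, service TEST). TYPE=SELECT requests query failover; RETRIES=20 and DELAY=5 are illustrative values, not recommendations.

```shell
#!/usr/bin/env bash
# taf_tns_entry: hypothetical helper that prints a tnsnames.ora entry with TAF.
#   $1 = alias, $2 = SCAN name, $3 = port, $4 = service name,
#   $5 = failover TYPE (SESSION or SELECT, default SELECT)
# RETRIES=20 and DELAY=5 are illustrative values only.
taf_tns_entry() {
  local type="${5:-SELECT}"
  printf '%s = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)(FAILOVER_MODE=(TYPE=%s)(METHOD=BASIC)(RETRIES=20)(DELAY=5))))\n' \
    "$1" "$2" "$3" "$4" "$type"
}

# Example:
#   taf_tns_entry TEST db-scan 1521 TEST >> "$TNS_ADMIN/tnsnames.ora"
```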
 
Fast Connection Failover (FCF)

FCF designates the set of actions that integrated Oracle clients (UCP, OCI session pool, etc.) take to process FAN events. The key features of FCF are:
 
a) Rapid database service/instance/node failure detection then abort and removal of invalid connections from the pool
Unplanned outages: dead connections are rapidly detected; borrowed and in-use connections are aborted and removed from the pool, and idle connections are cleaned up as well.
Planned outages (graceful shutdown): borrowed or in-use connections are not interrupted; however, when the database operation completes, the connections are marked for removal and returned to the pool. Once all connections have been checked back in to the pool, the database can shut down gracefully.

b) Recognition of new nodes that join an Oracle RAC cluster

c) Runtime distribution of connection requests to all active Oracle RAC instances

 


Recommendations:

The following rules of thumb or recommendations apply:

1. When using integrated Oracle clients (JDBC, OCI, .Net etc), FAN with FCF is highly recommended as it provides immunity from TCP timeouts for in-flight calls, and eagerly cleans up dead connections from connection pools to minimize application exposure to the failure.

2. For read-only and read-mostly applications, TAF (in conjunction with FAN) is the recommended choice; it provides query failover (i.e., it allows active queries to continue) without disrupting the application.

3. Java containers, drivers, frameworks, or applications may use FAN API to directly manage FAN events themselves. OCI-based containers, drivers, frameworks or applications (C, C++, OCCI, PreComps, PHP, Ruby, Python, Perl) may use OCI callbacks to directly manage FAN events themselves.

Friday, 6 April 2012

Troubleshoot Grid Infrastructure Startup Issues


1) Clusterware startup sequence:


In a nutshell, the operating system starts ohasd, ohasd starts agents that start the daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, ASM, etc.), and crsd starts agents that start the user resources (database, SCAN, listener, etc.).

2) Cluster status

To find out cluster and daemon status:
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

To list the status of the lower-stack (init) resources managed by ohasd:
$GRID_HOME/bin/crsctl stat res -t -init
 
3) To start an offline daemon, for example if ora.crsd is OFFLINE:

$GRID_HOME/bin/crsctl start res ora.crsd -init
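To find every init resource that should be online but is not, the crsctl stat res -t -init output can be filtered. The sketch below is a hypothetical helper that assumes the 11.2 tabular layout (a resource name line such as ora.crsd followed by an instance line with the TARGET and STATE columns); it prints the matching start commands instead of running them, so each one can be reviewed first.

```shell
#!/usr/bin/env bash
# offline_init_resources: hypothetical helper. Reads "crsctl stat res -t -init"
# output on stdin and prints a start command for every resource whose
# TARGET is ONLINE but whose STATE is OFFLINE. Assumed layout:
#   ora.crsd
#         1        ONLINE  OFFLINE
offline_init_resources() {
  awk '
    /^ora\./ { name = $1; next }                 # remember the resource name
    name != "" && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" {
      print "$GRID_HOME/bin/crsctl start res " name " -init"
    }'
}

# Example:
#   $GRID_HOME/bin/crsctl stat res -t -init | offline_init_resources
```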


4) OHASD does not start

As ohasd.bin is responsible for starting all other clusterware processes, directly or indirectly, it must start properly for the rest of the stack to come up. If ohasd.bin is not up, checking its status reports CRS-4639 (Could not contact Oracle High Availability Services); if ohasd.bin is already up, another startup attempt reports CRS-4640; and if it fails to start, the following is reported:

CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

Automatic ohasd.bin start up depends on the following:

a) The OS is at the appropriate run level:

The OS needs to be at the specified run level before CRS will try to start.

To find out at which run level the clusterware needs to come up:

cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null


The example above shows that CRS is supposed to run at run levels 3 and 5; note that, depending on the platform, CRS comes up at different run levels.

To find out current run level:

who -r
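The two checks above can be combined. ohasd_runlevel_ok is a hypothetical helper; it assumes the inittab entry has the id:levels:action:command layout shown above (levels in field 2) and that who -r prints the current level right after the word run-level.

```shell
#!/usr/bin/env bash
# ohasd_runlevel_ok: hypothetical helper.
#   $1 = the init.ohasd line from /etc/inittab, e.g. "h1:35:respawn:..."
#   $2 = output of "who -r"
# Succeeds if the current run level is one of the levels in field 2 of $1.
ohasd_runlevel_ok() {
  local levels current
  levels=$(printf '%s\n' "$1" | cut -d: -f2)
  current=$(printf '%s\n' "$2" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") print $(i + 1) }')
  [ -n "$current" ] || return 1     # could not parse "who -r"
  case "$levels" in
    *"$current"*) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example:
#   ohasd_runlevel_ok "$(grep init.ohasd /etc/inittab)" "$(who -r)" || echo "wrong run level"
```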

b) "init.ohasd run" is up

On Linux/UNIX, as "init.ohasd run" is configured in /etc/inittab, process init (pid 1, /sbin/init on Linux, Solaris and hp-ux, /usr/sbin/init on AIX) will start and respawn "init.ohasd run" if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:

ps -ef|grep init.ohasd|grep -v grep
root      2279     1  0 18:14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

If any rc Snncommand script (located in rcn.d, for example S98gcstartup) is stuck, the init process may not start "/etc/init.d/init.ohasd run"; engage the OS vendor to find out why the relevant Snncommand script is stuck.

c) Clusterware auto start is enabled (it is enabled by default)

By default CRS is enabled for auto start upon node reboot; to enable it:

$GRID_HOME/bin/crsctl enable crs

To verify whether it is currently enabled:

cat $SCRBASE/$HOSTNAME/root/ohasdstr
enable

SCRBASE is /etc/oracle/scls_scr on Linux and AIX, /var/opt/oracle/scls_scr on hp-ux and Solaris

Note: NEVER EDIT THE FILE MANUALLY, use "crsctl enable/disable crs" command instead.
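Since the ohasdstr location depends on the platform, a read-only lookup can pick the right SCRBASE. ohasdstr_path is a hypothetical helper; it assumes uname -s reports Linux, AIX, HP-UX or SunOS, and it only builds the path for reading, in line with the note above.

```shell
#!/usr/bin/env bash
# ohasdstr_path: hypothetical helper. Prints the path of the ohasdstr file.
#   $1 = OS name as reported by "uname -s" (default: current OS)
#   $2 = host name (default: $HOSTNAME, falling back to "hostname")
# Read-only: use "crsctl enable/disable crs" to change the setting.
ohasdstr_path() {
  local os="${1:-$(uname -s)}" host="${2:-${HOSTNAME:-$(hostname)}}" base
  case "$os" in
    Linux|AIX)   base=/etc/oracle/scls_scr ;;
    HP-UX|SunOS) base=/var/opt/oracle/scls_scr ;;
    *) echo "unknown platform: $os" >&2; return 1 ;;
  esac
  printf '%s/%s/root/ohasdstr\n' "$base" "$host"
}

# Example:
#   cat "$(ohasdstr_path)"        # prints enable or disable
```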

Thursday, 5 April 2012

Patching Oracle Clusterware 11gR2

Oracle Clusterware 11gR2 supports only out-of-place upgrades.

*An in-place upgrade is installed in the existing Clusterware home and replaces the older software.

*An out-of-place upgrade has both Clusterware versions present on the nodes at the same time, in different Grid homes, but only one is active.

Installing Oracle Clusterware in a separate home before the upgrade reduces the downtime for cluster upgrades.

The active software version and Grid home location are stored in the OCR.

Checking software versions:

$ crsctl query crs softwareversion [hostname]
$ crsctl query crs activeversion
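During an out-of-place rolling upgrade, the software version of an upgraded node is typically ahead of the cluster's active version until the upgrade has finished on every node. The sketch below compares the two; last_bracket and versions_match are hypothetical helpers, and they assume each command prints the version as the last bracketed value on its line (for example "... is [11.2.0.3.0]").

```shell
#!/usr/bin/env bash
# last_bracket: hypothetical helper. Prints the contents of the LAST [...]
# on the input line, e.g. "11.2.0.3.0" from "... node [racnode1] is [11.2.0.3.0]".
last_bracket() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# versions_match: hypothetical helper.
#   $1 = output of "crsctl query crs softwareversion"
#   $2 = output of "crsctl query crs activeversion"
# Succeeds when both report the same, non-empty version.
versions_match() {
  local sw active
  sw=$(last_bracket "$1")
  active=$(last_bracket "$2")
  [ -n "$sw" ] && [ "$sw" = "$active" ]
}

# Example:
#   versions_match "$(crsctl query crs softwareversion)" \
#                  "$(crsctl query crs activeversion)" || echo "upgrade not complete?"
```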

Oracle Local Registry(OLR) in 11gR2 RAC

Each cluster node has a local registry for node-specific resources, called the Oracle Local Registry (OLR). Its function is to facilitate Clusterware startup in situations where ASM stores the OCR and voting disks. During the startup process, the OLR is referenced to determine the exact location of the voting disks, which enables the node to join the cluster. After this initial phase, ASM is started. Once ASM is up, the processes that require the full OCR can start and the Clusterware startup process completes.

Use the -local option of ocrcheck, ocrdump and ocrconfig to check, dump and export the OLR:

$ ocrcheck -local
$ ocrdump -local -stdout
$ ocrconfig -local -export file_name