Welcome


Thursday, May 28, 2009

HACMP IP concepts


Boot ip address:

The boot IP is the address assigned to the physical adapter using smitty tcpip. It is active while the system boots and until the HACMP services start, and it is present on every physical adapter that is included in the HACMP network.

Note: By default there is no route for this IP address; routes are available only for the service, persistent, and management IP addresses.

Persistent ip address:

The persistent IP address is meant for administrator access only; it never appears in any official document given to the client. It is a virtual IP: even if all the other IPs go down, it keeps working, and all administrative work should go through this IP.

It works regardless of whether the HACMP services are running; it is a routed IP address that you can reach at any time. This IP sits on the primary network card.

Note: Only one persistent IP is allowed per network; for example, a node connected to two networks defined in HACMP can be identified by two persistent IP labels (addresses).

The persistent IP is declared while configuring HACMP. As soon as the nodes are synchronized it starts working, even if the HACMP services are stopped. If the primary card fails, the persistent IP is automatically switched to the secondary interface; it only goes down if the server breaks down or is rebooted.

Concept of the Persistent ip:

For IPAT via ip replacement:

The persistent IP must be on a different subnet from the standby interface, but it can be on the same subnet as the primary interface and the service IP.

For IPAT via ip alias:

The persistent IP must be on a subnet different from both boot IP subnets.
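The subnet rules above can be sanity-checked outside the cluster before you define the labels. A minimal sketch in plain shell (the addresses and masks below are made-up examples, not from any real configuration):

```shell
#!/bin/sh
# Sketch: check whether two IPs fall in the same subnet for a given
# netmask, as a sanity check before defining boot/persistent labels.

net_of() {   # net_of IP NETMASK -> prints the network address
    set -- $(echo "$1" | tr '.' ' ') $(echo "$2" | tr '.' ' ')
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

same_subnet() {  # true (exit 0) when both IPs share a network
    [ "$(net_of "$1" "$3")" = "$(net_of "$2" "$3")" ]
}

# Two boot IPs on different subnets, as IPAT via aliasing expects:
same_subnet 192.168.10.1 192.168.20.1 255.255.255.0 \
    && echo "same subnet" || echo "different subnets"
```

This only checks dotted-quad arithmetic; the actual subnet layout still has to match what is defined in the HACMP network.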

Service Ip:

The service IP is the address used by clients to access the application. It is monitored by the HACMP services because it is part of the resource group (RG).

Two Types of Service IP:

Shared service ip:

This service IP can be configured on multiple nodes and is part of an RG that can be active on only one node at a time.

Node-bound service ip:

This service IP can be configured on one node only (not shared across multiple nodes); this type of IP is used with concurrent RGs.

Note: Service IPs become active when the HACMP services come up.

There are four types of service IP distribution:

Anti-Collocation:

This is the default configuration: the service IPs are distributed across all boot IP interfaces in the same HACMP network.

Collocation:

HACMP allocates all service IPs on the same boot IP interface.

Collocation with persistent label:

HACMP allocates all service IPs on the boot IP interface that hosts the persistent IP label. This is useful for VPN and firewall setups where only one interface is permitted for external communication.

Anti-Collocation with persistent label:

HACMP distributes the service IP labels across all boot IP interfaces in the same HACMP network that are not hosting the persistent IP label. If there is a shortage of interfaces, the service IPs share the adapter with the persistent alias IP label.



regards,

GuruRam

Wednesday, May 6, 2009

TSM Server-Side Daily Administrator Checklist

1. List TSM license compliance.

audit lic
select compliance from licenses


2. Query server processes and pending requests to determine if any jobs are waiting on operator action.

q pr
q req
q se

3. Query all disk storage pools to determine if the migration process has completed.

select stgpool_name, pct_utilized from stgpools where devclass='DISK'

4. List all drives that are OFFLINE.

select drive_name from drives where not online='YES'

5. List all paths that are OFFLINE.

select source_name, source_type, destination_name, destination_type from paths where not online='YES'

6. List all locked nodes.

select node_name from nodes where not locked='NO'

7. List all non-writeable tape and disk volumes.

q v acc=unavail
q v acc=reado
q v acc=destroyed

select volume_name, read_errors, write_errors from volumes where (read_errors>0 or write_errors>0)

select volume_name from volumes where devclass_name='DISK' and not status='ONLINE'


8. Verify that the library has sufficient scratch volumes.

select library_name,status,count(*) as "VOLUMES" from libvolumes group by library_name,status

9. Verify that the database extension and reduction values are non-zero and that the Cache Hit Ratio is above 99%.

q db f=d
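As a hedged sketch, the cache-hit figure can be pulled out of saved "q db f=d" output with awk. The sample text below is invented; the "Cache Hit Pct." field label matches TSM 5.x output, but verify it against your own server before relying on it:

```shell
#!/bin/sh
# Sketch only: parse the cache-hit percentage from captured
# "q db f=d" output and flag anything below the 99% target.
sample='   Available Space (MB): 10,000
 Assigned Capacity (MB): 8,000
         Cache Hit Pct.: 99.42'

pct=$(printf '%s\n' "$sample" | awk -F': *' '/Cache Hit Pct/ {print $2}')
awk -v p="$pct" 'BEGIN { exit (p >= 99 ? 0 : 1) }' \
    && echo "cache hit OK ($pct%)" || echo "cache hit LOW ($pct%)"
```

In practice you would feed this from `dsmadmc` output captured to a file rather than a hard-coded sample.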

10. Verify that the recovery log extension and reduction values are non-zero and that the Wait Percentage is zero.

q log f=d


11. Verify that database and recovery log volumes are online and synchronized.

q dbv f=d
q logv f=d


12. Inspect TSM database fragmentation level.

select cast((100 - (cast(max_reduction_mb as float) * 256 ) / (cast(usable_pages as float) - cast(used_pages as float) ) * 100) as decimal(4,2)) as PERCENT_FRAG from db

13. Verify that the scheduled database backups completed successfully.

select date (date_time) as date, time(date_time) as time, volume_name, type from volhistory where type in ('BACKUPFULL', 'BACKUPINCR', 'DBSNAPSHOT', 'DBDUMP')

14. Verify that all CLIENT schedules for the last day succeeded.

q ev * * begind=-1 endd=today begint=00:00:00 endt=00:00:00

To restrict the listing to only those nodes with non-completed status:

q ev * * begind=-1 endd=today begint=00:00:00 endt=00:00:00 ex=y
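For unattended morning checks, the same idea can be scripted against saved "q ev" output. A sketch with invented sample data; the Completed/Failed/Missed status values match TSM 5.x, but check your own event report format:

```shell
#!/bin/sh
# Sketch only: list client schedules whose status is not "Completed"
# from a captured "q ev" report (node, schedule, status columns).
sample='NODE1   DAILY_INCR   Completed
NODE2   DAILY_INCR   Failed
NODE3   DAILY_INCR   Missed'

bad=$(printf '%s\n' "$sample" | awk '$3 != "Completed" {print $1, $3}')
printf '%s\n' "$bad"
```

An empty result means all schedules completed; anything printed needs follow-up.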

15. Verify that all ADMINISTRATIVE schedules for the last day succeeded.

q ev * t=a begind=-1 endd=today begint=00:00:00 endt=00:00:00

To restrict the listing to only those nodes with non-completed status:

q ev * t=a begind=-1 endd=today begint=00:00:00 endt=00:00:00 ex=y

16. Check the activity log for error messages.

q actl search=AN?????E begind=-1 begint=00:00 endd=today endt=00:00

17. List nodes that are not associated with a backup schedule.

select node_name from nodes where node_name not in (select node_name from associations)

18. Cross match the TSM node name with the host name or computer name.

select node_name, tcp_address, tcp_name from nodes

19. List PRIMARY POOL volumes that have been checked out of the library.

select volume_name, stgpool_name from volumes where stgpool_name in (select stgpool_name from stgpools where devclass<>'DISK' and pooltype='PRIMARY') and volume_name not in (select volume_name from libvolumes)

20. Checkout all D/R Media for offsite storage.

move drm * wherest=mo tost=va rem=b

21. Verify that all D/R volumes have been checked out.

select volume_name from libvolumes where volume_name in (select volume_name from volumes where stgpool_name in (select stgpool_name from stgpools where devclass<>'DISK' and pooltype='COPY'))

22. Verify that all TSM database backup volumes have been checked out.

select volume_name from libvolumes where last_use='DbBackup'

23. Identify previous offsite volumes that can be recycled to scratch status, and check them back in.

q drm wherest=vaultr
move drm * wherest=vaultr tost=onsite
checki libv checkl=b stat=scr search=b wait=0


24. Generate a list of unlocked TSM administrator accounts with full system privileges.

select admin_name from admins where not system_priv='No' and not locked='No'

25. List TSM Nodes and Client (BA/TDP) versions by platform.

select platform_name as OS, client_os_level as OS_VER, node_name as Node, cast(cast(client_version as char(2)) || '.' || cast(client_release as char(2)) || '.' || cast(client_level as char(2)) || '.' || cast(client_sublevel as char(2)) as char(15)) as "TSM Client" from nodes order by platform_name, "TSM Client", Node

26. Data backed up in the last 24 hours:

select entity, date(start_time) as DATE, time(start_time) as START_TIME, time(end_time) as END_TIME, substr(char(end_time-start_time),3,8) as DURATION, cast((bytes/1024/1024/1024) as decimal(18,2)) as GB_BACKED_UP, successful from summary where start_time>=current_timestamp-24 hours and activity='BACKUP'

29. Size and duration of archive operations for each node in the last 24 hours:

select entity as "Node Name ", cast(sum(bytes/1024/1024) as decimal(10,3)) as "Total MB", substr(cast(min(start_time) as char(26)),1,19) as "Date/Time ", cast(substr(cast(max(end_time)-min(start_time) as char(20)),3,8) as char(8)) as "Length " from summary where start_time>=current_timestamp-24 hours and activity='ARCHIVE' group by entity

30. Compare PRIMARY and COPY pool occupancy totals.

select sum(num_files) as num_of_files,sum(physical_mb) as Physical_mb,sum(logical_mb) as logical_mb from occupancy where stgpool_name in (select stgpool_name from stgpools where pooltype='PRIMARY')

select sum(num_files) as num_of_files,sum(physical_mb) as Physical_mb,sum(logical_mb) as logical_mb from occupancy where stgpool_name in (select stgpool_name from stgpools where pooltype='COPY')


regards,
GuruRam

Monday, May 4, 2009

When do AIX kernel processes start

Hi,
"ps -fk" lists many kernel processes with PPID=0. I have read in documents that the init process is started to handle rc.boot at boot time. So when do kernel processes start, and why do they have PPID=0? Furthermore, the "swapper" process has both PID=0 and PPID=0. Is it really the parent of init and the others that have PPID=0?


Process 0 is the parent of init.
The very first kernel process/thread is a pure kernel-land process and has PID 0. Process 0 then becomes the "swapper".
Process 0 launches the init process (PID 1), which is responsible for user-land processes. That's why init is the (grand)parent of all user-land processes.

There are some other pure kernel-land processes that are launched by the PID 0 process, hence they have PPID 0 (e.g. kprocs).
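On most UNIX systems, AIX included, this genealogy is visible with the POSIX ps options. A quick check that PID 1's parent is process 0 (output padding varies by platform):

```shell
#!/bin/sh
# Print the parent PID of init (PID 1). On AIX, Linux, and most other
# UNIX systems this is 0, the kernel "swapper" process described above.
# "-o ppid=" is POSIX ps syntax for a single unheaded column.
ppid=$(ps -o ppid= -p 1 | tr -d ' ')
echo "PPID of PID 1 is: $ppid"
```

The full kernel-process list ("ps -fk" on AIX) shows the rest of the PPID=0 family the same way.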

regards,
Guru Ram

Sunday, January 4, 2009

HACMP snapshot backup

Hi dudes,
Lets see about the HACMP snapshot backup,the below mentioned snaphots taken from one node of HACMP, take the backup from one node is enough, it will take the entire backup,

Hope this helps you, folks.
regards,
Guru.
"Moving out from the team is also the team work"