Saturday, April 26, 2014

Solution - VMware vSphere Datastore is not accessible or not detected after unplanned PDL

I would like to share some KB articles and solutions related to PDL.
Explanations of the storage connectivity problems PDL (Permanent Device Loss) and APD (All Paths Down) are available in the VMware documentation:
Identifying Device Connectivity Problems
Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x (2004684)

Some of my customers have experienced an unplanned PDL, after which they can no longer access the datastore even though the storage team has confirmed that the LUN is presented to the hosts and clusters.
Despite that, the datastore is not detected or not accessible from vSphere.
Below are some of the scenarios and their resolutions.


1. After an unplanned PDL, vSphere can see the datastores. However, when we try to browse a datastore, the Datastore Browser just keeps loading and never responds.
Solution: Follow the VMware KB "Cannot remount a datastore after an unplanned permanent device loss (PDL)", or simply reboot the ESXi host.

2. After an unplanned PDL, the datastore is gone from vSphere. The storage team has confirmed that the LUN is presented to the hosts and clusters. However, the datastore is detected neither as an existing datastore nor as a new LUN.
Solution: Check the storage devices on the host and see whether any of them are detached.

If you find any, check the naa ID/LUN number to confirm it is the LUN you are looking for. Select the device, attach it, and rescan. The datastore you are looking for will normally appear in the list of datastores.
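
The same check can be sketched from the command line. The Python below is only a rough illustration, not a supported tool: it scans text shaped like the output of `esxcli storage core device list` for devices reporting `Status: off` (my assumption for how a detached device is shown), then builds the `esxcli` commands to attach the device and rescan. The sample listing and the `naa.hypothetical...` IDs are made up, and the function names are hypothetical.

```python
# Made-up sample shaped like `esxcli storage core device list` output.
# The device IDs are hypothetical and only serve the illustration.
SAMPLE_LISTING = """\
naa.hypothetical0001
   Display Name: Example Fibre Channel Disk (naa.hypothetical0001)
   Status: on
naa.hypothetical0002
   Display Name: Example Fibre Channel Disk (naa.hypothetical0002)
   Status: off
"""


def find_detached_devices(listing):
    """Return device IDs whose block contains 'Status: off'.

    Assumes device IDs start at column 0 and their properties are indented.
    """
    detached = []
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            current = line.strip()  # unindented line: a new device block
        elif current and line.strip().lower() == "status: off":
            detached.append(current)
    return detached


def attach_and_rescan_commands(device_id):
    """Build (but do not run) the commands to attach a device and rescan."""
    return [
        ["esxcli", "storage", "core", "device", "set", "--state=on", "-d", device_id],
        ["esxcli", "storage", "core", "adapter", "rescan", "--all"],
    ]


for device in find_detached_devices(SAMPLE_LISTING):
    for command in attach_and_rescan_commands(device):
        print(" ".join(command))
```

Confirm the naa ID against what the storage team presented before running the printed commands on a real host.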



3. After an unplanned PDL, the datastore is gone, but you can see its LUN when you add a new datastore (it is detected as if it were new).
You do not want to reformat the datastore, but the options to keep the existing signature or resignature the VMFS are greyed out. The only available option is to format the VMFS.

Solution: Follow the VMware KB "Cannot remount a datastore after an unplanned permanent device loss (PDL)", or simply reboot the ESXi host.


4. After an unplanned PDL, you want to remove a datastore that has become inactive because of the PDL, but all options are greyed out.
Solution: Delete it from the vCenter database. The steps are explained in the VMware KB "Unable to remove a datastore from the vCenter Server 4.x / 5.x inventory".
Alternatively, reboot the affected ESXi host.

To avoid these issues, follow the best practices for removing a LUN from an ESX host:
VMware KB: Unmounting a LUN or detaching a datastore/storage device from multiple VMware ESXi 5.x hosts
Best Practice: How to correctly remove a LUN from an ESX host | VMware vSphere Blog - VMware Blogs

Wednesday, April 16, 2014

VMware vCenter SQL Database Maintenance Best Practices

People often ask what the best practices are for maintaining the vCenter SQL database.
One of the VMware best practices (listed in the VMware Health Check/HealthAnalyzer tool) is to periodically perform database maintenance tasks on the vCenter database.
When the vCenter Server service has stopped and cannot be started, in most cases the cause is that the disk holding the database data is full.
This is why monitoring disk space and utilization is important: it ensures that the database has sufficient room to grow.
We should also schedule regular backups of the vCenter database. The backup for vCenter Server should also include the SSL certificates and licenses from the vCenter Server.
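
As a small illustration of the disk space monitoring mentioned above, here is a minimal Python sketch using the standard library's `shutil.disk_usage`. The 15% threshold and the function names are my own assumptions, not VMware recommendations; point `path` at the volume that holds the database files.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the free share of a volume as a number between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_is_low(path, min_free_fraction=0.15):
    """True when the volume containing `path` has less free space than the threshold.

    The 0.15 default is an arbitrary example threshold.
    """
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; use the drive or mount point of the database files.
    if disk_is_low("."):
        print("Warning: database volume is running low on free space")
```

A check like this can be run from a scheduled task so that a filling disk is noticed before the vCenter Server service stops.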

vCenter stores configuration, task, event, and performance data records in the database. Configuration records usually do not grow or change much; they change only when we modify a cluster setting, add a host to a cluster, and so on. Task, event, and performance data records, on the other hand, grow over time and keep adding rows to the database tables.

vCenter Server has a Database Retention Policy setting that allows you to specify when vCenter Server tasks and events should be deleted. vCenter also has a mechanism to purge the database so that it does not overgrow: there are built-in vCenter automated jobs in Microsoft SQL Server that clean up performance data, task, and event records. Because the retention policy does not affect performance data records, old performance records can still be purged using the scripts available in this KB:
Reducing the size of the vCenter Server database when the rollup scripts take a long time to run
http://kb.vmware.com/kb/1025914

To access the Database Retention Policy setting in the vSphere Client, click Administration > vCenter Server Settings > Database Retention Policy.
If it is not set, there is no limit on how long vCenter keeps task and event records in the database, which can also lead to database overgrowth. The default setting is 180 days, so vCenter will purge old data after 180 days.
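
To make the 180-day window concrete, here is a minimal Python sketch of the date arithmetic only. It does not touch vCenter or its database; the function names are hypothetical, and treating "older than the cutoff" as purgeable is my simplification of how the retention setting behaves.

```python
from datetime import datetime, timedelta


def retention_cutoff(now, retention_days=180):
    """Return the oldest timestamp that is still inside the retention window."""
    return now - timedelta(days=retention_days)


def is_purgeable(record_time, now, retention_days=180):
    """True when a task or event record is older than the retention window."""
    return record_time < retention_cutoff(now, retention_days)


if __name__ == "__main__":
    today = datetime(2014, 4, 26)
    print("Records older than", retention_cutoff(today).date(), "would be purged")
```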

vCenter performs basic statistics operations of insert, roll up, and purge. Higher statistics levels require that more work be performed by the vCenter Server for these operations, which can impact the performance of the vCenter Server database.
Higher statistics levels also increase the size of the vCenter database. You can use the database sizing estimator when changing the statistics level to make sure that you have adequate space in the vCenter database.

vCenter statistics levels:
  1. vCenter statistics level 1 includes the basic metrics but does not include statistics for devices.
  2. vCenter statistics level 2 includes all the metrics including statistics for devices.
  3. vCenter statistics level 3 includes all the metrics and all of the counter groups.
  4. vCenter statistics level 4 includes all the metrics supported by vCenter Server.
To prevent performance data from growing too large, we can set the statistics collection level to 1.
It is recommended that the vCenter statistics level be kept at level 1 or 2. Level 2 gives more comprehensive vCenter statistics than the default setting.
When increasing the statistics level from the default of level 1, monitor the vCenter database growth closely.
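
One simple way to watch that growth is to record the database size periodically and extrapolate. The Python sketch below is a rough linear estimate under my own assumptions (sizes in GB, measured by whatever means you prefer, with hypothetical sample values); it is not the VMware database sizing estimator.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until the database reaches capacity_gb.

    samples: list of (day_number, size_gb) tuples, oldest first.
    Uses a straight line between the first and last sample.
    Returns None when the database is not growing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must span a positive number of days")
    rate_gb_per_day = (last_size - first_size) / (last_day - first_day)
    if rate_gb_per_day <= 0:
        return None
    remaining_gb = max(capacity_gb - last_size, 0.0)
    return remaining_gb / rate_gb_per_day


if __name__ == "__main__":
    # Hypothetical measurements: 10 GB on day 0, 20 GB on day 10, 50 GB available.
    print(days_until_full([(0, 10.0), (10, 20.0)], 50.0))
```

If the estimate drops sharply after raising the statistics level, that is a signal to add space or go back to a lower level.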

Consider upgrading to vCenter 5.1 or later if you are still using a vCenter version earlier than 5.1.
vCenter Server 5.1 introduces some significant improvements to the statistics subsystem. The improvements are especially important for vCenter Server 5.1 deployments running at-scale inventory. The database is therefore a critical component of vCenter Server performance. Because the statistics data consumes a large fraction of the database, proper functioning of statistics is an important consideration for the overall database performance. Thus, statistics collection and processing are key components for vCenter Server performance.

VMware vCenter Server 5.1 Database Performance Improvements and Best Practices for Large-Scale Environments: https://www.vmware.com/files/pdf/techpaper/VMware-vCenter-DBPerfBestPractices.pdf

Below are some references & KBs related to vCenter Database Maintenance: