Best practices for VM migration

Last modified Feb 26, 2017

VM migration

ScaleArc does not recommend or support voluntary or automatic vMotion of an active appliance (that is, an HA Primary or a standalone ScaleArc) from one ESX server to another. Testing shows that the vMotion process results in lost database updates, application server errors, and potentially corrupted database records. vMotion is not fast enough to support the real-time operations that ScaleArc depends on.

ScaleArc supports cold migration. First, shut down the ScaleArc appliance before you begin the migration. Next, configure static MAC addresses on all virtual NICs on the appliance. Virtual switches must be configured to both accept and propagate ARP updates; otherwise, the appliance may not be accessible from the network after migration.
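One way to confirm that the static MAC configuration held is to record each NIC's MAC address inside the guest before shutdown and compare it after the cold migration. The following is a minimal sketch, assuming the appliance is a Linux guest that exposes its NICs under /sys/class/net. The helper names (normalize_mac, same_mac, list_macs) are hypothetical and are not part of ScaleArc.

```shell
#!/bin/sh
# Hypothetical helpers for comparing NIC MAC addresses before and
# after a cold migration. Assumes a Linux guest with sysfs mounted.

# Normalize a MAC address to lowercase so comparisons ignore case.
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f'
}

# Succeed (exit status 0) when the two MAC addresses are the same.
same_mac() {
    [ "$(normalize_mac "$1")" = "$(normalize_mac "$2")" ]
}

# Print "<interface> <mac>" for every NIC the guest exposes,
# skipping the loopback interface.
list_macs() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        name=${dev##*/}
        [ "$name" = "lo" ] && continue
        printf '%s %s\n' "$name" "$(normalize_mac "$(cat "$dev/address")")"
    done
}
```

For example, run `list_macs > macs.before` before shutting down, then `list_macs | diff macs.before -` after the migration. Any output from diff indicates that a MAC address changed. The file name macs.before is illustrative; store it somewhere that survives the shutdown.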

If you have configured ScaleArc for HA, the HA Secondary may be vMotioned, provided you have configured static MAC addresses and the virtual switches accept and propagate routing/forwarding ARP updates. To move both members of an HA pair, move the HA Secondary, perform a manual HA switchover, then move the newly demoted HA Secondary.

For best survivability in the event of system or network failures, ScaleArc recommends that you place the two appliances of an HA pair on two separate ESX servers. In case of a failure, rather than attempting to move the HA Primary off a failing ESX server, perform an HA switchover from the GUI of the HA Secondary, then attempt to shut down the newly demoted HA Secondary on the failing ESX server. Before bringing up the appliance that was running on the failed ESX server, contact ScaleArc Support for assistance to avoid problems. In an emergency, the HA Secondary detects the failure of the HA Primary and takes over the traffic. This happens much faster than vMotion.

Should you encounter an emergency on an ESX server running a standalone ScaleArc, you have little choice but to attempt a vMotion. In this case, contact ScaleArc Support immediately to assist you in recovery. Collect a log dump right away, following the instructions in KB article 2736 and using the --date syntax with no hour specified. Collect the logs within an hour of the problem to avoid log rollover and ensure the best assistance, so do not delay. Contact us on our 24x7 toll-free support hotline, listed at support.scalearc.com.

ScaleArc does not recommend or support automatic VM migration with a technology such as VMware’s DRS.

ScaleArc VM provisioning at the ESX level

It is important to consider how ScaleArc VMs are set up at the hypervisor level so that they use "Resource Pools" that have a "Reservation."

For instance, on VMware, you can set up a guaranteed CPU or memory allocation for a given resource pool. A non-zero reservation is subtracted from the unreserved resources of the parent (host or resource pool). The resources are considered reserved regardless of whether virtual machines are associated with the resource pool. The reservation defaults to 0. Such a reservation helps avoid overloading the ESX host, which could otherwise lead to a vMotion.
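The subtraction described here can be made concrete with a little arithmetic. This is an illustrative sketch only: the figures are made up, the function names are hypothetical, and nothing here queries VMware.

```shell
#!/bin/sh
# Hypothetical illustration of how reservations reduce the parent's
# unreserved resources. All values are in MB and are invented.

# remaining_unreserved PARENT_MB RESERVATION_MB...
# Subtract every reservation from the parent's unreserved amount.
remaining_unreserved() {
    remaining=$1
    shift
    for r in "$@"; do
        remaining=$((remaining - r))
    done
    printf '%s\n' "$remaining"
}

# can_reserve REMAINING_MB REQUEST_MB
# Succeed only when the request fits in what is still unreserved.
can_reserve() {
    [ "$2" -le "$1" ]
}
```

With a parent that has 65536 MB unreserved and two pools reserving 16384 MB and 8192 MB, `remaining_unreserved 65536 16384 8192` prints 40960. A further 49152 MB request would then not fit, so `can_reserve 40960 49152` fails.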

For instance, if a large, unrelated, non-ScaleArc VM is manually moved from ESX-B to ESX-A, where ESX-A hosts ScaleArc, and that VM comes online, it might overload ESX-A to the extent of halting the CPU queue and starving all the VMs on ESX-A (including ScaleArc).

Furthermore, if ScaleArc is not excluded as a candidate for vMotion, such a situation might trigger a vMotion of ScaleArc to another ESX host.

vMotion and similar technologies with ScaleArc HA

With vMotion and similar technologies, moving ScaleArc nodes that are in HA is very risky, because it can easily lead to a 'split brain' situation. For this reason, ScaleArc strongly discourages migrating the ScaleArc HA Primary server. Manual monitoring is needed at all times until the activity is complete.

ScaleArc recommends the following approach:

Ensure the machine being moved is a Secondary (if you need to move the Primary server, perform an HA switchover so that it becomes the Secondary before vMotion).
Stop all ScaleArc services and the HA service on this machine.

SSH into the ScaleArc machine to run the following commands:

 sudo service heartbeat stop
 /etc/init.d/idblb stop
 /etc/init.d/idb_watchdog stop
 /etc/init.d/analytics stop

Complete the vMotion or similar activity with this machine. Ensure the machine is not rebooted. (If it is rebooted, stop the HA service.)
Check all network connectivity, including network connectivity with the database servers and the Primary ScaleArc.
Monitor the response lag between the Primary and Secondary ScaleArc. If it is within the HA-configured parameters, proceed; otherwise, fix the networking issue or modify the HA configuration parameters to match the new setup.
If all of the above steps have been checked and verified, start the HA service and the idb services on the Secondary machine, or preferably reboot the Secondary machine.
```
 sudo service heartbeat start
 /etc/init.d/idblb start
 /etc/init.d/idb_watchdog start
 /etc/init.d/analytics start
```
Once the services are up, this machine is in HA as the Secondary.
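The stop and start steps above can be wrapped so that the four commands always run in the same order. The following is a minimal sketch, not a ScaleArc-provided tool: the function names (run, scalearc_services, reachable) and the DRY_RUN variable are hypothetical, and the only commands issued are the ones already listed in this article plus the standard Linux ping. It assumes ICMP is permitted between the hosts, and a successful ping does not prove that database ports are reachable.

```shell
#!/bin/sh
# Hypothetical wrapper around the service commands from this article.

# Run one command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# scalearc_services stop|start
# Issue the four commands from the article in the documented order,
# stopping at the first one that fails.
scalearc_services() {
    action=$1
    case "$action" in
        stop|start) ;;
        *) echo "usage: scalearc_services stop|start" >&2; return 2 ;;
    esac
    run sudo service heartbeat "$action" || return 1
    run /etc/init.d/idblb "$action" || return 1
    run /etc/init.d/idb_watchdog "$action" || return 1
    run /etc/init.d/analytics "$action" || return 1
}

# reachable HOST...
# Ping each host three times and report which ones did not answer.
# Returns non-zero if any host failed.
reachable() {
    failed=0
    for host in "$@"; do
        if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

A possible sequence is `scalearc_services stop` before the move, then `reachable <primary-address> <database-address>` afterwards, and `scalearc_services start` only once every host reports ok. Running `DRY_RUN=1 scalearc_services stop` first prints the commands without executing them.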

Using VMware snapshots with ScaleArc appliances

Using VMware disk snapshot technology with an active HA Primary or standalone ScaleArc appliance can compromise performance and requires that the virtual disk be quiesced briefly during snapshot consolidation. This can cause the ScaleArc virtual machine to pause during consolidation, resulting in application server errors. Therefore, ScaleArc recommends that you keep snapshots only briefly, such as during backup of the virtual disks. Take them at a time of low activity to reduce the possibility of errors. A short-lived snapshot also consolidates more quickly.

Neither VMware nor ScaleArc recommends using snapshots as a backup strategy.

We do not recommend reverting a ScaleArc appliance to a snapshot because of the complex interplay between live system configuration and configuration files on disk.
