Information Systems:Server Virtualization Planning

Overview

This article is a drawing board or "pinterest" for the server virtualization project. The phases loosely represent periods of concentrated effort on the project, e.g. ~2015-2016 for initial discussions prior to training, 2017 for discussion throughout training, etc.

Discussion

Phase 1 Planning

This was written before VMware training. -norwizzle (talk)

Windows Server licensing

  • Datacenter is very expensive, but entitles us to unlimited virtual instances on a single machine with two processors.
  • Standard edition allows for 2 virtual instances per license
    • Worth checking whether our existing Standard licenses can each be converted to cover two virtual instances (a rough count sketch follows below).
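
To make the licensing comparison concrete, here is a minimal license-count sketch. It assumes only the rule stated above (Standard covers 2 virtual instances per license, Datacenter covers unlimited instances on a licensed host); the VM count is a placeholder, not our actual inventory, and the actual Microsoft terms should be verified before purchase.

    import math

    # Placeholder: Windows VMs expected on a single two-processor host.
    windows_vms_on_host = 8

    # Rule of thumb from the notes above (verify against current licensing terms):
    # Standard = 2 virtual instances per license, Datacenter = unlimited.
    STD_INSTANCES_PER_LICENSE = 2

    std_licenses = math.ceil(windows_vms_on_host / STD_INSTANCES_PER_LICENSE)
    print(f"{windows_vms_on_host} Windows VMs on one host:")
    print(f"  Standard:   {std_licenses} license(s)")
    print("  Datacenter: 1 license (unlimited virtual instances)")

The break-even point between the two editions then depends on per-license pricing, which is not captured here.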

Hardware

  • 2 hosts
  • 10-14 core processors most ideal
  • SAN - this will be a major decision
    • Shared storage is not required for vMotion, which lets a running virtual machine be migrated to another host.
    • Shared storage, however, is required for high availability, i.e. seamless machine failover.
      • This would be a nice thing to have, but our servers don't need this kind of 100% uptime. Or rather, the ones that do will not be virtualized.
    • SANs are very expensive, but if we do consider one, there is some relief in the fact that we won't need a huge amount of capacity for each virtual machine (a rough sizing sketch follows below).
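
As a rough illustration of that sizing point, the sketch below totals a hypothetical set of provisioned VM disk sizes with a growth allowance. Every figure (VM names, sizes, headroom) is a placeholder, not our actual inventory.

    # Placeholder per-VM provisioned disk sizes, in GB.
    vm_disks_gb = {
        "dc01": 80,
        "app01": 150,
        "web01": 120,
        "db01": 400,
    }

    GROWTH_FACTOR = 1.3  # assumed 30% headroom for growth and snapshots

    total_gb = sum(vm_disks_gb.values())
    print(f"Provisioned today: {total_gb} GB")
    print(f"With headroom:     ~{total_gb * GROWTH_FACTOR:.0f} GB")

Even with generous headroom, the per-VM figures stay modest, which supports the point that raw capacity is not the expensive part of the SAN decision.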

VMware vSphere

For clarity:

  • vSphere is the virtualization software package
    • ESX/ESXi is the hypervisor
    • vMotion and vCenter are components of the vSphere suite
  • There are two main licensing categories: Essentials and Operations Management

This is another major decision that has a significant impact going forward.

    • Essentials entitles you to three hosts, or 6 CPUs, but you will not be able to expand beyond this without practically purchasing a new license altogether.
    • Essentials Plus (which is what we would consider if we went the Essentials route) entitles us to vMotion.

Phase 2 Planning

This section discusses ideas throughout VMware training.
Note: It is apparent that this project still carries the potential to be very expensive. That is my main concern halfway through the training: that the expense of implementation might exceed the savings from virtualization. Scale needs to be a major factor during implementation planning; otherwise it will make more sense not to virtualize and just replace physical servers at the current rate of replacement. -norwizzle (talk)
  • Cost/features. The areas that need to be scrutinized for cost are:
    • Storage. Shared or local storage? If shared, Fibre Channel can be ruled out: it makes sense with an existing FC infrastructure, but it is too expensive to build from scratch. That leaves shared SAS or iSCSI. Being a networking guy, I'd love to do iSCSI. Then again, I'd hate to deal with the coupling of network and infrastructure downtime, should either happen.
      • There is actually a big case here for scrapping the idea of shared storage and just using local storage, i.e. redundant arrays on each host. The new-ish vSphere feature that makes local storage feasible is cross-host vMotion. Along with cold migration, this may be all we need to meet or even exceed the level of service provided by our current infrastructure. Right now, I see shared storage as representing perhaps 1/3 of the project cost. While it would still be good to have, perhaps the acquisition of shared storage can be implemented as a separate, future phase of this project (i.e. migration to shared storage in year 20xx).
    • vSphere license. Essentials Plus still appears to be the right flavor for us: no DRS, no FT, no Distributed Switch, but it does include vMotion (although not Storage vMotion). This license allows for 3 hosts.
    • Host hardware configuration. We definitely want two hosts for redundancy, but with 12-core CPUs, I'm not sure we'll need dual CPUs per host. RAM can be scaled back too, perhaps to 64GB per host. Overprovisioning resources to VMs seems to be standard practice anyway, and our compute needs are steady (not fluctuating).
  • High availability/Disaster recovery. Deploying VMware vSphere (really, doing virtualization in general) raises the baseline for high availability/redundancy, i.e. virtualizing an entire infrastructure immediately makes it more redundant and better able to cope with downtime (planned or disaster-related). However, HA/DR has many faces, and it will be important to weed out the features that are either not feasible or not ideal in our infrastructure, so we can establish that "baseline". For example, the following features are likely not a good fit:
    1. vSphere HA: This feature is a maybe, or fits under "would be nice", but 'compute HA' is not something we currently have anyway. Requires shared storage.
    2. vSphere FT: Requires running duplicate copies of the virtual machine, with high compute overhead. We do not require the level of uptime afforded by this feature.
    3. vSphere Replication/Data Protection: These are separate products, and may be better considered down the road, if we feel the need to back up virtual machines off-site.
  • vCenter Deployment. vCenter deployment can be a Catch-22: we are trying to virtualize our entire infrastructure, yet we need a host external to that infrastructure to manage it. There are several online resources (Google it) discussing vCenter as a VM within the cluster (how meta!). That is, vCenter as a VM managing the infrastructure on which it itself is virtualized.
I'm actually doing this in my home lab. It works very well, and I've even vMotion-ed it a bunch of times to perform maintenance on the other host. But the potential risks should be considered. -norwizzle (talk)
  • An alternative could be to recycle the physical machine of one of the servers that gets virtualized. Consider that, while vCenter is a very important part of the vSphere infrastructure, connecting to the hosts directly to perform emergency tasks is possible through vSphere Client. So even if vCenter is down, you can access the hosts.
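
To illustrate the direct-host-access point above in scripted form: the sketch below uses pyVmomi (assuming the pyvmomi package is installed; the hostname and credentials are placeholders) to connect straight to an ESXi host and list its VMs, with no vCenter involved.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder host and credentials, for illustration only.
    HOST, USER, PWD = "esxi01.example.local", "root", "********"

    # Lab-style connection that skips certificate validation.
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host=HOST, user=USER, pwd=PWD, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        # Enumerate all virtual machines registered on this host.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            print(vm.name, vm.runtime.powerState)
    finally:
        Disconnect(si)

The same connection code works against a vCenter address when it is available, so emergency management of the hosts does not depend on vCenter being up.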

Phase 3 Planning

Post-training discussion.
  • Most of the topics discussed in the previous sections/phases are still relevant.
  • Another point of discussion is the Smithers server. It is a dual-CPU, 12-core, 24-thread (6c/12t per CPU) system with 32GB of DDR4 RAM, which is more than enough power to be one of the hosts. Although, from a business perspective, this machine was approved and purchased for the workloads it currently serves, it is still heavily under-utilized (CPU-wise). Since the virtualization of our current physical servers will not happen all at once, there is potential value in using this machine as one of the hosts. Recall that the point of virtualization is to shift the emphasis away from the hardware specifications of any one physical machine: hosts should simply be considered compute nodes, and by that measure Smithers appears quite capable, even if it is one or two CPU generations removed from what you'd buy today.
    • A potential caveat is that Smithers' processor is, or soon will be, 2-3 generations older than a newly purchased host's. That means little for compute capability, but vMotion requires CPU compatibility between hosts. vSphere has a feature to work around this (EVC clustering), but it should be kept in mind nonetheless.
  • At this point, storage IO should start being considered. Whether shared or local, storage will be redundant, so an array will be required. Single-level RAID arrays (1, 5, 6) will likely not satisfy the IO requirements once usage by multiple VMs is factored in, so we may have to consider nested RAID. Nested RAID usually means more disks, and there will be a trade-off between the number of disks in a local array (which also has to be doubled, one array per host) vs. "just going with a SAN". Even with a SAN, there is the possibility of hard disks still being too slow. The incorporation of flash, whether as cache or as a datastore for operating system partitions, should be strongly considered (a rough IOPS sketch follows after this list).
  • Superserver virtualization somewhat complicates storage planning. At 3.5TB, it would not be feasible to have two arrays (one on each host) of that size. So either the Superserver VM and its virtual disks are localized to one host (no redundancy), or Superserver is not virtualized until a SAN can be purchased. The latter option is not feasible. Superserver is not compute-redundant today, so it may be acceptable to localize it on a single VM host; the problem, however, is that Superserver's storage requirements would still demand a large array on that host.
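
As a back-of-envelope illustration of the storage IO concern above, the sketch below applies the standard RAID write-penalty factors (RAID 1/10 = 2, RAID 5 = 4, RAID 6 = 6) to a small spindle array. The per-disk IOPS figure and read/write mix are assumptions, not measurements from our servers.

    # Effective array IOPS for a given read/write mix and RAID write penalty.
    DISK_IOPS = 150      # assumed 10K SAS spindle
    DISKS = 6            # assumed disks per local array
    READ_RATIO = 0.7     # assumed 70/30 read/write workload

    WRITE_PENALTY = {"RAID 1/10": 2, "RAID 5": 4, "RAID 6": 6}

    raw_iops = DISK_IOPS * DISKS
    for level, penalty in WRITE_PENALTY.items():
        # effective = raw / (read% + write% * write penalty)
        effective = raw_iops / (READ_RATIO + (1 - READ_RATIO) * penalty)
        print(f"{level}: ~{effective:.0f} usable IOPS from {DISKS} x {DISK_IOPS}-IOPS disks")

Substituting an SSD-class figure for DISK_IOPS shows why flash, whether as cache or as a datastore, changes the picture entirely.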

Phase 4 Planning

Proposed purchase list
  • (2) Lenovo X3650 M5 (MT 8871)
    • 16GB memory
    • 1 x E5-2620 v4 10-core (10 physical / 20 logical), 2.4GHz per core
  • (12) Lenovo 8GB DDR4 ECC
    • 6 per host + included 16GB = 64GB RAM per host
  • (10) Lenovo PM863a 960GB Enterprise Entry SSD
    • All-flash storage
    • (1) 5-disk RAID 6 array per host
    • 2-disk failure tolerance
    • ~2.5TB of storage per host
    • Controlled by the included M5210 controller
  • (2) 1GB cache / RAID 6 feature upgrade for the M5210
  • (1) VMware Essentials Plus Kit + 3-year support
  • (2) Lenovo Technician Installed Parts, 3-year 24x7, 4-hour response time

Total resources (across 2-host cluster): 20 physical cores, 128GB RAM, ~5TB of space across 2 independent arrays
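
A quick sanity check of those totals, using only the quantities from the purchase list. The one assumption is that RAID 6 usable capacity is roughly (disks - 2) x disk size before VMFS/formatting overhead, which is why the raw figure comes out a little above the conservative ~2.5TB quoted per host.

    HOSTS = 2
    CORES_PER_CPU = 10                  # one E5-2620 v4 per host
    DIMMS_PER_HOST, DIMM_GB = 6, 8      # added DDR4 ECC modules
    BASE_RAM_GB = 16                    # memory included with each server
    DISKS_PER_ARRAY, DISK_GB = 5, 960   # one RAID 6 array per host

    ram_per_host = DIMMS_PER_HOST * DIMM_GB + BASE_RAM_GB   # 48 + 16 = 64 GB
    usable_gb = (DISKS_PER_ARRAY - 2) * DISK_GB             # raw RAID 6 capacity

    print(f"Per host: {CORES_PER_CPU} cores, {ram_per_host} GB RAM, "
          f"~{usable_gb / 1000:.1f} TB usable (RAID 6)")
    print(f"Cluster:  {HOSTS * CORES_PER_CPU} cores, {HOSTS * ram_per_host} GB RAM, "
          f"~{HOSTS * usable_gb / 1000:.1f} TB across {HOSTS} independent arrays")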