Information Systems:VMWare Production Infrastructure

Hardware

uniPHARM purchased a pair of Lenovo x3650 M5 servers in March 2018 from Anisoft to act as hosts for VMware. The machine type of both servers is 8871-16A and the serial numbers are J121W8C and J121W8D. Both servers have identical hardware components and, as of March 2018, identical and current firmware. Both servers have an Intel Xeon E5-2620 v4 processor populating the first socket; the second socket on both servers is empty. The Xeon has 8 cores and 16 threads. Both machines came with an initial 16GB stick of memory in the first slot, and an additional 6 sticks of 8GB were installed into each server so that each has 64GB of memory. Be aware that the motherboard has specific requirements for where additional memory can be inserted. The numerical order in which memory slots can be populated is clearly displayed on the top lid of the server. If more memory is purchased and installed for these servers, the instructions on the top lid must be followed, or the memory will not be recognized correctly. The memory slots labeled 13 to 24 on the motherboard cannot be used until the second CPU socket has a processor.

Each server has a ServeRAID M5210 controller attached to the motherboard. Each M5210 also has a RAID5 daughter card attached which provides additional capabilities. The M5210 can control up to 24 drives in each server. The initial purchase included 5 drives for each server, all 960GB SSDs. Both M5210 controllers are configured with a RAID6 set containing the 5 SSDs. The RAID6 volume can survive the failure of 2 drives before there is data loss. There are no warm or cold spare drives available as of March 2018. The total amount of storage space on each server is 2679GB and the strip size is 256KB. All other parameters of the RAID6 set were left at the M5210 controller defaults. Each server also has a USB thumb drive plugged into the motherboard where the hypervisor software is installed. The thumb drive is 2GB in size and is USB2.0.
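
The 2679GB figure can be sanity checked with some quick math: RAID6 gives up two drives' worth of capacity to parity, and the controller reports space in binary gigabytes while the 960GB SSDs are labelled in decimal gigabytes. The short Python sketch below is only an approximation (the controller also reserves a little space for metadata, which accounts for the small remaining difference):

  # Rough check of the M5210's reported RAID6 capacity (approximation only).
  DRIVES = 5
  DRIVE_BYTES = 960 * 10**9                 # 960GB SSDs are labelled in decimal gigabytes
  drive_gib = DRIVE_BYTES / 2**30           # ~894 binary gigabytes per drive
  usable_gib = (DRIVES - 2) * drive_gib     # RAID6 loses two drives' worth to parity
  print(round(usable_gib))                  # ~2682, close to the 2679GB the controller reports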

Each server has 4 network ports that use the Broadcom NetXtreme chip, plus an additional network port dedicated to IMM2 access (more on the IMM2 below). Each server has 2 USB ports and a VGA port on both the front and the back. The purchased configuration of each server did not include any riser cards for PCIe expansion cards, so if expansion cards are ever needed, the riser cards must be purchased as well. Each server has 2 power supplies which share the load and can each carry the entire electrical load should the other fail. Each power supply in a server is fed from a different power source. The top, or number 2, power supply of each server is fed from the right side PDU in the Power8 rack; the bottom, or number 1, power supply is fed from the left side PDU. The PDUs are the "power strips" on each side of the rack, where the right side gets power from the Liebert UPS and the left side gets power from the Power8 UPS at the bottom of the rack. This all means that in the event of a Hydro power failure, the servers will stay up and operational even if one of the two UPSs fails. Both servers will immediately power off if there is a Hydro power failure AND BOTH UPSs also fail or run out of battery power.

Both servers are located in the Power8 rack and are 2U in size. The top server is at rack unit 18 and the bottom server is at rack unit 16. Both servers can be pulled out on the rack rails and serviced while powered on. The hard drives and power supplies are hot swappable but the memory is not. Neither server has a CD-ROM drive so if a disc needs to be used, the only option is to plug in a USB optical drive.

Hardware - Network Port Map

This list shows the layout of the network cables that connect the servers to the stacked core network switches in the Server Room.

  • Server Room Stacked Switch (1of4 - Top) Port 2 goes to VMHost01 dedicated IMM2 port
  • Server Room Stacked Switch (1of4 - Top) Port 14 goes to VMHost02 dedicated IMM2 port
  • Server Room Stacked Switch (1of4 - Top) Port 3 goes to VMHost01 Management Port on eth2
  • Server Room Stacked Switch (1of4 - Top) Port 15 goes to VMHost02 Management Port on eth2
  • Server Room Stacked Switch (1of4 - Top) Port 4 goes to VMHost01 vMotion Port on eth3
  • Server Room Stacked Switch (1of4 - Top) Port 16 goes to VMHost02 vMotion Port on eth3
  • Server Room Stacked Switch (1of4 - Top) Port 5 goes to VMHost01 LAN1 Uplink on eth0
  • Server Room Stacked Switch (1of4 - Top) Port 17 goes to VMHost01 LAN2 Uplink on eth1
  • Server Room Stacked Switch (1of4 - Top) Port 6 goes to VMHost02 LAN1 Uplink on eth0
  • Server Room Stacked Switch (1of4 - Top) Port 18 goes to VMHost02 LAN2 Uplink on eth1

Hardware Purchase Details

The purchase order number for this project is 8226136 (dated Feb 13, 2018) with a vendor number of 20145 and invoice numbers 19253, 19254, 19256 dated March 14, 2018.

Hardware Support

uniPHARM has a hardware maintenance contract with Lenovo to provide 24x7x365 onsite parts and labour with a 4 hour response time for the period of March 6, 2018 to March 5, 2021. A renewal of this maintenance contract is expected in February 2021 because the expected life span of these servers is 5 to 6 years. Note that "maintenance" is not a good descriptor of the service: if a hardware component fails, the replacement part is provided by Lenovo at no cost and is installed by a Lenovo technician at no cost. If uniPHARM installs non-Lenovo parts, they are not covered by the existing contract. Additional Lenovo branded parts that are installed after the initial purchase are covered under the existing contract. If a non-Lenovo part is installed and causes damage to the server, the agreement becomes null and void. The phone number for parts replacement and technical support under this contract is 1-800-426-7378. This contract does not cover any VMware software.

IMM2

The Integrated Management Module II is an out of band service used to control the x3650 server hardware; it is comparable to the HMC for the Power8. The IMM2 is on, active and accessible as long as the server has power feeding into the power supplies. If both power supplies are unplugged, the IMM2 is not active or accessible. The IMM2 resides on a small system-on-chip (SoC) on the motherboard and is served by a web server within a small Linux OS on the SoC. The IMM2 also has a dedicated network port that is only used for that function. On smaller 1U servers the IMM2 shares the first Broadcom network port.

The IP addresses assigned to the 2 IMM2s are 172.30.18.54 and 172.30.18.55, and the host names are bmvmhost01.unipharm.local and bmvmhost02.unipharm.local. The "BM" prefix is a holdover from the previous IBM "Baseboard Management" naming and this convention is a continuation of that. The username for both IMM2s is adminit and the password is visionit. The IMM2s are not accessible from the public side of the firewall and need VPN access if logging in from outside the local network.
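
If there is ever doubt about whether the IMM2s are reachable before a remote console session is needed, a quick scripted check can confirm that both web interfaces answer. The snippet below is only a sketch that assumes the Python requests package is installed on an I.S. laptop (it is not a Lenovo tool); certificate verification is disabled because the IMM2s use self-signed certificates:

  # Check that both IMM2 web interfaces respond (sketch, assumes the requests package).
  import requests

  requests.packages.urllib3.disable_warnings()      # IMM2 certificates are self-signed

  for host in ("bmvmhost01.unipharm.local", "bmvmhost02.unipharm.local"):
      try:
          r = requests.get(f"https://{host}/", verify=False, timeout=5)
          print(f"{host}: HTTP {r.status_code}")
      except requests.exceptions.RequestException as err:
          print(f"{host}: unreachable ({err})")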

The IMM2 web interface is primarily used to control the hardware, alert on hardware failures and provide a screen console for the server when no physical screen, keyboard or mouse is attached. This is a critical function for troubleshooting or diagnosing software crashes no matter what OS or hypervisor is installed. The console screen function can be presented using an ActiveX, Java or HTML5 app that is served from the IMM2 - no need to install any EXE on a laptop. The Java client works consistently. The IMM2 can power the server on or off and show very detailed information on temperatures, fan speeds, voltages and firmware levels of all components. The IMM2 is also set up to email alerts to I.S. staff when hardware events occur, such as a failed hard drive or power supply. The IMM2 is also configured to call home to Lenovo when a hardware component fails, in the same manner that the HMC connects to IBM when a Power8 hardware failure occurs.

As of March 2018, the x3650 servers do have the latest available firmware and should not need any firmware updates unless Lenovo requires it for replacement parts. If updated firmware is needed, it can be installed from within the IMM2 web interface.

vSphere Hypervisor

The vSphere (ESX) hypervisor is installed on the USB thumb drive that is plugged into the internal motherboard port of each server. The customized Lenovo version of the ESX installer was used because it comes pre-packaged with the device drivers for Lenovo branded hardware. Each server, now known as a host, is set to only boot from that USB thumb drive. The hypervisor operating system boots, loads into memory and is then ready to house and run virtual machines.

  • Server with serial number J121W8C has a host name of vmhost01.unipharm.local
  • The administrator username is root
  • The administrator password is NewVisionIT
  • The management network is on eth2 which is the third network port on the back of the server
  • The management network IP address is 172.30.18.19 subnet 255.255.248.0 gateway 172.30.16.1 and is configured to do DNS lookups to 172.30.18.13 and .14
  • The management network is using the default VLAN

and

  • Server with serial number J121W8D has a host name of vmhost02.unipharm.local
  • The administrator username is root
  • The administrator password is NewVisionIT
  • The management network is on eth2 which is the third network port on the back of the server
  • The management network IP address is 172.30.18.20 subnet 255.255.248.0 gateway 172.30.16.1 and is configured to do DNS lookups to 172.30.18.13 and .14
  • The management network is using the default VLAN

The web administration pages to directly access the ESX hypervisor and bypass vCenter are:

  • https://vmhost01.unipharm.local/ui
  • https://vmhost02.unipharm.local/ui

The above links that go directly to the hypervisor do not need to be used for day to day administration because all administrative tasks should be done within the vCenter user interface. They only need to be accessed if vCenter is unavailable, down or broken.
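
If vCenter is down and something more than the web pages above is needed, the same root credentials can be used programmatically. The sketch below assumes the pyVmomi Python package is installed on an I.S. laptop (pyVmomi is not part of the documented build); it connects straight to one host and lists its VMs and their power state:

  # List the VMs on one host directly, bypassing vCenter (sketch, assumes pyVmomi).
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  ctx = ssl._create_unverified_context()            # the host uses a self-signed certificate
  si = SmartConnect(host="vmhost01.unipharm.local", user="root",
                    pwd="NewVisionIT", sslContext=ctx)
  try:
      content = si.RetrieveContent()
      view = content.viewManager.CreateContainerView(
          content.rootFolder, [vim.VirtualMachine], True)
      for vm in view.view:
          print(f"{vm.name}: {vm.runtime.powerState}")
      view.Destroy()
  finally:
      Disconnect(si)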

vCenter Server Appliance

During Stage 1 of the VCSA setup, the following settings were used

  • The FQDN of the VCSA is vcsa.unipharm.local
  • The IP address of the VCSA is 172.30.18.23
  • The password for the VCSA is NewVisionIT@2051
  • The password requires upper and lower case letters AND a number AND a special character but no spaces are allowed.
  • The VCSA has an embedded Platform Services Controller (as opposed to an external one)
  • The VCSA was deployed in "Tiny" mode and is using a thin provisioned virtual disk

During Stage 2 of the VCSA setup, on the SSO configuration screen, the following settings were used

  • Single Sign-On domain name is vsphere.local
  • Single Sign-On user name is administrator
  • Single Sign-On password is NewVisionIT@2051
  • Site name is uniPHARM
  • SSH access was enabled
  • The following link was used as a guide for the very confusing and badly designed SSO setup https://esxsi.com/2016/11/16/vcsa65/

Please note that if the VCSA virtual machine needs to be rebooted, it will take a good 5 minutes for the web page to become accessible. You may see a plain text page saying the interface is initializing, so be patient while the appliance settles down after a reboot.
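
If waiting on the web page after a reboot gets tedious, the wait can be scripted. The sketch below is an assumption rather than anything shipped with the appliance: it uses the Python requests package, assumes the 6.5 HTML5 client answers at /ui, and assumes the temporary plain text page contains the word "initializing". It simply polls until the real web client comes back:

  # Poll the VCSA web client after a reboot until it is ready (sketch, assumes requests).
  import time
  import requests

  requests.packages.urllib3.disable_warnings()       # appliance certificate is self-signed
  URL = "https://vcsa.unipharm.local/ui/"

  while True:
      try:
          r = requests.get(URL, verify=False, timeout=5)
          if r.status_code == 200 and "initializing" not in r.text.lower():
              print("vCenter web client is up")
              break
      except requests.exceptions.RequestException:
          pass                                       # services are still starting
      time.sleep(30)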

vCenter Management

The virtualization infrastructure is designed according to VMware's best practices. In vCenter, a datacenter has been created and called "uniPHARM Datacenter". It contains a cluster called "uniPHARM Cluster". The cluster contains both hosts, and any virtual machines are listed under the hosts. The cluster was created with DRS and HA turned off to begin with, however they can be turned on at a later time. Each host has a single datastore; they are named VMHost01DataStore01 and VMHost02DataStore01. Each datastore is the entire RAID6 set of 5 SSDs, totaling 2.62TB of usable space. If, in the future, there is ever a need to add storage, the naming convention should continue with VMHost01DataStore02 or VMHost01DataStore03.

vCenter Management - vNetwork

As mentioned above, each host has 4 gigabit network ports. The network configuration for our VMware infrastructure is not complex by VMware standards, but the information below is critical to understanding how it has been designed:

  • The first (vmnic0) and second (vmnic1) NICs are used for ordinary everyday network traffic going to and from the virtual machines
  • The first (vmnic0) NIC is connected to vSwitch2 and is labeled as "uniPHARM Production LAN" - imagine that an invisible virtual switch lies between the first NIC and the real physical switch in the Server Room
  • The second (vmnic1) NIC is initially not configured to do anything as of March 2018, but it can be used for a second Production LAN, some sort of testing LAN, or failover if licensing permits
  • The third (vmnic2) NIC is configured as a vmKernel port (172.30.18.19 & 20) connected to vSwitch0 acting as the "Management Network"
  • The third (vmnic2) NIC and vSwitch0 are only used as the connection that vCenter uses to control the hosts, and this setup is a best practice from VMware
  • The fourth (vmnic3) NIC is configured as a vmKernel port (172.30.24.1 & 2) connected to vSwitch1 on VLAN 3, acting as the "vMotion Network", and is only used when a virtual machine needs to be moved from one host to another

It is CRITICALLY important that ANY vmnic, vSwitch or vmKernel configuration changes made on one host are precisely repeated on the other host. Having different vSwitch labels, for example, prevents vMotion from succeeding. Our EPK license does not permit distributed switches, so configuration changes have to be made manually on each host. A scripted comparison of the two hosts is sketched below.
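
The sketch assumes the pyVmomi Python package is installed on an I.S. laptop (pyVmomi is not part of the documented build). It connects to both hosts with the root credentials listed above and compares vSwitch names plus port group labels and VLAN IDs - exactly the things that break vMotion when they drift apart:

  # Compare standard vSwitch / port group layout on both hosts (sketch, assumes pyVmomi).
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  HOSTS = ("vmhost01.unipharm.local", "vmhost02.unipharm.local")

  def network_layout(hostname):
      """Return {vSwitch name: sorted (port group label, VLAN ID) pairs} for one host."""
      ctx = ssl._create_unverified_context()         # hosts use self-signed certificates
      si = SmartConnect(host=hostname, user="root", pwd="NewVisionIT", sslContext=ctx)
      try:
          content = si.RetrieveContent()
          view = content.viewManager.CreateContainerView(
              content.rootFolder, [vim.HostSystem], True)
          host = view.view[0]                        # connecting directly, so only one host
          layout = {}
          for vswitch in host.config.network.vswitch:
              groups = [(pg.spec.name, pg.spec.vlanId)
                        for pg in host.config.network.portgroup
                        if pg.spec.vswitchName == vswitch.name]
              layout[vswitch.name] = sorted(groups)
          view.Destroy()
          return layout
      finally:
          Disconnect(si)

  first, second = (network_layout(h) for h in HOSTS)
  if first == second:
      print("vSwitch and port group layout matches on both hosts")
  else:
      print("MISMATCH - fix vSwitch labels / VLAN IDs before relying on vMotion")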

VUM - VMware Update Manager

VUM has been configured with a dynamic baseline for the hosts and for the VCSA. The best explanation of how VUM works is this link - https://www.youtube.com/watch?v=X_xGihAfLSo. Please note that the safest way to apply patches and updates to the host hypervisors is to vMotion all the VMs on one host to another host, remediate the now empty host, then vMotion all the VMs back and do the other host. Since we only have 2 or 3 hosts and don't apply patches very often, this method (while time consuming) results in very little downtime and a patched infrastructure.

Content Library

Within vCenter, a content library called uniPHARM Content Library has been created. The purpose of the library is to store ISO and OVF files that are used when creating new VMs so that an OS can be booted - remember that the VMHosts don't have attached CD-ROM drives. Since creating new VMs is not a daily occurrence, the content library should be empty or mostly empty most of the time so that it does not consume storage space that is better used by the VMs themselves.

Virtual Machines

A virtual machine is like a bucket. The bucket contains a simulated processor, memory and a flat file that acts as a hard drive. An OS can be installed inside the bucket and can be given a network connection, which is also simulated. The OS can't tell the difference between real and simulated hardware, so it behaves normally. The bucket that contains the simulated hardware and the OS install sits on top of the hypervisor. The hypervisor takes simulated hardware calls and maps them to real physical hardware so that each bucket gets a slice of processor and memory. The real hardware can house many buckets, and the contents of each bucket don't mix or interact, preserving the isolation between them. And finally, buckets can be moved to different hardware without interrupting the OS contents via hocus pocus magic.

VCSA

The vCenter appliance VM was created with all the default settings as set by the ISO installer. It currently has 2 vCPUs and is using 1.5GB of memory. It has an unusual design, using 14 different thin provisioned VMDK files that act as its hard drives, and is currently only using 25GB of real storage space. The VM hardware version is 10. This VM is very important to the safe and healthy operation of our virtualization infrastructure.

XClarity

Appliance VM that replaces IBM Director. Might not need this after all.

Thermoprofile

First VM to be converted, document settings

Mail

Other VM to be converted, document settings

Mirador

Other VM to be converted, document settings

SuperServer

Other VM to be converted, document settings

WindowsXP JetDirect

DarrenF created this VM a long time ago in VMware Workstation so that the HP JetDirect boxes at each picking station can be controlled, configured and set up correctly. The JetDirect boxes are so old that their web administration pages only work with Microsoft Java, which only exists in a 2002 version of Windows XP prior to SP1 or SP2. The web administration page of the JetDirect boxes works in the same manner as the pages for all the other network attached printers. The admin page for the JetDirect boxes needs to have the correct IP address and host name set plus other print and network options. Newer Windows OSs like Windows 7 and 10 are not able to display the JetDirect web admin page because the version of Java it needs does not exist in newer versions of Windows. Having a VM with an old version of Windows is the most convenient method if someone needs to configure a JetDirect box; otherwise, the VM can be left powered off. This VM is only taking up 16GB of storage space.

Lucy

Other VM to be converted, document settings

Smithers

Other VM to be converted, document settings

Third Host Used For Testing

Describe how after we converted Smithers to a VM, the physical server was added to vCenter as a third host

VMware Tools

Describe how most VM's need the Tools application installed and how it's the method that vCenter uses to talk to VM's in a deeper more spiritual manner

vCenter Standalone Converter

The vCenter Standalone Converter is a badly named application that converts real physical computers into virtual machines running on a host. The application is Windows based and is typically installed on an IT laptop or workstation. The software creates a "triangle" where it talks to both the source computer and the destination host. To begin with, the Converter needs the FQDN of the source computer and local administrator credentials. AD administrator credentials are good enough if they are in the local admin group on the source machine. The Converter then uses a WMI connection to figure out what hardware and OS is present on the source machine. During the "Convert Machine" wizard, the Converter also opens a connection to the VCSA which manages the uniPHARM Cluster. Once the Converter understands the source and destination, the user doing the conversion has a chance to customize certain options, such as how many vCPUs the converted VM will have, how much memory it gets, and whether or not the source machine is powered off at the end of the conversion.

At the end of the wizard the Converter creates an empty "shell" virtual machine on the chosen destination host and begins to copy the source hard drive(s) to it. It is important to note that the drive copy goes directly from the source to the destination and does not hop through the laptop controlling the conversion. When converting Windows machines the hard drive copy process uses VSS, so the VMware best practice is to turn off any non-Microsoft services that are running prior to the conversion to minimize the chance that VSS can't get a lock on a file. It is also best practice to set the source machine in the convert wizard to power off at the end of the last sync, so that the newly created VM can be up and running after the last sync and so that there is no duplicate IP address or name conflict. According to VMware documentation, the same "magic" that makes a vMotion happen is used in the Converter to take running physical machines and turn them into running VMs with only a second or two of pause.

When the Converter has finished copying the hard drive(s) and the last sync, the VM is set to either be up and running or powered off and ready to be powered on for the first time as a VM. In the second scenario, when the converted VM is powered on for the first time, it will recognize a bunch of new hardware. For example, an older IBM server would have a physical Broadcom NIC; after conversion it will have a VMXNET NIC. The Converter will inject all the drivers the newly converted VM needs to recognize the virtualised hardware. A reboot may be needed for a newly created VM, as Windows typically asks for one when new hardware drivers are installed. The person doing the conversion can also choose to have the Converter automatically install the VMware Tools package if needed.

Be aware that the amount of time it takes to convert a physical machine to a VM is limited by network bandwidth. All potential source machines are on the same switch stack as the VMHosts, so that's as fast as it is going to be. A source machine that has SSDs would be bottlenecked by a 1 gigabit network connection, however an older, slower source server with mechanical hard drives might not be. In any case, there is a hard limit on how fast a gigabit connection can move data.
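
As a rough worked example (an estimate only, ignoring protocol overhead), a gigabit link tops out around 125MB per second, so a 60GB source disk takes roughly 8 minutes to copy, which lines up with the 8 minute, 60GB cagepc test recorded in the change log below:

  # Back-of-the-envelope conversion time over a gigabit link (estimate only).
  disk_gb = 60
  throughput_mb_per_s = 1000 / 8            # 1 gigabit per second is roughly 125MB/s
  seconds = disk_gb * 1000 / throughput_mb_per_s
  print(f"~{seconds / 60:.0f} minutes")     # ~8 minutes for a 60GB disk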

The Converter application is currently installed on DarrenF's laptop, but can be installed on other I.S. laptops and is not tied to any licensing. The Converter is free to use but can't do anything useful without vCenter.

VMware Licensing And Support

uniPHARM has purchased a vSphere 6 Essentials Plus Kit that includes vCenter. This means that the license entitles us to have a maximum of 3 hosts each having a maximum of 2 CPU sockets and we are entitled to 1 instance of vCenter. The license is usable forever, but the attached technical support is renewed yearly. The licenses and keys are available through the MyVMware portal at

  • www.vmware.com
  • Username is darrenf@unipharm.com
  • Password is visionit
  • Account number 114624681
  • Customer number 9027898369

uniPHARM also owns licenses for very old versions of VMware Server 2.0 and Workstation 9.0, which are both EOL and no longer usable. While the current EPK licenses are installed correctly inside vCenter, if vCenter ever needs to be re-installed, the process is to run a licensing report from MyVMware, which generates a CSV file that is imported into vCenter. The next step is to assign the vCenter license to itself and the vSphere licenses to the hosts.

Change Log - Try And Record Any Major Configuration Or Software/Hardware Changes To The Infrastructure Here

  1. Both servers have their UEFI/BIOS set to only boot from the USB thumb drive
  2. Both servers have their UEFI/BIOS set to use maximum performance in the power settings as per this KB article https://www.ibm.com/support/home/docdisplay?lndocid=migr-5098137
  3. Initial ESX version installed on March 21, 2018 is 6.5.0 Build 7388607
  4. VMHost01DataStore01 created as the first datastore
  5. The local datastores are seen as non-ssd by ESX because they are behind a RAID controller and not part of a vSAN
  6. VCSA password is NewVisionIT@2051
  7. vcsa.unipharm.local is at 172.30.18.23
  8. VCSA virtual machine set to automatically start when the hypervisor on VMHost01 boots
  9. VCSA added to Active Directory as a computer object for SSO on March 22, 2018, located in the Servers OU
  10. Configured SSO according to https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.vcsa.doc/GUID-08EA2F92-78A7-4EFF-880E-2B63ACC962F3.html
  11. Added DarrenF and NorwinU AD user accounts as being able to log into VCSA web client
  12. vMotion IP on VMHost01 is 172.30.24.1 IP on VMHost02 is 172.30.24.2
  13. 8 minutes to vMotion 60GB Win7 cagepc as a test P2V on March 23, 2018 with no VLAN
  14. Created content library and uploaded the xclarity OVF and 3CX ISO files March 23, 2018
  15. Added VLAN 5 to vMotion Network - NorwinU configured the physical switch to use VLAN 5 on ports 4 and 16 on Stacked Switch 1of4 March 23, 2018
  16. Changed VLAN 5 to 3 for some reason and changed the TCPIP stack for the vMotion Network for optimization
  17. Added DarrenF and NorwinU AD accounts as permitted Administrators to the entire VCSA hierarchy and got SSO working correctly
  18. Configured VUM to scan for updates each Monday of every week and email DarrenF
  19. Ran a VUM Remediation to install Spectre VIBs via the VUM wizard; both hosts now at 6.5 build 7967591, March 26, 2018
  20. P2Ved the Thermoprofile physical server onto VMHost01 March 26, 2018