I’ve been lucky enough in the last couple of days to get hands-on with some Cisco UCS kit. Coming from a 99% HP environment, it’s been a very new experience. I’ll try not to get too bogged down in technical details, but I wanted to note down what I liked and what I didn’t like about my initial ventures into UCS.
As ever with things like this, I didn’t spend weeks reading the manual. If I did that, I’d do nothing but read manuals, with no time to do any actual work. I did go through a few blog posts and guides by fellow bloggers who have covered UCS in much more detail than I will (at this stage at least).
It seems that the unique selling point of the UCS system is “service profiles”. Rather than setting up a given blade in a given slot, a profile is created and then either assigned to a specific server or allocated from a pool of servers. The profile contains a number of configuration items, such as the number and configuration of the NICs and HBAs that a blade will have, and the order in which the server will try devices for boot.
The last item seems the most critical because, in order to turn our UCS blades into stateless bits of tin, I am building the service profiles to boot from SAN. Specifically, they will be booting into ESXi, stored on a LUN on a NetApp FAS2020 storage unit. The NetApp kit was also a little on the new side to me, so I’m looking forward to documenting my journey with that too!
Before heading deep into deploying multiple service profiles from a template, I thought I would start with some (relative) baby steps: create a single service profile, apply that profile to a blade and install ESXi onto an attached LUN, which I would then boot from. A colleague had predefined some MAC and WWN pools for me, so I didn’t have to worry about what was going to happen with those.
Creating the service profile from scratch using the expert mode ran me through a fairly lengthy wizard that allowed me to deploy a pair of vNICs and a pair of vHBAs on the appropriate fabrics. A boot policy was also defined to enable boot from a virtual CD-ROM, followed by the SAN boot. At this point I found my first gotcha. It was a lot easier to give the vHBAs a generic name, such as fc0 and fc1, rather than a device-specific one, e.g. SRV01-HBA-A. Using the generic name would later allow me to use the same boot policy for all servers at a template level. You also have to specify the WWPN for the SAN target and, as the lab only had a single SAN at the time of writing, a single set of WWPNs could be put in. If you had requirements for different target WWPNs, you would need a number of boot policies.
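To show why the generic names matter, here is a minimal Python sketch. It is a toy model and not the UCS Manager API: the class names, the `applies_to` helper and the target WWPNs are all made up for illustration. The point is simply that a boot policy refers to vHBAs by name, so it can only be shared between profiles that use the same names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SanBootEntry:
    vhba_name: str    # vHBA the blade boots through, e.g. "fc0"
    target_wwpn: str  # WWPN of the storage port (made-up values below)


@dataclass(frozen=True)
class BootPolicy:
    name: str
    san_boot: tuple   # SanBootEntry items, tried after the virtual CD-ROM

    def applies_to(self, profile_vhbas):
        """Return True if every vHBA the policy names exists in the profile."""
        return all(entry.vhba_name in profile_vhbas for entry in self.san_boot)


# One policy written against generic vHBA names (hypothetical WWPNs).
shared_policy = BootPolicy(
    name="boot-from-san",
    san_boot=(
        SanBootEntry("fc0", "50:0a:09:81:00:00:00:01"),
        SanBootEntry("fc1", "50:0a:09:82:00:00:00:01"),
    ),
)

generic_profile = {"fc0", "fc1"}
specific_profile = {"SRV01-HBA-A", "SRV01-HBA-B"}

print(shared_policy.applies_to(generic_profile))   # policy can be reused
print(shared_policy.applies_to(specific_profile))  # policy would need a per-server copy
```

With device-specific names, every server would need its own copy of the policy, which defeats the purpose of a template.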
Working our way back down the stack to the storage, the next task was to create the zone on the Nexus 5000 fabric switches. For Cisco “old hands”, here is a great video on how to do this via an SSH session.
I had just spent a bit of time getting a local install of Fabric Manager to run, because the service account for its local PostgreSQL database had lost the right to run as a service (which was nice), so I was determined to use Fabric Manager to define the zones. As with zoning on any system, you need to persuade the HBA to log into the fabric. As a boot target had already been defined, the blade will attempt to log into the fabric on startup, but it did mean powering it on and waiting for the SAN boot to fail. Once this was done, the HBAs could be assigned an alias, then dropped into a zone along with the WWPN of the storage, and finally rolled up into a zone set. Given that UCS is supposed to be a unified system, this particular step seems a little bit clunky and would take me quite some time if I had 100 blades to configure. I will be interested to see if I can find a more elegant solution in the upcoming weeks.
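For anyone facing those 100 blades, the alias, zone and zone set steps could in principle be generated rather than clicked through. The following is a rough Python sketch of that idea and not something I have run against the lab. The function name, alias and zone naming scheme, VSAN number and WWPNs are all hypothetical, and I have assumed one zone per initiator. The command text follows the usual NX-OS device-alias and zoning syntax, but check it against your own switch release before pasting anything in.

```python
def zoning_commands(vsan, zoneset, target_wwpn, initiators):
    """Build NX-OS style config lines for one fabric.

    initiators maps an alias name to the HBA's WWPN. Each HBA gets its own
    zone containing the HBA alias and the storage target's WWPN.
    """
    lines = ["device-alias database"]
    for alias, wwpn in initiators.items():
        lines.append(f"  device-alias name {alias} pwwn {wwpn}")
    lines.append("device-alias commit")

    zone_names = []
    for alias in initiators:
        zone = f"z_{alias}"
        zone_names.append(zone)
        lines.append(f"zone name {zone} vsan {vsan}")
        lines.append(f"  member device-alias {alias}")
        lines.append(f"  member pwwn {target_wwpn}")

    # Roll every zone up into the zone set and activate it.
    lines.append(f"zoneset name {zoneset} vsan {vsan}")
    for zone in zone_names:
        lines.append(f"  member {zone}")
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    return lines


# Hypothetical fabric A values, for illustration only.
commands = zoning_commands(
    vsan=10,
    zoneset="zs_fabric_a",
    target_wwpn="50:0a:09:81:00:00:00:01",
    initiators={"esx01_fc0": "20:00:00:25:b5:00:00:0a"},
)
print("\n".join(commands))
```

This still leaves the problem that each HBA's WWPN has to be known up front, which is the part that forced me to power the blade on and wait for the SAN boot to fail.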
Last but not least, I had to configure a disk. For this I used NetApp System Manager to create a LUN and its associated volume. I then added an initiator group containing the two HBA WWPNs and presented the LUN to that group. Again, this seems like quite a lot of steps to be doing when provisioning a large number of hosts. Any orchestration system that made this more scalable would have to be able to talk to UCS or the fabric to pull the WWPNs, provision the storage and present it accordingly.
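As a thought experiment, the data such an orchestration layer would need to carry per host is quite small. The Python sketch below only builds a plan; it does not talk to UCS or to the NetApp, and the inventory dictionary stands in for whatever would really supply the WWPNs. The naming scheme, the 10 GB default size and the use of LUN ID 0 for the boot LUN are my own assumptions.

```python
def plan_boot_storage(inventory, lun_size_gb=10):
    """Turn {host: [wwpn_a, wwpn_b]} into one storage plan entry per host."""
    plan = []
    for host in sorted(inventory):
        wwpns = inventory[host]
        if len(wwpns) != 2:
            # One vHBA per fabric is expected, as in the service profile.
            raise ValueError(f"{host}: expected 2 WWPNs, got {len(wwpns)}")
        volume = f"vol_{host}_boot"
        plan.append({
            "host": host,
            "volume": volume,
            "lun_path": f"/vol/{volume}/{host}_boot",
            "size_gb": lun_size_gb,
            "igroup": f"ig_{host}",
            "initiators": list(wwpns),
            "lun_id": 0,  # assumption: boot LUN presented as ID 0
        })
    return plan


# Hypothetical inventory; in practice this would be pulled from UCS or the fabric.
inventory = {
    "esx02": ["20:00:00:25:b5:00:00:0b", "20:00:00:25:b5:00:01:0b"],
    "esx01": ["20:00:00:25:b5:00:00:0a", "20:00:00:25:b5:00:01:0a"],
}
for entry in plan_boot_storage(inventory):
    print(entry["host"], entry["igroup"], entry["lun_path"])
```

Each entry maps onto the steps I did by hand: create the volume and LUN, create the initiator group with both WWPNs, then present the LUN to that group.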
The last step was to mount an ISO to the blade and install ESXi. This is the only step where I’m not really left pondering how I would do it if I had not 1 but 100 hosts to deploy: I’d certainly look to PXE boot the servers and deploy ESXi with something like the EDA. By this stage I figured it was time to sit back with a cup of tea and ponder further about how to scale this out a bit. However, when I rebooted the server after the ESXi install, instead of ESXi starting, I was dumped back to the “no boot device found: hit any key” message.
This was a bit of a setback, as you can imagine, so I started to troubleshoot from the ground up. Had I zoned it correctly? Did I present it correctly? Had I got the boot policy correct? I worked my way through every blog post and guide I could find, but to no avail. I even attempted to recreate the service profile on the same blade, but again no joy. It would see the LUN to install to, but not to boot from. As Charlie Sheen has shown, “when the going gets tough, the tough get tweeting”, so I reached out to the hive mind that is Twitter. I had some great replies from @ChrisFendya and @Mike_Laverick, who both suggested a hardware reset (although Mike suggested it in a non-UCS way). The best way for me to achieve this was to “migrate” the service profile to another blade. This was really easy to do, and one reboot later I was very relieved to see it had worked. It seems that sometimes UCS just doesn’t set the boot policy on the HBA, which is resolved by reassociating the profile.
I look forward to being able to deploy a few more hosts and making my UCS setup as agile as the marketing materials would suggest!