Archive for June, 2011


No, I’m not buying a Prius! It’s much better than that. As of today I’m no longer with my current employer – it was an amicable parting of ways and I would recommend them to anyone interested in the datacenter space, especially around the Bucks / Beds areas! If you are, please let me know.

 

I will be starting a new role next week with Veeam as a Solutions Architect for the UK & Northern EMEA – being part of the “dream team” of staff helping existing & new customers get the best from their investments. You never know, you may be unlucky enough to see me at a tradeshow near you at some point on “Booth Babe” duty. I’ve known the company for a few years and am really looking forward to being part of a team of professionals with such a great drive and approach to technology & the people that use it.

 

I will continue to blog and co-host the vSoup podcast, and will endeavour to keep both a plug-free zone! While I don’t normally make a huge thing of pushing my employers, I felt it important to post in the interests of disclosure. Outside of my job I will not be receiving any additional sponsorship from Veeam for the blog or podcast.

This is a blog post that I’ve had at the back of my mind for a good six months or so. The pieces of the puzzle have come together after the Gestalt IT Tech Field Day event in Boston. After spending the best part of a week with some very, very clever virtualisation pros, I think I’ve managed to marshal the ideas that have been trying to make the cerebral-cortex-to-WordPress migration for some time!

Managing an environment, be it physical or virtual, for capacity & performance requires tools that can provide you with a view along the timeline. Often the key difference between dedicated “capacity management” offerings and performance management tools is the very scale of that timeline.


Short Term: Performance & Availability

Here we are looking at timings within a few seconds or minutes (or less). This is where a toolset is going to be focused on current performance of any particular metric, be it the response time to load a web application, utilisation of a processor core or the command operations rate on a disk array. The tools that are best placed to give us that information need to be capable of processing a large volume of data very quickly, due to the requirement to pull in a given metric at a very frequent interval. The more frequently you can sample the data, the better quality output the tool can give. This can present a problem in large scale deployments because many tools have to write this data out to a table in a database – this potentially tethers the performance of a monitoring tool to the underlying storage available to it, which of course can be increased, but sometimes at quite a significant cost. As a result you may want to scope the use of such tools only to the workloads that require that short term, high resolution monitoring. In a production environment with a known baseline workload, tools that use a dynamic threshold / profile for alerting on a metric can be very useful here (for example Xangati or vCenter Operations). If you don’t have a workload that can be suitably baselined (and note that the baseline can vary with your business cycle, so may well take 12 months to establish!) then the dynamic thresholds are not of as much use.
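To make the dynamic threshold idea a little more concrete, here’s a minimal sketch of the general technique – a rolling baseline that flags a sample once it strays more than a few standard deviations from recent history. This is purely my own illustration, not how Xangati or vCenter Operations actually implement it:

```python
from collections import deque
from statistics import mean, stdev

def make_dynamic_threshold(window=30, k=3.0):
    """Return a check(sample) function that learns a rolling baseline.

    window -- number of recent samples that form the baseline
    k      -- how many standard deviations away counts as an anomaly
    """
    history = deque(maxlen=window)

    def check(sample):
        if len(history) < window:          # still learning, never alert
            history.append(sample)
            return False
        baseline, spread = mean(history), stdev(history)
        anomaly = abs(sample - baseline) > k * spread
        history.append(sample)
        return anomaly

    return check

# Example: CPU utilisation (%) sampled every 20 seconds
check_cpu = make_dynamic_threshold(window=5, k=3.0)
for pct in [22, 25, 24, 23, 26, 24, 91]:
    if check_cpu(pct):
        print(f"CPU utilisation {pct}% is outside the learned baseline")
```

The window length is doing the same job as the baselining period mentioned above – too short and the baseline never settles, too long and it takes an age to learn the normal business cycle.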

Availability tools have less of a reliance on a high performance data layer as they are essentially storing a single bit of data for a given metric, which means the toolset can scale pretty well. The key part of availability monitoring is the visualisation and reporting layer. There is no point displaying that data on a beautiful and elegant dashboard if no-one is there to see it (and according to the Zen theory of network operations, would it change if there was no one there to watch it?). The data needs to be fed into a system that best allows an action to be taken – even if it’s an SMS / page to someone who is asleep. In this kind of case, having suitable thresholds is important – you don’t want to be setting fire alarms off for a blip in a system that does not affect the end service. Know the dependencies of the service and try to ensure that the root cause alert is the first one sent out. You need to know that the router that affects 10,000 websites is down long before you get alerts for those individual websites.
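As a rough illustration of that root-cause-first idea, the sketch below walks a completely made-up dependency map and only raises an alert for the highest failed component in each chain:

```python
# Hypothetical dependency map: component -> what it depends on (None = top level).
DEPENDS_ON = {
    "website-001": "core-router",
    "website-002": "core-router",
    "core-router": None,
}

def root_cause(component, down):
    """Walk up the dependency chain and return the highest failed component."""
    cause, parent = component, DEPENDS_ON.get(component)
    while parent is not None and parent in down:
        cause, parent = parent, DEPENDS_ON.get(parent)
    return cause

def alerts_to_send(down):
    """Page for root causes only, not for everything downstream of them."""
    return {root_cause(component, down) for component in down}

print(alerts_to_send({"website-001", "website-002", "core-router"}))
# -> {'core-router'}: one alert about the router, not thousands about the websites
```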

Medium Term: Trending & Optimisation

Where the timeline goes beyond “what’s wrong now”, you can start to look at what’s going to go wrong soon. This is edge-of-the-crystal-ball stuff, where predictions are being made in the order of days or weeks. Based on utilisation data collected over a given period, we can assess whether we have sufficient capacity to provide an acceptable service level in the near future. At this stage, adjustments can be made to the infrastructure in the form of resource balancing (by storage or traditional load) – tweaks can also be made to virtual machine configuration to “rightsize” an environment. By using these techniques it is possible to reclaim over-allocated space and delay potential hardware expansions, which is especially valuable where there may be a long lead time on a hardware order. The types of recommendations generated by the capacity optimisation components of the VKernel, NetApp (Akorri) and SolarWinds products are great examples of rightsizing calculations. As the environment scales up, not only are we looking for optimisations, but potential automated remediation (within the bounds of a change-controlled environment) would save time and therefore money.
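As a back-of-the-envelope example of that sort of trending – and nothing to do with how the VKernel, Akorri or SolarWinds engines actually calculate it – the sketch below fits a straight line through recent utilisation samples and estimates how many days of headroom remain:

```python
def days_until_full(samples, capacity):
    """Fit a straight line through (day, used) samples and project forward.

    samples  -- list of (day_number, units_used) tuples, e.g. GB on a datastore
    capacity -- total units available
    Returns the estimated days from the last sample until capacity is reached,
    or None if usage is flat or shrinking.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / \
            sum((x - mean_x) ** 2 for x, _ in samples)
    if slope <= 0:
        return None
    last_day, last_used = samples[-1]
    return (capacity - last_used) / slope

# Example: a 2 TB datastore growing by roughly 15 GB a day
usage = [(0, 1500), (7, 1610), (14, 1710), (21, 1820)]
print(f"~{days_until_full(usage, 2048):.0f} days of headroom left")
```

Real products obviously do something far smarter than a straight line, but even this much is enough to tell you whether that long-lead-time hardware order needs to go in this quarter or next.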

Long Term Capacity Analysis: When do we need to migrate datacenters?

Trying to predict what is going to happen to an IT infrastructure in the long term is a little like trying to predict the weather in five years’ time – you know roughly what might happen, but you don’t really know when. Taking a tangent away from the technology side of things, this is where the IT strategy comes in – knowing what applications are likely to come into the pipeline. Without this knowledge you can only guess how much capacity you will need in the long term. The process can be bidirectional though, with the information from a capacity management function being fed back into the wider picture for architectural strategy. For example, should a lack of physical space be discovered, this may combine with a strategy to refresh existing servers with blades. Larger enterprises will often deploy dedicated capacity management software to do this (for example Metron’s Athene product, which will model capacity for not only the virtual but also the physical environment).

Long term trending is a key part of a capacity management strategy, but it will need to be blended with a solution that allows environmental modelling and what-if scenarios. Within the virtual environment the scheduled modelling feature of VKernel’s vOperations Suite is possibly the best example of this that I’ve come across so far – all that is missing is an API to link to any particular enterprise architecture applications. When planning for growth, not only must the growth of the application set be considered but also the expansion of the management framework around it, including but not limited to backup and the short-to-medium term monitoring solutions. Unless you are consuming your IT infrastructure as a service, you will not be able to get away with a suite that only looks at the virtual piece of the puzzle – power, cooling & available space need to be considered – look far enough into the future and you may want to look at some new premises!

We’re going to need a bigger house to fit the one pane of glass into…

“One pane of glass” is a phrase I hear very often, but not something I’ve really seen so far. Given the many facets of a management solution I have touched on above, that single pane of glass is going to need to display a lot! With so many metrics and visualisations to put together, you’d have a very cluttered single pane. Consolidating data from many systems into a mash-up portal is about the best that can be done, and yet there isn’t a single framework to date that can really tick all the boxes. Given the lack of a “saviour” product you may feel disheartened, but have faith! As the ecosystem begins to realise that no single vendor can give you everything, and that an integrated management platform that can not only display consolidated data but also act as a databus to facilitate sharing between those discrete facets is very high on the enterprise wishlist, we may see something yet.

I’d like to leave you with some of the inspiration for this post – as seen on a recent “Demotivational Poster” – a quick reminder that perfection is in the eye of the beholder.

“No matter how good she looks, some other guy is sick and tired of putting up with her s***”

The London VMUG just keeps on getting better, doesn’t it? What do you mean, you don’t know? Well, for a start it’s now pretty much a full-day event – complete with its own Genius Bar staffed by VMware GSS guys & girls who will be happy to help out with any issues you have (though I thought that was usually some of the vExperts enjoying a swift half or two at the pub afterwards). If you can’t make the Genius Bar, then you’d be amazed at how the VMUG hive mind can put itself to a bit of problem solving, especially if you are buying the round!

 

The afternoon features two tracks with a few great guys, several of whom have been guests on vSoup – clearly a sign of greatness ;) The morning features the keynote & sponsor presentations, this time from Arista Networks, Vision Solutions & Embotics. If you can spare the time then the trip to London is well worth it, not only for the content but for the chance to talk geeky with other like-minded professionals!

 

I’ll leave you with the full timetable and the link for the signup, which can be found here.

Agenda
Plenary sessions in Capital
10:00 – 10:15 – Welcome, Alaric Davies, Chairman
10:15 – 11:00 – Cloudvision for the Virtualised Environment, John Peach, Senior System Architect, Arista Networks
11:00 – 11:45 – Private Cloud Management Made Simple, Martin Sajkowski (EMEA Operations) & Colin Jacks (Senior Solutions Specialist), Embotics
11:45 – 12:15 – Break in Sponsor Expo
12:15 – 13:00 – Double-Take by Vision Solutions, Christian Willis, Technical Director: Meeting the Availability Challenges of Physical, Virtual and Geographically Dispersed Systems
13:00 – 14:00 – Lunch in Sponsor Expo

Track 1 

14:00 – 14:50 – vCOPS Advanced, Mark Stockham, VMware
15:00 – 15:50 – SRM Futures, Mike Laverick
16:00 – 16:50 – Cloud: Can You Compete? Mark Craddock

Track 2

14:00 – 14:50 – Thinking, Building & Scripting Globally, Julian Wood
15:00 – 15:50 – Managing IT as We Evolve to Cloud Computing, Colin Fernandez, VMware
16:00 – 16:50 – How to Save Your Time With PowerCLI, Jonathan Medd

17:00 – Close
17:00 – Onward Drinks at Pavilion End

 

Note: agenda subject to change – you will need to sign up with myvmug.org in order to register for the event.

I’ve spent more than my usual amount of time in and around airports this week – travelling to and from the Tech Field Day event in Boston, then hanging around with a few of the other delegates before their flights back.

 

It seems that one of the other delegates’ flights had a pretty severe delay due to the incoming flight being late. We realised the only way his flight would be on time would be if they used a different aircraft. My mind immediately went off on a bit of a tangent to Cisco UCS (as you do!)

 

The flight plan consists of a number, a given size and model of aircraft, and a source / destination. With me so far? The flight plan is given to a particular aircraft, so plane #767-4001 becomes Delta flight DL270 going from Boston to Heathrow – and will be known as DL270 while that flight is in use. If for some reason there is a problem with 767-4001, the airline can opt to use a different plane, for example 777-4002, which is not quite the same model and in fact has a few more seats & flies a little faster. The plane is still able to take off and land under the identity of DL270.

 

This is very much like a UCS service profile – it isn’t fixed to the hardware (the plane) and can be associated with different hardware (which may not be of quite the same specification) should you require. It is purely a definition of what makes up that profile, in just the same way that we have defined flight DL270 to fly me from Boston back to London.
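To stretch the analogy a little further, here’s a purely illustrative sketch – the field names are mine and bear no relation to the real UCS object model – of a profile that keeps its identity while being re-associated with different hardware:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceProfile:
    """The logical identity: it stays the same whatever hardware backs it."""
    name: str                    # the flight number, e.g. DL270
    route: str                   # e.g. "BOS -> LHR"
    blade: Optional[str] = None  # the physical hardware currently backing it

    def associate(self, blade: str) -> None:
        print(f"{self.name} ({self.route}) is now running on {blade}")
        self.blade = blade

# Flight DL270 is defined once...
dl270 = ServiceProfile(name="DL270", route="BOS -> LHR")

# ...flown by one aircraft, then re-associated with another after a fault.
dl270.associate("767-4001")
dl270.associate("777-4002")  # different spec, same identity to the passenger
```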

 

Now if I could only persuade my UCS Chassis to serve complimentary drinks…..

 

Never being a company to stagnate when it comes to releases, VKernel are continuing to develop their product set around capacity management and infrastructure optimisation for virtualised environments. A strong quarter has seen record numbers, expanded support for alternative hypervisors such as Hyper-V, and a new product aimed at the real-time monitoring end of the capacity management spectrum (vOPS Performance Analyzer).

The 3.5 release of the main VKernel vOperations Suite, to give it its full name, is now “with added cloud”. I’m so glad the product marketing guys did NOT say that – in fact quite the opposite. The product has taken on features suggested by the service providers & customers who are already down the path towards a private cloud.

vOPS 3.5 adds features which may make the life of an admin in such an environment easier – more often than not they are becoming the caretaker of an environment as workloads are generated via self-service portals and on demand by applications. Being able to model different scenarios based on a real-life workload is key to ensuring your platform can meet its availability & performance SLAs. Metrics in an environment mean nothing if you are unable to report on them, and this has been addressed with the implementation of a much improved reporting module within the product, which allows a much more granular permissions structure & the ability to export reports into other portals.

The capacity modeller component now allows “VMs as a reservation” – knowing that not all workloads are equal means you need to model the addition of workloads of differing sizes into an environment. These model VMs can be based on a real CPU/MEM/IO workload.
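As a rough illustration of the sums a what-if model does under the hood – not how vOPS implements it, I hasten to add – the sketch below subtracts a set of reserved “model VMs” from a cluster’s remaining headroom and reports whether the scenario still fits. All of the figures are made up:

```python
# Hypothetical headroom left in a cluster and the "model VMs" we want to reserve.
CLUSTER_FREE = {"cpu_ghz": 48.0, "mem_gb": 256.0, "iops": 6000}

MODEL_VMS = [
    {"name": "web-tier", "count": 10, "cpu_ghz": 1.0, "mem_gb": 4.0, "iops": 150},
    {"name": "db-tier",  "count": 2,  "cpu_ghz": 4.0, "mem_gb": 32.0, "iops": 2500},
]

def what_if(free, reservations):
    """Subtract each reservation from the free pool and report what's left."""
    remaining = dict(free)
    for vm in reservations:
        for resource in ("cpu_ghz", "mem_gb", "iops"):
            remaining[resource] -= vm[resource] * vm["count"]
    fits = all(value >= 0 for value in remaining.values())
    return fits, remaining

fits, left = what_if(CLUSTER_FREE, MODEL_VMS)
print("Scenario fits" if fits else "Scenario does NOT fit", left)
# Here the database tier's IOPS reservation is what tips the cluster over.
```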

The last key improvement is yet more metrics – this time around datastore & VM performance, including IOPS. Having been through an exercise where I had to manually collect IOPS data for an environment, I can personally attest to the value of automating this! When I was an end user of the vOPS product it was a metric I was constantly bugging the product development guys for – looks like they listened!
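If you do find yourself collecting IOPS by hand in the meantime, the arithmetic involved is roughly as below – a sketch that turns exported per-VM read/write operation counters into an average IOPS figure. The CSV layout, column names and 20-second sample interval are assumptions of mine rather than anything a particular product exports:

```python
import csv

SAMPLE_INTERVAL_SECONDS = 20  # assumed export interval for the raw counters

def average_iops(csv_path):
    """Sum read + write operations per VM and divide by the total elapsed time."""
    ops, samples = {}, {}
    with open(csv_path, newline="") as f:
        # expected columns: vm, read_ops, write_ops (operations per sample interval)
        for row in csv.DictReader(f):
            vm = row["vm"]
            ops[vm] = ops.get(vm, 0) + int(row["read_ops"]) + int(row["write_ops"])
            samples[vm] = samples.get(vm, 0) + 1
    return {vm: ops[vm] / (samples[vm] * SAMPLE_INTERVAL_SECONDS) for vm in ops}

if __name__ == "__main__":
    for vm, iops in sorted(average_iops("vm_disk_counters.csv").items()):
        print(f"{vm}: ~{iops:.0f} IOPS")
```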

 

For more information, head over to the VKernel website.

I’ve just spent a couple of hours this morning having a kick about with some of the more advanced features of Veeam Backup & Replication v5. I’ve been able to run instant restores of my machines without any issues, but what I really wanted was some backup verification. I had a Virtual Lab & application group built, but could not get the VMs to ping. I’d been digging into the setup of the router and was on the verge of hair pulling – how hard can it be? All of the VMs are on the same VLAN/subnet.

I tried a slight change of tack and selected a different test machine, simply because the sizable Server 2008 R2 VM I was using was taking a little longer than I was hoping for each test. When I used a smaller Server 2003 VM, the ping test worked just fine.

 

This led me back to the base virtual machine & its own settings. I’m trying to keep my lab network as secure as possible and make use of the security settings Microsoft has been kind enough to provide. As a result, the Windows Firewall is enabled and set up correctly for the production environment. However, as I am restoring the VM in isolation (without a domain controller), it would appear that the following happens.

 

Windows Firewall on the source VM: [screenshot]

Windows Firewall on the recovered VM: [screenshot]

File and printer sharing settings in the firewall exceptions: [screenshot]

 

There’s ya problem! I don’t have a virtualised DC to test whether having a DC in the same SureBackup application group would stop the Windows Firewall switching to the Public profile, but in the interim I’ve had to open up file and printer sharing for the Public profile. However, it is possible to be a little more granular than just allowing all file and printer sharing: if you go to advanced settings, you can enable the inbound rule “File and Printer Sharing (Echo Request – ICMPv4-In)” for the Public profile.

[screenshot]
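If you’d rather script that last step than click through the console on every restored VM, something along the lines of the sketch below should do it. It simply shells out to the standard netsh advfirewall command from an elevated prompt inside the lab VM – the rule name needs to match exactly what the firewall console shows, and I’ve only sanity-checked the syntax rather than run it in anger:

```python
import subprocess

# The built-in rule as it appears in the Windows Firewall advanced settings console.
RULE_NAME = "File and Printer Sharing (Echo Request - ICMPv4-In)"

def allow_ping_on_public_profile():
    """Enable the ICMPv4 echo rule for the Public profile only (Windows only)."""
    command = (
        "netsh advfirewall firewall set rule "
        f'name="{RULE_NAME}" profile=public new enable=yes'
    )
    subprocess.run(command, check=True)

if __name__ == "__main__":
    allow_ping_on_public_profile()
```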