Showing posts with label capacity management. Show all posts

Thursday, June 26, 2008

Getting a bit drunk on enterprise dollars.

It's official - Gartner's Thomas Bittman thinks VMware might be drunk.

Denise Dubie has a great article on virtualization management capabilities, with a great quote from Bittman on the upcoming battle between VMware and Microsoft for the x86 virtualization market:

"The enterprise is going to be very leery of Microsoft, but the on-ramp to VMware is a bit steep for small businesses. VMware doesn't want to lose that potential business, but the company was getting a bit drunk on enterprise dollars," says Thomas Bittman, Gartner vice president and distinguished analyst.

So many enterprise dollars are spent on virtualization because it fixes so many problems: it reduces power, reduces physical requirements, makes x86 hardware more efficient, increases uptime, allows resource management at the OS workload level, etc.

One of my favorite reports is one that IDC did in 2006. It showed IT spending more in Year 1 of a VMware / virtualization project, but in Year 2, Year 3 and possibly Year 4 IT departments would avoid spending on server hardware - you would just fill up the empty capacity of the system you built in 2006.


It looked good on paper: spend more now, avoid spending later.

Unfortunately, multiple issues caused IT departments to run out of capacity: VM sprawl occurred, and single core and dual core servers could not hold as many VMs as the equivalent quad core servers.

P2V migrations often went unchecked, servers had excess CPU capacity but not enough memory, and VMs consumed too many resources. As a result, enterprises are oversizing virtualization projects or not driving up VM density to get the biggest bang from their investment.

Avoid the hangover from “getting a bit drunk” and having to purchase new hardware, more memory, bigger servers, etc. by getting a resource management tool in place and understanding what resources your VMs are using and where you have capacity in virtualized environments.


Monday, June 23, 2008

The Calm Before the Storm

Rakesh Kumar of Gartner published a white paper last fall entitled "U.S. Data Centers: The Calm Before the Storm".

In it he says U.S. Data Centers "are facing considerable disruption during the next three or more years" and they are facing it from a few things:

  • Energy
  • Green IT initiatives
  • Floor space demands
  • New technology

No mention of virtualization, unless it's the source of all of the above - impacting energy, trying to be a green IT initiative, trying to help with floor space, and being a new technology.

What should CIOs be doing now to prepare for this storm?

  • Consider Data Center Colocation - see who has a data center nearby and who has fiber to it. You will need 1 Gb or 10 Gb links depending on the size of your enterprise. Hire some good financial people to determine if there is a decent ROI on moving your Data Center to a third party provider.

    Also ask the beancounters to factor in running your own fiber, this may not be as expensive as it once was, carriers may have available strands and you may only have to do the last mile to your location.

  • Worry about your power bill - energy costs have increased, so consider off-peak times. VMware's introduction of DPM brings the ability to power on ESX hosts and then move VMs to take advantage of lower density environments (think of your overnight routines that chew up a lot of CPU). With DPM you can spread them out to take advantage of lower power costs.

    Rates ($$ per kWh) may be lower before 6 AM and after 6 PM.

  • Invest in Systems Management tools - get something that helps you identify who is using your resources and driving up your costs. Chargeback by VM will allow you to fairly deliver charges for Data Center usage to the business units using the most resources.

    This type of transparency will prioritize which business unit needs to fix its Data Center problems - be it running highly transactional reports during the day that could be run at night, poorly coded applications that use too much memory, too much CPU, etc.

    Start a list of power supplies in the Data Center - servers, SANs, etc. You will be shocked to see that newer servers may have power supplies rated at 900-1300 watts - that's around 1 kW per server, or roughly 1 kWh of energy for every hour at full load.

Remember, it's this simple:

Servers = Power A = Heat = Cooling = Power B.

To fix this:

  1. Reducing your physical servers
  2. Reduces your power A
  3. Reduces your heat
  4. Reduces your cooling
  5. Reduces your power B.

Now just get the financial data lined up to show that a server reduction project (i.e. virtualization) may cost some $$ but will be offset by the cost savings from reducing power A and B.
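Here is a rough sketch of that math in Python. Everything in it is an assumption for illustration: the function names, the flat rate per kWh, and especially cooling_ratio (how many watts of cooling you burn per watt of server load). Plug in numbers from your own power bill and facilities team.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(servers, watts_per_server, rate_per_kwh, cooling_ratio):
    """Yearly cost of Power A (servers) plus Power B (cooling).

    cooling_ratio is an assumed placeholder: watts of cooling needed
    per watt of server load. It is not a measured value.
    """
    power_a_kw = servers * watts_per_server / 1000  # Power A: the servers
    power_b_kw = power_a_kw * cooling_ratio         # Power B: cooling the heat
    return (power_a_kw + power_b_kw) * HOURS_PER_YEAR * rate_per_kwh


def annual_savings(servers_before, servers_after, watts_per_server,
                   rate_per_kwh, cooling_ratio):
    """Difference in yearly power cost after reducing physical servers."""
    before = annual_power_cost(servers_before, watts_per_server,
                               rate_per_kwh, cooling_ratio)
    after = annual_power_cost(servers_after, watts_per_server,
                              rate_per_kwh, cooling_ratio)
    return before - after


if __name__ == "__main__":
    # Hypothetical inputs: 100 servers down to 10, 1 kW each, $0.10 per kWh.
    saved = annual_savings(100, 10, 1000, 0.10, 1.0)
    print(f"Estimated yearly power savings: ${saved:,.0f}")
```

Set that number next to the cost of the virtualization project and you have the start of the ROI conversation.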

Monday, June 9, 2008

Killer VMs on the loose.

I have been hearing more and more about the "Killer VMs" - think of this as the lovechild of Virtualization and the industry term "Killer Apps".

Wikipedia describes a killer app as an application "so compelling that someone will buy the hardware or software components necessary to run it."

Basically, virtualize the "Killer App" and it becomes the "Killer VM".

This will make you visible, put your name on the map, your CIO will thank you, your CEO may even wave at you once in recognition of your stellar service.

Don't get cocky. These "Killer Apps" often have problems. They are the application servers that, when they hiccup or go down, cause your business pain and make everyone go "We really should do something about that" - but no one does.

A lot of IT departments spend a ton of money on Active/Passive Clustering, etc. Remember, it's a favorite child: it may get all sorts of resources and dollars spent on it, trying to ensure application SLAs or increase its uptime and performance.

I have seen new servers, more memory, faster CPUs, even SANs purchased to manage "Killer Apps".

I have seen investment in high-end clustering, low-end clustering for "Killer Apps" - when at the end of the day the "Killer App" may just be a Windows 2000 Server running SQL with an application that is mission critical.

Enter VMware, enter DRS, VMotion, HA, etc. You pick up some amazing tools to manage these "Killer Apps", and they become "Killer VMs".

Mark Brunnel writes about putting Navision and SQL 2005 into a VM, and had the v-piphany (virtualization epiphany):

"It is amazing to see VMWare running and the management and failover capabilities. For me it means the end of active passive clusters."

Mark's done some rough benchmarking and found that "VMWare is just slightly slower in posting but only 5% maximum." That's compared to the application running in Windows on physical hardware.

More and more people are going past their initial P2V consolidation efforts, a lot more are building VMs without ever having a physical server, and now folks are optimizing environments. In this 2nd or 3rd phase of virtualization, the Killer VMs are going to start showing up - these VMs will be more important than others and require more attention, more care and feeding, and better capacity management of resources.

No CIO is going to like to hear that a VM used by 10 people took down a "Killer VM".

No CIO wants to hear that you could have prevented it but hadn't rolled out Resource Pools yet or don't have the right systems management tools to manage/monitor resource utilization.

Monday, June 2, 2008

One Quad or Two?

One of the best resources on the Internet for VMware implementation is the VMTN community forums - it's top notch.

This week there was a discussion about budgets and performance (where finance always mixes it up with IT - that and Chargeback).

The post asks about the value of two medium-speed (1.6 - 2 GHz) QuadCore CPUs ($$) vs. one high-speed (3.33 GHz) QuadCore CPU ($$$$).

I liked the response from William Bishop of Huntsville Hospital - "You'll get better density on the dual socket". He prefers the "dual proc, quad core setup" and has been "adminning vmware from some of the first dual cores to the newest quad cores."

I wonder if he has done anything with 4 socket x QuadCores?

Density is important - it's going to help you drive down your per VM costs and generate better ROI on the dollars invested in a virtualization product.
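To put a number on it, here is a tiny sketch in Python. The function name, prices and VM counts are all made up for illustration; they are not from the forum thread.

```python
def cost_per_vm(host_cost, vm_count):
    """Hardware cost carried by each VM on a host."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return host_cost / vm_count


if __name__ == "__main__":
    # Hypothetical quotes: a dual-socket host that holds more VMs vs.
    # a single fast socket that holds fewer.
    options = {
        "2 x medium-speed QuadCore": (8000, 20),
        "1 x high-speed QuadCore": (7000, 12),
    }
    for name, (price, vms) in options.items():
        print(f"{name}: ${cost_per_vm(price, vms):,.2f} per VM")
```

The cheaper box is not always the cheaper VM. Run it with your own quotes and your own measured VM counts.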

Thursday, May 29, 2008

B-Hive acquisition is a real smart move for VMware

Since the dawn of computing, the most critical element in performance management has been the user. It does not matter what the performance metrics say inside Virtual Center. It is all about how your users perceive the performance: is it fast, OK, or too slow? At the end of the day it is a qualitative experience.

As David Marshall correctly points out in his coverage of the news, the cool factor here is that VMware will now be able to granularly break down where the time in application response is being spent. Is it in the network, the database, the application? Combined with VMotion, a VM could be moved to another host based on the analysis of where the problems are. If it is a network problem, move the VM to another segment closer to the end users. If it is a host capacity issue, move it to another host where more capacity exists.

The angle that really excites me is the ability to monitor more granularly, down to the end user level. We would be able to answer how much resource is being utilized by a given user. This can be used in chargeback, capacity management, etc. I hope to see VMware publish this API in the near future!!

Saturday, May 24, 2008

Capacity Planning for ESX is a multi-dimensional challenge

Windows Admins, listen up. Your world has changed. No more one application running on one Windows server where none of your capacity resources were shared. Once you virtualize your servers, they have to share memory, CPU, storage, network bandwidth, disk I/O, network I/O, etc. with other VMs running on the same hardware. Capacity planning, which was a non-event in the Windows world, is now a must-do; otherwise you will run out of capacity and experience performance problems or, even worse, downtime.

Capacity Planning is a multidimensional problem. To do it correctly you must take into account literally hundreds of variables. Here are some of them:

- how many VMs do you deploy?
- where are you going to deploy them?
- how much resource do you allocate to them?
- what happens if you want to change hardware?
- will you violate any configuration constraints?
- do you need another host?
- what resource will you run out of first? Memory, CPU, storage - and where?
- how many more VMs can you fit into each cluster?
- what happens if VMs get vmotioned?
- will you violate DRS affinity rules?
- what configuration constraints will you violate?
- will DRS work?
- will HA work?

I can go on and on. I hope you see just how complex capacity planning has become. As VM density on hosts continues to increase, capacity planning in VMware will become even more critical, because every physical server becomes more business critical and failure is not an option. Systems Management is fun again!!
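Two of those questions ("what resource will you run out of first?" and "how many more VMs can you fit?") are easy to sketch in Python. This is a deliberately naive model: the resource names and numbers are hypothetical, and it ignores HA headroom, DRS rules, overcommit and everything else on the list.

```python
def vms_that_fit(free, per_vm):
    """Return (how many more VMs fit, which resource runs out first).

    free and per_vm are dicts keyed by resource name: free capacity left
    on the host or cluster, and what one typical new VM consumes.
    """
    fits = {}
    for resource, need in per_vm.items():
        if need <= 0:
            raise ValueError(f"per-VM usage of {resource} must be positive")
        fits[resource] = int(free[resource] // need)
    limiting = min(fits, key=fits.get)  # the resource that fits the fewest VMs
    return fits[limiting], limiting


if __name__ == "__main__":
    # Hypothetical cluster headroom and a hypothetical "typical" VM.
    free = {"cpu_ghz": 20.0, "memory_gb": 16.0, "storage_gb": 500.0}
    per_vm = {"cpu_ghz": 1.0, "memory_gb": 2.0, "storage_gb": 40.0}
    count, bottleneck = vms_that_fit(free, per_vm)
    print(f"Room for {count} more VMs; first bottleneck: {bottleneck}")
```

Even this toy version makes the point: you have to look at every resource at once, because the tightest one sets the limit.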

Friday, May 9, 2008

It's all about the ratio.

VKernel is an advocate of running your hardware at high levels of capacity. I know that we would see record-breaking ratios of virtual machines to server hardware.

  • A major worldwide financial services organization achieved a 12:1 consolidation ratio and increased its central processing unit utilization by 30 percent.

  • An Indian petroleum refining and distribution company achieved a 17:1 consolidation ratio and expects to increase that to 30:1 with additional CPUs and RAM.

  • One of Italy's largest banks improved its server utilization rates by 100 percent.

  • A leading US faucet manufacturer saved $250,000 in hardware costs by reallocating existing units instead of purchasing new, achieving a 10:1 consolidation ratio.

  • A South American energy company consolidated its servers by a 20:1 ratio.

  • A federation of trade unions in Singapore consolidated its servers by 46:1, achieving a 26 percent savings.

Smallest is 10:1 and largest is 46:1.

It's all about the ratio.
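If you want to play with ratios for your own shop, here is a minimal sketch in Python. The function names and the 120-server example are mine, not from the case studies above.

```python
import math


def consolidation_ratio(physical_servers_before, hosts_after):
    """Physical servers retired per virtualization host."""
    return physical_servers_before / hosts_after


def hosts_needed(physical_servers, ratio):
    """Hosts required to absorb the servers at a given ratio, rounded up."""
    return math.ceil(physical_servers / ratio)


if __name__ == "__main__":
    servers = 120  # hypothetical starting server count
    for ratio in (10, 20, 46):
        print(f"{ratio}:1 -> {hosts_needed(servers, ratio)} hosts")
```

The higher the ratio, the fewer boxes you buy, power and cool.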

Tuesday, April 15, 2008

VKernel Ships Capacity Bottleneck Analyzer

We shipped the Capacity Bottleneck Analyzer (PDF) virtual appliance.

Go download CBA and start reviewing your performance, resources, and capacity with an appliance that's quick to download.

You just configure the IP address (DHCP or Static) and then point CBA at your Virtual Center or ESX host and start collecting and reporting.

Friday, March 28, 2008

Swims like a mainframe.

I love the duck test. If a bird looks like a duck, swims like a duck and quacks like a duck, then it's probably a duck.

HP's new 8-way DL785 G5 looks like a mainframe wannabe.

Take out the mainframe OS ($$$$) and replace it with ESX ($$). Then replace mainframe workloads with virtual machines - basically VM workloads - both are consumers of disk, memory, CPU and network.

Mainframes try to run continuously at over 70% busy. A 90% figure is more typical, and modern mainframes could see sustained periods of 100% CPU utilization. You're going to need a capacity tool.

Typically, a mainframe is repaired without being shut down. Also, memory, storage and processor modules of chips could be added or hot swapped without being shut down. It is not unusual for a mainframe to be continuously switched on for 6 months at a stretch.

So maybe if it runs CPU like a mainframe, and has uptime like a mainframe, is it a mainframe??

Check out the numbers:

8 sockets (up to 32 cores)
64 DIMM slots, (Up to 256 GB of RAM - 4 GB max per slot)
11 PCI-e expansion slots (3 x16 slots, 3 x8 slots and 5 x4 slots)
2.3 terabytes of internal storage

When the 8 GB DIMMs ship - this could be 512 GB of RAM.

HP is aiming these behemoths at two roles:

1) Very Large Database Systems (VLDBS)

Very large database servers with massive data buffer caches.

2) Very Large Virtualization System (VLVS)

These are going to push capacity and virtual machine counts to new historic levels.

A huge issue with putting these massive systems into production is finding better/smarter management tools that can help you identify potential capacity bottlenecks and gather capacity and performance data. Oh, and don't forget about VM chargeback.

The key to these beasts looks like the Opteron design - no shared memory bus - each processor has its own memory and I/O bus. Sun's Sun Fire X4600 also runs eight sockets of Opterons. I can't imagine Intel is going to stand for that - new word of the week - octal core.

The mainframe folks are seeing a return to shared processing on very large systems, so it may not be a mainframe per se, but this system sure quacks and swims like one. Except it costs like a server.

Tuesday, February 12, 2008

Servers are no longer a "Resource Boundary"

One of the hardest concepts for System Administrators new to virtualization to understand is shared resource management. VMware ESX makes it possible to share resources, namely memory, CPU, storage and network, not only inside a physical host but also across multiple physical hosts. The resources are pulled together to create one massive resource pool, captured in a concept called a cluster. Resources inside clusters can be further subdivided into many Resource Pools. For admins who are only used to dealing with physical servers as resource boundaries this can be confusing, especially when it comes to planning and managing capacity. For example, when monitoring or determining resource capacity, admins must now take into consideration how all resource boundaries are affected. Looking just at physical servers is no longer an option!

Friday, February 1, 2008

How many new VMs are you adding per week?

How many new VMs are you adding per week? This is a very important question, because it has major implications for capacity availability in your ESX data center and ultimately for performance. Every VM you deploy will consume CPU, memory, storage and network resources. It will also add disk I/O. It is easy to see how, if uncontrolled, you can quickly run out of resources and develop capacity bottlenecks.

Of course the trick is to figure out which resource you are going to run out of first. Will you hit the bottleneck in memory, CPU, storage, disk I/O or network? The answer is that it really depends on your environment, but in most cases the first bottleneck is memory. Why? Remember, you were able to virtualize servers because they were underutilizing CPU. That is what enabled you to combine 8+ servers on one piece of hardware. Memory is a different story. Just because your servers are now virtual does not mean they are consuming less memory. That's why in most environments the first capacity bottleneck is memory. What do you think is the second capacity bottleneck you are likely to hit? Let me know at abakman@vkernel.com
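Here is a back-of-the-envelope sketch in Python for turning "VMs per week" into "weeks until we hit a wall". The function name, the resource names and the sample numbers are assumptions for illustration, and it assumes every new VM looks like your average VM.

```python
def weeks_until_full(free_capacity, per_vm_usage, new_vms_per_week):
    """Return (first resource to run out, weeks until it does).

    free_capacity and per_vm_usage are dicts keyed by resource name.
    Assumes a steady deployment rate and identical VMs.
    """
    weeks = {}
    for resource, free in free_capacity.items():
        weekly_burn = per_vm_usage[resource] * new_vms_per_week
        # A resource nobody consumes never runs out.
        weeks[resource] = free / weekly_burn if weekly_burn > 0 else float("inf")
    first = min(weeks, key=weeks.get)
    return first, weeks[first]


if __name__ == "__main__":
    # Hypothetical headroom: plenty of CPU left, not much memory.
    free = {"cpu_ghz": 40.0, "memory_gb": 64.0}
    per_vm = {"cpu_ghz": 0.5, "memory_gb": 2.0}
    resource, weeks = weeks_until_full(free, per_vm, new_vms_per_week=4)
    print(f"{resource} runs out first, in about {weeks:.1f} weeks")
```

With numbers shaped like these (lots of idle CPU, modest memory), memory is the one that runs out first, which matches what I see in most environments.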

Tuesday, November 20, 2007

Are you ready to SHARE your resources?

Sharing what? Resources? Memory? CPU? Storage?

There is an entire generation of Sys Admins now that has grown up with a distributed computing data center where one application is normally run on one server. This mostly happened because of Windows instability. Most administrators did not want to deal with trying to troubleshoot OS problems and multiple application problems at the same time. The threat of the infamous "Blue Screen of Death" de facto created this one application, one server architecture. In this world admins did not have to think or worry about sharing resources.

Welcome to server virtualization, where sharing of resources IS the primary idea. Sys Admins will now have to get used to the fact that their VM may suffer performance degradation because its neighbor VM on the same hardware is consuming a disproportionate amount of CPU and memory. So now Capacity Analysis and Capacity Monitoring become important again, just as they were back in the mainframe days. Sys Admins now have to really pay attention to "Who is consuming what resources". Capacity Analysis is not a one-time event. It is an ongoing activity. In fact, many System Administrators I have spoken with are already spending a good chunk of their time troubleshooting capacity related bottlenecks in their environment.

The problem will only get worse. As organizations continue to add Virtual Machines at an exponential rate, this problem will get exponentially more challenging. The unpredictability of workload management will make Capacity Analysis a required activity that will have to be performed at least daily. If you used VMware Capacity Planner for your initial P-to-V conversion, you have to realize that it was nothing more than initial sizing. As you continue to add more virtual systems to the mix, many of the previous assumptions made by the Capacity Planner will no longer be accurate.

I love virtualization! Let me know what you think and email to abakman@vkernel.com

Alex Bakman

Sunday, November 11, 2007

Virtualized Datacenter Brings New Challenges

Right now an average US corporation has about 7% of its Datacenter virtualized. As organizations continue to virtualize servers they will face 3 new management challenges:

1. Explosion in the number of virtual servers. Users have already figured out just how easy it is for IT to create new virtual servers. The number of requests for new virtual servers will continue to skyrocket.

2. Sharing of resources: memory, CPU, storage and network. In the traditional data center, where one application server was dedicated to one application, no sharing of resources took place. That's not the case anymore in the virtualized datacenter.

3. Servers have grown "legs". In the traditional datacenter we did not need to worry about servers moving around the network from one location to another. Now we do.

In the next post I will explore how VMware administrators can address the 3 new challenges.

Friday, October 19, 2007

How to Calculate how much to Chargeback

While most agree that charging departments for computing resources consumed is the way to go, many get stuck on the question of "How do we compute what we need to charge for Memory, CPU, Storage and Network usage?" Since you have many departments that share a Virtual Datacenter, how do you go about figuring out how much to charge users for every GB of memory used, or for every GHz of CPU consumed? This is a real brain cramp!

It took us a little while here at VKernel, but we "cracked the code" on this one. We created a spreadsheet that takes into consideration your ESX hosts, storage and network devices and what you paid for them, how many user departments you have, who is using the resources, cost recovery timeframe, etc and automatically calculates rates that you should be charging your users per day for memory, CPU, Storage and Network.

As you make changes to your infrastructure simply update the spreadsheet and it will recalculate the rates. You can download the Calculator and the White Paper that describes step by step how to use it from

http://www.vkernel.com/resourcecenter/methodology/
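To give a feel for the kind of arithmetic involved, here is a minimal sketch in Python. To be clear, this is not the spreadsheet's actual formula. It is a simplified straight-line cost recovery model, and the function name, costs and capacities are made up for illustration.

```python
def daily_rate_per_unit(purchase_cost, recovery_days, capacity_units):
    """Daily charge per unit of a resource (e.g. per GB of memory).

    Spreads what you paid evenly over the recovery timeframe and over
    the capacity you can actually sell.
    """
    if recovery_days <= 0 or capacity_units <= 0:
        raise ValueError("recovery_days and capacity_units must be positive")
    return purchase_cost / recovery_days / capacity_units


if __name__ == "__main__":
    # Hypothetical: $10,950 of memory, recovered over 3 years, 100 GB sellable.
    rate = daily_rate_per_unit(10950, 3 * 365, 100)
    print(f"Memory: ${rate:.2f} per GB per day")
```

Do the same for CPU, storage and network, and you have a first-pass rate card to argue about with finance.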


Let me know what you think.

Thursday, October 18, 2007

Charging Customers for ESX Resources Should Be Fair

Many organizations are trying to figure out how to do chargeback in a virtualized environment. For technical folks it is not an easy task. They understand the feeds and speeds but don’t know how to translate Gigabytes and GHz into dollars and cents. Conversely, accounting types understand cost recovery, but can’t quite grasp this virtualization “thing”.

It is actually not that hard. As IT embraces “utility” or “on demand” computing, it can borrow from many lessons learned by the companies that provide us with electricity, oil and gas. As every consumer knows, your utility company charges you for the amount of utilities you consume. If you use more you pay more, and if you use little you pay little. This approach is fair and easy to understand.

Many initial attempts at chargeback are based around charging a flat rate per VM. It goes something like this. My server costs me X dollars and I can approximately host 8 VMs on a dual processor machine therefore I should charge every client X/8 per VM.

While simplistic, this approach is flawed in many ways:

  1. We all know that VMs consume vastly different amounts of resources (CPU, memory, storage, and network). A busy MS Exchange server supporting thousands of users consumes a lot more resources than an old application server used by a couple of people. It is simply unfair for IT to charge the same price to all users.
  2. A flat per VM model does not capture many other costs associated with running a data center. In addition to consumable resources, IT must recover the cost of software licenses, electricity and cooling, administration and many other expenses. While some of them are “fixed” expenses, many are variable and must be adjusted for each billing period.
  3. Remember the original reason why many application servers were virtualized in the first place – they were underutilizing resources or ran on old hardware that was getting impossible to support. These servers consume hardly any resources and users wanted to save money by virtualizing them. To turn around and charge the users a lot of money for these underutilized servers is not right.
  4. The fact is that most VMs are shared applications used by many departments. Some VMs are used by many while others are dedicated to a particular group. Furthermore, to say that all departments use VMs equally is not based in reality. For example, take SAP. I am sure that people in finance spend a lot more time in SAP than people in IT. How do you account for this uneven usage between departments? The flat model breaks down here again.
  5. Here is another problem. In dynamic utility computing, resources are allocated on demand. If you need more capacity for a business application, another VM gets launched and consumable resources get allocated for it on demand. How would you keep track of resources in this scenario? Again the per VM flat charging model breaks down.

The only conclusion one can draw is that chargeback needs to be based on consumption of resources and services. That way departments only pay for resources and services they actually use. Life is not always fair, but maybe at least in the new Datacenter it can be :)
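Here is a small Python sketch of the difference between the two models. The function names, rates and usage figures are hypothetical; it only shows the shape of the math, not any particular product's method.

```python
def flat_charge(server_cost, vms_per_server):
    """The flat model: every VM pays X / N regardless of what it uses."""
    return server_cost / vms_per_server


def usage_charge(usage, rates):
    """The consumption model: pay for each resource actually used."""
    return sum(amount * rates[resource] for resource, amount in usage.items())


def split_by_usage(vm_charge, dept_hours):
    """Divide a shared VM's charge between departments by hours of use."""
    total = sum(dept_hours.values())
    return {dept: vm_charge * hours / total for dept, hours in dept_hours.items()}


if __name__ == "__main__":
    rates = {"memory_gb": 0.50, "cpu_ghz": 1.00}      # assumed daily rates
    exchange = {"memory_gb": 16, "cpu_ghz": 6}        # busy VM
    old_app = {"memory_gb": 1, "cpu_ghz": 0.25}       # nearly idle VM
    print("Flat, per VM:    ", flat_charge(80.0, 8))
    print("Usage, Exchange: ", usage_charge(exchange, rates))
    print("Usage, old app:  ", usage_charge(old_app, rates))
    print("Shared SAP split:", split_by_usage(100.0, {"finance": 30, "it": 10}))
```

Under the flat model the idle old app pays exactly what Exchange pays. Under the consumption model each VM, and each department on a shared VM, pays for what it uses.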

Let me know what you think.

Alex Bakman abakman@vkernel.com