System Etiquette

May 13th, 2009

My work desktop has four 64-bit cores and 4GB of RAM. While not exactly the cutting edge of machines, it’s a powerful workhorse. Yet, in spite of this, there is one thing I hate having to do with it - rebooting it. When I reboot my machine and log in, the machine essentially becomes unusable for 10 to 15 minutes as it goes through various loads and checks. During this period doing anything with the machine feels like wading through molasses. For example, I tried to start Firefox this morning and it was at least 2 minutes before there was any visual sign that the application was actually running.

After this initial period of horrendous performance the machine settles down and is reasonably responsive for the rest of the day. The fact that the reboot is just so painful actively discourages me from performing it until I have absolutely no other choice. What really drives me nuts though is that it’s impossible to tell what’s holding everything up. If you open the task manager the system happily reports that it’s sitting at 98% or 99% idle time and there is no significant network traffic. Under normal circumstances you would read this as the machine having next to nothing to do. I’ve thought that maybe it’s something like a virus scanner starting up that’s causing the issue but there’s no sign of any application taking the amount of processor time that you’d expect in such a case. It’s just a mystery!

This is not confined to Windows either. I don’t know how many times I’ve been frustrated in OS X when the little spinning beach ball comes up and just seems to linger on what should be trivial tasks. Plus I can’t blame a virus scanner on my Mac because it doesn’t have one.

In the past I’ve witnessed end users express exasperation with their computers for taking several seconds to perform a task. As a software engineer I’ve always been forgiving of these short delays as I had some inkling of what was happening to cause the delays. I have to admit that the issues that I experience with my current work machine go above and beyond the understandable and drive me absolutely nuts.

Processes running under modern operating systems should be mindful of their potential to affect the overall responsiveness of the OS. If an application is going to have a serious impact when it starts up then it should start as part of the OS boot process and not start when the user logs in and attempts to interact with the system. If the process is non-essential it should offer the user the option either to terminate it or to defer it until a more appropriate time. It’s also easy for the authors of such snail-ware to forget that they will not be alone on end user machines or to test them only in an optimal environment. I think it’s time some developers started acquiring a sense of system etiquette.
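As a rough illustration of the deferral idea, here is a minimal Python sketch of a non-essential task that waits before it runs and can be cancelled if the user opts out. The function name `defer_startup_work` and the delay values are made up for the example; a real application would more likely hook into whatever idle or scheduling facilities its platform offers.

```python
import threading


def defer_startup_work(task, delay_seconds):
    """Run task once, off the main thread, after delay_seconds.

    Returns the timer so the caller can cancel() it if the user opts out.
    """
    timer = threading.Timer(delay_seconds, task)
    # A daemon thread never keeps the process alive just for deferred work.
    timer.daemon = True
    timer.start()
    return timer


if __name__ == "__main__":
    # Hypothetical non-essential job, pushed ten minutes past start up.
    pending = defer_startup_work(lambda: print("checking for updates"), 600)
    pending.cancel()  # the user chose not to run it this session
```

The point is only that the expensive work is scheduled rather than run the moment the user logs in, and that the user keeps the option to call it off.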

The Future’s (Not So) Bright

April 28th, 2009

It should be obvious to any level of consideration that a rise in the capability of computers and automation systems is going to eliminate the need for a great deal of human involvement for certain tasks. This article goes further, predicting massive job losses in the next decade as a result of advances in technology.

I don’t question the underlying precept of the article but I do wonder whether all of the relevant factors have been considered. If we consider this a part of the move away from unskilled to skilled roles then we must consider what portion of the population can be considered skilled. A crude measurement, as it does not factor in any consideration of the actual skills required, would be the number of graduates. For both the UK and the US this figure seems to be between 20 and 30 percent of the population (see here for UK figures and here for US figures). If we consider this group as viable for skilled roles then that’s placing at least 70 percent in the unskilled category. If vast swathes of jobs in this category are going to disappear we are looking at financially crippling a substantial portion of the population.

Of course, such an impact has implications. If a substantial portion of the population is reduced to subsistence levels or below, spending in the economy will be heavily curtailed. This leads to a deflationary spiral, where companies cut costs to attract the smaller number of customers and cost cutting leads to further job reductions, exacerbating the original problem. Furthermore, such a shift can only lead to social unrest as a substantial section of the population would be affected.

So what can be done about this? Well, for a start, companies should certainly be giving more consideration to the cost-cutting nature of introducing automation. In effect, if the predicted shift goes ahead, they will be shrinking their own market, so is it really saving them money? Obviously there’s also an onus on governments to push the educational side, to encourage their populations to acquire the skills that will be needed by future roles. Not everyone can hold a degree however, nor am I saying that holding a degree gives you the skills you need. Personally I feel it’s going to take an entire paradigm shift to completely resolve this situation but, what that actually involves, I have no idea.

Concurrency

April 23rd, 2009

Concurrency represents one of the modern challenges to software development. As the number of CPUs in machines rises the question of how to exploit the hardware to its maximum capability is felt to be vexatious. Structuring software to be split across multiple processing units is widely regarded as a non-trivial problem yet one that needs to be addressed if the multi-core machines of the near future are not to sit mostly idle.

It was while thinking on this matter the other evening that it occurred to me that there already is another approach to concurrency that exists and that is virtual machines. With the application of virtual server technology we’ve added another level to the process, thread, fibre tree. Virtual machine technology has to address the same sorts of resource isolation and access issues that other concurrency technologies address but does so at another level. A virtual machine is effectively a collection of processes, isolated from other such collections by the hypervisor and with the hypervisor also controlling access to shared resources such as memory, storage and networking.

This is, of course, not a new idea. It’s always been the case that this was the arrangement when multiple actual machines were used. What’s new here is that this concept has now moved from the hardware side to the software side. I think it’s also the case that this may start moving from the server room to the desktop as we start to see machines with more processors and memory. With a hypervisor adding a layer of abstraction on top of the operating system I can easily envision a work platform where you have multiple OS installations, each optimized or configured for different uses.

The question now is, should we start looking at how we can adapt applications to this new environment and what impact would this have?

Start Ups

April 7th, 2009

I’ve worked for two start up companies in my career. Both followed a fairly standard model and both made a relatively rapid transition from a handful of employees to a more substantial workforce. I feel both had good products, although I wouldn’t say they were world changers. Both have been running for several years as of the time of writing. Unfortunately they both have something else in common too: they are both dying.

I spent several years working at both of these companies but this was in the past. Despite this I still had the occasional urge to check in on them and see how they were doing. I’d honestly have liked to see both do well. The reality of the start up market, irrespective of the current economic conditions, is that more will fail than succeed and this is certainly borne out in my experience of start ups.

There is another aspect that both these companies shared that I feel has had an impact on their impending failure. They were both oriented around the “big sell”. Both were looking for deals in the tens of thousands, hundreds of thousands or millions of pounds/dollars region. Both were promoting big ticket items and the only market for these were big companies.

Unfortunately, as Joel Spolsky points out in his Camels And Rubber Duckies essay, this is a hard market to sell into. To successfully sell into this market requires substantial marketing and sales resources and takes months of schmoozing with executives at various levels in the customer’s company. If it all works out this can be a very lucrative sale with one or two of these big deals often being enough to determine the success of a company. Unfortunately most of the time these efforts are delayed to the point of frustration or simply never come to fruition.

The effort involved in assembling a company to chase these kinds of targets is very substantial. I’ve already mentioned that both of the start ups I worked for had scaled up their staff relatively quickly to meet a perceived demand that simply never materialized for them. Such an influx of resources has a cost associated with it and I know that both these companies burned through millions during their lifetimes. The kind of time and money required to pull a company like this together means that it’s dependent on a number of things, not least of which are…

  1. Having access to levels of financing required.
  2. Confidence that the product has a reasonable chance of success.

There are a number of things about these two points that mean the majority of people will never be in a position to be a founder in a big ticket item company. Firstly, I don’t personally know anyone who has the kind of money required to back one of these companies. Sure, there are banks and venture capitalists out there that can provide this kind of cash but this then connects to the second point. As the person with the idea you may have a great deal of confidence in it but that isn’t necessarily going to be shared by the people with the money. In fact, to get access to the cash, you’re going to have to divert substantial effort and resources into putting together a pitch to convince the finance people that you’re worth the gamble. The more money you need up front, the more difficult this becomes.

So, given the statistics on start up failure rates and the high cost of running such a business in terms of financing and resources, why would you want to start a business like this? Well, if you’re in the position I’m in (which I suspect most people are), you wouldn’t. That is to say, if you don’t have a very large sum of disposable cash ready to hand and nothing better to do with your time then this kind of company doesn’t represent a good investment for you.

Luckily, with regards to the software world, there is an alternative available. Although the term Web 2.0 seems to have fallen from grace the concepts and practices remain valid. Bootstrapping a new software company for a small amount of money is a viable alternative in the current times. The need to commit a large amount of time is still part of the deal but it’s possible to bring an idea to fruition without hiring the number of staff that were previously thrown at these kinds of efforts. Also, with overheads kept to an absolute minimum it’s possible to move away from the big ticket item approach and offer the software for a much lower cost. Rather than relying on getting one or two huge sales to survive and prosper you can work towards many smaller ones. If this article has even a fraction of truth in it, then this approach could also be a very lucrative business opportunity indeed.

Even with this approach we’d still all be subject to the start up failure rates referred to above. Yet there are numerous stories of the early failures of people who’ve eventually navigated one or more businesses to success. The one element common to all of these is that they say you have to pick yourself up, learn the lessons from the failure and try again. This is a difficult thing to do where the failed venture has just consumed all of your resources and future ventures are also front loaded on their resource requirements. With low overhead start ups this “try, try and try again” model becomes one that’s viable for a wider range of potential founders.

Load Testing With JMeter (Part 5)

March 30th, 2009

This is the fifth part in a series of posts where I’m covering my experiences in working with the JMeter application to perform load testing on a web based application. The first, second, third and fourth parts in the series can be viewed here, here, here and here respectively. In this post I’m going to address expanding my testing so that the load was being generated from more than one machine.

Modern computer architectures are moving away from faster CPUs and towards a multiple CPU approach to providing more capacity. Most current desktop machines would contain 2 cores, with some high end servers having 8, 16 or more. Despite this there are still limitations to the amount of load that these machines can be used to generate. As I noted in my previous post in this series, memory is a key resource here and, although 64-bit architectures have raised the bar on its availability, it’s uncommon to have access to machines with more than 16GB of RAM. Ultimately what this means for load testing is that load generation is going to have to be spread across multiple machines.

I’d assembled my test script on a single machine and, while this was quite a powerful machine as desktops go, I’d hit a wall on the load I was able to generate from it at around 1500 users. The machine was running a number of other applications (including an Oracle database server instance) and I had to hand the machine over exclusively to the load test for the duration of the run. I didn’t feel that I could successfully push it further. I flagged the fact that I’d need extra machines to my boss and the search for extra work horses was kicked off.

In the end I acquired a good selection of machines to perform the testing. My own machine is a quad core 64-bit Windows desktop with 4GB of RAM. I managed to obtain one other identical machine and a third machine with a similar specification but more RAM. Once I’d identified the machines I would be working with I needed to configure them for use. The obvious first step was software installation. I installed the latest Java Runtime Environment on the extra machines and then downloaded and installed the JMeter application. I updated the jmeter.properties file on all machines to allow them to feed back results in batch mode and upped the memory given to the Java instance running JMeter to 1GB in the jmeter.bat file.

JMeter comes with a jmeter-server.bat file to run the application in server mode for distributed testing but I found this didn’t work for me. I got the feeling that the problems originated from the fact that there were spaces in the path to my Java installation. To get around this I put together a separate script that fired up the Java RMI registry application and then executed the jmeter.bat file in server mode. With this in place I had all the components I needed to start doing some distributed load testing.
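For illustration, a launcher along these lines could be sketched in Python as follows. This is a sketch under assumptions rather than the original script: the install paths in the usage example are hypothetical, and it assumes that rmiregistry lives in the Java bin directory and that passing -s to jmeter.bat starts server mode. A real launcher may also need the JMeter jars on the registry's classpath, which is left out here.

```python
import subprocess
from pathlib import PureWindowsPath


def server_commands(java_home, jmeter_home):
    """Build the two command lines as argument lists.

    Each path is a single list element, so spaces in it
    (for example "Program Files") need no manual quoting.
    """
    rmiregistry = str(PureWindowsPath(java_home) / "bin" / "rmiregistry.exe")
    jmeter_bat = str(PureWindowsPath(jmeter_home) / "bin" / "jmeter.bat")
    return [rmiregistry], [jmeter_bat, "-s"]  # -s: run JMeter in server mode


def start_server(java_home, jmeter_home):
    """Start the RMI registry first, then JMeter in server mode (Windows only)."""
    registry_cmd, server_cmd = server_commands(java_home, jmeter_home)
    registry = subprocess.Popen(registry_cmd)
    # Run jmeter.bat from its own directory in case it uses relative paths.
    server = subprocess.Popen(server_cmd, cwd=str(PureWindowsPath(jmeter_home) / "bin"))
    return registry, server


if __name__ == "__main__":
    # Hypothetical install locations; adjust to the machine in question.
    start_server(r"C:\Program Files\Java\jre6", r"C:\jmeter")
```

Keeping each command as a list, rather than one long string, is what sidesteps the spaces-in-path problem: the path is never split on whitespace.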

I intended to use my own machine both for load generation and for results data so I needed to configure the two other machines in my cluster in my local copy of the jmeter.properties file. To do this I simply had to add the IP addresses of the test machines to the remote_hosts entry in this file. This entry contains a value of 127.0.0.1 by default, which represents the local host, so I didn’t have to explicitly add the IP address of my own machine. The JMeter documentation for distributed testing has warnings about subnets and firewalls but I found that these did not apply in my case. Once I’d configured the jmeter.properties file I restarted the JMeter application and was able to see my configured entries under the Run -> Remote Start menu. There’s also a Run -> Remote Start All option for kicking off all configured entries simultaneously - this was the option I would use.
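To make the shape of that entry concrete, here is a small Python sketch that builds the remote_hosts line from a list of machines. The two 192.168.x.x addresses are invented for the example and the helper name is my own; the only things taken from JMeter are the remote_hosts property name and its comma separated format.

```python
def remote_hosts_entry(remote_ips, local_ip="127.0.0.1"):
    """Return the remote_hosts line for jmeter.properties.

    The local host stays first, as in the default file, and each
    extra load generator is appended once.
    """
    hosts = [local_ip]
    for ip in remote_ips:
        if ip not in hosts:
            hosts.append(ip)
    return "remote_hosts=" + ",".join(hosts)


# Hypothetical addresses for the two extra load generators.
print(remote_hosts_entry(["192.168.0.11", "192.168.0.12"]))
# prints: remote_hosts=127.0.0.1,192.168.0.11,192.168.0.12
```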

When running locally JMeter has a small counter of active threads that it displays in the upper right hand corner of its window. Unfortunately this counter is not updated when running in distributed mode, even if your local machine is part of the load generation. The indicator still goes from grey to green and back again to show the start and end of testing though.

Initially I had some issues with the application configuration while trying to do distributed testing and this highlighted what I consider a weakness in JMeter functionality. I’d set my test plan to stop on errors but, if an error did occur on a remote load generation machine while I was testing, the test stopped but the application did not clean up properly. In such a case the remote instance would be left hanging and uncontactable. I found that I had to shut down and restart the RMI registry and JMeter server on the remote machine to get around this. If this happens a lot it can be very frustrating.

Once I had all issues ironed out, running in distributed mode was almost as simple as running locally. The only thing to note is that JMeter doesn’t divide the work load specified across the available remote machines. Instead it creates the specified work load on each of the remote machines. In my case I had 3 load generation machines and wanted to generate a total of 3000 users against my test application. This meant configuring the test plan to start 1000 users.
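The arithmetic is simple but easy to get backwards, so here it is as a short Python sketch. The function names are my own; the only behaviour assumed is the one described above, namely that every remote machine runs the full thread count from the test plan.

```python
def threads_per_generator(total_users, generator_count):
    """Thread count to put in the test plan so the cluster reaches total_users.

    Rounds up, so the combined load is never below the target.
    """
    if generator_count < 1:
        raise ValueError("need at least one load generation machine")
    return -(-total_users // generator_count)  # integer ceiling division


def total_load(threads_in_plan, generator_count):
    """Combined number of users when every machine runs the whole plan."""
    return threads_in_plan * generator_count


print(threads_per_generator(3000, 3))  # prints 1000
print(total_load(1000, 3))             # prints 3000
```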

This is my final post in the series on JMeter and I have to say overall I was impressed. Load testing was a task that I was not looking forward to but JMeter has helped make the process a lot less arduous than I’d originally imagined. As an open source tool it may not be as slick as some of the commercial offerings out there but it shouldn’t be undervalued either. A lot of work has clearly gone into JMeter that insulates the end user from a great deal of the underlying complexity. Having said that, it’s a tool that is perhaps better suited to users that have a more technical view, with an understanding of protocols like HTTP or how to use regular expressions being examples of the kind of knowledge that is beneficial in using JMeter. Given the option I’d certainly be happy to use JMeter again if the need arises.

Load Testing With JMeter (Part 4)

March 27th, 2009

This is the fourth part in a series of posts where I’m covering my experiences in working with the JMeter application to perform load testing on a web based application. The first, second and third parts in the series can be viewed here, here and here. In this post I’m going to address my experiences in executing JMeter to perform load testing.

I started my load testing efforts on my development machine. This is a substantial enough machine, being a 64-bit quad core machine with 4GB of RAM running Windows XP. I began by using the jmeterw.cmd file to run the JMeter GUI. This is fine for test plan development but I’d recommend opening a stand alone command line and using the jmeter.bat file once you start applying larger loads. The reason for this is that JMeter writes some exceptions (more specifically, out of memory exceptions) to the command line and these won’t be visible unless you run the system in this way.

During script development I obviously ran with just a single user to ensure that failures related to the test plan rather than any concurrency that I was applying. Once I had the script to the point that I felt it was ready for use I decided to up the load in increments. I started with a simulated load of 10 users. Being new to JMeter this was a cautious first step and also intended to tease out any potential concurrency issues at a smaller load.

With a successful run of 10 users I next upped the load to 50 users and from there to 100 and 250. Gaining in confidence I next upped the load to 500 users and here I hit my first JMeter specific issue. I started the load test with 500 users and it initially seemed to be running fine. Shortly after it got over 100 users, however, the GUI locked up and became unresponsive. A check on the system monitor also showed that it seemed to be thrashing one or more of the CPUs on the machine.

The JMeter web site had some details on out of memory issues with the tester and I had a feeling that this might be the cause. I shut down the system, cleaned up in preparation for a new run and went about checking how I could get around a memory issue. The documentation I read provided details of editing the jmeter.bat script to make more memory available to the Java VM. Up to now I had been using the jmeterw.cmd file so it meant a switch away from this but, as I mentioned above, I feel that this switch was the proper way to run for true load testing purposes.

Once I had edited the jmeter.bat file and made the switch to using it instead of the jmeterw.cmd file I was able to drive the load to 500 users. I continued upping the number of users in the load test until I finally reached 1500 simulated users. At this point I had to give the Java VM 1GB of heap allocation and I was running short of RAM on my machine. To counter this I had to shut down other applications on the machine, as I found that using applications on the machine while the load test was running was adversely impacting the response times for the test.
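For anyone unfamiliar with how the Java heap is sized, the change boils down to passing larger -Xms (initial) and -Xmx (maximum) values to the JVM. The Python sketch below just formats those two standard JVM flags; the helper name is made up, setting both values to the same size is a choice for the example rather than a record of my exact edit, and where the flags go inside jmeter.bat depends on the JMeter version.

```python
def heap_flags(max_megabytes, initial_megabytes=None):
    """Format the JVM heap flags for a given maximum heap size in MB.

    If no initial size is given, the heap starts at its maximum size.
    """
    if initial_megabytes is None:
        initial_megabytes = max_megabytes
    if initial_megabytes > max_megabytes:
        raise ValueError("initial heap cannot exceed maximum heap")
    return "-Xms{0}m -Xmx{1}m".format(initial_megabytes, max_megabytes)


print(heap_flags(1024))  # prints -Xms1024m -Xmx1024m
```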

Load testing with up to 1500 users had shown that the application dealt with the volume of requests in a satisfactory manner. However, I’d only managed to get halfway towards the total load I wanted to apply, so it was now time to move away from generating the load on a single machine. Moving to a multiple machine test environment will be the subject of the next post in this series.