Outsource Software Development

Should companies outsource?

A great many companies have got themselves into an unusual situation. They have teams of software developers working around the clock, but the problem is that they’re not software development companies. This is actually an insane state of affairs.

Why would an insurance company need a team of software developers? Or an asset management company? Or a health service provider? Do they have teams of electricians on staff? Or teams of actors for adverts? No, for those, they get services provided by vendors, not staff members.

Sometimes these companies hire contractors and then pat themselves on the back for focusing their permanent staff hiring on their core competencies. Except… who exactly is managing these contractors? Why, it’s the selfsame people who would be managing them if they were permanent staff! It’s internal management, people who’ve come up from the call centre, or retail operations, who now have to manage software developers.

This is not a good situation to be in

Software developers are difficult to manage. A lot of the usual tricks simply will not work. It’s been compared to herding cats, mainly because developers tend to be clever and cynical. Rah-rah techniques will fall flat, and threats, well, those only work when the subject is afraid that they won’t get another job, which is not normally a concern for any halfway decent developer.

So how is your manager going to manage these people? Well, the answer is normally “poorly”. Don’t get me wrong, there are some fantastic software development managers who don’t come from an ICT background. But they’re not the rule, they’re the exception.

What should we do?

Well, if you’re not a software development company, you ideally shouldn’t have lots of software developers working at your premises. You should outsource to companies which specialise in creating, motivating, and growing effective software development teams.

So outsource everything to [insert favourite Eastern country here]?

Not at all. I am most definitely not a believer in outsourcing work you don’t fully understand to someone thousands of miles away who you don’t have the tools to adequately assess. No, you need to find a local partner to outsource to. A company which can put teams in your premises to speak to your people, one which will provide real advice and insight, and one which will tell you when you’re wrong.

Such a local partner may well outsource some or most of the work to some other country. That’s okay; it’s not necessarily inefficient. They’re the ones dealing with the pain of the remote vendors, not you and your management team. They have the expertise in the field to know whether the remote vendors are delivering efficiently, when a vendor is making up in volume what they’re not charging you in hourly rate, and how to spot a vendor trying to slip junior staff in as senior staff. You get to focus on giving your customers what they want, instead of being forced to focus on the intricacies of software development.

But bodyshopping is cheap

This is a common misperception, and it comes from not adding up everything you need. Either you’re using tools illegally, or you’re not providing the right tools to your team, or you’re paying a small fortune.

Software developers are expensive, and their tools are comparatively cheap. The key word here is comparatively. Most companies think they just give the devs a development environment and off they go. Not so fast! What about code analysis, code reviews, timesheets, training, automated testing tools, performance testing tools, security testing tools, task tracking tools, bug logging software…

My rule of thumb is that at least one month’s salary a year should go to tools for each developer.
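To put the rule of thumb in numbers: one month’s salary a year is one twelfth, or roughly 8.3%, of a developer’s annual salary. The minimal Python sketch below assumes a hypothetical developer earning R600,000 a year; that figure is purely illustrative, not a market rate.

```python
def annual_tool_budget(annual_salary: float) -> float:
    """Rule of thumb: at least one month's salary per year goes to tools."""
    return annual_salary / 12

# Hypothetical salary, for illustration only.
salary = 600_000
budget = annual_tool_budget(salary)
print(f"Minimum tools budget: R{budget:,.0f} a year ({budget / salary:.1%} of salary)")
```

Swap in your own salary figures; the point is that the tools budget scales with what you already pay for the developer.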

Some companies believe that the contracting house provides the tools. Not many actually do. At Palantir we expect our developers to work both at clients and on our products, and because we need the tools for our product development, they’re also licensed for those tools at clients. We also budget for regular hardware replacements for our developers. It’s a similar story with training: most bodyshopping companies don’t focus much on training beyond the rote motions needed to earn partner points.

Finally, we prefer each of our developers to have their own private workspace, because research tells us that developer productivity can be up to 2.6 times higher in private offices! I’m going to guess that your office environment is open plan.

Most bodyshopping companies, however, provide just the person. They’ll normally fudge the issue of licensing. So there you are, thinking your team is licensed, when in reality they’re not, and it’s your responsibility. Developers don’t come cheap, but cheap developers can be even less cheap in the long run.

But I get exactly what I want

Sure, you do, but here’s the question: how much experience do you have with software product development? Because a lot of what needs to go into a maintainable, cost-effective solution over the long term is “under the hood” kinds of things. Are you driving those decisions? Do you know what those decisions even are?

At Palantir, one of our specialities is rescuing troubled systems and teams. Systems that have become clunky and outdated, where it takes forever to get the smallest change made, with numerous bugs and frequent production outages. Systems which are not only easy to hack, but have no way of letting you know that they’ve even been compromised. Where do those systems come from?

Your brand new system that does “exactly what you want” in a couple of years!

Wrap it up

Stop trying to focus on things that aren’t your core competency. Find a good software partner and work with and through them. You want opinionated partners, not ones who will roll over to your every desire, because you want partners who know the best way to do the work, and stick up for it. Ask about things like application security, performance, maintainability, and supportability.

Beware of managed service/software development hybrids. The partner should either be selling you the software as a product or be selling you the software development as a service. They should not be selling you the software development as a product, because, trust me, you’re about to get the worst of both worlds. If they’re doing “custom software” as a managed service it means that their interests may not be aligned with yours, a very dangerous place to be.

Looking for an opinionated software development partner? Palantir are opinionated, and have assisted some of South Africa’s premier financial services companies in getting their teams and software under control. Contact us at solutions@palantir.co.za.

Organisational Clock Speed

Below is an edited version of the speech I gave at the ITWeb Software Development Management Conference 2015 about organisational clock speed.

Adding Capacity

How do we build capacity? What do you say when your boss comes to you and says “Fred, we need to double the amount we delivered last year”. What’s normally our first thought? “How many more people do I need in order to double my capacity?” Logically the answer would be double the people?

Of course it’s not as simple as just doubling the number of staff, is it? Unless you have a huge amount of spare time, you’re going to have to delegate a lot of tasks in order to keep the team well managed, so you probably need a team lead for your current team, and another one for the whole new team you’re creating. Okay, so a bit more than double, and that’s just the start. When you bring in more people they all come with their own needs and wants. The bigger things get, the more complicated they become.

The Mythical Man-Month by Fred Brooks explores why adding software developers to a project doesn’t increase overall productivity proportionally: every extra developer adds communication and synchronisation overhead. From this observation comes Brooks’ Law: adding software developers to a late project makes it later. What this means is that you’ll need to more than double your number of staff in order to achieve a doubling of output.

Gordon Moore, co-founder of Intel, observed that the number of transistors in a dense integrated circuit like a CPU doubles about every 2 years. Put simply, computers have roughly doubled their performance every 2 years. But lately chip makers have had to use some pretty clever tricks to keep that up, and the current state of the art is multi-core CPUs. The problem is that CPUs also suffer from a variant of Brooks’ Law. Adding a second CPU to a single-CPU machine does not double performance. Work has to be synchronised and shared resources need locks, and all of this adds an overhead that gets worse as you increase the number of CPUs.
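The article doesn’t name a formal model for this, but a commonly used one is Neil Gunther’s Universal Scalability Law, which extends Amdahl’s law with a term for the pairwise synchronisation cost described above. The Python sketch below uses made-up contention and coherency coefficients (SIGMA and KAPPA are hypothetical, not measured from any real system) purely to show the shape of the curve.

```python
def usl_speedup(n: int, contention: float, coherency: float) -> float:
    """Relative capacity of n processors under the Universal Scalability Law.

    contention (sigma): share of work serialised on shared resources such as locks.
    coherency (kappa): cost of keeping every pair of processors in sync.
    With coherency = 0 this reduces to Amdahl's law.
    """
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))

# Hypothetical coefficients, chosen only to illustrate the shape of the curve.
SIGMA, KAPPA = 0.05, 0.001
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:>2} CPUs -> {usl_speedup(n, SIGMA, KAPPA):.2f}x")
```

With these made-up numbers, two CPUs give about 1.9 times the throughput of one, throughput peaks somewhere around 30 CPUs, and going to 64 CPUs actually makes things slower. Setting KAPPA to 0 turns the formula into plain Amdahl’s law, where adding CPUs never hurts but the gains flatten out.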

If you think about this in human terms: if you have one developer and you need to double capacity, you get a second. They sit next to each other, they share information, and they probably manage themselves. Now if you have 20 developers, suddenly you need 20 more. The logistics become more difficult to manage, as does communication. Who sits where? Who is working on what? Who is managing these developers? Now if you have 100 developers…you get the idea.
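Brooks put a number on this in The Mythical Man-Month: if every pair of people on a team may need to coordinate, a team of n people has n(n − 1)/2 communication channels. A quick Python sketch using the team sizes from the paragraph above (plus 40, the doubled team of 20):

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs of people who may need to talk: n(n - 1) / 2."""
    return team_size * (team_size - 1) // 2

for n in (1, 2, 20, 40, 100):
    print(f"{n:>3} developers -> {communication_channels(n):>4} channels")
```

Doubling from 20 to 40 developers takes the number of channels from 190 to 780, roughly four times as many, which is why the overhead grows faster than the headcount.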

Currently there is a lot of talk about scale. How do we scale out? We add virtual machines, lots of them, each with an operating system and monitoring programs and network cards. We explode the communication and management overheads. We add Redis and Hadoop and Varnish, and a hundred other systems. We shard our data onto multiple machines. We do all this to spread the load for the flood of users we expect.

HighScalability.com recently had an interesting story about a system serving 10,000 daily customers. They had 130 VMs. That’s 77 users per server, and they still weren’t handling the load, which is just ridiculous. On top of that, each server had 2 or 4 CPUs, which works out to somewhere between roughly one CPU for every 20 users and one for every 40.

And what do you think all this computing power is doing? Calculating orbital trajectories? Bringing about world peace? Maybe creating a universe from scratch? Nope, what you have is almost 400 CPUs being thrown at this massive problem, a website! Just a website with a few simple pages.

Obviously this is absurd, but what had gone wrong? They had made the faulty assumption that more machines equals more capability. That assumption is just wrong. To fix it, one of the things they did was cut the number of machines from 130 down to…

Drumroll maestro…

One, one machine. This one machine serves all 10,000 of their customers with excellent performance. Okay, I’m sure this one machine has got a fair bit of grunt, but even if it’s got 64 CPUs we’re still talking 156 users per CPU, and I think 64 would be overkill.
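The ratios in this story are easy to check. The Python sketch below reproduces them; the average of 3 CPUs per VM is an assumption (the midpoint of the 2 to 4 CPUs mentioned, which gives the “almost 400 CPUs”), and 64 CPUs is the deliberately generous upper bound used above, not the real server’s specification.

```python
DAILY_CUSTOMERS = 10_000

def users_per(resource_count: int, users: int = DAILY_CUSTOMERS) -> float:
    """Users served per unit of resource (per VM, per server or per CPU)."""
    return users / resource_count

# Before: 130 VMs, assuming an average of 3 CPUs each.
vms = 130
avg_cpus_per_vm = 3  # assumption: midpoint of 2 to 4
before_cpus = vms * avg_cpus_per_vm
print(f"Before: {users_per(vms):.0f} users per VM, "
      f"{users_per(before_cpus):.0f} users per CPU ({before_cpus} CPUs)")

# After: one server, using the generous 64-CPU upper bound.
after_cpus = 64
print(f"After: {users_per(1):.0f} users per server, "
      f"{users_per(after_cpus):.0f} users per CPU")
```

Even with the generous assumption for the single server, each CPU is serving about six times as many users as before.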

Now consider how much less this solution must be costing the company. We’ve gone from 130 VMs, with all those licensing and support costs to one. Imagine how many fewer support technicians they need, imagine how much quicker and easier it is to track down defects. Imagine how much simpler it is to develop against and deploy changes to. Imagine these improvements as a wave rolling through the organisation.

I was working on a huge project for a Palantir client, huge costs, huge required performance, huge team, and everything was huge. We were taking software designed to handle a large brokerage and then we were scaling it to handle lots of large brokerages. To accomplish this we created multiple databases, each talking to multiple servers, all with complex integration patterns between them.

The teams working on the project were huge and, as a result, disjointed. Requirements were misunderstood; communication and decision making were major bottlenecks. More team members were added, and with them more overheads. More servers and environments were added, which meant more system management and support overhead. As a result, people were working more overtime to maintain everything. But despite all those challenges we managed to get to a point where we could see the end in sight.

However, the complexity of the solution and the massive hardware required meant that the system operationally would cost more than the mainframe it was meant to replace. So what happened? They killed the project, and wrote off many, many millions of Rands. An abject failure by any standard. But it could have been worse. We could have worked for an extra year to complete the system, and worse yet, taken it live. The support for this beast would have crippled the client. Killing the project was the right decision; my regret is that it took so long to come to that decision.

Throwing capacity at the problem, at both a systemic and a team level, had made the problems worse, not better.

Improving Capacity

Another client I worked with had some serious problems, which is my company’s specialty: helping to turn around struggling teams. Anyway, a lot of their problems appeared to relate to a lack of capacity. People were too busy, servers were overloaded, and project deliveries were few and far between. What was different at this client, and what inspired me to come and give this talk, was their approach to the problem.

They had a new CIO, who had brought in some counterintuitive ideas about capacity. First and foremost was the concept that adding more capability to a team also adds liability, management costs, licensing and so on, so it quickly reaches a point of diminishing returns. Instead they decided to solve for efficiency: simplify and optimise.

This is not as easy as it sounds. Optimising complex systems and teams requires a different way of thinking: a systemic view that is mindful of interactions and their effects, rather than an isolated view focused on individual systems.

One of the main optimisations was to focus on time. As an example, there was a process that had to start at month end and be complete by the next morning. The problem was that it took 16 hours. In order for it to complete in that time the machine had to be made unavailable to the business, which meant that the contact centre couldn’t work while this process was running.

The machine had to go down at 3 in the afternoon, which meant that the entire contact centre lost 2 hours every month. Calculate that out for a 20-person contact centre: 2 hours × 20 people × 12 months is 480 hours a year wasted. Think about all those angry clients not being able to get answers to their queries.
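The lost-hours figure is simple multiplication, but writing it as a function makes it easy to rerun for your own contact centre. A minimal Python sketch; the function name and parameters are hypothetical, and it counts only the direct hours lost, not the knock-on overtime, transport and catering costs.

```python
def lost_hours_per_year(hours_lost_per_run: float, staff: int, runs_per_year: int = 12) -> float:
    """Direct staff hours lost to a recurring outage (monthly by default)."""
    return hours_lost_per_run * staff * runs_per_year

# Figures from the example: 2 hours lost per month-end run, 20-person contact centre.
print(lost_hours_per_year(2, 20))  # 480
```

Plugging in your own head count and outage length shows quickly whether a slow batch job is worth optimising.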

And the work they couldn’t do during those lost hours doesn’t go away, so overtime was needed to catch up. Staff would have to come in on the weekend. Many of them relied on public transport, so on weekends taxis were needed to get staff to work. The canteen was closed on weekends, so catering was needed. All these costs add up, quickly.

Obviously, this run had to be carefully monitored; we couldn’t dare have it fail. So someone had to babysit it all night. That’s another 14 hours spent on this process.

They could have thrown another machine at the problem but that would have other costs and difficulties associated. You might give the business back some of their hours, but you would pay for it with increased IT effort and management.

Instead they optimised the process. They got it down from 16 hours to 7 hours. They gave the business back the lost 480 hours a year. The system administrator got back 5 hours each month, 60 hours a year. Add to this the less obvious savings: the server now had 9 hours of spare processing capacity, so the machine could be used to do more, running 5 applications instead of only 3. Reinvestment of these savings is important because, if done wisely, it becomes a compounding improvement.

They called this “organisational clock speed improvements”. It’s not the catchiest title, but we work in IT so it will have to do. Think back to our CPUs: we’re not adding more CPUs, we’re making our existing CPUs faster. No increase in overhead, no increase in licensing costs, but doing more nonetheless. This initiative was run as a competition between teams to see who could deliver the biggest savings. They only tracked the direct savings, not the downstream and knock-on savings, and they still saved thousands of hours.

They gave people their weekends back.

Now some of you might think that this reduction in required work could be threatening. What do you think when 15 days of your work disappear? You start thinking you can be replaced. And that’s maybe one way to use that saving, but that’s not a reinvestment. Instead they encouraged the use of that time for the most important and underrated activity: thinking. Let’s say that for every 8 hours of thinking, a person saves the organisation 1 hour a month. But that 1 hour a month is saved EVERY month. In a year you’re looking at a 50% return: 8 hours invested for a 12-hour return.
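The return on thinking time works like any recurring saving: the hours invested are spent once, but the saving comes back every month. A small Python sketch of the calculation, using the text’s hypothetical 8 hours invested for 1 hour saved per month (the function name is illustrative):

```python
def net_return(hours_invested: float, hours_saved_per_month: float, months: int = 12) -> float:
    """Net return on a one-off time investment, as a fraction of the hours invested."""
    hours_returned = hours_saved_per_month * months
    return (hours_returned - hours_invested) / hours_invested

print(f"{net_return(8, 1):.0%}")             # 50%  (12 hours back for 8 invested)
print(f"{net_return(8, 1, months=24):.0%}")  # 200% (24 hours back for 8 invested)
```

Because the saving recurs, the return keeps growing the longer the improvement stays in place.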

So they changed the way they thought about capacity and what was the result?

Actual spend on ICT was 10% lower than budgeted. With the same staff they delivered 48 projects in 2014 compared to 9 the year before. They had 10 major releases compared to 4 the previous year and all for 10% less money.

Simplification gives unintended and unexpected results. Simplification of systems, simplification of activities, results in more capacity from the same capabilities. This investment in simplicity is applied recursively, again and again, winding up the clock more and more, resulting in faster and faster pace and delivery. All with the same staff, but happier, less stressed, less tired, and more thoughtful.
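The recursive effect can be illustrated with an idealised model: the initial saving recurs every month, all freed time is reinvested in further improvements the same month, and every invested hour saves a fixed number of hours in each later month. These assumptions, the 10-hour starting saving and the 1-in-8 payback rate (borrowed from the thinking example) are all hypothetical, so treat the output as the shape of the curve, not a forecast.

```python
def reinvested_savings(seed_hours_per_month: float, payback_per_hour: float, months: int) -> list[float]:
    """Hours freed in each month when all freed time is reinvested in improvements.

    Idealised assumptions (hypothetical, for illustration only):
    - the initial saving recurs every month;
    - every freed hour is spent on improvement work in the same month;
    - each invested hour saves `payback_per_hour` hours in every later month.
    """
    freed = seed_hours_per_month
    by_month = []
    for _ in range(months):
        by_month.append(freed)
        # This month's reinvestment adds a permanent saving to all later months.
        freed += freed * payback_per_hour
    return by_month

# Hypothetical inputs: 10 hours freed per month, 1 hour saved per month per 8 invested.
monthly = reinvested_savings(10, 1 / 8, 12)
print([round(h, 1) for h in monthly])
```

Under these assumptions the monthly saving grows by 12.5% a month, so it more than triples within a year. In practice the easy wins run out and the payback rate falls, so real gains will flatten out, but the direction is the point: each round of simplification funds the next.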

You just need to change the way you think.

Organisational Clock Speed

ITWeb Software Development Management 2015

Tomorrow afternoon I’m giving a talk about Organisational Clock Speed at the ITWeb Software Development Management Conference 2015.

I’ll put the content up here after the speech is delivered.