Individual Report

WHY ARE COMPUTER PROJECTS FREQUENTLY LATE AND OVER BUDGET?

An assignment for the System Spec & Design module at the University of Central England
By: Harry Sufehmi, student id #98799231

Abstract

A report discussing computer projects: why the majority of them run late and/or over budget, and why some of them ultimately fail. Several cases are provided for further study, and recommendations are given for each point, to avoid or remedy the problem.

Contents
--------
Chapter 1 - Introduction
Chapter 2 - Findings
Chapter 3 - Discussions
Chapter 4 - Conclusions
Chapter 5 - Recommendations
Reference list

Chapter 1 Introduction

A brief look at the problem
---------------------------
Slowdowns in the fast lane? People associate computers with the word "fast". Everything computerised must be fast and efficient. Little do they know that the path to becoming "computerised" is hard to follow to the end, and contains many pitfalls along the way.

Managers trying to computerise their company often get the ugly surprise that the project goes way over the time allocated, and way over the budget given. Even outsourcing the project is not always enough to avoid those pitfalls - many disgruntled managers have already told their horror stories about consultants who not only failed to finish the project on time and on budget, but also walked away, leaving the project unfinished and a complete mess to deal with.

Anyone who follows the progress of the computer world only through magazines and media hype will surely be confused, since this is supposed to be the era of integration, open platforms, ease of use, and other buzzwords that seem to confirm that implementing a computer system nowadays should only take a snap. Not true. Implementing a computer system is easy to say and to imagine, but the process in reality can be nightmarishly complex. The number of technical details that must be handled is enormous: what hardware to use, whether the components can co-operate with each other (open platforms are still not that open yet), what software to use, whether the programs can co-operate with each other, whether they are compatible with all the hardware, whether they are stable, how good the support is, whether our staff are able to handle it, whether to outsource or not, et cetera - and then the schedule and the budget must be drawn up based on all of those variables. You get the big picture of the chaos now.

In fact, in a survey done by the Controllers Council of the Institute of Management Accountants (IMA), none of the members surveyed reported that their computer projects met the deadline and the budget allocated. The survey covered 2000 member companies with revenues ranging from US$ 1 million to US$ 1 billion.

What is causing all of this? And most important, how can it be avoided? This report will try to discuss some of the findings. Some recommendations will also be drawn, to help readers avoid falling into the same traps.

Chapter 2 Findings

Some shocking truths revealed
-----------------------------
In this chapter I'll present some of my findings about the topic. Surprisingly, many managers fall for the same failures again and again. Let us browse some of them.

1. Not enough time given to gather the necessary data
2. Not enough time given to make a good design
3. Falling for the hype
4. Using untested, cutting-edge technology in a production environment
5. Testing a new program on the production server
6. The danger of outsourcing
7. Inadequate testing
8. Runaway consultant
9. Over-ambitious project
10. Never trying to restore the data backup
11. Pressing the budget too tight
12. Not preparing a backup communication link
13. Hiring a consultant, and never/rarely checking their progress
14. The Mythical Man-Month, the mishandling of a late computer project

Chapter 3 Discussions

Case studies and other information
----------------------------------
1. Not enough time given to gather the necessary data

This mistake is very common. Usually it is caused by top management who do not know much about computer system design, and do not care. They think that building a computer system - be it a hardware system, a software system, or both - is as easy as buying a car. They just say, "Well gentlemen, let's have our database computerised. I give you 6 months to do that."

If the IT people do not have the power to bargain with the management, they just have to do it within the time limit given, or their jobs will be in danger. It is then only natural that they skip the extensive data-gathering sessions and step right into the coding process. No wonder the system is later plagued with problems: a bad user interface, bugs, an un-normalised (unoptimised) database, a system that does not fulfil the users' needs - all of which lead to extended development (patching) time and budget.

2. Not enough time given to make a good design

Almost the same as above, but a lack of design time mainly produces a system that is not upgradeable. Of course, the development time is short - sometimes it can be VERY short. In the short term, this is thought of as a saving of both budget and time. But in the long term it means re-developing the application, since the system cannot handle any increase in the number of products to support, or changes in products, or changes in procedures.

A lack of good design can also mean that the system/program specification is modified many times in the middle of development. This only frustrates the programmers (and becomes an incentive for them to work slower, letting as many changes to the design accumulate as possible before continuing) and delays the finishing date.

Sometimes this mistake is also caused by an incompetent system designer/system analyst. This can easily be identified when the company's operating procedures are already standard, yet the design specification for the computer system keeps on changing.

3. Falling for the hype

Computer hardware and software are very sophisticated things. Because of that, it is not always easy to say which product is the best. Also, there are plenty of criteria for being "the best": speed, flexibility, upgradability, technical excellence, stability, support, etc. So choosing the right hardware or software is inherently not an easy task.

Advertising is one of the evils in this case. Because of advertising, many people believe in a product even though the product is inferior.

Case study: Microsoft. Product: Windows NT.

Microsoft is very notorious in the circles of system administrators. Many have had the experience of being pressed by their manager to use Microsoft's product while the existing software is doing just fine. It seems incomprehensible, yet it happens, again and again. It happens because the manager has fallen for the hype. The outcome? Because the transition is so rushed, system unavailability or system performance degradation is common.
System administrators who have long enjoyed the 99.995% availability of a Unix system, or the raw speed and flexibility of a NetWare server, are now forced to constantly monitor and tweak their Windows NT server to ensure the longest possible uptime with decent performance - leaving less time for other tasks, such as making a good upgrade plan for the computer infrastructure. Another common outcome is an upgrade: hardware that was enough for the old software must be upgraded first (in some cases an upgrade is not enough, and a new server is in order) before it will run Windows NT with satisfying performance.

Some system administrators are then forced to take a hide-and-seek approach. Quietly, they install Unix software - usually Linux, because of its price (free) - and run some of the services (usually mail and DNS) on that machine. They risk their jobs if their boss finds out, but system performance is so intolerable that they simply must do it to keep the company running.

4. Using untested, cutting-edge technology in a production environment

The newest technology is always inviting to try, but an IT manager should be careful about it. Of course, the newest technology will offer more bandwidth, more capacity, more speed - in general, more power; but there is one more criterion that should be at the top of any IT manager's list: stability. New technology usually lacks this; at best because of the lack of global-scale testing, at worst because of a rushed design process at the manufacturer - producing a product that is flaky.

If a company needs to use a new technology, it really should get information about the device or software from somebody who has already used it in a production environment. If nobody can confirm the new technology's stability, then after purchase the device or software should be tested rigorously in a separate environment. This way, if anything goes wrong, the company's production will not be affected.

5. Testing a new program on the production server

When a module in a computer system is updated, it should always be tested on a research server first. Only after the Quality Assurance team grades it a pass may the module be used on the production server. This mistake can be caused by:

# Users urging the programmer to release the module immediately.
# Lack of a quality assurance team.
# A breach in the security setup (a programmer having access to the production server, etc).

Whatever the reason, this mistake should never happen. Damage ranges from nothing to destroyed data; everything is possible. The risk is simply too high.

6. The danger of outsourcing

Case study: Takaful Insurance

This particular incident happened in their first days of operations. They needed a marketing system, but the only programmer was already busy with other projects. So they hired a consultant to do it. The system was soon finished; they were happy, asked for the source code, paid the consultant, and started using the system.

After several years, they needed to modify the program, so they gave us the source code to modify. The problem is, the source code was spaghetti code, barely readable. Anyway, we tried first. After wasting a week trying to comprehend it, we gave up. The system was too complex, and being badly coded made it even harder to understand. So we set up a new project, to build the system anew.
The new system was finished after 3 months, involving 30% of our IT department's resources full-time in the process and leaving operations with just 30% of the staff (another 40% or so were already involved in other projects) - causing another important project to be left stranded. Then they still had to attend to it part-time, squashing bugs in the program for another 4-5 months. Instead of being able to use the system after modifying it a little, we had to rebuild it from the ground up. A waste of time and resources.

7. Inadequate testing

"Patience is a virtue." This is especially true with new system testing. A new system should ALWAYS be tested first, for as long as the Quality Assurance people think it should be, before it is deployed in the production environment. Even after that, it should be run side by side with the old system for several months. This is because however exhaustive the tests were, they can never fully simulate a real production environment. In essence, running the system in the production environment is like giving it yet another, different test. So if there are still bugs that were missed in the testing phase, the company can switch back to the old system and continue the business while the engineers fix the bugs. That is why it is not surprising that changing a system can take anywhere from 2 months to a year.

A notorious example of lack of testing, quoted directly from Paul A. Strassman's article, is presented below:

"The near bankruptcy of Bank of America, once the largest and most powerful of U.S. banks, is an infamous case of what can happen when a company switches computer systems. In 1983, the bank installed a new system to handle the rapidly expanding volume of security trading. The application involved conversion from an existing computer system, but also required switching to a new accounting method. In the banking industry, such conversions frequently occur because the technical limitations of computer hardware and software generally do not allow for expanding operations without a complete overhaul.

Computer managers at Bank of America did not run the new system and the old system in parallel to correct faults. Then, the bank's executives switched to the new programs without any backup. The new system and the old one were sufficiently different so that mistakes remained uncorrected. It turned out that on any day only a small percentage of the security trades was improper, but nobody could tell for sure which transactions were good and which were bad. Meanwhile, trading continued at an accelerated pace, accumulating an enormous number of doubtful records. All in all, billions of dollars of security transactions were unaccounted for. The failure of the software came close to destroying the bank."
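To make the parallel-run idea above more concrete, here is a minimal sketch (in Python, with an invented record layout and invented file names) of the kind of daily reconciliation a company could run while the old and new systems operate side by side: both systems process the same transactions, and any account whose balance differs is flagged for the engineers before the old system is ever switched off.

    # Hypothetical daily reconciliation between the old and the new system.
    # The CSV layout ("account,balance") and the file names are assumptions
    # made purely for illustration.
    import csv

    def load_balances(path):
        """Read 'account,balance' rows into a dictionary."""
        balances = {}
        with open(path, newline="") as f:
            for account, balance in csv.reader(f):
                balances[account] = float(balance)
        return balances

    def reconcile(old_path, new_path):
        """Return every account where the two systems disagree."""
        old = load_balances(old_path)
        new = load_balances(new_path)
        mismatches = []
        for account in sorted(set(old) | set(new)):
            if old.get(account) != new.get(account):
                mismatches.append((account, old.get(account), new.get(account)))
        return mismatches

    if __name__ == "__main__":
        for account, old_value, new_value in reconcile("old_system.csv", "new_system.csv"):
            print(f"MISMATCH {account}: old={old_value} new={new_value}")

As long as such a report comes back empty day after day, switching off the old system carries far less risk; the Bank of America case above is essentially what happens when no comparison of this kind exists.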
8. Runaway consultant

Another danger of outsourcing is the runaway consultant. The computer projects business is indeed a strange business, where you pay an integrator or consultant on a time-and-materials basis. This is unlike the construction business, where the time and price are agreed first, and there is no way the contractor can move beyond that without facing a lawsuit from the client. In the computer projects business, when the project goes awry, there is usually not much that the client can do. And it happens very often, proving that outsourcing jobs to an external consultant will not always solve the problem.

Case study: NAI Block Inc - Salt Lake City

NAI Block Inc, which provides test and assessment services for the construction industry, faced problems with their consultant. The consultant simply went away, leaving them with a half-finished system. They had to regroup their people on the project and review it, and in the end they were forced to terminate the project because it was already so messy that there was almost no hope of continuing it. Finally, they started again from scratch.

9. Over-ambitious project

Quoted from Paul A. Strassman's article:

"Max Hopper, one of the architects of American Airlines' Sabre system, found a mess when he became the chief information officer of the Bank of America. Systems were technically obsolete, poorly organized, and incompatible. Auditors accumulated a long list of deficiencies, some of which threatened the integrity of the Bank of America business. To solve the problem in one giant sweep, Hopper proposed installing the transaction processing facility (TPF) as the master system for all Bank of America processing. TPF was an improved version of a system running the American Airlines' reservation system. The proposed budget of $4 billion, the largest nonmilitary computer project ever reported, was discussed in magazine articles as an imaginative move to restore the bank's "competitive edge."

TPF had a reputation as an exceedingly fast and expensive system built to handle thousands of transactions per second. TPF also required centralizing computer resources on a scale that was unprecedented in the banking industry. Such a concentration did not fit well with the highly decentralized traditions of Bank of America. Hopper proceeded to install TPF, taking money from existing systems that already were performing poorly with the hope that the replacement systems would finally deliver much-needed improvements.

The problem was how to justify the massive expenditures for TPF. Bank of America's current transaction volumes did not need that sort of computing power. To gain the expected benefits, the bank would have to reach the ambitious marketing objectives established by business planners. If the projected volumes missed, TPF was the wrong solution. If TPF was not available, the new markets could not materialize using the existing systems. Plans for a large number of new products, such as banking by phone, banking by computers, and terminals in all client stores, never materialized. Hopper resigned and the CEO of the bank retired. Afterward, the systems investments were redirected to patching up existing malfunctions. The risk of TPF creating organizational, procedural, and technical failure were too great to add to the already precarious financial position of the bank.

The Bank of America case offers an excellent study of information systems strategies that got ahead of the capacity of an organization to commit and execute a business plan. Information systems strategies cannot serve as the vanguard in attempts to reform an otherwise reluctant organization."

10. Never trying to restore the data backup

This once happened to me. The previous system administrator got a job at a bank, and handed the responsibility for performing data backups to me. I did it dutifully, every scheduled time. But I did not know that the backup must be tested. I just did the backup, trusted the tape drive faithfully, and thought that a tape backup was already the ultimate safety net for our company's data. Thus, disaster loomed.
One day the server failed, and the backup could not be restored even after much fiddling with it. Fortunately, before I got frustrated enough to reformat the server, I remembered to try restoring the backup once more using another drive. It was finally restored using a tape drive borrowed from a friend working in a bank's IT department. After some investigation, it turned out that our tape drive was dirty (I did not know until then that a tape drive must be cleaned periodically, just like a floppy disk drive) - far too dirty to be relied on again in the future. Strangely, it always gave an OK signal after every backup, so I never found out about the problem. Had I performed a restore test after every backup, I would have found that problem quickly.

The problem above cost us one office-day. Many jobs, including some computer projects, were halted and would have been in danger of discontinuation if fate had had it that the backup could not be restored at all. We were very lucky that the outcome turned out otherwise. Not everyone is so lucky.

11. Pressing the budget too tight

The company's concern about pressing down expenses collides many times with the IT department's projects. I have been in this situation several times. Once, we needed to buy a new server since the previous one had died and we were running on backup servers. I spent weeks comparing prices. It is completely normal, if I come to think about it again: the company was still new, so it was still very picky about spending its cash. The result would have been very satisfying for both the finance department and me, if the financial crisis had not struck Indonesia - the price, when translated into our currency, was 500% higher. But I think we still got the best deal there, with a fine balance of price (the finance department's concern) and stability, performance, upgradability and support (my concern). The server is a Dell PowerEdge 2200, more than enough to run our current database software on a NetWare server, and ready for when we move to a client/server database model.

12. Not preparing a backup communication link

We usually take communication links for granted. We install them, pay the price, and start utilising them and putting heavy demands on them. Then we forget that they are there. All we know is that everything goes as it should - until one day a link stops working for some reason. Imagine the riots that follow.

Most companies, surprisingly, do not think about this, especially medium and small-sized companies. Or, in other cases, they provide only an inadequate backup link. For example, do you think providing a 19,200 bps line as a backup for the company's T1 is logical? Of course not, but it has happened. Imagine the chaos when the main line is unusable: the backup line would be jammed so badly that it is practically useless. It is just as if they had no backup at all (actually that would be better - they would not have to pay the 19,200 bps rent every month, and would still get the same result).

Case study: Takaful Insurance

We connected one of our branches to the head office via the Internet. The term "connected" here actually means updating their database every day, so it is more a batch-online than a real-time connection. At night, the server would automatically copy this branch's data to several separate files, which in turn would be placed on our FTP server or sent by e-mail. At the scheduled time, they would pick the files up and run a special program on their server to insert the data into their database.

All worked fine until one day our ISP had a connection problem. Worse, it took days to resolve. Our branch people were screaming at us. So we finally set up another account with another ISP, designated it as our backup Internet connection, and sent instructions to our branch to adjust their computer settings. Everything went back to normal after that.
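The nightly transfer described above, together with the backup account we added later, can be pictured with a small sketch like the following. It is only an illustration: the host names, the account details and the exported file name are invented, and the real setup used a plain FTP drop plus e-mail rather than this exact script.

    # Hypothetical sketch of a nightly batch upload with a fallback route.
    # Host names and credentials are placeholders, not a real configuration.
    import ftplib

    UPLOAD_ROUTES = [
        ("ftp.primary-isp.example", "branch", "secret"),  # main Internet account
        ("ftp.backup-isp.example",  "branch", "secret"),  # backup account at the second ISP
    ]

    def upload(path):
        """Try each route in turn; return True as soon as one upload succeeds."""
        for host, user, password in UPLOAD_ROUTES:
            try:
                with ftplib.FTP(host, user, password, timeout=60) as ftp:
                    with open(path, "rb") as f:
                        ftp.storbinary(f"STOR {path}", f)
                print(f"Uploaded {path} via {host}")
                return True
            except ftplib.all_errors as err:
                print(f"Upload via {host} failed: {err}")
        return False

    if __name__ == "__main__":
        # The nightly export job is assumed to have already written this file.
        if not upload("branch_data.dat"):
            print("Both links failed - alert the administrator")

The point is not the script itself, but that the fallback route exists and is exercised automatically, so a problem at one ISP does not stop the branch update.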
13. Hiring a consultant, and never/rarely checking their progress

This is another sure way to make a project late, or even never completed. Sometimes there are dishonest consultants who are very happy when their work is never checked: they work half-heartedly, and at the end of the agreed period they simply disappear, leaving an unfinished project.

But sometimes the problem lies in the company itself. The company's staff are already too busy with their own jobs, and the management has not made it clear that assisting the consultant - e.g. obtaining system documents, explaining the company's needs, etc. - is a high priority. The result: slow progress, an unhappy consultant, an unhappy client, and ultimately a loss of time and money.

14. The Mythical Man-Month, the mishandling of a late computer project

Many managers, usually those without a computer background, think that a computer project is just like any other project when faced with a potential delay: the solution is simply to add more people until the rate of work is agreed to be enough to meet the deadline. Unfortunately, this simply is not true. A computer project is special, especially a software one. Adding more people to it can actually slow it down. How come?

First, every computer project is unique. Adding new people, especially fresh graduates, requires that some time be given to train them before they can actually contribute to the project. So the gain cannot be realised immediately.

Second, the most competent people to do the training are the ones already involved in the project. Even when formal training is done by an external trainer, they will eventually still have to train the new staff, either formally (specialised training sessions) or informally (working while guided by the senior staff). So we have now actually introduced a new delay into the already late project.

Third, and perhaps the factor that causes the most delay, is the communication overhead. Humans are, more or less, like microprocessors: as we add more processors (staff) to a motherboard (project), some communication overhead is involved. Between processors, this is usually just deciding which task goes to which processor. Even though that is all it is, and they do it blazingly fast, it is still noticeable. That is why adding more processors does not exactly double the performance of a computer. Between humans, it is even more complex: retraining when a new technology is going to be used, co-ordinating and balancing the tasks that must be done, inter-personnel problems, time needed for meetings, etc. The more people added, the more of this overhead is involved.

It is advisable to think very carefully before adding new personnel to an already late project. Only add new personnel when it is agreed that the advantages outweigh all of the problems outlined above (although this is very rare).
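The communication-overhead argument above can be illustrated with a few lines of arithmetic: with n people on a project there are n*(n-1)/2 possible pairs who may need to co-ordinate, so the co-ordination burden grows much faster than the headcount. The team sizes in this small sketch are purely illustrative.

    # Illustration only: how the number of possible communication paths
    # grows as people are added to a project.

    def communication_paths(team_size):
        """Distinct pairs of people who may need to co-ordinate with each other."""
        return team_size * (team_size - 1) // 2

    if __name__ == "__main__":
        for n in (2, 5, 10, 20):
            print(f"{n:2d} people -> {communication_paths(n):3d} communication paths")

Doubling a team of 10 to 20 roughly quadruples the number of paths (from 45 to 190), which is one reason the extra hands so often fail to translate into extra progress.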
Chapter 4 Conclusions
---------------------
Mostly, computer project failures are caused by incapable management: false assumptions, too-good-to-be-true projections, lack of control over the projects under way, forgetting the important details, etc. Even Fortune 500 companies have made these mistakes. Remembering that they have big budgets and good IT people, yet still managed to fail, surely makes us cringe.

Actually, sometimes a small team of highly skilled people, led by a manager with a clear vision and a good balance of technical and managerial skill, is all it takes to tackle a computer project. But a manager who really understands and is able to apply the needed managerial skills is rare. A person with excellent technical skill is rarer. And unfortunately, a person who has both skills is the rarest of all. So it is no wonder that not every company has them.

Chapter 5 Recommendations

How to avoid, or terminate, those problems
------------------------------------------
1. Not enough time given to gather the necessary data
2. Not enough time given to make a good design

Just give it more time. But often, you will be forced to first scrap the project already under way, because it is already in a state of total disarray - making it unfeasible to re-analyse the original data/design. Sad but true.

3. Falling for the hype

Too bad there is not much that can be done about this problem, because usually the manager who has fallen into this trap has already put too many stakes in the technology he believes in, so he will not listen to anybody else even if the truth is that his "chosen" technology is a piece of junk.

4. Using untested, cutting-edge technology in a production environment

Simply don't do it. Ever.

5. Testing a new program on the production server

You will always find somebody in your company who is tempted to try this. To avoid it, you must:

# Dictate a good testing procedure: clear, detailed, and describing the punishment for violators
# Set up complete, solid security for your production server
# Establish a Quality Assurance team if you don't already have one

Of course, the management has to understand this point too, because most of the time, the users who urge a new module to be run immediately on the production server are the top managers. If they don't understand this point and its implications for the company's data, you'll still meet a lot of trouble along the way.

6. The danger of outsourcing

Source code is one of the many problems that can plague a company that uses consultants to build its computer system. From the notorious hidden time-bomb to unclear source code that cannot be modified, the horror stories are simply too many to list here. The solution is to make it clear that you want the source code as clean and readable as possible, without any hidden traps (and have this written explicitly into the contract), and then assign an IT staff member to check the consultant's final work before agreeing that the project has indeed been finished.

7. Inadequate testing

This is usually again a case of management failing to identify a potential problem. Usually, the management is already hard-pressed to launch the new system, because its development has already taken too much time - and they need to produce more profit immediately, which can only be gained from utilising the new system. To achieve this, they skip the testing process. The results vary, from disruption of services to the death of the company. Recommendation: try to lobby the management, and explain fully the consequences for your company's specific case.

8. Runaway consultant

For the example I gave above, the solution is to find a consultant who will work on a fixed-time, fixed-price basis. More and more consultants are offering this service.
Just don't be surprised if the price tag can be as much as five times higher than that of consultants who don't offer this service. Still, completion on schedule and on budget is really worth it, and sometimes it will even save you a lot of money. For example, Nabisco estimated that they saved 20%-30% by using this kind of service in their year 2000 compliance project.

But there are exceptions. For example, most such consultants will avoid ERP (Enterprise Resource Planning) projects. Because of their enormous complexity, even with enhanced analysis methodologies and tools, they still cannot precisely estimate the time and budget needed to finish the job - there are simply too many variables. Another kind of project they usually avoid is one that relies heavily on new technology: the tight constraints of such a deal won't give them enough time to learn it.

A pitfall that must be avoided is the definition of the end of the project. Both parties must agree upon the definition of "the end of the project", and it should be set down in a well-defined clause. Vague statements can lead to extended work time and budget, or even to the consultant bailing out of the project without you being able to sue them.

9. Over-ambitious project

This is another classic example of management failure: failing to calculate whether the available resources will be enough to accomplish the project. There is not much we can do here except try to rationalise the situation to the CEO/CIO, if they are the ones who made the mistake. If the mistake was made by lower-ranking staff, we can report the problem to the management, along with proof and/or arguments about the danger lurking behind it.

If damage has already been done, and the management has realised what is wrong, counter-measures must be taken immediately to prevent even more waste of time and money. Cut excessive parts from the project, but do so carefully so as not to hamper the other parts' performance. If after the cuts the project becomes so out of focus or so hard to continue, it may be necessary to cancel it. Do not hesitate to do so, as the case of Bank of America showed us. Then continue working as usual, while planning a new project that has a better chance of being completed.

10. Never trying to restore the data backup

Assign a member of staff to be responsible for doing this every time the scheduled backup is done. The words "every time" are emphasised here, since you don't know when the backup will fail you. The computer world bears its own peculiar behaviour; Murphy's Law dictates that your data backup will fail you at the exact time the main server goes down. Many system administrators have experienced that, and some of them got fired as a result.

After my own experience, described previously, we devised a multiple-backup strategy: one copy on tape, and several others on hard disks, some of which are kept away from the company's building. We had some minor incidents again after that, but thanks to the multiple-backup strategy, we never failed to restore the data again.
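To make the "restore it every time" advice concrete, here is a minimal sketch. It assumes the backup is an ordinary tar archive whose members are stored relative to the backed-up directory; a real tape setup will differ, but the principle is the same: read the data back and compare it with the live files, instead of trusting the drive's OK signal.

    # Hypothetical verification pass: read every file back out of the backup
    # archive and compare its checksum with the live copy on the server.
    # The archive name and the data directory are placeholders.
    import hashlib
    import tarfile
    from pathlib import Path

    def sha256(data):
        return hashlib.sha256(data).hexdigest()

    def verify_backup(archive_path, source_root):
        """Return the names of files whose backed-up contents differ from the live copy."""
        bad = []
        with tarfile.open(archive_path) as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                restored = tar.extractfile(member).read()
                live = Path(source_root, member.name).read_bytes()
                if sha256(restored) != sha256(live):
                    bad.append(member.name)
        return bad

    if __name__ == "__main__":
        problems = verify_backup("nightly_backup.tar", "/data")
        if problems:
            print("Backup does NOT match the live data:", problems)
        else:
            print("Backup verified: every file reads back identically")

Run straight after the scheduled backup, a check like this would have caught the dirty tape drive described earlier on the very first night.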
11. Pressing the budget too tight

The obvious answer is to ask for a looser budget. But if that is not possible, be ready to compare prices aggressively and very carefully - and watch out for much-needed features that turn out to be missing from the product with the slashed price. Sometimes you cannot be stubborn; you must also consider what it would be like to be in the head of Finance's position - imagine the pressure from the top to push expenditure down as much as possible.

So what the company needs is a CIO with good lobbying access to the management (to enable easy access to the funding needed for IT projects - thus saving the precious time that would otherwise be wasted negotiating the budget) and enough knowledge to enable him or her to choose the right technology without paying a hefty price tag.

12. Not preparing a backup communication link

Double (or, in cases where that won't be enough, triple or even quadruple) every communication link that you have. If you can't do this (mostly due to budget constraints), then operate in triage mode: back up the most important links, and prepare procedures to be executed when the other communication links go down, to minimise productivity loss.

You must be very aggressive about this. Whenever there is a chance to propose installing a backup link, go ahead. Build as much redundancy as possible into your network. In Takaful Insurance's case, we set it up so that we now have 2 accounts at 2 different ISPs using 2 different Internet backbones. This way, even if a failure happens at the backbone level, we can still do our data transfer simply by using the other ISP, which uses another Internet backbone.

13. Hiring a consultant, and never/rarely checking their progress

The management must instruct the related staff to give full co-operation to the consultant. It must be done in writing; I have seen for myself that people tend to (try to) forget it when it is merely said. However smart the consultants are, if they must re-learn the system by themselves it will take months or years before they have a good understanding of it. Help from the staff is essential to accelerate this process.

The management must also check the consultant's progress. Define milestones and their deadlines. Then, at the deadline dates, the management must check whether the consultant has reached the agreed milestone. Although this is easy to say, sometimes it is very hard to define the milestones, and even harder to reach agreement on the deadlines. But it must be done, to make the checking easier for the management.

14. The Mythical Man-Month, the mishandling of a late computer project

What can we do for an already late project? There are some options, like trashing unneeded features of the hardware/software, and re-scheduling. If you are planning to re-schedule it, remember that a good rule of thumb for a computer project schedule is this:

# 1/3 planning
# 1/6 coding
# 1/4 component test and early system test
# 1/4 complete system test

Strange as it seems, it is true. And based on my own experience, I fully agree with it.

Reference list
--------------
1. Strassman, Paul A. (1991), Twenty Deadly Sins of Project Management, http://www.strassman.com, Strassman Inc.
2. Shein, Esther (1998), Prix Fixe Deals, http://www.zdnet.com/pcweek/news/0727/27prix1.html, PC Week Online.
3. Inman News (1996), Company technology projects are late and over budget, http://www.inman.com, Inman News Features.
4. Brooks, Frederick P. (1975), The Mythical Man-Month, North Carolina, Addison-Wesley.

_____________________________________________________________
Last revision: Nov 15, 1998