Wednesday 9 October 2013

From continuous processes and collaboration to DevOps

If we already focus on being agile, doing continuous integration and keeping the quality of our SW constantly at a good level, the next step to look into is the distribution of the SW. When working with embedded systems, deployments of new SW are usually done rarely, and frequent deployments are not possible in practice: access to the devices might be difficult, frequent updates are annoying and even risky for the users (e.g. mobile phones), or the update process is complex and uptime requirements are very high (e.g. core network servers). However, there are other areas where frequent updates can be possible and even desirable, like getting bug fixes and new features deployed rapidly to an application in the cloud.

What is required to make deployments frequent is good co-operation between the team doing the development, the team verifying the content and the team deploying and maintaining the services, i.e. handling operations. This is called DevOps.

The foundation of functional DevOps is a continuous development process, meaning that high-quality changes to the SW are rapidly included in releases. This requires that:
  • Changes are small
  • SW builds are fast
  • Submission gate is fast and well-focused

Fast feedback on the changes encourages developers to keep delivering small changes, and filtering the bad changes out happens fluently. Developers shouldn't be afraid of occasionally sending bad changes, because too much verification on the developer's table slows down the sharing of the good ones. An automated submission gate should take care of checking the changes, with testing that covers key features that are often broken and/or widely used, and analysis that quickly provides good coverage of common issues.

Once the collecting of changes runs fluently, the next focus area is the creation of releases, which should be documented well and automated as much as possible. The goal should be at least one new release every day, at minimum to be used as the baseline for further development.
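
To make this concrete, here is a minimal sketch of such a daily release script in Python. The build and test commands ("make clean all", "make smoke-test") and the tag scheme are illustrative assumptions, not a description of any particular setup:

    # Sketch of an automated daily release: build and smoke-test the latest
    # trunk, then tag and publish it as the new baseline.
    import datetime
    import subprocess
    import sys

    def run(cmd):
        # Run a shell command and fail loudly, so a broken step stops the release.
        print("+ " + cmd)
        subprocess.run(cmd, shell=True, check=True)

    def make_daily_release():
        tag = "baseline-" + datetime.date.today().strftime("%Y%m%d")
        run("git fetch origin master")
        run("git checkout origin/master")
        run("make clean all")          # assumed build entry point
        run("make smoke-test")         # assumed fast acceptance tests
        run("git tag " + tag)
        run("git push origin " + tag)  # publish the baseline to the team
        print("Release " + tag + " published")

    if __name__ == "__main__":
        try:
            make_daily_release()
        except subprocess.CalledProcessError as error:
            sys.exit("Release aborted: {}".format(error))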

With all this we have a fluent development process. When we start to focus on DevOps, we also need to take care of the needs of the operations team. The first thing the operations team expects is high quality of the SW release. While the primary aim of the development process is superficial testing that ensures the release is basically OK, that kind of testing is not good enough for keeping the SW in good shape in the long run. We need deep testing done by the verification team, a proper quality assurance. Full verification is not suitable for a fast-paced development process, but the verification process needs to be served as the primary customer of development. Verification may take days or even weeks, but it provides important insight into possible problems for the development team, which should fix any upcoming issues promptly, in order to keep the provided insight meaningful in later verification cycles as well.

Besides verification, the operations team will expect a fluent process for deployments and updates. For that, the development team needs to pay attention to creating a system that can be deployed and updated easily, and consultation with the operations team is very valuable here. It is also very important that the development and verification teams test on replicas of the final production environment, or at least close copies of it. Proper configuration management of the environments is crucial.
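
A small hedged sketch of what such a configuration check could look like: compare a test environment's configuration against a production reference and report the drift. The file names and JSON format are assumptions for the example:

    # Sketch: detect configuration drift between a test environment and the
    # production reference, so the replica stays faithful to production.
    import json

    def load_config(path):
        with open(path) as f:
            return json.load(f)

    def diff_configs(reference, candidate):
        # Report keys that are missing from or differ in the candidate.
        issues = []
        for key, ref_value in reference.items():
            if key not in candidate:
                issues.append("missing: " + key)
            elif candidate[key] != ref_value:
                issues.append("differs: {} = {!r} (prod: {!r})".format(
                    key, candidate[key], ref_value))
        return issues

    if __name__ == "__main__":
        prod = load_config("production.json")   # assumed reference file
        test = load_config("test-env.json")     # assumed test environment file
        for issue in diff_configs(prod, test):
            print(issue)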

DevOps as a term and practice has been popular for a few years, but I believe that the majority of the organizations that could benefit from it are still struggling with the transformation from waterfall to agile development and continuous integration. Those that succeed in renewing their processes and keeping them in good shape have better capabilities for making their business successful.

For further reading on DevOps, the Wikipedia page is a good place to start.

Sunday 29 September 2013

Eliminate before automating

In my earlier texts I've encouraged pushing for more automation, because it typically helps to create products faster and with better quality. Therefore, it's beneficial for the business to encourage automation. However, there's one thing that has even more value for the business - eliminating tasks that don't provide enough value for it.

Traditionally, in SW development we focus on improving the way we develop, test and distribute SW, and with automation we make the process faster. But with automation alone we are working harder, not smarter.

When we develop SW too hastily, we create technical debt in our system. To keep the system in good shape, we should clean it up every now and then, refactoring the code and dropping features that are no longer used. We need to do the same for our product development process: refactoring the existing steps and, above all, eliminating steps, tasks and items that don't provide additional value to the business but slow the product development down.

As an example, if we have been developing our products for years, our bug handling process has collected a lot of things into itself. Typically, some stakeholder makes noise that a certain piece of information would be vital to have in the bug report form, and that piece of information is added to the process without evaluating the real cost of adding it. This scheme repeats over the years, and we end up with a bug reporting process nobody is happy about. Still, none of the parties is ready to give up any of the bits and pieces each has managed to include in the process.

Calculating the cost of each item in the process might be difficult. Therefore, a better approach to keeping the process lean is to start from a clean slate and include only those steps and items that can be shown to have significant value. The team that succeeds in doing that well will have a foundation for making innovative products, saving time by eliminating meaningless tasks.

Saturday 21 September 2013

Creating a decent submission gate

In continuous integration (CI), a well-functioning submission gate is crucial (and it's important in a non-CI process, too). The submission gate is the set of criteria that has to be passed for a change to be accepted into the common codebase. Note the difference: this is about single changes being accepted to a baseline, while the definition of done is about the acceptance of a feature. Basically, every feature is composed of several changes.

Key characteristics of the submission gate criteria:

  1.  It should prevent the breakage of the common codebase
  2.  It should be fluent and swift to use
  3.  It should be reliable


To prevent breakage, we need good enough coverage. However, coverage is limited by the need to keep entries through the gate quick. Therefore, we need to pick an optimal set of activities. These should include:

  • Static code analysis for detecting e.g. possible memory leaks
  • Tests for frequently used features, breakage of which would prevent the use of the baseline for many
  • Tests for areas where the breakage could prevent a major part of further and more expensive testing
  • Tests for easily broken features
  • Unit testing done beforehand
  • Code review
  • Creation of change metadata


Quite a set, and all of these need to be fast! For static code analysis there are good tools available, which usually provide reports that enable quick fixing of the problems found - so those are very useful! Code review is very important and, if organized properly, an inexpensive way to discover problems that testing typically can't find.
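
To illustrate, here is a minimal sketch of a gate runner in Python. The individual commands ("make lint" and friends) are illustrative assumptions; the point is that each check is timed, and the first failure rejects the change:

    # Sketch of a submission gate runner: run the fast checks in order and
    # reject the change on the first failure.
    import subprocess
    import time

    GATE_CHECKS = [
        ("static analysis", "make lint"),       # assumed analysis entry point
        ("unit tests",      "make unit-test"),
        ("smoke tests",     "make smoke-test"), # widely used / easily broken features
    ]

    def run_gate(change_id):
        for name, cmd in GATE_CHECKS:
            start = time.time()
            result = subprocess.run(cmd, shell=True)
            print("{}: {:.0f}s".format(name, time.time() - start))
            if result.returncode != 0:
                print("Change {} rejected at: {}".format(change_id, name))
                return False
        print("Change {} accepted".format(change_id))
        return True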

Change metadata refers to all the formalities related to change management, e.g. creating a ticket in the change management tool. This is often a heavy part of the process and should be optimized so that creating changes stays fluent while management is served with enough information.

Tests need to be selected based on the above criteria, need to be automated (for fluent use) and need to be quick. But we must remember that the tests also need to be reliable! That's a major challenge, as there are many things that can fail, e.g. bugs in test scripts, or failures in the SW/HW environment. We will never have ~100% reliable tests (unless we test only very simple things), so we need to be prepared for random failures. What should we do when a test fails? (A sketch of one such policy follows the list below.)

  • Discard the change that seemed to break the test, if we trust the test and our analysis of the results supports that view.
  • Run the test again a few times to check whether it is a random failure. How many times is enough? Do we have time and resources for retesting? A random failure may also be caused by the change at hand, so we need to run further tests on older SW stacks in our codebase, too. We may also classify the failure as random if it has already appeared in earlier test runs.
  • If the same failure has already appeared occasionally before, let's report an error and get a high priority for fixing it. Perhaps we should even drop the test until we receive a fix? Running a failing test is not sensible, it just grows irritation in everybody. On the other hand, the problem should be fixed quickly, because while the test is out of use, or random failures are present, new errors causing additional failures in the test may enter our codebase without us noticing.
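
As a sketch, the following Python fragment combines the retry rule and the "seen before" rule from the list above. The retry count and the history store are assumptions for illustration:

    # Sketch of a failure-handling policy for gate tests. flaky_history maps
    # a test name to the number of failures seen on older baselines, and
    # run_test is an assumed callable returning True when the test passes.
    MAX_RETRIES = 2

    def handle_failure(test_name, change_id, flaky_history, run_test):
        if flaky_history.get(test_name, 0) > 0:
            # The same failure has appeared before this change: a known
            # random failure, so report it for urgent fixing or quarantine.
            return "report flaky test for urgent fixing"
        for _ in range(MAX_RETRIES):
            if run_test(test_name):
                # Passed on retry: record a suspected random failure.
                flaky_history[test_name] = flaky_history.get(test_name, 0) + 1
                return "accept change, flag test as flaky"
        # Failed consistently, and only with this change: discard the change.
        return "discard change " + str(change_id)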


There are a lot of things we should be doing when designing and operating a submission gate. It will never be perfect; we will suffer in at least one of speed, coverage or reliability. So we need to aim to make it decent. However, the most important aspect of a submission gate is always fast feedback, because good coverage is more a requirement for further testing.

Friday 13 September 2013

Why won't continuous integration succeed?

Continuous integration (CI) as a principle is a key part of agile SW development. It is in practice mandatory for keeping the asset in shape for a potentially shippable product at the end of the sprint. But there are many hurdles on the way to successful CI. Here are some I have faced.

First, there might be cultural reasons: developers may not be used to making small changes. This happens if integration is provided to developers as a service. Then it's no problem for the developer to keep a long-living branch, with no updates from other development while coding the change, and hand it over to an integrator who will then feel the pain of merging it into the SW stack. In addition, in this scheme the developer avoids taking the "bad quality" changes of others into his/her development environment.

Second, there might also be practical reasons, not just culture. How easy is it to merge the latest changes into one's own branch? We need good tooling for merges, and new baselines provided at least daily. How much effort does it take to deliver a change? Unit testing, reviews, builds, integration testing and bureaucracy may all require such an effort that delivering small changes is not efficient SW development. What is the quality of the baselines? They need to be trustworthy: testing and analysis should be quick but have enough coverage to catch the most common failures, and baselines not fulfilling the tight release criteria shouldn't be published.

So we need to look carefully into our processes and think how well they support a CI culture. Building and testing of changes need to be automated and swift, reviews need to have enough priority within the team, and bureaucracy must be minimized. Ideally, the developer's effort before submitting the change shouldn't take more than half an hour, and the complete SW build portfolio and acceptance testing in integration another half an hour. This is not the optimum, but the bare minimum that keeps the CI process fluent. Tooling for the process needs to be intuitive to use. If our builds are done nightly, or getting results from tests takes hours, we still have a lot to do to get into continuous integration.

Friday 6 September 2013

Version numbering means trouble for continuous integration

In continuous integration (CI), the aim is to push new changes quickly to the SW stack and to pre- and post-evaluate the changes through analysis and testing, making each of the changes a release candidate. For a release, we need some simple identifiers. The most typical identifier is a version number. However, in CI incrementing the version number can be painful, especially if it is defined at compile time.

First, let's look at the reasons why we need releases. Ultimately, a release is needed to deliver the SW to the customer. However, releasing is beneficial for internal purposes, too:

  • A release points out a baseline on which the next changes are built. This is important if testing in the CI process is not happening promptly and we want to avoid developers using a bad candidate as a base for their changes.
  • A release simplifies the build configuration, in the form of a baseline.
  • A release points out our latest and greatest SW to stakeholders outside SW development, e.g. verification.


Typical identifiers used for a release are:

  • baseline (label)
  • some identifier for the candidate
  • version number

The baseline identification is defined when the baseline is created, and can thus be freely selected based on a scheme we have defined for the purpose. The scheme should be simple, using a short body and a running number. The candidate gets its identifier when it's submitted, also based on a predefined scheme, which should be simple.

But the version number is difficult if it is defined at compile time, i.e. when the candidate is submitted: we would need to know already then which version number is allocated. Knowing it at that time is difficult if we release frequently (daily) and can't be sure which content ends up in the previous release.

If we redefine it when the baseline is created, we need to recompile the SW and test it again, which takes a lot of time if we have an extensive set of compilations and tests for baseline selection. Without testing we run the risk that something gets broken. Sometimes there's the alternative of hacking the version number into the binaries and thus avoiding recompilation, but the risk that the SW gets broken is still there.

OK, so could we then make a meaningful selection of the version number already when the candidate is built? Yes we can, at least sometimes or even most of the time, but not always. And every time we have a wrong version number, our CI process gets into trouble and SW distribution is delayed. Version numbering should enter the game only when we are getting near the time that a release will also be provided to a customer.

The best alternatives would be version numbering that is not done at compile time but later, or meaningful identifiers for the candidates. The latter would mean a completely new SW philosophy for any mature organization that has used version numbering as the primary means of identification for years, and some people will resist furiously. However, open-minded developers familiar with agile ways of working will understand that by avoiding compile-time version numbering we keep our CI process fluent.
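
As a sketch of the first alternative: the binaries carry only a commit identifier defined at build time, and the human-readable version number is assigned later, at release time, in a manifest - so no recompilation is needed. The file and key names are illustrative assumptions:

    # Sketch: assign version numbers at release time via a manifest that maps
    # a version to the commit identifier of an already-built candidate.
    import json
    import subprocess

    def candidate_id():
        # The VCS commit hash serves as the candidate identifier.
        return subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True).strip()

    def assign_version(commit, version, manifest_path="releases.json"):
        # Record the mapping; the binaries themselves are left untouched.
        try:
            with open(manifest_path) as f:
                manifest = json.load(f)
        except FileNotFoundError:
            manifest = {}
        manifest[version] = commit
        with open(manifest_path, "w") as f:
            json.dump(manifest, f, indent=2)

    if __name__ == "__main__":
        assign_version(candidate_id(), "2.1.0")  # example version number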

Friday 30 August 2013

The challenge of change

My previous topics - making things continuous, automating things and renewing processes - all require major changes to the existing way of working. Deploying a change is never easy. Most of us live happily in our comfort zone, which we fear a change may disrupt.

However, all people are not the same. We may find individuals that are eager to follow us, especially if we are properly prepared and can justify why the change is good. Getting some people to support the change can help us get the rest of the people to follow, too. But the change gets harder towards the end: the last laggards are the most difficult to convince, even after we have already transformed the great majority.

The heaviest resistance is provided by narcissists. These are people who think they know everything, are absolutely right about everything, and will never give up. Often these people represent some specific area of work and are impacted by the change in a negative way, i.e. through an increased workload. They push the change to follow their own interests without taking any responsibility for the consequences. They ignore the benefit of the company when their own comfort is threatened.

If these people are not crucial to our change, we are lucky; we may then just try to ignore them. In any case, we need to be well prepared so that the narcissists don't convert other people to support them. We need to have the facts right and the information shared, and to know that the thing we are trying to accomplish is the right one.

If the narcissists are blocking the change, we need to focus on showing what the value of the change is for the company, and what it costs the company that some individuals are blocking it. Then it is finally the task of higher management to decide and give support, or to postpone or reject the change.

But we mustn't forget that there is always also valuable resistance, which should be used as feedback to tune the change towards perfection. Thus, even in the presence of narcissists, we need to keep our posture and our openness towards the community, so that we are able to accomplish the change and do it right.

For a quick handbook on making a change, try reading How to Change the World; the PDF is available for just a couple of bucks!

Wednesday 21 August 2013

SW development of the industrial era

Think about a situation where you create a specification of a new system and, with the push of a button, get the implementation and a set of tests to validate it. Utopia? Not exactly, even though not yet reality.

The problem with SW development is that the code is something that typically only the developer(s) understand. SW engineering has three major problems:
1. Software is written in a language that is not commonly understood.
2. An engineer is usually an expert in some domain. The SW engineer is an exception, known more for expertise in a programming language than in a domain.
3. SW development is still a handicraft. It lacks the performance improvements many other areas of engineering have reached via industrialization.

To tackle the first problem, we would need a language that is understood by all parties involved in the development of a product. For the second, the SW engineer would be transformed into an interpreter between that language and the implementation. For the third, development would be done as configurable components, which could mostly be reused even if the description done in the common language changes.

Starting with the language problem, the solution would be modeling, because it provides an opportunity to make high-level descriptions of the components and their interactions. Typically, when talking about modeling of SW, people think about UML (Unified Modeling Language). But that's not the right option. The problem with UML is that it's supposed to be an all-purpose modeling language and is thus very complex. I've seen attempts to make specific limit-and-extend UML solutions, but it just won't work out. In addition, UML doesn't raise the level of abstraction; it's basically just another programming language.

A better option would be to create one's own domain-specific model (DSM) and a language for it (DSL). Creating a DSL is often expected to be complex, but it probably isn't - it really depends on your domain and how you want to represent it. There are tools to help in the creation. One of the problems to overcome is that we will most likely start the creation of the DSL with SW engineers who are experts in some programming language and will thus be out of their comfort zone when working with a DSL. So growing the SW engineers into the DSL world is necessary to tackle problem two. That might be the biggest change project when starting with DSM.

For problem three, we need to create a code and test generator for our DSL. That isn't as difficult as it first sounds, but ultimately it depends on what you are working with. If the application is mostly about interactions (e.g. a UI or some protocol), a DSL can work out. For mathematical algorithms, a DSL might not provide that much benefit, but might still help to understand and manage the entity.
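
As a toy illustration (the DSL format is invented for this example, and real DSM tools do much more), here is a Python sketch where a state machine is described as data, and both the implementation and a simple test are generated from it:

    # Toy DSL-to-code generator: the model describes a state machine, and
    # the generator emits both an implementation and a test for it.
    MODEL = {
        "name": "Door",
        "transitions": [            # (state, event, next state)
            ("closed", "open",  "opened"),
            ("opened", "close", "closed"),
        ],
    }

    def generate_code(model):
        # The source state of the first transition is used as the initial state.
        lines = ["class {}:".format(model["name"]),
                 "    def __init__(self):",
                 "        self.state = {!r}".format(model["transitions"][0][0]),
                 "    def handle(self, event):"]
        for state, event, nxt in model["transitions"]:
            lines.append("        if self.state == {!r} and event == {!r}:".format(state, event))
            lines.append("            self.state = {!r}; return".format(nxt))
        return "\n".join(lines)

    def generate_test(model):
        # Replay every modeled transition once and check the resulting state.
        lines = ["def test_{}():".format(model["name"].lower()),
                 "    m = {}()".format(model["name"])]
        for state, event, nxt in model["transitions"]:
            lines.append("    m.state = {!r}; m.handle({!r})".format(state, event))
            lines.append("    assert m.state == {!r}".format(nxt))
        return "\n".join(lines)

    if __name__ == "__main__":
        print(generate_code(MODEL))
        print(generate_test(MODEL))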

Working with DSMs requires specific tooling. There are several tools available, commercial and free, for doing modeling for code or tests. When starting modeling, it's important to select the tools carefully, because we might well be tied to them for the lifetime of the application. That's probably the biggest hurdle in taking the DSL approach into use.

As I mentioned in the beginning, a high-level model from which both code and tests are generated is not yet reality (AFAIK - but if you know of a case, please share it with us). If you wish to learn more, check for example what MetaCase has done in DSM tooling and Conformiq in the area of model-based testing.

Tuesday 13 August 2013

Automate... everything?

In the previous post, I encouraged making everything continuous. In order to do so, we need to automate a lot of things... but we can't automate everything. Automation suits activities that manual practice has shown to be monotonous, and it's beneficial for things that are done repeatedly.

Considering what I suggested making continuous: planning is always a heuristic exercise, neither monotonous nor repeatable. Development is mostly heuristic, but partly it's about common routines that are duplicated here and there. A basic solution for the repetitive parts is to form libraries of functions or macros.

Integration should be mostly repetitive and monotonous, thus it should be mostly automated. However, integration tasks are often lengthy, and therefore they should be automated as much as possible, with the attention of skillful people. It will pay off, because the alternative is that developers do those long tasks manually, leading to errors and wasted time. We shouldn't use dedicated integrators: they get bored with the work, and it's also a waste of resources. Another problem with dedicated integrators is that they know much less about the change to be integrated than the people who wrote the code, giving an opportunity for extra bugs to be inserted. Separate integrators would also add a handover to the development process, which is waste in lean principles. And when dedicated integrators are present, developers take much less responsibility for the changes they submit.

Testing is something we should automate a lot, because running tests is very repetitive and highly monotonous. However, not all tests are suitable for automation; tests that require complex procedures (and are therefore also seldom executed) might not be worth automating. Delivery and deployment should at minimum be very well documented, but preferably automated.

When automating things, it is beneficial to pick a framework for the automation; thereby we can greatly reduce the need for maintenance later on. When we eagerly start developing new tools and systems, it is often forgotten that those need to be maintained, too. However, the framework should be selected carefully, seeking a versatile, cost-effective alternative that is not overwhelmingly complex and heavy for the purpose. Prospects for further development of the system, possible vendor lock-in and available support should also be checked when making the selection.

Then, we could think about automating the whole development process. More about that next...

Sunday 11 August 2013

Continuous... everything!

The buzzword in modern SW development is continuous. Disruptive, big bang work is out of fashion. Or at least it should be.

Avoiding big bangs starts with continuous planning. We shouldn't have the content of our next big release planned a year in advance; instead, the planning should happen in smaller pieces, giving development new content in a timely manner for their short-term planning.

Traditionally (in embedded systems), SW development is done in big chunks. Typically, every chunk is a project with its own branch in version control, and a project manager. Development is done in silos, and fixes are copied by developers (or worse - integrators) to every project branch, just because every project manager cares about his/her own project. This kind of approach wastes resources and poses a major quality risk when changes are manually copied between branches.

A better way would be to focus efforts on one branch (usually called trunk or master/main code line), branching only for verifying that the code base possesses the level of quality that customers expect. This typically requires a short period of time for fixing the remaining bugs, provided that quality has constantly been kept at a good level already before branching.

This I would call continuous development. It should be usable in most cases. If there's a need to make several consecutive releases with remarkably different content, partially from the same code base, this approach might not work any better than the traditional one.

It's often tempting for developers to create their own or team branches and continue developing SW there for long periods, ending up in a big-bang integration after the development is "ready". The integration will take a lot of time and will face complications with other developers wishing to integrate their code. Development in one's own branch is often done to avoid the "bad quality" code written by other developers.

Continuous integration (CI) is an approach where developers commit their code frequently to the trunk, at least once a day. The submissions don't need to provide new functionality; the only requirement is that one doesn't break any existing functionality. To make CI work, there needs to be an engine that relentlessly and swiftly checks, builds and tests every submission, and naturally the system needs to be very fluent for the developers to use. Finally, there needs to be a fluent process for making daily releases of the latest changes, to be used by developers as the base for their submissions the next day. It's also useful to get a release for more extensive testing overnight. We need to have continuous testing instead of leaving testing to be done after the development.

So the checks, builds and tests need to be fast, and there needs to be enough capacity to verify all submissions. If capacity is limited, verification may be done for only every nth submission, but in that case finding a breaking submission from a set of n will lead to blocking further submissions until the bad submission is found and removed. If blocking is not done, removal of the bad SW might get very difficult and may even lead to discarding the whole stack of later submissions above it.
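
Finding the breaking submission from such a set is a classic bisection. Here is a small Python sketch, where build_and_test is an assumed callable that returns True for a good submission:

    # Sketch: bisect an untested range of submissions (ordered from just after
    # the last known-good one up to the first known-bad one) to find the
    # submission that broke the build.
    def find_breaking_submission(submissions, build_and_test):
        lo, hi = 0, len(submissions) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if build_and_test(submissions[mid]):
                lo = mid + 1   # breakage happened later in the range
            else:
                hi = mid       # this one, or an earlier one, broke it
        return submissions[lo]

With this, roughly log2(n) extra verification rounds are enough to locate the culprit in a set of n submissions.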

In many embedded systems, deployments can't be continuous. SW updates typically require a service break, and often also extensive testing to guarantee that the existing quality of service will be preserved - there is a fear that the new release breaks some of the features already in use (as testing in the lab never uncovers all the bugs). And that fear is just common sense, because the uptime of the systems often needs to be practically 100%, unlike many web services where a break of a few seconds every now and then might be acceptable. Remotely updating a huge number of devices around the world is also a case where the quality of the update must be guaranteed, because reputation may be lost for a long time with just one bad delivery. Deployment to the production of the devices could be attempted more often, in order to get the latest and greatest SW to new customers, but the concerns are the same as above.

But it's great if you are able to please your customers with continuous delivery of SW updates. Then they don't need to wait until the quarterly or yearly release project has got in all (or most) of the planned features, but have the opportunity to update when it suits them best.

For continuous delivery and deployment, we need to consider our support capacity and scheme. Distributing numerous SW releases requires resources from the support organization, as it would be expected to support all of them. We may try to deploy a scheme where the default action is to offer the latest SW as the solution, but if the deployments can't be done very often...

However, if we have a process in place for making good-quality deliveries continuously, we have the freedom to choose the moment when we think the content of our next major release is ready, instead of following a plan we might have made long before, where everything relies on a bug-fixing period done after the development is "ready".

If you wish to learn more about making things continuous, you could read the book Continuous Delivery, which gives a very deep insight into many of the aspects I've mentioned here.

Thursday 8 August 2013

Foreword

I started this blog to share what I've learned and experienced in SW development, just to document it and improve my own understanding. Many of these things have been said before, but I try to present the ideas here concisely to give a quick insight, and to give references for further reading on each topic. I hope you are already familiar with the basic concepts of modern SW development, like agile and lean.

Feel free to leave comments or questions on any of the topics; I'll try to give meaningful responses promptly. You may also request additional posts with deeper insight into a topic.