Saturday, September 13, 2008

Learn from your QA staff

In modern times, it is common practice for developers to write unit tests with their code. This is a great thing. But developers aren't practiced at writing tests, nor do they have the base of testing knowledge that your QA team has.

Why not start learning from your QA team and getting them to share their subject matter expertise? Your QA folks have many years of experience in how to find bugs in software.

Just for fun (to prove this to yourself), the next time you bump into one of your testers in the hallway, ask them to list the standard / rule-of-thumb use-cases for boundary condition testing. I bet they come up with a few you didn't think about. They know a lot of cool stuff like that, stuff that you need to know to write effective unit tests.
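To make the point concrete, here's a minimal sketch of that boundary checklist applied to a hypothetical paginate function (the function, its name, and its limits are invented for illustration): empty input, a single item, exactly at the limit, one under, one over, and invalid input.

    import unittest

    def paginate(items, page_size):
        """Hypothetical function under test: split items into
        pages of at most page_size each."""
        if page_size < 1:
            raise ValueError("page_size must be >= 1")
        return [items[i:i + page_size]
                for i in range(0, len(items), page_size)]

    class BoundaryTests(unittest.TestCase):
        # The rule-of-thumb cases a tester will rattle off: empty,
        # one, exactly at the limit, one under, one over, invalid.
        def test_empty_input(self):
            self.assertEqual(paginate([], 10), [])

        def test_single_item(self):
            self.assertEqual(paginate([1], 10), [[1]])

        def test_exactly_at_the_limit(self):
            self.assertEqual(len(paginate(list(range(10)), 10)), 1)

        def test_one_under_the_limit(self):
            self.assertEqual(len(paginate(list(range(9)), 10)), 1)

        def test_one_over_the_limit(self):
            self.assertEqual(len(paginate(list(range(11)), 10)), 2)

        def test_invalid_page_size(self):
            with self.assertRaises(ValueError):
                paginate([1, 2, 3], 0)

    if __name__ == "__main__":
        unittest.main()

Notice how many of these cases exist for one tiny function. That checklist instinct is exactly the expertise your testers carry around.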

There is a lot of good that comes from this.

The developers will start writing better unit tests. That means they find more of their own bugs, the ones that are easy for QA to find. This reduces cycle time and makes better use of the QA folks' time.

The test team will feel good and valuable and loved because you are highlighting the fact that they have knowledge to share and you want to learn from them. This is not only a morale boost, but a culture shift as well.

Having more interaction between the test team and the dev team, outside of the "your app sucks" and "why didn't you find this bug" meetings, will enhance that relationship, making for better and smoother interactions on projects, which in turn makes for a better delivered product.

Don't give your crap to your testers

If you're a developer with any kind of honor or pride whatsoever, your goal is to have no bugs. And while all software has bugs, the bugs in your code that QA does find should be so arcane and edge case-y that everyone is impressed those bugs were able to be found.

Said differently, let's say you drop your car off at your favorite shop for a tune-up. How would you feel if your car wouldn't start when you went to pick it up, as if the tune-up had made your car worse and the mechanics didn't notice?

You'd be the indignant kind of mad. You'd think that those mechanics had no sense of pride in their work and didn't care about you the customer.

So how do you think it reflects on you when you give something to QA and in the first 5 minutes, running through the basic smoke tests, the app fails? Do you like looking like a dolt? Don't you want to be able to find the bugs in your code before they do? Sure you do, because you have pride in your work.

Test your stuff before you give it over to QA. Don't waste their time and yours by giving them stuff that fails on the easy tests. Test their mettle. Make them stretch to find the good bugs because you've already squashed the easy ones.

Monday, September 8, 2008

Not me, not me

I heard from a few friends of mine that I'm wrong about software owners choosing to ship untested software, and I thought I'd drop a quick note about that.

Does QA actually release the software themselves when they feel it is ready?

If not, then whoever owns the software and decides to ship is overriding the QA group.

It happens when QA's schedule is nickel-and-dimed up front. When the QA schedule at the end of the project is compressed because dev teams slipped. When the software is released with known bugs. When the QA team isn't given an equal voice and influence at the triage table. When time for building automated test suites is cut.

The only time QA is really in control is when they have total go/no-go authority _AND_ they aren't being pressured by the manager one level up who is vested in the date.

Mind you, I don't think the QA group should be given total go/no-go authority. That is a business decision that needs to take a lot of factors into account. I wish I could say that it was only about "quality" but it isn't. Besides, what is "quality" anyway, and whose requirements define the quality of the software product? Is the QA team incented to follow the business goals (which always include quality) or to let no bugs get to production? The two are very different.

I just bring this up because the conversation gets easier when we own up. You own the software, you decide when to release. If you can accept this fact it clears up a lot of other confusing issues.

I own the software and am responsible for it working or not working.

I own the test rigs and test environments since those things are critical to my knowing that my software is going to work. Just as I use testing to verify the work of the developers, I use the developers to verify the work of QA. I leverage the QA team to develop some components of that test system along with my dev team, but I am still responsible for the test team's output even though they report up through a different chain. They are essentially loaned to me as part of my team for their expertise. All of the test environment code is reviewed by someone from the dev team, since we may have to modify it, fix it, or extend it while they are doing other important work, just like inside the dev team.

If you can own up to this state of mind, your interactions with your QA team _will_ change, as now you are working together on your own thing -- the thing that you have a vested interest in.

Who does what? (pt 1)

Up until this year, our QA "team" owned and operated the test environments. We've been running the same general software stack for about 5 years, so large portions of the QA space were also 5 years old and had been maintained by 3 different SDET's over time. This year we had a pretty big storm in the form of:

  • a new SDET, new to the company and our team,
  • a whole lot of software written and ready to test all at once,
  • a large-scale host lease return / migration effort, and
  • a lot of "hard-coded", by-hand local changes in temporary files in our QA-maintained regression environments.

That last part was the real killer because as the hosts were swapped, the regression environments stopped working and the new SDET didn't know enough about how they worked to fix them. Meanwhile, we couldn't test the gobs and gobs of software, and a number of projects were poised to start slipping day for day waiting on our singular new QA resource to migrate hosts and fix the regression environments.

As SDE's were starting to idle, we had a few options:

  • Shelve the new software and move on to other new software while waiting on QA.
  • Build the new test environments for the new software ourselves.
  • Help QA fix the broken environments.

We chose to have SDE's both pitch in to help fix the broken environments and start building the new automated test environments for the software just written. This was a real eye opener, and not in a good way.

We found a lot of stuff that is considered poor software development, ranging from full test harnesses cut-and-paste cloned six times, to lots of stuff, like code and config, that was never checked in.

The SDET's who had been serially writing and maintaining our test environments over time had some strikes against them:

  • They didn't get to spend much time writing software and as such their software development skills atrophied.
  • They didn't religiously check in code and get code reviews (this is Amazon policy for SDE's).
  • They didn't allot time in their estimates to do these things.
  • They considered the work complete without these tasks being done.

All of these "transgressions" are the exact same ones that less experienced software teams make, and it was sad to see them from a QA team surrounded by SDE's who weren't allowed those lapses.

At the end of our intervention we had fully deployable regression environments that the SDE's understood and could extend. Subsequent host migrations were trivial and the SDE's started down a path to centralize, leverage, and extend the automated test rigs producing some really interesting improvements in test case generation and execution.

There was an additional feature add here: once the SDE's saw first hand how hard their software was to automate, they went back and changed the software to be more easily testable. In one recent example, by changing an app to take a file of inputs for testing, instead of feeding those inputs through another team's test domain publishing software, they got the test runs down from 6 hours to 20 minutes AND let all the SDE's run their own mini-stacks simultaneously. That was a huge win and a big lesson.
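The shape of that change is worth sketching. The names below are hypothetical (the post doesn't describe the actual app), but the idea is that once the app can be driven from any iterable of records, a test run needs nothing more than a local file:

    import json

    def process(record):
        """Stand-in for the real app's business logic."""
        return {"id": record["id"], "status": "processed"}

    def run(input_source):
        """Drive the app from any iterable of records,
        regardless of where those records came from."""
        return [process(record) for record in input_source]

    def records_from_file(path):
        """Test-friendly input source: one JSON record per line
        in a plain local file, instead of a live upstream feed."""
        with open(path) as f:
            for line in f:
                yield json.loads(line)

    # In production the iterable comes from the upstream feed; in a
    # test run it's just a file checked in next to the test cases:
    #   results = run(records_from_file("testdata/inputs.jsonl"))

Because the file needs no external system, every developer can exercise the whole input set on their own machine, which is where the simultaneous mini-stack win comes from.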

The first moral of this story is that we should have been holding the SDET's up to the same standards as we held ourselves. They were building stuff that would ensure our software was good, yet we didn't code review that work nor make sure it was checked in, as we would if we were borrowing developers from other teams. We should have considered the test environments to be ours as well, even if we didn't build them nor extend them.

(continued)

Use QA wisely

I've heard stories that some groups in Microsoft have a Dev-to-QA ratio of 1:2 (that's 2 QA for every Dev). On my team at Amazon that ratio is more like 7:1 (7 Dev for 1 QA). 7 SDE's can write a lot more software than 1 SDET can test, and as such we have to prioritize quite a bit to make the best use of our SDET's time.

Traditionally, our QA spent a lot of time running regression tests. Mind you, our software doesn't have a UI, so the tests are all about setting state, applying data to the application, and verifying state.
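Every regression case therefore has the same three-beat shape. A minimal sketch, with a made-up Inventory service standing in for the real application:

    class Inventory:
        """Made-up stand-in for the real service; the regression
        cases only care about its externally visible state."""
        def __init__(self):
            self.counts = {}

        def receive(self, sku, qty):
            self.counts[sku] = self.counts.get(sku, 0) + qty

        def ship(self, sku, qty):
            if self.counts.get(sku, 0) < qty:
                raise ValueError("insufficient stock")
            self.counts[sku] -= qty

    def test_ship_reduces_stock():
        app = Inventory()
        app.receive("ABC-123", 5)          # set state
        app.ship("ABC-123", 2)             # apply data to the application
        assert app.counts["ABC-123"] == 3  # verify state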

Having QA run those by hand wasn't such a good use of time. Computers are really good at monotonous repetitive tasks, and tying up QA manually running regression tests was an egregious waste.

The first step was to automate the regression tests so that QA could kick them off and then work on other stuff. The next step was to fully automate the regression suite so that it could be run by anyone, not just QA -- like the Dev team. The third step was to simplify the machinery of the regression suite so that each developer could run it locally, as part of their development effort.
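One way to picture the end state of that third step: the whole suite collapses into a single script any SDE can run on their own box. This is a sketch under assumptions (a hypothetical regression_tests package full of test_* functions), not our actual rig:

    #!/usr/bin/env python
    """Sketch of a fully self-serve regression runner: no shared
    hosts, no QA-only setup, just a script every SDE can run."""
    import importlib
    import pkgutil
    import sys

    def run_suite(package_name="regression_tests"):
        failures = []
        package = importlib.import_module(package_name)
        # Discover every module in the suite and run its test_* functions.
        for mod_info in pkgutil.iter_modules(package.__path__):
            module = importlib.import_module(package_name + "." + mod_info.name)
            for name in dir(module):
                test = getattr(module, name)
                if name.startswith("test_") and callable(test):
                    try:
                        test()
                    except Exception as exc:
                        failures.append((mod_info.name, name, exc))
        return failures

    if __name__ == "__main__":
        failed = run_suite()
        for module, test, exc in failed:
            print("FAIL %s.%s: %s" % (module, test, exc))
        sys.exit(1 if failed else 0)

In practice you'd reach for an off-the-shelf runner like unittest; the point is only that the suite has no dependency a developer's machine doesn't already have.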

At this point, the SDE's are able to drop software into the regression suite before they even check it in!!! New test cases to go along with the new features are added to the automated test suite during development, as a way to test the software being built, as it is being built. The automated test suite can then easily be run after the code is merged to mainline, and then again after we merge the code up to the release branch. Our QA never has to get involved in this cycle, which is great because there are more valuable places for them to contribute.

QA now has a lot more time to bring their expertise to bear: finding mean and devious ways to break the software and expose the bugs.

The other benefit is that when the automated tests are easy to run, the developers will run them all the time and catch the easy bugs before QA gets a turn. That further reduces cycle time and raises the bar on the QA team to find the really good bugs.

Wednesday, September 3, 2008

Your QA team is your friend

At Amazon we work in the total ownership model. Each team is fully responsible for the software and services we write. We own, develop, maintain, and operate.

I like this model for a number of reasons but when it comes to QA it makes the situation very clear. The software is mine and I am responsible for shipping quality software that is not only good, but also doesn't page me in the middle of the night.

The software runs better in production when it has been well tested. Testing has to be done before the software ships. That is work that has to be done. Facets of that work are better done by folks classically trained in QA. Other parts of that work are better achieved by software engineers. Regardless of who gets the work done, it has to be done, and done well.

If your QA group is competent then they will find plenty of bugs if you give them tools and time. The more useful tools and APIs you give them, the more bugs they will find in less time. If you don't want to give them enough time to do their job then you get what you paid for, just like when you compress a software engineer's schedule so much that you get some clunky rickety thing that barely works.

In the Amazon model, if I ship bad software then _I_ shipped bad software. QA doesn't ship bad software because they don't ship software. They advise me on the state of my software. Sometimes, time to market pressure requires software to be released with known non-critical flaws that will be fixed in a follow up release. Most times, the date is slipped and the bugs are fixed.

In the case where QA gives the software the green light and it then fails terribly in production, you have really clear and specific things to discuss with the QA team and clear and specific opportunities to adjust the process on the next iteration -- which is usually pretty soon, given that you have to fix all the bugs you just launched with.

(more on this soon)