Up until this year, our QA "team" owned and operated the test environments. We've been running the same general software stack for about 5 years, so large portions of the QA space were also 5 years old and had been maintained by 3 different SDETs over time. This year we had a pretty big storm in the form of:
- a new SDET, new to both the company and our team,
- a whole lot of software being written and ready to test all at once,
- a large-scale host lease return / migration effort happening at the same time,
- a lot of "hard-coded", by-hand local changes in temporary files in our QA-maintained regression environments.
That last part was the real killer: as the hosts were swapped, the regression environments stopped working, and the new SDET didn't understand them well enough to fix them. Meanwhile, we couldn't test the gobs and gobs of software, and a number of projects were poised to start slipping day for day while waiting on our single new QA resource to migrate hosts and fix the regression environments.
As SDEs were starting to idle, we had a few options:
- Shelve the new software and move on to other new software while waiting on QA.
- Build the new test environments for the new software ourselves.
- Help QA fix the broken environments.
We chose both of the last two: SDEs pitched in to help fix the broken environments and also started building the new automated test environments for the newly written software. This was a real eye-opener, and not in a good way.
We found a lot of things that are considered poor software development practice, ranging from full test harnesses cloned six times by cut-and-paste to code and config that had never been checked in.
The SDETs who had been serially writing and maintaining our test environments over time had some strikes against them:
- They didn't get to spend much time writing software, and as such their software development skills atrophied.
- They didn't religiously check in code and get code reviews (this is Amazon policy for SDEs).
- They didn't allot time in their estimates to do these things.
- They considered the work complete without these tasks being done.
All of these "transgressions" are the exact same ones that less experienced software teams make, and it was sad to see them when the QA team was surrounded by SDEs who weren't allowed those oversights.
At the end of our intervention we had fully deployable regression environments that the SDEs understood and could extend. Subsequent host migrations were trivial, and the SDEs started down a path to centralize, leverage, and extend the automated test rigs, producing some really interesting improvements in test case generation and execution.
There was an additional benefit here: once the SDEs saw firsthand how hard their software was to automate, they went back and changed the software to be more easily testable. In one recent example, by changing an app to take a file of inputs for testing, instead of feeding those inputs through another team's test domain publishing software, they were able to get the test runs down from 6 hours to 20 minutes and allow all the SDEs to run their own mini-stack simultaneously. That was a huge win and a big lesson.
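To make the file-of-inputs idea concrete, here is a minimal sketch of what such a change could look like. Everything in it is an assumption for illustration only: the `--input-file` flag, the JSON Lines format, the record fields, and the `inputs_from_publisher` placeholder are hypothetical and are not taken from the actual app.

```python
import argparse
import json
from pathlib import Path
from typing import Iterable, Iterator, List, Optional


def inputs_from_file(path: Path) -> Iterator[dict]:
    """Yield one input record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def inputs_from_publisher() -> Iterator[dict]:
    """Hypothetical placeholder for the slow path through the external
    publishing system; the real feed would be wired up here."""
    raise NotImplementedError("external publishing feed is not part of this sketch")


def process(records: Iterable[dict]) -> List[str]:
    """Stand-in for the application logic under test (assumed record shape)."""
    return [f"{record['id']}:{record['value'] * 2}" for record in records]


def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(description="Run the app on one batch of inputs.")
    # Hypothetical flag: when given, inputs come from a local file instead of
    # the shared publishing system, so each developer can run in isolation.
    parser.add_argument("--input-file", type=Path, default=None)
    args = parser.parse_args(argv)

    if args.input_file is not None:
        source = inputs_from_file(args.input_file)
    else:
        source = inputs_from_publisher()
    return process(source)
```

With something along these lines, a test run only needs a local file, for example `main(["--input-file", "inputs.jsonl"])`, which is what lets every SDE run a private mini-stack without queueing behind a shared dependency.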
The first moral of this story is that we should have been holding the SDETs to the same standards we held ourselves to. They were building the things that would ensure our software was good, yet we didn't code review that work or make sure it was checked in, as we would have if we had been borrowing developers from other teams. We should have considered the test environments to be ours as well, even though we didn't build or extend them.