Sunday, December 23, 2012

Dealing with Stress


Stress failures and bug advocacy - looking at stress tests from a value perspective

Stress is part of your test strategy. You use it as a tool to test your product and find bugs. It is one of the “non-functional” test categories you run. But have you devoted time to thinking about what your stress tests actually test?

You may continue to test without answering this question, but when the time comes for bug advocacy, you have to defend your test strategy and findings, and this may force you to search for an answer.

What are we stressing for?
1) Statistical failure – stress increases the chance of exposing a sporadic defect, since it executes a flow many times.
2) Run stability tests in a shorter time – stress speeds up the time factor: the failure reveals, in a short run, a defect that a system running under normal conditions (amount of data, number of simultaneous actions, etc.) would only experience after a much longer run. A common example of such a failure is a memory leak found using the stress setup (see the monitoring sketch after this list).
3) Load (sometimes defined as a category by itself) – when we test how our system scales with multiple calls, large amounts of data, or both. Here, the failure reveals the point at which the system can no longer handle the load.
4) Any combination of 1, 2 and 3.
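To make item 2 concrete, here is a minimal monitoring sketch in Python. It is an illustration, not a prescription: run_use_case() and SUT_PID are hypothetical placeholders for your own system under test, and the memory sampling uses the psutil library.

    # Minimal leak-trend sketch (assumes the psutil library is installed).
    # run_use_case() and SUT_PID are hypothetical placeholders.
    import psutil

    SUT_PID = 1234  # hypothetical: the PID of the system under test

    def run_use_case():
        """Hypothetical: drive one meaningful flow through the system
        (an API call, a file import, a transaction, etc.)."""
        pass

    proc = psutil.Process(SUT_PID)
    for i in range(1000):
        run_use_case()
        rss_mb = proc.memory_info().rss / (1024 * 1024)  # resident memory, MB
        print(f"iteration {i:4d}: RSS = {rss_mb:.1f} MB")

    # A steady upward trend in RSS across iterations hints at a leak that a
    # normal-length run would only expose much later.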

In a utopian scenario, a reported stress-related defect follows the path of debug, root cause and fix. But in many cases, we will need our bug advocacy skills in order to convince our stakeholders of the need to fix the defect.

A typical bug discussion can start like this:
Developer: “Stress of 4½ hours and 5MB data files is not normal usage of our system. A typical use case takes 15 minutes and a smaller amount of data. We should reject this bug.”
This point in the discussion can reveal whether you did your homework or not.
To decide whether the failure belongs to the first classification – statistical – we need to decompose the stress into a meaningful use case and run it over and over, bringing the system to a clean state between runs. Automation can be a big help here (see the sketch below).
If we succeed in reproducing the failure under such conditions, our report transforms from a stress failure report into a use case failure report with a reproduction rate. Once we have a sufficient statistical sample, the impact is clear.
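As an illustration, a repeat-and-reset harness along these lines can produce that reproduction rate. This is a sketch only; reset_system() and run_use_case() are hypothetical placeholders for whatever cleanup and use case fit your product:

    # Sketch: turn a stress failure into a use case failure with a
    # reproduction rate. Both helpers are hypothetical placeholders.

    def reset_system():
        """Bring the system back to a clean state between runs
        (restart the service, clear caches and data, etc.)."""
        pass

    def run_use_case():
        """Execute one decomposed, meaningful use case.
        Return True on success, False on the suspected failure."""
        return True

    RUNS = 500
    failures = 0
    for _ in range(RUNS):
        reset_system()
        if not run_use_case():
            failures += 1

    print(f"Reproduction rate: {failures}/{RUNS} "
          f"({100.0 * failures / RUNS:.1f}%)")

Reporting the rate alongside the decomposed use case is what turns the conversation from “not normal usage” into a discussion of impact.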

Pinpointing whether the failure is related to time or to load is more complex, as we need to “play” with both factors in order to reach a conclusion about how much time, load, or both is needed to bring the system to its failure point. Awareness of the possible options is an important tool in bug advocacy. For example, it can widen stakeholders’ perspective when you are able to say that “we’re not sure yet, but it is possible that we will see the failure in normal conditions after a long period of time.”
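One way to “play” with both factors is a simple sweep over durations and load levels, looking for the boundary where failures begin. Again a hedged sketch: run_stress() is a hypothetical placeholder for your own stress driver, and the specific values are arbitrary examples:

    # Sketch: sweep duration and load to locate the failure point.
    # run_stress() is a hypothetical placeholder for your stress driver.
    import itertools

    def run_stress(minutes, clients):
        """Drive the system with `clients` concurrent callers for
        `minutes` minutes; return True if the system survived."""
        return True

    durations = [15, 60, 240]   # minutes; arbitrary example values
    loads = [1, 10, 100]        # concurrent clients; arbitrary values

    for minutes, clients in itertools.product(durations, loads):
        ok = run_stress(minutes, clients)
        print(f"{minutes:4d} min x {clients:4d} clients -> "
              f"{'OK' if ok else 'FAILED'}")

If failures cluster along the duration axis regardless of load, you are likely looking at a time-driven defect; if they cluster along the load axis, a scaling one.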
Doing complete research before reporting the stress failure can consume a lot of resources and time, so I don’t suggest delaying the report until the tester has all of the answers. Many times, we can reach faster and better conclusions about the failure through a focused code review or a debug log analysis.

I would like to suggest the following: learn to classify your stress failures. When you see and report a stress failure, treat it as the start of classification and investigation. While sometimes the report will be enough to call for a bug fix, many times it will serve as a call for investigation. During the investigation, make clear to stakeholders what you already know and what you don’t know yet. Make sure that new findings are updated in the bug, and don’t be afraid to change its title to reflect them.

There is much more to learn than the basics I summarized in this post. Learning more about stress in general, and about your specific system, can help you classify and investigate your stress failures and, no less important, plan your stress tests better.

Thursday, February 23, 2012

DOWNTIME NOTIFICATION

If you will it, it is no dream.
One day, the test engineers in the Testing lab came to work and went through their inboxes. Among the mail was a short message from the Testing lab manager:
To: All testers
Subject: QC DOWNTIME NOTIFICATION: unlimited period
Dear testers, in order to try new ways of working, our test management system, Quality Center (AKA “QC”), will be down until further notice. There is no change in your task definitions, roles and responsibilities.
P.S. I will be out of office on vacation for the next two weeks.
At first glance the testers were shocked. They re-read the message and moved their eyes to its date, but it wasn’t April 1st. They looked at the Hebrew calendar, but the month of Adar hadn’t started yet. As seasoned testers, they didn’t take the message at face value and tried to enter the Quality Center site, without success.
The coffee corner was crowded, filled with the continuous hum of testers and team managers shaking their heads from side to side in disbelief. Managers tried to reach the test lab manager via cell phone, SMS, email, Facebook, and Twitter, but got no response. They approached the manager’s manager, who told them: “Please talk with your test lab manager. I believe that it is important, but in order to change his decision you must contact him. Till then I will back his decision and won’t interfere.”
The most concerned was the project testing leadership team. They realized that they would not be able to query progress indicators and track whether they had reached coverage expectations for each work week: ww25 10%; ww26 20%; ww27 40%; ww28 60%; ww29 80%; ww30 90% and so on.
The Security team manager gave his team clear instructions: “Perform any legal or semi-legal operation to hack and bring up the system.” But even the efforts of the best white-hat hacker team in the industry were not enough to bring up a system that was shut down, its power unplugged, behind a locked door.
Team leaders sat in the labs, perplexed. Some asked their team members to perform bug verifications and handle other minor issues, but had no idea what to do next. Then one of the leads remembered: Shmuel, from the Sustaining product test team, had once told him that he did not work with QC. Since Shmuel still managed to get his testing done, they could ask him for advice.
An expedition of several people approached Shmuel. The following dialogue took place:
Crowd: “Please help us. We need to run our cycle, but QC is down. How can we do it? Do we need some kind of magic?”
Shmuel: "Does QC run the cycle for you?"
Crowd: “No! But how will we know what tests to run?”
Shmuel: "Oh, knowing what tests to run is important, as we will want to look for information about what we don't know, and get a feeling for areas we consider at risk. Is that what you want from your tests too?"
Crowd: “Yes”
Shmuel: "So let's see: How do you decide what tests to run?"
Sara: "I go to QC and pull the list of tests that are not 'done'."
Shmuel: "We said we are looking for risk and new info – is that what you have in the not-’done’ list?"
Herzl: "Not exactly, it is a list of tests for old info."
Shmuel: "But your team does find new bugs using them, Herzl, how do you do that?"
Herzl: “I guess we find them because when we run the list of tests, we somehow know where things can show problems.”
Josh: “During test execution sometimes I see something behaves suspiciously. I investigate it and find a bug.”
Shmuel: "Josh, this is what happens in my team, too! But you said 'somehow', Herzl... how do you think you knew?"
Herzl: "I don't know... we've seen similar problems in the past?"
Shmuel: "Ah, so you use your experience as part of the secret. Do you have seasoned testers in your team?"
Herzl: “Yes."
Sara: "We also have new testers, they sometimes tests things we hadn't written or thought of."
Shmuel: "It looks like we let our teams think about risk and new information (what we said we were looking for) everyday! Is that so? Sara?"
Sara: "Well, yes... in a sense... for bugs."
Shmuel: "So what do you need QC for?"
Herzl: "How will they start if we don't give them a cycle?"
Shmuel: "How about you just tell them to start? You can ask them to list what they want to test without another list distracting them.”
Herzl: “If you will it, it is no dream.”
Sara: “How would we make sure that teams distribute the work correctly and focus on the risk areas?”
Shmuel: “You do talk with your team members, don’t you?”
Sara (a bit offended): “Sure, constantly!”
Shmuel: “When talking with your team you learn about what they tested, so you can take the opportunity to discuss focus areas and distribute the work.”
Sara: “I guess this can be more meaningful than when I just hand them the test cycle and ask about progress.”
Josh: “Without a list of test cases, how do you make sure that you will not forget to check important things or some details? Our product is pretty big and you can easily forget a part.”
Sara: “I agree with Josh and want to add that sometimes we do find a bug when a test case fails.”
Shmuel: “Sara and Josh, this is a good question, and a hard one to answer. It seems there’s an assumption that the central test list will solve the problem for you.
Solving this problem is the testers’ work, and they have a wealth of tools for it – the test case repository is one of them. Renewing the test case list is another, which can be done by interviewing managers and programmers. Now you don’t have the central repository, but you can organize your own checklists and areas of coverage.”
Herzl: “Sounds great, but are you able to report coverage and test results without having numbers of test cases and pass/fail results?”
Shmuel: “Let’s take a closer look at these results and see if we’ll miss them. Are found problems lost, unless there’s a failed test? We use a bug repository to log and study them, which is reviewed by the project managers.”
Herzl: “But without the pass results, do you suggest providing our message about coverage without supporting numbers?”
Shmuel: “Not at all! If you have supporting numbers, you should provide them. But in the case of your cycle... if the receiver does not know what the tests are, counting them provides little support.”
Sara: “Well, all they ask is whether we will complete our work in time. And I think we will.“
The test team leaders started to disperse, each to his own team, to discuss the work without the test management database. Within a few hours the whole testing department was back at work. Issues were raised, bugs were opened, and short, meaningful coverage reports were provided.
Two weeks later, a short message appeared in the testers’ inboxes:
To: All testers
Subject: COMPLETED: QC DOWNTIME NOTIFICATION:  server is up again
No one bothered to read the rest of the message.
Special thanks to Shmuel Gershon for participating as a guest author in this post.
