Tester Mindset: Stress

Sunday, November 3, 2013

Dealing with Stress take 2

If you are reading this post, there is a chance that you have already read my earlier blog post about Stress.

That post was transformed twice: first, I turned it into an article in Tea Time with Testers magazine, then it became a presentation at the QA&Test 2013 conference in Bilbao, Spain.

The conference was awesome. Hospitality was great. I had an opportunity to meet and share ideas with many cool testers from all over the world. I was also able to experience presenting at an international conference.

Besides the format change, from a blog post to an article and then to a presentation, the ideas themselves evolved as I got a lot of feedback while working on the material.

The main idea behind the work is the need to connect our Stress tests to the user’s needs. To do that, I suggest categorizing our Stress tests and failures into three main categories: Multiple experiments, Stability over time and Load.

While working on the presentation I added a few aspects which are connected to the failure classification and Stress test planning:

· Assessing risk when selecting risky flows for multiple experiments.

· Taking into account the impact of the product on the system stability.

· Finding good oracles beyond the official requirements when defining the load and stability targets.

· Performing load tests of a few types:

o The largest amount of data or actions that has meaning for the users.

o The full capacity of the product, in order to spot degradation in capacity before it has an impact on the users.

· Using a good logging mechanism to gather data on all the experiments that your stress test performs.

· Monitoring the system resources in order to find stability issues quickly.
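
The last two points, logging every experiment and monitoring resources, can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a recommended tool: it relies on Python's standard `logging` and `tracemalloc` modules, and `leaky_action`, `run_monitored` and `growth_per_iteration` are hypothetical names invented for this example. The "product" here is a function that deliberately leaks memory.

```python
import logging
import tracemalloc

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("stress")

_cache = []  # simulates a leak: grows on every call and is never cleared


def leaky_action():
    """Hypothetical action under test that keeps about 10 KB alive per call."""
    _cache.append(bytearray(10_000))


def run_monitored(action, iterations):
    """Run `action` repeatedly and log the traced memory after each iteration."""
    samples = []
    tracemalloc.start()
    try:
        for i in range(iterations):
            action()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
            log.info("iteration=%d traced_bytes=%d", i, current)
    finally:
        tracemalloc.stop()
    return samples


def growth_per_iteration(samples):
    """Average change in traced memory between consecutive samples."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


samples = run_monitored(leaky_action, 50)
print(f"average growth: {growth_per_iteration(samples):.0f} bytes/iteration")
```

A steady positive growth per iteration is a hint of a stability issue worth investigating, and the per-iteration log lines show when it started. For a real product you would sample the resources of the system under test rather than of the test script itself.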

Since I was scheduled to present on the last day of the conference, I had some time to get inspiration from a few people that I met during the first two days. The night before the presentation, I changed the summary slide from a list into a mind map that summarizes my take on the subject.

I am publishing the mind map and would like to ask you to review it and contribute to my initial work on it. I promise to give you credit if you provide meaningful input.


Sunday, December 23, 2012

Dealing with Stress

Stress failures and bug advocacy - looking at stress tests from a value perspective

Stress is part of your test strategy. You use it as a tool to test your product and find bugs. It is one of the “non-functional” test categories you run. Have you taken the time to think about what your stress tests are actually testing?

You may continue to test without answering this question, but when the time comes for bug advocacy, you have to defend your test strategy and findings, and this may force you to search for an answer.

What are we stressing for?

1) Statistical failure - stress increases the chance that a sporadic defect will appear, since it executes a flow many times.

2) Stability tests in a shorter time - stress speeds up the time factor. Within a short time, the failure reveals a defect that a system running under normal conditions (amount of data, number of simultaneous actions, etc.) would experience only after a longer run. A common example of such a failure is a memory leak found using the stress setup.

3) Load (sometimes defined as a category by itself) - we test how our system scales with multiple calls, large amounts of data or both. Here, the failure reveals the point at which the system fails to handle the load.

4) Any combination of 1, 2 or 3.

In an ideal scenario, when a stress-related defect is reported, it follows the path of debug, root cause and fix. But in many cases, we will need our bug advocacy skills in order to convince our stakeholders of the need to fix the defect.

A typical bug discussion can start like this:

Developer: “Stress of 4½ hours and 5MB data files is not a normal usage of our system. A typical use case takes 15 minutes and a smaller amount of data. We should reject this bug.”

This point in the discussion can reveal whether you did your homework or not.

To decide that the failure belongs to the first category (statistical), we need to decompose the stress test into a meaningful use case and run it over and over, bringing the system to a clean state between runs. Automation can be a big help here.

If we succeed in reproducing the failure under such conditions, our report will transform from a stress failure report into a use case failure report with a reproduction rate. When we have a sufficient statistical sample, the impact is clear.
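
As a rough sketch of what such automation could look like, the snippet below runs one use case many times from a clean state and reports the reproduction rate together with an approximate 95% Wilson score interval. Everything here is an assumption made for illustration: `reset_system` and `flaky_use_case` are hypothetical stand-ins for your own reset hook and use case, and the simulated defect simply fails on every 20th call.

```python
import math


def reproduction_rate(use_case, reset, runs):
    """Run `use_case` `runs` times from a clean state; return (failures, runs)."""
    failures = 0
    for _ in range(runs):
        reset()  # bring the system back to a clean state before every run
        try:
            use_case()
        except Exception:
            failures += 1
    return failures, runs


def wilson_interval(failures, runs, z=1.96):
    """Approximate 95% Wilson score interval for the true failure rate."""
    if runs == 0:
        return 0.0, 1.0
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


calls = {"n": 0}  # bookkeeping for the simulation only, not product state


def reset_system():
    """Hypothetical reset hook; a real one would restart or clean the product."""


def flaky_use_case():
    """Simulated sporadic defect: fails on every 20th call."""
    calls["n"] += 1
    if calls["n"] % 20 == 0:
        raise RuntimeError("sporadic failure")


failures, runs = reproduction_rate(flaky_use_case, reset_system, 200)
low, high = wilson_interval(failures, runs)
print(f"{failures}/{runs} runs failed, 95% interval {low:.1%} to {high:.1%}")
```

Reporting the interval next to the raw rate makes it easier to say whether the sample is large enough to support the claim in the bug report.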

Pinpointing whether the failure is related to time or to load is more complex, as we need to “play” with both factors in order to reach a conclusion about the amount of time, load or both that is needed to bring the system to a failure point. Awareness of the possible options is an important tool in bug advocacy. For example, it can broaden the stakeholders’ perspective when you are able to say that “we’re not sure yet, but it is possible that we will see the failure in normal conditions after a long period of time.”
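
One way to “play” with both factors is to run a small grid of load and duration combinations and look at where the failures fall. The sketch below is only an illustration: `simulated_system_fails` is a hypothetical stand-in for a real run (it pretends the system leaks in proportion to load and fails once a fixed amount has accumulated), and the load and duration values are assumptions, with 15 and 270 minutes borrowed from the developer's comment above.

```python
def simulated_system_fails(load, minutes):
    """Hypothetical stand-in for a real run: the system leaks `load` units
    per minute and fails once 600 units have accumulated."""
    return load * minutes >= 600


def explore(fails, loads, durations):
    """Run every (load, duration) combination and record whether it failed."""
    return {(load, d): fails(load, d) for load in loads for d in durations}


def classify(results, loads, durations):
    """Give a rough hint about which factor drives the failure."""
    normal_load, short_run = min(loads), min(durations)
    time_alone = any(results[(normal_load, d)] for d in durations)
    load_alone = any(results[(load, short_run)] for load in loads)
    if time_alone and load_alone:
        return "either factor alone is enough"
    if time_alone:
        return "time: fails at normal load after a long run"
    if load_alone:
        return "load: fails in a short run under high load"
    if any(results.values()):
        return "combination of time and load"
    return "no failure observed"


loads = [1, 5, 20]              # assumed: 1 is "normal" usage
durations = [15, 60, 270, 720]  # minutes
results = explore(simulated_system_fails, loads, durations)
print(classify(results, loads, durations))
```

With this simulated leak, the failure also shows up at normal load once the run is long enough, which is exactly the kind of finding that supports the “we may see it in normal conditions after a long period of time” argument. A real grid is expensive to run, so in practice you would pick a handful of points rather than every combination.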

Doing complete research before reporting the stress failure can consume a lot of resources and time, so I don’t suggest delaying the report until the tester has all of the answers. Many times, we can reach faster and better conclusions about the failure from a focused code review or a debug log analysis.

I would like to suggest the following: learn to classify your stress failures. When you see and report a stress failure, treat it as the start of the classification and investigation. While sometimes the report will be enough to call for a bug fix, many times it will serve as a call for investigation. During the investigation, make clear to stakeholders what you already know and what you don’t know yet. Make sure that new findings are updated in the bug, and don’t be afraid to change the title to reflect them.

There is much more to learn than the basics I summarized in this post. Learning more about stress in general and about your specific system can help you classify and investigate your stress failures and, no less important, plan your stress tests better.