Sunday, December 23, 2012

Dealing with Stress


Stress failures and bug advocacy - looking at stress tests from a value perspective

Stress is part of your test strategy. You use it as a tool to test your product and find bugs. It is one of the “non-functional” test categories you run. But have you taken the time to think about what your stress tests actually test?

You may continue to test without answering this question, but when the time comes for bug advocacy, you have to defend your test strategy and findings, and this may force you to search for an answer.

What are we stressing for?
1) Statistical failure – stress increases the chance that a sporadic defect appears, since it executes a flow many times.
2) Stability testing in a shorter time – stress compresses the time factor: within a short run, it reveals a defect that a system running in normal conditions (amount of data, number of simultaneous actions, etc.) would only experience after a much longer run. A common example of such a failure is a memory leak found using the stress setup.
3) Load (sometimes defined as a category by itself) – we test how our system scales with multiple calls, large amounts of data or both. Here, the failure reveals the point at which the system fails to handle the load.
4) Any combination of 1, 2 or 3.

In an ideal scenario, when a stress-related defect is reported, it follows the path of debugging, root cause analysis and fix. But in many cases, we will need our bug advocacy skills in order to convince our stakeholders of the need to fix the defect.

A typical bug discussion can start like this:
Developer: “Stress of 4½ hours and 5MB data files is not a normal usage of our system. A typical use case takes 15 minutes and a smaller amount of data. We should reject this bug.”
This point in the discussion can reveal whether you did your homework or not.
To decide that the failure belongs to the first classification (statistical), we need to decompose the stress into a meaningful use case and run it over and over, bringing the system back to a clean state between runs. Automation can be a big help here.
If we succeed in reproducing the failure under such conditions, our report will transform from a stress failure report into a use case failure report with a reproduction rate. When we have a sufficient statistical sample, the impact is clear.
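The clean-state loop described above can be sketched in a few lines of Python. This is only an illustration: `reproduction_rate`, `FakeSystem`, `reset` and `transfer_file` are hypothetical names, and the fake system simply fails on every fourth call so that the example can run on its own. In a real setup, `reset` would restore your product to a known state and `use_case` would drive one real flow.

```python
from typing import Callable


def reproduction_rate(use_case: Callable[[], bool],
                      reset: Callable[[], None],
                      runs: int) -> float:
    """Run one use case `runs` times, each from a clean state.

    `use_case` returns True when the flow passes. The result is the
    fraction of runs that failed.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        reset()  # bring the system back to a clean state before each run
        if not use_case():
            failures += 1
    return failures / runs


class FakeSystem:
    """Hypothetical stand-in for the product: fails on every 4th call."""

    def __init__(self) -> None:
        self.calls = 0
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1

    def transfer_file(self) -> bool:
        self.calls += 1
        return self.calls % 4 != 0


if __name__ == "__main__":
    system = FakeSystem()
    rate = reproduction_rate(system.transfer_file, system.reset, runs=200)
    print(f"Failed in {rate:.1%} of 200 clean-state runs")
```

A figure like this turns "it failed under stress" into "this use case fails in roughly one of every N clean runs", which is much harder to reject as unrealistic usage.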

Pinpointing whether the failure is related to time or to load is more complex, as we need to “play” with both factors in order to reach a conclusion about the amount of time, load or both that is needed to bring the system to a failure point. Awareness of the possible options is an important tool in bug advocacy. For example, it can broaden the stakeholders’ perspective when you are able to say that “we’re not sure yet, but it is possible that we will see the failure in normal conditions after a long period of time.”
Doing complete research before reporting the stress failure can consume a lot of resources and time, so I don’t suggest delaying the report until the tester has all of the answers. Many times, we can reach faster and better conclusions about the failure from a focused code review or a debug log analysis.
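One possible way to “play” with time and load is a small sweep over both factors, followed by a rough first guess at which one drives the failure. The sketch below is an assumption of mine rather than a prescribed method: `sweep`, `classify` and the `run(load, minutes)` callback are hypothetical names, and the guess is only as good as the range of values tried.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# (load, minutes) -> True if that run passed
Results = Dict[Tuple[int, int], bool]


def sweep(run: Callable[[int, int], bool],
          loads: List[int], durations: List[int]) -> Results:
    """Run every (load, duration) combination once and record pass/fail."""
    return {(load, minutes): run(load, minutes)
            for load, minutes in product(loads, durations)}


def classify(results: Results) -> str:
    """Rough first guess at which factor drives the failure."""
    lightest = min(load for load, _ in results)
    shortest = min(minutes for _, minutes in results)
    # Fails at the lightest load once the run is long enough: time factor.
    time_alone = any(not ok for (load, _), ok in results.items()
                     if load == lightest)
    # Fails in the shortest run once the load is high enough: load factor.
    load_alone = any(not ok for (_, minutes), ok in results.items()
                     if minutes == shortest)
    if time_alone and load_alone:
        return "time and load, each on its own"
    if time_alone:
        return "time"
    if load_alone:
        return "load"
    if not all(results.values()):
        return "combination of time and load"
    return "no failure observed"


if __name__ == "__main__":
    # Hypothetical system that breaks once load * minutes reaches 1000.
    def fake_run(load: int, minutes: int) -> bool:
        return load * minutes < 1000

    print(classify(sweep(fake_run, [1, 10, 100], [1, 10, 100])))
```

A result of "combination" or "no failure observed" does not rule out a failure in normal conditions after a longer run than the sweep covered, so it is worth stating the tested range in the bug report.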

I would like to suggest the following: learn to classify your stress failures. When you see and report a stress failure, treat it as the start of the classification and investigation. While sometimes the report will be enough to call for a bug fix, many times it will serve as a call for investigation. During the investigation, make clear to stakeholders what you already know and what you don’t know yet. Make sure that new findings are added to the bug, and don’t be afraid to change the title to reflect them.

There is much more to learn than the basics I summarized in this post. Learning more about stress in general and about your specific system can help you classify and investigate your stress failures and, no less important, plan your stress tests better.

Thursday, February 23, 2012

DOWNTIME NOTIFICATION

If you will it, it is no dream.
One day, the test engineers in the Testing lab came to work and went through their inbox. Among the mail was a short message from the Testing lab manager:
To: All testers
Subject: QC DOWNTIME NOTIFICATION: unlimited period
Dear testers, in order to try new ways of work, our test management system, Quality Center (AKA “QC”), will be down until further notice. There is no change in your task definitions, roles and responsibilities.
P.S. I will be out of office on vacation for the next two weeks.
At first glance, the testers were shocked. They re-read the message and checked its date, but it wasn’t April 1st. They looked at the Hebrew calendar, but the month of Adar hadn’t started yet. As seasoned testers, they didn’t take the message as fact and tried to enter the Quality Center site, without success.
The coffee corner was crowded and humming with testers and team managers shaking their heads in disbelief. Managers tried to reach the test lab manager via cell phone, SMS, email, Facebook and Twitter, but got no response. They approached the manager’s manager, who told them: “Please talk with your test lab manager. I believe that it is important, but in order to change his decision you must contact him. Till then I will back his decision and won’t interfere.”
The most concerned was the project testing leadership team. They realized that they would not be able to query progress indicators and track whether they had reached coverage expectations for each work week: ww25 10%; ww26 20%; ww27 40%; ww28 60%; ww29 80%; ww30 90% and so on.
The Security team manager gave his team clear instructions: “Perform any legal or semi-legal operation to hack and bring up the system”. But even the efforts of the best white hat hacking team in the industry were not enough to bring up a system that was shut down, with its power unplugged, behind a locked door.
Team leaders sat in the labs perplexed. Some asked their team members to perform some bug verifications and handle other minor issues, but had no idea what to do next. One of the leads had an idea: Shmuel, from the Sustaining product test team, had once told him that he did not work with QC. Since Shmuel clearly managed to test without it, they could ask him for advice.
An expedition of several people approached Shmuel. The following dialogue took place:
Crowd: “Please help us. We need to run our cycle, but QC is down. How can we do it? Do we need some kind of magic?”
Shmuel: "Does QC run the cycle for you?"
Crowd: “No! But how will we know what tests to run?”
Shmuel: "Oh, knowing what tests to run is important, as we will want to look for information about what we don't know, and get a feeling for areas we consider at risk. Is that what you want from your tests too?"
Crowd: “Yes”
Shmuel: "So let's see: How do you decide what tests to run?"
Sara: "I go to QC and pull the list of tests that are not 'done'."
Shmuel: "We said we are looking for risk and new info – is that what you have in the not-'done' list?"
Herzl: "Not exactly, it is a list of tests for old info."
Shmuel: "But your team does find new bugs using them, Herzl, how do you do that?"
Herzl: “I guess we find them because when we run the list of tests, we somehow know where things can show problems.”
Josh: “During test execution sometimes I see something behaves suspiciously. I investigate it and find a bug.”
Shmuel: "Josh, this is what happens in my team, too! But you said 'somehow', Herzl... how do you think you knew?"
Herzl: "I don't know... we've seen similar problems in the past?"
Shmuel: "Ah, so you use your experience as part of the secret. Do you have seasoned testers in your team?"
Herzl: “Yes."
Sara: "We also have new testers, they sometimes test things we hadn't written or thought of."
Shmuel: "It looks like we let our teams think about risk and new information (what we said we were looking for) every day! Is that so, Sara?"
Sara: "Well, yes... in a sense... for bugs."
Shmuel: "So what do you need QC for?"
Herzl: "How will they start if we don't give them a cycle?"
Shmuel: "How about you just tell them to start? You can ask them to list what they want to test without another list distracting them.”
Herzl: “If you will it, it is no dream.”
Sara: “How would we make sure that teams distribute the work correctly and focus on the risk areas?”
Shmuel: “You do talk with your team members, don’t you?”
Sara (a bit offended): “Sure, constantly!”
Shmuel: “When talking with your team you learn about what they tested, so you can take the opportunity to discuss focus areas and distribute the work.”
Sara: “I guess this can be more meaningful than when I just hand them the test cycle and ask about progress.”
Josh: “Without a list of test cases, how do you make sure that you will not forget to check important things or some details? Our product is pretty big and you can easily forget a part.”
Sara: “I agree with Josh and want to add that sometimes we do find a bug when a test case fails.”
Shmuel: “Sara and Josh, this is a good question, and hard to answer. It seems that there’s an assumption that the central test list will solve the problem for you. Solving this problem is the tester’s work, and testers have a wealth of tools for it. The test case repository is one of them. Renewing the test case list, which can be done by interviewing managers and programmers, is another. Now you don’t have the central repository, but you can organize your own checklists and areas of coverage.”
Herzl: “Sounds great, but are you able to report coverage and test results without having numbers of test cases and pass/fail results?”
Shmuel: “Let’s take a closer look at these results and see if we’ll miss them. Are found problems lost, unless there’s a failed test? We use a bug repository to log and study them, which is reviewed by the project managers.”
Herzl: “But without the pass results, do you suggest providing our message about coverage without supporting numbers?”
Shmuel: “Not at all! If you have supportive numbers you should provide them. But in the case of your cycle... if the receiver does not know what the tests are, counting them gives little support.”
Sara: “Well, all they ask is whether we will complete our work in time. And I think we will.”
The test team leaders started to disperse, each to his own team, to discuss the work without the test management database. In a few hours the whole testing department was back at work. Issues were raised, bugs were opened, and short, meaningful coverage reports were provided.
Two weeks later, a short message appeared in the testers’ inboxes:
To: All testers
Subject: COMPLETED: QC DOWNTIME NOTIFICATION:  server is up again
No one bothered to read the rest of the message.
Special thanks to Shmuel Gershon for participating as a guest author in this post.


Thursday, November 17, 2011

When you Feel Rejected…

It is common to see a bug rejected as “Not a requirement”. It sometimes hurts, as it pushes aside your valuable feedback with a process-related excuse.
Common examples are:
· When a requirement includes implementation details and the devil (our bug) is in those details – the bug is actually in the requirements.
· When an issue is detected by using an oracle other than the official requirement (for example one of the HICCUPPS heuristics).
Some less logical examples that I’ve actually seen:
· When the fix involves someone who is not committed to the effort yet – for example when a Platform bug requires a Software workaround, especially if the effort is big. “Not a requirement” here actually means “Not my responsibility”.
· When a bug is the result of a design limitation. “Not a requirement” here actually means “It’s not my fault, it’s the designer’s fault” and, many times, “The bug fix is too expensive”.
Choose the playing field according to the context.
There’s a big field of product value that includes a smaller field of the requirements scope. I play in both. To find this disputable bug, we kicked the ball into the big field. When someone moved its status to “Not a requirement”, he kicked the ball back into the smaller field.

Now it’s your turn to select your move according to the context:
1) Accept the bug rejection
Sometimes the other side’s argument is valid.
2) Kick the ball within the requirements field
While the “requirements – yes or no?” argument limits the discussion, if you are able to win it, it will be easier to lead the bug to a fix, as the bug handling process is usually more efficient and faster than the requirements definition and approval process. Beware of being too persuasive and winning the argument without a proper reviewer.
3) Kick the ball to the big field of value again
When the rejection is correct process-wise but not product-value-wise, it’s time to play in the big field with the big boys. Advocate your bug to stakeholders and decision makers, learn more from customer support and architects, or submit a requirement change request. Running in this field is long distance and scoring a goal is much rarer, but this is where you will meet the professional players and improve your own skills.
While the requirements discussion can be more or less relevant, playing beyond it might bring the best rewards.

Sunday, July 24, 2011

The Double Sin of the Early Perfect Test Case



Since I started leading my current testing team, I’ve been struggling with the test case base.

There are a few factors that made the test case base clumsy and outdated. One of the most stunning facts for me was that even test cases that showed a big investment in details were often outdated. Despite the investment, some details were wrong: they had changed or had never been true. Often, you could find new testers struggling to understand and execute the test baseline.

The First Sin: Detailed Gen 0 Test Cases

In my experience when test cases are created before the test designer sees and experiences the product, it’s more than likely that they will not be accurate.

The reason for the failure is the limited ability of our mind to perfectly imagine an abstract design. Sometimes even the designers don’t have a 100% complete design. While you can plan many things ahead of time, you can also anticipate that you will have gaps in your planning, but not anticipate their exact location.

The Second Sin: Detailed Gen 1 Test Cases

What about tests that successfully made it from Gen 0 to Gen 1 and proved to be correct? What about tests that were designed after the product was introduced and tried? They might not suffer from the first sin, but they will suffer from the second sin. Although these tests were accurate in the assumptions about the product itself, not all of them were the correct ones to run. Moreover, some of the tests that did a great job for Gen 1, finished their duty. Using these tests in regression will not be efficient.

As we progress with the test execution, we learn more about the risks of the product. At the end of the first generation testing we can plan better regression testing for the next generations. Typically we will add a small number of test cases and get rid of a larger number of tests.

Conclusion: investing in too many detailed test cases during Gen 0 and Gen 1 is not efficient.

I’ll try to define basic guidelines to deal with this issue:

1) Lower the expectations from Gen 0 and Gen 1 test cases – understand the built-in limitation of these test cases: Gen 0 might be inaccurate and Gen 1 will not fit your regression needs.

2) Seek alternatives when planning Gen 0 and Gen 1 test cases. For example, use checklists instead of steps (See “The Value of Checklists and the Danger of Scripts” by Cem Kaner).

3) Try to think of better uses of your time during test planning. For example, invest in automation infrastructure during preparation.

4) Realize that moving from Gen 1 to Gen 2 will require more time in test documentation, and is not just copy-paste from Gen 0 and Gen 1 test cases. In this stage, you can save time by creating less “perfect” test cases for new features introduced in the same product at this time.

5) Consider the possibility that you will come to like your lean Gen 0 and Gen 1 test cases so much that you won’t want to invest in more details for the regression test case base.

In case you claim that your experience is different and it’s possible to create perfect re-usable test plans in early stages, I can think of the following possibilities:

1) You are a better product and test designer than the ones I work with (please mentor me).

2) You don’t have complex and innovative products like the products that I test.

3) You follow a perfect process that prevents you from falling into such traps (and I would like to hear more about it).

Sunday, May 8, 2011

Is there a Pesticide paradox in testing?

As a tester, I have heard and read the term “Pesticide paradox” on many occasions. However, I do not feel comfortable with it, so I avoid using it. In the last few days, I decided to examine it more carefully: does this term make sense? I did some googling to explore the common use of the term in SW testing and the definition of the paradox in the real pesticide world, and to try to answer the question.

The original paradox is explained in Wikipedia:
“The Paradox of the pesticides is a paradox that states that by applying pesticide to a pest, one may in fact increase its abundance. This happens when the pesticide upsets natural predator-prey dynamics in the ecosystem.” I'll refer to this definition as the "original".

The common use of the term in testing, as I have experienced it, is to describe how, when scripted tests (automated in most cases) are repeated over and over again, eventually the same set of test cases will no longer find any new defects (I took the quote from the ISTQB syllabus, which is a great source for "terms of common use"). I'll refer to this definition as the "common use".

I also found the explanation that "A static set of tests will become less effective as developers learn to avoid making mistakes that trigger the tests." (A paper by Rex Black).

I could say that the common use of the term describes how repeating the same checks tends to yield fewer bugs from run to run. In my experience this is usually true: bugs get fixed, and as long as no major changes are introduced and development does not go badly wrong, products become more stable from release to release. The explanation that developers learn is logical as well.

If we try to correlate the common use of the term in testing with the original one, we will see only a very loose connection: SW bugs do not increase because you repeat some checks. Moreover, where is the paradox here? If you ask a question over and over, it is very likely that most of the time you'll get the correct answer. I don’t call this a paradox.

When you analyze a term, it is good practice to read the source. I don't have the book Software Testing Techniques by Boris Beizer, from which the term originates, but thanks to http://www.softwarequotes.com/, I found it:
"First law: The pesticide paradox. Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffective."
- Boris Beizer, Software Testing Techniques, Chapter 1, Section 1.7, ISBN: 0442206720. Beizer notes that farmers solve this problem by planting sacrifice crops for the bugs to eat, and laments that programmers are unable to write sacrifice functions. I'll refer to this quote as "Beizer's".

Well, that makes sense too, and it is a good foundation law to know before you learn about methods: no method is fully effective. As with the common use of the term, I don't see the paradox.

I'll summarize my conclusions on the subject:

• The connection between the original term, the biological phenomenon of the "Pesticide paradox", and its common use in the testing world is mostly due to the use of the word “bug” to describe a defect, and to the fact that the original paradox deals with a type of inefficiency when trying to exterminate pests.

• A clear logical paradox appears in the original phenomenon: you kill bugs, but this increases their abundance. Beizer's quote and the common use of the term talk about reduced efficiency, not a paradox. One could argue that doing something that turns out to be inefficient is itself a paradox, but to my taste this argument is too apologetic.

• The original SW testing quote from Beizer is a warning about relying on a sole method, while the common use by others, who usually refer to Beizer as the source, describes the decreased efficiency of repeating a scripted test.

It’s fine to use a cool term with loose analogy to describe your idea, but as you can see in our example, this might cause others to "steal" your term to describe other things (and worse – reference you as the source). In addition, it will be hard to convince people with critical thinking to use your term. I will leave the pesticide paradox to its original meaning.

Tuesday, January 4, 2011

Note about terminology

Sometimes, it’s all about branding. When you want to sell a product or an approach, using terminology that will "sell" it to your stakeholders or the professional community affects the chances that it will be accepted.
When Fred Hoyle coined the term Big Bang during a 1949 BBC radio broadcast, he did not anticipate that he was doing a branding service for the competing theory. According to Hoyle, who favored an alternative "steady state" cosmological model, he used the striking image to highlight the difference between the two models. He probably did it too well :-)

Markus Gärtner, in his post Active vs. passive testing, introduces refreshing terminology for what we used to call Testing vs. Checking or Exploratory vs. Scripted: Active vs. Passive testing. He also talks about the role of judgment, which is part of being active, but basically the new pair describes a researching, critical, exploratory approach versus executing the planned tests, checking and following defined scripts.

This new terminology has some benefits over the terms we are used to. It does not reuse a term that already describes a wider area, like "Testing". And unlike the term "exploratory", it does not suffer from an "unstructured" public image.

Thursday, November 11, 2010

Using the definition of quality as a tool for context awareness

What is quality?
Since a tester’s job is to perform quality assessments, ask questions and provide answers regarding quality, understanding what quality is can help identify dilemmas in our work and put them in context.
My preferred definition for quality is “value to someone” (Jerry Weinberg). Cem Kaner adds the extension “who matters”. Usually I refer to this VIP as a “User”.

Recently, I noticed that using this definition helps identify context and explain the context to others during discussion.
A few examples:

Focusing a discussion on the goal rather than the process.
Process is important, but sometimes process discussions disconnect from the goal. For example, when a bug is opened and there is a discussion about whether it’s a requirement violation or not, providing insight on the User value could help direct the discussion to a productive place. This is also true when a tester spots an issue and is not sure whether it falls under his responsibility to report it – “is there a threat to the value to the users?” is a good litmus test to aid the decision.


Selecting a process
When defining a process, a good way to examine it is to understand how it is connected to the value to the user. A process should connect our efforts to the goal rather than disconnect them. A negative example is mixing up the priority for fixing a bug with the bug’s severity. The two often correlate, but it is a good idea to define a process that addresses the cases where they differ (for example, by keeping a separate field for each), so that the information about the severity of the threat to value is not replaced by the work plan.
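To make the "separate field for each" idea concrete, here is a minimal sketch of a bug record that keeps severity and priority apart. All names and levels (`Bug`, `Severity`, `Priority`, `severe_but_deferred`) are hypothetical and are not taken from any specific bug tracking tool.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Threat to the value for the user, assessed by the bug submitter."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Priority(IntEnum):
    """Place in the work plan, decided during triage."""
    LATER = 1
    NEXT_RELEASE = 2
    NOW = 3


@dataclass
class Bug:
    title: str
    severity: Severity  # stays as reported, whatever the plan says
    priority: Priority  # may change with schedule and cost


def severe_but_deferred(bugs: List[Bug]) -> List[Bug]:
    """Bugs where the work plan and the threat to value disagree."""
    return [b for b in bugs
            if b.severity == Severity.HIGH and b.priority == Priority.LATER]


if __name__ == "__main__":
    bugs = [
        Bug("Typo in About dialog", Severity.LOW, Priority.NOW),
        Bug("Data loss on rare power failure", Severity.HIGH, Priority.LATER),
        Bug("Crash on startup", Severity.HIGH, Priority.NOW),
    ]
    for bug in severe_but_deferred(bugs):
        print(f"High threat to user value, but deferred: {bug.title}")
```

With two fields, deferring an expensive fix no longer erases the fact that the threat to the user is high, and the mismatches can be listed for stakeholders.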

Determining the classification of a problem
We face many types of problems. Some are related to the quality of the product, and some interfere with other aspects of our work, delay our progress or block our testing efforts. When a testability issue is examined from the perspective of user value, it can be underrated, since the real issue is our inability to test efficiently and not the impact of the tested attribute on the end user.
Sometimes two types of issues are combined in one problem description. Distinguishing between the value to the user and the other issue helps provide a clearer explanation.

Overcoming the tunnel effect when setting bug severity
Setting the correct bug exposure classification helps the bug life cycle start on the right foot. When a tester tests his area of responsibility and spots a problem, it is sometimes not easy to relate it to the big picture: what will be the impact on the user? Will he be able to recover? In other words, what is the threat in terms of value to the user? Answering this question directs the bug submitter to the correct bug severity.