It's easy to make
mistakes when testing software or planning a testing effort. Some
mistakes are made so often, so repeatedly, by so many different
people, that they deserve the label Classic Mistake.
在测试软件或制订测试工作计划时很容易犯一些错误。有些错误经常被许多不同的人一而再、再而三地犯,应该被列为典型错误。
Classic mistakes cluster usefully into five groups,
which I've called "themes":
典型错误可以有效地分为五组,我把这些组称为“主题”。
· The Role of Testing: who does the testing team
serve, and how does it do that?
· 测试的作用:谁承担测试小组的责任,如何做?
· Planning the Testing Effort: how should the
whole team's work be organized?
· 制订测试工作计划:应该如何组织整个小组的工作?
· Personnel Issues: who should test?
· 人员问题:谁应该做测试?
· The Tester at Work: designing, writing, and
maintaining individual tests.
· 工作中的测试员:设计、编写和维护各测试。
· Technology Rampant: quick technological fixes
for hard problems.
· 过度使用技术:艰难问题的快速技术修复
I have two goals for this paper. First, it should
identify the mistakes, put them in context, describe why they're
mistakes, and suggest alternatives. Because the context of one mistake
is usually prior mistakes, the paper is written in a narrative style
rather than as a list that can be read in any order. Second, the
paper should be a handy checklist of mistakes. For that reason,
the classic mistakes are printed in a larger bold font when they
appear in the text, and they're also summarized at the end.
本文有两个目标。第一,应当识别错误,将它们放到具体环境中,描述它们为什么是错误,并给出替代方法的建议。因为一个错误的具体环境通常是先决错误,所以本文将以叙事的方式而不是以可以按任意顺序阅读的列表方式来描述。第二,本文应该是一个便于查看的错误列表。因为这个原因,文章中出现的典型错误都以大号粗体字印刷,并在文章的结尾处汇总。
Although many of these mistakes apply to all types
of software projects, my specific focus is the testing of commercial
software products, not custom software or software that is safety
critical or mission critical.
虽然这些错误很多都适用于所有类型的软件项目,但我的重点将放在商用软件产品的测试上,而不是定制软件或者是高度安全或关键任务的软件测试上。
This paper is essentially a series of bug reports
for the testing process. You may think some of them are features,
not bugs. You may disagree with the severities I assign. You may
want more information to help in debugging, or want to volunteer
information of your own. Any decent bug reporting system will treat
the original bug report as the first part of a conversation. So
should it be with this paper. Therefore, follow this link for an
ongoing discussion of this topic.
本文主要是测试过程的一系列错误报告。你可能认为它们中的部分属于特性问题而不是 bug。你可能不赞成我设定的严重性级别。你可能需要更多的信息以用于帮助排除错误,或者希望提供你自己的信息。任何设计良好的错误报告系统都将原始的错误报告当作是对话的起始部分。本文也是这样,所以,可以按照链接参加这个主题的讨论。
Theme One: The Role of Testing
主题一:测试的作用
A first major mistake people make is thinking
that the testing team is responsible for assuring quality. This
role, often assigned to the first testing team in an organization,
makes it the last defense, the barrier between the development team
(accused of producing bad quality) and the customer (who must be
protected from them). It's characterized by a testing team (often
called the "Quality Assurance Group") that has formal
authority to prevent shipment of the product. That in itself is
a disheartening task: the testing team can't improve quality, only
enforce a minimal level. Worse, that authority is usually more apparent
than real. Discovering that, together with the perverse incentives
of telling developers that quality is someone else's job, leads
to testing teams and testers who are disillusioned, cynical, and
view themselves as victims. We've learned from Deming and others
that products are better and cheaper to produce when everyone, at
every stage in development, is responsible for the quality of their
work ([Deming86], [Ishikawa85]).
人们犯的第一个主要错误是认为测试小组应当负责质量保证。这个角色常常分配给组织中的第一测试小组,将它作为最后的防御,成为开发小组(被指责为产生低劣质量)和客户(必须受到保护以远离低劣质量)的一个屏障。它的特征是测试小组(常称为“质量保证组”)表面上具有阻止产品发货的权力。
这本身是一个令人沮丧的任务:测试小组不能提高质量,只能强制一个最低水平。更糟糕的是,这种权力常常是看上去比实际的重要。如果发现这一点,再加上有违常理地暗示开发人员质量是别人的事情,导致测试小组和测试员感到失望、愤世嫉俗、感觉自己是受害者。我们从Deming和其他人的工作可以得知:如果每个人都在开发的各个阶段对他们的工作质量负责,则产品会又好又便宜([Deming86],[Ishikawa85])。
In practice, whatever the formal role, most organizations
believe that the purpose of testing is to find bugs. This is a less
pernicious definition than the previous one, but it's missing a
key word. When I talk to programmers and development managers about
testers, one key sentence keeps coming up: "Testers aren't
finding the important bugs." Sometimes that's just griping,
sometimes it's because the programmers have a skewed sense of what's
important, but I regret to say that all too often it's valid criticism.
Too many bug reports from testers are minor or irrelevant, and too
many important bugs are missed.
实际上,不管表面上的作用是什么,大多数组织都相信测试的目的是发现 bug。这个定义的危害比前一个定义的危害要小,但是忽略了一个关键词。当我同程序员和开发经理谈到测试员的时候,不时听到一个关键的句子:测试员找不到重要的
bug。有时候这种说法只是一种抱怨,有时候是因为程序员对于什么是正确的感觉不对,但我很遗憾地说,它们经常是有效的批评。测试员的太多的bug
报告是微小的、不相关的,而有太多重要的错误都被遗漏了。
What's an important bug? Important to whom? To
a first approximation, the answer must be "to customers".
Almost everyone will nod their head upon hearing this definition,
but do they mean it? Here's a test of your organization's maturity.
Suppose your product is a system that accepts email requests for
service. As soon as a request is received, it sends a reply that
says "your request of 5/12/97 was accepted and its reference
ID is NIC-051297-3". A tester who sends in many requests per
day finds she has difficulty keeping track of which request goes
with which ID. She wishes that the original request were appended
to the acknowledgement. Furthermore, she realizes that some customers
will also generate many requests per day, so would also appreciate
this feature. Would she:
什么是重要的 bug?对谁而言是重要的?直观的估计,答案肯定是“对于客户”。听到这个定义,几乎每个人都会点头称是,但他们确实这样认为吗?这里要测试一下你们组织的成熟度。假设你们的产品是一个接受电子邮件请求服务的系统。当收到请求时,它马上发送一个“您在97年5月12日发送的请求已经受理,参考ID是NIC-051297-3”的答复。一个每天发送很多请求的测试员发现要分清楚哪个请求与哪个ID对应是非常困难的。她希望最初的请求能够附加在确认邮件的后面。并且,她意识到某些可户可能每天也会产生很多请求,所以会高度评价这个功能的。那么她将:
1. file a bug report documenting a usability problem,
with the expectation that it will be assigned a reasonably high
priority (because the fix is clearly useful to everyone, important
to some users, and easy to do)?
写一个 bug 报告,记录一个可用性问题,希望能够分配一个合理的高优先级(因为这个修复很明显对每个人都很有用,对部分用户来说还非常重要,并且也容易修改)?
2. file a bug report with the expectation that
it will be assigned "enhancement request" priority and
disappear forever into the bug database?
写一个 bug 报告,希望它被分配为“功能提升请求”优先级并永远从 bug 数据库中消失?
3. file a bug report that yields a "works
as designed" resolution code, perhaps with an email "nastygram"
from a programmer or the development manager?
写一个 bug 报告,产生一个“按设计工作”解决码,可能还加上一个来自程序员或开发经理的“不同意”电子邮件?
4. not bother with a bug report because it would
end up in cases (2) or (3)?
不打算费事去写 bug 报告,因为它将以情况(2)或(3)结束?
If usability problems are not considered valid
bugs, your project defines the testing task too narrowly. Testers
are restricted to checking whether the product does what was intended,
not whether what was intended is useful. Customers do not care about
the distinction, and testers shouldn't either.
如果可用性问题不认为是有效的 bug,那么你们的项目将测试任务定义得太狭窄了。测试员严格限制为检查产品是否按预期工作,而不管这种预期是否有效。客户不关心这个区别,测试员也不应该关心。
Testers are often the only people in the organization
who use the system as heavily as an expert. They notice usability
problems that experts will see. (Formal usability testing almost
invariably concentrates on novice users.) Expert customers often
don't report usability problems, because they've been trained to
know it's not worth their time. Instead, they wait (in vain, perhaps)
for a more usable product and switch to it. Testers can prevent
that lost revenue.
测试员经常是组织中唯一像专家一样大量使用系统的人。他们会注意到专家会看到的可用性问题。(形式上的可用性测试几乎不可避免地集中于没有经验的用户。)专家客户常常不会报告可用性问题,因为他们已经被训练的知道不值得花时间去这样做。相反,他们(也许是徒劳地)等待下一个可用的产品然后切换过去。测试员可以避免这个损失。
While defining the purpose of testing as "finding
bugs important to customers" is a step forward, it's more restrictive
than I like. It means that there is no focus on an estimate of quality
(and on the quality of that estimate). Consider these two situations
for a product with five subsystems.
将测试的目的定义为“找到对用户重要的 bug ”是向前进了一步,但与我所喜欢定义相比仍有限制。这意味着没有集中于质量评估(以及这种评估的质量)。考虑一下测试含有五个子系统的产品的两种情况。
1. 100 bugs are found in subsystem 1 before release.
(For simplicity, assume that all bugs are of the highest priority.)
No bugs are found in the other subsystems. After release, no bugs
are reported in subsystem 1, but 12 bugs are found in each of the
other subsystems.
在发布前,在子系统1中找到了100个bug 。(为了简单起见,假设所有的 bug 都是最高级别的。)在其他子系统中没有发现
bug 。在发布后,在子系统1中没有报告 bug ,但在其他每个子系统中都报告了12个 bug 。
2. Before release, 50 bugs are found in subsystem
1. 6 bugs are found in each of the other subsystems. After release,
50 bugs are found in subsystem 1 and 6 bugs in each of the other
subsystems.
在发布前,在子系统1中找到了50个 bug 。在其他每个子系统中都找到了6个 bug 。在发布后,在子系统1中报告了50个
bug ,在其他每个子系统中都报告了6个 bug。
From the "find important bugs" standpoint,
the first testing effort was superior. It found 100 bugs before
release, whereas the second found only 74. But I think you can make
a strong case that the second effort is more useful in practical
terms. Let me restate the two situations in terms of what a test
manager might say before release:
从“找到重要 bug”的观点看,第1种测试情况较为理想。在发布前找到了100个 bug ,但是第2种情况中只找到74个。但我想你们可能会提出一个有力的理由认为第2种测试在实际中更有用。让我以产品发版前测试经理可能说些什么来重新描述一下两种测试情况:
1. "We have tested subsystem 1 very thoroughly,
and we believe we've found almost all of the priority 1 bugs. Unfortunately,
we don't know anything about the bugginess of the remaining four
subsystems."
“我们全面测试了子系统1,我们相信已经找出了几乎所有优先级为1的 bug。不幸的是,我们对其他四个子系统的 bug 一无所知。”
2. "We've tested all subsystems moderately
thoroughly. Subsystem 1 is still very buggy. The other subsystems
are about 1/10th as buggy, though we're sure bugs remain."
“我们比较全面地测试了所有的子系统。子系统1仍旧有不少 bug。其他子系统虽然还有 bug,但只有子系统1的
bug 的十分之一。”
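To make the comparison concrete, here is a minimal Python sketch of the two situations. The names (EFFORT_1, EFFORT_2, detection_rates, blind_spots) are mine, invented for illustration, and the per-subsystem detection rate can only be computed in hindsight, once post-release reports are in. Treat it as one way of looking at the example, not as a metric this paper prescribes.

```python
# Bug counts from the two hypothetical testing efforts described above.
# Index 0 is subsystem 1; the rest are subsystems 2 through 5.
EFFORT_1 = {"before": [100, 0, 0, 0, 0], "after": [0, 12, 12, 12, 12]}
EFFORT_2 = {"before": [50, 6, 6, 6, 6], "after": [50, 6, 6, 6, 6]}


def total_found_before(effort):
    """Bugs found before release: the 'find important bugs' score."""
    return sum(effort["before"])


def detection_rates(effort):
    """Fraction of each subsystem's eventual bugs that testing caught.

    Returns None for a subsystem in which no bugs were ever reported.
    """
    rates = []
    for before, after in zip(effort["before"], effort["after"]):
        total = before + after
        rates.append(before / total if total else None)
    return rates


def blind_spots(effort):
    """Subsystems (1-based) where testing found nothing but customers did."""
    pairs = zip(effort["before"], effort["after"])
    return [
        number
        for number, (before, after) in enumerate(pairs, start=1)
        if before == 0 and after > 0
    ]


if __name__ == "__main__":
    for name, effort in (("effort 1", EFFORT_1), ("effort 2", EFFORT_2)):
        print(name, total_found_before(effort), blind_spots(effort))
```

By the first score, effort 1 wins, 100 to 74. But effort 1 leaves four subsystems where testing found nothing and customers later found bugs, while effort 2 caught the same fraction everywhere, which is what lets the test manager make a statement about the whole product.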
This is, admittedly, an extreme example, but it
demonstrates an important point. The project manager has a tough
decision: would it be better to hold on to the product for more
work, or should it be shipped now? Many factors - all rough estimates
of possible futures - have to be weighed: Will a competitor beat
us to release and tie up the market? Will dropping an unfinished
feature to make it into a particular magazine's special "Java
Development Environments" issue cause us to suffer in the review?
Will critical customer X be more annoyed by a schedule slip or by
a shaky product? Will the product be buggy enough that profits will
be eaten up by support costs or, worse, a recall?
必须承认,这是一个极端的例子,但是证明了一个重要的观点。项目经理有一个艰难的决定:是延迟产品交付,再工作一段时间,还是现在就交付使用?许多因素——都是一些大致的评估——都必须予以权衡:竞争对手会抢先发布产品并占领市场吗?如果丢掉一个未完工的功能部件会使得某一个杂志的
“Java 开发环境” 特别期刊的评论中对我们造成损害吗?关键客户 X 对产品延期和劣质产品哪一个更感到烦恼?产品是否有很多 bug,以至于支持成本会吃掉利润,或者更糟糕的是将产品召回?
The testing team will serve the project manager
better if it concentrates first on providing estimates of product
bugginess (reducing uncertainty), then on finding more of the bugs
that are estimated to be there. That affects test planning, the
topic of the next theme.
如果测试小组首先集中于产品错误的估计(减少不确定性),然后再找到更多的错误,他们会更好地服务于项目经理。这会影响测试计划。测试计划将在下个主题中论述。
It also affects status reporting. Test managers
often err by reporting bug data without putting it into context.
Without context, project management tends to focus on one graph, the number of bugs found over time:
这也会影响状态报告。测试经理常常会被没有放到具体环境中的报告 bug数据误导。没有具体环境,项目管理倾向于集中于一幅图:
The flattening in the curve of bugs found will be interpreted in
the most optimistic possible way unless you as test manager explain
the limitations of the data:
平滑的错误曲线很容易以一种乐观的方式解释,除非你作为测试经理解释了数据的局限:
· "Only half the planned testing tasks have
been finished, so little is known about half the areas in the project.
There could soon be a big spike in the number of bugs found."
· 只有一半的计划测试做完了,对于项目的一半所知甚少。很快就有很多错误要被发现了。
· "That's especially likely because the last
two weekly builds have been lightly tested. I told the testers to
take their vacations now, before the project hits crunch mode."
· 很有可能这样,因为在过去的两个周构建只是略微测试了一下。我告诉测试员在项目进入艰难状态之前,现在开始休假。
· "Furthermore, based on previous projects
with similar amounts and kinds of testing effort, it's reasonable
to expect that at least 45 priority-1 bugs remain undiscovered. Historically,
that's pretty high for a successful product."
· 并且,根据以前的经验,可以预料到至少还有45个级别为1的 bug还没有发现。从历史看,这对于一个成功产品来说是很高的。
For discussions of using bug data, see [Cusumano95],
[Rothman96], and [Marick97].
关于使用 bug 数据的讨论,请参阅[Cusumano95]、[Rothman96]和[Marick97]。
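As a small illustration of reporting bug data in context, the following Python sketch pairs the bug curve with how much of the planned testing has been done. The function name describe_trend, its parameters, and the wording of the message are all assumptions of mine; a real status report would carry far more than this.

```python
def describe_trend(cumulative_bugs, tasks_done, tasks_planned):
    """Return a status line that pairs the bug curve with test progress.

    cumulative_bugs: running totals of bugs found, one entry per week.
    tasks_done / tasks_planned: testing tasks finished versus planned.
    """
    if len(cumulative_bugs) < 3:
        raise ValueError("need at least three data points to judge a trend")

    recent = cumulative_bugs[-1] - cumulative_bugs[-2]
    earlier = cumulative_bugs[-2] - cumulative_bugs[-3]
    progress = tasks_done / tasks_planned

    summary = (
        f"{cumulative_bugs[-1]} bugs found so far; "
        f"{progress:.0%} of planned testing tasks finished."
    )
    # A flattening curve means little while large areas are still untested.
    if recent < earlier and progress < 1.0:
        summary += (
            " The curve is flattening, but untested areas remain,"
            " so a spike is still possible."
        )
    return summary


if __name__ == "__main__":
    print(describe_trend([10, 40, 45], tasks_done=5, tasks_planned=10))
```

The point of the sketch is only that the caveat travels with the number, so the graph cannot be read without it.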
Earlier I asserted that testers can't directly
improve quality; they can only measure it. That's true only if you
find yourself starting testing too late. Tests designed before coding
begins can improve quality. They inform the developer of the kinds
of tests that will be run, including the special cases that will
be checked. The developer can use that information while thinking
about the design, during design inspections, and in his own developer
testing.
我在前面说过,测试员不能直接提高质量,他们只能评估它。只有在你发现测试开始得太晚的时候,这种说法才是正确的。在编码开始前设计测试将会提高质量。他们让开发人员知道将进行什么样的测试,将检查哪些特殊用例。开发人员在思考设计、审查设计和自己做测试的时候可以使用这些信息。
Early test design can do more than prevent coding
bugs. As will be discussed in the next theme, many tests will represent
user tasks. The process of designing them can find user interface
and usability problems before expensive rework is required. I've
found problems like no user-visible place for error messages to
go, pluggable modules that didn't fit together, two screens that
had to be used together but could not be displayed simultaneously,
and "obvious" functions that couldn't be performed. Test
design fits nicely into any usability engineering effort ([Nielsen93])
as a way of finding specification bugs.
尽早测试的作用不仅仅是防止编码错误。像我们将在下一个主题中所讨论的那样,许多测试代表的是用户任务。设计它们的过程可以在昂贵的重新工作之前发现用户界面和可用性问题。我发现过的问题包括:错误消息不能显示在用户可以看到的地方,插件不能放到一起,两个必须同时使用的屏幕不能同时显示,一个“很明显”的功能不能执行。测试设计作为一个发现规格说明书
bug 的方法,很好地与可用性工程工作相适应([Nielsen93])。
I should note that involving testing early feels
unnatural to many programmers and development managers. There may
be feelings that you are intruding on their turf or not giving them
the chance to make the mistakes that are an essential part of design.
Take care, especially at first, not to increase their workload or
slow them down. It may take one or two entire projects to establish
your credibility and usefulness.
我应当说明早期介入测试对于许多程序员和开发经理来说不自然。可能有一种感觉是你干扰了他们,没有给他们在设计的基础部分犯错误的机会。小心些,尤其是在开始的时候,不要增加他们的工作量或减慢了他们的速度。可能需要一至两个完整的项目才能建立你们的可信度并显示出作用。
Theme Two: Planning the Testing Effort
主题二:计划测试工作
I'll first discuss specific planning mistakes, then relate test
planning to the role of testing.
我将首先讨论特定的计划错误,然后将测试计划与测试作用关联起来。
It's not unusual to see test plans biased toward
functional testing. In functional testing, particular features are
tested in isolation. In a word processor, all the options for printing
would be applied, one after the other. Editing options would later
get their own set of tests.
将测试计划偏重于功能测试的情况的并不少见。在功能测试中,某个功能部件是孤立测试的。在字处理软件中,所有打印选项都将一个接一个地应用。编辑选项在后面将得到它们自己的测试集。
But there are often interactions between features,
and functional testing tends to miss them. For example, you might
never notice that the sequence of operations open a document, edit
the document, print the whole document, edit one page, print that
page doesn't work. But customers surely will, because they don't
use products functionally. They have a task orientation. To find
the bugs that customers see - that are important to customers -
you need to write tests that cross functional areas by mimicking
typical user tasks. This type of testing is called scenario testing,
task-based testing, or use-case testing.
但是,在各个功能部件中常常有交互作用,功能测试很容易遗漏它们。例如,你可能从未注意到一系列的操作:打开文档、编辑文档、打印整个文档、编辑一页、打印该页不能工作。但是客户一定会注意到,因为他们不会按功能使用产品。他们是面向任务的。如果要找到客户看到的
bug——这些 bug 对于客户来说是很重要的——你需要编写模仿典型用户任务的跨功能区的测试用例。这类测试称为场景测试、基于任务的测试,或使用用例测试。
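The word processor sequence above can be sketched as a scenario test. The Document class here is a hypothetical stand-in of my own, not any real product's API; what matters is that the test walks one user task across editing and printing instead of exercising each feature in isolation.

```python
class Document:
    """Hypothetical stand-in for a word processor document."""

    def __init__(self, pages):
        self.pages = list(pages)

    def edit_page(self, index, text):
        """Replace the text of one page."""
        self.pages[index] = text

    def print_all(self):
        """Return what a printout of the whole document would contain."""
        return list(self.pages)

    def print_page(self, index):
        """Return what a printout of a single page would contain."""
        return [self.pages[index]]


def run_print_scenario():
    """Scenario: open, edit, print everything, edit one page, print that page."""
    doc = Document(["intro", "body", "summary"])  # open a document
    doc.edit_page(1, "body v2")                   # edit the document
    first_printout = doc.print_all()              # print the whole document
    doc.edit_page(2, "summary v2")                # edit one page
    second_printout = doc.print_page(2)           # print that page
    return first_printout, second_printout


if __name__ == "__main__":
    whole, single = run_print_scenario()
    assert whole == ["intro", "body v2", "summary"]
    assert single == ["summary v2"]
```

A purely functional test plan would call print_all and print_page in separate tests, each on a fresh document, and so would never see state left over from the first printout affect the second.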
A bias toward functional testing also underemphasizes
configuration testing. Configuration testing checks how the product
works on different hardware and when combined with different third
party software. There are typically many combinations that need
to be tried, requiring expensive labs stocked with hardware and
much time spent setting up tests, so configuration testing isn't
cheap. But it's worth it when you discover that your standard in-house
platform, which "entirely conforms to industry standards",
actually behaves differently from most of the machines on the market.
偏重于功能测试也会低估配置测试的重要性。配置测试检查产品在不同硬件上、以及在与不同的第三方软件组合使用时如何工作。通常有不同的典型组合需要尝试,需要有装备了硬件的昂贵实验室,并花费很多时间设置测试,所以配置测试成本不低。但是,当你发现你的“完全符合业界标准”的标准机构内部平台实际上在市场上不同的机器上表现不同的时候,这样做就值了。
Both configuration testing and scenario testing
test global, cross-functional aspects of the product. Another type
of testing that spans the product checks how it behaves under stress
(a large number of transactions, very large transactions, a large
number of simultaneous transactions). Putting stress and load testing
off to the last minute is common, but it leaves you little time
to do anything substantive when you discover your product doesn't
scale up to more than 12 users.
配置测试和场景测试都测试产品的全面的、跨功能的方面。另一类测试是跨越产品以检查在压力(大量事务、很大的事务、大量并发事务)下的表现。将压力测试和负载测试推迟到最后一刻才进行是一种常见的情况,但是这样做的结果是,当你发现你的产品不能支持12个以上的用户时,你已经没有多少时间来采用实际的措施。
Two related mistakes are not testing the documentation
and not testing installation procedures. Testing the documentation
means checking that all the procedures and examples in the documentation
work. Testing installation procedures is a good way to avoid making
a bad first impression.
一个相关错误是不测试文档,也不测试安装过程。测试文档意味着检查文档中所有过程和示例都能工作。测试安装过程是避免给别人留下糟糕的第一印象的好方法。
How about avoiding testing altogether?
不做测试会怎么样?
At a conference last year, I met (separately)
two depressed testers who told me their management was of the opinion
that the World Wide Web could reduce testing costs. "Look at
[wildly successful internet company]. They distribute betas over
the network and get their customers to do the testing for free!"
The Windows 95 beta program is also cited in similar ways.
在去年的一个会议上,我(分别)遇到两个沮丧的测试员,他们告诉我他们的管理是基于这样一种意见:万维网(World
Wide Web)可以减少测试成本。“看看非常成功的网络公司”。他们在网络上分发β版,让客户免费给他们做测试!”。Windows
95的β程序也是这样的。
Beware of an overreliance on beta testing. Beta
testing seems to give you test cases representative of customer
use - because the test cases are customer use. Also, bugs reported
by customers are by definition those important to customers. However,
there are several problems:
要当心对β测试的过分依赖。因为测试用例是客户使用的,所以β测试似乎是给了你客户使用的代表用例。另外,客户报告的错误也是对客户重要的。但是,有几个问题:
1. The customers probably aren't that representative.
In the common high-tech marketing model, beta users, especially
those of the "put it on your web site and they will download"
sort, are the early adopters, those who like to tinker with new
technologies. They are not the pragmatists, those who want to wait
until the technology is proven and safe to adopt. The usage patterns
of these two groups are different, as are the kinds of bugs they
consider important. In particular, early adopters have a high tolerance
for bugs with workarounds and for bugs that "just go away"
when they reload the program. Pragmatists, who are much less tolerant,
make up the large majority of the market.
客户可能不是代表。在一个普通的高科技市场营销模型中,β用户,特别是那种“将产品放到网站上让他们下载”的情况,是早期的采用者。他们喜欢摆弄新技术。他们不是实用主义者,不是那种愿意等到新技术被证明是安全可靠后才采用的人。这两种类别的使用方式是不同的,就像他们认为
bug 的重要程度是不同的一样。特别地,早期的采用者对于能够用变通方法解决的 bug和重新加载程序就能消失的 bug有较强的容忍性。但容忍性较差的实用主义者占据了市场的大部分。
2. Even of those beta users who actually use the
product, most will not use it seriously. They will give it the equivalent
of a quick test drive, rather than taking the whole family for a
two week vacation. As any car buyer knows, the test drive often
leaves unpleasant features undiscovered.
即使是那些实际使用产品的β用户,大多数也不会认真地使用。他们会给一个类似于试驾车的快速测试,而不是带着整个家庭休假两周。很多购买汽车的人都知道,试驾车经常会遗漏一些令人不愉快的特性。
3. Beta users - just like customers in general
- don't report usability problems unless prompted. They simply silently
decide they won't buy the final version.
β用户象客户一样,除非特别要求,一般不会报告可用性错误。他们只是暗自决定不去购买最终产品。
4. Beta users - just like customers in general
- often won't report a bug, especially if they're not sure what
they did to cause it, or if they think it is obvious enough that
someone else must have already reported it.
β用户象客户一样,常常不会报告 bug ,尤其是当他们不能确定是什么操作导致了错误,或者是他们认为这个错误很明显,其他人肯定已经报告了。
5. When beta users report a bug, the bug report
is often unusable. It costs much more time and effort to handle
a user bug report than one generated internally.
当β用户报告错误时,错误报告常常无法使用。处理一个用户的错误报告比一个内部产生的错误报告要花费多得多的时间和精力。
Beta programs can be useful, but they require
careful planning and monitoring if they are to do more than give
a warm fuzzy feeling that at least some customers have used the
product before it's inflicted on all of them. See [Kaner93] for
a brief description.
β程序可能是有用的,但是需要仔细的计划和监督,否则它们在激怒所有β客户之前,除了带来一种模糊的、兴奋的感觉,认为至少有一些客户在使用产品之外,不会有其他收获。参见[Kaner93]以获取一个简要描述。
The one situation in which beta programs are unequivocally
useful is in configuration testing. For any possible screwy configuration,
you can find a beta user who has it. You can do much more configuration
testing than would be possible in an in-house lab (or even perhaps
an outsourced testing agency). Beta users won't do as thorough a
job as a trained tester, but they'll catch gross errors of the "BackupBuster
doesn't work on this brand of 'compatible' floppy tape drive"
sort.
β测试有用的一种情况是配置测试。对于任何古怪的配置,你都可以找到一个使用此配置的β用户。你可以做比机构内部实验室(或者甚至是外包给测试机构)多的配置测试。β用户不会象一个训练有素的测试员一样做完整的测试,但他们可以捕捉到大致错误,像“BackupBuster在这个品牌的兼容磁带驱动器上不能工作”。
Beta programs are also useful for building word
of mouth advertising, getting "first glance" reviews in
magazines, supporting third-party vendors who will build their product
on top of yours, and so on. Those are properly marketing activities,
not testing.
β程序也有助于建立口头的广告,获得杂志的“第一印象”评论,支持第三方供应商在你的产品上构建他们的产品等等。这些都是正常的市场营销活动,不是测试。
Planning and replanning in support of the role of testing
计划和重新计划测试的支持作用
Each of the types of testing described above,
including functional testing, reduces uncertainty about a particular
aspect of the product. When done, you have confidence that some
functional areas are less buggy, others more. The product either
usually works on new configurations, or it doesn't.
上面所描述的包括功能测试在内的各种类型的测试,减少了产品某一方面的不确定性。在执行完毕后,你可以确信某些功能领域的错误较少了,其他的还比较多。产品通常将在新配置中起作用,或者是不起作用。
There's a natural tendency toward finishing one
testing task before moving on to the next, but that may lead you
to discover bad news too late. It's better to know something about
all areas than everything about a few. When you've discovered where
the problem areas lie, you can test them to greater depth as a way
of helping the developers raise the quality by finding the important
bugs.
有一种很自然的倾向,就是在进行到下一个测试任务之前先完成一个任务,但这可能导致你过晚地发现坏消息。对所有领域都了解一些比深入了解几个领域更重要。如果你发现了问题在哪个地方,你可以更深入地测试它们,通过发现重要
bug来帮助开发人员提高质量。
Strictly, I've been over-simplistic in describing
testing's role as reducing uncertainty. It would be better to say
"risk-weighted uncertainty". Some areas in the product
are riskier than others, perhaps because they're used by more customers
or because failures in that area would be particularly severe. Riskier
areas require more certainty. Failing to correctly identify risky
areas is a common mistake, and it leads to misallocated testing
effort. There are two sound approaches for identifying risky areas:
严格地说,我对将测试的作用描述为减少不确定性是太简单了。更恰当的说法是“风险加权”的不确定性。产品中某些领域比其他领域更有风险,也许是因为它们由更多客户使用或是因为那个领域的故障更严重。危险性高的区域需要更好的稳定性。不能正确地识别危险区域是一个常犯的错误,它导致测试工作的不恰当分配。
1. Ask everyone you can for their opinion. Gather
data from developers, marketers, technical writers, customer support
people, and whatever customer representatives you can find. See
[Kaner96a] for a good description of this kind of collaborative
test planning.
向每一个能够找到的人征询意见。从开发人员、市场人员、技术写作人员、客户支持人员和你能找到的每一个客户代表那里收集意见。查看[Kaner96a]以获得关于这种协同测试计划的描述。
2. Use historical data. Analyzing bug reports
from past products (especially those from customers, but also internal
bug reports) helps tell you what areas to explore in this project.
使用历史数据。分析以前产品的 bug 报告(特别是来自客户的,但也要包含内部 bug 报告)可以帮助你辨别在这个项目中还需要探索哪些领域。
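One way to picture "risk-weighted uncertainty" is as a simple ranking. In the Python sketch below, the area names, the 1-to-5 scores, and the choice to multiply risk by uncertainty are all assumptions of mine for illustration; in practice the scores would come from the opinions and historical data just described, and any weighting scheme is a judgment call.

```python
def rank_areas(areas):
    """Order product areas by risk-weighted uncertainty, highest first.

    Each area is a dict with:
      name        - label for the product area
      risk        - 1 (failures are minor or rare in use) to 5 (severe, widely used)
      uncertainty - 1 (well tested, well understood) to 5 (almost nothing known)
    """
    scored = [(area["risk"] * area["uncertainty"], area["name"]) for area in areas]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in scored]


# Hypothetical scores gathered from developers, support staff, and old bug reports.
EXAMPLE_AREAS = [
    {"name": "printing", "risk": 5, "uncertainty": 2},
    {"name": "file import", "risk": 3, "uncertainty": 5},
    {"name": "online help", "risk": 1, "uncertainty": 4},
]

if __name__ == "__main__":
    print(rank_areas(EXAMPLE_AREAS))
```

Here file import outranks printing even though printing is riskier, because so little is known about it. Retesting moves an area down the list by lowering its uncertainty, which is how the plan changes as results come in.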
"So, winter's early this year. We're
still going to invade Russia."
“今年冬天来得很早。但我们还是要入侵俄国。”
Good testers are systematic and organized, yet
they are exposed to all the chaos and twists and turns and changes
of plan typical of a software development project. In fact, the
chaos is magnified by the time it gets to testers, because of their
position at the end of the food chain and typically low status.
One unfortunate reaction is sticking stubbornly to the test plan.
Emotionally, this can be very satisfying: "They can flail around
however they like, but I'm going to hunker down and do my job."
The problem is that your job is not to write tests. It's to find
the bugs that matter in the areas of greatest uncertainty and risk,
and ignoring changes in the reality of the product and project can
mean that your testing becomes irrelevant.
好的测试员是有计划、有组织的,但他们受到计划,特别是软件开发项目计划的各种混乱、各种意外转折的影响,因为他们处于食物链的最后一环,而且通常地位比较低。一个不幸的反应是固执地坚持测试计划。从感情上讲,这会令人很满意:“他们可以随意胡乱摆弄,但我要坐下来做我的工作。”但问题是你的工作不是编写测试。而是在最不确定和危险的领域发现
bug 。忽略产品和项目的实际变化可能意味着你的测试变得无关紧要。
That's not to say that testers should jump to
readjust all their plans whenever there's a shift in the wind, but
my experience is that more testers let their plans fossilize than
overreact to project change.
这不是说测试员在有任何变化时都应该匆忙地重新调节他们的计划,但我的经验是很多的测试员都让计划僵化而不是对项目变化起过度的反应。
Theme Three: Personnel Issues
主题三:人员问题
Fresh out of college, I got my first job as a
tester. I had been hired as a developer, and knew nothing about
testing, but, as they said, "we don't know enough about you
yet, so we'll put you somewhere where you can't do too much damage".
In due course, I "graduated" to development.
刚走出大学校门的时候,我得到了第一份工作:测试员。我是做为开发人员被录用的,对测试一无所知,但是他们说:“我们对你还不太了解,所以要把你放到一个你不能做太多破坏的地方。”在这个课程结束后,我“毕业”并加入到开发部门。
Using testing as a transitional job for new programmers
is one of the two classic mistaken ways to staff a testing organization.
It has some virtues. One is that you really can keep bad hires away
from the code. A bozo in testing is often less dangerous than a
bozo in development. Another is that the developer may learn something
about testing that will be useful later. (In my case, it founded
a career.) And it's a way for the new hire to learn the product
while still doing some useful work.
将测试作为新程序员的过渡工作是组织测试人员架构的两个典型错误中的一个。这样做有一些可取之处。一是你的确可以使一些不合格的雇员远离代码。一个测试行业的笨蛋常常比一个开发行业的笨蛋的危险性要小。再有就是开发人员可能学习到一些以后有用的测试知识(就我而言,测试开创了我的职业生涯)。还有就是一个新手在了解产品的同时还能做一些有用的工作。
The advantages are outweighed by the disadvantage:
the new hire can't wait to get out of testing. That's hardly conducive
to good work. You could argue that the testers have to do good work
to get "paroled". Unfortunately, because people tend to
be as impressed by effort as by results, vigorous activity - especially
activity that establishes credentials as a programmer - becomes
the way out. As a result, the fledgling tester does things like
become the expert in the local programmable editor or complicated
freeware tool. That, at least, is a potentially useful role, though
it has nothing to do with testing. More dangerous is vigorous but
misdirected testing activity; namely, test automation. (See the
last theme.)
但是不利之处超过了有利之处:新雇员迫不及待地要离开测试行业。这很难产生高质量的工作。你可能会争辩说测试员为了“被释放”,必定会好好工作。不幸的是,过程给人留下的印象常常像结果一样深刻,严厉的活动——特别是为了证实具备程序员资格的活动——变得过时了。结果,缺乏经验的测试员所做的事情就像一个局部可编程编辑器专家或是一个复杂的自由软件工具专家所做的事情一样。这些虽然与测试无关,但至少还有潜在的作用。更危险的是误导了测试活动,即测试自动化。(参见最后一个主题)
Even if novice testers were well guided, having
so much of the testing staff be transients could only work if testing
were a shallow algorithmic discipline. In fact, good testers require
deep knowledge and experience.
即使新测试员很好地获得指导,除非测试是一个浅显的算法学科,否则将这么多测试人员转换工作也是不可行的。事实上,好的测试员需要深入的知识与经验。
The second classic mistake is recruiting testers
from the ranks of failed programmers. There are plenty of good testers
who are not good programmers, but a bad programmer likely has some
work habits that will make him a bad tester, too. For example, someone
who makes lots of bugs because he's inattentive to detail will miss
lots of bugs for the same reason.
第二个典型错误是从不合格的程序员中招募测试员。有很多好的测试员都不是好的程序员,但一个不好的程序员的一些工作习惯可能使他也会成为一个不好的测试员。例如,一个因为不注重细节的而产生很多
bug 的人也会因为同样的原因而漏掉很多 bug 。
So how should the testing team be staffed? If
you're willing to be part of the training department, go ahead and
accept new programmer hires. Accept as applicants programmers who
you suspect are rejects (some fraction of them really have gotten
tired of programming and want a change) but interview them as you
would an outside hire. When interviewing, concentrate less on formal
qualifications than on intelligence and the character of the candidate's
thought. A good tester has these qualities:
那么应该如何招募测试团队呢?如果你愿意成为一个培训部门,可以继续接受一些新程序员。接受一些你怀疑是被其他人舍弃的程序员申请人(他们之中确实有一些人是厌倦了编程而想有一些变化),但是像从公司外面招人一样面试他们。在面试的时候,重点集中于应聘者的智力和思想特征而不是表面的资历。一个好测试员应该具备:
· methodical and systematic.
· 有条理、有计划。
· tactful and diplomatic (but firm when necessary).
· 有策略、说话办事得体(但在需要的时候要坚定)
· skeptical, especially about assumptions, and
wants to see concrete evidence.
· 怀疑能力,特别是关于假设的,并要看到具体证明。
· able to notice and pursue odd details.
· 能够注意并跟踪奇怪的细节之处。
· good written and verbal skills (for explaining
bugs clearly and concisely).
· 良好的书面和口头表达技巧(可以清楚、简洁地解释 bug )。
· a knack for anticipating what others are likely
to misunderstand. (This is useful both in finding bugs and writing
bug reports.)
· 能够预料到其他人可能会误解什么的能力(这在发现 bug 和编写 bug 报告时非常有用)
· a willingness to get one's hands dirty, to experiment,
to try something to see what happens.
· 愿意不辞辛苦地进行实验,尝试一些事情来看看会发生什么。
Be especially careful to avoid the trap of testers
who are not domain experts. Too often, the tester of an accounting
package knows little about accounting. Consequently, she finds bugs
that are unimportant to accountants and misses ones that are important. Further,
she writes bug reports that make serious bugs seem irrelevant. A
programmer may not see past the unrepresentative test to the underlying
important problem. (See the discussion of reporting bugs in the
next theme.)
特别是要小心避免测试员不是领域专家的陷阱。经常地,会计软件包的测试员对会计了解很少。结果是,她发现的
bug 对于会计师来说不重要,但又漏掉了很多对于会计师来说很重要的 bug 。而且,她编写的 bug 报告将使严重的 bug 看起来无关紧要。程序员可能无法透过不具备代表性的测试来看到底层的重要问题(查看下一主题中的关于报告
bug 的讨论。)
Domain experts may be hard to find. Try to find
a few. And hire testers who are quick studies and are good at understanding
other people's work patterns.
领域专家可能不太好找。尝试去找几个。聘用那些能够快速学习并且善于理解他人工作方式的测试员。
Two groups of people are readily at hand and often
have those skills. But testing teams often do not seek out applicants
from the customer service staff or the technical writing staff.
The people who field email or phone problem reports develop, if
they're good, a sense of what matters to the customer (at least
to the vocal customer) and the best are very quick on their mental
feet.
有两组人员比较容易找并且常常具备这些技能。但是测试小组经常不从客户服务人员或技术文档写作人员中寻求申请人。通过邮件或电话解决问题报告的人,如果是称职的,那么他们知道对于客户(至少是电话中的客户)来说什么是重要、最好的,这种感觉对他们将有所帮助。
Like testers, technical writers often also lack
detailed domain knowledge. However, they're in the business of translating
a product's behavior into terms that make sense to a user. Good
technical writers develop a sense of what's important, what's confusing,
and so on. Those areas that are hard to explain are often fruitful
sources of bugs. (What confuses the user often also confuses the
programmer.)
像测试员一样,技术写作人员常常也缺乏详细的领域知识。但是,他们的工作是将产品的特性以对用户有意义的方式转换出来。一个好的技术写作人员有培养出一种什么是重要的、什么是令人迷惑的感觉。那些难于解释的领域经常包含了很多的测试错误。(使用户感到迷惑的地方同样也会使程序员感到迷惑。)
One reason these two groups are not tapped is
an insistence that testers be able to program. Programming skill
brings with it certain advantages in bug hunting. A programmer is
more likely to find the number 2,147,483,648 interesting than an
accountant will. (It is one more than the largest 32-bit signed integer,
so it overflows on most machines.)
But such tricks of the trade are easily learned by competent non-programmers,
so not having them is a weak reason for turning someone down.
没有选择这两组人员的一个原因是坚持认为测试员都应当会编程。编程技巧会给搜寻 bug 带来一定的优势。与财务人员相比,程序员更有可能发现数字2,147,483,648是有趣的(这个数字在大多数机器的有符号整数中溢出。)但是这种技巧很容易被有能力的非程序员掌握,所以这是不录取他们的一个不充分的理由。
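For non-programmers wondering why that number is interesting, here is a short Python sketch. Python's own integers do not overflow, so the helper wrap_int32 (a name I made up) imitates what a 32-bit two's-complement signed integer would store.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest 32-bit signed value
INT32_MIN = -(2**31)   # -2,147,483,648, the smallest


def wrap_int32(n):
    """Interpret n as a 32-bit two's-complement signed integer."""
    return (n + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print(wrap_int32(INT32_MAX))      # still fits
    print(wrap_int32(INT32_MAX + 1))  # one more wraps to the most negative value
```

The largest value that fits is 2,147,483,647. Add one and the stored result becomes -2,147,483,648, which is why a tester who knows the trick tries the number in any field that might be held in a 32-bit integer.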
If you hire according to these guidelines, you
will avoid a testing team that lacks diversity. All of the members
will lack some skills, but the team as a whole will have them all.
Over time, in a team with mutual respect, the non-programmers will
pick up essential tidbits of programming knowledge, the programmers
will pick up domain knowledge, and the people with a writing background
will teach the others how to deconstruct documents.
如果你按照这些规则招聘员工,你就会避免一个缺乏多样性的测试小组。所有的成员都会缺乏某些技能,但作为一个整体,小组应当具备这些所有的技能。随着时间的推移,在一个互相尊重的的小组中,非程序员将获取一些最基础的编程知识,程序员将获得专业领域知识,而具有写作背景的人将教会其他人如何解构、拆析文档。
All testers - but non-programmers especially -
will be hampered by a physical separation between developers and
testers. A smooth working relationship between developers and testers
is essential to efficient testing. Too much valuable information
is unwritten; the tester finds it by talking to developers. Developers
and testers must often work together in debugging; that's much harder
to do remotely. Developers often dismiss bug reports too readily,
but it's harder to do that to a tester you eat lunch with.
所有的测试员——尤其是非程序员——会被开发人员和测试员在物理位置上的隔离所困扰。开发人员和测试员之间和谐的工作关系对于有效测试来说至关重要。太多有价值的信息没有记录下来;测试员在与开发人员交谈时发现了它。开发人员与测试员必须在一起工作以排除
bug ,远程实现是非常困难的。开发人员常常随意关闭一个 bug 报告,但是对一个一起吃午餐的测试员的报告却很难这样做。
Remote testing can be made to work - I've done
it - but you have to be careful. Budget money for frequent working
visits, and pay attention to interpersonal issues.
远程测试也能达到目的——我就这样做过——但你必须很小心。经常进行工作访问的资金预算,并且要注意人际关系问题。
Some believe that programmers can't test their
own code. On the face of it, this is false: programmers test their
code all the time, and they do find bugs. Just not enough of them,
which is why we need independent testers.
有些人相信程序员不能测试他们自己的代码。这显然不对:程序员一直都在测试他们的代码,而且他们也的确能够发现
bug 。只是发现的 bug 还不够多,这也是为什么我们需要独立的测试员。
But if independent testers are testing, and programmers
are testing (and inspecting), isn't there a potential duplication
of effort? And isn't that wasteful? I think the answer is yes. Ideally,
programmers would concentrate on the types of bugs they can find
adequately well, and independent testers would concentrate on the
rest.
但是如果独立测试员也在测试,程序员也在测试(并且也在走查代码),其中不存在潜在的重复工作吗?这不是一种浪费吗?我想答案是肯定的。理想情况中,程序员应当集中于他们能够充分发现的
bug 类型,而独立测试员应集中于其他部分。
The bugs programmers can find well are those where
their code does not do what they intended. For example, a reasonably
trained, reasonably motivated programmer can do a perfectly fine
job finding boundary conditions and checking whether each known
equivalence class is handled. What programmers do poorly is discovering
overlooked special cases (especially error cases), bugs due to the
interaction of their code with other people's code (including system-wide
properties like deadlocks and performance problems), and usability
problems.
程序员能够较好地发现的 bug 是那些与他们预期不符的代码。例如,一个接受过一定培训、有一定积极性的程序员可以很好地找到边界条件,并且检查每一个等价类是否都处理了。程序员做的不好的地方是不能发现被忽略的某些情况(尤其是错误情况),不能发现由于他们的代码与其他人的代码交互作用而产生的
bug ,以及易用性问题。
Crudely put, good programmers do functional testing,
and testers should do everything else. Recall that I earlier claimed
an over-concentration on functional testing is a classic mistake.
Decent programmer testing magnifies the damage it does.
大致来说,好的程序员进行功能测试,测试员应该完成其他所有工作。回忆一下我前面曾说过,过分集中于功能测试是一个典型错误。合格的程序员测试夸大了它产生的破坏。
Of course, decent programmer testing is relatively
rare, because programmers are neither trained nor motivated to test.
This is changing, gradually, as companies realize it's cheaper to
have bugs found and fixed quickly by one person, instead of more
slowly by two. Until then, testers must do both the testing that
programmers can do and the testing only testers can do, but must
take care not to let functional testing squeeze out the rest.
当然,合格的程序员测试相对较少,因为程序员既没有接受过培训,对测试也没有热情。但是随着各个公司意识到由一个人发现并修复
bug 成本较低,这种情况也在逐步改变。在此之前,测试员不但必须完成程序员可以完成的测试,还要完成只有测试员才能完成的工作,还必须小心不要让功能测试挤占了其他测试。
Theme Four: The Tester at Work
主题四:工作中的测试员
When testing, you must decide how to exercise
the program, then do it. The doing is ever so much more interesting
than the deciding. A tester's itch to start breaking the program
is as strong as a programmer's itch to start writing code - and
it has the same effect: design work is skimped, and quality suffers.
Paying more attention to running tests than to designing them is
a classic mistake. A tester who is not systematic, who does not
spend time laying out the possibilities in advance, will overlook
special cases. They may be the same subtle ones that the programmers
overlooked.
在测试的时候,必须决定如何执行程序,然后完成它们。完成它们比决定它们要有趣的多。测试员渴望的是开始破坏程序,程序员渴望的是开始写代码——这导致相同结果:设计工作被忽略了,产品质量受到损害。将更多的注意力集中于运行测试而不是设计它们是一个典型错误。
Concentration on execution also results in unreviewed
test designs. Just like programmers, testers can benefit from a
second pair of eyes. Reviews of test designs needn't be as elaborate
as product design reviews, but a short check of the testing approach
and the resulting tests can find significant omissions at low cost.
集中于执行测试也导致未经审核的测试设计。就像程序员一样,测试员也得益于第二双眼睛的检查。测试设计的审核不必像产品审核那样严格,但是对测试方法和结果测试的快速检查可以低成本地找到重要的疏忽。
What is a test design?
什么是测试设计?
A test design should contain a description of
the setup (including machine configuration for a configuration test),
inputs given to the product, and a description of expected results.
One common mistake is being too specific about test inputs and procedures.
测试设计应当包含设置描述(包括配置测试的机器配置),对产品的输入和预期结果的描述。一个常见错误是对测试输入和过程过于注重细节。
Let's assume manual test implementation for the
moment. A related argument for automated tests will be discussed
in the next section. Suppose you're testing a banking application.
Here are two possible test designs:
让我们先假设一个手工测试实施。相关的自动化测试将在下一节讨论。假设你在测试银行应用程序。这里有两个可能的测试设计:
Design 1
设计1
Setup: initialize the balance in account 12 with
$100.
设置:将帐户12的余额初始化为$100。
Procedure:
过程:
1. Start the program.
2. Type 12 in the Account window.
3. Press OK.
4. Click on the 'Withdraw' toolbar button.
5. In the withdraw popup dialog, click on the 'all' button.
6. Press OK.
7. Expect to see a confirmation popup that says "You are about to withdraw all the money from this account. Continue?"
8. Press OK.
9. Expect to see a 0 balance in the account window.
10. Separately query the database to check that the zero balance has been posted.
11. Exit the program with File->Exit.
启动程序。
在帐户窗口中输入12。
按“确定”按钮。
点击“取款”工具条按钮。
在弹出的取款对话框中,点击“所有”按钮。
按“确定”按钮。
预期会看到一个确认消息:“您将从此帐户中取出所有的钱,是否继续?”
按“确定”按钮。
在帐户窗口中预期会看到余额为0。
单独查询数据库,检查余额为0。
通过“文件->退出” 退出程序。
Design 2
设计2
Setup: initialize the balance with a positive
value.
设置:将帐户余额初始化为一个正值。
Procedure:
过程:
1. Start the program on that account.
2. Withdraw all the money from the account using the 'all' button.
3. It's an error if the transaction happens without a confirmation popup.
4. Immediately thereafter:
- Expect a $0 balance to be displayed.
- Independently query the database to check that the zero balance has been posted.
启动该帐户的程序。
用“所有”按钮从帐户中取出所有的钱。
如果在事务发生时没有弹出确认消息,则是一个错误。
其后立即:
- 预期余额会显示$0。
- 单独查询数据库,检查余额为0。
The first design style has these advantages:
第一种设计风格有以下优点:
· The test will always be run the same way. You
are more likely to be able to reproduce the bug. So will the programmer.
· 测试总是以相同方式运行。重现错误的可能性更大。程序员也一样。
· It details all the important expected results
to check. Imprecise expected results make failures harder to notice.
For example, a tester using the second style would find it easier
to overlook a spelling error in the confirmation popup, or even
that it was the wrong popup.
· 它将所有要检查的预期结果的细节都描述出来。不精确的预期结果使得错误更难注意到。例如,使用第二种风格的测试员将会发现更容易忽略确认对话框中的错误拼写,甚至是错误的对话框。
· Unlike the second style, you always know exactly
what you've tested. In the second style, you couldn't be sure that
you'd ever gotten to the Withdraw dialog via the toolbar. Maybe
the menu was always used. Maybe the toolbar button doesn't work
at all!
· 不像第二种测试风格,你总是能明确地知道你在测试什么。在第二种风格中,你不能确定可以通过工具条得到“取款”对话框。也许总是使用菜单。也许工具条根本不起作用!
· By spelling out all inputs, the first style
prevents testers from carelessly overusing simple values. For example,
a tester might always test accounts with $100, rather than using
a variety of small and large balances. (Either style should include
explicit tests for boundary and special values.)
· 通过写出所有的输入,第一种风格防止程序员无意间过度使用简单的值。例如,一个测试员可能总是用$100测试帐户,而不是使用一些小的和大的余额的组合。(这两种风格都应显式地包含边界值和特殊值测试。)
However, there are also some disadvantages:
但是,也有一些缺点:
· The first style is more expensive to create.
· 创建第一种风格的测试成本较高。
· The inevitable minor changes to the user interface
will break it, so it's more expensive to maintain.
· 对用户界面的一些不可避免的更改将中断它,因此维护成本也就更高。
· Because each run of the test is exactly the
same, there's no chance that a variation in procedure will stumble
across a bug.
· 因为每一轮测试都完全相同,所以也就没有机会因为过程不同而偶然发现 bug 。
· It's hard for testers to follow a procedure
exactly. When one makes a mistake - pushes the wrong button, for
example - will she really start over?
· 测试员难于遵循测试过程。如果一个人出现错误——比如说按错按钮——她需要重新开始吗?
On balance, I believe the negatives often outweigh
the positives, provided there is a separate testing task to check
that all the menu items and toolbar buttons are hooked up. (Not
only is a separate task more efficient, it's less error-prone. You're
less likely to accidentally omit some buttons.)
如果能有一个独立的测试任务来检查所有的菜单项和工具条按钮都连接了代码(一个单独的测试不但更有效,而且不易出错。你不大会偶然地忽略掉一些按钮。),那么权衡利弊,我相信第一种设计的负面影响超过正面影响。
I do not mean to suggest that test cases should
not be rigorous, only that they should be no more rigorous than
is justified, and that we testers sometimes err on the side of
uneconomical detail.
我不是认为测试用例不应当严格,只是说它们过分严格,而且我们测试员有时在不经济的细节中犯错误。
Detail in the expected results is less problematic
than in the test procedure, but too much detail can focus the tester's
attention too much on checking against the script he's following.
That might encourage another classic mistake: not noticing and exploring
"irrelevant" oddities. Good testers are masters at noticing
"something funny" and acting on it. Perhaps there's a
brief flicker in some toolbar button which, when investigated, reveals
a crash. Perhaps an operation takes an oddly long time, which suggests
to the attentive tester that increasing the size of an "irrelevant"
dataset might cause the program to slow to a crawl. Good testing
is a combination of following a script and using it as a jumping-off
point for an exploration of the product.
详细的预期结果比详细的测试过程问题要少,但是过多的细节可能是测试员的注意力过多集中于检查他所依照的脚本。这可能也导致另一个典型错误:不能注意和探索“不相关的”奇怪现象。好的测试员善于注意到“有趣的东西”并对其进行操作。可能在工具条的一个短暂的闪动,经过调查后,揭示了一个失效错误。也许一个操作任务奇怪地花费了很长时间,可能使专注的程序员感到增加“不相关”的数据集合的大小可能使程序慢如蜗牛。好的测试是既遵循脚本,又能将它作为探索产品的出发点。
An important special case of overlooking bugs
is checking that the product does what it's supposed to do, but
not that it doesn't do what it isn't supposed to do. As an example,
suppose you have a program that updates a health care service's
database of family records. A test adds a second child to Dawn Marick's
record. Almost all testers would check that, after the update, Dawn
now has two children. Some testers - those who are clever, experienced,
or subject matter experts - would check that Dawn Marick's spouse,
Brian Marick, also now has two children. Relatively few testers
would check that no one else in the database has had a child added.
They would miss a bug where the programmer over-generalized and
assumed that all "family information" updates should be
applied both to a patient and to all members of her family, giving
Paul Marick (aged 2) a child.
一个重要的忽略 bug的特例情况是检查产品完成预期操作,但不检查它是否没有完成不应该完成的操作。举个例子,假设你有一个更新医疗机构的家庭记录数据库的程序。一个测试是在Dawn
Marick的记录中添加第二个小孩。几乎所有的测试员都将在更新之后检查Dawn Marick现在有两个小孩了。部分测试员——那些聪明的、有经验的专家——将会检查Dawn
Marick的配偶——Brian Marick,现在也有两个小孩了。相对较少的测试员将检查数据库中没有其他人添加了小孩。如果程序员将规则过分扩展,认为应当对所有的既是病人又是她的家庭成员的人都更新
“家庭信息”,给了Paul Marick(2岁)一个小孩,则这个 bug 就被忽略了。
Ideally, every test should check that all data
that should be modified has been modified and that all other data
is unchanged. With forethought, that can be built into automated
tests. Complete checking may be impractical for manual tests, but
occasional quick scans for data that might be corrupted can be valuable.
理想情况中,每个测试都应检查需要修改的数据都被修改了,其他数据都没有。在经过仔细考虑后,可以将这个过程构建到自动化测试中。完全检查可能对于手工测试来说不切合实际的,但是偶尔地快速检查数据是否破坏可能是很有价值的。
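One way to build the "nothing else changed" check into an automated test is to snapshot the data before the update and compare afterward. The sketch below is an illustration only: the in-memory `records` dictionary, the `add_child` update, and the `changed_keys` helper are all hypothetical stand-ins for the health care database in the example.

```python
import copy


def add_child(records: dict, parent: str, spouse: str) -> None:
    """Hypothetical update under test: add one child to a patient and her spouse."""
    records[parent]["children"] += 1
    records[spouse]["children"] += 1


def changed_keys(before: dict, after: dict) -> set:
    """Return the names of all records that differ between two snapshots."""
    return {name for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)}


def test_add_child_touches_only_parents() -> None:
    records = {
        "Dawn Marick": {"children": 1},
        "Brian Marick": {"children": 1},
        "Paul Marick": {"children": 0},
    }
    before = copy.deepcopy(records)  # snapshot taken before the update

    add_child(records, "Dawn Marick", "Brian Marick")

    # What should have changed did change...
    assert records["Dawn Marick"]["children"] == 2
    assert records["Brian Marick"]["children"] == 2
    # ...and nothing else did.
    assert changed_keys(before, records) == {"Dawn Marick", "Brian Marick"}


if __name__ == "__main__":
    test_add_child_touches_only_parents()
    print("only the intended records changed")
```

The last assertion is the one that would catch the over-generalized update that gives Paul Marick a child.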
Testing should not be isolated work
测试不应当是孤立的工作
Here's another version of the test we've been
discussing:
这里是我们讨论过的另一个版本:
Design 3
设计3
Withdraw all with confirmation and normal check
for 0.
取出所有钱,需要确认,并检查余额为0。
That means the same thing as Design 2 - but only
to the original author. Test suites that are understandable only
by their owners are ubiquitous. They cause many problems when their
owners leave the company; sometimes many months' worth of work has
to be thrown out.
除了最初的作者,这与设计2是相同的。测试套件只有它们的作者才能理解是常见情况。当它们的拥有者离开公司后,会带来许多问题;有时候很多个月的工作就白费了。
I should note that designs as detailed as Designs
1 or 2 often suffer a similar problem. Although they can be run
by anyone, not everyone can update them when the product's interface
changes. Because the tests do not list their purposes explicitly,
updates can easily make them test a little less than they used to.
(Consider, for example, a suite of tests in the Design 1 style:
how hard will it be to make sure that all the user interface controls
are touched in the revised tests? Will the tester even know that's
a goal of the suite?) Over time, this leads to what I call "test
suite decay," in which a suite full of tests runs but no longer
tests much of anything at all.
我需要说明的是像设计1和2那样详细的设计也存在同样的问题。虽然他们可能由任何人运行,但不是每个人都能在产品界面变化后更新它们。因为测试不会显式地列出它们的目的,更新它们可能很容易使得比以前测试的少一点点。(例如,考虑一下,设计1风格中的测试套件:要确保所有用户界面控件在更改后的测试中被涉及是一件多么困难的事情?)长期以来,这导致了我称为“测试套件变质”的问题,完整的测试套件仍旧在运行,但什么也测试不了。
Another classic mistake involves the boundary
between the tester and programmer. Some products are mostly user
interface; everything they do is visible on the screen. Other products
are mostly internals; the user interface is a "thin pipe"
that shows little of what happens inside. The problem is that testing
has to use that thin pipe to discover failures. What if complicated
internal processing produces only a "yes or no" answer?
Any given test case could trigger many internal faults that, through
sheer bad luck, don't produce the wrong answer.
另一个典型错误是测试员与程序员的边界。某些产品主要是用户界面;他们做的所有操作在屏幕上都是可见的。其他产品主要是内部的;用户界面是一个“细管道”,很少显示内部发生什么。问题是测试必须使用那个细管道来发现错误。如果一个复杂的内部处理产生的只是“是或否”的答案,结果会怎么样呢?任何给定的测试用例都能触发很多内部错误,仅仅通过不坏的运气,才不会产生错误的答案。
In such situations, testers sometimes rely solely
on programmer ("unit") testing. In cases where that's
not enough, testing only through the user-visible interface is a
mistake. It is far better to get the programmers to add "testability
hooks" or "testpoints" that reveal selected internal
state. In essence, they convert a product like this:
在这样的情况中,有时候测试员单独依赖于程序员(“单元”) 测试。在这不够充足的情况下,仅从用户可见的界面测试是一个错误。如果使程序员加上“可测试性钩子”或“测试点”以揭示所选择的内部状态的话,会好得多。本质上,他们将一个产品:
to one like this:
转化为:
It is often difficult to convince programmers to add test support
code to the product. (Actual quote: "I don't want to clutter
up my code with testing crud.") Persevere, start modestly,
and take advantage of these facts:
说服程序员向产品中添加测试支持代码常常是很困难的(一个真实引语:“我不想让测试代码弄乱我的程序。”)坚持下去,适时开始,并利用以下事实:
1. The test support code is often a simple extension
of the debugging support code programmers write anyway.
测试支持代码常常只是程序员随便编写的调试支持程序的简单延伸。
2. A small amount of test support code often goes
a long way.
少量的测试支持代码常常就会带来很大帮助。
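As a rough illustration of how little code a testpoint can take, the sketch below wraps a yes-or-no decision so that a test can read the intermediate values behind it. Everything here (the `LoanDecider` class, its scoring rule, and the `testpoint` method) is invented for the example and not taken from any real product.

```python
class LoanDecider:
    """Hypothetical 'thin pipe' product: complicated internals, yes/no output."""

    def __init__(self) -> None:
        self._last_state = {}  # internal values recorded for the testpoint

    def approve(self, income: int, debt: int) -> bool:
        debt_percent = (debt * 100) // income  # assumes income > 0
        score = 100 - debt_percent
        # Record the internals that the yes/no answer would otherwise hide.
        self._last_state = {"debt_percent": debt_percent, "score": score}
        return score >= 60

    def testpoint(self) -> dict:
        """Testability hook: expose selected internal state to tests."""
        return dict(self._last_state)


if __name__ == "__main__":
    decider = LoanDecider()
    answer = decider.approve(income=50_000, debt=10_000)
    state = decider.testpoint()
    # The test checks the internal score, not just the final answer.
    assert answer is True
    assert state["score"] == 80
    print(answer, state)
```

A wrong score that still lands on the right side of the threshold would be invisible through the user interface but is caught here.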
A common objection to this approach is that the
test support code must be compiled out of the final product (to
avoid slowing it down). If so, tests that use the testing interface
"aren't testing what we ship". It is true that some of
the tests won't run on the final version, so you may miss bugs.
But, without testability code, you'll miss bugs that don't reveal
themselves through the user interface. It's a risk tradeoff, and
I believe that adding test support code usually wins. See [Marick95],
chapter 13, for more details.
对这种方法的普遍的反对意见是测试支持代码必须编译在最终产品之外(以避免显示)。如果是这样的,测试员使用的测试界面“不是我们交付的产品”。诚然,某些测试不会运行在最终版本中,所以可能会漏掉一些
bug 。但是,没有可测试的代码,你会漏掉一些通过用户界面无法揭示的 bug 。这是一个风险的权衡,我相信添加测试代码通常会占上风。参见[Marick95]的第13章以获取更多详细内容。
In one case, there's an alternative to having
the programmer add code to the product: have a tool do it. Commercial
tools like Purify, BoundsChecker, and Sentinel automatically add
code that checks for certain classes of failures (such as memory
leaks). They provide a narrow, specialized testing interface. For
marketing reasons, these tools are sold as programmer debugging
tools, but they're equally test support tools, and I'm amazed that
testing groups don't use them as a matter of course.
有一种情况是,有一个方案替代程序远向产品添加代码:用工具来完成。一些商用工具如Purify、Boundschecker和Sentinel可以自动添加代码以检查某种类型的错误(比如内存泄露)。它们提供一个狭小的、专用的测试界面。因为市场营销的原因,这些工具是作为程序员调试工具出售的,但它们等同于测试支持工具,测试小组没有把它们当成常规工具来使用,让我觉得很吃惊。
Testability problems are exacerbated in distributed
systems like conventional client/server systems, multi-tiered client/server
systems, Java applets that provide smart front-ends to web sites,
and so forth. Too often, tests of such systems amount to shallow
tests of the user interface component because that's the only component
that the tester can easily control.
测试问题在分布式系统中,比如传统的客户/服务器系统、多层的客户/服务器系统、向站点提供灵巧的前端应用的Java小程序等,可测试性问题更为严重。常常地,测试这类系统等同于用户界面部件的浅显测试,因为它们是测试员能够容易控制的唯一部件。
Finding failures is only the start
发现错误仅仅是开始
It's not enough to find a failure; you must also
report it. Unfortunately, poor bug reporting is a classic mistake.
Tester bug reports suffer from five major problems:
发现错误是不够的,还必须报告它。不幸的是,低劣的 bug 报告是一个典型错误。测试员的错误报告存在五个主要问题:
1. They do not describe how to reproduce the bug.
Either no procedure is given, or the given procedure doesn't work.
Either case will likely get the bug report shelved.
他们没有描述如何重现 bug 。要么没有描述过程,要么描述的过程不正确。这两种情况都会使错误报告被搁置。
2. They don't explain what went wrong. At what
point in the procedure does the bug occur? What should happen there?
What actually happened?
他们没有解释出了什么问题。在什么地方出现了 bug ?将会发生什么?实际上又发生了什么?
3. They are not persuasive about the priority
of the bug. Your job is to have the seriousness of the bug accurately
assessed. There's a natural tendency for programmers and managers
to rate bugs as less serious than they are. If you believe a bug
is serious, explain why a customer would view it the way you do.
If you found the bug with an odd case, take the time to reproduce
it with a more obviously common or compelling case.
关于 bug 的级别没有说服力。你的工作是评估 bug 的严重性。对于程序员和经理有一种很自然的倾向:评估的严重性比实际的严重性低。如果你确信一个
bug 是严重的,要解释一下为什么客户要以你的方式看待这个问题。如果你发现一个奇怪的错误,花一些时间以更普通、更令人信服的方式重现它。
4. They do not help the programmer in debugging.
This is a simple cost/benefit tradeoff. A small amount of time spent
simplifying the procedure for reproducing the bug or exploring the
various ways it could occur may save a great deal of programmer
time.
他们不能帮助程序员排除 bug 。这是一个简单的成本/收益权衡。花一点时间简化重现 bug
的过程或探索一下各种发生它的方法可以节约程序员大量的时间。
5. They are insulting, so they poison the relationship
between developers and testers.
它们是侮辱性的,破坏了开发人员和测试人员的关系。
[Kaner93] has an excellent chapter (5) on how
to write bug reports. Read it.
[Kaner93]有一章(第5章)非常好的内容说明了应该如何写 bug 报告。可以读一下。
Not all bug reports come from testers. Some come
from customers. When that happens, it's common for a tester to write
a regression test that reproduces the bug in the broken version
of the product. When the bug is fixed, that test is used to check
that it was fixed correctly.
不是所有的 bug 报告都是测试员写的。有一些是来自客户的。如果出现这样的情况,常见情况是测试员编写一个回归测试,在产品出现问题的版本上重现这个
bug 。如果 bug 得到修复,这个测试可以用于检查它是否正确修复。
However, adding only regression tests is not enough.
A customer bug report suggests two things:
但是,仅仅添加回归测试是不够的。客户 bug 报告暗示着两个东西:
1. That area of the product is buggy. It's well
known that bugs tend to cluster.
产品的那个领域包含了 bug。大家都知道, bug 一般是集中出现的。
2. That area of the product was inadequately tested.
Otherwise, why did the bug originally escape testing?
产品的那个领域没有进行充分测试。否则的话,为什么开始测试的时候漏掉了那个 bug ?
An appropriate response to several customer bug
reports in an area is to schedule more thorough testing for that
area. Begin by examining the current tests (if they're understandable)
to determine their systematic weaknesses.
对于某个领域中的几个客户 bug 报告的适当响应是对该领域安排一个更全面的测试。首先检查当前测试(如果它们是可以理解的话)以确定在系统性方面的不足之处。
Finally, every bug report is a gift from a customer
that tells you how to test better in the future. A common mistake
is failing to take notes for the next testing effort. The next product
will be somewhat like this one, the bugs will be somewhat like these,
and the tests useful in finding those bugs will also be somewhat
like the ones you just ran. Mental notes are easy to forget, and
they're hard to hand to a new tester. Writing is a wonderful human
invention: use it. Both [Kaner93] and [Marick95] describe formats
for archiving test information, and both contain general-purpose
examples.
总之,每个 bug 报告都是客户的礼物,告诉我们在今后如何更好地测试。一个常见错误是不能为下次测试工作做好记录。下一个产品将在某种程度上类似于这一个,
bug 在某种程度上类似于这一个,你刚才所做的测试在某种程度上类似于将来找出那些错误的测试。脑海中的记录容易忘记,也很难传授给新测试员。书写是人类一个美妙的发明:使用它。[Kaner93]和[Marick95]都描述了归档测试信息的格式,并包含了通用的示例。
Theme Five: Technology Rampant
主题五:过度使用技术
Test automation is based on a simple economic
proposition:
测试自动化基于一个简单的经济观点:
· If a manual test costs $X to run the first time,
it will cost just about $X to run each time thereafter, whereas:
· 如果第一次手工测试的成本是$X,则其后每次测试的成本大致都是$X,然而:
· If an automated test costs $Y to create, it
will cost almost nothing to run from then on.
· 如果创建自动化测试的成本是$Y,则其后的运行成本几乎为零。
$Y is bigger than $X. I've heard estimates ranging
from 3 to 30 times as big, with the most commonly cited number seeming
to be 10. Suppose 10 is correct for your application and your automation
tools. Then you should automate any test that will be run more than
10 times.
$Y比$X大。我了解到的估计范围是从3至30倍,而常常被引用的数值似乎是10。假设10对于应用程序和自动化工具是正确的。这样应当将运行10次以上的测试都进行自动化。
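The break-even arithmetic can be written down directly. This is a sketch under the simplifying assumptions of the proposition above: each manual run costs the same, and an automated test costs nothing to rerun. The function name `break_even_runs` and the dollar figures are made up for illustration.

```python
import math


def break_even_runs(manual_cost: float, automation_cost: float) -> int:
    """Smallest number of runs at which automating is strictly cheaper.

    Assumes every manual run costs manual_cost, and an automated test
    costs automation_cost once and nothing thereafter.
    """
    # Manual total after n runs is n * manual_cost; automation is cheaper
    # once n * manual_cost > automation_cost.
    return math.floor(automation_cost / manual_cost) + 1


if __name__ == "__main__":
    x = 50.0    # hypothetical cost of one manual run
    y = 10 * x  # automation assumed to cost 10 times as much
    print(break_even_runs(x, y))  # 11: automate tests run more than 10 times
```

If rerunning an automated test is not actually free (maintenance after interface changes, for instance), the break-even point moves further out.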
A classic mistake is to ignore these economics,
attempting to automate all tests, even those that won't be run often
enough to justify it. What tests clearly justify automation?
一个典型错误是忽略这些经济上的考虑,试图自动化所有的测试,甚至包括那些不常运行的测试以至不能证明自动化是必要的。哪些测试能明显地证明自动化是必要的?
· Stress or load tests may be impossible to implement
manually. Would you have a tester execute and check a function 1000
times? Are you going to sit 100 people down at 100 terminals?
· 压力或负载测试可能无法手工实现。你会让测试员执行并检查一个函数1000次吗?你会找100个人坐在100个终端前面吗?
· Nightly builds are becoming increasingly common.
(See [McConnell96] or [Cusumano95] for descriptions of the procedure.)
If you build the product nightly, you must have an automated "smoke
test suite". Smoke tests are those that are run after every
build to check for grievous errors.
· 夜间构建变得越来越普遍了。(参见[McConnell96]或[Cusumano95]可以了解这个过程的描述)。如果在夜间构建产品,就必须有一个自动化的“冒烟测试套件”。
冒烟测试指的是那些在每次构建之后都去检查严重错误的测试。
· Configuration tests may be run on dozens of
configurations.
· 配置测试可能要在数十种配置上运行。
The other kinds of tests are less clear-cut. Think
hard about whether you'd rather have automated tests that are run
often or ten times as many manual tests, each run once. Beware of
irrational, emotional reasons for automating, such as testers who
find programming automated tests more fun, a perception that automated
tests will lead to higher status (everything else is "monkey
testing"), or a fear of not rerunning a test that would have
found a bug (thus leading you to automate it, leaving you without
enough time to write a test that would have found a different bug).
其他种类的测试不是这个明显。仔细想一下,对于那些多次运行或者运行次数是手工运行次数10倍的自动化测试,你是否只运行一次。要当心实现自动化的不理性的、感情的原因,例如测试员发现程序自动化测试更有趣,认为自动化测试将带来更高的地位(其他测试都是“猴子测试”),或者是害怕不能重新运行一个会发现
bug 的测试(这导致你将它自动化,使你没有足够的时间编写一个会发现其他 bug 的测试)。
You will likely end up in a compromise position,
where you have:
你可能在最后有一个折中的方式,你将有:
1. a set of automated tests that are run often.
一套经常运行的自动化测试。
2. a well-documented set of manual tests. Subsets
of these can be rerun as necessary. For example, when a critical
area of the system has been extensively changed, you might rerun
its manual tests. You might run different samples of this suite
after each major build.
一套文档齐备的手工测试。这些测试的子集合可以在需要的时候重新运行。例如,当一个系统的关键领域被大规模地改变时,可能会重新运行手工测试。在每一次主要构建之后,都可能会运行这个套件的不同样例。
3. a set of undocumented tests that were run once
(including exploratory "bug bash" tests).
一套没有文档的、只运行一次的测试(包括探索性的“bug 大清除”测试)。
Beware of expecting to rerun all manual tests.
You will become bogged down rerunning tests with low bug-finding
value, leaving yourself no time to create new tests. You will waste
time documenting tests that don't need to be documented.
注意不要期望重新运行所有的手工测试。重新运行这些很少能找到 bug 的测试会使你停滞不前,使你自己没有时间创建新的测试。你会把时间浪费在为不需要文档的测试编写文档上。
You could automate more tests if you could lower
the cost of creating them. That's the promise of using GUI capture/replay
tools to reduce test creation cost. The notion is that you simply
execute a manual test, and the tool records what you do. When you
manually check the correctness of a value, the tool remembers that
correct value. You can then later play back the recording, and the
tool will check whether all checked values are the same as the remembered
values.
如果你能够降低创建自动测试的成本,就可以多做一些。这也是使用GUI捕获/回放工具能够减少创建测试的成本的承诺。这个想法是你只需要执行手工测试,工具会录制下你所做的操作。当你手工检查一个值是否正确时,工具会记着那个正确值。过后你可以回放录制,工具会检查是否所有检查的值是否都与记忆的值相同。
There are two variants of such tools. What I call
the first generation tools capture raw mouse movements or keystrokes
and take snapshots of the pixels on the screen. The second generation
tools (often called "object oriented") reach into the
program and manipulate underlying data structures (widgets or controls).
这类工具有两个变种。我称为第一代的工具只是捕获原始的鼠标移动或击键操作,并记下屏幕上象素的瞬象。第二代工具(常称为“面向对象的”)深入程序并操纵底层数据结构(小配件或控件)。
First generation tools produce unmaintainable
tests. Whenever the screen layout changes in the slightest way,
the tests break. Mouse clicks are delivered to the wrong place,
and snapshots fail in irrelevant ways that nevertheless have to
be checked. Because screen layout changes are common, the constant
manual updating of tests becomes insupportable.
第一代工具产生的是不可维护的测试。不论什么时候,只要屏幕布局有了非常微小的变化,测试就要中断。鼠标的点击传送到不正确的位置,瞬象以一种不相关的方式失败,必须予以检查。因为屏幕布局变化是常见情况,所以经常手动更新测试也变得无法忍受。
Second generation tools are applicable only to
tests where the underlying data structures are useful. For example,
they rarely apply to a photograph editing tool, where you need to
look at an actual image - at the actual bitmap. They also tend not
to work with custom controls. Heavy users of capture/replay tools
seem to spend an inordinate amount of time trying to get the tool
to deal with the special features of their program - which raises
the cost of test automation.
第二代工具只有在底层数据结构有用时才是可行的。例如,它们很少能用于照片编辑工具,因为你需要查看实际的图象,即实际的位图。它们也不大能够与定制的控件一起使用。大量用户的捕获/回放工具似乎都要花费大量时间来使得工具能够处理他们程序的特殊功能——这增加了自动测试的成本。
Second generation tools do not guarantee maintainability
either. Suppose a radio button is changed to a pulldown list. All
of the tests that use the old controls will now be broken.
第二代工具也不能保证可维护性。假设一个单选按钮改变为下拉列表。所有使用老控件的测试都将中断。
GUI changes are of course common, especially
between releases. Consider carefully whether an automated test that
must be recaptured after GUI changes is worth having. Keep in mind
that it can be hard to figure out what a captured test is attempting
to accomplish unless it is separately documented.
GUI界面当然是常常会改变的,特别是在不同的发行版之间。仔细考虑一下一个在GUI变化之后必须重新捕获的自动化测试工具是否值得拥有。记住,除非另外使用文档记录下来,否则想要了解一个录制的测试能够完成什么工作是一件困难的事。
As a rule of thumb, it's dangerous to assume that
an automated test will pay for itself this release, so your test
must be able to survive a reasonable level of GUI change. I believe
that capture/replay tests, of either generation, are rarely robust
enough.
一个基本原则是,认为自动化测试的投资在这个发行版就能收回的想法是危险的,所以在一个合理的GUI变化范围之内测试必须能够继续使用。我相信不论是第一代还是第二代捕获/回放测试,都不够健壮。
An alternative approach to capture/replay is scripting
tests. (Most GUI capture/replay tools also allow scripting.) Some
member of the testing team writes a "test API" (application
programmer interface) that lets other members of the team express
their tests in less GUI-dependent terms. Whereas a captured test
might look like this:
捕获/回放的一个替代方法是脚本化测试。(大多数GUI捕获/回放工具都允许编写脚本。)测试小组的某些成员编写一个“测试API(应用编程接口)”,允许小组的其他成员以较少依赖GUI的方式表达他们的测试。一个捕获的测试类似于这样:
    text $main.accountField "12"
    click $main.OK
    menu $operations
    menu $withdraw
    click $withdrawDialog.all
    ...
文本 $main.accountField "12"
点击 $main.OK
菜单 $operations
菜单 $withdraw
点击 $withdrawDialog.all
a script might look like this:
而一个脚本类似于:
    select-account 12
    withdraw all
    ...
The script commands are subroutines that perform
the appropriate mouse clicks and key presses. If the API is well-designed,
most GUI changes will require changes only to the implementation
of functions like withdraw, not to all the tests that use them.
Please note that well-designed test APIs are as hard to write as
any other good API. That is, they're hard, and you shouldn't expect
to get one right the first time.
脚本命令是执行适当的鼠标点击和按键的子程序。如果API设计得好,大多数GUI变化仅需要对函数(例如withdraw)实现变化,而不是所有使用它们的测试。请注意设计精良的API和其他好API一样难写。也就是说,因为它们不容易写,你也不要指望第一次就得到正确结果。
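Here is a minimal sketch of such a test API. The `FakeGui` class is a stand-in I've invented so the example runs on its own; a real driver would send events to the product through whatever scripting interface your tool provides. The widget names are borrowed from the captured test above, and `BankTestApi` is likewise hypothetical.

```python
class FakeGui:
    """Stand-in for a GUI driver; a real one would send events to the product."""

    def __init__(self) -> None:
        self.events = []

    def text(self, widget: str, value: str) -> None:
        self.events.append(("text", widget, value))

    def click(self, widget: str) -> None:
        self.events.append(("click", widget))

    def menu(self, item: str) -> None:
        self.events.append(("menu", item))


class BankTestApi:
    """Test API: tests call these methods instead of naming widgets."""

    def __init__(self, gui: FakeGui) -> None:
        self.gui = gui

    def select_account(self, number: int) -> None:
        self.gui.text("main.accountField", str(number))
        self.gui.click("main.OK")

    def withdraw_all(self) -> None:
        # If the Withdraw dialog moves or changes, only this method is edited.
        self.gui.menu("operations")
        self.gui.menu("withdraw")
        self.gui.click("withdrawDialog.all")


if __name__ == "__main__":
    gui = FakeGui()
    bank = BankTestApi(gui)
    bank.select_account(12)
    bank.withdraw_all()
    for event in gui.events:
        print(event)
```

The two calls in the main block correspond to the two-line script; the five recorded events correspond to the captured test.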
In a variant of this approach, the tests are data-driven.
The tester provides a table describing key values. Some tool reads
the table and converts it to the appropriate mouse clicks. The table
is even less vulnerable to GUI changes because the sequence of operations
has been abstracted away. It's also likely to be more understandable,
especially to domain experts who are not programmers. See [Pettichord96]
for an example of data-driven automated testing.
这个方法的一个变种,是数据驱动的测试。测试员提供一个表来描述键值。某些工具读取表并将它转换为特定的鼠标点击。这个表即使在GUI变化时也不易受到损害,因为操作序列已经被抽象出来了。它也有可能是更易于理解,尤其是对于非程序员的领域专家。查看[Pettichord96]可以获得数据驱动自动化测试的示例。
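A data-driven test might be sketched as follows. The table layout, the column names, and the `withdraw` function are assumptions made for this illustration; in practice the driver would turn each row into GUI actions rather than call a function directly.

```python
import csv
import io

# Each row is one test: starting balance, amount to withdraw, expected balance.
TEST_TABLE = """\
start_balance,withdraw,expected_balance
100,100,0
100,30,70
0,0,0
"""


def withdraw(balance: int, amount: int) -> int:
    """Hypothetical operation under test; a real driver would go through the GUI."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


def run_table(table_text: str) -> list:
    """Run every row of the table and return (row, passed) pairs."""
    results = []
    for row in csv.DictReader(io.StringIO(table_text)):
        actual = withdraw(int(row["start_balance"]), int(row["withdraw"]))
        results.append((row, actual == int(row["expected_balance"])))
    return results


if __name__ == "__main__":
    for row, passed in run_table(TEST_TABLE):
        print("PASS" if passed else "FAIL", dict(row))
```

A domain expert can add a row to the table without reading any of the code beneath it.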
Note that these more abstract tests (whether scripted
or data-driven) do not necessarily test the user interface thoroughly.
If the Withdraw dialog can be reached via several routes (toolbar,
menu item, hotkey), you don't know whether each route has been tried.
You need a separate (most likely manual) effort to ensure that all
the GUI components are connected correctly.
注意这些抽象测试(不论是脚本化的还是数据驱动的)不一定会完全测试用户界面。如果“取款”对话框能够通过几个途径(工具条、菜单项)达到,你无法知道是否尝试了每个路线。你需要一个单独的(很可能是手工的)的工作来确保所有的GUI部件都正确地连接。
Whatever approach you take, don't fall into the
trap of expecting regression tests to find a high proportion of
new bugs. Regression tests discover that new or changed code breaks
what used to work. While that happens more often than any of us
would like, most bugs are in the product's new or intentionally
changed behavior. Those bugs have to be caught by new tests.
不论你采用的是什么方法,不要陷入期望回归测试发现高比例的新 bug 的陷阱。回归测试是发现以前起作用、但新代码或更改后的代码不起作用的现象。虽然它比我们希望的发生的次数更多,但许多
bug 是产品的新的或故意更改的行为。那些 bug 必须通过新测试来捕捉。
Code coverage
代码覆盖率
GUI capture/replay testing is appealing because
it's a quick fix for a difficult problem. Another class of tool
has the same kind of attraction.
GUI捕获/回放测试因为可以快速修复困难问题而具有吸引力。另一类工具也同样具有吸引力。
The difficult problem is that it's so hard to
know if you're doing a good job testing. You only really find out
once the product has shipped. Understandably, this makes managers
uncomfortable. Sometimes you find them embracing code coverage with
the devotion that only simple numbers can inspire. Testers sometimes
also become enamored of coverage, though their romance tends to
be less fervent and ends sooner.
困难的问题是很难知道你是否圆满地完成了测试工作。可能只有当产品已交付后才能真正知道。可以理解的是,这使得经理们不舒服。有时候你会发现他们热心采用代码覆盖率,认为只有那些简单的数字可以鼓舞士气。候测试员也变得倾心于覆盖率,虽然他们的兴趣没有那么高,而且结束得也快。
What is code coverage? It is any of a number of
measures of how thoroughly code is exercised. One common measure
counts how many statements have been executed by any test. The appeal
of such coverage is twofold:
什么是代码覆盖率?它是代码是否全面执行的数字衡量。一个常见的衡量是计算所有测试共执行了多少条语句。对这种覆盖率的呼吁有两方面:
1. If you've never exercised a line of code, you
surely can't have found any of its bugs. So you should design tests
to exercise every line of code.
如果你从未执行过某一行代码,你当然不能找出它的任何 bug 。所以应当设计一个可以执行每一行代码的测试。
2. Test suites are often too big, so you should
throw out any test that doesn't add value. A test that adds no new
coverage adds no value.
测试套件常常很大,所以应该抛弃任何不能增值的测试。一个不增加新覆盖率的测试不能增加任何价值。
Only the first sentences in (1) and (2) are true.
I'll illustrate with this picture, where the irregular splotches
indicate bugs:
句子(1)和(2)中,只有第一句是正确的。我将用下面的图说明,其中的不规则黑点指示的是 bug
:
If you write only the tests needed to satisfy
coverage, you'll find bugs. You're guaranteed to find the code that
always fails, no matter how it's executed. But most bugs depend
on how a line of code is executed. For example, code with an off-by-one
error fails only when you exercise a boundary. Code with a divide-by-zero
error fails only if you divide by zero. Coverage-adequate tests
will find some of these bugs, by sheer dumb luck, but not enough
of them. To find enough bugs, you have to write additional tests
that "redundantly" execute the code.
如果你仅编写需要满足覆盖率的测试,你会发现 bug 。那些总是失败的代码不论怎样执行,你都肯定能发现它们。但是大多数的
bug 取决于如何执行某一行代码。例如,对于“大小差一”(off-by-one)错误的代码,只有当你执行边界测试时才会失败。只有在被零除的时候,代码才会发生被零除的错误。覆盖率足够的测试会发现这些
bug 中的一部分,全靠运气,但发现得还不够多。要发现足够多的 bug ,你必须编写其他的测试“冗余地”执行代码。
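A small sketch makes the point concrete. The `average` function and both tests are invented for illustration: the first test executes every statement, so statement coverage is complete, yet only the second, "redundant" test reveals the divide-by-zero bug.

```python
def average(total: float, count: int) -> float:
    """Deliberately buggy: does not handle count == 0."""
    return total / count


def test_average_covers_every_statement() -> None:
    # This single test executes every statement in average(), so statement
    # coverage is 100 percent, yet the divide-by-zero bug goes unnoticed.
    assert average(10, 2) == 5


def test_average_redundant_boundary() -> None:
    # Adds no new coverage, but it is the test that exposes the bug.
    try:
        average(0, 0)
    except ZeroDivisionError:
        return  # bug revealed
    raise AssertionError("expected the bug to surface")


if __name__ == "__main__":
    test_average_covers_every_statement()
    test_average_redundant_boundary()
    print("coverage-adequate test passed; the redundant test found the bug")
```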
For the same reason, removing tests from a regression
test suite just because they don't add coverage is dangerous. The
point is not to cover the code; it's to have tests that can discover
enough of the bugs that are likely to be caused when the code is
changed. Unless the tests are ineptly designed, removing tests will
just remove power. If they are ineptly designed, using coverage
converts a big and lousy test suite to a small and lousy test suite.
That's progress, I suppose, but it's addressing the wrong problem.
同样的原因,因为有些测试不能增加覆盖率而将它们从回归测试套件中去掉也是危险的。关键不是覆盖代码;而是测试那些当代码更改时容易被发现的
bug 。除非测试用例是不熟练的设计,否则去掉测试用例就是去除作用力。如果它们是不熟练的设计,可以使用覆盖率将一个大而粗劣测试用例套件转化成一个小而粗劣的测试用例套件。我想这是进步,但是与这个问题无关。
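A rough sketch of why coverage-based pruning removes power, under invented assumptions: the test names, the line numbers each test covers, and the bug each test can detect are all hypothetical data, and the greedy rule shown is only one simple way a coverage-driven minimizer might work.

```python
# Hypothetical suite: (test name, set of line numbers the test executes).
SUITE = [
    ("typical_input", {1, 2, 3, 4}),
    ("error_path", {1, 2, 5}),
    ("boundary_input", {1, 2, 3, 4}),  # same lines as typical_input, different values
]

# Hypothetical record of which bugs each test is able to detect.
DETECTS = {
    "typical_input": set(),
    "error_path": set(),
    "boundary_input": {"off_by_one"},
}


def minimize_by_coverage(tests):
    """Greedy pruning: keep a test only if it executes a not-yet-covered line."""
    kept, covered = [], set()
    for name, lines in tests:
        if not lines <= covered:  # the test adds at least one new line
            kept.append(name)
            covered |= lines
    return kept


def bugs_found(test_names):
    """Union of the bugs detectable by the given tests."""
    found = set()
    for name in test_names:
        found |= DETECTS[name]
    return found


full_suite = [name for name, _ in SUITE]
pruned_suite = minimize_by_coverage(SUITE)
```

In this sketch the pruned suite covers exactly the same lines as the full one, but `bugs_found(pruned_suite)` is empty while `bugs_found(full_suite)` still contains the off-by-one bug: the test that was discarded as redundant was the only one with the power to find it.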
A grave danger of code coverage is that it is
concrete, objective, and easy to measure. Many managers today are
using coverage as a performance goal for testers. Unfortunately,
a cardinal rule of management applies here: "Tell me how a
person is evaluated, and I'll tell you how he behaves." If
a person is evaluated by how much coverage is achieved in a given
time (or by how little time it takes to reach a particular coverage
goal), that person will tend to write tests to achieve high coverage
in the fastest way possible. Unfortunately, that means shortchanging
careful test design that targets bugs, and it certainly means avoiding
in-depth, repetitive testing of "already covered" code.
代码覆盖率的一个重大危险是它是具体、主观而易于衡量的。今天的许多经理都使用覆盖率作为测试员的绩效目标。不幸的是,一个重要的管理规则适用于这里:“告诉我如何评价一个人,然后我才能告诉你他的表现。”如果一个人是通过在给定的时间内覆盖了多少代码(或者是在多么少的时间内达到了特定覆盖目标)来评估的,那么那个人将倾向于以尽可能快的方式达到高覆盖率的测试。不幸的是,这将意味对以发现
bug 为目的的仔细测试设计的偷工减料,这当然也意味着避开了深层次、重复地测试“已经覆盖”的代码。
Using coverage as a test design technique works
only when the testers are both designing poor tests and testing
redundantly. They'd be better off at least targeting their poor
tests at new areas of code. In more normal situations, coverage
as a guide to design only decreases the value of the tests or puts
testers under unproductive pressure to meet unhelpful goals.
仅当测试员设计了的测试质量不高并且冗余地进行测试时,将测试度作为测试设计技巧才能起作用。至少可以让他们将这些把这些质量不高的测试转移到新的领域中。在正式的场合,覆盖率作为一个设计的指导只会减少测试的价值,或将测试员置于低效率的压力下,以达到没有用处的目标。
Coverage does play a role in testing, not as a
guide to test design, but as a rough evaluation of it. After you've
run your tests, ask what their coverage is. If certain areas of
the code have no or low coverage, you're sure to have tested them
shallowly. If that wasn't intentional, you should improve the tests
by rethinking their design. Coverage has told you where your tests
are weak, but it's up to you to understand how.
覆盖率在测试中确实能起作用,但不是作为测试设计的指导,而是作为一个大致的评估。在运行完测试后,看一下它们的覆盖率是多少。如果某个领域的代码没有覆盖到或覆盖率很低,可以确定你对它们的测试很肤浅。如果不是故意那样做的,你应该考虑重新设计它们以改进测试。覆盖率告诉你测试的哪个部分是薄弱的,但如何理解则取决于你。
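As a minimal sketch of this after-the-fact use of coverage, assume the coverage tool's results have already been reduced to executed and total line counts per area of the code. The area names, the numbers, and the 50% threshold below are all invented for illustration; the output is a list of places to rethink, not a list of lines to hit.

```python
# Hypothetical per-area results: area -> (lines executed, total lines).
RESULTS = {
    "dialog_error_handling": (3, 40),
    "parser": (180, 200),
    "export": (0, 25),
}


def weak_areas(results, threshold=0.5):
    """Return (area, ratio) pairs whose coverage is below threshold, lowest first."""
    weak = []
    for area, (executed, total) in results.items():
        if total == 0:
            continue  # nothing to execute in this area, so nothing to report
        ratio = executed / total
        if ratio < threshold:
            weak.append((area, ratio))
    return sorted(weak, key=lambda pair: pair[1])


for area, ratio in weak_areas(RESULTS):
    print(f"{area}: {ratio:.0%} covered; was shallow testing here intentional?")
```

The report stops at naming the weak areas on purpose. Deciding what kinds of tests were omitted there is still a design question for the tester.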
You need not ignore the coverage details entirely. You might
glance at the uncovered lines of code (possibly assisted by the
programmer) to discover the kinds of tests you omitted. For example,
you might scan the code to determine that you undertested a dialog
box's error handling. Having done that, you step back and think
of all the user errors the dialog box should handle, not how to
provoke the error checks on lines 343, 354, and 399. By rethinking
design, you'll not only execute those lines, you might also discover
that several other error checks are entirely missing. (Coverage
can't tell you how well you would have exercised needed code that
was left out of the program.)
你也不能完全忽略覆盖率。你可以浏览未覆盖的代码行(可能是在程序员的辅助下)以发现你忽略的某种测试。例如,你可能浏览代码以确定你是否对某个对话框的错误处理测试不足。在完成这些之后,你翻回头应该考虑对话框应该处理的所有用户错误,而不是检查第343、354和399行的错误。通过重新思考设计,你不仅能执行那些行,而且可能会发现几个其他完全被忽略了错误。(覆盖率不能告诉你程序之外的、所需要代码的执行情况)。
There are types of coverage that point more directly
to design mistakes than statement coverage does (branch coverage,
for example). However, none of them, nor all of them put together,
is accurate enough to be used as a test design technique.
还有几类覆盖率,比语句覆盖率更直接地指向设计错误(例如分支覆盖率)。但是,其他种类——即使把他们都放在一起——也不能够精确到用于测试用例设计技巧。
One final note: Romances with coverage don't seem
to end with the former devotee wanting to be "just good friends".
When, at the end of a year's use of coverage, it has not solved
the testing problem, I find testing groups abandoning coverage entirely.
That's a shame. When I test, I spend somewhat less than 5% of my
time looking at coverage results, rethinking my test design, and
writing some new tests to correct my mistakes. It's time well spent.
最后再说明一下:对覆盖率的兴趣似乎不能以从前的爱好者希望成为“好朋友”而结束。在使用了一年的覆盖率之后,它没有解决测试问题,我发现测试小组完全放弃了覆盖率。这是一件丢人的事情。当我测试的时候,我花大约5%的时间查看覆盖率结果,重新考虑我的测试用例设计,并编写一些新的测试用例校正我的错误。这个时间是值得花的。