Classic Testing Mistakes (1): The Role of Testing

Posted 2004-10-18 21:20:00
It's easy to make mistakes when testing software or planning a testing effort. Some mistakes are made so often, so repeatedly, by so many different people, that they deserve the label Classic Mistake.
  Classic mistakes cluster usefully into five groups, which I've called "themes":
  · The Role of Testing: who does the testing team serve, and how does it do that?
  · Planning the Testing Effort: how should the whole team's work be organized?
  · Personnel Issues: who should test?
  · The Tester at Work: designing, writing, and maintaining individual tests.
  · Technology Rampant: quick technological fixes for hard problems.
  I have two goals for this paper. First, it should identify the mistakes, put them in context, describe why they're mistakes, and suggest alternatives. Because the context of one mistake is usually prior mistakes, the paper is written in a narrative style rather than as a list that can be read in any order. Second, the paper should be a handy checklist of mistakes. For that reason, the classic mistakes are printed in a larger bold font when they appear in the text, and they're also summarized at the end.
  Although many of these mistakes apply to all types of software projects, my specific focus is the testing of commercial software products, not custom software or software that is safety critical or mission critical.
  This paper is essentially a series of bug reports for the testing process. You may think some of them are features, not bugs. You may disagree with the severities I assign. You may want more information to help in debugging, or want to volunteer information of your own. Any decent bug reporting system will treat the original bug report as the first part of a conversation. So should it be with this paper. Therefore, follow this link for an ongoing discussion of this topic.
  Theme One: The Role of Testing
  A first major mistake people make is thinking that the testing team is responsible for assuring quality. This role, often assigned to the first testing team in an organization, makes it the last defense, the barrier between the development team (accused of producing bad quality) and the customer (who must be protected from them). It's characterized by a testing team (often called the "Quality Assurance Group") that has formal authority to prevent shipment of the product. That in itself is a disheartening task: the testing team can't improve quality, only enforce a minimal level. Worse, that authority is usually more apparent than real. Discovering that, together with the perverse incentives of telling developers that quality is someone else's job, leads to testing teams and testers who are disillusioned, cynical, and view themselves as victims. We've learned from Deming and others that products are better and cheaper to produce when everyone, at every stage in development, is responsible for the quality of their work ([Deming86], [Ishikawa85]).
  In practice, whatever the formal role, most organizations believe that the purpose of testing is to find bugs. This is a less pernicious definition than the previous one, but it's missing a key word. When I talk to programmers and development managers about testers, one key sentence keeps coming up: "Testers aren't finding the important bugs." Sometimes that's just griping, sometimes it's because the programmers have a skewed sense of what's important, but I regret to say that all too often it's valid criticism. Too many bug reports from testers are minor or irrelevant, and too many important bugs are missed.
  What's an important bug? Important to whom? To a first approximation, the answer must be "to customers". Almost everyone will nod their head upon hearing this definition, but do they mean it? Here's a test of your organization's maturity. Suppose your product is a system that accepts email requests for service. As soon as a request is received, it sends a reply that says "your request of 5/12/97 was accepted and its reference ID is NIC-051297-3". A tester who sends in many requests per day finds she has difficulty keeping track of which request goes with which ID. She wishes that the original request were appended to the acknowledgement. Furthermore, she realizes that some customers will also generate many requests per day, so would also appreciate this feature. Would she:
  1. file a bug report documenting a usability problem, with the expectation that it will be assigned a reasonably high priority (because the fix is clearly useful to everyone, important to some users, and easy to do)?
  2. file a bug report with the expectation that it will be assigned "enhancement request" priority and disappear forever into the bug database?
  3. file a bug report that yields a "works as designed" resolution code, perhaps with an email "nastygram" from a programmer or the development manager?
  4. not bother with a bug report because it would end up in cases (2) or (3)?
  If usability problems are not considered valid bugs, your project defines the testing task too narrowly. Testers are restricted to checking whether the product does what was intended, not whether what was intended is useful. Customers do not care about the distinction, and testers shouldn't either.
  Testers are often the only people in the organization who use the system as heavily as an expert. They notice usability problems that experts will see. (Formal usability testing almost invariably concentrates on novice users.) Expert customers often don't report usability problems, because they've been trained to know it's not worth their time. Instead, they wait (in vain, perhaps) for a more usable product and switch to it. Testers can prevent that lost revenue.
  While defining the purpose of testing as "finding bugs important to customers" is a step forward, it's more restrictive than I like. It means that there is no focus on an estimate of quality (and on the quality of that estimate). Consider these two situations for a product with five subsystems.
  1. 100 bugs are found in subsystem 1 before release. (For simplicity, assume that all bugs are of the highest priority.) No bugs are found in the other subsystems. After release, no bugs are reported in subsystem 1, but 12 bugs are found in each of the other subsystems.
  2. Before release, 50 bugs are found in subsystem 1. 6 bugs are found in each of the other subsystems. After release, 50 bugs are found in subsystem 1 and 6 bugs in each of the other subsystems.
  From the "find important bugs" standpoint, the first testing effort was superior. It found 100 bugs before release, whereas the second found only 74. But I think you can make a strong case that the second effort is more useful in practical terms. Let me restate the two situations in terms of what a test manager might say before release:
  1. "We have tested subsystem 1 very thoroughly, and we believe we've found almost all of the priority 1 bugs. Unfortunately, we don't know anything about the bugginess of the remaining four subsystems."
  2. "We've tested all subsystems moderately thoroughly. Subsystem 1 is still very buggy. The other subsystems are about 1/10th as buggy, though we're sure bugs remain."
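  As a rough check on the numbers above, the two efforts can be tabulated in a few lines of Python. This is only an illustrative sketch: the lists restate the counts from the two situations, bugs reported after release are treated as the bugs that testing missed, and the `summarize` helper and the "find rate" it computes are assumptions introduced here, not part of the paper.

```python
# Bug counts per subsystem (subsystem 1 first, then the four others),
# restated from the two situations described in the text.
before_1 = [100, 0, 0, 0, 0]
after_1 = [0, 12, 12, 12, 12]
before_2 = [50, 6, 6, 6, 6]
after_2 = [50, 6, 6, 6, 6]


def summarize(before, after):
    """Return (bugs found before release, total known bugs, find rates).

    A find rate is the share of a subsystem's known bugs that testing
    found before release. Treating post-release reports as "the bugs
    testing missed" is a simplifying assumption of this sketch.
    """
    found = sum(before)
    total = found + sum(after)
    rates = [b / (b + a) for b, a in zip(before, after)]
    return found, total, rates


found_1, total_1, rates_1 = summarize(before_1, after_1)
found_2, total_2, rates_2 = summarize(before_2, after_2)

print(found_1, total_1, rates_1)  # 100 148 [1.0, 0.0, 0.0, 0.0, 0.0]
print(found_2, total_2, rates_2)  # 74 148 [0.5, 0.5, 0.5, 0.5, 0.5]
```

  Both efforts face the same 148 known bugs. Only in the second is the find rate the same across subsystems, which is why its pre-release counts say something about where the remaining bugs are.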
  This is, admittedly, an extreme example, but it demonstrates an important point. The project manager has a tough decision: would it be better to hold on to the product for more work, or should it be shipped now? Many factors - all rough estimates of possible futures - have to be weighed: Will a competitor beat us to release and tie up the market? Will dropping an unfinished feature to make it into a particular magazine's special "Java Development Environments" issue cause us to suffer in the review? Will critical customer X be more annoyed by a schedule slip or by a shaky product? Will the product be buggy enough that profits will be eaten up by support costs or, worse, a recall?
  The testing team will serve the project manager better if it concentrates first on providing estimates of product bugginess (reducing uncertainty), then on finding more of the bugs that are estimated to be there. That affects test planning, the topic of the next theme.
  It also affects status reporting. Test managers often err by reporting bug data without putting it into context. Without context, project management tends to focus on one graph:
Original poster | Posted 2004-10-18 21:21:00

The flattening in the curve of bugs found will be interpreted in the most optimistic possible way unless you as test manager explain the limitations of the data:
  · "Only half the planned testing tasks have been finished, so little is known about half the areas in the project. There could soon be a big spike in the number of bugs found."
  · "That's especially likely because the last two weekly builds have been lightly tested. I told the testers to take their vacations now, before the project hits crunch mode."
  · "Furthermore, based on previous projects with similar amounts and kinds of testing effort, it's reasonable to expect at least 45 priority-1 bugs remain undiscovered. Historically, that's pretty high for a successful product."
  For discussions of using bug data, see [Cusumano95], [Rothman96], and [Marick97].
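  One way to picture a status report that carries its own context is sketched below. The `BugStatus` class, its field names, and the weekly counts are hypothetical. Only the three caveats (half the planned tasks finished, two lightly tested builds, at least 45 priority-1 bugs expected to remain) come from the text.

```python
from dataclasses import dataclass


@dataclass
class BugStatus:
    """Raw bug data bundled with the context needed to read it (hypothetical)."""
    bugs_found_per_week: list      # the curve management tends to focus on
    planned_tasks: int             # testing tasks in the plan
    finished_tasks: int            # testing tasks completed so far
    lightly_tested_builds: int     # recent builds that got little testing
    expected_remaining_p1: int     # estimate taken from comparable past projects

    def report(self):
        """Return the raw curve followed by the caveats that qualify it."""
        coverage = self.finished_tasks / self.planned_tasks
        lines = [f"Bugs found per week: {self.bugs_found_per_week}"]
        lines.append(f"Planned testing tasks finished: {coverage:.0%}")
        if self.lightly_tested_builds:
            lines.append(
                f"Recent builds only lightly tested: {self.lightly_tested_builds}"
            )
        lines.append(
            f"Priority-1 bugs expected to remain: at least {self.expected_remaining_p1}"
        )
        return lines


# Made-up weekly counts showing a flattening curve.
status = BugStatus([30, 25, 12, 4, 3], planned_tasks=40, finished_tasks=20,
                   lightly_tested_builds=2, expected_remaining_p1=45)
report_lines = status.report()
for line in report_lines:
    print(line)
```

  The point of the design is simply that the curve never travels without its caveats: anyone reading the counts also reads how much of the plan they cover.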
  Earlier I asserted that testers can't directly improve quality; they can only measure it. That's true only if you find yourself starting testing too late. Tests designed before coding begins can improve quality. They inform the developer of the kinds of tests that will be run, including the special cases that will be checked. The developer can use that information while thinking about the design, during design inspections, and in his own developer testing.
  Early test design can do more than prevent coding bugs. As will be discussed in the next theme, many tests will represent user tasks. The process of designing them can find user interface and usability problems before expensive rework is required. I've found problems like no user-visible place for error messages to go, pluggable modules that didn't fit together, two screens that had to be used together but could not be displayed simultaneously, and "obvious" functions that couldn't be performed. Test design fits nicely into any usability engineering effort ([Nielsen93]) as a way of finding specification bugs.
  I should note that involving testing early feels unnatural to many programmers and development managers. There may be feelings that you are intruding on their turf or not giving them the chance to make the mistakes that are an essential part of design. Take care, especially at first, not to increase their workload or slow them down. It may take one or two entire projects to establish your credibility and usefulness.

Posted 2004-10-23 02:57:00

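Some material on game testing that I collected a while ago. It's rather disorganized; I don't know who the author is, or even the title of the article.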


Definition of testing

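If I had to give a definition, I'd say: testing is the work of anticipating and resolving the abnormal problems players will run into, and it is also a long-term observation task of continually tuning balance. At whatever stage (feature implementation, closed beta, open beta, and so on), testing should be split into two parts: hard issues and soft issues.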

Hard issues

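Hard bugs are those that stop the game flow from proceeding: crashes, rendering errors, and similar hard problems. Such a problem occurs whenever the game is played through a particular sequence. Some more advanced bugs that steadily increase the load on the server, however, probably won't show up in short-term testing. For problems of this kind, which have been around as long as computers have, games today include logging during development that records problems automatically, and most of the bugs that appear get fixed by the programming department. Part of the logging can be kept in the release client to collect the new problems that keep arising as the client is updated. This is probably outside the scope of this discussion.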

Soft issues

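The soft, logical part is mostly tested late, for example during open beta. It consists of tuning the numeric values of the various features, mainly to define a balance for the game world. Beyond the initial value settings, internal testers rarely get to test a feature thousands upon thousands of times. So you can end up with the classic fable of the cat running rings around the tiger. Designers and the testers who work with them should focus on the principles and methods for testing this part.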

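Testing these issues, like testing hard issues, needs a defined process and requirements. The specific process can only be decided for the specific game; mostly it means filing problems by category and summarizing the reasons. But a few points never change.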

The goal of balance

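To keep the various settings from drifting off theme, the first priority is to make the world background clear and establish a concept of tiers. This matters most when the character system and similar systems are very complex. In rules as extreme as AD&D's, the protagonist's five or six base attributes can affect dozens of combat and non-combat skills, and the values can be adjusted further by various items. Whatever else they do, those rules have a clear notion of tiers: from weak to ordinary to strong, up to the strongest of all, the dragon. That is because the designers know that a person, however strong, cannot be stronger than a dragon. In this way they set themselves an upper bound.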

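So when testing, don't start by checking whether players have enough classes and skills to choose from, or what powerful skills, stamina, and so on they will gain. First understand, within this world, the relationships among the races, how the classes complement one another, how the characters relate to each other, and what position each occupies in the world as a whole. Ask whether it is reasonable, and whether an ordinary person judging by real-world logic would find that the character makes sense in the game. Only after that do you need to address the balance of each race, each class, and each character, and finally test the characters one by one. Some will say this belongs to the early design discussions. That's right, because testing has already begun here, in the designer's head.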

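The process of definition here is exactly the reverse of the real world. In the real world, the overall balance is something we observe and summarize. In a game world, you must define the balance first and then arrange the world into that balanced state.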

Assigning severity levels

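Testing also requires being clear about the severity level of a problem: the more things a value affects, the higher its severity. The attribute structure of today's MMRPGs is basically tree-like, with some branches and leaves crossing over. The most basic character attributes, such as strength, are the root. They affect other attributes, which ultimately determine winning or losing, quest completion, and so on. The levels of these attributes are therefore quite clear: the root is highest, the leaves lowest. And pruning a tree never starts at the root.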
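
The idea that a value's severity grows with the number of things it affects can be sketched as a small graph computation. The attribute names and the `AFFECTS` table below are made up for illustration; a real game would substitute its own attribute structure.

```python
# Hypothetical attribute graph: each key directly affects the listed values.
AFFECTS = {
    "strength": ["physical_attack", "usable_weapons"],
    "agility": ["dodge"],
    "weapon_damage": ["physical_attack"],
    "physical_attack": ["combat_outcome"],
    "dodge": ["combat_outcome"],
    "usable_weapons": [],
    "combat_outcome": [],
}


def downstream(name, graph):
    """Return the set of all values directly or indirectly affected by name."""
    seen = set()
    stack = list(graph.get(name, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def severity_ranking(graph):
    """Sort values from most to least downstream impact (ties broken by name)."""
    return sorted(graph, key=lambda n: (-len(downstream(n, graph)), n))


ranking = severity_ranking(AFFECTS)
for name in ranking:
    print(name, len(downstream(name, AFFECTS)))
```

In this toy graph `strength` reaches three other values and so ranks first, while leaf values such as `combat_outcome` reach none and are the safest places to make a change.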

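Strength, the most basic attribute, combines with the character's own hit rate, the opponent's agility, and so on to determine physical attack. It also determines which weapons can be wielded. But if a character's attack power is too high, what is the cause: the weapon, or the character's strength? Which one should be changed? A character's base attributes are the last thing to change casually, so change the weapon. If that really doesn't work, turn to other parts that derive from the attributes, such as skill proficiency. The more basic a part is, the greater its influence and the more easily it goes wrong. Base character attributes are the root of all testing and likewise the category that must not be changed casually, still less singled out for change because of one particular problem. Adding or removing any attribute will send two-thirds of the earlier testing work up in smoke, perhaps more. Weapons, on the other hand, can largely be tested separately from characters. In a game where characters have dozens of attributes, weapons are all the less likely to cause major problems.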

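From highest to lowest, the severity levels are: character, item, skill. When correcting attributes in these three categories, try to make the correction within the category itself. Don't think of making changes at another level, and certainly not at a higher one. Within each category there are further attributes, which have to be divided up and tested according to the specific game. Although attributes are used as the example here, the same applies to quests: a web of interrelated quests is just as important, only it changes somewhat less than attributes do.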

Is the player's reward proportional to the effort?

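In the real world there is no shortcut to obtaining anything; there is only hard work. Is the same true in the game world? Before a character obtains a powerful skill, has the character been trained enough for the player to truly value that skill or item? This is a fairly critical part of a game and shows up mostly in quests. Is the time and effort spent enough for the player to feel real satisfaction on obtaining the item, and to do justice to the testers' labor?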

Record, adjust, summarize

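Soft issues should be documented as thoroughly as hard issues, which also makes it easier to reconsider the effect of past values. All documentation should do this: record the work done in each key update.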

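On adjustment, Sid Meier has said that each adjustment should be a large one. That way you can see a big difference in the values and find the right value from there. Nearly everyone who knows of Sid Meier knows this saying. (That is the gist; I'm embarrassed to say I can't recall the exact wording.)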
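
A minimal sketch of tuning in big steps first, under stated assumptions: the measured outcome rises as the tuned value rises, and `measured_outcome` is a made-up stand-in for a real playtest measurement. The coarse phase doubles or halves the value until the target is bracketed; the fine phase then narrows the bracket. The doubling and halving factor is a choice of this sketch, not a quotation of Sid Meier.

```python
def tune(measure, target, start, tolerance=0.01, max_steps=60):
    """Find a positive value whose measured outcome is within tolerance of target.

    Assumes measure(value) increases as value increases (an assumption
    of this sketch, not something the post states).
    """
    low = high = start
    steps = 0
    # Coarse phase: make big moves (double or halve) until the target is bracketed.
    while measure(high) < target and steps < max_steps:
        low, high = high, high * 2
        steps += 1
    while measure(low) > target and steps < max_steps:
        low, high = low / 2, low
        steps += 1
    # Fine phase: bisect inside the bracket found above.
    while steps < max_steps:
        mid = (low + high) / 2
        result = measure(mid)
        if abs(result - target) <= tolerance:
            return mid
        if result < target:
            low = mid
        else:
            high = mid
        steps += 1
    return (low + high) / 2


def measured_outcome(weapon_damage):
    """Hypothetical stand-in for a playtest result that grows with damage."""
    return weapon_damage * 0.3


best = tune(measured_outcome, target=12.0, start=10.0)
print(best, measured_outcome(best))
```

Starting from 10, the coarse phase jumps to 20 and then 40 before any fine adjustment happens, which is the behavior the saying recommends: overshoot visibly, then settle.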

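Very often, testers simply change whatever they are testing according to their own ideas. Even when the change is recorded, the attitude is that it only needs to be fixed. In fact these changes often follow a pattern, and some fixes turn out to change nothing at all. Spending more time discussing whether everyone is making corrections in line with the originally agreed goals makes better use of the remaining testing time. Likewise, a summary once everything is finished helps the next project avoid designs that need heavy correction.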

Posted 2004-11-3 10:56:00

Very practical. I've downloaded it to study at leisure. Thanks for sharing!!!

Posted 2005-4-1 15:57:00

I need to study this.

Posted 2005-5-25 10:59:00

Hello, could we get in touch by email to discuss this in more depth? Thanks.