Should We Get Rid of Standardized Tests?
Date: 2020-03-16 15:35


The first standardized tests that we know of were administered in China over 2,000 years ago during the Han dynasty.
Chinese officials used them to determine aptitude for various government posts.
The subject matter included philosophy, farming, and even military tactics.
Standardized tests continued to be used around the world for the next two millennia, and today, they're used for everything from evaluating stair climbs for firefighters in France to language examinations for diplomats in Canada to students in schools.
Some standardized tests measure scores only in relation to the results of other test takers.
Others measure performances on how well test takers meet predetermined criteria.
So the stair climb for the firefighter could be measured by comparing the time of the climb to that of all other firefighters.
This might be expressed in what many call a bell curve.
Or it could be evaluated with reference to set criteria, such as carrying a certain amount of weight a certain distance up a certain number of stairs.
Similarly, the diplomat might be measured against other test-taking diplomats, or against a set of fixed criteria, which demonstrate different levels of language proficiency.
And all of these results can be expressed using something called a percentile.
If a diplomat is in the 70th percentile, 70% of test takers scored below her.
If she scored in the 30th percentile, 70% of test takers scored above her.
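The two ways of scoring described above can be sketched in a few lines of Python. This is only an illustration: the scores, the cutoff, and the helper names `percentile_rank` and `meets_criterion` are made up, and the percentile convention used here (the share of test takers scoring strictly below) is one of several in common use.

```python
def percentile_rank(score, all_scores):
    """Norm-referenced: percentage of test takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


def meets_criterion(score, cutoff):
    """Criterion-referenced: compare against a fixed standard, ignoring other test takers."""
    return score >= cutoff


# Hypothetical language-exam scores for ten diplomats (illustrative data only).
scores = [52, 58, 61, 64, 67, 70, 73, 77, 82, 90]

print(percentile_rank(77, scores))  # 7 of the 10 scores are below 77
print(percentile_rank(64, scores))  # 3 of the 10 scores are below 64
print(meets_criterion(64, 60))      # 60 is an assumed passing cutoff
```

Note that the same score of 64 sits low relative to the group yet still clears the assumed fixed cutoff, which is the difference between the two kinds of measurement.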
Although standardized tests are sometimes controversial, they're simply a tool.
As a thought experiment, think of a standardized test as a ruler.
A ruler's usefulness depends on two things.
First, the job we ask it to do.
Our ruler can't measure the temperature outside or how loud someone is singing.
Second, the ruler's usefulness depends on its design.
Say you need to measure the circumference of an orange.
Our ruler measures length, which is the right quantity, but it hasn't been designed with the flexibility required for the task at hand.

So, if standardized tests are given the wrong job, or aren't designed properly, they may end up measuring the wrong things.
In the case of schools, students with test anxiety may have trouble performing their best on a standardized test, not because they don't know the answers, but because they're feeling too nervous to share what they've learned.
Students with reading challenges may struggle with the wording of a math problem, so their test results may better reflect their literacy rather than numeracy skills.
And students who were confused by examples on tests that contain unfamiliar cultural references may do poorly, telling us more about the test taker's cultural familiarity than their academic learning.
In these cases, the tests may need to be designed differently.
Standardized tests can also have a hard time measuring abstract characteristics or skills, such as creativity, critical thinking, and collaboration.
If we design a test poorly, or ask it to do the wrong job, or a job it's not very good at, the results may not be reliable or valid.
Reliability and validity are two critical ideas for understanding standardized tests.
To understand the difference between them, we can use the metaphor of two broken thermometers.
An unreliable thermometer gives you a different reading each time you take your temperature, and the reliable but invalid thermometer is consistently ten degrees too hot.
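The thermometer metaphor can be simulated as a rough sketch. Everything numeric here is an assumption made for illustration: the true temperature, the size of the random scatter, the choice to center the unreliable thermometer on the true value, and the function names. The talk itself specifies only "a different reading each time" and "ten degrees too hot".

```python
import random
import statistics

TRUE_TEMP = 37.0  # assumed true temperature; the unit is arbitrary for this sketch


def unreliable_thermometer(rng):
    # Readings scatter widely, so repeated measurements disagree with each other.
    return TRUE_TEMP + rng.uniform(-5.0, 5.0)


def reliable_but_invalid_thermometer(rng):
    # Readings barely vary, but each one is about ten degrees too high.
    return TRUE_TEMP + 10.0 + rng.uniform(-0.1, 0.1)


rng = random.Random(0)  # fixed seed so the run is repeatable
unreliable = [unreliable_thermometer(rng) for _ in range(1000)]
invalid = [reliable_but_invalid_thermometer(rng) for _ in range(1000)]

for name, readings in [("unreliable", unreliable), ("reliable but invalid", invalid)]:
    spread = statistics.stdev(readings)             # small spread means consistent readings
    offset = statistics.mean(readings) - TRUE_TEMP  # average distance from the true value
    print(f"{name}: spread={spread:.2f}, offset={offset:+.2f}")
```

In this sketch, spread stands in for reliability and the average offset for one narrow aspect of validity; validity in testing also covers how results are interpreted, which a simulation like this does not capture.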
Validity also depends on accurate interpretations of results.
If people say results of a test mean something they don't, that test may have a validity problem.
Just as we wouldn't expect a ruler to tell us how much an elephant weighs, or what it had for breakfast, we can't expect standardized tests alone to reliably tell us how smart someone is, how diplomats will handle a tough situation, or how brave a firefighter might turn out to be.
So standardized tests may help us learn a little about a lot of people in a short time, but they usually can't tell us a lot about a single person.
Many social scientists worry about test scores resulting in sweeping and often negative changes for test takers, sometimes with long-term life consequences.
We can't blame the tests, though.
It's up to us to use the right tests for the right jobs, and to interpret results appropriately.
