Validity & Reliability of Tests

until blanket testing is ‘dead, buried and cremated’.

The Treehorn Express

If Treehorn, the hero of Florence Parry Heide’s The Shrinking of Treehorn, were to set a test for those adults who constantly judged him and made certain assertions about him and his condition…as pro-Naplanners are wont to do with all children…one has to wonder just what questions he might ask! Would you care to send some to me at …. apart from the obvious: “Why don’t you take any notice of me?”

Please send one or two or more. I’d love to list them.


Validity & Reliability of Tests

Blanket testing is a device used by the unprincipled and inexperienced to annoy children, with whom they won’t or can’t discuss individual learning progress.

{Called NCLB in U.S.A.; National Standards in N.Z.; National Testing in the U.K.; Naplan in Australia}

The tests pretend to measure some half-dozen or so hard competencies of young children who are too intimidated to explain anything to anybody.

Even though they know of its evils, the unscrupulous use the test findings to make gross statements about pupils’ general competencies, teacher ability and overall school performance. They allow the publishing of the names of the ‘best’ schools and the ‘worst’ schools.

These classroom sciolists, camp followers and professional illiterates believe that the testing is valid and reliable, and that the tests can accurately distinguish the holistic differences between children, teachers and schools. They have the political power to pretend. Basic professional ethics, children’s feelings and parent concern just don’t matter.

– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –

Since I am a convert from an earlier-generation Kleinism [fear-based schooling], I can describe blanket testing with such politeness. I didn’t worry about the validity or reliability of those term and annual tests that I gave to pupils in my developmental-principal years. Tagging kids with numbers and scores was good enough for most parents. That’s all they seemed to want. Since then, wider experience, deeper professional reading and a colleague-developed, deeply-entrenched ethical standard have assured me that such blanket tests are useless, evil and dangerous. They should not exist. They tag everything with numbers!!

Now…with reform-based Kleinism, those more expert than I believe that the validity and reliability aspects need to be considered more deeply than they are…

Professor Brian Cambourne, Australia’s distinguished literacy guru, says:

This acceptance [especially by the media and education bureaucrats] has given NAPLAN high ‘face’ validity with the general public and this face validity has been conflated to equate with what psychometricians call ‘construct’ validity. Nor has what some new breed psychometricians call ‘consequential’ validity ever been researched. (Consequential validity addresses the question ‘Are the consequences of applying this test worth the pedagogical costs of using it?’)

Brian goes on to express the hope that the community itself would come to question the validity of NAPLAN. He expresses concern about…

1. The number of kids who have been classified as “failing” or “poor” readers who are avid effective readers of complex books, web sites, etc. I’ve already met a number of parents and teachers who look at their children’s NAPLAN results and shake their heads in amazement because they know that these kids are very effective readers. (Wouldn’t it be wonderful if we could somehow collect and share hundreds of these stories?)

2. I’ve seen some eye-movement comparisons of effective readers and ineffective readers reading normal book-based or paper-based text of appropriate level and standardised test texts. The evidence is that reading the standardised test is a substantially different process from reading a “normal” text.

3. One school I visit is worried that its kids did poorly in the spelling section of NAPLAN, yet in their writing at school they clearly demonstrate high spelling ability. Have you looked at the spelling section of NAPLAN? It’s not a test of the kind of spelling knowledge that supports written communication.

Brian reckons that if he could find $50K for the eye-movement technology mentioned in #2 above, he could collect some VERY HARD data which shows just how invalid NAPLAN is. Anybody know a mining magnate or CSG operator who could spare this amount out of Petty Cash?

He believes that there is a need for detailed research into the nitty-gritty details of the processes and assumptions underpinning the construction, use, application and scoring of NAPLAN-type tests.

Seriously, if you have the address of a philanthropic rich person who is looking for a children-benefit project…why not send him this Treehorn and highlight this section? You never know! It seems that SOMETHING MUST BE DONE.

Thanks to Dr. Marian Lewis of USQ, support for this view comes from the U.S…..

Whoever Said There’s No Such Thing As a Stupid Question Never Looked Carefully at a Standardized Test

It can’t be repeated often enough: Standardized tests are very poor measures of the intellectual capabilities that matter most, and that’s true because of how they’re designed, not just because of how they’re used. Like other writers, I’ve relied on arguments and research to make this point. But sometimes a telling example can be more effective. So here’s an item that appeared on the state high school math exam in Massachusetts:

n    1   2   3   4   5   6

tₙ   3   5   __  __  __  __

The first two terms of a sequence, t₁ and t₂, are shown above as 3 and 5. Using the rule: tₙ = (tₙ₋₁) plus (tₙ₋₂), where n is greater than or equal to 3, complete the table.

If (a) your reaction to this question was “Huh??” (or “Uh-oh. What’s with the teeny little n’s?”) and (b) you lead a reasonably successful and satisfying life, it may be worth pausing to ask why we deny diplomas to high school students just because they, too, struggle with such questions. Hence [Deborah] Meier’s Mandate: “No student should be expected to meet an academic requirement that a cross section of successful adults in the community cannot.”

But perhaps you figured out that the test designers are just asking you to add 3 and 5 to get 8, then add 5 and 8 to get 13, then add 8 to 13 to get 21, and so on. If so, congratulations. But what is the question really testing? A pair of math educators, Al Cuoco and Faye Ruopp, pointed out how much less is going on here than meets the eye:

The problem simply requires the ability to follow a rule; there is no mathematics in it at all. And many 10th-grade students will get it wrong, not because they lack the mathematical thinking necessary to fill in the table, but simply because they haven’t had experience with the notation. Next year, however, teachers will prep students on how to use formulas like tₙ = tₙ₋₁ + tₙ₋₂, more students will get it right, and state education officials will tell us that we are increasing mathematical literacy.[1]
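To see how little the item asks for, the rule can be followed mechanically in a few lines of Python. This is only an illustrative sketch: the function name `complete_table` and its parameters are hypothetical and come from neither the exam nor the article. It starts from the two given terms and keeps adding the previous two terms together until the table is full.

```python
def complete_table(t1, t2, length):
    """Fill in a sequence where each term is the sum of the two before it.

    t1, t2 -- the two given starting terms (3 and 5 in the exam item)
    length -- how many terms the table should hold (6 in the exam item)
    """
    terms = [t1, t2]
    # Apply the rule t_n = t_(n-1) + t_(n-2) for n >= 3.
    while len(terms) < length:
        terms.append(terms[-1] + terms[-2])
    return terms[:length]


print(complete_table(3, 5, 6))  # prints [3, 5, 8, 13, 21, 34]
```

The loop contains no judgement at all, only repeated addition, which is the sense in which the problem rewards following a rule rather than mathematical thinking.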

In contrast to most criticisms of standardized testing, which look at tests in the aggregate and their effects on entire populations, this is a bottom-up critique. Its impact is not only to challenge the view that such tests provide “objective” data about learning but also to jolt us into realizing that high scores are not necessarily good news and low scores are not necessarily bad news.

If the questions on a test measure little more than the ability to apply an algorithm mindlessly, then you can’t use the results of that test to make pronouncements about this kid’s (or this school’s, or this state’s, or this country’s) proficiency at mathematical thinking. Similarly, if the questions on a science or social studies test mostly gauge the number of dates or definitions that have been committed to memory — and, perhaps, a generic skill at taking tests — it would be foolish to draw conclusions about students’ understanding of those fields.

A parallel bottom-up critique emerges from interviewing children about why they picked the answers they did on multiple-choice exams — answers for which they received no credit — and discovering that some of their reasons are actually quite sophisticated, which of course one would never know just by counting the number of their “correct” answers.[2]

No newspaper, no politician, no parent or school administrator should ever assume that a test score is a valid and meaningful indicator without looking carefully at the questions on that test to ascertain that they’re designed to measure something of importance and do so effectively. Moreover, as Cuoco and Ruopp remind us, rising scores over time are often nothing to cheer about because the kind of instruction intended to prepare kids for the test — even when it does so successfully — may be instruction that’s not particularly valuable. Indeed, teaching designed to raise test scores typically reduces the time available for real learning. And it’s naïve to tell teachers they should “just teach well and let the tests take care of themselves.” Indeed, if the questions on the tests are sufficiently stupid, bad teaching may produce better scores than good teaching.


1. Cuoco and Ruopp, “Math Exam Rationale Doesn’t Add Up,” Boston Globe, May 24, 1998, p. D3.

2. For examples (and analysis) of this kind of discrepancy, see Banesh Hoffmann, The Tyranny of Testing (New York: Crowell-Collier, 1962); Deborah Meier, “Why Reading Tests Don’t Test Reading,” Dissent, Fall 1981: 457-66; Walt Haney and Laurie Scott, “Talking with Children About Tests: An Exploratory Study of Test Item Ambiguity,” in Roy O. Freedle and Richard P. Duran, eds., Cognitive and Linguistic Analyses of Test Performance (Norwood, NJ: Ablex, 1987); and Clifford Hill and Eric Larsen, Children and Reading Tests (Stamford, CT: Ablex, 2000).

– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –

Next Issue

Since the holiday period allows more time for deep professional reading, I hope to put together a list of appropriate readings that you might enjoy. Aren’t we lucky these days to have so much available at our fingertips [so to speak] that gives meaning and pride to the teaching-learning enterprise?

Don’t Forget

Can you send a question that Treehorn will put on his test for his adult ‘carers’? And if you do know the address of a magnate or person who might sponsor research into the aspects of NAPLAN mentioned above, you can send it to me, if you prefer. I shall send it on. It could help our kids and show them that we like them.

Phil Cullen

41 Cominan Avenue

Banora Point  2486

07 5524 6443
