Why Is Australia Following A Flawed Model?

Distinguished Guest Writer

David Hornsby. One of Australia’s greatest advocates for the rights of children in the school learning context, David was with the Ministry of Education (Victoria) for 28 years and taught every year level. For 4 years, he was a curriculum consultant in primary and secondary schools and then returned to the primary classroom. During that time, he also lectured at La Trobe University and RMIT University. He was a principal for 5 years but now works as a curriculum consultant. David has completed many lecture tours of the United States and the United Kingdom. He has also worked with teachers in Costa Rica, Hong Kong, Indonesia, Malta, New Zealand and Singapore. This year, he has been asked to work with teachers in Milan, Hong Kong, Beijing and Shanghai.

David has written or co-authored many popular books, including Write On, Read On, Novel Approaches, Sounds Great, Planning for English, Planning Curriculum Connections and others. A Closer Look at Guided Reading won “The 2001 Australian Awards for Excellence in Educational Publishing.” His latest book is Teaching Phonics in Context (co-authored with Lorraine Wilson).

His insightful views on the elements of testing as used in a realistic classroom setting just cannot be ignored. That is why he has asked the Senators and the Australian educational community…..

Phil Cullen



by David Hornsby 


Thank you for the opportunity to write a submission for the Senate Inquiry. I am writing as an educator, but also as a parent and grandparent. I have been an educator for 48 years and I’m passionate about quality teaching and learning, which includes rigorous assessment. Sadly, NAPLAN is not rigorous; it is deeply flawed.

In 48 years in education, I’ve never seen anything as destructive to quality teaching and learning as NAPLAN.

In view of the short timeline for writing submissions, I am focusing on just a few of the problems with NAPLAN. Other serious problems with NAPLAN have been identified in eighteen 2-page papers available at: http://www.literacyeducators.com.au/naplan

I urge you to refer to those papers. It is significant that over 130 academics from around Australia signed a “Letter of Support” to accompany those papers (letter attached).

Clearly, there is broad concern about the serious problems with NAPLAN.

 A. The score on a NAPLAN test does NOT tell you about student achievement.

When testing literacy or numeracy, there are thousands of possible test items from which test designers can select. In NAPLAN, only 40 items are selected from the thousands available, so they cannot possibly represent all topics. By chance, some students will be advantaged because the selected items test topics they understand; by chance, other students will be disadvantaged because the selected items test topics they haven’t experienced or don’t understand. (Indeed, which curriculum is NAPLAN testing? Different States and Territories have adapted the Australian Curriculum in different ways. Some schools are not doing the Australian Curriculum at all; they may be International Baccalaureate schools or Steiner Schools or any number of other school types. We have always valued diversity in Australia. Is this no longer the case?)

So the 40 items selected do not cover all topics. To complicate this, there is an assumption that the topics tested are tested well. However, we know this is not the case. Indeed, it’s frightening to know how many of the items are bad items.

Polster & Ross (2013) have evaluated the test items in the 2012 numeracy test. They found ambiguity in test items, ‘archetypally bad’ tag questions, contrived contexts and contrived wording of questions. There were many questions on which the students performed poorly, but “it is often difficult to be sure why.” The authors accuse ACARA of poor accountability. Who is testing the test items and who is testing the testers?

Buchanan & Bartlett (2012) have shown that the spelling assessment items are not valid or reliable measures of students’ spelling. They question the usefulness of NAPLAN in improving student achievement. Snowball (2012) addresses the misleading information provided to teachers about spelling after the NAPLAN testing. “It is bad enough that the NAPLAN tests are such poor measures of students’ ability to spell, but the information distributed with the results also shows a lack of knowledge about English orthography and the strategies used by competent spellers.”

Mueller and her colleagues (2012) have written about their concerns regarding the NAPLAN tests of language conventions. After examining the sections relating to grammar and punctuation, they wrote, “an analysis of the 2008-2012 tests reveals that they do not provide an appropriate platform on which teachers and their students can build a sophisticated understanding of English grammar and punctuation.” Grant & Mueller (2010) wrote, “Many items are unclear as to their purpose, or test meaning rather than grammar, punctuation or spelling.”

Wilson (2012) has shown that the NAPLAN reading tests fail to assess what is being taught and how it is being taught. Wilson highlights the differences between the narrow NAPLAN view of reading and the kinds of reading required by 21st century readers.

Clearly, the tests only test a very small part of any one curriculum area, and the few topics tested are not tested well because the test items are often poorly written, or ambiguous, or they don’t test the skill or concept they intend to test.

There’s another important reason why we can’t use NAPLAN results to tell us about student achievement – the errors of measurement are unacceptably high (see Wu, 2010).

Important questions to ask: Why aren’t the measurement errors of NAPLAN published? Why is the Technical Report only available through Freedom of Information? Is there something to hide? Polster & Ross are right to ask, “Who is testing the testers?”

 B. Timed, standardised tests are NOT able to assess important learning.

Timed tests actually advantage students who are happy to race through a test and give superficial, quick responses just so that they get finished. On the other hand, students who are thoughtful and reflective are disadvantaged because they want to consider each alternative of a multiple-choice item carefully. Because the test is timed, they may not get finished. “Studies of students of different ages have found a statistical association between students with high scores on standardized tests and relatively shallow thinking” (Kohn, 2000). If students had more time, they might have answered more items correctly. If they’re not all getting the time they need to be thoughtful and reflective, are we measuring what they really know? Are we measuring what we value?

Some adults in Melbourne were asked to complete the Year 3 reading test in 2010. It took them 35-40 minutes. The 8-year-old students had only 40-45 minutes. The amount they had to read, in the time given, was excessive. We also need to ask if 8-year-olds can concentrate on an independent task like a standardised test for 40 to 45 minutes. Are the NAPLAN tests really designed to assess students’ abilities as fairly and accurately as possible?

 C. NAPLAN tests are not diagnostic tests.

The standardised NAPLAN tests give population data, not individual data. The population data from NAPLAN tests can be used to answer questions such as:

– overall, are boys doing better than girls?

– overall, are urban students doing better than rural students? and so on.

It should be noted, of course, that we could answer these questions by testing a sample of students every 3 or 4 years. Testing every student in Years 3, 5, 7 & 9 every year is a massive waste of taxpayers’ money.

Schools have been bullied by bureaucrats in Regional Offices and higher levels of the education system. They have been required to spend hours and hours analysing NAPLAN data as if it were diagnostic data. The level of ignorance is astounding, and bureaucrats get away with the bullying because most people, including teachers, do not have statistical literacy.

Because NAPLAN tests are not diagnostic, the data do not help teachers plan for individual students. On a 40-item test, one item about the appropriate use of a comma does not assess a student’s knowledge and understanding about comma use! One item on subject-verb agreement does not assess a student’s knowledge and understanding about subject-verb agreement!

Even if the tests were diagnostic, the results would be useless because teachers get them 5 months later. ACARA acknowledges this and has said that it is planning to put the tests online so that results can be provided more quickly. Teachers wonder how that would be possible. In Victoria, when the education department tried to connect teachers online, its new and expensive software system collapsed, and that was only one State. How does ACARA believe that all students across the whole country can be tested online at the same time? Even if programmers developed an incredibly advanced system, schools don’t have a computer for every child! Unless there is massive investment in upgrading school computers and providing one computer for every child, full-scale online assessment will be impossible.

 D. The average result for a school cannot be used to evaluate the teachers or the school, and schools are not the only influence on test scores.

Harris et al. (2011) remind us that test scores cannot tell you whether a school is good or bad because schools are not the only influence on test scores. Teese (2012) reported that the country’s top 100 primary and secondary schools have students from well-to-do suburbs. He claimed that, “It’s not an even playing field in which talent can blossom from whatever location – it’s people excelling through social advantage.”

Ocean (2012) asked pre-service teachers to analyse NAPLAN results and compare them to the wealth of the school’s parents. They were all greatly concerned when they realised the truth of Teese’s claim that there is “not an even playing field” and shocked to discover how NAPLAN compounds the lack of fairness.

Dinham (2012) says, “We cannot ignore the effects on learning and development of socioeconomic status, family background, geographic location and the funding and resources available to schools. Every teacher is not going to be able to bring every student to an average or above average level of performance but the vast majority of teachers will try very hard to do this.”

Berliner (2009) has shown that powerful out-of-school factors greatly influence achievement gaps. These factors are “related to a host of poverty-induced physical, sociological, and psychological problems that children often bring to school…” Clearly, NAPLAN results alone cannot be used to evaluate teachers or schools. Teachers and schools have no control over many of the factors that negatively impact on their students’ potential.

UNICEF (2007) also acknowledges the relationship between test scores and poverty. Perhaps it would be more appropriate to talk about a “poverty gap” than an “achievement gap”.

Given that there are many factors, including out-of-school factors, affecting students’ NAPLAN scores, it’s dishonest to link the results to high-stakes issues.

Publicly, I’ve heard the CEO of ACARA say that NAPLAN is only a snapshot of students’ learning. However, these same “snapshot results” are then used to give a school ‘red’ or ‘green’ on the MySchool website and to make statements about the quality of teachers and schools. Appallingly, parents are also told that they should use the flawed information on the MySchool website to make decisions about school choice. I believe that parents are actually being misled.


I have commented on only four of the problems with NAPLAN.

All are serious problems.

  1. A student’s score on a NAPLAN test does not tell us anything worth knowing about that student. Many test items are not testing what they are claiming to test. The tests are unreliable and have unacceptably high errors of measurement (see Wu 2010; Hornsby & Wu 2012).
  2. Timed tests are not able to assess important learning. Indeed, timed tests advantage superficial, shallow thinking and disadvantage critical and reflective thinking.
  3. NAPLAN tests are standardised tests that provide population data; they are not diagnostic tests and they cannot provide valid or reliable data about individual students.
  4. Teachers and schools are not the only influence on test scores, so test scores cannot establish teacher or school accountability. Many academics, including statistician Prof Margaret Wu, have repeatedly warned about the abuse of statistics for inappropriate purposes. However, the abuse continues, and unreliable NAPLAN test scores are still linked to high-stakes issues such as funding and teacher/school accountability.

Finally, I hope the Senate Inquiry considers these questions:

How is it that Finland, one of the top education systems in the world, achieves such remarkable standards without the use of standardised tests?

Why is Australia following the flawed US model instead?


[For print-out to distribute this article with its full bibliography please go to list of Senate Inquiry submissions]

 DID YOU KNOW? A talk about changing paradigms in education by Sir Ken Robinson, another opponent of standardised blanket testing, has been viewed by over 300 million people.

Phil Cullen,

[Former Director of Primary Education, Qld],

41 Cominan Avenue, Banora Point 2486

07 5524 6443


