OfSTED, the Panama Canal and the end of ideology

[OfSTED = Office for Standards in Education. The English schools inspection service.]

You’ll have to bear with me on this one. The dots do link up, I promise – in an entertaining way.

First, some evidence.


After 25 years of failure and humiliation the French pulled out of building the canal; not even Ferdinand de Lesseps, who had built the stupendous Suez Canal, could finish it. Yellow fever, lakes, rivers, jungle. Enter Roosevelt and the Americans. Turns out it was not a waterways challenge at all. It was a tripartite challenge of medical epidemiology, politics and, crucially, railways! The canal opened in August 1914, an overlooked triumph in the very month that Europe fell into a moral abyss with the outbreak of the First World War.


Test developer: Minister, how many children do you want to succeed this year?

Minister: Fewer than last year! We were taken to the cleaners by the press for how easy the exams were. What were you doing?!

Test developer: Okay, we’ll raise the pass mark and moderation criteria.


Test developer: Minister, how many kids do you want to pass this year?

Minister: Are you kidding?! Loads! I was excoriated last year because so many failed. What did you do?!! Lower the pass mark – PLEASE!


… plus ça change

…and we all thought educational ‘standards’ were part of the laws of physics…



The turn of the 20th Century in the United States was a pivotal moment for the world we find ourselves in today. Here was the birth of American Progressivism and the beginning of the end of political ideology – Left versus Right, and so on. Progressivism was rooted in the belief that technical mastery could overcome nature and lead to perfection – a secular kind of redemption. Science was revered; intellectualism held in deep suspicion.

Progressivism was, perhaps, best exemplified in the building of the Panama Canal under President Theodore Roosevelt. The French had failed over 25 years to build the canal, succumbing to mosquito-borne diseases, a gigantic lake in the middle of its path, and financial corruption, for which several of those involved (including Gustave Eiffel) were convicted. In its day, this was a challenge equal to building the International Space Station. It was accomplished through the creative energy of a doctor who, by observing their behaviour, understood how to eliminate mosquitoes in a quarter-mile path; an engineer who realised that building the canal was a railway project, not a waterway project (i.e. for logistics and to overcome the inaccessibility of the jungle); and politicians who negotiated the illegal creation of a new country to allow for untrammelled access.

The triumph of the canal was technical, of the highest order. Its locks were among the largest concrete structures built up to that time, and the same concrete still forms the three super-massive sets of locks today, each the size of the Empire State Building on its side.

[Image: the Miraflores Locks]

But that was merely the underpinning for Roosevelt’s real aspiration. This was a moral challenge, to demonstrate supremacy over the drag of history; the capacity to turn adversity into excellence; the reaffirmation of the human species as a transcendental ape. This was moral progress disguised as technical accomplishment. One step further from the apes.

Progressivism has been called the unification of “idealism with efficiency” – each of those feeding into the other. It marked out the clearest cultural difference between the USA and the UK over the course of the 20th Century. The US saw no barriers to ambition and social or scientific advancement; the UK, rooted in centuries of class differentiation and a deep embeddedness of social inertia, settled for less, resigned to the inevitable falling short of the ideal. Actually, some Victorians, convinced that poverty was the inevitable result of delayed evolution (of the poor classes), saw no urgency in its relief – but still built philosophical institutes and concert gardens, for why should the poor starve without the benefit of culture? The genius of Barack Obama was to tap into the historical belief and commitment that ‘yes we can’, while the UK settled for Macmillan’s lugubrious ‘you’ve never had it so good’, meaning: take what you’re given and build your expectations around that. Obama’s was a celebration; Macmillan’s was a lament.

Progressivism grew in strength and was expressed across decades in breathtakingly ambitious legislative programmes – to rid society of the scourge of prostitution, alcohol, poverty, school underachievement, racism – to launder society towards a utopia that was within a generation’s grasp. Lyndon Johnson openly declared “war on poverty” – and no American president enters a war without a firm expectation of victory.

The unification of idealism with efficiency.

Progressivism breached the shores of the UK under Tony Blair. It was he who saw through the miserable defeatism of One Nation Toryism (the rich patronising the poor) and chose the American way. He was obsessed with both efficiency and idealism – with the commitment to excellence, to triumph over deficits of any kind, with an intolerance of shortcoming or shortfall in ambition. Ideological differences were irrelevant and belonged to the old social divisions in which we ‘settled for less’. Like the Panama Canal, the process was technical and the goal was moral advancement; we did not need gloomy philosophers, only pragmatic managers. It seemed a spontaneous outburst when, in an early interview of his premiership, he announced that he would retain OfSTED under the infamous Chris Woodhead. But the commitment was pre-prepared, in the sense that OfSTED was already part of his philosophy. This was the compost in which OfSTED flourished. It would police compliance with the agreed goals and ‘standards’, measure performance, ensure testing was rigorous, impose measures of competence/incompetence, good/bad and indifferent – and so on.

At the same time, Blair announced an end to ideology in education – a linked statement. There were no arguments to be had about educational purposes. These were set, clear and uncontestable. The issues were all technical. OfSTED was the ringmaster of efficiency and technical mastery. Quality of teaching? Develop an algorithm. Learning? A matter for transmission and reception. Knowledge to be taught? Irrelevant, so long as it is at the correct ‘level’. Underachievement? Tweak the engine. If you argue with the goal consensus you are a ‘dissident’. Response to dissidence? Discipline. This put the levers of inspection in OfSTED’s hands and protected them from argument. If OfSTED said your school was ‘failing’, that was that. No argument – no appeal. The method is the method, the measure is the measure.

Of course, this is all airy hopefulness – though many educational theorists came on board to provide the intellectual justification, as we (I am one) always will – if that’s where the research grants are. But there is also enough research experience to argue that there is only an approximate relationship between what is taught and what is learned; that a class of 30 students inevitably means 30 individual worlds of meaning and learning; that every act of teaching is an experiment for the teacher to observe and learn from; that teaching quality is a matter of weeks’, months’ and years’ exposure; and that whatever we think we mean by a ‘good class’ or a ‘good lesson’ allows no prediction of any kind about the quality of learning that goes on. There is evidence that the kind of knowledge involved in repetition and testing tends to be short-lived, whereas the learning that is embedded and lasts is that which sediments over time. Comparing two schools is as futile and artificial as comparing two families – it’s just too complex. And we have a wealth of research experience on testing which suggests that it is not possible to develop a test that has no cultural or socio-economic bias. The best test developers (most are in the USA) concede two crucial insights: (i) that a test can only reliably test the ability to take a test; and (ii) that, like a car’s MOT test, the result of an educational test has a very short half-life: whatever it might say decays rapidly over time and space; what is true here may not be true there; what is true today may not be true tomorrow. If you want to be shocked into an argument about this, look here:


But here is the killer: there is a strong argument made by leading test developers (people like Lee Cronbach and Gene Glass) that it is not possible validly to set a cut-off score – a fixed line between grades or even between ‘pass’ and ‘fail’. Here is Gene Glass in a paper lamenting the corruption of educational testing by those looking for scores of ‘good/bad’, ‘competent/incompetent’ or ‘pass/fail’.

“In education, one can recognize improvement and decay, but one cannot make cogent absolute judgments of good and bad…”

For those who want to take this a little deeper, read this quote from that paper – others, just skip it.

I am confident that the only sensible interpretations of data from assessment programs will be based solely on whether the rate of performance goes up or down. Interpretations and decisions based on absolute levels on performance on exercises will be largely meaningless, since these absolute levels vary unaccountably with exercise content and difficulty, since judges will disagree wildly on the question of what consequences ought to ensue from the same absolute level of performance, and since there is no way to relate absolute levels of performance on exercises to success on the job, at higher levels of schooling, or in life. Setting performance standards on tests and exercises by known methods is a waste of time or worse.

All parents know that the difference between an E (fail) and a D (pass) can lie, distressingly, within the margin of error. But the surprising thing is that under experimental conditions the gap between an A and an E may not be much wider than that margin of error. What Glass argues is that the only valid and useful result of testing is to show movement from one test to the next: is the student doing better or worse, and what does that mean? Indeed, common practice in New Zealand schools (which do not have the kind of high-stakes testing found in England), and especially in schools which teach the International Baccalaureate’s Primary Years Programme, is to use student assessment as the basis for a discussion between teacher and student about that student’s progress.
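For anyone who wants to see why a boundary grade is so fragile, here is a rough sketch. It assumes, purely for illustration, that a pupil’s observed score is their ‘true’ score plus a normally distributed error whose spread is a hypothetical standard error of measurement (`sem`). The function names, the cut-off of 40 marks and the `sem` of 4 marks are my own inventions, not figures from Glass or from any real examination.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, built from math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def misgrade_probability(true_score: float, cut_score: float, sem: float) -> float:
    """Chance that measurement error puts a pupil on the wrong side of a cut-off.

    Illustrative model only (an assumption, not any exam board's method):
    observed score = true score + error, where the error is normally
    distributed with mean 0 and standard deviation `sem`.
    An observed score at or above `cut_score` is recorded as a pass.
    """
    z = (cut_score - true_score) / sem
    if true_score >= cut_score:
        # Truly a pass: wrongly failed if the observed score falls below the cut.
        return normal_cdf(z)
    # Truly a fail: wrongly passed if the observed score reaches the cut.
    return 1.0 - normal_cdf(z)


# Hypothetical D/E boundary at 40 marks, hypothetical SEM of 4 marks.
for true_score in (36, 38, 42, 44):
    print(true_score, round(misgrade_probability(true_score, 40, 4), 2))
```

Under these made-up numbers a pupil who truly sits two marks above the boundary (42) has roughly a three-in-ten chance of being recorded as an E, and even four marks away (36 or 44) the chance is still about one in six.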

[For the real aficionados of detail I include a longer extract from Glass’s paper at the end of this blog. It’s not too hard at all for the lay person to follow and it holds surprises.]

So, with Blair, OfSTED washed up on the shores of redemption and set to work. It calculated that there were 13,000 “incompetent teachers”, no matter that measures of competence are momentary inventions and have nothing to do with professional judgement, wisdom or mastery. Schools fail against standards of achievement! This ignores the fact that a standard is set by politicians and doesn’t arrive on a deep-space asteroid as a detached, universal, irrefutable law of nature.

Where are we now?

OfSTED inspectors go into a school, spend minutes in occasional classes, have fleeting conversations with students, glance at their work, pore over documents – and in two days they are done-and-dusted. All the complexities of the people, the classrooms, the educational philosophies and learning, the professional experience built up over years, the relationships, experiments, long-standing practitioner discussions, compromises, agreements and arguments, mysteries, learning trajectories, knowledge conflicts, ethnic differences, religious and cultural understandings, social media exchanges, friendship networks, mental illnesses, sexuality confusions, family breakdowns and bankruptcies, and so on and so on and so on….all these are reduced to the grotesque absurdity of single judgements. This school (what does THAT mean??) is failing or is satisfactory (what does THAT mean?). This ‘leadership’ is effective or not. These students (which students?) are not being well-served.

And here is the point of this blog: there are educational arguments to be had over purposes and over what counts as educational quality. OfSTED closes down discussion at precisely the point where we need it, where evidence from many sides points to multiple narratives about school quality. The aim of laundering the school system of underachievement is as hopelessly misguided as the Progressive notion that society can be cleansed, that nature can be mastered and that ideology – the clash of beliefs and values – can be eliminated. There are left-wing and right-wing approaches to education (which I explore in another, related blog) and it is unwise and unjust to suppress them. Progressivism is to be embraced for its ambition and its optimism, but held in deep scepticism where it produces an instrument like OfSTED. Society is not like a canal, where starting point and destination can be known and where there exist concrete measures of quality and success (though whether the canal was worth the aberration of creating an artificial and undemocratic country remains an open question).

As a final note in this long piece, it is worth looking at the case of New Zealand. There, in place of school inspection, there is school ‘review’. This is a collaborative process in which the school is engaged in a discussion and a negotiation over its strengths and weaknesses. Schools which understand the possibilities can use the process to be transparent and self-reflective without the anxieties, suicides and manic compliance that come with OfSTED. New Zealand’s Education Review Office is based on reasonableness and a symmetrical relationship, whereas OfSTED’s foundations are fear and punishment. Progressivism has its dark side.

From Gene Glass (1978). Standards and criteria. Journal of Educational Measurement, 15(4), 237–261.

Standards in Common Parlance

Setting standards or mastery levels is frequently written about as though it is a well-established and routine phase of instructional development. In conversations with measurement specialists and instructional development experts over the past few years, I have been literally dumbfounded by the nonchalance with which they handle the standards problem. One will report that he always sets a standard of two-thirds of the items correct for mastery because he’s a sort of “liberal guy.” Another expert will report that he holds learners to 70% mastery, and a third advances his 90% standard with an air of tough-mindedness and respect for excellence. None of them bothers with such apparently extraneous considerations as how the test items are to be composed and whether they will be abstruse or obvious. In one of the sacred writings of the instructional objectives movement, Robert F. Mager (1962) identified standard setting as an integral part of stating an objective properly:

If we can specify at least the minimum acceptable performance for each objective, we will have a performance standard against which to test our instructional programs; we will have a means for determining whether our programs are successful in achieving our instructional intent. What we must try to do, then, is indicate in our statement of objectives what the acceptable performance will be, by adding words that describe the criterion of success. (p. 44)

Mager went on to illustrate what he meant by a behavioral objective and its associate standard: 

  • The student must be able to correctly solve at least seven simple linear equations within a period of thirty minutes. 
  • Given a human skeleton, the student must be able to correctly identify by labeling at least 40 of the. . . bones; there will be no penalty for guessing. 
  • The student must be able to spell correctly at least 80 percent of the words called out to him during an examination period. (p. 44)

This language of performance standards is pseudoquantification, a meaningless application of numbers to a question not prepared for quantitative analysis. A teacher, or psychologist, or linguist simply cannot set meaningful standards of performance for activities as imprecisely defined as “spelling correctly words called out during an examination period.” And, little headway is made toward a solution to the problem by specifying greater detail about how the questions, tasks, or exercises will be constructed. 
Can a more meaningful performance standard be stated for an objective as molecular as “the pupil will be able to discriminate the grapheme combination ‘vowel + r’ spelled ‘ir’ from other graphemes”? Can it be asserted confidently about this narrow objective that a pupil should be able to make 9 out of 10 correct discriminations? In point of fact, this objective appears on the Stanford Reading Test where it is assessed by two different items: 

a) Mark the word “firm” (Read by proctor)

firm form farm

b) Mark the word “girl” (Read by proctor) 

goal girl grill

The percentages of second-grade pupils in the norm population answering items a) and b) correctly were 56% and 88%, respectively. Any performance standards (e.g., “8 out of 10 correct”) for a group of items like item a would be quite inappropriate for a group of items like item b, since they are so different in difficulty. Results from a grade seven assessment by the Department of Education in New Jersey illustrate the same point. Pupils averaged 86% on vertical addition, but only 46% on horizontal addition. The vagaries of teaching and measurement are so poorly understood that the a priori statement of performance standards is foolhardy.
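Glass’s two reading items can be turned into a little arithmetic (this is my illustration, not part of his paper). The sketch asks how often a pupil would meet the standard “8 out of 10 correct” on ten items like a) and on ten items like b), under a loud simplifying assumption: the pupil answers every item independently, with exactly the norm group’s success rate (56% or 88%). Real pupils and real items are not like that, and the function name `pass_rate` is hypothetical; the point is only the size of the gap.

```python
from math import comb


def pass_rate(p_correct: float, n_items: int = 10, pass_mark: int = 8) -> float:
    """Probability of scoring at least `pass_mark` out of `n_items`.

    Simplifying assumption: each item is answered correctly independently,
    with the same probability `p_correct` (a binomial model).
    """
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


# Success rates taken from Glass's norm-group figures for items a) and b).
for label, p in [("items like a)", 0.56), ("items like b)", 0.88)]:
    print(f"{label}: {pass_rate(p):.0%} chance of meeting '8 out of 10'")
```

Under those assumptions the same “8 out of 10” standard is met about 11% of the time on items like a) and about 89% of the time on items like b). That is Glass’s point in numbers: the verdict depends as much on which items happened to be written as on the pupil.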
