Examination of Witnesses (Questions 1
MONDAY 10 DECEMBER 2007
Q1 Chairman: May I welcome Professor
Sir Michael Barber and Professor Peter Tymms to the first evidence
session of the new Committee? We have been busily building the
team, seminaring and deciding our priorities for investigation,
but this is our first proper session, so thank you very much for
being able to appear before us at reasonably short notice. Both
of you will know that our predecessor Committee started an inquiry
into testing and assessment. It was a quite different Committee,
but with its interest in schools, it decided to embark on a serious
investigation into testing and assessment. It managed to tie up
with a nice little bow almost every other area through 11 different
reports in the previous Parliament, but it could not conclude
this one. It troubled people to the extent that copious volumes
of written evidence had come to the Committee, and it would seem
wrong if we did not make such an important issue our first topic,
pick up that written evidence, slightly modify and expand the
terms of reference and get on with it. So, thank you very much
for being here. You are key people in this inquiry: first, Michael,
because of your association with testing and assessment, through
which many of us have known you for a long time, right back to
your National Union of Teachers days; and secondly, Professor
Tymms, through your career in a number of institutions, where
we have known you, and known and admired your work. We generally
give witnesses a couple of minutes to make some introductory remarks.
You know what you have been invited to talk about. If you would
like to have a couple of minutes (not too long, although
a couple of minutes is probably a bit short) to get us started,
then I shall start the questioning. Peter, you were here first,
so we shall take you first.
Professor Tymms: I am director
of a centre at the University of Durham which monitors the progress
of children in order to give schools (not anybody else) good
information. It provides us with a tremendous database from which
to view other issues, meaning that I have taken an interest in
all the different assessments (key stage and so on). From those
data, we have concluded that standards in reading have stayed constant
for a long time, but that in mathematics they have risen since
about 1995. Those are the headlines on testing. On the introduction
of new policies, I am keen to say (I might return to this) that
there is a need for good trials. If we try something new, we should
get it working before we move it out to the rest of the public.
I am very keen for new ways of operating to be properly evaluated
before they are rolled out, and then to be tracked effectively.
We have been missing that.
Chairman: Thank you.
Sir Michael Barber: Thank you
very much for your invitation, Chairman. I shall comment on the
story of standards in primary schools, which I see in four phases.
The first came between 1988 and 1996, when the then Conservative
Government put in place the national curriculum, national assessment,
Ofsted inspections, league tables and the devolution of resources
to schools. There were lots of ups and downs in that story, but
nevertheless that framework was established. Secondly, there was
the phase with which I was associated: Government policy
under David Blunkett, the then Secretary of State for Education
and Employment, during which there was a focus on what we
called standards, rather than on structures. A big investment
in teachers' skills, through the national literacy and numeracy
strategies, led to rises in the national test results. I have
always accepted that some of that was down to teaching to the
tests, but a lot of it was down to real improvements evidenced
by Ofsted data and international comparisons. In the third phase,
between 2000 and 2005, the Government were focused largely on
long-term, underpinning and structural reforms, including reform of the
teaching profession and of secondary education, and the introduction
of the children's agenda, at which stage results plateaued. Things
got harder, too, because we had picked the low-hanging fruit,
as it were. I think that we should have stayed much more focused
on literacy and numeracy, in addition to the other things that
we did. That was my error. Now there is an opportunity to make
real progress on literacy and numeracy as a result of the Rose
review last year and the new emphasis on phonics. By the way,
I completely agree with Peter on the pilots and progression. If
all those things are put together, I could envisage a fourth stage,
during which we can begin to make progress. In summary, we have
gone from being below average, on international comparisons, to
above average: we are above France, Scotland and the EU average.
However, we have a long way to go and significant improvements
to make. If we want to be world class, we must do more.
Q2 Chairman: Thank you for those
introductory remarks. I remember taking the Committee to New Zealand
where people wanted to be able to assess more carefully the progress
of students and were looking at what we had done. I recall their
horror when it was suggested that they might adopt our system.
They said, "We want to know how our young people are doing,
but we do not want to go to the extent that you are of testing
at so many ages." Are you sympathetic to that point of view?
Do you think that we over-test?
Sir Michael Barber: Personally,
I do not think that we over-test in primary schools, if that
is what you are talking about. Primary school children take literacy
and numeracy tests aged seven and externally-set and marked literacy,
numeracy and science tests aged 11. That is a relatively small
number of tests during a six-year primary school career. The information
provided by the tests is fundamental to understanding how the
system is working and to looking for strategies for future improvements.
I do not think that we over-test at all.
Q3 Chairman: Even if that adds up
to ages seven, 11, 14, 16, 17 and 18?
Sir Michael Barber: I focused
my answer on primary schools. There is a separate debate to be
had about secondary examinations and tests at ages 14, 16, 17
and 18. However, at primary level, we conduct the bare minimum
of testing if we want to give parents, the system, schools and
teachers the information that they need, at different levels,
in order to drive through future improvements. One of the benefits
of 10 years, or so, of national assessments is that this system
has better information with which to make decisions than many
others around the world.
Professor Tymms: I do not think
that testing at seven and 11 is too much testing. However, if
you have a system in which you take those tests, put them into
league tables and send Ofsted inspectors in to hold people accountable,
schools will test a lot more. So we probably do have too much
testing in the top end of primary schools, but that is not statutory
testing. It is the preparation for the statutory testing, so it
is a consequence of what is happening. Of course, we do need the
kind of information that those tests were designed to get at.
You mentioned the need to know what our children are doing and
their levels. If we wanted to know the reading standards of 11-year-olds
in this country, we could probably find out by assessing 2,000
pupils picked at random. We do not have to assess 600,000 pupils.
One purpose is to know what the levels are, which could be done
with a sampling procedure, with the same tests every year, which
would be secret and run by professionals going out and getting
the data. There is another kind of information, for teachers about
their pupils, which they could get by their own internal tests
or other tests if they wanted, and another kind of information
for parents. There is an interface: how do they get that information?
Do they go to the schools, or do they read it in their newspapers?
Do they know about their own children? Those layers of information,
and how to get them, provide the complex background to the answer
to your question. There is too much testing, but not because of
a single test at 11; for goodness' sake, children can do
that. I think that I was tested every two weeks when I was about
eight years old, and I quite enjoyed them. Not all children do,
but the possibility of that exists. We need good information in
the system for parents, teachers and Parliament, and we need to
know it nationally, but we do not necessarily have to do the sort
of testing that we currently have to get that information. There
are different purposes and reasons for doing it. I guess that
I can expand on that as you need.
Q4 Chairman: But Michael is known
to believe (I am not setting you against each other) in
the notion that testing would drive up standards. It was the "engine",
was it not? I am not misquoting you, am I?
Sir Michael Barber: It is not
a misquote, but it is not a complete view of what I believe. I
believe that, in order to drive up standards, we need a combination
of challenge and support. Assessment and Ofsted inspection provide
the challenge in the system, and then we need serious investment
in teachers and their skills, pay and conditions. I am in favour
of assessment, being able to benchmark schools and the information
that that provides to heads, teachers and parents. I agree with
Peter that there may in addition be an advantage to sampling techniques,
probably linked with the international benchmarks to assess the
performance of the whole system.
Q5 Chairman: I have slightly misquoted
you: testing was "the engine to drive performance",
I think you said.
Sir Michael Barber: But I am saying
that the accountability system on its own is not enough. You need
investment in teachers' skills, which is what the national literacy
and numeracy strategies did. They gave teachers the skills and
wherewithal to understand how to teach reading, writing and mathematics.
The evidence of that is powerful. Only recently, the effective
pre-school and primary education research programme, which Pam
Sammons and others run, has shown clearly the benefits in student
outcomes if teachers teach the last part of the literacy hour
(the plenary) well. Detailed pedagogical skills need to be
developed by teachers, which needs an investment. Obviously, you
also need to pay teachers well, ensure that the system is recruiting
enough teachers and devolve money to the schools. I am strongly
in favour of the challenge that comes from an accountability system,
along with the wherewithal for heads and teachers to get the job
done in schoolsnot one or the other, but both.
Q6 Chairman: Any comment on that, Professor Tymms?
Professor Tymms: There is an assumption
here that standards have risen and that the national literacy
strategy made a difference. In fact, over those years, reading
hardly shifted at all. I perhaps need to back that up, because
there are a lot of different sets of data. Somebody can claim
one thing, somebody can claim another and so on. Is this an appropriate
moment to go into that?
Chairman: Yes, indeed.
Professor Tymms: Okay. From 1995
to 2000, we saw a massive rise in the statutory test data at the
end of primary school. They were below 50% and got up towards
80%. From about 2000 onwards, they were pretty flat. That looks
like a massive rise in standards, and then it was too difficult
because we had got to the top end, all our efforts had gone and
so on. In fact, in 1998 or thereabouts, I was looking at our test
data (we use the same test every year with the same groups
of pupils) and did not see any shift in reading standards.
The key stage assessments use a new test every year, and each year
one must decide what mark corresponds to Level 4, which is harder.
The percentage reaching Level 4 rose year on year on the new
tests, but scores did not rise on a static test, and that raised a question.
At the same time, Hawker was working at the Qualifications and
Curriculum Authority, and said in The Times Educational Supplement
that if results continued to rise, we would need an independent
investigation. Around that time, QCA decided internally that it
would investigate further. It commissioned Cambridge Assessment
under Massey to take the tests from 1996 and 1999, and to go to
a place that had not been practising the tests: Northern
Ireland. It took equivalent samples of pupils and gave the 1996
and 1999 tests to them. If those tests were measuring a Level
4 of the same standard, the same proportion should have got Level
4, but they did not. Far more got Level 4 with the later test,
so the standards were not equivalent, and that was fully supported
in the Massey study. Massey did a follow-up study in which he
compared the 2000 and 1996 tests, and found rises in maths, which
were not as big as the tests suggested, but nevertheless were
rises. He found that writing scores had increased, but called
the rise in reading skills illusory. Additionally, several local
education authorities collected independent data on reading, using
the same test across the whole LA year after year, and there was
practically no shift in reading scores, but there was a rise in
maths scores. I was able to look at 11 separate studies, which
all told the same story: over that period there was at most a
slight rise (about one tenth of a standard deviation), which
might have been achieved simply because children had practised tests, but
there was no underlying rise. In maths, there was an underlying
rise. There are two things going on. One is that children get
better at tests if they practise them. Prior to national testing,
they were doing practically no tests; it was necessary to
go back to the time of the 11-plus for that. We saw a rise because
of practising tests, and we saw an additional rise because standards
were not being set correctly by the School Curriculum and Assessment
Authority and then QCA between 1995 and 2000. Then there was teaching
to the test. After 2000, QCA got its act together and set standards
correctly. It now has a proper system in place, and standards
are flat. There are small rises, and we must treat them with interest,
but with a pinch of salt. Let us suppose that it is decided in
committee that Level 4 is anything above 30 marks. If the cut mark
were moved by just one mark, the Level 4 percentage
might change by 2% or 3%, and that would make national headlines,
but it would be due to errors of measurement. The discussion
in the committee is over three or four marks either side of that cut point.
The accuracy in any one year, even with 600,000 pupils,
depends on the cut mark, and it is clear that the cut marks were set incorrectly
between 1995 and 2000. The assumption that standards were going
up because we were introducing accountability, because we had
testing, because we had Ofsted, and because we had the 500 initiatives
that the Labour party put in place without evaluation shortly
after coming to office, was based on a misjudgment about standards.
Maths, yes; reading, no; writing, yes.
Sir Michael Barber: This is, as
evidenced by Peter's comments, a complicated area, and I accept
that completely. First, the national literacy and numeracy strategies
are effectively a major investment in teachers' skills and their
capacity to teach in classrooms. That is a long-term investment;
it is not just about this year's, next year's or last year's test
results. It is a long-term investment in the teaching profession's
capacity, and it is well worth making because for decades before
that primary school teachers were criticised for not teaching
reading, writing and maths properly, but no one had invested in
their skills and understanding of best practices. Secondly, there
is a debate about extent, but we seem to be in agreement on maths
and writing. When I was in the delivery unit after I left the
Department for Education and Employment, I learned that it is
dangerous to rely on one set of data. When looking at reading
standards, it is right to look at several sets of data. One is
the national curriculum test results, which tell an important
story. Of course, there is an element of teaching to the test,
but an element of teaching to a good test is not necessarily a
bad thing, although overdoing it is. I always accepted that in
debate with head teachers and teachers during that time. The second
thing is that Ofsted records a very significant improvement in
teachers' skills over that period of time. If teachers improve
their skills in teaching reading, writing and mathematics, you
would expect the results to go up. The third data set that I would
put in that linked argument is the international comparisons (most
importantly, the Progress in International Reading Literacy Study),
which showed that in 2001 England did very well on international comparisons
in reading. In 1999 came the first accusations that the test results
were not real. Jim Rose led a review involving representatives
of all the parties represented on this Committee, which found
no evidence whatever of any tampering with the tests. In addition,
people in other countries have taken the kinds of things we did
in that phase of the reform and replicated, adapted or built on
them (Ontario being the best example) and they, too,
have had improvements in reading, writing and maths. To summarise,
although we might disagree about the extent of improvement, I
think we agree that there has been significant improvement in
maths and writing, which are very important. We are debating whether
there has been improvement in reading. I think the combination
of data sets that I have just set out suggests that there has
been significant improvement in reading. I would be the first
to say that it is not enough and that we have further to go in
all three areas; nevertheless, we have made real progress. My
final point is that over that period, there has, as far as I can
make out, been no significant change in reading and writing in
Scotland, where there was no literacy strategy. The results in
international comparisons indicate that Scotland ticks along roughly
at the same position.
Q7 Chairman: There has been a sharp
drop in the recent PIRLS results. Does that mean we are going backwards?
Sir Michael Barber: Actually,
I think it means that other countries have improved faster over
that period. As I said in my opening statement, between 2001 and
2005, the Government were focused on some serious, long-term,
underpinning reformsmost importantly, in my view, for the
long run, solving the teacher recruitment shortage and bringing
some very good new people into the teaching profession. That will
have benefits for decades to come, but there was a loss of focus
on literacy and numeracy at that point. Personally, I wish I had
pressed harder on that at the time, but that is what you are seeing: the
PIRLS data follow the same pattern as the national curriculum tests.
Q8 Chairman: I want to shift on because
colleagues will get restless, but Peter was shaking his head,
so I shall have to ask you to comment, Peter.
Professor Tymms: I must comment
on several of those points. Take PIRLS, for starters, in 2001,
and in 2006, when it apparently went back. Michael's comment was
that we did not look good the second time because other countries
did better than us. Certainly, some countries did better, but,
in fact, PIRLS is standardised and uses Rasch models to ensure that the
same marks mean the same thing, and our marks dropped back
there. It was not just other people getting better; we actually
got worse. But I want to persuade you that PIRLS in 2001 got it
wrong and made us look better than we were and that the level
has remained static. The reason for that is that for those international
tests to work properly, the students who are tested must be a
representative sample of the country. The PIRLS committee defines
how to collect those pupils. We went out, in this country, to
collect the pupils to do it and asked the schools to do the tests,
but about half of the schools did not want to do it and refused
to play ball. The second wave of schools were asked and only some
of them complied, and then a third wave were asked. If you look
at the 2001 PIRLS data, you will see two asterisks by England,
because our sampling procedure was not right. If you are the head
of a school and you are asked to do the tests, but your kids are
not reading too well that year, you will say no, whereas if they
are doing really well, you will say, "Oh yes, I'll go for
it." So we had a bias in the data. We got people who really
wanted to play ball, and it made us look better than we were.
The next year, when schools were paid to do the tests (some
held out and got quite a lot of money), we got a proper representative
sample and found our proper place, which shows that our standards
are just, sort of, in the middle for reading. The blip previously,
which was crowed about a lot, was a mistake in the data.
Q9 Chairman: So, it was quite an
awkward mistake in some ways, if it was a mistake. It is interesting
that under PIRLS (we will shift on, before I get a rebellion
here) most of the big countries like us, such as Germany
and France, are about the same. Okay, Finland and some smaller
countries such as Taiwan and Korea will always be high up there,
but countries with big populations (in Europe, places such
as France and Germany that are, in a sense, like Great Britain) are
at around the same position.
Professor Tymms: I would point
to a different pattern in the data which relates not to size but
to the language that is chosen. Translating the results of reading
tests in other languages is problematic to begin with. Can one
say that reading levels are the same? You pays your money and you takes your
choice. But a long tail of underachievement in reading will also
be found in all the other countries where English is spoken. You
will find it in Australia and even in Singapore, which is largely
a Chinese population but reading in English, and in Canada and
America. That is because English is a difficult language to learn
to read, whereas Finnish is much more regular in the way that
it is written on to the page. If you are going to be born dyslexic,
do not be born in a country where people speak English, because
it will really be a problem. Be born in another country such as
Germany or Italy. I make that general point.
Sir Michael Barber: Peter has
made an important point. I would like to add two other things.
First, other European countries look at our reforms in education
over the past 10 years and are impressed by them. I have had conversations
with people from several of the countries that we have talked
about, and on this set of PIRLS we were actually significantly
above the EU average. We were above France and just behind Germany.
The long tail of underachievement is a real issue. Personally,
I think that the places to look for English-speaking populations
that do really well on reading, writing and, indeed, generally
are the Canadian provinces. Some of their practices are very impressive.
That is one place I would urge you to look if you are thinking
about the future.
Chairman: Thank you for those opening remarks.
Q10 Fiona Mactaggart: You talk a
lot about whether our assessment system accurately assesses standards
over time, but that is only one purpose of assessment. I wonder
whether our national assessment system is fit for purpose as a
tool for assessment for learning. I am concerned about the fact
that we have examinations at seven. I am not sure that they help
teachers as much as they should. Could you give your views on
whether Standard Assessment Tests (SATs) in primary
and secondary education help teachers use assessment for learning?
Professor Tymms: They were not
designed to do that. A test taken at the end of primary school
is clearly not meant to help children in primary schools because
they are about to leave and go to secondary schools, which often
ignore the information and do their own tests as soon as students
come in because they do not believe what the primary schools say
they have done. Unfortunately, that is the way of the world. It
happens when children who have A-levels in mathematics go to university.
They are immediately tested in mathematics. Even if you take pre-school,
all the information passed from the pre-school to the reception
teacher is often ignored, as the reception teacher does their
own assessment. The tests are certainly not being used as assessment
for learning, other than that the practice tests used in
the run-up to them might be used in
that way. They might be used as assessment for learning a little
bit at age seven, but an infant school certainly would not use
them in that way because it would be passing its kids on to the
junior school. The tests are not intended to do that kind of thing,
so they cannot be and are not used in that way. They are meant
to hold schools to account and to produce information
for parents. If we want assessment for learning, we must do something
different. Many schools and teachers do that kind of thing off
their own bat. There are other ways to assess. For example, there
are diagnostic and confirmatory assessments. We could go into
that kind of thing, but they are not assessments for learning.
Sir Michael Barber: You made an
aside about tests or exams at seven. It is important for the system
and, indeed, teachers in schools, to know early on whether children
are learning to read and write and do mathematics, because if
intervention is needed to support a child in getting on track
with their cohort, the sooner you know that they have a problem,
the easier it is to fix it. One purpose of national curriculum
tests is to provide accountability and to provide information
for parents, as Peter rightly said, and it is absolutely right
that that should be the case. However, in addition to that, over
a period of time the tests have taught teachers what the levels
are. The basis of assessment for learning is for the teacher and,
obviously, the student or pupil to be able to understand what
level they are working at and what they need to do next to get
to the next level. If it had not been for the national curriculum
and the national tests, I doubt very much whether the quality
of those conversations would be as good as they are. The key to
assessment for learning is investment in teachers' skills to do
that, so that they are constantly focused (not just individually,
but in teams with their colleagues) on improving the quality
of their teaching, working out what they must do to get the next
child up to the next level and therefore constantly improving
their pedagogy, which is the essence of the whole issue.
Q11 Fiona Mactaggart: The interesting
thing is that your view, Peter, is that the real function of those
tests is to hold schools to account, rather than as assessments
for learning. I was speaking to a head teacher on Friday, who
said to me, "Fiona, I just wish all primary schools were
all-through, because then we wouldn't have inflated test results
for 7-year-olds coming out of infant schools." Her analysis
was that in infant schools, for which Key Stage 1 SATs were summative
results, there was a tendency towards grade inflation, which undermines
your point, Michael. I agree that you need to know to intervene
early, but if the accountability function militates against accuracy
of assessment for learning, how do you square it?
Sir Michael Barber: First, the
Key Stage 1 results are not under the same accountability pressures
as those for Key Stages 2 or 4. Secondly, I would not have moved
away from externally set and marked tests for Key Stage 1, because
if you consider the evidence in the work of Pam Sammons and others,
objective tests marked externally to the school are more likely
than teacher-assessed tests in the school to provide a drive for
equity. If that had been done, I doubt that the issue you just
raised would have occurred.
Professor Tymms: The assessment
for learning is really interesting. The evidence is that if we
give back to pupils information on how to get better, but we do
not give them grades, they are likely to get better. Putting in
the grades, marks or levels when feeding back countermands (undermines) the
feedback. That is very clear in the randomised trials and in the
meta-analysis by Black and Wiliam in Inside the Black Box. The
feedback to pupils on how to get better is vital, but it is undermined
in other ways. The other point that Michael raised about identifying
special needs early is also crucial. The key stage assessments
will not identify special needs or identify them early; they are
too late and not precise enough. If, for example, a child is likely
to have trouble reading, they can exhibit it when they are four or
five years old through a phonological problem, which can be assessed
diagnostically at an early stage. A child later on who has, for
example, a decoding or a word-recognition problem, or who
can do both but does not understand or make sense of the
text despite being able to bark the words, can also be diagnosed.
Diagnostic assessments can be put in place, but they are different
from the summative assessments at the key stages. There are horses
for courses, and we must be careful about how we aim to use them.
Q12 Fiona Mactaggart: So, if the
assessments do not necessarily do what we want, how else could
we assess the impact of national policies on schools? How can
we test what the Government policies, national curriculum or improvements
in teacher training do? How do we know?
Professor Tymms: We need a series
of different systems; we should not have a one-size-fits-all test.
We need an independent body, charged with monitoring standards
over time, which would use a sampling procedure in the same way
as the NAEP does in the United States, as the APU used to in England
and as other governments do in their countries. The procedure
would become impervious to small changes in the curriculum, because
it would have a bank of data against which it would check issues
over time, so that we might track them and receive regular information
about a variety of them, including not only attainment but attitudes,
aspirations, vocabulary and so on. I would ensure that teachers
had available to them good diagnostic assessments of the type
that I described. I would also ensure that there was a full understanding
of assessment for learning among the pupils, and I would continue
to have national tests at the age of 11, but I would not put the
results in league tables. In fact, I would ensure that there were
laws to prevent that sort of thing from happening.
Q13 Fiona Mactaggart: Would you have
to keep them secret from parents?
Professor Tymms: No. Parents would
be allowed to go to a school and ask for the results, but I would
not make the results the subject of newspaper reports, with everyone
looking at them in a sort of voyeuristic way. There are real problems
with those tables, which are actually undermining the quality
and the good impact that assessment data can have. We are forcing
teachers to be unprofessional. League tables are an enemy of improvement
in our educational system, but good data is not. We need good
data. We need to know the standards and variations across time,
but we do not need a voyeuristic way of operating and pressure
that makes teachers behave unprofessionally.
Sir Michael Barber: At the risk
of ruining Peter's reputation, I agree with a lot of that, and
I want to say a few things about it. First, as I understand it,
a new regulator is due to be set up. An announcement was made
a couple of months ago by Ed Balls. I am not sure where that has
got to, but the announcement was made precisely in response to the
issues that Peter has raised. Personally, I have no doubt about
the professionalism of the QCA in the past decade. It has done
a good job, but it is important that standards are not just maintained
but seen to be maintained. The new regulator will help with that
once it is up and running. Secondly, on monitoring standards over
time, as I said earlier, particularly now that international benchmarking
has become so important not just here but around the world, I
would like the regulator to use samples connected with those benchmarks
and help to solve the problems of getting schools to participate
in samples, which Peter mentioned. That would be extremely helpful.
I agree completely with Peter about investing in teachers' skills
and giving them the diagnostic skills to make them expert in assessment
for learning. When I debate the Programme for International Student
Assessment results with Andreas Schleicher, who runs PISA (he
is an outstanding person, and it may be worth your interviewing
him), he says that virtually no country in the world implements
more of the policies that would be expected to work according
to the PISA data than England, but that that has not yet translated
into consistent quality, classroom by classroom. That is the big
challenge, and what Peter recommended would help to achieve it.
Like Peter, I would keep tests at 11. On league tables, the issue
(and I have this debate with head teachers a lot) is that unless
a law is passed, which I do not see as terribly likely, there
are only two options for the schools system. One is that the Government,
in consultation with stakeholders, designs and publishes league
tables. The other is that one of the newspapers does it for them.
That is what happened in Holland and it is happening, too, in
Toronto and in Finland. It happens with universities. If you talk
to university vice-chancellors, you find that they are in despair
because various newspapers and organisations are publishing league
tables of university performance over which they have no leverage.
The data will be out there; this is an era of freedom of
information, so there is a choice between the Government doing
it or somebody else doing it for them. If I were a head teacher,
I would rather have the Government do it (at least you can
have a debate with them) than have the Daily Mail
or another newspaper publish my league tables for me.
Professor Tymms: Can I pick up
on that? I wish to make two points about league tables. First,
we publish the percentage of children who attain Level 4 or
above, so if a school wants to go up the league tables it puts
its effort into the pupils who might just get a Level 4 or a Level
3. It puts its efforts into the borderline pupils, and it does
not worry about the child who may go to Cambridge one day and
has been reading for years, or the child with special needs who
is nowhere near Level 4. That is not going to show up on the indicator,
so we are using a corrupting indicator in our league tables. Secondly,
if you look at the positions of primary and secondary schools
in the league tables, you will find that secondary schools are
pretty solid in their positions year on year, but primary schools
jump up and down. That is not because of varying teachers but
because of varying statistics. If a school has only 11 pupils
and one gets a Level 4 instead of a Level 3, the school's figure
is suddenly up by almost 10 percentage points. There are huge
fluctuations, because we produce league tables for tiny numbers of pupils. We
can include, on the value added measure, only children who are in
the school from Key Stage 1 to Key Stage 2, and pupil turbulence
often makes that a small group. We should not publish for tiny
numbers. The Royal Statistical Society recommends always quoting
a measure of uncertainty, or error, which is never done in those
tables. We have 20,000
primary schools, and if the Government did not produce tables
that the newspapers could just pick up and print, it would take
a pretty hard-working journalist to persuade the schools to give
the press their data. It would be possible to make laws saying that
you cannot publish tables. Parliament makes laws saying that you
should not have your expenses scrutinised, so why can we not produce
a law that says that schools' results should not be scrutinised?
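The fluctuation Professor Tymms describes is simple binomial arithmetic. The following sketch is an editorial illustration, not part of the evidence; the 70% pass rate is assumed purely for the example. It shows the one-pupil shift in an 11-pupil cohort and the kind of uncertainty measure the Royal Statistical Society recommends quoting:

```python
# Illustration (not from the evidence) of why tiny cohorts jump around
# league tables: one pupil crossing the Level 4 line in an 11-pupil
# school moves the headline figure by 1/11, roughly 9 points, and the
# statistical uncertainty of the figure is larger still.
import math

def one_pupil_shift(cohort_size):
    """Percentage-point change when one pupil crosses the Level 4 line."""
    return 100.0 / cohort_size

def standard_error(cohort_size, p=0.7):
    """Standard error, in percentage points, of the share reaching
    Level 4, assuming each pupil independently has probability p
    (here an assumed 70%) of reaching it."""
    return 100.0 * math.sqrt(p * (1 - p) / cohort_size)

print(round(one_pupil_shift(11), 1))   # 9.1 points from a single pupil
print(round(standard_error(11), 1))    # 13.8 points of uncertainty
print(round(standard_error(200), 1))   # 3.2 points for a large school
```

On these assumptions, an 11-pupil school's headline percentage carries roughly four times the uncertainty of a 200-pupil school's, which is the sense in which primary schools "jump up and down" the tables for statistical rather than educational reasons.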
Q14 Mr Slaughter: You said a few
moments ago, Sir Michael, that one of the purposes of national
testing at seven and 11 was to identify children who are in difficulties.
That sounds counter-intuitive. Would you not expect teachers to
know that anyway? If testing has a role, is it not in assessing
the needs of individual children, just as testing is used, for
example, to assess the needs of people with a hearing problem?
Otherwise, it is likely to lead to buck passing. If we test everybody,
it almost becomes the responsibility of the state or someone else
to ensure that everyone reaches a higher level. Given the length
of time that we have had testing, how far has that become true?
Stories in newspapers report the reverse, and say that a substantial
minority of children still move on to secondary school without
Sir Michael Barber: I am not arguing
that national curriculum tests alone will solve every child's
problems. I agree strongly with what Peter said about teachers
developing the skills to diagnose such things. We want
all teachers (I shall focus on primary schools) to be
able to teach reading, writing, mathematics, and some other things,
well, and then develop over time the skills needed to deal with
individuals who fall behind. It is very good to see Government
initiatives, such as the Every Child a Reader initiative, that
pick up children who fall behind. I am in favour of all that.
You need good diagnosis, which incidentally is one of the features
of the Finnish education system that makes it so good: they
diagnose these things early.
The national curriculum tests have spread understanding
among teachers of what the levels are and of what being good at
reading, writing and mathematics looks like. They also enable
the system to identify not just individual students but
groups of students who have fallen behind. The system
has great data about particular groups of students or schools
that are falling behind, which enables it to make informed decisions
about where to target efforts. My point is not just about individual
students, therefore, but about groups of students or variations
within the cohort. I shall comment on the point about league tables.
In the end, the data will out; this is an era of freedom
of information. We can have a perfectly valid debate about whether
Level 4 is the right indicator. However, the percentage achieving
Level 5 went up very rapidly during the early phase of the national
literacy strategy, which suggests that good teaching is good teaching
is good teaching. That was a result of the combination of the
accountability system and the big investment in teachers' skills.
Q15 Lynda Waltho: In evidence so
far, we have heard that the testing regime serves a large number
of purposes: specifically, end of key stage assessment, school accountability,
assuring standards over time and assessment for learning. I am
getting the feeling that there is not a lot of confidence that
at least two of those are being achieved. What about the others?
Can the system fulfil any of those purposes? Is it working? Is
it fit for purpose? I do not have the impression that it is. As
a former teacher and a parent, I found the regime useful in all
of those areas at some point, but what is your assessment of its
capabilities across that range?
Professor Tymms: I do not think
that it is being used at all for assessment for learning, and
I do not think that it can be, except where it is used incidentally.
It provides a level against which teachers can set their pupils.
If a teacher in a high-achieving school had to judge her pupils
unaided, she would probably underestimate them, because she would
base her judgment on those she knows. The reverse would probably happen
in a low-achieving school. Standardised levels for national tests
give the firm ground on which a teacher can make a judgment. That
is a good thing. It is there and it is being used. It gets information
to parents, but it has its downsides. I do not think that testing
is good at monitoring standards over time. We are saying, "Take
this test, and we will hold you to account for the results and
put them in league tables. We will send in an Ofsted inspector
and ask you to assess your pupils and send us the results".
That is an inherently problematic system. It is a little difficult.
Another inherently problematic thing is having qualifications
and curriculum in the same body, the QCA. Somebody should
design the curriculum and somebody should assess it, but they
should be separate bodies. That is an unhealthy way to operate
a system. If we want to know what standards are over time, we
are far better off with an independent body. If we change the
curriculum (we read in The Times that that will happen,
and we hear it regularly) and introduce an oral test, suddenly
Level 4 will not mean the same thing, because a different curriculum
will be assessed. We cannot monitor standards over time, but by
having an independent body charged with monitoring standards not
just against the national curriculum but against an international
concept of mathematics or reading, we can track things over time.
We must do different things. I come back to the need to understand
the special needs of the child and pick out the child who already
has a serious problem. Teachers can assess their children pretty
well, but they cannot be expert in all the special needsvarieties
of dyslexia, dyscalculia, attention-deficit hyperactivity disorder
and so onnor should they be expected to be. However, they
might spot a problem with a child who needs to be assessed in
different ways, so tools that help the teacher to help the child,
and to identify special needs or things falling behind or not going
quite right from the start, would make sense. Computerised diagnostic
assessments, with bespoke tests in which the child uses headphones
to listen to the computer and is asked questions according to
how they respond, are the way of the future, but they cannot
be the way of the future for statutory assessments, which require
a new test every year to maintain security.
Q16 Lynda Waltho: There would be
more tests then.
Professor Tymms: Different types,
and probably less testing. We have more testing if we have league
tables. It is the league tables that are our enemy.
Sir Michael Barber: I think that,
on the whole, the national curriculum tests are beneficial. I
have a lot of confidence in them, and I am always cautious in
advising anybody or any education system to move too rapidly in
changing assessment or qualifications, as that involves a lot
of risk. Nevertheless, one should not stick with things for all
time. I think that they have been good tests and that they have
been good for accountability purposes. Along with the supports
that I mentioned earlier, they have helped to drive improvement
in the system. I agree with Peter about the need for an independent
body to monitor standards over timethat is absolutely right.
The proposal that is currently being piloted in 400 or 500 schools
(progression pilots, in which children are tested when they are
ready for level tests) is very promising, but it is all in the detail. If
that works, it could be beneficial in making sure that children
at all stages and ages are making progress. The data show that,
at present, there is a bit of drop-off in progress for years 3
and 4, but we would be able to move away from that if children
were tested when ready. There is a lot of promise in that approach, but, as with any shift
in the testing and assessment system, it is all about getting
the detail right.
Q17 Chairman: We can come back to
your last point. You mentioned a comment by Professor Schleicher.
Sir Michael Barber: I do not think
that Andreas Schleicher is a professor, but he would be a very
Q18 Chairman: Can you guide us to
what you were quoting from?
Sir Michael Barber: I was quoting
from a conversation with him. Before using his comments in the
Committee, I checked that he was happy to be quoted on the record.
You can put the quote on the record. He is quite happy to be quoted
along the lines that I gave.
Q19 Lynda Waltho: You both discussed
whether league tables were an enemy or a friend. It seems that
you have completely different ideas. I agree with you, Sir Michael.
I think that it is likely that the newspapers will develop their
own league tables. If they do league tables about what we spend
on our breakfast at the House of Commons, they will do league
tables for school results, believe me. Would it not be better
if the Government set out explicitly the full range of purposes
for league tables, in effect explaining the results better?
Would that make a difference, or am I just being a bit naive?
Professor Tymms: It would be interesting
to try, but I do not know. If I buy something, I never bother
reading the instructions until I get stuck. I would guess that
most people would just look down the league tables and read the
small print and headlines to find out who is at the top and who
is at the bottom. When the league tables come out every year,
the major headlines that we see are whether boys have done better
than girls, or vice versa, or that one type of school has
come top. It is the same old thing time and again, despite great
efforts to steer journalists in a different direction. I despair
of league tables, but it would certainly be worth trying to provide
more information. I think that the Royal Statistical Society's
recommendation not to give out numbers unless we include the uncertainties
around them is a very proper thing to do, but it is probably a
bit late. The cat is out of the bag, and people are looking at
the league tables. Even if there is more information, people will
concentrate on the headline figures.
Sir Michael Barber: You can always
look at how you can improve a data system like that and explain
it better. I agree about that. I have been a strong advocate of
league tablesand not only in relation to schoolsbecause
they put issues out in public and force the system to address
those problems. League tables, not just in education, have had
that benefit. Going back some time, I remember lots of conversations
with people running local education authorities. They would know
that a school was poor, and it would drift along being poor. That
was known behind closed doors, but nothing was done about it.
Once you put the data out in public, you have to focus the system
on solving those problems. One reason why we have made real progress
as a system, in the past 10 to 15 years, in dealing with school
failure (going back well before 1997) is that data are
out in the open. That forces the system to address those problems.
Professor Tymms: Why has it not
got better then?
Sir Michael Barber: It has got
significantly better. We have far fewer seriously underperforming
schools than we had before.
Chairman: We do not usually allow one
witness to question another, but never mind. You can bat it back.
Sir Michael Barber: It was a fair
1 Progress in International Reading Literacy Study