CORRECTED TRANSCRIPT OF ORAL EVIDENCE
To be published as HC 588-i

HOUSE OF COMMONS

ORAL EVIDENCE

TAKEN BEFORE THE

EDUCATION COMMITTEE

GCSE English Exam Results 2012

Tuesday 11 September 2012

Brian Lightman, Mike Griffiths, Russell Hobby and Kenny Fredericks

Glenys Stacey, Amanda Spielman and Cath Jadhav

Evidence heard in Public: Questions 1-163

USE OF THE TRANSCRIPT

1. This is a corrected transcript of evidence taken in public and reported to the House. The transcript has been placed on the internet on the authority of the Committee, and copies have been made available by the Vote Office for the use of Members and others.

2. The transcript is an approved formal record of these proceedings. It will be printed in due course.

Oral Evidence

Taken before the Education Committee

on Tuesday 11 September 2012

Members present:

Mr Graham Stuart (Chair)

Neil Carmichael

Alex Cunningham

Bill Esterson

Pat Glass

Damian Hinds

Charlotte Leslie

Siobhain McDonagh

Ian Mearns

Mr David Ward

Craig Whittaker

________________

Examination of Witnesses

Witnesses: Brian Lightman, General Secretary, Association of School and College Leaders (ASCL), Mike Griffiths, Headmaster, Northampton School for Boys and ASCL President, Russell Hobby, General Secretary, National Association of Head Teachers (NAHT), and Kenny Fredericks, Principal, George Green’s School, Isle of Dogs, and member of the NAHT executive, gave evidence.

Q1 Chair: Good morning and welcome to this meeting of the Education Select Committee. The Education Select Committee exists to scrutinise education policy and hold those in power to account. Central to that task is ensuring the wellbeing and the best possible outcomes for all those involved in the education system, not least children, whose futures depend on the performance of the education system and the way they are treated within it. I am delighted that you are able to join us today. Can you, as briefly as you can in the time-constrained situation we are in, set out what you think the problems are with what has happened with GCSE English grade boundaries in 2012?

Brian Lightman: Thank you very much, and thank you very much for the invitation. What I would like to say by way of introduction is that this is a completely separate issue from any wider debate that is going on about qualifications reform or anything else. This is about a very specific issue to do with GCSE English. We believe that the implementation of that new examination was deeply flawed and not fit for purpose, and that has led to massive variability in outcomes between different schools. We would strongly disagree with the claim in the Ofqual report that the standards were comparable between years. That is simply not the evidence that we have seen, and I am sure we will all be able to give lots of examples of that from many, many different schools and the results of many, many thousands of young people who have been affected at all grade boundaries.

What I want to emphasise here is that this variation between the schools has affected all kinds of schools: maintained schools, academies, HMC schools-Headmasters’ and Headmistresses’ Conference schools-and Girls’ Schools Association schools, so leading independent schools as well are telling us that it has affected them. It is not only at the C/D borderline but at other grade borderlines as well. I am sure we can go into more detail in a moment, but those problems are to do with grades coming through that are in no way in line with what one might have reasonably expected young people to be getting, having taught them throughout the time of their course. In particular, you will be aware that the grade boundaries were changed during the course of the year and so, depending on when students had put in their work to be banked, the grade would have been different for exactly the same work. So we are seeing all sorts of variability like that. We have more detail that we can go into.

We are seeing major flaws in the way the process of that examination was managed over the two years of its running, so that, for example, early work that was produced as controlled assessment was not moderated in a way that would have highlighted any issues of leniency, harsh marking or anything else. That should have been identified early, and that led to a major problem at the end, which meant that adjustments were made at the end that had a particularly harsh effect on those pupils who took the exam at the end of the course. I have summarised a vast amount of detail there, but I hope that helps by way of introduction.

Q2 Chair: Of the four key points you made there, I think one is accepted by Ofqual; the Chief Regulator said that it was easier in January and they got lucky. So there is no dispute that there was a difference there. You have said it is not comparable. The fundamental statutory duty of the regulator, after the Education Act 2011, is to ensure comparability, including over time, and you said that you believe, from the data you have seen, that it is not comparable, and that is their central duty. They say that is why so much of this has followed-because they have done that. Can you tell us what evidence you have to back that up?

Brian Lightman: We have had a look at a lot of figures here, and there is a lot of data underlying what we have looked at. I should emphasise that the awarding bodies and Ofqual have been very helpful and forthcoming in providing a lot of that information to us. When you look at it, if you take this approach of comparable outcomes and you think of what the definition of "fairness" is in terms of the Ofqual report saying that standards are comparable between the years, then what that definition says really is that an appropriate percentage would get a certain grade across the whole cohort. The problem is that, if the marking was wrong early on and students therefore were marked too high and too many C grades were awarded, the only way of addressing that is at the end of the course. So the people who did their work at the end of the course would therefore be marked particularly harshly.

Q3 Chair: You are accepting, are you, that broadly the June results were right except that they were skewed by overgenerous application of-

Brian Lightman: No, no, we are not. We are saying that the June marking was particularly harsh in order to compensate for what was apparently overgenerous early in the year. The result is that, if you have these different standards being applied during the course of the year, those standards cannot be comparable. If you take just some of the statistics in the foundation tier units, in June 2011, 26.7% got a C grade.

Q4 Chair: This is the first smattering of people taking-

Brian Lightman: The year 10s who took it. In January 2012, it was 37%, and then in June 2012, 10.2%. These are enormous differences and there is no evidence that those papers had any difference in the level of challenge in those examinations between those times. You have these massive differences and effects on some pupils within the year. Therefore, it cannot be that the standard was the same across the whole two years.

Q5 Chair: That is accepted: Ofqual has accepted that the same standard was not applied within year. They claim because it is a new qualification it takes time to bed down; they had limited data; they did the best they could. I think they specifically say in a memo to us that the exam boards could not have known that they were being too generous. That appears to be accepted. Just trying to drill down, your point is that, because they got it wrong there and because they look at the whole cohort together to create a comparable line, it has skewed the June results downwards.

Brian Lightman: Yes.

Q6 Chair: Perhaps Russell wants to come in, but could you explain to what extent that has happened, and as the percentage of people who took the June 2011 or January 2012 English GCSE modules was pretty tiny, is that really a significant skewing? I know that is an entirely different topic to whether or not the expectations were mismanaged, but how skewed was it?

Russell Hobby: I think it goes deeper than just the balance between January and June. If you look at the correspondence that has been released today, you can see quite a strong argument between the exam board and Ofqual on how to use the Key Stage 2 data to predict what attainment should be. There was a genuine professional disagreement about those sorts of things, so already they are not able to use the concept of comparable outcomes to predict this, nor did they have the statistical base early on in the year to apply that notion. So it looks like the comparable outcomes methodology failed due to not being able to use predictive data about the ability of the cohort and not being able to extrapolate from the initial size of the sample that they had. Given that comparable outcomes is a methodology that we are increasingly heading towards, as well as looking at how this has adversely affected the students this year, it is extremely dangerous for future years, I think, that we are going to be heading in that direction.

Q7 Chair: Just on the level of the skewing, because Brian made that point-and I know it is just one of many points-again could you explain to what extent the huge numbers of people who took the June 2012 exam were affected? How many people would have been affected by the skewing from a relatively small number of people-I forget; it is a single-figure percentage-who took the GCSE prior to June?

Brian Lightman: This is where I think there are a lot of misunderstandings. There are a lot of figures floating around here, and we have had to delve down very deep into them. If you take the AQA foundation tier-and I want to emphasise that this is not just an AQA problem but they have the largest entry-AQA had 54,000 students taking that foundation tier in June 2011, 31,000 in January 2012 and 140,000 in June 2012. These are not insignificant figures and they do have, obviously, a very significant skewing effect, and I do not think we can dismiss that at all. I know that we are moving towards linear qualifications, but certainly with this qualification there was an option to complete it in year 10 and also in January of year 11.

Q8 Damian Hinds: Just so the point does not pass, can we be 100% clear here? I think you might have been saying something new this morning. If the target level index is 100-that is what it "should" be, based on what has happened in prior years-and if in January people were passing at a rate of 110, are you saying that in June they were passing at a rate of (a) 100 or (b) something like 95 to make the weighted average 100? Because I think that would be a new point and quite a serious one.

Brian Lightman: I think you have hit the nail on the head there: if you are talking about comparable outcomes over the whole cohort as opposed to over those individual students, that is exactly what needs to happen.

Q9 Damian Hinds: But are you saying that is what did happen? That is the question, Brian.

Brian Lightman: Yes, we think that is what did happen, and I think the evidence that has just been mentioned here backs that up in terms of some of the correspondence between Ofqual and awarding bodies.

Q10 Damian Hinds: How? Not in what we have seen.

Brian Lightman: Because if they were outside tolerances, they had to then move the grade boundaries in order to compensate for what had happened in the rest of the year.

Q11 Damian Hinds: I am not sure that follows at all.

Russell Hobby: I do not know whether we do have evidence on that basis with the data. The very complexity of the data itself is one of the things we are struggling to understand here. What I think we can say is that the students in June were unfairly treated, because they are competing with the cohort that took their exams in January, even if the results that they got would have been justified by the grading methodology.

Q12 Damian Hinds: Sure. I think that point is accepted and backed. We are now asking a different question: were the pupils taking the exam in June particularly harshly treated, even compared with prior years, because of this need to make up a weighted average across the year cohort?

Brian Lightman: Yes.

Q13 Damian Hinds: Russell does not seem to be saying that is the case. Brian, you do seem to be saying that is the case.

Russell Hobby: No, I do not know.

Brian Lightman: Russell is saying he does not know, but we have been into this and that is the 10.2% statistic that I quoted to you a few minutes ago. How can it be that in January 37% were getting a C and then in June 10.2%? That is where it has been adjusted in order to give the comparable outcome over the whole cohort.
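The mechanics being pressed in Q8 to Q13 can be sketched with simple arithmetic. In the sketch below, the entry figures are the ones Brian Lightman quotes for the AQA foundation tier; the award indices follow Mr Hinds’s illustrative framing (100 being the "right" level, 110 being 10% over it) and are assumptions for illustration, not figures from the evidence.

```python
# A sketch of the weighted-average point behind Q8-Q13. Entry figures
# are those Brian Lightman quotes for the AQA foundation tier; the
# award indices (100 = target, 110 = 10% over) are illustrative
# assumptions, not figures from the evidence.

entries = {"June 2011": 54_000, "Jan 2012": 31_000, "June 2012": 140_000}
target_index = 100

# Suppose the two early sittings were awarded generously:
awarded = {"June 2011": 110, "Jan 2012": 110}

total = sum(entries.values())
banked = sum(entries[s] * awarded[s] for s in awarded)

# The index the June 2012 sitting must come in at for the whole-cohort
# weighted average to land exactly on the target:
june_index = (target_index * total - banked) / entries["June 2012"]
print(f"June 2012 would need an index of about {june_index:.1f}")  # ~93.9
```

On these assumed numbers, holding the whole-cohort outcome at 100 forces the June sitting below 100, which is Mr Hinds’s option (b) rather than option (a).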

Q14 Chair: That is a question for us to ask Ofqual then, because my understanding-and I wanted to explore that nonetheless-was that they had taken the big June cohort, applied a comparable-outcomes expectation and triangulated data on that, and had found that the grade boundaries were wrong, just looking at that group as one. They had then looked back and said, "Oh, so we got the grade boundaries wrong before," rather than there being a skewing from the prior results. Is that something we need to try to hammer out with the regulator this morning?

Brian Lightman: I think it certainly is, and I think it will come back to the issue of 60% controlled assessment, where the grades had already been announced and moderated at that early stage. Therefore, you have built up this imbalance, as it were, that needs to be balanced out at the end.

Mike Griffiths: I am here as one representing literally hundreds of head teachers who, like me, have seen their students have their hopes and aspirations really shattered because of this. As a head teacher, my belief is that Ofqual has simply failed to maintain standards; we keep hearing this phrase "maintaining standards". I am not a statistician. I am not statistically illiterate, but I am not a statistician and I would just like members of the Committee to be absolutely clear on how it affects a school like mine, where our student cohort this year was comparable with other years.

For the past five years, the percentage getting Cs and above in my school-it is only one school, but it is typical of literally hundreds-has ranged between 88.6% and 91.6%, a variation of only 3 percentage points over five years. This year, it fell 17 percentage points to 74.6%, despite a prediction of 86%. Our English literature results, which all of the students also take, have similarly ranged in those five years from 83% to 91%. They declined this year, but only by 1%, in line with expectations.

So my team of English teachers, like many others across the country, are distraught, devastated and confused as to why this decline has happened. It is the same team of teachers that has been teaching for several years, and the students feel the same. I have lots of evidence to support, at an individual school level, the fact that Ofqual has simply failed to maintain the standard. I have 17% of students-that equates in just one school to 36 boys-who have not got a C grade but who, had previous years’ standards been applied, would have got a C grade or above. It also applies through the grade ranges, and I have evidence, which I can leave with you, of boys who have got nothing but As and A*s across the board, but Bs in English language.
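A quick check shows that the school-level figures quoted here hang together. The 214-student entry is taken from Mr Griffiths’s later answer at Q25; the other numbers are as he states them above, and the calculation is a consistency check only, not anything from the evidence itself.

```python
# Consistency check on the school-level figures quoted above. The
# 214-student entry comes from Mr Griffiths's later answer (Q25);
# the rates are the ones he states here.

entries = 214
last_year_rate = 0.916    # top of the five-year range, 88.6%-91.6%
this_year_rate = 0.746    # this year's A*-C rate
predicted_rate = 0.86     # the school's prediction

fall = last_year_rate - this_year_rate       # 0.170, the 17-point fall
affected = round(entries * fall)             # ~36 boys, matching his figure
below_prediction = predicted_rate - this_year_rate   # 11.4 points

print(f"fall: {fall:.1%}; students affected: {affected}; "
      f"below prediction: {below_prediction:.1%}")
```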

Q15 Chair: Let me turn to the variability, which-a bit like the difference between January and June-is not disputed by Ofqual: the increased variability that leads to schools, after years of consistently predicting what was going to happen, getting results and finding that there is a big difference. The increased variability is accepted; it has happened this year. Is that necessarily a failure of the regulator, or could it be an inevitable consequence of the introduction of these new syllabuses, with the new modular form and an increased percentage of controlled assessment? Could it be the architecture of the exams that made everybody’s job-yours and the regulator’s-impossible, rather than necessarily a failure of the regulator?

Mike Griffiths: I think that my department was as well prepared as it possibly could be. They are highly professional people, and I can, as I say, leave the evidence as to exactly what they did. But as a very final point on that thing about it being due to a change, the board’s very final comment to my school was, "The centre is to be congratulated on its approach to the new specification and the accuracy of the marks awarded to students."

Russell Hobby: It is a failure of the regulator, which had information many years ago about what impact not necessarily a modular exam system but simply a new exam system would have on this, not to prepare schools and exam boards to know that the change in grade boundaries this year would not be in line with that in other years-that there would be a dramatic shift.

Q16 Chair: They are quite clear. They say in evidence to us that the expectations of schools were raised by the January grade boundaries, but that "schools are told"-are told-"not to rely on those grade boundaries and are told, particularly when there is a new examination, not to rely on it".

Kenny Fredericks: They are told it is usually between two and three points, not 10 points. You have staff and youngsters working together, and it is not quite as clear-cut as January versus June, because in a lot of schools they might have done the exam in January, but the controlled assessment, which is 60% of the mark, they would have cashed in in June. They have done that. So you have the same cohort judged under two different sets of grade boundaries.

Whose responsibility is it? The exam boards set the exam; Ofqual approve it. They set the assessment criteria; Ofqual approve it. We go to training; my staff know what they have to do. It seems to me that there is no understanding of what goes on in school, because we have to set targets based on Key Stage 2 results. If they come in and they are Level 4, they have to go out with at least a C, and that is what we work through. We measure them, we test them every six weeks, we look at their strengths, their weaknesses; we work with them. That is gone through. All that work is done. We have double-checked everything.

We have had our controlled assessments marked externally by people who do not know the youngsters and people who have been trained by the exam board. The exam board feedback was fantastic. Surely, if you do your driving test, is there a quota on doing it? I did mine three times. You might do the theory test, you might pass that and then you do the other bit. You keep at it until you pass it. These are young people at the heart of this. Their whole future is destroyed. They cannot go and do the courses. Some of them have gone into Level 2 courses when they should have gone into Level 3 courses.

Q17 Chair: Did you mean to say their whole future is destroyed? I do think a certain moderation of language might be necessary.

Kenny Fredericks: Well, it depends on the children, doesn’t it? I know a number of youngsters who are dropping out and will not go forward. It depends on the youngsters and on the backing they get from their home. So for a lot of these youngsters their future is destroyed, and I do not think that is extreme at all and, talking to head teachers across the country, they would certainly back that up.

Brian Lightman: Could I just say something about that C grade, because it is such an important qualification? We have hundreds and hundreds of examples of students who have had difficulty accessing A level courses and so on. Colleges and sixth forms are trying to be as helpful as they can, but also there is the issue of selective universities, and if you have not got an A in certain GCSEs, you will not get into a selective university. So some of our most able students are at risk, because of this, of losing out on that basis.

Q18 Chair: You are gathering information on that.

Brian Lightman: We are.

Q19 Chair: One of the big issues is to identify any children who have been damaged, and to estimate precisely how many and to what extent, in order to have a clear idea.

Brian Lightman: Yes.

Q20 Neil Carmichael: Can I ask a fairly general question, just to guide the Committee? Are we really talking about a fundamental problem with the architecture of comparable outcomes or are we talking about a failure to appreciate and gather the right kind of data and apply it to the system?

Russell Hobby: We are talking about both, but the concept of comparable outcomes seems to me to be utterly incoherent and unable to be applied in the context of a changing exam system as well. I think they did not gather the data they should have done and did not warn people. If you look at the criteria for comparable outcomes, they need to be able to predict performance based on Key Stage 2 data. They disagreed on how that would apply this year. The only way that you can judge whether teaching standards have risen is using exam data, which makes comparable outcomes a completely circular argument. You have every school in the country racing to improve its results and you have an exam system that will automatically refactor the results so every school gets the same at the end of it. It just does not work, full stop.

Q21 Chair: If you have a new qualification and it is made modular, and 60% of the marks are given out by the very teachers who are under targets to deliver, and then there is a major inflation in results, is that going to be because there has been a transformation in teaching from last year to this or because there is a systemic problem that it is the regulator’s duty to intervene in?

Brian Lightman: I think all of those things need to be taken into account. I think the issue that we have here is that, when you introduce a new qualification, whether it is modular or whatever its structure, you have to put in processes that are fit for purpose. We do not want grade inflation; we want standards to be maintained and improved. The only way to do that is to make sure that you have rigorous, valid and fit-for-purpose assessment processes. Those were clearly not in place, and the evidence for that is what happened in January: if those processes had been in place, you would not have had that lenient marking.

Q22 Bill Esterson: To come back to what Mike said about other subjects, from what I understand this has only happened in English language. Do people want to give a view on why that is, and is it to do with the assessment in schools, again, where English and maths are so crucial?

Mike Griffiths: With this comparative data, it would have been useful if Ofqual had looked at some of those things-paired examples-because, as a corollary of what Graham was saying, I do not believe that the teaching and preparedness of my students was catastrophic this year and led to the decline. The evidence I have, and that I can leave with you, is that students maintained their performance in other subjects-in fact, my maths results went up 1% to 92%-including English literature, and that it was this one subject where there has been this catastrophic failure, which has, I believe, led to a great injustice for tens of thousands of students.

Brian Lightman: There is some evidence that there is a bit of a problem in maths, which we have been picking up recently, but it is on a much, much smaller scale. I think one of the possible reasons behind that is English is a subject in which, for obvious reasons, a lot of attention is paid to improving performance and raising standards of literacy, and a lot of interventions are put in place. That means that there are schools that have made exceptional progress. I think that perhaps throws some light on why it is that some of our high-profile schools, ones that have been identified as making a particular difference, seem to have fallen foul of what has happened here-because they have put interventions in that have pushed their students above that level of expectation. That is probably why, but there are all sorts of other reasons why, with this particular qualification, it just seems to have gone wrong.

Q23 Siobhain McDonagh: I am the newest member of this Committee and I am completely a layperson, so I am sorry if I am about to embarrass myself. The whole comparable outcomes concept is one I have only come across in the last 24 hours, but it is moving against everything that I want to do as a constituency MP. I want to make our schools the best. I am not dogmatic about the form of those schools. I want the kids with the hardest backgrounds to do as well as they possibly can. So I do not want them to do, at 16, what they are predicted to do at 11. I want them to do as well as they possibly can, as any teacher can inspire them to do. We have two academies, and our experience is they have transformed the lives of loads and loads of kids and that this comparable outcomes concept just puts them back into the basket of: "This is what you got when you came in; this is what you are going to get when you come out." I do not understand how we can all want improving standards but somehow want to stop or cap people from doing that. Have I got it completely wrong?

Kenny Fredericks: No, you are exactly right.

Russell Hobby: No, not at all.

Mike Griffiths: I would agree entirely, but we are not here, I do not think, to debate the entire examination system and the principle upon which it is based. We have to keep in mind the thousands of individual students this year who now are sitting there with grades that are below what they have earned and what they deserve. What I am looking to this Committee, along with hundreds of other head teachers, to do is to help us to put that right for those children. What happens in the future is for another day’s debate, but I think we need to focus on the students.

Ofqual, when it came to its conclusion, said in its report that the grades in June were right. They did that based on a statistical analysis yet again. They did not look at any children’s work before coming to the decision that the grades were right. They did not take students to whom they had awarded Cs and Ds and look at the controlled assessments and papers to see whether the standards were comparable. It was just a desktop statistical exercise to come to the conclusion, "No, we have got it right." I just do not think that is fair on the students.

Q24 Chair: Exam boards do look at that work. They do exactly the things you have just said and they ensure that within their whole range of marks there is a fair allocation between and among students. That is what they do. It is Ofqual’s duty then to look from a higher level at ensuring fairness and comparability.

Mike Griffiths: All I can say is something has gone wrong in the system somewhere this year that has led to this situation.

Q25 Damian Hinds: It strikes me we might have two separate discussions already going on. One is about what happened at the macro level, on average, to pupils in England in 2012. The other is about differential patterns among different schools, and I think it might be helpful to take those as two separate questions. If we start with the individual, as we have two distinguished heads with us, Mike and Kenny, could you just explain in simple, entry-level, accessible language what happens in your schools: how many pupils we are talking about, what they did in terms of written papers, what they did in terms of controlled assessments, when they did those things and which particular students you are particularly concerned about?

Mike Griffiths: I had 214 students this year who did the GCSE. The teachers have years of experience and years of producing excellent results. They know what a C grade is and what an A grade is.

Q26 Damian Hinds: Sorry, we will get to that, but, just in simple terms, what did they do in terms of papers, exams, controlled assessments and so on?

Mike Griffiths: We had completed the controlled assessments before January, but did not submit them; everything was submitted at the end. They rigorously examined the standardisation materials, examiners’ reports, as a team. They had a clear set of parameters for each grade. They set practice essays for all of the candidates. For the mock exam, using previous mark schemes, they even purchased a complete set of full practice papers from AQA, all 220 copies, so that the boys would have had practice with the new style of paper. They did the redrafting that was appropriate. We, as a school, hosted moderation visits and we ensured that our students sat every single element of the exam under exam conditions and received targets depending on their performance, but all of them were submitted at the end of the course.

Q27 Damian Hinds: The final exam, if I remember rightly, was worth 40% of the overall and the controlled assessment was worth 60% of the overall. Is that correct?

Mike Griffiths: I believe so.
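The interaction between that 60/40 split and a moving boundary is worth making concrete. The mark totals and the 10-mark shift in the sketch below are invented for illustration-they are not the actual AQA mark scheme-but they show how work banked against the January boundary can drop a grade when the June boundary is set higher.

```python
# Hypothetical illustration of a boundary shift under a 60/40 split.
# All mark totals and the 10-mark shift are invented for the example;
# this is not the actual AQA mark scheme.

ca_marks = 72         # controlled assessment: 60% of a 120-mark total
exam_marks = 39       # written paper: 40% of a 120-mark total
total = ca_marks + exam_marks                 # 111 marks

january_c_boundary = 108                      # what schools worked to
june_c_boundary = january_c_boundary + 10     # the kind of 10-mark rise alleged

for sitting, boundary in [("January", january_c_boundary),
                          ("June", june_c_boundary)]:
    grade = "C or above" if total >= boundary else "below C"
    print(f"{sitting} boundary ({boundary}): {total} marks -> {grade}")
```

The same 111 marks clear the January boundary but miss the June one, which is the "same work, different grade" complaint made earlier in the session.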

Q28 Damian Hinds: So the controlled assessments had happened gradually, obviously within school and marked by the teachers, but for some reason you-and I know many other schools did this as well-did not submit them at the time, but amassed them at the end. Why do you do that?

Mike Griffiths: As a school, it is just our policy in all these things. As a school, we believe, as I think the present Secretary of State does, in the notion that you are best prepared at the end of the course when you have completed all of the work and so on, and in all of our subjects they sit the exams at the end of the course.

Q29 Damian Hinds: Sorry, we are talking about controlled assessment. You say you are best prepared at the end, but you were saying that they did the controlled assessment through the year, but you submitted them at the end. So that would not have an effect on the preparedness of the pupils.

Mike Griffiths: We only submit them for grading at the very end.

Q30 Damian Hinds: Okay, thank you. Kenny, is that similar in your school?

Kenny Fredericks: Yes. In my school, 160 youngsters did the English and 54 did English language and literature. The 160 who did English were mostly the foundation tier. They sat the exam in June and they did the controlled assessment, speaking and listening, between January and June. The reason you do that is because all the time you are building up the skills. Teachers all the time are working on different skills, so all the time we are looking at them and testing them and seeing which are their strengths, which are their weaknesses. We regroup them to work with particular teachers to build up particular skills, so we leave different parts until the end.

Plus, you have to look at the whole picture of youngsters who are doing 10 GCSEs and you look at the whole year, where the pressure points are, and you try to work that out to give them a fair chance of doing it. The pressure on young people now is enormous. They are constantly being examined and our teachers are far more strategic now in the way that they are able to work with them and prepare them.

Q31 Damian Hinds: Is it correct that your concerns are mostly around the foundation level students-students taking papers for which you cannot score above a grade C?

Kenny Fredericks: Yes. Ten points were added to that boundary.

Q32 Damian Hinds: Sure, but specifically on foundation level; that is my question.

Kenny Fredericks: Yes. There have been smaller movements in the higher grades, but that is between two and three points, so it is specifically those lower ones. It is schools where they struggle even more-in deprived areas-that are going to suffer more than others.

Q33 Damian Hinds: Okay, thank you. Can I just check with Mike: is that the same with you? Is it foundation level students or is it foundation level and higher level?

Mike Griffiths: It is all the way through. It does affect the foundation students and I have some data on those, but it also affects the top end as well. For instance, the percentage of my students who got As and A*s dropped from 34% the previous year to 23%, so there was an 11-percentage-point drop in the highest grades as well.

Q34 Damian Hinds: Can I flip back now to the macro level and ask Brian and Russell? We know that overall the drop in the proportion of students getting A* to C in the English suite was 1.5 percentage points. We also know from the ASCL survey that 87% of schools thought that their results were worse than they had expected and 42% thought they were more than 10% worse than expected. I know we are discussing GCSE English and not maths, but how do you square those two things?

Brian Lightman: 87% of the people who responded to our survey-

Q35 Damian Hinds: What was the sampling technique, by the way?

Brian Lightman: This is just a survey of our school members. We fully understand that the schools that have done better are less likely to respond. Some did respond. So that would explain that difference there. That 1.5% is a global figure across the whole cohort for the whole year.

Since we have done that survey we have been getting a lot more information about the effect on schools, and it is a significant number of schools. For example, we have been doing a survey on the number who feel that they have been pushed below the floor standard of 40% as a result of this particular incident and so far-and it is an open survey-143 schools have told us that.

Q36 Damian Hinds: I think perhaps we might come on to that later. Russell, how do you square those two statistics?

Russell Hobby: I think that a very small percentage change can have quite a large impact when the C/D boundary is so prominent. A very small shift there can have a much larger emotional impact and an impact on people’s plans. I think that is why it feels bigger than the percentages would say, and I think it is bigger than the percentages say, because that 1% is quite a lot of students. Whether it is all of the population of students or just a segment of them, those people have been disadvantaged by that, so I think we need to dig below the number.

Q37 Damian Hinds: Does it suggest that schools were particularly bold and optimistic in their expectations this year or most years?

Russell Hobby: I think it shows that they rely very heavily on predictions for targeting their activities within a school. If you have a student with a secure C who is struggling in maths-or what you think is a secure C-you put your weight and effort into helping them become better at maths. Therefore, you were operating on what you thought was a legitimate expectation, because you thought, "They will never shift the grade boundaries by 10 points on this one." So I think it is that lack of warning and the way that schools are, in fact, extremely strategic, as Kenny said, in allocating very limited resource.

Q38 Chair: Schools were more optimistic this year, were they not? According to Ofqual, schools typically overpredict, and this year they overpredicted more than usual: they predicted 15% more students would get Cs this year than last year, whereas normally it is 12%. This was more excessive than usual. Isn’t that true, or is Ofqual wrong?

Kenny Fredericks: No.

Mike Griffiths: It is going to vary from school to school. If my head of English was here, she would resent hugely the implication. Their predictions are incredibly accurate. They are rarely more than 1% out either way, and last year they were quite ashamed because they went up 2% on their predictions. They are pretty good at it. I think it is a bit of an insult to the professional integrity of a lot of people to just assume that schools are artificially inflating their predictions.

Q39 Damian Hinds: Perhaps this comes back to the distinction between the individual school and the overall macro picture, and I am sure that is something we will come back to. Before marks were adjusted, we learned that in correspondence between Edexcel and Ofqual there was, at one point, an eight percentage point growth in performance in GCSE English for the Edexcel set compared with the prior year. Notwithstanding everything we have all been saying about obviously everybody wanting all children to be improving all the time, and, of course, in certain schools there will be exceptional performances, do you think it is possible in one year for performance to improve by eight percentage points?

Kenny Fredericks: It depends on your starting point, doesn’t it?

Q40 Damian Hinds: The starting point is the prior year.

Kenny Fredericks: When we had Olympic medals, did we put in a cap and say, "That is going to be the limit," or did we put all the measures in? This is a bit like Usain Bolt or Mo Farah running around, getting to the end of-I am not very sporty, sorry-the track, winning the race and suddenly being told, "Sorry, there was another 100 metres you should have done."

Q41 Damian Hinds: With respect, Ms Fredericks, that is not my question at all. I am not talking about an individual student.

Kenny Fredericks: But this is what it is all about.

Damian Hinds: Of course we all know individual students, even very large numbers of students, can do extraordinarily well, but at the macro level across the entire year cohort, and bear in mind you were also the head teacher in this school the previous year and all your staff were also doing a great job the previous year, can performance grow by eight percentage points in one year?

Brian Lightman: Yes it can; it depends very largely on the entry profile of the cohort, but there are situations where that can happen. The cohorts do change and one of the things we heard on the exam results day, in the briefings, was about the differences between the cohorts.

Q42 Damian Hinds: But do we believe that would have had a net positive effect or a net negative effect?

Brian Lightman: It depends on the cohort.

Q43 Damian Hinds: But from what we know of the changes in the cohort this time around, as I understand it, would a decline in the number of pupils from independent and selective schools, for example-other things being equal-given the mix of results that you have, have a net positive impact or a net negative impact?

Brian Lightman: That is too technical for us to try to make an informed judgment about at this stage.

Q44 Damian Hinds: Ofqual have helped you by saying: "There were about 23,000 fewer candidates from selective and independent schools, about 3.4% of the total, who will probably have migrated to international GCSE or other qualifications. These candidates will typically have relatively high Key Stage 2 point scores."

Brian Lightman: One of the points there is that Ofqual did not have the data on the whole cohort when they were preparing for these comparable outcomes. They only have data on 75% of the cohort. You have children who have come in from abroad who did not have Key Stage 2 results. You have students in the independent sector who have not done Key Stage 2 and so on, so it is not data on the whole cohort.
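The figures quoted in Q44 and in this answer allow some rough scale to be put on the cohort. The sketch below treats Ofqual’s "about 23,000" and "about 3.4%" as exact-which they are not-so the results are order-of-magnitude only.

```python
# Rough cohort arithmetic from the figures quoted in Q44 and above.
# Treats Ofqual's "about 23,000" and "about 3.4%" as exact, which
# they are not; results are order-of-magnitude only.

migrated = 23_000           # candidates moving to international GCSE etc. (Q44)
migrated_share = 0.034      # their share of the total cohort (Q44)

total_cohort = migrated / migrated_share      # ~676,000 candidates
no_ks2 = total_cohort * 0.25                  # Lightman: data on only 75%
one_point_five = total_cohort * 0.015         # the 1.5-point fall cited in Q34

print(f"implied cohort: {total_cohort:,.0f}")
print(f"without Key Stage 2 data: {no_ks2:,.0f}")
print(f"1.5 points of that cohort: {one_point_five:,.0f}")
```

On those figures, the 1.5-percentage-point national fall Mr Hinds cited corresponds to roughly 10,000 students, which bears on Russell Hobby’s earlier point that a small percentage is "quite a lot of students".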

Mike Griffiths: We can, if you like, play games with statistics, but the simple fact is that there are hundreds of schools like mine-one in Suffolk I have data for, another one in our county-where there are dozens and dozens of students who have got D grades. What has happened in the rest of the country, in one sense, as a head teacher, is not relevant. What is relevant is, in my school, the 30 boys who have not got a C that one would have expected, if standards were maintained, would have done; in another school, the 55 students in the same position. You would have almost thought that, if there is a removal of students at the top end, a high performing and successful school like mine and the one I am thinking of in Suffolk would have benefited from that, because their students would, apparently, have been even better. The opposite has occurred, which is that in many high-performing schools, if they entered all their grades at the end of the course, their students have, in great numbers, had their grades downgraded. Whatever you do statistically, I just think that is unfair.

Q45 Damian Hinds: I hope very much we will come back to these individual school effects, which are absolutely crucial. The last question I want to put is to Russell. I was going to ask a question to try to help my understanding of how publishing the grade boundaries in January could affect the actual results of children later on. I can understand how it can affect expectation, but not how it affects outcomes compared with if there had been no announcement of any grade boundaries at all. I think you already answered that question when you said that, if you had a banked C, as it were, in English, you might then move on and put more emphasis on maths or other subjects. Can you just expand on that a little bit more for us?

Chair: Perhaps briefly.

Russell Hobby: I could just say "yes", like I answered it previously-I could contract on it. If you have a secure C, then it would be sensible, in the child’s own best interests, to focus on subjects in which they are not certain to get that, because if you get a C in English and a D in maths, that does not do you any good at all, so you would not keep working on the English. So you may have assumed that child was safe and secure in that C for English, you have done the work that needed to be done and, given limited teaching time, you are going to focus on their other, weaker, subjects.

Damian Hinds: Thank you.

Q46 Alex Cunningham: What did Edexcel tell schools about the changes after the January exams? What contact has there been about changes?

Kenny Fredericks: We went with AQA and they did not give us any feedback on that at all. We were working towards the grade boundaries we had with an expectation that might vary between two or three points.

Q47 Chair: Who told you it would vary between two and three points?

Kenny Fredericks: You always know that with exams. That is expected.

Q48 Chair: So no one told you that; you just thought that.

Kenny Fredericks: Yes. If we had been told that suddenly there was going to be a 10-point difference, we would have been able to sit down in the English department to see what they had to do to make sure that the children were able to have a go at getting those additional 10 marks. We are really cross that we did not know, and therefore we could do nothing to support them.

Q49 Alex Cunningham: So the schools did not see it coming.

Kenny Fredericks: No. There was no indication.

Mike Griffiths: My head of department, on the morning of the exam results, came into my office in tears, basically offering to resign. We had absolutely no idea this was going to happen. We had no idea from the boards that this was going to happen. As I quoted to you from the board’s own comments, they feel that the department had prepared the students well, had marked them accurately and consistently, and had approached the new specification in an exemplary fashion. So why would those experienced teachers, who know what C and A grades are, have been expecting that huge decline?

Q50 Alex Cunningham: Edexcel are not due before us immediately at this stage, but they have to answer some questions here, surely.

Mike Griffiths: Like Ms Fredericks, we were AQA as a board, but, again, had no information from the board and at no stage was the department given any sort of indication that this might happen.

Q51 Neil Carmichael: Do you accept that Ofqual was right in saying that the January grades were a bit generous? If so, to what extent, and how do you measure that?

Brian Lightman: We have to accept that, but that should have been identified at the time. What we are saying here is that that is the purpose of moderation. That is what should be happening. It is very common for schools to have their marking for coursework or controlled assessment-those types of things-adjusted to make sure that it is of the right standard, but that did not happen here.

We have plenty of schools that were told by boards-and it is not just one board but all boards-that there would be little change from the January boundaries. They were told that in conversations that they had between January and June.

Russell Hobby: To be honest, the time to act was before January or after this year’s results, not in the middle of this year. It is almost a belated realisation that they are going to get into trouble with grade inflation, and therefore the June cohort is being punished for that. I think Ofqual should have taken that hit and then revised their procedures for the year to come rather than the June cohort.

Q52 Neil Carmichael: So they were too late in noticing what had gone wrong in January.

Russell Hobby: Yes.

Q53 Neil Carmichael: Effectively then correcting-

Brian Lightman: Harshly in June.

Q54 Neil Carmichael: At what point do you think Ofqual realised they had made the mistake?

Russell Hobby: There is some evidence to suggest they knew three years ago that there were going to be problems at this level, and there has been a lack of institutional memory within Ofqual as well. They should have known it was coming rather than realising afterwards. Between January and June, who knows what went on; that is something for them to comment on, but it was far too late.

Brian Lightman: They did have the work of 54,000 candidates in June 2011 to look at as well.

Q55 Neil Carmichael: So they could have done a comparison.

Brian Lightman: Yes.

Q56 Neil Carmichael: You obviously feel they did not do that or at least do it properly.

Brian Lightman: Yes.

Q57 Chair: The allegation is a straightforward failure of moderation and of grade boundary understanding at an earlier time, when there was sufficient evidence to come to that conclusion, and you do not think their suggestion that there was not sufficient evidence is fair.

Brian Lightman: Yes.

Q58 Craig Whittaker: Had schools and head teachers known earlier, would that have made a difference?

Kenny Fredericks: Yes.

Q59 Craig Whittaker: In what way? Are you saying you would have worked harder?

Kenny Fredericks: If you are the English teacher and you are working with your groups and you work as a team, you look at the exam specification, what needs to be done and how you get the points. It is very sophisticated now. Teaching is really, really good now and I do not think people recognise that. They would then have looked at what youngsters have to achieve to get the grade.

Q60 Craig Whittaker: Why didn’t they do that anyway? If there is room to get these kids to do better anyway, why aren’t teachers automatically doing it?

Russell Hobby: Teachers are working as hard as they can.

Q61 Craig Whittaker: So it would not have made any difference.

Russell Hobby: No, because they could have used those hours on the subjects in which they were not secure, as I was saying to Damian. If you have a secure C, you are not going to spend more time on it when a student is lacking a D in another subject, because they need that D. It is not the school that needs it; it is the child that needs that to get their places. So it is a question of working smarter rather than harder. I do not think you can get anything more out of the teachers themselves in terms of effort.

Q62 Craig Whittaker: So it would not have made any difference then.

Brian Lightman: I think you are highlighting a very important point here that is for a later discussion, perhaps, about what teaching is about, because we are so focussed on the exam system here; it is a very strong focus. But that is another discussion, I think.

Kenny Fredericks: At least give the schools and the teachers a chance to do something and to give children a fair chance at this. This is completely unfair and really is not acceptable at all.

Q63 Neil Carmichael: Could you talk about the evidence you have along the lines of Ofqual not applying the comparable outcomes approach in a proper and consistent way?

Brian Lightman: Yes. We have collected statistics across the whole examination here. We have looked at the grading of all of the different parts of the exam. We have looked at the percentages that are going there and we have looked at these variations there. I am struggling; I do not think I can just describe all of that information. What I can offer, and we have been offering, is to provide anything that will be necessary to help there, and we have been working closely with Ofqual to give them that information and feedback, but we have been gathering a lot of evidence from schools about it.

Q64 Neil Carmichael: Can I ask one more question? How can we map out the variance between schools, which has been referred to since we started this session in one way or another? That is an equally important question, which we discussed prior to your arriving here, and it seems to be still not properly calibrated.

Brian Lightman: You have to go back to looking at the pupils’ work and you have to look at paired subject analyses as well, to see how the students did across their other subjects. That is going to require some digging back in schools, but I think it is absolutely essential, in a future investigation of this, to get to the bottom of what has happened here. There are urgent issues that need to be addressed here, obviously, in terms of the grades those pupils have got, but there are a lot of very searching questions that can only be answered by looking at that sort of level of detail.

Q65 Chair: Can I ask you then to just briefly tell us what you want to happen now, what needs to be done, and who you think is best suited to do it?

Russell Hobby: There are two initial things: there does need to be an independent inquiry into this. I do not think Ofqual can investigate itself on these matters, because I think they are part of the issue. Now, an independent inquiry may reveal that they made all the right choices and so on, but at least we would then have confidence in that.

I think there should be a regrade for those students affected in June. They are competing against the January cohort. They are competing for places and for jobs and apprenticeships. If comparable outcomes is going to mean anything, it ought to mean comparable within a year as well. They were not given the predictions and the chance to overcome the new grade boundaries. If they had been warned about them, they could have done something or pulled something out of the bag in order to achieve that. So I think they have been let down.

Q66 Chair: Do you welcome the resit opportunity?

Russell Hobby: It is not enough, and it does not reflect that you have to study the GCSE again. You have to start again for parts of that as well. Plus, people are being entered in Level 2 courses when they should be in Level 3 courses in further education. It is an inadequate redress, I think.

Mike Griffiths: I think the resit is a complete red herring; it has nothing to do with this. The students have already done the paper and they have already performed and whatever else. It is not because of any shortcoming in the students that this has happened, so I think that offering a free resit is not useful.

Q67 Ian Mearns: The thing that strikes me from the previous discussion is where Ofqual state that "it is regrettable that the publication of grade boundaries for the January assessments could have led schools to assume that the boundary would remain constant, and we will review with exam boards any lessons from this". Why would schools think that the grade boundaries would be changed within the same academic year?

Brian Lightman: Awarding bodies need to have the right to change grade boundaries if there is a different level of challenge between the papers. That has always happened and that makes the awarding fairer, but in this case there was no reason to expect that it would be this sort of extent of change.

Q68 Ian Mearns: I think any rational person would assume that grade boundaries would remain constant within an academic year.

Brian Lightman: Within an academic year, yes-between years.

Q69 Ian Mearns: Do you really feel that Ofqual have treated young people who have suffered because of this as some sort of collateral damage in the face of the Secretary of State’s concern about grade inflation?

Russell Hobby: I think Ofqual have focussed too much on the statistical arguments and not on the people involved in that. I felt the initial interim report was remarkably complacent about its impact on the individuals concerned. In fact, I do not think they were mentioned within it. We are all getting dragged into statistical arguments here and, indeed, there does need to be some sort of statistical base to this, but it seems like we are letting the statistics and the calculations drive the choices that we make rather than simple justice for the people involved. I think whenever a system has got to this level of complexity, where interested laypeople from all sorts of directions cannot really tell what is happening, we have a system that is out of control.

Q70 Ian Mearns: It feels to me that there is some sort of element creeping in here that there has to be some limit to progression-that only a certain number of youngsters are allowed to go above that level, so there is a glass ceiling being introduced artificially halfway through an academic year.

Kenny Fredericks: That is exactly what is happening. I had to go and speak to year 11 yesterday and try to explain to them what was happening and what is going to happen this year. They are going to sit their exams and yet they do not know what they are going in for. They do not know what the expectation is. They do not know what marks they are supposed to get. Their teachers do not know. It really is not right.

Q71 Ian Mearns: Because of the importance now for the individual, but also for institutions, of English and maths, do you think this has had a particularly strong effect because of what has happened in English this year? You are all nodding.

Mike Griffiths: I think that is true. At the end of the day, the statistics are one thing, but ours, if you like, is a people business. It is about the students themselves. I do not know any other issue that has united the NUT and HMC, the academy chains and local authorities. They are all united in this-that there has been a miscarriage of justice-and that is why we want to see a change. I am sure you have colleagues who have children who have sat the GCSE this year and who have been affected by this. I know of one MP because of a local school, and her son did do the GCSE and has suffered at the hands of this. At the people level, it requires something to be done.

Q72 Mr Ward: First of all, I think the language has been a bit difficult, because adjusting marks and grade boundary changes are not the same thing. There was an opportunity to adjust the marks through moderation in January and the decision was made not to do that. Can I just come to you, Mike? You did not cash in, in January. There must have been some results that you saw in January of which you thought, "That is a bit better than I expected." Wasn’t it risky to leave it until June? Why didn’t you cash in some of the results where you thought it might be risky if you left it until June?

Mike Griffiths: There is no reason to. My head of department had no reason to imagine that there would be a problem. We did not come into this assuming that there was going to be this level of turbulence in the system.

Q73 Mr Ward: So based upon the results received, with another four or five months to work with the young people, knowing what the boundaries would be, you felt you could enhance the January results.

Brian Lightman: I think there is another point here. Whilst some of that January marking was overgenerous, that is a global statement across the whole system. It may well have been that in Mike’s school it was not overgenerous, because the moderation covered only a very, very small sample of the people who took it.

Q74 Chair: You still have to have 60% of all the marks in a qualification awarded by controlled assessment, namely by the teachers concerned.

Brian Lightman: That is a different discussion to be had in the future and one that we must have, but the issue here is that, if you have that system, you have to have proper safeguards in place. I think that is where the difficulty is.

Q75 Ian Mearns: Kenny, you talked before, particularly in the sort of school that you are head of, of the youngsters who did not make the C. What is happening to those youngsters now? A lot of them will have been disappointed and not progressed and left. What is happening to them?

Kenny Fredericks: We have a sixth form, but we do the International Baccalaureate at our school, which is quite a difficult qualification to do. We do some Level 2 courses also. We are working with the colleges, and within our own school, to see what arrangements we can make for them. For youngsters who, through good career planning and support, had chosen and applied for courses that they cannot now do, the colleges are trying to find them Level 2 courses. Occasionally they are being told that they can do the Level 3 course, but they have to spend a year doing the exam again, which takes their eye off the ball of the original course they are doing.

Some are so disgruntled and so demotivated that they feel that, no matter what they do, they cannot succeed, because when the barriers or the goalposts keep moving, you cannot succeed. They are talking about taking a year out. Now, I know that with some of my students a year out means they are going to turn into NEETs. I think this is a big issue; we are going to increase the number of young people not in education, employment or training.

Some who were supposed to do apprenticeships have not been able to go on to them and we are trying to find them alternatives. My staff are working with those individuals, trying to place them so they are not sitting at home.

Q76 Ian Mearns: According to Ofqual, AQA found evidence of significant teacher overmarking. Should exam boards have been adjusting schools’ marks downwards if that was the case?

Kenny Fredericks: First of all, when mine did the exam in January, our expectations and the results matched: what we had expected to happen did happen. With the controlled assessment element, 20% of the marks come from speaking and listening and 40% from written controlled assessments. The students have to be taught certain things, whatever the question is, and they have to work on that, do essay plans and so on. It is moderated, so the exam board asks for samples, they look at the work, they moderate the work, they give feedback. The feedback we had commended our marking and agreed with the grades we had awarded. It is not as if teachers just mark. Because of the nature of our school, we made sure that those controlled assessments were checked and moderated by external markers as well as our own staff. So there is no way that the teachers were overmarking, and certainly AQA gave no indication that that was the case.

Mike Griffiths: Again, just to reiterate, we were with AQA as well and there was no evidence at all that the teachers were overmarking in my particular school; I know it is only one school, but that is why I am here. I repeat that the exam board’s report said, "The centre is to be congratulated on the accuracy of the marks awarded to the students," and then they get marked down by 17%.

Q77 Chair: Assuming that is the case, could it be that the schools that were congratulated and had very high standards and did not boost marks to suit their league tables were disadvantaged relative to others? AQA are saying that, overall, they think there was significant evidence of overmarking, but it was within the tolerances, so they could not intervene. Within the tolerances, people were marking up because they thought they knew where the grade boundary was, and they boosted their pupils’ marks in order to boost their apparent performance. Could it be that the schools that did not do that have been penalised at the expense of those that did?

Kenny Fredericks: I do not think AQA gave that information to anybody, as far as I know from the feedback I have had from hundreds of schools.

Q78 Chair: The Ofqual memorandum to us says that AQA found evidence of significant teacher overmarking, and "overmarking" means boosting of scores artificially, does it not?

Brian Lightman: My question would be why wasn’t that fed back to schools? If they were overmarking, why wasn’t it corrected at the time? Clearly, you have heard Mike’s moderator’s report, and I have seen many other similar comments in feedback from schools saying they have been told exactly the same thing.

Q79 Ian Mearns: What you are saying is, if Ofqual are saying that AQA said that, they kept it to themselves.

Brian Lightman: Yes.

Kenny Fredericks: Yes.

Russell Hobby: Yes.

Q80 Ian Mearns: Lastly, to Mike and Kenny in particular, did your staff attend exam board training events for GCSE English? If so, what information was given at those events about controlled assessment grade boundaries?

Kenny Fredericks: They certainly went. They always go to the exam moderator meetings. I cannot tell you precisely what they were told. Whatever they were told, they followed to the letter. They followed the exact instructions, they took advice and we brought people in as well who had come from the exam board to make sure that everything was okay.

Q81 Bill Esterson: Given the importance of getting five or more grades A* to C, isn’t this as much about the league tables and the need to try to get above the 40% floor level as it is about individual students?

Brian Lightman: Of course the league tables and the accountability framework drive behaviour, and schools could not possibly ignore the fact that is what the expectation is. But the issue here-and I think this is why there is such strength of feeling about this-is this is about students. We are here talking about the students today, and we have examples; we have lists of names of people who have lost out as a result of this, and that is what the issue is here. We need to have conversations later about the effects of the accountability system and so on, but they are different discussions. The issue here is about the students.

Russell Hobby: Of course, five A* to C grades matter as much to the student as they do to the school, so your interests are aligned. If you have not got that, you are not going forward. But I think throughout this conversation you can see just how much our examination and assessment system is distorting and changing what is going on in schools. Schools are becoming-and are-exam factories. You have seen the strategy and tactics that are used. This is in the interests of the students as well, because if they do not get those five A* to C grades they are not going anywhere. But we have a schooling system that is dominated by tests now, from the age of five with the phonics screening check. Once we have solved the problem for the students in this year, we need to move on and look at what damage is being done to learning by the examination system.

Q82 Bill Esterson: Do you think the reason that English has had a particular problem-I think one of you said earlier that there was slight evidence of something in maths as well-is because of the importance of five A* to C grades?

Brian Lightman: Yes, because literacy is the key to every other subject and to every career that you need. So all of the strategies that have been implemented by schools, both at school level and, indeed, at national level if you take London Challenge, National Strategies-all these other things that have gone through in the past-they have all, rightly, focussed on standards of literacy, and that is absolutely what needs to happen.

Q83 Bill Esterson: Is there a tendency in any way for teachers to be, however slightly, generous in marking assessments, particularly at the borderline?

Brian Lightman: I think the quality of assessment has improved absolutely beyond recognition over the years. It has been a major, major focus for training in every school in the country. When I go into schools now, as I do, as often as I can, head teachers show me how they are monitoring the progress of their students and what interventions they are putting in as a result of that assessment. It is not in their interests. Indeed, if Ofsted came in and you were assessing overgenerously, you would be, rightly, in for criticism there. It is a science. Assessment is a very high-order skill for teachers and it is something that is, rightly, emphasised as a very important aspect of our work.

Kenny Fredericks: It is moderated by the exam boards very carefully, and we have to take back what they tell us and we act on that, so it would be in nobody’s interest to overmark. If anything, teachers probably undermark.

Q84 Bill Esterson: So the suggestion that that could happen is completely wide of the mark.

Kenny Fredericks: I see no evidence of that.

Q85 Chair: Except for AQA stating that it is the case, and there being an incentive system precisely to overmark. There is transparency in the system, so you know exactly what the tolerances are and by how many marks you can go above what it should be, and you know there cannot be intervention; and you have all these targets and you are making every effort to try to reach these targets for the school’s and the child’s benefit. To suggest that there is not an incentive to overmark is absurd.

Kenny Fredericks: As we pay AQA a large amount of money-we spend a huge amount of money on these exam boards-would they not do us the courtesy of giving us that feedback? We take feedback and we act on feedback.

Q86 Chair: It is a different point, though. There is an incentive in the system-which Russell highlights has all sorts of perverse outcomes-within the tolerances to overmark.

Brian Lightman: What we need to get back to is talking about what standards a student should learn. What should they be learning? Instead of chasing a grade C or a grade B, we should be talking about teaching and learning.

Q87 Chair: I entirely agree. I must bring this to an end. It has been a fascinating session of the Committee this morning. Personally, I am particularly struck by the issue of what might be called the "turbulence" in the system, this variability, because we are talking about the grade boundary as if it is uniform in its effect. This year is peculiar in that the application of standard lines is impacting people in a much more variable way than before. That means that there are thousands of children who are being treated in a way that is different from prior years, and I am not sure we are much further towards understanding why that variability occurred, regardless of where you put the boundaries and what you do between January and June. So anything you can do to throw light on that would help us understand who has been ill-affected, and how many, and might therefore contribute to any understanding of what needs to be done about it. Thank you all very much; it has been very useful.

Examination of Witnesses

Witnesses: Glenys Stacey, Chief Executive and Chief Regulator of Qualifications and Examinations, Ofqual, Amanda Spielman, Chair, Ofqual, and Cath Jadhav, Acting Director of Standards, Ofqual, gave evidence.

Q88 Chair: Good morning and welcome to our second session this morning looking at the issue of English GCSEs in 2012. Perhaps it might be best to start with as succinct an overview as you can give us, please, as to what has occurred, what is commonly agreed and where there is dispute.

Glenys Stacey: Thank you very much, Chair; good morning. The position is this: GCSE English results were down A* to C this year by about 1.5% overall. That was very much in line with expectations because of changes to the student mix, but there has been an unusual distribution pattern of results: there is a greater variation between schools than expected, it seems. For some schools, these results are a far cry from their expectations, and so we have been looking carefully at what may lie behind that.

I think there are a few important points that we would make. First of all, there has been no political interference; I am happy to talk about that. Secondly, awarding and grade boundary setting worked as it should have done for the English suite and as it worked, indeed, for other GCSEs, AS and A levels. So we played our proper part in regulating standards there, and neither the exam boards nor the regulators did anything other than what they should have done. But it is very clear when we look back that in January awards were, as we have said, generous, but that could not really have been seen by examiners at the time.

The issue really seems to be, as we boil it down, that students have taken different routes through these qualifications. There is what you might call a route effect, which we need to understand more. These were designed as modular qualifications with 60% controlled assessment, but it seems that a good number of students took them as if they were linear qualifications. We are looking, for example, I think, at 45% doing that in AQA, the biggest provider, and we may well be seeing here a route effect, and we need to understand more about whether that route effect is pretty well what you would expect for any qualification that was used in that way, or whether there is something particularly different about these qualifications.

Q89 Chair: Thank you. You have known all about the risks around new qualifications, around modular construction and controlled assessment for several years. Why didn’t you anticipate the situation you now find yourself in and do something about it?

Glenys Stacey: We have certainly known about modularisation and the issues that arise. Modularisation was from a different era, I think, and we have announced that we are making changes to GCSEs. These are coming into effect from September, and move to a linear approach. We are currently consulting on that for A levels as well. That has come from research we have done and published, where we have looked at international ways of assessing. So we recognised that modularisation can have an effect on standards, and we very much want to see a better arrangement there.

On controlled assessment, the arrangements that we inherited provide for different levels of controlled assessment depending on which GCSE subject you are dealing with, and it is a particularly high amount for the English subjects; it is 60%. As you will know, Chair, we reviewed controlled assessment last year and we found from schools some issues about the workability of it, and so we then worked with exam boards to make things a little bit more straightforward. This year, we started another review of controlled assessment, a more fundamental review, looking really at its suitability. We were concerned there about what we were hearing from schools, most particularly in English, about the impact it was having on teaching and testing in schools. In short, the amount of time spent preparing for it, then delivering it and then assessing it was thought to be disproportionate and eating unduly into teaching time. What we can see from the evidence in the English suite this year is that there are other considerations as well that can now feed into that review.

Q90 Chair: We know you are aware of the risks around the introduction of new qualifications. You have well-known views. You are scrapping modularisation and reviewing controlled assessment. You are totally on board with the risks, and yet we find ourselves in this situation. The key question is why you did not anticipate that and what you could and should have done about it, so that we did not find ourselves in this position, with school leaders, as we have just heard, expressing the views that they do.

Glenys Stacey: I am sorry, Chair; I got carried away with modularisation and controlled assessment. We did see risks to the English suite. We approach oversight of these qualifications, where they are delivered, on a risk basis, and we had identified that we were looking particularly this year at new qualifications and those in significant, high-stakes subjects, and, of course, the English suite was one of those. So we did put in place a particular scrutiny programme right from the beginning, so we focussed on the suite. We attended many of the meetings for written papers and controlled assessment, for example, as boundary setting was done, many more than we would do ordinarily for the average qualification. We did not see anything at any of those meetings that meant that we needed to take further action, and we are happy to take more questions about that.

Q91 Chair: That goes to the heart of it: why didn’t you? We have just heard evidence that, although some modules had very small entries, the foundation tier, as far back as June 2011, had more than 50,000 people, if I recall, sitting it. So the allegation is that there were sufficient numbers. If you look at A level subjects, which rely on essay writing, some of those have much smaller entries than the entries that you saw in January this year or June last year and, basically, you should have had enough data and information to be able to do this. You were already alert to the problems, and yet somehow we still find ourselves sitting here today. That must mean that you failed somewhere to do what you should do in order to anticipate it. That is the allegation.

Glenys Stacey: I understand, and the position is, as ever, a little bit more complicated than it perhaps first appears. The foundation tier paper is, of course, an examination paper, and examinations are, by their nature, different. An examination can prove to be more or less demanding and more or less effective in what it is trying to do, so it is not at all unusual for boundaries on examined papers to change. The critical issue, I think, for us here is in the assessment of speaking and listening, which is part of controlled assessment and is notoriously difficult to assess; that has always been known. But in the speaking and listening units with, for example, the major provider, AQA, in January this year there were 14,000 students who sat that. In June, there were 376,000, so a very large percentage were delaying until June. It is not that different for Edexcel either; I think it was 700 in January and 23,500 in June.

Q92 Chair: 14,000 is large for many subjects. Are we to assume that any entry that is less than 14,000 has unstable and unreliable results? That is what we are struggling to understand.

Glenys Stacey: I see; then let me try to explain that. It is a relatively small proportion of the whole, and that is not unimportant. The other issue is that, in those early assessments, examiners did not have much information other than the scripts to go on. It was a new set of qualifications, so they were not able to rely explicitly on past examples of performance that they could truly compare, in the way that examiners can when qualifications are well established. It is always a problem when you introduce new qualifications.

Q93 Chair: Isn’t your key control, rightly or wrongly, the comparable outcomes framework? You have developed and discussed and been open about that: to look back at Key Stage 2 results-in other words, how kids do at the end of primary-and then map that forward across decent numbers. That does not stop individuals moving from the bottom to the top, but allows you to check whether you are maintaining comparability. Surely, across 14,000 earlier in the year, you would have applied exactly that, and the alarm bells should have rung and something should have been done then.

Glenys Stacey: I think the position, Chair, is that the key piece of information available to exam boards by June, but not before that, was a statistical prediction based on prior attainment of the candidates, the cohorts. It was based, as you say, on the relationship between national outcomes in GCSE English and the Key Stage 2 scores for the cohort. If I could just explain the prediction, it makes the assumption that the relationship between prior attainment of a student and outcomes ought to be stable from one year to the next. It was used definitely to guide the 2012 June award in a way that simply was not possible for the awards in each of the earlier units.
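
The gist of such a prediction can be sketched as follows: a minimal illustration in Python of a prediction-matrix approach of the kind described, assuming a simple banded model. The Key Stage 2 bands, reference-year rates and cohort counts are invented for illustration; they are not Ofqual’s actual figures, and the real methodology is considerably more detailed.

    # Probability of achieving grade C or better at GCSE, by Key Stage 2
    # prior-attainment band, estimated from a reference year's national
    # results (illustrative numbers only).
    REFERENCE_RATES = {"low": 0.25, "middle": 0.65, "high": 0.95}

    def predict_c_or_better(cohort_counts):
        """Predicted proportion of this year's cohort achieving C or better,
        assuming the band-to-outcome relationship is stable from one year
        to the next (the comparable-outcomes assumption)."""
        total = sum(cohort_counts.values())
        expected = sum(REFERENCE_RATES[band] * n
                       for band, n in cohort_counts.items())
        return expected / total

    # This year's cohort, counted by Key Stage 2 band (illustrative).
    cohort = {"low": 120_000, "middle": 300_000, "high": 180_000}
    print(f"Predicted A*-C rate: {predict_c_or_better(cohort):.1%}")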

Q94 Chair: That gets to the nub of it, but could you explain why that was the case? I think a lot of people are wondering how it was possible that did not happen before.

Glenys Stacey: I can quite understand the confusion. The statistical predictions that were used to inform the early awards were, compared with June, a comparatively weak source of evidence, because they were thin. They were based on it being a new qualification. We know, for example, that in AQA it was understood and explained very carefully to the awarding committees at the time that this was not as robust an approach. Why not? Because, at the time, you were looking at individual units early on, again with low numbers, and the position is that, at that time, there are several unknowns. There are more unknowns than there are when you have a stable qualification or at the end of the qualification. The unknowns are these, for example: the strength of the correlation between the units, which strongly influences how unit outcomes aggregate to give subject outcomes; the impact of the move from coursework to controlled assessment on mark distributions and, therefore, the aggregation. That move happened in other subjects, but of course here we are talking about 60% and we are talking about English and the assessment, particularly, of speaking and listening. Another unknown until the end was the impact of changes to tiering and candidates being able, for the first time, to get a grade A on the foundation tier. So there were very significant unknowns until June.
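
The first of those unknowns, the strength of the correlation between units, can be illustrated with a minimal Monte Carlo sketch in Python, assuming a toy two-unit model with a shared ability component. The weights and the aggregate boundary are arbitrary assumptions, not the real mark scheme, but the sketch shows how the same fixed boundary passes a different share of candidates depending on how strongly unit outcomes are correlated.

    import random

    def share_above_boundary(corr_weight, n=100_000, seed=1):
        """Simulate two unit scores sharing a common ability component
        weighted by corr_weight. Stronger correlation widens the spread
        of the aggregate score, so a fixed subject boundary passes a
        different proportion of candidates. Purely illustrative."""
        random.seed(seed)
        passed = 0
        for _ in range(n):
            ability = random.gauss(0, 1)
            unit1 = corr_weight * ability + (1 - corr_weight) * random.gauss(0, 1)
            unit2 = corr_weight * ability + (1 - corr_weight) * random.gauss(0, 1)
            if unit1 + unit2 > 1.0:  # fixed aggregate "boundary"
                passed += 1
        return passed / n

    for w in (0.2, 0.8):
        print(f"correlation weight {w}: {share_above_boundary(w):.1%} above boundary")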

Q95 Bill Esterson: Coming back to the statistical point, 14,000 out of 376,000 is statistically a very large sample. Compare it with the way that public opinion polls or market surveys are done: roughly 2,000 people are used to predict the political allegiances of a population of many millions, at a confidence level of 95% to 98% and within a margin of error of 1% or 2%. Surely the comparison is that 14,000 out of 376,000 should have made that very much easier to predict than the equivalent in our business.

Glenys Stacey: I wish that it was so. Of course, we know that in political polls the sample is very carefully collected to be, so far as possible, representative of the population as whole. It is not like that in awarding.
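
For reference, the arithmetic behind the polling comparison can be checked with a minimal sketch in Python, assuming simple random sampling and a 95% confidence level. As the answer above notes, the calculation only holds if the sample is representative of the cohort, which a self-selecting early exam entry need not be.

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        """95% margin of error for an estimated proportion p from a simple
        random sample of size n. Assumes the sample is representative; a
        self-selected entry of 14,000 need not be."""
        return z * math.sqrt(p * (1 - p) / n)

    for n in (2_000, 14_000):
        print(f"n={n:>6}: +/-{margin_of_error(n):.1%}")
    # n=  2000: +/-2.2%    n= 14000: +/-0.8%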

Q96 Bill Esterson: I am sorry, but in the field of statistics-which I can barely pronounce-a large sample size of 14,000 irons out some of those anomalies and, in fact, good analysis enables you to correct for the rest by weighting the data yourselves.

Glenys Stacey: I will ask Cath to explain a bit more in a moment, but we know that in these early units you may get, for example, particularly bright children being put forward, because they can put that unit out of the way and under their belts and get a decent grade.

Q97 Chair: Sorry to take you back, but if we are talking about comparable outcomes as the foundation, and the foundation of that is Key Stage 2, then if the entry at any particular time happened to be weighted more towards the kids at the bottom or the top academically, in prior attainment, you would immediately know that. This was not 50 compared with 300,000; this was 14,000, as Bill has said. We are struggling to understand why it was not possible. Was there something missing? Was software not available? Were techniques planned for June that simply were not used in January? I am none the wiser; I may be being dim.

Q98 Mr Ward: The 14,000 was not the sample; it was the population. It was 100% of the population. What you are saying is that, out of 100% of the population, you could not do an analysis to say whether the marking was fair.

Glenys Stacey: What I am trying to say is that those 14,000, firstly, may not have been representative of the whole, but I understand the point you are making. Secondly, when awarders are awarding, they are trying to understand, so far as possible, what they have got in that 14,000. There is some information that can help them, some understanding, school by school or centre by centre, but there is not a lot. So it is much easier to do the job with a greater level of assurance when you have a much more sizeable proportion of the whole of the cohort. I hope that element is understood; that is the first point.

I think the second point is coming back to the unknowns that I mentioned earlier. They are extremely relevant as awarders are trying to gauge standards, particularly if there is nothing else beforehand directly comparable to go on.

Thirdly, and I know this is tricky, English is a particularly difficult subject in terms of setting standards. It is not a hard-edged subject where figures add up and so on. There is a large element of judgment. Now, we know from the scrutiny studies that we did looking at how it was done, and from the reporting that we had back from awarding bodies, because we were looking closely at this, that examiners did find it difficult. They did say it was particularly difficult, but the best judgments that could be made were made. Cath, I wonder whether you want to add anything else to that.

Cath Jadhav: As you say, we have Key Stage 2 predictions, but they predict how the students will do in the qualification as a whole. The awarding bodies do not know, at the point they are making individual unit entries, how that translates into unit level performance, and that is the difficulty. We know where we expect these students to end up, but we do not know how they are going to get there.

Q99 Chair: So, in effect, it is about the fact that only at the end do all the elements come together, and only when all the elements have come together are you able to triangulate, or whatever the right word is, and ensure the comparable outcomes. I understood that point; that is good. There were some letters between you and a couple of the exam boards that were published this morning or last night. Could you tell us about the discussions you had with exam boards about GCSE English results prior to the publication, to what extent there was a toing and froing, and how normal that is? Was this extraordinary in any way? Can you tell us a bit about that?

Glenys Stacey: It certainly was not extraordinary, I can assure you of that, but the best way to assist is if Cath explains the general approach ahead of the regulator’s awarding meeting, and then I can explain the specific position we found ourselves in with Edexcel, if that is what you are interested in.

Cath Jadhav: Yes. Throughout the awarding period we meet regularly with the awarding bodies via telephone and we receive the data coming out of the award-the emerging data for each subject as the award meeting is completed-and how that compares with the predictions. We review that within Ofqual with the three regulators, and we will go back and challenge or question anything that appears out of the ordinary. Then we pull that together for two separate meetings with the exam boards to look at A levels and GCSEs.

Glenys Stacey: Those meetings occur as soon as we have sufficient provisional data across the subjects-there are about 46 or 47 GCSE subjects-and across each of the exam boards. Just to set the context with Edexcel, they are a relatively small player in the GCSE English market, although their share has risen year on year from 5% to 10% this year, so they are making inroads in the market. The preliminary result that we saw when we had our regulator’s awarding meeting, I think it was on 6 August, was that their results were generous overall.

Q100 Chair: Can you take us through the different English GCSE awarding bodies and give us an idea of the extent to which you influenced the eventual grade boundaries compared with their starting point?

Glenys Stacey: Yes. Cath will correct me if I get this wrong, but there are four main bodies and then one in Northern Ireland as well. Of those bodies, AQA’s results came in I think just a little bit above expectations from last year, and we were happy with those results. That showed some improvement in true attainment, so that was fine.

Q101 Chair: Does that mean there is not a comparable set of letters to the ones we saw between you and Edexcel?

Glenys Stacey: That is right.

Q102 Chair: So you did not write to them in a similar manner.

Glenys Stacey: No, we did not. In the awarding meeting, when we saw the provisional data on 6 August, it looked acceptable to us from AQA. There was no exchange. It was a very similar position with OCR: those results had come in in good shape and no exchange occurred there. We did have exchanges, though, with Edexcel and WJEC, and I can deal with those, if you wish.

Chair: Yes, that would be helpful.

Glenys Stacey: Okay. Well, perhaps, first of all, with Edexcel, as I said, the preliminary results there were high. If the provisional results had been left to stand, I think we were looking at 6% or 7% inflation, and there would have been a different outcry, I suspect, had that been the outcome. They were certainly out of line with the other awarding body results and, of course, our statutory role is to maintain standards over time and across awarding bodies as well. At the meeting, the regulators were there, but also each of the leads in the exam boards. We discussed this at that time, and there was a clear recognition, a view, that this really did need challenge, and so I agreed that I would deal with that with Edexcel on a one-to-one basis, which is what you would expect.

I should say that at that meeting I did recognise and say that I knew that awarding GCSE English had been a real challenge for exam boards this year, and I acknowledged that they had been trying very hard to get that right.

Q103 Chair: You did not have to direct them in the end. In a way, they accepted the fairness of the line you agreed, because if they had not, they could have refused and then insisted that you directed them to change.

Glenys Stacey: Absolutely so.

Q104 Chair: Did you direct any other awarding body to change in any other subject this year?

Glenys Stacey: No, but we were prepared to; that is our job.

Q105 Chair: It also gives an indication that, since nobody ever forced you to direct them, they are pretty keen on not being directed.

Glenys Stacey: No. You will know that we are there to do a job: to maintain standards. If a direction is required, we will do it. That is why Parliament set us up as it did and gave us the powers that we have. I know it was recognised when our enabling legislation was passing through Parliament that this would be required to hold the line. Ultimately, it is the exam boards’ decision as to how they bring the qualification in. We basically did a proper job in writing to Edexcel to remind them that it was their duty to do that, and they came back and responded to us with information as to how they were going to do it.

Q106 Chair: Can I move you on to WJEC? Theirs is a rather anomalous situation.

Glenys Stacey: Absolutely. I should just say, though, because it is not entirely clear so far in debate, that the proposal that Edexcel came back with still left them out of tolerance. They were still showing year-on-year increases over the year before, and we had a difficult judgment to make as to whether we thought that was tolerable, if you like. We do have to make those judgments and we did, and we concluded that we would accept the proposal that Edexcel made.

Q107 Mr Ward: When you asked Edexcel to bring them into line, did that include their already published January results?

Glenys Stacey: It is not for Ofqual, at that stage, to advise and state what we think they should do. Obviously, we had discussions with them throughout awarding as to the issues. You will see we did not write to them telling them to change grade boundaries in June. We wrote to them telling them that they had a requirement to bring the qualification in appropriately, rather than at 7% above last year’s results.

Q108 Mr Ward: On the basis that they had already published the January results, and we have more or less agreed that they were flawed from this analysis that was carried out, in your view would it have been inappropriate for Edexcel to bring their results into line using the January results as well?

Glenys Stacey: I do know that Edexcel had considered, and discussed with us beforehand, the possibility of reopening the January grade boundaries; it was an uncomfortable prospect and, I think, fairly quickly discounted at the time, because those results were out there as established, published results, and students had relied on them when building up the rest of their qualification.

Q109 Mr Ward: So if they did not do that, your understanding is that they brought them into line with what we have already agreed to be flawed January results?

Glenys Stacey: No, they did not quite bring them into line with what I would say were generous but right-at-the-time results.

Q110 Mr Ward: Did they claw back the overgenerous January marks through adjustments to the June boundaries?

Glenys Stacey: I think I understand what your concern is there: that there was some overharsh approach to the June boundaries. The position is that if they had clawed back, as you say, to get to truly comparable outcomes, they would have had to make more significant changes to the grade boundaries than they made. They did not do that. They did not come back to us saying they were going to do that. They came back to us saying that they were going to make changes, yes, but they were not of that ilk. They made a judgment; we looked at that; we thought it to be right. An option for us would have been to go back and direct them to make more radical change or, indeed, to reopen the January boundaries. We did not think that to be right.

Q111 Chair: We have limited time and a lot to cover: WJEC as succinctly as you can manage, please.

Glenys Stacey: Right. Well, as I have said before, we are applying comparable outcomes across A levels, AS and 47 or so GCSEs. Following the meeting with exam boards to review the provisional A level results, we (the three regulators, acting jointly) challenged WJEC's results in 33 of the 36 subjects they awarded, and we focused on ten subjects in particular. Following the meeting with exam boards to review GCSEs, we challenged WJEC's provisional results in GCSE English and English Language. It is our job to challenge those results and we do.

Q112 Chair: Is this raw, year-on-year comparison, because you do not have Key Stage 2 data for the Welsh students certainly, or are we talking about English students who took WJEC? I might be getting myself confused; I think I am.

Glenys Stacey: WJEC is based in Wales, but it markets its qualifications across the border and, indeed, in English the majority of its students are in English schools and centres, so it is an added complication, but I am talking about all of the A levels and GCSEs that WJEC were providing. Again it is our job to challenge those results and of course, we did so, but we particularly chose to challenge those qualifications where WJEC had a good sample size, if you like, from the English schools and centres and that is what we did. I forget how many it was, but it was six or eight, wasn't it?

Cath Jadhav: That was for A level; we challenged some A level, and we challenged GCSE English because we had a substantial subset of the entry with Key Stage 2 predictions.

Q113 Chair: The regulator in Wales, the Welsh Government, has come to a very different conclusion from you and has recommended regrading. Why have they come to such a different conclusion from you?

Glenys Stacey: Our obligations are to regulate to maintain standards in England. That is our statutory obligation. Things are different in different countries, as you know.

Amanda Spielman: I think there is political difficulty in Wales. In what we are seeing, there is a clear divergence in performance between English and Welsh candidates. If English candidates are where we think they are based on our work, the implication is that Welsh candidate performance is not improving. This is a very difficult conclusion for the Welsh to accept politically, hence what we saw yesterday.

Q114 Chair: So as they cannot get results up by improved performance, they are simply inflating results. Is that the allegation?

Glenys Stacey: I think it is useful to reflect on that. We had several discussions with WJEC and with the Welsh regulator to try to get to the right position in relation to results in English. It was particularly difficult, because we were able to break the results down by centre or school location, and the results for English students were significantly better than the results for students based in Wales. That is a particularly difficult problem for the regulators and we were very keen to make sure that we got to some sort of acceptable common standard, bearing in mind that these students are competing with students who are taking the qualification with other exam boards. So, yes, we did write to WJEC, Chair, to answer your question; we wrote a two-regulator letter to WJEC to ask for proposals. They put a number of options to us, to the regulators, as to what could be done. We spoke again with our fellow regulator in Wales and agreed what was, in effect, the softest option, which was to move two or three boundaries by one point. WJEC had put up four options, I think, which were more significant as you went up the range. The other options were not acceptable to the Welsh regulator. We took the first option.

Q115 Chair: I think the Welsh review has recommended that there should be a regrading of WJEC candidates in England. Sorry-right, I have been misinformed. I think that is in Wales.

Glenys Stacey: I think the review said it would be possible in England as well, Chair.

Q116 Chair: Yes, so do you have a view on the regrading that they recommend for WJEC candidates, wherever it is they lie?

Glenys Stacey: You will understand that this report came in just yesterday and we have not had an opportunity to review it at length, and we would very much like to do that.

Chair: Thank you very much.

Q117 Craig Whittaker: One final question from me. I know we have hammered this to death, but you have already said that the January awards were generous. Can you give us the evidence to support why they were generous? Everybody is saying they were generous, but nobody has given us any evidence yet as to why they were.

Cath Jadhav: I think the evidence comes from having those subject-level predictions when the summer awards were taking place and reflecting back, so it is the benefit of hindsight. Had the awarding committee had that data when they were setting boundaries in the earlier series, they would probably have set higher boundary marks.

Glenys Stacey: In addition, we met regularly with exam boards in English, on a technical level, if you like, and exceptionally, because we knew there were risks with these qualifications, we did ask them to report to us formally after the January 2012 series of assessments. We would not normally do that, but we did this time. We asked them to report any difficulties they had experienced at the time. They told us that they had been cautious in making unit level awards for English. They reported that they had concerns about the quality of controlled assessment marking by teachers, even though they had been putting more effort into giving schools detailed feedback and advice to improve on the quality of their marking. So we were getting that feedback from the January series, if that helps.

Q118 Craig Whittaker: I am still struggling to understand what the physical evidence is to suggest that they were highly inflated.

Glenys Stacey: I will ask Cath again, but I think the issue comes down to the nature of the subject-English-where it is very judgment-based. It also comes down to the specific aspects of achievement that examiners were trying to assess and evaluate. If you are looking at speaking and listening skills, they are notoriously difficult to assess. It is elusive; it is difficult to grasp this, but there were a number of things in January-about the subject, about what you are trying to assess, about the dearth of data, about the lack of comparable things from the past that you could truly rely on-that made those judgments simply very difficult. That is the nature of judgment. There are a number of places in the whole system of awarding and getting to the final outcome where judgment is applied: it is applied at examiner level, at exam board level and certainly at regulator level.

Q119 Craig Whittaker: So what you are saying is there is no specific thing that you could assess, with hindsight, that says, "The reason January was highly inflated was because of A, B, C and D."

Glenys Stacey: No. I think we can say there is a combination of factors that contributed, but I do not think we would say it was highly inflated either. No.

Q120 Craig Whittaker: Okay. But do we generally accept that January was graded more easily than June?

Glenys Stacey: This is quite a difficult thing to get across, but we were looking very carefully at it; examiners were looking very carefully at it. They were exercising their best judgment at the time. Indeed, some reported they thought they were being harsh. We had subject experts at many of these meetings; they could not see that the standard was not right.

Q121 Chair: Why not? We are still struggling with that, because even if you take a module in which you will not get the whole picture until the end, you then analyse the results and you compare them with Key Stage 2 results and whatever other comparative performance measures you use. Wouldn’t it have flagged up? I am still struggling to understand why it was not possible. Let me try a different question: sitting here today, looking back-and hindsight is a wonderful thing; politicians are great at suggesting everyone should have it apart from them-could you have done things differently? Were there techniques, as we sit here today, that you wish you had applied, or that you could have applied had you had more resource or people? Were there things that you could have done that could have given you that insight, or was it literally impossible in any conceivable scenario for you to have been able to predict that the January boundaries were too generous?

Glenys Stacey: I am certainly not aware of any techniques that we could have applied to put more certainty, more assurance, on the judgments that were being made in the best possible way at the time. That might reflect back and you might ask questions then about the nature of the qualification, but, at the end of the day, speaking and listening is in the National Curriculum and it has to be assessed.

Q122 Chair: If you are right that it is impossible, then it does take me back. I looked up the passing of the legislation, and I think it was in Bill Committee back in 2009 that the then Government kept being asked by, ironically, the new Schools Minister, "In order to keep standards constant, does the Minister expect there to be a change in the borderline of grades so that the percentage of students in each cohort getting A, B and C grades remains roughly the same?" He was asking about grade inflation and saying that the new exams, modular and assessed, would lead to inflation. The Minister very conveniently said, again and again, "That will be up to Ofqual. They are the independent regulator." So it was predicted, and you are on the receiving end.

Amanda Spielman: If I can come in here, there is a really important piece to get across. It has been really hard and very upsetting listening to schools talking about their disappointment this year, and it is clear that so many schools were expecting much more from the new specs. But whilst some schools are good at predicting, it is clear that a lot are not. In the last year of a very long-established specification, the schools that were using AQA, which is much the largest board, predicted 77% would get a C or better in the written paper; 65% did, so school estimates were 12% higher than the outcome. You might have thought that this year, in the first year of a completely different exam, schools would be a bit more cautious about their predictions and have brought them down a bit.

Q123 Alex Cunningham: So it is the schools’ fault, is it?

Amanda Spielman: This is not saying it is the schools’ fault.

Q124 Alex Cunningham: That is what you have just said.

Amanda Spielman: No, it is not. I am saying that school expectations are in a very different place. We would have expected, in the face of a new specification, to see schools being a bit more cautious. What we actually saw, based again on the data collected by AQA, was that school expectations for Cs went up by 2% this year. They went up to 79%; that is 14% more than achieved a C last year in the same paper. So it seems as though there was an increased level of expectation in aggregate, and we truly do not understand why school expectations were so far ahead of what anybody can realistically have expected to happen.

Q125 Bill Esterson: You get some years where a cohort’s potential results are significantly out of step, either up or down. That surely is the answer. When your director of standards wrote to Edexcel about this 8% rise, surely that was the point. We have this linear increase in exam results, but that does not reflect the reality of the potential of each cohort. Surely that is part of the issue here.

Glenys Stacey: I can assure you we do see changes in cohort, and the approach that we adopt does allow for changes in achievement as a result. I think in GCSE economics this year, for example, we saw very significant changes in outcomes, positive ones, which we could understand. I think the issue here is in a subject like English, an established subject where the National Curriculum was not changing-qualifications were, or the measurement tool, if you like, but the Curriculum was not-it would have been remarkable indeed if, within that one year, student achievement had shot up by 12% or, indeed, 15%. It would have been very hard to see that over one year.

Q126 Bill Esterson: What about the 8% that you wrote to Edexcel about?

Glenys Stacey: It is the same thing. It is the same thing, because if you think of it, achievement tends to stay relatively steady over time.

Q127 Bill Esterson: Over a number of years it does, yes, but it can go well outside of that linear progression.

Amanda Spielman: At school level, performance can change quite significantly from year to year. At national level, performance changes only very slowly. If you are measuring a glacier and put in a new meter, and the meter tells you that the rate at which the glacier is moving has quintupled in a year, then your first reaction is to check the new meter you have put in, not to say, "The glacier has speeded up five times."

Q128 Siobhain McDonagh: Our previous guests-whatever we are supposed to call you-brought us back to the fact that it is individual children and individual schools that we are talking about, rather than a very advanced degree in statistics. I just want to give you the results from not an academy but a very well established Catholic school in my constituency, in the borough: Wimbledon College, a boys’ school. They normally expect, every year, students to get roughly the same level of grade in literature as in language. This year, 73% of boys got five A to C passes in English literature; 32% got five A to Cs in language. In the combined paper that they did, 91% of boys got the A to C grades in the language paper and 88% in the literature. Given that there are years and years of statistical history, years and years of knowledge, that would seem a particularly spectacular result.

Glenys Stacey: If I can just deal with that and say straight away that we need to do more work on the most extreme variances. We do not yet understand them fully. You will see from the report we put in yesterday that we are working very hard to understand what is underlying all of this, but we do need to understand more. But we have to bear in mind that, overall, achievement went down by 1.5% and that was in line with expectations. In fact, underlying that we think there is a slight increase when we look closely at the cohort data. For every school, if you like, that has had a significant shortfall as against expectations, there are others that have had much better results and, of course, we are not hearing so much from those. But I can give you my assurance that, for our next steps, we are continuing to work with ASCL, particularly, and NAHT and others, and we do want to understand more about why there are these particular variances. We know there are some contributors that might be relevant and different school by school. For example, the routing through, the approach to controlled assessment, the mix of units; these are all very relevant things that we need to understand better.

Q129 Chair: Can you give us any insight? It came out in the first session-all these technical challenges and difficulties, including enhanced expectation, and then this huge variability. It would appear that the key ingredient in what was already a difficult, challenging environment was the massive variability. You have outstanding heads, with a long history of departments producing consistent predictions and good results, suddenly falling off a cliff. It is not surprising that they are going to appear in front of us, write letters, be in local newspapers and scream from the rooftops about something that they cannot begin to understand, when they have English heads of department who have given every effort coming in to offer their resignations in the face of what is, for them, an inexplicable collapse in performance in one of the most important qualifications.

Glenys Stacey: Yes, okay. First of all, we are looking to identify nationally the rate of variability. As usual, it is not straightforward. We know there has been variability here. We do not know as yet that nationally that variability is particularly out of sync with what we would expect. That is the first thing.

Q130 Chair: I thought your interim report had said that the variability was normally-I might be getting the numbers wrong-about 8% and it was double that this time. Am I wrong?

Cath Jadhav: What we do not know is whether that variability is naturally greater when specifications change, so what we need to do is some more work looking at change situations.

Glenys Stacey: We need to understand how different this is at a national level when qualifications change, and we need a bit of time to do that.

Q131 Chair: Given the introduction of a new exam, given that every year schools predict 12% or 14% more Cs and above than they get, and given that there is variability between schools anyway-so there is always a bunch of outstanding schools that suddenly fall off a cliff-are you saying that what we are seeing is just the collection of all those who did badly in any one year being given a reason, once it is in the national press, to believe it is not their fault, or not?

Glenys Stacey: Not at all. If we look at it, stand back and ask what the reasons could be here, we know that other qualifications are modular, so it is not necessarily simply the modular factor. We know that other GCSEs have controlled assessment and it came in for the first time for these qualifications. We think there is something about the fact that, for this English suite, 60% of it was controlled assessment. So we are very interested in how schools have approached that and whether there is something to learn from that.

Q132 Chair: Does that mean, in crystal clear terms, boosting marks-"overmarking" as you call it?

Glenys Stacey: Not necessarily. There are some suggestions from exam boards that they were concerned about overmarking, but the fact is that this is also a completely different way of assessing for schools. It is different from coursework. It is absolutely different, and some of the reports that we have had back from awarding bodies are that some schools were finding it difficult. They were unclear about the rule-setting. So there is something to explore there, definitely.

Q133 Ian Mearns: We heard in the previous evidence session that there has been some discourse between you and the awarding bodies about potential overmarking. Why was that never reported back to the schools, which were dealing with this and trying to moderate the work for themselves, so that it could have had an impact on the learning and the achievement of the children? That is the bottom line here.

Glenys Stacey: It is the bottom line, I agree, and we need to look more at that. We do know that those who were moderating controlled assessment did have conversations when they were returning scripts to schools, but we would like to know more about that, particularly in those schools that are most adversely affected.

There are a couple of other issues, though, I think, that are looking as though they are relevant here as well. Some of the schools that are seemingly most adversely affected-as far as we know at the moment, but we have more work to do-may have a particular student mix and a particular bunching up of students at the C/D boundary level. So they may have, because of their intake, a greater proportion of students who are trying very hard to get across that boundary. So the student mix in each type of institution is really very important.

Q134 Ian Mearns: I put it to you that, if the awarding bodies knew that there was some potential for grade inflation in terms of the moderation being done within schools and they did not report it back to the schools, so the schools could not do anything about it in time for it to affect the grades that were achieved by the children, that is a massive problem. That is a massive problem and, as Ofqual, you have to do something about that with those awarding bodies to make sure that never happens again, but also to make sure something is done about what has happened to the grades of the individual children who have been affected by this.

Glenys Stacey: There are several things there. One is I think one of the lessons learnt here is that communications could have been better.

Q135 Ian Mearns: I think that is an understatement, Glenys.

Glenys Stacey: Well, we know that awarding bodies put the usual health warnings and caveats on exam grades from January. We know that they were communicating with individual schools when they were engaging on moderation, but not everything is moderated. It is not clear that enough was done to get the message out when it became apparent that there were these problems.

I shall come back to the individual student in a moment. Just to complete answering the last question about the particular factors, I have mentioned the particular type of candidature or cohort per school that we need to know more about. The last thing, which I do not think is insignificant, is the routes through these qualifications-the route effect, if you like. A good number of schools were taking a modular qualification in a linear fashion, and we just need to know more about how that correlates to those schools that are most significantly affected by these results.

When it comes to individual students, I think you are thinking there about the fair thing for students. Of course, maintaining standards is ultimately about fairness for all students over time. There is no easy answer. There are competing fairness considerations here. We know, for example, that about 95% of the AQA candidates, the majority provider, took their controlled assessment units in June and not January. We also know that, as it happens, one in four students taking the foundation tier paper in June took it as part of their first year of study. So they are carrying a grade through to next year and, of course, they will be competing with other students next year.

We have reflected on fairness and what is the right thing to do very profoundly, understanding the effect that this has had on schools and, of course, on students. We know that awarding in June was regular and proper and, strictly speaking, unpalatable though it is, one might say that the unfairness is in January, but we do not think it right to revisit or intervene in the January results, as you know. Students have already got their grades there and relied on them.

I have mentioned before the small but significant difference between routes, and I think that is really important, but there are factors outside of our control, ultimately, that affect results each year. We are an industry regulator, as you know. We regulate exam boards and, if I can give you an analogy, we know that Ofcom regulate telephone providers but they do not necessarily ensure that every single phone call and every single bill is right, and we cannot do that. We are, I think, in an uncomfortable position in fairness terms, because our job is to maintain standards.

Chair: I will have to cut you off, Glenys, I am sorry; we have limited time left.

Q136 Alex Cunningham: I am sure you are in an uncomfortable position, but this is about the future of young people who sat their particular exam this year, who might not have achieved the A they need for their future university prospects or the C that they need in order to go forward to their next studies. You are acknowledging it is not fair; some people got lucky and it was tough on the rest.

Glenys Stacey: I am not saying it was tough on the rest, because we have had a very careful look at June awarding, and June awarding was right. The outcomes overall show a 1.5% decrease, and we know that, when we strip out cohort changes, that is quite generous.

Q137 Alex Cunningham: So the January lot were lucky.

Glenys Stacey: I said that publicly, because it is one way of expressing it that people could perhaps understand. I hope we have managed to give you a little bit more information behind what we mean by that.

Q138 Alex Cunningham: Can you explain the luck to the 13 students at I think it was Egglescliffe School on Teesside who did not achieve the grade that they expected?

Glenys Stacey: What I would say to that is I quite understand the effect on them of that and the disappointment, but from what we have seen of the way the awarding works, the grades were the right grades.

Q139 Alex Cunningham: We assume that the person from Edexcel who signed the letter dated 8 August stands by their statement. That statement is, "We believe this to be compelling evidence that our award is a fair award and we do not believe a further revision of our grade boundaries is justified." What changed? Why did they change their mind when they made such a bold, straightforward statement in that letter?

Glenys Stacey: What changed, Cath? Do you remember?

Q140 Alex Cunningham: Was it the fact that you adopted a strong-arm tactic to tell them, "You must ensure that these grades are downgraded" or "these children are downgraded"?

Glenys Stacey: I think the position is that what changed was we knew, looking at that, that there would be a 6% or 7% increase, grade inflation, that we did not think to be right or justifiable. We, therefore, wrote to Edexcel pointing out that they needed to bring the qualification in appropriately. They reflected on that, and it is quite right and proper that they should have done. The way the system is set up in the legislation that we all operate to requires us to put that challenge back to them. It then requires them to look at whether they can justify their outcomes, and that is what they did. Do bear in mind that this is a competitive market and the legislation that we work to is a really solid framework that stops the race to the bottom. Exam boards really have to square up and justify if they are presenting an outcome that is not, on any factor that we can see, a justifiable outcome.

They could have come back to us and said, "No, the outcome we have is right." They did not.

Q141 Alex Cunningham: If they had done so, what would have happened? If they had decided to stick by their guns, stick by their statement, what would have happened?

Glenys Stacey: If they had done so, then the legislation provides that the regulators can direct grade boundary changes.

Q142 Alex Cunningham: Would you have done that?

Glenys Stacey: I think we would have done, yes.

Q143 Alex Cunningham: You think you would have. Would you have done that?

Amanda Spielman: It would have to depend on the arguments that were presented for maintaining that position.

Q144 Alex Cunningham: They have already presented the arguments to you, haven’t they?

Amanda Spielman: If they had come back with further arguments as to why they were justified, we would have had to consider those. It is impossible to say one way or the other.

Alex Cunningham: So they just threw in the towel after you told them that they must downgrade these young people.

Q145 Ian Mearns: Do you think it is possible also that AQA and others might not have bothered to have an argument with you because they knew what was coming?

Glenys Stacey: Not at all, not at all. I do not think the relationship is quite like that. If I can come back to Edexcel, we had not seen any compelling evidence that persuaded us or, indeed, anybody else, including the other regulators and other exam boards, that their preliminary results could be sustained. They were out of line and there was no clear understanding or argument put to us as to why those results were different.

Q146 Alex Cunningham: I think you have made that quite clear. Sadly, we are not going to have the chance to speak to Edexcel today, but I hope we will extend the inquiry in order to do so. If you effectively forced them to ensure that fewer students passed at the higher levels, does that mean the credibility and the professional standing of Edexcel is, at best, compromised and, at worst, totally shot? They are not a credible organisation, are they? They were so far out that you had to direct them to lower the grades of these young people.

Glenys Stacey: We did not direct them. I think they were the furthest out, weren’t they? All I can say about that is I do understand and I have set out the particular difficulties of awarding.

Q147 Siobhain McDonagh: We have already covered some of the issues of what sorts of schools and colleges were affected. Looking in my own area, there is no pattern at all to it. We have roughly the same number of students passing five A*-Cs including English and maths, with huge drops from the boys’ Catholic schools and increases from the girls’ Catholic schools. We have one academy doing well, and one much more well-established one, Harris Academy, doing less well. Have you identified what sorts of schools or institutions have had these particular issues?

Glenys Stacey: We are beginning to do so, yes. When you are looking at these differences and when we are looking at them as well, we need to understand whether they are differences compared with achievement last year or achievement the year before, bearing in mind last year was an ending spec, which is relevant, or whether it is differences against predictions that were made by schools. I do think that is relevant.

We have looked across the types of schools and colleges that exam boards deal with to see whether there is a particular pattern. Achievement year on year is matched very closely in academies and in secondary schools, but within academies and secondary schools there will be these differentials that we need to understand. We did see a particular effect in further education colleges, which we want to understand more as well.

Q148 Siobhain McDonagh: What happened in the FE colleges?

Glenys Stacey: Looking at the data overall, they saw particular falls as a group of institutions compared with, say, secondary schools or academies. There will be some likely reasons for that. For example, those in further education colleges taking English may well be taking it for a second time. They may well not have been able to get the grade they wanted when they were at secondary school, so that will be one of the factors, certainly.

Q149 Chair: That would not change year by year. It would not explain why there was a sudden drop this year over last year, unless there had been a change in the cohort.

Amanda Spielman: One hypothesis is that, post the Wolf recommendations, more colleges are instituting a policy of having more people do resits if they have not already got a C when they arrive. We have a number of hypotheses to explore that we simply have not been able to cover yet.

Q150 Siobhain McDonagh: Do you have any idea what group or type of student was affected? We are concentrating on the D/C band, but there is some evidence there was some problem higher up the bands as well.

Glenys Stacey: We know the spread of grades, obviously, that were achieved. We can see that there were particular pressures, I think, on the C/D grade boundary.

Q151 Siobhain McDonagh: Are they boys or girls?

Glenys Stacey: We have not done a gender analysis.

Q152 Damian Hinds: To pick up where Siobhain left off, is there any difference between foundation and higher?

Cath Jadhav: Obviously there are differences in the grade profiles for those groups of students.

Q153 Chair: Can you tell us a little bit about it, because the Secretary of State a few weeks ago said-it came as news to lots of people, including, shamefully, perhaps, the Chair of the Education Select Committee-that significant numbers of children were on this foundation tier, which meant they could not get more than a C. It looked remarkably like a CSE, and it looked remarkably as though a two-tier system had been in our system all along. What percentage of people take each one in English, seeing as we are looking at that, and can you let us know whether the people taking the foundation tier are peculiarly affected?

Glenys Stacey: I think the first thing to say is that things changed this year, because traditionally the foundation tier was for, if you like, the less able student and you could not get more than a C grade if you went down that route. The rule has changed for these qualifications and it is possible to get an A grade on foundation tier.

Q154 Chair: As of 2012. So in 2011 the maximum was C, like the old CSE, but in 2012 you can get an A. Is that right?

Cath Jadhav: As always, it is slightly more complex than that. It is to do with the proportion of the assessment within the English suite that is untiered. The controlled assessment is untiered, so a candidate who might be entering the foundation written paper could also, in theory, score maximum marks on the controlled assessment, which, when it is aggregated, means they could, in theory, achieve an A.

Q155 Damian Hinds: Is that overall?

Cath Jadhav: Overall.

Glenys Stacey: It would be unusual to do it, but it could be done.

Damian Hinds: Gosh.

Glenys Stacey: I know; it gets quite complicated. There were, of course, these common units that students would be taking early on. For exam boards, that is particularly difficult because they do not know, at that stage, what the ultimate destination is in terms of the tier of qualification for the student. It is another complicating factor.
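To make the aggregation arithmetic concrete, the short sketch below models the effect Cath Jadhav describes. The weightings (40% tiered written paper, 60% untiered controlled assessment), the grade boundaries and the function are illustrative assumptions only, not the 2012 specifications; the point is simply that a high mark on an untiered component, combined with a capped tiered paper, can carry an overall result past the tier’s usual ceiling.

    # Illustrative model only: assumed component weightings and assumed
    # uniform-mark grade boundaries, expressed as percentages of the maximum.
    GRADE_BOUNDARIES = {"A*": 90, "A": 80, "B": 70, "C": 60, "D": 50, "E": 40}

    def overall_grade(written_pct, controlled_pct, written_weight=0.4):
        # Aggregate the two components and map the weighted total to a grade.
        total = written_weight * written_pct + (1 - written_weight) * controlled_pct
        for grade, boundary in GRADE_BOUNDARIES.items():
            if total >= boundary:
                return grade
        return "U"

    # A foundation-tier candidate whose written paper tops out around a
    # C-equivalent score can, in theory, still reach an A overall on the
    # strength of maximum controlled assessment marks:
    print(overall_grade(65, 100))  # 0.4 * 65 + 0.6 * 100 = 86 -> "A"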

Q156 Damian Hinds: There are different questions around the impact on different schools, as Siobhain explored very effectively, and then what happened at the aggregate level. You have mentioned a couple of times the changes in the cohort, and we know there were 23,000 fewer pupils from independent, selective schools and there were perhaps more pupils doing resits-perhaps. Do you have such a thing as a like-for-like analysis? Obviously, you cannot say like for like, because they are different children each year, but if you say among maintained, non-selective schools, last year X% of pupils got A* to C and this year it was Y%.

Glenys Stacey: This is matched data really, isn’t it?

Cath Jadhav: We know all things are never equal, but the matched candidates, the candidates with their Key Stage 2 scores, give us, as far as possible, that like-for-like comparison, and that was up by between 1% and 2%.

Q157 Damian Hinds: For the avoidance of doubt, if you strip out, insofar as you can, the changes in the mix of pupils, there were, nationally, more children who scored grade C or above at GCSE English in 2012 than in 2011.

Glenys Stacey: Yes, that is right.
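The like-for-like comparison described here can be pictured in a few lines. The record layout and field names below are invented for illustration; the principle is just to compare, within each Key Stage 2 prior-attainment band, the proportion of matched candidates reaching C or above in each year, so that changes in the mix of pupils are stripped out as far as possible.

    from collections import defaultdict

    GOOD_GRADES = {"A*", "A", "B", "C"}

    def rate_by_ks2_band(candidates):
        # Per KS2 band, the share of matched candidates achieving C or above.
        totals, passes = defaultdict(int), defaultdict(int)
        for c in candidates:
            if c["ks2_band"] is None:
                continue  # no KS2 score, so the candidate cannot be matched
            totals[c["ks2_band"]] += 1
            passes[c["ks2_band"]] += c["grade"] in GOOD_GRADES
        return {band: passes[band] / totals[band] for band in totals}

    # Tiny invented cohorts; real matched data would be national in scale.
    cohort_2011 = [{"ks2_band": 4, "grade": "D"}, {"ks2_band": 4, "grade": "C"},
                   {"ks2_band": 5, "grade": "B"}, {"ks2_band": None, "grade": "C"}]
    cohort_2012 = [{"ks2_band": 4, "grade": "C"}, {"ks2_band": 4, "grade": "C"},
                   {"ks2_band": 5, "grade": "B"}, {"ks2_band": None, "grade": "D"}]

    r2011 = rate_by_ks2_band(cohort_2011)
    r2012 = rate_by_ks2_band(cohort_2012)
    print({band: r2012[band] - r2011[band] for band in r2011})  # change per band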

Q158 Pat Glass: I have sat very quietly all morning and the Chair is now cutting me off because we are running over time, so I know that, in fairness, he will give me more than my fair share with the Secretary of State tomorrow to make up for it.

Damian Hinds: Tactics.

Pat Glass: I have two questions and they just require yes or no answers. You said, right at the beginning of your supplementary memorandum to us, that there was no political interference. We accept that there was no phone call between you and the Secretary of State, but if, as a Committee, we decide to extend this inquiry-and I think we should-would you be prepared to publish copies of correspondence, emails, text messages and phone calls between your staff who are involved in this and senior staff at the DfE who are involved in this, and special advisers, ministers’ spads?

Glenys Stacey: Absolutely; we would have nothing to hide.

Q159 Pat Glass: Thank you; that is a yes answer. Secondly, this question is about the people who are affected by this. This is not just numbers on a piece of paper. There is substantial evidence of the difference in life chances between children who get five A*-Cs and those who do not, and those who get Cs and Ds, and it is not just about their academic qualifications. Children who get five A*-Cs are less likely to get divorced, are less likely to get cancer, are less likely to end up in prison or homeless, and a whole range of other things. So this is about what is going to happen to these young people for the rest of their lives. Given that, and given what we have heard about what the Welsh regulator is doing, and given that we know that, as this is largely the C/D borderline, it is likely to affect fewer children in grammar schools and independent schools-it is going to be kids in comprehensives, on the foundation tier, on those C/D borderlines-are you not prepared to look again at the issue of regrading, given the long-term impact of this on children’s lives?

Glenys Stacey: I think the problem we face with that, given our statutory framework and obligations and objectives, is that, if we apply the January boundaries to the June students, we would have inflation of, we think, about 5% or 6%; we have not worked it out entirely.

Q160 Pat Glass: These children worked just as hard, and their teachers worked just as hard. This is not their fault.

Glenys Stacey: But the results would be inflated.
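A toy calculation, with invented marks and boundaries rather than the real 2012 figures, shows why applying January’s lower boundary to June scripts would push results up: the same raw marks clear a lower bar more often.

    # Hypothetical raw marks for a set of June scripts.
    june_marks = [38, 41, 43, 45, 47, 50, 52, 55, 58, 61]

    C_BOUNDARY_JANUARY = 43  # hypothetical January C boundary
    C_BOUNDARY_JUNE = 47     # hypothetical June C boundary, set higher

    def c_or_above_rate(marks, boundary):
        # Proportion of scripts at or above the C boundary.
        return sum(mark >= boundary for mark in marks) / len(marks)

    print(c_or_above_rate(june_marks, C_BOUNDARY_JUNE))     # 0.6
    print(c_or_above_rate(june_marks, C_BOUNDARY_JANUARY))  # 0.8 -> inflated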

Q161 Pat Glass: So the answer is no.

Glenys Stacey: What we have seen so far-with nothing to cast doubt on it, and we do not expect anything to-is that the June boundary setting occurred properly. That does, as I say, leave us all in a very uncomfortable position. We have thought, as I said, carefully about fairness and we keep on thinking about it. We think that the right thing to do is to offer a resit opportunity for those students, and we have been working very hard with exam boards to make sure we can do that quickly.

Pat Glass: Some of them have lost the will to do that now.

Q162 Chair: I must bring this to a close, but in terms of conducting an inquiry into this, is that something that you can properly do when you are such a key player in it? Aren’t you effectively investigating yourself?

Glenys Stacey: We have reacted very promptly to concerns expressed to us by schools and colleges, and I know that they recognise that. We have been very open with them about the data and information that we are collecting and the conclusions that we are reaching. We really want to do more to get to the root of this. It is our job to make sure that standards are right, and so we want to do that, but of course we do not object if there is any other sort of inquiry-not at all.

Amanda Spielman: The issues are mainly around what has happened in exam boards and in schools, not around what has happened in Ofqual. We are not simply looking at ourselves.

Q163 Ian Mearns: Chair, I think anyone watching this objectively from outside could draw the conclusion that there are lots of unknowns, there are things that you are thinking about doing, and there are some things that you are beginning to do. An objective observer could draw the conclusion that there is a lack of urgency in Ofqual in terms of dealing with this. How would you respond to that? As Pat has pointed out, this is having an impact on individual children’s lives.

Glenys Stacey: I would most certainly and robustly refute that. We were notified of concerns, I think, on the Bank Holiday. We responded immediately and produced our initial report inside a week. There is lots of midnight oil being burnt at Ofqual and we are giving this our top priority, I assure you. Indeed, no one, until now, has suggested otherwise. All of the resource that I can devote to this is being devoted to it.

Chair: Thank you very much for giving evidence to us this morning.

Prepared 12th March 2013