Memorandum from Professor Paul Black and
Professor Dylan Wiliam (HE 73)
THE USE OF THE SAT IN THE USA
Whilst it is widely used, the SAT is a focus
of public controversy and professional concern in the USA.
Private test preparation agencies can enhance
SAT scores by ad hoc training programmes, which are open to those
who can afford their fees.
SAT results show considerable bias against those
from low-income groups and/or from ethnic minorities.
SAT scores are weak predictors of college success,
and are less effective for this purpose than school grades, despite
the wide variety of school systems across the USA.
A significant number of universities and colleges
in the USA have abandoned the use of SAT scores for selection,
on the grounds that the pressures on schools for ad hoc preparation
for those who are to take it distort the education of their future
students.
2. Following the contributions of Professor
Wiliam to the discussions of the Committee on Tuesday, 18 July,
we have noted the interest of the Committee in the possible use
of the USA's SAT as a selection test for tertiary education in
this country. We offer this background note as a contribution
to the Committee's consideration of the SAT, particularly because
we are aware that the use of the SAT is a matter of controversy
in the USA. Whilst both of us have interest and expertise in the
field of testing and assessment, Professor Black has extensive
first-hand knowledge of the situation in the USA, being a visiting
professor at Stanford University, a contributor to education studies
of the National Academy of Sciences, through work on their Board
of Testing and Assessment, and in their preparation of three documents
of significance for national policy: the USA National Science
Standards, a new addendum to those standards on Classroom Assessment,
and a forthcoming study on the Cognitive Foundations of Assessment.
3. The issues raised in this note are based
on research and professional literature, as set out in a list
of references at the end. The issues are further illustrated by
two extracts from the press, one being an article in the New York
Times (10 July 2000) and the other an editorial debate in USA
Today (11 July 2000), copies of which are also appended.
4. The SAT was founded in 1926 under the
name of Scholastic Aptitude Test. It was adapted from the IQ tests
first used extensively for selection purposes by the USA military
in World War I. It has been composed, then and ever since, solely of
multiple choice items. Some of those who developed it had a vision
of a range of such tests being used as tools for social engineering
on a large scale, and were closely associated with the eugenics
movement which burgeoned at that time (see Hanson 1993, Lemann
1999). The SAT, set up for the College Entrance Examination Board,
is produced and administered by the Educational Testing Service
(ETS), a private "not-for-profit" organisation.
5. Early impetus was given by the president
of Harvard, J B Conant, whose aim was more restricted and less
alarming: to select, for the privileged universities, the
most "intelligent" applicants in a way that would negate
the effects of the privileged educational and social backgrounds
enjoyed by some. The SAT subsequently changed character, particularly
after World War II, into a programme to promote equality of access
for all. The size of the USA, the difficulties of communication,
the multiplicity of curriculum influences in over 40 states and
the heterogeneous background of a country still welcoming large
numbers of immigrants, meant that test methods familiar elsewhere,
notably in European countries, were unsuitable.
6. Despite its remarkable expansion and
the lead that it gave to the growth of a major test industry in
the USA (private test agencies provide several hundred million
multiple choice tests each year), the SAT has attracted a range
of serious criticisms which call in question the viability of
selection for higher education based on the measurement of "intelligence".
Four of these attack the basic claims upon which the SAT is founded.
7. One claim that was essential to its reputation
was that the previous experiences and education of candidates
would not affect the measurement, so that no amount of coaching
could enhance one's score on the test. This claim was severely
dented very early in its history by private agencies who convinced
the public that they could raise the scores of SAT candidates
by ad hoc drilling with questions similar to those used in the
test: as a consequence, those who can afford the fees for private
preparation can enhance their SAT scores. The claim that such
enhancement was not possible was abandoned by ETS in 1979: there
is well researched evidence that ad hoc test preparation does
yield significant score increases (Bond, 1989) and the recent
editorial debate in USA Today (attached herewith)
illustrates the current public interest in the challenge to this
claim.
8. A second claim is that tests should be
free from bias, in that inequalities associated with irrelevant
effects of the family origin, gender, race and so on of candidates
will not affect their scores. A great deal of effort has been
invested in exploring this problem in order to alleviate its effects,
but it cannot be claimed that bias has been eliminated. There
is a vast history of legislative battles in the USA over the problems
of alleged bias in standardised tests, and the SAT has endured
its share of these (Cole & Moss 1989, Heubert & Hauser,
1999). The results of the 1999 tests for college-bound students
show, for example, that the mean SAT score for white students
is much greater than that for African American or Black students
(1024 points against 856 points), and that the score for those with
family incomes over $70,000 a year (about 1070 points) is greater
than for those with incomes under $20,000 a year (about 900 points)
(see College Board 1999). Thus the evidence shows that privileged
candidates do secure, on average, higher SAT scores than those
less privileged. A difference in marks in A-level of a size comparable
to a SAT score difference of 200 points would produce a change
of more than one A-level grade.
9. A third claim is that the IQ test, or
the numerical and verbal parts of the SAT, measure well-defined,
underlying and central components of human capacity and potential.
Ironically, Brigham, the inventor of the SAT, who came to be one
of its harshest critics, foresaw challenges to this claim when
he wrote in 1929:
The more I work in this field the more I am convinced
that psychologists have sinned greatly in sliding easily from
the name of the test to the function or trait measured.
(Quoted by Lemann 1999, p 33).
Most psychologists do not now accept the notion
of the single unitary trait that the IQ claims to measure and
argue for more complex measures of human thinking (Gardner, 1993;
Sternberg, 1997). It should be noted that the title Scholastic
Aptitude Test was changed by ETS to Scholastic Assessment Test,
and subsequently the test has come to be called the SAT without
connection of the initials to particular words. Recently it seems
that ETS has said that it is a test of mental dexterity (see the
recent New York Times article attached herewith);
this concept is not recognised in the field of psychology of learning,
and it is doubtful if it has any clear meaning.
10. It may be that, despite the ambiguity
about what exactly is measured, the SAT does measure a set of
aspects of the complex of human thinking which are relevant for
prediction of achievement in a particular sphere, notably tertiary
level study, and may function usefully because it reflects a relevant
combination of these aspects. This leads into the fourth claim,
namely that the SAT is a good predictive measure. The ETS has
to be able to show evidence of strong correlation between the
SAT scores of college applicants and their subsequent performance,
in order to convince tertiary institutions to require applicants
to take the SAT, thereby requiring them to pay the fees which
are the main source of income for the ETS.
11. Correlations between the SAT score and
the performance of college students at the end of their first
year are of the order of 0.5, which means that the SAT scores account
for about 25 per cent of the variance of college results. What
a correlation of 0.5 means in practice is that if, for example,
we have to choose between two candidates, then if we know nothing
about them, we can only choose at random and then have a 50 per
cent chance of choosing the one best qualified. If we know the
SAT scores, we can then choose the one with the higher score; in
that case the chance of having chosen the one most likely to succeed
will have risen to 67 per cent. If the intrinsically better candidate
were from a group known to be disadvantaged by the SAT, we might,
by using the SAT as a basis for choice, end up selecting the better
candidate in less than 50 per cent of the cases.
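The arithmetic in this paragraph can be reproduced with a short sketch. It rests on an assumption that the memorandum does not state: that test scores and later performance follow a bivariate normal distribution, in which case the chance that the higher scorer of two candidates is also the better performer is 1/2 + arcsin(r)/pi. The function names are illustrative only.

```python
import math
import random


def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a predictor with correlation r."""
    return r * r


def chance_of_picking_better(r: float) -> float:
    """Probability that the higher-scoring of two candidates is also the
    better performer, ASSUMING scores and outcomes are bivariate normal."""
    return 0.5 + math.asin(r) / math.pi


def simulate(r: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo check of the closed form above (same normality assumption)."""
    rng = random.Random(seed)
    noise = math.sqrt(1.0 - r * r)
    hits = 0
    for _ in range(trials):
        # Two candidates, each with a test score and a later performance
        # that correlates r with that score.
        s1, s2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p1 = r * s1 + noise * rng.gauss(0, 1)
        p2 = r * s2 + noise * rng.gauss(0, 1)
        # Select the higher scorer; count a hit if that one performs better.
        if (s1 > s2) == (p1 > p2):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(variance_explained(0.5))                  # 0.25
    print(round(chance_of_picking_better(0.5), 3))  # 0.667
    print(simulate(0.5))                            # close to the closed form
```

Under that assumption a correlation of 0.5 gives exactly two-thirds, which agrees with the 67 per cent quoted; a correlation of zero gives 50 per cent, the random-choice case.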
12. Thus a correlation of 0.5 is not very
impressive, and has usually been less than the correlation obtained
with school grades (which can account for about 33 per cent of
the variance). The SAT scores do however add to the power of the
school grades: the optimum combination of school grades and
SAT can give correlations of more than 0.6, so accounting for
about 40 per cent of the variance (Morgan, 1989).
Some now argue that the cost and the undesirable effects of
the SATs cannot be justified given that they do not add a great
deal to the predictive power of school grades (Crouse & Trusheim
1988). A British attempt to use a version of the SAT to check
its predictive value for degree results against that of the UK's
A-level examinations showed that it was no better, and that it
added very little predictive power when added to the A-level results
(Choppin & Orr, 1976).
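How much a second predictor adds depends on how strongly the two predictors overlap. The sketch below uses the standard formula for the squared multiple correlation with two standardised predictors. The correlation between school grades and SAT scores is not given in this note, so the values of `r_between` tried here are hypothetical, chosen only to illustrate the calculation; the other two inputs are taken from the figures quoted above (33 per cent and 25 per cent of variance).

```python
import math


def multiple_r_squared(r_grades: float, r_sat: float, r_between: float) -> float:
    """Squared multiple correlation for two standardised predictors.

    r_grades, r_sat: correlation of each predictor with the outcome.
    r_between: correlation between the two predictors (HYPOTHETICAL here;
    the memorandum does not report it).
    """
    if not -1.0 < r_between < 1.0:
        raise ValueError("r_between must lie strictly between -1 and 1")
    numerator = r_grades ** 2 + r_sat ** 2 - 2.0 * r_grades * r_sat * r_between
    return numerator / (1.0 - r_between ** 2)


if __name__ == "__main__":
    r_grades = math.sqrt(0.33)  # grades alone: about 33 per cent of variance
    r_sat = 0.5                 # SAT alone: 25 per cent of variance
    for r_between in (0.3, 0.4, 0.5, 0.6):  # illustrative values only
        combined = multiple_r_squared(r_grades, r_sat, r_between)
        gain = combined - r_grades ** 2  # extra variance from adding the SAT
        print(f"r_between={r_between:.1f}  combined R^2={combined:.3f}  gain={gain:.3f}")
```

The more closely the two predictors track each other, the smaller the gain from adding the SAT to school grades, which is the substance of the argument made by Crouse and Trusheim.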
13. Alongside these challenges to the basic
justifications of the SATs and similar tests, the last 20 years
have seen the emergence in the USA of increasingly severe criticism
from teachers and educational researchers who deplore the narrowing
and atomisation of learning that follows from the intensive training
given to help students increase their SAT scores (see eg Gifford
& O'Connor, 1992; Linn, 2000). The flavour of this current
concern is strongly conveyed in a prophetic piece written by Brigham
in 1938, 12 years after he had helped to invent the test:
If the unhappy day ever comes when teachers point
their students towards these newer examinations, and the present
weak and restricted procedures get a grip on education, then we
may look for the inevitable distortion of education in terms of
tests. And that means that mathematics will be completely departmentalised
and broken into disintegrated bits, that the science will become
highly verbalised and that computation, manipulation and thinking
in terms other than verbal will be minimised, that languages will
be taught for linguistic skills only without reference to literary
values, that English will be taught for reading alone, and that
practice and drill in writing of English will disappear.
(Quoted by Lemann 1999, pp 40-41)
14. The SAT has always relied for its viability
on convincing tertiary institutions of its value for their selection
processes, so that as many institutions as possible would require
applicants to take the test, and to pay the fees involved.
The recent article in the New York Times that is attached herewith
reports on a trend for colleges, including some of the most prestigious
private colleges, to abandon the requirement that applicants take
the SAT, on the grounds that it adds little in predictive validity
to school grades, whilst distorting the educational preparation
of future students. It must be noted that these decisions are
being made in a country which has no national curriculum, and
no national systems of testing and certification, such matters
being in the hands of the (now 50) states for them to
handle in their own, and diverse, ways.
Professor Paul Black and Professor Dylan Wiliam
REFERENCES
BOND, L (1989) The Effects of Special Preparation
on Measures of Scholastic Ability pp 429-444 in Linn, R L (ed)
Educational Measurement (3rd edn) (New York, Macmillan).
CHOPPIN, B & ORR, L (1976) Aptitude Testing
at 18+ (Windsor, NFER Publishing Co Ltd).
COLE, N S & MOSS, P A (1989) Bias in Test
Use pp 201-219 in Linn, R L (ed) Educational Measurement (3rd
edn) (New York, Macmillan).
COLLEGE BOARD (1999) College Board Seniors
National Report 1999 (from FairTest web-site http://fairtest.org/univ/99SAT%20Scores.html).
CROUSE, J & TRUSHEIM, D (1988) The Case
Against the SAT (Chicago, University of Chicago Press).
GARDNER, H (1993) Multiple Intelligences: The
Theory in Practice (New York, Basic Books).
GIFFORD, B R & O'CONNOR, M C (eds) (1992) Changing
Assessments: Alternative Views of Aptitude, Achievement and Instruction
(Boston MA, Kluwer).
HANSON, F A (1993) Testing Testing: Social
Consequences of the Examined Life (Berkeley CA, University of
California Press).
HEUBERT, J P & HAUSER, R M (eds) (1999)
High Stakes: Testing for Tracking, Promotion and Graduation (Washington
DC, National Academy Press).
LEMANN, N (1999) The Big Test (New York, Farrar,
Straus & Giroux).
LINN, R L (2000) Assessments and Accountability.
Educational Researcher. 29(2) pp 4-16.
MORGAN, R (1989) Analyses of the Predictive
Validity of the SAT and High School Grades From 1976 to 1985.
College Board Report No 89-7 (New York, College Entrance Examination
Board).
PELLEGRINO, J W, BAXTER, G P & GLASER,
R (1999) Addressing the "Two Disciplines" problem: Linking
Theories of Cognition with Assessment and Instructional Practice.
Review of Research in Education. 24 pp 307-353.
STERNBERG, R J (1997) Thinking Styles (Cambridge,
Cambridge University Press).
Note: the press extracts attached to the memorandum were not printed.
The correlation coefficients quoted here were (in Morgan's analyses)
adjusted to allow for attenuation of range, ie whereas the data
available are, of necessity, limited to those who passed the admission
hurdles set by the tests, a theoretical model is used to estimate
what the correlation would have been if all had been admitted
to college regardless of their test score.