Peter H. Schonemann

                          Research Interests

 
Note:  numbers in brackets refer to Publications list.

Multivariate Statistical Methods

Early work on matrix derivatives [1, 45]. Since differentiation is a linear map, partial differentiation interfaces naturally with matrix notation, which simplifies work in least squares and maximum likelihood estimation [3, 6, 7, 8, 10]. Some minor papers on the non-central F-distribution [27, 66] and the non-null distribution of intraclass correlations (which is central F). Intraclass correlations provide the basic data for twin research (see Quantitative Behavior Genetics, below).
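For example (in generic notation, not necessarily that of the cited papers), writing the least squares criterion for the linear model Y = XB + E as f(B) = tr[(Y - XB)'(Y - XB)], the matrix derivative

                              df/dB = -2 X'(Y - XB)

yields the normal equations X'XB = X'Y, and hence the least squares estimate, in a single step.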

Factor Analysis

Early work on Procrustes methods (least squares maps T for given A, B in B = AT + E, chosen to minimize the sum of squared residuals in E, usually under some constraint on T, such as, in this case, orthogonality [1, 3, 6, 8, 17, 21]) and methods for machine rotation [4] to simple structure.
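As a concrete numerical sketch of the orthogonal case (Python/NumPy; illustrative only, not taken from the cited papers): with the singular value decomposition A'B = UDV', the least squares orthogonal map is T = UV'.

    import numpy as np

    def orthogonal_procrustes(A, B):
        # Least squares orthogonal T minimizing the sum of squared residuals in E = B - AT:
        # with the singular value decomposition A'B = U D V', the minimizer is T = UV'.
        U, _, Vt = np.linalg.svd(A.T @ B)
        return U @ Vt

    # Small usage example: recover a known orthogonal map from noisy data.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((10, 3))
    T_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a random orthogonal target
    B = A @ T_true + 0.01 * rng.standard_normal((10, 3))   # B = AT + small error E
    print(np.allclose(orthogonal_procrustes(A, B), T_true, atol=0.05))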
After finally having been made aware of it (by Heerman at a meeting of the Psychometric Society in 1964, after obtaining a Ph.D. in psychometrics at the UofI), intensive study of the implications and ramifications of factor indeterminacy: in the classical factor model, the number of factors always exceeds the number of observed variables, so that no unique solution for "factor scores" exists [11, 12, 20, 24, 26]. This defect of the factor model vitiates any claim that factors provide an objective basis for defining "intelligence", which had been Spearman's declared objective [40, 47, 52, 57, 75, 76]. The same indeterminacy affects models of the LISREL type.
E. B. Wilson, an acknowledged scholar of the first rank, first drew attention to this issue in 1928. It was subsequently discussed by numerous competent psychometricians and statisticians (see [24] for an unsanitized history of this problem). During the Thurstone era of classical psychometrics, the whole problem area faded into oblivion, until it was eventually revived in the early 70s. It is still the subject of debate today [79, 80]. Most importantly, it bears directly on Jensen's specious claim that Spearman's g factor provides "an operational definition of intelligence".

Papers [26, 40, 52] highlight one of several peculiar consequences this indeterminacy implies for the classical factor model:

The factors of the factor model can always be chosen in such a way that they predict any criterion whatever perfectly (in a multiple regression sense). For example, regardless of the observed variables from which the factors are derived, they can always be so chosen as to predict the dates of Easter Sunday perfectly [52].
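Roughly, the mechanism (stated here in generic notation not taken from the cited papers) is that factor scores consistent with the model Sigma = Lambda Lambda' + Psi can always be constructed as

                     f = Lambda' Sigma^(-1) x + (I - Lambda' Sigma^(-1) Lambda)^(1/2) s,

where s is an arbitrary random vector with zero mean and unit covariance, uncorrelated with the observed x. Since the data place no constraints whatever on s, it can be chosen to correlate with any external criterion one pleases.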
 
In paper [32] it is shown that the power of maximum likelihood factor analysis is poor, barely exceeding twice the alpha level for moderate sample sizes (100-200). Paper [20] presents a comprehensive discussion of an alternative to the factor model (Regression component analysis). This methodology is not afflicted by any indeterminacy problems, and in practice it gives very similar numerical results. However, in contrast to the factor model, in the absence of further constraints it does not pose as a falsifiable theory, but rather is a purely descriptive method for reducing the data at hand.

Multidimensional Scaling

Some early work on Thurstonian scaling [5, 10], Guttman's simplex theory [7], and Coombsian metric multidimensional unfolding [9]. An algebraic solution for Horan's subjective metrics model (which underlies "INDSCAL" [15]) was subsequently extended into a computationally efficient and robust scaling algorithm (COSPA, [25, 29]). Later, [9] was extended (with Wang) into a multidimensional scaling model for preference data that combines the Bradley-Terry-Luce model with Coombs' unfolding model [14, 19]. A common characteristic of all these metric MDS models is that they have exact algebraic solutions and are, at least in principle, testable [35, 39].
Later empirical work with similarity data on rectangles followed up on Krantz and Tversky's (1970) lead [36, 37]. On closer scrutiny we found that dissimilarity ratings often violate some basic assumptions required by the conventional metric models, notably the Archimedean axiom, which underlies all Minkowski metrics, in particular the Euclidean and the city-block metric [38, 41, 42, 46, 48, 59, 73].
 
Naively, one might think that both scaling and test theory ought to relate to measurement theory in some way, since all three profess to be concerned with the problem of assigning numbers to objects or subjects. Our earlier, still relatively upbeat thoughts on these issues are summarized in [33, with I. Borg]. However, as time went on and the anticipated empirical support of axiomatic measurement theories never materialized, we found it increasingly hard to maintain our earlier optimism about the prospective utility of such abstract theories. Eventually, this scepticism extended to mathematics more generally as a tool for solving non-fictitious problems in psychology [73].

 

IQ Controversy

(a)  Problem of defining "intelligence":

In his controversial revival of the eugenic traditions of the 20s, Arthur Jensen (1969) appealed explicitly to Spearman's factor model as a vehicle for defining "intelligence". However, in view of the factor indeterminacy problem (see Factor Analysis, above), these high hopes are doomed to failure [40, 47, 52, 57, 83]. Recourse to concrete IQ tests is equally unsatisfactory, because different tests are often quite poorly correlated. In fact, this was the reason why Spearman had postulated his factor model in the first place. From a purely pragmatic point of view one further finds that, contrary to what some authors who should know better have claimed, conventional IQ tests are surprisingly poor predictors of most criteria of practical interest, including scholastic achievement. For example, the SAT - a descendant of conventional "verbal" IQ tests such as the Army Alpha - consistently performs worse than easily available previous grades as a predictor of subsequent grades. This has been known, though not advertised, since the 20s. For long-range criteria (such as graduation or GPA at graduation), the SAT usually accounts for less than 5% of the criterion variance (Humphreys, 1967; Donlon, 1984). As one might expect, the picture dims further for the GRE: in two recent large-scale validity studies, Horn and Hofer (undated) and Sternberg (1998) found that the validities of the GRE for predicting successful completion of graduate training were effectively zero.
 

(b) Spearman's Hypothesis

In the early 80s, Jensen (in Bias in Mental Testing, 1980) revived a casual observation Spearman had made in 1927: he had reported that subtests most highly loaded on his general intelligence factor g showed the largest Black/White contrasts (Spearman Hypothesis). Jensen, after substituting the largest principal component (PC1) for g, interpreted this as new, compelling evidence for the existence of g, which seemed to corroborate his central claim that Blacks, on average, are deficient in g compared to Whites, and that these differences are primarily genetic, rather than cultural, in origin.
In [43] I drew attention to the fact that this result can be explained as an artifact which has nothing to do with Blacks or g. Rather, it arises with any data, including randomly generated data, if they exhibit a sufficiently large mean difference vector. William Shockley subsequently challenged this interpretation. He correctly pointed out that it was limited to a positive relation between the mean differences and the weights of the PC1 of the pooled group, while most of Jensen's data showed such positive correlations within each group. I therefore extended my results to this more general situation by invoking joint multinormality as an additional condition. I then showed mathematically, geometrically, empirically, and by random simulation, the following result:
If one splits a multinormal distribution of positively intercorrelated variables into a high and a low group, one finds
(a) that the mean differences between the two groups are monotonically related to the loadings on the largest principal component,
(b) that, if both groups are of equal size, the cosine between the two vectors (mean differences and PC1 loadings) will not just be large but 1, and
(c) that, if the groups are of different size, the effect will be more pronounced for the larger group [68, 82, 83].
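A minimal random simulation of this effect (Python/NumPy; the equicorrelated "subtests", equal-sized groups, and the median split on the total score are illustrative assumptions, not taken from the cited papers):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 6
    R = np.full((p, p), 0.4) + 0.6 * np.eye(p)          # positively intercorrelated variables
    X = rng.multivariate_normal(np.zeros(p), R, 20000)

    # Split the sample into a "high" and a "low" group at the median of the total score.
    total = X.sum(axis=1)
    d = X[total >= np.median(total)].mean(axis=0) - X[total < np.median(total)].mean(axis=0)

    # Loadings of the largest principal component of the pooled data.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    pc1 = evecs[:, -1]

    cosine = abs(d @ pc1) / np.linalg.norm(d)           # pc1 already has unit length
    print(round(cosine, 3))                             # close to 1, although no "g" is involved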
  • Thus, Spearman's Hypothesis does not warrant any of the far-reaching claims Jensen and some of his followers (e.g., Herrnstein and Murray) have attached to it. In particular, it does not validate the existence of a general ability g, as Jensen has asserted. Nor does it have any bearing on the race question.
  • Publication [83] is a Target Article on this topic, followed by  numerous  commentaries. Most of them endorse the stringency of the above reasoning. For a chronicle of the incredibly protracted history of this paper, see [86].

    (c) Hit-Rate Bias

    In  view of the severe implications  of a mistaken interpretation of discrepancies in IQ performance of various ethnic groups, much attention has been focused on the question whether these discrepancies might conceivably  be the result of  a bias  favoring some groups over others (perhaps also, males over females). A. Jensen (1980)  devoted a whole book to this issue, befittingly entitled Bias in Mental Testing. He concluded that such worries are unwarranted so far as the Black/White discrepancy is concerned because, if anything, conventional IQ tests  overpredict Black criterion performance.

    With this reasoning Jensen followed tradition in adopting an institutional point of view (e.g. that of universities) over that of the applicants, by focussing on  regression equations and  validity coefficients. From an institutional point of view, a test is useful if it improves the composition of the subgroup that is eventually hired or admitted on the basis of superior test performance.  From this point of view, one can  show that even a test with low validity has some merit, as long as the hiring institution employs a sufficiently stringent admission quota (by raising the test cut-off).

    However, this narrow perspective ignores two important aspects of the bias problem:

    (a)  the base-rate problem:

    By solely focusing on the regression equation and correlations (predictive validities), the traditional approach to the bias problem ignores the fact that a valid test can be worse than useless if the base-rates (= proportion of qualified candidates) are sufficiently skewed. To illustrate this briefly, suppose the base-rate of a clinical syndrome (e.g., schizophrenia) is 1%. In this case we could achieve 99% correct predictions by simply predicting that everybody is "normal", regardless of test performance. For a test to achieve such a high degree of correct prediction, it would have to have an unrealistically high predictive validity.
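    For instance, a hypothetical test with 90% sensitivity and 90% specificity (values chosen only for illustration) would classify just 0.90 x 1% + 0.90 x 99% = 90% of cases correctly, still well short of the 99% achieved by ignoring the test altogether.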

    More generally, validity coefficients by themselves (in the absence of knowledge of base rate and quota) are meaningless as indicators of the practical utility of a test.
    Though this was already known to Meehl and Rosen (1955), it has been conveniently ignored ever since.
     

    (b) the interests of the testee (as opposed to those of the hiring or admitting institution):

    Once the bias problem is cast into the language of prediction error frequencies (rather than validities and regression equations, which disregard base-rates), it becomes apparent that, to the same extent that an institution benefits from use of a low validity test by tightening the (admission) quota, qualified applicants will suffer, because an increasingly large proportion of them is wrongly rejected as a result of imperfect test validity. This follows directly from Bayes' well-known theorem, which relates two types of conditional probabilities. In the present context, they have the following concrete interpretation:

    The conditional probability that a candidate will be successful (e.g., graduate), given that he passes the test, is called the success ratio (SR) of the test. Following standard terminology of signal detection theory, let HR denote the hit-rate, which is the reverse conditional probability that a candidate passes the test if he is qualified. Finally, let Q denote the (admission) quota, and BR the base-rate (the proportion of qualified candidates in the unselected population).
    Then Bayes' Theorem asserts:
                                                                     SR = HR x BR/Q,
    which expresses the conventional institutional point of view: the smaller we make the quota (by raising the test cut-off), the larger will be the success ratio, because Q shows up in the denominator.
    However, if we adopt the point of view of  qualified candidates, then we find (by  solving the above equation for the hit-rate):
                                                                    HR = SR x Q /BR.
    Now Q appears in the numerator. Hence, the tighter the admission quota, the smaller will be the hit-rate, i.e., the chance that a qualified candidate is admitted.
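    To make the two identities concrete, here is a small numerical sketch (Python; all values of BR, Q, and SR are hypothetical, chosen only for illustration):

        BR = 0.50                    # base rate: half the applicant pool is qualified

        # Moderate quota: 40% admitted; suppose the resulting success ratio is 0.75.
        Q1, SR1 = 0.40, 0.75
        HR1 = SR1 * Q1 / BR          # hit rate = 0.60

        # Much tighter quota: 10% admitted; suppose the success ratio rises to 0.90.
        Q2, SR2 = 0.10, 0.90
        HR2 = SR2 * Q2 / BR          # hit rate = 0.18

        # The institution's success ratio improved, but a qualified applicant's
        # chance of being admitted dropped from 60% to 18%.
        print(HR1, HR2)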
    Although these simple relations have been known for a long time, they have been consistently ignored or downplayed in the mental test literature. In particular, so far as I know, few if any systematic investigations of actual hit-rates as a function of validity, base-rate, and quotas seem to have been reported in the past. Nor have the test experts shown much interest in the question of whether the tests may be biased against certain groups, e.g., Blacks, in terms of hit-rates.
    In [78] we (with Thompsen) derive simple approximations for hit-rates and tabulate them as a function of validity, quota, and base-rate. We also derive a bound on hit-rates,
                                                                       HR < Q/BR,
    which follows because SR cannot exceed 1, and which says that tightening the quota inevitably penalizes the qualified students by lowering the hit-rate.
    Finally, we review a number of data sets to evaluate the hit-rates of the SAT and ACT for different ethnic groups. We also assess the hit-rate bias, i.e., the extent to which conventional tests favor or discriminate against subgroups in terms of the chances that a qualified student passes the test. We found that conventional admission tests discriminate against Blacks, and further, that this bias increases as the admission quotas are tightened. In [82] these results are further refined and extended to include formulae for estimating the minimum validity needed, for given quota and base rate, so that use of the test improves the percentage of correct decisions over random admissions. The bottom line is that, in the realistic validity range (.3 - .4) for long-range criteria of practical interest, no test improves over random admission in terms of overall percent of correct decisions if one of the two base rates exceeds .7.
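    One simple way to compute such hit-rates is to treat test score and criterion as standard bivariate normal with correlation equal to the validity (a sketch in Python/SciPy; the approximations published in [78] may differ in detail, and the parameter values below are arbitrary):

        import numpy as np
        from scipy.stats import norm, multivariate_normal

        def hit_rate(validity, quota, base_rate):
            # P(pass the test | qualified) under a bivariate-normal model.
            t = norm.ppf(1 - quota)        # test cut-off
            c = norm.ppf(1 - base_rate)    # criterion cut-off defining "qualified"
            joint_lower = multivariate_normal.cdf(
                [t, c], mean=[0, 0], cov=[[1, validity], [validity, 1]])
            p_pass_and_qualified = quota + base_rate - 1 + joint_lower
            return p_pass_and_qualified / base_rate

        for q in (0.50, 0.25, 0.10):       # tightening the quota lowers the hit-rate
            print(q, round(hit_rate(0.35, q, 0.60), 2), "bound:", round(q / 0.60, 2))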

    Quantitative Behavior Genetics

    One reason for the astonishing persistence of the IQ myth in the face of overwhelming prior and posterior odds against it may be the unbroken chain of excessive "heritability" claims for "intelligence", which IQ tests are supposed to "measure". However, if "intelligence" is undefined, and Spearman's g is beset with numerous problems, not the least of which is the universal (and by now tacitly though grudgingly acknowledged) rejection of Spearman's model by the data, then how can the heritability of "intelligence" exceed that of milk production of cows and egg production of hens?
    These problems are addressed in a series of more recent publications [54, 60, 61, 62, 63, 70, 71, 72, 75, 81]. In [70] it is shown that a once widely used "heritability estimate" is mathematically unsound, because Holzinger had made a mistake in his derivations that had been overlooked for decades. Another such estimate, though mathematically valid, never fits any real data. This should have been obvious from the start, because it typically produces an inordinate number of inadmissible estimates (e.g., proportions larger than 1). These absurd results nevertheless found their way into print without comment or challenge. The same estimate also produces excessive "heritabilities" for variables which plainly have nothing to do with genes. For example, the "heritability" of answers to the question "Did you have your back rubbed last year?" turns out to be 92% for males and 21% for females [81].

