FORDISC Interobserver Comparability


 

Locations of populations measured by W.W. Howells

Interobserver Comparability in FORDISC 2.0

David Bulbeck
School of Archaeology and Anthropology, The Australian National University,
Canberra, Australia
Date of Document: November 2005
 

Introduction

Stephen Ousley and Richard Jantz are the developers of FORDISC, a very useful computer program in forensic anthropology. FORDISC 3 is currently available (see "What's New in FORDISC 3?"), but the version that I use and refer to here is FORDISC 2 (Ousley and Jantz 1996). The capability that particularly interests me, as a physical anthropologist working on Australasian topics, is the component that allows the user to enter up to 21 cranial measurements, which are then compared with the 28 populations measured by William White Howells (see above map). The user enters the available measurements from the skull of interest, and FORDISC calculates the variance-covariance matrices, followed by linear discriminant analysis, to advise the user of the probability that the skull belongs to, or at least resembles, one or another of the populations measured by Howells (Ousley and Jantz 1996). Users can select the Howells population(s) with which to compare their specimen of interest. The 21 available measurements are glabella-opisthocranion length, maximum biparietal breadth, basion-bregma height, nasion-basion length, basion-prosthion length, biauricular breadth, nasion-bregma chord, bregma-lambda chord, lambda-opisthion chord, foramen magnum length, mastoid height, upper facial breadth, bizygomatic breadth, upper facial height, nasal height, nasal breadth, orbital height, orbital breadth, biorbital breadth, interorbital breadth, and external palate breadth.

One potential concern with this procedure is that the measurements recorded for the skull may not have been taken according to the techniques Howells employed, which could produce spurious affinities. The measurement definitions in the manual that comes with FORDISC 2.0 leave some scope for interobserver error. In the case of upper facial breadth (UFBR in FORDISC), the observer is required to take a slightly different measurement, and the program brings Howells's (1973) fronto-malar breadth into comparability by adding 6 mm to his measurements (Stephen Ousley pers. comm. 2 December 2005). In the case of upper facial height (UFHT in FORDISC), the different definitions of prosthion in the FORDISC and Howells systems are corrected for in FORDISC 2.0 by adding 1.5 mm to Howells's measurements of upper facial height (NPH). Basion-prosthion length (BPL) is also defined slightly differently in the two systems. However, both versions have been recorded during The Contribution of South Asia to the Peopling of Australasia project, and the differences between them appear to be negligible. The same applies to orbital breadth (OBB) and interorbital breadth (DKB), which are potentially affected by slightly different definitions of the dacryon landmark in the two systems, and to nasal height, which FORDISC measures to nasospinale, whereas Howells measured it bilaterally to the base of the nasal aperture and took the average of the two chords. (Note that FORDISC 3 takes a different approach by using both systems in parallel, allowing measurements taken according to Howells's definitions to be compared directly with Howells's original data; Stephen Ousley pers. comm. 2 December 2005.)
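The two additive corrections described above can be sketched in Python as follows. This is a minimal illustration, not FORDISC's own code. The dictionary keys (FMB, NPH, UFBR, UFHT, GOL) follow the acronyms used in this document, the function name is hypothetical, and the values in the usage example are invented placeholders rather than Howells's published means.

```python
# Additive offsets (mm) reported above for FORDISC 2.0 (Stephen Ousley pers. comm. 2005):
#   Howells's fronto-malar breadth (FMB) + 6 mm   -> FORDISC upper facial breadth (UFBR)
#   Howells's upper facial height (NPH)  + 1.5 mm -> FORDISC upper facial height (UFHT)
HOWELLS_TO_FORDISC = {
    "FMB": ("UFBR", 6.0),
    "NPH": ("UFHT", 1.5),
}


def howells_to_fordisc(howells_values: dict[str, float]) -> dict[str, float]:
    """Re-express Howells-style measurements under the FORDISC 2.0 definitions.

    Measurements without a listed offset are passed through unchanged.
    """
    converted = {}
    for code, value in howells_values.items():
        new_code, offset = HOWELLS_TO_FORDISC.get(code, (code, 0.0))
        converted[new_code] = value + offset
    return converted


if __name__ == "__main__":
    # Placeholder values only, not Howells's data.
    print(howells_to_fordisc({"GOL": 185.0, "FMB": 100.0, "NPH": 70.0}))
    # {'GOL': 185.0, 'UFBR': 106.0, 'UFHT': 71.5}
```

The offsets are kept in a lookup table so that any further corrections could be added without changing the function.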

To check the comparability of the measurement specifications followed by Howells and by the recorder whose data are being entered into FORDISC 2.0, I recommend entering the means of a sampled population. As will be shown, when the means for a large sample of recent human crania are entered into FORDISC, we can expect a high "typicality probability" to be calculated for at least one of the 28 Howells populations. We can expect this for two reasons. First, Howells measured populations from virtually all over the world (though South Asia was not covered). Second, the world's populations by and large do not differ greatly from each other in their cranial measurements, at least in terms of their average values for the 21 measurements of relevance here. A sample mean should therefore lie close to the centre of the distribution of at least one Howells population, whereas an individual skull may lie anywhere within its population's range. Hence, when FORDISC uses the variance-covariance matrices to estimate the "typicality probability" that the entered measurements would be found in a Howells population, and the values entered are sample means rather than an individual specimen's measurements, at least one healthy typicality probability (of the order of 0.6 or more) should emerge. When all the typicality probabilities are small, we have very solid grounds for suspecting that interobserver measurement error is confounding the comparisons.
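To make this reasoning concrete, the Python sketch below computes a typicality probability in one conventional way. It takes the squared Mahalanobis distance D² between the entered values and a reference group's mean, using a within-group variance-covariance matrix, and refers it to a chi-square distribution with as many degrees of freedom as measurements. This is an assumption about the calculation, not a description of FORDISC 2.0's internals; FORDISC may, for example, use an F-based variant. The function names and the two-measurement numbers in the usage example are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def mahalanobis_sq(values, means, cov):
    """D^2 = d' S^-1 d, where d = values - means and S is the covariance matrix."""
    d = [v - m for v, m in zip(values, means)]
    s_inv_d = solve(cov, d)
    return sum(di * si for di, si in zip(d, s_inv_d))


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df),
    # then step up with Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def typicality(values, means, cov):
    """Chance that a member of the reference group lies at least as far
    (in Mahalanobis terms) from its mean as the entered values do."""
    return chi2_sf(mahalanobis_sq(values, means, cov), len(values))


if __name__ == "__main__":
    # Hypothetical two-measurement reference group (mm), not Howells's data.
    group_means = [185.0, 131.0]
    pooled_cov = [[36.0, 12.0], [12.0, 25.0]]
    print(typicality([185.0, 131.0], group_means, pooled_cov))  # 1.0: values sit at the group mean
    print(typicality([197.0, 141.0], group_means, pooled_cov))
```

A comparable sample mean should yield a small D² against at least one reference group, and therefore a high typicality. A systematic offset in even one measurement inflates D² for every group at once.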

In the illustrative tests that follow, the male averages will be used, because two of the Howells populations were measured for male skulls only (see above map).

Tests on Australian populations

Our first test compares the results of entering into FORDISC W.W. Howells's (1973, 1989) own means for his Lake Alexandrina (predominantly Swanport) South Australian males with the results of entering the means published by Peter Brown (1989) for Swanport. Twenty of Howells's measurements are available, i.e. the full 21 less upper facial breadth (UFBR). Brown has published data on 18 of the measurements (the full 21 less foramen magnum length, mastoid height and interorbital breadth). When we enter Howells's means (Table 1), we obtain the expected typicality probability of 1.000 that the Lake Alexandrina Australians are, indeed, the Lake Alexandrina Australians. Perhaps less expected is the finding that a skull with the same measurements as Howells's Lake Alexandrina means would also be very typical of many other populations. These include not only southwest Pacific populations (Tasmanians, Tolai) but also the Teita, Ainu and Zulu and, at a pinch, populations on every permanently inhabited continent. Turning to the results from entering Brown's means, we first note that they could hardly be closer to Howells's as far as South Australians are concerned. Brown's measurement definitions match those in the FORDISC 2.0 manual for five of the six measurements flagged above as potential sources of interobserver error. The close agreement therefore demonstrates the robustness of the FORDISC 2.0 results notwithstanding minor discrepancies between measurement prescriptions. Interestingly, if there is any noticeable effect attributable to interobserver error, it is that Brown's Swanport means have noticeably lower typicality probabilities than Howells's Lake Alexandrina means for every population in the world apart from Australians (and Buriats).
In that sense, any error arising from the use of Brown's measurements would make the specimen appear more Australian than Howells's measurements would (or perhaps slightly more like the Tolai, judging from the posterior probabilities).

Table 1. Probabilities for South Australian males (means) measured by Howells and by Brown
Howells population; typicality probabilities (Howells, Brown); posterior probabilities (Howells, Brown)
South Australians 1.000 0.995 0.973 0.975
Tasmanians 0.981 0.534 0.016 0.005
New Britain Tolai 0.960 0.724 0.009 0.019
Teita (Africa) 0.829 0.157 0.001 0.000
Ainu 0.733 0.048 0.001 0.000
Zulu 0.712 0.107 0.001 0.000
Norse 0.398 0.065 0.000 0.000
Zalavar (Hungary) 0.360 0.052 0.000 0.000
Dogon (Africa) 0.285 0.004 0.000 0.000
Santa Cruz (USA) 0.244 0.109 0.000 0.000
Taiwan Atayal 0.233 0.005 0.000 0.000
Guam Chamorros 0.229 0.034 0.000 0.000
Egyptians 0.224 0.015 0.000 0.000
Eskimos 0.210 0.175 0.000 0.000
South Japan 0.179 0.006 0.000 0.000
Easter Island 0.175 0.003 0.000 0.000
North Japan 0.172 0.005 0.000 0.000
Philippines 0.156 0.003 0.000 0.000
Bush (San) 0.154 0.007 0.000 0.000
Yauyos Peruvians 0.127 0.008 0.000 0.000
Moriori 0.115 0.019 0.000 0.000
Mokapu Hawaiians 0.086 0.004 0.000 0.000
Anyang Chinese 0.055 0.000 0.000 0.000
Arikara (USA) 0.047 0.003 0.000 0.000
Andaman Islands 0.043 0.000 0.000 0.000
Hainan Chinese 0.041 0.000 0.000 0.000
Berg (Austria) 0.028 0.003 0.000 0.000
Buriats 0.000 0.000 0.000 0.000
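Table 1 shows that typicality and posterior probabilities answer different questions. Typicalities are computed for each reference population separately, so several can be high at once (1.000 for South Australians and 0.981 for Tasmanians with Howells's means). Posteriors, by contrast, are shared out among the populations compared and must sum to 1. The Python sketch below shows that sharing-out. It assumes equal prior probabilities and the standard discriminant-analysis rule that a group's posterior is proportional to exp(-D²/2), where D² is the squared Mahalanobis distance to that group's mean. The function name and the distances in the example are hypothetical, and FORDISC's exact computation may differ.

```python
import math


def posterior_probabilities(d2_by_group: dict[str, float]) -> dict[str, float]:
    """Equal-prior posteriors: P(group) is proportional to exp(-D^2 / 2)."""
    # Subtract the smallest D^2 before exponentiating to avoid underflow;
    # the shift cancels out when the weights are normalised.
    d2_min = min(d2_by_group.values())
    weights = {g: math.exp(-(d2 - d2_min) / 2.0) for g, d2 in d2_by_group.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical distances: the second group is only about 4.4 D^2 units further away,
    # yet it receives just one tenth of the posterior probability.
    print(posterior_probabilities({"group A": 0.0, "group B": 2 * math.log(9)}))
```

Because the posteriors are normalised across groups, a population can be highly typical of an entered skull yet receive a near-zero posterior whenever another group is even closer.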

Our next illustrative test (Table 2) utilises the Swanport average measurements published by Michael Pietrusewsky (1984). Eighteen measurements are available for testing, viz. the usual 21 less bizygomatic breadth, foramen magnum length and lambda-opisthion chord. In addition, Richard Jantz (pers. comm., 9 November 2005) has advised me that Pietrusewsky's interorbital breadth measurements are not comparable to those of Howells, and should not be employed when using FORDISC. This advice provides an opportunity to test the effects of interobserver measurement error.

When we enter Pietrusewsky's means excluding interorbital breadth, we clearly obtain an affinity first and foremost with Howells's Lake Alexandrina Australians. This holds true even though the typicality and posterior probabilities are both somewhat lower than they were using Howells's or Brown's means (compare Tables 1 and 2). This discrepancy could arise from causes as simple as a different selection of specimens (given that the sampled populations are not quite identical), different sexing criteria, rounding errors (as data can only be entered into FORDISC as integer values), or slight variations in measuring techniques. Further, the distinctly southwest Pacific status of South Australians is strongly reconfirmed by Pietrusewsky's data (Table 2). This latter point holds even when we include interorbital breadth, but we would then have to examine the posterior probabilities to see it. Even then, Tasmanians rather than Australians would emerge as the closest population to Pietrusewsky's Swanport sample. The important point is that the typicality probabilities are all reduced to 0.000 by the inclusion of this one, seemingly innocuous measurement. As Richard Jantz noted, it has been measured in a systematically different way by Howells and by Pietrusewsky.
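Two practical points from this test can be handled when preparing means for entry: FORDISC accepts only integer values, and Pietrusewsky's interorbital breadth (DKB) should be left out. The Python sketch below is a hypothetical helper, not part of FORDISC. It assumes conventional round-half-up rounding to whole millimetres, and the example values are placeholders. Python's built-in round() is avoided because it rounds halves to the nearest even number, so round(184.5) gives 184.

```python
import math


def prepare_entry(means: dict[str, float],
                  exclude: frozenset[str] = frozenset()) -> dict[str, int]:
    """Round each mean to whole millimetres (halves round up) and drop any
    measurements judged non-comparable with Howells's definitions."""
    return {
        code: math.floor(value + 0.5)
        for code, value in means.items()
        if code not in exclude
    }


if __name__ == "__main__":
    example_means = {"GOL": 184.5, "XCB": 131.4, "DKB": 22.6}  # placeholder values
    print(prepare_entry(example_means, exclude=frozenset({"DKB"})))  # {'GOL': 185, 'XCB': 131}
```

Rounding alone can shift each entered value by up to 0.5 mm, which is one of the minor sources of discrepancy listed above.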

Table 2. Probabilities for Swanport Australian males (means) measured by Pietrusewsky, without and with interorbital breadth (DKB)
Howells population; typicality probabilities (without DKB, with DKB); posterior probabilities (without DKB, with DKB)
South Australians 0.749 0.000 0.890 0.340
Tasmanians 0.429 0.000 0.090 0.434
New Britain Tolai 0.253 0.000 0.020 0.140
Zulu 0.029 0.000 0.000 0.007
Teita (Africa) 0.016 0.000 0.000 0.067
Eskimos 0.010 0.000 0.000 0.000
Santa Cruz (USA) 0.009 0.000 0.000 0.004
Zalavar (Hungary) 0.008 0.000 0.000 0.001
Norse 0.007 0.000 0.000 0.006
Ainu 0.004 0.000 0.000 0.000
Bush (San) 0.004 0.000 0.000 0.000
Egyptians 0.003 0.000 0.000 0.000
Guam Chamorros 0.002 0.000 0.000 0.000
South Japan 0.001 0.000 0.000 0.000
Moriori 0.001 0.000 0.000 0.000
Dogon (Africa) 0.000 0.000 0.000 0.000
Taiwan Atayal 0.000 0.000 0.000 0.000
Easter Island 0.000 0.000 0.000 0.000
North Japan 0.000 0.000 0.000 0.000
Philippines 0.000 0.000 0.000 0.000
Yauyos Peruvians 0.000 0.000 0.000 0.000
Mokapu Hawaiians 0.000 0.000 0.000 0.000
Anyang Chinese 0.000 0.000 0.000 0.000
Arikara (USA) 0.000 0.000 0.000 0.000
Andaman Islands 0.000 0.000 0.000 0.000
Hainan Chinese 0.000 0.000 0.000 0.000
Berg (Austria) 0.000 0.000 0.000 0.001
Buriats 0.000 0.000 0.000 0.000

How do other Australian samples compare with Howells's Australian sample? To investigate this question, I have entered into FORDISC the averages published by Brown (1989) for recent Murray Valley and Coobool Creek (terminal Pleistocene Murray Valley) male Aborigines, and the averages of the recent male Aborigines (mostly from the northern two-thirds of the continent) published by Halina Milicerowa (1955). The 18 measurements published by Brown have been mentioned above, while the 19 measurements published by Milicerowa exclude upper facial breadth and mastoid height from the possible suite of 21 variables.

For both recent populations, South Australians emerge as the closest population, but the Tolai are hot on their heels (Table 3). For the Coobool Creek means, the roles are reversed and the Tolai appear as the closest population. In all three cases, the typicality probabilities for Howells's three southwest Pacific populations are healthy (0.325-0.908), suggesting that interobserver error plays at most a minimal role in the results. (The Coobool Creek typicality probabilities are lower than those of the two recent samples, presumably because the Coobool Creek sample is relatively small or because we are dealing with an ancient population.) Howells's three southwest Pacific populations are also the closest populations to all three of the samples tested here, except that Milicerowa's sample is slightly closer to the Zulu than to Tasmanians. In fact, looking closely at the results, Milicerowa's sample appears slightly closer to Africans than either of the Murray Valley samples does (or Swanport, for that matter). However, there is no reason to attribute that result to interobserver error rather than to population differences within Australia. In summary, as our tested Australian samples move further away from Howells's reference Lake Alexandrina sample in space and in time, they naturally tend to align less clearly with Lake Alexandrina (in these particular cases, veering towards Howells's Tolai reference sample).

Table 3. Probabilities for Murray Valley, Coobool Creek, and Milicerowa's Australian Aboriginal males (means)
Howells population; typicality probabilities (Murray Valley, Coobool Creek, Milicerowa); posterior probabilities (Murray Valley, Coobool Creek, Milicerowa)
Australia 0.908 0.353 0.794 0.580 0.256 0.475
Tolai 0.875 0.441 0.783 0.396 0.522 0.431
Tasmanians 0.458 0.325 0.378 0.015 0.199 0.019
Teita (Africa) 0.267 0.022 0.375 0.003 0.001 0.018
Ainu 0.230 0.015 0.114 0.002 0.000 0.001
Zulu 0.213 0.122 0.506 0.002 0.017 0.052
Eskimos 0.180 0.004 0.020 0.001 0.000 0.000
Zalavar 0.111 0.020 0.089 0.000 0.000 0.000
Norse 0.068 0.008 0.077 0.000 0.000 0.000
Guam 0.055 0.064 0.056 0.000 0.000 0.000
South Japan 0.055 0.006 0.052 0.000 0.000 0.000
Taiwan Atayal 0.048 0.001 0.033 0.000 0.000 0.000
Santa Cruz 0.047 0.016 0.026 0.000 0.000 0.000
North Japan 0.039 0.007 0.017 0.000 0.000 0.000
Egyptians 0.034 0.002 0.083 0.000 0.000 0.000
Anyang Chinese 0.032 0.004 0.010 0.000 0.000 0.000
Dogon (Africa) 0.030 0.004 0.173 0.000 0.000 0.002
Moriori 0.026 0.006 0.025 0.000 0.000 0.000
Philippines 0.025 0.008 0.029 0.000 0.000 0.000
Bush (San) 0.017 0.000 0.038 0.000 0.000 0.000
Easter Island 0.013 0.004 0.024 0.000 0.000 0.000
Hainan Chinese 0.013 0.001 0.006 0.000 0.000 0.000
Arikara (USA) 0.013 0.001 0.003 0.000 0.000 0.000
Peruvians 0.008 0.002 0.011 0.000 0.000 0.000
Hawaiians 0.005 0.015 0.009 0.000 0.000 0.000
Andaman Islands 0.002 0.000 0.024 0.000 0.000 0.000
Berg (Austria) 0.002 0.001 0.001 0.000 0.000 0.000
Buriats 0.000 0.000 0.000 0.000 0.000 0.000

Tests on non-Australian populations

We have already observed that Michael Pietrusewsky's measurements, excluding interorbital breadth, may be entered into FORDISC without fear of interobserver error distorting the results. I have trialled the male means of the 17 relevant measurements for all eleven Melanesian populations published by Pietrusewsky (1984). In all cases the comparisons with Howells's southwest Pacific populations yield healthy typicality probabilities, between 0.127 and 0.909. The a priori expectation that the Tolai would be the closest population to Pietrusewsky's Melanesians proved true in the majority of the tests, but in four tests Australians were the closest population (with Tasmanians always in second or third place). Table 4 gives the details of two of these tests, along with a similar test I carried out on Pietrusewsky's (1984) male South Moluccan means. Interestingly, the South Moluccans appear closest to Tasmanians and then Australians, with the Tolai in fifth place (indicating that the southwest Pacific unity begins to break up as we move away from the region).

Table 4. Probabilities for New Ireland, Purari Delta, and South Moluccan males (means)
Howells population; typicality probabilities (New Ireland, Purari Delta, South Moluccans); posterior probabilities (New Ireland, Purari Delta, South Moluccans)
New Britain Tolai 0.909 0.739 0.380 0.620 0.352 0.037
South Australians 0.703 0.806 0.470 0.102 0.598 0.072
Tasmanians 0.761 0.342 0.780 0.157 0.019 0.671
Guam Chamorros 0.419 0.052 0.424 0.014 0.000 0.052
Zalavar (Hungary) 0.470 0.063 0.419 0.020 0.000 0.050
Eskimos 0.480 0.068 0.066 0.021 0.000 0.001
South Japan 0.438 0.095 0.052 0.241 0.001 0.010
Teita (Africa) 0.416 0.172 0.126 0.013 0.003 0.002
Zulu 0.322 0.369 0.155 0.006 0.024 0.004
Philippines 0.332 0.031 0.361 0.007 0.000 0.031
Taiwan Atayal 0.336 0.014 0.184 0.007 0.000 0.005
Mokapu Hawaiians 0.211 0.015 0.292 0.002 0.000 0.017
Egyptians 0.290 0.078 0.210 0.005 0.001 0.007
Yauyos Peruvians 0.205 0.026 0.242 0.002 0.000 0.010
Norse 0.210 0.018 0.189 0.002 0.000 0.006
Andaman Islands 0.197 0.025 0.173 0.002 0.000 0.005
Santa Cruz (USA) 0.180 0.026 0.181 0.001 0.000 0.005
Berg (Austria) 0.061 0.001 0.180 0.000 0.000 0.005
Ainu 0.177 0.023 0.082 0.001 0.000 0.001
Arikara (USA) 0.123 0.003 0.177 0.001 0.000 0.005
North Japan 0.171 0.021 0.083 0.001 0.000 0.001
Hainan Chinese 0.150 0.005 0.068 0.001 0.000 0.001
Moriori 0.115 0.013 0.129 0.000 0.000 0.002
Easter Island 0.103 0.011 0.032 0.000 0.000 0.000
Anyang Chinese 0.065 0.002 0.038 0.000 0.000 0.000
Dogon (Africa) 0.060 0.045 0.048 0.000 0.000 0.000
Bush (San) 0.014 0.004 0.030 0.000 0.000 0.000
Buriats 0.000 0.000 0.001 0.000 0.000 0.000

Our final illustrative test involves two samples measured for The Contribution of South Asia to the Peopling of Australasia project. The first is the male Peninsula Malay skulls held at the American Museum of Natural History, measured by Daniel Rayner and Adam Lauer. The second is the Punjab males measured by Pathmanathan Raghavan in various Human Anatomy departments in the state of Punjab, India. In both cases, all 21 relevant measurements (taken according to the FORDISC definitions) are available. If these measurements are comparable with those of Howells (either directly, or as converted by FORDISC), we would expect high typicality probabilities, of around 0.6 or more, to emerge for at least one of the Howells reference populations. The Punjab skulls are a particularly good test case, because Howells's world coverage specifically missed South Asia (see the map at the top of this web page). Any healthy typicality probabilities could therefore not be attributed to a specific similarity to a related population measured by Howells. In fact the closest affinities of the Punjab skulls emerge, as expected, with populations in the general vicinity, specifically the Zalavar of Hungary, Gizeh Egyptians, the Teita of East Africa, and Andaman Islanders. The Malays are considered biologically close to Filipinos, so the expected result would be a specific affinity with the male Philippine skulls measured by Howells (1989). All of these expectations are indeed met (Table 5). Of particular note is the very high typicality probability (0.975) of the Malays with respect to Howells's Filipinos, an outcome that would not be expected if interobserver measurement error were playing any substantive role.

Table 5. Probabilities for Malay and Punjab males (means)
Howells population; typicality probabilities (Malays, Punjab); posterior probabilities (Malays, Punjab)
Philippines 0.975 0.285 0.818 0.016
Hainan Chinese 0.765 0.162 0.045 0.003
Mokapu Hawaiians 0.730 0.086 0.033 0.001
South Japan 0.689 0.359 0.024 0.034
Andaman Islands 0.668 0.479 0.020 0.093
Taiwan Atayal 0.595 0.394 0.011 0.046
North Japan 0.590 0.117 0.011 0.002
Guam Chamorros 0.563 0.108 0.009 0.001
Zalavar (Hungary) 0.498 0.614 0.005 0.271
Arikara (USA) 0.487 0.088 0.005 0.001
Berg (Austria) 0.487 0.080 0.005 0.001
Yauyos Peruvians 0.435 0.147 0.003 0.003
Anyang Chinese 0.432 0.106 0.003 0.001
Santa Cruz (USA) 0.404 0.128 0.002 0.002
Tasmanians 0.316 0.183 0.001 0.005
Dogon (Africa) 0.310 0.144 0.001 0.003
Buriats 0.248 0.000 0.001 0.000
Egyptians 0.236 0.589 0.000 0.223
Zulu 0.225 0.459 0.000 0.080
Moriori 0.205 0.064 0.000 0.000
Norse 0.187 0.401 0.000 0.049
Ainu 0.140 0.062 0.000 0.000
Teita (Africa) 0.089 0.521 0.000 0.131
New Britain Tolai 0.080 0.236 0.000 0.009
Easter Island 0.026 0.088 0.000 0.001
Eskimos 0.016 0.059 0.000 0.000
South Australians 0.011 0.146 0.000 0.003
Bush (San) 0.006 0.319 0.000 0.023

Conclusion

This document proposes a test for inferring whether interobserver differences in measurement technique are substantially influencing the results obtained from FORDISC analysis (in particular, the analysis that compares the entered data with the populations measured by Howells). The user enters the mean measurements of the population of interest and examines the typicality probabilities produced by FORDISC. If these are 0.6 or greater for one or more of the Howells populations, the user can be confident that the recorded measurements have been taken using a technique comparable to Howells's (directly, or after conversion by FORDISC). The Swanport-South Australian (Table 1) and Malay-Filipino (Table 5) comparisons in particular indicate the capacity of the FORDISC program to neutralise most potential sources of interobserver error. If very low typicality probabilities emerge, however, it is likely that one or more of the measurements have been taken using a non-comparable technique (see Table 2).
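The decision rule just described can be written down as a small Python function. The 0.6 threshold comes from the text. The 0.05 cut-off for "very low" typicalities is a hypothetical value chosen only for illustration. Results falling between the two thresholds are reported as inconclusive rather than forced into either category. The example uses the South Australian, Tasmanian and Tolai typicalities from Table 2.

```python
def screen_means(typicalities: dict[str, float],
                 healthy: float = 0.6,
                 very_low: float = 0.05) -> str:
    """Classify a FORDISC run on sample means by its best typicality probability.

    `healthy` follows the text; `very_low` is an illustrative assumption.
    """
    best_group = max(typicalities, key=typicalities.get)
    best = typicalities[best_group]
    if best >= healthy:
        return f"comparable (best match {best_group}, {best:.3f})"
    if best < very_low:
        return f"suspect interobserver error (best {best:.3f})"
    return f"inconclusive (best match {best_group}, {best:.3f})"


if __name__ == "__main__":
    # Subset of Table 2 (Pietrusewsky's Swanport means)
    without_dkb = {"South Australians": 0.749, "Tasmanians": 0.429, "New Britain Tolai": 0.253}
    with_dkb = {"South Australians": 0.000, "Tasmanians": 0.000, "New Britain Tolai": 0.000}
    print(screen_means(without_dkb))  # comparable (best match South Australians, 0.749)
    print(screen_means(with_dkb))     # suspect interobserver error (best 0.000)
```

Only the best typicality is used, because the test asks whether the entered means resemble at least one Howells population, not which population they resemble.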

Of course, FORDISC has been designed as an aid to forensic anthropology, in particular to help identify the population affinities of a single skull or, at most, a small collection of skulls. Population means, it might be contended, are then beside the point. However, the "means test" proposed here could still be applied to a representative sample of other skulls measured by the same observer who measured the skull(s) under investigation. The implications of the test for interobserver comparability should hold regardless of which population is tested. If the observer who measured the skull(s) of interest has no prior experience of measuring a large sample of skulls, so that my "means test" cannot be implemented, then that observer's measurements should be regarded with suspicion in any case. Reliable craniometry, like any other scientific skill, requires study, training and experience.
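For an observer who has measured a reference series, the means to be entered could be assembled along the following lines. This is a hypothetical Python sketch. The record layout is an assumption: one dictionary per skull, with a "sex" field, measurement codes as keys, and None for a measurement that could not be taken. Following the practice adopted in this document, only males are averaged.

```python
from statistics import mean


def male_means(records: list[dict], codes: list[str]) -> dict[str, float]:
    """Average each measurement over the male skulls, skipping missing values."""
    result = {}
    for code in codes:
        values = [
            r[code]
            for r in records
            if r.get("sex") == "M" and r.get(code) is not None
        ]
        if values:  # leave out measurements that could not be taken on any male skull
            result[code] = float(mean(values))
    return result


if __name__ == "__main__":
    # Hypothetical records (mm); None marks a measurement that could not be taken.
    records = [
        {"sex": "M", "GOL": 180, "XCB": 130},
        {"sex": "M", "GOL": 190, "XCB": None},
        {"sex": "F", "GOL": 170, "XCB": 125},
    ]
    print(male_means(records, ["GOL", "XCB", "MDH"]))  # {'GOL': 185.0, 'XCB': 130.0}
```

The resulting means would still need rounding to whole millimetres before entry, since FORDISC accepts integer values only.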

A useful by-product of the research described here is that the tests carried out on means provide clues to the affinities to be expected when individual crania from the same population are tested with FORDISC. For example, the basic craniometric similarity of all southwest Pacific populations emerged as a consistent result from the FORDISC tests. Yet, as our focus shifted to eastern Indonesia (the Moluccas), the Malay world and finally Punjab, Howells's southwest Pacific populations became progressively less relevant to the comparisons. Likewise, the trials with average measurements allow us to predict the "mis-classifications" to be expected, and there is no reason why these mis-classifications should not be useful forensic information in themselves. For instance, suppose we were investigating an unknown skull in Australia of suspected Aboriginal affinity. A Tolai, Tasmanian, or even a Teita or Zulu FORDISC result could then be interpreted as confirming that skull's indigenous Australian status.

Acknowledgments

My thanks to Dr Pathmanathan Raghavan, and to Daniel Rayner and Adam Lauer, who collected craniometric data as part of The Contribution of South Asia to the Peopling of Australasia project and have allowed it to be used here. Professor Maciej Henneberg kindly provided me with a copy of Milicerowa (1955). Richard Jantz and Stephen Ousley provided some useful pointers on this web page.

References

Howells, W.W. 1973. Cranial Variation in Man. Cambridge, Mass.: Peabody Museum.

Howells, W.W. 1989. Skull Shapes and the Map. Cambridge, Mass.: Peabody Museum.

Milicerowa, H. 1955. Crania Australica. Wroclaw: Polska Akademia Nauk Zaklad Antropologii, Materialy i Prace Antropologiczne Nr 6.

Ousley, S.D. and R.L. Jantz. 1996. FORDISC 2.0: Personal Computer Forensic Discriminant Functions. Knoxville: University of Tennessee.

Pietrusewsky, M. 1984. Metric and Non-Metric Cranial Variation in Australian Aboriginal Populations Compared with Populations from the Pacific and Asia. Canberra: Australian Institute of Aboriginal and Torres Strait Islander Studies.