“Scott's Pi”的意思、由来-开放百科全书

词条

Scott's Pi

释义

See also
References

Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatically annotating text is a popular problem in natural language processing, and goal is to get the computer program that is being developed to agree with the humans in the annotations it creates, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance.

Scott's pi is similar to Cohen's kappa in that they improve on simple observed agreement by factoring in the extent of agreement that might be expected by chance. However, in each statistic, the expected agreement is calculated slightly differently. Scott's pi makes the assumption that annotators have the same distribution of responses, which makes Cohen's kappa slightly more informative. Scott's pi is extended to more than two annotators in the form of Fleiss' kappa.

The equation for Scott's pi, as in Cohen's kappa, is:

However, Pr(e) is calculated using joint proportions. A worked example is given below.

Confusion matrix for two annotators, three categories {Yes, No, Maybe} and 45 items rated (90 ratings for 2 annotators):

Yes	No	Maybe	Marginal Sum
Yes	1	2	3	6
No	4	5	6	15
Maybe	7	8	9	24
Marginal Sum	12	15	18	45

To calculate the expected agreement, sum marginals across annotators and divide by the total number of ratings to obtain joint proportions. Square and total these:

Ann1	Ann2	Joint Proportion	JP Squared
Yes	12	6	(12 + 6)/90 = 0.2	0.04
No	15	15	(15 + 15)/90 = 0.333	0.111
Maybe	18	24	(18 + 24)/90 = 0.467	0.218
Total	0.369

To calculate observed agreement, divide the number of items on which annotators agreed by the total number of items. In this case,

Given that Pr(e) = 0.369, Scott's pi is then

References

Scott, W. (1955). "Reliability of content analysis: The case of nominal scale coding." Public Opinion Quarterly, 19(3), 321-325.
Krippendorff, K. (2004b) “Reliability in content analysis: Some common misconceptions and recommendations.” in Human Communication Research. Vol. 30, pp. 411-433.

1 : Inter-rater reliability

随便看

开放百科全书收录14589846条英语、德语、日语等多语种百科知识，基本涵盖了大多数领域的百科知识，是一部内容自由、开放的电子版国际百科全书。

See also

References