Mosaic plot

  1. Example

     Mosaic plot construction 

  2. Properties

  3. See also

  4. References

  5. Further reading

A mosaic plot (also known as a Marimekko diagram or Mekko chart, after its resemblance to a Marimekko print) is a graphical method for visualizing data from two or more qualitative variables.[1] It is the multidimensional extension of the spineplot, which displays the same information for a single variable.[2] It gives an overview of the data and makes it possible to recognize relationships between variables. For example, two variables are independent when the tiles are split in the same proportions within every category, so that the tile boundaries line up across categories.[3] Mosaic plots were introduced by Hartigan and Kleiner in 1981 and expanded on by Friendly in 1994.[4]

As with bar charts and spineplots, the area of each tile, also known as the bin size, is proportional to the number of observations in the corresponding category.[5]

Example

A classic example of mosaic plots uses data from the passengers on the Titanic. The data used for this example has 2201 observations and 3 variables. The variables are:

  • the gender of the person (male / female)
  • the class (1st, 2nd and 3rd class, or crew)
  • did this person survive the sinking (yes / no)?

The observations were compiled into the following table:

Gender  Survived  1st Class  2nd Class  3rd Class  Crew
Male    No              118        154        422   670
Male    Yes              62         25         88   192
Female  No                4         13        106     3
Female  Yes             141         93         90    20
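
As a cross-check on the table, the counts can be entered as a small data structure and summed along each variable. This is a minimal sketch in plain Python; the names `ROWS`, `counts` and `margin` are illustrative and do not come from any particular library.

```python
# Counts copied from the table above, one row per (gender, survived) pair.
CLASSES = ["1st", "2nd", "3rd", "Crew"]
ROWS = {
    ("Male", "No"): [118, 154, 422, 670],
    ("Male", "Yes"): [62, 25, 88, 192],
    ("Female", "No"): [4, 13, 106, 3],
    ("Female", "Yes"): [141, 93, 90, 20],
}

# Flatten into one count per (gender, survived, class) combination.
counts = {
    (gender, survived, cls): n
    for (gender, survived), row in ROWS.items()
    for cls, n in zip(CLASSES, row)
}


def margin(counts, axis):
    """Sum the counts over every variable except the one at position `axis`."""
    totals = {}
    for key, n in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals


total = sum(counts.values())
by_gender = margin(counts, 0)
by_survival = margin(counts, 1)
by_class = margin(counts, 2)

print("observations:", total)
print("by gender:", by_gender)
print("by survival:", by_survival)
print("by class:", by_class)
```

The grand total reproduces the 2201 observations stated above, and the three marginal tables are the quantities the mosaic plot encodes as block heights and column widths.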

Mosaic plot construction

Order  Variable  Axis
1.     Gender    Vertical
2.     Class     Horizontal
3.     Survived  Vertical

The categorical variables are first put in order, and each variable is then assigned to an axis. The table above shows the order and axis assignment used for this data set. A different ordering results in a different mosaic plot; as with all multivariate plots, the order of the variables matters.
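
The ordering and axis assignment can be turned into tile geometry by splitting the unit square recursively, one variable at a time. The sketch below is one possible way to do this in plain Python, not a reference implementation: the function name `mosaic_tiles` and the `"v"`/`"h"` axis codes are assumptions made for illustration, and the small gaps that plotting libraries usually draw between tiles are omitted.

```python
from collections import defaultdict


def mosaic_tiles(counts, axes, rect=(0.0, 0.0, 1.0, 1.0), depth=0):
    """Recursively split `rect` = (x, y, width, height) into tiles.

    counts: dict mapping category tuples to observation counts.
    axes:   one entry per variable; "v" stacks the blocks along the vertical
            axis, "h" places them side by side along the horizontal axis.
    Returns a dict mapping each category tuple to its (x, y, w, h) rectangle.
    """
    if depth == len(axes):
        # Every variable has been used, so exactly one combination is left.
        (key,) = counts
        return {key: rect}

    # Group the remaining cells by their level of the current variable.
    groups = defaultdict(dict)
    for key, n in counts.items():
        groups[key[depth]][key] = n

    x, y, w, h = rect
    total = sum(counts.values())
    tiles = {}
    offset = 0.0
    for sub in groups.values():
        share = sum(sub.values()) / total if total else 0.0
        if axes[depth] == "v":
            child = (x, y + offset * h, w, share * h)
        else:
            child = (x + offset * w, y, share * w, h)
        tiles.update(mosaic_tiles(sub, axes, child, depth + 1))
        offset += share
    return tiles


classes = ["1st", "2nd", "3rd", "Crew"]
table = {
    ("Female", "No"): [4, 13, 106, 3],
    ("Female", "Yes"): [141, 93, 90, 20],
    ("Male", "No"): [118, 154, 422, 670],
    ("Male", "Yes"): [62, 25, 88, 192],
}
# Keys follow the plotting order: gender, class, survived.
counts = {
    (gender, cls, survived): n
    for (gender, survived), row in table.items()
    for cls, n in zip(classes, row)
}

tiles = mosaic_tiles(counts, axes=["v", "h", "v"])
total = sum(counts.values())
areas = {key: w * h for key, (x, y, w, h) in tiles.items()}
for key in sorted(areas):
    print(key, round(areas[key], 4), round(counts[key] / total, 4))
```

Because each split divides a rectangle in proportion to the counts it contains, the area of every final tile equals that combination's share of all observations, which is the defining property of the plot. Passing the variables in a different order, or with different axis codes, yields a different set of rectangles.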

The first variable, "Gender", is applied along the left edge: the data are divided vertically into two blocks, with the bottom block corresponding to females and the upper (much larger) one to males. One immediately sees that roughly a fifth of the people on board (470 of 2201) were female and the remaining four fifths male.

One then applies the second variable, "Class", to the top edge. The four vertical columns mark the four values of that variable (1st, 2nd, 3rd, and crew). The columns differ in width because column width indicates the relative proportion of the corresponding value within each gender block. Crew plainly represents the largest male group, whereas third-class passengers are the largest female group. The number of female crew members is also seen to have been very small.

The last variable, "Survived", is finally applied, this time along the left edge, with the result highlighted by shade: dark grey rectangles represent people who did not survive the disaster, light grey ones people who did. Women in the first class are immediately seen to have had the highest survival probability. The survival probability for women, marginalised over all classes, is seen to have been higher than that for men. Similarly, marginalising over gender identifies first-class passengers as the most likely to survive. Overall, about 1/3 of all people survived (the proportion of light grey area).
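
The survival statements read off the plot can be checked directly against the table. The following is a minimal sketch in plain Python; the variable names are illustrative, and the counts are those listed in the table of observations.

```python
classes = ["1st", "2nd", "3rd", "Crew"]
died = {"Male": [118, 154, 422, 670], "Female": [4, 13, 106, 3]}
survived = {"Male": [62, 25, 88, 192], "Female": [141, 93, 90, 20]}


def rate(yes, no):
    """Fraction of survivors among yes + no people."""
    return yes / (yes + no)


# Survival rate within each (gender, class) cell.
cell_rate = {
    (g, c): rate(survived[g][i], died[g][i])
    for g in survived
    for i, c in enumerate(classes)
}

# Survival rate by gender, marginalised over class.
gender_rate = {g: rate(sum(survived[g]), sum(died[g])) for g in survived}

# Survival rate by class, marginalised over gender.
class_rate = {
    c: rate(
        sum(survived[g][i] for g in survived),
        sum(died[g][i] for g in died),
    )
    for i, c in enumerate(classes)
}

overall = rate(sum(map(sum, survived.values())), sum(map(sum, died.values())))
best_cell = max(cell_rate, key=cell_rate.get)
best_class = max(class_rate, key=class_rate.get)

print("highest cell rate:", best_cell, round(cell_rate[best_cell], 3))
print("by gender:", {g: round(r, 3) for g, r in gender_rate.items()})
print("by class:", {c: round(r, 3) for c, r in class_rate.items()})
print("overall:", round(overall, 3))
```

The computed rates agree with the visual reading: first-class women have the highest cell rate, women outrank men, first class outranks the other classes, and the overall rate is 711 of 2201, a little under one third.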

Properties

  • The displayed variables are measured on categorical (nominal) or ordinal scales.
  • The plot shows at least two variables. There is no upper limit, but a plot with too many variables becomes hard to read.
  • The number of observations is not limited, but it cannot be read off the plot.
  • The area of each rectangular tile is proportional to the number of observations that have the corresponding combination of characteristics.
  • Unlike, for example, the boxplot or QQ plot, the mosaic plot cannot display a confidence interval. Whether differences between the frequencies of the various characteristic values are significant can therefore not be judged visually.
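
Because significance cannot be read from the tiles, a numerical test is usually computed alongside the plot. As one possible illustration (the choice of test is an assumption here, not something prescribed by the mosaic plot itself), the sketch below computes Pearson's chi-square statistic for class against survival, using the Titanic counts summed over gender.

```python
classes = ["1st", "2nd", "3rd", "Crew"]
# Class-by-survival counts, summed over gender from the Titanic table.
observed = {
    "No": [122, 167, 528, 673],
    "Yes": [203, 118, 178, 212],
}

row_totals = {outcome: sum(row) for outcome, row in observed.items()}
col_totals = [sum(row[i] for row in observed.values()) for i in range(len(classes))]
grand_total = sum(row_totals.values())

# Expected count in each cell if class and survival were independent.
expected = {
    outcome: [row_totals[outcome] * col / grand_total for col in col_totals]
    for outcome in observed
}

# Pearson's statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[o][i] - expected[o][i]) ** 2 / expected[o][i]
    for o in observed
    for i in range(len(classes))
)
degrees_of_freedom = (len(observed) - 1) * (len(classes) - 1)

# 5% critical value of the chi-square distribution with 3 degrees of freedom.
CRITICAL_5_PERCENT_DF3 = 7.815

print(f"chi-square = {chi_square:.1f}, df = {degrees_of_freedom}")
print("reject independence at the 5% level:", chi_square > CRITICAL_5_PERCENT_DF3)
```

The expected counts are what the tiles would show if the survival split were identical in every class column; the statistic measures how far the observed tiles depart from that pattern.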

See also

  • Heat map
  • Treemap
  • Contingency table

References

1. Sandra D. Schlotzhauer (1 April 2007). Elementary Statistics Using JMP. SAS Institute. p. 407. ISBN 978-1-59994-428-9. https://books.google.com/books?id=5JYM1WxGDz8C&pg=PA407
2. New Techniques and Technologies for Statistics II: Proceedings of the Second Bonn Seminar. IOS Press. 1 January 1997. p. 254. ISBN 978-90-5199-326-4. https://books.google.com/books?id=-Pp8hbwAtq8C&pg=PA254
3. Michael Friendly (1 January 1991). SAS System for Statistical Graphics. SAS Institute. pp. 512–. ISBN 978-1-55544-441-9. https://books.google.com/books?id=bBIUdg5LjeUC&pg=PA512
4. SAS Institute (6 September 2013). JMP 11 Basic Analysis. SAS Institute. pp. 251–. ISBN 978-1-61290-684-3. https://books.google.com/books?id=US_pAAAAQBAJ&pg=PT251
5. Martin Theus; Simon Urbanek (23 March 2011). Interactive Graphics for Data Analysis: Principles and Examples. CRC Press. ISBN 978-1-4200-1106-7. https://books.google.com/books?id=xHIH1Q47FeoC&pg=PA31

Further reading

  • John Hartigan, Beat Kleiner: Mosaics for contingency tables. In: Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. 1981, pp. 268–273.
