请输入您要查询的百科知识:

 

词条 Misra–Gries summary
释义

  1. The Misra–Gries summary

  2. Properties

  3. Example

  4. References

  5. External links

{{refimprove|date=March 2018}}

In the field of streaming algorithms, Misra–Gries summaries are used to solve the frequent elements problem in the data stream model. That is, given a long stream of input that can only be examined twice (and in some arbitrary order), the Misra-Gries algorithm can be used to compute which (if any) value makes up a majority of the stream, or more generally, the set of items that constitute some fixed fraction of the stream.

The Misra–Gries summary

As for all algorithms in the data stream model, the input is a finite sequence of integers from a finite domain. The algorithm outputs an associative array which has values from the stream as keys, and estimates of their frequency as the corresponding values. It takes a parameter {{var|k}} which determines the size of the array, which impacts both the quality of the estimates and the amount of memory used.

 '''algorithm''' misra-gries:{{sfn|Cormode|2014}}     '''input:'''          A positive integer {{var|k}}         A finite sequence {{var|s}} taking values in the range 1,2,...,{{var|m}}     '''output:''' An associative array {{var|A}} with frequency estimates for each item in {{var|s}}          {{var|A}} := new (empty) associative array     '''while''' {{var|s}} is not empty:         '''take''' a value {{var|i}} from {{var|s}}         '''if''' {{var|i}} is in keys({{var|A}}):             {{var|A}}[{{var|i}}] := {{var|A}}[i] + 1         '''else if''' |keys({{var|A}})| < {{var|k}} - 1:             {{var|A}}[{{var|i}}] := 1         '''else''':             '''for each''' {{var|K}} in keys({{var|A}}):                 {{var|A}}[{{var|K}}] := {{var|A}}[{{var|K}}] - 1                 '''if''' {{var|A}}[{{var|K}}] = 0:                     remove {{var|K}} from keys({{var|A}})     '''return''' {{var|A}}

Properties

The Misra–Gries algorithm uses O({{var|k}}(log({{var|m}})+log({{var|n}}))) space, where {{var|m}} is the maximum value in the stream and {{var|n}} is the length of the stream.

Every item which occurs at least {{var|n}}/{{var|k}} times is guaranteed to appear in the output array.{{sfn|Cormode|2014}} Therefore, in a second pass over the data, the exact frequencies for the {{var|k}}−1 items can be computed to solve the frequent items problem, or in the case of {{var|k}}=2, the majority problem. This second pass takes O({{var|k}}log({{var|m}})) space.{{Citation needed|date=November 2017}}

The summaries (arrays) output by the algorithm are mergeable, in the sense that combining summaries of two streams {{var|s}} and {{var|r}} by adding their arrays keywise and then decrementing each counter in the resulting array until only {{var|k}} keys remain results in a summary of the same (or better) quality as compared to running the Misra-Gries algorithm over the concatenation of {{var|s}} with {{var|r}}.

Example

{{Expand section|date=November 2017}}

Let k=2 and the data stream be 1,4,5,4,4,5,4,4 (n=8,m=5). Note that 4 is appearing 5 times in the data stream which is more than n/k=4 times and thus should appear as the output of the algorithm.

Since k=2 and |keys(A)|=k−1=1 the algorithm can only have one key with its corresponding value. The algorithm will then execute as follows(- signifies that no key is present):

Example Execution of Misra–Gries
Stream ValueKeyValue
111
40
551
40
441
50
441
442

Output: 4

References

{{Refbegin}}
  • {{Cite journal|last=Misra|first=J.|last2=Gries|first2=David|title=Finding repeated elements|journal=Science of Computer Programming|volume=2|issue=2|pages=143–152|doi=10.1016/0167-6423(82)90012-0|year=1982}}
  • {{Cite book|title=Encyclopedia of Algorithms|last=Cormode|first=Graham|date=2014|publisher=Springer US|isbn=9783642278488|editor-last=Kao|editor-first=Ming-Yang|pages=1–5|language=en|doi=10.1007/978-3-642-27848-8_572-1|chapter = Misra-Gries Summaries}}
{{Refend}}

External links

  • Comprehensive lecture notes: http://www.cs.dartmouth.edu/%7Eac/Teach/CS49-Fall11/Notes/lecnotes.pdf
  • https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf

1 : Streaming algorithms

随便看

 

开放百科全书收录14589846条英语、德语、日语等多语种百科知识,基本涵盖了大多数领域的百科知识,是一部内容自由、开放的电子版国际百科全书。

 

Copyright © 2023 OENC.NET All Rights Reserved
京ICP备2021023879号 更新时间:2024/11/13 11:05:37