Entry: Inside–outside–beginning (tagging)
Definition



The IOB format (short for inside, outside, beginning) is a common format for tagging tokens in a chunking task in computational linguistics (e.g. named-entity recognition).[1] It was presented by Ramshaw and Marcus in their 1995 paper "Text Chunking using Transformation-Based Learning".[2] The I- prefix before a tag indicates that the token is inside a chunk, and the B- prefix indicates that the token begins a chunk. In the IOB format, the B- prefix is used only when a chunk immediately follows another chunk of the same type, with no O tokens between them; otherwise the first token of a chunk also takes the I- prefix. An O tag indicates that a token belongs to no chunk.

Another widely used format is IOB2, which is the same as the IOB format except that the B- prefix is used at the beginning of every chunk (i.e. all chunks start with a B- tag).
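
The difference between the two formats can be illustrated with a small conversion routine. The following is a minimal Python sketch, not taken from the cited sources; the function name `iob1_to_iob2` is hypothetical, and it assumes tags are written as `O`, `I-TYPE` or `B-TYPE`.

```python
from typing import List


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Convert IOB tags to IOB2 so that every chunk starts with a B- tag.

    Assumes each tag is "O", "I-TYPE" or "B-TYPE" (an illustrative convention).
    """
    converted = []
    prev = "O"  # tag of the previous token, after conversion
    for tag in tags:
        if tag.startswith("I-"):
            chunk_type = tag[2:]
            # An I- tag opens a new chunk when the previous token was O
            # or belonged to a chunk of a different type.
            if prev == "O" or prev[2:] != chunk_type:
                tag = "B-" + chunk_type
        converted.append(tag)
        prev = tag
    return converted


print(iob1_to_iob2(["I-PER", "O", "O", "O", "I-LOC", "I-LOC"]))
# prints ['B-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC']
```

Existing B- tags are left untouched, since in the IOB format they already mark the start of a chunk.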

A readable introduction to entity tagging is given in Bob Carpenter's blog post, "Coding Chunkers as Taggers".[3] 'BIO' is plausibly a synonym for 'IOB'.

An example with IOB format:

Alex I-PER
is O
going O
to O
Los I-LOC
Angeles I-LOC

An example with IOB2 format:

Alex B-PER
is O
going O
to O
Los B-LOC
Angeles I-LOC

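
To show how such tag sequences map back to chunks, here is a minimal Python sketch that groups tokens into typed chunks. The function name `extract_chunks` is hypothetical, and the sketch assumes `O`, `I-TYPE` and `B-TYPE` tags; it treats an I- tag that follows O or a different type as the start of a chunk, so it reads both IOB and IOB2 sequences.

```python
from typing import List, Optional, Tuple


def extract_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Return (chunk_type, chunk_text) pairs from an IOB or IOB2 tag sequence."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        # "I-LOC" -> ("I", "-", "LOC"); "O" -> ("O", "", "")
        prefix, _, chunk_type = tag.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and chunk_type != current_type)
        if prefix == "O" or starts_new:
            # Close the chunk that was open, if any.
            if current_tokens:
                chunks.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
        if prefix in ("B", "I"):
            current_tokens.append(token)
            current_type = chunk_type
    if current_tokens:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


tokens = ["Alex", "is", "going", "to", "Los", "Angeles"]
print(extract_chunks(tokens, ["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]))
# prints [('PER', 'Alex'), ('LOC', 'Los Angeles')]
```
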
Related tagging schemes sometimes include "START/END: This consists of the tags B, E, I, S or O where S is used to represent a chunk containing a single token. Chunks of length greater than or equal to two always start with the B tag and end with the E tag."[4]
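
The START/END scheme quoted above can be derived from IOB2 tags by looking one token ahead. The following is a minimal Python sketch under stated assumptions: the function name `iob2_to_bioes` is hypothetical, the input is assumed to be well-formed IOB2, and appending the chunk type to the B, E, I and S tags (e.g. `S-PER`) is an illustrative convention rather than something the quoted source specifies.

```python
from typing import List


def iob2_to_bioes(tags: List[str]) -> List[str]:
    """Convert well-formed IOB2 tags to the B/E/I/S/O (START/END) scheme."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, chunk_type = tag[0], tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The chunk continues only if the next tag is I- of the same type.
        continues = next_tag == "I-" + chunk_type
        if prefix == "B":
            # A chunk that starts and ends on the same token gets S.
            converted.append(("B-" if continues else "S-") + chunk_type)
        else:
            # An inside token that closes the chunk gets E.
            converted.append(("I-" if continues else "E-") + chunk_type)
    return converted


print(iob2_to_bioes(["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]))
# prints ['S-PER', 'O', 'O', 'O', 'B-LOC', 'E-LOC']
```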

References

1. "Entity Recognition". http://www.evalita.it/2009/tasks/entity
2. Ramshaw and Marcus (1995). "Text Chunking using Transformation-Based Learning". arXiv:cmp-lg/9505040.
3. Bob Carpenter (2009). "Coding Chunkers as Taggers: IO, BIO, BMEWO, and BMEWO+". https://lingpipe-blog.com/2009/10/14/coding-chunkers-as-taggers-io-bio-bmewo-and-bmewo/
4. http://cs229.stanford.edu/proj2005/KrishnanGanapathy-NamedEntityRecognition.pdf

Category: Computational linguistics
