Speech corpus

For the broader topic, see Corpus linguistics.

A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions.
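As a rough illustration of this pairing of audio and text, the sketch below models a corpus as a list of entries, each linking one audio file to its transcription. The `Utterance` class, the `load_manifest` helper, the tab-separated manifest layout, the file paths, and the `speaker_id` field are all assumptions made for this example; real corpora use a variety of formats.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file paired with its transcription."""
    audio_path: Path   # hypothetical location of the recording
    transcript: str    # text transcription of the recording
    speaker_id: str    # hypothetical speaker label


def load_manifest(lines):
    """Parse lines of the assumed form: audio_path<TAB>speaker_id<TAB>transcript."""
    corpus = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        audio, speaker, text = line.split("\t", 2)
        corpus.append(Utterance(Path(audio), text, speaker))
    return corpus


# Illustrative manifest; the paths and sentences are made up.
manifest = [
    "audio/spk01_0001.wav\tspk01\tone two three four",
    "audio/spk02_0001.wav\tspk02\tthe quick brown fox",
]
corpus = load_manifest(manifest)
```

Keeping the transcription next to the audio path in one record is only one possible design; many corpora instead store transcriptions in separate annotation files aligned to the audio by time stamps.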

In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine).

In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.

A corpus is a single such database; corpora, the plural form, refers to several of them.

There are two types of speech corpora:

  1. Read Speech – which includes:
    • Book excerpts
    • Broadcast news
    • Lists of words
    • Sequences of numbers
  2. Spontaneous Speech – which includes:
    • Dialogs – between two or more people (includes meetings);
    • Narratives – a person telling a story (one such corpus is the Buckeye Corpus);
    • Map-tasks – one person explains a route on a map to another;
    • Appointment-tasks – two people try to find a common meeting time based on individual schedules.
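The classification above could be encoded as metadata when cataloguing recordings. The following is a minimal sketch, assuming hypothetical genre labels (`word_list`, `map_task`, and so on) and recording identifiers that are not part of any standard.

```python
from collections import defaultdict
from enum import Enum


class SpeechType(Enum):
    READ = "read"
    SPONTANEOUS = "spontaneous"


# Sub-genres from the list above, mapped to their speech type.
# The label strings are hypothetical names chosen for this example.
GENRE_TO_TYPE = {
    "book_excerpt": SpeechType.READ,
    "broadcast_news": SpeechType.READ,
    "word_list": SpeechType.READ,
    "number_sequence": SpeechType.READ,
    "dialog": SpeechType.SPONTANEOUS,
    "narrative": SpeechType.SPONTANEOUS,
    "map_task": SpeechType.SPONTANEOUS,
    "appointment_task": SpeechType.SPONTANEOUS,
}


def group_by_type(recordings):
    """Group (recording_id, genre) pairs into read vs. spontaneous speech."""
    groups = defaultdict(list)
    for recording_id, genre in recordings:
        groups[GENRE_TO_TYPE[genre]].append(recording_id)
    return dict(groups)


recordings = [("r1", "word_list"), ("r2", "map_task"), ("r3", "dialog")]
groups = group_by_type(recordings)
```

Here `groups` maps each `SpeechType` to the identifiers of the recordings that fall under it, which makes it easy to select, for example, only spontaneous speech for a study.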

A special kind of speech corpus is the non-native speech database, which contains speech with a foreign accent.

See also

  • Arabic Speech Corpus
  • EXMARaLDA
  • List of children's speech corpora
  • Non-native speech database
  • Praat
  • Spoken English Corpus
  • The BABEL Speech Corpus
  • TIMIT
  • Transcriber
  • Transcription (linguistics)

References

  • Edwards, Jane / Lampert, Martin (eds.) (1992): Talking Data – Transcription and Coding in Discourse Research. Hillsdale: Erlbaum.
  • Leech, Geoffrey / Myers, Greg / Thomas, Jenny (eds.) (1995): Spoken English on Computer: Transcription, Markup and Application. Harlow: Longman.

External links

  • Santa Barbara Corpus of Spoken American English
  • The Buckeye Corpus of Conversational Speech
  • Spoken Language Corpora at the Research Center on Multilingualism
  • The Spoken Turkish Corpus at METU Ankara
  • Spoken Corpus Klient with the Corp-Oral Corpus at ILTEC Lisbon
  • VoxForge – open source speech corpora
  • OLAC: Open Language Archives Community
  • BAS Bavarian Archive for Speech Signals
  • Simmortel Speech Recognition Corpus for Indian English and Hindi
  • ELRA: the European Language Resources Association
  • The PELCRA Conversational Corpus of Polish
  • The Arabic Speech Corpus
