"Web search query": meaning and origin - Open Encyclopedia

A web search query is a query based on a specific search term that a user enters into a web search engine to satisfy an information need. Web search queries are distinctive in that they are often plain text or hypertext with optional search directives (such as "and"/"or", with "-" to exclude). They differ greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.
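The search directives mentioned above can be illustrated with a minimal sketch. It assumes a simplified syntax in which whitespace separates terms, "AND" is implicit, "OR" joins a term to the preceding one, and a leading "-" excludes a term; the function name `parse_query` is hypothetical, and real search engines apply their own, more elaborate rules.

```python
def parse_query(query):
    """Split a plain-text query into OR-groups and excluded terms.

    Returns (groups, excluded): `groups` is a list of lists, where the terms
    inside one inner list are alternatives (OR) and the inner lists are
    combined with AND; `excluded` holds terms prefixed with "-".
    This is an illustrative simplification, not any engine's real grammar.
    """
    groups = []
    excluded = []
    join_next = False  # True right after an "OR" that follows a term
    for token in query.split():
        upper = token.upper()
        if upper == "AND":
            continue  # AND is the default between groups
        if upper == "OR":
            join_next = bool(groups)  # a leading OR has nothing to join
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        elif join_next:
            groups[-1].append(token.lower())
        else:
            groups.append([token.lower()])
        join_next = False
    return groups, excluded


groups, excluded = parse_query("cats OR dogs food -dry")
print(groups, excluded)
```

The nested-list result keeps the structure explicit: each inner list is a set of alternatives, and every inner list must be satisfied.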

Types

There are three broad categories that cover most web search queries: informational, navigational, and transactional.^[1] These are also called "do, know, go."^[2] Although this model of searching was not theoretically derived, the classification has been empirically validated with actual search engine queries.^[3]

Search engines often support a fourth type of query that is used far less frequently:

Characteristics

Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by.^[5] Nevertheless, research studies started to appear in 1998.^[6]^[7] Later, a 2001 study^[8] that analyzed queries from the Excite search engine showed some interesting characteristics of web search:

A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, or geographic features).^[9] Studies also show that, in addition to short queries (i.e., queries with few terms), there are predictable patterns in how users change their queries.^[10]

A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result.^[11] This suggests that many users use repeat queries to revisit or re-find information. This analysis is supported by a Bing search engine blog post reporting that about 30% of queries are navigational.^[12]
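As a rough illustration of how a repeat-query share could be measured, the sketch below counts a query as a repeat when the same user has already issued the same normalized text earlier in the log. This definition, the log layout of `(user_id, query)` pairs, and the function name `repeat_query_share` are assumptions made for the example; the cited study may define repeats differently.

```python
from collections import defaultdict


def repeat_query_share(log):
    """Return the fraction of log entries that repeat an earlier query
    by the same user.

    `log` is an iterable of (user_id, query) pairs in chronological order.
    Queries are compared after lowercasing and collapsing whitespace.
    """
    seen = defaultdict(set)  # user_id -> normalized queries already issued
    repeats = 0
    total = 0
    for user_id, query in log:
        key = " ".join(query.lower().split())
        total += 1
        if key in seen[user_id]:
            repeats += 1
        else:
            seen[user_id].add(key)
    return repeats / total if total else 0.0


sample_log = [
    ("u1", "bbc news"),
    ("u1", "BBC  news"),   # repeat for u1 after normalization
    ("u2", "bbc news"),    # first time for u2, not a repeat
    ("u1", "weather"),
    ("u2", "bbc news"),    # repeat for u2
    ("u1", "bbc news"),    # repeat for u1
]
print(repeat_query_share(sample_log))
```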

In addition, much research has shown that query term frequency distributions conform to the power law, or long tail distribution curves. That is, a small portion of the terms observed in a large query log (e.g. more than 100 million queries) are used most often, while the remaining terms are used less often individually.^[13] This example of the Pareto principle (or 80–20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching. Studies have also been conducted on discovering linguistically oriented attributes that can recognize whether a web query is navigational, informational or transactional.^[14]
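The long-tail effect can be checked on any query log with a short computation. The sketch below, under the assumption that terms are whitespace-separated and case-insensitive, measures what share of all term occurrences is covered by the most frequent fraction of distinct terms; the function name `head_coverage` and the tiny sample log are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def head_coverage(queries, head_fraction=0.2):
    """Return the share of all term occurrences accounted for by the most
    frequent `head_fraction` of distinct terms (at least one term)."""
    counts = Counter(term for q in queries for term in q.lower().split())
    ranked = [count for _, count in counts.most_common()]
    if not ranked:
        return 0.0
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


sample_queries = [
    "weather", "weather today", "weather paris", "news", "news today",
    "cheap flights paris", "weather", "python tutorial", "weather radar",
    "news",
]
print(head_coverage(sample_queries))
```

A high value for a small `head_fraction` indicates that caching results for the most frequent terms would serve a large part of the traffic, which is the kind of optimization the paragraph above describes.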

However, a 2011 study found that the average length of queries has grown steadily over time and that the average length of non-English queries has increased more than that of English queries.^[15] Google implemented the Hummingbird update in August 2013 to handle longer search queries, since more searches are conversational (e.g. "where is the nearest coffee shop?").^[16]

For longer queries, natural language processing helps, since parse trees of queries can be matched with those of answers and their snippets.^[17] For multi-sentence queries, where keyword statistics and tf–idf are not very helpful, the parse thicket technique comes into play to structurally represent complex questions and answers.^[18]

Structured queries

With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral) is likely to find documents about electronic voting even if they omit one of the words "electronic" and "voting", or even both.^[19]
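The faceted query above can be sketched in a few lines. In this example each facet is a list of alternative words; the helper names `build_faceted_query` and `matches` are hypothetical, and the matching is a simple whole-word test against a document's text rather than a real search engine's retrieval model.

```python
import re


def build_faceted_query(facets):
    """Render facets as a conjunction of disjunctions,
    e.g. (a OR b) AND (c OR d)."""
    return " AND ".join("(" + " OR ".join(facet) + ")" for facet in facets)


def matches(document, facets):
    """Return True if the document contains at least one word from
    every facet (case-insensitive, whole-word comparison)."""
    words = set(re.findall(r"[a-z0-9]+", document.lower()))
    return all(any(term.lower() in words for term in facet) for facet in facets)


facets = [
    ["electronic", "computerized", "DRE"],
    ["voting", "elections", "election", "balloting", "electoral"],
]
print(build_faceted_query(facets))
print(matches("Computerized balloting machines were audited.", facets))
print(matches("Electronic music festival lineup.", facets))
```

The first sample document is accepted even though it contains neither "electronic" nor "voting", while the second is rejected because it satisfies only one facet.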

References

1. ^Broder, A. (2002). A taxonomy of Web search. SIGIR Forum, 36(2), 3–10.
2. ^Gibbons, K. (2013-01-11). Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle. Search Engine Watch. http://searchenginewatch.com/article/2235624/Do-Know-Go-How-to-Create-Content-at-Each-Stage-of-the-Buying-Cycle (accessed 24 May 2014).
3. ^Jansen, B. J., Booth, D., and Spink, A. (2008). Determining the informational, navigational, and transactional intent of Web queries. Information Processing & Management, 44(3), 1251–1266. https://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_user_intent.pdf
4. ^Moore, R. Connectivity servers. Cambridge University Press. http://nlp.stanford.edu/IR-book/html/htmledition/connectivity-servers-1.html (accessed 24 May 2014).
5. ^Kawamoto, D., and Mills, E. (2006). AOL apologizes for release of user search data.
6. ^Jansen, B. J., Spink, A., Bateman, J., and Saracevic, T. (1998). Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1), 5–17. https://faculty.ist.psu.edu/jjansen/academic/jansen_sigir_forum.pdf
7. ^Silverstein, C., Henzinger, M., Marais, H., and Moricz, M. (1999). Analysis of a very large Web search engine query log. SIGIR Forum, 33(1), 6–12.
8. ^Spink, A., Wolfram, D., Jansen, B. J., and Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226–234. doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I. CiteSeerX 10.1.1.23.9800. https://faculty.ist.psu.edu/jjansen/academic/jansen_public_queries.pdf
9. ^Sanderson, M., and Kohler, J. (2004). Analyzing geographic queries. Proceedings of the Workshop on Geographic Information (SIGIR '04). http://supremacyseo.com/analyzing-geographic-queries
10. ^Jansen, B. J., Booth, D. L., and Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology. 60(3), 557-570. 60(7), 1358-1371.
11. ^Teevan, J., Adar, E., Jones, R., and Potts, M. (2005). History repeats itself: Repeat Queries in Yahoo's query logs. Proceedings of the 29th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '06), 703–704. doi:10.1145/1148170.1148326. http://www.csail.mit.edu/~teevan/work/publications/posters/sigir06.pdf
12. ^http://www.bing.com/community/site_blogs/b/search/archive/2011/02/10/making-search-yours.aspx
13. ^Baeza-Yates, R. (2005). Applications of Web Query Mining. In Advances in Information Retrieval. Lecture Notes in Computer Science, 3408, 7–22. Springer Berlin / Heidelberg. ISBN 978-3-540-25295-5. doi:10.1007/978-3-540-31865-1_2. http://www.springerlink.com/content/kpphaktugag5mbv0/
14. ^Figueroa, A. (2015). Exploring effective features for recognizing the user intent behind web queries. Computers in Industry, 68, 162–169. Elsevier. https://www.researchgate.net/publication/271911317
15. ^Taghavi, M., Patel, A., Schmidt, N., Wills, C., and Tew, Y. (2011). An analysis of web proxy logs with query distribution pattern approach for search engines. Computer Standards & Interfaces, 34(1), 162–170. doi:10.1016/j.csi.2011.07.001
16. ^Sullivan, D. (2013-09-26). FAQ: All About The New Google "Hummingbird" Algorithm. Search Engine Land. http://searchengineland.com/google-hummingbird-172816 (accessed 24 May 2014).
17. ^Galitsky, B. (2013). Machine learning of syntactic parse trees for search and classification of text. Engineering Applications of Artificial Intelligence, 26(3), 153–172. doi:10.1016/j.engappai.2012.09.017
18. ^Galitsky, B., Ilvovsky, D., Kuznetsov, S. O., and Strok, F. (2013). Finding Maximal Common Sub-parse Thickets for Multi-sentence Search. Lecture Notes in Artificial Intelligence, 8323. http://www.aclweb.org/anthology/R13-1037
19. ^Mihajlović, V., Hiemstra, D., Blok, H. E., and Apers, P. M. G. (October 2006). Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness. http://eprints.eemcs.utwente.nl/6918/01/TR-CTIT-06-57.pdf