Entry: OutWit Hub
Definition

  1. Versions

  2. Features

  3. Advanced features

  4. See also

  5. Similar Tools

  6. References

  7. External links

{{Infobox software
| name = OutWit Hub
| developer = OutWit Technologies
| operating system = Microsoft Windows, macOS, Linux
| genre = Web scraping, download manager
| license = Proprietary
| website = {{URL|outwit.com}}
}}

OutWit Hub is a web scraping application designed to automatically extract information from online or local resources. It recognizes and grabs links, images, documents, contacts, recurring vocabulary and phrases, and RSS feeds, and it converts structured and unstructured data into formatted tables that can be exported to spreadsheets or databases. The first version was released in 2010. Version 7.0 was released in March 2018.

The program includes a Mozilla-based browser and a sidebar that gives access to a number of views with preset extractors. Web pages and text documents are broken down into their constituents, which are presented as tables in these views. The application can navigate through series of links and sequences of search engine results pages to extract information elements, organize them in tables, and export them to various formats. The predefined extractors collect structured tables, lists, or feeds. Custom scrapers can also be created to extract data from less structured page elements.[1] Regular expressions can be included in scrapers, as well as in other parts of the application, to define variable recognition markers.[2]
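The idea of using a regular expression as a variable recognition marker and turning the matches into an exportable table can be sketched in Python as follows. This is a minimal illustration, not OutWit Hub's own scraper format or implementation: the sample markup, the field names, and the helper functions `scrape_rows` and `rows_to_csv` are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical page source; the markup and field names are invented for illustration.
PAGE_SOURCE = """
<div class="item"><b>Name:</b> Alice Martin <i>Email:</i> alice@example.org</div>
<div class="item"><b>Name:</b> Bob Chen <i>Email:</i> bob@example.org</div>
"""

# One regular expression acts as the variable "recognition marker" for a record.
# Each named group becomes a column of the resulting table.
RECORD_PATTERN = re.compile(
    r"<b>Name:</b>\s*(?P<name>[^<]+?)\s*<i>Email:</i>\s*(?P<email>[^<\s]+)"
)


def scrape_rows(source: str) -> list[dict[str, str]]:
    """Return one dict per record matched in the raw page source."""
    return [match.groupdict() for match in RECORD_PATTERN.finditer(source)]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "email"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_rows(PAGE_SOURCE)))
```

The pattern works on the raw text of the page, so it also matches markup that is badly nested or otherwise hard to address as a tree.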

Although OutWit Hub is presented as a tool for non-technical users, the application does not use the DOM structure for its extractions. This prevents visual "point & grab" data scraping and forces users who want to create custom scrapers to define markers in the source code of the page. The advantage of this approach, however, is that it allows extraction masks to be defined more precisely than with HTML nodes and runs faster, as the DOM tree does not need to be rendered by the browser at extraction time.
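Marker-based extraction on raw source text, as opposed to DOM-node selection, might look like the following Python sketch. It illustrates the general technique only and makes no claim about how OutWit Hub implements it: the function `extract_between`, its "before" and "after" parameters, and the sample markup are assumptions for the example.

```python
def extract_between(source: str, before: str, after: str) -> list[str]:
    """Collect every substring that sits between `before` and `after`.

    The page is treated as plain text; no DOM tree is built.
    """
    values = []
    position = 0
    while True:
        start = source.find(before, position)
        if start == -1:
            break
        start += len(before)
        end = source.find(after, start)
        if end == -1:
            break
        values.append(source[start:end].strip())
        position = end + len(after)
    return values


# Hypothetical source fragment, invented for illustration.
SAMPLE_SOURCE = (
    '<td class="price">EUR 12.50</td>\n'
    '<td class="price">EUR 8.00</td>\n'
)

if __name__ == "__main__":
    # Markers aligned with the whole table cell.
    print(extract_between(SAMPLE_SOURCE, '<td class="price">', "</td>"))
    # A narrower marker that skips the currency code inside the cell.
    print(extract_between(SAMPLE_SOURCE, 'class="price">EUR ', "</td>"))
```

The second call shows why markers can be more precise than nodes: the "before" marker ends in the middle of the cell's text, which a node selector alone could not express.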

Versions

The program exists in two versions, a standalone application and a Mozilla Firefox add-on, which offer identical features. A limited free version can be downloaded from the publisher's site and from shareware download websites.[3]

Features

  • Recognition and extraction of links, email addresses, structured & non-structured data, RSS news
  • Extraction & download of images and documents
  • Extraction of text, with dictionary of words & groups of words by frequency
  • Automated browsing with user-defined Web exploration rules
  • Automatic query and URL generation by patterns
  • Directories of links & queries
  • Custom scrapers
  • Macro automation
  • Periodical job execution
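As an illustration of the "URL generation by patterns" item in the list above, the following Python sketch expands a numeric range embedded in a URL into a series of URLs. The "[start:end]" syntax and the function `expand_url_pattern` are hypothetical, chosen for this example; OutWit Hub's own pattern syntax may differ.

```python
import re

# Hypothetical range syntax "[start:end]"; not taken from OutWit Hub.
RANGE_PATTERN = re.compile(r"\[(\d+):(\d+)\]")


def expand_url_pattern(pattern: str) -> list[str]:
    """Expand the first "[start:end]" range in `pattern`, recursing for any others."""
    match = RANGE_PATTERN.search(pattern)
    if match is None:
        # No range left: the pattern is already a concrete URL.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    urls = []
    for number in range(start, end + 1):
        candidate = pattern[:match.start()] + str(number) + pattern[match.end():]
        urls.extend(expand_url_pattern(candidate))
    return urls


if __name__ == "__main__":
    for url in expand_url_pattern("https://example.com/list?page=[1:3]"):
        print(url)
```

A pattern with several ranges yields every combination, so a crawl over categories and result pages can be described in a single line.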

Advanced features

An Enterprise edition of the application includes advanced extraction and automation features for specific or large-volume extractions, such as sending series of automatically generated HTTP queries (including POST requests) and uploading scraped data to FTP servers.

See also

  • Data-driven journalism
  • Web scraping

Similar Tools

  • Yahoo Pipes
  • Automation Anywhere - Web extractor and automation system
  • Octatools.com (https://octatools.com/)

References

1. Using "separators and labels" in Outwit Hub pro. datacrumble, May 2013. https://datacrumble.wordpress.com/2013/05/04/using-separators-and-labels-in-outwit-hub-pro/
2. How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper. online journalism, Nov 2012. http://onlinejournalismblog.com/2012/11/06/how-to-scraping-uglier-html-using-regular-expressions-in-an-outwit-hub-scraper/
3. How to use OutWit Hub to scrape data for free. interhacktives, Mar 2014. http://www.interhacktives.com/2014/03/04/use-outwit-hub-scrape-data-free/

External links

  • http://www.outwit.com/

Categories: Data processing | Web crawlers | Firefox add-ons

