Project Description
WebExtractor360 is a free, open-source web data extractor. It uses Regular Expressions to find and extract data from web pages quickly and easily. It is flexible enough to handle both simple, commonly used data and more complex structures such as HTML tables.




The web extractor starts by crawling the specified URL or a local file resource. All data that matches the Match (Regular Expression) field is returned as a result. Once matching is complete for the specified URL, the crawler goes on to process the other URLs that the specified URL links to. This is shown in the diagram below. The entire process is repeated until the Maximum URL limit has been reached or there are no more URLs to process.
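The crawl-and-match loop described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not WebExtractor360's actual code: the names `crawl`, `fetch`, `max_urls` and the in-memory `PAGES` dictionary are hypothetical, and the link finder is a deliberately simplistic regular expression.

```python
import re
from collections import deque

def crawl(start_url, match_pattern, fetch, max_urls=10):
    """Collect regex matches from start_url, then follow its links.

    fetch is any callable that returns the page text for a URL,
    or None if the page cannot be loaded (hypothetical interface).
    """
    match_re = re.compile(match_pattern)
    link_re = re.compile(r'href="([^"]+)"')  # simplistic link finder (assumption)
    queue = deque([start_url])
    seen = {start_url}
    results = []
    processed = 0
    # Stop when the URL limit is reached or no URLs are left to process.
    while queue and processed < max_urls:
        url = queue.popleft()
        text = fetch(url)
        processed += 1
        if text is None:
            continue
        # Everything that matches the pattern is returned as a result.
        results.extend((url, m.group(0)) for m in match_re.finditer(text))
        # Queue the pages this page links to, skipping ones already seen.
        for link in link_re.findall(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results

# Made-up pages standing in for the web, so the sketch runs offline.
PAGES = {
    "a.html": 'Contact: alice@example.com <a href="b.html">next</a>',
    "b.html": 'Contact: bob@example.com <a href="a.html">back</a> '
              '<a href="c.html">more</a>',
    "c.html": "No address here.",
}

results = crawl("a.html", r"[\w.]+@[\w.]+\.\w+", PAGES.get)
for url, value in results:
    print(url, value)
```

Passing `max_urls=1` would stop the sketch after the first page, which mirrors the role of the URL limit in the description above.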



Regular Expressions
WebExtractor360 extracts information from the web using Regular Expressions. A regular expression is a text string that describes a search pattern; it can be thought of as a special kind of wildcard. WebExtractor360 provides many commonly used Regular Expressions for extracting data from the web.
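As a rough illustration of how a regular expression acts as a search pattern, the Python sketch below pulls the cells out of a small HTML table. The patterns and the sample markup are assumptions made for this example; they are not the expressions that ship with WebExtractor360, and they only cope with plain `<tr>` and `<td>` tags without attributes.

```python
import re

# Made-up sample markup for the example.
html = (
    "<table>"
    "<tr><td>Apple</td><td>1.20</td></tr>"
    "<tr><td>Pear</td><td>0.85</td></tr>"
    "</table>"
)

# Non-greedy groups capture the text between each pair of tags.
row_re = re.compile(r"<tr>(.*?)</tr>", re.S)
cell_re = re.compile(r"<td>(.*?)</td>", re.S)

# One list of cell values per table row.
rows = [cell_re.findall(row) for row in row_re.findall(html)]

# A simpler pattern: any number with exactly two decimal places.
prices = re.findall(r"\d+\.\d{2}", html)

print(rows)
print(prices)
```

Regular expressions work well for regular, predictable markup like this; for deeply nested or irregular HTML a dedicated parser is usually the more robust choice.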

Web Extractor Website

Last edited Mar 25, 2009 at 1:57 PM by webextractor360, version 2