Today the World Wide Web is flooded with billions of static and dynamic web pages created with markup and scripting languages such as HTML, PHP and ASP. The Web is a great source of information and offers a lush playground for data mining. Since the data stored on the web comes in various formats and is dynamic in nature, it is a significant challenge to search, process and present the unstructured information available on the web.

The complexity of a web page far exceeds that of any conventional text document. Web pages on the internet lack uniformity and standardization, whereas traditional books and text documents are much more consistent. Further, search engines, with their limited capacity, cannot index all web pages, which makes data mining extremely inefficient.

Moreover, the Internet is a highly dynamic knowledge resource that grows at a rapid pace. Sports, news, finance and corporate sites update their content on an hourly or daily basis. Today the Web reaches millions of users with different profiles, interests and usage purposes. Every one of them needs good information but does not know how to retrieve relevant data efficiently and with the least effort.

It is important to note that only a small section of the web holds truly useful information. There are three usual methods that a user adopts when accessing information stored on the internet:

Random surfing, i.e. following large numbers of hyperlinks available on web pages.
Query-based search on search engines, i.e. using Google or Yahoo to find relevant documents by entering specific keyword queries of interest in the search box.
Deep query searches, i.e. querying searchable databases such as eBay.com's product search engine or Business.com's service directory.

To use the web as an effective resource for knowledge discovery, researchers have developed efficient data mining techniques to extract relevant data easily, smoothly and cost-effectively.