Text Mining Applied to Online News

Mauricio Onada, Valeria M. Bastos, Cristian K. Santos, Marcello P. A. Fonseca, Victor S. Bursztyn, Alexandre G. Evsukoff, Nelson F. F. Ebecken


This paper describes the development and implementation of a practical and efficient methodology for building a knowledge extraction environment that retrieves information from Portuguese-language Web sites. The application offers several text mining functions, such as identifying similarities and differences between pages and sites, content classification, and document clustering, which are applicable to competitive intelligence tools. The application was conceived after an evaluation of existing environments for exploring textual information, which still tend to offer tools that address only part of the problem. Given the increasing availability of information on the Web, we propose an environment that integrates these solutions and provides analysis of the results as directed by the user.

