ISSN ONLINE (2320-9801) PRINT (2320-9798)
Unstructured Data into Intelligent Information Analysis and Evaluation
Unstructured data makes up about 70% of the data collected or stored in large organizations, and it is difficult to access, use, or retrieve. This work addresses that problem by converting unstructured data into actionable form. Structured data has clear business and IT value. Time and effort are wasted reaching information buried in collected data, and searching for it is costly. Managing unstructured data is therefore essential. In this research, the aim is to retrieve structured information from unstructured data by applying feature extraction, analyzing the data syntactically, and organizing the analyzed data into entities, rules, associations, and facts. The data is then represented in structured form, either as XML or as data tables; XML is well suited to data storage and data exchange. A data transformation utility was developed using Microsoft Visual Studio 2005. It transforms the textual data in documents into a text file whose contents can be imported into a database, so the utility accomplishes the transformation of unstructured data. Feature extraction categorizes the data into entities and events and builds the relations among them. Because extracting, mining, and structuring data is complex, the research is restricted to textual data, in the form of either documents or web pages. The structured information can be used in decision support systems or serve whatever purpose the process intends. We aim to develop a simple approach for extracting key information from unstructured data scattered across websites, databases, emails, and similar sources, with the goal of an effective, improved information retrieval system. As an application of the approach, we are developing a news retrieval system that incorporates the features discussed in this paper.
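The paper does not publish its extraction rules, and its utility was built with Microsoft Visual Studio 2005. The following is therefore only a minimal Python sketch of the extract-then-structure idea, under stated assumptions. It treats runs of capitalized words as candidate entities and ISO-style dates as candidate event dates. Entities mentioned in the same sentence are treated as related. The result is written both as XML and as rows of an in-memory SQLite table. The function names, the `features` table, and all three heuristics are hypothetical and are not the authors' method.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Naive heuristics (assumptions, not the paper's extractor):
# runs of capitalized words are candidate entities,
ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")
# and ISO-style dates are candidate event dates.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_features(text):
    """Return candidate entities, dates and co-occurrence relations."""
    entities = sorted(set(ENTITY_RE.findall(text)))
    dates = DATE_RE.findall(text)
    relations = set()
    # Entities mentioned in the same sentence are linked pairwise.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = sorted(set(ENTITY_RE.findall(sentence)))
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                relations.add((a, b))
    return {"entities": entities, "dates": dates, "relations": sorted(relations)}


def to_xml(doc_id, features):
    """Serialize the extracted features as an XML string."""
    root = ET.Element("document", id=doc_id)
    for name in features["entities"]:
        ET.SubElement(root, "entity").text = name
    for date in features["dates"]:
        ET.SubElement(root, "date").text = date
    for a, b in features["relations"]:
        ET.SubElement(root, "relation", source=a, target=b)
    return ET.tostring(root, encoding="unicode")


def to_table(conn, doc_id, features):
    """Store the same entities and dates as rows of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features (doc_id TEXT, kind TEXT, value TEXT)"
    )
    rows = [(doc_id, "entity", e) for e in features["entities"]]
    rows += [(doc_id, "date", d) for d in features["dates"]]
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    text = "Microsoft released Visual Studio on 2005-11-07 in Redmond."
    feats = extract_features(text)
    print(to_xml("doc1", feats))
    conn = sqlite3.connect(":memory:")
    to_table(conn, "doc1", feats)
    print(conn.execute("SELECT kind, value FROM features").fetchall())
```

Both outputs carry the same facts. The XML form suits exchange between systems, and the table form suits SQL queries, which mirrors the two representations the text mentions.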
In this paper, an application called “Intelligent news retrieval system” is proposed as a model. It collects news items, whether the same story or different ones, from various web pages such as blogs and news websites. It then ranks them by popularity or page rank and displays them on a single web page. Regular expressions are used to recognize the required patterns in the data, such as anything inside header and title tags. To carry out this procedure, the web pages are first converted into plain text. The plain text is then analyzed for entities, facts, relationships, synonyms, themes (thematic analysis), and verb phrases. A data dictionary is used to recognize English words. The extracted data is stored in a database in the form of tables or XML. Database models can be constructed from this information by applying inference rules, yielding actionable intelligence. The structured information can then be used for its intended purposes. The goal of the proposed model is a simple, effective, filtered online news website. It highlights news according to user priorities, the number of hits on the source websites, explicit and implicit ratings, and user likes.
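A minimal Python sketch of this news pipeline follows, under stated assumptions. Regular expressions pull the text of `<title>` and `<h1>` to `<h6>` elements, and pages are reduced to plain text. Tokens are then checked against a small data dictionary, and items are ranked by a weighted sum of hits, rating, and likes. The names (`headlines`, `dictionary_words`, `NewsItem`, `WEIGHTS`, `rank`), the 0 to 5 rating scale, and the weights are all hypothetical. The paper names these signals but does not say how it combines them.

```python
import html
import re
from dataclasses import dataclass

# "Anything inside header and title tags": <title> and <h1>..<h6>.
HEADLINE_RE = re.compile(
    r"<(title|h[1-6])\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL
)
SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def to_plain_text(page):
    """Drop scripts/styles and tags, unescape entities, collapse whitespace."""
    text = SCRIPT_RE.sub(" ", page)
    text = TAG_RE.sub(" ", text)
    return " ".join(html.unescape(text).split())


def headlines(page):
    """Return the plain text of every title and header element, in order."""
    return [to_plain_text(body) for _, body in HEADLINE_RE.findall(page)]


def dictionary_words(text, dictionary):
    """Keep lowercase tokens found in a (hypothetical) data dictionary."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in dictionary]


@dataclass
class NewsItem:
    headline: str
    source: str
    hits: int        # hits on the source website
    rating: float    # explicit user rating, assumed 0..5
    likes: int


# Hypothetical weights; the paper lists the signals but not their mix.
WEIGHTS = {"hits": 0.001, "rating": 1.0, "likes": 0.01}


def score(item):
    """Weighted popularity score for one news item."""
    return (WEIGHTS["hits"] * item.hits
            + WEIGHTS["rating"] * item.rating
            + WEIGHTS["likes"] * item.likes)


def rank(items):
    """Order news items from most to least popular."""
    return sorted(items, key=score, reverse=True)
```

Regular expressions are used here because the text describes that approach. On real-world HTML, a proper parser such as Python's `html.parser` module is considerably more robust against nested or malformed markup.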
Dr. S. Chitra, M.E., Ph.D.; Mrs. N. Shunmuga Karpagam, M.E.; Mr. K. Venkataramanan