How Text Analytics Works

Series 2: Text mining – how it works


This week continues our second series on Text Analytics (TA). As previously stated, TA can be an invaluable way to interpret text data, yielding rich customer information that might otherwise go undetected when only numerical (structured) data is analyzed. Several vendors in the industry offer text-mining tools that can extract this “undiscovered” data, which is difficult to access by other means. The forms of data that can be assessed include emails, insurance claims, news feeds, and other data repositories. This post outlines how industry experts break text analytics down into several categories. For an accurate overview, text analytics can be segmented into three phases, which together make up its overall contribution to Business Intelligence and the business solutions it can offer an organization. Figure 1.1 illustrates this approach:


Figure 1.1


The three phases, from gathering to applications, can be further segmented into the following steps. Each step is labeled with the phase it belongs to.

1) Information gathering (phase 1) – refers to acquiring information from text sources such as the web, databases, or content management systems for analytical purposes. Note that this is a preparatory step that makes use of a corpus (a large and structured set of texts) to identify the textual materials for analysis. After the information is obtained, it passes through a data cleansing process that removes any redundant or meaningless data, and then through integration, where the retained data is fitted together appropriately.

2) Root (core) text analytics (phase 2) – involves natural language processing (NLP), in other words, a computer extracting meaningful information from natural language input and, in some cases, producing natural language output. This can include part-of-speech tagging, which marks each word in a text with its corresponding part of speech based on its definition and context. Core analytics also includes syntactic parsing, which breaks text into tokens (words) and fits them into a formal grammatical structure. Each token is then interpreted according to its role in that grammar. Core analytics also features information extraction, sentiment analysis, and other semantic applications.

3) Search-based applications (SBAs) (phase 3) – use semantic analysis to search and return the most relevant, well-organized results. This software uses a range of proprietary algorithms that search for specific web content, categorize the data, and offer business solutions. SBAs are a form of semantic technology, which can be defined as software that encodes meanings separately from data and content files. Working together with natural language technologies, SBAs classify unstructured data across multiple repositories.

4) Enterprise information management (EIM) (phase 3) – specializes in optimizing specific uses of data within an organization’s information technology area. This may involve a firm’s business decision support processes that require access to client information. These systems allow an organization to build and maintain tags drawn from a vocabulary of labels, which standardizes the description of digital content. The key here is precision and consistency: the better the results in these two areas, the more an organization can leverage the tags it uses.

5) Text analytics within enterprise applications (phase 3) – supports business functions such as customer relationship management (CRM), that is, software that attracts, retains, and aggregates information on clients, as well as market research, competitive intelligence, and enterprise feedback management surveys. The benefit of using text analytics within enterprise applications is that customer intelligence can be better assessed through a more comprehensive analysis of professional surveys and greater insight into collected market data.
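
To make steps 1 and 2 more concrete, here is a minimal Python sketch of data cleansing, tokenization, and a very simple sentiment score. It is an illustration under stated assumptions, not how any commercial text-mining tool works: the sample documents, the tiny word lists, and the function names (cleanse, tokenize, sentiment) are all made up for this example, and real tools use far richer language models than a word count.

```python
import re
from collections import Counter

# Hypothetical raw documents standing in for emails, claims, or news feeds.
RAW_DOCS = [
    "The claims process was SLOW and the agent was rude.",
    "the claims process was slow and the agent was rude.",  # duplicate entry
    "   ",                                                   # meaningless entry
    "Great service, the agent was helpful and fast!",
]

# Tiny illustrative sentiment word lists (an assumption, not a real lexicon).
POSITIVE = {"great", "helpful", "fast"}
NEGATIVE = {"slow", "rude"}


def cleanse(docs):
    """Phase 1: drop empty entries and case-insensitive duplicates."""
    seen, kept = set(), []
    for doc in docs:
        key = " ".join(doc.lower().split())  # normalize case and whitespace
        if key and key not in seen:
            seen.add(key)
            kept.append(doc.strip())
    return kept


def tokenize(text):
    """Phase 2: break text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(tokens):
    """Phase 2: positive word hits minus negative word hits."""
    counts = Counter(tokens)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return pos - neg


corpus = cleanse(RAW_DOCS)
scores = [sentiment(tokenize(doc)) for doc in corpus]

for doc, score in zip(corpus, scores):
    print(score, doc)
```

In this sample, cleansing keeps two of the four entries, and the complaint scores below zero while the compliment scores above it. Part-of-speech tagging and syntactic parsing are left out on purpose, since they need a trained language model rather than a few lines of code.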
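
The search idea in step 3 can be sketched in the same spirit. The example below ranks a few documents against a query using plain keyword weighting (TF-IDF), which is a much simpler stand-in for the proprietary semantic algorithms that search-based applications actually use. The document names, their contents, and the helper functions (build_index, search) are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical repository mixing claims, emails, and news (illustrative only).
DOCS = {
    "claim-1": "water damage claim for kitchen flooding",
    "email-7": "customer asks about claim status and payment",
    "news-3": "storm causes flooding across the region",
}


def tokenize(text):
    """Break text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return per-document term counts and inverse document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    # Terms that appear in fewer documents receive a higher weight.
    idf = {term: math.log(len(docs) / df) + 1.0 for term, df in doc_freq.items()}
    return term_counts, idf


def search(query, term_counts, idf):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    query_terms = tokenize(query)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = sum(counts[term] * idf.get(term, 0.0) for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


term_counts, idf = build_index(DOCS)
results = search("flooding claim", term_counts, idf)
print(results)
```

Here the document that mentions both query words ranks first, ahead of the two that mention only one. A real SBA would go further and match on meaning, for example treating "flood" and "water damage" as related, which keyword weighting alone cannot do.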