Entry | Data classification (business intelligence) |
Definition |
In business intelligence, data classification is closely related to data clustering, but where data clustering is descriptive, data classification is predictive.[1][2] In essence, data classification uses variables with known values to predict the unknown or future values of other variables. It can be used in, for example, direct marketing, insurance fraud detection, or medical diagnosis.[1]

The first step in data classification is to cluster the data set used for category training, in order to create the desired number of categories. An algorithm, called the classifier, is then applied to the categories, creating a descriptive model for each. These models can then be used to categorize new items in the resulting classification system.[2]

Effectiveness
According to Golfarelli and Rizzi, the effectiveness of a classifier can be assessed with several measures.[2]
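The workflow described in the opening section (cluster the training set into categories, build a descriptive model per category, then categorize new items) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the cited sources: the clustering step is a tiny k-means, the "descriptive model" of each category is simply its centroid, and the customer data (age, monthly spend) and the function names `cluster`, `fit_classifier`, and `classify` are hypothetical.

```python
from math import dist
from statistics import mean


def cluster(points, k, iterations=20):
    """Tiny k-means (an assumed technique): return one cluster label per point."""
    # Deterministic start: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return labels


def fit_classifier(points, labels):
    """Build one descriptive model per category; here the model is a centroid."""
    models = {}
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        models[label] = tuple(mean(axis) for axis in zip(*members))
    return models


def classify(models, item):
    """Categorize a new item by the nearest category model."""
    return min(models, key=lambda label: dist(item, models[label]))


# Hypothetical training data: (age, monthly spend) for six customers.
training = [(22, 15), (25, 18), (24, 20), (58, 80), (61, 75), (63, 85)]

labels = cluster(training, k=2)            # step 1: create the categories
models = fit_classifier(training, labels)  # step 2: one model per category

# Step 3: categorize new, previously unseen customers.
for new_customer in [(27, 22), (55, 70)]:
    print(new_customer, "-> category", classify(models, new_customer))
```

In practice the clustering algorithm, the classifier, and any feature scaling would be chosen to suit the data; the sketch only shows how the three steps fit together.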
Typical inputs for data classification include variables such as demographics, lifestyle information, or economic behaviour.

Challenges
There are several challenges in working with data classification. One in particular is that anyone who applies categories to, for example, customers or clients must do the modeling as an iterative process. This ensures that changes in the characteristics of customer groups do not go unnoticed and leave the existing categories outdated. This can be especially important for insurance or banking companies, where fraud detection is highly relevant. New fraud patterns may go undetected unless methods are developed and implemented to monitor these changes and raise an alert when categories change, disappear, or newly emerge.

References
1. Kimball, R. et al. (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.). Wiley. ISBN 0-471-25547-5.
2. Golfarelli, M. & Rizzi, S. (2009). Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill Osborne. ISBN 0-07-161039-1. |