Sunday, May 25, 2008

Where Da Cash At

In the business world, the applications are more grown-up and mission-critical. One example is churn prediction - i.e. finding out if a (say, wireless) customer would stay loyal with the current provider, or move on ("churn") to competitor (in which case, the current provider could try to entice him/her to stay with appropriate promos). The customer data used for such churn prediction applications contains categorical (e.g. gender, education, occupation) and numerical (e.g. age, salary, fico score, distance of residence from a metro) attributes/columns in a table. The data in these numeric columns will be widely dispersed, across different scales. For e.g. values within salary can be from 10s of thousands to several millions. Two numerical attributes will be in different scales - example salary (30K - 2 mil) vs age (1-100). Such disparity in scales, if left untreated, can throw most mining algorithms out of whack - the attributes with higher range of values will start outweighing those in the lower range during the computation of prediction. For such algorithms, the numerical data is normalized to a smaller range [-1, 1] or [0, 1] using the z-transform, to enable uniform handling of numerical data by the algorithm.

Sphere: Related Content