Business data exist in many different forms, including time series and longitudinal data, and may contain various types of variables, such as time to event, occurrences, categorical and continuous measurements. Statistical data analysis includes data management, modeling, segmentation and application integration, and seeks to uncover useful information; forecast business processes at various horizons; and detect hidden structural changes.
Recent application examples include:
Revenue forecasting
Demand forecasting
Network traffic forecasting
Expense tracking
Customer profitability analysis
Targeting analytics
Capacity planning
System monitoring
Warranty data analysis
The challenge of business analytics is that the parameters of a model often must be estimated without comprehensive information. For instance, IBM Research may be asked to develop a control center (or dashboard) that tracks a portfolio of large IT services customers and identifies those that might terminate their contract, or renegotiate it at a lower price, either of which would lead to a significant loss of revenue for the company. In this case, as shown in Figure 1, the inputs to the model are numerous variables that describe the following five risk factors: 1) financial health of client companies; 2) previous relationships with the service provider; 3) price and competitiveness of the offered service; 4) significant events in the client company that could affect the decision to cancel the service (e.g. change of CEO, merger or restructuring); and 5) previous history of contract terminations or renegotiations. The output variable is the likelihood that a customer will terminate its contract (or a part of it).
Conventional methods are limited to estimating the likelihood of termination, without providing insight into which factor most influences the decision. Yet knowing the impact of different factors on the client's decision can help the service provider influence the outcome. For example, if the decision to terminate is based on limited cash availability, the service provider might arrange different financing. On the other hand, if the decision is based on low satisfaction with the service, the service provider can influence the outcome by improving service and mobilizing its sales and marketing teams to save the relationship. Typically only the variables that influence the risk factors are known, not the risk factors themselves, which is why the risk factors are treated as hidden states.
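The structure described above, with observed variables feeding hidden risk-factor scores that in turn drive a termination likelihood, can be illustrated with a minimal Python sketch. The factor names, variable names, weights and intercept below are all hypothetical placeholders chosen for illustration; they are not taken from the IBM model.

```python
import math

# Hypothetical grouping of observed variables into hidden risk factors.
# Every name and weight here is illustrative, not taken from the IBM model.
FACTOR_WEIGHTS = {
    "financial_risk": {"cash_ratio": -1.2, "debt_ratio": 0.8},
    "service_dissatisfaction": {"open_complaints": 0.5, "sla_breaches": 0.9},
}
# Weight of each hidden factor score in the termination log-odds.
OUTPUT_WEIGHTS = {"financial_risk": 1.0, "service_dissatisfaction": 1.5}
INTERCEPT = -2.0


def factor_scores(client):
    """Hidden-state value per risk factor: a weighted sum of its observed variables."""
    return {
        factor: sum(weight * client[var] for var, weight in weights.items())
        for factor, weights in FACTOR_WEIGHTS.items()
    }


def termination_probability(client):
    """Logistic link from factor scores to a probability, plus per-factor contributions."""
    scores = factor_scores(client)
    contributions = {f: OUTPUT_WEIGHTS[f] * s for f, s in scores.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-log_odds)), contributions


client = {"cash_ratio": 0.4, "debt_ratio": 1.1, "open_complaints": 3, "sla_breaches": 2}
prob, contributions = termination_probability(client)
# The factor with the largest contribution is the one to act on first.
dominant = max(contributions, key=contributions.get)
```

Exposing the per-factor contributions alongside the probability is what lets a dashboard show not only which clients are at risk, but which factor is driving that risk.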
Statistical Modeling for Client Assessment
In traditional methods, such as logistic regression, the values of these hidden states are computed as a byproduct of the model. However, in many applications, at least some of the relationships among these factors (i.e. hidden variables) are known. For example, in the aforementioned problem of the dashboard design, it is often possible to provide additional information in the form of "Company A has been more satisfied than Company B" or "Company C has better financial health than Company D." Because traditional parameter estimation techniques, such as maximum a posteriori (MAP) estimation or iteratively re-weighted least squares, do not account for these relationships, the estimates of hidden variables obtained with the standard procedures are not optimal. An estimation procedure that captures such relationships in the data is needed.
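One way to picture estimation with known relationships among hidden variables is to add a penalty to the logistic loss whenever a known ordering such as "Company A scores higher than Company B" is violated. The sketch below is only an illustrative stand-in under that assumption, not the IBM algorithm: the hinge penalty, the `fit` function, its parameters and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_score(w, x):
    """Hidden factor value for one client: a linear combination of its variables."""
    return sum(wk * xk for wk, xk in zip(w, x))


def fit(X, y, pairs, lam=5.0, margin=1.0, lr=0.05, steps=500):
    """Gradient descent on logistic loss plus a hinge penalty on ordering violations.

    pairs holds (i, j) tuples meaning "client i is known to score higher than client j".
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        # Logistic-loss gradient: (predicted - observed) times the inputs.
        for x, target in zip(X, y):
            err = sigmoid(b + hidden_score(w, x)) - target
            grad_b += err
            for k, xk in enumerate(x):
                grad_w[k] += err * xk
        # Penalty gradient: active only while a known ordering is not met by the margin.
        for i, j in pairs:
            if hidden_score(w, X[i]) - hidden_score(w, X[j]) < margin:
                for k in range(len(w)):
                    grad_w[k] -= lam * (X[i][k] - X[j][k])
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: four clients, two observed variables, binary outcomes.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [0, 1, 1, 0]
known_pairs = [(0, 1)]  # assumed side information: client 0 ranks above client 1

w_free, _ = fit(X, y, pairs=[])
w_con, _ = fit(X, y, pairs=known_pairs)
gap_free = hidden_score(w_free, X[0]) - hidden_score(w_free, X[1])
gap_con = hidden_score(w_con, X[0]) - hidden_score(w_con, X[1])
```

On this toy data the unconstrained fit ranks client 1 above client 0 (`gap_free` is negative), while the penalized fit respects the supplied ordering (`gap_con` is positive), which shows how side information can change the estimated hidden scores.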
At IBM Research, an algorithm was developed to compute the regression parameters, taking into account the issue of estimating hidden variables with constraints. A recursive adaptation procedure has also been developed to identify the most significant nonlinear relationships in the data and to adapt the model by introducing corresponding higher order terms. The entire approach is embedded in a Web-based platform and used to track the portfolio of large IT services clients in the IBM Global Services division, and identify those who are likely to terminate or reduce the scope of their engagement.
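The article does not specify how the adaptation procedure selects higher-order terms. As a rough illustration of the general idea, the sketch below uses greedy forward stagewise selection: it repeatedly picks the second-order term most correlated with the current residual, removes the part of the residual that term explains, and stops when no candidate is correlated strongly enough. The function names, the correlation threshold and the toy data are assumptions made for this example.

```python
import math
from itertools import combinations_with_replacement


def correlation(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def select_second_order_terms(X, residual, threshold=0.5, max_terms=3):
    """Greedily choose index pairs (i, j) whose product x_i * x_j best explains the residual."""
    chosen = []
    residual = list(residual)
    candidates = list(combinations_with_replacement(range(len(X[0])), 2))
    for _ in range(max_terms):
        scored = []
        for i, j in candidates:
            if (i, j) not in chosen:
                term = [row[i] * row[j] for row in X]
                scored.append((abs(correlation(term, residual)), (i, j), term))
        if not scored:
            break
        best_score, best_pair, term = max(scored, key=lambda s: s[0])
        if best_score < threshold:
            break
        chosen.append(best_pair)
        # Remove what the new term explains (one-variable least squares with intercept).
        n = len(term)
        mean_t, mean_r = sum(term) / n, sum(residual) / n
        var_t = sum((t - mean_t) ** 2 for t in term)
        coef = sum((t - mean_t) * (r - mean_r) for t, r in zip(term, residual)) / var_t
        residual = [(r - mean_r) - coef * (t - mean_t) for t, r in zip(term, residual)]
    return chosen


# Toy data: the residual of a hypothetical linear model is a pure x0 * x1 interaction.
X = [[a, b] for a in (-1, 0, 1) for b in (-1, 0, 1)]
residual = [a * b for a, b in X]
chosen = select_second_order_terms(X, residual)
```

In this toy case only the interaction term `(0, 1)` is selected; the squared terms are uncorrelated with the residual and are never added.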