Statistical methods for business intelligence



Business data exist in many forms, including time series and longitudinal data, and may contain various types of variables, such as time-to-event data, occurrences, and categorical and continuous measurements. Statistical data analysis includes data management, modeling, segmentation and application integration. It seeks to uncover useful information, forecast business processes at various horizons, and detect hidden structural changes.

Recent examples include:
Revenue forecasting
Demand forecasting
Network traffic forecasting
Expense tracking
Customer profitability analysis
Targeting analytics
Capacity planning
System monitoring
Warranty data analysis


The challenge of business analytics is that one has to estimate the parameters of the model without comprehensive information. For instance, IBM Research may be asked to develop a control center (or dashboard) that will track a portfolio of large IT services customers and identify those that might decide to terminate their contract, or renegotiate it to achieve a lower price, either of which would lead to a significant loss of revenue to the company. In this case, as shown in Figure 1, the inputs to the model are numerous variables that describe the following five risk factors: 1) financial health of client companies; 2) previous relationships with the service provider; 3) price and competitiveness of the offered service; 4) significant events in the client company that could have a potential impact on the decision to cancel the service (e.g., change of CEO, merger or restructuring); and 5) previous history of contract terminations or renegotiations. The output variable is the likelihood that a customer will terminate its contract (or part of it).

Conventional methods are limited to estimating the likelihood of termination, without providing insight into which factor is most influential in the decision. Yet knowing the impact of different factors on the client's decision can help the service provider influence the outcome. For example, if the decision to terminate is based on limited cash availability, the service provider might arrange different financing. On the other hand, if the decision is based on low satisfaction with the service, the service provider can influence the outcome by improving service and mobilizing its sales and marketing teams to save the relationship. Typically only the variables that influence the risk factors are known, not the risk factors themselves, which means the risk factors are treated as hidden states.
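The two-level structure described above, with observed variables feeding hidden risk factors that in turn feed a termination likelihood, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the column grouping, the loadings, the factor weights and the logistic link are hypothetical and are not taken from the IBM model.

```python
import numpy as np

# Hypothetical grouping of observed input columns into the five risk factors.
FACTOR_COLUMNS = {
    "financial_health": [0, 1],
    "relationship": [2],
    "price_competitiveness": [3],
    "significant_events": [4],
    "termination_history": [5],
}

def factor_scores(x, loadings):
    """Hidden score of each risk factor: a weighted sum of its observed variables."""
    return {name: float(np.dot(x[cols], loadings[name]))
            for name, cols in FACTOR_COLUMNS.items()}

def termination_likelihood(scores, factor_weights, intercept):
    """Map hidden factor scores to a probability through a logistic link.

    Also returns each factor's additive contribution to the log-odds, which
    is what allows the factors to be ranked by influence.
    """
    contributions = {name: factor_weights[name] * s for name, s in scores.items()}
    log_odds = intercept + sum(contributions.values())
    return 1.0 / (1.0 + np.exp(-log_odds)), contributions

# Made-up numbers for a single client (illustrative only).
x = np.array([0.2, -1.0, 0.5, 1.5, 1.0, 0.0])
loadings = {
    "financial_health": np.array([0.5, 0.5]),
    "relationship": np.array([1.0]),
    "price_competitiveness": np.array([1.0]),
    "significant_events": np.array([1.0]),
    "termination_history": np.array([1.0]),
}
factor_weights = {
    "financial_health": -1.0,
    "relationship": -0.5,
    "price_competitiveness": 0.8,
    "significant_events": 0.6,
    "termination_history": 1.2,
}

scores = factor_scores(x, loadings)
probability, contributions = termination_likelihood(scores, factor_weights, -1.0)
most_influential = max(contributions, key=lambda k: abs(contributions[k]))
print(f"termination likelihood: {probability:.3f}; largest driver: {most_influential}")
```

Ranking the contributions to the log-odds is one simple way to read off which factor drives a given client's risk; in practice the loadings and weights would have to be estimated from data rather than set by hand.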

Statistical Modeling for Client Assessment


In traditional methods, such as logistic regression, the values of these hidden states are computed as a byproduct of the model. However, in many applications, at least some of the relationships among these factors (i.e., the hidden variables) are known. For example, in the aforementioned problem of the dashboard design, it is often possible to provide additional information in the form of "Company A has been more satisfied than Company B" or "Company C has better financial health than Company D." Because traditional parameter estimation techniques, such as maximum a posteriori (MAP) estimation or iteratively re-weighted least squares, do not account for these relationships, the estimates of hidden variables obtained with the standard procedures are not optimal. An estimation procedure that captures such relationships in the data is needed.
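One way to picture how ordering information of the form "Company A scores higher than Company B" could enter the estimation is to add a penalty to the logistic regression loss whenever a stated ordering is violated. The sketch below is only an illustration under stated assumptions: the hinge-style penalty, the function name fit_with_order_penalty, its parameters and the synthetic data are all hypothetical, and this is not the algorithm developed at IBM Research.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_with_order_penalty(X, y, factor_cols, ordered_pairs,
                           penalty=1.0, margin=0.1, lr=0.1, steps=2000):
    """Logistic regression fitted by (sub)gradient descent.

    The hidden factor score of client k is X[k, factor_cols] @ beta[factor_cols].
    Each pair (i, j) in ordered_pairs states that client i should score higher
    than client j; a violated pair adds penalty * (margin - (h[i] - h[j])) to the loss.
    """
    n, d = X.shape
    beta, intercept = np.zeros(d), 0.0
    Xf = X[:, factor_cols]
    for _ in range(steps):
        p = sigmoid(X @ beta + intercept)
        grad = X.T @ (p - y) / n              # gradient of the mean log-loss
        grad_b = np.mean(p - y)
        h = Xf @ beta[factor_cols]            # current hidden factor scores
        for i, j in ordered_pairs:
            if h[i] - h[j] < margin:          # stated ordering is violated
                grad[factor_cols] -= penalty * (Xf[i] - Xf[j])
        beta -= lr * grad
        intercept -= lr * grad_b
    return beta, intercept

# Synthetic data (illustrative only): column 0 plays the role of one hidden factor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (rng.random(200) < sigmoid(2.0 * X[:, 0] - 1.0 * X[:, 1])).astype(float)

# Side information that contradicts the unconstrained fit on purpose:
# the client with the smallest x0 is said to score higher than the one with the largest.
i, j = int(np.argmin(X[:, 0])), int(np.argmax(X[:, 0]))

beta_free, _ = fit_with_order_penalty(X, y, [0], [(i, j)], penalty=0.0)
beta_pen, _ = fit_with_order_penalty(X, y, [0], [(i, j)], penalty=0.2)

gap_free = (X[i, 0] - X[j, 0]) * beta_free[0]
gap_pen = (X[i, 0] - X[j, 0]) * beta_pen[0]
print(f"score gap without penalty: {gap_free:.2f}, with penalty: {gap_pen:.2f}")
```

A penalty gives a soft constraint: the fit trades likelihood against agreement with the stated orderings. A hard-constrained solver would be an alternative design, at the cost of a more involved optimization.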

At IBM Research, an algorithm was developed to compute the regression parameters, taking into account the issue of estimating hidden variables with constraints. A recursive adaptation procedure has also been developed to identify the most significant nonlinear relationships in the data and to adapt the model by introducing corresponding higher-order terms. The entire approach is embedded in a Web-based platform and used to track the portfolio of large IT services clients in the IBM Global Services division, and identify those who are likely to terminate or reduce the scope of their engagement.
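The idea of adapting a model by adding higher-order terms can be sketched as a greedy loop: fit the current model, look for the second-order term (a square or a pairwise product) that best explains what is left in the residuals, add it, and repeat. The sketch below is a generic heuristic written for illustration, assuming a plain logistic model and a residual-correlation score; the names adapt_model, max_terms and min_score are hypothetical, and the actual IBM procedure is not described in enough detail here to reproduce.

```python
import itertools
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=3000):
    """Plain logistic regression by gradient descent; returns weights and fitted probabilities."""
    Z = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w)))
        w -= lr * Z.T @ (p - y) / len(y)
    return w, 1.0 / (1.0 + np.exp(-(Z @ w)))

def adapt_model(X, y, max_terms=2, min_score=0.1):
    """Greedily add the second-order term most correlated with the current residuals."""
    d = X.shape[1]
    candidates = list(itertools.combinations_with_replacement(range(d), 2))
    design, chosen = X.copy(), []
    for _ in range(max_terms):
        _, p = fit_logistic(design, y)
        resid = y - p
        scores = {c: abs(np.corrcoef(resid, X[:, c[0]] * X[:, c[1]])[0, 1])
                  for c in candidates if c not in chosen}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < min_score:            # nothing left worth adding
            break
        chosen.append(best)
        design = np.column_stack([design, X[:, best[0]] * X[:, best[1]]])
    return chosen, design

# Synthetic data (illustrative only) whose log-odds contain an x0 * x1 interaction.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
log_odds = 0.5 * X[:, 0] + 3.0 * X[:, 0] * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-log_odds))).astype(float)

chosen, design = adapt_model(X, y)
print("added terms (column index pairs):", chosen)
```

The stopping threshold min_score keeps the loop from adding terms that only fit noise; a likelihood-ratio or cross-validation criterion would be a more principled choice.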




Innovator's corner  

Yasuo Amemiya, Researcher
What is the most exciting potential future use for the work you're doing?
Data-based business intelligence is becoming increasingly popular, and the demand for statistical analytics continues to grow. Development of innovative statistical methods will impact how business monitoring data are captured, stored and processed; and how they are utilized in understanding, forecasting and decision making in business.


What is the most interesting part of your research?
In many business-oriented problems, the data and information available can be limited, incomplete, inconsistent and/or disorganized. It is interesting and rewarding to find a clever way to extract useful information out of such messy data, and to come up with a practical solution for a vaguely posed business problem.


What inspired you to go into this field?
I was studying mathematics and philosophy, and found statistics to be a subject that intermingles these two areas. So, I came to statistics from a conceptual or theoretical side. But, once I started statistics, its practical problem-solving side became more interesting to me.


What is your favorite invention of all time?
Statistical theory for data collection has always been my favorite. The theory consists of two parts: experimental design and survey design. These two areas developed the basic ideas of very proactive human involvement in collecting informative data efficiently, and played fundamental roles in the advancement of the biological, physical, and social sciences. Now, they can make a difference in the development of business analytics.

Related Research  

Disciplines: Computer Science, Mathematical Sciences
Research Areas: Statistics
Research Labs: Watson Research Center