# ESTIMATING DEMAND BY REGRESSION ANALYSIS

The first question which arises is, what is the difference between **demand estimation** and **demand forecasting**? The answer is that estimation attempts to quantify the links between the level of demand and the variables which determine it. Forecasting, on the other hand, attempts to predict the overall level of future demand rather than looking at specific linkages. For this reason the set of techniques used may differ, although there will be some overlap between the two. In general, an estimation technique can be used to forecast demand but a forecasting technique cannot be used to estimate demand. A manager who wishes to know how high demand is likely to be in two years’ time might use a forecasting technique. A manager who wishes to know how the firm’s pricing policy could be used to generate a given increase in demand would use an estimation technique.

The firm needs information about likely future demand in order to pursue an optimal pricing strategy. It can only charge a price that the market will bear if it is to sell the product. On the one hand, over-optimistic estimates of demand may lead to an excessively high price and lost sales. On the other hand, over-pessimistic estimates of demand may lead to a price which is set too low, resulting in lost profits. The more accurate the information the firm has, the less likely it is to take a decision which will have a negative impact on its operations and profitability.

The level of demand for a product will influence the decisions which the firm takes regarding the non-price factors that form part of its overall competitive strategy. For example, the level of advertising it carries out will be determined by the perceived need to stimulate demand for the product. As advertising expenditure represents an additional cost to the firm, unnecessary spending in this area needs to be avoided. If the firm’s expectations about demand are too low, it may try to compensate by spending large sums on advertising, money which in this instance may be at least partly wasted. Alternatively, it may decide to redesign the product in response, thus incurring unnecessary additional costs in the form of research and development expenditure.

In the earlier post, demand analysis was introduced as a tool for managerial decision-making. For example, it was shown that knowledge of price and cross elasticities can assist managers in pricing and that income elasticities provide useful insights into how demand for a product will respond to different macroeconomic conditions. We assumed that these elasticities were known or that the data were already available to allow them to be easily computed. Unfortunately, this is not usually the case. For many business applications, **the manager who desires information about elasticities must develop a data set and use statistical methods to estimate a demand equation from which the elasticities can then be calculated.** This estimated equation can then also be used to predict demand for the product, based on assumptions about prices, income, and other factors. In the coming two posts, the basic techniques of **demand estimation** and **demand forecasting** are introduced.

**ESTIMATING DEMAND USING REGRESSION ANALYSIS**

The basic **regression tools** can also be used to estimate demand relationships. Consider a small restaurant chain specializing in Chinese dinners. The business has collected information on prices and the average number of meals served per day for a random sample of eight restaurants in the chain. These data are shown below. Use regression analysis to estimate the coefficients of the demand function Qd = a + bP. Based on the estimated equation, calculate the point price elasticity of demand at the mean values of the variables.
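The exercise above can be sketched in a few lines of Python. The price and quantity figures below are hypothetical placeholders, not the chain's actual sample, and the function names `fit_line` and `elasticity_at_means` are illustrative. The sketch assumes ordinary least squares with one independent variable and the usual point elasticity definition, b multiplied by mean price over mean quantity.

```python
# HYPOTHETICAL data for eight restaurants (placeholders, not the original sample).
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]   # price per meal
meals = [100.0, 96.0, 90.0, 87.0, 80.0, 77.0, 70.0, 66.0]   # average meals per day


def fit_line(x, y):
    """Least-squares estimates (a, b) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def elasticity_at_means(x, y, b):
    """Point price elasticity evaluated at the sample means: b * (mean P / mean Q)."""
    return b * (sum(x) / len(x)) / (sum(y) / len(y))


a_hat, b_hat = fit_line(prices, meals)
e_p = elasticity_at_means(prices, meals, b_hat)
print(f"Qd = {a_hat:.2f} + ({b_hat:.2f})P, elasticity at means = {e_p:.3f}")
```

With these made-up figures the slope is negative, as a demand curve requires, and the elasticity at the means is less than one in absolute value, so demand would be price inelastic at that point.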

**EVALUATING THE ACCURACY OF THE REGRESSION EQUATION – REGRESSION STATISTICS**

Once the parameters have been estimated, the strength of the relationship between the dependent variable and the independent variables can be measured in two ways. The first uses a measure called the **coefficient of determination**, denoted as $R^2$, to measure how well the overall equation explains changes in the dependent variable. The second measure uses the **t-statistic** to test the strength of the relationship between an independent variable and the dependent variable.

**Testing Overall Explanatory Power:** Define the squared deviation of any $Y_i$ from the mean of Y, i.e., $(Y_i - \bar{Y})^2$, as the variation in Y. The total variation is found by summing these squared deviations for all values of the dependent variable:

$$\text{Total variation} = \sum (Y_i - \bar{Y})^2$$

Total variation can be separated into two components: **explained variation** and **unexplained variation**. These concepts are explained below. For each $X_i$ value, compute the predicted value of $Y_i$ (denoted as $\hat{Y}_i$) by substituting $X_i$ into the estimated regression equation:

$$\hat{Y}_i = \hat{a} + \hat{b}X_i$$

The squared difference between the predicted value $\hat{Y}_i$ and the mean value $\bar{Y}$, i.e., $(\hat{Y}_i - \bar{Y})^2$, is defined as explained variation. The word explained means that the deviation of Y from its average value $\bar{Y}$ is the result of (i.e., is explained by) changes in X. For example, in the data on total output and cost used previously, one important reason the cost values are higher or lower than $\bar{Y}$ is that output rates ($X_i$) are higher or lower than the average output rate.

Total explained variation is found by summing these squared deviations, that is,

$$\text{Total explained variation} = \sum (\hat{Y}_i - \bar{Y})^2$$

**Unexplained variation is the difference between $Y_i$ and $\hat{Y}_i$.** That is, part of the deviation of $Y_i$ from the average value ($\bar{Y}$) is “explained” by the independent variable, X. The remaining deviation, $Y_i - \hat{Y}_i$, is said to be unexplained. Summing the squares of these differences yields

$$\text{Total unexplained variation} = \sum (Y_i - \hat{Y}_i)^2$$

The three sources of variation are shown in Figure 6.1.
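The three sums are linked by a standard identity. Assuming the line is fitted by least squares and includes an intercept term, total variation splits exactly into its explained and unexplained parts:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$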

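**The coefficient of determination ($R^2$)** measures the proportion of total variation in the dependent variable that is “explained” by the regression equation. That is,

$$R^2 = \frac{\text{Total explained variation}}{\text{Total variation}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$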

The value of $R^2$ ranges from **zero to 1**. If the regression equation explains none of the variation in Y (i.e., there is no relationship between the independent variables and the dependent variable), $R^2$ will be zero. **If the equation explains all the variation (i.e., total explained variation = total variation), the coefficient of determination will be 1.** In general, the higher the value of $R^2$, the “better” the regression equation. The term fit is often used to describe the explanatory power of the estimated equation. When **$R^2$ is high, the equation is said to fit the data well**. A low $R^2$ would be indicative of a rather poor fit.
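The decomposition into total, explained, and unexplained variation can be checked numerically. The sketch below uses a small hypothetical data set (not the output-cost data referred to in this post) and an illustrative helper named `variation_and_r2`. It assumes a one-variable least-squares fit with an intercept.

```python
def variation_and_r2(x, y):
    """Return (total, explained, unexplained, r2) for a one-variable least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, explained, unexplained, explained / total


# HYPOTHETICAL observations, chosen only for illustration
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]

total, explained, unexplained, r2 = variation_and_r2(x_obs, y_obs)
print(f"total={total:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
print(f"R^2 = {r2:.4f}")
```

Because explained and unexplained variation add up to total variation, the same $R^2$ could equally be computed as one minus the unexplained share.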

How high must the coefficient of determination be in order that a regression equation be said to fit well? There is no precise answer to this question. For some relationships, such as that between consumption and income over time, one might expect $R^2$ to be at least 0.95. In other cases, such as estimating the relationship between output and average cost for fifty different producers during one production period, an $R^2$ of 0.40 or 0.50 might be regarded as quite good.

Based on the estimated regression equation for total cost and output, that is,

$$\hat{Y}_i = 87.08 + 12.21X_i$$

the coefficient of determination can be computed using the data on sources of variation shown in Table 1.

The value of $R^2$ is 0.954, which means that more than 95 percent of the variation in total cost is explained by changes in output levels. Thus the equation would appear to fit the data quite well.

**Evaluating the Explanatory Power of Individual Independent Variables**

The t-test is used to determine whether there is a significant relationship between the dependent variable and each independent variable. This test requires that the standard deviation (or standard error) of the estimated regression coefficient be computed. The relationship between a dependent variable and an independent variable is not fixed because the estimate of b will vary for different data samples. The standard error of $\hat{b}$ from one of these regression equations provides an estimate of the amount of variability in $\hat{b}$. For a regression with a single independent variable (k = 1), the equation for this standard error is

$$S_{\hat{b}} = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{(n - k - 1)\sum (X_i - \bar{X})^2}}$$

The standard error can be used to construct a confidence interval for the true parameter b of the form $\hat{b} \pm t_{n-k-1} S_{\hat{b}}$, where $S_{\hat{b}}$ is the standard error of $\hat{b}$ and $t_{n-k-1}$ represents a value from the probability distribution known as Student’s t distribution. The subscript (n − k − 1) refers to the number of degrees of freedom, where n is the number of observations or data points and k is the number of independent variables in the equation. An abbreviated list of t-values for use in estimating 95 percent confidence intervals is shown in Table 2. In the example discussed here, n = 7 and k = 1, so there are five (i.e., 7 − 1 − 1) degrees of freedom, and the value of t in the table is 2.571. Thus, in repeated estimations of the output-cost relationship, it is expected that about 95 percent of the time the true value of parameter b will lie in the interval defined by the estimated value of b plus or minus 2.571 times the standard error of $\hat{b}$. For the output-cost data, the 95 percent confidence interval estimate would be

$$12.21 \pm 2.571(1.19)$$

or from 9.15 to 15.27. This means that the probability that the true marginal relationship between cost and output (i.e., the value of b) lies within this range is 0.95. If there is no relationship between the dependent variable and an independent variable, the parameter b would be zero. A standard statistical test for the strength of the relationship between Y and X is to check whether the 95 percent confidence interval includes the value zero. If it does not, the relationship between X and Y as measured by $\hat{b}$ is said to be statistically significant. If the interval does include zero, then $\hat{b}$ is said to be nonsignificant, meaning that there does not appear to be a strong relationship between the two variables. The confidence interval for $\hat{b}$ in the output-cost example did not include zero, and thus it is said that $\hat{b}$, an estimate of marginal cost, is statistically significant or that there is a strong relationship between cost and rate of output.
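The interval arithmetic and the zero check can be reproduced directly. The sketch below uses the three numbers quoted in the text (coefficient 12.21, standard error 1.19, tabled t of 2.571). The function name `confidence_interval` is illustrative rather than part of any library.

```python
def confidence_interval(b_hat, std_err, t_crit):
    """Return (lower, upper) bounds of b_hat +/- t_crit * std_err."""
    margin = t_crit * std_err
    return b_hat - margin, b_hat + margin


# Values quoted in the text for the output-cost example
lower, upper = confidence_interval(12.21, 1.19, 2.571)

# The coefficient is called statistically significant when the interval excludes zero
significant = not (lower <= 0.0 <= upper)
print(f"95% interval: {lower:.2f} to {upper:.2f}; excludes zero: {significant}")
```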

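Another way to make the same test is **to divide the estimated coefficient ($\hat{b}$) by its standard error**. The probability distribution of this ratio is the same as **Student’s t distribution**; thus this ratio is called a t-value. If the absolute value of this ratio is equal to or greater than the tabled value of t for n − k − 1 degrees of freedom, $\hat{b}$ is said to be statistically significant. Using the output-cost data, the t-value is computed to be

$$t = \frac{\hat{b}}{S_{\hat{b}}} = \frac{12.21}{1.19} = 10.26$$

Because 10.26 exceeds the tabled value of 2.571 for five degrees of freedom, the coefficient is statistically significant, which agrees with the confidence-interval test.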

The standard error of the equation is used to determine the likely accuracy with which we can predict the value of the dependent variable associated with particular values of the independent variables. As a general principle, **the smaller the value of the standard error of the equation, the more accurate the equation is** and hence the more accurate any predictions made from it will be. To put this another way, the standard error represents the standard deviation of the dependent variable about the regression line. Thus the smaller the value, the better the fit of the equation to the data and the closer the estimate will be to the true regression line. **Conversely, the larger the standard error, the bigger the deviation from the regression line and the less confidence that can be put in any prediction arising from it.**

The standard error of the coefficient works along similar lines. It gives an indication of the amount of confidence that can be placed in the estimated regression coefficient for each independent variable. Again, the smaller the value, the greater the confidence that can be placed in the estimated coefficient and vice versa. Finally, the **t-test** provides a further measurement of the accuracy of the regression coefficient for each of the independent variables.
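Both standard errors can be computed from the residuals of a fitted line. The sketch below is a minimal illustration on hypothetical data, assuming a one-variable regression (k = 1) and the usual textbook formulas: the standard error of the equation is the square root of unexplained variation divided by n − k − 1, and the standard error of the slope is that value divided by the square root of the summed squared deviations of X. The function name `standard_errors` is illustrative.

```python
import math


def standard_errors(x, y):
    """Return (standard error of the equation, standard error of the slope)."""
    n = len(x)
    k = 1  # one independent variable (assumption of this sketch)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    # Unexplained variation: squared residuals about the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_equation = math.sqrt(sse / (n - k - 1))
    se_slope = se_equation / math.sqrt(s_xx)
    return se_equation, se_slope


# HYPOTHETICAL observations, chosen only for illustration
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]

se_eq, se_b = standard_errors(x_obs, y_obs)
print(f"standard error of equation = {se_eq:.4f}, of slope = {se_b:.4f}")
```

Points that sit closer to the fitted line shrink the residuals, so both standard errors fall together, which is the sense in which a smaller value signals a more reliable equation and coefficient.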

A value of t greater than or equal to 2 generally indicates that the calculated coefficient is a **reliable estimate**, while a value of less than 2 indicates that the **coefficient is unreliable**.

(Note: This also partly depends, however, on the number of data observations on which the equation is based, so t-test tables need to be used in order to ensure an accurate interpretation of this statistic.)
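The gap between the rule of thumb and the tabled values can be seen in a short sketch. The dictionary below holds standard two-tailed 5 percent critical values of Student's t for a few degrees of freedom. It stands in for a t-table and is not a copy of this post's Table 2. The names `T_CRIT_95`, `t_value`, and `is_significant` are illustrative.

```python
# Standard two-tailed 5 percent critical values of Student's t, keyed by degrees of freedom
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000}


def t_value(b_hat, std_err):
    """Ratio of an estimated coefficient to its standard error."""
    return b_hat / std_err


def is_significant(t_stat, dof):
    """Compare |t| with the tabled critical value for the given degrees of freedom."""
    return abs(t_stat) >= T_CRIT_95[dof]


# Output-cost example from the text: coefficient 12.21, standard error 1.19, 5 degrees of freedom
t_cost = t_value(12.21, 1.19)
print(f"t = {t_cost:.2f}, significant with 5 d.f.: {is_significant(t_cost, 5)}")

# A borderline t of 2.3 passes the rule of thumb (>= 2) but not the table test with 5 d.f.
print(is_significant(2.3, 5), is_significant(2.3, 30))
```

With few observations the tabled value sits well above 2, so a coefficient that clears the rule of thumb may still fail the formal test.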