CONTENTS
Evaluating the Beef Promotion Checkoff
Replication: An Essential Step Toward Improved Program Evaluation
Editor's Notes
Next Meeting
Replication:
An Essential Step Toward Improved Program Evaluation
by Henry W. Kinnucan
In a previous NICPRE Quarterly article, I discussed three principles
of sound evaluation: use of the scientific method, peer review, and reproducibility.
Here I elaborate on reproducibility. In particular, I argue that replication,
i.e., estimating the same model with new or updated data, is an essential
step toward improved advertising benefit-cost analysis. Moreover, I argue
that replication should not be confined to assessing the robustness of
previous findings. Rather, it should be incorporated into current evaluations
as part of the research design. Before describing how this can be done
in a relatively unintrusive manner, I discuss three recent studies that
highlight the need for replication and its role in assuring that results
are trustworthy.
Meat Study: For the meat study, I refer to the results published
in the February 1997 issue of the American Journal of Agricultural
Economics. In that study, the same model (a Rotterdam specification
for beef, pork, poultry, and fish) was estimated for two sample periods:
an "initial" sample consisting of quarterly data for the period
1976-91, and an "updated" sample for the period 1976-93. The
updated sample was identical to the initial sample except that it contained
seven additional observations.
Estimates for both sample periods were similar for the economic variables
(prices and income), health information, and trend. In particular, the
economic, trend, and health variables that were significant in the initial
sample tended also to be significant in the updated sample. And in most
cases the estimated magnitudes of the coefficients were similar, meaning
that one could have confidence in the estimates.
This was not the case for the advertising variables. In particular, whereas
the initial sample indicated a significant relationship between beef advertising
expenditures (lagged one period) and beef consumption, the updated sample
indicated an insignificant relationship. A similar result occurred for
the remaining commodities, i.e., the advertising coefficients that were
significant in the initial sample tended not to be significant in the
updated sample, and vice versa. Consequently, the authors concluded that
the effects of generic advertising of meats in the U.S. are uncertain.
Note that this conclusion is the direct result of the replication. If
the authors had not updated their sample and provided estimates for both
sample periods, they might well have concluded that beef advertising increased
beef demand. Because that conclusion is sensitive to the sample period,
later researchers would have found it difficult to confirm.
Citrus Study: In a 1961 article that has since become a classic,
Nerlove and Waugh found that with production held constant, U.S. orange
growers would receive $20 in gross revenue from an existing dollar invested
in advertising. This marginal return estimate was based in part on a statistical
demand equation that was estimated with data published in the study's
appendix.
Using the appendix data, Tomek was able to duplicate exactly Nerlove
and Waugh's empirical estimates of the demand parameters. However, he
found that the parameters were sensitive to sample period. In particular,
the model estimated with the post-war data produced an insignificant advertising
effect. In addition, Tomek found serial correlation in the residuals of
the demand equation, which suggests that the model may have been misspecified.
Although Nerlove and Waugh's major contribution was theoretical, not empirical,
the sensitivity of their empirical estimates to sample period, coupled
with the serial correlation problem, calls into question the validity
of their estimated rate of return. The lesson from this historical analysis
is that estimated advertising effects are fragile, which reinforces the
need for replication.
Catfish Study: The catfish example is more optimistic in the sense
that results hold up better under replication. Four studies were done:
an initial study based on monthly data for the period 1980-89 (study A),
and three updates based on data for 1986-94 (study B), 1986-96 (study
C), and 1986-97 (study D). The model used in the updates is essentially
the same as that used in the original study, with the following exceptions.
First, in the original study the dependent variable was defined as whole
catfish. In the updates the definition was expanded to include value-added
catfish products, mainly fillets. Since data for the value-added products
were not available prior to 1986, it was necessary to truncate the samples
in the updates to begin in 1986 rather than 1980. This truncation, however,
had no material effect on the replications, as the advertising campaign
commenced in 1987, so none of the advertising observations were lost.
The second important difference between the original study and the
updates is that the updates contain more advertising variables. This is
because the industry expanded its media mix to include newspapers and
radio in 1992, and television in 1994. In studies B and C, the print and
electronic media were combined; in study D they were kept separate.
The final important difference is that studies A, B, and C imposed a
geometrically declining (Nerlove) lag on the estimated advertising responses.
In study D the advertising lag structure was estimated using the polynomial
inverse lag (PIL) procedure, which subsumes the Nerlove lag as a special
case. (Further details, including citations for the earlier studies, are
provided in Kinnucan and Miao.)
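To make the lag restriction concrete: under a geometrically declining lag, the response at lag j is the current-period response multiplied by the decay parameter raised to the power j, so the long-run response equals the current-period response divided by one minus the decay parameter. The sketch below illustrates that arithmetic only. The function names and the numerical values are hypothetical and are not taken from the catfish studies.

```python
def geometric_lag_weights(short_run, decay, n_lags):
    """Responses at lags 0..n_lags-1 under a geometrically declining lag."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    return [short_run * decay**j for j in range(n_lags)]


def long_run_response(short_run, decay):
    """Limit of the infinite sum of short_run * decay**j over j = 0, 1, 2, ..."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    return short_run / (1.0 - decay)


# Hypothetical values chosen for illustration only.
short_run = 0.002
decay = 0.75
weights = geometric_lag_weights(short_run, decay, n_lags=12)
print(weights[:3])
print(long_run_response(short_run, decay))
```

Because every weight is tied to a single decay parameter, the shape of the response is fixed in advance. That rigidity is the kind of restriction a more flexible lag procedure relaxes.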
The main results are summarized in Table 1. These
elasticities are long-run estimates. As expected given the change in the
dependent variable to include value-added products, the updates show a
larger income elasticity. (The insignificant income elasticity in study
A suggests that by the late 1980s catfish had overcome the inferior-good
status reported in earlier econometric studies.) The price elasticities,
although smaller in absolute value in the more recent periods, are not
significantly different from -1.0.
The most interesting results are for the advertising variables. Focusing
first on print media, and bearing in mind that newspaper advertising was
restricted to three months, the estimated advertising elasticities are
remarkably robust. Regardless of sample period and distributed-lag restrictions,
elasticity estimates are significant. Moreover, for the estimates based
on the Nerlove lag, the elasticity from study A (0.0075) is almost identical
to the elasticity from study C (0.0078). Thus, unlike for the meat and
citrus studies, one can conclude with some confidence that generic advertising
did indeed affect demand, at least with respect to the magazine portion
of the campaign.
For the electronic media, the elasticity estimates are not nearly as
stable. For example, study B produces a significant effect, whereas study
C produces an insignificant effect. In study D, which separates the media,
the elasticity for radio is positive and significant, but the elasticity
for television is negative and unreliable. Thus, one cannot make any definitive
statements about the effectiveness of electronic media. There appears
to be some evidence that the radio advertising is effective, but given
the instability of the estimates, one would want to replicate the study
to be sure.
Returning to print media, notice that the advertising elasticity for
magazines in study D is about three times larger than the estimates obtained
in the earlier studies. This is due to the use of a less restrictive procedure
for estimating the distributed lag in study D. In other words, the earlier
studies, by employing the Nerlove lag, produced estimates of the advertising
elasticities that were severely downward biased. The discovery of this
bias highlights the side benefits of replication noted by Tomek, namely
learning and research innovation. With the improved model specification,
we now know that the returns to the print media campaign estimated in
the earlier studies were understated. This finding should be reassuring
to producers, since print media accounted for the bulk of their advertising
investment.
As the foregoing examples illustrate, replication is essential to establish
whether estimated advertising effects are robust, and therefore trustworthy.
There are three additional reasons for replication:
1. Data used in economic analysis are frequently revised, which means that
published studies based on the unrevised data may be compromised by
measurement error. It is only with replication that we can know whether the
results in published studies are spurious or real.
2. Most published studies showing a 'significant' advertising effect are from
ad hoc models, i.e., models that are based more on heuristics than theory.
(The Rotterdam specification used in the meat study is, by contrast, an
example of a theory-based model.) Because the models are ad hoc, it is safe
to assume that they are the product of a specification search. That
is, the researcher has tried alternative combinations of variables, lag
structures, and estimation procedures to find the most 'plausible' result.
There are two problems with this. First, the estimated advertising effect
may be more a reflection of the peculiarities of the data set than the
actual market response. Second, the reported t-values used to determine
significance are difficult to interpret because one does not know the
true probability of a Type I error. Although replication cannot solve
the latter problem, it can shed light on the former. In particular, if
the estimated advertising effects are indeed data specific, this will
show up in the replication.
3. Replication of the type that was done in the meat study, hereafter
referred to as 'internal replication,' permits more extensive model testing.
For example, a number of authors have provided compelling arguments that
model testing should go beyond the R² and the Durbin-Watson statistic to
include tests for model misspecification and structural change. A number of
the regression diagnostics provided for this purpose in econometric software
such as EViews require post-sample predictions. By withholding a portion of
the sample to be used for the internal replication, these additional tests
can be performed to provide a more thorough assessment of model adequacy.
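One diagnostic of this kind is the Chow predictive-failure test, which asks whether a model fitted to the initial sample is consistent with the withheld observations. The sketch below is a minimal illustration using NumPy and simulated data. It is not the procedure used in any of the studies discussed here, and the function names, sample sizes, and coefficients are all assumptions made for the example.

```python
import numpy as np


def ols_rss(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def chow_forecast_f(X, y, n_initial):
    """Chow predictive-failure F statistic.

    Compares the fit over the full sample with the fit over the first
    n_initial observations. The remaining n_holdout observations form the
    post-sample period. Under stable parameters the statistic follows an
    F(n_holdout, n_initial - k) distribution.
    """
    n, k = X.shape
    n_holdout = n - n_initial
    if n_holdout <= 0 or n_initial <= k:
        raise ValueError("need n_initial > k and at least one held-out observation")
    _, rss_full = ols_rss(X, y)
    _, rss_init = ols_rss(X[:n_initial], y[:n_initial])
    return ((rss_full - rss_init) / n_holdout) / (rss_init / (n_initial - k))


# Simulated data (an assumption for illustration, not from any cited study).
rng = np.random.default_rng(0)
n = 72
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y_stable = X @ np.array([1.0, -0.8, 0.5]) + rng.normal(scale=0.1, size=n)

# Same series, except the response to the third regressor vanishes
# in the last 8 periods (a structural change in the holdout).
y_break = y_stable.copy()
y_break[-8:] -= 0.5 * X[-8:, 2]

f_stable = chow_forecast_f(X, y_stable, n_initial=64)
f_break = chow_forecast_f(X, y_break, n_initial=64)
print(f_stable, f_break)
```

A large statistic relative to the F critical value signals that the withheld observations are poorly predicted by the initial-sample model, which is the pattern one would expect if an estimated effect were specific to the initial data.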
Recommendation: All advertising studies based on time series data
should provide two sets of estimates, one for the first 80 to 90 percent of
the observations in the series, and another for the entire sample period.
In addition, a complete set of regression diagnostics based on the initial
sample should be provided. This will help ensure that the results are
robust, and it should mitigate some of the more serious problems associated
with specification search. Internal replication and the associated diagnostic
testing will add only moderately to the cost of the evaluation, while significantly
enhancing the study's credibility from a scientific perspective.
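The recommendation can be sketched in a few lines: estimate the same specification on the initial portion of the series and on the full series, then report both sets of coefficients side by side and flag any variable whose significance changes. The code below is a hedged illustration with NumPy and simulated data. The function names, the 85 percent split, the rough critical value of 2.0, and the variable names are assumptions for the example, not part of any study discussed here.

```python
import numpy as np


def ols_with_t(X, y):
    """OLS coefficients and conventional t-ratios."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / se


def internal_replication(X, y, names, share=0.85, t_crit=2.0):
    """Estimate the same model on the initial and on the full sample.

    share is the fraction of observations in the initial sample and
    t_crit is a rough two-sided 5 percent critical value. Both are
    illustrative defaults.
    """
    n_initial = int(round(share * len(y)))
    b0, t0 = ols_with_t(X[:n_initial], y[:n_initial])
    b1, t1 = ols_with_t(X, y)
    rows = []
    for j, name in enumerate(names):
        changed = (abs(t0[j]) > t_crit) != (abs(t1[j]) > t_crit)
        rows.append({
            "variable": name,
            "initial": float(b0[j]), "t_initial": float(t0[j]),
            "full": float(b1[j]), "t_full": float(t1[j]),
            "significance_changes": bool(changed),
        })
    return rows


# Simulated series (hypothetical): price matters, advertising has no true effect.
rng = np.random.default_rng(1)
n = 80
price = rng.normal(size=n)
adv = rng.normal(size=n)
X = np.column_stack([np.ones(n), price, adv])
y = 1.0 - 0.8 * price + rng.normal(scale=0.1, size=n)
names = ["const", "price", "advertising"]

rows = internal_replication(X, y, names)
for row in rows:
    print(f"{row['variable']:<12} {row['initial']: .4f} ({row['t_initial']: .2f}) "
          f"{row['full']: .4f} ({row['t_full']: .2f}) "
          f"changed={row['significance_changes']}")
```

Reporting both columns costs one extra regression, and a coefficient that is flagged as changing between the two samples is a signal that the finding needs further scrutiny before it is used in a benefit-cost calculation.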
Table 1: Demand Elasticity Estimates for Catfish from Initial (Study A) and Replicated Studies (Studies B, C, and D)
| Elasticity       | Study A (1980-89) | Study B (1986-94) | Study C (1986-96) | Study D (1986-97) |
| Income           | 0.38*             | 2.19              | 1.79              | 1.58              |
| Own-Price        | -1.00             | -0.83             | -0.86             | -0.71             |
| Advertising:     |                   |                   |                   |                   |
| Magazines        | 0.0075            | --                | --                | 0.0244            |
| Newspapers       | --                | --                | --                | 0.0614*           |
| Radio            | --                | --                | --                | 0.0204            |
| TV               | --                | --                | --                | -0.1074**         |
| Mags. & News.*** | --                | 0.0095            | 0.0078            | --                |
| Radio & TV       | --                | 0.0002            | -0.0005*          | --                |
* Not significant.
** The PIL coefficients are significant, but the elasticity is unreliable. See Kinnucan and Miao for details.
*** Newspaper advertising is restricted to 3 months.