Journal of Data Science 12 (2014), 277-291

Robust estimation of the mAR index of high grossing films at the US box office, 1935 to 2005

Nick Redfern
Leeds Trinity University

Abstract: The modified autoregressive (mAR) index has been proposed as a description of the clustering of shots of similar duration in a motion picture. In this paper we derive robust estimates of the mAR index for high grossing films at the US box office using a rank-based autocorrelation function resistant to the influence of outliers and compare these with estimates obtained using the classical, moment-based autocorrelation function. The results show that (1) the classical mAR index underestimates both the level of shot clustering in a film and the variation in style among the films in the sample; (2) there is a decline in shot clustering from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off thereafter, rather than the monotonic trend indicated by the classical index, and this is mirrored in the trend of the median shot lengths and interquartile range; and (3) the rank mAR index identifies differences between genres overlooked when using the classical index.

Key words: autocorrelation, film editing, modified autoregressive index, robust methods

1. Introduction

Cutting, De Long, and Nothelfer (2010) proposed the modified autoregressive (mAR) index as a statistic of film style measuring the degree to which shots of similar duration cluster together in a motion picture. They calculate the mAR index as the intercept of the negative exponential function 1/[1 + lag] fitted to the partial autocorrelation function out to lag 20 with a critical value based on the average number of shots in a motion picture from a sample. Applying this method to 150 high grossing films at the US box office released from 1935 to 2005, they identified a tendency for shots to become increasingly correlated in length with their neighbours over time and also noted variations in the degree of shot clustering between genres. Though the mAR index can be a useful description of film style, there is good reason to doubt the validity of these conclusions. The mAR values reported by
Cutting, De Long, and Nothelfer are derived from the classical, moment-based estimator of the autocovariance function, which is well known to be nonresistant to the presence of outliers (Ma and Genton, 2000; Maronna, Martin, and Yohai, 2006). Typically, the distribution of shot lengths in a motion picture is positively skewed and contains a number of shots of atypically long duration that adversely affect the moments this function is calculated from (i.e. the mean and variance). Consequently, estimates of the mAR index determined in this way will not accurately describe the style of a film and will lead to incorrect conclusions about the nature of film style. Because these long takes are 'true' outliers representing the decisions of filmmakers about the arrangement of stylistic elements (staging, cinematography, editing, etc.), we are interested in using robust statistical methods that perform reliably in the presence of outliers and departures from the assumptions that underpin statistical methods (Maronna, Martin, and Yohai, 2006). This paper calculates robust estimates of the mAR index using a rank-based autocorrelation function (rmAR) and compares these values to the index based on the classical autocorrelation function (cmAR).
2. Classical and rank-based autocorrelation
The autocovariance function describes the statistical dependence between the values taken by a stochastic process at two points in time. The classical, moment-based autocovariance function of a weakly stationary time series $x = (X_1, \ldots, X_n)^T$ is defined as

$$\hat{\gamma}(h, x) = \frac{1}{n} \sum_{i=1}^{n-h} (X_i - \bar{X})(X_{i+h} - \bar{X}), \tag{2.1}$$

where $\bar{X}$ is the mean and $h$ is the lag specifying the distance between the observations $X_i$ and $X_{i+h}$. The denominator in (2.1) is the total sample size ($n$), and so this function is biased and positive semidefinite. The autocovariance function when $h = 0$ is equal to the variance, and standardising (2.1) by this value gives the autocorrelation function

$$\hat{\rho}(h, x) = \frac{\hat{\gamma}(h, x)}{\hat{\gamma}(0, x)}. \tag{2.2}$$

The autocorrelation function ranges from $-1$ to $1$, with negative autocorrelation at lag $h$ reflecting a tendency of observations to lie on opposite sides of the mean and positive autocorrelation a tendency for observations to lie on the same side of the mean. The partial autocorrelation function $\hat{\alpha}(h, x)$ is the correlation between $X_i$ and $X_{i+h}$ with the linear dependence of the intervening lags removed, and can be calculated recursively using the Durbin-Levinson algorithm.
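As a minimal sketch of the definitions above, the following Python code computes the biased autocovariance in (2.1), the autocorrelation in (2.2), and the partial autocorrelations by the Durbin-Levinson recursion. The function names and the example shot lengths are illustrative assumptions and are not taken from the original analysis.

```python
def autocovariance(x, h):
    """Biased sample autocovariance at lag h: the sum is divided by n, not n - h."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i + h] - mean) for i in range(n - h)) / n


def autocorrelation(x, max_lag):
    """Autocorrelations for lags 0..max_lag, standardised by the lag-0 value."""
    g0 = autocovariance(x, 0)
    return [autocovariance(x, h) / g0 for h in range(max_lag + 1)]


def partial_autocorrelation(rho):
    """Durbin-Levinson recursion on autocorrelations rho[0..K], with rho[0] == 1."""
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1} from the previous step
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical shot lengths in seconds (illustrative values only).
shots = [4.2, 3.9, 5.1, 12.0, 2.8, 3.1, 2.5, 6.7, 7.2, 3.3, 2.9, 4.8]
rho = autocorrelation(shots, 5)
alpha = partial_autocorrelation(rho)
```

For a sequence of autocorrelations that decays geometrically, as for a first-order autoregressive process, the recursion returns the lag-1 value followed by zeros, which is a convenient sanity check.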
The above functions are not resistant to the influence of outlying data points. The mean and the variance of a dataset have finite sample breakdown points of 1/n and unbounded influence functions, and can be arbitrarily bad estimates of location and dispersion in the presence of even a single outlier. Therefore, the above functions, being based on these statistics, are similarly affected by the presence of outliers. The presence of outliers in the upper tail of a shot length distribution inflates the mean so that the majority of observations will tend to lie on the same side of the mean irrespective of the underlying structure of the time series. Consequently, the autocorrelation function will tend to overestimate positive autocorrelation and underestimate negative autocorrelation. The presence of outliers also inflates the variance, introducing a bias of $\hat{\rho}(h, x)$ toward zero that becomes stronger as the magnitude of the outliers increases, because they appear quadratically in the denominator in (2.2) (see Maronna, Martin, and Yohai, 2006, pp. 250-252). Consequently, the presence of outliers leads to underestimation of the strength of autocorrelation between observations in a time series. The lack of robustness of the classical autocovariance and its derived functions means that the information it carries about the structure of a time series can be destroyed by just a single outlier (Ma and Genton, 2000). Furthermore, if a time series contains more than one outlier we may find spuriously large autocorrelation coefficients when h is equal to the distance between outliers (Chatfield, 2004).

Rank-based methods provide an obvious alternative to the classical functions (Ferguson, Genest and Hallin, 2000) and have been explored since Wald and Wolfowitz (1943). Although some information is lost when ranking data, rank-autocorrelation functions have a number of attractive properties: they are distribution-free while also being as powerful as classical methods (and in many cases more powerful); they are robust, being relatively resistant to the influence of outliers and nonlinear distortions; and they are conceptually simple (Hallin and Puri, 1992). A rank-based approach to identifying serial dependency and periodicities in time series by Ahdesmaki, Lahdesmaki, Pearson, Huttunen and Yli-Harja (2005) calculates the autocorrelation function of a time series as

$$\hat{\rho}_s(h) = \frac{1}{n} \cdot \frac{12}{(n-h)^2 - 1} \sum_{i=1}^{n-h} \left( R_x(i) - \frac{n-h+1}{2} \right) \left( R'_x(i) - \frac{n-h+1}{2} \right), \tag{2.3}$$

where $R_x(i)$ are the ranks of $x_i$ in $S = \{x_t, t = 1, \ldots, n-h\}$ and $R'_x(i)$ are the ranks of $x_{i+h}$ in $S' = \{x_{t+h}, t = 1, \ldots, n-h\}$. As a moving-window extension of Spearman's rank correlation statistic, $\hat{\rho}_s$ measures the monotonicity of the relationship between two observations and does not assume linearity. This function is biased and is directly comparable to the biased autocorrelation function based on (2.1), though it is not guaranteed to be positive semidefinite.
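The rank-based estimator in (2.3) and its resistance to a single long take can be sketched in Python as follows. This is an illustration only, not the R code used in the study: the function names, the simulated first-order autoregressive series, and the size and position of the artificial outlier are all assumptions made for the example.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_autocorrelation(x, h):
    """Biased rank autocorrelation at lag h, following (2.3)."""
    n = len(x)
    m = n - h
    lead = average_ranks(x[:m])    # ranks within x_1, ..., x_{n-h}
    lagged = average_ranks(x[h:])  # ranks within x_{1+h}, ..., x_n
    centre = (m + 1) / 2
    total = sum((a - centre) * (b - centre) for a, b in zip(lead, lagged))
    return 12.0 * total / (n * (m * m - 1))


def classical_autocorrelation(x, h):
    """Biased moment-based autocorrelation at lag h, following (2.1) and (2.2)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + h] - mean) for i in range(n - h))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Simulated AR(1) series (assumed coefficient 0.7) and a copy with one outlier.
rng = random.Random(0)
clean = [0.0]
for _ in range(499):
    clean.append(0.7 * clean[-1] + rng.gauss(0.0, 1.0))
contaminated = list(clean)
contaminated[250] = 100.0  # a single, very long "shot"

shift_rank = rank_autocorrelation(contaminated, 1) - rank_autocorrelation(clean, 1)
shift_classical = classical_autocorrelation(contaminated, 1) - classical_autocorrelation(clean, 1)
```

In this setup the lag-1 rank estimate barely moves when the outlier is introduced, whereas the classical estimate is pulled strongly toward zero because the outlier dominates the variance in the denominator.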
3. Methods

The data set used in this study comprises the same data used by Cutting, De Long, and Nothelfer, accessed via the Cinemetrics database (http://www.cinemetrics.lv/index.php). However, we were unable to use all 150 films from the original study because the minimum shot length was given as 0.0 seconds for nine films and as less than 0.0 seconds for seven films, presumably due to rounding or data entry errors. These films were excluded from the study to give a reduced sample size of 134 films.

We calculated the classical and rank autocorrelation functions for the linearly detrended shot lengths of films in the sample up to lag h = (n + 1)/2 if n is odd and h = n/2 if n is even, where n is the number of shots in a film. The rank function in (2.3) was calculated using unpackaged R functions by Bernhard Spangl,¹ and verified as a valid positive definite sequence in each case. The partial autocorrelations for each measure were calculated recursively using the Durbin-Levinson algorithm, and the mAR indices determined by fitting the negative exponential function 1/[1 + h] to the partial autocorrelation functions for lags 0 to 20 using nonlinear least squares (df = 20). The value of an index is the intercept between the fitted function and a critical value of 2/√N = 0.0611, where N = 1070 is the median number of shots in a film in the sample. The methods used here differ from those originally used by Cutting, De Long and Nothelfer, and so our estimates of the cmAR index differ from theirs. The full set of results is in the supplementary material attached to this article.

We were unable to determine the cmAR index for three films (A Night at the Opera [1935], The Great Dictator [1940], Detour [1945]) because the lag-1 autocorrelation was negative, resulting in a singular gradient when fitting by nonlinear least squares, and so these films are excluded from discussion of the classical mAR index but the rank index of each film is included.
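The preprocessing steps described in this section (linear detrending, the maximum lag rule, and the critical value) can be sketched in Python as follows. The function names are hypothetical, and the fit of the curve to the partial autocorrelations is deliberately left out because the text does not give enough detail to reproduce it.

```python
import math


def linear_detrend(y):
    """Residuals from an ordinary least squares line fitted against shot number."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def max_lag(n):
    """(n + 1)/2 when the number of shots is odd, n/2 when it is even."""
    return (n + 1) // 2 if n % 2 else n // 2


def critical_value(n_shots):
    """Critical value 2 / sqrt(N) for N shots."""
    return 2 / math.sqrt(n_shots)


# N = 1070 is the median number of shots per film reported in the text.
threshold = critical_value(1070)
```

With N = 1070 the threshold rounds to the 0.0611 quoted above.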
It is unclear how Cutting, De Long and Nothelfer obtained mAR values for the three films for which we could not determine the cmAR index.

¹ These functions can be accessed at: http://lists.rforge.rproject.org/pipermail/robusttscommits/2009March/000000.html.

To describe trends in film style over time we fit a locally weighted (LOESS) regression smoother to the descriptive statistics. LOESS is a nonparametric method for graphically depicting the relationship between independent and dependent variables in a scatterplot by fitting a low-order polynomial to only those observations in the neighbourhood of a point on the x-axis (x_i) rather than fitting the trendline globally. Observations within this window are inversely weighted according to their distance from the evaluation point so that points closest to x_i have more influence on the placement of the LOESS curve than more distant observations. The degree of smoothing is controlled by the span, which specifies the proportion of the data included in the window. In this study the span was determined separately for each time series using a generalized cross-validation procedure, and a bootstrapped 95% confidence interval gives the precision of the LOESS curve. See Jacoby (2000) for an overview of LOESS regression.
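The local fitting idea behind LOESS can be illustrated with a short Python sketch. It assumes a local linear fit with tricube weights and a fixed span; it does not implement the generalized cross-validation or the bootstrap confidence interval used in the study, and the function name and example values are hypothetical.

```python
import math


def loess(x, y, span, x_eval=None):
    """Local linear fit with tricube weights over the nearest span * n points."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))
    targets = x if x_eval is None else x_eval
    fitted = []
    for x0 in targets:
        nearest = sorted(zip(x, y), key=lambda p: abs(p[0] - x0))[:k]
        d_max = max(abs(xi - x0) for xi, _ in nearest)
        if d_max == 0:
            weights = [1.0] * k
        else:
            weights = [(1 - (abs(xi - x0) / d_max) ** 3) ** 3 for xi, _ in nearest]
        sw = sum(weights)
        if sw == 0:  # every neighbour sits exactly on the window edge
            weights, sw = [1.0] * k, float(k)
        xm = sum(w * xi for w, (xi, _) in zip(weights, nearest)) / sw
        ym = sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sw
        sxx = sum(w * (xi - xm) ** 2 for w, (xi, _) in zip(weights, nearest))
        sxy = sum(w * (xi - xm) * (yi - ym) for w, (xi, yi) in zip(weights, nearest))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a weighted mean
        fitted.append(ym + slope * (x0 - xm))
    return fitted


# Hypothetical index values for one film per release year (illustration only).
years = list(range(1935, 2010, 5))
index = [3.9, 3.4, 2.6, 2.9, 2.7, 3.5, 4.1, 4.3, 4.4, 4.6, 5.3, 4.9, 4.7, 4.8, 4.5]
trend = loess(years, index, span=0.6)
```

A larger span pools more neighbouring films into each local fit and gives a smoother trendline; a smaller span follows the data more closely.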
4. Results
Comparing the cmAR and rmAR values calculated for each film we see that they are very different. Specifically, the classical, moment-based method underestimates both the degree to which shots of similar duration are clustered together within a film and the differences in style between films in the sample. The cmAR index is less than the rank mAR index for 96% of films in the reduced sample, with a median difference between the two indices of −1.42 (95% CI: −1.62, −1.21). The largest difference is for Charlie's Angels (2000), which has a classical mAR index of 1.82 but a rank index of 6.82. We also note that the dispersion of the rank index is greater for films in the sample, indicating more variation in editing style than that suggested by the cmAR index: the range and standard deviation for the rmAR index are 7.37 and 1.40, respectively, and the corresponding statistics for the cmAR index are 4.01 and 0.79.

Figure 1 presents the time series plots of the classical and rank mAR indices. The classical mAR index shows a gradual trend to increased shot clustering, and is consistent with the monotonic trend reported by Cutting, De Long, and Nothelfer in Figure 2.a of their article. The trendline for the rank mAR index shows a very different pattern of changes in film style, with a decline in the clustering of shots from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off after 1985. Basing our analyses of changes in film style over time on the classical mAR index would thus lead us to incorrectly describe those changes and to underestimate their size.

Cutting, De Long, and Nothelfer state that the trend in the cmAR index is not an artefact of decreases in the mean shot length, but because neither statistic is resistant to outliers this claim is dubious. From Figure 2.a we see that measures of location and scale for shot length distributions are strongly related, and so we combined the median and interquartile range (IQR) of the shot lengths of each film using principal components analysis to produce a new dummy variable that retains most of the information of the original variables (see Abdi and Williams (2010) for an overview of PCA). As an alternative descriptive statistic this dummy variable can be thought of as a size measure, with films with a low score having a low median and low IQR and a stronger tendency to more rapid editing, while high-scoring films with a high median and high IQR are edited more slowly. Plotting this score against year of release (Figure 2.b) we see the same trend in film style evident in the rank mAR index, with above average scores tending to come in
the early decades of the sample while later films tend to have lower-than-average scores. There is a slowing down of film editing from 1935 to the 1950s as the median and dispersion of shot lengths increase, followed by a decrease on both measures from the mid-1960s to 2005. The differences between a group of films and the one immediately preceding it have become smaller over time as editing has stabilised into a single Hollywood style: the greater variation in the scores in Figure 2.b for the 1940s and 1950s indicates much greater stylistic variation between films from those decades, while the trendline after the 1950s shows that high grossing films have converged to a single, rapidly-edited style. These trends correspond to the trends in shot clustering in Figure 1.b, indicating that changes in the rapidity of cutting and in the degree of shot clustering are a part of the same overall transformation of film style. Again, this is a relationship overlooked when using non-robust methods.

Cutting, De Long, and Nothelfer assigned the films in their sample to one of five genres (action, adventure, animation, comedy, drama) and compared the distribution of the mAR index of each genre. They found that action films tend to have a higher mAR index than films in other genres, along with smaller differences between the other genres, though they did not correct for multiple comparisons. The beanplots in Figure 3 present the classical and rank mAR indices sorted by the above genre categories, and the differences in the level and dispersion of the indices are stark. To compare the distribution of indices for each genre we performed a Kruskal-Wallis ANOVA test and pairwise post-hoc Dunn tests, assuming an experiment-wise error rate of 0.10 and 10 tests, giving a two-tailed Sidak-corrected p-value of 0.0105 and a critical Z value of 2.56. All reported test statistics are corrected for ties.

The omnibus test for the cmAR index shows a statistically significant difference between genres (χ²(4) = 28.50, p < 0.01), with significant pairwise differences between the action genre and comedy films (Z = 4.09) and drama films (Z = 4.49). For the rmAR index we also see a statistically significant difference (χ²(4) = 38.61, p < 0.01), with pairwise differences between action films and the animation (Z = 3.22), comedy (Z = 4.59), and drama (Z = 5.22) genres, and between adventure films and comedy (Z = 3.15) and drama (Z = 3.60) films. (The difference between adventure films and animated films was not quite significant [Z = 2.46].) Using the classical mAR index would therefore lead us to miss key differences between the style of films in particular genres.
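The Sidak-corrected threshold and critical Z value used for the pairwise Dunn tests above can be reproduced with a few lines of Python. The function names are illustrative; the inputs (an experiment-wise error rate of 0.10 and 10 pairwise tests among five genres) are those stated in the text.

```python
from statistics import NormalDist


def sidak_alpha(family_alpha, n_tests):
    """Per-comparison significance level under the Sidak correction."""
    return 1 - (1 - family_alpha) ** (1 / n_tests)


def two_tailed_critical_z(alpha):
    """Standard normal quantile leaving alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


n_genres = 5
n_tests = n_genres * (n_genres - 1) // 2  # 10 pairwise comparisons
alpha_per_test = sidak_alpha(0.10, n_tests)
z_critical = two_tailed_critical_z(alpha_per_test)
```

Rounded to the precision used in the text, these give 0.0105 and 2.56, so a pairwise difference is declared significant when its absolute Z statistic exceeds 2.56.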
5. Conclusion
This paper compared estimates of the degree of shot clustering using the classical, moment-based autocorrelation function and a rank-autocorrelation function resistant to the influence of outliers. The results show that the classical mAR
index underestimates both the level of shot clustering and the variation in style among the films in the sample. We also found that this index gives a misleading impression of changes in film style over time and that the trends identified by the rank mAR index are consistent with trends in other statistics describing the editing style of these films. Finally, the classical mAR index failed to identify key differences between films in the sample when sorted by genre. These results show that the mAR index can be a useful statistic of film style but that it is necessary to use robust methods due to the presence of outliers in shot length data.

Because the power spectral density of a time series is the Fourier transform of its autocorrelation function, the problem of outliers in shot length data will be transmitted to the spectral analysis of time series. This raises questions about Cutting, De Long, and Nothelfer's claim that the editing of films in the sample shows an increasing tendency to be well fitted by a 1/f noise pattern over time. This should not lead us to reject the idea that a 1/f noise pattern is a characteristic of film editing. On the contrary, given that the rank mAR index indicates the correlation between shots has been underestimated, it is likely that the role of 1/f noise in entraining viewers' attention in the cinema has also been underestimated due to the incorrect identification of white noise in the composite power spectra of each film. Future research on the relationship between film style and attention will therefore need to employ robust methods of spectral analysis (see, for example, Spangl, 2008) to determine if this is the case.
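The link between the autocovariance function and the spectrum invoked here is the Wiener-Khinchin relation. Writing $S(f)$ for the power spectral density at frequency $f$ (a symbol introduced only for this illustration) and $\gamma(h)$ for the autocovariance at lag $h$, a standard statement for a weakly stationary discrete-time series is

$$S(f) = \sum_{h=-\infty}^{\infty} \gamma(h) \, e^{-2 \pi i f h},$$

so any bias that outliers introduce into the estimated $\gamma(h)$ is carried directly into the estimated $S(f)$.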
Acknowledgements

I would like to thank Mike Baxter for a timely observation on an early draft of this article.

References

Abdi, H. and Williams, L. J. (2010). Principal components analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2, 433-459.
Ahdesmaki, M., Lahdesmaki, H., Pearson, R., Huttunen, H., and Yli-Harja, O. (2005). Robust detection of periodic time series measured from biological systems. BMC Bioinformatics 6, 117.
Chatfield, C. (2004). The Analysis of Time Series: An Introduction, 6th Edition. Chapman & Hall/CRC, Boca Raton, FL.
Cutting, J. E., De Long, J. E., and Nothelfer, C. E. (2010). Attention and the evolution of Hollywood film. Psychological Science 21, 432-439.
Ferguson, T. S., Genest, C., and Hallin, M. (2000). Kendall's tau for serial dependence. Canadian Journal of Statistics 28, 587-604.
Hallin, M. and Puri, M. L. (1992). Statistical Factor Analysis and Related Methods: Theory and Applications. Wiley, New York.
Jacoby, W. G. (2000). Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies 19, 577-613.
Ma, Y. and Genton, M. G. (2000). Highly robust estimation of the autocovariance function. Journal of Time Series Analysis 21, 663-684.
Maronna, R., Martin, D., and Yohai, V. (2006). Robust Statistics: Theory and Methods. John Wiley and Sons, Chichester.
Spangl, B. (2008). On Robust Spectral Density Estimation. Ph.D. Thesis, Vienna Technical University.
Wald, A. and Wolfowitz, J. (1943). An exact test of randomness in the nonparametric case based on serial correlation. Annals of Mathematical Statistics 14, 378-388.
Supplementary material

Key: Med = median shot length (in seconds), IQR = interquartile range (in seconds), PC1 = first principal component score, cmAR = classical modified autoregressive index, rmAR = rank modified autoregressive index
Year  Genre       Med   IQR    PC1  cmAR  rmAR  Title
1935  Comedy      4.0   6.2  -0.21   N/A  4.87  A Night at the Opera
1935  Drama       5.2   7.0   0.31  2.49  4.94  A Tale of Two Cities
1935  Drama       6.5   8.0   0.89  0.73  2.61  Anna Karenina
1935  Action      4.4   4.8  -0.31  2.70  3.89  Captain Blood
1935  Drama       5.0   7.4   0.31  2.29  4.34  Les Miserables
1935  Drama       4.5   7.6   0.18  2.51  4.83  Mutiny on the Bounty
1935  Drama       4.1   6.5  -0.13  2.67  3.31  The 39 Steps
1935  Drama       7.4   9.0   1.34  1.62  2.41  The Informer
1935  Comedy      5.4   7.3   0.42  3.09  3.93  Top Hat
1935  Drama       4.0   5.7  -0.29  2.43  4.55  Westward Ho
1940  Animation   6.5   8.8   1.02  3.33  4.45  Fantasia
1940  Drama       4.3   5.9  -0.16  1.92  3.92  Foreign Correspondent
1940  Drama       6.8   9.0   1.15  1.48  2.45  Grapes of Wrath
1940  Animation   4.0   4.4  -0.51  2.36  2.45  Pinocchio
1940  Drama       4.8   7.5   0.26  1.27  2.06  Rebecca
1940  Action      4.3   6.3  -0.10  4.36  5.92  Santa Fe Trail
1940  Comedy      9.2  14.3   2.80   N/A  1.07  The Great Dictator
1940  Drama       7.1  13.0   1.91  1.17  2.35  The Letter
1940  Adventure   3.6   4.0  -0.70  2.07  4.19  Thief of Bagdad
1945  Drama       6.0   9.0   0.89  1.62  2.47  Bell's of St. Mary's
1945  Action      9.1  12.9   2.53  2.00  2.85  Blood On The Sun
1945  Drama       9.2  12.9   2.57  1.65  3.24  Brief Encounter
1945  Drama      12.3  16.2   4.11   N/A  0.58  Detour
1945  Adventure   5.8  13.3   1.54  0.91  1.75  In Pursuit to Algiers
1945  Drama       6.0   9.1   0.91  1.93  2.00  Leave Her to Heaven
1945  Drama       8.2  11.6   2.03  1.67  2.75  Lost Weekend
1945  Drama       5.1   9.4   0.67  1.75  3.46  Spellbound
1950  Drama       4.9   8.4   0.44  2.35  1.83  All About Eve
1950  Comedy      5.0  13.2   1.26  4.43  7.57  Annie Get Your Gun
1950  Comedy      6.9  16.5   2.42  1.93  2.49  Born Yesterday
1950  Comedy      7.3  14.4   2.20  1.60  2.51  Cheaper By The Dozen
1950  Animation   3.1   2.8  -1.06  3.18  3.18  Cinderella
1950  Comedy     13.2  26.7   6.13  0.87  1.23  Harvey
1950  Drama       6.6   8.1   0.94  1.91  1.84  The Asphalt Jungle
1950  Action      4.0   5.1  -0.39  3.50  4.90  The Flame and the Arrow
1955  Drama       6.0   7.6   0.66  1.72  2.75  Battle Cry
1955  Drama       5.1   7.8   0.41  1.91  2.83  East Of Eden
1955  Animation   3.8   3.6  -0.70  2.99  2.84  Lady And The Tramp
1955  Comedy      6.5   8.7   1.01  1.41  2.00  Mr Roberts
1955  Drama       4.9   6.9   0.19  1.67  2.90  Night Of The Hunter
1955  Drama       4.8   6.0   0.01  3.28  4.54  Rebel Without A Cause
1955  Comedy     11.3  26.1   5.42  1.27  1.26  Seven Year Itch
1955  Comedy      5.1   7.1   0.29  1.36  2.00  The Ladykillers
1955  Comedy      5.1   7.2   0.31  1.03  1.25  The Trouble With Harry
1960  Drama       6.1  10.7   1.21  2.23  3.11  Butterfield 8
1960  Action     13.8  20.3   5.27  2.93  4.42  Exodus
1960  Drama       7.5  10.9   1.69  1.18  1.57  Inherit the Wind
1960  Adventure   4.5   5.7  -0.13  2.16  4.45  The Magnificent Seven
1960  Comedy      7.5   9.4   1.44  2.70  1.94  Ocean's 11
1960  Drama       5.2   8.2   0.50  1.40  2.74  Peeping Tom
1960  Action      4.9   6.8   0.18  2.28  3.68  Spartacus
1960  Adventure   3.1   3.7  -0.91  3.92  5.66  Swiss Family Robinson
1960  Comedy      8.3  15.7   2.74  2.62  3.58  The Apartment
1960  Adventure   3.4   5.0  -0.60  2.10  3.78  The Time Machine
1965  Drama       5.9   7.2   0.57  1.66  2.45  Dr Zhivago
1965  Adventure   3.3   3.5  -0.88  3.04  5.26  Flight Of The Phoenix
1965  Adventure   3.9   4.4  -0.54  2.03  4.65  Those Magnificent Men in Their Flying Machines
1965  Comedy      2.6   2.8  -1.22  2.02  2.96  Help
1965  Drama       4.0   5.2  -0.38  2.51  2.75  Shenandoah
1965  Drama       4.2   5.1  -0.33  2.64  4.73  Sound Of Music
1965  Comedy      3.4   4.0  -0.77  0.81  2.32  That Darn Cat
1965  Action      4.0   6.1  -0.23  2.73  5.29  The Great Race
1965  Action      2.5   2.7  -1.27  3.22  5.40  Thunderball
1965  Comedy      4.2   6.4  -0.11  2.11  4.20  What's New Pussycat
1970  Drama       4.7   6.2   0.01  1.79  2.50  Airport
1970  Animation   3.3   2.7  -1.01  2.57  3.15  Aristocats
1970  Action      2.8   3.7  -1.01  3.62  5.79  Beneath The Planet Of The Apes
1970  Comedy      5.6  10.8   1.06  1.71  4.46  Catch 22
1970  Drama       3.6   4.9  -0.55  3.14  3.70  Five Easy Pieces
1970  Action      4.1   5.5  -0.29  2.98  4.57  Kelly's Heroes
1970  Drama       5.0   7.7   0.36  2.73  4.90  Patton
1970  Action      4.5   5.9  -0.10  2.38  4.54  Tora! Tora! Tora!
1975  Drama       9.8  11.4   2.51  2.86  3.73  Barry Lyndon
1975  Drama       3.4   4.9  -0.62  2.80  4.09  Three Days Of The Condor
1975  Drama       3.1   3.4  -0.96  3.22  4.67  Dog Day Afternoon
1975  Adventure   3.6   4.7  -0.59  1.76  5.67  Jaws
1975  Action      4.9   5.8   0.01  2.07  2.81  The Man Who Would Be King
1975  Comedy      2.6   3.5  -1.11  2.36  3.29  Monty Python And The Holy Grail
1975  Drama       3.6   4.1  -0.69  2.58  3.59  One Flew Over the Cuckoo's Nest
1975  Comedy      3.7   6.9  -0.19  1.71  3.56  Return Of The Pink Panther
1975  Comedy      3.3   4.4  -0.73  2.39  5.12  The Rocky Horror Picture Show
1975  Drama       6.1   8.6   0.86  1.79  3.40  Shampoo
1980  Comedy      4.3   5.8  -0.18  1.84  3.27  Airplane
1980  Drama       5.3   9.2   0.70  2.34  3.75  Coal Miner's Daughter
1980  Action      2.9   3.1  -1.08  3.45  5.45  The Empire Strikes Back
1980  Comedy      3.9   4.5  -0.52  2.07  3.26  Nine To Five
1980  Drama       3.5   4.5  -0.65  2.25  4.27  Ordinary People
1980  Comedy      3.3   3.5  -0.88  3.46  5.47  Popeye
1980  Comedy      4.4   5.4  -0.21  1.20  2.41  Stir Crazy
1980  Action      2.5   2.9  -1.24  3.96  6.17  Superman 2
1980  Adventure   3.1   3.5  -0.95  2.54  4.06  The Blue Lagoon
1980  Drama       3.5   4.0  -0.73  2.70  4.86  Urban Cowboy
1985  Action      2.7   3.9  -1.01  3.51  7.95  Back To The Future
1985  Adventure   3.9   4.6  -0.51  2.99  5.18  Cocoon
1985  Drama       3.5   3.9  -0.75  2.94  5.55  Out Of Africa
1985  Comedy      3.0   3.8  -0.93  2.39  4.57  Police Academy 2
1985  Action      2.0   2.1  -1.53  2.96  6.90  Rambo II
1985  Comedy      2.5   3.0  -1.22  1.97  3.58  Spies Like Us
1985  Drama       4.7   6.0  -0.02  3.01  3.87  The Color Purple
1985  Drama       4.2   4.9  -0.36  2.81  4.41  Witness
1990  Action      2.8   3.2  -1.09  3.65  5.17  Dick Tracy
1990  Action      2.1   2.3  -1.46  2.69  4.52  Die Hard 2
1990  Comedy      3.4   3.9  -0.78  3.26  5.34  Ghost
1990  Drama       4.2   5.5  -0.26  2.08  3.05  Goodfellas
1990  Comedy      3.1   3.4  -0.96  2.93  4.34  Home Alone
1990  Action      4.7   5.7  -0.07  2.18  3.82  Hunt For Red October
1990  Comedy      3.8   4.5  -0.56  1.80  2.83  Pretty Woman
1990  Action      2.8   3.2  -1.09  2.38  4.16  Teenage Mutant Ninja Turtles
1990  Action      2.4   2.8  -1.29  3.41  6.78  Total Recall
1995  Comedy      2.7   3.4  -1.09  1.92  3.79  Ace Ventura 2
1995  Adventure   3.5   3.7  -0.78  2.61  3.76  Apollo 13
1995  Action      2.4   2.8  -1.29  3.72  6.40  Batman Forever
1995  Comedy      4.1   5.8  -0.24  3.20  4.26  Casper
1995  Action      2.3   2.7  -1.33  3.60  5.95  Goldeneye
1995  Adventure   2.5   2.6  -1.29  3.57  5.29  Jumanji
1995  Animation   2.8   2.7  -1.17  2.51  3.51  Pocohontas
1995  Drama       3.8   4.2  -0.60  1.95  4.62  Sense And Sensibility
1995  Animation   2.1   2.1  -1.50  2.69  3.58  Toy Story
2000  Adventure   4.5   6.7   0.03  3.43  5.67  Castaway
2000  Action      2.0   2.1  -1.53  1.82  6.82  Charlie's Angels
2000  Animation   2.8   2.2  -1.26  3.26  4.36  Dinosaur
2000  Drama       4.2   3.8  -0.54  2.37  3.44  Erin Brockovich
2000  Comedy      2.6   3.0  -1.19  2.58  4.17  The Grinch Who Stole Christmas
2000  Comedy      2.3   2.6  -1.35  2.54  3.46  Scary Movie
2000  Adventure   3.3   3.5  -0.88  3.55  5.47  The Perfect Storm
2000  Comedy      2.5   2.8  -1.25  2.93  4.28  What Women Want
2000  Action      2.0   2.1  -1.53  3.32  6.10  Xmen
2005  Comedy      2.8   2.7  -1.17  2.46  3.61  Hitch
2005  Adventure   2.6   2.5  -1.27  4.59  5.75  King Kong
2005  Comedy      2.3   2.0  -1.45  1.61  3.34  The Longest Yard
2005  Animation   3.0   3.6  -0.96  2.33  3.67  Madagascar
2005  Action      2.8   3.4  -1.06  2.73  4.77  Mr and Mrs Smith
2005  Drama       4.3   4.0  -0.48  2.10  2.56  Walk The Line
2005  Comedy      2.4   2.4  -1.35  2.82  5.68  The Wedding Crashers

Sixteen films were not included in the study: Philadelphia Story (1940), Anchors Aweigh (1945), Mildred Pierce (1945), King Solomon's Mines (1950), Sunset Boulevard (1950), To Catch a Thief (1955), Little Big Man (1970), Mash (1970), Jewel of the Nile (1985), Rocky IV (1985), Dances with Wolves (1990), The Usual Suspects (1995), Mission Impossible 2 (2000), Chicken Little (2005), Harry Potter and the Goblet of Fire (2005), and Star Wars: Episode III The Revenge of the Sith (2005).

Received December 13, 2012; accepted August 19, 2013.

Nick Redfern
Department of Journalism, Media, and Business
Leeds Trinity University
Leeds, LS18 5HD, United Kingdom

Figure 1: Scatterplot of (a) classical and (b) rank mAR indices against year of release with fitted LOESS trendlines and bootstrapped 95% confidence intervals.
(Panels: (a) Classical mAR index; (b) Rank mAR index. Axes: Year against mAR index.)

Figure 2: (a) Scatterplot of median shot lengths and interquartile ranges for high grossing films at the US box office, 1935 to 2005 (N = 134). (b) Scatterplot of first principal component score against year of release with fitted LOESS trendline and bootstrapped 95% confidence interval. The dummy variable is a combination of these descriptive statistics, and accounts for 96.5% of the variance of the original variables. PC scores are calculated from the correlation matrix, and the correlation of both the median and IQR with the first principal component is 0.98. The loading on the first principal component is 0.7071 for both the median and the IQR.
(Panels: (a) Descriptive statistics, Median (s) against IQR (s); (b) First Principal Component score, Year against Score.)
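
The principal component score described in the caption of Figure 2 can be sketched in Python as follows. For two standardised variables with positive correlation r, the first principal component of the correlation matrix weights both variables equally, with loading 1/√2 ≈ 0.7071, and carries a share (1 + r)/2 of the total variance. The function names are hypothetical, and the example uses only the first five films of the supplementary table, so the resulting scores are illustrative and will not match the published PC1 values, which are standardised over all 134 films.

```python
import math


def standardise(values):
    """Centre to mean zero and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component(median, iqr):
    """PC1 of two standardised variables, assuming their correlation r is positive."""
    z_med, z_iqr = standardise(median), standardise(iqr)
    n = len(median)
    r = sum(a * b for a, b in zip(z_med, z_iqr)) / (n - 1)
    loading = 1 / math.sqrt(2)  # the same loading for both variables
    scores = [loading * (a + b) for a, b in zip(z_med, z_iqr)]
    explained = (1 + r) / 2  # share of total variance carried by PC1
    return scores, r, loading, explained


# Median and IQR (in seconds) for the first five films in the supplementary table.
median = [4.0, 5.2, 6.5, 4.4, 5.0]
iqr = [6.2, 7.0, 8.0, 4.8, 7.4]
scores, r, loading, explained = first_component(median, iqr)
```

Under the same two-variable algebra, the 96.5% of variance reported in the caption corresponds to a correlation of 0.93 between the median and the IQR, and the correlation of each variable with the first component is √0.965 ≈ 0.98, which agrees with the caption.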

Figure 3: Classical (left) and rank (right) mAR indices of high grossing films at the US box office, 1935 to 2005, sorted by genre. The average beanlines are set to the sample medians.
(Axes: genre (Action, Adventure, Animation, Comedy, Drama) against mAR Index.)