Is Accuracy Only For Probability Samples? Comparing Probability and Non-probability Samples in a Country with Almost Full Internet Coverage

Johan Martinsson, Stefan Dahlberg and Sebastian Oskar Lundmark
Department of Political Science, University of Gothenburg

Abstract

Commercial on-line panels based on non-probability samples have begun to be widely used not only in traditional market research but also in more academic research. But is the quality and accuracy of such data comparable to that of probability based samples? The overall aim of this study is to compare the quality of probability based and non-probability based on-line panels in Sweden, a country with almost full internet coverage. We proceed in two steps. Firstly, we compare the accuracy of three survey modes all using random samples: a postal survey, a telephone survey, and a web survey. Secondly, we compare the accuracy of two commercial non-probability based panels with two commercial probability based panels, using the traditional mail and telephone surveys as benchmark surveys. Demographics are compared to government records, and attitudes are compared to benchmark studies of high quality and high response rate. In order to allow comparisons, seven surveys with comparable questions were run at approximately the same time. We compare the accuracy of the four commercial on-line panels both with and without weights. In contrast to previous studies, the results indicate a surprising similarity in terms of accuracy between probability panels and non-probability panels. The two non-probability based on-line panels do not seem to be less accurate than probability based on-line panels in terms of demographics, nor do their estimates of political attitudes seem to differ more from traditional methods such as a high response rate mail survey. We conclude that a larger comparison based on more demographic and attitudinal variables is needed.

Paper to be presented at the AAPOR conference in Boston, May 15-19, 2013.
Introduction

Over the last decades, public opinion research has struggled with continuously declining response rates in surveys using probability samples. This trend has been attributed to respondents being harder to contact as well as less prone to answer surveys in general. Lower response rates have been evident regardless of survey mode. These problems facing survey research have contributed to casting doubt on the advantages of random samples. Recently, various non-probability based surveys, such as self-recruited on-line surveys and panels, have emerged and become increasingly used. Today, a large part of mainstream media use public opinion polls from companies using self-selected access panels rather than probability based samples, on the logic that probability samples with low response rates do not perform better than a large non-probability sample with high participation rates (e.g., Crampton 2007; Kellner 2004, 2007; Zogby 2007, 2009). Academic researchers have also started using non-probability based surveys to a larger extent, not only for experiments (e.g. Druckman et al. 2012) but also for cross-sectional or panel data collection (e.g. Crewe 2005; Twyman 2008). In contrast, Yeager and colleagues (2011), comparing RDD telephone surveys and non-probability internet surveys, found that the probability samples performed better than their non-probability counterparts. However, when comparing these different samples they could not safely assess whether the differences found were due to mode or simply to sampling technique. Similarly, Chang and Krosnick (2009, 34), comparing a probability to a non-probability internet sample, found that the probability sample yielded better demographic estimates, though with more measurement error than the non-probability sample. The principal caveat to the work of Yeager and colleagues (2011) and Chang and Krosnick (2009) is that they compare internet samples from 2000-2005 in a country where only approximately 61 percent of the population claimed to have internet access at home in 2005, and 71 percent in 2010 (Smith 2010, 6; Pew Research Center 2012). In 2010 the same figure in Sweden was 92 percent. Perhaps the differences between the internet samples could be attributed to internet access being less extensive in the U.S.? When internet access becomes more common, perhaps even non-probability samples come closer to representing the population?

This study compares three cross-sectional probability surveys as well as four surveys using on-line panels of pre-recruited respondents (two probability based panels and two non-probability based panels). The first question we address concerns how three survey modes differ when using the same kind of sampling (random sampling from the public population register). The second question we address concerns how and to what extent on-line surveys using probability
based and non-probability based on-line panels differ, both from each other and from benchmark data. We compare primary and secondary demographics as well as a set of political attitudes. In addition, these comparisons are made in a different context from previous research, a context where internet access is as common as owning a telephone and thus not as socially exclusive as in the U.S. prior to 2010. The first section of this paper presents a brief review of the research field. The second section presents the data and research design. The third section presents the comparisons of the different samples.

Previous research

Previous research indicates that different sampling procedures as well as different survey modes create biases in responses, and that random sampling yields more accurate estimates than self-selected samples. Malhotra & Krosnick (2007) compared two different survey modes: a self-selected internet survey and a randomly sampled face-to-face interview. They found that probability sampling combined with face-to-face interviewing proved more reliable for achieving representativeness. This led them to suggest that "researchers interested in assuring the accuracy of their findings in describing populations should rely on face-to-face surveys of probability samples rather than Internet samples of volunteer respondents" (Malhotra & Krosnick 2007, 286). One potential caveat of their results is that the surveys used different wordings and answer alternatives for their questions, rendering their comparisons only tentative (Malhotra & Krosnick 2007, 291). Another potential caveat is that they compare different survey modes (face-to-face and web surveys). If one could compare different sampling procedures with the same questionnaire as well as the same survey mode, one could move beyond mere indications of whether different sampling procedures bias responses and representativeness.

To rectify these shortcomings, Chang and Krosnick (2009) compared the representativeness of one probability telephone survey, one probability internet survey and one non-probability internet survey. They found that the internet surveys performed better on respondents' self-reports, and that satisficing and measurement error were higher in the probability samples than in the non-probability internet samples. However, when comparing the probability to the non-probability sample they found, in line with the findings of Malhotra and Krosnick (2007), that probability samples fared considerably better in terms of representativeness, even after weighting (Chang & Krosnick 2009, 34). The same conclusion was reached in Yeager et al. (2011), where probability telephone surveys from 2004/2005 and probability internet surveys from 2004/2005 and 2009 were compared to an internet non-probability sample from the same years (Yeager et al. 2011, 713-715). The accuracy of the probability
samples was better than that of the non-probability samples even when the response rates of the probability samples were low. It thus seems that we should expect the four non-probability samples in our study to perform comparatively poorly relative to the three probability samples (Yeager et al. 2011, 737).

Given the studies outlined above, it might seem redundant to yet again analyze how representative an internet sample (whether non-probability or probability sampled) can become using different modes of sampling. However, the studies above mostly stem from data collected shortly after the turn of the millennium and are limited to the USA (Malhotra & Krosnick 2007; Chang & Krosnick 2009; Yeager et al. 2011). By the year 2012, internet usage had become an even more important part of citizens' lives in Sweden. As many as 92 percent of the population claim to have access to the internet in their homes, whereas only 82.5 percent of the same population claim to have a regular phone (the same figures in the year 2000 were 65.0 percent internet users and 98.1 percent telephone users).1 In a digitalized era and a country like Sweden, where internet access is even more common than telephones, survey modes using the internet as a basis for sampling might prove not to be as biasing as previous studies have claimed.

Data and measurements

This study compares seven surveys with a set of identical questions. For all but one, the fieldwork was conducted almost simultaneously, in June 2012. The exception is the SOM institute survey, which uses an extended period of fieldwork, the main part of which was conducted in September and October 2012. Three surveys are traditional cross-sectional studies using random samples from the Swedish individual level population register, which is frequently updated and accurate. These three surveys represent different survey modes (mail survey, telephone survey, and on-line survey) but use the same sampling frame and method. The web survey (LORe) has to use a different contact mode from the actual survey mode, since it starts from a random population sample and no register of e-mail addresses exists. The LORe survey therefore started with a postal invitation asking people to participate in the actual on-line survey. None of these three surveys offer participants any kind of material incentive.

1 Data from the national SOM survey in 2011.
Table 1. Overview of seven surveys

Survey company     Mode                 Sampling method                                 Response rate
The SOM institute  Mail questionnaire   Random population sample                        53
Detector           Telephone interview  Random population sample                        51
LORe               Web survey           Random population sample                        8
Novus              Web survey           Probability based recruitment into web panel    -
TNS Sifo           Web survey           Probability based recruitment into web panel    -
YouGov             Web survey           Self-recruitment into web panel                 -
Cint               Web survey           Self-recruitment into web panel (85%)           -
In addition, four commercial on-line panels were used. Two of these work with probability based recruitment into their panel (Novus and TNS Sifo), while YouGov works with a strategy of self-recruitment into an open panel. TNS Sifo describes the recruitment to their on-line panel in the following way: "The TNS Sifo web panel is recruited from random nationally representative telephone surveys and from nationally representative postal surveys". Novus describe their recruitment in the following way: "We use probability based recruitment, that is, not self-recruitment. This means the initiative always comes from Novus, and that no one can register or apply to be in the panel. Recruitment is made through several different channels, mainly through telephone interviews but also through targeted personal invitations to underrepresented target groups". Lastly, Cint is not a company running its own on-line panel; rather, Cint provides access to a large number of other on-line panels that it hosts. In our comparison, we use Cint as an example of a self-recruited on-line panel and chose not to specify from which panels our respondents should be selected. Currently, in Sweden, this means that we receive on average 85 percent of respondents from self-recruited panels and 15 percent from probability based panels. All four commercial on-line panels provided participation rates upon request, but since in our view such participation rates are not as meaningful as response rates from random population samples, they are not included in table 1. All four commercial on-line panels offer their respondents various kinds of incentives and rewards for participating in their surveys.

In order to enable meaningful comparisons we made efforts to ensure that all seven surveys contained a set of identical survey questions. It was also important to make sure there were government records or census data available to which the survey results could be compared. However, due to limited funds, this study only contains a rather limited set of primary and secondary demographic indicators that can be compared to benchmark data from the national population register.
Comparing three survey modes

Demographics

The first part of our analysis focuses on mode effects, comparing three different data sources benchmarked against available census data. The three surveys are based on random samples of Swedish citizens from the national population register but differ in survey mode; SOM is carried out as a traditional postal survey, Detector as a telephone survey, and LORe as a web survey. Given a small but expected unequal distribution of internet access and registered telephone numbers among different age cohorts, it is plausible to expect that different people will tend to be attracted to different survey modes.

In table 2 one can see that all three samples are fairly accurate in terms of gender, although the SOM survey has an overrepresentation of women. Regarding the age composition of the three samples, we find that both SOM (postal) and LORe (web) underrepresent the youngest respondents. This is a somewhat surprising result, since one would expect the younger cohorts to be more easily attracted to web surveys than other cohorts. The fact that the two surveys are similar in terms of the distribution of respondents across age cohorts suggests that younger people seem less inclined to participate in surveys carried out in a 'traditional' mode, in the sense that the internet survey mimics the paper survey even though the LORe survey is distributed over the internet. Younger people are also in general more difficult to persuade to participate in any kind of survey. And since the LORe web survey has a distinctly lower response rate, this might have contributed to the underrepresentation of younger people. Although we cannot know this, it is possible that a higher response rate in the web survey would have meant better accuracy in terms of age.
Table 2. Demographic accuracy of three survey modes, unweighted estimates (deviation from Statistics Sweden in italics)

                          Statistics  The SOM
                          Sweden      institute  Diff.  Detector  Diff.  LORe  Diff.
Sex a
  Man                     50          45         -5     49        -1     47    -3
  Woman                   50          55         5      51        1      53    3
Age a
  18-30                   27          18         -9     23        -4     17    -10
  31-40                   21          18         -3     20        -1     18    -3
  41-50                   20          23         3      20        0      18    -2
  51-60                   17          19         2      17        0      23    6
  61-70                   16          22         6      20        4      23    7
Education b
  College degree*         38          43         5      47        9      53    15
Labour market position c
  Unemployed              6           6          0      5         -1     6     0
  Working                 80          75         -5     81        1      76    -4
Driving license d
  Yes                     81          -          -      87        6      90    9

a Sex and age pertain to the full target range of 18-70 years of age. b Education pertains to the age span 25-64 years. c Employment status pertains to the age span 20-64 years. d Driver's license pertains to the age span 18-64 years. * = Number is an estimate; no exact official figure concerning "degree" is available.
Considering education, or more specifically the proportion of respondents with at least a college degree, all three surveys over-report compared to the census records. This is especially true for the LORe survey, where 53 percent of the respondents hold a college degree compared to 38 percent in the census data. Reaching respondents through a web survey thus seems to exclude the less educated to a greater extent than the other modes. However, we should also keep in mind that the web survey (LORe) yielded by far the lowest response rate.

In order to obtain a more comprehensive summary measure of the deviations between the three surveys and the population register data, table 3 presents absolute deviations averaged over all categories of each indicator, and then in turn averaged over our five indicators. On average, the Detector telephone survey has the smallest deviation (the highest accuracy) over all five indicators, while the LORe web survey has the largest. The SOM postal survey is superior in terms of education but performs less accurately on the other indicators. When it comes to primary demographics such as age and gender, the LORe web survey is surprisingly accurate given its low response rate and actually comes close to the SOM mail survey, a survey with six times its response rate.
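Formally, the summary measure described above can be written as follows; this is a reconstruction from the verbal description, and the notation is ours. With \hat{p}_{jk} the survey estimate and p_{jk} the Statistics Sweden figure for category k of indicator j,

\bar{D}_j = \frac{1}{K_j} \sum_{k=1}^{K_j} \bigl| \hat{p}_{jk} - p_{jk} \bigr|, \qquad \bar{D} = \frac{1}{J} \sum_{j=1}^{J} \bar{D}_j,

where K_j is the number of categories of indicator j and J = 5 (J = 4 for the SOM survey, which lacks the driving license item). For example, the Detector age deviations in table 2 give (4 + 1 + 0 + 0 + 4)/5 = 1.8, the figure reported for Detector's age accuracy in table 3.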
Table 3. Demographics, average absolute deviation from Statistics Sweden

                          The SOM institute  Detector  LORe
Average, 5 indicators     4.3                3.8       6.9
Sex                       5.0                1.0       3.0
Age                       4.6                1.8       5.6
Education                 5.0                9.0       15.0
Labour market situation   2.5                1.0       2.0
Driving license           -                  6.0       9.0
Attitudes
Given the discrepancies found in terms of demography between the three samples, it is plausible to expect these differences to have an impact on point estimates of political attitude questions. The obvious problem when comparing attitudes is that there is no natural benchmark against which to measure and compare accuracy. We still believe a comparison between the three surveys is of relevance and may provide some insight into how the three modes differ.

Table 4. Comparing attitudes in three different survey modes, unweighted estimates

                               The SOM institute  Detector  LORe
Political interest
  Very interested              12                 13        21
  Fairly interested            43                 54        57
  Not particularly interested  36                 28        20
  Not interested at all        9                  5         1
Trust in politicians
  Very high                    1                  3         2
  Fairly high                  39                 57        47
  Fairly low                   50                 32        40
  Very low                     9                  8         11
Left-right position
  Left (0-3)                   -                  27        30
  Middle (4-6)                 -                  44        41
  Right (7-10)                 -                  29        30
Congestion charges Gothenburg
  Against                      62                 55        49
  In favor                     21                 35        35
  Don't know/uncertain         18                 10        16
What appears in table 4 is that LORe (web) has a larger proportion of politically interested respondents, which may be due to the result noted in tables 2 and 3: the overrepresentation of highly educated people in the same survey. Of even more interest is the larger proportion of less politically interested respondents in the SOM survey. In this respect, postal surveys seem more efficient at reaching that part of the 'hard to get' population. But once again, the better performance of the postal survey might also be due to the fact that the SOM institute employs a much more intense and prolonged fieldwork with a very large number of reminders and follow-ups. It is not necessarily the mode (paper) in itself. It is also possible that those with higher political interest have a higher general motivation to participate in surveys and see them as useful. That the web survey attracts a larger share of respondents with high political interest can certainly be related to this, since it takes slightly more initiative and motivation to participate in the web survey than in the telephone survey. When an interviewer calls, if the respondent picks up the phone, all they have to do in order to participate is to talk to the interviewer for five to six minutes. For the web survey, on the other hand, they have to read the postcard they received in the mail, decide to go to their computer or similar device, and actually type in the survey address and enter their log-in information. This higher degree of motivation is likely to be correlated with an interest in political or social issues.

The differences between SOM and LORe, and to some extent Detector, are also reproduced in the trust item, where the SOM survey has a higher proportion of respondents with low levels of trust in politicians. Regarding the respondents' left-right positions (only available for the Detector and LORe surveys) and the item on congestion charges, no clear patterns appear. However, we do see a lower share of "don't know" in the telephone survey, as expected. This is clearly consistent with a higher degree of social desirability bias in telephone surveys.

Overall, mode effects exist and they have implications for the representativeness of a survey even if sampling procedures stay the same. Postal surveys seem to be better at capturing otherwise 'hard to get' respondents such as 'low-trusters' and less politically interested respondents. We could also discern an overrepresentation of educated and politically interested respondents in the web survey. It should be stressed that comparing mode effects from three probability based samples is a fragile endeavor, and preferably comparisons should be founded on several random samples, since the very nature of the probability sampling procedure implies that we will obtain different estimates from time to time by pure randomness. Further, the large difference in response rate between the mail survey and the web survey makes this comparison somewhat difficult. The observed differences in this study should thus be interpreted cautiously.
Comparing probability and non-probability on-line panels
In the second part of our analysis we shift our attention from survey modes to sampling procedures. In table 5 we investigate differences in demography compared to census data, but now with the mode of the surveys (web) held constant while the sampling of respondents varies. Table 5 presents four surveys carried out by four different commercial on-line panels. As seen in table 1, the recruitment to these four panels differs in the sense that respondents in Novus and TNS Sifo are recruited into the panels using probability sampling, while YouGov is based on self-recruitment into its panel, and the sample from Cint is a combination of both, although clearly dominated by self-recruitment in this case (85 percent). In the light of previous research, the natural expectation is that probability based on-line panels should have superior accuracy to self-recruited on-line panels.
Demographics
We start by examining unweighted estimates briefly, and then move on to the accuracy of the weighted estimates of demographic indicators.
Table 5. Demographic accuracy of four on-line panels, unweighted estimates (deviation from Statistics Sweden in italics)

                          Statistics
                          Sweden      Novus  Diff.  TNS Sifo  Diff.  YouGov  Diff.  Cint  Diff.
Sex a
  Man                     50          50     0      49        -1     45      -5     47    -3
  Woman                   50          50     0      51        1      55      5      53    3
Age a
  18-30                   27          19     -8     8         -19    21      -6     30    3
  31-40                   21          21     0      15        -6     21      0      20    -1
  41-50                   20          18     -2     23        3      23      3      21    1
  51-60                   17          18     1      27        10     19      2      14    -3
  61-70                   16          24     8      28        12     17      1      14    -2
Education b
  College degree*         38          44     6      47        9      40      2      37    -1
Labour market position c
  Unemployed              6           7      1      3         -3     -       -      11    5
  Working                 80          78     -2     83        3      -       -      64    -16
Driving license d
  Yes                     81          88     7      93        12     83      2      80    -1

a Sex and age pertain to the full target range of 18-70 years of age. b Education pertains to the age span 25-64 years. c Employment status pertains to the age span 20-64 years. d Driver's license pertains to the age span 18-64 years. * = Number is an estimate; no exact official figure concerning "degree" is available.
First, we examine the four on-line panel surveys without applying the post-stratification weights they provided. In the case of Cint we constructed the weights ourselves, while the other three provided their own weight variables. When examining the unweighted data, we should remember that all these survey companies intend their data to be used with weighting. Notably, all four surveys are quite similar in their gender composition. Worst off is the YouGov survey, which is based on a self-recruited sample, with women overrepresented by about five percentage points. This, however, is the same as in the case of the SOM mail survey in the previous section. Regarding age differences, the results are mixed and at first glance no clear division between the probability based and the self-recruited samples can be noted. Sifo clearly contains far too few of the youngest respondents, while Cint has a slightly too large proportion of respondents in this category. The two probability based samples are, however, overrepresented in terms of older people, and both also score low on younger people, which is an interesting result per se. This is also similar to the result in the previous section with the three probability based cross-sectional studies. In general it is reasonable to expect older people to be less inclined to participate in web-based surveys. The point here is that all surveys are carried out as web surveys, from which we would expect an underrepresentation of older people in general, no matter which sampling procedure is used. Instead, older people are overrepresented in the two probability sampled surveys, as is the proportion of highly educated people. The same pattern appears when it comes to having a driving license (which is reasonable, since one can expect a correlation between age, education and having a driving license). Since we do not have access to the actual sample who received the surveys, but only those who participated, it is hard to know exactly why the two probability based panels perform so clearly worse when it comes to young respondents. The rather large proportion of elderly respondents in the samples, apart from the Cint sample, which scored below the census records in this respect, is not so surprising since the age span for the oldest people included in the samples is set to 61-70 years of age, at the same time as about 85 percent of the Swedish population have internet access (Findahl 2010). Regarding the respondents' employment status, the two probability samples are quite accurate compared to the census records, while Cint has a rather large overrepresentation of unemployed respondents.

Let us now turn to the accuracy of the weighted estimates instead. These are presented in table 6.
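The panels' exact weighting procedures are not documented here, but since the weights target official census records on sex and age, a simple cell-based post-stratification conveys the idea. The sketch below is a minimal illustration under that assumption; the data, column names and census shares are hypothetical placeholders, not Statistics Sweden figures.

import pandas as pd
from collections import Counter

# Hypothetical respondents (in practice thousands of rows).
sample = pd.DataFrame({
    "sex":       ["man", "woman", "man", "woman", "man", "woman"],
    "age_group": ["18-30", "18-30", "31-40", "51-60", "61-70", "61-70"],
    "degree":    [1, 0, 1, 0, 1, 1],  # 1 = holds a college degree
})

# Hypothetical census shares for the sex x age-group cells that occur
# in the sample (placeholder values).
census_share = {
    ("man", "18-30"): 0.14, ("woman", "18-30"): 0.13,
    ("man", "31-40"): 0.11, ("woman", "51-60"): 0.08,
    ("man", "61-70"): 0.08, ("woman", "61-70"): 0.08,
}

# Observed share of each cell in the sample.
cells = list(zip(sample["sex"], sample["age_group"]))
counts = Counter(cells)
sample_share = {c: n / len(sample) for c, n in counts.items()}

# Post-stratification weight: population share / sample share.
sample["weight"] = [census_share[c] / sample_share[c] for c in cells]

# Weighted point estimate, e.g. the share holding a college degree.
est = (sample["weight"] * sample["degree"]).sum() / sample["weight"].sum()
print(f"Weighted college-degree share: {100 * est:.1f} percent")

Such cell weighting matches the weighted sample exactly to the census margins for sex and age, which is why those rows in table 6 are nearly error-free, while secondary demographics are only corrected insofar as they correlate with sex and age.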
Table 6. Demographic accuracy of four on-line panels, weighted estimates (deviation from Statistics Sweden in italics)

                          Statistics
                          Sweden      Novus  Diff.  TNS Sifo  Diff.  YouGov  Diff.  Cint  Diff.
Sex a
  Man                     50          51     1      51        1      50      0      50    0
  Woman                   50          49     -1     49        -1     50      0      50    0
Age a
  18-30                   27          25     -2     25        -2     27      0      26    -1
  31-40                   21          20     -1     19        -2     21      0      21    0
  41-50                   20          19     -1     20        0      20      0      20    0
  51-60                   17          18     1      18        1      17      0      17    0
  61-70                   16          18     2      18        2      15      -1     16    0
Education b
  College degree*         38          44     6      50        12     41      3      37    -1
Labour market position c
  Unemployed              6           7      1      4         -2     -       -      11    5
  Working                 80          77     -3     78        -2     -       -      66    -14
Driving license d
  Yes                     81          88     7      93        12     83      2      81    0

a Sex and age pertain to the full target range of 18-70 years of age. b Education pertains to the age span 25-64 years. c Employment status pertains to the age span 20-64 years. d Driver's license pertains to the age span 18-64 years. * = Number is an estimate; no exact official figure concerning "degree" is available.
Since all surveys are weighted according to the official census records in terms of gender and age, they all become highly accurate on these two primary demographics when weights are applied. Our three secondary demographics (education, employment status, and having a driving license) are, however, only affected by the weights to a small extent.

In table 7 the demographic accuracy of the four on-line panels is summarized, both with and without weights. The first striking thing when we compute an average deviation over the five indicators is that YouGov and Cint, the two non-probability based panels, both have a smaller average deviation from official census records than the two probability based panels (Novus and TNS Sifo). This is clearly contrary to previous research, for example Yeager et al. (2011). And it does not matter whether we compute this index from weighted or unweighted estimates. It should be noted, however, that Novus is fairly close to Cint, or to the unweighted number for YouGov, while TNS Sifo has a far higher average deviation even when applying their weights. Two things are worth stating already here. Firstly, we cannot at this stage put too much trust in these results due to the low number of indicators (compared to, for example, Yeager et al. 2011). This makes the result less reliable. Neither can we put too much trust in a single survey from each panel when the results run counter to previous research. Larger, extended studies from the Swedish context are needed. Table 7 also shows us that the two non-probability based panels perform clearly better when it comes to secondary demographics such as education and driving license, the exception being labour market situation, where Cint vastly overrepresents the unemployed. One possible interpretation of these results is that the two probability based panels attract more people with higher social status, who are more integrated into society, while the two non-probability based panels attract more people with low social or socioeconomic status, and that this is to their benefit and makes them potentially more representative of the general population.

Table 7. Demographics, average absolute deviation from Statistics Sweden (weighted estimates), summary table

                                     Novus  TNS Sifo  YouGov  Cint
Average 5 indicators, weighted       3.5    5.7       1.3     2.2
Average 5 indicators, unweighted     3.7    7.0       2.9     3.1
Sex, weighted                        1.0    1.0       0.0     0.0
Sex, unweighted                      0.0    1.0       5.0     3.0
Age, weighted                        1.4    1.4       0.2     0.5
Age, unweighted                      3.8    10.0      2.4     2.0
Education, weighted                  6.0    12.0      3.0     1.0
Education, unweighted                6.0    9.0       2.0     1.0
Labour market situation, weighted    2.0    2.0       -       9.5
Labour market situation, unweighted  1.5    3.0       -       10.5
Driving license, weighted            7.0    12.0      2.0     0.0
Driving license, unweighted          7.0    12.0      2.0     1.0
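As a reading aid, the summary rows in table 7 are formed exactly as in table 3. For Novus, the five weighted indicator deviations average to

\frac{1.0 + 1.4 + 6.0 + 2.0 + 7.0}{5} = 3.48 \approx 3.5,

the figure in the first row. For YouGov (as for the SOM survey in table 3) the average is taken over four indicators, since the labour market item is missing: (0.0 + 0.2 + 3.0 + 2.0)/4 = 1.3.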
Attitudes

Using the best performing probability sample cross-sectional survey, in terms of response rate, demographics, and reaching the 'hard to get' population, we aim to benchmark how well the various on-line panels perform on a set of political attitudes. Since the Detector telephone survey and the SOM mail survey proved fairly similar in terms of response rate and overall demographic accuracy, the SOM mail survey is the best choice as quasi-benchmark, since the postal mode is much more similar to the web mode in terms of visual presentation and the lack of a personal interviewer. In table 8 the four commercial on-line panels' weighted estimates of political attitudes are presented and compared to those of the SOM survey (unweighted estimates are found in table A.1 in the appendix).
Table 8. Comparison of attitudes in four on-line panels, weighted estimates

                               The SOM
                               institute  Novus  TNS Sifo  YouGov  Cint
Political interest
  Very interested              12         20     17        19      17
  Fairly interested            43         51     49        46      49
  Not particularly interested  36         26     31        30      30
  Not interested at all        9          3      3         5       4
  Average absolute deviation   -          8.0    5.5       5.0     5.5
Trust in politicians
  Very high                    1          3      3         1       1
  Fairly high                  39         54     49        41      37
  Fairly low                   50         41     40        45      47
  Very low                     9          2      8         13      15
  Average absolute deviation   -          8.3    5.8       2.8     2.8
Left-right position
  Left (0-3)                   -          32     27        27      27
  Middle (4-6)                 -          32     37        36      42
  Right (7-10)                 -          36     36        37      31
Congestion charges Gothenburg
  Against                      62         55     55        58      55
  In favor                     21         30     30        25      29
  Don't know/uncertain         18         15     15        17      16
  Average absolute deviation   -          6.3    6.3       3.0     5.7
Mean average absolute
  deviation, 3 indicators      -          7.5    5.9       3.6     4.6
In terms of absolute deviations, both the Cint and the YouGov sample produce estimates closer to the benchmark, with mean average absolute deviations well below that of the Novus sample (4.6 and 3.6 versus 7.5). All four samples give point estimates quite far from the benchmark, but in general the two samples that start off self-recruited seem to produce point estimates of political attitudes closer to the benchmark than the two samples claiming to use probability based recruitment. Most of these differences likely stem from the fact that the YouGov and Cint samples are better at recruiting low-educated and younger respondents, a difference that cannot later be overcome by weighting. One should, however, be cautious when interpreting the implications of the deviations from the benchmark survey, since the SOM survey is, after all, just another survey, albeit of higher quality.
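To make the metric behind table 8 concrete, the snippet below recomputes the Novus column's deviations from the published SOM and Novus figures; only numbers shown in table 8 are used.

# Recompute the Novus deviations in table 8 from the published figures.
som = {
    "political interest":   [12, 43, 36, 9],
    "trust in politicians": [1, 39, 50, 9],
    "congestion charges":   [62, 21, 18],
}
novus = {
    "political interest":   [20, 51, 26, 3],
    "trust in politicians": [3, 54, 41, 2],
    "congestion charges":   [55, 30, 15],
}

def avg_abs_dev(est, bench):
    """Average absolute deviation over the categories of one item."""
    return sum(abs(e - b) for e, b in zip(est, bench)) / len(bench)

devs = {k: avg_abs_dev(novus[k], som[k]) for k in som}
print(devs)  # 8.0, 8.25, 6.33...; table 8 reports these as 8.0, 8.3, 6.3
print(sum(devs.values()) / len(devs))  # 7.53..., reported as 7.5 in table 8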
Conclusions

This paper deals with two overarching research questions. The first concerns the impact of different survey modes (similar samples and sampling procedures, but varying survey modes: postal, telephone and internet surveys), while the second concerns different sampling procedures (internet surveys based on probability and non-probability samples).

Overall, mode effects exist and they have implications for the representativeness of a survey even if sampling procedures stay the same. The more traditional postal survey seems better at capturing otherwise 'hard to get' respondents such as 'low-trusters' and less politically interested respondents. We could also discern an overrepresentation of educated and politically interested respondents in the web survey. It should be stressed that comparing mode effects from three probability based samples is a fragile endeavor, and preferably comparisons should be founded on several samples from each mode. This is because the very nature of the probability sampling procedure implies that we will obtain different estimates from time to time by pure randomness. Further, the amount of effort made to persuade those in the sample to participate also differs between these three surveys, and it is hard to keep such things constant between different modes. The observed differences in this study should thus be interpreted cautiously.

When we compared the four on-line panels in terms of demographics we obtained a surprising result: the average deviation from benchmark data was smaller for the two non-probability based panels than for the two probability based panels. This result contradicts most previous research. Further, when comparing a set of political attitudes to a quasi-benchmark study with a high response rate and a probability sample, the two non-probability based panels also came closest to the benchmark. This particular result is even less certain, since we lack a real benchmark for attitudes. It is important to remember, though, that we cannot at this stage put too much trust in this result due to the low number of demographic indicators in our study. This makes the result less reliable. Neither can we put too much trust in a single survey from each panel when the results run counter to previous research. The safe conclusion at this stage is that in this particular study, we do not find any indication that non-probability based on-line panels are less accurate than probability based on-line panels. But this is certainly a clear indication that more and larger comparative studies from the Swedish context are needed in this area.
After all, we should not disregard the possibility that some self-recruited panels can attract respondents, for example those with low socio-economic status, who are often hard to get for conventional methods and probability based sampling. Probability based samples have obvious and indisputable advantages that non-probability samples cannot match. But sometimes the real business of survey research does not meet the high standards and principles of statistical theory. This might explain why two non-probability panels in this study seem to outperform two probability based on-line panels.
Appendix
Table A.1. Comparison of attitudes in four on-line panels, unweighted estimates

                               The SOM
                               institute  Novus  TNS Sifo  YouGov  Cint
Political interest
  Very interested              12         20     17        19      17
  Fairly interested            43         51     52        46      48
  Not particularly interested  36         26     28        31      30
  Not interested at all        9          3      3         5       5
Trust in politicians
  Very high                    1          3      2         2       1
  Fairly high                  39         53     48        41      37
  Fairly low                   50         42     41        45      47
  Very low                     9          2      9         13      15
Left-right position
  Left (0-3)                   -          31     28        27      27
  Middle (4-6)                 -          33     37        36      42
  Right (7-10)                 -          36     35        38      31
Congestion charges Gothenburg
  Against                      62         56     60        60      55
  In favor                     21         30     26        23      29
  Don't know/uncertain         18         14     13        17      16
References

Bender, Bruce G., Susan J. Bartlett, Cynthia S. Rand, Charles Turner, Frederick S. Wamboldt, and Lening Zhang (2007) Impact of Reporting Mode on Accuracy of Child and Parent Report of Adherence with Asthma Controller Medication. Pediatrics 120:e471-77.

Crewe, Ivor (2005) The Opinion Polls: The Election They Got (Almost) Right. Parliamentary Affairs 58 (4): 684-698.

Druckman, James N., Jordan Fein, and Thomas J. Leeper (2012) A Source of Bias in Public Opinion Stability. American Political Science Review 106 (2): 430-454.

Ferrell, Dan, and James C. Peterson (2010) The Growth of Internet Research Methods and the Reluctant Sociologist. Sociological Inquiry 80:114-25.

Findahl, Olle (2010) Svenskarna och Internet 2010. Downloaded 2013-04-03 at: http://www.iis.se/docs/SOI2010_web_v1.pdf

Holbrook, Green, and Krosnick (2003) Telephone versus Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias. Public Opinion Quarterly 67:79-125.

Kellner, Peter (2004) Can Online Polls Produce Accurate Findings? International Journal of Market Research 46:3-23.

Kellner, Peter (2007) Down with Random Samples. http://my.yougov.com/commentaries/peterkellner/down-with-random-samples.aspx

Malhotra, Neil, and Jon A. Krosnick (2007) The Effect of Survey Mode and Sampling on Inferences about Political Attitudes and Behavior: Comparing the 2000 and 2004 ANES to Internet Surveys with Nonprobability Samples. Political Analysis 15:286-323.

Pew Research Center (2012) Main Report: Internet Adoption over Time. Downloaded 2013-04-03 at: http://pewinternet.org/Reports/2012/Digital-differences/Main-Report/Internet-adoptionover-time.aspx

Smith, Aaron (2010) Home Broadband 2010. Downloaded 2012-05-02 at: http://www.pewinternet.org/~/media//Files/Reports/2010/Home%20broadband%202010.pdf

Twyman, Joe (2008) Getting It Right: YouGov and Online Survey Research in Britain. Journal of Elections, Public Opinion & Parties 18 (4): 343-354.

Yeager, David S., Jon A. Krosnick, LinChiat Chang, Harold S. Javitz, Matthew S. Levendusky, Alberto Simpser, and Rui Wang (2011) Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples. Public Opinion Quarterly 75 (4): 709-747.

Zogby, Jonathan (2007) The New Polling Revolution: Opinion Researchers Overcome Their Hang-ups with Online Polling. Campaigns and Elections (May):16-19.

Zogby, Jonathan (2009) For Interactive Polling, the Future Is Now. Campaigns and Elections: Politics (June). http://politicsmagazine.com/magazine-issues/june-2009/for-interactive-polling-the-futureisnow/
