Surveys are a vital tool for understanding public opinion and knowledge, but they can also yield biased estimates of behavior. Here we explore a popular and important behavior that is frequently measured in public opinion surveys: news consumption. Previous studies have shown that television news consumption is consistently overreported in surveys relative to passively collected behavioral data. First, we validate these earlier findings, showing that they continue to hold despite large shifts in news consumption habits over time, while also adding some new nuance regarding question wording. Second, we extend these findings to survey reports of online and social media news consumption, with respect to both levels and trends. Third, we demonstrate the usefulness of passively collected data for measuring a quantity such as “consuming news,” for which different researchers might reasonably choose different definitions. Finally, recognizing that passively collected data suffer from their own limitations, we outline a framework for using a mix of passively collected behavioral data and survey-generated attitudinal data to accurately estimate consumption of news and related effects on public opinion and knowledge, conditional on media consumption.
Motivated misreporting occurs when respondents give incorrect responses to survey questions to shorten the interview; studies have detected this behavior across many modes, topics, and countries. This paper tests whether motivated misreporting affects responses in a large survey of household purchases, the U.S. Consumer Expenditure Interview Survey. The data from this survey inform the calculation of the official measure of inflation, among other uses. Using a parallel web survey and multiple imputation, this paper estimates the size of the misreporting effect without experimentally manipulating questions in the survey itself. Results suggest that household purchases are underreported by approximately 5 percentage points in three sections of the first wave of the survey. The approach used here, involving a web survey built to mimic the expenditure survey, could be applied in other large surveys where budget or logistical constraints prevent experimentation.
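For illustration only (this is not the paper's code): once the misreporting gap has been estimated separately in each of m multiply imputed datasets, the per-imputation estimates can be combined with Rubin's rules. The reporting-gap values and variances below are made-up placeholders, not results from the study.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool estimates from m multiply imputed datasets using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    u_bar = variances.mean()            # average within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    total_var = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total_var

# Hypothetical per-imputation estimates of the difference in the share of
# households reporting a purchase (CE interview minus parallel web survey).
gaps = [-0.048, -0.052, -0.047, -0.055, -0.050]
variances = [0.00020, 0.00025, 0.00021, 0.00024, 0.00022]
gap, var = pool_rubin(gaps, variances)
print(f"pooled gap = {gap:.3f}, standard error = {var ** 0.5:.3f}")
```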
The U.S. Consumer Expenditure Interview Survey asks many filter questions to identify the items that households purchase. Each reported purchase triggers follow-up questions about the amount spent and other details. We test the hypothesis that respondents learn how the questionnaire is structured and underreport purchases in later waves to reduce the length of the interview. We analyze data from 10,416 four-wave respondents over two years of data collection. We find no evidence of decreasing data quality over time; instead, panel respondents tend to give higher-quality responses in later waves. The results also hold for a larger set of two-wave respondents.
Although counts of novel coronavirus (SARS-CoV-2) infections and deaths are reported by several sources online, precise estimation of the exposed proportion of the population is not possible in most areas of the world. Estimates of the prevalence of other diseases in the United States are often obtained through in-person seroprevalence surveys. The availability of testing only for individuals with symptoms, combined with stay-at-home and social distancing mandates to stem the spread of the disease, limits in-person data collection options. A probability-based mail survey with at-home, self-administered testing is a feasible method to safely estimate SARS-CoV-2 antibody prevalence within the United States while also easing the burden on the U.S. public and health care system. This mail survey could be a one-time, cross-sectional design, or a repeated cross-sectional or longitudinal survey. We discuss several options for designing and conducting this survey.
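As a sketch of the estimation step such a design would feed into (the abstract discusses design options rather than estimators, so the adjustment below is an assumption): a design-weighted positivity rate can be corrected for imperfect test sensitivity and specificity with the standard Rogan-Gladen formula.

```python
import numpy as np

def weighted_seroprevalence(positive, weights, sensitivity, specificity):
    """Design-weighted apparent prevalence with the Rogan-Gladen correction
    for an imperfect antibody test (sensitivity and specificity assumed known)."""
    positive = np.asarray(positive, dtype=float)
    weights = np.asarray(weights, dtype=float)
    apparent = np.average(positive, weights=weights)
    adjusted = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return float(np.clip(adjusted, 0.0, 1.0))

# Made-up example: simulated test results, equal weights, a test with
# 90% sensitivity and 99% specificity.
results = np.random.default_rng(0).binomial(1, 0.04, size=1000)
weights = np.ones(1000)
print(weighted_seroprevalence(results, weights, 0.90, 0.99))
```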
Several studies have shown that high response rates are not associated with low bias in survey data. This paper shows that, for face-to-face surveys, the relationship between response rates and bias is moderated by the type of sampling method used. Using data from Rounds 1 through 7 of the European Social Survey, we develop two measures of selection bias and then build models to explore how sampling method, response rate, and their interaction affect selection bias. When interviewers are involved in selecting the sample of households or respondents for the survey, high reported response rates can in fact be a sign of poor data quality. We speculate that the positive association detected between response rates and selection bias arises from interviewers’ incentives to select households and respondents who are likely to complete the survey.
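A minimal sketch of the kind of moderation model the abstract describes, on simulated data with assumed variable names (the actual bias measures and covariates come from the ESS analysis files):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200  # hypothetical country-round observations

# Simulated stand-in for an ESS-style analysis file: a selection-bias measure,
# the achieved response rate, and an indicator for whether interviewers select
# households/respondents in the field.
df = pd.DataFrame({
    "response_rate": rng.uniform(0.3, 0.8, n),
    "interviewer_selection": rng.integers(0, 2, n),
})
df["bias"] = (0.02
              + 0.05 * df["interviewer_selection"] * df["response_rate"]
              + rng.normal(0, 0.01, n))

# Response rate x sampling-method interaction, mirroring the moderation
# described in the abstract (variable names are assumptions).
model = smf.ols("bias ~ response_rate * interviewer_selection", data=df).fit()
print(model.summary())
```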
Administrative data are increasingly important in statistics, but, like other types of data, they may contain measurement errors. To prevent such errors from invalidating analyses of scientific interest, it is therefore essential to estimate the extent of measurement error in administrative data. Currently, however, most approaches to evaluating such errors involve either prohibitively expensive audits or comparison with a survey that is assumed to be perfect. We introduce the “generalized multitrait-multimethod” (GMTMM) model, which can be seen as a general framework for evaluating the quality of administrative and survey data simultaneously. This framework allows both survey and administrative data to contain random and systematic measurement errors. Moreover, it accommodates common features of administrative data, such as discreteness, nonlinearity, and nonnormality, improving on similar existing models. The use of the GMTMM model is demonstrated by application to linked survey-administrative data from the German Federal Employment Agency on income from employment, and a simulation study evaluates the estimates obtained and their robustness to model misspecification. Supplementary materials for this article are available online.
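For context, a sketch of the classical linear multitrait-multimethod measurement model that the GMTMM framework generalizes (notation here is an assumption, not taken from the paper): an observed measure y_{tm} of trait t obtained by method m (for example, income measured by the survey or by the administrative register) is decomposed into a trait component, a method component capturing systematic error, and random error,

\[
y_{tm} = \lambda_{tm} T_t + \gamma_{tm} M_m + \varepsilon_{tm},
\qquad
\operatorname{Var}(y_{tm}) = \lambda_{tm}^{2}\operatorname{Var}(T_t) + \gamma_{tm}^{2}\operatorname{Var}(M_m) + \operatorname{Var}(\varepsilon_{tm}).
\]

The GMTMM model relaxes the linearity and distributional assumptions built into this decomposition, so that discrete, nonlinear, or nonnormal administrative measures can be evaluated within the same framework.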
The LISS online panel has made extra efforts to recruit and retain households that were not regular users of the internet. Households were provided with computers and/or an internet connection when necessary. Including these cases made the panel more representative of the Dutch population by bringing in respondents who were more likely to be older, to live in single-person homes, and to have migration backgrounds. This paper replicates five published papers that used LISS data and explores how the conclusions in these papers would have been different had the LISS panel not included the non-internet households. There are strong demographic differences between the internet and non-internet households, and estimates of means would in many cases be biased if these households had not been included. However, across the five replicated studies, few of the published model estimates are substantively affected by the inclusion of these households in the LISS sample.
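A toy version of the replication check described above, on simulated data with assumed variable names: compute an estimate on the full sample and again after dropping the non-internet households, and compare.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 5000

# Simulated stand-in for a LISS-style panel file (all values are made up):
# non-internet households are older on average, mirroring the demographic
# differences described in the abstract.
panel = pd.DataFrame({"non_internet": rng.binomial(1, 0.10, n)})
panel["age"] = 45 + 15 * panel["non_internet"] + rng.normal(0, 12, n)

full_mean = panel["age"].mean()
internet_only_mean = panel.loc[panel["non_internet"] == 0, "age"].mean()
print(f"full sample mean age: {full_mean:.1f}")
print(f"internet-only mean age: {internet_only_mean:.1f}")
print(f"difference: {full_mean - internet_only_mean:.1f}")
```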