Stephanie Eckman

Researcher & Data Scientist

I have a Ph.D. in Statistics & Methodology. I collect high-quality data for social science and model training.

Skills

R coding

Statistics, Data Science, Machine Learning

International Data Collection

Experience

Principal Research Scientist

Amazon

February 2023 – Present Arlington, VA

Researcher and Data Scientist

Social Data Science Center, University of Maryland

February 2023 – Present College Park, MD

Fellow

RTI International

August 2015 – December 2022 Washington, DC

Conducting and publishing research into data quality
Mentoring junior researchers
Providing scientific leadership to the institute

Senior Researcher

Institute for Employment Research

July 2010 – July 2015 Nuremberg, Germany

Conducting and publishing research
Advising on design of IAB surveys

Chair of Sociology (Interim)

University of Mannheim

January 2013 – January 2014 Mannheim, Germany

Conducting and publishing research
Teaching courses in data analysis & research methods
Mentoring students

Methodologist

NORC at the University of Chicago

April 2001 – January 2010 Chicago, IL

Recent Publications

Stephanie Eckman

June, 2021 EMNLP

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves. Our results emphasize the crucial role played by the annotation instrument which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.

Jacob Beck, Stephanie Eckman, Rob Chew, Frauke Kreuter

February, 2020

Improving Labeling Through Social Science Insights: Preliminary Results and Research Agenda

Although often seen as a gold standard, human labeled training data is not error free. Decisions in the design of labeling tasks can impact the resulting labeled data and impact predictions. Building on insights from survey methodology, a field that studies the impact of instrument design on survey data and estimates, we examine how the structure of a hate speech labeling task affects which labels are assigned. We also examine what effect task ordering has on the perception of hate speech and what role background characteristics of annotators have on classifications provided by annotators. The study demonstrates the importance of applying design thinking at the earliest steps of ML product development. Design principles such as quick prototyping and critically assessing user interfaces are not only important in interaction with end users of artificial intelligence (AI)-driven products, but are crucial early in development, prior to training AI algorithms.

Daniel Oberski, Antje Kirchner, Stephanie Eckman, Frauke Kreuter

January, 2018 JASA Applications

Evaluating the Quality of Survey and Administrative Data with Generalized Multitrait-Multimethod Models

Administrative data are increasingly important in statistics, but, like other types of data, may contain measurement errors. To prevent such errors from invalidating analyses of scientific interest, it is therefore essential to estimate the extent of measurement errors in administrative data. Currently, however, most approaches to evaluate such errors involve either prohibitively expensive audits or comparison with a survey that is assumed perfect. We introduce the “generalized multitrait-multimethod” (GMTMM) model, which can be seen as a general framework for evaluating the quality of administrative and survey data simultaneously. This framework allows both survey and administrative data to contain random and systematic measurement errors. Moreover, it accommodates common features of administrative data such as discreteness, nonlinearity, and nonnormality, improving similar existing models. The use of the GMTMM model is demonstrated by application to linked survey-administrative data from the German Federal Employment Agency on income from employment, and a simulation study evaluates the estimates obtained and their robustness to model misspecification. Supplementary materials for this article are available online.

Recent & Upcoming Talks

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

5-minute lightning talk on methods to collect annotations for NLP models

Aug 22, 2023 6:30 PM Arlington, VA

Stephanie Eckman

Bringing Survey Methodology to Machine Learning: Effects of Data Collection Methods on Model Performance

The instruments used to collect training data for machine learning models have many similarities to web surveys, such as the provision …

Jun 8, 2023 1:00 PM

Stephanie Eckman

Improving Label Collection Through Social Science Insights

As many in the AI field transition from a model-centric to a data-centric approach, the quality of training data is receiving increased …

May 11, 2022 8:30 AM

Stephanie Eckman

Using Passive Data to Supplement or Replace Survey Data

Dec 1, 2021 10:00 AM

Stephanie Eckman

Data Quality in Data Science

Data scientists are increasingly turning their attention to the collection of high-quality training data. Those of us with expertise in data collection can apply lessons from web survey design to this task.

Aug 5, 2021 10:00 AM

Stephanie Eckman

Research Projects

Feb 13, 2023

Data Quality, Data Centric AI

Helping data scientists collect more accurate training data, decreasing the cost and time needed to train models

Feb 14, 2023

Coverage & Selection Bias

My research has explored undercoverage bias in surveys around the world

Feb 13, 2023

Innovative Sampling Methods

Alternative sampling approaches which do not depend on up-to-date census data or interviewer involvement

Feb 13, 2023

Role of Incentives in Data Construction

The incentives of those producing data impact the quality of the data

Feb 13, 2023

Passive Data Collection

Data collected without the involvement of the respondent can improve quality

Additional Publications

Florian Keusch, Sebastian Bähr, Georg-Christoph Haas, Frauke Kreuter, Mark Trappmann, Stephanie Eckman (2022). Non-participation in smartphone data collection using research apps. JRSSA.

Stephanie Eckman, Jennifer Unangst, Jill Dever, Christopher Antoun (2022). The Precision of Estimates of Nonresponse Bias in Means. JSSAM.

Tobias Konitzer, Jennifer Allen, Stephanie Eckman, Baird Howland, Markus Mobius, David Rothschild, Duncan Watts (2021). Comparing Estimates of News Consumption from Survey and Passively Collected Behavioral Data. POQ.

Stephanie Eckman (2021). Underreporting of Purchases in the U.S. Consumer Expenditure Survey. JSSAM.

Stephanie Eckman, Ruben Bach (2021). Panel Conditioning in the U.S. Consumer Expenditure Survey. Journal of Official Statistics.
