Gary Charness, Brian Jabarian, John A List

We investigate the potential for Large Language Models (LLMs) to enhance scientific practice within experimentation by identifying key areas, directions, and implications. First, we discuss how these models can improve experimental design, including improving the elicitation wording, coding experiments, and producing documentation. Second, we discuss the implementation of experiments using LLMs, focusing on enhancing causal inference by creating consistent experiences, improving comprehension of instructions, and monitoring participant engagement in real time. Third, we highlight how LLMs can help analyze experimental data, including pre-processing, data cleaning, and other analytical tasks while helping reviewers and replicators investigate studies. Each of these tasks improves the probability of reporting accurate findings. Finally, we recommend a scientific governance blueprint that manages the potential risks of using LLMs for experimental research while promoting their benefits. This could pave the way for open science opportunities and foster a culture of policy and industry experimentation at scale.
Aaron Bodoh-Creed, Brent R Hickman, John A List, Ian Muir, Gregory Sun

In this paper, we provide a suite of tools for empirical market design, including optimal nonlinear pricing in intensive-margin consumer demand, as well as a broad class of related adverse selection models. Despite significant data limitations, we are able to derive informative bounds on demand under counterfactual price changes. These bounds arise because empirically plausible data-generating processes (DGPs) must respect the Law of Demand and the observed shifts in aggregate demand resulting from one or more known exogenous price changes. These bounds facilitate robust policy prescriptions using rich, internal data sources similar to those available in many real-world applications. Our partial identification approach enables viable nonlinear pricing design while achieving robustness against worst-case deviations from baseline model assumptions. As a side benefit, our identification results also provide useful, novel insights into optimal experimental design for pricing RCTs.
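The monotonicity argument behind such bounds can be shown in its simplest form. The sketch below assumes only two things: demand is non-increasing in price, and quantities are observed at a few prices (the numbers are hypothetical). Under those assumptions, demand at an unobserved price is bounded below by demand at any higher observed price and bounded above by demand at any lower observed price. The paper's bounds use more structure than this, and the function name is illustrative.

```python
import math


def demand_bounds(price, observed):
    """Bound demand at `price` using only the Law of Demand.

    `observed` maps observed prices to observed quantities demanded.
    Because demand is non-increasing in price, demand at `price` is at
    least demand at any higher observed price and at most demand at any
    lower observed price.
    """
    at_or_above = [q for p, q in observed.items() if p >= price]
    at_or_below = [q for p, q in observed.items() if p <= price]
    # Demand is non-negative, so 0 is the trivial lower bound.
    lower = max(at_or_above) if at_or_above else 0.0
    # With no lower observed price, monotonicity alone gives no upper bound.
    upper = min(at_or_below) if at_or_below else math.inf
    return lower, upper


observed = {10.0: 100.0, 12.0: 80.0}  # hypothetical pre/post price-change data
print(demand_bounds(11.0, observed))  # (80.0, 100.0)
print(demand_bounds(13.0, observed))  # (0.0, 80.0)
```

When the counterfactual price falls between two observed prices, the bounds are two-sided. Outside the observed range, one side is open. This is why known exogenous price changes are what make the bounds informative.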
Amee Kamdar, Steven D Levitt, John A List, Brian Mullaney, Chad Syverson

In this paper, we present the results of a two-year series of large-scale natural field experiments involving hundreds of thousands of subjects.
Amanda Kowalski

A headline result from the Oregon Health Insurance Experiment is that emergency room (ER) utilization increased. A seemingly contradictory result from the Massachusetts health reform is that ER utilization decreased. I reconcile both results by identifying treatment effect heterogeneity within the Oregon experiment and extrapolating it to Massachusetts. Even though Oregon compliers increased their ER utilization, they were adversely selected relative to Oregon never takers, who would have decreased their ER utilization. Massachusetts expanded coverage from a higher level to healthier compliers. Therefore, Massachusetts compliers are comparable to a subset of Oregon never takers, which can reconcile the results.
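The complier/never-taker distinction rests on standard instrumental-variables logic. The sketch below makes these assumptions:

- a binary instrument z (e.g., winning the Oregon lottery)
- a binary treatment d (insurance coverage)
- an outcome y (e.g., ER visits)
- random assignment of z and monotonicity (no defiers)

Under these assumptions, the Wald estimator gives the average effect for compliers, and units with z=1 but d=0 identify the never takers' untreated mean. The function name and toy data are hypothetical. This is not Kowalski's full method for extrapolating treatment effect heterogeneity.

```python
def mean(values):
    return sum(values) / len(values)


def wald_and_never_taker_mean(z, d, y):
    """Return (complier LATE, never-taker mean outcome).

    z: 0/1 instrument, d: 0/1 treatment take-up, y: outcomes.
    Assumes random assignment of z and monotonicity (no defiers).
    """
    y_z1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y_z0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d_z1 = [di for zi, di in zip(z, d) if zi == 1]
    d_z0 = [di for zi, di in zip(z, d) if zi == 0]
    # First stage: share of compliers (difference in take-up rates).
    first_stage = mean(d_z1) - mean(d_z0)
    # Wald estimator: reduced form divided by first stage.
    late = (mean(y_z1) - mean(y_z0)) / first_stage
    # Under monotonicity, units offered treatment (z=1) who decline it (d=0)
    # are never takers, so their mean is the never takers' untreated outcome.
    never_takers = [yi for zi, di, yi in zip(z, d, y) if zi == 1 and di == 0]
    return late, mean(never_takers)


# Toy data (made up): half of the z=1 group takes up treatment, nobody in z=0 does.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 0, 0, 0, 0, 0]
y = [4, 6, 1, 1, 1, 3, 2, 2]
late, nt_mean = wald_and_never_taker_mean(z, d, y)
print(late, nt_mean)  # 2.0 1.0
```

Comparing never-taker outcomes with complier outcomes is one ingredient for detecting the kind of selection the abstract describes.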
John A List

Presentation Slides
Patricia Gil, Justin Holz, John A List, Andrew Simon, Alejandro Zentner

In modern economies, when debt and trust issues arise, a partial forgiveness policy is often the solution to induce payment and increase disclosure. For their part, governments around the globe continue to use tax amnesties as a strategy to allow debtors to make amends for past misdeeds in exchange for partial debt forgiveness. While amnesties are ubiquitous, much remains unknown about the basic facts of how well they work, for whom, and why. We present a simple theoretical construct that provides both economic clarity into tax amnesties and insights into the behavioral parameters that one must estimate to understand their consequences. We partner with the Dominican Republic Tax Authorities to design a natural field experiment that is linked to the theory to estimate key causal mechanisms. Empirical results from our field experiment, which covers 125,452 taxpayers who collectively owe $5.2 billion (5.5% of GDP) in known debt, highlight the importance of deterrence laws, beliefs about future amnesties, and tax morale for debt payment and increased disclosure. Importantly, we find large short-run effects: our most effective treatment (deterrence) increased payments of known debt by 25% and of hidden debt by 48%. Further, we find no evidence of our intervention backfiring on subsequent tax payments.
John A List, Ian Muir, Devin Pope, Gregory Sun

Left-digit bias (or 99-cent pricing) has been discussed extensively in economics, psychology, and marketing. Despite this, we show that the rideshare company Lyft was not using a 99-cent pricing strategy prior to our study. Based on observational data from over 600 million Lyft sessions followed by a field experiment conducted with 21 million Lyft passengers, we provide evidence of large discontinuities in demand at dollar values. Approximately half of the downward slope of the demand curve occurs discontinuously as the price of a ride drops below a dollar value (e.g., $14.00 to $13.99). If our short-run estimates persist in the longer run, we calculate that Lyft could increase its profits by roughly $160M per year by employing a left-digit bias pricing strategy. Our results showcase the robustness of an important behavioral bias for a large, modern company and its persistence in a highly competitive market.
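The one-cent example matters because it changes the leading dollar amount a rider sees. The sketch below uses Python's decimal module to avoid floating-point rounding. The helper names are hypothetical. `charm_price` is one illustrative 99-cent rule (move a price to the .99 ending within its dollar band) and is not necessarily the strategy the authors evaluate.

```python
from decimal import Decimal


def dollar_part(price: Decimal) -> int:
    """Whole-dollar amount of a non-negative price (the 'left digits')."""
    return int(price)  # int() truncates a Decimal toward zero


def crosses_dollar(old: Decimal, new: Decimal) -> bool:
    """True if a price change moves the price into a different dollar band."""
    return dollar_part(old) != dollar_part(new)


def charm_price(price: Decimal) -> Decimal:
    """Hypothetical rule: the .99 price within the same dollar band."""
    return Decimal(dollar_part(price)) + Decimal("0.99")


print(crosses_dollar(Decimal("14.00"), Decimal("13.99")))  # True
print(crosses_dollar(Decimal("13.50"), Decimal("13.01")))  # False
print(charm_price(Decimal("13.20")))  # 13.99
```

A one-cent cut from $14.00 crosses a dollar band, while a 49-cent cut from $13.50 does not. Under left-digit bias, the first change can move demand more than the second.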
John A List

In 2019, I put together a summary of data from my field experiments website that pertained to natural field experiments. Several people have asked me if I have an update. In this document I update all figures and numbers to show the details for 2022. I also include the description from the 2019 paper below.
John A List

Editor's Introduction to JPE Micro
John A List

In 2019, I put together a summary of data from my field experiments website that pertained to framed field experiments. Several people have asked me if I have an update. In this document I update all figures and numbers to show the details for 2022. I also include the description from the 2019 paper below with appropriate additions.
John A List

2022 Summary of Artefactual Experiments
Omar Al-Ubaydli, Jason Chien-Yu, John A List

The "voltage effect" is defined as the tendency for a program's efficacy to change when it is scaled up; in most cases, the absolute size of a program's treatment effects diminishes when the program is scaled. Understanding the scaling problem and taking steps to diminish voltage drops are important because, if left unaddressed, the scaling problem can weaken the public's faith in science and lead to a misallocation of public resources. A growing literature illustrates the prevalence of the scaling problem, explains its causes, and proposes countermeasures. This paper adds to the literature by providing a simple model of the scaling problem that is consistent with rational expectations on the part of key stakeholders. Our model highlights that asymmetric information is a key contributor to the voltage effect.
John A List, Matthias Rodemeier, Sutanuka Roy, Gregory Sun

While behavioral non-price interventions ("nudges") have grown from academic curiosity to a bona fide policy tool, their relative economic efficiency remains under-researched. We develop a unified framework to estimate welfare effects of both nudges and taxes. We showcase our approach by creating a database of more than 300 carefully hand-coded point estimates of non-price and price interventions in the markets for cigarettes, influenza vaccinations, and household energy. While nudges are effective in changing behavior in all three markets, they are not necessarily the most efficient policy. We find that nudges are more efficient in the market for cigarettes, while taxes are more efficient in the energy market. For influenza vaccinations, optimal subsidies likely outperform nudges. Importantly, two key factors govern the difference in results across markets: i) an elasticity-weighted standard deviation of the behavioral bias, and ii) the magnitude of the average externality. Nudges dominate taxes whenever i) exceeds ii). Combining nudges and taxes does not always provide quantitatively significant improvements over implementing one policy tool alone.
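The decision rule stated at the end of this abstract can be written compactly. The notation is ours, not the paper's: let $\tilde{\sigma}_b$ denote the elasticity-weighted standard deviation of the behavioral bias, and let $\bar{e}$ denote the average externality.

```latex
\text{nudges dominate taxes} \iff \tilde{\sigma}_{b} > \lvert \bar{e} \rvert
```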
Pradhi Aggarwal, Alec Brandon, Ariel Goldszmidt, Justin Holz, John A List, Ian Muir, Gregory Sun, Thomas Yu

Prior research finds that, conditional on an encounter, minority civilians are more likely to be punished by police than white civilians. An open question is whether the actual encounter is related to race. Using high-frequency location data of rideshare drivers operating on the Lyft platform in Florida, we estimate the effect of driver race on traffic stops and fines for speeding. Estimates obtained across traditional and machine learning approaches show that, relative to a white driver traveling the same speed, minorities are 24 to 33 percent more likely to be stopped for speeding and pay 23 to 34 percent more in fines. We find no evidence that these estimates can be explained by racial differences in accident and re-offense rates. Our study provides key insights into the total effect of civilian race on outcomes of interest and highlights the potential value of private sector data to help inform major social challenges.
John A List, Ian Muir, Gregory Sun

This study investigates how to use regression adjustment to reduce variance in experimental data. We show that the estimators recommended in the literature satisfy an orthogonality property with respect to the parameters of the adjustment. This observation greatly simplifies the derivation of the asymptotic variance of these estimators and allows us to solve for the efficient regression adjustment in a large class of adjustments. Our efficiency results generalize a number of previous results known in the literature. We then discuss how this efficient regression adjustment can be feasibly implemented. We show the practical relevance of our theory in two ways. First, we use our efficiency results to improve common practices currently employed in field experiments. Second, we show how our theory allows researchers to robustly incorporate machine learning techniques into their experimental estimators to minimize variance.
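A minimal sketch of the commonly used fully interacted regression adjustment may help readers who are new to the technique. It regresses the outcome on three sets of terms:

- the treatment indicator
- the covariates, centered at their pooled mean
- the interactions of treatment with the centered covariates

The coefficient on treatment is then the estimated average treatment effect. This is a standard adjustment from the literature, not the efficient adjustment the authors derive. The function name and simulated data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def interacted_adjusted_ate(y, t, x):
    """Fully interacted regression adjustment for a two-arm experiment.

    y: outcomes, t: 0/1 treatment indicators, x: pre-treatment covariates
    (shape (n,) or (n, k)). Centering x at the pooled mean makes the
    coefficient on t an estimate of the average treatment effect.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    xc = x - x.mean(axis=0)
    # Columns: intercept, treatment, centered covariates, treatment x covariates.
    design = np.column_stack([np.ones_like(t), t, xc, xc * t[:, None]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    # Simulated experiment: true effect 2.0, covariate strongly predicts y.
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    t = rng.integers(0, 2, size=n)
    y = 1.0 + 2.0 * t + 3.0 * x + rng.normal(size=n)
    diff_in_means = y[t == 1].mean() - y[t == 0].mean()
    print("difference in means:", diff_in_means)
    print("regression adjusted:", interacted_adjusted_ate(y, t, x))
```

In this simulation the covariate is strongly predictive, so the adjusted estimate is typically much closer to 2.0 than the raw difference in means. That is the kind of variance reduction the abstract refers to.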
Alec Brandon, Christopher M Clapp, John A List, Robert D Metcalfe, Michael K Price

Smart-home technologies have been heralded as an important way to increase energy conservation. While in vitro engineering estimates provide broad optimism, little has been done to explore whether such estimates scale beyond the lab. We estimate the causal impact of smart thermostats on energy use via two novel framed field experiments in which a random subset of treated households have a smart thermostat installed in their home. Examining 18 months of associated high-frequency data on household energy consumption, yielding more than 16 million hourly electricity and daily natural gas observations, we find little evidence that smart thermostats have a statistically or economically significant effect on energy use. We explore potential mechanisms using almost four million observations of system events including human interactions with their smart thermostat. Results indicate that user behavior dampens energy savings and explains the discrepancy between estimates from engineering models, which assume a perfectly compliant subject, and actual households, who are occupied by users acting in accord with behavioral economists' conjectures. In this manner, our data document a keen threat to the scalability of new user-based technologies.
Isabelle Brocas, Juan D Carrillo

Adults do not play the Nash equilibrium in the well-known centipede game. While Palacios-Huerta and Volij (2009) argued that behavior results from the failure of backward induction logic, Levitt et al. (2011) found that players who know how to backward induct still do not play Nash. Here, we ask children and adolescents (ages 8 to 16) to play the centipede game in the laboratory, and we leverage knowledge about developing abilities to assess the contribution of backward induction logic. In line with the literature, we find that the ability to perform backward induction increases with age. However, it predicts behavior only in elementary school children: those with advanced logical abilities over-apply their skills. Starting in middle school, students who reason logically know that the unraveling argument should not be applied blindly. They utilize Theory-of-Mind (ToM) abilities to form beliefs about others' play and (optimally) refrain from stopping immediately. Their behavior is in line with the deviations observed in adults. Interestingly, developing ToM leads to a gradual decrease in stopping stages with age, which is accompanied by a decrease in payoffs with age. The results indicate that ToM is the key contributor to behavior that departs from backward induction when doing so is beneficial.
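The unraveling argument can be made concrete with a minimal backward-induction solver for a two-player centipede game. The sketch assumes illustrative payoffs that grow along the game (not the stakes used in the study). It also assumes that a mover passes when indifferent. Solving from the last node backward, every mover prefers to stop, so the subgame-perfect prediction is to stop at the very first node.

```python
def backward_induction(take_payoffs, pass_payoff):
    """Solve a two-player centipede game by backward induction.

    take_payoffs[k] = (payoff to player 0, payoff to player 1) if the mover
    stops ("takes") at node k; player k % 2 moves at node k.
    pass_payoff = payoffs if every node is passed.
    Returns (node where equilibrium play stops, or None; equilibrium payoffs).
    """
    value = pass_payoff  # continuation payoffs after the last node
    stop_node = None
    for k in reversed(range(len(take_payoffs))):
        mover = k % 2
        # Stop only if taking is strictly better than continuing.
        if take_payoffs[k][mover] > value[mover]:
            value = take_payoffs[k]
            stop_node = k
    return stop_node, value


# Illustrative four-node game with growing payoffs (hypothetical numbers).
takes = [(4, 1), (2, 8), (16, 4), (8, 32)]
print(backward_induction(takes, (64, 16)))  # (0, (4, 1))
```

Both players would earn more by passing to the end. Yet each mover, anticipating that the next mover will stop, stops first. This is exactly the logic that, per the abstract, older children learn not to apply blindly.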
John A List, Rohen Shah

In organizations, teams are ubiquitous. "Weakest Link" and "Best Shot" are incentive schemes that tie a group member's compensation to the output of their group's least and most productive member, respectively. In this paper, we test the impact of these incentive schemes by conducting two pilot RCTs (one in-person, one online), which included more than 250 graduate students in a graduate math class. Students were placed in study groups of three or four students, and then groups were randomized to either control, Weakest Link, or Best Shot incentives. We find evidence that such incentive approaches can affect test scores, both in-person and online.
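The two incentive schemes differ only in which group member's output sets everyone's pay. The sketch below assumes a hypothetical linear reward (a rate times the relevant score), because the abstract does not state the payment rule.

```python
def group_payoffs(scores, scheme, rate=1.0):
    """Per-member payoff under a hypothetical linear group incentive.

    scores: test scores of one study group's members.
    scheme: "weakest_link" pays every member rate * (lowest score in group);
            "best_shot" pays every member rate * (highest score in group).
    """
    if scheme == "weakest_link":
        basis = min(scores)
    elif scheme == "best_shot":
        basis = max(scores)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return [rate * basis] * len(scores)


print(group_payoffs([70, 85, 90], "weakest_link"))  # [70.0, 70.0, 70.0]
print(group_payoffs([70, 85, 90], "best_shot"))  # [90.0, 90.0, 90.0]
```

Under Weakest Link, stronger members gain only by lifting the weakest member's score. Under Best Shot, only the top score matters.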
Brian Albrecht, Omar Al-Ubaydli, Peter Boettke

Economists well understand that the work of Friedrich Hayek contains important theoretical insights. It is less often acknowledged that his work contains testable predictions about the nature of market processes. Vernon Smith termed the most important one the 'Hayek hypothesis': that gains from trade can be realized in the presence of diffuse, decentralized information, and in the absence of price-taking behavior and centralized market direction. Smith tested this prediction by surveying data on laboratory experimental markets and found strong support. We extend Smith's work first by showing how subsequent theoretical advances provide a theoretical foundation for the Hayek hypothesis. We then test the hypothesis using recent field experimental market data. Using field experiments allows us to test several other predictions from Hayek, such as that market experience increases the realized gains from trade. Generally speaking, we find support for Hayek's theories.
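Gains from trade in such markets are usually summarized by allocative efficiency: realized surplus as a share of the maximum possible surplus. The sketch below assumes single-unit buyers and sellers with known values and costs (the numbers are hypothetical). This is the standard measure from laboratory market experiments, not necessarily the exact metric used in the paper.

```python
def max_surplus(values, costs):
    """Maximum gains from trade: match highest values with lowest costs."""
    total = 0.0
    for v, c in zip(sorted(values, reverse=True), sorted(costs)):
        if v <= c:
            break  # no further mutually beneficial trades
        total += v - c
    return total


def efficiency(trades, values, costs):
    """Realized surplus as a share of maximum surplus.

    trades: list of (buyer value, seller cost) pairs that actually traded.
    Prices are transfers between buyer and seller, so they cancel out of
    total surplus and are not needed here.
    """
    best = max_surplus(values, costs)
    if best == 0:
        raise ValueError("no gains from trade are possible")
    return sum(v - c for v, c in trades) / best


values = [10, 8, 6, 4]  # hypothetical buyer values
costs = [2, 4, 6, 8]  # hypothetical seller costs
print(efficiency([(10, 4), (8, 2)], values, costs))  # 1.0
print(round(efficiency([(10, 2)], values, costs), 3))  # 0.667
```

Full efficiency does not require the "textbook" pairing of buyers and sellers. Any set of trades that includes all high-value buyers and low-cost sellers realizes the maximum surplus.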
Majid Ahmadi, Nathan Durst, Jeff Lachman, Mason List, Noah List, John A List, Atom Vayalinkal

Recent models and empirical work on network formation emphasize the importance of propinquity in producing strong interpersonal connections. Yet one might wonder how deep such insights run, as empirical results thus far rely on survey and lab-based evidence. In this study, we examine propinquity in a high-stakes setting of talent allocation: the Major League Baseball (MLB) Draft. We examine the draft picks of every MLB club from 2000 to 2019, covering the nearly 30,000 players drafted from a pool of more than a million potential draftees. Our findings can be summarized in three parts. First, propinquity is alive and well in our setting, and it persists even in the later years of our sample, when sophisticated statistical analysis has become the norm rather than the exception. Second, the measured effect size is important, as MLB clubs pay a real cost in terms of inferior talent acquired due to propinquity bias: for example, their draft picks appear in 25 fewer games relative to those of teams that do not exhibit propinquity bias. Finally, the effect is most pronounced in later rounds of the draft (after round 15), where the Scouting Director has the greatest latitude.