Evaluation Review

Daniel Kassler, Ira Nichols-Barrer, Mariel Finucane
BACKGROUND: Researchers often wish to test a large set of related interventions or approaches to implementation. A factorial experiment accomplishes this by examining not only basic treatment-control comparisons but also the effects of multiple implementation "factors" such as different dosages or implementation strategies and the interactions between these factor levels. However, traditional methods of statistical inference may require prohibitively large sample sizes to perform complex factorial experiments...
January 10, 2019: Evaluation Review
Richard Hendra, Aaron Hill
BACKGROUND: Federally funded evaluation research projects typically strive for an 80% survey response rate, but the increasing difficulty and expense in reaching survey respondents raises the question of whether such a threshold is necessary for reducing bias and increasing the accuracy of survey estimates. OBJECTIVES: This analysis focuses on a particular component of survey methodology: the survey response rate and its relationship to nonresponse bias. Following a review of the literature, new analysis of data from a large, multisite random assignment experiment explores the relationship between survey response rates and measured nonresponse bias...
December 23, 2018: Evaluation Review
Lauren Vollmer, Mariel Finucane, Randall Brown
BACKGROUND: Policy makers seek to replace the "thumbs up-thumbs down" of conventional hypothesis testing with statements about the probability that program effects on key outcomes exceed policy-relevant thresholds. OBJECTIVE: We develop a Bayesian model that addresses the shortcomings of a typical frequentist approach to estimating the effects of the Comprehensive Primary Care (CPC) initiative, a Centers for Medicare and Medicaid Services demonstration...
December 11, 2018: Evaluation Review
Judith Scott-Clayton, Qiao Wen
BACKGROUND: The increasing availability of massive administrative data sets linking postsecondary enrollees with postcollege earnings records has stimulated a wealth of new research on the returns to college and has accelerated state and federal efforts to hold institutions accountable for students' labor market outcomes. Many of these new research and policy efforts rely on state databases limited to postsecondary enrollees who work in the same state postcollege, with limited information regarding family background and precollege ability...
November 19, 2018: Evaluation Review
Edward Wu, Johann A Gagnon-Bartsch
BACKGROUND: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be used to analyze the data. Typically, these analyses will involve adjusting for small imbalances in baseline covariates. However, this poses a dilemma, as adjusting for too many covariates can hurt precision more than it helps, and it is often unclear which covariates are predictive of outcome prior to conducting the experiment. OBJECTIVES: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data...
November 15, 2018: Evaluation Review
Natalie Todak, Michael D White, Lisa M Dario, Andrea R Borrego
OBJECTIVE: To provide guidance to criminologists for conducting experiments in light of two common discouraging factors: the belief that they are overly time-consuming and the belief that they can compromise the ethical principles of human subjects' research. METHOD: A case study approach is used, based on a large-scale randomized controlled trial experiment in which we exposed participants to a 5-s TASER shock, to describe how the authors overcame ethical, methodological, and logistical difficulties...
October 16, 2018: Evaluation Review
Quinn Moore, Irma Perez-Johnson, Robert Santillano
BACKGROUND: Differences in earnings measured using either survey or administrative data raise the question of which is preferred for program impact evaluations. This is especially true when the population of interest has varying propensities to be represented in either source. OBJECTIVES: We aim to study differences in impacts on earnings from a job training voucher experiment in order to demonstrate which source is most appropriate to interpret findings. RESEARCH DESIGN: Using study participants with survey-reported earnings, we decompose mean earnings differences across sources into those resulting from (1) differences in reported employment and (2) differences in reported earnings for those who are employed in both sources...
October 16, 2018: Evaluation Review
Donald P Green, Winston Lin, Claudia Gerber
BACKGROUND: Many place-based randomized trials and quasi-experiments use a pair of cross-section surveys, rather than panel surveys, to estimate the average treatment effect of an intervention. In these studies, a random sample of individuals in each geographic cluster is selected for a baseline (preintervention) survey, and an independent random sample is selected for an endline (postintervention) survey. OBJECTIVE: This design raises the question, given a fixed budget, how should a researcher allocate resources between the baseline and endline surveys to maximize the precision of the estimated average treatment effect? RESULTS: We formalize this allocation problem and show that although the optimal share of interviews allocated to the baseline survey is always less than one-half, it is an increasing function of the total number of interviews per cluster, the cluster-level correlation between the baseline measure and the endline outcome, and the intracluster correlation coefficient...
October 9, 2018: Evaluation Review
Edith Yang, Richard Hendra
BACKGROUND: The high costs of implementing surveys are increasingly leading research teams to either cut back on surveys or to rely on administrative records. Yet no policy should be based on a single set of estimates, and every approach has its weaknesses. A mixture of approaches, each with its own biases, should provide the analyst with a better understanding of the underlying phenomenon. This claim is illustrated with a comparison of employment effect estimates of two conditional cash transfer programs in New York City using survey and administrative unemployment insurance (UI) data...
September 17, 2018: Evaluation Review
Reuben Ford, Douwêrê Grékou, Isaac Kwakye, Taylor Shek-Wai Hui
BACKGROUND: This article reports on the Future to Discover Project, a Canadian randomized controlled trial of two high school interventions, where data on key postsecondary enrollment outcomes were collected for two phases. During the initial phase, outcomes were recorded from administrative data and follow-up surveys. During the later phase, data came from administrative records only. OBJECTIVES: The article provides analyses that are informative about the consequences of a change from administrative-only data to survey-only data (and vice versa) for the estimation of impacts...
September 13, 2018: Evaluation Review
Jaime Thomas, Thomas D Cook, Alice Klein, Prentice Starkey, Lydia DeFlorio
Policy makers face dilemmas when choosing a policy, program, or practice to implement. Researchers in education, public health, and other fields have proposed a sequential approach to identifying interventions worthy of broader adoption, involving pilot, efficacy, effectiveness, and scale-up studies. In this article, we examine a scale-up of an early math intervention to the state level, using a cluster randomized controlled trial. The intervention, Pre-K Mathematics, has produced robust positive effects on children's math ability in prior pilot, efficacy, and effectiveness studies...
August 6, 2018: Evaluation Review
David M Rindskopf, William R Shadish, M H Clark
BACKGROUND: Randomized experiments yield unbiased estimates of treatment effect, but such experiments are not always feasible. So researchers have searched for conditions under which randomized and nonrandomized experiments can yield the same answer. This search requires well-justified and informative correspondence criteria, that is, criteria by which we can judge if the results from an appropriately adjusted nonrandomized experiment well-approximate results from randomized experiments...
July 30, 2018: Evaluation Review
Vivian C Wong, Peter M Steiner, Kylie L Anglin
Given the widespread use of nonexperimental (NE) methods for assessing program impacts, there is a strong need to know whether NE approaches yield causally valid results in field settings. In within-study comparison (WSC) designs, the researcher compares treatment effects from an NE with those obtained from a randomized experiment that shares the same target population. The goal is to assess whether the stringent assumptions required for NE methods are likely to be met in practice. This essay provides an overview of recent efforts to empirically evaluate NE method performance in field settings...
April 2018: Evaluation Review
Philip Gleason, Alexandra Resch, Jillian Berk
BACKGROUND: This article explores the performance of regression discontinuity (RD) designs for measuring program impacts using a synthetic within-study comparison design. We generate synthetic RD data sets from experimental data sets from two recent evaluations of educational interventions (the Educational Technology Study and the Teach for America Study) and compare the RD impact estimates to the experimental estimates of the same intervention. OBJECTIVES: This article examines the performance of the RD estimator when the design is well implemented and also examines the extent of bias introduced by manipulation of the assignment variable in an RD design...
February 2018: Evaluation Review
Jade Marcus Jenkins, Terri J Sabol, George Farkas
BACKGROUND: Recent growth in subsidized preschool opportunities in the United States for low-income 4-year-old children has allowed federal Head Start programs to fund more slots for 3-year-old children. In turn, when Age-3 Head Start participants turn four, they may choose to switch into one of the many alternative care options or choose to stay in Head Start for a second year. OBJECTIVES: We analyze a nationally representative sample of Age-3 Head Start participants to examine whether children who stay in Head Start for a second year at Age 4 exhibit greater school readiness and subsequent cognitive and behavioral performance compared with children who switch out of Head Start into alternative care...
January 1, 2018: Evaluation Review
Vivian C Wong, Peter M Steiner
Over the last three decades, a research design has emerged to evaluate the performance of nonexperimental (NE) designs and design features in field settings. It is called the within-study comparison (WSC) approach or the design replication study. In the traditional WSC design, treatment effects from a randomized experiment are compared to those produced by an NE approach that shares the same target population. The nonexperiment may be a quasi-experimental design, such as a regression-discontinuity or an interrupted time-series design, or an observational study approach that includes matching methods, standard regression adjustments, and difference-in-differences methods...
January 1, 2018: Evaluation Review
Yang Tang, Thomas D Cook
The basic regression discontinuity design (RDD) has less statistical power than a randomized control trial (RCT) with the same sample size. Adding a no-treatment comparison function to the basic RDD creates a comparative RDD (CRD); and when this function comes from the pretest value of the study outcome, a CRD-Pre design results. We use a within-study comparison (WSC) to examine the power of CRD-Pre relative to both basic RDD and RCT. We first build the theoretical foundation for power in CRD-Pre, then derive the relevant variance formulae, and finally compare them to the theoretical RCT variance...
January 1, 2018: Evaluation Review
Yasemin Kisbu-Sakarya, Thomas D Cook, Yang Tang, M H Clark
Compared to the randomized experiment (RE), the regression discontinuity design (RDD) has three main limitations: (1) In expectation, its results are unbiased only at the treatment cutoff and not for the entire study population; (2) it is less efficient than the RE and so requires more cases for the same statistical power; and (3) it requires correctly specifying the functional form that relates the assignment and outcome variables. One way to overcome these limitations might be to add a no-treatment functional form to the basic RDD and include it in the outcome analysis as a comparison function rather than as a covariate to increase power...
January 1, 2018: Evaluation Review
Peter M Steiner, Vivian C Wong
In within-study comparison (WSC) designs, treatment effects from a nonexperimental design, such as an observational study or a regression-discontinuity design, are compared to results obtained from a well-designed randomized control trial with the same target population. The goal of the WSC is to assess whether nonexperimental and experimental designs yield the same results in field settings. A common analytic challenge with WSCs, however, is the choice of appropriate criteria for determining whether nonexperimental and experimental results replicate...
January 1, 2018: Evaluation Review
David Kaplan, Chansoon Lee
This article provides a review of Bayesian model averaging as a means of optimizing the predictive performance of common statistical models applied to large-scale educational assessments. The Bayesian framework recognizes that in addition to parameter uncertainty, there is uncertainty in the choice of models themselves. A Bayesian approach to addressing the problem of model uncertainty is the method of Bayesian model averaging. Bayesian model averaging searches the space of possible models for a set of submodels that satisfy certain scientific principles and then averages the coefficients across these submodels weighted by each model's posterior model probability (PMP)...
January 1, 2018: Evaluation Review
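
The averaging step described in the Kaplan and Lee abstract above (coefficients averaged across submodels, weighted by each model's posterior model probability) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes equal prior model probabilities and uses the common BIC approximation to the PMP, and the function names, BIC values, and coefficients are all hypothetical. A predictor absent from a submodel is treated as having a coefficient of zero in that submodel.

```python
import math


def posterior_model_probs(bics):
    """Approximate PMPs from BIC values.

    Assumption: equal prior model probabilities, with
    PMP_k proportional to exp(-BIC_k / 2).
    """
    best = min(bics)  # subtract the minimum for numerical stability
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def bma_coefficients(models):
    """Average coefficients across submodels, weighted by PMP.

    `models` is a list of dicts with keys 'bic' (float) and
    'coefs' (predictor name -> estimate). A predictor missing
    from a submodel contributes a coefficient of 0.0.
    """
    pmps = posterior_model_probs([m["bic"] for m in models])
    names = sorted({name for m in models for name in m["coefs"]})
    averaged = {
        name: sum(p * m["coefs"].get(name, 0.0) for p, m in zip(pmps, models))
        for name in names
    }
    return pmps, averaged


# Hypothetical submodels; the numbers are illustrative only.
models = [
    {"bic": 100.0, "coefs": {"x1": 0.50}},
    {"bic": 102.0, "coefs": {"x1": 0.40, "x2": 0.20}},
    {"bic": 110.0, "coefs": {"x2": 0.30}},
]
pmps, averaged = bma_coefficients(models)
print(pmps)
print(averaged)
```

In this toy setup the submodel with the lowest BIC receives most of the weight, so the averaged coefficient for `x1` is pulled toward that submodel's estimate while still reflecting the alternatives.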