Talking About “What Works”

How do you answer questions about the difference you are making in the world?

Do you and others in your nonprofit talk about program outcomes or program impact? In what ways do you describe the changes in your clients after they’ve been through your program? Do you compare their knowledge, attitudes or behavior after your program to when they first showed up? In conversations with donors, funders, board members and others, if you sometimes suggest the positive changes you’ve seen in your clients resulted from your program, you might find this post interesting.

I have written previously on the debate over “evidence” or proof that a social sector program is effective (What Does Evidence Really Mean?). I have also written about “Moneyball for Government,” the trend for public agencies (especially federal) to rely on online “evidence clearinghouses” that rank the level of evaluation proof that a certain program works. These databases rely on experts to review the evaluation studies on each program and rate the quality of the research on its effectiveness. The programs showing positive findings from the largest number of studies that use the most rigorous evaluation methods are deemed to have the strongest proof and are placed in the top tier. Typically, they are called “evidence-based programs/practices.” Based on these rankings, government agencies make funding decisions that favor replication of proven successful models over other interventions that may work equally well but have so far been less rigorously evaluated.

A new pair of articles in Stanford Social Innovation Review about “evidence” and public policy prompts me to don my “nerd hat” again and weigh in on this subject. The first piece, by Srik Gopal and Lisbeth B. Schorr, argues against a narrow definition of social program evidence, especially reliance on randomized controlled trial (RCT) experimental design studies. The authors also oppose the practice of using ranked-evidence clearinghouses to make public funding decisions, citing the importance of local context, the idiosyncrasies of target groups and service delivery, and the need for continual innovation in program models.

The second article, by Patrick Lester, is a response to the first. Lester argues in favor of the “evidence hierarchies” that rank RCT evidence as the best available proof that a specific program achieves what it intends, and he favors their use in public policy decisions.

Why Should Anyone Care About the Debate Over Evidence?

In my view, everyone involved in trying to change the world for better should care about this debate for several reasons:

  1. Government is increasingly relying on these evidence clearinghouses to make funding decisions. If you don’t have rigorous formal evaluation findings showing your program does what it intends, you may lose out on these funding streams.
  2. Foundations and donors are also becoming more demanding about documenting program outcomes. How you talk with them and other important stakeholders about the effectiveness of your program matters.
  3. To communicate successfully about the difference your group is making, you must understand some of the fundamental challenges involved in obtaining evidence of what works.

I care about this debate because, in my consulting work and evaluation classes, I regularly interact with board and staff members in small to mid-sized nonprofits who seek my help to devise simple program evaluation methods (such as pre- and post-surveys) so that they can “prove” their program works. I try to explain that only advanced evaluation strategies with sophisticated statistical methods and tools can do this. I tell them, even if they can carefully document the before and after differences in their clients, they are still a long way from being able to say that those findings constitute evidence that their program actually caused the differences. (If this is not immediately clear, think about all the other possible reasons why people might be different over time other than their participation in your program.)
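To make that parenthetical concrete, here is a minimal, hypothetical simulation (all numbers invented for illustration) of one of those “other possible reasons”: regression to the mean. Clients are enrolled because their intake score is low, and their follow-up scores rise several points even though the program, by construction, does absolutely nothing.

```python
import random

random.seed(42)

# Hypothetical illustration (all numbers invented): each person's
# "true" skill is stable over time. Every measured score is true
# skill plus random measurement noise, and the program itself
# changes nothing.
N = 10_000
true_skill = [random.gauss(50, 10) for _ in range(N)]

def measure(skill):
    """One noisy test score for a person with the given true skill."""
    return skill + random.gauss(0, 10)

# Clients enroll because their intake (pre) score was low.
clients = []
for skill in true_skill:
    pre = measure(skill)
    if pre < 40:                  # low intake score -> enrolled
        post = measure(skill)     # zero program effect
        clients.append((pre, post))

pre_mean = sum(p for p, _ in clients) / len(clients)
post_mean = sum(q for _, q in clients) / len(clients)
print(f"n={len(clients)}  pre mean: {pre_mean:.1f}  post mean: {post_mean:.1f}")
# The post mean comes out noticeably higher than the pre mean,
# even though no one actually improved.
```

The mechanism: selecting people on a noisy low score disproportionately picks those whose measurement noise happened to be negative that day, so a second measurement drifts back up on its own. This is only one of the threats (alongside maturation, seasonal effects and the like) that rigorous designs such as RCTs are built to rule out, and it is exactly why a simple pre/post gain is not, by itself, evidence that the program caused the change.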

In my introductory evaluation classes, I offer students a taste of these complexities–and their practical implications–with a provocative two-part SSIR essay, “Why Charities Should Not Evaluate Their Work.” Author Caroline Fiennes argues that nonprofit organizations should implement only rigorously proven (evidence-based) program models and then simply monitor the program’s delivery. Because they have neither the required training nor the necessary objectivity, she says, nonprofits should not attempt to evaluate program outcomes or effectiveness; instead, they should leave program evaluation to the experts. After my students read the pieces and choose sides, we hold a mock debate in class. The exercise is always lively, illuminating many of the issues around thinking and talking about what works in the social sector.

Continuum of Rigor

To help drive the concepts home, I developed a highly simplified diagram (below) illustrating the continuum of evaluation rigor. My goal is to ground students’ expectations about what they can achieve with modest budgets. I also want to motivate those who haven’t yet started to “just get on the bus”: begin small with an internal monitoring system that documents the outcomes most meaningful to them and carefully tracks how they deliver their program. I mention the evidence clearinghouses and encourage students to learn more about the kinds of programs that have been shown to be most effective in their field, their region and with their target population. Above all, I want to inspire a desire for learning–both individually and at the organization level.


I’d love to hear your thoughts about my continuum graphic, the debate over program evidence and how best to talk about “causality” with non-nerds.