Where’s the Evidence on Evidence? My Beef with the Lack of Efficacy Data on Apps

The latest report from the Joan Ganz Cooney Center, Getting a Read on the App Stores, provides an overview of what parents are most likely to encounter when they are looking for “educational apps.” The apps the team reviewed are ones that parents are likely to find on well-respected lists or that have received awards for excellence. App descriptions were analyzed for a variety of features. The finding that was particularly interesting to me was that most descriptions of apps considered educational do not mention “benchmarks” of educational quality. Rarely do those descriptions mention education or child development expertise on the development team. Even more striking, only about a quarter of the apps mentioned some sort of research testing, and almost all of the testing described was around usability, not efficacy. Hardly anybody reports any efficacy testing, which likely means that hardly anyone does any efficacy testing.

I don’t think I’m the only one who looked at this statistic, channeled Clara Peller from the ’80s Wendy’s commercial, and thought, “Where’s the beef?” Or in non-hamburger terms, “Where’s the evidence on evidence?” Given that the educational app industry takes home millions of dollars annually, it seems fair to expect some sort of positive growth in children when they spend hours using these apps. But what should we expect a single app to do? And should expectations of effectiveness vary by the cost of the app and the amount of time a child might use it? If an app is free, should we not hold it to the same standard?

Having educators and child development experts on the development team is a start, as such experts know more about the ingredients that make something educational. However, just because the right ingredients go into something doesn’t mean the outcome is as intended (just look at anything I’ve ever tried to bake).

I am not an economist, but I do like to think of educational products in terms of time displacement and cost-benefit, especially when parents and school systems are spending money on activities that are supposed to be enriching for children. Years ago, my graduate school advisors found that educational television viewing coexisted with book reading and other educational endeavors. In fact, children who viewed more educational television also read more and engaged in more educational activities than those who viewed less. These days, we do not have solid data on what app use displaces or complements. We also have very little information on the cost-benefit of individual apps or app learning programs.

Those studying the effects of large-scale literacy interventions look at impact in terms of statistical “effect sizes.” Because there are widely accepted, valid, and reliable “tests” of literacy, researchers are able to compare one intervention with another, dividing the cost of an intervention across the number of children served and the “effect” it produced for that cost. Over the course of a year, for example, 1st graders on average increase their literacy scores on national tests by almost a full standard deviation. Given how little time a child spends with an app compared with the time he or she spends on literacy activities in school, if an inexpensive app produced an increase of even a quarter of a standard deviation over a few months, that would be an impressive app.
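To make that cost-per-effect logic concrete, here is a minimal back-of-the-envelope sketch in Python. The intervention budget, enrollment, and effect sizes are purely illustrative assumptions of mine (the quarter-standard-deviation app gain is the optimistic scenario imagined above), not data from any real program or app:

```python
# Back-of-the-envelope cost-effectiveness sketch.
# Every number here is an illustrative assumption, not real data.

def cost_per_effect(total_cost, children_served, effect_size_sd):
    """Dollars spent per child for each standard deviation of literacy gain."""
    return total_cost / (children_served * effect_size_sd)

# A hypothetical large-scale intervention: $500,000 spread across
# 10,000 children, producing a 0.20 SD gain on a standardized test.
intervention = cost_per_effect(500_000, 10_000, 0.20)

# A hypothetical $1.99 app used by one child, producing the
# quarter-standard-deviation gain imagined above.
app = cost_per_effect(1.99, 1, 0.25)

print(f"Intervention: ${intervention:.2f} per child per SD of gain")
print(f"App:          ${app:.2f} per child per SD of gain")
```

By this crude metric, even a modest measured effect would make an inexpensive app look remarkably cost-effective, which is exactly why the effect would be worth measuring in the first place.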

Effect sizes and standard deviations are difficult to describe in practical terms, so here is another way of thinking about effects, using vocabulary growth as an example. According to some literacy experts, children need to learn about 800 words per year in 1st and 2nd grade in order to remain on grade level. Using this as a benchmark, if a child learns one extra word a day for 3 months from using an app that costs $1.99 for 5 minutes a day, that appears to be a good use of $1.99. But if no words are learned, does it still count as educational? Do inedible cookies count as delicious if they look beautiful and contain most of the right ingredients (aren’t baking soda and baking powder the same?)?
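Spelled out as arithmetic, using only the figures from the paragraph above (with the caveat that the one-word-a-day gain is hypothetical), the sketch looks like this:

```python
# The vocabulary arithmetic from the example above. The 800-word
# benchmark and the app figures come from the text; the one-extra-word-
# a-day gain is the unknown that efficacy testing would have to supply.

words_per_year_goal = 800   # words needed per year to stay on grade level
extra_words_per_day = 1     # hypothetical gain from the app
days_of_use = 90            # roughly 3 months
app_cost = 1.99             # dollars
minutes_per_day = 5

extra_words = extra_words_per_day * days_of_use     # 90 words
share_of_goal = extra_words / words_per_year_goal   # ~11%
cost_per_word = app_cost / extra_words              # ~$0.02
hours_of_use = days_of_use * minutes_per_day / 60   # 7.5 hours

print(f"{extra_words} extra words = {share_of_goal:.0%} of the annual goal")
print(f"about ${cost_per_word:.2f} per word over {hours_of_use:.1f} hours of use")
```

Ninety words for $1.99 and seven and a half hours of use would be a bargain, but the entire calculation hinges on the one number that almost no app reports: words actually learned.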

I am not expecting every individual app to be held to the “effect size” benchmark, which requires standardized tests to assess effectiveness (although I do think more extensive learning systems should be held to such standards). But I would expect developers to begin engaging in some sort of assessment of whether children learn the content the app intends to teach. Furthermore, an app may increase motivation or interest in literacy and other educational activities. Measuring these kinds of changes is also important.

It was understandable when apps first came on the education scene that there was little research to back up educational claims. Now, though, it seems fair to expect some sort of quality control over what allows an app to self-proclaim educational merit. Dr. Hirsh-Pasek and colleagues wrote a wonderful guide about what should be in an app to make it educational. This is a great start. But I think we also need criteria that are based not only on what ingredients go into making an app, but also on what evidence it produces. Intent should be backed up by results. And indeed, one day I will perfect my intent to become a master baker, with a deliciously baked chocolate chip cookie that Cookie Monster would be proud to devour.


Jennifer Kotler Clarke, PhD, is the Vice President of Content Research & Evaluation at Sesame Workshop, the nonprofit educational organization behind Sesame Street and other educational programs for children. Dr. Kotler Clarke is responsible for research planning and creating logic models in order to maximize the impact of Sesame Workshop’s content on children, parents, and educators across the globe. She oversees research design, methodology, assessment, and data analysis. She also develops and executes studies designed to inform the creation of educational material for children.