1 Introduction

Education is important at all levels. At national or state level, there is increasing evidence that education is positively related to economic growth (Hanushek and Kimko, 2000; Hanushek and Woessmann, 2008, 2010, 2012; Hanushek et al, 2015). Hanushek and Woessmann (2008), for example, report, using a cross-country dataset, that for each additional year of schooling the long-run growth rate of GDP per capita is 0.58 percentage points higher, and this value is statistically significant. While the quantity of education is important, the quality of education (usually measured by the performance of students in standard international tests) is even more so: Hanushek and Woessmann (2008) conclude from the results of several studies that GDP growth rates are around one percentage point higher for every one country-level standard deviation of higher test performance.

In addition to these benefits to society, education is also important in determining lifetime returns of individuals (see, for example, Psacharopoulos, 1994; Psacharopoulos and Patrinos, 2004; Walker and Zhu, 2008; Colclough et al, 2010; Chevalier, 2011; Walker and Zhu, 2011). For example, the private rate of return to investment in an additional year of schooling in a developed economy such as the United States is of the order of 10% per year in real terms (Psacharopoulos and Patrinos, 2004). This is likely to be higher for less developed countries (Psacharopoulos and Patrinos, 2004), and might vary by level of education (Colclough et al, 2010).

Some of the effects of education are clearly beneficial to society as a whole (social or external returns), while others are confined solely to the individual (and are therefore private). The existence of substantial social and external benefits from education (McMahon, 2004) justifies its public provision. Thus, compulsory education is typically funded from the public purse, while further and higher education, which is traditionally seen to have a greater proportion of private benefits than primary and secondary education, is usually only partially funded by government.

With competing demands for public money, however, it is important that resources for education are used efficiently. There have been few attempts to evaluate the costs of inefficiency in education, although one study suggests that the losses from inefficiency in secondary education are under 1% of potential GDP (Taylor, 1994). In addition, the findings on the relationship between education and growth suggest that it is important to distinguish between the quantity of education provided and the quality of provision. This has important implications for studies of efficiency in education, since measures of quality are traditionally more difficult to derive than measures of quantity.

It is useful to distinguish at the outset between the terms ‘efficiency’ and ‘effectiveness’. Efficiency refers to ‘doing things right’, while effectiveness relates to ‘doing the right things’ (Drucker, 1967). Thus, in the context of education, efficient use of resources (be they financial resources or the innate ability of students) occurs when the observed outputs from education (such as test results or value added) are produced at the lowest level of resource use; effective use of resources ensures that the mix of outcomes from education desired by society is achieved. It is efficiency (rather than effectiveness) of education with which this special issue is largely concerned.

Identifying how efficiently education is provided has challenged researchers over the decades. Development of frontier estimation techniques (in the late 1970s) such as data envelopment analysis (DEA) (Charnes et al, 1978, 1979; Banker et al, 1984) and stochastic frontier analysis (SFA) (Aigner et al, 1977; Battese and Corra, 1977; Meeusen and van den Broeck, 1977) led to an expanding literature on efficiency in the education context. Education institutions (such as schools or universities) are seen as multi-product organisations producing an array of outputs from various inputs. Frontier estimation methods can be used to estimate cost functions or production frontiers for these institutions from which efficiency estimates can be derived.
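To fix ideas, consider a minimal sketch (not tied to any particular paper in this issue) of the input-oriented, constant returns-to-scale envelopment programme of Charnes et al (1978) for an institution o, which uses m inputs x to produce s outputs y alongside n peer institutions:

\min_{\theta,\lambda}\ \theta \quad \text{s.t.} \quad \sum_{j=1}^{n}\lambda_j x_{ij} \le \theta\, x_{io}\ \ (i=1,\dots,m), \qquad \sum_{j=1}^{n}\lambda_j y_{rj} \ge y_{ro}\ \ (r=1,\dots,s), \qquad \lambda_j \ge 0.

The optimal \theta^{*}\le 1 is the efficiency score of institution o, and the non-zero \lambda_j identify its benchmark peers on the frontier; adding the constraint \sum_j \lambda_j = 1 yields the variable returns-to-scale model of Banker et al (1984). SFA, by contrast, specifies a parametric cost or production function with a composed error term that separates statistical noise from inefficiency.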

This special issue represents a timely reflection on efficiency in education as countries struggle to recover from the global financial crisis (which started circa 2008) and its effect on public funding. The special issue grew out of (but was not confined to) a two-day workshop on efficiency in education which took place in London in 2014. This introductory paper is structured in seven sections, of which this is the first. The remaining sections provide an overview of the themes addressed by the special issue and introduce the papers featured within it.

2 Frontier estimation methods: a literature review

In line with the overarching theme of the special issue, ‘Efficiency in education: a review of literature and a way forward’ by De Witte and López-Torres focuses exclusively on the efficiency (rather than effectiveness) literature in education. The paper, aimed at experienced researchers in the field, provides a comprehensive overview of frontier efficiency measurement techniques and their application in the education context up to 2015. A unique feature of this review compared with previous ones (for example Worthington, 2001; Johnes, 2004; Emrouznejad et al, 2008) is that it bridges the gap between the parametric (generally regression- or SFA-based) education economics literature and the non-parametric (typically DEA-based) efficiency literature. This is a useful contribution, and it draws out hitherto unremarked connections between themes in the two strands of literature.

This paper provides an excellent resource for researchers in the field: it covers studies at various levels of analysis (individual students, institutions and nations), identifies the datasets and the measures of inputs and outputs used in past papers, and details the non-discretionary or environmental variables relevant in education studies. The discussion of methodological concerns revolves around endogeneity and its sources, in particular omitted variable bias, measurement error, selection bias and simultaneous causality. This leads to a discussion and comparison of each of these problems in the parametric and non-parametric contexts. The (non-parametric) efficiency literature is criticised for largely ignoring the potentially detrimental effects of endogeneity on efficiency estimates while devoting too much energy to minor methodological details.

A particular contribution of the review concerns the links made between parametric and non-parametric approaches in four cases. First, matching analysis is compared with conditional efficiency. Second, quantile regression is related to partial frontiers. Third, difference-in-difference analysis is compared with meta-frontier analysis. Fourth, it is noted that value added studies are more prevalent in the economics of education literature than in the efficiency literature, where they are relatively rare. Mutual benefits, it is argued, could be gained in each of these four areas if researchers in one field learnt from those in the other.

3 Assessing equity and effectiveness in resource allocation for primary and secondary education

According to the review of De Witte and López-Torres in this special issue, educational studies may focus on several levels (university, school/high school, district, county and country), but only a small number of frontier-based efficiency studies have focused on country or multi-country analysis. There are several reasons why authors may avoid cross-country efficiency analyses. First, comparable data at national level can be difficult to obtain, although the availability of datasets such as TIMSS (the Trends in International Mathematics and Science Study), PIRLS (the Progress in International Reading Literacy Study) and PISA (the Programme for International Student Assessment) has made it possible to compare countries based on pupil attainment. Second, an assumption underlying frontier estimation is that the units of assessment face the same production conditions and technology. This assumption is difficult to maintain in a cross-country framework, especially where the sample of countries is particularly diverse. The heterogeneity of country technologies and education policies may therefore hinder the comparability of the results, but at the same time cross-country analysis is the only way to compare and benchmark educational policies across countries. Some examples of cross-country analyses include Afonso and St Aubyn (2005, 2006), Giménez et al (2007) and Thieme et al (2012).

In this issue, Cordero, Santin and Simancas-Rodriguez, in their paper ‘Assessing European primary school performance through a conditional nonparametric model’, contribute to the cross-country empirical literature by applying a frontier-based method to assess the efficiency of primary schools in 16 European countries (based on data from PIRLS 2011). The efficiency of primary schools is assessed through an order-m non-parametric approach in which a single output (the average result in the PIRLS reading test) and inputs relating to the prior achievement of students and to school resources (such as teachers, computers and instructional hours) are used. The importance of the environment in which schools operate is stressed in this paper and is taken into account in a second-stage analysis, where country and school contextual factors are considered to account for the heterogeneity of countries and schools. The findings reveal that country-specific factors have a greater influence on efficiency than school-specific factors, highlighting the importance of benchmarking countries’ educational policies.
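As a rough indication of the order-m idea (Cazals et al, 2002), and not the authors’ exact specification, the output-oriented order-m score of a school using inputs x to produce outputs y can be written as

\lambda_m(x,y) \;=\; E\!\left[\ \max_{1\le j\le m}\ \min_{r}\ \frac{Y_j^{r}}{y^{r}}\ \right],

where the Y_j are the outputs of m schools drawn at random from those using no more input than x. Because the benchmark is an expectation over finite random samples of peers rather than the full frontier envelope, the estimator is robust to outliers and allows scores above as well as below one.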

Much is being done on cross-country analyses by the OECD, whose report on equity and quality in education we highlight (OECD, 2012). Cross-country comparisons focus regularly on funding and educational expenditure issues (Afonso and St Aubyn, 2005, 2006), but the general consensus appears to be that providing more money and resources to schools is not enough to improve their quality and their students’ performance (Hanushek, 2003). The way the money (or funding) is allocated, however, is a means by which governments can improve equity between schools facing different environments (typically a harsher environment is one where the percentage of economically and culturally disadvantaged students is higher). These issues are at the heart of the papers in this special issue by Haelermans and Ruggiero entitled ‘Nonparametric estimation of the cost of adequacy in education: the case of Dutch schools’ and by Weber, Grosskopf, Hayes and Taylor (henceforth Weber et al) entitled ‘Would weighted-student funding enhance intra-district equity in Texas? A simulation using DEA’.

These papers represent a timely contribution to the literature given the current interest in allocation of funding in Europe in response to the 2008 economic crisis (see European Commission, 2014 for the various funding mechanisms of public sector schools). In England, for example, the Government has recently produced a consultation document on the funding of schools (Department for Education, 2016). A major part of the proposal is a move away from block funds allocated to schools on the basis of historical costs and towards a funding mechanism which removes inequities by allocating a lump sum to schools and incorporating a national mechanism for dealing with the extra costs faced by low-population areas with small schools. In Portugal, a new formula for the financing of higher education institutions was put forward in July 2015, but public primary and secondary schools are still financed based on approved budgets.

The case of the Netherlands is analysed in this special issue by Haelermans and Ruggiero, where it is shown that schools in harsher environments do indeed receive extra funds; however, the extra funding does not compensate for the excess costs of achieving acceptable standards (the authors derive the cost required for schools to achieve a certain standard of performance deemed acceptable). The minimum cost to achieve these standards is called adequacy by the authors (see also Levačić, 2008). Results further suggest that the minimum cost to reach standards for schools located in favourable environments is about 70% of the cost for schools in harsher environments, which testifies to the importance of taking the environment of schools into account in efficiency and effectiveness studies.

In Weber et al's paper, the authors also tackle financing issues, this time in the US (schools within Texas school districts), linking these with equity issues. The equity the authors are interested in is not equity of school budgets, but equity of school outcomes (analysed under two budget scenarios: (1) the current budget and (2) a simulated budget determined by weighted student funding, based on each school's number and type of students). The main results show that policies which reduce inefficiency tend to enhance equity as well. The paper also suggests that weighted student funding may be a way to reduce inequalities, but cautions that, for inefficient schools, an enhanced budget may not resolve their inefficiencies and inequalities. That is, there are winners (schools whose budgets would increase under weighted student funding) and losers (schools whose budgets would shrink under weighted student funding), but extra funds will ultimately only benefit efficient schools, which are better able to use the extra resources. This paper therefore links three important issues in education: funding, efficiency and equity (see also Woessmann, 2008 for links between efficiency and equity of schools in the EU). In addition, Weber et al contribute to and extend the literature on school funding formulae (Levačić, 2008; BenDavid-Hadar and Ziderman, 2011).

4 Assessing aspects of efficiency and productivity in tertiary education

As noted earlier, education (including higher education) contributes to economic growth; higher education also receives public funding in many countries, and so it is important to understand productivity growth in universities. The paper by Edvardsen, Førsund and Kittelsen (henceforth Edvardsen et al) entitled ‘Productivity development of Norwegian institutions of higher education 2004–2013’ provides an excellent example of how a Malmquist productivity index (including computation of components) can be used to inform policy makers and managers. The study is based on universities in Norway over a 10-year period. With only a small number of exceptions, previous studies of higher education productivity growth (Flegg et al, 2004; Carrington et al, 2005; Johnes, 2008; Worthington and Lee, 2008; Kempkes and Pohl, 2010; Margaritis and Smart, 2011) rely on point estimates of productivity change. This study, however, applies a bootstrap procedure (Simar and Wilson, 1998, 1999, 2000) for the Malmquist productivity index (MPI) which takes into account sampling variation. It differs from Parteka and Wolszczak-Derlacz (2013), which also applies bootstrap methods in the MPI context, in that it (i) derives and examines the components of the MPI and (ii) visually inspects productivity change in the context of labour input changes.
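For reference, and in a standard formulation rather than necessarily the exact variant used by Edvardsen et al, the output-oriented MPI between periods t and t+1, with D^{t} denoting the distance function measured against the period-t frontier, decomposes as

M^{t,t+1} \;=\; \underbrace{\frac{D^{t+1}(x^{t+1},y^{t+1})}{D^{t}(x^{t},y^{t})}}_{\text{catch-up (efficiency change)}} \times \underbrace{\left[\frac{D^{t}(x^{t+1},y^{t+1})}{D^{t+1}(x^{t+1},y^{t+1})}\cdot\frac{D^{t}(x^{t},y^{t})}{D^{t+1}(x^{t},y^{t})}\right]^{1/2}}_{\text{frontier shift (technical change)}},

with values above one indicating productivity growth under this convention. The Simar and Wilson bootstrap resamples the data to place confidence intervals around M and its components, so that productivity changes can be tested statistically rather than merely reported as point estimates.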

The production relationship is defined with two inputs and four outputs. The initial analysis of the components of the MPI (catch-up and frontier shift) suggests that the two measures move in parallel until 2009, after which frontier shift grows markedly while the catch-up measure gradually deteriorates. Productivity change distributions for each university over time are examined in three time blocks and reveal a general picture in which the group of institutions with a significant productivity decrease is shrinking while the group with a productivity increase is expanding.

The authors note that it would be interesting to extend the study to examine the relationship between size and productivity growth, and in particular the question of whether merging institutions might increase productivity; the effect of merging on both efficiency and productivity is largely unresearched (Johnes, 2014). While there are some mergers in this dataset, their small number precludes a more detailed study at present, but this might become possible as the database grows.

5 Using student ratings to assess performance in tertiary education

There are two papers in this special issue (one by Thanassoulis, Dey, Petridis, Georgiou and Goniadis – henceforth Thanassoulis et al – entitled ‘Evaluating higher education teaching performance using combined analytic hierarchy process and data envelopment analysis’, and another by Sneyers and De Witte entitled ‘The interaction between dropout, graduation rates and quality ratings in universities’) that use students’ views to assess efficiency in the higher education context. They are distinct, however, in that one (Thanassoulis et al) uses student feedback to assess the performance of individual tutors, while the other (Sneyers and De Witte) uses student satisfaction in a model with both graduation and dropout rates to examine efficiency at programme level. Much of the extant literature on efficiency and frontier estimation in higher education focuses on the university or the department as the unit of assessment (exceptions include Dolton et al, 2003; Johnes, 2006a, b; Barra and Zotti, 2016, whose empirical analyses are at the student level, and Colbert et al, 2000, who examine efficiency in the context of MBA programmes). These two papers therefore offer original contributions by providing approaches for evaluating efficiency at tutor and programme levels which, as established in the review paper by De Witte and López-Torres in this special issue, have not previously been examined.

The paper by Thanassoulis et al deals with the assessment of the teaching efficiency of academic staff. The method it proposes combines the Analytic Hierarchy Process (AHP) and DEA in order to arrive at an overall assessment of a tutor reflecting their performance in teaching. To the extent, however, that a teacher normally also carries out research, the method also allows the teacher to be assessed given their performance in research. A crucial feature is that the teaching dimension reflects the value judgements made by the students at the receiving end of the teaching; this is a key point of departure from previous studies in this area. The basic premise is that students, depending perhaps on gender, career aspirations and type of course (e.g. optional versus compulsory), may attach different weights to the criteria, deeming some of them more important than others. The different weights are then used in the computation of a mean aggregate teaching score per tutor, operationalised through AHP (Saaty, 1980). The aggregate grade (or grades) on teaching, along with measures of the tutor's research output, are then used as outputs in a DEA model, set against the salary and teaching experience of the teacher.
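Purely as an illustration of how the two stages chain together (the notation is ours, not the authors’), the AHP stage might deliver, for each student group g, weights w_{gc} over the teaching criteria c; the mean aggregate teaching score of tutor o would then take a form such as

T_{o} \;=\; \frac{1}{N_{o}} \sum_{k=1}^{N_{o}} \sum_{c} w_{g(k)c}\, r_{okc},

where r_{okc} is the rating given by student k (belonging to group g(k)) to tutor o on criterion c, and N_o is the number of the tutor's respondents. T_o, together with measures of research output, then enters the DEA model as an output, with salary and teaching experience as the inputs.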

The authors illustrate their approach using real data (modified for confidentiality) on these variables for teachers at a Greek university. The DEA model is solved to estimate the scope for a teacher to improve performance depending on the relative emphasis given to teaching versus research. It is noteworthy that similar estimates of the scope to improve teaching are obtained whether the emphasis is placed solely on improving teaching or equally on improving teaching and research. This suggests that teaching and research are largely separable, and that poor teaching performance is not generally compensated for by good performance in research. Information of this type can be useful to a teacher in setting aspiration levels for improvement in teaching, depending on whether the tutor is to focus on teaching alone or on teaching and research.

The paper by Sneyers and De Witte, in this special issue, addresses the use of first-year student dropout rates, programme quality ratings and graduation rates as indicators of university performance for the distribution of funding. In the Netherlands, for example, 7% of the higher education budget is earmarked for performance principally on these three indicators, yet there is little work to date on the interaction between them. Is it possible, for example, to perform well along all three dimensions simultaneously? Given that dropout at the end of the first year at university could actually be a means of selecting the best and most motivated students to go forward, it is important to examine graduation rates and quality ratings conditional on the first-year dropout rate. Specifically, the paper compares programmes on graduation rates and quality ratings (conditional on first-year dropout rates) and examines the programme and institutional characteristics which underpin performance.

The paper is original in two ways. First, the level of analysis is the programme (rather than, for example, the institution or department). Second, the paper applies a non-parametric conditional efficiency method with continuous environmental variables (Cazals et al, 2002; Daraio and Simar, 2005) and extends this to also include discrete environmental variables (De Witte and Kortelainen, 2013). The significance of the effects of environmental variables on performance at programme level can be derived using this approach.
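Loosely, and only as a sketch of the estimator's form, a conditional (output-oriented, FDH-type) score for a programme with inputs x, outputs y and environment z restricts the reference set to observations with a similar environment, for instance

\hat{\lambda}(x,y\,|\,z) \;=\; \max_{\{i:\ x_i \le x,\ \|z_i - z\| \le h\}}\ \min_{r}\ \frac{y_i^{r}}{y^{r}},

so that each programme is benchmarked only against programmes operating under comparable conditions, with the bandwidth h chosen in a data-driven way; De Witte and Kortelainen (2013) show how discrete environmental variables can be accommodated through discrete kernels.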

The study employs a rich dataset for universities in the Netherlands. The authors find considerable variation in the extent to which the first-year dropout rate (and the selectivity it implies) is used to achieve higher graduation rates and programme quality ratings. Some programmes are found to be inefficient in terms of their graduation rates and quality ratings (given the incidence of first-year dropout) and could learn from the practices characterising the efficient programmes. There is clear evidence of programme characteristics which influence graduation rates and quality ratings. These results therefore have clear policy implications, including, for example, that policies formulated at programme level would have greater impact than those formulated at institution level.

6 Methodological papers with special reference to education

There are two papers in this issue with a primary focus on methodology and a secondary focus on an empirical application: one by Mayston, entitled ‘Convexity, quality and efficiency in education’, and the other by Karagiannis and Paschalidou, entitled ‘Assessing research effectiveness: A comparison of alternative parametric models’.

The paper by Mayston addresses the issue of incorrectly assuming convexity of the production possibility set (PPS) in DEA, as can happen in assessments in the education context. The question is of course not new, and many authors have questioned the assumption of convexity in DEA in general. For example, Farrell (1959) notes that indivisibilities in production or economies of specialisation could lead to a non-convex PPS. He concludes, however, that within the framework of competitive markets convexity of production sets, or indeed of indifference curves, is unnecessary for “received economic theory” so long as each producer accounts for a negligible part of total output. Within the extant DEA literature, it is well understood that in many contexts the PPS may not be convex. Free Disposal Hull (FDH) technologies, introduced by Deprins et al (1984), can be deployed to measure efficiency, set performance targets and so on when convexity of the PPS cannot be assumed. An interesting empirical application in which DEA and FDH are used on the same dataset is that by Cullinane et al (2005). They assess the efficiency of container ports, where inputs in the form of indivisible capital items such as berths, gantry cranes and straddle carriers can lead to a non-convex PPS. They conclude that the FDH method in some cases does not set demanding targets and can make units appear efficient simply for lack of comparators. Its advantage is that when units are not efficient, the benchmarks exist in real life and so can be used as role models for less efficient units to emulate. DEA, with its assumption of a convex PPS, is on the other hand more discriminating in terms of efficiency and so better suited to setting more challenging performance targets. This, however, can come at the expense of using virtual rather than real units as role-model benchmarks for inefficient units.
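For concreteness, the input-oriented FDH score of a unit (x_0, y_0) involves no convexification at all: the unit is compared only with observed units that dominate it in outputs,

\theta_{FDH}(x_0,y_0) \;=\; \min_{\{j:\ y_j \ge y_0\}}\ \max_{i}\ \frac{x_{ij}}{x_{i0}},

whereas the corresponding DEA score is computed over the convex hull of the observations and can therefore never exceed the FDH score, which is why DEA is the more discriminating (and more demanding) of the two.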

The Mayston paper argues that, in the specific context of DEA assessments of comparative efficiency in education, convexity may not hold because outputs have a quality dimension in a way that differs from output quality in other contexts. In addition, lack of convexity can arise because physical capital assets such as lecture theatres and libraries are indivisible, and because intangible assets in the form of specialised knowledge held by academics can also lead to indivisibilities in efficient research output. It is suggested that we cannot simply assume convexity in an educational context: convexity would require that gains from complementarity between research and teaching quality are sufficiently strong to make up for the loss of gains arising from the ‘indivisible’ specialised knowledge needed to produce original contributions to research.

The situation is further complicated by two facts. First, in the educational context, assessments of research and teaching are reflected in grades, and each grade covers a range of performance. Second, the rewards attached to grades are highly non-linear (in the UK research assessments of universities, for example, the financial benefits from achieving a grade 4 are much higher than those from achieving a grade 3). The paper argues that factors of this type in the educational context both militate against convexity of the PPS and lead to non-linear utilities over outputs.

The effect of assuming convexity in DEA when it does not hold is that the results can understate the true technical efficiency of a unit while at the same time overstating its allocative efficiency. This can happen because the ‘convex’ technically efficient point can lie outside the true non-convex frontier. Caution is therefore needed, in particular, when decomposing overall inefficiency into allocative and technical components.
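The decomposition at issue is the standard Farrell one: writing overall (cost) efficiency as the ratio of minimum to observed cost,

OE \;=\; \frac{\text{minimum cost}}{\text{observed cost}} \;=\; TE \times AE,

so that, for a given overall score, any understatement of technical efficiency (TE) caused by measuring against an artificially convexified frontier is mechanically offset by an overstatement of allocative efficiency (AE).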

The paper by Karagiannis and Paschalidou compares the Benefit of the Doubt (BoD) model of Cherchye et al (2007) and the Kao and Hung (2003) (K&H) model for assessing entities characterised by multiple indices of performance. Further, it addresses the case where there is no traditional set of inputs to set against the indices; the authors refer to this as assessing the ‘effectiveness’, rather than the ‘efficiency’, of the entities' use of resources. Each of the two methods is used under three alternative approaches for arriving at the weights by which the indices of performance on the criteria are aggregated into an overall index of performance. The authors illustrate the six resulting approaches using data on the research outputs of faculty at a Greek university.

The BoD model is essentially equivalent to a DEA model in which the PPS is formed under constant returns to scale (CRS) with a notional input level of 1 for all entities (academics in this case), while the output levels reflect measures of attainment on each criterion (e.g. papers, citations and books in this case). The K&H model is similar to the BoD model in that it estimates an optimal set of weights to assign to each criterion; however, it does so under the sole restriction that the weights add up to 1, rather than under the traditional DEA restrictions. This is equivalent to computing the most favourable weighted average possible of the criterion values of each entity being assessed in turn. Such a weighted average makes better sense in practice when indices of attainment on each criterion are added to arrive at a composite index of overall performance. The paper notes that the K&H and BoD models produce related solutions when the measures of attainment on each criterion range between 0 and 1 (e.g. when they are indices).
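In a sketch using our own notation, the BoD score of academic o with normalised attainments y_{ro} on criteria r = 1, …, s is

BoD_{o} \;=\; \max_{w\ge 0}\ \sum_{r} w_{r}\, y_{ro} \quad \text{s.t.}\quad \sum_{r} w_{r}\, y_{rj} \le 1 \ \ \text{for every academic } j,

which is precisely a CRS DEA model with a single unit input for every entity; the K&H variant keeps the same objective but replaces the n normalisation constraints with the single requirement \sum_r w_r = 1, so that the score is the most favourable weighted average of the entity's own attainments.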

The paper proceeds to explain how the two models differ when we wish to restrict the weight flexibility described in the foregoing paragraph. The six resulting approaches reflect three alternative treatments of weight flexibility applied to each model, ranging from full flexibility (each entity is free to choose the weights assigned to each performance index) to no flexibility, reflected in a common set of weights (every entity applies the same weights to each performance index). The paper uses data on the research outputs of academics from the authors’ own institution. One key finding is that there is greater variability in the results within each of the two methods (BoD versus K&H), depending on how the weights on the criteria are restricted, than between the methods themselves when the same type of restriction is applied to the weights. Faculty are found to follow a more or less bi-modal distribution in research effectiveness, with very few achieving well on research output and most achieving poorly. The findings clearly have managerial implications for improving faculty research output.

7 Concluding remarks

In this introductory paper to the special issue, we have presented an overview of the papers that constitute it, highlighting their main contributions and findings. We have also placed these papers in the context of the existing literature on efficiency in education, drawing the reader's attention to some fundamental issues. The issues addressed here include: cross-country analyses and their importance for benchmarking educational policy; the need to understand the impact of funding policies on the quality, efficiency and equity of education; the need to analyse educational issues over time in dynamic settings; the importance of using student feedback in tertiary education efficiency analysis, as well as of assessing efficiency at the level of the individual; and, finally, the importance of understanding the methodological assumptions behind efficiency models, such as convexity, and of applying alternative assessment models to the same data and reconciling the findings.

The foregoing list covers current and pertinent issues in education, but many others could have been raised. Further examples include the impact of particular educational practices (such as student repetition or streaming) in primary and secondary education; the trade-off or complementarity between teaching and research outputs in university assessments; funding and financing in universities and their impact on efficiency; and the measurement of the quality of both inputs and outputs at all levels of education.

We hope this summary will enable the reader to identify at a glance the papers within this special issue that best fit their research interests.