Ramón Arce, Esther Arias, Mercedes Novo, and Francisca Fariña
Universidade de Santiago de Compostela, Spain
Received 5 December 2019, Accepted 13 April 2020
Abstract
The inconsistency of the results both within and between previous meta-analyses on batterer intervention program efficacy, together with the publication of new batterer interventions, underscored the need for an up-to-date meta-analytical review. A literature search yielded 25 primary studies, from which 62 effect sizes were obtained, with a total sample of 20,860 intervened batterers. The results of a global meta-analysis showed a positive, significant, and medium-magnitude effect size for batterer interventions, but one that was not generalizable. Nevertheless, the results exhibited a significantly higher rate of recidivism measured in couple reports (CRs) than in official records (ORs). As a consequence, intervention efficacy measured in CRs was null, whilst in ORs it was positive and significant. As for the intervention model, positive and significant effects were observed under the Duluth Model and cognitive-behavioural treatment programs (CBTPs), but a higher effect size was obtained with CBTPs than with the Duluth Model (under this model, interventions may have negative effects, i.e., an increase in recidivism rate). In relation to intervention length, short interventions failed to reduce recidivism in ORs and may have negative effects, while long interventions were effective in reducing the recidivism rate in ORs without negative effects. Efficacy evaluations in short follow-ups were invalid, as they artificially boosted the recidivism reduction rate. Limitations of ORs and short follow-ups as measures of intervention efficacy and implications of the results for batterer intervention are discussed.
Resumen
La inconsistencia interna y entre las revisiones metaanalíticas en los resultados sobre la eficacia de los programas de intervención con maltratadores, así como la publicación de nuevos estudios, pone de manifiesto la necesidad de llevar a cabo una revisión metaanalítica actualizada. Se encontró un total de 25 estudios primarios, de los que se obtuvieron 62 tamaños del efecto para una muestra total de 20,860 maltratadores intervenidos. Los resultados del metaanálisis global mostraron un tamaño del efecto promedio positivo, significativo y de una magnitud moderada para la intervención con maltratadores, pero no generalizable. Sin embargo, los resultados revelaron una tasa de reincidencia mayor medida en los informes de las parejas (IPs) que en los registros oficiales (ROs). Como consecuencia, la eficacia de la intervención medida en los IPs resultó nula, mientras que en los ROs fue positiva y significativa. En relación al modelo de intervención, se encontraron tamaños del efecto positivos y significativos con el Modelo Duluth y los programas de tratamiento cognitivo-conductuales (PTC-Cs), pero el tamaño del efecto obtenido con los PTC-Cs era significativamente mayor que con el Modelo Duluth (con este modelo las intervenciones pueden tener efectos negativos, es decir, un incremento en la tasa de reincidencia). En relación a la longitud de la intervención, las intervenciones breves fallaron en la reducción de la reincidencia en los ROs y pueden tener efectos negativos, en tanto que las intervenciones largas fueron eficaces en la reducción de la tasa de reincidencia en los ROs y no dan lugar a efectos negativos. Las evaluaciones de la eficacia de la intervención en períodos cortos de seguimiento resultaron no válidas al incrementar artificialmente la tasa de reducción de la reincidencia.
Se discuten las limitaciones de la medida de la eficacia de la intervención en los ROs y en períodos cortos de seguimiento, así como las implicaciones para la intervención con maltratadores.
Keywords
Batterer, Intervention assessment, Duluth model, Cognitive-behavioural treatment programs, Official records, Couple reports
Palabras clave
Maltratador, Evaluación de la intervención, Modelo Duluth, Programas de tratamiento cognitivo-conductuales, Registros oficiales, Informes de las parejas
Cite this article as: Arce, R., Arias, E., Novo, M., and Fariña, F. (2020). Are Interventions with Batterers Effective? A Meta-analytical Review. Psychosocial Intervention, 29(3), 153-164. https://doi.org/10.5093/pi2020a11
Correspondence: ramon.arce@usc.es (R. Arce)
Intervention programs for batterers have been the subject of controversy ever since their conception. These interventions have been open to criticism from both a restorative perspective and a feminist perspective demanding that resources be allocated to victims, not to batterers. Such criticism, however, comes into direct conflict with the legal and judicial mandate of prison institutions, which are obliged to embark on the resocialization and rehabilitation of inmates. A further bone of contention concerns the efficacy of interventions: if an intervention lacks efficacy, it is unfounded. Initially, interventions were designed for violent physical or sexual offenders, but in recent years they have been extended to less serious offences, mainly minor offences and misdemeanours with non-custodial community sentencing. Claims that interventions with batterers were not effective (Rosenfeld, 1992) have been refuted by meta-analytical reviews that have found interventions to be effective. Nevertheless, in several reviews the mean effect size was significant, but modest (Babcock et al., 2004), whereas in others the mean effect size was important, but not generalizable to all interventions (Arias et al., 2013). Accordingly, estimates of the benefits of batterer interventions range from 5% (Babcock et al., 2004) to 20% (Arias et al., 2013). Moreover, the comparison of the mean effect size of the efficacy of interventions (BESD; Rosnow & Rosenthal, 1988) in the most recent meta-analysis (Arias et al., 2013), r = .20, with the results for the treatment of delinquency in general (Redondo et al., 2002), and the results for sexual offenders in particular, r = .13 (Schmucker & Lösel, 2015), showed that batterer interventions were similarly or more effective. As the effects were subject to heterogeneity, moderators of the intervention were analysed.
The results have been inconsistent across the variable measuring effects. In contrast to the systematically positive and significant, but not generalizable, effect reported in official records (ORs) (Arias et al., 2013; Babcock et al., 2004; Feder & Wilson, 2005; Rosenfeld, 1992), the results for couple reports (CRs) were inconsistent. In fact, Feder and Wilson (2005) and Arias et al. (2013) found a null effect in CRs, whilst Rosenfeld (1992) encountered a notable decline in the recidivism rate (33% in treated batterers vs. 47.3% in dropouts), and Babcock et al. (2004) found the same effect as in ORs, d = 0.18. Discrepancies were likewise observed in the results for the type of intervention. Thus, in terms of intervention programs based on the Duluth Model, a feminist psychoeducational approach, Babcock et al. observed a significant effect on the reduction of recidivism in ORs, d = 0.25 (12.4%), and CRs, d = 0.24 (11.9%), but no significant effect was observed in Cognitive-Behavioural Treatment Programs (CBTPs). In contrast, Arias et al. found a moderate effect size in ORs, but not generalizable to all studies (i.e., with adverse, negative outcomes), both in the Duluth Model, d = 0.41, and in CBTPs, d = 0.47, whereas the effect in CRs was non-significant with either type of treatment. As for the length of the intervention, Babcock et al. (2004) determined that short and long interventions were equally effective in ORs (d = 0.16 and d = 0.20, respectively), whereas Arias et al. (2013) noticed that long interventions had a significant and generalizable effect, d = 0.49, but short ones did not. In CRs, Babcock et al. obtained small significant effects in both short and long interventions. In contrast, Arias et al. found that both short and long interventions could increase recidivism. During long-term follow-up, Babcock et al. observed a significant effect in the decline of recidivism in ORs, d = 0.25, which Arias et al. claimed was not generalizable, and could potentially increase the recidivism rate by 17.7%.
In short-term follow-ups, the meta-analysis by Babcock et al. revealed a smaller, but still significant, effect of the intervention than in long-term follow-ups, d = 0.13 (unexpectedly, given that the detected rate of recidivism is obviously smaller in shorter than in longer periods), whereas Arias et al. obtained a null effect, d = 0.04, which could eventually become an adverse effect in the range of 22%. In CRs, Babcock et al. obtained a similar pattern of results: significant effects in both short and long follow-ups and, paradoxically, larger in long follow-ups, which provide a broader measure of recidivism than short periods. In comparison, Arias et al. found a null effect in short follow-ups, d = 0.03, with a negative effect of up to 28.3%, and a negligible effect in long follow-ups, d = 0.12, with adverse effects of up to 18.2%.
The inconsistency in the results both within and between meta-analyses, the fact that they initially measured physical violence, a measure only recently extended to encompass psychological violence, and the proliferation of new batterer intervention programs underscore the need for a meta-analytical review to establish the actual state of the art of batterer interventions, and to elucidate the inter- and intra-analysis inconsistencies.
Study Search
The search of studies was designed to update the 2012 meta-analysis by Arias et al. (2013). Thus, the following search strategies were employed: a) search in the broad databases PsycInfo, ERIC, EBSCO, and Google Scholar; b) search in gender violence observatories (e.g., www.work-with-perpetrators.eu; www.VAWnet.org; www.mincava.umn.edu; www.courtinnovation.org; www.cienciaspenales.net; www.iresweb.org); c) contacting prominent researchers in the field (that is, the corresponding authors of all the papers found, both included and excluded); and d) review of the reference lists of all the papers found and of previous meta-analytical reviews.
The most productive keywords were: “batterer”, “intervention program”, “evaluation”, “assessment”, “effectiveness”, “intimate partner violence”, “partner-violent men”, “recidivism”, “reoffending”, “attrition”, “domestic violence”, “court mandates batterer intervention”, and “prison intervention”. The list of keywords was generated by a system of successive approximations whereby the initial keywords of previous meta-analyses determined the inclusion of relevant keywords from all the papers found.
Inclusion and Exclusion Criteria
Bearing in mind the objectives of the meta-analysis, the following inclusion criteria were applied to the papers identified in the search: a) they reported the number of participants; b) they provided the recidivism rate of the sample of subjects who completed the intervention; c) they applied an experimental or quasi-experimental design (with or without control group); d) they indicated the theoretical approach, contents, and duration of the intervention program; and e) they stipulated the follow-up period for measuring recidivism. Studies involving a follow-up period of 6 months or shorter were excluded owing to the lack of validity of the measures. A total of 25 papers were included under these criteria, with 62 effect sizes and a total sample of 20,860 intervened batterers. The study search flowchart is shown in Figure 1.
Data Analysis
A meta-analysis of experiments was performed by correcting the effect sizes according to the distribution of artifacts (Schmidt & Hunter, 2015). As recidivism was normally expressed in percentages, or could be computed as a percentage when this was not the case, the measure of recidivism in this analysis was defined as the percentage of batterers who had reoffended in gender violence (data on recidivism in other offences were excluded) during the follow-up period.
Two indices have been proposed for calculating the effect size in proportions: Cohen’s h and Hedges and Olkin’s (1985) δ, with the latter leading to slightly higher, but qualitatively similar, results to h (Arias et al., 2013). In this meta-analysis, Hedges and Olkin’s δ was employed. Nonetheless, all of the analyses were replicated with h and yielded qualitatively similar results. In terms of interpretation, δ and h are interchangeable with Cohen’s d, and effect sizes of 0.20, 0.50, and 0.80 are considered to be small, medium, and large, respectively. However, this classification has generated discrepant interpretations, with small effect sizes classified as negligible by certain authors but as relevant by others. Hence, the effect size was transformed into percentiles (Monteiro et al., 2018) and the magnitude was interpreted in terms of percentage superiority over all possibilities (Vilariño et al., 2018). As for the calculation of δ, in previous reviews it has been calculated with control groups in both experimental and quasi-experimental designs. The results showed that the type of design mediated differences in effect sizes. However, the type of design is not an effect of the intervention. That is, the differences in results were due to the test value, not the intervention. For this reason, and in order to homogenize the inter-study contrast value, to control the bias of control groups in primary studies, and to include interventions without control groups, the contrast value of recidivism was defined as reoffending without intervention and calculated as the recidivism rate, weighted by sampling error, of the control group total (Schmidt & Hunter, 2015), which was .22 in ORs and .28 in CRs.
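As an illustration of an effect size for proportions computed against a fixed contrast value, the following minimal sketch uses Cohen's h (not the Hedges and Olkin δ actually employed in this meta-analysis). The function name, the sign convention (positive values mean less recidivism than the contrast value), and the 12% recidivism rate are hypothetical choices made for the example; only the contrast values .22 and .28 come from the text.

```python
import math

# Contrast values reported in the text: recidivism rate without intervention.
BASELINE_OR = 0.22  # official records
BASELINE_CR = 0.28  # couple reports


def cohens_h(p_baseline: float, p_treated: float) -> float:
    """Cohen's h for two proportions, using the arcsine transformation.

    Sign convention (an assumption of this sketch): a positive h means the
    treated group reoffended less than the baseline contrast value.
    """
    def phi(p: float) -> float:
        return 2.0 * math.asin(math.sqrt(p))

    return phi(p_baseline) - phi(p_treated)


# Hypothetical primary study: 12% recidivism among completers, measured in ORs.
h_example = cohens_h(BASELINE_OR, 0.12)
print(f"h = {h_example:.3f}")
```

A study whose recidivism rate exceeds the contrast value yields a negative h under this convention, which corresponds to an intervention with adverse effects.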
Once the effect sizes had been computed from the primary studies (see Appendix), the following statistics were calculated: sample size weighted mean effect size (d), standard deviation of d (SDd), standard deviation predicted for sampling error alone (SDpre), standard deviation of d after removing sampling error variance (SDres), mean true effect size (δ), standard deviation of δ (SDδ), percentage of variance attributable to statistical artifacts (%Var), 95% confidence interval for d (95% CId), and 80% credibility interval for δ (80% CIδ). Although some results called for further meta-analyses of moderators (the variance explained by artifactual errors was below 75%, i.e., the 75% rule; Schmidt & Hunter, 1981), these were not possible when the number of effect sizes (k ≤ 3) or the sample sizes (N < 400) were insufficient, or when no other moderators had been defined in the primary studies.
When meta-analyses are calculated to study moderators by grouping variables according to levels, the results do not include a comparison between meta-analyses. Such results are therefore insufficient, as in the analysis of intervention models, where significant effects were observed in ORs, but whether they were the same or higher in one condition than in another was not reported. The same occurred with the duration of the intervention and the follow-up period. To overcome this limitation, the solution proposed by Amado et al. (2015) was adopted, in which the statistic qs was calculated to compare two effect sizes by converting each to r and then comparing the rs. Furthermore, the results of a meta-analysis are also quite limited in discerning the implications for practice, in this case, the intervention. Regarding the implications of the meta-analysis for professional practice, Fariña et al. (2017) recommend, according to the specific objectives at hand, sensitivity estimation (statistic U), effect quantifying (in the present meta-analysis, the efficacy of the intervention, BESD), and the probability of superiority.
Criterion Reliability
As CRs were gathered with different measurement instruments, the correction for criterion unreliability was computed with Mosier’s (1943) composite reliability coefficient, r = .87.
Coding
For the analysis of moderators, the following variables were encoded: recidivism measure (ORs, N = 19,509 and k = 46; CRs, N = 1,351 and k = 16); follow-up time (≤ 12 months, N = 3,509 and k = 21; > 12 months, N = 16,050 and k = 26); treatment duration (< 16 sessions/weeks, N = 3,631 and k = 14; > 16 sessions/weeks, N = 15,878 and k = 32); intervention level (individual vs. multi-level); type of sessions (individual sessions, group sessions, or combined sessions); and type of treatment (Duluth Model, N = 15,027 and k = 25; CBTPs, N = 1,629 and k = 9; other types of intervention [OTIs], N = 2,853 and k = 12). The encoding was carried out by an encoder who noted the levels of each of the categories created and described by the researchers, and marked exactly where they were referred to in the text. In the encoding of the type of intervention, the criterion stated by the authors of the primary studies was applied, but it should be noted that interventions based on the Duluth Model increasingly also include cognitive-behavioural training techniques for anger management, and CBTPs increasingly include a gender perspective (e.g., patriarchal attitudes and values). Notwithstanding, the approach ascribed to the intervention was in line with the models proposed. A second encoder reviewed all of the studies using the same encoding system. Thereafter, both the registered categories and the exact correspondence in the encodings were checked to estimate the true concordance (k) with the kappa statistic (Arce et al., 2000).
Kappa corrects for chance agreement, but not for the correspondence in encodings. Thus, the registration of a category by two encoders in two different places counts as two encoding errors in the true kappa (lack of correspondence), whereas it is counted as correct with Cohen’s kappa. Moreover, consistency between two encoders is not sufficient for estimating the fidelity of encodings (i.e., coding correctness in relation to the content of the category), which requires the encoding to remain stable through time (intra-encoder) and in other evaluation contexts (coders in other studies). The true inter- and intra-encoder concordance was very good (k > .81). Moreover, the encoders were consistent in other contexts, i.e., evaluations (Arias et al., 2013; Gallego et al., 2019). Bearing in mind the inter- and intra-evaluator and inter-context consistency of the coders, the encodings were reliable, that is, they reflected the original data.
Contrast of Inter-criteria Consistency of the Recidivism Measurement
The results showed a significantly higher rate of recidivism (+.1540), Z = 12.32, p < .001, measured in CRs than in ORs. Thus, ORs masked a significant amount of recidivism (not surprisingly, because many couples refuse to collaborate, a predictable behaviour associated with revictimization; Brame et al., 2015; and with concealment of injuries; Arce, Fariña, Seijo, et al., 2015). Moreover, the confidence interval for the difference in the mean rate of recidivism between CRs and ORs, 95% CI [.1491, .1589], revealed that it was stable around 15%.
Analysis of Outliers
Taking into account that the efficacy of a treatment varies according to the variable measured, an analysis of outliers was carried out for each measure, ORs and CRs. The data showed that one of the studies with CRs, Stith et al. (2004), was an extreme value (± 3 * IQR).
As for the effect sizes of studies measuring recidivism in ORs, neither extreme cases nor outliers were observed with the criterion ± 1.5 * IQR, but extreme cases were observed with a much more conservative criterion, ± 2SD (Chauvenet’s criterion). This criterion requires verifying, in each meta-analysis, whether the flagged studies are correctly classified as outliers or are instead inconvenient studies (contrary to the hypothesis of analysis) or moderators (Tukey, 1960). Thus, meta-analyses were performed with and without these studies, and with and without extreme values, to determine their impact on the estimators.
Analysis of the General Efficacy of Batterer Intervention Programs
The meta-analysis of the total effect size of the studies on the efficacy of batterer interventions (see Table 1) showed, for 62 effect sizes and a sample of 20,860 batterers, a positive, significant (confidence interval for d does not include zero), and medium-moderate mean true effect size, δ = 0.44, for batterer interventions. However, these results were not generalizable (credibility interval for δ includes zero) to all studies on batterer interventions, since interventions may have negative effects. Likewise, the results without outliers (N = 20,215 and k = 59) showed a positive, significant (confidence interval for d does not include zero), and medium-magnitude mean true effect size, δ ≈ 0.50, which was generalizable (credibility interval for δ does not include zero), i.e., batterer interventions had no negative effects on recidivism. Comparatively, the effect sizes with and without atypical values were equivalent, qs = .024, ns. Thus, the elimination of outliers excluded inconvenient results that had no effect on the global results.
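The decision rules used throughout these analyses (significance when the confidence interval for d excludes zero, generalizability when the credibility interval excludes zero, and the 75% rule for moderators) can be illustrated with a bare-bones sketch in the style of Schmidt and Hunter. This is a simplified illustration under stated assumptions, not the procedure applied in this meta-analysis: it uses the textbook sampling-error variance for d with equal group sizes, omits the correction for criterion unreliability (so it works with d rather than δ), and runs on hypothetical effect sizes and sample sizes.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class BareBones(NamedTuple):
    d_bar: float               # sample-size-weighted mean effect size
    sd_d: float                # observed SD of d
    sd_pre: float              # SD predicted from sampling error alone
    sd_res: float              # SD after removing sampling error variance
    pct_var: float             # % of observed variance due to sampling error
    ci95: Tuple[float, float]  # 95% confidence interval for the mean d
    cv80: Tuple[float, float]  # 80% credibility interval


def bare_bones(ds: Sequence[float], ns: Sequence[int]) -> BareBones:
    k, total_n = len(ds), sum(ns)
    d_bar = sum(n * d for d, n in zip(ds, ns)) / total_n
    var_d = sum(n * (d - d_bar) ** 2 for d, n in zip(ds, ns)) / total_n
    n_bar = total_n / k
    # Textbook approximation of the sampling error variance of d
    # (assumes two equal-sized groups; an assumption of this sketch).
    var_e = ((n_bar - 1) / (n_bar - 3)) * (4 / n_bar) * (1 + d_bar ** 2 / 8)
    var_res = max(var_d - var_e, 0.0)
    pct_var = 100.0 if var_d <= var_e else 100.0 * var_e / var_d
    se_mean = math.sqrt(var_d / k)
    sd_res = math.sqrt(var_res)
    return BareBones(
        d_bar, math.sqrt(var_d), math.sqrt(var_e), sd_res, pct_var,
        (d_bar - 1.96 * se_mean, d_bar + 1.96 * se_mean),
        (d_bar - 1.28 * sd_res, d_bar + 1.28 * sd_res),
    )


# Hypothetical effect sizes and sample sizes, for illustration only.
result = bare_bones([0.10, 0.55, 0.30, -0.20, 0.70], [200, 350, 150, 400, 300])
significant = result.ci95[0] > 0         # confidence interval excludes zero
generalizable = result.cv80[0] > 0       # credibility interval excludes zero
needs_moderators = result.pct_var < 75   # 75% rule
```

With these made-up inputs the lower credibility value is negative, which mirrors the situation described in the text in which a positive mean effect is not generalizable because some interventions may increase recidivism.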
In terms of net intervention efficacy, the reduction in the recidivism rate would be approximately 21.49% (r = .2149), but with a negative lower limit, i.e., an increase in recidivism of 6.49% (80% LCV [lower credibility value] converted to r = -.0649). The magnitude of the effect size was greater than 62.17% of all possibilities, and than 24.34% of all the positive effect sizes of the interventions.
Table 1
Note. k = number of effect sizes; N = total sample size; d = sample size weighted mean effect size; SDd = standard deviation of d; SDpre = standard deviation predicted for sampling error alone; SDres = standard deviation of d after removing sampling error variance; δ = mean true effect size; SDδ = standard deviation of δ; %Var = percentage of observed variance accounted for by artifactual errors; 95% CId = 95% confidence interval for d; 80% CIδ = 80% credibility interval for δ; 1 = meta-analysis including all effect sizes from primary studies; 2 = meta-analysis removing extreme values and/or outliers.
Nevertheless, the percentage of variance explained by artifactual errors was below 75% in both meta-analyses, so these results were mediated by moderators of the effect. The most extensively researched moderator is the variable measuring effects (criterion), i.e., recidivism, which has been assessed in primary studies in both ORs (i.e., police, courts, correctional institutions) and CRs.
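The practical-significance figures reported above appear to be consistent with two standard conversions: r = d / √(d² + 4) for the BESD (which assumes equal group sizes), and the probability of superiority Φ(d / √2). The sketch below reproduces them for δ = 0.44; the function names are hypothetical, and the reading of "percentage of all positive effect sizes" as 2 × PS − 1 is an inference from the reported numbers rather than a formula stated in the text.

```python
import math


def d_to_r(d: float) -> float:
    """Convert a standardized mean difference to r (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4.0)


def probability_of_superiority(d: float) -> float:
    """Phi(d / sqrt(2)), written with erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))


def share_of_positive_effects(d: float) -> float:
    """Rescale the probability of superiority from [0.5, 1] to [0, 1].

    Inferred reading of the 'percentage of all positive effect sizes' figures.
    """
    return 2.0 * probability_of_superiority(d) - 1.0


delta = 0.44  # global mean true effect size reported in the text
r = d_to_r(delta)
ps = probability_of_superiority(delta)
print(f"r = {r:.4f}, PS = {ps:.4f}, positive share = {share_of_positive_effects(delta):.4f}")
```

Small differences in the last decimal places between this sketch and the reported percentages are to be expected if intermediate values were rounded before the normal-curve lookup.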
Analysis of the Effects of Batterer Interventions on the Variable Measuring Recidivism
The meta-analysis on the studies measuring intervention efficacy on the recidivism rate in CRs (see Table 1), with a sample of 1,351 batterers and 16 effect sizes, revealed that interventions had no effect on recidivism, with a null (δ = 0.005) mean true effect size (U1 = 0.007, i.e., the independence of the distributions of treated and non-treated batterers was only 0.7%), which could be negative by up to -0.10; in other words, the intervention could have a negative effect, increasing the recidivism rate by up to 4.99% (r = -.0499). The lack of effects of the intervention measured in the recidivism rate in CRs was not mediated by moderators (% VAR > 75). Thus, the results were conclusive. The results were replicated (see Table 1) excluding the extreme case (N = 1,340 and k = 15), which underpins their stability.
The meta-analysis on studies estimating intervention efficacy on recidivism in ORs, with a sample of 19,509 batterers and 46 effect sizes, showed (see Table 1) a positive, significant (confidence interval for d does not include zero), small-medium (δ = 0.45), and non-generalizable (credibility interval for δ includes zero) mean true effect size of the intervention, with possible negative effects of up to 4.99% (80% LCV converted to r = -.0499), which was mediated by moderators of the relationship between treatment and recidivism (% VAR < 75). Without outliers (N = 18,875 and k = 44), the results were replicated: a positive, significant, and medium mean true effect size, mediated by moderators, but generalizable (credibility interval for δ does not include zero) to all the studies. Comparatively, the effect sizes of the meta-analyses with and without outliers were similar, qs = .029, ns.
In short, the effects of outliers were constrained to the lack of generalization of the results to all of the studies, so by eliminating them we would be discarding inconvenient results, not real outliers. Thus, the mean recidivism reduction rate in ORs due to the intervention (versus non-intervened offenders) was 21.95%, but with a negative lower limit, that is, an increase in recidivism of up to 4.99%. Succinctly, on average, interventions reduced recidivism in ORs, but they could also have adverse effects by increasing the recidivism rate by more than 50% (Taylor & Maxwell, 2009). The magnitude of the effect was higher than 62.55% of all possibilities, and higher than 25.1% of all the positive effect sizes for batterer interventions. Thus, the positive intervention effect is explained by the measurement method, ORs, rather than the measured construct, recidivism (Podsakoff et al., 2003). However, the percentage of variance explained by artifactual errors was lower than 75%. So, the results were mediated by moderators.
Analysis of the Effects of Follow-up Time on Recidivism in ORs
The recidivism follow-up period is a critical factor for criterion validity. In fact, short follow-up periods can artificially increase the efficacy rate, given that approximately 2/3 of reoffending occurs in the first two years (Redondo et al., 2001). According to Gondolf (2000) and Jones and Gondolf (2002), reoffending occurs in batterers in half the time, and there is widespread agreement that most battery goes unreported and is not documented in ORs (European Union Agency for Fundamental Rights, 2014). For this reason, the follow-up period was taken as a moderator of the results of the intervention and was subdivided into short follow-up periods of 6 to 12 months and long follow-up periods of more than 12 months, a classification attested to be valid (Arias et al., 2013; Gondolf, 2000).
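The qs comparisons between meta-analyses (Amado et al., 2015) can be illustrated with a minimal sketch that converts each effect size to r, applies Fisher's z transformation, and takes the difference (Cohen's q). The conversion formula assumes equal group sizes, the significance test is the usual test for the difference between two independent Fisher z values, and the effect sizes and sample sizes in the example are illustrative; the exact test used by the authors is not described in the text.

```python
import math
from typing import Tuple


def d_to_r(d: float) -> float:
    """Convert a standardized mean difference to r (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4.0)


def cohens_q(d1: float, d2: float) -> float:
    """Difference between the Fisher z transforms of the two converted rs."""
    return math.atanh(d_to_r(d1)) - math.atanh(d_to_r(d2))


def q_z_test(q: float, n1: int, n2: int) -> Tuple[float, float]:
    """Z statistic and two-sided p-value for q from two independent samples.

    Assumption of this sketch: standard error sqrt(1/(n1-3) + 1/(n2-3)).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = q / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Illustrative effect sizes and sample sizes.
q_example = cohens_q(0.69, 0.35)
z_example, p_example = q_z_test(q_example, n1=2875, n2=3509)
print(f"q = {q_example:.3f}, Z = {z_example:.2f}, p = {p_example:.4f}")
```

Because q is a difference of Fisher z values, it is antisymmetric: swapping the two effect sizes changes only its sign.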
The meta-analysis on the efficacy of batterer interventions in short follow-up periods, i.e., 12 months or less, in ORs (see Table 1), with a sample of 3,509 batterers and a total of 21 effect sizes, found a positive and significant (confidence interval for d does not include zero) mean true effect size (δ = 0.35), but of a small magnitude and not generalizable to other studies (credibility interval for δ includes zero). In the meta-analysis without outliers (N = 2,875, k = 19), the mean true effect size was again positive and significant (confidence interval for d does not include zero), but of moderate magnitude (δ = 0.69) and generalizable to other studies (credibility interval for δ does not include zero). Comparatively, the effect size of the intervention in the meta-analysis without outliers was significantly higher than with outliers, qs = .164, p < .01, which was an unexpected and incongruous result (outliers diminished the efficacy of the intervention). Therefore, the results should be analysed without outliers. In practical terms, batterer interventions reduced the recidivism rate in ORs by an average of 32.61%, with no interventions having adverse effects (80% LCV = .23), and the effect size was higher than 68.79% of all possibilities and 37.58% of all positive intervention effect sizes. However, these results were mediated by moderators, both in the meta-analysis with all of the effect sizes and in that without outliers (% VAR < 75).
The meta-analysis on the efficacy of batterer interventions in follow-up periods longer than 12 months in ORs (see Table 1), with 26 effect sizes and a sample of 16,050 batterers, revealed a positive and significant (confidence interval for d does not include zero) mean true effect size of moderate magnitude (δ ≈ 0.50). Moreover, this result was generalizable to other studies (credibility interval for δ does not include zero), but was mediated by moderators (% VAR < 75).
As for the reduction in the recidivism rate in ORs, batterer interventions reduced the recidivism rate by 23.34% (r = .2334) and, once again, without detecting negative intervention effects (80% LCV = .11). The magnitude of the effect size was above 63.31% of all possibilities and 26.62% of all the positive intervention effect sizes. The mean true effect size of the intervention in ORs with a short follow-up period of 12 months or less (δ = 0.69) was significantly larger, qs = .101, p < .05, than that obtained for the long follow-up period of more than 12 months (δ = 0.48). Moderators could not be analysed, as Ns (< 400) or the number of effect sizes (k ≤ 3) were insufficient.
Analysis of the Effects of the Modality of the Intervention on Recidivism in ORs
Another moderator reported in the literature as having effects on the intervention is the modality of the intervention, categorized as interventions based on the Duluth Model, Cognitive-Behavioural Treatment interventions, and Other Types of Intervention (Arias et al., 2013; Babcock et al., 2004; Feder & Wilson, 2005; Levesque & Gelles, 1998). The meta-analysis of the effects on recidivism in ORs for interventions based on the Duluth Model, with a sample of 15,027 batterers and 25 effect sizes, revealed (see Table 1) a positive and significant (confidence interval for d does not include zero) mean true effect size of small magnitude (δ = 0.37). Moreover, these results were not generalizable (credibility interval for δ includes zero) to all studies with interventions based on the Duluth Model. These results were replicated without outliers (N = 14,393, k = 23), with the exception that they were generalizable to all the studies (credibility interval for δ does not include zero). Comparatively, the effect sizes with all the studies and without outliers were similar, qs = .034, ns. Thus, the outliers were not such, but inconvenient studies with negative effects.
In practical terms, interventions based on the Duluth Model reduced recidivism on average by 18.19% (r = .1819), with potential negative effects of up to 9.95% (80% LCV converted to r = -.0995), and an effect size above 60.25% of all possibilities and 20.50% of all the positive intervention effect sizes. However, the results were mediated by moderators (% VAR < 75).
The meta-analysis of the effects of batterer interventions with CBTPs on recidivism in ORs (see Table 1), with a total of 9 effect sizes and a sample of 1,629 batterers, exhibited a positive and significant (confidence interval for d does not include zero) mean true effect size of large magnitude (δ = 0.88), implying an average reduction in the recidivism rate of 40.27% (r = .4027). The generalization of the true effect size could not be studied as the variance was zero (i.e., the studies of this meta-analysis were not randomly distributed); thus, it was analysed with the effect size weighted by sampling error (a conservative estimate versus the true size), and was generalizable (credibility interval for d does not include zero) to other studies, and without negative effects, 80% CI [0.70, 1.06]. Additionally, the lower limit of the effect (90% of the interventions surpassed this limit) was 33.40% (80% LCV converted to r = .3340). The magnitude of the effect size was higher than 73.24% of all possibilities and 46.48% of all the positive intervention effect sizes.
The meta-analysis of the effects of Other Types of Intervention (OTIs) on recidivism in ORs (see Table 1), with a total of 12 effect sizes and a sample of 2,853 batterers, found a positive, moderate, and significant (confidence interval for d does not include zero) mean true effect size (δ = 0.63), generalizable to all the studies (credibility interval for δ does not include zero).
The reduction in the recidivism rate with OTIs (versus the rate for non-intervened batterers) was on average 30.04% (r = .3004), with no interventions having negative effects, and with a minimum threshold in the reduction of the recidivism rate of 20.08% (80% LCV converted to r = .2008). The reduction in the recidivism rate by OTIs (versus the rate for non-intervened batterers) was on average 30.48%, without negative effects, and with a minimum threshold in the reduction of recidivism of 20.55%. The magnitude of the effect size was higher than 67.36% of all possibilities and 34.72% of all the positive intervention effect sizes. Nevertheless, the effect of these intervention modalities on recidivism was mediated by moderators (% VAR < 75). Once again, the analysis of moderators in the different types of interventions was not possible due to insufficient Ns and ks. Comparatively, the effect of the intervention on the reduction of recidivism was significantly higher in the CBTPs than in the Duluth Model, qs = .243, p < .01, and than in the OTIs, qs = .117, p < .01, and significantly higher in the OTIs than in the Duluth Model, qs = .126, p < .01.
Analysis of Effects of Duration of the Intervention on Recidivism in ORs
A further moderator of the effects that is systematically reviewed concerns the duration of the intervention (Arias et al., 2013; Babcock et al., 2004). In accordance with Babcock et al. (2004), two categories were created: short duration for programs of under 16 sessions/weeks (4 months) and long duration for programs of more than 16 sessions/weeks. The results of the meta-analysis for short interventions (< 16 sessions), with a total of 14 effect sizes and 3,631 batterers (see Table 1), showed a non-significant (confidence interval for d includes zero) mean true effect size and, consequently, it was not generalizable to other studies.
The meta-analysis replicated without outliers (N = 2,997, k = 12) found a positive and significant (confidence interval for d does not include zero) mean true effect size of small magnitude (δ = 0.29), not generalizable (credibility interval for δ includes zero), and mediated by moderators (% VAR < 75) that could not be analysed owing to the insufficient Ns and ks. The comparison of the effect size for all of the studies with that obtained after excluding outliers revealed significant differences, qs = .129, p < .01, ranging from a non-significant (all of the studies) to a significant (without outliers) mean true effect size. Thus, the excluded studies were not true outliers, but inconvenient studies with very negative effects. In fact, brief programs can increase the recidivism rate by 39.5% (80% LCV converted to r = .3950). Regarding long intervention programs (> 16 sessions), the results of the meta-analysis with a total of 32 effect sizes and a sample of 15,878 batterers (see Table 1) displayed a positive and significant (confidence interval for d does not include zero) mean true effect size of moderate magnitude (δ = 0.55), generalizable to other studies (credibility interval for δ does not include zero). As for the efficacy of the intervention, long interventions reduced the recidivism rate in ORs by 26.52% (r = .2652), without interventions with negative effects, and with a minimum threshold in the reduction of recidivism of 10.44% (80% LCV converted to r = .1044). The magnitude of the effect size was higher than 65.17% of all possibilities and 30.34% of all the positive intervention effect sizes. However, the average effect size was mediated by moderators (% VAR < 75) that were not studied owing to insufficient Ns and ks. Comparatively, long programs were more effective in reducing recidivism than short interventions, qs = .257, p < .01.
The present meta-analyses have several limitations that should be borne in mind in the generalization of results.
First, the classification of studies relied on the characteristics outlined in the primary studies (e.g., an intervention was categorized under a particular model, but the description also encompassed other models; the length of the follow-up was measured differently); thus, the classification was treated as categorical in the meta-analysis, when there may be substantial variability within categories or overlapping categories. Second, not all studies consistently reported moderators (e.g., treatment type), with their subsequent exclusion. Third, recidivism may not be a valid estimator of the efficacy of interventions, as reducing it may not even be the real objective of interventions (under Spanish law, treatment in correctional institutions is voluntary, and the mandate of correctional institutions is to develop aptitudes to overcome deficiencies and to modify unfavourable and negative attitudes; Ley Orgánica 1/1979, de 26 de septiembre, General Penitenciaria). Fourth, the evaluation of the efficacy of the intervention on recidivism based on official records was a criterion of limited validity (a strongly positively biased approximation of recidivism; in other words, in comparison to CRs, it failed to capture approximately half of the recidivism). Fifth, follow-up time, as a measure of the efficacy of interventions on recidivism, had a direct and large effect on the validity of the measure. In general, an estimated 50 to 60% of recidivism occurred during the first year (Redondo et al., 2001), with a total confirmed recidivism rate ranging from 40% to more than 80%, which varied according to the measure and the follow-up time (Stover, 2005).
Sixth, the measure of the efficacy of interventions in CRs was also probably positively biased (overestimation of the results of the interventions) both in the measure itself (e.g., not all of the batterers who had a partner after treatment reported it; the exposure times of couples with direct effects on recidivism were not reported; a significant number of women failed to respond to encoders) and due to response bias (e.g., bias of leniency, i.e., tendency to minimize or conceal assaults). In fact, the distinctive characteristics of victims of gender violence (injury) include failing to report offences and concealing assaults in reports (Holt et al., 2003), concealing offences (Arce, 2017), minimizing injuries (Arce, Fariña, & Vilariño, 2015; Vilariño et al., 2009), and failing to report the recidivism of intervened batterers (Brame et al., 2015), in particular during cohabitation with the batterer. Seventh, evaluations of efficacy were taken from the authors of the interventions, who rely on efficacy for program continuity. Thus, the number of unpublished studies with no effects or negative effects may be high. Eighth, test values were also subject to similar measurement errors. Ninth, the dropouts linked to recidivism were not generally encoded as such, which would artificially increase the efficacy rate (Taft et al., 2001). Tenth, other moderators (not studied as Ns or ks were insufficient) may modify the magnitude or direction of the treatment efficacy (Martín & Moral, 2019; Martín et al., 2019). Thus, further research with additional moderators should be designed. In short, the results had a certain degree of validity for measuring the efficacy of interventions versus non-intervention, but not sufficient to determine the real rates of recidivism, which were presumably significantly higher than registered.
These limitations, in terms of validity of conclusions, can be classified as systematic measurement errors (when they generate an alternative explanation to the results), or a biased measurement method (i.e., variance attributed to method and not to measured construct) (Podsakoff et al., 2003). Bearing in mind the implications of these limitations on the validity of the conclusions of this meta-analysis, they are discussed further below:
In conclusion, there is a corpus of literature on the efficacy of interventions, showing significant effects in reducing recidivism in official records. In other words, intervened batterers were less likely to be accused or sentenced again (in ORs) for the same offence. Nevertheless, not all of the interventions were effective in ORs. Thus, short interventions were completely ineffective and could have negative effects of up to 40%, and certain interventions based on the Duluth Model may have negative effects of up to 10%. In contrast, long interventions based on CBTPs or OTIs (the results may not be generalised to techniques other than those reviewed) were on average effective and without negative effects on recidivism in ORs. The evaluation of the efficacy of the intervention in ORs was not a valid measure as it was subject to a systematic measurement error (it minimized the rate), which could reach over 80% (Stover, 2005). However, regardless of the fact that ORs were not a valid measure of recidivism, we may draw from the meta-analysis with ORs the conclusion that intervention programs should be based on a long cognitive-behavioural approach. This does not imply, though, that interventions had a significant effect on reducing recidivism in all measures of violence: there was a null effect in couple reports. Further research should be undertaken to evaluate the efficacy of interventions in both types of measure to identify the type of recidivism lost in ORs and to identify moderators of the efficacy of interventions on this measure. Moreover, studies are required to assess the efficacy of interventions on other criteria such as the internal mechanisms underlying gender violence (Arce et al., 2014), and acquisition of skills and abilities that enable batterers to successfully manage risk situations of intimate partner violence (Arce & Fariña, 2010; Lila et al., 2018; Martín-Fernández, Gracia, Marco, et al., 2018; Martín-Fernández, Gracia, & Lila, 2018).
Conflict of Interest
The authors of this article declare no conflict of interest.
Funding: This research has been sponsored by a grant of the Spanish Ministry of Economy, Industry and Competitiveness (PSI2017-87278-R).
Cite this article as: Arce, R., Arias, E., Novo, M., & Fariña, F. (2020). Are interventions with batterers effective? A meta-analytical review. Psychosocial Intervention, 29(3), 153-164. https://doi.org/10.5093/pi2020a11
References
References marked with an asterisk indicate studies included in the meta-analysis.
Correspondence: ramon.arce@usc.es (R. Arce)
Copyright © 2025. Colegio Oficial de la Psicología de Madrid