Isabel Suevos-Rodríguez1, Luis Burgos-Benavides1, Raúl Quevedo-Blasco2, & Francisco Javier Rodríguez-Díaz1
1University of Oviedo, Spain; 2Mind, Brain and Behavior Research Center (CIMCYC), University of Granada, Spain
Received 28 October 2024, Accepted 14 September 2025
Abstract
This paper explores the reliability of assessment instruments used in the prison population to measure the Dark Triad, a constellation of personality traits comprising machiavellianism, narcissism, and psychopathy. A systematic literature review was conducted using six databases, and 12 articles were selected according to the eligibility criteria. A reliability generalization meta-analysis of Cronbach’s alpha was carried out, which yielded acceptable internal consistency (≥ .70) for all three subscales, with machiavellianism achieving the highest estimated alpha, and good internal consistency (≥ .80) for the Dirty Dozen, the most commonly used scale in this context. However, the paucity of studies, publication bias, and high heterogeneity reflected in Cochran’s Q statistic and the I2 index (≥ 75%) compromise the generalizability of the estimates and underline the need to examine which moderating variables influence this variability.
Resumen
Este trabajo explora la fiabilidad de los instrumentos de evaluación empleados en población penitenciaria para medir la Tríada Oscura, constelación de rasgos de personalidad que incluye el maquiavelismo, el narcisismo y la psicopatía. Se realizó una revisión sistemática de la literatura en seis bases de datos, seleccionando doce artículos acordes a los criterios de elegibilidad. Se llevó a cabo un metaanálisis de generalización de la fiabilidad de alfa de Cronbach, que mostró una consistencia interna aceptable (≥ .70) en las tres subescalas, siendo el maquiavelismo el que mejor alfa estimada tenía y una buena consistencia interna (≥ .80) en el Dirty Dozen, la escala más utilizada en este contexto. Sin embargo, la escasez de estudios, el sesgo de la publicación y la gran heterogeneidad reflejada en el estadístico Q de Cochran y el índice I2 (≥ 75%) ponen en peligro la generalización de las estimaciones y subrayan la necesidad de examinar qué variables moderadoras influyen en esta variabilidad.
Palabras clave
Tríada Oscura, Prisión, Personalidad, Instrumentos de evaluación, Fiabilidad, Metaanálisis
Keywords
Dark Triad, Prison, Personality, Assessment instruments, Reliability, Meta-analysis
Cite this article as: Suevos-Rodríguez, I., Burgos-Benavides, L., Quevedo-Blasco, R., & Rodríguez-Díaz, F. J. (2026). Assessment of the Dark Triad in the Prison Population: A Meta-Analysis of Reliability Generalization. Anuario de Psicología Jurídica, 36, Article e260472. https://doi.org/10.5093/apj2026a5
Correspondence: burgosluis@uniovi.es (L. Burgos-Benavides).
Since the beginning of this century, a growing number of studies have explored three traits whose shared socially malevolent character led Paulhus and Williams (2002) to group them into the Dark Triad of personality. Machiavellianism reflects cold and calculating manipulation, using deception to achieve one’s own objectives (Jones & Mueller, 2021; Wright et al., 2017), and is usually associated with “means to an end” thinking (Crysel et al., 2013). Narcissism reflects exaggerated grandiosity, feelings of entitlement, dominance, and desires for superiority and social admiration (Amos et al., 2022; Corry et al., 2008). Psychopathy reflects a lack of empathy and remorse, egoism, callousness, and high levels of manipulation and impulsivity (Hare, 1996; Patrick & Drislane, 2014). Currently, dark traits are the predominant model in dark core personality studies (Postigo et al., 2023). To this psychological construct, represented by three dark personality profiles (Wright et al., 2017), a fourth trait, sadism, has been added (Paulhus, 2014). Sadism reflects a person’s tendency to engage in antagonistic or cruel behavior, inflicting pain for pleasure (Plouffe et al., 2017), and is reflected in the everyday enjoyment of violent video games, movies, and sports (Paulhus, 2014). However, evidence on the role of this trait and its overlap with the other three components suggests that the variance shared among all traits forms a single underlying dark core (Bader et al., 2021; Moshagen et al., 2020).
Dark personality traits have been assessed using various psychometric instruments. Machiavellianism has mainly been assessed using the Machiavellianism Inventory-Version IV (MACH-IV; Christie & Geis, 1970). 
Instruments such as the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988) and the Narcissistic Admiration and Rivalry Questionnaire (NARQ; Back et al., 2013) have gained prominence. Regarding the measurement of psychopathy, the Psychopathy Checklist-Revised (PCL-R; Hare, 2003) is the most indicated for forensic settings (Salvador et al., 2017) because it allows the evaluation of populations considered to be at high risk (Amador-Zavala et al., 2023). For community samples, the Psychopathic Personality Inventory (PPI; Lilienfeld & Andrews, 1996) or the Self-Reported Psychopathy Scale (SRP; Paulhus et al. 2009) stand out. The Comprehensive Assessment of Sadistic Tendencies (CAST; Buckels & Paulhus., 2014) stands out for its assessment of sadism. The rise in Dark Triad research over the last decade has led to the development of instruments that assess all the three traits together. Two self-report measures stand out in this area: on the one hand, the Dirty Dozen (DD; Jonason & Webster, 2010), whose authors originally started with 22 candidate items based on the NPI, the SRP and the MACH-IV to finally consolidate this tool formed by 12 items with 4-item scales for each trait and, on the other hand, the Short Dark Triad (SD3; Jones & Paulhus, 2013), initially formed by 41 items that were reduced to 27 with 9-item scales for each of the traits. These short instruments allow for a more integrated and efficient assessment of dark traits, thereby facilitating investigation of their interrelationships and combined effects. The Short Dark Tetrad (SD4; Paulhus et al., 2020) was the first measure to assess the four Dark Tetrad traits across 28 items. The Dark Triad presents measurement problems, such as an empirical overlap between machiavellianism and psychopathy, reflected in a high shared variance (80%) in its items (Miller et al., 2016; Muris et al., 2017). 
Although machiavellianism includes features absent in psychopathy, such as impulse control and long-term planning (Dowgwillo & Pincus, 2017; Jones & Figueredo, 2013), neither the DD nor SD3 items align with expert ratings or are distinguishable from psychopathy (Vize et al., 2018). Furthermore, DD and SD3 use unidimensional scales to assess multidimensional traits, which limits their understanding of nature and its interrelationships (Watts et al., 2017). Recently, the five-factor model antagonistic triad measure (FFM ATM; Rose et al., 2022), a self-reported measure of the Dark Triad, was created, which obtained powerful ratings by Welsh et al.’s (2024) COSMIN review. This appears to solve previous problems using super-short forms of individual scales based on factor analyses. The malevolent nature of these personality traits constitutes the Dark Triad as a strong predictor of violent or criminal behaviors (Wright et al., 2017), as well as other deviant behaviors, such as risk-taking, substance abuse, sexual harassment, reckless driving, bullying, and cyberbullying (Endriulaitienè et al., 2018; Jauk and Dieterich, 2019; Maneiro et al., 2020). Exploring how dark personality traits are associated with psychopathology or the cognitive strategies involved, such as moral disengagement (Gómez & Durán, 2024) in the prison population, may help to understand psychological maladjustment and criminal behavior (Brugués & Caparrós, 2023a, 2023b). Assessing psychometric properties such as reliability is crucial to ensure the quality of tests and facilitate prognostic and therapeutic decisions (Badenes-Ribera et al., 2020), consolidating tools that capture the multidimensional nature of dark personality traits in offenders. However, reliability does not refer to a stable and unchanging concept, but to a concept that will change in accordance with the population to which it is applied. 
Since the reported reliability varies among studies, each reported coefficient is an estimate (Quevedo-Blasco et al., 2023). Studies of the Dark Triad and its relationship with antisocial behaviors (Pechorro et al., 2022) have attracted the attention of psychologists (Wright et al., 2017). This construct, represented by Dark Triad or Dark Tetrad personality profiles, reflects malevolent aspects of human personality, with behavioral tendencies similar to those of predatory and callous offenders who appear extremely cruel and whose behavior is morally repugnant (Wright et al., 2017). The main objective of the present study was to identify the instruments used in the prison population to assess the traits of the Dark Triad or Dark Tetrad. The secondary objectives were (1) to analyze the generalizability of the reliability coefficients and (2) to analyze the heterogeneity of the reliability coefficients.
Eligibility Criteria
The inclusion criteria were as follows: (1) studies that identified the traits of the Dark Triad or Dark Tetrad in the prison population, (2) studies that used psychometric instruments to identify these traits, and (3) studies that reported Cronbach’s alpha reliability coefficient.
Sources of Information
Databases were selected based on their international scientific relevance and high-quality coverage of health and social science studies. The following six databases were accessed from the library of the University of [anonymized]: Scopus, Web of Science, PubMed, PsycInfo, PsycArticles, and PsycTests. These information sources were consulted on February 22, 2024.
Search Strategy
The search phrase was constructed using terms frequently used as keywords in articles, systematic reviews, and meta-analyses on the Dark Triad or the Dark Tetrad, and the thesaurus of the University of [anonymized] was checked to see which terms were most frequently used to refer to each concept. 
The search strategy included the Boolean operators ‘AND’ and ‘OR’ as well as the truncation symbols ‘$’ and ‘*’. Double quotation marks (“ ”) were used to retrieve documents containing these exact terms. The phrase used to search each of the selected databases was: (“antisocial personality disorder$” OR “sociopathic personality$” OR “antisocial behavior$” OR “deviant behavior$” OR psychopath* OR narcissism* OR machiavellianism* OR sadism*) AND (“dark core” OR “dark tetrad” OR “dark triad”) AND (delinq* OR perpetrator$ OR prison*). In each database, the search fields were adapted to achieve the widest bibliographic scope, given the variability of the database structures (see Appendix).
Selection Process
Six databases were searched following the guidelines described in the PRISMA statement by Page et al. (2021). All records were selected from each database and downloaded in RIS format. The articles from the six databases were then uploaded, along with their basic bibliographic information, to the Rayyan bibliographic manager (Ouzzani et al., 2016), where they were classified as duplicates, deleted, included, excluded, and probable. In the first screening, Rayyan algorithmically identified duplicate articles as the records were loaded; these were then checked manually, and those with a content overlap of more than 90% were discarded. Second, articles that did not address either the Dark Triad or the Dark Tetrad in their titles, keywords, or abstracts were excluded, as were systematic reviews, theoretical papers, and non-systematic narrative reviews. The next phase consisted of analyzing the full text and making a decision based on the eligibility criteria. This phase was carried out by two researchers working blind to each other’s decisions, and no disagreements arose regarding the inclusion and exclusion of articles.
Data Collection Process
Basic bibliographic information for all eligible records was systematically extracted from the six selected databases (Figure 1). 
The retrieved data included author names, publication titles, journals of origin, abstracts, and DOIs. These details were downloaded in the RIS file format at the individual record level for each database. They were subsequently collated by two researchers and uploaded to the Rayyan bibliographic manager.
Data Items
Two categories of complementary outcomes were assessed: (1) Cronbach’s alpha reliability coefficient of the Dark Triad measurement tools and (2) methodological characteristics of the identified studies (author details, date of publication, country of study origin, journal of origin, quartile ranking, sample size, sample age range, instrument used for assessment, number of associated items, and characteristics of the prison sample).
Assessment of the Risk of Bias of the Study
To assess the possible existence of publication bias, funnel plots were constructed using JAMOVI (The JAMOVI Project, 2023), in which the magnitudes of the overall alpha of the scales and the alpha of the subscales were plotted against the standard error. Egger’s regression test was used to determine whether the reliability estimates exhibited publication bias.
Statistical Analysis
The statistical analysis was carried out with JAMOVI (The JAMOVI Project, 2023), version 2.4.8. The reliability generalization analysis was performed using the MAJOR statistical package. This statistical technique allows the internal consistency of psychometric instruments to be quantified. Cumulative reliability was calculated for each of the scales that make up the Dark Triad (machiavellianism, narcissism, and psychopathy). Cumulative reliability was estimated using Cronbach’s alpha values transformed with the Hakstian-Whalen approach to allow for random-effects modeling (Burgos-Benavides et al., 2023). Publication bias was analyzed using funnel plots and Egger’s regression tests. 
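As a rough illustration of the transformation step described above, the sketch below applies the Hakstian-Whalen transformation to a single alpha coefficient and then back-transforms it. The formulas follow the form commonly given in the reliability generalization literature and are an assumption here, since the text does not spell them out; the function names and the example values (alpha = .87, 4 items, 200 respondents) are hypothetical and are not taken from any of the included studies.

```python
def hakstian_whalen(alpha, n_items, n_subjects):
    """Return (y, v): the transformed alpha and its approximate sampling variance.

    Assumed formulas (not stated in the article):
      y = 1 - (1 - alpha)^(1/3)
      v = 18 * J * (n - 1) * (1 - alpha)^(2/3) / ((J - 1) * (9n - 11)^2)
    where J is the number of items and n the sample size.
    """
    if alpha >= 1 or n_items < 2 or n_subjects < 2:
        raise ValueError("need alpha < 1, at least 2 items and 2 subjects")
    y = 1.0 - (1.0 - alpha) ** (1.0 / 3.0)
    v = (18.0 * n_items * (n_subjects - 1) * (1.0 - alpha) ** (2.0 / 3.0)
         / ((n_items - 1) * (9.0 * n_subjects - 11.0) ** 2))
    return y, v


def back_transform(y):
    """Map a value on the transformed scale back to the alpha metric."""
    return 1.0 - (1.0 - y) ** 3


# Hypothetical example: a 4-item subscale answered by 200 respondents.
y, v = hakstian_whalen(0.87, n_items=4, n_subjects=200)
print(f"transformed = {y:.4f}, variance = {v:.6f}, back = {back_transform(y):.2f}")
```

Pooling is done on the transformed scale, and the pooled estimate and its confidence limits are back-transformed to the alpha metric for reporting.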
A random-effects model using a restricted maximum-likelihood estimator (REML) was used to calculate the cumulative estimates of Cronbach’s alpha with a 95% confidence interval. Subsequently, a sensitivity analysis was performed by eliminating studies that did not exceed a Cronbach’s alpha criterion of .70. Cochran’s Q test, I2, H2, and between-study variance using τ2 were used as statistics to assess heterogeneity in the meta-analysis. Table 1 shows the 12 articles selected based on the eligibility criteria. These studies were published between 2015 and 2023 in different countries, with Spain being the most repeated among the studies. The prolific and relevant authors of this study include Brugués & Caparrós (2022, 2023b), Navas et al. (2021a, 2021b), Resett, Caino, and Ireland (2022), and Resett Caino, and Zapata (2022). All articles are in scientific journals related to the field of psychology, psychiatry or related disciplines; through the Journal Citation Reports (JCR), the quartile of each journal was reviewed based on the comprehensive impact factor. Gozzi et al. (2018) and Scientific Journal Reports (SJR) were also reviewed. The journals in which the articles by Glenn and Sellbom (2015) and Navas et al. (2021a) are published are in Q1, which is considered the highest-impact journals. Table 1 Methodological Characteristics of the Identified Studies ![]() Note. M = machiavellianism; N = narcissism; P = psychopathy; G = global, DD = Dirty Dozen, SD3 = short Dark Triad; MACH-IV = Machiavellianism Inventory Version IV, NPI = Narcissistic Personality Inventory, CAST = Comprehensive Assessment of Sadistic Tendencies, LSRP = Levenson Self-Report Psychopathy; S1 = Verbal Sadism, S2 = Physical Sadism, S3 = Vicarious Sadism; P1 = primary psychopathy, P2 = secondary psychopathy; GP = global psychopathy. Regarding the tools used, most studies opted for combined measures to capture the Dark Triad: eight studies chose the DD and three studies chose the SD3. 
The DD scale was adapted to the Argentine population (Resett, Caino, & Ireland, 2022; Resett, Caino, & Zapata, 2022), the Spanish population (Brugués & Caparrós, 2022, 2023; Navas et al., 2021a, 2021b), and the Chinese population (Zhang et al., 2023). On the other hand, only Glenn and Sellbom (2015) used individual instruments to measure each trait (MACH-IV, NPI, and PPI), while Balcioglu et al. (2023) used the LSRP to capture psychopathy. Because both the DD and the SD3 are self-report measures, most studies used 1-5 Likert-type scales on which inmates rated their degree of agreement with the test statements. Garofalo et al. (2020) and Zhang et al. (2023) used 1-7 Likert-type scales, and Gozzi et al. (2018) used a 1-4 Likert-type scale. Sample sizes varied widely across studies, from a minimum of 34 participants to a maximum of 972. All studies included adult participants aged between 18 and 80 years. Regarding the sex of the participants, all the studies used male-only samples except those by Brugués and Caparrós (2022, 2023a), Glenn and Sellbom (2015), and Resett, Caino, and Ireland (2022), whose prison samples were mixed, although predominantly male. The characteristics of the prison samples were explored in relation to the types of offences committed. The study by Balcioglu et al. (2023) classified the 64 defendants accused of sexual offences according to the violence used. Brugués and Caparrós (2022, 2023a) used the same sample in both studies, whose crimes were mainly robbery with violence (20.6%), criminal gangs and drug trafficking (15.9%), crimes against health and public finances (12.7%), and gender and domestic violence (11.1%). In the study by Garofalo et al. (2020) all prisoners were convicted of violent crimes, whereas in the study by Zhang et al. (2023) most prisoners were convicted of property crimes (48.1%). 
In the study by Navas et al. (2021b), prisoners were convicted of a variety of offences. However, in the other study by Navas et al. (2021a), the prison sample was divided into intimate partner violence (40.5%), sexual assault (44.6%), and sexual harassment (14.9%). The study by Resett, Caino, and Ireland (2022) grouped the offences committed by sex: men had been convicted of homicide or attempted homicide (29%), drug trafficking (23%), and theft or robbery (23%), whereas the majority of women had been convicted of drug trafficking (62%). The study by Resett, Caino, and Zapata (2022) used only male samples. Oljaa et al. (2021) reported that most of their prison sample had been convicted of felony murder (48.6%). By contrast, Gozzi et al. (2018) divided the sample into low- and medium-security regimes (91.18%) and protected regimes (8.82%).
Figure 2
Forest Plot of the Studies Selected for Meta-analysis of Cronbach’s Alpha for the Subscales.
![]()
Note. M1, N1, and P1 correspond to the original models of machiavellianism, narcissism, and psychopathy; M2, N2, and P2 correspond to the sensitivity-analysis models of machiavellianism, narcissism, and psychopathy.
Resett, Caino, and Ireland (2022) used the DD and presented the best Cronbach’s alpha coefficients for both the machiavellianism (.87) and narcissism (.85) subscales. However, regarding the measurement of psychopathy, better internal consistency was obtained in the study by Glenn and Sellbom (2015): .95 for the total PPI, and .91 and .94, respectively. In contrast, Brugués and Caparrós (2023b) used the SD3 and obtained the lowest alpha coefficients among the selected studies on the three subscales: .63 for machiavellianism, .45 for narcissism, and .44 for psychopathy. 
Results of the Meta-analysis For the reliability generalization analysis of the subscales, the study by Glenn and Sellbom (2015) was excluded from Table 1 because it is the only study that uses independent measures for each trait, and the study by Gozzi et al. (2018) because it only reports the overall alpha of the instrument. Regarding the study by Balcioglu et al. (2023) the alpha coefficients of the machiavellianism and narcissism subscales of the DD will be used since they use an independent measurement instrument for the assessment of psychopathy. Only the study by Oljaa et al. (2021) assessed the Dark Tetrad meeting the selection criteria; the assessment of sadism with the CAST was omitted, and only the reliability measures of the DD subscales were used, as no further studies reported sadism or combined measures of the Dark Tetrad. Ten studies recorded Cronbach’s alpha for the machiavellianism subscale (Egger’s regression test = -6.177, p < .001). The estimated alpha was .79 (SE = .0214; CI lower bound = .75, CI upper bound = .83); which indicated high heterogeneity (I2 = 93.47% and Q = 60.237, p < .001) (see Table 2 and Figure 3). The asymmetry observed in the funnel plot assessing publication bias (see Figure 4) coincided with the results of the Egger’s regression test. For the sensitivity analysis of the Machiavellianism model, one study with an alpha of less than .70 was eliminated (Brugués & Caparrós, 2023b), leaving nine studies. The estimated alpha was .80 (SE = .0211; CI lower limit = .75, CI upper limit = .84); the model with sensitivity continues to present practically the same heterogeneity (I2 = 93.45% and Q = 52.959, p < .001) (see Table 2 and Figure 3) as well as the same publication bias (Egger regression test = -5.658; p < .001). Figure 3 Funnel Plot of the Studies Selected for the Meta-analysis of Cronbach’s Alpha for the Subscales. ![]() Note. 
M1, N1, and P1 correspond to the original models of machiavellianism, narcissism, and psychopathy; M2, N2, and P2 correspond to the sensitivity-analysis models of machiavellianism, narcissism, and psychopathy.
Table 2
Heterogeneity Statistics for Machiavellianism, Narcissism, Psychopathy, and the Global Dark Triad Model
![]()
Note. M1, N1, and P1 correspond to the original models of machiavellianism, narcissism, and psychopathy; M2, N2, and P2 correspond to the sensitivity-analysis models of machiavellianism, narcissism, and psychopathy; G1 corresponds to the original global Dark Triad model.
For the narcissism model, 10 studies reported Cronbach’s alpha for this subscale (Egger’s regression test = -4.880, p < .001). The estimated alpha was .76 (SE = .0334; CI lower bound = .70, CI upper bound = .83); significant heterogeneity was found between studies (I2 = 96.98% and Q = 59.357, p < .001) (see Table 2 and Figure 3). In the sensitivity analysis, studies that did not meet the criterion of ≥ .70 were omitted (Brugués & Caparrós, 2022, 2023b; Oljaa et al., 2021). The sensitivity model (n = 7 studies) showed lower heterogeneity (I2 = 56.02% and Q = 14.698, p = .023) (Table 2 and Figure 3) and an estimated alpha of .82 (SE = .0089; CI lower limit = .81, CI upper limit = .84). Publication bias was considerably reduced (Egger’s regression test = -.717, p = .473). With respect to the reliability generalization of the psychopathy subscale, nine studies reported Cronbach’s alpha (Egger’s regression test = -2.184, p = .029). The estimated alpha was .70 (SE = .0270; CI lower bound = .65, CI upper bound = .76); the original model reflected high heterogeneity (I2 = 88.65% and Q = 63.967, p < .001) (see Table 2 and Figure 3). However, in the sensitivity analysis, the heterogeneity values were lower (I2 = 65.65% and Q = 10.620, p = .031) (Table 2 and Figure 3). 
This model (n = 5) excludes four studies with alphas below .70 (Brugués & Caparrós, 2023b; Navas et al., 2021; Resett, Caino, & Ireland, 2022; Resett, Caino, & Zapata, 2022), resulting in an estimated alpha of .76 (SE = .0201; CI lower limit = .72, CI upper limit = .80). The asymmetry observed in the funnel plot assessing publication bias (see Figure 4) was reduced compared with the original model (Egger’s regression test = -.847, p = .397).
Figure 4
Forest Plot of the Studies Selected for the Meta-analysis of the Global Cronbach’s Alpha.
![]()
Regarding the generalizability of the overall reliability, only three studies reported the overall Cronbach’s alpha of the DD scale (Egger’s regression test = 2.172, p = .030). With so few studies, a sensitivity analysis of the global scale was not possible: one of the studies did not meet the criterion of Cronbach’s alpha ≥ .70, so the requirements for the sensitivity analysis were not met. The model showed an estimated alpha of .86 (SE = .0132; CI lower limit = .83, CI upper limit = .88) and was homogeneous (I2 = 0% and Q = 5.668, p = .059).
Studies on prison populations are highly relevant to society (see, for example, Alcántara-Jiménez et al., 2023; Prieto-Macías et al., 2020; Quevedo-Blasco et al., 2023; Romero-Lara et al., 2020). This study aimed to test whether the internal consistency estimates of the Dark Triad assessment instruments in prison samples are generalizable. The Cronbach’s alpha coefficients of the machiavellianism, narcissism, and psychopathy subscales obtained in each of the selected studies were quantitatively synthesized, showing consistency between items measuring the same construct; estimated alphas of ≥ .70 were obtained for all three subscales. 
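To make the heterogeneity statistics reported above more concrete, the following sketch computes Cochran’s Q, I2, H2, and a between-study variance from a set of effect sizes and sampling variances. It is only an approximation of the analysis: it uses the Q-based definitions of I2 and H2 and the DerSimonian-Laird estimator of τ2 rather than the REML estimator used in the study, so it should not be expected to reproduce the values in Table 2. The function name and the three input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Q-based heterogeneity summary for k study estimates.

    Assumption: DerSimonian-Laird tau^2 is used here for simplicity;
    the article itself reports REML-based estimates.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    h2 = q / df
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return {"Q": q, "df": df, "I2": i2, "H2": h2, "tau2": tau2}


# Hypothetical transformed alphas and sampling variances for three studies.
print(heterogeneity([0.40, 0.50, 0.60], [0.001, 0.001, 0.001]))
```

Under the Higgins and Thompson (2002) benchmarks, an I2 near 25%, 50%, or 75% would be read as low, moderate, or high heterogeneity, respectively.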
Although a Cronbach’s alpha below .70 was established as an exclusion criterion, following the recommendations of George and Mallery (2011) for acceptable reliability in the initial phases of research, it is essential to recognize that this threshold may have excluded instruments that, despite showing lower internal consistency, could have captured atypical or intense expressions of dark traits in penitentiary settings. This potential limitation is due to the very nature of studies in this area. For this reason, one of the main tasks for researchers in this field should be to calculate and report other reliability coefficients, such as the omega coefficient or composite reliability (CR), which can overcome the limitations of Cronbach’s alpha. In the meantime, we suggest interpreting these results with caution, at least until future primary studies report other reliability coefficients. These future lines of research will help ensure that diagnoses and decision-making in the forensic field (such as prison classification, prediction of recidivism risk, or suitability for treatment programs) are based on instruments with rigorous internal consistency. For individual and clinical decisions, internal consistency coefficients should exceed values of .90 or even .95 (DeVellis, 2017; Nunnally, 1978), especially in contexts where the consequences of misclassification can be serious. Therefore, the specific psychometric validation of these instruments in the prison population is not only desirable but essential if they are to be used rigorously in applied contexts. In the current study, the observed reliability, corrected by the sensitivity analysis, is insufficient for evaluation in applied settings and limited for research purposes, as 45%, 42%, and 49% of the standard deviation of the machiavellianism, narcissism, and psychopathy measures, respectively, is error (Quevedo-Blasco et al., 2023). 
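The error percentages quoted above appear to follow from the classical test theory relation between the standard error of measurement and the standard deviation, SEM / SD = sqrt(1 - alpha). That reading is an assumption, since the article does not state the formula, but applying it to the sensitivity-corrected alphas reported in the Results (.80, .82, and .76) gives values consistent with the quoted figures. The function name below is hypothetical.

```python
import math


def error_share_of_sd(alpha):
    """Proportion of the observed standard deviation attributable to error.

    Assumes the classical test theory relation SEM = SD * sqrt(1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return math.sqrt(1.0 - alpha)


# Sensitivity-corrected pooled alphas reported in the Results section.
pooled_alphas = {"machiavellianism": 0.80, "narcissism": 0.82, "psychopathy": 0.76}
for trait, alpha in pooled_alphas.items():
    print(f"{trait}: {error_share_of_sd(alpha) * 100:.1f}% of the SD")
```

By the same relation, an alpha of .90 would still leave roughly a third of the standard deviation as error, which helps explain why stricter thresholds are recommended for individual decisions.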
However, the alpha coefficients exhibited significant heterogeneity, as shown by the significant Cochran’s Q statistic and the high I2 index. Higgins and Thompson (2002) proposed 25%, 50%, and 75% as possible benchmarks for low, moderate, and high heterogeneity, respectively. To reduce heterogeneity and improve comparability between studies, it is necessary to control for several moderating variables that may be influencing the variability of the reliability coefficients. The main obstacle to controlling for these variables is that current research in this field does not use similar or replicable study methods, which makes it difficult to control for certain contextual and individual variables. Therefore, the reliability generalization estimates and the high heterogeneity should be interpreted with caution, since many social and cultural factors that were not controlled for in this study could underlie the reported heterogeneity. To increase comparability between studies, future studies should control for several criminological variables. As Zhang et al. (2023) point out, in China, as a collectivist society, narcissistic traits are not conceived in the same way as in other cultures. Although a considerable number of studies have been based on male samples, future studies that include both men and women should provide more sophisticated analyses, such as measurement invariance, to ensure that differences or equivalences are due to the construct and not to the functioning of the instrument. The absence of these analyses is a current limitation in the psychometric evaluation of this issue. On the other hand, the prison context, the methods of test administration, and the translation and adaptation of the instruments are also factors to be considered. 
To overcome this limitation, we selected the random-effects model, which assumes that the differences between Cronbach’s alpha reliability coefficients are not due solely to sampling error (within-study variance) but also depend on the variability of each prison population (between-study variance). Despite the known limitations of Cronbach’s alpha coefficient for assessing reliability (Kalkbrenner, 2024), this study focused on this coefficient because it is the one reported by the primary studies; future studies should therefore incorporate reliability measures such as McDonald’s omega or composite reliability (CR). The sensitivity analysis consisted of replicating the meta-analysis after excluding studies whose coefficients did not meet the minimum criterion of Cronbach’s alpha ≥ .70. This allowed us to analyze the possible influence of these studies on the final estimate. In the meta-analytic model of the machiavellianism subscale, the estimates did not vary drastically when the sensitivity analysis was applied. Therefore, this measure is the most robust according to the reliability coefficient. The sensitivity models for narcissism and psychopathy showed a higher alpha estimate and lower heterogeneity, but the heterogeneity was still statistically significant, which prevents the generalization of the results. The asymmetry in the funnel plots of the three original models and the sensitivity models suggests the presence of publication bias, which is statistically significant in Egger’s regression tests (p < .005), according to Egger et al. (1997). The lack of dots in the lower right of the graphs indicates that studies with larger reliability estimates, smaller samples, or larger standard errors are not represented in the published literature because of the lack of significant results. The cumulative average Cronbach’s alpha among the three scales was similar (.76, .80, and .82). 
This explains why these three constructs interact with each other (Jiménez-Granado et al., 2023; Wright et al., 2017). In addition, there was evidence of stability in the reliability of the three scales. The most stable sensitivity algorithm was that of the Machiavellianism scale from .79 to .80. The Narcissism scale presented an improvement in the average reliability from .76 to .82 and in the psychopathy scale the sensitivity improved from .70 to .76. These results should be interpreted with caution until future studies are available to test new sensitivity algorithms. Furthermore, this does not indicate significant differences in the stability of the scales but presents a challenge for future studies. Beyond reviewing the reliability of the instruments, it is essential to deepen the practical utility of assessing these traits in prison settings, as well as to analyse how these data can guide specific interventions. For example, identifying profiles with a high presence of Machiavellianism or psychopathy could facilitate the implementation of rehabilitation programs adapted to the characteristics of this prison population through psychosocial interventions focused on improving empathy skills, impulse control, and recognition of destructive behavior patterns. In addition, the information derived from these assessments would allow the establishment of more precise risk management strategies that are adapted to the needs of each inmate; in this way, the development of individualized monitoring plans, the allocation of specialized therapeutic resources, and the development of social reintegration programs could be addressed. 
In line with this therapeutic approach, and bearing in mind that attributing psychological and criminal maladjustment solely to personality traits is a reductionist bias, it is crucial to also take into account contextual factors such as the family and social environment, criminal history, peer influence, or the particularities of the prison system itself, without forgetting the interference of prisonization, understood as the assimilation of prison rules into prisoners’ habits of thinking, feeling, and acting. Ultimately, integrating these approaches could optimize risk management and contribute to reducing recidivism, facilitating a more effective transition to life in the community after serving the sentence.

The results of this meta-analysis highlight the need to study the discriminant validity of the scales that assess the Dark Triad or Dark Tetrad of personality. At the theoretical and methodological level, there are certain controversies in this field of study. The discussion on the substantive orthogonality of the dimensions of both the Dark Triad and the Dark Tetrad is particularly relevant. From a conceptual perspective, machiavellianism, narcissism, and psychopathy (and sadism) are expected to be distinct constructs and therefore orthogonal, or at least to show minimal overlap. These findings suggest that another limitation of this field of study is the lack of evidence of discriminant validity. Therefore, future studies should urgently address evidence of discriminant validity so that the measurement overlap between these factors can be discussed in a future meta-analysis. In line with the theoretical conceptualization of these constructs, machiavellianism is understood as a behavioral tendency towards manipulation and goal attainment and is not classified as a personality disorder; this relatively narrow definition may have implications for the selection and formulation of the items used to assess it.
Psychopathy, on the other hand, encompasses a broader spectrum of characteristics such as impulsiveness, lack of empathy, and antisocial behavior; despite this, the instruments used to measure it often fail to unambiguously capture all its dimensions. This lack of clarity in the definition of psychopathy and in the construction of its items may contribute to the overlap with machiavellianism; consequently, it is imperative to refine measurement instruments to more accurately reflect the theoretical particularities of each trait, which would allow for a clearer differentiation between these two constructs and improve the utility of assessments in correctional settings. It is possible that such overlap does not reflect a true interrelationship inherent in the constructs but rather stems from an inadequate or inaccurate selection of the items that make up the scales.

The main limitation is the scarcity of studies on these dark constellations in prison samples. It is not possible to explore the reliability of the tools used for measuring the Dark Tetrad in this environment because of a lack of studies. In addition, we had to discard research that included both community and prison samples because it did not differentiate the alpha of the subscales for each sample, as well as research conditioned by reliability induction, that is, research that had extrapolated to its own study the coefficient obtained for the same scale in another similar study. Moreover, only the Short Dark Triad (SD3) and Dirty Dozen (DD) were used for the meta-analysis. Both scales present two main problems in measuring the Dark Triad: difficulty capturing the multidimensionality of traits and high empirical overlap between machiavellianism and psychopathy items. Furthermore, when it comes to assessing psychopathy, studies that use the DD obtain the lowest alpha coefficients, revealing the limitations of this instrument in correctly capturing this trait.
In contrast, studies that opt for the SD3 or specific psychopathy assessment tools report the highest alpha coefficients (Balcioglu et al., 2023; Glenn & Sellbom, 2015; Oljača et al., 2021). However, because the DD is the most widely used scale and has good internal consistency, its use should be encouraged in future evaluations by validating it with different penitentiary samples. In addition to the brief scales traditionally used to assess the Dark Triad, the FFM-ATM measure represents a particularly promising tool for the forensic field, as it allows dark traits to be linked to clinical dimensions of the Big Five model, particularly antagonism, thus providing a broader and clinically interpretable framework. Its application in prison contexts could facilitate the identification of risk profiles that are not easily captured by classic instruments focused exclusively on machiavellianism, narcissism, or psychopathy. For example, it could be useful for assessing propensity for interpersonal insensitivity, disinhibition, or the tendency to exploit others, key aspects in decision-making regarding inmate classification, recidivism prognosis, or suitability for individualised treatment programmes. However, as Postigo et al. (2023) point out, the use of excessively brief measures may limit understanding of the internal structure of dark traits, negatively affecting both their validity and practical usefulness in applied settings. In this sense, the FFM-ATM offers a dimensional approach that could be better suited to the complexity of psychological profiles observed in prison contexts. Its future evaluation according to demanding psychometric criteria, such as those established by the COSMIN review (2024), may help to consolidate it as a reliable and clinically useful measure for implementation in forensic settings, leading to a more ethical, accurate, and useful assessment for decision-making in prisons.
To explore possible methodological limitations that could explain part of the observed heterogeneity, the profile of the studies that obtained the lowest reliability coefficients was examined in greater depth. In the case of Brugués and Caparrós (2022), the small sample size (n = 63) could have contributed to the instability of the psychometric estimators. However, when compared to their subsequent study (Brugués & Caparrós, 2023a), which has the same sample size but obtains significantly higher alphas, it is reasonable to assume that the difference could be due to factors such as the SD3 application procedure, the contextual conditions during data collection, or even specific motivational variables of the inmates (e.g., disinterest, response bias, or social desirability). In the study by Gozzi et al. (2018), the low overall alpha reported for the Dirty Dozen can be attributed not only to the small sample size (n = 34), but also to the methodological decision not to report reliability coefficients by subscale, which makes it difficult to identify which dimension (machiavellianism, narcissism, or psychopathy) could be generating lower consistency. For their part, Garofalo et al. (2020) also report alphas below .70 on all three subscales, reinforcing the need to carefully examine the cultural and linguistic context of application. In this regard, both studies (Gozzi et al., 2018; Garofalo et al., 2020) were conducted in Italy and used linguistic adaptations of the DD, suggesting the possibility of inadequate translation or poor conceptual equivalence between items. Considering that these brief scales are particularly sensitive to problems of comprehension and semantic ambiguity, any deviation in the adaptation can have a direct impact on reliability.
Therefore, it is recommended that future studies in forensic contexts include systematic linguistic and cultural validation processes, as well as standardised administration conditions and quality controls in data collection to minimise the impact of these sources of error.

Although the recurrent publications by authors such as Brugués and Caparrós (2022, 2023b), Navas et al. (2021a, 2021b), Resett, Caino, and Ireland (2022), and Resett, Caino, and Zapata (2022) could imply redundancy in the data, it should be emphasized that the sample sizes and time periods differ, which suggests that the data do not overlap. However, the main limitation lies in the generalization to populations from other cultural contexts. Therefore, future research should be conducted in different cultural contexts, and the original articles should implement clear strategies to control for and clarify possible duplication of the samples used.

All the studies included in the present review assessed dark personality using self-report measures. There is no evidence that the studies controlled for social desirability, which could distort the reliability of the machiavellianism, narcissism, and psychopathy subscales through response biases, given that inmates are particularly keen to present themselves in a more favorable light. It would be interesting to compare the results of traditional scales measuring the Dark Triad with other types of instruments such as the FFM-ATM, social desirability scales such as that of Crowne and Marlowe (1960), whose items represent culturally sanctioned or approved behaviors with a low probability of occurrence and without reference to psychopathological aspects, and other non-psychometric measures such as semi-structured interviews with inmates.
This study contributes to current knowledge of the internal consistency of the psychometric tools used to assess the Dark Triad in the inmate population. In line with the objectives set out: (1) both single-trait instruments (MACH-IV, NPI, PPI, or LSRP) and combined measurement scales (DD and SD3), which are the most frequent, have been identified; (2) the reliability generalization study yields acceptable internal consistencies for the three subscales (≥ .70), and good internal consistency for the DD scale (≥ .80); (3) the reliability coefficients of the subscales exhibited high heterogeneity, as deduced from the high significance reached by Cochran’s Q statistic and the I2 index obtained (≥ 75%); (4) the existence of publication bias was evidenced by the asymmetry in the funnel plots, confirmed by the statistically significant results (p < .05) of Egger’s regression tests; and (5) the sensitivity analysis indicates robustness in the reliability averages of the three scales, with minimal variations. In all three, the average improved slightly. Finally, it was concluded that the DD is the most widely used scale, with good internal consistency (.86; SE = .0132) and homogeneity across studies (I2 = 0% and Q = 5.668, p = .059). The three scales had similar average reliabilities. After the sensitivity analysis, it was observed that the Machiavellianism scale had the greatest heterogeneity. This suggests that the studies using this scale differ markedly from each other, which could be due to differences in methods, populations, and measurements. Furthermore, this result implies that, owing to the inherent variability in the studies, it is the scale that presents the greatest difficulties of generalization.
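The Egger regression test mentioned in point (4) can be sketched in a few lines of Python. The standardized estimate of each study (estimate divided by its standard error) is regressed on its precision (the inverse of the standard error), and an intercept far from zero signals funnel plot asymmetry. The function name and the input values are hypothetical and are not taken from the included studies; the p-value is left out because it requires a t distribution with k − 2 degrees of freedom, which is not in the Python standard library.

```python
import math

def egger_test(estimates, std_errors):
    """Egger's regression: standardized estimate regressed on precision.

    Returns (intercept, slope, se_intercept, t_intercept). The intercept
    reflects funnel-plot asymmetry; t has k - 2 degrees of freedom.
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1.0 / se for se in std_errors]                     # precision
    z = [y / se for y, se in zip(estimates, std_errors)]    # standardized estimate
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; slope is undefined")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance of the ordinary least squares fit
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    t_int = intercept / se_int if se_int > 0 else float("nan")
    return intercept, slope, se_int, t_int

# Hypothetical transformed reliability estimates and standard errors.
ys = [-0.97, -1.66, -1.51, -1.90, -1.14]
ses = [0.21, 0.10, 0.12, 0.09, 0.17]
b0, b1, se0, t0 = egger_test(ys, ses)
print(f"intercept = {b0:.2f}, SE = {se0:.2f}, t({len(ys) - 2}) = {t0:.2f}")
```

With as few studies as are available per subscale in this field, a test of this kind has low power, so its result is best read alongside the funnel plot rather than on its own.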
Future research would need to explore which moderating variables might be influencing this variability; in this way, we would gain a clearer picture of the underlying mechanisms of dark personalities, favoring both the accuracy of assessments and the development of more effective interventions in the prison setting. We also suggest that researchers consider the limitations of Cronbach’s alpha and calculate ordinal alpha and omega in future studies (Kalkbrenner, 2024).
Conflict of Interest
The authors of this article declare no conflict of interest.
Cite this article as: Suevos-Rodríguez, I., Burgos-Benavides, L., Quevedo-Blasco, R., & Rodríguez-Díaz, F. J. (2026). Assessment of the dark triad in the prison population: A meta-analysis of reliability generalization. Anuario de Psicología Jurídica, 36, Article e260472, 1-12. https://doi.org/10.5093/apj2026a5
Funding
This research was funded by the Gobierno del Principado de Asturias (PA-23-BP22-097) and the Ministerio de Ciencia, Innovación y Universidades (FPU2023-02733).
References
Correspondence: burgosluis@uniovi.es (L. Burgos-Benavides).
Copyright © 2026. Colegio Oficial de la Psicología de Madrid