Vol. 40. Num. 3. December 2024. Pages 164 - 176

Effects of Candidate Gender and Qualification on Hiring Recommendations in Asynchronous Video Interview Tools

[Los efectos del género y de la cualificación de la persona candidata en las recomendaciones de contratación en las entrevistas de trabajo asincrónicas]

Edurne Martínez-Moreno, Edurne Elgorriaga, Lorena Gil de Montes, and Olaia Larruskain-Mandiola


University of the Basque Country UPV/EHU, Spain


https://doi.org/10.5093/jwop2024a14

Received 1 October 2024, Accepted 29 November 2024

Abstract

The main objective of this study is to examine the relative influence of candidate competencies vs. rater biases on hiring recommendations made using asynchronous video interview (AVI) tools, while considering a candidate's gender and qualifications. A 2 × 2 within-subject design was employed with 151 HR professionals in Spain to explore the effects of candidate gender (female vs. male) and qualifications (highly qualified vs. semi-qualified) on hiring recommendations. Binary logistic regression and qualitative analyses revealed that, although competencies play a strong role, biases were the dominant factor influencing hiring recommendations for all candidates. For women, competence was a key predictor. Sociability predicted hiring recommendation of semi-qualified candidates, particularly men, for whom morality also played an important role. First impressions favoured highly qualified women, while nonverbal communication favoured highly qualified men. Consistent with role congruity theory, communal competencies were more valued in women, while agentic competencies were crucial for men.

Resumen

El objetivo principal de este estudio es examinar la influencia relativa de las competencias de la persona candidata frente a los sesgos de las y los evaluadores en las recomendaciones para el pase a la siguiente fase del proceso de selección realizadas mediante entrevistas de trabajo asincrónicas (AVI), considerando el género y el grado de cualificación de la persona candidata. Se empleó un diseño 2 × 2 intrasujeto con 151 profesionales de recursos humanos en España para explorar los efectos del género de la persona candidata (mujer vs. hombre) y el grado de cualificación (muy cualificado vs. semicualificado) en las recomendaciones de contratación. Los análisis de regresión logística binaria y los análisis cualitativos revelaron que, aunque las competencias juegan un papel importante, los sesgos fueron el factor dominante que influyó en las recomendaciones de contratación para todas las personas candidatas. En las mujeres, la competencia fue un predictor clave. La sociabilidad predijo la recomendación de contratación de las y los candidatos semicualificados, especialmente en los hombres, para los cuales la moralidad también jugó un papel importante. Las primeras impresiones favorecieron a las mujeres muy cualificadas, mientras que la comunicación no verbal favoreció a los hombres muy cualificados. De acuerdo con la teoría de la congruencia de roles, las competencias comunales fueron más valoradas en las mujeres, mientras que las competencias agénticas fueron cruciales para los hombres.

Palabras clave

Entrevista de trabajo asincrónica, Sesgos del evaluador, Competencias, Género, Recomendación de contratación

Keywords

Asynchronous job interview tool, Rater bias, Competency, Gender, Hiring recommendation

Cite this article as: Martínez-Moreno, E., Elgorriaga, E., Gil de Montes, L., and Larruskain-Mandiola, O. (2024). Effects of Candidate Gender and Qualification on Hiring Recommendations in Asynchronous Video Interview Tools. Journal of Work and Organizational Psychology, 40(3), 164 - 176. https://doi.org/10.5093/jwop2024a14

Correspondence: edurne.martinez@ehu.eus (E. Martínez Moreno).

Introduction

The COVID-19 pandemic significantly changed the world, decreasing physical contact among people and promoting the use of information and communication technologies (ICTs) for social interaction and work. Personnel selection processes are no exception to these changes, with organizations embedding new ICT tools, such as asynchronous video interview (AVI) tools, to assess candidates’ suitability for vacant positions. AVI tools are a type of online job interview conducted through web-based video platforms, where candidates record their responses to a set of questions, which are later assessed by recruitment professionals or artificial intelligence (Dunlop et al., 2022; Liff et al., 2024; Lukacik et al., 2022). In a recent targeted review of digital selection procedures, Woods et al. (2020) concluded that the potential negative effects and biases of using digital selection procedures are still largely unknown, and it remains unclear how these methods may accentuate or mitigate arbitrariness in personnel selection processes. To address this issue, this study has two main purposes. First, it aims to examine whether the evaluated competencies of candidates outweigh the biases of raters in hiring recommendations for a Human Resources (HR) technician position using AVI tools within a Spanish context. Second, the study seeks to identify the most salient competencies and rater biases, taking the gender and qualifications of candidates into account.

First, our study will analyze rater behavior and candidate assessments across different genders and qualifications in controlled quasi-experimental settings to address the limited existing literature on the fairness of using AVI tools. In Spain, a study conducted by González et al. (2019), in which four equivalent resumes of fictitious male and female applicants were submitted to 1,372 job postings, found that women were 30% less likely than men to receive a response to their job applications. However, the study also pointed out that highly qualified women without children did not experience this form of discrimination during the initial stage of the selection process, as their likelihood of receiving a response was similar to that of their male counterparts. This study aims to delve into the factors that influence the selection of women in the later stages of the hiring process.

Most previous studies have focused on candidate reactions to the use of AVI tools (e.g., Niemitz et al., 2024; Roulin, Pham, et al., 2023; Suen & Hung, 2023; Suen & Hung, 2023), and the few that have examined their effectiveness have reported inconsistent results (e.g., Gorman et al., 2018, 2016; Langer et al., 2017; Roulin, Lukacik, et al., 2023; Suen et al., 2019; Torres & Gregory, 2018). Second, this study seeks to determine whether the competencies required for men and women for the position of HR technician are the same, or whether, on the contrary, as established by the role congruity theory (Eagly & Karau, 2002; Heilman et al., 2024), they will be evaluated differently by the raters. Third, we will explore how rater biases operate and which biases are most relevant, considering the AVI tools format and methods used to study rater biases in Spain.

While many studies have reported information on the fairness of face-to-face (FTF) job interviews (e.g., Alonso & Moscoso, 2017; Alonso et al., 2017), little is known about how technology has transformed selection processes. This study aims to provide new insights into the use of AVI tools in Spain, an area that has not been previously explored. Finally, studies in this field have traditionally used quantitative and deductive methods to estimate the suitability of candidates. This study contributes to a better understanding of the decision-making processes in personnel selection by inductively analyzing the text of evaluative judgments, thus making it possible to identify the main thematic lines on which raters base their decision to recommend or not to recommend candidates.

Candidate Competencies vs. Rater Biases in AVI Tools

One of the primary concerns for HR professionals and scholars in the selection process is evaluating candidates effectively and fairly. To achieve this, various tools have been developed, with AVI being one such tool used for preselecting candidates. An AVI tool typically comprises four or five questions that candidates cannot preview, although they have 30 seconds to prepare their answers. There are two types of question formats (video and text), but the question format does not influence interview performance (Niemitz et al., 2024). Candidates then have 2-3 minutes to respond and record their answers using a webcam through various proprietary software platforms. Once the recording starts, it cannot be interrupted or re-recorded (Dunlop et al., 2022). These interviews are perceived as more useful and easier to use than other preselection tools (Basch et al., 2022) and offer several advantages: they are fast, economical, and timesaving for employees (Dunlop et al., 2022).

Despite the advantages of AVI tools, job raters may incur two types of errors: false positives (recommending an unqualified candidate) and false negatives (not recommending a qualified candidate). False positives are considered worse than false negatives because they can seriously harm both the organization and the person who has been selected. According to the dual-process theory, raters can process candidates’ information in two distinct ways (Derous et al., 2016). On the one hand, some information in job interviews can be processed easily through heuristically driven automatic impression formation, known as Type 1. On the other hand, raters can process candidates’ information laboriously and consciously, with their decisions based on controlled judgments (Type 2). Raters aim to conduct their work through Type 2 processes, employing observation sheets, structured interviews, multiple raters, and recordings of interviews to review before making a decision. The use of AVI tools integrates various functionalities, enabling interviewers to conduct personnel selection processes in a deliberate and thoughtful manner. Another issue with AVI tools is that their design may increase the use of stigma-laden heuristics that bias assessments. In an AVI context, raters have the discretion to decide when to stop watching an interview, potentially denying candidates the opportunity to correct a negative first impression (Lukacik et al., 2022). Furthermore, candidates may not carefully select the setting and timing for recording their answers, inadvertently including background elements that trigger rater biases (Roulin, Lukacik, et al., 2023). Additionally, the manner in which candidates interact with the camera during their responses can negatively influence raters’ perceptions of motivation, interest in the position, or professionalism (Lukacik et al., 2022).

In this regard, several studies have confirmed that first impression significantly influences hiring recommendations (e.g., Buijsrogge et al., 2020; Martín-Raugh et al., 2023; Swider et al., 2016). First impression is easily influenced by a candidate’s attractiveness and non-verbal cues, as these cues are readily accessible and can be processed effortlessly. Attractive candidates exhibit more effective non-verbal cues and consequently receive higher scores than less attractive ones due to their greater sense of power (Tu et al., 2022). Research has demonstrated that both attractiveness and non-verbal communication significantly influence rater and interviewer scores (Koutsoumpis et al., 2024; Martín-Raugh et al., 2023; Nault et al., 2020). The literature on lookism, or discrimination based on physical appearance, has established that attractiveness influences perceptions of competency, leading to favoritism for attractive candidates (Hoffman, 2024; Nault et al., 2020; Niu, 2024; Pireddu et al., 2022). The competence stereotype pertains to qualities such as capability and assertiveness (Fiske, 2018; Fiske et al., 2002). Sociability and morality stereotypes are considered central dimensions for forming a positive view of a person (Leach et al., 2007). Sociability relates to qualities of friendliness, while morality pertains to sincerity and trustworthiness. Additionally, the literature highlights a strong association between interview performance, professional appearance, and non-verbal communication (Martín-Raugh et al., 2023).

Few studies have investigated whether candidates’ competencies or raters’ biases have a greater influence on the scores awarded and, consequently, on hiring recommendations in the context of AVI. A recent study with 276 senior raters (Scott & Roulin, 2024), which examined the effect of candidates’ background settings (home, office, and blurred settings) and response quality on overall rating scores, found that first impressions and candidates’ response quality predict overall rating scores. In a hospital-based study comprising 517 observations, findings indicated that scores in communication and conflict resolution competencies significantly influenced overall candidate evaluations, surpassing factors such as aesthetics or procedural order during the use of AVI tools (Torres & Gregory, 2018).

It is established that certain competencies, defined as underlying characteristics of an individual that causally contribute to superior performance in a job or situation (Spencer & Spencer, 2008), are desirable for employers depending on the specific job position (Liff et al., 2024). A field study conducted in Spain with 37 active HR professionals identified six key competencies essential for the development of HR technicians: teamwork, customer orientation, planning, communication, flexibility, and collaboration (Pereda Marín et al., 2003). The present study focuses on assessing three competencies: teamwork, planning oriented to conflict resolution, and flexibility. To evaluate these competencies, the study used written questions centered on past behaviors, supplemented with observation sheets and the Behaviorally Anchored Rating Scale (BARS). This methodology ensures a structured and standardized approach to assessing each candidate’s abilities, facilitating a fair and comprehensive assessment process. By focusing on these competencies, the study aims to identify the most salient attributes that contribute to successful performance in an HR technician role, thereby informing hiring recommendations in the context of AVI.

Building on the findings of the previously mentioned research, which indicate that competency scores hold greater significance than other factors in candidate recommendations, we anticipate that recommendations for qualified candidates will be driven by the scores they achieve in the competencies relevant to the job position. In contrast, false positives, in which raters endorse unqualified candidates, will primarily result from rater biases.

Hypothesis 1: Candidate competencies (teamwork, planning oriented to conflict resolution and flexibility) will primarily influence hiring recommendations for qualified candidates more than factors related to raters’ biases (first impression, attractiveness, non-verbal communication, and stereotypes). Conversely, factors associated with raters’ biases will have a stronger influence on hiring recommendations for semi-qualified candidates.

Candidate Competencies and Gender in Hiring Recommendations in AVI Tools

In the workplace, the evaluation and recruitment of women has been negatively influenced by gender stereotypes. Gender stereotypes can be categorized into two main dimensions: agency and communality (Heilman et al., 2024). Agency, which is typically associated more with men, includes attributes related to task orientation and goal achievement. In contrast, communality, more commonly associated with women, encompasses traits such as kindness and concern for others.

This study focuses on examining teamwork and flexibility competencies, considered as communal competencies. Teamwork encompasses problem-solving within a group, commitment to common goals over individual interests, and sharing resources and information. Flexibility involves adapting behavior to varying situations, maintaining effectiveness across different environments, tasks, responsibilities, and interactions with diverse people. In contrast, planning oriented to conflict resolution is viewed as an agentic competency, focusing on objective-oriented planning such as prioritization, establishing action plans to achieve objectives, and implementing appropriate control and follow-up measures. Agency has been subdivided into competence and assertiveness, while communality into morality and warmth (Abele et al., 2016).

In the context of AVIs, a quasi-experimental study conducted in Germany (Kroll & Ziegler, 2016) analyzed fairness in personnel selection for waiter positions among candidates of different genders and ethnicities. The study found no significant differences in teamwork and communication competency scores between men and women candidates. However, candidates with a Turkish background received higher scores in both compared to native German candidates for this particular job position. This study indicates that there are no differences in how women and men are assessed regarding said competencies. However, it does not establish that these competencies will have the same impact on the candidates’ hiring decisions. Therefore, and based on role congruity theory (Eagly & Karau, 2002; Heilman et al., 2024), we expect that communal competencies, such as teamwork and flexibility, will be prioritized in women for recommending advancement to the next phase of the selection process, while for men the emphasis will be on agentic competencies, like planning oriented to conflict resolution.

Hypothesis 2: Competency scores in (a) teamwork and (b) flexibility will have a greater influence on hiring recommendations for women candidates, whereas (c) planning oriented to conflict resolution will have a greater influence on hiring recommendations for men candidates.

Rater Biases and Candidate Gender in Hiring Recommendations in AVI Tools

Culturally, beauty or attractiveness has historically been a more valued trait in women than in men. Given that physically attractive individuals are often perceived as having more positive traits, such as sociability, honesty, intelligence, and life success (Dion et al., 1972), this perception should provide an advantage for attractive women in selection processes. However, the evidence on whether attractiveness has a differential impact on men and women remains inconclusive. A meta-analysis by Hosoda et al. (2003) found no significant gender differences, a result further corroborated by more recent studies (Pireddu et al., 2022). Some studies suggest that attractiveness may benefit men (Johnson et al., 2010; Sheppard & Johnson, 2019), while others argue that it plays a more significant role in the evaluation of women (Mao et al., 2024; Turkmenoglu, 2020).

In Spain, a recent study (Cuadrado et al., 2024) revealed that participants exhibited more favourable attitudes toward women perceived as more attractive compared to those considered less attractive. This study also found that women were considered more competent when applying for male-typed jobs than for neutral-typed jobs. Furthermore, Watkins and Johnston (2000) found that physical attractiveness did not influence evaluations when a job application was exceptional, but it did influence evaluations when the applicant was ordinary. In this regard, Hosoda et al. (2003) found that attractiveness may become decisive when decision-making is difficult or multiple candidates are evaluated consecutively.

Another variable explored to understand biases in job interviews is non-verbal communication (Frauendorfer & Mast, 2015). A recent meta-analysis (Martín-Raugh et al., 2023), which synthesizes findings from 63 studies conducted over the past 70 years, reveals a positive correlation between non-verbal cues and candidate evaluations. The analysis highlights professional appearance, eye contact, and head movements as particularly influential factors in candidate assessments. Additionally, gender differences were noted, with facial expressions and professional appearance having a greater impact on the evaluation of women candidates.

The few studies that have examined the influence of aesthetics in an AVI context are promising. A field study in Taiwan (Suen et al., 2019) found that first impression and physical attractiveness significantly affected structured interview ratings (including AVI tools, synchronous video interviews, and artificial intelligence), but these effects were diminished in the case of AVI tools. They also found no significant differences between women and men regarding both first impression and physical attractiveness. Koutsoumpis et al. (2024), who also identified that attractiveness bias was present in AVI contexts, found similar results: participants considered attractive, regardless of their gender, received higher scores for their interview performance.

Despite the potential positive outcomes of AVI tools, we anticipate that rater bias, influenced by first impressions, attractiveness, and non-verbal communication, will influence hiring decisions, particularly for women candidates. This expectation is based on the fact that raters must evaluate multiple candidates, and as noted by Hosoda et al. (2003), attractiveness often plays a decisive role in hiring recommendations. Supporting this, recent findings by Cuadrado et al. (2024) in Spain show that attractive women tend to gain a competitive advantage over other candidates in selection processes. Furthermore, Martín-Raugh et al. (2023) emphasize the close relationship between attractiveness and non-verbal communication. Therefore, we hypothesize that:

Hypothesis 3: a) First impression, b) attractiveness, and c) non-verbal communication will have a greater influence on hiring recommendations for women candidates compared to men candidates.

In the workplace, women’s evaluation and hiring have been adversely affected by gender stereotypes related to competence and sociability. Traditionally, men have been depicted as more competent and brilliant (more aggressive, competitive, dominant, assertive), whereas women have been perceived as more sociable (more caring, empathetic, sensitive, passive) (e.g., Correll, 2017; Duehr & Bono, 2006; Fiske, 2018; Hentschel et al., 2019). There is currently a slight shift in these gender stereotypes. Recent studies have indicated that women candidates are perceived as more competent than men in some cases, and occasionally more trustworthy and socially stable (Leach et al., 2017), although this is not always consistent (Roulin, Lukacik, et al., 2023). Other studies suggest that women who deviate from these stereotypes may be viewed negatively (e.g., Corrington & Hebl, 2018; Rudman & Phelan, 2008): whereas men are expected to be competitive, women are often expected to balance presenting their competence with maintaining warmth and likability. Along these lines, two studies conducted in Italy revealed that competence is the most critical stereotype influencing the hiring recommendation for men candidates, whereas for women candidates all three stereotypes (competence, sociability, and morality) gained importance (Moscatelli et al., 2020), particularly influencing the overall impression through factors like facial morality, facial competence, and attractiveness (Menegatti et al., 2021). Similarly, an audit study by Quadlin (2018), which involved 261 human resource professionals and examined whether men and women receive equal returns on academic performance in hiring, revealed that competence and commitment are valued in hiring decisions only for men. In the case of women, other characteristics, such as being sociable and outgoing, are more appreciated, leading to a penalty for women with good grades. 
However, a study from the UK (Pireddu et al., 2022) highlighted that attractiveness and competence were equally relevant for hiring decisions for both men and women, while morality and sociability were more critical in assessing men than women for leadership positions. Given the contradictory findings and the lack of testing on the impact of stereotypes on recommendation decisions using AVI tools, we hypothesize that:

Hypothesis 4: a) Sociability and b) morality stereotypes will have a greater influence on hiring recommendations for women candidates, whereas c) the competence stereotype will have a greater influence on hiring recommendations for men candidates.

Regarding the thematic profiles that emerge from the participants’ evaluations, we adopt a fundamentally exploratory approach due to the absence of previous work in this area. Generally, we expect these profiles to be compatible with the hypotheses presented in the quantitative part of the study.

Method

Sample

The study involved 186 HR professionals located in Spain. Thirty-five participants were excluded from the analysis due to errors in candidate assessments, or because they skipped or failed to respond to assessments for at least three candidates. After data cleaning, the final sample consisted of 151 HR professionals. Among these participants, 63.3% were women (n = 95) with a mean age of 32.41 years (SD = 10.20).

In terms of professional roles, 53.5% identified as HR technicians, primarily employed in companies (38.7%, n = 58) or HR consulting firms (20.0%, n = 30) with more than 250 employees (35.7%, n = 46). Regarding educational background and HR experience, a majority held a master’s degree (55.7%, n = 83) or a bachelor’s degree/diploma (38.9%, n = 58), and most had less than 5 years of work experience (59.7%, n = 89). While 63.8% (n = 115) reported frequently or very frequently using competency-based job interviews, only 3.9% (n = 5) reported using AVI tools with the same frequency (see Table 1).

Table 1

Demographic Profile of Sample

Note. N = 151. Participants were on average 32.41 years old (SD = 10.20), with a maximum age of 70 and a minimum age of 21 years old. Number of participants with missing data in the analysed sample: gender: 1, age: 1, education level: 2, HR work experience: 2, work experience in organization types: 1, job position: 22 and number of employees current organization: 22.

Design of the Study

The study employed a quasi-experimental design utilizing discrimination-testing methodology. The design is described as quasi-experimental because, although participants were randomly assigned to one of four experimental conditions, the assignment took their work experience into account to ensure that both senior and junior professionals were represented in every condition. The discrimination-testing methodology involved presenting two subjects with similar profiles and skills for evaluation. The subjects differed in a specific characteristic, such as gender, allowing for the detection of discrimination based on that characteristic.
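
The assignment logic described above can be illustrated with a minimal Python sketch. It assumes that seniority is recorded as a two-level label and that participants within each level are shuffled and then dealt to the four conditions in turn. The function name, participant identifiers, and condition labels are hypothetical, and the article does not specify how the assignment was actually implemented.

```python
import random
from collections import defaultdict

CONDITIONS = ["A", "B", "C", "D"]  # hypothetical labels for the four conditions


def assign_stratified(participants, seed=0):
    """Randomly assign participants to conditions within each seniority level.

    `participants` is a list of (participant_id, seniority) tuples.
    Returns a dict mapping participant_id to a condition label.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for pid, seniority in participants:
        by_level[seniority].append(pid)

    assignment = {}
    for level in sorted(by_level):
        ids = by_level[level]
        rng.shuffle(ids)  # random order within the seniority level
        for i, pid in enumerate(ids):
            # Deal the shuffled participants to the conditions in turn so that
            # every condition receives both senior and junior raters.
            assignment[pid] = CONDITIONS[i % len(CONDITIONS)]
    return assignment


# Hypothetical example: 8 junior and 8 senior raters (not the study's sample).
example = [(f"J{i}", "junior") for i in range(8)] + [(f"S{i}", "senior") for i in range(8)]
result = assign_stratified(example)
```

With equal group sizes, as in this hypothetical example, each condition ends up with the same number of junior and senior raters, while who lands in which condition remains random.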

In sum, this study was a 2 (gender) × 2 (candidate qualification) within-subjects design. Each participant evaluated 4 candidates varying by gender and qualification. The primary dependent variable was the decision to recommend or not recommend the candidate for advancement to the next phase of the selection process at the organization. In addition to the final recommendation decision, each candidate was evaluated on measures of rater bias and candidate competencies. Bias measures included responses to competence, morality, and sociability stereotypes, as well as first impression, attractiveness, and non-verbal communication of candidates. Competency measures included teamwork, planning oriented to conflict resolution, and flexibility scores.

Quasi-experiment Preparation

The principal researcher interviewed 9 HR professionals, 4 with proven professional experience in the sector and 5 with fewer than 6 months of experience. The interview consisted of three questions related to teamwork (“Tell me about the last time you worked in a team, your role, your experience”), planning oriented to conflict resolution (“Describe a tense situation you had with a client”), and flexibility competencies (“Describe a situation you had to face suddenly”). The responses from the 9 professionals were disaggregated, resulting in a list of 27 responses, 9 for each competency, of which 4 came from experienced professionals and 5 from less experienced professionals. This step aimed to dissociate the three responses from each interviewed person to ensure heterogeneity and equivalence of the material. The 27 responses were then assigned to two sector experts, independent and external to the study, for evaluation. Each expert received an observation rubric with BARS to evaluate each response on the teamwork, planning oriented to conflict resolution, and flexibility competencies. The 4 responses with the highest average scores and the 4 responses with the lowest average scores were selected.
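
The selection step at the end of this procedure can be sketched in a few lines of Python. The sketch assumes that each response receives one score per expert and that responses are ranked by the mean of those scores. The function name and all scores are hypothetical illustrations and are not the study's data.

```python
from statistics import mean


def select_extremes(ratings, k=4):
    """Return the ids of the k highest- and k lowest-rated responses.

    `ratings` maps a response id to the list of scores given by the experts.
    """
    averages = {rid: mean(scores) for rid, scores in ratings.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical scores from two experts on a 0-4 scale (not the study's data).
ratings = {
    "r1": [4, 4], "r2": [0, 1], "r3": [3, 4], "r4": [2, 2], "r5": [1, 0],
    "r6": [4, 3], "r7": [0, 0], "r8": [3, 3], "r9": [1, 1], "r10": [2, 3],
}
highest, lowest = select_extremes(ratings)
```

Ties at the cut-off would need an explicit rule, which the article does not describe; the hypothetical scores above are chosen so that no tie occurs at either boundary.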

Based on the 8 selected responses, 8 interview models were created (4 interview models for qualified candidates and 4 interview models for semi-qualified candidates). All models, whether for qualified or semi-qualified candidates, had the same introduction and farewell questions. This ensured control over years of previous experience, current job quality, and interview conclusion format. Five men and five women were recruited to act as candidates in AVI tools and to record video interpretations for the quasi-experiment. To ensure internal and external validity, the same actor/actress portrayed both the qualified and semi-qualified candidates. This approach maintained neutral and similar contextual and personal characteristics across recordings to mitigate any influence of the recording context on evaluations. Each AVI lasted approximately 10 minutes. In total, 20 AVIs were recorded (video interpretations), manipulating candidate gender and qualification.

A pilot study was conducted with 9 PhD students from the University of the Basque Country to assess video interpretations, observation sheets, and questionnaires. As a result, 8 AVIs (2 women and 2 men candidates, each participating in two interviews) were excluded because the interviewees were perceived as fake.

Procedure

All participants identified themselves as HR professionals and were contacted either through LinkedIn or email (UPV/EHU alumni now working in HR). They were invited to take part in a selection process using AVI tools. Upon confirming their participation, each participant received an email containing a link to a dedicated webpage created for the study.

The website featured 4 AVI tools: one each for a qualified man candidate, qualified woman candidate, semi-qualified man candidate, and semi-qualified woman candidate. Study participants were instructed to evaluate these candidates as they would in a real selection process for an HR technician role. They were tasked with assessing the suitability of each interviewee for the position.

Prior to evaluating the candidates, participants were provided with an introduction to the organization where the HR technician would potentially be employed. They observed the competencies and skills of each candidate using observation sheets and evaluated these using BARS. Subsequently, participants had to decide whether to recommend each candidate to proceed to the next stage of the selection process. Participants were provided with 4 different links. The same actors appeared as qualified candidates for some participants and as semi-qualified candidates for others. To control for possible extraneous variables, the order in which the candidates were presented was randomized.
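
As an illustration of order randomization in a 2 × 2 within-subjects design, the following Python sketch shuffles the four candidate profiles independently for each rater. The per-participant seed and the function name are assumptions made for reproducibility of the example; the article does not describe the mechanism that was actually used.

```python
import random

# The four candidate profiles every rater evaluates (gender x qualification).
CANDIDATES = [
    ("woman", "qualified"),
    ("woman", "semi-qualified"),
    ("man", "qualified"),
    ("man", "semi-qualified"),
]


def presentation_order(participant_seed):
    """Return the four candidate profiles in a random order for one rater."""
    rng = random.Random(participant_seed)  # hypothetical per-rater seed
    order = CANDIDATES.copy()
    rng.shuffle(order)
    return order


# Hypothetical example: presentation orders for five raters.
orders = [presentation_order(seed) for seed in range(5)]
```

Each rater still sees all four profiles exactly once; only the sequence varies, which spreads order effects across conditions rather than tying them to one candidate type.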

In all cases, we provided information about the research objectives and permission to use the data was requested (participants signed the informed consent), and anonymity and confidentiality were ensured. This study complied with all the ethical requirements in accordance with the University of the Basque Country, UPV/EHU (M10/2019/157), as well as national and international (APA) ethical guidelines.

Measures

Dependent Variables

Hiring Recommendation. A single-item measure developed by the authors recorded whether each candidate was recommended to move on to the next phase of the selection process. Participants answered the single item "Would you recommend this candidate to move on to the next phase of the selection process?", where 1 = yes, 0 = no. Subsequently, to further explore the reasons for the decision, an open question asked them to justify it: “Explain why you chose to recommend or not recommend.”

Independent Variables

Competencies. Participants assessed the candidate on the dimensions of teamwork, planning oriented to conflict resolution, and flexibility using BARS (see Table 2). BARS scores range from 0, no evidence of the competency, to 4, high evidence of the competency, but they were recoded onto a 5-point Likert scale for further analysis. Inter-rater agreement coefficients were .73 for highly qualified woman, .59 for somewhat qualified woman, .67 for highly qualified man, and .63 for somewhat qualified man.

Table 2

Observation Sheet with Behaviorally Anchored Rating Scale (BARS)

First Impression. In order to evaluate the initial impression of each candidate, after the introductory question to each applicant, participants answered a 4-item scale adapted from the Swider et al. (2016) scale. An example item is: “This candidate appears to be very qualified.” The inter-rater agreement coefficients were as follows: .85 for highly qualified women, .87 for somewhat qualified women, .88 for highly qualified men, and .85 for somewhat qualified men.

Attractiveness. To analyze the participants' perception of the candidates' physical appearance, a 4-item scale adapted from the Boor et al. (1983) scale was used. An example item is: “This candidate is physically attractive.” The inter-rater agreement coefficients were: .85 for highly qualified woman, .81 for somewhat qualified woman, .85 for highly qualified man, and .81 for somewhat qualified man.

Non-verbal Communication. The ECO-CNV scale by Roso-Bas et al. (2017), which evaluates body expression, facial expression, gaze at the camera, naturalness of speech, and fluency, was adapted. An example item is: “Appropriate gestures illustrating speech.” The inter-rater agreement coefficients were .74 for highly qualified woman, .67 for somewhat qualified woman, .80 for highly qualified man, and .76 for somewhat qualified man.

Stereotypes. To assess stereotypes about men and women, a 9-item scale was developed from the work of Fiske et al. (2018), Fiske et al. (2002), and Leach et al. (2007; 2017). Responses were given on a 5-point Likert-type response scale (1 = not at all, 5 = a lot). Three items measure morality (e.g., “They are honest”), another three items measure sociability (e.g., “They are kind”), and three more items measure competence (e.g., “They are intelligent”). The inter-rater agreement coefficients for the morality stereotype were .84, .71, .90, and .71 for highly qualified woman, somewhat qualified woman, highly qualified man, and somewhat qualified man, respectively. Similarly, for the sociability stereotype, the coefficients were .76, .76, .73, and .79 for highly qualified woman, somewhat qualified woman, highly qualified man, and somewhat qualified man, respectively. Lastly, for the competence stereotype, the coefficients were .72, .81, .78, and .79 for highly qualified woman, somewhat qualified woman, highly qualified man, and somewhat qualified man, respectively.

The response scale for all instruments was a 5-point Likert scale where 1 = strongly disagree and 5 = strongly agree.

Manipulation Check Variable

Candidate Qualification. To check the manipulation of the perceived qualification of the candidates, we asked participants to indicate the degree to which they considered the candidate qualified for the vacant position. Responses were given on a 5-point Likert-type response scale (1 = unqualified, 5 = fully qualified).

Analysis

Quantitative data analysis was performed using IBM SPSS Statistics 27. First, preliminary analyses were performed: descriptive analyses and Cronbach's alpha. Multicollinearity between the variables studied was checked using Spearman’s rho correlations. No major violations were found, as there was no correlation greater than .70 between the recommendation to proceed to the next stage of the selection process and the other variables.
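The screening step described above can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure the authors ran: it computes Spearman's rho as the Pearson correlation of average ranks and flags any pair of variables whose absolute correlation exceeds .70. The variable names and ratings are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def flag_high_correlations(columns, threshold=0.70):
    """Return (name_a, name_b, rho) for every pair with |rho| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        rho = spearman_rho(a, b)
        if abs(rho) > threshold:
            flagged.append((name_a, name_b, round(rho, 3)))
    return flagged


# Hypothetical ratings from five raters (illustrative values only).
ratings = {
    "teamwork": [1, 2, 3, 4, 5],
    "flexibility": [1, 2, 3, 4, 5],
    "attractiveness": [3, 1, 4, 1, 5],
}
print(flag_high_correlations(ratings))
```

The sketch checks every pair of variables, which is stricter than checking only the correlations with the recommendation variable.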

To check the experimental manipulation, repeated-measures ANOVAs, Cochran's Q tests, and McNemar tests were calculated. We conducted repeated-measures ANOVAs to examine whether there were statistically significant differences in qualification perceptions between the four conditions. Cochran's Q tests were performed to test whether raters were more likely to select qualified candidates than semi-qualified candidates. In addition, McNemar tests were performed to determine the differences between candidates (Table 3). As there are 4 profiles (p1, p2, p3, and p4), there are 6 possible comparisons: p1-p2, p1-p3, p1-p4, p2-p3, p2-p4, and p3-p4.
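As an illustration of the pairwise step, the sketch below enumerates the six comparisons among four profiles and applies an exact (binomial) McNemar test to each pair of paired yes/no recommendations. The source does not state which variant of the test SPSS applied, so the exact version used here is an assumption, and the recommendation vectors are invented for the example.

```python
from itertools import combinations
from math import comb


def mcnemar_exact(first, second):
    """Exact two-sided McNemar test for paired binary outcomes (1 = yes, 0 = no).

    Only discordant pairs carry information: raters who recommended one
    candidate but not the other.
    """
    only_first = sum(1 for x, y in zip(first, second) if x == 1 and y == 0)
    only_second = sum(1 for x, y in zip(first, second) if x == 0 and y == 1)
    n = only_first + only_second
    if n == 0:
        return only_first, only_second, 1.0
    k = min(only_first, only_second)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_first, only_second, min(1.0, 2 * tail)


# Hypothetical recommendations from the same eight raters for each profile.
profiles = {
    "p1": [1, 1, 1, 1, 1, 1, 1, 0],
    "p2": [1, 1, 1, 1, 1, 1, 0, 1],
    "p3": [1, 0, 1, 0, 1, 0, 0, 0],
    "p4": [0, 0, 0, 0, 1, 0, 0, 0],
}

pairs = list(combinations(profiles, 2))  # 4 profiles give 6 pairs
for name_a, name_b in pairs:
    b, c, p = mcnemar_exact(profiles[name_a], profiles[name_b])
    print(f"{name_a}-{name_b}: discordant = ({b}, {c}), p = {p:.3f}")
```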

Table 3

Findings for Nonparametric McNemar Test to Identify Which Candidate Profiles Differ between Them

Binary logistic regression models were run to test the hypotheses of the study. Four separate binary logistic regression models with forward Wald selection, to control for confounding, were run to test hypothesis 1. The remaining hypotheses of the study were tested using binary logistic regression with the enter method. The relationship between the predictors and the dependent variable was estimated using the odds ratio (OR) statistic. Values greater than 1 indicate that an increase in the predictor variable is associated with an increased likelihood of recommendation, while values less than 1 are associated with a decreased likelihood of recommendation.
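To make the interpretation of an OR concrete, the following sketch converts a logistic regression coefficient into an odds ratio and shows how a one-unit increase in a predictor changes the predicted probability from a given starting point. The function names and the 50% baseline probability are assumptions chosen for illustration. An OR multiplies the odds, not the probability, so the change in probability depends on the baseline.

```python
import math


def odds_ratio_from_coefficient(b):
    """An odds ratio is the exponential of the logistic regression coefficient."""
    return math.exp(b)


def probability_after_unit_increase(baseline_probability, odds_ratio):
    """Probability of recommendation after the predictor rises by one unit."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Assumed baseline: a candidate with a 50% chance of being recommended.
for label, odds_ratio in [("OR > 1", 3.711), ("OR < 1", 0.149)]:
    p = probability_after_unit_increase(0.5, odds_ratio)
    print(f"{label}: OR = {odds_ratio}, probability moves from 0.500 to {p:.3f}")
```

With a different baseline the same OR produces a different change in probability, which is why ORs are best read as multiplicative changes in odds.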

To delve deeper into the content of the value judgments of the AVI tools made by the participants, all open-ended responses to the question “Explain why you chose to recommend or not recommend” were used as a corpus for lexical analysis using the Iramuteq software (version 0.7 alpha 2). By using an automated form of analysis, the study reduces reliability and validity problems typically associated with text analysis (Klein & Licata, 2003). The analysis is based on a contingency table resulting from associating portions of text (Elementary Context Units, ECUs) with each word. From this contingency table, a matrix of squared distances is generated, such that ECUs are considered closer if they share common words (Reinert, 1986). The program proceeds with a top-down hierarchical cluster analysis, resulting in sets of ECUs that best differentiate the vocabulary and assist in interpretation. Following previously established procedures (Idoiaga Mondragon et al., 2023), the most significant vocabulary in each class was selected using the following criteria: an expected value of the word greater than 5, a chi-square association statistical test in each class, and the word appearing in the class in a percentage greater than 50%.
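The three selection criteria can be illustrated with a small filter over a 2 × 2 word-by-class table. This is a sketch under stated assumptions, not Iramuteq's implementation: the percentage is read as the share of the ECUs containing the word that fall in the class, the chi-square cut-off of 3.841 (df = 1, p < .05) is assumed because the source gives no threshold, and all counts are hypothetical.

```python
def word_class_stats(in_with, in_without, out_with, out_without):
    """Expected count, chi-square, and class share for one word and one class.

    in_with / in_without: ECUs inside the class with / without the word.
    out_with / out_without: ECUs outside the class with / without the word.
    """
    a, b, c, d = in_with, in_without, out_with, out_without
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # expected ECUs in the class with the word
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denominator if denominator else 0.0
    share = 100 * a / (a + c) if (a + c) else 0.0
    return expected, chi2, share


def is_characteristic(in_with, in_without, out_with, out_without,
                      chi2_critical=3.841):
    """Apply the three criteria: expected > 5, significant chi-square, share > 50%."""
    expected, chi2, share = word_class_stats(
        in_with, in_without, out_with, out_without
    )
    return expected > 5 and chi2 >= chi2_critical and share > 50


# Hypothetical counts: a class of 100 ECUs in a corpus of 500 ECUs.
print(word_class_stats(40, 60, 10, 390))
print(is_characteristic(40, 60, 10, 390))
```

The chi-square statistic alone does not distinguish over-representation from under-representation; in this sketch the share criterion does that work.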

Results

Checking Manipulation

Raters were asked about the perceived degree of qualification of the candidates. Both qualified candidates (woman, X = 4.167, SD = 0.760; man, X = 4.015, SD = 0.828) were perceived as highly qualified, while the semi-qualified candidates (woman, X = 3.264, SD = 0.879; man, X = 2.633, SD = 0.879) were perceived as low or somewhat qualified. There were significant differences between the candidates regarding hiring recommendation, Q(3) = 191.354, p < .001. Both highly qualified candidates (woman, 91.2%, n = 125; man, 89.0%, n = 113) were recommended to a greater extent than semi-qualified candidates (woman, 48.2%, n = 64; man, 16.9%, n = 24). Therefore, the experimental manipulation worked well.
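For readers unfamiliar with the Q statistic reported above, the sketch below computes Cochran's Q from a rater-by-candidate matrix of yes/no recommendations. It uses the standard textbook formula and a small invented data set; it does not reproduce the study's data or its value of Q.

```python
def cochran_q(rows):
    """Cochran's Q for k related binary samples.

    rows: one list per rater, holding k values of 0 or 1 (one per candidate).
    Returns (Q, degrees of freedom). Q is compared with a chi-square
    distribution with k - 1 degrees of freedom.
    """
    k = len(rows[0])
    column_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (
        k * sum(c ** 2 for c in column_totals) - grand_total ** 2
    )
    denominator = k * grand_total - sum(r ** 2 for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every rater gives identical answers")
    return numerator / denominator, k - 1


# Hypothetical recommendations: six raters, four candidate profiles.
recommendations = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
q, df = cochran_q(recommendations)
print(f"Q({df}) = {q:.3f}")
```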

Hypothesis Testing

Correlation analyses are presented below to identify the variables associated with the recommendation of highly qualified woman (Table 4), highly qualified man (Table 5), semi-qualified woman (Table 6) and semi-qualified man (Table 7).

Table 4

Spearman’s Correlation Matrix for Qualified Woman

*p < .05, **p < .01.

Table 5

Spearman’s Correlation Matrix for Qualified Man

Table 6

Spearman’s Correlation Matrix for Semi-Qualified Woman

*p < .05, **p < .01.

Table 7

Spearman’s Correlation Matrix for Semi-qualified Man

The four binary logistic regressions indicated that the final step of each model explained 76.2% (Nagelkerke's R²) of the variance for highly qualified women (step 5), 54.6% for highly qualified men (step 6), 56.1% for semi-qualified women (step 7), and 34.9% for semi-qualified men (step 7).
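Nagelkerke's R² rescales the Cox and Snell R² so that its maximum is 1. The sketch below shows the computation from the log-likelihoods of the intercept-only and fitted models. The log-likelihood values are hypothetical, and the sample size of 151 is used only for illustration, since the number of valid cases per model is not restated here.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R² from the log-likelihoods of the null and fitted models.

    ll_null: log-likelihood of the intercept-only model.
    ll_model: log-likelihood of the fitted model.
    n: number of observations.
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a model fitted to 151 observations.
print(round(nagelkerke_r2(ll_null=-100.0, ll_model=-60.0, n=151), 3))
```

A model that does not improve on the intercept-only model yields 0, and a model that fits perfectly (log-likelihood of 0) yields 1.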

Hypothesis 1 expected that candidate competences will primarily influence hiring recommendations for highly qualified candidates more than factors related to raters’ biases. Conversely, for semi-qualified candidates, factors associated with raters’ biases will have a stronger influence on hiring recommendations. The variables predicting the hiring recommendations of candidates are detailed in Table 8, ranked by effectiveness. For highly qualified women, competence (OR = 342.893), first impression (OR = 13.429), and teamwork (OR = 2.406) primarily account for their hiring recommendations. Among highly qualified men, non-verbal communication (OR = 3.711) emerges as the most influential predictor, followed by planning oriented to conflict resolution (OR = 2.039), teamwork (OR = 2.013), and attractiveness (OR = 0.149). For semi-qualified women, competence (OR = 6.671), sociability (OR = 3.366), and flexibility (OR = 1.534) play significant roles. In the case of semi-qualified men, sociability (OR = 3.397), morality (OR = 2.593), and planning oriented to conflict resolution (OR = 1.640) are noteworthy predictors. Therefore, hypothesis 1 is partially supported, as anticipated, given that rater biases strongly influence the hiring recommendations for both semi-qualified and qualified candidates.

Table 8

Binary Logistic Regressions to Explain Recommendation of the Candidates

Hypothesis 2 expected that competency scores in a) teamwork and b) flexibility would have a greater influence on hiring recommendations for women candidates, whereas c) planning oriented to conflict resolution would have a greater influence on hiring recommendations for men candidates. According to Table 8, men rated higher on planning oriented to conflict resolution had 2.039 times the odds of being recommended for hiring if they were highly qualified and 1.640 times the odds if they were semi-qualified. For highly qualified women, teamwork emerges as the most predictive competency (OR = 2.406), whereas for semi-qualified women, flexibility is prominent (OR = 1.534). Therefore, hypothesis 2 is partially supported by our data.

Regarding rater biases, our data indicated that first impression predicts the hiring recommendations of qualified women (OR = 13.429), while non-verbal communication predicts the hiring recommendations of qualified men (OR = 3.711). Surprisingly, for qualified men, being considered attractive lowers the odds of being recommended (OR = 0.149). Thus, hypothesis 3 is partially confirmed.

Our results also showed that highly qualified women perceived as competent had 342.892 times the odds of being recommended for hiring, and semi-qualified women perceived as competent had 6.671 times the odds. Another difference between highly qualified and semi-qualified candidates is that sociability is significant only for semi-qualified candidates (women, OR = 3.366; men, OR = 3.397). For semi-qualified men, being perceived as moral multiplies the odds of recommendation by 2.593. Consequently, Hypothesis 4 is not supported by our data.

Qualitative Analyses

The corpus consisted of 25,382 words, with 2,996 unique words. The descending hierarchical cluster analysis divided the corpus into 492 ECUs. The results revealed two main clusters. The first cluster focused on aspects related to the evaluation of a candidate’s self-presentation (Classes 4 and 5), while the second cluster was related to the decision-making process regarding a candidate’s recommendation to proceed to the second phase (Classes 3, 2, and 1). The first cluster was composed of experience in teamwork (Class 4, 26.4%) and behavioral information observed in the AVI tools (Class 5, 20.1%). Both Classes were associated with the decision not to recommend the candidate for the second phase (p < .05). The second cluster contained three Classes: a candidate’s suitability for the position (Class 3, 22.4%), the decision on the recommendation (Class 1, 18.6%), and overall assessment of the candidate (Class 2, 12.6%). In this second cluster, a candidate’s suitability for the position and the candidate’s overall assessment were associated with the decision to recommend the candidate for the second phase (p < .04 and p < .03, respectively). Furthermore, a candidate’s suitability for the position was also associated with the candidate’s qualification (Class 3, p < .05). (See Figure 1).

Figure 1

Results of the Descending Cluster Analysis Using the Reinert Method. Explanations Given about the Decision Made in the Selection Process of the Candidates.

Examples of ECUs from the Classes in the first cluster include the following: “Three years of experience in the sector. He has training and development, is motivated by personal goals but less so by the company’s goals. He argues that teamwork is important to him, but his arguments reveal a selfish attitude towards colleagues. Lacks the ability to share or listen to others’ ideas. He should have shown more empathy…” (Class 4, candidate not recommended) (chi square = 167.69); “When explaining the problem with the other team members, she displayed a very inflexible attitude and offered little support to the workers who were covering for their colleagues. It does not seem to me that she managed the problem well, as when she was put in charge of leading the team, she did not mention any activities carried out.” (Class 5, candidate not recommended) (chi square = 377.61).

Examples of ECUs in the second cluster include the following:

“She possesses the right skills for the position, but her work experience does not cover all the areas of activity in which she would be performing her job. Even so, I believe she has sufficient ability to acquire knowledge in these areas and perform the job satisfactorily. I think she has potential, the ability to learn, and the motivation for the position” (Class 3, candidate recommended) (chi square = 156.24); “Overall, I liked him. He effectively defended his profile by emphasizing his professional trajectory and addressing specific situations to highlight his competencies.” (Class 2, candidate recommended) (chi square = 202.95); “I would recommend that the candidate proceed to the next phase of the process because a face-to-face interview would be advisable, in which some other nuances that require spontaneous conversation could be appreciated. Further explore his performance” (Class 1, candidate recommended) (chi square = 310.38).

However, this cluster also contains examples in which the candidate was not recommended:

“For him, this position is a step backward; he has already worked as an HR manager and says he is in a project similar to ours, from which he is leaving due to burnout. If the manager of our company is not planning to retire and leave the person we hire as their successor, this guy will also get tired and eventually leave. If it is true that his profile is entirely suitable for the operational part, in terms of motivation, he would start off very strong but would eventually get tired and look for another challenge, or he would continue looking for an HR manager position while working with us. He is already a senior profile, and for this position, we would look for something more junior, like the first candidate.” (Class 2, candidate not recommended) (chi square = 190.99).

The results suggest that the justifications for not recommending candidates for the second phase rest on perceived deficiencies in social and organizational competencies. In contrast, the justifications given for recommending candidates are mostly based both on the fit between the candidate and the position applied for and on a comprehensive evaluation of their profile.

Discussion

The aim of this study was to identify the most significant competencies and rater biases in hiring recommendations for an HR technician position using AVI tools in a Spanish context, taking into account the gender and qualifications of the candidates.

Contrary to previous research findings (Scott & Roulin, 2024; Torres & Gregory, 2018), our results revealed that competencies were not the primary influence on hiring recommendations, although they did play an important role. In line with this, qualitative analyses showed that justifications for not advancing candidates to the second phase were primarily based on perceived deficiencies in social and organizational competencies. Conversely, recommendations to advance candidates were largely driven by their alignment with position requirements, combined with a comprehensive evaluation of their overall profile. Consistent with role congruity theory (Eagly & Karau, 2002; Heilman et al., 2024), this study confirms that raters place greater value on competencies associated with gender roles when recommending candidates for hiring. Our results highlight that for women it is crucial to appear competent and possess communal competencies (such as teamwork and flexibility) to be recommended by the raters, while for men morality and agentic competency (planning oriented to conflict resolution) are crucial.

In contrast with previous studies (Menegatti et al., 2021; Moscatelli et al., 2020; Pireddu et al., 2022; Quadlin, 2018), for women appearing competent is more important than it is for men in order to be recommended. There is a clear distinction between highly qualified women and semi-qualified women concerning the additional variables associated with competence. For highly qualified women, the first impression is particularly important, while for semi-qualified women, being sociable is the key. Literature in this area has demonstrated that first impression does influence overall rating scores (Swider et al., 2016; Torres & Gregory, 2018; Tu et al., 2022), although to a lesser extent in AVI tools (Suen et al., 2019; Torres & Gregory, 2018). Our results provide evidence that the importance of a candidate’s first impression in AVI tools varies by gender. It is crucial only for highly qualified women, highlighting the significance of appearing competent, not just being qualified for the position.

In the case of highly qualified men, our results suggest that appearing competent, having effective non-verbal communication, and being attractive may lead to an overvaluation of the candidate by raters. In line with previous research (Martín-Raugh et al., 2023), we found that non-verbal communication explains qualified men's recommendation. Non-verbal presence is associated with eye contact, smiling, and expansive body posture (Tu et al., 2022), but also with professional appearance (Martín-Raugh et al., 2023). Therefore, highly qualified men whose non-verbal communication is effective are more likely to be recommended to the extent that they are seen as competent.

Our results indicate that highly qualified men who are perceived as attractive are less likely to be recommended for hiring. Some authors (Tu et al., 2022) have pointed out that attractiveness and non-verbal communication are closely linked, since attractive candidates display more effective non-verbal cues and obtain higher scores than less attractive ones. Highly qualified men may be viewed as overqualified for the position and raise fears among raters that they may soon leave the job. Birkelund et al. (2022) suggested that women candidates are perceived as more stable staff than men candidates. Turnover intention is one of the most alarming factors for organizations since it involves discomfort and lower performance at work. Therefore, one of the basic principles of personnel selection processes is to avoid hiring a candidate who is overqualified for the position. The raters in this study confirmed this: in their recommendation justifications, they expressed doubts about recommending qualified men, since these candidates could leave the job and look for higher-ranking positions. In the evaluation of qualified men, the halo effect (Thorndike, 1920) may be at play, where additional characteristics are attributed to a person based on a few observed traits. In this case, it would result in a more positive overall judgment based on characteristics such as competence or attractiveness. This could explain why participants in the study report that qualified men are perceived as overqualified for the position. The halo effect is a judgment error, as evidenced by the fact that the profiles of qualified men and women are similar, yet overqualification is attributed only to men. Along these lines, a study comprising four experimental investigations into managers’ hiring decisions revealed that signals of high capability are not necessarily perceived as advantageous in the selection process. This is because highly capable candidates are seen as having more opportunities to leave the job and are perceived as less committed to the organization (Galperin et al., 2020). Consequently, being perceived as highly competent may work against men in hiring decisions.

Consistent with previous studies (e.g., Corrington & Hebl, 2018; Rudman & Phelan, 2008), our data show that sociability is a highly valued characteristic in interviews, but only for semi-qualified candidates. It is important to note that there is a gender-based difference: semi-qualified women must also appear competent to be recommended for hiring, whereas for semi-qualified men, demonstrating moral qualities is more important. Indeed, young men who are perceived as competent may be seen as less sociable and penalized in the recruitment process, whereas this does not happen to young women (Krings et al., 2023).

Implications for Practice

The manipulation checks conducted to verify that raters using AVI tools correctly distinguished qualified from semi-qualified candidates suggest that these tools are effective in recognizing competent candidates. However, although the sample in this study consisted of experienced professionals, biases continue to emerge despite their training and expertise. This leads us to two conclusions regarding the practical implications: first, the use of AVI tools is not suitable for all stages of the selection process, and second, it is essential to continue addressing this issue in the organizational context at different levels.

Regarding the timing of using the AVI tool, it is considered suitable for the early stages as a preliminary screening mechanism for evaluating competencies. However, given that these tools are not free of biases, it is crucial that they be complemented with in-depth job interviews in the later stages of the selection process to ensure a more thorough and accurate assessment of the candidates.

In terms of actions that can be implemented to prevent the emergence of stereotypes, at the individual level, this study demonstrates a link between the activation of gender stereotypes and candidate recommendations. Understanding this association could benefit raters by enabling them to improve their selection processes and reduce bias in hiring decisions based on stereotypical thinking. Therefore, training raters in the use of AVI tools and in recognizing the biases that may emerge during the selection process is essential for promoting fairness. Providing objective information about different groups and highlighting the emergence of bias are effective training strategies to reduce discrimination and promote equal opportunities. Recent studies on ethnic discrimination (Derous et al., 2020) have confirmed that prejudice can be reduced through targeted training. However, the same study emphasized the need for tools to sustain the long-term impact of such training. While training may initially reduce hiring discrimination, the effects tend to diminish over time, allowing discrimination to resurface. Future research will focus on developing new training methods that ensure equal access to employment opportunities in a more durable way.

Likewise, the training method and guide could be used as part of the formal university curriculum for training future professionals. The 2030 Agenda emphasizes the need to build more inclusive societies (Goal 11) and promote equality (Goals 5 and 10) for all. This can only be achieved through quality education (Goal 4) that addresses these issues and facilitates the creation of decent jobs (Goal 8) accessible to everyone, without any form of discrimination.

At the organizational level, it would be useful to develop a guide for non-discriminatory e-selection processes that gives consultants and companies guidelines for running effective selection processes and ensuring equal opportunities.

Limitations and Future Research

Although this study offers insight into how AVI tools function in the preselection process, it is not without limitations. First, the small sample does not allow the results to be generalized, nor does it permit more complex analyses that could establish models of decision-making in personnel selection. We call for researchers to carry out structural equation models to delve deeper into the results found in this study.

Another limitation of this study pertains to its design, as it exclusively focused on the selection of younger profiles. Consequently, additional research is needed to explore the outcomes for more mature or senior profiles. Future studies should aim to replicate this research while considering other forms of discrimination that may arise based on candidate profile, such as ethnicity, sexual orientation, or disability. Additionally, it would be valuable to investigate other types of jobs, including those related to senior officials, to provide a more comprehensive understanding of the selection process. It would also be interesting for future research to analyze whether the biases identified in this study persist when using artificial intelligence to evaluate candidates through AVI tools.

Likewise, we call for researchers to investigate the biases that may influence the rating of candidates. Our study suggests that while raters focus on the competence scores assigned to candidates when making recommendations, these scores can also be affected by rater biases. Understanding these biases is crucial for ensuring a fair and equitable selection process.

Conclusion

AVI tools have been recognized as valuable instruments that help raters focus on assessing competencies while minimizing biases (Kroll & Ziegler, 2016; Scott & Roulin, 2024; Suen et al., 2019; Torres & Gregory, 2018). However, contrary to findings from studies in other fields and geographical contexts, our results indicate that while raters are able to differentiate between qualified and semi-qualified candidates, their decisions remain influenced by biases. This highlights the importance of using AVIs as preliminary and complementary tools, alongside other methods such as in-depth interviews, while also emphasizing the need to raise awareness and provide training for evaluators on the potential biases that may arise during the selection process.

Conflict of Interest

The authors of this article declare no conflict of interest.

Cite this article as: Martínez-Moreno, E., Elgorriaga, E., Gil de Montes, L., Larruskain-Mandiola, O. (2024). Effects of candidate gender and qualification on hiring recommendations in asynchronous video interview tools. Journal of Work and Organizational Psychology, 40(3), 164-176. https://doi.org/10.5093/jwop2024a14

Funding: This research was supported by a grant from Basque Government Research Groups (‘Culture, Cognition, and Emotion’ Consolidated Group; IT1598-22).

References


Correspondence: edurne.martinez@ehu.eus (E. Martínez Moreno).

Copyright © 2025. Colegio Oficial de la Psicología de Madrid
