The Unrealized Potential of Technology in Selection Assessment

Ann M. Ryan; Eva Derous

doi:10.5093/jwop2019a10

Vol. 35. Num. 2. - 2019. Pages 85 - 92

Special Issue: New Insights on Technology and Assessment

[El potencial de la tecnología no empleado en la evaluación de la selección]

Ann M. Ryan¹ and Eva Derous²

¹Michigan State University, USA; ²Ghent University, Belgium

Received 29 November 2018, Accepted 29 March 2019

Abstract

Technological advances in assessment have radically changed the landscape of employee selection. This paper focuses on three areas where the promise of those technological changes remains undelivered. First, while new ways of measuring constructs are being implemented, new constructs are not being assessed, nor is it always clear what constructs the new ways are measuring. Second, while technology in assessment leads to much greater efficiency, there are also untested assumptions about effectiveness and fairness. There is little consideration of potential negative byproducts of contextual enhancement, removing human judges, and collecting more data. Third, there has been insufficient consideration of the changed nature of work due to technology when assessing candidates. Virtuality, contingent work arrangements, automation, transparency, and globalization should all be having greater impact on selection assessment design. A critique of the current state of affairs is offered and illustrations of future directions with regard to each aspect are provided.

Resumen

Los avances tecnológicos en la evaluación han cambiado radicalmente el panorama de la selección de empleados. Este estudio se enfoca en tres áreas en las que los cambios tecnológicos aún no se han producido. En primer lugar, mientras se están implementando nuevas formas de medir los componentes de la evaluación, estos nuevos componentes no se están evaluando, ni tampoco está claro qué componentes están midiendo los nuevos modelos. En segundo lugar, si bien la tecnología en la evaluación conduce a una eficiencia mucho mayor, también hay suposiciones no probadas sobre su eficacia e imparcialidad. Existe una escasa consideración de los posibles subproductos negativos de la mejora contextual, la eliminación de los juicios humanos y la recopilación de más datos. En tercer lugar, no se ha considerado suficientemente la naturaleza cambiante del trabajo debido a la tecnología a la hora de evaluar a los candidatos. La virtualidad, la supeditación a los acuerdos laborales, la automatización, la transparencia y la globalización deberían tener un mayor impacto en el diseño de la evaluación en selección. Se hace una crítica de la situación actual y se proporcionan ejemplos de directivas futuras con respecto a cada uno de estos aspectos.

Palabras clave

Selección, Evaluación, Tecnología.

Keywords

Selection, Assessment, Technology.

Cite this article as: Ryan, A. M. & Derous, E. (2019). The Unrealized Potential of Technology in Selection Assessment. Journal of Work and Organizational Psychology, 35, 85 - 92. https://doi.org/10.5093/jwop2019a10

Correspondence: ryanan@msu.edu (A. M. Ryan).

Research Article

Introduction

Over the past two decades, assessment for employee selection has undergone radical changes due to technological advances (see Tippins, 2015, for a review). Examples include changes in assessment delivery (computerized, online, mobile, use of adaptive testing), changes in assessment content (greater use of video and audio, graphics, gamification), changes in interactivity (recording of applicants, video interviewing), changes in scoring and reporting (quicker, internet delivered), and changes in many other aspects (e.g., use of algorithms and data mining). The purpose of this paper is not to provide another review of these advances, as quality treatments exist (Scott, Bartram, & Reynolds, 2017; Tippins, 2015), but to discuss three ways in which the incorporation and use of technological advances in assessment practice and research could be strengthened.

First, we discuss the need to think more creatively regarding what is measured. That is, despite the promise of technology as a means of measuring new KSAOs (i.e., knowledge, skills, abilities, and other constructs like motivation) and richer content, many of our assessments are just “transported” traditional tools. As an example, gamified assessments (Bhatia & Ryan, 2018) most often are measuring traditional constructs in environments with game elements layered on.

Second, we discuss the need to move our focus from efficiency to effectiveness and fairness in thinking about the use of technology (a point raised by Ryan & Derous, 2016). As examples, adaptive testing, mobile testing, and video interviewing have all been adopted because they make the hiring process more efficient (for both employer and applicant), but the investigation of factors that impact their effectiveness is still quite nascent. Similarly, the use of algorithms and social network information in hiring (e.g., Stoughton, Thompson, & Meade, 2015) has wide appeal as an efficient means of screening, but the evidence suggests these methods are not always more effective than more traditional approaches to assessment of the same constructs.

Finally, the need to adapt assessments for how technology has changed the nature of work should be a consideration in our exploration of new approaches. For example, how are we thinking about assessment differently due to the changing nature of work arrangements to shorter engagements and contract work (Spreitzer, Cameron, & Garrett, 2017)? Are we assessing the skills needed for virtual workplaces (e.g., Schulze, Schultze, West, & Krumm, 2017)? How has the shift to online talent pools and talent badges changed the “business model” underlying the assessment industry (Chamorro-Premuzic, Winsborough, Sherman, & Hogan, 2016)? What are validation strategies that work with these new models? These changes are fueled by changes in the nature of work and employment relationships because of technology; our assessment research and practice also needs to consider how technology has changed work, not just how it has changed assessment.

New Construct Measurement

Much has been written about how technology has changed the way we assess individuals in hiring contexts (Reynolds & Dickter, 2017; Scott & Lezotte, 2012; Tippins, 2015). Some of the advances and advantages noted have been the efficiencies and cost savings that accompany web-based delivery of assessments and adaptive testing technologies (see Scott et al., 2018, for a book-length discussion). Technological advances are also credited with enhancing the candidate experience through providing more convenient and engaging assessments (e.g., use of video, gamification).

However, of all the promises of technology promoted in the past decade, we would argue the one which has been under-delivered is that of harnessing technology to assess new and different attributes. The anticipation was that these technological changes in delivery and the use of greater context would open the door for measuring things that could not be easily measured before. We would argue that while new ways of measuring constructs are being implemented, new constructs are not being measured. Further, in some cases, new ways of measuring are implemented but it is unclear what constructs are being measured.

New ways of more easily assessing established content include using drag-and-drop matching items, image matching, and hotspot items (e.g., pointing to location of photo), using mobile functionality (e.g., swiping), recording video answers to interview or SJT questions, and scraping social media or other data sources (Dickter, Jockin, & Delany, 2017; Kantrowitz & Gutierrez, 2018). These new ways, however, are typically employed in service of assessing traditional constructs (e.g., verbal skills, quantitative reasoning). Adler, Boyce, and Caputo (2018) note that the majority of internet-delivered cognitive ability tests ask the traditional multiple choice questions on traditional content (e.g., chart and paragraph interpretations). Indeed, Chamorro-Premuzic et al. (2016) discuss that “new talent signals” such as social media and big data are often still looking at the same essential attributes of inter- and intra-personal competencies, abilities, and willingness to work hard. As another example, serious games or gamified assessments are often described as novel, but many are not assessing new constructs, just multiple traditional constructs simultaneously (with the concomitant concerns then about the ability to assess many things well in limited time periods). As Bhatia and Ryan (2018) noted, there is very little unpublished or published research on the validity evidence in support of games and gamified assessments in selection, particularly on construct validity, something that needs to be addressed.

Behaviors such as mouse-over hover times, response latencies, eye tracking, measurement of facial micro-expressions, and biometric sensors assessing emotions are all being implemented (Reynolds & Dickter, 2017). The question of the construct validity of these measures seems to receive a more superficial treatment. Have we thought about “what” they signify or “why” they might relate to valued criteria? That is, the constructs being assessed need to be more clearly specified.

What are some examples of a new “what” that could be assessed? Adler et al. (2018) discuss leveraging the interactive capabilities of technology to assess the rate of speed in acquiring new knowledge (i.e., learning agility), but note that this has not been done yet to efficient levels that would fit a hiring context. As a second example they note that natural language processing technologies and capabilities to interpret visual input could lead to new assessments of personality of a more projective nature (i.e., assessments that involve responses to ambiguous or unstructured stimuli), tapping into less conscious motivations and tendencies, but as yet these have not been adopted for wide use. An example of where steps are being taken to measure new constructs is in assessing emotions, such as through applications of databases of micro-expressions (Yan, Wang, Liu, Wu, & Fu, 2014) and speech patterns into digital interviewing. Yet, even here, the detection of micro-expression and speech mining are discussed as an advance, but that is a focus on the “how” (i.e., tools or methods to assess) and not the “what” (i.e., constructs to be assessed). Tying these tools or methods more clearly to job-relevant constructs (e.g., specific personality traits, emotional regulation concepts) is important for the purpose of validation, explainability, and acceptability.

Realizing the potential for selection assessment to add something new to what we already assess in hiring should start with the development of predictive hypotheses based on job analytic information (Guion, 2011). That is, what KSAOs are important to work outcomes that we have not historically assessed because it was deemed too difficult and too time-consuming? This is where artificial intelligence (AI) might be leveraged in a less atheoretical fashion for assessing prior behavior from existing records. As an example, the field of learning analytics has led those in higher education to track all kinds of student behaviors (e.g., class attendance, course-taking patterns, performance on varied types of assessments). Could such data be used to go beyond GPA as an indicator of past learning ability and allow for assessment of more specific learning capabilities and motivational constructs not typically assessed in selection contexts? Another example would be to drill down into our job analytic results to specify what dynamic capabilities are required for a given job (e.g., adaptation to certain types of changes or events) and leverage the interactive nature of technology-enabled assessments to measure these more contextualized types of adaptive behavior.

As our next section discusses in more detail, are our technology-enhanced assessments “better” than what we had before or just different? Perhaps we are not assessing new things, but the same old things are now assessed in more efficient (and effective) ways.

Enhanced Efficiency, Effectiveness, and Fairness

There is certainly ample evidence that technology has increased efficiency through increasing ease of access, shortening times of assessment and scoring, allowing greater use of multiple assessment types, reducing costs in delivery and scoring, and other process improvements (Adler et al., 2018). One early promise of technology was the ability to increase reliability and validity in assessment. A closer look at whether this is being fully realized is warranted. We illustrate this with discussions of several assumptions that are unwarranted: (1) as context and richness increase, validity increases; (2) as efficiency in delivery increases, effectiveness is unharmed; (3) as people have less of a role in the process of assessing, validity and fairness increase; and (4) as the quantity of data and information considered increases, validity and fairness increase.

Effects of Contextual Enhancement

One assumption associated with technologically-enhanced assessment is that creating more realistic items will more closely mimic what actually occurs at work and will therefore improve the accuracy of our measurement. However, when we transform our items from written descriptions to video items or other media rich depictions, we may actually be adding more noise and even systematic error (Hawkes, Cek, & Handler, 2018).

As an example, video-SJTs are viewed as an improvement in effectiveness over written SJTs because of the decreased reliance on reading comprehension as well as increased engagement of test takers (Chan & Schmitt, 1997; Jones & DeCotiis, 1986; Lievens & Sackett, 2006). However, video SJTs can have an additional source of non-construct relevant variance in scores, as actors in videos vary in gender, ethnicity, age, and other factors. Individuals process demographic cues quickly and automatically (Ito & Urland, 2003), and this information can influence behavioral responses (e.g., Avery, McKay, & Wilson, 2008; Eagly & Crowley, 1986; Kunstman & Plant, 2008; Perkins, Thomas, & Taylor, 2000; Russell & Owens, 2001). Research on video SJTs has found that Black respondents perform better on an SJT when the videos include Black actors (Golubovich & Ryan, 2012) and White respondents have been found to react less favorably to a hypothetical organization after viewing its SJT videos featuring Black actors (Golubovich & Ryan, 2013).

These studies provide an example of how adding context or “situation” to our measures of “person” has effects. Besides how actors and avatars look, one can imagine effects for all the nonverbal information that is conveyed in these scenarios. The question is whether the technology advance over a written text for a question increases accuracy by assessing behavior in context or adds information that affects measurement accuracy in negative ways. For example, virtual reality (VR) can certainly transform assessments to feel more “real” for participants (Reynolds & Dickter, 2017), but is a VR simulation a “better” measure of the targeted constructs? Adler et al. (2018) suggest that a VR simulation of public speaking with a virtual audience might be a better assessment than a more low-fidelity simulation of these skills; perhaps, but we do not know. We would need to draw on existing research regarding audience presence to develop clear hypotheses and test them.

As another example, video-resumes can provide more personalized, job-relevant information for use in screening, which should in theory result in less bias in information processing. However, these methods also provide more social category information than traditional resumes and more non-job related information, which can create a cognitive challenge for a rater (Apers & Derous, 2017). The category cues dominate our perceptual systems and attract our attention and hence have a greater probability of being processed more deeply (Kulik, Roberson, & Perry, 2007).

As a final example, Arthur, Doverspike, Kinney, and O’Connell (2017) provide a strong note of caution regarding game-thinking in selection contexts, pointing out that job candidates are likely already highly motivated and so the enhanced engagement which game elements are meant to deliver may not have appreciable effects. Further, game mechanics related to the value of feedback may have negative consequences such as increased anxiety (Arthur et al., 2017). Hawkes et al. (2018) go so far as to note that the reliability of game assessments may be affected negatively by practice effects such as those seen in the gaming world or by adding noisy variance associated with hand-eye coordination and mouse control. One direction for the future will be to show that all of this effort focused on enhancing fidelity and engagement adds something in terms of validity and that “something” added is not more error in measurement.

Effects of Efficiency

We should not assume that moving to a more “advanced” technology necessarily increases effectiveness even if it increases efficiency. This is clearly evident in the trend toward shorter assessments (Hardy, Gibson, Sloan, & Carr, 2017; Kruyen, Emons, & Sijtsma, 2012, 2013) where efficiency gains may result in reliability and validity decrements.
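The reliability cost of shortening a test can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that the studies cited above do not necessarily rely on. The following is a minimal sketch, assuming the removed items are parallel to the retained ones; the function name and the starting reliability of .90 are hypothetical values chosen for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after multiplying test length by length_factor.

    Assumes the items added or removed are parallel to the existing ones
    (an idealization; real item pools rarely satisfy it exactly).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


# Hypothetical example: a test with reliability .90 is cut to 50% and 25%
# of its original length.
original_reliability = 0.90
for factor in (1.0, 0.5, 0.25):
    predicted = spearman_brown(original_reliability, factor)
    print(f"length x{factor:.2f}: predicted reliability = {predicted:.3f}")
```

Under these assumptions, halving the test lowers predicted reliability from .90 to about .82, which is the kind of decrement that efficiency-driven shortening can introduce.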

One stream of research focused on how more efficient technology advancements may distort what is measured is that on assessment mode equivalence. There is ample research available to indicate that some measures are equivalent across modes (non-cognitive tests; see review by Tippins, 2015) and others are not (e.g., speeded cognitive tests; King, Ryan, Kantrowitz, Grelle, & Dainis, 2015; Mead & Drasgow, 1993). For example, non-cognitive measures may be equivalent when moving from PC to mobile assessment but cognitive measures and SJTs might not, depending on features like scrolling… (see Arthur et al., 2017; King et al., 2015 for comparisons). Morelli, Potosky, Arthur, and Tippins (2017) note that “reactive equivalence” studies comparing assessment modes are not theoretically (or even practically) informative as they do not address why mode differences occur or the reasons for the construct non-equivalence. Frameworks such as Potosky’s (2008) and Arthur, Keiser, and Doverspike’s (2018) indicate ways in which different assessment delivery devices might differ; these need to be expanded and tested to more systematically understand whether and why more efficient delivery of assessments might be a more rather than less accurate way to measure (see Apers & Derous, 2017, for an example in the context of resume screening). Morelli et al. (2017) make a strong case for better theory-based predictions that go beyond considering variance associated with technology use as “construct irrelevant.” We echo their general call for elevating efforts regarding construct specification and evaluating technological feature influence in a more considered fashion.

As another example of this, while the potential for greater accuracy in measurement lies in use of adaptive testing, many computerized tests are not adaptive (Adler et al., 2018). Wider adoption will occur when we can scale up item banks more quickly and refresh them without onerous effort – those in educational testing are very much experimenting with ways to create item clones and variations more efficiently (e.g., using automatic item generation; see Drasgow & Olson-Buchanan, 2018; Kantrowitz & Gutierrez, 2018). Effectiveness of adaptive methods can be severely hampered by small item pools that lack adequate numbers of items at certain trait or ability levels, a not uncommon occurrence. As a final example, the use of an open badges system in Belgium (Derous, 2019) significantly shortened recruitment and selection procedures by allowing candidates with badges (for having passed assessments with another organization) to bypass retesting. Time is saved for both the employer and the applicant, and in this case there is no observable negative impact on quality. Taken together, efficiency of assessment can mean more valid assessment, but it can also mean faster and cheaper but not better: it is important to recognize and address which is occurring.
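The point about small item pools can be made concrete with a rough sketch. It assumes a Rasch (one-parameter logistic) model, which the paper does not specify, and a hypothetical nine-item pool whose difficulties cluster around average ability. Under that model an item is most informative when its difficulty matches the test taker's ability, so a pool with few items at the extremes measures extreme candidates less precisely.

```python
import math


def rasch_information(theta: float, difficulty: float) -> float:
    """Fisher information of one Rasch item at ability level theta."""
    p_correct = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p_correct * (1.0 - p_correct)


def pool_information(theta: float, difficulties: list[float]) -> float:
    """Total information the whole pool could provide at ability theta."""
    return sum(rasch_information(theta, b) for b in difficulties)


# Hypothetical pool: item difficulties concentrated near average ability.
item_difficulties = [-1.0, -0.5, -0.5, 0.0, 0.0, 0.0, 0.5, 0.5, 1.0]

for theta in (-2.0, 0.0, 2.0):
    info = pool_information(theta, item_difficulties)
    print(f"ability {theta:+.1f}: pool information = {info:.2f}")
```

In this illustrative pool, the information available at an ability of 2.0 is less than half of what is available at 0.0, so an adaptive algorithm drawing on it would have little to work with for high-ability candidates.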

Effects of Removing Judgment

Another broad assumption is that technology increases the accuracy of measurement as we pull humans out of the process. That is, by removing human administration, scoring, and judgment, a reduction in error in measurement is expected (e.g., structuring interviews can improve reliability and validity). However, this is not always the case. For example, recently Facebook was questioned regarding bias against women as job ads were targeted toward certain demographic groups and not others (Scheiber, 2018b; see also Datta, Tschantz, & Datta, 2015 and Sweeney, 2013, for similar examples). Similarly, news stories regarding Amazon’s abandonment of AI screening of resumes because of biases (Meyer, 2018) as well as questions regarding video interview technology and bias continue (see Buolamwini, 2018; McIllvaine, 2018). Finally, a number of researchers have noted the potential biases that may emerge in using social media screening of job candidates, leading to both perceived unfairness (Stoughton et al., 2015) and actual discrimination (Van Iddekinge, Lanivich, Roth, & Junco, 2016). Because existing databases can capture historical biases, the question of how technological advances may tap into those biases should be at the forefront of assessment design (e.g., by training the algorithm to make non-discriminatory decisions; Ajunwa, 2016), rather than assuming that the computer is a better judge.

As another example, Cascio and Montealegre (2016) noted that we might want to consider the changes to the role of the recruiter due to constant connectivity. Research focused on greater efficiency in reaching candidates has not always considered concurrent positive and negative effects on the internal organizational members such as recruiters, interviewers, and selection system administrators. Rather than assuming that technology has made organizational members’ lives easier, we might consider how their efficiency and effectiveness are both enhanced and burdened by technological innovations in assessment (e.g., candidates are processed more efficiently but the greater volume negates any gains in work time for recruiters).

The transfer of assessment tasks and roles to technology, as with any other automation in the workplace, theoretically removes the low-skill, tedious parts of the job and frees people up to focus on the more creative tasks and the interpersonal elements of the work (Brynjolfsson & McAfee, 2014). We need to attend to the ways in which this is true for those involved in assessment, as well as to counterinfluences that decrease effectiveness in other aspects of assessment professionals’ work.

Effects of More Data

The research on text analysis, natural language processing, and social media scraping similarly raises questions regarding the effectiveness of efficient methods. While unobtrusive measurement has many advantages in terms of ease and efficiency (and many potential concerns regarding privacy and information control), a key for assessment experts is always validity of inferences. As Dickter et al. (2017) note, data science experts do not approach modeling considering the concept of construct validity.

Research on validity of social media data has been mixed. Connections to job performance have been found in several studies based on Facebook ratings (e.g., Kluemper & Rosen, 2009; Kluemper, Rosen, & Mossholder, 2012) and, more recently, on LinkedIn ratings (e.g., Roulin & Levashina, 2018). Van Iddekinge et al. (2016), however, sounded a strong cautionary note, as they found that recruiter ratings of Facebook pages were unrelated to job performance or turnover and also had adverse impact (i.e., lower scores or selection rates for underrepresented groups; Guion, 2011). This line of research on validity and social media scraping is a good illustration of the work that needs to be done to build our understanding of when, where, why, and how technology-enabled assessment tools are effective and when they do not fulfill their potential.
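Adverse impact, as defined above in terms of selection rates, is often screened for by comparing the selection rate of a focal group with that of a reference group. The sketch below is illustrative only: the applicant counts are invented, the function names are hypothetical, and the four-fifths (0.80) threshold is a commonly used rule of thumb from U.S. practice that this paper does not itself invoke.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Proportion of applicants in a group who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    if not 0 <= selected <= applicants:
        raise ValueError("selected must lie between 0 and applicants")
    return selected / applicants


def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Ratio of the focal group's selection rate to the reference group's."""
    if reference_rate <= 0.0:
        raise ValueError("reference_rate must be positive")
    return focal_rate / reference_rate


# Hypothetical screening outcomes (not data from any cited study).
focal = selection_rate(selected=12, applicants=60)        # 0.20
reference = selection_rate(selected=30, applicants=100)   # 0.30

ratio = impact_ratio(focal, reference)
flagged = ratio < 0.80  # rule-of-thumb threshold; an assumption here

print(f"impact ratio = {ratio:.2f}, flagged for review = {flagged}")
```

A ratio below the threshold is a prompt for closer examination rather than proof of discrimination; with small samples, a statistical test of the rate difference would usually accompany it.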

Indeed, as we alluded to earlier, the notion that bias is reduced and fairness perceptions increased when one relies on larger quantities of data has been shown to be a fallacy (see, for example, Caliskan, Bryson, & Narayanan, 2017). Algorithms can and do build in existing biases when the data they are trained on have been derived from a biased judgment process and/or a biased labor market. The inability to articulate to job candidates exactly what is being assessed and why also makes big data use in selection decision-making a challenge from a fairness perceptions perspective.

Our examples in these last few sections should not be construed as indicating that technology-enabled assessment is bound to be less effective than traditional methods; indeed, we believe that validity and fairness can be enhanced through technology. Rather, we seek to emphasize the importance of not conflating efficiency and effectiveness when discussing the value of technology in assessment, and the need for more thoughtful, theory-driven examinations of when effectiveness is likely to be enhanced and when it is not.

Changing Nature of Work Requires Changing Assessments

Besides focusing on how technology changes assessments, we should focus on how technological changes to work itself should lead us to change what is assessed. To illustrate, we discuss how the virtuality, contingency, automation, transparency, and globalization of work (in some sense all byproducts of technological change) should be impacting selection assessment.

Virtuality

Discussions on the changing nature of work often focus on how individuals are much more likely to be working remotely and collaborating virtually (Brawley, 2017; Spreitzer et al., 2017). The literature on team virtuality has suggested that competencies for virtual collaborative work and using computer-mediated communication may differ in depth and complexity from those associated with face-to-face teamwork (see Schulze & Krumm, 2017, for a review; also Hassell & Cotton, 2017; Schulze et al., 2017). Beyond offering assessments in digital formats, are assessment designers considering more carefully what is being measured for virtual work? KSAOs such as awareness of media capabilities, communication style adaptability, and other “virtual skills” should be a greater focus of assessment developers.

As another example, in reviewing the effects of telecommuting, Allen, Golden, and Shockley (2015) point out individual differences that relate to successful telecommuting (i.e., moderators of productivity as well as social isolation and satisfaction effects such as self-management skills, personality characteristics, boundary management styles). Assessments related to capability and satisfaction with greater levels of remote work may be valuable for hiring contexts with greater remoteness.

Contingency

Today’s workers are said to be more often employed in short-term engagements (gigs) or contract work rather than long-term, traditional employee/employer arrangements. The implications of this increased contingency of work (George & Chattopadhyay, 2017; Spreitzer et al., 2017) should be given greater consideration. For example, does this mean administering assessments more often as people move from contract to contract? Or does it imply less assessing as organizations do not invest in evaluating KSAOs, relying on crowdsourced ratings and rankings to evaluate talent (Aguinis & Lawal, 2013)? The use of online talent exchanges as a means of securing employment should be forcing a consideration of what is assessed, as well as when and how assessment for gigs and short term contracts may look different than that for longer-term employment relationships. While there is a need for validation work for determining what works in predicting performance and other outcomes in these arrangements, as Brawley (2017) notes, we first need better theory and empirical evidence regarding what relates to the attitudes and behavior of “serious” gig workers.

Figure 1. The Unrealized Potential of Technology in Selection Assessment.

In one effort at expanding theory regarding contingent workers, Petriglieri, Ashford, and Wrzesniewski (2018) have provided a framework for understanding how some individuals are successful at managing the uncertainty of a freelance career. For example, they discuss how those who were able to cultivate established routines for their workdays are more effective than others in managing the greater ambiguity and lack of structure of these careers. Assessments to help individuals understand the predispositions and competencies required for success as a gig worker may be useful tools for platforms that seek to connect workers to work, even if they are not used as selection tools by employers but as self-selection tools used by gig workers.

Automation

As automation of jobs increases (Metz, 2018; Scheiber, 2018a; Wingfield, 2017), we need to consider what else should enter into the selection process. Automation has led to a decrease in lower skill jobs (Nedelkoska & Quintini, 2018), so perhaps some of the high volume selection assessments that are a big share of the assessment market (e.g., basic math skills) will have reduced demand, and the need for assessment of higher level skills (e.g., advanced math skills) will emerge as a larger focus. In general, we tend to focus our assessment development efforts on high volume entry level jobs that may be automated in the near future; as work changes, what is assessed may need to change.

As technology changes jobs, some have also asked what new skills should be assessed. As an example of one way change can impact assessment, research on the social acceptance of technology (Gaudiello, Zibetti, Lefort, Chetouani, & Ivaldi, 2016; Seo et al., 2018; Syrdal, Dautenhahn, Koay, & Walters, 2009) examines how people can collaborate with robots as coworkers (cobots). How the form and level of teamwork skills such as communication, collaboration, and conflict management might differ when coworkers are cobots, and what that means for assessment, have not been explored.

One might also consider how robotic interfaces within the assessment process might affect tool validity. For example, in many online assessments, individuals interact with avatars. One can envision future assessment center role-players that are humanoid robots, freeing up assessor time and enhancing consistency across candidates. Rather than assuming that interaction with a “simulated other” is akin to an actual interpersonal interaction, understanding the acceptance of and skills for interacting with automation can aid us in adopting technology enhancements to assessment processes that consider these issues of social acceptance.

Transparency

Another example of a workplace trend affecting assessments is the move of organizations toward greater transparency vis-à-vis both external and internal stakeholders (Parris, Dapko, Arnold, & Arnold, 2016). In the selection context, this involves greater transparency regarding what is being assessed and why. Most of the research related to transparency in selection has suggested positive effects on validity and candidate experience (e.g., Klehe, König, Richter, Kleinmann, & Melchers, 2008; Kleinmann, Kuptsch, & Köller, 1996; Kolk, Born, & der Flier, 2003). However, Jacksch and Klehe (2016) demonstrated that transparency’s positive effects are limited to nonthreatening performance dimensions; that is, transparency can benefit some candidates and harm others if the attribute being assessed is associated with a negative stereotype related to the social identity of those being assessed. Langer, König, and Fitili (2018) demonstrated that providing greater transparency regarding what is assessed in an online interview by an avatar (e.g., facial expression, gestures, voice pitch) had equivocal effects on organizational attractiveness, with individuals appreciating the organization’s candor but simultaneously decreasing their views of the organization overall. Langer et al. (2018) note that “what” information and “how much” information to provide regarding technology-enhanced assessments is deserving of greater research focus in this time of greater pressures for transparency.

Globalization

The globalization of business also has implications for assessment (Ryan & Ployhart, 2014). Not only must assessments be delivered in multiple languages, with all the ensuing efforts to ensure psychometric equivalence (see International Test Commission, 2005, for standards), but all the associated issues of cross-cultural implementation must also be considered (see Ryan & Tippins, 2012, for a comprehensive discussion; see Fell, König, & Kammerhoff, 2016, for a specific example of cross-cultural differences in faking in interviews; see Ryan & Delany, 2017, for a discussion of recruiting globally). The need to assess cultural competencies is also amplified, including cultural intelligence (Ang et al., 2007), cultural values (Hofstede, 2001), cultural adjustment (Salgado & Bastida, 2017), and leadership and teamwork in multinational teams (Han & Beyerlein, 2016).

We also note that prior research has indicated there can be cross-cultural differences in acceptance of and skill in interacting with technology (Dinev, Goo, Hu, & Nam, 2009; Nistor, Lerche, Weinberger, Ceobanu, & Heymann, 2014). In the selection assessment space, we need to be more mindful of how technological change in the workplace more broadly is varied across cultures – in both adoption rates and in user acceptance – and how that might impact selection tool effectiveness.

Conclusion

The premise of this paper is that there is a lot of unrealized potential in the incorporation of technology into selection assessments. Figure 1 summarizes in a schematic where we believe that the (unrealized) potential of technology in selection assessment is currently situated and themes from this paper that both researchers and practitioners might focus on. First, as regards the predictor side of selection assessment, we have noted that advances in delivery, scoring, interactivity, and reporting have been implemented with a focus on greater efficiency as well as job applicant engagement and other stakeholder reactions. However, we see the potential for leveraging technology to assess new constructs as well as to increase validity as not yet fulfilled and we urge greater focus and energy in these directions. We also have admonished those who ignore the potential downsides of technology-enhanced assessments such as greater introduction of construct-irrelevant variance or instances of unfairness to disadvantaged groups. Of course, technology can and should continue to serve as a means of enhancing assessment efficiency.

Second, we urge selection researchers, assessment developers, and other practitioners to take a closer look at the criterion side, i.e., how work and the workplace are changing, and seek ways to better align assessment content (i.e., KSAs measured) as well as practice (e.g., transparency) with those changes (see Figure 1). While we have highlighted some of the more prevalent trends in the changing nature of work (virtuality, automation, contingency, globalization, and transparency), there are likely other emerging changes that also can serve as inspiration for developing new assessments, leveraging the advantages offered by technology.

The future of technology-enabled assessments is only limited by imagination; we anticipate continued changes in the way in which assessments are developed, delivered, and scored. More importantly, we hope that the next decade will have a greater focus on the areas we have outlined in this paper as needing attention, as this would ensure the realization of the full potential of technology-enabled assessments.

Cite this article as: Ryan, A. M. & Derous, E. (2019). The unrealized potential of technology in selection assessment. Journal of Work and Organizational Psychology, 35, 85-92. https://doi.org/10.5093/jwop2019a10 [Antonio García-Izquierdo and David Aguado were the guest editors for this article].

References

Adler, S., Boyce, A. S., & Caputo, P. M. (2018). Employment testing. In J. C. Scott, D. Bartram, & D. H. Reynolds (Eds.), Next generation technology-enhanced assessment: Global perspectives on occupational and workplace testing (pp. 3-35). Cambridge, UK: Cambridge University Press.

Aguinis, H., & Lawal, S. O. (2013). eLancing: A review and research agenda for bridging the science–practice gap. Human Resource Management Review, 23, 6-17. https://doi.org/10.1016/j.hrmr.2012.06.003

Ajunwa, I. (2016). Hiring by algorithm. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2746078

Allen, T. D., Golden, T. D., & Shockley, K. M. (2015). How effective is telecommuting? Assessing the status of our scientific findings. Psychological Science in the Public Interest, 16, 40-68. https://doi.org/10.1177/1529100615593273

Ang, S., Van Dyne, L., Koh, C., Ng, K., Templer, K., Tay, C., & Chandrasekar, N. (2007). Cultural intelligence: Its measurement and effects on cultural judgment and decision making, cultural adaptation, and task performance. Management and Organization Review, 3, 335-371. https://doi.org/10.1111/j.1740-8784.2007.00082.x

Apers, C., & Derous, E. (2017). Are they accurate? Recruiters’ personality judgments in paper versus video resumes. Computers in Human Behavior, 73, 9-19. https://doi.org/10.1016/j.chb.2017.02.063

Arthur, W. J., Doverspike, D., Kinney, T. B., & O’Connell, M. (2017). The impact of emerging technologies on selection models and research: Mobile devices and gamification as exemplars. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (2nd ed., pp. 967-986). New York, NY: Routledge.

Arthur, W., Keiser, N. L., & Doverspike, D. (2018). An information-processing-based conceptual framework of the effects of unproctored internet-based testing devices on scores on employment-related assessments and tests. Human Performance, 31, 1-32. https://doi.org/10.1080/08959285.2017.1403441

Avery, D. R., McKay, P. F., & Wilson, D. C. (2008). What are the odds? How demographic similarity affects the prevalence of perceived employment discrimination. Journal of Applied Psychology, 93, 235-249. https://doi.org/10.1037/0021-9010.93.2.235

Bhatia, S., & Ryan, A. M. (2018). Hiring for the win: Game-based assessment in employee selection. In J. H. Dulebohn & D. L. Stone (Eds.), The brave new world of eHRM 2.0 (pp. 81-110). Charlotte, NC: Information Age Publishing.

Brawley, A. M. (2017). The big, gig picture: We can’t assume the same constructs matter. Industrial and Organizational Psychology, 10, 687-696. https://doi.org/10.1017/iop.2017.77

Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. New York, NY: W.W. Norton.

Buolamwini, J. (2018, June 21). When the robot doesn’t see dark skin. NY Times. Retrieved from https://www.nytimes.com/2018/06/21/opinion/facial-analysis-technology-bias.html

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. https://doi.org/10.1126/science.aal4230

Cascio, W. F., & Montealegre, R. (2016). How technology is changing work and organizations. Annual Review of Organizational Psychology and Organizational Behavior, 3, 349-375. https://doi.org/10.1146/annurev-orgpsych-041015-062352

Chamorro-Premuzic, T., Winsborough, D., Sherman, R. A., & Hogan, R. (2016). New talent signals: Shiny new objects or a brave new world? Industrial and Organizational Psychology, 9, 621-640. https://doi.org/10.1017/iop.2016.6

Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82, 143-159. https://doi.org/10.1037/0021-9010.82.1.143

Datta, A., Tschantz, M. C., & Datta, A. (2015). Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015(1). https://doi.org/10.1515/popets-2015-0007

Derous, E. (2019). Van boe-boe machine tot sociale media: Evidence-based werven en selecteren [From reaction time tests to social media: Evidence-based recruitment and selection]. In J. Valk & L. Lopes de Leao Laguna (Eds.), HRM heden en morgen: Evidence-based practice & Practice-based evidence (pp. 1-25). Amsterdam, The Netherlands: Vakmedianet Management.

Dickter, D. N., Jockin, V., & Delany, T. (2017). The evolution of E-selection. In G. Hertel, D. L. Stone, R. D. Johnson, & J. Passmore (Eds.), The Wiley Blackwell Handbook of the Psychology of the Internet at Work (pp. 257-283). Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119256151.ch13

Dinev, T., Goo, J., Hu, Q., & Nam, K. (2009). User behaviour towards protective information technologies: The role of national cultural differences. Information Systems Journal, 19, 391-412. https://doi.org/10.1111/j.1365-2575.2007.00289.x

Drasgow, F., & Olson-Buchanan, J. B. (2018). Technology-driven developments in psychometrics. In J. C. Scott, D. Bartram, & D. H. Reynolds (Eds.), Next generation technology-enhanced assessment: Global perspectives on occupational and workplace testing (pp. 239-264). Cambridge, UK: Cambridge University Press.

Eagly, A. H., & Crowley, M. (1986). Gender and helping behavior: A meta-analytic review of the social psychological literature. Psychological Bulletin, 100, 283-308. https://doi.org/10.1037/0033-2909.100.3.283

Fell, C. B., König, C. J., & Kammerhoff, J. (2016). Cross-cultural differences in the attitude toward applicants’ faking in job interviews. Journal of Business and Psychology, 31, 65-85. https://doi.org/10.1007/s10869-015-9407-8

Gaudiello, I., Zibetti, E., Lefort, S., Chetouani, M., & Ivaldi, S. (2016). Trust as indicator of robot functional and social acceptance. An experimental study on user conformation to iCub answers. Computers in Human Behavior, 61, 633-655. https://doi.org/10.1016/j.chb.2016.03.057

George, E., & Chattopadhyay, P. (2017). Understanding nonstandard work arrangements: Using research to inform practice. SHRM-SIOP Science of HR Series. Retrieved from http://www.siop.org/SIOP-SHRM/2017_03_SHRM-SIOP_Nonstandard_Workers.pdf

Golubovich, J., & Ryan, A. M. (2012). Demographic cues in video-based situational judgment items. Symposium presented at the Annual Meeting of the Society for Industrial and Organizational Psychology. San Diego, CA.

Golubovich, J., & Ryan, A. M. (2013). Demographic cues in video-based situational judgment items: An extension. Poster presented at the annual meeting of the Society for Industrial and Organizational Psychology. Houston, TX.

Guion, R. M. (2011). Assessment, measurement, and prediction for personnel decisions. New York, NY: Routledge.

Han, S. J., & Beyerlein, M. (2016). Framing the effects of multinational cultural diversity on virtual team processes. Small Group Research, 47, 351-383. https://doi.org/10.1177/1046496416653480

Hardy, J. H., Gibson, C., Sloan, M., & Carr, A. (2017). Are applicants more likely to quit longer assessments? Examining the effect of assessment length on applicant attrition behavior. Journal of Applied Psychology, 102, 1148-1158. https://doi.org/10.1037/apl0000213

Hassell, M. D., & Cotton, J. L. (2017). Some things are better left unseen: Toward more effective communication and team performance in video-mediated interactions. Computers in Human Behavior, 73, 200-208. https://doi.org/10.1016/j.chb.2017.03.039

Hawkes, B., Cek, I., & Handler, C. (2018). The gamification of employee selection tools: An exploration of viability, utility, and future directions. In J. C. Scott, D. Bartram, & D. H. Reynolds (Eds.), Next generation technology-enhanced assessment: Global perspectives on occupational and workplace testing (pp. 288-316). Cambridge, UK: Cambridge University Press.

Hofstede, G. (2001). Culture’s consequences: Comparing values, behaviors, institutions, and organizations across nations. Thousand Oaks, CA: Sage Publications.

International Test Commission. (2005). Guidelines on computer-based and internet delivered testing. Retrieved from www.intestcom.org

Ito, T. A., & Urland, G. R. (2003). Race and gender on the brain: Electrocortical measures of attention to the race and gender of multiply categorizable individuals. Journal of Personality and Social Psychology, 85, 616-626. https://doi.org/10.1037/0022-3514.85.4.616

Jacksch, V., & Klehe, U.-C. (2016). Unintended consequences of transparency during personnel selection: Benefitting some candidates, but harming others? International Journal of Selection and Assessment, 24, 4-13. https://doi.org/10.1111/ijsa.12124

Jones, C., & DeCotiis, T. A. (1986). Video-assisted selection of hospitality employees. Cornell Hotel and Restaurant Administration Quarterly, 27(2), 67-73. https://doi.org/10.1177/001088048602700222

Kantrowitz, T. M., & Gutierrez, S. L. (2018). The changing landscape of technology-enhanced test administration. In J. C. Scott, D. Bartram, & D. H. Reynolds (Eds.), Next generation technology-enhanced assessment: Global perspectives on occupational and workplace testing (pp. 193-216). Cambridge, UK: Cambridge University Press.

King, D. D., Ryan, A. M., Kantrowitz, T., Grelle, D., & Dainis, A. (2015). Mobile internet testing: An analysis of equivalence, individual differences, and reactions. International Journal of Selection and Assessment, 23, 382-394. https://doi.org/10.1111/ijsa.12122

Klehe, U.-C., König, C. J., Richter, G. M., Kleinmann, M., & Melchers, K. G. (2008). Transparency in structured interviews: Consequences for construct and criterion-related validity. Human Performance, 21, 107-137. https://doi.org/10.1080/08959280801917636

Kleinmann, M., Kuptsch, C., & Köller, O. (1996). Transparency: A necessary requirement for the construct validity of assessment centres. Applied Psychology, 45, 67-84. https://doi.org/10.1111/j.1464-0597.1996.tb00849.x

Kluemper, D. H., & Rosen, P. A. (2009). Future employment selection methods: Evaluating social networking web sites. Journal of Managerial Psychology, 24, 567-580. https://doi.org/10.1108/02683940910974134

Kluemper, D. H., Rosen, P. A., & Mossholder, K. W. (2012). Social networking websites, personality ratings, and the organizational context: More than meets the eye? Journal of Applied Social Psychology, 42, 1143-1172. https://doi.org/10.1111/j.1559-1816.2011.00881.x

Kolk, N. J., Born, M. P., & der Flier, H. van. (2003). The transparent assessment centre: The effects of revealing dimensions to candidates. Applied Psychology, 52, 648-668. https://doi.org/10.1111/1464-0597.00156

Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2012). Test length and decision quality in personnel selection: When is short too short? International Journal of Testing, 12, 321-344. https://doi.org/10.1080/15305058.2011.643517

Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2013). On the shortcomings of shortened tests: A literature review. International Journal of Testing, 13, 223-248. https://doi.org/10.1080/15305058.2012.703734

Kulik, C. T., Roberson, L., & Perry, E. L. (2007). The multiple-category problem: Category activation and inhibition in the hiring process. Academy of Management Review, 32, 529-548. https://doi.org/10.5465/AMR.2007.24351855

Kunstman, J. W., & Plant, E. A. (2008). Racing to help: Racial bias in high emergency helping situations. Journal of Personality and Social Psychology, 95, 1499-1510. https://doi.org/10.1037/a0012822

Langer, M., König, C. J., & Fitili, A. (2018). Information as a double-edged sword: The role of computer experience and information on applicant reactions towards novel technologies for personnel selection. Computers in Human Behavior, 81, 19-30. https://doi.org/10.1016/j.chb.2017.11.036

Lievens, F., & Sackett, P. R. (2006). Video-based versus written situational judgment tests: A comparison in terms of predictive validity. Journal of Applied Psychology, 91, 1181-1188. https://doi.org/10.1037/0021-9010.91.5.1181

McIllvaine, A. R. (2018). In the fight against bias, AI faces backlash. Retrieved from www.hrexecutive.com

Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449-458. https://doi.org/10.1037/0033-2909.114.3.449

Metz, C. (2018, March 18). FedEx follows Amazon into the robotic future. New York Times. Retrieved from https://www.nytimes.com/2018/03/18/technology/fedex-robots.html

Meyer, D. (2018). Amazon reportedly killed an AI recruitment system because it couldn’t stop the tool from discriminating against women. Retrieved from http://fortune.com/2018/10/10/amazon-ai-recruitment-bais-women-sexist/

Morelli, N., Potosky, D., Arthur, W., & Tippins, N. (2017). A call for conceptual models of technology in I-O Psychology: An example from technology-based talent assessment. Industrial and Organizational Psychology, 10, 634-653. https://doi.org/10.1017/iop.2017.70

Nedelkoska, L., & Quintini, G. (2018). Automation, skills use and training (Working Papers, no. 202). Paris, France: OECD Publishing. https://doi.org/10.1787/2e2f4eea-en

Nistor, N., Lerche, T., Weinberger, A., Ceobanu, C., & Heymann, O. (2014). Towards the integration of culture into the unified theory of acceptance and use of technology. British Journal of Educational Technology, 45, 36-55. https://doi.org/10.1111/j.1467-8535.2012.01383.x

Parris, D. L., Dapko, J. L., Arnold, R. W., & Arnold, D. (2016). Exploring transparency: A new framework for responsible business management. Management Decision, 54, 222-247. https://doi.org/10.1108/MD-07-2015-0279

Perkins, L. A., Thomas, K. M., & Taylor, G. A. (2000). Advertising and recruitment: Marketing to minorities. Psychology and Marketing, 17, 235-255. https://doi.org/10.1002/(SICI)1520-6793(200003)17:3<235::AID-MAR3>3.0.CO;2-xxx

Petriglieri, G., Ashford, S. J., & Wrzesniewski, A. (2018). Thriving in the gig economy. Harvard Business Review, 96, 140-143. Retrieved from https://hbr.org/2018/03/thriving-in-the-gig-economy

Potosky, D. (2008). A conceptual framework for the role of the administration medium in the personnel assessment process. Academy of Management Review, 33, 629-648. https://doi.org/10.5465/amr.2008.32465704

Reynolds, D., & Dickter, D. N. (2017). Technology and employee selection: An overview. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (2nd ed., pp. 855-873). New York, NY: Routledge.

Roulin, N., & Levashina, J. (2018). LinkedIn as a new selection method: Psychometric properties and assessment approach. Personnel Psychology. https://doi.org/10.1111/peps.12296

Russell, A., & Owens, L. (2001). Peer estimates of school-aged boys’ and girls’ aggression to same- and cross-sex targets. Social Development, 8, 364-379. https://doi.org/10.1111/1467-9507.00101

Ryan, A. M., & Delany, T. (2017). Attracting job candidates to organizations. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (2nd ed., pp. 165-181). New York, NY: Routledge.

Ryan, A. M., & Derous, E. (2016). Highlighting tensions in recruitment and selection research and practice. International Journal of Selection and Assessment, 24, 54-62. https://doi.org/10.1111/ijsa.12129

Ryan, A. M., & Ployhart, R. E. (2014). A century of selection. Annual Review of Psychology, 65, 693-717. https://doi.org/10.1146/annurev-psych-010213-115134

Ryan, A. M., & Tippins, N. (2012). Designing and implementing global selection systems. Oxford, UK: Wiley-Blackwell.

Salgado, J. F., & Bastida, M. (2017). Predicting expatriate effectiveness: The role of personality, cross-cultural adjustment, and organizational support. International Journal of Selection and Assessment, 25, 267-275. https://doi.org/10.1111/ijsa.12178

Scheiber, N. (2018a, July 7). High-skilled white-collar work? Machines can do that, too. New York Times. Retrieved from https://www.nytimes.com/2018/07/07/business/economy/algorithm-fashion-jobs.html

Scheiber, N. (2018b, September 18). Facebook accused of allowing bias against women in job ads. New York Times. Retrieved from https://www.nytimes.com/2018/09/18/business/economy/facebook-job-ads.html

Schulze, J., & Krumm, S. (2017). The “virtual team player.” Organizational Psychology Review, 7, 66-95. https://doi.org/10.1177/2041386616675522

Schulze, J., Schultze, M., West, S. G., & Krumm, S. (2017). The knowledge, skills, abilities, and other characteristics required for face-to-face versus computer-mediated communication: Similar or distinct constructs? Journal of Business and Psychology, 32, 283-300. https://doi.org/10.1007/s10869-016-9465-6

Scott, J. C., Bartram, D., & Reynolds, D. H. (Eds.) (2018). Next generation technology-enhanced assessment. Cambridge, UK: Cambridge University Press.

Scott, J. C., & Lezotte, D. V. (2012). Web-based assessments. New York, NY: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199732579.013.0021

Seo, S. H., Griffin, K., Young, J. E., Bunt, A., Prentice, S., & Loureiro-Rodríguez, V. (2018). Investigating people’s rapport building and hindering behaviors when working with a collaborative robot. International Journal of Social Robotics, 10, 147-161. https://doi.org/10.1007/s12369-017-0441-8

Spreitzer, G. M., Cameron, L., & Garrett, L. (2017). Alternative work arrangements: Two images of the new world of work. Annual Review of Organizational Psychology and Organizational Behavior, 4, 473-499. https://doi.org/10.1146/annurev-orgpsych-032516-113332

Stoughton, J. W., Thompson, L. F., & Meade, A. W. (2015). Examining applicant reactions to the use of social networking websites in pre-employment screening. Journal of Business and Psychology, 30, 73-88. https://doi.org/10.1007/s10869-013-9333-6

Sweeney, L. (2013). Discrimination in online ad delivery. Queue, 11(3), 10. https://doi.org/10.1145/2460276.2460278

Syrdal, D. S., Dautenhahn, K., Koay, K. L., & Walters, M. L. (2009). The negative attitudes towards Robots Scale and reactions to robot behaviour in a live human-robot interaction study. In N. Taylor (Ed.), Adaptive and emergent behaviour and complex systems: Proceedings of the 23rd Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (pp. 109-115). Edinburgh, UK: Society for the Study of Artificial Intelligence and Simulation for Behavior (AISB). Retrieved from http://www.scopus.com/inward/record.url?partnerID=yv4JPVwI&eid=2-s2.0-84859046918&md5=0901ed76f558c614f66619620d76878d

Tippins, N. T. (2015). Technology and assessment in selection. Annual Review of Organizational Psychology and Organizational Behavior, 2, 551-582. https://doi.org/10.1146/annurev-orgpsych-031413-091317

Van Iddekinge, C. H., Lanivich, S. E., Roth, P. L., & Junco, E. (2016). Social media for selection? Validity and adverse impact potential of a Facebook-based assessment. Journal of Management, 42, 1811-1835. https://doi.org/10.1177/0149206313515524

Wingfield, N. (2017, September 10). As Amazon pushes forward with robots, workers find new roles. New York Times. Retrieved from https://www.nytimes.com/2017/09/10/technology/amazon-robots-workers.html

Yan, W.-J., Wang, S.-J., Liu, Y.-J., Wu, Q., & Fu, X. (2014). For micro-expression recognition: Database and suggestions. Neurocomputing, 136, 82-87. https://doi.org/10.1016/j.neucom.2014.01.029

Correspondence: ryanan@msu.edu (A. M. Ryan).
