Nine in 10 human resource leaders don’t believe annual performance reviews result in accurate information(1). Similarly, a survey of Fortune 1,000 companies reported that 66% of employees were strongly dissatisfied with the performance evaluations they received, and 71% perceived that their evaluations were unfair(2). Line managers also lack confidence in performance appraisals. In one study, only 15% of women and 24% of men managers had confidence in the performance evaluation process, while most viewed it as subjective and ambiguous(3). Despite these concerns, a survey of 100 large organisations reported that 57% of them weren’t taking any action to address bias in performance reviews(4).
Idiosyncratic and Inter-Rater Bias
The idiosyncratic rater effect refers to individual-level variations in assessing the performance of others(5). Its impact on performance appraisal is significant: 58 to 72 percent of an individual’s performance rating reflects the characteristics of the assessor, not those of the person being assessed(6)(7). In practice, this means that an individual can receive different ratings and subjective feedback from different assessors.
Idiosyncratic Rater Bias Risks
Evaluations of performance are driven by individual raters’ interpretations of the meaning of assessment criteria, their sense of what ‘good’ looks like for a particular competency, variations in how harsh or lenient they are in judging others, and their own inherent and unconscious social and cognitive biases.
The idiosyncratic rater effect is more pronounced when performance evaluations are weighted towards or open to subjective assessments of performance. Subjective assessments of performance are common in professional services settings where objective measures are difficult to define and capture. Studies show that people don’t hold stable or equivalent definitions of qualities like business acumen, strategic thinking, political savvy, leadership potential, and assertiveness(8). Subjective assessments are also common in an ‘open-box’ approach to performance appraisal involving forms that pose broad, generic questions about employee performance and offer a blank space for managers to respond with their observations. An example of a broad open-box question is ‘Describe the ways the employee’s performance met your expectations’. The problem with generic questions is their ambiguity: managers are left to define or interpret what the specific expectations are for that particular employee. Studies have shown that when performance criteria are ambiguous, people are more likely to rely on stereotypes and other biases or subjective criteria such as attitudes and personality when making an appraisal, rather than making an objective assessment of how well an employee performed their assigned tasks and exhibited desired behaviours(9).
Poorly-defined rating scales
Rating scales are not necessarily any more objective than open-box appraisals. When rating criteria are not well-defined in terms of observable behaviours and measurable outcomes, individual raters interpret criteria and rating scales differently. One assessor’s score of 3, for example, might be another assessor’s score of 5. Non-numerical rating scales such as ‘Exceeds, Meets, Needs Improvement’ are also open to individual rater interpretations.
Consequences of Unfair Appraisals
Favouring advancement of some over others
Bias in performance appraisal can favour the advancement of some groups over others. In turn, appraisal bias limits the organisation’s ability to build a diverse leadership team comprising the organisation’s top performers. Examples include the glass ceiling that prevents women from advancing to executive roles and the bamboo ceiling that holds back individuals of Asian descent from attaining leadership positions at the same rate as individuals with European backgrounds.
Favouring some groups for particular roles
Bias in performance appraisal can also result in some groups being seen as a better fit for particular roles. In turn, bias in role-fit limits an organisation’s ability to build diverse workgroups across the organisation. Examples include the clustering of women in human resource functions or administrative roles and the clustering of individuals of Asian descent in technical or technology roles. Role segregation can result in a workforce that is diverse overall, but homogeneous at the department or team level. Homogeneous teams report lower levels of performance compared with diverse and inclusive teams. To reap the competitive benefits of a diverse workforce, the ultimate goal of an organisation’s diversity efforts should be fostering diversity and inclusion at the level of the workgroup. Many organisations focus on tracking and improving entity-level diversity and overlook where the rubber hits the road: team diversity.
Ineffective professional development
When supervisors and their employees do not have reliable information on employee performance, setting meaningful and relevant professional development goals is not possible and an employee’s potential is not realised. An example is the divergent feedback given to men and women. Studies show that men are more likely to receive specific feedback and guidance on how they can improve their performance, whereas women are more likely to receive vague feedback not linked to specific outcomes and not accompanied by actionable performance development guidelines(10). Other research shows women leaders often receive negative feedback that is overly focused on communication style and conflicting—told on the one hand that they’re too bossy or aggressive, but on the other that they should be more confident and assertive(11). One study reported that 76% of references to being “too aggressive” happened in women’s reviews, versus 24% in men’s(10).
Unfair compensation and rewards
Because an organisation’s remuneration structure is typically linked to performance appraisal, bias in appraisal and performance management can lead to inconsistencies in pay and rewards whereby individuals are not remunerated fairly in a manner that accurately reflects their efforts and achievement. Fair compensation is often interpreted as equal pay for like roles, and many companies claim to have achieved equal pay, at least for gender diversity. Fewer companies track and manage compensation by other traditionally marginalised dimensions, including race/ethnicity/language, sexual orientation, and disability. Fair compensation, however, goes beyond equal pay for like roles. Pay inequity occurs when some groups have access to higher financial rewards and non-financial benefits because they belong to an identity group that is more commonly perceived to be a better ‘fit’ for higher-paid roles in the organisation. Consider, for example, the ‘Mad-Men’ (North American TV series set in the 1960s) scenario where women typically earn lower administrative salaries and the men typically earn higher commercial salaries. Sadly, almost 60 years on, gender role segregation persists globally in traditionally masculine industries. In Australia, WGEA reports a pay gap of circa 25% for financial and professional services. Pay inequity conveys disrespect because it implies that some groups of people are valued more highly than others because of their membership of a group. An organisation truly committed to fairness must seek to address pay inequities across diversity dimensions such that there is no systematic difference in average pay across identity groups, whether that be gender, race, language or other. Only when pay equity is achieved can an organisation genuinely say it has achieved equality of opportunity.
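The test of ‘no systematic difference in average pay across identity groups’ can be made concrete with a small calculation. The sketch below assumes the common definition of a pay gap as the difference between a reference group’s average pay and another group’s average pay, expressed as a percentage of the reference group’s average. The function name, group labels, and salary figures are hypothetical and purely illustrative.

```python
from statistics import mean

def pay_gap_percent(salaries_by_group, reference_group):
    """Return each group's average pay gap relative to the reference group.

    The gap is expressed as a percentage of the reference group's average pay.
    A positive value means the group earns less on average than the reference.
    """
    reference_mean = mean(salaries_by_group[reference_group])
    return {
        group: round(100 * (reference_mean - mean(salaries)) / reference_mean, 1)
        for group, salaries in salaries_by_group.items()
        if group != reference_group
    }

# Illustrative figures only, not real data.
salaries = {
    "men": [100_000, 120_000, 140_000],
    "women": [80_000, 90_000, 100_000],
}
gaps = pay_gap_percent(salaries, reference_group="men")
print(gaps)  # {'women': 25.0}
```

The same function can be run over any diversity dimension an organisation tracks. Comparing averages across the whole organisation, rather than within like roles only, is what surfaces the role-segregation effect described above.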
Appraisal systems that do not provide an objective assessment of an employee’s performance can lead to erroneous performance concerns and even unfair dismissal with potential legal implications.
When employees feel that their efforts and achievements are not fairly assessed and rewarded, they are less motivated and committed to the organisation. The leading reason employees leave their job is that they don’t feel appreciated(12). Biased performance appraisals can also have a ‘self-fulfilling prophecy’ effect. When employees do not perceive that the organisation values them, they may actively refrain from applying their full and best effort or engage in self-sabotaging behaviours. Conversely, when employees perceive fairness in the evaluation processes, they are more likely to accept feedback and motivate themselves accordingly to improve performance(13)(14).
Solutions for Fair Appraisal
- Bias prompts: Including written prompts designed to disrupt bias and improve objectivity on appraisal forms can prompt managers to reflect on how bias might be influencing appraisals. Examples include ‘Did you consider performance throughout the entire period of the appraisal?’, ‘Did you consider your rating in light of the criteria listed?’, and ‘Give three specific examples of how the employee demonstrated a particular capability.’ Prompts help to ensure different assessors approach each review consistently and objectively.
- Performance rubrics: A rubric defines the objective and specific criteria against which the employee’s performance will be assessed. These can include productivity metrics such as the number of sales calls in a particular time period as well as direct output measures such as sales figures, customer satisfaction scores (internal and external), and customer retention. Performance criteria should tie back to evidence that business goals and outcomes have been met rather than broad statements about a person’s general effectiveness or how they get along with team members or customers.
- Constrain the open box: Replacing broad, generic questions with specific prompts tied to defined criteria reduces the likelihood that subjective assessments of performance will drive ratings.
- Set criteria at the beginning of performance period: Evaluation criteria should be communicated to employees and agreed on ahead of the performance review period. This gives employees the best chance of success, allows the employer and manager to define metrics for tracking performance, and provides employees with an opportunity to include professional development goals.
- Weight rating criteria: Where assessors have limited influence over competency/capability models used for rating performance, employers should allow raters an opportunity to assign a weighting to a capability relative to the role requirements. For example, if strong interpersonal skills are not a critical requirement for a particular role, then the weighting given to that competency should be lower than other competencies that are deemed more critical for role mastery and job performance. Checks should be in place to ensure that capability weights are consistently applied by different assessors.
- Using weighted scales: Rating scales, similar to rating criteria, should be specific and clear. Avoid ambiguous and vague terms like ‘exceeds expectations’ and ‘meets expectations’, and replace them with measurable achievements. For example, a rating of 1 is sales between x and y, a rating of 2 is sales between y and z, etc. Similarly, a rating of 5 could mean ‘consistently meets deadlines’, 4 is ‘meets deadlines most of the time’, 3 is ‘meets deadlines in roughly half of all assignments’. As above, rating scales must be applied consistently across assessors.
- Limit the rating scale: There is evidence that decreasing rating scales can help to eliminate bias in assessments. Researchers studied one school of a large, North American university that changed its faculty teaching evaluation system from a 1-10 to a 1-6 scale. In total, the researchers looked at 105,034 student ratings of 369 instructors in 235 courses. The research found that, under the 10-point system, men received significantly higher ratings than women in the most male-dominated fields, but switching to a 6-point scale entirely eliminated the gender gap. The results were replicated in a controlled experiment(15).
- Abandon forced rankings: Researchers have found that temporal comparison evaluations, involving the comparison of an individual employee’s current performance with their past performance and evaluating how much employees have (or have not) made progress over time, are considered to be fairer than social comparison evaluations(16). When an employee’s current performance is discussed relative to their past performance, they perceive that evaluations are more individualised, discerning, and accurate and that they have been treated more respectfully.
- Involve multiple perspectives: Asking several people to evaluate an individual engages numerous data points and encourages a broader perspective on performance, both of which act to reduce bias. As an example, 360-degree reviews seek performance feedback from a variety of sources, including supervisors, direct reports, peers, customers, suppliers and other stakeholders. This gives you a more complete picture of actual performance, while also minimising the impact of any individual rater’s bias.
- Manage confirmation bias: When soliciting input from others about an individual’s performance, beware of the risk that confirmation bias—a tendency to favour your preexisting ideas and prejudgements—might encourage you to reject or ignore views that are opposite to your own. To temper confirmation bias, actively seek to understand perspectives different to your own, adopt an open mind, and foster a sense of curiosity.
- Structure the calibration process: While employees perceive benefits in calibration processes, they are not completely satisfied with the system of calibration, in part because they perceive favouritism to be an issue(17). Perhaps not surprisingly, higher-performing employees report higher levels of perceived fairness and satisfaction with the calibration system and less perceived favouritism relative to lower-performing employees. Structuring the calibration processes can help to reduce the potential for bias to influence outcomes and enhance perceptions of fairness. As a start, employers should formalise a process for identifying and discussing bias throughout the calibration process. Examples include acknowledging the potential for bias upfront and ensuring all members of the calibration committee verbally commit to engaging in objective assessments and challenging their own and others’ biases. Decision-making processes can also be formalised to mitigate the potential for some voices to override others in the process. For example, everyone should be given a voice in the calibration discussion and in reaching agreement on the final rating. Also, contrary or dissenting views should be encouraged and considered fairly. Other guidelines should ensure that comments on performance are supported by objective evidence such as examples of desired behaviours or quantitative measures of performance and productivity. Calibration members can be encouraged to write down bias concerns that arise during discussions, and these should be shared and discussed. To promote the calling out of bias, bias red-flags of individual raters can be collected and shared anonymously. It can also be beneficial to appoint an individual to act as overseer of the calibration process. The overseer’s role is to encourage objective discussion and decision-making.
- Adjust the frequency of performance reviews: Taking notes on performance during an appraisal period promotes fairness because it helps to fill memory gaps and correct distortions. Increasing the frequency of reviews by implementing real-time feedback systems achieves the same outcome and has additional benefits for performance development. Perhaps counterintuitively, real-time feedback significantly reduces the time spent on appraising employees, allowing managers to focus their efforts on performance management instead. Ongoing feedback also gives employees guidance they can apply immediately to improve their performance, rather than waiting for an annual review before concerns are noted and development goals are defined.
- Monitor ratings for bias: Regularly examine performance ratings and feedback for patterns across diversity dimensions. When patterns in performance ratings or feedback suggest that bias might be influencing assessments, appraisal and feedback systems should be scrutinised to identify how bias is creeping into the system, and corrective action should be taken swiftly to address weaknesses.
- Develop mindful inclusion capability: Assist people leaders in understanding their implicit assumptions and prejudgments, and transfer skills for reducing bias in talent management by developing managers’ ability to monitor and manage their own and others’ bias.
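Three of the recommendations above are mechanical enough to sketch in code: defining rating scales by measurable outcomes, weighting criteria by role, and monitoring ratings across groups. The following is a minimal illustration rather than a prescribed implementation. The sales bands, criteria names, weights, group labels, and function names are all hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical sales bands: each rating is defined by a measurable outcome
# rather than a label such as 'meets expectations'.
# Values are the lower bounds for ratings 2, 3, 4 and 5.
SALES_BAND_FLOORS = [50_000, 100_000, 150_000, 200_000]

def rating_from_sales(sales):
    """Map a sales figure to a 1-5 rating using fixed, published bands."""
    return 1 + bisect_right(SALES_BAND_FLOORS, sales)

def weighted_score(ratings, weights):
    """Combine per-criterion ratings using role-specific weights that sum to 1."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(ratings[criterion] * weights[criterion] for criterion in ratings)

def mean_rating_by_group(records):
    """Average the final rating per group to surface patterns worth investigating."""
    by_group = {}
    for group, rating in records:
        by_group.setdefault(group, []).append(rating)
    return {group: mean(values) for group, values in by_group.items()}

# Example: a role where interpersonal skills carry a low weight.
role_weights = {"sales": 0.5, "deadlines": 0.4, "interpersonal": 0.1}
ratings = {"sales": rating_from_sales(120_000), "deadlines": 5, "interpersonal": 4}
score = weighted_score(ratings, role_weights)

# Example: compare average ratings across two illustrative groups.
group_means = mean_rating_by_group([("A", 4), ("A", 3), ("B", 3), ("B", 2)])
print(ratings["sales"], round(score, 2), group_means)
```

Keeping the bands and weights in one shared place, rather than in each assessor’s head, is what allows them to be applied consistently across assessors. A difference in group averages is a prompt to scrutinise the appraisal system, not proof of bias in itself.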
References
- Wilke. (2015). Is the Annual Performance Review Dead? Downloaded from SHRM website: https://www.shrm.org/resourcesandtools/hr-topics/employee-relations/pages/performance-reviews-are-dead.aspx
- Meinert. (2015). Is It Time to Put the Performance Review on a PIP? Downloaded from SHRM website: https://www.shrm.org/hr-today/news/hr-magazine/pages/0415-qualitative-performance-reviews.aspx
- Mackenzie, Wehner, & Correll. (2019). Why Most Performance Evaluations are Biased, and How to Fix Them. Downloaded from Harvard Business Review Website: https://hbr.org/2019/01/why-most-performance-evaluations-are-biased-and-how-to-fix-them
- Jones, Smith & Rock. (2018). 3 Biases That Hijack Performance Reviews, and How to Address Them. Downloaded from Harvard Business Review Website: https://hbr.org/2018/06/3-biases-that-hijack-performance-reviews-and-how-to-address-them
- Hoffman, Lance, Bynum, & Gentry. (2010). Rater Source Effects Are Alive and Well After All. https://doi.org/10.1111/j.1744-6570.2009.01164.x
- Scullen & Mount. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85(6):956-70. https://doi.org/10.1037/0021-9010.85.6.956
- Mount, Judge, Scullen, Sytsma, & Hezlett. (1998). Trait, Rater and Level Effects in 360-Degree Performance Ratings. Personnel Psychology, 51(3), 557-576. https://doi.org/10.1111/j.1744-6570.1998.tb00251.x
- Buckingham & Goodall. (2019). The Feedback Fallacy. Downloaded from Harvard Business Review website: https://hbr.org/2019/03/the-feedback-fallacy
- Castilla. (2008). Gender, Race, & Meritocracy in Organizational Careers. AJS, 113(6), 1479-1526. https://doi.org/10.5465/ambpp.2005.18778668
- Correll & Simard. (2016). Research: Vague Feedback Is Holding Women Back. Downloaded from Harvard Business Review website: https://hbr.org/2016/04/research-vague-feedback-is-holding-women-back
- Smith, Rosenstein, & Nikolov. (2018). The Different Words We Use to Describe Male and Female Leaders. Downloaded from Harvard Business Review website: https://hbr.org/2018/05/the-different-words-we-use-to-describe-male-and-female-leaders
- Rath & Clifton. (2004). The Power of Praise and Recognition. Business Journal. Downloaded from Gallup website: https://news.gallup.com/businessjournal/12157/power-praise-recognition.aspx
- Leung, Su, & Morris. (2001). When is Criticism Not Constructive? The Roles of Fairness Perceptions and Dispositional Attributions in Employee Acceptance of Critical Supervisory Feedback. Human Relations, 54(9), 1155–1187. https://doi.org/10.1177/0018726701549002
- Chun, Brockner, & DeCremer. (2018). People Don’t Want to Be Compared with Others in Performance Reviews. They Want to Be Compared with Themselves. Downloaded from Harvard Business Review website: https://hbr.org/2018/03/people-dont-want-to-be-compared-with-others-in-performance-reviews-they-want-to-be-compared-with-themselves
- Rivera, L. A., & Tilcsik, A. (2019). Scaling Down Inequality: Rating Scales, Gender Bias, and the Architecture of Evaluation. American Sociological Review, 84(2), 248–274. https://doi.org/10.1177/0003122419833601
- Chun, Brockner, & DeCremer. (2018). How temporal and social comparisons in performance evaluation affect fairness perceptions. Organizational Behavior and Human Decision Processes, 145, 1-15. https://doi.org/10.1016/j.obhdp.2018.01.003
- Demere, Sedatole, & Woods. (2018). Why Managers Shouldn’t Have the Final Say in Performance Reviews. Downloaded from Harvard Business Review website: https://hbr.org/2018/06/why-managers-shouldnt-have-the-final-say-in-performance-reviews