Meeting of 10-13-16 Public Meeting on Big Data in the Workplace
Thank you for the invitation to appear before the Commission to address the important and evolving topic of Big Data in the Workplace.
Numbers have always been used to monitor human activity. From tallies etched onto the walls of ancient caves to multivolume reports generated by computer programs operating "in the cloud," humans have consistently tried to deploy the power of mathematics and numbers to help understand and guide our behavior. We are not surprised, for example, when actuaries use data regarding life expectancy and health risks to set life insurance premiums. We are also not taken aback when our neighborhood bank or car dealer asks a series of questions and assigns us a credit score before deciding whether to lend us money. And it is expected that an employer will evaluate the performance of its sales force by examining the numbers it generates. Not surprisingly, in today's era of super-computing and digitalized information, our ability to acquire and to use large quantities of data has expanded exponentially, and each day brings new developments in both what we know (or stated more precisely, what we can know if we choose to) and how we can use that data. The world of Big Data has arrived, and it is beginning to affect employers and their decision-making in ways undreamed of even a few years ago.
Employers can access more information about their applicant pool and workforce than ever before, and have an ability to correlate data gleaned from an application itself, perhaps supplemented by publicly available social media sources, to determine how long a candidate is likely to stay on a particular job. Similarly, by combing through computerized calendar entries and e-mail headers, Big Data can tell us which employees are likely to leave their employment within the next 12 months. At the same time that new tools and methods that rely on concepts of Big Data are becoming part of the daily landscape in human resources departments, employers continue to operate in a legal environment based on rules and regulations developed in an analog world, with few guideposts that translate seamlessly into the world of Big Data. The issues that can arise unfold in a context that makes yesterday's compliance paradigm both outdated and difficult to apply.
The analytic methods of Big Data can be in tension with Title VII, the ADEA, and the ADA to the extent the correlations those methods discover overlap with protected employee characteristics. On the one hand, the world of Big Data offers a potent antidote to intentional discrimination. Antidiscrimination laws, however, also prohibit practices that are facially neutral if they have a disparate impact on members of protected categories, unless those practices are "job-related" and consistent with "business necessity."
Employers and their data analysts can use Big Data to determine what reasonably accessible information about employees or applicants correlates with the traits of successful employees. This is apparent from, for example, the observation that visitors to certain Japanese manga sites were often good coders.1 No one is likely to argue that individuals enhance their coding skills by spending time on those sites; surely the majority of visitors have the same coding aptitude before and after their visits. But for whatever reason, those with an aptitude for coding share an appreciation for manga. This is a case in which two traits, an appreciation of manga and coding aptitude, are correlated, but neither causes the other.
Given the complexity of amassing and then analyzing vast quantities of information, an employer would certainly not reverse engineer the process in order to intentionally discriminate against a protected group. It is far more probable that Big Data may be challenged because it unintentionally yields a disparate impact on one or more protected groups. More precisely, a plaintiff or class may allege that the algorithm used for hiring, promotion, or a similar purpose adversely impacts one or more protected groups. Let us consider how that case would proceed and the issues that may arise.
The plaintiff in a Title VII disparate-impact-discrimination case must (1) identify with particularity the facially neutral practice being challenged, (2) demonstrate that the practice adversely impacts members of the protected group in question, and (3) show that the practice caused the plaintiff to suffer an adverse employment action.2 If the plaintiff meets that burden, the employer may defend by demonstrating that the practice in question is job-related and consistent with business necessity.3 When that neutral practice is a test or similar screen, courts typically require employers to establish the "validity" of the screen.4 Finally, if the employer establishes that defense, the plaintiff may prevail by proving the existence of a less discriminatory alternative that similarly serves the employer's needs, which the employer refuses to adopt.5
Cases arising under the ADEA6 would proceed somewhat differently. As with Title VII, plaintiffs must plead (1) a specific and actionable policy, (2) a disparate impact, and (3) facts raising a sufficient inference of causation.7 If the plaintiff makes this showing, the employer's burden is to demonstrate that its challenged practice is based on "reasonable factors other than age."8 Accordingly, to avoid liability once an ADEA plaintiff has proved a prima facie case, the employer must establish the reasonableness of its reliance on other neutral criteria.9
Whether a selection method that produces an adverse impact passes muster under Title VII is often decided with reference to the Uniform Guidelines on Employee Selection Procedures (Uniform Guidelines).10 The critical inquiry is whether the selection procedure is a "valid" predictor of success on the job.11 Because the Uniform Guidelines date from 1978, they do not contemplate Big Data's reliance on correlation rather than cause-and-effect relationships. Accordingly, there is legal significance in the difference between correlative and cause-and-effect methodologies, and the Uniform Guidelines may not serve their intended purpose when confronting the methodology of Big Data. The Uniform Guidelines require employers to consider whether there are less discriminatory alternatives to any selection procedure, whereas Title VII assigns this burden of proof to the plaintiff.
The ADEA requires proof that the practice is a "reasonable factor other than age."12 Big Data may elevate the importance of less discriminatory alternatives. Because Big Data derives its algorithms from vast troves of data, which computers combine and weigh in innumerable ways to select the optimal solution, there will typically be a host of near-optimal alternatives, each differing slightly in terms of its impact on protected groups and its ability to identify superior employees. Courts must then decide whether an algorithm's marginally greater predictive ability is sufficient to justify its greater adverse impact, if indeed the law recognizes any trade-off between the two.
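The trade-off among near-optimal alternatives can be made concrete with a small sketch. Everything in it is hypothetical and offered only for illustration: the candidate names, the validity figures, the pass-rate gaps, and above all the `tolerance` parameter, which stands in for a question the law has not answered (how much predictive ability, if any, may be traded for reduced adverse impact).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    validity: float       # hypothetical correlation with the job-performance measure
    pass_rate_gap: float  # hypothetical gap in pass rates between two groups (0 = none)


def near_optimal_alternatives(candidates, tolerance):
    """Return candidates whose validity is within `tolerance` of the best one,
    ordered from smallest to largest pass-rate gap."""
    best = max(c.validity for c in candidates)
    near = [c for c in candidates if best - c.validity <= tolerance]
    return sorted(near, key=lambda c: c.pass_rate_gap)


# Invented figures: four algorithms fitted to the same data.
candidates = [
    Candidate("A", 0.300, 0.12),
    Candidate("B", 0.298, 0.07),
    Candidate("C", 0.295, 0.03),
    Candidate("D", 0.250, 0.01),
]

# Assumed tolerance of 0.01 in validity; this threshold is not a legal standard.
shortlist = near_optimal_alternatives(candidates, tolerance=0.01)
for c in shortlist:
    print(c.name, c.validity, c.pass_rate_gap)
```

With these invented numbers, three of the four candidates are nearly indistinguishable in predictive ability yet differ fourfold in their pass-rate gap, which is the situation the paragraph above anticipates.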
The ADA raises additional issues. Not all segments of the population are equally likely to leave footprints in places searched by Big Data. For example, a Big Data algorithm that tracks online book purchases may misconstrue the reading habits of sight-impaired individuals who may find electronic media less accessible than print. If the algorithm correlates electronic media purchases with positive job performance, this means the algorithm excludes sight-impaired persons and places the often-unknowing employer in jeopardy. Further, the ADA requires employers to modify "examinations, training materials or policies, the provision of qualified readers or interpreters, and other similar accommodations."13 An employer, however, can only accommodate disabilities of which it is aware. Yet disabled applicants, not knowing the activities and behaviors on which they are being assessed, have no reason to request an accommodation.
Title VII provides that a plaintiff may overcome an employer's proof that the challenged practice is job-related and consistent with business necessity by demonstrating there exists an alternative practice (one that is less impactful on the protected group, yet as effective in meeting the employer's business needs) which the employer refuses to adopt.14
This strategy raises questions that are fundamental to Big Data. First, the plaintiff will want to obtain the algorithm and the data with which it was estimated, as well as the criterion or construct data by which "success" on the job was measured. Often, third-party Big Data companies possess this information, and thus obtaining it may require a protracted discovery battle. In such a scenario, a Big Data company's investment in developing the algorithms, its primary products, may be at risk. Next, a plaintiff must devise an alternative algorithm with a less discriminatory impact. What constitutes "less," however, is unclear. In a Big Data world, almost any improvement, no matter how slight, in the proportion of a protected group that passes a screen will be deemed "statistically significant," yet it may be negligible in a practical sense. Will a court order a company to abandon a product in which it has invested heavily, in order to increase the pass-rate of a protected group by a statistically significant fraction of a percent? Further, there is a "whack-a-mole" aspect to this process. Suppose a female plaintiff undertakes the expense required to re-engineer the company's algorithm and finds a version that reduces the adverse impact on women. As a result, she persuades the employer to adopt this alternative. Subsequently, and unintentionally, the new algorithm enhances the adverse impact against African Americans. An African American plaintiff now sues and suggests an alternative that minimizes the adverse impact on his protected group but inadvertently enhances the adverse impact on Hispanics. The employer finds itself in the center of a game that ends only if there is a solution that minimizes the algorithm's disparate impact on every protected group.
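The gap between statistical and practical significance can be illustrated with a short sketch. It assumes a standard pooled two-proportion z-test, which is one common way (not the only way) to compare pass rates; the pass counts and sample sizes are invented for illustration and are not drawn from any case or data set.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Pooled two-proportion z-test.

    Returns (difference in pass rates, z statistic, two-sided p-value).
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value


# Hypothetical: the same half-point improvement in pass rate (50.0% to 50.5%),
# first with 1,000 people per group, then with 1,000,000 per group.
small = two_proportion_z_test(505, 1_000, 500, 1_000)
large = two_proportion_z_test(505_000, 1_000_000, 500_000, 1_000_000)

print("n = 1,000 per group:     significant at .05?", small[2] < 0.05)
print("n = 1,000,000 per group: significant at .05?", large[2] < 0.05)
```

Under these assumptions the identical half-point difference is not significant in the smaller sample but is highly significant in the larger one, even though its practical size has not changed.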
Big Data effects a shift from a model in which selection criteria are distilled from job-related knowledge, skills, and abilities, leaving correlation to be established empirically, to one in which correlation is first established empirically, independently of knowledge, skills, and abilities, leaving the duration of that correlation in question. Accordingly, rather than assess Big Data in terms of correlation, which it will typically pass with flying colors, courts and employers should ask how long the underlying correlations will endure. In terms of validation, this translates into determining the time elapsed between the algorithm's initial calibration and its application to the plaintiff, relative to the expected duration of the correlation. Because Big Data algorithms, by design, maximize the correlation between Big Data variables and some measure(s) of job performance, the correlation should be greatest when the algorithm is initially calibrated and should decay as time passes. But how much decay is tolerable before the algorithm is too unreliable to pass legal scrutiny?
The Uniform Guidelines suggest the following: "Generally, a selection procedure is considered related to the criterion, for the purpose of these guidelines, when the relationship between performance on the procedure and performance on the criterion measure is statistically significant at the .05 level of significance."15 This suggests that the useful life of an algorithm should be measured by the time elapsed before the correlation is reduced in significance to the .05 level. This definition fails, however, to consider how long the algorithm remains superior to less discriminatory alternatives. Therefore, determining how long the correlation persists, in both dimensions, is the critical inquiry in assessing whether a Big Data algorithm is lawfully applied to the employer's workforce.
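To give a sense of what the .05 threshold means in practice, the following sketch computes the smallest correlation that would be statistically significant at that level for a given sample size, using the Fisher z approximation (an assumption; an exact t-based calculation gives nearly identical results at these sample sizes). The second function rests on a further, purely hypothetical assumption that the correlation decays exponentially over time; nothing in the Uniform Guidelines or the case law specifies a decay model, and the starting correlation and decay rate are invented.

```python
import math
from statistics import NormalDist


def min_significant_r(n, alpha=0.05):
    """Smallest |r| that is two-sided significant at `alpha` for sample size n,
    using the Fisher z approximation: atanh(r) * sqrt(n - 3) ~ N(0, 1)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def months_until_not_significant(r0, monthly_decay, n, alpha=0.05):
    """Months until r(t) = r0 * exp(-monthly_decay * t) falls to the threshold.

    The exponential decay model is a hypothetical assumption for illustration.
    """
    threshold = min_significant_r(n, alpha)
    if r0 <= threshold:
        return 0.0
    return math.log(r0 / threshold) / monthly_decay


for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: minimum significant |r| = {min_significant_r(n):.4f}")

# Hypothetical: initial correlation 0.30, decaying 5% per month, n = 100.
print(f"{months_until_not_significant(0.30, 0.05, 100):.1f} months")
```

The sketch suggests why the .05 standard is a weak constraint at Big Data scale: with a million observations, a correlation of roughly 0.002 still clears the threshold, so an algorithm could remain "significant" long after its predictive value had become negligible.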
A potential concern with Big Data, viewed from this perspective, is that there is no reason the algorithm that best fits the data on Monday will do so on Tuesday. This is the Achilles heel of purely correlation-based methods. Because there is no understanding of why the correlation exists, there is no basis for surmising how long it will persist. This contrasts with the longevity courts attribute to conventional job analyses and the associated validity studies, which are premised on cause-and-effect relationships. "There is no requirement in the industry or in the law that a new job analysis be prepared for each successive selection procedure, and an earlier-developed job analysis may appropriately be used so long as it is established that the job analysis remains relevant and accurate."16 Additionally, expert testimony has stated that "conventional wisdom places the shelf-life of a job analysis for [certain positions] at 'five plus years,' and up to ten years more."17
What does the arrival of Big Data mean for employers and their business operations and human resources functions? It has become axiomatic to observe that the digitalization of information has resulted in the creation of more data in recent years than in the prior combined history of humankind, and that at the same time we have acquired all of this data, our ability to apply advanced computer-based techniques to use the information has likewise expanded exponentially. It has also become cheaper and more readily accessible to do so for virtually everyone. For employers, these developments create both opportunities and novel issues of concern, and they generate new questions about long-time problems. Big Data potentially affects every aspect of employment decision-making for employers of all sizes in virtually every industry, from the selection and hiring process, through performance management and promotion decisions, and up to and beyond the time termination decisions are made, whether for performance reasons or as part of a reorganization. Employers, in essence, need to understand how to balance the opportunities and risks in the brave new world of Big Data. Big Data means that employers can theoretically analyze every aspect of every decision without worrying about a need to rely only on a partial sample, and Big Data allows employers to find (or, in some cases, to disprove) correlations between characteristics and outcomes that may or may not have a seeming connection. As a result, employers need to be able to understand what Big Data means for more than just reducing the risks of traditional discrimination claims without giving rise to new varieties of such claims. But there are also new implications for background checks and employee privacy, data security obligations, and new theories of liability and new defenses based on statistical correlations, to name but a few.
Employers should prepare for a new human resources world dominated by data sets, analytics, and statistical correlations. Depending on the employer, that world either has already arrived or is in the process of arriving quickly and with momentum. Big Data is here to stay and will not easily be separated from effective human resources management techniques or from the legal world in which employers operate, a world currently governed by standards that do not work effectively in dealing with Big Data methods and techniques. Whether addressing the laws that govern the gathering and storage of information about candidates and employees or the tests used to determine whether illegal discrimination has occurred, or examining the ways in which parties manage data in litigation, employers need to know and understand the interplay between Big Data and the human resources laws that dictate what can and cannot be done. It is already clear that Big Data, used correctly, can be a powerful tool to eliminate overt and implicit bias from an employee selection process, and a misplaced, rigid adherence to outdated legal tests and standards cannot prevent this progress from taking place.
The challenge for employers is to find a way to embrace the strengths of Big Data without losing sight of their own business goals and culture amidst potential legal risks. The challenge for the legal system is to permit those engaged in the responsible development of Big Data methodologies in the employment sector to move forward and explore their possibilities without interference from guidelines and standards based on assumptions that no longer apply or that become obsolete the next year. An important part of this process is permitting employers to find and work with key business partners to assist in Big Data efforts and to develop strategies that have the potential to make the workplace function more effectively for everyone. To do that well, it is vital that human resources professionals and their lawyers have a seat at the table when business decisions are made regarding how and when to use Big Data. The first step in securing that place in the decision-making process is to better understand what Big Data is and how it relates to the current legal system and human resources framework. A goal for the firms, like ours, at the forefront of the Big Data frontier, is to help employers, and indeed the legal system, achieve that improved understanding.
4 See, e.g., Ricci v. DeStefano, 557 U.S. 557, 578 (2009); and Johnson v. City of Memphis, 770 F.3d 464, 478 (6th Cir. 2014) ("The City may meet its… burden by showing through 'professionally acceptable methods, [that its testing methodology is] predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated,'" quoting Black Law Enforcement Officers Ass'n v. City of Akron, 824 F.2d 475, 480 (6th Cir. 1987)).
9 "It is, accordingly, in cases involving disparate-impact claims that the RFOA provision plays its principal role by precluding liability if the adverse impact was attributable to a nonage factor that was 'reasonable.'" City of Jackson, 544 U.S. at 239.