
Fairness in Large Language Models in Three Hours

Thang Viet Doan, Zichong Wang, Minh Nhat Nguyen, Wenbin Zhang∗
Florida International University, Miami, FL, US
tdoan011@fiu.edu, ziwang@fiu.edu, nhoan009@fiu.edu, wenbin.zhang@fiu.edu
arXiv:2408.00992v3 [cs.CL] 8 Aug 2024

ABSTRACT
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available on this website1.

1 https://github.com/LavinWong/Fairness-in-Large-Language-Models
∗ Corresponding author

KEYWORDS
Large Language Model, Fairness, Social Sciences

ACM Reference Format:
Thang Viet Doan, Zichong Wang, Minh Nhat Nguyen, and Wenbin Zhang. 2024. Fairness in Large Language Models in Three Hours. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21–25, 2024, Boise, ID, USA. ACM, Barcelona, Spain, 5 pages. https://doi.org/10.1145/XXXXXX.XXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM '24, October 21–25, 2024, Boise, ID, USA.
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0436-9/24/10
https://doi.org/10.1145/XXXXXX.XXXXXX

1 INTRODUCTION
Large Language Models (LLMs), such as BERT [9], GPT-3 [5], and LLaMA [42], have shown powerful performance and development prospects in various Natural Language Processing tasks due to their robust text encoding and decoding capabilities and their emergent capabilities (e.g., reasoning) [7]. Despite this strong performance, LLMs tend to inherit bias from multiple sources, including training data, encoding processes, and fine-tuning procedures, which may result in biased decisions against certain groups defined by sensitive attributes (e.g., age, gender, or race). Such biased predictions have raised significant ethical and societal concerns, severely limiting the adoption of LLMs in high-risk decision-making scenarios such as hiring, loan approvals, legal sentencing, and medical diagnoses.

To this end, many efforts have been made to mitigate bias in LLMs [25, 47, 48]. For example, one line of work extends traditional fairness notions—individual fairness and group fairness—to these models [6]. Specifically, individual fairness seeks to ensure similar outcomes for similar individuals [13, 49], while group fairness focuses on equalizing outcome statistics across subgroups defined by sensitive attributes (e.g., gender or race) [18, 44–46]. While these classification-based fairness notions are adept at evaluating bias in LLMs' classification results [6], they fall short in addressing biases that arise during the LLM generation process [20]. In other words, LLMs demand a nuanced approach to measuring and mitigating bias that emerges both in their outputs and during the generation process. This complexity motivates other lines of linguistic strategies that evaluate not only the accuracy of LLMs but also their propagation of harmful stereotypes or discriminatory language. For instance, a study examining the behavior of an LLM like ChatGPT revealed a concerning trend: it generated letters of recommendation that described a fictitious individual named Kelly (i.e., a commonly female-associated name) as "warm and amiable", while describing Joseph (i.e., a commonly male-associated name) as a "natural leader and role model". This pattern indicates that LLMs may inadvertently perpetuate gender stereotypes by associating higher levels of leadership with males, underscoring the need for more sophisticated mechanisms to identify and correct such biases.

These burgeoning and varied endeavors aimed at achieving fairness in LLMs [8, 16, 29] highlight the necessity of a comprehensive understanding of how different fair LLM methodologies are implemented and understood across diverse studies. Lacking clarity on these correspondences, the design of future fair LLMs can become challenging [4]. Consequently, there is a pressing need for a systematic tutorial elucidating recent advancements in fair LLMs. Although several tutorials address fairness in machine learning [15, 17, 30, 40], they focus primarily on fairness in broader machine learning algorithms. There is a noticeable gap in inclusive resources that specifically address fairness within LLMs, distinguishing it from traditional models and discussing recent developments.

Our tutorial aims to bridge this gap by providing an up-to-date and comprehensive review of existing work on fair LLMs. It begins with a general overview of LLMs, followed by an analysis of the sources of bias inherent in their training processes. We then delve into the specific concept of fairness as it applies to LLMs, summarizing the strategies and algorithms employed to assess and
enhance fairness. The tutorial also offers practical resources, including toolkits and datasets, that are essential for evaluating bias in LLMs. Furthermore, we explore the unique challenges of fairness in LLMs, such as those presented by word embeddings and the language generation process. Finally, the tutorial concludes by addressing the current research challenges and proposing future directions for this field.

Previous tutorial. To the best of our knowledge, no other tutorial on fairness in LLMs has been presented at CIKM or other similar venues.

2 TUTORIAL OUTLINE
We plan to give a half-day tutorial (3 hours plus breaks). To ensure our tutorial remains engaging and interactive, we intend to proceed as follows: i) Case Studies Introduction. We will start with a series of case studies that highlight specific instances of bias within LLMs. Grounding the discussion in real-world examples helps contextualize it and makes it more relatable for the audience, and we will encourage participants to share their thoughts on these cases and foster dialogue. ii) Interactive Bias Discussion. An integral part of our tutorial will involve presenting participants with various LLM outputs and prompts. We will then facilitate a discussion to identify and analyze potential biases within these examples. iii) Fair LLMs Discussion. We will explore strategies and algorithms for developing fairer LLMs through practical examples. Following this, we will present useful tools and datasets for assessing fairness in LLMs, providing participants with concrete tools and methodologies. iv) Q&A Discussion. The tutorial will culminate in a Q&A session, allowing participants to ask questions and seek clarifications on any aspects of the session. Additionally, we will make tutorial materials, such as the description, presentation slides, and pre-recorded videos, available for post-tutorial access and dissemination.

2.1 Agenda
The outline of the tutorial is as follows:
• Part I: Background on LLMs (30 minutes)
– Introduction to LLMs
– Training Process of LLMs
– Root Causes of Bias in LLMs
• Part II: Quantifying Bias in LLMs (60 minutes)
– Demographic representation [5, 32, 33]
– Stereotypical association [1, 5, 32]
– Counterfactual fairness [31, 32]
– Performance disparities [32, 43, 48]
• Part III: Mitigating Bias in LLMs (40 minutes)
– Pre-processing [14, 27, 47]
– In-training [26, 36, 37]
– Intra-processing [2, 21, 34]
– Post-processing [11, 24, 41]
• Part IV: Resources for Evaluating Bias (30 minutes)
– Toolkits [3, 23, 39]
– Datasets [10, 22, 28, 35, 38]
• Part V: Challenges and Future Directions (20 minutes)
– Formulating Fairness Notions
– Rational Counterfactual Data Augmentation
– Balancing Performance and Fairness in LLMs
– Fulfilling Multiple Types of Fairness
– Developing More and Tailored Datasets

2.2 Content
Background on LLMs. We start by providing the audience with fundamental knowledge about LLMs. Next, we briefly explain the key steps required to train LLMs, including 1) data preparation and preprocessing, 2) model selection and configuration, 3) instruction tuning, and 4) alignment with humans. By examining the training process in detail, we identify and discuss three primary sources contributing to bias in LLMs: i) training data bias, ii) embedding bias, and iii) label bias.

Quantifying Bias in LLMs. To evaluate bias in LLMs, the primary approach is to analyze bias associations in the model's output when responding to input prompts. These evaluations can be conducted through various strategies, including demographic representation, stereotypical association, counterfactual fairness, and performance disparities [12].

Demographic representation [5, 32, 33] assesses bias by analyzing the frequency of demographic word references in the text generated by a model in response to a given prompt [29]. In this context, bias is defined as a systematic discrepancy in the frequency of mentions of different demographic groups within the generated text.
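As a concrete illustration, the following is a minimal sketch of a demographic-representation check, not taken from the cited studies: the word lists and the example `generations` are illustrative placeholders, and real evaluations use curated lexicons and many prompts.

```python
from collections import Counter
import re

# Illustrative, non-exhaustive lexicons; real evaluations use curated word lists.
GROUP_TERMS = {
    "female": {"she", "her", "woman", "women", "mother", "daughter"},
    "male": {"he", "his", "him", "man", "men", "father", "son"},
}

def demographic_representation(generations):
    """Share of demographic-term mentions attributed to each group across generations."""
    counts = Counter()
    for text in generations:
        tokens = re.findall(r"[a-z']+", text.lower())
        for group, terms in GROUP_TERMS.items():
            counts[group] += sum(token in terms for token in tokens)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {group: counts[group] / total for group in GROUP_TERMS}

# Placeholder outputs; in practice these are sampled from the LLM under evaluation.
generations = ["The engineer said he would finish the design.",
               "The nurse said she was on call."]
print(demographic_representation(generations))
```

In practice, these frequencies are aggregated over many prompts and compared against a reference (e.g., uniform or population-level) distribution.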
Stereotypical association [1, 5, 32] assesses bias by measuring the disparity in the rates at which different demographic groups are linked to stereotyped terms (e.g., occupations) in the text generated by the model in response to a given prompt [32]. In this context, bias is defined as a systematic discrepancy in the model's associations between demographic groups and specific stereotypes, which reflects societal prejudices.
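A minimal sketch of this style of probe is shown below; the identity and stereotype word lists are illustrative placeholders, and published probes use far larger curated lexicons and controlled prompt sets.

```python
from collections import defaultdict
import re

# Illustrative lexicons only; real probes use curated identity and stereotype term lists.
GROUP_TERMS = {"female": {"she", "her", "woman"}, "male": {"he", "his", "man"}}
STEREOTYPE_TERMS = {"nurse", "engineer", "doctor", "secretary"}

def stereotypical_association(generations):
    """Co-occurrence counts of demographic terms with stereotyped terms (e.g., occupations)."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in generations:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for occupation in STEREOTYPE_TERMS & tokens:
            for group, terms in GROUP_TERMS.items():
                if terms & tokens:
                    counts[occupation][group] += 1
    # Large per-occupation gaps between groups indicate stereotypical associations.
    return {occupation: dict(groups) for occupation, groups in counts.items()}

generations = ["The nurse said she would come by later.",
               "The engineer explained his design."]
print(stereotypical_association(generations))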
Counterfactual fairness [31, 32] evaluates bias by replacing terms characterizing demographic identity in the prompts and then observing whether the model's responses remain invariant [29]. Bias in this context is defined as the model's sensitivity to demographic-specific terms, measuring how changes to these terms affect its output.
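The sketch below illustrates the idea under simplifying assumptions: `query_llm` is a hypothetical stand-in for the model being audited, the identity-term pairs are illustrative, and a crude lexical similarity is used in place of the task-specific comparisons (e.g., sentiment or classification agreement) used in the literature.

```python
import difflib
import re

# Hypothetical stand-in for the model under audit; swap in a real LLM call.
def query_llm(prompt: str) -> str:
    return "..."  # placeholder response

# Illustrative identity-term pairs; case handling is ignored for brevity.
COUNTERFACTUAL_PAIRS = [("he", "she"), ("his", "her"), ("Joseph", "Kelly")]

def swap_identity_terms(prompt: str) -> str:
    """Swap each demographic identity term with its counterfactual counterpart."""
    mapping = {}
    for a, b in COUNTERFACTUAL_PAIRS:
        mapping[a], mapping[b] = b, a
    return re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), prompt)

def counterfactual_similarity(prompt: str) -> float:
    """1.0 means the response is unchanged under the identity swap; lower values
    indicate sensitivity to demographic-specific terms."""
    original = query_llm(prompt)
    counterfactual = query_llm(swap_identity_terms(prompt))
    return difflib.SequenceMatcher(None, original, counterfactual).ratio()

print(counterfactual_similarity("Write a recommendation letter for Kelly, a software engineer."))
```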
Performance disparities [32, 43, 48] assess bias by measuring the differences in model performance across various demographic groups on downstream tasks. Bias in this context is defined as the systematic variation in accuracy or other performance metrics when the model is applied to tasks involving different demographic groups.
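For example, a minimal per-group accuracy comparison could look like the sketch below; the `records` triples are hypothetical, and real evaluations use labeled benchmarks and, where relevant, metrics other than accuracy.

```python
def performance_disparity(records):
    """records: iterable of (group, prediction, label) triples from a downstream task."""
    correct, total = {}, {}
    for group, pred, label in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (pred == label)
    accuracy = {g: correct[g] / total[g] for g in total}
    # Report per-group accuracy and the largest pairwise gap as a simple disparity score.
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap

# Toy records; in practice these come from an LLM evaluated on a labeled benchmark.
records = [("female", 1, 1), ("female", 0, 1), ("male", 1, 1), ("male", 1, 1)]
print(performance_disparity(records))
```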
Mitigating Bias in LLMs. We systematically categorize bias mitigation algorithms based on their intervention stage within the processing pipeline.

Pre-processing methods change the data given to the model, such as the training data and prompts, using techniques like data augmentation [47] and prompt tuning [14, 27].
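One common form of data augmentation is counterfactual data augmentation; the sketch below shows it in its simplest form, with a small illustrative swap list rather than the procedures of the cited works, which use much larger curated pairs and additional quality filtering.

```python
import re

# Illustrative gendered word pairs; production pipelines use much larger curated lists.
SWAP_PAIRS = {"he": "she", "she": "he", "his": "her", "her": "his",
              "man": "woman", "woman": "man", "father": "mother", "mother": "father"}

def counterfactual_augment(sentence: str) -> str:
    """Return a gender-swapped copy of a training sentence (lowercase text assumed)."""
    return re.sub(r"\b\w+\b",
                  lambda m: SWAP_PAIRS.get(m.group(0).lower(), m.group(0)),
                  sentence)

corpus = ["the doctor said he was busy", "the nurse said she was tired"]
# Train on the union of the original and swapped copies to balance gendered contexts.
augmented_corpus = corpus + [counterfactual_augment(s) for s in corpus]
print(augmented_corpus)
```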
In-training methods aim to alter the training process to minimize bias. This includes modifying the optimization process by adjusting the loss function [36] and incorporating auxiliary modules [26, 37].
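The general idea of adjusting the loss function can be sketched as adding a fairness penalty to the task objective; the tensors, the two-group setup, and the weight `lam` below are all illustrative, and the actual regularizers proposed in the cited works differ.

```python
import torch

# Toy mini-batch: per-example task losses (e.g., cross-entropy) and group ids.
task_losses = torch.tensor([0.9, 1.1, 0.4, 0.3])  # hypothetical per-example losses
groups = torch.tensor([0, 0, 1, 1])               # two demographic groups, for brevity
lam = 0.1                                         # fairness weight (hyperparameter)

def fairness_regularized_loss(task_losses, groups, lam):
    """Task loss plus a penalty on the gap between group-average losses."""
    group_means = [task_losses[groups == g].mean() for g in torch.unique(groups)]
    fairness_penalty = (group_means[0] - group_means[1]).abs()  # assumes two groups
    return task_losses.mean() + lam * fairness_penalty

print(fairness_regularized_loss(task_losses, groups, lam))
```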
Intra-processing methods mitigate bias in pre-trained or fine-tuned models during inference without additional training. This category includes a range of techniques, such as model editing [2, 34] and decoding modification [21].

Post-processing methods modify the results generated by the model to reduce bias, which is crucial for closed-source LLMs where direct modification is limited. We use methods such as chain-of-thought prompting [11, 24] and rewriting [41] as illustrative approaches to convey this concept.
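As a rough illustration only, the sketch below wraps a hypothetical `query_llm` call so that the model audits and rewrites its own draft; it deliberately conflates the chain-of-thought and rewriting ideas into one prompt wrapper and is not the procedure of [24] or [41].

```python
# Hypothetical client for a closed-source LLM; only the prompt wrapping is the point here.
def query_llm(prompt: str) -> str:
    return "..."  # placeholder response

REWRITE_INSTRUCTION = (
    "Review the following response. Think step by step about whether it relies on "
    "stereotypes or treats demographic groups unequally, then rewrite it so that it "
    "does not, preserving all factual content.\n\nResponse:\n{response}\n\nRewritten response:"
)

def debias_post_hoc(user_prompt: str) -> str:
    """Generate an answer, then ask the model to audit and rewrite its own output."""
    draft = query_llm(user_prompt)
    return query_llm(REWRITE_INSTRUCTION.format(response=draft))

print(debias_post_hoc("Write a short reference letter for Kelly, a software engineer."))
```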
Resources for Evaluating Bias. In this part, we introduce existing resources for evaluating bias in LLMs. First, we present three essential tools: Perspective API [23], developed by Google Jigsaw, which detects toxicity in text; AI Fairness 360 (AIF360) [3], an open-source toolkit with various algorithms and tools; and Aequitas [39], another open-source toolkit that audits fairness and bias in LLMs, aiding data scientists and policymakers.
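For instance, a toxicity score for a model generation can be requested from Perspective API as sketched below; the endpoint and payload shape follow the API's public documentation at the time of writing, but they should be verified against the current docs, and the key is a placeholder.

```python
import requests

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder; obtain your own key from Google
URL = f"https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key={API_KEY}"

def toxicity_score(text: str) -> float:
    """Return Perspective API's summary TOXICITY score (0 to 1) for a piece of text."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Screen a model generation; higher values indicate content more likely to be toxic.
print(toxicity_score("example model output to screen"))
```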
Next, we summarize noteworthy datasets referenced in the literature, categorized into probability-based and generation-based datasets. Probability-based datasets, like WinoBias [38], BUG [28], and CrowS-Pairs [35], use template-based formats or counterfactual-based sentences. Generation-based datasets, such as RealToxicityPrompts [22] and BOLD [10], specify the first few words of a sentence and require a continuation. In addition, we will introduce TabLLM [19], a general framework for leveraging LLMs for the classification of tabular data; this approach addresses the challenge of applying LLMs to structured tabular datasets, which are used for classification tasks in high-stakes domains.
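To give a flavor of how probability-based datasets are typically used, the sketch below compares pseudo-log-likelihoods of a counterfactual sentence pair with a masked language model; the model choice, the example pair, and the scoring details are illustrative and vary across the cited benchmarks.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Any masked language model works for this style of probe; bert-base-uncased is one common choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A counterfactual sentence pair in the spirit of WinoBias / CrowS-Pairs templates.
stereotypical = "The nurse said she was tired."
anti_stereotypical = "The nurse said he was tired."
# A systematic preference for the stereotypical variant across many pairs signals bias.
print(pseudo_log_likelihood(stereotypical) - pseudo_log_likelihood(anti_stereotypical))
```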
Challenges and future directions. The tutorial concludes by exploring open research problems and future directions. First, we discuss the challenges of ensuring fairness in LLMs. Defining fairness in LLMs is complex due to diverse forms of discrimination requiring tailored approaches to quantify bias, and definitions can conflict with one another. Rational counterfactual data augmentation, a technique to mitigate bias, often produces inconsistent data quality and unnatural sentences, necessitating more sophisticated strategies. In addition, balancing performance and fairness involves adjusting the loss function with fairness constraints, but finding the optimal trade-off is challenging due to high costs and manual tuning.

For future directions, it is imperative to address multiple types of fairness concurrently, as bias in any form is undesirable. Additionally, there is a pressing need for more tailored benchmark datasets, as current datasets follow a template-based methodology that may not accurately reflect various forms of bias.

3 TARGET AUDIENCE AND PREREQUISITES FOR THE TUTORIAL
The tutorial is designed for researchers and practitioners in data mining, artificial intelligence, social science, and other interdisciplinary areas, aiming to cater to individuals with varying degrees of expertise. The prerequisites include basic knowledge of probability, linear algebra, and machine learning, while prior knowledge of algorithmic fairness or specific algorithms is not required, ensuring accessibility to beginners. The tutorial targets 40% novice, 30% intermediate, and 30% expert participants in order to achieve a good balance between introductory and advanced materials. To foster a dynamic and participatory learning environment, the tutorial will intersperse lectures with discussion sessions, encouraging attendees to engage, ask questions, and share insights. Furthermore, to extend the tutorial's reach and impact, all materials, ranging from descriptions and slides to pre-recorded videos, will be available for post-tutorial access, supporting continued education and exploration of fairness in LLMs across diverse audiences.

4 TUTORS' SHORT BIO AND EXPERTISE RELATED TO THE TUTORIAL
Thang Viet Doan is a Ph.D. student in the Knight Foundation School of Computing and Information Sciences at Florida International University. He holds a Bachelor's degree in Computer Science from Hanoi University of Science and Technology (HUST). His current research interests are mainly focused on detecting and mitigating social bias in natural language systems.

Zichong Wang is currently pursuing his Ph.D. in the Knight Foundation School of Computing and Information Sciences at Florida International University. His research is centered on mitigating inadvertent disparities resulting from the interaction of algorithms, data, and human decisions in policy development. His work has been honored with the Best Paper Award at FAccT'23 and is a candidate for the Best Paper Award at ICDM'23. Additionally, he actively contributes as a program committee member and reviewer for esteemed conferences and journals, including KDD, IJCAI, ICML, ICLR, FAccT, ECML-PKDD, ECAI, PAKDD, Machine Learning, and Information Sciences.

Minh Nhat Hoang Nguyen is a Ph.D. student at the Knight Foundation School of Computing and Information Sciences, Florida International University. He earned his Bachelor's degree in Data Science and Artificial Intelligence from Hanoi University of Science and Technology (HUST). His research focuses on detecting potential bias in machine learning algorithms and data quality, and on applying bias mitigation methods to deliver fairness in social applications.

Wenbin Zhang is an Assistant Professor in the Knight Foundation School of Computing and Information Sciences at Florida International University, and an Associate Member at the Te Ipu o te Mahara Artificial Intelligence Institute. His research investigates the theoretical foundations of machine learning with a focus on societal impact and welfare. In addition, he has worked in a number of application areas, highlighted by work on healthcare, digital forensics, geophysics, energy, transportation, forestry, and finance. He is a recipient of best paper awards/candidacies at FAccT'23, ICDM'23, DAMI, and ICDM'21, as well as the NSF CRII Award and recognition in the AAAI'24 New Faculty Highlights. He also regularly serves on organizing committees across computer science and interdisciplinary venues, most recently as Travel Award Chair at AAAI'24, Volunteer Chair at WSDM'24, and Student Program Chair at AIES'23.
5 POTENTIAL SOCIETAL IMPACTS
This tutorial possesses significant potential for positive societal impact: i) By illuminating the nuances of fairness in LLMs, it endeavors to ignite research interest and catalyze efforts aimed at advancing fairness within this domain. Given the early stage of current initiatives addressing fairness in LLMs, this tutorial stands as a pivotal milestone in galvanizing further exploration and innovation in the field. ii) Through the exploration of new challenges that remain unaddressed in the existing literature, this tutorial has the potential to inspire innovative approaches within the realm of LLM fairness. By shedding light on these issues, it aims to stimulate critical discourse and foster the development of comprehensive solutions that address the complexities inherent in ensuring fairness within LLMs. iii) In addition to addressing fairness issues, this tutorial emphasizes the importance of developing new datasets that reflect diverse and representative forms of bias. By highlighting gaps in current datasets, it encourages the creation of new ones, aiming to support more accurate and equitable LLM training processes. iv) Beyond its immediate focus on fairness in LLMs, this tutorial endeavors to extend its impact to related research topics by uncovering new problems and elucidating their interconnectedness with fairness considerations. By identifying emerging issues, it seeks to foster interdisciplinary collaboration and facilitate holistic advancements in understanding and addressing societal concerns surrounding LLMs, thus contributing to broader societal progress and well-being.

ACKNOWLEDGEMENT
This work was supported in part by the National Science Foundation (NSF) under Grant No. 2245895.

REFERENCES
[1] Abubakar Abid, Maheen Farooqi, and James Zou. 2021. Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 298–306.
[2] Afra Feyza Akyürek, Eric Pan, Garry Kuwanto, and Derry Wijaya. 2023. DUnE: Dataset for unified editing. arXiv preprint arXiv:2311.16087 (2023).
[3] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilović, et al. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development 63, 4/5 (2019), 4–1.
[4] Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of "bias" in NLP. arXiv preprint arXiv:2005.14050 (2020).
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[6] Garima Chhikara, Anurag Sharma, Kripabandhu Ghosh, and Abhijnan Chakraborty. 2024. Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification. arXiv preprint arXiv:2402.18502 (2024).
[7] Zhibo Chu, Shiwen Ni, Zichong Wang, Xi Feng, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang, and Wenbin Zhang. 2024. History, Development, and Principles of Large Language Models-An Introductory Survey. arXiv preprint arXiv:2402.06853 (2024).
[8] Zhibo Chu, Zichong Wang, and Wenbin Zhang. 2024. Fairness in Large Language Models: A Taxonomic Survey. ACM SIGKDD Explorations Newsletter (2024).
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[10] Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta. 2021. BOLD: Dataset and metrics for measuring biases in open-ended language generation. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 862–872.
[11] Harnoor Dhingra, Preetiha Jayashanker, Sayali Moghe, and Emma Strubell. 2023. Queer people are people first: Deconstructing sexual identity stereotypes in large language models. arXiv preprint arXiv:2307.00101 (2023).
[12] Thang Viet Doan, Zhibo Chu, Zichong Wang, and Wenbin Zhang. 2024. Fairness Definitions in Language Models Explained. arXiv:2407.18454 [cs.CL] https://arxiv.org/abs/2407.18454
[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. 214–226.
[14] Zahra Fatemi, Chen Xing, Wenhao Liu, and Caiming Xiong. 2021. Improving gender fairness of pre-trained language models without catastrophic forgetting. arXiv preprint arXiv:2110.05367 (2021).
[15] Rayid Ghani, Kit T Rodolfa, Pedro Saleiro, and Sérgio Jesus. 2023. Addressing bias and fairness in machine learning: A practical guide and hands-on tutorial. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5779–5780.
[16] Vipul Gupta, Pranav Narayanan Venkit, Shomir Wilson, and Rebecca J Passonneau. 2024. Sociodemographic Bias in Language Models: A Survey and Forward Path. (2024).
[17] Sara Hajian, Francesco Bonchi, and Carlos Castillo. 2016. Algorithmic bias: From discrimination discovery to fairness-aware data mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2125–2126.
[18] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems 29 (2016).
[19] Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag. 2023. TabLLM: Few-shot classification of tabular data with large language models. In International Conference on Artificial Intelligence and Statistics. PMLR, 5549–5581.
[20] Taojun Hu and Xiao-Hua Zhou. 2024. Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions. arXiv preprint arXiv:2404.09135 (2024).
[21] Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, and Pushmeet Kohli. 2019. Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064 (2019).
[22] Yue Huang, Qihui Zhang, Lichao Sun, et al. 2023. TrustGPT: A benchmark for trustworthy and responsible large language models. arXiv preprint arXiv:2306.11507 (2023).
[23] Google Jigsaw. 2017. Perspective API. https://www.perspectiveapi.com/
[24] Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki, and Timothy Baldwin. 2024. Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting. arXiv preprint arXiv:2401.15585 (2024).
[25] Hadas Kotek, Rikker Dockum, and David Sun. 2023. Gender bias and stereotypes in large language models. In Proceedings of The ACM Collective Intelligence Conference. 12–24.
[26] Anne Lauscher, Tobias Lueken, and Goran Glavaš. 2021. Sustainable modular debiasing of language models. arXiv preprint arXiv:2109.03646 (2021).
[27] Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021).
[28] Shahar Levy, Koren Lazar, and Gabriel Stanovsky. 2021. Collecting a large-scale gender bias dataset for coreference resolution and machine translation. arXiv preprint arXiv:2109.03858 (2021).
[29] Y Li, M Du, R Song, X Wang, and Y Wang. 2023. A survey on fairness in large language models. arXiv preprint arXiv:2308.10149 (2023).
[30] Yunqi Li, Yingqiang Ge, and Yongfeng Zhang. 2021. Tutorial on fairness of machine learning in recommender systems. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2654–2657.
[31] Yunqi Li and Yongfeng Zhang. 2023. Fairness of ChatGPT. arXiv preprint arXiv:2305.18569 (2023).
[32] Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. 2022. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110 (2022).
[33] Justus Mattern, Zhijing Jin, Mrinmaya Sachan, Rada Mihalcea, and Bernhard Schölkopf. 2022. Understanding stereotypes in language models: Towards robust measurement and zero-shot debiasing. arXiv preprint arXiv:2212.10678 (2022).
[34] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. 2021. Fast model editing at scale. arXiv preprint arXiv:2110.11309 (2021).
[35] Aurélie Névéol, Yoann Dupont, Julien Bezançon, and Karën Fort. 2022. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8521–8531.
[36] SunYoung Park, Kyuri Choi, Haeun Yu, and Youngjoong Ko. 2023. Never too late to learn: Regularizing gender bias in coreference resolution. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 15–23.
[37] Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, and Yoav Goldberg. 2020. Null it out: Guarding protected attributes by iterative nullspace projection. arXiv preprint arXiv:2004.07667 (2020).
[38] Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301 (2018).
[39] Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T Rodolfa, and Rayid Ghani. 2018. Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577 (2018).
[40] Pedro Saleiro, Kit T Rodolfa, and Rayid Ghani. 2020. Dealing with bias and fairness in data science systems: A practical hands-on tutorial. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3513–3514.
[41] Ewoenam Kwaku Tokpo and Toon Calders. 2022. Text style transfer for bias mitigation using masked language modeling. arXiv preprint arXiv:2201.08643 (2022).
[42] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[43] Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael R Lyu. 2023. BiasAsker: Measuring the bias in conversational AI system. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 515–527.
[44] Zichong Wang, Giri Narasimhan, Xin Yao, and Wenbin Zhang. 2023. Mitigating multisource biases in graph neural networks via real counterfactual samples. In 2023 IEEE International Conference on Data Mining (ICDM). IEEE, 638–647.
[45] Zichong Wang, Nripsuta Saxena, Tongjia Yu, Sneha Karki, Tyler Zetty, Israat Haque, Shan Zhou, Dukka Kc, Ian Stockwell, Xuyu Wang, et al. 2023. Preventing discriminatory decision-making in evolving data streams. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 149–159.
[46] Zichong Wang, Charles Wallace, Albert Bifet, Xin Yao, and Wenbin Zhang. 2023. FG2AN: Fairness-aware graph generative adversarial networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 259–275.
[47] Vithya Yogarajan, Gillian Dobbie, Te Taka Keegan, and Rostam J Neuwirth. 2023. Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies. arXiv preprint arXiv:2312.01509 (2023).
[48] Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Is ChatGPT fair for recommendation? Evaluating fairness in large language model recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems. 993–999.
[49] Wenbin Zhang, Zichong Wang, Juyong Kim, Cheng Cheng, Thomas Oommen, Pradeep Ravikumar, and Jeremy Weiss. 2023. Individual fairness under uncertainty. In ECAI 2023. IOS Press, 3042–3049.
