2023.ijcnlp Tutorials.3
Abstract
Generative language technologies have become integral to everyday communication, shaping social interactions and informing critical decision-making processes in areas such as recruitment, healthcare, and education. However, they often struggle to grasp the "long tail" of data distributions — concepts less frequently observed during training — which could have significant repercussions. These models may marginalize underrepresented groups, by failing to comprehend preferred communication styles such as code-switching, or by perpetuating societal biases like gender bias. Sectors like healthcare, education, and law, requiring personalization and exhibiting nuanced linguistic features, are also particularly affected when pre-trained models misconstrue or overlook "long tail" data concepts. While methods like distillation of smaller language models, active learning, and other bias mitigation strategies can augment traditional training techniques, a careful statistical analysis is essential for their effective application. This tutorial offers a comprehensive examination of how to develop equitable, robust, and inclusive language technologies using statistical tools from Domain Adaptation (DA) that catalyze positive social change. We will delve into strategies for bias mitigation, explore how to measure bias, and examine open problems in creating culturally-grounded and inclusive language technologies. Accompanying code notebooks and packages will be provided.¹

¹https://github.com/anthonysicilia/AACL2023-DA4GenerativeAI

Figure 1: DA theory quantifies key properties of text data to inform us about model generalization; e.g., it can identify the long tail to promote equitable text generation for underrepresented groups.

1 Introduction

Large language models are increasingly deployed in critical areas of our daily life. Applications can improve health literacy (Ufuk, 2023), offer new avenues for improved education (Kasneci et al., 2023), and yield new legal technologies (Chalkidis et al., 2020). Meanwhile, as the complexity of these models increases, robust decision making, algorithm design, and evaluation become more and more important. It is vital that under-served demographics are not left behind in the wake of this technological wave – e.g., by supporting user-specific behaviors like code-switching (Harrington and Egede, 2023), low-resource languages like American Sign Language (Inan et al., 2022), and equitable language use (Mayfield et al., 2019).

While we still have much to learn about new generative technologies (Rogers et al., 2020), what we do know can be alarming. For example, these models typically fail to learn infrequent data concepts in the long tail of text distributions (Kandpal et al., 2022). Indeed, this can lead to unfortunate, unintended outcomes such as social inequities (Bolukbasi et al., 2016), abysmal lexical diversity (Shekhar et al., 2019), or hard-to-resolve toxicity issues (Xu et al., 2021). All this is to say, without doubt, our use of machine learning as a tool has outpaced our understanding of this tool in many ways. For robust, responsible deployment of generative AI, we need a principled means of analysis. This tutorial aims to meet this demand, proposing domain adaptation (DA) theory as a mechanism to study the nuanced data issues that plague our models; e.g., the linguistic and societal biases induced by long-tailed data. We cover statistical tools for:

1. training generative models with reinforcement learning, multi-agent techniques, distillation, traditional supervision, and more;
2. evaluating the equity, human-likeness, inclusivity, and robustness of generative models;

3. and, decision making in small data regimes – e.g., model and dataset selection strategies.

Our accessible presentation of these tools can help to enable more robust deployment of generative AI.

Figure 2: Overview of planned topics. Application of DA to text generation enables more than just obvious applications (e.g., model transfer). This tutorial focuses on these new emerging applications for generative AI.

While DA theory first appeared at *ACL venues over a decade ago (Blitzer et al., 2007), recently, more and more contemporary works have seen the benefit of carefully analyzing impacts of data-shift on their models. This is for good reason. DA theory allows us to answer complex questions like:

• Will a pre-trained model generalize to my data?
• Can I improve generalization without much data?
• Is my corpus even large enough to measure bias and other language errors of a model?
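To make the first question concrete, consider the classifier-based statistical distances of DA theory (cf. Ben-David et al., 2010): train a classifier to distinguish source from target examples, and read near-chance accuracy as evidence that the two domains look alike to the model's features, so transfer is more plausible. Below is a minimal sketch of this idea; it is our own illustration rather than the tutorial's released notebooks, with synthetic features standing in for real text embeddings.

```python
# A minimal sketch of a classifier-based distance (cf. Ben-David et al., 2010).
# Held-out accuracy of a source-vs-target domain classifier estimates how
# separable (i.e., how "far apart") the two domains are.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_a_distance(source_feats, target_feats):
    X = np.vstack([source_feats, target_feats])
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    return 2.0 * (2.0 * acc - 1.0)  # ~0 at chance accuracy; larger means more shift

# Synthetic stand-ins for, e.g., pooled hidden states of a language model.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (200, 16))
tgt = rng.normal(0.5, 1.0, (200, 16))
print(proxy_a_distance(src, tgt))
```

In practice, the features would come from the model under study, and the resulting distance plugs directly into the adaptation bounds covered in topic 2 of the plan below.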
Despite its utility, the use of DA theory is not widespread – a quick keyword search on the ACL Anthology provided fewer than 10 papers at *ACL venues (excluding our own) that employ DA theory² or related techniques. This tutorial aims to bring awareness to emerging applications of DA theory to equitable, inclusive, and robust generation in an accessible way – connecting DA to more contemporary works whenever applicable.

²We distinguish between more common DA applications and theoretical foundations; e.g., as in Redko et al. (2020).

2 Overview of Topics

We give a presentation plan next. For most topics, we highlight application areas or detailed questions we aim to address (marked ⋆) and also provide potential reading lists (marked "Reading").

1. Inclusive Generation: Setup and Motivation
• Language Models and Data Sources
⋆ Which end-users are left behind?
Reading: Brown et al. (2020)
• Example: Summarizing Medical Records
⋆ Are domains with limited data impacted?
Reading: Phan et al. (2023)
• Example: Personalized Education
⋆ Can we personalize generative models for individualized student experiences?
Reading: Hu et al. (2008)
• Example: Assistive Legal Technologies
⋆ Can generative models be robust to specification (e.g., locality) in legal applications?
Reading: Abdallah et al. (2023)
• Example: Inclusive and Accessible Dialogue
⋆ Can generative models support users with different preferences and capabilities?
Reading: Sicilia et al. (2023); Inan et al. (2022)

2. Domain Adaptation Theory: The Basics
• Learning Theory and Adaptation Bounds
Reading: Redko et al. (2020)
• Classifier-based Statistical Distances
Reading: Ben-David et al. (2010); Sicilia et al. (2022a)
• Measuring Model Data-Efficiency
Reading: Shalev-Shwartz and Ben-David (2014); Sicilia et al. (2021c)
• Domain Adaptation for Generative Models
Reading: Sicilia and Alikhani (2022)

3. Inclusive Text-Generation Algorithms
• Adversarial Training for Domain Alignment
⋆ Application Areas: Unsupervised and Semi-supervised Summarization
Reading: Ganin et al. (2016); Chen and Chen (2019)
• Other Ways to Align: Semantics and Tokens
⋆ Application Areas: Out-of-Domain Machine Translation and Low Resource Languages
Reading: Štefánik et al. (2023); Phan et al. (2023)
• Adapters and Adapter Soups
⋆ Application Area: Adapting Language Models to New Domains without Training
Reading: Chronopoulou et al. (2022, 2023)
• Augmentation with Generative Models
⋆ Applications: Semi-supervised Question-Answering, Accessible Dialogue, Counseling
Reading: Yang et al. (2017); Parthasarathi et al. (2020); Shen et al. (2020); Inan et al. (2022)
• Instance Weighting for Generative AI (a code sketch follows this list)
⋆ Applications: Out-of-Domain Machine Translation and Personalized Dialogue
Reading: Wang et al. (2017); Welch et al. (2022)
• Domain Adaptive MLM Objectives
⋆ Applications: Mental Health Risk Prediction and other Healthcare Tasks
Reading: Aragon et al. (2023); Lu et al. (2023)
4. Computational Techniques (Activity)
• Confidence Intervals and Significance (a code sketch follows this list)
⋆ Is my test set large enough?
Reading: Shalev-Shwartz and Ben-David (2014)
• Uncertainty and Confidence for Fairness
⋆ Is my model fair to protected demographics? Do I even have enough data to determine this?
Reading: Ethayarajh (2020)
• Transferring Models across Text-Genres
⋆ How can I pick datasets when transferring models to small data regimes like medicine?
Reading: Blitzer et al. (2007); Atwell et al. (2022)
• Supplementing Expertise with Bronze Labels
⋆ What's the best annotation protocol when (domain expert) gold labels are too expensive?
Reading: Hao and Paul (2019); Elsahar and Gallé (2019); He et al. (2021)

5. Equitable Text-Generation
• Bias, Representational Harm, & Task Success
Reading: Mayfield et al. (2019); Harrington and Egede (2023)
• Defining Bias and Equity in Text-Generation
Reading: Hendricks et al. (2018); Das and Balke (2022); Sicilia and Alikhani (2023)
• Representation Learning and Bias Projection
⋆ Applications: Mitigating Social Bias in Text Embedding and Masked Language Modeling
Reading: Vargas and Cotterell (2020); Yu et al. (2023); Kumar et al. (2023)
• Data Augmentation and Interventions (a code sketch follows this list)
⋆ Applications: Toxicity Reduction in Masked Language Models and Equitable Distillation
Reading: Sun et al. (2019); Thakur et al. (2023)
• Reinforcement Learning and Self-Play
⋆ Applications: Morality, Toxicity, and Bias in Language Models; Bias in Dialogue Systems
Reading: Liu et al. (2022); Madanagopal and Caverlee (2023); Sicilia and Alikhani (2023)

6. Future Work: TBA, Time Permitting
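As flagged in topic 3, here is a minimal sketch of instance weighting for generative models, in the spirit of Wang et al. (2017); the simplification is our own, and the weights are assumed to be precomputed (e.g., from the ratio of target- vs. source-domain language model scores).

```python
# A minimal sketch of instance-weighted training for a generative model
# (in the spirit of Wang et al., 2017; this simplification is our own).
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, targets, weights):
    """Language-modeling cross-entropy, up-weighting target-like examples.

    logits: (batch, seq, vocab); targets: (batch, seq); weights: (batch,)
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # (batch, seq)
    per_example = per_token.mean(dim=1)                      # (batch,)
    return (weights * per_example).mean()

# Toy usage with random tensors standing in for a real model's outputs.
logits = torch.randn(4, 10, 100)
targets = torch.randint(0, 100, (4, 10))
weights = torch.tensor([1.5, 0.5, 1.0, 2.0])  # higher = more target-like
print(weighted_lm_loss(logits, targets, weights))
```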
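For the "Is my test set large enough?" question in topic 4, the flavor of computation is previewed below: a two-sided Hoeffding bound (a standard tool covered by Shalev-Shwartz and Ben-David, 2014) converts a test-set size into a confidence radius around measured accuracy, and vice versa. The numbers are our own illustration.

```python
# A back-of-the-envelope sketch: with probability 1 - delta, accuracy measured
# on n i.i.d. test examples is within eps = sqrt(log(2/delta) / (2n)) of the
# true accuracy (two-sided Hoeffding bound).
import math

def hoeffding_radius(n, delta=0.05):
    return math.sqrt(math.log(2 / delta) / (2 * n))

def required_n(eps, delta=0.05):
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

print(hoeffding_radius(1000))  # ~0.043, i.e., about +/- 4.3 accuracy points
print(required_n(0.01))        # ~18445 examples for a +/- 1 point estimate
```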
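Finally, for the data augmentation and interventions item in topic 5, here is a toy sketch of counterfactual (gender-swap) augmentation, one strategy surveyed by Sun et al. (2019); a real system would need curated word lists and handling of names, coreference, and morphology.

```python
# A toy sketch of counterfactual data augmentation (one strategy surveyed by
# Sun et al., 2019). Deliberately naive: e.g., "her" is ambiguous between
# "him" and "his" and needs real NLP machinery to resolve.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her", "hers": "his",
    "man": "woman", "woman": "man",
}

def counterfactual(sentence):
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

corpus = ["he is a doctor", "she stayed home"]
augmented = corpus + [counterfactual(s) for s in corpus]
# Training on `augmented` exposes the model to both gendered variants of each
# context, weakening the spurious gender signal it can exploit.
print(augmented)
```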
3 Tutorial Type and Length

This tutorial is meant to be a cutting-edge tutorial and is meant to fill a 3-hour time slot.

Cutting Edge While DA theory has been well studied in ML theory communities, practical application for inclusivity, equity, and robustness of generative AI is an emerging area. Indeed, most of the reading list has been published in *ACL venues across the last few years. While similar areas have been discussed in past tutorials (e.g., transfer learning and learning with limited data), the focus of this tutorial is on more rigorous theoretical aspects of DA and how these techniques can be applied in the, perhaps, unexpected area of equitable and inclusive generation. Our tutorial will also pay particular attention to large language models.

Timing We anticipate each of the 6 numbered top-level sections will take roughly 20 minutes, leaving extra time for questions and longer sections. Every 2 sections can be followed by a break.

4 Prerequisite Knowledge

Some familiarity with text-generation techniques and related tasks is recommended. The tutorial content will be accessible to senior undergraduate, master's, and PhD students. In particular, we assume no attendee will have experience with DA theory, and plan to explain adaptation bounds and their distribution distances in an accessible way, giving preference to visualizations and high-level descriptions (over detailed equations). If desired, attendees can explore these topics further after the tutorial, using take-home resources provided during the talk or on the tutorial website (e.g., python packages, papers, surveys, etc.).³

³https://github.com/anthonysicilia/AACL2023-DA4GenerativeAI

5 Related Tutorials

No tutorial on DA theory for inclusive and equitable generation has been provided at an *ACL venue. With that said, recent tutorials have related motivation and complementary coverage.

Dyer et al. (2016); Church et al. (2022) and multiple other tutorials have previously considered deep neural networks for NLP. Deep networks have become a dominating trend and, as noted, their complexity poses issues for confident, responsible decision making as it pertains to training and deploying these models for generative applications. Our tutorial complements these existing tutorials, and pays careful attention to tools from DA theory specifically designed for large language models (Sicilia et al., 2022a). Our hope is to make application of these models more robust.

Chien (2019) presents a tutorial on deep Bayesian techniques, Ruder et al. (2019) present a tutorial on transfer learning, Yang et al. (2022) present a tutorial on learning with limited data, and
Fisch et al. (2022) present a tutorial on uncertainty estimation. These tutorials set the stage for our proposed tutorial, since DA theory provides rigorous solutions to many of the problems posed within these topics. As such, we do expect some topical overlap, but all of the techniques and solutions we present to attendees are likely to be new. Attendees interested in these previous tutorials will benefit from seeing how DA theory can be applied to solve their problems in a new way.

Tripodi and Pelillo (2016) present a tutorial on game theory, Belinkov et al. (2020) present a tutorial on interpretability, and Lucic et al. (2022) present a tutorial on reproducible ML. Each of these tutorials shares a common theme with our proposed tutorial: making NLP more robust through principled analyses. Similar to these tutorials, we will provide the tools for NLP practitioners applying ML to rigorously justify their decision making processes and algorithm designs.

Finally, Chang et al. (2019) present a tutorial on bias and fairness in NLP. Our tutorial complements this previous tutorial in topic, but presents a new perspective: the application of DA theory to this area with a focus on large generative models.

6 Instructors

Anthony Sicilia is a 5th-year Ph.D. student specializing in applications of learning theory and domain adaptation theory to NLP problems such as inclusivity, equity, and robustness. He has experience in practical deployment of NLP systems, leading an Alexa Prize TaskBot team (focused on inclusivity and collaboration) to 3rd place overall in this international contest. He has published 4 papers on robust NLP at *ACL venues, which are present in the reading list: Atwell et al. (2022); Sicilia and Alikhani (2022); Sicilia et al. (2022b); Sicilia and Alikhani (2023). He also received a best paper award at UAI 2022 for his work on novel PAC-Bayesian DA theory for multiclass neural networks (Sicilia et al., 2022a). His work spans application of DA theory to diverse areas such as: analysis of the impact of data-shift on parsers and sentiment classifiers, dialogue management and generation in non-cooperative multi-objective environments, causal analysis of the impact of model/dataset properties on discourse analysis, human-like dialogue management and generation, equitable dialogue management and generation, evaluation of both human-likeness and equity in dialogue, and quantification of linguistic and social biases in large language models. Previously, he also applied learning theory in vision, especially small-data medical applications with a primary focus on bias mitigation and robust model evaluation (Sicilia et al., 2021a,b,c; Zhao et al., 2022).

Malihe Alikhani is an expert in natural language processing (NLP) and machine learning. Alikhani's research interests center on using representations of communicative structure to improve ethical and practical machine learning models. One of the main focuses of her recent research has been on studying formal methods of machine learning for designing equitable and robust NLP tasks. This includes using tools from learning theory for efficient dialogue management, text generation, classification, and measuring and mitigating biases in generation and classification tasks (Atwell et al., 2022; Sicilia and Alikhani, 2022; Sicilia et al., 2022b; Atwell et al., 2021; Sicilia and Alikhani, 2023; Sicilia et al., 2022a). Her work in these areas has received three best paper awards, at UAI 2022, ACM UMAP 2022, and INLG 2021.

She has designed several task-oriented dialogue systems and conversational QA models (Khalid et al., 2020b,a; Sicilia et al., 2022b). Her work has explored data-driven modeling of inferential links in text and imagery (Alikhani and Stone, 2019), neural controllable description generation models for images (Alikhani et al., 2020b), datasets and models of coherent diagram interpretation (Alikhani and Stone, 2018; Hiippala et al., 2021), and interpretation of multimodal pointing actions in human-robot collaboration (Alikhani et al., 2020a). She has worked on distributional semantic approaches for modeling lexical aspect of verbs in English and six other languages (Kober et al., 2020). She has also been involved in various projects for studying the cognitive science of language use (Persaud et al., 2017) and formal language and automata, including probabilistic models of success runs in Markov independent trials (Alikhani et al., 2015). Alikhani has collected several corpora annotated by crowdworkers and expert linguists in the areas of discourse, multimodality, dialogue, human-robot interaction, and psycholinguistics (Alikhani and Stone, 2019; Hiippala et al., 2021; Alikhani and Stone, 2018; Alikhani et al., 2019a, 2020a). She has designed software for annotation, and formal and ML models for studying communicative intents and the context of human-machine communication (Alikhani et al., 2019b; Khalid et al., 2020b).
References

Abdelrahman Abdallah, Bhawna Piryani, and Adam Jatowt. 2023. Exploring the state of the art in legal QA systems. arXiv preprint arXiv:2304.06623.

Malihe Alikhani, Sreyasi Nag Chowdhury, Gerard de Melo, and Matthew Stone. 2019a. CITE: A corpus of image-text discourse relations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 570–575.

Malihe Alikhani, Baber Khalid, Rahul Shome, Chaitanya Mitash, Kostas Bekris, and Matthew Stone. 2020a. That and there: Judging the intent of pointing actions with robotic arms. In Proceedings of the AAAI Conference on Artificial Intelligence, 34(06):10343–10351.

Malihe Alikhani, Bjørn Kjos-Hanssen, Amirarsalan Pakravan, and Babak Saadat. 2015. Pricing complexity options. Algorithmic Finance, 4(3-4):127–137.

Malihe Alikhani, Ethan Selfridge, Matthew Stone, and Michael Johnston. 2019b. Multimodal decisions for conversational AI. In Submission.

Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, and Matthew Stone. 2020b. CLUE: Cross-modal coherence modeling for caption generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6525–6535.

Malihe Alikhani and Matthew Stone. 2018. Arrows are the verbs of diagrams. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3552–3563.

Malihe Alikhani and Matthew Stone. 2019. "Caption" as a coherence relation: Evidence and implications. In Second Workshop on Shortcomings in Vision and Language (SiVL).

Mario Aragon, Adrián Pastor López Monroy, Luis Gonzalez, David E Losada, and Manuel Montes. 2023. DisorBERT: A double domain adaptation model for detecting signs of mental disorders in social media. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15305–15318.

Katherine Atwell, Junyi Jessy Li, and Malihe Alikhani. 2021. Where are we in discourse relation recognition? In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 314–325, Singapore and Online. Association for Computational Linguistics.

Katherine Atwell, Anthony Sicilia, Seong Jae Hwang, and Malihe Alikhani. 2022. The change that matters in discourse parsing: Estimating the impact of domain shift on parser error. In Findings of the Association for Computational Linguistics: ACL 2022, pages 824–845, Dublin, Ireland. Association for Computational Linguistics.

Yonatan Belinkov, Sebastian Gehrmann, and Ellie Pavlick. 2020. Interpretability and analysis in neural NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 1–5, Online. Association for Computational Linguistics.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1):151–175.

John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, Prague, Czech Republic. Association for Computational Linguistics.

Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. Legal-BERT: The muppets straight out of law school. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904.

Kai-Wei Chang, Vinodkumar Prabhakaran, and Vicente Ordonez. 2019. Bias and fairness in natural language processing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts, Hong Kong, China. Association for Computational Linguistics.

Francine Chen and Yan-Ying Chen. 2019. Adversarial domain adaptation using artificial titles for abstractive title generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2197–2203, Florence, Italy. Association for Computational Linguistics.

Jen-Tzung Chien. 2019. Deep Bayesian natural language processing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 25–30, Florence, Italy. Association for Computational Linguistics.

Alexandra Chronopoulou, Matthew Peters, and Jesse Dodge. 2022. Efficient hierarchical domain adaptation for pretrained language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1336–1351, Seattle, United States. Association for Computational Linguistics.

Alexandra Chronopoulou, Matthew E Peters, Alexander Fraser, and Jesse Dodge. 2023. AdapterSoup: Weight averaging to improve generalization of pretrained language models. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2009–2018.

Kenneth Church, Valia Kordoni, Gary Marcus, Ernest Davis, Yanjun Ma, and Zeyu Chen. 2022. A gentle introduction to deep nets and opportunities for the future. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 1–6, Dublin, Ireland. Association for Computational Linguistics.

Mayukh Das and Wolf-Tilo Balke. 2022. Quantifying bias from decoding techniques in natural language generation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1311–1323, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.

Chris Dyer, Yoav Goldberg, and Graham Neubig. 2016. Practical neural networks for NLP: From theory to code. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Austin, Texas. Association for Computational Linguistics.

Hady Elsahar and Matthias Gallé. 2019. To annotate or not? Predicting performance drop under domain shift. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2163–2173, Hong Kong, China. Association for Computational Linguistics.

Kawin Ethayarajh. 2020. Is your classifier actually biased? Measuring fairness under uncertainty with Bernstein bounds. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2914–2919, Online. Association for Computational Linguistics.

Adam Fisch, Robin Jia, and Tal Schuster. 2022. Uncertainty estimation for natural language processing. In COLING.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2130.

Shudong Hao and Michael J. Paul. 2019. Analyzing Bayesian crosslingual transfer in topic models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1551–1565, Minneapolis, Minnesota. Association for Computational Linguistics.

Christina N Harrington and Lisa Egede. 2023. Trust, comfort and relatability: Understanding black older adults' perceptions of chatbot design for health information seeking. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–18.

Hangfeng He, Mingyuan Zhang, Qiang Ning, and Dan Roth. 2021. Foreseeing the benefits of incidental supervision. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1782–1800, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, and Anna Rohrbach. 2018. Women also snowboard: Overcoming bias in captioning models. In Proceedings of the European Conference on Computer Vision (ECCV), pages 771–787.

Tuomo Hiippala, Malihe Alikhani, Jonas Haverinen, Timo Kalliokoski, Evanfiya Logacheva, Serafina Orekhova, Aino Tuomainen, Matthew Stone, and John A Bateman. 2021. AI2D-RST: A multimodal corpus of 1000 primary school science diagrams. Language Resources and Evaluation, 55(3):661–688.

Dawei Hu, Wei Chen, Qingtian Zeng, Tianyong Hao, Feng Min, and Liu Wenyin. 2008. Using a user-interactive QA system for personalized e-learning. International Journal of Distance Education Technologies (IJDET), 6(3):1–22.

Mert Inan, Yang Zhong, Sabit Hassan, Lorna Quandt, and Malihe Alikhani. 2022. Modeling intensification for sign language generation: A computational approach. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2897–2911, Dublin, Ireland. Association for Computational Linguistics.

Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2022. Large language models struggle to learn long-tail knowledge. arXiv preprint arXiv:2211.08411.

Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, et al. 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274.

Baber Khalid, Malihe Alikhani, Michael Fellner, Brian McMahan, and Matthew Stone. 2020a. Discourse coherence, reference grounding and goal oriented dialogue. In Proceedings of the 24th Workshop on the Semantics and Pragmatics of Dialogue - Full Papers, Virtually at Brandeis, Waltham, New Jersey. SEMDIAL.

Baber Khalid, Malihe Alikhani, and Matthew Stone. 2020b. Combining cognitive modeling and reinforcement learning for clarification in dialogue. In Proceedings of the 28th International Conference on Computational Linguistics.

Thomas Kober, Malihe Alikhani, Matthew Stone, and Mark Steedman. 2020. Aspectuality across genre: A distributional semantics approach.

Deepak Kumar, Oleg Lesota, George Zerveas, Daniel Cohen, Carsten Eickhoff, Markus Schedl, and Navid Rekabsaz. 2023. Parameter-efficient modularised bias mitigation via AdapterFusion. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2738–2751, Dubrovnik, Croatia. Association for Computational Linguistics.

Ruibo Liu, Ge Zhang, Xinyu Feng, and Soroush Vosoughi. 2022. Aligning generative language models with human values. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 241–252, Seattle, United States. Association for Computational Linguistics.

Keming Lu, Peter Potash, Xihui Lin, Yuwen Sun, Zihan Qian, Zheng Yuan, Tristan Naumann, Tianxi Cai, and Junwei Lu. 2023. Prompt discriminative language models for domain adaptation. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 247–258.

Ana Lucic, Maurits Bleeker, Samarth Bhargav, Jessica Forde, Koustuv Sinha, Jesse Dodge, Sasha Luccioni, and Robert Stojnic. 2022. Towards reproducible machine learning research in natural language processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 7–11, Dublin, Ireland. Association for Computational Linguistics.

Karthic Madanagopal and James Caverlee. 2023. Reinforced sequence training based subjective bias correction. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2585–2598, Dubrovnik, Croatia. Association for Computational Linguistics.

Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, and Alan W Black. 2019. Equity beyond bias in language technologies for education. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 444–460, Florence, Italy. Association for Computational Linguistics.

Prasanna Parthasarathi, Sharan Narang, and Arvind Neelakantan. 2020. On task-level dialogue composition of generative transformer model. In Proceedings of the First Workshop on Insights from Negative Results in NLP, pages 41–47.

Kimele Persaud, Brian McMahan, Malihe Alikhani, Kevin Pei, Pernille Hemmer, and Matthew Stone. 2017. When is likely unlikely: Investigating the variability of vagueness. In Proceedings of the Cognitive Science Society Conference.

Long Phan, Tai Dang, Hieu Tran, Trieu Trinh, Vy Phan, Lam Chau, and Minh-Thang Luong. 2023. Enriching biomedical knowledge for low-resource language through large-scale translation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3123–3134.

Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, and Younès Bennani. 2020. A survey on domain adaptation theory: learning bounds and theoretical guarantees. arXiv preprint arXiv:2004.11829.

Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866.

Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, and Thomas Wolf. 2019. Transfer learning in natural language processing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pages 15–18, Minneapolis, Minnesota. Association for Computational Linguistics.

Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.

Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, and Raquel Fernández. 2019. Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2578–2587, Minneapolis, Minnesota. Association for Computational Linguistics.

Siqi Shen, Charles Welch, Rada Mihalcea, and Verónica Pérez-Rosas. 2020. Counseling-style reflection generation using generative pretrained transformers with augmented context. In Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 10–20, 1st virtual meeting. Association for Computational Linguistics.

Anthony Sicilia and Malihe Alikhani. 2022. LEATHER: A framework for learning to generate human-like text in dialogue. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 30–53.

Anthony Sicilia and Malihe Alikhani. 2023. Learning to generate equitable text in dialogue from biased training data. In Annual Meeting of the Association for Computational Linguistics 2023.

Anthony Sicilia, Katherine Atwell, Malihe Alikhani, and Seong Jae Hwang. 2022a. PAC-Bayesian domain adaptation bounds for multiclass learners. In The 38th Conference on Uncertainty in Artificial Intelligence.

Anthony Sicilia, Jennifer C Gates, and Malihe Alikhani. 2023. How old is GPT?: The HumBEL framework for evaluating language models using human demographic data. arXiv preprint arXiv:2305.14195.

Anthony Sicilia, Tristan Maidment, Pat Healy, and Malihe Alikhani. 2022b. Modeling non-cooperative dialogue: Theoretical and empirical insights. Transactions of the Association for Computational Linguistics, 10:1084–1102.

Anthony Sicilia, Xingchen Zhao, and Seong Jae Hwang. 2021a. Domain adversarial neural networks for domain generalization: When it works and how to improve. arXiv preprint arXiv:2102.03924.

Anthony Sicilia, Xingchen Zhao, Davneet S Minhas, Erin E O'Connor, Howard J Aizenstein, William E Klunk, Dana L Tudorascu, and Seong Jae Hwang. 2021b. Multi-domain learning by meta-learning: Taking optimal steps in multi-domain loss landscapes by inner-loop learning. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pages 650–654. IEEE.

Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, and Seong Jae Hwang. 2021c. PAC Bayesian performance guarantees for deep (stochastic) networks in medical imaging. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, pages 560–570. Springer.

Michal Štefánik, Marek Kadlcik, and Petr Sojka. 2023. Soft alignment objectives for robust adaptation of language generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8837–8853, Toronto, Canada. Association for Computational Linguistics.

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating gender bias in natural language processing: Literature review. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1630–1640, Florence, Italy. Association for Computational Linguistics.

Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, and Louis-Philippe Morency. 2023. Language models get a gender makeover: Mitigating gender bias with few-shot data interventions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 340–351, Toronto, Canada. Association for Computational Linguistics.

Rocco Tripodi and Marcello Pelillo. 2016. Game theory and natural language: Origin, evolution and processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, Berlin, Germany. Association for Computational Linguistics.

Furkan Ufuk. 2023. The role and limitations of large language models such as ChatGPT in clinical settings and medical journalism. Radiology, 307(3):e230276.

Francisco Vargas and Ryan Cotterell. 2020. Exploring the linear subspace hypothesis in gender bias mitigation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2902–2913, Online. Association for Computational Linguistics.

Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. 2017. Instance weighting for neural machine translation domain adaptation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1482–1488, Copenhagen, Denmark. Association for Computational Linguistics.

Charles Welch, Chenxi Gu, Jonathan K. Kummerfeld, Veronica Perez-Rosas, and Rada Mihalcea. 2022. Leveraging similar users for personalized language modeling with limited data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1742–1752, Dublin, Ireland. Association for Computational Linguistics.

Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, and Dan Klein. 2021. Detoxifying language models risks marginalizing minority voices. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2390–2397, Online. Association for Computational Linguistics.

Diyi Yang, Ankur Parikh, and Colin Raffel. 2022. Learning with limited text data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 28–31, Dublin, Ireland. Association for Computational Linguistics.

Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, and William Cohen. 2017. Semi-supervised QA with generative domain-adaptive nets. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1040–1050, Vancouver, Canada. Association for Computational Linguistics.

Charles Yu, Sullam Jeoung, Anish Kasi, Pengfei Yu, and Heng Ji. 2023. Unlearning bias in language models by partitioning gradients. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6032–6048.

Xingchen Zhao, Chang Liu, Anthony Sicilia, Seong Jae Hwang, and Yun Fu. 2022. Test-time Fourier style calibration for domain generalization. arXiv preprint arXiv:2205.06427.