
EEE 486/586

2024-2025 SPRING
-----------

TOPICS for TERM PROJECTS

● Students are expected to propose their term projects within the themes given below; each theme
has a description and a starter reading list.

● Students are expected to form teams of 1-4 students by themselves. Teams may freely include both
undergraduate and graduate students. The details for project proposals will be announced later. This
document is intended only to introduce the project themes that you can work on.

● You can use the Moodle utilities to start discussions on finding group members by coordinating with
the TA.
1) Graph Neural Networks for NLP

Graph signal processing (GSP) is a fast-developing field that studies data residing on irregular
structures such as social, sensor, and biological networks, in which data can be modelled at the
vertices of a graph. Unlike classical signal processing, which handles regular signal structures such
as time series and image data, GSP can be used to analyze a wider range of data that can be
modeled as signals defined on non-Euclidean domains. Graphs are among the most expressive data
structures and have been used to model a variety of problems. Graph embedding involves learning
a representation of all nodes (and relations) in the graph, which allows graphs to be used effectively
for various downstream problems.

Traditional neural networks such as convolutional and recurrent neural networks are constrained to
learning representations of Euclidean data. Graph Neural Networks (GNNs) have
emerged as a versatile framework for modeling unstructured data, making them highly effective for
various NLP tasks. Many NLP problems naturally involve hierarchical or relational data structures
such as syntactic dependency trees, semantic graphs, or co-occurrence networks, which GNNs can
exploit to capture complex dependencies and enrich task-specific representations.

Potential directions within this theme include:

● Graph Construction: Building text graphs from syntactic trees, semantic dependencies, or
document-level relationships.
● Hybrid Architectures: Combining GNNs with transformer models (e.g., BERT-GCN) to enhance
embeddings with relational information.
● Graph-based Interpretability: Using GNNs to explore relationships between linguistic elements
and interpret model predictions.
● Applications: Tackling tasks like document clustering, few-shot learning, graph-based text
summarization, or dynamic graph modeling for evolving datasets.
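
As a concrete, minimal illustration of the graph-construction and hybrid-architecture directions above,
the Python/NumPy sketch below builds a small word co-occurrence graph from two toy sentences and
applies one graph-convolution step with a symmetrically normalized adjacency matrix. The sentences,
window size, and random feature initialization are placeholders for illustration only, not a prescribed
setup.

import numpy as np

# Toy corpus; a real project would use a text-classification dataset.
sentences = [["deep", "learning", "for", "text"],
             ["graph", "neural", "networks", "for", "text"]]

# 1) Graph construction: nodes are unique words, edges are co-occurrences
#    within a sliding window of the two preceding words.
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}
n = len(vocab)
A = np.zeros((n, n))
for s in sentences:
    for i, w in enumerate(s):
        for v in s[max(0, i - 2):i]:
            A[idx[w], idx[v]] = A[idx[v], idx[w]] = 1.0

# 2) One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
A_hat = A + np.eye(n)                        # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
rng = np.random.default_rng(0)
H = rng.normal(size=(n, 16))                 # placeholder node features
W = rng.normal(size=(16, 8))                 # randomly initialized layer weights
H_next = np.maximum(A_norm @ H @ W, 0.0)     # relationally smoothed node embeddings
print(H_next.shape)                          # (n, 8)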

● Starter Resources:

○ Yao, L., Mao, C., and Luo, Y. (2019). Graph Convolutional Networks for Text Classification.
arXiv preprint arXiv:1809.05679.
○ Lin, K., Liu, Z., Sun, M., and Kuang, K. (2021). BertGCN: Transductive Text Classification by
Combining GNN and BERT. Proceedings of the 2021 Conference on Empirical Methods in
Natural Language Processing (EMNLP).
○ Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive Representation Learning on
Large Graphs. Proceedings of NeurIPS 2017.
○ Nguyen, D. Q., Vu, T., and Phung, D. (2020). A Capsule Network-based Embedding Model
for Dynamic Graphs in NLP. arXiv preprint arXiv:2004.05541.
○ Schlichtkrull, M., Kipf, T., Bloem, P., van den Berg, R., Titov, I., and Welling, M. (2018).
Modeling Relational Data with Graph Convolutional Networks. Proceedings of the
European Semantic Web Conference (ESWC).
○ Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). Graph
Attention Networks. International Conference on Learning Representations (ICLR).
○ Huang, L., Ma, D., Li, S., Zhang, X., and Wang, H. (2019). Text Level Graph Neural Network
for Text Classification. arXiv preprint arXiv:1910.02356.

○ Aras, A. C., Alikaşifoğlu, T., and Koç, A. (2024). Text-RGNNs: Relational Modeling for
Heterogeneous Text Graphs. IEEE Signal Processing Letters.

○ Aras, A. C., Alikaşifoğlu, T., and Koç, A. (2024). Graph Receptive Transformer Encoder for
Text Classification. IEEE Transactions on Signal and Information Processing over Networks.
2) Semantic Communications
Communication can be categorized into two levels: i) transmission of symbols; ii) semantic exchange
of the transmitted symbols. The first level concerns the successful transmission of symbols,
irrespective of the meaning those symbols carry. The second level, in contrast, concerns the successful
transmission of semantic information from the transmitter to the receiver. Communication in which
the semantic information sent by the transmitter is matched to the meaning interpreted at the
receiver is called semantic communications.

Current communication systems (such as 4G/LTE) mainly focus on the first level, using advanced
channel/source coding schemes and higher-order modulation techniques. However, with the
deployment of advanced NLP models, researchers have started to think about how future
communication systems can be re-designed with NLP (and AI in general) so that the second level can
be realized. With this approach, the paradigm shifts from accurately transmitting bits to accurately
transmitting semantic meaning in a goal-oriented fashion. Various new applications based on IoT
networks, such as transportation networks, VR/AR, and robotics, require transmitting data on the
order of zettabytes. With AI-based models, we can further compress the data while preserving its
meaning, so that higher effective data rates become possible.

With the advances in NLP, the communication system has been re-designed with transformer-based
encoder and decoder schemes [1]. For text transmission, the classical blocks of the communication
chain (source encoder/decoder, channel encoder/decoder) are replaced with NLP models, and the aim
is to minimize semantic errors by recovering the meaning of sentences. Compared to the traditional
system, this approach can be useful under low-latency requirements and unstable channel conditions.
The system is extended to IoT networks in [2], which allows for massive multiple-input multiple-output
(MIMO) systems. A similar system is adopted for speech signals in [3], and [4] takes a game-theoretic
approach based on Bayesian games.
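
The PyTorch sketch below is a heavily simplified, hypothetical version of the transformer-based
pipeline in [1]: token embeddings are encoded, passed through a simulated additive-noise channel, and
decoded back to token logits, all trained end to end. The vocabulary size, model dimensions, the reuse
of encoder layers on the receiver side, and the plain AWGN channel are illustrative assumptions rather
than the exact setup of the cited papers.

import torch
import torch.nn as nn

class ToySemanticTransceiver(nn.Module):
    # Encoder -> noisy channel -> decoder, trained end to end (sketch only).
    def __init__(self, vocab_size=1000, d_model=128, snr_db=10.0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)  # simplified receiver
        self.out = nn.Linear(d_model, vocab_size)
        self.snr_db = snr_db

    def channel(self, x):
        # Simulated AWGN channel: noise power set from the target SNR.
        signal_power = x.pow(2).mean()
        noise_power = signal_power / (10 ** (self.snr_db / 10))
        return x + torch.sqrt(noise_power) * torch.randn_like(x)

    def forward(self, tokens):
        tx = self.encoder(self.embed(tokens))   # semantic + channel encoding
        rx = self.channel(tx)                   # corrupted symbols
        return self.out(self.decoder(rx))       # recover token logits

model = ToySemanticTransceiver()
tokens = torch.randint(0, 1000, (2, 16))        # dummy batch of token ids
logits = model(tokens)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tokens.reshape(-1))
loss.backward()                                  # end-to-end training signal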

Starter resources are:

o H. Xie, Z. Qin, G. Y. Li and B. -H. Juang, "Deep Learning Enabled Semantic Communication
Systems," in IEEE Transactions on Signal Processing, vol. 69, pp. 2663-2675, 2021, doi:
10.1109/TSP.2021.3071210.

o H. Xie and Z. Qin, “A Lite Distributed Semantic Communication System for Internet of
Things,” IEEE JSAC, vol. 39, no. 1, Jan. 2021, pp. 142–53.

o Z. Weng, Z. Qin and G. Y. Li, "Semantic Communications for Speech Signals," ICC 2021 -
IEEE International Conference on Communications, Montreal, QC, Canada, 2021, pp. 1-6,
doi: 10.1109/ICC42927.2021.9500590.

o B. Guler, A. Yener, and A. Swami, “The semantic communication game,” IEEE Trans. Cogn.
Commun. Netw., vol. 4, no. 4, pp. 787–802, Sep. 2018.

o X. Luo, H. -H. Chen and Q. Guo, "Semantic Communications: Overview, Open Issues, and
Future Research Directions," in IEEE Wireless Communications, vol. 29, no. 1, pp. 210-219,
February 2022, doi: 10.1109/MWC.101.2100269.
o N. Farsad, M. Rao, and A. Goldsmith, “Deep Learning for Joint Source-Channel Coding of
Text,” Proc. 2018 IEEE Int’l. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2018,
pp. 2326–30

o Ozates, T., & Koç, A. (2025). “Semantic Communication over Channels with Insertions,
Deletions, and Substitutions.”, IEEE Communications Letters.

o Ozates, T., Kargı, U., & Koç, A. (2024). “Sememe Based Semantic Communications.”, IEEE
Communications Letters.
3) State Space Models in NLP:

State Space Models (SSMs) have gained significant attention as an alternative to traditional sequence
modeling architectures such as recurrent neural networks (RNNs) and transformers. SSMs leverage
continuous-time dynamics to efficiently model long-range dependencies, making them highly
scalable and effective for natural language processing (NLP) tasks. These models process sequences
in subquadratic or even linear time complexity, offering potential advantages over transformers in
terms of efficiency and memory usage.

Structured State Space Model (S4) is a state space model that efficiently captures long-range
dependencies by maintaining a structured hidden state, which compresses past information instead
of storing all previous time steps. It leverages fast convolutional techniques to compute state
updates in parallel during training, and acts like an RNN at inference time.
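
To make this recurrence/convolution duality concrete, here is a tiny NumPy sketch of a discretized
linear state space model, x_k = A x_{k-1} + B u_k, y_k = C x_k, computed once as a recurrent scan and
once through the equivalent convolution kernel. The matrices are random placeholders rather than the
structured (HiPPO-initialized) parameters used in S4.

import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8                                    # state size, sequence length
A = 0.3 * rng.normal(size=(N, N))              # placeholder discretized matrices;
B = rng.normal(size=(N, 1))                    # S4 uses a structured (HiPPO) init
C = rng.normal(size=(1, N))
u = rng.normal(size=L)                         # input sequence

# Recurrent view (how the model runs at inference): x_k = A x_{k-1} + B u_k, y_k = C x_k
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append((C @ x).item())

# Convolutional view (how S4 trains in parallel): y_k = sum_j (C A^j B) u_{k-j}
kernel = np.array([(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)])
y_conv = [np.dot(kernel[:k + 1][::-1], u[:k + 1]) for k in range(L)]
print(np.allclose(y_rec, y_conv))              # True: both views give the same output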

The Selective State Space Model, Mamba (S6), is a new class of selective SSMs that enhances prior
approaches by addressing key limitations while achieving Transformer-level modeling capacity with
linear scalability in sequence length. A fundamental improvement lies in the selection mechanism,
which enables the model to process input-dependent information efficiently. Unlike previous SSMs,
which struggle to dynamically prioritize relevant data, this approach parameterizes the SSM
components based on input values, allowing the model to selectively retain essential information
while filtering out irrelevant details. Inspired by synthetic tasks such as selective copy and induction
heads, this mechanism strengthens long-term memory retention and improves adaptability to
complex sequences.

Implementing this mechanism efficiently poses a computational challenge, as existing SSMs rely on
time- and input-invariant operations to maintain efficiency. To address this, a hardware-aware
algorithm was developed that processes the model recurrently using a scan operation instead of
convolution, ensuring computational efficiency. A key advantage of this method is that it avoids
materializing the expanded state, thereby reducing input-output overhead and minimizing memory
access inefficiencies across different levels of GPU memory. This optimization allows the model to
maintain high throughput on modern hardware while preserving the advantages of state-space
models.

Starter resources are:

o Albert Gu, Karan Goel, and Christopher Re. “Efficiently Modeling Long Sequences with
Structured State Spaces”, In International Conference on Learning Representations
(ICLR).2021.
o A. Gu and T. Dao, “Mamba: Linear-time Sequence Modeling with Selective State Spaces”,
arXiv preprint arXiv:2312.00752, 2023.
o Tri Dao, Daniel Y Fu, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher Ré.
“Hungry Hungry Hippos: Towards Language Modeling with State Space Models”, In
International Conference on Learning Representations (ICLR). 2023.
o Tri Dao and Albert Gu. “Transformers are SSMs: Generalized Models and Efficient
Algorithms Through Structured State Space Duality”, arXiv preprint arXiv:2405.21060,
2024.
o Junxiong Wang, Daniele Paliotta, Avner May, Alexander M Rush, and Tri Dao. “The
Mamba in the Llama: Distilling and Accelerating Hybrid Model”, arXiv preprint
arXiv:2408.15237, 2024.
o Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao,
Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, et al. “An Empirical Study
of Mamba-based Language Models”, arXiv preprint arXiv:2406.07887, 2024.
o Ling Yue, Sixue Xing, Yingzhou Lu, and Tianfan Fu. “BioMamba: A Pre-trained Biomedical
Language Representation Model Leveraging Mamba", arXiv preprint arXiv:2408.02600,
2024.
o Annotated S4
o Mamba Blog
4) Computational Biology in NLP:

Large Language Models (LLMs) are revolutionizing computational biology by enabling semantic-level
understanding and contextual reasoning across genomic, transcriptomic, and proteomic data. Traditional
bioinformatics approaches rely on statistical and alignment-based methods that focus on direct sequence
matching, often missing higher-order relationships. LLMs, however, can process biological sequences as
structured language, capturing complex dependencies and functional patterns beyond simple symbol-
based representations.

In genomics, LLMs are being used to predict the functional impact of genetic variations, model gene
regulatory interactions, and generate synthetic DNA sequences with specific properties. By leveraging
self-supervised pretraining on massive genomic datasets, these models learn biologically meaningful
embeddings, improving variant classification and disease association studies.

For transcriptomics, LLMs enhance cell-type annotation, differential gene expression analysis, and RNA-
seq interpretation by contextualizing gene relationships across different cellular states. Instead of relying
on pairwise correlations, these models can infer regulatory networks and predict transcriptomic
responses to perturbations, advancing personalized medicine and drug discovery.

In proteomics, LLMs are being applied to protein folding, function prediction, and protein-protein
interactions. By treating amino acid sequences as structured text, transformer-based architectures such
as AlphaFold and ESM models have demonstrated state-of-the-art performance in predicting 3D protein
structures and discovering novel functional motifs.

A major advantage of LLMs in computational biology is their ability to generalize across different biological
datasets and integrate multimodal omics data. Unlike traditional models that require extensive
handcrafted features, LLMs can learn representations directly from raw sequences, allowing for zero-shot
and few-shot learning in unseen biological contexts. As these models continue to advance, they are
expected to reshape bioinformatics workflows, enhance biomarker discovery, and accelerate biomedical
research through scalable, interpretable, and high-performance biological sequence analysis.
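
As a small, hedged illustration of treating a biological sequence as language, the sketch below
tokenizes a toy DNA string into overlapping k-mers and maps them to integer ids ready for an
embedding layer. The k-mer size and the sequence are arbitrary choices; real genomic language models
use their own fixed tokenization schemes (k-mers or learned subword vocabularies).

import torch
import torch.nn as nn

def kmer_tokenize(seq, k=6):
    # Split a DNA string into overlapping k-mers with stride 1.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

sequence = "ATGCGTACGTTAGC"                    # toy sequence, not real data
tokens = kmer_tokenize(sequence, k=6)
vocab = {kmer: i for i, kmer in enumerate(sorted(set(tokens)))}  # ad hoc vocabulary
ids = torch.tensor([[vocab[t] for t in tokens]])
embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=32)
print(embed(ids).shape)                        # (1, number of k-mers, 32), ready for a transformer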

Starter resources are:

o Cui, H., Wang, C., Maan, H., Pang, K., Luo, F., Duan, N., & Wang, B. (2024). scGPT: toward building
a foundation model for single-cell multi-omics using generative AI. Nature Methods, 1-11.
o Yang, F., Wang, W., Wang, F., Fang, Y., Tang, D., Huang, J., ... & Yao, J. (2022). scBERT as a large-
scale pre-trained deep language model for cell type annotation of single-cell RNA-seq data. Nature
Machine Intelligence, 4(10), 852-866.
o Hu, M., Alkhairy, S., Lee, I., Pillich, R. T., Fong, D., Smith, K., ... & Pratt, D. (2024). Evaluation of
large language models for discovery of gene set function. Nature Methods, 1-10
o Zhuo, L., Chi, Z., Xu, M., Huang, H., Zheng, H., He, C., ... & Zhang, W. (2024). Protllm: An interleaved
protein-language llm with protein-as-word pre-training. arXiv preprint arXiv:2403.07920.
o Fang, C., Wang, Y., Song, Y., Long, Q., Lu, W., Chen, L., ... & Li, X. (2024). How do large language
models understand genes and cells. ACM Transactions on Intelligent Systems and Technology
5) Computational Efficiency in Large Language Models
Large language models (LLMs) achieve state-of-the-art performance but come with high computational
costs, making efficiency a crucial research area, as the recent impact of DeepSeek's release has
shown. Various methods aim to reduce resource consumption while maintaining
performance, including few-shot learning, knowledge distillation, model pruning, quantization,
parameter-efficient fine-tuning (PEFT) methods like LoRA, and efficient architectures like Mixture of
Experts (MoE). Few-shot learning enables LLMs to generalize from minimal data, reducing the need for
fine-tuning. PEFT techniques, such as Low-Rank Adaptation (LoRA) and adapters, allow models to be fine-
tuned efficiently by updating only a small subset of parameters. Lightweight models, such as distilled or
sparsely activated models, offer competitive results with significantly lower inference costs. These
approaches are essential for democratizing AI, enabling deployment on edge devices, and reducing
environmental impact.
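
To make the PEFT idea concrete, here is a minimal, hedged LoRA-style sketch: a frozen linear layer is
augmented with a trainable low-rank update W x + (alpha/r) B A x, so only the small A and B matrices
receive gradients. The rank, scaling, and layer sizes below are illustrative defaults, not values
prescribed by the papers in the reading list.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base weight plus a trainable low-rank update (sketch).
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)               # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")             # only the low-rank factors train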

Given the recent reports, DeepSeek pushes the boundaries of these techniques by combining cutting-
edge computational optimizations with an architecture designed for high efficiency. By incorporating
methods like sparse activation and selective token processing, DeepSeek minimizes unnecessary
computations, which drastically reduces resource consumption during both training and inference. This
makes it possible to achieve near state-of-the-art performance while using fewer computational
resources, positioning DeepSeek as a major step forward in the quest for more sustainable AI.

Starter resources are:

o Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., ... & Piao, Y. (2024). Deepseek-v3
technical report. arXiv preprint arXiv:2412.19437.
o Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., ... & He, Y. (2025). Deepseek-r1:
Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint
arXiv:2501.12948.
o Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). Lora: Low-
rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
o Yan, M., Wang, Y., Pang, K., Xie, M., & Li, J. (2024, August). Efficient mixture of experts
based on large language models for low-resource data preprocessing. In Proceedings of
the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 3690-
3701).

o Brown et al., 2020 – Language Models are Few-Shot Learners (GPT-3)
https://arxiv.org/abs/2005.14165
Introduces GPT-3 and demonstrates few-shot learning capabilities, reducing the need for
labeled data.
o Hu et al., 2021 – LoRA: Low-Rank Adaptation of Large Language Models
https://arxiv.org/abs/2106.09685
Proposes LoRA, a fine-tuning method that significantly reduces memory usage and
computation by learning low-rank updates.
o Dettmers et al., 2022 – 8-bit Optimizers via Block-wise Quantization
https://arxiv.org/abs/2208.07339
Proposes quantization techniques to reduce memory and computation requirements
while maintaining performance.
o Touvron et al., 2023 – LLaMA: Open and Efficient Foundation Language Models
https://arxiv.org/abs/2302.13971
Introduces LLaMA, a smaller yet competitive model, showing that well-trained compact
architectures can rival larger counterparts.
o Zhang et al., 2022 – OPT: Open Pre-trained Transformer Language Models
https://arxiv.org/abs/2205.01068
Describes efficient scaling strategies for training open-source LLMs.
o Fedus et al., 2022 – Switch Transformers: Scaling to Trillion Parameter Models with Sparse
Computation
https://arxiv.org/abs/2101.03961
Introduces sparse activation techniques using Mixture of Experts (MoE) to improve
efficiency.
6) NLP and Vision-Language Models For Medical Imaging

Natural Language Processing (NLP) and Large Language Models (LLMs) are changing the way medical
imaging is analyzed by making automated report generation, image-text alignment, and clinical decision
support possible. Vision-language models (VLMs) combine computer vision with NLP to help interpret
radiology images, generate structured reports, and support doctors in diagnosing complex conditions.
These models use large medical datasets to improve accuracy, efficiency, and consistency in clinical
workflows. You can explore the intersection of NLP and medical imaging. Your project can focus on
medical captioning, radiology report summarization, pathology detection, or multimodal learning using
both vision and text to improve diagnostic support. You can experiment with pre-trained models, fine-
tune existing architectures, or develop new methods to bridge the gap between medical imaging and NLP.
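
Below is a deliberately small, hypothetical sketch of the encoder-decoder pattern behind report
generation: a tiny CNN image encoder produces visual features that condition a transformer text
decoder. All dimensions, the random "scan", and the toy vocabulary are placeholders; real systems
start from pretrained vision and language backbones rather than this toy architecture.

import torch
import torch.nn as nn

class ToyReportGenerator(nn.Module):
    # Image features condition an autoregressive text decoder (sketch).
    def __init__(self, vocab_size=500, d_model=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # tiny stand-in for a vision backbone
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, d_model),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image, report_tokens):
        memory = self.cnn(image).unsqueeze(1)     # (batch, 1, d_model) visual context
        tgt = self.embed(report_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                   # next-token logits for the report

model = ToyReportGenerator()
image = torch.randn(2, 1, 64, 64)                 # fake grayscale scans
tokens = torch.randint(0, 500, (2, 20))           # fake report token ids
print(model(image, tokens).shape)                 # (2, 20, 500)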

Starter resources:

o Sloan, Phillip, et al. "Automated Radiology Report Generation: A Review of Recent Advances."
IEEE Reviews in Biomedical Engineering (2024).
o Miura, Yasuhide, et al. "Improving factual completeness and consistency of image-to-text
radiology report generation." arXiv preprint arXiv:2010.10042 (2020).
o Zhao, Brian Nlong, et al. "Large Multimodal Model for Real-World Radiology Report Generation."
o Pellegrini, Chantal, et al. "RaDialog: A large vision-language model for radiology report
generation and conversational assistance." arXiv preprint arXiv:2311.18681 (2023).
o https://github.com/lab-rasool/Awesome-Medical-VLMs-and-datasets
o Hartsock et al., "Vision-language models for medical report generation and visual question
answering: a review." https://www.frontiersin.org/articles/10.3389/frai.2024.1430984/full
o Huang et al., "Multimodal Foundation Models for Medical Imaging: A Systematic Review and
Implementation Guidelines." https://www.medrxiv.org/content/10.1101/2024.10.23.24316003v1
7) Language Games

Since having a complex language is a feature that differentiates humans from other living creatures,
creating agents that can communicate with each other is one of the primary objectives of artificial
intelligence (AI). One way to establish that communication is through games. Games with AI gained
huge interest after Deep Blue's success against chess champion Garry Kasparov in 1997. That victory
showed that powerful game-playing systems that can solve well-defined hard tasks can indeed be
built. Since natural language exchange between agents can itself be seen as an interactive game,
there has been growing interest in creating language games with NLP techniques, both to mimic
human behaviours and to achieve specific goals. It is also believed that achieving advanced machine
intelligence in NLP is possible through interactive language games.

Starter resources:

o Yao, Y., Zhong, H., Zhang, Z., Han, X., Wang, X., Xiao, C., Zeng, G., Liu, Z., Sun. M. 2021.
Adversarial Language Games for Advanced Natural Language Intelligence. Proceedings of
the AAAI Conference on Artificial Intelligence, 35(16), 14248-14256, 2021.

o Lazaridou, A., Peysakhovich, A. and Baroni, M. 2016. Multi-agent cooperation and the
emergence of (natural) language”, arXiv preprint arXiv: 1612.07182.

o Havrylov S. and Titov I.. 2017. Emergence of language with multi-agent games: Learning
to communicate with sequences of symbols, In Advances in neural information processing
systems, pages 2149–2159.

o Khani, F., Goodman, N. D., and Liang, P. 2018. Planning, inference and pragmatics in
sequential language games, Transactions of the Association for Computational Linguistics,
6:543–555.
8) Bias in NLP

Many NLP algorithms inherit unwanted social biases, such as racial and gender bias, from the corpora
they are trained on. One simple example is that an NLP algorithm may perceive “programmer” as a
predominantly male occupation while the same algorithm associates women with “housekeeper”.
There are many more examples across NLP applications and in various forms, including racial and
religious bias. Research on this topic focuses on detecting, measuring, and eliminating such
unwanted bias.
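
A minimal, hedged sketch of the measurement idea in Bolukbasi et al. is given below: define a gender
direction from a paired set of words (e.g., he - she) and measure how strongly other word vectors
project onto it. The tiny hand-made vectors only stand in for real pretrained embeddings (e.g.,
word2vec or GloVe), so the numbers are meaningless except to show the computation.

import numpy as np

# Placeholder 4-d "embeddings"; a real study would load word2vec/GloVe vectors.
emb = {
    "he":         np.array([ 0.9, 0.1, 0.3, 0.0]),
    "she":        np.array([-0.9, 0.1, 0.3, 0.0]),
    "programmer": np.array([ 0.4, 0.8, 0.1, 0.2]),
    "homemaker":  np.array([-0.5, 0.7, 0.2, 0.1]),
}

def unit(v):
    return v / np.linalg.norm(v)

# Gender direction from one definitional pair; Bolukbasi et al. use several pairs plus PCA.
g = unit(emb["he"] - emb["she"])

for word in ["programmer", "homemaker"]:
    bias = float(np.dot(unit(emb[word]), g))   # signed projection onto the gender axis
    print(f"{word}: {bias:+.2f}")              # positive ~ 'he' side, negative ~ 'she' side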

Starter Resources:

o Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., and Kalai, A. 2016. Man is to computer
programmer as woman is to homemaker? debiasing word embeddings. In Proceedings of
the 30th International Conference on Neural Information Processing Systems, NIPS’16,
4356–4364, Red Hook, NY, USA. Curran Associates Inc.

o Caliskan, A., Bryson, J. J., and Narayanan, A. 2017. Semantics derived automatically from
language corpora contain human-like biases. Science, 356(6334):183–186.

o Garg, N., Schiebinger, L., Jurafsky, D., and Zou, J. 2018. Word embeddings quantify 100
years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences,
115(16):3635–3644.

o Manzini, T., Yao Chong, L., Black, A. W., and Tsvetkov, Y. 2019. Black is to criminal as
caucasian is to police: Detecting and removing multiclass bias in word embeddings1. In
Proceedings of the 2019 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
Papers), pp. 615–621, Minneapolis, Minnesota. Association for Computational Linguistics.
9) Diachronic Study of Language / Lexical Semantic Change

Languages, like the societies that construct them, evolve with the progression of humankind. Changes
in word meanings are referred to as semantic change, and the aim of diachronic study is to identify
and categorize these changes. Though semantic change can occur for purely linguistic reasons,
cultural phenomena also affect the language, as expected. One of the most contemporary examples is
the word “corona”: before recent events reshaped its everyday usage, it primarily referred to
astronomical phenomena such as the luminous ring around the Sun or Moon.

With the advances in NLP, such as but not limited to modern word embedding architectures, it is
possible to computationally study Semantic Change. By comparing word embeddings trained over
historical corpora, one can demonstrate not only qualitative aspects of Semantic Change but also
attain quantitative results. Below you can find the pioneering works in the diachronic study, thorough
surveys, and state-of-the-art designs:
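
As a hedged sketch of the comparison step, the snippet below aligns two embedding matrices from
different time periods with orthogonal Procrustes (as in Hamilton et al.) and reports each word's
cosine distance between periods. The random matrices stand in for embeddings actually trained on
historical corpora, and the word list is arbitrary.

import numpy as np

rng = np.random.default_rng(0)
words = ["corona", "broadcast", "gay", "cell"]
E_1900 = rng.normal(size=(len(words), 50))              # placeholder embeddings, period 1
E_2000 = E_1900 + 0.3 * rng.normal(size=E_1900.shape)   # period 2 (perturbed copy)

# Orthogonal Procrustes: rotation R minimizing ||E_1900 @ R - E_2000||_F
U, _, Vt = np.linalg.svd(E_1900.T @ E_2000)
R = U @ Vt
E_1900_aligned = E_1900 @ R

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Larger distance = stronger (apparent) semantic change between the two periods.
for i, w in enumerate(words):
    print(w, round(cosine_distance(E_1900_aligned[i], E_2000[i]), 3))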

Starter Resources:

o Hamilton, W. L., Leskovec, J., and Jurafsky, D. 2016. Diachronic Word Embeddings Reveal
Statistical Laws of Semantic Change. Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers).

o Petersen, A., Tenenbaum, J., Havlin, S. et al. 2012. Languages cool as they expand:
Allometric scaling and the decreasing need for new words. Scientific Reports 2, 943.

o Yuksel, A., Ugurlu, B. and Koç, A. 2021. Semantic Change Detection with Gaussian Word
Embeddings, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29,
pp. 3349-3361.

o Tang, X. 2018. A state-of-the-art of semantic change computation. Natural Language
Engineering, vol. 24, no. 5, pp. 649–676.

o Kutuzov, A., Øvrelid, L., Szymanski, T. and Velldal, E. 2018. Diachronic word embeddings
and semantic shifts: A survey. In Proc. 27th Int. Conf. Comput. Linguistics, Santa Fe, New
Mexico, USA: Association for Computational Linguistics, pp. 1384–1397.

o Tahmasebi, N., Borin, L. and Jatowt, A. 2019. Survey of computational approaches to
lexical semantic change. arXiv preprint arXiv:1811.06278.
10) Vision + NLP
"Vision + NLP" combines NLP with image processing and computer vision to enable machines to
understand and generate both visual and textual information. This allows for a deeper understanding
of the relationship between the two types of data, and enables the creation of more sophisticated
systems that can handle both text and images. Some examples of applications that use the
combination of NLP and computer vision include:

1. Image captioning: where a model generates a natural language description of an image.
2. Visual question answering: where a model answers questions about an image in natural
language.
3. Text-to-image generation: where a model generates an image based on a natural
language description.
4. Image search by text: where a model searches for images that match a natural language
query.
5. Object detection and recognition in images using NLP techniques to understand the
context and relationships between objects in images.
6. Image tagging and annotation, which use NLP to describe the contents of an image for
improved searchability and organization.
7. Image-text matching: which is used in various applications like retrieval and
recommendation systems, where an image and its associated text are used to make
decisions.
8. Optical Character Recognition (OCR): conversion of scanned images, PDFs, and other
documents into editable and searchable text.
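
As a hedged, minimal example of image-text matching (items 4 and 7 above), the snippet below scores
a placeholder image against a few captions with a publicly available pretrained CLIP checkpoint from
the Hugging Face transformers library. The blank test image and the caption set are arbitrary, and any
comparable vision-language checkpoint could be substituted.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), color="gray")      # placeholder; use a real photo
captions = ["a photo of a cat", "a diagram of a network", "a city street at night"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher probability = better image-text match according to CLIP.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.2f}  {caption}")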

Starter Resources:

o A. Mogadala, M. Kalimuthu, and D. Klakow, “Trends in integration of vision and language
research: A survey of tasks, datasets, and methods,” Journal of Artificial Intelligence
Research, vol. 71, pp. 1183–1317, Aug. 2021.
o P. Anderson et al., "Bottom-Up and Top-Down Attention for Image Captioning and Visual
Question Answering," 2018 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 2018, pp. 6077-6086.
o K. Han, Y. Wang, H. Chen, et al., “A survey on vision transformer,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 87–110, 2023.
o R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image
synthesis with latent diffusion models,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2022
o K. Xu, J. Ba, R. Kiros, et al., “Show, attend and tell: Neural image caption generation with
visual attention,” in International conference on machine learning, PMLR, 2015, pp.
2048–2057.
o S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial
text to image synthesis,” in Proceedings of The 33rd International Conference on Machine
Learning, PMLR, 2016.
o L. H. Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang, Visualbert: A simple and
performant baseline for vision and language, 2019.
11) Gaussian Word Embeddings

Conventional word embeddings treat all words as points in semantic vector spaces. For that reason,
such models are occasionally called point embeddings. Regardless of the semantic extent of words,
they are represented as single points in the semantic space. There is no difference between the
representation of a conjunction, which carries little meaning on its own, and that of a word with many
side meanings or a broad semantic extent. Gaussian word embeddings instead encode word semantics
as Gaussian distributions: words are represented as multivariate Gaussians with mean vectors and
covariance matrices. Unlike traditional word embeddings, the variance values can represent the
semantic uncertainty levels of words. Several enhancements have also been proposed, for example by
learning Gaussian word embeddings in a multimodal (mixture-based) scheme.
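
For intuition, here is a small, hedged sketch comparing two diagonal-covariance Gaussian word
embeddings with the KL divergence, one of the asymmetric similarity measures used in this line of
work alongside the expected likelihood kernel. The means and variances are made-up placeholders
rather than learned parameters.

import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    # KL(P || Q) for diagonal-covariance Gaussians.
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

d = 4                                               # toy embedding dimension
# Placeholder parameters: a broad, uncertain word vs. a narrow, specific one.
mu_general, var_general = np.zeros(d), np.full(d, 1.0)
mu_specific, var_specific = np.full(d, 0.5), np.full(d, 0.1)

# The asymmetry is the point: the narrow sense is "contained in" the broad one
# much more cheaply than the other way around.
print(kl_diag_gaussians(mu_specific, var_specific, mu_general, var_general))
print(kl_diag_gaussians(mu_general, var_general, mu_specific, var_specific))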

Starter Resources:

o L. Vilnis and A. McCallum, “Word representations via Gaussian embedding,” in 3rd
International Conference on Learning Representations, ICLR 2015, Conference Track
Proceedings, (San Diego, CA, USA), 2015.

o B. Athiwaratkun and A. Wilson, “Multimodal word distributions,” in Proceedings of the
55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers), (Vancouver, Canada), pp. 1645–1656, Association for Computational Linguistics,
July 2017.
12) Schizophrenia, Psychosis and other mental problems & NLP

Schizophrenia is a severe mental disorder from which almost 0.5% of adults worldwide suffer. One of
the symptoms of such mental disorders is incoherent and/or disorganized speech and text. Recently,
NLP tools have started to be used for detecting and analyzing psychosis and other mental health
problems. More generally, there are efforts at the intersection of computational linguistics and
clinical psychology, consolidated in CLPsych, the Workshop on Computational Linguistics and Clinical
Psychology, a workshop series founded in 2014: https://clpsych.org/
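
One simple computational proxy for the incoherence symptom, in the spirit of some of the work
below, is the semantic similarity between consecutive sentences. The hedged sketch that follows
computes that statistic over placeholder sentence vectors; in practice the vectors would come from a
sentence encoder and the score would need clinical validation, which this toy example does not
attempt.

import numpy as np

def coherence_score(sentence_vectors):
    # Mean cosine similarity between consecutive sentence embeddings.
    sims = []
    for a, b in zip(sentence_vectors[:-1], sentence_vectors[1:]):
        sims.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))

rng = np.random.default_rng(0)
# Placeholder embeddings; a real pipeline would encode transcribed speech
# with a sentence encoder before scoring.
base = rng.normal(size=64)
coherent = [base + 0.1 * rng.normal(size=64) for _ in range(5)]   # sentences stay on topic
incoherent = [rng.normal(size=64) for _ in range(5)]              # sentences drift randomly

print("coherent speech  :", round(coherence_score(coherent), 2))
print("incoherent speech:", round(coherence_score(incoherent), 2))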

Starter Resources:

o Iter, D., Yoon, J. and Jurafsky, D. 2018. Automatic Detection of Incoherent Speech for
Diagnosing Schizophrenia. 136-146. Proceedings of the Fifth Workshop on Computational
Linguistics and Clinical Psychology: From Keyboard to Clinic.

o Tang, S.X., Kriz, R., Cho, S. et al. 2021. Natural language processing methods are sensitive
to sub-clinical linguistic differences in schizophrenia spectrum disorders. npj Schizophr 7,
25.

o Corcoran, C. M., Mittal, V. A., Bearden, C. E., E Gur, R., Hitczenko, K., Bilgrami, Z., Savic,
A., Cecchi, G. A., & Wolff, P. 2020. Language as a biomarker for psychosis: A natural
language processing approach. Schizophrenia research, 226, 158–166.
13) Signal processing meets NLP

Due to their exceptional performance in numerous tasks, deep learning models are utilized in related
research fields such as information retrieval and signal processing. Merging these fields leads to a
mutualistic relationship in which characteristic tools of each field are employed to build a single
model. Despite their task performance, deep learning models can suffer from high computational cost
due to their huge number of parameters and model complexity. For example, the attention layer in a
transformer-based language model has been replaced with a Fourier transform layer to address the
computational overhead without sacrificing much task performance. In other work, spectral filters are
utilized to extract useful information for tasks at different scales, such as word level, utterance level,
or document level.
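
The sketch below shows the core token-mixing idea behind FNet in a few lines: a 2D Fourier transform
over the sequence and hidden dimensions, keeping only the real part, replaces the self-attention
sublayer. It is a hedged illustration of the mechanism, not a reproduction of the full FNet block, which
also includes normalization and a feed-forward sublayer.

import torch
import torch.nn as nn

class FourierMixing(nn.Module):
    # Parameter-free token mixing via FFT, in the spirit of FNet (sketch).
    def forward(self, x):                       # x: (batch, seq_len, hidden)
        # FFT along the hidden dim, then along the sequence dim; keep the real part.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

mix = FourierMixing()
x = torch.randn(2, 128, 64)                     # dummy token representations
y = mix(x)
print(y.shape)                                  # same shape, tokens now mixed globally
# Unlike self-attention, this mixing has no learned parameters and runs in O(L log L).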

Starter Resources:

o Tamkin, A., Jurafsky, D. and Goodman, N. 2020. Language through a prism: A spectral
approach for multiscale language representations. In Advances in Neural Information
Processing Systems, volume 33, pages 5492–5504.

o Backurs, A., Chen, M. and Gimpel, K. 2021. A note on more efficient architectures for NLP.
http://www.mit.edu/~backurs/NLP.pdf

o Lee-Thorp, J., Ainslie, J., Eckstein, I. and Ontanon, S. 2021. FNet: Mixing Tokens with
Fourier Transforms. arXiv preprint arXiv:2105.03824.

o Şahinuç, F., & Koç, A. (2022). Fractional Fourier transform meets transformer encoder.
IEEE Signal Processing Letters, 29, 2258-2262.
14) Quaternion Networks for NLP
Most of the traditional machine learning architectures are designed for inputs in real space. To
evaluate multi-dimensional inputs from a different perspective, deep learning architectures can be
designed with hypercomplex numbers, moving beyond the real number space. One such number system
is the quaternions. Quaternions are 4-dimensional hypercomplex numbers with three imaginary
components and one real component. A quaternion Q ∈ H is defined as Q = r + xi + yj + zk, where r is
the real component and x, y, z ∈ R are the imaginary components of the quaternion. With the
corresponding multiplication rules, algebraic operations can be defined in the
quaternion space. Even though some algebraic properties are lost when moving beyond real
numbers to higher dimensions, designing machine learning architectures using quaternion algebra
can reduce the number of parameters up to 75% without causing a significant decrease in the
performance. Since state-of-the-art models in NLP are heavily parametrized, this could reduce
training time of the models. Furthermore, due to the properties of quaternion algebra, the
interactions between variables can be expressed better by designing the architectures in quaternion
space. Quaternion Transformer and Quaternion Attention models were designed to implement NLP
architectures in quaternion space.
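
As a quick, hedged illustration, the snippet below implements the Hamilton product of two quaternions
and hints at the parameter-sharing trick behind quaternion layers: one quaternion weight (4 real
numbers) mixes a 4-dimensional input, whereas a dense real-valued map of the same size needs 16
weights, which is where the roughly 75% parameter reduction comes from. The vectors used are
arbitrary placeholders.

import numpy as np

def hamilton_product(p, q):
    # Hamilton product of quaternions p = (r, x, y, z) and q = (r, x, y, z).
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 - x1*z2 + y1*r2 + z1*x2,
        r1*z2 + x1*y2 - y1*x2 + z1*r2,
    ])

w = np.array([0.5, -0.1, 0.3, 0.2])     # one quaternion weight: 4 real parameters
h = np.array([1.0, 2.0, -1.0, 0.5])     # one quaternion "hidden unit" (4 real values)

# A quaternion "linear" map on this unit reuses the same 4 weights in a structured
# way; a plain real-valued linear map from R^4 to R^4 would need 16 weights.
print(hamilton_product(w, h))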

Quaternion numbers can also fit better to multi-dimensional problems. In different works, RGB data
was fit to quaternion numbers, and Quaternion Convolutional Neural Networks (QCNN) were used
for image processing tasks. An example usage of quaternion networks in NLP domain can be by
considering the multi-sense nature of language. Quaternion Transformer, Quaternion Attention, and
Quaternion Recurrent Neural Network (QRNN) models can be used to implement NLP tasks in
quaternion space. Other than using multi-dimensional inputs that can be fit to quaternion numbers,
real inputs can be treated as the concatenation of quaternion numbers to fasten the training time
and reduce computational cost of NLP tasks.

Starter Resources:
o H.-D. Schutte and J. Wenzel, “Hypercomplex numbers in Digital Signal Processing,” IEEE
International Symposium on Circuits and Systems.
o Y. Tay, A. Zhang, A. T. Luu, J. Rao, S. Zhang, S. Wang, J. Fu, and S. C. Hui, “Lightweight and
efficient neural natural language processing with quaternion networks,” Proceedings of
the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
o C. J. Gaudet and A. S. Maida, “Deep Quaternion Networks,” 2018 International Joint
Conference on Neural Networks (IJCNN), 2018.
o T. Parcollet, M. Morchid, and G. Linares, “Quaternion convolutional neural networks for
heterogeneous image processing,” ICASSP 2019 - 2019 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2019.
o T. Parcollet, M. Morchid, and G. Linares, “Deep quaternion neural networks for spoken
language understanding,” 2017 IEEE Automatic Speech Recognition and Understanding
Workshop (ASRU), 2017.
15) Legal NLP

The interactions between the fields of law and artificial intelligence (AI) have a long history,
beginning with ideas in the 1970s, establishing an active community in 1987, and reviving after the
so-called machine learning revolution. NLP tools are great helpers for law professionals in dealing
with huge amounts of texts and documents. Very recently, AI- and NLP-based methods that provide
automated solutions for problems in the legal domain have accelerated. In the commercial domain,
technologies collectively called LegalTech are also on the rise.

Starter Resources:

Some important papers are listed below. However, also check the website of the Natural Legal
Language Processing (NLLP) Workshop, organized within the leading NLP conference EMNLP, for the
most recent works: https://nllpw.org/workshop/

o Ashley, K. D. and Brüninghaus, S. 2009. Automatically classifying case texts and predicting
outcomes. Artificial Intelligence and Law, 17(2):125–165.

o Chalkidis, I. and Kampas, D. 2019. Deep learning in law: Early adaptation and legal word
embeddings trained on large corpora. Artificial Intelligence and Law, 27(2):171–198.

o Bach, N. X., Minh, N. L., Oanh, T. T., and Shimazu, A. 2013. A two-phase framework for
learning logical structures of paragraphs in legal articles. ACM Transactions on Asian
Language Information Processing, 12(1).

o Mumcuoğlu, E., Öztürk, C. E., Ozaktas, H. M., and Koç, A. 2021. Natural language
processing in law: Prediction of outcomes in the higher courts of Turkey. Information
Processing & Management, 58(5):102684.

o Chalkidis, I., Jana, A., Hartung, D., Bommarito, M. J., Androutsopoulos, I., Katz, D. M., and
Aletras, N. 2022. LexGLUE: A benchmark dataset for legal language understanding in
English. In Proceedings of ACL 2022.

o Zheng, L., Guha N., Anderson, B.R., Henderson, P., and Ho, D.E. 2021. When does
pretraining help? Assessing self-supervised learning for law and the CaseHOLD dataset of
53,000+ legal holdings. In Proceedings of the Eighteenth International Conference on
Artificial Intelligence and Law (ICAIL '21), 159–168.

o Niklaus, J., Matoshi, V., Rani, P., Galassi, A., Stürmer, M., Chalkidis, I. LEXTREME: A Multi-
Lingual and Multi-Task Benchmark for the Legal Domain, arXiv preprint:
arXiv:2301.13126v1.
16) Sememes
Words are the smallest elements of natural languages that can stand by themselves, but they are not
the smallest indivisible semantic units. Semantic units called sememes are the minimum semantic
units of word meaning, just as the elements of the periodic table are the (chemically) indivisible
building blocks of matter. As a simplistic example, the word school can be considered as
a combination of the meanings of education and building, while the word hospital can be considered
as the combination of medicine and building. In these examples, education, medicine and building
are sememes, and words school and hospital are annotated as combinations of those sememes.

Sememes have proven themselves to be useful in various NLP tasks such as word similarity
computation, word representation learning, sentiment analysis, and lexicon expansion. However,
both the construction of a proper predefined sememe set and the annotation of words with
appropriate sememes from this predefined set are quite challenging tasks. Previously, linguistic
experts carried out these laborious tasks manually over a timespan of years. There are interesting
ongoing research efforts on sememes with different objectives including automatically compiling
sememe lists, annotating words with these sememes, and improving several machine learning
architectures and methods with sememe knowledge.
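
To ground the "words as combinations of sememes" idea, here is a tiny, hedged sketch in which each
word vector is simply the average of its annotated sememe vectors and similarity is measured with
cosine. The two-word lexicon, the random sememe vectors, and the averaging scheme are all
illustrative simplifications of how sememe knowledge bases such as HowNet are actually used.

import numpy as np

rng = np.random.default_rng(0)
sememes = {s: rng.normal(size=16) for s in ["education", "medicine", "building"]}

# Sememe annotations in the spirit of the school/hospital example above.
lexicon = {
    "school":   ["education", "building"],
    "hospital": ["medicine", "building"],
}

def word_vector(word):
    # Compose a word representation as the mean of its sememe vectors.
    return np.mean([sememes[s] for s in lexicon[word]], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The shared 'building' sememe gives the two words a nonzero similarity.
print(cosine(word_vector("school"), word_vector("hospital")))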

Starter resources are:

o Dong, Z. and Dong, Q. 2004. HowNet - a hybrid language and knowledge resource. In the
International Conference on Natural Language Processing and Knowledge Engineering,
IEEE.

o Qin, Y., Qi, F., Ouyang, S., Liu, Z., Yang, C., Wang, Y., Liu, Q. and Sun, M. 2020. Improving
sequence modeling ability of recurrent neural networks via sememes. IEEE ACM
Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2364–2373.

o Zhang, Y., Yang, C., Zhou, Z. and Liu, Z. 2020. Enhancing transformer with sememe
knowledge. In Proceedings of the 5th Workshop on Representation Learning for NLP.
Stroudsburg, PA, USA: Association for Computational Linguistics.

o Xie, R., Yuan, X., Liu, Z. and Sun, M. 2017. Lexical sememe prediction via word embeddings
and matrix factorization. In Proceedings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence. California: International Joint Conferences on
Artificial Intelligence Organization.

o Qi, F., Chen Y., Wang, F., Liu, Z., Chen X. and Sun, M. 2021. Automatic Construction of
Sememe Knowledge Bases via Dictionaries. ACL Findings.
17) Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an emerging paradigm in Natural Language Processing that
combines the strengths of large language models with external knowledge retrieval systems. RAG
models enhance the capabilities of traditional language models by incorporating relevant information
from external sources during generation, leading to more accurate, up-to-date, and verifiable outputs.
This approach addresses key limitations of standalone language models such as hallucinations, outdated
knowledge, and lack of source attribution.

RAG systems typically consist of two main components: a retriever that identifies and fetches relevant
information from a knowledge base, and a generator that produces outputs based on both the retrieved
information and the input query. Recent advancements in RAG include dynamic knowledge integration,
multi-modal extensions, domain-specific adaptations, and self-reflective mechanisms for improved
factuality.
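
A minimal, hedged sketch of the retriever half of this pipeline is shown below: documents and a query
are embedded (here with a toy hashing bag-of-words as a stand-in for a real dense encoder), the top-k
passages are retrieved by cosine similarity, and a prompt is assembled for whatever generator the
project uses. Everything here, including the tiny document set, is a placeholder.

import numpy as np

documents = [
    "RAG combines a retriever with a generator.",
    "State space models scale linearly with sequence length.",
    "The retriever fetches passages from an external knowledge base.",
]

def embed(text, dim=64):
    # Toy hashing embedding; a real system would use a dense sentence encoder.
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_matrix = np.stack([embed(d) for d in documents])

def retrieve(query, k=2):
    scores = doc_matrix @ embed(query)          # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    return [documents[i] for i in top]

query = "What does the retriever do in RAG?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)                                    # fed to the generator (an LLM)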

Applications of RAG span various domains including question-answering, document summarization, fact-
checking, and domain-specific knowledge tasks. The technology has shown particular promise in fields
requiring high accuracy and verifiability, such as healthcare, legal analysis, and scientific research.

Starter Resources are:

o Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In
Proceedings of the 34th International Conference on Neural Information Processing Systems
(NIPS '20).
o Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). Self-RAG: Learning to retrieve, generate,
and critique through self-reflection. In The Twelfth International Conference on Learning
Representations.
o Gao, Y., et al. (2024). Retrieval-Augmented Generation for Large Language Models: A Survey.
arXiv preprint arXiv:2312.10997.
o Gao, Y., Xiong, Y., Wang, M., and Wang, H. (2024). Modular RAG: Transforming RAG Systems
into LEGO-like Reconfigurable Frameworks. arXiv preprint arXiv:2407.21059.
o Wu, S., et al. (2024). Retrieval-Augmented Generation for Natural Language Processing: A
Survey. arXiv preprint arXiv:2407.13193.
o Kang, M., Gürel, N. M., Yu, N., Song, D., & Li, B. (2024). C-RAG: Certified generation risks for
retrieval-augmented language models. In Proceedings of the 41st International Conference on
Machine Learning (ICML '24).
o Islam, S. B., et al. (2024). Open-RAG: Enhanced retrieval augmented reasoning with open-source
large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024.
o Wang, X., et al. (2024). Searching for best practices in retrieval-augmented generation.
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.
o Edge, D., et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused
Summarization. arXiv preprint arXiv:2404.16130.
o Li, S., et al. (2024). GraphReader: Building graph-based agent to enhance long-context abilities of
large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024.
o Sarmah, B., et al. (2024). HybridRAG: Integrating knowledge graphs and vector retrieval
augmented generation for efficient information extraction. In Proceedings of the 5th ACM
International Conference on AI in Finance (ICAIF '24).
o Zhang, T., et al. (2024). RAFT: Adapting language model to domain specific RAG. In First
Conference on Language Modeling.
o Jin, C., et al. (2024). RAGCache: Efficient Knowledge Caching for Retrieval-Augmented
Generation. arXiv preprint arXiv:2404.12457.
