Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data

Kumar, Puneet; Malik, Sarthak; Raman, Balasubramanian; Li, Xiaobai

arXiv:2402.07640 (cs)

[Submitted on 12 Feb 2024 (v1), last revised 18 Oct 2024 (this version, v3)]


Abstract: The ability to generate sentiment-controlled feedback in response to multimodal inputs comprising text and images addresses a critical gap in human-computer interaction. This capability allows systems to provide empathetic, accurate, and engaging responses, with applications in education, healthcare, marketing, and customer service. To this end, we construct a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and propose a controllable feedback synthesis system. The system features an encoder, a decoder, and a controllability block for textual and visual inputs. It extracts features using a transformer and Faster R-CNN networks and combines them to generate feedback. The CMFeed dataset includes images, texts, reactions to the posts, human comments with relevance scores, and reactions to these comments. These reactions are used to train the model to produce feedback with specified sentiments, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than the accuracy without controllability. The system also incorporates a similarity module that assesses feedback relevance through rank-based metrics, and an interpretability technique that analyzes the contributions of textual and visual features during feedback generation. The CMFeed dataset and the system's code are available at this https URL.
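
The abstract mentions a similarity module that assesses feedback relevance through rank-based metrics but does not say which metrics or which similarity function are used. As a rough illustration only, the sketch below scores one generated feedback string against human comments that are assumed to be ordered from most to least relevant, using a bag-of-words cosine similarity and a reciprocal-rank score. The function names `bow_cosine` and `reciprocal_rank`, the similarity measure, and the example comments are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words counts of two strings.

    Assumption: a stand-in for whatever similarity the paper's module uses.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reciprocal_rank(generated: str, ranked_comments: list[str]) -> float:
    """Return 1 / (1-based position of the comment most similar to `generated`).

    `ranked_comments` is assumed to be sorted from most to least relevant.
    Returns 0.0 when no comment shares any word with the generated feedback.
    """
    sims = [bow_cosine(generated, comment) for comment in ranked_comments]
    if not sims or max(sims) == 0.0:
        return 0.0
    best_index = max(range(len(sims)), key=sims.__getitem__)
    return 1.0 / (best_index + 1)


if __name__ == "__main__":
    # Hypothetical human comments, most relevant first.
    comments = [
        "what a lovely sunny day at the beach",
        "terrible traffic again this morning",
        "nice photo",
    ]
    print(reciprocal_rank("lovely day at the beach", comments))
    print(reciprocal_rank("nice photo", comments))
```

Averaging this score over many generated samples would give a mean reciprocal rank; higher values indicate that generated feedback tends to resemble the comments humans rated as most relevant.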

Subjects:	Multimedia (cs.MM); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.07640 [cs.MM]
	(or arXiv:2402.07640v3 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2402.07640

Submission history

From: Puneet Kumar
[v1] Mon, 12 Feb 2024 13:27:22 UTC (6,788 KB)
[v2] Thu, 6 Jun 2024 00:26:26 UTC (25,383 KB)
[v3] Fri, 18 Oct 2024 02:50:53 UTC (7,266 KB)
